<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta http-equiv="X-UA-Compatible" content="ie=edge">
    <title>PerfView User's Guide</title>
    <style>
        body {
            font-family: Segoe UI,SegoeUI,Segoe WP,Helvetica Neue,Helvetica,Tahoma,Arial,sans-serif;
            font-weight: 400;
            text-rendering: optimizeLegibility;
            -webkit-font-smoothing: antialiased;
        }

        hr {
            border-top: 3px double gray;
        }
    </style>
</head>
<body>
    <!--  ************************************************************************************* -->
    <h1>
        <a id="UsersGuide">PerfView User's Guide</a>
    </h1>
    <p>
        PerfView is a tool for quickly and easily collecting and viewing both time and memory
        performance data. PerfView uses the <a href="http://msdn.microsoft.com/en-us/library/bb968803(v=VS.85).aspx">
            Event Tracing for Windows (ETW)
        </a> feature of the operating system, which can
        collect machine-wide information on a variety of useful events, as described in the
        <a href="#AdvancedCollection">advanced collection</a> section. ETW is the same powerful
        technology the <strong>Windows performance group uses almost exclusively</strong>
        to track and understand the performance of Windows, and it is the basis for their
        <a href="http://msdn.microsoft.com/en-us/performance/default.aspx">Xperf</a> tool.
        PerfView can be thought of as a simplified and user-friendly version
        of that tool. In addition, PerfView has the ability to collect .NET GC heap information
        for doing memory investigations (even for very large GC heaps). PerfView's ability
        to decode .NET symbolic information as well as the GC heap makes <strong>
            PerfView ideal
            for managed code investigations
        </strong>.
    </p>
    <p>
        <strong>Deploying and Using PerfView</strong>
    </p>
    <p>
        PerfView was designed to be easy to deploy and use.&nbsp;&nbsp; To deploy PerfView
        simply copy the PerfView.exe to the computer you wish to use it on.&nbsp;&nbsp;&nbsp;
        No additional files or installation step is needed.&nbsp;&nbsp;&nbsp; PerfView features
        are &#39;self-discoverable&#39;.&nbsp;&nbsp;&nbsp; The initial display is a &#39;quick
        start&#39; guide that leads you through collecting and viewing your first set of
        profile data. There is also a built in <a href="#Tutorial">tutorial</a>. Hovering
        the mouse over most GUI controls will give you short explanations, and hyperlinks
        send you to the most appropriate part of this user's guide. Finally, PerfView is
        &#39;right click enabled&#39;, which means that when you want to manipulate data in some
        way, right clicking allows you to discover what PerfView can do for you.
    </p>
    <p>
        PerfView is a V4.6.2 .NET application. Thus you need to have
        a V4.6.2 .NET Runtime installed on the machine on which you actually run PerfView.
        Windows 10 and Windows Server 2016 include .NET V4.6.2.
        On other supported operating systems you can install .NET 4.6.2 from the standalone installer. PerfView is not supported
        on Win2K3 or WinXP. While PerfView itself needs a V4.6.2 runtime,
        it can collect data on processes that use V2.0 and V4.0 runtimes. On machines that don't
        have V4.6.2 or later of the .NET runtime installed, it is also possible to collect ETL data
        with another tool (e.g. XPERF or PerfMonitor) and then copy the data file to a machine
        with V4.6.2 and view it with PerfView.
    </p>
    <p>
        <a id="WhatPerfViewCanDoForYou"><strong>What can PerfView do for you?</strong></a>
    </p>
    <p>
        PerfView was designed to collect and analyze both time and memory scenarios.
    </p>
    <ol>
        <li>
            <strong><a id="CPUInvestigation">CPU Investigation</a>:</strong> One of the more useful events
            (and one that is turned on
            by default) is the &#39;profile&#39; sampling event.&nbsp;&nbsp; This event samples
            the instruction pointer of each of the machine&#39;s CPUs every millisecond.&nbsp;&nbsp;
            Each sample captures the complete call stack of the thread currently executing, giving
            very detailed and useful information about what that thread was doing at both high
            and low levels of abstraction.&nbsp;&nbsp; PerfView aggregates these stack traces
            and presents them in a <a href="#StackViewer">stack viewer</a> that has powerful
            grouping operations that make understanding this data significantly simpler than
            most profilers.&nbsp;&nbsp;&nbsp;&nbsp; If your application&#39;s performance problem
            is associated with excessive CPU usage, then PerfView will tell you that and give
            you the tools you need to understand exactly what portion of your application is
            misbehaving. See <a href="#StartingAnAnalysis">Starting a CPU Analysis</a> for more.
        </li>
        <li>
            <strong>Managed Memory Investigations:</strong> PerfView also has the ability to take a snapshot
            of the .NET GC heap. Because these heaps can be very large, PerfView allows control
            over how large of a sample is taken, and goes to some trouble to take a representative
            sample if the heap is too big to capture in its entirety. It then converts the graph
            of objects in the heap into a tree, and displays this in the same <a href="#StackViewer">stack viewer</a>
            that was used for CPU investigations. See <a href="#InvestigatingMemoryData">Investigating Memory</a> and
            <a href="#StartingAnAnalysisGCHeap">Starting a GC Heap Analysis</a> for more.
        </li>
        <li>
            <strong>Response Time Investigations: </strong>  By collecting with the 'ThreadTime' option,
            enough information is collected that PerfView can measure what every
            thread is doing (blocked or not), gather all the thread time associated with every request, and display
            it as a tree. This is what the 'Thread Time (with Start-Stop Activities)' view is.
            See <a href="#MakingServerInvestigationEasy">Making Server Investigation Easy</a> for more.
        </li>
        <li>
            <strong>Wall Clock / Blocked Time Investigations:</strong> If your program is too slow, but it is not consuming
            excessive CPU, then it must be blocked waiting on something else (disk, network,
            ...). PerfView can instruct the OS to log events whenever threads sleep or wake
            up, and has a display for visualizing where your program is waiting.
            See <a href="#BlockedTimeInvestigation">Blocked / Wall Clock Time Investigation</a> for more.
        </li>
        <li>
            <strong>Unmanaged Memory Investigations:</strong> You can also turn on events every time the OS heap memory
            allocator allocates or frees an object.   Using these events you can see what call
            stacks are responsible for the most net unmanaged memory allocations.
            See <a href="#InvestigatingMemoryData">Investigating Memory</a> and
            <a href="#UnmanagedMemoryAnalysis">Unmanaged Heap Analysis</a> for more.
        </li>
        <li>
            <strong>Linux CPU Investigations:</strong>  PerfView has the ability to read the output of the Linux 'Perf Events'
            collector that is built into the Linux kernel.
            See <a href="#ViewingLinuxData">Viewing Linux Data</a> for more.
        </li>
        <li>
            <strong>Viewing your own hierarchical data in PerfView's stack viewer:</strong>  PerfView's stack viewer is
            powerful, but it is also very flexible.  PerfView defines a very simple XML or JSON format that
            it can read into this viewer.   This allows you to easily generate data that you can then
            view in PerfView's powerful stack viewer.
            See <a href="#ViewingExternalData">Viewing External Data</a> for more.
        </li>
    </ol>
    <p>
        See also <a href="#ReferenceGuide">PerfView Reference Guide</a>.
    </p>
    <hr />
    <!--  **************** -->
    <h3>
        <a id="Feedback">Sending feedback / Asking Questions about PerfView</a>
    </h3>
    <p>
        Hopefully the documentation does a reasonably good job of answering your most common
        questions about PerfView and performance investigation in general. If you have a
        question, you should certainly start by searching the user's guide for information.
    </p>
    <p>
        Inevitably however, there will be questions that the docs don't answer, or features
        you would like to have that don't yet exist, or bugs you want to report.  PerfView
        is a <a href="https://github.com/Microsoft/perfview">GitHub open source project</a>
        and you should log questions, bugs or other feedback at
    </p>
    <center>
        <a href="https://github.com/Microsoft/perfview/issues"> PerfView Issues </a>
    </center>
    <p>
        If you are just asking a question there is a Label called 'Question' that you can
        use to indicate that.  If it is a bug, it REALLY helps if you supply enough information
        to reproduce the bug.  Typically this includes the data file you are operating on.
        You can drag small files into the issue itself, however more likely you will need
        to put the data file in the cloud somewhere and refer to it in the issue.   Finally
        if you are making a suggestion, the more specific you can be the better.   Large features
        are much less likely to ever be implemented unless you yourself help with the implementation.
        Please keep that in mind.
    </p>
    <!--  **************** -->
    <hr />
    <h3>
        <a id="LatestVersion">Getting the latest version of PerfView</a>
    </h3>
    <p>
        You can get the latest version of PerfView by going to the <a href="https://github.com/Microsoft/perfview/blob/main/documentation/Downloading.md">PerfView GitHub Download Page</a>.
    </p>
    <hr />
    <hr />
    <!--  ********************************** -->
    <h2>
        <a id="Tutorial">Tutorial of a Time-Based Investigation</a>
    </h2>
    <p>
        See also <a href="#TutorialGCHeap">Tutorial of a GC Heap Memory Investigation</a>.
    </p>
    <p>
        Perhaps the best way to get started is to simply try out the tutorial example.&nbsp;&nbsp;&nbsp;
        On Windows 7 it is recommended that you dock your help as described in <a href="#HelpTip">help tips</a>.
        PerfView comes with two tutorial examples &#39;built in&#39;.
        Also we strongly suggest that any application you write have a performance plan as
        described in <a href="http://msdn.microsoft.com/en-us/magazine/cc500596.aspx">part1</a>
        and <a href="http://msdn.microsoft.com/en-us/magazine/cc507639.aspx">part2</a> of
        <a href="http://msdn.microsoft.com/en-us/magazine/cc500596.aspx">
            Measure Early and Often
            for Performance
        </a>.&nbsp;
    </p>
    <ol>
        <li>
            <strong>Tutorial.exe </strong>- A simple program that calls &#39;DateTime.Now&#39;
            repeatedly until it detects that 5 seconds have gone by. To make this example
            more interesting, it does this using two mutually recursive methods (RecSpin, and
            RecSpinHelper).&nbsp; Each of these helpers spins for a second and then calls the
            other helper to spin for the rest of the time.&nbsp;&nbsp; See <a href="Tutorial.cs.txt">Tutorial.cs</a>
            for the complete source.&nbsp;
        </li>
    </ol>
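    <p>
        To make the shape of the profiled workload concrete, here is a minimal Python analog of the
        structure described above. It is a sketch only, not the contents of Tutorial.cs: the function
        names mirror the method names mentioned in this guide, while the parameters and the returned
        slice count are assumptions added for illustration.
    </p>
<pre>
```python
import time


def spin_for(seconds):
    """Busy-wait by polling the clock, much as the tutorial polls DateTime.Now."""
    deadline = time.monotonic() + seconds
    polls = 0
    while deadline > time.monotonic():
        polls += 1
    return polls


def rec_spin(remaining, slice_seconds=1.0):
    """Spin for one slice, then let the helper spin for the rest of the time."""
    if 0 >= remaining:
        return 0
    spin_for(min(slice_seconds, remaining))
    return 1 + rec_spin_helper(remaining - slice_seconds, slice_seconds)


def rec_spin_helper(remaining, slice_seconds=1.0):
    """Mirror image of rec_spin: spin for one slice, then call back into rec_spin."""
    if 0 >= remaining:
        return 0
    spin_for(min(slice_seconds, remaining))
    return 1 + rec_spin(remaining - slice_seconds, slice_seconds)
```
</pre>
    <p>
        Calling <code>rec_spin(5.0)</code> keeps one CPU busy for roughly five seconds and returns the
        number of one-second slices that were executed, which is the kind of call stack the rest of
        this tutorial examines.
    </p>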
    <p>
        To run the &#39;Tutorial&#39; example:
    </p>
    <ol>
        <li>
            Click on the &#39;Run a command&#39; hyperlink on the main page. This will
            bring up a dialog indicating the command to run and the name of the data file to create.
        </li>
        <li>Enter &#39;Tutorial.exe&#39; in the &#39;command&#39; text box and hit &lt;enter&gt;.</li>
        <li>
            Unless you started PerfView from an elevated environment, the operating system will
            bring up a User Account Control prompt to run as administrator (collecting profile data
            is a privileged activity). Click OK to accept.
        </li>
        <li>
            At this point it will begin running the command.&nbsp; The Status bar will blink
            to indicate that it is working on your command.&nbsp;&nbsp; You can monitor its
            progress by hitting the &#39;Log&#39; button in the lower right corner.&nbsp; After
            it has completed it brings up a process selection dialog box. PerfView is asking
            which process you are focused on. In this case we are interested in the 'Tutorial'
            process, so we should select that. If you are interested in all processes there is
            a button for that too.
        </li>
    </ol>
    <p>
        You can also run the tutorial example by typing &#39;<strong>PerfView run tutorial</strong>&#39;
        at the command line.&nbsp;&nbsp;&nbsp; See <a href="#CollectingFromCommandLine">
            collecting
            data from the command line
        </a> for more.
    </p>
    <p>
        After selecting 'Tutorial.exe' as the process of interest, PerfView brings up the
        <a href="#StackViewer">stack viewer</a> looking something like this:
    </p>
    <center>
        <img src="images/stackViewer.png" alt="StackView" />
    </center>
    <p>
        This view shows you where CPU time was spent.&nbsp;&nbsp; PerfView took a sample
        of where each processor is (including the full stack), every millisecond (see <a href="#UnderstandingPerfData">understanding perf data</a>) and the stack viewer
        shows these samples.&nbsp;&nbsp; Because we told PerfView we were only interested
        in the Tutorial.exe process this view has been restricted (by &#39;<a href="#IncPatsTextBox">IncPats</a>&#39;)
        to only show you samples that were spent in that process.&nbsp;&nbsp;
    </p>
    <p>
        It is always best to begin your investigation by looking at the summary information
        at the top of the view.&nbsp;&nbsp; This allows you to confirm that indeed the bulk
        of your performance problem is related to CPU usage before you go chasing down exactly
        where CPU is spent.&nbsp; This is what the summary statistics are for.&nbsp; We
        see that the process spent 84% of its wall clock time consuming CPU, which merits
        further investigation.&nbsp;&nbsp; Next we simply look at the <a href="#WhenColumn">&#39;When&#39; column</a>
        for the &#39;Main&#39; method in the program.&nbsp;&nbsp;
        This column shows how CPU was used for that method (or any method it calls) over
        the collection time interval.&nbsp;&nbsp; Time is broken into 32 &#39;TimeBuckets&#39;
        (in this case we see from the summary statistics that each bucket was 197 msec long),
        and a number or letter represents what % of 1 CPU is used.&nbsp; 9s and As mean
        you are close to 100% and we can see that over the lifetime of the main method we
        are close to 100% utilization of 1 CPU most of the time.&nbsp;&nbsp;&nbsp;&nbsp;
        Areas outside the main program are probably not interesting to us (they deal with
        runtime startup and the times before and after process launch), so we probably want
        to &#39;zoom in&#39; to that area.&nbsp;
    </p>
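    <p>
        The bucketing behind the &#39;When&#39; column can be sketched as follows. This is an
        illustration under stated assumptions, not PerfView's actual implementation: the function name,
        the &#39;_&#39; character for an empty bucket, and the exact mapping of digits to 10% steps are
        hypothetical. Only the 32 buckets, the one-sample-per-millisecond rate, and the idea that 9s and
        As mean close to 100% of one CPU come from the text above. With 32 buckets of 197 msec each, the
        trace in the example spans about 6304 msec.
    </p>
<pre>
```python
def when_column(sample_times_msec, start_msec, end_msec, buckets=32, msec_per_sample=1.0):
    """Render per-bucket CPU usage as one character per time bucket.

    Assumed encoding (hypothetical): '_' means no samples, a digit d means
    roughly d*10 percent of one CPU, and 'A' means 100 percent or more.
    """
    bucket_msec = (end_msec - start_msec) / buckets
    counts = [0] * buckets
    for t in sample_times_msec:
        if t >= start_msec and end_msec > t:
            index = min(int((t - start_msec) / bucket_msec), buckets - 1)
            counts[index] += 1

    chars = []
    for count in counts:
        percent = 100.0 * count * msec_per_sample / bucket_msec
        if count == 0:
            chars.append("_")
        elif percent >= 100.0:
            chars.append("A")
        else:
            chars.append(str(int(percent // 10)))
    return "".join(chars)
```
</pre>
    <p>
        A thread that is sampled every millisecond for the whole interval yields a row of As, while
        idle stretches before and after the main method show up as runs of the empty-bucket character.
    </p>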
    <h4>
        <a id="ZoomingToARangeOfInterest">Zooming in to a time range of interest</a>
    </h4>
    <p>
        It is pretty common that you are only interested in part of the trace.&nbsp; For
        example you may only care about startup time, or the time from when a mouse was
        clicked and when the menu was displayed. Thus zooming in is typically
        one of the first operations you will want to do. Zooming in is really just selecting
        a region of time for investigation. The region of time is displayed
        in the &#39;<a href="#StartTextBox">start</a>&#39; and &#39;<a href="#EndTextBox">end</a>&#39;
        textboxes. These can be set in three ways:
    </p>
    <ol>
        <li>Manually entering values into the text boxes.</li>
        <li>
            Selecting two cells (typically the &#39;First&#39; and &#39;Last&#39; cells) of
            a particular method of interest, right clicking and selecting &#39;SetTimeRange&#39;.
        </li>
        <li>
            Selecting a &#39;When&#39; cell. If you click the cell again, the cell will become
            editable, at which point you can select a region of text, right click, and select
            &#39;SetTimeRange&#39; (or hit Alt-R) to select the time range associated with your
            selected characters.
        </li>
    </ol>
    <p>
        Try out each of these techniques.&nbsp;&nbsp;&nbsp; For example to &#39;zoom into&#39;
        just the main method, simply drag the mouse over the &#39;First&#39; and &#39;Last&#39;
        times to select both, right click, and select &#39;SetTimeRange&#39;. You can hit
        the &#39;Back&#39; button to undo any changes you made so you can re-select.
        Also notice that each text box remembers the last several values of that box, so
        you can also &#39;go back&#39; to particular past values by selecting the drop down (small
        down arrow to the right of the box), and selecting the desired value.
    </p>
    <p>
        For GUI applications, it is not uncommon to take a trace of the whole run but then
        &#39;zoom into&#39; points where the users triggered activity.&nbsp;&nbsp; You can
        do this by switching to the &#39;<a href="#CallTreeView">CallTree</a>&#39; tab.&nbsp;&nbsp;
        This will show you CPU starting from the process itself. The first line of
        the view is &#39;Process32 tutorial.exe&#39; and is a summary of the CPU time
        for the entire process.&nbsp; The &#39;<a href="#WhenColumn">when</a>&#39; column
        shows you CPU for the process over time (32 time buckets).&nbsp;&nbsp; In a GUI
        application there will be lulls where no CPU was used, followed by bursts of higher
        CPU use corresponding to user actions. These show up in the numbers in the &#39;when&#39;
        column. By clicking on a cell in the &#39;when&#39; column, selecting a range, right
        clicking and selecting SetTimeRange (or Alt-R), you can zoom into one of these &#39;hot
        spots&#39; (you may have to zoom in more than once).&nbsp;&nbsp;&nbsp; Now you have
        focused in on what you are interested in (you can confirm by looking at the methods
        that are called during that time).&nbsp;&nbsp; This is a very useful technique.&nbsp;
    </p>
    <p>
        For managed applications, you will always want to zoom into the main method before
        starting your investigation.&nbsp; The reason is that when profile data is collected,
        after Main has exited, the runtime spends some time dumping symbolic information
        to the ETW log.&nbsp;&nbsp; This is almost never interesting, and you want to ignore
        it in your investigation.&nbsp; Zooming into the Main method will do this.&nbsp;
    </p>
    <h4>
        <a id="ResolvingUnmanagedSymbols">Resolving unmanaged symbols</a>
    </h4>
    <p>
        After zooming into the region of interest, if you are doing an unmanaged investigation,
        you may need to resolve symbols. Unlike managed code, unmanaged
        code stores its symbolic information in external PDB files which need to be downloaded
        and matched up. Because this can take a while it is not done by default.
        Instead you see question marks in the trace (like ntdll!?), indicating that PerfView
        knows the sample came from ntdll, but it can&#39;t resolve the name further.
        For many DLLs you will never need to resolve these symbols because you simply don&#39;t
        care (you don&#39;t own or call that code). However if you do care,
        you can quickly get the symbols. Simply select a cell with DLL!? in it,
        right click, and select &#39;Lookup Symbols&#39;. PerfView will then look
        up the symbols for that DLL and redraw the screen. Try looking up the
        symbols for ntdll by selecting the cell
    </p>
    <ul>
        <li>OTHER &lt;&lt;ntdll!?&gt;&gt;</li>
    </ul>
    <p>
        Then right click and select 'Lookup Symbols'. After the symbols are looked up, the cell
        becomes
    </p>
    <ul>
        <li>OTHER &lt;&lt;ntdll!_RtlUserThreadStart&gt;&gt;</li>
    </ul>
    <p>
        If you are doing an unmanaged investigation there are probably a handful of DLLs
        you will need symbols for.&nbsp; A common workflow is to look at the byname view
        and while holding down the CTRL key select all the cells that contain dlls with
        large CPU time but unresolved symbols.&nbsp;&nbsp; Then right click -&gt; Lookup
        Symbols, and PerfView will look them all up in bulk.&nbsp;&nbsp; See <a href="#SymbolResolution">
            symbol
            resolution
        </a> for more details or if lookup symbols fails.&nbsp;
    </p>
    <h4>
        <strong><a id="TutorialBottomUp">A Bottom Up Investigation</a></strong>
    </h4>
    <p>
        PerfView starts you with the &#39;<a href="#ByNameView">ByName view</a>&#39; for
        doing a bottom-up analysis (see also <a href="#StartingAnAnalysis">starting an analysis</a>).&nbsp;
        In this view you see every method that was involved in a sample (either a sample
        occurred in the method or the method called a routine that had a sample).&nbsp;&nbsp;
        Samples can either be exclusive (occurred within that method), or inclusive (occurred
        in that method or any method that method called).&nbsp;&nbsp;&nbsp; By default the
        by name view sorts methods based on their exclusive time (see also <a href="#ColumnSorting">Column Sorting</a>).&nbsp;&nbsp;
        This shows you the &#39;hottest&#39; methods
        in your program.&nbsp;
    </p>
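    <p>
        The difference between exclusive and inclusive samples can be illustrated with a short sketch.
        The data layout (each sample is a list of method names, outermost caller first) and the
        function name are assumptions made for this example and do not describe PerfView's internal
        representation.
    </p>
<pre>
```python
from collections import Counter


def inclusive_exclusive(stacks):
    """Count inclusive and exclusive samples per method.

    Each stack lists method names from the outermost caller to the method
    that was executing when the sample was taken.
    """
    inclusive = Counter()
    exclusive = Counter()
    for stack in stacks:
        if not stack:
            continue
        # The sample occurred in the last (innermost) method.
        exclusive[stack[-1]] += 1
        # Every distinct method on the stack gets one inclusive sample,
        # so recursive calls are not counted twice.
        for method in set(stack):
            inclusive[method] += 1
    return inclusive, exclusive
```
</pre>
    <p>
        Sorting the exclusive counter in descending order (for example with
        <code>exclusive.most_common()</code>) gives the same kind of &#39;hottest methods first&#39;
        ordering that the by name view uses by default.
    </p>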
    <p>
        Typically the problem with a &#39;bottom-up&#39; approach is that the &#39;hot&#39;
        methods in your program are
    </p>
    <ol>
        <li>Not very hot (use &lt; 5% of CPU)</li>
        <li>
            Tend to be &#39;helper&#39; routines (either in your program or in libraries or
            the runtime), that are used &#39;everywhere&#39; and are already well tuned.
        </li>
    </ol>
    <p>
        In both cases, you don&#39;t want to see these helper routines, but rather the lowest
        &#39;semantically interesting&#39; routine.&nbsp;&nbsp;&nbsp; This is where PerfView&#39;s
        powerful grouping features come into play. By default PerfView groups
        samples by
    </p>
    <ol>
        <li>
            Using the <a href="#GroupPatsTextBox">GroupPats</a> &#39;Just my code&#39;&nbsp;
            pattern to form two groups.&nbsp; The first group is any method in any module that
            is in the same directory (recursively) as the &#39;exe&#39; itself.&nbsp;&nbsp;
            This is the &#39;my code&#39; group and these samples are left alone.&nbsp;&nbsp;
            Any sample that is NOT in that first group is in the &#39;OTHER&#39; group.&nbsp;&nbsp;
            These samples are grouped according to the method that was called to enter the group.
        </li>
        <li>
            Using the <a href="#FoldPercentTextBox">Fold %</a> feature. This is
            set to 1, which means that any method that has fewer than 1% of the samples (inclusively)
            in the 'byname' view (that is, 1% of all the samples indicated in the summary at the top of the view)
            is not &#39;interesting&#39;
            and should not be shown. Instead its samples are folded (inlined) into its
            caller.
        </li>
    </ol>
    <p>
        For example, the top line in the&nbsp; ByName view is
    </p>
    <ul>
        <li>OTHER &lt;&lt;mscorlib!System.DateTime.get_Now()&gt;&gt;</li>
    </ul>
    <p>
        This is an example of an &#39;<a href="#EntryGroups">entry group</a>&#39;.&nbsp;&nbsp;
        &#39;OTHER&#39; is the group&#39;s name and mscorlib!System.DateTime.get_Now() is
        the method that was called that entered the group.&nbsp;&nbsp; From that point on
        any methods that get_Now() calls <strong>that are within that group</strong> are
        not shown, but rather their time is simply accumulated into this node.&nbsp;&nbsp;
        Effectively this grouping says &#39;I don&#39;t want to see the internal workings
        of functions that are not my code, but I do want to see the public methods I used to call
        that code&#39;. To give you an idea of how useful this feature is,
        simply turn it off (by clearing the value in the &#39;GroupPats&#39; box), and view
        the data.&nbsp; You will see many more methods with names of internal functions
        used by &#39;get_Now&#39; which just make your analysis more difficult.&nbsp; (You
        can use the &#39;back&#39; button to quickly restore the previous group pattern).&nbsp;
    </p>
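    <p>
        The effect of an entry group on a single call stack can be sketched like this. It is a
        simplified illustration, not PerfView's grouping engine: &#39;my code&#39; is approximated
        by a set of module names rather than by the directory of the exe, frames are assumed to be
        written as <code>module!method</code>, and the function name is hypothetical.
    </p>
<pre>
```python
def apply_entry_group(stack, my_modules):
    """Collapse runs of frames outside 'my code' into a single OTHER entry.

    Frames are 'module!method' strings, outermost caller first.  Frames in
    my_modules are kept as-is.  The first frame outside my code becomes an
    ("OTHER", frame) entry, and the frames it calls within the OTHER group
    are dropped, so their samples accumulate on that entry.
    """
    grouped = []
    in_other = False
    for frame in stack:
        module = frame.split("!", 1)[0]
        if module in my_modules:
            grouped.append(frame)
            in_other = False
        elif not in_other:
            grouped.append(("OTHER", frame))
            in_other = True
        # Otherwise: still inside the OTHER group, so the frame is hidden.
    return grouped
```
</pre>
    <p>
        Applied to a stack that runs from the tutorial&#39;s own methods into mscorlib, the result
        keeps the tutorial frames and ends with a single OTHER entry named after get_Now(), which is
        the shape of the top line shown in the ByName view above.
    </p>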
    <p>
        The other feature that helps &#39;clean up&#39; the bottom-up view is the&nbsp;
        <a href="#FoldPercentTextBox">Fold %</a> feature.&nbsp;&nbsp; This feature will
        cause all &#39;small&#39; call tree nodes (less than the given %) to be automatically
        folded into their parent.&nbsp; Again you can see how much this feature helps by
        clearing the textbox (which means no folding).&nbsp;&nbsp; With that feature off,&nbsp;
        you will see many more entries that have &#39;small&#39; amounts of time.&nbsp;&nbsp;
        These small entries again tend to just add &#39;clutter&#39; and make investigation
        harder.
    </p>
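    <p>
        Folding by percentage can be sketched on a small call tree. The dictionary layout and function
        names are assumptions for this example, and the real Fold % feature may differ in detail. The
        sketch only shows the idea that a node whose inclusive samples fall below the threshold
        disappears and its samples are charged to its parent.
    </p>
<pre>
```python
def inclusive(node):
    """Inclusive samples: the node's own samples plus those of all descendants."""
    return node["exclusive"] + sum(inclusive(child) for child in node["children"])


def fold_small(node, total_samples, fold_percent):
    """Return a copy of the tree with small children folded into their parent.

    A child whose inclusive samples are below fold_percent of total_samples
    is removed, and all of its samples are added to the parent's exclusive count.
    """
    exclusive = node["exclusive"]
    kept = []
    for child in node["children"]:
        child_inclusive = inclusive(child)
        if fold_percent > 100.0 * child_inclusive / total_samples:
            exclusive += child_inclusive
        else:
            kept.append(fold_small(child, total_samples, fold_percent))
    return {"name": node["name"], "exclusive": exclusive, "children": kept}
```
</pre>
    <p>
        Because folded samples are moved rather than discarded, the inclusive total of the root is the
        same before and after folding. Only the number of visible nodes changes.
    </p>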
    <h4>More Folding</h4>
    <p>
        Because of the grouping and folding that PerfView did for you, you can quickly see
        that &#39;DateTime.get_Now()&#39; is the &#39;hot&#39; method (74.6% of all samples).
        However also note that PerfView did not do a &#39;perfect&#39; job.
        We notice that the view has groups &lt;ntdll!?&gt; and &lt;ntoskrnl!?&gt;, which
        are two important operating system DLLs that take up 9.5% and 2% of the CPU, and knowing
        just that some function in the DLL was called is not terribly useful. We
        have two choices:
    </p>
    <ol>
        <li>
            Resolve the symbols for these DLLs so that we have meaningful names.&nbsp;&nbsp;
            See <a href="#SymbolResolution">symbol resolution</a> for more.
        </li>
        <li>Fold these entries away.&nbsp; </li>
    </ol>
    <p>
        A quick way of accomplishing (2) is to add the pattern &#39;!?&#39; to the fold patterns. This
        pattern says to fold away any nodes that don&#39;t have a method name. See
        <a href="#FoldPatsTextBox">foldPats textbox</a> for more. This leaves us with a very
        &#39;clean&#39; function view that has only semantically relevant nodes in it.
    </p>
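    <p>
        What folding away &#39;!?&#39; nodes does to the exclusive counts can be shown with a small
        sketch. It uses a plain substring test instead of PerfView&#39;s pattern syntax, and the
        function names and sample stacks are made up for illustration.
    </p>
<pre>
```python
from collections import Counter


def fold_unresolved(stack, marker="!?"):
    """Remove frames whose name contains the marker, so their samples
    are attributed to the nearest remaining caller.  The root frame is
    always kept so that a stack never becomes empty."""
    return [frame for i, frame in enumerate(stack) if i == 0 or marker not in frame]


def exclusive_counts(stacks):
    """Count samples by the innermost frame of each stack."""
    return Counter(stack[-1] for stack in stacks if stack)
```
</pre>
    <p>
        After folding, samples that previously landed on entries such as ntdll!? are counted against
        the method that called into the DLL, which is usually a name the programmer recognizes.
    </p>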
    <p>
        <strong>
            Review: what is all this time selection, grouping and folding for?
        </strong>
    </p>
    <p>
        The first phase of a perf investigation is forming a &#39;perf model&#39;.
        The goal is to assign times to SEMANTICALLY RELEVANT nodes (things the programmer
        understands and can do something about).&nbsp;&nbsp; We do that by either forming
        a semantically interesting group and assigning nodes to it, or by folding the node
        into an existing semantically relevant group or (most commonly) leveraging entry
        points into large groups (modules and classes), as handy &#39;pre made&#39; semantically
        relevant nodes.&nbsp; The goal is to group costs into a relatively small number
        (&lt; 10) of SEMANTICALLY RELEVANT entries. This allows you to reason about whether
        that cost is appropriate or not, (which is the second phase of the investigation).
    </p>
    <h4>Broken Stacks</h4>
    <p>
        One of the nodes that is left is a node called &#39;BROKEN&#39;.&nbsp; This is a
        special node that represents samples whose stack traces were determined to be incomplete
        and therefore cannot be attributed properly.&nbsp;&nbsp; As long as this number
        is small (&lt; a few %) then it can simply be ignored.&nbsp; See <a href="#BrokenStacks">broken stacks</a> for more.
    </p>
    <h4>Time and Percentage.</h4>
    <p>
        PerfView displays both the inclusive and exclusive time as both a metric (msec)
        as well as a % because both are useful.&nbsp;&nbsp; The percentage gives you a good
        idea of the relative cost of the node, however the absolute value is useful because
        it very clearly represents &#39;clock time&#39; (e.g. 300 samples represent 300
        msec of CPU time).&nbsp;&nbsp;&nbsp; The absolute value is also useful because when
        the value gets significantly less than 10 it becomes unreliable (when you
        have only a handful of samples they might have happened &#39;by pure chance&#39;
        and thus should not be relied upon).
    </p>
    <h4>
        <a id="TutorialTopDown">CallTree View (top-down investigations)</a>
    </h4>
    <p>
        The bottom-up view did an excellent job of determining that the get_Now() method
        as well as &#39;SpinForASecond&#39; consume the largest amount of time and thus
        are worth looking at closely. This corresponds beautifully
        to our expectations given the source code in <a href="Tutorial.cs.txt">Tutorial.cs</a>.
        However it can also be useful to understand where CPU time was consumed from the
        top down. This is what the <a href="#CallTreeView">CallTree view</a> is for.
        Simply clicking the &#39;CallTree&#39; tab of the stack viewer will bring
        you to that view.&nbsp;&nbsp; Initially the display only shows the root node, but
        you can open the node by clicking on the check box (or hitting the space bar). This
        will expand the node.&nbsp;&nbsp; As long as a node only has one child, the child
        node is also auto-expanded, to save some clicking.&nbsp;&nbsp;&nbsp; You can also
        right click and select &#39;expand-all&#39; to expand all nodes under the selected
        node.&nbsp;&nbsp; Doing this on the root node yields the following display
    </p>
    <center>
        <img src="images/CallTreeView.png" alt="CallTreeView" />
    </center>
    <p>
        Notice how clean the call tree view is, without a lot of &#39;noise&#39; entries.&nbsp;
        In fact this view does a really good job of describing what is going on.&nbsp;&nbsp;
        Notice it clearly shows the fact that Main calls &#39;RecSpin&#39;, which runs for 5
        seconds (from 894 msec to 5899 msec), consuming 4698 msec of CPU while doing so. (The
        CPU is not 5000 msec because of the overhead of actually collecting the profile,
        other OS overhead that is not attributed to this process, and broken
        stacks, which together typically run in the 5-10% range. In this case it seems
        to be about 6%.) The &#39;When&#39; column also clearly shows how one
        instance of RecSpin runs SpinForASecond (for exactly a second) and then calls
        RecSpinHelper, which consumes close to 100% of the CPU for the rest of the time.
        The call tree is a wonderful top-down synopsis.
    </p>
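    <p>
        The overhead figure quoted above can be checked with a little arithmetic. The
        sketch below uses only the three numbers from the example (894, 5899 and 4698 msec);
        the variable names are illustrative and are not PerfView terms.
    </p>
<pre>
```python
# Values read from the CallTree example above.
start_msec = 894    # time of the first sample attributed to RecSpin
end_msec = 5899     # time of the last sample attributed to RecSpin
cpu_msec = 4698     # inclusive CPU time reported for RecSpin

wall_msec = end_msec - start_msec            # elapsed time covered by RecSpin
missing_fraction = 1 - cpu_msec / wall_msec  # share of that time with no sample

print(f"elapsed: {wall_msec} msec, unaccounted: {missing_fraction:.1%}")
```
</pre>
    <p>
        The elapsed time comes to 5005 msec, so roughly 6% of the interval is not
        attributed to RecSpin, which matches the estimate in the text.
    </p>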
    <h4>Getting a &#39;coarser&#39; view</h4>
    <p>
        All of the filtering and grouping parameters at the top of the view affect all of
        the views (ByName, Caller-Callee and CallTree) equally. We can
        use this fact and the &#39;Fold %&#39; functionality to get an even coarser view
        of the &#39;top&#39; of the call tree. With all nodes expanded, simply
        right click on the window and select &#39;Increase Fold %&#39; (or, more easily, hit the
        F7 key). This increases the number in the Fold % textbox by 1.6X.
        By hitting the F7 key repeatedly you keep trimming down the &#39;bottoms&#39; of
        the stacks until you see only the methods that use a large amount of CPU time.
        The following image shows the CallTree view after hitting F7 seven times.
    </p>
    <center>
        <img src="images/PrunedCallTreeView.png" alt="PrunedCallTreeView" />
    </center>
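    <p>
        To get a feel for how quickly repeated F7 presses raise the threshold, the growth
        can be sketched as simple compounding. This assumes the Fold % starts at 1 and is
        multiplied by exactly 1.6 on each press; PerfView may round the value it displays,
        which this sketch ignores. The function name is hypothetical.
    </p>
<pre>
```python
def fold_percent_after(presses: int, start: float = 1.0, factor: float = 1.6) -> float:
    """Fold % after pressing F7 `presses` times, assuming pure 1.6X compounding."""
    return start * factor ** presses

# Show the assumed Fold % after 0 to 7 presses of F7.
for presses in range(8):
    print(presses, round(fold_percent_after(presses), 2))
```
</pre>
    <p>
        Under these assumptions seven presses take the Fold % from 1 to roughly 27, which
        is why only the largest nodes survive.
    </p>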
    <p>
        You can restore the previous view by either using the &#39;Back&#39; button, the
        Shift-F7 key (which decreases the Fold%) or by simply selecting 1 in the Fold% box
        (e.g. from the drop down menu).&nbsp;
    </p>
    <h4>The Caller-Callee view</h4>
    <p>
        Getting a coarse view of the tree is useful, but sometimes you just want to restrict
        your attention to what is happening at a single node. For example, if
        the inclusive time for BROKEN stacks is large, you might want to view the nodes
        under &#39;BROKEN&#39; stacks to get an idea of what samples are &#39;missing&#39;
        from their proper position in the call tree. You can do this easily
        by viewing the BROKEN node in the Caller-callee view. To do this, right
        click on the BROKEN node and select Goto -&gt; Caller-callee (or type Alt-C). Because
        so few samples in our trace are BROKEN, this node is not very interesting. By
        setting Fold % to 0 (blank) you get the following view.
    </p>
    <center>
        <img src="images/CallerCalleeView.png" alt="CallerCalleeView" />
    </center>
    <p>
        The view is split into three grids. The middle grid shows the &#39;current
        node&#39;, in this case &#39;BROKEN&#39;. The top grid shows all nodes
        that call into this focus node. In this case the BROKEN stacks occur
        on only one thread. The bottom grid shows all nodes that are
        called by &#39;BROKEN&#39;, sorted by inclusive time. We can see that
        most of the broken nodes came from stacks that originated in the &#39;ntoskrnl&#39;
        DLL (this is the Windows OS kernel). To dig in more we would first
        need to resolve symbols for this DLL. See <a href="#SymbolResolution">symbol resolution</a>
        for more.
    </p>
    <h4>
        <a id="TutorialDrillingIntoGroups">Drilling into Groups (Ungrouping)</a>
    </h4>
    <p>
        While groups are a very powerful feature for understanding the performance of your
        program at a &#39;coarse&#39; level, inevitably you will wish to &#39;Drill into&#39;
        those groups and understand PARTICULAR nodes in detail.
        For example, if we were a developer responsible for DateTime.get_Now(),
        we would not be interested in the fact that it was called from the &#39;SpinForASecond&#39;
        routine but in what was going on inside it. Moreover we DON&#39;T want to
        see samples from other parts of the program &#39;cluttering&#39; the analysis of
        get_Now(). This is what the &#39;Drill Into&#39; command is for.
        If we go back to the &#39;ByName&#39; view, select the 3792 samples in the &#39;Inc&#39;
        column of the &#39;get_Now&#39; row, right click, and select &#39;Drill Into&#39;, PerfView
        brings up a new window where ONLY THOSE 3792 samples have been extracted.
    </p>
    <p>
        Initially, drilling in does not change any filter/grouping parameters.
        However, now that we have isolated the samples of interest, we are free to change
        the grouping and folding to understand the data at a new level of abstraction. Typically
        this means ungrouping something. In this case we would like to see the detail of
        how mscorlib!get_Now() works, so we want to see details inside mscorlib. To do this
        we select the &#39;mscorlib!DateTime.get_Now()&#39; node, right click, and select &#39;Ungroup
        Module&#39;. This indicates that we wish to ungroup any methods that
        were in the &#39;mscorlib&#39; module. This allows you to see the &#39;inner
        structure&#39; of that routine (without ungrouping completely). The result is the
        following display.
    </p>
    <center>
        <img src="images/Ungrouped.png" alt="Ungrouped" />
    </center>
    <p>
        At this point we can see that most of the &#39;get_Now&#39; time is spent in functions
        called &#39;GetUtcOffsetFromUniversalTime&#39; and &#39;GetDatePart&#39;.
        We have the full power of the stack viewer at our disposal (folding, grouping, using
        CallTree or caller-callee views) to further refine our analysis. Because
        the &#39;Drill Into&#39; window is separate from its parent, you can treat it as
        &#39;disposable&#39; and simply discard it when you are finished looking at this
        aspect of your program&#39;s performance.
    </p>
    <p>
        In the example above we drilled into the inclusive samples of a method. However
        you can do the same thing to drill into exclusive samples.
        This is useful when user callbacks or virtual functions are involved.
        Take for example a &#39;sort&#39; routine that has internal helper functions.
        In that case it can be useful to segregate those samples that were part of the node&#39;s
        &#39;internal helpers&#39; (which would be folded up as exclusive samples of &#39;sort&#39;)
        from those that were caused by the user &#39;compare&#39; function (which would
        typically not be grouped as exclusive samples because it crossed a module boundary).
        By drilling into the exclusive samples of &#39;sort&#39; and then ungrouping, you
        get to see just those samples in &#39;sort&#39; that were NOT part of the user callback.
        Typically this is EXACTLY what the programmer responsible for the &#39;sort&#39;
        routine would want to see.
    </p>
    <h4>
        <a id="GotoSource">Viewing Source (Line level analysis)</a>
    </h4>
    <p>
        Once the analysis has determined methods are potentially inefficient, the next step
        is to understand the code enough to make an improvement. PerfView helps with this
        by implementing the 'Goto Source' functionality. Simply select a cell with a method
        name in it, right click and choose <a href="#SourceCodeLookup">Goto Source</a> (or
        use Alt-D (D for definition)). PerfView will then attempt to look up the source code
        and if successful will launch a text editor window. For example, if you select the
        'SpinForASecond' cell in the ByName view and select Goto Source the following window
        is displayed.
    </p>
    <center>
        <img src="images/SourceCode.png" alt="SourceCode" />
    </center>
    <p>
        As you can see, the particular method is displayed and each line has been prefixed
        with the cost (in this case CPU MSec) spent on that line. In this view it shows
        that 4.9 seconds of CPU time were spent on the first line of the method.
    </p>
    <h5>
        Caveats with Source code
    </h5>
    <p>
        Unfortunately, prior to V4.5 of the .NET Runtime, the runtime did not emit enough
        information into the ETL file to resolve a sample down to a line number (only to
        a method). As a result while PerfView can bring up the source code, it can't accurately
        place samples on particular lines unless the code was running on V4.5 or later.
        When PerfView does not have the information it needs it simply attributes all the
        cost to the first line of the method. This is in fact what you see in the example
        above. If you run your example on a V4.5 runtime, you would get a more interesting
        distribution of cost. This problem does not exist for native code (you will get
        line level resolution). Even on old runtime versions, however, you at least have
        an easy way to navigate to the relevant source.
    </p>
    <p>
        PerfView finds the source code by looking up information in the PDB file associated
        with the code. Thus the first step is that PerfView must be able to find the PDB
        file. By default most tools will place the complete path of the PDB file inside
        the EXE or DLL it builds, which means that if you have not moved the PDB file (and
        are on the machine you built on), then PerfView will find the PDB. It then looks
        in the PDB file, which contains the full path name of each of the source files, and
        again, if you are on the machine that built the binary then PerfView will find the
        source. So if you run on the same machine you build on, it 'just works'.
    </p>
    <p>
        However it is common to not run on the machine you built on, in which case PerfView
        needs help. PerfView follows the standard conventions for other tools for locating
        source code. In particular if the _NT_SYMBOL_PATH variable is set to a semicolon
        separated list of paths, it will look in those places for the PDB file. In addition
        if _NT_SOURCE_PATH is set to a semicolon separated list of paths, it will search
        for the source file in subdirectories of each of the paths. Thus setting these environment
        variables will allow PerfView's source code feature to work on 'foreign' machines.
        You can also set the _NT_SYMBOL_PATH and _NT_SOURCE_PATH inside the GUI by using
        the menu items on the File menu on the stack viewer menu bar.
    </p>
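    <p>
        The search described above (a semicolon separated list of roots, each searched
        including its subdirectories) can be sketched as follows. This is only an
        illustration of the idea, not PerfView's actual lookup code; the function names
        are hypothetical, and the name comparison here is case sensitive, which a real
        tool on Windows would likely not be.
    </p>
<pre>
```python
import os
from typing import List, Optional

def split_search_path(value: str) -> List[str]:
    """Split a semicolon separated path list, dropping empty entries."""
    return [part.strip() for part in value.split(";") if part.strip()]

def find_source_file(file_name: str, source_path: str) -> Optional[str]:
    """Return the first file named `file_name` under any root in `source_path`."""
    for root in split_search_path(source_path):
        # os.walk visits the root and every subdirectory below it;
        # a root that does not exist simply yields nothing.
        for directory, _subdirs, files in os.walk(root):
            if file_name in files:
                return os.path.join(directory, file_name)
    return None

# Example: look for Tutorial.cs using the variable named in the text.
source_path = os.environ.get("_NT_SOURCE_PATH", "")
print(find_source_file("Tutorial.cs", source_path))
```
</pre>
    <p>
        If the variable is not set the list of roots is empty and nothing is found, which
        mirrors the situation where PerfView needs help on a 'foreign' machine.
    </p>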
    <hr />
    <!--  ********************************** -->
    <h2>
        <a id="TutorialGCHeap">Tutorial for GC Heap Memory Analysis</a>
    </h2>
    <p>
        See also <a href="#Tutorial">Tutorial of a Time-Based Investigation</a>. While there
        currently is no tutorial on doing a GC heap analysis, if you have not walked through the
        <a href="#Tutorial">time based investigation tutorial</a> you should do so. Many
        of the same concepts are used in a memory investigation. You should also take a
        look at:
    </p>
    <ul>
        <li><a href="#CollectingDataGCHeap">Collecting GC Heap Data</a> </li>
        <li><a href="#UnderstandingPerfDataGCHeap">Understanding GC heap data</a> </li>
        <li><a href="#StartingAnAnalysisGCHeap">Starting a GC heap analysis</a> </li>
        <li><a href="#GCHeapNetMemStacks">Collecting Stacks at GC allocations</a> </li>
    </ul>
    <p>
        <!-- TODO -->
        TUTORIAL NOT COMPLETE
    </p>
    <hr />
    <hr />
    <!--  *************************************************************************************** -->
    <h1>
        <a id="BestPracices">Performance Investigation Best Practices</a>
    </h1>
    <!--  ********************************** -->
    <h2>
        <a id="InvestigatingTime">Investigating Time</a>
    </h2>
    <!--  ****************** -->
    <h3>
        <a id="CollectingData">Collecting Event (Time Based) Profile Data</a>
    </h3>
    <p>
        As mentioned in the <a href="#UsersGuide">introduction</a>, ETW is a lightweight
        logging mechanism built into the Windows operating system that can collect a broad
        variety of information about what is going on in the machine. PerfView supports two
        ways of collecting ETW profile data.
    </p>
    <ol>
        <li>
            The <strong>Collect-&gt;Run</strong> (Alt-R) menu item, which prompts for a data file
            name to create and a command to run. PerfView turns on profiling,
            runs the command, and then turns profiling off. The resulting file is then
            displayed in the stack viewer. This is the preferred mechanism when it
            is easy to launch the application of interest. If the command produces
            output, it will be captured in the log (click the &#39;Log&#39; button in the lower
            right corner of the main view).
        </li>
        <li>
            The <strong>Collect-&gt;Collect</strong> (Alt-C) menu item, which only prompts for a
            data file name to create. After clicking the 'Start Collection' button you are
            free to interact with the machine in any way necessary to reproduce the activity of interest.
            Since profiling is machine wide you are guaranteed to capture it. Once
            you have reproduced the problem, you can dismiss the dialog box to stop profiling
            and proceed to analyze the data.
        </li>
    </ol>
    <p>
        You can also automate the collection of profile data by using <a href="#CommandLineReference">command line options</a>.
        See <a href="#CollectingFromCommandLine">collecting data from the command line</a>
        for more.
    </p>
    <p>
        <strong>If you intend to do a <a href="#BlockedTimeInvestigation">wall clock time investigation</a></strong>
    </p>
    <p>
        By default PerfView chooses a set of events that does not generate too much data
        but is useful for a variety of investigations.   However
        <a href="#BlockedTimeInvestigation">wall clock investigations</a>
        require events that are too voluminous to collect by default.  Thus if you wish to
        do a wall clock investigation, you need to set the 'Thread Time' checkbox in the
        collection dialog.
    </p>
    <p>
        <strong>If you intend to copy the ETL file to another machine for analysis</strong>
    </p>
    <p>
        By default, to save time, PerfView does NOT prepare the ETL file so that it can be
        analyzed on a different machine (see <a href="#merging">merging</a>). Moreover,
        there is symbolic information (PDBs for NGEN images) that also needs to be included
        if the data is to work well on any machine. If you intend to do this you
        need to merge and include the NGEN PDBs by using the 'ZIP' command. You can do this
        by
    </p>
    <ul>
        <li>
            Checking the 'Zip' checkbox on the data collection dialog box when the data is being
            created.
        </li>
        <li>
            Specifying the /Zip qualifier on the command line of PerfView when the data is
            being created.
        </li>
        <li>Right clicking on an existing ETL file in the main viewer and selecting the ZIP option.</li>
    </ul>
    <p>
        Once the data has been zipped not only does the file contain all the information
        needed to resolve symbolic information, but it also has been compressed for faster
        file copies. If you intend to use the data on another machine, please specify the
        ZIP option.
    </p>
    <hr />
    <!--  ********************************** -->
    <h2>
        <a id="ViewingStackData">Viewing Stack Data</a>
    </h2>
    <!--  ****************** -->
    <h3>
        <a id="SelectProcessDialog">Selecting a Process of Interest</a>
    </h3>
    <p>
        The result of collecting data is an ETL file (and possibly a .kernel.ETL file as
        discussed in <a href="#merging">merging</a>). When you double
        click on the file in the main viewer it opens up &#39;child views&#39;
        of the data that was collected. One of these items will be the &#39;CPU
        Stacks&#39; view. Double clicking on that will bring up a stack
        viewer to view the samples collected. The data in the ETL file
        contains CPU information for ALL processes in the system, however most analyses
        concentrate on a single process. Because of this, a dialog box for selecting
        the process of interest is displayed before the stack viewer opens.
    </p>
    <p>
        By default, this dialog box contains a list of all processes that were active at
        the time the trace was collected sorted by the amount of CPU time each process consumed.&nbsp;&nbsp;&nbsp;&nbsp;
        If you are doing a CPU investigation, there is a good chance the process of interest
        is near the top of this list.&nbsp; Simply double clicking on the desired process
        will bring up the stack viewer filtered to the process you chose.
    </p>
    <p>
        The process view can be sorted by any of the columns by clicking on the column header.
        Thus if you wish to find the process that was started most recently you can sort
        by start time to find it quickly. If the view is sorted by name,
        typing the first character of a process name navigates to the first process
        whose name starts with that character.
    </p>
    <p>
        <a id="ProcessFilterTextBox"><strong>Process Filter Textbox</strong></a> The box just
        above the list of processes.&nbsp;&nbsp; If you type text in this box, then only
        processes that match this string (PID, process name or command line, case insensitive) will
        be displayed. &nbsp;&nbsp; The * character is a wild card.&nbsp;&nbsp; This is a quick
        way of finding a particular process.
    </p>
    <p>
        If you wish to see samples for more than one process for your analysis click the
        &#39;All Procs&#39; button.&nbsp;&nbsp;&nbsp;&nbsp;
    </p>
    <p>
        Note that the ONLY effect of the process selection dialog box is to add an &#39;<a href="#IncPatsTextBox">Inc Pats</a>&#39; filter that matches the process you
        chose.&nbsp;&nbsp; Thus the dialog box is really just a &#39;friendly interface&#39;
        to the more powerful <a href="#FilteringGroupingStackData">filtering options</a>
        of the stack viewer. In particular, the stack viewer still has access
        to all the samples (even those outside the process you selected); it simply
        filters them out because of the include pattern that was set by the dialog box.
        This means that you can remove or modify this filter at a later point in the analysis.
    </p>
    <hr />
    <!--  ****************** -->
    <h3>
        <a id="UnderstandingPerfData">Understanding Perf Data</a>
    </h3>
    <p>
        The data shown by default in the PerfView stack viewer are stack traces taken every
        millisecond on each processor on the system.&nbsp;&nbsp; Every millisecond, whatever
        process is running is stopped and the operating system &#39;walks the stack&#39;
        associated with the running code.&nbsp;&nbsp;&nbsp; What is preserved when taking
        a stack trace is the return address of every method on the stack.&nbsp;&nbsp; Stackwalking
        may not be perfect.&nbsp;&nbsp; It is possible that the OS can&#39;t find the next
        frame (leading to <a href="#BrokenStacks">broken stacks</a>) or that an optimizing
        compiler has removed a method call (see <a href="#MissingFrames">missing frames</a>),
        which can make analysis more difficult.&nbsp;&nbsp; However for the most part the
        scheme works well, and has low overhead (typically 10% slowdown), so monitoring
        can be done on &#39;production&#39; systems.&nbsp;
    </p>
    <p>
        On a lightly loaded system, many CPUs are typically in the &#39;Idle&#39; process
        that the OS runs when there is nothing else to do. These samples
        are discarded by PerfView because they are almost never interesting.
        All other samples are kept, however, regardless of what process they were taken from.
        Most analyses focus on a single process, and filter out all samples that did
        not occur in the process of interest, however PerfView also allows you to look
        at samples from all processes as one large tree. This is useful in scenarios
        where more than one process is involved end-to-end, or when you need to run an application
        several times to collect enough samples.
    </p>
    <h4>
        <a id="HowManySamples">How many samples do you need?</a>
    </h4>
    <p>
        Because the samples are taken every millisecond per processor, each sample represents
        1 millisecond of CPU time.&nbsp;&nbsp; However exactly where the sample is taken
        is effectively &#39;random&#39;, and so it is really &#39;unfair&#39; to &#39;charge&#39;
        the full millisecond to the routine that happened to be running at the time the
        sample was taken.&nbsp;&nbsp; While this is true, it is also true that as more samples
        are taken this &#39;unfairness&#39; decreases as the square root of the number of
        samples.&nbsp;&nbsp; If a method has just 1 or 2 samples it could be just random
        chance that it happened in that particular method, but methods with 10 samples are
        likely to have truly used between 7 and 13 samples (30% error).&nbsp; Routines with
        100 samples are likely to be within 90 and 110 (10% error).&nbsp;&nbsp;&nbsp; For
        &#39;typical&#39; analysis this means you want at least 1000 and preferably more
        like 5000 samples (there are diminishing returns after 10K). By collecting
        a few thousand samples you ensure that even moderately &#39;warm&#39; methods will
        have at least 10 samples, and &#39;hot&#39; methods will have 100 or more, which
        keeps the error acceptably small. Because PerfView does not allow you
        to vary the sampling frequency, this means that you need to run the scenario for
        at least several seconds (for CPU bound tasks), and 10-20 seconds for less CPU bound
        activities.
    </p>
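    <p>
        The square-root rule of thumb above is easy to tabulate. The following sketch
        assumes the uncertainty in a count of N samples is about sqrt(N), so the relative
        error is 1/sqrt(N); the function name is hypothetical.
    </p>
<pre>
```python
import math

def sampling_error(samples: int) -> tuple:
    """Return (absolute, relative) error for a method with `samples` samples.

    Rule of thumb from the text: a count of N samples is uncertain by
    about sqrt(N) samples, i.e. a relative error of 1/sqrt(N).
    """
    absolute = math.sqrt(samples)
    return absolute, absolute / samples

# Print the estimated error for a range of sample counts.
for n in (2, 10, 100, 1000, 5000):
    absolute, relative = sampling_error(n)
    print(f"{n:5d} samples: +/- {absolute:5.1f} samples ({relative:.0%})")
```
</pre>
    <p>
        For 10 samples this gives an error of about 3 samples (roughly 30%), and for 100
        samples about 10 samples (10%), in line with the figures quoted above.
    </p>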
    <p>
        If the program you wish to measure cannot easily be changed to loop for the&nbsp;
        required amount of time, you can create a batch file that repeatedly launches the
        program and use that to collect data.&nbsp; In this case you will want to view the
        CPU samples for all processes, and then use a GroupPat that&nbsp; erases the process
        ID&nbsp; (e.g. process {%}=&gt;$1) and thus groups all processes of the same name
        together.
    </p>
    <p>
        Even with 1000s of samples, there is still &#39;noise&#39; that is at least in the 3% range (sqrt(1000) is about 32, and 32 / 1000 is about 3%). This error gets larger as the methods / groups being investigated
        have fewer samples.&nbsp;&nbsp; This makes it problematic to use sample based profiling
        to compare two traces to track down small regressions (say 3%).&nbsp;&nbsp; Noise
        is likely to be at least as large as the &#39;signal&#39; (diff) you are trying
        to track down.&nbsp;&nbsp; Increasing the number of samples will help, however you
        should always keep in mind the sampling error when comparing small differences between
        two traces.&nbsp;
    </p>
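    <p>
        One rough way to judge whether a difference between two traces is more than noise
        is sketched below. It assumes the sqrt(N) rule above applies to each trace and that
        the two errors are independent, so they combine as sqrt(before + after). That
        combination, the two-sigma cutoff, and the function name are assumptions made for
        illustration; they are not something PerfView computes for you.
    </p>
<pre>
```python
import math

def difference_is_significant(before: int, after: int, sigmas: float = 2.0) -> bool:
    """True when the change between two sample counts stands out from noise."""
    # sqrt(before) and sqrt(after) combined in quadrature (assumed independent).
    noise = math.sqrt(before + after)
    return abs(after - before) > sigmas * noise

print(difference_is_significant(1000, 1030))      # 3% change, about 1000 samples
print(difference_is_significant(100000, 103000))  # same 3%, 100 times the samples
```
</pre>
    <p>
        With about 1000 samples a 3% change is lost in the noise; with 100 times as many
        samples the same 3% change stands out clearly.
    </p>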
    <h4>Exclusive and Inclusive Metrics</h4>
    <p>
        Because a stack trace is collected for each sample, every node has both an exclusive
        metric (the number of samples that were collected in that particular method) and
        an inclusive metric (the number of samples that were collected in that method or any
        method that method called). Typically you are interested
        in inclusive time, however it is important to realize that folding (see <a href="#FoldPatsTextBox">FoldPats</a>
        and <a href="#FoldPercentTextBox">Fold %</a>) and grouping artificially
        increase exclusive time (it is the time in that method (group) and anything folded
        into that group).&nbsp;&nbsp; When you wish to see the internals of what was folded
        into a node, you <a href="#DrillingIntoGroups">Drill Into</a> the groups to open
        a view where the grouping or folding can be undone.&nbsp;
    </p>
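    <p>
        The two metrics can be illustrated with a small sketch that counts them from a list
        of sampled stacks. The stacks below are made up (loosely modeled on the tutorial
        program) and the function name is hypothetical; this shows the definitions only,
        not how PerfView stores or aggregates its data.
    </p>
<pre>
```python
from collections import Counter

def exclusive_and_inclusive(stacks):
    """Count samples per method from stacks listed root first, leaf last."""
    exclusive = Counter()
    inclusive = Counter()
    for stack in stacks:
        exclusive[stack[-1]] += 1      # only the leaf method was executing
        for method in set(stack):      # set(): a recursive method counts once per sample
            inclusive[method] += 1
    return exclusive, inclusive

# Hypothetical samples, one stack per sample.
samples = [
    ["Main", "RecSpin", "SpinForASecond"],
    ["Main", "RecSpin", "SpinForASecond", "get_Now"],
    ["Main", "RecSpin", "RecSpin", "SpinForASecond", "get_Now"],
]
exclusive, inclusive = exclusive_and_inclusive(samples)
print(exclusive["SpinForASecond"], inclusive["SpinForASecond"])
print(exclusive["RecSpin"], inclusive["RecSpin"])
```
</pre>
    <p>
        Here SpinForASecond has one exclusive sample but three inclusive samples, while
        RecSpin has no exclusive samples at all even though every sample passed through it.
    </p>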
    <hr />
    <!--  ****************** -->
    <h3>
        <a id="StartingAnAnalysis">Starting a CPU Analysis</a>
    </h3>
    <p>
        If you have not done so, consider walking through the <a href="#Tutorial">tutorial</a>
        and best practices from <a href="http://msdn.microsoft.com/en-us/magazine/cc500596.aspx">
            Measure
            Early and Often for Performance
        </a>.
    </p>
    <p>
        The default stack viewer in PerfView analyzes CPU usage of your process.&nbsp;&nbsp;
        There are three things that you should always do immediately when starting a CPU
        analysis of a particular process.
    </p>
    <ol>
        <li>
            <strong>Determine that you have at least a few 1000 samples</strong> (preferably
            over 5000).&nbsp;&nbsp;&nbsp; See <a href="#HowManySamples">how many samples do I need</a>
            for more.
        </li>
        <li>
            <strong>Determine that the process is actually CPU bound over the time of interest</strong>.
        </li>
        <li>
            <strong>Ensure that you have the symbolic information you need. </strong>&nbsp;See
            <a href="#SymbolResolution">symbol resolution</a> for more.&nbsp;
        </li>
    </ol>
    <p>
        If any of the above conditions fails, the rest of your analysis will very likely
        be inaccurate. If you don&#39;t have enough samples you need to go back
        and recollect so that you get more, modifying the program to run longer, or running
        the program many times to accumulate more samples. If your program
        is running for long enough (typically 5-20 seconds), and you still don&#39;t have
        at least 1000 samples, it is likely because CPU is NOT the bottleneck.
        It is very common in STARTUP scenarios that CPU is NOT the problem but that the
        time is being spent fetching data from the disk. It is also possible that
        the program is waiting on network I/O (server responses) or responses from other
        processes on the local system. In all of these cases the time being
        wasted is NOT governed by how much CPU time is used, and thus a CPU analysis is
        inappropriate.
    </p>
    <p>
        You can quickly determine if your process is CPU bound by looking at the <a href="#WhenColumn">
            &#39;When&#39;
            column
        </a> for your &#39;top most&#39; method.&nbsp;&nbsp; If
        the When column has lots of 9s or As in it over the time it is active then it is
        likely the process was CPU bound during that time.&nbsp; This is the time you can
        hope to optimize and if it is not a large fraction of the total time of your app,
        then optimizing it will have little overall effect (see <a href="http://en.wikipedia.org/wiki/Amdahl's_law">Amdahl&#39;s Law</a>).
        &nbsp;&nbsp; Switching to the <a href="#CallTreeView">
            CallTree
            view
        </a> and looking at the &#39;When&#39; column of some of the top-most
        methods in the program is a good way of confirming that your application is actually
        CPU bound.
    </p>
    <p>
        Finally you may have enough samples, but lack the symbolic information to make
        sense of them. This manifests as names with ? in them. By default
        .NET code should &#39;just work&#39;. For unmanaged code you need to tell
        PerfView which DLLs you are interested in getting symbols for. See <a href="#SymbolResolution">symbol resolution</a> for more. You should
        also quickly check that you don&#39;t have many <a href="#BrokenStacks">broken stacks</a>
        as this too will interfere with analysis.
    </p>
    <h4>
        <a id="TopDownBottomUpAnalysis">Top-down and Bottom-up Analysis</a>
    </h4>
    <p>
        Once you have determined that CPU is actually important to optimize you have a choice
        of how to do your analysis. Performance investigations can either be &#39;top-down&#39;
        (starting with the Main program and how the time spent there is divided into methods
        it calls), or &#39;bottom-up&#39; (starting with the &#39;leaf&#39; methods
        where samples were actually taken, and looking for methods that used a lot of time).
        Both techniques are useful, however &#39;bottom-up&#39; is usually a better way
        to start because methods at the bottom tend to be simpler, and thus it is easier to understand
        them and have intuition about how much CPU they should be using.
    </p>
    <h4>Phase 1: Choosing How to Group Methods</h4>
    <p>
        PerfView starts you out in the &#39;<a href="#ByNameView">ByName</a>&#39; view, which
        is an appropriate starting point for a bottom-up analysis. It is
        particularly important in a bottom-up analysis to group methods into semantically
        relevant groupings. By default PerfView picks a good starting set of groups
        (called &#39;just my code&#39;). In this grouping any method in any module
        that lives in a directory OTHER than the directory where the EXE lives is considered
        &#39;OTHER&#39;, and the <a href="#EntryGroups">entry group</a> feature is used to group
        them by the method used to call out to this external code. See the <a href="#TutorialBottomUp">tutorial</a> for more on the meaning of the &#39;Just My Code&#39;
        grouping, and the <a href="#GroupPatsTextBox">GroupPats reference</a> for more on
        grouping.
    </p>
    <p>
        For simple applications the default grouping works well. There are other
        predefined groupings in the dropdown of the GroupPats box, and you are free to create
        or extend these as you need. You know that you have a &#39;good&#39;
        set of groupings when what you see in the &#39;ByName&#39; view are method names
        that are semantically relevant (you recognize the names, and know what their semantic
        purpose is), there are not too many of them (fewer than 20 or so that have an interesting
        amount of exclusive time), but there are enough to break the program into &#39;interesting&#39;
        pieces that you can focus on in turn (by <a href="#DrillingIntoGroups">Drilling Into</a>).
    </p>
    <p>
        One very simple way of doing this is to increase the <a href="#FoldPercentTextBox">Fold %</a>,
        which folds away small nodes. There are shortcuts that increase
        (F7 key) or decrease (Shift-F7) this by 1.6X. Thus by repeatedly
        hitting F7, you can &#39;clump&#39; small nodes into large nodes until only a few
        survive and are displayed.&nbsp;&nbsp; While this is fast and easy, it does not
        pay attention to how semantically relevant the resulting groups are.&nbsp;&nbsp;
        As a result it may group things in poor ways (folding away small nodes that were
        semantically relevant, and grouping them into &#39;helper routines&#39; that you
        don&#39;t much want to see).&nbsp;&nbsp; Nevertheless, it is so fast and easy it
        is always worth at least trying to see what happens.&nbsp; Moreover it is almost
        always valuable to fold away truly small nodes.&nbsp; Even if a node is semantically
        relevant, if it uses &lt; 1% of the total CPU time, you probably don&#39;t care
        about it.&nbsp;
    </p>
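    <p>
        To get a feel for how quickly repeated F7 presses raise the threshold, the arithmetic
        can be sketched as below.&nbsp; This is only an illustration: the 1.6X factor comes from the
        text above, while the function names and the starting value of 1% are assumptions made
        for the example and are not part of PerfView.
    </p>

```python
FOLD_STEP = 1.6  # factor applied by each F7 press, as stated in the text


def fold_after_presses(start_percent, presses):
    """Fold % after `presses` F7 presses (a negative count models Shift F7)."""
    return start_percent * FOLD_STEP ** presses


def presses_to_reach(start_percent, target_percent):
    """Smallest number of F7 presses that brings Fold % to at least the target."""
    if start_percent <= 0:
        raise ValueError("start_percent must be positive")
    presses = 0
    value = start_percent
    while value < target_percent:
        value *= FOLD_STEP
        presses += 1
    return presses


if __name__ == "__main__":
    # Hypothetical starting point of 1%; count the presses needed to reach 10%.
    print(presses_to_reach(1.0, 10.0))
```

    <p>
        Because the growth is geometric, a handful of key presses covers the whole 1-10% range
        discussed in this section.
    </p>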
    <p>
        Typically the best results occur when you use Fold % in the 1-10% range (to get
        rid of the smallest nodes), and then selectively fold away any semantically uninteresting
        nodes that are left.&nbsp;&nbsp; This can be done easily by looking at the &#39;ByName&#39;
        view, holding the &#39;Shift&#39; key down, and selecting every node on the graph
        that has some exclusive time (they will be toward the top) and that you DON&#39;T
        recognize.&nbsp;&nbsp; After you have completed your scan, simply right click and
        select &#39;Fold Item&#39; and these nodes will be folded into their callers, disappearing
        from the view.&nbsp;&nbsp; Repeat this until there are no nodes in the display that
        use exclusive time that are semantically irrelevant.&nbsp;&nbsp;&nbsp; What you
        have left is what you are looking for.&nbsp;
    </p>
    <h4>Phase 2: <a id="DrillingIntoGroups">Drilling Into Groups</a></h4>
    <p>
        During the first phase of an investigation you spend your time forming semantically
        relevant groups so you can understand the &#39;bigger picture&#39; of how the time
        spent in hundreds of individual methods can be assigned a &#39;meaning&#39;.&nbsp;&nbsp;&nbsp;
        Typically the next phase is to &#39;Drill into&#39; one of these groups that seems
        to be using too much time.&nbsp; In this phase you are selectively ungrouping a
        semantic group to understand what is happening at the next &#39;lower level&#39;
        of abstraction.&nbsp;
    </p>
    <p>
        You accomplish this with two commands
    </p>
    <ol>
        <li>
            Drill Into - Selecting a cell that represents samples (in an inclusive or exclusive
            column), right clicking and selecting &#39;Drill Into&#39; brings up a new
            StackViewer that has been loaded with JUST THOSE SAMPLES.&nbsp;&nbsp; This allows
            you to change the filtering and grouping in that view WITHOUT having the samples
            from the rest of the run interfere with the analysis.&nbsp;
        </li>
        <li>
            Ungroup - Once you have a new window in which you can change the grouping / folding,
            you typically want to ungroup one of the selected nodes so you can &#39;see inside&#39;.&nbsp;
            The way you ungroup depends on the way the group was formed.&nbsp; Possibilities
            include
            <ul>
                <li>
                    If the node was an entry point group (e.g., OTHER&lt;&lt;mscorlib!get_Now()&gt;&gt;),
                    you can indicate that you want just that entry point to be ungrouped.&nbsp;&nbsp;
                    This is what right clicking and selecting &#39;Ungroup&#39; does.&nbsp;&nbsp; Note
                    that any methods that the original entry point calls now become entry points to
                    the group, so this only ungroups &#39;one level&#39;.&nbsp;
                </li>
                <li>
                    If the node was an entry point group (e.g., OTHER&lt;&lt;mscorlib!get_Now()&gt;&gt;),&nbsp;
                    you can indicate that you want ALL methods in that MODULE to be ungrouped by selecting
                    the node and using the &#39;Ungroup Module&#39; command.&nbsp;&nbsp; This tends
                    to show most of the interesting internal structure of that group in one shot.&nbsp;
                </li>
                <li>
                    If the node is a normal group (e.g., module mscorlib), you can indicate you want
                    just that group ungrouped.&nbsp; The &#39;Ungroup&#39; command does this.
                </li>
                <li>
                    If the node has many other nodes folded into it (either because of the FoldPats
                    or Fold %), then simply removing these will &#39;explode&#39; the group.&nbsp;&nbsp;
                    There is a right click shortcut &#39;Clear all Folding&#39;&nbsp; which does this.&nbsp;
                </li>
            </ul>
        </li>
    </ol>
    <p>
        Typically if the &#39;Ungroup&#39; or &#39;Ungroup Module&#39; command does not work well,
        use &#39;Clear all Folding&#39;.&nbsp; If that does not work well, clear the &#39;GroupPats&#39;
        textbox which will show you the most &#39;ungrouped&#39; view.&nbsp;&nbsp; If this
        view is too complex, you can then use explicit folding (or making ad-hoc groups),
        to build up a new semantic grouping (just like in the first phase of analysis).&nbsp;
    </p>
    <h4>Summary</h4>
    <p>
        In summary, a CPU performance analysis typically consists of three phases
    </p>
    <ol>
        <li>
            Confirming that CPU is indeed the bottleneck and that you have enough samples to
            do an accurate analysis.
        </li>
        <li>
            Using grouping and folding so that methods are clustered into semantically relevant
            groups
        </li>
        <li>
            Drilling into the groups of most interest by selectively ungrouping to understand
            finer detail.&nbsp;
        </li>
    </ol>
    <hr />
    <!--  ********************************** -->
    <h2>
        <a id="InvestigatingMemoryData">Investigating Memory</a>
    </h2>
    <!--  ****************** -->
    <h3>
        <a id="WhenToCareAboutMemory">When to care about Memory</a>
    </h3>
    <p>
        The benefit of optimizing for time is pretty clear:&nbsp; your program goes faster,
        which means your users are not waiting as long.&nbsp;&nbsp; For memory it is not
        as clear.&nbsp; If your program uses 10% more memory than it could, who cares?&nbsp;&nbsp;
        There is a useful MSDN article called <a href="http://msdn.microsoft.com/en-us/magazine/dd882521.aspx">
            Memory
            Usage Auditing for .NET Applications
        </a> which will be summarized here.&nbsp;&nbsp;
        Fundamentally, you really only care about memory when it affects speed, and this happens
        when your app gets big (Memory used as indicated by <a href="http://en.wikipedia.org/wiki/Windows_Task_Manager">TaskManager</a>
        &gt; 50 Meg).&nbsp; Even if your application is small, however,
        it is so easy to do a &#39;10 minute memory audit&#39; of your application&#39;s total
        memory usage and the .NET GC heap that you really should do so for any application
        where performance matters at all.&nbsp;&nbsp; Literally in seconds you can get a
        dump of the GC heap and see whether the memory use &#39;is reasonable&#39;.&nbsp;&nbsp;
        If your app does use 50 Meg or 100 Meg of memory, then it probably is having an important
        performance impact and you need to take more time to optimize its memory usage.&nbsp;
        See the article for more details.&nbsp;
    </p>
    <hr />
    <!--  ****************** -->
    <h3>
        <a id="WhenToCareAboutTheGCHeap">When to care about the GC Heap</a>
    </h3>
    <p>
        Even if you have determined that you <a href="#WhenToCareAboutMemory">care about memory</a>,
        it is still not clear that you care about the GC heap.&nbsp; If the GC heap is only
        10% of your memory usage then you should be concentrating your efforts elsewhere.&nbsp;&nbsp;
        You can quickly determine this by opening <a href="http://en.wikipedia.org/wiki/Windows_Task_Manager">TaskManager</a>,
        selecting the &#39;Processes&#39; tab and finding your process&#39;s
        &#39;Memory (Private Working Set)&#39; value.&nbsp;&nbsp;&nbsp; (See
        <a href="http://msdn.microsoft.com/en-us/magazine/dd882521.aspx">
            Memory
            Usage Auditing for .NET Applications
        </a> for an explanation of Private
        working set).&nbsp;&nbsp;&nbsp; Next, use PerfView to take a heap snapshot of the
        same process (Memory -&gt; Take Heap Snapshot).&nbsp;&nbsp; At the top of the view
        will be the &#39;Total Metric&#39; which in this case is bytes of memory.&nbsp;&nbsp;
        If GC Heap is a substantial part of the total memory used by the process, then you
        should be concentrating your memory optimization on the GC heap.&nbsp;
    </p>
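    <p>
        The comparison described above is simple arithmetic, and a minimal sketch of it follows.&nbsp;
        The function name is hypothetical, and the sketch assumes that you read the private working set
        from TaskManager in kilobytes and the &#39;Total Metric&#39; from the heap snapshot in bytes.&nbsp;
        No particular cut-off is implied; the text only says that around 10% is too small to matter
        and that a &#39;substantial part&#39; is worth pursuing.
    </p>

```python
def gc_heap_share(gc_heap_bytes, private_working_set_kb):
    """Fraction of the private working set that is accounted for by the GC heap.

    gc_heap_bytes: the 'Total Metric' shown at the top of the heap snapshot view.
    private_working_set_kb: the TaskManager value, assumed here to be in kilobytes.
    """
    if private_working_set_kb <= 0:
        raise ValueError("private working set must be positive")
    return gc_heap_bytes / (private_working_set_kb * 1024)


if __name__ == "__main__":
    # Made-up numbers: a 50 MB GC heap inside a 100 MB private working set.
    share = gc_heap_share(50 * 1024 * 1024, 100 * 1024)
    print(f"GC heap is {share:.0%} of the private working set")
```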
    <p>
        If you find that your process is using a lot of memory but it is NOT the GC heap,
        you should download the free SysInternals
        <a href="http://technet.microsoft.com/en-us/sysinternals/dd535533">vmmap</a>
        tool.&nbsp; This tool gives you a breakdown of ALL the memory used
        by your process (it is nicer than the vadump tool mentioned in
        <a href="http://msdn.microsoft.com/en-us/magazine/dd882521.aspx">
            Memory
            Usage Auditing for .NET Applications
        </a>).&nbsp;&nbsp; If this utility shows that the
        Managed heap is large, then you should be investigating that.&nbsp; If it shows you that the &#39;Heap&#39;
        (which is the OS heap) or &#39;Private Data&#39; (which is virtualAllocs) is large,
        you should be <a href="#UnmanagedMemoryAnalysis">
            investigating unmanaged memory
        </a>.
    </p>
    <!--  ****************** -->
    <h3>
        <a id="CollectingDataGCHeap">Collecting GC Heap Data</a>
    </h3>
    <p>
        If you have not already read <a href="#WhenToCareAboutMemory">When to care about Memory</a>
        and <a href="#WhenToCareAboutTheGCHeap">When to care about the GC Heap</a> please
        do so to ensure that GC memory is even relevant to your performance problem.
    </p>
    <p>
        The <strong>Memory-&gt;Take Heap Snapshot</strong> menu item allows you to take
        a snapshot of the GC heap of any running .NET application. When you select this
        menu item it brings up a dialog box displaying all the processes on the system from
        which to select.
    </p>
    <center>
        <img src="images/MemoryCollection.png" alt="Memory Collection" />
    </center>
    <p>
        By typing a few letters of the process name in the filter textbox you can quickly
        reduce the number of processes shown. In the image above simply typing 'x' reduces
        the number of processes to 7 and typing 'xm' would be enough to reduce it to a single
        process (xmlView). Double clicking on the entry will select the entry and start
        the heap dump. Alternatively you can simply select the process with a single click
        and continue to update other fields of the dialog box.
    </p>
    <p>
        If PerfView is not run as administrator it may not show the process of interest
        (if it is not owned by you). Click the 'Elevate to Admin' hyperlink to restart
        PerfView as admin and see all processes.
    </p>
    <p>
        The process to dump is the only required field of the dialog; however, you can set
        the others if desired. (See <a href="#MemoryCollectionDialog">
            Memory Collection Dialog
            reference
        </a> for more). To start the dump either click the 'Dump Heap' button
        or simply press the Enter key.
    </p>
    <!--  ****************** -->
    <h3>
        <a id="UnderstandingPerfDataGCHeap">Understanding GC Heap Perf Data</a>
    </h3>
    <p>
        Once you have some GC Heap data, it is important to understand what exactly you
        collected and what its limitations are. Logically what has been captured is a snapshot
        of objects in the heap that were found by traversing references from a set of roots
        (just like the GC itself). This means that you only discover objects that were live
        at the time the snapshot was taken. However two factors make this characterization
        inaccurate in the normal case.
    </p>
    <ul>
        <li>
            <a href="#GCHeapSampling">Sampling:</a> To save time, PerfView may not dump the
            whole heap
        </li>
        <li>
            <a href="#ProcessFreezing">Freezing:</a> To make collection less impactful, PerfView allows
            the process to run as it collects data
        </li>
    </ul>
    <!--  ********* -->
    <h4>
        <a id="GCHeapSampling">Understanding GC Heap Sampling</a>
    </h4>
    <p>
        For some applications GC heaps can get quite large (&gt; 1GB and possibly 50GB or more).
        When a GC heap contains more than 1,000,000 objects, it slows the viewer considerably
        and makes the heap dump file very large.
    </p>
    <p>
        To avoid this problem, by default PerfView only collects complete GC heap dumps
        for heaps less than 50K objects. Above that PerfView only takes a sample of the
        GC heap. PerfView goes to some trouble to pick a 'good' sample. In particular
    </p>
    <ol>
        <li>The whole heap (both live and dead objects) is considered when performing the sample</li>
        <li>
            It actually collects the whole heap graph in memory and counts how many
            objects there are of each type.&nbsp; It also knows the total number of objects
            in the heap.
        </li>
        <li>
            Based on the total number of objects in the heap, and the &#39;target&#39; number
            of objects (by default 50K), it computes a &#39;sampling ratio&#39;.&nbsp;&nbsp;
            From that it computes a &#39;quota&#39; of objects for each type.&nbsp;
        </li>
        <li>
            It then walks the heap (linearly) randomly selecting objects to hit the quota for
            each type.&nbsp;
        </li>
        <li>
            However, we also require that the sample contain not only each selected object itself, but also a &#39;path
            to root&#39; for it.&nbsp;&nbsp; To ensure this
            <ul>
                <li>
                    When the heap graph was walked, a spanning tree was formed (using the same priority
                    algorithm used for displaying the heap)
                </li>
                <li>
                    When an object is selected, the parent chain in the spanning tree is also included
                    in the sampled graph.&nbsp;
                </li>
            </ul>
        </li>
        <li>
            In addition, large objects (with size &gt; 85,000 bytes) are ALWAYS collected.&nbsp;
        </li>
        <li>
            After all samples are selected, any references from nodes in the sampled graph are
            included.&nbsp;&nbsp;
        </li>
    </ol>
    <p>
        The result is that all samples always contain at least one path to root (but maybe
        not all paths).&nbsp;&nbsp; All large objects are present, and each type has at
        least a representative number of samples (there may be more because of reasons (5)
        and (6)).
    </p>
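    <p>
        The steps above can be sketched roughly as follows.&nbsp; This is an illustrative simplification
        and not PerfView&#39;s implementation: the function name and the dictionary-based heap representation
        are assumptions, the random choice is made per type with <code>random.sample</code> rather than
        during a linear walk, each type is given a quota of at least one object, and step (7) (carrying
        over references between sampled nodes) is omitted.&nbsp; Only the 85,000 byte threshold and the
        overall shape of the procedure come from the text.
    </p>

```python
import random
from collections import Counter

LARGE_OBJECT_BYTES = 85_000  # large-object threshold given in the text


def sample_heap(objects, parent, target_count, seed=0):
    """Return the set of object ids kept in the sampled graph.

    objects: dict mapping object id to (type_name, size_in_bytes)
    parent:  dict mapping object id to its spanning-tree parent id (None for a root)
    """
    rng = random.Random(seed)
    total = len(objects)
    if total <= target_count:
        return set(objects)  # small heap: keep everything, no sampling

    # Steps 2-3: per-type counts, a sampling ratio, and a per-type quota.
    ratio = target_count / total
    per_type = Counter(type_name for type_name, _ in objects.values())
    quota = {t: max(1, round(n * ratio)) for t, n in per_type.items()}

    # Step 4 (simplified): pick objects at random until each quota is met.
    ids_by_type = {}
    for oid, (type_name, _) in objects.items():
        ids_by_type.setdefault(type_name, []).append(oid)
    chosen = set()
    for type_name, ids in ids_by_type.items():
        chosen.update(rng.sample(ids, min(quota[type_name], len(ids))))

    # Step 6: large objects are always collected.
    chosen.update(oid for oid, (_, size) in objects.items()
                  if size > LARGE_OBJECT_BYTES)

    # Step 5: include the spanning-tree parent chain (path to root) of each pick.
    sampled = set()
    for oid in chosen:
        node = oid
        while node is not None and node not in sampled:
            sampled.add(node)
            node = parent.get(node)
    return sampled
```

    <p>
        Note how the path-to-root and large-object rules can push a type above its quota, which is
        why the resulting counts need per-type scaling rather than a single ratio.
    </p>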

    <!--  ********* -->
    <h4>
        <a id="GCHeapScaling">Understanding GC Heap Scaling</a>
    </h4>
    <p>
        <a href="#GCHeapSampling">GC heap sampling</a> dumps only a fraction of the objects
        in the GC Heap, but we wish for that sample to represent the whole GC heap. PerfView
        does this by scaling the counts.&nbsp;&nbsp; Unfortunately because of the requirement
        to include every large object and the path to root of any object, a single number
        will not correctly scale the sampled heap so that it represents the original heap.&nbsp;&nbsp;
        PerfView solves this by remembering the Total sizes for each type in the original
        graph as well as the total counts in the scaled graph.&nbsp; Using this information,
        for each type it scales the COUNT for that type so that the SIZE of that type matches
        the original GC heap.&nbsp;&nbsp; Thus what you see in the viewer should be pretty
        close to what you would see in original heap (just much smaller and easier for PerfView
        to digest).&nbsp; In this way large objects (which are ALWAYS taken) will not have
        their counts scaled, but the most common types (e.g. string), will be heavily
        scaled.&nbsp;&nbsp;&nbsp;&nbsp; You can see the original statistics and the ratios
        that PerfView uses to scale by looking at the log when a .gcdump file has been opened.
    </p>
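    <p>
        As a rough illustration of the per-type scaling just described, the sketch below computes one
        scale factor per type so that the scaled size of each type matches the original heap.&nbsp; The
        function name and the dictionary inputs are assumptions made for the example; the actual ratios
        PerfView used for a given dump are the ones reported in its log.
    </p>

```python
def per_type_scale(original_size_by_type, sampled_size_by_type):
    """Scale factor for each type: original total size / sampled total size.

    Multiplying the sampled counts (and sizes) of a type by its factor makes
    the per-type SIZE match the original heap, as described in the text.
    """
    scale = {}
    for type_name, sampled_size in sampled_size_by_type.items():
        if sampled_size > 0:
            scale[type_name] = original_size_by_type[type_name] / sampled_size
        else:
            scale[type_name] = 1.0  # nothing sampled, nothing to scale
    return scale


if __name__ == "__main__":
    # Made-up numbers: strings were heavily sampled, the large array was not.
    original = {"String": 1_000_000, "Byte[]": 400_000}
    sampled = {"String": 50_000, "Byte[]": 400_000}
    for type_name, factor in per_type_scale(original, sampled).items():
        print(type_name, factor)
```

    <p>
        In the made-up numbers above the large array keeps a factor of 1 (it was always taken), while
        the string type is scaled up heavily.
    </p>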
    <p>
        When PerfView displays a .gcdump file that has been sampled (and thus needs to be
        scaled), it will display the Average amount the COUNTS of the types have been scaled
        as well as the average amount the SIZES had to be scaled in the summary text box
        at the top of the display.&nbsp;&nbsp; This is your indication that sampling/scaling
        is happening, and to be aware that some sampling distortions may be present.&nbsp;
    </p>
    <p>
        It is important to realize that while the scaling tries to counteract the effect of
        sampling (so what is displayed 'looks' like the true, unsampled, graph), it is not perfect.
        The PER-TYPE statistic SIZE should always be accurate (because that is the metric that
        was used to perform the scaling), but the COUNTs may not be.   In particular for types
        whose instances can vary in size (strings and arrays), the counts may be off (however
        you can see the true numbers in the log file).   In addition the counts and sizes for
        SUBSETS of the heap can be off.
    </p>
    <p>
        For example if you drill down to one particular part of the heap (say the set of all Dictionary&lt;string, MyType&gt;),
        you might find that the count of the keys (type string) and the count of values (type MyType) are not the same.
        This is clearly unexpected, because each entry should have exactly one of each.   This anomaly is a result
        of the sampling.    The likelihood of an anomaly like this is inversely proportional to the size of
        the subset of the heap you are reasoning over.   Thus when you reason about the heap as
        a whole, there should be no anomaly, but if you reason about a small number of objects deep
        in some sub-tree, the likelihood is very high.
    </p>
    <p>
        Generally speaking, these anomalies do not tend to affect the analysis much.  This is because you
        usually care about LARGE parts of your heap, and this is exactly where sampling is most accurate.
        Thus typically the correct response to these anomalies is to simply ignore them.   If however they
        are interfering with your analysis, you can reduce or eliminate them by simply doing less sampling.
        The Sampling is controlled by the 'Max Dump K Objs' field.  By default 250K objects are collected.
        If you set this number to be larger you will sample less.  If you set it to some VERY large number
        (say 1 Billion), then the graph will not be sampled at all.    Note that there is a reason why
        PerfView samples.   When the number of objects being manipulated gets above 1 million, PerfView's
        viewer will noticeably lag.  Above 10 million and it will be a VERY frustrating experience.   There
        is also a good chance that PerfView will run out of memory when manipulating such large graphs.   It
        will also make the GCDump files proportionally bigger, and unwieldy to copy.   Thus
        changing the default should be considered carefully.   Using the sampled dump is usually the better option.
    </p>
    <p>
        As mentioned, GCHeap collection (for .NET) collects DEAD as well as live objects.&nbsp;&nbsp;
        PerfView does this because it allows you to see the &#39;overhead&#39; of the GC
        (amount of space consumed, but not being used for live objects).&nbsp;&nbsp; It
        also is more robust (if roots or objects can&#39;t be traversed, you don&#39;t lose
        large amounts of the data).&nbsp;&nbsp; When the graph is displayed dead objects
        can be determined because they will pass through the &#39;[not reachable from roots]&#39;
        node.&nbsp;&nbsp; Typically you are not interested in the dead objects, so you can
        exclude dead objects by excluding this node (Alt-E).
    </p>
    <!--  ********* -->
    <h4>
        <a id="ProcessFreezing">GC Heap collection: To Freeze or not to Freeze?</a>
    </h4>
    <p>
        PerfView has the ability to either freeze the process or allow it to run while the
        GC heap is being collected. If the process is frozen, the resulting heap is accurate
        for that point in time, however since even sampling the GC heap can take 10s of
        seconds, it means that the process will not be running for that amount of time.
        For 'always up' servers this is a problem as 10s of seconds is quite noticeable.
        On the other hand if you allow the process to run as the heap is collected, it means
        that the heap references are changing over time. In fact GCs can occur, and memory
        that used to point at one object might now be dead, and conversely new objects will
        be created that will not be rooted by the roots captured earlier in the heap dump.
        Thus the heap data will be inaccurate.
    </p>
    <p>
        Thus we have a trade-off
    </p>
    <ul>
        <li>
            Freeze the heap and get an accurate dump but interrupt the process for seconds to
            10s of seconds.
        </li>
        <li>Allow the process to run and get less accurate heap dumps. </li>
    </ul>
    <p>
        PerfView allows both, but by default it will NOT freeze the process. The rationale
        is that for most apps, you take a snapshot while the process is waiting for user
        input (and thus the process acts like it is frozen anyway). The exception is server
        applications. However this is precisely the case where stopping the process for
        10s of seconds would likely be bad. Thus a default to allow the process to run is
        better in most cases.
    </p>
    <p>
        In addition, if the heap is large, it is already the case that you will not dump
        all objects in the heap. As long as the objects being missed by the process running
        are statistically similar to the ones that did not move (likely in a server process),
        then your heap stats are likely to be accurate enough for most performance investigations.
    </p>
    <p>
        Nevertheless, if for whatever reason you wish to eliminate the inaccuracy of a running
        process, simply use the Freeze checkbox or the /Freeze command line qualifier to
        indicate your desire to PerfView.
    </p>
    <!--  ****************** -->
    <h4>
        <a id="HeapGraphToTree">Converting a Heap Graph to a Heap Tree</a>
    </h4>
    <p>
        As described in <a href="#UnderstandingPerfDataGCHeap">Understanding GC heap data</a>
        the data actually captured in a .GCDump file may only be an approximation to the
        GC heap. Nevertheless the .GCDump does capture the fact that the heap is an arbitrary
        reference graph (a node can have any number of incoming and outgoing references
        and the references can form cycles). Such arbitrary graphs are inconvenient from
        an analysis perspective because there is no obvious way to 'roll up' costs in a
        meaningful way. Thus the data is further massaged to turn the graph into a tree.
    </p>
    <p>
        The basic algorithm is to do a weighted breadth-first traversal of the heap visiting
        every node at most once, and only keeping links that were traversed during the
        visit. Thus the arbitrary graph is converted to a tree (no cycles, and every node
        (except the root) has exactly one parent). The default weighting is designed to
        pick the 'best' nodes to be 'parents'. The intuition is that if you have a choice
        between two nodes to be the parent of a particular node, you want to pick
        the most semantically relevant node.
    </p>
    <!--  ******** -->
    <h5>
        <a id="PriorityTextBox">Using Priorities to control</a> <a href="#HeapGraphToTree">graph-to-tree</a>
        conversion
    </h5>
    <p>
        The viewer of GC heap memory data has an extra 'Priority' text box, which contains
        patterns that control the <a href="#HeapGraphToTree">graph-to-tree</a> conversion
        by assigning each object a floating point numeric priority. This is done in a two
        step process, first assigning priorities to type names, and then through types assigning
        objects a priority.
    </p>
    <p>
        The Priority text box is a semicolon list of expressions of the form
    </p>
    <ul>
        <li><i>PAT</i> -> <i>NUM</i></li>
    </ul>
    <p>
        Where <i>PAT</i> is a regular expression pattern as defined in <a href="#PatternMatching">
            Simplified
            Pattern matching
        </a> and <i>NUM</i> is a floating point number. The
        algorithm for assigning priorities to types is simple: find the first pattern in
        the list of patterns that matches the type name. If a pattern matches, assign the
        corresponding priority. If no pattern matches, assign a priority of 0. In this way
        every type is given a priority.
    </p>
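    <p>
        The first-match rule can be sketched as below.&nbsp; This is an assumption-laden illustration:
        it uses Python&#39;s standard <code>re</code> module as a stand-in for PerfView&#39;s simplified
        pattern syntax (which is not the same thing), it treats a pattern as matching if it is found
        anywhere in the type name, and the function names and sample patterns are hypothetical.
    </p>

```python
import re


def parse_priorities(text):
    """Parse a semicolon-separated list of 'PAT->NUM' entries into ordered rules."""
    rules = []
    for entry in text.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the last '->' so the number may itself be negative.
        pattern, _, number = entry.rpartition("->")
        rules.append((re.compile(pattern.strip()), float(number)))
    return rules


def type_priority(rules, type_name):
    """Priority of the first rule whose pattern matches, or 0 if none matches."""
    for regex, priority in rules:
        if regex.search(type_name):
            return priority
    return 0.0


if __name__ == "__main__":
    rules = parse_priorities("MyApp->5; System->-1")  # hypothetical patterns
    for name in ("MyApp.Order", "System.String", "Other.Thing"):
        print(name, type_priority(rules, name))
```

    <p>
        Because the first matching pattern wins, more specific patterns should be placed before more
        general ones.
    </p>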
    <p>
        The algorithm for assigning a priority to an object is equally simple. It starts
        with the priority of its type, but it also adds in 1/10 the priority of its 'parent'
        in the spanning tree being formed. Thus a node gives part of its priority to its
        children, and thus this tends to encourage breadth first behavior (all other priorities
        being equal, a node that is 2 hops away from a node with a given priority will have a higher
        priority than a node that is 3 hops away).
    </p>
    <p>
        Having assigned a priority to all 'about to be traversed' nodes, the choice of the
        next node is simple. PerfView chooses the highest priority node to traverse next.
        Thus nodes with high priority are likely to be part of the spanning tree that PerfView
        forms. This is important because all the rest of the analysis depends on this spanning
        tree.
    </p>
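    <p>
        Putting the last two paragraphs together, the traversal can be sketched with a priority queue
        as below.&nbsp; This is a minimal sketch under stated assumptions, not PerfView&#39;s code: the
        function name and the dictionary-based graph are hypothetical, type priorities are passed in
        as a plain dictionary (missing types get 0), and ties are broken by insertion order.&nbsp; The
        rule that a child receives 1/10 of its parent&#39;s priority comes from the text.
    </p>

```python
import heapq


def build_spanning_tree(roots, refs, type_of, type_priority):
    """Highest-priority-first traversal that turns a reference graph into a tree.

    roots:         list of root object ids
    refs:          dict mapping object id to a list of referenced object ids
    type_of:       dict mapping object id to its type name
    type_priority: dict mapping type name to a priority (missing means 0)
    Returns (parent, priority) dictionaries keyed by object id.
    """
    parent = {}
    priority = {}
    heap = []
    counter = 0  # tie-breaker: equal priorities pop in insertion order
    for root in roots:
        p = type_priority.get(type_of[root], 0.0)
        heapq.heappush(heap, (-p, counter, root, None))
        counter += 1
    while heap:
        neg_p, _, node, via = heapq.heappop(heap)
        if node in parent:
            continue  # already claimed by a higher-priority reference
        parent[node] = via
        priority[node] = -neg_p
        for child in refs.get(node, ()):
            if child not in parent:
                # Object priority = priority of its type + 1/10 of the parent's.
                child_p = type_priority.get(type_of[child], 0.0) + priority[node] / 10
                heapq.heappush(heap, (-child_p, counter, child, node))
                counter += 1
    return parent, priority
```

    <p>
        For example, if an object is referenced both by a framework collection with a negative priority
        and by a user-defined object with priority 0, the user-defined object is visited first and
        becomes the parent in the tree.
    </p>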
    <p>
        You can see the default priorities in the 'Priority' text box. The rationale behind
        this default is:
    </p>
    <ul>
        <li>
            Runtime infrastructure is given a large negative weight and thus is only chosen after
            everything else.
        </li>
        <li>
            Local variables are also given a large negative weight because they are transient
            and tend to 'short circuit' the 'true' root, because they tend to point into the
            'middle' of data structures.
        </li>
        <li>Framework types are given a small negative weight </li>
        <li>User defined types are given the default weight of 0 </li>
    </ul>
    <p>
        Thus the algorithm tends to traverse user defined types first and find the shortest
        path that has the most user defined types in the path. Only when it runs out of
        such links does it follow framework types (like collection types, GUI infrastructure,
        etc), and only when those are exhausted, will anonymous runtime handles be traversed.
        This tends to assign the cost (size) of objects in the heap to more semantically
        relevant objects when there is a choice.
    </p>
    <h5>
        <b>Best Practices for assigning priorities to your types</b>
    </h5>
    <p>
        The defaults work surprisingly well and often you don't have to augment them. However
        if you do assign priorities to your types, you generally want to choose a number
        between 1 and 10. If all types follow this convention, then generally the priority a child
        node inherits from its parent (which is divided by 10) will be less than the priority of any
        type given an explicit priority. However if you want to give a node a priority so that even its children have
        high priority you can give it a number between 10 and 100. Making the number even
        larger will force even the grandchildren to 'win' most priority comparisons. In
        this way you can force whole areas of the graph to be high priority. Similarly,
        if there are types that you don't want to see, you should give them a number between
        -1 and -10.
    </p>
    <p>
        The GUI has the ability to quickly set the priority of a particular type. If you
        select a cell in the GUI and right click Priorities -&gt; Raise Item Priority (Alt-P),
        then that type's priority will be increased by 1. There is similarly a 'Lower Item
        Priority' (Shift-Alt-P) command. Likewise, there is a Raise Module Priority (Alt-Q) and
        Lower Module Priority (Shift-Alt-Q) which match any type with the same module as
        the selected cell.
    </p>
    <p>
        Because the graph has been converted to a tree, it is now possible to unambiguously
        assign the cost of a &#39;child&#39; to the parent. In this case the cost is the
        size of the object, and thus at the root the costs will add up to the total (reachable)
        size of the GC heap (that was actually sampled).
    </p>
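    <p>
        Once every object has exactly one parent, rolling up sizes is straightforward.&nbsp; The sketch
        below shows one way to compute inclusive sizes from a parent map; the function name and the
        input dictionaries are assumptions made for illustration and do not reflect how the viewer
        stores its data.
    </p>

```python
def inclusive_sizes(parent, size):
    """Inclusive size of every node: its own size plus that of all its descendants.

    parent: dict mapping node to its tree parent (None for the root)
    size:   dict mapping node to its own (exclusive) size in bytes
    """
    def depth(node):
        d = 0
        while parent[node] is not None:
            node = parent[node]
            d += 1
        return d

    inclusive = dict(size)
    # Visit the deepest nodes first so each child is complete before it is
    # added to its parent.
    for node in sorted(parent, key=depth, reverse=True):
        up = parent[node]
        if up is not None:
            inclusive[up] += inclusive[node]
    return inclusive


if __name__ == "__main__":
    parent = {"root": None, "a": "root", "b": "a", "c": "root"}
    size = {"root": 10, "a": 20, "b": 30, "c": 40}
    print(inclusive_sizes(parent, size))
```

    <p>
        The inclusive size at the root equals the sum of all exclusive sizes, which is the sense in
        which the costs &#39;add up&#39; to the total reachable size.
    </p>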
    <!--  ******** -->
    <h5>Viewing the resulting heap tree</h5>
    <p>
        Once the heap graph has been converted to a tree, the data can be viewed in the
        same stackviewer as was used for ETW callstack data. However in this view the data
        is not the stack of the allocation but rather the connectivity graph of the GC heap.
        You don't have callers and callees but referrers and referees. There is no notion
        of time (the 'when', 'first' and 'last' columns), but the notions of inclusive and
        exclusive time still make sense, and the grouping and folding operations are just
        as useful.
    </p>
    <p>
        It is important to note that this conversion to a tree is inaccurate in that it
        attributes all the cost of a child to one parent (the one in the traversal), and
        no cost to any other nodes that also happened to point to that node. Keep this in
        mind when viewing the data.
    </p>
    <!--  ******** -->
    <h5>
        <a id="Pri1OnlyCheckBox">Primary</a> <a id="PrimaryAndSecondaryNodes">
            vs Secondary Nodes
            in the stack Viewer
        </a>
    </h5>
    <p>
        As described in <a href="#HeapGraphToTree">Converting a Heap Graph to a Heap Tree</a>,
        before the memory data can be displayed it is converted from a graph (where arcs can
        form cycles and have multiple parents) to a tree (where there is always exactly
        one path from the node to the root). References that are part of this tree are called
        primary refs and are displayed in black in the viewer. However it is useful to also
        see the other references that were trimmed. These other references are called
        secondary nodes. <strong>
            When secondary nodes are present, primary nodes are in bold
            and secondary nodes are normal font weight.
        </strong> Sometimes secondary nodes
        clutter the display so there is a 'Pri1 Only' check box, which when selected suppresses
        the display of secondary nodes.
    </p>
    <p>
        Primary nodes are much more useful than secondary nodes because there is an obvious
        notion of 'ownership' or 'inclusive' cost. It makes sense to talk about the cost
        of a node and all of its children for primary nodes. Secondary nodes do not have
        this characteristic. It is very easy to 'get lost' opening secondary nodes because
        you could be following a loop and not realize it. To help avoid this, each secondary
        node is labeled with its 'minimum depth'. This number is the shortest PRIMARY path
        from any node in the set to the root node. Thus if you are trying to find a path
        to root with secondary nodes, following nodes with small depth will get you there.
    </p>
    <p>
        Generally, however it is better to NOT spend time opening secondary nodes. The real
        purpose of showing these nodes is to allow you to determine if your priorities in
        the <a href="#PriorityTextBox">Priority Text Box</a> are appropriate. If you find
        yourself being interested in secondary nodes, there is a good chance that the best
        response is to simply add a priority that will make those secondary nodes primary
        ones. By doing this you can get sensible inclusive metrics, which are the key to
        making sense of the memory data.
    </p>
    <p>
        One good way of setting priorities is to use the right click -> Priority -> Increase
        Priority (Alt-P) and right click -> Priority -> Decrease Priority (Alt-Q) commands.
        By selecting a node that is either interesting, or explicitly not interesting and
        executing these commands you can raise or lower its priority and thus cause it to
        be in the primary tree (or not).
    </p>
    <hr />
    <!--  ****************** -->
    <h3>
        <a id="StartingAnAnalysisGCHeap">Starting an Analysis of GC Heap Dump</a>
    </h3>
    <p>
        This section assumes you have determined that the <a href="#WhenToCareAboutTheGCHeap">
            GC
            heap is relevant
        </a>, that you have <a href="#CollectingDataGCHeap">
            collected a GC
            Snapshot
        </a> and that you understand how the <a href="#HeapGraphToTree">
            heap graph was
            converted to a tree
        </a> and <a href="#GCHeapScaling">how the heap data was scaled</a>.
        In addition to the 'normal' heap analysis done here, it can also be useful to review
        the bulk behavior of the GC with the <a href="#GCStats">GCStats</a> report as well
        as the <a href="#GCHeapAllocIgnoreFree(CoarseSampling)">GC Heap Alloc Ignore Free (Coarse Sampling)</a> view.
    </p>
    <!--  ********** -->
    <h4>
        <a id="BottomUpGCHeap">Bottom up Analysis</a>
    </h4>
    <p>
        Like a <a href="#Tutorial">CPU time investigation</a>, a GC heap investigation can
        be done bottom up or top down. Like a CPU investigation, a bottom up investigation
        is a good place to start. This is even more true for memory than it is for
        CPU. The reason is that, unlike CPU, the tree that is being displayed in the
        view is not the &#39;truth&#39;, because the tree view does not represent the
        fact that some nodes are referenced by more than one node (that is, they have multiple
        parents). Because of this the top down representation is a bit &#39;arbitrary&#39;:
        you can get different trees depending on details of exactly how the breadth
        first traversal of the graph was done. A bottom up analysis is relatively
        immune to such inaccuracy and thus is a better choice.
    </p>
    <p>
        Like a CPU investigation, a <a href="#TutorialBottomUp">bottom up</a> heap investigation
        starts with forming semantically relevant groups by &#39;folding away&#39; any nodes
        that are NOT semantically relevant. This continues until the groups
        are big enough to be interesting. The &#39;Drill Into&#39; feature can
        then be used to start a sub-analysis. Please see the <a href="#Tutorial">CPU Tutorial</a>
        if you are not familiar with these techniques.
    </p>
    <p>
        The Goto <a href="#CallersView">callers view</a> (F10) is particularly useful for
        a heap investigation because it quickly summarizes paths to the GC roots, which
        indicate why the object is still alive. When you find objects that have
        outlived their usefulness, one of these links must be broken for the GC to collect
        them. It is important to note that because the view shows the TREE and
        not the GRAPH of objects, there may be other paths to the object that are not shown.
        Thus to make an object die, it is NECESSARY that one of the paths in the callers
        view be severed, but it may not be SUFFICIENT.
    </p>
    <!--  ********** -->
    <h4>Grouping and Folding for GC Heap Investigation</h4>
    <p>
        Typically, GC heaps are dominated by:
    </p>
    <ol>
        <li>Strings (typically they account for 20-25% of the total size of the GC heap!)</li>
        <li>
            Arrays (often byte[]).&nbsp;&nbsp; These often account for 10% or more.&nbsp;
        </li>
    </ol>
    <p>
        Unfortunately, while these types dominate the size of the heap, they do not really
        help in analysis. What you really want to know is not that you use a lot of
        strings but WHAT OBJECTS YOU CONTROL are using a lot of strings. The
        good news is that this is the &#39;standard problem&#39; of a <a href="#TutorialBottomUp">
            bottom
            up analysis
        </a>, which PerfView is really good at solving. By
        default PerfView adds <a href="#FoldPatsTextBox">folding patterns</a> that cause
        the cost of all strings and arrays to be charged to the object that refers to them
        (it is as if the field was &#39;inlined&#39; into the structure that referenced it).
        Thus other objects (which are much more likely to be semantically relevant to you)
        are charged this cost. Also by default, the &#39;<a href="#FoldPercentTextBox">Fold%</a>&#39;
        textbox is set to 1, which says that any type that uses less than 1% of the GC heap
        should be removed and its cost charged to whoever referred to it.
    </p>
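    <p>
        The effect of the Fold% setting can be sketched as follows. This is a simplified
        illustration rather than PerfView's actual algorithm: it assumes each node has a single
        parent (the heap tree), measures a type by the sum of its own object sizes, and uses the
        hypothetical names <code>fold_small_types</code> and <code>nodes</code>. Any type below
        the threshold is removed and its bytes are charged to the nearest ancestor whose type
        survives.
    </p>

```python
def fold_small_types(nodes, fold_percent=1.0):
    """Charge the size of 'small' types to whoever refers to them.

    nodes maps a node name to (parent_name_or_None, type_name, size_bytes).
    Returns a dict of type_name to bytes after folding.
    Simplification (assumption): a type is 'small' when the sum of its own
    object sizes is below fold_percent of the whole heap.
    """
    total = sum(size for _, _, size in nodes.values())
    by_type = {}
    for _, type_name, size in nodes.values():
        by_type[type_name] = by_type.get(type_name, 0) + size

    threshold = total * fold_percent / 100.0
    folded_types = {t for t, s in by_type.items() if threshold > s}

    charged = {}
    for name, (_, _, size) in nodes.items():
        owner = name
        # Walk up the tree until we reach a type that is not folded (or the root).
        while nodes[owner][1] in folded_types and nodes[owner][0] is not None:
            owner = nodes[owner][0]
        owner_type = nodes[owner][1]
        charged[owner_type] = charged.get(owner_type, 0) + size
    return charged


# Hypothetical heap tree: 'Tiny' is only 0.5% of the heap, so it is folded into 'Entry'.
nodes = {
    "cache": (None, "MyCache", 500),
    "entry1": ("cache", "Entry", 9000),
    "tiny": ("entry1", "Tiny", 50),
    "entry2": ("cache", "Entry", 450),
}
folded = fold_small_types(nodes, fold_percent=1.0)
```

    <p>
        After folding, the 50 bytes of 'Tiny' no longer appear as their own entry. They show up
        as part of 'Entry', which is the type you are more likely to recognize and control.
    </p>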
    <p>
        The bottom up analysis of a GC heap proceeds in much the same way as a <a href="#Tutorial">CPU investigation</a>.
        You use the grouping and folding features of the <a href="#StackViewer">Stack Viewer</a> to eliminate noise and
        to form bigger semantically relevant
        groups. When these get large enough, you use the <a href="#TutorialDrillingIntoGroups">Drill Into</a>
        feature to isolate one such group and understand it at a finer
        level of detail. This detailed understanding of your application's memory use tells
        you the most valuable places to optimize.
    </p>
    <p>
        Once you have determined a type to focus on, it is often useful to understand where
        the types have been allocated. See the <a href="#GCAllocStacks">GC Alloc Stacks view</a>
        for more on this.
    </p>
    <!--  ********** -->
    <h4>
        <a id="MemoryLeaks">Memory Leaks</a>
    </h4>
    <p>
        A common type of memory problem is a 'Memory Leak'. This is a set of objects that
        have served their purpose and are no longer useful, but are still connected to live
        objects and thus cannot be collected by the GC. If your GC heap is growing
        over time, there is a good chance you have a memory leak. Caches of various types
        are a common source of 'memory leaks'.
    </p>
    <p>
        A memory leak is really just an extreme case of a normal memory investigation. In
        any memory investigation you are grouping together semantically relevant nodes and
        evaluating whether the costs you see are justified by the value they bring to the
        program. In the case of a memory leak the value is zero, so generally it is just
        about finding the cost. Moreover there is a very straightforward way of finding
        a leak:
    </p>
    <ul>
        <li>Run the program to a particular place and take a heap snapshot.</li>
        <li>
            Perform a set of operations (e.g. open and close something) that should be a 'no
            op'.
        </li>
        <li>Take another heap snapshot.</li>
        <li>
            Use the <a href="#Diff">Diff</a> feature of PerfView to find the difference between
            the heaps.
        </li>
        <li>
            Anything in the difference is a memory leak (since the state of the program should
            be the same).
        </li>
    </ul>
    <p>
        Note that because programs often have 'one time' caches, the procedure above often
        needs to be amended. You need to perform the set of operations once or twice before
        taking the baseline. That way any 'one time' caches will have been filled by the
        time the baseline has been captured and thus will not show up in the diff.
    </p>
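    <p>
        The snapshot-and-diff procedure above can be sketched at the level of per-type byte
        totals. This is only an illustration of the idea: PerfView's Diff feature operates on the
        full stack data, and the function name <code>diff_snapshots</code> and the sample numbers
        below are hypothetical.
    </p>

```python
def diff_snapshots(baseline, after):
    """Return the per-type change in bytes between two heap snapshots.

    Each snapshot is a dict of type_name to total bytes (a simplification,
    PerfView diffs whole stacks rather than flat per-type totals).
    Types whose size did not change are omitted.
    """
    types = set(baseline) | set(after)
    delta = {t: after.get(t, 0) - baseline.get(t, 0) for t in types}
    return {t: d for t, d in delta.items() if d != 0}


# Baseline taken AFTER one warm-up run, so 'one time' caches are already filled.
baseline = {"String": 1000, "Foo": 200, "CacheEntry": 300}
# Snapshot taken after repeating the supposed 'no op' operation.
after = {"String": 1000, "Foo": 260, "CacheEntry": 300, "Listener": 48}

leak_candidates = diff_snapshots(baseline, after)
# Largest growth first, since that is usually where to start looking.
ranked = sorted(leak_candidates.items(), key=lambda item: item[1], reverse=True)
```

    <p>
        Because the operation was supposed to leave the program in the same state, every type
        that appears in the result is a candidate leak.
    </p>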
    <p>
        When you find a likely leak use the 'Goto <a href="#CallersView">callers view</a>
        (F10)' on the node to find a path from the root to that particular node. This shows
        you the objects that are keeping this object alive. To fix the problem you must
        break one of these links (typically by nulling out one of the object fields).
    </p>
    <!--  ********** -->
    <h4>
        <a id="TopDownGCHeap">Top Down Analysis of the GC Heap</a>
    </h4>
    <p>
        While a <a href="#BottomUpGCHeap">Bottom up Analysis</a> is generally the best way
        to start, it is also useful to look at the tree 'top down' by looking at the
        <a href="#CallTreeView">CallTree view</a>. At the top of a GC heap are the roots
        of the graph. Most
        of these roots are either local variables of actively running methods, or static
        variables of various classes. PerfView goes to some trouble to try to get as much
        information as possible about the roots and group them by assembly and class. Taking
        a quick look at which classes are consuming a lot of heap space is often a quick
        way of discovering a leak.
    </p>
    <p>
        However this technique should be used with care. As mentioned in the section on
        <a href="#HeapGraphToTree">Converting a Heap Graph to a Heap Tree</a>, while PerfView
        tries to find the most semantically relevant 'parents' for a node, if a node has
        several parents, PerfView is really only guessing. Thus it is possible that there
        are multiple classes 'responsible' for an object, and you are only seeing one. Thus
        it may be 'unfair' to blame the class that was arbitrarily picked as the sole 'owner'
        of the high cost nodes. Nevertheless, the path in the calltree view is at least
        partially to blame, and is at least worthy of additional investigation. Just keep
        in mind the limitations of the view.
    </p>
    <h5>Root Information Caveats</h5>
    <p>
        PerfView uses the .NET Debugger interface to collect symbolic information about
        the roots of the GC heap. There are times (typically because the program is running
        on old .NET runtimes) that PerfView can't collect this information. If PerfView
        is unable to collect this information it still dumps the heap, but the GC roots
        are anonymous (e.g. everything is 'other roots'). See the log at the time of the GC
        Heap dump to determine exactly why this information could not be collected.
    </p>
    <h4>
        <a id="GCStats">GC Stats Report</a>
    </h4>
    <p>
        A typical GC memory investigation includes a dump of the GC heap. While this gives
        very detailed information about the heap at the time the snapshot was taken, it
        gives no information about the GC behavior over time. This is what the GCStats report
        provides. To get a GCStats report you must <a href="#CollectingData">Collect Event Data</a>
        as you would for a CPU investigation (the GC events are on by default). When you
        open the resulting ETL file one of the children will be a 'GCStats' view. Opening
        this will give you a report for each process on the system detailing how big the
        GC heap was, when GCs happened, and how much each GC reclaimed. This information is
        quite useful to get a broad idea of how the GC heap changes over time.
    </p>
    <h4>
        <a id="GCHeapAllocIgnoreFree(CoarseSampling)Stacks">GC Heap Alloc Ignore Free (Coarse Sampling) Stacks</a>
    </h4>
    <p>
        In addition to the information needed for a <a href="#GCStats">GC Stats Report</a>,
        a normal <a href="#CollectingData">ETW Event Data collection</a> will also include
        coarse information on where objects were allocated. Every time 100K of GC objects
        are allocated, a stack trace is taken. These stack traces can be displayed in the
        'GC Heap Alloc Stacks' view of the ETL file.
    </p>
    <p>
        These stacks show where a lot of bytes were allocated, however they do not tell
        you which of these objects died quickly, and which lived on to add to the size of
        the overall GC heap. It is these latter objects that are the most serious performance
        issue. However by looking at a heap dump you CAN see the live objects, and after
        you have determined that a particular type has many instances that live a long time,
        it can be useful to see where they are being allocated. This is what the GC Heap
        Alloc Stacks view will show you.
    </p>
    <p>
        Please keep in mind that the coarse sampling is pretty coarse. Only the objects
        that happen to 'trip' the 100KB sample counter are actually sampled. However what
        is true is that ALL objects over 100K in size will be logged, and any small object
        that is allocated a lot will likely be logged also. In practice this is good enough.
    </p>
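    <p>
        The following sketch illustrates why coarse sampling always catches large objects and
        usually catches frequently allocated small ones. It is a model of the idea only: the way
        the counter is carried over after a sample (a simple remainder) is an assumption of this
        sketch and not a description of what the .NET runtime actually does, and the name
        <code>sampled_allocations</code> is hypothetical.
    </p>

```python
SAMPLE_BYTES = 100 * 1024  # one sample per 100KB allocated, as described above


def sampled_allocations(allocs, sample_bytes=SAMPLE_BYTES):
    """Return the types of the allocations that 'trip' the sample counter.

    allocs is an iterable of (type_name, size_bytes) in allocation order.
    Assumption of this sketch: after a sample the counter keeps the
    remainder instead of being reset to zero.
    """
    sampled = []
    pending = 0
    for type_name, size in allocs:
        pending += size
        if pending >= sample_bytes:
            # This allocation pushed the counter over the limit, so it is logged.
            sampled.append(type_name)
            pending %= sample_bytes
    return sampled


# Hypothetical allocation sequence: small 40KB objects around one 200KB object.
allocs = (
    [("Small", 40 * 1024)] * 2
    + [("Big", 200 * 1024)]
    + [("Small", 40 * 1024)] * 3
)
samples = sampled_allocations(allocs)
```

    <p>
        Any single allocation of 100KB or more trips the counter by itself, so 'Big' is always
        logged. Of the five 'Small' allocations only two are logged, which is the coarseness
        referred to above.
    </p>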
    <h5>Large Objects</h5>
    <p>
        The .NET heap segregates the heap into &#39;LARGE objects&#39; (over 85K) and small objects
        (under 85K) and treats them quite differently.&nbsp; In particular large objects are only
        collected on Gen 2 GCs (pretty infrequently).&nbsp;&nbsp; If these large objects live for a
        long time, everything is fine, however if large objects are allocated a lot then either
        you are using a lot of memory or you are creating a lot of garbage that will force a lot of
        Gen 2 collections (which are expensive).&nbsp;&nbsp; Thus you should not be allocating many
        large objects.&nbsp;&nbsp; The GC Heap Alloc view has a special &#39;LargeObject&#39; pseudo-frame
        that it injects if the object is big, making it VERY easy to find all the stacks where large
        objects are allocated.&nbsp; This is a common use of the GC Heap Alloc Stacks view.
    </p>

    <h4><a id="GCHeapNetMem(CoarseSampling)Stacks">Net </a><a id="GCHeapNetMemStacks">GC Heap Allocations Stacks (GC Heap Net Mem view)</a></h4>
    <p>
        The first choice for <a href="#InvestigatingMemoryData">
            investigating excessive memory usage
            of the .NET GC heap
        </a>&nbsp; is to <a href="#CollectingDataGCHeap">
            take a heap snapshot
            of the GC heap
        </a>.&nbsp;&nbsp; This is because objects are only kept alive because they
        are rooted, and this information shows you all the paths that are keeping the memory alive.&nbsp;&nbsp;&nbsp;
        However there are times that knowing the allocation stack is useful.&nbsp;&nbsp; The <a href="#GCHeapAlloc">
            GC
            Heap Alloc Stacks
        </a> view shows you these stacks, but it does not know when objects die.&nbsp;&nbsp; It
        is also possible to turn on extra events that allow PerfView to trace object <strong>freeing</strong> as
        well as allocation and thus compute the NET amount of memory allocated on the GC heap (along with the
        call stacks of those allocations).&nbsp;&nbsp;&nbsp; There are two verbosity levels to choose from.&nbsp;&nbsp;
        They are both in the advanced section of the collection dialog box:
    </p>
    <ol>
        <li>.NET Alloc - This option logs an event (and stack) every time an object is allocated on the GC heap.</li>
        <li>.NET SampAlloc - This option logs an event every time 10KB of objects are allocated on the GC heap.</li>
    </ol>
    <p>
        In both cases, they also log when objects are destroyed (so that the net can be computed).
        The option of firing an event on every allocation is VERY verbose. If your program allocates a lot,
        it can slow it down by a factor of 3 or more. In such cases the files will also be large (&gt;
        1GB for 10-20 seconds of trace). Thus it is best to start with the second option of firing an
        event every 10KB of allocation. This typically adds well under 1% overhead, and thus does
        not impact run time or file size much. It is sufficient for most purposes.
    </p>
    <p>
        When you turn on these events, only .NET processes that start AFTER you start data collection
        will log them. Thus if you are profiling a long running service,
        you would have to restart the application to collect this information.
    </p>
    <p>
        Once you have the data you can view it in the &#39;GC Heap Net Mem&#39; view, which shows you the call
        stacks of all the allocations, where the metric is net bytes of GC heap.
        The most notable difference between <a href="#GCHeapAlloc">GC Heap Alloc Stacks</a> and &#39;GC Heap Net Mem&#39;
        is that the former shows allocation stacks of all objects, whereas the latter shows allocation stacks
        of only those objects that have not yet been garbage collected.
    </p>
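    <p>
        The 'net' computation can be sketched as pairing each allocation with its later free.
        The event tuples and the name <code>net_bytes_by_stack</code> below are hypothetical and
        do not reflect the real ETW event layout. The sketch only shows that bytes are charged to
        the allocating stack and removed again when the object dies.
    </p>

```python
def net_bytes_by_stack(events):
    """Compute the bytes still live per allocation stack.

    events is an ordered list of either
        ("alloc", object_id, stack, size_bytes)  or  ("free", object_id)
    (a made-up event shape used only for this illustration).
    """
    live = {}
    for event in events:
        if event[0] == "alloc":
            _, object_id, stack, size = event
            live[object_id] = (stack, size)
        elif event[0] == "free":
            # The object died, so it no longer counts toward the net.
            live.pop(event[1], None)

    net = {}
    for stack, size in live.values():
        net[stack] = net.get(stack, 0) + size
    return net


events = [
    ("alloc", 1, "Main/LoadConfig", 4096),
    ("alloc", 2, "Main/HandleRequest", 1024),
    ("alloc", 3, "Main/HandleRequest", 2048),
    ("free", 2),
]
net = net_bytes_by_stack(events)
```

    <p>
        A view based only on allocations would charge 3072 bytes to the 'HandleRequest' stack,
        whereas the net view charges only the 2048 bytes that are still alive.
    </p>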
    <p>
        There is basically no difference in what is displayed between traces collected with the &#39;.NET Alloc&#39;
        checkbox or the &#39;.NET SampAlloc&#39; checkbox.&nbsp;&nbsp; It is just that in the case of .NET SampAlloc
        the information may be inaccurate since a particular call stack and type are &#39;charged&#39; with 10KB of
        size.&nbsp;&nbsp; However statistically speaking it should give you the same averages if enough samples are collected.&nbsp;
    </p>
    <p>
        The analysis of .NET net allocations works the same way as <a href="#UnmanagedMemoryAnalysis">unmanaged heap analysis</a>.
    </p>
    <hr />
    <hr />
    <!--  *************************************************************************************** -->
    <h1>
        <a id="ReferenceGuide">PerfView Reference Guide</a>
    </h1>
    <!--  ****************** -->
    <h3>
        <a id="CancelingOperations">Canceling Operations</a> and <a id="LogFile">Status Log</a>
    </h3>
    <p>
        One of the goals of PerfView is for the interface to remain responsive at all times.&nbsp;&nbsp;
        The manifestation of this is the status bar at the bottom of most windows.&nbsp;
        This bar displays a one line output area as well as an indication of whether an
        operation is in flight, a &#39;Cancel&#39; button and a &#39;Log&#39; button.&nbsp;
        Whenever a long operation starts, the status bar will change from &#39;Ready&#39;
        to &#39;Working&#39; and will blink.&nbsp;&nbsp; The cancel button also becomes
        active.&nbsp;&nbsp; If you grow impatient, you can always cancel the current
        operation.&nbsp;&nbsp;&nbsp; There is also a one line status message that is updated
        as progress is made.&nbsp;
    </p>
    <p>
        When complex operations are performed (like taking a trace or opening a trace for
        the first time), detailed diagnostic information is also collected and stored in
        a Status log.&nbsp; When things go wrong, this log can be useful in debugging the
        issue.&nbsp;&nbsp;&nbsp; Simply click on the &#39;Log&#39; button in the lower right
        corner to see this information.&nbsp;
    </p>
    <hr />
    <!--  ****************** -->
    <h3>
        <a id="MainViewerQuickStart">Quick Start for PerfView's Main View</a>
    </h3>
    <p>
        You have three basic choices in the main view:
    </p>
    <ul>
        <li>
            <a href="#MainViewerQuickStartTime">Collecting Data: Time Investigation </a>
        </li>
        <li><a href="#MainViewerQuickStartGCHeap">Collecting Data: Memory Investigation</a></li>
        <li><a href="#MainViewer">Examining Existing Data</a></li>
    </ul>
    <h4>
        <a id="MainViewerQuickStartTime">Quick Start for collecting Event (Time) data</a>
    </h4>
    <p>
        While we do recommend that you walk through the <a href="#Tutorial">tutorial</a>, and review
        <a href="#CollectingData">Collecting Event Data</a> and <a href="#UnderstandingPerfData">
            Understanding
            Performance Data
        </a>, if your goal is to see your time-based profile
        data as quickly as possible, follow these steps:
    </p>
    <ul>
        <li>Click on the Collect -&gt; Run menu entry or type Alt-R.</li>
        <li>
            If you wish to do a <a href="#BlockedTimeInvestigation">wall clock time investigation</a>
            click the 'Thread Time' checkbox
        </li>
        <li>
            Type the command line of the scenario you wish to collect data for and hit &lt;Enter&gt;.&nbsp;
            If&nbsp; you wish you can type &#39;tutorial.exe&#39; to use the tutorial scenario.&nbsp;
            If it is not easy to launch your app from PerfView, see <a href="#CollectingData">
                collecting
                profile data
            </a> for how to collect machine wide.&nbsp;
        </li>
        <li>
            PerfView will run the application.&nbsp;&nbsp; Output will go to Log (to view see
            button in the lower right).&nbsp;&nbsp;&nbsp; You are shooting for&nbsp; 5-10 seconds
            of data (see <a href="#UnderstandingPerfData">Understanding Perf Data</a>).&nbsp;&nbsp;
            Run through the scenario and shut the app down.&nbsp;&nbsp; At this point you have
            created a file called &#39;PerfViewData.etl&#39;.&nbsp;&nbsp; PerfView will then
            process this performance data and display the CPU data.&nbsp; The first step in
            viewing the data is to select the process of interest.&nbsp; Select the process
            you started with the command line above.&nbsp;
        </li>
        <li>
            Examine the CPU data in this view.&nbsp; Type F1 to see the <a href="#StackViewerQuickStart">
                stack
                viewer's quick start
            </a>.&nbsp;&nbsp;
        </li>
    </ul>
    <h4>
        <a id="MainViewerQuickStartGCHeap">Quick Start for Collecting GC Heap data</a>
    </h4>
    <p>
        While we do recommend that you walk through the <a href="#TutorialGCHeap">tutorial</a>,
        and review <a href="#CollectingDataGCHeap">Collecting GC Heap Data</a> and
        <a href="#UnderstandingPerfDataGCHeap">Understanding GC Heap Data</a>, if your goal is to
        see your memory profile data
        as quickly as possible, follow these steps:
    </p>
    <p>
        <strong>Live Process Collection</strong>
    </p>
    <ul>
        <li>
            Click on the Memory -&gt; 'Take Heap Snapshot' menu entry or type Alt-S. This brings
            up the memory dump dialog box.
        </li>
        <li>
            Type a few characters of the process name of interest into the Filter textbox. This
            will cause only those processes with those characters in their names to be displayed.
        </li>
        <li>
            Double click on the process of interest (or hit Enter if it is selected). This will
            start the data collection, which takes between 5 and 60 seconds.
        </li>
        <li>
            After PerfView has created the .gcDump file it will immediately open it and display
            the data showing the types that consumed the most GC heap.
        </li>
        <li>
            Examine the GC Heap data in this view. Type F1 to see the
            <a href="#StackViewerQuickStartGCHeap">stack viewer's quick start</a>.
        </li>
    </ul>
    <p>
        <strong>Process Dump Collection</strong>
    </p>
    <ul>
        <li>
            Locate the .dmp file in the Main Viewer's file view and double click on it. This
            will start the data collection and can take up to a few minutes.
        </li>
        <li>
            After PerfView has created the .gcDump file it will immediately open it and display
            the data showing the types that consumed the most GC heap.
        </li>
        <li>
            Examine the GC Heap data in this view. Type F1 to see the <a href="#StackViewerQuickStartGCHeap">
                stack
                viewer's quick start
            </a>.
        </li>
    </ul>
    <hr />
    <!--  ****************** -->
    <h3>
        <a id="MainViewerTips">Main View Tips</a>
    </h3>
    <p>
        In addition to the <a href="#GeneralTips">General Tips</a>, here are tips specific
        to the <a href="#MainViewer">Main View</a>.
    </p>
    <ul>
        <li>
            <strong>The Local Symbol Directory</strong> - The default symbol cache (%TEMP%\SymbolCache),
            works well if either all the symbol files (PDBs) needed to understand the .ETL file
            are on the default symbol server, or the ETL file will not be shared with other
            users.&nbsp; However if you desire to place the ETL file on a file share so that
            others can read it, it is a good idea to create a local symbol directory.&nbsp;
            This is simply a directory named &#39;symbols&#39; that is in the same directory
            as the ETL file. PerfView will automatically look for PDB files in this
            location if it exists, AND it will always place a copy of any PDB file it needed
            into this local cache. The result is that the ETL file together with its symbol
            directory contains the &#39;complete&#39; information needed to decode the ETL file.
            If both are on a file share, then you will always have all the PDBs you had when
            you did your original analysis. To make this easier to discover,
            there is a &#39;Make Local Symbol Dir&#39; entry when you right click on an ETL
            file.&nbsp; This command simply makes a &#39;symbols&#39; directory next to the
            ETL file.&nbsp;&nbsp;
        </li>
        <li>
            <strong>Drag and drop files</strong> - The file treeview supports drag and drop,
            so you can drag a file from the explorer or other tool and release it on the treeview
            in the main window to open the file.&nbsp;
        </li>
        <li>
            <strong>Cut and paste to select files</strong> - If you paste a path name into the
            text box in the top of the treeview it will open that file.&nbsp;
        </li>
        <li>
            <strong>Right click in the tree view</strong> - Operations on files are typically
            exposed by right clicking on items in the treeview.&nbsp;&nbsp;
        </li>
    </ul>
    <hr />
    <!--  ****************** -->
    <h3>
        <a id="MainViewer">PerfView's Main View</a>
    </h3>
    <p>
        The Main view is what greets you when you first start PerfView.&nbsp;&nbsp;&nbsp;
        The main view serves three main purposes:
    </p>
    <ol>
        <li>
            It serves as a quick introduction to PerfView with links to important starting points
            in the user's guide.
        </li>
        <li>It hosts all the data collection capabilities of PerfView.</li>
        <li>
            Its left pane acts as a &#39;perf explorer&#39; which allows you to decide which
            performance data&nbsp; you wish to examine.&nbsp; Double clicking on items will
            open them, and right clicking will do other operations.&nbsp;
        </li>
    </ol>
    <p>
        Two text boxes at the top of the left pane control what is shown there:
    </p>
    <ul>
        <li>
            <a id="DirectoryTextBox"><strong>Directory TextBox</strong></a> - At the top of
            the left pane is the directory textbox (see also the File -> 'Go To Directory'
            menu option (CTRL-L) on the Main Viewer). This is set to the directory to inspect.
            You can also enter file names into this and it will cause them to be opened.
            When you open directory items in the view this textbox is updated to stay in sync.
        </li>
        <li>
            <a id="FileFilterTextBox"><strong>File Filter Textbox</strong></a> - The box just
            below the directory textbox. If you type text in this box, then only
            files that match this string (case insensitive) will be displayed. The
            * character is a wild card. This is a quick way of finding a particular
            file in a large directory.
        </li>
    </ul>
    <p>
        The following image highlights the important parts of the Main View.&nbsp;
    </p>
    <center>
        <img src="images/MainViewer.png" alt="MainViewer" />
    </center>
    <h4>Data Collection</h4>
    <p>
        Typically when you first use PerfView, you use it to collect data.&nbsp; PerfView
        can currently collect data for the following kinds of investigations:
    </p>
    <ol>
        <li>
            Time Investigations: ETW data (with many variations).&nbsp; You collect this data
            with items in the &#39;Collect&#39; menu entry.&nbsp;&nbsp; See <a href="#CollectingData">
                collecting
                ETW data
            </a> for more.
        </li>
        <li>
            .NET Memory Investigations: .NET Runtime managed heap.&nbsp; You collect this data
            with the &#39;Memory&#39; menu entry.&nbsp; See <a href="#CollectingDataGCHeap">
                collecting
                memory data
            </a> for more.&nbsp;
        </li>
    </ol>
    <hr />
    <!--  ********************************** -->
    <h2>
        <a id="TypesOfPerformanceData">Types of Performance Data / Views</a>
    </h2>
    <p>
        The types of data PerfView understands:
    </p>
    <ul>
        <li>
            <strong><a id="ETLPerfViewData">ETW Event data files (.ETL, .ETL.ZIP files)</a></strong>
            - ETL files contain Event Tracing for Windows (ETW) data.&nbsp; It is collected
            via the &#39;Collect&#39; or &#39;Run&#39; PerfView operations.&nbsp;&nbsp; ETW
            files can contain a wealth of different information depending on exactly which events
            were activated at the time the data was collected.&nbsp;&nbsp;&nbsp;If the line
            contains the annotation (unmerged) it means that this data consists of multiple
            files and does not contain all the information necessary to copy the file to another
            machine.&nbsp;&nbsp; If you intend to copy the data, you must use the Right Click
            -&gt; Merge (or better Right click -&gt; Zip) operation before transferring it.&nbsp;
            See <a href="#merging">merging</a> for more.&nbsp;&nbsp;
            <ul>
                <li>
                    <strong><a id="PerfViewTraceInfo">TraceInfo View</a></strong> - The TraceInfo view
                    displays &#39;top level&#39; data that does not vary with time.&nbsp; This includes
                    things like when the data was collected, the machine on which it was collected, how
                    many processors and how much memory the machine&nbsp; had etc.
                </li>
                <li>
                    <strong><a id="PerfViewProcesses">Process View</a></strong> - This View shows information
                    about each process that was active at some point during the trace.&nbsp;&nbsp; It
                    gives the command line, the start and stop time, the amount of CPU, and other &#39;coarse&#39;
                    information about the processes.
                </li>
                <li>
                    <strong><a id="Processes/Files/RegistryStacks">Processes / Files / Registry Stacks</a></strong> -
                    This is a high level view showing the processes in the system.  In this view, if one process
                    spawns another, it is shown as a child of the parent process.   All DLL loads and file opens
                    are also shown.   If the Registry events are turned on, you will see those as well.
                </li>
                <li>
                    <strong><a id="ThreadTime(withStartStopActivities)Stacks">Thread Time (With StartStop Activities) Stacks</a></strong>
                    - This is like <a href="#ThreadTime(withTasks)Stacks">Thread Time Stacks</a> in that it shows
                    what every thread is doing (consuming CPU, Disk, Network) at any instant of time, and
                    it tracks the causality of System.Threading.Tasks.Task objects so that costs incurred by a task
                    are 'charged' to the creator of the task.    However, in addition to all this, it looks for
                    'Start-Stop' EventSource events as well as HTTP, ASP.NET, and WCF events and creates 'activities'
                    for each of these.  These activities are placed at the top of the stack (near the process node),
                    which nicely separates all costs associated with a particular start-stop activity (e.g. a
                    web request).   It is very valuable for doing server investigations.
                    You will only get this view if you collected data with the
                    <a href="#ThreadTimeCheckbox">Thread Time</a> events.
                    See <a href="#MakingServerInvestigationEasy">Making Server Investigation Easy</a>
                    and <a href="#BlockedTimeInvestigation">Blocked time investigation</a>
                    for more details.
                </li>
                <li>
                    <strong><a id="ThreadTime(withStartStopActivities)(CPUONLY)Stacks">Thread Time (With StartStop Activities) (CPU ONLY) Stacks</a></strong>
                    - This view is basically the <a href="#ThreadTime(withStartStopActivities)Stacks">Thread Time (With StartStop Activities) Stacks</a> view.
                    However, because the trace was not collected with the /threadTime option, the view cannot show blocked time.   The result is that
                    you can see CPU time, grouped by start-stop activity, but no blocked or async time.   Thus it is useful for seeing where CPU time is
                    being spent, grouped by the request being serviced, but does not tell you much about wall clock time.   For that you need to collect
                    with the /threadTime option.
                </li>
                <li>
                    <!-- START OF GROUPS -->
                    <strong><a id="MemoryGroup">The Memory Group</a></strong> - This folder contains
                    all the views associated with memory investigations, whether they be of the native C++ heap,
                    raw Virtual Alloc, or the .NET GC heap.
                    <ul>
                        <li>
                            <strong><a id="PerfViewGCStats">GCStats View </a></strong>- The GCStats view shows
                            the activity of the .NET GC over time.&nbsp; A report is generated for each process
                            that used the .NET GC, and for each such process, important statistics about each
                            GC are displayed.
                        </li>

                        <li>
                            <strong><a id="GCHeapAllocStacks">GC Heap Alloc Stacks</a></strong> - The .NET Runtime
                            logs an event every time 100K bytes of GC heap memory is allocated.&nbsp; This view
                            shows this broken down by call stack, where the metric is the number of bytes allocated.&nbsp;
                            Note that much of this memory quickly becomes trash and thus does not contribute
                            greatly to the GC heap size. However, high allocation rates DO consume CPU time,
                            and thus this view is useful for tracking down the high allocation call sites to
                            reduce CPU.&nbsp; Keep in mind that this is a sample (only the allocation
                            that &#39;trips&#39; the 100K sample &#39;interval&#39; is logged). However, for high
                            volume sites, this sampling will still be accurate.&nbsp;
                            See the <a href="#GCHeapAlloc">GC Heap Alloc Section</a> for more on this view.
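                            <p>
                                The effect of this sampling can be illustrated with a small Python sketch.
                                It is a simplified model, not the runtime's implementation: the names
                                <code>SAMPLE_INTERVAL</code> and <code>sample_allocations</code> are hypothetical, and
                                the sketch assumes an event is logged exactly when 100K bytes have accumulated, with all
                                bytes since the previous event charged to the allocation that tripped it.
                            </p>
<pre>
```python
SAMPLE_INTERVAL = 100 * 1024  # assumed: one event per 100K bytes allocated


def sample_allocations(allocations):
    """Model the sampling: allocations is an iterable of (call_site, size).

    Returns {call_site: bytes charged by the logged events}.
    """
    charged = {}
    bytes_since_last_event = 0
    for call_site, size in allocations:
        bytes_since_last_event += size
        if bytes_since_last_event >= SAMPLE_INTERVAL:
            # Only the allocation that trips the interval is logged, and it
            # is charged with everything allocated since the previous event.
            charged[call_site] = charged.get(call_site, 0) + bytes_since_last_event
            bytes_since_last_event = 0
    return charged


# Made-up workload: 1000 allocations of 1000 bytes each, where every
# tenth allocation comes from a low-volume 'cold' call site.
workload = [("cold" if i % 10 == 9 else "hot", 1000) for i in range(1000)]
estimate = sample_allocations(workload)
print(estimate)
```
</pre>
                            <p>
                                In this made-up workload the 'hot' site really allocates 900,000 bytes and is charged
                                927,000 bytes, which is within a few percent. The 'cold' site (100,000 bytes) happens never
                                to trip the interval and does not appear at all, which is why the view is best trusted
                                for high volume call sites.
                            </p>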
                        </li>

                        <li>
                            <strong><a id="Gen2ObjectDeathsStacks">Gen 2 Object Deaths</a></strong> - When the
                            DotNetAlloc or DotNetAllocSamp events are turned on the runtime will log the
                            stack of allocations as well as when GCs happen and what objects are collected.
                            In this view we show you the allocation stack of objects that DIED in Gen 2.
                            If your Gen 2 GCs are expensive, then reducing the number of these objects is the most important
                            way of bringing the cost of those Gen 2 GCs down.  There is a (Coarse Sampling) version
                            of this view, based on the sampling that happens by default every 100KB of allocation.
                        </li>
                        <li>
                            <strong><a id="ServerGCStacks">Server GC Stacks</a></strong> - This
                            is a specialized view that shows you the CPU time that is consumed by the server GC threads.
                            This information is also available by filtering appropriately with the CPU views.
                        </li>

                        <li>
                            <strong><a id="NetVirtualAllocStacks">Net Virtual Alloc Stacks</a></strong> - The stacks
                            at which memory was allocated using VirtualAlloc.&nbsp;&nbsp; The metric is the
                            number of COMMITTED bytes and the metric is negative when memory is freed. You will
                            only get this view if you collected data with the <a href="#VirtualAllocCheckBox">
                                Virtual
                                Alloc
                            </a> events.&nbsp; See <a href="#UnmanagedMemoryAnalysis">Unmanaged Memory Analysis</a> for more.
                        </li>
                        <li>
                            <strong><a id="NetVirtualReserveStacks">Net Virtual Reserve Stacks</a></strong> - The stacks
                            at which virtual address space was allocated using VirtualAlloc.&nbsp;&nbsp; The metric is the
                            number of RESERVED bytes and the metric is negative when memory is RELEASED. You will
                            only get this view if you collected data with the <a href="#VirtualAllocCheckBox">
                                Virtual
                                Alloc
                            </a> events.&nbsp;  This view is only useful if you are running out of ADDRESS space,
                            not memory.  Thus it is typically only useful for large 32 bit processes that throw out of memory
                            exceptions because there simply is no more address space to allocate memory into.
                        </li>
                        <li>
                            <strong><a id="NetOSHeapAllocStacks">Net OS Heap Alloc Stacks</a></strong> - The stacks at which
                            memory was allocated using HeapAlloc (used by malloc and the C++ new operators).&nbsp;
                            The metric is the number of bytes allocated, and the metric is negative if
                            the memory is freed.&nbsp; Currently you must use XPERF to collect an ETL trace
                            with these events.
                        </li>
                        <li>
                            <strong><a id="GCHeapAllocIgnoreFree">GC Heap Alloc Ignore Free</a></strong> -
                            This view shows you the stacks of allocations weighted by the size of the allocation but does
                            not take into account object death (GCs).   It is like the GC Heap Alloc view; however,
                            unlike that view, it uses the finer grained events available when the .NET Alloc and
                            .NET Samp Alloc events are turned on.   This view is useful when you are trying to
                            carefully audit all allocations (because, for example, you want to minimize GC pauses) and
                            you want more detail than the GC Heap Alloc events give you (which is only every 100K).
                        </li>
                        <li>
                            <strong><a id="PerfViewHeapSnapshots">GC Heap Snapshots</a></strong> -  This node
                            is present if PerfView detects any GC heap snapshots in the ETL file.   These
                            may be either JavaScript or .NET heap snapshots.
                        </li>
                        <li>
                            <strong><a id="JSHeapShnapshot">JS Heap Snapshot</a></strong> -  This node
                            represents a single snapshot of just a JavaScript heap.   It will bring up a
                            'GC Heap Dump' view of the heap if opened.
                        </li>
                        <li>
                            <strong><a id="GCHeapAnalyzer">GC Heap Analyzer</a></strong> -  This node opens
                            a viewer designed to help with GC heap analysis.  It contains much of the
                            same information as the <a href="#GCStats">GC Stats</a> view but is more graphical
                            and interactive.
                        </li>
                    </ul>
                </li> <!-- END OF MEMORY GROUP -->
                <li>
                    <strong><a id="AdvancedGroup">The Advanced Group</a></strong> - This folder contains
                    views for more specialized investigations, which are rarer than the common CPU,
                    wall clock time, and memory investigations.
                    <ul>
                        <li>
                            <strong><a id="ProcessorStacks">Processor Stacks</a></strong> - Shows what every
                            processor is doing, including at what priority. Use GroupPats to focus on Processor or Priority.
                        </li>
                        <li>
                            <strong><a id="ThreadTimeStacks">Thread Time Stacks</a></strong> - Shows what every
                            thread is doing whether it is consuming CPU, disk, network or blocked on something
                            else.  It does not include the 'ReadyThread' information.
                            You will only get this view if you collected data with the
                            <a href="#BlockingTimeBox">Thread Time</a> events.
                            See <a href="#BlockedTimeInvestigation">Blocked time investigation</a> for more details
                            on wall clock / blocked time investigations.
                        </li>
                        <li>
                            <strong><a id="ThreadTime(withTasks)Stacks">Thread Time (With Tasks) Stacks</a></strong>
                            - This is like <a href="#ThreadTimeStacks">Thread Time Stacks</a> in that it shows
                            what every thread is doing (consuming CPU, Disk, Network) at any instant of time.
                            But in addition it attributes any cost incurred by a .NET System.Threading.Tasks.Task
                            to the thread (or Task) that created the work item. This is especially useful
                            for programs that use the C# 'async' feature.
                            You will only get this view if you collected data with the
                            <a href="#ThreadTimeCheckbox">Thread Time</a> events.
                            If you are investigating an HTTP service or have Start-stop events in your code, it is generally
                            better to use the
                            <a href="#ThreadTime(withStartStopActivities)Stacks">Thread Time (With StartStop Activities) Stacks</a>
                            view instead.
                            See <a href="#BlockedTimeInvestigation">Blocked time investigation</a>,
                            <a href="#UnderstandingPerfDataThreadTimeWithTasks">Understanding Thread Time With Tasks</a>
                            and <a href="#MakingServerInvestigationEasy">Making Server Investigation Easy</a>
                            for more details.
                        </li>
                        <li>
                            <strong>
                                <a id="ThreadTime(withReadyThread)Stacks">
                                    Thread Time (with ReadyThread)
                                    Stacks
                                </a>
                            </strong> - Normally <a href="#ThreadTimeStacks">Thread Time Stacks</a>
                            do not show you the thread that unblocked a thread, even if that information is available
                            (ReadyThread events), because it tends to be confusing for the first level of analysis.
                            However, once important regions of blocked time are identified, it is critical to
                            understand what UNBLOCKED that activity (after all, if something blocked for a long time,
                            it is the thing that unblocked it that was 'late').    At every stack that blocks, if
                            there is information about the thread that unblocked it, this is appended to the bottom
                            of the stack with a 'READIED_BY' suffix.   This allows you to 'unwind' the causality
                            (what thread caused the unblock).
                            You will only get this view if you collected data with the
                            <a href="#ThreadTimeCheckbox">Thread Time</a> events.
                            See <a href="#BlockedTimeInvestigation">
                                Blocked time investigation
                            </a> for more details
                            on blocked time investigations.
                        </li>
                        <li>
                            <strong><a id="ExceptionsStacks">Exceptions Stacks</a></strong> - The stack of the
                            location where every exception was thrown.&nbsp;&nbsp; If you have high exception
                            rates, this view allows you to quickly locate the offending code.&nbsp;
                        </li>
                        <li>
                            <strong><a id="ImageLoadStacks">Image Load Stacks</a></strong> - The stacks at which
                            any file was mapped into memory (DLL load).&nbsp; The metric is the number of bytes
                            in the file loaded, and the metric is negative when the image is unloaded.&nbsp;
                        </li>
                        <li>
                            <strong><a id="ManagedLoadStacks">Managed Load Stacks</a></strong> - The stacks
                            which caused the load of any managed assembly.&nbsp;&nbsp; The metric is the size
                            of the assembly loaded.&nbsp;
                        </li>
                        <li>
                            <strong><a id="PinningAtGCTimeStacks">Pinning at GC Time Stacks</a></strong> - This
                            view is designed to track down issues with unreasonable memory growth in the .NET
                            garbage collector because of excessive pinning of GC objects (which makes it hard
                            for the GC to do its job).   By default this view shows you how many objects are pinned
                            at each GC.  However, if you turn on the 'clrPrivate' provider with stacks (clrPrivate:@StacksEnabled=true),
                            it will give additional information on the exact stack where the pinning took place
                            for each such pinned object.   If you instead collect with /DotNetAlloc (very expensive),
                            it will tell you where the pinned object was allocated (if it has not scrolled off the
                            circular buffer).
                        </li>
                        <li>
                            <strong><a id="PinningStacks">Pinning Stacks</a></strong> - This  view is designed to
                            track down issues with unreasonable memory growth in the .NET garbage collector because
                            of excessive pinning of GC objects (which makes it hard for the GC to do its job).   It
                            is only displayed if the 'clrPrivate' provider is turned on with stacks (clrPrivate:@StacksEnabled=true).
                            It works much like the allocation stacks views, displaying all live GC pinning handles over their lifetime with the stack where they
                            were created.
                        </li>
                        <li>
                            <strong><a id="DiskI/OStacks">Disk I/O Stacks</a></strong> - The stacks at which
                            disk I/O happens.&nbsp; The metric is the amount of time it took to service the
                            disk operation (it does not include the time waiting for the disk to become available).&nbsp;
                        </li>
                        <li>
                            <strong><a id="FileI/OStacks">File I/O Stacks</a></strong> - The stacks at which
                            File I/O happens.&nbsp; The metric is the number of bytes read or written.&nbsp;&nbsp;
                            Note that this metric is independent of whether the File operation caused disk activity
                            (it might have been serviced from the file system cache). You will only get this
                            view if you collected data with the <a href="#FileIOCheckBox">File I/O</a> events.
                        </li>

                        <li>
                            <strong><a id="CCWRefCountStacks">CCW Ref Count Stacks</a></strong> - Shows the stacks
                            where any .NET COM Callable Wrapper (CCW) has its COM reference count changed.
                            If you have a WinRT or COM object that is not being removed, it is because this
                            reference count is not going to zero.  This view shows everywhere the count changed,
                            which allows you to debug the problem.   In order for this view to be shown you need
                            to collect the trace with the ClrPrivate provider (thus /Providers=ClrPrivate:@StacksEnabled=true or placing
                            ClrPrivate:@StacksEnabled=true in the 'Additional Providers' textbox).
                        </li>
                        <li>
                            <strong><a id="WindowsHandleRefCountStacks">Windows Handle Ref Count Stacks</a></strong> -
                            Shows the stacks where any Windows OS handle was created, duplicated or closed.  When handles are created
                            or duplicated a +1 is used as the metric, and when they are closed a -1 is used.  Also, when
                            a close() on a handle occurs and PerfView has seen a 'create' or 'duplicate' event for it,
                            it will use that stack for the close (this is very much like what is done for memory allocation
                            views).  Thus balanced creation and closing will cancel out if both are in the time interval
                            of interest.   There are a few handles that are allocated in a process but are closed by
                            system processes, so there may be some imbalances, but generally there are only a handful of these.
                            Thus it is relatively easy to spot 'leaks' since they will show up as an imbalance.
                            In order for this view to be shown you need to collect with the <a href="#HandleCheckBox">'Handle'</a> kernel events.
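                            <p>
                                The bookkeeping described above can be sketched in Python. This is an illustrative model
                                only, not PerfView's code: the function name <code>handle_metric_by_stack</code>, the event
                                tuples and the stack names are all hypothetical.
                            </p>
<pre>
```python
from collections import defaultdict


def handle_metric_by_stack(events):
    """events: iterable of (kind, handle_id, stack), where kind is
    'create', 'duplicate' or 'close'. Returns {stack: net handle count},
    omitting stacks whose creates and closes cancel out."""
    creating_stack = {}  # handle_id -> stack that created or duplicated it
    net = defaultdict(int)
    for kind, handle_id, stack in events:
        if kind in ("create", "duplicate"):
            creating_stack[handle_id] = stack
            net[stack] += 1
        elif kind == "close":
            # Charge the -1 to the creating stack when one was seen,
            # otherwise to the stack that performed the close.
            owner = creating_stack.pop(handle_id, stack)
            net[owner] -= 1
    return {stack: count for stack, count in net.items() if count != 0}


# Made-up trace: handle 2 is created in OpenLog but never closed.
events = [
    ("create", 1, "OpenConfig"),
    ("create", 2, "OpenLog"),
    ("create", 3, "OpenLog"),
    ("close", 1, "Dispose"),
    ("close", 3, "Dispose"),
]
print(handle_metric_by_stack(events))
```
</pre>
                            <p>
                                Because each close is charged to the stack that created the handle, the balanced
                                OpenConfig stack cancels to zero and only the OpenLog stack remains, with a metric of 1
                                for the handle that was never closed.
                            </p>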
                        </li>
                        <li>
                            <strong><a id="HeapSnapshotPinningStacks">Heap Snapshot Pinning Stacks</a></strong> -
                            (Experimental) This view is used when a heap snapshot is taken along with the ETW events for where
                            pinning happens (Providers include ClrPrivate).   This view shows, for every object
                            in the heap snapshot that is pinned, the stacks at which it was pinned.  This includes
                            objects pinned because they are pointed to by an async pinned handle at the time of
                            the heap snapshot.
                        </li>
                        <li>
                            <strong><a id="HeapSnapshotPinnedObjectAllocationStacks">Heap Snapshot Pinned Object Allocation Stacks</a></strong> -
                            (Experimental) This view is used when a heap snapshot is taken along with the ETW events for where
                            pinning happens (Providers include ClrPrivate).   This view shows, for every object
                            in the heap snapshot that is pinned, the stacks at which it was allocated.  This includes
                            objects pinned because they are pointed to by an async pinned handle at the time of
                            the heap snapshot.
                        </li>
                        <li>
                            <strong><a id="ContentionStacks">Contention Stacks</a></strong> - This view aggregates Contention events.&nbsp;
                            A Contention event is fired when a thread tries to acquire a managed lock that is currently owned
                            by another thread. Note that each event has useful event data that is folded by default.
                            For example, unfolding <code>EventData DurationNs</code> reveals the individual pauses of each thread, which can be useful
                            for correlating a particular long wait with other events in the trace. Note that not all contention events
                            are real OS-level waits: the runtime may first spin wait to try to acquire the lock quickly. The metric
                            represents the amount of time spent acquiring the lock, in milliseconds.
                        </li>
                        <li>
                            <strong><a id="WaitHandleWaitStacks">WaitHandleWait Stacks</a></strong> - This view is similar to <a href="#ContentionStacks">Contention Stacks</a>
                            in the sense that it lets you diagnose blocking waits. However, there are important differences.
                            Contention events are emitted exclusively on <code>System.Threading.Monitor</code> and <code>System.Threading.Lock</code> code paths.
                            WaitHandleWait events are lower level and fire when an OS-provided wait handle is used to block thread execution.
                            For example, WaitHandleWait will catch waiting inside <code>SemaphoreSlim.Wait()</code> while the Contention event won't.
                            WaitHandleWait can be fired alongside the Contention event: for instance, during <code>Monitor.Enter</code> if spin waiting
                            was not enough to acquire the lock. Also note that WaitHandleWait is much noisier: you will need to filter out
                            stacks for "legitimate" waiting, such as a main thread waiting for an application shutdown signal, or a consumer thread
                            that waits for new items in a blocking channel. WaitHandleWait may not cover all waits in your application, such as waits that
                            bypass the .NET runtime or waits on other synchronous OS APIs like a blocking write to a socket.
                            For more details, refer to <a href="https://github.com/dotnet/runtime/pull/94737#discussion_r1409607825">this diagram</a> from the original PR.
                        </li>
                        <li>
                            <strong><a id="AnyStacks">Any Stacks</a></strong> - This view shows
                            every event that has a stack.&nbsp; It is useful when none of the more specialized
                            stack views are available.&nbsp;&nbsp;
                        </li>
                        <li>
                            <strong><a id="AnyTaskTreeStacks">Any TaskTree</a></strong> - This view is designed to
                            show you the 'task view' of any event in the trace.  It is useful when looking at asynchronous
                            or parallel operations.    When using the System.Threading.Tasks library you can think
                            of all execution as being run in a particular task, which we call an activity.   Threads
                            are one kind of activity (denoted a 'Thread Activity') and any task started from the
                            task library is also an activity.   All activities besides Thread Activities have a parent
                            which started them.  Thus you can form a tree of 'parent' activities that ultimately will
                            end with a Thread Activity.  This is what the Any TaskTree view shows.   This view is
                            available whenever there are events from the System.Threading.Tasks.TplEventSource event
                            source.
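                            <p>
                                As a rough illustration of how such a chain of parents is formed, here is a minimal
                                Python sketch. It is not PerfView's implementation; the function <code>activity_path</code>,
                                the <code>parent_of</code> mapping and the activity names are hypothetical.
                            </p>
<pre>
```python
def activity_path(activity, parent_of):
    """Return the chain from the root Thread Activity down to activity.

    parent_of maps each non-thread activity to the activity that started it.
    Thread Activities have no entry because they have no parent.
    """
    path = [activity]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    path.reverse()
    return path


# Made-up trace: Thread 1234 started Task 3, which started Task 7.
parent_of = {"Task 7": "Task 3", "Task 3": "Thread 1234"}
print(activity_path("Task 7", parent_of))
```
</pre>
                            <p>
                                An event that fired while Task 7 was running would therefore be shown under
                                Thread 1234, then Task 3, then Task 7.
                            </p>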
                        </li>
                        <li>
                            <strong><a id="AnyStacks(withStartStopActivities)Stacks">Any Stacks (with StartStop Activities)</a></strong> - This view
                            is very much like the <a href="#AnyStacks">Any Stacks</a> view in that it shows all events that have
                            call stacks associated with them.  The only difference is that at the top of each stack (between the
                            process and thread frames) is a list of Start-Stop tasks (that is, activities; see
                            <a href="http://blogs.msdn.com/b/vancem/archive/2015/09/14/exploring-eventsource-activity-correlation-and-causation-features.aspx">
                                EventSource Activities
                            </a> on how to define your own).
                        </li>
                        <li>
                            <strong><a id="AnyStartStopTreeStacks">Any StartStopTree</a></strong> - This view
                            is a simplification of the <a href="#AnyStacks(withStartStopActivities)Stacks">Any Stacks (with StartStop Activities)</a> view.
                            Like that view, it shows every event in the context of the start-stop activities that are currently
                            active, however it does NOT show call stacks.  Thus this works on any traces with minimal events
                            and is useful when you don't want the extra detail.
                        </li>

                        <li>
                            <strong><a id="PerfViewJitStats">JitStats View</a></strong> - The JitStats view shows the
                            activity of the .NET Just in time (JIT) compiler.&nbsp; It shows exactly which methods were
                            JIT compiled, how big they are (both before and after JIT compilation), as well as the amount
                            of time it took to JIT compile the methods.&nbsp; This allows you to quickly
                            determine how much time can be saved by NGening various DLLs. This view also contains information
                            about the &#39;Background JIT&#39; feature of V4.5 of the runtime (as exposed by
                            System.Runtime.ProfileOptimization class) that speeds up startup of applications that
                            were not NGENed by JIT compiling on multiple processors.&nbsp; If background JIT is enabled,
                            this view will add two additional columns that track background JIT specific information.&nbsp;
                            The first is called DistanceAhead, and specifies in milliseconds how early the method was compiled
                            relative to when it was called.&nbsp; The second is called BlockedReason, and specifies why a method was not
                            background compiled.&nbsp; The most common reasons are that a module dependency has not yet
                            been loaded, or that playback aborted because of an unsatisfied module dependency triggering a
                            timeout.&nbsp; If a module dependency has not been satisfied, the name of the module appears
                            in this column.&nbsp; If a timeout is triggered, then the text &quot;Playback Aborted&quot; appears here.
                        </li>
                        <li>
                            <strong><a id="PerfViewEventStats">EventStats View</a></strong> - The EventStats
                            view shows the count roll-up of every event type that was collected in the trace.&nbsp;&nbsp;
                            It also shows which events have stack traces associated with them.&nbsp;&nbsp; This
                            view is mostly for PerfView diagnostic purposes when other views are not working
                            properly.&nbsp; It lets you quickly determine what is in the ETL file so you can
                            determine if it is a problem with data collection (the needed events are not present),
                            or presentation.
                        </li>
                        <li>
                            <strong><a id="Anti-MalwareReal-TimeScanStacks">Anti-Malware Realtime Scan Stacks</a></strong> - This view
                            shows the latency impact of realtime scan requests made by Windows Defender when an application does I/O.
                        </li>
                    </ul>  <!-- END OF ADVANCED GROUP -->
                </li>
                <li>
                    <strong><a id="OldGroup">The Old Group</a></strong> - This folder contains
                    views whose functionality has likely been superseded by another (typically more
                    general purpose) view.   These are likely to go away in the future after confirming
                    that all the functionality in them can be achieved using other views.
                    <ul>
                        <li>
                            <strong><a id="ServerRequestCPUStacks">Server Request CPU Stacks</a></strong> -
                            Shows CPU time (in milliseconds) rolled up by request.&nbsp; This view supports ASP.NET
                            and ASP.NET hosted WCF services.
                        </li>
                        <li>
                            <strong><a id="ServerRequestThreadTimeStacks">Server Request Thread Time Stacks</a></strong> -
                            Shows wall clock time rolled up by request.&nbsp; Wall clock time is comprised of CPU
                            time and blocked time and represents the amount of time that was spent on the request
                            regardless of how many threads were used.&nbsp; Any asynchronous operations are not rolled
                            up under the request, as they do not contribute to the wall clock time.&nbsp; This view supports
                            ASP.NET and ASP.NET hosted WCF services.
                        </li>
                        <li>
                            <strong><a id="ServerRequestManagedAllocationStacks">Server Request Managed Allocation Stacks</a></strong> -
                            This view performs the same task as the <a href="#GCHeapAllocStacks">GC Heap Alloc Stacks</a>
                            view, with the exception that results are rolled up by request. This view supports ASP.NET and
                            ASP.NET hosted WCF services.
                        </li>
                        <li>
                            <strong><a id="ASP.NETThreadTimeStacks">ASP.NET Thread Time Stacks</a></strong> -
                            The <a href="#PerfViewAspNetStats">ASP.NET Stats view</a> gives you a high level, aggregated view
                            of what is going on with ASP.NET requests over time.   The ASP.NET Thread Time Stacks view
                            lets you drill into more detail.   In particular this view works like the
                            <a href="#ThreadTimeStacks">Thread Time Stacks</a> view in that it shows you what every thread
                            is doing with its REAL (clock) time, however unlike the Thread Time Stacks view it groups the parts
                            of threads that are actually doing work on behalf of a particular request together.  Thus when
                            a request takes a long time (whether it be because of CPU or blocking on a database), that time
                            will show up in this view attributed to that request.  This makes it very straightforward to
                            quickly understand why requests take a long amount of real time to complete.
                            You will only get this view if you collected data with the
                            <a href="#ThreadTimeCheckbox">Thread Time</a> events.
                            See <a href="#BlockedTimeInvestigation">
                                Blocked time investigation
                            </a> for more details
                            on blocked time investigations.
                        </li>
                        <li>
                            <strong><a id="ASP.NETThreadTime(withTasks)Stacks">ASP.NET Thread Time (with Tasks) Stacks</a></strong> -
                            This view is basically a fusion of the <a href="#ASP.NETThreadTimeStacks">ASP.NET Thread Time Stacks</a> view
                            and the <a href="#ThreadTime(withTasks)Stacks">Thread Time (With Tasks) Stacks</a> view.
                            Like the ASP.NET Thread Time Stacks view, it shows you time grouped by request.  But like the Thread Time (with
                            Tasks) view, if that request causes other tasks to be spawned as part of its operation, those are
                            considered 'children' of that request and show up in the call stack that way.   This is
                            the most useful view if you are investigating an ASP.NET server that uses async as part of its implementation.
                        </li>
                        <li>
                            <strong><a id="ASP.NETThreadTime(CPUONLY)Stacks">ASP.NET Thread Time Stacks (CPU ONLY)</a></strong> -
                            Like the <a href="#ASP.NETThreadTimeStacks">ASP.NET Thread Time Stacks</a> view, except
                            that it indicates that only CPU samples (not thread time events) were captured, so the
                            view is impoverished.  It shows only the CPU time spent (it won't tell you where the
                            request spent time blocked on networking/file/locks).   To get the full view, the /threadTime
                            option must be enabled during data collection.
                        </li>
                    </ul> <!-- END OF OLD GROUP -->
                </li>
                <li>
                    <strong><a id="PerfViewAspNetStats">ASP.NET Stats</a></strong> - Shown when ASP.NET
                    events were collected (these are collected by default or when the &#39;ASP.NET&#39;
                    provider is specified in &#39;Additional Providers&#39;).&nbsp;&nbsp; This view
                    gives you an overall view of what your ASP.NET server processes were doing,
                    including what the average throughput was, the average request response time, and
                    other basic information to quickly assess your server&#39;s basic health.  You typically
                    drill into more detail by using the <a href="#ASP.NETThreadTimeStacks">ASP.NET Thread Time Stacks view</a>.
                </li>
                <li>
                    <strong><a id="PerfViewIisStats">IIS Stats</a></strong> - Shown when IIS ETW
                    events were collected (these are collected when the <b>IIS</b> checkbox is selected or the <b>IIS:WWW Server</b>
                    provider is specified in &#39;Additional Providers&#39;).&nbsp;&nbsp; This view
                    gives you an overall view of request processing at the IIS level by analyzing IIS ETW events.
                    It provides details on the top 100 slowest requests in the trace, gives details of the failed requests in the trace,
                    and allows you to drill down further on the modules causing slowness or failures. You can click on an individual request
                    to see all events fired in the IIS pipeline for that request. This view helps you diagnose slow performance in
                    various stages of request processing in the IIS pipeline.
                </li>
                <li>
                    <strong><a id="ETLEventSource">Events View</a></strong> - The Events view is the
                    &#39;raw&#39; view of the ETW events.&nbsp;&nbsp; Basically an ETL file is simply
                    a sequence of event payloads where each event consists of an event type, a timestamp,
                    the process and thread that generated the event, and additional event-specific information.&nbsp;&nbsp;
                    The Events view allows this information to be filtered (by time, process, and text),
                    and viewed in Excel.&nbsp;&nbsp;
                </li>
            </ul>
        <li>
            <strong><a id="CSVPerfViewData">XPERF CSV files (.CSV, .CSVZ files)</a></strong>
            - PerfView has the capability to read the .CSV files that XPERF can generate from
            .ETL files.&nbsp; This is not the recommended way of looking at ETW data because
            .CSV files tend to be 4 times larger, and a certain amount of information is lost
            when the file is converted to .CSV format.&nbsp;&nbsp;&nbsp; However sometimes that
            is all you have so PerfView supports basic operations on this data.&nbsp;
        </li>
        <li>
            <strong><a id="WTPerfViewFile">Windbg/CDB WT command output parsing (WT files)</a></strong>
            - Windbg/CDB has a very useful command called &#39;WT&#39; which will single step
            through a routine (and any sub-routines) and output statistics about how many instructions
            were executed in each routine.&nbsp;&nbsp; However this data is very voluminous
            and hard to read.&nbsp; If it is saved to a file with a .WT extension, then PerfView
            can read it and display it in the stack viewer, making analysis much easier.&nbsp;&nbsp;
        </li>
        <li>
            <strong>
                <a id="DebuggerStackPerfViewFile">
                    Windbg/DBG Debugger Stack Parser (.cdbstack
                    files)
                </a>
            </strong> - The WT command is useful for collecting fine-grained data
            about a particular routine using the debugger.&nbsp; It can also be useful to collect
            data about a particular resource (e.g. when some program API is called that allocates
            a resource) using a breakpoint that dumps the stack and then continues.&nbsp;&nbsp;
            If this output is placed in a file with a .cdbstack extension, then this can be
            viewed with PerfView.&nbsp;
        </li>
        <li>
            <strong>
                <a id="XmlPerfViewFile">
                    PerfView Stack Views (.PerfView.XML or .PerfView.XML.ZIP files)
                </a>
            </strong> - PerfView has the ability to save the data in a stack viewer
            as an XML file (or a zipped XML file).&nbsp;&nbsp; This is what the &#39;Save&#39;
            operation in the stack viewer does.&nbsp;&nbsp; Typically this file is MUCH smaller
            than the original ETL file but contains the important information for a broad variety
            of data analyses.&nbsp;&nbsp; This is often the best way to &#39;hand off&#39; an
            investigation to another programmer.&nbsp;&nbsp; It is also possible to
            generate this XML file using another program and thus view external data using PerfView's
            powerful stack viewer.   PerfView supports both the PerfView.xml format and a PerfView.json
            variation.
        </li>
        <li>
            <strong>
                <a id="XmlTreeFile">
                    Xml Tree Views (*.tree.xml files)
                </a>
            </strong> - XmlTree files encode tree-based data as XML trees where
            each frame of the call stack is represented by an XML 'node' element.   Currently
            this works for the export format of the Java YourKit profiler, but other export
            formats that have roughly the same structure may be added.
        </li>
        <li>
            <strong><a id="VmmapPerfViewFile">VMMAP data (.mmp files)</a></strong> - PerfView
            can read the (Version 3) .MMP files that are generated by the
            <a href="http://technet.microsoft.com/en-us/sysinternals/dd535533">VMMAP</a> utility.&nbsp;&nbsp;
            Typically this is valuable because PerfView can
            do diffs between two VMMAP files.&nbsp;&nbsp; PerfView&#39;s grouping operations
            are also handy.&nbsp;
        </li>
        <li>
            <strong><a id="ProcessDumpPerfViewFile">Process Dumps (.dmp files)</a></strong>
            - PerfView can extract information about the .NET GC Heap from a process dump file
            (created by Visual Studio, windbg, ntsd or cdb) in much the same way as with a live
            process.&nbsp;&nbsp; Double clicking on this entry performs the &#39;Take Heap Snapshot
            from Process Dump&#39; action.&nbsp;&nbsp;
        </li>
        <li>
            <strong>
                <a id="ClrProfilerHeapPerfViewFile">
                    .NET GC Heap (SOS format) (.gcHeap files)
                </a>
            </strong>- The .NET heap can be dumped using either PerfView or the&nbsp; <a href="http://msdn.microsoft.com/en-us/library/bb190764.aspx">SOS debugging extension</a>
            utility to create a .gcHeap file.&nbsp; Opening this file will display the heap
            in the stack viewer.&nbsp; The spanning tree is generated from the roots of the
            graph, and this tree is shown in the stack viewer.&nbsp;
        </li>
        <li>
            <strong>
                <a id="HeapDumpPerfViewFile">
                    .NET GC Heap (Dump format) (.gcDump files)
                </a>
            </strong>- This is the default format for dumping the GC heap in PerfView.
        </li>
        <li>
            <strong>
                <a id="ClrProfilerCodeSizePerfViewFile">
                    ClrProfiler data for CodeSize (.codeSize
                    files)
                </a>
            </strong> - If a data file (.log file) generated by
            <a href="http://en.wikipedia.org/wiki/CLR_Profiler">CLRProfiler</a> is renamed to
            have a .codeSize Extension, then PerfView can
            read it and give an analysis of the size of the code that was run during the data
            collection.&nbsp;&nbsp; This is useful for trying to minimize the cold startup time
            of managed code.&nbsp;
        </li>
        <li>
            <strong>
                <a id="ClrProfilerAllocStacksPerfViewFile">
                    ClrProfiler data for Allocations
                    (.allocStacks files)
                </a>
            </strong> - If a data file (.log file) generated by
            <a href="http://en.wikipedia.org/wiki/CLR_Profiler">CLRProfiler</a> is renamed to
            have a .allocStacks extension, then PerfView can
            read it and give an analysis of the allocations that were done, focusing on the
            stack (code location) where the allocations were made.&nbsp;&nbsp; This is in contrast
            to the Heap Dumps, which focus on the objects that REFER to the given object, not
            what code allocated it.
        </li>
        <li>
            <strong><a id="DiagSessionPerfViewFile">Diagnostics Session (.diagsession files)</a></strong>
            - PerfView can open and read Diagnostics Session files that contain resources PerfView
            understands. Supported resources include .NET GC Heap (Dump format) (.gcDump files),
            Process Dumps (.dmp files) and Visual Studio Diagnostics Hub ETL (.etl) files.
        </li>
    </ul>
    <hr />
    <!--  ****************** -->
    <!--  ****************** -->
    <h3><a id="ObjectViewerQuickStart">Quick Start for the Object Viewer</a></h3>
    <p>
        TODO NOT DONE
    </p>
    <ul>
        <li>TODO NOT DONE</li>
        <li>TODO NOT DONE</li>
    </ul>
    <hr />
    <!--  ****************** -->
    <h3><a id="ObjectViewerTips">Object Viewer Tips</a></h3>
    <p>
        In addition to the <a href="#GeneralTips">General Tips</a>, here are tips specific
        to the <a href="#ObjectViewer">Object Viewer</a>.
    </p>
    <ul>
        <li><strong>TODO</strong>&nbsp;-  fill in</li>
    </ul>
    <hr />
    <!--  ****************** -->
    <h3><a id="ObjectViewer">The Object Viewer</a></h3>
    <p>
        The object viewer is a view that lets you see specific information about an
        individual object on the GC heap.
    </p>
    <p>TODO NOT DONE </p>
    <hr />
    <!--  ****************** -->

    <h3>
        <a id="StackViewerQuickStart">Quick Start for the Stack Viewer</a>
    </h3>
    <p>
        While we do recommend that you walk through the <a href="#Tutorial">tutorial</a>, if your
        goal is to understand what the stack viewer is showing you, follow these steps:
    </p>
    <ul>
        <li>
            The first view displayed is the &#39;ByName&#39; view suitable for a <a href="#TutorialBottomUp">
                bottom
                up investigation
            </a>.&nbsp; The items in this view are sorted by the time that
            was spent <strong>exclusively</strong> in each item.&nbsp;&nbsp; Once you have
            determined that CPU is your problem, the top items in this list are the ones
            you want to look at.
        </li>
        <li>
            If there are ? in the names of items at the top of this list, you need to select
            the cell, right click and select &#39;Lookup Symbols&#39;.&nbsp; See <a href="#ResolvingUnmanagedSymbols">
                resolving
                unmanaged symbols
            </a> for more.&nbsp; If all the time is spent in a
            node that looks like &#39;OTHER&lt;&lt;DLL!Function&gt;&gt;&#39;, it means that PerfView&#39;s
            default grouping is &#39;too strong&#39; and is grouping too much for your scenario.&nbsp;
            See <a href="#StackViewerTroubleshooting">stack viewer troubleshooting</a> for more.&nbsp;
        </li>
        <li>
            You should make sure that you are looking at an interesting time.&nbsp;&nbsp; In
            particular, when profiling is active at process shutdown, there is overhead that
            you likely want to exclude.&nbsp;&nbsp; The easiest way to do this is to restrict
            your analysis to the time in which your Main method was active.&nbsp;&nbsp;&nbsp;
            To do this find Main in the ByName view (Ctrl F-&gt; type Main &lt;Enter&gt;) and
            select the first and last time by Ctrl Clicking on both of those entries then Right
            click -&gt; Set Time Range.&nbsp;&nbsp; See <a href="#ZoomingToARangeOfInterest">
                zooming
                to a range of interest
            </a> for more.
        </li>
    </ul>
    <hr />
    <!--  ****************** -->
    <h3>
        <a id="StackViewerDefaults">Setting Defaults in Stack Viewer</a>
    </h3>
    <p>
        You can set the default values used in the GroupPats, FoldPats and Fold % text boxes using the 'File -> Set As Default Grouping/Folding'
        menu item.  These three values are persisted across PerfView sessions for that machine.   The 'File -> Clear User Config'
        menu item will reset these persisted values to their defaults, which is a simple way to undo a mistake.
    </p>
    <h3>
        <a id="StackViewerQuickStartGCHeap">Quick Start for the GC Heap Viewer</a>
    </h3>
    <p>
        While we do recommend that you walk through the <a href="#TutorialGCHeap">tutorial</a>,
        and review <a href="#UnderstandingPerfDataGCHeap">Understanding GC Heap Perf Data</a>
        and <a href="#StartingAnAnalysisGCHeap">Starting an Analysis of GC Heap Dump</a>,
        if your goal is to see your memory profile data as quickly as possible, follow
        these steps:
    </p>
    <ol>
        <li>
            Determine if memory is of interest (see <a href="#WhenToCareAboutMemory">
                When to
                care about Memory
            </a> and in particular <a href="#WhenToCareAboutTheGCHeap">
                When
                to care about the GC Heap
            </a>), then take a GC heap snapshot (Memory -&gt; Take Heap Snapshot).
        </li>
        <li>
            Understand what the GC stack viewer is showing you, and in particular <a href="#PrimaryAndSecondaryNodes">
                what
                the difference between primary and secondary nodes is
            </a>.
        </li>
        <li>
            Do a bottom-up analysis of objects as described in <a href="#StartingAnAnalysisGCHeap">
                Starting a GC Heap Analysis
            </a>.
        </li>
    </ol>
    <hr />
    <!--  ****************** -->
    <h3>
        <a id="StackViewerTips">Stack Viewer Tips</a>
    </h3>
    <p>
        In addition to the <a href="#GeneralTips">General Tips</a>, here are tips specific
        to the <a href="#StackViewer">Stack Viewer</a>.
    </p>
    <ul>
        <li>
            <strong>Setting the default Grouping and Folding values</strong>&nbsp;-
            If you don't like the default grouping, you can change it.   Use
            File -> Set As Default Grouping/Folding to set the defaults to the current values.
            See <a href="#StackViewerDefaults">Setting Defaults in Stack Viewer</a> for more.
        </li>
        <li>
            <strong>Zooming in to a Time Range using the &#39;When&#39; Field</strong>&nbsp;-
            If you click on a cell in the &#39;When&#39; column it becomes editable.&nbsp;
            Select a region of text and then press Alt-R, which will zoom into the
            time range associated with the selected characters.&nbsp;
        </li>
        <li>
            <strong>Resolving Unmanaged Symbols </strong>&nbsp;- Select a range of cells (by
            dragging or shift-clicking) and then right click and select &#39;Lookup Symbols&#39;
        </li>
        <li>
            <strong>Goto Source</strong> - You can select a name in the stack viewer, right click
            and select 'Goto Source' (Alt-D), and it will open a text editor with the source
            code of the file with that method where each line is annotated with the metric on
            that line. This feature is very valuable, but can be fragile. See <a href="#SourceCodeLookup">
                Source
                Code Lookup
            </a> for more.
        </li>
        <li>
            <strong><a href="#ColumnSorting">Sorting by other columns</a></strong> - There is
            a small control directly to the right of each column name.&nbsp; If you click in
            this region, you can sort by that column (clicking again reverses the order of the
            sort).&nbsp;&nbsp;
        </li>
        <li>
            <strong>Rearranging columns</strong> - By dragging (click and move mouse), on a
            column header, you can move a column before or after where it is by default.&nbsp;&nbsp;
            This can be useful before you cut and paste.&nbsp;
        </li>
        <li>
            <strong>Saving Stack Views </strong>&nbsp;- Ctrl-S or File-&gt;Save in a stack viewer
            will save that view to a file. This saves the filtering options that were selected,
            the symbol names that were looked up, the log
            file, and any notes you made associated with the view.&nbsp;&nbsp; You can open this
            file later and immediately return to the analysis you were doing.&nbsp;
        </li>
        <li>
            <strong>StackView Notes </strong>&nbsp;- There is a text-editor pane in the stack
            viewer in which you can write arbitrary notes to yourself (or paste in the names of
            important methods or other data from the view).&nbsp; These notes will be saved
            when the view is saved.&nbsp;
        </li>
        <li>
            <strong>Opening Tree nodes</strong> - The space bar will open the currently selected
            tree node and move to the first child of the opened node.&nbsp; This means that
            you can open up the &#39;hottest&#39; path in a tree simply by repeatedly hitting
            the space bar.&nbsp;&nbsp;
        </li>
        <li>
            <strong>Cutting and pasting time ranges</strong> - If you type Alt-T (right click
            -&gt; Copy Time Range) it will copy the start and end times to the clipboard.
            These can then be pasted into the &#39;Start&#39; text box to clone a time range
            from one view to another.&nbsp;
        </li>
        <li>
            <strong>Viewing event data</strong> - If you wish to see all the events (and their
            associated information), you can select one or two times in the stack viewer and
            hit Alt-V to view that time range in the event viewer.
        </li>
    </ul>
    <hr />
    <!--  ****************** -->
    <h3>
        <a id="StackViewer">The Stack Viewer</a>
    </h3>
    <p>
        The stack viewer is the main window for doing performance analysis.&nbsp; If you have
        not walked through the <a href="#Tutorial">tutorial</a> or the sections on <a href="#StartingAnAnalysis">
            starting
            an analysis
        </a> and <a href="#UnderstandingPerfData">understanding perf data</a>,
        these would be good to read.&nbsp;&nbsp; Here is the layout of the stack viewer:
    </p>
    <center>
        <img src="images/stackViewer.png" alt="StackViewer" />
    </center>
    <p>
        The stack viewer has three main views:&nbsp; <a href="#ByNameView">ByName</a>, <a href="#CallerCalleeView">Caller-Callee</a>, and <a href="#CallTreeView">CallTree</a>.&nbsp;&nbsp;
        Each view has its own tab in the stack viewer and can be selected using these
        tabs.&nbsp; More typically, however, you use right click or keyboard shortcuts to
        jump from a node in one view to the same node in another view.&nbsp;&nbsp; Double
        clicking on any node in any view will bring you to the Caller-Callee view and
        set the focus to that node.&nbsp;
    </p>
    <p>
        Regardless of what view is selected, the samples under consideration and the grouping
        of those samples are the same for every view.&nbsp; This filtering and
        grouping is controlled by the text boxes at the top of the view and is described
        in detail in the <a href="#FilteringGroupingStackData">section on grouping and filtering</a>.&nbsp;
    </p>
    <p>
        At the very top of the stack viewer is the summary statistics line.&nbsp; This gives
        you statistics about all the samples, including count and total duration.&nbsp;
        It computes the &#39;TimeBucket&#39; size, which is defined as 1/32 of the
        total time interval of the trace.&nbsp;&nbsp; This is the amount of time that is
        represented by each character in the <a href="#WhenColumn">When column</a>.<br>
        It also computes the <strong>Metric/Interval</strong>. This is a quick measurement of how
        CPU bound the trace is as a whole. A value of 1 indicates a program
        that on average consumes all the CPU from a single processor. Unless this value is high, your problem is not CPU (it may be some blocking operation such as a network or disk read).<br>
        However this metric is averaged over the time data was collected, so it can include
        time when the process of interest is not even running.&nbsp; Thus it is typically better
        to use the <a href="#WhenColumn">When column</a> for the node representing the process
        as a whole to determine how CPU bound a process is.&nbsp;
    </p>
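    <p>
        As a rough illustration of the arithmetic behind the summary line, the following Python sketch
        computes a TimeBucket size and a Metric/Interval value from a list of samples.  The function name,
        the (time, metric) sample representation and the dictionary keys are assumptions made for this
        example only; they are not part of PerfView.
    </p>
<pre>
```python
def summary_stats(samples, start_msec, end_msec, buckets=32):
    """Sketch of the summary-line arithmetic (hypothetical helper, not PerfView code).

    samples is a list of (time_msec, metric_msec) pairs, where the metric is
    the cost attributed to the sample (for CPU traces, msec of CPU time).
    """
    interval = end_msec - start_msec
    total_metric = sum(metric for _, metric in samples)
    return {
        "count": len(samples),
        "total_metric": total_metric,
        # Each character of the When column covers 1/32 of the interval.
        "time_bucket": interval / buckets,
        # 1.0 means that, on average, one whole processor was busy.
        "metric_per_interval": total_metric / interval,
    }


# 1600 samples of 1 msec each, spread over a 3200 msec trace.
samples = [(2.0 * i, 1.0) for i in range(1600)]
stats = summary_stats(samples, 0.0, 3200.0)
print(stats["time_bucket"], stats["metric_per_interval"])
```
</pre>
    <p>
        In this made-up trace each When character covers 100 msec and the Metric/Interval is 0.5, meaning
        that on average half of one processor was in use.
    </p>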
    <p>
        In addition to the grouping/filtering textboxes, the stack viewer also has a <a href="#FindTextBox">find textbox</a>,
        which allows you to search (using .NET regular expressions)
        for nodes with particular names.&nbsp;
    </p>
    <!--  ****************** -->
    <h3>Column Descriptions</h3>
    <p>
        The columns displayed in the stack viewer grids are independent of the view displayed.&nbsp;&nbsp;
        Columns can be reordered simply by dragging the column headers to the location you
        wish, and most columns can be sorted by clicking on an (often invisible) button
        in the column header directly to the right of the column header text.&nbsp;
        The columns that are displayed are:
    </p>
    <ul>
        <li>
            <a id="NameColumn"><strong>Name</strong></a> - Each frame on the stack is given
            a name. It starts out as a name of the form module!fullMethodName but may be morphed
            by grouping.&nbsp;&nbsp; There might also be a suffix of the form [N-M frames].&nbsp;&nbsp;
            This is used in the <a href="#CallTreeView">CallTree view</a> whenever a node has
            only one child and that child has the same name as the node itself.&nbsp; In this case there is no interesting
            information in the chain of calls, so the frames are combined into a single node. However
            the node is annotated with the minimum and maximum number of frames that were combined
            for any particular call stack to show that this transformation happened.&nbsp;
            This combining occurs most frequently when the frame name is a group.&nbsp;
        </li>
        <li>
            <a id="ExcColumn"><strong>Exc</strong></a> - The amount of cost (msec of CPU time)
            that can be attributed to the particular method itself (not any of its callees).
            Note that this DOES include any cost that was folded into this node because of <a href="#FoldPatsTextBox">FoldPats</a> or <a href="#FoldPercentTextBox">Fold %</a>
            specifications. <a href="#ColumnSorting">Can sort by it</a>.
        </li>
        <li>
            <a id="ExcPercentColumn"><strong>Exc %</strong></a> - The exclusive cost expressed
            as a percentage of the total cost of all samples.&nbsp; <a href="#ColumnSorting">
                Can
                sort by it
            </a>.
        </li>
        <li>
            <a id="ExcCountColumn"><strong>Exc Ct</strong></a> - The count of samples (instances)
            that are associated with just this entry (not its children). This <b>does</b> include
            any instances included because of <a href="#FoldPatsTextBox">FoldPats</a> or
            <a href="#FoldPercentTextBox">Fold %</a> specifications. <a href="#ColumnSorting">Can sort by it</a>.
        </li>
        <li>
            <a id="FoldColumn"><strong>Fold %</strong></a> The exclusive cost that has been
            folded (inlined) into this node because of <a href="#FoldPatsTextBox">FoldPats</a>
            or <a href="#FoldPercentTextBox">Fold %</a> specifications. <a href="#ColumnSorting">Can sort by it</a>.
        </li>
        <li>
            <a id="FoldCountColumn"><strong>Fold Ct</strong></a> - The count of samples (instances) that have been
            folded (inlined) into this node because of <a href="#FoldPatsTextBox">FoldPats</a>
            or <a href="#FoldPercentTextBox">Fold %</a> specifications.
            <a href="#ColumnSorting">Can sort by it</a>.
        </li>
        <li>
            <a id="IncColumn"><strong>Inc</strong></a> - The cost associated with this node
            as well as all its children (callees) recursively.&nbsp;&nbsp; The inclusive cost
            of the ROOT contains all costs.&nbsp; <a href="#ColumnSorting">Can sort by it</a>.
        </li>
        <li>
            <a id="IncPercentColumn"><strong>Inc %</strong></a> - The inclusive cost expressed
            as a percentage of the total cost of all samples (will be 100% for the ROOT node).
            <a href="#ColumnSorting">Can sort by it</a>.
        </li>
        <li>
            <a id="IncCountColumn"><strong>Inc Ct</strong></a> - The count of samples (instances)
            that are associated with this entry or any of its children (callees) recursively. <a href="#ColumnSorting">Can sort by it</a>.
        </li>
        <li>
            <a id="IncAvgColumn"><strong>Inc Avg</strong></a> - The average metric per sample for the samples (instances)
            that are associated with this entry or any of its children (callees) recursively. This is simply InclusiveMetric / InclusiveCount.
        </li>
        <li>
            <a id="WhenColumn"><strong>When</strong></a> - This is a visualization of how the
            INCLUSIVE samples collected for that node vary over time.&nbsp;&nbsp; The total
            range (from the Start and End text boxes) is divided into 32 &#39;TimeBuckets&#39;
            and the inclusive samples for that node are accumulated into those 32 buckets.&nbsp;&nbsp;
            Each bucket is then represented as a character that represents a scaled value.&nbsp;&nbsp;
            <ul>
                <li>_ means no samples occurred in that bucket.&nbsp; </li>
                <li>. means that interval consumed between&nbsp; 0% and .1%. </li>
                <li>o means that interval consumed between&nbsp; .1% and 1%. </li>
                <li>0 means that interval consumed between&nbsp; 1% and 10%. </li>
                <li>1 means that interval consumed between 10% and 20% </li>
                <li>...</li>
                <li>9 means that interval consumed between 90% and 100%</li>
                <li>A means that interval consumed between 100% and 110%</li>
                <li>...</li>
                <li>Z means that interval consumed between 350% and 360%</li>
                <li>* means that interval consumed over 360%</li>
                <li></li>
                <li>a means that interval consumed between&nbsp; 0% and -10%</li>
                <li>b means that interval consumed between -10% and -20%</li>
                <li>...</li>
                <li>z means that interval consumed between -250% and -260%</li>
                <li>* means that interval consumed below -260%</li>
            </ul>
            For resources like CPU, disk, or blocked time, where there is an obvious relationship
            of consuming a resource (cpu, disk, thread) for a period of time, 100% represents
            consuming 1 of those resources for the period of time of the bucket.&nbsp;&nbsp;
            Thus for CPU, &#39;A&#39; would represent consuming a single CPU for the duration
            of the bucket.&nbsp;&nbsp;&nbsp; For these metrics you can get greater than 100%
            by consuming multiple resources (e.g. if on average you consume 2 CPUs over an interval
            then you will get &#39;K&#39; (200%)).&nbsp;&nbsp;&nbsp; If the metric does not
            have a relationship with time (e.g. memory allocation), then 100% is simply half
            the maximum value over all buckets (that is, we scale it so that you will always
            get one &#39;K&#39;).&nbsp; It is quite useful to select time ranges based on the
            &#39;When&#39; field to &#39;zoom in&#39; on an area of high CPU usage.&nbsp;&nbsp;
            See <a href="#SelectingTime">selecting time ranges</a> for more.&nbsp;
        </li>
        <li>
            <a id="FirstColumn"><strong>First</strong></a> - This is the time (in msec from
            the beginning of the trace) of the <strong>first</strong> inclusive sample associated
            with this name.&nbsp; See <a href="#SelectingTime">selecting time ranges</a> for
            more.&nbsp; <a href="#ColumnSorting">Can sort by it</a>.
        </li>
        <li>
            <a id="LastColumn"><strong>Last</strong></a> - This is the time (in msec from the beginning of the
            trace) of the <strong>last</strong> inclusive sample associated with this name.&nbsp;
            See <a href="#SelectingTime">selecting time ranges</a> for more.&nbsp; <a href="#ColumnSorting">Can sort by it</a>.
        </li>
    </ul>
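    <p>
        The character scale described for the When column can be sketched in a few lines of Python.
        This is only an illustration of the table above, not PerfView&#39;s actual code: the function
        names <code>when_char</code> and <code>scale_untimed</code> are hypothetical, and the treatment
        of values that fall exactly on a boundary (for example exactly 10%) is an assumption.
    </p>

```python
import string


def when_char(percent, has_samples=True):
    """Map one bucket's scaled percentage to a When-column character."""
    if not has_samples:
        return "_"
    if percent < 0:
        index = int(-percent // 10)       # a = 0% to -10%, b = -10% to -20%, ...
        return string.ascii_lowercase[index] if index < 26 else "*"
    if percent < 0.1:
        return "."
    if percent < 1:
        return "o"
    if percent < 10:
        return "0"
    if percent < 100:
        return str(int(percent // 10))    # 1 = 10% to 20%, ..., 9 = 90% to 100%
    index = int((percent - 100) // 10)    # A = 100% to 110%, ..., Z = 350% to 360%
    return string.ascii_uppercase[index] if index < 26 else "*"


def scale_untimed(values):
    """Scale a metric with no time relationship so the largest bucket is 200% ('K')."""
    largest = max(values)
    if largest == 0:
        return [0.0 for _ in values]
    return [200.0 * value / largest for value in values]


# Two CPUs busy for a whole bucket is 200%, which is shown as 'K'.
print(when_char(200))
```

    <p>
        Joining <code>when_char</code> over all 32 buckets would give the string shown in the column.
    </p>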
    <h4><a id="ColumnSorting">Column Sorting</a></h4>
    <p>
        Many of the columns in the PerfView display can be used to sort the display.  You do this by clicking on the column header
        at the top of the column.   Clicking again switches the direction of the sort.   Be sure to avoid clicking on the hyperlink text
        (it is easy to accidentally click on the  hyperlink).   Clicking near the top typically works, but you may need to make the column
        header larger (by dragging one of the column header separators).   There is already a request to change the hyperlinks so that
        it is easier to access the column sorting feature.
    </p>
    <p>
        There is a known bug that once you sort by a column the search functionality does not respect the new sorted order.  This means
        that searches will seem to randomly jump around when finding the next instance.
    </p>
    <!--  ****************** -->
    <h3>
        <a id="ByNameView">ByName View (Group by Method)</a>
    </h3>
    <p>
        The default view for the stack viewer is the ByName View.&nbsp; In this view EVERY
        node (method or group) is displayed, sorted by the total EXCLUSIVE time for that
        node.&nbsp; This is the view you would use for a <a href="#TopDownBottomUpAnalysis">bottom up analysis</a>.&nbsp;&nbsp;&nbsp;
        See the <a href="#TutorialBottomUp">tutorial</a>
        for an example of using this view.&nbsp;&nbsp; Double clicking on entries will send
        you to the <a href="#CallerCalleeView">Caller-Callee View</a> for the selected node.&nbsp;
    </p>
    <p>
        &nbsp;See <a href="#StackViewer">stack viewer</a> for more.&nbsp;
    </p>
    <!--  ****************** -->
    <h3>
        <a id="CallTreeView">CallTree View</a>
    </h3>
    <p>
        The call tree view shows how each method calls other methods and how many samples
        are associated with each of these calls, starting at the root.&nbsp;&nbsp;&nbsp;
        It is an appropriate view for doing a <a href="#TopDownBottomUpAnalysis">top down analysis</a>.&nbsp;&nbsp;
        Each node has a checkbox associated with it that displays all the children of that
        node when checked.&nbsp;&nbsp; By checking boxes you can drill down into particular
        methods and thus discover how any particular call contributes to the overall CPU
        time used by the process.&nbsp;&nbsp;
    </p>
    <center>
        <img src="images/CallTreeView.png" alt="CallTreeView" />
    </center>
    <p>
        The call tree view is also well suited for &#39;zooming in&#39; to a region of interest.&nbsp;&nbsp;
        Often you are only interested in the performance of a particular part of the program
        (e.g., the time between a mouse click and the display update associated with that click).&nbsp;&nbsp;
        These regions of time can typically be easily discovered by either looking for regions
        of high CPU utilization using the When column on the Main program node, or by finding
        the name of a function known to be associated with the activity and using the &#39;SetTimeRange&#39;
        command to limit the scope of the investigation.&nbsp;
    </p>
    <p>
        Like all stack-viewer views, the grouping/filtering parameters are applied before
        the calltree is formed.&nbsp;
    </p>
    <p>
        If the stack viewer window was started to display the samples from all processes,
        each process is just a node off the &#39;ROOT&#39; node.&nbsp;&nbsp;&nbsp; This
        is useful when you are investigating &#39;why is my machine slow&#39; and you don&#39;t
        really know what process to look at.&nbsp;&nbsp; By opening the ROOT node and looking
        at the When column, you can quickly see which process is using the CPU and over&nbsp;
        what time period.&nbsp;
    </p>
    <p>
        See the <a href="#TutorialTopDown">tutorial</a> for an example of using this view.&nbsp;&nbsp;
        See <a href="#StackViewer">stack viewer</a> for more.&nbsp;
        See <a href="#FlameGraphView">flame graph</a> for a different visual representation.
    </p>
    <!--  ****************** -->
    <h3>
        <a id="CallerCalleeView">Caller Callee View</a>
    </h3>
    <p>
        The caller-callee view is designed to allow you to focus on the resource consumption
        of a single method.&nbsp;&nbsp;&nbsp;&nbsp; Typically you get here
        from either the ByName or CallTree view by double-clicking on a node name.&nbsp;&nbsp;
        If you have a particular method you are interested in, search for it (<a href="#FindTextBox">
            find
            textbox
        </a>) in the <a href="#ByNameView">ByName view</a> and then double click
        on the entry.&nbsp;
    </p>
    <center>
        <img src="images/CallerCalleeView.png" alt="CallerCalleeView" />
    </center>
    <p>
        The Caller-Callee view has the concept of the &#39;Current Node&#39;.&nbsp; This is the
        node of interest and is the grid line in the center of the display.&nbsp;&nbsp;
        The display then shows all nodes (methods or groups) that were called by that current
        node in the lower pane and all nodes that called the current node in the upper pane.&nbsp;&nbsp;
        By double clicking on nodes in either the upper or lower pane you can change the
        current node to a new one, and in that way navigate up and down the call tree.
    </p>
    <p>
        Unlike the CallTree view, however, a node in the Caller-Callee view represents ALL
        calls of the current node.&nbsp;&nbsp;&nbsp; For example in the CallTree view the
        node representing &#39;SpinForASecond&#39; represents all instances of that function
        that have the SAME PATH TO THE ROOT.&nbsp;&nbsp; Thus you will see several instances
        of &#39;SpinForASecond&#39; in the CallTree view.&nbsp;&nbsp; However if you were trying
        to understand the impact of &#39;SpinForASecond&#39; on the whole program, it would
        be hard to do so in the CallTree view because you would have to look at all those nodes.&nbsp;&nbsp;
        The Caller-Callee view aggregates all the different paths to &#39;SpinForASecond&#39;
        so you can understand quickly ALL the callers of &#39;SpinForASecond&#39; and all
        the callees of &#39;SpinForASecond&#39; over the entire program.&nbsp;
    </p>
    <p>
        It is important to realize that as you double click on different nodes to make them
        current the SET OF SAMPLES CHANGES.&nbsp;&nbsp; When the current node is &#39;SpinForASecond&#39;
        then this view shows ONLY samples that had &#39;SpinForASecond&#39; in their call stack.&nbsp;&nbsp;
        However if you double click on &#39;DateTime.get_Now&#39; (a child of &#39;SpinForASecond&#39;)
        then the view will now include samples where &#39;DateTime.get_Now&#39; was
        called by call stacks that did not include &#39;SpinForASecond&#39; and will NOT
        include call stacks that called &#39;SpinForASecond&#39; but not &#39;DateTime.get_Now&#39;.&nbsp;&nbsp;&nbsp;
        This can be confusing if you are not aware it is happening.
    </p>
    <p>
        Sometimes you wish to view all the ways you can get to the root from a particular
        node.&nbsp;&nbsp; You can&#39;t do this using the caller-callee view directly because
        of the issue of changing sample sets.&nbsp;&nbsp; You can simply search for the
        node in the CallTree view, however it will not sort the paths by weight, which makes
        finding the &#39;most important&#39; path more difficult.&nbsp;&nbsp; You can however
        select the current node, right click and select &#39;Include Item&#39;.&nbsp; This
        will cause all samples that do NOT include the current node to be filtered away.&nbsp;&nbsp;
        This should not change the current caller-callee view because that view already
        only considered samples that included the current node.&nbsp;&nbsp; Now however as
        you make other nodes current, they TOO will only consider samples that include
        the original node as well as the new current node.&nbsp;&nbsp; By clicking on caller
        nodes you can trace a path back to the root.&nbsp;
    </p>
    <p>
        Because the caller-callee view aggregates ALL samples which have the current node
        ANYWHERE in its call stack there is a fundamental problem with recursive functions.&nbsp;&nbsp;
        If a single method occurs multiple times on the stack a naive approach would count
        the same SINGLE sample MULTIPLE times (once for each instance on the call stack),
        leading to erroneous results.&nbsp;&nbsp; You can solve the double-counting problem
        by only counting the sample for the first (or last) instance on the stack, but this
        skews the caller-callee view (it will look like the recursive function never calls
        itself which is also inaccurate).&nbsp;&nbsp; The solution that PerfView chooses
        is to &#39;split&#39; the sample.&nbsp;&nbsp; If a function occurs N times on the
        stack then each instance is given a sample size of 1/N.&nbsp;&nbsp; Thus the sample
        is not double-counted but it also shows all callers and callees in a reasonable
        way.&nbsp;&nbsp;&nbsp;
    </p>
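    <p>
        The 1/N splitting can be illustrated with a small Python sketch. It is a simplified model
        of the idea described above, not PerfView&#39;s implementation: the function name
        <code>split_sample</code> and the frame names <code>Main</code>, <code>Fib</code> and
        <code>Leaf</code> are hypothetical, and a stack is assumed to be a plain list of frame names
        from the root to the leaf.
    </p>

```python
from collections import defaultdict
from fractions import Fraction


def split_sample(stack, focus, metric=1):
    """Split one sample across every instance of `focus` on its stack.

    Returns (callers, callees): dictionaries mapping a frame name to the
    share of the sample attributed to it.
    """
    positions = [i for i, name in enumerate(stack) if name == focus]
    if not positions:
        return {}, {}
    callers, callees = defaultdict(Fraction), defaultdict(Fraction)
    share = Fraction(metric, len(positions))   # each of the N instances gets 1/N
    for i in positions:
        if i > 0:
            callers[stack[i - 1]] += share
        if i + 1 < len(stack):
            callees[stack[i + 1]] += share
    return dict(callers), dict(callees)


# One sample whose stack contains the recursive function three times.
callers, callees = split_sample(["Main", "Fib", "Fib", "Fib", "Leaf"], "Fib")
print(callers)   # Main gets 1/3, Fib (calling itself) gets 2/3
print(callees)   # Fib gets 2/3, Leaf gets 1/3
```

    <p>
        The shares in each dictionary add up to exactly one sample, so nothing is double-counted,
        yet the recursive call still shows up as both a caller and a callee.
    </p>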
    <p>
        &nbsp;See <a href="#StackViewer">stack viewer</a> for more.&nbsp;
    </p>
    <!--  ****************** -->
    <h3>
        <a id="CallersView">Callers View</a>
    </h3>
    <p>
        The callers view shows you all possible callers of a method.&nbsp;&nbsp; It is a
        treeview (like the calltree view), but the &#39;children&#39; of the nodes are the
        &#39;callers&#39; of the node (thus it is &#39;backwards&#39; from the calltree
        view).&nbsp;&nbsp;&nbsp;&nbsp; A very common methodology is to find a node in the
        &#39;byname&#39; view that is reasonably big, look at its callers (by double
        clicking on the entry in the byname view), and then look to see if there are better
        semantic groupings &#39;up the stack&#39; that this node should be folded into.&nbsp;
    </p>
    <p>
        If you double click on an entry in the Callers view it becomes the focus node for
        the callers view, callees view and caller-callees view.&nbsp; Thus it is fairly
        common to double click on an entry, switch to the Callees view, double click on
        another entry and switch back.&nbsp;
    </p>
    <p>
        In the callers view the top node is always the aggregation of all uses of a particular
        method regardless of the caller. Thus the top line's statistics should always agree
        with the statistics in the 'By Name' view. Moreover any children of a node represent
        the callers of the parent node. This means:
    </p>
    <ul>
        <li>
            The sum of the inclusive time of all child nodes will be equal to the parent's
            inclusive time.
        </li>
    </ul>
    <p>
        Any children in the Callers view represent callers of the parent node. These will
        always have an exclusive time of 0, because by definition a caller is NOT the terminal
        method of the stack (since it called something else).
    </p>
    <h4>
        <a id="CallerAndCalleeRecursion">Handling of Recursion in the Caller and Callees view</a>
    </h4>
    <p>
        Both the callers view and the callees view are formed by finding all samples that
        contain the focus frame and looking at the appropriate related frame (caller or callee).
        However when the focus frame is a recursive function there is an ambiguity because
        there are multiple choices for the caller and callees depending on which recursion
        instance is chosen.
    </p>
    <p>
        PerfView resolves this by always choosing the 'deepest' instance of the recursive
        function in the stack. Thus if A calls B calls C calls B calls D, and the focus
        node was B, then this sample would have a caller of C (not A) and a callee of D
        (not C).
    </p>
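    <p>
        The &#39;deepest instance&#39; rule can be written down as a short Python sketch. The function
        name <code>caller_and_callee</code> is hypothetical and the stack is assumed to be a list of
        frame names from the root to the leaf; it reproduces the A, B, C, B, D example above rather
        than PerfView&#39;s own code.
    </p>

```python
def caller_and_callee(stack, focus):
    """Return (caller, callee) of the deepest `focus` frame, or None if absent."""
    if focus not in stack:
        return None
    # The deepest instance is the last occurrence when walking root to leaf.
    deepest = len(stack) - 1 - stack[::-1].index(focus)
    caller = stack[deepest - 1] if deepest > 0 else None
    callee = stack[deepest + 1] if deepest + 1 < len(stack) else None
    return caller, callee


print(caller_and_callee(["A", "B", "C", "B", "D"], "B"))   # ('C', 'D')
```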
    <!-- TODO EXPLAIN RECURSION -->
    <!--  ****************** -->
    <h3>
        <a id="CalleesView">Callees View</a>
    </h3>
    <p>
        The callees view is a treeview that shows all possible callees of a given node.&nbsp;&nbsp;
        It is very similar to the calltree view, but where the calltree view always starts at the
        root, the callees view always starts at the &#39;focus&#39; node and includes ALL
        stacks that reach that node.&nbsp;&nbsp; In the calltree view the different instances
        of the node would be scattered across the call tree, and would be hard to focus
        on.&nbsp;
    </p>
    <p>
        If you double click on an entry in the Callees view it becomes the focus node for
        the callees view, callers view and caller-callees view.&nbsp; Thus it is fairly
        common to double click on an entry, switch to the Callers view, double click on
        another entry and switch back.&nbsp;
    </p>
    <p>
        Like the Callers view, there is an issue with double counting when recursive functions
        are involved. See <a href="#CallerAndCalleeRecursion">
            Handling of Recursion in the Caller
            and Callees view
        </a> for more.
    </p>
    <!--  ****************** -->
    <h3>
        <a id="FlameGraphView">Flame Graph View</a>
    </h3>
    <p>
        The flame graph view shows the same data as the call tree view, but using a different visualization.&nbsp;&nbsp;&nbsp;
        It gives you a very intelligible overview.&nbsp;&nbsp;
        The graph starts at the bottom. Each box represents a method in the stack. Every parent is the caller, and its children are the callees.
        The wider the box, the more time it was on-CPU. The sample count is shown in the tooltip and in the bottom panel.
        To change the content of the flame graph you need to apply the filters for the call tree view.&nbsp;&nbsp;
        To learn more about Flame Graphs please visit <a href="http://www.brendangregg.com/flamegraphs.html">http://www.brendangregg.com/flamegraphs.html</a>.
    </p>
    <center>
        <img src="images/FlameGraphView.png" alt="FlameGraphView" />
    </center>
    <p>
        The flame graph view in PerfView traditionally reflects the amount of memory consumed, but this can change when we graph stack differences.
        After a garbage collection, the amount of memory consumed by a type can be negative when inspected in stack differences.
        In those cases, the corresponding flame graph boxes are drawn with a blue hue, pointing to a memory gain (a reduction in usage).
        Increasing memory usage is drawn with a yellow/red tint as usual.
    </p>
    <center>
        <img src="images/FlameGraphDiffView.png" alt="FlameGraphDiffView" />
    </center>
    <hr />
    <!--  ****************** -->
    <h3>
        <a id="NotesView">Notes View</a>
    </h3>
    <p>
        This allows you to keep notes. This view contains the same data as the &#39;Notes
        Pane&#39; that you can toggle with the F2 key.&nbsp; These notes are saved when
        the view is saved, and thus allow you to keep information like the leads you need
        to follow up on during the investigation.&nbsp; The notes pane is particularly useful
        if you need to &#39;hand off&#39; the investigation to another person.&nbsp; By putting
        the &#39;explanation&#39; of the performance problem in the note pane, and sending
        the saved view, the next person can &#39;pick up&#39; where you left off.
    </p>
    <hr />
    <!--  ****************** -->
    <h3>
        <a id="NameTextBox">Reusing Filtering Parameters</a>
    </h3>
    <h4>Naming Parameter sets</h4>
    <p>
        It is often the case that the grouping and filtering parameter definitions get reasonably
        complex even though they have a relatively simple semantic meaning.&nbsp; It is also
        useful to be able to save and reuse these parameters for other investigations.&nbsp;&nbsp;
        To facilitate this, filter parameter sets can be given a name (simply by entering
        text in the Name text box), and this name can later be used to identify this filter
        parameter set.
    </p>
    <p>
        <!-- TODO Implement -->
        Named parameter sets are currently not used by PerfView.
    </p>
    <!--  ****************** -->
    <h3>
        <a id="Diff">Diffing Two Traces</a>
    </h3>
    <p>
        PerfView has the capability of taking the difference between two stack views.&nbsp;
        This is very useful for understanding the cause of a regression caused by a recent
        change.&nbsp;&nbsp; To use this capability you should
    </p>
    <ul>
        <li>
            Open a stack view for both the &#39;test&#39; and the &#39;baseline&#39; that you
            are interested in.
        </li>
        <li>
            Apply any filtering to isolate the scenario of interest (e.g. if you only care about
            startup, set the time filter to exclude any other samples).&nbsp;&nbsp; It MUST
            be the case that the two traces represent equivalent work.&nbsp;&nbsp;&nbsp; Moreover,
            the smaller the trace, the easier it will be to analyze.&nbsp;&nbsp; Thus the more
            you can filter it down, the better.&nbsp;&nbsp; While you can just skip this step,
            some effort here will pay off later.&nbsp;&nbsp;
        </li>
        <li>
            Resolve any symbols you think you might need (Right click -&gt; Lookup Warm Symbols
            is often a fine choice).&nbsp; This is because &#39;Lookup Symbols&#39; does not
            work for diffs.
        </li>
        <li>
            Go to the stack view for the &#39;test&#39; data and select the &#39;Diff&#39; menu
            bar.&nbsp;&nbsp; Under it you will find every other open stack view (and in particular
            the baseline you also opened).&nbsp; Select this baseline.
        </li>
    </ul>
    <p>
        PerfView will then open up a stack view which contains the difference between the
        &#39;test&#39; view and the &#39;baseline&#39; you selected.&nbsp;&nbsp; The algorithm
        it uses to do this is VERY simple.&nbsp; It simply negates the metric for the baseline,
        and then combines these samples with the samples of the test (which are unmodified).&nbsp;
        The result is a trace that contains the samples from both the &#39;test&#39;
        and the &#39;baseline&#39;, however the count value and metric value for all the samples in the baseline are NEGATIVE.
        This means that the counts and metric values will often &#39;cancel out&#39;, leaving just what is in the test
        but not the baseline.&nbsp;&nbsp;
    </p>
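    <p>
        As a rough illustration of the negate-and-combine step, here is a minimal Python sketch.
        The data layout (a sample as a <code>(stack, count, metric)</code> tuple), the function names
        <code>diff_samples</code> and <code>exclusive_by_name</code>, and the frame names are all
        assumptions made for the example, not PerfView&#39;s internal representation.
    </p>

```python
from collections import defaultdict


def diff_samples(test, baseline):
    """Combine unmodified test samples with negated baseline samples."""
    negated = [(stack, -count, -metric) for stack, count, metric in baseline]
    return list(test) + negated


def exclusive_by_name(samples):
    """Sum the metric by leaf frame, similar in spirit to a 'By Name' total."""
    totals = defaultdict(float)
    for stack, _count, metric in samples:
        totals[stack[-1]] += metric
    return dict(totals)


baseline = [(("Main", "Parse"), 1, 50.0), (("Main", "Render"), 1, 50.0)]
test = [(("Main", "Parse"), 1, 60.0), (("Main", "Render"), 1, 50.0)]

print(exclusive_by_name(diff_samples(test, baseline)))
# {'Parse': 10.0, 'Render': 0.0}
```

    <p>
        In this example the unchanged work cancels to zero and only the extra 10 units remain,
        which is the cancellation the rest of this section relies on.
    </p>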
    <p>
        Like a normal investigation you should start your &#39;diff&#39; investigation using
        the &#39;By Name&#39; view.&nbsp;&nbsp;&nbsp; In a typical investigation the &#39;test&#39;
        trace has strictly more metric (the regression) than the baseline, and this is reflected
        in the totals for the diff (the total metric for the diff should be the total metric
        for the test minus the total metric for the baseline).&nbsp;&nbsp; The &#39;ByName&#39;
        view&nbsp; then shows you where this difference came from with respect to the groups
        that have been selected with the &#39;GroupPats&#39; (just like a normal trace).&nbsp;&nbsp;
    </p>
    <p>
        If you are lucky, each line in the &#39;By Name&#39; view is positive (or a very
        small negative number).&nbsp;&nbsp; This is the &#39;easy&#39; case, and when this
        happens you have the information you are interested in (the precise groups that
        have additional cost in the test but not the baseline are at the top of the By Name
        view).&nbsp; From this point the diff investigation works just like a normal investigation
        (you can drill down, look at other views, change groupings, fold etc.).
    </p>
    <p>
        However, it is not uncommon to have large negative values in the view.&nbsp;&nbsp;
        When this happens the diff is not that useful because we are interested in the ADDITIONAL
        time in the test trace, but the negative numbers in the view are telling us that
        there are big places where the baseline used more time than the test.&nbsp;&nbsp;&nbsp;
        Clearly the sum has to add up to the final regression, but as long as there are
        large negative values in the view, we can&#39;t trust the large positive values
        in the view because they MAY be canceled by the negative values.
    </p>
    <p>
        Thus analysis of a diff trace always has an additional step:&nbsp; <strong>
            &nbsp;After
            you have formed the diff view but before you have done any analysis, you must use
            the grouping/folding/filtering operators to ensure that negative values have been
            &#39;canceled out&#39; sufficiently
        </strong>.&nbsp;&nbsp; The view needs to have
        only positive metric numbers (or inconsequential negative numbers).&nbsp;&nbsp;
    </p>
    <p>
        &nbsp;In fact PerfView already helps with this.&nbsp;&nbsp; Normally a process and
        thread node in the stack display contains the process and thread ID for that node.&nbsp;&nbsp;
        While this is useful information it also means the nodes from the baseline and test
        trace are likely to NEVER match (since they have different IDs).&nbsp;&nbsp; If
        left uncorrected, this would cause the &#39;TreeView&#39; to become pretty useless
        (it would show a large positive number under the &#39;test&#39; process, and a slightly
        smaller large negative number under the &#39;baseline&#39; but there would be no
        cancellation).&nbsp;&nbsp; PerfView fixes this by providing groupings that effectively
        remove the process and thread ID from the nodes.&nbsp; Now the nodes match and you
        get the desired cancellation.&nbsp;&nbsp;
    </p>
    <p>
        PerfView can only do so much, however.&nbsp;&nbsp; It can anticipate the need to
        rewrite the process and thread IDs, but it can&#39;t know that you renamed some
        function, or that lazy initialization caused the cost of some initialization to
        move from one place to another.&nbsp;&nbsp; In short PerfView can&#39;t know all
        the &#39;expected&#39; differences that you wish to ignore.&nbsp; It is your job
        as the analyst to make &#39;expected&#39; differences &#39;match exactly&#39; and
        thus cancel out.&nbsp;
    </p>
    <p>
        PerfView&#39;s powerful folding and grouping operators are tools you will use to
        create this cancellation.&nbsp;&nbsp; The mantra to remember is &#39;grouping is
        your friend&#39;, keep your groups as large as possible.&nbsp;&nbsp;&nbsp;&nbsp;
        In particular
    </p>
    <ul>
        <li>
            <strong>
                If you are having problems with cancellation, first try using the &#39;Group
                By Module&#39; group in the &#39;GroupPats&#39; textbox to isolate the difference
                to a module.
            </strong>
        </li>
    </ul>
    <p>
        The rationale behind this strategy is straightforward.&nbsp;&nbsp; The larger the
        groups you form, the more likely &#39;inconsequential&#39; differences will simply
        &#39;cancel out&#39;.&nbsp;&nbsp;&nbsp; Modules tend to be the most useful &#39;big
        group&#39; and thus grouping all samples by module is likely to show you a view
        where cancellation worked (only small negative numbers in the view).&nbsp;&nbsp;
        Once you identify the samples in a particular module that are responsible for the
        regression, you can then use the &#39;Drill Into&#39; functionality to isolate JUST
        THOSE SAMPLES, and change the groupings to show you more detail.&nbsp;&nbsp; This
        tends to be a very useful strategy.&nbsp;
    </p>
    <h4>More Diffing Cancellation Strategies</h4>
    <p>
        The main technique for achieving cancellation in a diff is to pick big groups and
        then Drill into only those samples that are of interest.&nbsp;&nbsp; However there
        are some other useful things to remember.
    </p>
    <ol>
        <li>Keep the scenario as small as possible.&nbsp;&nbsp; </li>
        <li>
            Typically only a &#39;bottom up&#39; analysis works for diffs.&nbsp; It is just
            too easy for there to be differences &#39;near the top&#39; of the stack that will
            frustrate cancellation.&nbsp; Avoid this by doing a bottom up analysis (the &#39;By
            Name&#39; view and the <a href="#CalleesView">callees view</a>).
        </li>
    </ol>
    <p>
        <strong>Fixing Renamed functions</strong>
    </p>
    <p>
        Grouping lets you literally rename any node name to any other node name.&nbsp; Thus
        you can &#39;fix&#39; any &#39;expected&#39; differences in a trace.&nbsp;&nbsp;
        For example if MyDll!MethodA was renamed to MyDll!MethodB, you could add the grouping
        pattern
    </p>
    <p style="margin-left: 40px">
        MyDll!MethodA-&gt;MethodA;MyDll!MethodB-&gt;MethodA
    </p>
    <p>
        which &#39;renames&#39; both of them to simply &#39;MethodA&#39; and resolves the
        diff.&nbsp;&nbsp; Folding can also be used to resolve differences like this.&nbsp;
        For example if these two methods are not even interesting (you don&#39;t need to
        see them on the call stacks), then you could simply fold both of them always with
        the folding pattern
    </p>
    <p style="margin-left: 40px">
        MethodA;MethodB
    </p>
    <p>
        which makes both of them disappear (and thus can&#39;t cause a difference).&nbsp;&nbsp;&nbsp;
    </p>
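    <p>
        To see why renaming produces cancellation, consider the following Python sketch. It uses a
        plain lookup table as a stand-in for the grouping pattern; it does not parse PerfView&#39;s
        pattern syntax, and the function name <code>diff_by_name</code> and the metric values are
        made up for the example.
    </p>

```python
from collections import Counter

# Hypothetical rename table standing in for the grouping pattern above.
RENAMES = {"MyDll!MethodA": "MethodA", "MyDll!MethodB": "MethodA"}


def diff_by_name(test, baseline, renames=None):
    """Return per-name metric differences (test minus baseline)."""
    renames = renames or {}
    totals = Counter()
    for name, metric in test:
        totals[renames.get(name, name)] += metric
    for name, metric in baseline:
        totals[renames.get(name, name)] -= metric
    return dict(totals)


baseline = [("MyDll!MethodA", 40)]   # old name, only in the baseline
test = [("MyDll!MethodB", 45)]       # new name, only in the test

print(diff_by_name(test, baseline))
# {'MyDll!MethodB': 45, 'MyDll!MethodA': -40}
print(diff_by_name(test, baseline, RENAMES))
# {'MethodA': 5}
```

    <p>
        Without the rename the diff shows a large positive and a large negative entry; with it,
        only the real difference of 5 is left.
    </p>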
    <hr />
    <!--  ****************** -->
    <h3><a id="Regression">Regression Investigation with Overweight Analysis</a> </h3>
    <p>
        Overweight analysis is a fairly simple technique in which the inclusive cost of all symbols from two traces are analyzed. Normally a time metric is used but any inclusive cost could work.
    </p>
    <p>
        The idea is this: using the base and the test runs it's easy to get the overall size of the regression. Let's say it was 10%. From there you could take as your null hypothesis that everything is just 10% slower. What you're looking for is symbols that changed
        more than 10% and are therefore in some sense more responsible for the change. The overweight report in this case would simply compute the ratio of the actual growth compared to the expected growth of 10%. When you find symbols with greater than 100% overweight
        those are of great interest.
    </p>
    <p>
        Suppose main calls f and g and does nothing else. Each takes 50ms for a total of 100ms. Now suppose f gets slower, to 60ms. The total is now 110, or 10% worse. How is this algorithm going to help? Well let's look at the overweights. Of course main is 100
        going to 110, or 10%, it's all of it so the expected growth is 10 and the actual is 10. Overweight 100%. Nothing to see there. Now let's look at g, it was 50, stayed at 50. But it was 'supposed' to go to 55. Overweight 0/5 or 0%. And finally, our big winner,
        f, it went from 50 to 60, gain of 10. At 10% growth it should have gained 5. Overweight 10/5 or 200%. It's very clear where the problem is! But actually it gets even better.
    </p>
    <p>
        Suppose that f actually had two children x and y. Each used to take 25ms but now x slowed down to 35ms. With no gain attributable to y, the overweight for y will be 0%, just like g was. But if we look at x we will find that it went from 25 to 35, a gain
        of 10 and it was supposed to grow by merely 2.5 so its overweight is 10/2.5 or 400%. At this point the pattern should be clear:<br>
        <br>
        The overweight number keeps going up as you get closer to the root of the subtree which is the source of the problem. Everything below that will tend to have the same overweight. For instance if the problem is that x is being called one more time by f you'd
        find that x and all its children have the same overweight number.
    </p>
    <p>
        This brings us to the second part of the technique. You want to pick a symbol that has a big overweight but is also responsible for a largeish fraction of the regression. So we compute its growth and divide by the total regression cost to get the responsibility
        percentage. This is important because sometimes you get leaf functions that had 2 samples and grew to 3 just because of sampling error. Those could look like enormous overweights, so you have to concentrate on methods that have a reasonable responsibility
        percentage and also a big overweight. The report automatically filters out anything with less than &#43;/- 2% responsibility.
    </p>
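    <p>
        The two numbers described above can be reproduced with a short Python sketch using the
        main/f/g/x/y example. This is an illustration of the arithmetic only, not the code behind
        the report: the function name <code>overweight_report</code>, the choice of a
        <code>root</code> symbol to hold the whole-trace total, and the decision to skip symbols
        with zero baseline cost are assumptions.
    </p>

```python
def overweight_report(base, test, root="main", min_responsibility=2.0):
    """Return {symbol: (overweight %, responsibility %)} for each symbol kept.

    `base` and `test` map a symbol name to its inclusive cost.
    """
    regression = test[root] - base[root]
    if regression == 0:
        raise ValueError("no overall change to explain")
    report = {}
    for symbol, before in base.items():
        if before == 0:
            continue  # expected growth would be zero, so overweight is undefined
        actual = test.get(symbol, 0) - before
        # Expected growth is before * (regression / base[root]).
        overweight = 100.0 * actual * base[root] / (before * regression)
        responsibility = 100.0 * actual / regression
        if abs(responsibility) >= min_responsibility:
            report[symbol] = (overweight, responsibility)
    return report


base = {"main": 100, "f": 50, "g": 50, "x": 25, "y": 25}
test = {"main": 110, "f": 60, "g": 50, "x": 35, "y": 25}

for symbol, (overweight, responsibility) in overweight_report(base, test).items():
    print(symbol, overweight, responsibility)
```

    <p>
        Here g and y are dropped because they carry 0% of the regression, while main, f and x come
        out at 100%, 200% and 400% overweight, matching the worked example.
    </p>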
    <p>
        Most of this summary is available online with more examples <a href="http://blogs.msdn.com/b/ricom/archive/2014/12/13/a-systematic-approach-to-finding-performance-regressions-using-overweight-analysis.aspx">
            here.
        </a>
    </p>
    <hr>
    <!--  ****************** -->
    <h3>
        <a id="EventViewerQuickStart">Quick Start for the Event Viewer</a>
    </h3>
    <p>
        The Event Viewer is a relatively advanced feature that lets you see the &#39;raw&#39;
        events collected in an ETL file.&nbsp;&nbsp; To get started as quickly as possible
    </p>
    <ul>
        <li>
            First go back to the ETL file in the main viewer and double click the &#39;EventStats&#39;
            icon under the ETL file.&nbsp; This will give an HTML report of the counts of all
            the events that were collected.&nbsp;&nbsp; This gives you a &#39;rough&#39; idea
            of what is actually in the file.&nbsp;
        </li>
        <li>
            Next launch the Event Viewer (double click on the &#39;Events&#39; icon for the
            ETL file).&nbsp;&nbsp; Click on the left pane, hit Ctrl-A to select all the events,
            and hit the Enter key.&nbsp;
        </li>
        <li>
            This will display all the events in the trace in chronological order in the
            right pane.&nbsp;&nbsp;
        </li>
        <li>
            Typically you will want to select a process of interest (select from the dropdown
            view in the &#39;Process Filter&#39; textbox).&nbsp; You will also only want to
            select particular events&nbsp; (by selecting events names in the left pane), and
            a particular time range (in the Start and End text boxes).&nbsp;&nbsp; You can also
            filter the events to only those that match a certain .NET regular expression by
            typing it in the &#39;Text Filter&#39; text box.&nbsp;
        </li>
        <li>
            Finally you often will only want to see some of the fields of the events, which
            you can select by the &#39;Cols&#39; dropdown menu.&nbsp; The order in which you
            click the columns determines the order in which they are displayed in the viewer.&nbsp;
        </li>
        <li>
            You can cut and paste items out of this view, or right click -&gt; Export To Excel
            to view&nbsp; the data in the right view in Excel for further analysis.
        </li>
    </ul>
    <hr />
    <!--  ****************** -->
    <h3>
        <a id="EventViewerTips">Event Viewer Tips</a>
    </h3>
    <p>
        In addition to the <a href="#GeneralTips">General Tips</a>, here are tips specific
        to the <a href="#EventViewer">Event Viewer</a>.
    </p>
    <ul>
        <li>
            <strong>Canceling </strong>- A variety of actions (hitting return in a textbox or
            double clicking on an event name) will cause a refresh (which can take a while).&nbsp;
            You can cancel the refresh by hitting the &#39;ESC&#39; key or the cancel button
            (lower right corner).&nbsp;&nbsp;
        </li>
        <li>
            <strong>Viewing Stacks</strong> - If an event has a stack trace associated with it,
            the 'HasStack' field will be true. By selecting the time of that event and hitting
            Alt-S you can see the stack for that time (which will include that event). You
            can also select two times, and that region will be shown in the AnyStack view.
        </li>
        <li>
            <strong>Selecting columns </strong>- hitting the &#39;cols&#39; dropdown allows
            you to select only particular columns to display.&nbsp;&nbsp; The order in which
            the columns are selected determines the order in the display.&nbsp;&nbsp; Very handy.&nbsp;
        </li>
        <li>
            <strong>Column Sums, Event Counts</strong>- When the display is refreshed, the count
            of the number of events processed is given in the status bar (and log file), as
            well as the sum of any columns that are number values.&nbsp;
        </li>
        <li>
            <strong>Selecting Event Name Quickly</strong> - It is not uncommon to have many
            event names in the left window.&nbsp; To find an event quickly simply type some
            substring of the event name in the &#39;Filter:&#39; textbox immediately above the
            event names list view.&nbsp;&nbsp; As you type characters the event name pane is
            filtered to only contain event names that contain the filter string.&nbsp;&nbsp;
            In a few keystrokes&nbsp; you can narrow your search and then double click on the
            entry you want.&nbsp; This is also an easy way of selecting all events with a particular
            substring (simply select all names in a view and hit enter). The filter textbox
            accepts regular expressions, and one of the more useful is the or operator |. Thus
            the text 'image|process' will filter to all the events that have image or process
            in their name.
        </li>
        <li>
            <strong>Using the histogram to select an event range</strong> - When events are displayed,
            the viewer also populates the <a href="#Histogram">Histogram</a> textbox. By selecting ranges
            in this box, you can get a reading of the start, stop, count and rate information
            for that range. Hitting Alt-R will 'zoom into' just that range.
        </li>
        <li>
            <strong>Show Local Time</strong> - The timestamp is displayed in the time zone where
            events are collected (Trace Local). You can switch to the local machine's time zone
            by clicking on the 'View - Show Local Time' option.
        </li>
        <li>
            <strong>Launch in Excel </strong>- Right clicking and selecting &#39;Open in Excel&#39;
            saves the entire view as a temporary CSV file and then opens it in Excel.&nbsp;
            This allows you to do pivot tables and more advanced filtering.&nbsp;&nbsp; Cut
            and paste into Excel also works, but this works better if you wish to capture the
            entire view.
        </li>
        <li>
            <strong>Save as CSV</strong> - Right clicking and selecting &#39;Save
            as CSV&#39; saves the entire view as a CSV file you specify.&nbsp; This can be opened
            in Excel or otherwise post-processed.&nbsp;
        </li>
        <li>
            <strong>Save as XML</strong> - Right&nbsp; clicking and selecting &#39;Save as XML&#39;
            saves the view as an XML file.&nbsp; This file contains ALL the columns (not just
            the ones in the view).&nbsp; Typing &#39;start excel FILE.XML&#39; will load the
            file in excel.&nbsp;
        </li>
        <li>
            <strong>Cutting and pasting time ranges</strong> - If you type Alt-T (right click
            -&gt; Copy Time Range) it will copy the start and end times to the cut/paste buffer.
            These can then be pasted into the &#39;Start&#39; textbox to clone a time range
            from one view to another.
        </li>
        <li>
            <strong>Tab instead of Enter </strong>- If you wish to change several filter textboxes
            use Tab instead of return to complete the entry so that you don&#39;t cause a refresh.&nbsp;
        </li>
        <li>
            <strong>Visualize Event Counters</strong> - Right click the event name and choose
            the &#39;Show EventCounter graph&#39; option to pop up an HTML-based line graph of
            the data. The visualization will respect the text filter so you can choose to display
            a particular subset of the counters.
        </li>
    </ul>
    <hr />
    <!--  ****************** -->
    <h3>
        <a id="EventViewer">The Event Viewer</a>
    </h3>
    <p>
        Some data files (currently only XPERF csv and csvz files) support a view of arbitrary
        events sorted by time.&nbsp;&nbsp; The Event Viewer is a window that is designed
        to display this data.&nbsp; Basically it is a view of events in chronological order,
        which can be filtered and searched.&nbsp;&nbsp; A typical scenario is that
        the application has been instrumented with events (like System.Diagnostics.Tracing.EventSource),
        and these events are used to determine a time of interest.&nbsp;&nbsp;
    </p>
    <center>
        <img src="images/eventViewer.png" alt="EventViewer" />
    </center>
    <p>
        The view has two main panels.&nbsp; The panel on the left contains all the event
        types in the trace.&nbsp;&nbsp; You simply select the ones of interest by clicking
        on them with the control key held down (to select several simultaneously).&nbsp;
        The right panel contains the actual event records.&nbsp;&nbsp; It is relatively
        expensive to perform the scan over the data to form the list, so you must explicitly
        ask for the right panel to be updated.&nbsp;&nbsp; You can do so in several ways:
    </p>
    <ol>
        <li>Click the &#39;Update&#39; button in the upper left corner</li>
        <li>Hit F5</li>
        <li>
            Double click on an entry in the left panel (If you have multiple selections you
            must also hold the Ctrl key down to not lose your selection)
        </li>
        <li>Right click and select the &#39;Update&#39; menu item.</li>
        <li>
            Hit enter in any filtering text boxes at the top of the window.&nbsp;&nbsp;
        </li>
    </ol>
    <h4>
        <a id="FilteringByProcess">Filtering by Process</a>
    </h4>
    <p>
        In addition to filtering by event type, you can also filter by process by placing
        text in the &#39;Process Filter&#39; text box.&nbsp; This text is a
        <a href="http://msdn.microsoft.com/en-us/library/az24scfc.aspx">.NET regular expression</a>
        and only records with processes that match this
        text will be selected.&nbsp; The matching is case insensitive, and only has to match
        a substring in the process name.&nbsp;&nbsp; You can use the standard regular expression
        ^ and $ operators to force matches of the complete string. Note that for context
        switch events, the process filter will match both the process being switched from
        (OldProcessName) as well as the new process being switched to (ProcessName).
    </p>
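    <p>
        The matching rules above (case insensitive, substring match unless anchored with ^ and $, and both
        process names considered for context switch events) can be sketched in Python as follows. This is only
        an illustration: Python's <code>re</code> module stands in for the .NET regular expression engine, and
        the function name is hypothetical.
    </p>

```python
import re


def process_filter_matches(pattern, process_name, old_process_name=None):
    """Case-insensitive substring match against the process name(s) of one record."""
    regex = re.compile(pattern, re.IGNORECASE)
    names = [process_name]
    if old_process_name is not None:
        # Context switch records carry both the old and the new process name.
        names.append(old_process_name)
    return any(regex.search(name) for name in names)


print(process_filter_matches("dev", "DEVENV"))        # substring, case ignored
print(process_filter_matches("^dev$", "devenv"))      # anchored, must match the whole name
print(process_filter_matches("^idle$", "devenv", old_process_name="Idle"))
```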
    <h4>
        <a id="MaxRetTextBox">Limiting the number of records returned</a>
    </h4>
    <p>
        Traces can be very large, and thus a very large number of results can be returned
        in the right panel.&nbsp;&nbsp; To speed things up, only a reasonable number (by default
        10000) of records are returned.&nbsp; This is the &#39;MaxRet&#39; value.&nbsp;&nbsp;
        If it is too small, you can update this textbox to something larger.
    </p>
    <h4>
        <a id="TextFilterTextBox">Filtering by Text</a>
    </h4>
    <p>
        In addition to filtering by process, you can also filter by text in the returned
        events.&nbsp;&nbsp; The pattern is matched against the full displayed text of each record, and only matching records are displayed.
        Thus changing which columns are displayed CAN affect the filtering if there is
        text in the 'Text Filter' text box.
        The string in the 'Text Filter' is interpreted as a
        <a href="http://msdn.microsoft.com/en-us/library/az24scfc.aspx">
            .NET regular
            expression
        </a> and, like the process filter, by default the pattern only has to
        match a substring to succeed.  If the pattern begins with a '!' character, then only
        entries that do NOT match the pattern will be shown.
    </p>
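    <p>
        A minimal Python sketch of the text filter rule, including the leading '!' negation, is shown below.
        It assumes the displayed text of a record is available as a single string and uses Python's
        <code>re</code> module in place of .NET regular expressions. The function name is hypothetical, and
        case handling is left at Python's default because this section does not specify it.
    </p>

```python
import re


def text_filter_keeps(text_filter, displayed_text):
    """Return True if a record with this displayed text passes the filter."""
    if not text_filter:
        return True  # an empty filter keeps every record
    negate = text_filter.startswith("!")
    pattern = text_filter[1:] if negate else text_filter
    found = re.search(pattern, displayed_text) is not None
    # A leading '!' inverts the result: keep only records that do NOT match.
    return found != negate


row = "ThreadID=1,240 Reason=AllocSmall Depth=1"
print(text_filter_keeps("Reason=Alloc", row))
print(text_filter_keeps("!Reason=Alloc", row))
```

    <p>
        Because the match runs over the displayed text, removing the Reason column from the display in this
        example would change which records pass the filter.
    </p>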
    <h4>
        <a id="ColumnsToDisplayTextBox">Selecting Columns</a>
    </h4>
    <p>
        Fields that are specific to the event are shown as a series of NAME=VALUE pairs
        in the &#39;Data&#39; column.&nbsp;&nbsp; This data column can be quite long and
        often the most interesting elements are at the end, making the view inconvenient.&nbsp;&nbsp;
        You can fix this by indicating which of these event-specific columns you wish to
        have displayed by placing field names (case insensitive) in the &#39;Columns to
        Display&#39; textbox.&nbsp; This can be populated easily by clicking on the &#39;Cols&#39;
        button.&nbsp; This displays a popup list of all the columns, and you can simply
        click on the ones of interest (shift and ctrl clicking to select multiple entries),
        and hitting &#39;enter&#39; to continue.&nbsp;&nbsp;&nbsp; The columns will display
        in the order that you selected the items, and the &#39;*&#39; can be used as a wild card
        that represents all columns that have not already been selected.   A maximum of 4
        fields will be displayed in their own columns.  After the first 4 the rest of the specified
        columns will be displayed in the 'rest' column.
    </p>
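    <p>
        The column layout rules can be sketched as follows. This Python example is only an illustration under
        stated assumptions: the function and field names are hypothetical, '*' is taken to expand to every
        available field that has not been placed yet, and unknown field names are simply ignored.
    </p>

```python
def layout_columns(requested, available, max_own_columns=4):
    """Split requested fields into those shown in their own columns and the 'rest'."""
    by_lower = {name.lower(): name for name in available}
    ordered = []
    for name in requested:
        if name == "*":
            # Wildcard: every available field, in its natural order.
            found = list(available)
        else:
            match = by_lower.get(name.lower())  # field names are case insensitive
            found = [match] if match else []
        for field in found:
            if field not in ordered:
                ordered.append(field)
    return ordered[:max_own_columns], ordered[max_own_columns:]


fields = ["Count", "Depth", "Reason", "Type", "Gen", "ClrInstanceID"]
own, rest = layout_columns(["depth", "*"], fields)
print(own)
print(rest)
```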
    <h5>
        <a id="FilteringSelectColumns">Filtering On Select Columns</a>
    </h5>
    <p>
        Events can be filtered using the Columns to Display textbox by placing expressions on the selected columns inside square brackets (<strong>[]</strong>).
        Expressions can be combined with the boolean operators || and &amp;&amp;. The format of individual queries is: <strong>LeftOperand Operator RightOperand</strong>
        where:
        <ul>
            <li> <strong>LeftOperand</strong> can either be the name of the property or the name of the event followed by "::" and the name of the property.  </li>
            <li> <strong>Operator</strong> can be one of the following: ==, !=, &lt;, &lt;=, &gt;, &gt;=, Contains. </li>
            <li> <strong>RightOperand</strong> can either be a string or a numeric quantity that'll be interpreted as a double.</li>
        </ul>

        <strong> Notes: </strong>
        <ul>
            <li> Once a query is specified, the logical OR operator || and the logical AND operator &amp;&amp; can be used to combine individual expressions. </li>
            <li> Individual expressions can be encased in parentheses (). </li>
            <li> Currently only 26 expressions can be created. </li>
            <li> Spaces are required whenever Contains is used as an operator. </li>
            <li> If you don't specify any fields to display, all fields will show up as part of the "Rest" column.</li>
        </ul>

        <strong> Examples: </strong><br /><br />

        Examples of simple queries include:
        <ul>
            <li><code> [(ThreadID == 1,240) &amp;&amp; (ProcessName == devenv)] ThreadID ProcessorNumber </code></li>
            <li><code> [(GC/Start::Depth &gt; 1) &amp;&amp; (ProcessName==devenv)] </code></li>
            <li><code> [(ProcessName Contains ServiceHub) || (ProcessName Contains devenv)] ProcessName Count ProcessorNumber Depth </code></li>
        </ul>

        <br /> Examples of some more complex expressions:

        <ul>
            <li><code> [(Count&gt;10 &amp;&amp; (Depth &gt;= -1) &amp;&amp; (Count&lt;=30) &amp;&amp; (Count &lt;= 30 || ProcessorNumber == 2))] Count ProcessorNumber Depth </code></li>
            <li><code> [(Count &gt; 10 &amp;&amp; (Count &lt;= 30) &amp;&amp; (Count &lt;= 30 || ProcessorNumber == 2))] Count ProcessorNumber Depth </code></li>
        </ul>

        Some video examples of the usage:
        <ul>
            <li> <a href="https://user-images.githubusercontent.com/4951960/168762353-416a619c-0c2f-4b59-887c-213a7fe2f77f.mp4"> 2 Simple Queries Combined </a> </li>
            <li> <a href="https://user-images.githubusercontent.com/4951960/168765726-f6644215-77e3-4f5a-a70f-a1359a82cdc0.mp4"> More Complex Queries Combined </a> </li>
        </ul>
    </p>
    <h4>
        <a id="EventTypes">Event Types</a>
    </h4>
    <p>
        The left hand panel contains all the events that are in the trace.&nbsp;&nbsp; These
        include the events collected by the OS kernel, as well as the .NET runtime, and
        any others that you indicated when you collected the data.
    </p>
    <h5>
        <a id="EventTypesFilter">Filtering the event list</a>
    </h5>
    <p>
        Because the number of event types can be large (typically dozens), there is a &#39;Filter&#39;
        text box at the top of the event type pane.&nbsp;&nbsp; If you are looking for a
        particular event, simply type some part of the event name in this text box and the
        displayed list will be filtered to those events that contain the typed text somewhere
        in the name. The text you type here is really a .NET Regular expression, which means
        you can use wild cards (. and *) and perhaps most importantly the | operator to mean
        'or'. This allows you to filter out all but some interesting events quickly. Also
        remember that Ctrl-A will select everything in the view.
    </p>
    <h5>
        <a id="Histogram">Event Histogram</a>
    </h5>
    <p>
        When the event view is updated, in addition to populating the main listbox, it also
        generates a histogram of event counts which shows how the frequency of the selected
        events varies over time. The time interval designated by the Start and End textboxes
        is divided into 100 buckets and the event count for each of these buckets is calculated.
        This number is then scaled so that the largest bucket represents 100% and the same
        convention used in the stackviewer's <a href="#WhenColumn">When Column</a> is used
        to convert this percentage into a number (or letter). This is displayed just above
        the listbox. Like the <a href="#WhenColumn">When Column</a> you can select a portion
        of this display and 'zoom in' by using the 'Set Range Filter' command (Alt-R). In
        addition when you change the selection in the histogram text box PerfView will calculate
        the start and end times, total event count and average event rate and display these
        values in the status bar.
    </p>
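    <p>
        The bucketing and scaling described above can be sketched in Python as follows. The function names are
        hypothetical, and the final step of turning each percentage into a digit or letter is left out because
        that convention is defined in the When Column section. The last function shows one way the start, end,
        count and rate of a selected bucket range could be derived.
    </p>

```python
def bucket_counts(timestamps, start, end, buckets=100):
    """Count events per bucket over the [start, end] interval."""
    counts = [0] * buckets
    width = (end - start) / buckets
    for t in timestamps:
        if start <= t <= end:
            # Clamp so that an event exactly at 'end' lands in the last bucket.
            index = min(int((t - start) / width), buckets - 1)
            counts[index] += 1
    return counts


def scale_to_peak(counts):
    """Scale counts so that the largest bucket represents 100%."""
    peak = max(counts)
    return [100.0 * c / peak if peak else 0.0 for c in counts]


def range_stats(counts, start, end, first, last):
    """Start time, end time, event count and average rate for buckets first..last."""
    width = (end - start) / len(counts)
    range_start = start + first * width
    range_end = start + (last + 1) * width
    total = sum(counts[first:last + 1])
    return range_start, range_end, total, total / (range_end - range_start)


counts = bucket_counts([0.5, 1.5, 1.6, 99.9, 100.0], start=0.0, end=100.0)
print(scale_to_peak(counts)[:3])
print(range_stats(counts, 0.0, 100.0, first=0, last=1))
```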
    <h5>Important Kernel Events</h5>
    <p>
        Here are some Kernel and .NET events that are worth knowing more about:
    </p>
    <ul>
        <li>
            <strong>Windows Kernel/SystemConfig/CPU </strong>- Tells the name of the machine,
            the CPU speed, and the amount of memory the machine has.
        </li>
        <li>
            <strong>Windows Kernel/Process/Start</strong> - Tells every process that started
            during the trace, including its command line.
        </li>
        <li>
            <strong>Windows Kernel/Process/End</strong> - Tells every process that ended during
            the trace, including its exit code.
        </li>
        <li>
            <strong>Windows Kernel/Image/Load</strong> - Tells when each DLL was loaded
            in the process.
        </li>
        <li>
            <strong>Windows Kernel/TcpIp/Recv </strong>- Shows when network packets arrive, including
            the source and target IP address and port.
        </li>
        <li>
            <strong>Windows Kernel/PerfInfo/Sample</strong> - These are the samples taken
            every 1 msec per CPU that are used in the CPU stack viewer.
        </li>
        <li>
            <strong>Windows Kernel/Thread/CSwitch</strong> - Shows whenever a thread either gains
            or loses the use of a physical CPU.
        </li>
        <li>
            <strong>Microsoft-Windows-DotNETRuntime/Runtime/Start</strong>- Indicates the version
            of the .NET runtime used as well as the startup flags (which say if you are using
            the SERVER or CONCURRENT GC).&nbsp;&nbsp;
        </li>
        <li>
            <strong>Microsoft-Windows-DotNETRuntime/GCStart</strong> (and GCStop) - Indicate
            when a GC starts and stops.&nbsp;&nbsp; Use the GC View for a more useful visualization.
        </li>
    </ul>
    <!-- TODO can do more... -->
    <hr />
    <!--  ********************************** -->
    <h2>
        <a id="ETWCollectionDialog">The ETW Data Collection Dialog</a>
    </h2>
    <p>
        Before starting collection PerfView needs to know some parameters.&nbsp;&nbsp; It
        fills in defaults for all but the command to run. Thus in the common scenario you
        only need to fill in the command to run (if you are using the &#39;Run&#39; command)
        and hit return to start collecting data.&nbsp;
    </p>
    <ul>
        <li>
            The <strong><a id="CommandToRunTextBox">Command TextBox</a></strong> - This is only
            active for the Run command.&nbsp; It is the command line to run after collection
            has been turned on.&nbsp;&nbsp; This textbox is hidden for the Collect command.&nbsp;
        </li>
        <li>
            The <strong><a id="FocusProcessTextBox">Focus Process TextBox</a></strong> - This is only
            active for the Collect command.&nbsp; It is either a decimal process ID or the name of an executable
            without the directory part but WITH the suffix (e.g. MyProgram.exe). This allows you to turn on
            non-Kernel events for only a particular process, and thus cut the overhead and size of the collection when there are many
            active processes on the system.   Note that it does not affect kernel events (which are
            often, but not always, the most common), so it may not help as much as you would like. It DEFINITELY
            helps during rundown, however: if you have many managed processes, they all do rundown, which can be impactful.
            So it always helps when there are many managed processes (because of rundown), and it can help quite a lot
            if many of those processes allocate a lot or use the threadpool (both of which can create many events).
        </li>
        <li>
            The <strong><a id="DataFileNameTextBox">Data File TextBox</a></strong> - This is
            the name of the output file.&nbsp;&nbsp; It defaults to PerfViewData.etl.&nbsp;&nbsp;
            If you change the file name, you should use the .ETL extension or the viewer will
            not recognize it as an ETW data file.&nbsp;
        </li>
        <li>
            The <strong><a id="CurrentDirTextBox">Current Directory TextBox</a></strong> - This
            is the current directory where the command will be run.&nbsp;
        </li>
        <li>
            The <strong><a id="MergeCheckBox">Merge Checkbox</a></strong> -When data is collected
            it is collected in multiple files.&nbsp; If you will analyze the data on the same
            machine it was collected on, viewing these raw files is fine.&nbsp;&nbsp; However
            if you wish to copy the data to another machine these files need to be merged before
            copying.&nbsp;&nbsp; Checking this box will cause this&nbsp; merging to happen immediately
            after data collection.&nbsp;&nbsp; It tends to take 10s of seconds.&nbsp;&nbsp;
            See the <a href="#merging">merging</a> section for more.
        </li>
        <li>
            The <strong><a id="ZipCheckBox">Zip Checkbox</a></strong> - By default PerfView
            does only the work that is needed to analyze the data on the machine on which it
            was collected. If you intend to copy the data to another machine you should zip
            it first. This does a number of things, including merging all ETL files (see
            <a href="#merging">merging</a>) as well as creating symbolic information for .NET Native images
            (see <a href="#NGenPdbs">NGen Pdbs</a>), and creating a compressed ZIP file that
            contains all of this information. This is the recommended way to create a file that
            can be analyzed on any machine. This will take some extra time to do (e.g. 10s of
            seconds). If you do not do this at the time data was collected it can be done any
            time afterward (on the machine where the data was collected), by right clicking
            on the file in the Main Viewer and selecting the Zip item.
        </li>
        <li>
            The <strong><a id="CircularTextBox">Circular MB TextBox</a></strong> - When collecting
            ETW data, you can collect a lot of data <strong>quickly</strong>; it typically takes
            only a few minutes for the file to reach a gigabyte in size.&nbsp; When the files
            get this big they become too large to handle easily.&nbsp;&nbsp; Instead you want
            to limit the amount of data collected (preferably to only a few seconds).&nbsp;&nbsp;
            One good way of doing this is to collect using a circular buffer.&nbsp; In this
            mode, the file size is limited (to the specified number of megabytes), and when the
            limit is reached the oldest data is overwritten.&nbsp;&nbsp; This keeps file size
            under control.&nbsp; The number in this text box is the file size limit.&nbsp; Note
            that because the ETW system collects up to 3 etl files (one for kernel events, one
            for non-kernel, and one for &#39;rundown&#39; events), the total file size can be bigger
            than this number, but it should not be more than twice as big.   This text box
            corresponds to the /CircularMB=XXX command line parameter.
        </li>
        <li>
            The <strong><a id="ThreadTimeCheckbox">Thread Time Checkbox</a></strong> -
            This option is needed in any <a href="#BlockedTimeInvestigation">Blocked / Wall Clock Time Investigation</a>.
            It causes a kernel event to be logged every time a thread gets to use the CPU (a context switch).
            It also turns on 'ReadyThread' events that are logged when one thread 'awakens'
            another thread (e.g. sets an event that another thread is waiting for). This option
            also includes all the 'default' events. This is a relatively expensive option (without
            ThreadTime, overhead is typically about 3%; with ThreadTime it is more typically 10%),
            but still low enough to use in any production scenario.   If you care about non-CPU time
            you will want to turn this on.  This can be turned on in the command line by using
            the /threadTime option (which is a shortcut for /kernelEvents=ThreadTime).
        </li>
        <li>
            The <strong><a id="MarkTextBox">Mark TextBox</a></strong> - It is possible to place
            markers in the event log file by pressing the &#39;Mark&#39; button while data collection
            is happening.&nbsp; Each of these marks has a message string associated with it.&nbsp;
            The text in this textbox is used for this message string.&nbsp;&nbsp; Marks show
            up in the &#39;Events&#39; view for the ETL file generated.&nbsp; Their provider
            name is called &#39;PerfView&#39; and their task name is &#39;Mark&#39;, and the
            mark text is in the payload.&nbsp;
        </li>
    </ul>
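    <p>
        As an illustration of how two of the dialog settings above map onto the command line, the Python sketch
        below assembles an argument list using the /CircularMB=XXX and /threadTime qualifiers mentioned in this
        section. The helper function, the executable name <code>PerfView.exe</code>, and the placement of
        qualifiers before the command name are assumptions made for the example.
    </p>

```python
def perfview_command_line(command, circular_mb=None, thread_time=False, command_to_run=None):
    """Build an argument list for a 'run' or 'collect' invocation (illustrative only)."""
    args = ["PerfView.exe"]
    if circular_mb is not None:
        # Corresponds to the Circular MB text box.
        args.append(f"/CircularMB={circular_mb}")
    if thread_time:
        # Corresponds to the Thread Time checkbox.
        args.append("/threadTime")
    args.append(command)
    if command == "run":
        if not command_to_run:
            raise ValueError("the run command needs a command line to launch")
        args.extend(command_to_run)
    return args


print(" ".join(perfview_command_line("run", circular_mb=500, thread_time=True,
                                     command_to_run=["MyProgram.exe"])))
print(" ".join(perfview_command_line("collect", circular_mb=500)))
```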
    <p>
        Whether you use the &#39;Run&#39; or &#39;Collect&#39; command, profile data is
        collected machine wide.&nbsp;&nbsp; In order to collect profile data you must have
        administrator rights.&nbsp; If you do not, PerfView will try to elevate (bring up
        a UAC dialog box), and relaunch itself with administrator privileges.&nbsp;&nbsp;
    </p>
    <!--  ****************** -->
    <h3>Advanced Options</h3>
    <p>
        PerfView chooses a useful default set of ETW events to log which allow common performance
        analysis to be done, however, there are numerous ETW events that could be turned
        on.&nbsp; Here is a sampling of some of the most useful of these more advanced events.&nbsp;
    </p>
    <ul>
        <li>
            The <strong><a id="KernelBaseCheckBox">Kernel Base Checkbox.</a></strong> Turns
            on or off the basic set of kernel events. These base events are low volume events
            like process, thread, image and network events. Typically these events are only
            turned off if you are doing monitoring rather than performance analysis (e.g. you
            just want to see the messages from your EventSource).
            The option /kernelEvents=None can be used to achieve this effect at the command line.
        </li>
        <li>
            The <strong><a id="CpuSamplesCheckBox">Cpu Samples Checkbox.</a></strong> Turns
            on or off the CPU sampling (by default every 1 msec per CPU). This is on by default.
            Removing the Profile flag from the '/kernelEvents' option (e.g. /kernelEvents=default-Profile)
            can be used to turn sampling off at the command line.
        </li>
        <li>
            The <strong><a id="MemoryCheckBox">Page Fault CheckBox</a></strong> - Causes a kernel
            event to be logged every time a page fault or other primitive memory operation is
            done. A stack is logged for this event.
            Adding the Memory flag to the '/kernelEvents' option (e.g. /kernelEvents=default+Memory)
            turns this option on at the command line.
        </li>
        <li>
            The <strong><a id="FileIOCheckBox">File I/O Checkbox </a></strong>- Causes a kernel
            event to be logged every time a file operation is initiated. Many of these may not
            cause disk events because the data is in the file system cache.
            Adding the FileIOInit flag to the '/kernelEvents' option (e.g. /kernelEvents=default+FileIOInit)
            turns this option on at the command line.
        </li>
        <li>
            The <strong><a id="RegistryCheckBox">Registry CheckBox</a></strong> - Causes a kernel
            event to be logged every time a registry operation is performed.
            Adding the Registry flag to the '/kernelEvents' option (e.g. /kernelEvents=default+Registry)
            turns this option on at the command line.
        </li>
        <li>
            The <strong><a id="VirtualAllocCheckBox">Virtual Alloc CheckBox</a></strong> - Causes
            a kernel event to be logged every time a VirtualAlloc call (primitive memory allocation)
            is made (or the memory is freed). See <a href="#UnmanagedMemoryAnalysis">Unmanaged Memory Analysis</a> for more.
            Adding the VirtualAlloc+VAMap flags to the '/kernelEvents' option (e.g. /kernelEvents=default+VirtualAlloc+VAMap)
            turns this option on at the command line.
        </li>
        <li>
            The <strong><a id="IISCheckbox">IIS CheckBox</a></strong> - Causes detailed events for
            Internet Information Services (IIS) to be logged.  You can do this from the command line
            by using /providers:Microsoft-Windows-IIS.   Note that PerfView has special logic that
            notices when this provider is set to a verbose level and turns on a number of other
            IIS related providers in that case.  Thus 17 providers are actually turned on in this case.
            These events can be viewed in the 'events' view.
        </li>
        <li>
            The <strong><a id="RefSetCheckBox">RefSet CheckBox</a></strong> - Causes detailed
            kernel events associated with memory usage (called Reference Set) to be logged.
            This information can be viewed in the 'Any Stacks' view (look for PageAccess).
            Adding the ReferenceSet flag to the '/kernelEvents' option (e.g. /kernelEvents=default+ReferenceSet)
            turns this option on at the command line.
        </li>
        <li>
            The <strong><a id="HandleCheckBox">Handle CheckBox</a></strong> - Causes detailed
            kernel events for the creation and closing of Windows OS kernel handles to be logged.
            You will see 'Windows Kernel/Object/*' events show up in the 'events' view, and this also
            activates the <a href="#WindowsHandleRefCountStacks">Windows OS Handle Stacks</a> view (as well as the 'Any Stacks' view).
            Adding the Handle flag to the '/kernelEvents' option (e.g. /kernelEvents=default+Handle)
            turns this option on at the command line.
        </li>
        <li>
            The <strong><a id="DotNetAllocCheckBox">.NET Alloc CheckBox</a></strong> - Causes
            an event to be fired every time a .NET object is allocated.
            This can be also activated by the /DotNetAlloc command line option.
            Note that this only affects processes that start AFTER data collection has started.
            See <a href="#GCHeapNetMemStacks">GC Heap Net Mem</a> for more.
            <p>
                This option tends to have a VERY noticeable impact on performance (2X or more).   Also,
                if the application allocates aggressively, events will be fired so quickly that
                some will be lost even when the
                /BufferSizeMB qualifier is used to set the buffer size very large (e.g. 500Meg).  For these reasons it
                is usually a better idea to use the <a href="#DotNetAllocSampledCheckBox">.NET SampAlloc</a>
                option instead if at all possible.
            </p>
        </li>
        <li>
            The <strong><a id="DotNetAllocSampledCheckBox">.NET SampAlloc CheckBox</a></strong>
            - Causes a 'smart' sampling of the allocations done by the program.   Basically the
            dynamic rate of allocation for each type is measured on the fly, and the sampling
            for each individual type is adjusted so the number of sampled allocations per second stays under 100.
            The window for measuring the rate is roughly 16-80 msec long  (it is an exponentially
            decaying window).   This means that any type that allocates fewer than 20
            instances in any 100 msec window is unlikely to be trimmed at all.  Also, objects
            larger than 10K are never trimmed.  However commonly allocated things (e.g. strings,
            byte[] and object[]) will be trimmed, typically by 10 to 1, 100 to 1 or even 1000 to 1,
            until the allocation rate is roughly 100 / sec.  When an object is trimmed, its size
            is remembered and added to the next object of that type which is sampled.  Thus the
            size reported is representative of the true allocation size, but the stack associated
            with that size will be shared (and thus may be inaccurate).  Statistically speaking, if you
            have several seconds of trace (and thus hundreds to thousands of samples) what is
            reported is likely to be close to the true statistics.
            <p>
                The overhead of turning on the .NET SampAlloc CheckBox is much less than that of the
                <a href="#DotNetAllocCheckBox">.NET Alloc CheckBox</a>.   Typically the overhead is
                10-20% (rather than 2X or more), and it produces 200 Meg per minute of trace.   This is
                a bit more expensive than turning on /threadTime, but low enough that you can
                leave it on in production (especially if the application does not allocate heavily).
            </p>
            <p>
                Note that this only affects processes that start AFTER data collection has started.
                This can also be activated by the /DotNetAllocSampled command line option.
                See <a href="#GCHeapNetMemStacks">GC Heap Net Mem</a> for more.
            </p>
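            <p>
                The size carry-over described above can be illustrated with a small sketch. This is
                not PerfView's implementation: the class name, the fixed 1-in-10 trimming ratio and the
                demo values are assumptions made for illustration (the real sampler adjusts the ratio
                per type from the measured allocation rate). It only shows why the reported sizes still
                add up to the true allocated bytes.
            </p>

```python
from collections import defaultdict


class SampledAllocSketch:
    """Toy model of 'smart' allocation sampling (hypothetical, fixed ratio)."""

    def __init__(self, keep_every=10, large_object_bytes=10_000):
        self.keep_every = keep_every                  # assumed fixed trimming ratio
        self.large_object_bytes = large_object_bytes  # objects above this are never trimmed
        self.seen = defaultdict(int)                  # allocations seen per type
        self.pending = defaultdict(int)               # bytes of trimmed allocations per type

    def on_alloc(self, type_name, size):
        """Return (type_name, reported_size) if an event is logged, else None."""
        if size > self.large_object_bytes:
            return (type_name, size)                  # large objects are always logged
        self.seen[type_name] += 1
        if self.seen[type_name] % self.keep_every != 0:
            self.pending[type_name] += size           # trimmed: remember its size
            return None
        reported = size + self.pending[type_name]     # fold trimmed bytes into this sample
        self.pending[type_name] = 0
        return (type_name, reported)


def run_demo():
    """Allocate 100 small strings and one large array; return the logged events."""
    sketch = SampledAllocSketch(keep_every=10)
    events = []
    for _ in range(100):
        event = sketch.on_alloc("System.String", 32)
        if event is not None:
            events.append(event)
    events.append(sketch.on_alloc("System.Byte[]", 50_000))
    return events
```

            <p>
                In this sketch only one string allocation in ten produces an event, but each event
                carries the bytes of the nine trimmed allocations before it, so summing the reported
                sizes gives the same total as logging every allocation. The stack attached to each
                event is that of the sampled allocation only, which is why stacks may be inaccurate.
            </p>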
        </li>
        <li>
            The <strong><a id="ETWDotNetAllocSampledCheckBox">ETW .NET Alloc CheckBox</a></strong>
            - CURRENTLY ONLY RECOMMENDED for .NET NATIVE and PROJECT K SCENARIOS.
            This checkbox collects 'smart sampled' object allocation information, which is essentially the same data as
            the <a href="#DotNetAllocSampledCheckBox">.NET SampAlloc CheckBox</a> collects.
            It does this, however, in a different way.   The <a href="#DotNetAllocSampledCheckBox">.NET SampAlloc CheckBox</a> works by
            injecting a .NET Profiler DLL (ETWClrProfiler) into any process that starts after data collection begins.
            The ETW .NET Alloc CheckBox accomplishes the same thing by turning on ETW events built into the runtime
            since version 4.5.2.     Thus there are times when each of them will work but the alternative will not.
            In particular the ETW version does not work on older runtimes, but the Profiler based solution does not work
            on runtimes that do not support the .NET profiler API (e.g. .NET Native or Project K).    Eventually older
            runtimes will not be interesting and using the ETW based solution will be the uniform choice.   This checkbox
            just sets the GCSampledObjectAllocationHigh bit of the /ClrEvents flags, so this option can be turned on
            at the command line using /ClrEvents=default+GCSampledObjectAllocationHigh.
        </li>
        <li>
            The <strong><a id="DotNetCallsCheckBox">.NET Calls CheckBox</a></strong> - Causes
            an event to be fired every time you enter the prolog of a .NET method (thus methods
            that are NOT implemented in .NET do not show up).
            This can also be activated by the /DotNetCalls command line option.
            Note that this only affects processes that start AFTER data collection has started.
            You can however start tracing, start the program then start and stop collection
            (multiple times) afterward to capture different scenarios.
            <p>
                The events from this option are called 'CallEnter' and show up in the 'Any Stacks'
                view in the 'Advanced Group'.  Most likely you will want to filter out all other
                events in the view by selecting the CallEnter node -> right click -> Include Item.
            </p>
            <p>
                This option tends to have a VERY noticeable impact on performance (5X or more).
                If the application runs a lot of code (common), it may be necessary to make the
                /BufferSizeMB qualifier very large (e.g. 1000Meg), and even that may not be enough.
                This option is really only meant for small isolated tests.
            </p>
            <p>
                There is a command line option /DotNetCallsSampled which works like /DotNetCalls, however it
                samples one in every 997 calls rather than every call.  This cuts the overhead (and file size)
                by a factor of ~1000, which is better if overhead is a concern.
            </p>
            <p>
                By default the runtime does not disable inlining of methods.  Thus you will not see
                inlined calls in your trace.   There is also a command line option /DisableInlining
                which disables inlining so you will see every call.  This slows things down even more
                so should only be used in 'small' scenarios.
            </p>
        </li>
        <li>
            The <strong><a id="JITInliningCheckBox">JIT Inlining Checkbox</a></strong> - Causes
            an event to be captured for every inlining decision made by the JIT.  The results are
            available as two tables in the JIT Stats report, one showing all of the successfully
            inlined call sites and one showing all of the failed inlining call sites (where the JIT
            decided not to inline).  Each table shows the method being compiled, the caller, and the
            callee; the failed table also shows the reason provided by the JIT for why inlining
            wasn't performed. For fine-tuning performance of hot paths, this information can be very
            valuable in understanding where functions you expected to be inlined aren't being inlined,
            allowing you to then examine why and potentially tweak your code accordingly (such as by
            separating out a fast path into its own method that's more likely to be inlined, by using
            [MethodImpl(MethodImplOptions.AggressiveInlining)], etc.). This feature can also be
            activated with the /JITInlining command line option.
        </li>
        <li>
            The <strong><a id="CCWRefCountCheckBox">.NET Native CCW</a></strong> - Causes
            an event to be captured each time the reference count of a .NET COM Callable Wrapper (CCW) is increased or decreased.
            This works only with .NET Native applications. The stacks associated with the collected events are also shown.
        </li>
        <li>
            The <strong><a id="NetCaptureCheckBox">Net Capture Checkbox</a></strong> - This option
            turns on the Microsoft-Windows-NDIS-PacketCapture events used by the 'netsh trace' command
            built into Windows.   The full payload of every packet will be logged to the ETL file.
            PerfView's event viewer has rudimentary packet parsing capabilities, but for non-trivial
            scenarios it is recommended that you use the <a href="#NetMonCheckBox">NetMon option</a> and use the
            <a href="http://www.microsoft.com/en-us/download/details.aspx?id=4865">NetMon tool</a> to parse
            the packets.  The /NetworkCapture option enables this from the command line.
        </li>
        <li>
            The <strong><a id="NetMonCheckBox">NetMon Checkbox</a></strong> - This option
            does everything that the <a href="#NetCaptureCheckBox">Net Capture option</a> does (logs
            every packet to the ETL file).   However it also generates another ETL file (_NetMon.etl)
            that has just the Networking packets and can be read directly by the
            <a href="http://www.microsoft.com/en-us/download/details.aspx?id=4865">NetMon tool</a>.
            Thus you can use the full power of the NetMon tool to inspect the networking behavior but
            also have all the system events as well.   The /NetMonCapture option enables this from the command line.
        </li>
        <li>
            The <strong><a id="VSCheckBox">VS Checkbox</a></strong> - Turns on the providers
            built into Visual Studio. Only people profiling Visual Studio itself should care
            about this option.
        </li>
        <li>
            The <strong><a id="ClrCheckBox">.NET Checkbox</a></strong> - Turns on the default
            .NET providers. Unless there is no .NET code in the process of interest, you should
            leave this provider on.   The /ClrEvents=None command line option has the same effect as unchecking this box.
        </li>
        <li>
            The <strong><a id="ClrAllCheckBox">.NET All Checkbox</a></strong> - Turns on all
            .NET providers, even the more verbose ones. Currently the only additional CLR events
            are the Interop and just in time (JIT) tracing options (which tell about inlining
            decisions).
        </li>
        <li>
            The <strong><a id="BackgroundJITCheckBox">Background JIT Checkbox</a></strong> -
            Version 4.5 of the .NET Runtime introduced a class called 'System.Runtime.ProfileOptimization'
            which allows programs to save information about which methods were JIT compiled during
            an execution, so that later executions can compile those methods on a background thread.   Checking this box will enable events that allow examination
            of this background compilation.  See <a href="HtmlReportUsersGuide.htm#UnderstandingBackgroundJIT">Background JIT Compilation</a> for more.
        </li>
        <li>
            The <strong><a id="GCOnlyCheckBox">GC Only Checkbox</a></strong> - Turns off all
            providers (including the default ones) except for those needed to do a .NET GC Heap
            analysis. In this mode relatively few events are logged, so you can collect data
            about a large period of time (say an hour), in a reasonable size (say 200Meg). This
            option shows you stacks for a SAMPLING of GC allocations.   In addition to the GC
            events it also turns on the <a href="#MemInfoCheckBox">MemInfo</a> and <a href="#VirtualAllocCheckBox">VirtualAlloc</a> events, which are useful
            for tracking down memory issues.
            The /GCOnly command line option achieves the same effect.
        </li>
        <li>
            The <strong><a id="GCCollectOnlyCheckBox">GC Collect Only Checkbox</a></strong>
            - Turns off all providers except those that describe garbage collections. Thus even
            the GC allocation sampling is turned off. This mode logs even less data than GC
            Only, and thus can collect data over an even longer period of time (say a day)
            in a reasonable size (say 200 Meg).  The /GCCollectOnly command line option achieves the same effect.
        </li>
        <li>
            The <strong><a id="StressCheckBox">.NET Stress Checkbox</a></strong> - Turns on
            .NET events that are logged when the runtime does &#39;rare&#39; operations. These have proven to
            be useful for tracking down non-deterministic &#39;stress&#39; bugs.   You
            can also turn these events on by specifying the &#39;ClrStress&#39; provider in
            the Additional Providers textbox.
            The option /Providers=ClrStress can achieve this at the command line.
        </li>

        <li>
            The <strong><a id="MemInfoCheckBox">MemInfo Checkbox</a></strong> - Turns on
            the Microsoft-Windows-Kernel-Memory provider so that every half second a
            snapshot of memory statistics for every process in the system is taken.  These
            show up in the 'events' view as 'MemInfo', 'MemInfoSessionWS' and 'MemoryProcessMemInfo' events.
            This is really just a shortcut for specifying the Microsoft-Windows-Kernel-Memory provider in the Additional Providers textbox,
            so you can specify this option at the command line with /Providers=Microsoft-Windows-Kernel-Memory.
        </li>
        <li>
            The <strong><a id="AdditionalProvidersTextBox">Additional Providers TextBox</a></strong>
            - A comma separated list of provider specifications, corresponding to the /providers command line qualifier.
            This can be filled in by using the ... button or by typing the following textual specification.  Each provider
            specification has the general form <em>provider</em>:<em>keywords</em>:<em>level</em>:<em>values</em>.
            The keywords and level parts are optional and can be omitted (for example <em>provider:keywords:values</em> or <em>provider:values</em> is legal).
            <ul>
                <li>
                    <em>provider</em> is either
                    <ul>
                        <li>
                            The name of an ETW provider registered with the operating system.
                            The providers that come with the operating system are all registered in this way.
                            Use the &#39;logman query providers&#39; command for a complete list.   Typically
                            the most interesting providers start with Microsoft-Windows in their name.
                        </li>
                        <li>
                            The syntax <em>*EventSourceName</em>.   This is another way of specifying
                            an EventSource provider.   Every ETW provider is uniquely identified by
                            a GUID that is specified when the EventSource is created.  However it is strongly
                            encouraged that programmers don&#39;t specify an explicit GUID but let a GUID be
                            generated from the EventSource&#39;s name using a web-standard procedure
                            (<a href="http://www.ietf.org/rfc/rfc4122.txt">RFC 4122</a>).
                            This allows you to always find the GUID if you know the name of the provider.
                            The * syntax says to operate on the provider whose GUID is formed from <em>EventSourceName</em>
                            using RFC 4122.   This allows you to turn on an EventSource without knowing
                            where it is defined or what its GUID is, as long as you know its name.
                            Names are case insensitive because the name is always upper cased before applying
                            RFC 4122.
                        </li>
                        <li>
                            A GUID (e.g. 77765ec1-a648-502a-0ba0-2beb13633b47).  Fundamentally the OS just
                            needs the GUID to turn on a particular ETW provider.   Thus you can always
                            simply specify just the GUID.
                        </li>
                    </ul>
                </li>
                <li>
                    <em>keywords</em> is a 64 bit hexadecimal number (which can have a 0x prefix) that
                    specifies the event groups (called keywords in ETW nomenclature) to turn on.  The meaning
                    of these varies from provider to provider (logman query providers <em>providerGuid</em>
                    will tell you the meaning of the keywords).  Choosing 0xFFFFFFF is good to start
                    with if you are unsure.  Omitting this or using '*' means all keywords.
                </li>
                <li>
                    <em>level</em> is one of the following: Critical = 1, Error = 2, Warning = 3, Informational = 4, Verbose = 5.   You can use either the names or the numbers to specify the level.  Omitting this or using &#39;*&#39; means Verbose.
                </li>
            </ul>
        </li>
        <li>
            <ul>
                <li>
                    <em>values</em> is a list of semicolon-separated KEY=VALUE pairs, which are used to pass extra information to the provider or to the ETW system.   KEY values that begin with an @ are commands to the ETW system.   Everything else is passed
                    on to the provider (EventSources have direct support for accepting this information
                    in their OnEventCommand method).  The special ETW keys include
                    <ul>
                        <li>@StacksEnabled - If this key&#39;s value is &#39;true&#39; then the stack associated with the event is taken (for every event in the provider)</li>
                        <li>
                            @ProcessIDFilter - a space separated list of decimal process IDs to collect data from.  Only events from these processes (or those named in the @ProcessNameFilter) will be collected.
                            Since IDs only exist after a process is created, this only works on processes that are running at the time collection starts.
                        </li>
                        <li>
                            @ProcessNameFilter - a space separated list of process names (a process name is the file name (no path) of the executable INCLUDING the .EXE extension).   Only events from the named processes (or those named in the @ProcessIDFilter) will be collected.
                            It does not matter if the process was running before collection or not.
                        </li>
                        <li>@EventIDsToEnable - a space separated list of decimal event ID numbers to collect.   Every ETW event has a unique event ID and any IDs in this list will be collected in addition to any events specified by the Keywords.</li>
                        <li>@EventIDsToDisable - a space separated list of decimal event ID numbers to suppress.   Every ETW event has a unique event ID and any IDs in this list will be suppressed from those specified by the Keywords.</li>
                        <li>@EventIDStacksToEnable - a space separated list of decimal event ID numbers whose events should have their stacks collected.   Every ETW event has a unique event ID and any IDs in this list will have a stack logged as well as the event information.</li>
                        <li>@EventIDStacksToDisable - a space separated list of decimal event ID numbers whose events should have their stack collection suppressed.   Every ETW event has a unique event ID and any IDs in this list will not have a stack collected even though @StacksEnabled would otherwise have caused a stack collection.</li>
                    </ul>
                    <p>
                        Because
                        some of the lists use whitespace as a separator, you will need to quote the command line qualifier if you specify these on the command line.
                    </p>
                </li>
            </ul>
        </li>
    </ul>
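    <p>
        As a rough illustration of the provider specification syntax described above, the sketch
        below assembles <em>provider</em>:<em>keywords</em>:<em>level</em>:<em>values</em> strings
        and joins them into a /Providers qualifier. The helper names, the example EventSource name
        and the process IDs are hypothetical, and wrapping the whole qualifier in double quotes is
        an assumption about how your shell expects quoting to be done.
    </p>

```python
def provider_spec(provider, keywords=None, level=None, values=None):
    """Build one 'provider:keywords:level:values' specification string."""
    parts = [provider]
    if keywords is not None:
        parts.append(hex(keywords))   # hexadecimal; a 0x prefix is allowed
    elif level is not None:
        parts.append("*")             # '*' means all keywords
    if level is not None:
        parts.append(str(level))      # a name (e.g. "Verbose") or a number (e.g. 5)
    if values:
        # KEY=VALUE pairs are separated by semicolons. Keys that start with '@'
        # are commands to the ETW system; everything else goes to the provider.
        parts.append(";".join(f"{key}={value}" for key, value in values.items()))
    return ":".join(parts)


def providers_qualifier(specs):
    """Join specifications with commas; quote the qualifier if it contains spaces."""
    qualifier = "/Providers=" + ",".join(specs)
    return f'"{qualifier}"' if " " in qualifier else qualifier


# Hypothetical example: one EventSource with stacks, one OS provider filtered by process ID.
EXAMPLE_SPECS = [
    provider_spec("*MyCompany-MyEventSource", keywords=0x10, level="Verbose",
                  values={"@StacksEnabled": "true"}),
    provider_spec("Microsoft-Windows-Kernel-Memory",
                  values={"@ProcessIDFilter": "1234 5678"}),
]
EXAMPLE_QUALIFIER = providers_qualifier(EXAMPLE_SPECS)
```

    <p>
        Because the @ProcessIDFilter value contains a space, the resulting qualifier is quoted,
        which matches the note above about whitespace-separated lists.
    </p>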
    <p>
        In addition to the more advanced events there are additional advanced options that
        you rarely have to change.
    </p>
    <ul>
        <li>
            The <strong><a id="SampleIntervalTextBox">Sample Interval Text Box.</a></strong>
            - By default, when CPU sampling is turned on, the system takes a sample once a millisecond
            per CPU. This number changes this default. It can be fractional but the system does
            enforce a minimum time (typically .125 MSec).   This can be set from the command
            line with the /CPUSampleMSec:XXX option.
        </li>
        <li>
            The <strong><a id="RundownCheckBox">.NET Symbol Collection Checkbox </a></strong>
            - In order to get symbolic names for .NET methods, it is necessary to ask the .NET
            Runtime to dump this symbolic information to the data file.   Checking
            this box causes this to happen for every .NET process in the system just before
            data collection stops.  Because the .NET runtime already dumps this information
            at process shutdown, if the process you are interested in shuts down before data
            collection completes, then doing this rundown is not necessary.  This is why
            this defaults to &#39;off&#39; for &#39;Run&#39; commands since they always shut
            down before data collection completes.   However if you are using
            the &#39;Collect&#39; command the default is to perform this dump.   If
            you know that the process has already shut down, however, you can uncheck this box
            and save some time and disk space.
        </li>
        <li>
            The<strong> No 3.X<a id="NoNGenRundownCheckBox"> NGEN Symbols Checkbox </a></strong>
            - In version 4.0 of the runtime and beyond, the runtime has the capability of generating
            symbolic information (PDBs) directly from the NGEN images.  However previous
            versions require all that symbolic information to be dumped into the ETL file.
            By default PerfView only dumps this information for processes that need it (that
            have runtime versions before V4.0), however this still means that if there are V3.5
            processes running, even if they are &#39;uninteresting&#39; processes, they will
            still dump a lot of data to the ETL file.  If you know that the processes that
            you are interested in run version 4.0 or greater, then you can avoid bloating the
            file in this way by checking this checkbox.
        </li>
        <li>
            The <strong><a id="RundownTimeoutTextBox">Symbol Timeout TextBox</a></strong> -
            If the <a href="#RundownCheckBox">.NET Symbol Collection checkbox</a> has been checked
            PerfView will signal all .NET processes to dump their symbolic information.
            Dumping symbolic information can take from a few seconds to over a minute to complete depending
            on how many processes are running .NET applications.  To determine how long
            to wait, PerfView monitors CPU activity and when it drops to a low value it assumes
            rundown is complete.   However this heuristic is not foolproof.
            If there is a CPU-bound process on the system, PerfView could wait forever.
            Thus some fallback timeout is needed.   This is what the Symbol Timeout
            TextBox is for.   It defaults to 30 seconds which is typically more than
            enough, but may need to be increased.
        </li>
        <li>
            The <strong><a id="MaxCollectTextBox">Max Collect TextBox</a></strong> - This
            is the number of seconds that collection will continue. It is useful for automated collection.
            See <a href="#ProductionMonitoring">Production monitoring</a> for more details.
        </li>
        <li>
            The <strong><a id="StopTriggerTextBox">Stop Trigger TextBox</a></strong> -
            This is of the form CATEGORY:COUNTERNAME:INSTANCE OP NUM, where CATEGORY:COUNTERNAME:INSTANCE
            identifies a performance counter (same as PerfMon), OP is either &lt; or &gt;, and
            NUM is a number.   When that condition is true, collection will stop.
            See <a href="#ProductionMonitoring">Production monitoring</a> for more details.
        </li>
        <li>
            The <strong><a id="CPUCountersTextBox">Cpu Ctrs TextBox</a></strong> -
            PerfView has the ability to sample stacks based on CPU sampling
            counters (like instructionsRetired, DCache miss rates, branch mispredictions, etc.)
            in addition to the standard sampling based on time. To turn these on you enter a
            space-separated list of strings of the form <strong>CpuCtrName:RolloverCount</strong>,
            where <strong>CpuCtrName</strong> is the name of the counter and <strong>RolloverCount</strong>
            is the number of such events to skip after taking a sample (thus BranchInstruction:10000
            will take a stack trace once every 10K branch instructions).
            <b>The set of CPU counters that are supported depends on the processor</b>.
            In addition to the GUI you can access this feature
            using the command line /CpuCounters:XXX qualifier.   (See PerfView -> Command Line Help for more.)
            <p>
                You can select several of these options from
                the drop down menu and then modify the counts if desired.   Currently there
                is no special view for these events; they show up in the 'Any Stacks' view as the
                PMCSample event.  Thus going to that view and doing an 'Include Item' on this
                item will allow you to see at what stacks the samples were taken.
            </p>
            <strong>Note:</strong> On versions of Windows earlier than Windows 10 this feature does not work if a hypervisor is enabled.
            If you only have 'Timer' events available this is the cause. Disable the hypervisor to use
            this feature.
        </li>
        <li>
            The <strong><a id="OSHeapExeTextBox">OS Heap Executable TextBox</a></strong> -
            Windows has the ability to log every time a memory allocation is made from the
            OS memory heap (GlobalAlloc or LocalAlloc APIs). Most unmanaged languages allocate
            their memory from this heap. However unlike other ETW events, this one is so voluminous
            that it is not turned on machine wide. Instead you specify the name of the process
            EXE (no directory name, with or without the EXE extension) and only processes that
            start after collection starts and have this name will log OS Heap events. If
            you want to log events from a process that has already started use the
            <a href="#OSHeapProcessTextBox">OS Heap Process ID Textbox</a>.
            See <a href="#UnmanagedMemoryAnalysis">Unmanaged Memory Analysis</a> for more.
        </li>
        <li>
            The <strong><a id="OSHeapProcessTextBox">OS Heap Process ID TextBox</a></strong>
            - Windows has the ability to log every time a memory allocation is made from
            the OS memory heap (GlobalAlloc or LocalAlloc APIs). Most unmanaged languages allocate
            their memory from this heap. However unlike other ETW events, this one is so voluminous
            that it is not turned on machine wide. Instead you specify the process ID of the
            process whose events you wish to log. If you wish to log events of a process that has
            not yet started, use the <a href="#OSHeapExeTextBox">OS Heap Executable Textbox</a>.
            See <a href="#UnmanagedMemoryAnalysis">Unmanaged Memory Analysis</a> for more.
        </li>
    </ul>
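    <p>
        The Stop Trigger format (CATEGORY:COUNTERNAME:INSTANCE OP NUM) can be made concrete with a
        small parser. This is only a sketch of the syntax as described above, not PerfView's own
        parsing code; the type and function names are hypothetical, and the example counter is
        just a commonly available PerfMon counter.
    </p>

```python
from typing import NamedTuple


class StopTrigger(NamedTuple):
    """One parsed 'CATEGORY:COUNTERNAME:INSTANCE OP NUM' trigger (hypothetical type)."""
    category: str
    counter: str
    instance: str
    op: str
    threshold: float

    def is_met(self, value):
        """Return True when the counter value satisfies the trigger condition."""
        return value > self.threshold if self.op == ">" else value < self.threshold


def parse_stop_trigger(text):
    """Split a trigger string into its counter name, operator and number."""
    # The operator is the last '<' or '>' in the string; the counter precedes it.
    op_index = max(text.rfind("<"), text.rfind(">"))
    if op_index < 0:
        raise ValueError("expected an OP of '<' or '>' in: " + text)
    counter_spec, op, number = text[:op_index], text[op_index], text[op_index + 1:]
    parts = counter_spec.split(":", 2)
    if len(parts) != 3:
        raise ValueError("expected CATEGORY:COUNTERNAME:INSTANCE in: " + text)
    category, counter, instance = (part.strip() for part in parts)
    return StopTrigger(category, counter, instance, op, float(number))


# Example: stop when total CPU usage rises above 90 percent.
EXAMPLE_TRIGGER = parse_stop_trigger("Processor:% Processor Time:_Total > 90")
```

    <p>
        With this trigger, <code>EXAMPLE_TRIGGER.is_met(95)</code> is true and collection would
        stop, while a reading of 50 would let collection continue.
    </p>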
    <!--  ****************** -->
    <h3><a id="ProviderBrowser">Provider Browser</a></h3>
    <p>
        The Provider Browser is a dialog box opened by the ... button on the right of
        the <a href="#AdditionalProvidersTextBox">additional providers</a> textbox.
        The Provider Browser allows the user to inspect the providers that are available
        as well as the keywords available for any particular provider.
    </p>
    <p>
        Because there are so many ETW providers available machine wide, the Browser also allows
        the search to be filtered to only those providers that are relevant for a particular
        process.
    </p>
    <ul>
        <li>
            The <strong><a id="ProviderBrowserProcesses">process selector</a></strong>
            has a list of all of the processes on the system when the window was created.
            This allows the providers for a given process to be viewed.
            The "*" entry selects all the registered providers.  This is the default selection.
        </li>
        <li>
            The <strong><a id="ProviderBrowserProviders">provider selector</a></strong>
            is a list of either all of the registered providers or those for the selected process.
        </li>
        <li>
            The <strong><a id="ProviderBrowserKeywords">keyword selector</a></strong>
            is a list of all of the keywords for the selected provider.  Multiple keywords can be selected for the provider specification.
        </li>
    </ul>
    <h4>Viewing Manifests</h4>
    <p>
        While the name of the provider and its keywords are often sufficient to decide
        what events to turn on, it is not unusual that you want more information about what the
        possible events are.   This is what the 'View Manifest' button is for.   Many providers
        register an XML document called a <b>manifest</b> that describes all the events the
        provider can generate in relatively fine detail.    Included in this manifest are
        <ul>
            <li>
                A complete list of all the keywords (bits in a bitset) that can be specified
                to control what events are enabled
            </li>
            <li>
                A description of each event that includes
                <ul>
                    <li>The task and opcode for the event (which make up its name)</li>
                    <li>The name and type of each property that is part of the payload for the event</li>
                </ul>
            </li>
        </ul>
        This information is typically sufficient to determine the optimal keywords
        to set for any given application.  See
        the <a href="http://msdn.microsoft.com/en-us/library/windows/desktop/aa384043(v=vs.85).aspx">official docs</a>
        for more details of the information in the manifest.
    </p>
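    <p>
        To give a feel for what a manifest contains, the sketch below reads the keywords and events
        out of a very small, made-up manifest. The provider name, GUID, events and field names are
        all hypothetical, and real manifests carry many more elements (channels, levels, string
        tables and so on); the element and attribute names used here follow the Windows event
        manifest schema as best understood, so treat this as an illustration rather than a complete parser.
    </p>

```python
import xml.etree.ElementTree as ET

# XML namespace used by Windows instrumentation manifests.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events"}

# A tiny, hypothetical manifest for illustration only.
SAMPLE_MANIFEST = """
<instrumentationManifest xmlns="http://schemas.microsoft.com/win/2004/08/events">
  <instrumentation>
    <events>
      <provider name="MyCompany-Sample" guid="{00000000-0000-0000-0000-000000000001}">
        <keywords>
          <keyword name="Requests" mask="0x1"/>
          <keyword name="Diagnostics" mask="0x2"/>
        </keywords>
        <events>
          <event value="1" task="Request" opcode="win:Start"
                 keywords="Requests" template="RequestStartArgs"/>
        </events>
        <templates>
          <template tid="RequestStartArgs">
            <data name="Url" inType="win:UnicodeString"/>
            <data name="RequestId" inType="win:Int32"/>
          </template>
        </templates>
      </provider>
    </events>
  </instrumentation>
</instrumentationManifest>
"""


def summarize_manifest(xml_text):
    """Return (keywords, events) for the first provider in a manifest."""
    provider = ET.fromstring(xml_text.strip()).find(".//e:provider", NS)
    # Each keyword is one bit in the provider's keyword bitset.
    keywords = {k.get("name"): int(k.get("mask"), 16)
                for k in provider.findall("e:keywords/e:keyword", NS)}
    # Templates describe the payload: the name and type of each property.
    templates = {t.get("tid"): [(d.get("name"), d.get("inType"))
                                for d in t.findall("e:data", NS)]
                 for t in provider.findall("e:templates/e:template", NS)}
    events = []
    for ev in provider.findall("e:events/e:event", NS):
        opcode = (ev.get("opcode") or "").replace("win:", "")
        # The task and opcode together make up the event's name.
        name = "/".join(part for part in (ev.get("task"), opcode) if part)
        events.append((int(ev.get("value")), name, templates.get(ev.get("template"), [])))
    return keywords, events


KEYWORDS, EVENTS = summarize_manifest(SAMPLE_MANIFEST)
# Selecting several keywords means OR-ing their bits together.
BOTH_KEYWORDS = KEYWORDS["Requests"] | KEYWORDS["Diagnostics"]
```

    <p>
        The combined keyword value is what would be placed in the <em>keywords</em> part of a
        provider specification when more than one keyword is selected.
    </p>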
    <!--  ****************** -->
    <h3>The Abort command</h3>
    <p>
        The model for ETW data collection is that data is collected machine-wide.&nbsp;
        Moreover, data collection can <strong>
            exceed the lifetime of the process that started
            collection
        </strong>.&nbsp; While this characteristic is useful (it allows independent
        start and stop command line commands), it also means that it is possible to accidentally
        leave ETW collection running for an indefinite period of time.&nbsp;&nbsp;&nbsp;
        PerfView goes to some length to ensure that data collection is stopped in typical
        cases, however if PerfView was terminated abnormally, or if the command line &#39;start&#39;
        operation was used it is possible that ETW data collection is left on.&nbsp; The
        Collect-&gt;Abort command is designed for this case.&nbsp;&nbsp; It ensures that
        any ETW providers turned on by PerfView are off.&nbsp;
    </p>
    <p>
        Finally, it is also easy to launch PerfView from the command line to collect profile
        data.&nbsp; See <a href="#CollectingFromCommandLine">
            collecting data from the command
            line
        </a> for more.
    </p>
    <hr />
    <!--  ********************************** -->
    <h2>
        <a id="MemoryCollectionDialog">Memory Collection Dialog </a>
    </h2>
    <p>
        The memory collection Dialog box allows you to select the input and output for collecting
        GC Heap data as well as set additional options on how that data is collected.
    </p>
    <ul>
        <li>
            <a id="ProcessDumpTextBox"><strong>Process Dump TextBox</strong> </a>This textbox
            is only present when extracting the GC heap from a process dump. It indicates the
            dump file that will be used as input.
        </li>
        <li>
            <a id="ProcessFilterTextBox"><strong>Process Filter TextBox</strong> </a>Any .NET
            Regular expression in the Filter textbox&nbsp; is used as a filter for the Process
            List View.&nbsp; Only processes whose name or process ID match the given regular
            expression will be shown in the listview.&nbsp; This textbox is only present when
            dumping a GC heap from a live process.&nbsp; See <a href="#FilteringByProcess">
                filtering
                by process
            </a> for more.
        </li>
        <li>
            <a id="AllProcsCheckBox"><strong>All Procs Check Box</strong> </a>Normally only
            processes that have a GC heap (.NET and JavaScript processes) are displayed in the
            process window. Checking this box will show all processes on the system. This checkbox
            is only present when dumping a GC heap from a live process.&nbsp; See
            <a href="#FilteringByProcess">filtering by process</a> for more.
        </li>
        <li>
            <a id="ProcessesListView"><strong>Processes ListView</strong> </a>This listview
            is only present when dumping a GC heap from a live process.&nbsp; This listview
            shows all the processes in the system that you currently have sufficient rights
            to access.&nbsp; If you don&#39;t see the process of interest, it may be because
            you don&#39;t have sufficient rights.&nbsp;&nbsp; You can click the &#39;Elevate
            to Admin&#39; hyperlink to relaunch PerfView with Admin rights, which typically
            corrects the problem.
        </li>
        <li>
            <a id="GCHeapDataFileNameTextBox"><strong>Data FileName TextBox</strong> </a>This
            textbox holds the path name of the file that heap dump data will be written to.&nbsp;
        </li>
        <li>
            <a id="MaxDumpTextBox"><strong>Max Dump TextBox</strong> </a>To keep
            dump file size under control, there is a limit to the amount of the GC heap that
            will be dumped.&nbsp; By default this is 250K objects.&nbsp; This is typically more
            than enough to get a good sample (and PerfView tries hard to get a representative
            sample). This text box allows you to change this limit. See
            <a href="#GCHeapSampling">Understanding GC Heap Sampling</a> for more.
        </li>
        <li>
            <a id="FreezeCheckBox"><strong>Freeze CheckBox</strong> </a>When collecting from
            a live process, by default the process is NOT frozen for the duration of the dump
            but only in short (~ 100msec) bursts.&nbsp; This makes the process of dumping the
            heap unimpactful in server scenarios (it allows the server to continue to service
            requests).&nbsp;&nbsp; However because the heap is changing while the heap is being
            dumped, it is not a true snapshot in time.&nbsp;&nbsp; If this inaccuracy is important,
            and you are willing to have the process frozen for the time it takes to make the
            dump, then checking the freeze checkbox will cause the process to be frozen while
            dumping happens.&nbsp;&nbsp; Note that when dumping in the CLRProfiler format, the
            process is always frozen. See <a href="#ProcessFreezing">Process Freezing</a> for
            more.
        </li>
        <li>
            <a id="SaveETLCheckBox"><strong>Save ETL CheckBox </strong></a>The WPA tool can
            also display GC heaps (either JavaScript or Project N .NET), however it uses an ETL file
            as its format for the heap dump (not a GCDump file). By selecting this checkbox
            PerfView will write the GC heap to an ETL file which can be viewed with the WPA tool.   Note
            that this currently only works for JavaScript or .NET Project N (not desktop .NET).
        </li>
    </ul>
    <!--  ********************************** -->
    <h2>
        <a id="FilteringGroupingStackData">Filtering / Grouping Stack Data</a>
    </h2>
    <!--  ****************** -->
    <h3>
        <a id="PatternMatching">Simplified Pattern matching</a>
    </h3>
    <p>
        Unfortunately the syntax for normal .NET regular expressions is not very convenient
        for matching patterns for method names.&nbsp;&nbsp; In particular the &#39;.&#39;,
        &#39;\&#39; &#39;(&#39; &#39;)&#39; and even &#39;+&#39; and &#39;?&#39; are used
        in method or file names and would need to be escaped (or worse users would forget
        they need to escape them, and get misleading results).&nbsp;&nbsp; As a result PerfView
        uses a simplified set of patterns that avoid these collisions.&nbsp;&nbsp; The patterns
        are
    </p>
    <ul>
        <li>
            * - Represents any number (0 or more) of any character (like .NET .*).&nbsp;&nbsp;
            This is not unlike what * means in Windows command line
        </li>
        <li>
            % - Represents any number (0 or more) of any alpha-numeric characters or the &#39;.&#39;
            character (like .NET [\w\d.]*)
        </li>
        <li>^ - Matches the beginning of the name being matched (like .NET ^) </li>
        <li>| - is an &#39;or&#39; operator that allows the text on either side (like .NET |)</li>
        <li>{} - Forms groups for pattern replacement (like .NET ()) </li>
    </ul>
    <p>
        This simplified pattern matching is used in the GroupPats, FoldPats, IncPats, and
        ExcPats text boxes.  If you need more powerful matching operators, you can get them by
        prefixing the ENTIRE PATTERN with a @.   That indicates to PerfView that the
        rest of the pattern follows
        <a href="http://msdn.microsoft.com/en-us/library/hs600312(v=vs.110).aspx">.NET Regular expression</a> syntax.
    </p>
    <p>
        Simplified pattern matching is NOT used in the &#39;Find&#39; box.&nbsp;
        For that true .NET regular expressions are used.&nbsp;
    </p>
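    <p>
        The operator list above can be read as a translation table.&nbsp; The following is a minimal
        sketch of that translation, not PerfView's actual implementation.&nbsp; It uses Python's
        <code>re</code> module as a stand-in for .NET regular expressions, and the names
        <code>OPERATORS</code> and <code>simplified_to_regex</code> are hypothetical.
    </p>

```python
import re

# Simplified-pattern operators and their regular-expression equivalents,
# taken from the operator list above.
OPERATORS = {
    "*": ".*",          # any run of any characters
    "%": r"[\w\d.]*",   # any run of alphanumerics or '.'
    "^": "^",           # anchor at the start of the name
    "|": "|",           # alternation
    "{": "(",           # start of a capture group
    "}": ")",           # end of a capture group
}


def simplified_to_regex(pattern):
    """Translate a simplified pattern into regular-expression text."""
    if pattern.startswith("@"):
        # '@' means the rest of the pattern is already a regular expression.
        return pattern[1:]
    # Every non-operator character is matched literally, so '.', '(', '+',
    # '?' and '\' need no escaping by the user.
    return "".join(OPERATORS.get(ch, re.escape(ch)) for ch in pattern)


frame = r"C:\Windows\Microsoft.NET\Framework\v2.0.50727\mscorwks!Assembly::ExecuteMainMethod"
match = re.search(simplified_to_regex("{%}!"), frame)
print(match.group(1))   # the text captured by {%}, here the module name
```

    <p>
        Because every character outside the table is escaped, a pattern such as
        <code>Foo.Bar(</code> matches that literal text rather than being parsed as regular-expression syntax.
    </p>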
    <!--  ****************** -->
    <h3>
        <a id="GroupPatsTextBox">Grouping (The GroupPats TextBox)</a>
    </h3>
    <p>
        See also <a href="#PatternMatching">Simplified Pattern matching</a>.
    </p>
    <p>
        Fundamentally, what is collected by the PerfView profiler is a sequence of stacks.&nbsp;
        A stack is collected every millisecond for each hardware processor on the machine.&nbsp;&nbsp;
        This is wonderfully detailed information, but it is very easy to not see the
        &#39;forest&#39; (the semantic component consuming an unreasonable amount of time)
        because of the &#39;trees&#39; (the data on hundreds or even thousands of &#39;helper&#39;
        methods that are used by many different components).&nbsp;&nbsp;&nbsp;&nbsp; One
        very important tool to tame this complexity is to group methods into semantic groups.&nbsp;&nbsp;&nbsp;
        PerfView provides a simple but very powerful way of doing just this.&nbsp;
    </p>
    <p>
        Every sample consists of a list of stack frames, each of which has a name associated
        with it.&nbsp; Initially a frame name looks something like this
    </p>
    <ul>
        <li>
            C:\Windows\Microsoft.NET\Framework\v2.0.50727\mscorwks!Assembly::ExecuteMainMethod
        </li>
    </ul>
    <p>
        In particular the name consists of the full path of the DLL that contains the method
        (however the file name suffix has been removed), followed by a &#39;!&#39; followed
        by the full name (including namespace and signature) of the method.&nbsp;&nbsp;
        By default PerfView simply removes the directory path from the name and uses that
        to display.&nbsp;&nbsp; However you can instead ask PerfView to group together methods
        that match a particular pattern.&nbsp; There are two ways of doing this.&nbsp;&nbsp;
    </p>
    <ol>
        <li>
            <em>PAT</em><strong>-&gt;</strong><em>GROUPNAME</em>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
            Replace any frame names matching <em>PAT</em> with the text <em>GROUPNAME</em>.&nbsp;&nbsp;
        </li>
        <li>
            <em>PAT</em><strong>=&gt;</strong><em>GROUPNAME</em><strong>
                &nbsp;&nbsp;&nbsp;&nbsp;
            </strong>&nbsp; Like <em>PAT-&gt;GROUPNAME</em> but remember the &#39;entry point&#39;
            into the group.&nbsp; (See <a href="#EntryGroups">Entry Groups</a>)
        </li>
    </ol>
    <p>
        The first form is the easiest to understand.&nbsp;&nbsp; Basically it is just
        search and substitute on all the frame names.&nbsp;&nbsp;&nbsp;&nbsp; Any frame
        that matches the given pattern, will be replaced (in its entirety) with GROUPNAME.&nbsp;&nbsp;
        This has the effect of creating groups (all methods that match a particular pattern).&nbsp;&nbsp;
        For example the specification
    </p>
    <ul>
        <li>mscorlib!Assembly::-&gt;class Assembly </li>
    </ul>
    <p>
        Will match any frames that have mscorlib!Assembly:: and replace the entire frame
        name (not just the part that matched) with the string &#39;class Assembly&#39;.&nbsp;&nbsp;
        This has the effect of grouping all methods from the class Assembly into a single
        group.&nbsp; With one simple command you can group together all methods from a particular
        class.
    </p>
    <p>
        Like .NET regular expressions, PerfView regular expressions allow you to &#39;capture&#39;
        parts of the string that match the pattern and use them in forming the group name.&nbsp;&nbsp;
        By surrounding parts of the pattern with {} you capture that&nbsp; part of the pattern,
        and then you can reference the string that matched that part of the pattern
        by using $1, $2, ... to signify the first, second, ... capture.&nbsp; For example
    </p>
    <ul>
        <li>{%}!-&gt;module $1 </li>
    </ul>
    <p>
        Says to match any frame that has alphanumeric characters before !, and to capture
        those alphanumeric characters into a $1 variable.&nbsp;&nbsp; Whatever was matched
        is then used to form a group name.&nbsp;&nbsp; This has the effect of grouping all
        samples by the module that contained them (the &#39;module level view&#39;).&nbsp;&nbsp;
    </p>
    <p>
        It is useful to have more than one group specification, so group syntax supports
        a semicolon list of grouping commands.&nbsp; For example here is another useful
        one.
    </p>
    <ul>
        <li>
            <span class="style1"><font color="#3333ff">{%!*}.%(-&gt;class $1</font></span>;<span class="style2"><font color="#cc0000">{%!*}::-&gt;class $1</font></span>
        </li>
    </ul>
    <p>
        There are two patterns in this specification.&nbsp; The first one (in blue)
        captures the text right before the ! as well as everything up to the last &#39;.&#39; before
        a (.&nbsp;&nbsp; This captures the &#39;class and namespace&#39; part of a .NET
        style method name.&nbsp;&nbsp; The second pattern (in red) does something very similar with
        C++ style names (that use :: to separate class name from method name).&nbsp;&nbsp;&nbsp;
        Thus the specification above groups methods by class.&nbsp;&nbsp; Powerful!
    </p>
    <p>
        Another useful technique is take advantage of the fact that the full path name of
        a module is matched to group even more broadly than module.&nbsp; For example because
        * matches any number of&nbsp; any character, the pattern
    </p>
    <ul>
        <li>system32\*!-&gt;OS </li>
    </ul>
    <p>
        Will have the effect of grouping any methods that came from ANY module that
        has system32 as any part of its module&#39;s path as &#39;OS&#39;.&nbsp;&nbsp; This
        is very convenient because typically this is what people want.&nbsp; They don&#39;t
        want to see any of the details of&nbsp; methods INTERNAL to the operating system,
        they want them grouped together.&nbsp; This simple command does this in one swoop.
    </p>
    <h4>Grouping precedence and exclusion groups</h4>
    <p>
        When a frame is matched against groups, it is done in the order of the group patterns.&nbsp;&nbsp;
        Once a match occurs, no further processing of the group pattern is done for that
        frame (first one wins).&nbsp;&nbsp; Moreover, if the GROUPNAME is omitted, it means
        &#39;do no transformation&#39;.&nbsp;&nbsp; These two behaviors can be combined
        to force certain methods to NOT be in a group.&nbsp; For example the specification
    </p>
    <ul>
        <li>
            <span class="style1"><font color="#3333ff">myDirectory\*!-&gt;</font></span>;<span class="style2"><font color="#cc0000">{%}!-&gt;module $1</font></span>
        </li>
    </ul>
    <p>
        Forces a module level view for all modules (the red grouping pattern), however because
        of the first (blue) pattern, any modules that have &#39;myDirectory&#39; in their path
        are NOT grouped by the red pattern (they are excluded).&nbsp; This can be used to
        create a &#39;just my code&#39; effect.&nbsp; Functions of every module except the
        code that lives under &#39;myDirectory&#39; are grouped together.&nbsp;&nbsp; Powerful!
    </p>
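    <p>
        The rules described in this section (first match wins, $1 substitution, and an empty GROUPNAME
        meaning &#39;do no transformation&#39;) can be sketched as follows.&nbsp; This is an illustration
        only, not PerfView&#39;s code: it uses Python&#39;s <code>re</code> module in place of .NET regular
        expressions, handles only the -&gt; form, leaves unmatched frames completely untouched (PerfView
        would also strip the directory path), and the names <code>to_regex</code>,
        <code>parse_group_pats</code> and <code>group_frame</code> are hypothetical.
    </p>

```python
import re

OPERATORS = {"*": ".*", "%": r"[\w\d.]*", "^": "^", "|": "|", "{": "(", "}": ")"}


def to_regex(pattern):
    """Compile a simplified pattern; non-operator characters match literally."""
    return re.compile("".join(OPERATORS.get(ch, re.escape(ch)) for ch in pattern))


def parse_group_pats(spec):
    """Split 'PAT->NAME;PAT->NAME' into an ordered list of (regex, name) rules."""
    rules = []
    for command in spec.split(";"):
        if command.strip():
            pattern, _, name = command.strip().partition("->")
            rules.append((to_regex(pattern), name.strip()))
    return rules


def group_frame(frame, rules):
    """Return the display name for one frame; the first matching rule wins."""
    for regex, name in rules:
        match = regex.search(frame)
        if match is None:
            continue
        if not name:
            return frame  # empty GROUPNAME: leave the frame untransformed
        # Substitute $1, $2, ... with the text captured by {} in the pattern.
        return re.sub(r"\$(\d)", lambda ref: match.group(int(ref.group(1))), name)
    return frame


rules = parse_group_pats(r"myDirectory\*!->;{%}!->module $1")
print(group_frame(r"C:\myDirectory\app!Main", rules))
print(group_frame(r"C:\Windows\system32\ntdll!RtlAllocateHeap", rules))
```

    <p>
        The first frame is caught by the exclusion rule and keeps its name, while the second falls
        through to the module rule and is renamed to a module-level group.
    </p>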
    <h4>
        <a id="EntryGroups">Entry Groups</a>
    </h4>
    <p>
        The examples so far are &#39;simple groups&#39;.&nbsp;&nbsp; The problem with simple
        groups is that you lose track of valuable information about how you &#39;entered&#39;
        the group.&nbsp; Consider the example of grouping all modules in System32 into a
        group called OS that was considered before.&nbsp;&nbsp; This works well, but has
        limitations.&nbsp; You might see that a particular function &#39;Foo&#39; calls
        into the OS and that whatever it did in the OS takes a lot of time.&nbsp;&nbsp;
        Now it may be possible simply by looking at the body of &#39;Foo&#39; to &#39;guess&#39;
        what OS function was being called, but this is clearly an unnecessary pain.&nbsp;&nbsp;&nbsp;
        The data collected knows exactly which OS function was entered, it is just that
        our grouping has stripped that information.&nbsp;
    </p>
    <p>
        This is the problem entry groups solve.&nbsp;&nbsp; They are just like normal groups
        but use the =&gt; instead of -&gt; to indicate they are entry groups.&nbsp;&nbsp; An entry
        group creates the same group as a normal group but it instructs the parsing logic
        to take the caller into account.&nbsp; Effectively a group is formed for each &#39;entry
        point&#39; into the group.&nbsp;&nbsp; If a call is made from outside the group to inside
        the group, the name of the entry point is used as the name of the group.&nbsp;&nbsp;
        As long as that method calls other methods within the group, the stack frame is
        marked as being in the group.&nbsp;&nbsp;&nbsp;&nbsp; Thus boundary methods are
        left alone (they always form another group), but internal methods (methods that call
        within the group) are assigned to whatever entry point group called them.
    </p>
    <p>
        This fits very nicely into people&#39;s normal notion of modularity.&nbsp; While grouping
        all functions within the OS as a group is reasonable in some cases, it is also reasonable
        to group them by &#39;public surface areas&#39; (a group for every entry point into the
        OS).&nbsp;&nbsp; This is what entry groups do.&nbsp;&nbsp; Thus the command
    </p>
    <ul>
        <li>
            system32\*!<span class="style2"><font color="#cc0000">=</font></span>&gt;OS
        </li>
    </ul>
    <p>
        Will fold away all OS functions, keeping just their entry points in the lists.&nbsp;
        This is VERY powerful!
    </p>
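    <p>
        As a rough illustration of the entry group idea, the sketch below walks one stack from caller
        to callee and names every frame inside the group after the frame through which the group was
        entered.&nbsp; The function <code>apply_entry_group</code>, the <code>in_group</code> predicate and
        the <code>GROUP &lt;&lt;entry&gt;&gt;</code> naming format are assumptions made for this example,
        not a description of PerfView&#39;s internals.
    </p>

```python
def apply_entry_group(stack, in_group, group_name):
    """Rename frames of a caller-to-callee stack using entry-group rules.

    Frames outside the group keep their names.  A run of consecutive frames
    inside the group is named after the first frame of the run (its entry
    point), so internal helpers are attributed to the way the group was entered.
    """
    result = []
    entry = None  # entry point of the group run we are currently inside, if any
    for frame in stack:
        if in_group(frame):
            if entry is None:
                entry = frame  # call from outside the group: new entry point
            result.append(f"{group_name} <<{entry}>>")
        else:
            entry = None       # we left the group
            result.append(frame)
    return result


stack = [
    "app!Main",
    "app!Foo",
    "ntdll!RtlAllocateHeap",           # entry point into the OS group
    "ntdll!RtlpAllocateHeapInternal",  # internal to the OS group
]
for name in apply_entry_group(stack, lambda f: f.startswith("ntdll!"), "OS"):
    print(name)
```

    <p>
        With a simple group both ntdll frames would just become OS; here they share a name that still
        records that Foo entered the OS through RtlAllocateHeap.
    </p>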
    <h4>Group Descriptions (comments)</h4>
    <p>
        Groups can be a powerful feature, but often the semantic usefulness of a group is
        not clear simply by looking at the pattern definition.&nbsp;&nbsp; Because of this
        groups are allowed to have a description that precedes the actual group pattern.&nbsp;
        This description is enclosed in square brackets [].&nbsp;&nbsp; PerfView ignores
        these descriptions, however they are very useful for humans to look at to understand
        the intent of the pattern.&nbsp;
    </p>
    <!--  ****************** -->
    <h3>Folding (inlining)</h3>
    <h4>
        <a id="FoldPatsTextBox">Folding by name (FoldPats TextBox)</a>
    </h4>
    <p>
        See also <a href="#PatternMatching">Simplified Pattern matching</a>.
    </p>
    <p>
        It is not uncommon that a particular helper method will show up &#39;hot&#39; in
        a profile.&nbsp; You have looked at this helper method and it is as efficient as
        it can be made.&nbsp; There is no way to make it better.&nbsp;&nbsp; Thus it is no longer
        interesting to see this method in the profile.&nbsp;&nbsp; You would prefer that
        this method was &#39;inlined&#39; into each of its callers so that they get charged
        for the cost (rather than it showing up in the helper).&nbsp; This is exactly what&nbsp;
        folding does.&nbsp;&nbsp; The &#39;FoldPats&#39; text box is simply a semicolon
        list of patterns to fold away.&nbsp;&nbsp; Thus the pattern
    </p>
    <ul>
        <li>MyHelperFunction </li>
    </ul>
    <p>
        Will remove MyHelperFunction from the trace, moving its time into whoever called
        it (as exclusive time).&nbsp; It has the effect of &#39;inlining&#39; MyHelperFunction
        into all callers.&nbsp;
    </p>
    <p>
        Grouping transformations occur before folding (or filtering), so you can use the
        names of groups to specify folding.&nbsp; Thus the fold specification
    </p>
    <ul>
        <li>OS</li>
    </ul>
    <p>
        Will fold away all OS functions (into their parents) all in one simple command.&nbsp;
    </p>
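    <p>
        The effect of folding on exclusive time can be seen in a small sketch.&nbsp; It assumes each
        sample is a (stack, metric) pair with the stack ordered from caller to callee, and for brevity
        it folds by exact frame name rather than by simplified pattern.&nbsp; The names <code>fold</code>
        and <code>exclusive_time</code> are hypothetical.
    </p>

```python
from collections import Counter


def fold(stack, fold_names):
    """Remove folded frames from a caller-to-callee stack."""
    return [frame for frame in stack if frame not in fold_names]


def exclusive_time(samples, fold_names=()):
    """Charge each sample's metric to its last (leaf) frame after folding."""
    exclusive = Counter()
    for stack, metric in samples:
        kept = fold(stack, fold_names)
        if kept:
            exclusive[kept[-1]] += metric
    return exclusive


samples = [
    (["Main", "Parse", "MyHelperFunction"], 4),
    (["Main", "Render", "MyHelperFunction"], 2),
    (["Main", "Render"], 1),
]
before = exclusive_time(samples)
after = exclusive_time(samples, fold_names={"MyHelperFunction"})
print(dict(before))
print(dict(after))
```

    <p>
        The total metric is the same before and after; folding only moves the helper&#39;s time into
        Parse and Render.
    </p>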
    <h4>
        <a id="FoldPercentTextBox">Folding away small nodes (The Fold % TextBox)</a>
    </h4>
    <p>
        Generally speaking, if a method does not consume more than say 1% <strong>of the total in the view</strong>
        then it is usually just 'cluttering' up the display. The Fold
        % TextBox is designed to remove this noise. Any method whose total aggregate inclusive
        metric (that is what is shown in the ByName view in the 'Inc' column) is less than
        1% of the total metric, is removed and its metric is given to its direct parent.
    </p>
    <p>
        While it is tempting to increase this number to a large value (say 10% or more),
        to force most callstacks to be 'big' this generally produces inferior results. The
        reason is that the % does not take into account the semantic relevance of the node.
        Thus folding might fold a very semantically meaningful node into a 'helper' of some
        higher level function. Thus it is usually better to select nodes that 'you don't
        understand' to fold away so that what you are left with is nodes that are meaningful
        to you.
    </p>
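    <p>
        The Fold % rule can be sketched along the same lines.&nbsp; The example below assumes (stack,
        metric) samples ordered from caller to callee, computes each name&#39;s aggregate inclusive
        metric, and removes names that fall below the threshold so that their metric lands on the
        nearest remaining caller.&nbsp; The function names are hypothetical, and keeping the root frame
        unconditionally is a choice made for this sketch.
    </p>

```python
from collections import Counter


def inclusive_by_name(samples):
    """Aggregate inclusive metric per frame name (what the 'Inc' column shows)."""
    inclusive = Counter()
    for stack, metric in samples:
        for name in set(stack):  # count a name once per sample, even if recursive
            inclusive[name] += metric
    return inclusive


def fold_small_nodes(samples, fold_percent):
    """Drop frames whose inclusive metric is below fold_percent of the total."""
    total = sum(metric for _, metric in samples)
    threshold = total * fold_percent / 100.0
    inclusive = inclusive_by_name(samples)
    small = {name for name, value in inclusive.items() if value < threshold}
    return [
        # Index 0 (the root) is always kept so no stack becomes empty.
        ([f for i, f in enumerate(stack) if i == 0 or f not in small], metric)
        for stack, metric in samples
    ]


samples = [
    (["Main", "A"], 98),
    (["Main", "A", "Tiny"], 1),
    (["Main", "B"], 1),
]
for stack, metric in fold_small_nodes(samples, fold_percent=2):
    print(stack, metric)
```

    <p>
        Note that the rule looks only at size: B is folded into Main here even if B happens to be the
        semantically interesting node, which is the drawback described above.
    </p>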
    <!--  ****************** -->
    <h3>
        Filtering
    </h3>
    <h4>
        <a id="ExcPatsTextBox">Filtering Stacks with Particular Frames (The ExcPats TextBox)</a>
    </h4>
    <p>
        Grouping and folding have the attribute that they do not affect the total sample
        count in the trace.&nbsp;&nbsp; Samples are not removed, they are simply renamed
        or assigned to another node.&nbsp;&nbsp;&nbsp; It is also useful to exclude nodes
        altogether.&nbsp;&nbsp;&nbsp; The ExcPats text box is a semicolon-separated list of simplified
        regular expressions (See <a href="#PatternMatching">Simplified Pattern matching</a>).&nbsp;
        If <strong>any</strong> frame in the stack matches ANY of the patterns in this list,
        then the whole stack is removed from the view.&nbsp;&nbsp; The pattern does not have to match
        the complete frame name unless it is anchored (e.g. using ^).&nbsp;&nbsp; The patterns
        are matched AFTER grouping and folding.&nbsp;&nbsp;
    </p>
    <p>
        A common use of exclusion filtering is to find the &#39;second most problematic&#39;
        performance problem in an app.&nbsp;&nbsp; In this scenario you discover that a
        particular method (say &#39;Foo&#39;) was poorly designed and you even understand
        how you might fix it, but you also know that it is not your only problem.&nbsp;&nbsp;
        What you want is to find the next most important issue.&nbsp;&nbsp; By excluding
        the samples that call &#39;Foo&#39; you can effectively simulate how the program
        would behave if Foo was &#39;perfect&#39; (took no time).&nbsp;&nbsp; This is typically
        a good approximation of what the program will look like after the fix is applied.&nbsp;&nbsp;
        Thus by simply excluding these samples you look for the next perf problem and thus
        tackle many of them quickly.
    </p>
    <h4>
        <a id="IncPatsTextBox">
            Filtering any Stacks that do not Include a Particular Frame (The
            IncPats TextBox)
        </a>
    </h4>
    <p>
        By default events are captured machine wide, but often you are only interested in
        some of the samples.&nbsp; For example it is very common to only be interested in
        one process, or one thread, or isolate yourself to only one method.&nbsp;&nbsp;
        This is what the IncPats textbox does.&nbsp;&nbsp; The contents of the text box
        is a semicolon separated list of simplified regular expressions (see
        <a href="#PatternMatching">Simplified Pattern matching</a>).&nbsp;&nbsp;&nbsp; It is required that a stack
        matches at least ONE of the patterns in the IncPats list for it to be included in
        the trace.&nbsp; The pattern does not have to match the complete frame name unless
        it is anchored (e.g. using ^).&nbsp;&nbsp; The patterns are matched AFTER grouping
        and folding.&nbsp;&nbsp;
    </p>
    <p>
        As mentioned, it is very common to use the IncPats textbox to restrict your analysis
        to a single process.&nbsp;&nbsp; It is also very useful to use the &#39;|&#39; (or)
        operator here so that you can include just two (or more) processes and exclude the
        rest.&nbsp;
    </p>
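    <p>
        The two filters can be combined in a short sketch.&nbsp; It assumes the stack has already been
        grouped and folded, uses Python&#39;s <code>re</code> module in place of .NET regular expressions,
        and for simplicity takes a single include pattern (which may itself use &#39;|&#39;).&nbsp; The
        function names and the process frame names in the example are hypothetical.
    </p>

```python
import re

OPERATORS = {"*": ".*", "%": r"[\w\d.]*", "^": "^", "|": "|", "{": "(", "}": ")"}


def to_regex(pattern):
    """Compile a simplified pattern; non-operator characters match literally."""
    return re.compile("".join(OPERATORS.get(ch, re.escape(ch)) for ch in pattern))


def keep_stack(stack, inc_pat="", exc_pats=""):
    """Decide whether a (grouped and folded) stack stays in the view."""
    exclude = [to_regex(p.strip()) for p in exc_pats.split(";") if p.strip()]
    # ExcPats: any frame matching any pattern removes the whole stack.
    if any(regex.search(frame) for regex in exclude for frame in stack):
        return False
    # IncPats: when a pattern is given, some frame must match it.
    if inc_pat and not any(to_regex(inc_pat).search(frame) for frame in stack):
        return False
    return True


stack = ["Process devenv (100)", "Thread (1)", "app!Main", "app!Foo"]
print(keep_stack(stack, inc_pat="devenv|notepad"))
print(keep_stack(stack, inc_pat="devenv|notepad", exc_pats="Foo"))
print(keep_stack(["Process chrome (7)", "x!y"], inc_pat="devenv|notepad"))
```

    <p>
        Because the patterns are unanchored, <code>Foo</code> would also exclude stacks containing a frame
        such as app!FooBar; a more specific pattern avoids that.
    </p>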
    <h4>
        <a id="StartTextBox"></a><a id="EndTextBox">
            Filtering by Time (The Start and End TextBox)
        </a>
    </h4>
    <p>
        It is very useful to &#39;zoom in&#39; to a particular time of interest and filter
        out samples outside this range.&nbsp;&nbsp; This is done by setting the &#39;Start
        TextBox&#39; and &#39;End TextBox&#39; appropriately.&nbsp; These ranges are inclusive
        (on both ends), and are expressed as msecs from the start of the trace.&nbsp;&nbsp;&nbsp;&nbsp;
        You can of course enter times manually or cut and paste numbers from other parts
        of the display.&nbsp;&nbsp; In addition if you paste two numbers into the 'start'
        textbox it will set both the start and end values. There are a few other nice shortcuts
        for setting a time interval.&nbsp;
    </p>
    <h5>
        <a id="SelectingTime">Selecting Time Ranges</a>
    </h5>
    <p>
        The &#39;First&#39; and &#39;Last&#39; columns of a tree node are often a useful range
        to filter on.&nbsp; To do this easily, simply select both cells (either by dragging
        or by holding the &#39;Ctrl&#39; key as you click additional entries).&nbsp; Once
        you have selected two cells you can right click and select &#39;Set Time Range&#39;
        which will set both the start and end time to the first and last column. You can
        also select a time range by copying two numbers to the clipboard (select two cells
        and press Ctrl-C) and then pasting the numbers into the 'Start' textbox. This textbox
        is smart enough to recognize that the pasted value is a range and will set the 'End'
        time appropriately.
    </p>
    <p>
        It is also very useful to select time ranges based on the &#39;When&#39; column.&nbsp;
        To do this, first select a &#39;When&#39; cell of interest.&nbsp;&nbsp; This will
        cause the status bar at the bottom of the view to display the &#39;When&#39; text.&nbsp;&nbsp;
        By dragging the mouse over the characters, highlight the region of interest (it
        is typically the region of high cost).&nbsp;&nbsp; Then move your mouse off the
        selected region, right click and select &#39;Set Time Range&#39;.&nbsp; This will
        set the &#39;Start&#39; and &#39;End&#39; time to the region you selected.&nbsp;&nbsp;
        You may end up repeating this process to further &#39;zoom in&#39; to a region.&nbsp;
    </p>
    <h4>
        <a id="SamplingTextBox">Speeding up StackViewer display with sampling.</a>
    </h4>
    <p>
        If there are more than 1M data samples being viewed in the stack viewer, the responsiveness
        becomes very sluggish (it takes more than 10 seconds to update). To avoid this some stack
        sources (most notably the memory stack source) support the concept of sampling.
        The basic idea behind sampling is to only process every Nth sample. Thus by setting
        the sampling text box to 10 the stack view will only have to process 1/10 of the
        data and thus should be 10 times faster. When Sampling is enabled, the stack-viewer
        automatically scales all counts (and therefore metrics too) in the view by the sampling
        rate. Thus the resulting metric and counts are approximately the same as without
        sampling (you can see this because all counts are a multiple of the sampling rate).&nbsp;
    </p>
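    <p>
        The sampling and scaling described above amounts to the following sketch, which assumes each
        sample is a (stack, metric, count) tuple.&nbsp; The function name <code>sample_and_scale</code> is
        hypothetical and the code illustrates the idea rather than PerfView&#39;s implementation.
    </p>

```python
def sample_and_scale(samples, rate):
    """Keep every Nth sample and scale its metric and count by N."""
    return [
        (stack, metric * rate, count * rate)
        for index, (stack, metric, count) in enumerate(samples)
        if index % rate == 0
    ]


# 1000 identical samples, each with metric 1.0 and count 1.
samples = [(["Main", "Alloc"], 1.0, 1) for _ in range(1000)]
sampled = sample_and_scale(samples, rate=10)

print(len(sampled))                                # samples actually processed
print(sum(metric for _, metric, _ in sampled))     # scaled total metric
```

    <p>
        Only a tenth of the samples are processed, yet the scaled totals match the unsampled totals
        here because the input is uniform; with real data they are only approximately equal.
    </p>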
    <h4>
        <a id="FindTextBox">Finding Items in the View (The Find TextBox)</a>
    </h4>
    <p>
        Text searches of names in the view can be performed by typing a search pattern in
        the &#39;Find:&#39; text box in the upper right corner of the stack viewer.&nbsp;&nbsp;
        Ctrl-F will bring you to this search box quickly.&nbsp;&nbsp; The search pattern
        uses <a href="http://msdn.microsoft.com/en-us/library/hs600312.aspx">.NET regular expressions</a>,
        and is case insensitive.&nbsp;&nbsp; Searching starts at the current cursor position
        and will wrap around until all text is searched.&nbsp;&nbsp; The F3 key can be used
        to find the next instance of the pattern.&nbsp; When all the text has been searched
        the app will beep.&nbsp; The next F3 after that starts over.
        Expressions combined with boolean criteria can be specified in the same way as when filtering
        select columns in the <a href="#FilteringSelectColumns">Columns to Display</a> textbox.
        &nbsp;
    </p>
    <!--  ****************** -->
    <h3>
        <a id="Preset">Presets (Save Grouping and Folding Preferences)</a>
    </h3>
    <p>
        <a href="#GroupPatsTextBox">GroupPats</a>, <a href="#FoldPatsTextBox">FoldPats</a> and <a href="#FoldPercentTextBox">Fold%</a>
        text boxes can be edited to contain custom patterns. These patterns combined together can be saved as a named preset.
    </p>
    <p>
        To create a new preset, use the Preset -&gt; Save As Preset menu item. If the <a href="#GroupPatsTextBox">GroupPats</a>
        text box contains a description (enclosed in []), then the description will be offered as the preset name.
        Otherwise an automatically generated name will be suggested.
    </p>
    <p>
        All created presets are added to the Preset menu for all active PerfView windows. Select a menu item in the Preset menu
        to activate a preset. The name of the preset will be shown in [] in the <a href="#GroupPatsTextBox">GroupPats</a> textbox.
        Presets are saved across sessions. The Preset -&gt; Manage Presets menu item allows editing existing presets as well as deleting them.
    </p>
    <hr />
    <!--  ********************************** -->
    <h2><a id="BlockedTimeInvestigation">Blocked/Wall Clock Time Investigation: The Thread Time Views</a></h2>
    <h3>Why Blocked/Wall Clock Time Investigations are harder</h3>
    <p>
        Wall clock time investigations break down into two cases.&nbsp; Either most of that wall
        clock time is dominated by CPU (in which case a CPU investigation will work), or
        it is not dominated by CPU time, in which case you also need to understand the blocked
        (non-CPU) time being consumed.&nbsp;&nbsp;&nbsp; Thus the &#39;hard&#39; part of doing
        a wall clock investigation is understanding blocked time.&nbsp;&nbsp;
    </p>
    <p>
        <strong>Blocked time investigations are inherently harder than CPU investigations</strong>.&nbsp;
        CPU investigations are reasonably straightforward because in most scenarios any CPU usage is &#39;interesting&#39; to
        investigate regardless of where it happens.&nbsp; Thus the trivial algorithm of attaching the
        same weight to every msec of CPU regardless of where it happened is appropriate.&nbsp;&nbsp;
        This is actually not true in some scenarios.&nbsp; For example, if there was a background CPU-bound
        task on a multi-processor machine, the CPU associated with that background task is likely not very
        interesting because it is not consuming &#39;precious&#39; resources and is not on the critical path
        of some user operation.&nbsp;&nbsp; Thus if you were investigating CPU on such an application you
        would need a way of filtering out this &#39;background&#39; activity so you could concentrate on
        the &#39;important&#39; CPU use.&nbsp;&nbsp; Typically this would be easy to do because the threads
        that execute such background
        CPU activity are dedicated to background activities (so you can just exclude all samples from those
        threads).&nbsp;&nbsp; However imagine if the background thread was a &#39;service&#39; and important
        foreground CPU activity was scheduled on it interleaved with the idle background activity.&nbsp; This
        would make analysis quite difficult.&nbsp;&nbsp;
    </p>
    <p>
        This bad situation is EXACTLY the situation you have with blocked time.&nbsp;&nbsp;&nbsp; Typically
        there are many threads that spend most of their time blocked, and most of this blocked time is never
        interesting because it is not part of a critical path.&nbsp;&nbsp; However these threads wake up at
        least some of the time and PARTS of their execution can be on the critical path (and thus are very
        interesting).&nbsp;&nbsp; Unfortunately there is no simple, general way of separating &#39;important&#39; blocked
        time (on a critical path), from uninteresting blocked time without additional &#39;help&#39; (annotation)
        of the INTENT of the program.&nbsp;&nbsp; Thus the &#39;trick&#39; to doing a
        blocked time analysis is to use scenario specific mechanisms to tag the &#39;important&#39; blocked
        time and allow it to be separated from the (large amount of) unimportant blocked time.&nbsp;&nbsp;
    </p>
    <h3><a id="UnderstandingPerfDataThreadTime">Understanding Thread Time</a></h3>
    <p>
        The view that PerfView has to understand wall clock time or blocked time is called the Thread Time View.&nbsp;&nbsp;
        This view is based on the observation that at any instant in time every thread is doing &#39;something&#39;.&nbsp;
        It might be consuming CPU, or it is not (which we will define as BLOCKED).&nbsp;&nbsp; If it is BLOCKED it might
        be because it is waiting for its turn to use a processor (which we call READIED), or it may be waiting on something
        else (e.g. for a DISK request to respond, for the NETWORK to respond, or for some synchronization object such as an
        Event, Mutex or Semaphore to change state).&nbsp; Whatever it is doing there is a stack associated with it.&nbsp;&nbsp;
        Thus at every instant of time every thread has a stack and that stack can be marked with a metric that represents wall
        clock time that the thread consumed at that call stack.&nbsp;&nbsp;&nbsp; This is a &#39;perfect&#39; model of what
        every thread is doing on the system.
    </p>
    <p>
        If you set the &#39;Thread Time&#39; checkbox on the collection dialog, or pass the /ThreadTime qualifier on the command
        line, PerfView will ask the operating system to collect the following information:
    </p>
    <ol>
        <li>
            Every millisecond, the stack that each processor (CPU) is working on (this is present
            even without the /ThreadTime qualifier)
        </li>
        <li>
            On every context switch (when a thread transitions from running to blocked) the stack of
            the thread that is starting to run
        </li>
        <li>The time any thread gets created or destroyed.&nbsp; </li>
    </ol>
    <p>
        With this
        data we have &#39;perfect&#39; information on where we are blocked.&nbsp; We know the exact time when we started
        to block and when we ended, and thus can attribute exactly the correct amount of time to that particular stack.&nbsp;&nbsp;
        We also have approximate information where CPU time is spent.&nbsp;&nbsp;&nbsp; If we get a sample (which might
        be a CPU sample or a context switch) we can attribute that stack with the time spent since the last sample was
        taken (which again is either a context switch (e.g. if the thread had the CPU less than 1 msec) or another CPU
        sample (e.g. if it has been longer than 1 msec since the last context switch)).&nbsp; Thus with the events above we can
        do a VERY good job of detailing exactly where each thread spent its time.&nbsp;&nbsp; It is interesting to note
        that you get &#39;perfect&#39; information on EXACTLY how much CPU time things use (since you know exactly when
        threads start consuming CPU time and when they stop consuming CPU).&nbsp;&nbsp; The only imperfection is
        that the stacks associated with CPU time are only a sampling.&nbsp;
    </p>
    <p>
        This transformation of context switch and CPU samples is the foundation of the &#39;Thread Time Stacks&#39; view
        in PerfView and is the view of choice to understand wall clock time (or blocked time).&nbsp;&nbsp; Like the CPU
        stacks view, the Thread Time Stacks view shows an inclusive &#39;tree&#39; which aggregates all these stacks of where
        threads spend their time.&nbsp;&nbsp; At the bottom (away from thread start) end of each stack a pseudo-frame is
        appended which indicates what information is known about that stack (CPU_TIME, DISK_TIME, HARD_FAULT (disk time
        to fetch mapped files), NETWORK_TIME, READIED_TIME or BLOCKED_TIME).&nbsp;&nbsp; For some things more is
        known (like the file or network port), so pseudo-frames
        get inserted for those too.&nbsp;&nbsp;&nbsp; These tags make it easy to use PerfView&#39;s folding and
        grouping and filtering capabilities to look at only certain causes of delay.&nbsp;
    </p>
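    <p>
        The attribution rule described above can be sketched in a few lines of Python.  This is only an
        illustrative model, not PerfView's implementation: the per-thread event tuples, the event kinds
        ('cpu_sample', 'switch_out', 'switch_in') and the function name are assumptions made for the example.
        Each event charges the time elapsed since the previous event on that thread to its stack, with a
        pseudo-frame appended to say whether the time was CPU or blocked.
    </p>

```python
from collections import defaultdict


def attribute_thread_time(events):
    """Charge elapsed time to stacks for ONE thread.

    events: time-ordered list of (time_usec, kind, stack) tuples, where
    kind is 'cpu_sample', 'switch_out' (thread stops running) or
    'switch_in' (thread starts running again) and stack is a tuple of
    frame names.  This event shape is hypothetical.
    """
    totals = defaultdict(int)
    prev_time = None
    for time_usec, kind, stack in events:
        if prev_time is not None:
            elapsed = time_usec - prev_time
            if kind == "switch_in":
                # The thread was off the CPU for the whole interval.
                totals[stack + ("BLOCKED_TIME",)] += elapsed
            else:
                # A CPU sample or a switch-out ends a stretch of running.
                totals[stack + ("CPU_TIME",)] += elapsed
        prev_time = time_usec
    return dict(totals)


example_events = [
    (0, "switch_in", ("Main",)),
    (1000, "cpu_sample", ("Main", "Work")),
    (1400, "switch_out", ("Main", "Read")),
    (9400, "switch_in", ("Main", "Read")),
    (10000, "cpu_sample", ("Main", "Work")),
]
for stack, usec in attribute_thread_time(example_events).items():
    print(" -> ".join(stack), usec)
```

    <p>
        Note that every microsecond between the first and last event is charged to exactly one stack, which
        is what makes the result a complete account of the thread's wall clock time.
    </p>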
    <h3>A Wall Clock Time Investigation</h3>
    <p>In broad strokes, a wall clock time investigation consists of the following steps:</p>
    <ol>
        <li>
            Collect a trace with the Thread Time events.&nbsp;&nbsp; This is done using the PerfView Run
            or PerfView Collect commands, but you need to tell PerfView to also collect the context switch information by either
            <ol>
                <li>Setting the <strong>ThreadTime</strong> checkbox in the Data collection dialog box</li>
                <li>Passing the <strong>/ThreadTime</strong> qualifier on the command line to PerfView</li>
            </ol>
        </li>
        <li>Open the &#39;Thread Time Stacks&#39; View of the resulting ETW data. </li>
        <li>
            Find the segment of time in a single thread that is interesting to you.&nbsp;&nbsp; This is the
            critical part because you really only want to see the wall clock time (or blocked time) that is
            on your critical path.&nbsp;&nbsp; Techniques for doing this depend on your scenario.&nbsp;&nbsp;&nbsp;
            Here are some possibilities for &#39;easier&#39; cases:<ol>
                <li>
                    For simple sequential programs with synchronous I/O (a very common case including typical
                    application startup), you simply need to find the method that represents the &#39;work&#39;
                    you are interested in and use the &#39;Include Item&#39; (Alt-I) operation to narrow it to
                    that method (which is on a single thread).&nbsp;
                </li>
                <li>
                    For ASP.NET applications that don&#39;t use asynchronous I/O, the ASP.NET Thread Time
                    View will group those fragments of threads that were on the critical path for a particular
                    request together.&nbsp;&nbsp; Thus using &#39;Include Item&#39; on the frame representing a
                    request (or group of requests), you can see only &#39;interesting&#39; time.
                </li>
                <li>
                    If the application uses System.Threading.Tasks, you can use the &#39;Thread Time (with
                    Tasks)&#39; view.&nbsp; This marks each segment of a thread that is executing a single task with the
                    ID of that task.&nbsp; It also attributes a Task&#39;s time to the call stack of the task that
                    activated it.&nbsp;&nbsp; In this way concurrent programs can be analyzed as if they were singly
                    threaded sequential programs.
                </li>
                <li>
                    You can use System.Diagnostics.Tracing.EventSource to emit events for interesting (often small)
                    operations in your application.&nbsp; If these operations do not do async I/O or otherwise
                    spawn work on another thread, the events can be used to find an interesting segment of a single thread.&nbsp;
                    You can then use the &#39;Include Item&#39; on the thread of interest, as well
                    as the &#39;start&#39; and &#39;end&#39;
                    time ranges to find an interesting part of a thread to analyze.&nbsp;
                </li>
            </ol>
        </li>
        <li>
            Once you have narrowed your interest to the time range of a single thread, you
            can proceed to analyze it.&nbsp;&nbsp; Typically you do this by switching to
            the &#39;By Name&#39; view and simply looking at the &#39;types&#39; of time
            being consumed (CPU, BLOCKED, HARD_FAULT, READIED, DISK, NETWORK).&nbsp; From
            here the analysis is much like a CPU analysis.&nbsp;
        </li>
    </ol>
    <p>
        To recap, a Wall clock (or blocked time) investigation always starts with filtering to
        find &#39;interesting&#39; wall clock time (typically on a single thread).&nbsp; Until
        you get to this point you can&#39;t sensibly interpret the &#39;Thread Time View&#39;, but
        after you have found the interesting time, it proceeds much like a CPU analysis.&nbsp;
    </p>
    <h3>Blocked time and Causality (ReadyThread)</h3>
    <p>
        Sometimes identifying the size and call stack of blocked time is sufficient to understand
        a particular performance problem.&nbsp;&nbsp; For example analyzing the cold startup
        time of an application falls into this category because it is clear why the blocked
        time is as long as it is (a disk read was needed), and so the only questions
        are how long these operations take and where they occurred (what stack caused them).&nbsp;&nbsp;&nbsp;
        However in other scenarios the issue is understanding why a delay is as long as it is.&nbsp;
        For example, if a thread is blocked waiting on a lock, the interesting question is why
        was some other thread holding the lock so long?&nbsp; To answer this question you need
        to determine which thread was holding the lock.&nbsp;&nbsp; Questions like this are
        what the ReadyThread event helps answer.
    </p>
    <p>
        When you turn on the /ThreadTime events, not only do you turn on the context switch
        events, you also turn on the ReadyThread events.&nbsp;&nbsp; A ReadyThread event fires
        when one thread causes another thread to change from being BLOCKED to being runnable
        (that is, it makes a thread READY to run).&nbsp;&nbsp; Thus if thread A is waiting on a
        lock that thread B owns, when thread B releases the lock it makes thread A ready to
        run.&nbsp;&nbsp;&nbsp; When a ReadyThread event fires in this example it logs both threads
        A and B as well as the stack of thread B.&nbsp;&nbsp; Loosely speaking, READYTHREAD logs
        the fact that thread B CAUSED thread A to wake up.&nbsp;
    </p>
    <p>
        PerfView has a special view for displaying READYTHREAD information called the &#39;Thread Time
        (with ReadyThread)&#39; view.&nbsp;&nbsp; This view works just like the &#39;Thread Time&#39;
        view but in addition, every stack where a thread blocks is &#39;extended&#39; with additional
        frames that tell you the thread and stack that woke it up.&nbsp;&nbsp; These extra frames
        are suffixed with &#39;(READIED_BY)&#39; so that you can easily see that these
        are not ordinary frames (and you can fold them away if you like).&nbsp; In the example of
        a Thread A waiting on a lock and being awakened by Thread B releasing the lock you would see
    </p>
    <ul>
        <li>Process X</li>
        <li>Thread A</li>
        <li>ntdll!RtlThreadStart</li>
        <li><em>&lt;Additional Frames&gt;</em></li>
        <li>X!LockEnter</li>
        <li>
            <em>
                &lt;Frames of calls into the operating system that block the thread (typically
                WaitForSingleObject or WaitForMultipleObjects)&gt;
            </em>
        </li>
        <li>READIED BY Thread B Waited &lt; 1msec for CPU</li>
        <li>Process X&nbsp; (READIED_BY)</li>
        <li>Thread B (READIED_BY)</li>
        <li>ntdll!RtlThreadStart (READIED_BY)</li>
        <li><em>&lt;Additional Frames, all suffixed with (READIED_BY)&gt;</em></li>
        <li>X!LockExit (READIED_BY)</li>
        <li><em>&lt;Frames into the operating system that unblock the thread (typically SetEvent), suffixed by (READIED_BY)&gt;</em></li>
        <li>BLOCKED_TIME</li>
    </ul>
    <p>
        This clearly shows that after blocking in &#39;X!LockEnter&#39; the thread was awakened
        by thread B calling &#39;X!LockExit&#39;.&nbsp;
    </p>
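    <p>
        The way a blocked stack is 'extended' with the waker's frames can be modelled roughly as follows.
        This is a sketch under simplifying assumptions: the function names and the list-of-strings stack
        representation are made up for illustration, and the wait-time annotation that PerfView adds to the
        'READIED BY' frame is omitted.
    </p>

```python
READIED_BY_SUFFIX = " (READIED_BY)"


def extend_blocked_stack(blocked_stack, waker_process, waker_thread, waker_stack):
    """Append the waking thread's frames to the stack of the thread that blocked.

    All arguments are plain frame-name strings (or lists of them); this
    representation is hypothetical.
    """
    frames = list(blocked_stack)
    frames.append("READIED BY " + waker_thread)
    for frame in [waker_process, waker_thread] + list(waker_stack):
        # Mark every frame that came from the waker so it can be recognized later.
        frames.append(frame + READIED_BY_SUFFIX)
    frames.append("BLOCKED_TIME")
    return frames


def fold_readied_by(frames):
    """Drop the waker's frames again, leaving only the blocked thread's own stack."""
    return [frame for frame in frames if not frame.endswith(READIED_BY_SUFFIX)]


blocked = ["Process X", "Thread A", "ntdll!RtlThreadStart", "X!LockEnter"]
waker = ["ntdll!RtlThreadStart", "X!LockExit"]
extended = extend_blocked_stack(blocked, "Process X", "Thread B", waker)
print("\n".join(extended))
```

    <p>
        Because the added frames all carry the same suffix, a single pattern is enough to fold them away
        or to group on them.
    </p>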
    <h3><a id="UnderstandingPerfDataThreadTimeWithTasks">How Tasks make Thread Time Easy (The Thread Time (with Tasks) View)</a></h3>
    <p>
        If you have not already read the basics of <a href="#UnderstandingPerfDataThreadTime">Understanding Thread Time</a>
        you should read that now.  This section builds on those basics.
    </p>
    <p>
        It is strongly recommended that if you need to do asynchronous or parallel operations, that
        you use the .NET System.Threading.Tasks.Task class to represent the parallel activity or
        the &#39;continuation&#39; of the thread after an asynchronous operation completes (the &#39;await&#39;
        feature in C# uses Tasks).&nbsp;&nbsp;&nbsp; What makes Tasks valuable to PerfView
        is that this class logs events when Tasks are created (along with an ID for the created
        task), when the body of the task is invoked (along with an ID for the task), and when
        the task&#39;s body completes (again along with an ID).&nbsp;&nbsp; This helps us in two important ways:
    </p>
    <ol>
        <li>
            Task bodies represent real user work, and thus can be used to segregate &#39;important
            blocked time&#39; from &#39;uninteresting infrastructure time&#39; (time these threads
            spend blocked waiting for user work).&nbsp;&nbsp; This is VERY useful.
        </li>
        <li>
            Tasks know where they were created (who &#39;caused&#39; them), so there is a
            very natural way of &#39;charging&#39; the creator of the task for all the time
            (or other resources) that the task uses.
        </li>
    </ol>
    <p>
        The &#39;Thread Time (with Tasks)&#39; view does exactly this.&nbsp;&nbsp; When a
        thread calls a task creation method, this view inserts a pseudo-frame at this point
        that indicates that a task has been scheduled, and then inserts <strong>
            all the events
            for the body of that task at that point
        </strong>.&nbsp; Here is an example
    </p>
    <ul>
        <li>Process32 X</li>
        <li>Thread (1276) CPU=733ms (Startup Thread)</li>
        <li>ntdll!_RtlUserThreadStart</li>
        <li>BlockedTime!BlockedTime.Program.Main</li>
        <li>BlockedTime!Program.DoWork</li>
        <li>mscorlib.ni!TaskFactory.StartNew</li>
        <li>Task Scheduled</li>
        <li>Task Executing on Thread 848</li>
        <li>mscorlib.ni!IThreadPoolWorkItem.ExecuteWorkItem</li>
        <li>BlockedTime!BlockedTime.Program+&lt;&gt;c__DisplayClass5.&lt;DoWork&gt;b__3</li>

    </ul>
    <p>
        In this example the &#39;Main&#39; program called &#39;DoWork&#39;, which had the code
    </p>
    <ul>
        <li>Task.Factory.StartNew(delegate {</li>
        <li>&nbsp;&nbsp;&nbsp;&nbsp; // Body Code ...</li>
        <li>});</li>
    </ul>
    <p>
        This call causes another thread (in this case thread 848) to start up and start executing
        the body (the delegate {...}).&nbsp; This &#39;inline delegate&#39; code is called
        an anonymous delegate, and the C# compiler generates a name for it (in this case &#39;c__DisplayClass5.&lt;DoWork&gt;b__3&#39;),
        which does the work (note PerfView&#39;s &#39;Goto Source&#39; (Alt-D) option is VERY
        handy at this point for seeing exactly what this code is).&nbsp;
    </p>
    <p>
        The important part here is that from a source code level it is very natural to think
        that any costs (time) spent in this anonymous delegate should be &#39;charged&#39;
        to &#39;DoWork&#39; because that code caused that delegate to actually run (on a different
        thread).&nbsp; This is EXACTLY what the Thread Time (with Tasks) view does.&nbsp; If your
        application uses Tasks, you should be using this view.&nbsp;
    </p>
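    <p>
        The 'charging' of a task body to its creator amounts to splicing two stacks together.  The sketch
        below models that idea only; the dictionary of creation stacks keyed by task ID and the function
        name are assumptions for the example, not PerfView's data structures.
    </p>

```python
def stitch_task_stack(creation_stacks, task_id, executing_thread_id, body_stack):
    """Place a task body's stack underneath the stack that created the task.

    creation_stacks: hypothetical dict mapping task ID to the stack (list of
    frame names) that was active when the task was created.
    """
    creator_stack = creation_stacks.get(task_id)
    if creator_stack is None:
        # No creation event was seen for this task, so leave the stack as it is.
        return list(body_stack)
    pseudo_frames = [
        "Task Scheduled",
        "Task Executing on Thread " + str(executing_thread_id),
    ]
    return list(creator_stack) + pseudo_frames + list(body_stack)


creation_stacks = {
    7: ["BlockedTime!Program.Main", "BlockedTime!Program.DoWork",
        "mscorlib.ni!TaskFactory.StartNew"],
}
body = ["mscorlib.ni!IThreadPoolWorkItem.ExecuteWorkItem", "BlockedTime!Program.Body"]
print("\n".join(stitch_task_stack(creation_stacks, 7, 848, body)))
```

    <p>
        With the stacks spliced this way, any time spent in the body rolls up into 'DoWork' in an
        inclusive view, even though it ran on a different thread.
    </p>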
    <!--  ************************************************************ -->

    <h3><a id="MakingServerInvestigationEasy">Making Server Investigations Easy (The Thread Time (with Start-Stop Tasks) View)</a></h3>
    <p>
        At its heart, a server investigation is typically about <strong>response time</strong>.   Thus to do
        a server investigation you would like all costs that contribute to making this
        response time longer rolled up together in the display.   This is exactly what the
        <strong>Thread Time with Start-Stop Tasks View</strong> does.
        <ul>
            <li>
                Like all <a href="#BlockedTimeInvestigation">thread time</a> views, it keeps track of where every thread is (what
                its current stack is) regardless of whether it is blocked or using CPU.   Like all thread time
                views <strong>it needs the 'ThreadTime' checkbox</strong> (or /threadTime command line parameter) to be used when
                collecting the data so the necessary events are present.
            </li>
            <li>
                Like all <a href="#UnderstandingPerfDataThreadTimeWithTasks">'with Tasks'</a> views it also knows how to track
                any Asynchronous or concurrent activity done by thread pool threads and assign that cost to the code that
                caused that work to happen.
            </li>
            <li>
                Finally on top of this it identifies events declared to be 'Start-Stop pairs'
                which identify 'interesting' units of time.   The .NET Framework has declared
                one such start-stop pair when IIS or ASP.NET requests begin, but there are others
                when WCF operations start and stop, as well as when HTTP requests or SQL requests are made to
                other machines.   In addition you can define start-stop requests of your own
                that PerfView will recognise (see below).  Once a 'Start' event is emitted, anything on that
                thread (or any Task caused by that thread) will be part of that start-stop activity
                until the Stop event for that start-stop pair is seen.
            </li>
        </ul>
    </p>
    <p>
        This is best shown by example.   This is an example of an ASP.NET Web server that was
        monitored using 'PerfView /threadTime collect'.  Because we use the /ThreadTime parameter,
        information on context switches and tasks is collected that allows 'Thread Time' views
        to be displayed including the 'Thread Time (with StartStop Tasks)' display.   Here is the
        result of opening this view and focusing on the W3WP process (which is the web server process).
    </p>
    <center>
        <img src="images/ThreadTimeWithStartStop.png" alt="ThreadTimeWithStartStop" />
    </center>
    <p>
        At the top of the tree, we see the process node, but then immediately all costs are segregated
        into two parts, things that are associated with some start-stop activity, and everything else.
        Thus this lets you quickly focus on the thread time that is likely to be of interest.
    </p>
    <p>
        Under the 'Activities' node you see all 'top level' start-stop activities, sorted by
        cost (that is thread time attributed to that activity).  In the view above we opened
        the 'IISRequest' activity (which has a particular ID number and URL) that happens to have
        730.7 msec of thread time.  This IISRequest Activity happens to cause another nested
        Start-stop pair for an AspNetReq activity, so that is shown.  From there all stacks
        associated with the AspNetReq activity are shown.  In this example we can see the call
        stack through user code to the method MyOtherAsyncMethod which does an 'await' that
        takes 524.5 msec.
    </p>
    <p>
        Hopefully you can immediately see how useful this view is.  Basically it takes all the
        thread time associated with semantically relevant things (start-stop tasks that someone
        instrumented into the code), and displays the stack based on causality (thus even if
        execution hops threads the stacks 'follow' it).   Thus it becomes trivial to see exactly
        where time is being spent.
    </p>
    <p>
        A typical strategy is to immediately select the '(Activities)' node, right click -> Include Item,
        which will exclude all the non-activity thread time.   This works well most of the time
        however keep in mind that some important costs may be in this (Non-Activities) node, in particular
        things like the GC (in server or background GC), or any non-threadpool threads that did work but
        never logged a start and stop event.   This is why PerfView does not hide this, but typically
        you start by looking at the activities, and only look outside that if you are led there.  Typically
        you will filter to just look at the non-activities and only the CPU_TIME, to see what
        is 'interesting' in that group.
    </p>
    <h4>Thread Time is not Elapsed Wall Clock Time</h4>
    <p>
        It is important to note that what is being shown is STILL thread time, NOT wall clock
        time.   Thus if there is concurrency going on, the total metric is very likely to
        add up to more than elapsed wall clock time.  It is easy to determine that this is the case (because you will
        see more than one thread as children of the activity), and you can even see the overlap
        (by looking at the 'when' column of each of the children).   Still it is something to
        be aware of.  See <a href="#UnderstandingPerfDataThreadTime">Understanding Thread Time</a> for more.
    </p>
    <p>
        It is also possible that the thread time will be LESS than elapsed wall clock time.
        This should be a much rarer case.   It happens when the code causes work to happen but
        does not use the mechanisms that have been instrumented to detect that work on another
        thread was caused by the current thread.   Because of this the current thread may return
        to the threadpool (at which point its time is NOT attributed to the activity anymore), but because
        the work on the other thread is unknown to PerfView, it can't properly attribute that
        time to the activity (it ends up under the non-activities node).  Thus there can be 'gaps' in the thread time
        for a request.   PerfView tries to fill these gaps
        with a pseudo-node called 'UNKNOWN_ASYNC', so that the cost in the view is never less
        than the wall clock time for sorting purposes, but sometimes PerfView's algorithm is not
        perfect.   In either case, however it becomes very difficult to determine what was going
        on during these gaps.   Hopefully this simply won't happen to you...
    </p>
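    <p>
        A small numeric sketch may make the difference between thread time and elapsed wall clock time
        concrete.  It assumes each activity is described by a list of (start, end) intervals in msec, one per
        stretch of a thread attributed to it; that representation and the function name are hypothetical.
    </p>

```python
def thread_time_and_elapsed(intervals):
    """Return (thread_time, elapsed) for one activity.

    intervals: list of (start_msec, end_msec) stretches of thread time
    attributed to the activity (hypothetical representation).
    thread_time sums every stretch; elapsed is first start to last end.
    """
    if not intervals:
        return 0, 0
    thread_time = sum(end - start for start, end in intervals)
    elapsed = max(end for _, end in intervals) - min(start for start, _ in intervals)
    return thread_time, elapsed


# Two threads working concurrently: thread time exceeds elapsed time.
print(thread_time_and_elapsed([(0, 100), (50, 150)]))

# Work handed off through an uninstrumented mechanism leaves a gap:
# thread time is less than elapsed time.
print(thread_time_and_elapsed([(0, 40), (80, 100)]))
```

    <p>
        In the first case the metric is inflated by overlap between threads; in the second the missing
        60 msec is the kind of gap described above.
    </p>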
    <h4>Making your own Start-Stop tasks</h4>
    <p>
        Often the 'standard' instrumentation in the .NET Framework gives you good 'starting'
        activities to work with (as the IISRequest and AspNetReq did above).   However if those
        are not sufficient, you can define start-stop activities of your own.
        If your code is running on V4.6 of the .NET Framework or beyond, then it is trivial
        to add new start-stop activities that will show up in this view.
        See
        <a href="http://blogs.msdn.com/b/vancem/archive/2015/09/14/exploring-eventsource-activity-correlation-and-causation-features.aspx">
            EventSource Activities
        </a> for details of doing this.   You will want to turn your events on using the
        /Provider=*YOUR_EVENT_SOURCE_NAME when collecting data, and this view will simply
        incorporate them automatically.
    </p>
    <hr />
    <!--  ********************************** -->
    <h2><a id="UnmanagedMemoryAnalysis">Unmanaged Memory Analysis</a></h2>
    <p>
        PerfView can also be used to do unmanaged memory analysis.&nbsp;&nbsp;
        Typically the first step in a memory investigation (whether it be a managed or
        unmanaged memory investigation) is to use a tool like the free SysInternals
        <a href="http://technet.microsoft.com/en-us/sysinternals/dd535533">vmmap</a> tool
        to determine what the memory make up is of your process.&nbsp;&nbsp; This tool can
        break down the current memory usage into half a dozen categories including
    </p>
    <ol>
        <li>Mapped DLLs and EXEs</li>
        <li>Memory allocated by the .NET runtime (the GC heap)</li>
        <li>
            Memory allocated by the unmanaged OS heap (e.g. C malloc or the C++ &#39;new&#39;
            operator, called simply &#39;Heap&#39; by vmmap)
        </li>
        <li>Memory allocated with Virtual Alloc directly (this is called &#39;Private Data&#39; in vmmap)</li>
    </ol>
    <p>
        Depending on which of these is big (and thus interesting), you attack it differently.&nbsp;&nbsp;
        If mapped DLLs or EXEs are the issue, you need to load fewer of them.&nbsp;
        PerfView&#39;s &#39;Image Load Stacks&#39; will show you where you are loading DLLs.&nbsp;&nbsp;
        If the problem is GC Heap, you need to do a GC Heap investigation as described
        in &#39;<a href="#WhenToCareAboutTheGCHeap">When to care about the GC heap</a>&#39;.&nbsp;&nbsp;&nbsp;
        If the problem is either of the last two, then this section tells you how to drill into that problem.&nbsp;
    </p>
    <p>
        In the end, all memory in a process is either mapped (e.g. DLLs or EXEs) or is allocated
        by the Windows VirtualAlloc API.&nbsp; PerfView allows you to collect a stack trace on
        every VirtualAlloc call (and every VirtualFree call), by checking the &#39;Virtual Alloc&#39;
        checkbox on the advanced collection dialog box.&nbsp; VirtualAlloc was designed to be
        used to allocate large chunks of data (in fact the minimum size is 64K), and so turning
        this option on is not likely to affect the performance of your app, so feel free
        to do so.&nbsp;&nbsp; However precisely because VirtualAllocs are called infrequently
        (typically when another allocator needs more memory), this information is often &#39;too
        coarse&#39; and is only useful when your user code directly calls this API (which is unusual).&nbsp;
    </p>
    <p>
        Much more commonly, you will notice in vmmap that the &#39;Heap&#39; entry in the
        display is large, and thus you want to drill into the OS heap.&nbsp; To do this we
        need to collect data every time an OS heap allocation or free happens.&nbsp; These events are
        MUCH more common than VirtualAlloc calls.&nbsp; In fact they are so common that the operating system does not provide
        a way to turn them on system wide (that would be too much data).&nbsp; Instead there are two
        text boxes in the advanced section of the collection dialog box.&nbsp;
    </p>
    <ol>
        <li>
            The <a href="#OSHeapExeTextBox">OS Heap Exe</a> textbox - Specify an EXE name
            (no path or extension) to turn on OS heap events for a process which has not yet started.
        </li>
        <li>
            The <a href="#OSHeapProcessTextBox">OS Heap Process</a> textbox - Specify
            an EXE name or process ID to turn on OS heap events for a process that is already started.&nbsp;
        </li>
    </ol>
    <p>
        Using one of these two techniques you can turn on OS heap events for the process of
        interest.&nbsp;&nbsp; Optionally you can also turn on VirtualAlloc events.&nbsp;
    </p>
    <p>Once you have done this and collected data, you will get the following views</p>
    <ol>
        <li>The OS Heap Alloc Stacks view if you asked for OS heap events</li>
        <li>The VirtualAlloc Stacks view if you ask for VirtualAlloc events.</li>
    </ol>
    <p>
        The two views work the same way.&nbsp;&nbsp;&nbsp;&nbsp; Every allocation in the
        trace is given a weight equal to the number of bytes allocated.&nbsp;&nbsp;
        Every free is given a negative weight and is charged to the CALL STACK OF THE ALLOCATION
        (this way they perfectly &#39;cancel out&#39;).&nbsp; Frees that can&#39;t be
        matched up with allocations in the trace as a whole are ignored.&nbsp;&nbsp;
        After this PerfView treats the stacks just like any other stack-based data it
        processes.&nbsp;&nbsp; It only considers samples that match its filters and
        displays the result.&nbsp;&nbsp; Note that this means that VALUES CAN BE
        NEGATIVE.&nbsp; If you select a time range where only frees happen then you
        will get a negative number.&nbsp;&nbsp; The basic invariant is that the view
        shows you the NET memory allocation for the range you select.&nbsp; Because
        metrics can now be negative the &#39;When&#39; column might need to show negative
        numbers.&nbsp;&nbsp; These are displayed by using lower case letters (see
        <a href="#WhenColumn">When Column</a> for more).&nbsp;
    </p>
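    <p>
        The weighting scheme just described can be sketched as follows.  The event tuples, field order and
        function name are assumptions for illustration and are not PerfView's internal format.  Allocations
        get a positive weight, matched frees get the negative of the allocated size on the allocation's
        stack, and frees with no matching allocation in the trace are dropped.
    </p>

```python
from collections import defaultdict


def net_allocation_by_stack(events, start=None, end=None):
    """Compute net bytes per allocation stack within an optional time range.

    events: time-ordered (time, kind, address, size, stack) tuples, where
    kind is 'alloc' or 'free' and stack is a tuple of frame names
    (hypothetical shape; size is ignored for frees).
    """
    live = {}      # address -> (size, stack) for allocations not yet freed
    samples = []   # (time, weight, stack)
    for time, kind, address, size, stack in events:
        if kind == "alloc":
            live[address] = (size, stack)
            samples.append((time, size, stack))
        elif address in live:
            alloc_size, alloc_stack = live.pop(address)
            # Charge the free to the ALLOCATION's stack so the two cancel out.
            samples.append((time, -alloc_size, alloc_stack))
        # A free with no matching allocation in the trace is ignored.

    totals = defaultdict(int)
    for time, weight, stack in samples:
        after_start = start is None or time >= start
        before_end = end is None or end >= time
        if after_start and before_end:
            totals[stack] += weight
    return dict(totals)


heap_events = [
    (1, "alloc", 0x10, 100, ("Main", "A")),
    (2, "alloc", 0x20, 50, ("Main", "B")),
    (3, "free", 0x10, 0, ("Main", "Cleanup")),
    (4, "free", 0x99, 0, ("Main", "Cleanup")),
]
print(net_allocation_by_stack(heap_events))
print(net_allocation_by_stack(heap_events, start=3, end=4))
```

    <p>
        Over the whole trace the allocation in 'A' nets to zero, while a range that contains only its free
        shows a negative value for that stack.
    </p>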
    <p>
        Note that this means that if you display the TOTAL execution of a program in
        theory you should see a value of 0 (you freed everything you allocated).&nbsp;
        In practice this is not true but what IS true is that you are not usually interested
        in the FINAL memory used just before process termination, but the PEAK memory allocation.&nbsp;&nbsp;
        To get that you need to find the time where memory allocation was at its peak.
    </p>
    <p>
        You can do this (roughly) by going to the &#39;
        <a href="#CallTreeView">CallTree View</a>&#39; and selecting the
        <a href="#WhenColumn">When Column</a> for the root of hierarchy.&nbsp;&nbsp;
        As you drag regions of the when column PerfView will compute the net and peak
        metric in the region that you dragged.&nbsp;&nbsp; Thus by dragging you can
        quickly determine where the peak is.&nbsp; Typically you then simply need to
        hit &#39;Set Range&#39; (Alt-R) and now you have the region of time where you built
        up to the peak memory usage.&nbsp;
    </p>
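    <p>
        Finding the peak is a matter of keeping a running total of the signed weights.  The following
        sketch shows the idea on a list of (time, weight) samples; it illustrates the concept and makes no
        claim about how PerfView computes the value it displays while you drag.
    </p>

```python
def find_peak(samples):
    """Return (time_of_peak, peak_value) for time-ordered (time, weight) samples.

    Weights are positive for allocations and negative for frees.
    Returns (None, 0) if the running total never rises above zero.
    """
    running = 0
    peak_time, peak_value = None, 0
    for time, weight in samples:
        running += weight
        if running > peak_value:
            peak_time, peak_value = time, running
    return peak_time, peak_value


print(find_peak([(1, 100), (2, 50), (3, -100), (4, 30)]))
```

    <p>
        Selecting the range from the start of the trace to the returned time gives the region over which
        memory built up to its peak.
    </p>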
    <p>
        You can also easily investigate the net memory usage of any particular operation
        by selecting the time range over that operation.&nbsp; All the normal filtering,
        folding and grouping operators work for the memory case.&nbsp;&nbsp;&nbsp;
        Finally by opening two views you can use the <a href="#Diff">Diff</a> feature
        to do an analysis of two runs of the application.
    </p>

    <hr />

    <!--  ********************************** -->
    <h3><a id="DirectorySize">Directory Size Analysis</a></h3>
    <p>
        The directory size menu entry will generate an *.directorySize.perfView.xml.zip file that is a
        hierarchical summation of the sizes of all files in a directory (recursively).   Thus it is
        a very good tool for determining what is taking up disk space on a disk drive and 'cleaning up'
        less valuable files.
    </p>
    <p>
        Selecting this menu entry will bring up a directory chooser that you use to select the directory
        to analyze as well as the name of the file that will hold the gathered data.   Once selected
        PerfView will do a recursive scan of that directory.   When it finishes
        (which may take a while for large directories), it will automatically open the data file it
        generates.  You may reopen the file at any time later simply by clicking on it in PerfView's
        main tree view.
    </p>
    <p>
        The 'when' field for directory size works a bit differently than for most performance data.
        For each data file, its 'Timestamp' is the number of days (which can be fractional) from the
        time that the data was collected, to the time it was last modified.    Thus by selecting the
        time range from 0 to 7  you will see all files that were modified less than one week ago.
        This information can be very useful for seeing how 'old' the data is (which is often useful
        to determine whether to keep it or not).
    </p>
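    <p>
        The following is a minimal Python sketch of the 'Timestamp' idea, not PerfView's actual implementation.
        The function names (<code>age_in_days</code>, <code>scan_directory</code>, <code>bytes_modified_within</code>)
        are hypothetical and exist only for illustration. It computes each file's age in fractional days and sums the
        sizes of the files whose age falls in a chosen range, which is roughly what selecting a time range such as
        0 to 7 in the viewer does.
    </p>
    <pre>
import os
import time

SECONDS_PER_DAY = 86400.0

def age_in_days(collected_at, last_modified):
    """Days (possibly fractional) between last modification and collection."""
    return (collected_at - last_modified) / SECONDS_PER_DAY

def scan_directory(root, collected_at=None):
    """Return (path, size_in_bytes, age_in_days) for every file under root."""
    if collected_at is None:
        collected_at = time.time()
    rows = []
    for dir_path, _dir_names, file_names in os.walk(root):
        for name in file_names:
            path = os.path.join(dir_path, name)
            info = os.stat(path)
            rows.append((path, info.st_size, age_in_days(collected_at, info.st_mtime)))
    return rows

def bytes_modified_within(rows, max_days):
    """Total size of the files whose age lies in the range 0 to max_days."""
    return sum(size for _path, size, age in rows if max_days >= age >= 0)
</pre>
    <p>
        For example, <code>bytes_modified_within(scan_directory("C:\\Temp"), 7)</code> would give the number of bytes
        in files changed during the last week (the path is only a placeholder).
    </p>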
    <h3><a id="ImageSize">Image Size Analysis</a></h3>
    <h4>Collecting data</h4>
    <p>
        Selecting the Size -> Image Size menu entry will bring up a dialog box you use to specify
        the DLL or EXE to do the size analysis on. In addition it will allow you to set the
        name of the output file that holds the resulting data. The dialog will derive an
        output file name from the input file name and generally this default is fine.
    </p>
    <h4>Analyzing the data</h4>
    <p>
        The image size menu entry will generate a .imagesize.xml file that describes the breakdown of
        the size of a DLL or EXE file. It does this by looking up every symbol for the DLL/EXE in its
        PDB file and using those names for each chunk of the file. It also looks for references from
        one part of the file to another (for example pointers in memory blobs or assembly code to other
        memory blobs or assembly code). Because these references can form arbitrary graphs of dependency
        in the same way the GC heap objects form a graph of dependency, PerfView displays this data
        in very much the same way as a GC heap. Like a GC heap, the 'When', 'First' and 'Last' columns
        do not show the time but represent the address of the particular item in the virtual
        address space when loaded. Thus you can also use this to get an idea of the locality of
        different symbols within the file when loaded.
    </p>
    <h4>Flattening the Trace</h4>
    <p>
        As mentioned, by default PerfView tries to create a 'GC heap' of the items in the DLL: if one
        item refers to another, there will be a link from the referencer to the item being referenced.
        <strong>However this behavior can interfere with some analysis.</strong> In particular if you use the 'include pats' or
        'exclude pats' textboxes, they will include or exclude based ON THE ENTIRE PATH. When this is not what you
        want, one easy way to fix the problem is to 'flatten' the graph.
    </p>
    <p>
        Flattening a set of nodes takes one set of nodes, and returns a new 'GC Heap' where
        <ul>
            <li>
                All links between nodes are ignored.   Instead you get a 'flat' list, where every node
                is a child of 'ROOT' and has no children of its own.
            </li>
            <li>
                Any grouping is 'frozen' into the name. Normally the 'Group Pats' text box just affects
                how the nodes are displayed, but the nodes still have their original names. After flattening,
                the node name really is what is being displayed (changing the grouping will no longer have
                an effect).
            </li>
        </ul>
        Thus if you go to the 'RefTree' view, select the metric associated with the 'ROOT' node, right click, and
        select 'Flatten', you will get a new view in which there are no links between nodes. Now the 'include pats'
        and 'exclude pats' will select a node based on ONLY THAT NAME (not the name of any of its parents).
    </p>
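    <p>
        The two properties of flattening described above can be sketched in a few lines of Python. This is only an
        illustration under assumed data shapes, not PerfView's code: the node dictionaries, the <code>flatten</code>
        function and the <code>group_name</code> callback (standing in for the effect of 'Group Pats') are all
        hypothetical.
    </p>
    <pre>
def flatten(nodes, group_name=lambda name: name):
    """Return a new ROOT node with one childless child per input node.

    nodes: list of dicts with 'name', 'size' and 'children' keys.
    group_name: maps an original name to the name currently being displayed.
    """
    flat_children = [
        # Links are dropped and the displayed (grouped) name becomes the real name.
        {"name": group_name(node["name"]), "size": node["size"], "children": []}
        for node in nodes
    ]
    return {"name": "ROOT", "size": 0, "children": flat_children}

# Hypothetical symbols: b refers to a in the original graph.
a = {"name": "mylib!Foo::Bar", "size": 40, "children": []}
b = {"name": "mylib!Foo::Baz", "size": 24, "children": [a]}
flat = flatten([a, b], group_name=lambda name: name.split("!")[0])
</pre>
    <p>
        In the result every child of ROOT is named by its group ('mylib' here) and has no children, so a name
        filter can only ever match the node itself.
    </p>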

    <h4>Meaning of certain tags in an Image Size analysis</h4>
    <p>
        Many of the names used in the image size report are symbolic names that
        have a direct relationship with the names in the source code. However other names describe
        entities of the <a href="http://en.wikipedia.org/wiki/Portable_Executable">Portable Executable (PE)</a>
        format which are needed to prepare the code/data in the DLL/EXE to be run. Here we describe
        some of these that may show up prominently in the output.
    </p>
    <ul>
        <li>
            <b>Section .relocs</b> - The .relocs section describes relocations. A PE file may be loaded
            at effectively any location in memory. However the code/data in the file may be expecting
            to be loaded at a particular address (called its preferred base address). A relocation is a
            description of a 'fixup' needed to 'patch' a point in the file that needs to change if the image
            is not loaded at its preferred base address. Each relocation is small,
            but there may be 1000s of them which can add up. Both code and data (especially vtables) can
            cause relocations to be necessary.
        </li>
        <li>
            <b>Section .pdata</b> - On x64 and ARM processors, exceptions are supported by having a table
            that will allow the operating system at runtime to convert a code address into the method that
            contains it (along with exception handling information).   This is needed to support 'unwinding'
            of the stack to support exceptions.   The .pdata is this table that maps code address to unwind
            information.   It is proportional to the number of methods.
        </li>
        <li>
            <b>_imp_*</b> - import dispatch cells. When one DLL calls a method in another DLL, it makes
            an indirect call through a memory cell that will be fixed up at runtime to point at the target
            method. The _imp_ symbol points at this cell. It is always a pointer-sized cell and the number
            of them is proportional to the number of distinct cross-DLL targets.
        </li>
    </ul>
    <p>Other names are associated with the .NET Runtime Native file format.</p>
    <ul>
        <li>
            <b>ReadOnlyBlobSection</b> - This is a set of bytes that were emitted as a blob and no more
            is known about them.  Currently .NET Metadata needed for reflection is emitted in this way
            and typically is the reason this section is large. Using a
            <a href="http://blogs.msdn.com/b/dotnet/archive/2014/05/20/net-native-deep-dive-dynamic-features-in-static-code.aspx">runtime directive file</a> to limit the amount of
            reflection used by the app can make this smaller.
        </li>
        <li>
            <b>vtable *</b> - a vtable is short for virtual dispatch table.   It is used to implement
            virtual methods on a class.   It is directly proportional to the number of virtual methods
            a class needs to implement (both directly and inherited virtual methods).
        </li>
        <li>
            <b>FrozenString</b> - a frozen string is the bytes needed to represent a .NET System.String
            literal (quoted strings in the source code).   Having fewer literal strings will make this
            smaller.
        </li>
        <li>
            <b>MethodToGCInfoMap</b> - In order to support garbage collection (GC) the .NET Runtime needs
            to find every reference to the GC heap that is on every method's stack frame. To find this
            data it needs to map a code address to the GC information. This is the table that does this.
            It is proportional to the number of .NET methods.
        </li>
        <li>
            <b>MethodToEHInfoMap</b> - In order to support exception handling (EH) the .NET Runtime needs
            to map methods to their exception handling information.
        </li>
        <li>
            <b>CodeManagerSection</b> - The code manager is the logic in the .NET Runtime that knows
            how to decode stack frames.   It is needed by both the GC and EH system.
        </li>
    </ul>

    <h3><a id="ILSize">IL Size Analysis</a></h3>
    <h4>Collecting data</h4>
    <p>
        Selecting the Size -> IL Size menu entry allows you to do an analysis of what is in a .NET
        Intermediate Language (IL) file, which is what .NET compilers like C# and VB create. It will generate
        a .gcdump file that makes a graph of types, methods, fields and other structures in the IL file
        where each node of the graph indicates how big it is in the file, and the arcs between the nodes
        are references from one item to another. Thus you can do dependency analysis (what things
        refer to what other things), in the same way as for objects in a GC heap.
    </p><p>
        The Size -> IL Size menu entry will bring up a dialog box you use to specify
        the DLL or EXE to do the size analysis on.  This file needs to be a DLL or EXE that contains
        .NET IL (e.g. the output of a .NET compiler).  In addition it will allow you to set the
        name of the output file that holds the resulting data. The dialog will derive an
        output file name from the input file name and generally this default is fine.
    </p>
    <h4>Analyzing the data</h4>
    <p>
        The IL Size menu entry will generate a .gcdump file that describes the breakdown of types,
        methods, fields and other items in the IL file. It works in much the same way as the GC heap
        analysis or the native <a href="#ImageSize">Image Size Analysis</a>.
    </p>
    <h4>Multi-File heap</h4>
    <p>
        The menu entry only allows you to specify one IL file when creating the node-arc graph for
        the IL code. Any references outside this file are not traversed, but simply marked as a
        special 'external reference' node. It is sometimes useful to select a group of IL files
        (e.g. representing a complete application) that are all traversed, so that 'external reference'
        nodes are used only for references that leave the group. You can do this with the 'ILSize.ILSize'
        user command. Thus the command
    </p>
    <ul>
        <li>PerfView userCommand ILSize.ILSize File1.dll File2.dll File3.dll</li>
    </ul>
    <p>
        will create a GC heap of File1.dll, File2.dll and File3.dll as if they were one file.
    </p>

    <!--  ********************************** -->
    <h2>
        <a id="MultiScenarioAnalysis">Multi-Scenario Analysis (Aggregating Traces)</a>
    </h2>
    <p>
        Often, it is useful to analyze performance of one program across multiple traces.
        These traces might represent one large project in a variety of scenarios, or the
        behavior of a common library being used by multiple programs. PerfView supports
        several features for this sort of multi-scenario analysis.
    </p>
    <p>
        A main challenge when doing analysis of multiple scenarios (data files)
        simultaneously is simply the quantity of data being manipulated.
        Individual scenarios can often have an ETL file that is 100s of megabytes,
        and if you have 100 such scenarios you are now talking about 10-100 GB of
        information to process. Because of this, the process is designed to reduce
        the data volume as quickly as possible and to persist this &#39;lean&#39; form
        so that the data volumes at viewing time are kept under control.
        Thus there are two main steps in working with multiple scenarios:
    </p>
    <ol>
        <li>
            For each .ETL (or .ETL.ZIP) file, create a new file (a .PERFVIEW.XML.ZIP file)
            that contains just the information needed to view the data in the
            PerfView stack viewer. This reduces the data volume by a factor
            of 100 or more. This step can be done &#39;off-line&#39; and once
            complete does not need to be repeated until new data comes in. The
            tool is &#39;smart&#39; in that if new input files are added to an existing set
            of data files, it skips the files that were already converted. This
            process takes a few seconds to 10s of seconds for each data file actually
            converted. If you have important unmanaged DLLs in your scenario it is important
            that the PDB symbol path (e.g. _NT_SYMBOL_PATH) is set properly at this stage.
            Once converted to an XML.ZIP it is no longer possible to resolve symbols.
        </li>
        <li>
            Create a new kind of viewing file (a .SCENARIOSET.XML file) that represents the aggregation
            of a set of PERFVIEW.XML.ZIP files. When you open a file of this type
            PerfView will show you the data from all the data files simultaneously.
            You can generate many of these files to form different subsets of the same data files.
            When PerfView opens these files, each data file is given a &#39;top node&#39;
            (above the &#39;process node&#39;) that represents the data. PerfView&#39;s
            standard grouping techniques can then be used to zero in on the area of interest (e.g.
            how much a particular library or a function is used across all scenarios, or where
            CPU time is spent &#39;on average&#39; over all scenarios). In addition PerfView
            has special features (the &#39;which column&#39;) that help you quickly understand
            which scenarios are contributing to any particular metric.
            Once &#39;hot&#39; areas are discovered, you can use the &#39;which column&#39;
            to understand how uniformly the problem is distributed across scenarios.
        </li>
    </ol>
    <p>
        The following are more detailed instructions on performing these steps.
    </p>
    <h3 id="BatchXMLCreation">Step 1: Preprocessing ETL Data and Forming the ScenarioSet Representing All the Data Files </h3>
    <p>
        The first step in viewing multiple data files simultaneously is to preprocess
        the data into a &#39;Scenario Set&#39;. You can do this with the &#39;SaveScenarioCPUStacks&#39;
        user command (currently only CPU sampling aggregation is supported). You
        can run it from the <a href="#InvokingUserCommandsGui">PerfView GUI</a> using the &#39;File-&gt;UserCommand&#39;
        menu item or from the <a href="#InvokingUserCommands">command line</a> by executing the following:
    </p>
    <ul>
        <li>PerfView userCommand SaveScenarioCPUStacks <strong>MyDataDirectory</strong></li>
    </ul>
    <p>
        The <code>SaveScenarioCPUStacks</code> command takes one argument. This argument
        can be a directory name (as in the example above), or the path to an XML config file.
    </p>
    <p>
        If you pass in a directory, <code>SaveScenarioCPUStacks</code> will run in &quot;automatic&quot; mode.
        It will process all ETL and ETL.ZIP files found in the directory (or any sub-directory),
        using a heuristic method to automatically detect the process of interest for the
        trace.&nbsp; The heuristic used to pick the process of interest is
    </p>
    <ol>
        <li>If the trace contains a Windows Store app, then the first Windows Store app is chosen.</li>
        <li>
            If there is no Windows Store app, then the first executable to start that runs for more than
            half the trace length is chosen (this will tend to ignore setup scripts).
        </li>
        <li>If no app matches (2), then the first app to start after the trace starts is chosen.</li>
    </ol>
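    <p>
        The three rules above can be sketched in Python as follows. This is an illustrative reading of the
        heuristic, not PerfView's source: the function name, the dictionary keys, and the assumption that a start
        time of 0 means the process was already running when the trace began are all hypothetical.
    </p>
    <pre>
def pick_process_of_interest(processes, trace_length_msec):
    """Pick one process from a list of dicts with the keys
    'name', 'start_msec', 'end_msec' and 'is_store_app'."""
    by_start = sorted(processes, key=lambda p: p["start_msec"])

    # Rule 1: the first Windows Store app, if there is one.
    for p in by_start:
        if p["is_store_app"]:
            return p

    # Rule 2: the first process to start that runs for more than half the trace.
    for p in by_start:
        if (p["end_msec"] - p["start_msec"]) * 2 > trace_length_msec:
            return p

    # Rule 3: the first process that started after the trace started.
    for p in by_start:
        if p["start_msec"] > 0:
            return p
    return None
</pre>
    <p>
        With a short-lived setup script followed by a long-running application, rule 2 skips the script and
        returns the application, which is the behavior the guide describes.
    </p>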
    <p>
        Typically this heuristic approach works well. However, if you need control over how <code>SaveScenarioCPUStacks</code>
        runs, you can pass in an XML configuration file that gives you fine control over the processing of the ETL files.
        Here's an example XML config file:<br />
    </p>
    <pre>
&lt;ScenarioConfig&gt;
    &lt;Scenarios files=&quot;*.etl&quot; name=&quot;Win8 Store scenario [$1]&quot; /&gt;
    &lt;Scenarios files=&quot;ScenarioProcess.etl.zip&quot; name=&quot;PerfView&quot; process=&quot;procexp64&quot;
         start=&quot;1000&quot; end=&quot;5000&quot; /&gt;
&lt;/ScenarioConfig&gt;</pre>
    <p>
        As you can see, a config file is composed of a root <code>ScenarioConfig</code>
        element, which contains one or more <code>Scenarios</code> elements. Each <code>Scenarios</code> element
        has attributes set that control how scenarios are processed:
    </p>
    <ul>
        <li>
            The <code>files</code> attribute is the only required attribute of the Scenarios element. This attribute's
            value is a wild card pattern of files to match. All files matched by this pattern
            will be preprocessed and included in the output scenario set. (Relative paths are
            relative to the directory containing the XML config file.)
        </li>
        <li>
            The <code>name</code> attribute controls the name of the scenario, as it is displayed
            in the GUI. This can contain <code>$</code>-substitutions as specified in the <a href="http://msdn.microsoft.com/en-us/library/ewy2t5e0.aspx">.NET framework documentation.</a>
            Each <code>*</code> in the wild card pattern will be converted to a capture group,
            which will allow its use with the <code>$</code><i>number</i> substitution. (For
            example, the first * in the pattern can be referred to as "<code>$1</code>".)
        </li>
        <li>
            The <code>process</code> attribute allows you to override the process-of-interest
            detection logic for a trace. The value of this attribute should be the name of the
            process you wish to include (without any <code>.exe</code> file extension). If you use
            &#39;*&#39; as the process name then all processes from the trace are processed into the
            perfView.xml.zip file. Like the &#39;name&#39; attribute, you can use $1 in this attribute,
            which will be replaced with the corresponding capture group.
        </li>
        <li>
            The <code>start</code> and <code>end</code> attributes allow you to set the time
            range of interest from the matched traces. Events at time <code>start</code> will be at
            time 0 in the processed output, and any events outside of the time range will be dropped.
        </li>
    </ul>
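    <p>
        To make the relationship between <code>*</code> in the <code>files</code> pattern and <code>$1</code> in
        the <code>name</code> attribute concrete, here is a small Python sketch. It is not how PerfView is
        implemented (PerfView uses .NET regular expressions); the function names are hypothetical, and the
        case-insensitive, greedy matching is an assumption made for the example.
    </p>
    <pre>
import re

def wildcard_to_regex(pattern):
    """Turn a file wild card into a regex in which each * is a capture group."""
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile("^" + "(.*)".join(parts) + "$", re.IGNORECASE)

def scenario_name(files_pattern, name_template, file_name):
    """Return the scenario name for file_name, or None if it does not match."""
    match = wildcard_to_regex(files_pattern).match(file_name)
    if match is None:
        return None
    # Replace $1, $2, ... with the text matched by the corresponding *.
    return re.sub(r"\$(\d+)", lambda m: match.group(int(m.group(1))), name_template)
</pre>
    <p>
        Under these assumptions, a file named <code>Startup.etl</code> matched by <code>*.etl</code> with the name
        <code>Win8 Store scenario [$1]</code> becomes the scenario <code>Win8 Store scenario [Startup]</code>.
    </p>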
    <p>
        The results of running the SaveScenarioCPUStacks command are the following output files:
    </p>
    <ul>
        <li>
            One <code>*.perfView.xml.zip</code> file for every trace matched. These are
            only generated if they do not exist or if their corresponding ETL trace data is
            newer; if a file is already up to date, nothing is done for it.
        </li>
        <li>
            One <code>*.scenarioSet.xml</code> file for the entire scenario set.
            This file is necessary for viewing the data in <a href="#ViewingMultipleScenarios">step 2</a>.
        </li>
    </ul>
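    <p>
        The 'only regenerate when out of date' rule for the <code>*.perfView.xml.zip</code> files is the usual
        timestamp comparison used by build tools. A minimal Python sketch, assuming that 'newer' is judged by file
        modification time (the guide does not say exactly how PerfView decides) and using a hypothetical function
        name:
    </p>
    <pre>
import os

def needs_conversion(etl_path, output_path):
    """True if the output is missing or older than the ETL trace it came from."""
    if not os.path.exists(output_path):
        return True
    return os.path.getmtime(etl_path) > os.path.getmtime(output_path)
</pre>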
    <p>
        If you'd like, you can also generate your own <code>scenarioSet.xml</code> file.
        A scenarioSet file is similar to a <a href="#ScenarioConfig">scenario config</a>
        file, but with slightly different attributes.&nbsp; Here is an example scenarioSet file:
    </p>
    <pre>
&lt;ScenarioSet&gt;
    &lt;Scenarios files="*.perfView.xml.zip" namePattern="Example scenario [$1]" /&gt;
    &lt;Scenarios files="foo.perfView.xml.zip" namePattern="Example scenario [baz]" /&gt;
&lt;/ScenarioSet&gt;</pre>
    <p>
        As you can see, it is basically a list of file patterns (which indicate which files
        in the directory holding the ScenarioSet.xml file, or in any of its subdirectories,
        should be included), as well as a pattern that allows you to take that file name
        and convert it to a scenario name. You can make your own XML files to
        create interesting subsets of some data.
    </p>
    <h3>Step 2: Viewing Multiple Scenarios</h3>
    <p>
        Once you've processed your scenario data, you can then proceed to view it. To do
        this, use the treeview in the main view to browse to the generated scenarioSet.xml
        data file and double-click to open it.
    </p>
    <p>
        For the most part, this is the familiar stack viewer you use on a single ETL file.
        The main difference is that each stack from a particular data file (scenario) has a
        new pseudo-frame at the very top that identifies the scenario that the sample comes
        from. Thus stacks belong to threads, which belong to processes, which belong to
        scenarios. Everything else about the stack viewer works as it did in
        the single-scenario case. The stack view appears as if every scenario ran simultaneously
        on the same machine.
    </p>
    <p>
        In addition to the new &#39;top&#39; node for each stack, the viewer has a couple
        of enhancements that are only visible in the multi-scenario case. You will see:
    </p>
    <ul>
        <li>
            A <a id="WhichColumn">'which column'</a>
            displaying a histogram of the scenarios in which
            a given frame occurred.
        </li>
        <li>
            At the top of the display, a <a id="ScenariosBox">Scenarios textBox</a> that
            lets you filter and rearrange the scenarios
            shown in the &#39;which&#39; column.
        </li>
    </ul>
    <p>
        In the same way that the &#39;when&#39; column allows you to see for every row in
        the view a small graph displaying the samples as a function of time (a histogram), the &#39;which&#39;
        column shows you a histogram of the scenarios that had samples contributing to that row.
        Thus you can quickly determine whether the cost of that row was uniformly distributed across
        scenarios or whether just a handful of scenarios contributed to the cost.
    </p>
    <p>
        The &#39;which&#39; field has a number of handy features associated with it.
    </p>
    <ul>
        <li>
            You can select the &#39;which&#39; field, then select a range, and as you drag the range
            the names of the scenarios will be displayed in the status line at the bottom of the
            view. This allows you to see the names of the values in the histogram.
        </li>
        <li>
            You can select a &#39;which&#39; field, right click -&gt; Scenarios -&gt; Sort -&gt;
            Sort by this Node. This causes the scenarios to be reordered in the histogram
            so that the current node&#39;s metric is sorted from the scenarios that use the most
            to the scenarios that use the least. You can undo this with
            Scenarios -&gt; Sort -&gt; Sort by Default.
        </li>
        <li>
            When you select a range in the &#39;which&#39; field you can right click -&gt; Scenarios -&gt;
            Set Scenario List, which will filter the trace to just the scenarios represented by the
            selected range. This is typically used in conjunction with the &#39;sort&#39; feature:
            first you sort the scenarios by how expensive they are for a particular node, and then
            you select some subrange of those scenarios to drill into (looking at the scenarios that
            either used a lot or a little of the metric).
        </li>
    </ul>
    <hr />
    <!--  ********************************** -->
    <h2>
        <a id="Merging">Merging</a>
    </h2>
    <p>
        If you intend to transfer the data collected with PerfView to another machine, an additional
        step called merging is needed.
    </p>
    <p>
        PerfView uses the <a href="http://msdn.microsoft.com/en-us/library/bb968803(v=VS.85).aspx">
            Event
            Tracing for Windows (ETW)
        </a> facility built into Windows to collect profiling
        information. This infrastructure does not naturally create a single
        file for the data, but segregates data that came from the OS kernel from other events.
        Thus the &#39;raw&#39; data generated consists of two files (one which is just .etl,
        and another which is .kernel.etl). Moreover these files are missing some information
        that is needed to fully decode the file on another machine (most notably, the mapping
        of OS kernel names to NTFS file names and the symbol server &#39;keys&#39; that
        allow unambiguous lookup of symbolic information (PDBs)). Neither of
        these limitations is a problem if you consume the data on the same machine as it
        was collected on, but if you wish to transfer it to another machine, you should
        first merge the data.
    </p>
    <p>
        Merging is a process by which the .kernel.etl is merged into the main .etl file.
        In addition the missing system-specific information is gathered up and also placed
        in the .etl file. The result is a single file that can be copied to a different
        machine for analysis. This process can take a non-trivial amount of
        time (10s of seconds), which is why PerfView does not do it by default.
        You can perform merging by:
    </p>
    <ul>
        <li>Right clicking on the file in the main tree view and selecting &#39;Merge&#39;.</li>
        <li>Using the Collect-&gt;Merge menu item.</li>
        <li>Clicking the &#39;Merge&#39; checkbox when the data is collected.</li>
        <li>
            Collecting the data from the command line (using the &#39;run&#39; or &#39;collect&#39;
            commands) and specifying the /merge qualifier.
        </li>
    </ul>
    <p>
        Once the file is merged, you can simply copy the single file to another machine
        for &#39;off-line&#39; analysis. Note however that while the ETL
        file contains symbolic information for .NET Runtime code, it does NOT contain symbolic
        information for unmanaged code. Thus if it is important to see the symbolic
        names for unmanaged code, you need to ensure that the machine on which analysis
        occurs has access to the PDB files that contain this information.
    </p>
    <hr />
    <!--  ********************************** -->
    <h2>
        <a id="NGenPdbs">NGen Pdbs (and Zipping)</a>
    </h2>
    <p>
        <a id="merging">Merging</a> is an operation necessary to view ETL files on a machine
        other than the machine the data was collected on. However it is not sufficient for
        all cases. While the resulting merged file has all the information needed to look up symbolic
        information (for stack traces), it does not guarantee that the symbolic information
        will be available. In particular, when collecting traces whose processes use the
        .NET runtime, it is necessary to reference the symbolic information (PDB files)
        for the native code images (NGEN images) of the managed code (if it was NGENed).
        These NGEN PDBs are NOT the PDB files for the IL images (something created by IL
        compilers like CSC.exe or VBC.exe). The NGEN PDBs are generated by the NGen.exe
        command that comes with the .NET framework and can only be reliably generated on
        the machine that generated the NGEN image.
    </p>
    <p>
        As part of the ZIPPing process, PerfView will look up all addresses in the ETL file
        and determine which NGEN images were used, and if necessary generate the PDB files
        for those images. It will then ZIP both the ETL file as well as any NGEN PDBs into
        a single ZIP file that can now be viewed on any machine (PerfView knows how to automatically
        unpack these files).
    </p>
    <hr />
    <!--  ********************************** -->
    <h2>
        <a id="CollectingFromCommandLine">
            Collecting Data from the Command Line (Scripting,
            Automation)
        </a>
    </h2>
    <p>
        See also <a href="#PerfViewExtensions">PerfView Extensions</a> for advanced automation
        by building an extension for PerfView.
    </p>
    <p>
        See also <a href="#CommandLineReference">Command Line Reference</a> for a complete list
        of the options you can use at the command line
    </p>
    <p>
        PerfView is designed so that you can automate collecting profile data by using a
        batch file or other script. The three likely scenarios are:
    </p>
    <ol>
        <li>
            The user simply wants to quickly collect data from the command line for immediate
            analysis, either on the same machine or a different machine.
        </li>
        <li>
            The user wants to make a simple script to automate data collection but still needs
            to be present during collection (e.g., hand testing a GUI app), but does not wish to
            immediately analyze the data (someone else will do that).
        </li>
        <li>
            Data collection is completely automated, for completely unmonitored collection.
        </li>
    </ol>
    <p>
        In the first case you are likely to want to use either the &#39;run&#39; or &#39;collect&#39;
        commands:
    </p>
    <ul>
        <li>PerfView run <strong>Command_and_Args</strong></li>
        <li>PerfView collect </li>
    </ul>
    <p>
        The &#39;run&#39; command immediately runs the command and launches the stack
        viewer. This is the preferred option if it is easy to launch the program
        and it can be run to completion. However sometimes it is difficult to
        do this (the app is part of a service, or is activated by a complicated script),
        in which case you can start system wide collection with the &#39;collect&#39; command.
    </p>
    <h4>Skipping Rundown (/NoRundown)</h4>
    <p>
        By default the &#39;collect&#39; command performs a &#39;rundown&#39; in which information
        needed to properly decode symbolic information is collected before profiling stops.
        This operation can be relatively expensive (it takes seconds, and increases file size
        by 10s of megabytes). This information is naturally provided when processes
        shut down, but the &#39;collect&#39; command does not know if you shut down the
        process of interest, so it performs the rundown. If you
        know that the process of interest has exited, then rundown is pointless and can
        be avoided by specifying the /NoRundown qualifier. This option can save
        time and file size.
    </p>
    <h4>Suppressing Viewing&nbsp; (/NoView)</h4>
    <p>
        By default PerfView assumes you wish to immediately view the data you collected,
        but if the person collecting the data (e.g. a tester) is not the person analyzing
        the data (e.g. a developer), then you will want to suppress the viewer. This
        is what the /noView qualifier does and it works on the &#39;collect&#39; and &#39;run&#39;
        commands. Thus
    </p>
    <ul>
        <li>PerfView /noView run <strong>Command_and_Args</strong></li>
    </ul>
    <p>
        will turn on logging and run the given command. It will also <a href="#merging">merge</a>
        the file, under the assumption that the file is likely to be moved off the current system.
        It will however still bring up the GUI and it will not exit automatically when it is done. This lets
        the user react to any failures or messages, and it is required for the &#39;collect&#39;
        command so that the user can indicate when collection should stop.
    </p>
    <h4>
        <a id="AutomatingCollection">Automating Collection&nbsp; (/LogFile:FileName)</a>
    </h4>
    <p>
        See also <a href="#CommandLineReference">Command Line Reference</a> for a complete list
        of the options you can use at the command line
    </p>
    <p>
        The /NoView qualifier makes sense where it is hard to fully automate data collection (measuring
        an ad-hoc scenario in a GUI app). However for fully automatic collection
        you don&#39;t want the GUI at all. This is what the /LogFile qualifier is
        for. By specifying this qualifier you indicate that no GUI should be
        opened and that the program should exit after running the command on the command
        line. Any error messages that would have been reported in the GUI instead
        are APPENDED to the log file (we append so you can use the same file for several
        PerfView commands). The exit code of the PerfView process will indicate
        the success or failure of the collection and the log file will contain the detailed
        diagnostic messages.
    </p>
    <p>
        Note that the /LogFile qualifier will suppress the GUI, but it will not suppress the
        generation of a console if the 'Collect' command is specified and no /MaxCollectSec
        qualifier is given.   The reason is that without /MaxCollectSec=XXX the Collect command
        could run forever and you would have no way of stopping it cleanly (you would have
        to kill the process).   If you wish to use /LogFile and Collect (because you wish
        to use the /StopOn* qualifiers), and wish to suppress any consoles, you can do this by
        specifying a very large /MaxCollectSec value.
    </p>
    <p>
        In addition to the /logFile qualifier it is good to also apply the /AcceptEula qualifier
        to scripts that call PerfView.  By default the first time PerfView is run on any particular
        computer it displays a pop-up that asks the user to accept the usage agreement (EULA).  This
        can be problematic for scripts since it requires human interaction.   To avoid this you can
        use the /AcceptEula qualifier on the command line that does this operation silently.
    </p>
    <p>Thus a typical use of the /logFile and /AcceptEula qualifiers is the command</p>
    <ul>
        <li>PerfView /logFile=perfViewRun.log /AcceptEula run tutorial.exe</li>
    </ul>
    <p>
        which runs &#39;tutorial.exe&#39; from a script (no GUI).&nbsp;&nbsp; If you need
        to collect system wide (you want to use &#39;collect&#39; not &#39;run&#39;), there
        is a problem because PerfView does not know when to stop.&nbsp; There are two ways
        to solve this problem.&nbsp; The first is to use the &#39;/MaxCollectSec&#39; qualifier.&nbsp;
        For example the following command will collect for 10 seconds and then exit.&nbsp;
    </p>
    <ul>
        <li>
            PerfView&nbsp; /LogFile=PerfViewCollect.log /AcceptEula /MaxCollectSec:10 collect&nbsp;
        </li>
    </ul>
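    <p>
        As a rough illustration of how a script might drive the command above, the following Python sketch
        builds the same command line and checks the exit code.  The helper names (build_collect_command,
        run_collect), the 'PerfView.exe' path, and the assumption that an exit code of 0 means success are
        illustrative only and are not part of PerfView.
    </p>

```python
import subprocess


def build_collect_command(log_file, max_collect_sec, perfview_exe="PerfView.exe"):
    """Return the argument list for an unattended, time-limited 'collect'.

    Only qualifiers described in this section are used.  The executable
    name/path is an assumption; adjust it to where PerfView.exe lives.
    """
    return [
        perfview_exe,
        f"/LogFile={log_file}",               # no GUI; diagnostics are appended here
        "/AcceptEula",                        # avoid the first-run EULA pop-up
        f"/MaxCollectSec:{max_collect_sec}",  # stop after this many seconds
        "collect",
    ]


def run_collect(log_file, max_collect_sec, runner=subprocess.run):
    """Run the collection and report whether it succeeded.

    Assumption: an exit code of 0 means success (the guide only says the
    exit code indicates success or failure).
    """
    completed = runner(build_collect_command(log_file, max_collect_sec))
    return completed.returncode == 0
```

    <p>
        The 'runner' parameter exists only so the sketch can be exercised without launching PerfView;
        in a real script the default (subprocess.run) would be used and the log file inspected on failure.
    </p>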
    <p>
        If you wish to control the stopping by some other means besides a time limit, you
        can also use the &#39;start&#39; and &#39;stop&#39; and &#39;abort&#39; commands.&nbsp;
    </p>
    <ul>
        <li>PerfView start /AcceptEula /LogFile=PerfViewCollect.log</li>
        <li>PerfView stop /AcceptEula /LogFile=PerfViewCollect.log</li>
        <li>PerfView abort /AcceptEula /LogFile=PerfViewCollect.log</li>
    </ul>
    <p>
        These are meant to be used in scripts.&nbsp;&nbsp; The first will start logging
        <strong>and leave it on even after program exit</strong>.&nbsp; The second stops
        logging.&nbsp;&nbsp;&nbsp;&nbsp; You should avoid using these if you can (use collect /MaxCollectSec
        instead).&nbsp;&nbsp; The reason is that if the script were to fail between
        the start and stop commands, logging might not be stopped and will run &#39;forever&#39;.
        Thus some care is necessary in using these.&nbsp;&nbsp; The &#39;abort&#39; command
        is meant to help ensure that PerfView is not logging.&nbsp;&nbsp;&nbsp; It is meant
        to be called at locations where you know that PerfView should NOT be running, and
        it ensures that indeed it is not.&nbsp;&nbsp; You should use it liberally in scripts
        that use the &#39;start&#39; command.
    </p>
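    <p>
        One way to apply this advice in a script is to wrap the monitored work so that 'abort' always runs
        if 'stop' did not succeed.  The Python sketch below is only an illustration: the function name
        collect_around, the 'PerfView.exe' path, and the assumption that an exit code of 0 means success
        are hypothetical choices, not PerfView behavior documented here.
    </p>

```python
import subprocess


def collect_around(work, log_file="PerfViewCollect.log", runner=subprocess.run,
                   perfview_exe="PerfView.exe"):
    """Run work() between 'start' and 'stop', aborting if anything goes wrong."""

    def perfview(command):
        # Same shape as the start/stop/abort commands listed above.
        args = [perfview_exe, command, "/AcceptEula", f"/LogFile={log_file}"]
        return runner(args).returncode

    perfview("abort")                  # make sure no earlier session is still logging
    stopped = False
    try:
        if perfview("start") != 0:     # assumption: 0 means success
            raise RuntimeError("PerfView start failed; see " + log_file)
        work()                         # the scenario being measured
        stopped = perfview("stop") == 0
    finally:
        if not stopped:
            perfview("abort")          # never leave logging running 'forever'
    return stopped
```

    <p>
        If work() raises, the exception still propagates to the caller, but only after 'abort' has been issued.
    </p>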

    <h4>
        Minimizing Impact of Collection on the System&nbsp; (/LowPriority)
    </h4>
    <p>
        The normal Event Tracing for Windows (ETW) logging is generally very efficient (often &lt; 3%)
        however after a trace has completed, PerfView normally does relatively expensive things
        to package up the data (including merging, NGEN symbol creation and ZIP compression).  These
        operations obviously can use resources that may slow down whatever else is running on the
        machine.
    </p>
    <p>
        If you pass the /LowPriority option to PerfView on the command line, PerfView will do
        these operations at low CPU priority.  This can significantly slow down the time it takes
        to package up the data, but it minimizes the impact to the system.
    </p>

    <!--  ************************ -->
    <hr />
    <h4>
        <a id="WindowsContainers">Using PerfView inside Windows Server (Docker) Containers</a>
    </h4>
    <p>
        Containers can be best thought of as a light weight virtual machine.    See
        <a href="https://docs.microsoft.com/en-us/virtualization/windowscontainers/quick-start/quick-start-windows-10">Windows Containers on Windows 10</a>
        for more background on containers for windows.   In particular windows supports a
        light weight container called a 'Windows Server Container' in which the kernel is
        shared among all the containers running on a machine.   Such containers are used
        in conjunction with a tool called Docker, which allows you to create OS images and
        run applications in the virtualized environment.
    </p>
    <p>
        Ideally containers should be irrelevant to using PerfView, since containers are a kind of windows
        operating system and PerfView is just a windows application running there.   This is
        mostly true, but there are some differences that need to be considered.
    </p>
    <ol>
        <li>
            Because containers share the kernel, and the ETW events that PerfView relies on
            are generated by the kernel, it requires special support in the operating system
            to 'virtualize' the events and forward them to the ETW session in the appropriate
            container.    This support was added in version RedStone (RS) 3 (also called version 1709, released 10/2017)
            of the operating system.
            The command 'cmd /c ver' will tell you the BUILD version of the OS you are currently running
            on and the <a href="https://en.wikipedia.org/wiki/Windows_10_version_history">Windows 10 version history</a>
            page can correlate that to your Windows 10 version.    Note that as of that release only
            the CPU and context switch events are supported, but that is enough to do a lot of useful
            analysis.
        </li>
        <li>
            Containers don't have GUIs, and PerfView is a GUI app.  What this means is that if you run
            PerfView from a command prompt in a container, it will seem to do nothing.  What it was doing
            is launching the GUI, which you don't see, and detaching from the current console.  Thus it
            is doing exactly what it always does, it is just not as useful in a container.   However
            PerfView supports powerful command line options to automate collection and these work fine
            in a container.
        </li>
    </ol>
    <p>
        Thus PerfView works in a container, but you need to ensure you have a new enough version of the
        operating system, and that you use the techniques in <a href="#AutomatingCollection">Automating Collection</a>
        to collect data without using the GUI.
    </p>

    <!--  *************** -->
    <h5> Container Use Example </h5>
    <p>
        An example is worth a thousand explanations, so here is one.  First you need to
        install Docker for Windows from the web.  There are plenty of good tutorials online for that.
        Once you have Docker set up you can do the following
    </p>
    <ol>
        <li>docker run -it microsoft/windowsservercore:1803 cmd</li>
    </ol>
    <p>
        which will pull down the 1803 version of Windows Server Core (it is about 5GB) and run the 'cmd' command in it.
        Obviously you can pull down a later version as well (1803 is the RS-4 version, and was released in 4/2018).  The
        important part is that it is RS-3 or later.  The result is a C> command prompt.
    </p>
    <p>
        At this point you can copy PerfView into your container (e.g. 'net use \\SomeShare\SomeSpot').  Once you
        have PerfView copied you can do
    </p>
    <ol>
        <li>PerfView /logFile=log.txt /maxCollectSec=30 collect</li>
    </ol>
    <p>
        Which will cause PerfView to disconnect from the console, logging any diagnostics to log.txt.   Ultimately
        this command will create a PerfViewData.etl file in the normal way.  You can do 'type log.txt' to see how
        things are progressing as it runs.  If you put this command in a batch file, it will not detach from the
        console and thus the batch file will not continue until the collection is done.   Thus you can make a batch file
        that calls PerfView, and then copies the resulting file somewhere.    You can also use the 'start' and 'stop'
        PerfView commands instead of the 'collect' command if you wish to have your batch file start collection, kick
        off some operation while monitoring, and then stop it.   The point is that this works just like normal windows,
        and PerfView is very flexible.   You will be able to do just about anything.
    </p>

    <!--  *************** -->
    <h5> Windows Nanoserver and PerfViewCollect </h5>
    <p>
        The windowsservercore docker image is a pretty complete version of windows.  In particular it has a complete
        .NET Runtime on it, which is what PerfView needs to run.   Microsoft also supports an even smaller Docker image
        of windows called microsoft/nanoserver (which is 300 MB not 5GB).  This OS does support ETW, and thus in theory
        you could collect PerfView data on it, but it does not have the desktop runtime, so the PerfView.exe tool
        itself can't run.  This is what the 'PerfViewCollect' tool is for.
    </p>
    <p>
        PerfViewCollect is a version of PerfView that has been stripped of its GUI (it only does collection), and
        built using the .NET Core runtime.    When building .NET Core applications you can build them to be self-contained
        meaning that the application comes with all the .NET runtime and framework DLLs needed to run it.  Thus you
        only need the basic OS functionality, and in particular it will run on the NanoServer.
    </p>
    <p>
        Currently we don't create a binary distribution of PerfViewCollect; it must be built from the source code at
        https://github.com/Microsoft/perfview.   To build it, however, you don't need Visual Studio; you only need the
        <a href="https://www.microsoft.com/net/download">.NET Core SDK</a>.  Thus the procedure is
    </p>
    <ul>
        <li>Install the <a href="https://www.microsoft.com/net/download">.NET Core SDK</a>.  This gives you the 'dotnet' command</li>
        <li>Install Git for Windows if you have not already</li>
        <li>git clone https://github.com/Microsoft/perfview</li>
        <li>cd PerfView\src\PerfViewCollect</li>
        <li>dotnet publish -c Release --self-contained -r win-x64</li>
    </ul>
    <p>
        This last command will build the PerfViewCollect application as a self contained application. The tool
        tells you where it put it, but it should be in src\PerfViewCollect\bin\Release\netcoreapp3.1\win-x64\publish.
        The tool is the PerfViewCollect.exe in that directory.  You can do a PerfViewCollect /? to get some help
        (but it will be exactly the same command line help for PerfView.exe).
    </p>
    <p>
        If you copy this directory to your nanoserver you should be able to run the PerfViewCollect.exe there as well.
        Thus you can do the command
    </p>
    <ul>
        <li>PerfViewCollect.exe /logFile=log.txt /maxCollectSec=30 collect</li>
    </ul>
    <p>
        To collect data on Windows Nanoserver.
    </p>
    <!--  *************** -->
    <h5> Known issues (in Windows Version 1803 or earlier) </h5>
    <p>
        There is a known issue as of 10/2018 (or earlier).   Basically the issue is that DLLs that are part of the
        operating system in the container (e.g. the kernel, ntdll, kernelbase ...) end up using the HOST paths
        not the CONTAINER paths.   This would not be that big of a deal, except that the DLL load events do NOT
        contain a special unique identifier that is used to find the symbol file for the DLL on the Microsoft
        symbol server.   Normally as part of preparation (merging) of the file to be copied off system, these
        unique IDs are added to the trace.   However because this is done IN THE CONTAINER and the events have
        the HOST paths, the logic that does this fails so there are no unique IDs for the system DLLs.  This
        means PerfView can't look up the symbol names.
    </p>
    <p>
        There is a work-around.   If you get the correct symbol files (PDBs) and place them in a directory
        and use the File -> Set Symbol Path to include this directory, AND you pass the /UnsafePDBMatch option
        to PerfView, then it should work.
    </p>
    <p>
        There are a variety of ways of getting the correct symbol file, but one way is to use a debugger
        in the container and ask the debugger to load the necessary system files.   Then go to where the debugger
        put them.
    </p>

    <!--  ************************ -->
    <hr />
    <h4>
        <a id="ProductionMonitoring">Production Monitoring</a>
    </h4>
    <p>
        See also <a href="#CommandLineReference">Command Line Reference</a> for a complete list
        of the options you can use at the command line.
    </p>
    <p>
        PerfView has a few features that are designed specifically to collect data on production
        workloads to diagnose performance problems that only occur under real-world loads.
        We have already seen the /noView option that indicates that after data collection
        completes PerfView should simply exit (rather than try to display the data).
        There are a couple of other useful <a href="#CommandLineReference">command line options</a> that can be used for production
        monitoring. First is the /MaxCollectSec:N qualifier. The command
    </p>
    <ul>
        <li>PerfView collect /MaxCollectSec:20 /AcceptEula /logFile=collectionLog.txt</li>
    </ul>
    <p>
        Will indicate that PerfView should collect for at most 20 seconds. Thus this command
        needs no user interaction to collect a sample of data. Because the /logFile option
        was also given, any diagnostic information about the collection will be sent to
        'collectionLog.txt'. Thus this completely automates collection of data on a server
        machine in a single command line command.
    </p>
    <h5><a id="StopOnPerfCounter">Using Performance Counters to trigger collection stop (Stop Trigger qualifier)</a></h5>
    <p>
        The /MaxCollectSec qualifier is useful to collect a sample immediately. However it
        is not uncommon that servers experience intermittent performance problems (e.g.
        bouts of high CPU or high GC usage etc). Thus what is desired is the ability to
        monitor the server and only capture a sample when something 'interesting' is happening.
        This is what the /StopOnPerfCounter option is for. The basic syntax for the /StopOnPerfCounter
        qualifier is
    </p>
    <ul>
        <li>PerfView collect /StopOnPerfCounter:CATEGORY:COUNTERNAME:INSTANCE OP NUM</li>
    </ul>
    <p>
        Where CATEGORY:COUNTERNAME:INSTANCE indicates a particular performance counter (following
        the same naming convention that PerfMon uses), OP is either a &lt; or a &gt; and
        NUM is a number. For example
    </p>
    <ul>
        <li>PerfView collect "/StopOnPerfCounter:.NET CLR Memory:% Time in GC:_Global_>20"</li>
    </ul>
    <p>
        Indicates that PerfView should collect data until the _Global_ instance (which represents
        the sum of all GC heaps for all processes on the system) of the '% Time in GC' counter for the '.NET CLR Memory'
        category is greater than 20%. Thus this specification will trigger when GC time
        is high. By default the 'collect' command runs in 'circular buffer mode' with a default
        size of 500MB. Thus the command above will only collect 500MB of data (typically
        this is a few minutes of data) and then it starts discarding the oldest data. When
        the performance counter triggers, the command stops and you will have the last
        few minutes of data that led up to the 'bad perf' (in this case high GC time).
    </p>
    <p>
        Some counters (like the system global counter 'Memory:Committed Bytes') do not have
        an instance because there is only one for the whole machine.  For these specify
        an empty string.   For example
    </p>
    <ul>
        <li>PerfView collect "/StopOnPerfCounter:Memory:Committed Bytes: > 50000000000"</li>
    </ul>
    <p>
        will stop collection when the committed bytes for the entire machine exceed 50GB.  Notice
        that the counter is still CATEGORY:NAME:INSTANCE, but in this case INSTANCE is the
        empty string (the trailing :).
    </p>
    <p>
        The performance counter will trigger when PerfView detects that the
        counter has satisfied the condition for a certain number of seconds,
        defaulting to 3 seconds. You can control this with the flag
        /MinSecForTrigger:N to set the threshold to N seconds.
    </p>
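    <p>
        The idea of requiring the condition to hold for a minimum time can be sketched as follows.  This is
        only one plausible reading of the behavior described above (the condition must hold continuously
        across consecutive samples); PerfView's actual sampling logic is not shown in this guide, and the
        function name and sample format are hypothetical.
    </p>

```python
def first_trigger_time(samples, threshold, min_sec=3.0):
    """Return the time at which a '>' trigger would fire, or None.

    samples: iterable of (time_sec, value) pairs in ascending time order.
    The trigger fires once value > threshold has held for at least min_sec
    seconds without interruption (assumed semantics).
    """
    held_since = None
    for time_sec, value in samples:
        if value > threshold:
            if held_since is None:
                held_since = time_sec        # condition just became true
            if time_sec - held_since >= min_sec:
                return time_sec              # held long enough: trigger
        else:
            held_since = None                # a dip resets the clock
    return None
```

    <p>
        Under this reading a brief spike that lasts less than the minimum time never stops collection, which is
        the purpose of the /MinSecForTrigger threshold.
    </p>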
    <p>
        When the performance counter triggers, PerfView actually collects 10 more seconds
        of trace before stopping. This way you get both the conditions up to and slightly
        after the event that you are interested in. PerfView logs an event called StopReason
        to the ETW event stream when the performance counter triggers so you can see
        exactly when this happened when looking at the data.
    </p>
    <p>
        To find the exact names of performance counters to use in the '/StopOnPerfCounter' qualifier
        you can use the PerfMon utility built into Windows. To start it simply type 'start
        PerfMon' at a command line. Then click on the 'Performance Monitor' icon in the
        left hand pane. This brings up the performance counter graph in the right hand pane.
        You can click on the + icon at the top to add new performance counters. This will
        bring up an 'Add Counters' dialog box with the performance counter categories
        populated. For example you can open the '.NET CLR Memory' category and you will
        see counters like '# bytes in all heaps' and '% time in GC'. Selecting one of these
        will then show you all the instances (processes) that have those counters. These
        three names (category, counter, instance) are the values you need to give to the
        '/StopOnPerfCounter' qualifier.
    </p>
    <p>
        You will want to test your /StopOn* specification before waiting a long time to see
        if it captures a trace properly.   One way is to use /MaxCollectSec=XXX to force collection
        to stop quickly.   If you then open the log (either the file specified by /LogFile, or the
        captured log in the 'TraceInfo' view of the '*.etl.zip'), you will find
        diagnostic messages as it monitors the perf counter.  You should see messages that
        show it setting up the perf counter as well as the values it sees every few seconds.
        This can give you confidence that you did not misspell the counter, that you have
        the correct instance, and you picked a reasonable threshold.
    </p>
    <p>
        You can specify the /StopOnPerfCounter qualifier more than once and each acts as a trigger.
        Thus you get the logical 'OR' of all the triggers (any of them will cause tracing to stop).
        There is currently no way of specifying a logical 'AND'.
    </p>
    <p>
        If the process you want to monitor lives a long time, then you can specify the instance
        of that process in the /StopOnPerfCounter qualifier. Sometimes, however it is difficult
        to identify the process instance you want. Some counters (like the GC counters and
        others), have a special instance that represents 'all' processes in some way. Look
        for these in the 'instances' listbox in PerfMon. These can be handy.  If there is no
        aggregate instance, you can specify /StopOnPerfCounter for each process instance that MIGHT exist.
        This is not hard to do because Perf Counters are given names like EXE, EXE#1, EXE#2 etc.
        Thus you can specify /StopOnPerfCounter for each of the N from 1 up to the maximum
        number of instances you expect.   PerfView is robust to instances that don't exist (it waits
        for them to exist), so you get the behavior you want.
    </p>
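    <p>
        Generating one qualifier per possible instance is easy to script.  The following Python sketch formats
        the CATEGORY:COUNTERNAME:INSTANCE OP NUM syntax described above and expands it over the EXE, EXE#1, EXE#2
        naming pattern.  The function names and the 'MyService' process name in the usage note are hypothetical.
    </p>

```python
def stop_on_perf_counter(category, counter, instance, op, threshold):
    """Format one /StopOnPerfCounter qualifier (pass it as a single argument)."""
    if op not in ("<", ">"):
        raise ValueError("OP must be '<' or '>'")
    # An empty instance yields the trailing ':' used for machine-wide counters.
    return f"/StopOnPerfCounter:{category}:{counter}:{instance}{op}{threshold}"


def qualifiers_for_possible_instances(category, counter, exe, max_instances,
                                      op, threshold):
    """One qualifier for each instance name EXE, EXE#1, ... that might exist."""
    instances = [exe] + [f"{exe}#{n}" for n in range(1, max_instances)]
    return [stop_on_perf_counter(category, counter, name, op, threshold)
            for name in instances]
```

    <p>
        For example, qualifiers_for_possible_instances(".NET CLR Memory", "% Time in GC", "MyService", 3, ">", 20)
        produces three qualifiers, for the instances MyService, MyService#1 and MyService#2.  Since multiple
        triggers are combined with a logical 'OR', whichever instance crosses the threshold first stops collection.
    </p>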
    <p>
        Here are some other useful /StopOnPerfCounter examples
    </p>
    <ul>
        <li>
            PerfView collect "/StopOnPerfCounter=Processor:% Processor Time:_Total>90" - This command
            will trigger if the total CPU time used by the machine exceeds 90%
        </li>
    </ul>
    <h5>Monitoring Performance Counters in the ETL file.  </h5>
    <p>
        It is often useful to have performance counter data logged to the ETL file so that
        you can correlate the data in the performance counter to the other ETW data.   This
        is what the /MonitorPerfCounter=spec qualifier does.   It has the format
        CATEGORY:COUNTERNAME:INSTANCE@NUM  where CATEGORY:COUNTERNAME:INSTANCE identifies
        a performance counter (same as PerfMon) and NUM is a number representing seconds.
        The @NUM part is optional and defaults to 2.   You can have several of these
        qualifiers when collecting data.  The value of the performance counter
        is logged to the ETL file as an event every NUM seconds.   Thus
    </p>
    <ul>
        <li>PerfView "/MonitorPerfCounter=Memory:Available MBytes:@10" collect</li>
    </ul>
    <p>
        This command logs the Available MBytes performance counter every 10 seconds.  This data
        shows up in the 'events' view under the PerfView/PerformanceCounterUpdate event.
        Monitoring the server's RPS load or memory usage is often useful.
    </p>
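    <p>
        To make the CATEGORY:COUNTERNAME:INSTANCE@NUM format concrete, here is a small Python sketch that splits
        such a specification into its parts, applying the default of 2 seconds when @NUM is omitted.  It is an
        illustration of the format only; PerfView's own parser may differ, and the sketch assumes the counter
        and instance names do not themselves contain an '@'.
    </p>

```python
def parse_monitor_spec(spec, default_interval_sec=2):
    """Split 'CATEGORY:COUNTERNAME:INSTANCE@NUM' into its four parts."""
    counter_part, at, interval_part = spec.rpartition("@")
    if at:
        interval_sec = int(interval_part)
    else:
        # No '@NUM' suffix: rpartition put the whole spec in interval_part.
        counter_part, interval_sec = spec, default_interval_sec
    # Split on the first two ':' only; INSTANCE may be empty.
    category, counter, instance = counter_part.split(":", 2)
    return category, counter, instance, interval_sec
```

    <p>
        Applied to the example above, 'Memory:Available MBytes:@10' yields the category 'Memory', the counter
        'Available MBytes', an empty instance, and an interval of 10 seconds.
    </p>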

    <h5>Using long HTTP requests as the trigger to stop</h5>
    <p>
        A reasonably common scenario is that you have a web service and you are interested
        in investigating cases where response time is long. However most of the time response
        time is good. Thus simply collecting a sample is not likely to be useful. What you
        need is to run as a 'flight recorder' until a long request happens and then stop.
        This is what the /StopOnRequestOverMSec qualifier does. The command
    </p>
    <ul>
        <li>PerfView collect "/StopOnRequestOverMSec:2000"</li>
    </ul>
    <p>
        Will stop when an IIS (e.g. ASP.NET) request takes longer than 2000 msec. You can
        also add the /CollectMultiple:N option so that you collect N of these (the file
        name is morphed to add a .1, .2 ....).
    </p>
    <p>
        Finally you can also cause PerfView to stop when messages are written to the windows
        Application event log. Thus the command:
    </p>
    <ul>
        <li>PerfView collect "/StopOnEventLogMessage:Pattern"</li>
    </ul>
    <p>
        Will stop when a message is written to the Windows Event Log that matches the .NET
        Regular expression pattern 'Pattern'.  By default PerfView monitors the Application
        event log, but if you wish to monitor another you can do so by prefixing 'Pattern'
        with the name of the event log followed by a @.
    </p>
    <h5>Using long .NET GCs as the trigger to stop</h5>
    <p>
        Another&nbsp; reasonably common scenario is
        you have some non-HTTP based service that is experiencing pause times and you have a large
        .NET Heap.&nbsp;&nbsp; Using the /GCCollectOnly option for collection you were able to take a
        very long trace (hours to days) and did discover that there are long GCs that happen from time
        to time, but only sporadically.&nbsp;&nbsp; These long GCs are blocking and thus are
        likely to be responsible for the long pause times and you wish to have detailed information about
        the long GCs.&nbsp;&nbsp;&nbsp; This is what the&nbsp; /StopOnGCOverMSec qualifier does. The command
    </p>
    <ul>
        <li>PerfView collect "/StopOnGCOverMSec:5000" /ThreadTime /CollectMultiple:3</li>
    </ul>

    <p>
        Will capture about 2 minutes of detailed information right before any GC
        that takes over 5 seconds.&nbsp;&nbsp; This detailed information includes information on context switches
        (the /ThreadTime qualifier), and PerfView will collect up to three separate files (named by default PerfViewData.etl.zip,
        PerfViewData.1.etl.zip and PerfViewData.2.etl.zip) for 3 separate long GCs before shutting down.&nbsp;
    </p>

    <h5>Using Exceptions to trigger a stop</h5>
    <p>
        Another common scenario is to trigger a stop after an exception has been thrown.   This allows you to see what was
        happening just before the exception happened.  You can also match on the name of the exception or text in the exception being thrown.
        For example
    </p>
    <ul>
        <li>PerfView collect "/StopOnException:ApplicationException" /Process:MyService /ThreadTime </li>
    </ul>
    <p>
        Will stop whenever an exception that matches 'ApplicationException' is thrown from the MyService process (note that
        /Process picks the FIRST process with the given name to focus on, NOT all processes with that name).   The pattern
        argument for /StopOnException can be any .NET Regular expression.
    </p>
    <ul>
        <li>PerfView collect "/StopOnException:FileNotFound.*Foo.dll" /ThreadTime </li>
    </ul>
    <p>
        Will stop whenever an exception is thrown that has 'FileNotFound' in its type and 'Foo.dll' somewhere in the text of the message.
        Notice that you can use a .NET Regular expression .* in the pattern.   You can use the full power of .NET regular expressions.
    </p>
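    <p>
        Because the pattern is a regular expression, it can be worth trying it against sample text before
        starting a long collection.  The sketch below uses Python's 're' module purely as a stand-in for .NET
        regular expressions (the two agree for simple patterns like this one), and the sample exception strings
        are hypothetical; the exact text PerfView matches against is not specified here.
    </p>

```python
import re


def pattern_matches(pattern, exception_text):
    """True if the pattern is found anywhere in the text (unanchored search)."""
    return re.search(pattern, exception_text) is not None


# Pattern from the example above; '.' matches ANY character, so 'Foo.dll'
# also matches text such as 'Fooxdll'.
LOOSE = r"FileNotFound.*Foo.dll"
# Escaping the dot restricts the match to a literal '.'.
STRICT = r"FileNotFound.*Foo\.dll"

# Hypothetical exception descriptions used only to exercise the patterns.
HIT = "System.IO.FileNotFoundException: Could not load file or assembly 'Foo.dll'"
MISS = "System.IO.FileNotFoundException: Could not load file or assembly 'Bar.dll'"
ODD = "System.IO.FileNotFoundException: missing Fooxdll"
```

    <p>
        With these definitions both patterns match HIT and neither matches MISS, but only the unescaped pattern
        matches ODD.  Escaping the '.' is therefore the safer choice when the file name must match exactly.
    </p>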
    <h5>Collecting multiple instances of a problem</h5>
    <p>
        By default when any of the /Stop* arguments are given, PerfView will stop and exit after the trigger fires.
        It is often useful to collect multiple instances of a problem in one session.   This is what the /CollectMultiple:N
        qualifier does.     For example
    </p>
    <ul>
        <li>PerfView collect "/StopOnRequestOverMSec:5000" /CollectMultiple:3</li>
    </ul>
    <p>
        Will only trigger for ASP.NET requests over 5000 msec.  However, once triggered, it will go back and resume monitoring
        until 3 such examples are created.   Thus a maximum of 3 files will be generated by the command above.  The
        resulting .ETL.ZIP files have a number just before the .ETL.ZIP suffix that makes the file names unique.
    </p>

    <h5>Restricting the trigger to a particular process&nbsp; </h5>
    <p>
        By default the&nbsp; /StopOn*OverMsec and /StopOnException will trigger when ANY process satisfies the trigger.&nbsp;&nbsp; On servers
        with many services running this can lead to false triggers if you are only interested in a particular process.&nbsp;&nbsp;
        This is what the /Process:<strong>processNameOrID</strong> qualifier can be used for.&nbsp; For example
    </p>
    <ul>
        <li>PerfView collect "/StopOnRequestOverMSec:5000" /Process:3543</li>
    </ul>
    <p>
        Will only trigger if there is a web request that is over 5000 msec from the process with ID 3543.   You can also
        use a process name (exe without path or extension) for the filter, however this name is just used to look up the
        FIRST PROCESS with that name.   Thus if there is more than one process with that name at the time the collection
        is started the exact process that is picked is effectively random.   Thus you need to use numeric IDs for existing
        processes unless the process name is unique on the system.   Processes that start after the collect starts can
        use the name unambiguously.
    </p>
    <h5>Using the /DecayToZeroHours:XX option</h5>
    <p>
        One issue that you can run into when using the /StopOn*Over or /StopOnPerfCounter is choosing a good threshold number.&nbsp;
        Choosing a number too high will mean that trigger will never fire.&nbsp; Choosing a number too low will cause it to trigger on
        uninteresting cases.&nbsp;&nbsp; This is what the /DecayToZeroHours option is for.&nbsp; The basic idea is you set the trigger
        to a number that is on the upper range of what you believe is likely.&nbsp; You also set /DecayToZeroHours:XX to a value
        that is &#39;long&#39; (typically it is something like 24 hours).&nbsp;&nbsp; By specifying this option you have indicated
        that the original trigger value should slowly decay to zero over that time.&nbsp; Thus the command
    </p>
    <ul>
        <li>PerfView collect "/StopOnRequestOverMSec:5000" /ThreadTime /collectMultiple:3 /DecayToZeroHours:24</li>
    </ul>
    <p>
        Will start with the stop threshold at 5000 msec, however it decays at a rate such that it will hit zero in 24 hours.&nbsp; Thus
        in 12 hours it will be at 2500 msec.&nbsp; Thus over that time period the trigger will eventually get small enough to fire, but
        odds are that it will trigger well before that at a &#39;reasonably big&#39; case.&nbsp;
    </p>
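    <p>
        The arithmetic behind the decay can be sketched in a few lines of Python.  This assumes the decay is
        linear, which is consistent with the 12 hour / 2500 msec figure above; the function name is
        hypothetical and the code is not taken from PerfView.
    </p>

```python
def decayed_threshold(initial, decay_to_zero_hours, elapsed_hours):
    """Trigger threshold after elapsed_hours, assuming a linear decay to zero."""
    if decay_to_zero_hours <= 0:
        raise ValueError("decay_to_zero_hours must be positive")
    remaining_fraction = 1.0 - elapsed_hours / decay_to_zero_hours
    # Clamp at zero once the decay period has passed.
    return initial * max(0.0, remaining_fraction)
```

    <p>
        For the command above, decayed_threshold(5000, 24, 12) gives 2500 msec, and after 18 hours the
        threshold would be 1250 msec.
    </p>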
    <h5>Logging while collecting with the /StopOn* options</h5>
    <p>
        When the /StopOn* trigger options are active, PerfView will log both to the PerfView log, as well as to the ETL file messages
        about the average, and maximum request in 10 second intervals.&nbsp; You can see these logs when data collection is happening by
        clicking the &#39;log&#39; button on the Main window (even when the collection dialog box is up).&nbsp; They will also be in
        the ETL file and can be viewed in the &#39;events&#39; view by filtering to the &#39;PerfView/PerfViewLog&#39; events.&nbsp;&nbsp;
        These can be helpful in understanding more about how the maximum changes over time.
    </p>
    <h5><a id="DelayAfterTriggerSec">Capturing more data after the stop Trigger has fired</a></h5>
    <p>
        After the /StopOn* trigger has fired, by default PerfView waits 5 seconds before it stops the trace.   This ensures that you
        see not only the period just before the trigger, but also 5 seconds afterward.   This is sufficient for most scenarios
        but if you need more you can use the /DelayAfterTriggerSec=N to specify a longer period.    Keep in mind, however that typically
        the default 500Meg circular buffer will only hold 2-3 min of trace so specifying a number larger than 100-200 seconds is likely
        to allow the period of time before triggering to get overwritten with new data.
    </p>
    <h5><a id="StopCommand">Executing an external command when the stop Trigger fires.</a></h5>
    <p>
        In some cases there is other logging that is being collected along with the PerfView data.  When PerfView triggers
        the stop it is useful to execute a command that stops this logging.   This is what the /StopCommand qualifier is for.   The argument can use
        the variable names %OUTPUTDIR% or %OUTPUTBASENAME% to represent the directory and the base name (filename without the
        directory or file extension) to pass to the external command.
    </p>
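    <p>
        For example, assuming a hypothetical script StopMyLogging.cmd (not part of PerfView) that shuts down your other logging and
        takes the directory and base name as its two arguments, the qualifier might be used as follows
    </p>
    <ul>
        <li>PerfView "/StopCommand=StopMyLogging.cmd %OUTPUTDIR% %OUTPUTBASENAME%" /StopOnRequestOverMSec=2000 collect</li>
    </ul>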
    <h5>Stopping on arbitrary ETW events or arbitrary start-stop pairs</h5>
    <p>
        The /StopOnRequestOverMSec is wired to measure the duration between the IIS start and IIS stop event.   Many services use IIS to
        route their requests and thus this option is useful much of the time.   However it is also possible to trigger a stop on either
        a single ETW event occurring or a start-stop pair having a duration longer than a trigger amount using the /StopOnEtwEvent qualifier.
        The general syntax is
    </p>
    <ul>
        <li>PerfView "/StopOnEtwEvent:Provider/EventName;Key1=Value1;Key2=Value2..." collect</li>
    </ul>
    <p>Where the 'Provider' can be </p>
    <ul>
        <li>The name of an ETW provider that is registered with the operating system (returned by 'logman query Providers')</li>
        <li>
            A string of the form '*EventSourceName', which specifies the name of a dynamically registered ETW provider (e.g. an
            EventSource).  The '*' indicates that the name should be hashed to a GUID and that GUID be used as the provider ID.
        </li>
        <li>An explicit GUID</li>
    </ul>
    <p>And 'EventName' can be</p>
    <ul>
        <li>Of the form 'TaskName/OpcodeName' (e.g. GC/Start).</li>
        <li>Simply 'TaskName' if the OpcodeName is 'Info' (0) </li>
        <li>Of the form EventID(NNN), where NNN is the decimal event number associated with the event</li>
    </ul>
    <p>
        In general the event name shown in the 'Events' view of PerfView is the correct thing to use.   Finally, the key-value pairs
        give additional 'options' that affect the semantics.   They are all optional; the valid keys are listed below.
    </p>
    <ul>
        <li>
            <strong>Keywords=XXXX</strong> Specify the ETW keywords to turn on the event needed for the stop.
            The XXXX is specified in hexadecimal (with or without the 0x prefix) and the default value is ulong.MaxValue.
            If using the Windows Kernel Trace provider the default value is 0x0001270F (the same as PerfView's default for kernel events, except Profile).
        </li>
        <li>
            <strong>Level=N</strong> Specify the ETW Level (1 = critical, 5 = verbose) to turn on the event needed for the stop.
            The N is specified in decimal and the default value is 4 (informational).
        </li>
        <li>
            <strong>Process=SSSS</strong> Specify a decimal process ID or a process name (exe name without path or extension),
            to filter by.   This can also be specified by using the /Process qualifier.  The default is to listen to all processes.
            As with the /Process qualifier, if multiple processes with the same name are present only ONE of them at any particular
            time will be the focus process (thus use process IDs in that case).
        </li>
        <li>
            <a id="FieldFilter"><strong>FieldFilter=FieldName OP Value </strong></a>  Specifies a field filter.
            OP is one of &lt; &lt;= &gt; &gt;= = != or ~ (which means match
            .NET regular expression, case insensitive), and value is either a string, integer value or a floating point value.   This option can be
            repeated more than once in which case the event has to match ALL the filters (currently no OR operator).  This can work
            on single event triggers as well as start-stop triggers however for start-stop triggers it only applies to the START event.
            This is a very powerful option that allows you to be quite specific about which particular event to trigger on.
        </li>
        <li>
            <strong>BufferSizeMB=NNN</strong> Specify the buffer size used by the trigger session.   This is only needed if
            the provider generates a LARGE volume of events rapidly.  The default value is 256 MB.
        </li>
        <li>
            <strong>TriggerMSec=NNN</strong> Specifies that PerfView will listen not for a single event but for a start-stop
            pair and that only a duration longer than NNN MSec will trigger a stop.   If this key-value is present then the following
            key-values have meaning.
        </li>
        <li>
            <strong>DecayToZeroHours=NNN</strong> Indicates that the Trigger time will decay to 0 over NNN hours.   This can also
            be specified by the /DecayToZeroHours qualifier.
        </li>
        <li>
            <strong>StopEvent=PROVIDER/EVENTNAME</strong> Specifies the stop event of the start-stop pair (the start event was
            specified before the key-value pairs).  If this key-value pair is not specified, then the stop event is derived from the
            start event by the following rules.
            <ol>
                <li>
                    If the start event ends with 'Start' then the stop event name is derived by replacing 'Start' with 'Stop'.
                </li>
                <li>Otherwise the event with the next event ID is assumed to be the stop event.</li>
            </ol>
        </li>
        <li>
            <strong>StartStopID=XXXX</strong> In order to match up start-stop pairs, PerfView needs an 'ID' that is present in
            both the start and the stop event that can be used to do the matching.   XXXX specifies the name of the
            payload field that does this.   In addition to all payload fields, XXXX can be 'ThreadID' or 'ActivityID', which
            indicate that the thread ID or activity ID should be used as the correlation ID.    If this key-value is not present and the event has any
            payload fields, the first field is used as the correlation ID.  If the event has no payload fields the thread ID is used.
        </li>
        <li>
            <strong>Verbose=true</strong> By default PerfView logs a SAMPLE of PerfView/StopTriggerDebugMessage messages into the ETW log
            which is typically enough information to diagnose why triggering is not working properly.   However by setting
            Verbose=true the information is more complete.
        </li>
    </ul>
    <h5>Examples of /StopOnEtwEvent use</h5>
    <p>
        As you can see there are a lot of options, but mostly you don't need them.  This option is perhaps most useful for your
        own EventSource Events.   If you defined an event 'MyWarning' you could stop on that warning condition by doing
    </p>
    <ul>
        <li>PerfView /StopOnEtwEvent:*MyEventSource/MyWarning collect</li>
    </ul>
    <p>
        If you defined your provider 'MyEventSource', and had two events 'MyRequestStart' and 'MyRequestStop',
        you could stop whenever your requests took more than 2 seconds by doing
    </p>
    <ul>
        <li>PerfView /StopOnEtwEvent:*MyEventSource/MyRequest/Start;TriggerMSec=2000 collect</li>
    </ul>
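    <p>
        When the stop event cannot be derived from the start event, or the first payload field is not the right correlation ID, the
        StopEvent and StartStopID keys described above can be given explicitly.   As a sketch, assuming hypothetical events
        'BeginWork' and 'EndWork' that both carry a hypothetical 'WorkID' payload field
    </p>
    <ul>
        <li>PerfView "/StopOnEtwEvent:*MyEventSource/BeginWork;StopEvent=*MyEventSource/EndWork;StartStopID=WorkID;TriggerMSec=2000" collect</li>
    </ul>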
    <p>
        If you want to stop when the process named 'GCTest' (that is the exe is named GCTest.exe) stops (you can also use a process ID) you can do
    </p>
    <ul>
        <li>PerfView /StopOnEtwEvent:Microsoft-Windows-Kernel-Process/ProcessStop/Stop;Process=GCTest collect</li>
    </ul>
    <p>
        If you want to stop when a process starts it is a bit more problematic because the 'start' event actually occurs in the process that
        spawned the process, not the process being created.  Instead you can use the fact that the ProcessStart event has an 'ImageName' field
        and you can use the ~ operator of the <a href="#FieldFilter">FieldFilter</a> option to trigger on that.
        Thus to stop when a process called GCTest.exe is launched you can do
    </p>
    <ul>
        <li>PerfView /StopOnEtwEvent:Microsoft-Windows-Kernel-Process/ProcessStart/Start;FieldFilter=ImageName~GCTest.exe collect</li>
    </ul>
    <p>
        Here is a slightly more complex example where we only stop if the GCTest.exe executable fails with a non-zero exit code.   Here
        we use the ImageName field to find a particular Exe as well as the ExitCode field to determine if the process fails.
        You can use this to stop PerfView when a particular process in a large script fails (which is a reasonably common scenario).
    </p>
    <ul>
        <li>PerfView /StopOnEtwEvent:Microsoft-Windows-Kernel-Process/ProcessStop/Stop;FieldFilter=ImageName~GCTest.exe;FieldFilter=ExitCode!=0 collect</li>
    </ul>
    <p>
        Here is an example where we want to stop when a particular URL is serviced by an ASP.NET server.  Basically we stop when an ASP.NET
        Request event fires with a 'FullUrl' field that matches the pattern (ends in /stop.aspx).
    </p>
    <ul>
        <li>PerfView "/StopOnEtwEvent:*Microsoft-Windows-ASPNET/Request/Start;FieldFilter=FullUrl~http://.*/stop.aspx" collect</li>
    </ul>
    <p>
        Here is an example where we want to stop when a disk I/O takes longer than 10000 ms.  We monitor the Windows Kernel Trace/DiskIO/Read events and use the 'DiskServiceTimeMSec' field in a FieldFilter expression.
    </p>
    <ul>
        <li>PerfView "/StopOnEtwEvent:Windows Kernel Trace/DiskIO/Read;FieldFilter=DiskServiceTimeMSec&gt;10000.0;Keywords=0x100" collect</li>
    </ul>
    <p>
        In general the option is pretty powerful, especially if you have the ability to add ETW events to your code (EventSource).  Coupled with
        the FieldFilter you can use this to stop on particular DLLs in particular processes loading or unloading, registry keys being touched,
        files being opened, as well as any of your specific EventSource events happening (testing their arguments).
    </p>
    <h5>Using Keywords on /StopOnEtwEvent providers</h5>
    <p>
        In the previous examples we turned on all the 'keywords' associated with a particular provider.   For example to trace the starts and
        stops of processes we turned on all the events in the Microsoft-Windows-Kernel-Process provider.  While this works, it can mean that the
        triggering logic has to look at and discard many events that are unimportant.   You can improve the efficiency as well as make any
        debugging of triggering easier by reducing the number of events subscribed to by using the 'Keywords' option.  For example
    </p>
    <ul>
        <li>PerfView /StopOnEtwEvent:Microsoft-Windows-Kernel-Process/ProcessStop/Stop;<b>Keywords=0x10;</b>FieldFilter=ImageName~GCTest.exe;FieldFilter=ExitCode!=0 collect</li>
    </ul>
    <p>
        This is the same as the previous example but with the Keywords=0x10 option added.   This tells PerfView to only turn on the events
        designated by the 0x10 bitfield.   The only issue is how do you know what 0x10 means?    You can determine this by looking at the manifest for
        the Microsoft-Windows-Kernel-Process provider.    You can do this by opening the advanced section of the 'collection' dialog box, and clicking on the
        <a href="#ProviderBrowser">Provider Browser</a> button.   Select the provider of interest in the 'Providers' listbox and then click the 'View Manifest'
        button.  This will bring up the complete XML manifest for the provider.  You will find a 'keywords' section and in that you will find the definitions
        of each keyword.  There we find that the WINEVENT_KEYWORD_PROCESS keyword has the value 0x10, and because the event of interest (ProcessStop/Stop)
        is tied to this keyword, we know that this is the only keyword we actually need.   This is the 'magic' number to give to the 'Keywords' option
        above.   Another way to find the keywords is using "logman query providers <b>provider</b>".
        Note you don't have to do this, but it does make debugging easier and processing more efficient (since there are fewer events to filter out).
    </p>
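    <p>
        For instance, for the provider used in the example above, the logman form of the lookup would be the following.   Its output
        lists the keywords the provider defines along with their hexadecimal values.
    </p>
    <ul>
        <li>logman query providers Microsoft-Windows-Kernel-Process</li>
    </ul>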
    <h5>Debugging Triggering Issues</h5>
    <p>
        It is not uncommon to try out a /StopOnEtwEvent qualifier and find that it does not do what you want (typically because it did not
        trigger). Sometimes what is in the log will help, however PerfView can't place too much in the log because it might flood the log. Instead
        it emits special PerfView/StopTriggerDebugMessage events into the ETW stream so that you can look at the data in the 'events' view and figure out why it is
        not working properly. If you have issues with triggering you will definitely want to look at these events.
    </p>

    <h5>Using Performance Counters to trigger collection start (Start Trigger qualifier)</h5>
    <p>
        For many scenarios, simply using the <a href="#StopOnPerfCounter">/StopOnPerfCounter</a> is sufficient (along
        with perhaps a <a href="#DelayAfterTriggerSec">/DelayAfterTriggerSec</a>) to collect data at an interesting point
        (when a performance counter is unusually high or low).  However that technique
        has the disadvantage of requiring that collection be on continuously.   This is
        inefficient if the point of interest is well after the performance counter
        triggers.   In this case it makes more sense to <b>not even start</b> collection until the interesting time.
        This is what the /StartOnPerfCounter option is for. Its syntax is identical to <a href="#StopOnPerfCounter">/StopOnPerfCounter</a>
        except that it will not even start collecting until this trigger trips.
        The flag /MinSecForTrigger:N applies to /StartOnPerfCounter, to
        control how many seconds the performance counter has to satisfy the
        condition before triggering collection (the default is 3 seconds).
    </p>
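    <p>
        As a sketch, the following command waits to start collecting until the machine-wide percentage of time in GC has stayed above 20 for
        10 seconds.   The counter specification is only illustrative: it assumes the CATEGORY:COUNTERNAME:INSTANCE OP NUM form
        used by <a href="#StopOnPerfCounter">/StopOnPerfCounter</a>, so check that section for the exact syntax.
    </p>
    <ul>
        <li>PerfView "/StartOnPerfCounter:.NET CLR Memory:% Time in GC:_Global_&gt;20" /MinSecForTrigger:10 collect</li>
    </ul>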
    <p>&nbsp;</p>
    <p>&nbsp;</p>
    <p>&nbsp;</p>
    <p>&nbsp;</p>
    <p>&nbsp;</p>
    <p>&nbsp;</p>
    <hr />
    <!--  ********************************** -->
    <h2>
        <a id="UsingEventSources">Using PerfView with EventSources</a>
    </h2>
    <p>
        The .NET V4.5 Runtime comes with a class called
        <a href="http://msdn.microsoft.com/en-us/library/system.diagnostics.tracing.eventsource.aspx">System.Diagnostics.Tracing.EventSource</a>
        which can be used to log ETW events
        in a very convenient way. For example here is a trivial EventSource called MyCompanyEventSource
        which has a 'Load' and 'Unload' event. Each event logs whatever interesting information
        makes sense for that event, in this case the 'imageBase' of the load as well as
        the name.
    </p>
    <pre>         using System.Diagnostics.Tracing;

        sealed class MyCompanyEventSource : EventSource
        {
            public static readonly MyCompanyEventSource Log = new MyCompanyEventSource();    // The log itself
            // The first argument to WriteEvent is the event ID (1 for Load, 2 for Unload).
            public void Load(long ImageBase, string Name) { WriteEvent(1, ImageBase, Name); }
            public void Unload(long ImageBase) { WriteEvent(2, ImageBase); }
        }
        // In other code
        MyCompanyEventSource.Log.Load(myImageBase, "MyName");
        // In another place
        MyCompanyEventSource.Log.Unload(myImageBase);</pre>
    <p>
        Because EventSources log to the ETW logging file in a standard way, PerfView can
        display these events in useful ways. This section describes some of the common techniques.
    </p>
    <h3>Naming EventSources</h3>
    <p>
        Like all ETW providers, an EventSource has a 16 byte GUID that uniquely identifies
        it. Normally GUIDs are not convenient to use, and you would prefer to use a name.
        If an ETW provider registers itself with the operating system PerfView can ask the
        OS to look up a name and get the GUID. However typically EventSources do not do
        this because it complicates the deployment of the application. Instead EventSources
        typically use an internet standard way of generating a GUID from a name. Thus given
        a name you can find the GUID without the EventSource ever needing to register itself.
        PerfView supports using this convention with the *<i>NAME</i> syntax. If a provider
        name starts with a * it is assumed to be the provider GUID which results from hashing
        NAME in the standard way. (The hash is case insensitive). EventSource names are
        either the name supplied by the Name parameter of the EventSourceAttribute applied
        to the EventSource class or it is the simple name of the class (no namespace) if
        there is no name given explicitly. Once you know the name of the EventSource you
        can use the /providers qualifier to turn on the EventSource. For example
    </p>
    <ul>
        <li>PerfView /Providers=*MyCompanyEventSource collect</li>
    </ul>
    <p>
        This will turn on all keywords (event groups) of the EventSource called 'MyCompanyEventSource'
        at the verbose level. Notice that all of this is just 'standard' ETW. The only special
        part is the * to refer to the EventSource without it being registered.
    </p>
    <p>
        In the previous example the MyCompanyEventSource was activated IN ADDITION TO the
        standard kernel and CLR providers. This is great for monitoring fine-grained performance,
        however it is too verbose for simple monitoring. While you can use the /kernelEvents=none
        /clrEvents=none /NoRundown qualifiers to turn off the default logging there is a
        '/onlyProviders' qualifier that makes this even easier. Thus
    </p>
    <ul>
        <li>PerfView /OnlyProviders=*MyCompanyEventSource collect</li>
    </ul>
    <p>
        This will collect ONLY from the providers mentioned (in this case the MyCompanyEventSource),
        turning off all other default logging. Thus the files tend to remain very small,
        which is suitable when you only wish to see your EventSource messages.
    </p>
    <p>
        You can achieve the same effect of the /OnlyProviders qualifier in the GUI by opening
        the 'Advanced' dropdown, unchecking the '.NET Rundown' 'Kernel Base' and '.NET'
        checkboxes, and adding your EventSource specification in the 'Additional Providers'
        textbox.
    </p>
    <p>
        Just like any other ETW source, you can change the 'keywords' (groups) of events
        or the verbosity of your logging by specifying these to the /OnlyProviders qualifier.
        See the help on <a href="#AdditionalProvidersTextBox">AdditionalProviders</a> for
        more details on this syntax. One very interesting option here is to turn on the
        'stacks' option for the provider, which will log a stack trace every time your ETW
        event fires. This can then be viewed in the 'Any Stacks' view of the resulting log
        file.
    </p>
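    <p>
        As an illustration only, the command below asks for a stack on every event from the EventSource.   The ':@StacksEnabled=true'
        suffix is an assumption about the provider specification syntax; confirm it against the
        <a href="#AdditionalProvidersTextBox">AdditionalProviders</a> help before relying on it.
    </p>
    <ul>
        <li>PerfView /OnlyProviders=*MyCompanyEventSource:@StacksEnabled=true collect</li>
    </ul>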
    <p>
        Once you have collected your data, you can look at it with PerfView in the normal
        way. This almost certainly means opening the 'Events' view, selecting the events
        of interest and updating the display. If desired the events can be saved as XML
        or CSV files by using the right click context menu in the events view.
    </p>
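    <p>
        If you would rather not depend on the class name, the name can be set explicitly with the Name parameter of the
        EventSourceAttribute, as noted above.   The following is a minimal sketch; the name 'MyCompany-MyApp-Loader' is made up for illustration.
    </p>
    <pre>        using System.Diagnostics.Tracing;

        // The Name given here (not the class name) is what gets hashed to form the provider GUID.
        [EventSource(Name = "MyCompany-MyApp-Loader")]
        sealed class MyCompanyEventSource : EventSource
        {
            public static readonly MyCompanyEventSource Log = new MyCompanyEventSource();
            public void Load(long ImageBase, string Name) { WriteEvent(1, ImageBase, Name); }
        }</pre>
    <p>
        With this attribute in place the provider would be turned on using the explicit name rather than the class name
    </p>
    <ul>
        <li>PerfView /OnlyProviders=*MyCompany-MyApp-Loader collect</li>
    </ul>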
    <h3>Converting EventSource Data to XML</h3>
    <p>
        Looking at the output of an EventSource in the event viewer is great for ad-hoc
        investigations since the GUI allows quick filtering and conversion to CSV or XML
        file (right click in the EventViewer).&nbsp;&nbsp;&nbsp; However it may be that
        you want to simply parse the data with other tools that you would like to remain
        very loosely coupled to PerfView/ETW.&nbsp; For these applications all you want
        is something that takes an ETL file and converts it to an XML file, which you can
        then process using other tools.&nbsp;&nbsp; There is a PerfView command that does
        this.&nbsp;&nbsp;
    </p>
    <ul>
        <li>PerfView /logFile=convert.log.txt UserCommand DumpEventsAsXml PerfViewData.etl.zip</li>
    </ul>
    <p>
        The command above runs the &#39;UserCommand&#39; called &#39;DumpEventsAsXml&#39;
        giving it the parameter &#39;PerfViewData.etl.zip&#39;.&nbsp;&nbsp; This will create
        a file called PerfViewData.etl.xml which is an XML dump of all the ETL data in the
        original file (thus the file can get big).&nbsp;&nbsp;&nbsp; It works on any ETL
        or ETL.ZIP file however it is meant for files produced with the /OnlyProviders qualifier
        that only have EventSources turned on and thus will produce relatively little output.
    </p>
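    <p>
        As a starting point for such loosely coupled processing, here is a minimal C# sketch that loads the XML file and counts
        how many elements of each name it contains.   It deliberately assumes nothing about the element or attribute names
        that DumpEventsAsXml emits (inspect the file to see the actual layout), and the class name is made up for this example.
    </p>
    <pre>        using System;
        using System.Linq;
        using System.Xml.Linq;

        static class CountXmlElements
        {
            // Usage: CountXmlElements PerfViewData.etl.xml
            static void Main(string[] args)
            {
                // Loads the whole document into memory, so this is only suitable for modest files.
                XDocument doc = XDocument.Load(args[0]);

                // Group every element by its tag name, most frequent names first.
                var counts = doc.Descendants()
                                .GroupBy(e =&gt; e.Name.LocalName)
                                .OrderByDescending(g =&gt; g.Count());

                foreach (var entry in counts)
                    Console.WriteLine("{0,8} {1}", entry.Count(), entry.Key);
            }
        }</pre>
    <p>
        For very large dumps a streaming reader (for example XmlReader) would be a better fit than loading the whole document at once.
    </p>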
    <p>
        The attentive user will wonder what a &#39;UserCommand&#39;&nbsp; is.&nbsp; PerfView
        has &#39;built in&#39; commands, but it also has the ability to be extended with
        code that the user provides (see <a href="#PerfViewExtensions">PerfView Extensions</a>
        for more).&nbsp;&nbsp; Some of these user commands become useful enough that they
        ship with PerfView itself by default.&nbsp;&nbsp; DumpEventsAsXml is one of these
        commands.&nbsp;&nbsp; You can see all the user commands that PerfView currently
        knows about by looking at the Help -&gt; User Command Help menu option.
    </p>
    <!-- TODO FIX NOW do more -->
    <hr />
    <!--  ********************************** -->
    <h2>
        <a id="PerfViewExtensions">PerfView Extensions (Automating PerfView)</a>
    </h2>
    <p>
        PerfView has the ability to <a href="#CollectingFromCommandLine">
            collect data with command
            line commands
        </a>, which can be used to automate simple collection tasks, however
        it is also useful to automate analysis as well as collection. For this, simple command
        line options are not sufficient; you need the full power of a programming language
        to support an unbounded variety of useful data manipulations. This is what PerfView
        extensions are for. PerfView allows you to <a href="#CreatingExtensions">create an extension</a>,
        which is a .NET DLL that lives alongside PerfView.exe and defines user-defined
        commands. These commands can control PerfView's collection or analysis capabilities.
        It is very powerful and opens up a broad range of automation scenarios including
    </p>
    <ol>
        <li>
            Computing complex metrics like startup time which requires you to find the difference
            between two events (e.g. process start and first render event).
        </li>
        <li>Custom groupings and other analysis based on names in the stacks.</li>
        <li>Custom reports on Disk I/O, reference set or other metrics</li>
        <li>
            Automating not only ETW collection, but also symbol resolution, reducing
            data to a single process and saving various views as PERFVIEW.XML.ZIP files, which dramatically
            reduces the amount of data (so you can archive more of it) and speeds up use of
            that data (since symbols are resolved and file sizes are so small).
        </li>
    </ol>
    <!--  ****************  -->
    <h3>
        <a id="InvokingUserCommands">Invoking user defined commands</a>
    </h3>
    <p>
        Along with the built in command line commands like 'run', 'collect' and 'view' there
        is also a 'userCommand'. A user command is one way to activate user-defined functionality
        in PerfView. For example when you run the command
    </p>
    <ul>
        <li>PerfView UserCommand Global.DemoCommandWithDefaults arg1 arg2 arg3</li>
    </ul>
    <p>
        PerfView will look for a DLL called 'PerfViewExtensions\Global.dll' next to PerfView.exe.
        It will then look for a type called 'Commands' and create an instance of it. Then
        it looks for a method within that type called 'DemoCommandWithDefaults'. It then
        passes the rest of the parameters of the command to that method. Often the target
        method is varargs (its last argument is 'params string[]') which allows it to handle
        any number of arguments.
    </p>
    <p>
        The extension named 'Global' is special in that if the user command has no '.' in
        it, then the extension is assumed to be 'Global' extension. Thus the command above
        could be shortened to
    </p>
    <ul>
        <li>PerfView UserCommand DemoCommandWithDefaults arg1 arg2 arg3</li>
    </ul>
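    <p>
        To make the lookup described above concrete, here is a minimal sketch of what such a 'Commands' type might look like.   The 'EchoArgs'
        method is hypothetical, and deriving from CommandEnvironment and writing to its LogFile are assumptions about the PerfView object
        model; check the Commands.cs file that CreateExtensionProject generates for the exact shape.
    </p>
    <pre>        using PerfViewExtensibility;     // assumed: the namespace holding PerfView's object model

        public class Commands : CommandEnvironment
        {
            // Would be invoked by: PerfView UserCommand Global.EchoArgs arg1 arg2 arg3
            // 'params string[]' lets the command accept any number of arguments.
            public void EchoArgs(params string[] args)
            {
                // LogFile is assumed to be the TextWriter that writes to PerfView's log.
                LogFile.WriteLine("EchoArgs called with {0} argument(s)", args.Length);
                foreach (string arg in args)
                    LogFile.WriteLine("  {0}", arg);
            }
        }</pre>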
    <!--  ****************  -->
    <h3>
        <a id="InvokingUserCommandsGui">Invoking user defined command from the GUI</a>
    </h3>
    <p>
        You can also invoke user commands from the GUI by using the File -> UserCommand
        menu option (Alt-U) on the Main Viewer. This command will bring up a dialog box
        in which you can enter your command. PerfView remembers the user commands you have
        previously executed (even across invocations of the program), so typing just the
        first few characters is typically enough to select a command you have executed in
        the past. Hitting the tab key will commit the completion and hitting Enter will
        run the command. Thus in just a few keystrokes you can be executing your user defined
        commands.
    </p>
    <h3>Help on User defined commands</h3>
    <p>
        The Help -&gt; 'User Defined Commands' menu entry, as well as the 'Command Help' button
        on the user command dialog, will open a dialog that contains help on the various
        user defined commands.
    </p>
    <!--  ****************  -->
    <h3>
        <a id="CreatingExtensions">Creating a PerfView Extension (creating user commands)</a>
    </h3>
    <p>
        Before you can invoke a user defined command, you need to create an extension DLL
        which contains the command. This is what the PerfView CreateExtensionProject command
        does. Because extension DLLs are located by looking RELATIVE to PerfView.exe, the
        first step in creating your own extensions is to copy PerfView.exe to a location
        that you control. For example:
    </p>
    <ul>
        <li>xcopy \\clrmain\tools\perfview.exe .\</li>
    </ul>
    <p>
        Once you do this you can execute the command (notice we launch the LOCAL copy of
        perfview)
    </p>
    <ul>
        <li>.\PerfView CreateExtensionProject <i>ExtensionName</i></li>
    </ul>
    <p>
        This creates the PerfViewExtensions directory next to PerfView.exe, and does
        three things
    </p>
    <ol>
        <li>
            Creates a new C# project in PerfViewExtensions\<i>ExtensionName</i>Src. If <i>ExtensionName</i>
            is missing/empty, the extension name 'Global' is used.
        </li>
        <li>
            Creates/Modifies the solution file PerfViewExtensions\Extensions.sln to include the
            new project.
        </li>
        <li>Opens the PerfViewExtensions\Extensions.sln in Visual Studio 2010.</li>
    </ol>
    <p>
        Thus after running the CreateExtensionProject command you can simply open the PerfViewExtensions\Extensions.sln
        to compile, run and test your new PerfView extension. If you have VS2010 installed,
        you can be up and running in seconds.
    </p>
    <p>
        Thus probably the best way to get started is to simply:
    </p>
    <ul>
        <li>
            Run 'PerfView CreateExtensionProject'. This will create the 'Global' extension DLL and
            launch VS2010 on it.
        </li>
        <li>
            Open the 'Commands.cs' file and set a breakpoint on the first line of the 'Demonstration'
            method.
        </li>
        <li>
            Compile and run by hitting F5. You will launch PerfView and you can step through
            the example.
        </li>
        <li>
            Explore the PerfView object model (see <a href="#ExploringPerfViewObjectModel">
                section
                below
            </a>)
        </li>
        <li>Create new commands by creating new methods in the 'Commands' class.</li>
    </ul>
    <!--  ****************  -->
    <h3>
        <a id="ExploringPerfViewObjectModel">Exploring the PerfView Object Model</a>
    </h3>
    <ol>
        <li>
            INTELLISENSE IS YOUR FRIEND! Only the PerfViewExtensibility namespace is open by
            default and this is where the most important classes in PerfView's object model
            reside. This means that there is a good chance if you type some characters, you
            will find what you are looking for.
        </li>
        <li>
            CommandEnvironment is a good place to start. This is the class that defines 'global'
            methods. If you select the CommandEnvironment class name in the editor and hit F12, you can browse
            the other global methods. These methods will return other important types in the
            object model (e.g. EtlFile, Events, Stacks).
        </li>
        <li>
            Understand classes in PerfViewExtensibility first. You can use the object browser
            (Ctrl-W J) and look under the PerfView.PerfViewExtensibility namespace.
        </li>
        <li>
            Take a look at the example commands. These use many of the important features (logging,
            symbol lookup, HTML report) in context, which is quite helpful.
        </li>
    </ol>
    <p>
        Once you have familiarized yourself with the PerfView object model, you need to
        realize an important consideration
    </p>
    <ul>
        <li><b>There is no compatibility guarantee on the PerfView object model!</b></li>
    </ul>
    <p>
        What this means is that if you were to upgrade PerfView.exe to a newer version there
        is a good chance you will have to update your extension to match any changes that
        were made to PerfView since the last version. The reason for this is simple. The
        PerfView object model is really best thought of as being a 'Beta' release, because
        there simply has not been enough time to find the best API surface. Thus changes
        are inevitable, and the cost of keeping compatibility is simply not worth it. Thus
        you are free to create PerfView extensions but you must be ready to pay the porting
        cost on upgrades when you decide to create an extension.
    </p>

    <!--  ****************  -->
    <h3>
        <a id="ExtendingTheGui">Extending the GUI with User Commands</a>
    </h3>
    <p>
        User commands give you the ability to call your code to create specialized views
        of data, but it is not integrated into the GUI itself.   This section shows how
        to make your user commands become part of the normal GUI experience.   The key
        to doing this is the 'PerfViewStartup' file in the 'PerfViewExtensions' directory
        next to the PerfView.exe file.   If such a file exists, the commands in this
        file are executed at startup of PerfView.   This file is read line by line
        and can contain the following commands
    </p>
    <ul>
        <li>
            # Comments - lines that begin with # are assumed to be comments and
            are ignored.
        </li>
        <li>
            OnStartup <b>UserCommandName</b> - This causes the user command
            <b>UserCommandName</b>  to be called when PerfView starts up.  Like
            all user commands <b>UserCommandName</b>  can have the form
            <b>ExtensionDLLName.CommandName</b> that indicates the DLL where to
            find the user command and then the command name.   This command should
            take no arguments.   Note that this forces this <b>ExtensionDLLName </b>to
            be loaded on PerfView startup.  Ideally you don't use this hook or if
            you are forced to, you do as little as possible in this routine to keep
            things pay-for-play.
        </li>
        <li>
            OnFileOpen <b>Extension</b> <b>UserCommandName</b> - This causes the user command
            <b>UserCommandName</b> to be called whenever a file with the extension
            <b>Extension</b> (e.g. .etl) is opened (double clicked) in the main view pane.   Note PerfView
            automatically understands that .etl.zip files are .etl files, so specifying .etl will also
            cover .etl.zip.   This allows you to add new children to existing file formats as
            well as make PerfView recognize completely new file extensions.   The user command
            is called with the path name of the file being opened as an argument.
            If you only  need to add a new view to an existing format (e.g. adding new views to .etl files)
            it is better to use the DeclareFileView option.
        </li>
        <!--
        <li>DeclareFileView <b>Extension</b> <b>ViewName</b> <b>UserCommandName</b> - This causes
                a new view (child) to be added to all files with <b>Extension</b>.   This when files
                of this type are opened, it will have a new child called <b>ViewName</b>.  When
                a user double-clicks on that child <b>UserCommandName</b> is called.  The user command
                is called with path name of the file being opened View name.
        </li>
        -->
    </ul>
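    <p>
        As a rough illustration of the line format described above, the following Python sketch splits the text of a
        PerfViewStartup file into commands.   It is not PerfView's own parser.   The extension name MyExtension and its
        command names are made up for the example, and skipping blank lines is an assumption of the sketch.
    </p>
    <pre>
def parse_startup_file(text):
    """Split PerfViewStartup text into (command, arguments) pairs."""
    commands = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Blank lines (an assumption) and lines starting with '#' carry no command.
        if not line or line.startswith("#"):
            continue
        keyword, *args = line.split()
        commands.append((keyword, args))
    return commands

# Hypothetical file contents; MyExtension and its commands do not exist.
EXAMPLE = """\
# Add a custom viewer for .mylog files
OnFileOpen .mylog MyExtension.OpenMyLog
OnStartup MyExtension.Init
"""

for keyword, args in parse_startup_file(EXAMPLE):
    print(keyword, args)
    </pre>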
    <hr />
    <!--  ********************************** -->
    <h2><a id="ViewingLinuxData">Viewing Linux Data</a></h2>
    <p>
        Linux has a kernel level event logging system called <a href="https://en.wikipedia.org/wiki/Perf_%28Linux%29">Perf Events</a> which is
        not unlike ETW, and in particular knows how to capture CPU stacks at a periodic interval (e.g. 1 msec).   PerfView knows how to read this data,
        so it is possible to collect data using the Perf Events tool on Linux, copy the data over to a Windows machine and view it with PerfView's
        stack viewer.   Much of the rest of this section is a clone of the <a href="https://github.com/dotnet/coreclr/blob/master/Documentation/project-docs/linux-performance-tracing.md">linux-performance-tracing.md</a>
        document.   You may wish to check there as well for the latest version of these instructions.
    </p>
    <h3>Setup</h3>
    <h4>Getting perfcollect script</h4>
    <p>
        There is a BASH (shell) script that Brian Robbins wrote that will run the perf tool, resolve symbols and collect all the information
        into a ZIP file for transfer to another machine.     You can download it using either a web browser or the 'cURL' utility:
    </p>
    <ul>
        <li>curl -OL http://aka.ms/perfcollect</li>
    </ul>
    <p>
        Once downloaded, to allow it to run you have to make it executable
    </p>
    <ul>
        <li>
            chmod +x perfcollect
        </li>
    </ul>
    <p>
        If that works you should be able to do
    </p>
    <ul>
        <li>
            ./perfcollect
        </li>
    </ul>
    <p>
        And it should print out some help.
    </p>
    <h4>Installing Linux Perf tool</h4>
    <p>
        You will need the perf command as well as the LTTng package.   You can get these by running
    </p>
    <ul>
        <li>
            sudo ./perfcollect install
        </li>
    </ul>
    <p>
        Note that you need to be super-user to do this, which is why the command above uses
        the sudo command to elevate to super-user before executing the install script.
    </p>
    <h4>Collecting Data</h4>
    <p>
        If you are running a .NET Runtime application you must set an environment variable that will
        tell the runtime to emit symbol information about Just in Time (JIT) compiled methods.  Thus you
        must make sure that the following environment variable is set before running the application
    </p>
    <ul>
        <li>
            export COMPlus_PerfMapEnabled=1
        </li>
    </ul>
    <p>
        At this point you can start collection.   To do so open another command window and run the following command.
    </p>
    <ul>
        <li>
            sudo ./perfcollect collect FILENAME
        </li>
    </ul>
    <p>
        At this point you can go to the first window (where COMPlus_PerfMapEnabled was set) and start your application.
        After the application completes you can use Ctrl-C to stop the collection.   The result is a FILENAME.trace.zip file.
        This contains the trace as well as all the other files needed to resolve symbolic information.
    </p>
    <h4>Viewing data with PerfView</h4>
    <p>
        Once you have created the FILENAME.trace.zip file you can transfer it to a Windows machine and simply open it with
        PerfView.   It will open the file in a stack window of the CPU samples, and all the normal techniques of CPU
        investigation are applicable.
    </p>
    <p>
        What is going on under the hood is that PerfView opens the FILENAME.trace.zip file, locates a file within
        the archive with the suffix *.data.txt and reads it.  This file is expected to be the output of running the
        <a href="http://linux.die.net/man/1/perf-script">'Perf script'</a> command.  PerfView also knows how to read files
        with the *.data.txt suffix directly, so if you don't wish to use the 'perfcollect' script when collecting your Linux
        data, you can still easily feed the data to PerfView.  (You can also zip up your *.data.txt file into a file with the
        suffix *.trace.zip and PerfView will happily open it)
    </p>
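    <p>
        If you collected the *.data.txt file without the 'perfcollect' script, the zipping step mentioned above can be
        scripted.   The following Python sketch is one way to do it.   The function name pack_trace is made up for this
        example, and placing the file at the root of the archive is an assumption (the text above only says that
        PerfView looks for a file with the *.data.txt suffix inside the archive).
    </p>
    <pre>
import zipfile
from pathlib import Path

def pack_trace(data_txt_path, output_dir="."):
    """Zip one *.data.txt file into NAME.trace.zip and return the zip path."""
    source = Path(data_txt_path)
    if not source.name.endswith(".data.txt"):
        raise ValueError("expected a file whose name ends in .data.txt")
    base = source.name[: -len(".data.txt")]
    target = Path(output_dir) / (base + ".trace.zip")
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Store the file at the archive root under its original name.
        archive.write(source, arcname=source.name)
    return target
    </pre>
    <p>
        For example, pack_trace("myapp.data.txt") would produce myapp.trace.zip in the current directory.
    </p>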
    <hr />
    <!--  ********************************** -->
    <h2><a id="ViewingExternalData">Viewing External Data</a></h2>
    <p>
        One of the most powerful aspects of PerfView is its stack viewer.   Perhaps one of the most interesting things about
        this viewer is that it is VERY generic.     The data that is shown in this viewer is simply a set of samples where
        each sample contains
    </p>
    <ol>
        <li>An (optional) floating point value representing the time.</li>
        <li>A value (defaults to 1) representing the metric or cost of the sample.</li>
        <li>A list of names representing the stack or path in a hierarchical tree.</li>
    </ol>
    <p>
        All the rest of the magic of the stack viewer (the inclusive and exclusive cost, the timeline, filtering, and the callers
        and callees views) is just different aggregations of this data.
    </p>
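    <p>
        To make the idea of 'aggregations of samples' concrete, here is a small Python sketch that computes inclusive and
        exclusive cost per frame name from a list of samples.   It is only an illustration of the data model, not PerfView's
        implementation; the sample data and the function name aggregate are made up for the example.
    </p>
    <pre>
from collections import defaultdict

# Each sample: (time, metric, stack) with the most specific frame first.
SAMPLES = [
    (10.0, 10.0, ["HelperNested", "Helper1", "Func3", "Func", "Main"]),
    (20.0, 10.0, ["Func3", "Func", "Main"]),
    (30.0, 10.0, ["HelperX", "Helper1", "Func3", "Func", "Main"]),
    (40.0, 10.0, ["Func", "Main"]),
]

def aggregate(samples):
    """Return (inclusive, exclusive) metric totals keyed by frame name."""
    inclusive = defaultdict(float)
    exclusive = defaultdict(float)
    for _time, metric, stack in samples:
        if not stack:
            continue
        # Exclusive cost goes only to the frame where the sample landed.
        exclusive[stack[0]] += metric
        # Inclusive cost goes once to every distinct frame on the stack.
        for name in set(stack):
            inclusive[name] += metric
    return dict(inclusive), dict(exclusive)

inclusive, exclusive = aggregate(SAMPLES)
print(inclusive["Func3"], exclusive["Func3"])
    </pre>
    <p>
        Here Func3 appears on three of the four stacks but is the top frame of only one of them, so its inclusive
        cost is three times its exclusive cost.
    </p>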
    <p>
        What this means is that pretty much any hierarchical data can be usefully displayed in the stack viewer.   For example
        the size on disk view is simply taking the path of a file name to form the 'stack' and the size of the file as the
        metric to form the model of the total size on disk view.     This means that data from other profilers or any other
        place where the data forms a hierarchy can be viewed with the stack viewer.
    </p>
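    <p>
        The size on disk idea can be sketched in a few lines of Python: the components of a path become the 'stack' and
        the file size becomes the metric.   This is a simplified illustration, not the code PerfView uses; the function
        name path_to_sample and the example path are made up.
    </p>
    <pre>
from pathlib import PurePosixPath

def path_to_sample(path, size_bytes):
    """Turn a file path and size into a (metric, stack) pair."""
    parts = PurePosixPath(path).parts
    # Most specific name (the file) first, the root last.
    stack = list(reversed(parts))
    return float(size_bytes), stack

print(path_to_sample("/var/log/app/today.log", 2048))
    </pre>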
    <h3>Simple .perfView.xml Format</h3>
    <p>
        Inside the implementation of PerfView is a class called 'StackSource' that represents this list of samples with
        stacks that PerfView's viewer displays.   There is also a class called 'InternStackSource' that is designed to make
        it easy to read other formats and turn that data into a StackSource.    However PerfView also has two formats that make
        it very easy for other tools to output stacks that PerfView can simply read.   One of these formats is XML based
        and the other is JSON based, and neither of them will be surprising: they are simply the 'obvious' encoding of the
        data that the stack viewer needs in those formats.   For example, here is a sample of the .perfView.xml format:
    </p>
    <pre>
        &lt;StackSource&gt;
          &lt;Samples&gt;
           &lt;Sample Time="10" Metric="10"&gt; 
                HelperNested 
                Helper1 
                Func3 
                Func 
                Main 
           &lt;/Sample&gt;
           &lt;Sample Time="20" Metric="10"&gt; 
                Func3 
                Func 
                Main 
           &lt;/Sample&gt;
           &lt;Sample Time="30" Metric="10"&gt; 
                HelperX 
                Helper1 
                Func3 
                Func 
                Main 
           &lt;/Sample&gt;
           &lt;Sample Time="40" Metric="10"&gt; 
                Func 
                Main 
           &lt;/Sample&gt;
          &lt;/Samples&gt;
         &lt;/StackSource&gt;
    </pre>
    <p>
        You can see that the format is very straightforward.   There is a 'StackSource' element that has a member 'Samples'
        which in turn contains a list of Samples, each of which has a time and a metric (both of these are optional; time defaults
        to 0 and metric defaults to 1).   Inside each sample is a list of stack frames, one per line.   These are ordered from the
        most specific (or deepest call tree nesting) to the least specific (main program).    That is all you need to generate
        in order for PerfView to read the data.    You can try this out by simply pasting the above text into a '*.perfView.xml'
        file and then opening the file in PerfView.   PerfView will open that data in the stack viewer (Try it!)
    </p>
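    <p>
        If your data comes from another tool, a short script can emit this simple format for you.   The Python sketch below
        uses the standard xml.etree.ElementTree module, which also takes care of escaping characters such as &lt; in frame
        names.   The function name write_perfview_xml and the output file name are made up for the example, and only the
        elements and attributes shown above are used.
    </p>
    <pre>
import xml.etree.ElementTree as ET

def write_perfview_xml(samples, path):
    """Write (time, metric, frames) samples in the simple .perfView.xml layout."""
    root = ET.Element("StackSource")
    samples_node = ET.SubElement(root, "Samples")
    for time, metric, frames in samples:
        node = ET.SubElement(samples_node, "Sample",
                             Time=str(time), Metric=str(metric))
        # One frame per line, most specific frame first.
        node.text = "\n" + "\n".join(frames) + "\n"
    ET.ElementTree(root).write(path, encoding="utf-8")

write_perfview_xml(
    [(10, 10, ["HelperNested", "Helper1", "Func3", "Func", "Main"]),
     (40, 10, ["Func", "Main"])],
    "example.perfView.xml",
)
    </pre>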
    <p>
        There is a corresponding *.perfView.json format which is completely analogous to the XML format.   The basic structure
        is the same: a StackSource which has a list of Samples, each of which has a time, a metric and a list of names that represent
        the stack.   Here is an example.   Like the previous example, you can cut and paste it into a *.perfView.json file and
        open it in PerfView to see the data in the stack viewer.
    </p>
    <pre>
    {
      "StackSource" :  {
        "Samples" : [
           { "Time" : "10", "Metric": "10",
             "Stack": [
                "HelperNested",
                "Helper1",
                "Func",
                "Main" 
             ]
           },
           { "Time" : "20", "Metric": "10",
             "Stack": [
                "Func3",
                "Func",
                "Main" 
             ]
           },
           { "Time" : "30", "Metric": "10",
             "Stack": [
                "HelperX",
                "Helper1",
                "Func3",
                "Func",
                "Main" 
             ]
           },
           { "Time" : "40", "Metric": "10",
             "Stack": [
                "Func",
                "Main" 
             ]
           }
        ]
      }
    }
    </pre>
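    <p>
        The JSON form is just as easy to generate from a script.   The following Python sketch mirrors the structure of the
        example above, including writing Time and Metric as strings as that example does.   The function name
        write_perfview_json and the output file name are made up for the illustration.
    </p>
    <pre>
import json

def write_perfview_json(samples, path):
    """Write (time, metric, frames) samples in the simple .perfView.json layout."""
    document = {
        "StackSource": {
            "Samples": [
                {"Time": str(time), "Metric": str(metric), "Stack": list(frames)}
                for time, metric, frames in samples
            ]
        }
    }
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(document, handle, indent=2)

write_perfview_json(
    [(20, 10, ["Func3", "Func", "Main"]),
     (40, 10, ["Func", "Main"])],
    "example.perfView.json",
)
    </pre>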
    <h3>Advanced .perfView.xml Format</h3>
    <p>
        The simple format is nice because it is so easy to explain, but it is very inefficient.  You can see that each stack
        has to be repeated in its entirety for each sample, and most of the time the stacks are very similar to one another.
        Moreover when you read the samples into the viewer, you don't get any defaults for PerfView's grouping, folding and
        filtering options, which makes the experience less than ideal.
    </p>
    <p>
        In fact, the .perfView.xml format is richer than what has been shown so far.   You can assign
        IDs to each unique Frame of the stack and use the ID instead of the name (saving a lot of space).  Similarly you
        can assign IDs to each unique Stack (built from Frame IDs) that can be used in the samples (saving more space).
        This compression dramatically reduces the time to load the data.  Finally it is possible to specify all the defaults
        and all the options for each of the stack viewer's textboxes (e.g., the Group Pats, Fold Pats, Include Pats ... textboxes).
        In short, with a little more work when you generate your .perfView.xml file you can make the experience significantly
        nicer.
    </p>
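    <p>
        The idea of assigning IDs to unique frames and unique stacks can be sketched as follows.   This Python example only
        shows the interning technique; the class name StackInterner and the choice to represent a stack as a (frame ID,
        caller stack ID) pair are assumptions of the sketch and are not claimed to match the layout PerfView writes.
    </p>
    <pre>
class StackInterner:
    """Assign small integer IDs to unique frames and unique stacks."""

    def __init__(self):
        self.frame_ids = {}   # frame name to frame ID
        self.stack_ids = {}   # (frame ID, caller stack ID) to stack ID

    def frame_id(self, name):
        return self.frame_ids.setdefault(name, len(self.frame_ids))

    def stack_id(self, frames):
        """frames lists the most specific frame first; -1 means 'no caller'."""
        caller = -1
        # Walk from the root outward so shared prefixes reuse the same IDs.
        for name in reversed(frames):
            key = (self.frame_id(name), caller)
            caller = self.stack_ids.setdefault(key, len(self.stack_ids))
        return caller

interner = StackInterner()
first = interner.stack_id(["Func3", "Func", "Main"])
second = interner.stack_id(["HelperX", "Helper1", "Func3", "Func", "Main"])
print(first, second, len(interner.frame_ids), len(interner.stack_ids))
    </pre>
    <p>
        The second stack shares its three outermost frames with the first, so only two new stack entries are created for it.
    </p>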
    <p>
        Rather than document the specific format for these, it is easier to simply show you an example.   The PerfView
        stack viewer has a File -&gt; Save command that saves the current stack view as a .perfView.xml.zip file.
        If you unzip this file, you will see the representation of the data in this more complete, efficient
        format.   Thus you can take one of the examples above, open it, add some data to the text boxes (which remember
        their history), and then save the view.   Then you can unzip it and look at the format.   The format is completely straightforward.
    </p>
    <hr />
    <!--  ********************************** -->
    <h2><a id="WorkingWithWPA">Working with WPA (Windows Performance Analyzer)</a></h2>
    <p>
        <a href="http://msdn.microsoft.com/en-us/library/windows/hardware/hh448170.aspx">Windows Performance Analyzer</a> (WPA)
        is a tool built by the Windows team and is available at no charge as part of the Windows Assessment and Deployment Kit.   Along
        with the Windows Performance Recorder (WPR) it can be used to collect and view ETW data.    Because these tools and PerfView use the same
        data format (ETW trace log (ETL) files), it is easy to collect using one tool and view using another.   This is useful because
        WPA has very powerful ways of graphing and viewing data that PerfView does not have, and PerfView has powerful ways of
        collecting data and other views that are not present in WPA.
    </p>
    <h3>Using PerfView to collect data and WPA to view data. </h3>
    <p>
        PerfView has a number of <a href="#ProductionMonitoring">Production Monitoring</a> capabilities (e.g. /StopOnPerfCounter) that
        at present WPR does not have.   In addition the fact that PerfView is easy for anyone to download from the web and XCOPY deploy
        as a single EXE makes PerfView ideal for collecting data in the field.   In this case you can simply collect with PerfView's
        collect command (with the /threadTime option if you may be doing a wall clock investigation) and the result will be a .ETL.ZIP
        file ready for uploading.   Unfortunately, at present WPA will not open the ETL.ZIP file, but you can use the following command
    </p>

    <ul>
        <li>PerfView /wpr unzip <i>DataFileName</i></li>
    </ul>

    <p>
        which will unzip the data file as well as any NGEN PDBs and store them in a .NGENPDB folder in the way that WPR would.   Thus
        after unzipping in this way, you can run the WPA command on the data file to view the data in WPA.
    </p>

    <p>
        In the scenario above PerfView will set the ETW providers as it would normally.   However PerfView also has the ability to
        mimic the providers that WPR would turn on by default.   Thus if you wish to use PerfView to collect data and try to mimic
        WPR as much as possible, collect the data with the following command.
    </p>
    <ul>
        <li>PerfView /wpr collect </li>
    </ul>
    <p>
        This should produce data files that are very close if not identical to what WPR would produce.   In particular it does
        not produce a ZIPPed file but outputs the .ETL file and the .NGENPDB directory just as WPR would.    Like all collection
        commands, you can use the
        '/Providers' qualifier to add more providers as well as the /KernelEvents or /ClrEvents qualifiers to fine-tune the Kernel
        and .NET provider events.
    </p>
    <p>
        If you wish to generate a file as WPR would but take advantage of PerfView's ZIPPing capability you can combine the /wpr
        and /zip commands as follows.
    </p>
    <ul>
        <li>PerfView /wpr /zip collect </li>
    </ul>
    <p>
        This command will turn on the providers as WPR would, but ZIP it like PerfView would.   This is useful for remote collection.
        You can use this to collect the data, and use the PerfView /wpr unzip to unpack it at its destination for viewing with WPA.
    </p>
    <h3>Using PerfView to View data collected with WPR. </h3>
    <p>
        PerfView has a number of views and viewing capabilities that WPA does not have.   Thus it is often useful to view data in PerfView
        that was collected with WPR.    This scenario 'just works'.   PerfView already knows how to open the ETL files and it is smart enough
        to notice the NGENPDB directory for the symbolic information and use it appropriately.
    </p>
    <hr />
    <!--  ********************************** -->
    <h2><a id="CommandLineReference">Command Line Reference</a></h2>
    <p>
        Most functionality that is not intimately tied to viewing is available from the
        command line to allow for easy automation of data collection.&nbsp; At the command
        line typing
    </p>
    <ul>
        <li>PerfView /? </li>
    </ul>
    <p>
        Or&nbsp; navigating to Help-&gt;Command Line Help from the main PerfView window
        will give you more complete details.&nbsp;&nbsp;
    </p>
    <p>
        See also <a href="#PerfViewExtensions">PerfView Extensions</a> for information on
        building extensions for PerfView.
    </p>
    <!--  ****************** -->
    <h3>Using PerfView in Scripts (/LogFile qualifier)</h3>
    <p>
        By default PerfView will always bring up a GUI window when performing any operation,
        including data collection.&nbsp; It does this to allow errors to be reported back.
        For unattended automation this can be undesirable.&nbsp;&nbsp; This is what the /LogFile:<strong>FileName</strong>
        qualifier is for.&nbsp; When this qualifier is specified, instead of launching the
        GUI the command will send all output to the specified file.&nbsp;&nbsp; The intent
        is that scripts would use this qualifier to avoid the GUI.&nbsp;&nbsp; The exit
        code for PerfView will be 0 if the command was successful.&nbsp;
    </p>
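    <p>
        A script that drives PerfView this way might look like the following Python sketch.   It assumes PerfView.exe can
        be found on the PATH, the helper names build_command and run_unattended are made up, and the log and data file
        names are placeholders.   The only PerfView behavior it relies on is what is described above: /LogFile suppresses
        the GUI and an exit code of 0 means success.
    </p>
    <pre>
import subprocess

def build_command(log_file, command_words, qualifiers=(), exe="PerfView.exe"):
    """Return the argument list for a GUI-free PerfView run."""
    # Qualifiers come before the command, as in 'PerfView /wpr unzip DataFileName'.
    return [exe, "/LogFile:" + log_file, *qualifiers, *command_words]

def run_unattended(log_file, command_words, qualifiers=()):
    """Run PerfView and report whether it exited with code 0 (success)."""
    completed = subprocess.run(build_command(log_file, command_words, qualifiers))
    return completed.returncode == 0

# Only prints the command line; call run_unattended to actually launch PerfView.
print(build_command("unzip.log", ["unzip", "trace.etl.zip"], ["/wpr"]))
    </pre>
    <p>
        When run_unattended returns False, the file given to /LogFile is the place to look for the error output.
    </p>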
    <!--  ****************** -->
    <h3>
        <a id="AdvancedCollection">Advanced Data Collection</a>
    </h3>
    <p>
        PerfView data collection is based on
        <a href="http://msdn.microsoft.com/en-us/library/bb968803(v=VS.85).aspx">Event Tracing for Windows (ETW)</a>.&nbsp;&nbsp;
        This is a general facility
        for logging information in a low overhead way.&nbsp; It is used extensively throughout
        the Windows OS and in particular is used by both the Windows OS Kernel and the .NET
        CLR Runtime.&nbsp;&nbsp; By default PerfView picks a set of
        these events that have high value for the kinds of analysis PerfView can visualize.&nbsp;&nbsp;
        However PerfView can also be used as simply a data-collector, at which point it
        can be useful to turn on other events.&nbsp;&nbsp; This is what the /KernelEvents:,
        /ClrEvents: and /Providers: qualifiers do.
    </p>
    <p>
        All ETW events log the following information
    </p>
    <ol>
        <li>The time (to 100ns resolution) when the event happened </li>
        <li>
            The provider that logged the event (e.g., the Kernel, CLR or some user provider).
        </li>
        <li>The event number (which indicates how to decode the payload) </li>
        <li>
            The process and thread associated with the event (for some events there is no
            useful process or thread ID, but for most there is)
        </li>
    </ol>
    <h4>Kernel Events</h4>
    <p>
        By far, the ETW events built into the Windows Kernel are the most fundamental and
        useful.&nbsp;&nbsp; Almost any data collection will want to turn at least some of
        these on.&nbsp;&nbsp; PerfView groups the kernel events into three groups.&nbsp;
        See <a href="http://msdn.microsoft.com/en-us/library/aa363784(v=VS.85).aspx">
            Kernel
            ETW Events
        </a>
    </p>
    <h5>The Default Kernel Group</h5>
    <p>
        The default group is the group that PerfView turns on by default.&nbsp;&nbsp; The
        most verbose of these events is the &#39;Profile&#39; event, which triggers a stack
        trace every millisecond for each CPU on the machine (so you know what your CPU is
        doing).&nbsp;&nbsp;&nbsp; Thus on a 4 processor machine you will get 4000 samples
        (with stack traces) every second of trace time.&nbsp;&nbsp; This can add up.&nbsp;&nbsp;
        Assume you will get at least 1 Meg of file size per second of trace.&nbsp;&nbsp;
        If you need to run very long traces (100s of seconds), you should strongly consider
        using the circular buffer mode to keep the logs under control.&nbsp;&nbsp; Here
        are the events you get under the default group:
    </p>
    <ol>
        <li>
            Default = DiskIO | DiskFileIO | DiskIOInit | ImageLoad | MemoryHardFaults | NetworkTCPIP
            | Process | ProcessCounters | Profile | Thread
        </li>
        <li>
            DiskIO - Fires every time a physical disk read is COMPLETE, indicates the size,
            and how long the operation took.&nbsp; No stack trace.
        </li>
        <li>
            DiskIOInit - Fires each time Disk I/O operation begins (where DiskIO fires when
            it ends).&nbsp; Unlike DiskIO this logs a stack trace.&nbsp;
        </li>
        <li>
            DiskFileIO - Logs the mapping between OS file object handles and the name of the
            file.&nbsp; Without this many kernel events are not useful because you can&#39;t
            relate the operation to a meaningful name.&nbsp;&nbsp;&nbsp; You almost always want
            this event.&nbsp; No stack trace.
        </li>
        <li>
            ImageLoad - Fires when a DLL or EXE is loaded into memory for execution (LoadLibraryEx
            is called).&nbsp; Needed if you want to map memory addresses back to symbolic names.&nbsp;
            Logs a stack trace.&nbsp;
        </li>
        <li>
            MemoryHardFaults - Fires when the OS had to cause a physical disk read in response
            to mapping virtual memory.&nbsp;&nbsp; Logs a stack trace.
        </li>
        <li>
            NetworkTCPIP - Fires when TCP&nbsp; or UDP packets are sent or received.&nbsp;&nbsp;
            Logs the two end points and the size.&nbsp; No stack trace.
        </li>
        <li>
            Process - Fires when a process is created or destroyed.&nbsp; Indicates the command
            line (on start) or exit code (on end).&nbsp; Logs a stack trace.
        </li>
        <li>
            ProcessCounters - Logs process memory statistics before a process dies or the trace
            ends. &nbsp; No stack trace.
        </li>
        <li>
            Profile&nbsp; - Fires every 1 msec per processor and indicates where the instruction
            pointer currently is and takes a stack trace.
        </li>
        <li>
            Thread - Fires every time a thread is created or destroyed.&nbsp;&nbsp; Logs a stack
            trace.&nbsp;
        </li>
    </ol>
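    <p>
        The sampling arithmetic above is easy to turn into a quick estimate before starting a long trace.   The following
        Python sketch simply encodes the two rules of thumb given in this section (one Profile sample per millisecond per
        processor, and at least 1 Meg of file per second); the function names are made up and the size figure is a lower
        bound, not a measurement.
    </p>
    <pre>
def profile_samples(cpu_count, seconds, samples_per_cpu_per_sec=1000):
    """Profile events expected: one per millisecond per processor."""
    return cpu_count * seconds * samples_per_cpu_per_sec

def min_trace_megabytes(seconds, megabytes_per_sec=1.0):
    """Lower-bound size estimate using the 'at least 1 Meg per second' rule."""
    return seconds * megabytes_per_sec

print(profile_samples(4, 1))       # 4 processors for 1 second
print(min_trace_megabytes(300))    # a 5 minute trace
    </pre>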
    <p>
        The following Kernel events are not on by default because they can be relatively
        verbose or are for more specialized performance investigations.&nbsp;
    </p>
    <ol>
        <li>
            ThreadTime = Default | ContextSwitch | Dispatcher - This is the most common
            of the verbose options.  It adds the ContextSwitch and Dispatcher events to all the default events.  This option is
            needed if you want to use the 'Thread Time' view in PerfView.
        </li>
        <li>
            Verbose = Default | ContextSwitch | DiskIOInit | Dispatcher | FileIO | FileIOInit
            | MemoryPageFaults | Registry | VirtualAlloc
        </li>
        <li>
            ContextSwitch - Fires each time the OS stops running one thread and switches to another.&nbsp; It indicates
            the thread losing the processor and the thread getting it.&nbsp; This event can fire &gt; 10K times per second
            depending on scenario, but can be VERY useful for determining why some process is
            waiting.&nbsp; Logs a stack trace.
        </li>
        <li>
            Dispatcher - (Also known as ReadyThread) Fires when a thread goes from waiting to
            ready (note that the thread may not actually run if there is no CPU available).&nbsp;
            This can also fire &gt; 10K / sec, but is very useful in understanding why waits
            are happening.&nbsp;
        </li>
        <li>
            FileIO - Fires when a file operation completes (even if the operation does not cause
            a disk read because the data was in the file system cache).&nbsp; Does not log a stack
            trace.&nbsp;
        </li>
        <li>
            FileIOInit - Fires when a file operation starts.&nbsp; Unlike FileIO this will log
            a stack trace.&nbsp;
        </li>
        <li>
            MemoryPageFaults - Fires when a virtual memory page is made accessible (backed by
            physical memory).&nbsp;&nbsp; This fires not only when the page needed to be fetched
            from disk, but also if it was already in the file system cache, or only needed to
            be zeroed.&nbsp;&nbsp;&nbsp; Logs a stack trace.
        </li>
        <li>
            Registry - Fires when a registry operation occurs.&nbsp;&nbsp; Logs a stack trace.
        </li>
        <li>
            VirtualAlloc - Fires when a virtual memory allocation or free operation occurs.&nbsp;
            All memory in a process either was mapped or was allocated through Virtual Alloc
            operations.
        </li>
    </ol>
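    <p>
        The group definitions above are combinations of individual event flags, which can be modeled directly with Python's
        enum.Flag.   This sketch is only a way to reason about which events a group contains: the class name KernelEvents is
        made up, and the numeric values produced by auto() are placeholders, not the real ETW keyword bits.
    </p>
    <pre>
from enum import Flag, auto

class KernelEvents(Flag):
    # Placeholder bit values; only the group membership is meaningful here.
    DiskIO = auto()
    DiskFileIO = auto()
    DiskIOInit = auto()
    ImageLoad = auto()
    MemoryHardFaults = auto()
    NetworkTCPIP = auto()
    Process = auto()
    ProcessCounters = auto()
    Profile = auto()
    Thread = auto()
    ContextSwitch = auto()
    Dispatcher = auto()
    FileIO = auto()
    FileIOInit = auto()
    MemoryPageFaults = auto()
    Registry = auto()
    VirtualAlloc = auto()
    # Groups, copied from the definitions in this section.
    Default = (DiskIO | DiskFileIO | DiskIOInit | ImageLoad | MemoryHardFaults
               | NetworkTCPIP | Process | ProcessCounters | Profile | Thread)
    ThreadTime = Default | ContextSwitch | Dispatcher
    Verbose = (Default | ContextSwitch | DiskIOInit | Dispatcher | FileIO
               | FileIOInit | MemoryPageFaults | Registry | VirtualAlloc)

print(KernelEvents.Profile in KernelEvents.Default)
print(KernelEvents.ContextSwitch in KernelEvents.Default)
print(KernelEvents.ThreadTime in KernelEvents.Verbose)
    </pre>
    <p>
        The last line checks that every event in ThreadTime is also part of Verbose, which follows from the definitions.
    </p>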
    <p>
        The final set of kernel events are typically useful for people writing device drivers
        or trying to understand why hardware or low level OS software is misbehaving&nbsp;
    </p>
    <ol>
        <li>
            OS = AdvancedLocalProcedureCalls | DeferedProcedureCalls | Driver | Interrupt
        </li>
        <li>
            AdvancedLocalProcedureCalls - Logged when an OS machine local procedure call is
            made.
        </li>
        <li>DeferedProcedureCalls - Logged when an OS Deferred procedure call is made </li>
        <li>SplitIO - Logged when a disk I/O had to be split into pieces </li>
        <li>Driver - Logged when various hardware driver events occur. </li>
        <li>Interrupt - Logged when a hardware interrupt occurs. </li>
    </ol>
    <h4>CLR Events</h4>
    <p>
        In addition to the kernel events, if you are running .NET Runtime code you are likely
        to want to also have the CLR ETW events turned on.&nbsp;&nbsp;&nbsp; PerfView turns
        a number of these on by default.&nbsp;&nbsp; See&nbsp; <a href="http://msdn.microsoft.com/en-us/library/dd264810.aspx">CLR ETW Events</a>
        for more information on these events.
    </p>
    <ol>
        <li>
            Default = GC | Type | GCHeapSurvivalAndMovement | Binder | Loader | Jit | NGen | SupressNGen
            | StopEnumeration | Security | AppDomainResourceManagement | Exception | Threading | Contention | Stack | JittedMethodILToNativeMap
            | ThreadTransfer
        </li>
        <li>GC - Fires when GC starts and stops </li>
        <li>Binder - Currently only useful for CLR team.</li>
        <li>Loader -Fires when assemblies are loaded or unloaded </li>
        <li>Jit - Fires when methods are Just in Time (JIT) compiled. </li>
        <li>NGen - Fires when operations associated with precompiled NGEN images happen </li>
        <li>Security - Fires on various security checks </li>
        <li>
            AppDomainResourceManagement - Fires when certain appdomain resource management events
            occur.
        </li>
        <li>Contention - Fires when managed locks cause a thread to sleep. </li>
        <li>Exception - Fires when a managed exception happens. </li>
        <li>Threading - Fires on various System.Threading.ThreadPool operations </li>
        <li>
            StopEnumeration - Dumps symbolic information as late as possible (typically at
            process stop). This is the default.
        </li>
        <li>
            StartEnumeration - Dumps symbolic information as early as possible (not recommended)
        </li>
        <li>
            JitTracing - Verbose information on Just in time compilation (why things were inlined
            ...)
        </li>
        <li>
            Interop - Verbose information on the generation of Native Interoperations code.&nbsp;
        </li>
        <li>Stack - Turn on stack traces for various CLR events.&nbsp; </li>
    </ol>
    <h4>
        <a id="ASPNetEvents">ASP.NET Events</a>
    </h4>
    <p>
        ASP.NET has a set of events that are sent when each request is processed.&nbsp;&nbsp;
        PerfView has a special view that you can open when ASP.NET events are turned on.&nbsp;&nbsp;
        By default PerfView turns on ASP.NET events, however, you must also have selected
        the &#39;Tracing&#39; option when ASP.NET was installed for these events to work.&nbsp;
        Thus if you are not seeing ASP.NET events even though you are running an ASP.NET scenario, this
        is one likely reason why you are not getting data.
    </p>
    <p>
        To turn on ASP.NET Tracing
    </p>
    <p>
        The easiest way to turn on tracing is with the DISM tool that comes with the operating system.&nbsp;&nbsp;
        Run the following command from an elevated command prompt
    </p>
    <ul>
        <li>DISM /online /Enable-Feature /FeatureName:IIS-HttpTracing </li>
    </ul>
    <p>
        Note that this command will restart the web service (so that it takes effect), which may cause complications
        if your ASP.NET service handles long (many second) requests.  This will either force DISM to delay (for a reboot) or
        abort the outstanding requests.   Thus you may wish to schedule this with other server maintenance.   Once this
        configuration is done on a particular machine, it persists.
    </p>
    <p>
        You can also do this configuration by hand using a GUI interface.&nbsp; You first need to get to the dialog for
        configuring windows software.&nbsp; This
        differs depending on whether you are on a Client or Server version of the operating
        system.
    </p>
    <ul>
        <li>
            On Client - Start -&gt; Control Panel -&gt; Programs -&gt; Programs and Features
            -&gt;&nbsp; Turn Windows features on or off
            <ul>
                <li>
                    -&gt;&nbsp; Internet Information Services -&gt; World Wide Web Services -&gt; Health
                    and Diagnostics -&gt; Tracing
                </li>
            </ul>
        </li>
        <li>
            On Server - Start -&gt; Computer -&gt; Right Click -&gt; Manage Roles -&gt; Web
            Server (IIS) -&gt; Role Services&nbsp;&nbsp;
            <ul>
                <li>Add Role Services Health and Diagnostics -&gt; Tracing</li>
            </ul>
        </li>
    </ul>
    <hr />
    <!--  ****************** -->
    <h3>
        <a id="SymbolResolution">Symbol Resolution</a>
    </h3>
    <p>
        See also <a href="#SourceCodeLookup">Source Code Lookup.</a>
    </p>
    <p>
        At collection time, when a CPU sample or a stack trace is taken, it is represented
        by an address in memory.&nbsp;&nbsp;&nbsp; This memory address needs to be converted
        to symbolic form to be useful for analysis.&nbsp;&nbsp; This happens in two steps.&nbsp;
    </p>
    <ol>
        <li>
            &nbsp;First determine if the code belongs to a particular DLL (module) or not.&nbsp;
        </li>
        <li>Given the DLL, look up detailed symbolic information </li>
    </ol>
    <p>
        If the first step fails (uncommon), then the address is given the symbolic name
        ?!? (unknown module and method).&nbsp;&nbsp; However if the second step fails (more
        common) then you can at least know the module and the address is given the symbolic
        name <strong>module</strong>!?.
    </p>
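    <p>
        The two steps and the resulting names can be sketched in Python as follows.   This is a simplified model for
        illustration only and not PerfView's resolver: the class ModuleMap, the function symbolic_name, the module names and
        the addresses are all made up, and step 2 here only matches exact method offsets where a real symbol lookup would
        also handle addresses inside a method body.
    </p>
    <pre>
import bisect

class ModuleMap:
    """Step 1: map an address to the loaded module that contains it."""

    def __init__(self, modules):
        # modules: iterable of (base_address, size, name); ranges must not overlap.
        self.modules = sorted(modules)
        self.bases = [base for base, _size, _name in self.modules]

    def find(self, address):
        index = bisect.bisect_right(self.bases, address) - 1
        if index == -1:
            return None
        base, size, name = self.modules[index]
        return (base, name) if address - base in range(size) else None

def symbolic_name(address, module_map, symbols):
    """Return 'module!method', 'module!?' or '?!?' for an address."""
    found = module_map.find(address)
    if found is None:
        return "?!?"    # step 1 failed: unknown module and method
    base, module = found
    # Step 2: symbols maps module name to {offset: method name}.
    method = symbols.get(module, {}).get(address - base)
    return module + "!" + (method if method else "?")

modules = ModuleMap([(0x1000, 0x1000, "app"), (0x4000, 0x2000, "ntdll")])
symbols = {"app": {0x10: "Main"}}
print(symbolic_name(0x1010, modules, symbols))   # app!Main
print(symbolic_name(0x4100, modules, symbols))   # ntdll!?
print(symbolic_name(0x9000, modules, symbols))   # ?!?
    </pre>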
    <!--  ******** -->
    <h4>
        <a id="UnknownMethods">?!? Methods</a>
    </h4>
    <p>
        Code that does not belong to any DLL must have been dynamically generated.&nbsp;&nbsp;
        If this code was generated by the .NET Runtime by compiling a .NET Method, it should&nbsp;
        have been decoded by PerfView.&nbsp;&nbsp; However if you specified the /NoRundown
        qualifier or the log file is otherwise incomplete, it is possible that the information necessary
        to decode the address has been lost.&nbsp;&nbsp;&nbsp; More commonly, however, there
        are a number of &#39;anonymous&#39; helper methods that are generated by the runtime,
        and since these have no name, there is not much to do except leave them as ?!?.&nbsp;&nbsp;&nbsp;
        These helpers typically are uninteresting (they don&#39;t have much exclusive time),
        and can be folded into their caller during analysis (add ?!? to the <a href="#FoldPatsTextBox">FoldPats textbox</a>).&nbsp;
        They typically happen at the boundary of managed
        and unmanaged code.&nbsp;
    </p>
    <!--  ******** -->
    <h4>module!? Methods</h4>
    <p>
        Code that was not generated at runtime is always part of the body of a DLL, and
        thus the DLL name can always be determined.&nbsp;&nbsp;&nbsp;Precompiled managed
        code lives in (NGEN) images which have <strong>.ni</strong> in their name, and
        the information should be in the ETL file PerfView collected.&nbsp; &nbsp; If you
        see unknown function names in modules that have <strong>.ni</strong> in their name,
        it implies that something went wrong with CLR rundown (see <a href="#UnknownMethods">?!? methods</a>).&nbsp;
        For unmanaged code (modules that do not have <strong>.ni</strong> in their name)
        the addresses need to be looked up in the symbolic information associated with that
        DLL.&nbsp;&nbsp; This symbolic information is stored in program database files (PDBs),
        and resolving it for a large trace can be fairly expensive (10s of seconds or more).&nbsp;&nbsp;
        Because of this PerfView by default does not resolve any unmanaged symbols.&nbsp;
    </p>
    <p>
        Instead it waits until you as the user request more symbolic information.&nbsp;
        Typically this is done in the stack viewer by right clicking on a cell with a <strong>module</strong>!?
        name in it and selecting &#39;Lookup Symbols&#39;.&nbsp; This
        indicates that PerfView should search for the PDB file and resolve any&nbsp; names
        that it can in <strong>module</strong>.&nbsp; Problems finding the correct PDB are
        not uncommon, so this is not guaranteed to succeed, and can take a few seconds to
        complete.&nbsp;&nbsp; See the log file if &#39;Lookup Symbols&#39; fails.&nbsp;
    </p>
    <p>
        In general PerfView supports executing a command on multiple cells.&nbsp; This can
        be handy for symbol resolution.&nbsp; For example if there are several unresolved
        modules that look interesting to you (because they have high CPU usage), you can
        select them all (by dragging or shift-clicking) and then select &#39;Lookup Symbols&#39;.&nbsp;&nbsp;
    </p>
    <p>
        It is possible to &#39;prefetch&#39; symbols from the command line.&nbsp;&nbsp;
        You do this by specifying the /SymbolsForDlls:<strong>dll1</strong>,<strong>dll2</strong>
        ...&nbsp; option when launching PerfView.&nbsp;&nbsp; The DLL names in the list passed to /SymbolsForDlls
        must NOT include the file name extension or path.&nbsp;
    </p>
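    <p>
        As a small sketch of the naming rule above, the following helper turns a list of DLL
        paths into a /SymbolsForDlls value by dropping the directory and the extension. The
        helper name <code>symbols_for_dlls_arg</code> is hypothetical and is not part of PerfView.
    </p>

```python
import ntpath  # Windows path rules, usable on any platform


def symbols_for_dlls_arg(dll_paths):
    """Build a /SymbolsForDlls value: names only, no directory and no extension."""
    names = []
    for path in dll_paths:
        file_name = ntpath.basename(path)      # drop the directory
        stem, _ = ntpath.splitext(file_name)   # drop the file name extension
        if stem and stem not in names:         # skip empty names and duplicates
            names.append(stem)
    return "/SymbolsForDlls:" + ",".join(names)
```

    <p>
        For example, passing <code>C:\Windows\System32\ntdll.dll</code> and <code>kernel32.dll</code>
        yields <code>/SymbolsForDlls:ntdll,kernel32</code>.
    </p>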
    <!--  ******** -->
    <h4>Default Symbol Path</h4>
    <p>
        By far, the most common unmanaged DLLs of interest are the DLLs that Microsoft ships
        as part of the operating system.&nbsp;&nbsp;&nbsp; Thus if you don&#39;t specify
        a _NT_SYMBOL_PATH, PerfView uses the following &#39;standard&#39; one:
    </p>
    <ul>
        <li>
            _NT_SYMBOL_PATH=SRV*%TEMP%\SymbolCache*https://msdl.microsoft.com/download/symbols
        </li>
    </ul>
    <p>
        This says to look up PDBs at the standard Microsoft PDB server https://msdl.microsoft.com/download/symbols
        and cache them locally in %TEMP%\SymbolCache.&nbsp;&nbsp; Thus by default you can always
        find the PDBs for standard Microsoft DLLs.&nbsp;
    </p>
    <p>
        However if you are interested in symbols for DLLs that Microsoft does not publish
        (e.g. your own unmanaged code), you must supply a _NT_SYMBOL_PATH before launching
        PerfView that specifies where to look.
    </p>
    <!--  ******** -->
    <h4>
        <a id="SymbolPathTextBox">Setting _NT_SYMBOL_PATH in the GUI</a>
    </h4>
    <p>
        If you need to change the symbol path, you can either set the _NT_SYMBOL_PATH environment
        variable before you launch PerfView, or you can use the File -&gt; SetSymbolPath
        menu option on the stack viewer window.&nbsp;&nbsp; This command will bring up a simple
        dialog box showing the current value of the _NT_SYMBOL_PATH variable and allow you
        to change it.&nbsp;&nbsp; The _NT_SYMBOL_PATH is a semicolon delimited list of places
        to look for symbols. Each such entry can be either:
    </p>
    <ol>
        <li>
            A simple file system path. These can be relative, but absolute paths
            are recommended
        </li>
        <li>
            Syntax of the form SRV*<strong><em>localPath</em></strong>*<strong><em>symbolServer</em></strong>,
            where <strong><em>localPath</em></strong> is optional and specifies a location on
            your local machine to cache files fetched from the symbol server.&nbsp;&nbsp; Using
            this is always recommended and PerfView will add it for you (using %TEMP%\SymbolCache)
            if you don&#39;t enter it.&nbsp;&nbsp;&nbsp; <strong><em>symbolServer</em></strong>
            is the name of the symbol server.&nbsp; It is either a UNC file name (e.g. \\MySymbols\symbols)
            or a URL (e.g. https://msdl.microsoft.com/download/symbols).
        </li>
    </ol>
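    <p>
        The entry syntax described above can be illustrated with a short parsing sketch. It is
        not PerfView's own parser: the function name <code>parse_symbol_path</code> and the
        dictionary layout are hypothetical, and treating <code>SRV</code> case-insensitively is
        an assumption.
    </p>

```python
def parse_symbol_path(symbol_path):
    """Split a _NT_SYMBOL_PATH value into directory and symbol server entries."""
    entries = []
    for raw in symbol_path.split(";"):          # entries are semicolon delimited
        entry = raw.strip()
        if not entry:
            continue
        parts = entry.split("*")
        if parts[0].upper() == "SRV" and len(parts) >= 2:
            if len(parts) == 2:
                # SRV*symbolServer: no local cache given
                cache, server = None, parts[1]
            else:
                # SRV*localPath*symbolServer (an empty localPath counts as missing)
                cache, server = parts[1] or None, parts[2]
            entries.append({"kind": "server", "cache": cache, "server": server})
        else:
            # Anything else is treated as a simple file system path
            entries.append({"kind": "directory", "path": entry})
    return entries
```

    <p>
        An entry whose <code>cache</code> is <code>None</code> corresponds to the case where
        PerfView adds a local cache for you.
    </p>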
    <p>
        Typically if you don't get unmanaged symbols when you select &#39;Lookup Symbols&#39;,
        check the log and, if necessary, add new paths to the symbol path.&nbsp;&nbsp;
        See also <a href="#SymbolResolution">symbol resolution</a>.
    </p>
    <p>
        PerfView supports Azure DevOps symbol servers and will automatically authenticate, either using
        local development credentials (Visual Studio or VS Code) or by prompting you to sign in.
        The symbol path entry takes one of the following forms:
        <ul>
            <li>
                <code>
                    SRV*localPath*https://<strong>yourorg</strong>.artifacts.visualstudio.com/_apis/Symbol/symsrv
                </code>
            </li>
            <li>
                <code>
                    SRV*localPath*https://artifacts.dev.azure.com/<strong>yourorg</strong>/_apis/symbol/symsrv
                </code>
            </li>
        </ul>
    </p>
    <!--  ******** -->
    <h4>Summary</h4>
    <p>
        Thus typically all you need to get good symbols is:
    </p>
    <ol>
        <li>
            <p style="margin-left: 40px">
                If you are investigating performance problems of unmanaged DLLs or EXEs that did
                not come from Microsoft (e.g. you built them yourself), you have to set the _NT_SYMBOL_PATH
                to include the location of these PDBs before launching PerfView.
            </p>
        </li>
        <li>
            <p style="margin-left: 40px">
                Select cells that have !? in them in the viewer, right click and select &#39;Lookup
                Symbols&#39;
            </p>
        </li>
    </ol>
    <!--  ****************** -->
    <h3>
        <a id="SourceCodeLookup">Source Code Lookup</a>
    </h3>
    <p>
        One very useful feature that is easy to miss is PerfView's source code support.
        This support is activated by selecting a name in the stack viewer and typing Alt-D
        (D for definition), or right clicking and selecting 'Goto Source'. This will bring
        up the source code for that name in a text editor, where every line has been annotated
        with the metric for that line. This feature is indispensable for doing analysis within
        a method, and is also generally useful for understanding what the code is doing.
    </p>
    <p>
        Source code support is a relatively fragile mechanism because in addition to having
        all the information to symbolically look up method names (PDBs) PerfView also needs
        line level information as well as access to the source code itself. It is easy for
        these extra conditions to break which will break the feature. However source code
        support is typically so useful that it is worth the trouble to get things working.
    </p>
    <p>
        In order for source code to work you need the following:
    </p>
    <ol>
        <li>
            The code must support line level symbolic information. This includes
            <ul>
                <li>Unmanaged code (e.g. C++)</li>
                <li>
                    Managed code using the .NET V4.5 Runtime. V4.5 is an in-place update to the V4.0
                    .NET Runtime, which Windows Update should install by 12/2012 (it is also the default
                    for Windows 8). However if you are running an application built for V3.5, source
                    code will not work unless you set a configuration file for the app to force it to
                    use the V4.5 runtime.
                </li>
            </ul>
        </li>
        <li>
            PerfView must be able to find the source code. This can be accomplished in a number
            of ways.
            <ul>
                <li>
                    If the code was built on the machine where the profile was collected, then things
                    should 'just work'. The EXE or DLL will contain the path to the symbol file (PDB)
                    and this will be correct, and the source code paths in the symbol file will also
                    be correct.
                </li>
                <li>
                    If the code was built with 'Source Server' support and you have access to the TFS
                    or Source Depot (SD) source code repository, then again source code should 'just
                    work'. This is a common case for users within Microsoft itself because both DevDiv
                    (which makes Visual Studio and the .NET Runtime) and the operating system group build
                    their code with source server support. In this case the PDB symbol file has embedded
                    within it the exact version information needed to find exactly the right version
                    of the source in the source code control system.
                </li>
                <li>
                    If the code was built with <a href="http://aka.ms/sourcelink">Source Link</a> support
                    then PerfView will attempt to download the source file from the linked repository.
                    For public repos on GitHub.com, for example, this should just work. For private
                    GitHub repos and Azure DevOps repos, you may be prompted for authorization.
                    See <a href="#AuthenticationOptions">Authentication Options</a> for the different
                    ways PerfView can authenticate to private repositories and symbol stores.
                    If you need to supply credentials, but take too long to sign in, the source look-up
                    will time out. In that case, just retry the 'Goto Source' command and it should
                    succeed.
                </li>
                <li>
                    You have set the _NT_SOURCE_PATH environment variable to be a semicolon-delimited list of
                    places to look to find the source code. Each such element in this list is a 'base'
                    that PerfView will search by appending suffixes of the full build-time path of the
                    source file.
                </li>
            </ul>
        </li>
    </ol>
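    <p>
        The _NT_SOURCE_PATH search described in the last item above can be sketched as follows.
        This only illustrates the idea of appending suffixes of the build-time path to each base.
        The function name <code>candidate_source_paths</code> is hypothetical, and trying the
        longest suffix first is an assumption, not a documented PerfView behavior.
    </p>

```python
import ntpath  # Windows path rules, usable on any platform


def candidate_source_paths(source_path, build_time_path):
    """List the file locations to probe for one source file."""
    # Break the build-time path into components, ignoring the drive letter.
    _, tail = ntpath.splitdrive(build_time_path)
    parts = [p for p in tail.replace("/", "\\").split("\\") if p]

    candidates = []
    for base in source_path.split(";"):         # each element is a 'base'
        base = base.strip()
        if not base:
            continue
        # Append every suffix of the build-time path, longest suffix first.
        for start in range(len(parts)):
            candidates.append(ntpath.join(base, *parts[start:]))
    return candidates
```

    <p>
        For a file built at <code>C:\build\proj\lib\util.cs</code> and a source path of
        <code>D:\src</code>, the candidates run from <code>D:\src\build\proj\lib\util.cs</code>
        down to <code>D:\src\util.cs</code>.
    </p>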
    <p>
        PerfView gives detailed messages in PerfView's log of the steps it took to find
        the source code. Thus if there is any issue with looking up source code this log
        is the place to start.
    </p>
    <!--  ******** -->
    <h4>
        <a id="SourcePathTextBox">Setting _NT_SOURCE_PATH in the GUI</a>
    </h4>
    <p>
        Often you don't need to set the _NT_SOURCE_PATH variable because by default PerfView
        will search both the original build time location (which will work if you build
        on the same machine you run) as well as the source server specified in the PDB symbol
        file (which works if the code was indexed with the source server). However in other
        cases you must set the _NT_SOURCE_PATH. Just like the case of _NT_SYMBOL_PATH, you
        can set this variable in the GUI by going to the File -&gt; 'Set Source Path' menu
        entry of the stack viewer. This value is persisted across different invocations
        of the PerfView program.
    </p>
    <p>
        See also <a href="#SourceCodeLookup">Source Code Lookup.</a>
    </p>
    <h3>
        <a id="AuthenticationOptions">Authenticating to Azure DevOps symbol servers and private source repositories.</a>
    </h3>
    <p>
        If your symbols are on an Azure DevOps artifacts store, or your source code is not public,
        then PerfView may prompt you to sign in. Support currently exists for Azure DevOps and private 
        GitHub repositories. If it is installed, PerfView will try to use the <a href="https://github.com/GitCredentialManager">Git Credential Manager</a>,
        which is typically installed with Git For Windows. If Git Credential Manager is not installed,
        PerfView will fall back to alternate authentication mechanisms. The authentication mechanisms
        can be configured on the Authentication submenu on the Options menu in the main PerfView window.
        The authentication options are described below.
        <ul>
            <li>
                <strong>Git Credential Manager</strong>. This is the most flexible option for developers
                using Git. It works alongside your Git installation to sign into private repositories.
                Support is currently enabled for Azure DevOps and GitHub. We hope to add GitLab and BitBucket
                support in the future. 
                PerfView will search for the Git Credential Manager executable (git-credential-manager-core.exe)
                in a number of well-known locations, but if it can't be found, then the option will be
                unavailable. If you have installed Git Credential Manager in a non-standard location, you can
                set the value of the GCM_CORE_PATH environment variable to the full path prior to launching PerfView.
                However, please note that, for security reasons, the GCM_CORE_PATH environment variable is ignored when PerfView is running elevated.
            </li>
            <li>
                <strong>Developer identity for Azure DevOps</strong>. If you're having trouble with Git
                Credential Manager (sometimes, it can prompt several times for the same credentials), then this
                might be a better option. It uses the same mechanism that's used in the Azure SDK for .NET when
                providing developer credentials to Azure resources. Visual Studio or VS Code must be installed
                and, while they don't have to be running at the same time as PerfView, you must have signed
                into Visual Studio or VS Code using credentials that can access your Azure DevOps repo.
                If you sign into Visual Studio with several different accounts, you may need to select the
                right one in Tools/Options/Azure Service Authentication.
                See the <a href="https://devblogs.microsoft.com/azure-sdk/authentication-and-the-azure-sdk/">Authentication and the Azure SDK</a> 
                blog posting for more information.
            </li>
            <li>
                <strong>Device Code Flow for GitHub</strong>. This option, for GitHub only, uses a Device Code
                to grant PerfView access to GitHub private repositories. 
                PerfView will prompt you with an 8 digit device code which you use to log into GitHub.com using
                any web browser. The browser could be running on a different device, if necessary.
                When you enter the code into the browser and approve the app, the dialog will automatically close
                and PerfView will be able to access the same private repositories that your account is allowed
                to access.
            </li>
            <li>
                <strong>Basic HTTP Authentication</strong>. This option allows you to use Basic HTTP authentication
                when connecting to a symbol server. To use it, specify the username and password in the URL for your symbol server. For example:
                <i>SRV*SymbolCachePath*https://username:password@symbolstore.url;</i>. This scheme is active by default but
                is used only if the URL contains <i>username</i> and <i>password</i> information.
            </li>
        </ul>
    </p>
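    <p>
        To make the Basic HTTP Authentication rule concrete, the sketch below checks whether a
        symbol path entry carries a username and password in its URL. It is an illustration only:
        the helper name <code>basic_auth_credentials</code> and the host
        <code>symbols.example.com</code> are hypothetical, and this is not how PerfView itself
        inspects the entry.
    </p>

```python
from urllib.parse import urlsplit


def basic_auth_credentials(symbol_path_entry):
    """Return (username, password) if the entry's URL embeds both, else None."""
    # The server URL is the last '*'-separated field of an SRV entry.
    url = symbol_path_entry.rstrip(";").split("*")[-1]
    parts = urlsplit(url)
    if parts.username and parts.password:
        return parts.username, parts.password
    return None
```

    <p>
        An entry such as <code>SRV*C:\cache*https://user:secret@symbols.example.com/store;</code>
        yields the pair, while the default Microsoft symbol server entry yields nothing, so Basic
        authentication would not apply to it.
    </p>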
    <hr />
    <!--  ****************** -->
    <h3>
        <strong><a id="BrokenStacks">&#39;BROKEN&#39; Stack Frame in Trace. </a>&nbsp;</strong>
    </h3>
    <p>
        When a sample is taken, the ETW system attempts to take a stack trace.&nbsp;&nbsp;&nbsp;
        For a variety of reasons it is possible that this will fail before a complete stack
        is taken.&nbsp;&nbsp;&nbsp; PerfView uses the heuristic that all stacks should end
        in a frame in a particular OS DLL (ntdll) which is responsible for creating threads.&nbsp;&nbsp;
        If a stack does not end there, PerfView assumes that it is broken, and injects a
        pseudo-node called &#39;BROKEN&#39; between the thread and the part of the stack
        that was fetched (at the very least it will have the address of where the sample
        was taken).&nbsp;&nbsp;&nbsp; Thus BROKEN stacks should always be direct children
        of some frame representing an OS thread.&nbsp;&nbsp;
    </p>
    <p>
        When the number of BROKEN stacks is small (say &lt; 3% of total samples), they
        can simply be ignored.&nbsp; This is the common case.&nbsp;&nbsp; However the more
        broken stacks there are, the less useful a &#39;top-down&#39; analysis (using the
        CallTree View) is because effectively some non-trivial fraction of the samples are
        not being placed in their proper place, giving you skewed results near the top of
        the stack.&nbsp;&nbsp;&nbsp; A &#39;bottom-up&#39; analysis (where you look first
        at the methods where samples occurred) is not affected by broken stacks (however
        as that analysis moves &#39;up the stack&#39;, it can be affected).
    </p>
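    <p>
        The BROKEN heuristic and the rough 3% guideline can be sketched like this. The code is a
        simplified model rather than PerfView's implementation: stacks are plain lists of
        <code>module!method</code> names in the order the stack walk found them (sample location
        first), the frame names are made up, and matching on the <code>ntdll!</code> prefix is an
        assumed stand-in for 'a frame in ntdll'.
    </p>

```python
def mark_broken(frames_leaf_first):
    """Return the root-first path for one sample, inserting BROKEN if needed."""
    # A complete stack walk ends in the ntdll frame that started the thread.
    complete = bool(frames_leaf_first) and frames_leaf_first[-1].startswith("ntdll!")
    path = ["Thread"]
    if not complete:
        path.append("BROKEN")   # pseudo-node between the thread and what was fetched
    path.extend(reversed(frames_leaf_first))
    return path


def broken_fraction(samples):
    """Fraction of samples whose stack was marked BROKEN."""
    if not samples:
        return 0.0
    broken = sum(1 for frames in samples if "BROKEN" in mark_broken(frames))
    return broken / len(samples)


SAMPLES = [
    ["myapp!DoWork", "myapp!Main", "ntdll!RtlUserThreadStart"],
    ["myapp!Helper", "myapp!DoWork"],                      # walk stopped early
    ["clr!JIT_New", "myapp!Main", "ntdll!RtlUserThreadStart"],
    ["myapp!DoWork", "myapp!Main", "ntdll!RtlUserThreadStart"],
]
```

    <p>
        In this sample data one stack in four is broken, far above the 3% guideline, so a
        top-down view of such a trace would be skewed.
    </p>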
    <p>
        Broken stacks occur for the following reasons:
    </p>
    <ol>
        <li>
            In 32 bit processes, ETW relies on the compiler to mark the stack by emitting an
            &#39;EBP Frame&#39;.&nbsp; When it fails to do this completely and uses the EBP
            register for other purposes, it breaks the stack.&nbsp;&nbsp; This should not happen
            for operating system code or for .NET Runtime code, but may occur for 3rd party
            code.
        </li>
        <li>
            In a 32 bit process on a 64 bit Windows 7 or Windows Server 2008 there is a bug
            in which stacks are uniformly dropped in some sessions.&nbsp; The good news is that
            it only happens intermittently.&nbsp;&nbsp; Thus if you collect the data again,
            it is likely to sidestep this bug.&nbsp;&nbsp; This should be fixed in Windows 8.
        </li>
        <li>
            In a 64 bit process, ETW relies on a different mechanism to walk the stack.&nbsp;
            In this mechanism the compiler generates &#39;unwind information&#39;.&nbsp;&nbsp;&nbsp;
            Currently this ETW mechanism does not work properly for dynamically generated code
            (as generated by the .NET runtime JIT compiler).&nbsp; This causes stacks to be
            broken at the first JIT compiled method on the stack (you see the JIT compiled method,
            but no callers of that method).&nbsp;&nbsp;&nbsp; This issue is fixed on Windows
            8 but not in previous OS versions.&nbsp;
        </li>
        <li>
            Asynchronous activities.&nbsp;&nbsp; Stack crawling is a &#39;best effort&#39; service.&nbsp;&nbsp;
            If the sample is taken at a time where it would be impossible to do logging safely,
            then the OS simply skips it.&nbsp;&nbsp; For example, if during stack crawling while
            in the kernel the stack page is found to be swapped out to the disk, then stack
            crawling is simply aborted.&nbsp;
        </li>
    </ol>
    <h4>Working around 64 bit stack breaks:</h4>
    <p>
        If you are profiling a 64 bit process there is a pretty good chance that you are being
        affected by scenario (3) above.&nbsp;&nbsp;&nbsp; There are three workarounds to
        broken stacks in that instance:
    </p>
    <ol>
        <li>
            NGEN the application.&nbsp;&nbsp; The failures occur at JIT compiled code.&nbsp;
            If you <a href="http://msdn.microsoft.com/en-us/library/6t9t5wcf(VS.80).aspx">NGEN</a>
            the application, JIT compilation will not be necessary and the broken stacks will
            disappear.&nbsp;&nbsp; To NGEN your application simply type&nbsp;
            <p>
                C:\Windows\Microsoft.NET\Framework64\v4.0.30319\NGen install YourApp.exe
            </p>
            You will have to repeat this every time your application is recompiled. If your
            code is called from a server, you need to NGEN all the DLLs that are important to
            you (same command line as above).
            <p>
                For server applications there is often not a main EXE that you can pass to the NGEN
                command above, however you can NGEN particular DLLs using the same syntax (NGEN
                install DLLPATH). If you don't know the path names to your DLLs you can find them
                by going to the 'Events' view and selecting the 'ModuleLoad' and 'ModuleDCStop'
                events as well as the 'ModuleILPath' and 'ModuleNativePath' columns. Any DLL without
                a 'ModuleNativePath' is a candidate for NGEN.
            </p>
        </li>
        <li>
            Switch to 32 bit.&nbsp;&nbsp; If your code is pure managed code, then it can run
            as either a 32 or a 64 bit process.&nbsp; By switching to a 32 bit process, you avoid
            the problem.&nbsp;&nbsp; This does not work if you took dependencies on native code
            that only exists for 64 bit.&nbsp;&nbsp;&nbsp; You can convert your application
            to run 32 bit by using the <a href="http://msdn.microsoft.com/en-us/library/ms164699(VS.80).aspx">CorFlags</a>
            utility that comes as part of the <a href="http://www.microsoft.com/downloads/details.aspx?FamilyID=fe6f2099-b7b4-4f47-a244-c96d69c35dec&amp;DisplayLang=en">.NET SDK</a>.&nbsp;&nbsp;
            It also comes as part of Visual Studio (open
            the VS command prompt).&nbsp;&nbsp; To switch simply type CorFlags /32bit+ <strong>YourApp.exe</strong>.
            You will have to repeat this every time your application is recompiled.
            <p>
                For ASP.NET applications you can set it so that your page is loaded in a 32 bit
                process by following the instructions in
                <a href="http://blogs.msdn.com/b/rakkimk/archive/2007/11/03/iis7-running-32-bit-and-64-bit-asp-net-versions-at-the-same-time-on-different-worker-processes.aspx">this blog</a>.
            </p>
        </li>
        <li>
            Perform only a bottom-up analysis.&nbsp;&nbsp; Even with many broken stacks, there
            is a lot of information in the profile, and a &#39;bottom-up&#39; analysis is possible.&nbsp;
        </li>
    </ol>
    <!--  ****************** -->
    <h3>
        <a id="MissingFrames">
            Missing frames on stacks (Stack Says A calls C, when in the source
            A calls B which calls C)
        </a>
    </h3>
    <p>
        Missing stack frames are different than a broken stack because it is frames in the
        &#39;middle&#39; of the stack that are missing.&nbsp;&nbsp; Typically only one or
        maybe two methods are missing.&nbsp;&nbsp; There are three basic reasons for missing
        frames.
    </p>
    <ol>
        <li>
            Inlining.&nbsp;&nbsp; If A calls B which calls C, and B is very small, it is not unusual
            for the compiler to have simply &#39;inlined&#39; the body of B into the body of
            A.&nbsp;&nbsp; In this case obviously B does not appear because in a very real sense
            B does not exist at the native code level.
        </li>
        <li>
            Tail-calling.&nbsp;&nbsp; If the last thing method B does before returning is to
            call C, the compiler can do another optimization.&nbsp;&nbsp; Instead of calling
            C and then returning to A, B can simply jump to C.&nbsp;&nbsp;&nbsp; When C returns
            it will simply return to A directly.&nbsp;&nbsp;&nbsp; From a profiler&#39;s point
            of view, when the CPU is executing C, B has been removed from the stack and thus
            does not show up in the trace.&nbsp;&nbsp; Note also that B does not need to be
            small for this optimization to be beneficial.&nbsp; The only requirement is that
            calling C is the last thing that B does.&nbsp;&nbsp;
        </li>
        <li>
            EBP Frame optimization.&nbsp; In 32 bit processes (64 bit processes don&#39;t use
            EBP Frames), the profiler is relying on the compiler to &#39;mark&#39; the call
            by emitting code at the beginning of the method called the EBP Frame.&nbsp;&nbsp;&nbsp;
            If the compiler does not set up a frame at all and uses the EBP register for its
            own use it results in a <a href="#BrokenStacks">broken stack</a>.&nbsp;&nbsp; However
            even when the compiler is aware of the need to generate EBP Frames there is overhead
            in doing so (2 instructions at the beginning and end of the method).&nbsp;&nbsp;
            For small methods (too big to inline, but still small), the compiler can opt to
            simply omit the generation of the frame (but LEAVE EBP untouched).&nbsp;&nbsp; This
            results in a missing frame.&nbsp;&nbsp; It should be noted that the EBP Frame that
            a method sets up marks the CALLER, not itself.&nbsp;&nbsp; Thus if method B seems
            to be missing, it is not because B omitted its EBP frame but because method C did.&nbsp;&nbsp;&nbsp;
            Thus this kind of frame omission happens when method C is small, not when B is small.&nbsp;
        </li>
    </ol>
    <p>
        While missing frames can be confusing and thus slow down analysis, they rarely truly
        block it.&nbsp;&nbsp; Missing frames are the price paid&nbsp; for profiling unmodified
        code in a very low overhead way.&nbsp;&nbsp;&nbsp;&nbsp;
    </p>
    <hr />
    <!--  ********************************** -->
    <h2>
        <a id="Troubleshooting">Troubleshooting</a>
    </h2>
    <!--  ****************** -->
    <h3>
        <a id="MainViewerTroubleshooting">Main View Troubleshooting</a>
    </h3>
    <ul>
        <li>
            <strong>Check the Log File </strong>- If you performed an operation (like symbol
            lookup) and it did not do what you expected, you should check the <a href="#LogFile">log file</a> (button in
            the bottom right corner of the display) for additional information.
        </li>
        <li>
            <strong>No ASP.NET Events</strong> - If you were expecting the <a href="#ASPNetEvents">ASP.NET</a>
            view when you opened the ETL file but it was not present, it is likely
            because ASP.NET itself was not configured to log such events. See
            <a href="#ASPNetEvents">ASP.NET Events</a> for more.
        </li>
        <li><a href="#Troubleshooting">Other Troubleshooting</a></li>
    </ul>
    <!--  ****************** -->
    <h3>
        <a id="StackViewerTroubleshooting">Stack Viewer Troubleshooting</a>
    </h3>
    <ul>
        <li>
            <strong>Symbol Problems: </strong><a id="TroubleshootingSymbols">
                <strong>
                    (X!?&nbsp;
                    or ?!? in the display
                </strong>
            </a><strong>):&nbsp; </strong>Because of the expense
            of looking up symbols for unmanaged DLLs, PerfView resolves them lazily.&nbsp;&nbsp;
            You have to indicate which DLLs to resolve by right clicking on a set of cells and
            selecting the &#39;Lookup Symbols&#39; command.&nbsp;&nbsp;&nbsp; If &#39;Lookup
            Symbols&#39; fails look carefully at the <a href="#LogFile">log file</a> and
            <a href="#SymbolPathTextBox">add to _NT_SYMBOL_PATH</a>.&nbsp;&nbsp;
            See <a href="#SymbolResolution">Symbol Resolution</a> for more complete information.
        </li>
        <li>
            <strong>All the CPU time is in a node like OTHER&lt;&lt;ntdll!?&gt;&gt;&nbsp;: </strong>
            An important part of a performance investigation is to group the costs into semantic
            groups that are meaningful to the programmer. Typically this means that you want
            to group code that you have no control over (like the operating system code) as
            one group and much finer groups (often individual methods) for code that you can
            change. The default view for PerfView tries to approximate this by a view called
            'Just My App' which groups all code that is NOT in the directory subtree in which
            the EXE file lives as an 'OTHER' group. This works well for many applications, but for
            scenarios where some other host (e.g. Internet Explorer) runs your code, it produces poor results
            because your code does not live with the EXE and thus is grouped as 'OTHER'. Fixing
            this is a simple matter of choosing a more appropriate grouping operator. Typically
            choosing the 'group module entries' entry in the 'GroupPats' text box is a good
            choice to start with, and then ungroup (right click->grouping->ungroup module) any
            modules that you yourself own. See how to group and <a href="#GroupPatsTextBox">
                grouping
                reference
            </a> for more on grouping.
        </li>
        <li>
            <strong>All the CPU time is in the Process Node</strong>.   By default PerfView
            sets the Fold% textbox to 1, which means that any node in the tree that uses
            less than 1% of the total CPU time is 'inlined' into its parent.   This normally
            works well, but for servers that have 100s of threads, it may be that no thread uses
            more than 1% of the time (since it is split among 100s).  The result is that
            you lose all the detail and just see the Process node, which is not useful.
            To fix this simply set Fold% to 0.   You may also wish to fold away the thread
            nodes (put ^Thread in the FoldPats text box).  This basically says I don't care
            what thread CPU is used on, combine them all as if they were one thread.   Once
            you do this you can actually turn Fold% back to 1 because now you don't have
            100s of threads any more (they are treated as one).
        </li>
        <li>
            <strong>&#39;BROKEN&#39; Stack Frame in Trace</strong> - During data collection, when
            an event fires, a stack trace is taken.&nbsp; When this trace is incomplete it is
            called a broken stack, and this can happen for a variety of reasons.&nbsp; See the <a href="#BrokenStacks">Broken stacks</a>
            section for more.
        </li>
        <li>
            <strong>Missing Frames</strong> - Sometimes the trace shows method A calling method
            C&nbsp; but you KNOW that A calls B which calls C.&nbsp; See <a href="#MissingFrames">Missing Frames</a> for details.&nbsp;
        </li>
        <li>
            <strong>.NET Programs spend a lot of time in clr.dll (or mscorwks.dll) at shutdown.</strong>
            In order to get good symbolic information for .NET methods, it is necessary for
            the CLR runtime to dump the mapping from native instruction location to method name.&nbsp;
            This is done when the process shuts down (or when PerfView requests a rundown
            explicitly).&nbsp;&nbsp; The CPU consumed by this is uninteresting from an analysis
            perspective (because it does not occur normally). The easiest way to exclude this
            time is to set a time range that does not include the process shutdown. See <a href="#ZoomingToARangeOfInterest">
                zooming
                to a range of interest
            </a> for more.
        </li>
        <li><a href="#Troubleshooting">Other Troubleshooting</a></li>
    </ul>
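    <p>
        The Fold% behavior described above can be illustrated with a small model. This is only a
        sketch of the idea and not PerfView's actual implementation: the Node class and the fold and
        fold_percent functions are hypothetical names, and the model assumes that a node is folded
        when its inclusive cost is below the given percentage of the total. It shows why 200 threads
        that each use 0.5% of the CPU all disappear into the Process node at Fold% = 1.
    </p>

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical call-tree node; not a PerfView type."""
    name: str
    exclusive: float = 0.0  # cost attributed directly to this node
    children: List["Node"] = field(default_factory=list)

    def inclusive(self):
        # Cost of this node plus everything beneath it.
        return self.exclusive + sum(c.inclusive() for c in self.children)


def fold(node, threshold):
    """Return a copy of the tree with small children merged into their parent."""
    result = Node(node.name, node.exclusive)
    for child in node.children:
        if child.inclusive() >= threshold:
            result.children.append(fold(child, threshold))
        else:
            # Too small to show: charge the whole subtree to the parent.
            result.exclusive += child.inclusive()
    return result


def fold_percent(root, percent):
    """Fold nodes that use less than `percent` of the root's total cost."""
    return fold(root, root.inclusive() * percent / 100.0)


# A server process with 200 threads, each using 0.5% of the total CPU time.
process = Node("Process", 0.0, [Node(f"Thread {i}", 5.0) for i in range(200)])

folded = fold_percent(process, 1.0)    # models Fold% = 1
unfolded = fold_percent(process, 0.0)  # models Fold% = 0

print(len(folded.children), folded.exclusive)
print(len(unfolded.children))
```

    <p>
        In this model, Fold% = 1 leaves only the Process node, which carries all of the cost, while
        Fold% = 0 keeps every thread node visible.
    </p>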
    <!--  ****************** -->
    <h3>
        <a id="EventViewerTroubleshooting">Event Viewer Troubleshooting</a>
    </h3>
    <ul>
        <li>
            <strong>No events show up</strong> - It is reasonably common to leave text in one
            of the filter text boxes unintentionally (typically the &#39;textPats&#39; textBox).&nbsp;
            This of course filters the set of events more aggressively, often leading to no
            events being shown.&nbsp;&nbsp; Inspect the filters carefully in this case.&nbsp;
        </li>
        <li><a href="#Troubleshooting">Other Troubleshooting</a> </li>
    </ul>
    <hr />
    <!--  ********************************** -->
    <h2>
        <a id="Tips">Tips</a>
    </h2>
    <p>
        Here are useful techniques that may not be obvious at first:
    </p>
    <!--  ****************** -->
    <h3>
        <a id="GeneralTips">General Tips</a>
    </h3>
    <ul>
        <li>
            <strong><a id="HelpTip">Help</a></strong>: On Windows 7, if you drag a top level
            window to the left or right margin of the desktop, it will expand automatically
            to fill exactly half the desktop.&nbsp; Use this feature to drag the StackWindow
            to take up the left half and the help to take up the right.&nbsp; Now you can read
            the help and use it simultaneously.&nbsp;
        </li>
        <li>
            <strong>Blue hyperlink help: </strong>&nbsp;The UI is filled with blue hyperlinks
            that take you to specific help on that aspect of the UI.&nbsp;&nbsp; If you have
            a question about what a particular piece of UI does, a blue hyperlink can often
            help.&nbsp;
        </li>
        <li>
            <strong>Right Click is your friend</strong>:&nbsp; When you want to know what is
            possible to do, it never hurts to right click to see what context menu pops up.&nbsp;&nbsp;
            These menus have the keyboard shortcut on them so it is a useful way to learn fast
            ways of navigating.&nbsp;
        </li>
        <li>
            <strong>Cut and paste into Excel, EMail </strong>- You can cut and paste regions
            of data in a grid-view into Excel or into e-mail as text.&nbsp;&nbsp;
        </li>
        <li>
            <strong>Cut and Paste Ranges</strong> - When you have selected exactly two numeric
            cells, this is copied (Ctrl-C) as two numbers.&nbsp; If you paste this into the
            &#39;start&#39; textbox, it will set both the start and end values.&nbsp;&nbsp;
        </li>
        <li>
            <strong>Quick Sums, Averages</strong> - Whenever you select more than one numeric
            cell in a grid-view, the sum, count, average (and possibly difference) are displayed
            in the status bar.&nbsp;&nbsp; These results can be cut and pasted from the status
            bar.
        </li>
        <li>
            <strong>Quick Selection of numbers</strong> - If you wish to copy a number out of
            the status bar, you can quickly select it by simply double clicking on it, and hitting
            Ctrl-C.&nbsp;
        </li>
        <li>
            <strong>Quick Calculator</strong> - If you copy a number into the clipboard then
            when a single cell is selected in the gridview, the sum, difference, product and
            ratio of the selected cell and the clipboard value are displayed in the status bar.&nbsp;&nbsp;
            This is useful when you want to do arithmetic on cells in different views.
        </li>
    </ul>
    <hr />
    <!--  ********************************** -->
    <h2>
        <a id="FAQ">Frequently Asked Questions (FAQ)</a>
    </h2>
    <ul>
        <li>
            <strong>How do I get rid of ? in node names (e.g. ntdll!?)?</strong>
            <p>
                PerfView emits a ? for any program address that it cannot resolve to a symbolic
                name.&nbsp;&nbsp;&nbsp; See <a href="#TroubleshootingSymbols">Troubleshooting Symbols</a>
                and <a href="#SymbolResolution">Symbol Resolution</a> for more.
            </p>
        </li>
        <li>
            <strong>What are &#39;BROKEN&#39; stacks?&nbsp; What causes BROKEN stacks?</strong>
            <p>
                If the stack trace that is taken at data sample time does not terminate in the OS DLL
                that starts threads, the stack is considered broken.&nbsp; See <a href="#BrokenStacks">
                    Broken Stacks
                </a> for more.
            </p>
        </li>
        <li>
            <strong>Stack frames seem to be missing.&nbsp; What is going on?</strong>
            <p>
                The algorithm used to crawl the stack is not perfect.&nbsp;&nbsp; In some cases
                there is not sufficient information on the stack to quickly find the caller.&nbsp;&nbsp;
                Also compilers perform inlining, tailcall and other operations that literally remove
                the frame completely at runtime.&nbsp;&nbsp;&nbsp; The good news is that while sometimes
                confusing, it is usually pretty easy to fill in the gaps. &nbsp; See <a href="#MissingFrames">
                    Missing Frames
                </a> for more.
            </p>
        </li>
        <li>
            <strong>
                .NET Programs spend a lot of time in clr.dll (or mscorwks.dll) at shutdown.
                What is that?
            </strong>
            <p>
                In order to get good symbolic information for .NET methods, it is necessary for
                the CLR runtime to dump the mapping from native instruction location to method name.&nbsp;
                This is done when the process shuts down (or when PerfView requests a rundown
                explicitly).&nbsp;&nbsp; The CPU consumed by this is uninteresting from an analysis
                perspective (because it does not occur normally). The easiest way to exclude this
                time is to set a time range that does not include the process shutdown.&nbsp; See
                <a href="#ZoomingToARangeOfInterest">zooming to a range of interest</a> for more.
            </p>
        </li>
        <li>
            <strong>
                What is mscorwks!PreStubWorker or clr!PreStubWorker? I see my code calling
                it.&nbsp;
            </strong>&nbsp;
            <p>
                PreStubWorker is the method in the .NET Runtime that serves as the entry point to the
                .NET Runtime's just-in-time (JIT) compiler.&nbsp;&nbsp; This method will be called the first
                time a method is called to convert the code in the EXE (which is NOT native code)
                into native code that can be executed by the processor.&nbsp;&nbsp; If the amount
                of time in this helper (inclusively) is large, it can be reduced by using the NGEN.exe
                tool to precompile the code.
            </p>
        </li>
    </ul>
    <ul>
        <li>
            <strong>What does (unmerged) mean in the main viewer?&nbsp; </strong>&nbsp;
            <p>
                When ETW data is first collected, it actually comes in two files: an .ETL file (which
                the viewer shows you) and a .Kernel.ETL file (which the viewer hides from you).&nbsp;&nbsp;
                Moreover these files do not contain information (precise dll versions) needed if
                you wish to examine the data on a different machine.&nbsp;&nbsp;&nbsp; Merging is
                the process of combining these files and adding the extra information.&nbsp;&nbsp;&nbsp;
                Because merging can take some time (10s of seconds) it is not done by default, and
                the viewer indicates this by displaying &#39;(unmerged)&#39;.&nbsp;&nbsp; This is
                a warning to you that if you wish to copy this file to another machine you will
                need to merge it first.&nbsp; See <a href="#merging">merging</a> for more.&nbsp;
            </p>
        </li>
        <li>
            <strong>What is PerfView&#39;s Relationship to XPERF.exe&nbsp; /&nbsp; WPA.exe? </strong>
            &nbsp;
            <p>
                The NT performance team has a tool called XPERF (and a newer version called
                <a href="http://msdn.microsoft.com/en-us/library/ff191077(v=VS.85).aspx">Windows Performance Analyzer (WPA)</a>)
                which is also VERY useful for doing performance
                analysis.&nbsp; PerfView and XPERF/WPA should not really be considered
                competitors.&nbsp; In fact they both use the same data (ETW data collected by various
                ETW providers).&nbsp;&nbsp; The ETL files created by XPERF can be viewed by PerfView
                and vice versa because they really are very similar programs.&nbsp; So which should
                you use?&nbsp; The good news is that it does not really matter that much, since
                you can change your mind at any point.&nbsp;&nbsp; Currently PerfView has more powerful
                grouping capabilities, so XPERF users may want to try PerfView out when they encounter
                &#39;flat&#39; profiles.&nbsp;&nbsp; Conversely, WPA has better graphing capabilities
                as well as memory views that PerfView simply does not have.
            </p>
            <p>
                PerfView has a /wpr qualifier that eases some friction when using WPA to view data
                collected with PerfView.   See <a href="#WorkingWithWPA">Working with WPA</a> for more.

            </p>
        </li>
    </ul>
    <ul>
        <li>
            <strong>What is PerfView&#39;s Relationship to the Visual Studio Profiler? </strong>
            &nbsp;
            <p>
                Visual Studio also has a profiler built into it, so the question arises why not
                use that?&nbsp;&nbsp; The answer is you should!&nbsp;&nbsp; However the Visual Studio
                profiler&#39;s goal was to make profiling easy at development time.&nbsp;&nbsp;&nbsp;
                Also it concentrates on CPU issues.&nbsp;&nbsp;&nbsp; If you need to collect
                profile information &#39;in the field&#39; (which typically includes test labs),
                it is hard to use the VS profiler (you have to install it, which includes creating
                a device driver).&nbsp; It is also cumbersome to attach to services (often there
                are security issues).&nbsp; The result is that it is hard to use the VS profiler
                outside of development time.
            </p>
        </li>
    </ul>
    <p>
        &nbsp;
    </p>
    <!--  ********************************** -->
    <h2>
        <a id="ReleaseNotes">Release Notes</a>
    </h2>
    <ul>
        <!--
        <li>Version 1.4.1 - 1/XX/2013
            <ul>
            <li>XXX</li>
              <li>Improved the performance of the stack-view substantially by building a simplified regular expression matcher
                    instead of using System.Text.RegularExpressions.  This should improve 'update' speeds in the stack viewer
                    in common scenarios by of 2 </li>
                <li>Allowed the stack-view to parallelize much of the activity, improving performance by probably another factor
                    of about 2 or more in many cases.   More can be done here to take advantage of Quad procs etc </li>
                <li>Because it would be very easy for the above two features to cause the code to break, I have added a /safeMode
                    command qualifier that turns them off.   If you get a crash when updating the stack viewer, you should retry
                    with /safemode</li>
            </ul>
        </li>
        -->
        <li>
            Version 2.0.39 3/20/19
            <ul>
                <li>
                    Merged kayle's update to display the type of the allocation for C++ code (in the Net OS Heap Alloc View).
                    It is now the case that if you have PDBs for the call site of a C++ 'new' expression and that compiler
                    supports it (I believe anything after the VS2017 C++ compiler will work), then PerfView will create a 'Type XXX'
                    pseudo-node for allocation sites.   Having this type information can definitely be useful.
                </li>
            </ul>
        </li>
        <li>
            Version 2.0.32 12/14/18
            <ul>
                <li>
                    Added the /focusProcess=ProcessIDOrName qualifier (e.g. /focusProcess=PerfView.exe). This allows you
                    to only turn on non-Kernel events
                    for a particular process, and thus cut the overhead / size of the collection when there are many
                    active processes on the system.   Note that it does not have an effect on kernel events (which are
                    often the most common, but not always), so it may not help as much as you would like, but DEFINITELY
                    helps during rundown (if you have many managed processes, they all do rundown which can be impactful).
                    So it always helps when there are many managed processes (because of rundown) but can help quite a lot
                    if many of those processes allocate a lot, or use the threadpool (which both can create many events).
                </li>
                <li>
                    Added a popup warning if the ETL file has events out of order in time (this should not happen but
                    when it does, it can produce GUI anomalies, so I want the warning to be obvious).    Added a
                    FirstTimeInversion property to support this feature.
                </li>
            </ul>
        </li>
        <li>
            Version 2.0.28 10/2/18
            <ul>
                <li>
                    Fixed parsing of Task Parallel Library events to include the .NET Core 2.1 event
                    System.Threading.Tasks.TplEventSource/IncompleteAsyncMethod used to find 'orphaned' Async operations.
                    Also added this event to the default collection for TPL, so that it is always 'just here'.

                    Basically this is a new feature of the .NET Core task library that notices when tasks are created,
                    but then collected without ever being completed one way or the other.   This can happen if the
                    TaskCompletionSource dies before it calls 'Complete' on the task.

                    The string in the event is the name of the method where the orphaned machine (Task) will return
                    when it continues.   The code that was supposed to trigger the 'await' to complete is at fault.

                    This feature needs to be friendlier but it is a big step from knowing nothing.
                </li>
                <li>
                    Modified the TraceEvent library's concept of the 'version of the manifest' to include
                    a term that is 100 * the largest event ID.   Thus if you add a new event (at the end), you can
                    remove (clean up) a few dozen unused events and still be considered 'better'.   Note that this should
                    be used with care, as it implies that the deleted events are not EVER useful (even for old code that
                    still emits them), because TraceEvent will not parse them going forward (the TPL EventSource did just
                    this, which is why it came up here).
                </li>
            </ul>
        </li>
        <li>
            Version 2.0.27 9/25/18
            <ul>
                <li>
                    Fixed 'PerfView Listen EVENTSOURCE' so that it works without the * prefix for EventSources.
                </li>
                <li>Fixed missing descriptions for user commands</li>
                <li>
                    Added support for the /SessionName=XXXX parameter which renames both the user and kernel
                    session names that PerfView uses (which allows you to have two instances of PerfView running or to run
                    with other tools that use the kernel provider)
                </li>
                <li>Use stack compression by default</li>
                <li>
                    Stop the kernel and user mode session concurrently.  This helps when the disks are very
                    slow (VMs), to keep the two sessions overlapping maximally
                </li>
            </ul>
        </li>
        <li>
            Version 2.0.22 8/15/18
            <ul>
                <li>
                    Added the /DotNetCallsSampled command line option that does call instrumentation
                    but samples every 997 calls (to keep overhead low)
                </li>
                <li>
                    Added the /DisableInlining command line option that tells the runtime not to
                    inline (used with the /DotNetCalls or /DotNetCallsSampled options)
                </li>
                <li>
                    Minor bug fixes so that things work inside windows docker containers.
                    This works on windowsServerCore Version RS3 or beyond.  PerfViewCollect can
                    be used on windowsNano OS
                </li>
                <li>fixed build to support SourceLink for the PerfView/TraceEvent source itself.</li>
                <li>Added docs for using PerfView in windowservercore and nanoserver containers.</li>
            </ul>
        </li>
        <li>
            Version 2.0.17 5/25/18
            <ul>
                <li>
                    Added support for the ThreadName property that the OS supports.   The Thread/SetName
                    event is now parsed well, and if the name is present it shows up in the Stack views.
                </li>
            </ul>
        </li>
        <li>
            Version 2.0.16 5/22/18
            <ul>
                <li>
                    Fix bug when parsing 'mixed' EventSources that use both Manifest events and self-describing events
                    in the same EventSource, leading to the self-describing events being parsed as (garbled) manifest
                    events.    This can happen when using EventCounters pretty easily since EventCounters use the self-describing
                    format.
                </li>
            </ul>
        </li>
        <li>
            Version 2.0.15 5/14/18
            <ul>
                <li>
                    Changed the default symbol cache to %TEMP%\SymbolCache.   This aligns PerfView with what Visual Studio does
                    which saves some space.  Because PerfView remembers the symbol path from invocation to invocation, this change
                    will not affect existing places where PerfView is run.  To use the new cache location you need to use the
                    file -> Clear User Config, and restart.  But mostly you should not care.
                </li>

            </ul>
        </li>
        <li>
            Version 2.0.2 1/12/18
            <ul>
                <li>
                    Added support for SourceLink for 'Goto Source' functionality.
                    SourceLink is a technique of finding source files by placing a mapping from build-time file name to URL into the
                    symbol file so that the source code can be fetched by URL at debug/profiling time.
                    .NET Core annotates all its symbol files this way.   The result is that 'Goto Source' on .NET Core assemblies
                    (that is, the framework and ASP.NET) just works in PerfView (it will bring up the relevant source).
                </li>

            </ul>
        </li>
        <li>
            Version 2.0.1 1/8/18
            <ul>
                <li>
                    Added <a href="#FlameGraphView">Flame Graph</a>.
                </li>

            </ul>
        </li>
        <li>
            Version 2.0.0 1/5/18
            <ul>
                <li>
                    Officially update the version number to 2.0 in preparation for signing and releasing officially.
                    Only the version number update happens here.
                </li>

            </ul>
        </li>
        <li>
            Version 1.9.71 1/3/18
            <ul>
                <li>
                    Fix an issue in TraceEvent that causes double-dispatch of some events.   This is most likely to affect
                    the Start-stop activities.   Thus if there is strangeness there, this may fix it.
                </li>

            </ul>
        </li>
        <li>
            Version 1.9.70 12/15/17
            <ul>
                <li>
                    Fixed issue where, when PerfView is run on older .NET Runtimes, it fails to load the
                    System.Runtime.InteropServices.RuntimeInformation.dll.
                </li>

            </ul>
        </li>
        <li>
            Version 1.9.69 12/14/17
            <ul>
                <li>
                    Added the /LowPriority command line qualifier that causes the merging/NGENing/ZIPPing that
                    perfview does to package up the data to happen at low CPU priority to minimize the impact
                    to the system.
                </li>

            </ul>
        </li>
        <li>
            Version 1.9.68 11/8/17
            <ul>
                <li>
                    If you are collecting with something that needs a .NET Profiler (the .NET Alloc, .NET Alloc Sampled or .NET Calls),
                    it is possible that modifications to the registry that install PerfView's profiler are not being cleaned up.
                    The effect of this is mostly that other tools that might use the .NET Profiler will not work properly (e.g.
                    code coverage tools or other profilers).   This is most likely to happen on 64 bit and .NET Core (Desktop .NET
                    is likely to work OK).   This fix makes the cleanup thorough.  The fix will 'clean up' any keys left behind
                    by old PerfView runs.
                </li>

            </ul>
        </li>
        <li>
            Version 1.9.67 11/2/17
            <ul>
                <li>
                    Added the Gen2 Object Death view that uses the 100KB allocation events (coarse sampling).  Thus most traces
                    will now have this view (including the /GCOnly view).   This view shows you where you allocated objects that then die in Gen 2.  These are the
                    most important for reducing the number of Gen2 GCs (and Gen 2 GC fragmentation).   You could do this before
                    when you turned on /DotNetAlloc or /DotNetAllocSampled collection but those are more expensive and can have
                    logistic issues (you can't attach to an existing process).    The view will only show you a coarse sampling
                    but that often has useful information.
                </li>
                <li>
                    Added the 'GC Occurred Gen(X)' frame to the GC Heap Net Alloc and Gen2 Object Death views.   These are
                    useful for seeing where the GCs occur in time without having to go to the GCStats or Events views.
                </li>
            </ul>
        </li>
        <li>
            Version 1.9.66 10/27/17
            <ul>
                <li>
                    Updated the support DLLs that parse .diagsession files.  This allows it to read the newest format.
                </li>
            </ul>
        </li>
        <li>
            Version 1.9.64 9/27/17
            <ul>
                <li>
                    Added support for Argon (light weight) Windows containers.   Most of this is in fact work-arounds which
                    will eventually be removed, but this makes PerfView work with Argon containers in the RS3 version of the OS
                    (the version currently available).   Note that there still seem to be issues with looking up symbols for SOME
                    OS DLLs, but all managed code should work.    Also PerfView is a GUI app and Argon containers don't use
                    GUI, so you need to use the techniques in 'Automating data collection' to use PerfView in the container
                    (for example 'Perfview.exe /logfile:logfile.txt /accepteula /maxcollectsec:30 collect').  PerfView (like
                    all GUI apps) will run in the background if run from the command line directly, but will block until exit
                    when run from a batch script.
                </li>
            </ul>
        </li>
        <li>
            Version 1.9.55 5/9/17
            <ul>
                <li>
                    Change /GCCollectOnly so that it also collects Kernel Image load events.   This is useful because
                    it allows you to get software version information which otherwise is unavailable without increasing
                    the size of the resulting file significantly.
                </li>
            </ul>
        </li>
        <li>
            Version 1.9.54 5/5/17
            <ul>
                <li>
                    Fixed bug where Process name for the MapFile event was incorrect.
                </li>
            </ul>
        </li>
        <li>
            Version 1.9.51 1/31/17
            <ul>
                <li>
                    Added support for reading files from the YourKit java profiler.  This works for both their CPU trees
                    as well as their object allocation trees.  These XML files need to be named '*.tree.xml' for perfview
                    to recognize the file as something it understands.   See XmlTreeStackSource for more details.
                </li>
            </ul>
        </li>
        <li>
            Version 1.9.50 1/24/17
            <ul>
                <li>
                    Fixed issue where the 'processes' view was giving negative start times and other bogus values.
                </li>
            </ul>
        </li>
        <li>
            Version 1.9.49 1/6/17
            <ul>
                <li>
                    Better names for start-stop activities coming from Diagnostics Sources.
                    This helps with ASP.NET Core, which uses DiagnosticSource for both
                    incoming and outgoing HTTP requests.
                </li>
            </ul>
        </li>
        <li>
            Version 1.9.47 1/5/17
            <ul>
                <li>
                    Enable DiagnosticSource and ApplicationsInsight providers by default.
                </li>
            </ul>
        </li>
        <li>
            Version 1.9.46 1/4/17
            <ul>
                <li>
                    Added the GIT commit hash to the module information in the 'Modules' Excel table in the 'Processes' view.
                    This commit will also show up in the ImageLoad event in the 'Events' view.   Useful for finding the source
                    code for a particular module.   This information is fetched from the 'FileVersion' field of the version
                    information for the file (what fileVersion -v returns).  It is looking for 'Commit Hash: HASH'.  If it does
                    not find this on FileVersion, it looks on the ProductVersion field.
                </li>
                <li>
                    Extend the UserCommand Listen command to take full ETW provider specs rather than just the ETW provider name
                </li>
            </ul>
        </li>
        <li>
            Version 1.9.45 12/19/16
            <ul>
                <li> Fixed problem getting symbols for System.Private.CoreLib.ni.dll by using /ForceNGENRundown. </li>
            </ul>
        </li>
        <li>
            Version 1.9.44 12/16/16
            <ul>
                <li> Supported .NET Alloc, .NET Sample Alloc and .NET Calls on .NET Core.  </li>
            </ul>
        </li>
        <li>
            Version 1.9.41 11/26/16
            <ul>
                <li> Fix Null Ref when opening Thread Time With Start-Stop Activities. </li>
            </ul>
        </li>
        <li>
            Version 1.9.40 10/14/16
            <ul>
                <li> Update version number to 1.9.40 for GitHub release. </li>
            </ul>
        </li>
        <li>
            Version 1.9.33 9/16/16
            <ul>
                <li> Merged in code to fix .NET Core ReadyToRun images by running crossgen with .ni.dll file names </li>
            </ul>
        </li>
        <li>
            Version 1.9.31 9/16/16
            <ul>
                <li> Fix issue getting symbols for .NET Core's CoreLib.ni NGEN image.</li>
            </ul>
        </li>
        <li>
            Version 1.9.30 9/13/16
            <ul>
                <li> Fix issue https://github.com/Microsoft/perfview/issues/116.  Problem opening ETL files with bad end time. </li>
            </ul>
        </li>
        <li>
            Version 1.9.29 9/7/16
            <ul>
                <li> Fix issue where if you do GC dump with 'save etl' more than once from the same process you don't get type names.  </li>
            </ul>
        </li>
        <li>
            Version 1.9.28 9/7/16
            <ul>
                <li> Fix perf issue with traceLogging support </li>
            </ul>
        </li>
        <li>
            Version 1.9.27 9/5/16
            <ul>
                <li> Reorganize TraceLogging fix into its own class (TraceLoggingEventID). </li>
            </ul>
        </li>
        <li>
            Version 1.9.26 9/2/16
            <ul>
                <li>
                    Fix the parsing of Events generated by Windows 10 TraceLogging APIs.
                    While they generally worked in the native case, in JavaScript they were
                    likely to be broken.   The issue is that TraceLogging events no longer give
                    a small integer Event ID that was guaranteed to be unique for that
                    layout of event.   Instead you simply have a blob of meta-data.   Fixed this
                    by assigning an event ID to each such blob (would have been nice if ETW
                    had simply done that)
                </li>
            </ul>
        </li>
        <li>
            Version 1.9.25 8/31/16
            <ul>
                <li>
                    Fix symbol lookup bug associated with 1.9.24 (can't find PDB signature)
                </li>
                <li>Change the convention for PDB naming for ready-to-run images.</li>
            </ul>
        </li>
        <li>
            Version 1.9.24 8/31/16
            <ul>
                <li>
                    Added ability to properly create PDBs for NGEN and ready-to-run images
                    for .NET Core scenarios.  Note that this support is likely to be ripped out
                    when these PDBs are up on a symbol server properly.
                </li>
            </ul>
        </li>
        <li>
            Version 1.9.23 8/30/16
            <ul>
                <li> Fixed issues with Activity views in .NET Core. </li>
                <li> Added the command line arguments to the process node in the stack viewers </li>
                <li> Hack to make ready-to-run PDB lookup work (really needs crossgen to be fixed, but this makes things work in the meantime)</li>
            </ul>
        </li>
        <li>
            Version 1.9.22 8/29/16
            <ul>
                <li>
                    If you place a 'symbols' directory next to a data file, PerfView will place any PDBs needed in
                    that directory.   This is a handy feature when you are sharing data files from private builds
                    with other people.  Unfortunately, a few versions back this logic was broken.
                    This update fixes this.
                </li>
            </ul>
        </li>
        <li>
            Version 1.9.21 8/10/16
            <ul>
                <li>
                    Removed Just My App for dotnet.exe hosts since it does more harm than good.
                </li>
            </ul>
        </li>
        <li>
            Version 1.9.20 8/9/16
            <ul>
                <li>
                    Fixed broken opening of .diagsession files.
                </li>
            </ul>
        </li>
        <li>
            Version 1.9.19 6/17/16
            <ul>
                <li>
                    Fixed issue opening trace.zip files introduced in last update.
                </li>
            </ul>
        </li>
        <li>
            Version 1.9.18 6/15/16
            <ul>
                <li>
                    Fixed issue where .Trace.ZIP files without LTTng information would fail when viewing the CPU stacks with a file in use error.
                </li>
            </ul>
        </li>
        <li>
            Version 1.9.17 6/14/16
            <ul>
                <li>
                    Fixes issue with out of memory when taking a .GCDump from a very large process dump.    Improved the out of
                    memory logic to automatically retry with smaller values.
                </li>
            </ul>
        </li>
        <li>
            Version 1.9.16 6/2/16
            <ul>
                <li>Made the view for a *.trace.zip file  show all the possible sub-views (CPU stacks as well as LTTng data).</li>
            </ul>
        </li>
        <li>
            Version 1.9.15 6/1/16
            <ul>
                <li>Integrated Lee's fixes for LTTng support for GC Heap dumps on Linux.</li>
            </ul>
        </li>
        <li>
            Version 1.9.14 5/20/16
            <ul>
                <li>Updated DirectorySize view to recognise NGEN images and Ready-To-Run images.  </li>
            </ul>
        </li>
        <li>
            Version 1.9.13 5/19/16
            <ul>
                <li>Fixes to make .NET Core Ready-to-run images work properly. </li>
                <li>Added the PdbSignature user command (helps debug PDB symbol match issues)</li>
            </ul>
        </li>
        <li>
            Version 1.9.12 5/16/16
            <ul>
                <li> Updated default symbol paths to include NuGet locations. </li>
            </ul>
        </li>
        <li>
            Version 1.9.11 5/13/16
            <ul>
                <li> Fixed issue looking at heap dumps in ETL files.  </li>
                <li> Fixed activity paths to have // prefix again. </li>
            </ul>
        </li>
        <li>
            Version 1.9.8 4/22/16
            <ul>
                <li> Added the 'Advanced Group' to .GCDump files and put everything but the heap in it </li>
                <li> Added a bit more information to the .GCDump log spew. </li>
            </ul>
        </li>
        <li>
            Version 1.9.8 4/5/16
            <ul>
                <li>
                    Added TotalHeapSize, TotalPromotedSize and Depth fields to the GC/HeapStats event.  This is
                    useful for /StopOnEtwEvent uses (e.g. stop when the GC heap gets too big)
                </li>
            </ul>
        </li>
        <li>
            Version 1.9.7 4/4/16
            <ul>
                <li>Fix issue getting source code from NGEN images on .NET Core scenarios. </li>
                <li>Added support to collect File Open (Create) events (with stacks) by default. </li>
                <li>Also add collection of Process Create events (with stacks) by default</li>
            </ul>
        </li>
        <li>
            Version 1.9.6 3/28/16
            <ul>
                <li>Fix asserts associated with keeping EnumerateTemplates in sync with TraceEventParser events. </li>
                <li>Made PDB expansion logic a bit more robust.</li>
            </ul>
        </li>
        <li>
            Version 1.9.5 3/28/16
            <ul>
                <li>Make the heap dumper retry with a smaller maxObjectCount if it runs out of memory</li>
                <li>Tuned the CLR rundown to avoid unnecessary events (in high volume scenarios)</li>
            </ul>
        </li>
        <li>
            Version 1.9.4 3/28/16
            <ul>
                <li>Fixed failure to load NGEN images in .NET Core scenarios</li>
                <li>Change it so that PDBs that are in the build location or next to the DLL are checked first
                (thus no network operations if you build locally)</li>
            </ul>
        </li>
        <li>
            Version 1.9.3 3/28/16
            <ul>
                <li>
                    Fixed failure reading Linux traces that have unusual characters in their path name.
                </li>
            </ul>
        </li>
        <li>
            Version 1.9.1 2/22/16
            <ul>
                <li>
                    Added Power events (so you can know how throttled the CPU is)
                </li>
            </ul>
        </li>
        <li>
            Version 1.9.0 2/12/16
            <ul>
                <li>
                    Updated documentation.  Prepped for release to web.
                </li>
            </ul>
        </li>
        <li>
            Version 1.8.28 2/7/16
            <ul>
                <li>
                    Categorized items in etl files into 'memory', 'specialized' and 'obsolete' groups so people are more
                    naturally drawn to the most important views.  Removed blocked time (thread Time supersedes it)
                </li>
                <li>Added Support for CrossGen when auto-generating NGEN pdbs (for CoreCLR)</li>
                <li>
                    Added Support for .perfView.json and .perfView.json.zip files.  You can give it a JSON file like the following, which
                    has two samples in it.
                    <pre>                    
{
    "StackSource": {
        "Samples": [
            {
                "Time": "10.1",
                "Stack": [
                    "Executing Func for Sample 1",
                    "Calling Func",
                    "Main"
                ]
            },
            {
                "Time": "10.1",
                "Stack": [
                    "Executing Func for Sample 2",
                    "Calling Func",
                    "Main"
                ]
            }
        ]
    }
}
                </pre>
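                    <p>
                        A file in this layout can also be generated by a script. The following is a minimal Python
                        sketch that relies only on the structure visible in the sample above
                        (<code>StackSource</code>, <code>Samples</code>, <code>Time</code>, <code>Stack</code>);
                        the helper names and the output file name are hypothetical, and any fields beyond those
                        shown in the sample are not covered.
                    </p>
<pre>
```python
import json


def build_stack_source(samples):
    """Build the StackSource/Samples layout from (time, frames) pairs.

    Frames are kept in the order given, matching the sample file
    (the 'Main' frame is listed last there).
    """
    return {
        "StackSource": {
            "Samples": [
                {"Time": str(time), "Stack": list(frames)}
                for time, frames in samples
            ]
        }
    }


def write_perfview_json(path, samples):
    """Write the samples to 'path' as indented JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(build_stack_source(samples), f, indent=4)


document = build_stack_source([
    (10.1, ["Executing Func for Sample 1", "Calling Func", "Main"]),
    (10.1, ["Executing Func for Sample 2", "Calling Func", "Main"]),
])
print(json.dumps(document, indent=4))
```
</pre>
                    <p>
                        Calling <code>write_perfview_json("example.perfView.json", ...)</code> with the same list
                        would save the data under a name that ends in the .perfView.json suffix mentioned above.
                    </p>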
                </li>
                <li>
                    Version 1.8.28 2/4/16
                    <ul>
                        <li>
                            Added support for doing performance investigations with Linux Perf Events data.   Basically, if
                            you collect data with the bash script https://raw.githubusercontent.com/dotnet/corefx-tools/master/src/performance/perfcollect/perfcollect
                            it will run the Linux 'perf' tool, which will collect CPU samples, convert them to a .data.txt file
                            (which is a textual representation of the data) and then ZIP it into a .trace.zip file.   PerfView
                            knows how to decode either the uncompressed .data.txt file or the zipped .trace.zip file and
                            display it as a stack view.   Thus you can now do Linux performance investigations with PerfView.
                        </li>
                    </ul>
                </li>
                <li>
                    Version 1.8.25 2/2/16
                    <ul>
                        <li>
                            Improvements in Start-Stop time.   UNKNOWN_ASYNC displayed more often, some AWAIT time shown more often.
                        </li>
                    </ul>
                </li>
                <li>
                    Version 1.8.24 1/27/16
                    <ul>
                        <li>
                            When opening 'Drill Into' windows, the columns are not in the order of the parent window in the ByName view.
                            Fixed this.
                        </li>
                    </ul>
                </li>
                <li>
                    Version 1.8.23 1/26/16
                    <ul>
                        <li>
                            Merging failed on Win7 and Win2k8 systems in PerfView Version 1.8.  This means you could still analyze on
                            the machine where you collected, but symbols would fail to look up if you took the trace off the system.
                            Fixed by including an old version of KernelTraceControl.dll and using it on Win7 systems.
                        </li>
                    </ul>
                </li>
                <li>
                    Version 1.8.22 1/23/16
                    <ul>
                        <li>
                            Fixed ArgumentOutOfRange exceptions thrown in EventView for some events (strings with length prefixes)
                        </li>
                        <li>Don't crash if regular expressions are incorrect in Events view. </li>
                    </ul>
                </li>
                <li>
                    Version 1.8.21 1/18/16
                    <ul>
                        <li>
                            Extended perfView.xml file format so that it can more easily consume 'ad hoc' creation of stacks.
                            It still accepts the 'interned' scheme where you give IDs to each frame and stack and use those
                            to create samples, but now you can specify the frames inline with each sample, like this:
                            <pre>
&lt;StackWindow&gt;
    &lt;StackSource&gt;
        &lt;Samples&gt;
            &lt;Sample Time="10"&gt;
                Executing Func for Sample 1
                Calling Func
                Main
            &lt;/Sample&gt;
            &lt;Sample Time="20"&gt;
                Executing Func for Sample 2
                Calling Func
                Main
            &lt;/Sample&gt;
        &lt;/Samples&gt;
    &lt;/StackSource&gt; 
&lt;/StackWindow&gt;
                    </pre>
                            While this format is inefficient (you repeat many strings in many stacks), it is sometimes
                            convenient, and it is easy enough to support.   There are more details which I will blog about in
                            the near future.
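                            <p>
                                Because each sample simply lists its frames one per line, such a file is easy to emit
                                from a script. The following is a minimal Python sketch using the standard
                                <code>xml.etree.ElementTree</code> module. It reproduces only the elements shown in
                                the example above; the helper name <code>build_stack_window</code> is hypothetical,
                                and whether PerfView is sensitive to the exact whitespace around frames is not
                                stated here, so treat the layout as an assumption.
                            </p>
<pre>
```python
import xml.etree.ElementTree as ET


def build_stack_window(samples):
    """Build a StackWindow element with one inline Sample per (time, frames) pair."""
    window = ET.Element("StackWindow")
    source = ET.SubElement(window, "StackSource")
    samples_node = ET.SubElement(source, "Samples")
    for time, frames in samples:
        sample = ET.SubElement(samples_node, "Sample", Time=str(time))
        # One frame per line, mirroring the inline layout of the example.
        sample.text = "\n" + "\n".join(frames) + "\n"
    return window


window = build_stack_window([
    (10, ["Executing Func for Sample 1", "Calling Func", "Main"]),
    (20, ["Executing Func for Sample 2", "Calling Func", "Main"]),
])
xml_text = ET.tostring(window, encoding="unicode")
print(xml_text)
```
</pre>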
                        </li>
                    </ul>
                </li>
                <li>
                    Version 1.8.20 1/13/16
                    <ul>
                        <li>
                            Improved the robustness of the UserCommand 'Listen' command in the face of bad events.
                        </li>
                    </ul>
                </li>
                <li>
                    Version 1.8.19 1/7/16
                    <ul>
                        <li>
                            Significantly improved the Thread Time with Start-Stop Activities.  The goal here is
                            that this view replaces the ASP.NET and Service Request view, and we are probably most of
                            the way there now.   I need to validate this more and then probably obsolete the other views.
                        </li>
                    </ul>
                </li>
                <li>
                    Version 1.8.15 12/4/15
                    <ul>
                        <li>
                            Fixed a fairly serious bug associated with the Events Viewer where you don't see some CLR events
                            (They appear in the left pane, but you never see them in the right pane even though there are
                            instances of them in the file).   Note that version 1.8.0 does not have this bug, it was introduced
                            relatively recently.
                        </li>
                        <li>
                            Added ActivityInfo and StartStopActivity fields to Events View.   ActivityInfo will show you the
                            creation and start time (and the raw ID) of the System.Threading.Tasks.Task that logged the event.
                            StartStopActivity shows you the name of the start-stop activity that
                            logged the event.
                        </li>
                    </ul>
                </li>
                <li>
                    Version 1.8.11 11/16/15
                    <ul>
                        <li>
                            Fix excessive warnings when converting ETL files.   Might also fix some StartStop Activity issues.
                        </li>
                    </ul>
                </li>
                <li>
                    Version 1.8.10 11/12/15
                    <ul>
                        <li>
                            Significant improvement in how activity tracking works.  Hopefully the stacks associated with 'with Tasks' views
                            will be better.
                        </li>
                        <li>
                            Added JIT Inlining feature that enables viewing all successful and failed inlining attempts, including the
                            JIT-supplied reason for why inlining wasn't performed in the failure cases.
                        </li>
                        <li>
                            Added finalization feature that tracks finalized objects and provides a table of each type with a finalized object
                            and the associated number of times an object of that type was finalized.
                        </li>
                    </ul>
                </li>
                <li>
                    Version 1.8.9 11/1/15
                    <ul>
                        <li>
                            There is a bug in release candidates of V4.6.1 where NGEN CreatePdb only works if the path of the NGEN image
                            is in the Native Image Cache (NIC), but V4.6.1 uses hard links for NGEN images that come from the install itself.
                            The result is that you don't get symbols for mscorlib, system, and system.core.   This adds a work-around
                            for this (normalize all paths to the NIC path before calling NGEN CreatePdb), until the runtime is fixed.
                        </li>
                    </ul>
                </li>
                <li>
                    Version 1.8.8 10/31/15
                    <ul>
                        <li>
                            Added support for the .NET V4.6.2 convention for NGEN PDB line numbers.   This means that if data is collected on
                            a V4.6.2 runtime, the lack of access to IL PDBs at data collection time is no longer an
                            impediment to getting line number information (that is, access to the corresponding IL PDB with line number
                            information is no longer needed to create an NGEN PDB that has line number information).
                        </li>
                    </ul>
                </li>
                <li>
                    Version 1.8.7 10/22/15
                    <ul>
                        <li>
                            Integrated changes that allow DynamicTraceEventParser to do everything that RegisteredTraceEventParser can do.
                            Removed the calls to RegisteredTraceEventParser.   This could break things but should not.   So far things look
                            OK.
                        </li>
                    </ul>
                </li>
                <li>
                    Version 1.8.6 10/12/15
                    <ul>
                        <li>
                            Integrated Lee's update of CLRMD that should make PerfView able to extract heap dumps from debugger dumps of
                            .NET Native processes.
                        </li>
                        <li>Added the DotNet (Telemetry) event ETW provider by default. </li>
                    </ul>
                </li>
                <li>
                    Version 1.8.5 10/6/15
                    <ul>
                        <li>
                            Made 'Any Stacks (with StartStop Activities)' and 'Any StartStopTree' public.
                        </li>
                    </ul>
                </li>
                <li>
                    Version 1.8.3 9/23/15
                    <ul>
                        <li>
                            Turned off System.Threading.Tasks.Task events that are verbose and only needed for debugging.  This was
                            useful before so that any traces I get have detailed information for debugging, but are now impacting
                            the cost of using PerfView in production when Tasks are used heavily.
                        </li>
                    </ul>
                </li>
                <li>
                    Version 1.8.2 9/13/15
                    <ul>
                        <li> /InMemoryCircularBuffer option was broken (Would throw a file not found exception in SetFileName).  Fixed this. </li>
                    </ul>
                </li>
                <li>
                    Version 1.8.1 9/3/15
                    <ul>
                        <li>
                            Fixed issue where Debug versions were asserting that two stacks were attached to the same event
                            because kernel and user mode stacks were not being stitched together properly (mostly in rare cases
                            where thread-starts were happening)
                        </li>
                    </ul>
                </li>

                <li>
                    Version 1.8.0 8/30/15
                    <ul>
                        <li> Release to Web. </li>
                    </ul>
                </li>
                <li>
                    Version 1.7.31 8/19/15
                    <ul>
                        <li>
                            Update code that does merging so it works properly on Win10.  It does not have an effect if you look
                            at the events with PerfView, but on Win10 until this change, data collected with PerfView would not
                            parse EventSource events properly in WPA.
                        </li>
                    </ul>
                </li>
            </ul>
            <p>
                &nbsp;
            </p>
            <p>
                &nbsp;
            </p>
            <p>
                &nbsp;
            </p>
            <p>
                &nbsp;
            </p>
            <p>
                &nbsp;
            </p>
            <p>
                &nbsp;
            </p>
            <p>
                &nbsp;
            </p>
            <p>
                &nbsp;
            </p>
            <p>
                &nbsp;
            </p>
            <p>
                &nbsp;
            </p>
            <p>
                &nbsp;
            </p>
            <p>
                &nbsp;
            </p>
            <p>
                &nbsp;
            </p>
            <p>
                &nbsp;
            </p>
            <p>
                &nbsp;
            </p>
            <p>
                &nbsp;
            </p>
</body>
</html>