{ "cells": [ { "cell_type": "markdown", "metadata": { "dotnet_interactive": { "language": "fsharp" } }, "source": [ "# Perf Avore: A Performance Analysis and Monitoring Tool in FSharp\n", "\n", "For my 2021 F# Advent Submission (5 years of submissions!!!!), I developed a Performance Based Monitoring and Analysis Tool called \"_Perf-Avore_\". It applies user specified __Rules__, each consisting of __Conditions__ that are matched against Trace Events from either an .ETL trace or a real time session; if the conditions are met, the __Actions__ specified in the rule are invoked. A __Condition__ can check whether a trace event property is an anomaly, i.e. a deviant point according to an anomaly detection algorithm, or simply whether it crosses a threshold value specified in the rule. Similarly, different types of __Actions__ lead to different outputs, such as printing out the callstack, charting the data point or simply alerting the user that a condition is met. \n", "\n", "The __purpose__ of Perf Avore is to provide users an easy and configurable way to detect and diagnose performance issues effectively by specifying details that are pertinent to performance issues in the rule itself. A use case, for example, is detecting spikes in memory allocations that can put unwanted pressure on the Garbage Collector and inevitably slow down the process. A rule that tracks ``AllocationAmount`` on the ``GC/AllocationTick`` event and prints out the callstack whenever it goes above a specified amount can shed light on the impetus behind the increased pressure.\n", "\n", "## High Level Overview\n", "\n", "![High Level Idea](Images/HighlevelIdea.png)\n", "\n", "1. Users provide rules.\n", "    1. Rules consist of conditions and actions.\n", "    2. Conditions Include: \n", "        1. The Name of the Trace Event and the property they'd like to track. \n", "        2. The condition or case on which they'd like to act.\n", "2. 
Trace Events are proffered to the rules engine to apply the rules to.\n", "3. Based on either a given trace or by real time monitoring, conditions are checked for and actions are invoked based on a stream of trace events.\n", "4. Examples of Rules:\n", " 1. ``GC/AllocationTick.AllocationAmount > 200000 : Print Alert``\n", " 2. ``ThreadPoolWorkerThreadAdjustment/Stats.Throughput < 4 : Print CallStack``\n", " 3. ``GC/HeapStats.GenerationSize0 isAnomaly DetectIIDSpike : Print Chart``\n", "\n", "The code is available [here](https://github.com/MokoSan/FSharpAdvent_2021/tree/main/src/PerfAvore/PerfAvore.Console). \n", "\n", "To \n", "\n", "1. Directly jump into the details without reading about the experience developing in FSharp and the motivation, click [here](https://nbviewer.org/github/MokoSan/PerfAvore/blob/main/AdventSubmission.ipynb#Plan).\n", "2. Start learning how to run Perf Avore click [here](https://nbviewer.org/github/MokoSan/PerfAvore/blob/main/AdventSubmission.ipynb#How-To-Run-Perf-Avore). \n", "\n", "## Experience Developing in FSharp \n", "\n", "F#, once again, didn't fail to deliver an incredible development experience! \n", "Despite not developing in F# for an extended period of time (much to my regret - I kicked myself about this during last year's [submission](https://bit.ly/3hhhRjq) as well), I was able to let the muscle memory from my previous projects kick in and reached a productive state surprisingly quickly; I'd like to underscore that this is more of a testament to the ease of usage of the language speaking volumes about the user-friendly nature of the language itself (and not necessarily my some-what-sophomoric acumen). \n", "\n", "Granted, I didn't make use of all the bells and whistles the language had to offer, what I did make use of was incredibly easy to get stuff done with. 
\n", "The particular aspects of the language that made it easy to develop a Domain Specific Language, a parser for that domain specific language and dynamic application of the actions are Pattern Matching and Immutable Functional Data Structures such as Records and Discriminated Unions, which express the domain succinctly and lucidly for both the developer and the reader.\n", "\n", "An image that typifies the incredibly accessible nature of F# is the following one filched from a presentation by [Don Syme](https://twitter.com/dsymetweets) and [Kathleen Dollard](https://twitter.com/KathleenDollard) during this year's .NET Conf in November:\n", "\n", "![Why FSharp](Images/WhyFSharp.jpg)\n", "\n", "## Inspiration For the Project\n", "\n", "Perf Avore was heavily inspired by [maoni0's](https://twitter.com/maoni0) [realmon](https://github.com/Maoni0/realmon), a monitoring tool that tells you when GCs happen in a process and some characteristics about these GCs. My contributions to realmon and the associated interactions were instrumental in coming up with this idea and its implementation.\n", "\n", "Additionally, as a Perf Engineer, I find that there are times where I need to arduously load traces in PerfView, resolve symbols and wait until all the windows open up to do basic things such as look up a single call stack for a single event or look up the payload value of a single event. By devising a simpler solution, I wish to reduce my perf investigation time as I build on this project.\n", "\n", "## 5 Years Going Strong!\n", "\n", "It has been 5 years of submissions to the FSharp Advent event and it has been an awesome experience. Here are links to my previous posts:\n", "\n", "1. [2020: Bayesian Inference in F#](https://bit.ly/3hhhRjq)\n", "2. [2019: Building A Simple Recommendation System in F#](http://t.co/KqE8kfaZQ7)\n", "3. [2018: An Introduction to Probabilistic Programming in F#](https://t.co/fdssLnvzLX)\n", "4. 
2017: The Lord of The Rings: An F# Approach\n", " 1. [Introduction](https://t.co/8qGEiwNniY)\n", " 2. [The Path of the Hobbits](https://t.co/UtFQRj3W3X)\n", " 3. [The Path of the Wizard](https://t.co/6AzIg7voAb)\n", " 4. [The Path of the King](https://t.co/ko6bubJqsw)\n", "\n", "Now that a basic overview and other auxiliary topics have been covered, without much more ceremony, I'll be diving into how I built Perf Avore. \n", "\n", "## Plan\n", "\n", "The plan to get rule applications working is threefold:\n", "\n", "1. __Parse Rules__: Convert the user inputted string based rules to a domain defined Rule.\n", "2. __Process Trace Events__: Retrieve trace events from either a trace or a real time process.\n", "3. __Apply Rules__: If the conditions of a rule are met, invoke the action associated with the rule.\n", "\n", "![Birds Eye View](Images/BirdsEyeView.png)\n", "\n", "However, before implementation is presented, it is of paramount importance to define the domain.\n", "\n", "## The Domain\n", "\n", "A Rule is defined as having a __Condition__ and an __Action__. \n", "\n", "``GC/AllocationTick.AllocationAmount > 200000 : Print Alert``\n", "\n", "Here, the user requests that for the said process, an alert will be printed if the ``AllocationAmount`` of the ``GC/AllocationTick`` event is greater than 200,000 bytes. The action if the condition is met is that of alerting the user by outputting a message. \n", "\n", "A rule, more generally, is of the following format: \n", "\n", "``EventName.PropertyName ConditionalOperator ConditionalOperand : ActionOperator ActionOperand``\n", "\n", "where:\n", "\n", "| Part | Description | \n", "| ----------- | ----------- |\n", "| Event Name | The event name from the trace / real time analysis for which we want to look up the property. | \n", "| Property Name | A double property (this may change in the future) for which we'd want to construct a rule for. 
| \n", "| Conditional Operator | An operator that, along with the Conditional Operand, dictates the situation in which we'll invoke an action. | \n", "| Conditional Operand | The threshold value or the name of the anomaly detection algorithm that, along with the Conditional Operator, dictates the situation in which we'll invoke an action. | \n", "| Action Operator | The operator that, along with the Action Operand, will be invoked if a condition is met. | \n", "| Action Operand | The operand to which the Action Operator will be applied in case a condition is met. | \n", "\n", "The __Condition__ is modeled as the following combination of records and discriminated unions:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "dotnet_interactive": { "language": "fsharp" } }, "outputs": [], "source": [ "// src/PerfAvore/PerfAvore.Console/RulesEngine/Domain.fs\n", "\n", "type Condition = \n", "    { Conditioner : Conditioner\n", "      ConditionType : ConditionType\n", "      ConditionalValue : ConditionalValue }\n", "and Conditioner = \n", "    { ConditionerEvent : ConditionerEvent \n", "      ConditionerProperty : ConditionerProperty }\n", "and ConditionType = \n", "    | LessThan\n", "    | LessThanEqualTo\n", "    | GreaterThan\n", "    | GreaterThanEqualTo\n", "    | Equal\n", "    | NotEqual\n", "    | IsAnomaly\n", "and ConditionalValue =\n", "    | Value of double\n", "    | AnomalyDetectionType of AnomalyDetectionType \n", "and ConditionerEvent = string\n", "and ConditionerProperty = string\n", "and AnomalyDetectionType =\n", "    | DetectIIDSpike" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To accommodate Anomaly Detection algorithms, we add ``IsAnomaly`` as a ``ConditionType`` which, rather than relying on a hardcoded threshold for the Conditional Value, delegates the decision to invoke an action to an Anomaly Detection algorithm. 
The one that's implemented for this submission is that of an Independently and Identically Distributed Spike anomaly detection algorithm; more details are given below.\n", "\n", "For the sake of completeness, the conditions we define are the following:\n", "\n", "| Condition Operation | Description | \n", "| ----------- | ----------- |\n", "| IsAnomaly | The condition to match on an anomaly detection algorithm. | \n", "| > >= < <= != = | Self explanatory conditional matching based on the value of the event property specified by the rule |\n", "\n", "It is worth noting, right now the library only accepts numeric payloads.\n", "\n", "An Action is modeled as a record of an __ActionOperator__ and an __ActionOperand__:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "dotnet_interactive": { "language": "fsharp" } }, "outputs": [], "source": [ "// src/PerfAvore/PerfAvore.Console/RulesEngine/Domain.fs\n", "\n", "type Action = \n", " { ActionOperator: ActionOperator; ActionOperand: ActionOperand }\n", "and ActionOperator = \n", " | Print\n", "and ActionOperand =\n", " | Alert\n", " | CallStack\n", " | Chart" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following are the currently implemented action operands:\n", "\n", "| Name of Action Operands | Description | \n", "| ----------- | ----------- |\n", "| Alert | Alerting Mechanism that'll print out pertinent details about the rule invoked and why it was invoked. |\n", "| Call Stack | If a call stack is available, it will be printed out on the console. 
|\n", "| Chart | A chart of data points preceding and including the one that triggered the condition of the rule is generated and rendered as an html file | \n", "\n", "As of now, ``Print`` is the only operator; it simply outputs the operand to the Console.\n", "\n", "The Rule is a combination of a Condition and an Action along with an identifier and the original rule passed in by the user, and is therefore modeled as:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "dotnet_interactive": { "language": "fsharp" } }, "outputs": [], "source": [ "// src/PerfAvore/PerfAvore.Console/RulesEngine/Domain.fs\n", "\n", "type Rule = \n", "    { Id : Guid\n", "      Condition : Condition\n", "      Action : Action \n", "      InputRule : string }" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that the domain is defined, the rule parsing logic can be explained; this makes extensive use of pattern matching after deserializing a list of rules from a specified JSON file that could look like the following:\n", "\n", "```\n", "[ \n", "    \"GC/AllocationTick.AllocationAmount > 108000: Print Alert\",\n", "    \"GC/AllocationTick.AllocationAmount isAnomaly DetectIIDSpike : Print CallStack\"\n", "]\n", "```\n", "\n", "## Step 1: Parse Rule\n", "\n", "![Step 1](Images/Step1_ParseRule.png)\n", "\n", "This first step's goal is to convert the user inputted rule from a string to a Rule defined in the domain. The parsing logic is split into two main functions that parse the Condition and the Action separately. 
The ``parseCondition`` function is defined as follows and constructs the condition from the aforementioned constituents:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "dotnet_interactive": { "language": "fsharp" } }, "outputs": [], "source": [ "// src/PerfAvore/PerfAvore.Console/RulesEngine/Parser.fs\n", "\n", "let parseCondition (conditionAsString : string) : Condition = \n", "\n", "    let splitCondition : string[] = conditionAsString.Split(\" \", StringSplitOptions.RemoveEmptyEntries)\n", "    \n", "    // Precondition check\n", "    if splitCondition.Length <> 3\n", "    then invalidArg (nameof conditionAsString) (\"Incorrect format of the condition. Format is: Event.Property Condition ConditionalValue. For example: GCEnd.SuspensionTimeMSec >= 298\")\n", "    \n", "    // Condition Event and Property\n", "    let parseConditioner : Conditioner = \n", "        let splitConditioner : string[] = splitCondition.[0].Split(\".\", StringSplitOptions.RemoveEmptyEntries)\n", "        let parseConditionEvent : ConditionerEvent = splitConditioner.[0]\n", "        let parseConditionProperty : ConditionerProperty = splitConditioner.[1]\n", "\n", "        { ConditionerEvent = parseConditionEvent; ConditionerProperty = parseConditionProperty }\n", "\n", "    // Condition Type\n", "    let parseConditionType : ConditionType =\n", "        match splitCondition.[1].ToLower() with\n", "        | \">\" | \"greaterthan\" -> ConditionType.GreaterThan \n", "        | \"<\" | \"lessthan\" -> ConditionType.LessThan\n", "        | \">=\" | \"greaterthanequalto\" | \"greaterthanorequalto\" -> ConditionType.GreaterThanEqualTo\n", "        | \"<=\" | \"lessthanequalto\" | \"lessthanorequalto\" -> ConditionType.LessThanEqualTo\n", "        | \"=\" | \"equal\" | \"equals\" -> ConditionType.Equal\n", "        | \"!=\" | \"notequal\" -> ConditionType.NotEqual\n", "        | \"isanomaly\" -> ConditionType.IsAnomaly\n", "        // Note the interpolated string prefix: $\"...\" rather than \"$...\".\n", "        | _ -> invalidArg (nameof splitCondition) ($\"{splitCondition.[1]} is an unrecognized condition type.\")\n", "\n", "    // Condition Value\n", "    let 
parseConditionValue : ConditionalValue =\n", "        let conditionalValueAsString = splitCondition.[2].ToLower()\n", "        let checkDouble, doubleValue = Double.TryParse conditionalValueAsString \n", "        match checkDouble, doubleValue with\n", "        | true, v -> ConditionalValue.Value(v)\n", "        | false, _ -> \n", "            match conditionalValueAsString with\n", "            | \"detectiidspike\" -> ConditionalValue.AnomalyDetectionType(AnomalyDetectionType.DetectIIDSpike)\n", "            | _ -> invalidArg (nameof splitCondition) ($\"{conditionalValueAsString} is an unrecognized anomaly detection type.\")\n", "    \n", "    { Conditioner = parseConditioner; ConditionType = parseConditionType; ConditionalValue = parseConditionValue }" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similarly, the action parsing logic is implemented via the ``parseAction`` function:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "dotnet_interactive": { "language": "fsharp" } }, "outputs": [], "source": [ "// src/PerfAvore/PerfAvore.Console/RulesEngine/Parser.fs\n", "\n", "let parseAction (actionAsAString : string) : Action = \n", "    let splitAction : string[] = actionAsAString.Split(\" \", StringSplitOptions.RemoveEmptyEntries)\n", "\n", "    // ActionOperator\n", "    let parseActionOperator : ActionOperator = \n", "        match splitAction.[0].ToLower() with\n", "        | \"print\" -> ActionOperator.Print\n", "        | _ -> invalidArg (nameof splitAction) ($\"{splitAction.[0]} is an unrecognized Action Operator.\")\n", "\n", "    // ActionOperand \n", "    let parseActionOperand : ActionOperand = \n", "        match splitAction.[1].ToLower() with\n", "        | \"alert\" -> ActionOperand.Alert\n", "        | \"callstack\" -> ActionOperand.CallStack\n", "        | \"chart\" -> ActionOperand.Chart\n", "        | _ -> invalidArg (nameof splitAction) ($\"{splitAction.[1]} is an unrecognized Action Operand.\")\n", "\n", "    \n", "    { ActionOperator = parseActionOperator; ActionOperand = parseActionOperand }" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, 
these 2 parsing functions are combined to parse a particular rule:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "dotnet_interactive": { "language": "fsharp" } }, "outputs": [], "source": [ "// src/PerfAvore/PerfAvore.Console/RulesEngine/Parser.fs\n", "\n", "let parseRule (ruleAsString : string) : Rule = \n", "    let splitRuleAsAString : string[] = ruleAsString.Split(\":\")\n", "    let condition : Condition = parseCondition splitRuleAsAString.[0]\n", "    let action : Action = parseAction splitRuleAsAString.[1]\n", "    { Condition = condition; Action = action; InputRule = ruleAsString; Id = Guid.NewGuid() }" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have the functionality of parsing a rule, we want to move on to Step 2 i.e. Processing Trace Events. \n", "\n", "## Step 2: Process Trace Events\n", "\n", "![Step 2: Process Trace Events](Images/Step2_ProcessTraceEvents.png)\n", "\n", "Since both reading Trace Events from a .ETL file and real time event processing had to be accommodated, a split in the logic is made using a command line parameter ``TracePath``; the absence of this command line parameter indicates that we want to kick off the real time processing logic.\n", "\n", "``Argu``, an F# specific command line argument parsing library, is used to pattern match on the types of the command line args, such as the following:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "dotnet_interactive": { "language": "fsharp" } }, "outputs": [], "source": [ "// src/PerfAvore/PerfAvore.Console/CommandLine.fs\n", "\n", "#r \"nuget:Argu\" // Added specifically for this notebook.\n", "\n", "open Argu\n", "\n", "type Arguments = \n", "    | [<Mandatory>] ProcessName of string\n", "    | TracePath of Path : string\n", "    | RulesPath of Path : string\n", "\n", "    interface IArgParserTemplate with\n", "        member s.Usage =\n", "            match s with\n", "            | TracePath _ -> \"Specify a Path to the Trace.\"\n", "            | ProcessName _ -> \"Specify a Process 
Name.\"\n", " | RulesPath _ -> \"Specify a Path to a Json File With the Rules.\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The usage of the trace path is incorporated like the following:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "dotnet_interactive": { "language": "fsharp" } }, "outputs": [ { "data": { "text/html": [ "
True
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "// src/PerfAvore/PerfAvore.Console/Program.fs\n", " \n", "// This is passed in from the command line but for the sake of demonstration, we'll include this as an array literal.\n", "let argv = [| \"--tracepath\"; \"Path.etl\"; \"--processname\"; \"Test.exe\"|]\n", "\n", "let parser = ArgumentParser.Create<Arguments>()\n", "let parsedCommandline = parser.Parse(inputs = argv)\n", "\n", "let containsTracePath : bool = parsedCommandline.Contains TracePath\n", "containsTracePath" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To interface with the Trace Events, the ``Microsoft.Diagnostics.Tracing.TraceEvent`` library that contains the ``TraceLog`` API is used both to read events from the .ETL file and for real time processing on Windows. Inspired by [this](https://twitter.com/dsymetweets/status/1472655546885058570) tweet highlighting [this](https://carpenoctem.dev/blog/fsharp-for-linux-people/) blogpost about F# integration in Linux, I pursued adding rule application functionality for Linux and MacOS; that [API](https://www.nuget.org/packages/Microsoft.Diagnostics.NETCore.Client/), ``Microsoft.Diagnostics.NETCore.Client``, is different from the TraceLog one and will be highlighted below in the code.\n", "\n", "For further details about the TraceLog API, refer to [this](https://github.com/microsoft/perfview/blob/main/documentation/TraceEvent/TraceEventProgrammersGuide.md#higher-level-processing-tracelog) doc. The stream of events is obtained by one of the following two functions, depending on whether the ``tracepath`` is specified as a command line argument. \n", "\n", "The code that retrieves the ``TraceLog`` abstraction if the ``tracepath`` arg is specified is the following:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "dotnet_interactive": { "language": "fsharp" } }, "outputs": [ { "data": { "text/html": [ "
Installed Packages
  • Microsoft.Diagnostics.Tracing.TraceEvent, 2.0.74
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "// src/PerfAvore/PerfAvore.Console/TraceSession.fs\n", "\n", "#r \"nuget: Microsoft.Diagnostics.Tracing.TraceEvent\"\n", "\n", "open Microsoft.Diagnostics.Tracing.Etlx\n", "\n", "let getTraceLogFromTracePath (tracePath : string) : TraceLog = \n", "    TraceLog.OpenOrConvert tracePath" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And the code that retrieves the ``TraceEventDispatcher`` and ``Session`` abstractions responsible for real time processing, used if the ``tracepath`` arg isn't specified, is the following; it supports Windows as well as Linux/MacOS:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "dotnet_interactive": { "language": "fsharp" } }, "outputs": [ { "data": { "text/html": [ "
Installed Packages
  • Microsoft.Diagnostics.NETCore.Client, 0.2.257301
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "// src/PerfAvore/PerfAvore.Console/TraceSession.fs\n", "\n", "// Needed for compilation\n", "#r \"nuget:Microsoft.Diagnostics.NETCore.Client\"\n", "\n", "open System\n", "open System.Collections.Generic\n", "open System.Runtime.InteropServices\n", "\n", "open Microsoft.Diagnostics.NETCore.Client\n", "open Microsoft.Diagnostics.Tracing.Etlx\n", "open Microsoft.Diagnostics.Tracing.Session\n", "open Microsoft.Diagnostics.Tracing.Parsers\n", "open Microsoft.Diagnostics.Tracing\n", "\n", "open System.Diagnostics\n", "\n", "// Ignore this impl for now. More details about this in Step 3 but for the sake of successful compilation, we need this.\n", "// It is declared after the opens so that the TraceEvent type resolves.\n", "let applyRule (rule: Rule) (traceEvent : TraceEvent) : unit = \n", "    ()\n", "\n", "let getProcessIdForProcessName (processName : string) : int =\n", "    let processes = Process.GetProcessesByName(processName)\n", "    if processes.Length < 1 then invalidArg (nameof processName) $\"No process with name: {processName} exists.\"\n", "    // For the sake of simplicity, choose the first process available with the said name. 
\n", "    else processes.[0].Id\n", "\n", "let getRealTimeSession (processName : string) (parsedRules : Rule list) : TraceEventDispatcher * IDisposable = \n", "\n", "    let callbackForAllEvents (processId : int): Action<TraceEvent> = \n", "        Action<TraceEvent>(fun traceEvent -> \n", "            parsedRules\n", "            |> List.iter(fun rule ->\n", "                if processId = traceEvent.ProcessID then applyRule rule traceEvent))\n", "\n", "    let processId = getProcessIdForProcessName processName\n", "\n", "    // Windows.\n", "    if RuntimeInformation.IsOSPlatform OSPlatform.Windows then\n", "        let traceEventSession : TraceEventSession = new TraceEventSession($\"Session_{Guid.NewGuid()}\");\n", "\n", "        let keywords : uint64 = uint64(ClrTraceEventParser.Keywords.All) \n", "\n", "        traceEventSession.EnableKernelProvider(KernelTraceEventParser.Keywords.All, KernelTraceEventParser.Keywords.None) |> ignore\n", "        traceEventSession.EnableProvider(ClrTraceEventParser.ProviderGuid, TraceEventLevel.Verbose, keywords) |> ignore\n", "\n", "        // Once the pertinent providers are enabled, create the trace log event source. \n", "        let traceLogEventSource = TraceLog.CreateFromTraceEventSession traceEventSession\n", "\n", "        // Add all the necessary callbacks.\n", "        traceLogEventSource.Clr.add_All(callbackForAllEvents processId) |> ignore\n", "        traceLogEventSource.Kernel.add_All(callbackForAllEvents processId) |> ignore\n", "\n", "        // TODO: Enable the GLAD events - only available for real time processing.\n", "        // ala: https://devblogs.microsoft.com/dotnet/556-2/\n", "        traceLogEventSource, traceEventSession\n", "\n", "    // Linux / MacOS.\n", "    else\n", "        let keywords : int64 = int64(ClrTraceEventParser.Keywords.All) \n", "        let eventPipeProvider : EventPipeProvider = \n", "            EventPipeProvider(\"Microsoft-Windows-DotNETRuntime\", Tracing.EventLevel.Informational, keywords)\n", "        let providers = List<EventPipeProvider>()\n", "        providers.Add eventPipeProvider\n", "\n", "        // For the sake of simplicity, choose the first process available with the said name. 
\n", "        let processId = getProcessIdForProcessName processName\n", "        let client = DiagnosticsClient(processId)\n", "        let eventPipeSession = client.StartEventPipeSession(providers, false)\n", "        let source = new EventPipeEventSource(eventPipeSession.EventStream)\n", "\n", "        source.Clr.add_All(callbackForAllEvents processId) |> ignore\n", "        source.Kernel.add_All(callbackForAllEvents processId ) |> ignore\n", "\n", "        source, eventPipeSession" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This function is a bit more involved and requires turning on the Kernel and Clr Providers for those types of events. The events will flow in via callbacks that are subscribed to via the ``callbackForAllEvents`` function. Finally, the function returns the ``TraceEventDispatcher`` that'll be used for callstack retrieval, along with the session, which must be disposed once processing ends; otherwise, we'll run into a session leak. \n", "\n", "It is worth noting that admin privileges are needed to enable the Kernel provider; this implies that for real time processing on Windows, the process must be started with admin privileges. 
\n", "\n", "Finally, in the main program, the events are subscribed to in the following manner:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "dotnet_interactive": { "language": "fsharp" } }, "outputs": [], "source": [ "// src/PerfAvore/PerfAvore.Console/Program.fs\n", " \n", "// Hard coded values for the sake of successful compilation.\n", "let processName = \"Test.exe\"\n", "let processID = -1\n", "let parsedRules = [ parseRule \"GC/AllocationTick.AllocationAmount > 110000 : Print CallStack\" ]\n", " \n", "let startProcessingEvents() : unit = \n", "    // If the trace log file is provided, use the Trace Log API to traverse through all events.\n", "    if containsTracePath then \n", "        let tracePathArgs = parsedCommandline.GetResult TracePath\n", "        let traceLog = getTraceLogFromTracePath tracePathArgs\n", "        let events = traceLog.Events \n", "        let eventNamesToFilter = parsedRules |> List.map(fun r -> r.Condition.Conditioner.ConditionerEvent.ToString())\n", "\n", "        let applyRulesForAllEvents (events : TraceEvent seq) (rules : Rule list) = \n", "            events\n", "            // Consider events with name of the process and if they contain the events defined in the rules.\n", "            |> Seq.filter(fun e -> e.ProcessID = processID && \n", "                                   eventNamesToFilter |> List.contains(e.EventName))\n", "            |> Seq.iter(fun e -> \n", "                rules\n", "                |> List.iter(fun rule -> applyRule rule e ))\n", "        applyRulesForAllEvents events parsedRules\n", "\n", "    // Else, start a Real Time Session.\n", "    // Requires admin privileges\n", "    else\n", "        let traceLogEventSource, session = getRealTimeSession processName parsedRules\n", "        Console.CancelKeyPress.Add(fun e -> session.Dispose() |> ignore )\n", "\n", "        traceLogEventSource.Process() |> ignore\n", "        ()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the TraceLog API allows iteration through Events as if it were a plain-vanilla ``seq``, unlike the real time session that requires callback registration.\n", "\n", "The next 
step is to start applying rules and go into details about the implementation of the Action Engine and the different types of actions as well as the anomaly detection logic.\n", "\n", "## Step 3: Apply Rules\n", "\n", "![Step 3](Images/Step3_ApplyRules.png)\n", "\n", "Lastly, the logic that's responsible for applying the rule if the condition is met is added. Applying a Rule to a particular ``TraceEvent`` instance means checking if the condition specified in the rule matches and, if so, invoking the action.\n", "\n", "The condition checking logic is as follows:\n", "\n", "1. Check if the name of the ``TraceEvent`` matches the event name in the condition of the Rule.\n", "2. Check if the property we want to match on is in the ``TraceEvent``.\n", "3. Check if the condition matches based on the ``TraceEvent`` and the rule's condition. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "dotnet_interactive": { "language": "fsharp" } }, "outputs": [], "source": [ "// src/PerfAvore/PerfAvore.Console/RulesEngine/ActionEngine.fs\n", "\n", "open System\n", "open System.Linq\n", "\n", "let applyRule (rule : Rule) (traceEvent : TraceEvent) : unit =\n", "\n", "    // Helper fn checks if the condition is met for the traceEvent.\n", "    let checkCondition : bool =\n", "        let condition : Condition = rule.Condition\n", "\n", "        // Match the event name.\n", "        let matchEventName (traceEvent : TraceEvent) : bool = \n", "            traceEvent.EventName = condition.Conditioner.ConditionerEvent\n", "        \n", "        // Check if the specified payload exists.\n", "        let checkPayload (traceEvent : TraceEvent) : bool = \n", "            if traceEvent.PayloadNames.Contains condition.Conditioner.ConditionerProperty then true\n", "            else false\n", "\n", "        // Early return if the payload is unavailable since it will throw an exception later if we let it slide. 
\n", "        if ( checkPayload traceEvent ) = false then \n", "            false\n", "        else\n", "            let payload : double = Double.Parse (traceEvent.PayloadByName(condition.Conditioner.ConditionerProperty).ToString())\n", "\n", "            // Check if the condition matches.\n", "            let checkConditionValue (rule : Rule) (traceEvent : TraceEvent) : bool =\n", "                let conditionalValue : ConditionalValue = rule.Condition.ConditionalValue\n", "\n", "                match conditionalValue with\n", "                | ConditionalValue.Value value ->\n", "                    match condition.ConditionType with\n", "                    | ConditionType.Equal -> payload = value\n", "                    | ConditionType.GreaterThan -> payload > value\n", "                    | ConditionType.GreaterThanEqualTo -> payload >= value\n", "                    | ConditionType.LessThan -> payload < value\n", "                    | ConditionType.LessThanEqualTo -> payload <= value\n", "                    | ConditionType.NotEqual -> payload <> value\n", "                    | ConditionType.IsAnomaly -> false // This case should technically not be reached but adding it to prevent warnings.\n", "                | ConditionalValue.AnomalyDetectionType anomalyDetectionType ->\n", "                    match anomalyDetectionType with\n", "                    | AnomalyDetectionType.DetectIIDSpike ->\n", "                        // We'll be going over this logic below. Right now simply return false.\n", "                        false\n", "\n", "            // Match on Event Name, if the payload exists and the condition based on the trace event is met.\n", "            matchEventName traceEvent && checkPayload traceEvent && checkConditionValue rule traceEvent\n", "\n", "    // ... => Apply Actions. \n", "    ()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The simple ConditionType checks should be self-explanatory. The AnomalyDetection based approach, however, is a bit more involved. In general, for a point to be considered an anomaly, the context of \"amongst which other values is that data point an anomaly\" is needed; this implies that the history of the TraceEvents immediately before the point being checked should be kept in memory. 
As an aside, ``Microsoft.ML`` and ``Microsoft.ML.TimeSeries`` are the two nuget packages that are used for anomaly detection computation.\n", "\n", "To accommodate this logic, a rolling window of the last 'n - 1' points before the TraceEvent in question should be made available at the time of the anomaly detection computation. The implementation of the abstraction involves a queue with a capacity and an eviction policy that dequeues the oldest element and enqueues the incoming element if the queue is at capacity. This abstraction is named ``FixedSizeQueue``." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "dotnet_interactive": { "language": "fsharp" } }, "outputs": [], "source": [ "// src/PerfAvore/PerfAvore.Console/RulesEngine/AnomalyDetection/Service.fs\n", "\n", "open System\n", "open System.Collections.Concurrent\n", "\n", "type FixedSizeQueue<'T> (capacity : int) =\n", "    // Concurrency might not be necessary but better to be safe than sorry.\n", "    let queue = ConcurrentQueue<'T>()\n", "\n", "    member this.Capacity : int = capacity\n", "    member this.Count : int = queue.Count\n", "    member this.Print() : unit = \n", "        let stringRepr : string = String.Join(\",\", queue.ToArray())\n", "        printfn \"%A\" stringRepr\n", "\n", "    member this.Insert (item : 'T) : unit = \n", "        // If we are at capacity, evict the first item.\n", "        if queue.Count = capacity then \n", "            queue.TryDequeue() |> ignore\n", "        \n", "        // Enqueue the new item to the list.\n", "        queue.Enqueue(item)\n", "\n", "    member this.GetAll() : seq<'T> = \n", "        queue" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The service that orchestrates the retrieval of the last 'n' events maintains a ``FixedSizeQueue`` for each rule. Before defining the said service, the Anomaly Detection domain must be defined. 
To keep things as simple as possible, an anomaly detection algorithm takes in a ``Context`` and returns a ``Result``.\n", "\n", "The ``Context`` will, therefore, have to consist of details about the point in question, i.e. an input made up of the payload value and its associated timestamp, and the associated Rule. The ``Result`` indicates whether the point is an anomaly and the confidence with which the algorithm believes the point is an anomaly." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "dotnet_interactive": { "language": "fsharp" } }, "outputs": [ { "data": { "text/html": [ "
Installed Packages
  • Microsoft.ML, 1.7.0
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "// src/PerfAvore/PerfAvore.Console/RulesEngine/Domain.fs\n", "\n", "#r \"nuget:Microsoft.ML\"\n", "\n", "open Microsoft.ML.Data\n", "\n", "type AnomalyDetectionInput() =\n", " [<DefaultValue>]\n", " [<LoadColumn(0)>]\n", " val mutable public timestamp : double \n", "\n", " [<DefaultValue>]\n", " [<LoadColumn(1)>]\n", " val mutable public value : float32 \n", "\n", "type AnomalyDetectionContext = \n", " { Rule : Rule \n", " Input : AnomalyDetectionInput }\n", "type AnomalyDetectionResult = \n", " { Context : AnomalyDetectionContext\n", " IsAnomaly : bool\n", " PValue : double }" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have the Anomaly Detection domain defined, the service mentioned before that'll retrieve the fixed size queue can be implemented." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "dotnet_interactive": { "language": "fsharp" } }, "outputs": [], "source": [ "// src/PerfAvore/PerfAvore.Console/RulesEngine/AnomalyDetection/Service.fs\n", "\n", "open System\n", "open System.Collections.Concurrent\n", "\n", "type AnomalyDetectionContextService(capacity : int) = \n", " // Keyed on the Rule Id and Value is a FixedSizeQueue of AnomalyDetectionInput.\n", " // Each Rule that has Anomaly Detection associated with it must have its own Fixed Size Queue.\n", " let cache = ConcurrentDictionary<Guid, FixedSizeQueue<AnomalyDetectionInput>>()\n", "\n", " static member AnomalyPValueHistoryLength : int = 30\n", " static member AnomalyConfidence : double = 95.\n", "\n", " member this.Upsert (ruleId : Guid) (item : AnomalyDetectionInput) : unit =\n", " let queueExists, queue = cache.TryGetValue ruleId\n", " match queueExists, queue with\n", " | true, q -> \n", " q.Insert item\n", " | false, _ -> \n", " cache.[ruleId] <- FixedSizeQueue( capacity )\n", " cache.[ruleId].Insert item\n", "\n", " member this.TryRetrieve(ruleId : Guid) : AnomalyDetectionInput seq option = \n", " let queueExists, queue = cache.TryGetValue ruleId\n", " match queueExists, queue with\n", " | 
true, q -> Some (q.GetAll())\n", " | false, _ -> None" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The one Anomaly Detection algorithm implemented is the [Independent and Identically Distributed Spike Detector](https://github.com/dotnet/machinelearning/blob/510f0112d4fbb4d3ee233b9ca95c83fae1f9da91/src/Microsoft.ML.TimeSeries/IidSpikeDetector.cs) from ``Microsoft.ML.TimeSeries``, which makes use of adaptive kernel density estimation to compute p-values to decide how much of an anomaly a certain point is. \n", "\n", "The computation of the kernel p-value can be found [here](https://github.com/dotnet/machinelearning/blob/510f0112d4fbb4d3ee233b9ca95c83fae1f9da91/src/Microsoft.ML.TimeSeries/SequentialAnomalyDetectionTransformBase.cs#L475). To put it as simply as possible, the difference between the value of the point in question and all other points in the fixed size queue is computed, and we consider a point an anomaly if that difference is large. Of course, I am trivializing the details; my intention here is to highlight the intuition more than the gory statistical details. \n", "\n", "Now that all the ducks are in a row with regard to the Anomaly Detection computation, the rest of the implementation is the following:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "dotnet_interactive": { "language": "fsharp" } }, "outputs": [ { "data": { "text/html": [ "
Installed Packages
  • Microsoft.ML.TimeSeries, 1.7.0
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "// src/PerfAvore/PerfAvore.Console/RulesEngine/AnomalyDetection/IIDSpike.fs\n", "\n", "#r \"nuget:Microsoft.ML.TimeSeries\"\n", "\n", "open Microsoft.ML\n", "open Microsoft.ML.Data\n", "open Microsoft.ML.Transforms.TimeSeries\n", "open System.Collections.Generic\n", "\n", "open System.Linq\n", "\n", "let ctx : MLContext = MLContext()\n", "\n", "type Prediction() = \n", " [<DefaultValue>]\n", " [<VectorType(3)>] // prediction i.e. 0/1 + value i.e. payload + p-value\n", " val mutable public Prediction : double[]\n", "\n", "let getAnomaliesUsingIIDSpikeEstimation (input : AnomalyDetectionContext) \n", " (service : AnomalyDetectionContextService) \n", " : AnomalyDetectionResult =\n", " let retrievedInput = service.TryRetrieve input.Rule.Id \n", " let buffer = \n", " match retrievedInput with\n", " | Some b -> b \n", " | None -> failwith $\"Failed to look up Anomaly Detection Buffer for rule: {input.Rule.InputRule}\" \n", "\n", " let dataView = \n", " ctx.Data.LoadFromEnumerable<AnomalyDetectionInput>(buffer)\n", " \n", " // If p-value < (1 - confidence / 100.0) -> Alert i.e. anomaly.\n", " let anomalyPipeline : IidSpikeEstimator =\n", " ctx.Transforms.DetectIidSpike(\n", " outputColumnName = \"Prediction\",\n", " inputColumnName = \"value\",\n", " side = AnomalySide.Positive,\n", " confidence = AnomalyDetectionContextService.AnomalyConfidence, // Alert Threshold = 1 - options.Confidence / 100;\n", " pvalueHistoryLength = AnomalyDetectionContextService.AnomalyPValueHistoryLength )\n", "\n", " // For this model, fitting doesn't matter.\n", " let trainedAnomalyModel : IidSpikeDetector \n", " = anomalyPipeline.Fit(ctx.Data.LoadFromEnumerable(List<AnomalyDetectionInput>()))\n", " let transformedAnomalyData : IDataView \n", " = trainedAnomalyModel.Transform(dataView)\n", " let anomalies : Prediction seq = \n", " ctx.Data.CreateEnumerable<Prediction>(transformedAnomalyData, reuseRowObject = false)\n", "\n", " // Last one in the buffer since it's the most recent one. 
\n", " let inputPoint = anomalies.Last()\n", " { Context = input \n", " IsAnomaly = inputPoint.Prediction[0] = 1\n", " PValue = inputPoint.Prediction[2] }" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Tying the anomaly detection computation all together with the rest of the condition matching logic:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "dotnet_interactive": { "language": "fsharp" } }, "outputs": [], "source": [ "// src/PerfAvore/PerfAvore.Console/RulesEngine/ActionEngine.fs\n", "\n", "let anomalyDetectionContextService : AnomalyDetectionContextService = \n", " AnomalyDetectionContextService(AnomalyDetectionContextService.AnomalyPValueHistoryLength)\n", "\n", "let applyRule (rule : Rule) (traceEvent : TraceEvent) : unit =\n", "\n", " // Helper fn checks if the condition is met for the traceEvent.\n", " let checkCondition : bool =\n", " let condition : Condition = rule.Condition\n", "\n", " // Match the event name.\n", " let matchEventName (traceEvent : TraceEvent) : bool = \n", " traceEvent.EventName = condition.Conditioner.ConditionerEvent\n", " \n", " // Check if the specified payload exists.\n", " let checkPayload (traceEvent : TraceEvent) : bool = \n", " if traceEvent.PayloadNames.Contains condition.Conditioner.ConditionerProperty then true\n", " else false\n", "\n", " // Early return if the payload is unavailable since it will except later if we let it slide. 
\n", " if ( checkPayload traceEvent ) = false then \n", " false\n", " else\n", " let payload : double = Double.Parse (traceEvent.PayloadByName(condition.Conditioner.ConditionerProperty).ToString())\n", "\n", " // Add the new data point to the anomaly detection dict.\n", " let anomalyDetectionInput : AnomalyDetectionInput = \n", " AnomalyDetectionInput(timestamp = traceEvent.TimeStampRelativeMSec, value = float32(payload))\n", " anomalyDetectionContextService.Upsert rule.Id anomalyDetectionInput |> ignore\n", "\n", " // Check if the condition matches.\n", " let checkConditionValue (rule : Rule) (traceEvent : TraceEvent) : bool =\n", " let conditionalValue : ConditionalValue = rule.Condition.ConditionalValue\n", "\n", " match conditionalValue with\n", " | ConditionalValue.Value value ->\n", " match condition.ConditionType with\n", " | ConditionType.Equal -> payload = value\n", " | ConditionType.GreaterThan -> payload > value\n", " | ConditionType.GreaterThanEqualTo -> payload >= value\n", " | ConditionType.LessThan -> payload < value\n", " | ConditionType.LessThanEqualTo -> payload <= value\n", " | ConditionType.NotEqual -> payload <> value\n", " | ConditionType.IsAnomaly -> false // This case should technically not be reached but adding it to prevent warnings.\n", " | ConditionalValue.AnomalyDetectionType anomalyDetectionType ->\n", " match anomalyDetectionType with\n", " | AnomalyDetectionType.DetectIIDSpike ->\n", " let context = { Rule = rule; Input = anomalyDetectionInput }\n", " let result = getAnomaliesUsingIIDSpikeEstimation context anomalyDetectionContextService \n", " result.IsAnomaly\n", "\n", " // Match on Event Name, if the payload exists and the condition based on the trace event is met.\n", " matchEventName traceEvent && checkPayload traceEvent && checkConditionValue rule traceEvent\n", "\n", " ()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, the Action implementation logic is added. 
The actions that are implemented are the following:\n", "\n", "| Name of Action Type | Description | \n", "| ----------- | ----------- |\n", "| 1. Alert | Alerting Mechanism that'll print out pertinent details about the rule invoked and why it was invoked. |\n", "| 2. Callstack | If a call stack is available, it will be printed out on the console. |\n", "| 3. Chart | A chart of data points preceding and including the one that triggered the condition of the rule is generated and rendered as an HTML file. |\n", "\n", "Alerts are fairly straightforward and are implemented in the following manner using the ``Spectre.Console`` library for its aesthetic appeal:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "dotnet_interactive": { "language": "fsharp" } }, "outputs": [ { "data": { "text/html": [ "
Installed Packages
  • Spectre.Console, 0.43.0
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "// src/PerfAvore/PerfAvore.Console/RulesEngine/Actions/Alerts.fs\n", "\n", "#r \"nuget:Spectre.Console\"\n", "\n", "open Microsoft.Diagnostics.Tracing\n", "open Spectre.Console\n", "\n", "// Added this back to distinguish between this Rule and the open imported from\n", "// opening up Spectre.Console.\n", "type Rule = \n", " { Id : Guid\n", " Condition : Condition\n", " Action : Action \n", " InputRule : string }\n", "\n", "let printAlert (rule : Rule) (traceEvent : TraceEvent) : unit = \n", "\n", " // Create a table\n", " let table = Spectre.Console.Table();\n", " table.Title <- Spectre.Console.TableTitle \"[underline red] Alert! [/]\"\n", "\n", " table.AddColumn(\"Input Rule\") |> ignore\n", " table.AddColumn(\"Timestamp\") |> ignore\n", " table.AddColumn(\"Event Name\") |> ignore\n", " table.AddColumn(\"Event Property\") |> ignore\n", " table.AddColumn(\"Payload\") |> ignore\n", "\n", " table.AddRow( rule.InputRule, \n", " traceEvent.TimeStampRelativeMSec.ToString(), \n", " traceEvent.EventName,\n", " rule.Condition.Conditioner.ConditionerProperty,\n", " traceEvent.PayloadByName(rule.Condition.Conditioner.ConditionerProperty).ToString() ) |> ignore\n", "\n", " table.Border <- Spectre.Console.TableBorder.Square\n", "\n", " // Render the table to the console\n", " Spectre.Console.AnsiConsole.Write(table);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For __Callstack__ actions, symbol resolution is an important step and is highlighted below; a recursive function that walks the stack frame-by-frame and prints out the module and full method name after resolving symbols is used:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "dotnet_interactive": { "language": "fsharp" } }, "outputs": [], "source": [ "// src/PerfAvore/PerfAvore.Console/RulesEngine/Actions/CallStack.fs\n", "\n", "open System.IO\n", "\n", "open Microsoft.Diagnostics.Tracing\n", "open 
Microsoft.Diagnostics.Tracing.Etlx\n", "open Microsoft.Diagnostics.Symbols\n", "\n", "open Spectre.Console\n", "\n", "// Added this back to distinguish between this Rule and the open imported from\n", "// opening up Spectre.Console.\n", "type Rule = \n", " { Id : Guid\n", " Condition : Condition\n", " Action : Action \n", " InputRule : string }\n", "\n", "let symbolReader : SymbolReader = new SymbolReader(TextWriter.Null, SymbolPath.SymbolPathFromEnvironment)\n", "\n", "// Helper fn responsible for getting the call stack from a particular trace event.\n", "let printCallStack (rule: Rule) (traceEvent : TraceEvent) : unit =\n", "\n", " let callStack = traceEvent.CallStack()\n", " if isNull callStack then \n", " printfn $\"Rule: {rule.InputRule} invoked for Event: {traceEvent} however, the call stack associated with the event is null.\" \n", " ()\n", "\n", " let root = Tree(Rule(rule.InputRule.EscapeMarkup()))\n", "\n", " let printStackFrame (callStack : TraceCallStack) : unit =\n", " if not (isNull callStack.CodeAddress.ModuleFile)\n", " then\n", " callStack.CodeAddress.CodeAddresses.LookupSymbolsForModule(symbolReader, callStack.CodeAddress.ModuleFile)\n", " let frameValue = sprintf \"%s!%s\" callStack.CodeAddress.ModuleName callStack.CodeAddress.FullMethodName\n", " root.AddNode ( frameValue.EscapeMarkup() ) |> ignore\n", "\n", " let rec processFrame (callStack : TraceCallStack) : unit =\n", " if isNull callStack then ()\n", " else\n", " printStackFrame callStack\n", " processFrame callStack.Caller\n", " \n", " processFrame callStack\n", " AnsiConsole.Write root\n", " printfn \"\\n\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An example of a printed callstack is the following:\n", "\n", "![Call Stack](Images/Example_Callstack.jpeg)\n", "\n", "It is worth noting that currently support for callstack resolution doesn't exist in its full capacity for Linux/MacOS.\n", "\n", "Lastly, charting is made possible using ``FSharp.Plotly`` in the following 
manner:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "dotnet_interactive": { "language": "fsharp" } }, "outputs": [ { "data": { "text/html": [ "
Installed Packages
  • Fsharp.Plotly, 1.2.2
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "// src/PerfAvore/PerfAvore.Console/RulesEngine/Actions/Chart.fs\n", "\n", "#r \"nuget:Fsharp.Plotly\"\n", "\n", "open System.Linq\n", "open FSharp.Plotly\n", "\n", "let printChart (rule : Rule) (service : AnomalyDetectionContextService) : unit = \n", "\n", " let v = service.TryRetrieve(rule.Id).Value\n", " let x = \n", " v\n", " |> Seq.map(fun i -> i.timestamp)\n", " let y = \n", " v\n", " |> Seq.map(fun i -> i.value)\n", " let input = Seq.zip x y\n", " let point = v.Last()\n", " let scatterPoint = seq { point.timestamp, point.value }\n", "\n", " [\n", " Chart.Line (input, Name = $\"Trend for {point.timestamp}\") \n", " Chart.Scatter (scatterPoint, mode = StyleParam.Mode.Markers, Name=\"Anomaly Point\")\n", " ]\n", " |> Chart.Combine\n", " |> Chart.withX_AxisStyle(title = \"Relative Timestamp (ms)\")\n", " |> Chart.withY_AxisStyle(title = $\"{rule.Condition.Conditioner.ConditionerProperty}\")\n", " |> Chart.Show" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An example of a chart is:\n", "\n", "![Chart](Images/Example_Charting.jpeg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How To Run Perf Avore\n", "\n", "Now that all the components of Perf Avore are covered, this section will cover how a user can run the Console App. Perf-Avore can be run by cd'ing into the ``src/PerfAvore/PerfAvore.Console`` directory and then:\n", "\n", "1. ``dotnet restore``\n", "2. ``dotnet run -- --processname <ProcessName> [--tracepath <TracePath>] [--rulespath <RulesPath>]``.\n", "\n", "### Command Line Arguments \n", "\n", "| Command Line Option | Description | \n", "| ----------- | ----------- |\n", "| ``processname`` | Name of the Process to analyze. This is the only mandatory parameter. |\n", "| ``tracepath`` | The path of the trace file (.ETL / .ETLX). The absence of this argument will trigger a real time session. Note: For real time sessions, admin privileges are required. 
|\n", "| ``rulespath`` | The path to a json file that contains a list of all the rules. By default, the ``SampleRules.json`` file will be used if this argument isn't specified. The location of this file is ``src\\PerfAvore\\PerfAvore.Console\\SampleRules\\SampleRules.json`` for Windows and ``src\\PerfAvore\\PerfAvore.Console\\SampleRules\\LinuxSampleRules.json`` for Linux. | \n", "\n", "## Prototypes\n", "\n", "Prior to writing this Console App, I prototyped functionality to test out smaller components; the prototypes can be found [here](https://github.com/MokoSan/PerfAvore/tree/main/src/Prototypes). Some of the prototypes include:\n", "\n", "1. [Rule Engine based DSL Parsing](https://github.com/MokoSan/PerfAvore/blob/main/src/Prototypes/RuleEngineDSL.ipynb)\n", "2. [Anomaly Detection With Trace Log API](https://github.com/MokoSan/PerfAvore/blob/main/src/Prototypes/AnomalyDetection_TraceLog.ipynb)\n", "3. [Anomaly Detection with ML.NET](https://github.com/MokoSan/PerfAvore/blob/main/src/Prototypes/AnomalyDetection_ML.NET.ipynb)\n", "4. [Prototyping the Trace Log API](https://github.com/MokoSan/PerfAvore/blob/main/src/Prototypes/PrototypingTraceLog.ipynb)\n", "\n", "## Testing\n", "\n", "I tested the effectiveness with a rogue process given [here](https://github.com/MokoSan/PerfAvore/blob/main/src/PerfAvore/RougePrograms/TimedExcessiveAllocs/Program.cs) that excessively allocates to both the SOH and the LOH on a timer and was able to get all the actions invoked. \n", "\n", "## Conclusion\n", "\n", "Finally, done! This submission took a lot of work and I feel I have a reasonable base to continue to build on top of. As a disclaimer, the project is still under development and is without unit tests. \n", "\n", "To reiterate, Perf Avore consists of:\n", "\n", "1. A parser for a domain we defined that encapsulates details about rules.\n", "2. A mechanism to digest ``TraceEvent`` instances from a trace file or real time process for Windows, Linux and MacOS systems.\n", "3. 
Action invocation logic if the conditions of a rule are met.\n", "\n", "I would very much appreciate any feedback or suggestions that can improve the product, and please let me know if you spot any mistakes! Building Perf Avore was an incredibly rewarding learning experience!\n", "\n", "Thanks to the organizers of #fsadvent, particularly [Sergey Tihon](https://twitter.com/sergey_tiho)! Happy Holidays to all!\n", "\n", "## Tools Used\n", "\n", "Perf Avore was developed on VSCode using the [ionide](https://github.com/ionide/ionide-vscode-fsharp) plugin and dotnet cli. \n", "\n", "The version of dotnet used to develop is:\n", "```\n", "❯ dotnet --version\n", "6.0.100\n", "```\n", "\n", "I tested the Linux use case using WSL with the following version:\n", "\n", "```\n", "MokoSan:~:% lsb_release -a\n", "No LSB modules are available.\n", "Distributor ID: Ubuntu\n", "Description: Ubuntu 20.04 LTS\n", "Release: 20.04\n", "Codename: focal\n", "```\n", "\n", "### Dependencies\n", "\n", "| Dependency Name | Reasons |\n", "| --------------- | ------- |\n", "| Argu | Command line parsing |\n", "| FSharp.Plotly | Charting |\n", "| Microsoft.Diagnostics.NETCore.Client | Linux / MacOS Trace Event Support |\n", "| Microsoft.Diagnostics.Tracing.TraceEvent | Trace Event Support |\n", "| Microsoft.ML | Basic abstractions used for the Anomaly Detection side of things |\n", "| Microsoft.ML.TimeSeries | Anomaly Detection Algorithm | \n", "| Spectre.Console | Prettifying the Console | \n", "| System.Text.Json | Parsing the JSON rules file | \n", "\n", "### Next Steps\n", "\n", "1. Add unit tests.\n", "2. Add the ability to create an Audit of all the Actions Invoked.\n", "3. Clean up some of the interfaces and add more documentation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References\n", "\n", "1. [Taking Stock of Anomalies with F# And ML.NET](https://www.codesuji.com/2019/05/24/F-and-MLNet-Anomaly/)\n", "2. 
[A CPU Sampling Profiler in Less Than 200 Lines](https://lowleveldesign.org/2020/10/13/a-cpu-sampling-profiler-in-less-than-200-lines/)\n", "3. [Tutorial: Detect anomalies in time series with ML.NET](https://docs.microsoft.com/en-us/dotnet/machine-learning/tutorials/phone-calls-anomaly-detection)\n", "4. [Plug-in martingales for testing exchangeability on-line: arXiv:1204.3251](https://arxiv.org/pdf/1204.3251.pdf)\n", "5. [Atle Rudshaug's Submission of a Console App that helped me significantly design my app](https://atlemann.github.io/fsharp/2021/12/11/fs-crypto.html)\n", "6. [realmon](https://github.com/Maoni0/realmon/tree/main/src)" ] } ], "metadata": { "kernelspec": { "display_name": ".NET (F#)", "language": "F#", "name": ".net-fsharp" }, "language_info": { "name": "F#" } }, "nbformat": 4, "nbformat_minor": 2 }