Knowledge

Get faster, more accurate Resolve investigations by adding knowledge for your team.

Every company has unique operational and tribal knowledge, and each team investigates issues differently. Iteratively add knowledge to teach Resolve best practices for different types of issues.

## How do I set up knowledge?

Start by creating a team, which represents a logical group within your organization (e.g., infra, payments). Teams have a few key sections:

- **Knowledge**: Define best practices for querying observability tools and investigating incidents.
- **Configuration**: Settings that help Resolve understand when this team's knowledge is relevant to pull into an investigation.
- **Alerts**: Use the Alerts page to filter alerts, enable auto-investigation, and attach runbooks.
- **Slack channels**: Install the Resolve Slack app, then connect channels for alerts, change events, or general chat.
- **Resources**: Define ownership by selecting applications, services, or clusters managed by the team.
- **Dashboards**: Add dashboard guidance to help Resolve interpret and analyze observability data correctly.

Figure: Overview diagram of teams.

## Create your first team

1. Open https://app0.resolve.ai/
2. Open the left sidebar and click **Teams**.
3. Click **Create team**.
4. Enter a name and description, then click **Create team**.
5. Set up knowledge, alerts, Slack channels, and dashboards.

The default org lets you add guidance for company-wide best practices.

## Team knowledge

Knowledge is how you guide Resolve's reasoning. Resolve uses a two-tier knowledge model:

- Organization-level knowledge (company defaults)
- Team-level knowledge (team-specific overrides)

When both exist, Resolve prioritizes the most relevant information.

### Knowledge types

There are 4 types of team knowledge, all of which are Markdown files.

**resolve.md**

- Authoritative guidance used in all chats and alert investigations.
- Keep this file concise. It is used in every chat and investigation for the team, so too much content in resolve.md will slow down and distract investigations.
- Include a brief system overview, glossary, and best practices
for investigations.

**Alert runbooks**

- Alert-specific guidance that is always examined during mapped alert investigations.
- Use actionable, sequential steps.
- Include examples of exact or templated queries.

**Dashboard guidance**

- Instructions for interpreting each dashboard, including variable selection, filters, and chart sections.
- Resolve will auto-generate guidance for new dashboards. Review and edit this Markdown file to improve how Resolve makes use of the dashboard.

**Docs**

- Guidance for specific issue scenarios, pulled in based on relevance.
- Clearly state when this guidance applies.

## Example knowledge

Use https://commonmark.org/help/ when writing team knowledge.

### Example resolve.md

#### Clusters

We have the following clusters: `app3 cluster`, `dev2 cluster`, `stgl cluster`. `app3 cluster` is the production cluster.

If the user asks about particular cluster(s), pass that information to all tools and agents. For example, if a user asks to look in logs for app3, always mention the cluster information during explorations.

#### Logs guidance

**Always apply cluster filter**

The logs are fetched from a Grafana Cloud instance that has logs for many other organizations. For this organization, you must always apply one of the following filters: `cluster="app3 cluster"` or `cluster="dev2 cluster"`.

If investigating a Kubernetes pod, you should also use the Kubernetes log integration.

For most services, you can get the logs for the right level using a query like `{cluster="app3 cluster", namespace="checkout assistant", service_name="<service name>"} | detected_level="<level>"`.

Levels can be `info`, `warn`, `debug`, or `error`.

Labels like `cluster`, `namespace`, and `service_name` are indexed and make the query efficient. As much as possible, avoid a purely keyword-based search without label filters, as that scans a lot of logs.

### Example alert runbooks

1. Use traces to find the appropriate RDS instances and related services.
2. Use the 'rds overview (us-east-2)' dashboard to get the overall health of the RDS instances.
3. Use the 'rds
performance insights (us-east-2)' dashboard to triage further.
4. Use the 'pgstats' command via the awscli tool to determine whether specific queries are causing high CPU.
5. From step 1, use the related services to determine the radius of impact.

### Example dashboard guidance

#### When to use this dashboard

Troubleshoot any service issues, including latency, performance, etc.

#### Dashboard variables

Replace `<servicename>` and `<country code>` according to the service and country. Set all variables to `<servicename> <country code>`, except for aws ecs service, where you append " backend":

- service should be set to `<servicename> <country code>`
- host service should be set to `<servicename> <country code>`
- aws ecs service should be set to `<servicename> <country code> backend`

If the service name and country code are both provided, do not append an additional country code.

Example: if the service is "checkout" and the country is Germany, use the following values:

- service should be set to `checkout ge`
- host service should be set to `checkout ge`
- aws ecs service should be set to `checkout ge backend`

### Example docs

#### General guidance for the frontend service

You can often use the `@http.url`, `@http.path_group`, and `@http.target` attributes to group by or filter for certain spans/traces.

#### Investigation guidance

The frontend is the entrypoint of our app (sitting behind frontend proxy, which receives requests from users). Therefore, lots of errors might bubble up from our other microservices and trigger frontend alerts. Just because an error or alert is triggered on the frontend service does not mean it is the root cause of the problem. In fact, it's likely that it is not the root cause. Error logs in the frontend might reference other services, and we can use traces to determine whether any dependencies are the source of our errors.

#### Evidence queries guidance

You can often use the `@http.url`, `@http.path_group`, and `@http.target` attributes to group by or filter for certain spans/traces. The service makes HTTP and gRPC requests to downstream
services, so you might see common error codes from those protocols in any error logs.

## Knowledge retrieval

- **Alert investigations**: Deep, deterministic reasoning. Pulls mapped runbooks and relevant team and org knowledge.
- **Chats**: Optimized for speed. Pulls limited but relevant knowledge.

Organization knowledge is always included; higher relevance is prioritized when conflicts exist.

### Alerts owned by multiple teams

When alerts are owned by multiple teams, Resolve pulls and prioritizes knowledge from all owning teams and the resolve.md for the organization.

## Summary

Teams are foundational to high-quality investigations, accurate chats, and scalable onboarding. Well-structured teams ensure Resolve understands your systems and knows how to help when it matters most.
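
As a supplement to the logs guidance in the example resolve.md, here is a minimal Python sketch of how a team might generate the templated, label-filtered query it describes, so that runbooks never ship a query without a cluster filter. This is not part of Resolve. The function name `build_log_query` is hypothetical, the label keys are assumed to be spelled `service_name` and `detected_level`, and the cluster values are copied as they appear in the example.

```python
# Hypothetical helper: builds the label-filtered log query shape from the
# example resolve.md. Nothing here is a Resolve or Grafana API.

# Cluster filter values copied as they appear in the example guidance.
ALLOWED_CLUSTERS = {"app3 cluster", "dev2 cluster"}
# Log levels listed in the example guidance.
LEVELS = {"info", "warn", "debug", "error"}


def build_log_query(cluster: str, namespace: str, service_name: str, level: str) -> str:
    """Return a query that always carries a cluster filter and indexed labels."""
    if cluster not in ALLOWED_CLUSTERS:
        raise ValueError(f"cluster must be one of {sorted(ALLOWED_CLUSTERS)}")
    if level not in LEVELS:
        raise ValueError(f"level must be one of {sorted(LEVELS)}")
    # Assumed label keys: cluster, namespace, service_name (indexed labels).
    selector = (
        f'{{cluster="{cluster}", namespace="{namespace}", '
        f'service_name="{service_name}"}}'
    )
    # Assumed level field: detected_level.
    return f'{selector} | detected_level="{level}"'


if __name__ == "__main__":
    # "checkout" is an illustrative service name, not taken from a real system.
    print(build_log_query("app3 cluster", "checkout assistant", "checkout", "error"))
```

Rejecting unknown clusters up front mirrors the rule that every query must include one of the organization's cluster filters; the resulting strings can be pasted into a runbook as exact queries.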