Building a Sentry-like Error Tracing System for Next.js using Loki and Grafana

2024-07-31

Implementing Lightweight Error Tracing in Next.js with Loki and Grafana: An Alternative to Sentry

Recently, I had some free time and built an Error Boundary for my team's project to trace uncaught exceptions. That got me thinking about internal back-office systems and how we could build a custom solution that keeps all data in-house without relying on external services.

In this post, we'll explore how to implement a comprehensive error tracing system in a Next.js application using Loki for log aggregation and Grafana for visualization. This approach provides a cost-effective alternative to services like Sentry while offering customizable error tracking. We'll also dive into an enhanced feature: tracing the last 5 UI elements interacted with and sending this information along with the error report.

Do note that this implementation targets the Pages Router (I was on Next.js 13), where we still wrap the app with an Error Boundary ourselves. The App Router handles this with error.tsx files instead. However, the core concepts still apply.

The Stack

To provide more context when errors occur, we'll implement a system that tracks the last 5 UI elements the user interacted with. This information can be crucial for debugging and understanding the user's journey leading up to the error.

Our error tracing system leverages the following technologies:

  1. Next.js: A popular React framework for building web applications.
  2. Loki: A horizontally-scalable, highly-available log aggregation system.
  3. Grafana: An open-source platform for monitoring and observability.
  4. Pino: A super fast Node.js logger with JSON output.
  5. Custom hook: stores the last few UI interactions, recording only which HTML elements were touched. It is NOT a keylogger, for obvious security reasons.

Implementation

Let's go through the key components of our error tracing system.

1. Error Boundary Setup

First, create an ErrorBoundary component to catch and handle errors in your React tree:

// src/components/ErrorBoundary.tsx

import React from 'react';
import Router from 'next/router';

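type ErrorBoundaryProps = {
  children: React.ReactNode;
  getInteractions: () => string[];
};

type ErrorBoundaryState = { hasError: boolean; error: Error | null };

class ErrorBoundaryInner extends React.Component<ErrorBoundaryProps, ErrorBoundaryState> {
  // Typing `props` avoids an implicit-any error under strict TypeScript settings
  constructor(props: ErrorBoundaryProps) {
    super(props);
    this.state = { hasError: false, error: null };
  }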

  static getDerivedStateFromError(error: Error) {
    return { hasError: true, error };
  }

  componentDidCatch(error: Error, errorInfo: React.ErrorInfo) {
    const interactions = this.props.getInteractions();

    fetch('/api/log-error', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        message: error.message,
        stack: error.stack,
        url: window.location.href,
        userAgent: navigator.userAgent,
        timestamp: new Date().toISOString(),
        route: window.location.pathname,
        interactions: interactions,
      }),
    }).catch(console.error);
  }

  render() {
    if (this.state.hasError) {
      return (
        <div
          style={{
            display: 'flex',
            justifyContent: 'center',
            alignItems: 'center',
            height: '100vh',
            backgroundColor: '#f5f5f5',
          }}
        >
          <div
            style={{
              padding: '2rem',
              backgroundColor: 'white',
              borderRadius: '8px',
              boxShadow: '0 4px 6px rgba(0, 0, 0, 0.1)',
              maxWidth: '400px',
              textAlign: 'center',
            }}
          >
            <h2>Oops! Something went wrong.</h2>
            <p>
              We're sorry for the inconvenience. Our team has been notified and is working on a fix.
            </p>
            <div style={{ marginTop: '1rem' }}>
              <button onClick={() => window.location.reload()} style={buttonStyle}>
                Reload
              </button>
              <button onClick={() => Router.back()} style={buttonStyle}>
                Go Back
              </button>
              <button onClick={() => Router.push('/')} style={buttonStyle}>
                Home
              </button>
            </div>
          </div>
        </div>
      );
    }

    return this.props.children;
  }
}

const buttonStyle = {
  margin: '0 0.5rem',
  padding: '0.5rem 1rem',
  backgroundColor: '#007bff',
  color: 'white',
  border: 'none',
  borderRadius: '4px',
  cursor: 'pointer',
};

function ErrorBoundary({
  children,
  getInteractions,
}: {
  children: React.ReactNode;
  getInteractions: () => string[];
}) {
  return <ErrorBoundaryInner getInteractions={getInteractions}>{children}</ErrorBoundaryInner>;
}

export default ErrorBoundary;
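
componentDidCatch also receives errorInfo, which the boundary above does not use. Here is a minimal sketch of how the report body could be built in one place, assuming you also want React's component stack in it. buildErrorReport, trimLines and MAX_STACK_LINES are hypothetical names of mine (not part of React or Next.js), and the componentStack field is an extra that the original payload does not have.

```typescript
// Hypothetical helper: builds the JSON body that componentDidCatch posts to /api/log-error.
// Most fields mirror the payload above; `componentStack` is an assumed addition.
type ErrorReport = {
  message: string;
  stack: string | undefined;
  componentStack: string | undefined;
  url: string;
  userAgent: string;
  timestamp: string;
  route: string;
  interactions: string[];
};

// The browser values are passed in so the function stays testable outside a browser.
type BrowserContext = { href: string; pathname: string; userAgent: string };

const MAX_STACK_LINES = 10; // assumed limit to keep reports small

function trimLines(text: string | null | undefined, maxLines: number): string | undefined {
  if (!text) return undefined;
  return text.trim().split('\n').slice(0, maxLines).join('\n');
}

function buildErrorReport(
  error: Error,
  componentStack: string | null | undefined,
  interactions: string[],
  ctx: BrowserContext,
  now: Date = new Date(),
): ErrorReport {
  return {
    message: error.message,
    stack: error.stack,
    componentStack: trimLines(componentStack, MAX_STACK_LINES),
    url: ctx.href,
    userAgent: ctx.userAgent,
    timestamp: now.toISOString(),
    route: ctx.pathname,
    interactions,
  };
}
```

Inside componentDidCatch you would call it with errorInfo.componentStack, this.props.getInteractions(), and the values read from window.location and navigator, then pass the result to JSON.stringify.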

2. Logging API Route

Set up an API route in Next.js to receive and process error logs:

// pages/api/log-error.ts

import type { NextApiRequest, NextApiResponse } from 'next';
import logger from '../../lib/logger';

export default function handler(req: NextApiRequest, res: NextApiResponse) {
  if (req.method !== 'POST') {
    return res.status(405).end();
  }

  const { message, stack, url, userAgent, timestamp, route, interactions } = req.body;

  const relevantStack = stack
    ? stack.split('\n').slice(0, 3).join('\n')
    : 'No stack trace available';

  const formattedInteractions = interactions
    ? interactions
        .map((interaction: string, index: number) => `  ${index + 1}. ${interaction}`)
        .join('\n')
    : 'No interactions logged';

  logger.error(
    {
      message: message || 'Unknown error',
      stack: relevantStack,
      url: url || 'Unknown URL',
      userAgent: userAgent ? userAgent.split(' ').pop() : 'Unknown user agent',
      timestamp: timestamp || new Date().toISOString(),
      // env: add your environment name here if you have multiple environments
      route: route || 'Unknown route',
      interactions: formattedInteractions,
    },
    `Client Exception Error: ${
      message || 'Unknown error'
    }\nLast User Interactions:\n${formattedInteractions}`
  );

  res.status(200).json({ received: true });
}
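
The handler above trusts req.body as-is, and anyone who can reach the app can post to this route. A minimal sketch of narrowing the body before logging it follows. parseErrorReport and asString are hypothetical helpers, and the length limits are assumed values, not something the original setup prescribes.

```typescript
// Hypothetical validator for the body posted to /api/log-error.
// Field names follow the handler above; the size limits are assumed values.
type ParsedReport = {
  message: string;
  stack: string;
  url: string;
  route: string;
  interactions: string[];
};

const MAX_FIELD_LENGTH = 2000; // assumed cap per string field
const MAX_REPORTED_INTERACTIONS = 5; // matches the tracker's history size

function asString(value: unknown, fallback: string): string {
  return typeof value === 'string' && value.length > 0
    ? value.slice(0, MAX_FIELD_LENGTH)
    : fallback;
}

function parseErrorReport(body: unknown): ParsedReport | null {
  // Reject anything that is not a plain JSON object
  if (typeof body !== 'object' || body === null || Array.isArray(body)) return null;
  const record = body as Record<string, unknown>;

  // Keep only string entries, and cap both the count and the length of each one
  const interactions = Array.isArray(record.interactions)
    ? record.interactions
        .filter((item): item is string => typeof item === 'string')
        .slice(0, MAX_REPORTED_INTERACTIONS)
        .map((item) => item.slice(0, MAX_FIELD_LENGTH))
    : [];

  return {
    message: asString(record.message, 'Unknown error'),
    stack: asString(record.stack, 'No stack trace available'),
    url: asString(record.url, 'Unknown URL'),
    route: asString(record.route, 'Unknown route'),
    interactions,
  };
}
```

In the handler, this would replace the destructuring of req.body: call parseErrorReport(req.body) and respond with a 400 when it returns null.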

3. Logger Configuration

Use Pino for logging. This first version pretty-prints to the console with pino-pretty; the Loki transport is added in step 6:

// lib/logger.ts

import pino from 'pino';

const logger = pino({
  level: 'warn',
  transport: {
    target: 'pino-pretty',
    options: {
      colorize: true,
      translateTime: 'SYS:standard',
      ignore: 'pid,hostname',
      messageFormat: '{msg} | {type}: {metric} | Value: {value} | Rating: {rating} | URL: {url}',
    },
  },
  formatters: {
    level: (label) => {
      return { level: label.toUpperCase() };
    },
  },
  base: {
    // env: add your environment name here if you have multiple environments,
    //      and add conditionals so you only trace the ones you care about (e.g. UAT)
    // apiGateway: add your apiBasePath or gateway here
  },
});

export default logger;

4. Integration in _app.tsx with useInteractionTracker hook

Wrap your entire application with the ErrorBoundary in the custom App component, together with a tracker hook that also mirrors the interactions to localStorage:

// pages/_app.tsx

import type { AppProps } from 'next/app';
import ErrorBoundary from 'src/components/ErrorBoundary';
import { useInteractionTracker } from 'src/hooks/useInteractionTracker';

function App({ Component, pageProps }: AppProps) {
  const getInteractions = useInteractionTracker();

  return (
    <ErrorBoundary getInteractions={getInteractions}>
      <Component {...pageProps} />
    </ErrorBoundary>
  );
}

export default App;

// src/hooks/useInteractionTracker.ts

import { useEffect, useRef } from "react";

const MAX_INTERACTIONS = 5;

function getElementIdentifier(element: HTMLElement): string {
  if (element.id) {
    return `#${element.id}`;
  }

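  // className is not a plain string on SVG elements, so guard before splitting
  if (typeof element.className === 'string' && element.className) {
    return `.${element.className.split(' ')[0]}`;
  }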

  return element.tagName.toLowerCase();
}

function getElementText(element: HTMLElement): string {
  let text = element.getAttribute("aria-label") ||
             element.getAttribute("title") ||
             element.textContent?.trim() || "";

  return text ? ` "${text.slice(0, 20)}${text.length > 20 ? "..." : ""}"` : "";
}

function isInteractiveElement(element: HTMLElement): boolean {
  const interactiveTags = ['A', 'BUTTON', 'INPUT', 'SELECT', 'TEXTAREA'];
  const interactiveRoles = ['button', 'link', 'checkbox', 'menuitem', 'tab'];

  return interactiveTags.includes(element.tagName) ||
         interactiveRoles.includes(element.getAttribute('role') || '') ||
         element.hasAttribute('onclick') ||
         element.hasAttribute('tabindex');
}

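// Walk up from the event target to the nearest interactive ancestor;
// fall back to the target itself if none is found.
function findInteractiveParent(element: HTMLElement): HTMLElement {
  let current: HTMLElement | null = element;
  while (current && current !== document.body) {
    if (isInteractiveElement(current)) {
      return current;
    }
    current = current.parentElement;
  }
  return element;
}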

export function useInteractionTracker() {
  const interactionsRef = useRef<string[]>([]);

  useEffect(() => {
    const trackInteraction = (event: MouseEvent | KeyboardEvent) => {
      const target = event.target as HTMLElement;
      const interactiveElement = findInteractiveParent(target);

      const elementId = getElementIdentifier(interactiveElement);
      const elementText = getElementText(interactiveElement);

      let interaction = `${event.type} on ${elementId}${elementText}`;

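      // Record only named keys (Enter, Escape, ArrowDown, ...). Printable characters
      // have a one-character `key`, and logging those would make this a keylogger.
      if (event instanceof KeyboardEvent && event.key !== "Tab" && event.key.length > 1) {
        interaction += ` (key: ${event.key})`;
      }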

      interactionsRef.current = [
        interaction,
        ...interactionsRef.current.slice(0, MAX_INTERACTIONS - 1),
      ];

      localStorage.setItem("userInteractions", JSON.stringify(interactionsRef.current));
    };

    window.addEventListener("click", trackInteraction, true);
    window.addEventListener("keydown", trackInteraction, true);

    return () => {
      window.removeEventListener("click", trackInteraction, true);
      window.removeEventListener("keydown", trackInteraction, true);
    };
  }, []);

  return () => interactionsRef.current;
}
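
The "keep only the newest five" logic in the hook can be pulled out of the DOM code so it is easy to unit test. Here is a minimal sketch, assuming the same newest-first ordering. pushInteraction and parseStoredInteractions are hypothetical names. The second helper is an assumption of mine about how you might read the localStorage copy back, since the hook above only writes it.

```typescript
// Sketch of the bounded history used by the hook, separated from the DOM.
function pushInteraction(history: readonly string[], entry: string, max = 5): string[] {
  // Newest entry goes first; anything beyond `max` falls off the end.
  return [entry, ...history].slice(0, max);
}

// Hypothetical reader for the "userInteractions" value stored in localStorage.
function parseStoredInteractions(raw: string | null, max = 5): string[] {
  if (!raw) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed)
      ? parsed.filter((item): item is string => typeof item === 'string').slice(0, max)
      : [];
  } catch {
    return []; // corrupted or foreign data: start with an empty history
  }
}
```

In the hook, the assignment to interactionsRef.current could then become pushInteraction(interactionsRef.current, interaction, MAX_INTERACTIONS).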

5. Setting up Loki and Grafana

To complete our error tracing system, we need to set up Loki for log aggregation and Grafana for visualization.

Loki Setup

  1. Install Loki: Create a docker-compose.yml file:

    version: '3'
    services:
      loki:
        image: grafana/loki:2.8.0
        ports:
          - '3100:3100'
        command: -config.file=/etc/loki/local-config.yaml
        volumes:
          - ./loki-config.yaml:/etc/loki/local-config.yaml
    
  2. Create a loki-config.yaml file:

    auth_enabled: false
    
    server:
      http_listen_port: 3100
    
    ingester:
      lifecycler:
        address: 127.0.0.1
        ring:
          kvstore:
            store: inmemory
          # single Loki instance, so one replica is all we can have
          replication_factor: 1
        final_sleep: 0s
      chunk_idle_period: 5m
      chunk_retain_period: 30s
    
    schema_config:
      configs:
        - from: 2020-05-15
          store: boltdb
          object_store: filesystem
          schema: v11
          index:
            prefix: index_
            period: 168h
    
    storage_config:
      boltdb:
        directory: /tmp/loki/index
      filesystem:
        directory: /tmp/loki/chunks
    
    limits_config:
      enforce_metric_name: false
      reject_old_samples: true
      reject_old_samples_max_age: 168h
    
    chunk_store_config:
      max_look_back_period: 0s
    
    table_manager:
      retention_deletes_enabled: false
      retention_period: 0s
    
  3. Run Loki:

    docker-compose up -d
    

Grafana Setup

  1. Update your docker-compose.yml to include Grafana:

    version: '3'
    services:
      loki:
        image: grafana/loki:2.8.0
        ports:
          - '3100:3100'
        command: -config.file=/etc/loki/local-config.yaml
        volumes:
          - ./loki-config.yaml:/etc/loki/local-config.yaml
      grafana:
        image: grafana/grafana:latest
        ports:
          - '3000:3000'
        depends_on:
          - loki
    
  2. Run Grafana:

    docker-compose up -d
    
  3. Access Grafana at http://localhost:3000 (default credentials: admin/admin)

  4. Add Loki as a data source in Grafana:

    • Go to Configuration > Data Sources
    • Click "Add data source"
    • Select Loki
    • Set the URL to http://loki:3100
    • Click "Save & Test"
  5. Create a dashboard in Grafana:

    • Click "+ > Create > Dashboard"
    • Add a new panel
    • In the query editor, use LogQL to query your logs, e.g.:
      {job="next-app"} |= "error"
      
  6. Configuring Next.js to send logs to Loki: To send logs from your Next.js application to Loki, we'll use a Pino transport that forwards logs to Loki. Here's how to set it up:


a. Install required packages:

```bash
npm install pino pino-loki
```

b. Update your logger configuration (lib/logger.ts):

import pino from 'pino';
import { createWriteStream } from 'pino-loki';

const transport = createWriteStream({
  host: 'http://localhost:3100', // Adjust this to your Loki server address
  basicAuth: {
    username: 'your-username', // If you've set up authentication
    password: 'your-password',
  },
  labels: {
    job: 'next-app', // This helps identify your app in Loki
    environment: process.env.NODE_ENV || 'development',
  },
});

const logger = pino(
  {
    level: process.env.LOG_LEVEL || 'info',
    formatters: {
      level: (label) => {
        return { level: label.toUpperCase() };
      },
    },
    base: {
      // env: add your environment name here if you trace multiple environments (e.g. dev, UAT)
      // apiGateway: add your apiBasePath or gateway here, based on your env
    },
  },
  transport
);

export default logger;

This configuration sets up Pino to send logs directly to Loki. The createWriteStream function from pino-loki creates a transport that sends logs to the specified Loki server.
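
For intuition about what ends up on the wire: Loki's HTTP push endpoint (POST /loki/api/v1/push) accepts JSON "streams", each with a label set and a list of [timestamp in nanoseconds as a string, log line] pairs. The sketch below builds such a body by hand. toLokiPushBody is a hypothetical helper for illustration only. It is not how pino-loki is implemented internally, and you do not need it when using the transport.

```typescript
// Shape of a Loki push request body, built by hand for illustration.
type LokiStream = { stream: Record<string, string>; values: [string, string][] };
type LokiPushBody = { streams: LokiStream[] };

function toLokiPushBody(
  labels: Record<string, string>,
  entries: { time: Date; line: string }[],
): LokiPushBody {
  return {
    streams: [
      {
        stream: labels,
        // Loki expects Unix epoch nanoseconds as a string. Date only has millisecond
        // precision, so append six zeros instead of multiplying (avoids float rounding).
        values: entries.map(({ time, line }): [string, string] => [
          `${time.getTime()}000000`,
          line,
        ]),
      },
    ],
  };
}
```

Posting JSON.stringify of this body with a Content-Type of application/json to the push endpoint should make the lines show up under the given labels, which is a handy way to test a Loki instance without the app.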

c. Update your error logging API (pages/api/log-error.ts):

import type { NextApiRequest, NextApiResponse } from 'next';
import logger from '../../lib/logger';

export default function handler(req: NextApiRequest, res: NextApiResponse) {
  if (req.method !== 'POST') {
    return res.status(405).end();
  }

  const { message, stack, url, userAgent, timestamp, route, interactions } = req.body;

  const relevantStack = stack
    ? stack.split('\n').slice(0, 3).join('\n')
    : 'No stack trace available';

  const formattedInteractions = interactions
    ? interactions
        .map((interaction: string, index: number) => `  ${index + 1}. ${interaction}`)
        .join('\n')
    : 'No interactions logged';

  logger.error(
    {
      message: message || 'Unknown error',
      stack: relevantStack,
      url: url || 'Unknown URL',
      userAgent: userAgent ? userAgent.split(' ').pop() : 'Unknown user agent',
      timestamp: timestamp || new Date().toISOString(),
      route: route || 'Unknown route',
      interactions: formattedInteractions,
    },
    `Client Exception Error: ${message || 'Unknown error'}`
  );

  res.status(200).json({ received: true });
}

Now, when errors occur in your Next.js application, they will be sent to Loki via the configured Pino transport.
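
To check that logs are actually arriving before building any dashboards, you can query Loki directly over its HTTP API (GET /loki/api/v1/query_range). Here is a small sketch that only builds the request URL. buildQueryRangeUrl is a hypothetical helper, and the base URL assumes the local docker-compose setup from earlier.

```typescript
// Builds a query_range URL for Loki. Times are sent as Unix epoch nanoseconds.
function buildQueryRangeUrl(
  baseUrl: string,
  logql: string,
  start: Date,
  end: Date,
  limit = 20,
): string {
  const url = new URL('/loki/api/v1/query_range', baseUrl);
  url.searchParams.set('query', logql);
  url.searchParams.set('start', `${start.getTime()}000000`);
  url.searchParams.set('end', `${end.getTime()}000000`);
  url.searchParams.set('limit', String(limit));
  return url.toString();
}

// Example: everything the app logged during the last hour
const lastHourUrl = buildQueryRangeUrl(
  'http://localhost:3100',
  '{job="next-app"}',
  new Date(Date.now() - 60 * 60 * 1000),
  new Date(),
);
```

Opening that URL with fetch or curl returns JSON; if the result list is empty, the problem is on the sending side rather than in Grafana.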

7. Creating Useful Grafana Dashboards

With logs now flowing into Loki, you can create informative dashboards in Grafana. Here are some example queries and panels you might want to create:

a. Error Count Over Time:
Query: `sum(count_over_time({job="next-app"} |= "error" [$__interval]))`
Panel: Graph

b. Top 10 Error Messages:
Query: `topk(10, sum by (message) (count_over_time({job="next-app"} |= "error" | json [$__interval])))`
Panel: Table

c. Errors by Route:
Query: `sum(count_over_time({job="next-app"} |= "error" | json [$__interval])) by (route)`
Panel: Pie Chart

d. Latest Errors:
Query: `{job="next-app"} |= "error" | json | line_format "{{.message}} ({{.route}})"`
Panel: Logs

e. Error Distribution by Environment:
Query: `sum(count_over_time({job="next-app"} |= "error" | json [$__interval])) by (env)`
Panel: Bar Gauge

8. Setting Up Alerts

Grafana allows you to set up alerts based on your log data. Here's an example of how to set up a simple alert:

a. In your Grafana dashboard, edit a panel showing error counts.
b. Go to the "Alert" tab.
c. Click "Create Alert".
d. Set conditions, for example:

"WHEN last() OF query(A, 5m, now) IS ABOVE 10"

This will trigger an alert when there are more than 10 errors in the last 5 minutes.

e. Set up notification channels (email, Slack, etc.) in Grafana's Alert Notification settings.

9. Best Practices

- Log Levels: Use appropriate log levels (error, warn, info, debug) to categorize your logs.
- Structured Logging: Always use structured logging to make it easier to query and analyze logs.
- Sensitive Information: Be careful not to log sensitive information like passwords or personal data.
- Performance: Monitor the performance impact of logging, especially in high-traffic applications.
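
On the sensitive information point: the error report includes the full page URL, and URLs can carry tokens or emails in the query string. Here is a minimal sketch of scrubbing it before logging. sanitizeUrl is a hypothetical helper, and the parameter list is an assumption you would adapt to your own app.

```typescript
// Assumed list of query parameters that should never reach the logs
const SENSITIVE_PARAMS = ['token', 'password', 'email', 'code'];

function sanitizeUrl(rawUrl: string): string {
  try {
    const url = new URL(rawUrl);
    for (const name of SENSITIVE_PARAMS) {
      if (url.searchParams.has(name)) {
        url.searchParams.set(name, 'REDACTED');
      }
    }
    url.hash = ''; // fragments can carry OAuth tokens too
    return url.toString();
  } catch {
    // new URL throws on input it cannot parse
    return 'Invalid URL';
  }
}
```

This could run either in the Error Boundary before the report is sent or in the API route before logger.error is called; doing it client-side means the raw value never leaves the browser.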

tldr:

  • This was just an interesting way to work around paying for Sentry. I'd still suggest going with Sentry if resources allow, since this way we are storing the data ourselves, and that comes with a cost too.
  • BUT it is a custom solution with no external party involved: all the data is yours to manage, and you know exactly what is logged.
  • DO note, I did not add user id/user name to the trace, as everyone has a different identity management system, so you may want to add that if needed.