How to write a parser or (parser) plugin

Introduction

This page is intended to give you an introduction into developing a parser for Plaso.

  • A step-by-step example is provided to create a simple binary parser for the Safari Cookies.binarycookies file.

  • At bottom are some common troubleshooting tips that others have run into before you.

This page assumes you have at least a basic understanding of programming in Python and use of git.

Terminology

  • event; a subclass of EventObject which represents an event

  • event data; a subclass of EventData which represents data related to the event.

  • event data stream; a subclass of EventDataStream which represents the data stream which the event data originated from.

  • message formatter; a configuration driven subsystem of Plaso that generates a human readable message of the event data.

  • timeliner; a configuration driven subsystem of Plaso that generates events from event data.

  • parser; a subclass of FileObjectParser that extracts event data from a file.

  • parser plugin; an extension of existing parser, such as the SQLite parser, that that extracts event data from a file.

Before you start

Before you can write a binary file parser you will need to have a good understanding of the data format. Several things can help here:

  • having a diverse set of test data, preferable test data that is reproducible. Examples of how to create reproducible test data can be found here

  • having format specifications

Parser or (parser) plugin

Before starting work on a parser, check if Plaso already has a parser that handles the underlying data format. Plaso currently supports (parser) plugins for the following file formats:

  • Bencode

  • Compound ZIP archives

  • Web Browser Cookies

  • Extensible Storage Engine (ESE) databases

  • Single-line JSON-L log files

  • OLE Compound files

  • Plist files

  • SQLite databases

  • SQLite database files

  • Text-based log files

  • Windows Registry files (CREG and REGF)

If the data format you are trying to parse is in one of these formats, you will need to write a (parser) plugin rather than a parser.

Writing a parser

For our example, the Safari Cookies.binarycookies file has its own unique data format, hence we need to create a separate parser.

A description of the Safari Cookies.binarycookies format can be found here.

Test data

First we make a representative test file and add it to the test_data/ directory, in our example:

test_data/Cookies.binarycookies

Make sure that the test file does not contain sensitive or copyrighted material.

The parser

Next create the parser and add it to the plaso/parsers/ directory.

plaso/parsers/safari_cookies.py
# -*- coding: utf-8 -*-
"""Parser for Safari Binary Cookie files."""

from plaso.parsers import interface
from plaso.parsers import manager


class BinaryCookieParser(interface.FileObjectParser):
  """Parser for Safari Binary Cookie files."""

  NAME = 'binary_cookies'
  DATA_FORMAT = 'Safari Binary Cookie file'

  def ParseFileObject(self, parser_mediator, file_object, **kwargs):
    """Parses a Safari binary cookie file-like object.

    Args:
      parser_mediator (ParserMediator): parser mediator.
      file_object (dfvfs.FileIO): file-like object to be parsed.

    Raises:
      WrongParser: when the format is not supported by the parser, this will
          signal the event extractor to apply other parsers.
    """
    ...


manager.ParsersManager.RegisterParser(BinaryCookieParser)

manager.ParsersManager.RegisterParser(BinaryCookieParser) is used to register the parser with the unique name binary_cookies.

Registering a parser

To ensure the parser is registered automatically add an import to:

plaso/parsers/__init__.py
from plaso.parsers import safari_cookies

When plaso.parsers is imported this will load the safari_cookies submodule safari_cookies.py.

The event data

from plaso.containers import events


class SafariBinaryCookieEventData(events.EventData):
  """Safari binary cookie event data.

  Attributes:
    cookie_name (str): cookie name.
    ...
  """

  DATA_TYPE = 'safari:cookie:entry'

  def __init__(self):
    """Initializes event data."""
    super(SafariBinaryCookieEventData, self).__init__(data_type=self.DATA_TYPE)
    self.cookie_name = None
    ...

The unit test

To ensure the parser is and remains working it is necessary to write a unit test. Next create the parser unit test and add it to the tests/parsers/ directory.

test/parsers/safari_cookies.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""Tests for the Safari cookie parser."""

import unittest

from plaso.lib import definitions
from plaso.parsers import safari_cookies

from tests.parsers import test_lib


class SafariCookieParserTest(test_lib.ParserTestCase):
  """Tests for the Safari cookie parser."""

  def testParseFile(self):
    """Tests the Parse function on a Safari binary cookies file."""
    ...


if __name__ == '__main__':
  unittest.main()

The timeliner configuration

To have Plaso generate events from the extracted event data the timeliner configuration data/timeliner.yaml needs to be extended with a definition for the safari:cookie:entry data type.

data_type: 'safari:cookie:entry'
attribute_mappings:
- name: 'creation_time'
  description: 'Creation Time'
place_holder_event: true

The message formatter configuration

To have Plaso generate human readable message of the event data the formatter configuration data/formatters/ needs to be extended with a definition for the safari:cookie:entry data type.

The event message format is defined in data/formatters/*.yaml.

type: 'conditional'
data_type: 'safari:cookie:entry'
message:
- '{url}'
...
short_message:
- '{url}'
...
short_source: 'WEBHIST'
source: 'Safari Cookies'

For more information about the configuration file format see: message formatting