How to write a parser or (parser) plugin
Introduction
This page is intended to give you an introduction into developing a parser for Plaso.
A step-by-step example is provided to create a simple binary parser for the Safari Cookies.binarycookies file.
At the bottom are some common troubleshooting tips for problems that others have run into before you.
This page assumes you have at least a basic understanding of programming in Python and use of git.
Terminology
event; a subclass of EventObject which represents an event.
event data; a subclass of EventData which represents data related to the event.
event data stream; a subclass of EventDataStream which represents the data stream which the event data originated from.
message formatter; a configuration driven subsystem of Plaso that generates a human readable message of the event data.
timeliner; a configuration driven subsystem of Plaso that generates events from event data.
parser; a subclass of FileObjectParser that extracts event data from a file.
parser plugin; an extension of an existing parser, such as the SQLite parser, that extracts event data from a file.
Before you start
Before you can write a binary file parser you will need to have a good understanding of the data format. Several things can help here:
having a diverse set of test data, preferably test data that is reproducible. Examples of how to create reproducible test data can be found here
having format specifications
Parser or (parser) plugin
Before starting work on a parser, check if Plaso already has a parser that handles the underlying data format. Plaso currently supports (parser) plugins for the following file formats:
Bencode
Compound ZIP archives
Web Browser Cookies
Extensible Storage Engine (ESE) databases
Single-line JSON-L log files
OLE Compound files
Plist files
SQLite databases
Text-based log files
Windows Registry files (CREG and REGF)
If the data format you are trying to parse is in one of these formats, you will need to write a (parser) plugin rather than a parser.
Writing a parser
For our example, the Safari Cookies.binarycookies file has its own unique data format, hence we need to create a separate parser.
A description of the Safari Cookies.binarycookies format can be found here.
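To give an idea of the format knowledge a parser needs, the sketch below reads only the file header. It assumes the commonly documented layout: a 4-byte "cook" signature, a big-endian 32-bit page count, then one big-endian 32-bit size per page. Verify this against the format description before relying on it. WrongParserError and ReadFileHeader are hypothetical names used for illustration and are not part of Plaso.

```python
import io
import struct


class WrongParserError(Exception):
  """Hypothetical stand-in for the error raised on a format mismatch."""


def ReadFileHeader(file_object):
  """Reads the signature, page count and page sizes (assumed layout).

  Args:
    file_object (file): binary file-like object positioned at offset 0.

  Returns:
    list[int]: size of every page in bytes.

  Raises:
    WrongParserError: if the signature does not match or data is truncated.
  """
  header = file_object.read(8)
  if len(header) != 8 or header[:4] != b'cook':
    raise WrongParserError('Unsupported file signature.')

  # Assumption: the page count is a big-endian unsigned 32-bit integer.
  number_of_pages, = struct.unpack('>I', header[4:8])

  page_sizes_data = file_object.read(4 * number_of_pages)
  if len(page_sizes_data) != 4 * number_of_pages:
    raise WrongParserError('Truncated page sizes array.')

  return list(struct.unpack(f'>{number_of_pages:d}I', page_sizes_data))


# Synthetic header that describes two pages of 16 and 32 bytes.
example_data = b'cook' + struct.pack('>I', 2) + struct.pack('>2I', 16, 32)
page_sizes = ReadFileHeader(io.BytesIO(example_data))
```

Rejecting a file early on a signature mismatch is what allows other parsers to be tried on the same file.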
Test data
First we make a representative test file and add it to the test_data/ directory, in our example:
test_data/Cookies.binarycookies
Make sure that the test file does not contain sensitive or copyrighted material.
The parser
Next create the parser and add it to the plaso/parsers/ directory.
plaso/parsers/safari_cookies.py
# -*- coding: utf-8 -*-
"""Parser for Safari Binary Cookie files."""

from plaso.parsers import interface
from plaso.parsers import manager


class BinaryCookieParser(interface.FileObjectParser):
  """Parser for Safari Binary Cookie files."""

  NAME = 'binary_cookies'
  DATA_FORMAT = 'Safari Binary Cookie file'

  def ParseFileObject(self, parser_mediator, file_object, **kwargs):
    """Parses a Safari binary cookie file-like object.

    Args:
      parser_mediator (ParserMediator): parser mediator.
      file_object (dfvfs.FileIO): file-like object to be parsed.

    Raises:
      WrongParser: when the format is not supported by the parser, this will
          signal the event extractor to apply other parsers.
    """
    ...


manager.ParsersManager.RegisterParser(BinaryCookieParser)
manager.ParsersManager.RegisterParser(BinaryCookieParser) is used to register the parser with the unique name binary_cookies.
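The idea of registering a class under a unique name can be sketched with a small stand-in registry. This is not Plaso's ParsersManager; ParsersRegistry, GetParserClass and ExampleParser are hypothetical names, and raising KeyError on a duplicate name is an assumption made for this sketch.

```python
class ParsersRegistry(object):
  """Hypothetical stand-in for a parsers manager, for illustration only."""

  # Maps a lowercase parser name to the parser class registered under it.
  _parser_classes = {}

  @classmethod
  def RegisterParser(cls, parser_class):
    """Registers a parser class under its lowercase NAME.

    Raises:
      KeyError: if a parser with the same name is already registered.
    """
    parser_name = parser_class.NAME.lower()
    if parser_name in cls._parser_classes:
      raise KeyError(f'Parser already registered: {parser_name:s}')
    cls._parser_classes[parser_name] = parser_class

  @classmethod
  def GetParserClass(cls, parser_name):
    """Retrieves a registered parser class, or None if not registered."""
    return cls._parser_classes.get(parser_name.lower(), None)


class ExampleParser(object):
  """Hypothetical parser class that only carries a name."""

  NAME = 'binary_cookies'


ParsersRegistry.RegisterParser(ExampleParser)
```

Because the name is the lookup key, two parsers cannot share one name, which is why NAME has to be unique.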
Registering a parser
To ensure the parser is registered automatically add an import to:
plaso/parsers/__init__.py
from plaso.parsers import safari_cookies
When plaso.parsers is imported, this will load the safari_cookies submodule (safari_cookies.py).
The event data
from plaso.containers import events


class SafariBinaryCookieEventData(events.EventData):
  """Safari binary cookie event data.

  Attributes:
    cookie_name (str): cookie name.
    ...
  """

  DATA_TYPE = 'safari:cookie:entry'

  def __init__(self):
    """Initializes event data."""
    super(SafariBinaryCookieEventData, self).__init__(data_type=self.DATA_TYPE)
    self.cookie_name = None
    ...
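The container pattern above can be tried without Plaso installed using a stand-in base class. The sketch below is only an approximation of the idea: EventDataStandIn, ExampleCookieEventData and GetSetAttributes are hypothetical names, and the attribute names are taken from the examples on this page.

```python
class EventDataStandIn(object):
  """Hypothetical stand-in for an event data base class."""

  def __init__(self, data_type=None):
    """Initializes event data with its data type."""
    self.data_type = data_type

  def GetSetAttributes(self):
    """Returns a dict of the attributes that have a value."""
    return {
        name: value for name, value in sorted(vars(self).items())
        if value is not None}


class ExampleCookieEventData(EventDataStandIn):
  """Example cookie event data.

  Attributes:
    cookie_name (str): cookie name.
    creation_time (str): creation date and time, a string in this sketch.
    url (str): URL the cookie applies to.
  """

  DATA_TYPE = 'safari:cookie:entry'

  def __init__(self):
    """Initializes event data; every attribute starts out unset."""
    super().__init__(data_type=self.DATA_TYPE)
    self.cookie_name = None
    self.creation_time = None
    self.url = None


# A parser fills in only the values it could extract from the file.
event_data = ExampleCookieEventData()
event_data.cookie_name = 'session'
set_attributes = event_data.GetSetAttributes()
```

Declaring every attribute in __init__ with a value of None keeps the set of attributes of a data type predictable, even when a particular file lacks some of the values.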
The unit test
To ensure the parser works and keeps working, it is necessary to write a unit test. Next create the parser unit test and add it to the tests/parsers/ directory.
tests/parsers/safari_cookies.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""Tests for the Safari cookie parser."""

import unittest

from plaso.lib import definitions
from plaso.parsers import safari_cookies

from tests.parsers import test_lib


class SafariCookieParserTest(test_lib.ParserTestCase):
  """Tests for the Safari cookie parser."""

  def testParseFile(self):
    """Tests the Parse function on a Safari binary cookies file."""
    ...


if __name__ == '__main__':
  unittest.main()
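The real test builds on test_lib.ParserTestCase and the file in test_data/, neither of which is reproduced here. As a self-contained sketch of the general shape of such a test, the example below checks a hypothetical helper, HasSupportedSignature, against small in-memory inputs. The "cook" signature is an assumption about the file format.

```python
import io
import unittest


def HasSupportedSignature(file_object):
  """Hypothetical helper that checks for the assumed 'cook' signature."""
  return file_object.read(4) == b'cook'


class SignatureTest(unittest.TestCase):
  """Tests for the hypothetical signature helper."""

  def testSupportedSignature(self):
    """Tests that data starting with the signature is accepted."""
    file_object = io.BytesIO(b'cook\x00\x00\x00\x00')
    self.assertTrue(HasSupportedSignature(file_object))

  def testUnsupportedSignature(self):
    """Tests that data of a different format is rejected."""
    file_object = io.BytesIO(b'SQLite format 3\x00')
    self.assertFalse(HasSupportedSignature(file_object))


# Run the tests explicitly so the outcome can be inspected afterwards.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SignatureTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Testing both a supported and an unsupported input guards against a parser that accepts files it should leave to other parsers.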
The timeliner configuration
To have Plaso generate events from the extracted event data, the timeliner configuration data/timeliner.yaml needs to be extended with a definition for the safari:cookie:entry data type.
data_type: 'safari:cookie:entry'
attribute_mappings:
- name: 'creation_time'
  description: 'Creation Time'
place_holder_event: true
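How such a definition could turn event data into events is sketched below. This is not the timeliner's code: GenerateEvents is a hypothetical function, event data is simplified to a dict, and both the meaning of place_holder_event (emit one event without a date and time when no mapped attribute is set) and the 'Not a time' description are assumptions.

```python
def GenerateEvents(event_data, attribute_mappings, place_holder_event=False):
  """Generates (date_time, description) tuples from an event data dict.

  Args:
    event_data (dict[str, object]): event data values per attribute name.
    attribute_mappings (list[dict[str, str]]): mappings with a 'name' and
        'description' key, as in the definition above.
    place_holder_event (bool): True if a placeholder event should be
        generated when no mapped attribute has a value.

  Returns:
    list[tuple[object, str]]: one tuple per generated event.
  """
  events = []
  for mapping in attribute_mappings:
    date_time = event_data.get(mapping['name'], None)
    if date_time is not None:
      events.append((date_time, mapping['description']))

  if not events and place_holder_event:
    # Assumption: a placeholder carries no date and time value.
    events.append((None, 'Not a time'))

  return events


mappings = [{'name': 'creation_time', 'description': 'Creation Time'}]

events = GenerateEvents(
    {'cookie_name': 'session', 'creation_time': '2024-01-01T00:00:00+00:00'},
    mappings, place_holder_event=True)

place_holder_events = GenerateEvents(
    {'cookie_name': 'session'}, mappings, place_holder_event=True)
```

One event data object can yield several events when more than one attribute is mapped, one per date and time value that is set.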
The message formatter configuration
To have Plaso generate a human-readable message of the event data, the formatter configuration in data/formatters/ needs to be extended with a definition for the safari:cookie:entry data type.
The event message format is defined in data/formatters/*.yaml.
type: 'conditional'
data_type: 'safari:cookie:entry'
message:
- '{url}'
...
short_message:
- '{url}'
...
short_source: 'WEBHIST'
source: 'Safari Cookies'
For more information about the configuration file format see: message formatting
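The effect of a 'conditional' definition can be approximated as follows: each message piece is included only when the attributes it refers to have a value. This is a sketch of the idea and not the formatter's implementation. FormatConditionalMessage is a hypothetical function, the '({cookie_name})' piece is made up for the example, and joining pieces with a single space is an assumption.

```python
import string


def FormatConditionalMessage(message_pieces, event_values, separator=' '):
  """Formats a message from the pieces whose attributes have a value.

  Args:
    message_pieces (list[str]): format strings, such as '{url}'.
    event_values (dict[str, object]): event data values per attribute name.
    separator (str): string placed between the formatted pieces.

  Returns:
    str: formatted message.
  """
  formatter = string.Formatter()
  formatted_pieces = []
  for piece in message_pieces:
    # Determine which attributes the piece refers to.
    attribute_names = [
        field_name for _, field_name, _, _ in formatter.parse(piece)
        if field_name]

    # Skip the piece if any of its attributes is missing or None.
    if all(event_values.get(name, None) is not None
           for name in attribute_names):
      formatted_pieces.append(piece.format(**event_values))

  return separator.join(formatted_pieces)


pieces = ['{url}', '({cookie_name})']

full_message = FormatConditionalMessage(
    pieces, {'url': 'http://example.com/', 'cookie_name': 'session'})

partial_message = FormatConditionalMessage(
    pieces, {'url': 'http://example.com/', 'cookie_name': None})
```

Keeping one attribute per piece means a missing value drops only its own piece, so the rest of the message stays readable.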