plaso.multi_processing package

Submodules

plaso.multi_processing.analysis_process module

The multi-process analysis process.

class plaso.multi_processing.analysis_process.AnalysisProcess(event_queue, storage_writer, knowledge_base, analysis_plugin, processing_configuration, data_location=None, event_filter_expression=None, **kwargs)[source]

Bases: plaso.multi_processing.base_process.MultiProcessBaseProcess

Multi-processing analysis process.

SignalAbort()[source]

Signals the process to abort.

plaso.multi_processing.base_process module

Base class for a process used in multi-processing.

class plaso.multi_processing.base_process.MultiProcessBaseProcess(processing_configuration, enable_sigsegv_handler=False, **kwargs)[source]

Bases: multiprocessing.context.Process

Multi-processing process interface.

rpc_port

port number of the process status RPC server.

Type

int

abstract SignalAbort()[source]

Signals the process to abort.

property name

process name.

Type

str

run()[source]

Runs the process.
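The interface above can be illustrated with a minimal sketch, assuming an abort flag held in a `multiprocessing.Event`. The names `SketchBaseProcess`, `CountingProcess`, `_Main` and `_abort_event` are hypothetical and are not taken from plaso; only `run`, `name` and `SignalAbort` mirror the documented members.

```python
import abc
import multiprocessing


class SketchBaseProcess(multiprocessing.Process, abc.ABC):
    """Hypothetical stand-in for a base process; not plaso's implementation."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Assumption: an event is used so that the parent can ask the
        # child process to stop.
        self._abort_event = multiprocessing.Event()

    @abc.abstractmethod
    def SignalAbort(self):
        """Signals the process to abort."""

    @abc.abstractmethod
    def _Main(self):
        """Does the actual work of the process (hypothetical hook)."""

    def run(self):
        """Runs the process; start() invokes this in the child process."""
        self._Main()


class CountingProcess(SketchBaseProcess):
    """Toy subclass that counts steps until a limit or an abort signal."""

    def __init__(self, limit, **kwargs):
        super().__init__(**kwargs)
        self._limit = limit
        self.steps = 0

    def SignalAbort(self):
        self._abort_event.set()

    def _Main(self):
        # Check the abort flag between units of work.
        while self.steps < self._limit and not self._abort_event.is_set():
            self.steps += 1
```

Calling `run()` directly executes the loop in the current process, which is convenient for trying the sketch out; `start()` would execute it in a child process instead.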

plaso.multi_processing.engine module

The multi-process processing engine.

class plaso.multi_processing.engine.MultiProcessEngine[source]

Bases: plaso.engine.engine.BaseEngine

Multi-process engine base.

This class contains functionality to:

  • monitor and manage worker processes;

  • retrieve process status information via RPC;

  • manage the status update thread.
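A status update thread of the kind mentioned above might look roughly like the following sketch. It assumes a periodic callback driven by a `threading.Event`; the class name `SketchStatusUpdater` and the `interval` parameter are hypothetical and do not come from plaso.

```python
import threading


class SketchStatusUpdater:
    """Hypothetical periodic status updater; not plaso's implementation."""

    def __init__(self, callback, interval=0.01):
        self._callback = callback
        self._interval = interval
        self._stop_event = threading.Event()
        self._thread = None

    def _Run(self):
        # Event.wait() returns False on timeout, so the callback runs once
        # per interval until Stop() sets the event.
        while not self._stop_event.wait(self._interval):
            self._callback()

    def Start(self):
        """Starts the status update thread."""
        self._stop_event.clear()
        self._thread = threading.Thread(target=self._Run, daemon=True)
        self._thread.start()

    def Stop(self):
        """Stops the status update thread and waits for it to finish."""
        self._stop_event.set()
        if self._thread is not None:
            self._thread.join()
            self._thread = None
```

Waiting on the event rather than sleeping lets `Stop()` interrupt the wait immediately instead of after a full interval.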

plaso.multi_processing.logger module

The multi-processing sub module logger.

plaso.multi_processing.plaso_xmlrpc module

XML RPC server and client.

class plaso.multi_processing.plaso_xmlrpc.ThreadedXMLRPCServer(callback)[source]

Bases: plaso.multi_processing.rpc.RPCServer

Threaded XML RPC server.

Start(hostname, port)[source]

Starts the process status RPC server.

Parameters
  • hostname (str) – hostname or IP address to connect to for requests.

  • port (int) – port to connect to for requests.

Returns

True if the RPC server was successfully started.

Return type

bool

Stop()[source]

Stops the process status RPC server.

class plaso.multi_processing.plaso_xmlrpc.XMLProcessStatusRPCClient[source]

Bases: plaso.multi_processing.plaso_xmlrpc.XMLRPCClient

XML process status RPC client.

class plaso.multi_processing.plaso_xmlrpc.XMLProcessStatusRPCServer(callback)[source]

Bases: plaso.multi_processing.plaso_xmlrpc.ThreadedXMLRPCServer

XML process status threaded RPC server.

class plaso.multi_processing.plaso_xmlrpc.XMLRPCClient[source]

Bases: plaso.multi_processing.rpc.RPCClient

XML RPC client.

CallFunction()[source]

Calls the function via RPC.

Close()[source]

Closes the RPC communication channel to the server.

Open(hostname, port)[source]

Opens an RPC communication channel to the server.

Parameters
  • hostname (str) – hostname or IP address to connect to for requests.

  • port (int) – port to connect to for requests.

Returns

True if the communication channel was established.

Return type

bool
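The server and client classes above can be approximated with the Python standard library. The following is a minimal sketch, assuming the server exposes a single status function under the name `status`; the class names, the `port` attribute and the function name are hypothetical, and only the `Start`, `Stop`, `Open`, `CallFunction` and `Close` method names mirror the documented interface.

```python
import threading
import xmlrpc.client
import xmlrpc.server


class SketchStatusServer:
    """Hypothetical threaded XML RPC server; not plaso's implementation."""

    def __init__(self, callback):
        self._callback = callback
        self._server = None
        self._thread = None
        self.port = None

    def Start(self, hostname, port):
        """Starts the server; returns True if it was started."""
        try:
            self._server = xmlrpc.server.SimpleXMLRPCServer(
                (hostname, port), logRequests=False, allow_none=True)
        except OSError:
            return False
        self._server.register_function(self._callback, "status")
        # With port 0 the operating system picks a free port.
        self.port = self._server.server_address[1]
        self._thread = threading.Thread(
            target=self._server.serve_forever, daemon=True)
        self._thread.start()
        return True

    def Stop(self):
        """Stops the server and releases its socket."""
        if self._server is not None:
            self._server.shutdown()
            self._server.server_close()
            self._thread.join()
            self._server = None


class SketchStatusClient:
    """Hypothetical XML RPC client; not plaso's implementation."""

    def __init__(self):
        self._proxy = None

    def Open(self, hostname, port):
        """Prepares the channel; ServerProxy connects lazily on first call."""
        url = "http://{0:s}:{1:d}".format(hostname, port)
        self._proxy = xmlrpc.client.ServerProxy(url, allow_none=True)
        return True

    def CallFunction(self):
        """Calls the remote status function; returns None on failure."""
        if self._proxy is None:
            return None
        try:
            return self._proxy.status()
        except (OSError, xmlrpc.client.Error):
            return None

    def Close(self):
        """Drops the reference to the proxy."""
        self._proxy = None
```

Running the server in a daemon thread keeps the owning process free to do its own work while still answering status requests.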

plaso.multi_processing.psort module

The psort multi-processing engine.

class plaso.multi_processing.psort.PsortEventHeap[source]

Bases: object

Psort event heap.

PopEvent()[source]

Pops an event from the heap.

Returns

a tuple containing:

  • str: identifier of the event MACB group or None if the event cannot be grouped.

  • str: identifier of the event content.

  • EventObject: event.

  • EventData: event data.

  • EventDataStream: event data stream.

Return type

tuple

PopEvents()[source]

Pops events from the heap.

Yields

tuple – containing:

  • str: identifier of the event MACB group or None if the event cannot be grouped.

  • str: identifier of the event content.

  • EventObject: event.

  • EventData: event data.

  • EventDataStream: event data stream.

PushEvent(event, event_data, event_data_stream)[source]

Pushes an event onto the heap.

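Parameters
  • event (EventObject) – event.

  • event_data (EventData) – event data.

  • event_data_stream (EventDataStream) – event data stream.

property number_of_events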

number of events on the heap.

Type

int
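The push and pop behaviour of an event heap can be sketched with `heapq`. This is a simplified illustration, assuming events are ordered by a numeric timestamp and then by a content identifier; the class `SketchEventHeap` and its `PushEvent` signature are hypothetical and differ from the documented `PushEvent(event, event_data, event_data_stream)`.

```python
import heapq
import itertools


class SketchEventHeap:
    """Hypothetical, simplified event heap; not plaso's PsortEventHeap."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so the heap never has to compare the event payloads.
        self._counter = itertools.count()

    @property
    def number_of_events(self):
        """int: number of events on the heap."""
        return len(self._heap)

    def PushEvent(self, timestamp, content_identifier, event):
        """Pushes an event onto the heap."""
        heapq.heappush(
            self._heap,
            (timestamp, content_identifier, next(self._counter), event))

    def PopEvent(self):
        """Pops the earliest event, or returns None if the heap is empty."""
        try:
            _, content_identifier, _, event = heapq.heappop(self._heap)
        except IndexError:
            return None
        return content_identifier, event

    def PopEvents(self):
        """Yields events in heap order until the heap is empty."""
        item = self.PopEvent()
        while item is not None:
            yield item
            item = self.PopEvent()
```

Because the content identifier is part of the sort key, events with the same timestamp and identical content come out next to each other, which makes it straightforward for a consumer to skip duplicates.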

class plaso.multi_processing.psort.PsortMultiProcessEngine(worker_memory_limit=None, worker_timeout=None)[source]

Bases: plaso.multi_processing.engine.MultiProcessEngine

Psort multi-processing engine.

AnalyzeEvents(knowledge_base_object, storage_writer, data_location, analysis_plugins, processing_configuration, event_filter=None, event_filter_expression=None, status_update_callback=None)[source]

Analyzes events in a plaso storage.

Parameters
  • knowledge_base_object (KnowledgeBase) – contains information from the source data needed for processing.

  • storage_writer (StorageWriter) – storage writer.

  • data_location (str) – path to the location that data files should be loaded from.

  • analysis_plugins (dict[str, AnalysisPlugin]) – analysis plugins that should be run and their names.

  • processing_configuration (ProcessingConfiguration) – processing configuration.

  • event_filter (Optional[FilterObject]) – event filter.

  • event_filter_expression (Optional[str]) – event filter expression.

  • status_update_callback (Optional[function]) – callback function for status updates.

Raises

KeyboardInterrupt – if a keyboard interrupt was raised.

ExportEvents(knowledge_base_object, storage_reader, output_module, processing_configuration, deduplicate_events=True, event_filter=None, status_update_callback=None, time_slice=None, use_time_slicer=False)[source]

Exports events using an output module.

Parameters
  • knowledge_base_object (KnowledgeBase) – contains information from the source data needed for processing.

  • storage_reader (StorageReader) – storage reader.

  • output_module (OutputModule) – output module.

  • processing_configuration (ProcessingConfiguration) – processing configuration.

  • deduplicate_events (Optional[bool]) – True if events should be deduplicated.

  • event_filter (Optional[FilterObject]) – event filter.

  • status_update_callback (Optional[function]) – callback function for status updates.

  • time_slice (Optional[TimeSlice]) – slice of time to output.

  • use_time_slicer (Optional[bool]) – True if the ‘time slicer’ should be used. The ‘time slicer’ will provide a context of events around an event of interest.

plaso.multi_processing.rpc module

The RPC client and server interface.

class plaso.multi_processing.rpc.RPCClient[source]

Bases: object

RPC client interface.

abstract CallFunction()[source]

Calls the function via RPC.

abstract Close()[source]

Closes the RPC communication channel to the server.

abstract Open(hostname, port)[source]

Opens an RPC communication channel to the server.

Parameters
  • hostname (str) – hostname or IP address to connect to for requests.

  • port (int) – port to connect to for requests.

Returns

True if the communication channel was established.

Return type

bool

class plaso.multi_processing.rpc.RPCServer(callback)[source]

Bases: object

RPC server interface.

abstract Start(hostname, port)[source]

Starts the RPC server.

Parameters
  • hostname (str) – hostname or IP address to connect to for requests.

  • port (int) – port to connect to for requests.

Returns

True if the RPC server was successfully started.

Return type

bool

abstract Stop()[source]

Stops the RPC server.

plaso.multi_processing.task_engine module

The task multi-process processing engine.

class plaso.multi_processing.task_engine.TaskMultiProcessEngine(maximum_number_of_tasks=None, number_of_worker_processes=0, worker_memory_limit=None, worker_timeout=None)[source]

Bases: plaso.multi_processing.engine.MultiProcessEngine

Class that defines the task multi-process engine.

This class contains functionality to:

  • monitor and manage extraction tasks;

  • merge results returned by extraction workers.

ProcessSources(session, source_path_specs, storage_writer, processing_configuration, enable_sigsegv_handler=False, status_update_callback=None)[source]

Processes the sources and extracts events.

Parameters
  • session (Session) – session in which the sources are processed.

  • source_path_specs (list[dfvfs.PathSpec]) – path specifications of the sources to process.

  • storage_writer (StorageWriter) – storage writer for a session storage.

  • processing_configuration (ProcessingConfiguration) – processing configuration.

  • enable_sigsegv_handler (Optional[bool]) – True if the SIGSEGV handler should be enabled.

  • status_update_callback (Optional[function]) – callback function for status updates.

Returns

processing status.

Return type

ProcessingStatus

plaso.multi_processing.task_manager module

The task manager.

class plaso.multi_processing.task_manager.TaskManager[source]

Bases: object

Manages tasks and tracks their completion and status.

A task being tracked by the manager must be in exactly one of the following states:

  • abandoned: a task assumed to be abandoned because it was queued or processing for longer than the maximum inactive time.

  • merging: a task that is being merged by the engine.

  • pending_merge: the task has been processed and is ready to be merged with the session storage.

  • processed: a worker has completed processing the task, but it is not ready to be merged into the session storage.

  • processing: a worker is processing the task.

  • queued: the task is waiting for a worker to start processing it. It is also possible that a worker has already completed the task, but no status update was collected from the worker while it processed the task.

Once the engine reports that a task is completely merged, it is removed from the task manager.

Tasks are considered “pending” when there is more work that needs to be done to complete these tasks. Pending applies to tasks that are:

  • not abandoned;

  • abandoned, but need to be retried.

Abandoned tasks without corresponding retry tasks are considered “failed” when the foreman is done processing.
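The rule that a task is in exactly one state at a time can be sketched with a single dictionary keyed by task identifier. The class `SketchTaskTracker` and its method names are hypothetical, the allowed transitions are an assumption loosely based on the `KeyError` conditions documented for the methods of this class, and the processed state and retry tasks are left out for brevity.

```python
class SketchTaskTracker:
    """Hypothetical, simplified state tracker; not plaso's TaskManager."""

    def __init__(self):
        # Maps a task identifier to the single state the task is in.
        self._states = {}

    def _Move(self, task_identifier, allowed_states, new_state):
        """Moves a task to a new state or raises KeyError."""
        state = self._states.get(task_identifier)
        if state not in allowed_states:
            raise KeyError("Task {0:s} is in state: {1!s}".format(
                task_identifier, state))
        self._states[task_identifier] = new_state

    def GetState(self, task_identifier):
        """Returns the state of the task or None if it is not tracked."""
        return self._states.get(task_identifier)

    def Queue(self, task_identifier):
        self._states[task_identifier] = "queued"

    def MarkProcessing(self, task_identifier):
        self._Move(
            task_identifier, {"queued", "processing", "abandoned"},
            "processing")

    def MarkAbandoned(self, task_identifier):
        self._Move(task_identifier, {"queued", "processing"}, "abandoned")

    def MarkPendingMerge(self, task_identifier):
        self._Move(
            task_identifier, {"queued", "processing", "abandoned"},
            "pending_merge")

    def MarkMerging(self, task_identifier):
        self._Move(task_identifier, {"pending_merge"}, "merging")

    def Complete(self, task_identifier):
        """Removes a task that has been completely merged."""
        if self._states.get(task_identifier) != "merging":
            raise KeyError("Task {0:s} was not merging".format(
                task_identifier))
        del self._states[task_identifier]
```

Keeping one mapping instead of one collection per state makes the "exactly one state" invariant hold by construction, at the cost of slower per-state lookups.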

CheckTaskToMerge(task)[source]

Checks if the task should be merged.

Parameters

task (Task) – task.

Returns

True if the task should be merged.

Return type

bool

Raises

KeyError – if the task was not queued, processing or abandoned.

CompleteTask(task)[source]

Completes a task.

The task is complete and can be removed from the task manager.

Parameters

task (Task) – task.

Raises

KeyError – if the task was not merging.

CreateRetryTask()[source]

Creates a task to retry a previously abandoned task.

Returns

a task that was abandoned but should be retried or None if there are no abandoned tasks that should be retried.

Return type

Task

CreateTask(session_identifier, storage_format='sqlite')[source]

Creates a task.

Parameters
  • session_identifier (str) – the identifier of the session the task is part of.

  • storage_format (Optional[str]) – the storage format that the task should be stored in.

Returns

task attribute container.

Return type

Task

GetFailedTasks()[source]

Retrieves all failed tasks.

Failed tasks are tasks that were abandoned and have no retry task once the foreman is done processing.

Returns

tasks.

Return type

list[Task]

GetProcessedTaskByIdentifier(task_identifier)[source]

Retrieves a task that has been processed.

Parameters

task_identifier (str) – unique identifier of the task.

Returns

a task that has been processed.

Return type

Task

Raises

KeyError – if the task was not processing, queued or abandoned.

GetStatusInformation()[source]

Retrieves status information about the tasks.

Returns

tasks status information.

Return type

TasksStatus

GetTaskPendingMerge(current_task)[source]

Retrieves the first task that is pending merge or has a higher priority.

This function will check if there is a task with a higher merge priority than the current_task being merged. If so, that task with the higher priority is returned.

Parameters

current_task (Task) – current task being merged or None if no such task.

Returns

the next task to merge or None if there is no task pending merge or with a higher priority.

Return type

Task
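The priority check described for GetTaskPendingMerge can be sketched with a heap of tasks that are pending merge. This sketch assumes that a lower number means a higher merge priority, which the documentation above does not state; the class `SketchMergeQueue` and its methods are hypothetical.

```python
import heapq


class SketchMergeQueue:
    """Hypothetical merge queue; not plaso's implementation."""

    def __init__(self):
        self._heap = []

    def Push(self, task_identifier, merge_priority):
        """Adds a task that is pending merge."""
        heapq.heappush(self._heap, (merge_priority, task_identifier))

    def GetPendingMerge(self, current_priority=None):
        """Returns the next task to merge, or None.

        When a task is currently being merged, a task is only returned if
        its priority value is strictly lower (assumed to mean a higher
        priority) than that of the current task.
        """
        if not self._heap:
            return None
        priority, task_identifier = self._heap[0]
        if current_priority is not None and priority >= current_priority:
            return None
        heapq.heappop(self._heap)
        return task_identifier
```

Peeking at the top of the heap before popping avoids removing a task that would then have to be pushed back when it does not outrank the current one.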

HasPendingTasks()[source]

Determines if there are tasks running or in need of retrying.

Returns

True if there are tasks that are active, ready to be merged or need to be retried.

Return type

bool

RemoveTask(task)[source]

Removes an abandoned task.

Parameters

task (Task) – task.

Raises

KeyError – if the task was not abandoned or the task was abandoned and was not retried.

SampleTaskStatus(task, status)[source]

Takes a sample of the status of the task for profiling.

Parameters
  • task (Task) – a task.

  • status (str) – status.

StartProfiling(configuration, identifier)[source]

Starts profiling.

Parameters
  • configuration (ProfilingConfiguration) – profiling configuration.

  • identifier (str) – identifier of the profiling session used to create the sample filename.

StopProfiling()[source]

Stops profiling.

UpdateTaskAsPendingMerge(task)[source]

Updates the task manager to reflect that the task is ready to be merged.

Parameters

task (Task) – task.

Raises

KeyError – if the task was not queued, processing or abandoned, or the task was abandoned and has a retry task.

UpdateTaskAsProcessingByIdentifier(task_identifier)[source]

Updates the task manager to reflect the task is processing.

Parameters

task_identifier (str) – unique identifier of the task.

Raises

KeyError – if the task is not known to the task manager.

plaso.multi_processing.worker_process module

The multi-process worker process.

class plaso.multi_processing.worker_process.WorkerProcess(task_queue, storage_writer, collection_filters_helper, knowledge_base, session_identifier, processing_configuration, **kwargs)[source]

Bases: plaso.multi_processing.base_process.MultiProcessBaseProcess

Class that defines a multi-processing worker process.

SignalAbort()[source]

Signals the process to abort.

Module contents