GRR Rapid Response is an incident response framework focused on remote live forensics.
It consists of a python client (agent) that is installed on target systems, and python server infrastructure that can manage and talk to clients.
The goal of GRR is to support forensics and investigations in a fast, scalable manner to allow analysts to quickly triage attacks and perform analysis remotely.
GRR on GitHub
GRR is open source (Apache License 2.0) and is developed on GitHub: github.com/google/grr
Contacts
GitHub issues: github.com/google/grr/issues
GRR Users mailing list: grr-users
Follow us on Twitter for announcements of GRR user meetups. We use a Gitter chat room during meetups.
What is GRR?
GRR consists of two parts: client and server.
The GRR client is deployed on systems that one might want to investigate. On every such system, once deployed, the GRR client periodically polls GRR frontend servers for work. “Work” means running a specific action: downloading a file, listing a directory, etc.
GRR server infrastructure consists of several components (frontends, workers, UI servers, Fleetspeak) and provides a web-based graphical user interface and an API endpoint that allows analysts to schedule actions on clients and view and process collected data.
Remote forensics at scale
GRR was built to run at scale so that analysts are capable of effectively collecting and processing data from large numbers of machines. GRR was built with the following scenarios in mind:
Joe saw something weird, check his machine (p.s. Joe is on holiday in Cambodia and on 3G)
Forensically acquire 25 machines for analysis (p.s. they’re in 5 continents and none are Windows)
Tell me if this machine is compromised (while you’re at it, check 100,000 of them - i.e. “hunt” across the fleet)
GRR client features
Cross-platform support for Linux, OS X and Windows clients.
Live remote memory analysis using the YARA library.
Powerful search and download capabilities for files and the Windows registry.
OS-level and raw file system access, using the SleuthKit (TSK).
Secure communication infrastructure designed for Internet deployment.
Detailed monitoring of client CPU, memory, IO usage and self-imposed limits.
GRR server features
Fully fledged response capabilities handling most incident response and forensics tasks.
Enterprise hunting (searching across a fleet of machines) support.
Fast and simple collection of hundreds of digital forensic artifacts.
AngularJS Web UI and RESTful JSON API with client libraries in Python, PowerShell and Go.
Powerful data export features supporting a variety of formats and output plugins.
Fully scalable back-end capable of handling large deployments.
Automated scheduling for recurring tasks.
Asynchronous design allowing future task scheduling for clients, designed to work with a large fleet of laptops.
Quickstart (have GRR running in 5 minutes)
This page describes how to get GRR clients and server components up and running for the first time. If you have Docker installed, and just want to experiment with a container that already has GRR set up, you can follow the instructions given here. It takes ~2 minutes to download the image and initialize the server.
To start, install the GRR server deb as described here.
On successful installation, you should have an admin interface running on port 8000 by default. Open the Admin UI in a browser, navigate to Binaries -> Executables (details here) and download the installer you need.
Next, run the client installer on the target machine (can be the same machine as the one running the GRR server components) as root:
For Windows you will see a 64 bit installer. Run the installer as admin (it should show the UAC prompt if you are not admin). It should run silently and install the client to `c:\windows\system32\grr\%version%\`. It will also install a Windows service, start it, and configure the registry keys to make it talk to the URL/server you specified during repack of the clients on the server.
For OSX you will see a pkg file; install the pkg. It will add a launchd item and start it.
For Linux you will see deb and rpm packages; install the appropriate one. For testing purposes you can run the client on the same machine as the server if you like.
After install, hit Enter in the search box in the top left corner of the UI to see all of your clients that have enrolled with the server. If you don’t see clients, follow the troubleshooting steps.
Installing GRR Server
Overview
Depending on your use case, there are multiple ways of installing the GRR server components. Most users will want to install a release deb. The server deb includes client templates that can be downloaded from GRR’s Admin UI after installation.
Alternatively, GRR can be installed from PIP.
You can also install GRR from source as described here. You should then be able to modify GRR code and build your own server/client installers.
GRR components
GRR consists of a client and a few server components.
Whenever GRR server is installed using one of the methods mentioned above, all the server components are installed. This is a simple way to set up smaller GRR server installations: each component will run as a separate server process on the same machine. However, when scaling GRR server to run on multiple machines, each component can exist on its own machine in the data center and run separately.
GRR components are:
Client
The GRR client is not a server component, but it comes bundled with the GRR server. The GRR client is deployed on corporate assets using the usual mechanisms for software distribution and updates (e.g. SMS, apt). The client consists of two processes: the Fleetspeak client, which implements the communication protocol based on a streaming HTTPS connection, and the GRR client process, which sends and receives GRR messages via Fleetspeak and implements the business logic.
GRR Datastore
The data store acts both as a central storage component for data, and as a communication mechanism for all GRR server components.
Fleetspeak server
This component terminates the streaming HTTPS connections from the fleetspeak clients and implements the communication protocol. It receives messages from clients and delivers queued messages to the clients.
Fleetspeak admin server
This component provides an interface for the GRR server to send messages to clients.
Fleetspeak database
The fleetspeak system uses a MySQL database for queuing messages for clients.
GRR Front End Servers
The front end servers’ main task is to receive GRR messages from the fleetspeak server, un-bundle the contained messages and queue these on the data store. The front end also fetches any messages queued for the client and sends them to the client.
GRR Worker
In order to remain scalable, the front end does not do any processing of data, preferring to offload processing to special worker components. The number of workers can be tuned in response to increased workload. Workers typically check queues in the data stores for responses from the client, process those and re-queue new requests for the clients (See Flows and Queues).
GRR Web UI
The GRR Web UI is the central application which enables the incident responder or forensic analyst to interact with the system. It allows analysis tasks to be queued for clients, and the stored results of previous analysis to be examined. It also acts as an API endpoint: the GRR API can be used for automation and integration with other systems.
Securing access to GRR server (important!)
GRR is a powerful system. However, with power comes responsibility and potential security risks.
If you’re running GRR for anything besides simple demo purposes, it’s extremely important to properly secure access to your GRR server infrastructure, because:
Anybody who has root access to your GRR server effectively becomes root on all systems running GRR client talking to your GRR server.
Anybody who has direct write access to GRR datastore effectively becomes root on all systems running GRR client talking to your GRR server.
Consequently, it’s important to secure your GRR infrastructure. Please follow the security checklist below.
GRR Security Checklist
Generate new CA/server keys on initial install. Back up these keys somewhere securely (see Key Management).
Maximally restrict SSH access (or any other kind of direct access) to GRR server machines.
Make sure GRR web UI is not exposed to the Internet and is protected.
For a high security environment:
Make sure GRR’s web UI is served through an Apache or Nginx proxy via HTTPS. If you’re using any kind of internal authentication/authorization system, limit access to GRR web UI when configuring Apache or Nginx. See user authentication documentation.
If there are more than just a few people working with GRR, turn on GRR approval-based access control.
Regenerate the code-signing keys with passphrases for additional security.
Run the HTTP server serving clients on a separate machine from the workers.
Ensure the database server is using strong passwords and is well protected.
Produce obfuscated clients (repack the clients with a different Client.name setting).
Installing from a release server deb (recommended)
This is the recommended way of installing the GRR server components. GRR server debs are built for Ubuntu 18.04 Bionic (x86-64). They may install on Debian or other Ubuntu versions, but compatibility is not guaranteed.
MySQL is GRR’s default database backend, and should be up and running before installing GRR. The database can run alongside GRR on the same machine, or on a remote machine. Here’s how you would install the community edition of MySQL from the Ubuntu repositories:
apt install -y mysql-server
If you have never installed MySQL on the machine before, you will be prompted for a password for the ‘root’ database user. After installation completes, you need to change the `max_allowed_packet` setting, otherwise GRR will have issues writing large chunks of data to the database. Put the following into `/etc/mysql/my.cnf`:

```
[mysqld]
max_allowed_packet=40M
```
Then restart the MySQL server:
service mysql restart
You will typically want to create a new database user for GRR and give that user access to an empty database that the GRR server installation will use:
mysql -u root -p
```
mysql> CREATE USER 'grr'@'localhost' IDENTIFIED BY 'password';
mysql> CREATE DATABASE grr;
mysql> GRANT ALL ON grr.* TO 'grr'@'localhost';
mysql> CREATE USER 'fleetspeak'@'localhost' IDENTIFIED BY 'password';
mysql> CREATE DATABASE fleetspeak;
mysql> GRANT ALL ON fleetspeak.* TO 'fleetspeak'@'localhost';
```
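If you want to sanity-check the setting GRR cares about before initializing the server, a short Python sketch, assuming the mysqlclient package (imported as MySQLdb) and the placeholder credentials created above, might look like:

```python
import MySQLdb  # provided by the "mysqlclient" PIP package

# Connect with the GRR user created above (placeholder credentials).
conn = MySQLdb.connect(host="localhost", user="grr", passwd="password", db="grr")
cur = conn.cursor()
cur.execute("SHOW VARIABLES LIKE 'max_allowed_packet'")
_, value = cur.fetchone()
# GRR needs max_allowed_packet to be at least 20971520 (see the note below).
assert int(value) >= 20971520, "max_allowed_packet is too low for GRR"
```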
Please note: GRR is sensitive to MySQL’s `max_allowed_packet` setting. Make sure it’s not lower than 20971520. Creating a new user is optional, since GRR can use the credentials of the root user to connect to MySQL.

Download the latest server deb from https://github.com/google/grr/releases:
wget https://storage.googleapis.com/releases.grr-response.com/grr-server_3.4.6-0_amd64.deb
Install the server deb, along with its dependencies, like below:
sudo apt install -y ./grr-server_3.4.6-0_amd64.deb
The installer will prompt for a few pieces of information to get things set up. After successful installation, the `grr-server` and `fleetspeak-server` services should be running:

```
root@grruser-bionic:/home/grruser# systemctl status grr-server
● grr-server.service - GRR Service
   Loaded: loaded (/lib/systemd/system/grr-server.service; enabled; vendor preset: enabled)
   Active: active (exited) since Wed 2017-11-22 10:16:39 UTC; 2min 51s ago
     Docs: https://github.com/google/grr
  Process: 10404 ExecStart=/bin/systemctl --no-block start grr-server@admin_ui.service grr-server@frontend.service grr-server@worker.service grr-server@worker2.service (code=exited, status=0/SUCCESS)
 Main PID: 10404 (code=exited, status=0/SUCCESS)
    Tasks: 0
   Memory: 0B
      CPU: 0
   CGroup: /system.slice/grr-server.service

Nov 22 10:16:39 grruser-bionic systemd[1]: Starting GRR Service...
Nov 22 10:16:39 grruser-bionic systemd[1]: Started GRR Service.

root@grruser-bionic:/home/grruser# systemctl status fleetspeak-server
● fleetspeak-server.service - Fleetspeak Server Service
   Loaded: loaded (/lib/systemd/system/fleetspeak-server.service; disabled; vendor preset: enabled)
   Active: active (running) since Thu 2021-08-12 18:35:13 UTC; 4s ago
     Docs: https://github.com/google/fleetspeak
 Main PID: 3555 (fleetspeak-serv)
    Tasks: 7 (limit: 4666)
   CGroup: /system.slice/fleetspeak-server.service
           └─3555 /usr/bin/fleetspeak-server --services_config /etc/fleetspeak-server/server.services.config --components_config /etc/fleetspeak-server/server.components.config

Aug 12 18:35:13 grruser-bionic systemd[1]: Started Fleetspeak Server Service.
```

In addition, administrative commands for GRR, e.g. `grr_console` and `grr_config_updater`, should be available in your PATH.
Installing from a HEAD DEB
If the Google Cloud SDK is not already installed, download a tarball for the latest version from https://cloud.google.com/sdk/docs/downloads-versioned-archives and extract it to somewhere on your filesystem, e.g.:

```bash
wget https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-180.0.1-linux-x86_64.tar.gz
tar zxf google-cloud-sdk-180.0.1-linux-x86_64.tar.gz -C "${HOME}"
```

Alternatively, you can install the SDK using `apt-get` as described in https://cloud.google.com/sdk/docs/downloads-apt-get.

Use `gsutil` to copy the server deb from its GCS bucket to your local machine:

```bash
$HOME/google-cloud-sdk/bin/gsutil cp gs://autobuilds.grr-response.com/_latest_server_deb/*.deb .
```
Install the server deb the same way you would a release deb.
Installing from released PIP packages
If the templates included in release server debs are not compatible with the platform you would like to run them on, you have the option of installing GRR from PIP on your target platform, then building your own.
First, install the prerequisites:
Ubuntu:
apt install -y debhelper dpkg-dev python-dev python-pip rpm zip
CentOS:
yum install -y epel-release python-devel wget which libffi-devel \
openssl-devel zip git gcc gcc-c++ redhat-rpm-config
yum install -y python-pip
Next, upgrade pip and install virtualenv:
sudo pip install --upgrade pip virtualenv
Next, create a virtualenv and install the GRR client-builder package:
virtualenv GRR_ENV
source GRR_ENV/bin/activate
pip install grr-response-client-builder
Once that is done, you can build a template for your platform with:
grr_client_build build --output mytemplates
and repack it with:
grr_client_build repack --template mytemplates/*.zip --output_dir mytemplates
If you would like to experiment with GRR’s server components, you will need to first install the `grr-response-server` PIP package. Administrative commands such as `grr_server` and `grr_config_updater` will be added to the virtualenv. You can then launch the server components as follows:
# To run fleetspeak server.
grr_server --component fleetspeak_server --verbose
# To run AdminUI.
grr_server --component admin_ui --verbose
# To run the worker.
grr_server --component worker --verbose
# To run the frontend.
grr_server --component frontend --verbose
Note that GRR requires Python 3.6+, so for platforms with older default Python versions (e.g. CentOS 6), you need to build a newer version of Python from source and use that for creating the virtualenv.
Installing from source
Here’s how to install GRR from source (from GitHub HEAD). This method is useful when you intend to do some development or when you need to build GRR for an architecture not supported by the binary distribution (e.g. arm64 or PowerPC).
For the instructions below, we assume that Ubuntu 18 is used.
Prerequisites
First, install the prerequisites:
sudo apt-get update
sudo apt install -y fakeroot debhelper libffi-dev libssl-dev python3-dev \
python3-pip python3-venv wget openjdk-8-jdk zip git devscripts dh-systemd \
dh-virtualenv libc6-i386 lib32z1 asciidoc libmysqlclient-dev
# Go (version 1.13 or newer) is needed to build Fleetspeak. If you already
# have Go installed on your system, you can skip this part.
# We use amd64 version here, you might need to download another Go
# distribution matching your architecture (see https://go.dev/dl/).
wget https://go.dev/dl/go1.17.3.linux-amd64.tar.gz
sudo bash -c "rm -rf /usr/local/go && tar -C /usr/local -xzf go1.17.3.linux-amd64.tar.gz"
export PATH=$PATH:/usr/local/go/bin
Building Fleetspeak
Fleetspeak is a communication middleware (effectively, a client and a server) used by GRR to communicate with endpoints. GRR is tightly integrated with Fleetspeak and expects Fleetspeak binaries to be present in the `fleetspeak-server-bin` and `fleetspeak-client-bin` PIP packages.
Create and activate a Python virtualenv at $HOME/INSTALL:
python3 -m venv $HOME/INSTALL
source $HOME/INSTALL/bin/activate
pip install --upgrade pip setuptools wheel
Next, build the `fleetspeak-server-bin` and `fleetspeak-client-bin` Python packages. These packages are effectively Python PIP package wrappers around Fleetspeak binaries.
```bash
# Check out Fleetspeak.
git clone https://github.com/google/fleetspeak.git

# This is needed for Protobuf builds.
pip install grpcio-tools

cd fleetspeak/
pip install -e fleetspeak_python/
./fleetspeak/build.sh
./fleetspeak/build-pkgs.sh
python fleetspeak/server-wheel/setup.py bdist_wheel --package-root fleetspeak/server-pkg/debian/fleetspeak-server/ --version 0.1.11
pip install dist/fleetspeak_server_bin-0.1.11-py2.py3-none-linux_x86_64.whl
python fleetspeak/client-wheel/setup.py bdist_wheel --package-root fleetspeak/client-pkg/debian/fleetspeak-client/ --version 0.1.11
pip install dist/fleetspeak_client_bin-0.1.11-py2.py3-none-linux_x86_64.whl
cd ..
```
Now, when you do `pip freeze`, you should see `fleetspeak-server-bin` and `fleetspeak-client-bin` installed inside the virtualenv:

```
$ pip freeze
...
fleetspeak-client-bin @ file:///home/foo/fleetspeak/dist/fleetspeak_client_bin-0.1.11-py2.py3-none-linux_x86_64.whl
fleetspeak-server-bin @ file:///home/foo/fleetspeak/dist/fleetspeak_server_bin-0.1.11-py2.py3-none-linux_x86_64.whl
...
```
Building GRR
Next, you need to build GRR. Clone the GitHub repo and cd into its directory:
git clone https://github.com/google/grr.git
cd grr
If your goal is development
Run the following script to install GRR into the virtualenv:
./travis/install.sh
Please check Installing from a release server deb for instructions on how to configure the MySQL server.
You should now be able to run GRR commands from inside the virtualenv, particularly, you can trigger GRR server configuration.
source $HOME/INSTALL/bin/activate
grr_config_updater initialize # Initialize GRR's configuration
Then, to run individual GRR components (for a Fleetspeak-enabled configuration):
# To run fleetspeak server.
grr_server --component fleetspeak_server --verbose
# To run AdminUI.
grr_server --component admin_ui --verbose
# To run the worker.
grr_server --component worker --verbose
# To run the frontend.
grr_server --component frontend --verbose
If your goal is to build a custom GRR client template
In this case you don’t need a full blown GRR server setup. Run the following:
./travis/install_client_builder.sh
This will install the necessary PIP packages into your virtualenv to run the `grr_client_build` command. See Building custom client templates for how to use it.
Installing via GRR Docker image
The GRR Docker image is currently intended for evaluation/testing use, but the plan is to support simple cloud deployment of a stable production image in the future.
The instructions below get you a recent stable Docker image. We also build an image automatically from the latest commit in the GitHub repository, which is more up-to-date but isn’t guaranteed to work. If you want bleeding edge, you can use `grrdocker/grr:latest` in the commands below.
How to use the image
```bash
docker run \
  --name grr-server \
  -e EXTERNAL_HOSTNAME=localhost \
  -e ADMIN_PASSWORD=demo \
  -p 0.0.0.0:8000:8000 -p 0.0.0.0:8080:8080 \
  grrdocker/grr:v3.4.6.0
```
Once initialization finishes, point your web browser to localhost:8000 and log in with admin:demo. Follow the final part of the quickstart instructions to download and install the clients.
EXTERNAL_HOSTNAME is the hostname you want GRR agents (clients) to poll back to, “localhost” is only useful for testing.
ADMIN_PASSWORD is the password for the “admin” user in the webui.
The container will listen on port 8000 for the admin web UI and port 8080 for client polls.
Advanced usage
Using configs on the host
The default config directory for GRR’s server installation, specified by the config option `Config.directory`, is `/usr/share/grr-server/install_data/etc` (inside the container). You can mount a directory on the host system over that directory, effectively overriding the directory inside the container. Any changes made by GRR inside that directory (notably, creating and modifying the server writeback file) will be reflected on the host instead of inside the container. Here’s how to accomplish that:
Copy over the initial configs for the server installation from GRR’s image to the host’s filesystem:
```bash
mkdir ~/grr-docker
docker create --name grr-server grrdocker/grr:v3.4.6.0
docker cp grr-server:/usr/share/grr-server/install_data/etc ~/grr-docker
docker rm grr-server
```
Start a new container as usual with the command given previously, but with this extra argument:
-v ~/grr-docker/etc:/usr/share/grr-server/install_data/etc
Logging to the host
Note that verbose logging is disabled by default. You can enable verbose logging by setting `Logging.verbose` in `grr-server.yaml` (the primary GRR config) to `True` after copying the file to the host as described above.
The default server logging directory, specified by the config option `Logging.path`, is `/usr/share/grr-server/lib/python2.7/site-packages/grr_response_core/var/log/`.
As with overriding the config directory used by GRR, you can mount a directory
on the host over that directory, like below:
-v /var/log/grr:/usr/share/grr-server/lib/python2.7/site-packages/grr_response_core/var/log
Using the host’s MySQL instance

By default, GRR is configured to connect to a MySQL instance that is already installed in the container. An easy way of setting up the GRR container to connect to an external MySQL instance instead involves enabling host networking, by setting the `docker run` argument `--network` to `host`. In addition, you should set the following environment variables:

`DISABLE_INTERNAL_MYSQL` to `true` (lowercase)
`GRR_MYSQL_HOSTNAME` to `127.0.0.1`
`GRR_MYSQL_PASSWORD` to the password of the DB user GRR will use, which by default is `root`.
Here’s an example:
```bash
docker run \
  --name grr-server \
  --network host \
  -e EXTERNAL_HOSTNAME=localhost \
  -e ADMIN_PASSWORD=demo \
  -e DISABLE_INTERNAL_MYSQL=true \
  -e GRR_MYSQL_HOSTNAME=127.0.0.1 \
  -e GRR_MYSQL_PASSWORD="${ROOT_MYSQL_PASS}" \
  grrdocker/grr:v3.4.6.0
```
It is worth noting that `host` networking disables publishing of ports, which is why the `-p` arguments are left out of the command above.
If host networking is not an option (e.g. for non-Linux hosts), you need to configure the MySQL installation on the host to allow connections from Docker before starting the container.
Besides the environment variables given above, other MySQL-related variables that can be specified include `GRR_MYSQL_PORT` (default 0), `GRR_MYSQL_DB` (default grr), and `GRR_MYSQL_USERNAME` (default root).
Note that if you’re running boot2docker on OS X there are a few bugs with docker itself that you will probably need to work around. You’ll likely have to set up port forwards for 8000 and 8080 as described here.
Interactive mode
GRR containers can be started interactively with:
docker run -it grrdocker/grr:v3.4.6.0 /bin/bash
GRR gets installed into a virtualenv in `/usr/share/grr-server`. Thus, the easiest way to run any of the GRR binaries inside the Docker container is by first activating the virtualenv:

source /usr/share/grr-server/bin/activate

After that, commands such as `grr_server`, `grr_config_updater`, `grr_console`, etc. become available in the PATH.
Building custom docker images (Advanced)
It is advisable to read GRR’s Dockerfile in order to get an understanding of how GRR’s CI workflows build docker images before attempting this.
Check out GRR’s source code:
git clone https://github.com/google/grr
Make the changes you want to make to the Dockerfile, or related scripts. It is important to note that by default, GRR gets installed in the image from pre-built pip sdists hosted in Google Cloud Storage (so client or server code changes you make in your local repository will not be applied to the custom image).
Identify the commit for which you would like to fetch pre-built client and server pip sdists. For HEAD, that would be the output of:
git rev-parse HEAD
cd to the root of GRR’s repository if not already there, then build your custom image with:
```bash
docker build \
  -t grrdocker/grr:custom \
  --build-arg GCS_BUCKET=autobuilds.grr-response.com \
  --build-arg GRR_COMMIT=${GRR_COMMIT_SHA} \
  .
```
The image above will be tagged `grrdocker/grr:custom`, but you are free to use whatever tag you like.
Troubleshooting (“GRR server doesn’t seem to run”)
This page describes common issues encountered when installing the GRR server components.
OperationalError: Access denied; you need (at least one of) the SUPER privilege(s) for this operation
The MySQL server used by GRR has a `max_allowed_packet` setting that’s too low. It has to be at least 20971520. If the setting is lower, GRR tries to execute the statement `SET GLOBAL max_allowed_packet=20971520;` and fails if it doesn’t have enough privileges. To mitigate the issue, change the MySQL server’s `max_allowed_packet` setting (for example, by executing the statement above as a user with enough privileges).
ImportError: cannot import name jobs_pb2 or similar
If you see “ImportError: cannot import name jobs_pb2” or a similar error for any other _pb2 file, you need to regenerate the protobuf files. Just run
python setup.py build
sudo python setup.py install
The upstart/init.d scripts show no output
When I run an init.d script e.g. “/etc/init.d/grr-http-server start” it does not show me any output.
Make sure that the “START” parameter in the corresponding default file, e.g. “/etc/default/grr-http-server”, has been changed to “yes”.
I cannot start any/some of the GRR services using the init.d scripts
When I run an init.d script e.g. “/etc/init.d/grr-http-server start” it indicates it started the service although when I check with “/etc/init.d/grr-http-server status” it says it is not running.
You can troubleshoot by running the services in the foreground, e.g. to run the HTTP Front-end server in the foreground:
sudo grr_server --start_http_server --verbose
Any/some of the GRR services are not running correctly
Check if the logs contain an indication of what is going wrong.
Troubleshoot by running the services in the foreground, e.g. to run the UI in the foreground:
sudo grr_server --verbose --start_ui
Cannot open libtsk3.so.3
error while loading shared libraries: libtsk3.so.3: cannot open shared object file: No such file or directory
The libtsk3 library cannot be found in the ld cache. Check if the path to libtsk3.so.3 is in /etc/ld.so.conf (or equivalent) and update the cache:
sudo ldconfig
Protobuf
Travis jobs for GRR’s GitHub repository use this script to install protobuf. Older versions of protobuf may not be compatible with the GRR version you are trying to install.
System-wide Python packages
To avoid version conflicts with the GRR server’s Python environment (such as this one), we recommend using virtualenvs wherever possible on machines where GRR server components are to be installed.
Deploying GRR clients
Overview
This document describes getting clients up and running for the first time.
Getting started
Once we’ve got the GRR server installed and running we’ll want to start deploying some clients.
To do so we’ll need to:
Download the specific client version we need to install on the system.
Decide on a deployment method.
Perform the deployment and verify the results.
Requirements
A number of machines (or VMs) to talk to the server. OSX, Windows and Linux clients are supported. Client and server can run on the same host for testing purposes.
Installing the Clients
Downloading clients
If your server install went successfully, the clients should have been uploaded to the server with working configurations, and should be available in the Admin UI.
Click on “Binaries” in the left menu. The files should be in the executables directory, under installers.
If your server configuration has changed, your clients will need to be repacked with an updated config. For details see the server documentation.
Installation steps differ significantly depending on the operating system so we split it into separate sections below.
Run the client on the target machine as administrator: Windows instructions, OSX instructions, Linux instructions.
See How to check if a deployed client talks back to the GRR server.
Uninstalling GRR
A quick manual on how to remove the GRR client completely from a machine is included in the platform-specific docs: Windows instructions, OSX instructions, Linux instructions.
Notes
Deploying at scale
There shouldn’t be any special considerations for deploying GRR clients at scale. If the server can’t handle the load, the clients should happily back off and wait their turn. However, we recommend a staged rollout if possible.
Client and Server Version Compatibility and Numbering
We try hard to avoid breaking backwards compatibility for clients since upgrading can be painful, but occasionally we need to make changes that require a new client version to work. As a general rule you want to upgrade the server first, then upgrade the clients fairly soon after.
Matching major/minor versions of client and server should work well together, e.g. clients 3.1.0.0 and 3.1.6.2 should work well with servers 3.1.0.0 and 3.1.9.7 because they are all in the 3.1 series. We introduced this approach with the 3.1.0.0 release.
For older servers and clients, matching the last digit provided similar guarantees, e.g. client 3.0.0.7 was released with server 0.3.0-7 and they should work well together.
How to check if a deployed client talks back to the GRR server
If a client installation is successful, the client should appear in the web UI search within a few seconds.
After install, hit Enter in the search box in the top left corner of the UI to see all of your clients that have enrolled with the server.
If you don’t see the clients, follow the troubleshooting steps.
Installing GRR clients on Windows
MSI installer
Since version 3.4.5.1, GRR provides a new MSI-based installer for the client on Windows.
The client can be installed either by double-clicking the installer or by using the `msiexec` command. The client can be removed via the “Apps and features” settings in the control panel or by using `msiexec`.
Legacy self-extracting .exe installer
For Windows you will see a 64 bit installer. Run the installer as admin (it should show the UAC prompt if you are not admin). It should run silently and install the client to `c:\windows\system32\grr\%version%\`. It will also install a Windows service, start it, and configure the registry keys to make it talk to the URL/server you specified during repack of the clients on the server.
The Windows clients are special self-extracting zipfiles. Just double-click or otherwise execute the binary. If you are not an administrator, it will prompt you for credentials. It should then install silently in the background, unless you enabled the verbose build.
The most straightforward way to deploy a GRR client to a Windows machine is to use PsExec. PsExec allows one to execute commands on a remote system if credentials for a valid user are known.
To do so, start by downloading PsExec and placing it in a directory of your choice; we’ll call it CLIENT_DIRECTORY here. Store the version of the client you want to deploy in the same directory.
Once you have both, make sure you know the username and password of an administrator user on the remote system. When all these requirements are met, start a cmd.exe shell and type:
cd C:\CLIENT_DIRECTORY\
net use \\MACHINE\IPC$ /USER:USERNAME *
psexec \\MACHINE -c -f -s client-version.exe
Note: The `NET USE` command will ask for a password interactively, so it’s not suited for use in scripts. You could switch the `*` for the PASSWORD instead if you want to include it in a script.
You’ll need to replace:
C:\CLIENT_DIRECTORY\ with the full path you chose.
MACHINE with the name of the target system.
USERNAME with the user with administrative privileges on the target system.
This will copy the client-version.exe executable to the target system and execute it. The installation doesn’t require user input.
The expected output is something along these lines:
C:\> cd C:\CLIENT_DIRECTORY\
C:\> net use \\127.0.0.1\IPC$ /USER:admin *
Type the password for \\127.0.0.1\IPC$:
The command completed successfully
C:\CLIENT_DIRECTORY> psexec \\127.0.0.1 -c -f -s client.exe
PsExec v1.98 - Execute processes remotely
Copyright (C) 2001-2010 Mark Russinovich
Sysinternals - www.sysinternals.com
The command completed successfully.
client.exe exited on 127.0.0.1 with error code 0.
C:\CLIENT_DIRECTORY>
For an even smaller installation footprint, you could host the client on a shared folder on the network and use this psexec command instead:
cd C:\CLIENT_DIRECTORY\
net use \\MACHINE\IPC$ /USER:USERNAME *
psexec \\MACHINE -s \\SHARE\FOLDER\client-version.exe
This requires the USERNAME on the remote MACHINE be able to log into SHARE and access the shared folder FOLDER. You can do this either by explicitly allowing the user USERNAME on that share or by using an Anonymous share.
The best way to verify whether the whole installation process has worked is to search for the client in the GUI.
Uninstalling
On Windows the client does not have a standard uninstaller. It is designed to have minimal impact on the system and leave limited traces of itself such that it can be hidden reasonably easily. Thus it was designed to install silently without an uninstall.
Disabling the service can be done with the Uninstall GRR flow, but this does not clean up after itself by default.
Cleaning up the client is a matter of deleting the service and the install directory, then optionally removing the registry keys and install log if one was created.
On Windows, GRR lives in `%SystemRoot%\system32\grr\*`.

The service can be stopped with

sc stop "grr monitor"

or via the task manager.

The GRR config lives in the registry; for a full cleanup, the path `HKEY_LOCAL_MACHINE\Software\GRR` should be deleted.
Removing the GRR client completely from a machine:
sc stop "grr monitor"
sc delete "grr monitor"
reg delete HKLM\Software\GRR
rmdir /Q /S c:\windows\system32\grr
del /F c:\windows\system32\grr_installer.txt
Installing GRR clients on Mac OS X
For OSX you will see a pkg file; install the pkg. It will add a launchd item and start it.
See the Linux instructions; they also apply to OSX.
On OSX you can also use the Uninstall GRR flow.
Uninstalling on Mac OS X
This is a quick manual on how to remove the GRR client completely from a machine.
On OSX, pkg uninstall is not supported. The files to delete are:
/usr/local/lib/grr/*
/etc/grr.local.yaml
/Library/LaunchDaemons/com.google.code.grr.plist
The service can be stopped using
sudo launchctl unload /Library/LaunchDaemons/com.google.code.grr.plist
Installing GRR clients on Linux
For Linux you will see deb and rpm packages; install the appropriate one. For testing purposes you can run the client on the same machine as the server if you like.
The process depends on your environment. If you have a mechanism such as Puppet, then building a deb package and deploying that way makes the most sense. Alternatively you can deploy using ssh:
scp client_version.deb host:/tmp/
ssh host sudo dpkg -i /tmp/client_version.deb
Uninstalling on Linux
This is a quick manual on how to remove the GRR client completely from a machine.
On Linux the standard system packaging (deb, rpm) is used by default. Use the standard uninstall mechanisms for uninstalling.
dpkg -r grr
This might leave some config files lying around, if a complete purge is necessary, the list of files to delete is:
/usr/lib/grr/*
/etc/grr.local.yaml
/etc/init/grr.conf
The GRR service can be stopped using
sudo service grr stop
Life of a GRR client (what happens after deployment)
When a new client starts, it notices it doesn’t have a public/private key pair.
The client generates the keys. A hash of the public key becomes the client’s ID. The ID is unique.
The client enrolls with the GRR server.
The server sees the client ID for the first time and interrogates the new client.
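The exact fingerprinting scheme is an implementation detail of the client, but conceptually the ID derivation in step 2 above looks something like this sketch (the hash choice and truncation here are illustrative assumptions, not GRR’s actual code):

```python
import hashlib

def client_id_from_public_key(public_key_der: bytes) -> str:
    # Illustrative assumption: hash the serialized public key and keep a
    # fixed-size prefix. Real GRR client IDs look like "C.a2be2a27a8d69c61".
    digest = hashlib.sha256(public_key_der).digest()
    return "C." + digest[:8].hex()
```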
Client Robustness Mechanisms
We have a number of mechanisms built into the client to try and ensure it has sensible resource requirements, doesn’t get out of control, and doesn’t accidentally die. We document them here.
Heart beat
The client process regularly writes to a registry key (a file on Linux and OSX) on a timer. The nanny process watches this registry key, called HeartBeat; if it notices that the client hasn’t updated the heartbeat in the time allocated by UNRESPONSIVE_KILL_PERIOD (default 3 minutes), the nanny will assume the client has hung and will kill it. On Windows we then rely on the nanny to revive it; on Linux and OSX we rely on the service-handling mechanism to do so.
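As a rough sketch of the mechanism, using a file as on Linux and OSX (the heartbeat interval and path are illustrative assumptions; only UNRESPONSIVE_KILL_PERIOD is documented above):

```python
import time

UNRESPONSIVE_KILL_PERIOD = 180  # seconds; the documented 3-minute default
HEARTBEAT_INTERVAL = 30         # seconds; illustrative assumption

def client_heartbeat_loop(path="/tmp/grr_heartbeat"):
    # Client side: record a fresh timestamp on a timer.
    while True:
        with open(path, "w") as f:
            f.write(str(time.time()))
        time.sleep(HEARTBEAT_INTERVAL)

def nanny_should_kill(path="/tmp/grr_heartbeat") -> bool:
    # Nanny side: kill the client if the last beat is too old.
    with open(path) as f:
        last_beat = float(f.read())
    return time.time() - last_beat > UNRESPONSIVE_KILL_PERIOD
```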
Transaction log
When the client is about to start an action it writes to a registry key containing information about what it is about to do. If the client dies while performing the action, when the client gets restarted it will send an error along with the data from the transaction log to help diagnose the issue.
One tricky thing with the transaction log is the case of BlueScreens or kernel panics. Writing to the transaction log writes a registry key on Windows, but registry keys are not flushed to disk immediately. Therefore, if the client writes a transaction log entry and then the machine hits a hard BlueScreen or kernel panic, the entry won’t have been persisted, and the error won’t be sent. We work around this by flushing the transaction log when we are about to perform dangerous transactions, such as loading a memory driver. But if the client dies during a transaction we didn’t deem dangerous, it is possible that you will not get a crash report.
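On Linux and OSX the equivalent of the registry write is a file, so the idea reduces to something like the following sketch (the path and serialization are assumptions):

```python
import os

TRANSACTION_LOG = "/tmp/grr_transaction.log"  # illustrative path

def log_transaction(action_name: str, dangerous: bool):
    # Record what we are about to do so a crash can be reported on restart.
    with open(TRANSACTION_LOG, "wb") as f:
        f.write(action_name.encode("utf-8"))
        if dangerous:
            # Force the entry to stable storage so it survives a hard
            # BlueScreen or kernel panic, as discussed above.
            f.flush()
            os.fsync(f.fileno())
```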
Memory limit
We have a hard and a soft memory limit built into the client to stop it getting out of control. The hard limit is enforced by the nanny: if the client goes over that limit it will be hard-killed. The soft limit is enforced by the client: if the limit is exceeded, the client will stop retrieving new work to do. Once it has finished its current work it will die cleanly.
The default soft limit is 500MB, but GRR should only use about 30MB. Some Volatility plugins can use a lot of memory, so we try to be generous. The hard limit is double the soft limit. This is configurable from the config file.
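A minimal sketch of how the soft-limit check might work, using psutil (which the client already ships with); the loop structure is an assumption:

```python
import os
import psutil

SOFT_LIMIT = 500 * 1024 * 1024  # the 500MB default soft limit described above

def over_soft_limit() -> bool:
    # Resident set size of the current client process.
    return psutil.Process(os.getpid()).memory_info().rss > SOFT_LIMIT

def client_loop(get_work, do_work):
    # Stop fetching new work once over the limit, then exit cleanly.
    while not over_soft_limit():
        do_work(get_work())
```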
CPU limit
A ClientAction can be transmitted from the server with a specified CPU limit: the number of CPU seconds the action may use. If the action uses more than that it will be killed. The actual implementation is a little more complicated: an action can run for 3 minutes using any CPU it wants before being killed by the nanny. However, actions that are good citizens (normally the dangerous ones) call the Progress() function regularly. This function checks whether the limit has been exceeded and, if so, exits.
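A sketch of the cooperative check that such a Progress() call performs (illustrative only; everything except the Progress name is an assumption):

```python
import time

class CpuLimitExceeded(Exception):
    pass

class ActionContext:
    def __init__(self, cpu_limit_seconds: float):
        self.cpu_limit = cpu_limit_seconds
        self.start = time.process_time()  # CPU seconds used by this process

    def Progress(self):
        # Well-behaved actions call this regularly; if the CPU budget sent
        # along with the action is exhausted, abort the action.
        if time.process_time() - self.start > self.cpu_limit:
            raise CpuLimitExceeded("CPU limit exceeded, aborting action")
```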
Advanced
The GRR client uses HTTP to communicate with the server.
The client connections are controlled via a number of config parameters:
Client.error_poll_min: Time to wait between retries in an ERROR state.
Client.server_urls: A list of URLs for the base control server.
Client.proxy_servers: A list of proxies to try to connect through.
Client.poll_max, Client.poll_min: Parameters for timing of SLOW POLL and FAST POLL modes.
The client goes through a state machine:
In the INITIAL state, the client has no active server URL or active proxy and it is therefore searching through the list of proxies and connection URLs for one that works. The client will try each combination of proxy/URL in turn without delay until a 200 or a 406 message is seen. If all possibilities are exhausted, and a connection is not established, the client will switch to SLOW POLL mode (and retry connection every Client.poll_max).
In SLOW POLL mode the client will wait Client.poll_max between re-connection attempts.
If a server is detected, the client will communicate with it. If the server returns a 406 error, the client will send an enrollment request. Enrollment requests are only re-sent every 10 minutes (regardless of the frequency of 406 responses). Note that a 406 message is considered a valid connection and the client will not search for URL/Proxy combinations as long as it keeps receiving 406 responses.
During the CONNECTED state, the client has a valid server certificate, receives 200 responses from the server and is able to send messages to the server. The polling frequency in this state is determined by the polling mode requested by the messages received or sent. If any message from the server or from the worker queue (to the server) has the require_fastpoll flag set, the client switches into FAST POLL mode.
When not in FAST POLL mode, the polling frequency is controlled by the Timer() object. The poll delay currently grows geometrically, starting at Client.poll_min and approaching the Client.poll_max setting.
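A sketch of such a backoff timer; the growth factor is an assumption inferred from the sample client output later in this document, where quiet-poll sleeps grow by roughly 1.15x:

```python
import time

class PollTimer:
    def __init__(self, poll_min=0.2, poll_max=600.0, factor=1.15):
        # poll_min and poll_max mirror Client.poll_min / Client.poll_max.
        self.poll_min, self.poll_max, self.factor = poll_min, poll_max, factor
        self.delay = poll_min

    def Wait(self):
        time.sleep(self.delay)
        # Each poll without urgent traffic grows the delay geometrically,
        # approaching poll_max.
        self.delay = min(self.delay * self.factor, self.poll_max)

    def FastPoll(self):
        # A message with require_fastpoll set snaps the delay back down.
        self.delay = self.poll_min
```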
If a 500 error occurs in the CONNECTED state, the client will assume that the server is temporarily down. The client will switch to the RETRY state and retry sending the data, with a fixed frequency determined by Client.error_poll_min, to the same URL/Proxy combination. The client will retry retry_error_limit times before exiting the CONNECTED state and returning to the INITIAL state (i.e. the client will start searching for a new URL/Proxy combination). If a retry is successful, the client will return to its designated polling frequency.
If there are connection_error_limit failures, the client will exit. Hopefully the nanny will restart the client.
Examples:
Client starts up on a disconnected network: Client will try every URL/Proxy combination once every Client.poll_max (default 10 minutes).
Client connects successfully but loses network connectivity: the client will retry retry_error_limit times (10 by default), every Client.error_poll_min (1 minute), to resend the last message. If it does not succeed, it starts searching for a new URL/Proxy combination as in example 1.
See comms.py for the client implementation.
GRR Client Protection
The open source agent does not contain protection against being disabled by administrator/root on the machine. E.g. on Windows, if an attacker stops the service, the agent will stop and will no longer be reachable. Currently, it is up to the deployer of GRR to provide more protection for the service.
Obfuscation
If every deployment in the world is running from the same location and the same code, e.g. `c:\program files\grr\grr.exe`, it becomes a pretty obvious thing for an attacker to look for and disable. Luckily the attacker has the same problem an investigator has in finding malware on a system, and we can use the same techniques to protect the client. One of the key benefits of having an open architecture is that customization of the client and server is easy, and completely within your control.
For a test or low-security deployment, using the default open source agents is fine. However, in a secure environment we strongly recommend using some form of obfuscation.
This can come in many forms, but to give some examples:
Changing service, and binary names
Changing registry keys
Obfuscating the underlying python code
Using a packer to obfuscate the resulting binary
Implementing advanced protective or obfuscation functionality such as those used in rootkits
Implementing watchers to monitor for failure of the client
GRR does not include any obfuscation mechanisms by default. But we attempt to make this relatively easy by controlling the build process through the configuration file.
Enrollment
In the default setup, clients can register to the GRR server with no prior knowledge. This means that anyone who has a copy of the GRR agent and knows the address of your GRR server can register their client to your deployment. This significantly eases deployment, and is generally considered low risk as the client has no control over or trust from the server.
However, it does introduce some risk, in particular:
If there are flows or hunts you deploy to the entire fleet, a malicious client may receive them. These could give away information about what you are searching for.
Clients are allowed to send some limited messages to the server without prompting; these are called Well Known flows. By default these can be used to send log messages or errors. A malicious client using these could fill up logs and disk space.
If you have custom Well Known Flows that perform interesting actions, you need to be aware that untrusted clients can call them. Most often this could result in a DoS condition, e.g. through a client sending multiple install-failure or client-crash messages.
In many environments this risk is unwarranted, so we suggest implementing further authorization in the Enrollment Flow using some information that only your client knows, to authenticate it before allowing it to become a registered client.
Note that this does not give someone the ability to overwrite data from another client, as client name collisions are protected.
Troubleshooting (“I don’t see my clients”)
Debugging the client installation
If the installer is failing to run, it should output a log file which will help you debug. The location of the logfile is configurable, but by default should be:
Windows: %WinDir%\system32\logfiles\GRR_installer.txt
Linux/Mac OSX: /tmp/grr_installer.txt
Once you have repacked the client in verbose mode (described below), you can download the new binary from the Web UI. It should have the same configuration, but will output detailed progress to the console, making it much easier to debug.
Note that the binary is also a zipfile; you can open it in any capable zip reader. Unfortunately this doesn’t include the built-in Windows zip file handler, but WinZip and 7-Zip both work. Opening the zip is useful for reading the config or checking that the right dependencies have been included.
Repacking the Windows client in verbose mode enables console output for both the installer and for the application itself. It does so by updating the header of the binary at PE_HEADER_OFFSET + 0x5c from value 2 to 3. This is at 0x144 on 64 bit and 0x134 on 32 bit Windows binaries. You can do this manually with a hex editor as well.
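Since the offsets are spelled out above, here is a hedged sketch of making that patch programmatically rather than with a hex editor (assuming a standard PE header layout; back up the binary first):

```python
import struct

def enable_console_output(path: str):
    with open(path, "r+b") as f:
        header = f.read(0x40)
        # e_lfanew at offset 0x3c gives PE_HEADER_OFFSET.
        pe_offset = struct.unpack_from("<I", header, 0x3c)[0]
        f.seek(pe_offset + 0x5c)
        value = f.read(1)[0]
        if value == 2:  # value 2, as described above
            f.seek(pe_offset + 0x5c)
            f.write(bytes([3]))  # change to 3 to enable console output
```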
Interactively Debugging the Client
On each platform, the client binary should support the following options:
--verbose: This will set higher logging, allowing you to see what is going on.
--pdb_post_mortem: If set, and an unhandled error occurs in the client, the client will break into a pdb debugging shell.
```
C:\Windows\system32>net stop "grr monitor"
The GRR Monitor service is stopping.
The GRR Monitor service was stopped successfully.

C:\Windows\system32>c:\windows\system32\grr\3.4.6.0\grr.exe --config grr.exe.yaml --verbose
```
Note that on Windows, the GRR client is running completely in the background so starting it from the console will not give you any output. For this reason, we always also build a dbg_GRR[…].exe client that is a console application and can be used for this kind of debugging.
Changing Logging For Debugging
On all platforms, by default only hard errors are logged. A hard error is defined as anything level ERROR or above, which is generally reserved for unrecoverable errors. Because temporary disconnections are normal, a client failing to talk to the server doesn’t actually count as a hard error.
In the client you will likely want to set:
Logging.verbose: True
And depending on your configuration, you can play with syslog, log file and Windows EventLog logging using the parameters `Logging.path` and `Logging.engines`.
Proxies and Connectivity
If a client can’t connect to the server, there can be a number of reasons, such as:
Server Isn’t Listening: Confirm you can connect to the server and retrieve the server.pem file, e.g.
wget http://server:8080/server.pem
Proxy Required For Access: If the environment doesn’t allow direct connections, GRR may need to use a proxy. GRR currently doesn’t support Proxy Autoconfig or Proxy Authentication. GRR will attempt to guess your proxy configuration, or you can explicitly set proxies in the config file, e.g.
Client.proxy_servers: ["http://cache.example.com:3128/"]
On Windows systems GRR will try a direct connection, and then search for configured proxies in all users profiles on the system trying to get a working connection. On Linux GRR should obey system proxy settings, and it will also obey environment variables. e.g.
$ export http_proxy=http://cache.example.com:3128
Outbound Firewall Blocking Connections: GRR doesn’t do anything to bypass egress firewalling by default. However, if you have a restrictive policy you could add this as an installer plugin.
If you look at the running config, the first time the client successfully connects to the server a variable `Client.server_serial_number` will be written to the config. If that exists, the client successfully made a connection.
Crashes
The client shouldn’t ever crash… but it does because making software is hard. There are a few ways in which this can happen, all of which we try and catch, record and make visible to allow for debugging. In the UI they are visible in two ways, in “Crashes” when a client is selected, and in “All Client Crashes”. These have the same information but the client view only shows crashes for the specific client.
Each crash should contain the reason for the crash; optionally it may also contain the flow or action that caused the crash. In some cases this information is not available because the client may have crashed when it wasn’t doing anything, or in a way where we could not tie it to the action.
This data is also emailed to the email address configured in the config as `Monitoring.alert_email`.
Crash Types
Crashed while executing an action
Often seen with an error “Client killed during transaction”. This means that while handling a specific action, the client died. The nanny knows this because the client recorded the action it was about to take in the Transaction Log before starting it. When the client restarts it picks up this log and notifies the server of the crash.
Causes
Client segfaults; this could happen in native code such as Sleuth Kit or psutil.
Hard reboot while the machine was running an action where the client service didn’t have a chance to exit cleanly.
Unexpected child process exit!
This means the client exited, but the nanny didn’t kill it.
Causes
Uncaught exception in Python; very unlikely, since we catch Exception for all client actions.
Memory limit exceeded, exiting
This means the client exited due to exceeding the soft memory limit.
Causes
The client hits the soft memory limit. The soft memory limit is the point at which the client knows it is using too much memory but will continue operation until it finishes what it is doing.
Nanny Message - No heartbeat received
This means that the Nanny killed the client because it didn’t receive a Heartbeat within the allocated time.
Causes
The client has hung, e.g. locked up while accessing a network file.
The client is performing an action that is taking longer than it should.
I don’t see my clients
Here are common client troubleshooting steps you can work through to diagnose client install and communications problems.
Check that the client got installed
You should have something like:

`C:\Windows\System32\GRR\3.1.0.2\GRR.exe` on Windows;
`/usr/lib/grr/grr_3.1.0.2_amd64/grrd` on Linux; and
`/usr/local/lib/grr_3.1.0.2_amd64/grrd` on OSX

with variations based on the GRR version and architecture you installed. If you didn’t get this far, then there is a problem with your installer binary.
Check the client is running
On Linux and OS X you should see two processes, something like:
$ ps aux | grep grr
root 957 0.0 0.0 11792 944 ? Ss 01:12 0:00 /usr/sbin/grrd --config=/usr/lib/grr/grr_3.1.0.2_amd64/grrd.yaml
root 1015 0.2 2.4 750532 93532 ? Sl 01:12 0:01 /usr/sbin/grrd --config=/usr/lib/grr/grr_3.1.0.2_amd64/grrd.yaml
On Windows you should see a `GRR Monitor` service and a `GRR.exe` process in Task Manager.
If it isn’t running, check the install logs and other logs in the same directory:
Linux/OS X:
/var/log/grr_installer.txt
Windows:
C:\Windows\System32\LogFiles\GRR_installer.txt
and then try running it interactively as below.
Check the client machine can talk to the server
The URL here should be the server address and port you picked when you set up the server, as listed in `Client.server_urls` in the client’s config.
wget http://yourgrrserver.yourcompany.com:8080/server.pem
# Check your config settings, note that clients earlier than 3.1.0.2 used Client.control_urls
$ sudo cat /usr/lib/grr/grr_3.1.0.2_amd64/grrd.yaml | grep Client.server_urls
Client.server_urls: http://yourgrrserver.yourcompany.com:8080/
If you can’t download that server.pem, the common causes are that your server isn’t running or there are firewalls in the way.
Run the client in verbose mode
Linux: Stop the daemon version of the service and run it in verbose mode:
$ sudo service grr stop
$ sudo /usr/sbin/grrd --config=/usr/lib/grr/grr_3.1.0.2_amd64/grrd.yaml --verbose
OS X: Unload the service and run it in verbose mode:
$ sudo launchctl unload /Library/LaunchDaemons/com.google.code.grr.plist
$ sudo /usr/local/lib/grr_3.1.0.2_amd64/grrd --config=/usr/local/lib/grr_3.1.0.2_amd64/grrd.yaml --verbose
Windows: The normal installer isn’t a terminal app, so you don’t get any output if you run it interactively.
Install the debug `dbg_GRR_3.1.0.2_amd64.exe` version to make it a terminal app.

Stop the `GRR Monitor` service in Task Manager.

Then run in a terminal as Administrator:
cd C:\Windows\System32\GRR\3.1.0.2\
GRR.exe --config=GRR.exe.yaml --verbose
If this is a new client you should see some 406’s as it enrols, then they stop and are replaced with sending message lines with increasing sleeps in between. The output should look similar to this:
Starting client aff4:/C.a2be2a27a8d69c61
aff4:/C.a2be2a27a8d69c61: Could not connect to server at http://somehost:port/, status 406
Server PEM re-keyed.
sending enrollment request
aff4:/C.a2be2a27a8d69c61: Could not connect to server at http://somehost:port/, status 406
Server PEM re-keyed.
sending enrollment request
aff4:/C.a2be2a27a8d69c61: Could not connect to server at http://somehost:port/, status 406
Server PEM re-keyed.
aff4:/C.a2be2a27a8d69c61: Sending 3(1499), Received 0 messages in 0.771058797836 sec. Sleeping for 0.34980125
aff4:/C.a2be2a27a8d69c61: Sending 0(634), Received 0 messages in 0.370272874832 sec. Sleeping for 0.4022714375
aff4:/C.a2be2a27a8d69c61: Sending 0(634), Received 0 messages in 0.333703994751 sec. Sleeping for 0.462612153125
aff4:/C.a2be2a27a8d69c61: Sending 0(634), Received 0 messages in 0.345727920532 sec. Sleeping for 0.532003976094
aff4:/C.a2be2a27a8d69c61: Sending 0(634), Received 0 messages in 0.346176147461 sec. Sleeping for 0.611804572508
aff4:/C.a2be2a27a8d69c61: Sending 0(634), Received 8 messages in 0.348709106445 sec. Sleeping for 0.2
If enrolment isn’t succeeding (406s never go away), make sure you’re running a worker process and check logs in `/var/log/grr-worker.log`.
Investigating with GRR
Overview
Once set up and configured, the GRR user interface is a web interface which allows the analyst to search for connected client (agent) machines, examine what data has been collected from the machines and issue requests to collect additional data.
The GRR server also provides access to this functionality through a JSON API. Client libraries to support scripting from Python or Go are provided.
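For example, connecting to the API with the Python client library (the grr_api_client PIP package) looks roughly like the sketch below; the endpoint and credentials are placeholders for your deployment:

```python
from grr_api_client import api

# Placeholder endpoint and credentials.
grrapi = api.InitHttp(api_endpoint="http://localhost:8000",
                      auth=("admin", "demo"))
```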
Starting Points
Depending on the type of client data that the analyst is interested in, there are several places that they might start.
Virtual File System
The virtual file system shows the files, directories, and registry entries which have already been collected from a client. It shows when each entry was collected, and provides buttons to collect additional data of this sort.
This is a natural starting point for ad-hoc examination of an individual machine.
Flows
A Flow performs one or more operations on a client machine in order to collect or check for data. For example, the data collection buttons shown by the virtual file system start flows to collect specific files and directories. However, flows can do many other things, from searching a directory for files containing a particular substring to recording the current network configuration. The administrative interface shows, for each client, the flows which have been launched against it, along with each flow’s status and any results returned.
When an analyst would like to collect a specific bit of information about a machine, they will need to directly or indirectly run a flow.
Hunts
A Hunt is a mechanism to run a Flow on a number of clients. For example, this makes it possible to check if any Windows machine in the fleet has a file with a particular name in a particular location.
Artifacts
An Artifact is a way to collect and name a group of files or other data that an analyst might want to collect as a unit. For example, an artifact might try to collect all the common Linux persistence mechanisms.
Client-Server Communication
When a Flow needs to request information from a client, it queues up a message for the client. GRR clients poll the GRR server approximately every 10 minutes, so the client will receive the message and begin responding to the request at its next poll.
After a client performs some work, it will normally enter ‘fast-poll’ mode in which it polls much more rapidly. Therefore when an analyst requests data from a machine, it might initially take some minutes to respond but additional requests will be noticed more quickly.
Protocol
The client poll is an HTTP request. It passes a signed and encrypted payload and expects the same from the GRR server. The client signs using its client key. This key is created on the client when first run, and the GRR ID is actually just a fingerprint of this key.
This means that no configuration is required by the client to establish an identity, but that clients cannot eavesdrop on or impersonate other clients.
Security Considerations
As described in Client-Server Communication, communication between the GRR client and server is secure against eavesdropping and impersonation. Furthermore, the server’s record of flows run and files downloaded is not changed after the fact. It is intended to provide a secure archive of data that has been gathered from machines.
Root Privileges
The GRR client agent normally runs as root and by design is capable of reading any data on the system. Furthermore, some flows can use significant resources on the client (e.g. searching a large directory recursively), and the GRR agent does have the ability to download and execute server-provided code, though the code does need to be signed.
For these reasons, access to the GRR server should be considered tantamount to having root access to all GRR client machines. While we provide auditing and authorization support to mitigate the risks, the risks inherent in this should be understood when configuring and using GRR.
Searching For A Client
In order to start interfacing with a client, we first need to search for it in the GUI. The GRR search bar is located at the top of the GUI and allows you to search clients based on:
Hostname: “host:myhost-name”
Fully Qualified Domain Name (FQDN): “fqdn:myhost-name.organization.com”, also prefixes of components, e.g. “fqdn:myhost-name.organization”
MAC address: “mac:eeffaabbccdd”.
IP address: “ip:10.10.10.10”, also prefixes of bytes “ip:10.10”. Note that IP address is only collected during interrogate, which by default is run once per week.
User: “user:john”
Label: “label:testmachines”. Finds hosts with a particular GRR label.
Time of Last Data Update: Time ranges can be given using “start_date:” and “end_date:” prefixes. The value is interpreted as a human-readable timestamp. Examples: start_date:2015, end_date:2018-01-01.
All of these keywords also work without the type specifier, though with less precision. For example, “johnsmith” is both a user name and a host name, so it will match both.
Furthermore, there are additional keywords such as OS and OS version. So “Windows” will find all Windows machines, “6.1.7601SP1” will match Windows 7 machines with SP1 installed, and “6.1.7601” will match those without a service pack.
By default, the search index only considers clients that have checked in during the last six months. To override this behavior, use an explicit “start_date:” directive as specified above.
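The same search keywords can be used programmatically. Below is a minimal sketch using the grr_api_client Python library (see Automation using the GRR API); the endpoint, credentials and query are placeholders for your own deployment:
from grr_api_client import api

# Connect to the Admin UI's API endpoint (placeholder values).
grrapi = api.InitHttp(api_endpoint="http://localhost:8000",
                      auth=("admin", "password"))

# The query string accepts the same keywords as the GUI search bar.
for client in grrapi.SearchClients("host:myhost-name label:testmachines"):
    print(client.client_id)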
Interpreting Client Search Results
Searching returns a list of clients with the following information about each one:
Online: An icon indicating whether the host is online or not. Green means online; yellow, offline for some time; red, offline for a long time.
Subject: The client identifier (ID). This is how GRR refers to the system internally.
Host: The name of the host as the operating system sees it.
Version: The operating system version.
MAC: A list of MAC addresses of the system.
Usernames: A list of user accounts the operating system knows about (usually users local to the system or that have logged in).
First Seen: The time when the client first talked to the server.
OS install time: The timestamp for the operating system install.
Labels: Any labels applied to this client.
Last Checkin: The last time the client communicated with the server.
Once you’ve found the client you were looking for, click on it and both the left panel and main panel will change to reflect that you’re now working with that client.
Flows
GRR Flows
When designing GRR, one of the main goals was achieving great scalability. One of the main resource hogs with the client-server model is that while a client is active all resources that might have been needed on the server side to communicate with it and do processing are held (think temporary buffers, sockets, file descriptors…). Even when the client itself is doing operations that take time such as heavy computations or waiting on I/O, resources are held on the server.
When trying to deal with thousands of clients at the same time, this translates into the server hoarding many unneeded resources.
To solve the resource hogging problem, Flows were created. Flows are the server-side code entities that call client actions. These calls are done asynchronously. That is, they are requested and their results become available later on. Flows are like a state machine, where transition between states happens when the results of client actions return to the server. So here’s what happens when the GRR server launches a typical Flow.
The GRR server executes the initial Flow state.
This state asks for one or more client actions to be performed on the client.
The server releases all the resources this Flow has been using and waits for responses from the client to come back.
When responses are received, the server fetches all the needed resources again and runs the Flow state where it expects these responses. If more client actions are requested by this state it goes back to step 2. Otherwise…
The results of this Flow are stored and the flow state is updated.
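As an illustration of this request/response cycle, here is a deliberately simplified sketch of what a flow class looks like. The method names CallClient and SendReply mirror GRR’s flow API, but the class below is a standalone toy, not GRR’s actual base class:
class ListDirectoryFlow(object):
    """Toy flow: ask the client to list a directory, then store results."""

    def Start(self):
        # Initial state: request a client action. The call is asynchronous;
        # the server persists the flow and frees its resources right here.
        self.CallClient("ListDirectory", path="/etc",
                        next_state="ProcessListing")

    def ProcessListing(self, responses):
        # Runs later, once the client's responses have arrived and the
        # server has loaded the flow back from storage.
        for entry in responses:
            self.SendReply(entry)  # store each result for the analyst

    # In GRR these are provided by the flow framework; stubbed out here.
    def CallClient(self, action, next_state, **kwargs):
        raise NotImplementedError

    def SendReply(self, response):
        raise NotImplementedError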
Flows have a second very interesting property. For flows that use only the most primitive client actions, all of the logic is encapsulated on the server side and the client holds no state at all, so these flows naturally survive client reboots while processing is taking place.
In the GRR UI, while having selected a client in the GUI, clicking on the Manage launched flows link on the left panel takes you to a view that shows all the Flows that have been requested on this client.
The table view shows the current state of the flow, the flow identifier (Path), the name of the Flow launched, the date when it was launched, when it was last active, and who created it.
As you can see, a couple of Flows have been launched in the shown example:
CAEnroler. This is the first flow ever to launch for any client. It is the enrolling Flow which gets the client set up on the server side.
Interrogate. After enrolling, a client sends some information about the machine it’s running on, such as the hostname, MAC address or users available on the system. This is the flow that fetches this information; if you remember the Host Information option, most of that information is gathered here.
This flow also has a few subflows shown in the table - flows can call other flows to collect more information from the client.
ListDirectory. A Flow that lists the contents of a directory. This is what happened when the refresh button was pressed on the GUI.
FileFinder. This flow is used to download and list files on the client machine.
Let’s see the ListDirectory flow in detail. You can click on any flow to get detailed information.
There’s a lot of information here. Some bits worth mentioning are:
Flow state, which tells us whether it finished correctly (oddly named TERMINATED) or not (ERROR), or if it’s still running (RUNNING).
Arguments show the arguments this flow was started with.
In Context there is information about resource usage of the flow and metadata about how many messages were sent back and forth to the client.
State Data shows internal server side state of this flow. This information is mostly useful for debugging.
The other tabs shown on the flow details view are
Requests. Shows the pending requests this flow is waiting on while it’s running. Mostly for debugging.
Results. The results returned by this flow. More information in Working with Results.
Log. The log for this flow.
Api. This tab shows how the flow can be started automatically using the GRR API. See the Chapter on Automation.
Starting Flows
To start a new Flow simply click on the Start new flows option on the left panel while having a client selected. The main panel will populate with the holy trinity of panels. The tree view shows all the Flows organized by category.
For example, in order to start a FileFinder flow, expand the FileSystem category and select the corresponding item. The flow view will populate with a form with all the user-configurable parameters for this flow. What’s more, because each parameter has a well-defined type, GRR shows you widgets to select a value for each of them.
The FileFinder flow accepts a range of parameters:
Paths. This is a list of textual paths that you want to look at.
Pathtype. Which VFS handler you want to use for the path. Available options are:
OS. Uses the OS “open” facility. This is the most straightforward option for a first user. Examples of os paths are C:/Windows on Windows or /etc/init.d/ on Linux/OSX.
TSK. Uses the Sleuth Kit. Because Sleuthkit is invoked, a path to the device is needed along with the actual directory path. Examples of tsk paths are \\?\Volume{19b4a721-6e90-12d3-fa01-806e6f6e6963}\Windows for Windows or /dev/sda1/init.d/ on Linux (but GRR is smart enough to figure out what you want if you use C:\Windows or /init.d/ instead, even though some guessing is involved).
REGISTRY. Windows-related. You can open the live Windows registry as if it was a virtual filesystem, so you can specify a path such as HKEY_LOCAL_MACHINE/Select/Current.
MEMORY and TMPFILE are internal and should not be used in most cases.
Condition. The FileFinder can filter files based on conditions like file size or file contents. The different conditions should be self-explanatory. Multiple conditions can be stacked; the file will only be processed if it fulfills them all.
Action. Once a file passes all the conditions, the action decides what should be done with it. Options are STAT, HASH and DOWNLOAD. STAT basically just indicates whether a file exists and is mostly used to list directories (path C:\Windows\* and action STAT). HASH returns a list of hashes of the file, and DOWNLOAD collects the file from the client and stores it on the server.
For this example, a good set of arguments would be a directory listing, something like path C:\Windows\* or /tmp/* and action STAT. Once you’ve filled in each required field, click on Launch; if all parameters validate, the Flow will run. Now you can go to the Manage launched flows view to find it running or track it.
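The Api tab described earlier shows the exact request for any flow you configure in the GUI. As a hedged sketch, the same FileFinder flow can be started through the grr_api_client Python library roughly like this (client id, endpoint and credentials are placeholders, and the proto field names should be checked against your GRR version’s flows.proto):
from grr_api_client import api
from grr_response_proto import flows_pb2

grrapi = api.InitHttp(api_endpoint="http://localhost:8000",
                      auth=("admin", "password"))  # placeholders

# Equivalent of path /tmp/* with action STAT in the GUI form.
args = flows_pb2.FileFinderArgs(
    paths=["/tmp/*"],
    action=flows_pb2.FileFinderAction(
        action_type=flows_pb2.FileFinderAction.STAT))

flow = grrapi.Client("C.0123456789abcdef").CreateFlow(
    name="FileFinder", args=args)
print(flow.flow_id)  # track it under Manage launched flows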
Important: Not all flows are available on every platform. When trying to run a flow that’s not available on the given platform, an error will show up.
Available flows
The easiest way to see the current flows is to check the Admin UI under Start Flow. The entries there have useful documentation.
Note that by default only BASIC flows are shown in the Admin UI. By clicking the settings (gear icon) in the top right, you can enable ADVANCED flows. With this set you will see many of the underlying flows which are sometimes useful, but require a deeper understanding of GRR.
Specifying File Paths
Providing file names to flows is a core part of GRR, and many flows have been consolidated into the File Finder flow, which uses a glob+interpolation syntax.
File Path Examples
All executables or dlls in each user’s download directory:
%%users.homedir%%\Downloads\*.{exe,dll}
All .evtx files found up to three directories under “C:\Windows\System32\winevt”:
%%environ_systemroot%%\System32\winevt\**.evtx
“findme.txt” files in user homedirs, up to 10 directories deep:
%%users.homedir%%/**10/findme.txt
File Paths: backslash or forward slash?
Either forward slash (“/home/me”) or backslash (“C:\Users\me”) path specifications are allowed for any target OS. They will be converted to a common format internally. We recommend using whatever is normal for the target OS (backslash for Windows, forward slash for OS X and Linux).
File Path Interpolation
GRR supports path interpolation from values in the artifact Knowledge Base. Interpolated values are enclosed with %%, and may expand to multiple elements. e.g.
%%users.homedir%%
Might expand to the following paths on Windows:
C:\Users\alice
C:\Users\bob
C:\Users\eve
and on OS X:
/Users/alice
/Users/bob
/Users/eve
and on Linux:
/home/alice
/usr/local/home/bob
/home/local/eve
A full list of possible interpolation values can be found by typing %% in the gui. The canonical reference is the proto/knowledge_base.proto file, which also contains docstrings for each type.
Path Globbing
Curly braces work similarly to bash, e.g:
{one,two}.{txt,doc}
Will match:
one.txt
two.txt
one.doc
two.doc
Recursive searching of a directory is performed with **. The default search depth is 3 directories. So:
/root/**.doc
Will match:
/root/blah.doc
/root/1/something.doc
/root/1/2/other.doc
/root/1/2/3/another.doc
More depth can be specified by adding a number to the **, e.g. this performs the same search 10 levels deep:
/root/**10.doc
IMPORTANT: Due to a regression, the recursive glob ** behaves differently in ClientFileFinder. This will be fixed in a future version.
Note: FileFinder transfers all data to the server and does the matching server side. This might lead to terrible performance when used with deep recursive directory searches. For a faster alternative that has the drawback of leaking the path you are searching for to the potentially compromised client, use the ClientFileFinder flow, which does the matching right on the client.
Grep Syntax
A number of GRR flows (such as the File Finder) accept Grep specifications, which are a powerful way to search file and memory contents. There are two types of grep syntax: literal and regex.
Literal Matches
Use this when you have a simple string to match, or want to match a byte string. Here’s a simple string example (note no quotes required):
allyourbase
And a byte string example:
MZ\x90\x00\x03\x00\x00\x00\x04\x00\x00\x00
To minimise the potential for errors we recommend using Python to create byte strings for you where possible, e.g. the above byte string was created in ipython like this:
In [1]: content = open("test.exe","rb").read(12)
In [2]: content
Out[2]: 'MZ\x90\x00\x03\x00\x00\x00\x04\x00\x00\x00'
Regex Matches
Use this when you need more complex matching. The format is a regular Python regex (see http://docs.python.org/2/library/re.html) with the following switches applied automatically:
re.IGNORECASE | re.DOTALL | re.MULTILINE
An example regex is below. The entire match is reported, () groups are not broken out separately. Also note that 10 bytes before and after will be added to any matches by default - use the Advanced menu to change this behavior:
Accepted [^ ]+ for [^ ]+ from [0-9.]+ port [0-9]+ ssh
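Since these switches are applied automatically, a pattern can be tested locally in Python with the same flags before using it in a flow. A small sketch (the log line below is made up):
import re

# The switches GRR applies automatically to grep regexes.
FLAGS = re.IGNORECASE | re.DOTALL | re.MULTILINE

pattern = re.compile(
    r"Accepted [^ ]+ for [^ ]+ from [0-9.]+ port [0-9]+ ssh", FLAGS)

line = "Accepted publickey for alice from 10.0.0.5 port 51234 ssh2"
match = pattern.search(line)
print(match.group(0) if match else "no match")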
Specifying Windows Registry Paths
When specifying registry paths, GRR uses the following hive names (these can also be found by looking at the registry folder under “Browse Virtual Filesystem”):
HKEY_CLASSES_ROOT
HKEY_CURRENT_CONFIG
HKEY_CURRENT_USER
HKEY_DYN_DATA
HKEY_LOCAL_MACHINE
HKEY_PERFORMANCE_DATA
HKEY_USERS
The Registry Finder flow uses the same path globbing and interpolation system as described in Specifying File Paths. Examples:
HKEY_USERS\%%users.sid%%\Software\Microsoft\Windows\CurrentVersion\Run\*
HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\RunOnce\*
RegistryFinder will retrieve the values of any keys specified and report them in the registry data field. Default values will be retrieved and reported in registry data of the parent key. E.g. for this registry structure:
HKEY_LOCAL_MACHINE\Software\test:
(Default) = "defaultdata"
subkey = "subkeydata"
Collecting this:
HKEY_LOCAL_MACHINE\Software\test\*
Will give results like:
Path: /HKEY_LOCAL_MACHINE/SOFTWARE/test
Registry data: defaultdata
Path: /HKEY_LOCAL_MACHINE/SOFTWARE/test/subkey
Registry data: subkeydata
Working with Flow results
All flows in GRR produce a list of (0 or more) results. For example, in the image below the results for a FileFinder flow are shown. The type of the items in the list is FileFinderResult; these are essentially stat entries that also reference the collected files, in case the file finder was also downloading the files from the client to the GRR server.
GRR only provides rudimentary analysis functionality for results. Results can be filtered using the filter input on the right which does a full text match for results (including all metadata fields) but in many cases it’s better to export the results and use external analysis tools instead.
GRR offers to download the results in three different formats (the Download As dropdown on the right):
CSV
YAML
SQLite
NOTE All those formats will flatten the data, i.e., you might lose some information. In most cases this does not make a difference but it’s good to be aware that this happens.
Files
In the case where the results contain items that reference files (StatEntry for example), another option will be shown that allows the download of the referenced files as a ZIP or TAR archive. Additionally, GRR shows an export command that can be pasted into a GRR API shell and will download the same ZIP file programmatically using the GRR API.
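As a sketch of the programmatic route (the client id, flow id, endpoint and credentials below are placeholders), the grr_api_client library exposes the archive download roughly like this:
from grr_api_client import api

grrapi = api.InitHttp(api_endpoint="http://localhost:8000",
                      auth=("admin", "password"))  # placeholders

# Download all files collected by a finished flow as one ZIP archive.
flow = grrapi.Client("C.0123456789abcdef").Flow("ABCDEF12")
flow.GetFilesArchive().WriteToFile("/tmp/flow_files.zip")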
Hunts
What are “Hunts” and how to use them
Hunting is one of the key features of GRR: anything you can do on a single client, you should be able to do on thousands of clients just as easily.
A hunt specifies a Flow, the Flow parameters, and a set of rules for which machines to run the Flow on.
Creating a Hunt
Manual Hunt Creation
You can create a new Hunt in the Hunts section of the UI. To create a Hunt:
Click the + button
Select a flow you want to run and fill out all parameters as you would do for a flow running on a single client.
The second page of the Hunt Wizard lets you set hunt parameters. They are:
Hunt Description A friendly name for this hunt to show in the UI.
Client Limit A limit on the number of clients this hunt should run on. This can be changed later so it makes sense to start the hunt on a few clients first (we like 100) to see how it goes and only later remove the limit.
Crash Limit If more clients than indicated in this parameter crash while running the hunt, the hunt is automatically paused.
Expiry Time Hunts are never really done since new clients might appear at any time and they should run the hunt too. This time indicates the lifetime of a hunt after which it is considered done.
Client Rate Number of clients to schedule the hunt on per minute. We have found the default of 20 to be safe, avoiding overloading the server for intensive hunts with lots of message passing (e.g. multiple flows, lots of results). A value of 0 disables rate limiting and clients will be scheduled as fast as possible. Use this with care, and only for light hunts that you need to run quickly.
Some more advanced options are available behind the advanced link but they should not need adjustment in most cases.
Set any output plugins, such as receiving an email for each result.
Set Hunt Rules. Most commonly this is used to select a specific platform target, e.g. Windows. See the Rules section.
Click Run
Unless approvals are required, you can start the hunt immediately.
Creating a Hunt from a Flow
Manual hunt creation is a bit dangerous because mistakes happen, and they can propagate to many machines in a short time. A better way of creating a hunt is to copy an existing and tested flow. In order to do so, create a flow on a client with the parameters you want to hunt for and make sure that it works. Once you are convinced that this flow is what you want to run on the fleet, use the “Create Hunt from Flow” button in the Manage launched flows view:
This will copy all flow parameters to a new hunt and you can continue to set all Hunt parameters starting from Step 3 above.
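Hunts can also be created programmatically. The following is a hedged sketch using the grr_api_client Python library; the runner argument names come from GRR’s flows.proto and should be verified against your version:
from grr_api_client import api
from grr_response_proto import flows_pb2

grrapi = api.InitHttp(api_endpoint="http://localhost:8000",
                      auth=("admin", "password"))  # placeholders

flow_args = flows_pb2.FileFinderArgs(
    paths=["/etc/passwd"],
    action=flows_pb2.FileFinderAction(
        action_type=flows_pb2.FileFinderAction.STAT))

runner_args = flows_pb2.HuntRunnerArgs(
    description="Stat /etc/passwd across the fleet",
    client_limit=100,  # start small, as recommended above
    client_rate=20)

hunt = grrapi.CreateHunt(flow_name="FileFinder",
                         flow_args=flow_args,
                         hunt_runner_args=runner_args)
hunt.Start()  # new hunts are created PAUSED; this starts the hunt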
Hunt Rules
Hunt rules are used to define a subset of the clients to run a hunt on. The most common rules are
Limiting the hunt to a single operating system and
Limiting the hunt to clients that have a certain label attached,
but GRR also offers regex and integer matching on all client attributes, so more sophisticated hunt rules can be created.
The rule section of the New Hunt Wizard can be seen in the following image:
GRR allows you to specify more than one rule for a hunt. By default, those rules are evaluated using a logical AND, meaning a client will only run the hunt if it satisfies all the conditions specified by every rule. This behavior can be changed with the dropdown on top, changing
Match all (logical AND)
to
Match any (logical OR)
Hunt Limits, DOs and DONTs
Creating Hunts
As already outlined in the section about creating hunts, hunts might negatively impact both the client machines they run on and the GRR system itself by causing excessive load. In order to prevent this, we recommend always starting hunts either by
copying an existing and tested flow or
copying an existing hunt, applying only minor modifications.
We realize that sometimes unexpected problems can still arise and therefore GRR applies some limits to hunts.
Default Hunt Limits
There are two sets of limits in place for hunts. GRR enforces both limits on individual clients and limits on the average resource usage across the hunt.
Individual Client Limits
Analogous to what happens with flows, GRR enforces limits on the resource usage of each client participating in a hunt. Since the impact of a hunt is potentially much larger than that of a single flow, the limits for hunts are lower than the flow limits. The current defaults are:
600 cpu seconds per client
100 MB of network traffic per client
Limits on Average Resource Usage
Once a hunt has processed 1000 clients, the average resource usage is also checked and enforced by GRR. The defaults for those limits are:
1000 results on average per client
60 cpu seconds on average per client
10 MB of network traffic on average per client
All the limits in this section can be overridden in the advanced settings when scheduling hunts. Use at your own risk.
Hunt Controls
Create a new hunt
Use this button to create a new hunt.
Start a Hunt
Use this button to start a newly created hunt. New hunts are created in the PAUSED state, so you’ll need to do this to run them. Hunts that reach their client limit will also be set to PAUSED, use this button to restart them after you have removed the client limit (see modify below).
Stop a Hunt
Stopping a hunt will prevent new clients from being scheduled and interrupt in-progress flows the next time they change state. This is a hard stop, so in-progress results will be lost, but results already reported are unaffected. Once a hunt is stopped, there is no way to start it again.
Modify a Hunt
The modify button allows you to change the hunt client limit and the hunt expiry time. Typically you use this to remove (set to 0) or increase a client limit to let the hunt run on more machines. Modifying an existing hunt doesn’t require re-approval. Hunts can only be modified in the PAUSED state.
Copy a Hunt
The copy button creates a new hunt with the same set of parameters as the currently selected one. If you find a mistake in a scheduled hunt and have to stop it, this button can be used to quickly reschedule a fixed version of the hunt.
Delete a Hunt
Use this to remove an unwanted hunt. For accountability reasons hunts can only be deleted if they haven’t run on any clients.
Show automated hunts
Use this button to display all hunts, including those created by cronjobs. These are hidden by default to reduce UI clutter.
Working with Hunt results
Hunt results in GRR work almost the same as Flow results. The only difference is that each item is annotated with the client id of the client that produced it. Everything else (filtering, exporting, generating archives with collected files) works exactly the same as with flow results as described in the flow results section.
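For completeness, hunt results can also be fetched through the API. A minimal sketch (the hunt id is a placeholder; attribute names are per grr_api_client and worth verifying against your version):
from grr_api_client import api

grrapi = api.InitHttp(api_endpoint="http://localhost:8000",
                      auth=("admin", "password"))  # placeholders

hunt = grrapi.Hunt("ABCDEF12").Get()  # placeholder hunt id
for result in hunt.ListResults():
    # Each item carries the originating client id and the result payload.
    print(result.client.client_id, result.payload)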
Rapid hunts
Rapid hunting is a GRR UI feature that automatically sets client rate to 0 for certain types of hunts, thus making such hunts much faster.
Enabling rapid hunts
Add the following to GRR server configuration file:
AdminUI.rapid_hunts_enabled: True
Rapid hunts UI
When rapid hunts are enabled, the hunt creation wizard shows an eligibility note on its Review page.
If a hunt is deemed eligible for rapid hunting, its client rate is set to 0 automatically. Users can still override it with a non-zero value if needed.
There’s also an additional eligibility note on the Hunt Overview page.
Rapid hunting eligibility
A hunt has to meet the following conditions to be considered eligible for rapid hunting:
It should be a FileFinder or ClientFileFinder hunt.
The action should be STAT or HASH (but not DOWNLOAD).
Each glob expression can have at most one wildcard in it. For example, /foo/bar/* is ok, but /foo/*/bar/* is not.
NOTE: if approvals are turned on, rapid hunts would still need to be approved.
Troubleshooting (“Why is my hunt doing nothing?”)
There are caches involved in the frontend server; you may need to wait a couple of minutes before the first client picks up the flow.
Clients only check whether there is hunt work to do when doing a foreman check. The frequency of these checks is specified by the Client.foreman_check_frequency parameter, which defaults to every 30 minutes (see the configuration sketch after this list).
Even when a client issues a foreman check, the flows may not immediately start. The process is asynchronous: the check tells the server to evaluate its hunt rules and see if there are things for the client to do. If there are, the server schedules them, but the client may not do its regular poll and pick up that flow until Client.poll_max (10 minutes by default) has passed.
When you run a hunt you can specify a “Client Rate” as described in creating hunts. If this is set low (but not 0), you can expect a slow hunt.
When running a hunt under high server load, clients may appear to complete in batches, which makes the completion graph look “stepped”. The clients are finishing normally, but their results are being processed and logged in batches by the Hunt. When the system is under load, this hunt processing takes some time to complete, resulting in the steps.
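Both check frequencies mentioned above are ordinary configuration options. A sketch of the relevant settings with their default values in seconds (verify the option names in the Settings pane of your deployment):
Client.foreman_check_frequency: 1800  # foreman checks, every 30 minutes
Client.poll_max: 600                  # slowest regular poll, 10 minutes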
Virtual File System
Every time GRR collects forensics information (files, Registry data, …) from a client, it stores it server side in the data store. This information is presented to the analyst in a unified view we call the Virtual Filesystem view (menu item Browse Virtual Filesystem).
A screenshot is shown below.
VFS Tree
On the left there is a tree navigator. The most important top level items are:
fs This gives a view of the client’s filesystem.
fs/os Under fs/os, GRR maps all the files it collected on a client using the OS APIs, i.e., standard file manipulation system calls. For Linux and Mac clients, GRR shows everything that is mounted under /; for Windows clients, there are entries like C: under fs/os for each disk present on the client system.
fs/tsk GRR also uses The Sleuth Kit (TSK) to open disks and images in raw mode and parse the file systems on them. Every file collected in that way is presented under fs/tsk. The first level under fs/tsk is a raw disk, for example sda1 for Linux or \\?\Volume{5f62dc05-4cfc-58fd-b82c-579426ac01b8} for Windows.
registry Only present for Windows machines, this branch gives a view into the live registry on the client machine. The first level after /registry are hives, HKEY_LOCAL_MACHINE or HKEY_USERS for example.
Note how in the screenshot, the folders under /fs/tsk and /fs/os actually reference the same folder on disk (“/”). However, there is an additional folder startup-KoGcRS that is only visible when using TSK, it gets filtered by the operating system API.
Creating Timelines
The GRR VFS can be switched to a timeline view where all files GRR knows about are shown sorted by their various timestamps:
Use the button on the top right in the VFS view to switch between Timeline and File List View. The drop down menu can be used to download the generated timeline as CSV.
Note that
the timeline is always generated by recursively descending into directories starting from the currently selected one.
the timeline only contains files for which GRR has already collected metadata. This timelining mechanism is fully server side; if you haven’t collected the metadata yet, the recursive refresh button will do this for you.
TimelineFlow
The TimelineFlow provides a one-click solution for retrieving a snapshot (stat information for each file) of the entire filesystem. Once collected, the snapshot can be downloaded in Sleuthkit’s Body format for further analysis.
To use the TimelineFlow:
Start a TimelineFlow (it can be found in Collectors > Timeline).
Once completed, run the GRR API shell to download the results:
from grr_response_proto.api import timeline_pb2
grrapi.Client('C.0123456789abcdef').Flow('12345678') \
    .GetCollectedTimeline(fmt=timeline_pb2.ApiGetCollectedTimelineArgs.BODY) \
    .WriteToFile('/tmp/out.body')
Artifacts
Outline
During a security investigation responders need to quickly retrieve common pieces of information that include items such as logs, configured services, cron jobs, patch state, user accounts, and much more. These pieces of information are known as forensic artifacts, and their location and format vary drastically across systems.
We have built a framework to describe forensic artifacts that allows them to be collected and customised quickly using GRR. This collection was initially contained inside the GRR repository, but we have now moved it out to a separate repository to make access simple for other tools.
Goals
The goals of the GRR artifacts implementation are:
Describe artifacts with enough precision that they can be collected automatically without user input.
Cover modern versions of Mac, Windows, and Linux and common software products of interest for forensics.
Provide a standard variable interpolation scheme that allows artifacts to simply specify concepts like “all user home directories”, %TEMP%, %SYSTEMROOT% etc.
Allow grouping across operating systems and products, e.g. a “Chrome Web History” artifact knows where the web history is for Chrome on Mac, Windows and Linux.
Allow grouping of artifacts into high level concepts like persistence mechanisms, and investigation specific meta-artifacts.
To create simple, shareable, non-GRR-specific human-readable definitions that allow people unfamiliar with the system to create new artifacts. i.e. not XML or a domain specific language.
The ability to write new artifacts, upload them to GRR and be able to collect them immediately.
Collecting Artifacts
Artifacts can be collected using the artifact collector flow. Multiple artifacts can be collected at once.
Using artifacts in hunts is particularly powerful, as an artifact like JavaCacheFiles can be scheduled in a single hunt across all three operating systems, and the artifact itself determines the correct paths to be downloaded for each operating system.
Defining Artifacts
Basics
GRR artifacts are defined in YAML. The main artifact repository is hosted on GitHub. You can browse the existing artifact definitions or read the exhaustive syntax overview.
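For orientation, a definition in that format looks roughly like the sketch below (a hypothetical artifact following the public forensic artifacts format; see the syntax overview for the authoritative schema):
name: BashHistory
doc: Bash command history files for all users.
sources:
- type: FILE
  attributes:
    paths: ['%%users.homedir%%/.bash_history']
supported_os: [Linux, Darwin]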
Knowledgebase
We use a standard set of machine information collected from the host for variable interpolation. This collection of data is called the knowledgebase and is referenced with a %%variable%% syntax.
The artifact defines where the data lives. Once it is retrieved by GRR a parser can optionally be applied to turn the collected information into a more useful format, such as parsing a browser history file to produce URLs.
Uploading new definitions
New artifacts should be added to the forensic artifacts repository.
The changes can be imported into GRR by running make in the grr/artifacts directory. This will delete the existing artifacts, check out the latest version of the artifact repository and add all of the YAML definitions into GRR’s directory. Running python setup.py build will have the same effect. The new artifacts will be available once the server is restarted.
Artifacts can also be uploaded via the Artifacts GUI and used immediately without the need for a restart.
Local definitions
Artifacts that are specific to your environment or need to remain private can be
added to the grr/artifacts/local
directory. This directory will remain
untouched when you update the main artifacts repository. You can also use this
directory to test new artifacts before they are added to the main public
repository.
Flow templates
We currently support using the artifact format to call GRR-specific
functionality, such as invoking a GRR client action or listing processes. Such
“artifacts” are called flow templates and since they are GRR-specific they
remain in the GRR repository in the grr/artifacts/flow_templates
directory.
This is a temporary working name. We intend to rework this functionality into a more general, powerful and configurable way to call GRR from YAML.
Cron Jobs in GRR
Click on “Cron Jobs” in the left menu to open a list of GRR cron jobs. GRR has a few system cron jobs that run periodically. These jobs perform periodic cleanup and maintenance tasks.
You can also use GRR’s Cron Jobs UI to schedule periodic hunts.
osquery
osquery is a utility created by Facebook that exposes system information through an SQL API. By writing SQL queries one can list processes, devices, network connections, and users, or browse the directory structure. All that data is well structured and described by the associated schema. Combining that with the capabilities of SQL yields a really powerful tool for analysts.
Setup
The osquery executable has to be available on the target client. Usually this is as easy as downloading a package from the official website for the appropriate platform and putting it somewhere GRR can access it, e.g. /opt/osquery/. Then the location of the main osquery executable has to be specified in the Osquery.path option in the GRR configuration:
(...)
Platform:Linux:
(...)
Osquery.path: '/opt/osquery/3.3/usr/bin/osqueryi'
(...)
Platform:Windows:
(...)
Osquery.path: 'C:\\Program Files\\osquery\\osqueryd.exe'
(...)
Note that it does not really matter whether you use osqueryi or osqueryd; both should work just fine.
Make sure you repack/redeploy GRR clients after the Osquery.path configuration setting is updated.
Investigating
To run a query on the desired client, simply start a new flow under “Collectors > Osquery”. The flow itself is very simple, with the “query” field being the important one. For example, to run a simple query that lists all the processes, you can use the following:
SELECT * FROM processes;
After a moment, the client should yield an SQL table with all the desired results.
Of course, it is possible to use all SQL features, such as joins:
SELECT cmdline, total_size FROM osquery_info JOIN processes USING (pid);
Refer to the osquery schema documentation to discover what tables and columns are available.
Options
Timeout
The timeout field can be used to specify a time after which the query is killed if it is not able to produce any results. Usually, queries should return immediately and the default timeout should be more than enough. However, it is possible to write more sophisticated queries (e.g. ones that search through the file system) where it might make sense to increase it.
Ignore stderr errors
By default, GRR uses the stderr output produced by osquery to determine whether the execution was successful or not (e.g. whether there were syntax errors in the specified query). However, occasionally osquery can also produce warning messages that would cause the entire flow to fail. In such cases, it might be desirable to simply suppress them.
Note that this option should be used with caution and not be ticked if not needed—otherwise errors in the query might go unnoticed.
Automation using the GRR API
The GRR AdminUI server also serves as an API endpoint. We provide the grr_api_client Python library, which can be used for easy GRR automation. See the API client documentation for more details.
There’s also an API client library in PowerShell working on Windows, Linux and macOS for GRR automation and scripting, developed by Swisscom.
Interactive scripting
GRR Colab is an alternative GRR API that allows you to work with GRR interactively. It is implemented as an IPython extension and can be used with IPython itself, Jupyter or Colaboratory.
Extension API
GRR Colab offers two APIs.
The Python API allows you to work with multiple clients simultaneously, since each client is represented as an ordinary Python object. All functions return Protobuf objects for non-trivial values. In order to start a flow you just call a single method. The Python API has no state, so you always have to specify absolute paths.
The Magic API has state, which is why it only lets you work with a single client at a time: you have to select a client first. The magic commands have Bash-like syntax and return Pandas dataframes for non-trivial values. The state also allows you to work with relative paths.
More details about both APIs can be found in the Jupyter notebook demo.
Enabling GRR IPython extension
First, install the grr-colab Python package. On successful installation you should be able to import grr_colab from a Python (or IPython) terminal. The Python API should then already be available to you.
In order to enable the Magic API, grr_colab.ipython_extension must be loaded. You may do this in two ways:
If you want the extension to be loaded on IPython kernel start, open the IPython config file and add grr_colab.ipython_extension to the c.InteractiveShellApp.extensions list. The Magic API will then be available right after creating a new Jupyter notebook or starting a new IPython session.
If you want to enable the extension for the current session only, run the %load_ext grr_colab.ipython_extension command in your Jupyter notebook or IPython terminal.
Output Plugins
This document outlines how to send flow and hunt results from GRR to other systems and formats for further analysis.
Every GRR installation supports many Output Plugins by default to save flow/hunt results in different formats or send the results to other systems.
Splunk
GRR can send flow results as Events to Splunk’s HTTP Event Collector. To set this up, add a new HTTP Event Collector in Splunk’s Data Input Settings. Events automatically get grr as the default source and grr_flow_result as the default sourcetype unless you override this in Splunk’s or GRR’s configuration.
Configure GRR to send data to your Splunk instance by specifying the following values in your server configuration YAML file.
url refers to the absolute API URL of your Splunk installation, including scheme and port. token is the generated access token from the Splunk HEC settings. More configuration options, including how to deal with Splunk Cloud’s self-signed certificates, can be found in config/output_plugins.py.
Splunk.url: https://input-prd-p-123456788901.cloud.splunk.com:8088
Splunk.token: 97e96c19-9bf1-4618-a079-37c567b577dc
After restarting your server, you can send flow and hunt results to Splunk by enabling SplunkOutputPlugin when launching a flow/hunt. Here’s an example event that will be sent to Splunk, where the analyst set incident-123 as an annotation for the flow.
{
"annotations": ["incident-123"],
"client": {
"hostname": "max.example.com",
"os": "Linux",
"usernames": "max",
// ...
},
"flow": {
"name": "ClientFileFinder",
"creator": "sherlock",
"flowId": "23B01B77",
// ...
},
"resultType": "FileFinderResult",
"result": {
"statEntry": {
"stMode": "33261",
"stSize": "1444",
"pathspec": {
"path": "/home/max/Downloads/rickmortyseason6earlyaccess.sh",
// ...
},
// ...
}
}
}
BigQuery
Our blog contains a post explaining how to setup and use BigQuery.
Emergency Pushing of Code and Binaries
GRR offers functionality to deploy custom code or binaries to the whole fleet quickly. It goes without saying that this functionality can lead to problems and should therefore be used with care - we recommend using it only as a last resort in case things have gone terribly wrong, and not relying on this functionality for normal operations.
Binaries and python code can be pushed from the server to the
clients to enable new functionality.
The code that is pushed from the server must be signed by the private key corresponding to Client.executable_signing_public_key, for both Python code and binaries. These signatures will be checked by the client to ensure they match before the code is used.
What is actually sent to the client is the code or binary wrapped in a protobuf which will contain a hash, a signature and some other configuration data.
Signing code requires the config_updater utility. In a secure environment the signing may occur on a different machine from the server, but the examples below show the basic case.
Deploying Arbitrary Python Code.
To execute an arbitrary python blob, you need to create a file with python code that has the following attributes:
Code in the file must work when executed by exec() in the context of a running GRR client.
Any return data that you want sent back to the server can be stored, encoded as a string, in a variable called “magic_return_str”. Alternatively, GRR collects all output from print statements and puts it into magic_return_str.
As a simple example, the following code modifies the client’s poll_max setting and pings test.com.
# This runs via exec() inside the (Python 2) GRR client process, where
# config_lib is already available; the commands module is Python 2 only.
import commands
status, output = commands.getstatusoutput("ping -c 3 test.com")
config_lib.CONFIG.Set("Client.poll_max", 100)
config_lib.CONFIG.Write()
magic_return_str = "poll_max successfully set. ping output %s" % output
This file then needs to be signed and converted into the protobuf format required, and then needs to be uploaded to the data store. You can do this using the following command line.
grr_config_updater upload_python --file=myfile.py --platform=windows
The uploaded files live by convention in python_hacks/<platform> and are viewable in the “Binaries” section of the Admin UI.
The ExecutePythonHack Flow is provided for executing the file on a client. This is visible in the Admin UI under Administrative flows if Advanced Mode is enabled.
Note: Specifying arguments to a PythonHack is possible as well through the py_args argument, this can be useful for making the hack more generic.
Deploying Executables.
The GRR Agent provides an ExecuteBinaryCommand Client Action which allows us to send a binary and set of command line arguments to be executed. Again, the binary must be signed using the executable signing key (config option PrivateKeys.executable_signing_private_key).
To sign an exe for execution use the config updater script.
db@host:$ grr_config_updater upload_exe --file=/tmp/bazinga.exe --platform=windows
Using configuration <ConfigFileParser filename="/etc/grr/grr-server.conf">
Uploaded successfully to /config/executables/windows/installers/bazinga.exe
db@host:$
This file can then be executed with the LaunchBinary flow which is in the Administrative flows if Advanced Mode is enabled.
Glossary
AFF4
AFF4 was the data model used for storage in GRR that has since been deprecated. You can read about the usage in the GRR paper linked above and there is additional documentation on Forensics Wiki.
Agent
A platform-specific program that is installed on machines that one might want to investigate. It communicates with the GRR server and can perform client actions at the server’s request.
Client
A system that has an agent installed. Also used to refer to the specific instance of an agent running in that system.
Client Action
A client action is an action that a client can perform on behalf of the server. It is the base unit of work on the client. Client actions are initiated by the server through flows. Example client actions are ListDirectory, EnumerateFilesystems, and Uninstall.
Collection
A collection is a logical set of objects stored in the AFF4 database. Generally these are a list of URNs containing a grouping of data such as artifacts or events from a client.
Datastore
The backend is where all AFF4 and scheduler data is stored. It is provided as an abstraction to allow for replacement of the datastore without significant rewrite. The datastore supports read, write, querying and filtering.
Flow
A logical collection of server or client actions which achieve a given objective. A flow is the core unit of work in the GRR server. For example, a BrowserHistory flow contains all the logic to download, extract and display browser history from a client. Flows can call other flows to get their job done. E.g. a CollectBrowserHistory flow might call ListDirectory and GetFile to do its work. A flow is implemented as a class that inherits from GRRFlow.
Frontend server
Server-side component that sends and receives messages back and forth from clients.
Hunt
A hunt is a mechanism for managing the execution of a flow on a large number of machines. A hunt is normally used when you are searching for a specific piece of data across a fleet of machines. Hunts allow for monitoring and reporting of status.
Message
Transfer unit in GRR that transports information from a flow to a client and vice versa.
Worker
Upon receiving a message from a client, a worker wakes up the flow that requested its results and executes it.
Maintaining and tuning GRR deployment
Changing GRR server configuration
Configuration file organization
When GRR is installed on a system, a system wide, distribution specific configuration file is also installed (by default /etc/grr/grr_server.yaml). This specifies the basic configuration of the GRR service (i.e. various distribution specific locations). However, the configuration typically needs to be updated with site specific parameters (for example new crypto keys, the Client.control_urls setting to allow the client to connect to the server, etc.)
In order to avoid overwriting the user customized configuration files when GRR is updated in future, we write site specific configuration files to a location specified by the Config.writeback config option (by default /etc/grr/server.local.yaml). This local file only contains the parameters which are changed from the defaults or from the system configuration file.
Note
The system configuration file is never modified, all updated configurations are always written to the writeback location. Do not edit the system configuration file as changes in this file will be lost when GRR is upgraded.
You can see all available configuration options and their current values in the “Settings” pane in GRR’s web UI.
Configuration Contexts
The next important concept to understand is that of configuration contexts. GRR consists of many components, which share most of their configuration. Often, however, we want to specify slight differences between components. If we were to write a separate configuration file for building each client version (currently 3 operating systems and 2 architectures) as well as for each component (e.g. worker, web UI, frontend etc.), we would end up with an unmaintainable mess of configuration files. It makes more sense to put most of the common configuration parameters in the same file, and only specify the differing ones, depending on the component which is running.
GRR introduces the concept of running context. The context is a list of attributes about the currently running component. When the code accesses the configuration system, the configuration system considers the currently running context and returns the appropriate value of the required parameter.
For example, consider the parameter Logging.filename
which specifies the
location of the log file. In a full GRR installation, we wish the GRR Admin
UI to log to /var/log/grr/adminui.log
while the frontend should log to
/var/log/grr/frontend.log
. We can therefore specify in the YAML config
file:
Logging.filename: /var/log/grr/grr_server.log
AdminUI Context:
Logging.filename: /var/log/grr/adminui.log
Frontend Context:
Logging.filename: /var/log/grr/httpserver.log
Client Context:
Logging.filename: /var/log/grr/grr_client.log
Platform:Windows:
Logging.filename: c:\\windows\\system32\\grr.log
When the AdminUI program starts up, it populates its context with AdminUI Context. This forces the configuration system to return the value specific to this context. Note that other components, such as the worker, will set a context of Worker Context, which will not match any of the above overrides. Therefore the worker will simply log to the default of /var/log/grr/grr_server.log.
Note that it is possible to have multiple contexts defined at the same
time. So for example, GRR client running under windows will have the
contexts, Client Context
and Platform:Windows
, while a GRR client
running under Linux will have the context Client Context
and
Platform:Linux
. When several possibilities can apply, the option which
matches the most context parameters will be selected. So in the above
example, linux and osx clients will have a context like Client Context
and will select /var/log/grr/grr_client.log
. On the other hand, a
windows client will also match the Platform:Windows
context, and will
therefore select c:\windows\system32\grr.log
.
The following is a non-exhaustive list of available contexts:
AdminUI Context, Worker Context, Frontend Context etc. are defined when running one of the main programs.
Test Context is defined when running unit tests.
Arch:i386 and Arch:amd64 are set when running a 32 or 64 bit GRR client binary.
Installer Context is defined when the Windows installer is active (i.e. during installation).
Platform:Windows, Platform:Linux, Platform:Darwin are set when running GRR client under these platforms.
The full list can be seen here.
Parameter Expansion
The GRR configuration file format allows for expansion of configuration parameters inside other parameters. For example consider the following configuration file:
Client.name: GRR
Nanny.service_name: "%(Client.name)service.exe"
Nanny.service_key_hive: HKEY_LOCAL_MACHINE
Nanny.service_key: Software\\%(Client.name)
Nanny.child_command_line: |
%(child_binary) --config "%(Client.install_path)\\%(Client.binary_name).yaml"
The expansion sequence %(parameter_name) substitutes or expands the parameter into the string. This allows us to build values automatically based on other values. In the example above, Nanny.service_name will be GRRservice.exe and the service_key will be set to Software\GRR.
Expansion is very useful, but sometimes it gets in the way, for example when we need to pass literal % escape sequences. Consider the Logging.format parameter, which is actually a Python format string:
Logging.format: \%(levelname\)s \%(module\)s:\%(lineno\)s] \%(message\)s
This form escapes GRR’s special expansion sequence by preceding both opening and closing sequences with the backslash character.
Logging.format: %{%(levelname)s %(module)s:%(lineno)s] %(message)s}
This variation uses the literal expansion sequence %{} to declare the entire line as a literal string and prevent expansion.
Filtering
The configuration system may be extended to provide additional functionality accessible from the configuration file. For example consider the following configuration file:
Logging.path: "%(HOME|env)/grr.log"
This expansion sequence uses the env filter, which expands a value from the environment. In this case the environment variable HOME will be expanded into the Logging.path parameter to place the log file in the user’s home directory.
There are a number of additional interesting filters. The file filter allows including other files from the filesystem into the configuration file. For example, some people prefer to keep their certificates in separate files rather than paste them into the config file:
CA.certificate: "%(/etc/certificates/ca.pem|file)"
It is even possible to nest expansion sequences. For example this retrieves the client’s private key from a location which depends on the client name:
Client.private_key: "%(/etc/grr/client/%(Client.name)/private_key.pem|file)"
Key management
Cryptographic Keys
Communication Security.
GRR uses fleetspeak for communication, which in turn uses TLS.
The fleetspeak server uses a CA and server public key pair generated on server install. The CA certificate is deployed to the client so that it can ensure it is communicating with the correct server. If these keys are not kept secure, anyone with MITM capability can intercept communications and take control of your clients. Additionally, if you lose these keys, you lose the ability to communicate with your clients.
Note: The fact that the client uses a CA certificate to verify the server ensures only that the client doesn’t accidentally connect to the wrong server, and protects for example against privilege escalation attacks using a malicious fleetspeak server. This feature does not stop clients that do not have the CA certificate from connecting to your server - it is, for example, possible to use a modified client that simply doesn’t do this check. Such clients would then be able to see, for example, IOCs that you send to all clients in a hunt.
Code Signing and CA Keys.
In addition to the CA and Server key pairs, GRR maintains a set of code signing keys. By default GRR aims to provide only read-only actions; this means that GRR is unlikely to modify evidence, and cannot trivially be used to take control of systems running the agent.
Note that read-only access may not give direct code execution, but may well provide it indirectly via read access to important keys and passwords on disk or in memory!
However, there are a number of use cases where it makes sense to have GRR execute arbitrary code as explained in the section Deploying Custom Code.
As part of the GRR design, we decided that administrative control of the GRR server shouldn’t trivially lead to code execution on the clients. As such we embed a strict allowlist of commands that can be executed on the client and we have a separate set of keys for code signing. For a binary to be run, the code has to be signed by the specific key and the client will confirm this signature before execution.
This mechanism helps give the separation of control required in some deployments. For example, the Incident Response team needs to analyze hosts to get their job done, but deployment of new code to the platform is only done when blessed by the administrators and rolled out as part of standard change control. The signing mechanism allows Incident Response to react fast with new code if necessary, but only with the blessing of the Signing Key held by the platform administrator.
In the default install, the driver and code signing private keys are not passphrase protected. In a secure environment we strongly recommend generating and storing these keys off the GRR server and doing offline signing every time this functionality is required, or at a minimum setting passphrases which are required on every use. We recommend encrypting the keys in the config with PEM encryption; config_updater will then ask for the passphrase when they are used. An alternative is to keep a separate offline config that contains the private keys.
Key Generation
As stated above, GRR requires multiple key pairs. These are used to:
Sign the client certificates for enrollment.
Sign and decrypt messages from the client.
Sign code and binaries sent to the client.
These keys can be generated using the config_updater script normally installed in the path as grr_config_updater using the generate_keys command.
db@host:$ sudo grr_config_updater generate_keys
Generating executable signing key
..............+++
.....+++
Generating driver signing key
..................................................................+++
.............................................................+++
Generating CA keys
Generating Server keys
Generating Django Secret key (used for xsrf protection etc)
db@host:$
Adding a Passphrase
To encrypt and add a passphrase to the code and driver signing keys:
Copy the keys (PrivateKeys.executable_signing_private_key & PrivateKeys.driver_signing_private_key) from the current GRR configuration file, most likely
- /etc/grr/server.local.yaml
Save these two keys as new text files temporarily. You’ll need to convert the key text to the normal format by removing the leading whitespace and blank lines:
cat <original.exe.private.key.txt> | sed 's/^ //g' | sed '/^$/d' > <clean.exe.private.key>
cat <original.driver.private.key.txt> | sed 's/^ //g' | sed '/^$/d' > <clean.driver.private.key>
Encrypt the keys and add a passphrase:
openssl rsa -des3 -in <clean.exe.private.key> -out <exe.private.secure.key>
openssl rsa -des3 -in <clean.driver.private.key> -out <driver.private.secure.key>
Securely wipe all temporary files with cleartext keys.
Replace the keys in the GRR config with the new encrypted keys (or store them offline). Ensure the server is restarted to load the updated configuration.
PrivateKeys.executable_signing_private_key: '-----BEGIN RSA PRIVATE KEY-----
Proc-Type: 4,ENCRYPTED
DEK-Info: DES-EDE3-CBC,8EDA740783B7563C
<start key text after *two* blank lines…>
<KEY...>
-----END RSA PRIVATE KEY-----'
Note: In the YAML encoding, there must be an extra blank line between the encrypted PEM header and the encoded key. The key is double-spaced and indented two spaces, exactly like all other keys in the configuration file.
Alternatively, you can also keep your new, protected keys in files on the server and load them in the configuration using the file filter like this:
PrivateKeys.executable_signing_private_key: %(<path_to_keyfile>|file)
Rotating the keys
Rotating Client Keys
The keys used by the fleetspeak client to talk to the server do not have to be (and cannot be) rotated, since the client id is derived directly from the private key. Removing the fleetspeak state file results in the client generating a new key pair, but the client will also get a new client id.
The keys used for signing executables and drivers can be changed in the config as described in the documentation about keys, but note that:
Clients need to be repacked and redeployed so they can use the new keys.
Old clients in the fleet will not be able to verify binaries signed with the new keys, so there will be some issues during the transition.
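After updating the signing keys in the config, the repacking itself uses the standard command (described in more detail in the repacking documentation below):
sudo grr_config_updater repack_clients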
Rotating Server Keys
Issuing a new server key can be done using the
grr_config_updater rotate_server_key
command. This will update the server configuration with a new private key that is signed by the same CA. Existing clients will not lose connectivity when these keys change, although old clients might need to be restarted before they will accept the new server credentials.
Note: Changing the server key will invoke GRR’s certificate revocation process. All clients that have seen the new server certificate will, from that point on, refuse to accept the previous key, so it’s not possible to revert to the old key.
User management
Users in GRR
Concept
GRR has a concept of users of the system. The GUI supports basic authentication, and this verification of user identity is used in all auditing functions (so, for example, GRR can properly record which user accessed which client, and who executed flows on clients). There is no logout button; to change users you will need to invalidate your session.
A GRR user will be created as an admin by default. This is only important if the approval-based workflow is turned on, since only “admin” users can approve hunts. We are aware of a bug where all users are created with admin permissions; however, the approval system is the authority for access across GRR, when enabled.
To add the user joe as an admin:
db@host:~$ sudo grr_config_updater add_user joe
Using configuration <ConfigFileParser filename="/etc/grr/grr-server.conf">
Please enter password for user 'joe':
Updating user joe
Username: joe
Password: set
To list all users:
db@host:~$ sudo grr_config_updater show_user
Using configuration <ConfigFileParser filename="/etc/grr/grr-server.conf">
Username: test
Is Admin: True
To update a user password:
db@host:~$ sudo grr_config_updater update_user joe --password
Using configuration <ConfigFileParser filename="/etc/grr/grr-server.conf">
Updating user joe
Username: joe
Is Admin: True
The available options are (--admin is currently not functional in v3.3.0.0):
usage: grr_config_updater update_user [-h] [--helpfull] [--password] [--admin]
username
positional arguments:
username Username to update.
optional arguments:
-h, --help show this help message and exit
--helpfull show full help message and exit
--password Reset the password for this user (will prompt for password).
--admin Make the user an admin, if they aren't already.
Authentication
The AdminUI uses HTTP Basic Auth authentication by default, based on the passwords within the user objects stored in the data store, but we don’t expect you to use this in production (see Securing Access for more details).
There is so much diversity and customization in enterprise
authentication schemes that there isn’t a good way to provide a solution that
works for a majority of users. Large enterprises most likely already have internal webapps
that use authentication; this is just one more. Most people have found the
easiest approach is to sit Apache (or similar) in front of the GRR Admin UI as
a reverse proxy and use an existing SSO plugin that already works for that
platform. Alternatively, with more work you can handle auth inside GRR by
writing a Webauth Manager (AdminUI.webauth_manager
config option) that uses an
SSO or SAML based authentication mechanism.
Running GRR UI behind Apache
Running Apache as a reverse proxy in front of the GRR admin UI is a good way to provide SSL protection for the UI traffic and also to integrate with corporate single sign-on (if available) for authentication.
To set this up:
Buy an SSL certificate, or generate a self-signed one if you’re only testing.
Place the certificate into /etc/ssl/certs/ and ensure it’s world readable
chmod 644 /etc/ssl/certs/grr_ssl_certificate_filename.crt
Place the private key into “/etc/ssl/private” and ensure it is NOT world readable
chmod 400 /etc/ssl/private/grr_ssl_certificate_filename.key
Install apache2 and required modules
apt-get install apache2
a2enmod proxy
a2enmod ssl
a2enmod proxy_http
Disable any default apache files currently enabled (probably 000-default.conf, but check for others that may interfere with GRR)
a2dissite 000-default
Redirect port 80 HTTP to 443 HTTPS
Create the file “/etc/apache2/sites-available/redirect.conf” and copy the text below into it.
<VirtualHost *:80>
Redirect "/" "https://<your grr adminUI url here>"
</VirtualHost>
Create the file “/etc/apache2/sites-available/grr_reverse_proxy.conf” and copy the text below into it.
<VirtualHost *:443>
SSLEngine On
SSLCertificateFile /etc/ssl/certs/grr_ssl_certificate_filename.crt
SSLCertificateKeyFile /etc/ssl/private/grr_ssl_certificate_filename.key
ProxyPass / http://127.0.0.1:8000/
ProxyPassReverse / http://127.0.0.1:8000/
</VirtualHost>
Enable the new apache files
a2ensite redirect.conf
a2ensite grr_reverse_proxy.conf
Restart apache
service apache2 restart
NOTE: This reverse proxy will only proxy the AdminUI. It will have no impact on the agent communications on port 8080. It is advised to restrict access to the AdminUI at the network level.
Running GRR behind Nginx
Running NGINX as a reverse proxy in front of GRR web UI is a good way to provide SSL protection for the UI traffic and also integrate with corporate single sign on system (if available), for authentication.
The instruction below is largely based on this excellent tutorial by Josh Reichardt: How To Configure Nginx with SSL as a Reverse Proxy for Jenkins
Instructions below assume that you have a user capable of sudo-access and that you have GRR server already installed.
Step One - Configure Nginx
Install Nginx
Update your package lists and install Nginx:
sudo apt-get update
sudo apt-get install nginx
You may want to check Nginx’s version in case you run into issues later down the road.
nginx -v
Get a Certificate
Next, you will need to purchase or create an SSL certificate. These commands are for a self-signed certificate, but you should get an officially signed certificate if you want to avoid browser warnings.
Move into the proper directory and generate a certificate:
cd /etc/nginx
sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /etc/nginx/cert.key -out /etc/nginx/cert.crt
You will be prompted to enter some information about the certificate. You can fill this out however you’d like; just be aware the information will be visible in the certificate properties. We’ve set the number of bits to 2048 since that’s the minimum needed to get it signed by a CA. If you want to get the certificate signed, you will need to create a CSR.
Edit the Configuration
Next, edit the default Nginx configuration file: /etc/nginx/sites-enabled/default
Here is what the final config might look like; the sections are broken down and briefly explained below. You can update or replace the existing config file, although you may want to make a quick copy first.
server {
listen 80;
return 301 https://$host$request_uri;
}
server {
listen 443;
server_name localhost;
ssl_certificate /etc/nginx/cert.crt;
ssl_certificate_key /etc/nginx/cert.key;
ssl on;
ssl_session_cache builtin:1000 shared:SSL:10m;
ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
ssl_ciphers HIGH:!aNULL:!eNULL:!EXPORT:!CAMELLIA:!DES:!MD5:!PSK:!RC4;
ssl_prefer_server_ciphers on;
access_log /var/log/nginx/grr.access.log;
location / {
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Fix the "It appears that your reverse proxy set up is broken" error.
proxy_pass http://localhost:8000;
proxy_read_timeout 180;
proxy_redirect http://localhost:8000 https://localhost;
}
}
In our configuration, the cert.crt and cert.key settings reflect the location where we created our SSL certificate.
The first section tells the Nginx server to listen to any requests that come in on port 80 (default HTTP) and redirect them to HTTPS.
server {
listen 80;
return 301 https://$host$request_uri;
}
The server_name and proxy_redirect settings assume that you run Nginx on the same host as the GRR web UI server. This is suitable for demo purposes, but in a real-world scenario you may want to host Nginx and GRR processes on different machines and replace localhost inside server_name and proxy_redirect with your Nginx server’s IP address or its intranet domain name.
NOTE: even when running behind an SSL-enabled Nginx server with authentication turned on, the GRR web UI shouldn’t be exposed to the Internet. Whoever has access to it has root-level access to client machines in your GRR deployment.
Next we have the SSL settings. This is a good set of defaults but can definitely be expanded on. For more explanation, please read this tutorial.
...
listen 443;
server_name localhost;
ssl_certificate /etc/nginx/cert.crt;
ssl_certificate_key /etc/nginx/cert.key;
ssl on;
ssl_session_cache builtin:1000 shared:SSL:10m;
ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
ssl_ciphers HIGH:!aNULL:!eNULL:!EXPORT:!CAMELLIA:!DES:!MD5:!PSK:!RC4;
ssl_prefer_server_ciphers on;
...
The final section is where the proxying happens. It takes any incoming requests and proxies them to the GRR web UI instance running on the same host, listening on port 8000 on the local network interface (localhost:8000). Note that localhost:8000 is the default address of the GRR web UI when installed from the GRR DEB package.
...
location / {
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Fix the "It appears that your reverse proxy set up is broken" error.
proxy_pass http://localhost:8000;
proxy_read_timeout 180;
proxy_redirect http://localhost:8000 https://localhost;
}
...
Note that the second argument of the proxy_redirect directive should be the publicly accessible address of your reverse proxy. We use https://localhost here for the sake of simplicity, but this can be replaced with an internal domain name or an IP address. The first argument is the default host/port of the AdminUI.
NOTE: once again, we have to remind that GRR web UI shouldn’t ever be exposed to the Internet. Whoever has access to GRR web UI potentially has root access to all GRR clients in your deployment.
Step Two - Configure GRR
Edit /etc/grr/server.local.yaml by setting the following option (a corresponding line should be in the file already):
AdminUI.url: https://localhost
The value of AdminUI.url should be equal to the second argument of Nginx’s proxy_redirect. It must contain the URL that GRR users will use to access the GRR web UI.
Step Three - check that everything is working
Restart Nginx and GRR:
sudo service grr-server restart
sudo service nginx restart
At this point, if you type https://localhost in your browser, you should see GRR’s page protected by a Basic Auth dialog. You can also just use wget:
wget --no-check-certificate --user=admin --password=password -O- https://localhost
If this command prints HTML to the standard output, you can reach the GRR web UI. Here admin is the username and password is the password of one of the registered GRR users; use the username and password that you specified when installing GRR.
Note that in this case it’s the GRR web UI process that triggers the Basic Auth check. In the current setup Nginx acts as a simple proxy that doesn’t do any authentication and doesn’t limit users’ activity in any way. Whoever can access Nginx can access the GRR web UI running behind it.
Step Four - make Nginx responsible for the authentication
GRR web UI supports setups where user authentication is delegated to the reverse proxy. In this kind of setup, GRR web UI will simply read the username from the X-Remote-User HTTP header and assume that the user has already passed the authentication stage and can be trusted.
Below is an example of doing such a setup.
Edit /etc/grr/server.local.yaml and add the following:
AdminUI.webauth_manager: RemoteUserWebAuthManager
AdminUI.remote_user_trusted_ips:
- 127.0.0.1
- ::ffff:127.0.0.1
What happens here is that we switch from using BasicWebAuthManager (the default for AdminUI.webauth_manager) to RemoteUserWebAuthManager.
BasicWebAuthManager triggers Basic Authentication and checks passwords entered by users against the ones in GRR datastore. Only users that are explicitly created with grr_config_updater can log in if BasicWebAuthManager is used.
RemoteUserWebAuthManager is an entirely different thing. It doesn’t do any authentication by itself. Instead, it expects the request to already be authenticated by a reverse proxy and the X-Remote-User header to contain the username of the authenticated user (the exact header name may be configured with GRR’s AdminUI.remote_user_header option).
Given that RemoteUserWebAuthManager blindly trusts the value of X-Remote-User, it’s very important to restrict the list of hosts that can send requests to the GRR web UI when RemoteUserWebAuthManager is used. This is configured by the AdminUI.remote_user_trusted_ips setting, which should contain the IP address(es) of the Nginx reverse proxy.
In this article we assume that GRR web UI and Nginx run on the same host, so we set AdminUI.remote_user_trusted_ips to a localhost IP address in 2 notations, IPv4 and IPv6.
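As an illustration, if your proxy were to pass the username in a differently named header, you could point GRR at it; the header name below is purely hypothetical:
AdminUI.remote_user_header: X-Authenticated-User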
Now let’s configure Nginx to do the Basic Auth itself. Change /etc/nginx/sites-enabled/default by adding the following to location directive:
...
location / {
...
auth_basic "restricted site";
auth_basic_user_file /etc/nginx/.htpasswd;
proxy_set_header X-Remote-User $remote_user;
...
}
...
The proxy_set_header directive sets the X-Remote-User header (which is used by the GRR web UI) to the username returned by Nginx’s authentication directives (in this case: auth_basic and auth_basic_user_file).
The auth_basic and auth_basic_user_file directives tell Nginx to request Basic Authentication on a given location. Usernames and passwords are stored in the /etc/nginx/.htpasswd file.
You can add a username to the file using this command. We are using admin as an example:
sudo sh -c "echo -n 'admin:' >> /etc/nginx/.htpasswd"
Next, add an encrypted password entry for the username by typing:
sudo sh -c "openssl passwd -apr1 >> /etc/nginx/.htpasswd"
Now restart both Nginx and GRR:
sudo service grr-server restart
sudo service nginx restart
If you try accessing https://localhost using your browser now, you should see the Basic Authentication dialog. Enter username “admin” and the password that you’ve used right above and you should see GRR web UI. You can also just use wget:
wget --no-check-certificate --user=admin --password=password -O- https://localhost
While this behavior looks similar to the default GRR behavior, it’s drastically different under the hood. In the current setup the authentication is done by Nginx, and GRR simply trusts the username that Nginx passes.
We used Basic Auth in Nginx for the sake of simplicity. However, any of the many Nginx authentication modules can be used instead of auth_basic in this kind of setup.
Limiting access to GRR UI/API with API routers
GRR has a notion of API call routers. Every API request that the GRR server handles is processed in the following fashion:
The username of the user making the request is matched against the rules in the router configuration file (as defined in the API.RouterACLConfigFile config option). If no router configuration file is set, a default router is used; the default router is specified in the API.DefaultRouter configuration option. If one of the rules in API.RouterACLConfigFile matches, the router from the matching rule is used.
The API router is used to process the request.
GRR has a few predefined API routers:
DisabledApiCallRouter - a router that will return an error for all possible requests.
ApiCallRouterWithoutChecks (default) - a router that performs no access checks and just gives blanket access to the system.
ApiCallRouterWithApprovalChecks - a router that enables GRR approvals-based workflow for better auditing. See Approval-based workflow for more details.
ApiCallRobotRouter - a router that provides limited access to the system that is suitable for automation. Normally used together with ApiCallRouterWithApprovalChecks so that scripts and robots can perform actions without getting approvals.
Various configuration scenarios
All GRR users should have unrestricted access (default)
In GRR server configuration file:
API.DefaultRouter: ApiCallRouterWithoutChecks
All GRR users should follow audit-based workflow
In GRR server configuration file:
API.DefaultRouter: ApiCallRouterWithApprovalChecks
GRR users should follow audit-based workflow, but user “john” should have blanket access.
In GRR server configuration file:
API.RouterACLConfigFile: /etc/grr_api_acls.yaml
API.DefaultRouter: ApiCallRouterWithApprovalChecks
In /etc/grr_api_acls.yaml:
router: "ApiCallRouterWithoutChecks"
users:
- "john"
NOTE: for example, you can set up user ‘john’ as a robot user, so that his credentials are used by scripts talking to the GRR API.
Sample api_acls.yaml (entries are separate YAML documents, divided by --- lines):
router: "ApiCallRobotRouter"
users:
- "serviceAccount1"
---
router: "ApiCallRouterWithApprovalChecks"
users:
- "test"
- "test2"
---
router: "ApiCallRouterWithoutChecks"
users:
- "adminuser1"
- "adminuser2"
Email configuration
This section assumes you have already installed an MTA, such as Postfix or nullmailer. After you have successfully tested your mail transfer agent, please proceed to the steps outlined below.
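For a quick MTA sanity check, something like the following works (a sketch: the mail command requires e.g. the mailutils package, and the address is a placeholder):
echo "MTA test" | mail -s "MTA test" you@example.com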
To configure GRR to send emails for reports or other purposes:
Ensure email settings are correct by running back through the configuration script if needed (or by checking /etc/grr/server.local.yaml):
grr_config_updater initialize
Edit /etc/grr/server.local.yaml to include the following at the end of the file:
Worker.smtp_server: <server>
Worker.smtp_port: <port>
and, if needed,
Worker.smtp_starttls: True
Worker.smtp_user: <user>
Worker.smtp_password: <password>
After configuration is complete, restart the GRR worker(s). You can test this configuration by running the "OnlineNotification" flow (in the "Administrative" category) on any of the online clients. This flow sends an email to a given email address as soon as the client goes online. If the client is already online, the email will be sent the next time it checks in (normally within 10 minutes).
Custom Email addresses
When GRR sends an email to a GRR user, the destination email address defaults to <username>@<Logging.domain>. Starting with version 3.4.2.0, GRR can alternatively use custom email addresses. In this mode, GRR will store a separate email address for each user in the data store. To enable this mode, set the following option in the GRR config:
Email.enable_custom_email_address: true
The custom email address can be set in one of the following ways:
When using RemoteUserWebAuthManager for authentication, the email address can be passed to GRR in the X-Remote-Extra-Email HTTP header. (The actual name of the header is configurable using the AdminUI.remote_email_header option.)
Using the GRR root API:
grr_api_shell_raw_access --exec_code "grrapi.root.CreateGrrUser(username='foo', email='foo@bar.org')"
grr_api_shell_raw_access --exec_code "grrapi.root.GrrUser('foo').Get().Modify(email='foo2@bar.org')"
Monitoring
This page outlines how to set up monitoring for the GRR components and for Fleetspeak. GRR and Fleetspeak keep track of many metrics which can be used to create charts about their performance, health, and workload.
When Monitoring.http_port is configured, each GRR component exports metrics at http://<host>:<port>/metrics. The export is human-readable plain text and follows the Open Metrics standard. More importantly, however, Prometheus can parse the metrics, and Grafana can plot the parsed data.
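For a quick check that a component is exporting metrics (assuming, as in the example setup below, that it listens on port 44451):
curl http://localhost:44451/metrics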
Prometheus Setup
This example will walk you through a basic Prometheus setup. For this example, the GRR Frontend, Worker, and Admin UI will be launched on your local machine. You can also choose to monitor Fleetspeak servers; otherwise feel free to skip the relevant steps, which are marked as FS.
Install GRR, for example from pip.
Run the GRR components locally. Execute each of the three commands in a separate terminal:
# Fleetspeak monitoring is configured in a separate config file.
grr_server --component fleetspeak_server
grr_server --component admin_ui -p Monitoring.http_port=44451
grr_server --component frontend -p Monitoring.http_port=44452
grr_server --component worker -p Monitoring.http_port=44453
Note: Custom monitoring port assignment is only required because the ports would clash when running multiple GRR components on one machine. Prometheus needs to know which type of component listens on which ports. If you use Monitoring.http_port_max, make sure that only one type of GRR component (e.g. only workers) listens on a given range of ports.
FS: Edit the file server.components.config. It is located in /etc/fleetspeak-server on a setup installed from a DEB, and in fleetspeak-server-bin/etc/fleetspeak-server when running from a virtualenv. Add the following to the end of the file:
stats_config: < address: "localhost:54451" >
This will ensure that Fleetspeak exports its metrics for Prometheus at http://localhost:54451/metrics.
Open http://localhost:44451/metrics in your browser. You should see a plain-text list of metrics.
Install Prometheus, e.g. by downloading and unpacking the archive file.
Configure Prometheus to scrape GRR. Save the following configuration as prometheus.yml in the Prometheus folder:
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'grr_admin_ui'
    static_configs:
      - targets: ['localhost:44451']
  - job_name: 'grr_frontend'
    static_configs:
      - targets: ['localhost:44452']
  - job_name: 'grr_worker'
    static_configs:
      - targets: ['localhost:44453']
FS: under scrape_configs also add the following:
  - job_name: 'fleetspeak'
    static_configs:
      - targets: ['localhost:54451']
Start Prometheus, by running the following command from the Prometheus folder:
./prometheus --config.file=prometheus.yml
Open http://localhost:9090/targets in your browser. After a couple of seconds, you should see three (FS: four) targets (grr_admin_ui, grr_frontend, grr_worker), each having 1 instance up.
Open the Expression Browser by clicking on Graph (http://localhost:9090/graph). On this page, click on the Graph tab (next to Console). Then, try any of the example queries to query GRR metrics. Be aware that you might only see very few data points, very low values, or no data at all, since GRR is not under any real workload and has 0 connected clients in this example.
At this point, you can use Prometheus to display basic charts using the expression browser. We recommend using dedicated visualization software such as Grafana; see the Grafana Setup section below for how to point Grafana at the metrics scraped by Prometheus.
Example Queries
To get you started, this page contains some example queries. These queries give you good insight into GRR’s health and workload.
QPS rate for the Frontend
rate(frontend_request_count_total{job="grr_frontend"}[1m])
Latency for requests to the Frontend
rate(frontend_request_latency_sum{job="grr_frontend"}[5m]) /
rate(frontend_request_latency_count{job="grr_frontend"}[5m])
Active Tasks running on the Frontend
frontend_active_count
Rate of successful flows on the Worker
rate(grr_flow_completed_count_total{job="grr_worker"}[5m])
Rate of failed flows on the Worker
rate(grr_flow_errors_total{job="grr_worker"}[5m])
Threadpool latency in the Worker
rate(threadpool_working_time_sum{job="grr_worker"}[5m]) /
rate(threadpool_working_time_count{job="grr_worker"}[5m])
Threadpool queueing time in Worker
rate(threadpool_queueing_time_sum{job="grr_worker"}[5m]) /
rate(threadpool_queueing_time_count{job="grr_worker"}[5m])
Number of outstanding tasks in the Worker
threadpool_outstanding_tasks{job="grr_worker"}
Number of threads running on the Worker
threadpool_threads{job="grr_worker"}
Rate of client crashes reported to the Worker
rate(grr_client_crashes_total{job="grr_worker"}[5m])
Scaling Prometheus
Prometheus supports automatic Service Discovery for many types of infrastructure. Depending on your hosting setup and size of your GRR installation, this can be an improvement over manually hardcoding hostnames in the Prometheus configuration.
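As an example, a sketch of file-based service discovery for the workers (the file path, and the use of file_sd_configs instead of the static targets shown earlier, are illustrative; any Prometheus discovery mechanism that fits your infrastructure works):
scrape_configs:
  - job_name: 'grr_worker'
    file_sd_configs:
      - files:
          # Target list maintained by your configuration management.
          - 'targets/grr_workers.json'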
Prometheus Security Considerations
A minimal HTTP service, based on prometheus_client, listens at Monitoring.http_port for each GRR component. This HTTP service exports read-only metrics under /metrics and /varz and does not enforce any access control. People with access to it can read aggregated metrics about your GRR installation. With these metrics, facts about the number of workers, flow activity, and service health can be derived. Make sure to limit access to the port, for example by employing a firewall. Furthermore, read Prometheus Security.
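As a minimal sketch of such a firewall rule (the port range matches the example setup above; adapt the rule and tooling to your environment):
# Drop external access to the example monitoring ports; a Prometheus
# instance on the same host can still scrape via localhost.
sudo iptables -A INPUT -p tcp --dport 44451:44453 ! -s 127.0.0.1 -j DROP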
Grafana Setup
This example will walk you through setting up Grafana as dedicated visualization software to parse, display and query metrics scraped from GRR server components by Prometheus. If you’ve followed the example setup above, then Prometheus is configured to scrape GRR (and Fleetspeak as well, if you have a Fleetspeak-based deployment). You will also be able to set up a simple alerting system using Grafana. These instructions assume that the GRR server and Prometheus are both up and running.
Install Grafana by following the instructions:
sudo apt-get install -y apt-transport-https
sudo apt-get install -y software-properties-common wget
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
sudo add-apt-repository "deb https://packages.grafana.com/oss/deb stable main"
sudo apt-get update
sudo apt-get install grafana
This will install the latest OSS release.
After Grafana is installed, you can start the Grafana server by executing the following in a terminal (assuming your operating system is either Debian or Ubuntu and you installed the latest OSS release):
sudo systemctl daemon-reload
sudo systemctl start grafana-server
sudo systemctl status grafana-server
This will get the Grafana server up and running on http://localhost:3000.
Grafana is now set up and can be visited at http://localhost:3000. The username and password should both be “admin”; please change them. If the Grafana UI doesn’t show up, either the installation of Grafana or the server startup failed; make sure to check the official documentation.
Set up Prometheus as a data source for Grafana, so that Grafana can display the metrics from Prometheus. To do that, follow these instructions: click on Configuration -> Data Sources in the Grafana server UI, then click on the “Add data source” button. From the list of “time series databases”, choose Prometheus and in the Name field type grr-server.
Grafana is set up and ready to show metrics scraped by Prometheus. You can start by either creating your own dashboards or importing existing dashboards into Grafana. The next step would be to import the dashboards provided with GRR from the Grafana folder in the GRR repository. You can customize and expand them with additional graphs, if needed. These dashboards contain some example graphs of metrics scraped by Prometheus, and also implement sample alerts. To do that, first download the dashboards from the repository (download dashboards from fleetspeak_enabled_setup/dashboards_for_use or legacy_setup/dashboards_for_use, depending on your deployment), and then head over to http://localhost:3000/dashboard/import. There, you can click ‘Upload .json file’ and upload the dashboard you have downloaded from the repository. The dashboard is now imported; you can access it by going to http://localhost:3000/dashboards and clicking the dashboard’s name, e.g. “Frontends Dashboard”. Each of the sample dashboards corresponds to a different component of the GRR server (and the Fleetspeak server itself), e.g. the Frontends Dashboard shows aggregated metrics from all Frontends that are active in the GRR server. Each dashboard contains several panels; each panel consists of a graph that contains one or more metrics queried from Prometheus, and possibly alerting rules.
You can now use the dashboards. The dashboards give a general overview of the main components of the GRR server and Fleetspeak servers, which can be used to monitor different metrics of each component. Examples of such metrics can be found in the example queries above. Remember that the dashboards and alerts are flexible, and can be expanded or modified to adjust to your exact needs. Additional metrics can be discovered by exploring http://<host>:<port>/metrics for each component of the GRR server (change the port according to the GRR server component you want) and Fleetspeak servers, and if you’d like to create your own custom dashboards, panels and alerts, make sure to go over the corresponding documentation in Grafana.
Alerts
If you want to use the alerts in Grafana, you first need to define a
notification channel
for the alerts. This can be done by heading over to
http://localhost:3000/alerting/notification/new
and
following the form
to add a notification channel. For reference, check out the screenshot
that follows.
Once a notification channel is set up,
you will start receiving alerts from the existing dashboards, as those
contain definitions for simple alerts. There are two sample alerting
rules: in each dashboard, there is a panel that counts the number of
active processes for the specific job. For example, in the Workers
dashboard, there is a panel called “Active Processes” which shows
the current number of active Worker processes. If the number of active
workers is zero, an alert will be fired. If you want to disable or
remove an alert, go to the dashboard and the corresponding panel,
and there you can remove the alerting rule.
Monitoring Client Load Stats
Using Grafana, you are able to view statistics of individual GRR clients and monitor them. To achieve this, check out the following steps.
Note that this is only applicable to Fleetspeak-based GRR deployments, as the individual client data in these steps is gathered from the underlying Fleetspeak client, unlike the current GRR Admin UI, which gathers its client data from the GRR clients instead.
Set up Grafana, until at least step 3. At this point, you should have a running instance of Grafana server.
Run the GRRafana HTTP server with grr_server --component grrafana. Briefly, this server is a component of GRR which serves Grafana clients’ stats data from GRR, through the gRPC connection between GRR and Fleetspeak. For more details, check out google/grr#832. It runs by default on port 5000.
Install the JSON Datasource plugin using the command grafana-cli plugins install simpod-json-datasource. The plugin will issue JSON requests from Grafana to the HTTP server, and then display the queries’ results. Make sure that the url is http://<host>:5000, and for a friendlier name than ‘JSON’, feel free to rename the data source to ‘grrafana’.
After the server is up and running, if you do not want to create your own Grafana monitoring dashboards for individual client monitoring, we have you covered: among the sample dashboards in step 3, you can also find a sample dashboard called Client Load Stats for this exact purpose; import it as explained in step 5 above.
If you want, you can create Grafana dashboards or extend the sample dashboards to query individual clients. To do that, create a new dashboard (clicking on ‘+’ -> Dashboard), then click on Dashboard Settings on the top-right and add a new variable. Name the new variable ClientID (this name must be exact, with correct capitalization, in order for it to be identified correctly) and specify its type as “Constant” (and make sure not to hide the variable or the label, so that it’s visible in the dashboard). At this point, you should have an empty dashboard with an empty textbox named ClientID above your panels. Feel free to create new panels and play around with all the individual statistics you can get from your clients!
Aggregated Clients Statistics
In the GRR Admin UI you can view aggregated statistics for multiple GRR clients, such as an OS breakdown, a GRR version breakdown and so on. To see those statistics, open ‘Statistics’ under the ‘Management’ section of the menu in the GRR Admin UI. Similarly to Client Load Stats, you may choose to view this information using Grafana.
To achieve this, follow these steps:
Follow the steps in Client Load Stats until at least step 3. At this point, you should have a Grafana instance up and running, the JSON Datasource plugin installed and configured, and the GRRafana HTTP server (as discussed in google/grr#832) up and running.
You can now fetch aggregated client statistics, which are named following a convention similar to the metrics gathered in the current GRR Admin UI, such as “OS Release Breakdown - N Days Active” and so on. Note that we do not provide sample dashboards for these metrics, as they are table-based and cannot be manipulated much. Instead, you can make a panel that consists of a single metric by just selecting it in the dropdown box named “Metric” while creating the new Grafana panel.
Approval-based workflow
GRR has support for an approval-based access control system. To turn it on, set the API.DefaultRouter configuration option in the server configuration (see API routers documentation for more details):
API.DefaultRouter: ApiCallRouterWithApprovalChecks
Approval-based workflow places certain restrictions on what users can do without getting an explicit approval from another user.
Actions that don’t require an approval
Searching for clients.
Getting information about a client host (including the history).
Looking at the list of hunts.
Looking at hunt results, logs and other related data.
Looking at cron jobs.
Actions that do require an approval
Browsing client’s Virtual File System.
Inspecting client’s flows.
Starting new flows on a client.
Starting a new hunt (NOTE: this is a special case - starting a hunt requires an approval from an admin user).
Downloading files collected by a hunt.
Starting a new cron job.
Requesting an approval
Whenever you try to perform an action that requires an approval in the GRR web UI, you’ll see a popup dialog similar to this one:
You need to fill in a comma-separated list of approvers. Each approver will get a notification about a pending approval request in their GRR web UI.
You can also select the CC option to send an email to a predefined email address. This helps if you have multiple possible approvers and they’re all subscribed to the same mailing list. The CC option will only work if you have a proper Email Configuration in your GRR server config.
For every approval request you also need to enter a reason. This is done for auditing purposes. Also, when you access a client that you have an approval for, you’ll see the reason displayed in the GRR web UI’s top left corner.
Granting an approval
Whenever somebody requests an approval, you’ll get a notification in the GRR web UI: the notification button in the top right corner will turn red and display the number of pending notifications.
When you click on a Please grant access to <…> notification, you’ll see an approval request review page similar to this one:
You can see the approval request reason on the review page, along with some details about the object that the requestor is trying to access (a client, a hunt or a cron job). If you find the approval request justified, click on the “Approve” button. The requestor will then receive a notification stating that access was granted.
Repacking GRR clients
The client can be customized for deployment. There are two key ways of doing this:
Repack the released client with a new configuration.
Rebuild the client from scratch (advanced users); see Building custom client templates.
Doing a rebuild allows full reconfiguration, changing names and everything else. A repack on the other hand limits what you can change. Each approach is described below.
Repacking the Client with a New Configuration.
Changing basic configuration parameters can be done by editing the server config file (/etc/grr/server.local.yaml) to override default values, and then using the config_updater to repack the binaries. This allows for changing basic configuration parameters such as the URL the client reports back to.
Once the config has been edited, you can repack all clients with the new config and upload them to the datastore using
$ grr_config_updater repack_clients
Repacking works by taking the template zip files, injecting relevant configuration files, and renaming files inside the zip to match the requested names. This template is then turned into something that can be deployed on the system: a Debian package on Linux, a self-extracting zip on Windows, or an installer package on OSX.
After running the repack you should have binaries available in the UI under Binaries → installers and also on the filesystem under grr/executables.
Repacking Clients: Signing installer binaries
You can also use the grr_client_build tool to repack individual templates and control more aspects of the repacking, such as signing. For signing to work you need to follow these instructions:
Linux RPM signing
Linux RPMs are signed following a similar process to Windows. A template is built inside a docker container / Vagrant VM and the host does the repacking and signing.
To get set up for signing, first you need to create a gpg key that you will use for signing (here’s a decent HOWTO).
Then make sure you have the rpmsign and rpm utilities installed on your host system:
sudo apt-get install rpm
Tell GRR where your public key is:
sudo ln -s /path/to/mykey.pub /etc/alternatives/grr_rpm_signing_key
Set this config variable to whatever you used as your key name:
ClientBuilder.rpm_gpg_name: my key name
That’s it, you can follow the normal build process.
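For reference, a repack invocation with signing enabled might look like this (the template filename is illustrative; use whatever RPM template you built or downloaded):
grr_client_build repack --template templates/grr_3.4.0.1_amd64.rpm.zip --sign --output_dir repacked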
Windows EXE signing
The easiest way to get a properly signed Windows GRR installer is to rebuild the client template and repack it on a dedicated Windows machine.
To do that:
Edit your /etc/grr/server.local.yaml by configuring Windows signing and verification commands:
ClientBuilder.signtool_signing_cmd: |
  C:\Program Files \(x86\)\Windows kits\10\App Certification Kit\signtool.exe sign /f Sample.CodeSigning.pfx
ClientBuilder.signtool_verification_cmd: |
  C:\Program Files \(x86\)\Windows kits\10\App Certification Kit\signtool.exe verify /pa /all
Note: these are sample commands using a sample certificate file. In your case the commands will depend on your setup. Please consult the Windows documentation for details about code signing.
Install Python 3.7 on the Windows box.
Install Windows SignTool. It’s available as part of Windows SDK.
Copy your server configuration files to the Windows box:
/etc/grr/server.local.yaml
/usr/share/grr-server/install_data/etc/grr-server.yaml
Make sure that server.local.yaml is transferred and stored securely - it contains sensitive keys.
In the console run:
C:\Python37\python.exe -m venv temp_venv
.\temp_venv\Scripts\activate
pip install grr-response-client-builder
This will install the GRR client builder from PyPI. Alternatively, you can check out GRR from GitHub and build it from source (see our Appveyor configuration as a starting point).
To build the client template, run (while in the virtualenv):
grr_client_build --verbose build --output templates --config grr-server.yaml --secondary_config server.local.yaml
To sign the built template, run:
grr_client_build --verbose sign_template --template templates\GRR_3.4.0.1_amd64.exe.zip --output_file templates\GRR_3.4.0.1_amd64_signed.exe.zip --config grr-server.yaml --secondary_config server.local.yaml
To repack the signed template, run:
grr_client_build repack --template templates\GRR_3.4.0.1_amd64_signed.exe.zip --sign --signed_template --output_dir repacked --config grr-server.yaml --secondary_config server.local.yaml
The resulting signed installer will be put into the repacked folder.
MacOS PKG signing
The easiest way to get a properly signed MacOS GRR installer is to rebuild the client template and repack it on a dedicated MacOS machine.
Edit your /etc/grr/server.local.yaml to reference the certificate name and, optionally, the keychain:
ClientBuilder.signing_cert_name: "My certificate name"
Optionally you can also specify a keychain file to use (if not specified, a default of %(HOME|env)/Library/Keychains/MacApplicationSigning.keychain will be used):
ClientBuilder.signing_keychain_file: "%(HOME|env)/Library/Keychains/My.keychain"
Copy your server configuration files to the MacOS box:
/etc/grr/server.local.yaml
/usr/share/grr-server/install_data/etc/grr-server.yaml
Make sure that server.local.yaml is transferred and stored securely - it contains sensitive keys.
Execute the following commands:
python3 -m venv temp_venv
source temp_venv/bin/activate
# you might need to run "xcode-select --install" in order for the next command to succeed
export MACOSX_DEPLOYMENT_TARGET=10.10
CFLAGS=-stdlib=libc++ pip install grr-response-client-builder
Build the client template (it will get signed along the way):
grr_client_build --verbose build --output templates --config grr-server.yaml --secondary_config server.local.yaml
Repack it (this command technically doesn’t need MacOS to be executed; it can be done on the GRR server side):
grr_client_build --verbose repack --template templates/grr_3.4.0.1_amd64.xar.zip --output_dir repacked --config grr-server.yaml --secondary_config server.local.yaml
The resulting installer with all contents signed will be placed into the repacked folder. If you want the installer itself to be signed, use the productsign utility (see this article, for example).
Repacking Clients With Custom Labels: Multi-Organization Deployments
Each client can have a label “baked in” at build time that allows it to be identified and hunted separately. This is especially useful when you want to deploy across a large number of separate organisations. You achieve this by creating a config file that contains the unique configuration you want applied to the template. A minimal config that just applies a label would contain:
Client.labels: [mylabel]
and be written into repack_configs/mylabel.yaml.
Then you can call repack_multiple to repack all templates (or whichever templates you choose) with this configuration and any others in the repack_configs directory. An installer will be built for each config:
grr_client_build repack_multiple --templates /path/to/grr-response-templates/templates/*.zip --repack_configs /path/to/repack_configs/*.yaml --output_dir=/grr_installers
To sign the installers (RPM and EXE), add --sign.
Client Configuration.
Configuration of the client is done during the packing/repacking of the client. The process looks like:
For the client we are packing, find the correct context and platform, e.g. the Platform: Windows client context.
Extract the relevant configuration parameters for that context from the server configuration file, and write them to a client-specific configuration file, e.g. GRR.exe.yaml.
Pack that configuration file into the binary to be deployed.
When the client runs, it determines the configuration in the following manner, based on the --config and --secondary_configs arguments that are given to it:
Read the config file packed with the installer, default: c:\windows\system32\GRR\GRR.exe.yaml
Read the Config.writeback value from that file, default: reg://HKEY_LOCAL_MACHINE/Software/GRR
Read in the values at that registry key and override any values from the yaml file with those values.
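For illustration, a packed GRR.exe.yaml typically contains entries along these lines (the values below are placeholders, not a complete or authoritative config):
Client.server_urls: ['https://grr.example.com:8080/']
Config.writeback: reg://HKEY_LOCAL_MACHINE/Software/GRR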
Most parameters can be modified by changing the configuration and then restarting GRR. However, some configuration options, such as Client.name, affect the name of the actual binary itself and therefore can only be changed with a repack on the server.
Updating a configuration variable in the client can be done in multiple ways:
Change the configuration on the server, repack the clients and redeploy/update them.
Edit the yaml configuration file on the machine running the client and restart the client.
Update where Config.writeback points to with new values, e.g. by editing the registry key.
Issue an UpdateConfig flow from the server (not visible in the UI), to achieve option 3.
As an example, to reduce how often the client polls the server to every 300 seconds, you can update the registry as per below, and then restart the service:
C:\Windows\System32\>reg add HKLM\Software\GRR /v Client.poll_max /d 300
The operation completed successfully.
C:\Windows\System32\>
Low-level maintenance with grr_console
GRR is shipped with a grr_console binary. This is effectively an IPython shell that gets initialized with a few basic GRR objects and offers you direct access to internal GRR classes.
In cases where there’s no proper tool, UI, or API for performing a certain task, grr_console may be useful. However, given that it exposes undocumented GRR internals and gives you complete, unrestricted, and unaudited access to the GRR datastore, it’s important to remember that:
It’s very easy to shoot yourself in the foot and make the system unusable by running the wrong piece of code in grr_console.
GRR is a project that’s being actively developed. Class names, import paths and interfaces change all the time. Even if you have a snippet of code that works in grr_console today, it may stop working in a month.
Console Snippets (might break any time)
Delete a client from the database
In [1]: print(data_store.REL_DB.ReadClientSnapshot(u"C.83558528f50f993e").knowledge_base.fqdn)
example.host.com
In [2]: data_store.REL_DB.DeleteClient(u"C.83558528f50f993e")
Delete all signed binaries from the database
In [1]: ids = data_store.REL_DB.ReadIDsForAllSignedBinaries()
In [2]: for i in ids:
...: data_store.REL_DB.DeleteSignedBinaryReferences(i)
...:
GRR datastore
GRR needs a database backend in order to run. Previously, GRR supported SQLite and MySQL databases, but support for SQLite was dropped after GRR v3.2.3.2 was released. MySQL is now GRR’s only supported database backend.
For users that were using the obsolete SQLiteDataStore, please note that GRR does not have functionality for migrating data to the MySQL datastore. No need to fret though - a client re-deployment will not be necessary. Existing clients will still be able to talk to the GRR server when starting off with a new database. Client re-enrollment and interrogation should happen automatically.
AFF4 deprecation
Starting from version 3.3.0, GRR uses a new datastore format by default - REL_DB. REL_DB is backwards-incompatible with the now-deprecated AFF4 datastore format (even though they both use MySQL as a backend).
Use of AFF4-based deployments is now discouraged. REL_DB is expected to be much more stable and performant.
Switching an AFF4-based deployment to use REL_DB
To check which datastore implementation you’re using, go to “Settings” in the GRR AdminUI and check the “Database” section. If Database.aff4_enabled is set to 0 and Database.enabled is set to 1, you’re running REL_DB.
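In the server configuration, a REL_DB-enabled deployment corresponds to settings like the following (the inverse of the AFF4 settings shown later in this section):
Database.enabled: True
Database.aff4_enabled: False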
Note (IMPORTANT!): There’s no established data migration process for moving data from the AFF4 data storage to REL_DB. You’ll lose all the historical data when you switch an existing deployment to REL_DB. However, the clients will continue to work without issues. They will show up in the new datastore the first time they check in (normally every 10 minutes).
To switch an existing GRR deployment to REL_DB:
Back up your MySQL database and GRR server configuration files (this is a good idea for any kind of GRR upgrade; ideally you should also have regular backups set up).
Upgrade GRR to a version greater than or equal to 3.3.0.0.
Run the following to update the GRR server configuration file:
sudo grr_config_updater switch_datastore
# Add at least one admin user to the empty REL_DB datastore.
sudo grr_config_updater add_user --admin admin
Answer the questions asked by the tool.
Restart GRR server.
sudo service grr-server restart
At this point GRR will run with an empty REL_DB datastore. Your clients will still be able to talk to the server and will appear in the AdminUI as soon as they check in (normally within 10 minutes).
Switching back to AFF4-based deployment
If for some reason you had an issue with switching to REL_DB and would like to go back to using the AFF4 datastore in your GRR deployment, edit your configuration file at /etc/grr/server.local.yaml and make sure you have the settings below set correctly (or simply restore the file to its backed up version):
Database.aff4_enabled: True
Database.enabled: False
# Ignore the following line if you had custom
# Blobstore.implementation setting before.
Blobstore.implementation: MemoryStreamBlobStore
Scaling GRR within a single server
For 3.1.0 and later, use a systemd drop-in override to control how many copies of each component you run on each machine. This can initially be done using:
sudo systemctl edit grr-server
which creates “/etc/systemd/system/grr-server.service.d/override.conf”. You’ll want to turn this into a template file and control it via puppet or similar. An example override that just runs 3 workers looks like:
[Service]
ExecReload=
ExecReload=/bin/systemctl --no-block reload grr-server@worker.service grr-server@worker2.service grr-server@worker3.service
ExecStart=
ExecStart=/bin/systemctl --no-block start grr-server@worker.service grr-server@worker2.service grr-server@worker3.service
ExecStop=
ExecStop=/bin/systemctl --no-block stop grr-server@worker.service grr-server@worker2.service grr-server@worker3.service
When starting multiple copies of the UI and the web frontend, you also need to tell GRR which ports to use. So if you want 10 HTTP frontends on a machine, you would configure your systemd drop-in to start 10 copies and then set Frontend.port_max so that you have a range of 10 ports starting from Frontend.bind_port (i.e. set Frontend.bind_port to 8080 and Frontend.port_max to 8089). You can then configure your load balancer to distribute across that port range. AdminUI.port_max works the same way for the UI; see the sketch below.
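A minimal configuration sketch for the 10-frontend example above (the AdminUI values are an analogous illustration, starting from the default AdminUI port of 8000):
Frontend.bind_port: 8080
Frontend.port_max: 8089
AdminUI.port: 8000
AdminUI.port_max: 8009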
Large Scale Deployment
The GRR server components should be distributed across multiple machines in any deployment where you expect to have more than a few hundred clients, or even smaller deployments if you plan on doing intensive hunting. The performance needs of the various components are discussed below, and some real-world example deployment configurations are described in the FAQ.
You should install the GRR package on all machines and use configuration management (chef, puppet etc.) to:
Distribute the same grr-server.yaml to each machine
Control how many of each component to run on each machine (see next section for details)
For scaling the fleetspeak communication framework, see the instructions in the fleetspeak section.
Component Performance Needs
Worker: you will probably want to run more than one worker. In a large deployment where you are running numerous hunts, it makes sense to run 20+ workers. As long as the datastore scales, the more workers you have, the faster things get done. We previously had a config setting that forked worker processes, but this turned out to play badly with datastore connection pools, the stats store, and monitoring ports, so it was removed.
HTTP frontend: the frontend HTTP server can be a significant bottleneck. By default we ship with a simple HTTP server, but it is single-process and written in Python, which means it may have thread lock issues. To get better performance you will need to run multiple instances of the HTTP frontend behind a reverse HTTP proxy (e.g. Apache or Nginx). Assuming your datastore handles it, these should scale linearly.
Web UI: the admin UI component is usually under light load, but you can run as many instances as you want for redundancy (you’ll need to run them behind Apache or Nginx to load-balance the traffic). The more concurrent GRR users you have, the more instances you need. This is also the API server, so if you intend to use the API heavily, run more instances.
Building custom client templates
Building the templates
Note: templates have to be built on a target platform (i.e. if you need a Windows 10 GRR client template, you have to build on Windows 10).
After installing the GRR server components, you can build client templates with
grr_client_build build --output mytemplates
If you can’t install prebuilt Debian or PIP packages, you’ll need to install GRR from source. You might choose to install the full server or just the packages needed to build the templates. Please see Installing from source for details.
Repacking the templates
The templates are generic GRR client builds that have to be repacked to be used with a particular GRR server deployment. Repacking is a process that injects server-specific configuration (like the frontend address, labels, etc).
While templates have to be built on the client’s target platform, repacking has to be done on your GRR server, as the GRR server contains the configuration (i.e., crypto credentials for the clients to use).
grr_client_build repack --template mytemplates/*.zip --output_dir mytemplates
Samples
We have fully automated client builds for Linux, OSX and Windows. The scripts we use are mostly in the Travis and Appveyor directories on GitHub. Duplicating what we do there might give you custom clients with little effort.
Troubleshooting
Please check the output of your system’s service logs and/or the GRR logging directory (default: /usr/share/grr-server/lib/python2.7/site-packages/grr_response_core/var/log/) for more information regarding failures.
The upstart/init.d scripts show no output
When I run an init.d script e.g. “/etc/init.d/grr-http-server start” (deprecated) it does not show me any output.
Make sure that the “START” parameter in the corresponding default file, e.g. “/etc/default/grr-http-server”, has been changed to “yes”.
I cannot start any/some of the GRR services using the service scripts
When I run an init.d script e.g. “/etc/init.d/grr-http-server start” (deprecated) it indicates it started the service although when I check with “/etc/init.d/grr-http-server status” it says it is not running.
You can troubleshoot by running the services in the foreground, e.g. to run the HTTP Front-end server in the foreground:
sudo grr_server --component [frontend|adminui|worker] --verbose
Any/some of the GRR services are not running correctly
Check if the logs contain an indication of what is going wrong.
Troubleshoot by running the services in the foreground, e.g. to run the UI in the foreground:
sudo grr_server --component [frontend|adminui|worker] --verbose
Cannot open libtsk3.so.3
error while loading shared libraries: libtsk3.so.3: cannot open shared object file: No such file or directory
The libtsk3 library cannot be found in the ld cache. Check if the path to libtsk3.so.3 is in /etc/ld.so.conf (or equivalent) and update the cache:
sudo ldconfig
GRR and Fleetspeak setup
Installing from a release deb (recommended)
This is the recommended way of running a Fleetspeak-based setup.
Starting from v3.4.2.3, the release deb contains a bundled Fleetspeak installation. To use it, proceed as described in the installation guide. During the installation of the package (or when running grr_config_updater), answer the following question with yes:
Use Fleetspeak (next generation communication framework)?
Installing from source
This guide will walk you through the process of installing GRR from sources and making it communicate through Fleetspeak, rather than its old custom communication protocol.
At the end you should have a working single-machine setup with both the server and the client communicating with each other. Follow other documentation pages to learn how to adapt this setup to real deployments.
Installation
System
Because we are going to install everything from sources, some dependencies need to be installed on your system first.
Since GRR is written in Python, the fundamental requirement is to have it available on your system. GRR needs at least Python 3.6. On recent Debian-based distros it should be enough to install the python3-dev package:
$ sudo apt install python3-dev
Moreover, both GRR and Fleetspeak use MySQL (or MariaDB) as their RDBMS, so make sure it is also installed. The package that GRR uses to communicate with MySQL depends on the MySQL client library, so you should set it up as well. On Debian distros it is libmysqlclient-dev or libmariadbclient-dev:
$ sudo apt install libmysqlclient-dev
Finally, GRR uses Closure Compiler to process its JavaScript files. While installing Closure Compiler is handled for you in one of the build scripts, it requires a Java Runtime Environment to run. OpenJDK should work just fine, so on Debian distros it is enough to install it like:
$ sudo apt install openjdk-13-jre
Fleetspeak
Install Fleetspeak and follow the guide to learn how to configure services and run the Fleetspeak server and the Fleetspeak client.
GRR
Use Git to clone the GRR repository to your workstation and enter it:
$ git clone https://github.com/google/grr
$ cd grr/
For the rest of the guide we will assume that the working directory is the root GRR folder, placed in your home folder (/home/username/grr), so remember to adapt to your specific setup.
To avoid cluttering the filesystem, we will also use Python’s virtual environment:
$ python3 -m venv venv
Before proceeding with installing the GRR packages themselves, you also need to install the grpcio-tools and protobuf packages (for compiling Protocol Buffers files) and nodeenv (for compiling JavaScript files):
$ venv/bin/pip install grpcio-tools protobuf nodeenv
$ venv/bin/nodeenv -p --prebuilt --node=12.16.1
Once your virtual environment is ready, install all the required GRR Python packages:
$ source venv/bin/activate
(venv) $ pip install -e grr/proto/
(venv) $ pip install -e grr/core/
(venv) $ pip install -e grr/client/
(venv) $ pip install -e grr/client_builder/
(venv) $ pip install -e api_client/python/
(venv) $ pip install -e grr/server/[mysqldatastore]
(venv) $ deactivate
If you encounter any problems related to mysqlclient during the installation of grr/server when using MariaDB, then try the following:
(venv) $ pip install mysqlclient
(venv) $ pip install -e grr/server/
(venv) $ deactivate
Configuration
MySQL / MariaDB
We need to create a GRR database along with an associated user. In this guide the database will be named grr, the user will be named grr-user and will be identified by the password grr-password. We also need to increase the maximum allowed packet size to 40 MiB.
Fire up the MySQL console as an administrative user, e.g.:
$ mysql --user root
Then, use the following commands:
mysql> CREATE USER `grr-user` IDENTIFIED BY 'grr-password';
mysql> CREATE DATABASE `grr`;
mysql> GRANT ALL PRIVILEGES ON `grr`.* TO `grr-user`;
mysql> SET GLOBAL max_allowed_packet = 40 * 1024 * 1024;
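If you want to verify the database setup at this point, a quick check from Python can confirm the credentials and the packet size. This is an optional sketch; it assumes the mysqlclient package (imported as MySQLdb), which is pulled in later when installing grr/server.
# Optional sanity check: connects as the user created above and verifies
# max_allowed_packet. Assumes the mysqlclient package (imported as MySQLdb).
import MySQLdb

conn = MySQLdb.connect(host="localhost", user="grr-user",
                       passwd="grr-password", db="grr")
cursor = conn.cursor()
cursor.execute("SELECT @@max_allowed_packet")
print(cursor.fetchone())  # Expect (41943040,), i.e. 40 MiB.
conn.close()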
GRR
Now you should initialize the GRR configuration. GRR comes with a tool for this that handles most of the work for you:
$ venv/bin/grr_config_updater initialize
Use the MySQL credentials that you configured previously. It should be fine to keep the rest of the settings at their default values (but skip client repacking).
Now you need to configure GRR to work with Fleetspeak. Pick an unused port, e.g. 1138, on which your GRR service will run.
Add the following lines to the GRR config file in grr/core/install_data/etc/server.local.yaml (where localhost:1138 uses the port you picked and localhost:9091 is the address your Fleetspeak admin server binds to):
Client.fleetspeak_enabled: true
Server.fleetspeak_message_listen_address: "localhost:1138"
Server.fleetspeak_server: "localhost:9091"
Fleetspeak
Server
Add the following definition to your Fleetspeak server-side service configuration file (using the port you chose for GRR):
services {
  name: "GRR"
  factory: "GRPC"
  config: {
    [type.googleapis.com/fleetspeak.grpcservice.Config] {
      target: "localhost:1138"
      insecure: true
    }
  }
}
Client
Define a new client-side service configuration text file with the following contents:
name: "GRR"
factory: "Daemon"
config: {
  [type.googleapis.com/fleetspeak.daemonservice.Config]: {
    argv: "/home/username/grr/venv/bin/python"
    argv: "-m"
    argv: "grr_response_client.grr_fs_client"
  }
}
Running
Start the Fleetspeak server and all the GRR components in separate terminals:
$ venv/bin/python -m grr_response_server.gui.admin_ui
$ venv/bin/python -m grr_response_server.bin.worker
$ venv/bin/python -m grr_response_server.bin.fleetspeak_frontend
Finally, start the Fleetspeak client. After a moment, it should successfully register with GRR and you should be able to see it in the GRR UI. In your web browser, navigate to the following URL (substituting your Fleetspeak client identifier):
http://localhost:8000/#/clients/C.<your_fleetspeak_client_id>/host-info
You should see a page with details about your client. GRR should automatically schedule an initial interrogation flow, which should complete within a couple of minutes. You can also try some simpler flows, for example listing all processes (ListProcesses in the Processes category), to verify that everything works as expected.
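Flows can also be scheduled programmatically through the HTTP API. The following is a minimal sketch using the grr_api_client library; the endpoint, credentials and client id are placeholders for your own setup.
# Minimal sketch: start a ListProcesses flow via the GRR HTTP API.
# Endpoint, credentials and client id below are placeholders.
from grr_api_client import api

grrapi = api.InitHttp(api_endpoint="http://localhost:8000",
                      auth=("admin", "<password>"))
client = grrapi.Client("C.0123456789abcdef")
flow = client.CreateFlow(name="ListProcesses")
print(flow.flow_id)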
Scaling a fleetspeak setup
The following diagram shows a standard, distributed fleetspeak setup:
The fleetspeak server setup consists of 2 components:
Fleetspeak Server: This component terminates the streaming HTTPS connections from the fleetspeak clients. Each fleetspeak client is connected to a particular server instance at a time.
Fleetspeak Admin Server: This component provides an administrative GRPC interface as well as an interface for the GRR server to send messages to clients.
Both components are included in the single binary /usr/bin/fleetspeak-server and can be enabled respectively in the config file /etc/fleetspeak-server/server.components.config. The binary is automatically started by the systemd unit fleetspeak-server.
In the legacy GRR protocol, clients would periodically poll the server for messages, leading to delays in message delivery. Fleetspeak, on the other hand, uses persistent, streaming connections and thus can deliver messages without significant delay. For this mechanism to work, a simple, HTTP-based notification protocol between the server and the admin server is necessary: if the admin server submits a message for a particular client, it notifies the respective server so that the server immediately delivers the message to the client.
To run Fleetspeak in a distributed setup:
Install the GRR release deb on all machines.
Take the files /etc/fleetspeak-server/server.services.config and /etc/fleetspeak-server/server.components.config from a working installation, adapt them and distribute them to the target machines.
Running the Fleetspeak Server component
To run a server component, keep the https_config section and remove the admin_config section in the file /etc/fleetspeak-server/server.components.config.
Add 2 more config options:
notification_listen_address: the IP address and port on which the server listens for notifications from the admin server. This port is not meant to be publicly accessible, but will need to be reachable from other servers.
notification_public_address: the IP address and port advertised to other servers.
The final config should look like:
mysql_data_source_name: "[...]"
https_config: <
listen_address: "192.168.1.83:4443"
certificates: "-----BEGIN CERTIFICATE-----\n [...]"
key: "-----BEGIN EC PRIVATE KEY----- [...]"
>
notification_listen_address: "192.168.1.83:10000"
notification_public_address: "192.168.1.83:10000"
Next, configure a TCP/IP load balancer balancing traffic to the port specified in https_config.listen_address on the respective server instances.
In case your load balancer removes the client’s IP address (i.e. Fleetspeak sees connections originating from the load balancer’s IP), it’s recommended to use the PROXY protocol. In this case, add one more option in the file /etc/fleetspeak-server/server.components.config:
proxy_protocol: true
To make GRR/Fleetspeak clients use the distributed Fleetspeak setup, run sudo grr_config_updater initialize to repack the GRR clients. If the load balancer listens on 1.2.3.4:4443, answer the Fleetspeak-related questions as follows:
Use Fleetspeak (next generation communication framework)? [yN]: [N]: y
Please enter your hostname e.g. grr.example.com [grr-server]: 1.2.3.4
Fleetspeak public HTTPS port [4443]: 4443
Running the Fleetspeak Admin Server component
To run the admin server component, keep the admin_config section and remove the https_config section in the file /etc/fleetspeak-server/server.components.config.
Add one more config option:
notification_use_http_notifier: enables usage of the HTTP-based notification protocol.
The final config should look like:
mysql_data_source_name: "[...]"
admin_config: <
listen_address: "192.168.1.83:4444"
>
notification_use_http_notifier: true
Next, configure a TCP/IP load balancer balancing traffic to the port specified in admin_config.listen_address on the respective admin server instances.
To make the GRR server use the distributed admin server setup, edit the config file /etc/grr/server.local.yaml. If the load balancer listens on 1.2.3.4:4444, set the following config option:
Server.fleetspeak_server: 1.2.3.4:4444
Developing GRR
Setting Up a Development Environment
NOTE: This article describes setting up a GRR development environment on Ubuntu 18.04 (Bionic) or Debian 10 (buster). Most of the instructions below are applicable to other Linux distributions as well as to Mac OS X, although some details may differ from platform to platform.
Step 1. Checkout GRR sources.
Navigate to the root GRR directory (or clone the repository if you have not done it yet):
git clone https://github.com/google/grr
cd grr/
Step 2. Set up Virtual Python environment.
Make sure that you have venv installed. It should be available in the repositories of all popular Linux distributions. For example, to install it on Ubuntu-based distros, run:
sudo apt install python3-venv
To create a virtual environment you execute the python3 -m venv $DIR command, where $DIR is the directory where you want it to be placed. The rest of the manual will assume that the environment is created in ~/grr_venv like this:
python3 -m venv ~/grr_venv
After creating the environment you have to activate it:
source ~/grr_venv/bin/activate
It is also advised to make sure that we are running a recent version of pip and have the latest versions of a few related packages:
pip install --upgrade pip wheel setuptools six
For more information about creating and managing your virtual environments refer to the venv documentation.
Step 3. Set up Node.js environment.
Because GRR offers a user interface component, it also needs some JavaScript code to be built with Node.js. Fortunately, this can be done with pip inside our virtual environment so that your system remains uncluttered.
To install Node.js, execute:
pip install nodeenv
nodeenv -p --prebuilt --node=12.14.1
Because the nodeenv command modifies our virtual environment, we also need to reinitialize it with:
source ~/grr_venv/bin/activate
Step 4. Install GRR dependencies.
GRR depends on a few libraries that have to be installed on a system. Run this command to ensure you have them:
sudo apt-get install libssl-dev python3-dev python-pip wget openjdk-8-jdk zip dh-systemd libmysqlclient-dev
Depending on the distribution, the set of packages providing these functionalities can differ a bit; please refer to your distribution's package repository. Useful replacement suggestions are also often provided by package managers themselves.
Step 5. Install GRR packages.
GRR is split into multiple packages. For development we recommend installing all components. Assuming that you are in the root GRR directory, run the following commands:
NOTE: the order of the commands is important. Running these commands in the wrong order will lead to PIP fetching prebuilt GRR packages from PyPI. In that case, editing code in the checked-out GRR repository won’t have any effect.
pip install -e ./grr/proto && \
pip install -e ./grr/core && \
pip install -e ./grr/client && \
pip install -e ./api_client/python && \
pip install -e ./grr/client_builder && \
pip install -e ./grr/server/[mysqldatastore] && \
pip install -e ./colab && \
pip install -e ./grr/test
The -e (or --editable) flag passed to pip makes sure that the packages are installed in “development” mode and any changes you make in your working directory are directly reflected in your virtual environment; no reinstalling is required.
If running pip install -e ./grr/server/[mysqldatastore] results in the following error: OSError: mysql_config not found, it means that the MySQL/MariaDB installation from Step 4 was incorrect. Make sure you are using the correct package, for example libmariadbclient-dev-compat instead of libmariadb-dev.
If running pip install -e ./grr/server/[mysqldatastore] results in an error caused by mysqlclient, such as _mysql.c:1911:41: error: 'MYSQL' has no member named 'reconnect', try omitting [mysqldatastore] and installing it manually with pip install mysqlclient.
Step 6. Install prebuilt GRR client templates.
If you want to repack GRR clients, you’ll need the prebuilt GRR client templates package, which contains templates for Linux/Mac/Windows. It should be installed using the following command:
pip install --no-cache-dir -f https://storage.googleapis.com/releases.grr-response.com/index.html grr-response-templates
Step 7. Install MySQL.
The GRR server components need to connect to a running MySQL instance in order to run:
sudo apt install -y mysql-server
Step 8. Initialize GRR.
To initialize the development GRR setup, run:
grr_config_updater initialize
Step 9. Run GRR components.
You can use one of the commands below to run specific GRR server components.
Fleetspeak server: fleetspeak_server
Admin UI: grr_admin_ui
Frontend: grr_frontend
Worker: grr_worker
You can also run a local GRR client by executing the fleetspeak_client command.
Adding the --verbose flag to any of these commands forces the GRR components to output debug information to stderr.
Testing
Now you are ready to start GRR development. To make sure that everything is set up correctly, follow the testing guide.
Running the tests
For running tests GRR uses the pytest framework.
Prerequisites
Make sure you have correctly set up the development environment, especially the part about installing the grr-response-test package.
This setup is sufficient for running most of the tests. However, GRR also has a UI component written in Angular. For testing that part we use Selenium and ChromeDriver, which need to be installed first. You can skip this part if you do not need to execute UI tests, but we strongly recommend running them as well.
On Debian-based distros simply install the chromium-driver package:
sudo apt install chromium-driver
If there is no chromium-driver available in your repositories, you may try installing the Chromium browser and then downloading the latest chromium-driver binary from the official website. After downloading, unpack it somewhere and add it to your $PATH or just move it to /usr/bin.
Running the whole test suite
To run all the tests, navigate to the root GRR directory and simply execute:
pytest
This will automatically discover and execute all test cases.
Running tests in parallel
To use pytest to run tests in parallel, install the pytest-xdist plugin
pip install pytest-xdist
and run
pytest -n <number of cores to use>
Running the tests selectively
Running all the tests is reasonable when you want to test everything before publishing your code, but it is not feasible during development. Usually you want to execute only the tests in a particular directory, like so:
pytest grr/server/aff4_objects
Or just a particular file:
pytest grr/server/aff4_objects/filestore_test.py
Or just a particular test class:
pytest grr/server/aff4_objects/filestore_test.py::HashFileStoreTest
Or even just a single test method:
pytest grr/server/aff4_objects/filestore_test.py::HashFileStoreTest::testListHashes
Ignoring test cases
Some kinds of tests are particularly slow to run. For example, all UI tests are based on running a real web browser instance and simulating its actions, which is painfully sluggish and can be very annoying if we want to test non-UI code.
In this case we can skip all tests in a particular directory using the --ignore flag, e.g.:
pytest --ignore=grr/gui/selenium_tests
This will run all the tests except the Selenium ones.
Benchmarks
Benchmarks do not really test code correctness, so there is no point in running them every time we want to publish our code. This is why the test suite will not run them by default. However, sometimes they can be useful for testing the performance and sanity of our system.
In order to run the test suite including the benchmarks, pass the --benchmark option to the test runner:
pytest --benchmark
Debugging
If our tests are failing and we need to fix our code, the Python Debugger can come in handy.
If you run pytest with the --pdb flag, then upon a failure the program execution will be halted and you will be dropped into the PDB shell, where you can investigate the problem in the environment in which it occurred.
If you set breakpoints in your code manually using pdb.set_trace(), you will notice weird behaviour when running your tests. This is because pytest intercepts the standard input and output writes, breaking the PDB shell. To deal with this, simply run the tests with the -s flag - it will prevent pytest from doing that.
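For example, a test with a manual breakpoint would look like this (a toy example; run it with pytest -s so that the PDB shell works):
# Toy example: running `pytest -s this_test.py` drops into PDB at set_trace().
import pdb

def test_hash_lookup():
  hashes = {"sha256": "abc123"}
  pdb.set_trace()  # Execution pauses here; inspect `hashes` interactively.
  assert "sha256" in hashes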
More information
The functionality outlined in this guide is just the tip of the iceberg of pytest’s capabilities. For more information consult pytest’s man page, check pytest --help or visit the pytest homepage.
Contributing Code
GRR is a somewhat complex piece of code. While this complexity is necessary to provide the scale and performance characteristics we need, it makes it more difficult to get involved and understand the code base. So just be aware that making significant changes in the core will be difficult if you don’t have a software engineering background, or solid experience with Python that includes use of things like pdb and profiling.
That said, we want your contribution! You don’t need to be a veteran pythonista to contribute artifacts or parsers. But whatever your experience, we strongly recommend reading our documentation and starting somewhere simple before embarking on core changes. In particular we recommend these as good starting points:
Build a standalone console script to perform the actions you want (see the sketch after this list). A standalone console script won’t benefit from being able to be run as a Collector or a Hunt, but it will get you familiar with the API, and an experienced developer can convert it into a fully fledged flow afterwards.
Add to Artifacts or an existing flow. Many flows could do with work to detect more things or support additional platforms.
Add a new parser to parse a new filetype, e.g. if you have a different Anti-virus or HIDS log you want to parse.
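As an illustration of the first suggestion, here is a minimal standalone console script using the GRR API client library; the endpoint, credentials and search query are placeholders for your own deployment.
# Example standalone console script: list matching clients via the GRR API.
# Endpoint, credentials and the search query are placeholders.
from grr_api_client import api

def main():
  grrapi = api.InitHttp(api_endpoint="http://localhost:8000",
                        auth=("admin", "<password>"))
  for client in grrapi.SearchClients("host:myworkstation"):
    print(client.client_id, client.data.os_info.fqdn)

if __name__ == "__main__":
  main()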
Contributor License Agreement
GRR is an open source project released under the Apache License and you should feel free to use it in any way compatible with this. However, in order to accept changes into the GRR mainline repository we must ask that you keep a signed Google Contributor License Agreement on file.
Style
The Chromium and Plaso projects have some good general advice about code contributions that is worth reading. In particular, make sure you’re communicating via the dev list before you get too far into a feature or bug, it’s possible we’re writing something similar or have already fixed the bug.
Code needs to conform to the Google Python Style Guide. Note that despite what the guide says, we use two-space indentation, and MixedCase instead of lower_underscore for function names since this is the internal standard. Two-space indentation also applies to CSS.
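For example, a function following these conventions would look like this (a toy illustration):
# Toy illustration of the style above: two-space indents, MixedCase names.
def ListRunningProcesses(processes):
  """Returns the names of processes whose status is "running"."""
  results = []
  for process in processes:
    if process.get("status") == "running":
      results.append(process["name"])
  return results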
Setup
We use the github fork and pull review process to review all contributions.
First, fork the GRR repository by following the github instructions.
Now that you have a github.com/your-username/grr repository:
# Make a new branch for the bug/feature
$ git checkout -b my_shiny_feature
# Make your changes, add any new files
$ git add newmodule.py newmodule_test.py
When you’re ready for review, sync your branch with upstream:
$ git fetch upstream
$ git merge upstream/master
# Fix any conflicts and commit your changes
$ git commit -a
$ git push origin HEAD
Use the GitHub Web UI to create and send the pull request. We’ll review the change.
# Make review changes
$ git commit -a
$ git push origin HEAD
Once we’re done with review we’ll commit the pull request.
Contributing Documentation
GRR’s documentation is stored on GitHub and is hosted on readthedocs.
Documentation repository structure
master branch of github.com/google/grr-doc contains the HEAD version of GRR documentation. Effectively, it has the documentation that will be included into the next GRR release.
Every time a release is made, we create a new documentation branch; e.g. there are branches such as v3.2.0, v3.2.1, etc.
NOTE: whenever a change is made in the master branch, it should also be applied to the latest release branch (unless the change is not backwards-compatible), because the latest release branch is what GRR users see on readthedocs.
Contributing small documentation changes
If a suggested change doesn’t include any new files, the easiest way to submit it is to edit the docs directly on GitHub. GitHub will then create a pull request for you, and one of the GRR team members can review it and merge it into the master branch and the latest release branch.
Setting up editing environment with Sphinx
Readthedocs.io uses Sphinx to render the documentation. In order to have a live preview of changes made to a local grr-doc clone, do the following:
mkdir ~/tmp/grr-doc
cd ~/tmp/grr-doc
virtualenv .
source bin/activate
pip install Sphinx sphinx-autobuild recommonmark sphinx_rtd_theme
git clone https://github.com/google/grr-doc
sphinx-autobuild grr-doc out
The code above should start the Sphinx documentation server at http://127.0.0.1:8000/. The server will monitor the documentation files for changes and update the preview instantly.
Using Visual Studio Code as an editing tool
GRR documentation should be written in Markdown, so you can write it using any text editor you prefer. However, Visual Studio Code has built-in support for Markdown and makes editing it really easy: you have a preview window in a separate pane that is updated as you type.
To give VSCode a try, do the following:
Download and install it from https://code.visualstudio.com/download.
Launch VSCode. Click on File->Open Folder. Point the dialog to checked-out grr-doc sources.
Click on any of the Markdown files. Then press Ctrl-K V (this means press Ctrl + K, then separately, after Ctrl + K is not pressed anymore, press V).
Now if you edit the Markdown file the preview pane is going to be updated automatically.
Implementation details
GRR Messages
On the wire, the client and server interchange messages. We term the messages sent from the server to the client requests, while messages sent from the client to the server are responses. Requests sent to the client ask the client to perform some action, for example ListDirectory. We term these actions client actions. A single request may elicit multiple responses.
For example, the request message ListDirectory will elicit a response for each file in the directory (potentially thousands). Requests and responses are tracked using an incremental request_id and response_id.
In order to indicate when all responses have been sent for a particular request, the client sends a special STATUS message as the last response. In the event of errors, the message contains details of the error (including backtrace). If the action completed successfully, an OK status is returned.
Messages are encoded as GrrMessage protobufs with the following fields:
session_id: A unique integer for all packets generated by this flow.
name: The name of the Action to be called on the client (see below).
args: A serialized protobuf which will be interpreted by the Action.
request_id: An incrementing number of this request (see below).
response_id: An incrementing number of the response (see below).
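As an illustration, the bookkeeping described above can be sketched with the GrrMessage proto from grr_response_proto; only the fields listed above are used, and the session id value is a placeholder.
# Sketch of the request/response bookkeeping using the GrrMessage proto.
from grr_response_proto import jobs_pb2

request = jobs_pb2.GrrMessage(
    session_id="F:0123ABCD",  # Placeholder; all packets of one flow share it.
    name="ListDirectory",     # Client action to run.
    request_id=1)

# The client would answer with responses 1..N for request 1, followed by a
# final STATUS message signalling that all responses have been sent.
responses = [
    jobs_pb2.GrrMessage(session_id=request.session_id,
                        request_id=request.request_id,
                        response_id=i)
    for i in range(1, 4)
]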
Figure 2 illustrates a typical sequence of messages. Request 1 was sent from the server to the client, and elicited 3 responses, in addition to a status message.
When the server sends the client messages, the messages are tagged in the data store with a lease time. If the client does not reply to these requests within the lease time, the requests become available for lease again. This is designed for the case of the client rebooting or losing connectivity part way through running the action. In this case, the request is re-transmitted and the action is run again.
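A simplified, runnable sketch of this leasing behaviour (illustrative only, not GRR's actual implementation):
# Illustrative sketch of request leasing: a request becomes available for
# lease again once its lease has expired (e.g. client rebooted mid-action).
import time

LEASE_SECONDS = 600

def LeaseRequests(queue, now=None):
  now = time.time() if now is None else now
  leased = []
  for request in queue:
    if request.get("leased_until", 0) <= now:  # Never leased, or expired.
      request["leased_until"] = now + LEASE_SECONDS
      leased.append(request)
  return leased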
Client Path Specifications
One of the nice things about the GRR client is the ability to nest file readers. For example, we can read files inside an image using the sleuthkit and also directly through the API. We can read registry keys using REGFI from raw registry files as well as using the API. The way this is implemented is using a pathspec.
Pathspecs
The GRR client has a number of drivers to virtualize access to different objects, creating a Virtual File System (VFS) abstraction. These are called VFS Handlers and they provide typical file-like operations (e.g. read, seek, tell and stat). It is possible to recursively apply different drivers in the correct order to arrive at a certain file like object. In order to specify how drivers should be applied we use Path Specifications or pathspecs.
Each VFS handler is constructed from a previous handler and a pathspec. The pathspec is just a collection of arguments which make sense to the specific VFS handler. The type of the handler is carried by the pathtype parameter:
pathtype: OS Implemented by the grr.client.vfs_handlers.file module is a VFS Handler for accessing files through the normal operating system APIs.
pathtype: TSK Implemented by the grr.client.vfs_handlers.sleuthkit module is a VFS Handler for accessing files through the sleuthkit. This handler depends on being passed a raw file-like object, which is interpreted as the raw device.
A pathspec is a list of components. Each component specifies a way to derive a new python file-like object from the previous file-like object. For example, imagine we have the following pathspec:
path: /dev/sda1
pathtype: OS
nested_path: {
path: /windows/notepad.exe
pathtype: TSK
}
This opens the raw device /dev/sda1 using the OS driver. The TSK driver is then given the previous file-like object and the nested pathspec, instructing it to open the /windows/notepad.exe file after parsing the filesystem in the previous step.
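Server-side code builds such nested pathspecs programmatically; a sketch using the rdf PathSpec wrapper (assuming a GRR source checkout on the import path):
# Sketch: building the nested pathspec above with the rdf PathSpec wrapper.
from grr_response_core.lib.rdfvalues import paths as rdf_paths

pathspec = rdf_paths.PathSpec(path="/dev/sda1",
                              pathtype=rdf_paths.PathSpec.PathType.OS)
pathspec.Append(path="/windows/notepad.exe",
                pathtype=rdf_paths.PathSpec.PathType.TSK)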
This can get more involved, for example:
path: /dev/sda1
pathtype: OS
nested_path: {
path: /windows/system32/config/system
pathtype: TSK
nested_path: {
path: SOFTWARE/MICROSOFT/WINDOWS/
pathtype: REGISTRY
}
}
This means: use TSK to open the raw registry file and then REGFI to read the key from it (note that this is needed because you generally can't read the registry file while the system is running).
Pathspec transformations
The pathspec tells the client exactly how to open the required file, by nesting drivers on the client. Generally, however, the server has no prior knowledge of files on the client, therefore the client needs to transform the server request to the pathspec that makes sense for the client. The following are the transformations which are applied to the pathspec by the client.
File Case Correction and path separator correction
Some filesystems are not case sensitive (e.g. NTFS). However, they do preserve file case. This means that pathspecs differing only in filename case will access the same file on disk. The file itself, however, does have a well-defined and unchanging casing. The client can correct file casing, e.g.:
path: c:\documents and settings\
pathtype: OS
Is corrected to the normalized form:
path: /c/Documents and Settings/
pathtype: OS
Filesystem mount point conversions
Sometimes the server needs to read a particular file from the raw disk using TSK. However, the server generally does not know where the file physically exists without finding out the mounted devices and their mount points. This mapping can only be done on the client at request runtime. When the top level pathtype is TSK, the client knows that the server intends to read the file through the raw interface, and therefore converts the pathspec to the correct form using the mount point information. For example:
path: /home/user/hello.txt
pathtype: TSK
Is converted to:
path: /dev/sda2
pathtype: OS
nested_path: {
path: /user/hello.txt
pathtype: TSK
}
UUIDs versus “classical” device nodes
External disks can easily get re-ordered at start time, so that path specifiers containing /dev/sd? etc. may not be valid anymore after the last reboot. For that reason the client will typically replace /dev/sda2 or similar strings with /dev/disk/by-uuid/[UUID] on Linux or other constructions (e.g. pathtype: uuid) for all clients.
Life of a client pathspec request
The figure below illustrates a typical request to the client - in this case to list a directory:
A ListDirectory Flow is called with a pathspec of:
path: c:\docume~1\bob\ pathtype: OS
The flow sends a request for the client action ListDirectory with the provided pathspec.
Client calls VFSOpen(pathspec) which opens the file, and corrects the pathspec to:
path: c:\Documents and Settings\Bob\ pathtype: OS
Client returns StatResponse for this directory with the corrected pathspec.
The client response maps to a defined location in the server data store. In this case a mapping is created for files read through the OS APIs using the OS path type. Note that the path to access the created files contains the case-corrected client paths:
/fs/os/c/Documents and Settings/Bob
The server now creates this object, and stores the corrected pathspec as an attribute.
Client pathspec conversions can be expensive, so the next time the server uses this object for a client request, the server can simply return the corrected pathspec to the client. The corrected pathspec has the LITERAL option enabled, which prevents the client from applying any corrections.
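Expressed as a pathspec, the stored, corrected form might look like the sketch below; treat the exact path_options / CASE_LITERAL spelling as an assumption for your GRR version.
# Sketch of a corrected pathspec as the server would hand it back; the
# CASE_LITERAL option tells the client to skip any further corrections.
from grr_response_core.lib.rdfvalues import paths as rdf_paths

corrected = rdf_paths.PathSpec(
    path="/c/Documents and Settings/Bob",
    pathtype=rdf_paths.PathSpec.PathType.OS,
    path_options=rdf_paths.PathSpec.Options.CASE_LITERAL)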
Foreman
The Foreman is a client scheduling service. At regular intervals (by default every 30 minutes) the client will report in, asking if there are Foreman actions for it. At the time of this check-in, the Foreman is queried to decide if there are any jobs that match the host; if there are, appropriate flows are created for the client. This mechanism is generally used by Hunts to schedule flows on a large number of clients.
The Foreman maintains a list of rules; if a rule matches a client when it checks in, the specified flow is executed on the client. The rules work against client attributes, allowing for things like “All XP machines” or “All OSX machines installed after 01.01.2011”. For more information see the section about hunt rules.
The foreman check-in request is a special request made by the client. When the server sees this request it does the following:
Determines how long since this client did a Foreman check-in.
Determines the set of rules that are non-expired and haven’t previously been checked by the client.
Matches those rules against the client’s attributes to determine if there is a match.
If there is a match, run the associated flow.
The reason for the separate Foreman check-in request is that the rule matching can be expensive when you have a lot of clients, so making these check-ins less frequent saves a lot of processing. A sketch of the matching logic is shown below.
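The following runnable sketch mirrors that logic; all class and attribute names are illustrative, not GRR's actual implementation.
# Runnable sketch of Foreman rule matching; names are illustrative only.
import time
from dataclasses import dataclass, field

@dataclass
class ForemanRule:
  flow_name: str
  created: float                # When the rule was added.
  expires: float                # Rule is dead after this time.
  required_labels: set = field(default_factory=set)

@dataclass
class Client:
  labels: set
  last_foreman_checkin: float = 0.0

def ForemanCheckin(client, rules, start_flow):
  now = time.time()
  for rule in rules:
    if rule.expires < now:
      continue  # Expired rule.
    if rule.created <= client.last_foreman_checkin:
      continue  # Already evaluated during an earlier check-in.
    if rule.required_labels <= client.labels:  # Attributes match.
      start_flow(client, rule.flow_name)
  client.last_foreman_checkin = now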
Authorization and Auditing
GRR contains an audit framework that logs all requests coming through the GRR API (the admin UI also uses said API as the backend so all requests through the UI are logged too). The username that is logged is generated by the chosen WebAuthManager (in practice often the BasicWebAuthManager which uses basic http authentication).
Reading the Audit Log
The audit log is currently not surfaced in any UI. The following code snippet illustrates how to read the audit log through the GRR console:
# rdfvalue and data_store are assumed to be available, as in the GRR console.
end_time = rdfvalue.RDFDatetime.Now()
start_time = end_time - rdfvalue.Duration("52w")
result = data_store.REL_DB.ReadAPIAuditEntries(min_timestamp=start_time,
                                               max_timestamp=end_time)
print(result[0])
message APIAuditEntry {
http_request_path : u'/api/clients/C.071d6e8cc079bf11/flows/33D85D32?'
response_code : OK
router_method_name : u'GetFlow'
timestamp : RDFDatetime:
2019-05-13 14:54:19
username : u'amoser'
}
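The same call can be narrowed down further; for instance, recent GRR versions let you filter by user (treat the exact parameter name as an assumption for your release):
# Assumed: ReadAPIAuditEntries also accepts a username filter.
entries = data_store.REL_DB.ReadAPIAuditEntries(username="amoser")
for entry in entries:
  print(entry.router_method_name, entry.http_request_path)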
Release notes
Each release of GRR brings some significant changes as the project’s code is moving quickly. This page tries to identify the key changes and issues to watch for when upgrading from one version to another.
Note that since the 3.2.0.1 release, client and server releases are no longer done separately (server debs include client templates).
Remember to back up your data-store and configs before upgrading GRR to a new version.
Server
3.4.6.0 (May 30 2022)
Regular release for Q2 2022.
UIv2 “files and flows” Launch
Search and collect files, browse collected files (former VFS)
List directories, processes and network connections
Scan and dump memory
Launch binary, execute python hacks
Collect timeline, browser history and artifacts
UIv2 Canary feature: create hunts from existing flows!
New feature: Read low level flow and client action. Use it to read raw data from disk.
Small bug fixes and refactors.
Note: This will be the last release where GRR supports the Legacy Communication protocol. From the next release onwards, expect GRR to communicate only via Fleetspeak (more details to follow before the next release).
Note: This will be the last release based on Python 3.6.
3.4.5.1 (August 19 2021)
Mid-quarter release for Q3 2021.
Sandboxing in the GRR client: TSK, libfsntfs and YARA libraries are now run in a separate, unprivileged process. This uses an unprivileged user and network/IPC namespaces on Linux, an unprivileged user in combination with sandbox_init on macOS, and AppContainer Isolation on Windows.
Fleetspeak, the next generation communication framework, is now enabled by default. It’s still possible to enable the deprecated, legacy communication framework via grr_config_updater.
New MSI installer for Windows clients. The old, self-extracting ZIP exe is now deprecated.
ListNamedPipes: New flow for named pipe collection on Windows.
Ongoing progress on the UIv2: an early version of the next generation UI can be checked via the <admin ui address>/v2 URL.
3.4.3.1 (May 19 2021)
Regular release for Q2 2021.
Prometheus/Grafana support.
New flow: “Collect large file” that uploads a given path to Google Cloud Storage using a given signed URL.
New output plugin: Elasticsearch (thanks micrictor@ for the contribution!)
Fleetspeak-enabled setups now use streaming connections by default (agents keep connections open, thus reducing latency from ~5 minutes to under 1 minute).
API changes:
Introduced KillFleetspeak, RestartFleetspeakGrrService, DeleteFleetspeakPendingMessages, GetFleetspeakPendingMessages, GetFleetspeakPendingMessageCount API methods to provide Fleetspeak-specific capabilities for Fleetspeak-enabled clients.
Introduced ListParsedFlowResults and ListFlowApplicableParsers API methods for on-demand artifacts parsing.
Bugfixes and various minor enhancements:
Osquery doesn’t report results if the client returns an empty dataset.
Additional options can be specified when downloading timelines.
Introduced Hunt.default_client_rate configuration option.
Added Cron.interrogate_client_rate config option (defaults to 50).
Ongoing work on the UIv2: a very early version of the next-gen UI can be checked via the <admin ui address>/v2 URL.
Note: this is the last release where Fleetspeak is considered an experimental feature. Next release is going to use Fleetspeak as a default option (while still keeping the legacy communication protocol as a deprecated option).
3.4.2.4 (October 14 2020)
Minor bug-fix release on top of v3.4.2.3.
grr_config_updater: Fixed a bug where the fleetspeak MySQL password setting was not applied correctly. Fleetspeak was effectively always using an empty MySQL password when connecting to the database.
3.4.2.3 (October 01 2020)
Quarterly release for Q3 2020.
GRR API:
Reference documentation is now available.
An OpenAPI spec is now available.
Support for starting file collection hunts involving multiple explicitly specified hosts is now available.
ArtifactCollectorFlowArgs, ArtifactFilesDownloaderFlowArgs: use_tsk is now deprecated in favor of use_raw_filesystem_access. use_tsk will be kept for compatibility until 2021-04-01; existing users should migrate to using use_raw_filesystem_access.
The NTFS virtual file system handler using libfsntfs is now the default for raw filesystem access on Windows. The TSK handler is being deprecated.
Experimental bundled fleetspeak (next generation communication framework) is now available on all platforms. This can be enabled using grr_config_updater.
Timeline collection flow now also collects the file creation timestamp on all platforms.
Flow and Hunt IDs now have a length of 16 hexadecimal digits.
A workaround for a bug in find_library affecting macOS 11 has been added.
Bugfixes and various enhancements.
3.4.2.0 (July 06 2020)
Quarterly release for Q2 2020.
New flows:
TimelineFlow for collecting a snapshot of an entire filesystem.
CollectEfiHashes and DumpEfiImage flows for macOS investigations.
New SplunkOutputPlugin to export Flow/Hunt results to Splunk.
New NTFS virtual file system handler for file system parsing using libfsntfs.
Support for custom Email addresses for GRR users. #555.
Support for pre-filling the approval reason text field by appending ?reason=<pre-filled reason> to GRR URLs.
API changes:
All GRR Protocol Buffers messages now have proper package declarations.
New method GetVersion.
API shell now validates the GRR server version and fails if the API client is too old. (This can be disabled using the --no-check-version flag.)
Experimental bundled fleetspeak (next generation communication framework) on Debian server / clients. This can be enabled using grr_config_updater.
Bugfixes and various enhancements.
3.4.0.1 (December 18 2019)
This is the first fully Python 3-based release.
The GRR server debian package now has to be installed on Ubuntu 18.
UpdateClient flow fixed for Ubuntu clients (see https://github.com/google/grr/issues/720).
A number of bugfixes and minor enhancements for YARA memory scanning.
3.3.0.8 (October 9 2019)
NOTE: Last Python 2-based release: further releases are expected to be Python 3-only.
Minor bug-fix release.
A number of REL_DB issues discovered since the last release fixed.
YARA scanning fixes and enhancements.
Using latest (in most cases) versions of dependencies (applies to both Python and JS).
3.3.0.4 (July 3 2019)
Minor bug-fix release.
Resolved an issue with approvals #719.
Resolved an issue with AdminUI failing when notification button was clicked, but no unread notifications were present.
3.3.0.3 (July 1 2019)
Minor bug-fix release.
Resolved an issue with client labels removal #716.
3.3.0.2 (June 28 2019)
Just a small bug-fix release.
3.3.0.0 (May 22 2019)
[IMPORTANT] This is the first GRR release that works with the new relational data model. The legacy, AFF4-based data store is still available in this release but new GRR setups are encouraged to switch to the relational data store. Please see the documentation for details.
Rekall memory forensics support has been dropped from GRR. GRR still offers process memory acquisition and process memory scanning using the Yara library.
This release contains basic osquery integration for clients that have osquery preinstalled.
Custom monitoring code was removed in favor of prometheus.io integration.
Tons of small (and not-so-small) bug fixes and code health improvements.
API updates
ListFlowOutputPluginsLogs, ListFlowOutputPluginErrors, ListHuntOutputPluginLogs and ListHuntOutputPluginErrors API calls now always report batch_index and batch_size as 0 and no longer include PluginDescriptor into the reply.
ListHuntCrashes method no longer accepts the “filter” argument.
ListHunts no longer fills the “total_count” attribute of ApiListHuntsResult.
The ApiHunt protobuf no longer has an expires field. Instead, a duration field has been added which can be used to calculate the expiration date as start_time + duration. Note that if the hunt hasn’t been started, it does not have a start_time and, in consequence, it does not have an expiry time either.
The ApiModifyHuntArgs protobuf no longer has an expires field. Instead, a duration field has been added.
The artifact field of ApiUploadArtifactArgs no longer accepts an arbitrary byte stream but only proper strings. Since this field ought to be the artifact description in the YAML format and YAML is required to be UTF-8 encoded, it makes no sense to accept non-unicode objects.
3.2.4.6 (Dec 20 2018)
This is just a small off-schedule release with some bugfixes.
Fixed a unicode bug that prevented Windows clients with non-latin usernames from checking in.
Fixed a bug in the grr_config_updater generate_keys command.
3.2.4.5 (Dec 17 2018)
[IMPORTANT] This release is the last GRR release to work on a legacy AFF4-based datastore. Next generation datastore will also work on top of MySQL but will have a completely different schema, meaning that you’ll lose historical data with the next GRR upgrade (unfortunately we can’t provide data migration tools).
[IMPORTANT] This release is the last GRR release to still include deprecated Rekall support. It will be removed completely with the next GRR release.
Dropped support for SQLite datastores. Current users with SQLite deployments will have to migrate to MySQL when upgrading. Please note that GRR does not offer any functionality for migrating data from SQLite databases to MySQL. The setup process for MySQL datastores has been revamped, and is easier and more intuitive now. Users will have the option of aborting installation if MySQL is not installed on a target system, and GRR will now proactively and transparently attempt to connect to a MySQL instance during initialization.
Improvements to the Python package structure.
A lot of Python 3-compatibility improvements.
Implementation of client-side artifact collector that tremendously improves latency of artifact collection.
Tons of small (and not-so-small) bug fixes and code health improvements.
API updates
Renamed the task_eta field of the ApiClientActionRequest object to leased_until.
Got rid of ListCronJobFlows and GetCronJobFlow in favor of ListCronJobRuns and GetCronJobRun. ListCronJobRuns/GetCronJobRun return ApiCronJobRun protos instead of the ApiFlow returned by the deleted ListCronJobFlows/GetCronJobFlow.
Changed CreateCronJob to accept the newly introduced ApiCreateCronJobArgs instead of an ApiCronJob. ApiCreateCronJobArgs only allows creating hunt-based cron jobs.
All ApiFlowRequest responses no longer fill the AFF4-specific request_state.request field. Similarly, the task_id and payload fields in ApiFlowRequest.responses objects are no longer populated starting from this release.
Flow log results returned by ApiListFlowLogsHandler no longer contain the name of the flow the logs are for.
The ListPendingGlobalNotifications and DeletePendingGlobalNotification API methods have been deleted, since GRR no longer supports global notifications. The corresponding protos ApiListPendingGlobalNotificationsResult and ApiDeletePendingGlobalNotificationArgs have been deprecated.
3.2.3.2 (Jun 28 2018)
This is an off-schedule release that fixes a client-repacking bug introduced in v3.2.3.0.
3.2.3.0 (Jun 25 2018)
We are planning to deprecate the SQLite datastore in favor of MySQL. Support for SQLite will likely get dropped in the coming few releases.
Significant progress has been made towards reimplementing GRR’s database model to use a relational schema. We expect to launch the new model - replacing the old one - in the next few months. The new model will not have a SQLite implementation (which is why we are dropping SQLite).
UI improvements and bugfixes.
Previously, when building and repacking client templates, config options in the ‘ClientBuilder’ context would leak into client config files. This will no longer happen, starting from this release. Config files that define client options inside a ‘ClientBuilder’ context might need fixing.
API updates:
The GetGrrBinary API method now returns an ApiGrrBinary object instead of a binary stream. The old behavior is preserved in a newly introduced GetGrrBinaryBlob method.
Additional fields were added to the Api{Hunt,CronJob,Client}Approval protos: requestor and email_message_id.
The ApiNotification.Type enum has been changed: 0 now means ‘UNSET’ (previously it meant ‘DISCOVERY’, but the ‘DISCOVERY’ type is now deprecated).
ApiNotificationDiscoveryReference has been renamed to ApiNotificationClientReference. ApiNotificationFileDownloadReference has been deprecated.
In the ApiNotificationCronReference and ApiNotificationCronJobApprovalReference protos, the cron_job_urn field has been replaced with cron_job_id.
In the ApiNotificationHuntReference proto, the hunt_urn field has been replaced with hunt_id.
3.2.2.0 (Mar 12 2018)
As of this release, GRR’s legacy asciidoc documentation (previously hosted on github) has been deleted. https://grr-doc.readthedocs.io is now GRR’s only documentation repository.
GRR’s protobufs have been refactored into their own pip package, grr-response-proto, which is now a dependency of grr-response-core.
User objects now get automatically created when using RemoteUserWebAuthManager (see Running GRR behind Nginx). Users who log in for the first time no longer need to do anything special to make notifications work.
grr-response-client’s pip dependency for Rekall has been updated to version 1.7.2rc1.
Functionality has been added to allow showing predefined warnings in the Admin UI when accessing clients with certain labels (AdminUI.client_warnings config option).
Scanning and dumping process memory with Yara has been made more resource-efficient, and now puts less strain on client machines.
A number of changes have been made to the GRR API, some of which are not backwards compatible with API client code expecting pre-3.2.2.0 responses and behavior:
hunt_pb2.ApiListHuntLogsResult now uses hunt_pb2.ApiHuntLog, a new proto, instead of jobs_pb2.Flowlog.
hunt_pb2.ApiListHuntErrorsResult now uses hunt_pb2.ApiHuntError, a new proto, instead of jobs_pb2.HuntError.
flow_pb2.ApiListFlowLogsResult now uses flow_pb2.ApiFlowLog, a new proto, instead of jobs_pb2.Flowlog.
The default age attribute has been removed from API responses.
cron_pb2.ApiCronJob protos now have a cron_job_id field.
The GetClientVersions API call (/api/clients/<client_id>/versions) no longer includes metadata (last ping, last clock, last boot time, last crash time).
3.2.1.1 (Dec 6 2017)
The HTTPDatastore has been removed from GRR.
GRR now supports MySQL out of the box (when installed from the server deb).
GRR’s documentation has been revised and moved to https://grr-doc.readthedocs.io. The old asciidoc documentation (currently hosted on github) will be deleted before the next release.
SSO support has been added to the Admin UI, with the RemoteUserWebAuthManager.
Auto-refresh functionality for flows and hunts has been added to the GRR UI.
Support for per-client-average limits for hunts has been added.
GRR now uses the pytest framework to run tests. grr_run_tests is deprecated, and will be removed before the next release.
Support for scanning and dumping process memory using Yara has been added.
The SearchFileContent flow has been deprecated, in favor of equivalent FileFinder functionality.
zip/tar.gz archives downloaded from GRR will no longer use symlinks to avoid duplication of files - this makes them easier to work with.
3.2.0.1 (Sep 5 2017)
Starting with 3.2.0.1, we plan on releasing more frequently, since we have automated a large part of the release process. The recommended way of installing this version of GRR is with the server debian package, which includes client templates for all supported platforms. PIP packages, and a Docker image for the release will be made available at a later date.
IMPORTANT Before upgrading, make sure to back up your data store. This release has changes that may be incompatible with older versions of the data store. Also make copies of your configs - for reference, and in case you want to roll back the upgrade.
WARN Rekall is disabled by default. ‘Rekall.enabled: True’ line should be added to GRR configuration file in order for Rekall to work. There are known stability issues in Rekall and winpmem driver. Disabling it by default makes its ‘in dev/experimental’ status more explicit.
WARN Components are deprecated. The GRR client is now monolithic, there’s no need to upload or update components anymore. NOTE: The new GRR server will only be able to run Rekall-based flows (AnalyzeClientMemory, MemoryCollector) with the latest GRR client, since it expects the client not to use components.
WARN Rekall flows have been moved to ‘DEBUG’ category. They won’t be visible unless users enable ‘Mode: DEBUG’ in the settings menu (accessible via the settings button in top right corner of GRR AdminUI).
WARN A number of backwards-incompatible datastore schema changes were made. The nature of these changes is such that easy data migration between releases is not possible. Data loss of flows and hunts is possible. Please back up your current system if this data is critical.
WARN A number of deprecated configuration options were removed. If GRR fails to start after the upgrade, please check the server logs for ‘Unknown configuration option’ messages.
WARN Global Flows are no longer supported.
INFO Bugfixes, additional tests and code health work in all parts of the system.
INFO HTTP API now covers 100% of GRR functionality. Python API client library significantly extended. API call routers implemented, allowing for flexible authorization.
INFO ‘Download as’ functionality added to flows and hunts. Supported formats are CSV, YAML and SQLite.
INFO Build process for clients and server is significantly improved. Client templates for all supported platforms, as well as a release-ready server deb, get built on every github commit.
0.3.0-7 to 3.1.0.2
Note that we shifted our version string left to reduce confusion, see the documentation on GRR versions.
WARN Quite a few config options have changed. We recommend moving any customizations you have into your server.local.yaml, and using the main config file supplied with the new version. Some important specific ones are called out here.
WARN Config option Client.control_urls has been replaced with Client.server_urls. You will need to make sure your polling URL is set in the new variable.
WARN We have completely rewritten how memory collection and analysis works inside GRR. Rekall is now responsible for all driver loading, memory collection, and memory analysis. It also replaces the functionality of the Memory Collector flow. This means you must upgrade your clients to be able to do memory collection and analysis.
WARN Repacking of old client templates no longer works due to changes necessary for repacking different versions. You will need to use the new client templates that include a build.yaml file.
WARN Config option Cron.enabled_system_jobs (an allowlist) was replaced with Cron.disabled_system_jobs (a blocklist) to make adding new cronjobs easier. You will need to remove Cron.enabled_system_jobs and, if you made customizations here, translate any jobs you want to remain disabled to the new list.
WARN We changed the format for the Knowledge Base protocol buffer to remove a duplication of the users component, which was storing users information twice and complicating code. This means that all clients need a fresh interrogate run to populate the machine information. Until the machines run interrogate, the %%users.homedir%% and similar expansions won’t work.
WARN The compiler we use to build the Mac client template has become more aggressive with optimizations and now emits code optimized for specific processor versions. The Mac client will not run on a processor older than our build machine - an early 2011 Macbook with the Intel Sandy Bridge architecture. If you need to run GRR on, for example, Core 2 Duo machines, you will need a custom client built on one of those machines.
WARN We no longer support running the client on Windows XP, it’s difficult to make it work and we have no use for an XP client ourselves. See here for our OS support statement.
INFO Strict context checking was added for config files in this commit, which exposed a number of minor config typo bugs. If you get InvalidContextError on upgrade, you need to update your config to fix up the typos. The config/contexts.py file is the canonical reference.
Upgrade procedure:
Make backups of important data such as your configs and your database.
Upgrade to xenial
Clear out any old installations:
sudo rm -rf /usr/local/bin/grr_*; sudo rm -rf /usr/local/lib/python2.7/dist-packages/grr/
Install deb package
Backup then remove your /etc/grr/grr-server.yaml. Make sure any customizations you made are stored in
/etc/grr/server.local.yaml
.Run
grr_config_updater initialize
0.3.0-6 to 0.3.0-7
INFO We moved to using the open source binplist, rather than bundling it with GRR. If you are getting “ImportError: cannot import name binplist”, make sure you have installed the binplist package and deleted grr/parsers/binplist* to clean up any .pyc files that might have been created.
INFO If you have troubles with the rekall efilter library failing to import, try deleting all of your rekall-related .pyc’s and make sure you’re installing the version in requirements.txt. See this bug for full details.
0.3.0-5 to 0.3.0-6
This version bump was to keep in step with the client, which got a new version because of a memory collection bug.
WARN The artifact format changed in a way that is not backwards compatible. You’ll need to delete any artifacts you have in the artifact manager, either through the GUI or on the console with aff4.FACTORY.Delete("aff4:/artifact_store"), before you upgrade. If you forget, you should be able to use the console, or you can roll back to this commit before the artifact change.
WARN Keyword search is now based on an index that is built at weekly interrogate time, which means decommissioned clients won’t appear in search results. To populate all of your historical clients into the search index, use the code snippet on this commit.
0.3.0 to 0.3.0-5
WARN Rekall made some fundamental changes that mean rekall in the 0.3.0-5 server won’t work with rekall in 3.0.0.3 clients, so a client upgrade to 3.0.0.5 is required. Non-rekall GRR components should continue to work.
WARN The enroller component was removed, and is now handled by the worker. You will need to stop those jobs and delete the relevant upstart/init scripts.
INFO We now check that all config options are defined in the code and the server won’t start if they aren’t. When you upgrade it’s likely there is some old config that will need cleaning up. See the graveyard below for advice.
0.2.9 to 0.3.0
WARN After install of new deb you will need to restart the service manually.
INFO To get the new clients you will want to repack the new version with sudo grr_config_updater repack_clients. This should give you new Windows, Linux and OSX clients.
INFO Clients themselves will not automatically be upgraded but should continue to work.
0.2-8 to 0.2-9
WARN After installing the new deb you will need to restart the service manually.
WARN Users have changed from being managed in the config file to being managed in the datastore, which means your existing users will stop working. They are not migrated automatically; please re-add them with
sudo grr_config_updater add_user <user>
INFO To get the new clients you will want to repack the new version with
sudo grr_config_updater repack_clients
This should give you new Windows, Linux and OS X clients.
INFO Clients themselves will not be upgraded automatically, but should continue to work.
Client
3.2.0.1 -
Starting from 3.2.0.1, we use the same version string for the client and the server, since they are released at the same time.
3.0.7.1 to 3.1.0.0
Note that we skipped some numbers to make versioning simpler and reduce confusion, see the documentation on GRR versions.
WARN We changed Rekall to be an independently updatable component in the client, which is a backwards-incompatible change. You must upgrade your clients to 3.1.0.0 if you want to use memory capabilities with the 3.1.0 server.
WARN Our previous debian package added the GRR service using both upstart and init.d runlevels. This caused some confusion on Ubuntu systems using upstart. We detect and remove this problem automatically with the new version, but since it is a config file change you need to specify whether to install the new config or keep the old one, or you will get a config change prompt. New is preferred, with something like:
sudo apt-get -o Dpkg::Options::="--force-confnew" install grr
3.0.0.7 to 3.0.7.1
INFO 3.0.7.1 is the first version of GRR that will work on OS X El Capitan. The new OS X System Integrity Protection meant we had to shift our install location from /usr/lib/ to /usr/local/lib/.
INFO We changed our version numbering scheme for the client at this point to give us the ability to indicate multiple client versions that work with a given server version. So 3.0.7.* clients will all work with server 0.3.0-7.
3.0.0.6 to 3.0.0.7
WARN Linux and OS X clients prior to 3.0.0.7 were using /etc/grr/client.local.yaml as the local writeback location. For 3.0.0.7 this was changed to /etc/%(Client.name).local.yaml, where the default is /etc/grr.local.yaml. If you wish to preserve the same client IDs you need to use platform management tools to copy the old config into the new location on all clients before you upgrade. If you don’t do this the clients will simply re-enrol and get new client IDs automatically.
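As a minimal sketch of that pre-upgrade copy step (assuming the default paths above; adjust if you changed Client.name), something like the following could be pushed to each client via your management tooling:
# Preserve client IDs across the 3.0.0.7 writeback location change.
import os
import shutil

OLD_PATH = "/etc/grr/client.local.yaml"  # pre-3.0.0.7 writeback location
NEW_PATH = "/etc/grr.local.yaml"         # 3.0.0.7+ default location

# Copy only if the old writeback exists and the new one doesn't, so
# re-runs and already-migrated clients are left untouched.
if os.path.exists(OLD_PATH) and not os.path.exists(NEW_PATH):
    shutil.copy2(OLD_PATH, NEW_PATH)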
3.0.0.5 to 3.0.0.6
INFO 3.0.0.5 had a bug that broke memory collection, fixed in this commit. We also wrote a temporary server-side workaround, so upgrading isn’t mandatory. 3.0.0.5 clients should still work fine.
3.0.0.3 to 3.0.0.5
(We skipped a version number, there’s no 3.0.0.4)
3.0.0.2 to 3.0.0.3
WARN A change to OpenSSL required us to sign the CSRs generated during the enrollment process. This wasn’t previously necessary and provided no benefit for GRR, so we had gained some speed by not doing it. Since the new OpenSSL required it, we started signing the CSRs, but it means that the 3.0.0.3 server will reject any 3.0.0.2 clients that haven’t already enrolled (i.e. they will see an HTTP 406). Old 3.0.0.2 clients that have already enrolled and new 3.0.0.3 clients will work fine. This basically just means that you need to push out new clients at the same time as you upgrade the server.
Config Variable Graveyard
Sometimes config variables get renamed, sometimes removed. When this happens we’ll try to record it here, so users know whether local settings should be migrated or ignored.
You can verify your config with this (root is required to read the writeback config)
sudo PYTHONPATH=. python ./run_tests.py --test=BuildConfigTests.testAllConfigs
AdminUI.team_name: replaced by Email.signature
ClientBuilder.build_src_dir: unused, effectively duplicated ClientBuilder.source
ClientBuilder.executables_path: replaced by ClientBuilder.executables_dir
Client.config: unused. Now built from Client.config_hive and Client.config_key
Client.config_file_name: unused
Client.location: replaced by Client.control_urls
Client.package_maker_organization: replaced by ClientBuilder.package_maker_organization
Client.tempdir: replaced by Client.grr_tempdir and Client.tempdir_roots
Email.default_domain: essentially duplicated Logging.domain, use that instead.
Frontend.processes: unused
Nanny.nanny_binary: replaced by Nanny.binary
NannyWindows.*: replaced by Nanny.*
PyInstaller.build_root_dir: unused, effectively duplicated ClientBuilder.build_root_dir.
Users.authentication: unused, user auth is now based on aff4:/users objects. Use config_updater to modify them.
Worker.task_limit: unused
Worker.worker_process_count: unused
Cron.enabled_system_jobs (an allowlist) was replaced with Cron.disabled_system_jobs (a blocklist). Cron.enabled_system_jobs should be removed. Any system jobs you want to stay disabled should be added to Cron.disabled_system_jobs.
Client.control_urls: renamed to Client.server_urls.
Frequently Asked Questions
Who wrote GRR and Why?
GRR started at Google as a 20% project and gained in popularity until it became fully supported and open sourced. The primary motivation was that we felt the state of the art for incident response was going in the wrong direction and wasn’t going to meet our cross-platform, scalability, obfuscation or flexibility goals for an incident response agent.
Additionally, we believe that for things to progress in security, everyone has to up their game and improve their capabilities. We hope that by open sourcing GRR, we can foster development of new ways of doing things and thinking about the problem, get it into the hands of real organizations at reasonable cost, and generally push the state of the art forward.
We are getting significant code contributions from outside of Google, and have close relationships with a number of companies running large-scale deployments.
Why is the project called GRR?
When using other tools, we found ourselves making the sound “grrr” a lot, and it just kind of stuck. GRR is a recursive acronym, in the tradition of GNU, and it stands for GRR Rapid Response. Not GRR Response Rig or Google Rapid Response, which it is sometimes mistaken for.
Is GRR production ready?
As of Aug 2015, GRR is running at large scale both inside and outside of Google. The largest open source deployment we know of is roughly 30k machines; there’s another company at around 15k, and quite a few around 2k. Google’s deployment is bigger than all of those put together, although there are some differences in the codebase (see below).
Should I expect to be able to install and just start running GRR?
Yes, for basic use cases.
But if you want to do a large-scale enterprise deployment, it is probably best to think about GRR as an 80%-written software project that you could invest in. The question then becomes: instead of investing X million in product Y to buy something, should I instead invest 25% of that in GRR and hire a developer to contribute, build and deploy it? On the one hand that gives you control and in-house support; on the other, it is a real investment of resources.
If you are selling GRR internally (or to yourself) as a free <insert commercial IR product here>, your expectations will be wrong, and you may get disillusioned.
Can the GRR team provide me with assistance in getting it setup?
The core GRR team cares about the open source project, but in the end, our main goal is to build something that works for us. We don’t, and won’t, offer a helpdesk, professionally curated documentation, or someone you can pay to help you out if something goes wrong. We aren’t providing feeds or consulting services, and have nothing direct to gain from offering the platform. If you need something pre-packaged and polished, GRR probably isn’t right for you (at least in the short-medium term). For a large deployment you should expect to fix minor bugs, improve or write documentation, and actively engage with the team to make it successful.
If someone is willing to code and has invested some time learning, we will do what we can to support them. We’re happy to spend time on video calls or in person helping people get up to speed or running hackathons. However, the time the developers invest in packaging, building, testing and debugging for open source is mostly our own personal time, so please be reasonable in what and how you ask for assistance. We’re more likely to help if you’ve contributed documentation or code, or even filed good bug reports.
I’m interested in GRR but I, or my team need some more convincing. Can you help?
The core GRR team has invested a lot in the project and we think it’s pretty awesome, so the team is happy to talk, do phone calls, or chat with other teams that are considering GRR. We’ve even been known to send house-trained GRR engineers to companies to talk with interested teams. Just contact us directly. You can also corner someone from the team, or someone who works on GRR, at most leading forensics/IR conferences around the world.
I’ve heard that there are secret internal versions of GRR that aren’t open sourced that may have additional capabilities. Is that true?
GRR was always designed to be open sourced, but with any sufficiently complex “enterprise” product you expect to integrate it with other systems and potentially even with proprietary technology. So it’s true that some of the core developers’ time is spent working on internal features that won’t be released publicly. The goal is to ensure that everything useful is released, but there are some things that don’t make sense to release. Below are some of the key differences that may matter to you:
Datastore/Storage: At Google we run GRR on a Bigtable datastore (see below for more detail), but we have abstracted things such that using a different datastore is very simple. The MySQLAdvanced datastore is available in open source and is actively used at real scale outside of Google.
Security and privacy: The open source version has minimal controls immediately available for user authentication, multi-party authorization, privacy controls, logging, auditing etc. This is because these things are important enough for them to be custom and integrated with internal infrastructure in a large deployment. We open source the bits that make sense, and provide sensible hooks for others to use, but full implementations of these may require some integration work.
Machine handling and monitoring: Much of the infrastructure for running and monitoring a scalable service is often built into the platform itself. As such GRR hasn’t invested a lot in built-in service or performance monitoring. We expose a lot of statistics, but only display a small subset in the UI. We expect companies to have package deployment (SCCM/munki/casper/apt/yum etc.), config management (chef/puppet/salt etc.), and server monitoring (nagios/cacti/munin/zabbix/spiceworks etc.).
These differences will be whittled away as the core GRR team runs open source GRR deployments themselves, so you can expect most of them to become much less of an issue over time.
Why was support for SQLite dropped?
Our intent for SQLite was for it to be used as a casual, easy-setup datastore for testing/evaluating GRR. It doesn’t scale well, and is not recommended for actual production use. When developing GRR’s new relational database model, we decided that continuing to have support for SQLite was not worth the engineering effort it would take to build a SQLite implementation of the new model.
I heard GRR was designed for Bigtable and now Google has a Cloud Bigtable service. Can I use it?
Internally we use Bigtable, but the internal API is very different so the code cannot be used directly. The Cloud Bigtable service uses an extension of the HBase API. We’d like to write a GRR database connector that can use this service, but (as at Aug 2015) the Bigtable service is still in Beta and the python libraries to interact with it are still being developed, so it isn’t currently a high priority.
What operating system versions does the client support?
We try to support a wide range of operating systems because we know it’s often the old forgotten machines that get owned and need GRR the most. Having said that, ‘support’ is minimal for very old operating systems; we’re a small team and we don’t have the time to test the client on every possible combination. So the approach is basically to keep support for old systems by building compiled dependencies in a way that should work on them.
Windows
Well tested on: 64bit Windows 7+ workstations, 64bit Win2k8+ servers.
Should probably work: 32bit versions of the above, Windows Vista+ workstations. (We’ve stopped building and releasing 32bit versions since release 3.4.0. It should be possible to build a client from source though.)
OS X
Well tested on: 64bit 10.8 (Mountain Lion)+. Note that 10.11 (El Capitan)+ systems require a 3.0.7.1+ client due to the OS X System Integrity Protection changes, and Sleuthkit hasn’t caught up with the filesystem changes from Yosemite onwards. The compiler we use to build the Mac client template has become more aggressive with optimizations and now emits code optimized for specific processor versions. The Mac client will not run on a processor older than our build machine - an early 2011 MacBook with the Intel Sandy Bridge architecture. If you need to run GRR on, for example, Core 2 Duo machines, you will need a custom client built on one of those machines.
Linux
Well tested on: 64bit Ubuntu Lucid+, CentOS 5.11+
Should probably work: 32bit versions of the above, and essentially any system of similar vintage that can install a deb or rpm.
What operating system versions does the server support?
We only support 64bit Ubuntu 18.04.
We don’t provide a 32-bit server version since standing up new 32-bit Linux servers is not something rational people do, and there are many places you can get 64-bit virtual servers for ~free. We use the “amd64” notation, but this just means 64-bit; it runs fine on Intel processors.
What hardware do I need to run GRR?
This is actually a pretty tough question to answer. It depends on the database you choose, the number of clients you have, and how intensively you hunt. Someone who wants to do big collection hunts (such as sucking all the executables out of System32) will need more grunt and storage than someone who mainly wants to check for specific IOCs and investigate single machines.
But to give you some data points, we asked some of the GRR users with large production installs about the hardware they are using (as at October 2015); their answers are below:
32k clients:
Workers: AWS m4.large running 3 worker processes
HTTP frontends: 6-10 x AWS m4.large. Sits behind an AWS Elastic Load Balancer.
Datastore (SQLite): 5 x AWS m4.2xlarge. m4.2xlarge is used when running intensive enterprise hunts. During normal usage, m4.large is fine.
AdminUI: 1 m3.large
15k clients:
Workers and HTTP frontends: 10 x 4 core 8GB RAM virtual servers running 1 worker + 1 frontend each
Datastore (MySQLAdvanced): 16 cores, 256GB RAM, 8 x 10k drives. 128GB RAM was sufficient, but we had the opportunity to stuff extra RAM in, so we did.
AdminUI: 12 cores, 24GB RAM is where we left the UI: it was spare hardware, we had a lot of active users, and the extra RAM was nice for faster downloads of large datasets. It was definitely overkill; the backup was a 4-core, 8GB RAM VM and worked fine during maintenance.
7k clients:
Run in AWS with c3.large instances in two autoscaling groups.
Workers: 4 worker processes per server. The weekly interrogate flow tends to scale up the servers to about 10 systems, or 40 workers, and then back down in a couple of hours.
HTTP frontends and AdminUI: Each server has apache running a reverse proxy for the GRR AdminUI. At idle it uses just a t2.small, but whenever there is any tasking it scales up to 1-3 c3.large instances. Sits behind an AWS Elastic Load Balancer.
Datastore (MySQLAdvanced): AWS r3.4xlarge RDS server. RDS instance is optimized for 2000 IOPS and we’ve provisioned 3000.
How do I handle multi-organisational deployments?
Bake labels into clients at build time, and use a “Clients With Label” hunt rule to hunt specific groups of clients separately.
Which cloud should I deploy in? GCE? EC2? Azure?
Google Compute Engine (GCE) of course :) We’re working on making cloud deployment easier by dockerizing and building a click-to-deploy for GCE. Our focus will be primarily on making this work on GCE, but moving to a docker deployment with orchestration will simplify deployment on all clouds. The largest cloud deployments of GRR are currently on EC2, and we hope the community will be able to share configuration and HOWTOs for this and other cloud deployments.
Where/how do you do your data analysis?
We mostly do this outside of GRR using an internal system very similar to BigQuery. GRR supports Output Plugins that send hunt results to other systems (e.g. BigQuery, Splunk) with minimal delay.
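As a rough illustration of that workflow, here is a minimal sketch using the Python API client to pull hunt results for downstream processing. The endpoint, credentials and hunt ID are hypothetical placeholders, and method names may differ slightly between versions:
# Fetch hunt results via the GRR API for export to another system.
from grr_api_client import api

grrapi = api.InitHttp(api_endpoint="http://localhost:8000",
                      auth=("admin", "password"))  # placeholder credentials

hunt = grrapi.Hunt("H:ABCDEF12").Get()  # hypothetical hunt ID
for result in hunt.ListResults():
    # Each result wraps a protobuf payload that you can convert
    # and ship to BigQuery, Splunk, or similar.
    print(result.payload)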
When will feature X be ready?
Generally our roadmap on the main project page matches what we are working on, but we reserve the right to miss those goals, work on something entirely different, or sit around a fire singing kumbaya. Of course, given this is open source, you can add the feature yourself if it matters.
Who is working on GRR?
GRR has around 5 full-time software engineers working on it as their day job, plus additional part time code contributors. The project has long term commitment.
Why aren’t you developing directly on open source?
Given we previously had limited code contribution from outside, it was hard to justify the extra effort of jumping out of our internal code review and submission processes. That has now changed: we are syncing far more regularly (often multiple times per week), and we are working on code structure changes that will make it easier for us to develop externally.
Why is GRR so complicated?
GRR is complicated. We are talking about a distributed, asynchronous, cross-platform, large-scale system with a lot of moving parts. Building that is a hard and complicated engineering problem. This is not your average pet Python project.
Having said that, the most common action of just collecting something from machines and parsing what you get back has been made significantly easier with the artifacts system. This allows you to specify complex multi-operating system collection tasks with just a few lines of YAML, and collect any of the hundreds of pre-defined forensic artifacts with the click of a button.
What are the commercial competitors to GRR?
Some people have compared GRR functionality to Mandiant’s MIR, Encase Enterprise, or F-Response. There is some crossover in functionality with those products, but we don’t consider GRR to be a direct competitor. GRR is unlikely to ever be the product for everyone, as most organizations need consultants, support and the whole package that goes with that.
In many ways we have a way to go to match the capabilities and ease of use of some of the commercial products, but we hope we can learn from each other, all get better, and together genuinely improve the security of the ecosystem we all exist in. We’re happy to see others use GRR in their commercial consulting practices.
How do I change the timezone from UTC?
You can’t. Technically it’s possible, but it’s a lot of work and we don’t see much benefit. You should run your GRR server in the UTC timezone.
In the datastore GRR stores all timestamps as microseconds from epoch (UTC). To implement timezone customization we could store a user’s local preference for timezone and daylight savings based on a location, and convert all the timestamps in the UI, but there is a lot of complexity involved. We’d need to make sure each user gets the correct time and timezone displayed in all parts of the UI, handle the shifting winds of daylight savings correctly, provide clear input options on forms that accept timestamps, and make sure all the conversions were correct. All of that adds up to lots of potential to display or store an incorrect time, which is not what you want from a forensics tool.
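As a quick illustration (standard-library Python only), converting a stored GRR timestamp to a human-readable UTC time:
from datetime import datetime, timezone

# GRR stores timestamps as microseconds since the Unix epoch, in UTC.
ts_us = 1438675200000000  # example value in microseconds
dt = datetime.fromtimestamp(ts_us / 1e6, tz=timezone.utc)
print(dt.isoformat())  # -> 2015-08-04T08:00:00+00:00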
It’s common to have GRR users split across different timezones and a client fleet in many timezones. If we went to all of the effort above, all we would really achieve is fragmenting each user’s view of the system. Each user would still have to analyse and reason about events on clients in a timezone different from their own, but we would have made collaborative analysis harder: instead of simply copying a timestamp out of the UI into a ticket, or handing off timeline analysis notes to another shift in another timezone, users would have to convert timestamps back into UTC manually for consistency.
For forensic analysis and timeline building, UTC always makes the most sense. We have added a UTC clock to the top right hand corner of the UI to help reason about the timestamps you see.
Is there any relation between the grr pip package and Google’s GRR?
No. The naming collision is just a coincidence: GRR’s own packages are published on PyPI under grr-response-* names (e.g. grr-response-server). You might actually run into issues if you install the two in the same Python environment.
Publications
Papers
A scalable file based data store for forensic analysis, by Flavio Cruz, Andreas Moser, Michael Cohen. DFRWS 2015 Europe.
Hunting in the enterprise: Forensic triage and incident response, by Andreas Moser and Michael I. Cohen. Digital Investigation, 2013.
Distributed forensics and incident response in the enterprise, by M.I. Cohen, D. Bilby, G. Caronni. Digital Investigation, 2011.
Presentations
GRR Users Meetup: Fleetspeak, API client, Go rewrite @ Mar 2017 GRR Meetup, by A. Moser, M. Bushkov, B. Galehouse, M. Lakomy. Video.
GRR Meetup: 3.1.0 Release @ Apr 2016, by G. Castle and M. Cohen. Video.
GRR Meetup: API Edition @ Nov 2015, by G. Castle and M. Bushkov. Covers API basics and discusses where we are headed with API development.
GRR Hunting Best Practices @ Oct 2015 GRR Meetup, by G. Castle.
Tactical use of GRR Artifacts, Fuse, Plaso and Timesketch, by D. Bilby. Demo video (YouTube).
“Intro to GRR” for the Open Source Network Security Monitoring Group @ UIUC Feb 2015. Video by G. Castle
“GRR: Find all the badness, collect all the things” @ Blackhat 2014 Video by G. Castle. Also presented a 20 min version of the same talk for Hacker Hotshots.
OSDFC 2012 GRR Overview, by D. Bilby
Workshops
GRR Workshop at CERN, February 2018 slides by M. Bushkov, B. Galehouse, M. Lakomy, and A. Moser
ICDF2C Prague 2017 Workshop slides by M. Bushkov and B. Galehouse
DFRWS US 2015 Workshop slides and scripts to set up your own workshop infrastructure, by G. Castle.
DFRWS EU 2015 Workshop workshop slides by A. Moser
OSDF 2013 workshop presentation by D. Bilby.
Podcasts
GRR Rapid Response on Down the Security Rabbithole Jun 2014 by G. Castle.
Whitepapers
GRR Forensic Artifacts White Paper, by G. Castle
External Presentations
These are presentations about GRR by people outside the core GRR team.
“Human Hunting” @BSidesSF 2015 by Sean Gillespie. Covers how Yahoo is using GRR. Video.
“GRR Rapid Response: Remote Live Forensics for Incident Response” @ Linux Security Summit Aug 2015 by Sean Gillespie.