Censys Search & Data

Community Platform

runZero supports importing assets from the Censys Search API and the Censys Universal Internet Dataset.

Censys Search API

To get started with the Censys Search API, you will need to register for a Censys Search account. Once you have done so, you can find your API credentials in the My Account section.

In runZero, go to the Credentials page, and click Add Credential. Select Censys Search API Key as the credential type, and enter your API ID and API secret.
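
Outside runZero, you can optionally confirm that the API ID and secret work by calling the Censys Search 2.0 API directly, which accepts them as HTTP Basic authentication. The sketch below uses only the Python standard library. The endpoint URL and the per_page parameter follow Censys's public v2 API documentation and should be checked against the current Censys docs. The function names are hypothetical. The same helper can also be used to try out a query before using it in Custom Query mode.

```python
import base64
import json
import urllib.parse
import urllib.request

# Assumption: Censys Search 2.0 host search endpoint (verify against current Censys docs).
CENSYS_SEARCH_URL = "https://search.censys.io/api/v2/hosts/search"


def build_search_request(api_id: str, api_secret: str, query: str,
                         per_page: int = 5) -> urllib.request.Request:
    """Build an authenticated GET request for a Censys Search Language query."""
    params = urllib.parse.urlencode({"q": query, "per_page": per_page})
    request = urllib.request.Request(f"{CENSYS_SEARCH_URL}?{params}")
    # The API ID and API secret are sent as HTTP Basic auth credentials.
    token = base64.b64encode(f"{api_id}:{api_secret}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    return request


def run_search(api_id: str, api_secret: str, query: str) -> dict:
    """Send the query and return the decoded JSON body.

    Raises urllib.error.HTTPError for non-2xx responses, for example when
    the credentials are rejected.
    """
    request = build_search_request(api_id, api_secret, query)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

For example, run_search(api_id, api_secret, "services.service_name: HTTP") returns the first page of matching hosts. A successful response indicates that the credential pair is accepted.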

You can now go to your asset inventory, click the Connect button, and choose Censys Search API. Select the credential you just created from the Censys Search credential dropdown.

Configuration

There are two modes for connecting runZero to the Censys Search API.

  • Custom Query mode - runZero runs a Censys search query you specify, and then imports all of the results into runZero. The search query should be in Censys Search Language. Test your query in the main Censys Search 2.0 interface before running an import task.

  • All Assets mode - runZero assembles a list of public IP addresses from all of the assets in the selected site, and then uses the API to find Censys Search information for those addresses. The information found is imported into runZero and merged into the appropriate assets.
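
In All Assets mode, only public addresses are looked up. runZero does not document exactly how it decides which addresses are public, so the sketch below is only an illustration of the idea and not runZero's implementation. It uses Python's ipaddress module to keep globally routable addresses and drop private, loopback, link-local, and shared (CGNAT) ranges. The function name is hypothetical.

```python
import ipaddress
from typing import Iterable, List


def public_addresses(addresses: Iterable[str]) -> List[str]:
    """Return only globally routable IPv4/IPv6 addresses, in order, without duplicates."""
    seen = set()
    result = []
    for text in addresses:
        try:
            ip = ipaddress.ip_address(text.strip())
        except ValueError:
            continue  # skip anything that is not a single IP address
        # is_global is False for RFC 1918, loopback, link-local, 100.64.0.0/10, etc.
        if ip.is_global and ip not in seen:
            seen.add(ip)
            result.append(str(ip))
    return result
```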

As with a runZero scan, you’ll need to select a site to contain the scan data. The usual task scheduling options are available.

When you have finished editing the Censys Search configuration, click Activate Connection.

Censys Universal Internet Dataset

To get started with the Censys Universal Internet Dataset API, you will need a paid Censys Data account and the associated API credential. You can find your API credentials in the My Account section.

The dataset can be downloaded by following the instructions in the Censys documentation. The Search API is used to get a list of files for a given date, and those individual files should be downloaded into a local directory backed by SSD or NVMe storage.

Creating the database

The raw files are in Apache Avro format and need to be converted into a database for efficient queries.

To process Censys data files, use the runZero CLI’s censys-db-convert command. This command takes two parameters:

  • The path to a directory containing the .avro files from Censys
  • The path to write the computed database
$ nice runzero censys-db-convert /home/censys/avro /home/censys/db

The default configuration requires substantial computing resources:

  • At least 8 CPU cores, but 16 or more is better
  • At least 64GiB of RAM, but more is better
  • At least 3 TB of storage backed by SSD or NVMe (1 TB + 2 TB, or a single volume)
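
Before starting a multi-hour conversion, it can be worth checking the host against these minimums. The following sketch is a hypothetical preflight check, not part of the runZero CLI. It assumes a Unix-like system (for the os.sysconf RAM lookup) and a single volume holding both the Avro files and the database; with the split 1 TB + 2 TB layout, check each volume separately. Because the kernel usually reports slightly less memory and free space than the nominal size, a 5% tolerance is applied; that figure is an arbitrary assumption.

```python
import os
import shutil
from typing import List, Tuple

MIN_CPUS = 8
MIN_RAM_BYTES = 64 * 1024**3        # 64 GiB
MIN_STORAGE_BYTES = 3 * 1000**4     # 3 TB
TOLERANCE = 0.95                    # assumption: allow ~5% below nominal sizes


def check_requirements(cpus: int, ram_bytes: int, free_bytes: int) -> List[str]:
    """Return human-readable problems; an empty list means the minimums are met."""
    problems = []
    if cpus < MIN_CPUS:
        problems.append(f"{cpus} CPU cores found, at least {MIN_CPUS} needed")
    if ram_bytes < MIN_RAM_BYTES * TOLERANCE:
        problems.append(f"{ram_bytes / 1024**3:.1f} GiB RAM found, at least 64 GiB needed")
    if free_bytes < MIN_STORAGE_BYTES * TOLERANCE:
        problems.append(f"{free_bytes / 1000**4:.2f} TB free, at least 3 TB needed")
    return problems


def local_system(path: str) -> Tuple[int, int, int]:
    """Collect CPU count, physical RAM, and free space on the filesystem holding path."""
    cpus = os.cpu_count() or 0
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")  # Unix-only
    free_bytes = shutil.disk_usage(path).free
    return cpus, ram_bytes, free_bytes
```

For example, check_requirements(*local_system("/home/censys")) returns an empty list on a host that meets the minimums.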

An AWS m5.4xlarge instance with a 3 TB gp2 SSD volume meets these requirements and can process a full single-day dataset in about 13 hours. The resulting database is about twice the size of the source data (a 1.3 TiB database from 640 GiB of Avro). Using the database requires additional disk space, and overprovisioning the storage also improves throughput.
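
The sizes quoted above give a rough growth factor that can be used to plan storage for a given download. This is a back-of-the-envelope sketch derived from that single data point (640 GiB of Avro producing a 1.3 TiB database), not a guarantee. The actual ratio will vary with the dataset, and querying needs further headroom on top of this. The function name is hypothetical.

```python
GIB_PER_TIB = 1024

# Data point from the text above: 640 GiB of Avro -> ~1.3 TiB database.
SAMPLE_AVRO_GIB = 640.0
SAMPLE_DB_GIB = 1.3 * GIB_PER_TIB                 # 1331.2 GiB
GROWTH_RATIO = SAMPLE_DB_GIB / SAMPLE_AVRO_GIB    # ~2.08


def estimate_storage_gib(avro_gib: float) -> dict:
    """Estimate database size and combined footprint (Avro + database) in GiB."""
    database_gib = avro_gib * GROWTH_RATIO
    return {
        "database_gib": database_gib,
        "total_gib": avro_gib + database_gib,
    }
```

For the 640 GiB example, the estimate is roughly 1,331 GiB of database and about 1,971 GiB (about 1.93 TiB) in total while both copies are on disk. That fits within a 3 TB volume (about 2.73 TiB) with some room to spare.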

Querying the database

After the Avro files have been converted to a local database, the censys-db command can be used to import data into runZero.

The CLI queries the local database and writes a file in runZero scan format containing the appropriate host records. By default, the file has a name matching censys-*.rumble.gz and is written to the current directory. Alternatively, you can specify an output filename with the --output-raw option, as when performing a runZero scan.

The runZero scan file can be uploaded to the runZero console like any other scan file.

If you have more IP addresses or CIDRs than will fit on a command line, you can use the --input-targets option to specify that the CLI should read them from a file. The file is expected to be ASCII text, and contain CIDRs or IP addresses separated by whitespace (which can include newlines).
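
A targets file for --input-targets can be produced from any list of addresses. The sketch below validates each entry with Python's ipaddress module, writes one entry per line as ASCII, and reads a file back by splitting on any whitespace, matching the format described above. The helper names are hypothetical. The validation is deliberately strict: a CIDR with host bits set (such as 12.216.190.7/24) is rejected rather than guessed at.

```python
import ipaddress
from pathlib import Path
from typing import Iterable, List


def format_targets(targets: Iterable[str]) -> str:
    """Validate IP addresses and CIDRs and return them as newline-separated text."""
    cleaned = []
    for entry in targets:
        entry = entry.strip()
        if not entry:
            continue  # ignore blank entries
        # ip_network() accepts single addresses ("8.8.8.8") and CIDRs ("12.216.190.0/24");
        # it raises ValueError for malformed input or a CIDR with host bits set.
        ipaddress.ip_network(entry)
        cleaned.append(entry)
    return "".join(f"{entry}\n" for entry in cleaned)


def write_targets_file(targets: Iterable[str], path: str) -> None:
    """Write validated targets to path as ASCII text."""
    Path(path).write_text(format_targets(targets), encoding="ascii")


def parse_targets(text: str) -> List[str]:
    """Split on any whitespace (spaces, tabs, newlines), as the file format allows."""
    return text.split()
```

The resulting file can then be passed with --input-targets instead of listing the targets on the command line.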

You can also use the CLI to process data, upload it, and then delete the scan data file if everything succeeded. For example:

$ runzero censys-db /home/censys/db \
  12.216.190.0/24 --upload --api-key=YOUR_ORGANIZATION_API_KEY \
  --upload-site="Primary site"

If you are using self-hosted runZero, you can use the --api-url option to specify your console’s API endpoint.

The censys-db command also supports the --verbose option, which makes it list host addresses as they are written to the output file.

Creating a local Censys Search API server

The computed database can also be used to serve a limited, local version of the Censys Search API using the runZero CLI’s censys-db-server command.

Due to the size of the database, the system vm.max_map_count may need to be increased to avoid a memory map error. The memory map count can be increased by adding the following line to /etc/sysctl.conf:

vm.max_map_count=262144 

Once this line is added, reload /etc/sysctl.conf with the following command:

$ sudo sysctl -p /etc/sysctl.conf 
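
To confirm whether a host needs this change, or whether the reload took effect, you can read the current value from /proc/sys/vm/max_map_count on Linux. The small sketch below compares it with the 262144 value suggested above. The helper names are hypothetical. Many Linux kernels default to 65530, which is well below that value.

```python
from pathlib import Path

SUGGESTED_MAX_MAP_COUNT = 262144
PROC_MAX_MAP_COUNT = Path("/proc/sys/vm/max_map_count")  # Linux-specific


def read_max_map_count(path: Path = PROC_MAX_MAP_COUNT) -> int:
    """Read the kernel's current vm.max_map_count value."""
    return int(path.read_text().strip())


def max_map_count_ok(current: int, required: int = SUGGESTED_MAX_MAP_COUNT) -> bool:
    """True if the current limit is at least the suggested value."""
    return current >= required
```

If max_map_count_ok(read_max_map_count()) returns False, the sysctl change has not taken effect yet.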

After the vm.max_map_count has been updated, start the Censys DB Server with the following command:

$ runzero censys-db-server /home/censys/db

This starts a local web service on port 55555 by default (changeable with --port <val>) that responds to the /api/v2/hosts/search and /api/v2/hosts/<ip> endpoints. Once the server is running, it can be queried by the runZero Censys Search API connector and by other HTTP clients, such as curl:

$ curl 'http://127.0.0.1:55555/api/v2/hosts/search?q=ip%3A8.8.8.0/24'
$ curl http://127.0.0.1:55555/api/v2/hosts/8.8.8.8
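
The same two endpoints can be called from a script. The sketch below builds the URLs with percent-encoding and fetches them with Python's standard library. It assumes the server is listening on the default port, returns JSON, and decodes percent-encoded query strings in the usual way. The helper names are hypothetical. Because the response structure is not described here, the example simply returns the decoded JSON.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:55555"  # default port; adjust if --port was used


def search_url(query: str, base: str = BASE_URL) -> str:
    """URL for /api/v2/hosts/search with the query percent-encoded."""
    return f"{base}/api/v2/hosts/search?" + urllib.parse.urlencode({"q": query})


def host_url(ip: str, base: str = BASE_URL) -> str:
    """URL for /api/v2/hosts/<ip>."""
    return f"{base}/api/v2/hosts/{urllib.parse.quote(ip)}"


def get_json(url: str, timeout: float = 60.0):
    """Fetch a URL and decode the body as JSON (assumes a JSON response)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, get_json(search_url("ip:8.8.8.0/24")) issues the same search as the first curl command.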

Querying the raw Avro files without database processing

runZero also supports direct queries of the unprocessed Avro files. These queries are slow and may take hours or days to complete, depending on the query and local storage speed. To query the raw Avro files, use the runZero CLI’s censys command. It takes any number of arguments, which can be:

  • Names of Avro files, which must end in .avro
  • CIDRs or IP addresses to search for in the files
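
When a dataset spans many Avro files, it can be easier to assemble the argument list programmatically than to type it. The sketch below builds an argument list for the censys command from a set of Avro file names and targets, enforcing the documented .avro suffix. The helper names are hypothetical. Passing --output-raw as --output-raw=FILE is an assumption modeled on the --api-key=... form used in the examples; confirm it against the CLI's built-in help.

```python
import glob
import os
from typing import Iterable, List, Optional


def build_censys_argv(avro_files: Iterable[str], targets: Iterable[str],
                      output_raw: Optional[str] = None) -> List[str]:
    """Assemble 'runzero censys' arguments: Avro files first, then IPs/CIDRs."""
    files = list(avro_files)
    for name in files:
        if not name.endswith(".avro"):
            raise ValueError(f"{name!r} does not end in .avro")
    argv = ["runzero", "censys", *files, *targets]
    if output_raw:
        argv.append(f"--output-raw={output_raw}")  # assumption: --flag=value form is accepted
    return argv


def avro_files_in(directory: str) -> List[str]:
    """All .avro files in a directory, sorted for a stable order."""
    return sorted(glob.glob(os.path.join(directory, "*.avro")))
```

subprocess.run(build_censys_argv(avro_files_in("/home/censys/avro"), ["12.216.190.0/24"]), check=True) would then run the query. Expect it to be slow, as noted above.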

The CLI reads the specified Avro files and writes a file in runZero scan format containing the appropriate host records. By default, the file has a name matching censys-*.rumble.gz and is written to the current directory. Alternatively, you can specify an output filename with the --output-raw option, as when performing a runZero scan.

The runZero scan file can be uploaded to the runZero console like any other scan file.

If you have more IP addresses or CIDRs than will fit on a command line, you can use the --input-targets option to specify that the CLI should read them from a file. The file is expected to be ASCII text, and contain CIDRs or IP addresses separated by whitespace (which can include newlines).

You can also use the CLI to process data, upload it, and then delete the scan data file if everything succeeded. For example:

$ runzero censys universal-internet-dataset-20210923-000000000000.avro \
  12.216.190.0/24 --upload --api-key=YOUR_ORGANIZATION_API_KEY \
  --upload-site="Primary site"

If you are using self-hosted runZero, you can use the --api-url option to specify your console’s API endpoint.

The censys command also supports the --verbose option, which will make it list host addresses as they are written to the output file.

Troubleshooting

If you are having trouble using this integration, the questions and answers below may assist in your troubleshooting.

Why is the Censys Search integration unable to connect?

  1. Are you getting any data from the Censys Search integration?
    • Make sure to query the inventory rather than look at the task details to review all the data available from this integration.
    • In some cases, integrations have configuration settings that limit the amount of data that comes into the runZero console.
  2. Some integrations require very specific actions that are easy to overlook. If a step is missed when setting up the integration, it may not work correctly. Please review this integration documentation and follow the steps exactly.
  3. If the Censys Search integration is unable to connect, be sure to check the task log for errors. Some common errors include:
    • 500 - server error, unable to connect to the endpoint
    • 404 - request to an unknown endpoint on the server
    • 403 - not authorized, likely a credential issue