Cybersecurity Analytics: Leveraging Data for Smarter Threat Detection
Modern cyber threats unfold as coordinated, multi-stage campaigns. Yet most security tools still treat them as isolated incidents, limiting defenders’ ability to understand the full scope of an attack. Cybersecurity analytics bridges this gap by correlating data across systems, timeframes, and entities to reveal patterns that would otherwise remain hidden. It enables security teams to detect threats earlier, respond more effectively, and adapt to evolving attack strategies. Achieving this requires more than reviewing disconnected logs; it depends on understanding how malicious domains, compromised endpoints, and suspicious behaviors are connected within a broader campaign.
In this article, we explore the foundations of cybersecurity analytics, the critical role of connected data, and how graph-based techniques can bring structure and clarity to complex threat environments. We will also demonstrate this approach in practice by using PuppyGraph and threat intelligence from Open Threat Exchange (OTX) to visualize hidden relationships and accelerate threat investigations.
Cybersecurity Analytics: The Essentials
Cybersecurity analytics is the practice of collecting, correlating, and analyzing security data to detect, investigate, and respond to threats with greater intelligence. Instead of reacting to isolated alerts, it empowers security teams to uncover hidden patterns, trace attack paths, and predict future risks based on connected evidence.
Modern environments produce massive amounts of telemetry, from system logs and network flows to cloud service events and user activities. Cybersecurity analytics transforms this raw data into structured insights that power faster, more accurate decisions.
What Cybersecurity Analytics Achieves
At its core, cybersecurity analytics supports four goals: detection, prevention, response, and prediction.
Detection identifies threats as they emerge. Prevention uses past learnings to strengthen defenses. Response provides the context needed to contain incidents effectively. Prediction anticipates vulnerabilities and attack strategies before they are exploited.
The need for analytics has become clear as traditional security tools show their limits. Signature-based detection, static rules, and isolated log reviews often fail against today’s stealthy, multi-stage attacks. Defenders must see not just individual events, but the relationships between them. Analytics provides the ability to connect dots across time, systems, and identities, revealing coordinated activity that would otherwise go unnoticed.
The Analytics Lifecycle
Effective cybersecurity analytics follows a structured lifecycle that transforms noisy telemetry into actionable insights: data is collected from across the environment, normalized into consistent formats, enriched with context such as asset ownership and threat intelligence, correlated to surface suspicious patterns, and finally investigated and acted upon.
Each stage must work efficiently with the others. Gaps or delays at any point in the lifecycle can break the flow of intelligence and delay critical threat detection.
Real-Time versus Batch Analytics
Cybersecurity analytics operates in two complementary modes.
Real-time analytics ingests and processes data streams as they arrive, providing immediate detection of active threats. It relies on event streaming platforms, low-latency data stores, and stream processors to enable split-second decision-making.
Batch analytics analyzes large volumes of historical data to identify trends, refine detection models, and support retrospective investigations. It runs over data lakes, warehouses, or distributed query engines, often uncovering patterns that are invisible in short-term data streams.
Both approaches are necessary. Real-time analytics catches threats in progress, while batch analytics strengthens long-term defense strategies.
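To make the real-time side concrete, here is a minimal sliding-window detection sketch in Python. The event schema, field names, and threshold are hypothetical, and production systems would run logic like this inside a stream processor rather than a single script:

# Minimal sliding-window detection sketch (hypothetical event schema).
from collections import defaultdict, deque
from datetime import timedelta

WINDOW = timedelta(minutes=5)
THRESHOLD = 10  # failed logins per user within the window

recent_failures = defaultdict(deque)  # user -> timestamps of recent failures

def on_event(event):
    """Process one auth event, e.g. {"user": "alice", "outcome": "failure", "ts": datetime(...)}."""
    if event["outcome"] != "failure":
        return
    window = recent_failures[event["user"]]
    window.append(event["ts"])
    # Drop failures that have fallen out of the sliding window.
    while window and event["ts"] - window[0] > WINDOW:
        window.popleft()
    if len(window) >= THRESHOLD:
        print(f"ALERT: {len(window)} failed logins for {event['user']} within {WINDOW}")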
When Analytics Fail
Collecting data is only the beginning. Many organizations struggle to turn that data into meaningful insights and actions.
Analytics initiatives often fail because of poor normalization, inconsistent correlation logic, or disconnected tooling. For example, if identity logs, endpoint telemetry, and access logs are stored in separate systems without integration, investigating a suspicious login becomes slow and error-prone.
Successful cybersecurity analytics depends not just on gathering large amounts of data, but on enabling fast, confident decisions based on connected insights. Systems must allow flexible correlation, timely querying, and intuitive investigation paths. Without this, defenders are left piecing together incidents manually, losing time and context during critical investigations.
Cybersecurity analytics reaches its full potential when it links isolated events into coherent attack stories, empowering defenders to act decisively.
Key Components and Data Sources
Effective cybersecurity analytics depends on the quality and diversity of the underlying data, as well as the systems that can process and correlate that information at scale. Without complete and high-fidelity telemetry, even the most advanced analytics engines risk missing critical signals. Building a strong foundation requires both comprehensive data sources and technologies that can transform raw information into actionable intelligence.
Primary Data Sources in Cybersecurity Analytics
Strong analytics starts with telemetry that captures activity across systems, networks, endpoints, and users. The following categories represent the primary data sources organizations rely on to detect and investigate threats.
Infrastructure and Network Telemetry
Logs from operating systems, applications, firewalls, and network devices provide crucial signals about internal activities and external communications. System logs capture authentication attempts, configuration changes, and application behavior. Firewall and network logs monitor traffic flows, flagging intrusion attempts and lateral movement. Network flow records and packet captures offer deeper insights into traffic behavior, enabling the detection of data exfiltration, command-and-control activity, and protocol anomalies.
Endpoint and Identity Telemetry
Endpoint detection and response (EDR) platforms record detailed host-level activity, including process execution, file access, and privilege escalations. User activity and authentication data from identity providers and authentication systems track login patterns, session behavior, and access attempts. Analyzing these sources enables the detection of insider threats, compromised credentials, and abnormal user behavior.
Cloud Service Telemetry
Logs from cloud and SaaS environments, such as AWS CloudTrail, Azure Monitor, and Google Cloud Audit Logs, capture user actions, API calls, permission changes, and resource access across cloud infrastructures. Cloud telemetry is essential for maintaining visibility as organizations increasingly migrate workloads beyond traditional network perimeters.
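As a small illustration, a script can scan CloudTrail-style JSON records for sensitive API calls. The file name and the watchlist below are illustrative, not exhaustive:

# Sketch: flag sensitive API calls in a CloudTrail log file (file name and watchlist are illustrative).
import json

SENSITIVE_CALLS = {"DeleteTrail", "StopLogging", "PutUserPolicy", "CreateAccessKey"}

with open("cloudtrail-2024-05-01.json") as f:
    records = json.load(f).get("Records", [])

for r in records:
    if r.get("eventName") in SENSITIVE_CALLS:
        who = r.get("userIdentity", {}).get("arn", "unknown")
        print(f"Sensitive call {r['eventName']} by {who} from {r.get('sourceIPAddress')}")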
Risk and Threat Intelligence Sources
Vulnerability scans identify exploitable weaknesses across assets, providing critical context for prioritizing remediation efforts. Threat intelligence feeds, such as those from Open Threat Exchange (OTX), supply curated indicators of compromise, including malicious IP addresses, domains, and file hashes. Integrating risk and threat intelligence enhances internal detections and supports proactive defense.
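A first-pass integration can be as simple as checking internal telemetry against a set of known-bad indicators. The file formats and field names here are hypothetical; the demo later in this article shows a richer, graph-based use of OTX data:

# Sketch: match DNS query logs against an IOC feed (hypothetical file formats).
import csv

# One malicious domain per line, e.g. exported from a threat intelligence feed.
with open("malicious_domains.txt") as f:
    bad_domains = {line.strip().lower() for line in f if line.strip()}

# DNS log rows like: timestamp,host,queried_domain
with open("dns_log.csv") as f:
    for row in csv.DictReader(f):
        if row["queried_domain"].lower() in bad_domains:
            print(f"{row['host']} queried known-bad domain {row['queried_domain']} at {row['timestamp']}")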
Core Technologies That Enable Analysis
Processing diverse and high-volume telemetry requires a robust stack of cybersecurity analytics technologies. These systems collect, normalize, enrich, and correlate data to generate meaningful insights in a timely manner.
Data Collection and Correlation Systems
Security information and event management (SIEM) platforms serve as the central hub for aggregating security data, applying normalization and correlation rules, and supporting compliance reporting. They often form the foundation of an organization’s detection infrastructure by linking signals across diverse sources.
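Normalization is the unglamorous core of this work: events from different products must be mapped into one schema before correlation rules can link them. A toy sketch in Python, assuming two hypothetical vendor formats:

# Sketch: normalize two hypothetical vendor log formats into one event schema.
def normalize_firewall(raw):
    return {"ts": raw["time"], "src_ip": raw["src"], "user": None, "action": raw["disposition"]}

def normalize_idp(raw):
    return {"ts": raw["timestamp"], "src_ip": raw["client_ip"], "user": raw["subject"], "action": raw["result"]}

NORMALIZERS = {"firewall": normalize_firewall, "idp": normalize_idp}

def normalize(source, raw):
    return {"source": source, **NORMALIZERS[source](raw)}

# Once normalized, a correlation rule can join events on shared fields such as src_ip.
events = [
    normalize("firewall", {"time": "2024-05-01T10:00:00Z", "src": "203.0.113.7", "disposition": "blocked"}),
    normalize("idp", {"timestamp": "2024-05-01T10:00:05Z", "client_ip": "203.0.113.7", "subject": "alice", "result": "failure"}),
]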
Automation and Enrichment Platforms
Security orchestration, automation, and response (SOAR) tools streamline investigation workflows by automating repetitive tasks and integrating alerts from multiple systems. Threat intelligence platforms (TIPs) enrich internal telemetry with external indicators of compromise and attacker tactics, improving context during detection and investigation.
Behavioral Analytics Solutions
User and entity behavior analytics (UEBA) solutions profile users, devices, and applications over time, establishing baselines of normal behavior. Deviations from these baselines help identify insider threats, compromised accounts, and lateral movement that traditional detection rules might miss.
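Conceptually, the simplest baseline is a per-entity statistic with an alert on large deviations. A minimal sketch, assuming a history of daily login counts per user (real UEBA products model far more features and use more robust statistics):

# Sketch: flag users whose activity today deviates strongly from their own baseline.
from statistics import mean, stdev

def is_anomalous(history, today, z_threshold=3.0):
    """history: list of daily event counts for one user; today: today's count."""
    if len(history) < 7:  # not enough data to establish a baseline
        return False
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu
    return (today - mu) / sigma > z_threshold

logins = {"alice": ([12, 9, 11, 10, 13, 12, 11], 12), "mallory": ([3, 2, 4, 3, 2, 3, 4], 40)}
for user, (history, today) in logins.items():
    if is_anomalous(history, today):
        print(f"UEBA alert: unusual login volume for {user}: {today}")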
Scalable Analytics Infrastructure
Modern cybersecurity analytics requires scalable storage and processing capabilities. Data lakes and warehouses provide long-term retention and flexible querying for raw and enriched telemetry. Stream processing frameworks such as Apache Kafka and Flink enable real-time ingestion and low-latency analysis. Distributed analytics engines, including Apache Spark and graph databases, support complex modeling, threat hunting, and retrospective investigations over large datasets.
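As a taste of the batch side, a short PySpark job can sweep months of telemetry in a data lake for slow-burn patterns that a real-time window would miss. The storage path and field names below are hypothetical:

# Sketch: batch analytics over auth logs with PySpark (hypothetical path and fields).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("auth-trends").getOrCreate()

# Read a month of JSON auth events from a data lake.
events = spark.read.json("s3a://security-lake/auth/2024-05/*.json")

# Count failed logins per user per day to surface slow brute-force patterns.
daily_failures = (
    events.filter(F.col("outcome") == "failure")
          .groupBy("user", F.to_date("timestamp").alias("day"))
          .count()
          .orderBy(F.col("count").desc())
)
daily_failures.show(10)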
A Core Challenge: Understanding Event Relationships
Modern security environments generate massive volumes of telemetry, events, and alerts. Yet context—the understanding of how individual events relate to one another—remains scarce. Each log entry often captures only a fragment of the broader story, forcing defenders to react to symptoms rather than address root causes.
Attacks rarely unfold in a simple, linear fashion. A single campaign might involve phishing emails, credential theft, lateral movement, malware deployment, and cloud resource abuse, with each phase leaving traces across different systems. For example, one endpoint might generate a malware hash tied to a domain, another device might beacon to that domain, and a cloud instance might later communicate with the same attacker infrastructure. Traditional security tooling often treats these signals as isolated events, missing the larger patterns that link them together.

The difficulty stems from how most analytics systems structure data. Platforms based on static tables and schemas excel at point-in-time queries but struggle to model dynamic, multi-hop relationships. Join-heavy queries across disparate datasets are slow and unreliable. Recursive queries, which are critical for tracing attacker movement across multiple assets, are difficult or unsupported. As a result, analysts are often forced to manually pivot between dashboards and tools, reconstructing the attack graph mentally during investigations.
This fragmentation creates serious blind spots. Without a way to natively model and explore relationships, defenders risk missing key links between events. Investigative questions such as which assets communicated with a suspicious domain, which users accessed the same datastore, or whether lateral movement occurred are fundamentally about relationships, not isolated records.
Addressing this challenge requires shifting from thinking in terms of individual events to modeling connections. Graph-based approaches, which treat entities and relationships as first-class citizens, provide a more natural and powerful way to reconstruct attacks, uncover hidden patterns, and respond with greater speed and clarity.
Introducing Graph Modeling and PuppyGraph
A graph model represents data as entities (nodes) and relationships (edges) between them. In cybersecurity, nodes might represent users, endpoints, domains, IP addresses, or file hashes, while edges capture actions and associations such as logins, communications, or malware relationships. This structure makes it easier to query, visualize, and understand how threats spread across complex environments.
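To make the idea concrete, here is a minimal, database-free sketch in Python: a two-hop traversal from a malicious domain surfaces every asset that touched infrastructure linked to it. The entities and edges are hypothetical:

# Sketch: multi-hop traversal over a toy security graph (hypothetical entities and edges).
from collections import deque

# edges: (source, relationship, target)
edges = [
    ("host-1", "resolved", "evil.example.com"),
    ("evil.example.com", "hosted_on", "198.51.100.9"),
    ("host-2", "connected_to", "198.51.100.9"),
    ("host-2", "logged_in_by", "bob"),
]

# Build an undirected adjacency map so we can pivot in either direction.
adj = {}
for src, _, dst in edges:
    adj.setdefault(src, set()).add(dst)
    adj.setdefault(dst, set()).add(src)

def within_hops(start, max_hops):
    """Return all entities reachable from start within max_hops edges."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == max_hops:
            continue
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return seen - {start}

print(within_hops("evil.example.com", 2))  # host-1, 198.51.100.9, host-2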
To make graph modeling practical for security teams without complex infrastructure changes, PuppyGraph provides an accessible and scalable solution.
PuppyGraph offers a real-time, zero-ETL graph query engine that allows organizations to query existing relational data stores as unified graphs without moving or duplicating data. By connecting directly to SQL-based systems, PuppyGraph enables teams to model and explore relationships through familiar languages such as openCypher and Gremlin, without restructuring their underlying databases.
Unlike traditional graph databases, PuppyGraph eliminates the need for complex ETL pipelines and specialized storage. It supports petabyte-scale datasets and fast, multi-hop queries through a distributed, vectorized execution engine, with separate computation and storage layers to maintain consistent performance as data volumes grow.

By simplifying graph analytics over existing infrastructure, PuppyGraph helps security teams uncover hidden patterns, map attack paths, and accelerate threat investigations. To demonstrate this approach, we will walk through an example using PuppyGraph and threat intelligence data from Open Threat Exchange (OTX).

Demo
This demonstration shows how PuppyGraph can be used to model and analyze real-world threat intelligence data from Open Threat Exchange (OTX). We transform OTX pulses—summaries of threats and associated indicators of compromise (IOCs)—into a graph structure for querying and visualization. The OTX data is downloaded as JSON files, imported into a PostgreSQL database, and then mapped into a graph model using PuppyGraph. Pulses group related IOCs, providing context about threat campaigns, malware families, or attacker infrastructure.
To help you follow along, we have prepared all necessary materials, including setup scripts, schema files, and sample code, in a public GitHub repository. Please download or clone the repository before starting.
Environment Setup
To follow along, you will need Docker Compose, Python 3, and an OTX API key.
1. Start by launching the PostgreSQL and PuppyGraph services:
docker compose up -d
2. Next, create a Python virtual environment, activate it, and install the required dependencies:
python3 -m venv myvenv
source myvenv/bin/activate
pip install psycopg2-binary
3. Install the OTXv2 Python SDK from the customized repository:
cd ../OTX-Python-SDK
pip install .
4. After installation, navigate back to the demo directory:
cd ../demo-1
Importing OTX Data
1. Configure your OTX API key in data.py and download threat pulses:
python data.py download
2. Access the PostgreSQL client (password: postgres123) and create the required tables:
docker exec -it postgres psql -h postgres -U postgres
Then run the SQL commands in create_tables.sql to set up the schema.
3. Import the downloaded data into PostgreSQL (a sketch of what the download and import scripts do appears after these steps):
python data.py import
4. Access the PostgreSQL client as before and run some queries to verify the data:
SELECT * FROM pulse LIMIT 5;
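For a sense of what the download (step 1) and import (step 3) commands do under the hood, here is a rough sketch using the OTXv2 SDK and psycopg2. The table and column names are simplified stand-ins; the demo's create_tables.sql and data.py contain the real logic:

# Rough sketch of the download/import flow (simplified; see data.py for the real logic).
import json
import psycopg2
from OTXv2 import OTXv2

otx = OTXv2("YOUR_OTX_API_KEY")
pulses = otx.getall()  # fetch subscribed pulses from OTX
with open("pulses.json", "w") as f:
    json.dump(pulses, f)

conn = psycopg2.connect(host="localhost", user="postgres", password="postgres123", dbname="postgres")
with conn, conn.cursor() as cur:
    for p in pulses:
        # Simplified columns; the demo's create_tables.sql defines the full schema.
        cur.execute("INSERT INTO pulse (id, name) VALUES (%s, %s) ON CONFLICT DO NOTHING",
                    (p["id"], p["name"]))
        for ind in p.get("indicators", []):
            cur.execute("INSERT INTO indicator (id, pulse_id, type, value) VALUES (%s, %s, %s, %s) ON CONFLICT DO NOTHING",
                        (ind["id"], p["id"], ind["type"], ind["indicator"]))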
Building the Graph Model in PuppyGraph
Access the PuppyGraph Web UI at http://localhost:8081 using:
- Username: puppygraph
- Password: puppygraph123
Upload the provided schema.json file through the Upload Graph Schema section to define the nodes and edges.

Querying Threat Relationships
In the PuppyGraph Query interface, you can run Gremlin or openCypher queries to explore the relationships between pulses and indicators. Here are some example queries in both languages.
Gremlin queries:
// Count the number of pulses
g.V().hasLabel("pulse").count()
// Maximum number of indicators linked to a pulse
g.V().hasLabel("pulse").local(__.out("pulse_indicator").count()).max()
// Top 10 pulses by number of indicators
g.V().hasLabel('pulse').as('p').
project('name', 'description', 'indicatorCount').
by('name').
by('description').
by(__.out('pulse_indicator').count()).
order().by(select('indicatorCount'), desc).
limit(10)
// Indicators linked to two or more pulses
g.V().hasLabel("indicator").
where(__.in("pulse_indicator").count().is(gte(2))).
in("pulse_indicator").path()
Cypher queries:
// Count the number of pulses
MATCH (n:pulse) RETURN COUNT(n)
// Maximum number of indicators linked to a pulse
MATCH (p:pulse)
OPTIONAL MATCH (p)-[:pulse_indicator]->(i)
WITH p, COUNT(i) AS indicatorCount
RETURN max(indicatorCount) AS maxIndicatorCount
// Top 10 pulses by number of indicators
MATCH (p:pulse)
OPTIONAL MATCH (p)-[:pulse_indicator]->(i)
WITH p, COUNT(i) AS indicatorCount
RETURN p.name, p.description, indicatorCount
ORDER BY indicatorCount DESC
LIMIT 10
// Indicators linked to two or more pulses
MATCH (i:indicator)<-[:pulse_indicator]-(p:pulse)
WITH i, COUNT(p) AS pulseCount
WHERE pulseCount >= 2
MATCH path = (:pulse)-[:pulse_indicator]->(i)
RETURN path
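You can also run these queries programmatically. Here is a minimal sketch using the gremlinpython driver (pip install gremlinpython), assuming PuppyGraph's Gremlin endpoint is exposed on the default port 8182 with the demo credentials:

# Sketch: run a Gremlin query against PuppyGraph from Python (assumes default endpoint and demo credentials).
from gremlin_python.driver import client

c = client.Client(
    "ws://localhost:8182/gremlin", "g",
    username="puppygraph", password="puppygraph123",
)
try:
    count = c.submit('g.V().hasLabel("pulse").count()').all().result()
    print("pulse count:", count[0])
finally:
    c.close()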

Cleanup
When finished, shut down and remove the running services:
docker compose down -v
Conclusion
Today’s cyber threats rarely stay confined to a single system or user. They move across devices, identities, and cloud services, making traditional alert-based defenses insufficient. To respond effectively, security teams need correlation, context, and clarity—qualities that cybersecurity analytics brings together to detect faster, investigate deeper, and act smarter.
But volume alone isn’t enough. Real value comes from modeling and exploring the relationships hidden in the data. That’s why many teams are turning to graph-powered approaches to trace attacker infrastructure, map lateral movement, and connect the dots efficiently.
If you’re ready to go beyond dashboards and uncover deeper structure in your threat data, try the forever-free Developer Edition or book a demo with our team.
Get started with PuppyGraph!
Developer Edition
- Forever free
- Single node
- Designed for proving your ideas
- Available via Docker install
Enterprise Edition
- 30-day free trial with full features
- Everything in the Developer Edition, plus enterprise features
- Designed for production
- Available via AWS AMI & Docker install