
Cybersecurity Artificial Intelligence (AI) technologies and tools are still in the early stages of adoption, yet the global market is expected to grow by US$19 billion between 2021 and 2025. Nice, but why? Let’s take a look at some of the main problems cybersecurity will face and how AI can help provide valuable solutions.

Signature-Based Intrusion Detection Systems (IDS) Can’t Spot Zero-Day Attacks

Traditional intrusion detection techniques in the cybersecurity domain use signatures or indicators of compromise to identify threats. Signature-based approaches are particularly effective against known attacks, but they can fail in at least three cases: 1) when it comes to zero-day attacks, signature features might be unknown, resulting in insufficient profiling of the threat; 2) these systems must be constantly updated to propagate the latest signatures appended to the databases, and user negligence or a slow service provider can introduce dangerous delays; 3) threats can show smart behaviors, such as dynamic signatures, which help them evade the monitoring system. This is part of the reason why companies are increasingly moving towards a “zero trust” model, where defense mechanisms constantly inspect network traffic and applications to verify they are legitimate. Artificial Intelligence can mitigate the problem in the form of an anomaly-based detection system. While signature-based methods are configured to recognize patterns of suspicious behavior, AI-based methods approach the problem from a complementary perspective. Indeed, anomaly detection techniques learn to model the normal behavior of a subject (e.g. a user, a company, a business network) and use the acquired knowledge as a reference to spot anomalies. This characteristic allows AI to be effective against zero-day attacks, fast against recently discovered threats, and resilient against dynamic signatures. Therefore, a hybrid system combining the traditional and AI-based approaches makes the most of both, leveraging their complementarity.
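To make this concrete, here is a minimal sketch of an anomaly-based detector built with scikit-learn’s Isolation Forest. The flow features and traffic distributions below are hypothetical placeholders; a real deployment would learn from actual network telemetry.

```python
# Minimal sketch of anomaly-based detection with scikit-learn's IsolationForest.
# Feature names and the data source are hypothetical placeholders.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Pretend these are per-flow features: [bytes_sent, bytes_received, duration_s]
normal_traffic = rng.normal(loc=[500, 800, 1.0], scale=[50, 80, 0.2], size=(1000, 3))

# Learn what "normal" looks like; contamination is the expected anomaly rate.
detector = IsolationForest(contamination=0.01, random_state=42)
detector.fit(normal_traffic)

# A zero-day-like flow never seen before: huge upload, near-zero download.
suspicious_flow = np.array([[50_000, 10, 0.05]])
print(detector.predict(suspicious_flow))  # -1 means "anomaly", 1 means "normal"
```

Note that no signature of the suspicious flow was ever provided: the detector flags it simply because it deviates from the learned normal profile.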

Among the possible AI-based approaches, the most suitable for building an intrusion detection system are supervised, semi-supervised, and unsupervised ones. A supervised approach requires a labeled dataset (i.e. a dataset containing both normal and anomalous samples clearly identified by a domain expert) to be trained; given a high-quality, comprehensive dataset, it can turn the provided data into knowledge general enough to explain both normal behaviors and known attacks, without losing effectiveness on unknown ones. An unsupervised approach uses machine learning algorithms to analyze and cluster unlabeled datasets; these algorithms can discover hidden patterns in data without the need for human intervention. Semi-supervised approaches instead require a dataset containing only normal data, from which they model normal behavior to use as a reference for spotting unseen anomalies.
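The three regimes can be contrasted in a short sketch. Everything below uses toy random data and off-the-shelf scikit-learn models; the point is only which labels each approach needs, not the specific algorithms.

```python
# Sketch contrasting the three training regimes on toy data (all data hypothetical).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import OneClassSVM
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(500, 4))
attacks = rng.normal(4.0, 1.0, size=(50, 4))

# Supervised: needs both classes, labeled by a domain expert.
X = np.vstack([normal, attacks])
y = np.array([0] * len(normal) + [1] * len(attacks))
supervised = RandomForestClassifier().fit(X, y)

# Semi-supervised: trained on normal data only; outliers are flagged at test time.
semi_supervised = OneClassSVM(nu=0.05).fit(normal)

# Unsupervised: no labels at all; clusters may later be mapped to behaviors.
unsupervised = KMeans(n_clusters=2, n_init=10).fit(X)
```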

Effective IDS Can Be Computationally and Time Expensive

Most IDS solutions proposed by researchers are based on complex, resource-intensive models. These systems are often based on deep learning techniques that, despite being very effective, are computationally expensive and slow. The processing speed of these systems can be improved by increasing the processing power, for instance by relying on multi-core, high-performance GPUs, whose main drawbacks are high hardware cost and energy consumption. These characteristics can prevent such solutions from being deployed in several use cases where speed and pervasiveness are crucial, as in the case of IDSs aimed at providing real-time security to IoT networks. Sensor nodes in IoT environments are resource-constrained, with limited computational power, storage capacity, and battery life, making it impossible for high-performance deep learning models to be deployed in a distributed manner over the sensor nodes. In this scenario, either a lightweight IDS model with a good intrusion detection rate or innovative high-performance, energy-efficient hardware is needed.

To address the need for energy-efficient processors, neuromorphic computing utilizes asynchronous, event-based designs that run on top of semiconductor devices inspired by neurobiological architectures. Indeed, the human brain can consume less than 20 watts of power and still outperform von Neumann supercomputers on many tasks, demonstrating outstanding energy efficiency. Neuromorphic processors are dramatically different from traditional processors: the former are characterized by simple, highly interconnected processing elements and implement spiking neural network execution models.

Although hardware improvement is crucial to reach satisfactory resource efficiency, interesting progress can be achieved by working meticulously on the algorithmic side to build effective, lightweight IDSs. Algorithms like Extreme Learning Machines (ELM) can help. ELMs are simple feedforward neural networks with one or more layers of hidden nodes whose parameters do not need to be tuned: these hidden nodes can be either randomly assigned and never updated, or inherited from their ancestors. Unlike standard neural networks, ELMs do not rely on backpropagation for training; in most cases, the output weights of the hidden nodes are learned in a single step. As a consequence, these models show satisfying generalization capabilities and a training speed that can be thousands of times faster than backpropagation-based training, allowing quick on-the-fly learning and model updating in continual learning frameworks.
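A minimal ELM can be written in a few lines of NumPy: a random, frozen hidden layer followed by a least-squares solve for the output weights. The sketch below is a single-output regression variant on random toy data; the function names and shapes are illustrative only.

```python
# Minimal Extreme Learning Machine sketch in NumPy: random hidden layer,
# output weights solved in one step with a pseudoinverse (no backpropagation).
import numpy as np

rng = np.random.default_rng(7)

def elm_train(X, y, n_hidden=100):
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights, never updated
    b = rng.normal(size=n_hidden)                # random biases, never updated
    H = np.tanh(X @ W + b)                       # hidden-layer activations
    beta = np.linalg.pinv(H) @ y                 # single-step least-squares solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy usage on random data; shapes are illustrative only.
X_train, y_train = rng.normal(size=(200, 10)), rng.normal(size=200)
W, b, beta = elm_train(X_train, y_train)
preds = elm_predict(X_train, W, b, beta)
```

The absence of an iterative training loop is exactly what makes this family of models attractive for resource-constrained, continually updated deployments.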

AI-Generated Phishing Emails Are More Likely to Be Opened than Handwritten Ones

Artificial Intelligence is a double-edged sword. Although it is particularly helpful for preventing and reacting to cyber threats, threat actors can weaponize AI. Some examples are automatic target selection, attack design that hides from IDSs, deepfakes, human impersonation, password guessing, enumeration, attack surface scanning, and the crafting of convincing phishing emails. As these AI-aided attacks become more effective and widely adopted, enterprises need to take particular care to protect themselves. But how do we protect ourselves from AI-aided attacks if traditional security systems are often ineffective against them? Well, with AI-aided security systems.

For example, cybercriminals can leverage AI, combined with stolen personal information or data scraped from websites such as social networks, to create phishing emails that spread malware or collect valuable information. AI-generated phishing emails often turn out to be more credible than manually crafted ones and consequently have a higher probability of being opened. Natural Language Processing (NLP) can be used to spot such content, for instance through state-of-the-art large language models such as the Generative Pre-trained Transformer (GPT), a large neural network trained on massive datasets with astonishing capabilities. GPT-3 and similar models can provide a variety of benefits to cybersecurity applications, including natural-language-based threat detection, categorization of spam and malicious emails, and monitoring of online forums to prevent future threats. These models already have incredible capabilities, and soon the new, disruptive, more-powerful-than-ever fourth version of GPT will be released.
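As a down-to-earth illustration of NLP-based detection, here is a classical baseline: TF-IDF features feeding a logistic regression classifier. A production system would more likely fine-tune a transformer model, and the four-email corpus below is purely hypothetical.

```python
# Minimal NLP baseline for phishing detection: TF-IDF features + logistic
# regression. The tiny corpus and labels below are purely illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

emails = [
    "Your account has been locked, verify your password here immediately",
    "Urgent: confirm your banking details to avoid suspension",
    "Attached are the meeting notes from Tuesday",
    "Lunch at noon? The usual place works for me",
]
labels = [1, 1, 0, 0]  # 1 = phishing, 0 = legitimate

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(emails, labels)
print(clf.predict(["Please verify your password to keep your account active"]))
```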

Domain Shifting Relative to Simulated Datasets

As in many other domains, academia offers valuable, innovative solutions for solving modern-day cybersecurity problems. In the case of Network Intrusion Detection Systems, these solutions are often developed using academic datasets such as UNSW-NB15 and KDD Cup 1999, among others. These datasets are typically synthetically generated by simulating attacks in carefully designed networks. Although they represent valuable benchmarks for building proofs of concept of new algorithms and approaches, they show some limitations: 1) since they are generated in lab environments with limited resources, these datasets cannot contain all the possible existing attacks and their variants, but are limited to a selected subset of attacks; 2) the most widely adopted datasets were generated several years or even decades ago and, as a consequence, do not fully represent the characteristics of current networks; 3) their synthetic nature implies a bias in the traffic patterns due to structural factors and attack configurations. These differences between synthetic data and real-world traffic, commonly referred to as “domain shifting”, prevent IDSs trained on academic data from being deployed in production. To mitigate the domain shifting issue, high-quality, comprehensive, up-to-date, real-world datasets are needed. A step towards more modern network attacks on a real-world network came with the LITNET dataset, which was collected in 2020 on a Lithuanian network covering nodes in four major Lithuanian cities. This dataset is one of the first long-term (10 months), real-world network intrusion datasets produced and made available to researchers. However, it contains a limited number of attacks and is probably still insufficient to train AI-powered IDSs.

Privacy Concerns Relative to Real-World Datasets

Given academia's limited options caused by the lack of high-quality open datasets, relying on companies' private databases becomes necessary when developing cybersecurity systems. When these datasets are not simulated, they are generated by the companies' users, raising privacy and security concerns.

Homomorphic encryption is a type of encryption mechanism that could resolve these security and privacy issues. While public-key encryption involves three procedures (key generation, encryption, and decryption), homomorphic encryption adds a fourth: an evaluation algorithm. Homomorphic encryption allows third-party service providers to perform certain types of operations on a user's encrypted data without ever decrypting it.
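For a taste of how this works in practice, here is a sketch using the open-source python-paillier library (`phe`). Paillier is a partially homomorphic scheme that supports addition over ciphertexts, which already covers useful aggregation scenarios; the failed-login counters below are hypothetical.

```python
# Sketch of partially homomorphic encryption with python-paillier (pip install phe).
# Paillier is additively homomorphic: a third party can add encrypted values
# together without ever seeing the plaintexts.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair()

# The data owner encrypts sensitive counters before sharing them.
enc_a = public_key.encrypt(1500)   # e.g., failed logins observed at site A
enc_b = public_key.encrypt(2300)   # e.g., failed logins observed at site B

# The service provider aggregates ciphertexts without decrypting anything.
enc_total = enc_a + enc_b

# Only the key owner can decrypt the result.
print(private_key.decrypt(enc_total))  # 3800
```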

Collaborative Learning, also known as Peer-to-Peer Federated Learning, is a type of federated learning where a group of entities (e.g. companies) with a common goal forms a consortium to share model weights instead of sharing data. This allows all entities to obtain a model with high generalization capabilities, built from the contributions of several local models trained on limited local datasets. For example, let’s assume there are three security companies, each owning a dataset specialized in a different type of attack, like DDoS, ransomware, and probing. If each company trains a model on its own dataset, the resulting model will be able to predict only one kind of attack. If their objective is to obtain a general model that predicts every type of attack, the easiest way to proceed would be to share their data, but this would raise several issues regarding ownership, privacy, and security. These companies can instead join a consortium and share local model weights with each other, collaboratively training their model without compromising on privacy. Federated Learning can thus help bridge the gap toward obtaining quality data and creating better models.
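The core aggregation step, federated averaging, is surprisingly simple. In the hypothetical sketch below, random NumPy arrays stand in for the real layer weights of the three companies' local models; the raw training data never appears.

```python
# Sketch of weight sharing in a consortium: each member trains locally, then
# only the weights are averaged (federated averaging); raw data never leaves
# the company. Random arrays stand in for real model parameters.
import numpy as np

def federated_average(weight_sets):
    """Element-wise average of each layer's weights across consortium members."""
    return [np.mean(layer_stack, axis=0) for layer_stack in zip(*weight_sets)]

# Hypothetical local models from three companies (DDoS, ransomware, probing).
company_a = [np.random.rand(4, 8), np.random.rand(8)]
company_b = [np.random.rand(4, 8), np.random.rand(8)]
company_c = [np.random.rand(4, 8), np.random.rand(8)]

global_weights = federated_average([company_a, company_b, company_c])
```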

High Intra- and Inter-Node Variability

The normal behavior of a node in a complex network system is subject to high variability. This variability manifests in two ways: 1) as inter-variability, i.e. behavior differs across nodes, networks, geographical areas, etc.; 2) as intra-variability, i.e. the behavior of the same node varies over time, showing drift and seasonality. Several strategies are emerging to account for this variability.

One example is localized Federated Learning, an approach that allows a model to be tuned to its local context without aggregating data into massive centralized datasets for continual learning. According to this approach, a first generic model is trained on a first batch of data; then, the baseline model is distributed to all the nodes of the network, where it gets tuned on the local data available at each node, hence improving its contextual performance. The locally learned knowledge, in the form of model weights, is then exchanged with the main hub, aggregated using various techniques (federated averaging, XGBoost, FedProx, and Fed+), and used to update the general model, closing the loop. This process is iterated to keep the generic federated model and the local specialized models up-to-date with changes.

Modeling Normal Behavior to Spot Anomalies Requires Handling Massive, Complex, Heterogeneous Datasets

To spot anomalies and cyber attacks, an intrusion detection system must be able to understand, directly or indirectly, what the normal behavior of a network or a node is. This description can be obtained manually, when a domain expert defines a set of rules or signatures; automatically, when the patterns are learned from data through machine learning techniques; or in a hybrid way. In any case, the process requires a deep understanding of the data to extract useful patterns and sets of rules, but cybersecurity datasets can be quite complicated to handle. They can be massive (e.g. network packet records), complex (e.g. when containing different behavior types or many features), and heterogeneous, since they can contain different types of data like tabular data, time series, natural language, etc. (e.g. for phishing detection purposes). To simplify the dataset, highlight its most important features while getting rid of the noise, visualize it in a human-understandable manner, and analyze it, dimensionality reduction techniques can be used.

A dimensionality reduction technique is any kind of algorithm able to learn the relationships between the features (dimensions) of a dataset and to project its records onto a lower-dimensional space. Popular algorithms are Principal Component Analysis (PCA), Factor Analysis (FA), Linear Discriminant Analysis (LDA), and Truncated Singular Value Decomposition (SVD). More recently, more advanced techniques have been developed, such as t-distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP). In particular, the latter is catching the attention of researchers and product developers for its ability to preserve both short- and longer-range distance relationships in the data and to deal with multimodal data. Fun fact: the mathematics UMAP builds on (the theory of Riemannian manifolds) is closely related to the pseudo-Riemannian geometry Albert Einstein relied on to develop his general theory of relativity. Here is a trivial but effective example of how these techniques work.
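The sketch below is a minimal illustration using scikit-learn's PCA on random toy data; UMAP (from the `umap-learn` package) or t-SNE could be swapped in through the same fit/transform interface.

```python
# Trivial dimensionality reduction sketch: project a high-dimensional toy
# dataset down to 2D with PCA so it can be visualized and analyzed.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# 300 records with 20 features each, e.g., per-host network statistics.
records = rng.normal(size=(300, 20))

pca = PCA(n_components=2)
embedding = pca.fit_transform(records)   # shape (300, 2), ready to plot
print(pca.explained_variance_ratio_)     # variance captured by each new axis
```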

This projection onto a low-dimensional space makes it easier for the learner to identify homogeneous groups of records (i.e. clusters of samples sharing similar characteristics), both visually and mathematically. Each cluster can represent a different set of similar behaviors that can be considered independent from all the others. Thus, the clustering process allows the analyst to split the dataset into smaller portions, each with its own set of rules to be modeled. This is particularly crucial for boosting the interpretability and tractability of the dataset, as well as the effectiveness of the learned rules that describe normal and anomalous patterns. The clustering process can be automated by leveraging several algorithms, such as K-Means, Gaussian Mixture Models, HDBSCAN, and many others.
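Continuing the previous sketch, a clustering algorithm can partition the 2D embedding into candidate behavior groups automatically; K-Means is used here purely as an example.

```python
# Sketch of automated clustering on the 2D embedding from the previous
# snippet: K-Means splits the records into candidate behavior groups.
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(embedding)  # one cluster label per record

# Each cluster can now be modeled separately, e.g. one anomaly detector
# per behavior group rather than one monolithic model.
for cluster in range(3):
    print(f"cluster {cluster}: {(cluster_ids == cluster).sum()} records")
```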

False Positives

False positives can distract security analysts from real alerts and, much like the boy who cried wolf in Aesop's fable, can reduce confidence in alerts to the point where a real alert might be missed. While there is no single magic solution that eliminates false positives, several techniques can be combined to reduce false alerts to an acceptable level.

First, the overall design of an AI/ML model is critical. That starts with a proper exploratory analysis of the data and a firm understanding of those data, the domain, and the problem to be solved. The choice of features (variables) for a model is also critical. Much like Goldilocks, it's possible to have too many features (sometimes referred to as the curse of dimensionality) or too few variables for a given data stream. Thus, choosing an appropriate feature selection algorithm for the problem at hand is critical to the success of the model.
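As one hypothetical example of such a step, the sketch below uses scikit-learn's SelectKBest to keep only the features most associated with the label, trimming dimensionality before training.

```python
# Sketch of a simple feature selection step: keep only the k features most
# associated with the label, trimming dimensionality before training.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 50))               # 50 candidate features
y = (X[:, 0] + X[:, 3] > 0).astype(int)      # toy label driven by features 0 and 3

selector = SelectKBest(score_func=f_classif, k=5).fit(X, y)
X_reduced = selector.transform(X)            # shape (400, 5)
print(selector.get_support(indices=True))    # indices of the kept features
```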

There are dozens of possible AI/ML algorithms, and many more parameters, that can be applied to a particular problem. The careful selection of the right algorithm or algorithms is critical. In cybersecurity, different algorithms may have different strengths and weaknesses for detecting specific types of indicators of attack and compromise. Often several models can be combined into a hybrid or ensemble, and techniques such as voting methods can be used to form a consensus among several models.
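A minimal sketch of such a consensus, using scikit-learn's VotingClassifier over three heterogeneous models and synthetic toy data:

```python
# Sketch of a voting ensemble: several heterogeneous models form a consensus,
# which can dampen the false positives of any single detector.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("forest", RandomForestClassifier()),
        ("logreg", LogisticRegression(max_iter=1000)),
        ("bayes", GaussianNB()),
    ],
    voting="soft",  # average predicted probabilities instead of hard votes
)
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))
```

Soft voting averages the models' predicted probabilities, so an alert fires only when the pooled evidence is strong enough, rather than whenever any single model is spooked.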

Cyber situation awareness can also improve the accuracy of models by considering what else is going on in the environment over time and across the network in general. Attaching additional contextual data to alerts can likewise improve their accuracy. Finally, once an algorithm is in production, it needs to be monitored and tuned accordingly.

Conclusion

In this blog post, we briefly analyzed some of the main problems cybersecurity is facing today and will face in the future. We also pragmatically described several ways AI can help provide valuable solutions to these challenges. Cybersecurity Artificial Intelligence is a broad topic that certainly cannot be comprehensively covered in a single blog post. Therefore, this post has no ambition to be exhaustive, but it lays the groundwork for future posts that will better explore the fascinating relationship between artificial intelligence and cybersecurity. Here at DuskRise, we promise that future blog posts, like this one, will be pragmatic and fact-based, critically highlighting the opportunities and limitations of present and future solutions.

If you have read this far, you must have enjoyed the post. If so, don't miss our future publications. See you soon.

The corporate network perimeter has been extended into untrusted networks, redefining the enterprise edge. Employees working from home are using these networks to access sensitive company assets, putting organizations at risk of lateral movement attacks. The DuskRise solution enables corporate security and segmentation policy management, extending office-grade protection to remote assets and users. 

Schedule a demo today to see how it works.
