Bridging Theory and Practice: The Path to Computationally Feasible Machine Learning with Fully Homomorphic Encryption

Executive Summary:


This report provides a comprehensive analysis of Fully Homomorphic Encryption (FHE) as a transformative technology for privacy-preserving machine learning (PPML). It begins by establishing the cryptographic principles of FHE, its evolution, and its unique value proposition in securing data during computation. The core of the report is a deep-dive into the three fundamental challenges that have historically rendered FHE impractical: prohibitive performance overhead, the intricate problem of noise management, and the massive data expansion of ciphertexts and keys. We then present a multi-faceted analysis of the solutions being engineered to overcome these barriers. This includes a comparative review of modern FHE schemes (BGV, BFV, CKKS, TFHE) to identify their suitability for various ML tasks, an exploration of the software ecosystem of libraries and compilers that are making FHE more accessible, and a detailed survey of the hardware acceleration landscape, where FPGAs and ASICs are achieving performance gains of several orders of magnitude. The report synthesizes these threads to conclude that the practical application of FHE for ML is no longer a distant theoretical goal but an emerging reality, driven by a co-design approach that spans algorithms, software, and hardware.

 

The Cryptographic Paradigm of Fully Homomorphic Encryption

 

Fully Homomorphic Encryption (FHE) represents a paradigm shift in data security, offering the capability to perform arbitrary computations directly on encrypted data without the need for prior decryption.1 This unique property fundamentally alters the data security landscape by extending protection beyond data at rest and in transit to the processing stage itself, a phase where data has traditionally been most vulnerable. The result of a homomorphic computation remains encrypted; when decrypted by the key holder, this result is identical to what would have been obtained by performing the same operations on the original, unencrypted data.2 This capability enables a new class of secure outsourced computation, particularly in the context of cloud computing and third-party data analytics, where sensitive information can be processed without ever being exposed.

 

Conceptual Framework: From Privacy Homomorphisms to Arbitrary Computation

 

The theoretical underpinnings of FHE date back to 1978, when Rivest, Adleman, and Dertouzos first proposed the concept of “privacy homomorphisms”.4 They envisioned an encryption system where specific algebraic operations on plaintext data would have a corresponding operation in the ciphertext domain. For over three decades following this proposal, the cryptographic community only succeeded in developing Partially Homomorphic Encryption (PHE) schemes. These systems could support an unlimited number of operations of a single type—either addition or multiplication, but not both simultaneously.6 Famous examples include the RSA cryptosystem, which is multiplicatively homomorphic, and the Paillier cryptosystem, which is additively homomorphic.4

The long-standing challenge of creating a system that could handle both addition and multiplication, and thus arbitrary computation, was considered by many to be insurmountable. This changed dramatically in 2009 with Craig Gentry’s groundbreaking Ph.D. thesis, which presented the first plausible construction of an FHE scheme.1 Gentry’s work was revolutionary, demonstrating for the first time that it was theoretically possible to evaluate circuits of arbitrary depth and complexity on encrypted data.5 The central innovation that enabled this leap from partial to full homomorphism was a technique Gentry termed “bootstrapping.” This procedure is a method for managing the “noise” that is inherent in FHE ciphertexts and which grows with each successive operation. By effectively refreshing a ciphertext and resetting its noise level, bootstrapping allows for an unlimited number of computations.3

To achieve the goal of arbitrary computation, an FHE scheme must be able to evaluate a functionally complete set of operations. In the context of digital computation, this is universally achieved by supporting the homomorphic evaluation of bit-wise addition (equivalent to a Boolean XOR gate) and bit-wise multiplication (equivalent to a Boolean AND gate). Because the gate set {XOR, AND} is functionally complete, any computable function can be represented as a circuit of these gates. Therefore, a cryptosystem that can homomorphically evaluate both additions and multiplications can, in principle, compute any function on encrypted data.4
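To make the decomposition concrete, the short Python sketch below (plain code, with no encryption involved) builds a one-bit full adder, and from it a four-bit adder, using only XOR and AND gates. The gate helpers and function names are illustrative and are not part of any FHE library.

```python
# A one-bit full adder built only from XOR and AND, the two gates an FHE
# scheme must evaluate homomorphically to compute arbitrary functions.
def XOR(a, b): return a ^ b
def AND(a, b): return a & b

def full_adder(a, b, carry_in):
    """Return (sum_bit, carry_out) using only XOR and AND."""
    s = XOR(XOR(a, b), carry_in)
    # The two carry terms can never both be 1, so XOR acts as OR here.
    carry_out = XOR(AND(a, b), AND(carry_in, XOR(a, b)))
    return s, carry_out

def add_4bit(x, y):
    """Ripple-carry addition of two 4-bit integers expressed as a gate circuit."""
    carry, result = 0, 0
    for i in range(4):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result

assert all(add_4bit(x, y) == (x + y) % 16 for x in range(16) for y in range(16))
```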

 

Mathematical Foundations: Lattice-Based Cryptography and the Learning with Errors Problem

 

The security of most contemporary FHE schemes is rooted in the mathematical hardness of problems defined on lattices. Specifically, many schemes, including the most efficient and widely used ones, base their security on the Ring Learning with Errors (RLWE) problem.5 In these schemes, plaintext messages are encoded as polynomials, and encryption involves masking this polynomial with another polynomial that contains small, randomly generated “noise” coefficients. The ciphertext itself is typically represented as a pair of large-coefficient polynomials in a specific polynomial ring, such as R_q = Z_q[X]/(X^N + 1), where X^N + 1 is a cyclotomic polynomial.9 The security of the system relies on the assumption that it is computationally infeasible for an attacker, without the secret key, to distinguish a valid ciphertext from a pair of uniformly random polynomials in the ring. This difficulty is directly related to the hardness of solving the underlying RLWE problem.
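The toy Python sketch below illustrates this structure with deliberately small, insecure parameters: a message polynomial is hidden by a uniformly random polynomial and a small noise term, and is recovered only with the secret key. All parameter values and helper names are assumptions chosen for illustration, not taken from any real scheme or library.

```python
import numpy as np

# Toy RLWE-style encryption in the ring R_q = Z_q[X]/(X^N + 1); parameters are
# far too small to be secure and exist only to show the masking structure.
N, q, t = 8, 2 ** 16, 16          # ring degree, ciphertext modulus, plaintext modulus
DELTA = q // t                    # scale that lifts the message into the high bits
rng = np.random.default_rng(0)

def poly_mul(a, b):
    """Multiply mod X^N + 1 with coefficients mod q (schoolbook negacyclic)."""
    res = np.zeros(N, dtype=np.int64)
    for i in range(N):
        for j in range(N):
            if i + j < N: res[i + j] += a[i] * b[j]
            else:         res[i + j - N] -= a[i] * b[j]   # X^N = -1 in this ring
    return res % q

s = rng.integers(-1, 2, N)                       # small secret-key polynomial

def encrypt(m):
    a = rng.integers(0, q, N)                    # uniformly random mask
    e = rng.integers(-3, 4, N)                   # small noise polynomial
    b = (poly_mul(a, s) + e + DELTA * m) % q
    return a, b

def decrypt(ct):
    a, b = ct
    noisy = (b - poly_mul(a, s)) % q             # equals DELTA*m + e (mod q)
    return np.round(noisy / DELTA).astype(np.int64) % t

msg = rng.integers(0, t, N)                      # vector of small integers
assert np.array_equal(decrypt(encrypt(msg)), msg)
print("decrypted:", decrypt(encrypt(msg)))
```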

A significant and strategic advantage of this lattice-based foundation is its inherent resistance to attacks from quantum computers. Unlike classical public-key cryptosystems such as RSA and Elliptic Curve Cryptography (ECC), whose security relies on the difficulty of integer factorization and the discrete logarithm problem, respectively, lattice-based problems are not known to be efficiently solvable by quantum algorithms like Shor’s algorithm.12 This makes FHE a form of Post-Quantum Cryptography (PQC), positioning it as a durable, long-term solution for data security in an era where the threat of quantum computing is becoming increasingly tangible.4 This quantum resilience is not merely a technical footnote; it is a powerful strategic driver for the significant research and development investment in FHE. While the performance overhead of FHE is substantial, the alternative—using classical encryption—carries the risk of “harvest now, decrypt later” attacks, where adversaries store encrypted data today with the intent of decrypting it with a future quantum computer. For governments and enterprises dealing with data that must remain confidential for decades, the high computational cost of FHE serves as a necessary investment to future-proof their data infrastructure against this existential threat.

 

The Generational Evolution of FHE Schemes

 

The field of FHE has evolved rapidly since Gentry’s initial construction, with progress often categorized into distinct “generations,” each marked by significant improvements in performance, efficiency, and underlying mathematical techniques.5

  • First Generation: This generation includes Gentry’s original 2009 scheme, which was based on ideal lattices, and subsequent schemes like DGHV, which was built over the integers.5 These schemes were monumental in proving the feasibility of FHE but were far too slow for any practical application. Gentry’s first implementation, for instance, reported a timing of approximately 30 minutes for a single basic bit operation on standard hardware, highlighting the immense performance gap that needed to be closed.2
  • Second Generation: Emerging around 2011-2012, this generation brought major efficiency improvements by leveraging the Ring Learning with Errors (RLWE) problem. Key schemes from this era include BGV (Brakerski-Gentry-Vaikuntanathan) and BFV (Brakerski/Fan-Vercauteren).5 A crucial innovation of this generation was the ability to perform Single Instruction, Multiple Data (SIMD) operations. This technique, known as “packing,” allows a single ciphertext to encrypt a vector of multiple plaintext values, and a single homomorphic operation on the ciphertext applies the operation to all values in the vector simultaneously. This amortization of computational cost made these schemes efficient enough for a range of applications beyond simple proof-of-concept demonstrations.5
  • Third Generation: This generation, which includes schemes like FHEW (Ducas-Micciancio) and TFHE (Chillotti-Gama-Georgieva-Izabachene), focused on radically improving the performance of the most expensive FHE operation: bootstrapping. These schemes introduced a gate-by-gate bootstrapping method that was orders of magnitude faster than in previous generations, with TFHE achieving bootstrapping in under a second.5 This made it feasible to evaluate circuits of arbitrary depth without prohibitive latency penalties for noise management. However, these schemes initially lacked the efficient SIMD capabilities of their second-generation counterparts.5
  • Fourth Generation: The fourth generation is primarily defined by the Cheon-Kim-Kim-Song (CKKS) scheme, which introduced the concept of approximate homomorphic encryption.5 Unlike previous schemes that performed exact arithmetic on integers, CKKS was designed to perform approximate arithmetic on real or complex numbers. It achieves this by treating the inherent cryptographic noise as part of the overall approximation error, analogous to floating-point errors in standard computation. This approach proved to be extremely efficient for applications that are tolerant of small precision errors, most notably machine learning, making CKKS a cornerstone of modern privacy-preserving AI.5

 

FHE in the Privacy Technology Landscape: A Comparison with Confidential Computing and MPC

 

FHE is one of several advanced technologies aimed at protecting data during processing, and its unique approach sets it apart from other methods like Confidential Computing and Secure Multi-Party Computation (SMPC). Understanding these differences is crucial for appreciating the specific security model FHE provides.

FHE represents a fundamental shift away from the traditional security philosophy of perimeter defense. For decades, data security has focused on securing the infrastructure—building firewalls, controlling access, and hardening servers—under the assumption that if an attacker breaches the system, any data being processed in the clear is compromised.4 FHE operates on the starkly different assumption that the infrastructure is already or will inevitably be compromised.4 It provides security not by protecting the environment, but by making the data itself computationally indecipherable at all times, even while it is being actively processed. This moves the anchor of trust from the physical or virtual computing environment to the mathematical guarantees of the underlying cryptography.

  • FHE vs. Confidential Computing: Confidential Computing technologies, such as those based on Trusted Execution Environments (TEEs) like Intel SGX or AMD SEV, aim to protect data in use by creating isolated, hardware-based secure enclaves.1 Within these enclaves, data is decrypted, processed in plaintext, and then re-encrypted before leaving. The primary difference lies in the trust model. Confidential Computing requires trust in the hardware manufacturer and the integrity of the TEE implementation. FHE, by contrast, is a purely cryptographic solution that requires trust only in the underlying mathematics of the encryption scheme.1 While Confidential Computing currently offers significantly better performance for general-purpose computing, it still exposes plaintext data within the hardware enclave, a potential attack surface that FHE completely eliminates.1 These two technologies can also be used in a complementary fashion to provide layered security.
  • FHE vs. Secure Multi-Party Computation (SMPC): FHE is typically characterized as a non-interactive protocol for outsourced computation. A single client encrypts data and sends it to a server, which performs computations without any further interaction with the client until the encrypted result is returned.7 SMPC, on the other hand, is an interactive cryptographic protocol involving multiple parties who wish to jointly compute a function of their private inputs without revealing those inputs to one another.17 While both achieve the goal of computing on private data, SMPC requires continuous communication and coordination among participants, whereas FHE is better suited for client-server scenarios. The two are not mutually exclusive; for example, FHE can be used as a tool within an SMPC protocol to secure certain computations.

 

The Intersection of FHE and Machine Learning

 

The convergence of Fully Homomorphic Encryption and machine learning has created the field of Privacy-Preserving Machine Learning (PPML), a domain with the potential to unlock the value of sensitive data in industries like healthcare, finance, and beyond. By enabling ML models to be trained and executed directly on encrypted data, FHE addresses the critical privacy gap that occurs when data is processed by third-party services.

 

Enabling Privacy-Preserving Machine Learning (PPML)

 

The most prominent application of FHE in machine learning is the Machine-Learning-as-a-Service (MLaaS) scenario.5 In this model, a cloud provider hosts a powerful, pre-trained ML model (e.g., for medical diagnosis, fraud detection, or image recognition), and clients wish to use this service for inference on their private data. Traditional MLaaS requires the client to send their data in plaintext to the provider’s server, creating a significant privacy risk and a single point of failure where the sensitive data is exposed.13

FHE provides an elegant solution to this problem. The workflow for privacy-preserving inference proceeds as follows:

  1. The client, who possesses the public and secret keys for an FHE scheme, encrypts their sensitive input data (e.g., a patient’s medical scan) using the public key.19
  2. This encrypted data, or ciphertext, is sent to the MLaaS provider’s server.
  3. The server executes its ML model homomorphically on the ciphertext. Every operation in the model—from matrix multiplications to activation functions—is performed on the encrypted data without ever decrypting it.13
  4. The server obtains an encrypted prediction as the result of the inference.
  5. This encrypted result is sent back to the client.
  6. The client uses their secret key, which never left their possession, to decrypt the result and obtain the final prediction in plaintext.13

Throughout this entire process, the server learns nothing about the client’s input data or the resulting prediction, ensuring end-to-end privacy.20 FHE can also be extended to the model training phase. This allows multiple data owners (e.g., different hospitals) to pool their encrypted datasets to collaboratively train a more accurate and robust ML model than any single institution could train on its own, all without revealing their sensitive individual datasets to each other or to a central server.8 This is particularly powerful when combined with frameworks like federated learning.8
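As a concrete illustration of this workflow, the sketch below uses the open-source TenSEAL library (CKKS scheme) to run a toy linear model on an encrypted input. The parameter choices, feature values, and model weights are illustrative assumptions, and a real deployment would serialize the context to the server without the secret key rather than sharing a single process.

```python
# Minimal sketch of the encrypted-inference workflow above using TenSEAL/CKKS.
import tenseal as ts

# -- Client side: key generation and encryption (steps 1-2) ---------------
context = ts.context(ts.SCHEME_TYPE.CKKS,
                     poly_modulus_degree=8192,
                     coeff_mod_bit_sizes=[60, 40, 40, 60])
context.global_scale = 2 ** 40
context.generate_galois_keys()            # rotations needed inside dot()

patient_features = [0.2, 1.5, 3.1, 0.7]
enc_input = ts.ckks_vector(context, patient_features)

# -- Server side: homomorphic evaluation of a toy linear model (steps 3-4) -
weights = [0.25, -0.1, 0.4, 0.05]
bias = 0.3
enc_score = enc_input.dot(weights) + [bias]   # encrypted prediction

# -- Client side: decryption with the secret key (steps 5-6) ---------------
print("decrypted score:", enc_score.decrypt()[0])
print("plaintext score:", sum(w * x for w, x in zip(weights, patient_features)) + bias)
```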

 

Architecting FHE-Friendly Neural Networks

 

Standard machine learning models, particularly deep neural networks, are not inherently compatible with the computational constraints of FHE. To make them work in the encrypted domain, models must be carefully adapted and re-architected. This process primarily involves addressing two major challenges: handling non-linear activation functions and converting from floating-point to integer-based arithmetic.

 

The Challenge of Non-Linear Activation Functions

 

Neural networks derive their expressive power from non-linear activation functions, such as the Rectified Linear Unit (ReLU), Sigmoid, or hyperbolic tangent (Tanh), which are applied after the linear operations in each layer.22 However, these functions are not polynomials and thus cannot be evaluated natively by most FHE schemes, which are typically restricted to polynomial operations (additions and multiplications).24

The prevailing solution to this problem is to replace the standard activation functions with low-degree polynomial approximations that mimic their behavior over a specific range.23 Early pioneering work in this area, such as Microsoft’s CryptoNets, took a simple approach by replacing the ReLU function with the square function f(x) = x², which is a simple polynomial that introduces non-linearity.14 While effective for shallow networks, this approximation is not very accurate. More recent and advanced methods employ sophisticated techniques, such as Chebyshev series or the Remez algorithm, to find optimal low-degree polynomial approximations of functions like ReLU or Sigmoid.24

This approach introduces a critical trade-off between model accuracy and computational performance. A higher-degree polynomial can approximate the original activation function more accurately, leading to better model performance. However, evaluating a higher-degree polynomial requires a greater number of homomorphic multiplications. Since each multiplication significantly increases the noise in a ciphertext and is computationally expensive, using high-degree polynomials leads to slower inference times and requires larger, more cumbersome FHE parameters.25 Consequently, designing FHE-friendly networks involves a careful balancing act to find the lowest-degree polynomial that still provides acceptable model accuracy.
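The degree-versus-accuracy trade-off can be made visible with a few lines of NumPy, using Chebyshev fits as a simple stand-in for the Remez-style approximations used in practice; the interval [-5, 5] and the degrees shown are arbitrary choices for illustration.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Degree-vs-accuracy trade-off when replacing ReLU with a polynomial.
xs = np.linspace(-5, 5, 1001)
relu = np.maximum(xs, 0)

for degree in (2, 4, 8):
    coeffs = C.chebfit(xs, relu, degree)
    approx = C.chebval(xs, coeffs)
    max_err = np.max(np.abs(approx - relu))
    # Higher degree -> smaller error, but more homomorphic multiplications
    # (and therefore more noise and a deeper circuit) under encryption.
    print(f"degree {degree}: max |error| over [-5, 5] = {max_err:.3f}")
```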

 

Quantization and Integer-Based Arithmetic for FHE Compatibility

 

The second major adaptation involves data representation. Most machine learning models are trained and executed using high-precision floating-point numbers (e.g., 32-bit float or 16-bit bfloat16). However, the most common FHE schemes, such as BFV and BGV, are designed to operate on integers within a finite field or ring.29 Even the approximate arithmetic scheme CKKS, which handles real numbers, does so by encoding them as scaled integers within polynomials. Therefore, all data involved in the ML model—including the input features, model weights, and biases—must be converted from floating-point to a fixed-point integer representation before encryption. This conversion process is known as quantization.29

A naive approach, known as Post-Training Quantization (PTQ), involves training a model with floating-point numbers and then simply quantizing the learned weights to integers. This often leads to a significant degradation in model accuracy, as the model was not designed to tolerate the loss of precision. A more effective and widely adopted technique is Quantization-Aware Training (QAT).30 QAT simulates the effects of low-precision arithmetic during the training process itself. It inserts “fake” quantization operations into the neural network graph, forcing the model to learn parameters that are robust to the precision loss that will occur during encrypted inference. By making the model aware of the quantization constraints during training, QAT allows for the use of very low bit-widths (e.g., 8-bit or even 4-bit integers) while maintaining high accuracy, which is crucial for FHE performance.33 Modern PPML frameworks, such as Zama’s Concrete ML, integrate QAT directly into their toolchain, allowing data scientists to automatically produce quantized, FHE-ready models from standard ML frameworks.13
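The sketch below shows the core quantize/dequantize round trip in plain NumPy; QAT inserts exactly this kind of operation into the training graph so the model adapts to the rounding error. The bit-widths and helper names are illustrative and are not taken from any particular framework.

```python
import numpy as np

# Uniform symmetric quantization: the conversion that maps floating-point
# weights and activations to the small integers FHE schemes operate on.
def quantize(x, n_bits=8):
    """Map floats to signed integers on n_bits, returning (ints, scale)."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    return np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32), scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

weights = np.random.randn(6).astype(np.float32)
for n_bits in (8, 4):
    q, scale = quantize(weights, n_bits)
    err = np.max(np.abs(dequantize(q, scale) - weights))
    print(f"{n_bits}-bit quantization, max round-trip error: {err:.4f}")
```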

The constraints imposed by FHE—the need for polynomial-only operations and low-precision integer arithmetic—have a profound effect on the design of ML models. They introduce a “simplicity bias,” steering architectural choices away from the ever-increasing complexity often seen in plaintext ML (e.g., extremely deep networks, novel non-polynomial activations) and toward models that are inherently more efficient in terms of their arithmetic complexity. An ML engineer designing for an FHE deployment is not solely optimizing for accuracy but for what can be termed “homomorphic complexity”—a composite metric that includes the model’s multiplicative depth, its tolerance for low-precision quantization, and the degree of its polynomial activation functions. This leads to a distinct set of optimal architectures that may differ significantly from their plaintext counterparts. This complex, multi-dimensional optimization space—balancing network topology, quantization parameters, and polynomial approximations—is an ideal domain for automation. This points toward a future where FHE-aware Automated Machine Learning (AutoML) frameworks will become essential. Such systems would abstract away the cryptographic complexities, allowing a developer to specify a dataset and a target privacy-performance budget, and would automatically search for and generate a fully optimized, FHE-ready model.

 

Use Cases in High-Stakes Domains: Healthcare and Finance

 

The ability to perform machine learning on encrypted data is particularly transformative for industries that handle highly sensitive information and are bound by strict regulatory frameworks.

  • Healthcare: The healthcare sector is a prime example where data is abundant but heavily siloed due to privacy regulations like HIPAA in the United States.2 FHE offers a path to break down these silos securely. For instance, multiple hospitals could pool their encrypted patient records to train a more powerful diagnostic AI for detecting rare diseases. Researchers could perform large-scale genomic analyses or identify correlations between diseases and demographics across diverse populations without ever accessing individual patient data in the clear.3 This unlocks the potential for unprecedented medical discovery while upholding the highest standards of patient confidentiality.36
  • Finance: In the financial industry, FHE enables new forms of secure collaboration. A consortium of banks could, for example, collaboratively train a fraud detection model on their combined, encrypted transaction data. Such a model could identify sophisticated, cross-institutional fraud rings that would be invisible to any single bank operating on its own data.3 Similarly, a financial firm could use a third-party analytics service to perform complex risk modeling on its encrypted customer portfolios without revealing its proprietary trading strategies or sensitive client information.37 FHE can also be applied to build more accurate credit scoring models by securely incorporating data from multiple sources, all while complying with data privacy laws like GDPR.35

 

Analysis of Core Computational Barriers

 

Despite its transformative potential, the widespread adoption of FHE has been historically hindered by several fundamental computational challenges. These barriers—prohibitive performance overhead, the intricate mechanics of noise management, and the massive expansion of data size—have been the primary focus of FHE research for the past decade. Overcoming them is the key to making FHE a practical technology for real-world machine learning applications.

 

The Performance Chasm: Quantifying the Computational Overhead

 

The most significant obstacle to practical FHE is its immense computational cost. Operations performed on encrypted data are dramatically slower than their equivalents on plaintext, with slowdowns frequently cited to be between four and six orders of magnitude—that is, 10,000 to 1,000,000 times slower.11

This staggering overhead originates from the complex mathematical structures that underpin FHE schemes. A simple arithmetic operation, such as adding or multiplying two integers, is transformed into a complex series of operations on large-degree polynomials with very large coefficients.11 For example, in RLWE-based schemes, a single plaintext number is encoded into a polynomial, and encryption expands this into a pair of polynomials whose coefficients are drawn from a large integer modulus. A homomorphic multiplication then involves several polynomial multiplications, which are computationally intensive tasks in themselves.

This overhead translates directly into high latency for applications. Gentry’s original FHE scheme, while a theoretical marvel, took about 30 minutes to evaluate a single logic gate.2 While modern schemes have made extraordinary progress, the performance gap remains substantial. A single homomorphic NAND gate evaluation in a scheme like TFHE, for instance, takes on the order of milliseconds, whereas a native hardware gate operates in nanoseconds—a difference of roughly six orders of magnitude.40 In the context of machine learning, this means that a logistic regression training task that might complete in minutes on unencrypted data can take many hours when performed homomorphically.21 For deep neural networks with millions of parameters and operations, this performance penalty can stretch inference times from milliseconds to minutes or even hours, rendering real-time applications infeasible without specialized acceleration.

 

The Noise Dilemma: Managing Error Growth in Encrypted Computations

 

A unique and fundamental challenge in FHE is the management of “noise.” Unlike in traditional computing where noise is an unwanted artifact, in lattice-based FHE, it is an essential component for security. However, this same noise is also the primary limiting factor on the complexity of computations that can be performed.

 

The Mechanics of Noise Growth

 

The security of RLWE-based cryptosystems relies on the introduction of a small, random error or “noise” term during the encryption process. This noise effectively masks the underlying plaintext message within the mathematical structure of the ciphertext, making it computationally difficult to recover the message without the secret key.4

The complication arises because this noise accumulates with every homomorphic operation performed on the ciphertext. Homomorphic addition typically causes the noise to grow at a linear rate; for example, the noise in the sum of two ciphertexts is roughly the sum of their individual noises. Homomorphic multiplication, however, causes a much more rapid, multiplicative or exponential growth in noise.4

Every FHE ciphertext has an associated “noise budget” or “noise ceiling,” which is a threshold determined by the scheme’s parameters. If the accumulated noise from successive operations exceeds this threshold, the noise will overwhelm the original message signal within the ciphertext. At this point, the ciphertext becomes corrupted, and attempting to decrypt it will fail to produce the correct plaintext result.7 This inherent limitation means that for a given set of parameters, only a finite number of operations—particularly multiplications—can be performed before the noise budget is exhausted. A scheme that can only support a pre-determined, limited depth of computation is known as a Leveled FHE or a Somewhat Homomorphic Encryption (SHE) scheme.2
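A toy model of this budget, with no cryptography involved and deliberately simplified growth rules, makes the behavior visible: additions grow noise slowly, multiplications grow it rapidly, and decryption fails once an arbitrary ceiling is crossed.

```python
# A toy model of noise-budget consumption, not a real cryptosystem: additions
# add noise roughly linearly, multiplications multiply it, and decryption fails
# once the noise exceeds the ceiling fixed by the scheme's parameters.
class ToyCiphertext:
    NOISE_CEILING = 2 ** 40          # illustrative threshold

    def __init__(self, noise=2 ** 10):
        self.noise = noise

    def __add__(self, other):
        return ToyCiphertext(self.noise + other.noise)

    def __mul__(self, other):
        return ToyCiphertext(self.noise * other.noise)

    def decryptable(self):
        return self.noise < self.NOISE_CEILING

ct = ToyCiphertext()
for depth in range(1, 6):
    ct = ct * ToyCiphertext()        # each level is one homomorphic multiplication
    print(f"multiplicative depth {depth}: noise≈2^{ct.noise.bit_length()-1}, "
          f"decryptable={ct.decryptable()}")
```

In this toy model only about three multiplicative levels fit under the ceiling; bootstrapping, discussed next, is the operation that resets the budget and removes that limit.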

 

Bootstrapping: The Recrypting Engine for Unbounded Computation

 

To transcend the limitations of leveled schemes and achieve true Fully Homomorphic Encryption, a mechanism is needed to manage and reduce the accumulated noise. This mechanism is bootstrapping. First proposed by Gentry, bootstrapping is a remarkable procedure that effectively “refreshes” a ciphertext that is close to its noise limit, reducing its noise back to a low, manageable level.2

The process works, counter-intuitively, by homomorphically evaluating the decryption circuit itself. A simplified view of the process is as follows: the server takes a noisy ciphertext c (which encrypts a message m) and a public “bootstrapping key,” which is an encryption of the secret key sk. The server then uses the homomorphic evaluation capabilities of the scheme to compute the decryption function Dec(c, sk) in the encrypted domain. The output of this homomorphic decryption is a new ciphertext, c', which also encrypts the same message m. However, the noise in this new ciphertext c' is not related to the high noise level of the original ciphertext c; instead, its noise is at a fresh, low level determined only by the operations performed during the bootstrapping procedure itself.4

By resetting the noise budget, bootstrapping makes it possible to perform an arbitrary number of subsequent operations, thus enabling circuits of unlimited depth.7 However, this power comes at a tremendous computational cost. The bootstrapping procedure is itself a complex computation involving many homomorphic operations, and it has historically been the single greatest performance bottleneck in FHE systems.7 A significant portion of FHE research over the last decade has been dedicated to designing more efficient schemes and faster algorithms for bootstrapping, with schemes like TFHE making notable progress by reducing bootstrapping times to the sub-second range.6

 

Alternative Noise Control: Modulus Switching and Rescaling

 

While bootstrapping is the universal method for achieving FHE, some schemes employ other techniques to manage noise growth for leveled computations. These methods do not reset noise but rather slow its growth, allowing for deeper circuits before bootstrapping becomes necessary.

  • Modulus Switching: This technique is a hallmark of the BGV scheme. After a homomorphic multiplication, which significantly increases the magnitude of the noise, the ciphertext modulus is “switched” to a smaller one. This is done by scaling down all the coefficients of the ciphertext polynomials. This operation reduces the magnitude of the noise term more than it reduces the magnitude of the message term, effectively increasing the signal-to-noise ratio and extending the remaining noise budget.9 A computation in BGV proceeds through a pre-defined “ladder” of decreasing moduli.
  • Rescaling: The CKKS scheme for approximate arithmetic uses a conceptually similar technique called rescaling. In CKKS, a plaintext is scaled by a large factor Δ before encryption. A multiplication of two ciphertexts results in a new ciphertext where the underlying plaintext is scaled by Δ². The rescaling operation is a form of modulus switching that divides the ciphertext by Δ, returning the scaling factor to its original level and, in the process, reducing the magnitude of the error that was introduced during the multiplication.47 This allows noise to grow linearly with the depth of the circuit, rather than exponentially, which is a key reason for CKKS’s efficiency in deep computations like neural networks.48 A small numeric sketch of this scaling behavior follows below.
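The plain-integer sketch below mimics the CKKS scaling behavior with no encryption involved; the scaling factor Δ and the example values are arbitrary choices made purely to show why rescaling is needed after each multiplication.

```python
# A plain-integer sketch of CKKS-style scaling and rescaling (no encryption):
# real values are encoded as integers scaled by DELTA, multiplication squares
# the scale, and rescaling divides by DELTA to bring it back down.
DELTA = 2 ** 20                     # scaling factor

def encode(x):   return round(x * DELTA)
def decode(m):   return m / DELTA

a, b = 3.14159, 2.71828
ma, mb = encode(a), encode(b)

product = ma * mb                   # underlying value is now scaled by DELTA**2
rescaled = product // DELTA         # "rescale": divide by DELTA, dropping low bits

print("approx a*b:", decode(rescaled))
print("exact  a*b:", a * b)
```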

 

The Data Deluge: Ciphertext Expansion and Key Management Complexities

 

A final practical barrier to FHE adoption is the massive expansion in data size that occurs upon encryption. FHE ciphertexts are significantly larger than their corresponding plaintexts, often by several orders of magnitude. A single byte of plaintext data can expand to a ciphertext that is hundreds of kilobytes or even megabytes in size.11 This “data size inflation” has profound implications for system design, placing immense strain on memory capacity, storage systems, and network bandwidth, especially when dealing with large datasets typical in machine learning.50

The size of ciphertexts is directly coupled to the security and computational parameters of the FHE scheme. To achieve a desired security level (e.g., 128-bit security) and to support a sufficient multiplicative depth for a given computation, parameters such as the polynomial degree N and the size of the coefficient modulus q must be chosen appropriately. Larger values for these parameters provide greater security and a larger noise budget, but they also result in larger ciphertexts and keys, and slower homomorphic operations. This creates a tight and often difficult trade-off between security, functionality, and performance.51
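A back-of-the-envelope calculation illustrates this coupling: an RLWE ciphertext is roughly two polynomials of degree N with coefficients modulo q. The parameter values below are illustrative of a mid-sized configuration and are not a recommendation.

```python
# Rough ciphertext size for an RLWE scheme: two degree-N polynomials with
# coefficients modulo q. Values are illustrative only.
N = 16384                # polynomial degree
log2_q = 438             # total bits in the coefficient modulus chain
ciphertext_bits = 2 * N * log2_q
print(f"~{ciphertext_bits / 8 / 2**20:.1f} MiB per ciphertext")      # ≈ 1.7 MiB

plaintext_bits = N * 32  # the same number of slots as unencrypted 32-bit values
print(f"expansion vs. fully packed plaintext: ~{ciphertext_bits / plaintext_bits:.0f}x")
```

Because a single ciphertext can pack thousands of plaintext slots, the effective expansion per value is far smaller than the headline per-ciphertext figure, a point developed in the next paragraphs.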

Furthermore, FHE systems require the management of multiple types of cryptographic keys, all of which can be very large. In addition to the standard public and secret keys, FHE schemes require special “evaluation keys” to manage the results of homomorphic operations. These include relinearization keys (used to reduce the size of ciphertexts after multiplication) and rotation keys (used for SIMD vector permutations). For schemes that require bootstrapping, a large bootstrapping key is also needed. Securely generating, storing, and distributing these keys, which can collectively amount to gigabytes of data for a single user, presents a significant logistical and security challenge in large-scale applications.50

The seemingly prohibitive per-operation costs of FHE can be misleading when considered in isolation. The path to practical feasibility lies in amortizing this cost over large quantities of data. The most critical technique for achieving this is SIMD (Single Instruction, Multiple Data) processing, also known as “packing.” Schemes like BFV, BGV, and CKKS allow a single ciphertext to be structured as an encryption of a vector containing thousands of individual plaintext values.5 When a homomorphic operation (e.g., addition or multiplication) is performed on this single ciphertext, the operation is executed in parallel on all the packed values. Consequently, while the latency of one homomorphic operation remains high, the overall throughput, measured in plaintext operations per second, can be made practical for data-parallel workloads. This shifts the focus of optimization from minimizing per-operation latency to maximizing the number of parallel operations per ciphertext, a strategy that aligns perfectly with the vector and matrix computations that dominate machine learning algorithms.
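The sketch below illustrates packing with TenSEAL’s CKKS vectors: thousands of values share a single ciphertext, and one homomorphic operation updates all of them at once. The parameter and slot choices are illustrative assumptions.

```python
import tenseal as ts

# SIMD "packing" sketch with TenSEAL/CKKS: many plaintext values share one
# ciphertext, so one ciphertext operation amortizes over all of them.
context = ts.context(ts.SCHEME_TYPE.CKKS,
                     poly_modulus_degree=16384,
                     coeff_mod_bit_sizes=[60, 40, 40, 60])
context.global_scale = 2 ** 40

values = [float(i) for i in range(4096)]     # 4096 slots packed together
enc = ts.ckks_vector(context, values)

enc_doubled = enc * 2.0                      # one ciphertext op = 4096 plaintext ops
print(enc_doubled.decrypt()[:5])             # ≈ [0.0, 2.0, 4.0, 6.0, 8.0]
```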

Moreover, the nature of noise in FHE is not just a technical hurdle but also a feature that can be leveraged. While noise is fundamentally required for the security of lattice-based schemes, the CKKS scheme uniquely embraces the imprecision it causes. Instead of treating noise as an error to be strictly segregated from the message, as in exact arithmetic schemes like BFV, CKKS merges the noise with the message payload, framing the entire computation as approximate.47 This reframes the cryptographic problem of “managing a noise budget” into the more familiar data science problem of “managing numerical precision,” akin to handling floating-point errors.19 This conceptual alignment significantly lowers the barrier to entry for ML practitioners and is a primary reason for CKKS’s widespread adoption in the PPML community.

 

A Comparative Analysis of Modern FHE Schemes for Machine Learning

 

The evolution of FHE has produced several distinct families of schemes, each with unique characteristics, strengths, and weaknesses. For practical machine learning, four schemes have emerged as the most prominent: BGV, BFV, CKKS, and TFHE. There is no single “best” scheme; the optimal choice is highly dependent on the specific requirements of the machine learning task, such as the need for exact integer arithmetic versus approximate real-number computation, or the prevalence of non-linear operations.28

 

BGV and BFV: Schemes for Exact Integer Arithmetic

 

The Brakerski-Gentry-Vaikuntanathan (BGV) and Brakerski/Fan-Vercauteren (BFV) schemes are second-generation, “word-based” FHE schemes designed for performing exact computations on integers or elements of a finite field.26 Both are based on the RLWE problem and excel at highly parallelizable arithmetic through SIMD packing.10

  • Core Functionality: BGV and BFV are ideal for applications where perfect precision is non-negotiable. They operate over polynomial rings with integer coefficients, and all homomorphic additions and multiplications are performed modulo both a polynomial modulus and a plaintext modulus t.
  • Key Differences: The primary distinction between the two lies in their approach to noise management. BGV is a scale-dependent scheme that uses the modulus switching technique. As computations are performed, the ciphertext modulus is progressively reduced to control the growth of noise.6 In contrast, BFV is a scale-invariant scheme. It manages noise by encoding the plaintext message into the most significant bits of the ciphertext’s polynomial coefficients, leaving the least significant bits to accommodate the noise. This design can be simpler to implement and reason about in some scenarios.6
  • Machine Learning Suitability: Because most machine learning algorithms are based on real-number arithmetic, BGV and BFV are not always the most natural fit. They require careful quantization of all data to integers, and the modular arithmetic they perform can sometimes lead to unexpected “wrap-around” effects if intermediate values exceed the plaintext modulus. However, they are highly valuable for specific ML tasks that require exactness, such as counting operations, secure database lookups, or models that rely on integer-based features.26 For example, Apple has reported using the BFV scheme to compute dot products and cosine similarity on integer-based embedding vectors for its Enhanced Visual Search feature.55

 

CKKS: The De Facto Standard for Approximate Arithmetic in ML

 

The Cheon-Kim-Kim-Song (CKKS) scheme represents a significant departure from its predecessors and is widely considered the de facto standard for FHE in machine learning.

  • Core Functionality: CKKS is specifically designed for approximate arithmetic on vectors of real or complex numbers.5 It achieves this by cleverly re-purposing the inherent cryptographic noise. Instead of treating noise as a separate entity to be managed, CKKS considers it an integral part of the computation’s overall precision error, much like the rounding errors that occur in standard floating-point arithmetic.19
  • Key Features: CKKS employs an efficient rescaling operation to manage the magnitude of plaintext values and control error growth after multiplications, allowing for deep arithmetic circuits.47 It also features powerful and highly efficient SIMD capabilities, enabling parallel operations on thousands of real numbers packed into a single ciphertext.46
  • Machine Learning Suitability: The design of CKKS makes it exceptionally well-suited for a broad range of machine learning applications, where small precision errors in computation are generally tolerable and do not significantly impact the final model accuracy.46 It is the scheme of choice for implementing encrypted linear algebra operations, such as matrix-vector and matrix-matrix multiplications, which form the backbone of neural networks. Consequently, it is the dominant scheme used in research and practical implementations of privacy-preserving deep learning inference and gradient descent-based training.8

 

TFHE: Excelling in Boolean Logic and Non-Arithmetic Operations

 

The Torus Fully Homomorphic Encryption (TFHE) scheme offers a fundamentally different approach to computation, focusing on bit-wise operations and Boolean logic.

  • Core Functionality: TFHE operates on individual encrypted bits, allowing for the homomorphic evaluation of arbitrary Boolean circuits composed of gates like AND, OR, and NOT.58 Its defining characteristic is an extremely fast bootstrapping procedure, which can be performed in milliseconds.6 A key innovation in TFHE is “programmable bootstrapping,” which allows the evaluation of an arbitrary function (represented as a lookup table) on an encrypted bit and the refreshing of the ciphertext to occur in a single, efficient step.
  • Key Features: Because TFHE can evaluate any function on bits, it can natively and exactly handle non-arithmetic operations that are very difficult or inefficient for word-wise schemes like CKKS or BFV. This includes crucial operations like comparisons (e.g., >, <), finding the maximum or minimum of a set of numbers, and evaluating the sign function.54
  • Machine Learning Suitability: TFHE’s strengths make it ideal for evaluating the non-linear components of machine learning models. For example, it can be used to implement an exact ReLU activation function (ReLU(x) = max(0, x)) by using a comparison gate, whereas CKKS must rely on a polynomial approximation; a lookup-table sketch follows this list. It is also well-suited for evaluating decision trees, which consist of a series of comparisons. However, TFHE’s bit-wise nature makes it very inefficient for the high-throughput arithmetic required for the linear layers (e.g., large matrix multiplications) of a neural network, where schemes like CKKS have a clear advantage.46
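The lookup-table idea behind programmable bootstrapping can be sketched in plain Python. There is no cryptography here; the code only shows how small-domain functions such as ReLU or the sign function become tables that TFHE would evaluate while refreshing a ciphertext.

```python
# Plain-Python sketch of the lookup-table view behind programmable bootstrapping.
BITS = 4                                      # small message space: 0..15

def as_signed(v):                             # interpret a 4-bit value as signed
    return v - 2 ** BITS if v >= 2 ** (BITS - 1) else v

relu_table = [max(as_signed(v), 0) for v in range(2 ** BITS)]
sign_table = [1 if as_signed(v) > 0 else 0 for v in range(2 ** BITS)]

def apply_lut(encrypted_value, table):
    # In real TFHE the index stays encrypted and the lookup happens inside the
    # bootstrapping procedure; here it is an ordinary table access.
    return table[encrypted_value]

x = 0b1101                                    # encodes -3 in two's complement
print("ReLU(-3) =", apply_lut(x, relu_table)) # -> 0
print("sign(5)  =", apply_lut(5, sign_table)) # -> 1
```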

 

Strengths, Weaknesses, and Optimal Use Cases for PPML

 

The distinct capabilities of these schemes lead to a clear conclusion: the future of high-performance PPML does not lie with a single, “winner-take-all” scheme. Instead, it points toward the development of hybrid systems that can leverage the complementary strengths of different schemes for different parts of a single computation. A typical neural network inference, for example, consists of alternating linear layers (matrix multiplications) and non-linear activation functions. The most efficient way to evaluate this homomorphically would be to use a high-throughput arithmetic scheme like CKKS for the linear layers, then “switch” the ciphertext into the TFHE domain to evaluate the ReLU activation exactly, and then switch back to CKKS for the next linear layer. This avoids the accuracy loss of polynomial approximations in CKKS and the poor arithmetic performance of TFHE. This concept, known as scheme switching, is an active and critical area of research, with frameworks like Chimera and libraries like OpenFHE already developing the necessary tools to make such hybrid computations a reality.52

The following table summarizes the key characteristics and trade-offs of the major FHE schemes in the context of machine learning.

| Feature | BGV / BFV | CKKS (HEAAN) | TFHE (CGGI) |
| --- | --- | --- | --- |
| Primary Arithmetic Type | Exact Integer / Finite Field Arithmetic | Approximate Real / Complex Number Arithmetic | Boolean Logic / Integer Arithmetic (via circuits) |
| Core Plaintext Unit | Vector of Integers (Word) | Vector of Real/Complex Numbers (Word) | Single Bit or Small Integer |
| Noise Management | Modulus Switching (BGV) / Scale Invariant (BFV) | Rescaling | Bootstrapping |
| SIMD Support | Yes, efficient for integer vectors | Yes, highly efficient for real number vectors | Limited/Inefficient for large-scale arithmetic |
| Bootstrapping | Slower; resets noise for exact computation | Slower than TFHE; resets precision/modulus | Very Fast (<1s); enables programmable bootstrapping (LUTs) |
| Primary ML Strength | Exact computations (e.g., secure counting, integer embeddings) | High-throughput linear algebra (dense layers, convolutions), gradient descent | Exact evaluation of non-polynomial functions (ReLU, comparisons), decision trees |
| Primary ML Weakness | Inefficient for real-number models; requires careful quantization | Inefficient for comparisons and non-polynomial functions without approximation | Low throughput for large-scale arithmetic (matrix multiplication) |

 

Engineering Feasibility: Solutions and Optimizations

 

The theoretical promise of FHE is being translated into practical reality through a concerted effort across the software and hardware domains. A maturing ecosystem of open-source libraries and high-level compilers is making the technology more accessible, while dedicated hardware accelerators are beginning to deliver the orders-of-magnitude performance gains necessary for real-world deployment. This evolution mirrors the development of other high-performance computing (HPC) fields, where a layered stack—from custom hardware to user-friendly software—is essential for widespread adoption.

 

Software Ecosystem and Algorithmic Advances

 

The foundation of practical FHE development lies in a rich ecosystem of open-source software libraries that implement the complex underlying cryptography. These libraries provide the building blocks that allow researchers and engineers to construct privacy-preserving applications without having to become expert cryptographers themselves.

 

The Role of Open-Source Libraries

 

Several key libraries have emerged, each with different strengths and supported schemes.

  • OpenFHE: A modern, community-driven C++ library that has become a leading platform for FHE research and development. It is a spiritual successor to the PALISADE library and integrates design concepts from several other major projects. Its key strength is its comprehensive support for all major FHE schemes, including BGV, BFV, CKKS, and TFHE, within a single, modular framework. It is designed from the ground up with bootstrapping and hardware acceleration in mind, featuring a Hardware Abstraction Layer (HAL) to facilitate integration with GPUs, FPGAs, and ASICs.64
  • Microsoft SEAL: One of the most widely used FHE libraries, known for its high-quality code, excellent documentation, and focus on usability. Developed by Microsoft Research, SEAL (Simple Encrypted Arithmetic Library) provides robust implementations of the BFV and CKKS schemes. Its ease of use has made it a popular choice for developers and researchers entering the FHE field.26
  • IBM HElib: One of the earliest and most influential FHE libraries, HElib was the first to provide an open-source implementation of the BGV scheme, including its complex bootstrapping procedure. It has since added support for the CKKS scheme and remains an important tool for the research community.13
  • TFHE-rs and Concrete: Developed by the company Zama, TFHE-rs is a pure Rust implementation of the TFHE scheme, emphasizing performance, memory safety, and modern software engineering practices. It serves as the cryptographic core for Zama’s higher-level tools, including the Concrete library and the Concrete ML framework, which are specifically designed for privacy-preserving machine learning.13
  • Lattigo: An open-source FHE library written entirely in the Go programming language. It supports the BFV, BGV, and CKKS schemes and has a particular focus on multiparty protocols. Its implementation in Go makes it well-suited for modern cloud-native and microservices architectures.65

The table below provides a comparative overview of these key libraries.

| Library | Lead Developer/Maintainer | Primary Language | Supported Schemes | Key Features |
| --- | --- | --- | --- | --- |
| OpenFHE | Duality Technologies & Community | C++ | BGV, BFV, CKKS, TFHE | Comprehensive scheme support, hardware acceleration layer (HAL), built-in scheme switching. |
| Microsoft SEAL | Microsoft Research | C++ | BFV, CKKS | High-quality implementation, excellent documentation, focus on usability. |
| IBM HElib | IBM Research | C++ | BGV, CKKS | Pioneering implementation of BGV with bootstrapping. |
| TFHE-rs | Zama | Rust | TFHE | High performance, memory safety, core of the Concrete ecosystem. |
| Concrete ML | Zama | Python | TFHE | High-level framework to convert ML models (scikit-learn, PyTorch) into FHE. |
| Lattigo | Tune Insight | Go | BGV, BFV, CKKS | Native Go implementation, strong support for multiparty protocols. |

 

High-Level Tooling: Compilers and Transpilers for FHE

 

While libraries provide the essential cryptographic primitives, they still require significant expertise to use correctly. To bridge the gap between cryptography and data science, a new generation of high-level tools is emerging. These compilers and transpilers aim to automate the process of converting standard programs and machine learning models into their FHE equivalents.

  • Zama’s Concrete ML: This Python framework is a prime example of such a tool. It allows a data scientist to take a model trained in a familiar framework like scikit-learn or PyTorch and, with a few lines of code, compile it into a privacy-preserving version that can perform inference on encrypted data. The framework automatically handles the complex tasks of model quantization, conversion to an FHE-compatible representation, and parameter selection for the underlying TFHE scheme.13
  • Google’s FHE Transpiler: This open-source tool takes a different approach, allowing developers to write general-purpose C++ code which is then transpiled into an FHE-equivalent program that runs on a cryptographic backend like OpenFHE. This aims to enable a broader range of privacy-preserving applications beyond just machine learning.65

These tools are crucial for the democratization of FHE, as they abstract away the immense complexity of the underlying cryptography and allow domain experts to focus on their applications.
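A hedged sketch of this workflow with Concrete ML is shown below; the class and keyword names follow recent Concrete ML releases and may differ slightly between versions, and the dataset is synthetic.

```python
# Sketch of the Concrete ML workflow: train in the clear, compile to FHE,
# then run encrypted inference. API details may vary across releases.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from concrete.ml.sklearn import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(n_bits=8)    # quantization bit-width for FHE
model.fit(X_train, y_train)             # ordinary (plaintext) training

model.compile(X_train)                  # generate the FHE circuit and keys
y_clear = model.predict(X_test)                     # plaintext inference
y_fhe = model.predict(X_test[:5], fhe="execute")    # encrypted inference

print("clear vs. FHE predictions:", y_clear[:5], y_fhe)
```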

 

The Hardware Acceleration Imperative

 

Despite significant algorithmic and software improvements, executing FHE on general-purpose CPUs remains too slow for many time-sensitive or large-scale machine learning tasks. The consensus in the field is that specialized hardware acceleration is not just an optimization but a necessity for making FHE practical.38 The core computations in FHE, primarily large-integer polynomial arithmetic, are highly structured and massively parallel, making them poor fits for the architecture of a modern CPU but ideal candidates for custom hardware like FPGAs and ASICs.

 

FPGAs: Reconfigurable Hardware for FHE Primitives

 

Field-Programmable Gate Arrays (FPGAs) offer a flexible and powerful platform for accelerating FHE. Unlike CPUs, FPGAs consist of a large array of reconfigurable logic blocks that can be programmed to create custom digital circuits optimized for a specific task.44 This allows for the creation of highly parallel and pipelined architectures tailored to the most computationally intensive FHE primitives, such as the Number Theoretic Transform (NTT)—an algorithm essential for performing fast polynomial multiplication.44
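The pipeline these hardware designs accelerate can be shown in miniature: transform both polynomials, multiply pointwise, and transform back. The toy code below uses naive O(N²) transforms and tiny, insecure parameters purely to expose the structure; real implementations (and the FPGA pipelines described here) use O(N log N) butterfly networks.

```python
# Toy illustration of NTT-based negacyclic polynomial multiplication,
# checked against the schoolbook result. Parameters are illustrative only.
N, q = 4, 17                 # q prime with q ≡ 1 (mod 2N)
psi = 9                      # primitive 2N-th root of unity mod q (9^4 ≡ -1 mod 17)
omega = psi * psi % q        # primitive N-th root of unity

def transform(a, root):
    """Naive O(N^2) number-theoretic transform with the given root of unity."""
    return [sum(a[j] * pow(root, i * j, q) for j in range(N)) % q for i in range(N)]

def negacyclic_mul_ntt(a, b):
    a_t = transform([a[i] * pow(psi, i, q) % q for i in range(N)], omega)
    b_t = transform([b[i] * pow(psi, i, q) % q for i in range(N)], omega)
    c_t = [x * y % q for x, y in zip(a_t, b_t)]            # pointwise product
    c = transform(c_t, pow(omega, -1, q))                  # inverse transform
    n_inv = pow(N, -1, q)
    return [c[i] * n_inv % q * pow(psi, -i, q) % q for i in range(N)]

def negacyclic_mul_schoolbook(a, b):
    c = [0] * N
    for i in range(N):
        for j in range(N):
            if i + j < N: c[i + j] = (c[i + j] + a[i] * b[j]) % q
            else:         c[i + j - N] = (c[i + j - N] - a[i] * b[j]) % q
    return c

a, b = [1, 2, 3, 4], [5, 6, 7, 8]
assert negacyclic_mul_ntt(a, b) == negacyclic_mul_schoolbook(a, b)
print(negacyclic_mul_ntt(a, b))
```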

Several research projects have demonstrated the potential of FPGAs to deliver significant speedups. A notable example is FAB (FPGA-based Accelerator for Bootstrappable FHE). FAB was the first project to demonstrate a complete implementation of the CKKS scheme, including the complex bootstrapping procedure, on an FPGA for practical security parameters. For a logistic regression model training application, FAB achieved a remarkable 456x speedup over a multi-core CPU implementation and a 9.5x speedup over a high-end GPU implementation.72 Similarly, Zama has developed and open-sourced an FHE processor design, the HPU (Homomorphic Processing Unit), specifically for accelerating TFHE on FPGAs.76

 

ASICs: Custom Silicon for Peak Performance

 

Application-Specific Integrated Circuits (ASICs) represent the ultimate solution for hardware acceleration. By designing a silicon chip from the ground up specifically for FHE computations, ASICs can achieve the highest possible performance and power efficiency, far surpassing what is possible with FPGAs or GPUs.39

Recognizing this potential, government agencies like the U.S. Defense Advanced Research Projects Agency (DARPA) have launched major research programs, such as DPRIVE (Data Protection in Virtual Environments), to fund the development of FHE ASICs.78 This has spurred a wave of innovation, leading to several prominent accelerator designs:

  • CraterLake: An ASIC accelerator designed for unbounded FHE computation, it introduces a new architecture that scales efficiently to the very large ciphertexts required for deep computations and outperforms a 32-core CPU by a geometric mean of 4,600x.38
  • F1: One of the first programmable FHE accelerators, F1 is a wide-vector processor with functional units specialized for FHE primitives. It achieves a speedup of 5,400x over a 4-core CPU for shallow FHE computations.38
  • BASALISC: An ASIC architecture designed to accelerate the BGV scheme, including fully-packed bootstrapping. Simulation results for BASALISC project a speedup of over 5,000 times compared to the widely used HElib software library running on a CPU.77

The table below summarizes some of the key hardware acceleration projects, highlighting the dramatic performance gains they have achieved.

| Project Name | Lead Institution/Company | Platform | Target Scheme(s) | Key Innovation | Reported Speedup (vs. CPU) |
| --- | --- | --- | --- | --- | --- |
| FAB | Boston University, et al. | FPGA | CKKS | First FPGA accelerator with full bootstrapping support. | 456x (for Logistic Regression) |
| CraterLake | MIT, et al. | ASIC | Generic (CKKS-like) | Architecture for unbounded computation and very large ciphertexts. | 4,600x (gmean vs. 32-core CPU) |
| F1 | MIT, et al. | ASIC | Generic (CKKS-like) | First programmable wide-vector FHE accelerator. | 5,400x (gmean vs. 4-core CPU) |
| BASALISC | KU Leuven, et al. | ASIC | BGV | First BGV accelerator with fully-packed bootstrapping. | >5,000x (vs. HElib software) |
| Zama HPU | Zama | FPGA | TFHE | Open-source, programmable processor for TFHE operations. | N/A (enables ~13k PBS/sec) |

 

Synthesis and Future Outlook

 

The journey of Fully Homomorphic Encryption from a theoretical curiosity to a computationally feasible technology for machine learning has been marked by rapid and multifaceted progress. The convergence of advanced cryptographic schemes, a maturing software ecosystem, and transformative hardware acceleration has brought the field to an inflection point, where practical applications are no longer a distant vision but an emerging reality.

 

The State of Practical FHE for Machine Learning

 

Today, the application of FHE to moderately complex machine learning tasks is demonstrably feasible. Encrypted inference for standard deep learning models like ResNet-20 on datasets such as CIFAR-10, which was once computationally intractable, is now achievable.24 With software-only implementations, inference times have been reduced from days to hours or minutes. With the advent of specialized hardware accelerators, these times are plummeting further into the realm of seconds or even milliseconds, opening the door to near-real-time applications.61 Similarly, training simpler models like logistic regression on large, encrypted datasets has been successfully demonstrated, with training times on the order of hours on a single machine—a significant achievement given the complexity of the task.21

This progress is not the result of a single breakthrough but rather the product of a holistic, co-design approach that spans the entire computational stack. The path to practicality involves:

  1. Adapting ML Models: Designing FHE-friendly neural networks that use polynomial activation functions and are robust to low-precision quantization.
  2. Automating Conversion: Using FHE-aware compilers and high-level tools to automatically translate these models into their encrypted equivalents.
  3. Optimizing Cryptography: Running these models on highly optimized open-source cryptographic libraries that implement the most efficient FHE schemes.
  4. Accelerating Execution: Executing the most demanding cryptographic operations on specialized hardware platforms like FPGAs and ASICs.

It is the synergy between these layers that is successfully bridging the performance chasm that once made FHE impractical.

 

Remaining Challenges and Frontiers of Research

 

Despite the remarkable progress, several challenges remain on the path to the widespread, routine use of FHE in machine learning.

  • Performance and Scalability: While hardware acceleration is closing the gap, a significant performance overhead still exists, particularly for very deep and complex neural network architectures like Transformers or for applications requiring extremely low latency. Scaling FHE to handle massive datasets and models with billions of parameters remains a key challenge.
  • Standardization: The FHE landscape consists of multiple schemes and libraries with different APIs and parameter conventions. The FHE.org community is actively working toward creating standards for security parameters and potentially APIs, which will be crucial for ensuring interoperability, security, and long-term stability in the ecosystem.81
  • Software and Usability: While tools like Concrete ML are making FHE more accessible, there is still a need for more advanced compilers and development environments that can fully abstract the underlying cryptographic complexity from machine learning practitioners, enabling them to design and deploy privacy-preserving solutions with minimal cryptographic knowledge.
  • Advanced ML Models: Current FHE research has largely focused on feed-forward neural networks and simpler models. Extending FHE to efficiently handle more complex and dynamic architectures, such as Recurrent Neural Networks (RNNs), Graph Neural Networks (GNNs), and the attention mechanisms in Transformers, remains an active and challenging frontier of research.

 

Concluding Remarks: The Trajectory Towards Ubiquitous Encrypted Computation

 

The trajectory of Fully Homomorphic Encryption is clear. The combination of exponential improvements in algorithmic efficiency and the dedicated engineering of specialized hardware is rapidly diminishing the computational barriers that once confined FHE to the realm of theory. The question is no longer if FHE will be practical for machine learning, but when and for which applications it will become the standard.

As the technology continues to mature, FHE is poised to become a critical pillar of the next generation of privacy-enhancing technologies. It offers a powerful cryptographic guarantee of privacy that is independent of trust in hardware or infrastructure, and its post-quantum nature ensures its relevance for decades to come. By enabling a world where the immense value of data can be harnessed for innovation in science, medicine, and commerce without compromising the fundamental right to privacy, FHE is on a path to becoming an essential tool for building a more secure and trustworthy digital society.