<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>Among Bytes</title>
<link>https://www.amongbytes.com/</link>
<atom:link href="https://www.amongbytes.com/index.xml" rel="self" type="application/rss+xml"/>
<description></description>
<generator>quarto-1.9.36</generator>
<lastBuildDate>Mon, 23 Feb 2026 00:00:00 GMT</lastBuildDate>
<item>
  <title>Benchmarking ML-DSA Signature Generation: Understanding Rejection Sampling Performance</title>
  <dc:creator>Kris Kwiatkowski</dc:creator>
  <link>https://www.amongbytes.com/posts/20260223-mldsa-benchmarking/20260223-mldsa-benchmarking.html</link>
  <description><![CDATA[ 





<section id="introduction" class="level2">
<h2 class="anchored" data-anchor-id="introduction">Introduction</h2>
<p>As post-quantum cryptography (PQC) adoption accelerates, developers face new challenges in deploying signature schemes like ML-DSA on constrained devices, from embedded IoT devices to edge servers with limited computational resources. Unlike traditional signature algorithms such as RSA or ECDSA, which typically exhibit relatively predictable signing latency on a fixed platform, ML-DSA introduces an architectural feature, <strong>rejection sampling</strong>, that makes signing time inherently variable. This probabilistic mechanism is essential to the scheme’s security, but the resulting variation in signing latency can significantly impact system performance.</p>
<p>In this post, we’ll explore what rejection sampling means for ML-DSA benchmarking, share concrete performance metrics, and discuss best practices for measuring signing speed on constrained cryptographic modules.</p>
<p>This post extends section 4.1 of IETF draft <a href="https://datatracker.ietf.org/doc/html/draft-ietf-pquip-pqc-hsm-constrained-03#name-optimizing-performance-in-p">Adapting Constrained Devices for Post-Quantum Cryptography</a>.</p>
</section>
<section id="why-ml-dsa-signing-is-different" class="level2">
<h2 class="anchored" data-anchor-id="why-ml-dsa-signing-is-different">Why ML-DSA Signing is Different</h2>
<p>ML-DSA implements the Fiat-Shamir with Aborts construction, which uses rejection sampling as a core mechanism - a design choice rooted in lattice-based cryptography’s unique mathematical properties. Here’s what that means practically:</p>
<section id="the-fiat-shamir-with-aborts-construction" class="level3">
<h3 class="anchored" data-anchor-id="the-fiat-shamir-with-aborts-construction">The Fiat-Shamir with Aborts Construction</h3>
<p>Traditional Fiat-Shamir signatures derive a public challenge by hashing a commitment together with the message. In lattice-based signatures like ML-DSA, however, the resulting response would leak information about the secret key unless its distribution is controlled. The “with Aborts” variant solves this by:</p>
<ol type="1">
<li>Computing preliminary signature components (called a “y” value in lattice terms)</li>
<li>Deriving a challenge from these components and the message</li>
<li>Computing candidate signature components using the challenge and secret key</li>
<li><strong>Checking norm bounds</strong>: Verifying that these components don’t exceed predefined thresholds</li>
<li>Either accepting the signature or aborting and restarting with fresh randomness</li>
</ol>
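<p>The five steps above can be sketched with a scalar toy model. This is only an illustration of the abort loop, not ML-DSA: the constants <code>GAMMA</code> and <code>BETA</code> and the function <code>toy_sign</code> are hypothetical, the values are plain integers rather than polynomial vectors, and the challenge is passed in instead of being derived by hashing (step 2).</p>

```python
import random

GAMMA = 100  # toy range of the masking value y (hypothetical, not a FIPS 204 parameter)
BETA = 10    # toy bound on |challenge * secret| (hypothetical)


def toy_sign(secret: int, challenge: int, rng: random.Random) -> tuple[int, int]:
    """Return (z, attempts) for a scalar caricature of Fiat-Shamir with Aborts."""
    attempts = 0
    while True:
        attempts += 1
        y = rng.randint(-GAMMA, GAMMA)   # step 1: fresh masking value
        z = y + challenge * secret       # step 3: candidate response
        if GAMMA - BETA >= abs(z):       # step 4: norm-bound check
            return z, attempts           # step 5: accept
        # otherwise abort and retry with fresh randomness


rng = random.Random(1)
z, attempts = toy_sign(secret=3, challenge=-2, rng=rng)
print(z, attempts)
```

<p>In this toy setting about 90% of attempts are accepted (181 of the 201 possible responses fall inside the bound), which is far more forgiving than the real parameter sets discussed below.</p>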
</section>
<section id="what-the-norm-bounds-actually-do" class="level3">
<h3 class="anchored" data-anchor-id="what-the-norm-bounds-actually-do">What the Norm Bounds Actually Do</h3>
<p>The norm bound checks ensure that signature components stay within specific vector magnitude ranges. In lattice cryptography, if signature values are allowed to vary based on the secret key properties, an attacker could observe:</p>
<ul>
<li>Patterns in signature magnitudes that leak information about the private key</li>
<li>Subtle correlations between multiple signatures that compromise security</li>
<li>Bias in the distribution of valid signatures</li>
</ul>
<p>By enforcing strict bounds, rejection sampling eliminates these side channels. The tradeoff: some attempts must be discarded, creating the variable latency we’ll see throughout this post.</p>
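<p>A small enumeration makes the leakage argument concrete. It uses a scalar toy model with hypothetical constants <code>GAMMA</code> and <code>BETA</code> (these are not FIPS 204 parameters): the response is <code>z = y + cs</code>, where <code>cs</code> stands for the secret-dependent term. The sketch counts every possible response with and without the bound check.</p>

```python
from collections import Counter

GAMMA = 100  # toy masking range (hypothetical)
BETA = 10    # toy bound on the secret-dependent term |cs| (hypothetical)


def response_counts(cs: int, reject: bool) -> Counter:
    """Count how often each response z = y + cs occurs over all maskings y."""
    counts = Counter()
    for y in range(-GAMMA, GAMMA + 1):
        z = y + cs
        if reject and abs(z) > GAMMA - BETA:
            continue  # this attempt would be aborted
        counts[z] += 1
    return counts


# Without the check, the range of z shifts with the secret-dependent term.
print(min(response_counts(7, reject=False)), min(response_counts(-4, reject=False)))
# With the check, the accepted responses are distributed identically.
print(response_counts(7, reject=True) == response_counts(-4, reject=True))
```

<p>Without rejection the smallest observable response differs between the two secrets, so the output range alone reveals something about <code>cs</code>. With rejection, both secrets produce exactly the same set of accepted responses, each equally likely.</p>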
</section>
<section id="why-this-creates-variable-performance" class="level3">
<h3 class="anchored" data-anchor-id="why-this-creates-variable-performance">Why This Creates Variable Performance</h3>
<p>After computing candidate signature components, the algorithm checks whether these bounds are satisfied. If they aren’t met - which happens probabilistically - the entire signing attempt is discarded and restarted with fresh randomness.</p>
<p>This approach serves two critical purposes:</p>
<ol type="1">
<li><strong>Security</strong>: Prevents information leakage about the secret key through out-of-range values</li>
<li><strong>Correctness</strong>: Ensures signature distributions match security proof assumptions</li>
</ol>
<p>Unlike traditional algorithms that process a message once and produce a signature, ML-DSA may need to retry the signing process multiple times. This makes predicting signing latency fundamentally different from RSA or ECDSA.</p>
</section>
</section>
<section id="the-numbers-rejection-probability-and-expected-attempts" class="level2">
<h2 class="anchored" data-anchor-id="the-numbers-rejection-probability-and-expected-attempts">The Numbers: Rejection Probability and Expected Attempts</h2>
<p>Here’s where benchmarking gets interesting. The acceptance probability - the chance that any single signing attempt succeeds - varies by ML-DSA parameter set:</p>
<div style="float: right; width: 40%; margin: 0 auto;">
<table class="table">
<caption>Acceptance probability - per-attempt probability of successful signing for the given ML-DSA variant.</caption>
<thead>
<tr class="header">
<th>ML-DSA Variant</th>
<th>Acceptance Probability</th>
<th>Expected Attempts</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>ML-DSA-44</td>
<td>23.50%</td>
<td>4.255</td>
</tr>
<tr class="even">
<td>ML-DSA-65</td>
<td>19.63%</td>
<td>5.094</td>
</tr>
<tr class="odd">
<td>ML-DSA-87</td>
<td>25.96%</td>
<td>3.852</td>
</tr>
</tbody>
</table>
</div>
<p>What this tells us:</p>
<ul>
<li><strong>ML-DSA-44</strong> (compact variant) succeeds on roughly 1 in 4 attempts</li>
<li><strong>ML-DSA-65</strong> (NIST Category 3, most deployed) expects roughly 5 attempts on average</li>
<li><strong>ML-DSA-87</strong> (highest security) actually has the best acceptance probability, requiring ~3.9 attempts</li>
</ul>
<p>These aren’t guesses - they’re mathematically derived from the algorithm’s structure and parameters defined in FIPS-204 using Equation 5 from Li32, assuming a random bit generator (RBG) as specified in Section 3.6.1.</p>
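<p>As a sanity check, the table can be reproduced in a few lines. The sketch below assumes that the referenced equation is the approximation p ≈ exp(-256·β·(ℓ/γ1 + k/γ2)) from the Dilithium specification, and that the parameter values are as I read them from the FIPS-204 parameter table; both are my assumptions rather than something stated in this post.</p>

```python
import math

Q = 8380417  # ML-DSA modulus

# (k, l, beta, gamma1, gamma2), assumed from the FIPS-204 parameter table
PARAMS = {
    "ML-DSA-44": (4, 4, 78, 2**17, (Q - 1) // 88),
    "ML-DSA-65": (6, 5, 196, 2**19, (Q - 1) // 32),
    "ML-DSA-87": (8, 7, 120, 2**19, (Q - 1) // 32),
}


def acceptance_probability(k, l, beta, gamma1, gamma2):
    """Approximate per-attempt acceptance probability (assumed formula)."""
    return math.exp(-256 * beta * (l / gamma1 + k / gamma2))


for name, params in PARAMS.items():
    p = acceptance_probability(*params)
    print(f"{name}: p = {100 * p:.2f}%, expected attempts = {1 / p:.3f}")
```

<p>Under these assumptions the computed probabilities and expected attempts agree with the table above to the precision shown there.</p>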
<section id="factors-affecting-rejection-probability" class="level3">
<h3 class="anchored" data-anchor-id="factors-affecting-rejection-probability">Factors Affecting Rejection Probability</h3>
<p>The probability that any given signing attempt succeeds isn’t fixed - it depends on several factors:</p>
<ol type="1">
<li><strong>The message being signed</strong>: Different messages produce different challenge values</li>
<li><strong>The secret key material</strong>: Specific key properties affect norm bound satisfaction probability</li>
<li><strong>Random seed (hedged signing)</strong>: When FIPS-204 Section 3.4 hedged signing is used, additional randomness affects outcomes</li>
<li><strong>Context string</strong>: The optional context parameter (FIPS-204 Section 5.2) influences the challenge derivation</li>
</ol>
<p>In practice, this means some message-key combinations may require significantly more rejection iterations than others. With deterministic signing, a given message, key, and context always take the same number of iterations, so one pairing might always need more attempts than average while another always succeeds quickly. With hedged signing, the fresh randomness changes the iteration count from one call to the next.</p>
</section>
</section>
<section id="understanding-the-distribution" class="level2">
<h2 class="anchored" data-anchor-id="understanding-the-distribution">Understanding the Distribution</h2>
<p>The expected number of attempts is only part of the story. Because the number of attempts in the rejection-sampling loop follows a geometric distribution, we also need to understand the “tail” of the distribution: what happens in worst-case scenarios.</p>
<section id="the-mathematical-model" class="level3">
<h3 class="anchored" data-anchor-id="the-mathematical-model">The Mathematical Model</h3>
<p>For benchmarking and capacity planning, the rejection-sampling loop is well modeled as a geometric distribution with acceptance probability <img src="https://latex.codecogs.com/png.latex?p">:</p>
<ul>
<li>Each attempt either succeeds (with probability p = acceptance probability) or fails (with probability <img src="https://latex.codecogs.com/png.latex?1-p">)</li>
<li>The number of attempts follows a geometric distribution</li>
<li>The expected total attempts = <img src="https://latex.codecogs.com/png.latex?1/p"> (the reciprocal of acceptance probability)</li>
</ul>
<p>Using this model, we can calculate the cumulative distribution function (CDF): the probability of completing signing within at most N iterations.</p>
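<p>The geometric model translates directly into code. The helper names <code>pmf</code> and <code>cdf</code> below are my own; the acceptance probability is the ML-DSA-65 value from the table above.</p>

```python
def pmf(p: float, n: int) -> float:
    """P(exactly n attempts): n - 1 rejections followed by one acceptance."""
    return (1 - p) ** (n - 1) * p


def cdf(p: float, n: int) -> float:
    """P(at most n attempts)."""
    return 1 - (1 - p) ** n


p = 0.1963  # ML-DSA-65 acceptance probability
print(f"P(N = 1) = {pmf(p, 1):.4f}")
print(f"P(N at most 5) = {cdf(p, 5):.4f}")
print(f"E[N] = {1 / p:.3f}")
```

<p>Summing <code>pmf</code> over the first n attempts gives the same value as <code>cdf</code>, which is a quick way to check an implementation of either.</p>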
</section>
<section id="ml-dsa-cumulative-distribution" class="level3">
<h3 class="anchored" data-anchor-id="ml-dsa-cumulative-distribution">ML-DSA Cumulative Distribution</h3>
<p>The CDF expresses the probability that the signing process completes within at most a given number of iterations.</p>
<div style="float: right; width: 60%; margin: 0 auto;">
<p><img src="https://www.amongbytes.com/posts/20260223-mldsa-benchmarking/cdf.png" class="img-fluid" style="width:100.0%"></p>
</div>
<p>The data shows significant variation across ML-DSA variants:</p>
<ul>
<li><strong>First attempt success</strong>: Only 19.6% to 26% of signing operations succeed on the first try</li>
<li><strong>Within 5 iterations</strong>: Roughly two-thirds to three-quarters (66-78%) of operations complete by the 5th attempt</li>
<li><strong>Within 10 iterations</strong>: Most operations (89-95%) complete within 10 attempts</li>
<li><strong>Tail behavior</strong>: Even after 11 iterations, a small fraction (4-9%) of operations still need more attempts</li>
</ul>
<div style="width: 60%; margin: 0 auto;">
<table class="table">
<caption>Cumulative probability that signing completes within the given number of iterations, for each ML-DSA variant.</caption>
<thead>
<tr class="header">
<th>Iterations</th>
<th>ML-DSA-44</th>
<th>ML-DSA-65</th>
<th>ML-DSA-87</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>1</td>
<td>23.50%</td>
<td>19.63%</td>
<td>25.96%</td>
</tr>
<tr class="even">
<td>2</td>
<td>41.48%</td>
<td>35.41%</td>
<td>45.18%</td>
</tr>
<tr class="odd">
<td>3</td>
<td>55.23%</td>
<td>48.09%</td>
<td>59.41%</td>
</tr>
<tr class="even">
<td>4</td>
<td>65.75%</td>
<td>58.28%</td>
<td>69.95%</td>
</tr>
<tr class="odd">
<td>5</td>
<td>73.80%</td>
<td>66.47%</td>
<td>77.75%</td>
</tr>
<tr class="even">
<td>6</td>
<td>79.96%</td>
<td>73.05%</td>
<td>83.53%</td>
</tr>
<tr class="odd">
<td>7</td>
<td>84.67%</td>
<td>78.34%</td>
<td>87.80%</td>
</tr>
<tr class="even">
<td>8</td>
<td>88.27%</td>
<td>82.59%</td>
<td>90.97%</td>
</tr>
<tr class="odd">
<td>9</td>
<td>91.03%</td>
<td>86.01%</td>
<td>93.31%</td>
</tr>
<tr class="even">
<td>10</td>
<td>93.14%</td>
<td>88.76%</td>
<td>95.05%</td>
</tr>
<tr class="odd">
<td>11</td>
<td>94.75%</td>
<td>90.96%</td>
<td>96.34%</td>
</tr>
</tbody>
</table>
</div>
<p>This demonstrates the importance of dimensioning systems with an adequate retry budget. While ML-DSA-44 and ML-DSA-87 converge faster than ML-DSA-65, all variants exhibit the same geometric tail behavior: rare but real outliers that extend well beyond the typical case.</p>
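<p>A retry budget can be derived by inverting the CDF. The following sketch (the function name <code>retry_budget</code> is hypothetical) returns the smallest number of iterations that covers a chosen fraction of signing operations under the geometric model, using the acceptance probabilities from the first table.</p>

```python
import math

ACCEPTANCE = {"ML-DSA-44": 0.2350, "ML-DSA-65": 0.1963, "ML-DSA-87": 0.2596}


def retry_budget(p: float, coverage: float) -> int:
    """Smallest n such that 1 - (1 - p)**n is at least `coverage`."""
    return math.ceil(math.log(1 - coverage) / math.log(1 - p))


for name, p in ACCEPTANCE.items():
    print(name, retry_budget(p, 0.90), retry_budget(p, 0.95))
```

<p>The results line up with the CDF table: for ML-DSA-65, for example, 11 iterations is the first row that reaches 90%.</p>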
</section>
</section>
<section id="practical-implications-for-constrained-devices" class="level2">
<h2 class="anchored" data-anchor-id="practical-implications-for-constrained-devices">Practical Implications for Constrained Devices</h2>
<p>For battery-powered IoT devices and embedded systems, this variability matters significantly:</p>
<section id="latency-unpredictability" class="level3">
<h3 class="anchored" data-anchor-id="latency-unpredictability">Latency Unpredictability</h3>
<p>Consider a concrete example: suppose a single rejection-sampling iteration takes 100 microseconds on your embedded device and you are signing with ML-DSA-65.</p>
<ul>
<li><strong>Best case (1 iteration)</strong>: 100 microseconds</li>
<li><strong>Expected case (5 iterations)</strong>: 500 microseconds</li>
<li><strong>90th percentile (11 iterations)</strong>: 1,100 microseconds</li>
<li><strong>99th percentile (21 iterations)</strong>: 2,100 microseconds</li>
</ul>
<p>For time-critical applications like IoT gateways expecting 1 millisecond response times, this becomes problematic. Under the geometric model, a budget sized for the average case (500 μs, i.e. 5 iterations) is exceeded by roughly a third of ML-DSA-65 signing operations, and a 1 ms deadline (10 iterations) is still missed about 11% of the time. Even a budget sized for the 99th percentile (2,100 μs) is exceeded by approximately 1% of operations. In production systems handling thousands of signatures per day, that 1% isn’t negligible.</p>
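<p>The deadline arithmetic can be checked with a short sketch. It assumes the geometric model, the ML-DSA-65 acceptance probability, and that every iteration costs the same 100 μs; a real implementation may abort some iterations early, so treat the figures as estimates. The function name <code>miss_probability</code> is my own.</p>

```python
ITERATION_US = 100  # per-iteration time from the example above
P_ACCEPT = 0.1963   # ML-DSA-65 acceptance probability


def miss_probability(budget_us: float) -> float:
    """P(signing takes longer than budget_us) under the geometric model."""
    iterations_that_fit = int(budget_us // ITERATION_US)
    return (1 - P_ACCEPT) ** iterations_that_fit


for budget_us in (500, 1000, 2100):
    share = 100 * miss_probability(budget_us)
    print(f"{budget_us} us budget: exceeded {share:.1f}% of the time")
```

<p>The three budgets correspond to 5, 10, and 21 iterations, so the output mirrors the complements of the matching CDF values.</p>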
</section>
<section id="energy-consumption-variability" class="level3">
<h3 class="anchored" data-anchor-id="energy-consumption-variability">Energy Consumption Variability</h3>
<p>Energy consumption scales roughly linearly with iteration count. On battery-powered devices:</p>
<ul>
<li>A “fast” signature (1 iteration) might consume 50 mJ</li>
<li>The same signature might consume 250 mJ at expected case (5 iterations)</li>
<li>Rare outliers (21 iterations) might consume 1,050 mJ</li>
</ul>
<p>For devices relying on energy harvesting or with tight power budgets, this 21x variation between the best case and the 99th percentile creates significant uncertainty. Devices must either:</p>
<ol type="1">
<li>Over-provision battery capacity for worst-case scenarios</li>
<li>Implement aggressive power limiting that reduces throughput</li>
<li>Accept occasional failed signing operations when power budgets are exceeded</li>
</ol>
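<p>To put numbers on this trade-off, the geometric model also gives the spread of the per-signature energy. The sketch below reuses the illustrative 50 mJ per iteration and the ML-DSA-65 acceptance probability, and additionally assumes that successive signatures are independent; the batch size of 1,000 is an arbitrary example.</p>

```python
import math

ENERGY_PER_ITERATION_MJ = 50  # illustrative figure from the list above
P_ACCEPT = 0.1963             # ML-DSA-65 acceptance probability

# Mean and standard deviation of a geometric iteration count
mean_iterations = 1 / P_ACCEPT
std_iterations = math.sqrt(1 - P_ACCEPT) / P_ACCEPT

mean_energy = ENERGY_PER_ITERATION_MJ * mean_iterations
std_energy = ENERGY_PER_ITERATION_MJ * std_iterations
print(f"per signature: {mean_energy:.1f} mJ +/- {std_energy:.1f} mJ")

# Over many independent signatures the relative spread shrinks with sqrt(n)
n = 1000
total_mean_j = n * mean_energy / 1000
total_std_j = math.sqrt(n) * std_energy / 1000
print(f"{n} signatures: {total_mean_j:.1f} J +/- {total_std_j:.1f} J")
```

<p>The spread of a single signature is almost as large as its mean, which is what hurts energy-harvesting designs. The total over a long battery life is far more predictable, so battery sizing can lean on the average while per-operation budgets cannot.</p>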
</section>
<section id="impact-on-tls-handshakes" class="level3">
<h3 class="anchored" data-anchor-id="impact-on-tls-handshakes">Impact on TLS Handshakes</h3>
<p>In TLS 1.3 with ML-DSA, the server performs a signature during the handshake. On a mobile IoT device over cellular:</p>
<ul>
<li>Expected signing: ~500 μs (manageable within handshake timing)</li>
<li>Occasional outliers: ~2,100 μs (visible latency increase; user-perceptible in some scenarios)</li>
<li>Compounded with network latency and cryptographic verification, outlier cases can extend handshakes by 10-20+ milliseconds</li>
</ul>
<p>For LTE IoT connections, this can push handshakes from 200ms to 220ms - noticeable but usually acceptable. However, on slower networks or with multiple signature operations, the impact multiplies.</p>
</section>
<section id="system-design-considerations" class="level3">
<h3 class="anchored" data-anchor-id="system-design-considerations">System Design Considerations</h3>
<ul>
<li><strong>Real-time systems</strong> must allocate resources for 99th percentile (21 iterations), not average-case (5 iterations), unless they can tolerate occasional missed deadlines</li>
<li><strong>Energy-harvesting devices</strong> need to either buffer energy or implement adaptive signing strategies</li>
<li><strong>Communication protocols</strong> should not assume signing is faster than network operations</li>
<li><strong>Signature verification</strong> (for example, during firmware updates) and key generation don’t use the abort-and-retry loop and can be significantly faster than signing, creating performance asymmetry</li>
</ul>
</section>
</section>
<section id="best-practices-for-benchmarking-ml-dsa-signing" class="level2">
<h2 class="anchored" data-anchor-id="best-practices-for-benchmarking-ml-dsa-signing">Best Practices for Benchmarking ML-DSA Signing</h2>
<p>If you’re benchmarking ML-DSA implementations on constrained devices, don’t fall into the trap of reporting a single timing number. Here’s what to measure:</p>
<section id="single-iteration-signing-time" class="level3">
<h3 class="anchored" data-anchor-id="single-iteration-signing-time">1. Single-Iteration Signing Time</h3>
<p>Measure the time for signature operations that complete in a single rejection-sampling iteration. This captures the <strong>best-case performance</strong> and shows the efficiency of the core algorithm without retry overhead. It isolates the fundamental speed of your cryptographic implementation and makes it comparable across different hardware platforms.</p>
</section>
<section id="average-signing-time" class="level3">
<h3 class="anchored" data-anchor-id="average-signing-time">2. Average Signing Time</h3>
<p>Report the average across a large number of signing operations using independent messages and randomness. Alternatively, report the time corresponding to the expected number of iterations (shown in the table above). This reflects real-world performance that users will actually experience, accounting for the natural variation in rejection attempts.</p>
</section>
<section id="iteration-reporting" class="level3">
<h3 class="anchored" data-anchor-id="iteration-reporting">3. Iteration Reporting</h3>
<p>The most important step: <strong>make the signing function report the actual number of rejection iterations used</strong>. This enables:</p>
<ul>
<li>Accurate averaging of multiple signing operations</li>
<li>Correlation of timing/energy measurements with iteration count</li>
<li>Identification of anomalies or implementation issues</li>
</ul>
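<p>A benchmark harness built around iteration reporting might look like the sketch below. The signer here is a simulated stand-in (<code>simulated_sign</code>, a hypothetical function that only draws an iteration count from the geometric model); on a real target you would replace it with a signing call that returns the iteration count alongside the signature.</p>

```python
import random
import statistics
import time
from collections import defaultdict

P_ACCEPT = 0.1963  # ML-DSA-65 acceptance probability


def simulated_sign(message: bytes, rng: random.Random) -> tuple[bytes, int]:
    """Stand-in for a signer that also reports its iteration count."""
    iterations = 1
    while rng.random() >= P_ACCEPT:  # rejected with probability 1 - p
        iterations += 1
    return b"signature-placeholder", iterations


def benchmark(sign, runs: int, seed: int = 0):
    """Time `runs` signatures and group the timings by iteration count."""
    rng = random.Random(seed)
    by_iterations = defaultdict(list)  # iteration count -> timings in seconds
    for i in range(runs):
        message = i.to_bytes(8, "big")  # independent message per run
        start = time.perf_counter()
        _, iterations = sign(message, rng)
        by_iterations[iterations].append(time.perf_counter() - start)
    return by_iterations


results = benchmark(simulated_sign, runs=20_000)
counts = [n for n, timings in results.items() for _ in timings]
print("mean iterations:", round(statistics.mean(counts), 2))
print("single-iteration median time (s):", statistics.median(results[1]))
```

<p>Grouping timings by iteration count gives the single-iteration figure and the average from the same run, and any bucket whose timing does not scale with its iteration count points to an anomaly. The timings of the simulated signer are of course meaningless; only the structure carries over.</p>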
</section>
</section>
<section id="comparing-to-traditional-signatures" class="level2">
<h2 class="anchored" data-anchor-id="comparing-to-traditional-signatures">Comparing to Traditional Signatures</h2>
<p>To illustrate why rejection sampling benchmarking is different, consider RSA or ECDSA:</p>
<ul>
<li><strong>Signing time is nearly deterministic</strong>: Repeated measurements of a 2048-bit RSA signature on a fixed platform typically agree closely, so a single measurement is representative</li>
<li><strong>Energy consumption is predictable</strong>: An ECDSA-P256 signature consumes nearly identical energy regardless of message or key</li>
<li><strong>Performance metrics are straightforward</strong>: Report a single timing number; it accurately represents all signing operations</li>
</ul>
<p>None of this holds for ML-DSA, where the choice of metric dramatically affects system design. Budget for the average case and, for ML-DSA-65, roughly a third of your operations will exceed it. Budget for the 99th percentile and you’re provisioning about 4x the average-case resources.</p>
</section>
<section id="conclusion" class="level2">
<h2 class="anchored" data-anchor-id="conclusion">Conclusion</h2>
<p>The rejection sampling in ML-DSA’s signing operations is a carefully engineered security feature, not a limitation. It’s fundamental to how lattice-based signatures provide provable security against known attacks. But it does require a thoughtfully different approach to performance evaluation than you might expect from traditional signature algorithms. It is worth noting that:</p>
<ul>
<li><strong>Performance is probabilistic, not deterministic.</strong> A single timing measurement is meaningless. Instead, you need to understand the distribution of signing times.</li>
<li><strong>The expected overhead is manageable.</strong> Averaging roughly 4 to 5 iterations, depending on the parameter set, is reasonable. The core signing operation (one iteration) executes in acceptable time on modern embedded hardware.</li>
<li><strong>You <em>can</em> predict and measure it precisely.</strong> Using the geometric distribution model and FIPS-204 parameters, you now have the mathematical framework to estimate signing time distributions without extensive benchmarking.</li>
<li><strong>System design must account for variability.</strong> Real-time systems, battery-powered devices, and time-sensitive protocols need to budget for 99th percentile cases, not average-case performance.</li>
<li><strong>Signing only.</strong> The abort/retry mechanism applies only to the signing operation, not to key generation or verification.</li>
</ul>


</section>

 ]]></description>
  <category>Cryptography</category>
  <guid>https://www.amongbytes.com/posts/20260223-mldsa-benchmarking/20260223-mldsa-benchmarking.html</guid>
  <pubDate>Mon, 23 Feb 2026 00:00:00 GMT</pubDate>
  <media:content url="https://www.amongbytes.com/img/rej_sampling.png" medium="image" type="image/png" height="77" width="144"/>
</item>
<item>
  <title>Making SmartFusion2 Productive in Brownfield Systems</title>
  <dc:creator>Kris Kwiatkowski</dc:creator>
  <link>https://www.amongbytes.com/posts/20251219-sf2/20251219-sf2.html</link>
  <description><![CDATA[ 





<section id="introduction" class="level1">
<h1>Introduction</h1>
<p>SmartFusion2 is an interesting platform: an FPGA tightly coupled with a Cortex‑M3 microcontroller, security features baked into silicon, and a toolchain that reflects its long industrial heritage. It is powerful—but it can also feel heavy if your primary goal is simply to get code running, talk over UART, and start experimenting.</p>
<p>This post describes a software-first workflow for working with the Microcontroller Subsystem (MSS) on Microchip SmartFusion2, based on hands‑on work with the M2S090TS evaluation board. The emphasis is deliberately on getting productive quickly, especially for software and security engineers who do not want to live inside FPGA tools.</p>
<p>Rather than documenting every register or Libero click-path, this article focuses on the decisions, trade-offs, and minimal setup that make the platform usable and predictable in practice.</p>
</section>
<section id="context-an-older-platform-and-brownfield-devices" class="level1">
<h1>Context: An Older Platform and Brownfield Devices</h1>
<p>SmartFusion2 is not a new platform. It has been deployed in real products for years, often in long-lived industrial, infrastructure, and security-sensitive systems. This matters, because a large part of its relevance today comes from brownfield deployments, not greenfield designs.</p>
<p>In engineering terms, brownfield devices are systems that:</p>
<ul>
<li>Are already deployed or close to deployment</li>
<li>Have fixed hardware constraints</li>
<li>Cannot be redesigned freely without high cost or risk</li>
<li>Must be extended, maintained, or upgraded in place</li>
</ul>
<p>This is in contrast to greenfield designs, where hardware, software, and tooling choices can be made from scratch. For brownfield devices, the problem is rarely “design the perfect system.” Instead, it is:</p>
<ul>
<li>How to add new functionality without changing hardware</li>
<li>How to modernise software workflows on top of legacy platforms</li>
<li>How to introduce new security mechanisms without destabilising a proven system</li>
</ul>
<p>SmartFusion2 fits squarely into this category. Many teams encounter it not because they would choose it today, but because it is already part of an existing product or certification boundary.</p>
<p>The approach described in this article is shaped by that reality: it assumes fixed hardware, aging tooling, and long product lifetimes, and focuses on making such systems workable and productive rather than ideal.</p>
</section>
<section id="philosophy-software-first-hardware-fixed" class="level1">
<h1>Philosophy: Software First, Hardware Fixed</h1>
<p>The guiding idea behind this setup is simple:</p>
<blockquote class="blockquote">
<p>Fix the hardware early, keep it minimal, and let software move fast.</p>
</blockquote>
<p>SmartFusion2 allows deep hardware customisation, but recompiling FPGA designs is slow, license-gated, and unnecessary for early development. By freezing a small, known-good MSS configuration and distributing it as a ready-to-flash image, software developers can iterate without touching Libero at all.</p>
</section>
<section id="a-minimal-and-repeatable-development-platform" class="level1">
<h1>A Minimal and Repeatable Development Platform</h1>
<p>The evaluation setup is intentionally designed to remove friction during early development. All interaction with the board is handled through a single USB connection. The kit exposes a built-in FlashPro programmer, so no external probes or adapters are required to establish a usable development environment.</p>
<p>In practice, the setup reduces to three fixed elements:</p>
<ul>
<li>One USB cable used for both programming and UART console access</li>
<li>A jumper configuration that allows flashing of both the FPGA fabric and firmware</li>
<li>A known, static DIP-switch configuration</li>
</ul>
<div id="fig-elephants" class="quarto-layout-panel">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-elephants-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<div class="quarto-layout-row">
<div class="quarto-layout-cell" style="flex-basis: 50.0%;justify-content: center;">
<p><img src="https://www.amongbytes.com/posts/20251219-sf2/j1.jpg" class="img-fluid figure-img" width="400"></p>
</div>
<div class="quarto-layout-cell" style="flex-basis: 50.0%;justify-content: center;">
<p><img src="https://www.amongbytes.com/posts/20251219-sf2/j3.jpg" class="img-fluid figure-img" width="400"></p>
</div>
</div>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig quarto-uncaptioned" id="fig-elephants-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1
</figcaption>
</figure>
</div>
<p>Once configured, the board can remain in this state for the entire development cycle.</p>
<p>On the hardware side, the FPGA design is deliberately minimal. Only the components required to make the MSS usable are enabled: a Cortex-M3 clocked at 166 MHz, APB buses running at full core speed, a single UART for console output, GPIO-mapped LEDs for visible execution state, and one GPIO routed as an external trigger for measurement and debugging. There is no custom logic, no accelerators, and no unused peripherals. The result is a predictable execution environment that behaves identically on every boot.</p>
<p>The FPGA design is developed using Libero SoC, Microchip’s integrated FPGA design environment for SmartFusion2. Libero is used to configure the FPGA fabric, MSS peripherals, clocks, and pin assignments, and to generate the final FPGA bitstream. It is a comprehensive but heavyweight toolchain, typically operated by hardware teams, and it requires licenses, long build times, and detailed device-level knowledge. Libero produces both the FPGA bitstream and the associated firmware artifacts, which can then be used to build a BSP. Firmware engineers can subsequently continue software development in SoftConsole IDE or, for more low-level workflows, directly edit the code (e.g., in vim) and build it using a GCC-based ARM toolchain.</p>
<p>To keep the workflow software-centric, the FPGA can be programmed using FlashPro Express rather than a full Libero project. The hardware is delivered as a pre-built programming job, which avoids licensing requirements and lengthy synthesis or place-and-route steps. Every developer works against an identical hardware configuration, and flashing the FPGA becomes a one-time operation that takes seconds (well, maybe longer…).</p>
<p>The firmware follows the same philosophy. It does only what is necessary to confirm that the platform is alive: initialise clocks and GPIOs, bring up a UART console, and provide a visible heartbeat via an LED. If UART output is visible and the LED toggles, the system is ready. Anything beyond that belongs in application code, not in bring-up firmware.</p>
<p>Two firmware build modes are supported: a debug configuration that runs from SRAM for fast iteration, and a release configuration that runs from on-chip non-volatile memory for deployment-like testing. This split keeps development efficient without sacrificing realism.</p>
</section>
<section id="enabling-uart" class="level1">
<h1>Enabling UART</h1>
<p>Using SmartFusion2 from Linux works well in practice, but one detail regularly trips people up: <strong>UART access via the on-board FTDI device</strong>. Microchip’s documentation is very complete for Windows, but Linux workflows are less well covered (even though Microchip support is excellent). The issue is not the hardware, but how Linux binds drivers to the FTDI interfaces by default.</p>
<p>The evaluation board exposes a multi-interface FT4232H USB device. From a hardware perspective, several virtual serial channels are available. From the Linux kernel’s perspective, however, only one of those interfaces is automatically bound to the <code>ftdi_sio</code> driver, while the remaining three are not.</p>
<p>The FT4232H device connected to the SmartFusion2 micro-USB port sets up four virtual ports. Under Linux, the root device is listed as:</p>
<pre><code>Bus 003 Device 005: ID 1514:2008 Actel Embedded FlashPro5</code></pre>
<p>and each individual interface as:</p>
<pre><code>/:  Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/4p, 480M
    |__ Port 4: Dev 5, If 0, Class=Vendor Specific Class, Driver=, 480M
    |__ Port 4: Dev 5, If 1, Class=Vendor Specific Class, Driver=, 480M
    |__ Port 4: Dev 5, If 2, Class=Vendor Specific Class, Driver=ftdi_sio, 480M
    |__ Port 4: Dev 5, If 3, Class=Vendor Specific Class, Driver=, 480M</code></pre>
<p>By default, Linux binds <code>ftdi_sio</code> only to interface <code>If 2</code>. In practice, the remaining interfaces can also be bound to <code>ftdi_sio</code>. You can force the driver to match them as follows:</p>
<pre><code>echo 1514 2008 | sudo tee /sys/bus/usb-serial/drivers/ftdi_sio/new_id</code></pre>
<p>This command causes the kernel to rebind the remaining FTDI interfaces, which can be verified in <code>dmesg</code>. After that, UART access is available via the <code>/dev/ttyUSBx</code> device corresponding to interface <code>If 3</code> (the last FTDI interface).</p>
<p>(Credit goes to a colleague who helped identify this behaviour.)</p>
</section>
<section id="flashing-firmware-from-the-linux-command-line" class="level1">
<h1>Flashing Firmware from the Linux Command Line</h1>
<p>Flashing firmware from the command line is not officially supported by the standard toolchain; Microchip recommends using SoftConsole. The GUI relies on OpenOCD (bundled with SoftConsole) and GDB, both of which connect to the on-chip debugger via USB.</p>
<p>A command-line workflow is essential for automation, CI, and reproducible Linux-based development. While it is possible to reverse-engineer the OpenOCD invocations used by the GUI, there is a cleaner alternative.</p>
<p>The <a href="https://pyocd.io/">pyOCD</a> toolset provides a practical and well-maintained solution for flashing and debugging SmartFusion2 devices using an external probe.</p>
<p>SmartFusion2 evaluation boards expose a standard RVI debug header, which allows the use of common ARM debug probes such as:</p>
<ul>
<li>Keil ULINK (CMSIS‑DAP)</li>
<li>Other CMSIS‑DAP compliant probes</li>
<li>SEGGER J‑LINK</li>
</ul>
<p>A <strong>key drawback</strong> of using an external programmer is that jumper J8 has to be switched to position 2–3 each time the FPGA bitstream is programmed using FlashPro 4 or 5 (the external probe uses position 1–2).</p>
<p>pyOCD supports these probes out of the box and provides a clean, scriptable interface suitable for automation.</p>
<p>In practice, the workflow looks like this:</p>
<ul>
<li>Connect the external probe to the RVI header</li>
<li>Install pyOCD on the host system</li>
<li>Install the CMSIS device pack for the SmartFusion2 target</li>
<li>Flash binaries directly from the command line</li>
</ul>
<p>Once set up, flashing a firmware image becomes a single command, which integrates naturally into Makefiles, CMake builds, or CI pipelines. This avoids reliance on vendor GUIs while remaining robust and repeatable.</p>
<p>Concretely, for the ULINK2 probe and the M2S090 board:</p>
<ul>
<li>HW configuration: The user connects ULINK2 into the RVI port, connects pins 1-2 of jumper J8 and J31 (as below).</li>
<li>SW configuration: Once pyOCD is installed, the user needs to download the CMSIS pack for the M2S090 board. No special udev rules are required.</li>
</ul>
<pre><code>&gt; pyocd pack install m2s090
Downloading packs (press Control-C to cancel):
    Microsemi.M2Sxxx.1.0.65
Downloading descriptors (001/001)

&gt; pyocd list -p
  #   Probe/Board                           Unique ID   Target
----------------------------------------------------------------
  0   Keil Software Keil ULINK2 CMSIS-DAP   V0010M9E    n/a</code></pre>
<p>With this setup, flashing becomes straightforward:</p>
<pre><code>pyocd flash --target m2s090 app/hello.bin
pyocd reset --target m2s090 -m hw</code></pre>
<p>The <code>reset</code> command resets the board, and the <code>-m hw</code> option ensures that the device does not enter debug mode.</p>
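<p>To hook the two commands above into a build script or CI job, a thin wrapper is usually enough. The following is a minimal sketch, not part of the original setup: the function names <code>pyocd_commands</code> and <code>flash_and_reset</code> and the <code>dry_run</code> flag are hypothetical, and only the <code>pyocd</code> invocations themselves come from the text above.</p>

```python
import shlex
import subprocess


def pyocd_commands(image, target="m2s090"):
    """Return the argv lists for flashing `image` and hardware-resetting the target."""
    return [
        ["pyocd", "flash", "--target", target, image],
        ["pyocd", "reset", "--target", target, "-m", "hw"],
    ]


def flash_and_reset(image, target="m2s090", dry_run=False):
    """Run the commands in order; check=True raises on the first failure."""
    for argv in pyocd_commands(image, target):
        print("+", shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # dry_run=True only prints the commands, so this runs without a probe attached.
    flash_and_reset("app/hello.bin", dry_run=True)
```

<p>Because <code>subprocess.run(..., check=True)</code> raises on a non-zero exit code, a failed flash stops the pipeline before the reset is attempted.</p>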
<p>For measurement and analysis, UART is often not the most convenient interface.</p>
<p>As an alternative, SEGGER J-LINK or CMSIS-DAP programmers can be used. The RTT interface provided by J-LINK enables fast data transfer between the device and the host, which can be useful for collecting data for constant-time analysis, serving as an alternative to UART.</p>
<p>In addition, the board includes a Trace Port Interface Unit (TPIU) supporting ITM and ETM (Instrumentation Trace Macrocell / Embedded Trace Macrocell), which can also be used as an alternative to RTT.</p>
</section>
<section id="using-usbip-for-remote-access" class="level1">
<h1>Using usbip for Remote Access</h1>
<p>In shared lab environments, it is often useful to access SmartFusion2 boards remotely—for example, from CI servers or developer machines without physical USB access. It also helps avoid accidents, such as spilling coffee on expensive hardware.</p>
<p>In such cases, usbip can be used to export a USB-attached debug probe (and, if needed, the FlashPro interface) from a remote host and reattach it on the client machine over the network.</p>
<p>This enables:</p>
<ul>
<li>Remote firmware flashing using pyOCD</li>
<li>Centralised lab hardware shared across multiple users</li>
<li>Integration of physical boards into CI systems</li>
</ul>
<p>Configuration is very simple:</p>
<ul>
<li>Server side (machine with the USB device physically plugged in):</li>
</ul>
<pre><code>sudo modprobe -a usbip_core usbip_host
sudo usbipd -D
sudo usbip list -l
sudo usbip bind -b &lt;BUSID&gt;</code></pre>
<p>The <code>usbip list -l</code> command lists available <code>BUSID</code> values, which can be exported over IP using <code>usbip bind -b</code>.</p>
<ul>
<li>Client side:</li>
</ul>
<pre><code>sudo modprobe -a usbip_core vhci_hcd
sudo usbip list -r &lt;SERVER_IP&gt;
sudo usbip attach -r &lt;SERVER_IP&gt; -b &lt;BUSID&gt;</code></pre>
<p>Similarly, <code>usbip list -r</code> lists the USB devices exported by the remote server.</p>
<p>As a practical note, usbip uses TCP port 3240 by default, so make sure this port is allowed through the firewall.</p>
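<p>Before attaching a device from a CI job, it can help to check that the server's usbip port is reachable at all. This is a small sketch under the assumption that <code>usbipd</code> listens on its default TCP port 3240; the helper name <code>usbip_reachable</code> is hypothetical.</p>

```python
import socket

USBIP_PORT = 3240  # default TCP port used by usbipd (assumed unchanged)


def usbip_reachable(host, port=USBIP_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

<p>A <code>False</code> result points to a firewall or daemon problem rather than to pyOCD or the probe, which narrows down failures in remote setups.</p>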
<p>When combined with pyOCD and stable UART device naming (via udev), usbip allows SmartFusion2 boards to be treated much like network‑attached test equipment. This setup is particularly useful for regression testing, automated measurements, and long‑running experiments.</p>
<p>As with any USB‑over‑IP solution, latency and reliability depend on the network, but for flashing and debug control the approach works well in practice.</p>
</section>
<section id="conclusion" class="level1">
<h1>Conclusion</h1>
<p>SmartFusion2 can feel intimidating at first, especially if approached from an FPGA-centric mindset. Treated instead as a microcontroller platform with a fixed hardware personality, it becomes much easier to work with.</p>
<p>Despite being an older platform, SmartFusion2 remains highly capable. The MSS provides 80 KB of SRAM (or 64 KB when error correction is enabled), which may sound restrictive by modern standards—but in practice it is sufficient for serious cryptography. In our work, we were able to fit both ML-DSA and ML-KEM comfortably, using significantly less. The resulting implementations run reliably and with respectable performance for a Cortex-M3-class device.</p>
<p>This does, however, require care. On Cortex-M3, certain long multiplication instructions are not constant-time, which means developers must be deliberate when implementing cryptographic arithmetic, especially in security-sensitive contexts. Understanding where the microarchitecture leaks and how to work around it is essential.</p>
<p>That topic deserves a deeper discussion of its own and is a story for another post.</p>
<p>By keeping hardware minimal, firmware simple, and tooling predictable, the platform becomes stable enough to fade into the background. Once the basics are solid, complexity can be added where it actually matters. Starting simple is what makes that possible.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://www.amongbytes.com/posts/20251219-sf2/sf2_uart.jpg" class="img-fluid figure-img"></p>
<figcaption>SF2</figcaption>
</figure>
</div>


</section>

 ]]></description>
  <guid>https://www.amongbytes.com/posts/20251219-sf2/20251219-sf2.html</guid>
  <pubDate>Mon, 22 Dec 2025 00:00:00 GMT</pubDate>
  <media:content url="https://www.amongbytes.com/posts/20251219-sf2/sf2.png" medium="image" type="image/png" height="96" width="144"/>
</item>
<item>
  <title>Evaluating Intel QAT for Hash-Based Post-Quantum Signature Schemes</title>
  <dc:creator>Kris Kwiatkowski</dc:creator>
  <link>https://www.amongbytes.com/posts/20251129-qat-intro/20251129-qat-intro.html</link>
  <description><![CDATA[ 





<section id="introduction" class="level1">
<h1>Introduction</h1>
<p>Intel QuickAssist Technology (QAT) accelerates cryptographic workloads by offloading selected operations to dedicated hardware. This reduces CPU load and improves throughput for applications that rely on TLS, VPN, storage encryption, key exchange, or large-scale hashing.</p>
<p>This post outlines the QAT software stack and evaluates its usefulness for accelerating modern cryptographic implementations.</p>
<p>In this post, I show that QAT hashing is not a good fit for post-quantum signature schemes such as LMS, XMSS, and SLH-DSA: the accelerator only wins for large, contiguous inputs, while these schemes issue many small, sequential hash calls. Instead, QAT remains best suited for bulk TLS/IPsec/storage workloads and asynchronous offload.</p>
<p><img src="https://www.amongbytes.com/posts/20251129-qat-intro/qat_architecture.svg" class="img-fluid" style="width:100.0%"></p>
<section id="qat-software-stack" class="level3">
<h3 class="anchored" data-anchor-id="qat-software-stack">QAT software stack</h3>
<p>The diagram above shows how applications reach QAT hardware through OpenSSL, the QAT Engine, and Intel’s software libraries:</p>
<ul>
<li><strong>Application</strong>: Consumes cryptographic services (e.g.&nbsp;TLS termination, IPsec, disk encryption, PKI, secure messaging).</li>
<li><strong>OpenSSL</strong>: Exposes standard crypto APIs; can dispatch operations to software or QAT hardware transparently.</li>
<li><strong>QAT Engine</strong>: OpenSSL engine plugin that offloads supported primitives (RSA, ECDSA, DH, AES-GCM, ChaCha20-Poly1305, SHA variants) to QAT hardware when available.</li>
<li><strong>Intel IPP</strong>: Highly optimized CPU software primitives (SIMD, microarchitecture-tuned) for symmetric and asymmetric cryptography when hardware offload is not used.</li>
<li><strong>Multi-Buffer Crypto (IPP-crypto)</strong>: Batches multiple independent crypto jobs (e.g.&nbsp;parallel RSA, AES-GCM streams) to improve core utilization—useful for high-concurrency servers.</li>
<li><strong>Intel QAT Driver</strong>: Kernel + user-space interface to QAT devices. Two branches exist:
<ul>
<li>In-tree (QATlib): Aligned with kernel development model; standardized feature management.</li>
<li>Out-of-tree: Broader feature set for some legacy or extended hardware. Driver utilities (e.g.&nbsp;<code>adf_ctl</code>) configure and monitor accelerator instances. Driver families 1.x and 2.x support different hardware generations.</li>
</ul></li>
<li><strong>QAT Hardware</strong>: Integrated (selected Xeon SKUs) or discrete PCIe accelerators providing queues for crypto and compression services.</li>
<li><strong>QATlib</strong>: User-space library exposing a stable API to submit crypto jobs to QAT hardware or to fall back on software paths.</li>
</ul>
</section>
<section id="key-points" class="level3">
<h3 class="anchored" data-anchor-id="key-points">Key Points</h3>
<ul>
<li>Applications typically call OpenSSL, which can use the QAT Engine to offload to hardware or fall back to software (IPP, IPP-crypto).</li>
<li>QAT Engine is designed specifically for hardware acceleration.</li>
<li>Intel IPP and Multi-buffer Crypto provide CPU-based optimizations when hardware acceleration is unavailable or unnecessary.</li>
<li>Multi-buffer Crypto boosts performance by parallelizing cryptographic operations across multiple data buffers, which is ideal for high-concurrency servers.</li>
</ul>
</section>
</section>
<section id="quantitative-analysis" class="level1">
<h1>Cryptographic support</h1>
<p>The list of algorithms supported by QAT is documented in the official QAT <a href="https://intel.github.io/quickassist/PG/services_cryptography_api.html">documentation</a>. Support varies by hardware generation; the driver detects availability. If an algorithm is unsupported, its context initialization returns <code>CPA_STATUS_UNSUPPORTED</code>. Numeric algorithm IDs are defined in the driver sources <a href="https://github.com/intel/qatlib/tree/main/quickassist/include/lac">here</a>.</p>
<p>For this study, the relevant hashing algorithms provided by the installed hardware are:</p>
<ul>
<li>SHA2-256</li>
<li>SHA2-512</li>
<li>SHA3-256</li>
</ul>
<p>(SHAKE is not supported on this device.)</p>
<section id="quantitative-analysis-1" class="level2">
<h2 class="anchored" data-anchor-id="quantitative-analysis-1">Quantitative analysis</h2>
<p>We assess whether QAT hash implementations are useful as building blocks inside hash-based post-quantum signature schemes such as LMS, XMSS, and SLH-DSA.</p>
<p>Measurements compare a conventional optimized C implementation (running on the host CPU) with one-off QAT hash requests submitted via QATlib. Single-shot calls reflect PQ use cases: many small, latency-sensitive invocations rather than bulk streaming. Inputs up to 4 MB stay within QAT request size limits; larger sizes are not relevant for these schemes.</p>
<section id="sha2-256" class="level3">
<h3 class="anchored" data-anchor-id="sha2-256">SHA2-256</h3>
<p>In LMS (and similarly XMSS) SHA2-256 dominates runtime in key and signature generation. The inner loop for computing <code>K</code> (<a href="https://datatracker.ietf.org/doc/html/rfc8554#section-4.3">RFC 8554</a>) iteratively applies the hash to ≈55‑byte inputs; the sequential dependency prevents parallelization:</p>
<pre><code>     4. Compute the string K as follows:
        for ( i = 0; i &lt; p; i = i + 1 ) {
          tmp = x[i]
          for ( j = 0; j &lt; 2^w - 1; j = j + 1 ) {
            tmp = H(I || u32str(q) || u16str(i) || u8str(j) || tmp)
          }
          y[i] = tmp
        }
        K = H(I || u32str(q) || u16str(D_PBLC) || y[0] || ... || y[p-1])</code></pre>
<p>Characteristics:</p>
<ol type="1">
<li>Strictly sequential inner chain (no batching benefit). The intermediate value <code>tmp</code> must be computed sequentially and cannot be parallelized.</li>
<li>Very small message size per hash call (55 bytes per invocation).</li>
</ol>
<p>Benchmark results (time per input, log scale) show QAT incurs large fixed overhead for small buffers; advantage appears only beyond ≈512 KB. This makes QAT hashing unsuitable inside LMS/XMSS constructions: cumulative latency increases sharply when each tiny hash call pays the fixed offload cost.</p>
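<p>To make the access pattern concrete, the RFC 8554 loop quoted above can be sketched in a few lines of Python. This is an illustration only: it assumes the <code>LMOTS_SHA256_N32_W8</code> parameter set (n = 32, w = 8, p = 34), uses placeholder values for <code>I</code> and the private elements <code>x</code>, and counts the chained hash calls instead of timing them.</p>

```python
import hashlib

D_PBLC = 0x8080  # domain separator for the public-key hash (RFC 8554)


def lmots_public_key(I, q, x, w=8):
    """Compute K from private elements x[0..p-1]; also return the number of chained hash calls."""
    prefix = I + q.to_bytes(4, "big")
    y, calls = [], 0
    for i, tmp in enumerate(x):
        for j in range(2**w - 1):
            # 16 (I) + 4 (q) + 2 (i) + 1 (j) + 32 (tmp) = 55-byte input
            msg = prefix + i.to_bytes(2, "big") + j.to_bytes(1, "big") + tmp
            tmp = hashlib.sha256(msg).digest()  # each call depends on the previous one
            calls += 1
        y.append(tmp)
    K = hashlib.sha256(prefix + D_PBLC.to_bytes(2, "big") + b"".join(y)).digest()
    return K, calls


if __name__ == "__main__":
    p = 34                                   # LMOTS_SHA256_N32_W8
    I = bytes(16)                            # placeholder identifier
    x = [bytes([i]) * 32 for i in range(p)]  # placeholder private elements
    K, calls = lmots_public_key(I, q=0, x=x)
    print(len(K), calls)                     # prints "32 8670"
```

<p>With p = 34 and w = 8, a single one-time public key already costs 34 × 255 = 8670 hash calls of 55 bytes each, and every one of them would pay the fixed offload cost if routed through QAT.</p>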
<div id="0828b181" class="cell" data-execution_count="2">
<div class="cell-output cell-output-display">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://www.amongbytes.com/posts/20251129-qat-intro/20251129-qat-intro_files/figure-html/cell-3-output-1.png" class="quarto-figure quarto-figure-center figure-img" width="661" height="595"></p>
</figure>
</div>
</div>
</div>
<p>Consequently, using QAT-based hash functions as components within LMS, XMSS, or SLH-DSA implementations is not advisable, as it would result in a substantial performance penalty.</p>
<p>Conclusion: explore specialized hardware-assisted chaining (e.g.&nbsp;PQPerform-style offload of the iterative compression loop) rather than generic QAT hashing. Such approaches, however, require hardware features not exposed by QAT.</p>
</section>
<section id="sha3-256" class="level3">
<h3 class="anchored" data-anchor-id="sha3-256">SHA3-256</h3>
<p>SHA3-256 offload shows similar scaling; hardware improves absolute times versus SHA2 but still requires large inputs to amortize setup. For PQ schemes with many sub-256‑byte or kilobyte-scale hashes, CPU software remains superior in both latency and energy.</p>
<div id="b2ed053f" class="cell" data-execution_count="3">
<div class="cell-output cell-output-display">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://www.amongbytes.com/posts/20251129-qat-intro/20251129-qat-intro_files/figure-html/cell-4-output-1.png" class="quarto-figure quarto-figure-center figure-img" width="661" height="595"></p>
</figure>
</div>
</div>
</div>
<p>The results are consistent with those observed for SHA2, with SHA3 showing noticeably better hardware performance.</p>
<p>Conclusion: The hardware supports SHA3-256 but not SHAKE. Performance characteristics are similar: QAT only overtakes the CPU at large buffer sizes, making it unsuitable for PQ signature schemes where hashes are small and sequential.</p>
</section>
<section id="sha2-512" class="level3">
<h3 class="anchored" data-anchor-id="sha2-512">SHA2-512</h3>
<p>Used in SLH-DSA. Host software (64‑bit words) outperforms QAT across all tested sizes up to 4 MB. The 64‑bit variant also beats SHA2-256 on the same platform, as expected due to native word operations.</p>
<div id="dbac75bc" class="cell" data-execution_count="4">
<div class="cell-output cell-output-display">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://www.amongbytes.com/posts/20251129-qat-intro/20251129-qat-intro_files/figure-html/cell-5-output-1.png" class="quarto-figure quarto-figure-center figure-img" width="661" height="595"></p>
</figure>
</div>
</div>
</div>
<p>Conclusion: For SHA2-512, QAT never outperforms the host CPU at any tested size. Software remains the optimal choice for SLH-DSA workloads.</p>
</section>
<section id="overall-findings" class="level3">
<h3 class="anchored" data-anchor-id="overall-findings">Overall findings</h3>
<ul>
<li>QAT hashing favors large, contiguous payloads (bulk TLS record digestion, storage integrity, deduplication).</li>
<li>PQ signature schemes issue numerous small, sequential, data-dependent hash calls—poor match for queue-based accelerator semantics.</li>
<li>Offload overhead dominates for sub‑MB inputs; no throughput crossover in practical PQ parameter ranges.</li>
<li>Optimized host implementations (vectorized SHA2/SHA3) yield lower latency and better scaling for LMS, XMSS, SLH-DSA.</li>
</ul>
<p>Conclusion: For post-quantum signature workloads on 64‑bit systems, retain optimized CPU hash functions. QAT hashing is not an effective accelerator for their internal iterative constructions.</p>
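<p>The shape of these results follows from a simple cost model: the CPU time grows linearly with input size, while the offload path adds a fixed per-request overhead on top of its own linear term. The sketch below is a back-of-the-envelope model, not a fit to the measurements; the overhead and throughput figures in the example are hypothetical placeholders chosen only to show the mechanics.</p>

```python
def t_cpu(n, cpu_bps):
    """Time to hash n bytes on the CPU at cpu_bps bytes per second."""
    return n / cpu_bps


def t_qat(n, overhead_s, qat_bps):
    """Time to hash n bytes via offload: fixed overhead plus streaming time."""
    return overhead_s + n / qat_bps


def crossover_bytes(overhead_s, cpu_bps, qat_bps):
    """Input size at which both paths take equal time; None if offload never wins."""
    if cpu_bps >= qat_bps:
        return None
    return overhead_s / (1.0 / cpu_bps - 1.0 / qat_bps)


if __name__ == "__main__":
    # Hypothetical numbers, NOT measured values.
    overhead_s, cpu_bps, qat_bps = 100e-6, 2e9, 4e9
    print("crossover (bytes):", crossover_bytes(overhead_s, cpu_bps, qat_bps))
    print("slowdown at 55 bytes:", t_qat(55, overhead_s, qat_bps) / t_cpu(55, cpu_bps))
```

<p>The <code>None</code> branch corresponds to the SHA2-512 case above, where the host is at least as fast as the accelerator at every tested size and no crossover exists.</p>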
</section>
</section>
</section>
<section id="technological-fit-for-qat" class="level1">
<h1>Technological Fit for QAT</h1>
<p>QAT pays off for bulk, high-concurrency workloads, but to realize these benefits applications must integrate it asynchronously. Offloading compute-heavy primitives frees CPU cycles and improves throughput, particularly in high-concurrency environments.</p>
<p>A key enabler is asynchronous execution. In the past, Intel invested in OpenSSL by implementing the <a href="https://www.mobibrw.com/wp-content/uploads/2024/08/337003-001-intelquickassisttechnologyandopenssl-110.pdf"><code>ASYNC_JOB</code></a> infrastructure. This functionality is based on the reactor pattern, which we describe below.</p>
<section id="asynchronous-use-of-the-qat_engine.-the-reactor-pattern." class="level2">
<h2 class="anchored" data-anchor-id="asynchronous-use-of-the-qat_engine.-the-reactor-pattern.">Asynchronous use of the <code>QAT_engine</code>. The reactor pattern.</h2>
<p>The reactor software design pattern is an event handling strategy that can respond to many potential service requests concurrently. Its key function is to demultiplex incoming requests and dispatch them to the correct request handler. By relying on event-based mechanisms rather than blocking I/O or multi-threading, it’s designed to handle numerous concurrent I/O bound requests with minimal delay. Request handlers (here QAT engines) are registered as callbacks with the event handler for flexibility and separation of concerns.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><a href="https://en.wikipedia.org/wiki/Reactor_pattern"><img src="https://www.modernescpp.com/wp-content/uploads/2023/04/reactorUML.png" class="img-fluid figure-img"></a></p>
<figcaption>Reactor</figcaption>
</figure>
</div>
<p>The Reactor pattern excels at managing asynchronous I/O. QAT is accessed asynchronously; requests are submitted, and the application is notified later via polling or a callback. This direct match means a Reactor-based application architecture can effectively handle QAT operations without blocking its main event loop. This is typically used for TLS hardware acceleration: each engine is implemented as a request handler within the reactor. Such a design minimizes connection-establishment latency in environments that handle multiple requests at the same time.</p>
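<p>The core of the pattern fits in a few lines. The sketch below is a generic illustration in Python using the standard <code>selectors</code> module; it does not use OpenSSL or the QAT Engine, and a local socket pair stands in for the accelerator's completion notification. The <code>Reactor</code> class and <code>demo</code> function are hypothetical names.</p>

```python
import selectors
import socket


class Reactor:
    """Minimal reactor: demultiplex readiness events and dispatch to registered handlers."""

    def __init__(self):
        self.sel = selectors.DefaultSelector()

    def register(self, sock, handler):
        # The handler is stored as the selector key's data and called when sock is readable.
        self.sel.register(sock, selectors.EVENT_READ, handler)

    def run_once(self, timeout=1.0):
        for key, _ in self.sel.select(timeout):
            key.data(key.fileobj)


def demo():
    reactor, results = Reactor(), []
    # A socket pair stands in for an accelerator's completion channel.
    app_end, device_end = socket.socketpair()
    reactor.register(app_end, lambda s: results.append(s.recv(64)))
    device_end.send(b"job-1 done")  # the "device" signals completion
    reactor.run_once()              # the loop dispatches to the handler
    reactor.sel.close()
    app_end.close()
    device_end.close()
    return results
```

<p>In a TLS server the same loop would also watch client sockets, so a handshake waiting on an offloaded operation does not stall the other connections.</p>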
</section>
<section id="tls-offload" class="level2">
<h2 class="anchored" data-anchor-id="tls-offload">TLS offload</h2>
<p>With asynchronous execution available in OpenSSL, Intel performed a <a href="https://www.intel.com/content/www/us/en/content-details/767645/intel-quickassist-technology-nginx-performance-white-paper.html">measurement study</a> showing performance improvement when offloading TLS. The study goes into great detail on how the measurements were done.</p>
<p>The conclusion that is particularly interesting is that QAT throughput for TLS saturates beyond 16 cores for RSA2K/ECDHE-X25519.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://www.amongbytes.com/posts/20251129-qat-intro/QAT_TLS_offload.jpg" class="img-fluid figure-img"></p>
<figcaption>NGINX Webserver Handshake Performance</figcaption>
</figure>
</div>
<p>It would be interesting to see a similar measurement study focused on post-quantum (PQ) schemes, namely the following combinations:</p>
<ul>
<li><p>Key Exchange: X25519MLKEM768, Digital Signature: RSA-2048. This reflects the most commonly used hybrid setup in today’s web traffic.</p></li>
<li><p>Key Exchange: ML-KEM-768, Digital Signature: ML-DSA-65. A compelling mid-term candidate for fully post-quantum TLS session establishment.</p></li>
<li><p>Key Exchange: ML-KEM-768, Digital Signature: FN-DSA-512. Arguably the most performant post-quantum option for future web communication.</p></li>
</ul>
<p>A comparison of QAT, software implementations, and an implementation on a GPU (e.g.&nbsp;cuPQC) could provide a realistic view of performance trade-offs across hybrid and fully post-quantum scenarios.</p>
</section>
</section>
<section id="qat-ecosystem-ipp-and-multi-buffer-crypto" class="level1">
<h1>QAT ecosystem: IPP and Multi-buffer Crypto</h1>
<p>This part describes in more detail the software components used in the QAT ecosystem.</p>
<section id="integrated-performance-primitives-ipp" class="level3">
<h3 class="anchored" data-anchor-id="integrated-performance-primitives-ipp">Integrated Performance Primitives (IPP)</h3>
<p>Intel IPP is a set of software libraries optimized for Intel processors that provides a variety of cryptographic primitives. They are optimized for latency and throughput by using Intel’s ISA crypto extensions (traditional software acceleration, leveraging SIMD, NI, etc.). IPP implements operations like AES, RSA, ECC, hashing (SHA), and compression algorithms.</p>
</section>
<section id="multi-buffer-crypto" class="level3">
<h3 class="anchored" data-anchor-id="multi-buffer-crypto">Multi-buffer Crypto</h3>
<p>Multi-buffer Crypto is a specialized library developed by Intel and often packaged alongside IPP. It parallelizes cryptographic operations across multiple independent data buffers: it batches independent operations and processes them concurrently, which improves performance in multi-threaded or asynchronous environments. It is particularly useful for ciphers like AES-GCM, where latency can be hidden by parallelism.</p>
<p>It’s important to understand that the library is designed <em>explicitly</em> for parallel workloads. It achieves significantly better throughput by processing multiple buffers concurrently, even within a single thread.</p>
<p>It is ideal for high-throughput server-side networking scenarios, VPNs, and SSL/TLS termination points with <em>multiple</em> client connections.</p>
</section>
<section id="relationship-between-ipp-and-multi-buffer-crypto" class="level3">
<h3 class="anchored" data-anchor-id="relationship-between-ipp-and-multi-buffer-crypto">Relationship between IPP and Multi-buffer Crypto</h3>
<p>IPP provides baseline cryptographic primitives optimized for single-buffer, high-performance CPU execution. Multi-buffer Crypto takes IPP primitives a step further, optimizing for parallel operations on multiple independent data streams.</p>
<p>Multi-buffer Crypto delivers much higher throughput in scenarios where latency can be tolerated and multiple independent tasks can run in parallel.</p>
</section>
</section>
<section id="reuse-within-a-cryptographic-software-stack" class="level1">
<h1>Reuse within a cryptographic software stack</h1>
<p>When integrating QAT (or any accelerator) into a cryptographic software stack, three patterns are common:</p>
<ol type="1">
<li>Integration into the core crypto stack</li>
<li>Separate companion component alongside the core stack</li>
<li>Plugin-based integration for open-source ecosystems</li>
</ol>
<section id="integration-into-the-core-crypto-stack" class="level2">
<h2 class="anchored" data-anchor-id="integration-into-the-core-crypto-stack">Integration into the core crypto stack</h2>
<p>This approach embeds HW‑accelerated implementations directly into the core library, exposing a unified API over both software and HW assisted paths. While convenient at the interface level, it significantly increases design complexity and long‑term maintenance burden. A cleaner separation is to keep the core limited to portable software (including CPU ISA extensions), and avoid coupling to device‑specific accelerators.</p>
<p>The same conclusion applies to accelerator SDKs targeting other devices (e.g., GPUs): keeping them out of the core library preserves clarity and portability.</p>
</section>
<section id="separate-companion-component" class="level2">
<h2 class="anchored" data-anchor-id="separate-companion-component">Separate companion component</h2>
<p>This mirrors a split design: a pure‑software core and a distinct HW‑assisted component, each with its own repository, release cadence, and maintenance workflow. The separation is practical because accelerator support targets a narrower footprint, while the core aims for broad platform coverage. A companion component can provide a user‑space dispatch layer that selects between accelerator backends and CPU‑optimized code, and can be extended to support additional devices over time.</p>
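<p>A user-space dispatch layer of this kind can be very small. The following sketch shows one possible shape, assuming a size-based policy with a threshold near the ≈512 KB crossover observed earlier; the class <code>HashDispatcher</code>, its parameters, and the stand-in accelerator backend are all hypothetical and do not correspond to any existing library API.</p>

```python
import hashlib


class HashDispatcher:
    """Pick a backend per call: accelerator for large inputs, CPU otherwise."""

    def __init__(self, cpu_backend, accel_backend=None, min_offload_bytes=512 * 1024):
        self.cpu = cpu_backend
        self.accel = accel_backend  # None when no accelerator is present
        self.min_offload_bytes = min_offload_bytes

    def select(self, data):
        """Return (name, backend) for this input."""
        if self.accel is not None and len(data) >= self.min_offload_bytes:
            return "accel", self.accel
        return "cpu", self.cpu

    def sha256(self, data):
        _, backend = self.select(data)
        return backend(data)


def cpu_sha256(data):
    return hashlib.sha256(data).digest()


def fake_accel_sha256(data):
    # Stand-in for a real offload call; computes on the CPU so the sketch runs anywhere.
    return hashlib.sha256(data).digest()
```

<p>Keeping the policy in the companion component means the core library never needs to know that an accelerator exists, and the threshold can be tuned per device without touching portable code.</p>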
</section>
<section id="plugin-based-integration-for-open-source-ecosystems" class="level2">
<h2 class="anchored" data-anchor-id="plugin-based-integration-for-open-source-ecosystems">Plugin-based integration for open-source ecosystems</h2>
<p>Here, accelerator support is delivered as plugins for widely used open‑source cryptographic libraries and frameworks. We recommend building and maintaining such plugins following the separate‑component strategy above: plugins integrate with an existing cryptographic implementation rather than re‑implementing primitives. These plugins must handle low‑latency, incremental input processing and asynchronous completion.</p>
<p>This approach delivers immediate value by integrating with real applications—for example, web servers for TLS offload or VPN stacks for key establishment—without requiring application rewrites.</p>
<p>In practice, I favor a portable software core plus a separate accelerator companion and plugin-style integrations for ecosystems like OpenSSL, rather than baking accelerator logic into the core library.</p>
</section>
</section>
<section id="conclusions" class="level1">
<h1>Conclusions</h1>
<p>The <span id="quantitative-analysis">quantitative study</span> presented in this post was conducted using QAT hardware connected via PCIe. While the host machine used is relatively powerful, the PCIe communication introduces latency during data transfers.</p>
<p>Intel’s 4th Generation Xeon Scalable processors, released in 2023, represent a significant architectural advancement. These processors feature integrated QAT acceleration engines and support for the CXL 1.1 (Compute Express Link) standard. QAT performance on these CPUs may differ significantly from our current results. Preliminary analysis suggests that CXL offers a more efficient communication model between CPUs and cryptographic accelerators, making it a better fit for such workloads. Even with CXL, unless latency drops substantially, the underlying pattern is likely unchanged: QAT is effective for bulk workloads but not for the many small, sequential hash invocations typical in PQ signature schemes.</p>


</section>

 ]]></description>
  <guid>https://www.amongbytes.com/posts/20251129-qat-intro/20251129-qat-intro.html</guid>
  <pubDate>Sat, 29 Nov 2025 00:00:00 GMT</pubDate>
  <media:content url="https://www.amongbytes.com/img/intel_qat_acc.png" medium="image" type="image/png" height="80" width="144"/>
</item>
<item>
  <title>Migration to Post-Quantum Cryptography</title>
  <dc:creator>Kris Kwiatkowski</dc:creator>
  <link>https://www.amongbytes.com/posts/20250916-pqc-migration.html</link>
  <description><![CDATA[ 





<p>The global internet security ecosystem is preparing for one of its biggest shifts in decades: the migration from traditional cryptographic algorithms to post-quantum cryptography. Quantum computing may still be years away from breaking widely deployed algorithms, but the “harvest-now, decrypt-later” (HNDL) threat makes planning and transitioning urgent. Sensitive data encrypted today could be collected and decrypted in the future once cryptographically relevant quantum computers (CRQCs) arrive.</p>
<p>Here are some thoughts on the migration.</p>
<section id="why-start-the-migration" class="level1">
<h1>Why Start the Migration?</h1>
<p>You don’t want to be caught off guard when quantum computers become capable of breaking current cryptographic standards. The migration process is complex and time-consuming, often taking several years to complete. Organizations must first evaluate when and how to begin, considering factors such as:</p>
<ol type="1">
<li><strong>Data lifetime</strong> – how long the information you protect needs to remain confidential.</li>
<li><strong>Migration complexity</strong> – how much effort is required across systems, hardware, and vendors.</li>
<li><strong>Quantum threat timeline</strong> – how soon a CRQC could become practical.</li>
</ol>
<p>Even without immediate existential risk, planning and testing today ensures you aren’t forced into a rushed, high-cost migration tomorrow. More complicated systems, such as old, small embedded devices, may take years to update, replace, or ideally redesign with PQC in mind.</p>
<p>Starting this transition early is a good idea, because migration can be complex and time-consuming, but there is no need to panic or feel pressured to start immediately. Be deliberate when selecting products and suppliers: this technology is still maturing and not yet fully commoditized, which means higher costs and potential risks. Open-source solutions can help mitigate some of these challenges, though they come with their own uncertainties. Proprietary options may provide stronger support and stability but can also create dependency on specific vendors and ecosystems. In short, try hard to avoid the “sell-now, forget-later” mindset: this remains a developing field with trade-offs, uncertainties, and costs that must be carefully balanced.</p>
</section>
<section id="key-exchange-in-tls-the-most-urgent-step" class="level1">
<h1>Key Exchange in TLS: The Most Urgent Step</h1>
<p>TLS key exchange is the top priority for PQC migration. If session keys are negotiated using quantum-vulnerable algorithms, future quantum computers could decrypt recorded traffic. To mitigate this, <a href="https://datatracker.ietf.org/doc/html/rfc9794#name-primitives">PQ/T hybrid key exchange</a> is recommended during the transition. These approaches combine at least one post-quantum and one classical algorithm, so security is maintained as long as one remains unbroken. Hybrid KEMs are also easier to roll out than post-quantum signatures, since they are ephemeral and not linked to long-term identity.</p>
<p>As PQC matures, hybrid KEMs will become less important, and eventually only the post-quantum algorithm will be needed.</p>
<p>The IETF began work on standardizing post-quantum key exchange for TLS in 2019, after a key workshop at Mozilla’s Mountain View offices and early Google-led experiments. This resulted in a framework for new key exchange methods, and the <a href="https://blog.cloudflare.com/the-tls-post-quantum-experiment/">TLS Post-Quantum Experiment</a> showed hybrid KEMs in action.</p>
<p>The first widely adopted Internet draft for <a href="https://datatracker.ietf.org/doc/html/draft-ietf-tls-hybrid-design">hybrid key exchange in TLS</a> is now close to completion. This draft and its <a href="https://datatracker.ietf.org/doc/draft-ietf-tls-ecdhe-mlkem/">extension</a> have become the de facto standard for modern TLS: implementations are available in OpenSSL, NGINX, and AWS’s AWS-LC library, and they are already widely deployed. Jan Schaumann from Akamai has a <a href="https://www.netmeister.org/blog/pqc-use-2025-09.html">great post</a> on sites already using PQC.</p>
<p>TLS is not the only protocol that needs quantum-safe key exchange. For this reason, there is also an effort to standardize a more <a href="https://datatracker.ietf.org/doc/draft-irtf-cfrg-hybrid-kems/">generic</a> approach to hybrid KEM constructions for broader use.</p>
</section>
<section id="digital-signatures-important-but-less-urgent" class="level1">
<h1>Digital Signatures: Important, But Less Urgent</h1>
<p>Digital signatures remain a vital component of cryptographic protocols, ensuring authenticity, integrity, and non-repudiation. As such, post-quantum digital signature schemes are necessary for a future-proof internet infrastructure. However, the urgency to deploy them is relatively lower compared to key encapsulation mechanisms (KEMs).</p>
<p>Unlike key exchange, authentication cannot be broken retrospectively, meaning quantum-safe signatures are only needed once CRQCs become available. As a result, the migration to post-quantum digital signatures is less time-sensitive than for KEMs, allowing for a more deliberate and carefully planned transition. Since post-quantum signature schemes often involve larger keys and signatures, greater computational overhead, and increased implementation complexity, their deployment may incur higher costs, which reinforces the importance of keeping the migration as simple and efficient as possible.</p>
<p>Determining whether and when to adopt PQC certificates or PQ/T hybrid schemes may depend on several factors, such as:</p>
<ul>
<li>Frequency and duration of system upgrades</li>
<li>Operational flexibility to enable or disable algorithms</li>
</ul>
<p>Deployments with limited flexibility (e.g., embedded systems) benefit significantly from PQ/T hybrid signatures. This approach mitigates the risks associated with delays in transitioning to PQC and provides an immediate safeguard against zero-day vulnerabilities.</p>
<p>While hybrid constructs may seem plausible for long-term security, they also introduce complexity, potential performance overhead, and long-term implications:</p>
<ul>
<li>The number of possible hybrid combinations leads to interoperability challenges and increased implementation burden.</li>
<li>If one scheme is compromised, forgery is only a concern while the corresponding public key remains trusted.</li>
<li>Long-term protection through hybrids may be limited in practice due to standard key management practices.</li>
</ul>
<p>There is another risk related to the potential misuse of PQ/T hybrid signatures. Consider this: a deployment may use hybrid signatures to facilitate migration, resulting in a mix of devices - some aware of PQ schemes and some not. Devices unaware of PQ schemes may continue to validate only the traditional signature, while those aware of PQ schemes may validate both signatures. A deployment might continue this approach even after the traditional algorithm has been broken. While this may simplify operations by avoiding re-provisioning of trust anchors, it introduces a significant risk. A CRQC could forge the broken traditional signature component over a message, then combine it with the valid post-quantum component to produce a new composite signature that verifies successfully. This underscores the critical need to retire hybrid certificates containing broken algorithms once CRQCs become available (and always validate both components of a hybrid signature).</p>
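<p>The rule "always validate both components" can be illustrated with a minimal Python sketch. The function name <code>verify_hybrid</code> and the stand-in verifiers are hypothetical and perform no real cryptography; an actual deployment would plug in the verification routines of its traditional and post-quantum libraries.</p>

```python
from typing import Callable

# Hypothetical verifier type: takes (message, signature) and returns True/False.
Verifier = Callable[[bytes, bytes], bool]


def verify_hybrid(message: bytes, sig_traditional: bytes, sig_pq: bytes,
                  verify_traditional: Verifier, verify_pq: Verifier) -> bool:
    """Accept a hybrid signature only if BOTH components verify."""
    # Evaluate both checks unconditionally, then combine the results.
    ok_traditional = verify_traditional(message, sig_traditional)
    ok_pq = verify_pq(message, sig_pq)
    return ok_traditional and ok_pq


# Stand-in verifiers for illustration only (no real cryptography here).
def always_valid(message: bytes, signature: bytes) -> bool:
    return True


def always_invalid(message: bytes, signature: bytes) -> bool:
    return False


# Both components check out: the hybrid signature is accepted.
both_ok = verify_hybrid(b"msg", b"t", b"pq", always_valid, always_valid)
# A forged traditional component passes, but the post-quantum check fails.
forged = verify_hybrid(b"msg", b"t", b"pq", always_valid, always_invalid)
print(both_ok, forged)
```

<p>A verifier that checked only the traditional component would have accepted the second signature, which is exactly the risk described above.</p>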
<p>The IETF has many experts working on this topic. For example, <a href="https://datatracker.ietf.org/doc/draft-ietf-lamps-pq-composite-sigs/">draft-ietf-lamps-pq-composite-sigs</a> describes how to create and verify composite signatures that combine a post-quantum signature with a classical signature.</p>
<p>Nevertheless, hybrid signatures remain complicated and may not be suitable for all scenarios. Fortunately, they are only needed once quantum computers are capable of breaking current signature algorithms. So, we still have some time to make the authentication migration as smooth as possible.</p>
</section>
<section id="infrastructure-costs-what-to-expect" class="level1">
<h1>Infrastructure Costs: What to Expect</h1>
<p>This topic is both important and frequently overlooked. Migrating to post-quantum cryptography (PQC) often requires significant updates to existing infrastructure. Careful planning and budgeting are essential, as costs can arise in multiple areas.</p>
<p>The first step is discovery - building a comprehensive inventory of all cryptographic assets and where they are used. This process may require specialized tools and can itself be resource-intensive.</p>
<p>Another major factor is whether updates can be delivered through software patches or require new hardware. For some systems, particularly constrained or niche devices, supporting PQC may require custom development or even physical replacement. Engaging with vendors early is critical to understand available options and associated costs.</p>
<p>Key infrastructure components that will need attention include:</p>
<ul>
<li><p>Network protocols – TLS, SSH, and QUIC must be adapted to handle larger PQC artifacts. While PQC KEMs such as ML-KEM often perform competitively in handshakes, their larger message sizes can increase bandwidth use, add round trips, or introduce latency.</p></li>
<li><p>Message processing – PQC signature algorithms typically process entire messages rather than digests. This can hurt performance in systems like HSMs that rely on streaming data, unless applications adopt pre-hashing or streaming-friendly designs.</p></li>
<li><p>PKI systems – Certificate Authorities (CAs), certificate formats, and trust anchors must all evolve to support PQC. Hybrid certificate formats can ease transition but also add complexity and operational overhead.</p></li>
<li><p>Constrained devices – Long-lived systems (e.g., satellites, industrial controllers, smart meters) are especially difficult to update. Limited memory and compute resources may force costly redesigns or replacements, and in-field updates can be logistically challenging.</p></li>
<li><p>Hybrid approaches – While helpful for resilience during the transition, hybrids can add two layers of cost: first to support dual algorithms (certificates, key management, validation), and later to migrate again once hybrids are no longer needed. In some environments, this two-step process is more expensive than planning a direct migration to PQC at the right moment.</p></li>
<li><p>Training and awareness – Staff need to be knowledgeable about PQC concepts, such as the KPIs of PQC implementations and their impact on performance, and familiar with the trade-offs of PQ/T hybrid schemes and their implications for the migration process. This may involve training programs, workshops, or hiring specialized personnel, all of which contribute to the overall cost.</p></li>
</ul>
<p>Long-term savings come from embracing cryptographic agility: designing systems that can switch algorithms without major architectural changes. This reduces the cost of future transitions - but achieving true agility requires upfront investment in both design and standardization.</p>
</section>
<section id="final-thoughts" class="level1">
<h1>Final Thoughts</h1>
<p>Migrating to post-quantum cryptography is not a single upgrade — it is a long-term process. As mentioned at the beginning of this article, starting this transition early is definitely a good idea.</p>
<p>It is worth recalling that the <a href="https://datatracker.ietf.org/doc/draft-kiefer-tls-ecdhe-sidh/">first</a> IETF draft proposing a PQ/T hybrid key exchange for TLS was published back in 2019 (admittedly, it wasn’t a serious proposal at the time). We are now in 2025, and only recently have PQ/T hybrid standards been finalized by the IETF and started to see adoption by major browsers. That is six years for a single - and relatively simple - use case: key exchange in TLS.</p>
<p>When it comes to digital signatures, the migration will be significantly more complex and time-consuming. Based on current timelines and projections for quantum computing, cryptographically relevant quantum computers are expected to emerge by the end of the next decade. This leaves less than ten years to finalize standards and initiate large-scale migration. Not just in TLS, but everywhere. Given that such migration efforts typically span several years, it is clear that we are already behind schedule.</p>
<p>The situation today, however, is quite different from 2019: the first PQC algorithms have been standardized, and there is far greater motivation and momentum to move forward. Nevertheless, much remains to be done. The key is to plan for agility and actively align with emerging standards. Going back to the beginning of this article: don’t wait until the last minute, but also don’t rush into unproven solutions. Balance risk, cost, and operational complexity carefully.</p>
<p>The transition will be uneven: some systems will adopt hybrids, some will wait for pure PQC. Constrained devices may require tailored strategies and highly optimized implementations to match the performance and resource utilization of traditional algorithms. But the direction is clear: a future-proof Internet must stay safe!</p>
<p>Ongoing work in the IETF focuses on general <a href="https://datatracker.ietf.org/doc/draft-kwiatkowski-pquip-pqc-migration/">guidance for migration to PQC</a> as well as guidance for <a href="https://datatracker.ietf.org/doc/draft-reddy-pquip-pqc-hsm/">constrained devices</a>. Feel free to join the discussions in the <a href="https://mailarchive.ietf.org/arch/browse/pqc/">PQUIP Working Group</a>.</p>


</section>

 ]]></description>
  <category>Cryptography</category>
  <guid>https://www.amongbytes.com/posts/20250916-pqc-migration.html</guid>
  <pubDate>Tue, 16 Sep 2025 00:00:00 GMT</pubDate>
  <media:content url="https://www.amongbytes.com/img/pqc_lock.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Note on speed of verification in SLH-DSA</title>
  <dc:creator>Kris Kwiatkowski</dc:creator>
  <link>https://www.amongbytes.com/posts/202409-statefull_vs_stateless.html</link>
  <description><![CDATA[ 





<p>Here I compare the verification performance of LMS and SLH-DSA. XMSS is not covered, but as LMS and XMSS are quite similar in this respect, we can probably expect similar results (XMSS is slightly slower than LMS).</p>
<p>When comparing stateful and stateless hash-based signature schemes, the main benefits of the former are significantly shorter signatures and much faster verification. The differences are summarised in the table below. In brief, an LMS signature is between roughly 1.3 KB and 9.3 KB for the parameter sets shown, while an SLH-DSA signature at a similar security level is between roughly 16 KB and 50 KB, hence about 5x to 15x bigger.</p>
<table class="caption-top table">
<colgroup>
<col style="width: 12%">
<col style="width: 12%">
<col style="width: 12%">
<col style="width: 12%">
<col style="width: 12%">
<col style="width: 12%">
<col style="width: 12%">
<col style="width: 12%">
</colgroup>
<thead>
<tr class="header">
<th>SLH-DSA param</th>
<th>PK (bytes)</th>
<th>Sig (bytes)</th>
<th>Security (bits)</th>
<th>LMS param</th>
<th>SK (bytes)</th>
<th>PK (bytes)</th>
<th>Sig (bytes)</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>SLH-DSA-SHA2-128s</td>
<td>32</td>
<td>7856</td>
<td>128</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr class="even">
<td>SLH-DSA-SHA2-128f</td>
<td>32</td>
<td>17088</td>
<td>128</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr class="odd">
<td>SLH-DSA-SHA2-192s</td>
<td>48</td>
<td>16224</td>
<td>192</td>
<td>LMS-SHA2-M24-H25-W8</td>
<td>44</td>
<td>48</td>
<td>1260</td>
</tr>
<tr class="even">
<td>SLH-DSA-SHA2-192f</td>
<td>48</td>
<td>35664</td>
<td>192</td>
<td>LMS-SHA2-M24-H25-W1</td>
<td>44</td>
<td>48</td>
<td>5436</td>
</tr>
<tr class="odd">
<td>SLH-DSA-SHA2-256s</td>
<td>64</td>
<td>29792</td>
<td>256</td>
<td>LMS-SHA2-M32-H25-W8</td>
<td>52</td>
<td>56</td>
<td>1932</td>
</tr>
<tr class="even">
<td>SLH-DSA-SHA2-256f</td>
<td>64</td>
<td>49856</td>
<td>256</td>
<td>LMS-SHA2-M32-H25-W1</td>
<td>52</td>
<td>56</td>
<td>9324</td>
</tr>
</tbody>
</table>
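<p>As a quick sanity check of the size difference, the short Python snippet below computes the ratio between SLH-DSA and LMS signature sizes for the rows of the table above that list both schemes. The pairing of rows simply follows the table; it is not a claim that these parameter sets are equivalent in any other respect.</p>

```python
# Signature sizes in bytes, copied from the table above.
pairs = {
    "192s vs W8": (16224, 1260),
    "192f vs W1": (35664, 5436),
    "256s vs W8": (29792, 1932),
    "256f vs W1": (49856, 9324),
}

# Ratio of SLH-DSA signature size to LMS signature size for each row.
ratios = {name: slh / lms for name, (slh, lms) in pairs.items()}
for name, ratio in ratios.items():
    print(f"{name}: SLH-DSA signature is {ratio:.1f}x larger")
```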
<p>For SHA2-based LMS, there are 40 different parameterizations possible. In the table above we used the extreme values. A parameter set with the <code>W1</code> suffix gives large signatures but fast verification, while one with the <code>W8</code> suffix gives small signatures but slow verification.</p>
<p>When it comes to SLH-DSA, the <code>s</code> suffix indicates the <em>small</em> parameter sets and <code>f</code> indicates the <em>fast</em> ones. But, contrary to LMS, verification with the <code>s</code> parameter sets is faster than with <code>f</code>. That is related to the design of the verification algorithm: a shorter signature implies fewer evaluations of the hash function.</p>
<p>To give some numbers, the runtime of function <code>F()</code> dominates the runtime of SLH-DSA verification. We measured the average number of calls to that function, as well as the percentage of time the verification algorithm spends in <code>F()</code>. The results are presented in the table below. Note that there are far fewer calls to <code>F()</code> in the case of the <code>s</code> variants.</p>
<table class="table-hover caption-top table">
<colgroup>
<col style="width: 14%">
<col style="width: 14%">
<col style="width: 14%">
<col style="width: 14%">
<col style="width: 14%">
<col style="width: 14%">
<col style="width: 14%">
</colgroup>
<thead>
<tr class="header">
<th></th>
<th>128f</th>
<th>192f</th>
<th>256f</th>
<th>128s</th>
<th>192s</th>
<th>256s</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>No of invocations of function <code>F()</code></td>
<td>5908</td>
<td>8620</td>
<td>8633</td>
<td>1886</td>
<td>2751</td>
<td>4067</td>
</tr>
<tr class="even">
<td>% time spent in <code>F()</code></td>
<td>94.8%</td>
<td>95.7%</td>
<td>95.2%</td>
<td>88.1%</td>
<td>89.3%</td>
<td>90.9%</td>
</tr>
</tbody>
</table>
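<p>To put the call counts from the table above into perspective, the following Python snippet computes how many more <code>F()</code> invocations the <code>f</code> variants need compared to the <code>s</code> variants at each security level. It only re-uses the measured averages and makes no statement about wall-clock time.</p>

```python
# Average number of F() calls during verification, from the table above.
f_calls = {"128": 5908, "192": 8620, "256": 8633}
s_calls = {"128": 1886, "192": 2751, "256": 4067}

# How many times more F() calls the f variant makes than the s variant.
call_ratio = {level: f_calls[level] / s_calls[level] for level in f_calls}
for level, ratio in call_ratio.items():
    print(f"{level}f makes {ratio:.2f}x as many F() calls as {level}s")
```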



 ]]></description>
  <guid>https://www.amongbytes.com/posts/202409-statefull_vs_stateless.html</guid>
  <pubDate>Tue, 03 Sep 2024 00:00:00 GMT</pubDate>
  <media:content url="https://cdn.pixabay.com/photo/2022/04/01/19/28/sphinx-7105523_1280.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Key agreement methods in FIPS</title>
  <dc:creator>Kris Kwiatkowski</dc:creator>
  <link>https://www.amongbytes.com/posts/20230601-fips-key-agreement-methods.html</link>
  <description><![CDATA[ 





<p>FIPS has multiple ways of claiming CAVP-tested compliance of key agreement schemes. Each of them corresponds to a different use case; for example, the key agreement may or may not include key derivation. Additionally, FIPS supports key confirmation (see SP800-56A rev3, Section 5.9), which can be applied to some key agreements. It is easy to get lost when reading the FIPS IG, so below is a short summary of the differences:</p>
<ul>
<li><p><strong>KAS-SSC</strong>: Compliance with the agreement on the shared secret <code>Z</code> (only). The key agreement scheme is one of those described in SP800-56A rev3, Section 6. No key derivation is done after <code>Z</code> is agreed upon.</p></li>
<li><p><strong>KAS</strong>: Compliance with NIST-approved key agreement AND derivation. Testing is done end-to-end, meaning both operations are performed by a single security service and the calling sequence is within the module boundary.</p></li>
<li><p><strong>KDA</strong>: It relates only to the key derivation part, so testing is NOT done end-to-end. This certificate is given when derivation uses one of the KDFs described in SP800-56C rev1 or rev2.</p></li>
<li><p><strong>CVL</strong>: It relates only to the key derivation part, so testing is NOT done end-to-end. This certificate is given when derivation uses one of the KDFs described by IG 2.4.B.</p></li>
</ul>
<p>Note that SP800-56C rev2 is also mentioned by IG 2.4.B. My understanding is that, for example, in the case of TLS v1.3 we do need SP800-56C rev2, but not necessarily a KDA certificate. For KDA compliance, software needs to be tested separately.</p>
<p><strong>Example PQ-TLS v1.3</strong>: Two goals: 1) to implement the TLS key schedule as per Section 7.1 of RFC 8446, and 2) to allow hybrid, quantum-safe key agreement.</p>
<p>We need a scheme for generating the shared secret <code>Z</code>, so we need KAS-SSC. KAS is not useful, as the TLS key schedule is a single-extract-multi-expand derivation (SP800-56C rev2, Section 5.3). TLS uses key derivation with HKDF (two-step), so we also need KDA or CVL. Only IG 2.4.B mentions TLS, so we need CVL. Hybrid-PQ TLS is not standardized, so CVL won’t apply here (I think); on the other hand, SP800-56C rev2 allows using an auxiliary KAS in addition to the approved one, hence we also need KDA. Therefore, in this case, we need KAS-SSC, KDA and CVL certificates.</p>
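<p>To make the "single-extract-multi-expand" shape more concrete, here is a minimal Python sketch of two-step HKDF (RFC 5869) with SHA-256, built only on the standard <code>hmac</code> and <code>hashlib</code> modules. It is an illustration under assumptions, not the TLS 1.3 key schedule: the labels, the all-zero salt and the placeholder secrets are made up, the hybrid secret is assumed to be a plain concatenation of both shared secrets, and TLS itself wraps the expand step in HKDF-Expand-Label.</p>

```python
import hashlib
import hmac


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """Step 1 (extract): condense input keying material into a fixed-size PRK."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """Step 2 (expand): derive `length` bytes bound to the context `info`."""
    blocks = -(-length // 32)  # ceiling division; SHA-256 outputs 32 bytes
    okm, previous = b"", b""
    for counter in range(1, blocks + 1):
        previous = hmac.new(prk, previous + info + bytes([counter]),
                            hashlib.sha256).digest()
        okm += previous
    return okm[:length]


# Placeholder shared secrets; in a hybrid handshake these would come from the
# traditional key agreement and the post-quantum KEM respectively.
z_traditional = bytes(32)
z_pq = bytes([1]) * 32
z = z_traditional + z_pq  # assumption: hybrid secret formed by concatenation

prk = hkdf_extract(salt=bytes(32), ikm=z)          # single extract
client_key = hkdf_expand(prk, b"client key", 32)   # multiple expands
server_key = hkdf_expand(prk, b"server key", 32)
print(client_key.hex())
print(server_key.hex())
```

<p>The shared secret <code>Z</code> is the part covered by KAS-SSC, while the extract and expand steps are the part that a KDA or CVL certificate would relate to.</p>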
<table class="caption-top table">
<thead>
<tr class="header">
<th>Abbreviation</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>SSC</td>
<td>Shared Secret Computation</td>
</tr>
<tr class="even">
<td>KDA</td>
<td>Key Derivation Algorithm</td>
</tr>
<tr class="odd">
<td>CVL</td>
<td>Component Validation List</td>
</tr>
<tr class="even">
<td>KAS</td>
<td>Key Agreement Scheme</td>
</tr>
</tbody>
</table>



 ]]></description>
  <guid>https://www.amongbytes.com/posts/20230601-fips-key-agreement-methods.html</guid>
  <pubDate>Thu, 01 Jun 2023 00:00:00 GMT</pubDate>
  <media:content url="https://www.amongbytes.com/img/fips_shield.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Gentle introduction to NTRU cryptosystem (part 1)</title>
  <link>https://www.amongbytes.com/posts/20211017-gentle-introduction-to-ntru-1/20211017-gentle-introduction-to-ntru-1.html</link>
  <description><![CDATA[ 





<p>The NTRU cryptosystem is the grandfather of lattice-based encryption schemes. The initial idea was due to <a href="http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=144BBB9F0E87EF0D471151F0EACC7DB8?doi=10.1.1.40.2489&amp;rep=rep1&amp;type=pdf">Ajtai</a>. His work evolved into a whole area of research with the goal of creating more practical lattice-based cryptosystems, like the first NTRU-based <a href="https://www.ntru.org/f/hps98.pdf">encryption system</a> and <a href="http://www.math.brown.edu/jpipher/NTRUSign_RSA.pdf">signature scheme</a> due to Hoffstein, Pipher, Silverman, Howgrave-Graham and Whyte.</p>
<p>The cryptosystem is based on polynomial rings. More precisely, it is based on the problem of recovering a sparse polynomial that is a factor of a polynomial modulo \(X^N - 1\) in the polynomial ring over some finite field \(F_q\).</p>
<p>The article below tries to explain, in easy to understand terms, the basics of NTRU, starting from a brief explanation of what the lattice is. Future articles will introduce a more detailed view of a modern approach to building NTRU-based cryptosystems.</p>
<section id="the-lattice-and-related-hard-problems" class="level2">
<h2 class="anchored" data-anchor-id="the-lattice-and-related-hard-problems">The lattice and related hard-problems</h2>
<p>The picture below visualizes a lattice as points in a two-dimensional space. A lattice is defined by the origin \(O\) and base vectors \(\{b_1,b_2\}\). Every point on the lattice is represented as a linear combination of the base vectors, for example \(V=−2b_1+b_2\).</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://www.amongbytes.com/posts/20211017-gentle-introduction-to-ntru-1/simple_lattice.png" class="img-fluid quarto-figure quarto-figure-center figure-img"></p>
</figure>
</div>
<p>There are two classical NP-hard problems in lattice-based cryptography:</p>
<ul>
<li><strong>Shortest Vector Problem</strong> (SVP): Given a lattice, find the shortest non-zero vector in the lattice. In the graph, the vector \(s\) is the shortest one. SVP is NP-hard only under some assumptions.</li>
<li><strong>Closest Vector Problem</strong> (CVP): Given a lattice and a vector \(V\) (not necessarily in the lattice), find the lattice vector closest to \(V\). For example, the closest lattice vector to \(t\) is \(z\).</li>
</ul>
<p>In the graph above, solving SVP and CVP would be quite simple. However, the lattices used in cryptography have much higher dimensions, say above 1000, as well as a non-orthogonal basis. It is believed that in these instances the problems are hard to solve, even for future quantum computers.</p>
<p>Coppersmith and Shamir showed in <a href="https://link.springer.com/chapter/10.1007/3-540-69053-0_5">“Lattice Attacks on NTRU”</a> (EUROCRYPT’97) that the NTRU problem can be reduced to SVP.</p>
</section>
<section id="rings" class="level2">
<h2 class="anchored" data-anchor-id="rings">Rings</h2>
<p>NTRU operates in a <a href="https://en.wikipedia.org/wiki/Ring_(mathematics)">ring</a> of polynomials of degree less than \(N\). The degree of a polynomial is the highest exponent of its variable. For example, \(x^7+6x^3+11x^2\) has degree 7. One can add polynomials in the ring in the usual way, by simply adding their coefficients modulo some integer. In NTRU this integer is called \(q\). Polynomials can also be multiplied, and the result of a multiplication is always a polynomial of degree less than \(N\). This means that the exponents in the resulting polynomial are reduced modulo \(N\). For example:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Beqnarray%7D%0AX%5E%7BN-1%7D%20%5Ccdot%20X%20%20%20&amp;=&amp;%20X%5EN%20%20%20%20%20&amp;=&amp;%201%20%20%20%20%20%5Cnewline%0AX%5E%7BN-1%7D%20%5Ccdot%20X%5E2%20&amp;=&amp;%20X%5E%7BN+1%7D%20&amp;=&amp;%20X%20%20%20%20%20%5Cnewline%0AX%5E%7BN-1%7D%20%5Ccdot%20X%5E3%20&amp;=&amp;%20X%5E%7BN+2%7D%20&amp;=&amp;%20X%5E2%20%20%20%5Cnewline%0A%5Cdots%0A%5Cend%7Beqnarray%7D"></p>
<p>In other words, polynomial ring arithmetic is very similar to modular arithmetic, but instead of working in a “set of numbers” less than \(N\), one works in a set of polynomials with a degree less than \(N\).</p>
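<p>The wrap-around of exponents is easy to see in code. Below is a small Python sketch of multiplication in the ring, with polynomials stored as lists of coefficients. The helper name <code>ring_mul</code> and the toy parameters \(N=5\), \(q=32\) are chosen only for this illustration.</p>

```python
def ring_mul(a, b, n, q):
    """Multiply polynomials a and b in Z_q[X]/(X^n - 1).

    Polynomials are coefficient lists: a[i] is the coefficient of X^i.
    """
    result = [0] * n
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            k = (i + j) % n  # exponents wrap around because X^n = 1
            result[k] = (result[k] + a_i * b_j) % q
    return result


N, Q = 5, 32  # toy parameters, chosen only for illustration
x = [0, 1, 0, 0, 0]            # X
x_n_minus_1 = [0, 0, 0, 0, 1]  # X^(N-1)
x_squared = [0, 0, 1, 0, 0]    # X^2

print(ring_mul(x_n_minus_1, x, N, Q))          # X^(N-1) * X   = 1
print(ring_mul(x_n_minus_1, x_squared, N, Q))  # X^(N-1) * X^2 = X
```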
</section>
<section id="ntru-scheme-basic-idea" class="level2">
<h2 class="anchored" data-anchor-id="ntru-scheme-basic-idea">NTRU scheme: basic idea</h2>
<p>To instantiate the NTRU cryptosystem, the following domain parameters must be chosen:</p>
<ul>
<li>\(N\) - degree parameter of the polynomial ring; in NTRU the principal objects are polynomials of degree at most \(N−1\).</li>
<li>\(p\) - small modulus, used during key generation and decryption for reducing message coefficients.</li>
<li>\(q\) - large modulus, used during algorithm execution for reducing coefficients of the polynomials.</li>
</ul>
<p>First, we generate a pair of public and private keys. To do that, two polynomials \(f\) and \(g\) are chosen from the ring in a way that their randomly generated coefficients are much smaller than \(q\). Then key generation computes two inverses of the polynomial \(f\):</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Beqnarray%7D%0Af_p%20=%20f%5E%7B-1%7D~mod~p%20%5Cnewline%0Af_q%20=%20f%5E%7B-1%7D~mod~q%20%5Cnewline%0A%5Cend%7Beqnarray%7D"></p>
<p>The values \(f\) and \(f_p\) make up the private key. The public key \(pk\) is computed as follows: <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Beqnarray%7D%0Apk%20=%20p%20%5Ccdot%20f_q%20%5Ccdot%20g~mod~q%0A%5Cend%7Beqnarray%7D"> The value \(f_q\) is not part of any key; however, it must remain secret.</p>
<p>It might be the case that after choosing \(f\), the inverses modulo \(p\) and \(q\) do not exist. In this case, the algorithm has to start from the beginning and generate another \(f\). That’s unfortunate because calculating the inverse of a polynomial is a costly operation. Recent instantiations of some NTRU schemes (like NTRU-HRSS) are designed to ensure those inverses always exist, which makes key generation faster and more reliable.</p>
<p>The encryption of a message \(m\) proceeds as follows. First, the message \(m\) is converted to a ring element \(pt\) (there exists an algorithm for performing this conversion in both directions). During encryption, NTRU randomly chooses one polynomial \(b\) called \(blinder\). The goal of the blinder is to generate different ciphertexts per encryption. Thus, the ciphertext \(ct\) is obtained as:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Beqnarray%7D%0Act%20=%20(b%20%5Ccdot%20pk%20+%20pt)~mod~q%0A%5Cend%7Beqnarray%7D"></p>
<p>Decryption looks a bit more complicated, but it can also be easily understood. It uses both secret values \(f\) and \(f_p\). The plaintext is recovered as: <img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Beqnarray%7D%0Av%20%20&amp;=&amp;%20f%20%5Ccdot%20ct~mod~q%20%20%20%20%20%5Cnewline%0Apt%20&amp;=&amp;%20v%20%5Ccdot%20f_p~mod~p%20%20%20%20%5Cnewline%0A%5Cend%7Beqnarray%7D"></p>
<p>Putting together everything described above, the computation done during decryption expands as follows. It uses the facts that \(f \cdot f_q = 1\) modulo \(q\) and \(f \cdot f_p = 1\) modulo \(p\), and it requires the coefficients of \(p \cdot b \cdot g + f \cdot pt\) to be small enough that reduction modulo \(q\) does not change them:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cbegin%7Beqnarray%7D%0Av%20&amp;=&amp;%20f%20%5Ccdot%20ct~mod~q%20%5Cnewline%0A&amp;=&amp;%20f%20%5Ccdot%20(b%20%5Ccdot%20pk%20+%20pt)~mod~q%20%5Cnewline%0A&amp;=&amp;%20f%20%5Ccdot%20(b%20%5Ccdot%20p%20%5Ccdot%20f_q%20%5Ccdot%20g%20+%20pt)~mod~q%20%5Cnewline%0A&amp;=&amp;%20p%20%5Ccdot%20b%20%5Ccdot%20g%20+%20f%20%5Ccdot%20pt~mod~q%20%5Cnewline%0Av%20%5Ccdot%20f_p~mod~p%20&amp;=&amp;%20(p%20%5Ccdot%20b%20%5Ccdot%20g%20+%20f%20%5Ccdot%20pt)%20%5Ccdot%20f_p~mod~p%20%5Cnewline%0A&amp;=&amp;%20f%20%5Ccdot%20f_p%20%5Ccdot%20pt~mod~p%20%5Cnewline%0A&amp;=&amp;%20pt%0A%5Cend%7Beqnarray%7D"> After obtaining \(pt\), the message \(m\) is recovered by inverting the conversion function.</p>
<p>The underlying hardness assumption is that, given two polynomials \(f\) and \(g\) whose coefficients are small compared to the modulus \(q\), it is difficult to distinguish \(pk = p \cdot f_q \cdot g\) from a random element of the ring. It means that it’s hard to find \(f\) and \(g\) given only the public key \(pk\).</p>
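<p>The key generation, encryption and decryption steps described in this section can be tried out with a toy Python sketch. Everything specific here is an assumption made for illustration: the parameters \(N=7\), \(p=3\), \(q=61\) are far too small to be secure, \(q\) is chosen prime (real NTRU parameter sets typically use a power of two) so that inverses can be found with plain Gaussian elimination, and \(f\), \(g\), the blinder and the plaintext are fixed rather than random.</p>

```python
def ring_mul(a, b, n, m):
    """Multiply a and b in Z_m[X]/(X^n - 1); a[i] is the coefficient of X^i."""
    out = [0] * n
    for i in range(n):
        for j in range(n):
            out[(i + j) % n] = (out[(i + j) % n] + a[i] * b[j]) % m
    return out


def ring_inverse(f, n, m):
    """Inverse of f in Z_m[X]/(X^n - 1) for a prime m, or None if it does not exist.

    Solves the circulant linear system f * x = 1 by Gauss-Jordan elimination.
    """
    rows = [[f[(i - j) % n] % m for j in range(n)] + [int(i == 0)] for i in range(n)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            return None  # singular system: f has no inverse, pick another f
        rows[col], rows[pivot] = rows[pivot], rows[col]
        scale = pow(rows[col][col], -1, m)  # modular inverse (Python 3.8+)
        rows[col] = [(v * scale) % m for v in rows[col]]
        for r in range(n):
            factor = rows[r][col]
            if r != col and factor != 0:
                rows[r] = [(v - factor * w) % m for v, w in zip(rows[r], rows[col])]
    return [row[n] for row in rows]


def center(a, m):
    """Map coefficients from [0, m) to the range centred on zero."""
    return [((c + m // 2) % m) - m // 2 for c in a]


N, P, Q = 7, 3, 61  # toy parameters (assumption): far too small to be secure

# Key generation with fixed small polynomials instead of random ones.
f = [1, 1, 0, -1, 1, 0, 0]
g = [0, -1, 1, 0, 0, 1, -1]
f_p = ring_inverse(f, N, P)
f_q = ring_inverse(f, N, Q)
pk = [(P * c) % Q for c in ring_mul(f_q, g, N, Q)]


def encrypt(pt, b):
    """ct = (b * pk + pt) mod q, with blinder b."""
    return [(x + y) % Q for x, y in zip(ring_mul(b, pk, N, Q), pt)]


def decrypt(ct):
    """v = f * ct mod q (centred), then pt = v * f_p mod p (centred)."""
    v = center(ring_mul(f, ct, N, Q), Q)
    return center(ring_mul(v, f_p, N, P), P)


pt = [1, 0, -1, 1, 0, 0, -1]       # message already encoded as a ring element
blinder = [-1, 0, 1, 1, 0, -1, 0]
ct = encrypt(pt, blinder)
recovered = decrypt(ct)
print(recovered == pt)
```

<p>Note the centring step in <code>decrypt</code>: the coefficients of \(v\) have to be interpreted in the range around zero before reducing modulo \(p\), otherwise the multiple of \(p\) would not cancel cleanly.</p>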
</section>
<section id="concrete-schemes" class="level2">
<h2 class="anchored" data-anchor-id="concrete-schemes">Concrete schemes</h2>
<p>The original scheme has a long (over 20 years) history. Since then it has been changed multiple times to account for cryptanalytic advances. Several variants of concrete KEM and signature schemes based on NTRU were proposed during the NIST PQC standardization. In the case of KEMs, two candidates, NTRUEncrypt and NTRU-HRSS-KEM, were merged into a scheme called … well, <a href="https://ntru.org/">“NTRU”</a>. The scheme is fairly easy to implement in constant time and is fast enough to be used in production environments. The NTRU-based signature scheme <a href="https://falcon-sign.info/">Falcon</a> also reached the final round of the standardization. It is characterized by very fast execution and relatively small key sizes. Nevertheless, an efficient constant-time implementation can be quite complicated. It also seems that the patent situation for NTRU schemes is much clearer than for other candidates.</p>


</section>

 ]]></description>
  <guid>https://www.amongbytes.com/posts/20211017-gentle-introduction-to-ntru-1/20211017-gentle-introduction-to-ntru-1.html</guid>
  <pubDate>Sun, 17 Oct 2021 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Constant-time code verification with Memory Sanitizer</title>
  <link>https://www.amongbytes.com/posts/20210709-testing-constant-time/20210709-testing-constant-time.html</link>
  <description><![CDATA[ 





<p>In the cryptography context, <strong>side-channel</strong> attacks exploit the implementation of a computer system to gain information about the secret key. The first such attacks were introduced by Paul Kocher in his paper <a href="https://paulkocher.com/doc/TimingAttacks.pdf">“Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems”</a>. Since then, attacks have been much improved to include different sources of side-channel information like <a href="https://www.usenix.org/system/files/conference/usenixsecurity17/sec17-tang.pdf">power consumption</a>, and electromagnetic and acoustic emissions. These attacks have been <a href="https://crypto.stanford.edu/~dabo/pubs/papers/ssl-timing.pdf">shown</a> multiple times to be practical.</p>
<p><strong>Timing attacks</strong> are a subset of side-channel attacks in which information about the execution time of a cryptographic primitive is exploited to break the system. One of their most appealing advantages over other side-channel attacks is that they can be applied remotely, through the network. For an attack to be useful, the execution time of the cryptographic primitive must depend on secret material. In brief, a constant-time implementation is crafted in a way that execution doesn’t depend on secret data. An article by Thomas Pornin, author of BearSSL, explains in detail what this means <a href="https://www.bearssl.org/constanttime.html">here</a>.</p>
<p>In many cases, checking whether an implementation is constant-time is not trivial and tools are needed. Adam Langley developed a tool called <code>ctgrind</code>. The idea is to <a href="https://www.imperialviolet.org/2010/04/01/ctgrind.html">extend and use</a> a tool that detects <strong>Use of Uninitialized Memory</strong> (UUM). That tool is Memcheck, which runs in the Valgrind framework.</p>
<p>Memory Sanitizer (MSan) is another UUM detector. It is part of the LLVM project and integrates with the clang compiler. This blog post briefly describes how it <a href="#UUM">works</a> and how to use it for checking constant-time implementation in <a href="#TOY">“toy”</a> code. Finally, after introducing a useful <a href="#CTCHECK">helper</a>, it shows the <a href="#FRODO">result</a> of integrating it with existing cryptographic implementations and <a href="#BUG">compares</a> it to Memcheck.</p>
<section id="UUM" class="level3">
<h3 class="anchored" data-anchor-id="UUM">How does the UUM detector work?</h3>
<p>Techniques used for detecting <em>Use of Uninitialized Memory</em>, implemented in tools like Valgrind’s Memcheck or LLVM’s Memory Sanitizer, have been developed over the years and are currently quite advanced. At a high level, Memcheck and MSan are similar.</p>
<p>The main difference between Memcheck and Memory Sanitizer is when the instrumentation happens. Memcheck uses the Valgrind framework to instrument an already compiled binary at startup. In contrast, Memory Sanitizer instruments code at the compilation stage and leverages mechanisms implemented in LLVM, like its intermediate representation. Memcheck combines the detection of UUM and memory addressability bugs in a single tool. In the case of LLVM, these are implemented in two different tools - Memory Sanitizer (MSan) for detecting UUMs and a sibling tool called AddressSanitizer (ASan) for addressability bugs. There are upsides and downsides to both approaches. The approach taken by MSan allows execution to be an order of magnitude faster, with almost no startup penalty.</p>
<p>Apart from those differences, the internals of Memcheck and MSan are similar at a high level. A UUM detector tracks the state of every bit of memory or register used by the program. It uses a concept called <strong>shadow memory</strong> (Valgrind calls the shadow bits <a href="https://valgrind.org/docs/manual/mc-manual.html#mc-manual.machine">V bits</a>), which stores information on whether each bit of memory was properly initialized.</p>
<p>The detector allows using uninitialized memory if it is “safe” to do so. For example, copying it from one place to another is not a problem. It reports a problem only when the execution of a program depends on an uninitialized state, for example when branching, dereferencing a pointer or using an uninitialized value as an array index. That’s exactly what we need to test constant-time functions: if secret data is marked as uninitialized, the detector reports every place where control flow or a memory address depends on it. Uninitialized memory can be propagated to other variables in the code (e.g.&nbsp;by copying). To track this, UUM detectors implement <strong>propagation</strong> of the shadow memory - when an uninitialized value is used as an operand of a “safe” operation, its state is propagated to the result of that operation. The new shadow value is computed based on the values of the operands and their shadow values.</p>
<p>Let’s summarize those concepts by analysing a concrete example. The function on the left side adds <code>1</code> to an argument <code>n</code> and returns the result. On the right side, we see the state of the shadow memory and the current values of variables. Uninitialized memory is marked red and initialized memory is blue. In this example, the function is called with argument <code>n=6</code>. The 12 least significant bits of the argument <code>n</code> are uninitialized.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://www.amongbytes.com/posts/20210709-testing-constant-time/shadow_memory.png" class="img-fluid figure-img"></p>
<figcaption>UUM detection mechanism</figcaption>
</figure>
</div>
<p>The function creates a temporary variable <code>b</code> on the stack and assigns <code>1</code> to it. The variable is properly initialized, hence its shadow memory is marked blue. The second variable <code>r</code> is initially uninitialized. When the addition is performed, the UUM detector looks at the shadow bits of both operands and calculates a new shadow value for <code>r</code>. Adding an uninitialized value to an initialized one results in an uninitialized value. So, the shadow memory for <code>r</code> is the same as for the argument <code>n</code> (the 4 most significant bits are initialized, the rest are not). Since the operation doesn’t change program flow, it doesn’t trigger a UUM report.</p>
<p>Note that the resulting shadow memory depends on the operation being performed as well as on the shadow state of both operands.</p>
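<p>The propagation rules described above can be illustrated with a small model. This is only a sketch and not how Memcheck or MSan are implemented internally: the <code>shadowed16</code> type and the helper names are hypothetical, a shadow bit of <code>1</code> is assumed to mean “uninitialized”, and addition is approximated by ignoring carries.</p>

```c
#include <stdint.h>

/* A 16-bit value paired with its shadow word.
 * Assumption of this sketch: a shadow bit of 1 means "uninitialized". */
typedef struct {
    uint16_t value;
    uint16_t shadow;
} shadowed16;

/* Addition, approximated: a result bit is treated as uninitialized if the
 * same bit of either operand is. Carries are ignored in this simple model. */
static shadowed16 shadow_add(shadowed16 a, shadowed16 b) {
    shadowed16 r;
    r.value  = (uint16_t)(a.value + b.value);
    r.shadow = (uint16_t)(a.shadow | b.shadow);
    return r;
}

/* Bitwise AND: a result bit is defined when both inputs are defined, or when
 * one of the inputs is a defined 0 (which forces the result bit to 0). */
static shadowed16 shadow_and(shadowed16 a, shadowed16 b) {
    shadowed16 r;
    r.value  = (uint16_t)(a.value & b.value);
    r.shadow = (uint16_t)((a.shadow & b.shadow) |
                          (a.shadow & b.value)  |
                          (b.shadow & a.value));
    return r;
}

/* A branch or an address computation on a value with any uninitialized bit
 * is the point where a detector would report a UUM. */
static int would_report_uum(shadowed16 v) {
    return v.shadow != 0;
}
```

<p>With <code>n = {6, 0x0FFF}</code> and <code>b = {1, 0}</code>, <code>shadow_add</code> yields the same shadow as <code>n</code>, matching the example in the figure. The AND rule shows why the operation matters: masking a partially uninitialized value with a fully defined constant can produce a fully defined result.</p>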
<p><strong>Origin tracking</strong> helps to understand potential problems. It assigns an ID to each variable, which identifies what created the uninitialized bits in shadow memory. Once a UUM is triggered, the detector uses those IDs to backtrack the origin of the uninitialized values and prints them in the final report. MSan implements more advanced origin tracking: its report shows the lines of code where uninitialized memory was created as well as the places through which it was propagated before the UUM was triggered (see the output of the toy example). With MemorySanitizer, it is enabled by passing the <code>-fsanitize-memory-track-origins</code> flag to the compiler (optionally with a level, as in <code>-fsanitize-memory-track-origins=2</code>); with Valgrind, it is the <code>--track-origins=yes</code> option.</p>
</section>
<section id="TOY" class="level3">
<h3 class="anchored" data-anchor-id="TOY">Toy example</h3>
<p>A constant-time table lookup is an important tool in secure cryptographic implementations. For instance, it is used in implementations of elliptic-curve-based schemes like ECDH, which is widely deployed on the Internet and used for key exchange in HTTPS connections. In that scheme, a (rational) point on the elliptic curve is multiplied by a scalar. The scalar is used as a secret key and hence must be protected against leaks. An optimized version of such a multiplication may use the so-called window technique. In this case, to speed up computation, the algorithm starts by pre-computing small multiples of a point (like <img src="https://latex.codecogs.com/png.latex?%5B0%5DP,%20%5B1%5DP,%20%5Ccdots,%20%5B15%5DP">, for a fixed window size <img src="https://latex.codecogs.com/png.latex?%5Ctextit%7Bw%7D=4">) and stores them in a table. Then, scalar-by-point multiplication consists of slicing the binary representation of the scalar into equal <img src="https://latex.codecogs.com/png.latex?w">-bit pieces, iterating over them and using each piece for point multiplication (see <a href="https://eprint.iacr.org/2011/338.pdf">here</a> for a detailed description of the technique). Most importantly, at each iteration the algorithm gets a small multiple of the point <img src="https://latex.codecogs.com/png.latex?%5Bk%5DP"> for <img src="https://latex.codecogs.com/png.latex?k%3C16"> from the table. That is a variable-time operation, as the value may be loaded from different locations (CPU register, cache or RAM). An attacker can then try to guess the secret scalar by exploiting those timing differences. Hence, lookups done by secure implementations need to be implemented in a constant-time manner.</p>
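<p>To make the slicing step concrete, here is a minimal sketch of extracting one window from a scalar. It is an illustration only: the helper name <code>scalar_window</code> is hypothetical, and a 64-bit scalar is assumed for brevity, whereas real curve scalars are much longer.</p>

```c
#include <stdint.h>

/* Returns the i-th w-bit window of a 64-bit scalar, counting from the least
 * significant bits. Assumes 1 <= w <= 8 and (i + 1) * w <= 64. */
static uint8_t scalar_window(uint64_t scalar, unsigned i, unsigned w) {
    uint64_t mask = (1ULL << w) - 1;
    return (uint8_t)((scalar >> (i * w)) & mask);
}
```

<p>Each returned window is derived directly from the secret scalar, so using it as a plain array index into the table of pre-computed points is exactly the kind of secret-dependent memory access that needs to be avoided.</p>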
<p>The toy example shows how to use LLVM’s Memory Sanitizer to detect whether a table lookup is constant-time. The program starts by initializing the table <code>pow2</code> with powers of two in the range <img src="https://latex.codecogs.com/png.latex?%5Cleft%5Clangle%202%5E0,%202%5E%7B63%7D%20%5Cright%5Crangle">. Then it reads <img src="https://latex.codecogs.com/png.latex?x"> from the command line into the variable <code>secret</code>, uses it to get <img src="https://latex.codecogs.com/png.latex?2%5Ex"> from the table and prints the result.</p>
<p>In the first step, we want to ensure that MSan triggers a UUM whenever program execution depends on the value of the <code>secret</code> variable. Fortunately, the MemorySanitizer API offers such a possibility: <code>__msan_allocated_memory</code> marks an address range as uninitialized, and <code>__msan_unpoison</code> marks it as initialized again, which is exactly what we need. All the MemorySanitizer API functions are declared in the <a href="https://github.com/llvm-mirror/compiler-rt/blob/master/include/sanitizer/msan_interface.h"><code>msan_interface.h</code></a> header, which needs to be included in the code. Let’s look at the example below:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode numberSource c number-lines code-with-copy"><code class="sourceCode c"><span id="cb1-1"><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">#include </span><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">&lt;stdio.h&gt;</span></span>
<span id="cb1-2"><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">#include </span><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">&lt;stdint.h&gt;</span></span>
<span id="cb1-3"><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">#include </span><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">&lt;stdlib.h&gt;</span></span>
<span id="cb1-4"></span>
<span id="cb1-5"><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">#include </span><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">&lt;sanitizer/msan_interface.h&gt;</span></span>
<span id="cb1-6"></span>
<span id="cb1-7"><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">#define POW2_NUM </span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">64</span></span>
<span id="cb1-8"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">static</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">uint64_t</span> pow2<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[</span>POW2_NUM<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">];</span></span>
<span id="cb1-9"></span>
<span id="cb1-10"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">static</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">inline</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">uint64_t</span> select_n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">uint64_t</span> n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb1-11">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> pow2<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[</span>n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">];</span></span>
<span id="cb1-12"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb1-13"></span>
<span id="cb1-14"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">int</span> main<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">int</span> argc<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">const</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">char</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> argv<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[])</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb1-15">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">uint64_t</span> ret<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> secret<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb1-16">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Initialize a table with powers of 2</span></span>
<span id="cb1-17">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">size_t</span> i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span> i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">64</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span> i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">++)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb1-18">        pow2<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[</span>i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">]</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">ULL</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb1-19">    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb1-20"></span>
<span id="cb1-21">    secret <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> atoi<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>argv<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">]);</span></span>
<span id="cb1-22"></span>
<span id="cb1-23">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Denote "secret" variable as uninitialized</span></span>
<span id="cb1-24">    __msan_allocated_memory<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(&amp;</span>secret<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">sizeof</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>secret<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">));</span></span>
<span id="cb1-25">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Time dependent operation possible load from cache or memory</span></span>
<span id="cb1-26">    ret <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> select_n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>secret<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb1-27"></span>
<span id="cb1-28">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Denote memory as defined to eliminate false positive, due</span></span>
<span id="cb1-29">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// to non constant-time implementation of printf</span></span>
<span id="cb1-30">    __msan_unpoison<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(&amp;</span>secret<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">sizeof</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>secret<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">));</span></span>
<span id="cb1-31">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Denote also 'ret' in case shadow bits were propagated</span></span>
<span id="cb1-32">    __msan_unpoison<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(&amp;</span>ret<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">sizeof</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>ret<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">));</span></span>
<span id="cb1-33">    printf<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2^</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%lu</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;"> = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%lu\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> secret<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> ret<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb1-34"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span></code></pre></div></div>
<p>In that code, the <code>select_n</code> function performs a memory lookup in the table <code>pow2</code>. Just before the function is called, the 64 bits (8 bytes) of memory storing <code>secret</code> are marked as uninitialized by the <code>__msan_allocated_memory</code> function. Then the program calls <code>select_n</code>, and at this point a UUM should be triggered. To avoid reporting a false positive (caused by <code>printf</code>), <code>secret</code> and <code>ret</code> are marked as defined just after <code>select_n</code> returns. The result of the run is as expected:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode numberSource shell number-lines code-with-copy"><code class="sourceCode"><span id="cb2-1">&gt; clang -g -fsanitize=memory -fsanitize-memory-track-origins=2 -fno-omit-frame-pointer test.c</span>
<span id="cb2-2">&gt; ./a.out 47</span>
<span id="cb2-3">==1307703==WARNING: MemorySanitizer: use-of-uninitialized-value</span>
<span id="cb2-4">    #0 0x55f795fe2a1c in select_n /home/kris/test.c:11:12</span>
<span id="cb2-5">    #1 0x55f795fe269f in main /home/kris/test.c:26:11</span>
<span id="cb2-6">    #2 0x7fc798ac8b24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)</span>
<span id="cb2-7">    #3 0x55f795f6114d in _start (/home/kris/a.out+0x2014d)</span>
<span id="cb2-8"></span>
<span id="cb2-9">  Uninitialized value was stored to memory at</span>
<span id="cb2-10">    #0 0x55f795fe299e in select_n /home/kris/test.c:10</span>
<span id="cb2-11">    #1 0x55f795fe269f in main /home/kris/test.c:26:11</span>
<span id="cb2-12">    #2 0x7fc798ac8b24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)</span>
<span id="cb2-13"></span>
<span id="cb2-14">  Memory was marked as uninitialized</span>
<span id="cb2-15">    #0 0x55f795fbd1bb in __msan_allocated_memory (/home/kris/a.out+0x7c1bb)</span>
<span id="cb2-16">    #1 0x55f795fe2647 in main /home/kris/test.c:24:5</span>
<span id="cb2-17">    #2 0x7fc798ac8b24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)</span>
<span id="cb2-18"></span>
<span id="cb2-19">SUMMARY: MemorySanitizer: use-of-uninitialized-value /home/kris/test.c:11:12 in select_n</span>
<span id="cb2-20">Exiting</span></code></pre></div></div>
<p>Execution correctly triggers a UUM at line 11, which is precisely where the table lookup is done. The sanitizer also reports information about the origin of the problem. Generation of that information is enabled by the <code>-fsanitize-memory-track-origins=2</code> flag and proves quite useful when designing constant-time functions.</p>
<p>Detection works, which is excellent. Let’s now try it on code that is constant-time and check that no UUM is triggered. The following implementation is functionally equivalent to <code>select_n</code>, but now the table lookup is done in constant time. Namely, it <em>always</em> goes through all the elements of the table. The function calculates a <code>mask</code> variable, which has all bits set only when element <code>n</code> is processed. Then, thanks to the bitwise <code>&amp;</code>, only that element’s value is copied to the variable <code>ret</code>.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode c code-with-copy"><code class="sourceCode c"><span id="cb3-1"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">static</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">inline</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">uint64_t</span> const_select_n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">uint64_t</span> n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb3-2">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">uint64_t</span> mask<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> sign<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> ret <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb3-3">    sign <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">ULL</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">63</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb3-4">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Always iterate over all elements</span></span>
<span id="cb3-5">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span> i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>POW2_NUM<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span> i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">++,</span> sign<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;&lt;=</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb3-6">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Arithmetical shift right propagates MSB if</span></span>
<span id="cb3-7">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// set. Thanks to 'sign' set above, this is</span></span>
<span id="cb3-8">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// done only once during whole iteration.</span></span>
<span id="cb3-9">        mask <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">((</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">int64_t</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span>sign<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">63</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb3-10">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// With correctly set mask only one value</span></span>
<span id="cb3-11">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// is assigned to 'ret' variable</span></span>
<span id="cb3-12">        ret <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|=</span> pow2<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[</span>i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">]</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> mask<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb3-13">    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb3-14">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> ret<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb3-15"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span></code></pre></div></div>
<p>We can now swap the call to <code>select_n</code> with a call to <code>const_select_n</code>. This implementation no longer triggers a UUM, as execution doesn’t depend on uninitialized data: the program always reads the whole table. On the flip side, <code>const_select_n</code> is much harder to analyse by hand, which is why tools are needed.</p>
<pre class="shell"><code>&gt; clang -g -fsanitize=memory -fsanitize-memory-track-origins=2 -fno-omit-frame-pointer test.c
&gt; ./a.out 47
2^47 = 140737488355328
&gt;</code></pre>
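<p>For comparison, a constant-time lookup is often written with an equality mask instead of a shifted sign bit. The sketch below is an alternative formulation, not the code from the PQC library. The names <code>ct_eq_mask</code> and <code>ct_lookup</code> are hypothetical. As with any such code, a compiler is free to transform it, so the result still needs to be verified with a tool.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Returns all-ones when a == b and zero otherwise, without branching. */
static uint64_t ct_eq_mask(uint64_t a, uint64_t b) {
    uint64_t d = a ^ b;
    /* For any non-zero d, the top bit of (d | -d) is set. */
    uint64_t nonzero = (d | (0 - d)) >> 63;
    return nonzero - 1;
}

/* Reads every element of the table and keeps only the one at index n.
 * An index outside the table yields 0. */
static uint64_t ct_lookup(const uint64_t *table, size_t len, uint64_t n) {
    uint64_t ret = 0;
    for (size_t i = 0; i < len; i++) {
        ret |= table[i] & ct_eq_mask((uint64_t)i, n);
    }
    return ret;
}
```

<p>Calling <code>ct_lookup(pow2, POW2_NUM, secret)</code> in place of <code>select_n(secret)</code> touches the same memory locations regardless of the value of <code>secret</code>, and the secret is never used as an address or a branch condition.</p>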
</section>
<section id="CTCHECK" class="level3">
<h3 class="anchored" data-anchor-id="CTCHECK">Utility called <a href="https://github.com/kriskwiatkowski/pqc/blob/main/src/common/ct_check.h"><code>ct_check.h</code></a></h3>
<p>Both Memcheck and Memory Sanitizer provide a programmatic API that can be used when designing constant-time code. The <a href="https://github.com/kriskwiatkowski/pqc/blob/main/src/common/ct_check.h"><code>ct_check</code></a> utility provides a unified API for both tools. A compile-time flag controls which tool is used. During development, I use both: MSan is faster and gives more information in the final report, while checks by Memcheck are more granular. Such a wrapper allows the checks to be written only once, which makes it quite useful.</p>
<p>The <code>ct_check.h</code> header exposes the following functions:</p>
<table class="caption-top table">
<colgroup>
<col style="width: 50%">
<col style="width: 50%">
</colgroup>
<thead>
<tr class="header">
<th>API</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><code>ct_poison</code></td>
<td>Marks bytes as uninitialized. Switches on constant-time checks for the given memory region. It is a wrapper around <code>__msan_allocated_memory</code> and <code>VALGRIND_MAKE_MEM_UNDEFINED</code></td>
</tr>
<tr class="even">
<td><code>ct_purify</code></td>
<td>Marks bytes as initialized. Switches off constant-time checks (the opposite of <code>ct_poison</code>)</td>
</tr>
<tr class="odd">
<td><code>ct_print_shadow</code></td>
<td>Prints state of shadow bits for uninitialized memory region.</td>
</tr>
<tr class="even">
<td><code>ct_expect_uum</code></td>
<td>Instructs the compiler that it expects UUM after a call to this function. It works only with LLVM, useful for testing.</td>
</tr>
<tr class="odd">
<td><code>ct_require_uum</code></td>
<td>Ensures that a UUM was reported before reaching this function. It works only with LLVM and is useful for testing. Usually used in blocks of the form <code>ct_expect_uum(); do_non_ct_stuff(); ct_require_uum();</code></td>
</tr>
</tbody>
</table>
<p>With that set of functions, I’ve used the <a href="https://github.com/agl/ctgrind/blob/master/test.c">tests implemented by A. Langley</a> to check that MSan and <code>ctgrind</code> give the same results. An implementation of those tests with <code>ct_check.h</code> is <a href="https://github.com/kriskwiatkowski/pqc/blob/main/test/ct.cpp">here</a>, and indeed, the results are the same.</p>
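<p>To give an idea of how such a wrapper can be built, here is a minimal sketch of the poison/purify pair. It is not the actual content of <code>ct_check.h</code>: the <code>sketch_</code> prefixed names and the <code>SKETCH_CT_VALGRIND</code> flag are made up for illustration, and the mapping of the “purify” side to <code>__msan_unpoison</code> and <code>VALGRIND_MAKE_MEM_DEFINED</code> is an assumption.</p>

```c
/* Detect a MemorySanitizer build (clang defines __has_feature). */
#if defined(__has_feature)
#  if __has_feature(memory_sanitizer)
#    define SKETCH_CT_MSAN 1
#  endif
#endif

#if defined(SKETCH_CT_MSAN)
/* MSan build: mark memory as uninitialized / initialized. */
#  include <sanitizer/msan_interface.h>
#  define sketch_ct_poison(p, n) __msan_allocated_memory((p), (n))
#  define sketch_ct_purify(p, n) __msan_unpoison((p), (n))
#elif defined(SKETCH_CT_VALGRIND)
/* Hypothetical flag selecting Memcheck client requests instead. */
#  include <valgrind/memcheck.h>
#  define sketch_ct_poison(p, n) VALGRIND_MAKE_MEM_UNDEFINED((p), (n))
#  define sketch_ct_purify(p, n) VALGRIND_MAKE_MEM_DEFINED((p), (n))
#else
/* No tool selected: the checks compile to nothing. */
#  define sketch_ct_poison(p, n) ((void)(p), (void)(n))
#  define sketch_ct_purify(p, n) ((void)(p), (void)(n))
#endif
```

<p>A test would then call <code>sketch_ct_poison(key, sizeof key)</code> before the code under test and <code>sketch_ct_purify(key, sizeof key)</code> afterwards, and the same source can be built for either tool or with the checks disabled.</p>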
</section>
<section id="FRODO" class="level3">
<h3 class="anchored" data-anchor-id="FRODO">Applying <code>ct_check.h</code> to the existing implementation</h3>
<p>Instead of toy code, let’s now take an existing, modern cryptographic implementation that was vulnerable to timing attacks and see if Memory Sanitizer can detect the problem. Quantum-safe cryptographic implementations are currently my main focus, so I’ll apply it to one of the <em>Key Encapsulation Mechanisms</em> (KEMs) submitted to NIST for post-quantum standardization. All the code presented below comes from the <a href="https://github.com/kriskwiatkowski/pqc">PQC</a> library available on GitHub (branch called <code>blog/frodo_constant_time_issue</code>).</p>
<p>A KEM is defined by 3 algorithms: a <em>key generation</em> algorithm returning a pair of public and private keys; an <em>encapsulation</em> algorithm, which uses the public key to return a <strong>shared secret</strong> both in plain form and in encrypted form as a <strong>ciphertext</strong>; and finally a <em>decapsulation</em> algorithm, which takes the ciphertext and the secret key as input and returns the <strong>shared secret</strong>, which can then be used for symmetric encryption (e.g.&nbsp;<a href="https://blog.cloudflare.com/the-tls-post-quantum-experiment">in TLS</a>). To avoid leaking the secret key, the <em>decapsulation</em> function must ensure that operations on the private key are constant-time. A violation of this requirement was reported in <a href="https://frodokem.org/">FrodoKEM</a> and exploited in a recent <a href="https://eprint.iacr.org/2020/743">paper</a>. In that work, the authors propose (section 3) a generic side-channel technique that can be applied to recover the secret key of an (LWE-based) KEM. Then (in section 4) they describe how to use that technique to recover the FrodoKEM key. I highly recommend the paper (or <a href="https://www.youtube.com/watch?v=e9ZK3RQ0Ykk">video</a>) to anybody interested in secure cryptographic implementations.</p>
<p>The following variable-time implementation allowed the attack to succeed.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode c code-with-copy"><code class="sourceCode c"><span id="cb5-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// https://github.com/kriskwiatkowski/pqc/blob/e57a8915834e08998f1a93f3d111cfaf3fcd94a7/src/kem/frodo/frodokem640shake/clean/kem.c#L229</span></span>
<span id="cb5-2"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">int</span> PQCLEAN_FRODOKEM640SHAKE_CLEAN_crypto_kem_dec<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">uint8_t</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>ss<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">const</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">uint8_t</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>ct<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">const</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">uint8_t</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>sk<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb5-3">    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">...</span></span>
<span id="cb5-4">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>memcmp<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>Bp<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> BBp<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>PARAMS_N<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>PARAMS_NBAR<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;&amp;</span></span>
<span id="cb5-5">        memcmp<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>C<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> CC<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>PARAMS_NBAR<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>PARAMS_NBAR<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb5-6">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Load k' to do ss = F(ct || k')</span></span>
<span id="cb5-7">        memcpy<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>Fin_k<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> kprime<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> CRYPTO_BYTES<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb5-8">    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb5-9">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Load s to do ss = F(ct || s)</span></span>
<span id="cb5-10">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// This branch is executed when a malicious ciphertext is decapsulated</span></span>
<span id="cb5-11">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// and is necessary for security. Note that the known answer tests</span></span>
<span id="cb5-12">        <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// will not exercise this line of code but it should not be removed.</span></span>
<span id="cb5-13">        memcpy<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>Fin_k<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> sk_s<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> CRYPTO_BYTES<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb5-14">    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span></code></pre></div></div>
<p>The ciphertext is a concatenation of two parts: <code>ciphertext = Bp || C</code>. The <em>decapsulation</em> function implemented by FrodoKEM uses the secret key to decrypt the ciphertext, re-encrypts the result, and compares it with the ciphertext it received (see the Fujisaki-Okamoto transform). That is what the code above does. The values <code>BBp</code> and <code>CC</code> hold the ciphertext recomputed during decapsulation and are compared with the received <code>Bp</code> and <code>C</code>. If the comparison succeeds, the <strong>shared secret</strong> is derived from <code>kprime</code>; otherwise it is derived from the secret value <code>sk_s</code>, so the function returns a pseudorandom value.</p>
<p>There are two problems related to variable time execution:</p>
<ol type="1">
<li>The comparison uses <code>memcmp</code>, which is not constant-time: it returns as soon as it detects the first difference.</li>
<li>The two calls are combined with short-circuit evaluation: if the first <code>memcmp</code> returns a value other than <code>0</code>, the second <code>memcmp</code> is not called, which is not constant-time behaviour either.</li>
</ol>
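<p>To make the two problems concrete, here is a minimal sketch of what a branch-free version of that decision could look like. The helper names <code>ct_neq_mask</code>, <code>ct_select</code> and <code>select_fin_k</code> are hypothetical and are not taken from the PQC library; the buffers are treated as plain byte arrays and their sizes are passed in explicitly for illustration.</p>

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: returns 0x00 if the buffers are equal, 0xFF otherwise.
// Every byte is always inspected, so the running time does not depend on
// where the first difference is.
uint8_t ct_neq_mask(const uint8_t *a, const uint8_t *b, size_t n) {
    uint8_t r = 0;
    for (size_t i = 0; i < n; i++) {
        r |= (uint8_t)(a[i] ^ b[i]);
    }
    // nz is 1 if r != 0 and 0 otherwise, computed without a branch.
    uint8_t nz = (uint8_t)((r | (uint8_t)(0u - r)) >> 7);
    // Stretch that single bit into a full byte mask.
    return (uint8_t)(0u - nz);
}

// Hypothetical helper: writes 'a' to 'out' when mask == 0x00 and 'b' when
// mask == 0xFF. Both inputs are read regardless of the mask.
void ct_select(uint8_t *out, const uint8_t *a, const uint8_t *b,
               size_t n, uint8_t mask) {
    for (size_t i = 0; i < n; i++) {
        out[i] = (uint8_t)((a[i] & (uint8_t)~mask) | (b[i] & mask));
    }
}

// Branch-free counterpart of the if/else in the snippet above.
void select_fin_k(uint8_t *Fin_k,
                  const uint8_t *Bp, const uint8_t *BBp, size_t bp_len,
                  const uint8_t *C, const uint8_t *CC, size_t c_len,
                  const uint8_t *kprime, const uint8_t *sk_s, size_t k_len) {
    // Bitwise OR instead of && so that both comparisons always run.
    uint8_t mask = (uint8_t)(ct_neq_mask(Bp, BBp, bp_len) |
                             ct_neq_mask(C, CC, c_len));
    // mask == 0x00: ciphertext matched, use kprime; otherwise use sk_s.
    ct_select(Fin_k, kprime, sk_s, k_len, mask);
}
```

<p>The bitwise <code>|</code> removes the short-circuit and the mask-based copy removes the branch. A compiler is still free to transform such code, so the result has to be checked on the generated binary, which is what the tools discussed here are for.</p>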
<p>The first issue alone is enough to recover the private key. Let’s see if Memory Sanitizer can help in designing a constant-time implementation. I’m using the <a href="https://github.com/kriskwiatkowski/pqc/tree/main/src/kem/frodo/frodokem640shake/clean">PQC</a> library, which integrates both variable-time and constant-time decapsulation for FrodoKEM/640. Let’s start with a unit test:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode c code-with-copy"><code class="sourceCode c"><span id="cb6-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Uses GTEST and C++</span></span>
<span id="cb6-2">TEST<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>Frodo<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> CtDecaps<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb6-3"></span>
<span id="cb6-4">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Get descriptor of an algorithm</span></span>
<span id="cb6-5">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">const</span> pqc_ctx_t <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>p <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pqc_kem_alg_by_id<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>PQC_ALG_KEM_FRODOKEM640SHAKE<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb6-6"></span>
<span id="cb6-7">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Initialize buffers for KEM output</span></span>
<span id="cb6-8">    std<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>vector<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">uint8_t</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> sk<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>pqc_private_key_bsz<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>p<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">));</span></span>
<span id="cb6-9">    std<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>vector<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">uint8_t</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> pk<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>pqc_public_key_bsz<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>p<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">));</span></span>
<span id="cb6-10">    std<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>vector<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">uint8_t</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> ct<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>pqc_ciphertext_bsz<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>p<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">));</span></span>
<span id="cb6-11">    std<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>vector<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">uint8_t</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> ss<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>pqc_shared_secret_bsz<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>p<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">));</span></span>
<span id="cb6-12">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">bool</span> res<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb6-13"></span>
<span id="cb6-14">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Generate key pair and perform encapsulation</span></span>
<span id="cb6-15">    ASSERT_TRUE<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>pqc_keygen<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>p<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> pk<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>data<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(),</span> sk<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>data<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">()));</span></span>
<span id="cb6-16">    ASSERT_TRUE<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>pqc_kem_encapsulate<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>p<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> ct<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>data<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(),</span> ss<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>data<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(),</span> pk<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>data<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">()));</span></span>
<span id="cb6-17"></span>
<span id="cb6-18">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Mark secret material as uninitialized, so that variable time implementation causes UUM.</span></span>
<span id="cb6-19">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// First 16 bytes is a shared secret, then next 9616 is just a public key, and then next</span></span>
<span id="cb6-20">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// 10240 is another part of secret material (a secret matrix S used by FrodoKEM). Both</span></span>
<span id="cb6-21">    <span class="co" style="color: #5E5E5E;
background-color: null;
// shared secret and matrix S not leak">
font-style: inherit;">// shared secret and matrix S must not leak, but it is OK to do variable-time operations on</span></span>
<span id="cb6-22">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// public key.</span></span>
<span id="cb6-23">    ct_poison<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>sk<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>data<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(),</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb6-24">    ct_poison<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">((</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">unsigned</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">char</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*)</span>sk<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>data<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">()+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">9616</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">640</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb6-25"></span>
<span id="cb6-26">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Decapsulate</span></span>
<span id="cb6-27">    res <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pqc_kem_decapsulate<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>p<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> ss<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>data<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(),</span> ct<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>data<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(),</span> sk<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>data<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">());</span></span>
<span id="cb6-28"></span>
<span id="cb6-29">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Purify res to allow non-ct check by ASSERT_TRUE</span></span>
<span id="cb6-30">    ct_purify<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(&amp;</span>res<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb6-31">    ASSERT_TRUE<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>res<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb6-32"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span></code></pre></div></div>
<p>The test is compiled with flags enabling Memory Sanitizer and origin tracking. When run, it correctly triggers UUM as expected.</p>
<pre class="shell"><code>
./ut --gtest_filter="Frodo.CtDecaps"
Running main() from /home/kris/repos/pqc/3rd/gtest/googletest/src/gtest_main.cc
Note: Google Test filter = Frodo.CtDecaps
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from Frodo
[ RUN      ] Frodo.CtDecaps
Uninitialized bytes in MemcmpInterceptorCommon at offset 0 inside [0x7ffc94382040, 10240)
==3099896==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x559140afe8cd in memcmp (build.msan.debug/ut+0xd08cd)
    #1 0x5591410d5020 in PQCLEAN_FRODOKEM640SHAKE_CLEAN_crypto_kem_dec kem.c:233:9
    #2 0x559140e75c5b in pqc_kem_decapsulate pqapi.c:112:13
    #3 0x559140b30983 in Frodo_CtDecaps_Test::TestBody() test/ut.cpp:148:11
    #4 0x559140de55a4 in void testing::internal::HandleSehExceptionsInMethodIfSupported&lt;testing::Test, void&gt;(testing::Test*, void (testing::Test::*)(), char const*) (build.msan.debug/ut+0x3b75a4)
    ...
  Uninitialized value was stored to memory at
    #0 0x5591410dec98 in PQCLEAN_FRODOKEM640SHAKE_CLEAN_key_decode /home/kris/repos/pqc/src/kem/frodo/frodokem640shake/clean/util.c:123:18
    #1 0x5591410d44e0 in PQCLEAN_FRODOKEM640SHAKE_CLEAN_crypto_kem_dec /home/kris/repos/pqc/src/kem/frodo/frodokem640shake/clean/kem.c:184:5
    ...</code></pre>
<p>In this case, Memory Sanitizer proves useful for detecting code that’s not constant-time. It would have found the bug in FrodoKEM had it been used.</p>
<p>Here, <code>ct_check.h</code> has detected that uninitialized memory is used at line 229 (the call to <code>memcmp</code>). The output is much longer because origin tracking is enabled (I have removed most of it). Now, to make the code constant-time, we must replace <code>memcmp</code> with a function that compares bytes in constant time. Such a function can look like this:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode c code-with-copy"><code class="sourceCode c"><span id="cb8-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Compares in constant time two byte arrays of size 'n'</span></span>
<span id="cb8-2"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">uint8_t</span> ct_memcmp<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">const</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">void</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>a<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">const</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">void</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>b<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">size_t</span> n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb8-3">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">const</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">uint8_t</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>pa <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">uint8_t</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*)</span> a<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>pb <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">uint8_t</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*)</span> b<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb8-4">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">uint8_t</span> r <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb8-5">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// XOR bytes in 'a' with corresponding bytes in 'b'. If</span></span>
<span id="cb8-6">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// all bytes are equal, 'r' will be == 0.</span></span>
<span id="cb8-7">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">while</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">--)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span> r <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>pa<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">++</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>pb<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">++;</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb8-8">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Set most significant bit to 1 only if r!=0, otherwise</span></span>
<span id="cb8-9">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// r stays == 0</span></span>
<span id="cb8-10">    r   <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>r <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> r<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb8-11">    r <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;&gt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb8-12">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// return last byte - 0 means a==b</span></span>
<span id="cb8-13">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> r<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb8-14"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span></code></pre></div></div>
<p>After swapping <code>memcmp</code> with <code>ct_memcmp</code> and running with Memory Sanitizer, UUM is no longer triggered. And in this case, that’s <strong>BAD</strong>. The first problem is fixed, but the second is not: the code is still not constant-time. We can verify that by running the same code under Valgrind (thanks to <code>ct_check.h</code>).</p>
<pre class="shell"><code>&gt; valgrind --tool=memcheck ./ut --gtest_filter="Frodo.CtDecaps"
==3096880== Memcheck, a memory error detector
==3096880== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==3096880== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==3096880== Command: ./ut --gtest_filter=*CtDecaps*
==3096880==
Running main() from /home/kris/repos/pqc/3rd/gtest/googletest/src/gtest_main.cc
Note: Google Test filter = *CtDecaps*
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from Frodo
[ RUN      ] Frodo.CtDecaps
==3096880== Conditional jump or move depends on uninitialised value(s)
==3096880==    at 0x1B721E: PQCLEAN_FRODOKEM640SHAKE_CLEAN_crypto_kem_dec (kem.c:244)
==3096880==    by 0x179C7D: pqc_kem_decapsulate (pqapi.c:112)
==3096880==    by 0x11A4AD: Frodo_CtDecaps_Test::TestBody() (ut.cpp:148)
...</code></pre>
<p>OK, Memcheck correctly reports an error. Interesting, but why doesn’t Memory Sanitizer?</p>
</section>
<section id="BUG" class="level3">
<h3 class="anchored" data-anchor-id="BUG">Shadow memory propagation: MSan vs Valgrind</h3>
<p>It took me some time to understand why Memory Sanitizer doesn’t catch this. It turns out that the rules for shadow memory propagation differ between Memcheck and LLVM’s Memory Sanitizer. After analysing FrodoKEM, I found that the root cause boils down to the following example:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode c code-with-copy"><code class="sourceCode c"><span id="cb10-1"><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">#include </span><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">&lt;stdio.h&gt;</span></span>
<span id="cb10-2"><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">#include </span><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">&lt;stdlib.h&gt;</span></span>
<span id="cb10-3"><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">#include </span><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">&lt;stdint.h&gt;</span></span>
<span id="cb10-4"><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">#include </span><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">"common/ct_check.h"</span></span>
<span id="cb10-5"></span>
<span id="cb10-6"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">int</span> main<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">int</span> argc<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">const</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">char</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> argv<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[])</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb10-7">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Store 1 bit of first argument provided at command line</span></span>
<span id="cb10-8">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">uint16_t</span> sign <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">((</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">uint16_t</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span>atoi<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>argv<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">]))</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb10-9">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">uint16_t</span> s<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb10-10">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Use ct_poison and logical AND to mark least significant bit</span></span>
<span id="cb10-11">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// of 2-byte value as uninitialized.</span></span>
<span id="cb10-12">    ct_poison<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(&amp;</span>sign<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb10-13">    sign <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> sign <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb10-14">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Print shadow memory - should produce output 01 00.</span></span>
<span id="cb10-15">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// It means that only least significant bit is uninitialized,</span></span>
<span id="cb10-16">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// the rest is properly initialized.</span></span>
<span id="cb10-17">    printf<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Shadow memory: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb10-18">    ct_print_shadow<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(&amp;</span>sign<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb10-19">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// On Intel, negation is two's complement operation. So depending</span></span>
<span id="cb10-20">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// on value of 'sign' (0 or 1), shadow propagation may be needed.</span></span>
<span id="cb10-21">    s <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(-</span>sign<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Same as 's = ~sign + 1'</span></span>
<span id="cb10-22">    ct_print_shadow<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(&amp;</span>s<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb10-23">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Take a branch depending on uninitialized value. Should trigger UUM.</span></span>
<span id="cb10-24">    s <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;&gt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb10-25">    ct_print_shadow<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(&amp;</span>s<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb10-26">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>s<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb10-27">        printf<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Branch A taken</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb10-28">    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb10-29">        printf<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Branch B taken</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb10-30">    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb10-31"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span></code></pre></div></div>
<p>The program reads its input from the command line and stores the lowest bit in the 16-bit value <code>sign</code>. It marks the least significant bit of <code>sign</code> as uninitialized and then negates <code>sign</code>. On Intel platforms, negation is a <a href="https://en.wikipedia.org/wiki/Two%27s_complement">two’s complement</a> operation, so it is equivalent to <code>s = ~sign + 1</code>. If <code>sign == 0</code>, then <code>~sign == 0xFFFF</code>, and adding <code>1</code> makes the carry ripple through every bit, flipping all of them. Every bit of the result therefore depends on the uninitialized bit, so the uninitialized state should be propagated to all the bits (the <code>frodo_sample_n</code> function in FrodoKEM performs a similar operation).</p>
<p>Finally, a branch is taken depending on the value of the most significant bit of <code>s</code>. The important point here is that the <strong>execution path of the program depends on uninitialized data</strong>, so a UUM report should be triggered. Let’s run Memory Sanitizer first.</p>
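<p>To make that dependency concrete, here is a minimal sketch. The helper names <code>neg16</code>, <code>check_negation_identity</code> and <code>bits_affected_by_lsb</code> are made up for this illustration and are not part of the test program. It negates both possible values of <code>sign</code> and XORs the results to see which output bits change when only the lowest input bit changes.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Negation of a 16-bit value spelled out as two's complement: ~x + 1. */
uint16_t neg16(uint16_t x) {
    return (uint16_t)(~x + 1u);
}

/* Check that the spelled-out form agrees with the built-in unary minus. */
void check_negation_identity(uint16_t x) {
    assert(neg16(x) == (uint16_t)(-x));
}

/* Bits of the result that change when only bit 0 of the input changes. */
uint16_t bits_affected_by_lsb(void) {
    return (uint16_t)(neg16(0) ^ neg16(1));
}
```

<p>Here <code>neg16(0)</code> is <code>0x0000</code> and <code>neg16(1)</code> is <code>0xFFFF</code>, so <code>bits_affected_by_lsb()</code> returns <code>0xFFFF</code>: every bit of the result depends on the single poisoned bit.</p>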
<pre class="shell"><code>&gt; clang -g -O0 -DPQC_USE_CTSANITIZER -fsanitize=memory -fno-omit-frame-pointer -fno-optimize-sibling-calls test.c
&gt; ./a.out 0
Shadow memory:
01 00
01 00
00 00
Branch A taken
&gt; ./a.out 1
Branch B taken</code></pre>
<p>The UUM detector is happy, nothing is reported. The <code>ct_print_shadow</code> output shows the state of the shadow memory: the second line shows that after the negation only the least significant bit is still marked uninitialized, so after the (logical) right shift all bits are considered properly initialized. Now Memcheck:</p>
<pre class="shell"><code>&gt; clang -g -O0 -DPQC_USE_CTGRIND -fno-omit-frame-pointer -fno-optimize-sibling-calls test.c
&gt; valgrind --tool=memcheck ./a.out 0
==3156952== Memcheck, a memory error detector
==3156952== Command: ./a.out 0
Shadow memory:
01 00
FF FF
01 00
==3156952== Conditional jump or move depends on uninitialised value(s)
==3156952==    at 0x109239: main (test.c:26)
Branch A taken</code></pre>
<p>As expected, this time the UUM detector is not happy. Clearly the shadow propagation rules differ: Memcheck propagates shadow memory in a way that follows the two’s complement operation, whereas Memory Sanitizer uses less strict rules.</p>
<p>Initially, I thought that was a bug (reported in <a href="https://github.com/google/sanitizers/issues/1430">GH#1430</a>, <a href="https://github.com/google/sanitizers/issues/1424">GH#1424</a>, <a href="https://github.com/google/sanitizers/issues/1427">GH#1427</a>), but the MSan maintainers from Google made me realize (thanks!) that it is a design decision which allows faster execution. Indeed, looking at the shadow propagation rules described in <a href="https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43308.pdf">“MemorySanitizer: fast detector of uninitialized memory use in C++”</a> (see chapter 3.3.1), it seems that shadow propagation which follows carry propagation is not implemented for efficiency reasons.</p>
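<p>The difference between the two outputs above can be reproduced with a toy model. The sketch below is my simplified reading of the two behaviours, not code taken from either tool, and all function names are hypothetical. It assumes that the approximate rule ORs the operand shadows (in the spirit of the approximations in chapter 3.3.1 of the paper), and that the stricter rule marks every bit at or above the lowest uninitialized bit, since a carry could reach it. A set shadow bit means “uninitialized”.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Approximate rule (assumption): the result shadow is the OR of the
 * operand shadows, carries are ignored. */
uint16_t shadow_add_approx(uint16_t sa, uint16_t sb) {
    return (uint16_t)(sa | sb);
}

/* Carry-aware rule (assumption): every bit at or above the lowest
 * uninitialized bit is treated as uninitialized. */
uint16_t shadow_add_carry(uint16_t sa, uint16_t sb) {
    uint16_t s = (uint16_t)(sa | sb);
    return (uint16_t)(s | (0u - s));
}

/* A logical right shift by a known constant shifts the shadow as well. */
uint16_t shadow_shr(uint16_t s, unsigned n) {
    assert(n < 16);
    return (uint16_t)(s >> n);
}
```

<p>Treating the negation as <code>0 - sign</code> with operand shadows <code>0x0000</code> and <code>0x0001</code>, the approximate rule gives <code>0x0001</code> and then <code>0x0000</code> after the shift by 15, which corresponds to the <code>01 00</code> and <code>00 00</code> lines printed under Memory Sanitizer. The carry-aware rule gives <code>0xFFFF</code> and then <code>0x0001</code>, matching the <code>FF FF</code> and <code>01 00</code> lines under Memcheck, where the branch is reported.</p>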
</section>
<section id="speed" class="level3">
<h3 class="anchored" data-anchor-id="speed">Speed</h3>
<p>The graph below shows the difference in execution time when running FrodoKEM decapsulation without any instrumentation, with the compile-time instrumentation of Memory Sanitizer, and with the runtime instrumentation done by Valgrind’s Memcheck. Origin tracking is pretty expensive, so separate results are shown for runs with it enabled and disabled.</p>
<div id="a27ba552" class="cell" data-fig-format="png" data-execution_count="1">
<div class="cell-output cell-output-display">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://www.amongbytes.com/posts/20210709-testing-constant-time/20210709-testing-constant-time_files/figure-html/cell-2-output-1.png" class="quarto-figure quarto-figure-center figure-img" width="591" height="412"></p>
</figure>
</div>
</div>
</div>
<p>The control run shows that decapsulation takes around 3000 ms. With origin tracking disabled, Memory Sanitizer is about 5 times slower than the control run, and Memcheck is 25 times slower. With origin tracking enabled, Memory Sanitizer incurs a 9x slowdown, while for Valgrind it is 59x - that’s a big difference.</p>
<p>Code instrumented at compile time executes much faster. As described in the earlier section, Memcheck does more granular checks, so slower execution is expected. Nevertheless, the difference is significant. Note that the results do not include setup time, which means they are slightly biased: the time Valgrind spends on the runtime instrumentation itself is not included in those results.</p>
</section>
<section id="conclusion-limitations-and-future-direction" class="level3">
<h3 class="anchored" data-anchor-id="conclusion-limitations-and-future-direction">Conclusion, limitations and future direction</h3>
<p>In some rare cases Memory Sanitizer is not going to discover uses of uninitialized data, which negatively impacts the checking of constant-time implementations. Still, it is pretty good at the job. My CI runs a build with Memory Sanitizer anyway, so adding extra checks costs nothing. Nevertheless, at the development stage I use both Memcheck and MSan, because the additional assurance provided by Memcheck is needed. I think it would be useful to be able to control the rules for shadow memory propagation in Memory Sanitizer, i.e. with a compilation flag for choosing between more performant and more granular checks.</p>
<p>But what I would like to see in the future is a type system and better integration with a build system. Imagine that all the variables in the code that are used to store sensitive data could use some kind of “secure” type (or annotation). Then, by introducing a special build configuration, we could tell the build system to instrument the code in such a way that data using the “secure” type is automatically marked uninitialized. In case of non-constant-time access, and with proper unit tests, the UUM detector would automatically report errors. I think it is an interesting feature for a modern programming language like Rust, which seems to be establishing itself in the space of secure implementations.</p>
<p>Coming back to the current state of the art, it is also worth mentioning that Memory Sanitizer has some additional limitations:</p>
<ul>
<li>it is <a href="https://clang.llvm.org/docs/MemorySanitizer.html#supported-platforms">supported</a> on Linux/x86_64, NetBSD and FreeBSD only</li>
<li>it requires all memory accesses in the program to be instrumented. This includes the standard C++ library (used, for example, by <code>gtest</code>). Instructions <a href="https://github.com/google/sanitizers/wiki/MemorySanitizerLibcxxHowTo">here</a> describe how to instrument and integrate <code>libc++</code> into a project.</li>
</ul>
<p>Finally, side-channel attacks are much more complicated and there is no single tool which will be able to detect them all. On the other hand, problems like the one in FrodoKEM, described above, are pretty basic. Automatic detection of such bugs is possible and should be done by tools, so that cryptography engineers can spend time on more interesting things.</p>


</section>

 ]]></description>
  <guid>https://www.amongbytes.com/posts/20210709-testing-constant-time/20210709-testing-constant-time.html</guid>
  <pubDate>Fri, 09 Jul 2021 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Experimenting with NGINX</title>
  <link>https://www.amongbytes.com/posts/20210612-nginx-setup.html</link>
  <description><![CDATA[ 





<p>This page describes how to enable support for some features on NGINX, e.g.&nbsp;post-quantum schemes or the QUIC protocol. The page is updated on an as-needed basis, and some parts of it may be specific to Debian Linux.</p>
<section id="sources" class="level1">
<h1>Sources</h1>
<p>I’ll get the NGINX sources, change the build configuration, re-compile the server and rebuild the <code>deb</code> package. To get the sources, we have to add the NGINX repositories to <code>/etc/apt/sources.list</code>.</p>
<p>The following two lines go at the end of that file:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode c code-with-copy"><code class="sourceCode c"><span id="cb1-1">deb https<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//nginx.org/packages/mainline/debian buster nginx</span></span>
<span id="cb1-2">deb<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>src https<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//nginx.org/packages/mainline/debian buster nginx</span></span></code></pre></div></div>
<p>Then add the NGINX public key for verification and download the sources:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb2-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sudo</span> wget https://nginx.org/keys/nginx_signing.key</span>
<span id="cb2-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sudo</span> apt-key add nginx_signing.key</span>
<span id="cb2-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sudo</span> apt-get update</span>
<span id="cb2-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sudo</span> apt-get upgrade</span>
<span id="cb2-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sudo</span> apt-get build-dep nginx</span>
<span id="cb2-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sudo</span> apt-get source nginx</span></code></pre></div></div>
<p>Sources should now be downloaded to the <code>nginx</code> directory.</p>
</section>
<section id="link-nginx-with-the-boringssl" class="level1">
<h1>Link NGINX with the BoringSSL</h1>
<p>My current HTTP server setup uses <code>bssl</code>. The choice comes from the fact that it’s simpler, its documentation is clearer and it contains more modern features (or at least those I’m interested in).</p>
<p>To link NGINX with BoringSSL, one needs to copy the BoringSSL sources to <code>nginx/debian/modules</code> and compile them.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb3-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">cd</span> nginx/debian/modules/ <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">&amp;&amp;</span></span>
<span id="cb3-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">git</span> clone https://github.com/google/boringssl</span>
<span id="cb3-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mkdir</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-p</span> boringssl/build <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">&amp;&amp;</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">cd</span> boringssl/build</span>
<span id="cb3-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cmake</span> .. <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">&amp;&amp;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">make</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-j</span> 8</span></code></pre></div></div>
<p>The following step instructs NGINX to use BoringSSL instead of OpenSSL (which is used by default). To do that, one needs to modify the rules file <code>nginx/debian/rules</code>.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb4-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">config.status.nginx:</span> config.env.nginx</span>
<span id="cb4-2">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">cd</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">$(</span><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">BUILDDIR_nginx</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">)</span> <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">&amp;&amp;</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb4-3">    <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">CFLAGS</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">./configure</span> {...} <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--with-stream_ssl_preread_module</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb4-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--with-cc-opt</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"-I</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">$(</span><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">CURDIR</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">)</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">/debian/modules/boringssl/include </span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">$(</span><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">CFLAGS</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">)</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;"> -Wno-ignored-qualifiers"</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb4-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--with-ld-opt</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"-L</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">$(</span><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">CURDIR</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">)</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">/debian/modules/boringssl/build/ssl </span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">\</span></span>
<span id="cb4-6"><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">    -L</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">$(</span><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">CURDIR</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">)</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">/debian/modules/boringssl/build/crypto""</span></span></code></pre></div></div>
<p>The example above shows how to modify the “release” target, but if needed, the debug target can be modified in exactly the same way. Notice that <code>-Wno-ignored-qualifiers</code> has been added. That’s because BoringSSL triggers compilation warnings, which become errors in the NGINX build.</p>
</section>
<section id="adding-quic-support" class="level1">
<h1>Adding QUIC support</h1>
<p>The QUIC protocol, specified by RFC 9000, opens new possibilities, and it is something I would definitely like to try. Clone the newest version and overwrite whatever is currently provided by NGINX.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb5-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">hg</span> clone <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-b</span> quic https://hg.nginx.org/nginx-quic</span>
<span id="cb5-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rsync</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-r</span> nginx-quic/ nginx</span></code></pre></div></div>
<p>Again, the rules file needs to be modified to enable the support. Add <code>--with-http_v3_module --with-http_quic_module --with-stream_quic_module</code> to <code>config.env.nginx</code> and <code>config.env.nginx_debug</code> targets (somewhere after <code>--with-stream_ssl_preread_module</code>).</p>
</section>
<section id="package-creation" class="level1">
<h1>Package creation</h1>
<p>The following commands re-create the Debian package with NGINX, which can then be installed by <code>dpkg</code>. This is a two-step procedure. The first step is to modify the <code>nginx/debian/changelog</code> file and add information about the changes made to the package. Add something like:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb6-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">nginx</span> <span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">(</span><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">1.21.0-2~buster</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">)</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">buster</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">;</span> <span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">urgency</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>low</span>
<span id="cb6-2"></span>
<span id="cb6-3">  <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">*</span> 1.21.0-1 adds quic</span>
<span id="cb6-4"></span>
<span id="cb6-5"> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">--</span> Kris Kwiatkowski <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>kris@amongbytes.com<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span>  Tue, 12 Jun 2021 16:01:22 +0300</span></code></pre></div></div>
<p>The next step is to build the package. The build process will use a GPG key to sign the package. To specify the key I use <code>-kkris@amongbytes.com</code>, which identifies the secret key used for signing.</p>
<p>To start the build, use the following command:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb7-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sudo</span> dpkg-buildpackage <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-b</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-kkris@amongbytes.com</span></span></code></pre></div></div>
</section>
<section id="nginx-configuration" class="level1">
<h1>NGINX configuration</h1>
<p>The following instructions enable post-quantum key exchange in TLS and add support for the QUIC protocol (unfortunately, PQ in QUIC is not supported).</p>
<ol type="1">
<li><strong>Post-Quantum</strong> support: BoringSSL supports post-quantum key exchange. It can be enabled only in TLS v1.3 and uses a variant of NTRU-HRSS mixed with X25519, called <code>CECPQ2</code> (detailed description <a href="https://blog.cloudflare.com/the-tls-post-quantum-experiment/">here</a>). To enable that support, the following lines need to be added to <code>nginx.conf</code>:</li>
</ol>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb8-1">    <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">ssl_protocols</span> TLSv1.2 TLSv1.3<span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">;</span>      <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Enable both TLS 1.3 and 1.2</span></span>
<span id="cb8-2">    <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">ssl_ecdh_curve</span> CECPQ2:X25519:P-256<span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">;</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Enable PQ key exchange</span></span></code></pre></div></div>
<p>It is important to put CECPQ2 first in that list, and to also include a classical key exchange algorithm for backward compatibility. This server supports post-quantum key exchange.</p>
<ol start="2" type="1">
<li><strong>QUIC</strong> support: for HTTP/3 over QUIC, add the following changes to the virtual server config:</li>
</ol>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb9-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">server{</span></span>
<span id="cb9-2">    <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">listen</span> 443 http3 quic reuseport<span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">;</span></span>
<span id="cb9-3">    <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">listen</span> 443 ssl http2<span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">;</span></span>
<span id="cb9-4"></span>
<span id="cb9-5">    <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">quic_retry</span> on<span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">;</span></span>
<span id="cb9-6">    <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">ssl_early_data</span> on<span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">;</span></span>
<span id="cb9-7"></span>
<span id="cb9-8">    <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">http3_max_field_size</span> 5000<span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">;</span></span>
<span id="cb9-9">    <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">http3_max_table_capacity</span> 50<span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">;</span></span>
<span id="cb9-10">    <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">http3_max_blocked_streams</span> 30<span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">;</span></span>
<span id="cb9-11">    <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">http3_max_concurrent_pushes</span> 30<span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">;</span></span>
<span id="cb9-12">    <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">http3_push</span> 10<span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">;</span></span>
<span id="cb9-13">    <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">http3_push_preload</span> on<span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">;</span></span>
<span id="cb9-14"></span>
<span id="cb9-15">    <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">add_header</span> alt-svc <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'$quic=":443"; ma=3600'</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">;</span></span></code></pre></div></div>
<p>The <code>add_header alt-svc</code> directive is needed so that the web browser knows that your server supports HTTP/3. The other settings also need to be set so that NGINX QUIC does not return 404 errors for your file assets.</p>


</section>

 ]]></description>
  <guid>https://www.amongbytes.com/posts/20210612-nginx-setup.html</guid>
  <pubDate>Sat, 12 Jun 2021 00:00:00 GMT</pubDate>
  <media:content url="https://www.amongbytes.com/img/webserver.png" medium="image" type="image/png" height="68" width="144"/>
</item>
<item>
  <title>OpenVPN authentication hardened with ARM TrustZone</title>
  <dc:creator>Kris Kwiatkowski</dc:creator>
  <link>https://www.amongbytes.com/posts/20210112-optee-openssl-engine/20210112-optee-openssl-engine.html</link>
  <description><![CDATA[ 





<p>The goal is to connect an embedded device to a VPN network. The VPN authenticates clients with X.509 certificates, which means that the device needs to securely store a private key. The question is how to protect the key from being copied. Many ideas have already been explored; in this particular case, I’ll describe a solution which uses a secure enclave. The project itself is quite easy to implement and it can serve as a hands-on intro to ARM TrustZone-based TEEs.</p>
<p>Earlier last year, I <a href="https://www.amongbytes.com/post/201904-tee-sign-delegator/">needed</a> an implementation of a TLS server which stored private keys in a secure enclave, namely <a href="https://readthedocs.org/projects/optee/downloads/pdf/latest/">OP-TEE</a> running in the Trusted Execution Environment (TEE), protected by <a href="../../post/201805-trustzone_pres_cf/">ARM TrustZone</a>. A similar idea will be used here, with the software stack integration being the main difference. The previous project integrated the solution with BoringSSL, which required changing the internals of the library. The preferred solution would not touch the internals of the TLS library, but rather work as a plugin to the existing framework. OpenSSL implements the <a href="https://eprint.iacr.org/2018/354">ENGINE</a> API, which can be (and actually is) used as a way to implement cryptographic backends.</p>
<p>Finally, this is what I want to end up with:</p>
<div style="text-align:center;">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://www.amongbytes.com/posts/20210112-optee-openssl-engine/openvpn_tee.png" class="img-fluid figure-img" style="width:80.0%"></p>
<figcaption>Flow for OpenVPN with private client key in TEE</figcaption>
</figure>
</div>
</div>
<p>The private key will be stored in a secure enclave. OpenVPN calls OpenSSL for cryptographic operations and operations related to TLS. At the init phase, OpenSSL will load my implementation of the ENGINE API, which I call <a href="https://github.com/henrydcase/optee_eng">OpTEE ENGINE</a>. It implements a callback that the TLS stack calls for message signing. Finally, the engine implementation forwards the signing request to OP-TEE, which is where the private key operation happens.</p>
<section id="security" class="level3">
<h3 class="anchored" data-anchor-id="security">Security</h3>
<p>But first, why do I think secure storage provides enough security?</p>
<p>The TEE that I’m using claims compliance with the GlobalPlatform API. Looking at the GP requirements in this <a href="https://globalplatform.org/wp-content/uploads/2018/06/GPD_TEE_Internal_Core_API_Specification_v1.1.2.50_PublicReview.pdf">specification</a> (see 2.2.2), the basic requirements regarding secure storage are to:</p>
<ul>
<li>obviously, encrypt the data (provide confidentiality as well as integrity)</li>
<li>be bound to a device; this one is important. It means that sensitive data can be accessed only by those applications which are running on a particular device and in the particular TEE (there may be <a href="https://www.businesswire.com/news/home/20190224005093/en/Trustonic-and-Huawei-Disrupt-Application-Shielding-Market-with-Partnership-to-Introduce-First-Multi-TEE-Security-Platform-for-Mobile-App-Developers">multiple</a> TEEs on the same device).</li>
<li>have the ability to hide sensitive keys from the application running in the TEE</li>
<li>allow access to the data only by the TEE application which has created it (btw: TA=Trusted Application, an application running in the TEE).</li>
</ul>
<p>In my context, it means that the VPN private key is stored encrypted and can be used only by a single device. The secure storage can be copied to a different device, but as it is bound to a particular one, it can’t be decrypted there. The key can’t be accessed by a malicious TA installed on the same device thanks to access separation. Finally, the TA that owns the key doesn’t have access to the sensitive key material, so in case of a bug in the TA, the key doesn’t leak. It may leak in case of a bug in the TEE, but in that case, the whole system is probably already compromised.</p>
<!--
1. I should have this PKI:
CA -> intermediate CA stored on the device -> device generates client cert and uses INT CA to sign it.
2. Device deletes intermediate cert (after first boot).
-->
<p>The spec gives hope for a decent level of security. Looking at implementation details, the <strong>Key Manager</strong> is a component implemented in OP-TEE which ensures confidentiality and integrity of the data (see implementation details of <a href="https://optee.readthedocs.io/en/latest/architecture/secure_storage.html">secure storage</a>). To provide device binding it uses a <strong>Hardware Unique Key</strong> (HUK), which is defined as a symmetric secret key that is stored in a piece of hardware (often in the SoC itself) and is globally unique. OP-TEE uses it to derive a key called <strong>SSK</strong> (Secure Storage Key), which is then used to provide device binding. The <strong>SSK</strong> is created at boot time and kept in secure memory (it is never stored on disk):</p>
<div style="text-align:center;">
<div class="callout callout-style-simple callout-note no-icon">
<div class="callout-body d-flex">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-body-container">
<p>SSK = HMAC-SHA256(HUK, Chip ID || “some data as salt”)</p>
</div>
</div>
</div>
</div>
<p>The <strong>SSK</strong> is then used to derive the <strong>TSK</strong>, a key which is unique per TA installed in the TEE. This makes it possible to allow access to the data only for the TA which owns it. Finally, there is the <strong>FEK</strong>, a randomly generated key used for file encryption.</p>
<p>An <strong>important</strong> part of this whole story, though just an implementation detail: OP-TEE, as published on GitHub, doesn’t actually try to use a real HUK. Retrieval of the HUK is specific to the SoC and needs to be implemented during integration with the concrete device/platform. Namely, there is a function called <a href="https://github.com/OP-TEE/optee_os/blob/9c525fe4cdfe48d2394afe8440a56e7db36cc91f/core/arch/arm/kernel/otp_stubs.c#L20"><code>tee_otp_get_hw_unique_key</code></a>, which must be filled with proper code for HUK retrieval. Similarly, to provide secure storage, the “chip ID” also needs to be retrieved. This is done by <a href="https://github.com/OP-TEE/optee_os/blob/9c525fe4cdfe48d2394afe8440a56e7db36cc91f/core/arch/arm/kernel/otp_stubs.c#L26"><code>tee_otp_get_die_id</code></a>, which also needs to be filled with proper code. Currently, OP-TEE uses a string of zero bytes as the HUK.</p>
<p>Finally, the secure storage is kept in the normal world OS filesystem (<code>/data/tee</code> by default on Linux). This subsystem uses AES-128. My ultimate goal is to have a quantum-resistant TEE, and a 128-bit AES key is too small to resist quantum attacks: <a href="https://en.wikipedia.org/wiki/Grover%27s_algorithm">Grover’s</a> algorithm reduces the cost of a brute-force key search to roughly 2<sup>64</sup> operations. Hence, migration to a 256-bit symmetric key is needed.</p>
</section>
<section id="tls-client-authentication" class="level3">
<h3 class="anchored" data-anchor-id="tls-client-authentication">TLS client authentication</h3>
<p>X.509 certificates are used to authenticate a client to a VPN server. In this authentication method, a client sends a certificate and a proof of possession of the private key that corresponds to that certificate. In this case, the private key never leaves the TEE, hence the primary function of the application running in the TEE is to create that proof when requested.</p>
<p>Looking at the TLS level (TLSv1.3), client authentication starts with the server requesting it by sending a CertificateRequest message (<a href="https://tools.ietf.org/html/rfc8446#page-60">4.3.2. of RFC 8446</a>). In response, the client produces a proof by creating the following signature:</p>
<div style="text-align:center;">
<div class="callout callout-style-simple callout-note no-icon">
<div class="callout-body d-flex">
<div class="callout-icon-container">
<i class="callout-icon no-icon"></i>
</div>
<div class="callout-body-container">
<p>proof = sign(0x20 byte repeated 64 times || “TLS 1.3, client CertificateVerify” || 0x00 || transcript hash)</p>
</div>
</div>
</div>
</div>
<p>The client creates the signature with its private key, using a signature algorithm that matches the public key in its X.509 certificate. The signature is created over a concatenation of strings defined in the RFC (section <a href="https://tools.ietf.org/html/rfc8446#section-4.4.3">4.4.3</a>) and a TLS transcript hash (section <a href="https://tools.ietf.org/html/rfc8446#section-4.4.1">4.4.1</a>). Both the X.509 certificate and the proof are sent back to the server for verification.</p>
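<p>As a minimal sketch of how the content to be signed is laid out, the buffer can be assembled in C as follows. The helper name <code>build_cert_verify_input</code> and its parameters are hypothetical (they are not part of the OpTEE ENGINE code). The layout follows RFC 8446, section 4.4.3: 64 octets of 0x20, the context string, a single zero byte, and then the transcript hash.</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Context string for client authentication (RFC 8446, section 4.4.3). */
static const char CLIENT_CTX[] = "TLS 1.3, client CertificateVerify";

/* Number of 0x20 octets that prefix the signed content. */
#define PAD_LEN 64

/*
 * Hypothetical helper (illustration only): writes the content covered by
 * the CertificateVerify signature into `out`.
 * Returns the number of bytes written, or 0 if the arguments are invalid
 * or `out` is too small.
 */
size_t build_cert_verify_input(const uint8_t *transcript_hash, size_t hash_len,
                               uint8_t *out, size_t out_cap)
{
    /* Length of the context string without its trailing NUL. */
    const size_t ctx_len = sizeof(CLIENT_CTX) - 1;
    /* Padding + context string + one separator byte. */
    const size_t fixed_len = PAD_LEN + ctx_len + 1;
    size_t off = 0;

    if (transcript_hash == NULL || out == NULL)
        return 0;
    if (out_cap < fixed_len || hash_len > out_cap - fixed_len)
        return 0;

    memset(out, 0x20, PAD_LEN);                   /* 64 x 0x20       */
    off += PAD_LEN;
    memcpy(out + off, CLIENT_CTX, ctx_len);       /* context string  */
    off += ctx_len;
    out[off++] = 0x00;                            /* separator byte  */
    memcpy(out + off, transcript_hash, hash_len); /* transcript hash */
    off += hash_len;

    return off;
}
```

<p>With a SHA-256 transcript hash the result is 64 + 33 + 1 + 32 = 130 bytes long. In practice the TLS library assembles this input itself; the sketch only illustrates what the key in the TEE ends up signing.</p>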
</section>
<section id="secure-world" class="level3">
<h3 class="anchored" data-anchor-id="secure-world">Secure world</h3>
<p>The Trusted Application is mostly copied from the <a href="https://www.amongbytes.com/post/201904-tee-sign-delegator/">previous</a> project. In the current state, it is assumed that the key is loaded into the TEE at some initial point, and then it is used when Normal World requests signing. An alternative implementation could create a private key during the first boot and use it to create a CSR, which is then signed by the CA and returned to the device. It’s a more complicated process, but this way, one can ensure that the client’s private key never existed anywhere else but on the device.</p>
<p>The demo TA comes with a simple <a href="https://github.com/henrydcase/optee_eng/blob/master/src/optee_engine/keymgnt.c">key management app</a> which can be used to install or remove keys from the device. It is also a good place to see how communication from Normal World to Secure World is implemented. Assuming the TEE is running on the device, and tee-supplicant with the Linux driver is loaded in the Normal World (see here for <a href="https://optee.readthedocs.io/en/latest/building/gits/build.html">setup</a>), an application can use the GlobalPlatform API to send requests to and receive responses from the TEE. The code looks roughly like this:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" data-code-line-numbers="" style="background: #f1f3f5;"><pre class="sourceCode c code-with-copy"><code class="sourceCode c"><span id="cb1-1">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// TEE context</span></span>
<span id="cb1-2">    TEEC_Context ctx<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb1-3">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Session with the TA</span></span>
<span id="cb1-4">    TEEC_Session sess<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb1-5">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Operation context</span></span>
<span id="cb1-6">    TEEC_Operation op<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb1-7">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// ID of an app in the TEE</span></span>
<span id="cb1-8">    TEEC_UUID uuid <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> TA_UUID<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb1-9"></span>
<span id="cb1-10">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Initialize a context connecting us to the TEE</span></span>
<span id="cb1-11">    TEEC_InitializeContext<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>NULL<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span>ctx<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb1-12">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Open a session to the TA identified by uuid</span></span>
<span id="cb1-13">    TEEC_OpenSession<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(&amp;</span>ctx<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span>sess<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span>uuid<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb1-14">        TEEC_LOGIN_PUBLIC<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> NULL<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> NULL<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span>err_origin<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb1-15"></span>
<span id="cb1-16">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Initialize operation context 'op' (see github)</span></span>
<span id="cb1-17">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// ...</span></span>
<span id="cb1-18"></span>
<span id="cb1-19">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Send command to the TA running in TEE</span></span>
<span id="cb1-20">    TEEC_InvokeCommand<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(&amp;</span>sess<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> TA_INSTALL_KEYS<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span>op<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span>err_origin<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span></code></pre></div></div>
<p>After opening a session with the TEE on line <code>13</code>, the application sets up the <code>op</code> context by providing input arguments and setting buffers for the output. Then the call to <code>TEEC_InvokeCommand</code> triggers communication with the TEE. During this process, the TA’s signature is verified and the TA is started. The entry point to the TA is a function called <code>TA_InvokeCommandEntryPoint</code>.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb2" data-code-line-numbers="" style="background: #f1f3f5;"><pre class="sourceCode c code-with-copy"><code class="sourceCode c"><span id="cb2-1">TEE_Result TA_InvokeCommandEntryPoint<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">void</span> __maybe_unused <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>sess_ctx<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb2-2">            <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">uint32_t</span> cmd_id<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb2-3">            <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">uint32_t</span> param_types<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> TEE_Param params<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">])</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb2-4">    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">void</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)&amp;</span>sess_ctx<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/* Unused parameter */</span></span>
<span id="cb2-5">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">switch</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>cmd_id<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb2-6">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">case</span> TA_INSTALL_KEYS<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span></span>
<span id="cb2-7">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> install_key<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>param_types<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> params<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb2-8">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">case</span> TA_SIGN_ECC<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span></span>
<span id="cb2-9">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> sign_ecdsa<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>param_types<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> params<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb2-10">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">case</span> TA_GET_PUB_KEY<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span></span>
<span id="cb2-11">        <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> get_public_key<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>param_types<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> params<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb2-12">        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">...</span></span>
<span id="cb2-13">    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb2-14"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span></code></pre></div></div>
<p>By providing <code>cmd_id</code>, the caller instructs the TA to run specific logic, like key installation, signing or returning the public key (the reason for which is described in the next section). When installing the key, the TA copies the private and public key attributes to a temporary <code>transient_obj</code> and then creates a file on persistent storage containing those attributes. The key is identified by the <code>key_id</code> received from Normal World.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb3" data-source-line-numbers="12,14,16" data-code-line-numbers="12,14,16" style="background: #f1f3f5;"><pre class="sourceCode c code-with-copy"><code class="sourceCode c"><span id="cb3-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Puts the key to the storage</span></span>
<span id="cb3-2"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">static</span> TEE_Result install_key<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">uint32_t</span> param_types<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> TEE_Param params<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">])</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb3-3">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">//...</span></span>
<span id="cb3-4">    TEE_ObjectHandle transient_obj <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> TEE_HANDLE_NULL<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb3-5">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// ...</span></span>
<span id="cb3-6">    TEE_AllocateTransientObject<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>TEE_TYPE_ECDSA_KEYPAIR<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb3-7">            ecc<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-&gt;</span>x<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>sz <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span>transient_obj<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb3-8">    ATTR_REF<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>cnt<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> TEE_ATTR_ECC_PRIVATE_VALUE<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> ecc<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-&gt;</span>scalar<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb3-9">    ATTR_REF<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>cnt<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> TEE_ATTR_ECC_PUBLIC_VALUE_X<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> ecc<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-&gt;</span>x<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb3-10">    ATTR_REF<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>cnt<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> TEE_ATTR_ECC_PUBLIC_VALUE_Y<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> ecc<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-&gt;</span>y<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb3-11">    TEE_InitValueAttribute<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(&amp;</span>attrs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[</span>cnt<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">++],</span> TEE_ATTR_ECC_CURVE<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span>ecc<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-&gt;</span>curve_id<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb3-12">    TEE_PopulateTransientObject<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>transient_obj<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> attrs<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> cnt<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb3-13"></span>
<span id="cb3-14">    ret <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> TEE_CreatePersistentObject<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span></span>
<span id="cb3-15">        TEE_STORAGE_PRIVATE<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb3-16">        key_id<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">32</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb3-17">        TEE_DATA_FLAG_ACCESS_WRITE<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb3-18">        transient_obj<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb3-19">        NULL<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span>persistant_obj<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb3-20">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// ...</span></span>
<span id="cb3-21"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span></code></pre></div></div>
<p>When signing, the TA initializes <code>key_handle</code>, the handle to the key, by calling <code>TEE_OpenPersistentObject</code> with the <code>key_id</code>. Then, <code>key_handle</code> is used when setting up an operation identified by <code>op</code> (line 13) and finally used for signing (line 14). Note that the private key material stays in the TEE; it is never revealed to the TA.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb4" data-source-line-numbers="6,13-14" data-code-line-numbers="6,13-14" style="background: #f1f3f5;"><pre class="sourceCode c code-with-copy"><code class="sourceCode c"><span id="cb4-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Performs ECDSA signing with a key from secure storage</span></span>
<span id="cb4-2"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">static</span> TEE_Result sign_ecdsa<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">uint32_t</span> param_types<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> TEE_Param params<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">])</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb4-3">TEE_OperationHandle op <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> TEE_HANDLE_NULL<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb4-4">TEE_ObjectHandle key_handle<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb4-5"></span>
<span id="cb4-6">TEE_OpenPersistentObject<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span></span>
<span id="cb4-7">    TEE_STORAGE_PRIVATE<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb4-8">    key_id<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">32</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb4-9">    TEE_DATA_FLAG_ACCESS_READ<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span>key_handle<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb4-10"></span>
<span id="cb4-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// perform ECDSA signing</span></span>
<span id="cb4-12">TEE_AllocateOperation<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(&amp;</span>op<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> TEE_ALG_ECDSA_P256<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> TEE_MODE_SIGN<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">256</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb4-13">TEE_SetOperationKey<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>op<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> key_handle<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb4-14">TEE_AsymmetricSignDigest<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>op<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> NULL<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb4-15">    params<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">].</span>memref<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>buffer<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> params<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">].</span>memref<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb4-16">    params<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">].</span>memref<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>buffer<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span>params<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">[</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">].</span>memref<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>size<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb4-17">LOG_RET<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>ret<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb4-18"></span>
<span id="cb4-19"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span></code></pre></div></div>
<p>The demo code (<a href="https://github.com/henrydcase/optee_eng/blob/master/src/ta/delegator_tz.c">here</a>) supports ECDSA/p256 only but can be easily extended to provide support for all the schemes used by TLS v1.3.</p>
</section>
<section id="openssl-engine-for-op-tee" class="level3">
<h3 class="anchored" data-anchor-id="openssl-engine-for-op-tee">OpenSSL engine for OP-TEE</h3>
<p>One of the goals of this project was to ease integration with the TLS layer. It should be possible to provide the whole functionality as a plugin loaded into any modern version of OpenSSL, without code modifications. OpenSSL can be extended by implementing the so-called ENGINE API. A dynamically loadable library may implement some cryptographic operations (like signing, verification, key generation) and register them by calling the ENGINE API. When processing a cryptographic operation, OpenSSL uses the custom implementation if one is provided. The general architecture and a guide to building OpenSSL engines can be found in an excellent paper called <a href="https://eprint.iacr.org/2018/354">Start your ENGINEs: dynamically loadable contemporary crypto</a>.</p>
<p>For the OP-TEE engine, the code structure looks roughly like this:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb5" data-source-line-numbers="8,18" data-code-line-numbers="8,18" style="background: #f1f3f5;"><pre class="sourceCode c code-with-copy"><code class="sourceCode c"><span id="cb5-1"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">static</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">int</span> OPTEE_ENG_bind<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>ENGINE <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>e<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">const</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">char</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>id<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb5-2">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// ... some initialization code ...</span></span>
<span id="cb5-3"></span>
<span id="cb5-4">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Set name and ID of an engine</span></span>
<span id="cb5-5">    ENGINE_set_id<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>e<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> OPTEE_ENG_ENGINE_ID<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb5-6">    ENGINE_set_name<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>e<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> OPTEE_ENG_ENGINE_NAME<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb5-7">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Call OPTEE_ENG_load_private_key to load the private key</span></span>
<span id="cb5-8">    ENGINE_set_load_privkey_function<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>e<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> OPTEE_ENG_load_private_key<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb5-9">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Register callback for signing</span></span>
<span id="cb5-10">    ENGINE_set_pkey_meths<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>e<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> OPTEE_ENG_pkey_meths<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb5-11"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb5-12"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">static</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">int</span> OPTEE_ENG_pkey_meths<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>ENGINE <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>e<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> EVP_PKEY_METHOD <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">**</span>pmeth<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span></span>
<span id="cb5-13">    <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">const</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">int</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">**</span>nids<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">int</span> nid<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb5-14">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Use EVP_PKEY_meth_copy to copy all the callbacks to new_meth</span></span>
<span id="cb5-15">    EVP_PKEY_METHOD <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>new_meth <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> EVP_PKEY_meth_new<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>EVP_PKEY_EC<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb5-16">    EVP_PKEY_meth_copy<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>new_meth<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> EVP_PKEY_meth_find<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>EVP_PKEY_EC<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">));</span></span>
<span id="cb5-17">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Set new callback for signing</span></span>
<span id="cb5-18">    EVP_PKEY_meth_set_sign<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>new_meth<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">,</span> OPTEE_ENG_evp_cb_sign<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb5-19">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Return new EVP_PKEY_METHOD structure</span></span>
<span id="cb5-20">    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>pmeth <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> new_meth<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb5-21">    <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb5-22"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span>
<span id="cb5-23"></span>
<span id="cb5-24"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">// Tell the OpenSSL to call OPTEE_ENG_bind when plugin is loaded</span></span>
<span id="cb5-25">IMPLEMENT_DYNAMIC_BIND_FN<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span>OPTEE_ENG_bind<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">)</span></span>
<span id="cb5-26">IMPLEMENT_DYNAMIC_CHECK_FN<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">()</span></span></code></pre></div></div>
<p>The OP-TEE engine extends OpenSSL with the following 2 custom implementations. The <code>OPTEE_ENG_load_private_key</code> extends the functionality of the <code>ENGINE_load_private_key</code> function. The latter is an ENGINE API function <a href="https://github.com/OpenVPN/openvpn/blob/fdfbd4441c2225dc69431c57d18291e103c466cf/src/openvpn/crypto_openssl.c#L1104">used</a> by OpenVPN to load private keys. The custom <a href="https://github.com/henrydcase/optee_eng/blob/382b2b032fdda0ad2ddacb9995d618f0f3c17200/src/optee_engine/back.c#L233">implementation</a>, provided by <code>optee_eng</code>, checks whether a key with the given ID exists in the TEE. It returns an initialized <code>EVP_PKEY</code> object, which OpenSSL uses for message signing during TLS session establishment. Contrary to the standard implementation, the <code>EVP_PKEY</code> object returned by <code>optee_eng</code> doesn’t store the private key material; instead, it keeps an ID corresponding to the private key.</p>
<p>The second functionality is implemented by <code>OPTEE_ENG_evp_cb_sign</code>. This function gets invoked when signing is requested for a key returned by <code>OPTEE_ENG_load_private_key</code>. The <code>EVP_PKEY_METHOD</code> structure associated with the key type contains a list of function pointers implementing signing, verification, key generation, etc. This callback is assigned to the pointer for message signing. The implementation of this function <a href="https://github.com/henrydcase/optee_eng/blob/382b2b032fdda0ad2ddacb9995d618f0f3c17200/src/optee_engine/back.c#L340">calls</a> the TA in the TEE with the ID of a key and a message to sign. Control is then transferred to the <code>sign_ecdsa</code> function <a href="https://github.com/henrydcase/optee_eng/blob/382b2b032fdda0ad2ddacb9995d618f0f3c17200/src/ta/delegator_tz.c#L272">implemented</a> by the TA, which initializes a handle to the key and calls the TEE OS to perform ECDSA/p256 signing.</p>
<p>The <code>IMPLEMENT_DYNAMIC_BIND_FN</code> macro binds everything together. It defines the entry point of the engine: the first function that gets executed when the library is loaded into OpenSSL (<code>OPTEE_ENG_bind</code> in this case). The function sets the identifier and name of the engine and uses the ENGINE API to assign the callbacks (lines 8 and 18 in the code listing above).</p>
<p><strong>Side note</strong>: For the private key, OpenSSL v1.1.1 requires that the <code>EVP_PKEY</code> structure contains the public part of the key; otherwise loading of the certificate fails and the TLS client won’t be able to initialize the connection. In this program, the public part is also stored in the TEE.</p>
<p>OK, so the dynamic engine provides the implementation, but OpenSSL needs to know how to load such a library. The following configuration can be added to OpenSSL’s config file (<code>/etc/ssl/openssl.cnf</code> on my Linux), so that the framework knows where to find the dynamic library when an engine load is requested by ID, <code>optee</code> in this case.</p>
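<p>The copy-then-override pattern used in <code>OPTEE_ENG_pkey_meths</code> can be sketched without OpenSSL. What follows is only a toy model: <code>toy_method</code>, <code>sw_sign</code>, <code>sw_verify</code>, <code>tee_sign</code> and <code>make_engine_method</code> are made-up names for illustration and are not part of OpenSSL or <code>optee_eng</code>.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode c"><code class="sourceCode c">/* Toy method table; a stand-in for illustration, NOT OpenSSL's EVP_PKEY_METHOD. */
typedef struct {
    const char *(*sign)(const char *key_id);
    const char *(*verify)(const char *key_id);
} toy_method;

/* Default (software) callbacks; they only report which path was taken. */
const char *sw_sign(const char *key_id)   { (void)key_id; return "software sign"; }
const char *sw_verify(const char *key_id) { (void)key_id; return "software verify"; }

/* Custom callback: a real engine would pass key_id and the message to the TA here. */
const char *tee_sign(const char *key_id)  { (void)key_id; return "delegated to TEE"; }

const toy_method default_method = { sw_sign, sw_verify };

/* Copy every default callback, then replace only the signing slot. */
toy_method make_engine_method(void) {
    toy_method m = default_method;  /* plays the role of EVP_PKEY_meth_copy */
    m.sign = tee_sign;              /* plays the role of EVP_PKEY_meth_set_sign */
    return m;
}</code></pre></div>
<p>In this model only signing is redirected; verification keeps using the default callback, which mirrors the intent of the listing, where just the signing callback is replaced.</p>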
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb6" data-code-line-numbers="" style="background: #f1f3f5;"><pre class="sourceCode toml code-with-copy"><code class="sourceCode toml"><span id="cb6-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Additional content of openssl.cnf</span></span>
<span id="cb6-2"></span>
<span id="cb6-3"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">[default_conf]</span></span>
<span id="cb6-4"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">engines</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">engine_section</span></span>
<span id="cb6-5"></span>
<span id="cb6-6"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">[engine_section]</span></span>
<span id="cb6-7"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">optee</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">optee_section</span></span>
<span id="cb6-8"></span>
<span id="cb6-9"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">[optee_section]</span></span>
<span id="cb6-10"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">engine_id</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">optee</span></span>
<span id="cb6-11"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">dynamic_path</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"/opt/liboptee_eng.so"</span></span>
<span id="cb6-12"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">init</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span></code></pre></div></div>
<p>Let’s check whether it works. On QEMU emulating an ARMv8 machine, I now get:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb7" data-code-line-numbers="" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb7-1"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">qemu</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> openssl</span>
<span id="cb7-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">OpenSSL</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> engine <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-c</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-v</span> optee</span>
<span id="cb7-3"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">(</span><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">optee</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">)</span> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">OpTEE</span> OpenSSL ENGINE.</span>
<span id="cb7-4"> <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">[id-ecPublicKey]</span></span></code></pre></div></div>
<p>It seems the engine can be loaded correctly. Now, when OpenSSL tries to sign a message, it needs to call into the TEE (an SMC call that switches the CPU into the secure world), get the key from secure storage, and return the signature. Also, the crypto operation is no longer done by OpenSSL but by the crypto library provided by the OP-TEE OS (in this case, a fork of LibTomCrypt). All in all, that dance has a cost. Measuring the difference gives some idea of that cost and is also a good way to check that the whole flow works correctly. That is done by <a href="https://github.com/henrydcase/optee_eng/blob/master/src/optee_engine/speed.cc"><code>speed.cc</code></a>, located in the project’s repository. The benchmark runs 2 functions: <code>SignREE</code> performs signing fully in the Normal World by calling the pure OpenSSL implementation, and <code>SignTEE</code> uses <code>optee_eng</code> for signing. I got the following results when running it on <a href="https://www.96boards.org/documentation/consumer/hikey/hikey960/getting-started/">HiKey960</a> (ARM Cortex-A73).</p>
<div id="c3ed6b48" class="cell" data-fig-format="png" data-execution_count="1">
<div class="cell-output cell-output-display">
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://www.amongbytes.com/posts/20210112-optee-openssl-engine/20210112-optee-openssl-engine_files/figure-html/cell-2-output-1.png" class="quarto-figure quarto-figure-center figure-img" width="591" height="370"></p>
</figure>
</div>
</div>
</div>
<p>The operation works correctly: the benchmarking code loads the <code>optee_eng</code> ENGINE into vanilla OpenSSL and uses only the <code>EVP</code> API. Nevertheless, the slowdown is significant. At this point I should say that I haven’t investigated further, so I’m not sure exactly where the slowdown comes from. I ran a release version of the software and used settings for the board similar to those described <a href="https://www.amongbytes.com/post/201804-comparing-mbedtls-to-boringssl/#preparation-step">here</a>. I’m pretty sure the optimization level in OpenSSL is much better than in LibTomCrypt, so there is probably a lot of room for improvement.</p>
<p><strong>Side note</strong>: the benchmark expects to find a key named <code>bench_key</code> in the TEE. It must be inserted using the key management app: <code>optee_keymgnt put bench_key &lt;PEM_file_with_a_key&gt;</code>.</p>
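<p>For readers who want to reproduce this kind of comparison, here is a minimal sketch of a timing harness in plain C. It is not the code from <code>speed.cc</code>; the names <code>bench_fn</code>, <code>now_ns</code>, <code>bench_mean_ns</code> and <code>count_calls</code> are hypothetical, and the workload is a placeholder standing in for a wrapper around either signing path.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode c"><code class="sourceCode c">#define _POSIX_C_SOURCE 200809L  /* for clock_gettime */
#include &lt;stdint.h&gt;
#include &lt;time.h&gt;

/* Operation under test; returns 0 on success. */
typedef int (*bench_fn)(void *ctx);

/* Monotonic timestamp in nanoseconds. */
uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &amp;ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Calls fn `iters` times and stores the mean time per call in *mean_ns.
 * Returns 0 on success, -1 if iters is 0 or any call fails. */
int bench_mean_ns(bench_fn fn, void *ctx, unsigned iters, double *mean_ns) {
    if (iters == 0)
        return -1;
    uint64_t start = now_ns();
    for (unsigned i = 0; i &lt; iters; i++) {
        if (fn(ctx) != 0)
            return -1;
    }
    *mean_ns = (double)(now_ns() - start) / iters;
    return 0;
}

/* Placeholder workload that only counts its invocations. A real run would
 * pass one wrapper around the software signing path and one around the
 * engine-backed path. */
int count_calls(void *ctx) {
    unsigned *calls = ctx;
    (*calls)++;
    return 0;
}</code></pre></div>
<p>Calling <code>bench_mean_ns</code> once per signing path with the same iteration count gives two means whose ratio approximates the overhead of the round trip to the TEE.</p>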
</section>
<section id="plugging-to-the-openvpn" class="level3">
<h3 class="anchored" data-anchor-id="plugging-to-the-openvpn">Plugging to the OpenVPN</h3>
<p>At this point, integration with OpenVPN is very easy. The only requirement is version 2.5 of the software (which includes <a href="https://patchwork.openvpn.net/patch/1126/">this</a> change). That change adds the ability to use an OpenSSL ENGINE to load a private key, which is what’s needed here.</p>
<p>A trick is needed to configure OpenVPN correctly. The configuration file has a <code>key</code> parameter, which specifies the name of the file with the private key corresponding to the certificate provided by the <code>cert</code> parameter. In the case of <code>optee_eng</code>, this is the name of the key stored in the TEE (the name is passed to <code>ENGINE_load_private_key</code> as the <code>key_id</code> argument). Additionally, a file with the same name must exist in the OpenVPN configuration directory. OpenVPN tries to use the engine to load a key only if loading from the file fails, so the file needs to be empty to make sure that loading fails. The configuration also needs to specify the <code>engine</code> parameter to instruct OpenSSL to use <code>optee_eng</code>. The whole configuration file as used on the client can be found <a href="https://github.com/henrydcase/optee_eng/blob/master/cfg/openvpn_cli.conf">here</a>.</p>
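<p>As a rough illustration of these settings, a client configuration fragment might look like the following. The file and key names (<code>client.crt</code>, <code>client_key</code>) are placeholders, and the engine ID <code>optee</code> is assumed to match the <code>engine_id</code> from the OpenSSL configuration; the linked <code>openvpn_cli.conf</code> remains the authoritative version.</p>
<div class="sourceCode" style="background: #f1f3f5;"><pre class="sourceCode"><code class="sourceCode"># Hypothetical fragment of an OpenVPN client configuration
cert client.crt

# Name of the key stored in the TEE. An empty file with the same
# name must also exist in the configuration directory.
key client_key

# ID of the OpenSSL engine used to load the private key
engine optee</code></pre></div>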
</section>
<section id="setting-up-op-tee-image-building-running" class="level3">
<h3 class="anchored" data-anchor-id="setting-up-op-tee-image-building-running">Setting-up OP-TEE image, building &amp; running</h3>
<p>The code of the solution is available on <a href="https://github.com/henrydcase/optee_eng">GitHub</a>. It was tested against OP-TEE 3.11, running in QEMU and on the HiKey960 development board. To build and play with the solution, one first needs to build OP-TEE itself (instructions <a href="https://optee.readthedocs.io/en/latest/building/gits/build.html">here</a>).</p>
<p>To compile the solution:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb8" data-code-line-numbers="" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb8-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">git</span> clone https://github.com/henrydcase/optee_eng</span>
<span id="cb8-2"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">cd</span> optee_eng</span>
<span id="cb8-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">git</span> submodule init <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">&amp;&amp;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">git</span> submodule update</span>
<span id="cb8-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mkdir</span> build <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">&amp;&amp;</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">cd</span> build</span>
<span id="cb8-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cmake</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-DOPTEE_BUILD_DIR</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=&lt;</span>OPTEE location<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> -DPLATFORM=qemu ..</span>
<span id="cb8-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">make</span></span>
<span id="cb8-7"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">make</span> install</span></code></pre></div></div>
<p>The <code>&lt;OPTEE location&gt;</code> is the root directory of OP-TEE. The <code>-DPLATFORM</code> option specifies the platform for which the solution should be built. I’ll use QEMU in this example. The <code>make install</code> command copies all needed files to OP-TEE’s build directory.</p>
<p>OP-TEE uses <code>buildroot</code> to create the Normal World OS, where examples can be run. By default, OpenVPN is not enabled. It can be enabled by applying 2 patches from the <code>optee_eng</code> repo:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb9" data-code-line-numbers="" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb9-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">cd</span> optee_eng</span>
<span id="cb9-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">patch</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-p1</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-d</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>OPTEE location<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span>/buildroot <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> optee-patches/0001-openvpn-2.4.9-to-2.5.0.patch</span>
<span id="cb9-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">patch</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-p1</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">-d</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>OPTEE location<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span>/build <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> 0002_build_enable_openvpn.patch</span>
<span id="cb9-4"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">cd</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span>OPTEE location<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span>/build</span>
<span id="cb9-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">make</span> run</span></code></pre></div></div>
<p>To connect to the VPN, we need a server. The repository contains configuration files for both server and client, as well as a set of X.509 certificates (the <code>create_cert.sh</code> script can be used to regenerate them). The commands below configure and start an OpenVPN server on the host machine.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb10" data-code-line-numbers="" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb10-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> cd <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">optee_eng</span></span>
<span id="cb10-2"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> sudo <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">openvpn</span> <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--cd</span> cfg <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--config</span> openvpn_srv.conf</span>
<span id="cb10-3"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">2021-02-06</span> 23:39:53 OpenVPN 2.5.0 <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">[</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">git:makepkg/a73072d8f780e888+</span><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">]</span> x86_64-pc-linux-gnu [SSL <span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">(</span><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">OpenSSL</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">)</span><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">]</span> <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">[</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">LZO</span><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">]</span> <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">[</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">LZ4</span><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">]</span> <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">[</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">EPOLL</span><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">]</span> <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">[</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">PKCS11</span><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">]</span> <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">[</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">MH/PKTINFO</span><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">]</span> <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">[</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">AEAD</span><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">]</span> built on Nov  6 2020</span>
<span id="cb10-4"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">2021-02-06</span> 23:39:53 library versions: OpenSSL 1.1.1h  22 Sep 2020, LZO 2.10</span>
<span id="cb10-5"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">2021-02-06</span> 23:39:53 TUN/TAP device tun0 opened</span>
<span id="cb10-6"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">2021-02-06</span> 23:39:53 net_iface_mtu_set: mtu 1500 for tun0</span>
<span id="cb10-7"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">2021-02-06</span> 23:39:53 net_iface_up: set tun0 up</span>
<span id="cb10-8"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">2021-02-06</span> 23:39:53 net_addr_v4_add: 172.16.0.1/16 dev tun0</span>
<span id="cb10-9"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">2021-02-06</span> 23:39:53 UDPv4 link local <span class="er" style="color: #AD0000;
background-color: null;
font-style: inherit;">(</span><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">bound</span><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">)</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">:</span> <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">[</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">AF_INET</span><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">][</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">undef</span><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">]</span>:1194</span>
<span id="cb10-10"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">2021-02-06</span> 23:39:53 UDPv4 link remote: <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">[</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">AF_UNSPEC</span><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">]</span></span>
<span id="cb10-11"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">2021-02-06</span> 23:39:53 Initialization Sequence Completed</span></code></pre></div></div>
<p>Once QEMU has started and the user is logged in as <code>root</code> in the Normal World (NWd) terminal, <code>tee-supplicant -d</code> needs to be started. The supplicant makes it possible to communicate from Normal World to Secure World. The next step is to insert the client key into the TEE and start the VPN.</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb11" data-code-line-numbers="" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb11-1"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> optee_keymgnt <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">put</span> vpn.testlab.com /etc/openvpn/certs/client.key</span>
<span id="cb11-2"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> rm <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">/etc/openvpn/certs/client.key</span></span>
<span id="cb11-3"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> openvpn <span class="ex" style="color: null;
background-color: null;
font-style: inherit;">--cd</span> /etc/openvpn/ <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">--config</span> client.conf</span>
<span id="cb11-4"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">2021-02-09</span> 00:27:27 Initializing OpenSSL support for engine <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'optee'</span></span>
<span id="cb11-5"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">2021-02-09</span> 00:27:27 OpenSSL: error:0909006C:PEM routines:get_name:no start line</span>
<span id="cb11-6"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">2021-02-09</span> 00:27:27 PEM_read_bio failed, now trying engine method to load private key</span>
<span id="cb11-7"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">2021-02-09</span> 00:27:27 TCP/UDP: Preserving recently used remote address: <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">[</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">AF_INET</span><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">]</span>172.16.0.1:1194</span>
<span id="cb11-8"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">...</span></span>
<span id="cb11-9"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">2021-02-09</span> 00:27:28 <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">[</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">vpn.testlab.com</span><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">]</span> Peer Connection Initiated with <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">[</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">AF_INET</span><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">]</span>172.16.0.1:1194</span>
<span id="cb11-10"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">...</span></span></code></pre></div></div>
<p>The second terminal displays logs from TEE OS running in parallel to Linux. One should see the following traces there:</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb12" data-code-line-numbers="" style="background: #f1f3f5;"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb12-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># When inserting a key to the TEE</span></span>
<span id="cb12-2"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">I/TA:</span> New key <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">[</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">F671A1B757</span><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">]</span> registered</span>
<span id="cb12-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># During TLS handshake</span></span>
<span id="cb12-4"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">I/TA:</span> Sign for a key ID <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">[</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">F671A1B757</span><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">]</span> requested</span>
<span id="cb12-5"><span class="ex" style="color: null;
background-color: null;
font-style: inherit;">I/TA:</span> Message signed with key ID <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">[</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">F671A1B757</span><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">]</span></span></code></pre></div></div>
<p>At this point the VPN tunnel should be correctly created.</p>
</section>
<section id="conclusion" class="level3">
<h3 class="anchored" data-anchor-id="conclusion">Conclusion</h3>
<p>Hopefully, this example shows how to use ARM TrustZone from OpenSSL to secure private keys for OpenVPN. Ideas similar to those implemented by <code>optee_eng</code> can be applied to any software using OpenSSL: the same engine can be used by Nginx, ssh-agent or strongSwan, on both the client and the server side. The solution is fully “pluggable” and doesn’t require any modification to existing software. It’s worth noting that isolating private keys from internet-facing applications in this way may help to avoid security incidents. For example, using <code>optee_eng</code> would have been enough to keep the private key safe from Heartbleed, as the key is not stored in the process running the OpenSSL library.</p>
<p>As an improvement to this idea, one could think of using the PKCS#11 standard for communication with the TEE. It wasn’t done here for two reasons. First, PKCS#11 would require a TA implementing the standard, which is not finished yet (but <a href="https://github.com/OP-TEE/optee_os/issues/4283">ongoing</a>). Second, my ultimate goal (which wasn’t presented here) is to use post-quantum cryptography, and those new schemes are not yet properly incorporated into the PKCS#11 standard.</p>
<p>Finally, the upcoming OpenSSL 3.0 deprecates the ENGINE API in favour of a new concept called <code>providers</code>. Hence, an implementation of <code>optee_eng</code> for the upcoming version of OpenSSL will probably look slightly different. On the one hand, OpenVPN doesn’t support this new version yet; on the other hand, 3.0 doesn’t yet seem to provide similar functionality for loading private keys.</p>


</section>

 ]]></description>
  <guid>https://www.amongbytes.com/posts/20210112-optee-openssl-engine/20210112-optee-openssl-engine.html</guid>
  <pubDate>Tue, 12 Jan 2021 00:00:00 GMT</pubDate>
</item>
<item>
  <title>On using Trusted Execution Environment for TLS session signing</title>
  <link>https://www.amongbytes.com/posts/201904-TEE-sign-delegator.html</link>
  <description><![CDATA[ 





<section id="problem-description" class="level3">
<h3 class="anchored" data-anchor-id="problem-description">Problem description</h3>
<p>Typically, a TLS server uses a certificate and an associated private key to sign a TLS session. From now on I’ll call this private key the “traffic-private-key”. The public key in the certificate and the traffic-private-key form an asymmetric cryptographic key pair. Revealing the traffic-private-key makes man-in-the-middle attacks possible. Typically, the traffic-private-key is stored on the server’s hard disk. Even if it is stored in encrypted form, at some point the HTTPS server needs to decrypt it in order to use it for signing. It means that at runtime the plaintext key is available in the memory of the HTTPS process. At this point an attacker with access to the machine may be able to dump the memory of the process and learn the traffic-private-key.</p>
<p>Hence, server operators need to take special care in order to make sure traffic-private-keys are not revealed.</p>
<p>The situation gets more complicated when the server operator and the domain owner are two different entities. For example, in the case of a CDN, TLS offloading happens on the edge system, which is often a completely different machine than the actual application server. The servers (physical machines) of a CDN provider are also often spread over the world and located in remote data centers, which may be owned by multiple different entities.</p>
<p>In such situations, ensuring that the traffic-private-key is not copied and used by an attacker may be challenging and far from obvious. Clients of the CDN may also be concerned about the idea of spreading the traffic-private-key over the world.</p>
</section>
<section id="solution-proposed" class="level3">
<h3 class="anchored" data-anchor-id="solution-proposed">Solution proposed</h3>
<p>For brevity I’m assuming the server uses only TLS 1.3 as specified in [RFC8446], but the solution can be adapted to any version of TLS.</p>
<p>The idea is to perform TLS session signing inside a Trusted Execution Environment. The traffic-private-key is accessible only to the TEE. Additionally, the solution ensures that the key is stored in encrypted form in trusted storage. The storage is bound to the physical machine, hence a copy of the storage can’t be used on a different machine.</p>
<p>The solution as implemented in the PoC and described below is based on ARM TrustZone and uses an open-source TEE called OP-TEE (see [OP-TEE]); its sources are hosted on GitHub (see [OP-TEE-SRC]). The choice of OP-TEE was driven by the fact that the author is quite familiar with this environment; nevertheless, the solution can be implemented with other TEEs that provide device-bound trusted storage. The author is convinced that Intel SGX with Asylo would be a better choice here. The solution also makes heavy use of BoringSSL for handling TLS traffic.</p>
<p>The points below describe the implementation in more detail:</p>
<ol type="1">
<li><p>Key provisioning server</p>
<p>It is assumed that the machine is initially provisioned with software that acts as a server for traffic-private-key provisioning.</p>
<p>In order to install a traffic-private-key on a machine, the operator connects to the key provisioning server and sends the traffic-private-key to be installed. This operation is done over a TLS connection that uses client authentication, so the operator must possess some form of provisioning key. The key provisioning server must be able to verify the provisioning key, hence a verification-provisioning-key is also preinstalled.</p>
<p>After successful TLS authentication, the operator sends a pair consisting of the traffic-private-key and the domain name for which the key must be used. This pair is installed in secure storage, which is accessible from the TEE only. The TEE ensures that the traffic-private-key can’t be read from outside of the TEE.</p></li>
<li><p>TLS session signing</p>
<p>The solution uses BoringSSL to offload TLS traffic. The BoringSSL API makes it possible to register a function that is called during the TLS handshake when the server needs to sign a session with the traffic-private-key.</p>
<p>It means that no modifications to BoringSSL are needed in order to use it for signing a TLS session with a traffic-private-key stored in the TEE.</p>
<p>The code which registers signing operation looks like this:</p>
<pre><code>/* Pseudocode: argument types are omitted for brevity. */
void signing_operation(message_to_sign, domain_name, *signature) {
    ... calls TEE for signing ...
}

SSL_PRIVATE_KEY_METHOD private_key_methods = {
    .sign = signing_operation,
    .decrypt = ...,
    .complete = ...,
};
SSL_CTX_set_private_key_method(ssl_ctx, &amp;private_key_methods);</code></pre>
<p>The TLS server calls the <code>signing_operation</code> function when a TLS session needs to be signed. This function passes <code>message_to_sign</code> and <code>domain_name</code> to the TEE. Inside the TEE, <code>domain_name</code> is used as an index to retrieve the right traffic-private-key (the server can handle many domains). The TEE performs the signing and the <code>signature</code> is returned to BoringSSL, which continues the TLS handshake as normal.</p></li>
<li><p>Key can’t be used on another machine.</p>
<p>Trusted storage in OP-TEE is bound to the physical device. It means that even if the storage is copied to another device, it won’t be possible to decrypt the stored data.</p>
<p>In more detail, OP-TEE implements the GlobalPlatform Trusted Storage API. Device binding is one of the requirements for trusted storage. To make this possible, each device needs to come with a preinstalled Hardware Unique Key (HUK).</p>
<p>More details about trusted storage can be found in the OP-TEE documentation (see [OP-TEE-STORAGE]).</p>
<p>It must be mentioned that SoC-specific customization is needed in order to use trusted storage (see the comment in orange at the bottom of [OP-TEE-STORAGE]).</p></li>
</ol>
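<p>The flow described in point 2 can be modelled with a small, self-contained C sketch. It is not the PoC code: <code>key_store</code>, <code>find_key_slot</code> and <code>tee_sign_request</code> are hypothetical names, key material is replaced by a slot number, and neither a TEE call nor any cryptography is performed. Under those assumptions, it only illustrates how <code>domain_name</code> could serve as an index for selecting a key, and why a request for a domain without an installed key fails.</p>
<pre><code>#include &lt;stddef.h&gt;
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;

#define MAX_KEYS 4

/* Hypothetical key-store entry. A real TA would keep key material in
 * trusted storage; here only a domain name and a slot number are modelled. */
struct key_entry {
    const char *domain_name;
    int slot;
};

/* Unused entries are zero-initialized, so their domain_name is NULL. */
static const struct key_entry key_store[MAX_KEYS] = {
    { "www.test.com", 0 },
    { "www.example.org", 1 },
};

/* Returns the slot holding the key for domain_name, or -1 if none is installed. */
static int find_key_slot(const char *domain_name) {
    if (domain_name == NULL) {
        return -1;
    }
    for (size_t i = 0; i &lt; MAX_KEYS; i++) {
        if (key_store[i].domain_name != NULL &amp;&amp;
            strcmp(key_store[i].domain_name, domain_name) == 0) {
            return key_store[i].slot;
        }
    }
    return -1;
}

/* Stand-in for the TEE side of a signing request: selects the key and
 * reports success (0) or failure (-1). No signature is computed here. */
static int tee_sign_request(const char *domain_name, int *slot_used) {
    int slot = find_key_slot(domain_name);
    if (slot &lt; 0) {
        return -1;
    }
    *slot_used = slot;
    return 0;
}

int main(void) {
    int slot = -1;
    if (tee_sign_request("www.test.com", &amp;slot) == 0) {
        printf("signing with key in slot %d\n", slot);
    }
    if (tee_sign_request("www.unknown.com", &amp;slot) != 0) {
        printf("no key installed, handshake fails\n");
    }
    return 0;
}</code></pre>
<p>In the solution described above the lookup happens inside the TEE, so the Normal World only ever sees a signature or an error, never the key itself.</p>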
<!-- 4. RPMB:  TODO -->
</section>
<section id="poc-implementation" class="level2">
<h2 class="anchored" data-anchor-id="poc-implementation">PoC implementation</h2>
<p>As mentioned before, implementation uses OP-TEE as a base for TEE. Example was tested with OP-TEE running inside QEMU emulating ARMv8.</p>
<p>PoC is composed of:</p>
<ul>
<li><p><code>admin_cli</code>: Client used for installing private keys inside the TEE. This component is used instead of the <code>key provisioning server</code>, as such a server was not implemented in the PoC.</p></li>
<li><p><code>server</code>: A TLS offloading server. It listens on <code>127.0.0.1:1443</code> and uses BoringSSL to accept and handle TLS connections. The server implements a callback function that calls the TEE when a private key operation needs to be done. Only ECDSA/P-256 signing is currently supported.</p></li>
<li><p><code>ta</code>: Trusted application running inside the TEE. The application is responsible for processing requests from <code>admin_cli</code>, that is, storing keys in trusted storage and deleting them if requested, as well as for processing signing requests from the <code>server</code>.</p></li>
</ul>
<p>The section called “Example of usage” explains how to use the software in detail.</p>
<section id="compilation-and-installation" class="level3">
<h3 class="anchored" data-anchor-id="compilation-and-installation">Compilation and installation</h3>
<p>The following steps need to be taken to install the software:</p>
<ol type="1">
<li><p>Build OP-TEE. This step is explained in detail <a href="https://optee.readthedocs.io/building/gits/build.html#get-and-build-the-solution">here</a>. Steps 1 to 5 need to be performed. The <code>TARGET</code> (see the build instructions) used by this example is <code>QEMUv8</code>. If OP-TEE is started after step 5, it has to be stopped.</p></li>
<li><p>The next steps assume that a Linux operating system is used and that OP-TEE has been cloned to a directory referred to as <code>OPTEE_DIR</code>.</p></li>
<li><p>Create the directory <code>/tmp/tee_share</code>.</p></li>
<li><p>Go to the <code>OPTEE_DIR</code> directory.</p></li>
<li><p>Clone the repository: <code>git clone https://git.amongbytes.com/kris/c3-tls-sign-delegator.git projects</code></p></li>
<li><p>Compile BoringSSL for aarch64 and for the native system: <code>cd OPTEE_DIR/projects/bssl; make</code>. The Makefile is configured to use the toolchain built in step 1. This step also builds BoringSSL for the host machine, so all dependencies for building BoringSSL must be installed (see [BORING-BUILD]).</p></li>
<li><p>Compile solution: <code>cd OPTEE_DIR/projects/delegator; make</code></p></li>
</ol>
</section>
<section id="start-process" class="level3">
<h3 class="anchored" data-anchor-id="start-process">Start process</h3>
<ol type="1">
<li><p>Starting OP-TEE: User needs to:</p>
<ul>
<li>Enter build directory: <code>cd OPTEE_DIR/build</code></li>
<li>Start QEMU with OP-TEE emulation: <code>make QEMU_VIRTFS_ENABLE=y QEMU_USERNET_ENABLE=y QEMU_VIRTFS_HOST_DIR=/tmp/tee_share HOSTFWD=",hostfwd=tcp::1443-:1443" run-only</code>.</li>
<li>Just after QEMU starts, it pauses with the following prompt:</li>
</ul>
<pre class="shell"><code>cd /home/hdc/repos/optee/qemuv8/build/../out/bin &amp;&amp; /home/hdc/repos/optee/qemuv8/build/../qemu/aarch64-softmmu/qemu-system-aarch64 \
    -nographic \
    -serial tcp:localhost:54320 -serial tcp:localhost:54321 \
    -smp 2 \
    -s -S -machine virt,secure=on -cpu cortex-a57 \
    -d unimp -semihosting-config enable,target=native \
    -m 1057 \
    -bios bl1.bin \
    -initrd rootfs.cpio.gz \
    -kernel Image -no-acpi \
    -append 'console=ttyAMA0,38400 keep_bootcon root=/dev/vda2' \
    -fsdev local,id=fsdev0,path=/tmp/tee_share,security_model=none -device virtio-9p-device,fsdev=fsdev0,mount_tag=host -netdev user,id=vmnic,hostfwd=tcp::1443-:1443 -device virtio-net-device,netdev=vmnic
QEMU 3.0.93 monitor - type 'help' for more information
(qemu)</code></pre>
<p>User continues the process by entering <code>c</code></p>
<pre><code>(qemu) c</code></pre>
<p>After a while, two additional terminals should appear: one labeled “Normal”, running Linux, and another labeled “Secure”, with output from the TEE. </p><div class="quarto-video"><video id="video_shortcode_videojs_video6" class="video-js vjs-default-skin vjs-big-play-centered vjs-fluid" controls="" preload="auto" data-setup="{}" title=""><source src="<https://youtu.be/02klEwlsJIA>"></video></div><p></p></li>
<li><p>In the “Normal World” terminal, the user needs to mount a file system to share data between the guest and the host machine. The following command needs to be used:</p>
<pre class="shell"><code>mount -t 9p -o trans=virtio host /mnt</code></pre>
<div class="quarto-video"><video id="video_shortcode_videojs_video1" class="video-js vjs-default-skin vjs-big-play-centered vjs-fluid" controls="" preload="auto" data-setup="{}" title=""><source src="<https://youtu.be/5psOtKtdlWI>"></video></div></li>
<li><p>Install Trusted Application inside OP-TEE:</p>
<p>In the “Normal” terminal invoke:</p>
<pre><code>sh /mnt/out/etc/tee_install</code></pre>
<div class="quarto-video"><video id="video_shortcode_videojs_video2" class="video-js vjs-default-skin vjs-big-play-centered vjs-fluid" controls="" preload="auto" data-setup="{}" title=""><source src="<https://youtu.be/aGbqgz_e9Ec>"></video></div></li>
</ol>
<p>At this point the installation and startup process is completed and the solution can be used.</p>
</section>
<section id="example-of-usage" class="level3">
<h3 class="anchored" data-anchor-id="example-of-usage">Example of usage</h3>
<ol type="1">
<li><p>Installing a key on secure storage</p>
<p>The first step is to install the key in secure storage. Ideally this step is done by the “Key provisioning server”. However, this PoC doesn’t implement such a server. Instead, <code>admin_cli</code> can be used to install the key.</p>
<p>In the “Normal” terminal, go to <code>/mnt/out/</code> and invoke</p>
<pre><code>cd /mnt/out
./admin_cli/admin_cli put www.test.com etc/ecdsa_256.key</code></pre>
<p>This command installs the private key for <code>www.test.com</code>. In the “Secure” terminal you should see the message <code>E/TA:  install_key:156 Storing a key</code>. After this step, <code>etc/ecdsa_256.key</code> can be removed.</p>
<div class="quarto-video"><video id="video_shortcode_videojs_video3" class="video-js vjs-default-skin vjs-big-play-centered vjs-fluid" controls="" preload="auto" data-setup="{}" title=""><source src="<https://youtu.be/__7WKvx8XxM>"></video></div></li>
<li><p>Start a TLS server and perform TLS handshake:</p>
<p>With the private key installed, the TLS server can be started. In the “Normal” terminal invoke</p>
<pre class="shell"><code>&gt; cd /mnt/out
&gt; ./server/server</code></pre>
<p>The server will start listening on <code>127.0.0.1:1443</code>. On the host machine, one can then try to connect to the TLS server:</p>
<pre class="shell"><code>&gt; cd OPTEE_DIR

&gt; ./projects/bssl/src/build.native/tool/bssl client -connect 127.0.0.1:1443 -server-name "www.test.com"
Connecting to 127.0.0.1:1443
Connected.
Version: TLSv1.3
Resumed session: no
Cipher: TLS_AES_128_GCM_SHA256
ECDHE curve: X25519
Signature algorithm: ecdsa_secp256r1_sha256
Secure renegotiation: yes
Extended master secret: yes
Next protocol negotiated:
ALPN protocol:
OCSP staple: no
SCT list: no
Early data: no
Cert subject: CN = www.dmv.com
Cert issuer: C = FR, ST = PACA, L = Cagnes sur Mer, OU = Domain Control Validated SARL, CN = Domain Control Validated SARL</code></pre>
<div class="quarto-video"><video id="video_shortcode_videojs_video4" class="video-js vjs-default-skin vjs-big-play-centered vjs-fluid" controls="" preload="auto" data-setup="{}" title=""><source src="<https://youtu.be/kRRl2zhbUqc>"></video></div></li>
<li><p>An attempt to access a different domain fails, as no traffic-private-key is available for it.</p>
<div class="quarto-video"><video id="video_shortcode_videojs_video5" class="video-js vjs-default-skin vjs-big-play-centered vjs-fluid" controls="" preload="auto" data-setup="{}" title=""><source src="<https://youtu.be/LBhllWcn4RY>"></video></div></li>
</ol>
</section>
<section id="extensions-to-the-idea" class="level3">
<h3 class="anchored" data-anchor-id="extensions-to-the-idea">Extensions to the idea</h3>
<p>First of all, key storage is bound to the device. In order to use a stolen key for MITM, an attacker needs to steal the whole machine, which is much more difficult to do and easier to guard against. To implement such a solution, the user doesn’t need an expensive HSM, but can simply use an Intel CPU with SGX and Asylo. It’s also easy to imagine some extensions to this idea. For example, instead of calling the TEE for session signing during each TLS handshake, the solution could use “Delegated Credentials for TLS” (see <a href="https://tools.ietf.org/html/draft-rescorla-tls-subcerts-02" class="uri">https://tools.ietf.org/html/draft-rescorla-tls-subcerts-02</a>). In this case the TEE would be responsible for generating short-lived certificates, and the TLS server would request such a certificate at fixed intervals (every few minutes). This idea could be combined with another: instead of storing the traffic-private-key on multiple machines, one could store the key in a central location with more restricted access (but still in a TEE). Combining those two ideas improves the security of traffic-private-key storage without degrading the time needed to perform a TLS handshake. It is worth noting that “Delegated Credentials for TLS” are already implemented in BoringSSL.</p>
</section>
</section>
<section id="links" class="level2">
<h2 class="anchored" data-anchor-id="links">Links</h2>
<ul>
<li>[PoC code] <a href="https://git.amongbytes.com/kris/TLS-delegation-to-TEE" class="uri">https://git.amongbytes.com/kris/TLS-delegation-to-TEE</a></li>
<li>[RFC8446] <a href="https://tools.ietf.org/html/rfc8446" class="uri">https://tools.ietf.org/html/rfc8446</a></li>
<li>[OP-TEE] <a href="https://www.op-tee.org/" class="uri">https://www.op-tee.org/</a></li>
<li>[OP-TEE-SRC] <a href="https://github.com/OP-TEE" class="uri">https://github.com/OP-TEE</a></li>
<li>[OP-TEE-STORAGE] <a href="https://optee.readthedocs.io/architecture/secure_storage.html" class="uri">https://optee.readthedocs.io/architecture/secure_storage.html</a></li>
<li>[BORING-BUILD] <a href="https://github.com/google/boringssl/blob/master/BUILDING.md" class="uri">https://github.com/google/boringssl/blob/master/BUILDING.md</a></li>
</ul>


</section>

 ]]></description>
  <guid>https://www.amongbytes.com/posts/201904-TEE-sign-delegator.html</guid>
  <pubDate>Mon, 15 Apr 2019 00:00:00 GMT</pubDate>
  <media:content url="https://www.amongbytes.com/img/optee.png" medium="image" type="image/png" height="86" width="144"/>
</item>
<item>
  <title>i2c-stub: Playing with I2C on linux</title>
  <dc:creator>Kris Kwiatkowski</dc:creator>
  <link>https://www.amongbytes.com/posts/20190208-i2c-stub.html</link>
  <description><![CDATA[ 





<p>Recently I had a chance to play with <code>i2c-stub</code>. The goal was to send and receive encrypted data to and from an I2C-connected device. I didn’t want to play with a real I2C device, so I needed to emulate it somehow, which is possible with <code>i2c-stub</code> on Linux. Below is a description of how it was done.</p>
<section id="requirements" class="level2">
<h2 class="anchored" data-anchor-id="requirements">Requirements</h2>
<p>The solution needs to be implemented in C and provide the following functionality:</p>
<ul>
<li>Possibility to connect to an I2C slave</li>
<li>Send encrypted data</li>
<li>Receive and decrypt data</li>
<li>Possibility to check connection status</li>
</ul>
</section>
<section id="the-code" class="level2">
<h2 class="anchored" data-anchor-id="the-code">The code</h2>
<p>The code itself is <a href="https://git.amongbytes.com/kris/i2c-stub-toy">here</a>. To compile it with <code>gcc</code>, simply download it and run <code>make</code>.</p>
<section id="initialization" class="level3">
<h3 class="anchored" data-anchor-id="initialization">Initialization</h3>
<p>To read and write data over I2C, I’m using the <code>i2c-stub</code> Linux kernel module and the i2c-tools package (on Arch Linux). <code>i2c-stub</code> creates a fake I2C adapter (controller/master) and emulates I2C hardware, using an array to store the data. We will also need the <code>i2c-dev</code> module as a frontend.</p>
<p>The following commands load the modules, initialize a slave device at address 0x03, and read its initial state:</p>
<pre class="shell"><code>&gt; modprobe i2c-dev
&gt; modprobe i2c-stub chip_addr=0x03

&gt; i2cdetect -l
i2c-1   i2c         i915 gmbus dpc                      I2C adapter
i2c-2   i2c         i915 gmbus dpd                      I2C adapter
...
i2c-8   smbus       SMBus stub driver                   SMBus adapter &lt;-- this one
...

&gt; i2cdump -y 8 0x03
No size specified (using byte-data access)
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f    0123456789abcdef
00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
</code></pre>
<p>We can see that the module was loaded. <code>i2cdetect</code> has detected it as a character device <code>/dev/i2c-8</code>, and <code>i2cdump</code> shows the memory state of the slave device with ID <code>0x03</code>.</p>
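<p>When several adapters are present, the bus number of the stub can be picked out of the <code>i2cdetect -l</code> listing automatically. The snippet below is a minimal sketch of this, assuming the adapter description contains the text “SMBus stub driver” as in the listing above. The helper name <code>find_stub_bus</code> is made up for this example and is not part of i2c-tools.</p>
<pre class="shell"><code># Hypothetical helper: read `i2cdetect -l` output on stdin and print the
# bus number of the first adapter described as "SMBus stub driver".
find_stub_bus() {
    awk '/SMBus stub driver/ { sub(/^i2c-/, "", $1); print $1; exit }'
}

# Usage on a live system (needs i2c-tools and the i2c-stub module loaded):
#   BUS=$(i2cdetect -l | find_stub_bus)
#   i2cdump -y "$BUS" 0x03</code></pre>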
</section>
<section id="sending-data" class="level3">
<h3 class="anchored" data-anchor-id="sending-data">Sending data</h3>
<p>The test program has a <code>-s</code> option that is used to send data. The device ID needs to be provided as its argument.</p>
<pre class="shell"><code>&gt; ./bin/main -s 8 &amp;&amp; echo $?
0</code></pre>
<p>On success the program returns 0. We can now verify with <code>i2cdump</code> that the data has been stored in the I2C device.</p>
<pre class="shell"><code>     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f    0123456789abcdef
00: 1f f8 47 8e 7f 24 1d 2b 47 ca 64 be ce 0a 3f bd    ??G??$?+G?d?????
10: 08 1c 05 87 b0 31 6c 85 46 94 6f c8 9e 49 dd b2    ?????1l?F?o??I??
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................</code></pre>
<p>Before sending, the program uses ChaCha20-Poly1305 to encrypt and authenticate the data.</p>
</section>
<section id="receiving-data" class="level3">
<h3 class="anchored" data-anchor-id="receiving-data">Receiving data</h3>
<p>The test program has a <code>-r</code> option to indicate that the user wants to receive data from the I2C device. On exit the program prints the received data.</p>
<pre class="shell"><code>[root@cryptoden final]# ./bin/main -r 8
RECEIVED DATA:
HELLO WORLD!!!</code></pre>
<p>As the data is authenticated, any change to the data stored in the I2C device will result in a decryption error. To see this behaviour, one can dump the I2C memory, change it, load it back to the I2C device and try to read again. Let’s see this:</p>
<pre class="shell"><code>&gt; i2cdump -y 8 0x03 b &gt; dump
&gt; cat dump
[root@cryptoden ~]# cat dump
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f    0123456789abcdef
00: 1f f8 47 8e 7f 24 1d 2b 47 ca 64 be ce 0a 3f bd    ??G??$?+G?d?????
10: 08 1c 05 87 b0 31 6c 85 46 94 6f c8 9e 49 dd b3    ?????1l?F?o??I??
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00    ................

## Here I modify the 32nd byte (b2 became b3, as shown in the dump above) and load the data to i2c-stub
&gt; i2c-stub-from-dump 0x03 dump
256 byte values written to 8-0003

## Trying to read
&gt; ./bin/main -r 8
[i2c_recv() src/i2c.c:165] Error occured when decrypting
[i2c_recv() src/i2c.c:172] ERROR: can't receive encrypted data
[test_receive() src/main.c:88] Error occured when receiving data</code></pre>
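<p>The manual edit of the dump file can also be scripted. The following is only a sketch, assuming GNU <code>sed</code> and the <code>i2cdump</code> text layout shown above (a row label followed by 16 hex bytes). It overwrites the last byte of row <code>10:</code> (offset 0x1f, the 32nd byte) with a value of your choice. <code>tamper_row10</code> is a hypothetical helper name, not part of i2c-tools.</p>
<pre class="shell"><code># Hypothetical helper: overwrite the last byte (offset 0x1f) of row "10:"
# in an i2cdump-style text file, in place. Requires GNU sed for -i.
# The ASCII column at the end of the row is left untouched.
tamper_row10() {
    file=$1
    new_byte=$2
    sed -i "s/^\(10:\( [0-9a-f][0-9a-f]\)\{15\}\) [0-9a-f][0-9a-f]/\1 ${new_byte}/" "$file"
}

# Usage, followed by the reload step shown above:
#   tamper_row10 dump b3
#   i2c-stub-from-dump 0x03 dump</code></pre>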
</section>
<section id="testing" class="level3">
<h3 class="anchored" data-anchor-id="testing">Testing</h3>
<p>The program has a <code>-t</code> option which can be used to test the program and check that the connection stays persistent after connecting to the device.</p>
<pre class="shell"><code>&gt; ./bin/main -t 8 &amp;&amp; echo $?
0</code></pre>
</section>
</section>
<section id="additional-links" class="level2">
<h2 class="anchored" data-anchor-id="additional-links">Additional links</h2>
<ul>
<li>https://robot-electronics.co.uk/i2c-tutorial</li>
<li>http://alokprasad7.blogspot.com/2018/01/fake-i2c-device-i2c-stub.html</li>
<li>https://electronicayciencia.github.io/wPi_soft_i2c/</li>
<li>https://learn.sparkfun.com/tutorials/i2c/all</li>
</ul>


</section>

 ]]></description>
  <guid>https://www.amongbytes.com/posts/20190208-i2c-stub.html</guid>
  <pubDate>Fri, 08 Feb 2019 00:00:00 GMT</pubDate>
  <media:content url="https://www.amongbytes.com/img/uart_hw.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Building HTTPS server with quantum-safe TLS in baby steps</title>
  <link>https://www.amongbytes.com/posts/201810-baby-steps-to-PQ-HTTPS-server.html</link>
  <description><![CDATA[ 





<p>This post is a step-by-step guide to building an HTTPS web server which uses a quantum-resistant algorithm for TLS key exchange. The solution is written in <a href="https://golang.org">Go</a>. For the web server I’ve used <a href="https://caddyserver.com/">Caddy</a>. Caddy uses the TLS implementation from the standard Go library. As this doesn’t support any quantum-resistant algorithms, I’ll replace it with another implementation.</p>
<p>The quantum-resistant algorithm of my choice is SIDH. Quite some effort has already been put into security research of SIDH. The basics of the algorithm are explained in more detail by <span class="citation" data-cites="LVH">@LVH</span> in his <a href="https://www.lvh.io/posts/supersingular-isogeny-diffie-hellman-101.html">blog post</a> as well as on <a href="https://en.wikipedia.org/wiki/Supersingular_isogeny_key_exchange">Wiki</a>. I’ll use Cloudflare’s implementation of SIDH available from <a href="http://github.com/cloudflare/sidh">here</a>. One interesting characteristic of the algorithm is that it can be used as a drop-in replacement for ECDH.</p>
<p>Finally, I’ll need a TLS implementation which supports SIDH. This has been done in <a href="https://github.com/cloudflare/tls-tris/pull/143">tls-tris</a>. The library provides built-in support for <strong>SIDH/P503-X25519</strong>, a key exchange based on the IETF draft <a href="https://franziskuskiefer.github.io/pq-tls-draft/draft-kiefer-tls-ecdhe-sidh.html">Hybrid ECDHE-SIDH Key Exchange for TLS</a>. This is done over TLS v1.3, which the library now also <a href="https://github.com/cloudflare/tls-tris/pull/135">supports</a>. tls-tris is code-compatible with the TLS implementation from the Go standard library, which makes it possible to simply swap one implementation for the other.</p>
<section id="step-1-sidh-support-in-tls" class="level1">
<h1>Step 1: SIDH support in TLS</h1>
<p>With tls-tris it is possible to perform an SIDH key exchange. In order to link an application (Caddy in this case) with tls-tris, one needs to swap the TLS implementation in the standard library and recompile Go from source. All of this is implemented in the Makefile that comes with the library. Binaries are placed in the <code>tls-tris/_dev/GOROOT</code> folder, and <code>GOROOT</code> needs to point to this folder.</p>
<p>Let’s first download the needed sources:</p>
<pre class="shell"><code># Create some workspace
WORKSPACE=/tmp/workspace/
mkdir -p ${WORKSPACE}

# Get Go 1.10 sources (if not already done)
cd ${WORKSPACE} 
wget https://dl.google.com/go/go1.10.4.linux-amd64.tar.gz -O - | tar -xz

# Clone tls-tris 
git clone https://github.com/cloudflare/tls-tris</code></pre>
<p>The next step is to build Go with TLSv1.3 and SIDH support. This is done automatically: the makefile implements all the needed steps. The <code>build-all</code> target will basically:</p>
<ul>
<li>Swap standard library TLS with tls-tris</li>
<li>Download the SIDH crypto library and vendor it into the <code>${WORKSPACE}/go/src/vendor</code> directory. This way SIDH is available as if it were part of the standard library.</li>
</ul>
<pre class="shell"><code># By specifying GOROOT_ENV makefile knows where to look for Go sources
cd tls-tris; GOROOT_ENV=${WORKSPACE}/go make -f _dev/Makefile build-all</code></pre>
<p>Finally <code>GOROOT</code> needs to be adjusted:</p>
<pre class="shell"><code>export GOROOT=${WORKSPACE}/tls-tris/_dev/GOROOT/linux_amd64</code></pre>
</section>
<section id="step-2-patching-caddy" class="level1">
<h1>Step 2: Patching Caddy</h1>
<p>By default Caddy supports TLS up to version 1.2. Thanks to steps above, TLSv1.3 and Hybrid SIDH/503-X25519 key exchange are ready to use. Nevertheless, some code changes to Caddy are needed in order to benefit from those new features.</p>
<p>Let’s start by cloning the sources:</p>
<pre class="shell"><code>mkdir -p /tmp/gopath
export GOPATH=/tmp/gopath
go get github.com/mholt/caddy/caddy 
go get github.com/caddyserver/builds
cd $GOPATH/src/github.com/mholt/caddy</code></pre>
<p>Now we need to add 2 lines of code in order to:</p>
<ol>
<li>Enable TLS v1.3: in the file <code>caddytls/config.go</code> Caddy keeps a map of supported TLS protocols. TLSv1.3 needs to be added to this map.</li>
<li>Enable SIDH: in the same file Caddy also keeps a map of supported curves. The scheme called <code>tls.HybridSIDHp503Curve25519</code> needs to be added to this map.</li>
</ol>
<p>The whole diff should look something like this:</p>
<pre class="shell"><code>&gt; git diff
diff --git a/caddytls/config.go b/caddytls/config.go
index 8cf61e4..fc7510d 100644
--- a/caddytls/config.go
+++ b/caddytls/config.go
@@ -583,6 +583,7 @@ var SupportedProtocols = map[string]uint16{
        "tls1.0": tls.VersionTLS10,
        "tls1.1": tls.VersionTLS11,
        "tls1.2": tls.VersionTLS12,
+       "tls1.3": tls.VersionTLS13,
 }
 
 // GetSupportedProtocolName returns the protocol name
@@ -682,6 +683,7 @@ var supportedCurvesMap = map[string]tls.CurveID{
        "P256":   tls.CurveP256,
        "P384":   tls.CurveP384,
        "P521":   tls.CurveP521,
+       "SIDH/503-X25519": tls.HybridSIDHp503Curve25519,
 }
 
 // List of all the curves we want to use by default.</code></pre>
<p>With those changes applied, Caddy can be built:</p>
<pre class="shell"><code># Once again, just to make sure right Go version is used
export GOROOT=${WORKSPACE}/tls-tris/_dev/GOROOT/linux_amd64

# And let's build caddy
cd $GOPATH/src/github.com/mholt/caddy/caddy
go run build.go</code></pre>
</section>
<section id="caddy-configuration-and-server-bring-up" class="level1">
<h1>Caddy configuration and server bring up</h1>
<p>At this point the hybrid post-quantum key exchange should be supported by Caddy. Obviously that’s not the default configuration, so one needs to tell Caddy to use it. Caddy is configured through a <code>Caddyfile</code>. In this file the maximum version of the protocol is set to “tls1.3”. The elliptic curve preferences are also changed by specifying “SIDH/503-X25519” as a key exchange algorithm. My minimal Caddyfile looks like this:</p>
<pre class="shell"><code>localhost:2015 {
        tls self_signed
        tls {
                protocols tls1.2 tls1.3
                curves X25519 P256 "SIDH/503-X25519"
        }
        log stdout
        proxy / http://www.amongbytes.com
}</code></pre>
<p>This file is placed in the same folder where the “caddy” binary lives, namely <code>$GOPATH/src/github.com/mholt/caddy/caddy/Caddyfile</code>. It will basically open port 2015 on localhost and forward all traffic to <code>http://www.amongbytes.com</code>. A self-signed certificate will be used in this test configuration. Let’s start it:</p>
<pre class="shell"><code>&gt; cd $GOPATH/src/github.com/mholt/caddy/caddy
&gt; ./caddy
Activating privacy features... done.
https://localhost:2015</code></pre>
<p>Looks like it’s working. Now it would be good to actually test whether the post-quantum key exchange is working. There are probably not too many browsers supporting SIDH/503-X25519 (if any). Nevertheless, tls-tris contains a patch for BoringSSL which adds SIDH and is used for interoperability testing. We can reuse it to test our setup.</p>
<pre class="shell"><code>cd ${WORKSPACE}

# Patch for BoringSSL adding SIDH/P503-X25519 support
wget https://raw.githubusercontent.com/cloudflare/tls-tris/master/_dev/boring/sidh_ff433815b51c34496bb6bea13e73e29e5c278238.patch

# Clone and checkout BoringSSL. I'm using specific commit as I want to make sure patch applies correctly
git clone https://boringssl.googlesource.com/boringssl
cd boringssl
git fetch &amp;&amp; git checkout ff433815b51c34496bb6bea13e73e29e5c278238 
patch -p1 &lt; ../sidh_ff433815b51c34496bb6bea13e73e29e5c278238.patch

# When building, make sure EXP_SIDH is defined as it enables SIDH
mkdir build &amp;&amp; cd build; cmake -DEXP_SIDH=1 -GNinja .. &amp;&amp; ninja</code></pre>
<p>Assuming the server is running, we can now check whether the quantum-resistant TLS handshake works.</p>
<pre class="shell"><code>&gt; ./tool/bssl client -curves x25519sidh503 -connect localhost:2015                                         
Connecting to [::1]:2015
Connected.
  Version: TLSv1.3
  Resumed session: no
  Cipher: TLS_AES_128_GCM_SHA256
  ECDHE curve: x25519sidh503
  Signature algorithm: ecdsa_secp256r1_sha256
  Secure renegotiation: yes
  Extended master secret: yes
  Next protocol negotiated: 
  ALPN protocol: 
  OCSP staple: no
  SCT list: no
  Early data: no
  Cert subject: O = Caddy Self-Signed
  Cert issuer: O = Caddy Self-Signed</code></pre>
<p>BoringSSL reports that the ECDHE curve used for key exchange was “x25519sidh503”. It seems the post-quantum key exchange works just fine!</p>
</section>
<section id="conclusion-and-next-steps" class="level1">
<h1>Conclusion and next steps</h1>
<p>I’m neither an expert in nor a big fan of Golang. Nevertheless, I think it is a great language for performing experiments, especially those related to networking. The setup presented here is used for experiments related to the implementation of post-quantum cryptographic primitives. I’m using it both on ARM and Intel, and thanks to Golang’s build tools the compilation process is simple and fast.</p>
<p>SIDH looks interesting as a quantum-resistant replacement for ECDH. Nevertheless, a KEM construction providing IND-CCA2 security sounds to me like something worth trying. In the next steps I will be experimenting with other algorithms. My non-exhaustive list contains Round5, which presents very interesting results, something NTRU-based, Kyber and CSIDH. At a later stage I’ll also try to tackle quantum-resistant signature schemes.</p>


</section>

 ]]></description>
  <guid>https://www.amongbytes.com/posts/201810-baby-steps-to-PQ-HTTPS-server.html</guid>
  <pubDate>Wed, 10 Oct 2018 00:00:00 GMT</pubDate>
</item>
<item>
  <title>TrustZone Overview</title>
  <dc:creator>Kris Kwiatkowski</dc:creator>
  <link>https://www.amongbytes.com/posts/201805-trustzone_pres_cf/201805-trustzone_pres_cf.html</link>
  <description><![CDATA[ 





<p>Slides from a presentation given at the Cloudflare office in London. <!--more--></p>
<p>The goal of the presentation was to introduce the main concepts behind the Trusted Execution Environment on ARM and how it could potentially be used on the server side.</p>
<p><img src="https://www.amongbytes.com/posts/201805-trustzone_pres_cf/1.png" class="img-fluid"> <img src="https://www.amongbytes.com/posts/201805-trustzone_pres_cf/2.png" class="img-fluid"> <img src="https://www.amongbytes.com/posts/201805-trustzone_pres_cf/3.png" class="img-fluid"> <img src="https://www.amongbytes.com/posts/201805-trustzone_pres_cf/4.png" class="img-fluid"> <img src="https://www.amongbytes.com/posts/201805-trustzone_pres_cf/5.png" class="img-fluid"> <img src="https://www.amongbytes.com/posts/201805-trustzone_pres_cf/6.png" class="img-fluid"> <img src="https://www.amongbytes.com/posts/201805-trustzone_pres_cf/7.png" class="img-fluid"> <img src="https://www.amongbytes.com/posts/201805-trustzone_pres_cf/8.png" class="img-fluid"> <img src="https://www.amongbytes.com/posts/201805-trustzone_pres_cf/9.png" class="img-fluid"> <img src="https://www.amongbytes.com/posts/201805-trustzone_pres_cf/10.png" class="img-fluid"> <img src="https://www.amongbytes.com/posts/201805-trustzone_pres_cf/11.png" class="img-fluid"> <img src="https://www.amongbytes.com/posts/201805-trustzone_pres_cf/12.png" class="img-fluid"> <img src="https://www.amongbytes.com/posts/201805-trustzone_pres_cf/13.png" class="img-fluid"> <img src="https://www.amongbytes.com/posts/201805-trustzone_pres_cf/14.png" class="img-fluid"> <img src="https://www.amongbytes.com/posts/201805-trustzone_pres_cf/15.png" class="img-fluid"> <img src="https://www.amongbytes.com/posts/201805-trustzone_pres_cf/16.png" class="img-fluid"> <img src="https://www.amongbytes.com/posts/201805-trustzone_pres_cf/17.png" class="img-fluid"> <img src="https://www.amongbytes.com/posts/201805-trustzone_pres_cf/18.png" class="img-fluid"> <img src="https://www.amongbytes.com/posts/201805-trustzone_pres_cf/19.png" class="img-fluid"> <img src="https://www.amongbytes.com/posts/201805-trustzone_pres_cf/20.png" class="img-fluid"> <img src="https://www.amongbytes.com/posts/201805-trustzone_pres_cf/21.png" class="img-fluid"> <img 
src="https://www.amongbytes.com/posts/201805-trustzone_pres_cf/22.png" class="img-fluid"> <img src="https://www.amongbytes.com/posts/201805-trustzone_pres_cf/23.png" class="img-fluid"> <img src="https://www.amongbytes.com/posts/201805-trustzone_pres_cf/24.png" class="img-fluid"> <img src="https://www.amongbytes.com/posts/201805-trustzone_pres_cf/25.png" class="img-fluid"> <img src="https://www.amongbytes.com/posts/201805-trustzone_pres_cf/26.png" class="img-fluid"> <img src="https://www.amongbytes.com/posts/201805-trustzone_pres_cf/27.png" class="img-fluid"> <img src="https://www.amongbytes.com/posts/201805-trustzone_pres_cf/28.png" class="img-fluid"> <img src="https://www.amongbytes.com/posts/201805-trustzone_pres_cf/29.png" class="img-fluid"> <img src="https://www.amongbytes.com/posts/201805-trustzone_pres_cf/30.png" class="img-fluid"></p>



 ]]></description>
  <guid>https://www.amongbytes.com/posts/201805-trustzone_pres_cf/201805-trustzone_pres_cf.html</guid>
  <pubDate>Wed, 30 May 2018 00:00:00 GMT</pubDate>
</item>
<item>
  <title>mbedTLS vs BoringSSL on ARM</title>
  <dc:creator>Kris Kwiatkowski</dc:creator>
  <link>https://www.amongbytes.com/posts/201804-comparing-mbedtls-to-boringssl.html</link>
  <description><![CDATA[ 





<section id="goals-and-assumptions" class="level2">
<h2 class="anchored" data-anchor-id="goals-and-assumptions">Goals and assumptions</h2>
<p>The goal is to choose the most suitable TLS library that can be statically linked with an application. The application will be running on a modern mobile operating system and a variety of ARM CPUs. I’m interested in the client side of TLS only. The ideal library is one which ensures the best security, implements algorithms optimized for speed and compiles to a reasonably small binary. Additionally, I assume I can control both sides of the connection, meaning I’m free to choose the cipher(s) to be used for both symmetric and asymmetric encryption (without using PSK). I also have some requirements regarding licences and the library being open source.</p>
<p>I’ve identified two libraries which seem to meet those requirements:</p>
<ul>
<li><code>mbedTLS</code> - a library formerly known as <code>PolarSSL</code>. It makes it fairly easy for developers to include cryptographic and TLS capabilities in embedded products. It is highly configurable, so TLS functionality can be added with a very small code footprint. It is currently maintained by ARM.</li>
<li><code>BoringSSL</code> - a fork of OpenSSL maintained and used by Google. It is the default TLS library used by Android OS (starting from version M) and Chrome, and it is also used on Cloudflare systems. It has the advantage of originating from <code>OpenSSL</code>, which means that the code base has received a lot of review and testing.</li>
</ul>
</section>
<section id="testing-application" class="level2">
<h2 class="anchored" data-anchor-id="testing-application">Testing application</h2>
<p>The test application is a simple C program which compiles and links against the library under test and runs on an ARMv8 platform running the Android operating system. The app is composed of a client and a server. As I’m only interested in the client side of TLS, the server is fixed to always use the same library (it’s based on <code>BoringSSL</code>). The server is configured to support only TLSv1.2 (as 1.3 is not yet supported by <code>mbedTLS</code> [16]). In order to start the server, the user provides an argument which specifies the certificate type to be used (RSA, ECDSA or EdDSA based). Once running, it always enforces the same cipher suite for a given certificate type. For example, in the case of RSA it will be ECDHE key agreement with RSA signature and AES-256 in GCM AEAD mode.</p>
<p>The client application is the one I want to benchmark. I have implemented one client which uses the <code>mbedTLS</code> API and links with this library, and a similar one for <code>BoringSSL</code>. The client always establishes the TCP connection in blocking mode (for simplicity). It implements 3 different tests:</p>
<ul>
<li><strong>Handshake</strong>: during this test the client opens a TCP connection and performs many handshakes without closing the connection. Performance of this test depends on the key type used for certificate signing and on the key agreement algorithm (as well as the elliptic curve used), hence this test is performed multiple times, once for each certificate type</li>
<li><strong>Write</strong>: the client opens a TCP connection and sends a few hundred megabytes of data. This test is done mostly to assess the performance of symmetric encryption</li>
<li><strong>Read</strong>: the client opens a TCP connection and sends a request to the server, which sends back a few hundred megabytes of data. This test is done mostly to assess the performance of symmetric decryption</li>
</ul>
<section id="details-regarding-testing-environment" class="level3">
<h3 class="anchored" data-anchor-id="details-regarding-testing-environment">Details regarding testing environment</h3>
<ul>
<li><p><strong>Software version</strong></p>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Library</th>
<th>Commit</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>BoringSSL</td>
<td><code>eb7c3008</code></td>
</tr>
<tr class="even">
<td>mbedTLS</td>
<td><code>4ca9a457</code></td>
</tr>
</tbody>
</table></li>
<li><p><strong>Compiler and environment settings</strong></p>
<table class="caption-top table">
<colgroup>
<col style="width: 34%">
<col style="width: 65%">
</colgroup>
<thead>
<tr class="header">
<th>Name</th>
<th>Setting</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Compiler</td>
<td>aarch64-linux-android-clang5.0 (as Google is deprecating <code>gcc</code> )</td>
</tr>
<tr class="even">
<td>ABI</td>
<td>arm64-v8a</td>
</tr>
<tr class="odd">
<td>NDK version</td>
<td>16b</td>
</tr>
<tr class="even">
<td>Android Native API level</td>
<td>27</td>
</tr>
<tr class="odd">
<td>Android Build type</td>
<td>Release</td>
</tr>
</tbody>
</table></li>
<li><p><strong>Testing platform</strong></p>
<p>Hardware platform used for testing is a HiKey620 development board. It is powered by Kirin 620 SoC (8 x ARM Cortex-A53) from HiSilicon. It is running Android 8 from AOSP (see build details in Appendix B <a href="https://git.amongbytes.com/playground/tls_comparison/blob/master/APPENDIX.md">here</a> ). Details about the board can be found <a href="https://www.96boards.org/product/hikey/">here</a> and <a href="https://source.android.com/setup/devices#620hikey">here</a>.</p>
<p>Details of the environment used:</p>
<pre class="shell"><code>Linux localhost 4.9.29-g23875fc #1 SMP PREEMPT Tue Jul 4 14:25:00 CEST 2017 aarch64
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32</code></pre>
<center>
<p><img src="https://www.96boards.org/product/ce/hikey/images/Hikey-Lemaker-front-web.png" alt="Because any report with picture looks better."></p>
</center></li>
</ul>
</section>
</section>
<section id="preparation-step" class="level2">
<h2 class="anchored" data-anchor-id="preparation-step">Preparation step</h2>
<p>The following script is used to set up the platform for benchmarking. The most important step is to fix the CPU frequency so that it is not auto-regulated by things like EAS [11].</p>
<pre class="shell"><code># Number of CPUs on the board
NUM_CPU=8
# CPU scaling governor.
GOVERNOR=userspace
# Requested CPU frequency
MAX_FREQ=1200000


adb root
adb remount
# Prevent system from suspending
adb shell "echo temporary &gt; /sys/power/wake_lock"
# Probably useful only on qcom, but anyway...
adb shell stop thermal-engine
adb shell stop mpdecision

for ID in `seq 0 $((NUM_CPU-1))`
do
adb shell "echo 1 &gt; /sys/devices/system/cpu/cpu${ID}/online"
adb shell "echo ${GOVERNOR} &gt; /sys/devices/system/cpu/cpu${ID}/cpufreq/scaling_governor"
adb shell "echo ${MAX_FREQ} &gt; /sys/devices/system/cpu/cpu${ID}/cpufreq/scaling_setspeed"
done

for ID in `seq 0 $((NUM_CPU-1))`
do
adb shell "cat /sys/devices/system/cpu/cpu${ID}/online"
adb shell "cat /sys/devices/system/cpu/cpu${ID}/cpufreq/scaling_governor"
adb shell "cat /sys/devices/system/cpu/cpu${ID}/cpufreq/scaling_cur_freq"
done</code></pre>
</section>
<section id="binary-size-reduction" class="level2">
<h2 class="anchored" data-anchor-id="binary-size-reduction">Binary size reduction</h2>
<section id="mbedtls" class="level3">
<h3 class="anchored" data-anchor-id="mbedtls">mbedTLS</h3>
<p><code>mbedTLS</code> makes it possible to select features of the TLS library at compile time. A configuration template is available in the <code>config.h</code> file and is managed by defining or disabling a number of preprocessor symbols (look for <code>MBEDTLS_CONFIG_FILE</code> for more details). This is an easy way for developers to include cryptographic and (optional) SSL/TLS capabilities in their products with a minimal code footprint. It is an interesting feature for memory-constrained devices (for example microcontrollers).</p>
<p><code>mbedTLS</code> compilation produces 3 separate libraries: crypto, ssl and x509. Compilation also outputs a number of test binaries.</p>
<p>As a first step I applied the obvious size optimization provided by the compiler (<code>-Os</code>) and stripped all the symbols (they can be stored in a separate file if needed). I also passed the <code>-ffunction-sections</code> and <code>-fdata-sections</code> options to the compiler. These cause the compiler to place each function or data item into its own section. Then, thanks to <code>-Wl,--gc-sections</code>, the linker is able to keep only those sections which are actually used, which makes the resulting binary much smaller (one can add <code>-Wl,--print-gc-sections</code> in order to see the removed sections). This optimization may produce unexpected results, so I strongly advise looking at the documentation and getting familiar with the details of this optimization.</p>
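<p>As a rough illustration, the flags from this first step could be collected like this. This is only a sketch: it assumes a build system that honours the conventional <code>CFLAGS</code> and <code>LDFLAGS</code> environment variables, which may not match how the libraries were actually built for this test.</p>
<pre class="shell"><code># Compiler: optimize for size and emit one section per function / data item
export CFLAGS="-Os -ffunction-sections -fdata-sections"
# Linker: drop unreferenced sections and list what was removed
export LDFLAGS="-Wl,--gc-sections -Wl,--print-gc-sections"</code></pre>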
<p>In a second step I changed the <code>config.h</code> file and removed capabilities which are not needed by our client application, leaving only the following:</p>
<ul>
<li>TLS client side code</li>
<li>TLS v1.2</li>
<li>AES-GCM used as a symmetric cipher</li>
<li>RSA, ECDSA and ECDH with curve P-256</li>
<li>SHA-256 and SHA-512</li>
<li>Code for key pre-sharing has been removed</li>
<li>Some additional features required by the client code</li>
</ul>
<p>In a third step I’ve removed support for RSA. On the one hand it isn’t actually necessary, as I control both sides of the connection, and on the other hand it’s interesting to see how much the binary size gets reduced.</p>
<table class="caption-top table">
<colgroup>
<col style="width: 4%">
<col style="width: 41%">
<col style="width: 15%">
<col style="width: 14%">
<col style="width: 16%">
<col style="width: 8%">
</colgroup>
<thead>
<tr class="header">
<th>Step</th>
<th>Optim</th>
<th><code>libmbedx509.a</code></th>
<th><code>libmbedtls.a</code></th>
<th><code>libmbedcrypto.a</code></th>
<th>Test app</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>0</td>
<td>Initial size (with -O2)</td>
<td>96K</td>
<td>260K</td>
<td>604K</td>
<td>464K</td>
</tr>
<tr class="even">
<td>1</td>
<td>Removal of unused data and function sections, strip, -Os</td>
<td>68K</td>
<td>132K</td>
<td>380K</td>
<td>272K</td>
</tr>
<tr class="odd">
<td>2</td>
<td>Disabling not needed capabilities</td>
<td>52K</td>
<td>52K</td>
<td>236K</td>
<td>128K</td>
</tr>
<tr class="even">
<td>3</td>
<td>Disabling RSA</td>
<td>40K</td>
<td>48K</td>
<td>208K</td>
<td>108K</td>
</tr>
</tbody>
</table>
<p>After all three steps the test client is more than 4 times smaller than the initial build (464K down to 108K), which is a very small size indeed. Further reductions are possible (see [8] for ideas), nevertheless at this point I’m satisfied with the size and I don’t think it is possible to change it much. Removing RSA shrinks the test app by only about 20K (128K to 108K), so I’ve decided to keep RSA and pay a little penalty.</p>
<p>It’s also worth noting that 48K for a TLSv1.2 implementation (<code>libmbedtls.a</code> after step 3) is a really small memory footprint. This is very interesting for small devices which implement most of the needed crypto in hardware.</p>
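<p>The relative savings can be recomputed from the “Test app” column of the table above. A small sketch (the helper name <code>size_savings</code> is mine; sizes are in K as listed in the table, and percentages are truncated to whole numbers):</p>
<pre class="shell"><code># Hypothetical helper: print by how many percent size $2 is smaller than size $1
size_savings() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%d%%\n", (a - b) * 100 / a }'
}

size_savings 464 128   # step 0 vs step 2 (RSA kept):    prints 72%
size_savings 464 108   # step 0 vs step 3 (RSA removed): prints 76%</code></pre>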
</section>
<section id="boringssl" class="level3">
<h3 class="anchored" data-anchor-id="boringssl">BoringSSL</h3>
<p>A similar experiment to the one above has been done with BoringSSL. This library doesn’t offer as many configuration possibilities as <code>mbedTLS</code>; nevertheless, it provides some.</p>
<p>In the first step I’ve applied exactly the same compiler flags as in the case of <code>mbedTLS</code> (<code>-Os</code>, symbol strip, indexing data&amp;function sections).</p>
<p>In the second step I’ve applied the <code>OPENSSL_SMALL=1</code> configuration option. This tells the compiler to use algorithm implementations which are optimized for size rather than for speed (see [12] for more details).</p>
<p>In the third step I’ve tried to remove the assembly implementations. For some algorithms this causes a huge performance degradation, as some optimizations are written in assembly and hardware acceleration needs “glue” code which is also written in assembly. Nevertheless, it is an interesting step when comparing against <code>mbedTLS</code>, as it doesn’t currently have any such optimizations (see [6]).</p>
<p><code>BoringSSL</code> provides the concept of <em>crypto buffers</em>, which can be used instead of some functions from the memory-hungry X509 and ASN.1 implementation. This feature, together with indexing data&amp;function sections (done in the first step), greatly reduces binary size. I have used it in step 4. In step 5 I go a bit further: the need for X509 and ASN.1 can be completely removed, assuming the user provides a custom certificate verification function. My client doesn’t implement such a function, but it shouldn’t be very complicated to implement, and its code size won’t change the final binary size much. Hence it’s interesting to see the resulting size reduction.</p>
<p>In the last step I’ve tried more aggressive changes: commenting out (with preprocessor symbols) the RSA and DH implementations.</p>
<table class="caption-top table">
<colgroup>
<col style="width: 5%">
<col style="width: 68%">
<col style="width: 9%">
<col style="width: 8%">
<col style="width: 8%">
</colgroup>
<thead>
<tr class="header">
<th>Step</th>
<th>Optim</th>
<th><code>libcrypto.a</code></th>
<th><code>libssl.a</code></th>
<th>Test App</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>0</td>
<td>Initial size</td>
<td>12.4M</td>
<td>7.6M</td>
<td>7.5M</td>
</tr>
<tr class="even">
<td>1</td>
<td>Removal of data and function sections, strip, -Os</td>
<td>1244K</td>
<td>356K</td>
<td>796K</td>
</tr>
<tr class="odd">
<td>2</td>
<td>OPENSSL_SMALL=1</td>
<td>1200K</td>
<td>356K</td>
<td>756K</td>
</tr>
<tr class="even">
<td>3</td>
<td>OPENSSL_NO_ASM</td>
<td>1184K</td>
<td>356K</td>
<td>736K</td>
</tr>
<tr class="odd">
<td>4</td>
<td>BoringSSL crypto buffers</td>
<td>1184K</td>
<td>356K</td>
<td>700K</td>
</tr>
<tr class="even">
<td>5</td>
<td>Complete elimination of X.509 and ASN.1 code</td>
<td>1184K</td>
<td>356K</td>
<td>392K</td>
</tr>
<tr class="odd">
<td>6</td>
<td>Disabling RSA and DH</td>
<td>1144K</td>
<td>352K</td>
<td>356K</td>
</tr>
</tbody>
</table>
<p>I’m positively surprised by the fact that it is possible to remove the X509 and ASN.1 code; it gives a really small library. At the moment I don’t want to implement my own certificate verification function, and I want to perform certificate verification during performance benchmarking. But it’s worth noting that, at a fairly small cost, the BoringSSL client can be reduced almost twice (700K to 392K), to a binary that’s a bit more than 3 times bigger than the one produced with mbedTLS, which is quite interesting.</p>
<p>Removing ASM hits performance a lot, so I will keep it. Removing RSA and DH gives only a 36KB smaller binary, but it introduces a very high maintenance cost: it will be hard and error-prone to reapply those changes to the code after updating the library to a newer version. As a side note, OpenSSL has a switch which removes RSA (<code>OPENSSL_NO_RSA</code>); FWIW, it might be that this code could be ported to <code>BoringSSL</code>.</p>
<p>Finally, for my further analysis I’ll apply steps 1, 2 and 4 (and again, I’d encourage applying step 5).</p>
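<p>The post doesn’t list the exact flags, so the following is only a sketch of how steps 1 and 2 might look for a CMake build of BoringSSL with GCC or Clang. The flag spellings are the usual GCC/Clang ones, <code>path/to/boringssl</code> is a placeholder, and the helper name <code>boringssl_cmake_cmd</code> is made up; the snippet only prints the command instead of running it. A real Android build would additionally need the NDK toolchain settings.</p>
<pre class="shell"><code># Step 1: optimize for size and let the linker drop unused sections.
# (Assumed GCC/Clang flags; the exact set used in the post is not given.)
SIZE_CFLAGS="-Os -ffunction-sections -fdata-sections"
SIZE_LDFLAGS="-Wl,--gc-sections -s"

# Step 2: ask BoringSSL for its size-optimized implementations.
BORINGSSL_OPTS="-DOPENSSL_SMALL=1"

# Print the resulting CMake invocation (dry run only).
boringssl_cmake_cmd() {
    echo "cmake ${BORINGSSL_OPTS}" \
        "-DCMAKE_C_FLAGS='${SIZE_CFLAGS}'" \
        "-DCMAKE_CXX_FLAGS='${SIZE_CFLAGS}'" \
        "-DCMAKE_EXE_LINKER_FLAGS='${SIZE_LDFLAGS}'" \
        "path/to/boringssl"
}

boringssl_cmake_cmd</code></pre>
<p>Step 4 (crypto buffers) is a change in how the client code uses the library, so it doesn’t show up as a build flag here.</p>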
</section>
</section>
<section id="notes-on-size-reduction" class="level2">
<h2 class="anchored" data-anchor-id="notes-on-size-reduction">Notes on size reduction</h2>
<ul>
<li>Something that wasn’t tried is the <em>Link Time Optimization</em> feature, which may produce a smaller binary (see [3], [4] and [5]).
<ul>
<li>It might be interesting to see how the results differ when using this feature instead of, or together with, section indexing</li>
</ul></li>
<li>I’ve also calculated the size of the shared libraries for BoringSSL - <code>libcrypto.so</code>: 1072KB; <code>libssl.so</code>: 276KB</li>
<li><code>mbedTLS</code> doesn’t implement hardware acceleration, so performance won’t be as good as for <code>BoringSSL</code>. I wonder if it would make sense to take the extremely small SSL implementation from <code>mbedTLS</code> and use crypto from <code>BoringSSL</code>.</li>
</ul>
</section>
<section id="performance-comparison-comparison" class="level2">
<h2 class="anchored" data-anchor-id="performance-comparison-comparison">Performance comparison</h2>
<section id="results-from-tools-provided-by-the-libraries" class="level3">
<h3 class="anchored" data-anchor-id="results-from-tools-provided-by-the-libraries">Results from tools provided by the libraries</h3>
<p>Both libraries provide tools for benchmarking. This subsection compares the results reported by those tools. I compare the default compilation against the binary I got after applying the tricks which reduce the size of the client application.</p>
<section id="mbedtls-default-vs-reduced" class="level4">
<h4 class="anchored" data-anchor-id="mbedtls-default-vs-reduced">mbedTLS: default vs reduced</h4>
<p><code>mbedTLS</code> provides a tool for performance benchmarking called <code>benchmark</code>. The table below shows results for the most interesting algorithms (for results of all algorithms see Appendix C <a href="https://git.amongbytes.com/playground/tls_comparison/blob/master/APPENDIX.md">here</a>).</p>
<table class="caption-top table">
<colgroup>
<col style="width: 42%">
<col style="width: 28%">
<col style="width: 28%">
</colgroup>
<thead>
<tr class="header">
<th>Algo</th>
<th>Reduced</th>
<th>Default (-O2)</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>SHA-256</td>
<td>46809 KiB/s</td>
<td>52044 KiB/s</td>
</tr>
<tr class="even">
<td>AES-GCM-128</td>
<td>16399 KiB/s</td>
<td>16398 KiB/s</td>
</tr>
<tr class="odd">
<td>AES-GCM-256</td>
<td>14287 KiB/s</td>
<td>14286 KiB/s</td>
</tr>
<tr class="even">
<td>RSA-2048</td>
<td>652 public/s</td>
<td>653 public/s</td>
</tr>
<tr class="odd">
<td>RSA-2048</td>
<td>17 private/s</td>
<td>17 private/s</td>
</tr>
<tr class="even">
<td>RSA-4096</td>
<td>168 public/s</td>
<td>168 public/s</td>
</tr>
<tr class="odd">
<td>RSA-4096</td>
<td>3 private/s</td>
<td>3 private/s</td>
</tr>
<tr class="even">
<td>ECDSA-secp256r1</td>
<td>189 sign/s</td>
<td>195 sign/s</td>
</tr>
<tr class="odd">
<td>ECDHE-secp256r1</td>
<td>57 handshake/s</td>
<td>60 handshake/s</td>
</tr>
<tr class="even">
<td>ECDH-secp256r1</td>
<td>77 handshake/s</td>
<td>81 handshake/s</td>
</tr>
<tr class="odd">
<td>ECDHE-Curve25519</td>
<td>41 handshake/s</td>
<td>41 handshake/s</td>
</tr>
<tr class="even">
<td>ECDH-Curve25519</td>
<td>80 handshake/s</td>
<td>82 handshake/s</td>
</tr>
</tbody>
</table>
<p>One thing to notice is that (for the algorithms above) there is not much difference between applying <code>-Os</code> and <code>-O2</code>, as <code>-Os</code> enables all <code>-O2</code> optimizations that do not typically increase code size. It’s also worth noticing the performance difference between static and ephemeral ECDH. It seems quite weird, and the root cause should probably be studied further.</p>
</section>
<section id="boringssl-default-vs-reduced" class="level4">
<h4 class="anchored" data-anchor-id="boringssl-default-vs-reduced">BoringSSL: default vs reduced</h4>
<p>Performance results are provided by the <code>bssl speed</code> tool from BoringSSL. The table below shows the most interesting algorithms (for results of all algorithms see Appendix C <a href="https://git.amongbytes.com/playground/tls_comparison/blob/master/APPENDIX.md">here</a>).</p>
<table class="caption-top table">
<colgroup>
<col style="width: 33%">
<col style="width: 33%">
<col style="width: 33%">
</colgroup>
<thead>
<tr class="header">
<th>Operation</th>
<th>Reduced</th>
<th>Default (-O2)</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>RSA 2048 signing</td>
<td>(59.5 ops/sec)</td>
<td>(108.1 ops/sec)</td>
</tr>
<tr class="even">
<td>RSA 2048 verify</td>
<td>(2377.5 ops/sec)</td>
<td>(4078.3 ops/sec)</td>
</tr>
<tr class="odd">
<td>RSA 4096 signing</td>
<td>(8.3 ops/sec)</td>
<td>(14.9 ops/sec)</td>
</tr>
<tr class="even">
<td>RSA 4096 verify</td>
<td>(668.0 ops/sec)</td>
<td>(1088.4 ops/sec)</td>
</tr>
<tr class="odd">
<td>AES-128-GCM (1350 bytes)</td>
<td>(17675.4 ops/sec): 23.9 MB/s</td>
<td>(291430.3 ops/sec): 393.4 MB/s</td>
</tr>
<tr class="even">
<td>AES-256-GCM (1350 bytes)</td>
<td>(14792.6 ops/sec): 20.0 MB/s</td>
<td>(254718.5 ops/sec): 343.9 MB/s</td>
</tr>
<tr class="odd">
<td>ChaCha20-Poly1305 (1350 bytes)</td>
<td>(33108.8 ops/sec): 44.7 MB/s</td>
<td>(67622.8 ops/sec): 91.3 MB/s</td>
</tr>
<tr class="even">
<td>SHA-256 (8192 bytes)</td>
<td>(6824.7 ops/sec): 55.9 MB/s</td>
<td>(63214.7 ops/sec): 517.9 MB/s</td>
</tr>
<tr class="odd">
<td>SHA-512 (8192 bytes)</td>
<td>(14014.7 ops/sec): 114.8 MB/s</td>
<td>(14759.6 ops/sec): 120.9 MB/s</td>
</tr>
<tr class="even">
<td>RNG (8192 bytes)</td>
<td>(4058.7 ops/sec): 33.2 MB/s</td>
<td>(55705.4 ops/sec): 456.3 MB/s</td>
</tr>
<tr class="odd">
<td>ECDH P-256 operations</td>
<td>(594.7 ops/sec)</td>
<td>(642.8 ops/sec)</td>
</tr>
<tr class="even">
<td>ECDSA P-256 signing</td>
<td>(1396.6 ops/sec)</td>
<td>(1738.5 ops/sec)</td>
</tr>
<tr class="odd">
<td>ECDSA P-256 verify</td>
<td>(672.1 ops/sec)</td>
<td>(704.2 ops/sec)</td>
</tr>
</tbody>
</table>
</section>
</section>
<section id="comparing-mbedtls-and-boringssl-based-client" class="level3">
<h3 class="anchored" data-anchor-id="comparing-mbedtls-and-boringssl-based-client">Comparing <code>mbedTLS</code> and <code>BoringSSL</code> based client</h3>
<section id="default-compilation" class="level4">
<h4 class="anchored" data-anchor-id="default-compilation">Default compilation</h4>
<p>These results are as close to the best possible performance as we should expect on ARMv8 when using BoringSSL as a client.</p>
<ul>
<li><p>Performance:</p>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Test</th>
<th>mbedTLS</th>
<th>BoringSSL</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Handshakes - RSA_2048 (x200)</td>
<td>0m21.69s</td>
<td>0m03.12s</td>
</tr>
<tr class="even">
<td>Handshakes - ECDSA_256 (x200)</td>
<td>0m24.34s</td>
<td>0m01.38s</td>
</tr>
<tr class="odd">
<td>Write - ECDSA_256 (AES-GCM)</td>
<td>0m16.28s</td>
<td>0m03.94s</td>
</tr>
<tr class="even">
<td>Read - ECDSA_256 (AES-GCM)</td>
<td>0m17.49s</td>
<td>0m03.92s</td>
</tr>
</tbody>
</table></li>
</ul>
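<p>To put the totals above into per-handshake terms, the handshake rows can be divided by the 200 iterations. This is just a sketch over the numbers from the table; the helper names <code>per_handshake_ms</code> and <code>speedup</code> are made up for this illustration.</p>
<pre class="shell"><code># Number of handshakes per run, as stated in the table above.
HANDSHAKES=200

# per_handshake_ms TOTAL_SECONDS: average latency of one handshake in ms.
per_handshake_ms() {
    awk -v total="$1" -v n="$HANDSHAKES" 'BEGIN { printf "%.1f", total * 1000 / n }'
}

# speedup SLOW_SECONDS FAST_SECONDS: how many times faster the second run is.
speedup() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f", slow / fast }'
}

echo "RSA_2048:  mbedTLS $(per_handshake_ms 21.69) ms, BoringSSL $(per_handshake_ms 3.12) ms, ratio $(speedup 21.69 3.12)x"
echo "ECDSA_256: mbedTLS $(per_handshake_ms 24.34) ms, BoringSSL $(per_handshake_ms 1.38) ms, ratio $(speedup 24.34 1.38)x"</code></pre>
<p>For the default build this works out to about 15.6 ms per RSA_2048 handshake with BoringSSL versus roughly 108 ms with mbedTLS, and a ratio of about 17.6 for ECDSA_256.</p>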
<p>I could find the following reasons for the difference in performance:</p>
<ol type="1">
<li><p>BoringSSL contains support for ARMv8 crypto extensions implemented in hardware (AES, PMULL, SHA256), which mbedTLS doesn’t support yet [6]. BoringSSL also uses vector instructions (NEON) for some algorithms; NEON can be found on both ARMv7 (optional) and ARMv8 (mandatory). The algorithms used in this test do not use NEON. However, Poly1305-ChaCha20 does use NEON, and this is important because it could speed up devices based on ARMv7. Those devices do not offer hardware-accelerated AES, and hence AES will be much slower on them. The Poly-ChaCha implementation is only available in <code>BoringSSL</code>. One more comment on hardware support: it is discovered at runtime, and BoringSSL will fall back to a software implementation (or to NEON and then software) in case the CPU doesn’t support the required extension.</p></li>
<li><p>The BoringSSL client supports the X25519 curve. mbedTLS, on the other hand, doesn’t support this curve in TLS (it supports it only as a primitive [10]). In the test above <code>mbedTLS</code> used <code>NIST P-384</code>. Arithmetic on the X25519 curve is much more efficient than on <code>P-384</code>. It’s obviously wrong to compare two different curves, so one of the tests below enforces usage of P-256.</p></li>
<li><p>It seems <code>mbedTLS</code> does more I/O - it sends more TCP packets than BoringSSL:</p>
<ul>
<li>exchanged TCP packets were generally bigger (for example ClientHello: 470B for mbedTLS and 213B for BoringSSL)</li>
<li>mbedTLS sends “Client Key Exchange” and “Change Cipher Spec” in separate TCP packets, which is not the case for BoringSSL</li>
</ul></li>
</ol>
<p>According to the mbedTLS forum, every TLS message is sent using the <code>send</code> bio callback, and the default implementation sends every message separately. One could supply a custom send callback that concatenates consecutive messages and sends them as one TCP packet. Nevertheless, this wasn’t done during this analysis.</p>
<p>The following two tests build the libraries and TLS clients with different profiles, hopefully eliminating as many of the differences described above as possible.</p>
<ul>
<li><p>Software implementation only</p>
<p>For this test I’ve built a BoringSSL client which uses only crypto implemented in software and doesn’t use hardware acceleration. These results should help to understand how BoringSSL will behave on CPUs which don’t provide such features.</p>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Test</th>
<th>mbedTLS</th>
<th>BoringSSL</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Handshakes - RSA_2048 (x200)</td>
<td>0m20.89s</td>
<td>0m04.89s</td>
</tr>
<tr class="even">
<td>Handshakes - ECDSA_256 (x200)</td>
<td>0m23.80s</td>
<td>0m01.66s</td>
</tr>
<tr class="odd">
<td>Write - ECDSA_256 (AES-GCM)</td>
<td>0m16.41s</td>
<td>0m13.79s</td>
</tr>
<tr class="even">
<td>Read - ECDSA_256 (AES-GCM)</td>
<td>0m17.51s</td>
<td>0m13.72s</td>
</tr>
</tbody>
</table>
<p>Ok, so mostly symmetric encryption is affected.</p></li>
<li><p>Enforcing usage of NIST P-256 curve for ECDHE</p>
<p>This test enforces usage of the NIST P-256 curve. This mostly affects handshake time and eliminates some of the differences seen in the first performance test.</p>
<table class="caption-top table">
<thead>
<tr class="header">
<th>Test</th>
<th>mbedTLS</th>
<th>BoringSSL</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Handshakes - RSA_2048 (x200)</td>
<td>0m16.88s</td>
<td>0m03.56s</td>
</tr>
<tr class="even">
<td>Handshakes - ECDSA_256 (x200)</td>
<td>0m20.26s</td>
<td>0m01.89s</td>
</tr>
</tbody>
</table></li>
</ul>
</section>
</section>
<section id="other-things" class="level3">
<h3 class="anchored" data-anchor-id="other-things">Other things</h3>
<p>BoringSSL seems to be the better choice; let’s see what else it offers.</p>
<section id="using-eddsa-with-x25519-for-ecdhe" class="level4">
<h4 class="anchored" data-anchor-id="using-eddsa-with-x25519-for-ecdhe">Using EdDSA with X25519 for ECDHE</h4>
<p>Along the way, I’ve found out that BoringSSL offers the possibility to use Ed25519 with TLSv1.2. The results below show the differences when performing 500 handshakes with Ed25519, ECDSA/P-256 and RSA/2048. The CA certificate is still RSA/2048 (the same as was used for the other tests).</p>
<ul>
<li><p>Performance:</p>
<table class="caption-top table">
<colgroup>
<col style="width: 32%">
<col style="width: 17%">
<col style="width: 14%">
<col style="width: 16%">
<col style="width: 19%">
</colgroup>
<thead>
<tr class="header">
<th>Handshake x500</th>
<th>TLS handsh.</th>
<th>PubKey</th>
<th>Sign size</th>
<th>Degradation</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>Handshakes - Ed25519</td>
<td>0m02.72s</td>
<td>256 bits</td>
<td>512 bits</td>
<td></td>
</tr>
<tr class="even">
<td>Handshakes - ECDSA</td>
<td>0m03.47s</td>
<td>256 bits</td>
<td>512 bits</td>
<td>27.6%</td>
</tr>
<tr class="odd">
<td>Handshakes - RSA</td>
<td>0m07.83s</td>
<td>2048 bits</td>
<td>2048 bits</td>
<td>187.9%</td>
</tr>
</tbody>
</table></li>
</ul>
<p>It’s worth noting that Ed25519 and ECDSA/P-256 offer the same security level, and RSA/2048 is a bit weaker. Nevertheless, Ed25519 certificates are not yet very popular.</p>
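<p>The “Degradation” column can be read as the extra time relative to the Ed25519 run. A minimal sketch of that calculation, using the handshake times from the table (the helper name <code>degradation_pct</code> is made up for this illustration):</p>
<pre class="shell"><code># degradation_pct BASELINE_SECONDS MEASURED_SECONDS
# Prints how much slower the measured run is than the baseline, in percent.
degradation_pct() {
    awk -v base="$1" -v t="$2" 'BEGIN { printf "%.1f", (t - base) / base * 100 }'
}

# Baseline is the Ed25519 run (2.72 s for 500 handshakes).
echo "ECDSA vs Ed25519: $(degradation_pct 2.72 3.47)%"
echo "RSA   vs Ed25519: $(degradation_pct 2.72 7.83)%"</code></pre>
<p>This yields 27.6% for ECDSA and 187.9% for RSA; put differently, the RSA run takes about 2.88 times as long as the Ed25519 run.</p>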
</section>
<section id="tls-1.3-0-rtt" class="level4">
<h4 class="anchored" data-anchor-id="tls-1.3-0-rtt">TLS 1.3 &amp; 0-RTT</h4>
<p>Only BoringSSL supports TLS 1.3; at the moment it implements the latest draft of the standard (28). The gains from using TLS v1.3 (and 0-RTT) are well described in [13].</p>
</section>
</section>
</section>
<section id="out-of-scope-left-for-further-analysis" class="level2">
<h2 class="anchored" data-anchor-id="out-of-scope-left-for-further-analysis">Out of scope / left for further analysis:</h2>
<p>A few points that were not checked:</p>
<ul>
<li>32-bit (<code>armeabi-v7a</code>) code may be smaller and still run on ARM64. Thumb mode (variable-length instruction set) will produce even more compact code. Thumb mode is the default setting in the NDK</li>
<li>Something I haven’t checked is power consumption, which is important for mobile applications. It’s not complicated to measure, but requires specific hardware (see [18] and [19]). I assumed that whatever executes in less time will consume less power. But this assumption should be verified, as it’s probably not always true.</li>
<li>Implementing hardware acceleration in <code>mbedTLS</code> is an obvious improvement which should be considered. See <a href="https://tls.mbed.org/kb/development/hw_acc_guidelines">here</a> for more details. It is also a highly time-consuming task.</li>
<li><code>mbedTLS</code> supports so-called “alternative” implementations. One idea for using them would be to swap the existing ECC implementation for either a smaller or a faster one (for a smaller implementation I would recommend uECC, which can be as small as 4KB). Another option could be to use the small SSL implementation from mbedTLS and a fast crypto implementation from BoringSSL or NaCL [17].</li>
<li><code>mbedTLS</code> has a configuration option called <code>MBEDTLS_SSL_MAX_CONTENT_LEN</code> which determines the size of the internal I/O buffer. Playing with this value may help improve performance or reduce size.</li>
<li>Performance of Poly-ChaCha on ARMv7</li>
</ul>
</section>
<section id="conclusion" class="level2">
<h2 class="anchored" data-anchor-id="conclusion">Conclusion</h2>
<p>My preference goes to BoringSSL for the following reasons:</p>
<ul>
<li>It offers much better performance on ARM</li>
<li>It offers more features like TLSv1.3 and Curve25519</li>
<li>It compiles to a binary of reasonable size. The smallest possible resulting library is 3 times bigger than the one based on <code>mbedTLS</code>, but the overall result is still just 350KB. The difference between the smallest possible <code>mbedTLS</code>-based client and the <code>BoringSSL</code> one is just 248KB. Let’s say the library is linked into each and every application on the phone. Assuming the user has 100 apps on the phone, the difference in size is about 24MB, which nowadays is negligible. Also, according to a report by Statista [14], on average users have 27 apps installed on the phone (which is less an argument and more an interesting piece of information).</li>
<li>BoringSSL is the default TLS library on Android and is a Google product. It means that there is a lot of interest in making it even more secure and fast.</li>
<li>Recently BoringSSL received formally verified implementations of Curve25519 and P-256 (see [15])</li>
</ul>
<p>It seems the two libraries have very different design goals. mbedTLS is made for resource-constrained embedded systems, which face challenges in terms of memory availability. Embedded platforms often do not exceed 256KB of RAM, often don’t have memory management units and cannot support virtual memory; as a result, dynamic allocation is avoided. I believe that for such systems mbedTLS is unbeatable and a great choice.</p>
<p>BoringSSL doesn’t seem to have a similar design goal. It seems to be designed for devices which offer more RAM and storage space, and which in general have a much different profile than resource-constrained embedded systems. Mobile devices offer all those features, and it would be a huge mistake not to make use of them.</p>
<p>When thinking about software design, there is a great difference between aiming for “reasonably small” and “smallest possible binary size” - those are basically two different goals.</p>
</section>
<section id="finally" class="level2">
<h2 class="anchored" data-anchor-id="finally">Finally</h2>
<p>I would like to thank Ron E. from the mbedTLS team for answering all my questions.</p>
<p>UPDATE: Recently one of my co-workers has implemented a performance improvement for ARM64. It is a small change which gives a good speedup - see more details <a href="https://github.com/ARMmbed/mbedtls/pull/1964">here</a>.</p>
</section>
<section id="footnotes" class="level2">
<h2 class="anchored" data-anchor-id="footnotes">Footnotes</h2>
<ul>
<li>[0] Android NDK: reducing binary sizes: https://blog.algolia.com/android-ndk-how-to-reduce-libs-size/</li>
<li>[1] to check: https://stackoverflow.com/questions/6771905/how-to-decrease-the-size-of-generated-binaries</li>
<li>[2] C/C++ reducing size http://ptspts.blogspot.co.uk/2013/12/how-to-make-smaller-c-and-c-binaries.html</li>
<li>[3] “Link time optimization” in https://www.iecc.com/linker/linker11.html</li>
<li>[4] LTO GCC: https://gcc.gnu.org/onlinedocs/gccint/LTO-Overview.html</li>
<li>[5] LTO LLVM: https://llvm.org/docs/LinkTimeOptimization.html</li>
<li>[6] https://github.com/ARMmbed/mbedtls/pull/1424</li>
<li>[7] “Link time garbage collection” in https://www.iecc.com/linker/linker11.html</li>
<li>[8] https://github.com/android-ndk/ndk/issues/436</li>
<li>[9] https://tls.mbed.org/kb/how-to/reduce-mbedtls-memory-and-storage-footprint</li>
<li>[10] https://github.com/ARMmbed/mbedtls/issues/941</li>
<li>[11] https://wiki.linaro.org/WorkingGroups/PowerManagement/Resources/EAS</li>
<li>[12] https://boringssl.googlesource.com/boringssl/+/HEAD/BUILDING.md</li>
<li>[13] https://blog.cloudflare.com/introducing-0-rtt/</li>
<li>[14] https://www.apptentive.com/blog/2017/06/22/how-many-mobile-apps-are-actually-used/</li>
<li>[15] https://boringssl.googlesource.com/boringssl/+/HEAD/third_party/fiat/</li>
<li>[16] https://tls.mbed.org/discussions/feature-request/any-plans-for-tls-1-3-support</li>
<li>[17] https://eprint.iacr.org/2018/354/20180418:202819</li>
<li>[18] https://source.android.com/devices/tech/power/component</li>
<li>[19] https://developer.arm.com/products/software-development-tools/ds-5-development-studio/streamline/arm-energy-probe</li>
</ul>


</section>

 ]]></description>
  <guid>https://www.amongbytes.com/posts/201804-comparing-mbedtls-to-boringssl.html</guid>
  <pubDate>Thu, 19 Apr 2018 00:00:00 GMT</pubDate>
  <media:content url="https://www.thyrasec.com/wp-content/uploads/2024/05/image.png" medium="image" type="image/png"/>
</item>
<item>
  <title>How to run GDB on Android</title>
  <dc:creator>Kris Kwiatkowski</dc:creator>
  <link>https://www.amongbytes.com/posts/201804-debugging-on-android.html</link>
  <description><![CDATA[ 





<p>The Android NDK comes with GDB; somewhere in the NDK folder one can find the <code>gdbserver</code> and <code>gdb</code> binaries. The idea is obviously to run <code>gdbserver</code> on the device and then connect to it from the local host with <code>gdb</code>. For that to work, both server and client need to have access to the binary being debugged (that’s because both need to have debugging symbols).</p>
<p>Let’s say I want to debug something which is called <code>main</code>. The first step would be to export some variables:</p>
<pre class="shell"><code># Change line below to wherever you keep NDK
NDK_DIR=/opt/android-ndk

HOST_GDBSERVER=${NDK_DIR}/prebuilt/android-arm64/gdbserver/gdbserver
HOST_GDB=${NDK_DIR}/prebuilt/linux-x86_64/bin/gdb

HOST_APP=/tmp/main
TARGET_APP=/data/app/main
TARGET_GDBSERVER=/data/app/gdbserver
PORT=5039</code></pre>
<p>Then in one terminal I would start <code>gdbserver</code></p>
<pre><code>adb forward tcp:${PORT} tcp:${PORT}
adb push ${HOST_GDBSERVER} ${TARGET_GDBSERVER}
adb shell ${TARGET_GDBSERVER} :${PORT} ${TARGET_APP}</code></pre>
<p>And <code>gdb</code> in another terminal:</p>
<pre><code>${HOST_GDB} ${HOST_APP}</code></pre>
<p>While in <code>gdb</code>, you can connect to the gdb server:</p>
<pre><code>target remote :5039</code></pre>
<p>That’s it, easy-peasy. Happy debugging!</p>
<!--
## How to run Valgrind
TODO:
-->



 ]]></description>
  <guid>https://www.amongbytes.com/posts/201804-debugging-on-android.html</guid>
  <pubDate>Sun, 15 Apr 2018 19:51:13 GMT</pubDate>
  <media:content url="https://www.91-cdn.com/hub/wp-content/uploads/2023/09/New-Android-logo-2023.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Creating certificates for TLS testing</title>
  <dc:creator>Kris Kwiatkowski</dc:creator>
  <link>https://www.amongbytes.com/posts/201804-creating-certificates-for-ssl-testing.html</link>
  <description><![CDATA[ 





<p>In some cases, you need to create your own chain of certificates - CA and server (for example for TLS testing). There are many descriptions out there on how to do it; nevertheless, I couldn’t find any copy-paste examples which would give me RSA, ECDSA and EdDSA certificates. Hence, below one can find some instructions on how to use <code>openssl</code> to quickly create your own certs, which can then be used during TLS verification.</p>
<p>This post doesn’t explain the meaning of the configuration used. If such an explanation is needed, I would suggest reading <em>“Network Security with OpenSSL: Cryptography for Secure Communications”</em>, by J. Viega, or looking for the required information at this <a href="https://jamielinux.com/docs/openssl-certificate-authority/introduction.html">blog</a>.</p>
<section id="configuration-file" class="level2">
<h2 class="anchored" data-anchor-id="configuration-file">Configuration file</h2>
<p>OpenSSL uses a configuration file to store information required during certificate creation. The configuration file contains things like organization name, address, location, internet address, default hash algorithm used to produce signatures, etc.</p>
<p>Both my example CA and the organization for which the server certificate will be created are called “Cert Testing Organization”, with the address <code>www.cert_testing.com</code>.</p>
<p>Below is the configuration file used in this example. Copy &amp; paste it to the file <code>openssl.cnf</code>:</p>
<pre class="shell"><code>[ ca ]
# `man ca`
default_ca = CA_default

[ CA_default ]
# Directory and file locations.
dir               = .
certs             = $dir/certs
crl_dir           = $dir/crl
new_certs_dir     = $dir/newcerts
database          = $dir/index.txt
serial            = $dir/serial
RANDFILE          = $dir/private/.rand

# The root key and root certificate.
private_key       = $dir/root.key
certificate       = $dir/root.pem

# For certificate revocation lists.
crlnumber         = $dir/crlnumber
crl               = $dir/crl/intermediate.crl.pem
crl_extensions    = crl_ext
default_crl_days  = 30

# SHA-1 is deprecated, so use SHA-2 instead.
default_md        = sha256

name_opt          = ca_default
cert_opt          = ca_default
default_days      = 9999
preserve          = no
policy            = policy_loose

[ policy_strict ]
# The root CA should only sign intermediate certificates that match.
# See the POLICY FORMAT section of `man ca`.
countryName             = match
stateOrProvinceName     = match
organizationName        = match
organizationalUnitName  = optional
commonName              = supplied
emailAddress            = optional

[ policy_loose ]
# Allow the intermediate CA to sign a more diverse range of certificates.
# See the POLICY FORMAT section of the `ca` man page.
countryName             = optional
stateOrProvinceName     = optional
localityName            = optional
organizationName        = optional
organizationalUnitName  = optional
commonName              = supplied
emailAddress            = optional

[ req ]
# Options for the `req` tool (`man req`).
default_bits        = 4096
distinguished_name  = req_distinguished_name
string_mask         = utf8only

[ req_distinguished_name ]
countryName                     = Country Name (2 letter code)
stateOrProvinceName             = State or Province Name (full name)
localityName                    = Locality Name (eg, city)
organizationalUnitName          = Organizational Unit Name (eg, section)
commonName                      = Common Name

stateOrProvinceName_default     = PACA
countryName_default             = FR
localityName_default            = Cagnes sur Mer
organizationalUnitName_default  = Cert Testing Organization
commonName_default              = Cert Testing Organization
commonName_max                  = 64

[ v3_ca ]
# Extensions for a typical CA (`man x509v3_config`).
subjectKeyIdentifier        = hash
authorityKeyIdentifier      = keyid:always,issuer
basicConstraints            = critical, CA:true
keyUsage                    = critical, digitalSignature, cRLSign, keyCertSign

[ v3_intermediate_ca ]
# Extensions for a typical intermediate CA (`man x509v3_config`).
subjectKeyIdentifier        = hash
authorityKeyIdentifier      = keyid:always,issuer
basicConstraints            = critical, CA:true, pathlen:0
keyUsage                    = critical, digitalSignature, cRLSign, keyCertSign

[ usr_cert ]
# Extensions for client certificates (`man x509v3_config`).
basicConstraints        = CA:FALSE
nsCertType              = client, email
nsComment               = 'Cert Testing Intermediate - Client'
subjectKeyIdentifier    = hash
authorityKeyIdentifier  = keyid,issuer
keyUsage                = critical, nonRepudiation, digitalSignature, keyEncipherment
extendedKeyUsage        = clientAuth, emailProtection

[ server_cert ]
# Extensions for server certificates (`man x509v3_config`).
basicConstraints        = CA:FALSE
nsCertType              = server
nsComment               = 'Cert Testing Intermediate - Server'
subjectKeyIdentifier    = hash
authorityKeyIdentifier  = keyid,issuer:always
keyUsage                = critical, digitalSignature, keyEncipherment
extendedKeyUsage        = serverAuth
subjectAltName          = @alt_names

[ client_cert ]
# Extensions for client certificates (`man x509v3_config`).
basicConstraints        = CA:FALSE
nsCertType              = client, email
nsComment               = 'Cert Testing EE - Client'
subjectKeyIdentifier    = hash
authorityKeyIdentifier  = keyid,issuer
keyUsage                = critical, nonRepudiation, digitalSignature, keyEncipherment
extendedKeyUsage        = clientAuth, emailProtection

[ crl_ext ]
# Extension for CRLs (`man x509v3_config`).
authorityKeyIdentifier  = keyid:always

[ ocsp ]
# Extension for OCSP signing certificates (`man ocsp`).
basicConstraints        = CA:FALSE
subjectKeyIdentifier    = hash
authorityKeyIdentifier  = keyid,issuer
keyUsage                = critical, digitalSignature
extendedKeyUsage        = critical, OCSPSigning

[alt_names]
DNS.1   = *.cert_testing.com
IP.1    = 127.0.0.1</code></pre>
</section>
<section id="preparation" class="level2">
<h2 class="anchored" data-anchor-id="preparation">Preparation</h2>
<p>We will need some directories where output of cert generation will be stored:</p>
<pre class="shell"><code>mkdir -p private
mkdir -p certs
mkdir -p csr</code></pre>
</section>
<section id="ca-cert-creation" class="level2">
<h2 class="anchored" data-anchor-id="ca-cert-creation">CA cert creation</h2>
<ol type="1">
<li><p>CA private key</p>
<p>The first step is to create the private key of the CA cert. The root cert will use an RSA keypair with a key length of 4096 bits.</p>
<p>OpenSSL will ask for a password - provide <code>test123</code>.</p>
<p><!-- just remove -aes256 and it won't use password --></p></li>
</ol>
<pre class="shell"><code>    openssl genrsa -aes256 -out private/ca.key 4096</code></pre>
<p>Or, in the case of ECDSA certificates:</p>
<pre class="shell"><code>    openssl ecparam -name prime256v1 -genkey -noout -out private/ca.key
    openssl ec -in private/ca.key -out private/ca.key -aes256</code></pre>
<p>Here the second line (encrypting <code>ca.key</code>) is needed only so that the rest of the article stays copy-pasteable.</p>
<ol start="2" type="1">
<li><p>Create CA cert</p>
<p>This command uses the key created above to create a self-signed CA certificate. The certificate will be valid for 9999 days.</p>
<p>Provide the password <code>test123</code> and hit Enter for everything else. <code>openssl</code> will use the values defined in <code>openssl.cnf</code>.</p></li>
</ol>
<pre class="shell"><code>     openssl req -config openssl.cnf \
        -extensions v3_ca -new -x509 -days 9999 \
        -key private/ca.key \
        -out certs/ca.cert</code></pre>
<p>One interesting option to notice is <code>-extensions v3_ca</code>. It refers to the section with the same name in <code>openssl.cnf</code>, which tells <code>openssl</code> that the created certificate must be a CA certificate (<code>CA:true</code>).</p>
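<p>To double-check the result, the certificate can be printed back. This is a minimal sketch: <code>show_ca_info</code> is a hypothetical helper name, and the <code>grep</code> pattern assumes the usual <code>-text</code> output of <code>openssl x509</code>, where the value follows the "Basic Constraints" line.</p>
<pre class="shell"><code># Hypothetical helper: print subject, issuer, validity dates and the
# Basic Constraints extension of the certificate given as first argument.
show_ca_info() {
    openssl x509 -in "$1" -noout -subject -issuer -dates
    openssl x509 -in "$1" -noout -text | grep -A1 "Basic Constraints"
}</code></pre>
<p>Running <code>show_ca_info certs/ca.cert</code> should show an identical subject and issuer (the certificate is self-signed) and a <code>CA:TRUE</code> entry.</p>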
</section>
<section id="server-cert-creation" class="level2">
<h2 class="anchored" data-anchor-id="server-cert-creation">Server cert creation</h2>
<p>In this example, certificate signing is done in 3 steps.</p>
<ul>
<li>Create server certificate private key</li>
<li>Create certificate signing request</li>
<li>Sign the request with CA private key</li>
</ul>
<p>So let’s do it.</p>
<ol type="1">
<li><p>Server’s private key (I skip creating intermediate certificates for brevity).</p>
<ul>
<li>RSA/2048 with e=3, for fast verification</li>
</ul></li>
</ol>
<pre class="shell"><code>        openssl genpkey -algorithm RSA \
            -pkeyopt rsa_keygen_bits:2048 \
            -pkeyopt rsa_keygen_pubexp:3 \
            -out private/rsa_2048.key</code></pre>
<ul>
<li>ECDSA/P-256</li>
</ul>
<pre class="shell"><code>        openssl genpkey -algorithm EC \
            -pkeyopt ec_paramgen_curve:P-256 \
            -pkeyopt ec_param_enc:named_curve \
            -out private/ecdsa_p256.key</code></pre>
<ul>
<li>EdDSA/25519 (supported by newer versions of <code>openssl</code> and in TLS 1.3 only)</li>
</ul>
<pre class="shell"><code>        openssl genpkey -algorithm Ed25519 \
            -out private/ed25519.key</code></pre>
<ol start="2" type="1">
<li><p>Create certificate signing request - intermediary step</p>
<ul>
<li>RSA</li>
</ul>
<pre class="shell"><code>     openssl req -config openssl.cnf -new \
       -sha256 \
       -passin pass:test123 \
       -key private/rsa_2048.key \
       -out csr/rsa_2048.csr</code></pre>
<ul>
<li>ECDSA</li>
</ul>
<pre class="shell"><code>     openssl req -config openssl.cnf -new \
       -sha256 \
       -passin pass:test123 \
       -key private/ecdsa_p256.key  \
       -out csr/ecdsa_p256.csr</code></pre>
<ul>
<li>EdDSA</li>
</ul>
<pre class="shell"><code>     openssl req -config openssl.cnf -new \
       -passin pass:test123 \
       -key private/ed25519.key  \
       -out csr/ed25519.csr</code></pre></li>
<li><p>Create server cert</p>
<p>Finally, we can create the set of server certificates.</p>
<ul>
<li>RSA</li>
</ul>
<pre class="shell"><code>     openssl x509 \
       -extfile openssl.cnf \
       -extensions server_cert -sha256 -req  \
       -CA certs/ca.cert -CAkey private/ca.key -CAcreateserial \
       -passin pass:test123 \
       -in csr/rsa_2048.csr \
       -out certs/rsa_2048.cert \
       -days 9999</code></pre>
<ul>
<li>ECDSA</li>
</ul>
<pre class="shell"><code>     openssl x509 \
       -extfile openssl.cnf \
       -extensions server_cert -sha256 -req  \
       -CA certs/ca.cert -CAkey private/ca.key -CAcreateserial \
       -passin pass:test123 \
       -in csr/ecdsa_p256.csr \
       -out certs/ecdsa_256.cert \
       -days 9999</code></pre>
<ul>
<li>EdDSA</li>
</ul>
<pre class="shell"><code>     openssl x509 \
       -extfile openssl.cnf \
       -extensions server_cert -req  \
       -CA certs/ca.cert -CAkey private/ca.key -CAcreateserial \
       -passin pass:test123 \
       -in csr/ed25519.csr \
       -out certs/ed25519.cert \
       -days 9999</code></pre></li>
</ol>
<p>It is currently believed that all the private keys created above provide broadly similar attack resistance, roughly comparable to a 128-bit symmetric cipher (RSA-2048 is usually rated slightly lower, at about 112 bits). Nevertheless, it is worth noticing that the byte sizes of those keys differ considerably.</p>
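<p>The size difference is easy to see by counting the bytes of the PEM files. The sketch below is self-contained: it generates throwaway keys in a temporary directory (the <code>tmp</code> variable and the file names are only for illustration) instead of touching the files in <code>private/</code>, and it assumes an <code>openssl</code> version with Ed25519 support.</p>
<pre class="shell"><code># Generate one throwaway key per algorithm in a temporary directory
tmp=$(mktemp -d)
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out "$tmp/rsa_2048.key"
openssl genpkey -algorithm EC -pkeyopt ec_paramgen_curve:P-256 -out "$tmp/ecdsa_p256.key"
openssl genpkey -algorithm Ed25519 -out "$tmp/ed25519.key"
# wc -c prints the size in bytes of each PEM-encoded private key
wc -c "$tmp"/*.key</code></pre>
<p>The RSA key should come out several times larger than the two elliptic-curve keys. The temporary directory can be removed afterwards with <code>rm -r "$tmp"</code>.</p>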
</section>
<section id="client-cert-creation" class="level2">
<h2 class="anchored" data-anchor-id="client-cert-creation">Client cert creation</h2>
<p>The commands below create a client private key and a certificate that can be used for mutual TLS (client authentication). The procedure is similar to creating a server certificate, so I’ll do it only for ECDSA.</p>
<ol type="1">
<li>Client’s private key</li>
</ol>
<pre class="shell"><code>openssl genpkey -algorithm EC \
          -pkeyopt ec_paramgen_curve:P-256 \
          -pkeyopt ec_param_enc:named_curve \
          -out private/cli_ecdsa_p256.key</code></pre>
<ol start="2" type="1">
<li>Create certificate signing request - intermediary step</li>
</ol>
<pre class="shell"><code>openssl req -config openssl.cnf -new \
          -sha256 \
          -passin pass:test123 \
          -key private/cli_ecdsa_p256.key  \
          -out csr/cli_ecdsa_p256.csr \
          -subj "/O=Cert Testing ORG/CN=Client Cert"</code></pre>
<ol start="3" type="1">
<li>Create client cert</li>
</ol>
<pre class="shell"><code>openssl x509 \
          -extfile openssl.cnf \
          -extensions client_cert \
          -req  \
          -CA certs/ca.cert \
          -CAkey private/ca.key \
          -CAcreateserial \
          -in csr/cli_ecdsa_p256.csr \
          -passin pass:test123 \
          -out certs/cli_ecdsa_p256.cert \
          -days 9999</code></pre>
</section>
<section id="verification" class="level2">
<h2 class="anchored" data-anchor-id="verification">Verification</h2>
<p>To verify a server certificate against the CA, the following command can be used.</p>
<pre class="shell"><code>&gt; openssl verify -CAfile certs/ca.cert certs/ecdsa_256.cert
certs/ecdsa_256.cert: OK</code></pre>
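<p>The client certificate can be checked in the same way. As a sketch, the hypothetical helper below also passes <code>-purpose</code>, so that <code>openssl verify</code> checks the certificate extensions against the intended use and not only the signature chain.</p>
<pre class="shell"><code># Hypothetical helper: verify a certificate against a CA for a given purpose
# usage: verify_purpose CA_FILE CERT_FILE PURPOSE   (e.g. sslclient, sslserver)
verify_purpose() {
    openssl verify -CAfile "$1" -purpose "$3" "$2"
}</code></pre>
<p>For example, <code>verify_purpose certs/ca.cert certs/cli_ecdsa_p256.cert sslclient</code> should report <code>OK</code>, while the same call with <code>sslserver</code> is expected to fail, because the <code>client_cert</code> section lists only <code>clientAuth</code> and <code>emailProtection</code> in <code>extendedKeyUsage</code>.</p>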
<p>That’s it, I hope it helps, but most of all I hope I won’t have to look for this stuff ever again.</p>
</section>
<section id="also" class="level2">
<h2 class="anchored" data-anchor-id="also">Also</h2>
<p>Thank you to <span class="citation" data-cites="mattcaswell">@mattcaswell</span> from the OpenSSL team for helping to figure out how to create EdDSA certs.</p>


</section>

 ]]></description>
  <guid>https://www.amongbytes.com/posts/201804-creating-certificates-for-ssl-testing.html</guid>
  <pubDate>Sun, 15 Apr 2018 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Looking for C++ object in a memory dump</title>
  <dc:creator>Kris Kwiatkowski</dc:creator>
  <link>https://www.amongbytes.com/posts/201201-how-find-cpp-object-address-of-a-class-in-the-core-file/</link>
  <description><![CDATA[ 





<p>When analyzing the core dump of a long-running C++ server application, it may be helpful to know the exact state of some objects created by the process. The question, then, is how to find such an object. A core file is a recording of the working memory of the process, so the task may not be trivial if the process uses a lot of memory.</p>
<p>I’ll assume that the C++ class has at least one virtual method. In that case, each object must contain a virtual pointer (V-pointer) to the V-table of its class. With the <code>nm</code> tool, it is easy to determine the address of a V-table. I can then use that address to determine the exact locations, in a core dump, of all the objects of that class, as all those objects contain a pointer into the V-table.</p>
<p>To demonstrate how the procedure works, let’s use the code below as the “server application”. We are looking for the object <code>t</code>.</p>
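<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode cpp code-with-copy"><code class="sourceCode cpp"><span id="cb1-0"><span class="pp" style="color: #AD0000; background-color: null; font-style: inherit;">#include </span><span class="im" style="color: #00769E; background-color: null; font-style: inherit;">&lt;cstdlib&gt;</span></span>
<span id="cb1-1"><span class="kw" style="color: #003B4F;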
background-color: null;
font-weight: bold;
font-style: inherit;">class</span> test <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb1-2"><span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">public</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span></span>
<span id="cb1-3">  <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">virtual</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>test<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(){}</span></span>
<span id="cb1-4"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">};</span></span>
<span id="cb1-5"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">int</span> main<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">()</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb1-6">  test t<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb1-7">  abort<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">();</span></span>
<span id="cb1-8">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb1-9"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span></code></pre></div></div>
<p>As mentioned, the <code>nm</code> command shows the address of the V-table of the <code>test</code> class.</p>
<pre class="shell"><code>$ nm -C myapp | grep "vtable for test"
0000000000400980 V vtable for test</code></pre>
<p>Ok, so the V-table address is <code>0x400980</code>. Now I need to find where a V-pointer with the matching value is stored. The value of a V-pointer is the address of the V-table + 16 (on a 64-bit system). To understand where this +16 comes from, we need to look at the layout of the V-table.</p>
<div class="quarto-figure quarto-figure-center">
<figure class="figure">
<p><img src="https://www.amongbytes.com/posts/201201-how-find-cpp-object-address-of-a-class-in-the-core-file/vtable_layout.png" class="img-fluid figure-img"></p>
<figcaption>V-Table</figcaption>
</figure>
</div>
<p>The figure above shows 5 segments. Typically, on a 64-bit system, each of those segments is 8 bytes long (4 on a 32-bit system). The V-table starts with a segment storing the value <code>0x00</code> (in the Itanium C++ ABI this is the offset-to-top field, which is zero here). The following segment contains the address of the typeinfo object of the class (used by the <code>typeid</code> operator). The next segment holds the address of the first virtual function declared in the class - in our case, it is the address of the destructor of the <code>test</code> class. The V-pointer points at this segment, which is why the value of the V-pointer is the address of the V-table + 16 (V-table + 2 segments).</p>
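<p>The layout can be inspected directly with GDB, which can read static data from the executable without running it. A sketch, where <code>dump_vtable</code> is a hypothetical helper name and the address is the one reported by <code>nm</code>:</p>
<pre class="shell"><code># Hypothetical helper: dump the first three 8-byte slots of a V-table
# usage: dump_vtable BINARY VTABLE_ADDRESS
dump_vtable() {
    gdb -batch -ex "x/3gx $2" "$1"
}</code></pre>
<p>Calling <code>dump_vtable myapp 0x400980</code> should print a zero, then the address of the typeinfo object, then the address of the first virtual function.</p>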
<p>The next step in my investigation is to find where the V-pointer is stored in the core file. The address of the V-table is <code>0x400980</code>, so the value to look for is <code>0x00400980 + 0x10 = 0x00400990</code>. x86-64 is little-endian, so in the dump the value appears with its bytes reversed, as <code>90 09 40 00</code>.</p>
<pre class="shell"><code>&gt; hexdump -C corefile | grep "90 09 40 00"
00001860  48 c7 00 90 09 40 00 5d  c3 90 90 90 90 90 90 90  |H....@.]........|
00002860  48 c7 00 90 09 40 00 5d  c3 90 90 90 90 90 90 90  |H....@.]........|
0005e410  90 09 40 00 00 00 00 00  00 00 00 00 00 00 00 00  |..@.............|</code></pre>
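<p>The byte pattern given to <code>grep</code> can also be derived mechanically from the value. A small sketch, assuming bash and GNU coreutils (<code>fold</code>, <code>tac</code>, <code>paste</code>); <code>le_bytes</code> is a hypothetical helper name:</p>
<pre class="shell"><code># Hypothetical helper: print a 32-bit value as space-separated bytes
# in little-endian order (least significant byte first)
le_bytes() {
    printf '%08x\n' "$1" | fold -w2 | tac | paste -sd' ' -
}

# V-table address + 16
le_bytes $((0x400980 + 0x10))
# prints: 90 09 40 00</code></pre>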
<p>We have got 3 possible places where an object may be located. I’ll use the third one for the rest of the description. I know that it is the one I’m looking for, but normally at this point one needs to determine which object is the interesting one by examining all of them. The offset of this match in the file is <code>0x005e410</code>.</p>
<p>Now we need to find out the address of this object in the memory of the process. This requires a small calculation, because:</p>
<pre class="shell"><code>object address = offset of the V-pointer in the core file + VMA address - VMA offset</code></pre>
<p>The VMA address and the VMA offset (the “VMA” and “File off” columns below) can be obtained with the <code>objdump</code> or <code>readelf</code> commands.</p>
<pre class="shell"><code>&gt; objdump -h corefile
Sections:
Idx Name          Size      VMA               LMA               File off  Algn
...
36 load26        00001000  00007f7fbc5c5000  0000000000000000  0003e000  2**12
                 CONTENTS, ALLOC, LOAD
37 load27        00022000  00007fff4f143000  0000000000000000  0003f000  2**12
                 CONTENTS, ALLOC, LOAD
38 load28        00001000  00007fff4f1f9000  0000000000000000  00061000  2**12
                 CONTENTS, ALLOC, LOAD, READONLY, CODE</code></pre>
<p>Section 37, named <code>load27</code>, is the interesting one. That’s because the offset of the V-pointer found by <code>hexdump</code> is <code>0x005e410</code>. This value is between <code>0x3f000</code> (“File off” column for section 37) and <code>0x61000</code> (“File off” column for section 38). The VMA address of this section is <code>00007fff4f143000</code> and its VMA offset is <code>0003f000</code>. According to the formula above, the address of the object will be:</p>
<pre class="shell"><code>0x5e410 + 0x7fff4f143000 - 0x3f000 = 0x7FFF4F162410</code></pre>
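<p>The same arithmetic can be done in the shell. A minimal sketch, assuming bash (for 64-bit arithmetic with hexadecimal literals); <code>core_vaddr</code> is a hypothetical helper name:</p>
<pre class="shell"><code># Hypothetical helper: translate an offset in the core file to a virtual address
# usage: core_vaddr MATCH_OFFSET SECTION_VMA SECTION_FILE_OFFSET
core_vaddr() {
    printf '0x%x\n' $(( $1 + $2 - $3 ))
}

core_vaddr 0x5e410 0x7fff4f143000 0x3f000
# prints: 0x7fff4f162410</code></pre>
<p>The three arguments are the offset of the match reported by <code>hexdump</code>, and the “VMA” and “File off” values of the section that contains it.</p>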
<p>Let’s now check with GDB if the described procedure is correct:</p>
<pre class="shell"><code>&gt; gdb myapp core
(gdb) p &amp;t
$1 = (test *) 0x7fff4f162410</code></pre>
<p>As we can see, the address of the <code>t</code> variable is the same as the one obtained from the calculation, so the procedure is correct.</p>



 ]]></description>
  <guid>https://www.amongbytes.com/posts/201201-how-find-cpp-object-address-of-a-class-in-the-core-file/</guid>
  <pubDate>Mon, 15 Jan 2018 00:00:00 GMT</pubDate>
  <media:content url="https://download.logo.wine/logo/C%2B%2B/C%2B%2B-Logo.wine.png" medium="image" type="image/png"/>
</item>
<item>
  <title>Compile C code in Android NDK</title>
  <link>https://www.amongbytes.com/posts/201603-compile-native-clibrary-for-android.html</link>
  <description><![CDATA[ 





<p>I have limited love for the Android tools provided by Google and have never understood why Google makes it so complicated to run native code on the device. In the end, Android is some form of Linux, and some parts of the Android framework are implemented in <code>C/C++</code>. I also have limited love for (and knowledge of) Java and don’t really like to use it.</p>
<p>Anyway, below I present 2 methods of compiling C programs with the Android NDK.</p>
<p>Let’s use the standard “hello world” as the application that we want to run on an Android dev board (<code>main.c</code>):</p>
<div class="code-copy-outer-scaffold"><div class="sourceCode" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode c code-with-copy"><code class="sourceCode c"><span id="cb1-1"><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">#include </span><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">&lt;stdio.h&gt;</span></span>
<span id="cb1-2"></span>
<span id="cb1-3"><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">int</span> main<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">()</span></span>
<span id="cb1-4"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span></span>
<span id="cb1-5">  printf<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">(</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Hello World</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">);</span></span>
<span id="cb1-6">  <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">return</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">;</span></span>
<span id="cb1-7"><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span></span></code></pre></div></div>
<section id="method-1-using-ndk-build" class="level2">
<h2 class="anchored" data-anchor-id="method-1-using-ndk-build">Method 1: Using <code>ndk-build</code></h2>
<p>This method follows the Android’ic way of doing things:</p>
<ul>
<li><p>Create required directories</p>
<pre class="shell"><code>mkdir -p hello_world/jni
mkdir -p hello_world/libs</code></pre></li>
</ul>
<p>In the <code>jni</code> directory create</p>
<ul>
<li><code>Android.mk</code></li>
</ul>
<pre class="make"><code>    LOCAL_PATH := $(call my-dir)
    include $(CLEAR_VARS)
    # give module name
    LOCAL_MODULE := hello_world
    # list your C files to compile
    LOCAL_SRC_FILES := main.c
    include $(BUILD_EXECUTABLE)</code></pre>
<ul>
<li>Copy or create <code>main.c</code> in the <code>jni</code> directory</li>
<li>Go to the <code>jni</code> directory and call <code>ndk-build</code>. The compilation result should be in <code>hello_world/libs/armeabi/hello_world</code></li>
</ul>
</section>
<section id="method-2-makefile" class="level2">
<h2 class="anchored" data-anchor-id="method-2-makefile">Method 2: Makefile</h2>
<p>With the second (and my preferred) way you have better control over the files being compiled, compiler settings, etc. There is also none of the “magic” that <code>ndk-build</code> provides.</p>
<p>The following Makefile uses <code>clang</code> from <code>NDK 16b</code> to compile a file for Android with API version 27 and for an ARMv8 CPU. The makefile can be used as a template.</p>
<pre class="make"><code># Change this to wherever you keep NDK
NDK            = /opt/android-ndk
SRCDIR         = .
OBJDIR         = .
DBG           ?= 0

# Debug/Release configuration
ifeq ($(DBG),1)
MODE_FLAGS     = -DDEBUG -g -O0
else
MODE_FLAGS     = -Os -fdata-sections -ffunction-sections
endif

## NDK configuration (clang)

# Android API level (platform version)
NDK_TARGETVER  = 27

# Target arch - here aarch64 for android
NDK_TARGETARCH = aarch64-linux-android

# Target CPU (ARMv8)
NDK_TARGETSHORTARCH = arm64

# Toolchain version
NDK_TOOLVER  = 4.9

# Architecture of a machine that does cross compilation
NDK_HOSTARCH = linux-x86_64

# Toolchain, header and library paths
NDK_TOOLS    = $(NDK)/toolchains/llvm/prebuilt/$(NDK_HOSTARCH)/bin
NDK_TOOL     = $(NDK_TOOLS)/clang
NDK_LIBS     = $(NDK)/toolchains/$(NDK_TARGETARCH)-$(NDK_TOOLVER)/prebuilt/linux-x86_64/lib/gcc/$(NDK_TARGETARCH)/4.9.x
NDK_INCLUDES = -I$(NDK)/sysroot/usr/include \
               -I$(NDK)/sysroot/usr/include/$(NDK_TARGETARCH)
# Platform sysroot passed to the linker
NDK_SYSROOT  = $(NDK)/platforms/android-$(NDK_TARGETVER)/arch-$(NDK_TARGETSHORTARCH)

# Options common to compiler and linker
OPT          = $(MODE_FLAGS) \
               -std=c99 \
               -fPIE \
               -Wall \
               -target $(NDK_TARGETARCH)

# Compiler options
CFLAGS       = $(OPT) \
               $(NDK_INCLUDES)

# Linker options
LDFLAGS      = $(OPT) \
               -pie \
               --sysroot=$(NDK_SYSROOT) \
               -B $(NDK)/toolchains/$(NDK_TARGETARCH)-$(NDK_TOOLVER)/prebuilt/linux-x86_64/$(NDK_TARGETARCH)/bin \
               -L$(NDK_LIBS)

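# Note: make requires every recipe line below to start with a TAB character.
.PHONY: all adb-prepare push run

all: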
    $(NDK_TOOL) -c $(SRCDIR)/main.c -o $(OBJDIR)/main.o $(CFLAGS)
    $(NDK_TOOL) -o main $(OBJDIR)/main.o $(LDFLAGS)

adb-prepare:
    adb root
    adb remount

push: adb-prepare
    adb push main /data/app/

run: adb-prepare push
    adb shell /data/app/main</code></pre>
<p>Copy this file to the same directory as <code>main.c</code> and try:</p>
<pre class="shell"><code>make all
make run</code></pre>
<p>This should compile the file, push it to the target and run it (if the target is connected).</p>
<pre class="shell"><code>hdc@cryptoden 23:49 &gt; ~/example 
&gt; make run   
adb root
adb remount
remount succeeded
adb push main /data/app/
main: 1 file pushed. 0.7 MB/s (6000 bytes in 0.008s)
adb shell /data/app/main
Hello World</code></pre>


</section>

 ]]></description>
  <guid>https://www.amongbytes.com/posts/201603-compile-native-clibrary-for-android.html</guid>
  <pubDate>Thu, 03 Mar 2016 00:00:00 GMT</pubDate>
  <media:content url="https://www.91-cdn.com/hub/wp-content/uploads/2023/09/New-Android-logo-2023.jpg" medium="image" type="image/jpeg"/>
</item>
<item>
  <title>Encrypting RaspberryPI root partition</title>
  <link>https://www.amongbytes.com/posts/201505-arch-linux-on-rpi.html</link>
  <description><![CDATA[ 





<p>This post describes how to encrypt the root partition of an already installed Arch Linux running on a Raspberry Pi. I assume that Arch Linux is already installed on the SD card and that the Pi boots correctly.</p>
<p>Tested on:</p>
<ul>
<li>Kernel 4.1.6 (it may <strong>not</strong> work with much older kernels)</li>
<li>Raspberry Pi model B revision 2</li>
</ul>
<section id="creating-initrd" class="level2">
<h2 class="anchored" data-anchor-id="creating-initrd">Creating initrd</h2>
<p>It is best to start with the steps that need to be done on the Raspberry Pi itself. We need to install mkinitcpio and create an initramfs image.</p>
<pre class="shell"><code>pacman -S mkinitcpio
cp /etc/mkinitcpio.conf ~/mkinitcpio.ripi.conf
vi ~/mkinitcpio.ripi.conf
</code></pre>
<p>Make sure that in the configuration file you have HOOKS and MODULES variables changed as below:</p>
<pre class="shell"><code>MODULES="dm_mod hid usbhid usbcore"
HOOKS="base udev autodetect modconf block filesystems keyboard encrypt fsck"
</code></pre>
<p>In MODULES the most important entry is <code>dm_mod</code>, and in HOOKS it is <code>encrypt</code>. The order of the entries in HOOKS is also very important. Once done, generate the new initramfs image.</p>
<pre class="shell"><code>mkinitcpio -k `uname -r` -c ~/mkinitcpio.ripi.conf -g /boot/initrd-crypt</code></pre>
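<p>Before rebooting, it may be worth checking that the generated image really contains what the <code>encrypt</code> hook needs. A sketch using <code>lsinitcpio</code>, which ships with mkinitcpio; <code>check_initrd</code> is a hypothetical helper name, and the <code>grep</code> pattern is an assumption about how the files are named inside the image:</p>
<pre class="shell"><code># Hypothetical helper: list the image content and keep only the entries
# related to cryptsetup and the device-mapper kernel module
check_initrd() {
    lsinitcpio "$1" | grep -E "cryptsetup|dm-mod"
}</code></pre>
<p>If <code>check_initrd /boot/initrd-crypt</code> prints nothing, the HOOKS and MODULES settings were most likely not picked up.</p>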
</section>
<section id="creating-encrypted-volume" class="level2">
<h2 class="anchored" data-anchor-id="creating-encrypted-volume">Creating encrypted volume</h2>
<p>This must be done on a PC. Insert the SD card, mount the root partition and copy its content to some temporary location. Don’t forget the trailing <code>/</code> after the source directory; it is important, because it tells <code>rsync</code> to copy the content of the directory rather than the directory itself.</p>
<pre class="shell"><code>mount /dev/mmcblk0p2 /media
mkdir /temporary_location
rsync --progress -axv /media/ /temporary_location/
</code></pre>
<p>Next step is to create encrypted volume, format it and copy back root partition content:</p>
<pre class="shell"><code>cryptsetup luksFormat /dev/mmcblk0p2
cryptsetup luksOpen /dev/mmcblk0p2 root-raspberry
mkfs.ext4 /dev/mapper/root-raspberry
mount /dev/mapper/root-raspberry /mnt
rsync --progress -axv /temporary_location/ /mnt</code></pre>
</section>
<section id="modification-in-etcfstab-mntbootconfig.txt-and-mntbootcmdline.txt-file" class="level2">
<h2 class="anchored" data-anchor-id="modification-in-etcfstab-mntbootconfig.txt-and-mntbootcmdline.txt-file">Modification in /etc/fstab, /mnt/boot/config.txt and /mnt/boot/cmdline.txt file</h2>
<p>Watch out here - many sources on the internet say that you need to specify an address at which the initramfs is loaded (something like <code>initramfs initrd-crypt 0x0a000000</code> in config.txt). This doesn’t work with kernel 4.1. It’s enough to specify the name of the initramfs file in config.txt and cmdline.txt.</p>
<ul>
<li><p><strong>/mnt/etc/fstab</strong>: Change device that mounts on /. File must have following entry (remove entry that starts with <code>/dev/mmcblk0p2</code>)</p>
<pre class="shell"><code>/dev/mapper/root / ext4 defaults,discard,commit=120 0 1</code></pre></li>
<li><p><strong>/mnt/boot/config.txt</strong>: Set initramfs. This file needs to have following line</p>
<pre class="shell"><code>initramfs initrd-crypt</code></pre></li>
<li><p><strong>/mnt/boot/cmdline.txt</strong>: Add following kernel command line arguments:</p>
<pre class="shell"><code>cryptdevice=/dev/mmcblk0p2:root:allow-discards root=/dev/mapper/root rootwait rootfstype=ext4 initrd=initrd-crypt</code></pre></li>
</ul>
<p>Unmount and close crypto device:</p>
<pre class="shell"><code>sync
umount /mnt
cryptsetup luksClose root-raspberry</code></pre>
<p>Now you can put the SD card back into the Raspberry Pi and boot the device. It should ask for the password while booting.</p>
</section>
<section id="password-on-usb-key" class="level2">
<h2 class="anchored" data-anchor-id="password-on-usb-key">Password on USB key</h2>
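<p>The Raspberry Pi can also read the password directly from a file on a USB key while booting. To set this up, create a key file with random content and add it to the LUKS volume. The <code>bs</code> and <code>count</code> options limit the size of the file (2048 bytes here, an arbitrary choice); without them <code>dd</code> would keep writing until the USB key is full.</p>
<pre class="shell"><code>dd if=/dev/urandom of=/mnt/sdb1/ripi.txt bs=512 count=4
cryptsetup luksAddKey /dev/mmcblk0p2 /mnt/sdb1/ripi.txt</code></pre>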
<p>And add following entry to cmdline.txt</p>
<pre class="shell"><code>cryptkey=/dev/disk/by-uuid/ABCD-EFGH:vfat:/ripi.txt</code></pre>
<p>The value for ABCD-EFGH can be obtained by running <code>blkid</code> on the partition of the USB key that contains the password file:</p>
<pre class="shell"><code>blkid /dev/sdb1
/dev/sdb1: UUID="ABCD-EFGH" TYPE="vfat"</code></pre>
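<p>Before relying on the key file at boot, it may be worth checking that it really unlocks the volume. A sketch, where <code>check_keyfile</code> is a hypothetical helper name; with <code>--test-passphrase</code>, <code>cryptsetup</code> only verifies the key and does not create a mapping:</p>
<pre class="shell"><code># Hypothetical helper: check that a key file unlocks a LUKS device
# without activating it. usage: check_keyfile DEVICE KEYFILE
check_keyfile() {
    cryptsetup open --type luks --test-passphrase --key-file "$2" "$1"
}</code></pre>
<p>For example, <code>check_keyfile /dev/mmcblk0p2 /mnt/sdb1/ripi.txt</code> should return with exit status 0 if the key file was accepted.</p>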
</section>
<section id="interesting-links" class="level2">
<h2 class="anchored" data-anchor-id="interesting-links">Interesting links</h2>
<ol type="1">
<li>https://www.pavelkogan.com/2014/05/23/luks-full-disk-encryption/</li>
<li>https://outflux.net/blog/archives/2017/08/30/grub-and-luks/</li>
</ol>


</section>

 ]]></description>
  <guid>https://www.amongbytes.com/posts/201505-arch-linux-on-rpi.html</guid>
  <pubDate>Wed, 20 May 2015 00:00:00 GMT</pubDate>
  <media:content url="https://uk.farnell.com/productimages/standard/en_GB/4255998-40.jpg" medium="image" type="image/jpeg"/>
</item>
</channel>
</rss>
