forked from eden-emu/eden

Compare commits: playtime-a...master (8 commits)

Commits:
440ee4916d
551f244dfd
ef14303c48
b7021afff6
bfc10723bc
30482692c7
31463142e1
bb836ed6c2

35 changed files with 344 additions and 1156 deletions
13  .patch/mbedtls/0002-aesni-fix.patch  (Normal file)

@@ -0,0 +1,13 @@
diff --git a/library/aesni.h b/library/aesni.h
index 754c984c79..59e27afd3e 100644
--- a/library/aesni.h
+++ b/library/aesni.h
@@ -35,7 +35,7 @@
/* GCC-like compilers: currently, we only support intrinsics if the requisite
 * target flag is enabled when building the library (e.g. `gcc -mpclmul -msse2`
 * or `clang -maes -mpclmul`). */
-#if (defined(__GNUC__) || defined(__clang__)) && defined(__AES__) && defined(__PCLMUL__)
+#if defined(__GNUC__) || defined(__clang__)
#define MBEDTLS_AESNI_HAVE_INTRINSICS
#endif
/* For 32-bit, we only support intrinsics */

22  .patch/mbedtls/0003-aesni-fix.patch  (Normal file)

@@ -0,0 +1,22 @@
diff --git a/library/aesni.c b/library/aesni.c
index 2857068..3e104ab 100644
--- a/library/aesni.c
+++ b/library/aesni.c
@@ -31,16 +31,14 @@
#include <immintrin.h>
#endif

-#if defined(MBEDTLS_ARCH_IS_X86)
#if defined(MBEDTLS_COMPILER_IS_GCC)
#pragma GCC push_options
#pragma GCC target ("pclmul,sse2,aes")
#define MBEDTLS_POP_TARGET_PRAGMA
-#elif defined(__clang__) && (__clang_major__ >= 5)
+#elif defined(__clang__)
#pragma clang attribute push (__attribute__((target("pclmul,sse2,aes"))), apply_to=function)
#define MBEDTLS_POP_TARGET_PRAGMA
#endif
-#endif

#if !defined(MBEDTLS_AES_USE_HARDWARE_ONLY)
/*

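The two patches above relax mbedtls' compile-time guards so the AES-NI intrinsics path is selected via per-function target pragmas rather than requiring global `-maes`/`-mpclmul` flags. A minimal standalone sketch of that mechanism for x86-64 GCC/Clang, not part of the patches themselves; the function name and the runtime check are illustrative:

```cpp
#include <cstdio>
#include <immintrin.h>

#if defined(__GNUC__) || defined(__clang__)
// The target attribute mirrors what the patched pragmas do: this one function may use
// AES-NI instructions even if the rest of the translation unit is built without -maes.
__attribute__((target("aes,sse2")))
static __m128i encrypt_round(__m128i state, __m128i round_key) {
    return _mm_aesenc_si128(state, round_key);
}
#endif

int main() {
#if defined(__GNUC__) || defined(__clang__)
    // Runtime guard: only take the AES-NI path on CPUs that actually support it.
    if (__builtin_cpu_supports("aes")) {
        const __m128i state = _mm_set1_epi32(1);
        const __m128i key = _mm_set1_epi32(2);
        const __m128i out = encrypt_round(state, key);
        std::printf("low 32 bits of one AES round: %d\n", _mm_cvtsi128_si32(out));
    }
#endif
    return 0;
}
```
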
55  .patch/mcl/0001-assert-macro.patch  (Normal file)

@@ -0,0 +1,55 @@
diff --git a/include/mcl/assert.hpp b/include/mcl/assert.hpp
index f77dbe7..9ec0b9c 100644
--- a/include/mcl/assert.hpp
+++ b/include/mcl/assert.hpp
@@ -23,8 +23,11 @@ template<typename... Ts>

} // namespace mcl::detail

+#ifndef UNREACHABLE
#define UNREACHABLE() ASSERT_FALSE("Unreachable code!")
+#endif

+#ifndef ASSERT
#define ASSERT(expr) \
[&] { \
if (std::is_constant_evaluated()) { \
@@ -37,7 +40,9 @@ template<typename... Ts>
} \
} \
}()
+#endif

+#ifndef ASSERT_MSG
#define ASSERT_MSG(expr, ...) \
[&] { \
if (std::is_constant_evaluated()) { \
@@ -50,13 +55,24 @@ template<typename... Ts>
} \
} \
}()
+#endif

+#ifndef ASSERT_FALSE
#define ASSERT_FALSE(...) ::mcl::detail::assert_terminate("false", __VA_ARGS__)
+#endif

#if defined(NDEBUG) || defined(MCL_IGNORE_ASSERTS)
-# define DEBUG_ASSERT(expr) ASSUME(expr)
-# define DEBUG_ASSERT_MSG(expr, ...) ASSUME(expr)
+# ifndef DEBUG_ASSERT
+# define DEBUG_ASSERT(expr) ASSUME(expr)
+# endif
+# ifndef DEBUG_ASSERT_MSG
+# define DEBUG_ASSERT_MSG(expr, ...) ASSUME(expr)
+# endif
#else
-# define DEBUG_ASSERT(expr) ASSERT(expr)
-# define DEBUG_ASSERT_MSG(expr, ...) ASSERT_MSG(expr, __VA_ARGS__)
+# ifndef DEBUG_ASSERT
+# define DEBUG_ASSERT(expr) ASSERT(expr)
+# endif
+# ifndef DEBUG_ASSERT_MSG
+# define DEBUG_ASSERT_MSG(expr, ...) ASSERT_MSG(expr, __VA_ARGS__)
+# endif
#endif

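The patch wraps each mcl macro in `#ifndef` so a project embedding the library can inject its own assertion macros before `mcl/assert.hpp` is included, instead of colliding with redefinitions. A hypothetical sketch of that usage pattern (the custom ASSERT here is invented for illustration):

```cpp
#include <cstdio>
#include <cstdlib>

// Project-provided assertion, defined before mcl's header is pulled in.
#define ASSERT(expr)                                                 \
    do {                                                             \
        if (!(expr)) {                                               \
            std::fprintf(stderr, "assert failed: %s\n", #expr);      \
            std::abort();                                            \
        }                                                            \
    } while (0)

// #include <mcl/assert.hpp>  // with the patch, this no longer redefines ASSERT

int main() {
    ASSERT(1 + 1 == 2);
    std::puts("ok");
    return 0;
}
```
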
@@ -52,6 +52,10 @@ if (PLATFORM_SUN)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -O3")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O3")
endif()
if (CMAKE_BUILD_TYPE MATCHES "RelWithDebInfo")
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -O2")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O2")
endif()
endif()

# Needed for FFmpeg w/ VAAPI and DRM

@@ -1,27 +1,33 @@
# SPDX-FileCopyrightText: Copyright 2025 Eden Emulator Project
# SPDX-License-Identifier: GPL-3.0-or-later

# SPDX-FileCopyrightText: 2022 Alexandre Bouvier <contact@amb.tf>
#
# SPDX-License-Identifier: GPL-3.0-or-later

find_path(DiscordRPC_INCLUDE_DIR discord_rpc.h)
find_package(DiscordRPC CONFIG QUIET)

find_library(DiscordRPC_LIBRARY discord-rpc)
if (NOT DiscordRPC_FOUND)
find_path(DiscordRPC_INCLUDE_DIR discord_rpc.h)
find_library(DiscordRPC_LIBRARY discord-rpc)

include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(DiscordRPC
include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(DiscordRPC
REQUIRED_VARS
DiscordRPC_LIBRARY
DiscordRPC_INCLUDE_DIR
)
)

if (DiscordRPC_FOUND AND NOT TARGET DiscordRPC::discord-rpc)
if (DiscordRPC_FOUND AND NOT TARGET DiscordRPC::discord-rpc)
add_library(DiscordRPC::discord-rpc UNKNOWN IMPORTED)
set_target_properties(DiscordRPC::discord-rpc PROPERTIES
IMPORTED_LOCATION "${DiscordRPC_LIBRARY}"
INTERFACE_INCLUDE_DIRECTORIES "${DiscordRPC_INCLUDE_DIR}"
)
endif()
endif()

mark_as_advanced(
mark_as_advanced(
DiscordRPC_INCLUDE_DIR
DiscordRPC_LIBRARY
)
)
endif()

6  externals/cpmfile.json  (vendored)

@@ -97,7 +97,11 @@
"version": "3",
"git_version": "3.6.4",
"artifact": "%TAG%.tar.bz2",
"skip_updates": true
"skip_updates": true,
"patches": [
"0002-aesni-fix.patch",
"0003-aesni-fix.patch"
]
},
"enet": {
"repo": "lsalzman/enet",

@@ -1,3 +1,6 @@
// SPDX-FileCopyrightText: Copyright 2025 Eden Emulator Project
// SPDX-License-Identifier: GPL-3.0-or-later

// SPDX-FileCopyrightText: Copyright 2023 yuzu Emulator Project
// SPDX-License-Identifier: GPL-2.0-or-later

@@ -12,7 +15,6 @@
#include "audio_core/adsp/mailbox.h"
#include "common/common_types.h"
#include "common/polyfill_thread.h"
#include "common/reader_writer_queue.h"
#include "common/thread.h"

namespace Core {

@@ -23,7 +23,7 @@ namespace AudioCore::Sink {

void SinkStream::AppendBuffer(SinkBuffer& buffer, std::span<s16> samples) {
SCOPE_EXIT {
queue.enqueue(buffer);
queue.EmplaceWait(buffer);
++queued_buffers;
};

@@ -147,7 +147,8 @@ std::vector<s16> SinkStream::ReleaseBuffer(u64 num_samples) {

void SinkStream::ClearQueue() {
samples_buffer.Pop();
while (queue.pop()) {
SinkBuffer tmp;
while (queue.TryPop(tmp)) {
}
queued_buffers = 0;
playing_buffer = {};

@@ -169,7 +170,7 @@ void SinkStream::ProcessAudioIn(std::span<const s16> input_buffer, std::size_t n
while (frames_written < num_frames) {
// If the playing buffer has been consumed or has no frames, we need a new one
if (playing_buffer.consumed || playing_buffer.frames == 0) {
if (!queue.try_dequeue(playing_buffer)) {
if (!queue.TryPop(playing_buffer)) {
// If no buffer was available we've underrun, just push the samples and
// continue.
samples_buffer.Push(&input_buffer[frames_written * frame_size],

@@ -230,7 +231,7 @@ void SinkStream::ProcessAudioOutAndRender(std::span<s16> output_buffer, std::siz
while (frames_written < num_frames) {
// If the playing buffer has been consumed or has no frames, we need a new one
if (playing_buffer.consumed || playing_buffer.frames == 0) {
if (!queue.try_dequeue(playing_buffer)) {
if (!queue.TryPop(playing_buffer)) {
// If no buffer was available we've underrun, fill the remaining buffer with
// the last written frame and continue.
for (size_t i = frames_written; i < num_frames; i++) {

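These hunks swap the moodycamel-style API (enqueue, try_dequeue, pop) for the bounded queue's API (EmplaceWait, TryPop). The sketch below only mirrors that calling convention with a toy mutex-based queue; Eden's real Common::SPSCQueue is a lock-free bounded single-producer/single-consumer queue, so the blocking-when-full behaviour and the names here are assumptions for illustration only:

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

template <typename T, std::size_t Capacity>
class ToyQueue {
public:
    // Blocks the producer while the queue is full, then constructs the element in place.
    template <typename... Args>
    void EmplaceWait(Args&&... args) {
        std::unique_lock lock{mutex};
        not_full.wait(lock, [this] { return items.size() < Capacity; });
        items.emplace_back(std::forward<Args>(args)...);
    }

    // Never blocks: returns false immediately when there is nothing to pop.
    bool TryPop(T& out) {
        std::scoped_lock lock{mutex};
        if (items.empty()) {
            return false;
        }
        out = std::move(items.front());
        items.pop_front();
        not_full.notify_one();
        return true;
    }

private:
    std::mutex mutex;
    std::condition_variable not_full;
    std::deque<T> items;
};

int main() {
    ToyQueue<int, 4> q;
    q.EmplaceWait(42);
    int v = 0;
    return q.TryPop(v) && v == 42 ? 0 : 1;
}
```
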
@@ -1,3 +1,6 @@
// SPDX-FileCopyrightText: Copyright 2025 Eden Emulator Project
// SPDX-License-Identifier: GPL-3.0-or-later

// SPDX-FileCopyrightText: Copyright 2018 yuzu Emulator Project
// SPDX-License-Identifier: GPL-2.0-or-later

@@ -14,8 +17,8 @@
#include "audio_core/common/common.h"
#include "common/common_types.h"
#include "common/polyfill_thread.h"
#include "common/reader_writer_queue.h"
#include "common/ring_buffer.h"
#include "common/bounded_threadsafe_queue.h"
#include "common/thread.h"

namespace Core {

@@ -237,7 +240,7 @@ private:
/// Ring buffer of the samples waiting to be played or consumed
Common::RingBuffer<s16, 0x10000> samples_buffer;
/// Audio buffers queued and waiting to play
Common::ReaderWriterQueue<SinkBuffer> queue;
Common::SPSCQueue<SinkBuffer, 0x10000> queue;
/// The currently-playing audio buffer
SinkBuffer playing_buffer{};
/// The last played (or received) frame of audio, used when the callback underruns

@@ -109,7 +109,6 @@ add_library(
range_mutex.h
range_sets.h
range_sets.inc
reader_writer_queue.h
ring_buffer.h
${CMAKE_CURRENT_BINARY_DIR}/scm_rev.cpp
scm_rev.h

@@ -40,22 +40,22 @@ void FmtLogMessage(Class log_class, Level log_level, const char* filename, unsig
#endif

#define LOG_DEBUG(log_class, ...) \
Common::Log::FmtLogMessage(Common::Log::Class::log_class, Common::Log::Level::Debug, \
::Common::Log::FmtLogMessage(::Common::Log::Class::log_class, ::Common::Log::Level::Debug, \
__FILE__, __LINE__, __func__, \
__VA_ARGS__)
#define LOG_INFO(log_class, ...) \
Common::Log::FmtLogMessage(Common::Log::Class::log_class, Common::Log::Level::Info, \
::Common::Log::FmtLogMessage(::Common::Log::Class::log_class, ::Common::Log::Level::Info, \
__FILE__, __LINE__, __func__, \
__VA_ARGS__)
#define LOG_WARNING(log_class, ...) \
Common::Log::FmtLogMessage(Common::Log::Class::log_class, Common::Log::Level::Warning, \
::Common::Log::FmtLogMessage(::Common::Log::Class::log_class, ::Common::Log::Level::Warning, \
__FILE__, __LINE__, __func__, \
__VA_ARGS__)
#define LOG_ERROR(log_class, ...) \
Common::Log::FmtLogMessage(Common::Log::Class::log_class, Common::Log::Level::Error, \
::Common::Log::FmtLogMessage(::Common::Log::Class::log_class, ::Common::Log::Level::Error, \
__FILE__, __LINE__, __func__, \
__VA_ARGS__)
#define LOG_CRITICAL(log_class, ...) \
Common::Log::FmtLogMessage(Common::Log::Class::log_class, Common::Log::Level::Critical, \
::Common::Log::FmtLogMessage(::Common::Log::Class::log_class, ::Common::Log::Level::Critical, \
__FILE__, __LINE__, __func__, \
__VA_ARGS__)

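The only change in these macros is the leading `::`, which anchors name lookup at the global namespace wherever the macro expands. A small self-contained example of the failure mode the unqualified form allows; the namespace and function names are invented for illustration:

```cpp
#include <cstdio>

namespace Common::Log {
void Emit(const char* msg) { std::printf("real logger: %s\n", msg); }
} // namespace Common::Log

// Expands to an absolute path, so it resolves the same everywhere.
#define LOG_OK(msg) ::Common::Log::Emit(msg)
// Expands to a relative path, so lookup starts in the enclosing scope.
#define LOG_BROKEN(msg) Common::Log::Emit(msg)

namespace Emulator {
namespace Common::Log {
void Emit(const char* msg) { std::printf("shadowing logger: %s\n", msg); }
} // namespace Common::Log

void Run() {
    LOG_OK("hello");      // always calls ::Common::Log::Emit
    LOG_BROKEN("hello");  // silently picks Emulator::Common::Log::Emit instead
}
} // namespace Emulator

int main() {
    Emulator::Run();
    return 0;
}
```
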

@@ -1,940 +0,0 @@
|
|||
// SPDX-FileCopyrightText: 2013-2020 Cameron Desrochers
|
||||
// SPDX-License-Identifier: BSD-2-Clause
|
||||
|
||||
#pragma once
|
||||
|
||||
#include <cassert>
|
||||
#include <cstdint>
|
||||
#include <cstdlib> // For malloc/free/abort & size_t
|
||||
#include <memory>
|
||||
#include <new>
|
||||
#include <stdexcept>
|
||||
#include <type_traits>
|
||||
#include <utility>
|
||||
|
||||
#include "common/atomic_helpers.h"
|
||||
|
||||
#if __cplusplus > 199711L || _MSC_VER >= 1700 // C++11 or VS2012
|
||||
#include <chrono>
|
||||
#endif
|
||||
|
||||
// A lock-free queue for a single-consumer, single-producer architecture.
|
||||
// The queue is also wait-free in the common path (except if more memory
|
||||
// needs to be allocated, in which case malloc is called).
|
||||
// Allocates memory sparingly, and only once if the original maximum size
|
||||
// estimate is never exceeded.
|
||||
// Tested on x86/x64 processors, but semantics should be correct for all
|
||||
// architectures (given the right implementations in atomicops.h), provided
|
||||
// that aligned integer and pointer accesses are naturally atomic.
|
||||
// Note that there should only be one consumer thread and producer thread;
|
||||
// Switching roles of the threads, or using multiple consecutive threads for
|
||||
// one role, is not safe unless properly synchronized.
|
||||
// Using the queue exclusively from one thread is fine, though a bit silly.
|
||||
|
||||
#ifndef MOODYCAMEL_CACHE_LINE_SIZE
|
||||
#define MOODYCAMEL_CACHE_LINE_SIZE 64
|
||||
#endif
|
||||
|
||||
#ifndef MOODYCAMEL_EXCEPTIONS_ENABLED
|
||||
#if (defined(_MSC_VER) && defined(_CPPUNWIND)) || (defined(__GNUC__) && defined(__EXCEPTIONS)) || \
|
||||
(!defined(_MSC_VER) && !defined(__GNUC__))
|
||||
#define MOODYCAMEL_EXCEPTIONS_ENABLED
|
||||
#endif
|
||||
#endif
|
||||
|
||||
#ifndef MOODYCAMEL_HAS_EMPLACE
|
||||
#if !defined(_MSC_VER) || \
|
||||
_MSC_VER >= 1800 // variadic templates: either a non-MS compiler or VS >= 2013
|
||||
#define MOODYCAMEL_HAS_EMPLACE 1
|
||||
#endif
|
||||
#endif
|
||||
|
||||
#ifndef MOODYCAMEL_MAYBE_ALIGN_TO_CACHELINE
|
||||
#if defined(__APPLE__) && defined(__MACH__) && __cplusplus >= 201703L
|
||||
// This is required to find out what deployment target we are using
|
||||
#include <CoreFoundation/CoreFoundation.h>
|
||||
#if !defined(MAC_OS_X_VERSION_MIN_REQUIRED) || \
|
||||
MAC_OS_X_VERSION_MIN_REQUIRED < MAC_OS_X_VERSION_10_14
|
||||
// C++17 new(size_t, align_val_t) is not backwards-compatible with older versions of macOS, so we
|
||||
// can't support over-alignment in this case
|
||||
#define MOODYCAMEL_MAYBE_ALIGN_TO_CACHELINE
|
||||
#endif
|
||||
#endif
|
||||
#endif
|
||||
|
||||
#ifndef MOODYCAMEL_MAYBE_ALIGN_TO_CACHELINE
|
||||
#define MOODYCAMEL_MAYBE_ALIGN_TO_CACHELINE AE_ALIGN(MOODYCAMEL_CACHE_LINE_SIZE)
|
||||
#endif
|
||||
|
||||
#ifdef AE_VCPP
|
||||
#pragma warning(push)
|
||||
#pragma warning(disable : 4324) // structure was padded due to __declspec(align())
|
||||
#pragma warning(disable : 4820) // padding was added
|
||||
#pragma warning(disable : 4127) // conditional expression is constant
|
||||
#endif
|
||||
|
||||
namespace Common {
|
||||
|
||||
template <typename T, size_t MAX_BLOCK_SIZE = 512>
|
||||
class MOODYCAMEL_MAYBE_ALIGN_TO_CACHELINE ReaderWriterQueue {
|
||||
// Design: Based on a queue-of-queues. The low-level queues are just
|
||||
// circular buffers with front and tail indices indicating where the
|
||||
// next element to dequeue is and where the next element can be enqueued,
|
||||
// respectively. Each low-level queue is called a "block". Each block
|
||||
// wastes exactly one element's worth of space to keep the design simple
|
||||
// (if front == tail then the queue is empty, and can't be full).
|
||||
// The high-level queue is a circular linked list of blocks; again there
|
||||
// is a front and tail, but this time they are pointers to the blocks.
|
||||
// The front block is where the next element to be dequeued is, provided
|
||||
// the block is not empty. The back block is where elements are to be
|
||||
// enqueued, provided the block is not full.
|
||||
// The producer thread owns all the tail indices/pointers. The consumer
|
||||
// thread owns all the front indices/pointers. Both threads read each
|
||||
// other's variables, but only the owning thread updates them. E.g. After
|
||||
// the consumer reads the producer's tail, the tail may change before the
|
||||
// consumer is done dequeuing an object, but the consumer knows the tail
|
||||
// will never go backwards, only forwards.
|
||||
// If there is no room to enqueue an object, an additional block (of
|
||||
// equal size to the last block) is added. Blocks are never removed.
|
||||
|
||||
public:
|
||||
typedef T value_type;
|
||||
|
||||
// Constructs a queue that can hold at least `size` elements without further
|
||||
// allocations. If more than MAX_BLOCK_SIZE elements are requested,
|
||||
// then several blocks of MAX_BLOCK_SIZE each are reserved (including
|
||||
// at least one extra buffer block).
|
||||
AE_NO_TSAN explicit ReaderWriterQueue(size_t size = 15)
|
||||
#ifndef NDEBUG
|
||||
: enqueuing(false), dequeuing(false)
|
||||
#endif
|
||||
{
|
||||
assert(MAX_BLOCK_SIZE == ceilToPow2(MAX_BLOCK_SIZE) &&
|
||||
"MAX_BLOCK_SIZE must be a power of 2");
|
||||
assert(MAX_BLOCK_SIZE >= 2 && "MAX_BLOCK_SIZE must be at least 2");
|
||||
|
||||
Block* firstBlock = nullptr;
|
||||
|
||||
largestBlockSize =
|
||||
ceilToPow2(size + 1); // We need a spare slot to fit size elements in the block
|
||||
if (largestBlockSize > MAX_BLOCK_SIZE * 2) {
|
||||
// We need a spare block in case the producer is writing to a different block the
|
||||
// consumer is reading from, and wants to enqueue the maximum number of elements. We
|
||||
// also need a spare element in each block to avoid the ambiguity between front == tail
|
||||
// meaning "empty" and "full". So the effective number of slots that are guaranteed to
|
||||
// be usable at any time is the block size - 1 times the number of blocks - 1. Solving
|
||||
// for size and applying a ceiling to the division gives us (after simplifying):
|
||||
size_t initialBlockCount = (size + MAX_BLOCK_SIZE * 2 - 3) / (MAX_BLOCK_SIZE - 1);
|
||||
largestBlockSize = MAX_BLOCK_SIZE;
|
||||
Block* lastBlock = nullptr;
|
||||
for (size_t i = 0; i != initialBlockCount; ++i) {
|
||||
auto block = make_block(largestBlockSize);
|
||||
if (block == nullptr) {
|
||||
#ifdef MOODYCAMEL_EXCEPTIONS_ENABLED
|
||||
throw std::bad_alloc();
|
||||
#else
|
||||
abort();
|
||||
#endif
|
||||
}
|
||||
if (firstBlock == nullptr) {
|
||||
firstBlock = block;
|
||||
} else {
|
||||
lastBlock->next = block;
|
||||
}
|
||||
lastBlock = block;
|
||||
block->next = firstBlock;
|
||||
}
|
||||
} else {
|
||||
firstBlock = make_block(largestBlockSize);
|
||||
if (firstBlock == nullptr) {
|
||||
#ifdef MOODYCAMEL_EXCEPTIONS_ENABLED
|
||||
throw std::bad_alloc();
|
||||
#else
|
||||
abort();
|
||||
#endif
|
||||
}
|
||||
firstBlock->next = firstBlock;
|
||||
}
|
||||
frontBlock = firstBlock;
|
||||
tailBlock = firstBlock;
|
||||
|
||||
// Make sure the reader/writer threads will have the initialized memory setup above:
|
||||
fence(memory_order_sync);
|
||||
}
|
||||
|
||||
// Note: The queue should not be accessed concurrently while it's
|
||||
// being moved. It's up to the user to synchronize this.
|
||||
AE_NO_TSAN ReaderWriterQueue(ReaderWriterQueue&& other)
|
||||
: frontBlock(other.frontBlock.load()), tailBlock(other.tailBlock.load()),
|
||||
largestBlockSize(other.largestBlockSize)
|
||||
#ifndef NDEBUG
|
||||
,
|
||||
enqueuing(false), dequeuing(false)
|
||||
#endif
|
||||
{
|
||||
other.largestBlockSize = 32;
|
||||
Block* b = other.make_block(other.largestBlockSize);
|
||||
if (b == nullptr) {
|
||||
#ifdef MOODYCAMEL_EXCEPTIONS_ENABLED
|
||||
throw std::bad_alloc();
|
||||
#else
|
||||
abort();
|
||||
#endif
|
||||
}
|
||||
b->next = b;
|
||||
other.frontBlock = b;
|
||||
other.tailBlock = b;
|
||||
}
|
||||
|
||||
// Note: The queue should not be accessed concurrently while it's
|
||||
// being moved. It's up to the user to synchronize this.
|
||||
ReaderWriterQueue& operator=(ReaderWriterQueue&& other) AE_NO_TSAN {
|
||||
Block* b = frontBlock.load();
|
||||
frontBlock = other.frontBlock.load();
|
||||
other.frontBlock = b;
|
||||
b = tailBlock.load();
|
||||
tailBlock = other.tailBlock.load();
|
||||
other.tailBlock = b;
|
||||
std::swap(largestBlockSize, other.largestBlockSize);
|
||||
return *this;
|
||||
}
|
||||
|
||||
// Note: The queue should not be accessed concurrently while it's
|
||||
// being deleted. It's up to the user to synchronize this.
|
||||
AE_NO_TSAN ~ReaderWriterQueue() {
|
||||
// Make sure we get the latest version of all variables from other CPUs:
|
||||
fence(memory_order_sync);
|
||||
|
||||
// Destroy any remaining objects in queue and free memory
|
||||
Block* frontBlock_ = frontBlock;
|
||||
Block* block = frontBlock_;
|
||||
do {
|
||||
Block* nextBlock = block->next;
|
||||
size_t blockFront = block->front;
|
||||
size_t blockTail = block->tail;
|
||||
|
||||
for (size_t i = blockFront; i != blockTail; i = (i + 1) & block->sizeMask) {
|
||||
auto element = reinterpret_cast<T*>(block->data + i * sizeof(T));
|
||||
element->~T();
|
||||
(void)element;
|
||||
}
|
||||
|
||||
auto rawBlock = block->rawThis;
|
||||
block->~Block();
|
||||
std::free(rawBlock);
|
||||
block = nextBlock;
|
||||
} while (block != frontBlock_);
|
||||
}
|
||||
|
||||
// Enqueues a copy of element if there is room in the queue.
|
||||
// Returns true if the element was enqueued, false otherwise.
|
||||
// Does not allocate memory.
|
||||
AE_FORCEINLINE bool try_enqueue(T const& element) AE_NO_TSAN {
|
||||
return inner_enqueue<CannotAlloc>(element);
|
||||
}
|
||||
|
||||
// Enqueues a moved copy of element if there is room in the queue.
|
||||
// Returns true if the element was enqueued, false otherwise.
|
||||
// Does not allocate memory.
|
||||
AE_FORCEINLINE bool try_enqueue(T&& element) AE_NO_TSAN {
|
||||
return inner_enqueue<CannotAlloc>(std::forward<T>(element));
|
||||
}
|
||||
|
||||
#if MOODYCAMEL_HAS_EMPLACE
|
||||
// Like try_enqueue() but with emplace semantics (i.e. construct-in-place).
|
||||
template <typename... Args>
|
||||
AE_FORCEINLINE bool try_emplace(Args&&... args) AE_NO_TSAN {
|
||||
return inner_enqueue<CannotAlloc>(std::forward<Args>(args)...);
|
||||
}
|
||||
#endif
|
||||
|
||||
// Enqueues a copy of element on the queue.
|
||||
// Allocates an additional block of memory if needed.
|
||||
// Only fails (returns false) if memory allocation fails.
|
||||
AE_FORCEINLINE bool enqueue(T const& element) AE_NO_TSAN {
|
||||
return inner_enqueue<CanAlloc>(element);
|
||||
}
|
||||
|
||||
// Enqueues a moved copy of element on the queue.
|
||||
// Allocates an additional block of memory if needed.
|
||||
// Only fails (returns false) if memory allocation fails.
|
||||
AE_FORCEINLINE bool enqueue(T&& element) AE_NO_TSAN {
|
||||
return inner_enqueue<CanAlloc>(std::forward<T>(element));
|
||||
}
|
||||
|
||||
#if MOODYCAMEL_HAS_EMPLACE
|
||||
// Like enqueue() but with emplace semantics (i.e. construct-in-place).
|
||||
template <typename... Args>
|
||||
AE_FORCEINLINE bool emplace(Args&&... args) AE_NO_TSAN {
|
||||
return inner_enqueue<CanAlloc>(std::forward<Args>(args)...);
|
||||
}
|
||||
#endif
|
||||
|
||||
// Attempts to dequeue an element; if the queue is empty,
|
||||
// returns false instead. If the queue has at least one element,
|
||||
// moves front to result using operator=, then returns true.
|
||||
template <typename U>
|
||||
bool try_dequeue(U& result) AE_NO_TSAN {
|
||||
#ifndef NDEBUG
|
||||
ReentrantGuard guard(this->dequeuing);
|
||||
#endif
|
||||
|
||||
// High-level pseudocode:
|
||||
// Remember where the tail block is
|
||||
// If the front block has an element in it, dequeue it
|
||||
// Else
|
||||
// If front block was the tail block when we entered the function, return false
|
||||
// Else advance to next block and dequeue the item there
|
||||
|
||||
// Note that we have to use the value of the tail block from before we check if the front
|
||||
// block is full or not, in case the front block is empty and then, before we check if the
|
||||
// tail block is at the front block or not, the producer fills up the front block *and
|
||||
// moves on*, which would make us skip a filled block. Seems unlikely, but was consistently
|
||||
// reproducible in practice.
|
||||
// In order to avoid overhead in the common case, though, we do a double-checked pattern
|
||||
// where we have the fast path if the front block is not empty, then read the tail block,
|
||||
// then re-read the front block and check if it's not empty again, then check if the tail
|
||||
// block has advanced.
|
||||
|
||||
Block* frontBlock_ = frontBlock.load();
|
||||
size_t blockTail = frontBlock_->localTail;
|
||||
size_t blockFront = frontBlock_->front.load();
|
||||
|
||||
if (blockFront != blockTail ||
|
||||
blockFront != (frontBlock_->localTail = frontBlock_->tail.load())) {
|
||||
fence(memory_order_acquire);
|
||||
|
||||
non_empty_front_block:
|
||||
// Front block not empty, dequeue from here
|
||||
auto element = reinterpret_cast<T*>(frontBlock_->data + blockFront * sizeof(T));
|
||||
result = std::move(*element);
|
||||
element->~T();
|
||||
|
||||
blockFront = (blockFront + 1) & frontBlock_->sizeMask;
|
||||
|
||||
fence(memory_order_release);
|
||||
frontBlock_->front = blockFront;
|
||||
} else if (frontBlock_ != tailBlock.load()) {
|
||||
fence(memory_order_acquire);
|
||||
|
||||
frontBlock_ = frontBlock.load();
|
||||
blockTail = frontBlock_->localTail = frontBlock_->tail.load();
|
||||
blockFront = frontBlock_->front.load();
|
||||
fence(memory_order_acquire);
|
||||
|
||||
if (blockFront != blockTail) {
|
||||
// Oh look, the front block isn't empty after all
|
||||
goto non_empty_front_block;
|
||||
}
|
||||
|
||||
// Front block is empty but there's another block ahead, advance to it
|
||||
Block* nextBlock = frontBlock_->next;
|
||||
// Don't need an acquire fence here since next can only ever be set on the tailBlock,
|
||||
// and we're not the tailBlock, and we did an acquire earlier after reading tailBlock
|
||||
// which ensures next is up-to-date on this CPU in case we recently were at tailBlock.
|
||||
|
||||
size_t nextBlockFront = nextBlock->front.load();
|
||||
size_t nextBlockTail = nextBlock->localTail = nextBlock->tail.load();
|
||||
fence(memory_order_acquire);
|
||||
|
||||
// Since the tailBlock is only ever advanced after being written to,
|
||||
// we know there's for sure an element to dequeue on it
|
||||
assert(nextBlockFront != nextBlockTail);
|
||||
AE_UNUSED(nextBlockTail);
|
||||
|
||||
// We're done with this block, let the producer use it if it needs
|
||||
fence(memory_order_release); // Expose possibly pending changes to frontBlock->front
|
||||
// from last dequeue
|
||||
frontBlock = frontBlock_ = nextBlock;
|
||||
|
||||
compiler_fence(memory_order_release); // Not strictly needed
|
||||
|
||||
auto element = reinterpret_cast<T*>(frontBlock_->data + nextBlockFront * sizeof(T));
|
||||
|
||||
result = std::move(*element);
|
||||
element->~T();
|
||||
|
||||
nextBlockFront = (nextBlockFront + 1) & frontBlock_->sizeMask;
|
||||
|
||||
fence(memory_order_release);
|
||||
frontBlock_->front = nextBlockFront;
|
||||
} else {
|
||||
// No elements in current block and no other block to advance to
|
||||
return false;
|
||||
}
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
// Returns a pointer to the front element in the queue (the one that
|
||||
// would be removed next by a call to `try_dequeue` or `pop`). If the
|
||||
// queue appears empty at the time the method is called, nullptr is
|
||||
// returned instead.
|
||||
// Must be called only from the consumer thread.
|
||||
T* peek() const AE_NO_TSAN {
|
||||
#ifndef NDEBUG
|
||||
ReentrantGuard guard(this->dequeuing);
|
||||
#endif
|
||||
// See try_dequeue() for reasoning
|
||||
|
||||
Block* frontBlock_ = frontBlock.load();
|
||||
size_t blockTail = frontBlock_->localTail;
|
||||
size_t blockFront = frontBlock_->front.load();
|
||||
|
||||
if (blockFront != blockTail ||
|
||||
blockFront != (frontBlock_->localTail = frontBlock_->tail.load())) {
|
||||
fence(memory_order_acquire);
|
||||
non_empty_front_block:
|
||||
return reinterpret_cast<T*>(frontBlock_->data + blockFront * sizeof(T));
|
||||
} else if (frontBlock_ != tailBlock.load()) {
|
||||
fence(memory_order_acquire);
|
||||
frontBlock_ = frontBlock.load();
|
||||
blockTail = frontBlock_->localTail = frontBlock_->tail.load();
|
||||
blockFront = frontBlock_->front.load();
|
||||
fence(memory_order_acquire);
|
||||
|
||||
if (blockFront != blockTail) {
|
||||
goto non_empty_front_block;
|
||||
}
|
||||
|
||||
Block* nextBlock = frontBlock_->next;
|
||||
|
||||
size_t nextBlockFront = nextBlock->front.load();
|
||||
fence(memory_order_acquire);
|
||||
|
||||
assert(nextBlockFront != nextBlock->tail.load());
|
||||
return reinterpret_cast<T*>(nextBlock->data + nextBlockFront * sizeof(T));
|
||||
}
|
||||
|
||||
return nullptr;
|
||||
}
|
||||
|
||||
// Removes the front element from the queue, if any, without returning it.
|
||||
// Returns true on success, or false if the queue appeared empty at the time
|
||||
// `pop` was called.
|
||||
bool pop() AE_NO_TSAN {
|
||||
#ifndef NDEBUG
|
||||
ReentrantGuard guard(this->dequeuing);
|
||||
#endif
|
||||
// See try_dequeue() for reasoning
|
||||
|
||||
Block* frontBlock_ = frontBlock.load();
|
||||
size_t blockTail = frontBlock_->localTail;
|
||||
size_t blockFront = frontBlock_->front.load();
|
||||
|
||||
if (blockFront != blockTail ||
|
||||
blockFront != (frontBlock_->localTail = frontBlock_->tail.load())) {
|
||||
fence(memory_order_acquire);
|
||||
|
||||
non_empty_front_block:
|
||||
auto element = reinterpret_cast<T*>(frontBlock_->data + blockFront * sizeof(T));
|
||||
element->~T();
|
||||
|
||||
blockFront = (blockFront + 1) & frontBlock_->sizeMask;
|
||||
|
||||
fence(memory_order_release);
|
||||
frontBlock_->front = blockFront;
|
||||
} else if (frontBlock_ != tailBlock.load()) {
|
||||
fence(memory_order_acquire);
|
||||
frontBlock_ = frontBlock.load();
|
||||
blockTail = frontBlock_->localTail = frontBlock_->tail.load();
|
||||
blockFront = frontBlock_->front.load();
|
||||
fence(memory_order_acquire);
|
||||
|
||||
if (blockFront != blockTail) {
|
||||
goto non_empty_front_block;
|
||||
}
|
||||
|
||||
// Front block is empty but there's another block ahead, advance to it
|
||||
Block* nextBlock = frontBlock_->next;
|
||||
|
||||
size_t nextBlockFront = nextBlock->front.load();
|
||||
size_t nextBlockTail = nextBlock->localTail = nextBlock->tail.load();
|
||||
fence(memory_order_acquire);
|
||||
|
||||
assert(nextBlockFront != nextBlockTail);
|
||||
AE_UNUSED(nextBlockTail);
|
||||
|
||||
fence(memory_order_release);
|
||||
frontBlock = frontBlock_ = nextBlock;
|
||||
|
||||
compiler_fence(memory_order_release);
|
||||
|
||||
auto element = reinterpret_cast<T*>(frontBlock_->data + nextBlockFront * sizeof(T));
|
||||
element->~T();
|
||||
|
||||
nextBlockFront = (nextBlockFront + 1) & frontBlock_->sizeMask;
|
||||
|
||||
fence(memory_order_release);
|
||||
frontBlock_->front = nextBlockFront;
|
||||
} else {
|
||||
// No elements in current block and no other block to advance to
|
||||
return false;
|
||||
}
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
// Returns the approximate number of items currently in the queue.
|
||||
// Safe to call from both the producer and consumer threads.
|
||||
inline size_t size_approx() const AE_NO_TSAN {
|
||||
size_t result = 0;
|
||||
Block* frontBlock_ = frontBlock.load();
|
||||
Block* block = frontBlock_;
|
||||
do {
|
||||
fence(memory_order_acquire);
|
||||
size_t blockFront = block->front.load();
|
||||
size_t blockTail = block->tail.load();
|
||||
result += (blockTail - blockFront) & block->sizeMask;
|
||||
block = block->next.load();
|
||||
} while (block != frontBlock_);
|
||||
return result;
|
||||
}
|
||||
|
||||
// Returns the total number of items that could be enqueued without incurring
|
||||
// an allocation when this queue is empty.
|
||||
// Safe to call from both the producer and consumer threads.
|
||||
//
|
||||
// NOTE: The actual capacity during usage may be different depending on the consumer.
|
||||
// If the consumer is removing elements concurrently, the producer cannot add to
|
||||
// the block the consumer is removing from until it's completely empty, except in
|
||||
// the case where the producer was writing to the same block the consumer was
|
||||
// reading from the whole time.
|
||||
inline size_t max_capacity() const {
|
||||
size_t result = 0;
|
||||
Block* frontBlock_ = frontBlock.load();
|
||||
Block* block = frontBlock_;
|
||||
do {
|
||||
fence(memory_order_acquire);
|
||||
result += block->sizeMask;
|
||||
block = block->next.load();
|
||||
} while (block != frontBlock_);
|
||||
return result;
|
||||
}
|
||||
|
||||
private:
|
||||
enum AllocationMode { CanAlloc, CannotAlloc };
|
||||
|
||||
#if MOODYCAMEL_HAS_EMPLACE
|
||||
template <AllocationMode canAlloc, typename... Args>
|
||||
bool inner_enqueue(Args&&... args) AE_NO_TSAN
|
||||
#else
|
||||
template <AllocationMode canAlloc, typename U>
|
||||
bool inner_enqueue(U&& element) AE_NO_TSAN
|
||||
#endif
|
||||
{
|
||||
#ifndef NDEBUG
|
||||
ReentrantGuard guard(this->enqueuing);
|
||||
#endif
|
||||
|
||||
// High-level pseudocode (assuming we're allowed to alloc a new block):
|
||||
// If room in tail block, add to tail
|
||||
// Else check next block
|
||||
// If next block is not the head block, enqueue on next block
|
||||
// Else create a new block and enqueue there
|
||||
// Advance tail to the block we just enqueued to
|
||||
|
||||
Block* tailBlock_ = tailBlock.load();
|
||||
size_t blockFront = tailBlock_->localFront;
|
||||
size_t blockTail = tailBlock_->tail.load();
|
||||
|
||||
size_t nextBlockTail = (blockTail + 1) & tailBlock_->sizeMask;
|
||||
if (nextBlockTail != blockFront ||
|
||||
nextBlockTail != (tailBlock_->localFront = tailBlock_->front.load())) {
|
||||
fence(memory_order_acquire);
|
||||
// This block has room for at least one more element
|
||||
char* location = tailBlock_->data + blockTail * sizeof(T);
|
||||
#if MOODYCAMEL_HAS_EMPLACE
|
||||
new (location) T(std::forward<Args>(args)...);
|
||||
#else
|
||||
new (location) T(std::forward<U>(element));
|
||||
#endif
|
||||
|
||||
fence(memory_order_release);
|
||||
tailBlock_->tail = nextBlockTail;
|
||||
} else {
|
||||
fence(memory_order_acquire);
|
||||
if (tailBlock_->next.load() != frontBlock) {
|
||||
// Note that the reason we can't advance to the frontBlock and start adding new
|
||||
// entries there is because if we did, then dequeue would stay in that block,
|
||||
// eventually reading the new values, instead of advancing to the next full block
|
||||
// (whose values were enqueued first and so should be consumed first).
|
||||
|
||||
fence(memory_order_acquire); // Ensure we get latest writes if we got the latest
|
||||
// frontBlock
|
||||
|
||||
// tailBlock is full, but there's a free block ahead, use it
|
||||
Block* tailBlockNext = tailBlock_->next.load();
|
||||
size_t nextBlockFront = tailBlockNext->localFront = tailBlockNext->front.load();
|
||||
nextBlockTail = tailBlockNext->tail.load();
|
||||
fence(memory_order_acquire);
|
||||
|
||||
// This block must be empty since it's not the head block and we
|
||||
// go through the blocks in a circle
|
||||
assert(nextBlockFront == nextBlockTail);
|
||||
tailBlockNext->localFront = nextBlockFront;
|
||||
|
||||
char* location = tailBlockNext->data + nextBlockTail * sizeof(T);
|
||||
#if MOODYCAMEL_HAS_EMPLACE
|
||||
new (location) T(std::forward<Args>(args)...);
|
||||
#else
|
||||
new (location) T(std::forward<U>(element));
|
||||
#endif
|
||||
|
||||
tailBlockNext->tail = (nextBlockTail + 1) & tailBlockNext->sizeMask;
|
||||
|
||||
fence(memory_order_release);
|
||||
tailBlock = tailBlockNext;
|
||||
} else if (canAlloc == CanAlloc) {
|
||||
// tailBlock is full and there's no free block ahead; create a new block
|
||||
auto newBlockSize =
|
||||
largestBlockSize >= MAX_BLOCK_SIZE ? largestBlockSize : largestBlockSize * 2;
|
||||
auto newBlock = make_block(newBlockSize);
|
||||
if (newBlock == nullptr) {
|
||||
// Could not allocate a block!
|
||||
return false;
|
||||
}
|
||||
largestBlockSize = newBlockSize;
|
||||
|
||||
#if MOODYCAMEL_HAS_EMPLACE
|
||||
new (newBlock->data) T(std::forward<Args>(args)...);
|
||||
#else
|
||||
new (newBlock->data) T(std::forward<U>(element));
|
||||
#endif
|
||||
assert(newBlock->front == 0);
|
||||
newBlock->tail = newBlock->localTail = 1;
|
||||
|
||||
newBlock->next = tailBlock_->next.load();
|
||||
tailBlock_->next = newBlock;
|
||||
|
||||
// Might be possible for the dequeue thread to see the new tailBlock->next
|
||||
// *without* seeing the new tailBlock value, but this is OK since it can't
|
||||
// advance to the next block until tailBlock is set anyway (because the only
|
||||
// case where it could try to read the next is if it's already at the tailBlock,
|
||||
// and it won't advance past tailBlock in any circumstance).
|
||||
|
||||
fence(memory_order_release);
|
||||
tailBlock = newBlock;
|
||||
} else if (canAlloc == CannotAlloc) {
|
||||
// Would have had to allocate a new block to enqueue, but not allowed
|
||||
return false;
|
||||
} else {
|
||||
assert(false && "Should be unreachable code");
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
// Disable copying
|
||||
ReaderWriterQueue(ReaderWriterQueue const&) {}
|
||||
|
||||
// Disable assignment
|
||||
ReaderWriterQueue& operator=(ReaderWriterQueue const&) {}
|
||||
|
||||
AE_FORCEINLINE static size_t ceilToPow2(size_t x) {
|
||||
// From http://graphics.stanford.edu/~seander/bithacks.html#RoundUpPowerOf2
|
||||
--x;
|
||||
x |= x >> 1;
|
||||
x |= x >> 2;
|
||||
x |= x >> 4;
|
||||
for (size_t i = 1; i < sizeof(size_t); i <<= 1) {
|
||||
x |= x >> (i << 3);
|
||||
}
|
||||
++x;
|
||||
return x;
|
||||
}
|
||||
|
||||
template <typename U>
|
||||
static AE_FORCEINLINE char* align_for(char* ptr) AE_NO_TSAN {
|
||||
const std::size_t alignment = std::alignment_of<U>::value;
|
||||
return ptr + (alignment - (reinterpret_cast<std::uintptr_t>(ptr) % alignment)) % alignment;
|
||||
}
|
||||
|
||||
private:
|
||||
#ifndef NDEBUG
|
||||
struct ReentrantGuard {
|
||||
AE_NO_TSAN ReentrantGuard(weak_atomic<bool>& _inSection) : inSection(_inSection) {
|
||||
assert(!inSection &&
|
||||
"Concurrent (or re-entrant) enqueue or dequeue operation detected (only one "
|
||||
"thread at a time may hold the producer or consumer role)");
|
||||
inSection = true;
|
||||
}
|
||||
|
||||
AE_NO_TSAN ~ReentrantGuard() {
|
||||
inSection = false;
|
||||
}
|
||||
|
||||
private:
|
||||
ReentrantGuard& operator=(ReentrantGuard const&);
|
||||
|
||||
private:
|
||||
weak_atomic<bool>& inSection;
|
||||
};
|
||||
#endif
|
||||
|
||||
struct Block {
|
||||
// Avoid false-sharing by putting highly contended variables on their own cache lines
|
||||
weak_atomic<size_t> front; // (Atomic) Elements are read from here
|
||||
size_t localTail; // An uncontended shadow copy of tail, owned by the consumer
|
||||
|
||||
char cachelineFiller0[MOODYCAMEL_CACHE_LINE_SIZE - sizeof(weak_atomic<size_t>) -
|
||||
sizeof(size_t)];
|
||||
weak_atomic<size_t> tail; // (Atomic) Elements are enqueued here
|
||||
size_t localFront;
|
||||
|
||||
char cachelineFiller1[MOODYCAMEL_CACHE_LINE_SIZE - sizeof(weak_atomic<size_t>) -
|
||||
sizeof(size_t)]; // next isn't very contended, but we don't want it on
|
||||
// the same cache line as tail (which is)
|
||||
weak_atomic<Block*> next; // (Atomic)
|
||||
|
||||
char* data; // Contents (on heap) are aligned to T's alignment
|
||||
|
||||
const size_t sizeMask;
|
||||
|
||||
// size must be a power of two (and greater than 0)
|
||||
AE_NO_TSAN Block(size_t const& _size, char* _rawThis, char* _data)
|
||||
: front(0UL), localTail(0), tail(0UL), localFront(0), next(nullptr), data(_data),
|
||||
sizeMask(_size - 1), rawThis(_rawThis) {}
|
||||
|
||||
private:
|
||||
// C4512 - Assignment operator could not be generated
|
||||
Block& operator=(Block const&);
|
||||
|
||||
public:
|
||||
char* rawThis;
|
||||
};
|
||||
|
||||
static Block* make_block(size_t capacity) AE_NO_TSAN {
|
||||
// Allocate enough memory for the block itself, as well as all the elements it will contain
|
||||
auto size = sizeof(Block) + std::alignment_of<Block>::value - 1;
|
||||
size += sizeof(T) * capacity + std::alignment_of<T>::value - 1;
|
||||
auto newBlockRaw = static_cast<char*>(std::malloc(size));
|
||||
if (newBlockRaw == nullptr) {
|
||||
return nullptr;
|
||||
}
|
||||
|
||||
auto newBlockAligned = align_for<Block>(newBlockRaw);
|
||||
auto newBlockData = align_for<T>(newBlockAligned + sizeof(Block));
|
||||
return new (newBlockAligned) Block(capacity, newBlockRaw, newBlockData);
|
||||
}
|
||||
|
||||
private:
|
||||
weak_atomic<Block*> frontBlock; // (Atomic) Elements are dequeued from this block
|
||||
|
||||
char cachelineFiller[MOODYCAMEL_CACHE_LINE_SIZE - sizeof(weak_atomic<Block*>)];
|
||||
weak_atomic<Block*> tailBlock; // (Atomic) Elements are enqueued to this block
|
||||
|
||||
size_t largestBlockSize;
|
||||
|
||||
#ifndef NDEBUG
|
||||
weak_atomic<bool> enqueuing;
|
||||
mutable weak_atomic<bool> dequeuing;
|
||||
#endif
|
||||
};
|
||||
|
||||
// Like ReaderWriterQueue, but also providees blocking operations
|
||||
template <typename T, size_t MAX_BLOCK_SIZE = 512>
|
||||
class BlockingReaderWriterQueue {
|
||||
private:
|
||||
typedef ::Common::ReaderWriterQueue<T, MAX_BLOCK_SIZE> ReaderWriterQueue;
|
||||
|
||||
public:
|
||||
explicit BlockingReaderWriterQueue(size_t size = 15) AE_NO_TSAN
|
||||
: inner(size),
|
||||
sema(new spsc_sema::LightweightSemaphore()) {}
|
||||
|
||||
BlockingReaderWriterQueue(BlockingReaderWriterQueue&& other) AE_NO_TSAN
|
||||
: inner(std::move(other.inner)),
|
||||
sema(std::move(other.sema)) {}
|
||||
|
||||
BlockingReaderWriterQueue& operator=(BlockingReaderWriterQueue&& other) AE_NO_TSAN {
|
||||
std::swap(sema, other.sema);
|
||||
std::swap(inner, other.inner);
|
||||
return *this;
|
||||
}
|
||||
|
||||
// Enqueues a copy of element if there is room in the queue.
|
||||
// Returns true if the element was enqueued, false otherwise.
|
||||
// Does not allocate memory.
|
||||
AE_FORCEINLINE bool try_enqueue(T const& element) AE_NO_TSAN {
|
||||
if (inner.try_enqueue(element)) {
|
||||
sema->signal();
|
||||
return true;
|
||||
}
|
||||
return false;
|
||||
}
|
||||
|
||||
// Enqueues a moved copy of element if there is room in the queue.
|
||||
// Returns true if the element was enqueued, false otherwise.
|
||||
// Does not allocate memory.
|
||||
AE_FORCEINLINE bool try_enqueue(T&& element) AE_NO_TSAN {
|
||||
if (inner.try_enqueue(std::forward<T>(element))) {
|
||||
sema->signal();
|
||||
return true;
|
||||
}
|
||||
return false;
|
||||
}
|
||||
|
||||
#if MOODYCAMEL_HAS_EMPLACE
|
||||
// Like try_enqueue() but with emplace semantics (i.e. construct-in-place).
|
||||
template <typename... Args>
|
||||
AE_FORCEINLINE bool try_emplace(Args&&... args) AE_NO_TSAN {
|
||||
if (inner.try_emplace(std::forward<Args>(args)...)) {
|
||||
sema->signal();
|
||||
return true;
|
||||
}
|
||||
return false;
|
||||
}
|
||||
#endif
|
||||
|
||||
// Enqueues a copy of element on the queue.
|
||||
// Allocates an additional block of memory if needed.
|
||||
// Only fails (returns false) if memory allocation fails.
|
||||
AE_FORCEINLINE bool enqueue(T const& element) AE_NO_TSAN {
|
||||
if (inner.enqueue(element)) {
|
||||
sema->signal();
|
||||
return true;
|
||||
}
|
||||
return false;
|
||||
}
|
||||
|
||||
// Enqueues a moved copy of element on the queue.
|
||||
// Allocates an additional block of memory if needed.
|
||||
// Only fails (returns false) if memory allocation fails.
|
||||
AE_FORCEINLINE bool enqueue(T&& element) AE_NO_TSAN {
|
||||
if (inner.enqueue(std::forward<T>(element))) {
|
||||
sema->signal();
|
||||
return true;
|
||||
}
|
||||
return false;
|
||||
}
|
||||
|
||||
#if MOODYCAMEL_HAS_EMPLACE
|
||||
// Like enqueue() but with emplace semantics (i.e. construct-in-place).
|
||||
template <typename... Args>
|
||||
AE_FORCEINLINE bool emplace(Args&&... args) AE_NO_TSAN {
|
||||
if (inner.emplace(std::forward<Args>(args)...)) {
|
||||
sema->signal();
|
||||
return true;
|
||||
}
|
||||
return false;
|
||||
}
|
||||
#endif
|
||||
|
||||
// Attempts to dequeue an element; if the queue is empty,
|
||||
// returns false instead. If the queue has at least one element,
|
||||
// moves front to result using operator=, then returns true.
|
||||
template <typename U>
|
||||
bool try_dequeue(U& result) AE_NO_TSAN {
|
||||
if (sema->tryWait()) {
|
||||
bool success = inner.try_dequeue(result);
|
||||
assert(success);
|
||||
AE_UNUSED(success);
|
||||
return true;
|
||||
}
|
||||
return false;
|
||||
}
|
||||
|
||||
// Attempts to dequeue an element; if the queue is empty,
|
||||
// waits until an element is available, then dequeues it.
|
||||
template <typename U>
|
||||
void wait_dequeue(U& result) AE_NO_TSAN {
|
||||
while (!sema->wait())
|
||||
;
|
||||
bool success = inner.try_dequeue(result);
|
||||
AE_UNUSED(result);
|
||||
assert(success);
|
||||
AE_UNUSED(success);
|
||||
}
|
||||
|
||||
// Attempts to dequeue an element; if the queue is empty,
|
||||
// waits until an element is available up to the specified timeout,
|
||||
// then dequeues it and returns true, or returns false if the timeout
|
||||
// expires before an element can be dequeued.
|
||||
// Using a negative timeout indicates an indefinite timeout,
|
||||
// and is thus functionally equivalent to calling wait_dequeue.
|
||||
template <typename U>
|
||||
bool wait_dequeue_timed(U& result, std::int64_t timeout_usecs) AE_NO_TSAN {
|
||||
if (!sema->wait(timeout_usecs)) {
|
||||
return false;
|
||||
}
|
||||
bool success = inner.try_dequeue(result);
|
||||
AE_UNUSED(result);
|
||||
assert(success);
|
||||
AE_UNUSED(success);
|
||||
return true;
|
||||
}
|
||||
|
||||
#if __cplusplus > 199711L || _MSC_VER >= 1700
|
||||
// Attempts to dequeue an element; if the queue is empty,
|
||||
// waits until an element is available up to the specified timeout,
|
||||
// then dequeues it and returns true, or returns false if the timeout
|
||||
// expires before an element can be dequeued.
|
||||
// Using a negative timeout indicates an indefinite timeout,
|
||||
// and is thus functionally equivalent to calling wait_dequeue.
|
||||
template <typename U, typename Rep, typename Period>
|
||||
inline bool wait_dequeue_timed(U& result,
|
||||
std::chrono::duration<Rep, Period> const& timeout) AE_NO_TSAN {
|
||||
return wait_dequeue_timed(
|
||||
result, std::chrono::duration_cast<std::chrono::microseconds>(timeout).count());
|
||||
}
|
||||
#endif
|
||||
|
||||
// Returns a pointer to the front element in the queue (the one that
|
||||
// would be removed next by a call to `try_dequeue` or `pop`). If the
|
||||
// queue appears empty at the time the method is called, nullptr is
|
||||
// returned instead.
|
||||
// Must be called only from the consumer thread.
|
||||
AE_FORCEINLINE T* peek() const AE_NO_TSAN {
|
||||
return inner.peek();
|
||||
}
|
||||
|
||||
// Removes the front element from the queue, if any, without returning it.
|
||||
// Returns true on success, or false if the queue appeared empty at the time
|
||||
// `pop` was called.
|
||||
AE_FORCEINLINE bool pop() AE_NO_TSAN {
|
||||
if (sema->tryWait()) {
|
||||
bool result = inner.pop();
|
||||
assert(result);
|
||||
AE_UNUSED(result);
|
||||
return true;
|
||||
}
|
||||
return false;
|
||||
}
|
||||
|
||||
// Returns the approximate number of items currently in the queue.
|
||||
// Safe to call from both the producer and consumer threads.
|
||||
AE_FORCEINLINE size_t size_approx() const AE_NO_TSAN {
|
||||
return sema->availableApprox();
|
||||
}
|
||||
|
||||
// Returns the total number of items that could be enqueued without incurring
|
||||
// an allocation when this queue is empty.
|
||||
// Safe to call from both the producer and consumer threads.
|
||||
//
|
||||
// NOTE: The actual capacity during usage may be different depending on the consumer.
|
||||
// If the consumer is removing elements concurrently, the producer cannot add to
|
||||
// the block the consumer is removing from until it's completely empty, except in
|
||||
// the case where the producer was writing to the same block the consumer was
|
||||
// reading from the whole time.
|
||||
AE_FORCEINLINE size_t max_capacity() const {
|
||||
return inner.max_capacity();
|
||||
}
|
||||
|
||||
private:
|
||||
// Disable copying & assignment
|
||||
BlockingReaderWriterQueue(BlockingReaderWriterQueue const&) {}
|
||||
BlockingReaderWriterQueue& operator=(BlockingReaderWriterQueue const&) {}
|
||||
|
||||
private:
|
||||
ReaderWriterQueue inner;
|
||||
std::unique_ptr<spsc_sema::LightweightSemaphore> sema;
|
||||
};
|
||||
|
||||
} // namespace Common
|
||||
|
||||
#ifdef AE_VCPP
|
||||
#pragma warning(pop)
|
||||
#endif
|
|
@ -1,7 +1,11 @@
|
|||
// SPDX-FileCopyrightText: Copyright 2025 Eden Emulator Project
|
||||
// SPDX-License-Identifier: GPL-3.0-or-later
|
||||
|
||||
// SPDX-FileCopyrightText: Copyright 2018 yuzu Emulator Project
|
||||
// SPDX-License-Identifier: GPL-2.0-or-later
|
||||
|
||||
#include <array>
|
||||
#include <vector>
|
||||
#include <mbedtls/cipher.h>
|
||||
#include "common/assert.h"
|
||||
#include "common/logging/log.h"
|
||||
|
@ -71,14 +75,16 @@ void AESCipher<Key, KeySize>::Transcode(const u8* src, std::size_t size, u8* des
|
|||
|
||||
mbedtls_cipher_reset(context);
|
||||
|
||||
// Only ECB strictly requires block sized chunks.
|
||||
std::size_t written = 0;
|
||||
if (mbedtls_cipher_get_cipher_mode(context) == MBEDTLS_MODE_XTS) {
|
||||
if (mbedtls_cipher_get_cipher_mode(context) != MBEDTLS_MODE_ECB) {
|
||||
mbedtls_cipher_update(context, src, size, dest, &written);
|
||||
if (written != size) {
|
||||
LOG_WARNING(Crypto, "Not all data was decrypted requested={:016X}, actual={:016X}.",
|
||||
size, written);
|
||||
if (written != size)
|
||||
LOG_WARNING(Crypto, "Not all data was processed requested={:016X}, actual={:016X}.", size, written);
|
||||
return;
|
||||
}
|
||||
} else {
|
||||
|
||||
// ECB path: operate in block sized chunks and mirror previous behavior.
|
||||
const auto block_size = mbedtls_cipher_get_block_size(context);
|
||||
if (size < block_size) {
|
||||
std::vector<u8> block(block_size);
|
||||
|
@ -89,7 +95,7 @@ void AESCipher<Key, KeySize>::Transcode(const u8* src, std::size_t size, u8* des
|
|||
}
|
||||
|
||||
for (std::size_t offset = 0; offset < size; offset += block_size) {
|
||||
auto length = std::min<std::size_t>(block_size, size - offset);
|
||||
const auto length = std::min<std::size_t>(block_size, size - offset);
|
||||
mbedtls_cipher_update(context, src + offset, length, dest + offset, &written);
|
||||
if (written != length) {
|
||||
if (length < block_size) {
|
||||
|
@ -99,9 +105,7 @@ void AESCipher<Key, KeySize>::Transcode(const u8* src, std::size_t size, u8* des
|
|||
std::memcpy(dest + offset, block.data(), length);
|
||||
return;
|
||||
}
|
||||
LOG_WARNING(Crypto, "Not all data was decrypted requested={:016X}, actual={:016X}.",
|
||||
length, written);
|
||||
}
|
||||
LOG_WARNING(Crypto, "Not all data was processed requested={:016X}, actual={:016X}.", length, written);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
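The rewritten Transcode hands every non-ECB mode (CTR, XTS, and so on) to mbedtls in a single mbedtls_cipher_update call, since only ECB strictly needs block-sized chunking. A rough standalone sketch of that single-call pattern against the mbedtls cipher API, with error handling trimmed and dummy key/nonce values; it is an illustration, not the emulator's AESCipher wrapper:

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>
#include <mbedtls/cipher.h>

int main() {
    const unsigned char key[16] = {};   // dummy all-zero key
    const unsigned char nonce[16] = {}; // dummy CTR counter block
    std::vector<unsigned char> plain(100, 0xAB); // deliberately not a multiple of 16
    std::vector<unsigned char> out(plain.size());

    mbedtls_cipher_context_t ctx;
    mbedtls_cipher_init(&ctx);
    mbedtls_cipher_setup(&ctx, mbedtls_cipher_info_from_type(MBEDTLS_CIPHER_AES_128_CTR));
    mbedtls_cipher_setkey(&ctx, key, 128, MBEDTLS_ENCRYPT);
    mbedtls_cipher_set_iv(&ctx, nonce, sizeof(nonce));
    mbedtls_cipher_reset(&ctx);

    // CTR is a stream mode: one update call can process the whole buffer,
    // with no padding and no block-sized chunking required.
    std::size_t written = 0;
    mbedtls_cipher_update(&ctx, plain.data(), plain.size(), out.data(), &written);
    std::printf("requested=%zu written=%zu\n", plain.size(), written);

    mbedtls_cipher_free(&ctx);
    return written == plain.size() ? 0 : 1;
}
```
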
@@ -1,3 +1,6 @@
// SPDX-FileCopyrightText: Copyright 2025 Eden Emulator Project
// SPDX-License-Identifier: GPL-3.0-or-later

// SPDX-FileCopyrightText: Copyright 2018 yuzu Emulator Project
// SPDX-License-Identifier: GPL-2.0-or-later

@@ -15,26 +18,36 @@ std::size_t CTREncryptionLayer::Read(u8* data, std::size_t length, std::size_t o
if (length == 0)
return 0;

const auto sector_offset = offset & 0xF;
if (sector_offset == 0) {
std::size_t total_read = 0;
// Handle an initial misaligned portion if needed.
if (auto const sector_offset = offset & 0xF; sector_offset != 0) {
const std::size_t aligned_off = offset - sector_offset;
std::array<u8, 0x10> block{};
if (auto const got = base->Read(block.data(), block.size(), aligned_off); got != 0) {
UpdateIV(base_offset + aligned_off);
cipher.Transcode(block.data(), got, block.data(), Op::Decrypt);
auto const to_copy = std::min<std::size_t>(length, got > sector_offset ? got - sector_offset : 0);
if (to_copy > 0) {
std::memcpy(data, block.data() + sector_offset, to_copy);
data += to_copy;
offset += to_copy;
length -= to_copy;
total_read += to_copy;
}
} else {
return 0;
}
}
if (length > 0) {
// Now aligned to 0x10
UpdateIV(base_offset + offset);
std::vector<u8> raw = base->ReadBytes(length, offset);
cipher.Transcode(raw.data(), raw.size(), data, Op::Decrypt);
return length;
const std::size_t got = base->Read(data, length, offset);
if (got > 0) {
cipher.Transcode(data, got, data, Op::Decrypt);
total_read += got;
}

// offset does not fall on block boundary (0x10)
std::vector<u8> block = base->ReadBytes(0x10, offset - sector_offset);
UpdateIV(base_offset + offset - sector_offset);
cipher.Transcode(block.data(), block.size(), block.data(), Op::Decrypt);
std::size_t read = 0x10 - sector_offset;

if (length + sector_offset < 0x10) {
std::memcpy(data, block.data() + sector_offset, std::min<u64>(length, read));
return std::min<u64>(length, read);
}
std::memcpy(data, block.data() + sector_offset, read);
return read + Read(data + read, length - read, offset + read);
return total_read;
}

void CTREncryptionLayer::SetIV(const IVData& iv_) {

|
@@ -5,12 +5,13 @@
// SPDX-License-Identifier: GPL-2.0-or-later

#include <algorithm>
#include <array>
#include <cstring>
#include "core/crypto/xts_encryption_layer.h"

namespace Core::Crypto {

constexpr u64 XTS_SECTOR_SIZE = 0x4000;
constexpr std::size_t XTS_SECTOR_SIZE = 0x4000;

XTSEncryptionLayer::XTSEncryptionLayer(FileSys::VirtualFile base_, Key256 key_)
: EncryptionLayer(std::move(base_)), cipher(key_, Mode::XTS) {}

@@ -19,41 +20,67 @@ std::size_t XTSEncryptionLayer::Read(u8* data, std::size_t length, std::size_t o
if (length == 0)
return 0;

const auto sector_offset = offset & 0x3FFF;
if (sector_offset == 0) {
if (length % XTS_SECTOR_SIZE == 0) {
std::vector<u8> raw = base->ReadBytes(length, offset);
cipher.XTSTranscode(raw.data(), raw.size(), data, offset / XTS_SECTOR_SIZE,
std::size_t total_read = 0;
// Handle initial unaligned part within a sector.
if (auto const sector_offset = offset % XTS_SECTOR_SIZE; sector_offset != 0) {
const std::size_t aligned_off = offset - sector_offset;
std::array<u8, XTS_SECTOR_SIZE> block{};
if (auto const got = base->Read(block.data(), XTS_SECTOR_SIZE, aligned_off); got > 0) {
if (got < XTS_SECTOR_SIZE)
std::memset(block.data() + got, 0, XTS_SECTOR_SIZE - got);
cipher.XTSTranscode(block.data(), XTS_SECTOR_SIZE, block.data(), aligned_off / XTS_SECTOR_SIZE,
XTS_SECTOR_SIZE, Op::Decrypt);
return raw.size();

auto const to_copy = std::min<std::size_t>(length, got > sector_offset ? got - sector_offset : 0);
if (to_copy > 0) {
std::memcpy(data, block.data() + sector_offset, to_copy);
data += to_copy;
offset += to_copy;
length -= to_copy;
total_read += to_copy;
}
if (length > XTS_SECTOR_SIZE) {
const auto rem = length % XTS_SECTOR_SIZE;
const auto read = length - rem;
return Read(data, read, offset) + Read(data + read, rem, offset + read);
} else {
return 0;
}
std::vector<u8> buffer = base->ReadBytes(XTS_SECTOR_SIZE, offset);
if (buffer.size() < XTS_SECTOR_SIZE)
buffer.resize(XTS_SECTOR_SIZE);
cipher.XTSTranscode(buffer.data(), buffer.size(), buffer.data(), offset / XTS_SECTOR_SIZE,
XTS_SECTOR_SIZE, Op::Decrypt);
std::memcpy(data, buffer.data(), (std::min)(buffer.size(), length));
return (std::min)(buffer.size(), length);
}

// offset does not fall on block boundary (0x4000)
std::vector<u8> block = base->ReadBytes(0x4000, offset - sector_offset);
if (block.size() < XTS_SECTOR_SIZE)
block.resize(XTS_SECTOR_SIZE);
cipher.XTSTranscode(block.data(), block.size(), block.data(),
(offset - sector_offset) / XTS_SECTOR_SIZE, XTS_SECTOR_SIZE, Op::Decrypt);
const std::size_t read = XTS_SECTOR_SIZE - sector_offset;

if (length + sector_offset < XTS_SECTOR_SIZE) {
std::memcpy(data, block.data() + sector_offset, std::min<u64>(length, read));
return std::min<u64>(length, read);
if (length > 0) {
// Process aligned middle inplace, in sector sized multiples.
while (length >= XTS_SECTOR_SIZE) {
const std::size_t req = (length / XTS_SECTOR_SIZE) * XTS_SECTOR_SIZE;
const std::size_t got = base->Read(data, req, offset);
if (got == 0) {
return total_read;
}
std::memcpy(data, block.data() + sector_offset, read);
return read + Read(data + read, length - read, offset + read);
const std::size_t got_rounded = got - (got % XTS_SECTOR_SIZE);
if (got_rounded > 0) {
cipher.XTSTranscode(data, got_rounded, data, offset / XTS_SECTOR_SIZE, XTS_SECTOR_SIZE, Op::Decrypt);
data += got_rounded;
offset += got_rounded;
length -= got_rounded;
total_read += got_rounded;
}
// If we didn't get a full sector next, break to handle tail.
if (got_rounded != got) {
break;
}
}
// Handle tail within a sector, if any.
if (length > 0) {
std::array<u8, XTS_SECTOR_SIZE> block{};
const std::size_t got = base->Read(block.data(), XTS_SECTOR_SIZE, offset);
if (got > 0) {
if (got < XTS_SECTOR_SIZE) {
std::memset(block.data() + got, 0, XTS_SECTOR_SIZE - got);
}
cipher.XTSTranscode(block.data(), XTS_SECTOR_SIZE, block.data(),
offset / XTS_SECTOR_SIZE, XTS_SECTOR_SIZE, Op::Decrypt);
const std::size_t to_copy = std::min<std::size_t>(length, got);
std::memcpy(data, block.data(), to_copy);
total_read += to_copy;
}
}
}
return total_read;
}
} // namespace Core::Crypto
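For orientation, the rewritten Read above follows a head / middle / tail pattern for sector-granular XTS decryption: any unaligned head or tail is decrypted through a sector-sized scratch block, while whole sectors in the middle are decrypted in place in the caller's buffer. A minimal sketch of that pattern, with read_raw and decrypt_sector as stand-in callbacks rather than the emulator's actual API:

#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>

constexpr std::size_t kSectorSize = 0x4000; // illustrative sector size

std::size_t ReadDecrypted(std::uint8_t* out, std::size_t length, std::size_t offset,
                          const std::function<std::size_t(std::uint8_t*, std::size_t, std::size_t)>& read_raw,
                          const std::function<void(std::uint8_t*, std::size_t)>& decrypt_sector) {
    std::size_t total = 0;
    // Head: decrypt the whole first sector into scratch, copy out only the requested slice.
    if (const std::size_t head = offset % kSectorSize; head != 0) {
        std::array<std::uint8_t, kSectorSize> block{};
        const std::size_t got = read_raw(block.data(), kSectorSize, offset - head);
        if (got == 0) {
            return 0;
        }
        decrypt_sector(block.data(), (offset - head) / kSectorSize);
        const std::size_t n = std::min(length, got > head ? got - head : 0);
        std::memcpy(out, block.data() + head, n);
        out += n; offset += n; length -= n; total += n;
    }
    // Middle: whole sectors, decrypted in place in the destination buffer.
    while (length >= kSectorSize) {
        if (read_raw(out, kSectorSize, offset) < kSectorSize) {
            break;
        }
        decrypt_sector(out, offset / kSectorSize);
        out += kSectorSize; offset += kSectorSize; length -= kSectorSize; total += kSectorSize;
    }
    // Tail: same scratch-block treatment as the head.
    if (length > 0) {
        std::array<std::uint8_t, kSectorSize> block{};
        if (const std::size_t got = read_raw(block.data(), kSectorSize, offset); got > 0) {
            decrypt_sector(block.data(), offset / kSectorSize);
            const std::size_t n = std::min(length, got);
            std::memcpy(out, block.data(), n);
            total += n;
        }
    }
    return total;
}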
@@ -4,6 +4,7 @@
// SPDX-FileCopyrightText: Copyright 2023 yuzu Emulator Project
// SPDX-License-Identifier: GPL-2.0-or-later

#include <boost/container/static_vector.hpp>
#include "common/alignment.h"
#include "common/swap.h"
#include "core/file_sys/fssystem/fssystem_aes_ctr_storage.h"

@@ -83,32 +84,24 @@ size_t AesCtrStorage::Write(const u8* buffer, size_t size, size_t offset) {
std::memcpy(ctr.data(), m_iv.data(), IvSize);
AddCounter(ctr.data(), IvSize, offset / BlockSize);

// Loop until all data is written.
size_t remaining = size;
s64 cur_offset = 0;

// Get a pooled buffer.
std::vector<char> pooled_buffer(BlockSize);
while (remaining > 0) {
// Loop until all data is written using a pooled buffer residing on the stack (blocksize = 0x10)
boost::container::static_vector<u8, BlockSize> pooled_buffer;
for (size_t remaining = size; remaining > 0; ) {
// Determine data we're writing and where.
const size_t write_size = std::min(pooled_buffer.size(), remaining);
u8* write_buf = reinterpret_cast<u8*>(pooled_buffer.data());
auto const write_size = (std::min)(pooled_buffer.size(), remaining);
u8* write_buf = pooled_buffer.data();

// Encrypt the data.
// Encrypt the data and then write it.
m_cipher->SetIV(ctr);
m_cipher->Transcode(buffer, write_size, write_buf, Core::Crypto::Op::Encrypt);
m_base_storage->Write(write_buf, write_size, offset);

// Write the encrypted data.
m_base_storage->Write(write_buf, write_size, offset + cur_offset);

// Advance.
cur_offset += write_size;
// Advance next write chunk
offset += write_size;
remaining -= write_size;
if (remaining > 0) {
if (remaining > 0)
AddCounter(ctr.data(), IvSize, write_size / BlockSize);
}
}

return size;
}
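AddCounter in the hunk above advances the big-endian CTR counter by a number of blocks; the write loop calls it with offset / BlockSize up front and write_size / BlockSize after each chunk, so the counter always tracks the absolute block index. A minimal sketch of that counter arithmetic, assuming a big-endian byte counter (the helper name is illustrative, not the project's):

#include <cstddef>
#include <cstdint>

// Add `count` to a big-endian counter of `size` bytes, carrying from the
// least significant (last) byte toward the most significant (first) byte.
void AddToBigEndianCounter(std::uint8_t* ctr, std::size_t size, std::uint64_t count) {
    for (std::size_t i = size; i > 0 && count != 0; --i) {
        const std::uint64_t sum = static_cast<std::uint64_t>(ctr[i - 1]) + (count & 0xFF);
        ctr[i - 1] = static_cast<std::uint8_t>(sum);
        count = (count >> 8) + (sum >> 8); // remaining addend plus the carry out of this byte
    }
}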
@@ -4,9 +4,13 @@
// SPDX-FileCopyrightText: Copyright 2023 yuzu Emulator Project
// SPDX-License-Identifier: GPL-2.0-or-later

#include <algorithm>
#include <array>
#include <boost/container/static_vector.hpp>
#include "common/alignment.h"
#include "common/swap.h"
#include "core/file_sys/fssystem/fssystem_aes_xts_storage.h"
#include "core/file_sys/fssystem/fssystem_nca_header.h"
#include "core/file_sys/fssystem/fssystem_utility.h"

namespace FileSys {
@@ -41,18 +45,12 @@ AesXtsStorage::AesXtsStorage(VirtualFile base, const void* key1, const void* key

size_t AesXtsStorage::Read(u8* buffer, size_t size, size_t offset) const {
// Allow zero-size reads.
if (size == 0) {
if (size == 0)
return size;
}

// Ensure buffer is valid.
// Ensure buffer is valid and we can only read at block aligned offsets.
ASSERT(buffer != nullptr);

// We can only read at block aligned offsets.
ASSERT(Common::IsAligned(offset, AesBlockSize));
ASSERT(Common::IsAligned(size, AesBlockSize));

// Read the data.
ASSERT(Common::IsAligned(offset, AesBlockSize) && Common::IsAligned(size, AesBlockSize));
m_base_storage->Read(buffer, size, offset);

// Setup the counter.

@@ -60,25 +58,21 @@ size_t AesXtsStorage::Read(u8* buffer, size_t size, size_t offset) const {
std::memcpy(ctr.data(), m_iv.data(), IvSize);
AddCounter(ctr.data(), IvSize, offset / m_block_size);

// Handle any unaligned data before the start.
// Handle any unaligned data before the start; read it into a local pooled buffer
// that resides on the stack rather than going through the global memory allocator.
// At 512 bytes (an NcaHeader::XtsBlockSize-wide buffer) it is small enough to keep on the stack.
size_t processed_size = 0;
if ((offset % m_block_size) != 0) {
// Decrypt into our pooled stack buffer (max bound = NcaHeader::XtsBlockSize)
boost::container::static_vector<u8, NcaHeader::XtsBlockSize> tmp_buf;
// Determine the size of the pre-data read.
const size_t skip_size =
static_cast<size_t>(offset - Common::AlignDown(offset, m_block_size));
const size_t data_size = (std::min)(size, m_block_size - skip_size);

// Decrypt into a pooled buffer.
{
std::vector<char> tmp_buf(m_block_size, 0);
auto const skip_size = size_t(offset - Common::AlignDown(offset, m_block_size));
auto const data_size = (std::min)(size, m_block_size - skip_size);
std::fill_n(tmp_buf.begin(), skip_size, u8{0});
std::memcpy(tmp_buf.data() + skip_size, buffer, data_size);

m_cipher->SetIV(ctr);
m_cipher->Transcode(tmp_buf.data(), m_block_size, tmp_buf.data(),
Core::Crypto::Op::Decrypt);

m_cipher->Transcode(tmp_buf.data(), m_block_size, tmp_buf.data(), Core::Crypto::Op::Decrypt);
std::memcpy(buffer, tmp_buf.data() + skip_size, data_size);
}

AddCounter(ctr.data(), IvSize, 1);
processed_size += data_size;

@@ -86,20 +80,16 @@ size_t AesXtsStorage::Read(u8* buffer, size_t size, size_t offset) const {
}

// Decrypt aligned chunks.
char* cur = reinterpret_cast<char*>(buffer) + processed_size;
size_t remaining = size - processed_size;
while (remaining > 0) {
const size_t cur_size = (std::min)(m_block_size, remaining);

auto* cur = buffer + processed_size;
for (size_t remaining = size - processed_size; remaining > 0; ) {
auto const cur_size = (std::min)(m_block_size, remaining);
m_cipher->SetIV(ctr);
m_cipher->Transcode(cur, cur_size, cur, Core::Crypto::Op::Decrypt);

auto* char_cur = reinterpret_cast<char*>(cur); // same representation as cur, only the signedness differs
m_cipher->Transcode(char_cur, cur_size, char_cur, Core::Crypto::Op::Decrypt);
remaining -= cur_size;
cur += cur_size;

AddCounter(ctr.data(), IvSize, 1);
}

return size;
}
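As a sanity check on the skip_size / data_size arithmetic in the unaligned-head branch above, a small worked example with made-up values (AlignDown here is a local stand-in for Common::AlignDown):

#include <algorithm>
#include <cstddef>

// block_size = 0x200, offset = 0x350, size = 0x1000:
//   aligned   = AlignDown(0x350, 0x200)    = 0x200
//   skip_size = 0x350 - 0x200              = 0x150
//   data_size = min(0x1000, 0x200 - 0x150) = 0xB0
// i.e. only 0xB0 bytes of the first block belong to the caller's request.
constexpr std::size_t AlignDown(std::size_t value, std::size_t align) {
    return value - (value % align);
}
static_assert(AlignDown(0x350, 0x200) == 0x200);
static_assert(std::min<std::size_t>(0x1000, 0x200 - (0x350 - AlignDown(0x350, 0x200))) == 0xB0);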
3 src/dynarmic/externals/cpmfile.json (vendored)

@@ -13,6 +13,9 @@
"hash": "f943bac39c1879986decad7a442ff4288eaeca4a2907684c7914e115a55ecc43c2782ded85c0835763fe04e40d5c82220ce864423e489e648e408a84f54dc4f3",
"options": [
"MCL_INSTALL OFF"
],
"patches": [
"0001-assert-macro.patch"
]
},
"zycore": {
@@ -1,3 +1,6 @@
// SPDX-FileCopyrightText: Copyright 2025 Eden Emulator Project
// SPDX-License-Identifier: GPL-3.0-or-later

/* This file is part of the dynarmic project.
 * Copyright (c) 2022 MerryMage
 * SPDX-License-Identifier: 0BSD

@@ -238,7 +241,7 @@ EmittedBlockInfo EmitArm64(oaknut::CodeGenerator& code, IR::Block block, const E
#undef A32OPC
#undef A64OPC
default:
ASSERT_FALSE("Invalid opcode: {}", inst->GetOpcode());
ASSERT_FALSE("Invalid opcode: {:x}", std::size_t(inst->GetOpcode()));
break;
}
@@ -1,3 +1,6 @@
// SPDX-FileCopyrightText: Copyright 2025 Eden Emulator Project
// SPDX-License-Identifier: GPL-3.0-or-later

/* This file is part of the dynarmic project.
 * Copyright (c) 2024 MerryMage
 * SPDX-License-Identifier: 0BSD

@@ -140,7 +143,7 @@ EmittedBlockInfo EmitRV64(biscuit::Assembler& as, IR::Block block, const EmitCon
#undef A32OPC
#undef A64OPC
default:
ASSERT_FALSE("Invalid opcode: {}", inst->GetOpcode());
ASSERT_FALSE("Invalid opcode: {:x}", std::size_t(inst->GetOpcode()));
break;
}
}
@@ -145,7 +145,7 @@ A32EmitX64::BlockDescriptor A32EmitX64::Emit(IR::Block& block) {
#undef OPCODE
#undef A32OPC
#undef A64OPC
default: [[unlikely]] ASSERT_FALSE("Invalid opcode: {}", inst->GetOpcode());
default: [[unlikely]] ASSERT_FALSE("Invalid opcode: {:x}", std::size_t(inst->GetOpcode()));
}
reg_alloc.EndOfAllocScope();
func(reg_alloc);
@@ -130,7 +130,7 @@ A64EmitX64::BlockDescriptor A64EmitX64::Emit(IR::Block& block) noexcept {
#undef A32OPC
#undef A64OPC
default: [[unlikely]] {
ASSERT_MSG(false, "Invalid opcode: {}", opcode);
ASSERT_MSG(false, "Invalid opcode: {:x}", std::size_t(opcode));
goto finish_this_inst;
}
}
@@ -59,7 +59,7 @@ std::optional<EmitX64::BlockDescriptor> EmitX64::GetBasicBlock(IR::LocationDescr
}

void EmitX64::EmitInvalid(EmitContext&, IR::Inst* inst) {
ASSERT_MSG(false, "Invalid opcode: {}", inst->GetOpcode());
ASSERT_MSG(false, "Invalid opcode: {:x}", std::size_t(inst->GetOpcode()));
}

void EmitX64::EmitVoid(EmitContext&, IR::Inst*) {
@@ -654,11 +654,3 @@ constexpr bool MayGetNZCVFromOp(const Opcode op) noexcept {
}

} // namespace Dynarmic::IR

template<>
struct fmt::formatter<Dynarmic::IR::Opcode> : fmt::formatter<std::string> {
template<typename FormatContext>
auto format(Dynarmic::IR::Opcode op, FormatContext& ctx) const {
return formatter<std::string>::format(Dynarmic::IR::GetNameOf(op), ctx);
}
};
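With the fmt::formatter<Opcode> specialization above removed, an Opcode value can no longer be handed to fmt directly, which is why the assert messages elsewhere in this change cast to std::size_t and print with {:x}. A generic sketch of the two options, using a made-up enum rather than the dynarmic type:

#include <cstddef>
#include <string>
#include <fmt/format.h>

enum class Color { Red = 1, Green = 2 };

// Option (b): teach fmt about the type, as the removed block did for IR::Opcode.
template<>
struct fmt::formatter<Color> : fmt::formatter<std::string> {
    template<typename FormatContext>
    auto format(Color c, FormatContext& ctx) const {
        return formatter<std::string>::format(std::string{c == Color::Red ? "Red" : "Green"}, ctx);
    }
};

int main() {
    fmt::print("{:x}\n", std::size_t(Color::Green)); // option (a): cast to an integer at the call site
    fmt::print("{}\n", Color::Red);                  // option (b): formatter specialization
}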
@@ -763,7 +763,8 @@ void EmulatedController::StartMotionCalibration() {
}
}

void EmulatedController::SetButton(const Common::Input::CallbackStatus& callback, std::size_t index, Common::UUID uuid) {
void EmulatedController::SetButton(const Common::Input::CallbackStatus& callback, std::size_t index,
Common::UUID uuid) {
const auto player_index = Service::HID::NpadIdTypeToIndex(npad_id_type);
const auto& player = Settings::values.players.GetValue()[player_index];

@@ -923,9 +924,13 @@ void EmulatedController::SetButton(const Common::Input::CallbackStatus& callback

lock.unlock();

if (!is_connected && !controller_connected[player_index]) {
if (player.connected) {
Connect();
controller_connected[player_index] = true;
}
}

TriggerOnChange(ControllerTriggerType::Button, true);
}
@@ -20,6 +20,7 @@
#include "common/settings.h"
#include "common/vector_math.h"
#include "hid_core/frontend/motion_input.h"
#include "hid_core/hid_core.h"
#include "hid_core/hid_types.h"
#include "hid_core/irsensor/irs_types.h"

@@ -588,6 +589,7 @@
std::array<VibrationValue, 2> last_vibration_value{DEFAULT_VIBRATION_VALUE,
DEFAULT_VIBRATION_VALUE};
std::array<std::chrono::steady_clock::time_point, 2> last_vibration_timepoint{};
std::array<bool, HIDCore::available_controllers - 2> controller_connected{};

// Temporary values to avoid doing changes while the controller is in configuring mode
NpadStyleIndex tmp_npad_type{NpadStyleIndex::None};
@@ -14,6 +14,7 @@
#include <utility>
#include <vector>
#include <QString>
#include <QObject>
#include "common/common_types.h"
#include "common/settings_enums.h"
@@ -3,7 +3,7 @@

#pragma once

#include <QtVersionChecks>
#include <QtGlobal>

#if QT_VERSION < QT_VERSION_CHECK(6, 9, 0)
#define STATE_CHANGED stateChanged
@@ -82,16 +82,9 @@ bool compressSubDir(QuaZip *zip,
if (dir != origDir) {
QuaZipFile dirZipFile(zip);
std::unique_ptr<QuaZipNewInfo> qzni;
if (options.getDateTime().isNull()) {
qzni = std::make_unique<QuaZipNewInfo>(origDirectory.relativeFilePath(dir)
+ QLatin1String("/"),
dir);
} else {
qzni = std::make_unique<QuaZipNewInfo>(origDirectory.relativeFilePath(dir)
+ QLatin1String("/"),
dir,
options.getDateTime());
}
if (!dirZipFile.open(QIODevice::WriteOnly, *qzni, nullptr, 0, 0)) {
return false;
}

@@ -156,7 +149,7 @@ bool compressFile(QuaZip *zip,
return false;
} else {
if (!outFile.open(QIODevice::WriteOnly,
QuaZipNewInfo(fileDest, fileName, options.getDateTime()),
QuaZipNewInfo(fileDest, fileName),
nullptr,
0,
options.getCompressionMethod(),
@@ -415,10 +415,8 @@ void ExportDataDir(FrontendCommon::DataManager::DataDir data_dir,
QGuiApplication::processEvents();

auto progress_callback = [=](size_t total_size, size_t processed_size) {
QMetaObject::invokeMethod(progress,
&QtProgressDialog::setValue,
static_cast<int>((processed_size * 100) / total_size));

QMetaObject::invokeMethod(progress, "setValue", Qt::DirectConnection,
Q_ARG(int, static_cast<int>((processed_size * 100) / total_size)));
return !progress->wasCanceled();
};

@@ -501,9 +499,8 @@ void ImportDataDir(FrontendCommon::DataManager::DataDir data_dir,

QObject::connect(delete_watcher, &QFutureWatcher<bool>::finished, rootObject, [=]() {
auto progress_callback = [=](size_t total_size, size_t processed_size) {
QMetaObject::invokeMethod(progress,
&QtProgressDialog::setValue,
static_cast<int>((processed_size * 100) / total_size));
QMetaObject::invokeMethod(progress, "setValue", Qt::DirectConnection,
Q_ARG(int, static_cast<int>((processed_size * 100) / total_size)));

return !progress->wasCanceled();
};
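The string-based QMetaObject::invokeMethod overload adopted above resolves the slot by name at runtime, so each Q_ARG type must match the slot's declared parameter type exactly, and the connection type becomes explicit. A small hedged example, with Qt's own QProgressDialog standing in for the QtProgressDialog type used here:

#include <cstddef>
#include <QMetaObject>
#include <QProgressDialog>

void ReportProgress(QProgressDialog* dialog, std::size_t processed, std::size_t total) {
    const int percent = static_cast<int>((processed * 100) / total);
    // Qt::DirectConnection calls setValue immediately on the calling thread;
    // prefer Qt::QueuedConnection if this callback runs off the GUI thread.
    QMetaObject::invokeMethod(dialog, "setValue", Qt::DirectConnection, Q_ARG(int, percent));
}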
@@ -332,7 +332,7 @@ void Layer::UpdateRawImage(const Tegra::FramebufferConfig& framebuffer, size_t i
write_barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;
write_barrier.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;

cmdbuf.PipelineBarrier(VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, 0,
cmdbuf.PipelineBarrier(VK_PIPELINE_STAGE_HOST_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, 0,
read_barrier);
cmdbuf.CopyBufferToImage(*buffer, image, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, copy);
cmdbuf.PipelineBarrier(VK_PIPELINE_STAGE_TRANSFER_BIT,
@@ -114,11 +114,8 @@ VkResult MasterSemaphore::SubmitQueue(vk::CommandBuffer& cmdbuf, vk::CommandBuff
}
}

// Use precise wait stages instead of ALL_COMMANDS to avoid pipeline-wide stalls.
// First entry is used for external acquire waits; we wait at transfer and color output stages
// because this submit contains an upload cmd buffer and a render cmd buffer.
static constexpr std::array<VkPipelineStageFlags, 2> wait_stage_masks{
VK_PIPELINE_STAGE_TRANSFER_BIT | VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
VK_PIPELINE_STAGE_ALL_COMMANDS_BIT,
VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
};
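For context on the change above: in Vulkan each wait-stage mask pairs index-for-index with a wait semaphore in VkSubmitInfo, and a narrower mask lets stages before it start without waiting on that semaphore. A hedged sketch of that pairing, with placeholder handles rather than the emulator's submit path:

#include <array>
#include <cstdint>
#include <vulkan/vulkan.h>

VkSubmitInfo MakeSubmit(const std::array<VkSemaphore, 2>& waits,
                        const std::array<VkPipelineStageFlags, 2>& wait_stages,
                        const VkCommandBuffer* cmdbufs, std::uint32_t num_cmdbufs) {
    VkSubmitInfo info{};
    info.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
    info.waitSemaphoreCount = static_cast<std::uint32_t>(waits.size());
    info.pWaitSemaphores = waits.data();
    // pWaitDstStageMask[i] is the first stage at which the submission must wait
    // on pWaitSemaphores[i]; everything before that stage may run immediately.
    info.pWaitDstStageMask = wait_stages.data();
    info.commandBufferCount = num_cmdbufs;
    info.pCommandBuffers = cmdbufs;
    return info;
}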
@@ -412,7 +412,7 @@ void PresentManager::CopyToSwapchainImpl(Frame* frame) {
.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
.pNext = nullptr,
.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT,
.dstAccessMask = 0,
.dstAccessMask = VK_ACCESS_MEMORY_READ_BIT,
.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
.newLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,
.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,

@@ -460,7 +460,7 @@ void PresentManager::CopyToSwapchainImpl(Frame* frame) {
MakeImageCopy(frame->width, frame->height, extent.width, extent.height));
}

cmdbuf.PipelineBarrier(VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, {},
cmdbuf.PipelineBarrier(VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_ALL_GRAPHICS_BIT, {},
{}, {}, post_barriers);

cmdbuf.End();
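For reference, the barrier being adjusted above is the usual copy-to-swapchain handoff: the image moves from TRANSFER_DST_OPTIMAL to PRESENT_SRC_KHR, and widening dstAccessMask to MEMORY_READ covers the presentation engine's reads. A minimal sketch of such a barrier as commonly written; this is general practice, not a statement of this renderer's exact requirements:

#include <vulkan/vulkan.h>

VkImageMemoryBarrier MakePresentBarrier(VkImage image) {
    VkImageMemoryBarrier barrier{};
    barrier.sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER;
    barrier.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT; // writes from the blit/copy
    barrier.dstAccessMask = VK_ACCESS_MEMORY_READ_BIT;    // reads by the presentation engine
    barrier.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
    barrier.newLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
    barrier.srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED;
    barrier.image = image;
    barrier.subresourceRange = {VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1};
    return barrier;
}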
@@ -1068,7 +1068,7 @@ void TextureCacheRuntime::ReinterpretImage(Image& dst, Image& src,

cmdbuf.PipelineBarrier(VK_PIPELINE_STAGE_ALL_COMMANDS_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT,
0, READ_BARRIER, {}, middle_out_barrier);
cmdbuf.CopyBufferToImage(copy_buffer, dst_image, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, vk_out_copies);
cmdbuf.CopyBufferToImage(copy_buffer, dst_image, VK_IMAGE_LAYOUT_GENERAL, vk_out_copies);
cmdbuf.PipelineBarrier(VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_ALL_COMMANDS_BIT,
0, {}, {}, post_barriers);
});
@@ -44,8 +44,7 @@ void ConfigureDebug::SetConfiguration() {
ui->log_filter_edit->setText(QString::fromStdString(Settings::values.log_filter.GetValue()));
ui->flush_line->setChecked(Settings::values.log_flush_line.GetValue());
ui->censor_username->setChecked(Settings::values.censor_username.GetValue());
ui->homebrew_args_edit->setText(
QString::fromStdString(Settings::values.program_args.GetValue()));
ui->homebrew_args_edit->setText(QString::fromStdString(Settings::values.program_args.GetValue()));
ui->fs_access_log->setEnabled(runtime_lock);
ui->fs_access_log->setChecked(Settings::values.enable_fs_access_log.GetValue());
ui->reporting_services->setChecked(Settings::values.reporting_services.GetValue());

@@ -75,14 +74,12 @@ void ConfigureDebug::SetConfiguration() {
ui->disable_macro_hle->setEnabled(runtime_lock);
ui->disable_macro_hle->setChecked(Settings::values.disable_macro_hle.GetValue());
ui->disable_loop_safety_checks->setEnabled(runtime_lock);
ui->disable_loop_safety_checks->setChecked(
Settings::values.disable_shader_loop_safety_checks.GetValue());
ui->disable_loop_safety_checks->setChecked(Settings::values.disable_shader_loop_safety_checks.GetValue());
ui->extended_logging->setChecked(Settings::values.extended_logging.GetValue());
ui->perform_vulkan_check->setChecked(Settings::values.perform_vulkan_check.GetValue());

#ifdef YUZU_USE_QT_WEB_ENGINE
ui->disable_web_applet->setChecked(Settings::values.disable_web_applet.GetValue());
#else

#ifndef YUZU_USE_QT_WEB_ENGINE
ui->disable_web_applet->setVisible(false);
#endif
}

@@ -110,8 +107,7 @@ void ConfigureDebug::ApplyConfiguration() {
Settings::values.enable_nsight_aftermath = ui->enable_nsight_aftermath->isChecked();
Settings::values.dump_shaders = ui->dump_shaders->isChecked();
Settings::values.dump_macros = ui->dump_macros->isChecked();
Settings::values.disable_shader_loop_safety_checks =
ui->disable_loop_safety_checks->isChecked();
Settings::values.disable_shader_loop_safety_checks = ui->disable_loop_safety_checks->isChecked();
Settings::values.disable_macro_jit = ui->disable_macro_jit->isChecked();
Settings::values.disable_macro_hle = ui->disable_macro_hle->isChecked();
Settings::values.extended_logging = ui->extended_logging->isChecked();