diff --git a/.github/workflows/build-ci.yml b/.github/workflows/build-ci.yml
index 5cecd78e3..b8a5d3cbd 100644
--- a/.github/workflows/build-ci.yml
+++ b/.github/workflows/build-ci.yml
@@ -10,10 +10,10 @@ jobs:
       fail-fast: false
       matrix:
         #os: [windows-2019, macos-10.15, ubuntu-18.04, ubuntu-20.04]
-        os: [windows-latest, ubuntu-22.04]
+        os: [windows-latest, ubuntu-latest]
         python-version: ["3.9", "3.11"]
         include:
-          - os: ubuntu-22.04
+          - os: ubuntu-latest
             python-version: 3.9
             container: Docker
 
@@ -72,13 +72,12 @@ jobs:
           cd ../qiling
           cd ../examples/rootfs/x86_linux/kernel && unzip -P infected m0hamed_rootkit.ko.zip
           cd ../../../../
-          pip3 install -e .[RE]
-
-          if [ ${{ matrix.os }} == 'ubuntu-18.04' ] and [ ${{ matrix.python-version }} == '3.9' ]; then
+          pip3 install -e .
+          pip3 install poetry
+          if [ "${{ matrix.container }}" != "" ]; then
             docker run -it --rm -v ${GITHUB_WORKSPACE}:/qiling qilingframework/qiling:dev bash -c "cd tests && ./test_onlinux.sh"
           else
-            pip3 install poetry
-            cd tests && ./test_onlinux.sh
+            cd tests && ./test_onlinux.sh
           fi
 
     # - name: mac run tests
diff --git a/README.md b/README.md
index 63f542f38..34a02ef68 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-[![Documentation Status](https://readthedocs.org/projects/qilingframework/badge/?version=latest)](https://docs.qiling.io)
+[![Documentation Status](https://github.com/qilingframework/qiling/wiki)](https://github.com/qilingframework/qiling/wiki)
 [![Downloads](https://pepy.tech/badge/qiling)](https://pepy.tech/project/qiling)
 [![Chat on Telegram](https://img.shields.io/badge/Chat%20on-Telegram-brightgreen.svg)](https://t.me/qilingframework)
 
@@ -8,24 +8,43 @@
 <img width="150" height="150" src="https://raw.githubusercontent.com/qilingframework/qiling/master/docs/qiling2_logo_small.png">
 </p>
 
-[Qiling's use case, blog and related work](https://github.com/qilingframework/qiling/issues/134)
-
-Qiling is an advanced binary emulation framework, with the following features:
-
-- Emulate multi-platforms: Windows, macOS, Linux, Android, BSD, UEFI, DOS, MBR.
-- Emulate multi-architectures: 8086, X86, X86_64, ARM, ARM64, MIPS, RISC-V, PowerPC.
-- Support multiple file formats: PE, Mach-O, ELF, COM, MBR.
-- Support Windows Driver (.sys), Linux Kernel Module (.ko) & macOS Kernel (.kext) via [Demigod](https://groundx.io/demigod/).
-- Emulates & sandbox code in an isolated environment.
-- Provides a fully configurable sandbox.
-- Provides in-depth memory, register, OS level and filesystem level API.
-- Fine-grain instrumentation: allows hooks at various levels
-  (instruction/basic-block/memory-access/exception/syscall/IO/etc.)
-- Provides virtual machine level API such as saving and restoring the current execution state.
-- Supports cross architecture and platform debugging capabilities.
-- Built-in debugger with reverse debugging capability.
-- Allows dynamic hot patch on-the-fly running code, including the loaded library.
-- True framework in Python, making it easy to build customized security analysis tools on top.
+# Qiling Framework
+
+Qiling is an advanced binary emulation framework that allows you to emulate and sandbox code in an isolated environment across multiple platforms and architectures. Built on top of Unicorn Engine, Qiling provides a higher-level framework that understands operating system contexts, executable formats, and dynamic linking.
+
+## Table of Contents
+
+- [Features](#features)
+- [Appearance](#appearance)
+- [Use Cases](#use-cases)
+- [Quick Start](#quick-start)
+  - [Installation](#installation)
+  - [Basic Usage](#basic-usage)
+- [Qiling vs. Other Emulators](#qiling-vs-other-emulators)
+  - [Qiling vs. Unicorn Engine](#qiling-vs-unicorn-engine)
+  - [Qiling vs. QEMU User Mode](#qiling-vs-qemu-user-mode)
+- [Examples](#examples)
+- [Qltool](#qltool)
+- [Contributing](#contributing)
+- [License](#license)
+- [Contact](#contact)
+- [Core Developers & Contributors](#core-developers--contributors)
+
+## Features
+
+- **Multi-platform Emulation**: Windows, macOS, Linux, Android, BSD, UEFI, DOS, MBR.
+- **Multi-architecture Emulation**: 8086, X86, X86_64, ARM, ARM64, MIPS, RISC-V, PowerPC.
+- **Multiple File Format Support**: PE, Mach-O, ELF, COM, MBR.
+- **Kernel Module Emulation**: Supports Windows Driver (.sys), Linux Kernel Module (.ko) & macOS Kernel (.kext) via [Demigod](https://groundx.io/demigod/).
+- **Isolated Sandboxing**: Emulates & sandboxes code in an isolated environment with a fully configurable sandbox.
+- **In-depth API**: Provides in-depth memory, register, OS-level, and filesystem-level APIs.
+- **Fine-grained Instrumentation**: Allows hooks at various levels (instruction/basic-block/memory-access/exception/syscall/IO/etc.).
+- **Virtual Machine Level API**: Supports saving and restoring the current execution state.
+- **Debugging Capabilities**: Supports cross-architecture and cross-platform debugging, including a built-in debugger with reverse debugging capability.
+- **Dynamic Hot Patching**: Allows hot patching of running code on the fly, including loaded libraries.
+- **Python Framework**: A true framework in Python, making it easy to build customized security analysis tools.
+
+## Appearance
 
 Qiling also made its way to various international conferences.
 
@@ -49,79 +68,37 @@ Qiling also made its way to various international conferences.
 - [Nullcon](https://nullcon.net/website/goa-2020/speakers/kaijern-lau.php)
     
 2019:
-
 - [DEFCON, USA](https://www.defcon.org/html/defcon-27/dc-27-demolabs.html#QiLing)
 - [Hitcon](https://hitcon.org/2019/CMT/agenda)
 - [Zeronights](https://zeronights.ru/report-en/qiling-io-advanced-binary-emulation-framework/)
 
+## Use Cases
 
-Qiling is backed by [Unicorn Engine](http://www.unicorn-engine.org).
-
-Visit our [website](https://www.qiling.io) for more information.
-
----
-#### License
-
-This program is free software; you can redistribute it and/or modify
-it under the terms of the GNU General Public License as published by
-the Free Software Foundation; either version 2 of the License, or
-(at your option) any later version.
-
----
-
-#### Qiling vs. other Emulators
-
-There are many open-source emulators, but two projects closest to Qiling
-are [Unicorn](http://www.unicorn-engine.org) & [QEMU user mode](https://qemu.org).
-This section explains the main differences of Qiling against them.
-
-##### Qiling vs. Unicorn engine
+Qiling has been presented at various international conferences, showcasing its versatility in:
 
-Built on top of Unicorn, but Qiling & Unicorn are two different animals.
+- Binary analysis and reverse engineering.
+- Malware analysis and sandboxing.
+- Firmware analysis and emulation.
+- Security research and vulnerability discovery.
+- CTF challenges and exploit development.
 
-- Unicorn is just a CPU emulator, so it focuses on emulating CPU instructions,
-  that can understand emulator memory.
-  Beyond that, Unicorn is not aware of higher level concepts, such as dynamic
-  libraries, system calls, I/O handling or executable formats like PE, Mach-O
-  or ELF. As a result, Unicorn can only emulate raw machine instructions,
-  without Operating System (OS) context.
-- Qiling is designed as a higher level framework, that leverages Unicorn to
-  emulate CPU instructions, but can understand OS: it has executable format
-  loaders (for PE, Mach-O & ELF currently), dynamic linkers (so we can
-  load & relocate shared libraries), syscall & IO handlers. For this reason,
-  Qiling can run executable binary without requiring its native OS.
-
-##### Qiling vs. QEMU user mode
-
-QEMU user mode does a similar thing to our emulator, that is, to emulate whole
-executable binaries in a cross-architecture way. 
-However, Qiling offers some important differences against QEMU user mode:
+For more details on Qiling's use cases, blog posts, and related work, please refer to [Qiling's use case, blog and related work](https://github.com/qilingframework/qiling/issues/134).
 
-- Qiling is a true analysis framework,
-  that allows you to build your own dynamic analysis tools on top (in Python).
-  Meanwhile, QEMU is just a tool, not a framework.
-- Qiling can perform dynamic instrumentation, and can even hot patch code at
-  runtime. QEMU does neither.
-- Not only working cross-architecture, Qiling is also cross-platform.
-  For example, you can run Linux ELF file on top of Windows.
-  In contrast, QEMU user mode only runs binary of the same OS, such as Linux
-  ELF on Linux, due to the way it forwards syscall from emulated code to
-  native OS.
-- Qiling supports more platforms, including Windows, macOS, Linux & BSD. QEMU
-  user mode can only handle Linux & BSD.
+## Quick Start
 
----
+### Installation
 
-#### Installation
+Qiling requires Python 3.8 or newer. You can install it using pip:
 
-Please see [setup guide](https://docs.qiling.io/en/latest/install/) file for how to install Qiling Framework.
+```bash
+pip install qiling
+```
 
----
+For more detailed installation instructions and dependencies, please refer to the [official documentation](https://github.com/qilingframework/qiling/wiki/Installation).
 
-#### Examples
+### Basic Usage
 
-The example below shows how to use Qiling framework in the most
-straightforward way to emulate a Windows executable.
+The example below shows the most straightforward way to use the Qiling framework: emulating a Windows executable.
 
 ```python
 from qiling import Qiling
@@ -135,8 +112,30 @@ if __name__ == "__main__":
     ql.run()
 ```
 
-- The following example shows how a Windows crackme may be patched dynamically
-  to make it always display the “Congratulation” dialog.
+## Qiling vs. Other Emulators
+
+There are many open-source emulators, but the two projects closest to Qiling are [Unicorn](http://www.unicorn-engine.org) & [QEMU user mode](https://qemu.org). This section explains how Qiling differs from each.
+
+### Qiling vs. Unicorn Engine
+
+Qiling is built on top of Unicorn, but the two are different animals.
+
+- **Unicorn** is just a CPU emulator: it emulates CPU instructions and understands the emulator's memory. Beyond that, Unicorn is not aware of higher-level concepts such as dynamic libraries, system calls, I/O handling or executable formats like PE, Mach-O or ELF. As a result, Unicorn can only emulate raw machine instructions, without Operating System (OS) context.
+- **Qiling** is designed as a higher-level framework that leverages Unicorn to emulate CPU instructions but also understands the OS: it has executable format loaders (for PE, Mach-O & ELF currently), dynamic linkers (so it can load & relocate shared libraries), and syscall & IO handlers. For this reason, Qiling can run an executable binary without requiring its native OS.
+
+### Qiling vs. QEMU User Mode
+
+QEMU user mode does something similar to Qiling: it emulates whole executable binaries in a cross-architecture way.
+However, Qiling differs from QEMU user mode in several important ways:
+
+- **Qiling is a true analysis framework** that allows you to build your own dynamic analysis tools on top (in Python). Meanwhile, QEMU is just a tool, not a framework.
+- **Qiling can perform dynamic instrumentation** and can even hot patch code at runtime. QEMU does neither.
+- Besides working cross-architecture, **Qiling is also cross-platform**. For example, you can run a Linux ELF file on top of Windows. In contrast, QEMU user mode only runs binaries of the same OS, such as Linux ELF on Linux, because of the way it forwards syscalls from emulated code to the native OS.
+- **Qiling supports more platforms**, including Windows, macOS, Linux & BSD. QEMU user mode can only handle Linux & BSD.
+
+## Examples
+
+- The following example shows how a Windows crackme may be patched dynamically to make it always display the “Congratulation” dialog.
 
 ```python
 from qiling import Qiling
@@ -177,15 +176,13 @@ The below YouTube video shows how the above example works.
 
 #### Emulating ARM router firmware on Ubuntu x64 host
 
-Qiling Framework hot-patches and emulates an ARM router's `/usr/bin/httpd` on
-an x86_64 Ubuntu host.
+Qiling Framework hot-patches and emulates an ARM router's `/usr/bin/httpd` on an x86_64 Ubuntu host.
 
-[![Qiling Tutorial: Emulating and Fuzz ARM router firmware](https://github.com/qilingframework/theme.qiling.io/blob/master/source/img/fuzzer.jpg?raw=true)](https://www.youtube.com/watch?v=e3_T3KLh2NU)
+[![Qiling Tutorial: Emulating and Fuzz ARM router firmware](https://github.com/qilingframework/theme.qiling.io/blob/master/source/img/fuzzer.jpg?raw=true)](https://www.youtube.com/watch?v=e3_T3KLh2NU)
 
 #### Qiling's IDA Pro Plugin: Instrument and Decrypt Mirai's Secret
 
-This video demonstrates how Qiling's IDA Pro plugin can make IDA Pro run with
-Qiling instrumentation engine.
+This video demonstrates how Qiling's IDA Pro plugin lets IDA Pro run with the Qiling instrumentation engine.
 
 [![Qiling's IDA Pro Plugin: Instrument and Decrypt Mirai's Secret](http://img.youtube.com/vi/ZWMWTq2WTXk/0.jpg)](http://www.youtube.com/watch?v=ZWMWTq2WTXk)
 
@@ -195,63 +192,62 @@ Solving a simple CTF challenge with Qiling Framework and IDA Pro
 
 [![Solving a simple CTF challenge with Qiling Framework and IDA Pro](https://i.ytimg.com/vi/SPjVAt2FkKA/0.jpg)](https://www.youtube.com/watch?v=SPjVAt2FkKA)
 
-
 #### Emulating MBR
 
 Qiling Framework emulates MBR
 
 [![Qiling DEMO: Emulating MBR](https://github.com/qilingframework/theme.qiling.io/blob/master/source/img/mbr.png?raw=true)](https://github.com/qilingframework/theme.qiling.io/blob/master/source/img/mbr.png?raw=true)
 
----
-
-#### Qltool
+## Qltool
 
 Qiling also provides a friendly tool named `qltool` to quickly emulate shellcode & executable binaries.
 
 With qltool, easy execution can be performed:
 
-
 With shellcode:
 
-```
+```bash
 $ ./qltool code --os linux --arch arm --format hex -f examples/shellcodes/linarm32_tcp_reverse_shell.hex
 ```
 
 With binary file:
 
-```
+```bash
 $ ./qltool run -f examples/rootfs/x8664_linux/bin/x8664_hello --rootfs  examples/rootfs/x8664_linux/
 ```
 
 With binary and GDB debugger enabled:
 
-```
+```bash
 $ ./qltool run -f examples/rootfs/x8664_linux/bin/x8664_hello --gdb 127.0.0.1:9999 --rootfs examples/rootfs/x8664_linux
 ```
 
 With code coverage collection (UEFI only for now):
 
-```
+```bash
 $ ./qltool run -f examples/rootfs/x8664_efi/bin/TcgPlatformSetupPolicy --rootfs examples/rootfs/x8664_efi --coverage-format drcov --coverage-file TcgPlatformSetupPolicy.cov
 ```
 
 With JSON output (Windows, mainly):
 
-```
+```bash
 $ ./qltool run -f examples/rootfs/x86_windows/bin/x86_hello.exe --rootfs  examples/rootfs/x86_windows/ --console False --json
 ```
----
 
+## Contributing
 
-#### Contact
+We welcome contributions from the community! If you're interested in contributing to Qiling Framework, please check out our [GitHub repository](https://github.com/qilingframework/qiling) and look for open issues or submit a pull request.
 
-Get the latest info from our website https://www.qiling.io
+## License
 
-Contact us at email info@qiling.io,
-via Twitter [@qiling_io](https://twitter.com/qiling_io).
+This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
 
----
+## Contact
+
+Get the latest info from our website: [https://www.qiling.io](https://www.qiling.io).
+
+Contact us at email [info@qiling.io](mailto:info@qiling.io), or via Twitter [@qiling_io](https://twitter.com/qiling_io).
 
-#### Core developers, Key Contributors and etc.
+## Core Developers & Contributors
 
-Please refer to [CREDITS.md](https://github.com/qilingframework/qiling/blob/dev/CREDITS.md).
+Please refer to [CREDITS.md](https://github.com/qilingframework/qiling/blob/dev/CREDITS.md).
\ No newline at end of file
diff --git a/examples/blob_raw.ql b/examples/blob_raw.ql
new file mode 100644
index 000000000..23390130a
--- /dev/null
+++ b/examples/blob_raw.ql
@@ -0,0 +1,4 @@
+[CODE]
+load_address = 0x10000000
+entry_point = 0x10000008
+ram_size = 0xa00000
\ No newline at end of file
diff --git a/examples/fuzzing/dlink_dir815/dir815_mips32el_linux.py b/examples/fuzzing/dlink_dir815/dir815_mips32el_linux.py
index 28afce921..22ea02ffb 100644
--- a/examples/fuzzing/dlink_dir815/dir815_mips32el_linux.py
+++ b/examples/fuzzing/dlink_dir815/dir815_mips32el_linux.py
@@ -5,7 +5,7 @@
 
 # Everything about the bug and firmware https://www.exploit-db.com/exploits/33863
 
-import os,sys
+import sys
 sys.path.append("../../..")
 
 from qiling import Qiling
@@ -13,7 +13,7 @@
 from qiling.extensions.afl import ql_afl_fuzz
 
 
-def main(input_file, enable_trace=False):
+def main(input_file: str):
 
     env_vars = {
         "REQUEST_METHOD": "POST",
@@ -24,40 +24,36 @@ def main(input_file, enable_trace=False):
         # "CONTENT_LENGTH": "8", # no needed
     }
 
-    ql = Qiling(["./rootfs/htdocs/web/hedwig.cgi"], "./rootfs",
-                verbose=QL_VERBOSE.DEBUG, env=env_vars, console=enable_trace)
+    ql = Qiling(["./rootfs/htdocs/web/hedwig.cgi"], "./rootfs", verbose=QL_VERBOSE.DISABLED, env=env_vars)
 
-    def place_input_callback(ql: Qiling, input: bytes, _: int):
-        env_var = ("HTTP_COOKIE=uid=1234&password=").encode()
-        env_vars = env_var + input + b"\x00" + (ql.path).encode() + b"\x00"
-        ql.mem.write(ql.target_addr, env_vars)
+    def place_input_callback(ql: Qiling, data: bytes, _: int) -> bool:
+        # construct the payload
+        payload = b''.join((b"HTTP_COOKIE=uid=1234&password=", bytes(data), b"\x00", ql_path, b"\x00"))
 
-    def start_afl(_ql: Qiling):
+        # patch the value of 'HTTP_COOKIE' in memory
+        ql.mem.write(target_addr, payload)
+
+        # payload is in place, we are good to go
+        return True
 
+    def start_afl(_ql: Qiling):
         """
         Callback from inside
         """
+
         ql_afl_fuzz(_ql, input_file=input_file, place_input_callback=place_input_callback, exits=[ql.os.exit_point])
 
-    addr = ql.mem.search("HTTP_COOKIE=uid=1234&password=".encode())
-    ql.target_addr = addr[0]
+    addr = ql.mem.search(b"HTTP_COOKIE=uid=1234&password=")
+    target_addr = addr[0]
+    ql_path = ql.path.encode()
 
-    main_addr = ql.loader.elf_entry
-    ql.hook_address(callback=start_afl, address=main_addr)
+    ql.hook_address(start_afl, ql.loader.elf_entry)
 
-    try:
-        ql.run()
-        os._exit(0)
-    except:
-        if enable_trace:
-            print("\nFuzzer Went Shit")
-        os._exit(0)
+    ql.run()
 
 
 if __name__ == "__main__":
-    if len(sys.argv) == 1:
+    if len(sys.argv) < 2:
         raise ValueError("No input file provided.")
-    if len(sys.argv) > 2 and sys.argv[1] == "-t":
-        main(sys.argv[2], enable_trace=True)
-    else:
-        main(sys.argv[1])
+
+    main(sys.argv[1])
diff --git a/examples/hello_arm_blob_raw.py b/examples/hello_arm_blob_raw.py
new file mode 100644
index 000000000..4c257166e
--- /dev/null
+++ b/examples/hello_arm_blob_raw.py
@@ -0,0 +1,101 @@
+##############################################################################
+# This example is meant to demonstrate the modifications necessary
+# to enable code coverage when emulating small code snippets or bare-metal
+# code.
+##############################################################################
+from qiling import Qiling
+from qiling.const import QL_ARCH, QL_OS, QL_VERBOSE
+from qiling.extensions.coverage import utils as cov_utils
+from qiling.loader.loader import Image
+import os
+
+BASE_ADDRESS = 0x10000000
+CHECKSUM_FUNC_ADDR = BASE_ADDRESS + 0x8
+END_ADDRESS = 0x100000ba
+DATA_ADDR = 0xa0000000 # Arbitrary address for data
+STACK_ADDR = 0xb0000000 # Arbitrary address for stack
+
+# Python implementation of the checksum function being emulated
+# This checksum function is intended to have different code paths based on the input
+# which is useful for observing code coverage
+def checksum_function(input_data_buffer: bytes) -> int:
+    expected_checksum_python = 0
+    input_data_len = len(input_data_buffer)
+    if input_data_len >= 1 and input_data_buffer[0] == 0xDE: # MAGIC_VALUE_1
+        for i in range(min(input_data_len, 4)):
+            expected_checksum_python += input_data_buffer[i]
+        expected_checksum_python += 0x10
+    elif input_data_len >= 2 and input_data_buffer[1] == 0xAD: # MAGIC_VALUE_2
+        for i in range(input_data_len):
+            expected_checksum_python ^= input_data_buffer[i]
+        expected_checksum_python += 0x20
+    else:
+        for i in range(input_data_len):
+            expected_checksum_python += input_data_buffer[i]
+    expected_checksum_python &= 0xFF # Ensure it's a single byte
+    return expected_checksum_python
+
+def unmapped_handler(ql: Qiling, type: int, addr: int, size: int, value: int) -> None:
+    print(f"Unmapped Memory R/W, trying to access {size:d} bytes at {addr:#010x} from {ql.arch.regs.pc:#010x}")
+
+def emulate_checksum_function(input_data_buffer: bytes) -> None:
+    print(f"\n--- Testing with input: {input_data_buffer.hex()} ---")
+
+    test_file = "rootfs/blob/example_raw.bin"
+
+    with open(test_file, "rb") as f:
+        raw_code: bytes = f.read()
+
+    ql: Qiling = Qiling(
+        code=raw_code,
+        archtype=QL_ARCH.ARM,
+        ostype=QL_OS.BLOB,
+        profile="blob_raw.ql",
+        verbose=QL_VERBOSE.DEBUG,
+        thumb=True
+    )
+
+    # Monkeypatch: correct the loader image name used for coverage collection by
+    # removing every image named 'blob_code' that the blob loader created, then
+    # re-adding one named after the input file. Some coverage visualization tools
+    # require the module name to match the input file name.
+    ql.loader.images = [img for img in ql.loader.images if img.path != 'blob_code']
+    ql.loader.images.append(Image(ql.loader.load_address, ql.loader.load_address + ql.os.code_ram_size, os.path.basename(test_file)))
+
+    input_data_len: int = len(input_data_buffer)
+
+    # Map memory for the data and stack
+    ql.mem.map(STACK_ADDR, 0x2000)
+    ql.mem.map(DATA_ADDR, ql.mem.align_up(input_data_len + 0x100)) # Map enough space for data
+
+    # Write input data
+    ql.mem.write(DATA_ADDR, input_data_buffer)
+
+    # Set up the stack pointer
+    ql.arch.regs.sp = STACK_ADDR + 0x2000 - 4
+    # Set up argument registers
+    ql.arch.regs.r0 = DATA_ADDR
+    ql.arch.regs.r1 = input_data_len
+
+    # Set the program counter to the function's entry point
+    ql.arch.regs.pc = CHECKSUM_FUNC_ADDR
+
+    # Set the return address (LR) to a dummy address.
+    ql.arch.regs.lr = 0xbebebebe
+
+    ql.hook_mem_unmapped(unmapped_handler)
+    #ql.debugger="gdb:127.0.0.1:9999"
+
+    # Start emulation
+    print(f"Starting emulation at PC: {hex(ql.arch.regs.pc)}")
+    try:
+        with cov_utils.collect_coverage(ql, 'drcov', 'output.cov'):
+            ql.run(begin=CHECKSUM_FUNC_ADDR, end=END_ADDRESS)
+    except Exception as e:
+        print(f"Emulation error: {e}")
+
+    print(f"Emulated checksum: {ql.arch.regs.r0:#x} (expected: {checksum_function(input_data_buffer):#x})")
+
+if __name__ == "__main__":
+    data = b"\x01\x02\x03\x04\x05"  # Example input data
+    emulate_checksum_function(data)
\ No newline at end of file
diff --git a/examples/hello_arm_uboot.py b/examples/hello_arm_uboot.py
index 9544fe0ee..f97ff6eff 100644
--- a/examples/hello_arm_uboot.py
+++ b/examples/hello_arm_uboot.py
@@ -8,68 +8,82 @@
 
 from qiling.core import Qiling
 from qiling.const import QL_ARCH, QL_OS, QL_VERBOSE
-from qiling.os.const import STRING
+from qiling.os.const import STRING, SIZE_T, POINTER
 
 
-def get_kaimendaji_password():
-    def my_getenv(ql: Qiling):
-        env = {
-            "ID"      : b"000000000000000",
-            "ethaddr" : b"11:22:33:44:55:66"
-        }
+def my_getenv(ql: Qiling):
+    env = {
+        "ID"      : b"000000000000000",
+        "ethaddr" : b"11:22:33:44:55:66"
+    }
 
-        params = ql.os.resolve_fcall_params({'key': STRING})
-        value = env.get(params["key"], b"")
+    params = ql.os.resolve_fcall_params({'key': STRING})
+    value = env.get(params["key"], b"")
 
-        value_addr = ql.os.heap.alloc(len(value))
-        ql.mem.write(value_addr, value)
+    value_addr = ql.os.heap.alloc(len(value))
+    ql.mem.write(value_addr, value)
 
-        ql.arch.regs.r0 = value_addr
-        ql.arch.regs.arch_pc = ql.arch.regs.lr
+    ql.arch.regs.r0 = value_addr
+    ql.arch.regs.arch_pc = ql.arch.regs.lr
 
-    def get_password(ql: Qiling):
-        password_raw = ql.mem.read(ql.arch.regs.r0, ql.arch.regs.r2)
 
-        password = ''
-        for item in password_raw:
-            if 0 <= item <= 9:
-                password += chr(item + 48)
-            else:
-                password += chr(item + 87)
+def get_password(ql: Qiling):
+    # we land on a memcmp call, where the real password is being compared to
+    # the one provided by the user. we can follow the arguments to read the
+    # real password
 
-        print("The password is: %s" % password)
+    params = ql.os.resolve_fcall_params({
+        'ptr1': POINTER,    # points to real password
+        'ptr2': POINTER,    # points to user provided password
+        'size': SIZE_T      # comparison length
+        })
 
-    def partial_run_init(ql: Qiling):
-        # argv prepare
-        ql.arch.regs.arch_sp -= 0x30
-        arg0_ptr = ql.arch.regs.arch_sp
-        ql.mem.write(arg0_ptr, b"kaimendaji")
+    ptr1 = params['ptr1']
+    size = params['size']
 
-        ql.arch.regs.arch_sp -= 0x10
-        arg1_ptr = ql.arch.regs.arch_sp
-        ql.mem.write(arg1_ptr, b"000000")   # arbitrary password
+    password_raw = ql.mem.read(ptr1, size)
 
-        ql.arch.regs.arch_sp -= 0x20
-        argv_ptr = ql.arch.regs.arch_sp
-        ql.mem.write_ptr(argv_ptr, arg0_ptr)
-        ql.mem.write_ptr(argv_ptr + ql.arch.pointersize, arg1_ptr)
+    def __hex_digit(ch: int) -> str:
+        off = ord('0') if ch in range(10) else ord('a') - 10
 
-        ql.arch.regs.r2 = 2
-        ql.arch.regs.r3 = argv_ptr
+        return chr(ch + off)
 
-    with open("../examples/rootfs/blob/u-boot.bin.img", "rb") as f:
-        uboot_code = f.read()
+    # should be: "013f1f"
+    password = "".join(__hex_digit(ch) for ch in password_raw)
 
-    ql = Qiling(code=uboot_code[0x40:], archtype=QL_ARCH.ARM, ostype=QL_OS.BLOB, profile="uboot_bin.ql", verbose=QL_VERBOSE.OFF)
+    print(f'The password is: {password}')
 
-    image_base_addr = ql.loader.load_address
-    ql.hook_address(my_getenv, image_base_addr + 0x13AC0)
-    ql.hook_address(get_password, image_base_addr + 0x48634)
 
-    partial_run_init(ql)
+def partial_run_init(ql: Qiling):
+    # argv prepare
+    ql.arch.regs.arch_sp -= 0x30
+    arg0_ptr = ql.arch.regs.arch_sp
+    ql.mem.write(arg0_ptr, b"kaimendaji")
+
+    ql.arch.regs.arch_sp -= 0x10
+    arg1_ptr = ql.arch.regs.arch_sp
+    ql.mem.write(arg1_ptr, b"000000")   # arbitrary password
 
-    ql.run(image_base_addr + 0x486B4, image_base_addr + 0x48718)
+    ql.arch.regs.arch_sp -= 0x20
+    argv_ptr = ql.arch.regs.arch_sp
+    ql.mem.write_ptr(argv_ptr, arg0_ptr)
+    ql.mem.write_ptr(argv_ptr + ql.arch.pointersize, arg1_ptr)
+
+    ql.arch.regs.r2 = 2
+    ql.arch.regs.r3 = argv_ptr
 
 
 if __name__ == "__main__":
-    get_kaimendaji_password()
+    with open("../examples/rootfs/blob/u-boot.bin.img", "rb") as f:
+        uboot_code = f.read()
+
+    ql = Qiling(code=uboot_code[0x40:], archtype=QL_ARCH.ARM, ostype=QL_OS.BLOB, profile="uboot_bin.ql", verbose=QL_VERBOSE.DEBUG)
+
+    imgbase = ql.loader.images[0].base
+
+    ql.hook_address(my_getenv, imgbase + 0x13AC0)
+    ql.hook_address(get_password, imgbase + 0x48634)
+
+    partial_run_init(ql)
+
+    ql.run(imgbase + 0x486B4, imgbase + 0x48718)
diff --git a/examples/rootfs b/examples/rootfs
index 6d4d654fd..120fb6d37 160000
--- a/examples/rootfs
+++ b/examples/rootfs
@@ -1 +1 @@
-Subproject commit 6d4d654fdc2892490d98c433eca3efa5c6d062c7
+Subproject commit 120fb6d37700a2d4c0e35ced599aaee7a8f98723
diff --git a/examples/sality.py b/examples/sality.py
index 22d6f6515..be05753ba 100644
--- a/examples/sality.py
+++ b/examples/sality.py
@@ -159,7 +159,7 @@ def hook_StartServiceA(ql: Qiling, address: int, params):
                 init_unseen_symbols(ql.amsint32_driver, ntoskrnl.base+0xb7695, b"NtTerminateProcess", 0, "ntoskrnl.exe")
                 #ql.amsint32_driver.debugger= ":9999"
                 try:
-                    ql.amsint32_driver.load()
+                    ql.amsint32_driver.run()
                     return 1
                 except UcError as e:
                     print("Load driver error: ", e)
diff --git a/examples/scripts/dllscollector.bat b/examples/scripts/dllscollector.bat
index 2b85a83e9..b0707bba2 100644
--- a/examples/scripts/dllscollector.bat
+++ b/examples/scripts/dllscollector.bat
@@ -94,6 +94,9 @@ CALL :collect_dll32 wininet.dll
 CALL :collect_dll32 winmm.dll
 CALL :collect_dll32 ws2_32.dll
 CALL :collect_dll32 wsock32.dll
+CALL :collect_dll32 msvcp140.dll
+CALL :collect_dll32 msvcp140_1.dll
+CALL :collect_dll32 msvcp140_2.dll
 
 CALL :collect_dll32 downlevel\api-ms-win-core-fibers-l1-1-1.dll
 CALL :collect_dll32 downlevel\api-ms-win-core-localization-l1-2-1.dll
@@ -131,6 +134,9 @@ CALL :collect_dll64 win32u.dll
 CALL :collect_dll64 winhttp.dll
 CALL :collect_dll64 wininet.dll
 CALL :collect_dll64 ws2_32.dll
+CALL :collect_dll64 msvcp140.dll
+CALL :collect_dll64 msvcp140_1.dll
+CALL :collect_dll64 msvcp140_2.dll
 
 CALL :collect_dll64 downlevel\api-ms-win-crt-heap-l1-1-0.dll
 CALL :collect_dll64 downlevel\api-ms-win-crt-locale-l1-1-0.dll
diff --git a/examples/src/blob/Makefile b/examples/src/blob/Makefile
new file mode 100644
index 000000000..74966f268
--- /dev/null
+++ b/examples/src/blob/Makefile
@@ -0,0 +1,52 @@
+# Makefile for Bare-Metal ARM Checksum Calculator
+
+# --- Toolchain Definitions ---
+TOOLCHAIN_PREFIX = arm-none-eabi
+
+# Compiler, Linker, and Objcopy executables
+CC = $(TOOLCHAIN_PREFIX)-gcc
+LD = $(TOOLCHAIN_PREFIX)-gcc
+OBJCOPY = $(TOOLCHAIN_PREFIX)-objcopy
+
+# --- Source and Output Files ---
+SRCS = example_raw.c
+OBJS = $(SRCS:.c=.o) # Convert .c to .o
+ELF = example_raw.elf
+BIN = example_raw.bin
+
+# --- Linker Script ---
+LDSCRIPT = linker.ld
+
+# --- Compiler Flags ---
+CFLAGS = -c -O0 -mcpu=cortex-a7 -mthumb -ffreestanding -nostdlib
+
+# --- Linker Flags ---
+LDFLAGS = -T $(LDSCRIPT) -nostdlib
+
+# --- Objcopy Flags ---
+OBJCOPYFLAGS = -O binary
+
+# --- Default Target ---
+.PHONY: all clean
+
+all: $(BIN)
+
+# Rule to build the raw binary (.bin) from the ELF file
+$(BIN): $(ELF)
+	$(OBJCOPY) $(OBJCOPYFLAGS) $< $@
+	@echo "Successfully created $(BIN)"
+
+# Rule to link the object file into an ELF executable
+$(ELF): $(OBJS) $(LDSCRIPT)
+	$(LD) $(LDFLAGS) $(OBJS) -o $@
+	@echo "Successfully linked $(ELF)"
+
+# Rule to compile the C source file into an object file
+%.o: %.c
+	$(CC) $(CFLAGS) $< -o $@
+	@echo "Successfully compiled $<"
+
+# --- Clean Rule ---
+clean:
+	rm -f $(OBJS) $(ELF) $(BIN)
+	@echo "Cleaned build artifacts."
diff --git a/examples/src/blob/example_raw.c b/examples/src/blob/example_raw.c
new file mode 100644
index 000000000..13cd70779
--- /dev/null
+++ b/examples/src/blob/example_raw.c
@@ -0,0 +1,56 @@
+// example checksum algorithm to demonstrate raw binary code coverage in qiling
+// example_raw.c
+
+// Define some magic values
+#define MAGIC_VALUE_1 0xDE
+#define MAGIC_VALUE_2 0xAD
+
+// This function calculates a checksum with branches based on input data
+// It takes a pointer to data and its length
+// Returns the checksum (unsigned char to fit in a byte)
+unsigned char calculate_checksum(const unsigned char *data, unsigned int length) {
+    unsigned char checksum = 0;
+
+    // Branch 1: Check for MAGIC_VALUE_1 at the start
+    if (length >= 1 && data[0] == MAGIC_VALUE_1) {
+        // If first byte is MAGIC_VALUE_1, do a simple sum of first 4 bytes
+        // (or up to length if less than 4)
+        for (unsigned int i = 0; i < length && i < 4; i++) {
+            checksum += data[i];
+        }
+        // Add a fixed offset to make this path distinct
+        checksum += 0x10;
+    }
+    // Branch 2: Check for MAGIC_VALUE_2 at the second byte
+    else if (length >= 2 && data[1] == MAGIC_VALUE_2) {
+        // If second byte is MAGIC_VALUE_2, do a XOR sum of all bytes
+        for (unsigned int i = 0; i < length; i++) {
+            checksum ^= data[i];
+        }
+        // Add a fixed offset to make this path distinct
+        checksum += 0x20;
+    }
+    // Default Branch: Standard byte sum checksum
+    else {
+        for (unsigned int i = 0; i < length; i++) {
+            checksum += data[i];
+        }
+    }
+
+    return checksum;
+}
+
+// Minimal entry point for bare-metal.
+// This function will not be called directly during Qiling emulation,
+// but it's needed for the linker to have an entry point.
+__attribute__((section(".text.startup")))
+void _start() {
+    // In a real bare-metal application, this would initialize hardware,
+    // set up stacks, etc. For this example, it's just a placeholder.
+    // We'll call calculate_checksum directly from our Qiling script.
+
+    while (1) {
+        // Do nothing, or perhaps put the CPU to sleep
+        asm volatile ("wfi"); // Wait For Interrupt (ARM instruction)
+    }
+}
\ No newline at end of file
diff --git a/examples/src/blob/linker.ld b/examples/src/blob/linker.ld
new file mode 100644
index 000000000..ae31f2fa3
--- /dev/null
+++ b/examples/src/blob/linker.ld
@@ -0,0 +1,39 @@
+/* linker.ld */
+
+ENTRY(_start) /* Define the entry point of our program */
+
+/* Define memory regions - simple RAM region for this example */
+MEMORY
+{
+    ram (rwx) : ORIGIN = 0x10000000, LENGTH = 64K /* 64KB of RAM for our program */
+}
+
+SECTIONS
+{
+    /* Define the start of our program in memory.
+     */
+    . = 0x10000000;
+
+    .text : {
+        KEEP(*(.text.startup)) /* Keep the _start function */
+        *(.text)             /* All other code */
+        *(.text.*)
+        *(.rodata)           /* Read-only data */
+        *(.rodata.*)
+        . = ALIGN(4);
+    } > ram /* Place .text section in the 'ram' region */
+
+    .data : {
+        . = ALIGN(4);
+        *(.data)             /* Initialized data */
+        *(.data.*)
+        . = ALIGN(4);
+    } > ram
+
+    .bss : {
+        . = ALIGN(4);
+        *(.bss)
+        *(.bss.*)
+        . = ALIGN(4);
+    } > ram
+}
\ No newline at end of file
diff --git a/examples/src/windows/except/CppHelloWorld.cpp b/examples/src/windows/except/CppHelloWorld.cpp
new file mode 100644
index 000000000..4b78ac15d
--- /dev/null
+++ b/examples/src/windows/except/CppHelloWorld.cpp
@@ -0,0 +1,11 @@
+// This is the default Hello World program generated by Visual Studio 2022.
+
+#include <iostream>
+
+int main()
+{
+    std::cout << "Hello World!\n";
+
+    return 0;
+}
+
diff --git a/examples/src/windows/except/README b/examples/src/windows/except/README
new file mode 100644
index 000000000..8dfda022b
--- /dev/null
+++ b/examples/src/windows/except/README
@@ -0,0 +1,3 @@
+In this folder: Sources for programs intended to help test C++ features and software exceptions.
+
+Compile with MSVC (Visual Studio 2022)
\ No newline at end of file
diff --git a/examples/src/windows/except/TestCppEx.cpp b/examples/src/windows/except/TestCppEx.cpp
new file mode 100644
index 000000000..bd6fa46e3
--- /dev/null
+++ b/examples/src/windows/except/TestCppEx.cpp
@@ -0,0 +1,95 @@
+#include <iostream>
+#include <cstdlib>
+
+/*
+ * Test simple try..catch.
+ */
+void test1()
+{
+    std::cout << "y";
+
+    try {
+        std::cout << "y";
+        throw (unsigned int)0x12345678;
+        std::cout << "n";
+    }
+    catch(unsigned int n) {
+        n;
+        std::cout << "y";
+    }
+
+    std::cout << "y";
+}
+
+/*
+ * Test simple try..catch and verify the caught value.
+ */
+void test2()
+{
+    std::cout << "y";
+
+    try {
+        std::cout << "y";
+        throw (unsigned int)0x12345679;
+        std::cout << "n";
+    }
+    catch (unsigned int n) {
+        n;
+        if (n == 0x12345679) {
+            std::cout << "y";
+        }
+        else {
+            std::cout << "n";
+        }
+    }
+
+    std::cout << "y";
+}
+
+/*
+ * Test nested try..catch with throw.
+ */
+void test3()
+{
+    std::cout << "y";
+
+    try {
+        std::cout << "y";
+
+        try {
+            std::cout << "y";
+            throw (unsigned int)0x1234567A;
+            std::cout << "n";
+        }
+        catch (unsigned int n) {
+            n;
+            if (n == 0x1234567A) {
+                std::cout << "y";
+            }
+            else {
+                std::cout << "n";
+            }
+        }
+        
+        std::cout << "y";
+    }
+    catch (unsigned int n) {
+        n;
+        std::cout << "n";
+    }
+
+    std::cout << "y";
+}
+
+int main()
+{
+    /*
+     * For this program, all subtests successful will print:
+     * - 14 'y'
+     * - 0 'n'
+     */
+
+    test1();
+    test2();
+    test3();
+}
diff --git a/examples/src/windows/except/TestCppExUnhandled.cpp b/examples/src/windows/except/TestCppExUnhandled.cpp
new file mode 100644
index 000000000..0074d1f4f
--- /dev/null
+++ b/examples/src/windows/except/TestCppExUnhandled.cpp
@@ -0,0 +1,46 @@
+#include <windows.h>
+#include <cstdio>
+
+LONG WINAPI CustomExceptionFilter(EXCEPTION_POINTERS* ExceptionInfo) {
+    printf("Inside exception filter (GOOD)\n");
+    DWORD exceptionCode = (DWORD)ExceptionInfo->ExceptionRecord->ExceptionCode;
+    printf("Exception Code: 0x%lX\n", exceptionCode); // DWORD is unsigned long
+
+    if (exceptionCode == 0xE06D7363) { // code for C++ exception
+        printf("Exception code DOES match, GOOD\n");
+    }
+    else {
+        printf("Exception code DOES NOT match, BAD\n");
+    }
+
+    printf("Exception Address: 0x%llx\n", (ULONGLONG)ExceptionInfo->ExceptionRecord->ExceptionAddress);
+
+    printf("After printing exception: (GOOD)\n");
+    
+    return EXCEPTION_EXECUTE_HANDLER;
+}
+
+int main() {
+    /*
+     * For this program, all subtests successful will print:
+     * - 3 'GOOD'
+     * - 0 'BAD'
+     * 
+     * It is expected that the program terminates abnormally
+     * with status code 0xE06D7363 (C++ exception)
+     */
+
+    // Set the custom top-level exception filter
+    SetUnhandledExceptionFilter(CustomExceptionFilter);
+
+    // Throw an unhandled exception.
+    // It should be caught by our filter.
+    throw (unsigned int)5;
+
+    // We should never reach this point, because the exception
+    // dispatcher should terminate the program after our unhandled
+    // exception filter is called.
+    printf("After exception filter (BAD)\n");
+
+    return 0;
+}
\ No newline at end of file
diff --git a/examples/src/windows/except/TestCppExUnhandled2.cpp b/examples/src/windows/except/TestCppExUnhandled2.cpp
new file mode 100644
index 000000000..600855cc1
--- /dev/null
+++ b/examples/src/windows/except/TestCppExUnhandled2.cpp
@@ -0,0 +1,21 @@
+#include <iostream>
+#include <cstdio>
+
+int main()
+{
+    /*
+     * For this program, all subtests successful will print:
+     * - 1 'GOOD'
+     * - 0 'BAD'
+     * 
+     * It is expected that the program terminates abnormally
+     * with status code 0xC0000409 (stack buffer overrun/security
+     * check failure)
+     */
+
+    printf("Before throw (GOOD)\n");
+
+    throw (unsigned int)5;
+
+    printf("After throw (BAD)\n");
+}
diff --git a/examples/src/windows/except/TestCppTypes.cpp b/examples/src/windows/except/TestCppTypes.cpp
new file mode 100644
index 000000000..42b8da21e
--- /dev/null
+++ b/examples/src/windows/except/TestCppTypes.cpp
@@ -0,0 +1,95 @@
+#include <cstring>
+#include <iostream>
+#include <typeinfo>
+
+struct TestStruct {
+    float q;
+};
+
+class TestClass {
+public:
+    int x, y;
+    virtual ~TestClass() {
+        std::cout << "TestClass destructor, GOOD" << std::endl;
+    };
+    void yyy() {
+        std::cout << "REALLY GOOD" << std::endl;
+    }
+};
+
+class Something {
+public:
+    char z;
+    virtual ~Something() {
+        std::cout << "Something destructor, GOOD" << std::endl;
+    };
+    virtual void zzz() {
+        std::cout << "BAD" << std::endl;
+    };
+};
+
+class TestClass2 : public TestClass, public Something {
+public:
+    int z;
+    virtual ~TestClass2() {
+        std::cout << "TestClass2 destructor, GOOD" << std::endl;
+    };
+    virtual void zzz() {
+        std::cout << "GOOD" << std::endl;
+    };
+};
+
+int main()
+{
+    /*
+     * For this program, all subtests successful will print:
+     * - 12 'GOOD'
+     * - 0 'BAD'
+     */
+
+    int x = 5;
+    TestClass p;
+    TestStruct s;
+
+    std::cout << typeid(x).name() << std::endl;
+    if (strcmp(typeid(x).name(), "int") == 0) {
+        std::cout << "typeid(x) is int, GOOD" << std::endl;
+    }
+    else {
+        std::cout << "typeid(x) is NOT int, BAD" << std::endl;
+    }
+
+    std::cout << typeid(p).name() << std::endl;
+    if (strcmp(typeid(p).name(), "class TestClass") == 0) {
+        std::cout << "typeid(p) is \"class TestClass\", GOOD" << std::endl;
+    }
+    else {
+        std::cout << "typeid(p) is NOT \"class TestClass\", BAD" << std::endl;
+    }
+
+    std::cout << typeid(s).name() << std::endl;
+    if (strcmp(typeid(s).name(), "struct TestStruct") == 0) {
+        std::cout << "typeid(s) is \"struct TestStruct\", GOOD" << std::endl;
+    }
+    else {
+        std::cout << "typeid(s) is NOT \"struct TestStruct\", BAD" << std::endl;
+    }
+
+    std::cout << "Reached virtual methods and dynamic_cast test. GOOD" << std::endl;
+
+    TestClass2* kz = new TestClass2;
+
+    Something* ks = static_cast<Something*>(kz);
+
+    ks->zzz();
+
+    TestClass* pk = dynamic_cast<TestClass*>(ks);
+
+    pk->yyy();
+
+    std::cout << "Reached virtual destructor test. GOOD" << std::endl;
+
+    delete pk;
+
+    std::cout << "Finished all tests. GOOD" << std::endl;
+}
diff --git a/examples/src/windows/except/TestSoftSEH.cpp b/examples/src/windows/except/TestSoftSEH.cpp
new file mode 100644
index 000000000..2578b7aae
--- /dev/null
+++ b/examples/src/windows/except/TestSoftSEH.cpp
@@ -0,0 +1,45 @@
+#include <windows.h>
+#include <cstdio>
+
+void test1() {
+    __try {
+        printf("Inside __try block. (GOOD)\n");
+
+        RaiseException(
+            0xE0000001,
+            0,
+            0,
+            nullptr
+        );
+
+        printf("After RaiseException. (BAD)\n");
+    }
+    __except (EXCEPTION_EXECUTE_HANDLER) {
+        printf("In __except block. (GOOD)\n");
+
+        unsigned long excepCode = GetExceptionCode();
+
+        printf("Exception code=0x%lx\n", excepCode);
+
+        if (excepCode == 0xE0000001) {
+            printf("Exception code IS same, GOOD\n");
+        }
+        else {
+            printf("Exception code DOES NOT MATCH, BAD\n");
+        }
+    }
+
+    printf("After __except block. (GOOD)\n");
+}
+
+int main() {
+    /*
+     * For this program, all subtests successful will print:
+     * - 4 'GOOD'
+     * - 0 'BAD'
+     */
+
+    test1();
+
+    return 0;
+}
diff --git a/examples/tendaac1518_httpd.py b/examples/tendaac1518_httpd.py
index 0a32fd275..165aff1f2 100644
--- a/examples/tendaac1518_httpd.py
+++ b/examples/tendaac1518_httpd.py
@@ -78,6 +78,8 @@ def __vfork(ql: Qiling):
 
         ql.os.set_syscall('vfork', __vfork)
 
+    os.unlink(fr'{ROOTFS}/proc/sys/kernel/core_pattern')
+
     ql.run()
 
 
diff --git a/examples/uboot_bin.ql b/examples/uboot_bin.ql
index b7f7216c8..1e95311fe 100644
--- a/examples/uboot_bin.ql
+++ b/examples/uboot_bin.ql
@@ -1,6 +1,8 @@
 [CODE]
 ram_size = 0xa00000
+load_address = 0x80800000
 entry_point = 0x80800000
+heap_address = 0xa0000000
 heap_size = 0x300000
 
 
diff --git a/pyproject.toml b/pyproject.toml
index 71a7d8ee2..9b2b7b75e 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -28,7 +28,7 @@ keywords = [
 
 [tool.poetry.dependencies]
 python = "^3.8"
-capstone = "^4"
+capstone = "^5"
 unicorn = "2.1.3"
 pefile = ">=2022.5.30"
 python-registry = "^1.3.1"
diff --git a/qiling/arch/arm64.py b/qiling/arch/arm64.py
index bfe54e38e..f3b634800 100644
--- a/qiling/arch/arm64.py
+++ b/qiling/arch/arm64.py
@@ -45,7 +45,8 @@ def regs(self) -> QlRegisterManager:
             **arm64_const.reg_map_q,
             **arm64_const.reg_map_s,
             **arm64_const.reg_map_w,
-            **arm64_const.reg_map_v
+            **arm64_const.reg_map_v,
+            **arm64_const.reg_map_fp
         )
 
         pc_reg = 'pc'
diff --git a/qiling/arch/arm64_const.py b/qiling/arch/arm64_const.py
index eaadb8363..c254ca37f 100644
--- a/qiling/arch/arm64_const.py
+++ b/qiling/arch/arm64_const.py
@@ -68,6 +68,7 @@
     "pc": UC_ARM64_REG_PC,
     "lr": UC_ARM64_REG_LR,
     "cpacr_el1": UC_ARM64_REG_CPACR_EL1,
+    "pstate": UC_ARM64_REG_PSTATE,
 }
 
 reg_map_b = {
@@ -313,3 +314,8 @@
     "v30": UC_ARM64_REG_V30,
     "v31": UC_ARM64_REG_V31
 }
+
+reg_map_fp = {
+    "fpcr": UC_ARM64_REG_FPCR,
+    "fpsr": UC_ARM64_REG_FPSR
+}
diff --git a/qiling/arch/utils.py b/qiling/arch/utils.py
index 6782c9321..4d44b545f 100644
--- a/qiling/arch/utils.py
+++ b/qiling/arch/utils.py
@@ -48,7 +48,7 @@ def get_base_and_name(self, addr: int) -> Tuple[int, str]:
         return addr, '-'
 
     def disassembler(self, ql: Qiling, address: int, size: int):
-        data = ql.mem.read(address, size)
+        data = memoryview(ql.mem.read(address, size))
 
         # knowing that all binary sections are aligned to page boundary allows
         # us to 'cheat' and search for the containing image using the aligned
@@ -64,11 +64,14 @@ def disassembler(self, ql: Qiling, address: int, size: int):
         ba, name = self.get_base_and_name(ql.mem.align(address))
 
         anibbles = ql.arch.bits // 4
+        pos = 0
 
-        for insn in ql.arch.disassembler.disasm(data, address):
-            offset = insn.address - ba
+        for iaddr, isize, mnem, ops in ql.arch.disassembler.disasm_lite(data, address):
+            offset = iaddr - ba
+            ibytes = data[pos:pos + isize]
 
-            ql.log.info(f'{insn.address:0{anibbles}x} [{name:20s} + {offset:#08x}]  {insn.bytes.hex(" "):20s} {insn.mnemonic:20s} {insn.op_str}')
+            ql.log.info(f'{iaddr:0{anibbles}x} [{name:20s} + {offset:#08x}]  {ibytes.hex():22s} {mnem:16s} {ops}')
+            pos += isize
 
         if ql.verbose >= QL_VERBOSE.DUMP:
             for reg in ql.arch.regs.register_mapping:
diff --git a/qiling/arch/x86_utils.py b/qiling/arch/x86_utils.py
index 4726ef745..1bc9f8953 100644
--- a/qiling/arch/x86_utils.py
+++ b/qiling/arch/x86_utils.py
@@ -5,6 +5,7 @@
 from qiling import Qiling
 from qiling.arch.x86 import QlArchIntel
 from qiling.arch.x86_const import *
+from qiling.const import QL_ARCH
 from qiling.exception import QlGDTError, QlMemoryMappedError
 from qiling.os.memory import QlMemoryManager
 
@@ -60,6 +61,8 @@ def __init__(self, ql: Qiling, base=QL_X86_GDT_ADDR, limit=QL_X86_GDT_LIMIT, num
         # setup GDT by writing to GDTR
         ql.arch.regs.write(UC_X86_REG_GDTR, (0, base, limit, 0x0))
 
+        self.is_long_mode = ql.arch.type is QL_ARCH.X8664
+
         self.array = GDTArray(ql.mem, base, num_entries)
 
     @staticmethod
@@ -93,7 +96,18 @@ def make_selector(idx: int, rpl: int) -> int:
         return (idx << 3) | QL_X86_SEGSEL_TI_GDT | rpl
 
     def register_gdt_segment(self, index: int, seg_base: int, seg_limit: int, access: int) -> int:
-        flags = QL_X86_F_OPSIZE_32
+        is_code = access & QL_X86_A_CODE
+
+        if is_code and self.is_long_mode:
+            # If this is a code segment and 64-bit long mode is enabled,
+            # then set the long segment descriptor bit.
+            # This prevents some strange CPU errors encountered with
+            # intra-privilege level IRET instructions used for
+            # context switching on 64-bit Windows.
+            flags = QL_X86_F_LONG
+        else:
+            # Otherwise, OPSIZE_32 should be set.
+            flags = QL_X86_F_OPSIZE_32
 
         # is this a huge segment?
         if seg_limit > (1 << 16):
@@ -138,16 +152,21 @@ def setup_gs(self, base: int, size: int) -> None:
 
 class SegmentManager86(SegmentManager):
     def setup_cs_ds_ss_es(self, base: int, size: int) -> None:
-        # While debugging the linux kernel segment, the cs segment was found on the third segment of gdt.
+        # TODO: 64-bit code segment access bits were adjusted, removing the conforming bit.
+        # Perhaps make the same change for x86?
         access = QL_X86_A_PRESENT | QL_X86_A_PRIV_3 | QL_X86_A_DESC_CODE | QL_X86_A_CODE | QL_X86_A_CODE_C | QL_X86_A_CODE_R
+        # While debugging the linux kernel segment, the cs segment was found on the third segment of gdt.
         selector = self.gdtm.register_gdt_segment(3, base, size - 1, access)
 
         self.arch.regs.cs = selector
 
         # TODO : The section permission here should be QL_X86_A_PRIV_3, but I do n’t know why it can only be set to QL_X86_A_PRIV_0.
+        # TODO: 64-bit data segment access bits were adjusted, removing the direction bit.
+        # After this change, there were no problems changing the privilege level to ring 3.
+        # Perhaps make the same change for x86?
+        access = QL_X86_A_PRESENT | QL_X86_A_PRIV_0 | QL_X86_A_DESC_DATA | QL_X86_A_DATA | QL_X86_A_DATA_E | QL_X86_A_DATA_W
         # While debugging the Linux kernel segment, I found that the three segments DS, SS, and ES all point to the same location in the GDT table.
         # This position is the fifth segment table of GDT.
-        access = QL_X86_A_PRESENT | QL_X86_A_PRIV_0 | QL_X86_A_DESC_DATA | QL_X86_A_DATA | QL_X86_A_DATA_E | QL_X86_A_DATA_W
         selector = self.gdtm.register_gdt_segment(5, base, size - 1, access)
 
         self.arch.regs.ds = selector
@@ -169,15 +188,32 @@ def setup_gs(self, base: int, size: int) -> None:
 
 class SegmentManager64(SegmentManager):
     def setup_cs_ds_ss_es(self, base: int, size: int) -> None:
+        # Code segment access bits:
+        # * QL_X86_A_PRESENT        : Present
+        # * QL_X86_A_PRIV_3         : Ring 3 (user-mode)
+        # * QL_X86_A_DESC_CODE      : Segment describes a code segment
+        # * QL_X86_A_CODE           : Executable bit set
+        # * QL_X86_A_CODE_R         : Readable
+        # Not set:
+        # * QL_X86_A_CODE_C         : Conforming bit
+        #   -> unset means code in this segment can only be executed from the ring set in DPL.
+        access = QL_X86_A_PRESENT | QL_X86_A_PRIV_3 | QL_X86_A_DESC_CODE | QL_X86_A_CODE | QL_X86_A_CODE_R
         # While debugging the linux kernel segment, the cs segment was found on the sixth segment of gdt.
-        access = QL_X86_A_PRESENT | QL_X86_A_PRIV_3 | QL_X86_A_DESC_CODE | QL_X86_A_CODE | QL_X86_A_CODE_C | QL_X86_A_CODE_R
         selector = self.gdtm.register_gdt_segment(6, base, size - 1, access)
 
         self.arch.regs.cs = selector
 
-        # TODO : The section permission here should be QL_X86_A_PRIV_3, but I do n’t know why it can only be set to QL_X86_A_PRIV_0.
+        # Data segment access bits:
+        # * QL_X86_A_PRESENT        : Present
+        # * QL_X86_A_PRIV_3         : Ring 3 (user-mode)
+        # * QL_X86_A_DESC_DATA      : Segment describes a data segment
+        # * QL_X86_A_DATA           : Executable bit NOT set
+        # * QL_X86_A_DATA_W         : Writable
+        # Not set:
+        # * QL_X86_A_DATA_E         : Direction bit
+        #   -> unset means the data segment grows upward, rather than downward.
+        access = QL_X86_A_PRESENT | QL_X86_A_PRIV_3 | QL_X86_A_DESC_DATA | QL_X86_A_DATA | QL_X86_A_DATA_W
         # When I debug the Linux kernel, I find that only the SS is set to the fifth segment table, and the rest are not set.
-        access = QL_X86_A_PRESENT | QL_X86_A_PRIV_0 | QL_X86_A_DESC_DATA | QL_X86_A_DATA | QL_X86_A_DATA_E | QL_X86_A_DATA_W
         selector = self.gdtm.register_gdt_segment(5, base, size - 1, access)
 
         # self.arch.regs.ds = selector
diff --git a/qiling/cc/__init__.py b/qiling/cc/__init__.py
index 99c9e5643..a1f354818 100644
--- a/qiling/cc/__init__.py
+++ b/qiling/cc/__init__.py
@@ -70,6 +70,12 @@ def setReturnValue(self, val: int) -> None:
 
         raise NotImplementedError
 
+    def getReturnAddress(self) -> int:
+        """Get function return address.
+        """
+
+        raise NotImplementedError
+
     def setReturnAddress(self, addr: int) -> None:
         """Set function return address.
 
diff --git a/qiling/cc/arm.py b/qiling/cc/arm.py
index 51d798b23..29ac126be 100644
--- a/qiling/cc/arm.py
+++ b/qiling/cc/arm.py
@@ -21,17 +21,22 @@ class QlArmBaseCC(QlCommonBaseCC):
     def getNumSlots(argbits: int) -> int:
         return 1
 
+    def getReturnAddress(self) -> int:
+        return self.arch.regs.lr
+
     def setReturnAddress(self, addr: int) -> None:
         self.arch.regs.lr = addr
 
     def unwind(self, nslots: int) -> int:
         # TODO: cleanup?
-        return self.arch.regs.lr
+        return self.getReturnAddress()
+
 
 class aarch64(QlArmBaseCC):
     _retreg = UC_ARM64_REG_X0
     _argregs = make_arg_list(UC_ARM64_REG_X0, UC_ARM64_REG_X1, UC_ARM64_REG_X2, UC_ARM64_REG_X3, UC_ARM64_REG_X4, UC_ARM64_REG_X5, UC_ARM64_REG_X6, UC_ARM64_REG_X7)
 
+
 class aarch32(QlArmBaseCC):
     _retreg = UC_ARM_REG_R0
     _argregs = make_arg_list(UC_ARM_REG_R0, UC_ARM_REG_R1, UC_ARM_REG_R2, UC_ARM_REG_R3)
diff --git a/qiling/cc/intel.py b/qiling/cc/intel.py
index ca1796034..f2e6971d1 100644
--- a/qiling/cc/intel.py
+++ b/qiling/cc/intel.py
@@ -15,6 +15,9 @@ class QlIntelBaseCC(QlCommonBaseCC):
     Supports arguments passing over registers and stack.
     """
 
+    def getReturnAddress(self) -> int:
+        return self.arch.stack_read(0)
+
     def setReturnAddress(self, addr: int) -> None:
         self.arch.stack_push(addr)
 
diff --git a/qiling/cc/mips.py b/qiling/cc/mips.py
index 9ebf23375..472b2a3ec 100644
--- a/qiling/cc/mips.py
+++ b/qiling/cc/mips.py
@@ -12,6 +12,9 @@ class mipso32(QlCommonBaseCC):
     _shadow = 4
     _retaddr_on_stack = False
 
+    def getReturnAddress(self) -> int:
+        return self.arch.regs.ra
+
     def setReturnAddress(self, addr: int):
         self.arch.regs.ra = addr
 
diff --git a/qiling/cc/ppc.py b/qiling/cc/ppc.py
index 2440fab15..b4a88f791 100644
--- a/qiling/cc/ppc.py
+++ b/qiling/cc/ppc.py
@@ -22,5 +22,8 @@ class ppc(QlCommonBaseCC):
     def getNumSlots(argbits: int):
         return 1
 
+    def getReturnAddress(self) -> int:
+        return self.arch.regs.lr
+
     def setReturnAddress(self, addr: int):
         self.arch.regs.lr = addr
diff --git a/qiling/cc/riscv.py b/qiling/cc/riscv.py
index 3a360bd8d..f9f09522c 100644
--- a/qiling/cc/riscv.py
+++ b/qiling/cc/riscv.py
@@ -22,5 +22,8 @@ class riscv(QlCommonBaseCC):
     def getNumSlots(argbits: int):
         return 1
 
+    def getReturnAddress(self) -> int:
+        return self.arch.regs.ra
+
     def setReturnAddress(self, addr: int):
         self.arch.regs.ra = addr
diff --git a/qiling/core_struct.py b/qiling/core_struct.py
index 6c0d99cca..f10fd42f4 100644
--- a/qiling/core_struct.py
+++ b/qiling/core_struct.py
@@ -1,5 +1,5 @@
 #!/usr/bin/env python3
-# 
+#
 # Cross Platform and Multi Architecture Advanced Binary Emulation Framework
 #
 
@@ -25,14 +25,14 @@ def __init__(self, endian: QL_ENDIAN, bit: int):
             QL_ENDIAN.EB: '>'
         }[endian]
 
-        self._fmt8   = f'{modifier}B'
-        self._fmt8s  = f'{modifier}b'
-        self._fmt16  = f'{modifier}H'
-        self._fmt16s = f'{modifier}h'
-        self._fmt32  = f'{modifier}I'
-        self._fmt32s = f'{modifier}i'
-        self._fmt64  = f'{modifier}Q'
-        self._fmt64s = f'{modifier}q'
+        self._fmt8   = struct.Struct(f'{modifier}B')
+        self._fmt8s  = struct.Struct(f'{modifier}b')
+        self._fmt16  = struct.Struct(f'{modifier}H')
+        self._fmt16s = struct.Struct(f'{modifier}h')
+        self._fmt32  = struct.Struct(f'{modifier}I')
+        self._fmt32s = struct.Struct(f'{modifier}i')
+        self._fmt64  = struct.Struct(f'{modifier}Q')
+        self._fmt64s = struct.Struct(f'{modifier}q')
 
         handlers = {
             64 : (self.pack64, self.pack64s, self.unpack64, self.unpack64s),
@@ -51,49 +51,49 @@ def __init__(self, endian: QL_ENDIAN, bit: int):
         self.unpacks = ups
 
     def pack64(self, x: int, /) -> bytes:
-        return struct.pack(self._fmt64, x)
+        return self._fmt64.pack(x)
 
     def pack64s(self, x: int, /) -> bytes:
-        return struct.pack(self._fmt64s, x)
+        return self._fmt64s.pack(x)
 
     def unpack64(self, x: ReadableBuffer, /) -> int:
-        return struct.unpack(self._fmt64, x)[0]
+        return self._fmt64.unpack(x)[0]
 
     def unpack64s(self, x: ReadableBuffer, /) -> int:
-        return struct.unpack(self._fmt64s, x)[0]
+        return self._fmt64s.unpack(x)[0]
 
     def pack32(self, x: int, /) -> bytes:
-        return struct.pack(self._fmt32, x)
+        return self._fmt32.pack(x)
 
     def pack32s(self, x: int, /) -> bytes:
-        return struct.pack(self._fmt32s, x)
+        return self._fmt32s.pack(x)
 
     def unpack32(self, x: ReadableBuffer, /) -> int:
-        return struct.unpack(self._fmt32, x)[0]
+        return self._fmt32.unpack(x)[0]
 
     def unpack32s(self, x: ReadableBuffer, /) -> int:
-        return struct.unpack(self._fmt32s, x)[0]
+        return self._fmt32s.unpack(x)[0]
 
     def pack16(self, x: int, /) -> bytes:
-        return struct.pack(self._fmt16, x)
+        return self._fmt16.pack(x)
 
     def pack16s(self, x: int, /) -> bytes:
-        return struct.pack(self._fmt16s, x)
+        return self._fmt16s.pack(x)
 
     def unpack16(self, x: ReadableBuffer, /) -> int:
-        return struct.unpack(self._fmt16, x)[0]
+        return self._fmt16.unpack(x)[0]
 
     def unpack16s(self, x: ReadableBuffer, /) -> int:
-        return struct.unpack(self._fmt16s, x)[0]
+        return self._fmt16s.unpack(x)[0]
 
     def pack8(self, x: int, /) -> bytes:
-        return struct.pack(self._fmt8, x)
+        return self._fmt8.pack(x)
 
     def pack8s(self, x: int, /) -> bytes:
-        return struct.pack(self._fmt8s, x)
+        return self._fmt8s.pack(x)
 
     def unpack8(self, x: ReadableBuffer, /) -> int:
-        return struct.unpack(self._fmt8, x)[0]
+        return self._fmt8.unpack(x)[0]
 
     def unpack8s(self, x: ReadableBuffer, /) -> int:
-        return struct.unpack(self._fmt8s, x)[0]
+        return self._fmt8s.unpack(x)[0]
diff --git a/qiling/debugger/__init__.py b/qiling/debugger/__init__.py
index 57e0576ed..4122e4cb4 100644
--- a/qiling/debugger/__init__.py
+++ b/qiling/debugger/__init__.py
@@ -1,3 +1 @@
 from .debugger import QlDebugger
-# from .disassember import QlDisassember
-# from .utils import QlReadELF
diff --git a/qiling/debugger/disassember.py b/qiling/debugger/disassember.py
deleted file mode 100644
index fea90563a..000000000
--- a/qiling/debugger/disassember.py
+++ /dev/null
@@ -1,55 +0,0 @@
-#!/usr/bin/env python3
-# 
-# Cross Platform and Multi Architecture Advanced Binary Emulation Framework
-#
-
-from elftools.elf.elffile import ELFFile
-
-from qiling import Qiling
-from qiling.const import *
-from capstone import *
-
-
-class QlDisassember():
-    def __init__(self, ql:Qiling):
-        self.ql = ql
-
-    def disasm_all_lines(self):
-        disasm_result = []
-
-        if self.ql.os.type == QL_OS.LINUX:
-            disasm_result = self.disasm_elf()
-
-        return disasm_result
-
-    def disasm_elf(self, seg_name='.text'):
-        def disasm(ql, address, size):
-            md = ql.arch.disassembler
-            md.detail = True
-
-            return md.disasm(ql.mem.read(address, size), address)
-
-        disasm_result = []
-        if self.ql.arch.type == QL_ARCH.X86:
-            BASE = int(self.ql.profile.get("OS32", "load_address"), 16)
-            seg_start = 0x0
-            seg_end = 0x0
-
-            f = open(self.ql.path, 'rb')
-            elffile = ELFFile(f)
-            elf_header = elffile.header
-            reladyn = elffile.get_section_by_name(seg_name)
-
-            # No PIE
-            if elf_header['e_type'] == 'ET_EXEC':
-                seg_start = reladyn.header.sh_addr
-                seg_end = seg_start + reladyn.data_size
-            # PIE
-            elif elf_header['e_type'] == 'ET_DYN':
-                seg_start = BASE + reladyn.header.sh_addr
-                seg_end = seg_start + reladyn.data_size
-
-            for insn in disasm(ql, seg_start, seg_end-seg_start):
-                disasm_result.append(insn)       
-
-        return disasm_result
\ No newline at end of file
diff --git a/qiling/debugger/gdb/gdb.py b/qiling/debugger/gdb/gdb.py
index a26bf6d93..f6d6498d8 100644
--- a/qiling/debugger/gdb/gdb.py
+++ b/qiling/debugger/gdb/gdb.py
@@ -183,6 +183,7 @@ def handle_qmark(subcmd: str) -> Reply:
             from unicorn.arm_const import UC_ARM_REG_R11
             from unicorn.arm64_const import UC_ARM64_REG_X29
             from unicorn.mips_const import UC_MIPS_REG_INVALID
+            from unicorn.ppc_const import UC_PPC_REG_31
 
             arch_uc_bp = {
                 QL_ARCH.X86      : UC_X86_REG_EBP,
@@ -191,7 +192,8 @@ def handle_qmark(subcmd: str) -> Reply:
                 QL_ARCH.ARM64    : UC_ARM64_REG_X29,
                 QL_ARCH.MIPS     : UC_MIPS_REG_INVALID, # skipped
                 QL_ARCH.A8086    : UC_X86_REG_EBP,
-                QL_ARCH.CORTEX_M : UC_ARM_REG_R11
+                QL_ARCH.CORTEX_M : UC_ARM_REG_R11,
+                QL_ARCH.PPC      : UC_PPC_REG_31
             }[self.ql.arch.type]
 
             def __get_reg_idx(ucreg: int) -> int:
diff --git a/qiling/debugger/gdb/xml/arm/arm-m-profile.xml b/qiling/debugger/gdb/xml/cortex_m/arm-m-profile.xml
similarity index 81%
rename from qiling/debugger/gdb/xml/arm/arm-m-profile.xml
rename to qiling/debugger/gdb/xml/cortex_m/arm-m-profile.xml
index f0584a206..a07071502 100644
--- a/qiling/debugger/gdb/xml/arm/arm-m-profile.xml
+++ b/qiling/debugger/gdb/xml/cortex_m/arm-m-profile.xml
@@ -25,4 +25,10 @@
   <reg name="pc" bitsize="32" type="code_ptr"/>
 
   <reg name="xpsr" bitsize="32" regnum="25"/>
-</feature>
+  <reg name="msp" bitsize="32"/>
+  <reg name="psp" bitsize="32"/>
+  <reg name="primask" bitsize="32"/>
+  <reg name="basepri" bitsize="32"/>
+  <reg name="faultmask" bitsize="32"/>
+  <reg name="control" bitsize="32"/>
+</feature>
\ No newline at end of file
diff --git a/qiling/debugger/gdb/xml/cortex_m/target.xml b/qiling/debugger/gdb/xml/cortex_m/target.xml
new file mode 100644
index 000000000..635912398
--- /dev/null
+++ b/qiling/debugger/gdb/xml/cortex_m/target.xml
@@ -0,0 +1,12 @@
+<?xml version="1.0"?>
+<!-- Copyright (C) 2009-2016 Free Software Foundation, Inc.
+
+ *!Copying and distribution of this file, with or without modification,
+ *!are permitted in any medium without royalty provided the copyright
+ *!notice and this notice are preserved.  -->
+
+<!DOCTYPE target SYSTEM "gdb-target.dtd">
+<target xmlns:xi="http://www.w3.org/2001/XInclude">
+    <architecture>armv7-m</architecture>
+    <xi:include href="arm-m-profile.xml"/>
+</target>
\ No newline at end of file
diff --git a/qiling/debugger/gdb/xml/ppc/ppc-core.xml b/qiling/debugger/gdb/xml/ppc/ppc-core.xml
new file mode 100644
index 000000000..d695132a2
--- /dev/null
+++ b/qiling/debugger/gdb/xml/ppc/ppc-core.xml
@@ -0,0 +1,51 @@
+<?xml version="1.0"?>
+<!-- Copyright (C) 2007-2020 Free Software Foundation, Inc.
+
+     Copying and distribution of this file, with or without modification,
+     are permitted in any medium without royalty provided the copyright
+     notice and this notice are preserved.  -->
+
+<!DOCTYPE feature SYSTEM "gdb-target.dtd">
+<feature name="org.gnu.gdb.power.core">
+    <reg name="r0" bitsize="32" type="uint32"/>
+    <reg name="r1" bitsize="32" type="uint32"/>
+    <reg name="r2" bitsize="32" type="uint32"/>
+    <reg name="r3" bitsize="32" type="uint32"/>
+    <reg name="r4" bitsize="32" type="uint32"/>
+    <reg name="r5" bitsize="32" type="uint32"/>
+    <reg name="r6" bitsize="32" type="uint32"/>
+    <reg name="r7" bitsize="32" type="uint32"/>
+    <reg name="r8" bitsize="32" type="uint32"/>
+    <reg name="r9" bitsize="32" type="uint32"/>
+    <reg name="r10" bitsize="32" type="uint32"/>
+    <reg name="r11" bitsize="32" type="uint32"/>
+    <reg name="r12" bitsize="32" type="uint32"/>
+    <reg name="r13" bitsize="32" type="uint32"/>
+    <reg name="r14" bitsize="32" type="uint32"/>
+    <reg name="r15" bitsize="32" type="uint32"/>
+    <reg name="r16" bitsize="32" type="uint32"/>
+    <reg name="r17" bitsize="32" type="uint32"/>
+    <reg name="r18" bitsize="32" type="uint32"/>
+    <reg name="r19" bitsize="32" type="uint32"/>
+    <reg name="r20" bitsize="32" type="uint32"/>
+    <reg name="r21" bitsize="32" type="uint32"/>
+    <reg name="r22" bitsize="32" type="uint32"/>
+    <reg name="r23" bitsize="32" type="uint32"/>
+    <reg name="r24" bitsize="32" type="uint32"/>
+    <reg name="r25" bitsize="32" type="uint32"/>
+    <reg name="r26" bitsize="32" type="uint32"/>
+    <reg name="r27" bitsize="32" type="uint32"/>
+    <reg name="r28" bitsize="32" type="uint32"/>
+    <reg name="r29" bitsize="32" type="uint32"/>
+    <reg name="r30" bitsize="32" type="uint32"/>
+    <reg name="r31" bitsize="32" type="uint32"/>
+
+    <reg name="cr" bitsize="32" type="uint32"/>
+    <reg name="lr" bitsize="32" type="code_ptr"/>
+    <reg name="pc" bitsize="32" type="code_ptr"/>
+    <reg name="msr" bitsize="32" type="uint32"/>
+    <reg name="ctr" bitsize="32" type="uint32"/>
+    <reg name="xer" bitsize="32" type="uint32"/>
+
+
+</feature>
\ No newline at end of file
diff --git a/qiling/debugger/gdb/xml/ppc/target.xml b/qiling/debugger/gdb/xml/ppc/target.xml
new file mode 100644
index 000000000..977416a37
--- /dev/null
+++ b/qiling/debugger/gdb/xml/ppc/target.xml
@@ -0,0 +1,12 @@
+<?xml version="1.0"?>
+<!-- Copyright (C) 2009-2016 Free Software Foundation, Inc.
+
+ *!Copying and distribution of this file, with or without modification,
+ *!are permitted in any medium without royalty provided the copyright
+ *!notice and this notice are preserved.  -->
+
+<!DOCTYPE target SYSTEM "gdb-target.dtd">
+<target xmlns:xi="http://www.w3.org/2001/XInclude">
+    <architecture>powerpc:common</architecture>
+    <xi:include href="ppc-core.xml"/>
+</target>
\ No newline at end of file
diff --git a/qiling/debugger/gdb/xmlregs.py b/qiling/debugger/gdb/xmlregs.py
index 4749b2111..6bc2371f4 100644
--- a/qiling/debugger/gdb/xmlregs.py
+++ b/qiling/debugger/gdb/xmlregs.py
@@ -13,13 +13,21 @@
     reg_map_q as arm_regs_q,
     reg_map_s as arm_regs_s
 )
+
+from qiling.arch.cortex_m_const import (
+    reg_map as cortex_m_regs
+)
+
 from qiling.arch.arm64_const import (
     reg_map as arm64_regs,
-    reg_map_v as arm64_regs_v
+    reg_map_v as arm64_regs_v,
+    reg_map_fp as arm64_reg_map_fp
 )
+
 from qiling.arch.mips_const import (
     reg_map as mips_regs_gpr
 )
+
 from qiling.arch.x86_const import (
     reg_map_32 as x86_regs_32,
     reg_map_64 as x86_regs_64,
@@ -30,6 +38,10 @@
     reg_map_ymm as x86_regs_ymm
 )
 
+from qiling.arch.ppc_const import (
+    reg_map as ppc_regs
+)
+
 from qiling.const import QL_ARCH, QL_OS
 
 RegEntry = Tuple[Optional[int], int, int]
@@ -132,9 +144,10 @@ def __load_regsmap(archtype: QL_ARCH, xmltree: ElementTree.ElementTree) -> Seque
             QL_ARCH.X86:      dict(**x86_regs_32, **x86_regs_misc, **x86_regs_cr, **x86_regs_st, **x86_regs_xmm),
             QL_ARCH.X8664:    dict(**x86_regs_64, **x86_regs_misc, **x86_regs_cr, **x86_regs_st, **x86_regs_xmm, **x86_regs_ymm),
             QL_ARCH.ARM:      dict(**arm_regs, **arm_regs_vfp, **arm_regs_q, **arm_regs_s),
-            QL_ARCH.CORTEX_M: arm_regs,
-            QL_ARCH.ARM64:    dict(**arm64_regs, **arm64_regs_v),
-            QL_ARCH.MIPS:     dict(**mips_regs_gpr)
+            QL_ARCH.CORTEX_M: dict(**cortex_m_regs),
+            QL_ARCH.ARM64:    dict(**arm64_regs, **arm64_regs_v, **arm64_reg_map_fp),
+            QL_ARCH.MIPS:     dict(**mips_regs_gpr),
+            QL_ARCH.PPC:      dict(**ppc_regs)
         }[archtype]
 
         regsinfo = sorted(QlGdbFeatures.__walk_xml_regs(xmltree))
diff --git a/qiling/debugger/qdb/arch/__init__.py b/qiling/debugger/qdb/arch/__init__.py
index 4c5b7a385..12ed30d11 100644
--- a/qiling/debugger/qdb/arch/__init__.py
+++ b/qiling/debugger/qdb/arch/__init__.py
@@ -3,7 +3,6 @@
 # Cross Platform and Multi Architecture Advanced Binary Emulation Framework
 #
 
-from .arch_x86 import ArchX86
-from .arch_mips import ArchMIPS
 from .arch_arm import ArchARM, ArchCORTEX_M
-from .arch_x8664 import ArchX8664
\ No newline at end of file
+from .arch_intel import ArchIntel, ArchX86, ArchX64
+from .arch_mips import ArchMIPS
diff --git a/qiling/debugger/qdb/arch/arch.py b/qiling/debugger/qdb/arch/arch.py
index cbe6489a7..bf1aa6dfe 100644
--- a/qiling/debugger/qdb/arch/arch.py
+++ b/qiling/debugger/qdb/arch/arch.py
@@ -3,32 +3,81 @@
 # Cross Platform and Multi Architecture Advanced Binary Emulation Framework
 #
 
+from typing import Collection, Dict, Mapping, Optional, TypeVar
 
-from qiling.const import QL_ARCH
-from unicorn import UC_ERR_READ_UNMAPPED
-import unicorn
+T = TypeVar('T')
 
 
 class Arch:
+    """Arch base class.
     """
-    base class for arch
-    """
 
-    def __init__(self):
-        pass
+    def __init__(self, regs: Collection[str], swaps: Mapping[str, str], asize: int, isize: int) -> None:
+        """Initialize architecture instance.
+
+        Args:
+            regs  : collection of register names to include in context
+            swaps : readable alternative register names; may be empty
+            asize : native address size in bytes
+            isize : instruction size in bytes
+        """
+
+        self._regs = regs
+        self._swaps = swaps
+        self._asize = asize
+        self._isize = isize
 
     @property
-    def arch_insn_size(self):
-        return 4
+    def regs(self) -> Collection[str]:
+        """Collection of register names.
+        """
+
+        return self._regs
 
     @property
-    def archbit(self):
-        return 4
+    def isize(self) -> int:
+        """Native instruction size.
+        """
+
+        return self._isize
+
+    @property
+    def asize(self) -> int:
+        """Native pointer size.
+        """
+
+        return self._asize
+
+    def swap_regs(self, mapping: Mapping[str, T]) -> Dict[str, T]:
+        """Swap default register names with their aliases.
+
+        Args:
+            mapping: register names mapped to their values
+
+        Returns: a new dictionary where all swappable names were swapped with their aliases
+        """
+
+        return {self._swaps.get(k, k): v for k, v in mapping.items()}
+
+    def unalias(self, name: str) -> str:
+        """Get original register name for the specified alias.
+
+        Args:
+            name: aliased register name
+
+        Returns: original name of aliased register, or same name if not an alias
+        """
+
+        # perform a reversed lookup in swaps to find the original name for given alias
+        return next((org for org, alt in self._swaps.items() if name == alt), name)
+
+    def read_insn(self, address: int) -> Optional[bytearray]:
+        """Read a single instruction from given address.
+
+        Args:
+            address: memory address to read from
 
-    def read_insn(self, address: int):
-        try:
-            result = self.read_mem(address, self.arch_insn_size)
-        except unicorn.unicorn.UcError as err:
-            result = None
+        Returns: instruction bytes, or None if memory could not be read
+        """
 
-        return result
+        return self.try_read_mem(address, self.isize)
diff --git a/qiling/debugger/qdb/arch/arch_arm.py b/qiling/debugger/qdb/arch/arch_arm.py
index ed2e797c4..72a2979db 100644
--- a/qiling/debugger/qdb/arch/arch_arm.py
+++ b/qiling/debugger/qdb/arch/arch_arm.py
@@ -3,105 +3,145 @@
 # Cross Platform and Multi Architecture Advanced Binary Emulation Framework
 #
 
-from typing import Mapping
+from typing import ClassVar, Dict, Optional
 
 from .arch import Arch
 
+
 class ArchARM(Arch):
-    def __init__(self):
-        super().__init__()
-        self._regs = (
-                "r0", "r1", "r2", "r3",
-                "r4", "r5", "r6", "r7",
-                "r8", "r9", "r10", "r11",
-                "r12", "sp", "lr", "pc",
-                )
+    _flags_reg: ClassVar[str] = 'cpsr'
 
-    @property
-    def regs(self):
-        return self._regs
+    def __init__(self) -> None:
+        regs = (
+            'r0', 'r1', 'r2', 'r3',
+            'r4', 'r5', 'r6', 'r7',
+            'r8', 'r9', 'r10', 'r11',
+            'r12', 'sp', 'lr', 'pc'
+        )
 
-    @regs.setter
-    def regs(self, regs):
-        self._regs += regs
+        aliases = {
+            'r9' : 'sb',
+            'r10': 'sl',
+            'r12': 'ip',
+            'r11': 'fp'
+        }
 
-    @property
-    def regs_need_swapped(self):
+        asize = 4
+        isize = 4
+
+        super().__init__(regs, aliases, asize, isize)
+
+    @staticmethod
+    def get_flags(bits: int) -> Dict[str, bool]:
         return {
-                "sl": "r10",
-                "ip": "r12",
-                "fp": "r11",
-                }
+            'thumb':    bits & (0b1 <<  5) != 0,
+            'fiq':      bits & (0b1 <<  6) != 0,
+            'irq':      bits & (0b1 <<  7) != 0,
+            'overflow': bits & (0b1 << 28) != 0,
+            'carry':    bits & (0b1 << 29) != 0,
+            'zero':     bits & (0b1 << 30) != 0,
+            'neg':      bits & (0b1 << 31) != 0
+        }
 
     @staticmethod
-    def get_flags(bits: int) -> Mapping[str, bool]:
-        """
-        get flags for ARM
-        """
+    def get_mode(bits: int) -> str:
+        modes = {
+            0b10000: 'User',
+            0b10001: 'FIQ',
+            0b10010: 'IRQ',
+            0b10011: 'Supervisor',
+            0b10110: 'Monitor',
+            0b10111: 'Abort',
+            0b11010: 'Hypervisor',
+            0b11011: 'Undefined',
+            0b11111: 'System'
+        }
+
+        return modes.get(bits & 0b11111, '?')
 
-        def get_mode(bits: int) -> int:
-            """
-            get operating mode for ARM
-            """
-            return {
-                    0b10000: "User",
-                    0b10001: "FIQ",
-                    0b10010: "IRQ",
-                    0b10011: "Supervisor",
-                    0b10110: "Monitor",
-                    0b10111: "Abort",
-                    0b11010: "Hypervisor",
-                    0b11011: "Undefined",
-                    0b11111: "System",
-                    }.get(bits & 0x00001f)
+    @property
+    def is_thumb(self) -> bool:
+        """Query whether the processor is currently in thumb mode.
+        """
 
-        return {
-                "mode":     get_mode(bits),
-                "thumb":    bits & 0x00000020 != 0,
-                "fiq":      bits & 0x00000040 != 0,
-                "irq":      bits & 0x00000080 != 0,
-                "neg":      bits & 0x80000000 != 0,
-                "zero":     bits & 0x40000000 != 0,
-                "carry":    bits & 0x20000000 != 0,
-                "overflow": bits & 0x10000000 != 0,
-                }
+        return self.ql.arch.is_thumb
 
     @property
-    def thumb_mode(self) -> bool:
-        """
-        helper function for checking thumb mode
+    def isize(self) -> int:
+        return 2 if self.is_thumb else self._isize
+
+    @staticmethod
+    def __is_wide_insn(data: bytes) -> bool:
+        """Determine whether a sequence of bytes represents a wide thumb instruction.
         """
 
-        return self.ql.arch.is_thumb
+        assert len(data) in (2, 4), f'unexpected instruction length: {len(data)}'
 
+        # determine whether this is a wide instruction by inspecting the 5 most
+        # significant bits in the first half-word
+        return (data[1] >> 3) & 0b11111 in (0b11101, 0b11110, 0b11111)
 
-    def read_insn(self, address: int) -> bytes:
+    def __read_thumb_insn_fail(self, address: int) -> Optional[bytearray]:
+        """A failsafe method for reading thumb instructions. This method is needed for
+        rare cases in which a narrow instruction is on a page boundary where the next
+        page is unavailable.
         """
-        read instruction depending on current operating mode
+
+        lo_half = self.try_read_mem(address, 2)
+
+        if lo_half is None:
+            return None
+
+        data = lo_half
+
+        if ArchARM.__is_wide_insn(data):
+            hi_half = self.try_read_mem(address + 2, 2)
+
+            # fail if higher half-word was required but could not be read
+            if hi_half is None:
+                return None
+
+            data.extend(hi_half)
+
+        return data
+
+    def __read_thumb_insn(self, address: int) -> Optional[bytearray]:
+        """Read one instruction in thumb mode.
+
+        Thumb instructions may be either 2 or 4 bytes long, depending on the encoding
+        of the first half-word. However, reading two chunks of two bytes each is slower.
+        For most cases reading all four bytes in advance will be safe and quicker.
         """
 
-        def thumb_read(address: int) -> bytes:
+        data = self.try_read_mem(address, 4)
 
-            first_two = self.ql.mem.read_ptr(address, 2)
-            result = self.ql.pack16(first_two)
+        if data is None:
+            # there is a slight chance we could not read 4 bytes because only 2
+            # are available. try the failsafe method to find out
+            return self.__read_thumb_insn_fail(address)
 
-            # to judge it's thumb mode or not
-            if any([
-                first_two & 0xf000 == 0xf000,
-                first_two & 0xf800 == 0xf800,
-                first_two & 0xe800 == 0xe800,
-                 ]):
+        if ArchARM.__is_wide_insn(data):
+            return data
 
-                latter_two = self.ql.mem.read_ptr(address+2, 2)
-                result += self.ql.pack16(latter_two)
+        return data[:2]
 
-            return result
+    def read_insn(self, address: int) -> Optional[bytearray]:
+        """Read one instruction worth of bytes.
+        """
 
-        return super().read_insn(address) if not self.thumb_mode else thumb_read(address)
+        if self.is_thumb:
+            return self.__read_thumb_insn(address)
 
+        return super().read_insn(address)
 
 
 class ArchCORTEX_M(ArchARM):
+    _flags_reg: ClassVar[str] = 'xpsr'
+
     def __init__(self):
         super().__init__()
-        self.regs += ("xpsr", "control", "primask", "basepri", "faultmask")
+
+        self._regs += (
+            'xpsr', 'control', 'primask',
+            'basepri', 'faultmask'
+        )
diff --git a/qiling/debugger/qdb/arch/arch_intel.py b/qiling/debugger/qdb/arch/arch_intel.py
new file mode 100644
index 000000000..986309e02
--- /dev/null
+++ b/qiling/debugger/qdb/arch/arch_intel.py
@@ -0,0 +1,59 @@
+#!/usr/bin/env python3
+#
+# Cross Platform and Multi Architecture Advanced Binary Emulation Framework
+#
+
+from typing import Collection, Dict
+
+from .arch import Arch
+
+
+class ArchIntel(Arch):
+    """Arch base class for Intel architecture.
+    """
+
+    def __init__(self, regs: Collection[str], asize: int) -> None:
+        super().__init__(regs, {}, asize, 15)
+
+    @staticmethod
+    def get_flags(bits: int) -> Dict[str, bool]:
+        return {
+            'CF' : bits & (0b1 <<  0) != 0,  # carry
+            'PF' : bits & (0b1 <<  2) != 0,  # parity
+            'AF' : bits & (0b1 <<  4) != 0,  # adjust
+            'ZF' : bits & (0b1 <<  6) != 0,  # zero
+            'SF' : bits & (0b1 <<  7) != 0,  # sign
+            'IF' : bits & (0b1 <<  9) != 0,  # interrupt enable
+            'DF' : bits & (0b1 << 10) != 0,  # direction
+            'OF' : bits & (0b1 << 11) != 0   # overflow
+        }
+
+    @staticmethod
+    def get_iopl(bits: int) -> int:
+        return (bits >> 12) & 0b11
+
+
+class ArchX86(ArchIntel):
+    def __init__(self) -> None:
+        regs = (
+            'eax', 'ebx', 'ecx', 'edx',
+            'ebp', 'esp', 'esi', 'edi',
+            'eip', 'eflags', 'ss', 'cs',
+            'ds', 'es', 'fs', 'gs'
+        )
+
+        super().__init__(regs, 4)
+
+
+class ArchX64(ArchIntel):
+    def __init__(self) -> None:
+        regs = (
+            'rax', 'rbx', 'rcx', 'rdx',
+            'rbp', 'rsp', 'rsi', 'rdi',
+            'r8', 'r9', 'r10', 'r11',
+            'r12', 'r13', 'r14', 'r15',
+            'rip', 'eflags', 'ss', 'cs',
+            'ds', 'es', 'fs', 'gs'
+        )
+
+        super().__init__(regs, 8)
diff --git a/qiling/debugger/qdb/arch/arch_mips.py b/qiling/debugger/qdb/arch/arch_mips.py
index d262b0a90..52d7d8fcd 100644
--- a/qiling/debugger/qdb/arch/arch_mips.py
+++ b/qiling/debugger/qdb/arch/arch_mips.py
@@ -3,29 +3,27 @@
 # Cross Platform and Multi Architecture Advanced Binary Emulation Framework
 #
 
-
-
 from .arch import Arch
 
+
 class ArchMIPS(Arch):
-    def __init__(self):
-        super().__init__()
+    def __init__(self) -> None:
+        regs = (
+            'gp', 'at', 'v0', 'v1',
+            'a0', 'a1', 'a2', 'a3',
+            't0', 't1', 't2', 't3',
+            't4', 't5', 't6', 't7',
+            't8', 't9', 'sp', 's8',
+            's0', 's1', 's2', 's3',
+            's4', 's5', 's6', 's7',
+            'ra', 'k0', 'k1', 'pc'
+        )
+
+        aliases = {
+            's8': 'fp'
+        }
 
-    @property
-    def regs(self):
-        return (
-                "gp", "at", "v0", "v1",
-                "a0", "a1", "a2", "a3",
-                "t0", "t1", "t2", "t3",
-                "t4", "t5", "t6", "t7",
-                "t8", "t9", "sp", "s8",
-                "s0", "s1", "s2", "s3",
-                "s4", "s5", "s6", "s7",
-                "ra", "k0", "k1", "pc",
-                )
+        asize = 4
+        isize = 4
 
-    @property
-    def regs_need_swapped(self):
-        return {
-                "fp": "s8",
-                }
+        super().__init__(regs, aliases, asize, isize)
diff --git a/qiling/debugger/qdb/arch/arch_x86.py b/qiling/debugger/qdb/arch/arch_x86.py
deleted file mode 100644
index 10617cbd1..000000000
--- a/qiling/debugger/qdb/arch/arch_x86.py
+++ /dev/null
@@ -1,47 +0,0 @@
-#!/usr/bin/env python3
-#
-# Cross Platform and Multi Architecture Advanced Binary Emulation Framework
-#
-
-from typing import Mapping
-
-from .arch import Arch
-
-class ArchX86(Arch):
-    def __init__(self):
-        super().__init__()
-
-    @property
-    def arch_insn_size(self):
-        return 15
-
-    @property
-    def regs(self):
-        return (
-                "eax", "ebx", "ecx", "edx",
-                "esp", "ebp", "esi", "edi",
-                "eip", "ss", "cs", "ds", "es",
-                "fs", "gs", "eflags",
-                )
-
-    def read_insn(self, address: int) -> bytes:
-        # due to the variadic lengh of x86 instructions ( 1~15 )
-        # always assume the maxium size for disassembler to tell
-        # what is it exactly.
-
-        return self.read_mem(address, self.arch_insn_size)
-
-    @staticmethod
-    def get_flags(bits: int) -> Mapping[str, bool]:
-        """
-        get flags from ql.reg.eflags
-        """
-
-        return {
-                "CF" : bits & 0x0001 != 0, # CF, carry flag
-                "PF" : bits & 0x0004 != 0, # PF, parity flag
-                "AF" : bits & 0x0010 != 0, # AF, adjust flag
-                "ZF" : bits & 0x0040 != 0, # ZF, zero flag
-                "SF" : bits & 0x0080 != 0, # SF, sign flag
-                "OF" : bits & 0x0800 != 0, # OF, overflow flag
-                }
diff --git a/qiling/debugger/qdb/arch/arch_x8664.py b/qiling/debugger/qdb/arch/arch_x8664.py
deleted file mode 100644
index 686e2016e..000000000
--- a/qiling/debugger/qdb/arch/arch_x8664.py
+++ /dev/null
@@ -1,66 +0,0 @@
-#!/usr/bin/env python3
-#
-# Cross Platform and Multi Architecture Advanced Binary Emulation Framework
-#
-
-from typing import Mapping
-
-from .arch import Arch
-
-class ArchX8664(Arch):
-    '''
-    This is currently mostly just a copy of x86 - other than the size of archbits. Some of this may be wrong.
-    '''
-
-    def __init__(self):
-        super().__init__()
-    
-    @property
-    def arch_insn_size(self):
-        '''
-        Architecture maximum instruction size. x86_64 instructions are a maximum size of 15 bytes.
-
-        @returns bytes
-        '''
-
-        return 15
-    
-    @property
-    def regs(self):
-        return (
-                "rax", "rbx", "rcx", "rdx",
-                "rsp", "rbp", "rsi", "rdi",
-                "rip", "r8", "r9", "r10",
-                "r11", "r12", "r13", "r14",
-                "r15", "ss", "cs", "ds", "es",
-                "fs", "gs", "eflags"
-                )
-    
-    @property
-    def archbit(self):
-        '''
-        Architecture maximum register size. x86 is a maximum of 4 bytes.
-
-        @returns bytes
-        '''
-        
-        return 8
-
-    def read_insn(self, address: int) -> bytes:
-        # Due to the variadicc length of x86 instructions
-        # always assume the maximum size for disassembler to tell
-        # what it is.
-
-        return self.read_mem(address, self.arch_insn_size)
-    
-    @staticmethod
-    def get_flags(bits: int) -> Mapping[str, bool]:
-
-        return {
-                "CF" : bits & 0x0001 != 0, # CF, carry flag
-                "PF" : bits & 0x0004 != 0, # PF, parity flag
-                "AF" : bits & 0x0010 != 0, # AF, adjust flag
-                "ZF" : bits & 0x0040 != 0, # ZF, zero flag
-                "SF" : bits & 0x0080 != 0, # SF, sign flag
-                "OF" : bits & 0x0800 != 0, # OF, overflow flag
-                }
diff --git a/qiling/debugger/qdb/branch_predictor/__init__.py b/qiling/debugger/qdb/branch_predictor/__init__.py
index 5004ec348..670f65347 100644
--- a/qiling/debugger/qdb/branch_predictor/__init__.py
+++ b/qiling/debugger/qdb/branch_predictor/__init__.py
@@ -4,7 +4,13 @@
 #
 
 from .branch_predictor import BranchPredictor
-from .branch_predictor_x86 import BranchPredictorX86
-from .branch_predictor_mips import BranchPredictorMIPS
 from .branch_predictor_arm import BranchPredictorARM, BranchPredictorCORTEX_M
-from .branch_predictor_x8664 import BranchPredictorX8664
+from .branch_predictor_intel import BranchPredictorX86, BranchPredictorX64
+from .branch_predictor_mips import BranchPredictorMIPS
+
+__all__ = [
+    'BranchPredictor',
+    'BranchPredictorARM', 'BranchPredictorCORTEX_M',
+    'BranchPredictorX86', 'BranchPredictorX64',
+    'BranchPredictorMIPS'
+]
diff --git a/qiling/debugger/qdb/branch_predictor/branch_predictor.py b/qiling/debugger/qdb/branch_predictor/branch_predictor.py
index 713661501..9ee1466e5 100644
--- a/qiling/debugger/qdb/branch_predictor/branch_predictor.py
+++ b/qiling/debugger/qdb/branch_predictor/branch_predictor.py
@@ -4,37 +4,82 @@
 #
 
 from abc import abstractmethod
+from typing import ClassVar, NamedTuple, Optional
+
+from capstone import CS_GRP_JUMP, CS_GRP_CALL, CS_GRP_RET, CS_GRP_BRANCH_RELATIVE
+
 from ..context import Context
+from ..misc import InvalidInsn
 
 
-class Prophecy:
+class Prophecy(NamedTuple):
+    """Simple container for storing prediction results.
     """
-    container for storing result of the predictor
-    @going: indicate the certian branch will be taken or not
-    @where: where will it go if going is true
+
+    going: bool
+    """Indicate whether a certain branch is taken or not.
     """
 
-    def __init__(self):
-        self.going = False
-        self.where = None
+    where: Optional[int]
+    """Branch target in case it is taken.
+    Target may be `None` if it should have been read from memory, but that memory location
+    could not be reached.
+    """
 
-    def __iter__(self):
-        return iter((self.going, self.where))
 
 class BranchPredictor(Context):
+    """Branch predictor base class.
     """
-    Base class for predictor
+
+    stop: ClassVar[str]
+    """Instruction mnemonic that can be used to determine program's end.
     """
 
-    def read_reg(self, reg_name):
+    def has_ended(self) -> bool:
+        """Determine whether the program has ended by inspecting the current instruction.
+        """
+
+        insn = self.disasm_lite(self.cur_addr)
+
+        if not insn:
+            return False
+
+        # (address, size, mnemonic, op_str)
+        return insn[2] == self.stop
+
+    def is_branch(self) -> bool:
+        """Determine whether the current instruction is a branching instruction.
+        This does not provide indication whether the branch is going to be taken or not.
         """
-        read specific register value
+
+        insn = self.disasm(self.cur_addr, True)
+
+        # invalid instruction; definitely not a branch
+        if isinstance(insn, InvalidInsn):
+            return False
+
+        branching = (
+            CS_GRP_JUMP,
+            CS_GRP_CALL,
+            CS_GRP_RET,
+            CS_GRP_BRANCH_RELATIVE
+        )
+
+        return any(grp in branching for grp in insn.groups)
+
+    def is_fcall(self) -> bool:
+        """Determine whether the current instruction is a function call.
         """
 
-        return self.ql.arch.regs.read(reg_name)
+        insn = self.disasm(self.cur_addr, True)
+
+        # invalid instruction; definitely not a function call
+        if isinstance(insn, InvalidInsn):
+            return False
+
+        return insn.group(CS_GRP_CALL)
 
     @abstractmethod
     def predict(self) -> Prophecy:
-        """
-        Try to predict certian branch will be taken or not based on current context
+        """Predict whether a certain branch will be taken or not based on current context.
         """
diff --git a/qiling/debugger/qdb/branch_predictor/branch_predictor_arm.py b/qiling/debugger/qdb/branch_predictor/branch_predictor_arm.py
index bb5cd0f61..553c5ed7a 100644
--- a/qiling/debugger/qdb/branch_predictor/branch_predictor_arm.py
+++ b/qiling/debugger/qdb/branch_predictor/branch_predictor_arm.py
@@ -3,255 +3,264 @@
 # Cross Platform and Multi Architecture Advanced Binary Emulation Framework
 #
 
+from typing import Callable, Dict, List, Optional, Tuple
 
+from capstone import CS_OP_IMM, CS_OP_MEM, CS_OP_REG
+from capstone.arm import ArmOp, ArmOpMem
+from capstone.arm_const import (
+    ARM_CC_EQ, ARM_CC_NE, ARM_CC_HS, ARM_CC_LO,
+    ARM_CC_MI, ARM_CC_PL, ARM_CC_VS, ARM_CC_VC,
+    ARM_CC_HI, ARM_CC_LS, ARM_CC_GE, ARM_CC_LT,
+    ARM_CC_GT, ARM_CC_LE, ARM_CC_AL
+)
 
-from .branch_predictor import *
-from ..arch import ArchARM
-from ..misc import read_int
+from unicorn.arm_const import UC_ARM_REG_PC
 
+from .branch_predictor import BranchPredictor, Prophecy
+from ..arch import ArchARM, ArchCORTEX_M
+from ..misc import InvalidInsn
 
 
 class BranchPredictorARM(BranchPredictor, ArchARM):
+    """Branch Predictor for ARM.
     """
-    predictor for ARM
-    """
-
-    def __init__(self, ql):
-        super().__init__(ql)
-        ArchARM.__init__(self)
-
-        self.INST_SIZE = 4
-        self.THUMB_INST_SIZE = 2
-        self.CODE_END = "udf"
-
-    def read_reg(self, reg_name):
-        reg_name = reg_name.replace("ip", "r12").replace("fp", "r11")
-        return getattr(self.ql.arch.regs, reg_name)
 
-    def regdst_eq_pc(self, op_str):
-        return op_str.partition(", ")[0] == "pc"
+    stop = 'udf'
 
-    @staticmethod
-    def get_cpsr(bits: int) -> (bool, bool, bool, bool):
+    def get_cond_flags(self) -> Tuple[bool, bool, bool, bool]:
+        """Get condition status flags from CPSR / xPSR.
         """
-        get flags from ql.reg.cpsr
-        """
-        return (
-                bits & 0x10000000 != 0, # V, overflow flag
-                bits & 0x20000000 != 0, # C, carry flag
-                bits & 0x40000000 != 0, # Z, zero flag
-                bits & 0x80000000 != 0, # N, sign flag
-                )
-
-    def predict(self, pref_addr=None):
-        prophecy = Prophecy()
-        cur_addr = self.cur_addr if pref_addr is None else pref_addr
-        line = self.disasm(cur_addr)
-
-        if line.mnemonic == self.CODE_END: # indicates program exited
-            prophecy.where = True
-            return prophecy
-
-        jump_table = {
-                # unconditional branch
-                "b"    : (lambda *_: True),
-                "bl"   : (lambda *_: True),
-                "bx"   : (lambda *_: True),
-                "blx"  : (lambda *_: True),
-                "b.w"  : (lambda *_: True),
-
-                # branch on equal, Z == 1
-                "beq"  : (lambda V, C, Z, N: Z == 1),
-                "bxeq" : (lambda V, C, Z, N: Z == 1),
-                "beq.w": (lambda V, C, Z, N: Z == 1),
-
-                # branch on not equal, Z == 0
-                "bne"  : (lambda V, C, Z, N: Z == 0),
-                "bxne" : (lambda V, C, Z, N: Z == 0),
-                "bne.w": (lambda V, C, Z, N: Z == 0),
-
-                # branch on signed greater than, Z == 0 and N == V
-                "bgt"  : (lambda V, C, Z, N: (Z == 0 and N == V)),
-                "bgt.w": (lambda V, C, Z, N: (Z == 0 and N == V)),
-
-                # branch on signed less than, N != V
-                "blt"  : (lambda V, C, Z, N: N != V),
-
-                # branch on signed greater than or equal, N == V
-                "bge"  : (lambda V, C, Z, N: N == V),
-
-                # branch on signed less than or queal
-                "ble"  : (lambda V, C, Z, N: Z == 1 or N != V),
-
-                # branch on unsigned higher or same (or carry set), C == 1
-                "bhs"  : (lambda V, C, Z, N: C == 1),
-                "bcs"  : (lambda V, C, Z, N: C == 1),
-
-                # branch on unsigned lower (or carry clear), C == 0
-                "bcc"  : (lambda V, C, Z, N: C == 0),
-                "blo"  : (lambda V, C, Z, N: C == 0),
-                "bxlo" : (lambda V, C, Z, N: C == 0),
-                "blo.w": (lambda V, C, Z, N: C == 0),
-
-                # branch on negative or minus, N == 1
-                "bmi"  : (lambda V, C, Z, N: N == 1),
-
-                # branch on positive or plus, N == 0
-                "bpl"  : (lambda V, C, Z, N: N == 0),
-
-                # branch on signed overflow
-                "bvs"  : (lambda V, C, Z, N: V == 1),
-
-                # branch on no signed overflow
-                "bvc"  : (lambda V, C, Z, N: V == 0),
-
-                # branch on unsigned higher
-                "bhi"  : (lambda V, C, Z, N: (Z == 0 and C == 1)),
-                "bxhi" : (lambda V, C, Z, N: (Z == 0 and C == 1)),
-                "bhi.w": (lambda V, C, Z, N: (Z == 0 and C == 1)),
-
-                # branch on unsigned lower
-                "bls"  : (lambda V, C, Z, N: (C == 0 or Z == 1)),
-                "bls.w": (lambda V, C, Z, N: (C == 0 or Z == 1)),
-                }
 
-        cb_table = {
-                # branch on equal to zero
-                "cbz" : (lambda r: r == 0),
+        cpsr = self.read_reg(self._flags_reg)
 
-                # branch on not equal to zero
-                "cbnz": (lambda r: r != 0),
-                }
+        return (
+            (cpsr & (0b1 << 28)) != 0,  # V, overflow flag
+            (cpsr & (0b1 << 29)) != 0,  # C, carry flag
+            (cpsr & (0b1 << 30)) != 0,  # Z, zero flag
+            (cpsr & (0b1 << 31)) != 0   # N, sign flag
+        )
 
-        if line.mnemonic in jump_table:
-            prophecy.going = jump_table.get(line.mnemonic)(*self.get_cpsr(self.ql.arch.regs.cpsr))
+    def predict(self) -> Prophecy:
+        insn = self.disasm(self.cur_addr, True)
 
-        elif line.mnemonic in cb_table:
-            prophecy.going = cb_table.get(line.mnemonic)(self.read_reg(line.op_str.split(", ")[0]))
+        going = False
+        where = 0
 
-        if prophecy.going:
-            if "#" in line.op_str:
-                prophecy.where = read_int(line.op_str.split("#")[-1])
-            else:
-                prophecy.where = self.read_reg(line.op_str)
+        # invalid instruction; nothing to predict
+        if isinstance(insn, InvalidInsn):
+            return Prophecy(going, where)
 
-                if self.regdst_eq_pc(line.op_str):
-                    next_addr = cur_addr + line.size
-                    n2_addr = next_addr + len(self.read_insn(next_addr))
-                    prophecy.where += len(self.read_insn(n2_addr)) + len(self.read_insn(next_addr))
+        # iname is the instruction's basename stripped of all optional suffixes.
+        # this greatly simplifies the case handling
+        iname: str = insn.insn_name() or ''
+        operands: List[ArmOp] = insn.operands
 
-        elif line.mnemonic.startswith("it"):
-            # handle IT block here
+        # branch instructions
+        branches = ('b', 'bl', 'bx', 'blx')
 
-            cond_met = {
-                    "eq": lambda V, C, Z, N: (Z == 1),
-                    "ne": lambda V, C, Z, N: (Z == 0),
-                    "ge": lambda V, C, Z, N: (N == V),
-                    "hs": lambda V, C, Z, N: (C == 1),
-                    "lo": lambda V, C, Z, N: (C == 0),
-                    "mi": lambda V, C, Z, N: (N == 1),
-                    "pl": lambda V, C, Z, N: (N == 0),
-                    "ls": lambda V, C, Z, N: (C == 0 or Z == 1),
-                    "le": lambda V, C, Z, N: (Z == 1 or N != V),
-                    "hi": lambda V, C, Z, N: (Z == 0 and C == 1),
-                    }.get(line.op_str)(*self.get_cpsr(self.ql.arch.regs.cpsr))
+        # reg-based conditional branches
+        conditional_reg: Dict[str, Callable[[int], bool]] = {
+            'cbz' : lambda r: r == 0,
+            'cbnz': lambda r: r != 0
+        }
 
-            it_block_range = [each_char for each_char in line.mnemonic[1:]]
+        def __read_reg(reg: int) -> Optional[int]:
+            """[internal] Read register value where register is provided as a Unicorn constant.
+            """
 
-            next_addr = cur_addr + self.THUMB_INST_SIZE
-            for each in it_block_range:
-                _insn = self.read_insn(next_addr)
-                n2_addr = self.predict(ql, next_addr)
+            # name will be None in case of an invalid register. this is expected in some cases
+            # and should not raise an exception, but rather be silently dropped
+            name = insn.reg_name(reg)
 
-                if (cond_met and each == "t") or (not cond_met and each == "e"):
-                    if n2_addr != (next_addr+len(_insn)): # branch detected
-                        break
+            # pc reg value needs adjustment
+            adj = (2 * self.isize) if reg == UC_ARM_REG_PC else 0
 
-                next_addr += len(_insn)
+            return name and self.read_reg(self.unalias(name)) + adj
 
-            prophecy.where = next_addr
+        def __read_mem(mem: ArmOpMem, size: int = 0, *, signed: bool = False) -> Optional[int]:
+            """[internal] Attempt to read memory contents. By default memory accesses are in
+            native size and values are unsigned.
+            """
 
-        elif line.mnemonic in ("ldr",):
+            base  = __read_reg(mem.base) or 0
+            index = __read_reg(mem.index) or 0
+            scale = mem.scale
+            disp  = mem.disp
 
-            if self.regdst_eq_pc(line.op_str):
-                _, _, rn_offset = line.op_str.partition(", ")
-                r, _, imm = rn_offset.strip("[]!").partition(", #")
+            return self.try_read_pointer(base + index * scale + disp, size, signed=signed)
 
-                if "]" in rn_offset.split(", ")[1]: # pre-indexed immediate
-                    prophecy.where = self.unpack32(self.read_mem(read_int(imm) + self.read_reg(r), self.INST_SIZE))
+        def __parse_op(op: ArmOp, *args, **kwargs) -> Optional[int]:
+            """[internal] Parse an operand and return its value. Register references will be
+            substituted with the corresponding register value, while memory dereferences will
+            be substituted with the value read from the effective address they refer to.
+            """
 
-                else: # post-indexed immediate
-                    # FIXME: weired behavior, immediate here does not apply
-                    prophecy.where = self.unpack32(self.read_mem(self.read_reg(r), self.INST_SIZE))
+            if op.type == CS_OP_REG:
+                value = __read_reg(op.reg)
 
-        elif line.mnemonic in ("addls", "addne", "add") and self.regdst_eq_pc(line.op_str):
-            V, C, Z, N = self.get_cpsr(self.ql.arch.regs.cpsr)
-            r0, r1, r2, *imm = line.op_str.split(", ")
+            elif op.type == CS_OP_IMM:
+                value = op.imm
 
-            # program counter is awalys 8 bytes ahead when it comes with pc, need to add extra 8 bytes
-            extra = 8 if 'pc' in (r0, r1, r2) else 0
+            elif op.type == CS_OP_MEM:
+                value = __read_mem(op.mem, *args, **kwargs)
 
-            if imm:
-                expr = imm[0].split()
-                # TODO: should support more bit shifting and rotating operation
-                if expr[0] == "lsl": # logical shift left
-                    n = read_int(expr[-1].strip("#")) * 2
+            else:
+                # we are not expecting any other operand type, including floating point (CS_OP_FP)
+                raise RuntimeError(f'unexpected operand type: {op.type}')
+
+            # LSR
+            if op.shift.type == 1:
+                value >>= op.shift.value
+
+            # LSL
+            elif op.shift.type == 2:
+                value *= (1 << op.shift.value)
+
+            # ROR ?
+
+            return value
+
+        def __is_taken(cc: int) -> Tuple[bool, Tuple[bool, ...]]:
+            pred = predicate[cc]
+            flags = self.get_cond_flags()
+
+            return pred(*flags), flags
+
+        # conditions predicate selector
+        predicate: Dict[int, Callable[..., bool]] = {
+            ARM_CC_EQ: lambda V, C, Z, N: Z,
+            ARM_CC_NE: lambda V, C, Z, N: not Z,
+            ARM_CC_HS: lambda V, C, Z, N: C,
+            ARM_CC_LO: lambda V, C, Z, N: not C,
+            ARM_CC_MI: lambda V, C, Z, N: N,
+            ARM_CC_PL: lambda V, C, Z, N: not N,
+            ARM_CC_VS: lambda V, C, Z, N: V,
+            ARM_CC_VC: lambda V, C, Z, N: not V,
+            ARM_CC_HI: lambda V, C, Z, N: (not Z) and C,
+            ARM_CC_LS: lambda V, C, Z, N: (not C) or Z,
+            ARM_CC_GE: lambda V, C, Z, N: (N == V),
+            ARM_CC_LT: lambda V, C, Z, N: (N != V),
+            ARM_CC_GT: lambda V, C, Z, N: not Z and (N == V),
+            ARM_CC_LE: lambda V, C, Z, N: Z or (N != V),
+            ARM_CC_AL: lambda V, C, Z, N: True
+        }
+
+        # implementation of simple binary arithmetic and bitwise operations
+        binop: Dict[str, Callable[[int, int, int], int]] = {
+            'add': lambda a, b, _: a + b,
+            'adc': lambda a, b, c: a + b + c,
+            'sub': lambda a, b, _: a - b,
+            'rsb': lambda a, b, _: b - a,
+            'sbc': lambda a, b, c: a - b - (1 - c),
+            'rsc': lambda a, b, c: b - a - (1 - c),
+            'mul': lambda a, b, _: a * b,
+            'and': lambda a, b, _: a & b,
+            'orr': lambda a, b, _: a | b,
+            'eor': lambda a, b, _: a ^ b
+        }
+
+        # is this a branch?
+        if iname in branches:
+            going, _ = __is_taken(insn.cc)
+
+            if going:
+                where = __parse_op(operands[0])
+
+            return Prophecy(going, where)
+
+        if iname in conditional_reg:
+            is_taken = conditional_reg[iname]
+            reg = __parse_op(operands[0])
+            assert reg is not None, 'unrecognized reg'
+
+            going = is_taken(reg)
+
+            if going:
+                where = __parse_op(operands[1])
+
+            return Prophecy(going, where)
+
+        # instruction is not a branch; check whether pc is affected by this instruction.
+        #
+        # insn.regs_write doesn't work well, so we use insn.regs_access instead
+        if UC_ARM_REG_PC in insn.regs_access()[1]:
+
+            if iname == 'mov':
+                going = True
+                where = __parse_op(operands[1])
+
+            elif iname.startswith('ldr'):
+                suffix: str = insn.mnemonic[3:]
+
+                # map possible ldr suffixes to kwargs required for the memory access.
+                #
+                # to improve readability we also address the case where ldr has no suffix
+                # and no special kwargs are required. all strings start with '', so it
+                # serves as a safe default case
+                msize: Dict[str, Dict] = {
+                    'b' : {'size': 1, 'signed': False},
+                    'h' : {'size': 2, 'signed': False},
+                    'sb': {'size': 1, 'signed': True},
+                    'sh': {'size': 2, 'signed': True},
+                    ''  : {}
+                }
 
-            if line.mnemonic == "addls" and (C == 0 or Z == 1):
-                prophecy.where = extra + self.read_reg(r1) + self.read_reg(r2) * n
+                # ldr has different variations that affect the memory access size and
+                # whether the value should be signed or not.
+                suffix = next(s for s in msize if suffix.startswith(s))
 
-            elif line.mnemonic == "add" or (line.mnemonic == "addne" and Z == 0):
-                prophecy.where = extra + self.read_reg(r1) + (self.read_reg(r2) * n if imm else self.read_reg(r2))
+                going, _ = __is_taken(insn.cc)
 
-        elif line.mnemonic in ("tbh", "tbb"):
+                if going:
+                    where = __parse_op(operands[1], **msize[suffix])
 
-            cur_addr += self.INST_SIZE
-            r0, r1, *imm = line.op_str.strip("[]").split(", ")
+            elif iname in binop:
+                going, flags = __is_taken(insn.cc)
 
-            if imm:
-                expr = imm[0].split()
-                if expr[0] == "lsl": # logical shift left
-                    n = read_int(expr[-1].strip("#")) * 2
+                if going:
+                    operator = binop[iname]
+                    op1 = __parse_op(operands[1])
+                    op2 = __parse_op(operands[2])
+                    carry = int(flags[1])
 
-            if line.mnemonic == "tbh":
+                    where = (op1 and op2) and operator(op1, op2, carry)
 
-                r1 = self.read_reg(r1) * n
+            elif iname == 'pop':
+                going, _ = __is_taken(insn.cc)
 
-            elif line.mnemonic == "tbb":
+                if going:
+                    # find pc position within pop regs list
+                    idx = next(i for i, op in enumerate(operands) if (op.type == CS_OP_REG) and (op.reg == UC_ARM_REG_PC))
 
-                r1 = self.read_reg(r1)
+                    # read the corresponding stack entry
+                    where = self.ql.stack_read(idx * self.asize)
 
-            to_add = int.from_bytes(self.read_mem(cur_addr+r1, 2 if line.mnemonic == "tbh" else 1), byteorder="little") * n
-            prophecy.where = cur_addr + to_add
+            else:
+                # left here for users to provide feedback when encountered
+                raise RuntimeWarning(f'instruction affects pc but was not considered: {insn.mnemonic}')
 
-        elif line.mnemonic.startswith("pop") and "pc" in line.op_str:
+        # for some reason capstone does not consider pc to be affected by 'tbb' and 'tbh'
+        # so we need to test for them specifically
 
-            prophecy.where = self.ql.stack_read(line.op_str.strip("{}").split(", ").index("pc") * self.INST_SIZE)
-            if not { # step to next instruction if cond does not meet
-                    "pop"  : lambda *_: True,
-                    "pop.w": lambda *_: True,
-                    "popeq": lambda V, C, Z, N: (Z == 1),
-                    "popne": lambda V, C, Z, N: (Z == 0),
-                    "pophi": lambda V, C, Z, N: (C == 1),
-                    "popge": lambda V, C, Z, N: (N == V),
-                    "poplt": lambda V, C, Z, N: (N != V),
-                    }.get(line.mnemonic)(*self.get_cpsr(self.ql.arch.regs.cpsr)):
+        # table branch byte
+        elif iname == 'tbb':
+            offset = __read_mem(operands[0].mem, 1)
+            pc = __read_reg(UC_ARM_REG_PC)
 
-                prophecy.where = cur_addr + self.INST_SIZE
+            going = True
+            where = (offset and pc) and (pc + offset * 2)
 
-        elif line.mnemonic == "sub" and self.regdst_eq_pc(line.op_str):
-            _, r, imm = line.op_str.split(", ")
-            prophecy.where = self.read_reg(r) - read_int(imm.strip("#"))
+        # table branch half-word
+        elif iname == 'tbh':
+            offset = __read_mem(operands[0].mem, 2)
+            pc = __read_reg(UC_ARM_REG_PC)
 
-        elif line.mnemonic == "mov" and self.regdst_eq_pc(line.op_str):
-            _, r = line.op_str.split(", ")
-            prophecy.where = self.read_reg(r)
+            going = True
+            where = (offset and pc) and (pc + offset * 2)
 
-        if prophecy.where is not None:
-            prophecy.where &= ~0b1
+        return Prophecy(going, where)
 
-        return prophecy
 
-class BranchPredictorCORTEX_M(BranchPredictorARM):
-    def __init__(self, ql):
-        super().__init__(ql)
+class BranchPredictorCORTEX_M(BranchPredictorARM, ArchCORTEX_M):
+    """Branch Predictor for ARM Cortex-M.
+    """
diff --git a/qiling/debugger/qdb/branch_predictor/branch_predictor_intel.py b/qiling/debugger/qdb/branch_predictor/branch_predictor_intel.py
new file mode 100644
index 000000000..672fa0041
--- /dev/null
+++ b/qiling/debugger/qdb/branch_predictor/branch_predictor_intel.py
@@ -0,0 +1,181 @@
+#!/usr/bin/env python3
+#
+# Cross Platform and Multi Architecture Advanced Binary Emulation Framework
+#
+
+from typing import Callable, Dict, List, Optional, Tuple
+
+from capstone.x86 import X86Op
+from capstone.x86_const import X86_OP_REG, X86_OP_IMM, X86_OP_MEM, X86_INS_LEA
+
+from .branch_predictor import Prophecy, BranchPredictor
+from ..arch import ArchX86, ArchX64
+from ..misc import InvalidInsn
+
+
+class BranchPredictorIntel(BranchPredictor):
+    """Branch Predictor base class for Intel architecture.
+    """
+
+    stop = 'hlt'
+
+    def get_eflags(self) -> Tuple[bool, bool, bool, bool, bool]:
+        eflags = self.read_reg('eflags')
+
+        return (
+            (eflags & (0b1 <<  0)) != 0,  # carry
+            (eflags & (0b1 <<  2)) != 0,  # parity
+            (eflags & (0b1 <<  6)) != 0,  # zero
+            (eflags & (0b1 <<  7)) != 0,  # sign
+            (eflags & (0b1 << 11)) != 0   # overflow
+        )
+
+    def predict(self) -> Prophecy:
+        insn = self.disasm(self.cur_addr, True)
+
+        going = False
+        where = 0
+
+        # invalid instruction; nothing to predict
+        if isinstance(insn, InvalidInsn):
+            return Prophecy(going, where)
+
+        mnem: str = insn.mnemonic
+        operands: List[X86Op] = insn.operands
+
+        # unconditional branches
+        unconditional = ('call', 'jmp')
+
+        # flags-based conditional branches
+        conditional: Dict[str, Callable[..., bool]] = {
+            'jb'  : lambda C, P, Z, S, O: C,
+            'jc'  : lambda C, P, Z, S, O: C,
+            'jnae': lambda C, P, Z, S, O: C,
+
+            'jnb' : lambda C, P, Z, S, O: not C,
+            'jnc' : lambda C, P, Z, S, O: not C,
+            'jae' : lambda C, P, Z, S, O: not C,
+
+            'jp'  : lambda C, P, Z, S, O: P,
+            'jpe' : lambda C, P, Z, S, O: P,
+
+            'jnp' : lambda C, P, Z, S, O: not P,
+            'jpo' : lambda C, P, Z, S, O: not P,
+
+            'je'  : lambda C, P, Z, S, O: Z,
+            'jz'  : lambda C, P, Z, S, O: Z,
+
+            'jne' : lambda C, P, Z, S, O: not Z,
+            'jnz' : lambda C, P, Z, S, O: not Z,
+
+            'js'  : lambda C, P, Z, S, O: S,
+            'jns' : lambda C, P, Z, S, O: not S,
+
+            'jo'  : lambda C, P, Z, S, O: O,
+            'jno' : lambda C, P, Z, S, O: not O,
+
+            'jbe' : lambda C, P, Z, S, O: C or Z,
+            'jna' : lambda C, P, Z, S, O: C or Z,
+
+            'ja'  : lambda C, P, Z, S, O: (not C) and (not Z),
+            'jnbe': lambda C, P, Z, S, O: (not C) and (not Z),
+
+            'jl'  : lambda C, P, Z, S, O: S != O,
+            'jnge': lambda C, P, Z, S, O: S != O,
+
+            'jge' : lambda C, P, Z, S, O: S == O,
+            'jnl' : lambda C, P, Z, S, O: S == O,
+
+            'jle' : lambda C, P, Z, S, O: Z or (S != O),
+            'jng' : lambda C, P, Z, S, O: Z or (S != O),
+
+            'jg'  : lambda C, P, Z, S, O: (not Z) and (S == O),
+            'jnle': lambda C, P, Z, S, O: (not Z) and (S == O)
+        }
+
+        # reg-based conditional branches
+        conditional_reg = {
+            "jcxz"  : 'cx',
+            "jecxz" : 'ecx',
+            "jrcxz" : 'rcx'
+        }
+
+        def __read_reg(reg: int) -> Optional[int]:
+            """Read register value where register is provided as a Unicorn constant.
+            """
+
+            # name will be None in case of an illegal or unknown register
+            name = insn.reg_name(reg)
+
+            return name and self.read_reg(name)
+
+        def __parse_op(op: X86Op) -> Optional[int]:
+            """Parse an operand and return its value. Memory dereferences will be
+            substituted with the value read from the effective address they refer to.
+            """
+
+            if op.type == X86_OP_REG:
+                value = __read_reg(op.reg)
+
+            elif op.type == X86_OP_IMM:
+                value = op.imm
+
+            elif op.type == X86_OP_MEM:
+                mem = op.mem
+
+                base  = __read_reg(mem.base) or 0
+                index = __read_reg(mem.index) or 0
+                scale = mem.scale
+                disp  = mem.disp
+
+                seg = __read_reg(mem.segment) or 0
+                ea = seg * 0x10 + (base + index * scale + disp)
+
+                # lea does not really dereference memory
+                value = ea if insn.id == X86_INS_LEA else self.try_read_pointer(ea)
+
+            else:
+                raise RuntimeError(f'unexpected operand type: {op.type}')
+
+            return value
+
+        # is this an unconditional branch?
+        if mnem in unconditional:
+            going = True
+            where = __parse_op(operands[0])
+
+        # is this a return from a function call?
+        elif mnem == 'ret':
+            going = True
+            where = self.ql.arch.stack_read(0)
+
+        # is this a flags-based branch?
+        elif mnem in conditional:
+            predict = conditional[mnem]
+            eflags = self.get_eflags()
+
+            going = predict(*eflags)
+
+            if going:
+                where = __parse_op(operands[0])
+
+        elif mnem in conditional_reg:
+            reg = conditional_reg[mnem]
+            predict = lambda c: c == 0
+
+            going = predict(self.read_reg(reg))
+
+            if going:
+                where = __parse_op(operands[0])
+
+        return Prophecy(going, where)
+
+
+class BranchPredictorX86(BranchPredictorIntel, ArchX86):
+    """Branch Predictor for x86.
+    """
+
+
+class BranchPredictorX64(BranchPredictorIntel, ArchX64):
+    """Branch Predictor for x86-64.
+    """
diff --git a/qiling/debugger/qdb/branch_predictor/branch_predictor_mips.py b/qiling/debugger/qdb/branch_predictor/branch_predictor_mips.py
index a111df8f6..e7423389b 100644
--- a/qiling/debugger/qdb/branch_predictor/branch_predictor_mips.py
+++ b/qiling/debugger/qdb/branch_predictor/branch_predictor_mips.py
@@ -3,88 +3,95 @@
 # Cross Platform and Multi Architecture Advanced Binary Emulation Framework
 #
 
+from typing import Optional
+from capstone.mips import MipsOp, MIPS_OP_REG, MIPS_OP_IMM
 
-
-from .branch_predictor import *
+from .branch_predictor import BranchPredictor, Prophecy
 from ..arch import ArchMIPS
+from ..misc import InvalidInsn
+
 
 class BranchPredictorMIPS(BranchPredictor, ArchMIPS):
-    """
-    predictor for MIPS
+    """Branch Predictor for MIPS 32.
     """
 
-    def __init__(self, ql):
-        super().__init__(ql)
-        ArchMIPS.__init__(self)
-        self.CODE_END = "break"
-        self.INST_SIZE = 4
+    stop = 'break'
 
-    @staticmethod
-    def signed_val(val: int) -> int:
-        """
-        signed value convertion
-        """
+    def predict(self) -> Prophecy:
+        insn = self.disasm(self.cur_addr, True)
+
+        going = False
+        where = 0
+
+        # invalid instruction; nothing to predict
+        if isinstance(insn, InvalidInsn):
+            return Prophecy(going, where)
+
+        unconditional = ('j', 'jr', 'jal', 'jalr', 'b', 'bl', 'bal')
+
+        conditional = {
+            'beq'   : lambda r0, r1: r0 == r1,  # branch on equal
+            'bne'   : lambda r0, r1: r0 != r1,  # branch on not equal
+            'blt'   : lambda r0, r1: r0 < r1,   # branch on r0 less than r1
+            'bgt'   : lambda r0, r1: r0 > r1,   # branch on r0 greater than r1
+            'ble'   : lambda r0, r1: r0 <= r1,  # branch on r0 less than or equal to r1
+            'bge'   : lambda r0, r1: r0 >= r1,  # branch on r0 greater than or equal to r1
+
+            'beqz'  : lambda r: r == 0,         # branch on equal to zero
+            'bnez'  : lambda r: r != 0,         # branch on not equal to zero
+            'bgtz'  : lambda r: r > 0,          # branch on greater than zero
+            'bltz'  : lambda r: r < 0,          # branch on less than zero
+            'bltzal': lambda r: r < 0,          # branch on less than zero and link
+            'blez'  : lambda r: r <= 0,         # branch on less than or equal to zero
+            'bgez'  : lambda r: r >= 0,         # branch on greater than or equal to zero
+            'bgezal': lambda r: r >= 0          # branch on greater than or equal to zero and link
+        }
+
+        def __as_signed(val: int) -> int:
+            """Get the signed integer representation of a given value.
+            """
 
-        def is_negative(i: int) -> int:
+            msb = 0b1 << 31
+
+            return (val & ~msb) - (val & msb)
+
+        def __read_reg(reg: int) -> Optional[int]:
+            """Read register value where register is provided as a Unicorn constant.
             """
-            check wether negative value or not
+
+            # name will be None in case of an illegal or unknown register
+            name = insn.reg_name(reg)
+
+            return name and __as_signed(self.read_reg(self.unalias(name)))
+
+        def __parse_op(op: MipsOp) -> Optional[int]:
+            """Parse an operand and return its value.
             """
 
-            return i & (1 << 31)
+            if op.type == MIPS_OP_REG:
+                value = __read_reg(op.reg)
 
-        return (val-1 << 32) if is_negative(val) else val
+            elif op.type == MIPS_OP_IMM:
+                value = op.imm
 
-    def read_reg(self, reg_name):
-        reg_name = reg_name.strip("$").replace("fp", "s8")
-        return self.signed_val(getattr(self.ql.arch.regs, reg_name))
+            else:
+                raise RuntimeError(f'unexpected operand type: {op.type}')
 
-    def predict(self):
-        prophecy = Prophecy()
-        line = self.disasm(self.cur_addr)
-
-        if line.mnemonic == self.CODE_END: # indicates program extied
-            prophecy.where = True
-            return prophecy
-
-        prophecy.where = self.cur_addr + self.INST_SIZE
-        if line.mnemonic.startswith('j') or line.mnemonic.startswith('b'):
-
-            # make sure at least delay slot executed
-            prophecy.where += self.INST_SIZE
-
-            # get registers or memory address from op_str
-            targets = [
-                    self.read_reg(each)
-                    if '$' in each else read_int(each)
-                    for each in line.op_str.split(", ")
-                    ]
-
-            prophecy.going = {
-                    "j"       : (lambda _: True),             # unconditional jump
-                    "jr"      : (lambda _: True),             # unconditional jump
-                    "jal"     : (lambda _: True),             # unconditional jump
-                    "jalr"    : (lambda _: True),             # unconditional jump
-                    "b"       : (lambda _: True),             # unconditional branch
-                    "bl"      : (lambda _: True),             # unconditional branch
-                    "bal"     : (lambda _: True),             # unconditional branch
-                    "beq"     : (lambda r0, r1, _: r0 == r1), # branch on equal
-                    "bne"     : (lambda r0, r1, _: r0 != r1), # branch on not equal
-                    "blt"     : (lambda r0, r1, _: r0 < r1),  # branch on r0 less than r1
-                    "bgt"     : (lambda r0, r1, _: r0 > r1),  # branch on r0 greater than r1
-                    "ble"     : (lambda r0, r1, _: r0 <= r1), # brach on r0 less than or equal to r1
-                    "bge"     : (lambda r0, r1, _: r0 >= r1), # branch on r0 greater than or equal to r1
-                    "beqz"    : (lambda r, _: r == 0),        # branch on equal to zero
-                    "bnez"    : (lambda r, _: r != 0),        # branch on not equal to zero
-                    "bgtz"    : (lambda r, _: r > 0),         # branch on greater than zero
-                    "bltz"    : (lambda r, _: r < 0),         # branch on less than zero
-                    "bltzal"  : (lambda r, _: r < 0),         # branch on less than zero and link
-                    "blez"    : (lambda r, _: r <= 0),        # branch on less than or equal to zero
-                    "bgez"    : (lambda r, _: r >= 0),        # branch on greater than or equal to zero
-                    "bgezal"  : (lambda r, _: r >= 0),        # branch on greater than or equal to zero and link
-                    }.get(line.mnemonic)(*targets)
-
-            if prophecy.going:
-                # target address is always the rightmost one
-                prophecy.where = targets[-1]
-
-        return prophecy
+            return value
+
+        # get operands. target address is always the rightmost one
+        if insn.operands:
+            *operands, target = insn.operands
+
+        if insn.mnemonic in unconditional:
+            going = True
+
+        elif insn.mnemonic in conditional:
+            predict = conditional[insn.mnemonic]
+
+            going = predict(*(__parse_op(op) for op in operands))
+
+        if going:
+            where = __parse_op(target)
+
+        return Prophecy(going, where)
diff --git a/qiling/debugger/qdb/branch_predictor/branch_predictor_x86.py b/qiling/debugger/qdb/branch_predictor/branch_predictor_x86.py
deleted file mode 100644
index dd1e34fee..000000000
--- a/qiling/debugger/qdb/branch_predictor/branch_predictor_x86.py
+++ /dev/null
@@ -1,128 +0,0 @@
-#!/usr/bin/env python3
-#
-# Cross Platform and Multi Architecture Advanced Binary Emulation Framework
-#
-
-
-
-import re
-
-from .branch_predictor import *
-from ..arch import ArchX86
-from ..misc import check_and_eval
-
-class BranchPredictorX86(BranchPredictor, ArchX86):
-    """
-    predictor for X86
-    """
-
-    class ParseError(Exception):
-        """
-        indicate parser error
-        """
-        pass
-
-    def __init__(self, ql):
-        super().__init__(ql)
-        ArchX86.__init__(self)
-
-    def predict(self):
-        prophecy = Prophecy()
-        line = self.disasm(self.cur_addr)
-
-        jump_table = {
-                # conditional jump
-
-                "jo"   : (lambda C, P, A, Z, S, O: O == 1),
-                "jno"  : (lambda C, P, A, Z, S, O: O == 0),
-
-                "js"   : (lambda C, P, A, Z, S, O: S == 1),
-                "jns"  : (lambda C, P, A, Z, S, O: S == 0),
-
-                "je"   : (lambda C, P, A, Z, S, O: Z == 1),
-                "jz"   : (lambda C, P, A, Z, S, O: Z == 1),
-
-                "jne"  : (lambda C, P, A, Z, S, O: Z == 0),
-                "jnz"  : (lambda C, P, A, Z, S, O: Z == 0),
-
-                "jb"   : (lambda C, P, A, Z, S, O: C == 1),
-                "jc"   : (lambda C, P, A, Z, S, O: C == 1),
-                "jnae" : (lambda C, P, A, Z, S, O: C == 1),
-
-                "jnb"  : (lambda C, P, A, Z, S, O: C == 0),
-                "jnc"  : (lambda C, P, A, Z, S, O: C == 0),
-                "jae"  : (lambda C, P, A, Z, S, O: C == 0),
-
-                "jbe"  : (lambda C, P, A, Z, S, O: C == 1 or Z == 1),
-                "jna"  : (lambda C, P, A, Z, S, O: C == 1 or Z == 1),
-
-                "ja"   : (lambda C, P, A, Z, S, O: C == 0 and Z == 0),
-                "jnbe" : (lambda C, P, A, Z, S, O: C == 0 and Z == 0),
-
-                "jl"   : (lambda C, P, A, Z, S, O: S != O),
-                "jnge" : (lambda C, P, A, Z, S, O: S != O),
-
-                "jge"  : (lambda C, P, A, Z, S, O: S == O),
-                "jnl"  : (lambda C, P, A, Z, S, O: S == O),
-
-                "jle"  : (lambda C, P, A, Z, S, O: Z == 1 or S != O),
-                "jng"  : (lambda C, P, A, Z, S, O: Z == 1 or S != O),
-
-                "jg"   : (lambda C, P, A, Z, S, O: Z == 0 or S == O),
-                "jnle" : (lambda C, P, A, Z, S, O: Z == 0 or S == O),
-
-                "jp"   : (lambda C, P, A, Z, S, O: P == 1),
-                "jpe"  : (lambda C, P, A, Z, S, O: P == 1),
-
-                "jnp"  : (lambda C, P, A, Z, S, O: P == 0),
-                "jpo"  : (lambda C, P, A, Z, S, O: P == 0),
-
-                # unconditional jump
-
-                "call" : (lambda *_: True),
-                "jmp"  : (lambda *_: True),
-
-                }
-
-        jump_reg_table = {
-                "jcxz"  : (lambda cx: cx == 0),
-                "jecxz" : (lambda ecx: ecx == 0),
-                "jrcxz" : (lambda rcx: rcx == 0),
-                }
-
-        if line.mnemonic in jump_table:
-            eflags = self.get_flags(self.ql.arch.regs.eflags).values()
-            prophecy.going = jump_table.get(line.mnemonic)(*eflags)
-
-        elif line.mnemonic in jump_reg_table:
-            prophecy.going = jump_reg_table.get(line.mnemonic)(self.ql.arch.regs.ecx)
-
-        if prophecy.going:
-            takeaway_list = ["ptr", "dword", "[", "]"]
-
-            if len(line.op_str.split()) > 1:
-                new_line = line.op_str.replace(":", "+")
-                for each in takeaway_list:
-                    new_line = new_line.replace(each, " ")
-
-                new_line = " ".join(new_line.split())
-                for each_reg in filter(lambda r: len(r) == 3, self.ql.arch.regs.register_mapping.keys()):
-                    if each_reg in new_line:
-                        new_line = re.sub(each_reg, hex(self.read_reg(each_reg)), new_line)
-
-                for each_reg in filter(lambda r: len(r) == 2, self.ql.arch.regs.register_mapping.keys()):
-                    if each_reg in new_line:
-                        new_line = re.sub(each_reg, hex(self.read_reg(each_reg)), new_line)
-
-
-                prophecy.where = check_and_eval(new_line)
-
-            elif line.op_str in self.ql.arch.regs.register_mapping:
-                prophecy.where = self.ql.arch.regs.read(line.op_str)
-
-            else:
-                prophecy.where = read_int(line.op_str)
-        else:
-            prophecy.where = self.cur_addr + line.size
-
-        return prophecy
diff --git a/qiling/debugger/qdb/branch_predictor/branch_predictor_x8664.py b/qiling/debugger/qdb/branch_predictor/branch_predictor_x8664.py
deleted file mode 100644
index 1350c9bb3..000000000
--- a/qiling/debugger/qdb/branch_predictor/branch_predictor_x8664.py
+++ /dev/null
@@ -1,127 +0,0 @@
-#!/usr/bin/env python3
-#
-# Cross Platform and Multi Architecture Advanced Binary Emulation Framework
-#
-
-
-
-import re
-
-from .branch_predictor import *
-from ..arch import ArchX8664
-from ..misc import check_and_eval
-
-class BranchPredictorX8664(BranchPredictor, ArchX8664):
-    """
-    predictor for X86
-    """
-
-    class ParseError(Exception):
-        """
-        indicate parser error
-        """
-        pass
-
-    def __init__(self, ql):
-        super().__init__(ql)
-        ArchX8664.__init__(self)
-
-    def predict(self):
-        prophecy = Prophecy()
-        line = self.disasm(self.cur_addr)
-
-        jump_table = {
-                # conditional jump
-
-                "jo"   : (lambda C, P, A, Z, S, O: O == 1),
-                "jno"  : (lambda C, P, A, Z, S, O: O == 0),
-
-                "js"   : (lambda C, P, A, Z, S, O: S == 1),
-                "jns"  : (lambda C, P, A, Z, S, O: S == 0),
-
-                "je"   : (lambda C, P, A, Z, S, O: Z == 1),
-                "jz"   : (lambda C, P, A, Z, S, O: Z == 1),
-
-                "jne"  : (lambda C, P, A, Z, S, O: Z == 0),
-                "jnz"  : (lambda C, P, A, Z, S, O: Z == 0),
-
-                "jb"   : (lambda C, P, A, Z, S, O: C == 1),
-                "jc"   : (lambda C, P, A, Z, S, O: C == 1),
-                "jnae" : (lambda C, P, A, Z, S, O: C == 1),
-
-                "jnb"  : (lambda C, P, A, Z, S, O: C == 0),
-                "jnc"  : (lambda C, P, A, Z, S, O: C == 0),
-                "jae"  : (lambda C, P, A, Z, S, O: C == 0),
-
-                "jbe"  : (lambda C, P, A, Z, S, O: C == 1 or Z == 1),
-                "jna"  : (lambda C, P, A, Z, S, O: C == 1 or Z == 1),
-
-                "ja"   : (lambda C, P, A, Z, S, O: C == 0 and Z == 0),
-                "jnbe" : (lambda C, P, A, Z, S, O: C == 0 and Z == 0),
-
-                "jl"   : (lambda C, P, A, Z, S, O: S != O),
-                "jnge" : (lambda C, P, A, Z, S, O: S != O),
-
-                "jge"  : (lambda C, P, A, Z, S, O: S == O),
-                "jnl"  : (lambda C, P, A, Z, S, O: S == O),
-
-                "jle"  : (lambda C, P, A, Z, S, O: Z == 1 or S != O),
-                "jng"  : (lambda C, P, A, Z, S, O: Z == 1 or S != O),
-
-                "jg"   : (lambda C, P, A, Z, S, O: Z == 0 or S == O),
-                "jnle" : (lambda C, P, A, Z, S, O: Z == 0 or S == O),
-
-                "jp"   : (lambda C, P, A, Z, S, O: P == 1),
-                "jpe"  : (lambda C, P, A, Z, S, O: P == 1),
-
-                "jnp"  : (lambda C, P, A, Z, S, O: P == 0),
-                "jpo"  : (lambda C, P, A, Z, S, O: P == 0),
-
-                # unconditional jump
-
-                "call" : (lambda *_: True),
-                "jmp"  : (lambda *_: True),
-
-                }
-
-        jump_reg_table = {
-                "jcxz"  : (lambda cx: cx == 0),
-                "jecxz" : (lambda ecx: ecx == 0),
-                "jrcxz" : (lambda rcx: rcx == 0),
-                }
-
-        if line.mnemonic in jump_table:
-            eflags = self.get_flags(self.ql.arch.regs.eflags).values()
-            prophecy.going = jump_table.get(line.mnemonic)(*eflags)
-
-        elif line.mnemonic in jump_reg_table:
-            prophecy.going = jump_reg_table.get(line.mnemonic)(self.ql.arch.regs.ecx)
-
-        if prophecy.going:
-            takeaway_list = ["ptr", "dword", "qword", "[", "]"]
-
-            if len(line.op_str.split()) > 1:
-                new_line = line.op_str.replace(":", "+")
-                for each in takeaway_list:
-                    new_line = new_line.replace(each, " ")
-
-                new_line = " ".join(new_line.split())
-                for each_reg in filter(lambda r: len(r) == 3, self.ql.arch.regs.register_mapping.keys()):
-                    if each_reg in new_line:
-                        new_line = re.sub(each_reg, hex(self.read_reg(each_reg)), new_line)
-
-                for each_reg in filter(lambda r: len(r) == 2, self.ql.arch.regs.register_mapping.keys()):
-                    if each_reg in new_line:
-                        new_line = re.sub(each_reg, hex(self.read_reg(each_reg)), new_line)
-
-                prophecy.where = check_and_eval(new_line)
-
-            elif line.op_str in self.ql.arch.regs.register_mapping:
-                prophecy.where = self.ql.arch.regs.read(line.op_str)
-
-            else:
-                prophecy.where = read_int(line.op_str)
-        else:
-            prophecy.where = self.cur_addr + line.size
-
-        return prophecy
diff --git a/qiling/debugger/qdb/const.py b/qiling/debugger/qdb/const.py
index 74c72d229..d316fc263 100644
--- a/qiling/debugger/qdb/const.py
+++ b/qiling/debugger/qdb/const.py
@@ -1,23 +1,25 @@
 from enum import IntEnum
 
+
 class color:
-   """
-   class for colorful prints
-   """
-   CYAN      = '\033[96m'
-   PURPLE    = '\033[95m'
-   BLUE      = '\033[94m'
-   YELLOW    = '\033[93m'
-   GREEN     = '\033[92m'
-   RED       = '\033[91m'
-   DARKGRAY  = '\033[90m'
-   WHITE     = '\033[48m'
-   DARKCYAN  = '\033[36m'
-   BLACK     = '\033[35m'
-   UNDERLINE = '\033[4m'
-   BOLD      = '\033[1m'
-   END       = '\033[0m'
-   RESET     = '\x1b[39m'
+    """
+    class for colorful prints
+    """
+    DARKGRAY  = '\033[90m'
+    RED       = '\033[91m'
+    GREEN     = '\033[92m'
+    YELLOW    = '\033[93m'
+    BLUE      = '\033[94m'
+    PURPLE    = '\033[95m'
+    CYAN      = '\033[96m'
+    WHITE     = '\033[48m'
+    BLACK     = '\033[35m'
+    DARKCYAN  = '\033[36m'
+    UNDERLINE = '\033[4m'
+    BOLD      = '\033[1m'
+    END       = '\033[0m'
+    RESET     = '\033[39m'
+
 
 class QDB_MSG(IntEnum):
     ERROR = 10
diff --git a/qiling/debugger/qdb/context.py b/qiling/debugger/qdb/context.py
index e4400f4b4..344f7563a 100644
--- a/qiling/debugger/qdb/context.py
+++ b/qiling/debugger/qdb/context.py
@@ -3,102 +3,147 @@
 # Cross Platform and Multi Architecture Advanced Binary Emulation Framework
 #
 
-from typing import Optional
+from __future__ import annotations
 
-from unicorn import UC_ERR_READ_UNMAPPED
-import unicorn
+from typing import TYPE_CHECKING, Optional, Tuple, Union
+from unicorn import UcError
 
-from capstone import CsInsn
+from .misc import InvalidInsn
+
+
+if TYPE_CHECKING:
+    from qiling import Qiling
+    from .misc import InsnLike
 
-from .misc import read_int, InvalidInsn
 
 class Context:
-    """
-    base class for accessing context of running qiling instance
+    """Emulation context accessor.
     """
 
-    def __init__(self, ql):
+    def __init__(self, ql: Qiling):
+        # make sure mixin classes are properly initialized
+        super().__init__()
+
         self.ql = ql
         self.pointersize = self.ql.arch.pointersize
-        self.unpack = ql.unpack
-        self.unpack16 = ql.unpack16
-        self.unpack32 = ql.unpack32
-        self.unpack64 = ql.unpack64
 
     @property
-    def cur_addr(self):
-        """
-        program counter of qiling instance
+    def cur_addr(self) -> int:
+        """Read current program counter register.
         """
 
         return self.ql.arch.regs.arch_pc
 
-    def read_mem(self, address: int, size: int):
+    @property
+    def cur_sp(self) -> int:
+        """Read current stack pointer register.
         """
-        read data from memory of qiling instance
+
+        return self.ql.arch.regs.arch_sp
+
+    def read_reg(self, reg: Union[str, int]) -> int:
+        """Get register value.
         """
 
-        return self.ql.mem.read(address, size)
+        return self.ql.arch.regs.read(reg)
 
-    def disasm(self, address: int, detail: bool = False) -> Optional[CsInsn]:
+    def write_reg(self, reg: Union[str, int], value: int) -> None:
+        """Set register value.
         """
-        helper function for disassembling
+
+        self.ql.arch.regs.write(reg, value)
+
+    def disasm(self, address: int, detail: bool = False) -> InsnLike:
+        """Helper function for disassembling.
         """
 
-        md = self.ql.arch.disassembler
-        md.detail = detail
+        insn_bytes = self.read_insn(address) or b''
+        insn = None
+
+        if insn_bytes:
+            md = self.ql.arch.disassembler
+            md.detail = detail
+
+            insn = next(md.disasm(insn_bytes, address, 1), None)
 
-        if (bytes_read := self.read_insn(address)):
-            return next(md.disasm(bytes_read, address), InvalidInsn(bytes_read, address))
-        return InvalidInsn(bytes_read, address)
+        return insn or InvalidInsn(insn_bytes, address)
 
-    def try_read(self, address: int, size: int) -> Optional[bytes]:
+    def disasm_lite(self, address: int) -> Tuple[int, int, str, str]:
+        """Helper function for light disassembling, when details are not required.
+
+        Returns:
+            A tuple of: instruction address, size, mnemonic and operands
         """
-        try to read data from ql.mem
+
+        insn_bytes = self.read_insn(address) or b''
+        insn = None
+
+        if insn_bytes:
+            md = self.ql.arch.disassembler
+
+            insn = next(md.disasm_lite(insn_bytes, address, 1), None)
+
+        return insn or tuple()
+
+    def read_mem(self, address: int, size: int) -> bytearray:
+        """Read data of a certain size from specified memory location.
         """
 
-        result = None
-        err_msg = ""
-        try:
-            result = self.read_mem(address, size)
+        return self.ql.mem.read(address, size)
 
-        except unicorn.unicorn.UcError as err:
-            if err.errno == UC_ERR_READ_UNMAPPED: # Invalid memory read (UC_ERR_READ_UNMAPPED)
-                err_msg = f"Can not access memory at address 0x{address:08x}"
+    def try_read_mem(self, address: int, size: int) -> Optional[bytearray]:
+        """Attempt to read data from memory.
+        """
 
-        except:
-            pass
+        try:
+            data = self.read_mem(address, size)
+        except UcError:
+            data = None
 
-        return (result, err_msg)
+        return data
 
-    def try_read_pointer(self, address: int) -> Optional[bytes]:
+    def read_pointer(self, address: int, size: int = 0, *, signed: bool = False) -> int:
+        """Read a native-size integer from memory.
         """
-        try to read pointer size of data from ql.mem
+
+        return self.ql.mem.read_ptr(address, size, signed=signed)
+
+    def try_read_pointer(self, address: int, size: int = 0, *, signed: bool = False) -> Optional[int]:
+        """Attempt to read a native-size integer from memory.
         """
 
-        return self.try_read(address, self.archbit)
+        try:
+            value = self.read_pointer(address, size, signed=signed)
+        except UcError:
+            value = None
+
+        return value
 
     def read_string(self, address: int) -> Optional[str]:
-        """
-        read string from memory of qiling instance
+        """Read string from memory.
         """
 
         return self.ql.mem.string(address)
 
     def try_read_string(self, address: int) -> Optional[str]:
-        """
-        try to read string from memory of qiling instance
+        """Attempt to read a string from memory.
         """
 
-        s = None
         try:
             s = self.read_string(address)
-        except:
-            pass
+        except UcError:
+            s = None
+
+        return s
+
+    def get_deref(self, ptr: int) -> Union[int, str, None]:
+        """Get content referenced by a pointer.
+
+        If dereferenced data is printable, a string will be returned. Otherwise
+        an integer value is returned. If the specified address is not reachable
+        None is returned.
+        """
 
-    @staticmethod
-    def read_int(s: str) -> int:
-        return read_int(s)
+        val = self.try_read_string(ptr)
 
-if __name__ == "__main__":
-    pass
+        return val if val and val.isprintable() else self.try_read_pointer(ptr)
diff --git a/qiling/debugger/qdb/helper.py b/qiling/debugger/qdb/helper.py
new file mode 100644
index 000000000..fd6c05bf3
--- /dev/null
+++ b/qiling/debugger/qdb/helper.py
@@ -0,0 +1,252 @@
+#!/usr/bin/env python3
+#
+# Cross Platform and Multi Architecture Advanced Binary Emulation Framework
+#
+
+from __future__ import annotations
+
+import re
+
+from typing import TYPE_CHECKING, List, Tuple
+
+from qiling.const import QL_ARCH
+from .context import Context
+from .arch import ArchCORTEX_M, ArchARM, ArchMIPS, ArchX86, ArchX64
+
+
+if TYPE_CHECKING:
+    from re import Match
+    from qiling import Qiling
+    from .misc import InsnLike
+
+
+def setup_command_helper(ql: Qiling):
+    atypes = {
+        QL_ARCH.X86:      ArchX86,
+        QL_ARCH.X8664:    ArchX64,
+        QL_ARCH.MIPS:     ArchMIPS,
+        QL_ARCH.ARM:      ArchARM,
+        QL_ARCH.CORTEX_M: ArchCORTEX_M
+    }
+
+    ret = type('CommandHelper', (CommandHelper, atypes[ql.arch.type]), {})
+
+    return ret(ql)
+
+
+# pre-compile the safe arithmetics and bitwise pattern
+__arith_pattern = re.compile(r'^(0[xX][0-9a-fA-F]+|0[0-7]+|\d+|[\+\-\*/\(\)|&^~\s])+$')
+
+
+def safe_arith(expr: str) -> int:
+    """Safely evaluate an arithmetic expression. The expression may include only
+    digits, arithmetic and bitwise operators, parentheses, whitespace, hexadecimal
+    and octal values.
+
+    Args:
+        expr: arithmetic expression to evaluate
+
+    Returns: integer result
+
+    Raises:
+        ValueError: if disallowed tokens are included in `expr`
+        SyntaxError: in case the arithmetic expression does not make sense
+    """
+
+    if not __arith_pattern.fullmatch(expr):
+        raise ValueError
+
+    # adjust gdb-style octal values to python: 0644 -> 0o644
+    expr = re.sub(r'\b0([0-7]+)\b', r'0o\1', expr)  # keep the result; \b avoids touching digits inside hex or decimal values
+
+    # safely evaluate the expression
+    return eval(expr, {}, {})
+
+
+class CommandHelper(Context):
+    """
+    command helper for handling debugger commands and memory access
+    """
+
+    def __init__(self, ql: Qiling):
+        super().__init__(ql)
+
+        # default values for the examine ('x') command
+        self.x_defaults = {
+            'n': '1',   # number of units to read
+            'f': 'x',   # output format
+            'u': 'w'    # unit type
+        }
+
+    def sub_reg_values(self, expr: str) -> str:
+        def __sub_reg(m: Match[str]) -> str:
+            reg = m.group(1).lower()
+
+            return f'{self.read_reg(self.unalias(reg)):#x}'
+
+        # replace reg names with their actual values
+        return re.sub(r'\$(\w+)', __sub_reg, expr)
+
+    def resolve_expr(self, expr: str) -> int:
+        """Resolve an arithmetic expression that might include register names.
+
+        Register names will be substituted with their current value before
+        proceeding to evaluate the expression.
+
+        Args:
+            expr: an expression to evaluate
+
+        Returns:
+            final evaluation result
+
+        Raises:
+            KeyError: if `expr` contains an unrecognized register name
+            ValueError: if `expr` contains disallowed tokens
+            SyntaxError: if `expr` contains a broken arithmetic syntax
+        """
+
+        try:
+            # look for register names and replace them with their actual values
+            expr = self.sub_reg_values(expr)
+
+        # expr contains an unrecognized register name
+        except KeyError as ex:
+            raise KeyError(f'unrecognized register name: {ex.args[0]}') from ex
+
+        try:
+            # expr should contain only values and arithmetic tokens by now; attempt to evaluate it
+            res = safe_arith(expr)
+
+        # expr contains a disallowed token
+        except ValueError as ex:
+            raise ValueError('only integers, hexadecimals, octals, arithmetic and bitwise operators are allowed') from ex
+
+        # arithmetic syntax is broken
+        except SyntaxError as ex:
+            raise SyntaxError('error evaluating arithmetic expression') from ex
+
+        return res
+
+    def handle_set(self, line: str) -> Tuple[str, int]:
+        """
+        set register value of current context
+        """
+        # set $a = b
+
+        m = re.match(r'\s*\$(?P<reg>\w+)\s*=\s*(?P<expr>.+)', line)
+
+        if m is None:
+            raise SyntaxError('illegal command syntax')
+
+        if not m['reg']:
+            raise KeyError('error parsing input: invalid lhand expression')
+
+        if not m['expr']:
+            raise SyntaxError('error parsing input: invalid rhand expression')
+
+        reg = self.unalias(m['reg'])
+        expr = self.resolve_expr(m['expr'])
+
+        self.write_reg(reg, expr)
+
+        return (reg, expr)
+
+    def handle_i(self, addr: int, count: int) -> List[InsnLike]:
+        result = []
+
+        for _ in range(count):
+            insn = self.disasm(addr)
+            addr += insn.size
+
+            result.append(insn)
+
+        return result
+
+    def handle_examine(self, line: str) -> None:
+        # examples:
+        #   x/xw address
+        #   x/4xw $esp
+        #   x/4xg $rsp
+        #   x/i $eip - 0x10
+        #   x $sp
+        #   x $sp + 0xc
+
+        m = re.match(r'(?:/(?P<n>\d+)?(?P<f>[oxdutfacis])?(?P<u>[bhwg])?)?\s*(?P<target>.+)?', line)
+
+        # there should always be a match, at least for target, but let's be on the safe side
+        if m is None:
+            raise ValueError('unexpected examine command syntax')
+
+        n = m['n'] or self.x_defaults['n']
+        f = m['f'] or self.x_defaults['f']
+        u = m['u'] or self.x_defaults['u']
+
+        target = m['target']
+
+        # if target was specified, determine its value. otherwise use the current address
+        target = self.resolve_expr(target) if target else self.cur_addr
+
+        n = int(n)
+
+        if f == r'i':
+            for insn in self.handle_i(target, n):
+                print(f"{insn.address:#010x}: {insn.mnemonic:10s} {insn.op_str}")
+
+        # handle read c-style string
+        elif f == r's':
+            s = self.try_read_string(target)
+
+            if s is None:
+                raise ValueError(f'error reading c-style string at {target:#010x}')
+
+            print(f"{target:#010x}: {s}")
+
+        else:
+            def __to_size(u: str) -> int:
+                """Convert a gdb unit name to its corresponding size in bytes.
+                """
+
+                sizes = {
+                    'b': 1,  # byte
+                    'h': 2,  # halfword
+                    'w': 4,  # word
+                    'g': 8   # giant
+                }
+
+                # assume u is in sizes
+                return sizes[u]
+
+            def __to_py_spec(f: str, size: int) -> Tuple[str, str, str]:
+                """Convert a gdb format specifier to its corresponding python format,
+                prefix and padding specifiers.
+                """
+
+                specs = {
+                    'o': ('o', '0',  ''),              # octal
+                    'x': ('x', '0x', f'0{size * 2}'),  # hex
+                    'd': ('d', '',   ''),              # decimal
+                    'u': ('d', '',   ''),              # unsigned decimal (python has no 'u' format code)
+                    't': ('b', '',   f'0{size * 8}'),  # binary
+                    'f': ('f', '',   ''),              # float
+                    'a': ('x', '0x', f'0{size * 2}'),  # address
+                    'c': ('c', '',   ''),              # char
+                }
+
+                # assume f is in specs
+                return specs[f]
+
+            size = __to_size(u)
+            pyfmt, prefix, pad = __to_py_spec(f, size)
+            values = [self.try_read_pointer(target + (i * size), size) for i in range(n)]
+
+            ipr = 4  # number of items to display per row
+
+            for i in range(0, len(values), ipr):
+                vset = values[i:i + ipr]
+
+                print(f'{target + i * size:#10x}:', end='\t')
+
+                for v in vset:
+                    print('?' if v is None else f'{prefix}{v:{pad}{pyfmt}}', end='\t')
+
+                print()
diff --git a/qiling/debugger/qdb/memory.py b/qiling/debugger/qdb/memory.py
deleted file mode 100644
index e26f49302..000000000
--- a/qiling/debugger/qdb/memory.py
+++ /dev/null
@@ -1,204 +0,0 @@
-#!/usr/bin/env python3
-#
-# Cross Platform and Multi Architecture Advanced Binary Emulation Framework
-#
-
-from qiling.const import QL_ARCH
-
-from .context import Context
-from .arch import ArchCORTEX_M, ArchARM, ArchMIPS, ArchX86, ArchX8664
-from .misc import check_and_eval
-import re, math
-
-
-
-def setup_memory_Manager(ql):
-
-    arch_type = {
-            QL_ARCH.X86: ArchX86,
-            QL_ARCH.X8664: ArchX8664,
-            QL_ARCH.MIPS: ArchMIPS,
-            QL_ARCH.ARM: ArchARM,
-            QL_ARCH.CORTEX_M: ArchCORTEX_M,
-            }.get(ql.arch.type)
-    
-    ret = type(
-            "MemoryManager", 
-            (MemoryManager, arch_type),
-            {}
-            )
-    
-    return ret(ql)
-
-
-class MemoryManager(Context):
-    """
-    memory manager for handing memory access
-    """
-
-    def __init__(self, ql):
-        super().__init__(ql)
-
-    @property
-    def get_default_fmt(self):
-        return ('x', 4, 1)
-
-    @property
-    def get_format_letter(self):
-        return {
-            "o", # octal
-            "x", # hex
-            "d", # decimal
-            "u", # unsigned decimal
-            "t", # binary
-            "f", # float
-            "a", # address
-            "i", # instruction
-            "c", # char
-            "s", # string
-            "z", # hex, zero padded on the left
-            }
-
-    @property
-    def get_size_letter(self):
-        return {
-            "b": 1, # 1-byte, byte
-            "h": 2, # 2-byte, halfword
-            "w": 4, # 4-byte, word
-            "g": 8, # 8-byte, giant
-            }
-
-    def extract_count(self, t):
-        return "".join([s for s in t if s.isdigit()])
-
-    def get_fmt(self, text):
-        f, s, c = self.get_default_fmt
-        if self.extract_count(text):
-            c = int(self.extract_count(text))
-
-        for char in text.strip(str(c)):
-            if char in self.get_size_letter.keys():
-                s = self.get_size_letter.get(char)
-
-            elif char in self.get_format_letter:
-                f = char
-
-        return (f, s, c)
-
-    def fmt_unpack(self, bs: bytes, sz: int) -> int:
-        return {
-                1: lambda x: x[0],
-                2: self.unpack16,
-                4: self.unpack32,
-                8: self.unpack64,
-                }.get(sz)(bs)
-
-    def handle_i(self, addr, ct=1):
-        result = []
-
-        for offset in range(addr, addr+ct*4, 4):
-            if (line := self.disasm(offset)):
-                result.append(line)
-
-        return result
-
-
-    def parse(self, line: str):
-
-        # test case
-        # x/wx address
-        # x/i address
-        # x $sp
-        # x $sp +0xc
-        # x $sp+0xc
-        # x $sp + 0xc
-
-        if line.startswith("/"):  # followed by format letter and size letter
-
-            fmt, *rest = line.strip("/").split()
-
-            fmt = self.get_fmt(fmt)
-
-        else:
-            args = line.split()
-
-            rest = [args[0]] if len(args) == 1 else args
-
-            fmt = self.get_default_fmt
-
-        if len(rest) == 0:
-            return
-
-        line = []
-        if (regs_dict := getattr(self, "regs_need_swapped", None)):
-            for each in rest:
-                for reg in regs_dict:
-                    if each in regs_dict:
-                        line.append(regs_dict[each])
-                else:
-                    line.append(each)
-        else:
-            line = rest
-
-        # for simple calculation with register and address
-
-        line = " ".join(line)
-        # substitue register name with real value
-        for each_reg in filter(lambda r: len(r) == 3, self.ql.arch.regs.register_mapping):
-            reg = f"${each_reg}"
-            if reg in line:
-                line = re.sub(f"\\{reg}", hex(self.ql.arch.regs.read(each_reg)), line)
-
-        for each_reg in filter(lambda r: len(r) == 2, self.ql.arch.regs.register_mapping):
-            reg = f"${each_reg}"
-            if reg in line:
-                line = re.sub(f"\\{reg}", hex(self.ql.arch.regs.read(each_reg)), line)
-
-
-        ft, sz, ct = fmt
-
-        try:
-            addr = check_and_eval(line)
-        except:
-            return "something went wrong ..."
-
-        if ft == "i":
-            output = self.handle_i(addr, ct)
-            for each in output:
-                print(f"0x{each.address:x}: {each.mnemonic}\t{each.op_str}")
-
-        elif ft == "s":
-            # handle read c-style string
-            try:
-                print(f"0x{addr:08x}: {self.ql.os.utils.read_cstring(addr)}")
-            except:
-                return f"error reading c-style string at 0x{addr:08x}"
-
-        else:
-            lines = 1 if ct <= 4 else math.ceil(ct / 4)
-            # parse command
-            prefix = "0x" if ft in ("x", "a") else ""
-            pad = '0' + str(sz*2) if ft in ('x', 'a', 't') else ''
-            ft = ft.lower() if ft in ("x", "o", "b", "d") else ft.lower().replace("t", "b").replace("a", "x")
-
-            mem_read = []
-            for offset in range(ct):
-                # append data if read successfully, otherwise return error message
-                if (data := self.try_read(addr+(offset*sz), sz))[0] is not None:
-                    mem_read.append(data[0])
-
-                else:
-                    return data[1]
-
-            for line in range(lines):
-                offset = line * sz * 4
-                print(f"0x{addr+offset:x}:\t", end="")
-
-                idx = line * self.ql.arch.pointersize
-                for each in mem_read[idx:idx+self.ql.arch.pointersize]:
-                    data = self.fmt_unpack(each, sz)
-                    print(f"{prefix}{data:{pad}{ft}}\t", end="")
-
-                print()
-
-        return True
diff --git a/qiling/debugger/qdb/misc.py b/qiling/debugger/qdb/misc.py
index a3cf29e1a..46c06cc02 100644
--- a/qiling/debugger/qdb/misc.py
+++ b/qiling/debugger/qdb/misc.py
@@ -3,92 +3,68 @@
 # Cross Platform and Multi Architecture Advanced Binary Emulation Framework
 #
 
-from typing import AnyStr, Callable, Optional
+from typing import Optional, Union
 
 from dataclasses import dataclass
+from capstone import CsInsn
 
-import ast
-
-def check_and_eval(line: str):
-    """
-    This function will valid all type of nodes and evaluate it if nothing went wrong
-    """
-
-    class AST_checker(ast.NodeVisitor):
-        def generic_visit(self, node):
-            if type(node) in (ast.Module, ast.Expr, ast.BinOp, ast.Constant, ast.Add, ast.Mult, ast.Sub):
-                ast.NodeVisitor.generic_visit(self, node)
-            else:
-                raise ParseError("malform or invalid ast node")
-
-    checker = AST_checker()
-    ast_tree = ast.parse(line)
-    checker.visit(ast_tree)
-
-    return eval(line)
 
 @dataclass
 class InvalidInsn:
     """
     class for displaying invalid instruction
     """
+
     bytes: bytes
-    address: bytes
-    mnemonic: str = 'invalid'
+    address: int
+    mnemonic: str = '(invalid)'
     op_str: str = ''
 
     def __post_init__(self):
-        self.size = len(self.bytes)
+        self.size = len(self.bytes) if self.bytes else 1
 
 
 class Breakpoint:
+    """Dummy class for breakpoints.
     """
-    dummy class for breakpoint
-    """
-    def __init__(self, addr: int):
-        self.addr = addr
-        self.hitted = False
 
+    # monotonically increasing index counter
+    _counter = 0
 
-class TempBreakpoint(Breakpoint):
-    """
-    dummy class for temporay breakpoint
-    """
-    def __init__(self, addr: int):
-        super().__init__(addr)
+    def __init__(self, addr: int, temp: bool = False):
+        """Initialize a breakpoint object.
 
+        Args:
+            addr: address to break upon arrival
+            temp: whether this is a temporary breakpoint. temporary breakpoints
+            get removed after they get hit for the first time
+        """
 
-def read_int(s: str) -> int:
-    """
-    parse unsigned integer from string
-    """
-    return int(s, 0)
+        self.index = Breakpoint._counter
+        Breakpoint._counter += 1
 
+        self.addr = addr
+        self.temp = temp
+        self.enabled = True
 
-def try_read_int(s: AnyStr) -> Optional[int]:
-    """
-    try to read string as integer is possible
+
+def read_int(s: str, /) -> int:
+    """Turn a numerical string into its integer value.
     """
-    try:
-        ret = read_int(s)
-    except:
-        ret = None
 
-    return ret
+    return int(s, 0)
 
 
-def parse_int(func: Callable) -> Callable:
+def try_read_int(s: str, /) -> Optional[int]:
+    """Attempt to convert string to an integer value.
     """
-    function dectorator for parsing argument as integer
-    """
-    def wrap(qdb, s: str = "") -> int:
-        assert type(s) is str
-        ret = try_read_int(s)
-        return func(qdb, ret)
 
-    return wrap
+    try:
+        val = read_int(s)
+    except (ValueError, TypeError):
+        val = None
 
+    return val
 
 
-if __name__ == "__main__":
-    pass
+InsnLike = Union[CsInsn, InvalidInsn]
diff --git a/qiling/debugger/qdb/qdb.py b/qiling/debugger/qdb/qdb.py
index fe4a68d61..7182b46da 100644
--- a/qiling/debugger/qdb/qdb.py
+++ b/qiling/debugger/qdb/qdb.py
@@ -3,48 +3,60 @@
 # Cross Platform and Multi Architecture Advanced Binary Emulation Framework
 #
 
-import cmd
+from __future__ import annotations
 
-from typing import Callable, Optional, Tuple, Union, List
+import sys
+
+from typing import TYPE_CHECKING, Any, Callable, Dict, List, Union
+from cmd import Cmd
 from contextlib import contextmanager
 
-from qiling import Qiling
-from qiling.const import QL_OS, QL_ARCH, QL_ENDIAN, QL_VERBOSE
+from qiling.const import QL_OS, QL_ARCH, QL_VERBOSE
 from qiling.debugger import QlDebugger
 
-from .utils import setup_context_render, setup_branch_predictor, setup_address_marker, SnapshotManager, run_qdb_script
-from .memory import setup_memory_Manager
-from .misc import parse_int, Breakpoint, TempBreakpoint, try_read_int
 from .const import color
+from .helper import setup_command_helper
+from .misc import Breakpoint, try_read_int
+from .render.render import RARROW
+from .utils import setup_context_render, setup_branch_predictor, Marker, SnapshotManager, QDB_MSG, qdb_print
+
 
-from .utils import QDB_MSG, qdb_print
+if TYPE_CHECKING:
+    from qiling import Qiling
 
 
-def save_reg_dump(func: Callable) -> Callable[..., None]:
-    """Decorator for saving registers dump.
+def save_regs(func: Callable) -> Callable[..., None]:
+    """Save registers before running a certain functionality so we can display
+    the registers diff.
     """
 
     def inner(self: 'QlQdb', *args, **kwargs) -> None:
-        self._saved_reg_dump = dict(filter(lambda d: isinstance(d[0], str), self.ql.arch.regs.save().items()))
+        self.render.prev_regs = self.render.get_regs()
 
         func(self, *args, **kwargs)
 
     return inner
 
-def check_ql_alive(func: Callable) -> Callable[..., None]:
-    """Decorator for checking whether ql instance is alive.
+def liveness_check(func: Callable) -> Callable[..., None]:
+    """Decorator for checking whether the program is alive.
     """
 
     def inner(self: 'QlQdb', *args, **kwargs) -> None:
         if self.ql is None:
-            qdb_print(QDB_MSG.ERROR, "The program is not being run.")
-        else:
-            func(self, *args, **kwargs)
+            qdb_print(QDB_MSG.ERROR, 'no active emulation')
+            return
+
+        if self.predictor.has_ended():
+            qdb_print(QDB_MSG.ERROR, 'the program has ended')
+            return
+
+        # proceed to functionality
+        func(self, *args, **kwargs)
 
     return inner
 
 
-class QlQdb(cmd.Cmd, QlDebugger):
+class QlQdb(Cmd, QlDebugger):
     """
     The built-in debugger of Qiling Framework
     """
@@ -56,49 +68,55 @@ def __init__(self, ql: Qiling, init_hook: List[str] = [], rr: bool = False, scri
         """
 
         self.ql = ql
-        self.prompt = f"{color.BOLD}{color.RED}Qdb> {color.END}"
-        self._saved_reg_dump = None
+        self.prompt = f"{color.RED}(qdb) {color.RESET}"
         self._script = script
-        self.bp_list = {}
-        self.marker = setup_address_marker()
+        self.last_addr: int = -1
+        self.bp_list: Dict[int, Breakpoint] = {}
+        self.marker = Marker()
 
         self.rr = SnapshotManager(ql) if rr else None
-        self.mm = setup_memory_Manager(ql)
+        self.helper = setup_command_helper(ql)
         self.predictor = setup_branch_predictor(ql)
         self.render = setup_context_render(ql, self.predictor)
 
         super().__init__()
 
         # filter out entry_point of loader if presented
-        self.dbg_hook(list(filter(lambda d: int(d, 0) != self.ql.loader.entry_point, init_hook)))
+        self.dbg_hook([addr for addr in init_hook if int(addr, 0) != self.ql.loader.entry_point])
+
+    def run_qdb_script(self, filename: str) -> None:
+        with open(filename, 'r', encoding='latin') as fd:
+            self.cmdqueue = fd.readlines()
 
     def dbg_hook(self, init_hook: List[str]):
         """
         initial hook to prepare everything we need
         """
 
-        # self.ql.loader.entry_point  # ld.so
-        # self.ql.loader.elf_entry    # .text of binary
-
-        def bp_handler(ql, address, size, bp_list):
+        def __bp_handler(ql: Qiling, address: int, size: int):
+            if (address in self.bp_list) and (address != self.last_addr):
+                bp = self.bp_list[address]
 
-            if (bp := self.bp_list.get(address, None)):
+                if bp.enabled:
+                    if bp.temp:
+                        # temp breakpoint: remove once hit
+                        self.del_breakpoint(bp)
 
-                if isinstance(bp, TempBreakpoint):
-                    # remove TempBreakpoint once hitted
-                    self.del_breakpoint(bp)
+                    else:
+                        qdb_print(QDB_MSG.INFO, f'hit breakpoint at {self.cur_addr:#x}')
 
-                else:
-                    if bp.hitted:
-                        return
+                    # flush unicorn translation block to avoid resuming execution from next
+                    # basic block
+                    self.ql.arch.uc.ctl_flush_tb()
 
-                    qdb_print(QDB_MSG.INFO, f"hit breakpoint at {self.cur_addr:#x}")
-                    bp.hitted = True
+                    ql.stop()
+                    self.do_context()
 
-                ql.stop()
-                self.do_context()
+            # this prevents a breakpoint from being hit more than once in a row. without
+            # it we would not be able to proceed after hitting a breakpoint
+            self.last_addr = address
 
-        self.ql.hook_code(bp_handler, self.bp_list)
+        self.ql.hook_code(__bp_handler)
 
         if self.ql.entry_point:
             self.cur_addr = self.ql.entry_point
@@ -107,64 +125,42 @@ def bp_handler(ql, address, size, bp_list):
 
         self.init_state = self.ql.save()
 
-        # stop emulator once interp. have been done emulating
-        if addr_elf_entry := getattr(self.ql.loader, 'elf_entry', None):
-            handler = self.ql.hook_address(lambda ql: ql.stop(), addr_elf_entry)
-        else:
-            handler = self.ql.hook_address(lambda ql: ql.stop(), self.ql.loader.entry_point)
-
-        # suppress logging temporary
-        _verbose = self.ql.verbose
-        self.ql.verbose = QL_VERBOSE.DISABLED
-
-        # init os for integrity of hooks and patches,
-        self.ql.os.run()
-
-        handler.remove()
-
-        # ignore the memory unmap error for now, due to the MIPS memory layout issue
-        try:
-            self.ql.mem.unmap_all()
-        except:
-            pass
-
-        self.ql.restore(self.init_state)
-
-        # resotre logging verbose
-        self.ql.verbose = _verbose
+        # the interpreter has to be emulated, but this is not interesting for most of the users.
+        # here we start emulating from interpreter's entry point while making sure the emulator
+        # stops once it reaches the program entry point
+        entry = getattr(self.ql.loader, 'elf_entry', self.ql.loader.entry_point) & ~0b1
+        self.set_breakpoint(entry, is_temp=True)
 
-        if self.ql.os.type is QL_OS.BLOB:
-            self.ql.loader.entry_point = self.ql.loader.load_address
+        # init os for integrity of hooks and patches while temporarily suppressing logging to
+        # let it fast-forward
+        with self.__set_temp(self.ql, 'verbose', QL_VERBOSE.DISABLED):
+            self.ql.os.run()
 
-        elif init_hook:
+        if init_hook:
             for each_hook in init_hook:
                 self.do_breakpoint(each_hook)
 
         if self._script:
-            run_qdb_script(self, self._script)
-        else:
-            self.do_context()
-            self.interactive()
+            self.run_qdb_script(self._script)
+
+        self.cmdloop()
 
     @property
     def cur_addr(self) -> int:
-        """
-        getter for current address of qiling instance
+        """Get emulation's current program counter.
         """
 
         return self.ql.arch.regs.arch_pc
 
     @cur_addr.setter
     def cur_addr(self, address: int) -> None:
-        """
-        setter for current address of qiling instance
+        """Set emulation's current program counter.
         """
 
         self.ql.arch.regs.arch_pc = address
 
     def _run(self, address: int = 0, end: int = 0, count: int = 0) -> None:
-        """
-        internal function for emulating instruction
+        """Internal method for advancing emulation under different circumstances.
         """
 
         if not address:
@@ -176,42 +172,27 @@ def _run(self, address: int = 0, end: int = 0, count: int = 0) -> None:
         self.ql.emu_start(begin=address, end=end, count=count)
 
     @contextmanager
-    def _save(self, reg=True, mem=True, hw=False, fd=False, cpu_context=False, os=False, loader=False):
+    def save(self):
         """
         helper function for fetching specific context by emulating instructions
         """
-        saved_states = self.ql.save(reg=reg, mem=mem)
+        saved_states = self.ql.save(reg=True, mem=False)
         yield self
         self.ql.restore(saved_states)
 
-    def parseline(self, line: str) -> Tuple[Optional[str], Optional[str], str]:
-        """
-        Parse the line into a command name and a string containing
-        the arguments.  Returns a tuple containing (command, args, line).
-        'command' and 'args' may be None if the line couldn't be parsed.
-        """
+    def default(self, line: str):
+        # if this is a comment line, ignore it
+        if line.startswith('#'):
+            return
 
-        line = line.strip()
-        if not line:
-            return None, None, line
-        elif line[0] == '?':
-            line = 'help ' + line[1:]
-        elif line.startswith('!'):
-            if hasattr(self, 'do_shell'):
-                line = 'shell ' + line[1:]
-            else:
-                return None, None, line
-        i, n = 0, len(line)
-        while i < n and line[i] in self.identchars: i = i+1
-        cmd, arg = line[:i], line[i:].strip()
-        return cmd, arg, line
+        super().default(line)
 
-    def interactive(self, *args) -> None:
-        """
-        initial an interactive interface
-        """
+    def emptyline(self) -> bool:
+        # when executing a script, ignore empty lines
+        if self._script:
+            return False
 
-        return self.cmdloop()
+        return super().emptyline()
 
     def run(self, *args) -> None:
         """
@@ -220,15 +201,7 @@ def run(self, *args) -> None:
 
         self._run()
 
-    def emptyline(self, *args) -> None:
-        """
-        repeat last command
-        """
-
-        if (lastcmd := getattr(self, "do_" + self.lastcmd, None)):
-            return lastcmd()
-
-    def do_run(self, *args) -> None:
+    def do_run(self, args: str) -> None:
         """
         launch qiling instance
         """
@@ -236,346 +209,446 @@ def do_run(self, *args) -> None:
         self._run()
 
     @SnapshotManager.snapshot
-    @save_reg_dump
-    @check_ql_alive
-    def do_step_in(self, step: str = '', *args) -> Optional[bool]:
-        """
-        execute one instruction at a time, will enter subroutine
+    @save_regs
+    @liveness_check
+    def do_step_in(self, args: str) -> None:
+        """Go to next instruction, stepping into function calls.
         """
-        prophecy = self.predictor.predict()
 
-        if prophecy.where is True:
-            qdb_print(QDB_MSG.INFO, 'program exited due to code end hitted')
-            self.do_context()
-            return False
+        steps, *_ = args.split() if args else ('',)
+        steps = try_read_int(steps)
+
+        if steps is None:
+            steps = 1
 
-        step = 1 if step == '' else int(step)
+        qdb_print(QDB_MSG.INFO, f'stepping {steps} steps from {self.cur_addr:#x}')
 
-        # make sure follow branching
-        if prophecy.going is True and self.ql.arch.type == QL_ARCH.MIPS:
-            step += 1
+        # make sure to include delay slot when branching in mips
+        if self.ql.arch.type is QL_ARCH.MIPS and self.predictor.is_branch():
+            prophecy = self.predictor.predict()
 
-        self._run(count=step)
+            if prophecy.going:
+                steps += 1
+
+        self._run(count=steps)
         self.do_context()
 
     @SnapshotManager.snapshot
-    @save_reg_dump
-    @check_ql_alive
-    def do_step_over(self, *args) -> Optional[bool]:
-        """
-        execute one instruction at a time, but WON't enter subroutine
+    @save_regs
+    @liveness_check
+    def do_step_over(self, args: str) -> None:
+        """Go to next instruction, stepping over function calls.
         """
 
-        prophecy = self.predictor.predict()
+        addr, size, _, _ = self.predictor.disasm_lite(self.cur_addr)
+        next_insn = addr + size
 
-        if prophecy.going:
-            self.set_breakpoint(prophecy.where, is_temp=True)
+        # make sure to include delay slot when branching in mips
+        if self.ql.arch.type is QL_ARCH.MIPS and self.predictor.is_branch():
+            next_insn += size
 
-        else:
-            cur_insn = self.predictor.disasm(self.cur_addr)
-            bp_addr = self.cur_addr + cur_insn.size
-
-            if self.ql.arch.type is QL_ARCH.MIPS:
-                bp_addr += cur_insn.size
-
-            self.set_breakpoint(bp_addr, is_temp=True)
+        self.set_breakpoint(next_insn, is_temp=True)
 
         self._run()
 
     @SnapshotManager.snapshot
-    @parse_int
-    def do_continue(self, address: Optional[int] = None) -> None:
-        """
-        continue execution from current address if not specified
+    @save_regs
+    @liveness_check
+    def do_continue(self, args: str) -> None:
+        """Continue execution from specified address, or from current one if
+        not specified.
         """
 
+        address, *_ = args.split() if args else ('',)
+        address = try_read_int(address)
+
         if address is None:
             address = self.cur_addr
 
-        qdb_print(QDB_MSG.INFO, f"continued from 0x{address:08x}")
+        qdb_print(QDB_MSG.INFO, f'continuing from {address:#010x}')
 
         self._run(address)
 
-    def do_backward(self, *args) -> None:
-        """
-        step barkward if it's possible, option rr should be enabled and previous instruction must be executed before
+    def do_backward(self, args: str) -> None:
+        """Step backwards to the previous location.
+
+        This operation requires the rr option to be enabled and at least one
+        instruction to have been executed.
         """
 
-        if self.rr:
-            if len(self.rr.layers) == 0 or not isinstance(self.rr.layers[-1], self.rr.DiffedState):
-                qdb_print(QDB_MSG.ERROR, "there is no way back !!!")
+        if self.rr is None:
+            qdb_print(QDB_MSG.ERROR, 'rr was not enabled')
+            return
 
-            else:
-                qdb_print(QDB_MSG.INFO, "step backward ~")
-                self.rr.restore()
-                self.do_context()
-        else:
-            qdb_print(QDB_MSG.ERROR, f"the option rr yet been set !!!")
+        if not self.rr.layers:
+            qdb_print(QDB_MSG.ERROR, 'there are no snapshots yet')
+            return
+
+        qdb_print(QDB_MSG.INFO, 'stepping backwards')
+
+        self.rr.restore()
+        self.do_context()
+
+        # we did not really emulate anything going backwards, so we manually
+        # update the last address
+        self.last_addr = self.cur_addr
 
     def set_breakpoint(self, address: int, is_temp: bool = False) -> None:
-        """
-        internal function for placing breakpoint
+        """[internal] Add or update an existing breakpoint.
         """
 
-        bp = TempBreakpoint(address) if is_temp else Breakpoint(address)
+        self.bp_list[address] = Breakpoint(address, is_temp)
 
-        self.bp_list.update({address: bp})
+    def del_breakpoint(self, bp: Union[int, Breakpoint]) -> None:
+        """[internal] Remove an existing breakpoint.
 
-    def del_breakpoint(self, bp: Union[Breakpoint, TempBreakpoint]) -> None:
-        """
-        internal function for removing breakpoint
+        The caller is responsible for making sure the breakpoint exists.
         """
 
-        self.bp_list.pop(bp.addr, None)
+        if isinstance(bp, int):
+            try:
+                bp = next(b for b in self.bp_list.values() if b.addr == bp)
+            except StopIteration:
+                qdb_print(QDB_MSG.ERROR, f'no breakpoint at address {bp:#x}')
+                return
+
+        del self.bp_list[bp.addr]
 
-    @parse_int
-    def do_breakpoint(self, address: Optional[int] = None) -> None:
-        """
-        set breakpoint on specific address
+    def do_breakpoint(self, args: str) -> None:
+        """Set a breakpoint on a specific address, or current one if not specified.
         """
 
+        address, *_ = args.split() if args else ('',)
+        address = try_read_int(address)
+
         if address is None:
             address = self.cur_addr
 
         self.set_breakpoint(address)
 
-        qdb_print(QDB_MSG.INFO, f"Breakpoint at 0x{address:08x}")
+        qdb_print(QDB_MSG.INFO, f"breakpoint set at {address:#010x}")
 
-    @parse_int
-    def do_disassemble(self, address: Optional[int] = None) -> None:
-        """
-        disassemble instructions from address specified
+    def do_disassemble(self, args: str) -> None:
+        """Disassemble a few instructions starting from specified address.
         """
 
-        try:
-            context_asm(self.ql, address)
-        except:
-            qdb_print(QDB_MSG.ERROR)
+        address, *_ = args.split() if args else ('',)
+        address = try_read_int(address)
 
-    def do_examine(self, line: str) -> None:
+        if address is None:
+            address = self.cur_addr
 
-        """
-        Examine memory: x/FMT ADDRESS.
-        format letter: o(octal), x(hex), d(decimal), u(unsigned decimal), t(binary), f(float), a(address), i(instruction), c(char), s(string) and z(hex, zero padded on the left)
-        size letter: b(byte), h(halfword), w(word), g(giant, 8 bytes)
-        e.g. x/4wx 0x41414141 , print 4 word size begin from address 0x41414141 in hex
-        """
+        self.do_examine(f'x/{self.render.disasm_num * 2}i {address}')
 
-        if type(err_msg := self.mm.parse(line)) is str:
-            qdb_print(QDB_MSG.ERROR, err_msg)
+    def do_examine(self, args: str) -> None:
+        """Examine memory.
 
+        Usage: x/nfu target (all arguments are optional)
+        Where:
+            n - number of units to read
+            f - format specifier
+            u - unit type
+        """
+
+        try:
+            self.helper.handle_examine(args)
+        except (KeyError, ValueError, SyntaxError) as ex:
+            qdb_print(QDB_MSG.ERROR, ex)
 
-    def do_set(self, line: str) -> None:
+    def do_set(self, args: str) -> None:
         """
         set register value of current context
         """
         # set $a = b
 
-        reg, val = line.split("=")
-        reg_name = reg.strip().strip("$")
-        reg_val = try_read_int(val.strip())
-
-        if reg_name in self.ql.arch.regs.save().keys():
-            if reg_val is not None:
-                setattr(self.ql.arch.regs, reg_name, reg_val)
-                self.do_context()
-                qdb_print(QDB_MSG.INFO, f"set register {reg_name} to 0x{(reg_val & 0xfffffff):08x}")
-
-            else:
-                qdb_print(QDB_MSG.ERROR, f"error parsing input: {reg_val} as integer value")
-
+        try:
+            reg, value = self.helper.handle_set(args)
+        except (KeyError, ValueError, SyntaxError) as ex:
+            qdb_print(QDB_MSG.ERROR, ex)
         else:
-            qdb_print(QDB_MSG.ERROR, f"invalid register: {reg_name}")
+            qdb_print(QDB_MSG.INFO, f"{reg} set to {value:#010x}")
 
-    def do_start(self, *args) -> None:
+    def do_start(self, args: str) -> None:
         """
         restore qiling instance context to initial state
         """
 
-        if self.ql.arch != QL_ARCH.CORTEX_M:
+        if self.ql.arch.type is not QL_ARCH.CORTEX_M:
             self.ql.restore(self.init_state)
             self.do_context()
 
-    def do_context(self, *args) -> None:
+    def do_context(self, *args: str) -> None:
         """
         display context information for current location
         """
 
-        self.render.context_reg(self._saved_reg_dump)
+        self.render.context_reg()
         self.render.context_stack()
         self.render.context_asm()
 
-    def do_jump(self, loc: str, *args) -> None:
+    def do_jump(self, args: str) -> None:
         """
         seek to where ever valid location you want
         """
 
-        sym = self.marker.get_symbol(loc)
-        addr = sym if sym is not None else try_read_int(loc)
+        loc, *_ = args.split() if args else ('',)
+        addr = self.marker.get_address(loc)
+
+        if addr is None:
+            addr = try_read_int(loc)
+
+            if addr is None:
+                qdb_print(QDB_MSG.ERROR, 'seek target should be a symbol or an address')
+                return
 
         # check validation of the address to be seeked
-        if self.ql.mem.is_mapped(addr, 4):
-            if sym:
-                qdb_print(QDB_MSG.INFO, f"seek to {loc} @ 0x{addr:08x} ...")
-            else:
-                qdb_print(QDB_MSG.INFO, f"seek to 0x{addr:08x} ...")
+        if not self.ql.mem.is_mapped(addr, 4):
+            qdb_print(QDB_MSG.ERROR, f'seek target is unreachable: {addr:#010x}')
+            return
 
-            self.cur_addr = addr
-            self.do_context()
+        qdb_print(QDB_MSG.INFO, f'seeking to {addr:#010x} ...')
 
-        else:
-            qdb_print(QDB_MSG.ERROR, f"the address to be seeked isn't mapped")
+        self.cur_addr = addr
+        self.do_context()
 
-    def do_mark(self, args=""):
+    def do_mark(self, args: str):
         """
         mark a user specified address as a symbol
         """
 
-        args = args.split()
-        if len(args) == 0:
+        elems = args.split() if args else []
+
+        if not elems:
             loc = self.cur_addr
-            sym_name = self.marker.mark_only_loc(loc)
+            sym = self.marker.mark(loc)
 
-        elif len(args) == 1:
-            if (loc := try_read_int(args[0])):
-                sym_name = self.marker.mark_only_loc(loc)
+        elif len(elems) == 1:
+            loc = try_read_int(elems[0])
 
-            else:
+            if loc is None:
                 loc = self.cur_addr
-                sym_name = args[0]
-                if (err := self.marker.mark(sym_name, loc)):
-                    qdb_print(QDB_MSG.ERROR, err)
+                sym = elems[0]
+
+                if not self.marker.mark(loc, sym):
+                    qdb_print(QDB_MSG.ERROR, f"duplicated symbol name: {sym} at address: {loc:#010x}")
                     return
 
-        elif len(args) == 2:
-            sym_name, addr = args
-            if (loc := try_read_int(addr)):
-                self.marker.mark(sym_name, loc)
             else:
+                sym = self.marker.mark(loc)
+
+        elif len(elems) == 2:
+            sym, addr = elems
+            loc = try_read_int(addr)
+
+            if loc is None:
                 qdb_print(QDB_MSG.ERROR, f"unable to mark symbol at address: '{addr}'")
                 return
+
+            else:
+                self.marker.mark(loc, sym)
+
         else:
             qdb_print(QDB_MSG.ERROR, "symbol should not be empty ...")
             return
 
-        qdb_print(QDB_MSG.INFO, f"mark symbol '{sym_name}' at address: 0x{loc:08x} ...")
+        qdb_print(QDB_MSG.INFO, f"mark symbol '{sym}' at address: 0x{loc:08x} ...")
 
-    @parse_int
-    def do_show_args(self, argc: int = -1):
-        """
-        show arguments of a function call
-        default argc is 2 since we don't know the function definition
+    @staticmethod
+    @contextmanager
+    def __set_temp(obj: object, member: str, value: Any):
+        """A utility context manager that temporarily sets a new value to an
+        object member, only to run a certain functionality. Then the change
+        is reverted.
         """
 
+        has_member = hasattr(obj, member)
+
+        if has_member:
+            orig = getattr(obj, member)
+            setattr(obj, member, value)
+
+        try:
+            yield
+        finally:
+            if has_member:
+                setattr(obj, member, orig)
+
+    def __info_args(self, args: str):
+        argc, *_ = args.split() if args else ('',)
+        argc = try_read_int(argc)
+
         if argc is None:
-            argc = -1
+            argc = 2
 
-        elif argc > 16:
-            qdb_print(QDB_MSG.ERROR, 'Maximum argc is 16.')
+        if argc > 16:
+            qdb_print(QDB_MSG.ERROR, 'can show up to 16 arguments')
             return
 
-        prophecy = self.predictor.predict()
-        if not prophecy.going:
-            qdb_print(QDB_MSG.ERROR, 'Not on a braching instruction currently.')
+        if not self.predictor.is_fcall():
+            qdb_print(QDB_MSG.ERROR, 'available only on a function call instruction')
             return
 
-        if argc == -1:
-            reg_n, stk_n = 2, 0
-        else:
-            if argc > 4:
-                reg_n, stk_n = 4, argc - 4
-            elif argc <= 4:
-                reg_n, stk_n = argc, 0
-
-        ptr_size = self.ql.arch.pointersize
+        # the cc methods were designed to access fcall arguments from within the function,
+        # and therefore assume a return address is on the stack (in relevant archs), so they
+        # skip it. when we are just about to call a function the return address is not yet
+        # there and the arguments, if read off the stack, get messed up.
+        #
+        # here we work around this by temporarily cheating cc to think there is no return
+        # address on the stack, so it does not skip it.
 
-        reg_args = []
-        arch_type = self.ql.arch.type
-        if arch_type in (QL_ARCH.MIPS, QL_ARCH.ARM, QL_ARCH.CORTEX_M, QL_ARCH.X8664):
+        with QlQdb.__set_temp(self.ql.os.fcall.cc, '_retaddr_on_stack', False):
+            fargs = [self.ql.os.fcall.cc.getRawParam(i) for i in range(argc)]
 
-            reg_idx = None
-            if arch_type == QL_ARCH.MIPS:
-                slot_addr = self.cur_addr + ptr_size
+        # mips requires a special handling since the instruction in delay slot might
+        # affect one of the reg arguments values
+        if self.ql.arch.type is QL_ARCH.MIPS:
+            slot_addr = self.cur_addr + self.ql.arch.pointersize
+            _, _, _, op_str = self.predictor.disasm_lite(slot_addr)
+            operands = op_str.split(',')
 
-                op_str = self.predictor.disasm(slot_addr).op_str
-                # register may be changed due to dealy slot
-                if '$a' in op_str.split(',')[0]:
-                    dst_reg = op_str.split(',')[0].strip('$')
-                    reg_idx = int(dst_reg.strip('a'))
+            reg_args = ('$a0', '$a1', '$a2', '$a3')
 
-                    # fetch real value by emulating instruction in delay slot
-                    with self._save() as qdb:
-                        qdb._run(slot_addr, 0, count=1)
-                        real_val = self.ql.arch.regs.read(dst_reg)
+            # find out whether one of the argument registers gets modified in the delay slot
+            if any(a in operands[0] for a in reg_args):
+                last = self.last_addr
 
-                reg_names = [f'a{d}'for d in range(reg_n)]
-                if reg_idx != None:
-                    reg_names.pop(reg_idx)
+                dst_reg = operands[0].strip('$')
+                reg_idx = int(dst_reg.strip('a'))
 
-            elif arch_type in (QL_ARCH.ARM, QL_ARCH.CORTEX_M):
-                reg_names = [f'r{d}'for d in range(reg_n)]
+                # fetch real value by emulating instruction in delay slot
+                with self.save() as qdb:
+                    qdb._run(slot_addr, count=1)
+                    real_val = self.ql.arch.regs.read(dst_reg)
 
-            elif arch_type == QL_ARCH.X8664:
-                reg_names = ('rdi', 'rsi', 'rdx', 'rcx', 'r8', 'r9')[:reg_n]
+                # update argument value with the calculated one
+                fargs[reg_idx] = real_val
 
-            reg_args = [self.ql.arch.regs.read(reg_name) for reg_name in reg_names]
-            if reg_idx != None:
-                reg_args.insert(reg_idx, real_val)
+                # we don't want that to count as emulation, so restore last address
+                self.last_addr = last
 
-            reg_args = list(map(hex, reg_args))
+        nibbles = self.ql.arch.pointersize * 2
 
-        elif arch_type == QL_ARCH.X86:
-            stk_n = 2 if argc == -1 else argc
+        for i, a in enumerate(fargs):
+            deref = self.render.get_deref(a)
 
-        # read arguments on stack
-        if stk_n >= 0:
-            shadow_n = 0
-            base_offset = self.ql.arch.regs.arch_sp
+            if isinstance(deref, int):
+                deref_str = f'{deref:#0{nibbles + 2}x}'
 
-            if arch_type in (QL_ARCH.X86, QL_ARCH.X8664):
-                # shadow 1 pointer size for return address
-                shadow_n = 1
+            elif isinstance(deref, str):
+                deref_str = f'"{deref}"'
 
-            elif arch_type == QL_ARCH.MIPS:
-                # shadow 4 pointer size for mips
-                shadow_n = 4
+            else:
+                deref_str = ''
 
-            base_offset = self.ql.arch.regs.arch_sp + shadow_n * ptr_size
-            stk_args = [self.ql.mem.read(base_offset+offset*ptr_size, ptr_size) for offset in range(stk_n)]
-            endian = 'little' if self.ql.arch.endian == QL_ENDIAN.EL else 'big'
-            stk_args = list(map(hex, map(lambda x: int.from_bytes(x, endian), stk_args)))
+            qdb_print(QDB_MSG.INFO, f'arg{i}: {a:#0{nibbles + 2}x}{f" {RARROW} {deref_str}" if deref_str else ""}')
 
-        args = reg_args + stk_args
-        qdb_print(QDB_MSG.INFO, f'args: {args}')
+    def __info_breakpoints(self, args: str):
+        if self.bp_list:
+            qdb_print(QDB_MSG.INFO, f'{"id":2s} {"address":10s} {"enabled"}')
 
-    def do_show(self, keyword: Optional[str] = None, *args) -> None:
-        """
-        show some runtime information
-        """
+            for addr, bp in self.bp_list.items():
+                if not bp.temp:
+                    qdb_print(QDB_MSG.INFO, f"{bp.index:2d} {addr:#010x} {bp.enabled}")
 
-        qdb_print(QDB_MSG.INFO, f"Entry point: {self.ql.loader.entry_point:#x}")
-
-        if addr_elf_entry := getattr(self.ql.loader, 'elf_entry', None):
-            qdb_print(QDB_MSG.INFO, f"ELF entry: {addr_elf_entry:#x}")
+        else:
+            qdb_print(QDB_MSG.INFO, 'No breakpoints')
 
+    def __info_mem(self, kw: str):
         info_lines = iter(self.ql.mem.get_formatted_mapinfo())
 
         # print filed name first
         qdb_print(QDB_MSG.INFO, next(info_lines))
 
         # keyword filtering
-        if keyword:
-            lines = filter(lambda line: keyword in line, info_lines)
-        else:
-            lines = info_lines
+        lines = (line for line in info_lines if kw in line) if kw else info_lines
 
         for line in lines:
             qdb_print(QDB_MSG.INFO, line)
 
-        qdb_print(QDB_MSG.INFO, f"Breakpoints: {[hex(addr) for addr in self.bp_list.keys()]}")
-        qdb_print(QDB_MSG.INFO, f"Marked symbol: {[{key:hex(val)} for key,val in self.marker.mark_list]}")
+    def __info_marks(self, args: str):
+        """Show marked symbols.
+        """
+
+        if self.marker.mark_list:
+            qdb_print(QDB_MSG.INFO, f'{"symbol":10s} {"address":10s}')
+
+            for key, addr in self.marker.mark_list:
+                qdb_print(QDB_MSG.INFO, f'{key:10s} {addr:#010x}')
+
+        else:
+            qdb_print(QDB_MSG.INFO, 'No marked symbols')
+
+    def __info_snapshot(self, args: str):
         if self.rr:
-            qdb_print(QDB_MSG.INFO, f"Snapshots: {len([st for st in self.rr.layers if isinstance(st, self.rr.DiffedState)])}")
+            if self.rr.layers:
+                recent = self.rr.layers[-1]
+
+                # regs diff
+                if recent.reg:
+                    for reg, val in recent.reg.items():
+                        qdb_print(QDB_MSG.INFO, f'{reg:6s}: {val:08x}')
+
+                else:
+                    qdb_print(QDB_MSG.INFO, 'Regs identical')
+
+                qdb_print(QDB_MSG.INFO, '')
+
+                # system regs diff
+                if recent.xreg:
+                    for reg, val in recent.xreg.items():
+                        qdb_print(QDB_MSG.INFO, f'{reg:8s}: {val:08x}')
+
+                else:
+                    qdb_print(QDB_MSG.INFO, 'System regs identical')
+
+                qdb_print(QDB_MSG.INFO, '')
+
+                # ram diff
+                if recent.ram:
+                    for rng, (opcode, diff) in sorted(recent.ram.items()):
+                        lbound, ubound = rng
+                        perms, label, data = diff
+
+                        qdb_print(QDB_MSG.INFO, f'{opcode.name} {lbound:010x} - {ubound:010x} {perms:03b} {label:24s} ~{len(data)}')
+
+                else:
+                    qdb_print(QDB_MSG.INFO, 'Memory identical')
+
+            else:
+                qdb_print(QDB_MSG.INFO, 'No snapshots')
+
+        else:
+            qdb_print(QDB_MSG.INFO, 'Snapshots were not enabled for this session')
+
+    def __info_entry(self, args: str):
+        qdb_print(QDB_MSG.INFO, f'{"Entry point":16s}: {self.ql.loader.entry_point:#010x}')
+
+        if hasattr(self.ql.loader, 'elf_entry'):
+            qdb_print(QDB_MSG.INFO, f'{"ELF entry point":16s}: {self.ql.loader.elf_entry:#010x}')
+
+    def do_info(self, args: str) -> None:
+        """Provide run-time information.
+        """
+
+        subcmd, *a = args.split(maxsplit=1) if args else ('',)
+
+        if not a:
+            a = ['']
+
+        handlers = {
+            'args':        self.__info_args,
+            'breakpoints': self.__info_breakpoints,
+            'mem':         self.__info_mem,
+            'marks':       self.__info_marks,
+            'snapshot':    self.__info_snapshot,
+            'entry':       self.__info_entry
+        }
+
+        if subcmd in handlers:
+            handlers[subcmd](*a)
+
+        else:
+            qdb_print(QDB_MSG.ERROR, f'info subcommands: {list(handlers.keys())}')
 
     def do_script(self, filename: str) -> None:
         """
@@ -584,42 +657,51 @@ def do_script(self, filename: str) -> None:
         """
 
         if filename:
-            run_qdb_script(self, filename)
+            self._script = filename
+
+            self.run_qdb_script(filename)
         else:
             qdb_print(QDB_MSG.ERROR, "parameter filename must be specified")
 
-    def do_shell(self, *command) -> None:
+    def do_shell(self, args: str) -> None:
         """
         run python code
         """
 
+        # allowing arbitrary shell commands is a huge security problem. until it gets
+        # removed, block shell command in scripts for security reasons
+        if self._script:
+            qdb_print(QDB_MSG.ERROR, 'shell command is not allowed in scripts')
+            return
+
         try:
-            print(eval(*command))
+            print(eval(args))
         except:
             qdb_print(QDB_MSG.ERROR, "something went wrong ...")
 
-    def do_quit(self, *args) -> bool:
+    def do_quit(self, *args: str) -> None:
         """
         exit Qdb and stop running qiling instance
         """
 
         self.ql.stop()
-        if self._script:
-            return True
-        exit()
 
-    def do_EOF(self, *args) -> None:
+        sys.exit(0)
+
+    def do_EOF(self, *args: str) -> None:
         """
         handle Ctrl+D
         """
 
-        if input(f"{color.RED}[!] Are you sure about saying good bye ~ ? [Y/n]{color.END} ").strip() == "Y":
+        prompt = f'{color.RED}[!] are you sure you want to quit? [Y/n]{color.END} '
+        answer = input(prompt).strip()
+
+        if not answer or answer.lower() == 'y':
             self.do_quit()
 
     do_r = do_run
     do_s = do_step_in
     do_n = do_step_over
-    do_a = do_show_args
     do_j = do_jump
     do_m = do_mark
     do_q = do_quit
@@ -628,7 +710,3 @@ def do_EOF(self, *args) -> None:
     do_c = do_continue
     do_b = do_breakpoint
     do_dis = do_disassemble
-
-
-if __name__ == "__main__":
-    pass
diff --git a/qiling/debugger/qdb/render/__init__.py b/qiling/debugger/qdb/render/__init__.py
index 1625a52ae..0b7e61807 100644
--- a/qiling/debugger/qdb/render/__init__.py
+++ b/qiling/debugger/qdb/render/__init__.py
@@ -4,7 +4,6 @@
 #
 
 from .render import ContextRender
-from .render_x86 import ContextRenderX86
+from .render_intel import ContextRenderX86, ContextRenderX64
 from .render_mips import ContextRenderMIPS
 from .render_arm import ContextRenderARM, ContextRenderCORTEX_M
-from .render_x8664 import ContextRenderX8664
diff --git a/qiling/debugger/qdb/render/render.py b/qiling/debugger/qdb/render/render.py
index aa7a6022d..b1d62b85d 100644
--- a/qiling/debugger/qdb/render/render.py
+++ b/qiling/debugger/qdb/render/render.py
@@ -3,168 +3,184 @@
 # Cross Platform and Multi Architecture Advanced Binary Emulation Framework
 #
 
+"""Context Render for rendering UI
+"""
+
+
+from __future__ import annotations
 
+import os
 
-from capstone import CsInsn
-from typing import Mapping
-import os, copy
+from typing import TYPE_CHECKING, Callable, Collection, Dict, Iterator, List, Mapping, Optional, Sequence, Tuple, Union
 
 from ..context import Context
 from ..const import color
 
 
+if TYPE_CHECKING:
+    from qiling.core import Qiling
+    from ..branch_predictor.branch_predictor import BranchPredictor, Prophecy
+    from ..misc import InsnLike
 
-"""
 
-    Context Render for rendering UI
+COLORS = (
+    color.DARKCYAN,
+    color.BLUE,
+    color.RED,
+    color.YELLOW,
+    color.GREEN,
+    color.PURPLE,
+    color.CYAN,
+    color.WHITE
+)
 
-"""
+RARROW = '\u2192'
+RULER = '\u2500'
+
+CURSOR   = '\u25ba'  # current instruction cursor
+GOING_DN = '\u2ba6'  # branching downward to a higher address
+GOING_UP = '\u2ba4'  # branching upward to a lower address
 
-COLORS = (color.DARKCYAN, color.BLUE, color.RED, color.YELLOW, color.GREEN, color.PURPLE, color.CYAN, color.WHITE)
 
 class Render:
+    """Base class for graphical rendering functionality.
+
+    Render objects are agnostic to current emulation state.
     """
-    base class for rendering related functions
-    """
 
-    def divider_printer(field_name, ruler="─"):
+    def __init__(self):
+        # make sure mixin classes are properly initialized
+        super().__init__()
+
+        self.regs_a_row = 4  # number of regs to display per row
+        self.stack_num = 8   # number of stack entries to display in context
+        self.disasm_num = 4  # number of instructions to display in context before and after current pc
+
+    @staticmethod
+    def divider_printer(header: str, footer: bool = False):
         """
         decorator function for printing divider and field name
         """
 
-        def decorator(context_dumper):
+        def decorator(wrapped: Callable):
             def wrapper(*args, **kwargs):
                 try:
                     width, _ = os.get_terminal_size()
                 except OSError:
                     width = 130
 
-                bar = (width - len(field_name)) // 2 - 1
-                print(ruler * bar, field_name, ruler * bar)
-                context_dumper(*args, **kwargs)
-                if "DISASM" in field_name:
-                    print(ruler * width)
+                print(header.center(width, RULER))
+                wrapped(*args, **kwargs)
+
+                if footer:
+                    print(RULER * width)
 
             return wrapper
         return decorator
 
-    def __init__(self):
-        self.regs_a_row = 4
-        self.stack_num = 10
-        self.disasm_num = 0x10
-        self.color = color
-
-    def reg_diff(self, cur_regs, saved_reg_dump):
+    def reg_diff(self, curr: Mapping[str, int], prev: Mapping[str, int]) -> List[str]:
         """
         helper function for highlighting register changed during execution
         """
 
-        if saved_reg_dump:
-            reg_dump = copy.deepcopy(saved_reg_dump)
-            if getattr(self, "regs_need_swapped", None):
-                reg_dump = self.swap_reg_name(reg_dump)
+        return [k for k in curr if curr[k] != prev[k]] if prev else []
 
-            return [k for k in cur_regs if cur_regs[k] != reg_dump[k]]
-
-    def render_regs_dump(self, regs, diff_reg=None):
-        """
-        helper function for redering registers dump
+    def render_regs_dump(self, regs: Mapping[str, int], diff_reg: Collection[str]) -> None:
+        """Helper function for rendering registers dump.
         """
 
-        lines = ""
-        for idx, r in enumerate(regs, 1):
-            line = "{}{}: 0x{{:08x}}  {}\t".format(COLORS[(idx-1) // self.regs_a_row], r, color.END)
+        # find the length of the longest reg name to have all regs aligned in columns
+        longest = max(len(name) for name in regs)
 
-            if diff_reg and r in diff_reg:
-                line = f"{color.UNDERLINE}{color.BOLD}{line}"
+        def __render_regs_line() -> Iterator[str]:
+            elements = []
 
-            if idx % self.regs_a_row == 0 and idx != 32:
-                line += "\n"
+            for idx, (name, value) in enumerate(regs.items()):
+                line_color = f'{COLORS[idx // self.regs_a_row]}'
 
-            lines += line
+                if name in diff_reg:
+                    line_color = f'{color.UNDERLINE}{color.BOLD}{line_color}'
 
-        print(lines.format(*regs.values()))
+                elements.append(f'{line_color}{name:{longest}s}: {value:#010x}{color.END}')
 
-    def render_stack_dump(self, arch_sp: int) -> None:
-        """
-        helper function for redering stack dump
-        """
-
-        # Loops over stack range (last 10 addresses)
-        for idx in range(self.stack_num):
-            addr = arch_sp + idx * self.pointersize
+                if (idx + 1) % self.regs_a_row == 0:
+                    yield '\t'.join(elements)
 
-            '''
-            @NOTE: Implemented new class arch_x8664 in order to bugfix issue with only dereferencing 32-bit pointers
-            on 64-bit emulation passes.
-            '''
-            if (val := self.try_read_pointer(addr)[0]): # defined to be try_read_pointer(addr)[0] - dereferneces pointer
+                    elements.clear()
 
-                # @TODO: Bug here where the values on the stack are being displayed in 32-bit format
-                print(f"SP + 0x{idx*self.pointersize:02x}│ [0x{addr:08x}] —▸ 0x{self.unpack(val):08x}", end="")
+        for line in __render_regs_line():
+            print(line)
 
-            # try to dereference wether it's a pointer
-            if (buf := self.try_read_pointer(addr))[0] is not None:
+    def render_flags(self, flags: Mapping[str, int], before: str = ''):
+        def __set(f: str) -> str:
+            return f'{color.BLUE}{f.upper()}{color.END}'
 
-                if (addr := self.unpack(buf[0])):
+        def __cleared(f: str) -> str:
+            return f'{color.GREEN}{f.lower()}{color.END}'
 
-                    # try to dereference again
-                    if (buf := self.try_read_pointer(addr))[0] is not None:
-                        s = self.try_read_string(addr)
+        s_before = f"[{before}] " if before else ""
+        s_flags = " ".join(__set(f) if val else __cleared(f) for f, val in flags.items())
 
-                        if s and s.isprintable():
-                            print(f" ◂— {self.read_string(addr)}", end="")
-                        else:
-                            print(f" ◂— 0x{self.unpack(buf[0]):08x}", end="")
-            print()
+        print(f'{s_before}[flags: {s_flags}]')
 
-    def render_assembly(self, lines) -> None:
-        """
-        helper function for rendering assembly
+    def render_stack_dump(self, sp: int, dump: Sequence[Tuple[int, Optional[int], Union[int, str, None]]]) -> None:
+        """Helper function for rendering stack dump.
         """
 
-        # assembly before current location
-        if (backward := lines.get("backward", None)):
-            for line in backward:
-                self.print_asm(line)
+        # number of hexadecimal nibbles to display per value
+        nibbles = self.pointersize * 2
 
-        # assembly for current location
-        if (cur_insn := lines.get("current", None)):
-            prophecy = self.predictor.predict()
-            self.print_asm(cur_insn, to_jump=prophecy.going)
+        for address, value, deref in dump:
+            offset = address - sp
 
-        # assembly after current location
-        if (forward := lines.get("forward", None)):
-            for line in forward:
-                self.print_asm(line)
+            value_str = '(unreachable)' if value is None else f'{value:#0{nibbles + 2}x}'
 
-    def swap_reg_name(self, cur_regs: Mapping[str, int], extra_dict=None) -> Mapping[str, int]:
-        """
-        swap register name with more readable register name
-        """
+            if isinstance(deref, int):
+                deref_str = f'{deref:#0{nibbles + 2}x}'
 
-        target_items = extra_dict.items() if extra_dict else self.regs_need_swapped.items()
+            elif isinstance(deref, str):
+                deref_str = f'"{deref}"'
 
-        for old_reg, new_reg in target_items:
-            cur_regs.update({old_reg: cur_regs.pop(new_reg)})
+            else:
+                deref_str = ''
 
-        return cur_regs 
+            print(f'SP + {offset:#04x} │ {address:#010x} : {value_str}{f" {RARROW} {deref_str}" if deref_str else ""}')
 
-    def print_asm(self, insn: CsInsn, to_jump: bool = False) -> None:
-        """
-        helper function for printing assembly instructions, indicates where we are and the branch prediction
-        provided by BranchPredictor
+    def render_assembly(self, listing: Sequence[InsnLike], pc: int, prediction: Prophecy) -> None:
+        """Helper function for rendering assembly.
         """
 
-        opcode = "".join(f"{b:02x}" for b in insn.bytes)
+        def __render_asm_line(insn: InsnLike) -> str:
+            """Helper function for rendering assembly instructions, indicates where we are and
+            the branch prediction provided by branch predictor
+            """
+
+            trace_line = f"{insn.address:#010x} │ {insn.bytes.hex():18s} {insn.mnemonic:12} {insn.op_str:35s}"
+
+            cursor = ''  # current instruction cursor
+            brmark = ''  # branching mark
+
+            if insn.address == pc:
+                cursor = CURSOR
+
+                if prediction.going:
+                    # branch target might be None in case it should have been
+                    # read from memory but that memory could not be reached
+                    bmark = '?' if prediction.where is None else (GOING_DN if prediction.where > pc else GOING_UP)
+
+                    # apply some colors
+                    brmark = f'{color.RED}{bmark}{color.RESET}'
 
-        trace_line = f"0x{insn.address:08x} │ {opcode:15s} {insn.mnemonic:10} {insn.op_str:35s}"
+                # <DEBUG>
+                where = '?' if prediction.where is None else f'{prediction.where:#010x}'
 
-        cursor = "►" if self.cur_addr == insn.address else " "
+                print(f'prediction: {f"taken, {where}" if prediction.going else "not taken"}')
+                # </DEBUG>
 
-        jump_sign = f"{color.RED}✓{color.END}" if to_jump else " "
+            return f"{brmark:1s}  {cursor:1s}   {color.DARKGRAY}{trace_line}{color.RESET}"
 
-        print(f"{jump_sign}  {cursor}   {color.DARKGRAY}{trace_line}{color.END}")
+        for insn in listing:
+            print(__render_asm_line(insn))
 
 
 class ContextRender(Context, Render):
@@ -172,17 +188,17 @@ class ContextRender(Context, Render):
     base class for context render
     """
 
-    def __init__(self, ql, predictor):
+    def __init__(self, ql: Qiling, predictor: BranchPredictor):
         super().__init__(ql)
-        Render.__init__(self)
+
         self.predictor = predictor
+        self.prev_regs: Dict[str, int] = {}
 
-    def dump_regs(self) -> Mapping[str, int]:
-        """
-        dump all registers
+    def get_regs(self) -> Dict[str, int]:
+        """Read current registers state.
         """
 
-        return {reg_name: self.ql.arch.regs.read(reg_name) for reg_name in self.regs}
+        return {reg_name: self.read_reg(reg_name) for reg_name in self.regs}
 
     @Render.divider_printer("[ STACK ]")
     def context_stack(self) -> None:
@@ -190,50 +206,55 @@ def context_stack(self) -> None:
         display context stack dump
         """
 
-        self.render_stack_dump(self.ql.arch.regs.arch_sp)
-        
+        sp = self.cur_sp
+        stack_dump = []
+
+        for i in range(self.stack_num):
+            address = sp + i * self.asize
+
+            # attempt to read current stack entry
+            value = self.try_read_pointer(address)
+
+            # treat stack entry as a pointer and attempt to dereference it
+            deref = None if value is None else self.get_deref(value)
+
+            stack_dump.append((address, value, deref))
+
+        self.render_stack_dump(sp, stack_dump)
+
     @Render.divider_printer("[ REGISTERS ]")
-    def context_reg(self, saved_states: Mapping["str", int]) -> None:
-        """
-        display context registers dump
+    def context_reg(self) -> None:
+        """Render registers context.
         """
 
-        return NotImplementedError
+        curr = self.get_regs()
+        prev = self.prev_regs
+
+        curr = self.swap_regs(curr)
+        prev = self.swap_regs(prev)
+
+        diff_reg = self.reg_diff(curr, prev)
+        self.render_regs_dump(curr, diff_reg)
+        self.print_mode_info()
 
-    @Render.divider_printer("[ DISASM ]")
+    @Render.divider_printer("[ DISASM ]", footer=True)
     def context_asm(self) -> None:
+        """Disassemble surrounding instructions.
         """
-        read context assembly and render with render_assembly
-        """
 
-        lines = {}
-        past_list = []
-        from_addr = self.cur_addr - self.disasm_num
-        to_addr = self.cur_addr + self.disasm_num
-
-        cur_addr = from_addr
-        while cur_addr <= to_addr:
-            insn = self.disasm(cur_addr)
-            cur_addr += insn.size
-            past_list.append(insn)
-
-        bk_list = []
-        fd_list = []
-        cur_insn = None
-        for each in past_list:
-            if each.address < self.cur_addr:
-                bk_list.append(each)
-
-            elif each.address > self.cur_addr:
-                fd_list.append(each)
-
-            elif each.address == self.cur_addr:
-                cur_insn = each 
-
-        lines.update({
-            "backward": bk_list,
-            "forward": fd_list,
-            "current": cur_insn,
-            })
-
-        self.render_assembly(lines)
+        address = self.cur_addr
+        prediction = self.predictor.predict()
+
+        # assuming a single instruction is the same size as a native pointer.
+        # this is not true for all architectures.
+        ptr = address - self.pointersize * self.disasm_num
+        listing = []
+
+        # taking disasm_num instructions before, current, and disasm_num instructions after
+        for _ in range(self.disasm_num * 2 + 1):
+            insn = self.disasm(ptr)
+            listing.append(insn)
+
+            ptr += insn.size
+
+        self.render_assembly(listing, address, prediction)
diff --git a/qiling/debugger/qdb/render/render_arm.py b/qiling/debugger/qdb/render/render_arm.py
index 7209be2c6..f08e39fa3 100644
--- a/qiling/debugger/qdb/render/render_arm.py
+++ b/qiling/debugger/qdb/render/render_arm.py
@@ -3,73 +3,66 @@
 # Cross Platform and Multi Architecture Advanced Binary Emulation Framework
 #
 
+from typing import Iterator
 
-
-from .render import *
+from .render import Render, ContextRender
 from ..arch import ArchARM, ArchCORTEX_M
+from ..misc import InsnLike
+
 
 class ContextRenderARM(ContextRender, ArchARM):
-    """
-    context render for ARM
+    """Context renderer for ARM architecture.
     """
 
-    def __init__(self, ql, predictor):
-        super().__init__(ql, predictor)
-        ArchARM.__init__(self)
-        self.disasm_num = 8
+    def print_mode_info(self) -> None:
+        cpsr = self.read_reg(self._flags_reg)
 
-    @staticmethod
-    def print_mode_info(bits):
-        flags = ArchARM.get_flags(bits)
+        flags = ArchARM.get_flags(cpsr)
+        mode = ArchARM.get_mode(cpsr)
 
-        print(f"[{flags.pop('mode')} mode] ", end="")
-        for key, val in flags.items():
-            if val:
-                print(f"{color.BLUE}{key.upper()} ", end="")
-            else:
-                print(f"{color.GREEN}{key.lower()} ", end="")
+        self.render_flags(flags, f'{mode} mode')
 
-        print(color.END)
+    def __disasm_all(self, rng: range) -> Iterator[InsnLike]:
+        addr = rng.start
 
-    @Render.divider_printer("[ REGISTERS ]")
-    def context_reg(self, saved_reg_dump):
-        """
-        redering context registers
+        while addr in rng:
+            insn = self.disasm(addr)
+            yield insn
+
+            addr += insn.size
+
+    @Render.divider_printer("[ DISASM ]", footer=True)
+    def context_asm(self) -> None:
+        """Disassemble surrounding instructions.
         """
 
-        cur_regs = self.dump_regs()
-        cur_regs = self.swap_reg_name(cur_regs)
-        diff_reg = self.reg_diff(cur_regs, saved_reg_dump)
-        self.render_regs_dump(cur_regs, diff_reg=diff_reg)
-        self.print_mode_info(self.ql.arch.regs.cpsr)
+        address = self.cur_addr
+        prediction = self.predictor.predict()
+
+        # arm thumb may mix narrow and wide instructions so we can never know for
+        # sure where we need to start reading instructions from. to work around
+        # that we assume all instructions are wide, and then take the most recent
+        # ones into consideration.
 
+        listing = []
+
+        begin = address - self.asize * self.disasm_num
+        end = address
+
+        # disassemble all instructions in range, but keep only the last ones
+        listing.extend(self.__disasm_all(range(begin, end)))
+        listing = listing[-self.disasm_num:]
+
+        begin = address
+        end = address + self.asize * (self.disasm_num + 1)
+
+        # disassemble all instructions in range, but keep only the first ones
+        listing.extend(self.__disasm_all(range(begin, end)))
+        listing = listing[:self.disasm_num * 2 + 1]
+
+        self.render_assembly(listing, address, prediction)
 
 
 class ContextRenderCORTEX_M(ContextRenderARM, ArchCORTEX_M):
+    """Context renderer for ARM Cortex-M architecture.
     """
-    context render for cortex_m
-    """
-
-    def __init__(self, ql, predictor):
-        super().__init__(ql, predictor)
-        ArchCORTEX_M.__init__(self)
-        self.regs_a_row = 3
-
-    @Render.divider_printer("[ REGISTERS ]")
-    def context_reg(self, saved_reg_dump):
-        cur_regs = self.dump_regs()
-        cur_regs = self.swap_reg_name(cur_regs)
-
-        # for re-order
-        extra_dict = {
-                "xpsr": "xpsr",
-                "control": "control",
-                "primask": "primask",
-                "faultmask": "faultmask",
-                "basepri": "basepri",
-                }
-
-        cur_regs = self.swap_reg_name(cur_regs, extra_dict=extra_dict)
-        diff_reg = self.reg_diff(cur_regs, saved_reg_dump)
-        self.render_regs_dump(cur_regs, diff_reg=diff_reg)
-        self.print_mode_info(self.ql.arch.regs.cpsr)
diff --git a/qiling/debugger/qdb/render/render_intel.py b/qiling/debugger/qdb/render/render_intel.py
new file mode 100644
index 000000000..0e0b8f7e2
--- /dev/null
+++ b/qiling/debugger/qdb/render/render_intel.py
@@ -0,0 +1,55 @@
+#!/usr/bin/env python3
+#
+# Cross Platform and Multi Architecture Advanced Binary Emulation Framework
+#
+
+from typing import Optional
+
+from .render import Render, ContextRender
+from ..arch import ArchIntel, ArchX86, ArchX64
+
+
+class ContextRenderIntel(ContextRender):
+    """Context renderer base class for Intel architecture.
+    """
+
+    def print_mode_info(self) -> None:
+        eflags = self.read_reg('eflags')
+
+        flags = ArchIntel.get_flags(eflags)
+        iopl = ArchIntel.get_iopl(eflags)
+
+        self.render_flags(flags, f'iopl: {iopl}')
+
+    @Render.divider_printer("[ DISASM ]", footer=True)
+    def context_asm(self) -> None:
+        """Disassemble surrounding instructions.
+        """
+
+        address = self.cur_addr
+        prediction = self.predictor.predict()
+
+        ptr = address
+        listing = []
+
+        # since intel architecture has instructions with varying sizes, it is
+        # difficult to tell what were the preceding instructions. for that reason
+        # we display instructions only from current address and on.
+
+        for _ in range(9):
+            insn = self.disasm(ptr)
+            listing.append(insn)
+
+            ptr += insn.size
+
+        self.render_assembly(listing, address, prediction)
+
+
+class ContextRenderX86(ContextRenderIntel, ArchX86):
+    """Context renderer for x86 architecture.
+    """
+
+
+class ContextRenderX64(ContextRenderIntel, ArchX64):
+    """Context renderer for x86-64 architecture.
+    """
diff --git a/qiling/debugger/qdb/render/render_mips.py b/qiling/debugger/qdb/render/render_mips.py
index ff67891d8..13f01c658 100644
--- a/qiling/debugger/qdb/render/render_mips.py
+++ b/qiling/debugger/qdb/render/render_mips.py
@@ -3,27 +3,13 @@
 # Cross Platform and Multi Architecture Advanced Binary Emulation Framework
 #
 
-
-
-from .render import *
+from .render import ContextRender
 from ..arch import ArchMIPS
 
+
 class ContextRenderMIPS(ContextRender, ArchMIPS):
+    """Context renderer for MIPS architecture.
     """
-    context render for MIPS
-    """
-
-    def __init__(self, ql, predictor):
-        super().__init__(ql, predictor)
-        ArchMIPS.__init__(self)
-
-    @Render.divider_printer("[ REGISTERS ]")
-    def context_reg(self, saved_reg_dump):
-        """
-        redering context registers
-        """
 
-        cur_regs = self.dump_regs()
-        cur_regs = self.swap_reg_name(cur_regs)
-        diff_reg = self.reg_diff(cur_regs, saved_reg_dump)
-        self.render_regs_dump(cur_regs, diff_reg=diff_reg)
+    def print_mode_info(self) -> None:
+        pass
diff --git a/qiling/debugger/qdb/render/render_x86.py b/qiling/debugger/qdb/render/render_x86.py
deleted file mode 100644
index c13b92fe7..000000000
--- a/qiling/debugger/qdb/render/render_x86.py
+++ /dev/null
@@ -1,68 +0,0 @@
-#!/usr/bin/env python3
-#
-# Cross Platform and Multi Architecture Advanced Binary Emulation Framework
-#
-
-
-
-from .render import *
-from ..arch import ArchX86
-
-class ContextRenderX86(ContextRender, ArchX86):
-    """
-    context render for X86
-    """
-
-    def __init__(self, ql, predictor):
-        super().__init__(ql, predictor)
-        ArchX86.__init__(self)
-
-    @Render.divider_printer("[ REGISTERS ]")
-    def context_reg(self, saved_reg_dump):
-        cur_regs = self.dump_regs()
-        diff_reg = self.reg_diff(cur_regs, saved_reg_dump)
-        self.render_regs_dump(cur_regs, diff_reg=diff_reg)
-
-        flags = self.get_flags(self.ql.arch.regs.eflags)
-        print("EFLAGS: ", end="")
-        print(color.GREEN, end="")
-        for key, val in flags.items():
-            if val:
-                print(f"{color.BLUE}{key.upper()} ", end="")
-            else:
-                print(f"{color.GREEN}{key.lower()} ", end="")
-
-        print(color.END)
-
-    @Render.divider_printer("[ DISASM ]")
-    def context_asm(self):
-        lines = {}
-        past_list = []
-
-        cur_addr = self.cur_addr
-        while len(past_list) < 10:
-            line = self.disasm(cur_addr)
-            past_list.append(line)
-            cur_addr += line.size
-
-        fd_list = []
-        cur_insn = None
-        for each in past_list:
-            if each.address > self.cur_addr:
-                fd_list.append(each)
-
-            elif each.address == self.cur_addr:
-                cur_insn = each 
-
-        """
-        only forward and current instruction will be printed, 
-        because we don't have a solid method to disasm backward instructions,
-        since it's x86 instruction length is variadic 
-        """
-
-        lines.update({
-            "current": cur_insn,
-            "forward": fd_list,
-            })
-
-        self.render_assembly(lines)
diff --git a/qiling/debugger/qdb/render/render_x8664.py b/qiling/debugger/qdb/render/render_x8664.py
deleted file mode 100644
index 22c687d49..000000000
--- a/qiling/debugger/qdb/render/render_x8664.py
+++ /dev/null
@@ -1,58 +0,0 @@
-#!/usr/bin/env python3
-#
-# Cross Platform and Multi Architecture Advanced Binary Emulation Framework
-#
-
-
-
-from .render import *
-from ..arch import ArchX8664
-
-class ContextRenderX8664(ContextRender, ArchX8664):
-    """
-    Context render for X86_64
-    """
-
-    def __init__(self, ql, predictor):
-        super().__init__(ql, predictor)
-        ArchX8664.__init__(self)
-
-    @Render.divider_printer("[ REGISTERS ]")
-    def context_reg(self, saved_reg_dump):
-        cur_regs = self.dump_regs()
-        diff_reg = self.reg_diff(cur_regs, saved_reg_dump)
-        self.render_regs_dump(cur_regs, diff_reg=diff_reg)
-        print(color.GREEN, "EFLAGS: [CF: {flags[CF]}, PF: {flags[PF]}, AF: {flags[AF]}, ZF: {flags[ZF]}, SF: {flags[SF]}, OF: {flags[OF]}]".format(flags=self.get_flags(self.ql.arch.regs.eflags)), color.END, sep="")
-
-    @Render.divider_printer("[ DISASM ]")
-    def context_asm(self):
-        lines = {}
-        past_list = []
-
-        cur_addr = self.cur_addr
-        while len(past_list) < 10:
-            line = self.disasm(cur_addr)
-            past_list.append(line)
-            cur_addr += line.size
-
-        fd_list = []
-        cur_insn = None
-        for each in past_list:
-            if each.address > self.cur_addr:
-                fd_list.append(each)
-
-            elif each.address == self.cur_addr:
-                cur_insn = each 
-
-        """
-        only forward and current instruction will be printed, 
-        because we don't have a solid method to disasm backward instructions,
-        since it's x86 instruction length is variadic 
-        """
-
-        lines.update({
-            "current": cur_insn,
-            "forward": fd_list,
-            })
-
-        self.render_assembly(lines)
diff --git a/qiling/debugger/qdb/utils.py b/qiling/debugger/qdb/utils.py
index c5f0d4456..03be0ba89 100644
--- a/qiling/debugger/qdb/utils.py
+++ b/qiling/debugger/qdb/utils.py
@@ -4,16 +4,16 @@
 #
 
 from __future__ import annotations
-from typing import TYPE_CHECKING, Callable, Dict, Mapping, Tuple, Type
 
-from capstone import CsInsn
+from enum import Enum
+from typing import TYPE_CHECKING, Any, Callable, Dict, List, Mapping, Optional, Tuple, Type, TypeVar, Union
 
 from qiling.const import QL_ARCH
 
 from .render import (
     ContextRender,
     ContextRenderX86,
-    ContextRenderX8664,
+    ContextRenderX64,
     ContextRenderARM,
     ContextRenderCORTEX_M,
     ContextRenderMIPS
@@ -22,7 +22,7 @@
 from .branch_predictor import (
     BranchPredictor,
     BranchPredictorX86,
-    BranchPredictorX8664,
+    BranchPredictorX64,
     BranchPredictorARM,
     BranchPredictorCORTEX_M,
     BranchPredictorMIPS,
@@ -36,81 +36,69 @@
     from .qdb import QlQdb
 
 
-def qdb_print(msgtype: QDB_MSG, msg: str) -> None:
-    """
-    color printing
-    """
+_K = TypeVar('_K')
+_V = TypeVar('_V')
 
-    def print_error(msg):
-        return f"{color.RED}[!] {msg}{color.END}"
 
-    def print_info(msg):
-        return f"{color.CYAN}[+] {msg}{color.END}"
+def qdb_print(level: QDB_MSG, msg: str) -> None:
+    """Log printing.
+    """
 
-    color_coated = {
-            QDB_MSG.ERROR: print_error,
-            QDB_MSG.INFO : print_info,
-            }.get(msgtype)(msg)
+    decorations = {
+        QDB_MSG.ERROR: ('!', color.RED),
+        QDB_MSG.INFO : ('+', color.CYAN),
+    }
 
-    print(color_coated)
+    tag, col = decorations[level]
 
+    print(f'{col}[{tag}] {msg}{color.END}')
 
-def setup_address_marker():
 
-    class Marker:
-        """provide the ability to mark an address as a more easier rememberable alias
-        """
+class Marker:
+    """provide the ability to mark an address with an easier-to-remember alias
+    """
 
-        def __init__(self):
-            self._mark_list = {}
+    def __init__(self):
+        self._mark_list: Dict[str, int] = {}
 
-        def get_symbol(self, sym):
-            """
-            get the mapped address to a symbol if it's in the mark_list
-            """
+    def get_address(self, sym: str) -> Optional[int]:
+        """
+        get the address mapped to a symbol if it's in the mark_list
+        """
 
-            return self._mark_list.get(sym, None)
+        return self._mark_list.get(sym)
 
-        @property
-        def mark_list(self):
-            """
-            get a list about what we marked
-            """
+    @property
+    def mark_list(self):
+        """
+        get a list about what we marked
+        """
 
-            return self._mark_list.items()
+        return self._mark_list.items()
 
-        def gen_sym_name(self):
-            """
-            generating symbol name automatically
-            """
+    def gen_sym_name(self) -> str:
+        """
+        generating symbol name automatically
+        """
 
-            sym_name, idx = "sym0", 0
-            while sym_name in self._mark_list:
-                idx += 1
-                sym_name = f"sym{idx}"
+        syms = len(self._mark_list)
 
-            return sym_name
+        # find the next available 'sym#'
+        return next((f'sym{i}' for i in range(syms) if f'sym{i}' not in self._mark_list), f'sym{syms}')
 
-        def mark_only_loc(self, loc):
-            """
-            mark when location provided only
-            """
+    def mark(self, loc: int, sym: Optional[str] = None) -> str:
+        """
+        mark loc as sym
+        """
 
-            sym_name = self.gen_sym_name()
-            self.mark(sym_name, loc)
-            return sym_name
+        sym = sym or self.gen_sym_name()
 
-        def mark(self, sym: str, loc: int):
-            """
-            mark loc as sym
-            """
+        if sym in self._mark_list:
+            return ''
 
-            if sym not in self.mark_list:
-                self._mark_list.update({sym: loc})
-            else:
-                return f"dumplicated symbol name: {sym} at address: 0x{loc:08x}"
+        self._mark_list[sym] = loc
 
-    return Marker()
+        return sym
 
 
 # helper functions for setting proper branch predictor and context render depending on different arch
@@ -120,7 +108,7 @@ def setup_branch_predictor(ql: Qiling) -> BranchPredictor:
 
     preds: Dict[QL_ARCH, Type[BranchPredictor]] = {
         QL_ARCH.X86:      BranchPredictorX86,
-        QL_ARCH.X8664:    BranchPredictorX8664,
+        QL_ARCH.X8664:    BranchPredictorX64,
         QL_ARCH.ARM:      BranchPredictorARM,
         QL_ARCH.CORTEX_M: BranchPredictorCORTEX_M,
         QL_ARCH.MIPS:     BranchPredictorMIPS
@@ -136,7 +124,7 @@ def setup_context_render(ql: Qiling, predictor: BranchPredictor) -> ContextRende
 
     rends: Dict[QL_ARCH, Type[ContextRender]] = {
         QL_ARCH.X86:      ContextRenderX86,
-        QL_ARCH.X8664:    ContextRenderX8664,
+        QL_ARCH.X8664:    ContextRenderX64,
         QL_ARCH.ARM:      ContextRenderARM,
         QL_ARCH.CORTEX_M: ContextRenderCORTEX_M,
         QL_ARCH.MIPS:     ContextRenderMIPS
@@ -146,121 +134,137 @@ def setup_context_render(ql: Qiling, predictor: BranchPredictor) -> ContextRende
 
     return r(ql, predictor)
 
-def run_qdb_script(qdb: QlQdb, filename: str) -> None:
-    with open(filename) as fd:
-        for line in iter(fd.readline, ""):
 
-            # skip commented and empty line
-            if line.startswith("#") or line == "\n":
-                continue
+class MemDiff(Enum):
+    ADD = '+'
+    REM = '-'
+    MOD = '*'
 
-            cmd, arg, _ = qdb.parseline(line)
-            func = getattr(qdb, f"do_{cmd}")
-            if arg:
-                func(arg)
-            else:
-                func()
 
+RamKey = Tuple[int, int]
+RamVal = Tuple[int, str, bytes]
+
+RamDiffKey = Tuple[int, int]
+RamDiffVal = Tuple[MemDiff, Tuple[int, str, Union[bytes, Tuple]]]
 
-class SnapshotManager:
-    """for functioning differential snapshot
 
-    Supports Qdb features like:
-    1. record/replay debugging
-    2. memory access in gdb-style
+class DiffedState:
+    """
+    internal container for storing diffed state
     """
 
-    class State:
-        """
-        internal container for storing raw state from qiling
-        """
+    def __init__(self, reg, xreg, ram, loader):
+        self.reg: Dict[str, int] = reg
+        self.xreg: Dict[str, int] = xreg
+        self.ram: Dict[RamDiffKey, RamDiffVal] = ram
+        self.loader: Dict[str, Any] = loader
 
-        def __init__(self, saved_state):
-            self.reg, self.ram, self.xreg = SnapshotManager.transform(saved_state)
 
-    class DiffedState:
-        """
-        internal container for storing diffed state
-        """
+class State:
+    """
+    internal container for storing raw state from qiling
+    """
+
+    def __init__(self, saved: Mapping[str, Mapping]):
+        self.reg: Dict[str, int] = saved.get("reg") or {}
+        self.xreg: Dict[str, int] = saved.get("cpr") or saved.get("msr") or {}
+
+        mem = saved.get("mem") or {}
+        ram = mem.get("ram") or []
+
+        # saved ram lists might not match in order, we turn them into dicts to work around
+        # that. in these dicts every memory content is mapped to its memory entry's properties
+        self.ram: Dict[RamKey, RamVal] = {(lbound, ubound): (perms, label, data) for lbound, ubound, perms, label, data in ram}
 
-        def __init__(self, diffed_st):
-            self.reg, self.ram, self.xreg = diffed_st
+        self.loader: Dict[str, Any] = saved.get('loader') or {}
 
     @staticmethod
-    def transform(st):
-        """
-        transform saved context into binary set
-        """
+    def __dict_diff(d0: Mapping[_K, _V], d1: Mapping[_K, _V]) -> Dict[_K, _V]:
+        return {k: v for k, v in d0.items() if v != d1.get(k)}
 
-        reg  = st.get("reg", {})
-        mem  = st.get("mem", [])
-        xreg = st.get("cpr") or st.get("msr") or {}
+    def _diff_reg(self, other: State) -> Dict[str, int]:
+        return State.__dict_diff(self.reg, other.reg)
 
-        ram = []
-        for mem_seg in mem["ram"]:
-            lbound, ubound, perms, label, raw_bytes = mem_seg
-            rb_set = {(idx, val) for idx, val in enumerate(raw_bytes)}
-            ram.append((lbound, ubound, perms, label, rb_set))
+    def _diff_xreg(self, other: State) -> Dict[str, int]:
+        return State.__dict_diff(self.xreg, other.xreg)
 
-        return (reg, ram, xreg)
+    def _diff_ram(self, other: State) -> Dict[RamDiffKey, RamDiffVal]:
+        ram0 = self.ram
+        ram1 = other.ram
 
-    def __init__(self, ql):
-        self.ql = ql
-        self.layers = []
+        ram_diff: Dict[RamDiffKey, RamDiffVal] = {}
 
-    def _save(self) -> State:
-        """
-        acquire current State by wrapping saved context from ql.save()
-        """
+        removed  = [rng for rng in ram0 if rng not in ram1]
+        added    = [rng for rng in ram1 if rng not in ram0]
+        modified = [rng for rng in ram0 if rng in ram1 and ram0[rng] != ram1[rng]]
 
-        return self.State(self.ql.save())
+        # memory regions that got removed should be re-added
+        for rng in removed:
+            ram_diff[rng] = (MemDiff.ADD, ram0[rng])
 
-    def diff_reg(self, prev_reg, cur_reg):
-        """
-        diff two register values
-        """
+        # memory regions that got added should be removed
+        for rng in added:
+            _, label, _ = ram1[rng]
 
-        diffed = filter(lambda t: t[0] != t[1], zip(prev_reg.items(), cur_reg.items()))
-        return {prev[0]: prev[1] for prev, _ in diffed}
+            # though we discard data as it is not required anymore, label is still required
+            # to determine the method of removing the region: brk, mmap, or ordinary map
+            ram_diff[rng] = (MemDiff.REM, (-1, label, b''))
 
-    def diff_ram(self, prev_ram, cur_ram):
-        """
-        diff two ram data if needed
-        """
+        # memory regions that got modified should be reverted
+        for rng in modified:
+            perms0, label0, data0 = ram0[rng]
+            perms1, label1, data1 = ram1[rng]
 
-        if any((cur_ram is None, prev_ram is None, prev_ram == cur_ram)):
-            return
+            perms = -1 if perms0 == perms1 else perms0
 
-        ram = []
-        paired = zip(prev_ram, cur_ram)
-        for each in paired:
-            # lbound, ubound, perm, label, data
-            *prev_others, prev_rb_set = each[0]
-            *cur_others, cur_rb_set = each[1]
+            assert label0 == label1, 'memory region label changed unexpectedly'
+            assert len(data0) == len(data1), 'memory contents differ in size'
 
-            if prev_others == cur_others and cur_rb_set != prev_rb_set:
-                diff_set = prev_rb_set - cur_rb_set
-            else:
-                continue
+            # scan both data chunks and keep the index and byte value of the unmatched ones.
+            # if memory contents are identical, this will result in an empty tuple
+            data_diff = tuple((i, b0) for i, (b0, b1) in enumerate(zip(data0, data1)) if b0 != b1)
 
-            ram.append((*cur_others, diff_set))
+            ram_diff[rng] = (MemDiff.MOD, (perms, label0, data_diff))
 
-        return ram
+        # <DEBUG>
+        # for rng, (opcode, diff) in sorted(ram_diff.items()):
+        #     lbound, ubound = rng
+        #     perms, label, data = diff
+        #
+        #     print(f'{opcode.name} {lbound:010x} - {ubound:010x} {perms:03b} {label:24s} ~{len(data)}')
+        # </DEBUG>
+
+        return ram_diff
 
-    def diff(self, before_st, after_st):
+    def diff(self, other: State) -> DiffedState:
+        """Diff between previous and current state.
         """
-        diff between previous and current state
+
+        return DiffedState(
+            self._diff_reg(other),
+            self._diff_xreg(other),
+            self._diff_ram(other),
+            self.loader
+        )
+
+
+class SnapshotManager:
+    """Differential snapshot object.
+    """
+
+    def __init__(self, ql: Qiling):
+        self.ql = ql
+        self.layers: List[DiffedState] = []
+
+    def save(self) -> State:
+        """
+        acquire current State by wrapping saved context from ql.save()
         """
 
-        # prev_st = self.layers.pop()
-        diffed_reg = self.diff_reg(before_st.reg, after_st.reg)
-        diffed_ram = self.diff_ram(before_st.ram, after_st.ram)
-        diffed_xreg = self.diff_reg(before_st.xreg, after_st.xreg)
-        # diffed_reg = self.diff_reg(prev_st.reg, cur_st.reg)
-        # diffed_ram = self.diff_ram(prev_st.ram, cur_st.ram)
-        return self.DiffedState((diffed_reg, diffed_ram, diffed_xreg))
+        return State(self.ql.save(reg=True, mem=True, loader=True))
 
-    def snapshot(func):
+    @staticmethod
+    def snapshot(func: Callable) -> Callable:
         """
         decorator function for saving differential context on certian qdb command
         """
@@ -268,17 +272,16 @@ def snapshot(func):
         def magic(self: QlQdb, *args, **kwargs):
             if self.rr:
                 # save State before execution
-                p_st = self.rr._save()
+                before = self.rr.save()
 
                 # certian execution to be snapshot
                 func(self, *args, **kwargs)
 
                 # save State after execution
-                q_st = self.rr._save()
+                after = self.rr.save()
 
                 # merge two saved States into a DiffedState
-                st = self.rr.diff(p_st, q_st)
-                self.rr.layers.append(st)
+                self.rr.layers.append(before.diff(after))
             else:
                 func(self, *args, **kwargs)
 
@@ -289,49 +292,65 @@ def restore(self):
         helper function for restoring running state from an existing incremental snapshot
         """
 
-        prev_st = self.layers.pop()
-        cur_st = self._save()
+        prev_st = self.layers.pop()  # DiffedState
+        curr_st = self.save()        # State, expected to be identical to 'after' State in snapshot method
+
+        curr_st.reg.update(prev_st.reg)
+        curr_st.xreg.update(prev_st.xreg)
+
+        if prev_st.ram:
+            diff_ram = prev_st.ram
+            curr_ram = curr_st.ram
+
+            # we must begin by removing unwanted memory regions, otherwise we would not be able to
+            # add new ones in case they overlap. here we iterate over the diff dictionary but handle
+            # only remove opcodes
+            for rng, (opcode, props) in diff_ram.items():
+                lbound, ubound = rng
+                size = ubound - lbound
 
-        for reg_name, reg_value in prev_st.reg.items():
-            cur_st.reg[reg_name] = reg_value
+                if opcode is MemDiff.REM:
+                    # NOTE: it doesn't seem like distinguishing between brk, mmap, mmap anonymous
+                    # and regular maps is actually required
+                    self.ql.mem.unmap(lbound, size)
 
-        for reg_name, reg_value in prev_st.xreg.items():
-            cur_st.xreg[reg_name] = reg_value
+            # doing a second pass, but this time handling add and modify opcodes
+            for rng, (opcode, props) in diff_ram.items():
+                lbound, ubound = rng
+                perms, label, data = props
+                size = ubound - lbound
 
-        to_be_restored = {
-            "reg": cur_st.reg,
+                if opcode is MemDiff.ADD:
+                    # TODO: distinguish between brk, mmap, mmap anonymous and regular maps
+
+                    self.ql.mem.map(lbound, size, perms, label)
+                    self.ql.mem.write(lbound, data)
+
+                elif opcode is MemDiff.MOD:
+                    if perms != -1:
+                        self.ql.mem.protect(lbound, size, perms)
+
+                    # is there a diff for this memory range?
+                    if data:
+                        # get current memory content
+                        _, _, curr_data = curr_ram[rng]
+                        curr_data = bytearray(curr_data)
+
+                        # patch with existing diff
+                        for i, b in data:
+                            curr_data[i] = b
+
+                        # write patched data
+                        self.ql.mem.write(lbound, bytes(curr_data))
+
+        self.ql.restore({
+            'reg': curr_st.reg,
 
             # though we have arch-specific context to restore, we want to keep this arch-agnostic.
             # one way to work around that is to include 'xreg' both as msr (intel) and cpr (arm).
             # only the relevant one will be picked up while the other one will be discarded
-            "msr": cur_st.xreg,
-            "cpr": cur_st.xreg
-        }
+            'msr': curr_st.xreg,
+            'cpr': curr_st.xreg,
 
-        # FIXME: not sure how this one even works. while curr_st is a fresh qiling snapshot,
-        # prev_st is a DiffedState which does not hold a complete state but only a diff between
-        # two points which seem to be unrelated here.
-        #
-        # this code only patches current memory content with the diff between points a and b while
-        # we may be already be at point c.
-        if getattr(prev_st, "ram", None) and prev_st.ram != cur_st.ram:
-
-            ram = []
-            # lbound, ubound, perm, label, data
-            for each in prev_st.ram:
-                *prev_others, prev_rb_set = each
-                for *cur_others, cur_rb_set in cur_st.ram:
-                    if prev_others == cur_others:
-                        cur_rb_dict = dict(cur_rb_set)
-                        for idx, val in prev_rb_set:
-                            cur_rb_dict[idx] = val
-
-                        bs = bytes(dict(sorted(cur_rb_dict.items())).values())
-                        ram.append((*cur_others, bs))
-
-            to_be_restored["mem"] = {
-                "ram": ram,
-                "mmio": {}
-            }
-
-        self.ql.restore(to_be_restored)
+            'loader': prev_st.loader
+        })
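Reviewer note on qiling/debugger/qdb/utils.py: the MemDiff.MOD path stores only the (index, old byte) pairs that changed, and restore() patches them back over the current contents. The following is a minimal standalone sketch of that round trip. The names `byte_diff` and `revert` are hypothetical helpers made up for illustration and are not part of the patch; no Qiling objects are involved.

```python
from typing import Tuple

# (index, original byte value) for every position that changed
ByteDiff = Tuple[Tuple[int, int], ...]


def byte_diff(before: bytes, after: bytes) -> ByteDiff:
    """Keep the index and the 'before' byte of every mismatching position."""
    assert len(before) == len(after), 'memory contents differ in size'

    return tuple((i, b0) for i, (b0, b1) in enumerate(zip(before, after)) if b0 != b1)


def revert(current: bytes, diff: ByteDiff) -> bytes:
    """Patch the current contents with the recorded original bytes."""
    patched = bytearray(current)

    for i, b in diff:
        patched[i] = b

    return bytes(patched)


before = b'\x00\x01\x02\x03'
after = b'\x00\xff\x02\xee'

diff = byte_diff(before, after)
restored = revert(after, diff)
```

Identical buffers yield an empty tuple, which is why restore() can skip the memory write when `data` is falsy.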
diff --git a/qiling/debugger/utils.py b/qiling/debugger/utils.py
deleted file mode 100644
index 5fa75e330..000000000
--- a/qiling/debugger/utils.py
+++ /dev/null
@@ -1,344 +0,0 @@
-#!/usr/bin/env python3
-#
-# Cross Platform and Multi Architecture Advanced Binary Emulation Framework
-#
-
-from elftools.common.exceptions import ELFError
-from elftools.common.py3compat import (
-        ifilter, byte2int, bytes2str, itervalues, str2bytes, iterbytes)
-from elftools.elf.elffile import ELFFile
-from elftools.elf.dynamic import DynamicSection, DynamicSegment
-from elftools.elf.enums import ENUM_D_TAG
-from elftools.elf.segments import InterpSegment
-from elftools.elf.sections import NoteSection, SymbolTableSection
-from elftools.elf.gnuversions import (
-    GNUVerSymSection, GNUVerDefSection,
-    GNUVerNeedSection,
-    )
-from elftools.elf.relocation import RelocationSection
-from elftools.elf.descriptions import (
-    describe_ei_class, describe_ei_data, describe_ei_version,
-    describe_ei_osabi, describe_e_type, describe_e_machine,
-    describe_e_version_numeric, describe_p_type, describe_p_flags,
-    describe_sh_type, describe_sh_flags,
-    describe_symbol_type, describe_symbol_bind, describe_symbol_visibility,
-    describe_symbol_shndx, describe_reloc_type, describe_dyn_tag,
-    describe_dt_flags, describe_dt_flags_1, describe_ver_flags, describe_note,
-    describe_attr_tag_arm
-    )
-from elftools.elf.constants import E_FLAGS
-from elftools.elf.constants import E_FLAGS_MASKS
-
-from qiling import Qiling
-
-
-class QlReadELF(object):
-    def __init__(self, ql:Qiling, elf_stream):
-        self.ql = ql
-        self.elffile = ELFFile(elf_stream)
-        self._versioninfo = None
-
-    def elf_file_header(self):
-        elf_header = {}
-        def add_info(key, value):
-            elf_header[key] = value
-
-        header = self.elffile.header
-        e_ident = header['e_ident']
-
-        add_info('Magic', ' '.join('%2.2x' % byte2int(b)
-                   for b in self.elffile.e_ident_raw))
-        add_info('Class',describe_ei_class(e_ident['EI_CLASS']))
-        add_info('Data', describe_ei_data(e_ident['EI_DATA']))
-        add_info('Version', e_ident['EI_VERSION'])
-        add_info('OS/ABI', describe_ei_osabi(e_ident['EI_OSABI']))
-        add_info('ABI Version', e_ident['EI_ABIVERSION'])
-        add_info('Type', describe_e_type(header['e_type']))
-        add_info('Machine', describe_e_machine(header['e_machine']))
-        add_info('Version_e', describe_e_version_numeric(header['e_version']))
-        add_info('Entry point address', self._format_hex(header['e_entry']))
-        add_info('Start of program headers', header['e_phoff'])
-        add_info('Start of section headers', header['e_shoff'])
-        add_info('Flags', [self._format_hex(header['e_flags']),
-                self.decode_flags(header['e_flags'])])
-        add_info('Size of this header', header['e_ehsize'])
-        add_info('Size of program headers', header['e_phentsize'])
-        add_info('Number of program headers', header['e_phnum'])
-        add_info('Size of section headers', header['e_shentsize'])
-        add_info('Number of section headers', header['e_shnum'])
-        add_info('Section header string table index', header['e_shstrndx'])
-
-        return elf_header
-
-    def elf_program_headers(self):
-        program_headers = []
-        def add_info(dic):
-            program_headers.append(dic)
-
-        if self.elffile.num_segments() == 0:
-            return None
-
-        for segment in self.elffile.iter_segments():
-            program_hdr = {}
-            program_hdr['Type'] = describe_p_type(segment['p_type'])
-            program_hdr['Offset'] = self._format_hex(segment['p_offset'], fieldsize=6)
-            program_hdr['VirtAddr'] = self._format_hex(segment['p_vaddr'], fullhex=True)
-            program_hdr['PhysAddr'] = self._format_hex(segment['p_paddr'], fullhex=True)
-            program_hdr['FileSiz'] = self._format_hex(segment['p_filesz'], fieldsize=5)
-            program_hdr['MemSiz'] = self._format_hex(segment['p_memsz'], fieldsize=5)
-            program_hdr['Flg'] = describe_p_flags(segment['p_flags'])
-            program_hdr['Align'] = self._format_hex(segment['p_align'])
-
-            add_info(program_hdr)
-
-        return program_headers
-
-    def elf_section_headers(self):
-        section_headers = []
-        def add_info(dic):
-            section_headers.append(dic)
-
-        if self.elffile.num_sections() == 0:
-            return None
-
-        for nsec, section in enumerate(self.elffile.iter_sections()):
-            section_hdr = {}
-            section_hdr['index'] = nsec
-            section_hdr['Name'] = section.name
-            section_hdr['Type'] = describe_sh_type(section['sh_type'])
-            section_hdr['Addr'] = self._format_hex(section['sh_addr'], fieldsize=8, lead0x=False)
-            section_hdr['Offset'] = self._format_hex(section['sh_offset'], fieldsize=6, lead0x=False)
-            section_hdr['Size'] = self._format_hex(section['sh_size'], fieldsize=6, lead0x=False)
-            section_hdr['ES'] = self._format_hex(section['sh_entsize'], fieldsize=2, lead0x=False)
-            section_hdr['Flag'] = describe_sh_flags(section['sh_flags'])
-            section_hdr['Lk'] = section['sh_link']
-            section_hdr['Inf'] = section['sh_info']
-            section_hdr['Al'] = section['sh_addralign']
-
-            add_info(section_hdr)
-
-        return section_headers
-
-    def elf_symbol_tables(self):
-        symbol_tables_list = []
-        def add_info(dic):
-            symbol_tables_list.append(dic)
-
-        self._init_versioninfo()
-
-        symbol_tables = [s for s in self.elffile.iter_sections()
-                    if isinstance(s, SymbolTableSection)]
-
-        if not symbol_tables and self.elffile.num_sections() == 0:
-            return None
-
-        for section in symbol_tables:
-            if not isinstance(section, SymbolTableSection):
-                continue
-
-            if section['sh_entsize'] == 0:
-                continue
-
-            for nsym, symbol in enumerate(section.iter_symbols()):
-                version_info = ''
-                if (section['sh_type'] == 'SHT_DYNSYM' and
-                        self._versioninfo['type'] == 'GNU'):
-                    version = self._symbol_version(nsym)
-                    if (version['name'] != symbol.name and
-                        version['index'] not in ('VER_NDX_LOCAL',
-                                                 'VER_NDX_GLOBAL')):
-                        if version['filename']:
-                            # external symbol
-                            version_info = '@%(name)s (%(index)i)' % version
-                        else:
-                            # internal symbol
-                            if version['hidden']:
-                                version_info = '@%(name)s' % version
-                            else:
-                                version_info = '@@%(name)s' % version
-
-                symbol_info = {}
-                symbol_info['index'] = nsym
-                symbol_info['Value'] = self._format_hex(
-                        symbol['st_value'], fullhex=True, lead0x=False)
-                symbol_info['Size'] = symbol['st_size']
-                symbol_info['Type'] = describe_symbol_type(symbol['st_info']['type'])
-                symbol_info['Bind'] = describe_symbol_bind(symbol['st_info']['bind'])
-                symbol_info['Vis'] = describe_symbol_visibility(symbol['st_other']['visibility'])
-                symbol_info['Ndx'] = describe_symbol_shndx(symbol['st_shndx'])
-                symbol_info['Name'] = symbol.name
-                symbol_info['version_info'] = version_info
-                add_info(symbol_info)
-        return symbol_tables_list
-
-    def decode_flags(self, flags):
-        description = ""
-        if self.elffile['e_machine'] == "EM_ARM":
-            eabi = flags & E_FLAGS.EF_ARM_EABIMASK
-            flags &= ~E_FLAGS.EF_ARM_EABIMASK
-
-            if flags & E_FLAGS.EF_ARM_RELEXEC:
-                description += ', relocatable executabl'
-                flags &= ~E_FLAGS.EF_ARM_RELEXEC
-
-            if eabi == E_FLAGS.EF_ARM_EABI_VER5:
-                EF_ARM_KNOWN_FLAGS = E_FLAGS.EF_ARM_ABI_FLOAT_SOFT|E_FLAGS.EF_ARM_ABI_FLOAT_HARD|E_FLAGS.EF_ARM_LE8|E_FLAGS.EF_ARM_BE8
-                description += ', Version5 EABI'
-                if flags & E_FLAGS.EF_ARM_ABI_FLOAT_SOFT:
-                    description += ", soft-float ABI"
-                elif flags & E_FLAGS.EF_ARM_ABI_FLOAT_HARD:
-                    description += ", hard-float ABI"
-
-                if flags & E_FLAGS.EF_ARM_BE8:
-                    description += ", BE8"
-                elif flags & E_FLAGS.EF_ARM_LE8:
-                    description += ", LE8"
-
-                if flags & ~EF_ARM_KNOWN_FLAGS:
-                    description += ', <unknown>'
-            else:
-                description += ', <unrecognized EABI>'
-
-        elif self.elffile['e_machine'] == "EM_MIPS":
-            if flags & E_FLAGS.EF_MIPS_NOREORDER:
-                description += ", noreorder"
-            if flags & E_FLAGS.EF_MIPS_PIC:
-                description += ", pic"
-            if flags & E_FLAGS.EF_MIPS_CPIC:
-                description += ", cpic"
-            if (flags & E_FLAGS.EF_MIPS_ABI2):
-                description += ", abi2"
-            if (flags & E_FLAGS.EF_MIPS_32BITMODE):
-                description += ", 32bitmode"
-            if (flags & E_FLAGS_MASKS.EFM_MIPS_ABI_O32):
-                description += ", o32"
-            elif (flags & E_FLAGS_MASKS.EFM_MIPS_ABI_O64):
-                description += ", o64"
-            elif (flags & E_FLAGS_MASKS.EFM_MIPS_ABI_EABI32):
-                description += ", eabi32"
-            elif (flags & E_FLAGS_MASKS.EFM_MIPS_ABI_EABI64):
-                description += ", eabi64"
-            if (flags & E_FLAGS.EF_MIPS_ARCH) == E_FLAGS.EF_MIPS_ARCH_1:
-                description += ", mips1"
-            if (flags & E_FLAGS.EF_MIPS_ARCH) == E_FLAGS.EF_MIPS_ARCH_2:
-                description += ", mips2"
-            if (flags & E_FLAGS.EF_MIPS_ARCH) == E_FLAGS.EF_MIPS_ARCH_3:
-                description += ", mips3"
-            if (flags & E_FLAGS.EF_MIPS_ARCH) == E_FLAGS.EF_MIPS_ARCH_4:
-                description += ", mips4"
-            if (flags & E_FLAGS.EF_MIPS_ARCH) == E_FLAGS.EF_MIPS_ARCH_5:
-                description += ", mips5"
-            if (flags & E_FLAGS.EF_MIPS_ARCH) == E_FLAGS.EF_MIPS_ARCH_32R2:
-                description += ", mips32r2"
-            if (flags & E_FLAGS.EF_MIPS_ARCH) == E_FLAGS.EF_MIPS_ARCH_64R2:
-                description += ", mips64r2"
-            if (flags & E_FLAGS.EF_MIPS_ARCH) == E_FLAGS.EF_MIPS_ARCH_32:
-                description += ", mips32"
-            if (flags & E_FLAGS.EF_MIPS_ARCH) == E_FLAGS.EF_MIPS_ARCH_64:
-                description += ", mips64"
-
-        return description
-
-    def _format_hex(self, addr, fieldsize=None, fullhex=False, lead0x=True,
-                    alternate=False):
-        """ Format an address into a hexadecimal string.
-
-            fieldsize:
-                Size of the hexadecimal field (with leading zeros to fit the
-                address into. For example with fieldsize=8, the format will
-                be %08x
-                If None, the minimal required field size will be used.
-
-            fullhex:
-                If True, override fieldsize to set it to the maximal size
-                needed for the elfclass
-
-            lead0x:
-                If True, leading 0x is added
-
-            alternate:
-                If True, override lead0x to emulate the alternate
-                hexadecimal form specified in format string with the #
-                character: only non-zero values are prefixed with 0x.
-                This form is used by readelf.
-        """
-        if alternate:
-            if addr == 0:
-                lead0x = False
-            else:
-                lead0x = True
-                fieldsize -= 2
-
-        s = '0x' if lead0x else ''
-        if fullhex:
-            fieldsize = 8 if self.elffile.elfclass == 32 else 16
-        if fieldsize is None:
-            field = '%x'
-        else:
-            field = '%' + '0%sx' % fieldsize
-        return s + field % addr
-
-    def _init_versioninfo(self):
-        """ Search and initialize informations about version related sections
-            and the kind of versioning used (GNU or Solaris).
-        """
-        if self._versioninfo is not None:
-            return
-
-        self._versioninfo = {'versym': None, 'verdef': None,
-                             'verneed': None, 'type': None}
-
-        for section in self.elffile.iter_sections():
-            if isinstance(section, GNUVerSymSection):
-                self._versioninfo['versym'] = section
-            elif isinstance(section, GNUVerDefSection):
-                self._versioninfo['verdef'] = section
-            elif isinstance(section, GNUVerNeedSection):
-                self._versioninfo['verneed'] = section
-            elif isinstance(section, DynamicSection):
-                for tag in section.iter_tags():
-                    if tag['d_tag'] == 'DT_VERSYM':
-                        self._versioninfo['type'] = 'GNU'
-                        break
-
-        if not self._versioninfo['type'] and (
-                self._versioninfo['verneed'] or self._versioninfo['verdef']):
-            self._versioninfo['type'] = 'Solaris'
-
-    def _symbol_version(self, nsym):
-        """ Return a dict containing information on the
-                   or None if no version information is available
-        """
-        self._init_versioninfo()
-
-        symbol_version = dict.fromkeys(('index', 'name', 'filename', 'hidden'))
-
-        if (not self._versioninfo['versym'] or
-                nsym >= self._versioninfo['versym'].num_symbols()):
-            return None
-
-        symbol = self._versioninfo['versym'].get_symbol(nsym)
-        index = symbol.entry['ndx']
-        if not index in ('VER_NDX_LOCAL', 'VER_NDX_GLOBAL'):
-            index = int(index)
-
-            if self._versioninfo['type'] == 'GNU':
-                # In GNU versioning mode, the highest bit is used to
-                # store wether the symbol is hidden or not
-                if index & 0x8000:
-                    index &= ~0x8000
-                    symbol_version['hidden'] = True
-
-            if (self._versioninfo['verdef'] and
-                    index <= self._versioninfo['verdef'].num_versions()):
-                _, verdaux_iter = \
-                        self._versioninfo['verdef'].get_version(index)
-                symbol_version['name'] = next(verdaux_iter).name
-            else:
-                verneed, vernaux = \
-                        self._versioninfo['verneed'].get_version(index)
-                symbol_version['name'] = vernaux.name
-                symbol_version['filename'] = verneed.name
-
-        symbol_version['index'] = index
-        return symbol_version
diff --git a/qiling/extensions/afl/afl.py b/qiling/extensions/afl/afl.py
index 4aef943ee..4128af5a4 100644
--- a/qiling/extensions/afl/afl.py
+++ b/qiling/extensions/afl/afl.py
@@ -96,8 +96,8 @@ def ql_afl_fuzz_custom(ql: Qiling,
     def __place_input_wrapper(uc: Uc, input_bytes: Array[c_char], iters: int, context: Any) -> bool:
         return place_input_callback(ql, input_bytes.raw, iters)
 
-    def __validate_crash_wrapper(uc: Uc, result: int, input_bytes: bytes, iters: int, context: Any) -> bool:
-        return validate_crash_callback(ql, result, input_bytes, iters)
+    def __validate_crash_wrapper(uc: Uc, result: int, input_bytes: Array[c_char], iters: int, context: Any) -> bool:
+        return validate_crash_callback(ql, result, input_bytes.raw, iters)
 
     def __fuzzing_wrapper(uc: Uc, context: Any) -> int:
         return fuzzing_callback(ql)
diff --git a/qiling/extensions/coverage/formats/base.py b/qiling/extensions/coverage/formats/base.py
index d9fe7c34e..4ca162cb8 100644
--- a/qiling/extensions/coverage/formats/base.py
+++ b/qiling/extensions/coverage/formats/base.py
@@ -3,9 +3,14 @@
 # Cross Platform and Multi Architecture Advanced Binary Emulation Framework
 #
 
+from __future__ import annotations
+
 from abc import ABC, abstractmethod
+from typing import TYPE_CHECKING
+
 
-from qiling import Qiling
+if TYPE_CHECKING:
+    from qiling import Qiling
 
 
 class QlBaseCoverage(ABC):
@@ -15,25 +20,21 @@ class QlBaseCoverage(ABC):
     all the methods marked with the @abstractmethod decorator.
     """
 
+    FORMAT_NAME: str
+
     def __init__(self, ql: Qiling):
         super().__init__()
 
         self.ql = ql
 
-    @property
-    @staticmethod
-    @abstractmethod
-    def FORMAT_NAME() -> str:
-        raise NotImplementedError
-
     @abstractmethod
-    def activate(self):
+    def activate(self) -> None:
         pass
 
     @abstractmethod
-    def deactivate(self):
+    def deactivate(self) -> None:
         pass
 
     @abstractmethod
-    def dump_coverage(self, coverage_file: str):
+    def dump_coverage(self, coverage_file: str) -> None:
         pass
diff --git a/qiling/extensions/coverage/formats/drcov.py b/qiling/extensions/coverage/formats/drcov.py
index 51a421946..bed0f8701 100644
--- a/qiling/extensions/coverage/formats/drcov.py
+++ b/qiling/extensions/coverage/formats/drcov.py
@@ -3,12 +3,20 @@
 # Cross Platform and Multi Architecture Advanced Binary Emulation Framework
 #
 
-from ctypes import Structure
-from ctypes import c_uint32, c_uint16
+from __future__ import annotations
+
+from ctypes import Structure, c_uint32, c_uint16
+from functools import lru_cache
+from typing import TYPE_CHECKING, BinaryIO, Dict, Tuple
 
 from .base import QlBaseCoverage
 
 
+if TYPE_CHECKING:
+    from qiling import Qiling
+    from qiling.loader.loader import QlLoader
+
+
 # Adapted from https://www.ayrx.me/drcov-file-format
 class bb_entry(Structure):
     _fields_ = [
@@ -29,36 +37,61 @@ class QlDrCoverage(QlBaseCoverage):
 
     FORMAT_NAME = "drcov"
 
-    def __init__(self, ql):
+    def __init__(self, ql: Qiling):
         super().__init__(ql)
 
         self.drcov_version = 2
         self.drcov_flavor = 'drcov'
-        self.basic_blocks = []
+        self.basic_blocks: Dict[int, bb_entry] = {}
         self.bb_callback = None
 
-    @staticmethod
-    def block_callback(ql, address, size, self):
-        for mod_id, mod in enumerate(ql.loader.images):
-            if mod.base <= address <= mod.end:
-                ent = bb_entry(address - mod.base, size, mod_id)
-                self.basic_blocks.append(ent)
-                break
+    @lru_cache(maxsize=64)
+    def _get_img_base(self, loader: QlLoader, address: int) -> Tuple[int, int]:
+        """Retrieve the containing image of a given address.
+
+        Addresses are expected to be aligned to a page boundary; results are cached for faster retrieval.
+        """
+
+        return next((i, img.base) for i, img in enumerate(loader.images) if img.base <= address < img.end)
+
+    def block_callback(self, ql: Qiling, address: int, size: int):
+        if address not in self.basic_blocks:
+            try:
+                # we rely on the fact that images are allocated on page size boundary and
+                # use it to speed up image retrieval. we align the basic block address to
+                # page boundary, knowing basic blocks within the same page belong to the
+                # same image. then we use the aligned address to retrieve the containing
+                # image. returned values are cached so subsequent retrievals for basic
+                # blocks within the same page will return the cached value instead of
+                # going through the retrieval process again (up to maxsize cached pages)
 
-    def activate(self):
-        self.bb_callback = self.ql.hook_block(self.block_callback, user_data=self)
+                i, img_base = self._get_img_base(ql.loader, address & ~(0x1000 - 1))
+            except StopIteration:
+                pass
+            else:
+                self.basic_blocks[address] = bb_entry(address - img_base, size, i)
 
-    def deactivate(self):
-        self.ql.hook_del(self.bb_callback)
+    def activate(self) -> None:
+        self.bb_callback = self.ql.hook_block(self.block_callback)
+
+    def deactivate(self) -> None:
+        if self.bb_callback:
+            self.ql.hook_del(self.bb_callback)
+
+    def dump_coverage(self, coverage_file: str) -> None:
+        def __write_line(bio: BinaryIO, line: str) -> None:
+            bio.write(f'{line}\n'.encode())
 
-    def dump_coverage(self, coverage_file):
         with open(coverage_file, "wb") as cov:
-            cov.write(f"DRCOV VERSION: {self.drcov_version}\n".encode())
-            cov.write(f"DRCOV FLAVOR: {self.drcov_flavor}\n".encode())
-            cov.write(f"Module Table: version {self.drcov_version}, count {len(self.ql.loader.images)}\n".encode())
-            cov.write("Columns: id, base, end, entry, checksum, timestamp, path\n".encode())
+            __write_line(cov, f"DRCOV VERSION: {self.drcov_version}")
+            __write_line(cov, f"DRCOV FLAVOR: {self.drcov_flavor}")
+            __write_line(cov, f"Module Table: version {self.drcov_version}, count {len(self.ql.loader.images)}")
+            __write_line(cov, "Columns: id, base, end, entry, checksum, timestamp, path")
+
             for mod_id, mod in enumerate(self. ql.loader.images):
-                cov.write(f"{mod_id}, {mod.base}, {mod.end}, 0, 0, 0, {mod.path}\n".encode())
-            cov.write(f"BB Table: {len(self.basic_blocks)} bbs\n".encode())
-            for bb in self.basic_blocks:
+                __write_line(cov, f"{mod_id}, {mod.base}, {mod.end}, 0, 0, 0, {mod.path}")
+
+            __write_line(cov, f"BB Table: {len(self.basic_blocks)} bbs")
+
+            for bb in self.basic_blocks.values():
                 cov.write(bytes(bb))
diff --git a/qiling/extensions/coverage/formats/drcov_exact.py b/qiling/extensions/coverage/formats/drcov_exact.py
index 685c6c044..4f7082789 100644
--- a/qiling/extensions/coverage/formats/drcov_exact.py
+++ b/qiling/extensions/coverage/formats/drcov_exact.py
@@ -1,5 +1,5 @@
 #!/usr/bin/env python3
-# 
+#
 # Cross Platform and Multi Architecture Advanced Binary Emulation Framework
 #
 
@@ -17,10 +17,6 @@ class QlDrCoverageExact(QlDrCoverage):
 
     FORMAT_NAME = "drcov_exact"
 
-    def __init__(self, ql):
-        super().__init__(ql)
-
-    def activate(self):
+    def activate(self) -> None:
         # We treat every instruction as a block on its own.
-        self.bb_callback = self.ql.hook_code(self.block_callback, user_data=self)
-        
\ No newline at end of file
+        self.bb_callback = self.ql.hook_code(self.block_callback)
diff --git a/qiling/extensions/coverage/formats/ezcov.py b/qiling/extensions/coverage/formats/ezcov.py
index b25218691..46290e4c5 100644
--- a/qiling/extensions/coverage/formats/ezcov.py
+++ b/qiling/extensions/coverage/formats/ezcov.py
@@ -1,19 +1,30 @@
 #!/usr/bin/env python3
-# 
+#
 # Cross Platform and Multi Architecture Advanced Binary Emulation Framework
 #
 
-from collections import namedtuple
-from os.path import basename
+from __future__ import annotations
+
+import os
+from typing import Any, TYPE_CHECKING, List, NamedTuple
 
 from .base import QlBaseCoverage
 
 
+if TYPE_CHECKING:
+    from qiling import Qiling
+
+
 # Adapted from https://github.com/nccgroup/Cartographer/blob/main/EZCOV.md#coverage-data
-class bb_entry(namedtuple('bb_entry', 'offset size mod_id')):
-    def csvline(self):
-        offset = '0x{:08x}'.format(self.offset)
+class bb_entry(NamedTuple):
+    offset: int
+    size: int
+    mod_id: Any
+
+    def as_csv(self) -> str:
+        offset = f'{self.offset:#010x}'
         mod_id = f"[ {self.mod_id if self.mod_id is not None else ''} ]"
+
         return f"{offset},{self.size},{mod_id}\n"
 
 class QlEzCoverage(QlBaseCoverage):
@@ -27,29 +38,30 @@ class QlEzCoverage(QlBaseCoverage):
 
     FORMAT_NAME = "ezcov"
 
-    def __init__(self, ql):
+    def __init__(self, ql: Qiling):
         super().__init__(ql)
+
         self.ezcov_version = 1
-        self.ezcov_flavor  = 'ezcov'
-        self.basic_blocks  = []
-        self.bb_callback   = None
+        self.ezcov_flavor = 'ezcov'
+        self.basic_blocks: List[bb_entry] = []
+        self.bb_callback = None
 
-    @staticmethod
-    def block_callback(ql, address, size, self):
-        mod = ql.loader.find_containing_image(address)
-        if mod is not None:
-            ent = bb_entry(address - mod.base, size, basename(mod.path))
-            self.basic_blocks.append(ent)
+    def block_callback(self, ql: Qiling, address: int, size: int):
+        img = ql.loader.find_containing_image(address)
 
-    def activate(self):
-        self.bb_callback = self.ql.hook_block(self.block_callback, user_data=self)
+        if img is not None:
+            self.basic_blocks.append(bb_entry(address - img.base, size, os.path.basename(img.path)))
 
-    def deactivate(self):
-        self.ql.hook_del(self.bb_callback)
+    def activate(self) -> None:
+        self.bb_callback = self.ql.hook_block(self.block_callback)
 
-    def dump_coverage(self, coverage_file):
+    def deactivate(self) -> None:
+        if self.bb_callback:
+            self.ql.hook_del(self.bb_callback)
+
+    def dump_coverage(self, coverage_file: str) -> None:
         with open(coverage_file, "w") as cov:
             cov.write(f"EZCOV VERSION: {self.ezcov_version}\n")
             cov.write("# Qiling EZCOV exporter tool\n")
-            for bb in self.basic_blocks:
-                cov.write(bb.csvline())
\ No newline at end of file
+
+            cov.writelines(bb.as_csv() for bb in self.basic_blocks)
diff --git a/qiling/extensions/idaplugin/qilingida.py b/qiling/extensions/idaplugin/qilingida.py
index 9aaebc353..cdbcbf6bd 100644
--- a/qiling/extensions/idaplugin/qilingida.py
+++ b/qiling/extensions/idaplugin/qilingida.py
@@ -5,7 +5,6 @@
 
 import sys
 import collections
-import time
 import struct
 import re
 import logging
@@ -37,9 +36,6 @@
 import ida_netnode
 import ida_hexrays
 import ida_range
-# PyQt
-from PyQt5 import QtCore, QtWidgets
-from PyQt5.QtWidgets import (QPushButton, QHBoxLayout)
 
 # Qiling
 from qiling import Qiling
@@ -55,7 +51,6 @@
 from qiling.os.filestruct import ql_file
 from keystone import *
 
-
 QilingHomePage = 'https://www.qiling.io'
 QilingStableVersionURL = 'https://raw.githubusercontent.com/qilingframework/qiling/master/qiling/__version__.py'
 logging.basicConfig(level=logging.INFO, format='[%(levelname)s][%(module)s:%(lineno)d] %(message)s')
@@ -69,7 +64,27 @@ class Colors(Enum):
     Gray = 0xd9d9d9
     Beige = 0xCCF2FF
 
-class IDA:
+def _load_qt_bindings():
+    if IDA_SDK_VERSION >= 900:
+        try:
+            from PySide6 import QtCore, QtWidgets
+            from PySide6.QtWidgets import (QPushButton, QHBoxLayout)
+            logging.info("Using PySide6 for Qt bindings (IDA >= 9).")
+            return QtCore, QtWidgets, QPushButton, QHBoxLayout
+        except Exception as e:
+            logging.warning("Failed to import PySide6: %s. Trying PyQt5 fallback.", e)
+    try:
+        from PyQt5 import QtCore, QtWidgets
+        from PyQt5.QtWidgets import (QPushButton, QHBoxLayout)
+        logging.info("Using PyQt5 for Qt bindings (IDA < 9 or fallback).")
+        return QtCore, QtWidgets, QPushButton, QHBoxLayout
+    except Exception as e:
+        logging.error("Failed to import PyQt bindings: %s", e)
+        raise
+
+QtCore, QtWidgets, QPushButton, QHBoxLayout = _load_qt_bindings()
+
+class IDABase:
     def __init__(self):
         pass
 
@@ -79,15 +94,15 @@ def get_function(addr):
 
     @staticmethod
     def get_function_start(addr):
-        return IDA.get_function(addr).start_ea
+        return IDABase.get_function(addr).start_ea
 
     @staticmethod
     def get_function_end(addr):
-        return IDA.get_function(addr).end_ea
+        return IDABase.get_function(addr).end_ea
 
     @staticmethod
     def get_function_framesize(addr):
-        return IDA.get_function(addr).frsize
+        return IDABase.get_function(addr).frsize
 
     @staticmethod
     def get_function_name(addr):
@@ -95,7 +110,7 @@ def get_function_name(addr):
 
     @staticmethod
     def get_functions():
-        return [IDA.get_function(func) for func in idautils.Functions()]
+        return [IDABase.get_function(func) for func in idautils.Functions()]
 
     @staticmethod
     def set_color(addr, what, color):
@@ -104,7 +119,7 @@ def set_color(addr, what, color):
     @staticmethod
     def color_block(bb, color):
         for i in range(bb.start_ea, bb.end_ea):
-            IDA.set_color(i, idc.CIC_ITEM, color)
+            IDABase.set_color(i, idc.CIC_ITEM, color)
 
     # note:
     # corresponds to IDA graph view
@@ -113,8 +128,8 @@ def color_block(bb, color):
     # arg can be a function or a (start, end) tuple or an address in the function
     @staticmethod
     def get_flowchart(arg):
-        if type(arg) is int:
-            func = IDA.get_function(arg)
+        if isinstance(arg, int):
+            func = IDABase.get_function(arg)
             if func is None:
                 return None
             return ida_gdl.FlowChart(func)
@@ -122,7 +137,9 @@ def get_flowchart(arg):
 
     @staticmethod
     def get_block(addr):
-        flowchart = IDA.get_flowchart(addr)
+        flowchart = IDABase.get_flowchart(addr)
+        if flowchart is None:
+            return None
         for bb in flowchart:
             if bb.start_ea <= addr and addr < bb.end_ea:
                 return bb
@@ -143,10 +160,10 @@ def block_is_terminating(bb):
 
     @staticmethod
     def get_starting_block(addr):
-        flowchart = IDA.get_flowchart(addr)
+        flowchart = IDABase.get_flowchart(addr)
         if flowchart is None:
             return None
-        func = IDA.get_function(addr)
+        func = IDABase.get_function(addr)
         for bb in flowchart:
             if bb.start_ea == func.start_ea:
                 return bb
@@ -154,8 +171,10 @@ def get_starting_block(addr):
 
     @staticmethod
     def get_terminating_blocks(addr):
-        flowchart = IDA.get_flowchart(addr)
-        return [bb for bb in flowchart if IDA.block_is_terminating(bb)]
+        flowchart = IDABase.get_flowchart(addr)
+        if flowchart is None:
+            return []
+        return [bb for bb in flowchart if IDABase.block_is_terminating(bb)]
 
     @staticmethod
     def get_prev_head(addr, minea=0):
@@ -180,46 +199,45 @@ def get_segment_by_name(name):
 
     @staticmethod
     def __addr_in_seg(addr):
-        segs = IDA.get_segments()
+        segs = IDABase.get_segments()
         for seg in segs:
             if addr < seg.end_ea and addr >= seg.start_ea:
                 return seg
         return None
 
-    # note: accept name and address in the segment
     @staticmethod
     def get_segment(arg):
-        if type(arg) is int:
-            return IDA.__addr_in_seg(arg)
-        else: # str
-            return IDA.get_segment_by_name(arg)
+        if isinstance(arg, int):
+            return IDABase.__addr_in_seg(arg)
+        else:
+            return IDABase.get_segment_by_name(arg)
 
     @staticmethod
     def get_segment_start(arg):
-        seg = IDA.get_segment(arg)
+        seg = IDABase.get_segment(arg)
         if seg is not None:
             return seg.start_ea
         return None
 
     @staticmethod
     def get_segment_end(arg):
-        seg = IDA.get_segment(arg)
+        seg = IDABase.get_segment(arg)
         if seg is not None:
             return seg.end_ea
         return None
 
     @staticmethod
     def get_segment_perm(arg):
-        seg = IDA.get_segment(arg)
+        seg = IDABase.get_segment(arg)
         if seg is not None:
-            return seg.perm # RWX e.g. 0b101 = R + X
+            return seg.perm
         return None
 
     @staticmethod
     def get_segment_type(arg):
-        seg = IDA.get_segment(arg)
+        seg = IDABase.get_segment(arg)
         if seg is not None:
-            return seg.type # 0x1 SEG_DATA 0x2 SEG_CODE See doc for details
+            return seg.type
         return None
 
     @staticmethod
@@ -229,12 +247,10 @@ def get_instruction(addr):
             return None
         return r
 
-    # immidiate value
     @staticmethod
     def get_operand(addr, n):
         return (idc.get_operand_type(addr, n), idc.get_operand_value(addr, n))
 
-    # eax, ecx, etc
     @staticmethod
     def print_operand(addr, n):
         return idc.print_operand(addr, n)
@@ -248,7 +264,7 @@ def get_instructions_count(begin, end):
         p = begin
         cnt = 0
         while p < end:
-            sz = IDA.get_instruction_size(p)
+            sz = IDABase.get_instruction_size(p)
             cnt += 1
             p += sz
         return cnt
@@ -293,96 +309,34 @@ def get_xrefsfrom(addr, flags=ida_xref.XREF_ALL):
     def get_input_file_path():
         return ida_nalt.get_input_file_path()
 
-    @staticmethod
-    def get_info_structure():
-        return ida_idaapi.get_inf_structure()
-
-    @staticmethod
-    def get_main_address():
-        return IDA.get_info_structure().main
-
-    @staticmethod
-    def get_max_address():
-        return IDA.get_info_structure().max_ea
-
-    @staticmethod
-    def get_min_address():
-        return IDA.get_info_structure().min_ea
-
-    @staticmethod
-    def is_big_endian():
-        return IDA.get_info_structure().is_be()
-
-    @staticmethod
-    def is_little_endian():
-        return not IDA.is_big_endian()
-
-    @staticmethod
-    def get_filetype():
-        info = IDA.get_info_structure()
-        ftype = info.filetype
-        if ftype == ida_ida.f_MACHO:
-            return "macho"
-        elif ftype == ida_ida.f_PE or ftype == ida_ida.f_EXE or ftype == ida_ida.f_EXE_old: # is this correct?
-            return "pe"
-        elif ftype == ida_ida.f_ELF:
-            return "elf"
-        else:
-            return None
-
-    @staticmethod
-    def get_ql_arch_string():
-        info = IDA.get_info_structure()
-        proc = info.procname.lower()
-        result = None
-        if proc == "metapc":
-            result = "x86"
-            if info.is_64bit():
-                result = "x8664"
-        elif "mips" in proc:
-            result = "mips"
-        elif "arm" in proc:
-            result = "arm32"
-            if info.is_64bit():
-                result = "arm64"
-        # That's all we support :(
-        return result
-
     @staticmethod
     def get_current_address():
         return ida_kernwin.get_screen_ea()
 
-    # return (?, start, end)
     @staticmethod
     def get_last_selection():
         return ida_kernwin.read_range_selection(None)
 
-    # Use with skipcalls
-    # note that the address is the end of target instruction
-    # e.g.:
-    # 0x1 push eax
-    # 0x4 mov eax, 0
-    # call get_frame_sp_delta(0x4) and get -4.
     @staticmethod
     def get_frame_sp_delta(addr):
-        return ida_frame.get_sp_delta(IDA.get_function(addr), addr)
+        return ida_frame.get_sp_delta(IDABase.get_function(addr), addr)
 
     @staticmethod
     def patch_bytes(addr, bs):
         return ida_bytes.patch_bytes(addr, bs)
 
     @staticmethod
-    def fill_bytes(start, end, bs = b'\x90'):
+    def fill_bytes(start, end, bs=b'\x90'):
         return ida_bytes.patch_bytes(start, bs*(end-start))
 
     @staticmethod
     def nop_selection():
-        _, start, end = IDA.get_last_selection()
-        return IDA.fill_bytes(start, end)
+        _, start, end = IDABase.get_last_selection()
+        return IDABase.fill_bytes(start, end)
 
     @staticmethod
     def fill_block(bb, bs=b'\x90'):
-        return IDA.fill_bytes(bb.start_ea, bb.end_ea, bs)
+        return IDABase.fill_bytes(bb.start_ea, bb.end_ea, bs)
 
     @staticmethod
     def assemble(ea, cs, ip, use32, line):
@@ -394,7 +348,7 @@ def create_data(ea, dataflag, size, tid=ida_netnode.BADNODE):
 
     @staticmethod
     def create_bytes_array(start, end):
-        return IDA.create_data(start, ida_bytes.byte_flag(), end-start)
+        return IDABase.create_data(start, ida_bytes.byte_flag(), end-start)
 
     @staticmethod
     def create_byte(ea, length, force=False):
@@ -418,13 +372,12 @@ def get_item_size(ea):
 
     @staticmethod
     def get_item(ea):
-        return (IDA.get_item_head(ea), IDA.get_item_end(ea))
+        return (IDABase.get_item_head(ea), IDABase.get_item_end(ea))
 
     @staticmethod
     def is_colored_item(ea):
         return ida_nalt.is_colored_item(ea)
 
-    # NOTE: The [start, end) range should include all control flows except long calls.
     @staticmethod
     def get_micro_code_mba(start, end, decomp_flags=ida_hexrays.DECOMP_WARNINGS, maturity=7):
         mbrgs = ida_hexrays.mba_ranges_t()
@@ -444,6 +397,112 @@ def micro_code_from_mbb(mbb):
             cur = cur.next
         return
 
+class IDA7(IDABase):
+    @staticmethod
+    def get_info_structure():
+        return ida_idaapi.get_inf_structure()
+
+    @staticmethod
+    def get_main_address():
+        return IDA7.get_info_structure().main
+
+    @staticmethod
+    def get_max_address():
+        return IDA7.get_info_structure().max_ea
+
+    @staticmethod
+    def get_min_address():
+        return IDA7.get_info_structure().min_ea
+
+    @staticmethod
+    def is_big_endian():
+        return IDA7.get_info_structure().is_be()
+
+    @staticmethod
+    def is_little_endian():
+        return not IDA7.is_big_endian()
+
+    @staticmethod
+    def get_filetype():
+        ftype = IDA7.get_info_structure().filetype
+        if ftype in (ida_ida.f_PE, ida_ida.f_EXE, ida_ida.f_EXE_old):
+            return "pe"
+        elif ftype == ida_ida.f_MACHO:
+            return "macho"
+        elif ftype == ida_ida.f_ELF:
+            return "elf"
+        return None
+
+    @staticmethod
+    def get_ql_arch_string():
+        proc = IDA7.get_info_structure().procname.lower()
+        is_64_bit = IDA7.get_info_structure().is_64bit()
+        if proc == "metapc":
+            return "x8664" if is_64_bit else "x86"
+        if "mips" in proc:
+            return "mips"
+        if "arm" in proc:
+            return "arm64" if is_64_bit else "arm32"
+        return None
+
+class IDA9(IDABase):
+    @staticmethod
+    def get_info_structure():
+        return ida_idaapi.get_inf_structure()
+
+    @staticmethod
+    def get_main_address():
+        return ida_ida.inf_get_main()
+
+    @staticmethod
+    def get_max_address():
+        return ida_ida.inf_get_max_ea()
+
+    @staticmethod
+    def get_min_address():
+        return ida_ida.inf_get_min_ea()
+
+    @staticmethod
+    def is_big_endian():
+        return ida_ida.inf_is_be()
+
+    @staticmethod
+    def is_little_endian():
+        return not ida_ida.inf_is_be()
+
+    @staticmethod
+    def get_filetype():
+        ftype = ida_ida.inf_get_filetype()
+        if ftype in (ida_ida.f_PE, ida_ida.f_EXE, ida_ida.f_EXE_old):
+            return "pe"
+        elif ftype == ida_ida.f_MACHO:
+            return "macho"
+        elif ftype == ida_ida.f_ELF:
+            return "elf"
+        return None
+
+    @staticmethod
+    def get_ql_arch_string():
+        proc = ida_ida.inf_get_procname().lower()
+        is_64_bit = ida_ida.inf_is_64bit()
+        if proc == "metapc":
+            return "x8664" if is_64_bit else "x86"
+        if "mips" in proc:
+            return "mips"
+        if "arm" in proc:
+            return "arm64" if is_64_bit else "arm32"
+        return None
+
+def get_ida_instance():
+    if IDA_SDK_VERSION >= 900:
+        logging.info("Using IDA9 compatibility layer")
+        return IDA9()
+    else:
+        logging.info("Using IDA7 compatibility layer")
+        return IDA7()
+
+IDA = get_ida_instance()
+
 ### View Class
 
 class QlEmuRegView(simplecustviewer_t):
@@ -1006,7 +1065,7 @@ def __init__(self):
     def init(self):
         # init data
         logging.info('---------------------------------------------------------------------------------------')
-        logging.info('Qiling Emulator Plugin For IDA, by Qiling Team. Version {0}, 2020'.format(QLVERSION))
+        logging.info('Qiling Emulator Plugin For IDA, by Qiling Team. Version {0}, 2025'.format(QLVERSION))
         logging.info('Based on Qiling v{0}'.format(QLVERSION))
         logging.info('Find more information about Qiling at https://qiling.io')
         logging.info('---------------------------------------------------------------------------------------')
diff --git a/qiling/extensions/tracing/formats/base.py b/qiling/extensions/tracing/formats/base.py
index 145944340..dbb9c78af 100644
--- a/qiling/extensions/tracing/formats/base.py
+++ b/qiling/extensions/tracing/formats/base.py
@@ -1,5 +1,5 @@
 #!/usr/bin/env python3
-# 
+#
 # Cross Platform and Multi Architecture Advanced Binary Emulation Framework
 # This code structure is copied and modified from the coverage extension
 
@@ -12,24 +12,20 @@ class QlBaseTrace(ABC):
     To add support for a new coverage format, just derive from this class and implement
     all the methods marked with the @abstractmethod decorator.
     """
-    
+
+    FORMAT_NAME: str
+
     def __init__(self):
         super().__init__()
 
-    @property
-    @staticmethod
-    @abstractmethod
-    def FORMAT_NAME():
-        raise NotImplementedError
-
     @abstractmethod
-    def activate(self):
+    def activate(self) -> None:
         pass
 
     @abstractmethod
-    def deactivate(self):
+    def deactivate(self) -> None:
         pass
 
     @abstractmethod
-    def dump_trace(self, trace_file):
+    def dump_trace(self, trace_file: str) -> None:
         pass
\ No newline at end of file
diff --git a/qiling/hw/hw.py b/qiling/hw/hw.py
index 33081a052..15213a7bf 100644
--- a/qiling/hw/hw.py
+++ b/qiling/hw/hw.py
@@ -1,26 +1,70 @@
 #!/usr/bin/env python3
-# 
+#
 # Cross Platform and Multi Architecture Advanced Binary Emulation Framework
 #
 
-import ctypes
+from functools import cached_property
+from typing import Any, Dict, List, Optional, Tuple
 
-from qiling.core import Qiling
+from qiling import Qiling
 from qiling.hw.peripheral import QlPeripheral
 from qiling.utils import ql_get_module_function
 from qiling.exception import QlErrorModuleFunctionNotFound
 
 
+# should adhere to the QlMmioHandler interface, but not extend it directly to
+# avoid potential pickling issues
+class QlPripheralHandler:
+    def __init__(self, hwman: "QlHwManager", base: int, size: int, label: str) -> None:
+        self._hwman = hwman
+        self._base = base
+        self._size = size
+        self._label = label
+
+    def __getstate__(self):
+        state = self.__dict__.copy()
+        del state['_hwman']  # remove non-pickleable reference
+
+        return state
+
+    @cached_property
+    def _mmio(self) -> bytearray:
+        """Get memory buffer used to back non-mapped hardware mmio regions.
+        """
+
+        return bytearray(self._size)
+
+    def read(self, ql: Qiling, offset: int, size: int) -> int:
+        address = self._base + offset
+        hardware = self._hwman.find(address)
+
+        if hardware:
+            return hardware.read(address - hardware.base, size)
+
+        else:
+            ql.log.debug('[%s] read non-mapped hardware [%#010x]', self._label, address)
+            return int.from_bytes(self._mmio[offset:offset + size], byteorder='little')
+
+    def write(self, ql: Qiling, offset: int, size: int, value: int) -> None:
+        address = self._base + offset
+        hardware = self._hwman.find(address)
+
+        if hardware:
+            hardware.write(address - hardware.base, size, value)
+
+        else:
+            ql.log.debug('[%s] write non-mapped hardware [%#010x] = %#010x', self._label, address, value)
+            self._mmio[offset:offset + size] = value.to_bytes(size, 'little')
+
+
 class QlHwManager:
     def __init__(self, ql: Qiling):
         self.ql = ql
 
-        self.entity = {}
-        self.region = {}    
-
-        self.stepable = {}    
+        self.entity: Dict[str, QlPeripheral] = {}
+        self.region: Dict[str, List[Tuple[int, int]]] = {}
 
-    def create(self, label: str, struct: str=None, base: int=None, kwargs: dict={}) -> "QlPeripheral":
+    def create(self, label: str, struct: Optional[str] = None, base: Optional[int] = None, kwargs: Optional[Dict[str, Any]] = None) -> QlPeripheral:
         """ Create the peripheral accroding the label and envs.
 
             struct: Structure of the peripheral. Use defualt ql structure if not provide.
@@ -30,39 +74,45 @@ def create(self, label: str, struct: str=None, base: int=None, kwargs: dict={})
         if struct is None:
             struct, base, kwargs = self.load_env(label.upper())
 
+        if kwargs is None:
+            kwargs = {}
+
         try:
-            
             entity = ql_get_module_function('qiling.hw', struct)(self.ql, label, **kwargs)
-            
-            self.entity[label] = entity
-            if hasattr(entity, 'step'):
-                self.stepable[label] = entity            
 
-            self.region[label] = [(lbound + base, rbound + base) for (lbound, rbound) in entity.region]
+        except QlErrorModuleFunctionNotFound:
+            self.ql.log.warning(f'could not create {struct}({label}): implementation not found')
 
+        else:
+            assert isinstance(entity, QlPeripheral)
+            assert isinstance(base, int)
+
+            self.entity[label] = entity
+            self.region[label] = [(lbound + base, rbound + base) for (lbound, rbound) in entity.region]
 
             return entity
-        except QlErrorModuleFunctionNotFound:
-            self.ql.log.debug(f'The {struct}({label}) has not been implemented')
 
-    def delete(self, label: str):
+        # FIXME: what should we do if struct is not implemented? is it OK to return None, or should we fail?
+
+    def delete(self, label: str) -> None:
         """ Remove the peripheral
         """
+
         if label in self.entity:
-            self.entity.pop(label)
-            self.region.pop(label)
-            if label in self.stepable:
-                self.stepable.pop(label)            
+            del self.entity[label]
+
+        if label in self.region:
+            del self.region[label]
 
-    def load_env(self, label: str):
+    def load_env(self, label: str) -> Tuple[str, int, Dict[str, Any]]:
         """ Get peripheral information (structure, base address, initialization list) from env.
 
         Args:
             label (str): Peripheral Label
-        
+
         """
         args = self.ql.env[label]
-        
+
         return args['struct'], args['base'], args.get("kwargs", {})
 
     def load_all(self):
@@ -70,48 +120,30 @@ def load_all(self):
             if args['type'] == 'peripheral':
                 self.create(label.lower(), args['struct'], args['base'], args.get("kwargs", {}))
 
-    def find(self, address: int):
+    # TODO: this is wasteful. device mapping is known at creation time. at least we could cache lru entries
+    def find(self, address: int) -> Optional[QlPeripheral]:
         """ Find the peripheral at `address`
         """
-        
+
         for label in self.entity.keys():
             for lbound, rbound in self.region[label]:
                 if lbound <= address < rbound:
                     return self.entity[label]
 
+        return None
+
     def step(self):
-        """ Update all peripheral's state 
+        """ Update the state of all peripherals
         """
-        for entity in self.stepable.values():
-            entity.step()
-
-    def setup_mmio(self, begin, size, info=""):
-        mmio = ctypes.create_string_buffer(size)        
-
-        def mmio_read_cb(ql, offset, size):
-            address = begin + offset                        
-            hardware = self.find(address)
-            
-            if hardware:
-                return hardware.read(address - hardware.base, size)
-            else:
-                ql.log.debug('%s Read non-mapped hardware [0x%08x]' % (info, address))                
-                
-                buf = ctypes.create_string_buffer(size)
-                ctypes.memmove(buf, ctypes.addressof(mmio) + offset, size)
-                return int.from_bytes(buf.raw, byteorder='little')
-
-        def mmio_write_cb(ql, offset, size, value):
-            address = begin + offset
-            hardware = self.find(address)
-
-            if hardware:
-                hardware.write(address - hardware.base, size, value)
-            else:
-                ql.log.debug('%s Write non-mapped hardware [0x%08x] = 0x%08x' % (info, address, value))
-                ctypes.memmove(ctypes.addressof(mmio) + offset, (value).to_bytes(size, 'little'), size)
-
-        self.ql.mem.map_mmio(begin, size, mmio_read_cb, mmio_write_cb, info=info)
+
+        for ent in self.entity.values():
+            if hasattr(ent, 'step'):
+                ent.step()
+
+    def setup_mmio(self, begin: int, size: int, info: str) -> None:
+        dev = QlPripheralHandler(self, begin, size, info)
+
+        self.ql.mem.map_mmio(begin, size, dev, info)
 
     def show_info(self):
         self.ql.log.info(f'{"Start":8s}   {"End":8s}   {"Label":8s} {"Class"}')
@@ -131,8 +163,25 @@ def __getattr__(self, key):
         return self.entity.get(key)
 
     def save(self):
-        return {label : entity.save() for label, entity in self.entity.items()}
+        return {
+            'entity': {label: entity.save() for label, entity in self.entity.items()},
+            'region': self.region
+        }
 
     def restore(self, saved_state):
-        for label, data in saved_state.items():
+        entity = saved_state['entity']
+        assert isinstance(entity, dict)
+
+        region = saved_state['region']
+        assert isinstance(region, dict)
+
+        for label, data in entity.items():
             self.entity[label].restore(data)
+
+        self.region = region
+
+        # a dirty hack to rehydrate non-pickleable hwman
+        # a proper fix would require a deeper refactoring to how peripherals are created and managed
+        for *_, ph in self.ql.mem.map_info:
+            if isinstance(ph, QlPripheralHandler):
+                setattr(ph, '_hwman', self)
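Note on `find` above: each region is stored as an absolute, half-open range `[lbound, rbound)` computed at creation time. A minimal standalone sketch of that lookup, using plain dictionaries instead of Qiling objects; the labels and addresses are made up for illustration and `find_label` is a hypothetical helper, not part of the patch:

```python
from typing import Dict, List, Optional, Tuple

# hypothetical data: label -> list of (lbound, rbound) absolute address ranges
regions: Dict[str, List[Tuple[int, int]]] = {
    'usart1': [(0x40011000, 0x40011400)],
    'gpioa':  [(0x40020000, 0x40020400)],
}


def find_label(address: int) -> Optional[str]:
    """Return the label whose region contains `address`, or None."""

    for label, ranges in regions.items():
        for lbound, rbound in ranges:
            # half-open interval: lbound is included, rbound is not
            if lbound <= address < rbound:
                return label

    return None
```

The upper bound is exclusive, so the first address past a region does not match that region.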
diff --git a/qiling/loader/blob.py b/qiling/loader/blob.py
index 382dbb33c..728443391 100644
--- a/qiling/loader/blob.py
+++ b/qiling/loader/blob.py
@@ -4,8 +4,8 @@
 #
 
 from qiling import Qiling
-from qiling.loader.loader import QlLoader
-from qiling.os.memory import QlMemoryHeap
+from qiling.loader.loader import QlLoader, Image
+
 
 class QlLoaderBLOB(QlLoader):
     def __init__(self, ql: Qiling):
@@ -14,15 +14,18 @@ def __init__(self, ql: Qiling):
         self.load_address = 0
 
     def run(self):
-        self.load_address = self.ql.os.entry_point      # for consistency
+        self.load_address = self.ql.os.load_address
+        self.entry_point = self.ql.os.entry_point
 
-        self.ql.mem.map(self.ql.os.entry_point, self.ql.os.code_ram_size, info="[code]")
-        self.ql.mem.write(self.ql.os.entry_point, self.ql.code)
+        code_begins = self.load_address
+        code_size = self.ql.os.code_ram_size
+        code_ends = code_begins + code_size
 
-        heap_address = self.ql.os.entry_point + self.ql.os.code_ram_size
-        heap_size = int(self.ql.os.profile.get("CODE", "heap_size"), 16)
-        self.ql.os.heap = QlMemoryHeap(self.ql, heap_address, heap_address + heap_size)
+        self.ql.mem.map(code_begins, code_size, info="[code]")
+        self.ql.mem.write(code_begins, self.ql.code)
 
-        self.ql.arch.regs.arch_sp = heap_address - 0x1000
+        # allow image-related functionalities
+        self.images.append(Image(code_begins, code_ends, 'blob_code'))
 
-        return
+        # FIXME: stack pointer should be a configurable profile setting
+        self.ql.arch.regs.arch_sp = code_ends - 0x1000
diff --git a/qiling/loader/elf.py b/qiling/loader/elf.py
index 076cb8f0b..a5a8ac6fe 100644
--- a/qiling/loader/elf.py
+++ b/qiling/loader/elf.py
@@ -7,7 +7,7 @@
 import os
 
 from enum import IntEnum
-from typing import AnyStr, Optional, Sequence, Mapping, Tuple
+from typing import Any, AnyStr, Optional, Sequence, Mapping, Tuple
 
 from elftools.common.utils import preserve_stream_pos
 from elftools.elf.constants import P_FLAGS, SH_FLAGS
@@ -330,7 +330,7 @@ def __push_str(top: int, s: str) -> int:
         hwcap_values = {
             (QL_ARCH.ARM,   QL_ENDIAN.EL, 32): 0x001fb8d7,
             (QL_ARCH.ARM,   QL_ENDIAN.EB, 32): 0xd7b81f00,
-            (QL_ARCH.ARM64, QL_ENDIAN.EL, 64): 0x078bfbfd
+            (QL_ARCH.ARM64, QL_ENDIAN.EL, 64): 0x078bfafd
         }
 
         # determine hwcap value by arch properties; if not found default to 0
@@ -701,3 +701,15 @@ def get_elfdata_mapping(self, elffile: ELFFile) -> bytes:
                 elfdata_mapping.extend(sec.data())
 
         return bytes(elfdata_mapping)
+
+    def save(self) -> Mapping[str, Any]:
+        saved = super().save()
+
+        saved['brk_address'] = self.brk_address
+
+        return saved
+
+    def restore(self, saved_state: Mapping[str, Any]):
+        self.brk_address = saved_state['brk_address']
+
+        super().restore(saved_state)
diff --git a/qiling/loader/mcu.py b/qiling/loader/mcu.py
index 8a91d6334..3ad64c3bc 100644
--- a/qiling/loader/mcu.py
+++ b/qiling/loader/mcu.py
@@ -1,7 +1,7 @@
 #!/usr/bin/env python3
-# 
+#
 # Cross Platform and Multi Architecture Advanced Binary Emulation Framework
-# Built on top of Unicorn emulator (www.unicorn-engine.org) 
+# Built on top of Unicorn emulator (www.unicorn-engine.org)
 
 
 import io
@@ -27,7 +27,7 @@ def __init__(self, path):
                 if addr != begin + len(stream):
                     self.segments.append((begin, stream))
                     begin, stream = addr, data
-                
+
                 else:
                     stream += data
 
@@ -36,13 +36,13 @@ def __init__(self, path):
     def parse_line(self, line):
         if len(line) < 9:
             return
-        
+
         desc = line[7: 9]
-        size = int(line[1: 3], 16)        
-        
+        size = int(line[1: 3], 16)
+
         addr = bytes.fromhex(line[3: 7])
-        data = bytes.fromhex(line[9: 9 + size * 2])        
-        
+        data = bytes.fromhex(line[9: 9 + size * 2])
+
         if   desc == '00': # Data
             offset = int.from_bytes(addr, byteorder='big')
             self.mem.append((self.base + offset, data))
@@ -52,20 +52,20 @@ def parse_line(self, line):
 
         elif desc == '04': # Extended Linear Address
             self.base = int.from_bytes(data, byteorder='big') * 0x10000
-        
+
 
 class QlLoaderMCU(QlLoader):
     def __init__(self, ql:Qiling):
-        super().__init__(ql)   
-        
+        super().__init__(ql)
+
         self.entry_point = 0
         self.load_address = 0
         self.filetype = self.guess_filetype()
-        
+
         if self.filetype == 'elf':
             with open(self.ql.path, 'rb') as infile:
                 self.elf = ELFFile(io.BytesIO(infile.read()))
-            
+
         elif self.filetype == 'bin':
             self.map_address = self.argv[1]
 
@@ -74,8 +74,8 @@ def __init__(self, ql:Qiling):
 
     def guess_filetype(self):
         if self.ql.path.endswith('.elf'):
-            return 'elf'            
-            
+            return 'elf'
+
         if self.ql.path.endswith('.bin'):
             return 'bin'
 
@@ -83,7 +83,7 @@ def guess_filetype(self):
             return 'hex'
 
         return 'elf'
-    
+
     def reset(self):
         if self.filetype == 'elf':
             for segment in self.elf.iter_segments(type='PT_LOAD'):
@@ -99,7 +99,7 @@ def reset(self):
             for begin, data in self.ihex.segments:
                 self.ql.mem.write(begin, data)
 
-        
+
         self.ql.arch.init_context()
         self.entry_point = self.ql.arch.regs.read('pc')
 
@@ -109,30 +109,34 @@ def load_profile(self):
     def load_env(self):
         for name, args in self.env.items():
             memtype = args['type']
+
             if memtype == 'memory':
                 size = args['size']
                 base = args['base']
                 self.ql.mem.map(base, size, info=f'[{name}]')
-            
-            if memtype == 'remap':
-                size = args['size']
-                base = args['base']
-                alias = args['alias']
-                self.ql.hw.setup_remap(alias, base, size, info=f'[{name}]')
 
-            if memtype == 'mmio':
+            # elif memtype == 'remap':
+            #     size = args['size']
+            #     base = args['base']
+            #     alias = args['alias']
+            #     self.ql.hw.setup_remap(alias, base, size, info=f'[{name}]')
+
+            elif memtype == 'mmio':
                 size = args['size']
                 base = args['base']
-                self.ql.hw.setup_mmio(base, size, info=f'[{name}]')
+                self.ql.hw.setup_mmio(base, size, name)
 
-            if memtype == 'core':
+            elif memtype == 'core':
                 self.ql.hw.create(name.lower())
 
+            else:
+                self.ql.log.debug(f'ignoring unknown memory type "{memtype}" for {name}')
+
     def run(self):
         self.load_profile()
         self.load_env()
-        
+
         ## Handle interrupt from instruction execution
         self.ql.hook_intr(self.ql.arch.unicorn_exception_handler)
-                
+
         self.reset()
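Note on `parse_line` above: it slices an Intel HEX record into byte count, 16-bit offset, record type and payload, and record type `04` sets the upper 16 bits of the base address. A minimal standalone sketch of that slicing; `parse_record` is a hypothetical helper, and unlike a full parser it does not verify the trailing checksum:

```python
from typing import Tuple


def parse_record(line: str) -> Tuple[int, int, bytes]:
    """Split one Intel HEX record into (record type, 16-bit offset, payload)."""

    # ':' + byte count (2) + offset (4) + type (2) + checksum (2) = 11 chars minimum
    if not line.startswith(':') or len(line) < 11:
        raise ValueError('not an Intel HEX record')

    size = int(line[1:3], 16)
    offset = int(line[3:7], 16)
    rectype = int(line[7:9], 16)
    data = bytes.fromhex(line[9:9 + size * 2])

    return rectype, offset, data


# a data record, then an extended linear address record
data_rec = parse_record(':0300300002337A1E')
ela_type, _, ela_data = parse_record(':020000040800F2')

# record type 04 carries the upper 16 bits of the load address
base = int.from_bytes(ela_data, byteorder='big') * 0x10000
```

With the second record applied, later data records are placed at `base + offset`.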
diff --git a/qiling/loader/pe.py b/qiling/loader/pe.py
index 7e6746de6..30959d0f8 100644
--- a/qiling/loader/pe.py
+++ b/qiling/loader/pe.py
@@ -10,7 +10,8 @@
 import pickle
 import secrets
 import ntpath
-from typing import TYPE_CHECKING, Any, Dict, MutableMapping, NamedTuple, Optional, Mapping, Sequence, Tuple, Union
+from collections import namedtuple
+from typing import TYPE_CHECKING, Any, Dict, List, MutableMapping, NamedTuple, Optional, Mapping, Sequence, Tuple, Union
 
 from unicorn import UcError
 from unicorn.x86_const import UC_X86_REG_CR4, UC_X86_REG_CR8
@@ -29,6 +30,13 @@
     from logging import Logger
     from qiling import Qiling
 
+class ForwardedExport(NamedTuple):
+    source_dll: str
+    source_ordinal: int
+    source_symbol: str
+    target_dll: str
+    target_symbol: str
+
 
 class QlPeCacheEntry(NamedTuple):
     ba: int
@@ -79,6 +87,16 @@ class Process:
     export_symbols: MutableMapping[int, Dict[str, Any]]
     libcache: Optional[QlPeCache]
 
+    # maps image base to RVA of its function table
+    function_table_lookup: Dict[int, int]
+
+    # maps image base to its list of function table entries
+    function_tables: MutableMapping[int, List]
+
+    # List of exports which have been forwarded from
+    # one DLL to another.
+    forwarded_exports: List[ForwardedExport]
+
     def __init__(self, ql: Qiling):
         self.ql = ql
 
@@ -105,6 +123,108 @@ def __get_path_elements(self, name: str) -> Tuple[str, str]:
         vpath = ntpath.join(dirname, basename)
 
         return self.ql.os.path.virtual_to_host_path(vpath), basename.casefold()
+
+    def init_function_tables(self, pe: pefile.PE, image_base: int):
+        """Parse function table data for the given PE file.
+        Only really relevant for non-x86 images.
+
+        Args:
+            pe: the PE image whose function data should be parsed
+            image_base: the absolute address at which the image was loaded
+        """
+        if self.ql.arch.type is not QL_ARCH.X86:
+
+            # Check if the PE file has an exception directory
+            if hasattr(pe, 'DIRECTORY_ENTRY_EXCEPTION'):
+                exception_dir = pe.OPTIONAL_HEADER.DATA_DIRECTORY[
+                    pefile.DIRECTORY_ENTRY['IMAGE_DIRECTORY_ENTRY_EXCEPTION']
+                ]
+
+                self.function_table_lookup[image_base] = exception_dir.VirtualAddress
+
+                runtime_function_list = list(pe.DIRECTORY_ENTRY_EXCEPTION)
+
+                if image_base not in self.function_tables:
+                    self.function_tables[image_base] = []
+
+                self.function_tables[image_base].extend(runtime_function_list)
+
+                self.ql.log.debug(f'Parsed {len(runtime_function_list)} exception directory entries')
+
+            else:
+                self.ql.log.debug('Image has no exception directory; skipping exception data')
+
+    def lookup_function_entry(self, base_addr: int, control_pc: int):
+        """Look up a RUNTIME_FUNCTION entry and its index in a module's
+        function table, such that the given program counter falls within
+        the entry's begin and end range.
+
+        Args:
+            base_addr: The base address of the image whose exception directory to search.
+            control_pc: The program counter.
+
+        Returns:
+            A tuple (index, runtime_function)
+        """
+        function_table = self.function_tables[base_addr]
+
+        # Initiate a search of the function table for a RUNTIME_FUNCTION
+        # entry such that the provided PC falls within its start and end range.
+        return next(((i, rtfunc) for i, rtfunc in enumerate(function_table)
+                     if rtfunc.struct.BeginAddress <= control_pc - base_addr < rtfunc.struct.EndAddress),
+                     (None, None))
+
+    def resolve_forwarded_exports(self):
+        while self.forwarded_exports:
+            forwarded_export = self.forwarded_exports.pop()
+
+            source_dll = forwarded_export.source_dll
+            source_ordinal = forwarded_export.source_ordinal
+            source_symbol = forwarded_export.source_symbol
+            target_dll = forwarded_export.target_dll
+            target_symbol = forwarded_export.target_symbol
+
+            if not source_symbol:
+                # Some DLLs (shlwapi.dll) have a bunch of forwarded
+                # exports with ordinals but no symbols.
+                # These are really annoying to deal with, but they are
+                # used extremely rarely, so we will ignore them.
+                continue
+
+            target_iat = self.import_address_table.get(target_dll)
+
+            if not target_iat:
+                # If IAT was not found, it is probably a virtual library.
+                continue
+
+            # If we have an existing entry in the process IAT for the code
+            # this entry forwards to, then we will point the symbol there
+            # rather than the symbol string in the exporter's data section.
+            forward_ea = target_iat.get(target_symbol)
+
+            if not forward_ea:
+                self.ql.log.warning(f"Forwarding symbol {source_dll}.{source_symbol} to {target_dll}.{target_symbol}: Failed to resolve address")
+                continue
+
+            self.import_address_table[source_dll][source_symbol] = forward_ea
+            self.import_address_table[source_dll][source_ordinal] = forward_ea
+
+            # Register the new address as having the source symbol/ordinal.
+            # This way, hooks on forward source symbols will function
+            # correctly.
+
+            self.import_symbols[forward_ea] = {
+                'name'    : source_symbol,
+                'ordinal' : source_ordinal,
+                'dll'     : source_dll.split('.')[0]
+            }
+
+            # TODO: With the above code, hooks on functions which are
+            # forward targets may not work correctly.
+            # The most correct way to resolve this would be to add
+            # support for addresses to be associated with multiple symbols.
+
+            self.ql.log.debug(f"Forwarding symbol {source_dll}.{source_symbol} to {target_dll}.{target_symbol}: Resolved symbol to ({forward_ea:#x})")
 
     def load_dll(self, name: str, is_driver: bool = False) -> int:
         dll_path, dll_name = self.__get_path_elements(name)
@@ -195,6 +315,9 @@ def load_dll(self, name: str, is_driver: bool = False) -> int:
                 with ShowProgress(self.ql.log, 0.1337):
                     dll.relocate_image(image_base)
 
+            # initialize the function tables only after possible relocation
+            self.init_function_tables(dll, image_base)
+
             data = bytearray(dll.get_memory_mapped_image())
             assert image_size >= len(data)
 
@@ -203,6 +326,31 @@ def load_dll(self, name: str, is_driver: bool = False) -> int:
             for sym in dll.DIRECTORY_ENTRY_EXPORT.symbols:
                 ea = image_base + sym.address
 
+                if sym.forwarder:
+                    # Some exports are forwarders, meaning they
+                    # actually refer to code in other libraries.
+                    #
+                    # For example, calls to
+                    # kernel32.InterlockedPushEntrySList
+                    #   should be forwarded to
+                    # ntdll.RtlInterlockedPushEntrySList
+                    #
+                    # If we do not properly account for forwarders then
+                    # calls to these symbols will land in the exporter's
+                    # data section and cause a lot of problems.
+                    forward_str = sym.forwarder
+
+                    if b'.' in forward_str:
+                        target_dll_name, target_symbol_name = forward_str.split(b'.', 1)
+
+                        target_dll_filename = (target_dll_name.lower() + b'.dll').decode()
+
+                        # Remember the forwarded export for later.
+                        forwarded_export = ForwardedExport(dll_name, sym.ordinal, sym.name,
+                                                           target_dll_filename, target_symbol_name)
+
+                        self.forwarded_exports.append(forwarded_export)
+
                 import_symbols[ea] = {
                     'name'    : sym.name,
                     'ordinal' : sym.ordinal,
@@ -227,6 +375,8 @@ def load_dll(self, name: str, is_driver: bool = False) -> int:
         self.import_address_table[dll_name] = import_table
         self.import_symbols.update(import_symbols)
 
+        self.resolve_forwarded_exports()
+
         dll_base = image_base
         dll_len = image_size
 
@@ -281,8 +431,8 @@ def call_dll_entrypoint(self, dll: pefile.PE, dll_base: int, dll_len: int, dll_n
         # the blacklist may be revisited from time to time to see if any of the file
         # can be safely unlisted.
         blacklist = {
-            32 : ('gdi32.dll',),
-            64 : ('gdi32.dll',)
+            32 : ('gdi32.dll', 'user32.dll'),
+            64 : ('gdi32.dll', 'user32.dll')
         }[self.ql.arch.bits]
 
         if dll_name in blacklist:
@@ -494,6 +644,7 @@ def init_imports(self, pe: pefile.PE, is_driver: bool):
 
                 # DLLs that seem to contain most of the requested symbols
                 key_dlls = (
+                    'kernel32.dll',
                     'ntdll.dll',
                     'kernelbase.dll',
                     'ucrtbase.dll'
@@ -674,12 +825,14 @@ def __init__(self, ql: Qiling, libcache: bool):
     def run(self):
         self.init_dlls = (
             'ntdll.dll',
-            'kernel32.dll',
+            'kernelbase.dll', # kernel32 forwards some exports to kernelbase
+            'kernel32.dll',   # for efficiency, load kernelbase first
             'user32.dll'
         )
 
         self.sys_dlls = (
             'ntdll.dll',
+            'kernelbase.dll',
             'kernel32.dll',
             'mscoree.dll',
             'ucrtbase.dll'
@@ -709,6 +862,9 @@ def run(self):
         self.export_symbols = {}
         self.import_address_table = {}
         self.ldr_list = []
+        self.function_tables = {}
+        self.function_table_lookup = {}
+        self.forwarded_exports = []
         self.pe_image_address = 0
         self.pe_image_size = 0
         self.dll_size = 0
@@ -841,6 +997,9 @@ def load(self, pe: Optional[pefile.PE]):
                 # set up call frame for DllMain
                 self.ql.os.fcall.call_native(self.entry_point, args, None)
 
+            # Initialize the function tables
+            super().init_function_tables(pe, image_base)
+
         elif pe is None:
             self.ql.mem.map(self.entry_point, self.ql.os.code_ram_size, info="[shellcode]")
 
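Note on the forwarder handling in `load_dll` above: a forwarder string has the form `DLLNAME.Symbol`, and the patch splits it on the first dot only. A minimal standalone sketch of that split; `split_forwarder` is a hypothetical helper name, and the sketch assumes the forwarder arrives as `bytes`, as it is handled in the patch:

```python
from typing import Optional, Tuple


def split_forwarder(forwarder: bytes) -> Optional[Tuple[str, bytes]]:
    """Split b'DLL.Symbol' into ('dll.dll', b'Symbol'); None if there is no dot."""

    if b'.' not in forwarder:
        return None

    # split on the first dot only: the symbol part is kept intact
    dll_name, symbol = forwarder.split(b'.', 1)

    # normalize the target DLL name the same way the patch does
    return (dll_name.lower() + b'.dll').decode(), symbol
```

For example, `split_forwarder(b'NTDLL.RtlInterlockedPushEntrySList')` gives the target file name `ntdll.dll` and the symbol name as bytes.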
diff --git a/qiling/os/blob/blob.py b/qiling/os/blob/blob.py
index 02e6f94d3..af52fa74a 100644
--- a/qiling/os/blob/blob.py
+++ b/qiling/os/blob/blob.py
@@ -8,6 +8,8 @@
 from qiling.const import QL_ARCH, QL_OS
 from qiling.os.fcall import QlFunctionCall
 from qiling.os.os import QlOs
+from qiling.os.memory import QlMemoryHeap
+
 
 class QlOsBlob(QlOs):
     """ QlOsBlob for bare barines.
@@ -21,7 +23,7 @@ class QlOsBlob(QlOs):
     type = QL_OS.BLOB
 
     def __init__(self, ql: Qiling):
-        super(QlOsBlob, self).__init__(ql)
+        super().__init__(ql)
 
         self.ql = ql
 
@@ -39,11 +41,20 @@ def __init__(self, ql: Qiling):
         self.fcall = QlFunctionCall(ql, cc)
 
     def run(self):
-        if self.ql.entry_point:
+        # if entry point was set explicitly, override the default one
+        if self.ql.entry_point is not None:
             self.entry_point = self.ql.entry_point
 
-        self.exit_point = self.ql.loader.load_address + len(self.ql.code)
-        if self.ql.exit_point:
-            self.exit_point = self.ql.exit_point
+        self.exit_point = self.load_address + len(self.ql.code)
 
+        # if exit point was set explicitly, override the default one
+        if self.ql.exit_point is not None:
+            self.exit_point = self.ql.exit_point
+
+        # if heap info is provided in profile, create heap
+        heap_base = self.profile.getint('CODE', 'heap_address', fallback=None)
+        heap_size = self.profile.getint('CODE', 'heap_size', fallback=None)
+        if heap_base is not None and heap_size is not None:
+            self.heap = QlMemoryHeap(self.ql, heap_base, heap_base + heap_size)
+
         self.ql.emu_start(self.entry_point, self.exit_point, self.ql.timeout, self.ql.count)
diff --git a/qiling/os/disk.py b/qiling/os/disk.py
index 765712ac1..ddc68e4a3 100644
--- a/qiling/os/disk.py
+++ b/qiling/os/disk.py
@@ -1,23 +1,33 @@
 #!/usr/bin/env python3
-# 
+#
 # Cross Platform and Multi Architecture Advanced Binary Emulation Framework
 #
 
+from typing import AnyStr, Optional, Union
 from .mapper import QlFsMappedObject
 
+ReadableBuffer = Union[bytes, bytearray, memoryview]
+
+
 # Open a file as a Disk
 #     host_path: The file path on the host machine.
 #     drive_path: The drive path on the emulated system. e.g. /dev/sda \\.\PHYSICALDRIVE0 0x80
-# 
+#
 # Note: CHS and LBA support is very limited since a raw file doesn't contain enough information.
 #       We simply assume that it is a disk with 1 head, 1 cylinder and (filesize/512) sectors.
+#
 # See: https://en.wikipedia.org/wiki/Cylinder-head-sector
 #      https://en.wikipedia.org/wiki/Logical_block_addressing
 #      http://www.uruk.org/orig-grub/PC_partitioning.txt
+
 class QlDisk(QlFsMappedObject):
 
-    def __init__(self, host_path, drive_path, n_heads=1, n_cylinders=1, sector_size=512):
-        self._host_path = host_path
+    # 512 bytes/sector
+    # 63 sectors/track
+    # 255 heads (tracks/cylinder)
+    # 1024 cylinders
+
+    def __init__(self, host_path: AnyStr, drive_path, n_cylinders: int = 1, n_heads: int = 1, sector_size: int = 512):
         self._drive_path = drive_path
         self._fp = open(host_path, "rb+")
         self._n_heads = n_heads
@@ -25,7 +35,7 @@ def __init__(self, host_path, drive_path, n_heads=1, n_cylinders=1, sector_size=
         self._sector_size = sector_size
         self.lseek(0, 2)
         self._filesize = self.tell()
-        self._n_sectors = (self._filesize - 1)// self.sector_size + 1
+        self._n_sectors = (self._filesize - 1) // self.sector_size + 1
 
     def __del__(self):
         if not self.fp.closed:
@@ -51,50 +61,43 @@ def n_cylinders(self):
     def sector_size(self):
         return self._sector_size
 
-    @property
-    def host_path(self):
-        return self._host_path
-    
-    @property
-    def drive_path(self):
-        return self._drive_path
-    
     @property
     def fp(self):
         return self._fp
 
     # Methods from FsMappedObject
-    def read(self, l):
-        return self.fp.read(l)
-    
-    def write(self, bs):
-        return self.fp.write(bs)
+    def read(self, size: Optional[int]) -> bytes:
+        return self.fp.read(size)
 
-    def lseek(self, offset, origin):
+    def write(self, buffer: ReadableBuffer) -> int:
+        return self.fp.write(buffer)
+
+    def lseek(self, offset: int, origin: int) -> int:
         return self.fp.seek(offset, origin)
-    
-    def tell(self):
+
+    def tell(self) -> int:
         return self.fp.tell()
 
-    def close(self):
-        return self.fp.close()
-    
+    def close(self) -> None:
+        self.fp.close()
+
     # Methods for QlDisk
-    def lba(self, cylinder, head, sector):
+    def lba(self, cylinder: int, head: int, sector: int) -> int:
         return (cylinder * self.n_heads + head) * self._n_sectors + sector - 1
-    
-    def read_sectors(self, lba, cnt):
+
+    def read_sectors(self, lba: int, cnt: int) -> bytes:
         self.lseek(self.sector_size * lba, 0)
-        return self.read(self.sector_size*cnt)
-    
-    def read_chs(self, cylinder, head, sector, cnt):
+
+        return self.read(self.sector_size * cnt)
+
+    def read_chs(self, cylinder: int, head: int, sector: int, cnt: int) -> bytes:
         return self.read_sectors(self.lba(cylinder, head, sector), cnt)
 
-    def write_sectors(self, lba, cnt, buffer):
-        if len(buffer) > self.sector_size * cnt:
-            buffer = buffer[:self.sector_size*cnt]
+    def write_sectors(self, lba: int, cnt: int, buffer: ReadableBuffer) -> int:
+        buffer = memoryview(buffer)
         self.lseek(self.sector_size * lba, 0)
-        return self.write(buffer)
-    
-    def write_chs(self, cylinder, head, sector, cnt, buffer):
-        return self.write_sectors(self.lba(cylinder, head, sector), cnt, buffer)
\ No newline at end of file
+
+        return self.write(buffer[:self.sector_size * cnt])
+
+    def write_chs(self, cylinder: int, head: int, sector: int, cnt: int, buffer: ReadableBuffer):
+        return self.write_sectors(self.lba(cylinder, head, sector), cnt, buffer)
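Note on `write_sectors` above: the buffer is wrapped in a `memoryview` and sliced to at most `cnt` sectors before writing, so an oversized buffer is truncated without copying. A minimal standalone sketch of that behavior, assuming an in-memory `io.BytesIO` in place of the disk file; the module-level `write_sectors` and `SECTOR_SIZE` here are illustrative stand-ins, not the `QlDisk` method itself:

```python
import io

SECTOR_SIZE = 512  # same default sector size the class uses


def write_sectors(fp, lba: int, cnt: int, buffer) -> int:
    """Write at most `cnt` sectors from `buffer`, starting at sector `lba`."""

    view = memoryview(buffer)
    fp.seek(SECTOR_SIZE * lba, 0)

    # slicing a memoryview does not copy; a shorter buffer is written as-is
    return fp.write(view[:SECTOR_SIZE * cnt])


# a 4-sector blank "disk"; write 600 bytes into a single sector at LBA 1
disk = io.BytesIO(bytes(4 * SECTOR_SIZE))
written = write_sectors(disk, 1, 1, b'A' * 600)
```

Only the first 512 bytes of the 600-byte buffer reach the disk, so the sector at LBA 2 stays untouched.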
diff --git a/qiling/os/dos/interrupts/int21.py b/qiling/os/dos/interrupts/int21.py
index da9ea64e9..0b3dc02f4 100644
--- a/qiling/os/dos/interrupts/int21.py
+++ b/qiling/os/dos/interrupts/int21.py
@@ -9,11 +9,6 @@
 
 from .. import utils
 
-# exit
-def __leaf_4c(ql: Qiling):
-    ql.log.info("Program terminated gracefully")
-    ql.emu_stop()
-
 # write a character to screen
 def __leaf_02(ql: Qiling):
     ch = ql.arch.regs.dl
@@ -131,6 +126,45 @@ def __leaf_43(ql: Qiling):
     ql.arch.regs.cx = 0xffff
     ql.os.clear_cf()
 
+
+def __leaf_48(ql: Qiling):
+    """Allocate memory.
+    """
+
+    size = ql.arch.regs.bx * 0x10
+
+    # announce it but do not do anything really
+    ql.log.debug(f'allocating memory block of {size:#x} bytes')
+
+    # success
+    ql.os.clear_cf()
+
+
+def __leaf_49(ql: Qiling):
+    """Deallocate memory.
+    """
+    ...
+
+
+def __leaf_4a(ql: Qiling):
+    """Modify memory allocation.
+    """
+
+    addr = ql.arch.regs.es
+    size = ql.arch.regs.bx * 0x10
+
+    # announce it but do not do anything really
+    ql.log.debug(f'resizing memory block at {addr:#06x} to {size:#x} bytes')
+
+    # success
+    ql.os.clear_cf()
+
+
+def __leaf_4c(ql: Qiling):
+    ql.log.info("Program terminated gracefully")
+    ql.emu_stop()
+
+
 def handler(ql: Qiling):
     ah = ql.arch.regs.ah
 
diff --git a/qiling/os/linux/kernel_api/kernel_api.py b/qiling/os/linux/kernel_api/kernel_api.py
index c16ed3256..43510e3e7 100644
--- a/qiling/os/linux/kernel_api/kernel_api.py
+++ b/qiling/os/linux/kernel_api/kernel_api.py
@@ -1,5 +1,5 @@
 #!/usr/bin/env python3
-# 
+#
 # Cross Platform and Multi Architecture Advanced Binary Emulation Framework
 #
 
@@ -53,31 +53,63 @@ def hook_mcount(ql, address, params):
     return 0
 
 
-@linux_kernel_api(params={
-    "Ptr": POINTER
-})
-def hook___x86_indirect_thunk_rax(ql, address, params):
-    return 0
+def __x86_indirect_thunk(ql: Qiling, dest: int):
+    ql.log.debug('retpoline to %#010x', dest)
 
+    ql.arch.regs.arch_pc = dest
+
+# using passthru as a hack to prevent the syscall handler from overwriting the instruction pointer
+@linux_kernel_api(passthru=True)
+def hook___x86_indirect_thunk_rax(ql: Qiling, address: int, params):
+    __x86_indirect_thunk(ql, ql.arch.regs.rax)
 
-@linux_kernel_api(params={
-    "Ptr": POINTER
-})
-def hook__copy_to_user(ql, address, params):
-    return 0
+
+@linux_kernel_api(passthru=True)
+def hook___x86_indirect_thunk_r14(ql, address, params):
+    __x86_indirect_thunk(ql, ql.arch.regs.r14)
 
 
 @linux_kernel_api(params={
-    "Ptr": POINTER
+    "ubuf": POINTER,
+    "kbuf": POINTER,
+    "count": SIZE_T
 })
-def hook__copy_from_user(ql, address, params):
+def hook__copy_to_user(ql: Qiling, address: int, params) -> int:
+    ubuf  = params['ubuf']
+    kbuf  = params['kbuf']
+    count = params['count']
+
+    # if user-mode buffer is not available, fail
+    # TODO: also fail if destination is not writeable
+    if not ql.mem.is_mapped(ubuf, count):
+        return count
+
+    data = ql.mem.read(kbuf, count)
+
+    ql.mem.write(ubuf, data)
+
     return 0
 
 
 @linux_kernel_api(params={
-    "Ptr": POINTER
+    "kbuf": POINTER,
+    "ubuf": POINTER,
+    "count": SIZE_T
 })
-def hook___x86_indirect_thunk_r14(ql, address, params):
+def hook__copy_from_user(ql: Qiling, address: int, params) -> int:
+    ubuf  = params['ubuf']
+    kbuf  = params['kbuf']
+    count = params['count']
+
+    # if user-mode buffer is not available, fail
+    # TODO: also fail if source is not readable
+    if not ql.mem.is_mapped(ubuf, count):
+        return count
+
+    data = ql.mem.read(ubuf, count)
+
+    ql.mem.write(kbuf, data)
+
     return 0
 
 
diff --git a/qiling/os/memory.py b/qiling/os/memory.py
index ec643c0e4..760438952 100644
--- a/qiling/os/memory.py
+++ b/qiling/os/memory.py
@@ -6,15 +6,32 @@
 import bisect
 import os
 import re
-from typing import Any, Callable, Iterator, List, Mapping, Optional, Pattern, Sequence, Tuple, Union
+from typing import Any, Callable, Dict, Iterator, List, Mapping, Optional, Pattern, Protocol, Sequence, Tuple, Union
 
 from unicorn import UC_PROT_NONE, UC_PROT_READ, UC_PROT_WRITE, UC_PROT_EXEC, UC_PROT_ALL
 
 from qiling import Qiling
 from qiling.exception import *
 
-# tuple: range start, range end, permissions mask, range label, is mmio?
-MapInfoEntry = Tuple[int, int, int, str, bool]
+
+class QlMmioHandler(Protocol):
+    """A simple MMIO handler boilerplate that can be used to implement memory mapped devices.
+
+    This should be extended to implement mapped devices state machines. Note that the read and write
+    methods are optional, where their existence indicates whether the device supports the corresponding
+    operation. That is, an unimplemented method means the corresponding operation will be silently
+    dropped.
+    """
+
+    def read(self, ql: Qiling, offset: int, size: int) -> int:
+        ...
+
+    def write(self, ql: Qiling, offset: int, size: int, value: int) -> None:
+        ...
+
+
+# tuple: range start, range end, permissions mask, range label, mmio handler object (if mmio range)
+MapInfoEntry = Tuple[int, int, int, str, Optional[QlMmioHandler]]
 
 MmioReadCallback  = Callable[[Qiling, int, int], int]
 MmioWriteCallback = Callable[[Qiling, int, int, int], None]
@@ -29,7 +46,6 @@ class QlMemoryManager:
     def __init__(self, ql: Qiling, pagesize: int = 0x1000):
         self.ql = ql
         self.map_info: List[MapInfoEntry] = []
-        self.mmio_cbs = {}
 
         bit_stuff = {
             64: (1 << 64) - 1,
@@ -48,6 +64,31 @@ def __init__(self, ql: Qiling, pagesize: int = 0x1000):
         # make sure pagesize is a power of 2
         assert self.pagesize & (self.pagesize - 1) == 0, 'pagesize has to be a power of 2'
 
+        self._packers = {
+            (1, True): ql.pack8s,
+            (2, True): ql.pack16s,
+            (4, True): ql.pack32s,
+            (8, True): ql.pack64s,
+
+            (1, False): ql.pack8,
+            (2, False): ql.pack16,
+            (4, False): ql.pack32,
+            (8, False): ql.pack64
+        }
+
+        self._unpackers = {
+            (1, True): ql.unpack8s,
+            (2, True): ql.unpack16s,
+            (4, True): ql.unpack32s,
+            (8, True): ql.unpack64s,
+
+            (1, False): ql.unpack8,
+            (2, False): ql.unpack16,
+            (4, False): ql.unpack32,
+            (8, False): ql.unpack64
+        }
+
+
     def __read_string(self, addr: int) -> str:
         ret = bytearray()
         c = self.read(addr, 1)
@@ -57,7 +98,7 @@ def __read_string(self, addr: int) -> str:
             addr += 1
             c = self.read(addr, 1)
 
-        return ret.decode()
+        return ret.decode('latin1')
 
     def __write_string(self, addr: int, s: str, encoding: str):
         self.write(addr, bytes(s, encoding) + b'\x00')
@@ -80,7 +121,7 @@ def string(self, addr: int, value=None, encoding='utf-8') -> Optional[str]:
 
         self.__write_string(addr, value, encoding)
 
-    def add_mapinfo(self, mem_s: int, mem_e: int, mem_p: int, mem_info: str, is_mmio: bool = False):
+    def add_mapinfo(self, mem_s: int, mem_e: int, mem_p: int, mem_info: str, mmio_ctx: Optional[QlMmioHandler] = None):
         """Add a new memory range to map.
 
         Args:
@@ -88,10 +129,10 @@ def add_mapinfo(self, mem_s: int, mem_e: int, mem_p: int, mem_info: str, is_mmio
             mem_e: memory range end
             mem_p: permissions mask
             mem_info: map entry label
-            is_mmio: memory range is mmio
+            mmio_ctx: mmio handler object; if specified the range will be treated as mmio
         """
 
-        bisect.insort(self.map_info, (mem_s, mem_e, mem_p, mem_info, is_mmio))
+        bisect.insort(self.map_info, (mem_s, mem_e, mem_p, mem_info, mmio_ctx))
 
     def del_mapinfo(self, mem_s: int, mem_e: int):
         """Subtract a memory range from map.
@@ -105,13 +146,13 @@ def del_mapinfo(self, mem_s: int, mem_e: int):
 
         def __split_overlaps():
             for idx in overlap_ranges:
-                lbound, ubound, perms, label, is_mmio = self.map_info[idx]
+                lbound, ubound, perms, label, mmio_ctx = self.map_info[idx]
 
                 if lbound < mem_s:
-                    yield (lbound, mem_s, perms, label, is_mmio)
+                    yield (lbound, mem_s, perms, label, mmio_ctx)
 
                 if mem_e < ubound:
-                    yield (mem_e, ubound, perms, label, is_mmio)
+                    yield (mem_e, ubound, perms, label, mmio_ctx)
 
         # indices of first and last overlapping ranges. since map info is always
         # sorted, we know that all overlapping rages are consecutive, so i1 > i0
@@ -129,27 +170,30 @@ def __split_overlaps():
         for entry in new_entries:
             bisect.insort(self.map_info, entry)
 
-    def change_mapinfo(self, mem_s: int, mem_e: int, mem_p: Optional[int] = None, mem_info: Optional[str] = None):
-        tmp_map_info: Optional[MapInfoEntry] = None
-        info_idx: int = -1
-
-        for idx, map_info in enumerate(self.map_info):
-            if mem_s >= map_info[0] and mem_e <= map_info[1]:
-                tmp_map_info = map_info
-                info_idx = idx
-                break
+    def change_mapinfo(self, mem_s: int, mem_e: int, *, new_perms: Optional[int] = None, new_info: Optional[str] = None) -> None:
+        if new_perms is None and new_info is None:
+            # nothing to do
+            return
 
-        if tmp_map_info is None:
+        try:
+            # locate the map info entry to change
+            entry = next(entry for entry in self.map_info if mem_s >= entry[0] and mem_e <= entry[1])
+        except StopIteration:
             self.ql.log.error(f'Cannot change mapinfo at {mem_s:#08x}-{mem_e:#08x}')
             return
 
-        if mem_p is not None:
-            self.del_mapinfo(mem_s, mem_e)
-            self.add_mapinfo(mem_s, mem_e, mem_p, mem_info if mem_info else tmp_map_info[3])
-            return
+        _, _, perms, info, mmio_ctx = entry
 
-        if mem_info is not None:
-            self.map_info[info_idx] = (tmp_map_info[0], tmp_map_info[1], tmp_map_info[2], mem_info, tmp_map_info[4])
+        # caller wants to change perms?
+        if new_perms is not None:
+            perms = new_perms
+
+        # caller wants to change info?
+        if new_info is not None:
+            info = new_info
+
+        self.del_mapinfo(mem_s, mem_e)
+        self.add_mapinfo(mem_s, mem_e, perms, info, mmio_ctx)
 
     def get_mapinfo(self) -> Sequence[Tuple[int, int, str, str, str]]:
         """Get memory map info.
@@ -168,18 +212,18 @@ def __perms_mapping(ps: int) -> str:
 
             return ''.join(val if idx & ps else '-' for idx, val in perms_d.items())
 
-        def __process(lbound: int, ubound: int, perms: int, label: str, is_mmio: bool) -> Tuple[int, int, str, str, str]:
-            perms_str = __perms_mapping(perms)
+        def __process(entry: MapInfoEntry) -> Tuple[int, int, str, str, str]:
+            lbound, ubound, perms, label, mmio_ctx = entry
 
             if hasattr(self.ql, 'loader'):
                 image = self.ql.loader.find_containing_image(lbound)
-                container = image.path if image and not is_mmio else ''
+                container = image.path if image and mmio_ctx is None else ''
             else:
                 container = ''
 
-            return (lbound, ubound, perms_str, label, container)
+            return (lbound, ubound, __perms_mapping(perms), label, container)
 
-        return tuple(__process(*entry) for entry in self.map_info)
+        return tuple(__process(entry) for entry in self.map_info)
 
     def get_formatted_mapinfo(self) -> Sequence[str]:
         """Get memory map info in a nicely formatted table.
@@ -270,12 +314,13 @@ def save(self):
             "mmio" : []
         }
 
-        for lbound, ubound, perm, label, is_mmio in self.map_info:
-            if is_mmio:
-                mem_dict['mmio'].append((lbound, ubound, perm, label, *self.mmio_cbs[(lbound, ubound)]))
+        for lbound, ubound, perm, label, mmio_ctx in self.map_info:
+            if mmio_ctx is None:
+                key, data = 'ram', bytes(self.read(lbound, ubound - lbound))
             else:
-                data = self.read(lbound, ubound - lbound)
-                mem_dict['ram'].append((lbound, ubound, perm, label, bytes(data)))
+                key, data = 'mmio', mmio_ctx
+
+            mem_dict[key].append((lbound, ubound, perm, label, data))
 
         return mem_dict
 
@@ -294,12 +339,12 @@ def restore(self, mem_dict):
             self.ql.log.debug(f'writing {len(data):#x} bytes at {lbound:#08x}')
             self.write(lbound, data)
 
-        for lbound, ubound, perms, label, read_cb, write_cb in mem_dict['mmio']:
+        for lbound, ubound, perms, label, handler in mem_dict['mmio']:
             self.ql.log.debug(f"restoring mmio range: {lbound:#08x} {ubound:#08x} {label}")
 
             size = ubound - lbound
             if not self.is_mapped(lbound, size):
-                self.map_mmio(lbound, size, read_cb, write_cb, info=label)
+                self.map_mmio(lbound, size, handler, label)
 
     def read(self, addr: int, size: int) -> bytearray:
         """Read bytes from memory.
@@ -328,22 +373,12 @@ def read_ptr(self, addr: int, size: int = 0, *, signed = False) -> int:
         if not size:
             size = self.ql.arch.pointersize
 
-        __unpack = ({
-            1: self.ql.unpack8s,
-            2: self.ql.unpack16s,
-            4: self.ql.unpack32s,
-            8: self.ql.unpack64s
-        } if signed else {
-            1: self.ql.unpack8,
-            2: self.ql.unpack16,
-            4: self.ql.unpack32,
-            8: self.ql.unpack64
-        }).get(size)
-
-        if __unpack is None:
+        try:
+            _unpack = self._unpackers[(size, signed)]
+        except KeyError:
             raise QlErrorStructConversion(f"Unsupported pointer size: {size}")
 
-        return __unpack(self.read(addr, size))
+        return _unpack(self.read(addr, size))
 
     def write(self, addr: int, data: bytes) -> None:
         """Write bytes to a memory.
@@ -369,22 +404,12 @@ def write_ptr(self, addr: int, value: int, size: int = 0, *, signed = False) ->
         if not size:
             size = self.ql.arch.pointersize
 
-        __pack = ({
-            1: self.ql.pack8s,
-            2: self.ql.pack16s,
-            4: self.ql.pack32s,
-            8: self.ql.pack64s
-        } if signed else {
-            1: self.ql.pack8,
-            2: self.ql.pack16,
-            4: self.ql.pack32,
-            8: self.ql.pack64
-        }).get(size)
-
-        if __pack is None:
+        try:
+            _pack = self._packers[(size, signed)]
+        except KeyError:
             raise QlErrorStructConversion(f"Unsupported pointer size: {size}")
 
-        self.write(addr, __pack(value))
+        self.write(addr, _pack(value))
 
     def search(self, needle: Union[bytes, Pattern[bytes]], begin: Optional[int] = None, end: Optional[int] = None) -> List[int]:
         """Search for a sequence of bytes in memory.
@@ -408,7 +433,7 @@ def search(self, needle: Union[bytes, Pattern[bytes]], begin: Optional[int] = No
         assert begin < end, 'search arguments do not make sense'
 
         # narrow the search down to relevant ranges; mmio ranges are excluded due to potential read side effects
-        ranges = [(max(begin, lbound), min(ubound, end)) for lbound, ubound, _, _, is_mmio in self.map_info if not (end < lbound or ubound < begin or is_mmio)]
+        ranges = [(max(begin, lbound), min(ubound, end)) for lbound, ubound, _, _, mmio_ctx in self.map_info if not (end < lbound or ubound < begin or mmio_ctx is not None)]
         results = []
 
         # if needle is a bytes sequence use it verbatim, not as a pattern
@@ -434,9 +459,6 @@ def unmap(self, addr: int, size: int) -> None:
         self.del_mapinfo(addr, addr + size)
         self.ql.uc.mem_unmap(addr, size)
 
-        if (addr, addr + size) in self.mmio_cbs:
-            del self.mmio_cbs[(addr, addr+size)]
-
     def unmap_between(self, mem_s: int, mem_e: int) -> None:
         """Reclaim any allocated memory region within the specified range.
 
@@ -595,7 +617,7 @@ def protect(self, addr: int, size: int, perms):
         aligned_size = self.align_up((addr & (self.pagesize - 1)) + size)
 
         self.ql.uc.mem_protect(aligned_address, aligned_size, perms)
-        self.change_mapinfo(aligned_address, aligned_address + aligned_size, perms)
+        self.change_mapinfo(aligned_address, aligned_address + aligned_size, new_perms=perms)
 
     def map(self, addr: int, size: int, perms: int = UC_PROT_ALL, info: Optional[str] = None):
         """Map a new memory range.
@@ -617,17 +639,17 @@ def map(self, addr: int, size: int, perms: int = UC_PROT_ALL, info: Optional[str
             raise QlMemoryMappedError('Requested memory is unavailable')
 
         self.ql.uc.mem_map(addr, size, perms)
-        self.add_mapinfo(addr, addr + size, perms, info or '[mapped]', is_mmio=False)
+        self.add_mapinfo(addr, addr + size, perms, info or '[mapped]', None)
 
-    def map_mmio(self, addr: int, size: int, read_cb: Optional[MmioReadCallback], write_cb: Optional[MmioWriteCallback], info: str = '[mmio]'):
+    def map_mmio(self, addr: int, size: int, handler: QlMmioHandler, info: str = '[mmio]'):
         # TODO: mmio memory overlap with ram? Is that possible?
         # TODO: Can read_cb or write_cb be None? How uc handle that access?
         prot = UC_PROT_NONE
 
-        if read_cb:
+        if hasattr(handler, 'read'):
             prot |= UC_PROT_READ
 
-        if write_cb:
+        if hasattr(handler, 'write'):
             prot |= UC_PROT_WRITE
 
         # generic mmio read wrapper
@@ -642,10 +664,8 @@ def __mmio_write(uc, offset: int, size: int, value: int, user_data: MmioWriteCal
 
             cb(self.ql, offset, size, value)
 
-        self.ql.uc.mmio_map(addr, size, __mmio_read, read_cb, __mmio_write, write_cb)
-        self.add_mapinfo(addr, addr + size, prot, info, is_mmio=True)
-
-        self.mmio_cbs[(addr, addr + size)] = (read_cb, write_cb)
+        self.ql.uc.mmio_map(addr, size, __mmio_read, getattr(handler, 'read', None), __mmio_write, getattr(handler, 'write', None))
+        self.add_mapinfo(addr, addr + size, prot, info, handler)
 
 
 class Chunk:
diff --git a/qiling/os/os.py b/qiling/os/os.py
index 636e089c4..dd9f38564 100644
--- a/qiling/os/os.py
+++ b/qiling/os/os.py
@@ -89,6 +89,7 @@ def __init__(self, ql: Qiling, resolvers: Mapping[Any, Resolver] = {}):
         if self.ql.code:
             # this shellcode entrypoint does not work for windows
             # windows shellcode entry point will comes from pe loader
+            self.load_address = self.profile.getint('CODE', 'load_address')
             self.entry_point = self.profile.getint('CODE', 'entry_point')
             self.code_ram_size = self.profile.getint('CODE', 'ram_size')
 
diff --git a/qiling/os/posix/const.py b/qiling/os/posix/const.py
index a03f68eaa..6fcea41bc 100644
--- a/qiling/os/posix/const.py
+++ b/qiling/os/posix/const.py
@@ -17,6 +17,9 @@
 # File Open Limits
 NR_OPEN = 1024
 
+# number of signals
+NSIG = 32
+
 SOCK_TYPE_MASK = 0x0f
 
 class linux_x86_socket_types(Enum):
@@ -459,6 +462,8 @@ def __str__(self) -> str:
 #          open flags          #
 ################################
 
+FLAG_UNSUPPORTED = -1
+
 class macos_x86_open_flags(QlPrettyFlag):
     O_RDONLY    = 0x000000
     O_WRONLY    = 0x000001
@@ -473,8 +478,8 @@ class macos_x86_open_flags(QlPrettyFlag):
     O_EXCL      = 0x000800
     O_NOCTTY    = 0x020000
     O_DIRECTORY = 0x100000
-    O_BINARY    = 0x000000
-    O_LARGEFILE = 0x000000
+    O_BINARY    = FLAG_UNSUPPORTED
+    O_LARGEFILE = FLAG_UNSUPPORTED
 
 
 class linux_x86_open_flags(QlPrettyFlag):
@@ -491,8 +496,8 @@ class linux_x86_open_flags(QlPrettyFlag):
     O_EXCL      = 0x000080
     O_NOCTTY    = 0x000100
     O_DIRECTORY = 0x010000
-    O_BINARY    = 0x000000
-    O_LARGEFILE = 0x000000
+    O_BINARY    = FLAG_UNSUPPORTED
+    O_LARGEFILE = FLAG_UNSUPPORTED
 
 
 class linux_arm_open_flags(QlPrettyFlag):
@@ -509,7 +514,7 @@ class linux_arm_open_flags(QlPrettyFlag):
     O_EXCL      = 0x000080
     O_NOCTTY    = 0x000100
     O_DIRECTORY = 0x004000
-    O_BINARY    = 0x000000
+    O_BINARY    = FLAG_UNSUPPORTED
     O_LARGEFILE = 0x020000
 
 
@@ -527,7 +532,7 @@ class linux_mips_open_flags(QlPrettyFlag):
     O_EXCL      = 0x000400
     O_NOCTTY    = 0x000800
     O_DIRECTORY = 0x010000
-    O_BINARY    = 0x000000
+    O_BINARY    = FLAG_UNSUPPORTED
     O_LARGEFILE = 0x002000
 
 
@@ -545,8 +550,8 @@ class linux_riscv_open_flags(QlPrettyFlag):
     O_EXCL      = 0x000080
     O_NOCTTY    = 0x000100
     O_DIRECTORY = 0x010000
-    O_BINARY    = 0x000000
-    O_LARGEFILE = 0x000000
+    O_BINARY    = FLAG_UNSUPPORTED
+    O_LARGEFILE = FLAG_UNSUPPORTED
 
 
 class linux_ppc_open_flags(QlPrettyFlag):
@@ -563,7 +568,7 @@ class linux_ppc_open_flags(QlPrettyFlag):
     O_EXCL      = 0x000080
     O_NOCTTY    = 0x000100
     O_DIRECTORY = 0x004000
-    O_BINARY    = 0x000000
+    O_BINARY    = FLAG_UNSUPPORTED
     O_LARGEFILE = 0x010000
 
 
@@ -581,26 +586,26 @@ class freebsd_x86_open_flags(QlPrettyFlag):
     O_EXCL      = 0x000800
     O_NOCTTY    = 0x008000
     O_DIRECTORY = 0x20000
-    O_BINARY    = 0x000000
-    O_LARGEFILE = 0x000000
+    O_BINARY    = FLAG_UNSUPPORTED
+    O_LARGEFILE = FLAG_UNSUPPORTED
 
 
 class windows_x86_open_flags(QlPrettyFlag):
     O_RDONLY    = 0x000000
     O_WRONLY    = 0x000001
     O_RDWR      = 0x000002
-    O_NONBLOCK  = 0x000000
+    O_NONBLOCK  = FLAG_UNSUPPORTED
     O_APPEND    = 0x000008
-    O_ASYNC     = 0x000000
-    O_SYNC      = 0x000000
-    O_NOFOLLOW  = 0x000000
+    O_ASYNC     = FLAG_UNSUPPORTED
+    O_SYNC      = FLAG_UNSUPPORTED
+    O_NOFOLLOW  = FLAG_UNSUPPORTED
     O_CREAT     = 0x000100
     O_TRUNC     = 0x000200
     O_EXCL      = 0x000400
-    O_NOCTTY    = 0x000000
-    O_DIRECTORY = 0x000000
+    O_NOCTTY    = FLAG_UNSUPPORTED
+    O_DIRECTORY = FLAG_UNSUPPORTED
     O_BINARY    = 0x008000
-    O_LARGEFILE = 0x000000
+    O_LARGEFILE = FLAG_UNSUPPORTED
 
 
 class qnx_arm_open_flags(QlPrettyFlag):
@@ -611,13 +616,13 @@ class qnx_arm_open_flags(QlPrettyFlag):
     O_APPEND    = 0x00008
     O_ASYNC     = 0x10000
     O_SYNC      = 0x00020
-    O_NOFOLLOW  = 0x000000
+    O_NOFOLLOW  = FLAG_UNSUPPORTED
     O_CREAT     = 0x00100
     O_TRUNC     = 0x00200
     O_EXCL      = 0x00400
     O_NOCTTY    = 0x00800
-    O_DIRECTORY = 0x000000
-    O_BINARY    = 0x000000
+    O_DIRECTORY = FLAG_UNSUPPORTED
+    O_BINARY    = FLAG_UNSUPPORTED
     O_LARGEFILE = 0x08000
 
 
diff --git a/qiling/os/posix/const_mapping.py b/qiling/os/posix/const_mapping.py
index 2832ae83b..dd95f717e 100644
--- a/qiling/os/posix/const_mapping.py
+++ b/qiling/os/posix/const_mapping.py
@@ -114,12 +114,12 @@ def ql_open_flag_mapping(ql: Qiling, flags: int) -> int:
     # convert emulated os flags to hosting os flags.
     # flags names are consistent across all classes, even if they are not supported, to maintain compatibility
     for ef in emul_flags:
-        # test whether flag i set, excluding unsupported flags and 0 values
-        if ef and flags & ef.value:
+        # test whether flag is set, excluding unsupported flags
+        if (ef.value != FLAG_UNSUPPORTED) and (flags & ef.value):
             hf = host_flags[ef.name or '']
 
             # if flag is also supported on the host, set it
-            if hf:
+            if hf.value != FLAG_UNSUPPORTED:
                 ret |= hf.value
 
     # NOTE: not sure why this one is needed
diff --git a/qiling/os/posix/posix.py b/qiling/os/posix/posix.py
index 1550ef31f..a0a7b6290 100644
--- a/qiling/os/posix/posix.py
+++ b/qiling/os/posix/posix.py
@@ -10,7 +10,7 @@
 from qiling.const import QL_ARCH, QL_INTERCEPT
 from qiling.exception import QlErrorSyscallNotFound
 from qiling.os.os import QlOs
-from qiling.os.posix.const import NR_OPEN, errors
+from qiling.os.posix.const import NR_OPEN, NSIG, errors
 from qiling.os.posix.msq import QlMsq
 from qiling.os.posix.shm import QlShm
 from qiling.os.posix.syscall.abi import QlSyscallABI, arm, intel, mips, ppc, riscv
@@ -49,7 +49,6 @@ class QlOsPosix(QlOs):
 
     def __init__(self, ql: Qiling):
         super().__init__(ql)
-        self.sigaction_act = [0] * 256
 
         conf = self.profile['KERNEL']
         self.uid = self.euid = conf.getint('uid')
@@ -92,6 +91,11 @@ def __init__(self, ql: Qiling):
 
         self._shm = QlShm()
         self._msq = QlMsq()
+        self._sig = [None] * NSIG
+
+        # a bitmap representing the blocked signals. a set bit at index i means signal i is blocked.
+        # note that SIGKILL and SIGSTOP cannot be blocked.
+        self.blocked_signals = 0
 
     def __get_syscall_mapper(self, archtype: QL_ARCH):
         qlos_path = f'.os.{self.type.name.lower()}.map_syscall'
@@ -264,3 +268,7 @@ def shm(self):
     @property
     def msq(self):
         return self._msq
+
+    @property
+    def sig(self):
+        return self._sig
diff --git a/qiling/os/posix/syscall/__init__.py b/qiling/os/posix/syscall/__init__.py
index 38b10e64e..1ed1125e7 100644
--- a/qiling/os/posix/syscall/__init__.py
+++ b/qiling/os/posix/syscall/__init__.py
@@ -14,6 +14,7 @@
 from .ptrace import *
 from .random import *
 from .resource import *
+from .rseq import *
 from .sched import *
 from .select import *
 from .sendfile import *
diff --git a/qiling/os/posix/syscall/rseq.py b/qiling/os/posix/syscall/rseq.py
new file mode 100644
index 000000000..403595a65
--- /dev/null
+++ b/qiling/os/posix/syscall/rseq.py
@@ -0,0 +1,13 @@
+#!/usr/bin/env python3
+#
+# Cross Platform and Multi Architecture Advanced Binary Emulation Framework
+#
+
+from qiling import Qiling
+
+
+def ql_syscall_rseq(ql: Qiling, rseq: int, rseq_len: int, flags: int, sig: int):
+    # indicate rseq is not supported by this kernel
+    # return -ENOSYS
+
+    return 0
diff --git a/qiling/os/posix/syscall/signal.py b/qiling/os/posix/syscall/signal.py
index c0e4583a7..1591cb558 100644
--- a/qiling/os/posix/syscall/signal.py
+++ b/qiling/os/posix/syscall/signal.py
@@ -3,27 +3,189 @@
 # Cross Platform and Multi Architecture Advanced Binary Emulation Framework
 #
 
-from qiling import Qiling
+from __future__ import annotations
+
+import ctypes
+from typing import TYPE_CHECKING, Type
+
+from qiling.const import QL_ARCH
+from qiling.os import struct
+from qiling.os.posix.const import NSIG
+
+# TODO: MIPS differs in too many details around signals; MIPS implementation is better extracted out
+
+if TYPE_CHECKING:
+    from qiling import Qiling
+    from qiling.arch.arch import QlArch
+
+
+@struct.cache
+def __make_sigset(arch: QlArch):
+    native_type = struct.get_native_type(arch.bits)
+
+    sigset_type = {
+        QL_ARCH.X86:      native_type,
+        QL_ARCH.X8664:    native_type,
+        QL_ARCH.ARM:      native_type,
+        QL_ARCH.ARM64:    native_type,
+        QL_ARCH.MIPS:     ctypes.c_uint32 * (128 // (4 * 8)),
+        QL_ARCH.CORTEX_M: native_type
+    }
+
+    if arch.type not in sigset_type:
+        raise NotImplementedError(f'sigset definition is missing for {arch.type.name}')
+
+    return sigset_type[arch.type]
+
+
+@struct.cache
+def __make_sigaction(arch: QlArch) -> Type[struct.BaseStruct]:
+    native_type = struct.get_native_type(arch.bits)
+    Struct = struct.get_aligned_struct(arch.bits, arch.endian)
+
+    sigset_type = __make_sigset(arch)
+
+    # FIXME: until python 3.11 ctypes Union does not support an endianness that is different from
+    # the hosting platform. if a LE system is emulating a BE one or vice versa, this will fail. to
+    # work around that we avoid using a union and refer to the inner field as 'sa_handler' regardless.
+    #
+    # Union = struct.get_aligned_union(arch.bits)
+    #
+    # class sighandler_union(Union):
+    #     _fields_ = (
+    #         ('sa_handler',   native_type),
+    #         ('sa_sigaction', native_type)
+    #     )
+
+    # <WORKAROUND> see FIXME above
+    class sighandler_union(Struct):
+        _fields_ = (
+            ('sa_handler',   native_type),
+        )
+    # </WORKAROUND>
+
+    # see: https://elixir.bootlin.com/linux/v5.19.17/source/arch/arm/include/uapi/asm/signal.h
+    class arm_sigaction(Struct):
+        _anonymous_ = ('_u',)
+
+        _fields_ = (
+            ('_u',          sighandler_union),
+            ('sa_mask',     sigset_type),
+            ('sa_flags',    native_type),
+            ('sa_restorer', native_type)
+        )
+
+    # see: https://elixir.bootlin.com/linux/v5.19.17/source/arch/x86/include/uapi/asm/signal.h
+    class x86_sigaction(Struct):
+        _anonymous_ = ('_u',)
+
+        _fields_ = (
+            ('_u',          sighandler_union),
+            ('sa_mask',     sigset_type),
+            ('sa_flags',    native_type),
+            ('sa_restorer', native_type)
+        )
+
+    class x8664_sigaction(Struct):
+        _fields_ = (
+            ('sa_handler',  native_type),
+            ('sa_flags',    native_type),
+            ('sa_restorer', native_type),
+            ('sa_mask',     sigset_type)
+        )
+
+    # see: https://elixir.bootlin.com/linux/v5.19.17/source/arch/mips/include/uapi/asm/signal.h
+    class mips_sigaction(Struct):
+        _fields_ = (
+            ('sa_flags',    ctypes.c_uint32),
+            ('sa_handler',  native_type),
+            ('sa_mask',     sigset_type)
+        )
+
+    sigaction_struct = {
+        QL_ARCH.X86:      x86_sigaction,
+        QL_ARCH.X8664:    x8664_sigaction,
+        QL_ARCH.ARM:      arm_sigaction,
+        QL_ARCH.ARM64:    arm_sigaction,
+        QL_ARCH.MIPS:     mips_sigaction,
+        QL_ARCH.CORTEX_M: arm_sigaction
+    }
+
+    if arch.type not in sigaction_struct:
+        raise NotImplementedError(f'sigaction definition is missing for {arch.type.name}')
+
+    return sigaction_struct[arch.type]
+
 
 def ql_syscall_rt_sigaction(ql: Qiling, signum: int, act: int, oldact: int):
+    SIGKILL = 9
+    SIGSTOP = 23 if ql.arch.type is QL_ARCH.MIPS else 19
+
+    if signum not in range(NSIG) or signum in (SIGKILL, SIGSTOP):
+        return -1   # EINVAL
+
+    sigaction = __make_sigaction(ql.arch)
+
     if oldact:
-        arr = ql.os.sigaction_act[signum] or [0] * 5
-        data = b''.join(ql.pack32(key) for key in arr)
+        old = ql.os.sig[signum] or sigaction()
 
-        ql.mem.write(oldact, data)
+        old.save_to(ql.mem, oldact)
 
     if act:
-        ql.os.sigaction_act[signum] = [ql.mem.read_ptr(act + 4 * i, 4) for i in range(5)]
+        ql.os.sig[signum] = sigaction.load_from(ql.mem, act)
 
     return 0
 
 
-def ql_syscall_rt_sigprocmask(ql: Qiling, how: int, nset: int, oset: int, sigsetsize: int):
-    # SIG_BLOCK = 0x0
-    # SIG_UNBLOCK = 0x1
+def __sigprocmask(ql: Qiling, how: int, newset: int, oldset: int):
+    SIG_BLOCK = 0
+    SIG_UNBLOCK = 1
+    SIG_SETMASK = 2
+
+    SIGKILL = 9
+    SIGSTOP = 19
+
+    if oldset:
+        ql.mem.write_ptr(oldset, ql.os.blocked_signals)
+
+    if newset:
+        set_mask = ql.mem.read_ptr(newset)
+
+        if how == SIG_BLOCK:
+            ql.os.blocked_signals |= set_mask
+
+        elif how == SIG_UNBLOCK:
+            ql.os.blocked_signals &= ~set_mask
+
+        elif how == SIG_SETMASK:
+            ql.os.blocked_signals = set_mask
 
+        else:
+            return -1  # EINVAL
+
+        # silently drop attempts to block SIGKILL and SIGSTOP
+        ql.os.blocked_signals &= ~((1 << SIGKILL) | (1 << SIGSTOP))
+
+    return 0
+
+
+def __sigprocmask_mips(ql: Qiling, how: int, newset: int, oldset: int):
+    SIG_BLOCK = 1
+    SIG_UNBLOCK = 2
+    SIG_SETMASK = 3
+
+    SIGKILL = 9
+    SIGSTOP = 23
+
+    # TODO: to implement
     return 0
 
 
+def ql_syscall_rt_sigprocmask(ql: Qiling, how: int, newset: int, oldset: int):
+    impl = __sigprocmask_mips if ql.arch.type is QL_ARCH.MIPS else __sigprocmask
+
+    return impl(ql, how, newset, oldset)
+
+
 def ql_syscall_signal(ql: Qiling, sig: int, sighandler: int):
     return 0
diff --git a/qiling/os/posix/syscall/unistd.py b/qiling/os/posix/syscall/unistd.py
index 62eab143c..09b86c380 100644
--- a/qiling/os/posix/syscall/unistd.py
+++ b/qiling/os/posix/syscall/unistd.py
@@ -152,6 +152,22 @@ def ql_syscall_capset(ql: Qiling, hdrp: int, datap: int):
 
 
 def ql_syscall_kill(ql: Qiling, pid: int, sig: int):
+    if sig not in range(NSIG):
+        return -1   # EINVAL
+
+    if pid > 0 and pid != ql.os.pid:
+        return -1   # ESRCH
+
+    sigaction = ql.os.sig[sig]
+
+    # sa_handler is:
+    #     SIG_DFL for the default action.
+    #     SIG_IGN to ignore this signal.
+    #     handler pointer
+
+    # if sa_flags & SA_SIGINFO:
+    #   call sa_sigaction instead of sa_handler
+
     return 0
 
 
@@ -399,6 +415,8 @@ def ql_syscall_read(ql: Qiling, fd: int, buf: int, length: int):
 
     try:
         data = f.read(length)
+    except IsADirectoryError:
+        return -EISDIR
     except ConnectionError:
         ql.log.debug('read failed due to a connection error')
         return -EIO
diff --git a/qiling/os/uefi/UefiSpec.py b/qiling/os/uefi/UefiSpec.py
index 2259e8c35..583ef6d89 100644
--- a/qiling/os/uefi/UefiSpec.py
+++ b/qiling/os/uefi/UefiSpec.py
@@ -10,6 +10,10 @@
 from .UefiBaseType import *
 from .UefiMultiPhase import *
 
+from .protocols.EfiSimpleTextInProtocol import EFI_SIMPLE_TEXT_INPUT_PROTOCOL
+from .protocols.EfiSimpleTextOutProtocol import EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL
+
+
 # definitions for EFI_TIME.Daylight
 EFI_TIME_ADJUST_DAYLIGHT = (1 << 1)
 EFI_TIME_IN_DAYLIGHT     = (1 << 2)
@@ -223,14 +227,6 @@ class EFI_CONFIGURATION_TABLE(STRUCT):
         ('VendorTable',    PTR(VOID)),
     ]
 
-# TODO: to be implemented
-# @see: MdePkg\Include\Protocol\SimpleTextIn.h
-EFI_SIMPLE_TEXT_INPUT_PROTOCOL = STRUCT
-
-# TODO: to be implemented
-# @see: MdePkg\Include\Protocol\SimpleTextOut.h
-EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL = STRUCT
-
 class EFI_SYSTEM_TABLE(STRUCT):
     _pack_ = 8
 
@@ -264,4 +260,4 @@ class EFI_SYSTEM_TABLE(STRUCT):
     'EFI_DEVICE_PATH_PROTOCOL',
     'EFI_OPEN_PROTOCOL_INFORMATION_ENTRY',
     'EFI_IMAGE_UNLOAD'
-]
\ No newline at end of file
+]
diff --git a/qiling/os/uefi/fncc.py b/qiling/os/uefi/fncc.py
index 83f999bf3..6294cbd74 100644
--- a/qiling/os/uefi/fncc.py
+++ b/qiling/os/uefi/fncc.py
@@ -1,14 +1,14 @@
 #!/usr/bin/env python3
-# 
+#
 # Cross Platform and Multi Architecture Advanced Binary Emulation Framework
 #
 
-from typing import Any, Mapping
+from typing import Any, Mapping, Optional
 
 from qiling import Qiling
 from qiling.const import QL_INTERCEPT
 
-def dxeapi(params: Mapping[str, Any] = {}):
+def dxeapi(params: Optional[Mapping[str, Any]] = None, passthru: bool = False):
     def decorator(func):
         def wrapper(ql: Qiling):
             pc = ql.arch.regs.arch_pc
@@ -18,7 +18,7 @@ def wrapper(ql: Qiling):
             onenter = ql.os.user_defined_api[QL_INTERCEPT.ENTER].get(fname)
             onexit = ql.os.user_defined_api[QL_INTERCEPT.EXIT].get(fname)
 
-            return ql.os.call(pc, f, params, onenter, onexit)
+            return ql.os.call(pc, f, params or {}, onenter, onexit, passthru)
 
         return wrapper
 
diff --git a/qiling/os/uefi/protocols/EfiSimpleTextInProtocol.py b/qiling/os/uefi/protocols/EfiSimpleTextInProtocol.py
new file mode 100644
index 000000000..1a8e3eedd
--- /dev/null
+++ b/qiling/os/uefi/protocols/EfiSimpleTextInProtocol.py
@@ -0,0 +1,56 @@
+#!/usr/bin/env python3
+#
+# Cross Platform and Multi Architecture Advanced Binary Emulation Framework
+#
+
+from qiling.os.const import *
+from qiling.os.uefi.fncc import dxeapi
+from qiling.os.uefi.utils import *
+from qiling.os.uefi.ProcessorBind import *
+from qiling.os.uefi.UefiBaseType import EFI_STATUS, EFI_EVENT
+
+
+# @see: MdePkg/Include/Protocol/SimpleTextIn.h
+class EFI_INPUT_KEY(STRUCT):
+    _fields_ = [
+        ('ScanCode',    UINT16),
+        ('UnicodeChar', CHAR16)
+    ]
+
+class EFI_SIMPLE_TEXT_INPUT_PROTOCOL(STRUCT):
+    EFI_SIMPLE_TEXT_INPUT_PROTOCOL = STRUCT
+
+    _fields_ = [
+        ('Reset',         FUNCPTR(EFI_STATUS, PTR(EFI_SIMPLE_TEXT_INPUT_PROTOCOL), BOOLEAN)),
+        ('ReadKeyStroke', FUNCPTR(EFI_STATUS, PTR(EFI_SIMPLE_TEXT_INPUT_PROTOCOL), PTR(EFI_INPUT_KEY))),
+        ('WaitForKey',    EFI_EVENT)
+    ]
+
+
+@dxeapi(params={
+    "This": POINTER,              # IN PTR(EFI_SIMPLE_TEXT_INPUT_PROTOCOL)
+    "ExtendedVerification": BOOL  # IN BOOLEAN
+})
+def hook_Input_Reset(ql: Qiling, address: int, params):
+    pass
+
+@dxeapi(params={
+    "This": POINTER,  # IN PTR(EFI_SIMPLE_TEXT_INPUT_PROTOCOL)
+    "Key": POINTER    # OUT PTR(EFI_INPUT_KEY)
+})
+def hook_Read_Key_Stroke(ql: Qiling, address: int, params):
+    pass
+
+
+def initialize(ql: Qiling, gIP: int):
+    descriptor = {
+        'struct': EFI_SIMPLE_TEXT_INPUT_PROTOCOL,
+        'fields': (
+            ('Reset',         hook_Input_Reset),
+            ('ReadKeyStroke', hook_Read_Key_Stroke),
+            ('WaitForKey',    None)
+        )
+    }
+
+    instance = init_struct(ql, gIP, descriptor)
+    instance.save_to(ql.mem, gIP)
diff --git a/qiling/os/uefi/protocols/EfiSimpleTextOutProtocol.py b/qiling/os/uefi/protocols/EfiSimpleTextOutProtocol.py
new file mode 100644
index 000000000..d69cd3a37
--- /dev/null
+++ b/qiling/os/uefi/protocols/EfiSimpleTextOutProtocol.py
@@ -0,0 +1,128 @@
+#!/usr/bin/env python3
+#
+# Cross Platform and Multi Architecture Advanced Binary Emulation Framework
+#
+
+from qiling.os.const import *
+from qiling.os.uefi.fncc import dxeapi
+from qiling.os.uefi.utils import *
+from qiling.os.uefi.ProcessorBind import *
+from qiling.os.uefi.UefiBaseType import EFI_STATUS
+
+
+# @see: MdePkg/Include/Protocol/SimpleTextOut.h
+class SIMPLE_TEXT_OUTPUT_MODE(STRUCT):
+    _fields_ = [
+        ("MaxMode",       INT32),
+        ("Mode",          INT32),
+        ("Attribute",     INT32),
+        ("CursorColumn",  INT32),
+        ("CursorRow",     INT32),
+        ("CursorVisible", BOOLEAN),
+    ]
+
+
+class EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL(STRUCT):
+    EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL = STRUCT
+
+    _fields_ = [
+        ("Reset",             FUNCPTR(EFI_STATUS, PTR(EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL), BOOLEAN)),
+        ("OutputString",      FUNCPTR(EFI_STATUS, PTR(EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL), PTR(CHAR16))),
+        ("TestString",        FUNCPTR(EFI_STATUS, PTR(EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL), PTR(CHAR16))),
+        ("QueryMode",         FUNCPTR(EFI_STATUS, PTR(EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL), UINTN, PTR(UINTN), PTR(UINTN))),
+        ("SetMode",           FUNCPTR(EFI_STATUS, PTR(EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL), UINTN)),
+        ("SetAttribute",      FUNCPTR(EFI_STATUS, PTR(EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL), UINTN)),
+        ("ClearScreen",       FUNCPTR(EFI_STATUS, PTR(EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL))),
+        ("SetCursorPosition", FUNCPTR(EFI_STATUS, PTR(EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL), UINTN, UINTN)),
+        ("EnableCursor",      FUNCPTR(EFI_STATUS, PTR(EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL), BOOLEAN)),
+        ("Mode",              PTR(SIMPLE_TEXT_OUTPUT_MODE))
+    ]
+
+
+@dxeapi(params={
+    "This":	POINTER,              # IN PTR(EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL)
+    "ExtendedVerification": BOOL  # IN BOOLEAN
+})
+def hook_TextReset(ql: Qiling, address: int, params):
+    pass
+
+@dxeapi(params={
+    "This":	  POINTER,  # IN PTR(EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL)
+    "String": WSTRING   # IN PTR(CHAR16)
+})
+def hook_OutputString(ql: Qiling, address: int, params):
+    print(params['String'])
+
+    return EFI_SUCCESS
+
+@dxeapi(params={
+    "This":	  POINTER,  # IN PTR(EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL)
+    "String": WSTRING   # IN PTR(CHAR16)
+})
+def hook_TestString(ql: Qiling, address: int, params):
+    pass
+
+@dxeapi(params={
+    "This":	POINTER,          # IN PTR(EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL)
+    "ModeNumber": ULONGLONG,  # IN UINTN
+    "Columns": POINTER,       # OUT PTR(UINTN)
+    "Rows": POINTER           # OUT PTR(UINTN)
+})
+def hook_QueryMode(ql: Qiling, address: int, params):
+    pass
+
+@dxeapi(params={
+    "This":	POINTER,         # IN PTR(EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL)
+    "ModeNumber": ULONGLONG  # IN UINTN
+})
+def hook_SetMode(ql: Qiling, address: int, params):
+    pass
+
+@dxeapi(params={
+    "This":	POINTER,        # IN PTR(EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL)
+    "Attribute": ULONGLONG  # IN UINTN
+})
+def hook_SetAttribute(ql: Qiling, address: int, params):
+    pass
+
+@dxeapi(params={
+    "This":	POINTER   # IN PTR(EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL)
+})
+def hook_ClearScreen(ql: Qiling, address: int, params):
+    pass
+
+@dxeapi(params={
+    "This":	POINTER,      # IN PTR(EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL)
+    "Column": ULONGLONG,  # IN UINTN
+    "Row": ULONGLONG      # IN UINTN
+})
+def hook_SetCursorPosition(ql: Qiling, address: int, params):
+    pass
+
+@dxeapi(params={
+    "This":	POINTER,  # IN PTR(EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL)
+    "Visible": BOOL   # IN BOOLEAN
+})
+def hook_EnableCursor(ql: Qiling, address: int, params):
+    pass
+
+
+def initialize(ql: Qiling, base: int):
+    descriptor = {
+        'struct': EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL,
+        'fields': (
+            ('Reset',             hook_TextReset),
+            ('OutputString',      hook_OutputString),
+            ('TestString',        hook_TestString),
+            ('QueryMode',         hook_QueryMode),
+            ('SetMode',           hook_SetMode),
+            ('SetAttribute',      hook_SetAttribute),
+            ('ClearScreen',       hook_ClearScreen),
+            ('SetCursorPosition', hook_SetCursorPosition),
+            ('EnableCursor',      hook_EnableCursor),
+            ('Mode',              None)
+        )
+    }
+
+    instance = init_struct(ql, base, descriptor)
+    instance.save_to(ql.mem, base)
diff --git a/qiling/os/uefi/st.py b/qiling/os/uefi/st.py
index b5fca9225..305c4664e 100644
--- a/qiling/os/uefi/st.py
+++ b/qiling/os/uefi/st.py
@@ -3,58 +3,81 @@
 # Cross Platform and Multi Architecture Advanced Binary Emulation Framework
 #
 
-from qiling import Qiling
+from __future__ import annotations
+
+from typing import TYPE_CHECKING
+
 from qiling.os.uefi import bs, rt, ds
 from qiling.os.uefi.context import UefiContext
 from qiling.os.uefi.utils import install_configuration_table
-from qiling.os.uefi.UefiSpec import EFI_SYSTEM_TABLE, EFI_BOOT_SERVICES, EFI_RUNTIME_SERVICES
+from qiling.os.uefi.UefiSpec import EFI_SYSTEM_TABLE, EFI_SIMPLE_TEXT_INPUT_PROTOCOL, EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL, EFI_BOOT_SERVICES, EFI_RUNTIME_SERVICES
+
+import qiling.os.uefi.protocols.EfiSimpleTextInProtocol as txt_in
+import qiling.os.uefi.protocols.EfiSimpleTextOutProtocol as txt_out
+
+
+if TYPE_CHECKING:
+    from qiling import Qiling
 
 # static mem layout:
 #
-#        +-- EFI_SYSTEM_TABLE ---------+
-#        |                             |
-#        | ...                         |
-#        | RuntimeServices*     -> (1) |
-#        | BootServices*        -> (2) |
-#        | NumberOfTableEntries        |
-#        | ConfigurationTable*  -> (4) |
-#        +-----------------------------+
-#    (1) +-- EFI_RUNTIME_SERVICES -----+
-#        |                             |
-#        | ...                         |
-#        +-----------------------------+
-#    (2) +-- EFI_BOOT_SERVICES --------+
-#        |                             |
-#        | ...                         |
-#        +-----------------------------+
-#    (3) +-- EFI_DXE_SERVICES ---------+
-#        |                             |
-#        | ...                         |
-#        +-----------------------------+
-#    (4) +-- EFI_CONFIGURATION_TABLE --+        of HOB_LIST
-#        | VendorGuid                  |
-#        | VendorTable*         -> (5) |
-#        +-----------------------------+
-#        +-- EFI_CONFIGURATION_TABLE --+        of DXE_SERVICE_TABLE
-#        | VendorGuid                  |
-#        | VendorTable*         -> (3) |
-#        +-----------------------------+
+#        +-- EFI_SYSTEM_TABLE -----------------+
+#        |                                     |
+#        | ...                                 |
+#        | ConIn*                       -> (1) |
+#        | ConOut*                      -> (2) |
+#        | RuntimeServices*             -> (3) |
+#        | BootServices*                -> (4) |
+#        | NumberOfTableEntries                |
+#        | ConfigurationTable*          -> (6) |
+#        +-------------------------------------+
+#    (1) +-- EFI_SIMPLE_TEXT_INPUT_PROTOCOL ---+
+#        |                                     |
+#        | ...                                 |
+#        +-------------------------------------+
+#    (2) +-- EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL --+
+#        |                                     |
+#        | ...                                 |
+#        +-------------------------------------+
+#    (3) +-- EFI_RUNTIME_SERVICES -------------+
+#        |                                     |
+#        | ...                                 |
+#        +-------------------------------------+
+#    (4) +-- EFI_BOOT_SERVICES ----------------+
+#        |                                     |
+#        | ...                                 |
+#        +-------------------------------------+
+#    (5) +-- EFI_DXE_SERVICES -----------------+
+#        |                                     |
+#        | ...                                 |
+#        +-------------------------------------+
+#    (6) +-- EFI_CONFIGURATION_TABLE ----------+        of HOB_LIST
+#        | VendorGuid                          |
+#        | VendorTable*                 -> (7) |
+#        +-------------------------------------+
+#        +-- EFI_CONFIGURATION_TABLE ----------+        of DXE_SERVICE_TABLE
+#        | VendorGuid                          |
+#        | VendorTable*                 -> (5) |
+#        +-------------------------------------+
 #
 #        ... the remainder of the chunk may be used for additional EFI_CONFIGURATION_TABLE entries
-
+#
 # dynamically allocated (context.conf_table_data_ptr):
 #
-#    (5) +-- VOID* --------------------+
-#        | ...                         |
-#        +-----------------------------+
+#    (7) +-- VOID* ----------------------------+
+#        | ...                                 |
+#        +-------------------------------------+
+
 
 def initialize(ql: Qiling, context: UefiContext, gST: int):
     ql.loader.gST = gST
 
-    gBS = gST + EFI_SYSTEM_TABLE.sizeof()       # boot services
-    gRT = gBS + EFI_BOOT_SERVICES.sizeof()      # runtime services
-    gDS = gRT + EFI_RUNTIME_SERVICES.sizeof()   # dxe services
-    cfg = gDS + ds.EFI_DXE_SERVICES.sizeof()    # configuration tables array
+    sti = gST + EFI_SYSTEM_TABLE.sizeof()                 # simple text input protocol
+    sto = sti + EFI_SIMPLE_TEXT_INPUT_PROTOCOL.sizeof()   # simple text output protocol
+    gRT = sto + EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL.sizeof()  # runtime services
+    gBS = gRT + EFI_RUNTIME_SERVICES.sizeof()             # boot services
+    gDS = gBS + EFI_BOOT_SERVICES.sizeof()                # dxe services
+    cfg = gDS + ds.EFI_DXE_SERVICES.sizeof()              # configuration tables array
 
     ql.log.info(f'Global tables:')
     ql.log.info(f' | gST   {gST:#010x}')
@@ -63,11 +86,16 @@ def initialize(ql: Qiling, context: UefiContext, gST: int):
     ql.log.info(f' | gDS   {gDS:#010x}')
     ql.log.info(f'')
 
+    txt_in.initialize(ql, sti)
+    txt_out.initialize(ql, sto)
+
     bs.initialize(ql, gBS)
     rt.initialize(ql, gRT)
     ds.initialize(ql, gDS)
 
     EFI_SYSTEM_TABLE(
+        ConIn = sti,
+        ConOut = sto,
         RuntimeServices = gRT,
         BootServices = gBS,
         NumberOfTableEntries = 0,
@@ -79,4 +107,4 @@ def initialize(ql: Qiling, context: UefiContext, gST: int):
 
 __all__ = [
     'initialize'
-]
\ No newline at end of file
+]
diff --git a/qiling/os/windows/const.py b/qiling/os/windows/const.py
index d001925e5..6cf06d0ef 100644
--- a/qiling/os/windows/const.py
+++ b/qiling/os/windows/const.py
@@ -38,6 +38,7 @@
 STATUS_PROCEDURE_NOT_FOUND = 0xC000007A
 STATUS_DLL_NOT_FOUND = 0xC0000135
 STATUS_PORT_NOT_SET = 0xC0000353
+STATUS_STACK_BUFFER_OVERRUN = 0xC0000409
 STATUS_NO_YIELD_PERFORMED = 0x40000024
 # ...
 
@@ -638,6 +639,7 @@
 ProcessDebugObjectHandle = 30
 ProcessDebugFlags = 31
 ProcessExecuteFlags = 34
+ProcessCookie = 36
 ProcessImageInformation = 37
 ProcessMitigationPolicy = 52
 ProcessFaultInformation = 63
diff --git a/qiling/os/windows/dlls/kernel32/errhandlingapi.py b/qiling/os/windows/dlls/kernel32/errhandlingapi.py
index 4dba7efb6..ece53aa80 100644
--- a/qiling/os/windows/dlls/kernel32/errhandlingapi.py
+++ b/qiling/os/windows/dlls/kernel32/errhandlingapi.py
@@ -44,15 +44,6 @@ def hook_GetLastError(ql: Qiling, address: int, params):
 def hook_SetLastError(ql: Qiling, address: int, params):
     ql.os.last_error = params['dwErrCode']
 
-# LONG UnhandledExceptionFilter(
-#   _EXCEPTION_POINTERS *ExceptionInfo
-# );
-@winsdkapi(cc=STDCALL, params={
-    'ExceptionInfo' : POINTER
-})
-def hook_UnhandledExceptionFilter(ql: Qiling, address: int, params):
-    return 1
-
 # UINT SetErrorMode(
 #   UINT uMode
 # );
@@ -63,33 +54,6 @@ def hook_SetErrorMode(ql: Qiling, address: int, params):
     # TODO maybe this need a better implementation
     return 0
 
-# __analysis_noreturn VOID RaiseException(
-#   DWORD           dwExceptionCode,
-#   DWORD           dwExceptionFlags,
-#   DWORD           nNumberOfArguments,
-#   const ULONG_PTR *lpArguments
-# );
-@winsdkapi(cc=STDCALL, params={
-    'dwExceptionCode'    : DWORD,
-    'dwExceptionFlags'   : DWORD,
-    'nNumberOfArguments' : DWORD,
-    'lpArguments'        : POINTER
-})
-def hook_RaiseException(ql: Qiling, address: int, params):
-    nNumberOfArguments = params['nNumberOfArguments']
-    lpArguments = params['lpArguments']
-
-    handle = ql.os.handle_manager.search("TopLevelExceptionHandler")
-
-    if handle is None:
-        ql.log.warning(f'RaiseException: top level exception handler not found')
-        return
-
-    exception_handler = handle.obj
-    args = [(PARAM_INTN, ql.mem.read_ptr(lpArguments + i * ql.arch.pointersize)) for i in range(nNumberOfArguments)] if lpArguments else []
-
-    ql.os.fcall.call_native(exception_handler, args, None)
-
 # PVOID AddVectoredExceptionHandler(
 #   ULONG                       First,
 #   PVECTORED_EXCEPTION_HANDLER Handler
@@ -151,3 +115,47 @@ def hook_RemoveVectoredExceptionHandler(ql: Qiling, address: int, params):
     hook.remove()
 
     return 0
+
+# VOID RaiseException(
+#   DWORD     dwExceptionCode,
+#   DWORD     dwExceptionFlags,
+#   DWORD     nNumberOfArguments,
+#   CONST ULONG_PTR* lpArguments
+# );
+@winsdkapi(cc=STDCALL, params={
+    'dwExceptionCode': DWORD,
+    'dwExceptionFlags': DWORD,
+    'nNumberOfArguments': DWORD,
+    'lpArguments': PVOID
+}, passthru=True)
+def hook_RaiseException(ql: Qiling, address: int, params):
+    # On x86_64, RaiseException will call RtlRaiseException,
+    # which calls the exception dispatcher directly. The native
+    # exception dispatching code mostly works correctly
+    # for software exceptions, so we shall simply continue
+    # through to the native dispatcher in this case.
+    if ql.arch.type is not QL_ARCH.X86:
+        return
+
+    # On x86, the situation is different. RtlRaiseException
+    # will call ZwRaiseException, which uses a syscall.
+    # However, Qiling doesn't really support Windows syscalls
+    # right now.
+    # We will treat all exceptions as unhandled exceptions,
+    # which is better than nothing.
+    # TODO: Get kernel exception dispatching working properly,
+    # then first-chance software exceptions, SEH, and C++
+    # exceptions can work on 32-bit Windows too.
+    nNumberOfArguments = params['nNumberOfArguments']
+    lpArguments = params['lpArguments']
+
+    handle = ql.os.handle_manager.search("TopLevelExceptionHandler")
+
+    if handle is None:
+        ql.log.warning('RaiseException: top level exception handler not found')
+        return
+
+    exception_handler = handle.obj
+    args = [(PARAM_INTN, ql.mem.read_ptr(lpArguments + i * ql.arch.pointersize)) for i in range(nNumberOfArguments)] if lpArguments else []
+
+    ql.os.fcall.call_native(exception_handler, args, None)
diff --git a/qiling/os/windows/dlls/kernel32/heapapi.py b/qiling/os/windows/dlls/kernel32/heapapi.py
index c48e5aa8f..871a933fd 100644
--- a/qiling/os/windows/dlls/kernel32/heapapi.py
+++ b/qiling/os/windows/dlls/kernel32/heapapi.py
@@ -49,6 +49,17 @@ def hook_HeapCreate(ql: Qiling, address: int, params):
 
     return ql.os.heap.alloc(dwInitialSize)
 
+def _HeapAlloc(ql: Qiling, address: int, params):
+    dwFlags = params["dwFlags"]
+    dwBytes = params["dwBytes"]
+
+    ptr = ql.os.heap.alloc(dwBytes)
+
+    if ptr and (dwFlags & HEAP_ZERO_MEMORY):
+        __zero_mem(ql.mem, ptr, dwBytes)
+
+    return ptr
+
 # DECLSPEC_ALLOCATOR LPVOID HeapAlloc(
 #   HANDLE hHeap,
 #   DWORD  dwFlags,
@@ -60,15 +71,45 @@ def hook_HeapCreate(ql: Qiling, address: int, params):
     'dwBytes' : SIZE_T
 })
 def hook_HeapAlloc(ql: Qiling, address: int, params):
-    dwFlags = params["dwFlags"]
-    dwBytes = params["dwBytes"]
+    return _HeapAlloc(ql, address, params)
 
-    ptr = ql.os.heap.alloc(dwBytes)
+# DECLSPEC_ALLOCATOR LPVOID HeapReAlloc(
+#   HANDLE                 hHeap,
+#   DWORD                  dwFlags,
+#   _Frees_ptr_opt_ LPVOID lpMem,
+#   SIZE_T                 dwBytes
+# );
+@winsdkapi(cc=STDCALL, params={
+    'hHeap'   : HANDLE,
+    'dwFlags' : DWORD,
+    'lpMem': LPVOID,
+    'dwBytes' : SIZE_T
+})
+def hook_HeapReAlloc(ql: Qiling, address: int, params):
+    base = params["lpMem"]
+    newSize = params["dwBytes"]
 
-    if ptr and (dwFlags & HEAP_ZERO_MEMORY):
-        __zero_mem(ql.mem, ptr, dwBytes)
+    if not base:
+        return _HeapAlloc(ql, address, params)
+
+    if newSize == 0:
+        ql.os.heap.free(base)
+
+        return 0
 
-    return ptr
+    oldSize = ql.os.heap.size(base)
+    oldData = bytes(ql.mem.read(base, oldSize))
+
+    ql.os.heap.free(base)
+
+    if newSize < oldSize:
+        oldData = oldData[0:newSize]
+
+    newBase = ql.os.heap.alloc(newSize)
+    if newBase:
+        ql.mem.write(newBase, oldData)
+
+    return newBase
 
 # SIZE_T HeapSize(
 #   HANDLE  hHeap,
@@ -120,3 +161,29 @@ def hook_HeapSetInformation(ql: Qiling, address: int, params):
 @winsdkapi(cc=STDCALL, params={})
 def hook_GetProcessHeap(ql: Qiling, address: int, params):
     return ql.os.heap.start_address
+
+# BOOL HeapValidate(
+#   HANDLE hHeap,
+#   DWORD  dwFlags,
+#   LPCVOID lpMem
+# );
+@winsdkapi(cc=STDCALL, params={
+    'hHeap': PVOID,
+    'dwFlags': DWORD,
+    'lpMem': PVOID
+})
+def hook_HeapValidate(ql: Qiling, address: int, params):
+    hHeap = params['hHeap']
+    lpMem = params['lpMem']
+
+    if not hHeap:
+        return 0
+    
+    # TODO: Maybe _find is a heap manager implementation
+    # detail, in which case we shouldn't rely on it.
+    chunk = ql.os.heap._find(lpMem)
+
+    if not chunk:
+        return 0
+    
+    return chunk.inuse
diff --git a/qiling/os/windows/dlls/kernel32/winbase.py b/qiling/os/windows/dlls/kernel32/winbase.py
index 6fe624b31..0c5c1d122 100644
--- a/qiling/os/windows/dlls/kernel32/winbase.py
+++ b/qiling/os/windows/dlls/kernel32/winbase.py
@@ -159,24 +159,6 @@ def hook__lwrite(ql: Qiling, address: int, params):
 def hook_FatalExit(ql: Qiling, address: int, params):
     ql.emu_stop()
 
-# PVOID EncodePointer(
-#  _In_ PVOID Ptr
-# );
-@winsdkapi(cc=STDCALL, params={
-    'Ptr' : PVOID
-})
-def hook_EncodePointer(ql: Qiling, address: int, params):
-    return params['Ptr']
-
-# PVOID DecodePointer(
-#  _In_ PVOID Ptr
-# );
-@winsdkapi(cc=STDCALL, params={
-    'Ptr' : PVOID
-})
-def hook_DecodePointer(ql: Qiling, address: int, params):
-    return params['Ptr']
-
 # UINT WinExec(
 #   LPCSTR lpCmdLine,
 #   UINT   uCmdShow
diff --git a/qiling/os/windows/dlls/kernel32/winnls.py b/qiling/os/windows/dlls/kernel32/winnls.py
index 296fe9999..7acebefe2 100644
--- a/qiling/os/windows/dlls/kernel32/winnls.py
+++ b/qiling/os/windows/dlls/kernel32/winnls.py
@@ -80,18 +80,19 @@ def hook_IsValidCodePage(ql: Qiling, address: int, params):
     return 1
 
 def __LCMapString(ql: Qiling, address: int, params, wstring: bool):
-    lpSrcStr: str = params["lpSrcStr"]
+    lpSrcStr: int = params["lpSrcStr"]
+    cchSrc: int = params["cchSrc"]
     lpDestStr: int = params["lpDestStr"]
     cchDest: int = params["cchDest"]
 
-    enc = "utf-16le" if wstring else "utf-8"
-    res = f'{lpSrcStr}\x00'
+    char_size = 2 if wstring else 1
+    byte_count = cchSrc * char_size
 
     if cchDest and lpDestStr:
-        # TODO maybe do some other check, for now is working
-        ql.mem.write(lpDestStr, res.encode(enc))
+        source_bytes = ql.mem.read(lpSrcStr, byte_count)
+        ql.mem.write(lpDestStr, bytes(source_bytes))
 
-    return len(res)
+    return cchSrc
 
 # int LCMapStringW(
 #   LCID    Locale,
@@ -104,9 +105,9 @@ def __LCMapString(ql: Qiling, address: int, params, wstring: bool):
 @winsdkapi(cc=STDCALL, params={
     'Locale'     : LCID,
     'dwMapFlags' : DWORD,
-    'lpSrcStr'   : LPCWSTR,
+    'lpSrcStr'   : POINTER,
     'cchSrc'     : INT,
-    'lpDestStr'  : LPWSTR,
+    'lpDestStr'  : POINTER,
     'cchDest'    : INT
 })
 def hook_LCMapStringW(ql: Qiling, address: int, params):
@@ -123,9 +124,9 @@ def hook_LCMapStringW(ql: Qiling, address: int, params):
 @winsdkapi(cc=STDCALL, params={
     'Locale'     : LCID,
     'dwMapFlags' : DWORD,
-    'lpSrcStr'   : LPCSTR,
+    'lpSrcStr'   : POINTER,
     'cchSrc'     : INT,
-    'lpDestStr'  : LPSTR,
+    'lpDestStr'  : POINTER,
     'cchDest'    : INT
 })
 def hook_LCMapStringA(ql: Qiling, address: int, params):
@@ -145,9 +146,9 @@ def hook_LCMapStringA(ql: Qiling, address: int, params):
 @winsdkapi(cc=STDCALL, params={
     'lpLocaleName'         : LPCWSTR,
     'dwMapFlags'           : DWORD,
-    'lpSrcStr'             : LPCWSTR,
+    'lpSrcStr'             : POINTER,
     'cchSrc'               : INT,
-    'lpDestStr'            : LPWSTR,
+    'lpDestStr'            : POINTER,
     'cchDest'              : INT,
     'lpVersionInformation' : LPNLSVERSIONINFO,
     'lpReserved'           : LPVOID,
diff --git a/qiling/os/windows/dlls/msvcrt.py b/qiling/os/windows/dlls/msvcrt.py
index 2db18455f..e0ad4bebf 100644
--- a/qiling/os/windows/dlls/msvcrt.py
+++ b/qiling/os/windows/dlls/msvcrt.py
@@ -10,7 +10,7 @@
 from qiling.exception import QlErrorNotImplemented
 from qiling.os.const import *
 from qiling.os.windows.fncc import *
-from qiling.os.windows.const import LOCALE
+from qiling.os.windows.const import *
 from qiling.os.windows.handle import Handle
 
 # void __set_app_type (
@@ -135,10 +135,9 @@ def hook__controlfp(ql: Qiling, address: int, params):
 # );
 @winsdkapi(cc=CDECL, params={
     'func' : POINTER
-})
+}, passthru=True)
 def hook_atexit(ql: Qiling, address: int, params):
-    ret = 0
-    return ret
+    return
 
 # char*** __p__environ(void)
 @winsdkapi(cc=CDECL, params={})
@@ -174,17 +173,6 @@ def hook_puts(ql: Qiling, address: int, params):
 def hook__cexit(ql: Qiling, address: int, params):
     pass
 
-# void __cdecl _initterm(
-#    PVFV *,
-#    PVFV *
-# );
-@winsdkapi(cc=CDECL, params={
-    'pfbegin' : POINTER,
-    'pfend'   : POINTER
-})
-def hook__initterm(ql: Qiling, address: int, params):
-    return 0
-
 # void exit(
 #    int const status
 # );
@@ -194,17 +182,6 @@ def hook__initterm(ql: Qiling, address: int, params):
 def hook_exit(ql: Qiling, address: int, params):
     ql.emu_stop()
 
-# int __cdecl _initterm_e(
-#    PVFV *,
-#    PVFV *
-# );
-@winsdkapi(cc=CDECL, params={
-    'pfbegin' : POINTER,
-    'pfend'   : POINTER
-})
-def hook__initterm_e(ql: Qiling, address: int, params):
-    return 0
-
 # char***    __cdecl __p___argv (void);
 @winsdkapi(cc=CDECL, params={})
 def hook___p___argv(ql: Qiling, address: int, params):
@@ -233,20 +210,12 @@ def hook___p___argc(ql: Qiling, address: int, params):
     return ret
 
 # TODO: this one belongs to ucrtbase.dll
-@winsdkapi(cc=CDECL, params={})
+@winsdkapi(cc=CDECL, params={}, passthru=True)
 def hook__get_initial_narrow_environment(ql: Qiling, address: int, params):
-    ret = 0
-
-    for i, (k, v) in enumerate(ql.env.items()):
-        entry = bytes(f'{k}={v}', 'ascii') + b'\x00'
-        p_entry = ql.os.heap.alloc(len(entry))
-
-        ql.mem.write(p_entry, entry)
-
-        if i == 0:
-            ret = p_entry
-
-    return ret
+    # If the native version of this function does not
+    # get to run, then debug versions of the CRT DLLs can fail
+    # their initialization.
+    return
 
 # int sprintf ( char * str, const char * format, ... );
 @winsdkapi(cc=CDECL, params={
@@ -303,13 +272,6 @@ def hook_wprintf(ql: Qiling, address: int, params):
 
     return count
 
-# MSVCRT_FILE * CDECL MSVCRT___acrt_iob_func(unsigned idx)
-@winsdkapi(cc=CDECL, params={
-    'idx': UINT
-})
-def hook___acrt_iob_func(ql: Qiling, address: int, params):
-    return 0
-
 def __stdio_common_vfprintf(ql: Qiling, address: int, params, wstring: bool):
     format = params['_Format']
     arglist = params['_ArgList']
@@ -368,6 +330,18 @@ def __stdio_common_vsprintf(ql: Qiling, address: int, params, wstring: bool):
 def hook___stdio_common_vsprintf(ql: Qiling, address: int, params):
     return __stdio_common_vsprintf(ql, address, params, False)
 
+@winsdkapi(cc=CDECL, params={
+    '_Options'     : PARAM_INT64,
+    '_Buffer'      : POINTER,
+    '_BufferCount' : SIZE_T,
+    '_MaxCount'    : SIZE_T,
+    '_Format'      : STRING,
+    '_Locale'      : DWORD,
+    '_ArgList'     : POINTER
+})
+def hook___stdio_common_vsnprintf(ql: Qiling, address: int, params):
+    return __stdio_common_vsprintf(ql, address, params, False)
+
 @winsdkapi(cc=CDECL, params={
     '_Options'     : PARAM_INT64,
     '_Buffer'      : POINTER,
@@ -379,6 +353,18 @@ def hook___stdio_common_vsprintf(ql: Qiling, address: int, params):
 def hook___stdio_common_vswprintf(ql: Qiling, address: int, params):
     return __stdio_common_vsprintf(ql, address, params, True)
 
+@winsdkapi(cc=CDECL, params={
+    '_Options'     : PARAM_INT64,
+    '_Buffer'      : POINTER,
+    '_BufferCount' : SIZE_T,
+    '_MaxCount'    : SIZE_T,
+    '_Format'      : WSTRING,
+    '_Locale'      : DWORD,
+    '_ArgList'     : POINTER
+})
+def hook___stdio_common_vsnwprintf(ql: Qiling, address: int, params):
+    return __stdio_common_vsprintf(ql, address, params, True)
+
 # all the "_s" versions are aliases to their non-"_s" counterparts
 
 @winsdkapi(cc=CDECL, params={
@@ -412,6 +398,18 @@ def hook___stdio_common_vfwprintf_s(ql: Qiling, address: int, params):
 def hook___stdio_common_vsprintf_s(ql: Qiling, address: int, params):
     return hook___stdio_common_vsprintf.__wrapped__(ql, address, params)
 
+@winsdkapi(cc=CDECL, params={
+    '_Options'     : PARAM_INT64,
+    '_Buffer'      : POINTER,
+    '_BufferCount' : SIZE_T,
+    '_MaxCount'    : SIZE_T,
+    '_Format'      : STRING,
+    '_Locale'      : DWORD,
+    '_ArgList'     : POINTER
+})
+def hook___stdio_common_vsnprintf_s(ql: Qiling, address: int, params):
+    return hook___stdio_common_vsnprintf.__wrapped__(ql, address, params)
+
 @winsdkapi(cc=CDECL, params={
     '_Options'     : PARAM_INT64,
     '_Buffer'      : POINTER,
@@ -423,6 +421,18 @@ def hook___stdio_common_vsprintf_s(ql: Qiling, address: int, params):
 def hook___stdio_common_vswprintf_s(ql: Qiling, address: int, params):
     return hook___stdio_common_vswprintf.__wrapped__(ql, address, params)
 
+@winsdkapi(cc=CDECL, params={
+    '_Options'     : PARAM_INT64,
+    '_Buffer'      : POINTER,
+    '_BufferCount' : SIZE_T,
+    '_MaxCount'    : SIZE_T,
+    '_Format'      : WSTRING,
+    '_Locale'      : DWORD,
+    '_ArgList'     : POINTER
+})
+def hook___stdio_common_vsnwprintf_s(ql: Qiling, address: int, params):
+    return hook___stdio_common_vsnwprintf.__wrapped__(ql, address, params)
+
 @winsdkapi(cc=CDECL, params={})
 def hook___lconv_init(ql: Qiling, address: int, params):
     return 0
@@ -478,43 +488,42 @@ def hook_strncmp(ql: Qiling, address: int, params):
 
     return result
 
-def __malloc(ql: Qiling, address: int, params):
-    size = params['size']
-
-    return ql.os.heap.alloc(size)
-
 @winsdkapi(cc=CDECL, params={
     'size' : UINT
-})
+}, passthru=True)
 def hook__malloc_base(ql: Qiling, address: int, params):
-    return __malloc(ql, address, params)
+    return
 
 # void* malloc（unsigned int size)
 @winsdkapi(cc=CDECL, params={
     'size' : UINT
-})
+}, passthru=True)
 def hook_malloc(ql: Qiling, address: int, params):
-    size = params['size']
-
-    return ql.os.heap.alloc(size)
-
-def __free(ql: Qiling, address: int, params):
-    address = params['address']
+    return
 
-    ql.os.heap.free(address)
+# void* __cdecl _realloc_base(
+#     void*  const block,
+#     size_t const size
+#     )
+@winsdkapi(cc=CDECL, params={
+    'block' : POINTER,
+    'size' : UINT
+}, passthru=True)
+def hook__realloc_base(ql: Qiling, address: int, params):
+    return
 
 @winsdkapi(cc=CDECL, params={
     'address': POINTER
-})
+}, passthru=True)
 def hook__free_base(ql: Qiling, address: int, params):
-    return __free(ql, address, params)
+    return
 
 # void* free（void *address)
 @winsdkapi(cc=CDECL, params={
     'address': POINTER
-})
+}, passthru=True)
 def hook_free(ql: Qiling, address: int, params):
-    return __free(ql, address, params)
+    return
 
 # _onexit_t _onexit(
 #    _onexit_t function
@@ -530,6 +539,27 @@ def hook__onexit(ql: Qiling, address: int, params):
 
     return addr
 
+# _onexit_t __dllonexit(
+#    _onexit_t func,
+#    _PVFV **  pbegin,
+#    _PVFV **  pend
+#    );
+@winsdkapi(cc=CDECL, params={
+    'function': POINTER,
+    'pbegin': POINTER,
+    'pend': POINTER
+})
+def hook___dllonexit(ql: Qiling, address: int, params):
+    function = params['function']
+
+    if function:
+        addr = ql.os.heap.alloc(ql.arch.pointersize)
+        ql.mem.write_ptr(addr, function)
+
+        return addr
+    
+    return 0
+
 # void *memset(
 #    void *dest,
 #    int c,
@@ -539,32 +569,16 @@ def hook__onexit(ql: Qiling, address: int, params):
     'dest'  : POINTER,
     'c'     : INT,
     'count' : SIZE_T
-})
+}, passthru=True)
 def hook_memset(ql: Qiling, address: int, params):
-    dest = params["dest"]
-    c = params["c"]
-    count = params["count"]
-
-    ql.mem.write(dest, bytes([c] * count))
-
-    return dest
-
-def __calloc(ql: Qiling, address: int, params):
-    num = params['num']
-    size = params['size']
-
-    count = num * size
-    ret = ql.os.heap.alloc(count)
-    ql.mem.write(ret, bytes([0] * count))
-
-    return ret
+    return
 
 @winsdkapi(cc=CDECL, params={
     'num'  : SIZE_T,
     'size' : SIZE_T
-})
+}, passthru=True)
 def hook__calloc_base(ql: Qiling, address: int, params):
-    return __calloc(ql, address, params)
+    return
 
 # void *calloc(
 #    size_t num,
@@ -573,9 +587,9 @@ def hook__calloc_base(ql: Qiling, address: int, params):
 @winsdkapi(cc=CDECL, params={
     'num'  : SIZE_T,
     'size' : SIZE_T
-})
+}, passthru=True)
 def hook_calloc(ql: Qiling, address: int, params):
-    return __calloc(ql, address, params)
+    return
 
 # void * memmove(
 #   void *dest,
@@ -586,12 +600,9 @@ def hook_calloc(ql: Qiling, address: int, params):
     'dest' : POINTER,
     'src'  : POINTER,
     'num'  : SIZE_T
-})
+}, passthru=True)
 def hook_memmove(ql: Qiling, address: int, params):
-    data = ql.mem.read(params['src'], params['num'])
-    ql.mem.write(params['dest'], bytes(data))
-
-    return params['dest']
+    return
 
 # int _ismbblead(
 #    unsigned int c
@@ -644,3 +655,18 @@ def hook__time64(ql: Qiling, address: int, params):
         ql.mem.write_ptr(dst, time_wasted, 8)
 
     return time_wasted
+
+# void abort( void );
+@winsdkapi(cc=CDECL, params={})
+def hook_abort(ql: Qiling, address: int, params):
+    # During testing, it was found that programs terminating abnormally
+    # via abort() terminated with exit code=STATUS_STACK_BUFFER_OVERRUN.
+    # According to Microsoft's devblog, this does not necessarily mean
+    # that a stack buffer overrun occurred.
+    # Rather, it can indicate abnormal program termination in a variety of
+    # situations, including abort().
+    # https://devblogs.microsoft.com/oldnewthing/20190108-00/?p=100655
+    #
+    ql.os.exit_code = STATUS_STACK_BUFFER_OVERRUN
+
+    ql.emu_stop()
\ No newline at end of file
diff --git a/qiling/os/windows/dlls/ntdll.py b/qiling/os/windows/dlls/ntdll.py
index 92ba4a84b..16956f73f 100644
--- a/qiling/os/windows/dlls/ntdll.py
+++ b/qiling/os/windows/dlls/ntdll.py
@@ -17,6 +17,7 @@
 from qiling.os.windows import structs
 from qiling.os.windows import utils
 
+from unicorn.x86_const import *
 
 # void *memcpy(
 #    void *dest,
@@ -39,6 +40,7 @@ def hook_memcpy(ql: Qiling, address: int, params):
     return dest
 
 def _QueryInformationProcess(ql: Qiling, address: int, params):
+    handle = params["ProcessHandle"]
     flag = params["ProcessInformationClass"]
     obuf_ptr = params["ProcessInformation"]
     obuf_len = params['ProcessInformationLength']
@@ -68,6 +70,24 @@ def _QueryInformationProcess(ql: Qiling, address: int, params):
 
         res_data = bytes(pci_obj)
 
+    
+    elif flag == ProcessCookie:
+        hCurrentProcess = (1 << ql.arch.bits) - 1
+
+        if handle != hCurrentProcess:
+            # If a process attempts to query the cookie of another
+            # process, then QueryInformationProcess returns an error.
+            return STATUS_INVALID_PARAMETER
+
+        # TODO: Change this to something else,
+        # maybe a static randomly generated value.
+        res_data = ql.pack32(0x00000001)
+
+        if obuf_len != len(res_data):
+            # If the buffer length is not ULONG size
+            # then QueryInformationProcess returns an error.
+            return STATUS_INFO_LENGTH_MISMATCH
+    
     else:
         # TODO: support more info class ("flag") values
         ql.log.info(f'QueryInformationProcess: no implementation for info class {flag:#04x}')
@@ -452,4 +472,429 @@ def hook_wcsstr(ql: Qiling, address: int, params):
 @winsdkapi(cc=STDCALL, params={})
 def hook_CsrGetProcessId(ql: Qiling, address: int, params):
     pid = ql.os.profile["PROCESSES"].getint("csrss.exe", fallback=12345)
-    return pid
\ No newline at end of file
+    return pid
+
+# NTSYSAPI PVOID RtlPcToFileHeader(
+#   [in]  PVOID PcValue,
+#   [out] PVOID *BaseOfImage
+# );
+@winsdkapi(cc=STDCALL, params={
+    'PcValue'    : PVOID,
+    'BaseOfImage': PVOID
+})
+def hook_RtlPcToFileHeader(ql: Qiling, address: int, params):
+    pc = params["PcValue"]
+    base_of_image_ptr = params["BaseOfImage"]
+
+    containing_image = ql.loader.find_containing_image(pc)
+
+    base_addr = containing_image.base if containing_image else 0
+
+    ql.mem.write_ptr(base_of_image_ptr, base_addr)
+    return base_addr
+
+def _FindImageBaseAndFunctionTable(ql: Qiling, control_pc: int, image_base_ptr: int):
+    """
+    Helper function to locate a containing image for `control_pc` as well as its
+    function table, while writing the image base to `image_base_ptr` (if non-zero).
+    Returns:
+        (base_addr, function_table_addr)
+    if a containing image is found, otherwise
+        (0, 0)
+    """
+    containing_image = ql.loader.find_containing_image(control_pc)
+
+    if containing_image:
+        base_addr = containing_image.base
+    else:
+        base_addr = 0
+
+    # Write base address to the ImageBase pointer, if provided
+    if image_base_ptr != 0:
+        ql.mem.write_ptr(image_base_ptr, base_addr)
+
+    # If we don't have a valid base, abort now
+    if base_addr == 0:
+        return 0, 0
+
+    # Look up the function-table RVA and compute the absolute address
+    function_table_rva = ql.loader.function_table_lookup.get(base_addr)
+    function_table_addr = base_addr + function_table_rva if function_table_rva else 0
+
+    return base_addr, function_table_addr
+
+# NTSYSAPI PRUNTIME_FUNCTION RtlLookupFunctionEntry(
+#   [in]  DWORD64               ControlPc,
+#   [out] PDWORD64              ImageBase,
+#   [out] PUNWIND_HISTORY_TABLE HistoryTable
+# );
+@winsdkapi(cc=STDCALL, params={
+    'ControlPc': PVOID,
+    'ImageBase': PVOID,
+    'HistoryTable': PVOID
+})
+def hook_RtlLookupFunctionEntry(ql: Qiling, address: int, params):
+    control_pc = params["ControlPc"]
+    image_base_ptr = params["ImageBase"]
+
+    # TODO: Make use of the history table to optimize this function.
+    # Alternatively, we could add caching to the loader, seeing as the
+    # loader is responsible for lookups in the function table.
+
+    # For simplicity, we are going to ignore the history table.
+    # history_table_ptr = params["HistoryTable"]
+
+    # This function should not be getting called on x86.
+    if ql.arch.type is QL_ARCH.X86:
+        raise QlErrorNotImplemented("RtlLookupFunctionEntry is not implemented for x86")
+
+    base_addr, function_table_addr = _FindImageBaseAndFunctionTable(ql, control_pc, image_base_ptr)
+
+    # If no function table was found, abort.
+    if function_table_addr == 0:
+        return 0
+
+    # Look up the RUNTIME_FUNCTION entry; we are interested in the index in the table
+    # so that we can compute the address.
+    runtime_function_idx, runtime_function = ql.loader.lookup_function_entry(base_addr, control_pc)
+
+    # If a suitable function entry was found,
+    # compute its address and return.
+    if runtime_function:
+        return function_table_addr + runtime_function_idx * 12    # sizeof(RUNTIME_FUNCTION)
+    
+    return 0
+
+# NTSYSAPI
+# PRUNTIME_FUNCTION
+# RtlLookupFunctionTable (
+#     IN PVOID ControlPc,
+#     OUT PVOID *ImageBase,
+#     OUT PULONG SizeOfTable
+# );
+@winsdkapi(cc=STDCALL, params={
+    'ControlPc': PVOID,
+    'ImageBase': PVOID,
+    'SizeOfTable': PVOID
+})
+def hook_RtlLookupFunctionTable(ql: Qiling, address: int, params):
+    control_pc = params["ControlPc"]
+    image_base_ptr = params["ImageBase"]
+    size_of_table_ptr = params["SizeOfTable"]
+
+    # This function should not be getting called on x86.
+    if ql.arch.type is QL_ARCH.X86:
+        raise QlErrorNotImplemented("RtlLookupFunctionTable is not implemented for x86")
+
+    base_addr, function_table_addr = _FindImageBaseAndFunctionTable(ql, control_pc, image_base_ptr)
+
+    # If no function table was found, abort.
+    if function_table_addr == 0:
+        if size_of_table_ptr != 0:
+            ql.mem.write_ptr(size_of_table_ptr, 0, 4)
+        return 0
+    
+    # If a valid pointer for the size was provided,
+    # we want to figure out the size of the table.
+    if size_of_table_ptr != 0:
+        # Look up the function table from the loader,
+        # and get the number of entries.
+        function_table = ql.loader.function_tables[base_addr]
+
+        # compute the total size of the table
+        size_of_table = len(function_table) * 12    # sizeof(RUNTIME_FUNCTION)
+
+        # Write the size to memory at the provided pointer.
+        ql.mem.write_ptr(size_of_table_ptr, size_of_table, 4)
+    
+    return function_table_addr
+
+@winsdkapi(cc=STDCALL, params={})
+def hook_LdrControlFlowGuardEnforced(ql: Qiling, address: int, params):
+    # There are some checks in ntdll for whether CFG is enabled.
+    # We simply bypass these checks by returning 0.
+    # May not be necessary, but we do it just in case.
+    return 0
+
+# NTSYSAPI
+# NTSTATUS
+# ZwRaiseException (
+#     IN PEXCEPTION_RECORD ExceptionRecord,
+#     IN PCONTEXT ContextRecord,
+#     IN BOOLEAN FirstChance
+# );
+@winsdkapi(cc=STDCALL, params={
+    'ExceptionRecord': PVOID,
+    'ContextRecord': PVOID,
+    'FirstChance': BOOLEAN
+}, passthru=True)
+def hook_ZwRaiseException(ql: Qiling, address: int, params):
+    exception_ptr = params['ExceptionRecord']
+    context_ptr = params['ContextRecord']
+    first_chance = params['FirstChance']
+
+    # The native ZwRaiseException simply uses a syscall to start
+    # the kernel exception dispatcher. However, Windows syscalls
+    # are not really working in Qiling right now.
+    # For now, we just provide a workaround for second-chance
+    # exceptions to work.
+    # TODO: Get some kind of solution for kernel exception
+    # dispatching. This is also needed for first-chance exceptions
+    # to work properly on 32-bit Windows.
+    if first_chance:
+        raise QlErrorNotImplemented("ZwRaiseException is not implemented for first-chance exceptions.")
+
+    # In Windows, an unhandled exception triggers the
+    # top-level unhandled exception filter, after which the process
+    # is terminated and error reporting services are called.
+    # Regardless of whether an unhandled exception filter is present,
+    # the process terminates with the same error code that was raised.
+
+    # Our strategy for this hook is to forward second-chance exceptions
+    # to the registered unhandled exception filter, if one exists.
+
+    if exception_ptr:
+        exception_code = ql.mem.read_ptr(exception_ptr, 4) # exception code is always DWORD
+        ql.log.debug(f"[ZwRaiseException] ExceptionCode: 0x{exception_code:08X}")
+    else:
+        ql.log.debug("[ZwRaiseException] ExceptionRecord is NULL")
+        exception_code = 0  # assumption: no record to read a code from; avoids a NameError below
+    ql.log.debug(f"  ContextRecord: 0x{context_ptr:016X}")
+    ql.log.debug(f"  FirstChance: {first_chance}")
+
+    handle = ql.os.handle_manager.search("TopLevelExceptionHandler")
+
+    if handle is None:
+        ql.log.debug('[ZwRaiseException] No top-level exception filter was found.')
+        ql.log.info(f'The process exited with code 0x{exception_code:08X}.')
+
+        ql.os.exit_code = exception_code
+        
+        ql.emu_stop()
+        return
+
+    ret_addr = ql.stack_read(0)
+
+    exception_filter = handle.obj
+
+    # allocate some memory for the EXCEPTION_POINTERS struct
+    epointers_struct = structs.make_exception_pointers(ql.arch.bits)
+    exception_pointers_ptr = ql.os.heap.alloc(epointers_struct.sizeof())
+
+    with epointers_struct.ref(ql.mem, exception_pointers_ptr) as epointers_obj:
+        epointers_obj.ExceptionRecord = exception_ptr
+        epointers_obj.ContextRecord = context_ptr
+
+    exception_filter = handle.obj
+    ql.log.debug(f'[ZwRaiseException] Resuming execution at the top-level exception filter at 0x{exception_filter:08X}.')
+
+    # Hack: We are going to fake that the caller of ZwRaiseException
+    # actually called the unhandled exception filter instead.
+
+    # We will create a hook which will be triggered when the unhandled
+    # exception filter returns, so that we may terminate execution.
+    def __post_exception_filter(ql: Qiling):
+        # Free the exception pointers struct we allocated earlier.
+        # Might not be needed, since we are going to terminate the process
+        # soon, but we might as well free it.
+        ql.os.heap.free(exception_pointers_ptr)
+
+        ql.log.debug(f'[ZwRaiseException] Returned from unhandled exception filter at 0x{exception_filter:08X}.')
+        ql.log.info(f'The process exited with code 0x{exception_code:08X}.')
+
+        ql.os.exit_code = exception_code
+
+        ql.emu_stop()
+
+    ql.hook_address(__post_exception_filter, ret_addr)
+
+    exception_filter_args = [(POINTER, exception_pointers_ptr)]
+
+    # Resume execution at the registered unhandled exception filter.
+    # If a program is using a custom unhandled exception filter as an anti-debugging
+    # trick, then the exception filter might not return.
+    
+    # TODO: This relies on the hook being marked 'passthru' so that Qiling
+    # doesn't rewind after it returns. However, this is not entirely intended
+    # behavior of passthru, so this is a bit of a hack. Maybe find some
+    # way to rewrite without passthru.
+    ql.os.fcall.call_native(exception_filter, exception_filter_args, ret_addr)
+
+# NTSTATUS EtwNotificationRegister(
+#   LPCGUID   ProviderGuid,
+#   ULONG     Type,
+#   PVOID     CallbackFunction,
+#   PVOID     CallbackContext,
+#   PVOID*    RegistrationHandle
+# );
+@winsdkapi(cc=STDCALL, params={
+    'ProviderGuid': PVOID,
+    'Type': DWORD,
+    'CallbackFunction': PVOID,
+    'CallbackContext': PVOID,
+    'RegistrationHandle': PVOID
+})
+def hook_EtwNotificationRegister(ql: Qiling, address: int, params):
+    reg_handle_ptr    = params['RegistrationHandle']
+
+    # It is very important to have a hook for this function
+    # because it is called by some Windows DLLs (sechost.dll,
+    # advapi32.dll) during initialization when the global
+    # CRT lock is held.
+    # If a DllMain aborts here, then the global CRT lock is never
+    # freed and any attempt to lock the global CRT lock *anywhere*
+    # will crash us.
+
+    # TODO: See if a more thorough implementation
+    # is needed for this function.
+
+    # For now, just create a dummy handle, and return it.
+    handle = Handle()
+    ql.os.handle_manager.append(handle)
+
+    if reg_handle_ptr:
+        ql.mem.write_ptr(reg_handle_ptr, handle.id)
+
+    return STATUS_SUCCESS
+
+# NTSYSAPI
+# VOID RtlRaiseException(
+#   PEXCEPTION_RECORD ExceptionRecord
+# );
+@winsdkapi(cc=STDCALL, params={
+    'ExceptionRecord': PVOID
+}, passthru=True)
+def hook_RtlRaiseException(ql: Qiling, address: int, params):
+    return
+
+# NTSYSAPI
+# PRUNTIME_FUNCTION RtlVirtualUnwind(
+#   DWORD  HandlerType,
+#   DWORD64 ImageBase,
+#   DWORD64 ControlPc,
+#   PRUNTIME_FUNCTION FunctionEntry,
+#   PCONTEXT ContextRecord,
+#   PVOID* HandlerData,
+#   PDWORD64 EstablisherFrame,
+#   PKNONVOLATILE_CONTEXT_POINTERS ContextPointers
+# );
+@winsdkapi(cc=STDCALL, params={
+    'HandlerType': DWORD,
+    'ImageBase': PVOID,
+    'ControlPc': PVOID,
+    'FunctionEntry': PVOID,
+    'ContextRecord': PVOID,
+    'HandlerData': PVOID,
+    'EstablisherFrame': PVOID,
+    'ContextPointers': PVOID
+}, passthru=True)
+def hook_RtlVirtualUnwind(ql: Qiling, address: int, params):
+    return
+
+# NTSYSAPI
+# VOID RtlUnwindEx(
+#   PVOID               TargetFrame,
+#   PVOID               TargetIp,
+#   PEXCEPTION_RECORD   ExceptionRecord,
+#   PVOID               ReturnValue,
+#   PCONTEXT            OriginalContext,
+#   PUNWIND_HISTORY_TABLE HistoryTable
+# );
+@winsdkapi(cc=STDCALL, params={
+    'TargetFrame': PVOID,
+    'TargetIp': PVOID,
+    'ExceptionRecord': PVOID,
+    'ReturnValue': PVOID,
+    'OriginalContext': PVOID,
+    'HistoryTable': PVOID
+}, passthru=True)
+def hook_RtlUnwindEx(ql: Qiling, address: int, params):
+    return
+
+# NTSYSAPI
+# BOOLEAN RtlDispatchException(
+#   PEXCEPTION_RECORD ExceptionRecord,
+#   PCONTEXT ContextRecord
+# );
+@winsdkapi(cc=STDCALL, params={
+    'ExceptionRecord': PVOID,
+    'ContextRecord': PVOID
+}, passthru=True)
+def hook_RtlDispatchException(ql: Qiling, address: int, params):
+    return
+
+# NTSYSAPI
+# VOID RtlRestoreContext(
+#   PCONTEXT ContextRecord,
+#   PEXCEPTION_RECORD ExceptionRecord
+# );
+@winsdkapi(cc=CDECL, params={
+    'ContextRecord': PVOID,
+    'ExceptionRecord': PVOID
+}, passthru=True)
+def hook_RtlRestoreContext(ql: Qiling, address: int, params):
+    return
+
+# NTSYSAPI
+# VOID RtlCaptureContext(
+#   PCONTEXT ContextRecord
+# );
+@winsdkapi(cc=STDCALL, params={
+    'ContextRecord': PVOID
+}, passthru=True)
+def hook_RtlCaptureContext(ql: Qiling, address: int, params):
+    return
+
+# NTSYSAPI
+# VOID RtlCaptureContext2(
+#   PCONTEXT ContextRecord,
+#   ULONG Flags
+# );
+@winsdkapi(cc=STDCALL, params={
+    'ContextRecord': PVOID,
+    'Flags': DWORD
+}, passthru=True)
+def hook_RtlCaptureContext2(ql: Qiling, address: int, params):
+    return
+
+# NTSYSAPI
+# NTSTATUS RtlInitializeExtendedContext2(
+#   USHORT Version,
+#   USHORT ContextFlags,
+#   ULONG ExtensionCount,
+#   ULONG *ExtensionSizes,
+#   ULONG BufferSize,
+#   PVOID Buffer,
+#   PCONTEXT Context,
+#   ULONG *LengthReturned
+# );
+@winsdkapi(cc=STDCALL, params={
+    'Version': WORD,
+    'ContextFlags': WORD,
+    'ExtensionCount': DWORD,
+    'ExtensionSizes': PVOID,
+    'BufferSize': DWORD,
+    'Buffer': PVOID,
+    'Context': PVOID,
+    'LengthReturned': PVOID
+}, passthru=True)
+def hook_RtlInitializeExtendedContext2(ql: Qiling, address: int, params):
+    return
+
+# NTSYSAPI
+# NTSTATUS RtlGetExtendedContextLength2(
+#   USHORT Version,
+#   USHORT ContextFlags,
+#   ULONG ExtensionCount,
+#   ULONG *ExtensionSizes,
+#   PULONG Length
+# );
+@winsdkapi(cc=STDCALL, params={
+    'Version': WORD,
+    'ContextFlags': WORD,
+    'ExtensionCount': DWORD,
+    'ExtensionSizes': PVOID,
+    'Length': PVOID
+}, passthru=True)
+def hook_RtlGetExtendedContextLength2(ql: Qiling, address: int, params):
+    return
diff --git a/qiling/os/windows/structs.py b/qiling/os/windows/structs.py
index 56d685fc6..ea35c74c8 100644
--- a/qiling/os/windows/structs.py
+++ b/qiling/os/windows/structs.py
@@ -1545,3 +1545,19 @@ class WIN32_FIND_DATA(Struct):
         )
 
     return WIN32_FIND_DATA
+
+# https://learn.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-exception_pointers
+def make_exception_pointers(archbits: int):
+    """Generate an EXCEPTION_POINTERS structure class.
+    """
+
+    native_type = struct.get_native_type(archbits)
+    Struct = struct.get_aligned_struct(archbits)
+
+    class EXCEPTION_POINTERS(Struct):
+        _fields_ = (
+            ('ExceptionRecord',     native_type),
+            ('ContextRecord',       native_type)
+        )
+
+    return EXCEPTION_POINTERS
\ No newline at end of file
diff --git a/qiling/profiles/linux.ql b/qiling/profiles/linux.ql
index eac82348b..de828a32d 100644
--- a/qiling/profiles/linux.ql
+++ b/qiling/profiles/linux.ql
@@ -1,6 +1,7 @@
 [CODE]
 # ram_size 0xa00000 is 10MB
 ram_size = 0xa00000
+load_address = 0x1000000
 entry_point = 0x1000000
 
 
diff --git a/qiling/profiles/windows.ql b/qiling/profiles/windows.ql
index 15cc2f39b..ac2cc6684 100644
--- a/qiling/profiles/windows.ql
+++ b/qiling/profiles/windows.ql
@@ -23,6 +23,7 @@ KI_USER_SHARED_DATA = 0x7ffe0000
 [CODE]
 # ram_size 0xa00000 is 10MB
 ram_size = 0xa00000
+load_address = 0x1000000
 entry_point = 0x1000000
 
 [KERNEL]
diff --git a/tests/__init__.py b/tests/__init__.py
new file mode 100644
index 000000000..e69de29bb
diff --git a/tests/profiles/blob_raw.ql b/tests/profiles/blob_raw.ql
new file mode 100644
index 000000000..23390130a
--- /dev/null
+++ b/tests/profiles/blob_raw.ql
@@ -0,0 +1,4 @@
+[CODE]
+load_address = 0x10000000
+entry_point = 0x10000008
+ram_size = 0xa00000
\ No newline at end of file
diff --git a/tests/profiles/uboot_bin.ql b/tests/profiles/uboot_bin.ql
index b7f7216c8..1e95311fe 100644
--- a/tests/profiles/uboot_bin.ql
+++ b/tests/profiles/uboot_bin.ql
@@ -1,6 +1,8 @@
 [CODE]
 ram_size = 0xa00000
+load_address = 0x80800000
 entry_point = 0x80800000
+heap_address = 0xa0000000
 heap_size = 0x300000
 
 
diff --git a/tests/qdb_scripts/arm.qdb b/tests/qdb_scripts/arm.qdb
index 5bfa261a9..1336b219e 100644
--- a/tests/qdb_scripts/arm.qdb
+++ b/tests/qdb_scripts/arm.qdb
@@ -1,13 +1,37 @@
-# This line is demonstrate comment in qdb script
+# break on entry to main
+b 0x000103fc
 
-x/10wx 0x7ff3cee4
-x $sp
-x $sp + 0x10
-x/5i 0x047ba9e0
-b 0x047ba9ec
+# break on call to puts
+b 0x00010414
+
+# run till main
+c
+
+# show stack entries
+x/8xw $sp
+
+# run till puts
 c
-s
+
+# show argument passed to puts
+info args 1
+
+# show instructions passed call till end of function
+x/4i ($pc + 4)
+
+# step over call to puts
 n
+
+# show snapshot diff
+info snapshot
+
+# step backwards to start of main
 p
 p
+
+# re-run till the end of program to test that nothing breaks
+c
+c
+
+# quit
 q
diff --git a/tests/qdb_scripts/arm_static.qdb b/tests/qdb_scripts/arm_static.qdb
new file mode 100644
index 000000000..31cd02ab6
--- /dev/null
+++ b/tests/qdb_scripts/arm_static.qdb
@@ -0,0 +1,37 @@
+# break on entry to main
+b 0x000102e4
+
+# break on call to puts
+b 0x000102ee
+
+# run till main
+c
+
+# show stack entries
+x/8xw $sp
+
+# run till puts
+c
+
+# show argument passed to puts
+info args 1
+
+# show instructions passed call till end of function
+x/3i ($pc + 4)
+
+# step over call to puts
+n
+
+# show snapshot diff
+info snapshot
+
+# step backwards to start of main
+p
+p
+
+# re-run till the end of program to test that nothing breaks
+c
+c
+
+# quit
+q
diff --git a/tests/qdb_scripts/mips32el.qdb b/tests/qdb_scripts/mips32el.qdb
index 0e8342baf..cf880b486 100644
--- a/tests/qdb_scripts/mips32el.qdb
+++ b/tests/qdb_scripts/mips32el.qdb
@@ -1,13 +1,37 @@
-# This line is demonstrate comment in qdb script
+# break on entry to main
+b 0x565555e0
 
-x/10wx 0x7ff3cec0
-x $sp
-x $sp + 0x10
-x/5i 0x047bac40
-b 0x047bac50
+# break on call to puts
+b 0x56555600
+
+# run till main
+c
+
+# show stack entries
+x/8xw $sp
+
+# run till puts
 c
-s
+
+# show argument passed to puts
+info args 1
+
+# show instructions passed call till end of function
+x/5i ($pc + 4)
+
+# step over call to puts
 n
+
+# show snapshot diff
+info snapshot
+
+# step backwards to start of main
 p
 p
+
+# re-run till the end of program to test that nothing breaks
+c
+c
+
+# quit
 q
diff --git a/tests/qdb_scripts/x86.qdb b/tests/qdb_scripts/x86.qdb
index d06623328..e145f2bd1 100644
--- a/tests/qdb_scripts/x86.qdb
+++ b/tests/qdb_scripts/x86.qdb
@@ -1,11 +1,37 @@
-# This line is demonstrate comment in qdb script
+# break on entry to main
+b 0x5655551d
 
-x/4wx 0x7ff3cee0
-x $esp
-x $esp + 0x4
-x/5i 0x047bac70
-s
+# break on call to printf
+b 0x56555542
+
+# run till main
+c
+
+# show stack entries
+x/8xw $esp
+
+# run till printf
+c
+
+# show argument passed to printf
+info args 1
+
+# show instructions passed call till end of function
+x/8i ($eip + 5)
+
+# step over call to printf
 n
+
+# show snapshot diff
+info snapshot
+
+# step backwards to start of main
 p
 p
+
+# re-run till the end of program to test that nothing breaks
+c
+c
+
+# quit
 q
diff --git a/tests/test_blob.py b/tests/test_blob.py
index bc191dc16..0bd9a6629 100644
--- a/tests/test_blob.py
+++ b/tests/test_blob.py
@@ -10,13 +10,17 @@
 
 from qiling.core import Qiling
 from qiling.const import QL_ARCH, QL_OS, QL_VERBOSE
-from qiling.os.const import STRING
+from qiling.os.const import STRING, POINTER, SIZE_T
 
 
 class BlobTest(unittest.TestCase):
     def test_uboot_arm(self):
-        def my_getenv(ql, *args, **kwargs):
-            env = {"ID": b"000000000000000", "ethaddr": b"11:22:33:44:55:66"}
+        def my_getenv(ql: Qiling):
+            env = {
+                "ID": b"000000000000000",
+                "ethaddr": b"11:22:33:44:55:66"
+            }
+
             params = ql.os.resolve_fcall_params({'key': STRING})
             value = env.get(params["key"], b"")
 
@@ -26,12 +30,23 @@ def my_getenv(ql, *args, **kwargs):
             ql.arch.regs.r0 = value_addr
             ql.arch.regs.arch_pc = ql.arch.regs.lr
 
-        def check_password(ql, *args, **kwargs):
-            passwd_output = ql.mem.read(ql.arch.regs.r0, ql.arch.regs.r2)
-            passwd_input = ql.mem.read(ql.arch.regs.r1, ql.arch.regs.r2)
-            self.assertEqual(passwd_output, passwd_input)
+        def check_password(ql: Qiling):
+            params = ql.os.resolve_fcall_params({
+                'ptr1': POINTER,  # points to real password
+                'ptr2': POINTER,  # points to user provided password
+                'size': SIZE_T    # comparison length
+            })
+
+            ptr1 = params['ptr1']
+            ptr2 = params['ptr2']
+            size = params['size']
+
+            real_password = ql.mem.read(ptr1, size)
+            user_password = ql.mem.read(ptr2, size)
 
-        def partial_run_init(ql):
+            self.assertSequenceEqual(real_password, user_password, seq_type=bytearray)
+
+        def partial_run_init(ql: Qiling):
             # argv prepare
             ql.arch.regs.arch_sp -= 0x30
             arg0_ptr = ql.arch.regs.arch_sp
@@ -56,16 +71,78 @@ def partial_run_init(ql):
 
         ql = Qiling(code=uboot_code[0x40:], archtype=QL_ARCH.ARM, ostype=QL_OS.BLOB, profile="profiles/uboot_bin.ql", verbose=QL_VERBOSE.DEBUG)
 
-        image_base_addr = ql.loader.load_address
-        ql.hook_address(my_getenv, image_base_addr + 0x13AC0)
-        ql.hook_address(check_password, image_base_addr + 0x48634)
+        imgbase = ql.loader.images[0].base
+
+        ql.hook_address(my_getenv, imgbase + 0x13AC0)
+        ql.hook_address(check_password, imgbase + 0x48634)
 
         partial_run_init(ql)
 
-        ql.run(image_base_addr + 0x486B4, image_base_addr + 0x48718)
+        ql.run(imgbase + 0x486B4, imgbase + 0x48718)
 
         del ql
 
+    def test_blob_raw(self):
+        def run_checksum_emu(input_data_buffer: bytes) -> int:
+            """
+            Callable function that takes input data buffer and returns the checksum.
+            """
+            BASE_ADDRESS = 0x10000000
+            CHECKSUM_FUNC_ADDR = BASE_ADDRESS + 0x8
+            END_ADDRESS = 0x100000ba
+            DATA_ADDR = 0xa0000000
+            STACK_ADDR = 0xb0000000
+
+            with open("../examples/rootfs/blob/example_raw.bin", "rb") as f:
+                raw_code = f.read()
+
+            ql = Qiling(code=raw_code, archtype=QL_ARCH.ARM, ostype=QL_OS.BLOB, profile="profiles/blob_raw.ql", verbose=QL_VERBOSE.DEBUG, thumb=True)
+
+            input_data_len = len(input_data_buffer)
+
+            # Map memory for data and stack
+            ql.mem.map(STACK_ADDR, 0x2000)
+            ql.mem.map(DATA_ADDR, ql.mem.align_up(input_data_len + 0x100))
+
+            # Write input data
+            ql.mem.write(DATA_ADDR, input_data_buffer)
+
+            # Set up registers
+            ql.arch.regs.sp = STACK_ADDR + 0x2000 - 4
+            ql.arch.regs.r0 = DATA_ADDR
+            ql.arch.regs.r1 = input_data_len
+            ql.arch.regs.pc = CHECKSUM_FUNC_ADDR
+            ql.arch.regs.lr = 0xbebebebe
+
+            ql.run(begin=CHECKSUM_FUNC_ADDR, end=END_ADDRESS)
+            result = ql.arch.regs.r0
+
+            return result
+
+        def calculate_expected_checksum(input_data_buffer: bytes) -> int:
+            """
+            Python implementation of the expected checksum calculation.
+            """
+            input_data_len = len(input_data_buffer)
+            expected_checksum = 0
+
+            if input_data_len >= 1 and input_data_buffer[0] == 0xDE:  # MAGIC_VALUE_1
+                for i in range(min(input_data_len, 4)):
+                    expected_checksum += input_data_buffer[i]
+                expected_checksum += 0x10
+            elif input_data_len >= 2 and input_data_buffer[1] == 0xAD:  # MAGIC_VALUE_2
+                for i in range(input_data_len):
+                    expected_checksum ^= input_data_buffer[i]
+                expected_checksum += 0x20
+            else:
+                for i in range(input_data_len):
+                    expected_checksum += input_data_buffer[i]
+
+            return expected_checksum & 0xFF
+
+        test_input = b"\x01\x02\x03\x04\x05"
+        self.assertEqual(run_checksum_emu(test_input), calculate_expected_checksum(test_input))
+
 
 if __name__ == "__main__":
     unittest.main()
diff --git a/tests/test_pe_sys.py b/tests/test_pe_sys.py
index 546c0deed..17af42907 100644
--- a/tests/test_pe_sys.py
+++ b/tests/test_pe_sys.py
@@ -219,8 +219,8 @@ def hook_third_stop_address(ql: Qiling, stops: List[bool]):
         fcall.writeParams(((DWORD, 0),))
 
         # run until third stop
-        # TODO: Should stop at 0x10423, but for now just stop at 0x0001066a
-        amsint32.hook_address(hook_third_stop_address, 0x0001066a, stops)
+        # TODO: Should stop at 0x10423, but for now just stop at 0x10430
+        amsint32.hook_address(hook_third_stop_address, 0x10430, stops)
         amsint32.run(begin=0x102D0)
 
         self.assertTrue(stops[0])
diff --git a/tests/test_qdb.py b/tests/test_qdb.py
index 0a0da506c..563dd840e 100644
--- a/tests/test_qdb.py
+++ b/tests/test_qdb.py
@@ -1,41 +1,58 @@
 #!/usr/bin/env python3
-# 
+#
 # Cross Platform and Multi Architecture Advanced Binary Emulation Framework
 #
 
-import sys, unittest
+import sys
+import unittest
 
 sys.path.append("..")
 from qiling import Qiling
+from qiling.const import QL_VERBOSE
+
 
 class DebuggerTest(unittest.TestCase):
 
-    def test_qdb_mips32el_hello(self):
-        rootfs = "../examples/rootfs/mips32el_linux"
-        path = rootfs + "/bin/mips32el_hello"
+    def __test_common(self, vpath: str, rootfs: str, script: str) -> None:
+        """Run `vpath` under qdb with the given script and expect a clean exit.
+        """
 
-        ql = Qiling([path], rootfs)
-        ql.debugger = "qdb::rr:qdb_scripts/mips32el.qdb"
-        ql.run()
-        del ql
+        ql = Qiling([f'{rootfs}{vpath}'], rootfs, verbose=QL_VERBOSE.DEBUG)
+        ql.debugger = f'qdb::rr:{script}'
 
-    def test_qdb_arm_hello(self):
-        rootfs = "../examples/rootfs/arm_linux"
-        path = rootfs + "/bin/arm_hello"
+        try:
+            ql.run()
+        except SystemExit as ex:
+            self.assertEqual(ex.code, 0)
 
-        ql = Qiling([path], rootfs)
-        ql.debugger = "qdb::rr:qdb_scripts/arm.qdb"
-        ql.run()
-        del ql
+    def test_qdb_mips32el_hello(self):
+        self.__test_common(
+            r'/bin/mips32el_hello',
+            r'../examples/rootfs/mips32el_linux',
+            r'qdb_scripts/mips32el.qdb'
+        )
+
+    def test_qdb_arm_hello(self):
+        self.__test_common(
+            r'/bin/arm_hello',
+            r'../examples/rootfs/arm_linux',
+            r'qdb_scripts/arm.qdb'
+        )
+
+    def test_qdb_arm_hello_static(self):
+        self.__test_common(
+            r'/bin/arm_hello_static',
+            r'../examples/rootfs/arm_linux',
+            r'qdb_scripts/arm_static.qdb'
+        )
 
     def test_qdb_x86_hello(self):
-        rootfs = "../examples/rootfs/x86_linux"
-        path = rootfs + "/bin/x86_hello"
+        self.__test_common(
+            r'/bin/x86_hello',
+            r'../examples/rootfs/x86_linux',
+            r'qdb_scripts/x86.qdb'
+        )
 
-        ql = Qiling([path], rootfs)
-        ql.debugger = "qdb::rr:qdb_scripts/x86.qdb"
-        ql.run()
-        del ql
 
-if __name__ == "__main__":
+if __name__ == '__main__':
     unittest.main()
diff --git a/tests/test_windows_cpp_x86.py b/tests/test_windows_cpp_x86.py
new file mode 100644
index 000000000..ba45466ea
--- /dev/null
+++ b/tests/test_windows_cpp_x86.py
@@ -0,0 +1,64 @@
+#!/usr/bin/env python3
+# 
+# Cross Platform and Multi Architecture Advanced Binary Emulation Framework
+#
+
+import sys, unittest
+
+sys.path.append("..")
+from qiling import Qiling
+from qiling.const import QL_VERBOSE
+from qiling.extensions import pipe
+
+
+def good_bad_count(test_str: str, good_str="GOOD", bad_str="BAD"):
+    good_count = test_str.count(good_str)
+    bad_count = test_str.count(bad_str)
+
+    return good_count, bad_count
+
+
+class CppTests_x86(unittest.TestCase):
+
+    def test_cpp_helloworld(self):
+        """ Test a basic C++ Hello World program which prints "Hello World!"
+        to the console using std::cout.
+        """
+        ql = Qiling(["../examples/rootfs/x86_windows/bin/except/CppHelloWorld_x86.exe"], "../examples/rootfs/x86_windows/", verbose=QL_VERBOSE.DEFAULT)
+
+        ql.os.stdout = pipe.SimpleStringBuffer()
+
+        ql.run()
+
+        conout = ql.os.stdout.read()
+        self.assertEqual(conout, b"Hello World!\x0d\x0a")
+
+        del ql
+
+    def test_cpp_types(self):
+        """ This program tests several C++ type-related runtime features.
+        - typeid
+        - dynamic_cast
+        - virtual methods
+        - virtual destructors
+        """
+        ql = Qiling(["../examples/rootfs/x86_windows/bin/except/TestCppTypes_x86.exe"], "../examples/rootfs/x86_windows/", verbose=QL_VERBOSE.DEFAULT)
+
+        ql.os.stdout = pipe.SimpleStringBuffer()
+
+        ql.run()
+
+        conout = ql.os.stdout.read().decode('utf-8')
+        good_count, bad_count = good_bad_count(conout)
+
+        # the test program should print
+        # - 'GOOD' 12 times
+        # - 'BAD' 0 times
+        self.assertEqual(good_count, 12)
+        self.assertEqual(bad_count, 0)
+
+        del ql
+
+
+if __name__ == '__main__':
+    unittest.main()
\ No newline at end of file
diff --git a/tests/test_windows_cpp_x8664.py b/tests/test_windows_cpp_x8664.py
new file mode 100644
index 000000000..8f89a47e4
--- /dev/null
+++ b/tests/test_windows_cpp_x8664.py
@@ -0,0 +1,181 @@
+#!/usr/bin/env python3
+#
+# Cross Platform and Multi Architecture Advanced Binary Emulation Framework
+#
+
+import sys, unittest
+
+sys.path.append("..")
+from qiling import Qiling
+from qiling.const import QL_VERBOSE
+from qiling.extensions import pipe
+
+
+def good_bad_count(test_str: str, good_str="GOOD", bad_str="BAD"):
+    good_count = test_str.count(good_str)
+    bad_count = test_str.count(bad_str)
+
+    return good_count, bad_count
+
+
+class CppTests_x8664(unittest.TestCase):
+
+    def test_cpp_helloworld(self):
+        """ Test a basic C++ Hello World program which prints "Hello World!"
+        to the console using std::cout.
+        """
+        ql = Qiling(["../examples/rootfs/x8664_windows/bin/except/CppHelloWorld.exe"], "../examples/rootfs/x8664_windows/", verbose=QL_VERBOSE.DEFAULT)
+
+        ql.os.stdout = pipe.SimpleStringBuffer()
+
+        ql.run()
+
+        conout = ql.os.stdout.read()
+        self.assertEqual(conout, b"Hello World!\x0d\x0a")
+
+        del ql
+
+    def test_cpp_types(self):
+        """ This program tests several C++ type-related runtime features.
+        - typeid
+        - dynamic_cast
+        - virtual methods
+        - virtual destructors
+        """
+        ql = Qiling(["../examples/rootfs/x8664_windows/bin/except/TestCppTypes.exe"], "../examples/rootfs/x8664_windows/", verbose=QL_VERBOSE.DEFAULT)
+
+        ql.os.stdout = pipe.SimpleStringBuffer()
+
+        ql.run()
+
+        conout = ql.os.stdout.read().decode('utf-8')
+        good_count, bad_count = good_bad_count(conout)
+
+        # the test program should print
+        # - 'GOOD' 12 times
+        # - 'BAD' 0 times
+        self.assertEqual(good_count, 12)
+        self.assertEqual(bad_count, 0)
+
+        del ql
+
+    def test_soft_seh(self):
+        """ Test software SEH.
+        This test program uses __try..__except and calls RaiseException with
+        a custom code. If software SEH is functioning correctly, the program
+        should be able to invoke its __except block and continue execution afterwards.
+        """
+        ql = Qiling(["../examples/rootfs/x8664_windows/bin/except/TestSoftSEH.exe"], "../examples/rootfs/x8664_windows/", verbose=QL_VERBOSE.DEFAULT)
+
+        ql.os.stdout = pipe.SimpleStringBuffer()
+
+        ql.run()
+
+        conout = ql.os.stdout.read().decode('utf-8')
+        good_count, bad_count = good_bad_count(conout)
+
+        # the test program should print
+        # - 'GOOD' 4 times
+        # - 'BAD' 0 times
+        self.assertEqual(good_count, 4)
+        self.assertEqual(bad_count, 0)
+
+        # If the exception handler was not invoked for some reason,
+        # the program may terminate abnormally with a non-zero exit
+        # code.
+        self.assertEqual(ql.os.exit_code, 0)
+
+        del ql
+
+    def test_soft_cppex(self):
+        """ Test software C++ exceptions.
+        This test program tests try..catch in various ways. If exception dispatching
+        and stack unwinding are functioning correctly, the program will run to completion.
+        - Simple try..catch
+        - Try..catch with throw data
+        - Nested try..catch with throw data
+        """
+        ql = Qiling(["../examples/rootfs/x8664_windows/bin/except/TestCppEx.exe"], "../examples/rootfs/x8664_windows/", verbose=QL_VERBOSE.DEFAULT)
+
+        ql.os.stdout = pipe.SimpleStringBuffer()
+
+        ql.run()
+
+        conout = ql.os.stdout.read().decode('utf-8')
+        good_count, bad_count = good_bad_count(conout, 'y', 'n')
+
+        # the test program should print
+        # - 'y' 14 times
+        # - 'n' 0 times
+        self.assertEqual(good_count, 14)
+        self.assertEqual(bad_count, 0)
+
+        # If the exception handler was not invoked for some reason,
+        # the program may terminate abnormally with a non-zero exit
+        # code.
+        self.assertEqual(ql.os.exit_code, 0)
+
+        del ql
+
+    def test_cppex_unhandled_filtered(self):
+        """ Test unhandled C++ exceptions with a custom exception filter.
+        This program registers its own unhandled exception filter via
+        SetUnhandledExceptionFilter, then throws an uncaught exception.
+        If unhandled exception filters are functioning correctly,
+        the program's custom exception filter will be reached, but
+        execution will NOT resume after the exception.
+        Instead, the program is expected to terminate abnormally
+        with status code 0xE06D7363 (C++ runtime exception).
+        """
+        ql = Qiling(["../examples/rootfs/x8664_windows/bin/except/TestCppExUnhandled.exe"], "../examples/rootfs/x8664_windows/", verbose=QL_VERBOSE.DEFAULT)
+
+        ql.os.stdout = pipe.SimpleStringBuffer()
+
+        ql.run()
+
+        conout = ql.os.stdout.read().decode('utf-8')
+        good_count, bad_count = good_bad_count(conout)
+
+        # the test program should print
+        # - 'GOOD' 3 times
+        # - 'BAD' 0 times
+        self.assertEqual(good_count, 3)
+        self.assertEqual(bad_count, 0)
+
+        # The program should have terminated abnormally
+        # with status code 0xE06D7363 (C++ runtime exception).
+        self.assertEqual(ql.os.exit_code, 0xE06D7363)
+
+        del ql
+
+    def test_cppex_unhandled_unfiltered(self):
+        """ Test unhandled C++ exceptions without an exception filter.
+        This program throws an uncaught C++ exception.
+        The program is expected to terminate abnormally
+        with status code 0xC0000409 (STATUS_STACK_BUFFER_OVERRUN).
+        """
+        ql = Qiling(["../examples/rootfs/x8664_windows/bin/except/TestCppExUnhandled2.exe"], "../examples/rootfs/x8664_windows/", verbose=QL_VERBOSE.DEFAULT)
+
+        ql.os.stdout = pipe.SimpleStringBuffer()
+
+        ql.run()
+
+        conout = ql.os.stdout.read().decode('utf-8')
+        good_count, bad_count = good_bad_count(conout)
+
+        # the test program should print
+        # - 'GOOD' 1 time
+        # - 'BAD' 0 times
+        self.assertEqual(good_count, 1)
+        self.assertEqual(bad_count, 0)
+
+        # The program is expected to terminate abnormally
+        # with status code 0xC0000409 (STATUS_STACK_BUFFER_OVERRUN)
+        # https://devblogs.microsoft.com/oldnewthing/20190108-00/?p=100655
+        #
+        self.assertEqual(ql.os.exit_code, 0xC0000409)
+
+        del ql
+
+if __name__ == '__main__':
+    unittest.main()
\ No newline at end of file