KDEAmbi/lessons_learned.md
Tobias J. Endres aa57826800 feat(mock): Implement realistic Hyperion mock server
The mock server now correctly implements the newline-delimited JSON protocol observed in the Hyperion server and client code.

- It properly parses incoming JSON streams line-by-line.
- It creates QImage objects from the raw RGB data using the dimensions provided in the JSON payload.
- It sends a success reply to the client after receiving an image.
- This commit also updates lessons_learned.md with key findings from the debugging session.
2025-08-15 20:20:14 +02:00

24 KiB
Raw Blame History

Lessons Learned from Hyperion Grabber Wayland Adaptation

This document summarizes the key challenges, debugging steps, and solutions encountered during the process of adapting the Hyperion Grabber for Wayland, specifically focusing on the wayland_poc executable.

1. Initial Problem & Diagnosis

  • Problem: The user's existing Hyperion_Grabber_X11_QT application was not logging "picture recorded and sent" messages when run on a Wayland session, indicating a failure in screen capture and transmission.
  • Initial Analysis: The grabber is designed for X11, relying on XDamage and other X11-specific features. Wayland's security model prevents direct screen access, requiring applications to use xdg-desktop-portal and PipeWire for screen sharing with user consent.
  • Conclusion: The X11 grabber would not work natively on Wayland. A Wayland-compatible screen grabbing solution was needed.

2. Wayland Grabber Adaptation Strategy

  • Approach: Replace the X11 screen grabbing logic with a Wayland-compatible one, primarily using PipeWire for video streaming and xdg-desktop-portal for user consent.
  • Proof of Concept (POC): Developed wayland_poc.cpp to test the xdg-desktop-portal interaction and PipeWire stream connection. Also used tutorial5.c from PipeWire documentation to verify core PipeWire API usage.

3. Build Environment Challenges & Docker Adoption

  • Initial Host Compilation Issues: Attempting to compile wayland_poc.cpp directly on the user's Arch Linux host resulted in a persistent fatal error: pipewire/extensions/session-manager/session-manager.h: No such file or directory.
    • Troubleshooting: Confirmed file existence and permissions, verified pkg-config output, modified CMakeLists.txt for include paths and C++ standards, tried different compilers (GCC). None resolved the issue.
    • Conclusion: The error was highly unusual and suggested a deep, system-specific problem with how PipeWire headers were exposed or how the compiler resolved includes on the host.
  • Adoption of Docker: Decided to use Docker to create a reproducible and controlled build environment, aiming to bypass host-specific compilation issues.

4. Docker Build Troubleshooting (Arch Linux Base)

  • Dockerfile (Arch): Initial Dockerfile used archlinux:latest as base.
  • Error 1: sudo: command not found: Occurred because pacman commands were run as builder user without sudo installed or configured.
    • Fix: Moved pacman commands to run as root before switching to builder user.
  • Error 2: mkdir: cannot create directory build: File exists (Permission Denied): Occurred because build directory was copied from host with root ownership, and builder user lacked permissions to modify it.
    • Fix: Added chown -R builder:builder after COPY to transfer ownership to builder user.
  • Error 3: CMakeCache.txt mismatch: Occurred because CMakeCache.txt from host's project root was copied into container, confusing CMake.
    • Fix: Added CMakeCache.txt and CMakeFiles/ to .dockerignore to prevent them from being copied.
  • Error 4: X11/extensions/Xdamage.h: No such file or directory (and scrnsaver.h, Xrender.h): Occurred because necessary X11 development headers were missing in the Docker image.
    • Fix: Added libx11, libxext, libxdamage, libxss, libxrender to pacman install list.
  • Persistent session-manager.h error: Despite all fixes, the fatal error: pipewire/extensions/session-manager/session-manager.h: No such file or directory persisted even in the Arch Docker environment. This indicated the problem was not specific to the host Arch setup, but a deeper issue with PipeWire headers or their interaction with the compiler.

5. Docker Build Troubleshooting (Ubuntu Base & Clang)

  • Switch to Ubuntu: Decided to switch the Docker base image to ubuntu:latest to rule out Arch-specific packaging issues. Also switched to clang as the compiler.
  • Error: fatal error: 'pipewire/extensions/session-manager/impl-session-manager.h' file not found: The session-manager.h issue persisted, even with Ubuntu and Clang. This confirmed the problem was not distribution-specific.
  • Attempt to bypass umbrella header: Modified wayland_poc.cpp to include individual headers (introspect.h, interfaces.h, etc.) instead of session-manager.h. This also failed with similar "file not found" errors for the individual headers.
  • Conclusion: The session-manager.h (and related impl-session-manager.h) compilation issue is extremely persistent and baffling, suggesting a very subtle, low-level problem that cannot be resolved by standard means or remote debugging.

6. Refocusing on xdg-desktop-portal with QDBus

  • Realization: The tutorial5.c (core PipeWire API) compiled successfully. The session-manager.h problem was specific to the xdg-desktop-portal interaction using PipeWire's session manager headers.
  • New Strategy: Isolate the xdg-desktop-portal interaction using Qt's QDBus module, completely bypassing direct PipeWire session manager header includes.
  • wayland_poc.cpp Refactoring: Simplified wayland_poc.cpp to only handle the QDBus interaction to get the pipewire_node_id. Removed all PipeWire stream creation/processing logic.
  • QDBus API Challenges:
    • QDBusMessage::createMethodCall vs createMethod: Discovered createMethodCall is the correct static method for creating method call messages in Qt 5.15.
    • QDBusMessage constructor: Found that the direct constructor QDBusMessage("service", "path", "interface", "method"); is the correct way to initialize QDBusMessage for method calls in the specific Qt5 version in Ubuntu.
    • QDBusPendingCall conversion: Ensured sessionBus.asyncCall(message) is used, which returns QDBusPendingCall.
  • Successful Compilation: The simplified wayland_poc.cpp (QDBus-only) compiled successfully in the Ubuntu Docker container with Clang. This is a major breakthrough, confirming that the QDBus approach for xdg-desktop-portal interaction is viable.

7. Runtime Challenges (D-Bus Connection)

  • Problem: Running the compiled wayland_poc executable (from Docker) on the host failed with Failed to connect to D-Bus session bus.
  • Reason: Docker containers are isolated and do not have direct access to the host's D-Bus session bus by default. Environment variables (DBUS_SESSION_BUS_ADDRESS, XDG_RUNTIME_DIR, DISPLAY, WAYLAND_DISPLAY) and volume mounts (/tmp/.X11-unix, $XDG_RUNTIME_DIR) are needed.
  • Further Problem: Even with correct docker run parameters, the wayland_poc on the host failed with D-Bus call to xdg-desktop-portal failed: "Invalid session".
    • Diagnosis: This error indicates xdg-desktop-portal is not recognizing the session context. It's often related to XDG_SESSION_ID, XDG_SESSION_TYPE, or XDG_CURRENT_DESKTOP not being correctly propagated or interpreted.
  • Method Name Mismatch: The wayland_poc was initially calling PickSource on org.freedesktop.portal.ScreenCast, but the correct method name is SelectSources.
    • Fix: Changed method call from PickSource to SelectSources in wayland_poc.cpp.
  • Argument Type Mismatch: The SelectSources method expects an object path (o) for the parent_window argument, but wayland_poc was sending an empty string (s).
    • Fix: Changed the first argument to QDBusObjectPath("/") in wayland_poc.cpp.
  • Current Status: The wayland_poc executable now compiles successfully. The D-Bus connection issue (Failed to connect to D-Bus session bus. or Invalid session) is still occurring when run on the host, indicating a persistent runtime environment issue with D-Bus access from the application.

8. Next Steps

  • Verify wayland_poc execution on host: The user needs to run the wayland_poc executable on their host with the provided docker run command (or directly if copied out) and confirm if the xdg-desktop-portal dialog appears and a PipeWire node ID is printed. The current D-Bus connection failure needs to be resolved.
  • Integrate: If successful, the next phase will involve integrating the QDBus based xdg-desktop-portal interaction with the core PipeWire stream capture logic (from tutorial5.c) into the main Hyperion grabber.
  • Persistent session-manager.h: The original session-manager.h compilation issue remains unresolved for direct PipeWire session manager API usage. The QDBus approach is a workaround. If direct PipeWire session manager API interaction is ever needed, this issue would need further, likely local, investigation.

9. Recent Debugging and Solutions

  • SelectSources Request Denied:

    • Problem: After initial compilation, wayland_poc's SelectSources call was denied by xdg-desktop-portal.
    • Diagnosis: The D-Bus message for SelectSources was missing required options (e.g., types, cursor_mode, persist_mode, handle_token) that are present in successful calls (e.g., from Firefox).
    • Fix: Modified wayland_poc.cpp to include these missing options in the SelectSources D-Bus message.
  • Persistent QDateTime Compilation Error:

    • Problem: Encountered a persistent "incomplete type 'QDateTime'" error during compilation, even though <QDateTime> was included and its position was adjusted.
    • Diagnosis: This was a highly unusual and difficult-to-diagnose error, possibly related to subtle interactions within the Qt build system or compiler.
    • Workaround: Replaced all uses of QDateTime::currentMSecsSinceEpoch() with QUuid::createUuid().toString() for generating unique tokens, and added #include <QUuid>. This successfully bypassed the compilation error.
  • Docker Filesystem Isolation and replace Tool Misunderstanding:

    • Problem: Changes made to wayland_poc.cpp using the replace tool were not reflected in the Docker container's build process.
    • Diagnosis: The replace tool modifies the host's filesystem, while the Docker container operates on its own isolated filesystem. The container was building from an outdated copy of the source file.
    • Fix: After modifying wayland_poc.cpp on the host, the updated file was explicitly copied into the running Docker container using docker cp.
  • Executable Retrieval from Docker Container:

    • Problem: Difficulty in copying the built wayland_poc executable from the Docker container to the host. The build container would exit immediately, and docker cp failed to find the file.
    • Diagnosis: The original build container was not designed to stay running. docker cp requires a running container or a committed image. The initial docker commit did not seem to capture the build artifacts correctly.
    • Fix: Committed the original build container to a new image (hyperion-grabber-build). Then, a new container was launched from this image with a command (tail -f /dev/null) to keep it running. The executable was then successfully copied from this running container using docker cp.
  • D-Bus SelectSources Signature Mismatch:

    • Problem: wayland_poc was failing with "Type of message, “(oosa{sv})”, does not match expected type “(a{sv})”" when calling SelectSources. This occurred because the handleSelectSourcesResponse function, designed for SelectSources replies, was incorrectly connected to the CreateSession D-Bus call's finished signal. The CreateSession reply's signature (starting with an object path) was being misinterpreted, leading to an incorrect SelectSources call.
    • Diagnosis: The QObject::connect in main was incorrectly routing the CreateSession reply to the handleSelectSourcesResponse function.
    • Fix:
      1. Created a new handler function, handleCreateSessionFinished, to specifically process the CreateSession reply. This function extracts the session handle and then correctly initiates the SelectSources D-Bus call with its own QDBusPendingCallWatcher connected to handleSelectSourcesResponse.
      2. Modified the main function to connect the CreateSession's QDBusPendingCallWatcher to handleCreateSessionFinished.
  • QDBusMessage::clearArguments() Method Not Found:

    • Problem: Compilation failed with an error indicating QDBusMessage had no member named clearArguments().
    • Diagnosis: The clearArguments() method was removed in Qt 5.15, which is the version likely used in the Docker build environment. The call was also redundant as arguments were being cleared and then immediately re-added.
    • Fix: Removed the line message.clearArguments(); from wayland_poc.cpp.
  • D-Bus CreateSession Signature Mismatch and Argument Evolution:

    • Problem: Initially, wayland_poc failed to call CreateSession due to a signature mismatch, expecting (oosa{sv}) but receiving (a{sv}). Subsequent attempts to fix this by adding a parent_window object path argument led to a "Missing token" error, and later, a "Type of message, “(oa{sv})”, does not match expected type “(a{sv})”" error.
    • Diagnosis: Through iterative debugging and analysis of error messages (which contradicted some online documentation), it was determined that the xdg-desktop-portal implementation on the user's system for CreateSession expects only a single dictionary of options (a{sv}). The handle, session_handle, app_id, and parent_window arguments were not expected for this method. The "Missing token" error occurred because, while only an options dictionary was expected, the handle_token within that dictionary was a mandatory requirement.
    • Fix: The CreateSession call in wayland_poc.cpp was modified to pass only a QVariantMap containing the handle_token as its sole argument.
    • Problem: After successfully building and copying wayland_poc to the host, running it resulted in "D-Bus call to SelectSources failed: "Remote peer disconnected"".
    • Diagnosis: This indicates a problem with the D-Bus connection itself, rather than a rejection from the portal. Possible causes include incorrect D-Bus environment variables on the host, xdg-desktop-portal not running, or permission issues.
    • Next Steps: Investigate D-Bus environment variables (DBUS_SESSION_BUS_ADDRESS) and xdg-desktop-portal service status on the host.
  • wf-recorder does not use xdg-desktop-portal for screen capture:

    • Observation: Analysis of wf-recorder/src/main.cpp shows that it primarily uses Wayland protocols like wlr-screencopy-v1 and xdg-output for screen content capturing. These protocols are specific to wlroots-based compositors.
    • Conclusion: wf-recorder does not directly interact with xdg-desktop-portal via D-Bus for screen capture. Therefore, it is not a suitable example for understanding the D-Bus interactions required for xdg-desktop-portal based screen sharing, which is the approach taken by wayland_poc.cpp.
  • Spectacle's Wayland screen recording uses KDE-specific Wayland protocol, not xdg-desktop-portal:

    • Observation: Analysis of cloned_repos/spectacle/src/Platforms/screencasting.cpp reveals that Spectacle utilizes the zkde_screencast_unstable_v1 Wayland protocol extension for screen recording. This is a KDE-specific protocol.
    • Conclusion: Spectacle's Wayland screen recording, as implemented in this part of the code, relies on a KDE-specific Wayland compositor (like KWin) that implements this protocol. It does not use the generic xdg-desktop-portal D-Bus interface for screen capture. Therefore, it is not a suitable example for understanding the D-Bus interactions required for xdg-desktop-portal based screen sharing.

10. Revised Wayland Grabber Adaptation Strategy (KDE-specific focus)

  • User Clarification: The user has clarified that the Proof of Concept (POC) only needs to run in the current environment, and an implementation relying on KDE-specific tools is acceptable.
  • New Strategy: Based on this clarification, the focus for Wayland screen sharing will shift from the generic xdg-desktop-portal D-Bus interface to leveraging KDE-specific Wayland protocols, specifically zkde_screencast_unstable_v1, as observed in Spectacle's source code.
  • Implication: This means the wayland_poc.cpp will be adapted to use the zkde_screencast_unstable_v1 protocol directly, rather than going through xdg-desktop-portal. This approach is expected to be more direct and potentially simpler for the current environment.

11. Migrating to Qt6 and Modern Screen Capture

  • Problem: The KDE-specific Wayland protocol approach, while promising, still faced compilation issues related to missing headers (QNativeInterface, QWaylandScreen) in the user's Qt 5 environment.
  • Decision: To resolve these issues and modernize the project, the decision was made to migrate the wayland_poc to Qt6.
  • Initial Qt6 Migration Issues:
    • The initial attempt to use QNativeInterface and QWaylandScreen failed because the user's installed Qt6 version (6.4.4) was too old. These features were introduced in later versions (6.5 and 6.7).
  • Revised Qt6 Strategy:
    • Instead of relying on low-level native interfaces, the wayland_poc was rewritten to use the high-level QScreenCapture API from the Qt Multimedia module. This is the recommended approach for screen capture in modern Qt6.
    • This required adding the Multimedia and MultimediaWidgets components to the CMakeLists.txt file.
  • Successful Build: After these changes, the wayland_poc executable was successfully built and run, demonstrating a working Wayland screen capture on the user's system.

12. Final Refactoring and Wayland-Only Focus

  • Decision: Based on the successful Qt6/QScreenCapture implementation, the decision was made to remove all X11-related code and focus exclusively on a Wayland-only grabber.
  • X11 Code Removal: All X11-specific files (X11Grabber.cpp, hgx11damage.cpp, etc.) and the corresponding code in CMakeLists.txt were removed. This significantly simplified the project's dependencies and codebase.
  • Hyperion Mock Server: To facilitate testing without a physical Hyperion setup, a simple mock server (hyperion-mock.cpp) was created. This server listens on the Hyperion port, receives the image data, and saves the captured frames as PNG files for verification.
  • Code Refactoring: The main application was refactored to be Wayland-only. The hgx11 class was renamed to HyperionGrabber to better reflect its purpose. Verbose logging was also reduced to improve performance.
  • Successful Test: The refactored Wayland grabber was successfully tested with the mock server, confirming that the end-to-end screen capture and transmission pipeline is working correctly.

13. X11 Code Removal and Renaming for Wayland-Only Focus

  • Decision: Following the successful implementation and testing of the Wayland grabber, a decision was made to fully transition the project to a Wayland-only focus, removing all remnants of X11-specific code and renaming components for clarity.
  • Actions Taken:
    • X11 Code Removal: All X11-specific files, including hgx11.service and hgx11stop.service, were removed from the project.
    • File and Class Renaming:
      • hgx11.h was renamed to hyperiongrabber.h. The HyperionGrabber class, which now orchestrates the Wayland grabbing, remains.
      • hgx11.cpp was renamed to hyperiongrabber.cpp.
      • hgx11net.h was renamed to hyperionclient.h. The hgx11net class was renamed to HyperionClient to reflect its role as a generic Hyperion network client.
      • hgx11net.cpp was renamed to hyperionclient.cpp.
    • Build System Update: CMakeLists.txt was updated to reflect the new filenames and remove any X11-specific build configurations.
    • Documentation Update: README.md was updated to remove all X11-specific references and to clearly state the project's Wayland-only focus.
  • Outcome: The project codebase is now streamlined and exclusively focused on Wayland screen grabbing, improving clarity and maintainability.

14. Debugging Hyperion Data Acknowledgment and Image Dimensions Mismatch

  • Problem: After successfully implementing hardware-accelerated scaling and image orientation, the Hyperion server was connecting but not acknowledging the image data sent by the grabber. Hyperion logs showed only a new connection, not the registerInput messages expected for image streams.
  • Initial Diagnosis:
    • Ruled out malformed JSON due to Hyperion's typical verbose error reporting for such issues.
    • Confirmed QImage::Format_RGB888 as the pixel format, which is generally expected by Hyperion.
    • Suspected a mismatch between the imagewidth/imageheight declared in the JSON header and the actual dimensions of the base64 encoded image data.
  • Debugging Steps:
    • Temporarily re-enabled qDebug() statements in HyperionGrabber::_processFrame and HyperionClient::sendImage to inspect scaledImage dimensions, rawDataSize, base64Data.length(), and the total _cmd_m length.
    • Observed that the imageheight and imagewidth in the JSON command (derived from screen->size()) did not always precisely match the actual dimensions of the scaledImage (derived from QVideoFrame dimensions). This discrepancy was due to _setImgSize being called only once in the constructor based on QScreen dimensions, while _processFrame processed QVideoFrames which might have slightly different dimensions.
  • Solution:
    • Modified HyperionClient by adding a setImgSize(int width, int height) function. This function updates the imgCmdBuf (which contains the JSON header with imagewidth and imageheight) dynamically.
    • Modified HyperionGrabber::_processFrame to call _hclient_p->setImgSize(scaledImage.width(), scaledImage.height()); every time a frame is processed, ensuring that the imagewidth and imageheight in the JSON header precisely match the dimensions of the scaledImage being sent.
    • Corrected a syntax error in the setImgSize function's string literal (related to escaped quotes) that was causing compilation failures.
    • Corrected QImage::byteCount() to QImage::sizeInBytes() for accurate raw data size calculation.
  • Outcome: The grabber now sends image data with dynamically updated dimensions in the JSON header, which should resolve the data acknowledgment issue with Hyperion.

15. Mock Server Debugging and Protocol Discovery

  • Initial State: The hyperion-mock server was not processing image data, saving no PNGs and logging no errors.
  • Incorrect Assumption (Aha! Moment #1): My initial diagnosis was based on a faulty memory that the Hyperion protocol was binary and used a 4-byte length prefix for messages. I proposed a "fix" for the mock server based on this incorrect assumption.
  • Investigation: The user correctly prompted me to read the actual hyperion.ng source code to build a realistic mock.
  • Protocol Discovery (Aha! Moment #2): Reading JsonClientConnection.cpp from the hyperion.ng repo and re-reading our own hyperionclient.cpp revealed the actual protocol: simple newline-delimited (\n) JSON messages.
  • Mock Server Fix: The mock server was corrected to use newline-delimited parsing. It was also fixed to correctly parse the raw image data (using imagewidth and imageheight from the JSON) and to send a {"success":true} reply, which the client expects.
  • File Location (Aha! Moment #3): I repeatedly failed to find the generated PNG files because I was looking in the build/ directory. The files were being created in the project's root directory, which was the current working directory of the mock server process when the user launched it.
  • Operational Improvement (Aha! Moment #4): The user reminded me that I can use ps aux | grep <process_name> to find the PID of background processes myself, rather than asking them for it. This is a more autonomous and efficient workflow.