Explore bundling external test media into Dockerfile.test #18

Closed
opened 2026-05-03 16:42:10 +00:00 by gravityfargo · 1 comment
Owner

Context

make test (the default suite, excludes slow/largescale) currently requires two external files in tests/data/:

  • NASA launches Artemis I on 11-16-2022.mp4 — 6.4 MB (archive.org)
  • Capital-Volume-I.pdf — 4.3 MB (marxists.org)

These are referenced by tests/test_thumbnails/test_pdf_thumbs.py:23 and tests/test_thumbnails/test_video_thumbs.py:23 with hardcoded paths and no skipif gate, so the affected tests fail outright when the files are missing. Compare to tests/test_main.py:106-109 (test_exif_data), which correctly uses pytest.mark.skipif(not Path("tests/data/msl-images.zip").exists()).
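
A minimal sketch of what such a gate could look like for the thumbnail tests, assuming the fixture paths listed above. `requires_fixture` and the placeholder test are hypothetical names for illustration and do not exist in the repo; the real tests would keep their existing bodies and only gain the decorator.

```python
from pathlib import Path

import pytest

# Paths follow the issue text; adjust if the fixtures live elsewhere.
DATA_DIR = Path("tests/data")
PDF_FIXTURE = DATA_DIR / "Capital-Volume-I.pdf"
VIDEO_FIXTURE = DATA_DIR / "NASA launches Artemis I on 11-16-2022.mp4"


def requires_fixture(path):
    """Hypothetical helper: build a skipif mark for a missing fixture.

    The existence check runs once, when the decorator is applied at
    import time, not each time the test runs.
    """
    return pytest.mark.skipif(
        not path.exists(),
        reason=f"external fixture not found: {path}",
    )


@requires_fixture(PDF_FIXTURE)
def test_pdf_fixture_is_readable():
    # Placeholder body: the real test would call the thumbnail code here.
    assert PDF_FIXTURE.stat().st_size > 0
```

With this in place a missing file shows up as a skip with a readable reason rather than a hard failure. That fixes only the local failure mode; CI would still need the files to actually run the tests.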

CI handles the gap in .forgejo/workflows/test.yml:32-43 via actions/cache@v5 keyed on test-media-cache-v1, falling back to wget from archive.org and marxists.org on cache miss. Local containerized runs use make test-slow, which mounts tests/data read-only into the test image (Makefile:13-14).
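
The cache-then-download behaviour described above can be sketched in Python for local use. This is an illustration, not the workflow's actual implementation: `ensure_fixture` is a hypothetical helper, and the source URLs are not given in this issue, so callers would have to supply them.

```python
from pathlib import Path
from typing import Callable
from urllib.request import urlretrieve


def ensure_fixture(
    dest: Path,
    url: str,
    fetch: Callable[[str, Path], object] = urlretrieve,
) -> bool:
    """Download url to dest unless dest already exists.

    Returns True when a download happened, False on a cache hit.
    """
    if dest.exists():
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Download to a temporary name first so an interrupted transfer
    # never leaves a truncated file that later looks like a cache hit.
    partial = dest.with_name(dest.name + ".part")
    fetch(url, partial)
    partial.replace(dest)
    return True
```

The `fetch` parameter is injectable so the helper can be exercised without touching the network.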

Why defer

The real-world driver for hardening this, migrating the old Omeka vault, will mostly involve single files, so the current external-fixture flow is good enough for now. This is worth revisiting once CI flakiness from archive.org / marxists.org outages, or onboarding friction from missing fixtures, becomes a real cost.

Options to consider when picking this up

  1. Bundle the two files into Dockerfile.test via ADD <url> tests/data/.... Removes the wget step from CI; the Docker layer cache replaces the actions/cache step. Doesn't touch the shared athena-archive-ci base image (which is also consumed by the FE a11y crawl, so bloating it for backend-only fixtures is a non-starter).
  2. Vendor tiny synthetic fixtures — a ~100 KB sample MP4 and a 1-page PDF committed to tests/data/. Removes the network dependency entirely and shrinks the test surface. Requires confirming the smaller fixtures still exercise the same ThumbnailGenerator code paths.
  3. Add skipif gates to test_pdf_thumbs.py / test_video_thumbs.py to match test_exif_data's pattern. Cheapest fix; doesn't solve CI's network dependency, only the local-dev failure mode.

Option 2 is probably the best long-term answer; option 1 is a smaller intermediate step that keeps the existing fixtures.
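
For option 2, a 1-page PDF small enough to commit can be produced with the standard library alone. The sketch below is an assumption-laden illustration: `build_minimal_pdf` is a hypothetical helper, the page is blank, and whether a blank page exercises the same ThumbnailGenerator code paths would still need to be confirmed. It does not cover the MP4 fixture, which would need an encoder.

```python
def build_minimal_pdf() -> bytes:
    """Return the bytes of a one-page, blank PDF (a few hundred bytes)."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 200 200] "
        b"/Resources << >> >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        # Record each object's byte offset for the cross-reference table.
        offsets.append(len(out))
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    # Each xref entry is exactly 20 bytes, including the line ending.
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)


if __name__ == "__main__":
    from pathlib import Path

    # Hypothetical output name; run from the repo root.
    Path("tests/data").mkdir(parents=True, exist_ok=True)
    Path("tests/data/sample-1page.pdf").write_bytes(build_minimal_pdf())
```

Generating the file once and committing the result keeps the test suite free of both network access and a build-time dependency.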

Acceptance

  • Decide between vendoring synthetic fixtures vs. bundling the existing ones into Dockerfile.test.
  • Whichever path: make test should pass on a fresh checkout with no manual download step, and the wget block in .forgejo/workflows/test.yml should be removable.
  • Licensing sanity check before any bundle/vendor: NASA video is public domain; the Marx PDF needs a quick confirmation if we keep redistributing it via the registry.
Author
Owner

Closing — single-dev project, tests/data/ is always present locally. CI side is handled by actions/cache from #21. Original concern (loud failures on missing fixtures) is no longer a real cost.

Reference
ModernLeft/athena-file#18