Turning curated affiliate feeds into a searchable, monetized product catalog that powers personalized suggestions inside the Cressi assistant.
Cressi already suggests products to users, but those suggestions are pulled live from Google Shopping on every request — which means ongoing API cost, latency, no control over the catalog, and no affiliate revenue.
This project loads your existing affiliate product feeds (Mytheresa, Coutr, and ~1,700 further advertisers available on demand) into Cressi’s existing vector search, and serves suggestions catalog-first — falling back to the current live search only when the catalog has no good match. Every product link is an affiliate tracking link, so suggestions become a revenue stream rather than a cost centre.
Source data verified: Mytheresa feed ≈ 37k unique products, Coutr feed ≈ 36k unique products (each delivered as size-level rows that we collapse into single products). A third file maps 1,696 additional advertisers we can onboard later.
The solution is an extension of Cressi’s existing pipeline — the same vector database (Qdrant), the same embedding model, and the same product carousel the front-end already renders. We are not building a parallel system, which is what keeps the timeline short and the risk low.
Feed files (Excel / pipe-text)
│ 1 · parse → normalize → merge size variants → keep in-stock
▼
Common product records
│ 2 · generate embeddings (existing model) → upsert
▼
Qdrant "products" collection (filterable: gender · category · price · brand · stock)
│
User request ─▶ assistant ─▶ 3 · vector search + filters
├─ good matches? ──▶ serve from catalog (affiliate links)
└─ too few? ──▶ fall back to live search (today’s behaviour)
▼
Product carousel (unchanged)
The feeds all follow the standard Rakuten layout but differ in delivery (headerless Excel vs. pipe-delimited text). We build a config-driven parser: a small column-map per feed, large files streamed row-by-row, size-level rows merged into one product carrying its available sizes, out-of-stock filtered out, and the affiliate URL preserved exactly. Onboarding a new merchant later is a config block, not new code.
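As a rough illustration of the config-driven approach, the sketch below parses a pipe-delimited feed with a per-feed column map, drops out-of-stock rows, and merges size-level rows into one product. The column positions, the field names, and the `yes`/`no` stock flag are assumptions made for the example; the real values come from each feed's layout.

```python
import csv
from typing import Dict, Iterable, Iterator, List

# Hypothetical column map (field name -> zero-based column index).
# Real positions must be taken from the actual feed layout.
EXAMPLE_FEED = {
    "delimiter": "|",
    "columns": {"product_id": 0, "name": 1, "url": 2,
                "price": 3, "size": 4, "in_stock": 5},
}


def parse_rows(lines: Iterable[str], feed: dict) -> Iterator[Dict[str, str]]:
    """Stream rows one at a time and map columns to named fields."""
    # QUOTE_NONE keeps every character literal, so URLs pass through untouched.
    reader = csv.reader(lines, delimiter=feed["delimiter"], quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        yield {field: row[idx] for field, idx in feed["columns"].items()}


def merge_variants(rows: Iterable[Dict[str, str]]) -> List[dict]:
    """Collapse size-level rows into one product carrying its in-stock sizes."""
    products: Dict[str, dict] = {}
    for row in rows:
        # Assumed stock flag: "yes" means the size is available.
        if row["in_stock"].strip().lower() != "yes":
            continue
        product = products.setdefault(row["product_id"], {
            "product_id": row["product_id"],
            "name": row["name"],
            "url": row["url"],  # affiliate URL kept exactly as delivered
            "price": float(row["price"]),
            "sizes": [],
        })
        if row["size"] not in product["sizes"]:
            product["sizes"].append(row["size"])
    return list(products.values())
```

In this sketch the rows are streamed, but the merged products are held in memory, which is reasonable at tens of thousands of products per feed. A product whose sizes are all out of stock never enters the result.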
Products live in a dedicated products collection in the existing Qdrant instance, embedded with the current embedding model. They are batch-embedded and upserted with a stable ID, so re-loading a refreshed feed updates prices and stock in place rather than creating duplicates.
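One way to get a stable ID is to derive a deterministic UUID from the merchant and the merchant's product identifier; Qdrant accepts UUIDs as point IDs, and upserting a point whose ID already exists overwrites it. The sketch below assumes that scheme. The namespace string, the payload field names, and the `to_point` helper are hypothetical, and the dictionary only mirrors the id / vector / payload shape of a point rather than calling any client library.

```python
import uuid
from typing import List

# Hypothetical namespace; any fixed UUID works as long as it never changes.
CATALOG_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/cressi-catalog")


def stable_point_id(merchant: str, product_id: str) -> str:
    """Same merchant + product always yields the same UUID."""
    return str(uuid.uuid5(CATALOG_NAMESPACE, f"{merchant}:{product_id}"))


def to_point(merchant: str, product: dict, vector: List[float]) -> dict:
    """Build an id / vector / payload record ready to be upserted."""
    return {
        "id": stable_point_id(merchant, product["product_id"]),
        "vector": vector,
        "payload": {
            "merchant": merchant,
            "name": product["name"],
            "url": product["url"],
            "price": product["price"],
            "sizes": product["sizes"],
        },
    }
```

Because the ID depends only on the merchant and product identifier, a price or stock change in a refreshed feed lands on the same point instead of adding a second one.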
A new search service embeds the user’s request, runs a similarity search with filters (gender, category, price), and serves the matches. If the catalog returns too few good matches, it transparently falls back to the existing live search, so coverage never gets worse than today. Results map onto the front-end’s existing product shape, so no UI changes are required.
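The catalog-first decision can be sketched as a small function that takes the two search backends as callables. The score threshold, the minimum result count, and the `score` field on each hit are assumptions for illustration; the actual cut-offs would be set during relevance tuning.

```python
from typing import Callable, Dict, List

# Hypothetical cut-offs, to be tuned against real queries.
MIN_SCORE = 0.75
MIN_RESULTS = 3

SearchFn = Callable[[str], List[dict]]


def suggest(query: str,
            catalog_search: SearchFn,
            live_search: SearchFn,
            min_score: float = MIN_SCORE,
            min_results: int = MIN_RESULTS) -> Dict[str, object]:
    """Serve from the catalog when it has enough good matches, else fall back."""
    good_hits = [hit for hit in catalog_search(query) if hit["score"] >= min_score]
    if len(good_hits) >= min_results:
        return {"source": "catalog", "items": good_hits}
    # Too few good matches: use today's live search instead.
    return {"source": "live", "items": live_search(query)}
```

Returning the source alongside the items makes it straightforward to measure how often the catalog answers a request without the fallback.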
A scheduled job re-ingests updated feeds (prices/stock are time-sensitive) and prunes sold-out items. Additional advertisers are added from the supplied directory by dropping in a feed and its column map.
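A refresh run reduces to comparing the IDs already stored for a merchant with the IDs present in the new in-stock feed. The sketch below assumes both sets are scoped to a single merchant, so refreshing one feed never deletes another merchant's products; the `plan_refresh` name and its return shape are hypothetical.

```python
from typing import Dict, List, Set


def plan_refresh(existing_ids: Set[str], fresh_ids: Set[str]) -> Dict[str, List[str]]:
    """Decide which points to upsert and which to prune for one merchant."""
    return {
        # Everything in the fresh feed is upserted (new items and updates alike).
        "upsert": sorted(fresh_ids),
        # Stored items missing from the fresh in-stock feed are sold out or retired.
        "delete": sorted(existing_ids - fresh_ids),
    }
```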
| # | Phase | Key deliverables | Effort |
|---|---|---|---|
| 1 | Feed ingestion (ETL) | Generic Rakuten parser (Excel + text), per-feed column maps, variant merge, in-stock filter, price & image handling | 0.5 d |
| 2 | Embedding & catalog store | New products collection, filterable fields, batch embed ~73k items, idempotent re-load | 0.5 d |
| 3 | Catalog-first serving | Vector search + filters, live-search fallback logic, map to existing carousel via affiliate links | 0.5–1 d |
| 4 | Refresh & merchant ops | Scheduled re-ingest, stale/out-of-stock pruning, onboarding flow for new advertisers | 0.5 d |
| 5 | QA, tuning & deploy | Relevance comparison & tuning, filter correctness, affiliate click-through validation, production rollout | 0.5–1 d |
| | Total build effort | | 2.5–3.5 d |
Phases run as parallel workstreams (ingestion/store alongside serving), so calendar time is shorter than the summed effort. One-time initial embedding of ~73k products is machine compute (a few hours), not developer time, and is included in Phase 2.
Billed at a blended engineering rate of $30 / hour (an 8-hour developer-day = $240/day). The estimate is based on a build effort of 2.5–3.5 developer-days.
| Line item | Effort | Hours | Rate | Estimate |
|---|---|---|---|---|
| Engineering — Phases 1–5 (lower estimate) | 2.5 d | 20 h | $30 / h | $600 |
| Engineering — Phases 1–5 (upper estimate) | 3.5 d | 28 h | $30 / h | $840 |
| Estimated project total (range) | | | | $600 – $840 |
Time-and-materials at $30/hour, billed against actual hours within the estimated range; a fixed-fee option is available on request. Excludes third-party costs (Qdrant hosting, embedding/LLM API usage, affiliate-network fees), which run on your existing accounts.