The problem
Freelancer profiles often contain partial or inconsistent information. The useful question was not simply whether two strings matched, but whether several weak signals could be combined into a shortlist that a person could review quickly.
What I built
The workflow combined public-profile collection, search queries, deterministic scoring, and optional LLM reranking. It used signals such as name variants, location, title phrases, skills, companies, and education.
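To make "combining weak signals" concrete, here is a minimal sketch of a deterministic scorer. It assumes profiles have already been parsed into a hypothetical `Profile` structure. The weights in `WEIGHTS` and the choice of Jaccard overlap are illustrative assumptions, not the values or method used in the actual workflow. The scorer returns the per-signal breakdown along with the total, so a reviewer can see why a candidate ranked where it did.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical weights for illustration only; the real values are not published here.
WEIGHTS = {"name": 3.0, "location": 1.0, "title": 1.5, "skills": 2.0, "companies": 2.0, "education": 1.0}


@dataclass
class Profile:
    names: set[str]  # all known name variants, e.g. "Ana Silva", "A. Silva"
    location: str = ""
    title: str = ""
    skills: set[str] = field(default_factory=set)
    companies: set[str] = field(default_factory=set)
    education: set[str] = field(default_factory=set)


def _norm(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences don't count."""
    return " ".join(text.lower().split())


def _jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two normalized sets in [0, 1]; 0.0 when either side is empty."""
    a_n, b_n = {_norm(x) for x in a}, {_norm(x) for x in b}
    if not a_n or not b_n:
        return 0.0
    return len(a_n & b_n) / len(a_n | b_n)


def score(source: Profile, candidate: Profile) -> tuple[float, dict[str, float]]:
    """Return a weighted total plus the per-signal breakdown a reviewer can inspect."""
    signals = {
        # Any shared name variant counts as a full name match.
        "name": 1.0 if {_norm(n) for n in source.names} & {_norm(n) for n in candidate.names} else 0.0,
        "location": 1.0 if source.location and _norm(source.location) == _norm(candidate.location) else 0.0,
        # Title phrases compared as token overlap.
        "title": _jaccard(set(_norm(source.title).split()), set(_norm(candidate.title).split())),
        "skills": _jaccard(source.skills, candidate.skills),
        "companies": _jaccard(source.companies, candidate.companies),
        "education": _jaccard(source.education, candidate.education),
    }
    total = sum(WEIGHTS[k] * v for k, v in signals.items())
    return total, signals
```

No single signal decides the match. A shared name variant with nothing else in common scores lower than a name plus matching location and overlapping skills, which is the behavior you want when each field is individually unreliable.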
The public Upwork-to-LinkedIn matcher documents one measured slice of the work. On a 51-profile golden dataset, the main bottleneck was search coverage (whether the correct profile appeared among the search results at all), not the final semantic selection step. That finding set the priority for the next iteration: improve candidate discovery before investing more effort in reranking.
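One way to separate "search coverage" from "selection" on a golden dataset is to split end-to-end accuracy into two factors. The sketch below uses a hypothetical `GoldenCase` record and field names. It illustrates the arithmetic only and does not reproduce the matcher's evaluation code or its 51-profile results.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class GoldenCase:
    expected_url: str         # hypothetical field: the known-correct LinkedIn profile
    candidates: list[str]     # URLs surfaced by the search/discovery stage
    selected: Optional[str]   # URL picked by the final selection step, if any


def stage_metrics(cases: list[GoldenCase]) -> dict[str, float]:
    """Split end-to-end accuracy into search coverage and selection accuracy."""
    total = len(cases)
    covered = [c for c in cases if c.expected_url in c.candidates]
    correct = sum(1 for c in covered if c.selected == c.expected_url)
    return {
        # Share of cases where the right profile was retrievable at all.
        "search_coverage": len(covered) / total if total else 0.0,
        # Share of retrievable cases where selection then picked it.
        "selection_given_coverage": correct / len(covered) if covered else 0.0,
        # Overall accuracy; equals search_coverage * selection_given_coverage.
        "end_to_end": correct / total if total else 0.0,
    }
```

When `search_coverage` is low and `selection_given_coverage` is high, a better reranker can only recover a small share of the misses. That is the reasoning behind prioritizing discovery.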
Related work
The broader sourcing workflow also included a Playwright-based Upwork profile collector with persistent sessions, small resumable runs, deduplication, normalized fields, and CSV export.
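The deduplication, normalization, and resumable CSV export can be sketched without the browser side. The code below leaves out Playwright entirely and assumes scraped records arrive as plain dicts. The column names in `FIELDS`, and the idea of rebuilding the dedup set from the CSV itself, are illustrative assumptions rather than a description of the actual collector.

```python
import csv
from pathlib import Path

# Hypothetical column set; the real collector's normalized fields may differ.
FIELDS = ["profile_url", "name", "title", "location", "skills"]


def normalize(raw: dict) -> dict:
    """Canonicalize one scraped record so duplicates compare equal."""
    # Drop query strings and trailing slashes so URL variants share one key.
    url = raw.get("profile_url", "").strip().split("?", 1)[0].rstrip("/")
    skills = sorted({s.strip().lower() for s in raw.get("skills", []) if s.strip()})
    return {
        "profile_url": url,
        "name": " ".join(raw.get("name", "").split()),
        "title": " ".join(raw.get("title", "").split()),
        "location": " ".join(raw.get("location", "").split()),
        "skills": "; ".join(skills),
    }


def load_seen(csv_path: Path) -> set:
    """Rebuild the dedup set from the CSV itself, so an interrupted run can resume."""
    if not csv_path.exists():
        return set()
    with csv_path.open(newline="", encoding="utf-8") as fh:
        return {row["profile_url"] for row in csv.DictReader(fh)}


def export_batch(records, csv_path: Path) -> int:
    """Append new, normalized records to the CSV; return how many were written."""
    seen = load_seen(csv_path)
    write_header = not csv_path.exists()
    written = 0
    with csv_path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        for raw in records:
            row = normalize(raw)
            if not row["profile_url"] or row["profile_url"] in seen:
                continue  # skip records with no key, and duplicates
            writer.writerow(row)
            fh.flush()  # persist each row so a crash loses at most the current one
            seen.add(row["profile_url"])
            written += 1
    return written
```

Because the output file is the only state, rerunning a small batch after an interruption simply skips the profiles that were already written.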
Provider matching
Another part of the workflow turned messy project briefs into inspectable provider shortlists:
- Convert Markdown briefs into structured requirements with a strict schema.
- Normalize categories and skills into a shared vocabulary.
- Expand important skills through a curated synonym map.
- Normalize capabilities from multiple provider sources.
- Filter by visible constraints such as skill overlap, budget, recency, and timezone.
- If a strict pass produces no candidates, retry once with category enforcement relaxed, preserving recall for human review.
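The normalization, synonym expansion, constraint filtering, and one-time fallback above can be combined into a single shortlisting function. The synonym map, the `Requirements`/`Provider` fields, and the 90-day recency default are hypothetical, chosen only to make the sketch runnable. Each result records its matched skills and whether the category rule was relaxed, so the evidence stays visible.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical curated synonym map: canonical skill -> accepted variants.
SYNONYMS = {
    "javascript": {"js", "ecmascript"},
    "postgresql": {"postgres", "psql"},
    "kubernetes": {"k8s"},
}
# Reverse index so any variant (or the canonical name itself) maps to one term.
CANONICAL = {v: k for k, vs in SYNONYMS.items() for v in vs}
CANONICAL.update({k: k for k in SYNONYMS})


def canon(term: str) -> str:
    """Normalize case and whitespace, then map known variants to the shared vocabulary."""
    t = " ".join(term.lower().split())
    return CANONICAL.get(t, t)


@dataclass
class Requirements:
    category: str
    skills: set[str]
    max_hourly_rate: float
    timezones: set[str]          # acceptable zones, e.g. {"UTC+0", "UTC+1"}; empty = any
    min_skill_overlap: int = 1
    max_days_inactive: int = 90  # recency cutoff (hypothetical default)


@dataclass
class Provider:
    name: str
    categories: set[str]
    skills: set[str]
    hourly_rate: float
    timezone: str
    days_since_active: int


def shortlist(req: Requirements, providers: list[Provider]) -> list[dict]:
    wanted = {canon(s) for s in req.skills}
    category = canon(req.category)

    def run(enforce_category: bool) -> list[dict]:
        hits = []
        for p in providers:
            matched = wanted & {canon(s) for s in p.skills}
            if enforce_category and category not in {canon(c) for c in p.categories}:
                continue
            if len(matched) < req.min_skill_overlap:
                continue
            if p.hourly_rate > req.max_hourly_rate or p.days_since_active > req.max_days_inactive:
                continue
            if req.timezones and p.timezone not in req.timezones:
                continue
            # Keep the evidence next to the result so a reviewer can see why it passed.
            hits.append({"provider": p.name, "matched_skills": sorted(matched),
                         "category_relaxed": not enforce_category})
        return hits

    # Strict pass first; relax category enforcement exactly once if it finds nothing.
    return run(enforce_category=True) or run(enforce_category=False)
```

Only the category rule is loosened, and only once. Budget, recency, timezone, and skill overlap stay enforced, so the fallback widens recall without silently dropping the brief's hard constraints.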
The LLM is useful for extracting structured facts from unstructured briefs. The shortlist layer remains deterministic and inspectable: matched skills, thresholds, categories, and source signals stay visible to the reviewer.
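A strict schema is what keeps the model's extraction step from leaking ambiguity into the deterministic layer. The sketch below validates model output using only the standard library. `BriefRequirements` and its fields are a hypothetical schema, and a project might equally use a validation library instead. Any missing key, extra key, or wrong type is rejected rather than guessed at.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BriefRequirements:
    # Hypothetical schema; the real one is not shown in this write-up.
    category: str
    skills: tuple
    max_hourly_rate: float
    timezone: str


def parse_requirements(raw_json: str) -> BriefRequirements:
    """Accept model output only if it matches the schema exactly; otherwise raise ValueError."""
    data = json.loads(raw_json)  # json.JSONDecodeError is itself a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {f.name for f in fields(BriefRequirements)}
    missing, extra = expected - set(data), set(data) - expected
    if missing or extra:
        raise ValueError(f"schema mismatch: missing={sorted(missing)} extra={sorted(extra)}")
    if not isinstance(data["category"], str) or not isinstance(data["timezone"], str):
        raise ValueError("category and timezone must be strings")
    skills = data["skills"]
    if not isinstance(skills, list) or not all(isinstance(s, str) for s in skills):
        raise ValueError("skills must be a list of strings")
    rate = data["max_hourly_rate"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(rate, bool) or not isinstance(rate, (int, float)):
        raise ValueError("max_hourly_rate must be a number")
    return BriefRequirements(data["category"], tuple(skills), float(rate), data["timezone"])
```

A rejected extraction can then be retried or routed to a person, instead of producing a shortlist from fields the model invented.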
What I learned
Matching systems are easiest to improve when each stage is observable. Separate discovery, filtering, scoring, and semantic selection so an evaluation can tell you where recall or precision is actually being lost.
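To make that separation pay off in evaluation, you can run each golden case through the stages one at a time and record the first stage that drops the known-correct candidate. In this minimal sketch, the stages are arbitrary functions over candidate IDs. The stage names and the toy stages in the usage example are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable, Optional


def first_loss(
    gold: str,
    discovered: list[str],
    stages: list[tuple[str, Callable[[list[str]], list[str]]]],
) -> Optional[str]:
    """Name the first stage that dropped the known-correct candidate, or None if it survived."""
    if gold not in discovered:
        return "discovery"
    survivors = list(discovered)
    for name, stage in stages:
        survivors = stage(survivors)
        if gold not in survivors:
            return name
    return None


def loss_report(cases, stages) -> Counter:
    """Tally where each (gold, discovered) case was lost across the whole evaluation set."""
    return Counter(first_loss(gold, discovered, stages) or "kept" for gold, discovered in cases)
```

For example, with toy stages `("filtering", lambda xs: [x for x in xs if not x.startswith("bad")])`, `("scoring", lambda xs: xs[:2])`, and `("selection", lambda xs: xs[:1])`, the report shows directly whether misses pile up in discovery or further downstream.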