How a frustrating apartment search in Portugal turned into a full-blown AI platform for real estate agencies. This is the origin story.
It started with a terrible apartment search
My wife and I decided to move to another apartment. Simple enough, right?
Wrong. What followed was weeks of pain: dozens of websites with outdated listings, broken search filters, properties that were sold months ago still showing as available, and agencies that never replied. We’d find something promising on one site, switch to another to compare prices, lose track of what we’d already seen, and start over.
The worst part was the constant context-switching between platforms. Every agency has its own website, its own listings, its own way of doing things. There’s no single place to search everything.
So I did what any frustrated software engineer would do — I decided to build one.
Enter Predileto
The initial idea was straightforward: build a vertical search engine that aggregates properties for sale and rent across all Portuguese agencies. One search, all listings, no more tab hell.
I started scraping property data from local agencies to build the index. And that’s when things got interesting.
The real problem
While scraping agency websites, I started talking to the people behind them. I wanted to understand their data, their workflows, their pain points. What I found surprised me.
These agencies are drowning in manual work. The CRM most of them use is expensive, ugly, and bloated with features nobody asked for. But the real bottleneck isn’t the CRM — it’s everything around it:
- Property registration — an agent receives a stack of PDFs (deeds, tax documents, ID cards) and manually types every field into the system. Address, area, bedrooms, owner names, NIF numbers — all copy-pasted from scanned documents.
- Applicant screening — tenants submit documents that need to be reviewed, cross-referenced, and approved. Mostly by hand.
- Contract generation — drafting rental or purchase contracts means copying a template, finding the right clauses, and filling in dozens of fields. Every time.
- Visit scheduling — coordinating between owners, agents, and potential buyers/tenants through phone calls and WhatsApp messages.
I talked to enough agencies to see the pattern. These are smart people spending 60-70% of their time on repetitive administrative tasks instead of what they’re actually good at: finding clients and closing deals.
As a software engineer who’s spent years building data pipelines, RAG systems, and async architectures, I saw the opportunity immediately. Most of this manual work can be automated with the tools we have today: OCR, LLMs, structured extraction, queue-based processing.
What Predileto became
Predileto evolved from a search engine into an AI-powered platform for real estate agencies. The vision:
- Smart property ingestion — upload property documents and ID cards, and the system automatically registers the property and its owners into the database. OCR with Reducto, structured extraction with GPT-5.4, document-type-aware prompts for each Portuguese ID format.
- Contract intelligence — upload a source contract, and the system parses it, classifies each section (static, parameterized, conditional, generative), and produces reusable Jinja templates. New contracts are generated by filling templates with CRM data.
- Applicant screening — tenants submit their documents through a portal, and the system extracts, validates, and scores their application automatically.
- Unified search — the original idea lives on. A single search across all agencies, with fresh data.
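To make the contract intelligence idea concrete, here is a minimal sketch of the four section kinds and how a new contract could be assembled from CRM data. It is an illustration, not the Predileto implementation: the names `SectionKind`, `Section`, and `render_contract` and the CRM field names are all hypothetical, and the standard library's `string.Template` stands in for Jinja so the snippet runs with no dependencies.

```python
from dataclasses import dataclass
from enum import Enum
from string import Template
from typing import Optional


class SectionKind(Enum):
    STATIC = "static"                # fixed legal text, copied verbatim
    PARAMETERIZED = "parameterized"  # text with fields filled from CRM data
    CONDITIONAL = "conditional"      # included only when a CRM flag is truthy
    GENERATIVE = "generative"        # would be drafted by an LLM; a marker here


@dataclass(frozen=True)
class Section:
    kind: SectionKind
    body: str
    condition: Optional[str] = None  # hypothetical CRM flag name


def render_contract(sections: list[Section], crm: dict[str, object]) -> str:
    """Assemble a contract by handling each section according to its kind."""
    parts = []
    for section in sections:
        if section.kind is SectionKind.STATIC:
            parts.append(section.body)
        elif section.kind is SectionKind.PARAMETERIZED:
            # string.Template is a stand-in for Jinja in this sketch.
            parts.append(Template(section.body).substitute(crm))
        elif section.kind is SectionKind.CONDITIONAL:
            if crm.get(section.condition):
                parts.append(Template(section.body).substitute(crm))
        else:
            # Generative sections need an LLM call; leave a visible marker.
            parts.append(f"[TODO generate: {section.body}]")
    return "\n".join(parts)


sections = [
    Section(SectionKind.STATIC, "RENTAL AGREEMENT"),
    Section(SectionKind.PARAMETERIZED,
            "Tenant: $tenant_name. Monthly rent: EUR $rent."),
    Section(SectionKind.CONDITIONAL,
            "Pets are allowed with a deposit of EUR $pet_deposit.",
            condition="pets_allowed"),
    Section(SectionKind.GENERATIVE, "special clauses"),
]
crm = {"tenant_name": "Ana Silva", "rent": 900,
       "pets_allowed": False, "pet_deposit": 200}
contract = render_contract(sections, crm)
print(contract)
```

Classifying sections up front means the LLM only has to be involved for the generative parts; everything else is deterministic template filling.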
The tech stack: Python, FastAPI, hexagonal architecture, SQS workers, Reducto for OCR, LangChain + GPT-5.4 for structured extraction, PostgreSQL, S3, and Terraform for infrastructure. Everything async, everything behind abstract ports so pieces can be swapped without touching business logic.
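As a rough illustration of what "behind abstract ports" can look like in async Python, here is a small sketch using `typing.Protocol`. The port and class names (`OcrPort`, `ExtractionPort`, `RegisterProperty`, `FakeOcr`, `KeyValueExtractor`) are hypothetical and are not taken from the Predileto codebase. In a real setup the adapters would wrap Reducto and an LLM; here they are in-memory fakes so the example runs on its own.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


class OcrPort(Protocol):
    """Port: anything that can turn a document into text."""
    async def extract_text(self, document: bytes) -> str: ...


class ExtractionPort(Protocol):
    """Port: anything that can turn text into structured fields."""
    async def extract_fields(self, text: str) -> dict[str, str]: ...


@dataclass
class RegisterProperty:
    """Use case: depends only on the ports, never on a concrete vendor."""
    ocr: OcrPort
    extractor: ExtractionPort

    async def run(self, document: bytes) -> dict[str, str]:
        text = await self.ocr.extract_text(document)
        return await self.extractor.extract_fields(text)


class FakeOcr:
    """Adapter for tests: treats the bytes as already-recognized UTF-8 text."""
    async def extract_text(self, document: bytes) -> str:
        return document.decode("utf-8")


class KeyValueExtractor:
    """Adapter for tests: parses 'key: value' lines instead of calling an LLM."""
    async def extract_fields(self, text: str) -> dict[str, str]:
        fields = {}
        for line in text.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        return fields


use_case = RegisterProperty(ocr=FakeOcr(), extractor=KeyValueExtractor())
result = asyncio.run(use_case.run(b"address: Rua Augusta 1\nbedrooms: 2"))
print(result)
```

Swapping the OCR vendor or the model then means writing a new adapter that satisfies the same protocol; `RegisterProperty` itself stays unchanged, and tests can use fakes like these without network access.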
What’s coming next
This blog series will document the build in detail. Real code, real architecture decisions, real trade-offs. Topics I’ll cover:
- RAG pipelines for document analysis and contract generation
- Structured LLM output with Pydantic schemas and LangChain
- Hexagonal architecture in Python — ports, adapters, and why it matters
- Async SQS workers with heartbeat-based visibility extension
- Testing strategies for AI-powered services without spinning up Docker
- System design for document processing at scale
Each post focuses on a specific feature with the actual implementation. No toy examples, no hand-waving.
Let’s build.