Architecture of a service for collecting and classifying housing ads

In this article, I will talk about how the service for finding rental ads from vk.com is designed and developed, why a service-oriented architecture (SOA) was chosen, and what technologies and solutions were used during development.
The service has been running for over nine months.

During this time:

Throughout the article, I will use the word service to mean a SOA module, not the entire web service.

I chose a SOA architecture because it allowed:

While it could be called a microservice architecture, there were some differences. Services exchange data using a "Shared Database" approach based on the MDBWP protocol instead of the traditional HTTP API with separate databases for each service. This approach enabled faster development while keeping the benefits of the SOA method.

To automate deployment, I chose Ansible, a configuration management tool with a low entry barrier.

As the database, I selected MongoDB. This document-oriented database was well-suited for storing rental ads, metro station lists, landlord contact information, and ad descriptions.

Currently, the overall interaction scheme of the services looks like this:

Services

Service for displaying and searching ads

https://github.com/mrsuh/rent-view

This service is written in NodeJS because the most important quality criteria were fast server response times for users.
The service fetches ads from MongoDB, renders HTML pages using the doT.js templating engine, and sends them to the browser.
The service is built with Grunt.
For the browser, scripts are written in pure JS, and styles are created with LESS. Nginx is used as a proxy server to cache some responses and enable HTTPS connections.

Ad Collection Service

https://github.com/mrsuh/rent-collector

This service collects ads, classifies them, and saves them to the database.
It is written in PHP for several reasons: familiarity with the necessary libraries and fast development speed.
The framework used is Symfony 3.
Beanstalk was chosen as the queue service. It is lightweight but doesn't have its own message broker, making it ideal for a small virtual server and non-critical data.

Using Beanstalk, four message exchange channels were created:

Ad Classification Service

https://github.com/mrsuh/rent-parser

The service is written in Golang.
To extract structured data from text, the service uses the Tomita parser. It performs both preprocessing of the text and postprocessing of parsing results.
You can read more about ad classification in this article.

Settings Management Service

https://github.com/mrsuh/rent-control

The service is written in PHP for its familiarity with the required libraries and fast development speed.
Framework: Symfony 3
Style Library: Bootstrap 3

This service manages settings such as:

Originally, all parsing management data was stored in configuration files. However, as the number of cities grew, it became necessary to:

Notification service

Notification Example

The Service is written in Golang

How it works:

Helper Repositories

Code for the Shared Database in PHP

Shared Database Schema

https://github.com/mrsuh/rent-schema

When the rent-control service was added, database schema code started to duplicate across services. To avoid this, the schema code was moved to a separate package.
Now, any PHP service can easily use this package by adding it as a dependency with composer:

composer require mrsuh/rent-schema

ODM for mongoDB

https://github.com/mrsuh/mongo-odm

The first ODM for PHP MongoDB I considered was Doctrine 2. It comes with Symfony 3 and has excellent documentation.
However, at the time of building the service, for Doctrine 2 to work with the latest version of Mongo PHP drivers, you needed to install an additional package as a bridge between the new and old API. Doctrine 2 is already a large project, and adding another package made it even heavier.
I wanted something more lightweight. So, I decided to write my own ODM with a minimal set of features. And it worked! The ODM handles its tasks perfectly.

Statistics

The service adds an average of 519.41 listings to the site every day.
The most popular metro stations in the largest cities of Russia are:

Conclusion

If you are unsure whether you need SOA architecture, start by building a monolithic application with modular breakdowns. This way, it will be easier to transition to services if needed. However, if you choose to use SOA architecture, be aware that it may increase the complexity of development, deployment, the amount of code, and the volume of messages between services.

P.S. I found my last two apartments using my service. I hope it helps you too.