Microservices architecture has gained widespread adoption due to its flexibility, scalability, and modularity. However, managing data consistency and availability across disparate services can pose significant challenges, particularly in large-scale applications like online marketplaces.
This article explores the complexities of sharing data between microservices on a large online platform, using a real-world example: Fiverr's implementation of the Top Clients feature.
Fiverr, an established online marketplace for freelance services, is a useful case study for these challenges and their solutions. With millions of users and a diverse range of services, its platform architecture offers practical lessons in sharing data within a microservices environment.
We will follow Fiverr's implementation of the Top Clients feature, which aims to enhance seller credibility by showcasing sellers' high-quality clients. Starting from the motivation behind the feature and the constraints Fiverr faced, we will see how it navigated the complexities of data sharing between microservices.
By dissecting Fiverr's approach and the lessons it learned, we can better understand the strategies and trade-offs involved in sharing data effectively between microservices.
Fiverr wanted to expose the Top Clients feature more widely across its top-of-funnel pages, including user pages and search results. This expansion raised new challenges around data consistency and availability and called for a robust solution within Fiverr's microservices architecture, in which each microservice serving a top-of-funnel page maintained its own read-optimized data view tailored to that page's needs.
In developing the Top Clients feature, Fiverr faced three primary constraints, each demanding careful consideration:
High Consistency Requirement for Self-View Mode:
Ensuring a seamless user experience, particularly in the self-view mode where sellers can view and edit their profiles, including their top clients, was paramount. Any discrepancies in this mode were deemed unacceptable.
Data consistency in self-view mode was therefore prioritized, so that the information shown to sellers was always up to date: a seller who edits their top clients sees the change immediately.
To achieve this, Fiverr relied on data sourced directly from the Top Clients service database, where all reads and writes occurred in a controlled and consistent manner.
Eventual Consistency for Other Pages:
While self-view mode demanded immediate consistency, other pages such as the gig page or the user page could afford a degree of delay in data updates.
For instance, when a seller added a new client, there might be a brief delay before this information propagated to other pages. This delay was acceptable as long as eventual consistency was achieved across the platform.
Because these pages tolerated asynchronous updates, Fiverr could keep browsing smooth for users even when data synchronization lagged briefly.
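The two consistency requirements can be sketched as a read path that picks its data source by viewer. This is a minimal Python sketch under stated assumptions: `Stores`, `get_top_clients`, and the in-memory dictionaries standing in for the Top Clients database and a page service's local view are hypothetical, not Fiverr's code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Stores:
    # Hypothetical stand-ins: the Top Clients service's database (source of truth)
    # and one page service's local, eventually consistent copy of the same data.
    source_of_truth: Dict[str, List[str]] = field(default_factory=dict)
    local_view: Dict[str, List[str]] = field(default_factory=dict)


def get_top_clients(stores: Stores, seller_id: str, viewer_id: Optional[str]) -> List[str]:
    """Return a seller's top clients, choosing the data source by who is viewing."""
    if viewer_id == seller_id:
        # Self-view mode: the seller must see their own edits immediately,
        # so read straight from the source of truth.
        return list(stores.source_of_truth.get(seller_id, []))
    # Any other viewer: a briefly stale local copy is acceptable.
    return list(stores.local_view.get(seller_id, []))


# Example: the seller has just added "globex", but the change event
# has not yet reached the page service's local view.
stores = Stores(
    source_of_truth={"seller-1": ["acme", "globex"]},
    local_view={"seller-1": ["acme"]},
)
print(get_top_clients(stores, "seller-1", viewer_id="seller-1"))  # ['acme', 'globex']
print(get_top_clients(stores, "seller-1", viewer_id="visitor-9"))  # ['acme']
```

The seller sees the newly added client at once, while other visitors may see the older list until the change reaches the page service.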
Uncompromised Availability of the Top Clients Service:
Because a seller's top clients are prominently featured on Fiverr's most popular pages, the availability of the Top Clients service was critical. Even so, a failure to fetch data from the service must not break the page: the affected page should still render successfully, just without the top clients data.
This constraint called for an architecture that handles service failures gracefully without disrupting the overall user experience. Fiverr's approach therefore prioritized fault tolerance and graceful degradation to keep the platform available under varying conditions.
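A minimal sketch of graceful degradation, assuming a hypothetical async `fetch` callable that wraps the Top Clients service and an illustrative 200 ms timeout (neither comes from Fiverr's implementation): the page treats any failure or timeout as "no data" and renders without the section.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Hypothetical signature for a client of the Top Clients service.
FetchTopClients = Callable[[str], Awaitable[List[str]]]


async def fetch_or_empty(fetch: FetchTopClients, seller_id: str, timeout_s: float = 0.2) -> List[str]:
    """Call the Top Clients service, but never let its failure break the page."""
    try:
        return await asyncio.wait_for(fetch(seller_id), timeout=timeout_s)
    except Exception:
        # Timeouts, connection errors, and bad responses all degrade
        # to "no top clients" instead of failing the whole page.
        return []


async def render_user_page(seller_id: str, fetch: FetchTopClients) -> Dict[str, object]:
    page: Dict[str, object] = {"seller_id": seller_id}
    top_clients = await fetch_or_empty(fetch, seller_id)
    if top_clients:
        page["top_clients"] = top_clients  # the section is simply omitted otherwise
    return page
```

Catching broadly is deliberate here: for an optional page section, any error from the dependency should result in a smaller page, not an error page.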
Exploration of Alternatives
Synchronous Approach
In the synchronous approach, each service sends an HTTP request to the Top Clients service to retrieve a seller's top clients and then waits for the response. This mirrors the ad hoc solution initially implemented for the Top Clients feature.
Pros:
Simplicity at low scale: Each service makes a single API call to the Top Clients service, so implementation complexity stays low, which suits smaller-scale deployments.
Data consistency: By fetching data directly from the Top Clients service, all pages display identical data, ensuring consistency across the platform.
No need for data duplication: Since all services consume data directly from the Top Clients service, there is no need for consumers to store this data in their respective databases.
Cons:
Lower Availability: Every page that shows top clients depends on the Top Clients service at request time. As traffic grows, particularly on high-traffic top-of-funnel pages, the service may struggle with the load, reducing availability and risking service disruptions.
Redundant calls: Not every seller has provided data about their top clients, yet every service still calls the Top Clients service for every seller, producing unnecessary requests that often return empty responses.
Higher Latency: An additional API call to the Top Clients service on every page render can lengthen response times, degrading the overall user experience.
Optimization: To mitigate the higher latency of synchronous calls, the caller can call the User Page service and the Top Clients service in parallel. However, this only works if the caller already has the information needed to call the Top Clients service (for example, the seller's identifier) without waiting for the User Page service's response.
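The parallel-call optimization can be sketched with `asyncio.gather`. Both fetch functions below are hypothetical stand-ins that simulate network latency with `asyncio.sleep`; they are not real Fiverr APIs.

```python
import asyncio
from typing import Dict, List


async def fetch_user_page_data(seller_id: str) -> Dict[str, str]:
    # Hypothetical stand-in for a call to the User Page service.
    await asyncio.sleep(0.05)
    return {"seller_id": seller_id, "display_name": "Example Seller"}


async def fetch_top_clients(seller_id: str) -> List[str]:
    # Hypothetical stand-in for a call to the Top Clients service.
    await asyncio.sleep(0.05)
    return ["acme", "globex"]


async def render_sequential(seller_id: str) -> Dict[str, object]:
    # Total latency is roughly the sum of both calls.
    user = await fetch_user_page_data(seller_id)
    clients = await fetch_top_clients(seller_id)
    return {**user, "top_clients": clients}


async def render_parallel(seller_id: str) -> Dict[str, object]:
    # Valid only because seller_id is already known, so the Top Clients call
    # needs nothing from the User Page response. Total latency is roughly
    # that of the slower call.
    user, clients = await asyncio.gather(
        fetch_user_page_data(seller_id), fetch_top_clients(seller_id)
    )
    return {**user, "top_clients": clients}
```

Both versions return the same page data; only the latency differs. The parallel version still inherits the availability and redundant-call drawbacks listed above.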
Asynchronous Approach
Recognizing the read-heavy nature of the Top Clients service and the need to decouple business logic from user-facing pages, Fiverr opted for an asynchronous approach. It builds on Command Query Responsibility Segregation (CQRS), a design pattern that separates the handling of read operations (queries) from write operations (commands), which promotes scalability and maintainability.
Fiverr's architecture embraces CQRS alongside event sourcing, a combination that facilitates robust data management and scalability. When a seller performs actions such as adding or deleting a top client, the Top Clients service updates its database and publishes a corresponding message to Kafka, a distributed messaging system. This message describes the change that occurred, enabling other services to consume it and update their data views accordingly.
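To make the flow concrete, here is a minimal in-process sketch of CQRS with change events. An in-memory list stands in for the Kafka topic, and all class and field names are hypothetical; a real deployment would use a Kafka client library, partitions, and committed consumer offsets.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class TopClientChanged:
    seller_id: str
    client_id: str
    action: str  # "added" or "deleted"


@dataclass
class TopClientsService:
    """Command side: owns the source of truth and publishes a change event per write."""
    db: Dict[str, Set[str]] = field(default_factory=dict)
    # In-memory, append-only list standing in for a Kafka topic.
    topic: List[TopClientChanged] = field(default_factory=list)

    def add_client(self, seller_id: str, client_id: str) -> None:
        self.db.setdefault(seller_id, set()).add(client_id)
        # In a real system the DB write and the publish are two separate
        # operations; this sketch ignores the case where only one succeeds.
        self.topic.append(TopClientChanged(seller_id, client_id, "added"))

    def delete_client(self, seller_id: str, client_id: str) -> None:
        self.db.get(seller_id, set()).discard(client_id)
        self.topic.append(TopClientChanged(seller_id, client_id, "deleted"))


@dataclass
class PageServiceView:
    """Query side: one consuming service's own read-optimized copy."""
    view: Dict[str, Set[str]] = field(default_factory=dict)
    offset: int = 0  # next position to read, like a committed consumer offset

    def poll(self, topic: List[TopClientChanged]) -> None:
        # Apply every event not yet seen, in order, then advance the offset.
        for event in topic[self.offset:]:
            clients = self.view.setdefault(event.seller_id, set())
            if event.action == "added":
                clients.add(event.client_id)
            else:
                clients.discard(event.client_id)
            self.offset += 1
```

A consumer that was offline simply polls later from its stored offset and catches up; until it does, its view is stale, which is exactly the eventual-consistency trade-off described below.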
Pros:
High Availability: Because each service stores top client data locally, it no longer needs to call the Top Clients service at request time. Each service has immediate access to the data it needs, which improves availability and reduces reliance on external dependencies.
Eventual Consistency: Kafka durably retains published messages (for the configured retention period), so a consuming service that is temporarily down or cut off by a network partition can resume from its last committed offset and catch up. As a result, the data in each service eventually converges with the source of truth maintained by the Top Clients service.
Resource Management: Each service has the autonomy to adjust its resources independently based on its specific requirements. By fine-tuning factors such as CPU, memory, and database performance, services can effectively manage their scale while maintaining low query complexity and ensuring optimal performance.
Cons:
Possible Inconsistencies: Even with eventual consistency, the databases of different services can temporarily diverge, so different pages may display different top clients for the same seller, which can confuse users.
Data Duplication: Adopting an asynchronous approach results in data duplication across multiple databases. While this redundancy enhances availability and reduces reliance on centralized data sources, it also increases maintenance overhead and introduces the potential for data integrity issues if not managed effectively.
Overall, while the asynchronous approach offers significant advantages in terms of availability, scalability, and autonomy, it necessitates careful consideration of data consistency and duplication concerns to ensure a seamless and reliable user experience across the platform.
Hybrid Solution
The hybrid solution refines the synchronous alternative, aiming to remove its inefficiencies while accommodating variability in seller-provided data. It rests on a key observation: not all sellers provide data about their top clients.
Consequently, there is no need to call the Top Clients service on every page render.
Data Write Process:
When a top client is created, the Top Clients service publishes the client ID to its consumers. Each consuming service, including the Gig Page service, saves the client ID in its own database. Each service can therefore tell which sellers have top clients without calling the Top Clients service.
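On the write side, a consuming service only needs a lean index from seller to client IDs. The sketch below assumes a hypothetical dictionary event shape and class name; the real message format is not described in the article.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Set


class GigPageClientIndex:
    """Hypothetical Gig Page table that keeps only seller_id -> client IDs."""

    def __init__(self) -> None:
        self._ids: DefaultDict[str, Set[str]] = defaultdict(set)

    def handle(self, event: Dict[str, str]) -> None:
        # Assumed event shape:
        # {"type": "client_added" | "client_deleted", "seller_id": ..., "client_id": ...}
        ids = self._ids[event["seller_id"]]
        if event["type"] == "client_added":
            ids.add(event["client_id"])
        elif event["type"] == "client_deleted":
            ids.discard(event["client_id"])
        # Each event sets one (seller, client) pair to present or absent, so
        # replaying an in-order run of already-applied events (as after a
        # restart from the last committed offset) leaves the index unchanged.

    def client_ids(self, seller_id: str) -> Set[str]:
        # .get avoids creating empty entries for sellers with no clients.
        return set(self._ids.get(seller_id, set()))
```

Storing only IDs keeps the table small; names and other display details stay in the Top Clients service and are fetched at read time.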
Data Read Process:
When rendering the gig page, the Gig Page service checks its database to see whether the seller has any associated client IDs. Only if such IDs exist does it call the Top Clients service to enrich the data for those clients. This avoids redundant calls to the Top Clients service, saving resources and improving response times.
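The hybrid read path can be sketched as follows, assuming a hypothetical async `enrich` call to the Top Clients service and an illustrative timeout. The remote call happens only when the local ID lookup finds something.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Set

# Hypothetical call that turns client IDs into display data (names, logos, ...).
EnrichClients = Callable[[List[str]], Awaitable[List[Dict[str, str]]]]


async def render_gig_page(
    seller_id: str,
    local_client_ids: Dict[str, Set[str]],
    enrich: EnrichClients,
    timeout_s: float = 0.2,
) -> Dict[str, object]:
    page: Dict[str, object] = {"seller_id": seller_id}
    ids = sorted(local_client_ids.get(seller_id, set()))
    if not ids:
        # Seller has no top clients: skip the remote call entirely.
        return page
    try:
        page["top_clients"] = await asyncio.wait_for(enrich(ids), timeout=timeout_s)
    except Exception:
        # Keep the page available: render without the section on failure.
        pass
    return page
```

For sellers without top clients the page never touches the Top Clients service; for the rest, latency still depends on that service, which is the response-time drawback noted below.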
Pros:
Reduced Redundancy: The hybrid solution calls the Top Clients service only for sellers with associated client data, which reduces the load on the service and mitigates potential bottlenecks.
Smaller Database Size: Storing only client IDs instead of complete client data keeps each service's database much smaller, which makes storage and retrieval more efficient.
Cons:
Impact on Response Time: The Top Clients service still adds to the response time of the pages that call it; how much depends on factors such as service load and network latency.
Data Duplication: While the hybrid solution reduces redundancy in API calls, it does not eliminate data duplication entirely. Client IDs are stored in multiple service databases, potentially leading to inconsistencies if not synchronized appropriately.
Transition to an Asynchronous Approach Using Kafka
The transition to an asynchronous approach using Kafka was a significant milestone for Fiverr. The Top Clients service publishes messages describing changes to top client data to Kafka, and other services consume these messages to update their own data views. This reduced reliance on synchronous API calls and gave Fiverr scalability and high availability, because each service can operate independently and process data updates asynchronously.
Refinements from the hybrid solution also helped optimize performance and minimize redundancy. Calling the Top Clients service only when a service's database holds client IDs for the seller cut redundant API calls, and storing those client IDs locally let pages retrieve and render top client data efficiently. These refinements aimed to balance data consistency against system performance and keep the user experience on Fiverr seamless.
Outcome and Evaluation
The asynchronous approach using Kafka brought several benefits for Fiverr. First, it made data sharing highly available and scalable, allowing services to handle growing data volumes and user traffic. Despite the asynchronous nature of data sharing, measures were in place to maintain consistency: Kafka's reliable message delivery ensured that messages describing data updates reached the consuming services, and regular monitoring and reconciliation processes detected and resolved discrepancies between services' data views.
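The article does not describe how Fiverr's reconciliation worked. As a hypothetical illustration only, a periodic job could compare a service's local view with the Top Clients source of truth and repair the differences; `reconcile` and `repair` are names invented for this sketch.

```python
from typing import Dict, Set, Tuple

Diff = Dict[str, Tuple[Set[str], Set[str]]]


def reconcile(source: Dict[str, Set[str]], view: Dict[str, Set[str]]) -> Diff:
    """Return {seller_id: (missing_from_view, extra_in_view)} for every seller that differs."""
    diffs: Diff = {}
    for seller_id in source.keys() | view.keys():
        expected = source.get(seller_id, set())
        actual = view.get(seller_id, set())
        missing, extra = expected - actual, actual - expected
        if missing or extra:
            diffs[seller_id] = (missing, extra)
    return diffs


def repair(view: Dict[str, Set[str]], diffs: Diff) -> None:
    """Bring the local view in line with the source of truth for the reported sellers."""
    for seller_id, (missing, extra) in diffs.items():
        clients = view.setdefault(seller_id, set())
        clients |= missing
        clients -= extra
```

A production job would also need to tell real drift apart from events that are still in flight, for example by re-checking a difference before repairing it.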
Furthermore, the implementation process provided valuable insights into enhancing microservices architecture and data sharing practices. It highlighted the importance of balancing trade-offs between consistency, availability, and performance in distributed systems. These lessons learned informed future development efforts, guiding architectural decisions and refining best practices for data sharing and synchronization in microservices environments.
Fiverr's journey in implementing the Top Clients feature highlights the complexities and considerations involved in optimizing data sharing within a microservices architecture. By prioritizing scalability, availability, and consistency, Fiverr successfully enhanced seller credibility while ensuring seamless platform functionality. This case study underscores the importance of robust data-sharing mechanisms in driving platform innovation and user trust within online marketplaces.