Scale Your Social Media to Support Millions of Users

Design and develop your social media platform with a scalable microservices architecture, employing techniques such as auto scaling, load balancing, database clustering, database sharding and data caching, to achieve scalable performance at optimised cost.

Social media has the potential to grow exponentially. Facebook and Instagram have more than two billion users worldwide, and a platform should be designed and developed to support such scale. Growth can also come suddenly, when it is least expected.

We developed a social media app for a US community focused on political discourse. The community grew to about ten thousand users in the six months after launch. Then, in a viral moment, the app got coverage on national TV. One million people signed up on that single day, and three million more in the following week!

Cost should be given careful consideration too. At large volumes, social media platforms can consume a lot of expensive resources. The platform has to be optimised to consume as few resources as possible without compromising performance.

In this article, we discuss some of the tools and techniques that we employ to scale social media platforms and at the same time keep costs low.

System Architecture

Below is a simplified architecture diagram of a social media platform. In reality, platforms are much more complex, with many components and interconnections. We present a simplified view to discuss the concepts without distractions.

The frontend - the app running on your mobile phone or desktop - presents the user interface and exchanges data with the backend.

The backend handles all business logic, security and data processing. The backend code runs on server instances, typically located in the cloud and reached over the internet.

Data is stored in and retrieved from the database, while media files (photos, videos) are kept in storage devices. The database and storage are hidden behind the backend servers and are not directly accessible from the frontend clients.

The frontend interacts with the backend through a well-defined interface called the API (Application Programming Interface).

Horizontal and Vertical Scaling

The server instances (that run the backend code) are available in various configurations. Configurations differ on four main parameters - number of CPU cores, amount of memory, amount and type (SSD or HDD) of disk storage, and network bandwidth. A lower configuration could have a single core, 1 GB of memory, a 2 GB hard disk and 20 Mbps of network bandwidth, while a higher configuration could have 128 CPU cores, 512 GB of memory, a 2 TB SSD and 50 Gbps of network bandwidth. Choosing the optimal configuration for the workload is necessary - higher configuration instances cost much more than low-end ones.

The backend performs functions on behalf of the client (refer to the architecture diagram above). When someone opens a social media app on their mobile phone and requests the latest set of posts, the backend authenticates the user, queries the database, processes the data and returns it to the mobile app. Thus actions on the clients (mobile and web apps) result in work in the backend. If thousands of users start using the app simultaneously, the server capacity might not be enough to handle the workload.

Vertical Scaling:

One way to handle this increased workload is to increase the capacity of the server instance. We could increase the number of CPU cores of the instance from 2 to 12. If the workload increases further, we can upgrade the instance from 12 to 24 cores. This is an example of vertical scaling. In vertical scaling, we keep increasing the capacity of the system to handle increasing workloads.

Vertical scaling has its limits. As we keep upgrading, at some point even the topmost configuration will not be sufficient. There is another problem too - how to handle changes in the workload efficiently. To save money, we should downgrade the configuration when the workload decreases and upgrade when it increases. However, social media workloads tend to be erratic. Events in the real world - like a sports event or a celebrity moment - can cause traffic to spike. Creating a new configuration and moving the traffic to it takes time, which adversely affects performance.

Horizontal Scaling:

Horizontal scaling addresses these limitations. With horizontal scaling, instead of increasing the capacity of an instance, we add more instances and distribute the workload among them. Theoretically, there is no limit to the number of instances we can add - the system can scale indefinitely. New instances can be created and added on the fly, without affecting existing traffic and performance. If any server instance breaks, the rest of the servers can take over its load without disruption.

Auto Scaling and Load Balancing

To achieve horizontal scaling we can employ two techniques - auto scaling and load balancing.

Auto scaling works by monitoring the utilisation of existing server instances. Once the utilisation reaches a certain limit (say 70% of capacity), new instances are created. A load balancer then distributes the work among all the instances, taking into consideration the current utilisation of each instance.
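The two techniques can be sketched together in a few lines of Python. This is only an illustration under stated assumptions: the Instance and Fleet classes, the capacity unit (concurrent requests) and the server names are hypothetical, and real cloud auto scalers and load balancers are configured rather than hand-written. The 70% threshold is the figure used above.

```python
from dataclasses import dataclass, field
from typing import List

SCALE_OUT_THRESHOLD = 0.70  # the 70% limit mentioned in the text


@dataclass
class Instance:
    name: str
    capacity: int      # concurrent requests the instance can serve (assumed unit)
    active: int = 0    # requests currently being served

    @property
    def utilisation(self) -> float:
        return self.active / self.capacity


@dataclass
class Fleet:
    instances: List[Instance] = field(default_factory=list)

    def average_utilisation(self) -> float:
        total_capacity = sum(i.capacity for i in self.instances)
        total_active = sum(i.active for i in self.instances)
        return total_active / total_capacity

    def autoscale(self, new_capacity: int = 100) -> bool:
        """Auto scaling: add one instance once utilisation reaches the limit."""
        if self.average_utilisation() >= SCALE_OUT_THRESHOLD:
            name = f"server-{len(self.instances) + 1}"
            self.instances.append(Instance(name, new_capacity))
            return True
        return False

    def route(self) -> Instance:
        """Load balancing: send the request to the least utilised instance."""
        target = min(self.instances, key=lambda i: i.utilisation)
        target.active += 1
        return target


fleet = Fleet([Instance("server-1", 100, active=80),
               Instance("server-2", 100, active=65)])
scaled = fleet.autoscale()   # average utilisation is 72.5%, so an instance is added
chosen = fleet.route()       # the new, idle instance receives the next request
```

Routing to the least utilised instance is one possible policy. Round robin and least connections are common alternatives.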

Microservices Architecture

A social media backend performs a variety of operations. Access requests have to be authenticated and the identity of users has to be established, posts have to be evaluated so that the most appropriate ones for each user can be chosen and sorted, posts have to be checked for profanity or obscene content, and user activities have to be logged and analysed.

In a traditional monolithic architecture, all the backend code exists together and runs as a single entity on a single server. If the server is horizontally scaled, the full backend logic is replicated on every instance.

In a microservices architecture, the backend code is split into multiple services. Authentication and identity management can be done by an Identity Service, posts can be chosen for distribution by a Content Distribution Service, posts can be evaluated for obscene content by a Moderation Service, and activity logging can be done by a Logging Service.

The development and maintenance of individual services is much more efficient than that of a single monolithic system. Each service can be run separately on different servers and scaled independently. A Moderation Service might have a light workload and need only one or two servers, while the Distribution Service might need many more. A microservices architecture thus leads to efficient horizontal scaling.
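As a rough illustration of independent scaling, the server count for each service can be derived from its own workload. All the figures below are hypothetical, chosen only to show that a lightly loaded service does not have to be replicated as often as a busy one.

```python
import math

# Hypothetical figures: requests per second arriving at each service,
# and requests per second that one server of that service can handle.
services = {
    "identity":     {"load_rps": 900,  "capacity_rps": 500},
    "distribution": {"load_rps": 4200, "capacity_rps": 300},
    "moderation":   {"load_rps": 150,  "capacity_rps": 200},
    "logging":      {"load_rps": 1000, "capacity_rps": 1000},
}


def servers_needed(load_rps: float, capacity_rps: float) -> int:
    """Smallest number of servers that covers the load (at least one)."""
    return max(1, math.ceil(load_rps / capacity_rps))


# Each service gets its own server count, independent of the others.
plan = {name: servers_needed(**figures) for name, figures in services.items()}
```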

Scaling Databases

To understand database scaling concepts, let us first see what a database instance might look like. We have intentionally taken a simplified view here, to explain the concepts without going too deep into the technology.

The database resides in “storage volumes”. You can consider a storage volume to be a single large space of storage that can be expanded or shrunk. In reality, a volume is constructed from multiple storage devices and hard disks.

The database server receives database queries (reads and writes) from application servers and other clients. It processes these queries, reads from (and writes to) the disk, and sends the results to the requestor. Database server instances come in varying configurations. A lower-end database server could have 2 CPU cores and 1 GB of memory, whereas a higher-end database server could have 64 cores and 512 GB of memory.

Database Cluster

Data is critical to any application. But database servers can break, and hard disks and storage devices can fail or misbehave. A database cluster guards against such catastrophic failures by keeping copies of the data and server instances on standby.

In a database cluster, one node is designated as a primary. A cluster also has one or more secondary nodes. When a primary receives a write request, the data is also sent to all the secondary nodes for storage. Thus an exact copy of the database is kept in multiple nodes. If the primary breaks, then one of the secondaries takes the role of primary.
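A minimal in-memory sketch of this primary and secondary arrangement follows. The Node and Cluster classes are hypothetical, replication is assumed to be synchronous, and the first healthy secondary is simply promoted. Production databases use more careful election and replication protocols.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}      # this node's copy of the database
        self.alive = True


class Cluster:
    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = list(secondaries)

    def write(self, key, value):
        if not self.primary.alive:
            self.failover()
        self.primary.data[key] = value
        # Send the same write to every healthy secondary so each keeps an exact copy.
        for node in self.secondaries:
            if node.alive:
                node.data[key] = value

    def read(self, key):
        if not self.primary.alive:
            self.failover()
        return self.primary.data.get(key)

    def failover(self):
        """Promote the first healthy secondary to primary."""
        for node in self.secondaries:
            if node.alive:
                self.secondaries.remove(node)
                self.primary = node
                return
        raise RuntimeError("no healthy secondary available")


cluster = Cluster(Node("db-1"), [Node("db-2"), Node("db-3")])
cluster.write("post:1", "hello")
cluster.primary.alive = False      # simulate the primary breaking
value = cluster.read("post:1")     # served by the promoted secondary
```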

The database can be scaled vertically by increasing the capacity of the database servers. However, server capacity cannot be increased indefinitely. The size of the database volumes can also be scaled vertically by adding more storage disks. However, retrieving data from a very large database is less efficient, and duplicating a large database across multiple secondary nodes in the cluster is very expensive.

Database Sharding

Horizontal scaling of a database can be achieved using a technique called sharding. Typically, a database is made up of many tables, and each table can contain millions of rows of data. Instead of storing the entire table in a single cluster, the rows in the table are distributed among multiple clusters. The information (metadata) about which cluster contains which row of data is stored in a separate server called a config server. When a read request is received, the data is fetched from one or more clusters, merged and sent back.

Each cluster contains only a portion of the database. Hence data retrieval is more efficient. Each cluster can have one or more secondary servers duplicating only its portion of the data. The database can be scaled by adding more cluster nodes and redistributing the data.
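The lookup, fetch and merge steps can be sketched as follows. This assumes range-based sharding on a numeric user ID, which is only one of several possible schemes (hash-based sharding is another). The ConfigServer and ShardedPosts classes and the cluster names are hypothetical.

```python
from bisect import bisect_right


class ConfigServer:
    """Holds the metadata: which cluster owns which range of user IDs."""

    def __init__(self, boundaries, clusters):
        # boundaries[i] is the first user ID owned by clusters[i + 1].
        if len(clusters) != len(boundaries) + 1:
            raise ValueError("need exactly one more cluster than boundaries")
        self.boundaries = boundaries
        self.clusters = clusters

    def cluster_for(self, user_id):
        return self.clusters[bisect_right(self.boundaries, user_id)]


class ShardedPosts:
    def __init__(self, config):
        self.config = config
        # Each cluster stores only its own portion of the rows.
        self.storage = {name: [] for name in config.clusters}

    def insert(self, user_id, text):
        cluster = self.config.cluster_for(user_id)
        self.storage[cluster].append((user_id, text))

    def posts_by(self, user_ids):
        """Fetch from each cluster involved, then merge the results."""
        wanted = set(user_ids)
        clusters = {self.config.cluster_for(u) for u in wanted}
        rows = []
        for name in sorted(clusters):
            rows.extend(row for row in self.storage[name] if row[0] in wanted)
        return sorted(rows)


config = ConfigServer([1000, 2000], ["cluster-a", "cluster-b", "cluster-c"])
posts = ShardedPosts(config)
posts.insert(5, "hi")
posts.insert(1500, "hello")
posts.insert(2500, "hey")
merged = posts.posts_by([5, 2500])   # touches cluster-a and cluster-c only
```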

Data Caching

Social media generates a lot of data. A person posts an image and tags a friend in the post. Another user likes it. This seemingly trivial sequence triggers the following database actions:

Store the image in file storage.

Create a record to store details of the image and its storage location.

Create a record to store the post details and add a reference to the image.

Update the user record to add a reference to the post that the user authored.

Create a record to store the action of tagging the friend.

Update the record of the friend, to add a reference to the post.

Create a record to store the action of another person liking the post.

Social media platforms have to carefully select posts that are of relevance and interest to each user. They have to consider multiple aspects. What is the relationship between the user and the author: friend or follower? Are they work colleagues? Do they subscribe to similar groups? What is the content of the post about? Has this user shown interest in posts with similar content or from this author? These decisions have to be made in real time, resulting in database queries spanning multiple tables and rows.

One way to increase the performance of data retrieval is to cache the data in memory after fetching it from the database. Reading from memory is faster than retrieving the data from the database. Hence, the next time the same data has to be fetched, it can be retrieved quickly from the cache.

Memory is expensive, so the memory cache is much smaller than the database. If there is no space left in the memory cache, unused entries in the cache are replaced. The more efficiently we select entries to add to and remove from the cache, the better the performance.
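One common replacement policy is least recently used (LRU): when the cache is full, the entry that has gone unread the longest is dropped. The sketch below assumes that policy and uses a plain dictionary as a stand-in for the database. The LRUCache class and the post keys are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity, load_from_db):
        self.capacity = capacity
        self.load_from_db = load_from_db   # called only on a cache miss
        self.entries = OrderedDict()       # ordered oldest to most recently used
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = self.load_from_db(key)     # slow path: go to the database
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used entry
        return value


# A dictionary stands in for the database in this sketch.
database = {"post:1": "viral video", "post:2": "cat photo", "post:3": "news"}
cache = LRUCache(capacity=2, load_from_db=database.__getitem__)

cache.get("post:1")   # miss: fetched from the database
cache.get("post:1")   # hit: served from memory
cache.get("post:2")   # miss
cache.get("post:3")   # miss: the cache is full, so post:1 is evicted
```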

Caching works well with the way social media works. In social media, posts created once are repeatedly read by thousands of users. Posts on trending topics, posts from celebrities and viral posts are retrieved much more than others. By caching such frequently accessed posts, the performance of the system can be improved.

Client Side Caching

Caching can be done on the client side (mobile and web apps) as well. A user might look up their friends and followers frequently. They might save (bookmark) posts for later reference. They might visit their message inbox often and browse older messages. By caching such data in the mobile app itself, unnecessary requests to the server can be avoided.

Network Side Caching

Caching of content can be done on the network side as well. A celebrity posts a viral video, and millions of users retrieve and watch it. Instead of retrieving the video from the original storage every time, the video can be cached in multiple servers across the globe, so that users can retrieve it quickly from the nearest server. Such a network of servers is called a CDN (Content Delivery Network).
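The idea can be sketched with a few hypothetical edge servers. The EdgeServer class, the region names and the latency figures are all assumptions, and the nearest server is picked from a simple latency table. Real CDNs usually route users through DNS or network-level mechanisms.

```python
class EdgeServer:
    def __init__(self, region, origin):
        self.region = region
        self.origin = origin        # the central storage holding the original files
        self.cache = {}
        self.origin_fetches = 0

    def get(self, path):
        if path not in self.cache:
            # First request in this region: fetch once from the origin, then keep a copy.
            self.cache[path] = self.origin[path]
            self.origin_fetches += 1
        return self.cache[path]


def nearest_edge(edges, latency_ms):
    """Pick the edge server with the lowest measured latency for this user."""
    return min(edges, key=lambda edge: latency_ms[edge.region])


origin = {"/videos/viral.mp4": b"video-bytes"}
edges = [EdgeServer(region, origin) for region in ("us-east", "eu-west", "ap-south")]

# Hypothetical latencies measured from a user in London.
latency_from_london = {"us-east": 80, "eu-west": 12, "ap-south": 140}
edge = nearest_edge(edges, latency_from_london)

for _ in range(1000):               # a thousand nearby viewers request the video
    video = edge.get("/videos/viral.mp4")
```

In this sketch the origin storage is read only once for the thousand requests. Every later request is served from the edge server's copy.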
