Solving the Filter Bubble Problem using Machine Learning

By Shalvi Mahajan, Data Scientist at SAP SE

The internet is changing constantly. There was a time when one could do almost anything online anonymously, but today all our actions are analyzed and put to use. Every day, we browse various websites to search for information and get results based on our search queries. Little do we know how our search behavior affects us in return and alters the way we think.

The Filter Bubble

The "Filter Bubble" refers to the closed system created by the algorithms behind the search results we encounter every day. It makes one wonder how artificial intelligence (AI) produces such personalized suggestions. According to the author and activist Eli Pariser, such algorithms create "a unique universe of information for each of us, which fundamentally alters the way we encounter ideas and information." Even searching for a single word on a website can lead that site to set cookies, after which related ads and popups for that word appear even on other websites.

Today, everyone has access to free internet, but it comes at a price: the information we provide, directly or indirectly, is used to steer the pool of information we see. For example, suppose a user searches for the word "bag" on Amazon and then closes the tab. If they then log in to their Facebook account, they will see many ads for the bags they searched for on Amazon. This not only nudges the user to buy, but also keeps showing them only what they once searched for. It shapes our thinking, because we see only what we already know and never consider the other options we might have. Thus, we stay stuck in the same information bubble, which forces us to think “inside” the box. Similar things happen on media websites and even in simple search, a phenomenon also called the “media bubble”.



How does a filter bubble emerge?

Today, many people consume news from media platforms such as Twitter and Apple News, which aim to deliver personalized content to each user. Artificial intelligence plays a big role in creating recommendations that match user preferences. In a nutshell, two users might see totally different content in their news feeds, which leads to a filter bubble where you see only what you want to see rather than what you should see. This polarizes society.

This media bubble is a product of our bias toward content that supports our views, with very little tolerance for opposing ones. With so many articles being published daily, such bias always exists in news, media, tweets, etc. Media companies also focus on highly liked content, measured by clicks, time spent on content, likes, etc. Machine learning algorithms are quite good at identifying such patterns across large datasets and forming clusters of users with similar taste. Users are mostly attracted to sensational titles, so the information they see ends up being similar as well. The same thing happened during the 2016 US presidential election, when a misinformation bubble spread across media platforms. Fake news can thus steer a big decision, and this bubble surrounds us all over the internet. In an attempt to fix the filter bubble, Eli Pariser developed an online platform named "Upworthy" with the goal of spreading highly shareable content. Unfortunately, Upworthy was not immune to the filter bubble effect and created its own at some point.
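To make the idea of grouping users with similar taste concrete, here is a minimal sketch that compares users by the cosine similarity of their engagement counts. The topic layout, the user names, and the click numbers are all invented for illustration; real platforms use far richer signals and models.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two engagement vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical click counts per topic: [viewpoint_a, viewpoint_b, sports]
clicks = {
    "alice": [9, 0, 1],
    "bob": [8, 1, 0],
    "carol": [0, 9, 1],
}


def most_similar(user, clicks):
    """Return the other user whose engagement pattern is closest to `user`."""
    others = (name for name in clicks if name != user)
    return max(others, key=lambda name: cosine_similarity(clicks[user], clicks[name]))


print(most_similar("alice", clicks))
```

If a feed recommends to each user what their nearest neighbors clicked on, users like alice and bob keep reinforcing each other's reading, which is one way a bubble can form.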

Breaking a Filter Bubble

AI helped create the filter bubble, and now it is needed to break it as well. One idea is to have a fair representation of all kinds of content in the training set so that AI algorithms also produce fair results. For example, LinkedIn has an approach called “fairness-aware re-ranking”, which it uses when showing candidates for hiring. It quantifies bias with respect to attributes such as age and gender; the concept is to “make sure the proportion of female candidates shown is the same as the corresponding proportion of profiles matching that query.” Such recommendations helped correct bias for some professions. Similar to the LinkedIn approach, we can train our AI models on fair datasets of diverse news and media articles, so that the results are diverse as well. The training set should also cover both the positive and the negative aspects of a particular news story.
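The re-ranking idea can be sketched as follows. This is a simplified greedy variant written for illustration, not LinkedIn's actual implementation; the function name `fair_rerank`, the tuple layout, and the example data are assumptions. At each position, any group that has fallen below its target share of the list so far gets priority; otherwise the highest-scoring remaining item wins.

```python
import math
from collections import defaultdict


def fair_rerank(candidates, target_share, top_k):
    """Greedy re-ranking sketch.

    candidates: list of (item_id, group, score) tuples.
    target_share: dict mapping group -> desired proportion of the ranked list.
    """
    # One queue per group, best score first.
    queues = defaultdict(list)
    for cand in sorted(candidates, key=lambda c: c[2], reverse=True):
        queues[cand[1]].append(cand)

    ranked = []
    counts = defaultdict(int)
    for position in range(1, top_k + 1):
        # Groups that are below their minimum count for a list of this length.
        lagging = [
            g for g, share in target_share.items()
            if queues[g] and counts[g] < math.floor(position * share)
        ]
        pool = lagging if lagging else [g for g in queues if queues[g]]
        if not pool:
            break  # nothing left to rank
        # Among eligible groups, take the one whose best remaining item scores highest.
        best_group = max(pool, key=lambda g: queues[g][0][2])
        ranked.append(queues[best_group].pop(0))
        counts[best_group] += 1
    return ranked


# Hypothetical data: group "A" dominates the raw scores.
candidates = [
    ("a1", "A", 0.9), ("a2", "A", 0.8), ("a3", "A", 0.7), ("a4", "A", 0.6),
    ("b1", "B", 0.5), ("b2", "B", 0.4),
]
reranked = fair_rerank(candidates, {"A": 0.5, "B": 0.5}, top_k=4)
print([item_id for item_id, _, _ in reranked])
```

Ranking purely by score would fill the top four slots with group "A". With a 50/50 target, the sketch alternates between the groups while still preferring higher scores within each group. The same mechanism could in principle be applied to news articles, with viewpoints or sources taking the place of the groups.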

An interesting approach could be to train a machine learning algorithm on a diverse text corpus, or a text-from-speech corpus, and develop a model that analyzes the patterns, such as user interactions and keywords, that could lead to a filter bubble. Once the filter bubbles are mapped out, the model could use the correlations between them to cross-link content with its polarized counterparts. Users would then always have some personalized content along with new and different content in their news feeds, thus bursting such echo chambers.
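The last step, mixing personalized items with content from outside the user's bubble, could look roughly like the sketch below. It assumes the two lists have already been produced by some upstream model; the function name `blend_feed` and the `diverse_every` parameter are hypothetical.

```python
def blend_feed(personalized, outside_bubble, feed_size, diverse_every=3):
    """Build a feed where every `diverse_every`-th slot holds an item from outside the bubble.

    Both input lists are assumed to be ordered best-first. If one list runs
    out, the remaining slots are filled from the other.
    """
    feed = []
    p_iter, d_iter = iter(personalized), iter(outside_bubble)
    for slot in range(1, feed_size + 1):
        if slot % diverse_every == 0:
            primary, fallback = d_iter, p_iter
        else:
            primary, fallback = p_iter, d_iter
        item = next(primary, None)
        if item is None:
            item = next(fallback, None)
        if item is None:
            break  # both lists are exhausted
        feed.append(item)
    return feed


feed = blend_feed(["p1", "p2", "p3", "p4"], ["d1", "d2"], feed_size=6)
print(feed)
```

A fixed slot pattern is the simplest possible choice. A real system would more likely tune the share of outside content per user and measure whether people actually engage with it.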

There is also a browser extension called Nobias, which gives you a detailed summary of the credibility and bias of the news you consume. You can see whether you read an equal proportion of diverse content or lean too heavily toward one point of view.
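One simple way to put a number on "an equal proportion of diverse content" is the normalized entropy of the viewpoint labels in a reading history. The sketch below is only an illustration of that idea and makes no claim about how Nobias works; the labels and the function name `diversity_score` are assumptions.

```python
import math
from collections import Counter


def diversity_score(labels, categories):
    """Normalized Shannon entropy of `labels` over `categories`.

    Returns 1.0 when reading is spread evenly across all categories and
    0.0 when everything comes from a single category.
    """
    counts = Counter(labels)
    total = sum(counts[c] for c in categories)
    if total == 0 or len(categories) < 2:
        return 0.0
    entropy = 0.0
    for c in categories:
        p = counts[c] / total
        if p > 0:
            entropy -= p * math.log(p)
    return entropy / math.log(len(categories))


categories = ("left", "center", "right")
balanced = ["left", "center", "right", "left", "center", "right"]
one_sided = ["left"] * 6
print(diversity_score(balanced, categories), diversity_score(one_sided, categories))
```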

We are also responsible for our own filter bubbles. To address this on the user end, one could delete web history, browse in incognito mode, delete cookies, and like a wide variety of things on social media websites. This brings diversity into our choices, which could help AI deliver diverse content.


To solve a problem, the first step is to acknowledge it. Not everyone is aware of filter bubbles. Once users are informed about the issue and how it affects us, we can break out of our filter bubbles by consuming information from a variety of credible sources and seeking out every side of an argument. Solving filter bubbles across the internet could help us diversify our opinions, especially on sensitive topics. It could shift our conservative thinking towards a more progressive one. With this, we would be better equipped to make informed decisions that could affect a huge population. Together, we can fight filter bubbles using AI, and also help AI from the user end to deliver different opinions.

(All views are my own)
