Algorithmic Audits on TikTok: Why Results are Hard to Replicate and Quickly Expire

Have you ever wondered why the videos recommended to you on TikTok feel like they read your mind? TikTok's algorithm raises significant questions about transparency, fairness, and the risk of users getting trapped in so-called filter bubbles containing misinformation or harmful viewpoints. This is why researchers started to conduct algorithmic audits: systematic studies of how these recommendations work. But what happens when different teams try to repeat these studies and find entirely different results? In our recent research paper, "Revisiting Algorithmic Audits of TikTok: Poor Reproducibility and Short-term Validity of Findings", we address exactly this issue and provide new insights into the reproducibility of the current generation of algorithmic audits. The paper, a result of our ongoing AI-Auditology project, was recently accepted at the prestigious international conference SIGIR '25.

What We Found: Reproducibility is Tough

Our research highlights a fundamental challenge: we faced significant obstacles when trying to replicate studies of TikTok's algorithm. Original studies examining how recommendations are personalised through watching videos, liking, or following creators turned out to be very difficult to replicate. At the same time, any kind of audit (algorithmic audits being no exception) must be easily replicable, both to verify the audit results and to verify that any countermeasures taken actually lead to improvement.

Two Key Issues that Make Reproducibility so Difficult

TikTok evolves rapidly. TikTok constantly introduces changes – new features and adjustments to its recommendation engine and website. Even a minor difference can affect audit results. For example, we found ourselves frequently updating our code to adapt to sudden changes in the TikTok platform during our seven-month-long audit. Suppose you want your automated bot to click a specific button. The bot looks for a button with specific text, for example, "Allow All" (when accepting cookies). Once the text of the button changes, for example, to "Accept", the bot can no longer find or click it. Similar problems occur with more sophisticated approaches to finding the button (such as searching by its properties or even its location).
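To make this concrete, here is a minimal sketch of the problem, using Selenium (our actual audit tooling may differ); the button labels are illustrative:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

driver = webdriver.Chrome()
driver.get("https://www.tiktok.com")

# Try a list of labels the consent button has used over time; every new
# label the platform ships requires another entry here.
clicked = False
for label in ("Allow All", "Accept"):
    try:
        button = driver.find_element(
            By.XPATH, f"//button[normalize-space()='{label}']"
        )
        button.click()
        clicked = True
        break
    except NoSuchElementException:
        continue  # the label changed again and the audit code went stale

if not clicked:
    print("Consent button not found: selectors need updating again.")
```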

Methodology challenges. Earlier studies did not always clearly document their processes or release their source code, which made it difficult to get the original experiments running at all. Even when we got the code to run, its technical foundations were outdated, and we had to find a different approach. Differences in methodology, such as how long each video is watched or the precise nature of the bot's activities, can also lead to different findings.
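One way to make such methodological choices explicit, so that a replication can state exactly what its bots did, is to lift them into a single, documented configuration. A hedged sketch (all names hypothetical):

```python
from dataclasses import dataclass

@dataclass
class AuditConfig:
    watch_duration_s: float  # how long the bot watches each video
    rewatch_on_topic: int    # how many times on-topic videos are replayed
    like_probability: float  # chance of liking an on-topic video
    follow_creators: bool    # whether the bot follows on-topic creators
    feed_length: int         # number of videos scrolled per session

# Two configurations that could plausibly produce different findings,
# even on the exact same platform version.
replication_2022 = AuditConfig(3.0, 0, 0.5, True, 100)
our_audit = AuditConfig(30.0, 2, 0.0, False, 100)
```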

Insights from Our Audit: Short-lived Results?

Our experiment replicated earlier studies conducted between 2021 and 2022, and we identified several differences:

Earlier research suggested that "following" creators had the strongest effect on personalising your TikTok feed, but in our audit, watching videos for longer durations showed the most significant impact. Watching a video completely, or multiple times, increased the degree of personalisation. In other words, simply by watching videos on a specific topic for longer, you may lead TikTok to determine that the topic interests you and to recommend more and more videos on it. Coupled with the algorithm's tendency to recommend the most popular videos first and then gradually surface more niche ones, this may lead you toward more problematic or extreme content.
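As a simplified sketch (with hypothetical helper functions, not our exact audit code), the bot behaviour that produced the strongest personalisation signal looks roughly like this:

```python
import time

TOPIC_KEYWORDS = {"fitness", "workout", "gym"}  # example topic of interest

def is_on_topic(video) -> bool:
    # A real audit would classify videos via hashtags, captions, or a
    # trained classifier; keyword matching is purely illustrative.
    return bool(TOPIC_KEYWORDS & set(video.hashtags))

def watch_feed(feed, replays: int = 2, skip_after_s: float = 2.0):
    for video in feed:
        if is_on_topic(video):
            # Watch to the end, then replay: the implicit signals that
            # dominated personalisation in our replication.
            for _ in range(1 + replays):
                time.sleep(video.duration_s)
        else:
            time.sleep(skip_after_s)  # scroll past off-topic videos quickly
```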

We also observed that TikTok's algorithm has shifted toward more extensive exploration of new content before personalising recommendations based on your explicit actions (liking or following). Such findings make sense: the algorithm tries to keep you on the platform, and discovering all of your interests is a sure way to achieve this.
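As a conceptual illustration only (this is not TikTok's actual mechanism), a recommender that explores broadly at first and personalises more as it accumulates evidence about the user would behave the way we observed:

```python
import random

def pick_video(personalised_pool, exploration_pool, videos_seen: int,
               min_explore: float = 0.1, decay: float = 0.01):
    # The exploration probability starts high and decays as the system
    # gathers feedback, but never drops to zero.
    p_explore = max(min_explore, 1.0 - decay * videos_seen)
    pool = exploration_pool if random.random() < p_explore else personalised_pool
    return random.choice(pool)
```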

What’s the Impact?

These insights are important, particularly in an era of digital regulations like the EU's Digital Services Act (DSA), which requires regular audits to ensure transparency and accountability on digital platforms. Poor reproducibility can affect the effectiveness of these regulatory efforts, as regulators and platforms alike may struggle to verify the accuracy and generalisability of audit findings.

So, Where Do We Go from Here?

To enhance the reliability of algorithmic audits, we propose in the AI-Auditology project a novel approach based on three key advancements (a sketch of how they fit together follows the list):

Longitudinal studies. Conducting continuous, long-term audits to track changes and identify trends over time.

Multiplatform audits. Applying the same methodologies across different social media platforms to see how results vary.

Authentic user simulations. Developing more realistic user simulations for audit bots to better reflect real user behaviours.
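The following hedged design sketch (all names are hypothetical, not an existing library) shows how these three ideas could combine: one audit loop, run longitudinally over many sessions, against any platform that implements a common adapter, driven by a configurable simulated user:

```python
from abc import ABC, abstractmethod

class Platform(ABC):
    """Adapter that each audited platform (TikTok, YouTube, ...) implements."""

    @abstractmethod
    def next_video(self): ...

    @abstractmethod
    def watch(self, video, seconds: float): ...

    @abstractmethod
    def like(self, video): ...

class SimulatedUser:
    """Persona whose behaviour is meant to approximate a real user."""

    def __init__(self, interests, attention_span_s: float = 15.0):
        self.interests = interests
        self.attention_span_s = attention_span_s

    def react(self, platform: Platform, video):
        # Realistic behaviour varies watch time with interest rather than
        # applying a single fixed duration to every video.
        interested = bool(self.interests & set(video.hashtags))
        platform.watch(video, self.attention_span_s if interested else 2.0)
        if interested:
            platform.like(video)

def run_audit(platform: Platform, user: SimulatedUser,
              sessions: int, feed_length: int):
    # Longitudinal: the same persona revisits the platform over many
    # sessions, so drift in the recommender can be observed over time.
    for _ in range(sessions):
        for _ in range(feed_length):
            user.react(platform, platform.next_video())
```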

In conclusion, algorithmic audits like ours are essential but complicated. Only by embracing consistent and transparent methodologies can we truly understand how recommender and search systems on social platforms like TikTok work.