The Dynamics of Hoaxes in Indonesia
Topic Modeling Analysis of TurnBackHoax.id Data in 2024
By: Ali Al Harkan (2406480304)
Course: Digital Research Methods | Instructor: Dr. Eriyanto
This comprehensive analysis applies Latent Dirichlet Allocation (LDA) topic modeling to 3,746 hoax documents collected from turnbackhoax.id, Indonesia's leading hoax-busting platform. The dataset spans the 2024 Indonesian election period and has been categorized into three major types: Politics, Scam, and Others.
TurnBackHoax.id is operated by MAFINDO (Masyarakat Anti Fitnah Indonesia), Indonesia's first anti-hoax community. It serves as a comprehensive database of fact-checked misinformation circulating on Indonesian social media and messaging platforms.
2024 Indonesian Election Context: Indonesia held its presidential election on February 14, 2024, with three candidate pairs competing. The election period was marked by intense political polarization, widespread social media misinformation, and concerns about foreign interference narratives.
44% of political hoaxes are candidate-specific character attacks, with narratives predominantly targeting Anies Baswedan (33%) or centering on the Jokowi-Prabowo-Gibran coalition (11%). This suggests a highly polarized information environment in which personal attacks dominate over policy discussion.
| Category | Documents | % of Total | Avg Text Length |
|---|---|---|---|
| Politics | 1,358 | 36.2% | 392 chars |
| Scam | 939 | 25.1% | 401 chars |
| Others | 1,449 | 38.7% | 455 chars |
| Category | Coherence (5 Topics) | Coherence (7 Topics) | Coherence (10 Topics) | Best Choice |
|---|---|---|---|---|
| Politics | 0.4520 | 0.4525 | 0.4581 | 10 topics |
| Scam | 0.4523 | 0.4487 | 0.4419 | 5 topics |
| Others | 0.4502 | 0.4610 | 0.4531 | 7 topics |
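The Best Choice column keeps the topic count with the highest coherence score for each category. The sketch below applies that rule to the scores transcribed from the table. It does not recompute coherence. The document does not name the coherence measure, so fitting one LDA model per candidate topic count and scoring it (for example, with gensim's `CoherenceModel` and `coherence="c_v"`) is an assumption and is left out here. The tie-break toward fewer topics is a hypothetical design choice, not something taken from the source.

```python
# Coherence scores transcribed from the table: {category: {num_topics: score}}.
COHERENCE = {
    "Politics": {5: 0.4520, 7: 0.4525, 10: 0.4581},
    "Scam": {5: 0.4523, 7: 0.4487, 10: 0.4419},
    "Others": {5: 0.4502, 7: 0.4610, 10: 0.4531},
}


def best_num_topics(scores):
    """Return the topic count with the highest coherence score.

    Hypothetical tie-break: on equal scores, prefer the smaller model,
    which is usually easier to interpret.
    """
    return max(scores, key=lambda k: (scores[k], -k))


best = {category: best_num_topics(scores) for category, scores in COHERENCE.items()}
total_topics = sum(best.values())

print(best)          # {'Politics': 10, 'Scam': 5, 'Others': 7}
print(total_topics)  # 22
```

The selected counts add up to the 22 topics reported across the three categories.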
Comprehensive overview of all 22 topics identified across three categories of hoax narratives.
1,358 political hoax documents from the 2024 Indonesian election period.
| Topic ID | Topic Label | Documents | % Share | Top Terms |
|---|---|---|---|---|
| 0 | Social Media Verification | 107 | 7.9% | temu, rupa, gambar, jelas_akun, judul, sama, sedang |
| 1 | Palace Appointments | 8 | 0.6% | istana, lantik, kaesang, pramono, ganti, bakal |
| 2 | Foreign Interference & Religion | 8 | 0.6% | bongkar, cina, agama, partai, ancam, bikin |
| 3 | Parliamentary Affairs | 9 | 0.7% | dpr, bayar, lapor, gratis, milik, jelas_buah |
| 4 | Anti-Corruption Protests | 8 | 0.6% | demo, kpk, mahasiswa, gagal, kasus, libat |
| 5 | General Politics (Anies Attacks) | 449 | 33.1% | indonesia, presiden, anies, negara, jadi, sebut |
| 6 | Fact-Check Metadata (Artifacts) | 46 | 3.4% | disinformasi_first, draft_news, jakarta, jenis_mis |
| 7 | Election Fraud (China) | 22 | 1.6% | china, suara, kalah, google, mau, jabat |
| 8 | Jokowi-Prabowo-Gibran Coalition | 146 | 10.8% | jokowi, prabowo, gibran, ikn, pilkada, anak |
| 9 | KPU Manipulation | 14 | 1.0% | kpu, pecat, kuasa, panggil, gak, naik |
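The aggregate Politics figures quoted in this report can be recomputed from the table above. These are the 44% candidate-specific share, the 595 documents in Topics 5 and 8, and the roughly threefold gap between the Anies and coalition topics. The minimal sketch below transcribes the counts from the table. It divides by all 1,358 political documents, which reproduces the table's % Share column. Grouping Topics 3, 4, and 9 as institution-focused follows their labels (DPR, KPK, and KPU, respectively).

```python
# Documents per topic, transcribed from the Politics table (topic id -> count).
POLITICS_DOCS = {0: 107, 1: 8, 2: 8, 3: 9, 4: 8, 5: 449, 6: 46, 7: 22, 8: 146, 9: 14}
POLITICS_TOTAL = 1358  # all political hoax documents, the denominator of "% Share"


def combined_share(topic_ids, counts=POLITICS_DOCS, total=POLITICS_TOTAL):
    """Return (documents, percent of total rounded to 0.1) for a group of topics."""
    n = sum(counts[t] for t in topic_ids)
    return n, round(100 * n / total, 1)


candidate = combined_share([5, 8])         # Anies attacks + Jokowi-Prabowo-Gibran coalition
institutional = combined_share([3, 4, 9])  # DPR, KPK protests, KPU manipulation
anies_to_coalition = round(POLITICS_DOCS[5] / POLITICS_DOCS[8], 2)

print(candidate)           # (595, 43.8)
print(institutional)       # (31, 2.3)
print(anies_to_coalition)  # 3.08
```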
939 scam-related hoax documents targeting economic vulnerabilities.
| Topic ID | Topic Label | Documents | % Share | Top Terms |
|---|---|---|---|---|
| 0 | Fake Job Recruitment | 446 | 47.5% | juta, loker, lowong_kerja, daftar, indonesia, gaji |
| 1 | Lottery & Banking Scams | 121 | 12.9% | bank, undi, festival, gratis, motor, hadiah |
| 2 | Account Phishing | 334 | 35.6% | akun, resmi, nomor, hubung, pihak, minta |
| 3 | Fake Recruitment Letters | 155 | 16.5% | gaji, kerja, surat, posisi, terima, bulan |
| 4 | Celebrity Deepfake Endorsements | 110 | 11.7% | rupa, temu, guna, taut_daftar, pasti, profil |
1,449 miscellaneous hoax documents covering health, disasters, and social issues.
| Topic ID | Topic Label | Documents | % Share | Top Terms |
|---|---|---|---|---|
| 0 | Health Misinformation | 165 | 11.4% | sehat, obat, sakit, sebab, akibat, bahaya |
| 1 | News Articles & Headlines | 144 | 9.9% | artikel, judul, periksa_mafindo, tiba, tanda |
| 2 | Disasters & Events | 263 | 18.2% | kota, banjir, bencana, gunung, warga, gempa |
| 3 | COVID-19 & Conspiracy | 51 | 3.5% | vaksin, covid, virus, bill_gates, wef, digital |
| 4 | Religion & Sports | 56 | 3.9% | islam, timnas, piala_dunia, umat, masjid |
| 5 | Mixed Content | 639 | 44.1% | indonesia, baru, dapat, orang, masuk, jadi |
| 6 | Medical Miracle Cures | 77 | 5.3% | sembuh, darah, air, ginjal, israel, minum |
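Several top terms in these tables are underscore-joined pairs (lowong_kerja, taut_daftar, piala_dunia), and many tokens look stemmed (lowong from lowongan, temu from ditemukan). This suggests that preprocessing merged frequent word pairs into single tokens after stemming, but the document does not describe that step. The sketch below is a simplified stand-in that assumes already-stemmed tokens. It joins adjacent pairs that occur at least `min_count` times. gensim's `Phrases` scores pairs against a threshold rather than using this raw count cutoff, so this illustrates the idea rather than reproducing the author's pipeline. The sample documents are hypothetical.

```python
from collections import Counter


def merge_frequent_bigrams(docs, min_count=2):
    """Join adjacent token pairs seen at least `min_count` times into 'a_b' tokens."""
    pair_counts = Counter(pair for doc in docs for pair in zip(doc, doc[1:]))
    frequent = {pair for pair, count in pair_counts.items() if count >= min_count}

    merged = []
    for doc in docs:
        out, i = [], 0
        while i < len(doc):
            if i + 1 < len(doc) and (doc[i], doc[i + 1]) in frequent:
                out.append(f"{doc[i]}_{doc[i + 1]}")
                i += 2  # both tokens are consumed by the bigram
            else:
                out.append(doc[i])
                i += 1
        merged.append(out)
    return merged


# Hypothetical documents, already lower-cased, stemmed, and stopword-filtered.
docs = [
    ["info", "lowong", "kerja", "bumn"],
    ["daftar", "lowong", "kerja", "gratis"],
    ["piala", "dunia", "timnas"],
    ["timnas", "piala", "dunia"],
]
merged = merge_frequent_bigrams(docs)
print(merged)
# [['info', 'lowong_kerja', 'bumn'], ['daftar', 'lowong_kerja', 'gratis'],
#  ['piala_dunia', 'timnas'], ['timnas', 'piala_dunia']]
```

Merging runs greedily from left to right, so when two frequent pairs overlap, the earlier pair wins.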
The pyLDAvis interactive visualizations for each category draw every topic as a circle sized by its prevalence in the corpus. Topics are positioned so that those closer together have more similar term distributions (they share more vocabulary). Selecting a topic displays its top terms.
1,358 political hoax documents analyzed. Topics identified include candidate attacks, election fraud narratives, foreign interference conspiracies, and institutional distrust themes.
939 scam-related hoaxes analyzed. Topics include fake job offers, lottery scams, banking phishing, government aid fraud, and celebrity deepfake endorsements.
1,449 miscellaneous hoaxes analyzed. Topics include natural disasters, COVID-19 vaccines, food safety scares, religious content, celebrity news, and conspiracy theories.
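pyLDAvis positions topics in two steps. It first computes a distance between every pair of topic-term distributions, then projects those distances onto two dimensions. By default, the distance is the Jensen-Shannon divergence and the projection is principal coordinate analysis. The sketch below covers only the distance step, in plain Python. It uses three hypothetical topic-term distributions over a five-word vocabulary; the probabilities are invented for illustration and are not taken from the fitted models.

```python
import math

# Hypothetical topic-term distributions; columns are kpu, suara, china, gaji, loker.
TOPICS = {
    "election_fraud": [0.35, 0.35, 0.30, 0.00, 0.00],
    "kpu_manipulation": [0.50, 0.30, 0.10, 0.05, 0.05],
    "job_scam": [0.00, 0.00, 0.00, 0.50, 0.50],
}


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Symmetric divergence in [0, ln 2]: mean KL of p and q to their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


names = list(TOPICS)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"{a} vs {b}: {jensen_shannon(TOPICS[a], TOPICS[b]):.3f}")
```

The two election topics share vocabulary and come out close together. The election and job-scam topics share no terms, so their divergence reaches the maximum, ln 2 ≈ 0.693, and pyLDAvis would place them far apart.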
Explore in-depth analysis, visualizations, and key topics for each category.
The political hoax landscape is dominated by candidate-specific character attacks (44%), indicating a highly polarized campaign environment. Topic 5 (Anies Baswedan attacks) and Topic 8 (Jokowi dynasty narratives) account for 595 documents combined, nearly half (43.8%) of all political hoax documents.
Topic 5: General Politics (Anies Attacks)
Top terms: indonesia, presiden, anies, negara, jadi, sebut, telusur
The largest topic cluster (449 documents), focused on attacking Anies Baswedan's character and qualifications. It represents the dominant narrative strategy in political misinformation.
Topic 8: Jokowi-Prabowo-Gibran Coalition
Top terms: jokowi, prabowo, gibran, ikn, pilkada, anak
Narratives about dynastic politics, coalition building, and the controversial candidacy of Gibran (Jokowi's son) as Prabowo's vice-presidential running mate.
Topic 0: Social Media Verification
Top terms: temu, rupa, gambar, jelas_akun, judul
Hoaxes involving manipulated social media content, fake accounts, and doctored images or videos.
Topic 7: Election Fraud (China)
Top terms: china, suara, kalah, google, mau
Conspiracy theories about Chinese interference in vote counting and election manipulation.
Topic 9: KPU Manipulation
Top terms: kpu, pecat, kuasa, panggil, gak
Narratives undermining trust in the General Election Commission (KPU) through claims of corruption and manipulation.
Beyond candidate attacks, topics targeting the KPU (election commission), KPK (anti-corruption commission), and DPR (parliament) point to an effort to undermine institutional legitimacy, a dangerous pattern for Indonesian democracy. Together, however, these topics are small (31 documents, 2.3% of political hoaxes).
Scam hoaxes overwhelmingly target job seekers through fake recruitment and employment offers. Fake Job Recruitment (Topic 0) covers 47.5% of scam documents, and Fake Recruitment Letters (Topic 3) another 16.5%. The economic vulnerability of Indonesians affected by unemployment makes them prime targets for fraudulent schemes.
Topic 0: Fake Job Recruitment
Top terms: juta, loker, lowong_kerja, daftar, indonesia
Fraudulent job offers impersonating PT Freeport, Pertamina, and state-owned enterprises (BUMN). These offers often promise high salaries and easy acceptance.
Topic 2: Account Phishing
Top terms: akun, resmi, nomor, hubung, pihak
Phishing attempts targeting banking, social media, and messaging app accounts through fake customer service contacts.
Topic 3: Fake Recruitment Letters
Top terms: gaji, kerja, surat, posisi, terima
Fake official recruitment letters claiming candidates have been accepted for positions at major companies.
Topic 1: Lottery & Banking Scams
Top terms: bank, undi, festival, gratis, motor
Fake lottery wins, free giveaways, and fraudulent banking promotions.
Topic 4: Celebrity Deepfake Endorsements
Top terms: rupa, temu, guna, taut_daftar, pasti
Deepfake videos of celebrities (Rhoma Irama, Nagita Slavina, Raffi Ahmad) promoting gambling sites.
Three of the five scam topics (Topics 0, 2, and 3, covering 47.5%, 35.6%, and 16.5% of scam documents) exploit economic vulnerability by targeting job seekers and people seeking financial opportunities. This suggests that scammers understand Indonesia's unemployment challenges well.
The Others category achieved the highest coherence score among the selected models (0.461), suggesting its topics are the most distinct and interpretable. Apart from the Mixed Content catch-all (44.1%), disaster-related hoaxes (18.2%) and health misinformation (16.7% across Topics 0 and 6) are the largest themes in this category.
Topic 2: Disasters & Events
Top terms: kota, banjir, bencana, gunung, warga
Misinformation about natural disasters (floods, earthquakes, tsunamis) and emergency events.
Topic 0: Health Misinformation
Top terms: sehat, obat, sakit, sebab, akibat
General health misinformation including fake cure claims and medical advice.
Topic 1: News Articles & Headlines
Top terms: artikel, judul, periksa_mafindo, tiba
Fake news articles and misleading headlines from unreliable sources.
Topic 6: Medical Miracle Cures
Top terms: sembuh, darah, air, ginjal, israel
Unverified miracle cure claims and medical conspiracies (e.g., water cures, blood treatments).
Topic 4: Religion & Sports
Top terms: islam, timnas, piala_dunia, umat
Religious content mixed with sports hoaxes, particularly around the Indonesian national football team (Timnas).
COVID-19 and vaccine narratives (Topic 3, 3.5%) persist beyond the pandemic peak. Combined, the health topics (Topics 0 and 6) account for 16.7% of the Others category, suggesting that pandemic-era misinformation has had a lasting impact on public health discourse.
Finding: 44% of political hoaxes are candidate-specific character attacks. The Anies Baswedan topic (449 documents) is roughly three times as large as the Jokowi-Prabowo-Gibran coalition topic (146 documents).
Implication: Campaign discourse prioritizes personal attacks over policy discussions, deepening societal polarization and reducing substantive democratic debate.
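Finding: Employment-related scams dominate the Scam category. Fake Job Recruitment (47.5%) and Fake Recruitment Letters (16.5%) target job seekers, while Account Phishing (35.6%) targets people's banking, social media, and messaging accounts.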
Implication: Scammers systematically exploit Indonesia's unemployment challenges, preying on economic desperation with fraudulent employment opportunities.
Finding: Health misinformation accounts for 16.7% of the Others category (Topics 0 and 6). Vaccine conspiracies (Topic 3) and miracle cure claims persist beyond the pandemic peak.
Implication: Pandemic-era misinformation may have lasting effects on public health discourse and vaccine hesitancy.
Finding: Recurring conspiracy narratives target the KPU (election commission), KPK (anti-corruption commission), and DPR (parliament).
Implication: Deliberate erosion of institutional legitimacy threatens Indonesia's democratic stability beyond the election cycle.
Finding: Chinese interference conspiracies appear in two political topics (Topics 7 and 2) and are intertwined with religious polarization themes.
Implication: Xenophobic narratives are weaponized for electoral purposes, potentially damaging Indonesia-China relations and fueling racial tensions.
All models were trained with random_state=42 for reproducibility. The complete code, data, and models are available in the project repository.
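Fixing the random seed matters because LDA inference is stochastic. The initial topic assignments and later updates depend on random draws, so two unseeded runs can return different topics. The document does not name its LDA library. gensim's `LdaModel` and scikit-learn's `LatentDirichletAllocation` both use variational inference and accept a `random_state` argument. To stay dependency-free, the sketch below swaps in a different inference method, collapsed Gibbs sampling, written in plain Python with a private `random.Random(random_state)` generator. The function names, toy corpus, and hyperparameters (`alpha`, `beta`, `iterations`) are hypothetical.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, random_state=42):
    """Fit LDA to tokenised docs with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word, vocab): per-document topic proportions,
    per-topic word distributions, and the sorted vocabulary.
    """
    rng = random.Random(random_state)  # private RNG: same seed, same draws
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    n_dk = [[0] * K for _ in docs]      # tokens of topic k in document d
    n_kw = [[0] * V for _ in range(K)]  # occurrences of word v assigned to topic k
    n_k = [0] * K                       # total tokens assigned to topic k
    z = []                              # current topic of every token

    # Initialise every token with a uniformly random topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Take the token out of the counts before resampling it.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalised full conditional p(z_i = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        [(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
        for k in range(K)
    ]
    return doc_topic, topic_word, vocab


def top_terms(topic_word, vocab, n=3):
    """Highest-probability words for each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]


# Hypothetical, already-preprocessed toy documents.
docs = [
    ["loker", "gaji", "kerja", "daftar"],
    ["gaji", "kerja", "posisi", "surat"],
    ["banjir", "gempa", "warga", "bencana"],
    ["gempa", "banjir", "gunung", "warga"],
]

run_a = lda_gibbs(docs, num_topics=2, random_state=42)
run_b = lda_gibbs(docs, num_topics=2, random_state=42)
assert run_a == run_b  # same seed and same data: identical topics
print(top_terms(run_a[1], run_a[2]))
```

With the seed fixed, rerunning on the same corpus returns identical topics. A different seed can change both the order and the content of the topics, which is why reporting the seed alongside the results makes them reproducible.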
Analysis completed on standard hardware (no GPU required). Training time: ~5 minutes per category (15 minutes total). Preprocessing: ~2 minutes per category.
Project Title: The Dynamics of Hoaxes in Indonesia: Topic Modeling Analysis of TurnBackHoax.id Data in 2024
Researcher: Ali Al Harkan (2406480304)
Course: Digital Research Methods (Metode Riset Digital)
Instructor: Dr. Eriyanto
Date: November 24, 2025
Data Source: TurnBackHoax.id (MAFINDO)
Context: 2024 Indonesian Presidential Election period
This analysis is based on fact-checking work by MAFINDO (Masyarakat Anti Fitnah Indonesia), Indonesia's first anti-hoax community. Their tireless efforts to combat misinformation make research like this possible.
If using this analysis, please cite:
Al Harkan, A. (2025). The Dynamics of Hoaxes in Indonesia: Topic Modeling Analysis of TurnBackHoax.id Data in 2024. Digital Research Methods Course Project. Dataset: TurnBackHoax.id, 3,746 documents. Method: LDA with Indonesian text preprocessing.