2015-07-20

By Jason Q. Ngi

Summary:

WeChat (known as Weixin in China) is a mobile application developed by China’s Tencent. In addition to its core chat functionality, it also has a blogging feature known as the public accounts or official accounts platform (微信公众平台).

This platform is similar in some ways to Weibo and has recently been the target of official scrutiny.

This report offers the first attempt to systematically identify what is censored on this platform. We downloaded over 36,000 unique public account posts between June 2014 and March 2015, monitoring them over time and tracking whether they were deleted from WeChat.

Not surprisingly, sensitive keywords are found in a greater percentage of censored posts than normal (uncensored) posts. Political keywords and keywords related to corruption are particularly likely to be found in censored posts as compared to uncensored posts.

The data contains evidence of automatic review filters preventing posts with certain blacklisted keywords from being published. Examples include 六四 (June 4), 太子党 (princeling), and keywords relating to Falun Gong. Further analysis of larger WeChat corpuses might be performed to deduce what other keywords trigger a post to be held for further review and/or denied publication.

A more thorough analysis of 150 censored posts reveals that rumors, speculation, and political commentary were also being censored. Censored content included posts which contained outright falsehoods, tabloid gossip, and sensationalism—a number of which appear fairly harmless. This may be a reflection of the ongoing “anti-rumor campaign” sweeping Chinese social media.

This report also discusses the collective power of rumors and ambiguity in censorship, issues raised by how WeChat controls information on its public accounts platform.

Though our data indicates a lower percentage of posts were being censored compared to other research into Chinese social media censorship, we emphasize that the set of posts collected is not a random sample and a number of factors might affect the rate. This report serves as a first attempt to identify types of content being censored on WeChat’s public accounts platform, and further research will be needed to determine actual censorship rates as well as determine other open questions.

Introduction

Much research has been done studying the relationship between media control and authoritarian regime durability. In this report, we examine one method of online information control by the world’s most sophisticated regulator of the Internet, China. We focus on how Tencent’s WeChat (aka Weixin in China), the leading mobile chat app in the country, restricts information on its public accounts platform (微信公众平台)—a growing social media blogging alternative to Sina Weibo. After performing a systematic collection of tens of thousands of WeChat public posts, we analyze the types of content removed by WeChat on its public accounts (also known as “official accounts”) platform. Overall, this collection of deleted posts serves as another set of data points in the ongoing goal to explicate the motives behind online censorship in China.

Though our data indicates a lower percentage of posts were being censored compared to other research into Chinese social media censorship, we emphasize that the set of posts collected is not a random sample and there are challenges with comparing our results to other studies. Despite this limitation, we are still able to capture a sizable number of censored posts and perform analysis on them. In our set of censored posts, in addition to the categories of posts past researchers have identified as targeted for censorship on Chinese social media (particularly articles related to collective action, censorship, and pornography), we find numerous posts which relate to government policies and news—with particular emphasis on corruption—categories which King, Pan and Roberts found not to be substantially censored across various Chinese social media platforms.1 This finding may be an indication of WeChat’s exceptionalism, may reflect a shift in official censorship mandates, or may have other causes, including strong automatic review filtering mechanisms preventing certain types of content from being published in the first place.

Furthermore, among the commonly deleted posts are numerous ones that can be categorized as rumors, fake news, and superstitions. Compared to collective action events, violent threats, pornography, or criticism of the Communist Party of China (examples of which are also among the deleted posts), one might assume these kinds of rumors—some of which seem silly and harmless relative to other kinds of sensitive content—would be of lower priority for censorship. However, when considered in the context of the ongoing “anti-rumor campaign” as well as past movements to suppress rumors in recent Chinese history, the apparent emphasis on restricting rumors and political news—as suggested by their significant representation in the dataset—makes more sense. This report will outline the recent history of this online campaign and connect its justifications to related discourse on the presumed credibility gap in Chinese media to explain why rumors might be seen as such a threat to Chinese authorities. It is useful to view China’s modern attempts to control information through the lens of history, and online rumors can be viewed as a kind of social protest by citizens skeptical of official news and the censorship of more independent sources of media. It is in this way that one can gain a deeper understanding of why censorship of political information from unofficial sources appears to be so prevalent on WeChat’s public accounts platform.

The Evolving Chinese Social Media Landscape

Authoritarian regimes have been ambivalent about Internet technology and have reacted with varying degrees of sophistication to shaping online public opinion.2 On the one hand, regimes—particularly of the one-party kind—with a long experience of managing the media through well-institutionalized “departments of propaganda” have recognized that the web can be a source of regime legitimation,3 provided that they deploy sufficient efforts to sanitize its contents while offering a range of appealing online activities4 or media content5 to their population. On the other hand, the corrosive political impact of uncontrolled and decentralized information is evident when web-activism turns into a social movement or helps diffuse information to the population.6 7 When confronted with major political challenges, autocrats have gone so far as shutting down all Internet services, as in Egypt during the demonstrations leading to the downfall of the Mubarak regime, or in Xinjiang when the regional authorities shut both the Internet and cellular phone service for months after ethnic riots in Urumqi erupted in July 2009. More targeted responses by authorities in China include building the so-called Great Firewall to deny mainland users access to certain foreign websites,8 9 supporting “50 Cent Party” members to post pro-China online comments,10 and detaining online bloggers,11 among others.

Social media has similarly become a battleground for Chinese officials, who seek to ensure certain content is not widely disseminated. Though a 2010 State Council Information Office white paper on Internet usage for the country asserts that Chinese users have the right to freedom of expression online, it also enumerates a prohibition against Internet information that is:

endangering state security, divulging state secrets, subverting state power and jeopardizing national unification; damaging state honor and interests; instigating ethnic hatred or discrimination and jeopardizing ethnic unity; jeopardizing state religious policy, propagating heretical or superstitious ideas; spreading rumors, disrupting social order and stability; disseminating obscenity, pornography, gambling, violence, brutality and terror or abetting crime; humiliating or slandering others, trespassing on the lawful rights and interests of others; and other contents forbidden by laws and administrative regulations.12

The Wenzhou train crash in July 2011 affirmed the influence of Sina Weibo as a serious source of counter-power13 to government officials and the state media; since then Sina Weibo has been one of the most prominent social media platforms in China. But despite (or because of) its astronomical growth, like other Chinese online content providers, it also struggled with abiding by such vague rules as those outlined in the 2010 white paper (in addition to more specific directives14) while at the same time ensuring a steady stream of interesting fodder for users to share, click, and comment—without whose engagement and loyalty the site would languish. Weibo oftentimes walked a fine line between these two competing objectives. As the Bo Xilai affair escalated in February and into March 2012,15 rumors of an impending coup broke out across the site, dominating political conversations for weeks16 despite varying degrees of censorship.17 18 19 Six people were detained for “fabricating or disseminating online rumors” about the coup,20 and at month’s end, the State Internet Information Office (SIIO), one of the agencies tasked with regulating Internet affairs, announced that the site would be “criticized and punished accordingly.”21 Both Tencent and Sina’s microblogging services had their commenting features suspended for three days, with both companies admitting their sites had been overrun with rumors and were in need of a clean-up.22

Weibo was confronted with another major controversy in January 2013 when the newspaper Southern Weekend’s annual New Year’s editorial was censored and re-written by Guangdong’s provincial propaganda chief. Though many media entities, including Sina’s news portal and Weibo itself, expressed coded support for the embattled newspaper,23 Weibo became the target of vitriol after it began to censor posts and search keywords related to the Southern Weekend story. In response, a manager at Sina Weibo vented publicly in a post that Weibo had “tried to resist and let the messages spread” but he also acknowledged that the company had learned from the Bo Xilai affair that they could not let things get too out of control.24

Weibo’s ability to play both sides while also maintaining its top position in the Chinese social media landscape would not last long; by the summer of 2013, it had become clear that a new gust of wind had blown Weibo’s balancing act off-center. In hindsight, the seeds of the summer crackdown on bloggers and Internet activists, particularly those who utilized Weibo, were sown in February, when Lu Wei, head of the SIIO (now also known as the Cyberspace Administration of China), invited a number of celebrity bloggers (known as Big Vs for the verified icon next to their names on Weibo) to dinner, and according to the Wall Street Journal, pressed them in subsequent meetings on the need to quell online rumors.25 Lu was quoted in Xinhua urging celebrities to be “more positive and constructive” in their writing,26 and these recommendations were formalized in August 2013 in the so-called “Seven Baselines,” which establish guidelines of conduct for online celebrities.27

Also in August, notable finance blogger Charles Xue, who had a following of 12 million Weibo fans who read his posts about stock tips and inequality in the economy, was arrested and forced to confess to various crimes. He admitted, “In the beginning, I verified every post; but later on, I no longer did that. . . First of all, I didn’t double-check my facts. Secondly, I didn’t raise constructive suggestions to solve the problem. Instead, I just simply spread these ideas emotionally.”28 Hundreds were suspected to have been similarly detained—activist Wen Yunchao documented over 50 specific incidents29—and a chilling effect was felt on Weibo, with analytics firm Weiboreach reporting a 20% drop in monthly posts by Big Vs between January and August 2013.30 31 Numerous commentators also cited anecdotal evidence confirming the diminishment of the types of more sensitive discussions that Big Vs were once willing to be the center of on Weibo. Charles Custer tied these trends together, writing in February 2014,

Weibo, like any social service really, is driven in part by power users. . . These accounts were responsible for passing along a lot of the news that made Weibo so interesting, but the crackdown on rumors has made the passing-along of news (even news that has nothing to do with politics) seem dangerous unless it comes from an official source. So, it seems, many of these users have just stopped passing the interesting-but-unofficial news and information they find along to their followers. And that, in turn, made Weibo more boring. [. . .] For the average user, it doesn’t have anything to do with politics or political engagement; the issue is just that the most interesting users are coming online less often, and making fewer posts when they do come online.32

Custer goes on to echo another point that other China social media watchers had been making: a growing number of disaffected and/or bored Weibo users were gradually drifting away from Weibo as their primary social media platform to WeChat. Custer argues persuasively that it was not solely a case of WeChat poaching users nor one of censorship driving users to take up WeChat’s seemingly less censored platform (though encumbered by structural restrictions as discussed in the next section), but rather a combination of the two in concert with the attack on Big Vs, which caused Weibo to be a less dynamic and interesting place for news and discussion. Thus, in some ways, WeChat was the early beneficiary of the anti-rumor campaign’s rough handling of Weibo.

Throughout 2012 and 2013, as Weibo continued to be hammered by editorials and official statements arguing that it do more to eliminate rumors and other illegal content from the service, WeChat appeared to escape similar public levels of scrutiny. Weibo was forced to continue sanitizing its site both through technical means—blocking keywords, deleting posts, and implementing mechanisms that would automatically hold posts for review if they included certain keywords, among others33—as well as through punitive measures—developing new site rules, including a much-derided points system that would punish users for posting objectionable content.34 In May 2013, the SIIO issued a statement that rumormongers on microblogging sites would be targeted and reminded social media users that they could be jailed for spreading rumors that incited subversion against state power, affected the securities market, concerned terrorist activities, or were directed at smearing business reputations or products.35 36 The detention of hundreds of Weibo users for illegal distribution of content in recent years as well as the announcement of even tighter regulations in September 2013 criminalizing actions on microblogs like simply retweeting false information,37 combined with the corresponding lack of systematic censorship in WeChat’s core functionality, made the new service stand out as a seeming safe haven despite anecdotal security concerns.38

Though Weibo is still an important online space, its prominence in the Chinese social media landscape has lessened, and WeChat today is the primary communications application for many Chinese Internet users, particularly as more and more users spend their time online from mobile phones.

WeChat and Public Accounts as a Weibo Replacement

WeChat is sometimes described as a chat and SMS replacement, akin to WhatsApp. However, while the core functionality of the application is indeed sending short messages to friends by mobile phone over the Internet through the app (thus avoiding any charges for sending text messages over their carrier’s cellular network), Tencent has seamlessly bundled numerous features into what is no longer just a chat application and is in fact a full-fledged social network and social media service. Users can make payments, post photos on a personal blog, download games and stickers, connect with strangers, and hail taxis all from within the app. Competitors in this all-in-one chat app space include LINE and KakaoTalk, which are popular in Japan and Korea, respectively, and are aggressively building out international user bases.

By the second quarter of 2014, Tencent reported that WeChat had 438 million monthly active users while Weibo was at 157 million monthly active users.39 Though documented censorship does and has occurred in WeChat (outlined in the next section), there was speculation that WeChat’s core functionality of chatting among small groups made it a fundamentally easier space to manage for nervous government officials, especially when compared with Weibo’s much wider reach, where a single user’s message could ricochet unpredictably across the platform tens of thousands of times in minutes. Chatting within small groups on WeChat meant that users were voluntarily limiting their audience by simply using the product as intended, essentially siloing information amongst limited, disconnected networks. Whether government officials intentionally set out to attack Weibo to push users to the less viral-enabling WeChat or whether this was an unintended consequence is unclear, but intention aside, the net result was a boon to regulators and policy makers who were concerned about Weibo’s role in facilitating nationwide conversations and organizing capacities.

However, WeChat recognized that brands and corporations would continue to desire the ability to reach out to large audiences regardless of the platform. To enable marketers to engage masses of users in a more effective manner, WeChat introduced their public accounts platform (微信公众平台) in 2012. Also known later on as “official accounts,” this new feature would allow both individual users and companies to publish articles in the form of blog posts and push them to interested users who subscribed to their account, in addition to a number of other features designed to foster engagement between users and publishers.40 Foreign companies were also encouraged to join an international official account platform, where they could also use WeChat to engage users with updates and deals.ii

Despite the numerous impediments to registering an account—a user must fill out an application and submit a picture of themselves holding up their Chinese identification card, for instance—by November 2013, there were 2 million registered public accounts;41 by August 2014, that number had more than doubled to 5.8 million;42 and by the end of 2014, the number of accounts had reached over 8 million.43

Some of these accounts have hundreds of thousands of subscribers—all the more remarkable considering that WeChat does little to promote public accounts within the app and does not provide a directory of accounts. In order to view posts from a public account, a user must have access to either a QR code that links to the desired public account or the exact username of the public account. There is no search feature within WeChat to look up, for instance, environmental activism accounts. Popular accounts are often spread through word of mouth, breeding a certain kind of exclusivity.44 The self-selective nature of the audience plus the lack of a public comments feature (though readers can message the author privately) also eliminates much of the off-topic chatter and harassment that drove some Weibo users with large followings away from the service.45

While well-known brands and companies like Starbucks and Nike did build strong official account followings,46 much of the media and public attention was directed toward accounts registered and run by individuals or small, independent media startups. In fact, the stream of articles and information published by these non-corporate entities came to be known as a new form of “self media” (自媒体, also translated as “We-Media”47), offering “more in-depth material and more diverse viewpoints” than traditional media.48 These grassroots media organizations and citizen journalists (or if not journalists, then news aggregators) flourished on WeChat throughout the app’s early years, with the platform considered much more open and uncensored than traditional media or even the more established Weibo. Media scholar Hu Yong writes:

Blogging reached its peak in China during 2005 and 2006, but in those days there was no high-minded talk of “grassroots media.” However, the blogging craze turned us into a nation of writers, and then, when Weibo arrived on the scene, we went from a nation of writers to a nation of one-person news outlets. Everyone loved Weibo—both ordinary folks and the élite—and it took the whole country by storm. But even when Weibo was at its liveliest nobody was talking about the idea of grassroots media.

The grassroots media sensation really took off with the advent of mobile web technology, which introduced new ways of producing and disseminating content. It reached its culmination in WeChat, whose public accounts, introduced on August 23, 2012, allowed individuals and organizations to create mass postings of text, pictures, recordings, and later video. This turned WeChat from a private communication tool into a media platform. At the same time, because users could form circles, WeChat had the potential to become a tool for social organization—much more so than Weibo had been.49

In essence, WeChat had created a platform that mirrored some of the characteristics of Weibo that government officials feared the most. Whereas WeChat’s original chatting feature might have pushed users into small, disconnected groups, its public accounts platform empowered users to reach mass audiences, who could quickly and easily share the posts to friends or copy links to the posts (Figure 1)—links which could then be spread outside the app and across the general Internet. Thus, it is not surprising that of all the popular features within WeChat, it is the public accounts platform that has faced the tightest oversight and content restrictions.

Documented Cases and Evidence of WeChat Restrictions

Since WeChat’s release in January 2011, it has been assumed by users and security analysts alike that WeChat—like any Chinese social media service that hopes to survive, particularly one released by a corporate giant like Tencent—performs some level of censorship and surveillance in the application. There have been a number of anecdotal cases of surveillance and censorship within the chat app which have been publicized (see Appendix: “Documented Cases of WeChat Restrictions”).

Greater public attention to keyword censorship on WeChat came in January 2013, during the Southern Weekend controversy, when technology blog Tech in Asia reported the keyword “南方周末” (Southern Weekend) was being blocked in chat messages. The Next Web confirmed they received a similar error message indicating their message was blocked when they tried “法轮功” (Falun Gong). These reports were significant because the blocking affected international users. Following these reports Tencent released a statement claiming that “a small number of WeChat international users were not able to send certain messages due to a technical glitch.”

Keyword filtering and surveillance can be implemented on the client-side (i.e., on the application itself) or on the server side (i.e., on a remote server). Client-side implementations can be analyzed through reverse engineering the application and extracting the keyword lists used to trigger censorship or surveillance. Server side implementations do not allow for the same methods and typically need to be analyzed through sample testing in which researchers develop a set of content suspected to be blocked by a platform, send the sample to the platform, and record the results.50
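The sample-testing procedure described above can be sketched in a few lines of Python. This is an illustrative outline only, not the tooling used by the researchers: `run_sample_test` and `simulated_send` are hypothetical names, and the simulated platform simply mirrors the May 2013 observation reported in this section (one keyword blocked, one delivered) rather than querying any real service.

```python
from typing import Callable, Dict, Iterable


def run_sample_test(keywords: Iterable[str],
                    send: Callable[[str], bool]) -> Dict[str, bool]:
    """Send each suspected keyword and record whether it was blocked.

    `send` is any function that returns True if the message was
    delivered and False if the platform refused or dropped it.
    """
    results = {}
    for keyword in keywords:
        delivered = send(keyword)
        results[keyword] = not delivered  # True means "blocked"
    return results


def simulated_send(message: str) -> bool:
    """Hypothetical stand-in for a real chat client.

    The blocklist is an assumption that reproduces the May 2013
    observation described in the report; it is not a live result.
    """
    simulated_blocklist = {"法轮功"}
    return not any(term in message for term in simulated_blocklist)


sample = ["法轮功", "南方周末"]
results = run_sample_test(sample, simulated_send)
for keyword, blocked in results.items():
    print(keyword, "blocked" if blocked else "delivered")
```

Because the filtering logic lives on the server, a harness like this can only observe outcomes. It cannot recover the full keyword list, which is why the choice of sample matters.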

Analysis done by Jeffrey Knockel and the Citizen Lab reveals that WeChat performs keyword filtering on the server side.

In May 2013, Jeffrey Knockel, while in China, ran an experiment with WeChat clients registered to US phone numbers and found keyword censorship of “法轮功” (Falun Gong) but not for “南方周末” (Southern Weekend) (Figure A, left). Running this same experiment from the US with the same accounts resulted in no censorship for the same keyword. These results suggest that at that time the user’s network vantage point enabled the keyword filtering features.

In December 2013, the Citizen Lab conducted tests of keyword filtering on WeChat using an account registered to a mainland China phone number. The analysis confirmed the keyword “法轮功” (Falun Gong) was being filtered, but “南方周末” (Southern Weekend) was not (see Figure A, right). In later tests in January 2014, we were unable to reproduce the blocking of “法轮功”. In subsequent tests in January and February 2014, we tested samples of keywords derived from previously extracted keyword lists. These tests did not find any further evidence of keyword filtering. We attempted to reproduce the previous finding from Knockel but were unable to trigger censorship either by using a VPN based in China or by spoofing GPS locations in China. These results suggest that accounts registered to mainland China phone numbers enabled the keyword filtering features. Taken together, however, the results are inconclusive, and further analysis is needed to evaluate if keyword filtering is currently active on WeChat and how the feature is enabled. Even if keyword filtering in the chat feature is not currently active, these results show that the capability has already been developed for the app.

Figure A: Evidence of censorship in WeChat’s chat feature.
Left: May 2013, WeChat client using an account registered to a US phone number running from a Chinese network.
Right: Dec 2013, WeChat client using an account registered to a mainland China phone number running from a Canadian network.

Previous research by Villeneuve51 and Knockel et al. on TOM-Skype (the Chinese version of Skype until 2013)52 revealed that TOM-Skype conducted keyword censorship and surveillance. This finding suggests that if TOM-Skype has chat message surveillance then other more popular chat programs in the Chinese market may also be under pressure to have similar controls. However, technically determining if surveillance features are present in WeChat is challenging. The Citizen Lab verified that keyword filtering is done on the server side and did not find any evidence of client side filtering or surveillance on WeChat. Therefore, if surveillance is present on the application it is likely being performed by a remote server and its effects are not visible outside that server.

Beyond content restrictions on the chat features of WeChat are controls enacted on the public platform. As discussed in the previous section, the public accounts platform would be more likely to be viewed as a threat by authorities, and as such it makes sense for it to be more tightly controlled—particularly in the last year as registration numbers for the public accounts platform have grown dramatically.

Though much was made of the suspension of dozens of public accounts in March 2014 and January 2015, how WeChat was already censoring these public accounts has been mostly overlooked. In fact, public posts on WeChat were already subject to an ambiguous form of content regulation that appeared intentionally designed to mislead its users. Users who frequently browse through public accounts have likely encountered the error message “This content has been reported by multiple people, and the related content is unable to be shown” (此内容被多人举报，相关的内容无法进行查看), particularly when attempting to access sensitive content (Figure 2). In the next section, we discuss this form of content restriction in greater depth.

Identifying Deleted Public Posts and Censorship Rate

In order to identify and collect posts published by public accounts that were deleted on WeChat, we sought to collect WeChat posts at scale, then return to those posts periodically to check if they had been removed from the site. In essence, we sought to perform a task similar to Weibo censorship-tracking projects like the University of Hong Kong’s Weiboscope and GreatFire.org’s FreeWeibo. Collecting the posts at scale proved to be a challenge and a number of methods were considered before settling on an approach. (These are discussed in detail in Appendix: “Method for Identifying Deleted Public Posts and Limitations.”)

Though data collection for this project is ongoing, this report examines the first nine months’ worth of data collected, beginning from July 2014 through the end of March 2015. During this period, we were able to download and monitor over 36,000 unique posts. These 36,000 posts came from 10,254 unique public accounts. A Compact Language Detector test successfully categorized 96.6% of the posts by analyzing each article’s text. Of those reliably categorized, 96.5% were detected as Chinese Simplified, 1.94% as Chinese Traditional, and 0.73% as Arabic/Urdu.iii

We also check whether a user’s account is still active or if it has been suspended; of those 10,254 accounts, 154 (1.48%) had been identified as suspended as of April 10, 2015.iv Of the 23 user accounts China Digital Times suggested we specifically monitor, 19 are still active as of April 2015 while the other 4 have been suspended. Of those 23 accounts, the most prolific is the account for 大家, an official Tencent QQ blog, which has published 3,668 posts to its public account. Its posts regularly garner tens of thousands of views and hundreds of “likes” (Figure 3).

During each test, the posts being checked are categorized as having one of the following three states: normal/published (meaning the post was successfully loaded with no error messages), self-deleted (Figure 5: “This content has been removed by the publisher”), or deleted by WeChat (Figure 2: “This content has been reported by multiple people, and the related content is unable to be shown”). The last is an explicit admission that the post was restricted and is what most would consider censorship (more on this in “The Ambiguity of Censorship Messages” section).
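The three-way categorization can be illustrated with a short Python sketch. It is a hedged approximation, not the report's actual collection code: `classify_post` and `PostState` are hypothetical names, the censorship marker is the Chinese message quoted earlier in this report, and the self-deletion marker is a placeholder built from the English translation, since the original Chinese wording of that message is not given here.

```python
from enum import Enum


class PostState(Enum):
    NORMAL = "normal"
    SELF_DELETED = "self-deleted"
    CENSORED = "censored"


# Message quoted in the report for posts deleted by WeChat (Figure 2).
CENSORED_MARKER = "此内容被多人举报，相关的内容无法进行查看"

# Assumption: placeholder based on the English translation (Figure 5);
# a real checker would match the page's actual Chinese wording.
SELF_DELETED_MARKER = "This content has been removed by the publisher"


def classify_post(page_text: str) -> PostState:
    """Map the text of a fetched post page to one of three states."""
    if CENSORED_MARKER in page_text:
        return PostState.CENSORED
    if SELF_DELETED_MARKER in page_text:
        return PostState.SELF_DELETED
    # No known error message found: treat the post as still published.
    return PostState.NORMAL


print(classify_post("2015年放假时间表来了，赶紧收藏呀~").value)
print(classify_post(CENSORED_MARKER).value)
```

Matching on fixed strings is the simplest design, but it silently misclassifies a post as normal if the platform changes its wording, so a production checker would also want to log unrecognized error pages.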

These system-deleted posts—which for convenience we will refer to as censored posts—are on the whole slightly longer than normal or self-deleted posts (a mean of 2,642 and median of 1,883 characters long versus 2,168 and 1,478 for normal posts and 2,135 and 1,407 for self-deleted posts). Mean and median image counts were roughly the same between the three categories. While zero is the mode for number of images in a post, the majority of posts do have at least one—with the most decorated post in our sample featuring 439 images.

2.24% of the total posts were self-deleted. An examination of the contents of these posts showed varying degrees of sensitivity. Numerous prosaic posts of non-sensitive material returned this message, for instance, “2015 holiday schedule has come, quickly bookmark it!” (2015年放假时间表来了，赶紧收藏呀~), but others, including posts about Taiwan, the Communist Party, and banned online videos, were also reported as self-deleted. However, it is reasonable to assume that users who published sensitive articles would also be self-aware enough to delete them if circumstances encouraged them to—either a change of heart, a reaction to a new online directive against certain content, a message from an authority, etc.—and thus it is not too surprising that some sensitive content is listed as self-deleted. For example, 看中国, an overseas Chinese-language account which re-posts content which is often very critical of Chinese politics, had 42 self-deleted posts among the 717 posts of theirs that we checked. By contrast, of the 655 posts we checked from 大家’s account, which primarily posted non-sensitive content, no posts were identified as self-deleted. Indeed, the number of self-deletions for non-suspended accounts is significantly predicted by the number of censored posts the account made after the number of posts is controlled for.
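To put the two accounts on the same scale as the overall figure, the counts quoted above can be converted to percentages. The snippet below is a small worked calculation using only numbers from this report; the function name `deletion_rate` is illustrative.

```python
def deletion_rate(self_deleted: int, checked: int) -> float:
    """Return self-deleted posts as a percentage of posts checked."""
    if checked <= 0:
        raise ValueError("checked must be positive")
    return 100.0 * self_deleted / checked


# Counts quoted in the report.
kanzhongguo_rate = deletion_rate(42, 717)  # 看中国
dajia_rate = deletion_rate(0, 655)         # 大家
overall_rate = 2.24                        # percent, full sample

print(f"看中国: {kanzhongguo_rate:.2f}%")
print(f"大家: {dajia_rate:.2f}%")
print(f"Overall: {overall_rate:.2f}%")
```

By this calculation, roughly 5.86% of the checked posts from 看中国 were self-deleted, more than double the 2.24% rate for the sample as a whole, while 大家 had none.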

Chinese social media have been known for fostering an ambiguity in the error messages provided to users, with censorship often being couched in much more innocent or vague messages.53 But despite some curious coincidences, including, for instance, how a number of posts with the same content from different authors were listed as self-deleted, we will set these self-deletions aside and focus on the posts which return the censored message.

Of the total posts checked, 3.97% were censored. However, calculating a true censorship rate for published posts on WeChat’s public accounts platform as a whole relies on having a random, representative sample of posts across the platform, which this study does not have. As detailed in the “Method” section of the Appendix, we captured article links that had been publicly posted on social media in addition to posts from accounts known to share sensitive content. Users who are better self-promoters or are more well-known—and thus more likely to have links to their articles publicly posted—would undoubtedly make up a greater percentage of the sample than would be expected in a random sample. We also tracked 48 individual accounts of users who were known to post sensitive content, downloading their posts as they came in. Thus, certain users were overrepresented in our sample.

Furthermore, to more accurately gauge the level of censorship in our sample, we can differentiate between a censored post that was itself specifically targeted for censorship and one that is no longer accessible simply because its author’s account was suspended. This distinction matters because a censored post from an active user and all posts from a suspended user return the exact same censorship message shown in Figure 2. About 40% of the deletions (1.55% of the total sample) were definitively identified as being deleted while the author’s account was still active, meaning the content of the post itself was flagged for deletion. The other 60% of deletions were detected as having come from users whose accounts had been suspended. While it is possible that some of the content in these posts from suspended users would also have been targeted for censorship and may in fact have triggered the suspension, we cannot be certain of this. We can thus claim at best that these latter deletions are representative of the type of content that will get a user suspended on WeChat, whereas the former deletions do represent specific sensitive content considered worthy of deletion by WeChat. Again, one should keep in mind that this does not necessarily reflect the true censorship rate of WeChat as a whole, as the sample is biased. In fact, one could argue that the true censorship rate of all posts on WeChat’s public accounts platform could be even lower, since, as described in more detail in the “Method” section of the Appendix, the sample came from users who would be more likely to share sensitive content (e.g. people who have the technical skills and the desire to cross the Great Firewall, in addition to the sensitive users we track directly); or it could be higher, since we do not capture posts in real time.

4.99% of user accounts in our sample had a post that was censored. Of the user accounts that only had one post captured in our sample, 3.5% of them were censored. Using a sample made up of one post randomly selected from each of the 10,254 individual user accounts, 3.51% of posts were censored. These measures are all different attempts to mitigate the effect that prolific users and suspended users would have on overly biasing the censorship rate.
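To illustrate why the one-post-per-account measure dampens the influence of prolific or suspended accounts, here is a minimal Python sketch. The function names and the toy data are hypothetical and are not drawn from our dataset; the sketch only shows the sampling idea, not the exact procedure we ran.

```python
import random
from collections import defaultdict


def pooled_rate(posts):
    """Share of censored posts across all posts; posts is a list of (account_id, is_censored)."""
    return sum(flag for _, flag in posts) / len(posts)


def one_post_per_account_rate(posts, seed=0):
    """Share of censored posts when one post is drawn at random from each account."""
    by_account = defaultdict(list)
    for account_id, is_censored in posts:
        by_account[account_id].append(is_censored)
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    picks = [rng.choice(flags) for flags in by_account.values()]
    return sum(picks) / len(picks)


# Toy data (made up): one suspended account with eight inaccessible posts,
# and three accounts with a single uncensored post each.
toy_posts = [("a", True)] * 8 + [("b", False), ("c", False), ("d", False)]

print(pooled_rate(toy_posts))                # dominated by account "a"
print(one_post_per_account_rate(toy_posts))  # each account counts once
```

In the toy data the pooled rate is driven almost entirely by the single suspended account, while the per-account draw gives every account equal weight.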

Various censorship measures for our sample of WeChat public account posts

User accounts suspended: 1.48%
Posts censored while user’s account was known to be active: 1.55%
Self-deleted posts: 2.24%
Posts censored, all: 3.97%
User accounts that had at least one post censored: 4.99%
Posts censored, only users who published one post: 3.5%
Posts censored, one post per user account: 3.51%

Though this report does not offer a true censorship rate for WeChat’s public accounts platform, the admittedly non-random sample of posts we collected does appear to suggest less censorship than what other researchers have observed on Weibo. Excluding re-posts and only looking at original messages, Bamman, O’Connor, and Smith reported 16.25% of a random Weibo sample was censored,54 while Zhu et al. calculated one of 12.8% when using a sample they hypothesized would produce a censorship rate higher than actual.55 Analyzing a wide array of non-Weibo social media posts in 2012, King, Pan, and Roberts reported a censorship rate of 13%.56 Though Fu et al. note the difficulties in calculating a true censorship rate with their extensive dataset of Weibo posts,57 Cairns and Carlson use a host of techniques to adjust and correct for such possible issues, putting the true censorship rate of Weibo within a comparable range to what others have calculated; and similar to King, Pan, and Roberts, Cairns and Carlson showed censorship on Weibo to be at a much higher, elevated rate during particularly sensitive events which elicit “volume bursts.”58 The numbers reported in these studies compared to the ones above for WeChat would fit with anecdotal reports that WeChat’s public accounts platform is less restrictive than Weibo’s with regards to censorship after publication.

However, as we do not capture our posts in real time (71% of our dataset was captured only after a link to the WeChat post was shared on other social media), we would miss posts which are censored immediately after publication (or at least before they are shared). And as Zhu et al. note, 30% of the censored posts they collected were deleted within the first 30 minutes. Furthermore, a firsthand study by Liu and Zhu of The Carter Center noted how automated filters often kept sensitive content from actually being published in the first place, thus lessening the amount of content in need of censorship afterward. As they recounted, “Several times we attempted to post the article, but were repeatedly informed that its content had something sensitive that needed to be revised or deleted.”59 Either of these factors might cause us to understate the level of censorship actually experienced on WeChat’s public accounts platform. We thus offer a strong caveat against generalizing too much from our dataset about the censorship rate on WeChat: measuring that rate was not the primary objective of this report, which was instead to identify what kinds of content might be censored on the platform.

Analysis of Content in Deleted Posts

The censored posts we captured are content rich and typically written in a more conventional style, allowing for both enhanced manual and machine reading compared to much shorter Weibo posts, which are often filled with abbreviations, slang, neologisms, and coded keywords to combat censorship and the 140 character limits. Some of the posts are original content while others are re-posted material from other blogs and forums. We performed some general searches through the posts as well as a more in-depth reading and categorizing of 150 censored posts.

King, Pan, and Roberts note that collective action is specifically targeted by online censors on numerous social media in China. And indeed, a greater share of censored posts contain words related to protest than normal posts do. For instance, 抗议 (protest) appears in the text of 5.7% of the censored posts whereas it appears in only 1.83% of normal posts. By contrast, neutral words like 朋友 (friend), 的 (of), and 德国 (Germany) are found in roughly the same share of censored posts as normal posts, as expected. However, while the share of posts containing 抗议 is larger in the censored posts, there are still a substantial number of posts containing the word that are not censored. A closer examination reveals that many of these uncensored posts concern protests or discussions about protests taking place overseas. However, this rule is not ironclad; for example, a post about Hong Kong youths protesting and attacking mainland shoppers at a mall in February 2015 (“香港又要闹事？公然叫嚣：蝗虫们，滚回大陆!”, roughly “Hong Kong stirring up trouble again? Open clamor: Locusts, go back to the mainland!”) is still uncensored to date. More rigorous analysis is needed to tease out any significant correlations that might help reveal more nuanced coding rules for what gets banned and what does not.

We performed a similar test of various sensitive and non-sensitive keywords—the top 1000 most frequent Chinese words,60 over 9,000 keywords from 13 censorship lists,61 and 36 other select keywords—comparing what share of published versus censored posts they were found in. As shown in Table 1, as expected, many sensitive keywords are found in a greater percentage of censored posts than normal posts, some to an extreme degree. For instance, the disgraced politician Bo Xilai (薄熙来) appears in only 0.2% of normal posts, but is found in 3.91% of censored posts, a ratio of nearly 20 times greater in censored posts than in normal posts. Other politicians’ names also score extremely high on this measure, including Hu Yaobang, Zhou Yongkang, Xu Caihou, and Hu Jintao. Numerous political terms also populate this list, including harmonious society (和谐社会), the Central Political and Legal Affairs Commission (政法委) and Communist Party (共产党). As a preliminary check against users with more posts skewing these findings, we also re-ran the calculations using a sample of posts made up of one post per user. Though there were differences in some keywords, on the whole, the two sets were mostly similar.v
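The share comparison described above reduces to a simple ratio. For Bo Xilai, 3.91% divided by 0.2% is roughly 19.5, which is the "nearly 20 times" figure. The Python below is a minimal sketch of that calculation under simplifying assumptions (plain substring matching on post text); the function names and the toy posts are hypothetical and do not come from our dataset.

```python
def share_containing(posts, keyword):
    """Fraction of posts whose text contains the keyword (plain substring match)."""
    if not posts:
        return 0.0
    return sum(keyword in text for text in posts) / len(posts)


def censored_to_normal_ratio(censored_posts, normal_posts, keyword):
    """How many times larger the keyword's share is among censored posts.

    Returns None when the keyword never appears in normal posts,
    since the ratio is undefined in that case.
    """
    normal_share = share_containing(normal_posts, keyword)
    if normal_share == 0:
        return None
    return share_containing(censored_posts, keyword) / normal_share


# Toy posts (made up for illustration only).
toy_censored = ["薄熙来 trial commentary", "holiday photos", "recipe", "weather"]
toy_normal = ["薄熙来 news digest"] + ["everyday post"] * 7

print(censored_to_normal_ratio(toy_censored, toy_normal, "薄熙来"))
```

A ratio near 1 corresponds to the neutral words discussed earlier, while a keyword that never appears in normal posts has no defined ratio and has to be examined separately.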

Of particular note is that keywords relating to corruption (贪官, corrupt official; 贪污, embezzlement; 贪腐, graft; 腐败, corruption; 公款, public funds) make up five of the top 50 most sensitive keywords according to this metric (after filtering out redundant keywords as well as those which were found in less than 2.5% of censored posts), appearing in more than six times as many censored posts as would be expected based on their share of normal posts. While it is possible this is because these keywords are less likely to be used in a neutral fashion than some of the other keywords, it may also be because posts related to corruption are being watched particularly carefully by censors today or on WeChat.

Furthermore, these results offer evidence of the automated review filters that prevented Liu and Zhu from being able to publish posts containing sensitive keywords. Looking at commonly censored keywords in these tables, we notice that a few had an uncommonly low presence in both normal and censored posts, indicating that posts containing those keywords were not allowed to be published. For instance, “六四” (64) is a common reference to June 4, 1989—the date of the crackdown on student protesters in Tiananmen Square. It was found in only 14 of the 33,934 normal posts and—even more surprisingly—in only one censored post. By comparison, another sensitive number, 八九 (89 for 1989), appeared in 426 normal posts and 9 censored ones, indicating that posts containing it were more likely to reach publication than those containing 六四. 六四 is not the only common sensitive keyword that is surprisingly scarce among both published and censored articles: princeling (太子党), a derogatory reference to the children of high government officials, appears in only 11 normal posts and 1 censored post, while numerous words related to the Falun Gong (法轮功, Falun Gong; 九评, the Nine Commentaries; 李洪志, Li Hongzhi) do not appear at all in either set of posts. Other sensitive words that are also suspiciously underrepresented in these WeChat posts could be tested on WeChat’s platform to confirm their presence on a blacklist, but for a number of reasons—a user is only allowed a limited number of posts per day; it is very difficult to register one account, let alone multiple accounts; there is a delay between the time a user submits a post and when they are notified that it contains sensitive content—it is currently difficult to do this at scale. Further analysis of under-represented words in larger WeChat corpuses might be performed to deduce what other keywords trigger a post to be held for further review and/or denied publication.
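One rough way to screen for such under-represented keywords is to flag sensitive terms whose combined count across normal and censored posts falls below some cutoff. The Python sketch below uses the counts reported in this section; the cutoff of 20 posts and the function name are assumptions made purely for illustration, and a flagged keyword is only a candidate for testing, not proof of a blacklist.

```python
# (normal count, censored count) for each keyword, as reported in the text.
reported_counts = {
    "六四": (14, 1),
    "八九": (426, 9),
    "太子党": (11, 1),
    "法轮功": (0, 0),
}

MAX_PUBLISHED = 20  # hypothetical cutoff, not a value from this report


def scarce_keywords(counts, max_published=MAX_PUBLISHED):
    """Keywords found in at most max_published posts across both sets."""
    return [
        keyword
        for keyword, (normal, censored) in counts.items()
        if normal + censored <= max_published
    ]


print(scarce_keywords(reported_counts))
```

Such a screen is only meaningful for keywords one would otherwise expect to be common, which is why 八九 serves as a useful comparison here.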

In addition to looking at the articles in aggregate, we analyzed 150 censored posts, each from a different active user account. These 150 posts covered an array of topics, with 10 containing images of naked women, 14 whose main topic was corruption, and 8 relating to cases of suppression by police. Most curious were 11 posts relating to various superstitions and Chinese folk religion. The targeting of Buddhas, the zodiac, and other quasi-religious omens appears to fit with the effort to curtail “superstitious rumors,” a kind of content that has a long tradition of being suppressed in Chinese history but that has not featured prominently in more recent online media censorship. Furthermore, “feudal superstition” is a category of content specifically mentioned as being prohibited in WeChat’s Chinese-language “Service Agreement”—but not in the English-language “Acceptable Use Policy”—a point further discussed in the next section.

We find a remarkably low number which would qualify as spam. Of the 150, only one post—an advertisement for an arthritis medicine—was clearly spam, while some of the 11 superstitious posts—which often have a call to action to forward and share the post or suffer some unlucky fate, akin to a chain letter—might also qualify as such. This paucity of spam in our WeChat dataset stands in stark contrast to the challenges other researchers have faced when dealing with Weibo content and may speak to the effect that WeChat’s rigorous sign-up process for an account may have on spam or how effective the pre- and post-review filters are at removing spam.

Less clear-cut were dozens of posts that contained rumors, misinformation, or some form of speculative commentary. We classified these as posts whose primary purpose was to share unverified or even false information. These included posts that featured dubious health claims, Chinese traditional medicine, made-up science, and conjecture about the future of political figures, a selected sample of which are included in Table 2. While some of these are clearly inflammatory (for instance, the post about Africans who have illegally immigrated to Guangzhou and are marrying Chinese women, featuring a photo of three black men carrying a limp woman) and fit with past censorship of rumors that might cause panics,vi others do not appear to be destabilizing or even harmful. A number are more typical of sensationalism or tabloid gossip, for instance the claim that Jack Ma’s son had died or the story about a 5-year-old girl developing liver cancer due to a diet of ramen. Other articles that do not appear necessarily destabilizing or particularly critical of Chinese officials or policy include posts related to historical revisionism or historical figures, as well as general commentary on political news. Again, while these posts have little to no collective action potential and are unlikely to cause destabilizing panics, they fit with the recent concerted efforts by Internet regulators to crack down on online falsehoods and political news as stated in the August 2014 SIIO regulations on “instant messaging tools” (see Appendix: Documented Cases of WeChat Restrictions).

Thus, these findings about political keywords being found in greater proportion in censored posts and posts about non-collective action rumors being censored appear to be at odds with King, Pan, and Roberts’ 2013 and 2014 findings—which concluded that despite a diversity in tactics in how social media platforms censor, on the whole, it is content related to collective action (both pro- and anti-government) that is most heavily censored while even “vitriolic blog posts about even the top Chinese leaders” are often allowed to persist. However, King, Pan, and Roberts’ data was drawn from hundreds of social media sites, and specifically, their analysis centered on censorship within volume bursts, that is, the flurry of censorship after and in reaction to a particularly sensitive event, which this study does not focus on. And while the authors make sure to qualify their conclusions in their papers, it may be possible to unintentionally make overly generalized statements about what is censored online in China based on their excellent work.

Thus, this report, along with recent studies of social video platforms in China by Knockel et al.62 and nationalism on Weibo by Cairns and Carlson,63 reminds us that Chinese Internet censorship is monolithic in neither tactics nor outcome. While we agree with King, Pan, and Roberts that a great deal of intense government criticism takes place throughout the Chinese Internet space and that collective action is an issue of primary concern for censors and authorities, our analysis of WeChat suggests that such conclusions may not apply to all Chinese Internet platforms and applications, even very popular and highly scrutinized ones like WeChat. Furthermore, new government initiatives like the 2014 “Clean the Web Operation” (净网行动)64 or the anti-rumor campaign may render previously observed trends outmoded. We thus offer a cautious note about applying any comprehensive theory to an ecosystem as varied and fast changing as the Chinese Internet.

The Collective Power of Rumors in China

Though the recent campaign against rumors had its seeds in various government pronouncements from 2011,65 it has been in the past two years that it has gathered even greater momentum. In September 2013, the Supreme People’s Court issued guidelines that criminalized the sharing of defamatory or destabilizing misinformation that was forwarded over 500 times or viewed more than 5,000 times,66 a release that was presaged by a number of arrests and detentions of social media users, 67 68 including that of Charles Xue, that were much publicized in state media. Later that month, the new regulation was cited as the basis for the arrest of a 16-year-old student in Gansu Province, Yang Hui, who questioned the findings of a police investigation of a suicide case.69 He was released a week later after police declined to try him and the director of the county’s public security bureau was fired,70 an incident that reinforced for many the fact that rumors can carry as much truth as the words of the government. As Min Jiang notes, “Instead of increasing government transparency and responsiveness, the state’s demonization of ‘rumor’ produces a chilling effect on the public’s ability to know, to question and to act.”71

The reasons behind why the CPC employs such a strategy of information control with such downsides have been explored in Esarey and Xiao’s application of information regime theory to China72 as well as Rogier Creemers’ “Cyber-Leninism” essay, which outlines the importance of historical Party ideology on the CPC’s current stance toward the Internet.73 Seeing the Party through the historical lens of Creemers’ essay is extremely helpful for thinking about the roots of the current anti-rumor campaign and why rumors are viewed so negatively by modern Chinese officials, a topic Steve Smith examines in his work.74 75 Smith notes how rumor can often act as a form of political resistance, writing:

For Communist regimes, rumor represented both a form of unauthorized speech (and thus a potential threat to social stability) and a useful insight into popular attitudes and mood. This ambivalence reflected the dilemma of the regimes, which on the one hand were “wary of allowing citizens to express uncensored opinions about matters of public import in public,” and on the other were “extremely anxious to know what people were thinking.”76

Liu, in an analysis of the spread of rumors by mobile phone in six different case studies, writes similarly about the power of rumors:

The official assertion and accusation aims to not just obliterate rumors, but to deprive people of their legitimate rights to free speech and information flow, and further to silence people’s comments, doubts, questions and inquiries towards “the official story” by establishing deterrence. In this context, to circulate rumors… becomes a simple, but basic way for each person to show his/her suspicions, distrust and challenges toward the dominant public sphere and its hegemonic discourse. Obviously, this action displays a gesture of political confrontation… Additionally, in citizens’ minds, the more people join the dissemination of rumors, the louder the clamor of those unjustly oppressed grows. In other words, the aim of circulating of rumor… is not just to reveal the truth, which has been covered-up (e.g. unusual death), or to embarrass those individuals or institutions (e.g. local government) in power, but to mobilize citizens…77

Hu Yong’s “Rumor as Social Protest” further expands on this point of rumor serving as a sort of collective response to injustice and a kind of weapon of the weak.78 The notion of rumor as protest is particularly heightened in environments where there is a major credibility gap in what is published by authorities. While there is conflicting evidence over how skeptical Chinese citizens are of the various media they have access to,vii the explosion in the publication and sharing of unverified information in the past few years speaks to both a desire to express one’s mind and a collective desire to ensure there is a counter-power to potentially false official statements. As Cheng Yizhong, the founder of the Southern Metropolitan Daily newspaper stated, “Rumors are the penalty for lies… They are a rebellion of speech by the weak against power, a small ill hoping to overthrow a great evil.”79

The response by authorities to this “rebellion” certainly depends on the context of the times. A certain level of rumors or critical commentary on the Internet has always been acceptable as a form of “safety valve,”80 but the line is always in flux and it appears to have narrowed recently for issues like corruption. Adjusting the level of censorship is a delicate act, as Peter Lorentzen models in “China’s Strategic Censorship.”81 A limited amount of open, critical reporting can benefit the government in duties like rooting out corrupt local officials. But in (relatively) decentralized media like the Internet, performing partial censorship is difficult. Lorentzen posits that to compensate for this, authoritarian governments might tighten their grip on traditional media even further in order to ensure there is a countervailing balance to more critical online commentary.

However, if citizens are aware of this shift beyond even what was the norm previously, there may be an even greater reliance on independent media like online news, with little regard for whether the independent source is reputable or not. Thus, the online anti-rumor campaign serves as a valuable tactic for not only curtailing independent media sources through censorship but also for the larger goal of impugning the veracity of all independent, “unapproved” media sources, tilting the balance over control back in the government’s favor. Authorities have used the language of dise

Citizenlab.ca

Politics, Rumors, and Ambiguity: Tracking Censorship on WeChat’s Public Accounts Platform