Project description

Toutiao Media Lab provides content trends generated from Toutiao for China’s publishers and creators. The data on our website is available to registered users only, and every publisher can apply for an account for free. Toutiao, the flagship product of Bytedance, has become the largest AI-based platform for content discovery and creation in China. It offers users a unique, personalized and comprehensive content experience, enabling them to explore their areas of interest. In the past, readers and audiences were used to a communication model dominated by the media. Now the media have to change their approach to cater to users’ tastes and habits in the mobile internet era. Our lab gives media staff and we-media publishers an opportunity to look into readers’ preferences and discover news leads. Toutiao Media Lab has more than 80 thousand users, including 79 thousand we-media users and 1.5 thousand users from over 900 traditional media outlets.

What makes this project innovative?

Toutiao Media Lab analyzes and mines the data generated by Toutiao. We have a complete process for data cleaning and verification, and professional visualizations to present our data. In addition, certain data is available for download. Compared with other trend products such as Google Trends or Baidu Index, the data we provide has more dimensions and can be downloaded in xlsx or png format. Rather than search trends, we offer reading behavior data on the interaction between publishers and readers, which requires matching the characteristics of content with those of readers. This computation is much more complex.

What was the impact of your project? How did you measure it?

Toutiao Media Lab has more than 80 thousand users, including 79 thousand we-media users and 1.5 thousand users from over 900 traditional media outlets. We also cooperate with many traditional media organizations, including Caixin, which provides financial and business information, and Xinhua News Agency, the official press agency of China. We provide our media partners with tailored data on specific news topics. For instance, we produced the “Hillary Curve” for Caixin’s data journalism project. The curve shows that Toutiao’s users were more interested in Hillary than in Trump before Election Day, while afterwards they showed more interest in Trump, marking the beginning of the Trump era. The curve is valuable for data journalism.

Source and methodology

We collect data from Toutiao, a platform with 1.5 million publishers and over 600 thousand new pieces of content per day. We use NLP to extract key information from this large volume of articles and combine it with data on users’ behavior (reading, giving “likes”, time spent, etc.) to build user profiles and identify trends in news events.
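
The combination of extracted keywords with behavior data can be illustrated with a minimal Python sketch. It is not the lab's actual pipeline: the frequency-based keyword extraction, the English-only tokenizer, the weights, and the names `extract_keywords` and `keyword_trend` are all assumptions made for illustration, and the source does not say how the signals are really combined.

```python
import re
from collections import Counter, defaultdict

# Hypothetical engagement weights; the real weighting is not described in the source.
WEIGHTS = {"read": 1.0, "like": 3.0}
SECONDS_WEIGHT = 0.01  # assumed credit per second of reading time

STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "for", "is"}


def extract_keywords(text, top_n=3):
    """Rough stand-in for NLP keyword extraction: most frequent non-stopword tokens."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    return [word for word, _ in Counter(tokens).most_common(top_n)]


def keyword_trend(articles, events):
    """Return {day: {keyword: score}}, crediting each behavior event
    to the keywords of the article it refers to."""
    keywords = {a["id"]: extract_keywords(a["text"]) for a in articles}
    trend = defaultdict(lambda: defaultdict(float))
    for e in events:
        score = WEIGHTS.get(e["type"], 0.0) + SECONDS_WEIGHT * e.get("seconds", 0)
        for kw in keywords.get(e["article_id"], []):
            trend[e["day"]][kw] += score
    return {day: dict(scores) for day, scores in trend.items()}
```

Calling `keyword_trend(articles, events)` with a list of article dictionaries (`id`, `text`) and a list of behavior events (`article_id`, `type`, `day`, optional `seconds`) yields a per-day score for each keyword, which could then be plotted as a trend line.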

Technologies Used

NLP Text Analysis: user profiles, text mining and keyword models
Spark: large-scale computation; processing petabyte-scale data
Hadoop HDFS: distributed data storage
Kafka: stream computing; real-time data processing
VueJS: front-end data display

Project members

Zhiyi Liu (Acting Director)
Yu Luo (Product Manager)
Shuaiyi Wu (Back-end Programmer)
Huaxian Chen (Front-end Programmer)
Zizhang He (Operator)
