Datasets
This GPS trajectory dataset was collected in the (Microsoft Research Asia) GeoLife project by 182 users over a period of more than five years (from April 2007 to August 2012). It can be used in many research fields, such as mobility pattern mining, user activity recognition, location-based social networks, location privacy, and location recommendation. The following heat maps visualize its distribution in Beijing.


Please cite the following two papers when using this dataset.
[1] Yu Zheng, Quannan Li, Yukun Chen, Xing Xie. Understanding Mobility Based on GPS Data. In Proceedings of ACM conference on Ubiquitous Computing (UbiComp 2008), Seoul, Korea. ACM Press: 312-321.
[2] Yu Zheng, Lizhu Zhang, Xing Xie, Wei-Ying Ma. Mining interesting locations and travel sequences from GPS trajectories. In Proceedings of International conference on World Wide Web (WWW 2009), Madrid, Spain. ACM Press: 791-800.
This is a sample of the T-Drive taxi trajectory dataset, which was generated by over 10,000 taxis during a period of one week in Beijing.
Please cite the following two papers when using the dataset:
[1] Jing Yuan*, Yu Zheng, Chengyang Zhang, Wenlei Xie, Xing Xie, Guangzhong Sun, Yan Huang. T-Drive: Driving Directions Based on Taxi Trajectories. In Proceedings of ACM SIGSPATIAL Conference on Advances in Geographical Information Systems (ACM SIGSPATIAL GIS 2010),
[2] Jing Yuan*, Yu Zheng, Xing Xie, Guangzhong Sun. Driving with Knowledge from the Physical World. accepted by 17th SIGKDD conference on Knowledge Discovery and Data Mining (KDD 2011).
This is a portion of the GPS trajectory dataset collected in the (Microsoft Research Asia) GeoLife project. Each trajectory has a set of transportation mode labels, such as driving, taking a bus, riding a bike, and walking, which can support transportation mode learning.
Please cite the following three papers when using this GPS dataset.
[1] Yu Zheng, Like Liu, Longhao Wang, Xing Xie. Learning Transportation Mode from Raw GPS Data for Geographic Application on the Web. In Proceedings of International conference on World Wide Web (WWW 2008), Beijing, China. ACM Press: 247-256.
[2] Yu Zheng, Quannan Li, Yukun Chen, Xing Xie. Understanding Mobility Based on GPS Data. In Proceedings of ACM conference on Ubiquitous Computing (UbiComp 2008), Seoul, Korea. ACM Press: 312-321.
[3] Yu Zheng, Yukun Chen, Quannan Li, Xing Xie, Wei-Ying Ma. Understanding transportation modes based on GPS data for Web applications. ACM Transactions on the Web. Volume 4, Issue 1, January 2010. pp. 1-36.
This simulator can generate people's requests for taxicabs on different road segments, using knowledge mined from large-scale real taxi trajectories. Each query consists of an origin, a destination, and a timestamp. Please cite the following paper when using the simulator.
[1] Shuo Ma, Yu Zheng, Ouri Wolfson. T-Share: A Large-Scale Dynamic Taxi Ridesharing Service. In Proceedings of the 29th IEEE International Conference on Data Engineering (ICDE 2013).
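As a rough illustration of the query structure described above (an origin, a destination, and a timestamp), the following Python sketch models one request and orders a batch by request time. The class name `TaxiQuery`, the coordinate representation, and the sample values are assumptions made for this example; the simulator's actual output format is not specified here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class TaxiQuery:
    """Hypothetical in-memory form of one simulated taxi request."""
    origin: Tuple[float, float]       # assumed (latitude, longitude)
    destination: Tuple[float, float]  # assumed (latitude, longitude)
    timestamp: datetime               # time at which the request is issued


def sort_by_time(queries: Iterable[TaxiQuery]) -> List[TaxiQuery]:
    """Return the queries ordered by request time, earliest first."""
    return sorted(queries, key=lambda q: q.timestamp)


# Made-up sample requests, used only to exercise the code.
queries = [
    TaxiQuery((39.98, 116.31), (39.90, 116.40), datetime(2013, 1, 1, 8, 30)),
    TaxiQuery((39.92, 116.46), (39.99, 116.33), datetime(2013, 1, 1, 8, 5)),
]
ordered = sort_by_time(queries)
```

Ordering by timestamp is one plausible first step when replaying requests against a dispatching algorithm, since requests would then arrive in the order they were issued.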
The dataset consists of check-in data from New York City and Los Angeles, as well as the social structure of the users. Each check-in includes a venue ID, the category of the venue, a timestamp, and a user ID. Please cite the following papers when using the dataset.
[1] Jie Bao, Yu Zheng, Mohamed F. Mokbel. Location-based and Preference-Aware Recommendation Using Sparse Geo-Social Networking Data. ACM SIGSPATIAL GIS 2012.
[2] Ling-Yin Wei, Yu Zheng, Wen-Chih Peng. Constructing Popular Routes from Uncertain Trajectories. 18th SIGKDD conference on Knowledge Discovery and Data Mining (KDD 2012).
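The check-in fields listed above (venue ID, venue category, timestamp, user ID) map naturally onto a small record type. The sketch below is a minimal example under assumed field names and types, with made-up sample rows; it does not reflect the dataset's actual file layout. It counts how often one user checked in to each venue category, a simple kind of preference profile.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class CheckIn:
    """Hypothetical record holding the four fields named in the description."""
    user_id: str
    venue_id: str
    category: str
    timestamp: datetime


def category_profile(checkins: Iterable[CheckIn], user_id: str) -> Counter:
    """Count one user's check-ins per venue category."""
    return Counter(c.category for c in checkins if c.user_id == user_id)


# Made-up sample rows, used only to exercise the code.
sample = [
    CheckIn("u1", "v10", "Coffee Shop", datetime(2012, 5, 1, 9, 0)),
    CheckIn("u1", "v11", "Coffee Shop", datetime(2012, 5, 2, 9, 15)),
    CheckIn("u1", "v20", "Museum", datetime(2012, 5, 3, 14, 0)),
    CheckIn("u2", "v10", "Coffee Shop", datetime(2012, 5, 1, 10, 0)),
]
profile = category_profile(sample, "u1")
```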
This dataset includes the concentration of three air pollutants, PM2.5, PM10, and NO2, from air quality monitoring stations in Beijing and Shanghai in the time span of 2013-2-8 to 2014-2-8. Please cite the following two papers when using the dataset.
[1] Yu Zheng, Furui Liu, Hsun-Ping Hsieh. U-Air: When Urban Air Quality Inference Meets Big Data. 19th SIGKDD conference on Knowledge Discovery and Data Mining (KDD 2013).
[2] Yu Zheng, Xuxu Chen, Qiwei Jin, Yubiao Chen, Xiangyun Qu, Xin Liu, Eric Chang, Wei-Ying Ma, Yong Rui, Weiwei Sun. A Cloud-Based Knowledge Discovery System for Monitoring Fine-Grained Air Quality. MSR-TR-2014-40.
The package is comprised of six parts of data that were extracted from the GPS trajectories of taxicabs, road networks, POIs of Beijing, and video clips recording real traffic on roads. Please cite the following two papers when using the dataset.
[1] Jingbo Shang*, Yu Zheng, Wenzhu Tong, Eric Chang. Inferring Gas Consumption and Pollution Emission of Vehicles throughout a City. In Proceedings of the 20th SIGKDD conference on Knowledge Discovery and Data Mining (KDD 2014).
[2] Yu Zheng, Licia Capra, Ouri Wolfson, Hai Yang. Urban Computing: concepts, methodologies, and applications. ACM Transactions on Intelligent Systems and Technology (ACM TIST). 5(3), 2014.
This package comprises three parts of data: 1) tensors representing the 311 complaints on urban noise; 2) geographical features of each region in NYC; 3) real noise levels at 36 locations in NYC. Please cite the following two papers when using the dataset.
[1] Yu Zheng, Tong Liu, Yilun Wang, Yanchi Liu, Yanmin Zhu, Eric Chang. Diagnosing New York City’s Noises with Ubiquitous Data. In Proc of UbiComp 2014.
[2] Wang, Y., Zheng, Y., Liu, T. A noise map of New York City. In Proc. of UbiComp 2014.
The dataset was used for air quality forecasting and real-time inference. It can also be used to test cross-domain data fusion methods. Please cite the following papers when using the dataset.
[1] Yu Zheng, Furui Liu, Hsun-Ping Hsieh. U-Air: When Urban Air Quality Inference Meets Big Data. 19th SIGKDD conference on Knowledge Discovery and Data Mining (KDD 2013).
[2] Yu Zheng, Xiuwen Yi, Ming Li, Ruiyuan Li, Zhangqing Shan, Eric Chang, Tianrui Li. Forecasting Fine-Grained Air Quality Based on Big Data. In Proceedings of the 21st SIGKDD conference on Knowledge Discovery and Data Mining (KDD 2015).
The dataset contains bike usage (denoted by the number of check-outs and check-ins) at each bike sharing station in NYC and Chicago. Weather data for the period in which the bike sharing data was collected is also included. Please cite the following papers when using the dataset.
[1] Yexin Li, Yu Zheng, Huichu Zhang, Lei Chen. Traffic Prediction in a Bike Sharing System. In Proceedings of the 23rd ACM International Conference on Advances in Geographical Information Systems (ACM SIGSPATIAL 2015).
[2] Yu Zheng, Licia Capra, Ouri Wolfson, Hai Yang. Urban Computing: concepts, methodologies, and applications. ACM Transactions on Intelligent Systems and Technology (ACM TIST). 5(3), 2014.
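The bike sharing description measures usage by the number of check-outs and check-ins at each station. As a minimal sketch of what those two counts mean, the Python snippet below derives them from a list of trips. The `(start_station, end_station)` pair format and the station names are assumptions for this example only; the dataset may already ship aggregated counts in a different layout.

```python
from collections import Counter
from typing import Iterable, Tuple


def station_usage(trips: Iterable[Tuple[str, str]]) -> Tuple[Counter, Counter]:
    """Count check-outs (trip starts) and check-ins (trip ends) per station.

    `trips` is a hypothetical sequence of (start_station, end_station) pairs.
    """
    trips = list(trips)  # materialize so the input can be traversed twice
    check_outs = Counter(start for start, _ in trips)
    check_ins = Counter(end for _, end in trips)
    return check_outs, check_ins


# Made-up trips between three stations, used only to exercise the code.
trips = [("A", "B"), ("A", "C"), ("B", "A")]
outs, ins = station_usage(trips)
```

A station with many more check-outs than check-ins tends to run out of bikes, which is one reason both counts are tracked separately rather than as a single usage total.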
This dataset comprises five parts of data: taxi trip data, bike sharing data, 311 data, POIs, and road network data of NYC. Please cite the following papers when using the dataset.
[1] Yu Zheng, Huichu Zhang, Yong Yu. Detecting Collective Anomalies from Multiple Spatio-Temporal Datasets across Different Domains. In Proceedings of the 23rd ACM International Conference on Advances in Geographical Information Systems (ACM SIGSPATIAL 2015).
[2] Yu Zheng, Licia Capra, Ouri Wolfson, Hai Yang. Urban Computing: concepts, methodologies, and applications. ACM Transactions on Intelligent Systems and Technology (ACM TIST). 5(3), 2014.
This dataset consists of two types of crowd flows: five years of taxi flow in Beijing, and bike usage in a bike sharing system in New York City. Research on predicting crowd flows has been conducted based on this dataset. Please cite the following paper when using the dataset.
[1] Junbo Zhang, Yu Zheng, Dekang Qi. Deep Spatio-Temporal Residual Networks for Citywide Crowd Flows Prediction. In Proceedings of the 31st AAAI Conference (AAAI 2017).
Chinese Bio
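Dr. Yu Zheng (born 1979), professor, is a native of Hengyang, Hunan. He is an ACM Fellow and an IEEE Fellow, Vice President of JD Group, Chief Data Scientist of JD Technology, President of JD City, and President of the JD Intelligent Cities Research Institute. He is a Leading Talent in Science and Technology Innovation under the national "Ten Thousand Talents Program" and an expert receiving the State Council Special Allowance. Before joining JD Group, he worked at Microsoft Research Asia for 12 years, where he was in charge of the urban computing area. He is also a Chair Professor at Shanghai Jiao Tong University, a Huashan Scholar Chair Professor at Xidian University, a visiting professor at several well-known universities including Nanjing University, the Hong Kong University of Science and Technology, and the Hong Kong Polytechnic University, and Dean of the Institute of Artificial Intelligence at Southwest Jiaotong University.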
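He proposed the theoretical framework of urban computing and established Urban Computing internationally as a research field and discipline. He is a pioneer and founder of the field, and a leading figure and practitioner in big data and artificial intelligence. He has published more than 200 papers in top international journals and conferences, which have been cited more than 67,000 times, with an H-index of 117. According to Google Scholar rankings, he ranks first in the world in both urban computing and spatio-temporal data mining. His monograph Urban Computing, published by MIT Press, became the first textbook in the field; it has been translated into multiple languages and is used in universities. The book Computing with Spatial Trajectories, which he edited, was named by Springer as one of the ten most popular computer science books written by Chinese authors worldwide. He has been listed for many consecutive years among the world's highly cited researchers and top scientists. According to the AI 2000 influence ranking, he ranks first in China and eighth in the world in data mining.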
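In 2019, he delivered a keynote at AAAI, a top international conference on artificial intelligence, as the first invited scholar from China. He has given keynotes at more than ten international conferences, including MDM 2021, SSTD 2021, the KDD 2019 Plenary Keynote Panel, and IEEE Big Data 2025. He has also given plenary keynote talks at China's highest-level conferences in artificial intelligence, machine learning, data mining, and big data (CCFAI, CCML, CCDM, and CCF BIGDATA).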
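He served as Editor-in-Chief of ACM Transactions on Intelligent Systems and Technology, a flagship journal in artificial intelligence, from 2015 to 2021, and was the first scholar from mainland China to serve as editor-in-chief of a top ACM journal. He served as program committee chair of ICDE 2014 and CIKM 2017, well-known international conferences in big data, as industry chair of IJCAI 2019, a top international conference in artificial intelligence, and as chair of the ACM SIGKDD China Chapter (KDD China), effectively connecting industry with academia and the domestic data science community with the international one. His public service roles include member of the Beijing Smart City Expert Committee, member of the Digital Technology Expert Committee of the All-China Federation of Trade Unions, deputy director of the Urban Spatial Information Working Committee of the China Geographic Information Industry Association, and standing committee member of the Artificial Intelligence Technical Committee of the China Computer Federation. He serves as a director of companies including the Beijing Big Data Exchange (北京大数据交易所), Capital Exhibition Group (首都会展集团), Terminus (特斯联), and 首旅慧科.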
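He has extensive experience in research practice and in bringing projects into deployment. He has led four national-level research projects, serves as chief scientist and overall lead of a major special project on smart cities and the Internet of Things under the National Key R&D Program, and has led more than twenty large industry and government projects, each with funding of over 100 million yuan. He holds more than 100 international and Chinese invention patents, and many of his research results have been applied in Microsoft and JD products. He has twice received the National Excellent Patent Award, three times received the Microsoft technology transfer award, and received JD Group's highest honor, the "CEO Special Award". Urban Air, developed under his leadership, was the first to use big data and artificial intelligence to monitor and forecast fine-grained air quality; the service covers more than 300 cities in China and has been adopted by China's Ministry of Environmental Protection.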
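He has more than twenty years of management and product development experience at leading technology companies in China and the United States. He founded JD's intelligent city business, which serves more than 70 cities across China and has generated direct economic benefits of more than 10 billion yuan, and he has extensive experience in strategic planning, organizational management, product development, and business expansion. The urban computing theory he proposed provides the theoretical foundation for building Xiong'an as an intelligent city. The urban operating system developed by his team became the digital foundation of the Xiong'an New Area, and, with national approval, Xiong'an's intelligent hub was named the "Xiong'an Urban Computing Center". He led "one-network unified management" (一网统管) city governance projects in more than ten cities, including Nantong and Datong, and his book 《城市治理一网统管》 has become study material for governments in many places. Together with the Beijing municipal government, his team launched "Jingban" (京办), China's first collaborative office system for government, which significantly improved the efficiency of government work and collaboration. He built the technical service system for the Beijing International Big Data Exchange, opening a new chapter in data trading in China.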
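Seven of his research results have stood up to a decade of validation in the field. He received the SIGKDD Test-of-Time Award, the highest international technical award in data mining, twice (in 2022 and 2024; the only scholar in China to do so), the SIGSPATIAL 10-Year-Impact Award, the highest international technical award in spatio-temporal data, four times (in 2019, 2020, 2022, and 2023; the only person worldwide to do so), and the IEEE MDM 2023 Test-of-Time Award. He also received the First Prize of the China Computer Federation Science and Technology Progress Award (ranked first) and twice received the Chinese Institute of Electronics Outstanding Doctoral Dissertation Advisor Award (2022 and 2024).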
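In 2013, Dr. Zheng was named one of the world's outstanding young innovators by MIT Technology Review (MIT TR35) for his contributions to urban computing; the award selects 35 top innovators under the age of 35 worldwide from fields including computing, communications, biology, medicine, and finance. In November 2013, he was featured in Time magazine as a representative of modern innovators. In 2014, because of the great commercial prospects and industry-changing potential of the urban computing work he leads, Fortune named him one of China's 40 business elites under the age of 40. He received the Capital Labor Medal (2020) and the Outstanding Achievement Award of the China AI Jinyan Award (金雁奖) (2021). In 2023, the CPC Beijing Municipal Committee named him a "Beijing talent with outstanding contributions in science, technology, and management". In 2025, he was selected by the Organization Department of the CPC Beijing Municipal Committee for the "Capital Frontline Industrial Science and Technology Talent Lecture Group" (首都一线产业科技人才宣讲团) as the representative for next-generation information technology. For his outstanding contributions to spatio-temporal data mining and urban computing, he was named an ACM Distinguished Scientist in 2016, an IEEE Fellow in November 2020, and an ACM Fellow in January 2026.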