The new liberal arts textbook "Theory, Technology and Application of Tourism Big Data", co-edited by Dai Bin, President of the China Tourism Research Institute, and Tang Xiaoyun, Vice President, has recently been published and distributed by Higher Education Press. President Dai Bin wrote the preface and introduction for the book; the full text follows.
01 Origin and Objectives
In June 2008, the State Commission Office for Public Sector Reform approved the National Tourism Administration's establishment of the China Tourism Research Institute, with the core task of monitoring and analyzing the operation of the tourism economy. In December 2015, with the addition of the former National Tourism Administration Data Center and the establishment of the Key Laboratory of Tourism Economy, data construction entered the fast lane. After the Ministry of Culture and Tourism was established in March 2018, the China Tourism Research Institute and the former National Tourism Administration Data Center were transferred to it and renamed the China Tourism Research Institute (Data Center of the Ministry of Culture and Tourism), and demand-side cultural data were added to its work objectives. Over the past decade, we have drawn on three sources: data exchanged with government agencies and with financial and communications companies, such as the National Bureau of Statistics, the National Immigration Administration, China UnionPay, and China Telecom; data produced jointly in laboratories shared with travel service providers; and data we collect ourselves through tourist satisfaction surveys. On this basis we have taken the first steps toward a tourism data center that is first-class in China and known internationally. In building the tourism discipline and producing think tank output, we hold to the principle that "one set of data is worth a thousand words" and put it into practice.
Looking back, we have made some progress in building platforms, teams, standards, and processes, as well as in theoretical research, literature collection, and data accumulation, and our holiday tourism data and quarterly analyses have become a focus of attention within the government system, the industry, and society. However, when we try to build a tourism big data community in the sense of Hayek's "extended order" of knowledge, we find that even colleagues engaged in tourism statistics research, teaching, and frontline work struggle to reach consensus on basic concepts, theories, and methods. The real problem is that grassroots colleagues doing frontline tourism statistics and big data analysis lack the conditions for theoretical construction, while most university teachers responsible for knowledge production and talent cultivation lack problem awareness and frontline experience. Against this background, the China Tourism Research Institute (Data Center of the Ministry of Culture and Tourism) took the initiative to undertake this foundational work and organized professional teams from the Institute of Statistics and Survey, the Institute of Data Analysis, the Key Laboratory of Tourism Economy, Culture and Tourism, and the Postdoctoral Workstation. Together with teaching and research staff from relevant universities, they formed a writing team to develop a textbook that balances theoretical research, talent cultivation, and practical work. Journal articles and theoretical monographs pursue logical consistency and marginal innovation. Writing a textbook, which means systematizing knowledge that is scattered across different settings but accepted by the academic community, is a harder kind of academic output. Unfortunately, the existing evaluation systems of universities and research institutes greatly underestimate the importance of textbooks, translations, speeches, and popular science works.
Original knowledge discovery, technological invention, and theoretical construction are difficult, but selecting the most valuable pearls from the boundless ocean of knowledge is no easy task either. It is no easier than what the Tang poet Du Mu describes: "A broken halberd lies buried in the sand, its iron not yet rusted away; washed and polished, it proves a relic of a former dynasty." If we also want to set these points of knowledge into the starry sky of history, so that learners are led on to the thought that "had the east wind not lent Zhou Yu its aid, the Bronze Sparrow Tower would have locked the two Qiao sisters away in deep spring", then the textbook becomes a bridge connecting the past and the future, the individual and the world. Since it is a bridge, its foundation of concepts, principles, tools, and methods must be firmly laid rather than built on sand. Take the core concept of big data: its connotation, extension, and characteristics seem self-evident. But if someone really asks "what is big data?", how should we answer? Well, Big Data or Mega Data is related to the Internet, the Internet of Things, machine learning, and 5G, and it has the "4V" characteristics of volume, velocity, variety, and value. And the result? Computer professionals will probably find the answer shallow, while everyone else will be left thoroughly confused. The writing team therefore had to retell technological achievements and theoretical knowledge creatively and systematize them. The big data we see is first and foremost a vast body of information, the raw ore of data, but too much information makes it harder for users to grasp the essence of things. With so much ore, we need to learn to use technologies such as distributed storage, distributed computing, and nonlinear decision-making to understand big data further. Big data also changes people's perspectives and ideas, and affects the ways and behaviors of interpersonal communication and social governance.
In the writing team's early discussions, apart from the logical framework and basic modules of the book, the most debated topics were how to understand the basic concepts, principles, and methods, and which problems the book should solve. Out of that confusion came the following questions.
Is the world random, and can data help us better understand the tourism industry?
In the early stage of reform and opening up, in order to adapt to the market-oriented development of inbound tourism, mainstream tourism research was applied research and mainstream tourism education was vocational education. In other words, tourism theory and practice were integrated. Since the mid to late 20th century, the awareness of a tourism academic community has begun to emerge, and scientific rigor has become the value orientation of tourism theory construction. Some university tourism departments have been renamed schools of tourism or colleges of tourism science, and a number of tourism research academies, institutes, and centers have been established. The development of modern science cannot be separated from mathematics, experimentation, and statistics. However, the tourism statistical yearbooks that the tourism administrative departments release regularly through the traditional sampling-based statistical system cannot meet the needs of scientific research, talent cultivation, and market analysis in terms of frequency and precision. Progress in the Internet, the Internet of Things, 5G communications, machine learning, and other technologies has brought tourism statistics into the era of big data. Government agencies, research institutes, universities, and enterprises have all established big data centers in the name of smart tourism. So does big data help us understand the world better than traditional statistical methods, or even empirical judgment? From a historical perspective, science as represented by Newtonian mechanics has been applied ever more widely to the real world since the 19th century, with great success in the industrial revolution, economic growth, national prosperity, and the evolution of human civilization. It has also reinforced a deterministic worldview, in which we can describe reality and predict the future through concepts, propositions, models, and mathematical formulas.
As a result, academic research in tourism has begun to separate itself from rich and diverse industry practice, using statistical, experimental, and data analysis tools to travel further and further toward logical consistency and disciplinary independence. The question is whether this road is itself the right one, or a scientific one. Karl Pearson, the founder of statistical thought, believed that the world is not deterministic but random. Randomness follows certain patterns and can be described by probability distributions, that is, by precise mathematical expressions. In more academic terms, the observed quantity itself is random, and what a scientific experiment observes is actually a "distribution". The so-called "error" is simply a reflection of the random nature of the observed quantity. This leads to a seemingly paradoxical situation: the more precise the data appear and the more dimensions they have, the farther they may be from essence and reality. Ronald Fisher, the founder of modern mathematical statistics, argued, without repeating the experiments, that the data from Mendel's pea experiments had been doctored: "Their accuracy was so high that they did not exhibit the randomness they should have, so they cannot be true." What a stroke of genius! It recalls the old saying that "when something is abnormal, there must be a demon behind it". From this perspective, before discussing big data we need a systematic understanding of the laws and theories of statistical work.
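Fisher's reasoning about data that are "too accurate" can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions, not a reconstruction of Fisher's actual analysis, which pooled many experiments. The plant counts are hypothetical, and the helper names `chi_square_3_to_1` and `share_of_fits_at_least_as_close` are invented here. It simulates honest experiments with a true 3:1 ratio and asks how often chance alone produces a fit as close as a reported one.

```python
import random

def chi_square_3_to_1(dominant, recessive):
    """Pearson chi-square statistic of observed counts against a 3:1 ratio."""
    n = dominant + recessive
    exp_dom, exp_rec = 0.75 * n, 0.25 * n
    return ((dominant - exp_dom) ** 2 / exp_dom
            + (recessive - exp_rec) ** 2 / exp_rec)

def share_of_fits_at_least_as_close(reported_stat, n_plants, n_sims=1000, seed=0):
    """Fraction of simulated honest experiments that fit 3:1 at least as
    closely as the reported statistic (smaller statistic = closer fit)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Each plant independently shows the dominant trait with probability 0.75.
        dominant = sum(rng.random() < 0.75 for _ in range(n_plants))
        if chi_square_3_to_1(dominant, n_plants - dominant) <= reported_stat:
            hits += 1
    return hits / n_sims

# Hypothetical reports for 800 plants each (not Mendel's real counts).
perfect_stat = chi_square_3_to_1(600, 200)    # exactly 3:1
ordinary_stat = chi_square_3_to_1(585, 215)   # a typical random deviation

perfect_share = share_of_fits_at_least_as_close(perfect_stat, 800)
ordinary_share = share_of_fits_at_least_as_close(ordinary_stat, 800)
print(f"share of honest runs fitting as well as the perfect report:  {perfect_share:.3f}")
print(f"share of honest runs fitting as well as the ordinary report: {ordinary_share:.3f}")
```

A single perfect result proves nothing, since a small share of honest experiments will land exactly on 3:1. The suspicion arises when report after report sits in that small share, because the probabilities multiply.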
Can the hypothesis testing and causal inference of modern statistics answer Hume's question?
Using concepts and data to describe the external characteristics of things as accurately as possible is only the beginning of scientific research. Exploring the essence of phenomena, and the laws and factors that govern how things evolve, is what feeds scientists' curiosity and is their lasting source of spiritual energy. For a long time, people have been curious about the relationship between cause and effect, while also struggling with the ambiguity of the concepts they use. Francis Galton, known as the "Victorian genius", discovered the phenomenon of "regression to the mean" in the pre-statistical era: if the father is very tall, the child is often shorter than the father; if the father is very short, the child is often taller than the father. It seems as if a mysterious force keeps human height away from the extremes and pulls it toward the average. Regression to the mean does not apply only to the study of human height; almost all observations face this challenge. The dilemma so tormented the "glory of the human mind" that the Scottish philosopher David Hume insisted that humans can recognize only the constant sequential relationships between things, based solely on experience, and cannot recognize any causal relationship. Thanks to the development of modern statistics, especially randomized controlled trials (RCTs), causal inference has become a feasible path toward answering Hume's question. Neyman, known as one of the "Four Heavenly Kings" of statistics, was so proud of his groundbreaking work on statistical hypothesis testing that he called it the "Copernican revolution" in the history of statistics. This revolution has not only borne fruit in the natural and engineering sciences but has also brought gratifying progress in studying the causal relationships behind complex social problems.
In 2019, three economists, Abhijit Banerjee and Esther Duflo of MIT and Michael Kremer of Harvard, were awarded the Nobel Prize in Economics for their experimental research in development economics. In researching and teaching tourism big data, we must not forget our original aim of exploring the essence of tourism, and we must keep in mind the mission of promoting the high-quality development of the tourism industry. We should not show off computing tools and mathematical methods, nor become addicted to big data for its own sake.
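Galton's observation about fathers' and children's heights needs no mysterious force. It appears whenever two measurements are related but not perfectly so. The following is a minimal sketch under an assumed toy model: every number (mean height of 170 cm, the two standard deviations, the 180 cm and 160 cm cut-offs) is hypothetical, and `simulate_families` is a name made up for this illustration. Father and son share one family component and each adds independent variation of his own.

```python
import random
from statistics import mean

def simulate_families(n=20000, pop_mean=170.0, shared_sd=5.0, own_sd=5.0, seed=1):
    """Return (father, son) height pairs in cm from a toy model:
    a shared family component plus independent individual variation."""
    rng = random.Random(seed)
    families = []
    for _ in range(n):
        shared = rng.gauss(0.0, shared_sd)
        father = pop_mean + shared + rng.gauss(0.0, own_sd)
        son = pop_mean + shared + rng.gauss(0.0, own_sd)
        families.append((father, son))
    return families

families = simulate_families()

# Select families by the father's height only, then look at the sons.
tall = [(f, s) for f, s in families if f >= 180.0]
short = [(f, s) for f, s in families if f <= 160.0]

tall_fathers_mean = mean(f for f, _ in tall)
sons_of_tall_mean = mean(s for _, s in tall)
short_fathers_mean = mean(f for f, _ in short)
sons_of_short_mean = mean(s for _, s in short)

print(f"tall fathers:  {tall_fathers_mean:.1f} cm, their sons: {sons_of_tall_mean:.1f} cm")
print(f"short fathers: {short_fathers_mean:.1f} cm, their sons: {sons_of_short_mean:.1f} cm")
```

In this toy model the sons of tall fathers are still taller than average but closer to the average than their fathers, and the mirror image holds for short fathers. Nothing causal happens between the generations here, which is why a before-and-after comparison on a group selected for extreme values can mislead, and why randomized comparison groups matter.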
Is there a mathematical model for causality that is "correct"?
Once big data have been generated, statisticians use mathematical models to test and explain the complex relationships between independent and dependent variables. On that basis they need to judge whether the data deviate from their normal trajectory under specific conditions of time and place, and to propose policy recommendations for countercyclical regulation or discretionary decision-making. However, is there such a thing as an absolutely correct mathematical model or identity? Many models in physics, chemistry, biology, statistics, and economics have been destined to be negated and surpassed from the moment they were proposed, such as the geocentric theory, the heliocentric theory, gravity, special relativity, general relativity, quantum mechanics, singularities, the Big Bang, gravitational waves, and others. Lakatos regards falsifiability as one of the essential elements of science, and rightly so. In fact, data models are often not judged as "right" or "wrong" but as "good" or "bad", by criteria such as consistency, unbiasedness, and efficiency. In terms of the philosophy of science, this implies a humble and open worldview. In building the discipline of tourism statistics and in big data analysis work, we may never discover the "ultimate truth", but in pursuing better models we can still use statistical evidence and thought experiments to approach it ever more closely.
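The difference between a "right" model and a "good" one can be shown with two estimators of the same quantity. The sketch below is an illustration with assumed, hypothetical settings (normally distributed data with center 100 and standard deviation 15, samples of 25), and the function name `compare_estimators` is invented here. Neither the sample mean nor the sample median is "wrong": both are essentially unbiased for the center of such data. The mean, however, clusters more tightly around the truth, which is what statisticians call being more efficient.

```python
import random
from statistics import mean, median, pvariance

def compare_estimators(true_center=100.0, sd=15.0, n=25, reps=4000, seed=2):
    """Draw many samples of size n from a normal distribution and record
    how the sample mean and the sample median behave as estimators."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(true_center, sd) for _ in range(n)]
        means.append(mean(sample))
        medians.append(median(sample))
    return {
        # Bias: average estimate minus the true value (near 0 is good).
        "mean_bias": mean(means) - true_center,
        "median_bias": mean(medians) - true_center,
        # Variance across repeated samples (smaller means more efficient).
        "mean_variance": pvariance(means),
        "median_variance": pvariance(medians),
    }

result = compare_estimators()
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

The ranking depends on the assumed data. With heavy-tailed data or gross outliers the median can be the better choice, which is one more reason to speak of better and worse models rather than correct ones.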
03 Sample, Experiment, and Knowledge Production
1. Does big data have to be a large sample or even a full sample?
In monitoring and analyzing the operation of the tourism economy, and especially in special statistical tasks such as holiday tourism, we often run into the problem of insufficient sample size. In fact, apart from the regular national population and economic censuses, almost all statistics are based on sampling. Sampling is a science: it requires systematic design, pilot surveys, stability checks, and periodic rotation of the sample to keep it representative. Sampling is also a practice: it requires a professional team for collection, aggregation, cleaning, and quality control. The construction of sample frames, online and offline surveys, and compliance reviews must all be carried out in accordance with laws and regulations such as the Statistics Law of the People's Republic of China and the National Cultural Relics and Tourism Statistical Survey System. Of course, maintaining samples, building platforms, and collecting data also require a corresponding budget. The saying "before the troops move, food and supplies come first" expresses exactly this principle. Many people working in tourism statistics and big data analysis prefer ever larger samples, ideally a full sample, but overlook whether this is methodologically sound and whether it is feasible in terms of manpower and money. Whether in ideological understanding, theoretical preparation, or practical experience in tourism statistics and data production, big data and sampling-based statistics are not an either-or opposition; they complement, confirm, and reinforce each other.
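Why a larger sample is not automatically a better one can be seen in a toy example. Everything in the sketch below is hypothetical: a synthetic population of 200,000 tourists, 30% of whom use a travel app and spend more on average, with made-up spending figures. It compares a very large sample that contains only app users with a simple random sample of 1,000 drawn from the whole population.

```python
import random

def average(values):
    """Plain arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

rng = random.Random(3)

# Synthetic population: (uses_app, spend). App users spend more on average.
population = []
for _ in range(200_000):
    uses_app = rng.random() < 0.3
    spend = rng.gauss(1500.0 if uses_app else 900.0, 300.0)
    population.append((uses_app, spend))

true_mean = average([spend for _, spend in population])

# "Big data": every app user is observed, but nobody else is.
big_biased = [spend for uses_app, spend in population if uses_app]

# Classical sampling: 1,000 people drawn at random from everyone.
small_random = [spend for _, spend in rng.sample(population, 1000)]

big_error = abs(average(big_biased) - true_mean)
small_error = abs(average(small_random) - true_mean)

print(f"true mean spend:            {true_mean:8.1f}")
print(f"big app-only sample (n={len(big_biased)}): error {big_error:6.1f}")
print(f"small random sample (n={len(small_random)}):   error {small_error:6.1f}")
```

The large sample is wrong with great confidence because its error comes from who is observed, not from how many. In practice the two sources can be combined, for example by using a well-designed sample survey to calibrate or reweight the large data source, which is one way the two approaches complement each other.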
2. Big data requires experimentation and is also a new driving force for cultivating laboratory economy in the tourism industry.
The theoretical construction and scientific research of tourism big data must start from real problems and face three kinds of practical need: macro-level regulation of the national economy and social development, micro-level supervision by tourism administrative authorities, and the commercial activities of market entities, such as investment, research and development, innovation, operation, transformation, and upgrading. For a long time, tourism statistics have been criticized by the industry for data that cannot be compared over time or across regions, that do not connect with one another, and that offer too little structural detail. Research on and application of big data are meant to reduce, not add to, the "chaos of tourism statistics". Yet judging by the results of smart tourism and big data applications over the past decade, the construction, application, and release of tourism big data also show significant irregularities that cannot be ignored. Statistics need to be designed, and big data likewise requires public education, conceptual consensus, platform support, organizational development, and model validation. Tourism big data is being applied ever more widely in statistics, but for lack of unified and standardized technical methods its statistical logic is chaotic, and it is wrong more often than it is right. Alongside theoretical research and teaching reform, a system of standards for tourism statistics and big data applications should be developed as soon as possible. The technical standards covering scale requirements, processing rules, core algorithms, and so on for big data such as location, consumption, order, and web-crawler data should be made public after expert verification and approval by the competent authorities.
For data indicators that are difficult to standardize, algorithm guidelines should be developed so that key rules are unified and results are comparable, and so that a new round of chaos in tourism statistics is avoided. The writing of this book, and the teaching based on it, respond actively to the issues above.
3. Big data works or textbooks should be understandable to professional learners, and more importantly, practical workers should be able to use them.
Ronald Fisher's masterpiece "Statistical Methods for Research Workers", published in 1925, gives worked examples of how to draw charts, analyze data, and interpret results, lists the formulas, and even details how to apply them on mechanical calculators. However, none of the formulas comes with a mathematical derivation or proof. For researchers in a given discipline, it is enough that these formulas and methods are the "best" models currently available, just as applied economists only need to know that without well-defined, tradable, and protected property rights there can be no economic prosperity and growth. The mathematical proof of the Coase theorem can be left to theoretical economists with solid mathematical foundations, such as Professor Yang Xiaokai, who contributed to this work. Compared with statistical theory and computer science, tourism big data is more applied in character, even in its theoretical construction. During the preliminary research and writing, I repeatedly consulted Dr. Tang Xiaoyun, Dr. Ma Yiliang, and Dr. Xie Zhongwen, all of whom have received systematic academic training in management engineering, statistics, computer science, and other disciplines. Starting from the applied side of their own fields, they bring the light of science into the reality of the tourism industry, so that more frontline workers can take an interest, understand, learn, and apply it in the practice of tourism statistics and big data analysis. In fact, science, theory, and knowledge, big data included, should not be objects of worship, nor should they be kept at a distance from people. They should be brought close and put to use.
04 Acknowledgements
In writing this preface, I drew some of the views from the informal essays of the WeChat official account "Splendid Mountain People" and its series of articles on Karl Pearson, Fisher, Egon Pearson, and Neyman, the "Four Heavenly Kings" of modern statistics; from Professor Ding Peng's article "Causal Inference | A Leap in the Thinking of Modern Statistics", published on the WeChat official account "Econometrics"; and from a number of statistics and economics textbooks. This preface serves as an explanation of the editor's thinking and a guide to the book, so it does not list all references or give specific citations as strict academic standards would require. Here, my colleagues and I pay tribute to all the pioneers in the fields of statistics, computer science, and big data.
Thank you to Dr. Tang Xiaoyun, Dr. Ma Yiliang, Dr. He Qiongfeng, Dr. Xie Zhongwen, Dr. Qiao Xiangjie, Dr. Li Yi, Dr. Zeng Tian, as well as to the surrounding, Wang Feng, Wang Zaorong, Liu Xuefeng, and Wang Liangju, Wang Juan, Liu Qinyun, Qian Tianyu, Li Huiyun, Ding Zhaohan, Mao Wei, Yang Suzhen, Dai Huihui, Guo Kexin, Hu Ningting, Wu Yuhan, Lu Guoping, Liu Yu, Gao Zhaoqing, Dai Jiqiu, Han Jinfang, Hu Yongjun, Zhang Jiayi, Chen Xiaohua, Shen Qibo, Zhang Yurong, Zheng Tao, Lin Zhisheng, Fan Xinsheng, and other friends and professionals in tourism statistics and big data, who together form a stellar research and teaching team in the field of tourism statistics and big data in China. Dr. Zeng Tian, as the liaison for the writing of this book, did a great deal of work in coordinating people and preparing the manuscript, and I thank her for her hard work. Without everyone's understanding, recognition, and effort, this book could not have reached readers so quickly. Given the academic backgrounds and professional abilities of the many members of the writing team, I am myself a learner of tourism big data. Given their hands-on involvement in research and writing, I am more of a producer and director than an editor-in-chief.
Thank you to the editors of Higher Education Press for their hard work, and to all the teachers, students, and tourism statisticians who have chosen this book. Thanks to your efforts, China's tourism statistics research and big data applications can move steadily forward along the path of science, and tourism work can take on a clearly professional character.