Constantina Nicolaou1, Amal Vaidya1, Fabon Dzogang2, David Wardrope1,2 and Nikos Konstantinidis1, 1Department of Physics and Astronomy, University College London, Gower Street, London WC1E 6BT, UK and 2ASOS AI, Greater London House, Hampstead Road, London NW1 7FB, UK
We study the performance of customer intent classifiers designed to predict the most popular intent received through ASOS customer care, namely “Where is my order?”. We conduct extensive experiments to compare the accuracy of two popular classification models: logistic regression over N-gram features, which account for sequences in the data, and recurrent neural networks, which extract sequential patterns automatically. A Mann-Whitney U test indicated that the F1 score on a representative sample of held-out labelled messages was greater for linear N-gram classifiers than for recurrent neural network classifiers (M1=0.828, M2=0.815; U=1,196, P=1.46e-20), unless all neural layers, including the word representation layer, were trained jointly on the classification task (M1=0.831, M2=0.828; U=4,280, P=8.24e-4). Overall, our results indicate that using simple linear models in modern AI production systems is a judicious choice unless the need for higher accuracy significantly outweighs the cost of much longer training times.
Natural Language Processing, Intent Classification, Bag-of-words, Recurrent Neural Networks
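The classifier comparison in this abstract rests on a Mann-Whitney U test over F1 scores. As a minimal sketch (not the authors' code), U can be computed by pairwise counting, with a two-sided p-value from the normal approximation; this simplified version omits the tie and continuity corrections that library implementations such as SciPy's `mannwhitneyu` may apply. The F1 values below are made-up toy numbers, not results from the paper.

```python
import math

def mann_whitney_u(x, y):
    """U statistic for sample x versus sample y: the number of pairs
    (xi, yj) with xi > yj, with ties counted as one half."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

def two_sided_p(u, n1, n2):
    """Normal-approximation p-value (no tie or continuity correction)."""
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical F1 scores from repeated training runs of each model type.
f1_ngram = [0.83, 0.84, 0.85, 0.83, 0.84]
f1_rnn = [0.81, 0.82, 0.80, 0.82, 0.81]
u = mann_whitney_u(f1_ngram, f1_rnn)
p = two_sided_p(u, len(f1_ngram), len(f1_rnn))
print(f"U = {u}, p = {p:.4f}")
```

Because U for x and U for y always sum to len(x) * len(y), either can be reported; conventions differ between textbooks and libraries.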
Marco A. Palomino1 and Adithya Murali2, 1School of Computing, Electronics and Mathematics, University of Plymouth, Drake Circus, Plymouth, PL4 8AA, United Kingdom and 2School of Computing Science and Engineering, Vellore Institute of Technology, Vellore - 632 014, Tamil Nadu, India
Online trends have established themselves as a new method of information propagation that is reshaping journalism in the digital age. Services such as Google Trends and Twitter Trends have recently attracted a great deal of attention. Taking election campaigns as an example, journalists, campaign managers and political analysts have looked into trends to determine candidates’ popularity and predict likely election outcomes. Trend discovery has therefore become a fundamental aid for monitoring and summarising information. While previous research on trend discovery has focused on the dynamics of data streams, we argue that sentiment analysis, the classification of human emotion expressed in text, can enhance existing algorithms for trend discovery. By highlighting topics that are strongly polarised, sentiment analysis can offer further insight into the influence of users who are involved in a trend, and into how other users adopt such a trend. As a case study, we have investigated a highly topical subject: Brexit, the withdrawal of the United Kingdom from the European Union. We retrieved an experimental corpus of publicly available tweets referring to Brexit and used it to test a proposed algorithm for identifying trends. We validate the efficiency of the algorithm and gauge the sentiment expressed in the captured trends to confirm that highly polarised data ensures the emergence of trends.
Text mining, Twitter, sentiment analysis, information retrieval.
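The abstract does not specify the proposed trend algorithm, so the following is only a hypothetical sketch of how a sentiment signal could sit alongside trend discovery: a simple burst criterion flags terms whose tweet count jumps between two time windows, and a separate polarisation score measures how many tweets carry strong sentiment. The thresholds `min_ratio`, `min_count` and `threshold`, and the assumption that sentiment scores lie in [-1, 1], are illustrative choices, not the authors'.

```python
from collections import Counter

def term_counts(tweets):
    """Count, for each lower-cased term, how many tweets contain it."""
    return Counter(t for tweet in tweets for t in set(tweet.lower().split()))

def trending_terms(prev_window, curr_window, min_ratio=2.0, min_count=3):
    """Hypothetical burst criterion: a term trends if it appears in at least
    `min_count` current tweets and its count grew by at least `min_ratio`
    relative to the previous window (add-one smoothing avoids division by zero)."""
    prev, curr = term_counts(prev_window), term_counts(curr_window)
    return {t for t, c in curr.items() if c >= min_count and c / (prev[t] + 1) >= min_ratio}

def polarisation(scores, threshold=0.5):
    """Share of sentiment scores in [-1, 1] whose magnitude is at least `threshold`."""
    if not scores:
        return 0.0
    return sum(abs(s) >= threshold for s in scores) / len(scores)

# Toy windows and sentiment scores (hypothetical data).
prev = ["Brexit deal talks"]
curr = ["Brexit vote today", "vote now", "final vote on Brexit"]
print(trending_terms(prev, curr), polarisation([0.9, -0.8, 0.1, 0.0]))
```

In this toy run only "vote" bursts; a fuller system would compute the polarisation of the tweets behind each trending term rather than of a fixed score list.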
Nadine Kuhnert1,2 and Andreas Maier1, 1Pattern Recognition, Friedrich-Alexander University, Erlangen-Nuremberg, Germany and 2Siemens Healthcare GmbH, Erlangen, Germany
We aim to model the processing of unknown log files. As the content of log files often evolves over time, we established a dynamic statistical model which learns and adapts processing and parsing rules. First, similar to Vaarandi, we limit the amount of unstructured text by focusing only on those frequent patterns which lead to the desired output table. Second, we transform the frequent patterns found and the output, i.e. the parsed table, into a Hidden Markov Model (HMM). We use this HMM as a specific yet flexible representation of a pattern for log file processing. Since changes in the raw log file distort learned patterns, we aim for the model to adapt automatically in order to maintain high-quality output. After training our model on one system type and applying the model and the resulting parsing rule to a different system with slightly different log file patterns, we achieve an accuracy of over 99%.
Hidden Markov Models, Parameter Extraction, Parsing, Text Mining, Information Retrieval.
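To illustrate how an HMM can act as a parsing rule, here is a minimal sketch that labels the tokens of a log line with hidden states and then builds a table row from them, using standard Viterbi decoding. The two states (KEY, VALUE), the word/number observation alphabet and all probabilities are hypothetical hand-set values; the authors derive their HMM from mined frequent patterns and do not publish these parameters.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence for `obs`, computed in log space.
    All probabilities must be non-zero (log(0) is undefined)."""
    # best[t][s] = (log-probability of the best path ending in s at step t, predecessor)
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][obs[0]]), None) for s in states}]
    for t in range(1, len(obs)):
        layer = {}
        for s in states:
            prev = max(states, key=lambda p: best[t - 1][p][0] + math.log(trans_p[p][s]))
            score = best[t - 1][prev][0] + math.log(trans_p[prev][s]) + math.log(emit_p[s][obs[t]])
            layer[s] = (score, prev)
        best.append(layer)
    # Backtrack from the best final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    return path[::-1]

def token_class(tok):
    """Hypothetical observation alphabet: numeric tokens vs. everything else."""
    return "number" if tok.replace(".", "", 1).isdigit() else "word"

# Hypothetical two-state model: KEY tokens name a field, VALUE tokens fill it.
STATES = ("KEY", "VALUE")
START = {"KEY": 0.8, "VALUE": 0.2}
TRANS = {"KEY": {"KEY": 0.2, "VALUE": 0.8}, "VALUE": {"KEY": 0.7, "VALUE": 0.3}}
EMIT = {"KEY": {"word": 0.9, "number": 0.1}, "VALUE": {"word": 0.3, "number": 0.7}}

def parse_line(line):
    """Label the tokens of one log line and collect KEY -> VALUE pairs as a table row."""
    tokens = line.split()
    labels = viterbi([token_class(t) for t in tokens], STATES, START, TRANS, EMIT)
    row, key = {}, None
    for tok, lab in zip(tokens, labels):
        if lab == "KEY":
            key = tok
        elif key is not None:
            row[key] = tok
    return labels, row

print(parse_line("temp 36.6 status 200"))
```

Adapting to a new system, as the abstract describes, would correspond to re-estimating the transition and emission tables rather than rewriting a hard-coded parser.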
Piotr Malak, Institute of Information Science and Book Studies, University of Wroclaw, Poland
In the current paper we discuss the results of preliminary but promising research on incorporating Natural Language Processing and Machine Learning approaches into Information Retrieval (IR). Classical IR uses indexing and term weighting in order to increase the pertinence of answers to user queries. Such an approach allows for meaning matching, i.e. matching all keywords of the same or very similar meaning as expressed in the user query. In most cases this approach is sufficient to fulfil user information needs. However, indexing and retrieving information over professional language texts brings new challenges as well as new possibilities. One challenge is a different grammar, which requires adjusting NLP tools to a given professiolect. One possibility is detecting the context in which an indexed term occurs in the text. In our research we attempt to answer the question of whether a Natural Language Processing (NLP) approach combined with supervised Machine Learning (ML) is capable of detecting contextual features of professional language texts.
Enhanced Information Retrieval, Contextual IR, NLP, Machine Learning.
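The abstract mentions the term weighting used in classical IR without naming a scheme. One widely used scheme is tf-idf, sketched below as an illustration only; it is an assumption that this is representative of the baseline the paper has in mind. Documents are assumed to be non-empty, already tokenised lists of terms.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Weight each term of each tokenised, non-empty document by
    (term count / document length) * log(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return weights

# Toy professional-language snippets (hypothetical).
print(tf_idf([["contract", "clause", "clause"], ["contract", "party"]]))
```

A term that occurs in every document ("contract" here) receives zero weight, which is why plain tf-idf says nothing about the context of a term's occurrence, the gap the paper's NLP and ML approach targets.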
Enmei Wang1 and Shunan Wu2, 1School of Aeronautics and Astronautics, Dalian University of Technology, Dalian City, China and 2Key Laboratory of Advanced Technology for Aerospace Vehicles, Dalian University of Technology, Dalian City, China
To address the problems of vibration suppression in large space structures (LSS), such as design complexity, limited fault tolerance and the difficulty of repeated expansion, this paper proposes a distributed vibration control approach. Based on the structural characteristics, the LSS is first divided into control units, and a dynamic model of each unit is developed. A distributed LQR vibration controller is then designed for each unit, and these are integrated into a distributed vibration control system for the whole structure. Simulations verify the validity of the proposed controller, and the results demonstrate that repeatable distributed controllers can suppress vibration in LSS and provide good fault tolerance.
Large Space Structure, Distributed Control, Linear Quadratic Regulator, Fault Tolerance
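The following sketch illustrates the "one LQR per unit" idea under strong simplifying assumptions: each unit is modelled as a single lightly damped vibration mode, discretised with forward Euler, and coupling between units is ignored. The modal frequencies, damping ratio, time step and weights Q and R are hypothetical. The same design routine is reused for every unit, mirroring the repeatability the abstract emphasises; the gain comes from iterating the discrete Riccati difference equation until it converges.

```python
import numpy as np

def unit_model(omega, zeta, dt):
    """Single-mode model of one control unit, x = [displacement, velocity],
    discretised with forward Euler (a simplifying assumption)."""
    Ac = np.array([[0.0, 1.0], [-omega**2, -2.0 * zeta * omega]])
    Bc = np.array([[0.0], [1.0]])
    return np.eye(2) + dt * Ac, dt * Bc

def dlqr(A, B, Q, R, iters=20_000, rtol=1e-10):
    """Discrete-time LQR gain K (control law u = -K x), obtained by
    iterating the Riccati difference equation until P stops changing."""
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        K = np.linalg.solve(R + BtP @ B, BtP @ A)
        P_next = Q + A.T @ P @ (A - B @ K)
        converged = np.max(np.abs(P_next - P)) <= rtol * np.max(np.abs(P_next))
        P = P_next
        if converged:
            break
    BtP = B.T @ P
    return np.linalg.solve(R + BtP @ B, BtP @ A)

# Hypothetical units with different modal frequencies (rad/s); each controller
# is designed from its own local model only.
dt = 0.01
Q = np.diag([10.0, 1.0])
R = np.array([[0.1]])
units = {}
for name, omega in {"unit_1": 1.0, "unit_2": 2.5, "unit_3": 4.0}.items():
    A, B = unit_model(omega, zeta=0.005, dt=dt)
    K = dlqr(A, B, Q, R)
    rho = np.max(np.abs(np.linalg.eigvals(A - B @ K)))
    units[name] = (A, B, K)
    print(name, "K =", K.round(3), "closed-loop spectral radius =", round(float(rho), 4))
```

A closed-loop spectral radius below 1 means the unit's vibration decays under its local controller; SciPy's `solve_discrete_are` could replace the hand-written iteration.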
Nataly Ilyasova1,2 and Alexander Shirokanev1,2, 1IPSI RAS - branch of the FSRC «Crystallography and Photonics» RAS, Samara, Russia and 2Samara National Research University, Samara, Russia
In this paper, an information technology has been developed for automatically highlighting the lungs on X-ray images, based on image pre-processing, calculation of textural properties and k-means classification. In some cases, the highlighted objects can describe not only the patient’s current condition but also specific characteristics such as age, gender and constitution. Using the k-means method, a relationship between the segmentation error and the fragmentation window size was revealed. The study implemented both a visual criterion for evaluating the quality of the segmentation result and a criterion based on calculating the clustering error on a large set of fragmented images. Image pre-processing techniques were also included. The technology highlighted the key objects with an error of 26%; however, the equalizing procedure reduced this error to 14%. X-ray image clustering errors are presented for fragmentation windows of 12x12, 24x24 and 36x36.
Lung X-ray Images, Image Processing, Texture Analysis, Region-of-Interest Selection Technique
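A minimal sketch of the fragment-then-cluster idea: the image is cut into non-overlapping square windows, each window is described by a few texture features, and k-means groups the windows. The features used here (window mean and standard deviation) are a hypothetical stand-in for the paper's textural properties, the equalization step is not shown, and the 72x72 image is synthetic.

```python
import numpy as np

def window_features(img, w):
    """Split a grayscale image into non-overlapping w x w windows and return
    per-window (mean, std) features; incomplete edge windows are dropped."""
    h, width = img.shape
    feats = []
    for r in range(0, h - w + 1, w):
        for c in range(0, width - w + 1, w):
            win = img[r:r + w, c:c + w]
            feats.append((win.mean(), win.std()))
    return np.array(feats)

def kmeans(X, k, iters=100, seed=0):
    """Plain Lloyd's k-means on the rows of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Assign each window to its nearest center (squared Euclidean distance).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d2, axis=1)
        # Move each center to the mean of its members; keep empty clusters fixed.
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic 72x72 "image": dark left half, bright right half.
img = np.zeros((72, 72))
img[:, :36] = 0.2
img[:, 36:] = 0.8
for w in (12, 24, 36):
    labels, _ = kmeans(window_features(img, w), k=2)
    print(w, labels)
```

With 24x24 windows some windows straddle the boundary and mix both regions, a toy illustration of why the clustering error depends on the fragmentation window size.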
Nataly Ilyasova1,2 and Alexander Shirokanev1,2, 1IPSI RAS - branch of the FSRC «Crystallography and Photonics» RAS, Samara, Russia and 2Samara National Research University, Samara, Russia
The article proposes a new method for analyzing eye fundus images, based on a convolutional neural network (CNN). A CNN architecture was constructed and then trained on a balanced dataset of four image classes: thick blood vessels, thin blood vessels, healthy areas and exudate areas. The CNN was then used to segment fundus images. Since exudates are a primary target of laser coagulation surgery, the segmentation error was calculated on the exudate class, amounting to 5%. In the course of this research, the HSL color system was found to be the most informative; using it reduced the segmentation error to 3%.
Convolutional Neural Networks, Fundus Image, Diabetic Retinopathy, Exudates, Laser Coagulation, Image Processing, Image Segmentation
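The abstract reports that the HSL color system gave the lowest segmentation error but does not describe how the conversion was wired into the CNN. A minimal sketch of the conversion step itself, using Python's standard `colorsys` module (which returns hue, lightness, saturation in that order), could look like this; the pixel values and the planar channel layout are hypothetical.

```python
import colorsys

def rgb_to_hsl_pixels(pixels):
    """Convert 8-bit (R, G, B) tuples to (H, S, L) tuples scaled to [0, 1]."""
    out = []
    for r, g, b in pixels:
        # colorsys returns hue, lightness, saturation in that order.
        h, l, s = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
        out.append((h, s, l))
    return out

def to_channel_planes(hsl_pixels):
    """Split per-pixel HSL tuples into three flat channel lists (H, S, L),
    a planar layout of the kind CNN input layers commonly use."""
    return [list(ch) for ch in zip(*hsl_pixels)]

# Hypothetical RGB pixel values from a fundus image patch.
pixels = [(230, 210, 90), (90, 20, 20), (128, 128, 128)]
print(to_channel_planes(rgb_to_hsl_pixels(pixels)))
```

Separating lightness from hue and saturation is one plausible reason bright yellowish exudates become easier to distinguish, though the paper does not state this mechanism.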
Clark Ren, Yu Sun and Fangyan Zhang, California State Polytechnic University, USA
As more and more students get access to computers to aid them in their studies, they also gain access to machines that can play games, which can negatively affect a student’s academic performance. However, it has also been argued that playing video games can positively affect a student’s academic performance. To address both sides of the argument, we propose an app that limits the amount of time a student can spend playing games without completely removing the ability to play.
Parental Control, Smart System, Digital Games, Web Service
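The core rule of such an app, limiting play time without banning games, can be sketched as a per-day quota. This is a hypothetical illustration: the class name, the minute-based bookkeeping and the daily reset are assumptions, not details from the paper, which describes a web-service-based system.

```python
from datetime import date

class GameTimeLimiter:
    """Hypothetical daily play-time quota: games are allowed until the
    student has used `daily_limit_min` minutes on the given day."""

    def __init__(self, daily_limit_min):
        self.daily_limit_min = daily_limit_min
        self.used = {}  # date -> minutes played that day

    def record(self, day, minutes):
        """Add a finished play session to that day's total."""
        self.used[day] = self.used.get(day, 0) + minutes

    def remaining(self, day):
        """Minutes still available on `day` (never negative)."""
        return max(0, self.daily_limit_min - self.used.get(day, 0))

    def can_play(self, day):
        return self.remaining(day) > 0

limiter = GameTimeLimiter(60)
today = date(2020, 1, 1)
limiter.record(today, 45)
print(limiter.remaining(today), limiter.can_play(today))
```

Because the quota resets each day, the student keeps some access to games, which matches the goal of not removing play entirely.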
Yuhan Chen1, Yu Sun2 and Fangyan Zhang3, 1Santa Margarita Catholic High School, Rancho Santa Margarita, CA 92688, 2Department of Computer Science, California State Polytechnic University, Pomona, CA, 91768 and 3ASML, San Jose, CA, 95131
Book selling and exchange is very popular among students on campus, especially at the beginning of each semester, as it can reduce students’ spending on textbooks. Generally, a used book may be worth only one third or half of the original price, or even less. However, existing book selling platforms have various practical issues: they are not user-friendly, not efficient, or not widely used among students. In this paper, we develop a new book selling and exchange platform, which facilitates the distribution of book selling information and the communication between sellers and buyers. The application can easily be used on a smartphone once it is downloaded and installed from an app store.
Book selling platform, distributed system, iOS and Android system, Firebase
Junyi Lu1, Yu Sun1 and Fangyan Zhang2, 1Department of Computer Science, California State Polytechnic University, Pomona, CA, 91768 and 2ASML, San Jose, CA, 95131
Many urban areas are highly polluted. Their residents are aware of the damage caused by pollution but lack efficient and economical solutions to address it. The purpose of this project is to design a portable pollution sensor that can communicate with an online database and allow users to access the data through the internet. A machine learning algorithm creates data models from existing data values to predict future pollution levels.
Deep Learning, Environmental detection, Machine Learning, Wireless Network
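The abstract does not say which machine learning model predicts future pollution levels. As a deliberately simple baseline sketch (an assumption, not the authors' model), a one-step autoregressive model y[t+1] = a * y[t] + b can be fitted to past sensor readings by least squares and rolled forward. The readings below are toy values.

```python
def fit_ar1(series):
    """Least-squares fit of the one-step model y[t+1] = a * y[t] + b.
    Requires the lagged values series[:-1] not to be all equal."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx

def forecast(series, steps):
    """Roll the fitted model forward `steps` readings past the last one."""
    a, b = fit_ar1(series)
    preds, last = [], series[-1]
    for _ in range(steps):
        last = a * last + b
        preds.append(last)
    return preds

# Toy readings from one sensor at regular intervals (hypothetical values).
readings = [40.0, 44.0, 46.0, 47.0]
print(forecast(readings, 2))
```

A deep learning model, as the keywords suggest, would replace `fit_ar1` but keep the same fit-then-forecast structure over the database's stored readings.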
Mike Qu1, Qi Lu2 and Yu Sun3, 1Northwood High School, Irvine, CA, 92602, 2Department of Social Science, University of California, Irvine, Irvine, CA, 92697 and 3Department of Computer Science, California State Polytechnic University, Pomona, Pomona, CA, 91768
This paper proposes the overall concept of a Bluetooth-based proximity IoT device. Its methodology is introduced along with a working prototype. Potential applications and the viability of data analysis are also discussed. In summary, such a device accomplishes close-range information communication in an effective, secure and reliable way through the use of beacons utilizing BLE (Bluetooth Low Energy) technology, a variety of receiver devices with native Bluetooth support, and a database for data storage and retrieval. The biggest advantage of this technology is that proximity-based Internet has a wide range of potential real-world applications, including healthcare, retail and industrial manufacturing.
BLE, Proximity sensing, Machine learning
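Proximity sensing with BLE beacons typically estimates distance from the received signal strength (RSSI). A common approach, sketched here as an illustration rather than the paper's method, is the log-distance path-loss model d = 10^((P_1m - RSSI) / (10 n)), where P_1m is the RSSI measured at 1 m and n is the path-loss exponent. The defaults (P_1m = -59 dBm, n = 2, roughly free space) and the zone thresholds are hypothetical and would need calibration for real hardware and environments.

```python
import math

def estimate_distance(rssi_dbm, tx_power_dbm=-59.0, path_loss_exp=2.0):
    """Log-distance path-loss estimate in metres; tx_power_dbm is the
    (hypothetical, calibration-dependent) RSSI measured at 1 m."""
    return math.pow(10.0, (tx_power_dbm - rssi_dbm) / (10.0 * path_loss_exp))

def proximity_zone(distance_m):
    """Coarse zones with hypothetical thresholds."""
    if distance_m < 0.5:
        return "immediate"
    if distance_m < 4.0:
        return "near"
    return "far"

for rssi in (-50.0, -59.0, -79.0):
    d = estimate_distance(rssi)
    print(rssi, round(d, 2), proximity_zone(d))
```

RSSI readings fluctuate strongly indoors, so practical receivers usually smooth several readings before converting them to a distance.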