{"id":2385,"date":"2018-02-19T13:01:23","date_gmt":"2018-02-19T13:01:23","guid":{"rendered":"https:\/\/www.syslog.cl.cam.ac.uk\/?p=2385"},"modified":"2018-02-19T13:01:23","modified_gmt":"2018-02-19T13:01:23","slug":"aaai-aies18-trip-report","status":"publish","type":"post","link":"https:\/\/www.syslog.cl.cam.ac.uk\/2018\/02\/19\/aaai-aies18-trip-report\/","title":{"rendered":"AAAI\/AIES\u201918 Trip Report"},"content":{"rendered":"

I was recently honoured to have the opportunity to present our work \u201cPrivacy-preserving Machine Learning Based Data Analytics on Edge Devices\u201d at the AIES'18<\/a> conference, co-located with AAAI'18<\/a>, one of the top conferences in the field of AI and Machine Learning. Here is a brief review of some of the papers and trends from the conference that I found interesting.<\/p>\n

Activity detection is clearly a hot topic, and new building blocks are being invented. \u201cAction Prediction from Videos via Memorizing Hard-to-Predict Samples\u201d aims to improve prediction accuracy, since one challenge is that different actions may share similar early-phase patterns. The proposed solution is a new network block combining a CNN with an augmented LSTM. \u201cA Cascaded Inception of Inception Network with Attention Modulated Feature Fusion for Human Pose Estimation\u201d proposes a new \u201cInception-of-Inception\u201d block to address current limitations in preserving low-level features, adaptively adjusting the importance of different levels of features, and modelling the human perception process. Another research focus is reducing computation overhead. \u201cR-C3D: Region Convolutional 3D Network for Temporal Activity Detection\u201d aims to reduce the time of activity detection by sharing convolutional features between the proposal and classification pipelines. \u201cA Self-Adaptive Proposal Model for Temporal Action Detection based on Reinforcement Learning\u201d proposes that an agent can learn to find actions by continuously adjusting the temporal bounds in a self-adaptive way, reducing the required computation.<\/p>\n

Face identification is also a widely discussed topic. \u201cDual-reference Face Retrieval\u201d proposes a mechanism to recognise a face at a specific age range: given a second reference image at a certain age range, it searches for similar faces of a similar age. Person re-identification associates person images, captured by different surveillance cameras, with the same person. Its main challenge is the large quantity of noisy video sources. In \u201cVideo-based Person Re-identification via Self Paced Weighting\u201d, the authors claim that not every frame in a video should be treated equally; their approach reduces noise and improves detection accuracy. In \u201cGraph Correspondence Transfer for person re-identification\u201d, the authors tackle the spatial misalignment caused by large variations in view angles and human poses.<\/p>\n

To improve Deep Neural Networks, many researchers seek to transfer learned knowledge to new environments. \u201cRegion-based Quality Estimation Network for Large-scale Person Re-identification\u201d is another paper on person re-identification. It proposes a training method that learns the lost information from other regions and thus performs well on low-quality input. \u201cMultispectral Transfer Network: Unsupervised Depth Estimation for all day vision\u201d estimates a depth image from a single thermal image. \u201cLess-forgetful learning for domain expansion in DNN\u201d enhances a DNN so that it retains previously learned information when learning new data from a new domain. Another line of research enhances training data generation. \u201cMix-and-Match Tuning for Self-Supervised Semantic Segmentation\u201d reduces the dataset required for training a segmentation network. \u201cHierarchical Nonlinear Orthogonal Adaptive-Subspace Self-Organizing Map based Feature Extraction for Human Action Recognition\u201d addresses the problem that feature extraction needs large-scale labelled data for training; its solution adaptively learns effective features from data without supervision.<\/p>\n

One common theme across these works is reducing computation overhead. \u201cRecurrent Attentional Reinforcement Learning for Multi-label Image Recognition\u201d achieves this by locating redundant computation in the region proposals of image recognition. \u201cAuto-Balanced Filter Pruning for Efficient Convolutional Neural Networks\u201d compresses network modules by discarding a large fraction of filters in a proposed two-pass training approach. Another trend is combining multiple input sources to improve accuracy. \u201cAction Recognition with Coarse-to-Fine Deep Feature Integration and Asynchronous Fusion\u201d combines multiple video streams to achieve more precise feature extraction at different granularities. \u201cMultimodal Keyless Attention Fusion for Video Classification\u201d combines multiple single-modal models, such as RGB, flow, and sound models, to address the difficulty of combining CNN and RNN models for joint end-to-end training directly on large-scale datasets. \u201cHierarchical Discriminative Learning for Visible Thermal Person Re-Identification\u201d improves person re-identification by cross-comparing visible and thermal video streams.<\/p>\n
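To make the filter-pruning idea concrete, here is a minimal numpy sketch of magnitude-based filter pruning. The L1-norm criterion, the keep_ratio parameter, and the prune_filters helper are my own illustrative assumptions, not the specific two-pass auto-balanced method of the cited paper.<\/p>\n

```python
import numpy as np

def prune_filters(conv_weights, keep_ratio=0.5):
    """Keep only the filters with the largest L1 norms.

    conv_weights has shape (num_filters, in_channels, kH, kW).
    The L1 criterion and keep_ratio are illustrative choices,
    not the cited paper's exact method.
    """
    norms = np.abs(conv_weights).sum(axis=(1, 2, 3))   # one L1 norm per filter
    n_keep = max(1, int(len(norms) * keep_ratio))      # how many filters survive
    keep_idx = np.sort(np.argsort(norms)[-n_keep:])    # strongest filters, in order
    return conv_weights[keep_idx], keep_idx

# Example: prune a layer of 8 filters down to 4.
w = np.random.randn(8, 3, 3, 3)
pruned, kept = prune_filters(w, keep_ratio=0.5)
assert pruned.shape == (4, 3, 3, 3)
```

After pruning, a real pipeline would also drop the matching input channels of the next layer and fine-tune the smaller network to recover accuracy.<\/p>\n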

It is no surprise that few systems-related papers are presented at this conference. \u201cComputation Error Analysis of Block Floating Point Arithmetic Oriented Convolution Neural Network Accelerator Design\u201d focuses on the overhead of floating-point arithmetic when porting CNNs to FPGAs. \u201cAdaComp: Adaptive Residual Gradient Compression for Data-Parallel Distributed Training\u201d proposes a gradient compression technique to reduce the communication bottleneck in distributed training.<\/p>\n
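As a rough intuition for gradient compression, here is a generic top-k sparsification sketch with residual (error-feedback) accumulation, assuming a flat gradient vector. The compress_gradient helper, the fixed ratio, and the top-k criterion are my own simplifications; AdaComp's actual scheme adapts the compression per local activity rather than using a fixed global ratio.<\/p>\n

```python
import numpy as np

def compress_gradient(grad, residual, ratio=0.01):
    """Transmit only the largest-magnitude gradient entries.

    Entries not sent are accumulated in `residual` so no gradient
    information is lost over iterations. Generic top-k sketch,
    not AdaComp's exact adaptive scheme; assumes a 1-D gradient.
    """
    acc = grad + residual                        # fold in leftover gradient
    k = max(1, int(acc.size * ratio))            # entries to communicate
    idx = np.argpartition(np.abs(acc), -k)[-k:]  # top-k by magnitude
    sent = np.zeros_like(acc)
    sent[idx] = acc[idx]                         # the sparse update to send
    return sent, acc - sent                      # remainder kept locally

# Example: with ratio=0.4 on 5 entries, only the 2 largest are sent.
g = np.array([0.1, -2.0, 0.05, 3.0, -0.2])
sent, res = compress_gradient(g, np.zeros_like(g), ratio=0.4)
assert np.count_nonzero(sent) == 2
assert np.allclose(sent + res, g)
```

Each worker would transmit only the sparse `sent` vector, carrying `res` forward to the next iteration, which is where the communication savings come from.<\/p>\n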

Research from industry makes up a large portion of this conference. IBM presents a series of papers and demos. For example, \u201cDataset evolver\u201d is an interactive Jupyter notebook-based tool that helps data scientists perform feature engineering for classification tasks, and \u201cDLPaper2Code: Auto-generation of Code from Deep Learning Research Papers\u201d proposes to automatically take design flow diagrams and tables from existing research and translate them into abstract computational graphs, then into Keras\/Caffe implementations. A full list of IBM\u2019s work at AAAI can be seen here<\/a>. Google presents \u201cModeling Individual Labelers Improves Classification\u201d and \u201cLearning to Attack: Adversarial Transformation Networks\u201d, and Facebook shows \u201cEfficient Large-scale Multi-modal classification\u201d. Both companies focus on a specific application field, in contrast to IBM\u2019s wide spectrum of research. Much of the industry research is closely tied to applications, such as Alibaba\u2019s \u201cA multi-task learning approach for improving product title compression with User search log data\u201d. Curiously, though, financial companies were nowhere to be found at the conference.<\/p>\n

On the other hand, the top universities tend to focus on theoretical work. \u201cInterpreting CNN Knowledge Via An Explanatory Graph\u201d from UCLA aims to explain a CNN model and improve its transparency. Tokyo University presents \u201cConstructing Hierarchical Bayesian Network with Pooling\u201d and \u201cAlternating circulant random features for semigroup kernels\u201d. CMU presents \u201cBrute-Force Facial Landmark Analysis with A 140,000-way classifier\u201d. Together with ETH Zurich, MIT shows \u201cStreaming Non-monotone submodular maximization: personalized video summarization\u201d. UC Berkeley, however, seems to be absent from this conference.<\/p>\n

Adversarial learning is a key topic across vision-related research areas, appearing in papers such as \u201cAdversarial Discriminative Heterogeneous Face Recognition\u201d, \u201cExtreme Low resolution activity recognition with Multi-siamese embedding learning\u201d, and \u201cEnd-to-end united video dehazing and detection\u201d. One of the tutorials, \u201cAdversarial Machine Learning<\/a>\u201d, gives an excellent introduction to the state of the art on this topic. Prof. Zoubin Ghahramani from Uber gives a talk<\/a> on his vision of probabilistic AI, also one of the trends at this conference.<\/p>\n

The best paper of this year goes to \u201cMemory-Augmented Monte Carlo Tree Search\u201d from the University of Alberta, and the best student paper to \u201cCounterfactual Multi-Agent Policy Gradients\u201d from, ahem, the other place.<\/p>\n

These papers only scratch the surface of AAAI\u201918, and mostly reflect my personal interest in Computer Vision. If you are interested, please refer to the full list<\/a> of accepted papers.<\/p>\n","protected":false},"excerpt":{"rendered":"

I was recently honoured to have the opportunity to present our work \u201cPrivacy-preserving Machine Learning Based Data Analytics on Edge Devices\u201d at the AIES’18 conference, co-located with AAAI’18, one of the top conferences in the field of AI and Machine Learning. Here is a brief review of some of the papers and trends that […]<\/p>\n","protected":false},"author":46,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[6],"tags":[100],"_links":{"self":[{"href":"https:\/\/www.syslog.cl.cam.ac.uk\/wp-json\/wp\/v2\/posts\/2385"}],"collection":[{"href":"https:\/\/www.syslog.cl.cam.ac.uk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.syslog.cl.cam.ac.uk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.syslog.cl.cam.ac.uk\/wp-json\/wp\/v2\/users\/46"}],"replies":[{"embeddable":true,"href":"https:\/\/www.syslog.cl.cam.ac.uk\/wp-json\/wp\/v2\/comments?post=2385"}],"version-history":[{"count":1,"href":"https:\/\/www.syslog.cl.cam.ac.uk\/wp-json\/wp\/v2\/posts\/2385\/revisions"}],"predecessor-version":[{"id":2386,"href":"https:\/\/www.syslog.cl.cam.ac.uk\/wp-json\/wp\/v2\/posts\/2385\/revisions\/2386"}],"wp:attachment":[{"href":"https:\/\/www.syslog.cl.cam.ac.uk\/wp-json\/wp\/v2\/media?parent=2385"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.syslog.cl.cam.ac.uk\/wp-json\/wp\/v2\/categories?post=2385"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.syslog.cl.cam.ac.uk\/wp-json\/wp\/v2\/tags?post=2385"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}