Arxivisor

3D Semantic Scene Completion: a Survey
Luis Roldao, Raoul de Charette, Anne Verroust-Blondet
3/12/2021, 18:59 cs.CV
Semantic Scene Completion (SSC) aims to jointly estimate the complete geometry and semantics of a scene, assuming partial sparse input. In recent years, following the multiplication of large-scale 3D datasets, SSC has gained significant momentum in the research community because it holds unresolved challenges. Specifically, the difficulty of SSC lies in the ambiguous completion of large unobserved areas and the weak supervision signal of the ground truth. This has led to a substantially increasing number of papers on the matter. This survey aims to identify, compare and analyze the techniques, providing a critical analysis of the SSC literature on both methods and datasets. Throughout the paper, we provide an in-depth analysis of the existing works covering all choices made by the authors while highlighting the remaining avenues of research. SSC performance of the state of the art (SoA) on the most popular datasets is also evaluated and analyzed.
Probabilistic two-stage detection
Xingyi Zhou, Vladlen Koltun, Philipp Krähenbühl
3/12/2021, 18:56 cs.CV
We develop a probabilistic interpretation of two-stage object detection. We show that this probabilistic interpretation motivates a number of common empirical training practices. It also suggests changes to two-stage detection pipelines. Specifically, the first stage should infer proper object-vs-background likelihoods, which should then inform the overall score of the detector. A standard region proposal network (RPN) cannot infer this likelihood sufficiently well, but many one-stage detectors can. We show how to build a probabilistic two-stage detector from any state-of-the-art one-stage detector. The resulting detectors are faster and more accurate than both their one- and two-stage precursors. Our detector achieves 56.4 mAP on COCO test-dev with single-scale testing, outperforming all published results. Using a lightweight backbone, our detector achieves 49.2 mAP on COCO at 33 fps on a Titan Xp, outperforming the popular YOLOv4 model.
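One way to read "the first stage should infer proper object-vs-background likelihoods, which should then inform the overall score" is as a product of two probabilities. The sketch below assumes exactly that: the final score for a class is the first-stage objectness likelihood multiplied by the second-stage class probability conditioned on the box being an object. The function name `detection_score` and the example numbers are hypothetical and are not taken from the paper's code.

```python
def detection_score(p_object: float, p_class_given_object: float) -> float:
    """Combine a first-stage objectness likelihood with a second-stage
    class probability (assumed form: a simple product)."""
    for p in (p_object, p_class_given_object):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_object * p_class_given_object


# Two proposals with the same class confidence but different objectness:
confident_proposal = detection_score(0.9, 0.8)
doubtful_proposal = detection_score(0.2, 0.8)
```

Under this assumption, a proposal the first stage considers unlikely to be an object is ranked lower even when the second-stage classifier is confident.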
Lifelong Multi-Agent Path Finding in Large-Scale Warehouses
Jiaoyang Li, Andrew Tinka, Scott Kiesel, Joseph W. Durham, T. K. Satish Kumar, Sven Koenig
5/15/2020, 06:07 cs.AI cs.MA cs.RO
Multi-Agent Path Finding (MAPF) is the problem of moving a team of agents to their goal locations without collisions. In this paper, we study the lifelong variant of MAPF, where agents are constantly engaged with new goal locations, such as in large-scale automated warehouses. We propose a new framework Rolling-Horizon Collision Resolution (RHCR) for solving lifelong MAPF by decomposing the problem into a sequence of Windowed MAPF instances, where a Windowed MAPF solver resolves collisions among the paths of the agents only within a bounded time horizon and ignores collisions beyond it. RHCR is particularly well suited to generating pliable plans that adapt to continually arriving new goal locations. We empirically evaluate RHCR with a variety of MAPF solvers and show that it can produce high-quality solutions for up to 1,000 agents (= 38.9% of the empty cells on the map) for simulated warehouse instances, significantly outperforming existing work.
Towards Risk Modeling for Collaborative AI
Matteo Camilli, Michael Felderer, Andrea Giusti, Dominik T. Matt, Anna Perini, Barbara Russo, Angelo Susi
3/12/2021, 18:53 cs.SE cs.AI
Collaborative AI systems aim at working together with humans in a shared space to achieve a common goal. This setting imposes potentially hazardous circumstances due to contacts that could harm human beings. Thus, building such systems with strong assurances of compliance with requirements, domain-specific standards, and regulations is of the greatest importance. Challenges associated with the achievement of this goal become even more severe when such systems rely on machine learning components rather than on, for example, top-down rule-based AI. In this paper, we introduce a risk modeling approach tailored to Collaborative AI systems. The risk model includes goals, risk events and domain-specific indicators that potentially expose humans to hazards. The risk model is then leveraged to drive assurance methods that in turn feed the risk model through insights extracted from run-time evidence. Our envisioned approach is described by means of a running example in the domain of Industry 4.0, where a robotic arm endowed with a visual perception component, implemented with machine learning, collaborates with a human operator for a production-relevant task.
Multiview Sensing With Unknown Permutations: An Optimal Transport Approach
Yanting Ma, Petros T. Boufounos, Hassan Mansour, Shuchin Aeron
3/12/2021, 18:48 cs.IT cs.CV cs.LG eess.IV eess.SP math.IT
In several applications, including imaging of deformable objects while in motion, simultaneous localization and mapping, and unlabeled sensing, we encounter the problem of recovering a signal that is measured subject to unknown permutations. In this paper we take a fresh look at this problem through the lens of optimal transport (OT). In particular, we recognize that in most practical applications the unknown permutations are not arbitrary but some are more likely to occur than others. We exploit this by introducing a regularization function that promotes the more likely permutations in the solution. We show that, even though the general problem is not convex, an appropriate relaxation of the resulting regularized problem allows us to exploit the well-developed machinery of OT and develop a tractable algorithm.
EventGraD: Event-Triggered Communication in Parallel Machine Learning
Soumyadip Ghosh, Bernardo Aquino, Vijay Gupta
3/12/2021, 18:28 cs.LG cs.DC cs.SY eess.SY
Communication in parallel systems imposes significant overhead which often turns out to be a bottleneck in parallel machine learning. To relieve some of this overhead, in this paper, we present EventGraD, an algorithm with event-triggered communication for stochastic gradient descent in parallel machine learning. The main idea of this algorithm is to replace the requirement of communicating at every iteration, as in standard implementations of parallel stochastic gradient descent, with communicating only when necessary at certain iterations. We provide theoretical analysis of convergence of our proposed algorithm. We also implement the proposed algorithm for data-parallel training of a popular residual neural network trained on the CIFAR-10 dataset and show that EventGraD can reduce the communication load by up to 60% while retaining the same level of accuracy.
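The idea of communicating "only when necessary" can be illustrated with a simple trigger rule. The sketch below assumes the trigger is a fixed threshold on how far a worker's parameters have drifted (in L2 norm) from the values it last sent; the abstract does not state the actual trigger condition, so the rule, the threshold, and the names `l2_distance` and `count_messages` are all hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length parameter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def count_messages(trajectory, threshold):
    """Count how many parameter vectors a worker would send.

    trajectory: list of parameter vectors, one per iteration.
    threshold:  assumed trigger; send only when the parameters have moved
                more than this distance from the last sent values.
    """
    last_sent = list(trajectory[0])
    sent = 1  # the initial parameters are always communicated
    for params in trajectory[1:]:
        if l2_distance(params, last_sent) > threshold:
            last_sent = list(params)
            sent += 1
    return sent


# Toy one-dimensional parameter trajectory over four iterations.
trajectory = [[0.0], [0.05], [0.2], [0.22]]
every_iteration = count_messages(trajectory, threshold=0.0)
event_triggered = count_messages(trajectory, threshold=0.1)
```

On this toy trajectory the zero threshold sends at every iteration, while the larger threshold skips the iterations where the parameters have barely moved since the last message.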
Machine Learning Assisted Orthonormal Basis Selection for Functional Data Analysis
Rani Basna, Hiba Nassar, Krzysztof Podgórski
3/12/2021, 18:27 stat.ML cs.LG stat.CO
In implementations of the functional data methods, the effect of the initial choice of an orthonormal basis has not gained much attention in the past. Typically, several standard bases such as Fourier, wavelets, splines, etc. are considered to transform observed functional data and a choice is made without any formal criteria indicating which of the bases is preferable for the initial transformation of the data into functions. In an attempt to address this issue, we propose a strictly data-driven method of orthogonal basis selection. The method uses recently introduced orthogonal spline bases called the splinets obtained by efficient orthogonalization of the B-splines. The algorithm learns from the data in the machine learning style to efficiently place knots. The optimality criterion is based on the average (per functional data point) mean square error and is utilized both in the learning algorithms and in comparison studies. The latter indicates efficiency that is particularly evident for the sparse functional data and to a lesser degree in analyses of responses to complex physical systems.
Cooperative Learning of Zero-Shot Machine Reading Comprehension
Hongyin Luo, Seunghak Yu, James Glass
3/12/2021, 18:22 cs.CL cs.AI
Pretrained language models have significantly improved the performance of down-stream tasks, for example extractive question answering, by providing high-quality contextualized word embeddings. However, learning question answering models still needs large-scale data annotation in specific domains. In this work, we propose a cooperative, self-play learning model for question generation and answering. We implemented a masked answer entity extraction task with an interactive learning environment, containing a question generator and a question extractor. Given a passage with a mask, the question generator asks a question about the masked entity, while the extractor is trained to extract the masked entity with the generated question and raw texts. With this strategy, we can train question generation and answering models on any textual corpora without annotation. To further improve the performance of the question answering model, we propose a reinforcement learning method that rewards generated questions that improve the extraction learning. Experimental results showed that our model outperforms the state-of-the-art pretrained language models on standard question answering benchmarks, and reaches state-of-the-art performance under the zero-shot learning setting.
Efficient reconstruction of depth three circuits with top fan-in two
Gaurav Sinha
3/12/2021, 18:19 cs.CC cs.DM cs.LG
We develop efficient randomized algorithms to solve the black-box reconstruction problem for polynomials over finite fields, computable by depth three arithmetic circuits with alternating addition/multiplication gates, such that the output gate is an addition gate with in-degree two. These circuits compute polynomials of the form $G\times(T_1 + T_2)$, where $G,T_1,T_2$ are products of affine forms, and polynomials $T_1,T_2$ have no common factors. The rank of such a circuit is defined as the dimension of the vector space spanned by all affine factors of $T_1$ and $T_2$. For any polynomial $f$ computable by such a circuit, $rank(f)$ is defined to be the minimum rank of any such circuit computing it. Our work develops randomized reconstruction algorithms which take as input black-box access to a polynomial $f$ (over finite field $\mathbb{F}$), computable by such a circuit. Here are the results. (1) [Low rank]: When $5\leq rank(f) = O(\log^3 d)$, the algorithm runs in time $(nd^{\log^3d}\log |\mathbb{F}|)^{O(1)}$, and, with high probability, outputs a depth three circuit computing $f$, with top addition gate having in-degree $\leq d^{rank(f)}$. (2) [High rank]: When $rank(f) = \Omega(\log^3 d)$, the algorithm runs in time $(nd\log |\mathbb{F}|)^{O(1)}$, and, with high probability, outputs a depth three circuit computing $f$, with top addition gate having in-degree two. Ours is the first black-box reconstruction algorithm for this circuit class that runs in time polynomial in $\log |\mathbb{F}|$. This problem has been mentioned as an open problem in [GKL12] (STOC 2012).
Estimating and Evaluating Regression Predictive Uncertainty in Deep Object Detectors
Ali Harakeh, Steven L. Waslander
1/13/2021, 12:53 cs.CV stat.ML
Predictive uncertainty estimation is an essential next step for the reliable deployment of deep object detectors in safety-critical tasks. In this work, we focus on estimating predictive distributions for bounding box regression output with variance networks. We show that in the context of object detection, training variance networks with negative log likelihood (NLL) can lead to high entropy predictive distributions regardless of the correctness of the output mean. We propose to use the energy score as a non-local proper scoring rule and find that when used for training, the energy score leads to better calibrated and lower entropy predictive distributions than NLL. We also address the widespread use of non-proper scoring metrics for evaluating predictive distributions from deep object detectors by proposing an alternate evaluation approach founded on proper scoring rules. Using the proposed evaluation tools, we show that although variance networks can be used to produce high quality predictive distributions, ad-hoc approaches used by seminal object detectors for choosing regression targets during training do not provide wide enough data support for reliable variance learning. We hope that our work helps shift evaluation in probabilistic object detection to better align with predictive uncertainty evaluation in other machine learning domains. Code for all models, evaluation, and datasets is available at: https://github.com/asharakeh/probdet.git.
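For readers unfamiliar with the energy score mentioned above, a minimal sample-based sketch may help. It assumes the standard definition ES(P, y) = E||X - y|| - (1/2) E||X - X'||, with X and X' independent draws from the predictive distribution P, and estimates both expectations from a finite list of samples. The estimator, the function names, and the toy inputs are illustrative assumptions; the paper's own training loss may be computed differently.

```python
import math


def _dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def energy_score(samples, target):
    """Sample-based energy score (lower is better).

    samples: list of vectors drawn from the predictive distribution.
    target:  the observed ground-truth vector.
    """
    m = len(samples)
    if m == 0:
        raise ValueError("need at least one sample")
    # Mean distance from the samples to the observation.
    accuracy = sum(_dist(s, target) for s in samples) / m
    # Half the mean pairwise distance between samples.
    spread = sum(_dist(a, b) for a in samples for b in samples) / (2 * m * m)
    return accuracy - spread


# Same sample spread, but one prediction is centred on the target.
centred = energy_score([[0.0], [2.0]], [1.0])
off_target = energy_score([[5.0], [7.0]], [1.0])
```

With equal spread in both cases, the prediction centred on the target receives the lower score, which is the behaviour expected of a negatively oriented scoring rule.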