Notes on interesting papers, to remind myself why I found them interesting, and to help find them again.
Primarily "blue-skies" research reading, not papers read in the course of writing one of my own, so not necessarily representative of my research.
2024-11-05 – In search of types
2024-07-30 – Complex behavior from intrinsic motivation to occupy future action-state path space
2023-12-20 – LoRA: Low-rank adaptation of large language models
2023-07-04 – Five stages of accepting constructive mathematics
2023-03-26 – Understanding deep learning requires rethinking generalization
2022-06-12 – The unreliability of naive introspection
2022-01-12 – The evolution of operating systems
2021-07-12 – Is long horizon RL more difficult than short horizon RL?
2021-06-21 – Generative text using classical nondeterminism
2020-10-25 – What you always wanted to know about Datalog (and never dared to ask)
2020-09-23 – The gambler's problem and beyond
2020-06-17 – Answer set programming for non-stationary Markov decision processes
2020-06-07 – Addressing the fundamental tension of PCGML with discriminative learning
2020-06-05 – From logicism to proceduralism (an autobiographical account)
2020-06-04 – Languages for computer music
2020-06-02 – Grounding and solving in answer set programming
2020-05-30 – Answer set; programming?
2020-02-19 – A novelty detection approach to classification
2020-01-06 – On links: Exercises in style
2019-10-30 – State aggregation in Monte Carlo tree search
2019-09-10 – Objections to Bayesian statistics
2019-08-28 – A collection of definitions of intelligence
2019-04-16 – Integer multiplication in time O(n log n)
2019-04-15 – MinCaml: A simple and efficient compiler for a minimal functional language
2019-02-04 – Entropic GANs meet VAEs: A statistical approach to compute sample likelihoods in GANs
2018-11-19 – Learning and using models
2018-10-17 – Kolmogorov: Life and creative activities
2018-09-25 – Can citation indexing be automated?
2018-09-24 – Automatic differentiation in machine learning: A survey
2018-09-09 – Mission control: A history of the urban dashboard
2018-09-07 – Multi-agent diverse generative adversarial networks
2018-09-05 – Finding meaning in abstract games: A deep reading of Sage Solitaire
2018-07-02 – On embeddings as alternative paradigm for relational learning
2018-06-07 – The abstract organism: Towards a prehistory for a-life art
2018-05-30 – Measuring the intrinsic dimension of objective landscapes
2018-05-24 – A trans-Canada computer communications network
2018-05-08 – Vagueness, rationality and undecidability: A theory of why there is vagueness
2018-05-02 – Practical reinforcement learning in continuous spaces
2018-04-28 – Six mathematical gems from the history of distance geometry
2018-04-10 – Theorizing affordances: From request to refuse
2018-04-05 – How to do research at the MIT AI Lab
2018-03-30 – Extremely randomized trees
2018-03-28 – Perspectives of approximate dynamic programming
2018-03-27 – Datafun: A functional Datalog
2018-03-26 – Finding alternative musical scales
2018-02-26 – Seagull: A bird's-eye view of the evolution of technical games research
2018-02-14 – Propp's morphology of the folk tale as a grammar for generation
2017-11-16 – Wordless games: Gameplay as narrative technique
2017-10-27 – On the notion of interestingness in automated mathematical discovery
2017-09-12 – The ordinal nature of emotions
2017-08-26 – Grimes' fairy tales: A 1960s story generator
2017-08-25 – Some natural solutions to the p-value communication problem—and why they won't work
2017-07-27 – Twenty things to do with a computer
2017-05-30 – Propositional & analogical generation of coordinated verbal, visual & musical texts
Abstract:
The concept of "type" has been used without a consistent, precise definition in discussions about programming languages for 60 years. In this essay I explore various concepts lurking behind distinct uses of this word, highlighting two traditions in which the word came into use largely independently: engineering traditions on the one hand, and those of symbolic logic on the other. These traditions are founded on differing attitudes to the nature and purpose of abstraction, but their distinct uses of "type" have never been explicitly unified. One result is that discourse across these traditions often finds itself at cross purposes, such as overapplying one sense of "type" where another is appropriate, and occasionally proceeding to draw wrong conclusions. I illustrate this with examples from well-known and justly well-regarded literature, and argue that ongoing developments in both the theory and practice of programming make now a good time to resolve these problems.
Abstract:
Most theories of behavior posit that agents tend to maximize some form of reward or utility. However, animals very often move with curiosity and seem to be motivated in a reward-free manner. Here we abandon the idea of reward maximization and propose that the goal of behavior is maximizing occupancy of future paths of actions and states. According to this maximum occupancy principle, rewards are the means to occupy path space, not the goal per se; goal-directedness simply emerges as rational ways of searching for resources so that movement, understood amply, never ends. We find that action-state path entropy is the only measure consistent with additivity and other intuitive properties of expected future action-state path occupancy. We provide analytical expressions that relate the optimal policy and state-value function and prove convergence of our value iteration algorithm. Using discrete and continuous state tasks, including a high-dimensional controller, we show that complex behaviors such as “dancing”, hide-and-seek, and a basic form of altruistic behavior naturally result from the intrinsic motivation to occupy path space. All in all, we present a theory of behavior that generates both variability and goal-directedness in the absence of reward maximization.
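To remind myself of the shape of the objective: below is my own sketch of an action-state path-entropy value function, with α and β weighting the action- and state-entropy terms; the paper's exact normalization and conventions may differ.

```latex
V^{\pi}(s) \;=\; \mathbb{E}_{\pi}\!\left[\,\sum_{t=0}^{\infty} \gamma^{t}
\bigl(-\alpha \ln \pi(a_t \mid s_t) \;-\; \beta \ln p(s_{t+1} \mid s_t, a_t)\bigr)
\;\middle|\; s_0 = s \right]
```

No extrinsic reward appears anywhere; goal-directedness has to emerge from the drive to keep future action-state paths open.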
Abstract:
An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example – deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by a factor of 10,000 and the GPU memory requirement by a factor of 3. LoRA performs on-par or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency. We also provide an empirical investigation into rank-deficiency in language model adaptation, which sheds light on the efficacy of LoRA. We release a package that facilitates the integration of LoRA with PyTorch models and provide our implementations and model checkpoints for RoBERTa, DeBERTa, and GPT-2 at https://github.com/microsoft/LoRA.
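The core trick is easy to sketch. Below is my own minimal PyTorch rendering of a LoRA-style layer (not the released Microsoft package): the pre-trained weight is frozen, only the low-rank factors A and B train, and B is zero-initialized so the adapted model starts out identical to the original.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, r=4, alpha=16):
        super().__init__()
        # Stand-in for a pre-trained weight; frozen during adaptation.
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.weight.requires_grad = False
        # Trainable low-rank update: delta_W = B @ A, scaled by alpha / r.
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return x @ self.weight.T + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(768, 768)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 2 * 4 * 768 = 6144, versus 768 * 768 = 589824 frozen
```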
Abstract:
On the odd day, a mathematician might wonder what constructive mathematics is all about. They may have heard arguments in favor of constructivism but are not at all convinced by them, and in any case they may care little about philosophy. A typical introductory text about constructivism spends a great deal of time explaining the principles and contains only trivial mathematics, while advanced constructive texts are impenetrable, like all unfamiliar mathematics. How then can a mathematician find out what constructive mathematics feels like? What new and relevant ideas does constructive mathematics have to offer, if any? I shall attempt to answer these questions.
Abstract:
Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small difference between training and test performance. Conventional wisdom attributes small generalization error either to properties of the model family, or to the regularization techniques used during training. Through extensive systematic experiments, we show how these traditional approaches fail to explain why large neural networks generalize well in practice. Specifically, our experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data. This phenomenon is qualitatively unaffected by explicit regularization, and occurs even if we replace the true images by completely unstructured random noise. We corroborate these experimental findings with a theoretical construction showing that simple depth two neural networks already have perfect finite sample expressivity as soon as the number of parameters exceeds the number of data points as it usually does in practice. We interpret our experimental findings by comparison with traditional models.
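The paper's central randomization test is simple enough to reproduce in miniature. My sketch below trains a small MLP on pure noise with random labels; any fit it achieves is memorization by construction.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(512, 100)            # unstructured random inputs
y = torch.randint(0, 10, (512,))     # random labels: nothing to generalize

model = nn.Sequential(nn.Linear(100, 512), nn.ReLU(), nn.Linear(512, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for _ in range(2000):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(X), y)
    loss.backward()
    opt.step()

acc = (model(X).argmax(dim=1) == y).float().mean().item()
print(f"training accuracy on random labels: {acc:.2f}")  # should approach 1.0
```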
Abstract:
We are prone to gross error, even in favorable circumstances of extended reflection, about our own ongoing conscious experience, our current phenomenology. Even in this apparently privileged domain, our self-knowledge is faulty and untrustworthy. We are not simply fallible at the margins but broadly inept. Examples highlighted in this essay include: emotional experience (for example, is it entirely bodily; does joy have a common, distinctive phenomenological core?), peripheral vision (how broad and stable is the region of visual clarity?), and the phenomenology of thought (does it have a distinctive phenomenology, beyond just imagery and feelings?). Cartesian skeptical scenarios undermine knowledge of ongoing conscious experience as well as knowledge of the outside world. Infallible judgments about ongoing mental states are simply banal cases of self-fulfillment. Philosophical foundationalism supposing that we infer an external world from secure knowledge of our own consciousness is almost exactly backward.
Abstract:
The author looks back on the first half century of operating systems and selects his favorite papers on classic operating systems. These papers span the entire history of the field from the batch processing systems of the 1950s to the distributed systems of the 1990s. Each paper describes an operating system that combines significant ideas in an elegant way. Most of them were written by the pioneers who had the visions and the drive to make them work. The author summarizes each paper and concludes that operating systems are based on a surprisingly small number of ideas of permanent interest.
Abstract:
Learning to plan for long horizons is a central challenge in episodic reinforcement learning problems. A fundamental question is to understand how the difficulty of the problem scales as the horizon increases. Here the natural measure of sample complexity is a normalized one: we are interested in the number of episodes it takes to provably discover a policy whose value is ε-near to that of the optimal value, where the value is measured by the normalized cumulative reward in each episode. In a COLT 2018 open problem, Jiang and Agarwal conjectured that, for tabular, episodic reinforcement learning problems, there exists a sample complexity lower bound which exhibits a polynomial dependence on the horizon – a conjecture which is consistent with all known sample complexity upper bounds. This work refutes this conjecture, proving that tabular, episodic reinforcement learning is possible with a sample complexity that scales only logarithmically with the planning horizon. In other words, when the values are appropriately normalized (to lie in the unit interval), this result shows that long horizon RL is no more difficult than short horizon RL, at least in a minimax sense. Our analysis introduces two ideas: (i) the construction of an ε-net for optimal policies whose log-covering number scales only logarithmically with the planning horizon, and (ii) the Online Trajectory Synthesis algorithm, which adaptively evaluates all policies in a given policy class using sample complexity that scales with the log-covering number of the given policy class. Both may be of independent interest.
Abstract:
Many recent generative text systems combine a context-free grammar base with some set of extensions such as tagging or inline JavaScript. We argue that the restriction to CFGs is unnecessary and that the standard pattern-directed, nondeterministic control structures common to Prolog, definite-clause grammars, and many HTNs support a superset of these capabilities while still being simple to implement. We describe Step, a generative text system for Unity games based on higher-order HTNs that, so far as we can determine from published descriptions, generalizes this previous work. We then describe syntactic extensions to make Step more natural as a text-authoring language. Finally, we discuss how Step and similar systems can be implemented very compactly in modern mainstream programming languages.
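A toy illustration of the control structure being argued for (mine, far simpler than Step): grammar alternatives enumerated lazily by Python generators give nondeterministic choice with backtracking, so a constraint can reject an expansion and the search simply continues.

```python
import random

GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["the wizard"], ["the dragon"]],
    "VP": [["sleeps"], ["guards", "NP"]],
}

def expand(symbol):
    """Yield every phrase a symbol can derive; choice points backtrack."""
    if symbol not in GRAMMAR:            # terminal word
        yield [symbol]
        return
    for production in GRAMMAR[symbol]:   # nondeterministic choice
        yield from expand_seq(production)

def expand_seq(symbols):
    if not symbols:
        yield []
        return
    for head in expand(symbols[0]):
        for tail in expand_seq(symbols[1:]):
            yield head + tail

# A constraint rejects some expansions; backtracking supplies the rest.
texts = [" ".join(p) for p in expand("S") if "dragon" in " ".join(p)]
print(random.choice(texts))
```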
Abstract:
Datalog is a database query language based on the logic programming paradigm; it has been designed and intensively studied over the last five years. We present the syntax and semantics of Datalog and its use for querying a relational database. Then, we classify optimization methods for achieving efficient evaluations of Datalog queries, and present the most relevant methods. Finally, we discuss various enhancements of Datalog, currently under study, and indicate what is still needed in order to extend Datalog's applicability to the solution of real-life problems. The aim of this paper is to provide a survey of research performed on Datalog, also addressed to those members of the database community who are not too familiar with logic programming concepts.
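The semantics is easy to hold in your head as naive bottom-up evaluation: apply every rule to the known facts until nothing new is derived. A small sketch with the classic ancestor program:

```python
# ancestor(X,Y) :- parent(X,Y).
# ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).
facts = {("parent", "ann", "bob"), ("parent", "bob", "cal")}

def step(db):
    new = set(db)
    for (rel, x, y) in db:
        if rel == "parent":
            new.add(("ancestor", x, y))              # first rule
        for (rel2, y2, z) in db:
            if rel == "parent" and rel2 == "ancestor" and y == y2:
                new.add(("ancestor", x, z))          # second rule
    return new

db = facts
while True:                                          # iterate to a fixed point
    nxt = step(db)
    if nxt == db:
        break
    db = nxt

print(sorted(t for t in db if t[0] == "ancestor"))
# [('ancestor', 'ann', 'bob'), ('ancestor', 'ann', 'cal'), ('ancestor', 'bob', 'cal')]
```

Real Datalog engines replace this naive loop with semi-naive evaluation and the optimizations the survey classifies, but the fixed point computed is the same.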
Abstract:
We analyze the Gambler's problem, a simple reinforcement learning problem where the gambler has the chance to double or lose their bets until the target is reached. This is an early example introduced in the reinforcement learning textbook by Sutton and Barto (2018), where they mention an interesting pattern of the optimal value function, with high-frequency components and repeating non-smooth points, but leave it without further investigation. We provide the exact formula for the optimal value function for both the discrete and the continuous cases. Simple as it might seem, the value function is pathological: fractal, self-similar, with a derivative that is either zero or infinity everywhere, and not expressible in elementary functions. It is in fact one of the generalized Cantor functions, holding a complexity that has been uncharted thus far. Our analyses could provide insights into improving value function approximation, gradient-based algorithms, and Q-learning, in real applications and implementations.
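The discrete problem is a few lines of value iteration (my sketch, following the Sutton and Barto setup): capital s, stake a, heads probability p_h, reward 1 only on reaching the target.

```python
import numpy as np

TARGET, p_h = 100, 0.4
V = np.zeros(TARGET + 1)      # V[0] and V[TARGET] stay 0 (terminal states)

for _ in range(300):          # in-place value iteration sweeps
    for s in range(1, TARGET):
        V[s] = max(
            p_h * ((1.0 if s + a == TARGET else 0.0) + V[s + a])
            + (1 - p_h) * V[s - a]
            for a in range(1, min(s, TARGET - s) + 1)
        )

print(V[50])                  # ~0.4: probability of reaching 100 from 50
```

Plotting V reveals the self-similar, non-smooth structure whose exact formula the paper derives.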
Abstract:
Non-stationary domains, where unforeseen changes happen, present a challenge for agents to find an optimal policy for a sequential decision making problem. This work investigates a solution to this problem that combines Markov Decision Processes (MDP) and Reinforcement Learning (RL) with Answer Set Programming (ASP) in a method we call ASP(RL). In this method, Answer Set Programming is used to find the possible trajectories of an MDP, from where Reinforcement Learning is applied to learn the optimal policy of the problem. Results show that ASP(RL) is capable of efficiently finding the optimal solution of an MDP representing non-stationary domains.
Abstract:
Procedural content generation via machine learning (PCGML) is typically framed as the task of fitting a generative model to full-scale examples of a desired content distribution. This approach presents a fundamental tension: the more design effort expended to produce detailed training examples for shaping a generator, the lower the return on investment from applying PCGML in the first place. In response, we propose the use of discriminative models, which capture the validity of a design rather than the distribution of the content, trained on positive and negative example design fragments. Through a modest modification of WaveFunctionCollapse, a commercially-adopted PCG approach that we characterize as using elementary machine learning, we demonstrate a new mode of control for learning-based generators. We demonstrate how an artist might craft a focused set of additional positive and negative design fragments by critique of the generator's previous outputs. This interaction mode bridges PCGML with mixed-initiative design assistance tools by working with a machine to define a space of valid designs rather than just one new design.
Abstract:
This is a story of how I changed my views from the belief that good knowledge must always be represented as a set of logical statements, within a suitable mathematical model of reality, to my present opinion that knowledge is basically algorithmic.
Abstract:
Specialized languages for computer music have long been an important area of research in this community. Computer music languages have enabled composers who are not software engineers to nevertheless use computers effectively. While powerful general-purpose programming languages can be used for music tasks, experience has shown that time plays a special role in music computation, and languages that embrace musical time are especially expressive for many musical tasks. Time is expressed in procedural languages through schedulers and abstractions of beats, duration and tempo. Functional languages have been extended with temporal semantics, and object-oriented languages are often used to model stream-based computation of audio. This article considers models of computation that are especially important for music programming, how these models are supported in programming languages, and how this leads to expressive and efficient programs. Concrete examples are drawn from some of the most widely used music programming languages.
Abstract:
Answer set programming is a declarative problem solving paradigm that rests upon a workflow involving modeling, grounding, and solving. While the former is described by Gebser and Schaub (2016), we focus here on key issues in grounding, or how to systematically replace object variables by ground terms in an effective way, and solving, or how to compute the answer sets of a propositional logic program obtained by grounding.
Abstract:
Motivated by a discussion maintained by Michael Gelfond and other researchers, this short essay contains some thoughts and reflections about the following question: should ASP be considered a programming language?
Abstract:
Novelty Detection techniques are concept-learning methods that proceed by recognizing positive instances of a concept rather than differentiating between its positive and negative instances. Novelty Detection approaches consequently require very few, if any, negative training instances. This paper presents a particular Novelty Detection approach to classification that uses a Redundancy Compression and Non-Redundancy Differentiation technique based on the [Gluck & Myers, 1993] model of the hippocampus, a part of the brain critically involved in learning and memory. In particular, this approach consists of training an autoencoder to reconstruct positive input instances at the output layer and then using this autoencoder to recognize novel instances. Classification is possible, after training, because positive instances are expected to be reconstructed accurately while negative instances are not. The purpose of this paper is to compare HIPPO, the system that implements this technique, to C4.5 and feedforward neural network classification on several applications.
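The recipe is compact enough to sketch (mine, with synthetic stand-in data rather than the paper's applications): train an autoencoder on positives only, then flag inputs whose reconstruction error exceeds a threshold chosen on the positives.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
pos = torch.randn(500, 16) * 0.5 + 2.0   # "positive" training cluster
neg = torch.randn(100, 16) * 0.5 - 2.0   # negatives, never seen in training

ae = nn.Sequential(nn.Linear(16, 4), nn.ReLU(), nn.Linear(4, 16))
opt = torch.optim.Adam(ae.parameters(), lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    loss = ((ae(pos) - pos) ** 2).mean()  # reconstruct positives only
    loss.backward()
    opt.step()

def error(x):
    with torch.no_grad():
        return ((ae(x) - x) ** 2).mean(dim=1)

threshold = error(pos).quantile(0.95)     # accept 95% of positives
print("negatives flagged as novel:",
      (error(neg) > threshold).float().mean().item())
```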
Abstract:
Links are the most important new punctuation mark since the invention of the comma, but it has been years since the last in-depth discussions of link poetics. Taking inspiration from Raymond Queneau's Exercices De Style, we explore the poetics of contemporary link usage by offering exercises in which the same piece of text is divided and linked in different ways. We present three different exercises—varying the division of a text into lexia, varying links among lexia, and varying links within lexia—while pointing toward potential aesthetic considerations of each variation. Our exercises are intended descriptively, not prescriptively, as a conversational starting point for analysis and as a compendium of useful techniques upon which artists might build.
Abstract:
Monte Carlo tree search (MCTS) algorithms are a popular approach to online decision-making in Markov decision processes (MDPs). These algorithms can, however, perform poorly in MDPs with high stochastic branching factors. In this paper, we study state aggregation as a way of reducing stochastic branching in tree search. Prior work has studied formal properties of MDP state aggregation in the context of dynamic programming and reinforcement learning, but little attention has been paid to state aggregation in MCTS. Our main result is a performance loss bound for a class of value function-based state aggregation criteria in expectimax search trees. We also consider how to construct MCTS algorithms that operate in the abstract state space but require a simulator of the ground dynamics only. We find that trajectory sampling algorithms like UCT can be adapted easily, but that sparse sampling algorithms present difficulties. As a proof of concept, we experimentally confirm that state aggregation can improve the finite-sample performance of UCT.
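My toy rendering of the idea: a UCT whose statistics are keyed by an abstraction φ(s), so many noisy ground states share one tree node while the simulator still runs on ground states. The domain, the choice of φ, and the constants below are all made up for illustration.

```python
import math, random

def simulate(s, a):                   # ground dynamics: noisy 1-D walk
    return s + a + random.gauss(0, 0.3)

def phi(s):                           # aggregation: round states to a grid
    return round(s)

N, Q = {}, {}                         # statistics keyed by (phi(s), action)

def uct(s, depth):
    if depth == 0:
        return s                      # terminal value: final position
    keys = [(phi(s), a) for a in (-1, 1)]
    for k in keys:
        if k not in N:
            N[k], Q[k] = 0, 0.0
    t = sum(N[k] for k in keys) + 1
    best = max(keys, key=lambda k: math.inf if N[k] == 0
               else Q[k] + math.sqrt(2 * math.log(t) / N[k]))
    ret = uct(simulate(s, best[1]), depth - 1)
    N[best] += 1
    Q[best] += (ret - Q[best]) / N[best]   # incremental mean
    return ret

random.seed(0)
for _ in range(3000):
    uct(0.0, 4)
print(Q[(0, 1)], Q[(0, -1)])          # moving right should dominate at s = 0
```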
Abstract:
Bayesian inference is one of the more controversial approaches to statistics. The fundamental objections to Bayesian methods are twofold: on one hand, Bayesian methods are presented as an automatic inference engine, and this raises suspicion in anyone with applied experience. The second objection to Bayes comes from the opposite direction and addresses the subjective strand of Bayesian inference. This article presents a series of objections to Bayesian inference, written in the voice of a hypothetical anti-Bayesian statistician. The article is intended to elicit elaborations and extensions of these and other arguments from non-Bayesians and responses from Bayesians who might have different perspectives on these issues.
Abstract:
This paper is a survey of a large number of informal definitions of "intelligence" that the authors have collected over the years. Naturally, compiling a complete list would be impossible as many definitions of intelligence are buried deep inside articles and books. Nevertheless, the 70-odd definitions presented here are, to the authors' knowledge, the largest and most well referenced collection there is.
Abstract:
We present an algorithm that computes the product of two n-bit integers in O(n log n) bit operations.
Abstract:
We present a simple compiler, consisting of only 2000 lines of ML, for a strict, impure, monomorphic, and higher-order functional language. Although this language is minimal, our compiler generates as fast code as standard compilers like Objective Caml and GCC for several applications including ray tracing, written in the optimal style of each language implementation. Our primary purpose is education at undergraduate level to convince students—as well as average programmers—that functional languages are simple and efficient.
Abstract:
Building on the success of deep learning, two modern approaches to learn a probability model of the observed data are Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs). VAEs consider an explicit probability model for the data and compute a generative distribution by maximizing a variational lower-bound on the log-likelihood function. GANs, however, compute a generative model by minimizing a distance between observed and generated probability distributions without considering an explicit model for the observed data. The lack of having explicit probability models in GANs prohibits computation of sample likelihoods in their frameworks and limits their use in statistical inference problems. In this work, we show that an optimal transport GAN with the entropy regularization can be viewed as a generative model that maximizes a lower-bound on average sample likelihoods, an approach that VAEs are based on. In particular, our proof constructs an explicit probability model for GANs that can be used to compute likelihood statistics within GAN's framework. Our numerical results on several datasets demonstrate consistent trends with the proposed theory.
Abstract:
As opposed to model-free RL methods, which learn directly from experience in the domain, model-based methods learn a model of the transition and reward functions of the domain on-line and plan a policy using this model. Once the method has learned an accurate model, it can plan an optimal policy on this model without any further experience in the world. Therefore, when model-based methods are able to learn a good model quickly, they frequently have improved sample efficiency over model-free methods, which must continue taking actions in the world for values to propagate back to previous states. Another advantage of model-based methods is that they can use their models to plan multi-step exploration trajectories. In particular, many methods drive the agent to explore where there is uncertainty in the model, so as to learn the model as fast as possible. In this chapter, we survey some of the types of models used in model-based methods and ways of learning them, as well as methods for planning on these models. [...]
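The basic loop is worth writing down (my sketch on a toy chain MDP, not any particular method from the chapter): estimate tabular transition and reward models from experience, then plan on the learned model with value iteration instead of taking further actions in the world.

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma = 5, 2, 0.9

def true_step(s, a):                  # unknown to the agent
    s2 = min(s + 1, nS - 1) if a == 1 else max(s - 1, 0)
    return s2, 1.0 if s2 == nS - 1 else 0.0

counts = np.zeros((nS, nA, nS))
rewards = np.zeros((nS, nA))
for _ in range(2000):                 # gather experience (uniform exploration)
    s, a = rng.integers(nS), rng.integers(nA)
    s2, r = true_step(s, a)
    counts[s, a, s2] += 1
    rewards[s, a] += (r - rewards[s, a]) / counts[s, a].sum()

# Learned model (assumes every (s, a) pair was visited at least once).
P = counts / counts.sum(axis=2, keepdims=True)

V = np.zeros(nS)
for _ in range(200):                  # plan on the model; no new samples
    V = (rewards + gamma * P @ V).max(axis=1)

print(np.argmax(rewards + gamma * P @ V, axis=1))  # greedy policy: move right
```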
Excerpt:
Topics in the theory of trigonometric series, theory of measure and sets, studies in the theory of integration, approximation theory, constructive logic, topology, theory of superposition of functions and Hilbert's 13th problem, topics in classical mechanics, ergodic theory, theory of turbulence, diffusion and patterns (models) in the dynamics of populations, papers on the foundations of probability theory, limit theorems, theory of stochastic (Markov, stationary, branching, ...) processes, mathematical statistics, theory of algorithms, information theory... —even this is hardly a complete list of all the branches of science in which Andrei Nikolaevich obtained fundamentally important results, which determined the state of many fields of 20th century mathematics and possible directions for their development.
Abstract:
The main characteristics of conventional language-oriented indexing systems are itemized and compared to the characteristics of citation indexes. The advantages and disadvantages are discussed in relation to the capability of the computer automatically to simulate human critical processes reflected in the act of citation. It is shown that a considerable standardization of document presentations will be necessary and probably not achievable for many years if we are to achieve automatic referencing. On the other hand, many citations, now fortuitously or otherwise omitted, might be supplied by computer analyses of text.
Abstract:
Derivatives, mostly in the form of gradients and Hessians, are ubiquitous in machine learning. Automatic differentiation (AD), also called algorithmic differentiation or simply "autodiff", is a family of techniques similar to but more general than backpropagation for efficiently and accurately evaluating derivatives of numeric functions expressed as computer programs. AD is a small but established field with applications in areas including computational fluid dynamics, atmospheric sciences, and engineering design optimization. Until very recently, the fields of machine learning and AD have largely been unaware of each other and, in some cases, have independently discovered each other's results. Despite its relevance, general-purpose AD has been missing from the machine learning toolbox, a situation slowly changing with its ongoing adoption under the names "dynamic computational graphs" and "differentiable programming". We survey the intersection of AD and machine learning, cover applications where AD has direct relevance, and address the main implementation techniques. By precisely defining the main differentiation techniques and their interrelationships, we aim to bring clarity to the usage of the terms "autodiff", "automatic differentiation", and "symbolic differentiation" as these are encountered more and more in machine learning settings.
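The simplest entry point to the techniques the survey covers is forward-mode AD with dual numbers; here is a minimal self-contained version, exact to machine precision with no symbolic expression swell:

```python
class Dual:
    """A value paired with its derivative; arithmetic propagates both."""
    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value * other.value,
                    self.value * other.deriv + self.deriv * other.value)

    __rmul__ = __mul__

def f(x):
    return 3 * x * x + 2 * x + 1      # f'(x) = 6x + 2

y = f(Dual(4.0, 1.0))                 # seed dx/dx = 1
print(y.value, y.deriv)               # 57.0 26.0
```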
Excerpt:
We know what rocket science looks like in the movies: a windowless bunker filled with blinking consoles, swivel chairs, and shirt-sleeved men in headsets nonchalantly relaying updates from "Houston" to outer space. Lately, that vision of Mission Control has taken over City Hall. NASA meets Copacabana, proclaimed the New York Times, hailing Rio de Janeiro's Operations Center as a "potentially lucrative experiment that could shape the future of cities around the world." The Times photographed an IBM executive in front of a seemingly endless wall of screens integrating data from 30 city agencies, including transit video, rainfall patterns, crime statistics, car accidents, power failures, and more. Futuristic control rooms have proliferated in dozens of global cities.
Abstract:
We propose MAD-GAN, an intuitive generalization to the Generative Adversarial Networks (GANs) and its conditional variants to address the well known problem of mode collapse. First, MAD-GAN is a multi-agent GAN architecture incorporating multiple generators and one discriminator. Second, to enforce that different generators capture diverse high probability modes, the discriminator of MAD-GAN is designed such that along with finding the real and fake samples, it is also required to identify the generator that generated the given fake sample. Intuitively, to succeed in this task, the discriminator must learn to push different generators towards different identifiable modes. We perform extensive experiments on synthetic and real datasets and compare MAD-GAN with different variants of GAN. We show high quality diverse sample generations for challenging tasks such as image-to-image translation and face generation. In addition, we also show that MAD-GAN is able to disentangle different modalities when trained using a highly challenging diverse-class dataset (e.g. dataset with images of forests, icebergs, and bedrooms). In the end, we show its efficacy on the unsupervised feature representation task.
Abstract:
This paper presents a methodology for discovering and explaining how games with very few thematic assets (or abstract games) are meaningful to players through rules and dynamics. Through the process of implementing play strategies as computer code, and then running simulations of the game being played, insights about how a player might think about and experience playing the game are revealed. These insights are compiled into interpretations of the themes and meanings that can be found in the abstract game. The paper then applies the methodology to perform a deep reading of the single player digital card game Sage Solitaire.
Abstract:
Many real-world domains can be expressed as graphs and, more generally, as multi-relational knowledge graphs. Though reasoning and learning with knowledge graphs have traditionally been addressed by symbolic approaches, recent methods in (deep) representation learning have shown promising results for specialized tasks such as knowledge base completion. These approaches abandon the traditional symbolic paradigm by replacing symbols with vectors in Euclidean space. With few exceptions, symbolic and distributional approaches are explored in different communities and little is known about their respective strengths and weaknesses. In this work, we compare representation learning and relational learning on various relational classification and clustering tasks and analyse the complexity of the rules used implicitly by these approaches. Preliminary results reveal possible indicators that could help in choosing one approach over the other for particular knowledge graphs.
Abstract:
The author examines historical precedents for contemporary art practice using artificial life, in particular in the work of Paul Klee and Kasimir Malevich. Similarities are identified between artificial life and the philosophical tradition of organicism; specific examples from Klee and Malevich indicate that those artists were engaged in a form of creative organicist thought that imagined the realization of living structures in artificial media.
Abstract:
Many recently trained neural networks employ large numbers of parameters to achieve good performance. One may intuitively use the number of parameters required as a rough gauge of the difficulty of a problem. But how accurate are such notions? How many parameters are really needed? In this paper we attempt to answer this question by training networks not in their native parameter space, but instead in a smaller, randomly oriented subspace. We slowly increase the dimension of this subspace, note at which dimension solutions first appear, and define this to be the intrinsic dimension of the objective landscape. The approach is simple to implement, computationally tractable, and produces several suggestive conclusions. Many problems have smaller intrinsic dimensions than one might suspect, and the intrinsic dimension for a given dataset varies little across a family of models with vastly different sizes. This latter result has the profound implication that once a parameter space is large enough to solve a problem, extra parameters serve directly to increase the dimensionality of the solution manifold. Intrinsic dimension allows some quantitative comparison of problem difficulty across supervised, reinforcement, and other types of learning where we conclude, for example, that solving the inverted pendulum problem is 100 times easier than classifying digits from MNIST, and playing Atari Pong from pixels is about as hard as classifying CIFAR-10. In addition to providing new cartography of the objective landscapes wandered by parameterized models, the method is a simple technique for constructively obtaining an upper bound on the minimum description length of a solution. A byproduct of this construction is a simple approach for compressing networks, in some cases by more than 100 times.
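The measurement itself is easy to mimic on a toy problem (mine, not the paper's setup): freeze a random projection P, train only the low-dimensional vector z in θ = θ₀ + Pz, and watch for the dimension at which the task first becomes solvable. For a linear least-squares task the inner optimization can be done exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 200                                        # native parameter count
X = rng.normal(size=(100, D))                  # 100 data points
y = X @ rng.normal(size=D)                     # realizable targets

def subspace_loss(d):
    P = rng.normal(size=(D, d)) / np.sqrt(d)   # frozen random subspace
    z, *_ = np.linalg.lstsq(X @ P, y, rcond=None)  # "train" only z
    return np.mean((X @ (P @ z) - y) ** 2)

for d in (5, 25, 50, 100, 150):
    print(d, round(float(subspace_loss(d)), 4))
# Loss collapses to ~0 once d reaches the number of constraints (100 here),
# this toy problem's intrinsic dimension.
```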
Excerpt:
In the sector of service to users, there is one Major Project which would dramatically improve the quality and quantity of computer services available to all Canadians and would greatly assist Canadian industry. It is the organization of a nationwide system of computer communications networks. To achieve this goal a positive, vigorous Canadian policy is urgently needed. A "laissez-faire" attitude will eventually result in the supply of most computing and information services via spur lines from U.S. computer communications networks. Such an outcome is completely unacceptable on economic and social grounds. [...] The system of networks should not be allowed to practise "cream-skimming" by concentrating exclusively on the densely populated, highly profitable regions of Canada. It must link all important centres in Canada in order to bring computing and information services to the greatest possible number of Canadians.
Excerpt:
Vagueness is not undecidability, but undecidability does enter into an explanation of why there is vagueness. My theory, called the Undecidability Theory of Vagueness, explains vagueness largely as a result of the fact that we are computationally bound. Vagueness is not due to any particularly human weakness, but due to a weakness that any computationally bound agent possesses; even HAL from 2001: A Space Odyssey will probably experience vagueness. Furthermore, I will argue that vagueness is good for you. I will do so by showing that if you were a computationally bound, rational alien agent given the task of figuring out what our natural language predicates mean, you would very probably end up with vagueness. That is, unless two highly plausible hypotheses are false, it would be rational for you—i.e., in your best interest—to choose concepts in such a way that they are vague. Given that you are computationally bound, avoiding vagueness brings in greater costs than accepting it.
Abstract:
Dynamic control tasks are good candidates for the application of reinforcement learning techniques. However, many of these tasks inherently have continuous state or action variables. This can cause problems for traditional reinforcement learning algorithms which assume discrete states and actions. In this paper, we introduce an algorithm that safely approximates the value function for continuous state control tasks, and that learns quickly from a small amount of data. We give experimental results using this algorithm to learn policies for both a simulated task and also for a real robot, operating in an unaltered environment. The algorithm works well in a traditional learning setting, and demonstrates extremely good learning when bootstrapped with a small amount of human-provided data.
Abstract:
This is a partial account of the fascinating history of distance geometry. We make no claim to completeness, but we do promise a dazzling display of beautiful, elementary mathematics. We prove Heron's formula, Cauchy's theorem on the rigidity of polyhedra, Cayley's generalization of Heron's formula to higher dimensions, Menger's characterization of abstract semimetric spaces, a result of Gödel on metric spaces on the sphere, and Schoenberg's equivalence of distance and positive semidefinite matrices, which is at the basis of multidimensional scaling.
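Cayley's generalization of Heron's formula is the Cayley-Menger determinant, which gives the squared volume of a simplex from pairwise distances alone; a small check in Python:

```python
import numpy as np
from math import factorial

def simplex_volume(sq_dists, k):
    """Volume of a k-simplex from its (k+1) x (k+1) squared-distance matrix."""
    m = np.ones((k + 2, k + 2))
    m[0, 0] = 0.0
    m[1:, 1:] = sq_dists
    coeff = (-1) ** (k + 1) / (2 ** k * factorial(k) ** 2)
    return np.sqrt(coeff * np.linalg.det(m))

# Triangle with sides 3, 4, 5: Heron's formula gives area 6.
D2 = np.array([[0, 9, 16], [9, 0, 25], [16, 25, 0]], dtype=float)
print(simplex_volume(D2, k=2))   # 6.0, up to floating point
```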
Abstract:
As a concept, affordance is integral to scholarly analysis across multiple fields—including media studies, science and technology studies, communication studies, ecological psychology, and design studies among others. Critics, however, rightly point to the following shortcomings: definitional confusion, a false binary in which artifacts either afford or do not, and failure to account for diverse subject-artifact relations. Addressing these critiques, this article demarcates the mechanisms of affordance—as artifacts request, demand, allow, encourage, discourage, and refuse—which take shape through interrelated conditions: perception, dexterity, and cultural and institutional legitimacy. Together, the mechanisms and conditions constitute a dynamic and structurally situated model that addresses how artifacts afford, for whom and under what circumstances.
Abstract:
This document presumptuously purports to explain how to do research. We give heuristics that may be useful in picking up the specific skills needed for research (reading, writing, programming) and for understanding and enjoying the process itself (methodology, topic and advisor selection, and emotional factors).
Abstract:
This paper proposes a new tree-based ensemble method for supervised classification and regression problems. It essentially consists of randomizing strongly both attribute and cut-point choice while splitting a tree node. In the extreme case, it builds totally randomized trees whose structures are independent of the output values of the learning sample. The strength of the randomization can be tuned to problem specifics by the appropriate choice of a parameter. We evaluate the robustness of the default choice of this parameter, and we also provide insight on how to adjust it in particular situations. Besides accuracy, the main strength of the resulting algorithm is computational efficiency. A bias/variance analysis of the Extra-Trees algorithm is also provided as well as a geometrical and a kernel characterization of the models induced.
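The algorithm ships in scikit-learn as ExtraTreesClassifier; a quick usage sketch (sklearn's parameter names, not the paper's notation; max_features controls the attribute-randomization strength):

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)
clf = ExtraTreesClassifier(n_estimators=100, max_features="sqrt", random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())   # typically around 0.97-0.98
```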
Abstract:
Approximate dynamic programming has evolved, initially independently, within operations research, computer science and the engineering controls community, all searching for practical tools for solving sequential stochastic optimization problems. More so than other communities, operations research continued to develop the theory behind the basic model introduced by Bellman with discrete states and actions, even while authors as early as Bellman himself recognized its limits due to the "curse of dimensionality" inherent in discrete state spaces. In response to these limitations, subcommunities in computer science, control theory and operations research have developed a variety of methods for solving different classes of stochastic, dynamic optimization problems, creating the appearance of a jungle of competing approaches. In this article, we show that there is actually a common theme to these strategies, and underpinning the entire field remains the fundamental algorithmic strategies of value and policy iteration that were first introduced in the 1950's and 60's.
Abstract:
Datalog may be considered either an unusually powerful query language or a carefully limited logic programming language. Datalog is declarative, expressive, and optimizable, and has been applied successfully in a wide variety of problem domains. However, most use-cases require extending Datalog in an application-specific manner. In this paper we define Datafun, an analogue of Datalog supporting higher-order functional programming. The key idea is to track monotonicity with types.
Abstract:
We search for alternative musical scales that share the main advantages of classical scales: pitch frequencies that bear simple ratios to each other, and multiple keys based on an underlying chromatic scale with tempered tuning. We conduct the search by formulating a constraint satisfaction problem that is well suited for solution by constraint programming. We find that certain 11-note scales on a 19-note chromatic stand out as superior to all others. These scales enjoy harmonic and structural possibilities that go significantly beyond what is available in classical scales and therefore provide a possible medium for innovative musical composition.
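A crude brute-force rendering of the search problem (mine; the paper formulates it as a constraint program with much richer criteria): score each 11-note subset of a 19-note equal-tempered chromatic by how many of its intervals fall within tolerance of a simple just ratio.

```python
from itertools import combinations
from math import log2

RATIOS = [3/2, 4/3, 5/4, 6/5, 5/3, 8/5]   # simple just intervals
STEP = 1 / 19                              # one 19-TET step, in octaves
GOOD = {k for k in range(1, 19)
        if any(abs(k * STEP - log2(r)) < 0.01 for r in RATIOS)}

def score(scale):                          # count of well-tuned intervals
    return sum((b - a) in GOOD for a, b in combinations(scale, 2))

best = max(combinations(range(19), 11), key=score)
print(best, score(best))
```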
Abstract:
Within Entertainment Computing, games research has grown to be its own area, with numerous publication venues dedicated to it. As this area evolves, it is fruitful to examine its overall development—which subcommunities and research interests were present from the start, which have come and gone, and which are currently active—to better understand the research community as a whole and where it may proceed. In this paper, we present a data-driven analysis and interactive visualization tool to shed light on how technical domains within the games research field have evolved from 2000 to 2013, based on publication data from over 8000 articles collected from 48 games research venues, including Entertainment Computing, FDG, AIIDE, and DiGRA. The approach we present is descriptive. We first used data mining algorithms to group related papers into clusters of similar research topics and evolve these clusters over time. We then designed an interactive visualization system, named Seagull, comprised of Sankey diagrams that allow us to interactively visualize and examine the transition and coalescing of different clusters across time. We present our descriptive analysis in this paper and also contribute the visualization interface to allow other researchers to examine the data and develop their own analysis.
Abstract:
The semi-formal analysis of Russian folk tales carried out by Vladimir Propp has often been used as theoretical background for the automated generation of stories. Its rigour and its exhaustive description of the constituent elements of Russian folk tales, and the enumeration of the patterns they follow, have acted as inspiration for several story generation systems, both sequential and interactive. Yet most of these efforts have attempted to generalize Propp's account to types of stories beyond the corpus that it arose from. In the process, a number of the valuable intuitions present in the original work are lost. The present paper revisits Propp's morphology to build a system that generates instances of Russian folk tales. Propp's view of the folk tale as a rigid sequence of character functions is phrased as a grammar that can be employed as a plot driver. Unification is used to incrementally build a conceptual representation of discourse by adding to an ongoing draft story actions that instantiate the character functions. Story actions are defined by pre- and post-conditions on the state of the plot to account for the causal relations crucial to narrative. The potential of the resulting system as a generic story generation system is discussed, and possible lines of future work are outlined.
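A toy plot driver in the spirit described (mine, and much simpler than the paper's unification-based system): character functions form a rigid sequence, and each is realized by whichever story action's precondition holds in the evolving plot state.

```python
import random

SEQUENCE = ["villainy", "departure", "struggle", "victory", "return"]

# Each story action: (text, precondition, postcondition) on the plot state.
ACTIONS = {
    "villainy":  [("The dragon kidnaps the princess.", {}, {"captive": True})],
    "departure": [("The hero leaves home.", {}, {"away": True})],
    "struggle":  [("The hero fights the dragon.", {"away": True}, {"fought": True}),
                  ("The hero outwits the dragon.", {"away": True}, {"fought": True})],
    "victory":   [("The dragon is defeated.", {"fought": True}, {"captive": False})],
    "return":    [("The hero returns with the princess.", {"captive": False}, {})],
}

state, story = {}, []
for function in SEQUENCE:                  # Propp's rigid sequence as the driver
    options = [(text, post) for text, pre, post in ACTIONS[function]
               if all(state.get(k) == v for k, v in pre.items())]
    text, post = random.choice(options)    # nondeterministic realization
    state.update(post)
    story.append(text)

print(" ".join(story))
```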
Abstract:
In this paper, we look at how gameplay can be used to tell stories in a game without the help of words. Through close readings of three wordless games with a strong narrative focus, Journey, Brothers: A Tale of Two Sons and A Bird Story, we explore how gameplay within wordless games can help to convey a narrative. We have identified four techniques by which gameplay is used for storytelling: gameplay as enacting narrative, manipulating player controls for narrative effect, gameplay for exploring narrative setting, and gameplay as time progression. We discuss these techniques in relation to existing concepts of player experience, and suggest ways gameplay can help to circumvent issues of ambiguity in wordless narrative in games.
Abstract:
We survey five mathematical discovery programs by looking in detail at the discovery processes they illustrate and the success they had. We focus on how they estimate the interestingness of concepts and conjectures and extract some common notions about interestingness in automated mathematical discovery. We detail how empirical evidence is used to give plausibility to conjectures, and the different ways in which a result can be thought of as novel. We also look at the ways in which the programs assess how surprising and complex a conjecture statement is, and the different ways in which the applicability of a concept or conjecture is used. Finally, we note how a user can set tasks for the program to achieve and how this affects the calculation of interestingness. We conclude with some hints on the use of interestingness measures for future developers of discovery programs in mathematics.
Abstract:
Representing computationally everyday emotional states is a challenging task and, arguably, one of the most fundamental for affective computing. Standard practice in emotion annotation is to ask humans to assign an absolute value of intensity to each emotional behavior they observe. Psychological theories and evidence from multiple disciplines including neuroscience, economics and artificial intelligence, however, suggest that the task of assigning reference-based (relative) values to subjective notions is better aligned with the underlying representations than assigning absolute values. Evidence also shows that we use reference points, or else anchors, against which we evaluate values such as the emotional state of a stimulus; suggesting again that ordinal labels are a more suitable way to represent emotions. This paper draws together the theoretical reasons to favor relative over absolute labels for representing and annotating emotion, reviewing the literature across several disciplines. We go on to discuss good and bad practices of treating ordinal and other forms of annotation data, and make the case for preference learning methods as the appropriate approach for treating ordinal labels. We finally discuss the advantages of relative annotation with respect to both reliability and validity through a number of case studies in affective computing, and address common objections to the use of ordinal data. Overall, the thesis that emotions are by nature relative is supported by both theoretical arguments and evidence, and opens new horizons for the way emotions are viewed, represented and analyzed computationally.
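The preference-learning treatment the paper argues for can be sketched with a linear Bradley-Terry-style model (my toy version with synthetic data): fit a scoring function from ordinal comparisons ("A felt more intense than B") rather than from absolute ratings.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                  # stimulus features
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])  # latent "intensity" scorer
pairs = rng.integers(0, 200, size=(1000, 2))
prefs = (X[pairs[:, 0]] @ true_w > X[pairs[:, 1]] @ true_w).astype(float)

w = np.zeros(5)
d = X[pairs[:, 0]] - X[pairs[:, 1]]
for _ in range(2000):                          # ascend the pairwise log-likelihood
    p = 1 / (1 + np.exp(-d @ w))
    w += 0.01 * d.T @ (prefs - p) / len(pairs)

# The recovered scores agree with the latent scorer up to scale:
print(np.corrcoef(X @ w, X @ true_w)[0, 1])    # close to 1.0
```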
Abstract:
We provide the first extensive account of an unknown story generator that was developed by linguist Joseph E. Grimes in the early 1960s. A pioneering system, it was the first to take a grammar-based approach and the first to operationalize Propp’s famous model. This is the opening paper in a series that will aim to reformulate the prevailing history of story generation in light of new findings we have made pertaining to several forgotten early projects. Our study here has been made possible by personal communication with the system's creator, Grimes, and excavation of three obscure contemporaneous sources. While the accepted knowledge in our field is that the earliest story generator was Sheldon Klein’s automatic novel writer, first reported in 1971, we show that Grimes’s system and two others preceded it. In doing this, we reveal a new earliest known system. With this paper, and follow-ups to it that are in progress, we aim to provide a new account of the area of story generation that lends our community insight as to where it came from and where it should go next. We hope others will join us in this mission.
Excerpt:
It is well known that even experienced scientists routinely misinterpret p-values in all sorts of ways, including confusion of statistical and practical significance, treating non-rejection as acceptance of the null hypothesis, and interpreting the p-value as some sort of replication probability or as the posterior probability that the null hypothesis is true. [...] At this point it would be natural for statisticians to think that this is a problem of education and communication. If we could just add a few more paragraphs to the relevant sections of our textbooks, and persuade applied practitioners to consult more with statisticians, then all would be well, or so goes this logic. [...] The problem does not seem just to be technical misunderstandings; rather, statistical analysis is being asked to do something that it simply can’t do, to bring out a signal from any data, no matter how noisy. We suspect that, to make progress in pedagogy, statisticians will have to give up some of the claims we have implicitly been making about the effectiveness of our methods.
Excerpt:
When people talk about computers in education they do not all have the same image in mind. Some think of using the computer to program the kid; others think of using the kid to program the computer. [...] In the real world computers are used in many different ways. Some are programmed to fly airplanes; not to *tell* a human pilot what to do, but to pull levers with their own electro-mechanical effectors... [...] Why then should computers in schools be confined to computing the sum of the squares of the first twenty odd numbers and similar so-called "problem-solving" uses? Why not use them to produce some action? [...] But our purpose here is not to complain of what other people have not done, but to tell of some of the exciting things you can do with the computer you have now or with the one you will be incited to get by the pages that follow.
Excerpt:
We are developing a model for human cognitive processing which assumes that a major component of the rules for calculating behavior are resident outside the individual, in the inherited, collective phenomena that anthropologists call 'culture'. Our model contains rules of behavior encoded in propositional structures such as frames or scripts, plus a method for calculating behavior by analogy. Propositional rules are transformed into situation state descriptions that are related by transformation operators which are also valid for transforming situations by analogy, and for planning by analogy. [...] We use the simulation model as a generator of operas, complete with textual, pictorial and musical output, all derived from a common semantic source. The output is in the medium of a videotape recording, and is generally entitled, Revolt in Flatland.