Foundations of Sequence-to-Sequence Modeling for Time Series

Vitaly Kuznetsov, Zelda Mariet*
arXiv preprint [arXiv]

The availability of large amounts of time series data, paired with the performance of deep-learning algorithms on a broad class of problems, has recently led to significant interest in the use of sequence-to-sequence models for time series forecasting. We provide the first theoretical analysis of this time series forecasting framework. We include a comparison of sequence-to-sequence modeling to classical time series models, and as such our theory can serve as a quantitative guide for practitioners choosing between different modeling methodologies.

@misc{kuznetsov2018foundations,
  author = {Vitaly Kuznetsov and Zelda Mariet},
  title = {Foundations of Sequence-to-Sequence Modeling for Time Series},
  year = {2018},
  eprint = {arXiv:1805.03714}
}

Learning Determinantal Point Processes by Sampling Inferred Negatives

Zelda Mariet, Mike Gartrell, Suvrit Sra
arXiv preprint [arXiv]

Determinantal Point Processes (DPPs) have attracted significant interest from the machine-learning community due to their ability to elegantly and tractably model the delicate balance between quality and diversity of sets. We consider learning DPPs from data, a key task for DPPs; for this task, we introduce a novel optimization problem, Contrastive Estimation (CE), which encodes information about “negative” samples into the basic learning model. CE is grounded in the successful use of negative information in machine-vision and language modeling. Depending on the chosen negative distribution (which may be static or evolve during optimization), CE assumes two different forms, which we analyze theoretically and experimentally. We evaluate our new model on real-world datasets; on a challenging dataset, CE learning delivers a considerable improvement in predictive performance over a DPP learned without using contrastive information.

@misc{mariet2018learning,
  author = {Zelda Mariet and Mike Gartrell and Suvrit Sra},
  title = {Learning Determinantal Point Processes by Sampling Inferred Negatives},
  year = {2018},
  eprint = {arXiv:1802.05649}
}

Elementary Symmetric Polynomials for Optimal Experimental Design

Zelda Mariet, Suvrit Sra
Advances in Neural Information Processing Systems 30 (NIPS 2017) [PDF] [arXiv]

We revisit the classical problem of optimal experimental design (OED) under a new mathematical model grounded in a geometric motivation. Specifically, we introduce models based on elementary symmetric polynomials; these polynomials capture “partial volumes” and offer a graded interpolation between the widely used A-optimal and D-optimal design models, obtaining each of them as special cases. We analyze properties of our models, and derive both greedy and convex-relaxation algorithms for computing the associated designs. Our analysis establishes approximation guarantees on these algorithms, while our empirical results substantiate our claims and demonstrate a curious phenomenon concerning our greedy algorithm. Finally, as a byproduct, we obtain new results on the theory of elementary symmetric polynomials that may be of independent interest.
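As a small illustration of the interpolation described above (a NumPy sketch, not code from the paper): for an information matrix with eigenvalues λ, the elementary symmetric polynomial e_N is the determinant (the D-optimality criterion) and e_{N-1}/e_N is the trace of the inverse (the A-optimality criterion), with the intermediate e_k grading between them.

```python
import numpy as np

def elementary_symmetric(eigs):
    """Return [e_1, ..., e_n] of the given eigenvalues.

    The monic polynomial with roots `eigs` has coefficients
    (-1)^k e_k, which np.poly computes directly.
    """
    coeffs = np.poly(eigs)                           # [1, -e_1, e_2, -e_3, ...]
    signs = (-1.0) ** np.arange(1, len(eigs) + 1)
    return signs * coeffs[1:]

A = np.diag([1.0, 2.0, 3.0])                         # toy information matrix
e = elementary_symmetric(np.linalg.eigvalsh(A))
# e[-1] is det(A) (D-optimality); e[-2] / e[-1] is tr(A^{-1}) (A-optimality).
```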

@inproceedings{mariet2017elementary,
  title = {Elementary Symmetric Polynomials for Optimal Experimental Design},
  author = {Mariet, Zelda E. and Sra, Suvrit},
  booktitle = {Advances in Neural Information Processing Systems 30},
  editor = {I. Guyon and U. V. Luxburg and S. Bengio and H. Wallach and R. Fergus and S. Vishwanathan and R. Garnett},
  pages = {2136--2145},
  year = {2017},
  publisher = {Curran Associates, Inc.}
}

Kronecker Determinantal Point Processes

Zelda Mariet, Suvrit Sra
Advances in Neural Information Processing Systems 29 (NIPS 2016) [PDF] [arXiv]

Determinantal Point Processes (DPPs) are probabilistic models over all subsets of a ground set of N items. They have recently gained prominence in several applications that rely on diverse subsets. However, their applicability to large problems is still limited due to the O(N^3) complexity of core tasks such as sampling and learning. We enable efficient sampling and learning for DPPs by introducing KronDPP, a DPP model whose kernel matrix decomposes as a tensor product of multiple smaller kernel matrices. This decomposition immediately enables fast exact sampling. But contrary to what one may expect, leveraging the Kronecker product structure to speed up DPP learning turns out to be more difficult. We overcome this challenge, and derive batch and stochastic optimization algorithms for efficiently learning the parameters of a KronDPP.
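A quick NumPy check of the structural fact that makes fast exact sampling possible (an illustrative sketch, not the paper's code): the eigenvalues of a Kronecker product are the pairwise products of the factors' eigenvalues, so the spectral quantities a DPP sampler needs come from the small factors alone, never from an eigendecomposition of the full N x N kernel.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_psd(n):
    """Random positive semidefinite kernel factor."""
    A = rng.standard_normal((n, n))
    return A @ A.T

L1, L2 = random_psd(3), random_psd(4)   # small factors
L = np.kron(L1, L2)                     # full 12 x 12 DPP kernel

# Eigenvalues of kron(L1, L2) are all pairwise products of the
# factors' eigenvalues, so decomposing the factors suffices.
e1 = np.linalg.eigvalsh(L1)
e2 = np.linalg.eigvalsh(L2)
prod_eigs = np.sort(np.outer(e1, e2).ravel())
full_eigs = np.sort(np.linalg.eigvalsh(L))
```

The DPP normalization constant det(L + I) follows for free as the product of (1 + λ) over these eigenvalues.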

@inproceedings{mariet2016kronecker,
  title     = {Kronecker Determinantal Point Processes},
  author    = {Zelda Mariet and Suvrit Sra},
  booktitle = {Advances in Neural Information Processing Systems 29},
  editor    = {D. D. Lee and M. Sugiyama and U. V. Luxburg and I. Guyon and R. Garnett},
  pages     = {2694--2702},
  year      = {2016},
  publisher = {Curran Associates, Inc.}
}

Learning and enforcing diversity with Determinantal Point Processes

Zelda Mariet
Master’s thesis [PDF]

As machine-learning techniques continue to require more data and become increasingly memory-heavy, being able to choose a subset of relevant, high-quality and diverse elements among large amounts of redundant or noisy data and parameters has become an important concern. Here, we approach this problem using Determinantal Point Processes (DPPs), probabilistic models that provide an intuitive and powerful way of balancing quality and diversity in sets of items. We introduce a novel, fixed-point algorithm for estimating the maximum likelihood parameters of a DPP, provide a proof of convergence, and discuss generalizations of this technique. We then apply DPPs to the difficult problem of detecting and eliminating redundancy in fully-connected layers of neural networks. By placing a DPP over a layer, we are able to sample a subset of neurons that perform non-overlapping computations and merge all other neurons of the layer into this diverse subset. This allows us to significantly reduce the size of the neural network while simultaneously maintaining good performance.

@mastersthesis{mariet2016thesis,
  title  = {Learning and enforcing diversity with Determinantal Point Processes},
  author = {Zelda Mariet},
  school = {Massachusetts Institute of Technology},
  year   = {2016}
}

Diversity Networks: Neural Network Compression Using Determinantal Point Processes

Zelda Mariet, Suvrit Sra
International Conference on Learning Representations (ICLR 2016) [PDF] [arXiv]

We introduce Divnet, a flexible technique for learning networks with diverse neurons. Divnet models neuronal diversity by placing a Determinantal Point Process (DPP) over neurons in a given layer. It uses this DPP to select a subset of diverse neurons and subsequently fuses the redundant neurons into the selected ones. Compared with previous approaches, Divnet offers a more principled, flexible technique for capturing neuronal diversity and thus implicitly enforcing regularization. This enables effective auto-tuning of network architecture and leads to smaller network sizes without hurting performance. Moreover, through its focus on diversity and neuron fusing, Divnet remains compatible with other procedures that seek to reduce memory footprints of networks. We present experimental results to corroborate our claims: for pruning neural networks, Divnet is seen to be notably superior to competing approaches.
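To give a feel for the selection step (a rough, self-contained stand-in, not the paper's procedure: Divnet samples from the DPP, while the hypothetical `greedy_diverse` below greedily maximizes the subset log-determinant of an activation-similarity kernel, which favors the same kind of non-redundant subsets):

```python
import numpy as np

rng = np.random.default_rng(1)

# Activations of 8 neurons on 50 inputs; neuron 7 is made nearly
# redundant with neuron 0 to mimic duplicated computation.
acts = rng.standard_normal((50, 8))
acts[:, 7] = acts[:, 0] + 0.01 * rng.standard_normal(50)

# DPP-style kernel over neurons from normalized activation similarity.
G = acts / np.linalg.norm(acts, axis=0)
L = G.T @ G

def greedy_diverse(L, k):
    """Greedily grow the subset S maximizing log det L[S, S]."""
    S = []
    for _ in range(k):
        best, best_val = None, -np.inf
        for i in range(L.shape[0]):
            if i in S:
                continue
            T = S + [i]
            sign, val = np.linalg.slogdet(L[np.ix_(T, T)])
            if sign > 0 and val > best_val:
                best, best_val = i, val
        S.append(best)
    return S

kept = greedy_diverse(L, 4)   # near-duplicates make the determinant
                              # collapse, so at most one of {0, 7} survives
```

Divnet additionally fuses each pruned neuron's outgoing weights into the kept subset, which this sketch omits.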

@inproceedings{mariet2016diversity,
  author    = {Zelda Mariet and Suvrit Sra},
  title     = {Diversity Networks: Neural Network Compression Using Determinantal Point Processes},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year      = {2016}
}

Outlier Detection in Heterogeneous Datasets using Automatic Tuple Expansion

C. Pit–Claudel, Z. Mariet, R. Harding, S. Madden
Tech report (2016) [PDF]

Rapidly developing areas of information technology are generating massive amounts of data. Human errors, sensor failures, and other unforeseen circumstances unfortunately tend to undermine the quality and consistency of these datasets by introducing outliers – data points that exhibit surprising behavior when compared to the rest of the data. Characterizing, locating, and in some cases eliminating these outliers offers interesting insight about the data under scrutiny and reinforces the confidence that one may have in conclusions drawn from otherwise noisy datasets. In this paper, we describe a tuple expansion procedure which reconstructs rich information from semantically poor SQL data types such as strings, integers, and floating point numbers. We then use this procedure as the foundation of a new user-guided outlier detection framework, dBoost, which relies on inference and statistical modeling of heterogeneous data to flag suspicious fields in database tuples. We show that this novel approach achieves good classification performance, both in traditional numerical datasets and in highly non-numerical contexts such as mostly textual datasets. Our implementation is publicly available, under version 3 of the GNU General Public License.
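The tuple-expansion idea can be conveyed with a toy sketch (hypothetical code, not the dBoost implementation: `expand` and `flag_outliers` are illustrative names): each semantically poor value expands into richer sub-features, and a simple per-feature Gaussian model then flags values whose features sit far from the column's mean.

```python
import statistics

def expand(value):
    """Expand a semantically poor value into richer sub-features."""
    if isinstance(value, str):
        return {"len": len(value),
                "digits": sum(c.isdigit() for c in value)}
    if isinstance(value, float):
        return {"int_part": int(value), "frac_part": value - int(value)}
    return {"value": value}

def flag_outliers(column, n_sigma=3.0):
    """Return indices whose expanded features deviate by > n_sigma
    standard deviations from the per-feature mean."""
    expanded = [expand(v) for v in column]
    flagged = set()
    for k in expanded[0]:
        xs = [e[k] for e in expanded]
        mu, sd = statistics.mean(xs), statistics.pstdev(xs)
        for i, x in enumerate(xs):
            if sd > 0 and abs(x - mu) > n_sigma * sd:
                flagged.add(i)
    return sorted(flagged)
```

For example, in a column of date strings, a stray word stands out on both the length and digit-count features even though every value has the same SQL type.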

@techreport{pitclaudel2016outlier,
  title  = {Outlier Detection in Heterogeneous Datasets using Automatic Tuple Expansion},
  author = {Pit--Claudel, Clément and Mariet, Zelda and Harding, Rachael and Madden, Sam},
  year   = {2016}
}

Fixed-point algorithms for learning determinantal point processes

Zelda Mariet, Suvrit Sra
Proceedings of the 32nd International Conference on Machine Learning (ICML 2015) [PDF] [arXiv]

Determinantal point processes (DPPs) offer an elegant tool for encoding probabilities over subsets of a ground set. Discrete DPPs are parametrized by a positive semidefinite matrix (called the DPP kernel), and estimating this kernel is key to learning DPPs from observed data. We consider the task of learning the DPP kernel, and develop for it a surprisingly simple yet effective new algorithm. Our algorithm offers the following benefits over previous approaches: (a) it is much simpler; (b) it yields equally good and sometimes even better local maxima; and (c) it runs an order of magnitude faster on large problems. We present experimental results on both real and simulated data to illustrate the numerical performance of our technique.

@inproceedings{mariet2015fixed,
  title = {Fixed-point algorithms for learning determinantal point processes},
  author = {Zelda Mariet and Suvrit Sra},
  booktitle = {Proceedings of the 32nd International Conference on Machine Learning},
  pages = {2389--2397},
  year = {2015},
  editor = {Francis Bach and David Blei},
  volume = {37},
  series = {Proceedings of Machine Learning Research},
  address = {Lille, France},
  month = {07--09 Jul},
  publisher = {PMLR}
}

function res = picard(L, sets, a)
% One step of the Picard fixed-point iteration for DPP kernel learning:
%   L <- L + a * L * (Delta - inv(I + L)) * L,
% where Delta weights the inverses of the observed submatrices and
% a is the step size.
    N = size(L, 1);
    I = eye(N, N);
    res = a * L * (pseudoinv(L, sets) - inv(I + L)) * L + L;
end

function theta = pseudoinv(L, T)
% T.Ys is the training set;
% T.Y_fracs(j) is how often the training sample T.Ys{j} appears in the
% training set.
    theta = zeros(size(L, 1));
    for j = 1:length(T.Ys)
        set = T.Ys{j};
        theta(set, set) = theta(set, set) + T.Y_fracs(j) * inv(L(set, set));
    end
end