EconBase is a hub for the econometrics community. It provides a continuously updated feed of econometrics papers from arXiv (econ.EM) and will soon feature reading groups, conferences, and collaborative projects.

Recent Research Updates

Automatically updated daily. Total papers in DB: 4609

No recent papers in the last 7 days.

Econometrics arXiv paper, submitted: 2025-10-20

Mixed LR-$C(\alpha)$-type tests for irregular hypotheses, general criterion functions and misspecified models

Authors: Jean-Marie Dufour, Purevdorj Tuvaandorj

This paper introduces a likelihood ratio (LR)-type test that possesses the
robustness properties of \(C(\alpha)\)-type procedures in an extremum
estimation setting.
The test statistic is constructed by applying separate adjustments to the
restricted and unrestricted criterion functions, and is shown to be
asymptotically pivotal under minimal conditions. It features two main
robustness properties. First, unlike standard LR-type statistics, its null
asymptotic distribution remains chi-square even under model misspecification,
where the information matrix equality fails. Second, it accommodates irregular
hypotheses involving constrained parameter spaces, such as boundary parameters,
relying solely on root-\(n\)-consistent estimators for nuisance parameters.
When the model is correctly specified, no boundary constraints are present, and
parameters are estimated by extremum estimators, the proposed test reduces to
the standard LR-type statistic.
Simulations with ARCH models, where volatility parameters are constrained to
be nonnegative, and parametric survival regressions with potentially monotone
increasing hazard functions, demonstrate that our test maintains accurate size
and exhibits good power. An empirical application to a two-way error components
model shows that the proposed test can provide more informative inference than
the conventional \(t\)-test.

arXiv link: http://arxiv.org/abs/2510.17070v1
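
A minimal Python sketch of the standard LR-type statistic that, per the
abstract, the proposed mixed LR-$C(\alpha)$ test reduces to under correct
specification, no boundary constraints, and extremum estimation. This is an
illustrative baseline only, not the authors' adjusted statistic.

    import numpy as np
    from scipy import stats

    def lr_test(loglik_unrestricted, loglik_restricted, df):
        """Standard LR-type statistic with a chi-square reference
        distribution (baseline only; the paper's test applies separate
        adjustments to the restricted and unrestricted criterion
        functions)."""
        stat = 2.0 * (loglik_unrestricted - loglik_restricted)
        pval = stats.chi2.sf(stat, df)
        return stat, pval

    # toy usage: testing two restrictions
    stat, pval = lr_test(loglik_unrestricted=-480.2,
                         loglik_restricted=-484.9, df=2)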

Econometrics arXiv paper, submitted: 2025-10-19

Equilibrium-Constrained Estimation of Recursive Logit Choice Models

Authors: Hung Tran, Tien Mai, Minh Hoang Ha

The recursive logit (RL) model provides a flexible framework for modeling
sequential decision-making in transportation and choice networks, with
important applications in route choice analysis, multiple discrete choice
problems, and activity-based travel demand modeling. Despite its versatility,
estimation of the RL model typically relies on nested fixed-point (NFXP)
algorithms that are computationally expensive and prone to numerical
instability. We propose a new approach that reformulates the maximum likelihood
estimation problem as an optimization problem with equilibrium constraints,
where both the structural parameters and the value functions are treated as
decision variables. We further show that this formulation can be equivalently
transformed into a conic optimization problem with exponential cones, enabling
efficient solution using modern conic solvers such as MOSEK. Experiments on
synthetic and real-world datasets demonstrate that our convex reformulation
achieves accuracy comparable to traditional methods while offering significant
improvements in computational stability and efficiency, thereby providing a
practical and scalable alternative for recursive logit model estimation.

arXiv link: http://arxiv.org/abs/2510.16886v1
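
The nested fixed point that the paper's equilibrium-constrained reformulation
avoids is the recursive logit value-function recursion. A hedged Python sketch
of that traditional inner step, assuming a single destination and an
instantaneous-utility array util[k, a] with -inf marking infeasible arcs (all
names are illustrative, not from the paper's code):

    import numpy as np
    from scipy.special import logsumexp

    def rl_value_functions(util, next_state, dest, tol=1e-10, max_iter=5000):
        """Fixed point V(k) = log sum_a exp(u(k,a) + V(next_state(k,a))),
        with V(dest) = 0. The paper replaces this nested iteration by treating
        V as decision variables in an exponential-cone program; this sketch
        only shows the classical NFXP inner step."""
        n_states, _ = util.shape
        V = np.zeros(n_states)
        for _ in range(max_iter):
            V_new = logsumexp(util + V[next_state], axis=1)
            V_new[dest] = 0.0  # absorbing destination normalized to zero
            if np.max(np.abs(V_new - V)) < tol:
                break
            V = V_new
        return V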

Econometrics arXiv paper, submitted: 2025-10-19

Local Overidentification and Efficiency Gains in Modern Causal Inference and Data Combination

Authors: Xiaohong Chen, Haitian Xie

This paper studies nonparametric local (over-)identification, in the sense of
Chen and Santos (2018), and the associated semiparametric efficiency in modern
causal frameworks. We develop a unified approach that begins by translating
structural models with latent variables into their induced statistical models
of observables and then analyzes local overidentification through conditional
moment restrictions. We apply this approach to three leading models: (i) the
general treatment model under unconfoundedness, (ii) the negative control
model, and (iii) the long-term causal inference model under unobserved
confounding. The first design yields a locally just-identified statistical
model, implying that all regular asymptotically linear estimators of the
treatment effect share the same asymptotic variance, equal to the (trivial)
semiparametric efficiency bound. In contrast, the latter two models involve
nonparametric endogeneity and are naturally locally overidentified;
consequently, some doubly robust orthogonal moment estimators of the average
treatment effect are inefficient. Whereas existing work typically imposes
strong conditions to restore just-identification before deriving the efficiency
bound, we relax such assumptions and characterize the general efficiency bound,
along with efficient estimators, in the overidentified models (ii) and (iii).

arXiv link: http://arxiv.org/abs/2510.16683v1

Econometrics arXiv paper, submitted: 2025-10-19

On Quantile Treatment Effects, Rank Similarity, and Variation of Instrumental Variables

Authors: Sukjin Han, Haiqing Xu

This paper develops a nonparametric framework to identify and estimate
distributional treatment effects under nonseparable endogeneity. We begin by
revisiting the widely adopted rank similarity (RS) assumption and
characterizing it by the relationship it imposes between observed and
counterfactual potential outcome distributions. The characterization highlights
the restrictiveness of RS, motivating a weaker identifying condition. Under
this alternative, we construct identifying bounds on the distributional
treatment effects of interest through a linear semi-infinite programming (SILP)
formulation. Our identification strategy also clarifies how richer exogenous
instrument variation, such as multi-valued or multiple instruments, can further
tighten these bounds. Finally, exploiting the SILP's saddle-point structure and
Karush-Kuhn-Tucker (KKT) conditions, we establish large-sample properties for
the empirical SILP: consistency and asymptotic distribution results for the
estimated bounds and associated solutions.

arXiv link: http://arxiv.org/abs/2510.16681v1

Econometrics arXiv paper, submitted: 2025-10-18

Causal Inference in High-Dimensional Generalized Linear Models with Binary Outcomes

Authors: Jing Kong

This paper proposes a debiased estimator for causal effects in
high-dimensional generalized linear models with binary outcomes and general
link functions. The estimator augments a regularized regression plug-in with
weights computed from a convex optimization problem that approximately balances
link-derivative-weighted covariates and controls variance; it does not rely on
estimated propensity scores. Under standard conditions, the estimator is
root-$n$-consistent and asymptotically normal for dense linear contrasts and
causal parameters. Simulation results show the superior performance of our
approach in comparison to alternatives such as inverse propensity score
estimators and double machine learning estimators in finite samples. In an
application to the National Supported Work training data, our estimates and
confidence intervals are close to the experimental benchmark.

arXiv link: http://arxiv.org/abs/2510.16669v1
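
A hedged sketch of the generic convex balancing-weights idea the abstract
describes. The paper's program balances link-derivative-weighted covariates
and controls variance explicitly; this simplified version only approximates
that logic and assumes cvxpy as the convex solver interface.

    import numpy as np
    import cvxpy as cp

    def balancing_weights(X_control, target_means, delta=0.05):
        """Minimum-variance-proxy weights subject to approximate balance of
        (possibly link-derivative-weighted) covariates within tolerance
        delta. Illustrative only, not the paper's exact program."""
        n, _ = X_control.shape
        w = cp.Variable(n, nonneg=True)
        objective = cp.Minimize(cp.sum_squares(w))          # variance control
        constraints = [cp.sum(w) == 1,
                       cp.abs(X_control.T @ w - target_means) <= delta]
        cp.Problem(objective, constraints).solve()
        return w.value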

Econometrics arXiv paper, submitted: 2025-10-18

On the Asymptotics of the Minimax Linear Estimator

Authors: Jing Kong

Many causal estimands, such as average treatment effects under
unconfoundedness, can be written as continuous linear functionals of an unknown
regression function. We study a weighting estimator that sets weights by a
minimax procedure: solving a convex optimization problem that trades off
worst-case conditional bias against variance. Despite its growing use, general
root-$n$ theory for this method has been limited. This paper fills that gap.
Under regularity conditions, we show that the minimax linear estimator is
root-$n$ consistent and asymptotically normal, and we derive its asymptotic
variance. These results justify ignoring worst-case bias when forming
large-sample confidence intervals and make inference less sensitive to the
scaling of the function class. With a mild variance condition, the estimator
attains the semiparametric efficiency bound, so an augmentation step commonly
used in the literature is not needed to achieve first-order optimality.
Evidence from simulations and three empirical applications, including
job-training and minimum-wage policies, points to a simple rule: in designs
satisfying our regularity conditions, standard-error confidence intervals
suffice; otherwise, bias-aware intervals remain important.

arXiv link: http://arxiv.org/abs/2510.16661v1

Econometrics arXiv cross-link from Statistics – Machine Learning (stat.ML), submitted: 2025-10-18

From Reviews to Actionable Insights: An LLM-Based Approach for Attribute and Feature Extraction

Authors: Khaled Boughanmi, Kamel Jedidi, Nour Jedidi

This research proposes a systematic, large language model (LLM) approach for
extracting product and service attributes, features, and associated sentiments
from customer reviews. Grounded in marketing theory, the framework
distinguishes perceptual attributes from actionable features, producing
interpretable and managerially actionable insights. We apply the methodology to
20,000 Yelp reviews of Starbucks stores and evaluate eight prompt variants on a
random subset of reviews. Model performance is assessed through agreement with
human annotations and predictive validity for customer ratings. Results show
high consistency between LLMs and human coders and strong predictive validity,
confirming the reliability of the approach. Human coders required a median of
six minutes per review, whereas the LLM processed each in two seconds,
delivering comparable insights at a scale unattainable through manual coding.
Managerially, the analysis identifies attributes and features that most
strongly influence customer satisfaction and their associated sentiments,
enabling firms to pinpoint "joy points," address "pain points," and design
targeted interventions. We demonstrate how structured review data can power an
actionable marketing dashboard that tracks sentiment over time and across
stores, benchmarks performance, and highlights high-leverage features for
improvement. Simulations indicate that enhancing sentiment for key service
features could yield 1-2% average revenue gains per store.

arXiv link: http://arxiv.org/abs/2510.16551v1

Econometrics arXiv paper, submitted: 2025-10-17

Prediction Intervals for Model Averaging

Authors: Zhongjun Qu, Wendun Wang, Xiaomeng Zhang

A rich set of frequentist model averaging methods has been developed, but
their applications have largely been limited to point prediction, as measuring
prediction uncertainty in general settings remains an open problem. In this
paper we propose prediction intervals for model averaging based on conformal
inference. These intervals cover out-of-sample realizations of the outcome
variable with a pre-specified probability, providing a way to assess predictive
uncertainty beyond point prediction. The framework allows general model
misspecification and applies to averaging across multiple models that can be
nested, disjoint, overlapping, or any combination thereof, with weights that
may depend on the estimation sample. We establish coverage guarantees under two
sets of assumptions: exact finite-sample validity under exchangeability,
relevant for cross-sectional data, and asymptotic validity under stationarity,
relevant for time-series data. We first present a benchmark algorithm and then
introduce a locally adaptive refinement and split-sample procedures that
broaden applicability. The methods are illustrated with a cross-sectional
application to real estate appraisal and a time-series application to equity
premium forecasting.

arXiv link: http://arxiv.org/abs/2510.16224v1
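
A minimal sketch of a split-conformal interval around a fixed-weight model
average, using two illustrative scikit-learn models. The paper's benchmark
algorithm, locally adaptive refinement, and sample-dependent weights are not
reproduced here.

    import numpy as np
    from sklearn.linear_model import LinearRegression, Ridge

    def split_conformal_ma_interval(X_train, y_train, X_cal, y_cal, x_new,
                                    weights=(0.5, 0.5), alpha=0.1):
        # fit candidate models on the proper training split
        models = [LinearRegression().fit(X_train, y_train),
                  Ridge(alpha=1.0).fit(X_train, y_train)]

        def ma_predict(X):
            return sum(w * m.predict(X) for w, m in zip(weights, models))

        # conformity scores on the calibration split
        resid = np.sort(np.abs(y_cal - ma_predict(X_cal)))
        n = len(resid)
        k = min(n - 1, int(np.ceil((n + 1) * (1 - alpha))) - 1)
        q = resid[k]                       # finite-sample conformal quantile
        pred = ma_predict(np.atleast_2d(x_new))
        return pred - q, pred + q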

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2025-10-17

Learning Correlated Reward Models: Statistical Barriers and Opportunities

Authors: Yeshwanth Cherapanamjeri, Constantinos Daskalakis, Gabriele Farina, Sobhan Mohammadpour

Random Utility Models (RUMs) are a classical framework for modeling user
preferences and play a key role in reward modeling for Reinforcement Learning
from Human Feedback (RLHF). However, a crucial shortcoming of many of these
techniques is the Independence of Irrelevant Alternatives (IIA) assumption,
which collapses all human preferences to a universal underlying utility
function, yielding a coarse approximation of the range of human preferences. On
the other hand, statistical and computational guarantees for models avoiding
this assumption are scarce. In this paper, we investigate the statistical and
computational challenges of learning a correlated probit model, a
fundamental RUM that avoids the IIA assumption. First, we establish that the
classical data collection paradigm of pairwise preference data is
fundamentally insufficient to learn correlational information,
explaining the lack of statistical and computational guarantees in this
setting. Next, we demonstrate that best-of-three preference data
provably overcomes these shortcomings, and devise a statistically and
computationally efficient estimator with near-optimal performance. These
results highlight the benefits of higher-order preference data in learning
correlated utilities, allowing for more fine-grained modeling of human
preferences. Finally, we validate these theoretical guarantees on several
real-world datasets, demonstrating improved personalization of human
preferences.

arXiv link: http://arxiv.org/abs/2510.15839v1

Econometrics arXiv updated paper (originally submitted: 2025-10-17)

Dynamic Spatial Treatment Effects as Continuous Functionals: Theory and Evidence from Healthcare Access

Authors: Tatsuru Kikuchi

I develop a continuous functional framework for spatial treatment effects
grounded in Navier-Stokes partial differential equations. Rather than discrete
treatment parameters, the framework characterizes treatment intensity as
continuous functions $\tau(x, t)$ over space-time, enabling rigorous
analysis of boundary evolution, spatial gradients, and cumulative exposure.
Empirical validation using 32,520 U.S. ZIP codes demonstrates exponential
spatial decay for healthcare access ($\kappa = 0.002837$ per km, $R^2 =
0.0129$) with detectable boundaries at 37.1 km. The framework successfully
diagnoses when scope conditions hold: positive decay parameters validate
diffusion assumptions near hospitals, while negative parameters correctly
signal urban confounding effects. Heterogeneity analysis reveals 2-13 $\times$
stronger distance effects for elderly populations and substantial education
gradients. Model selection strongly favors logarithmic decay over exponential
($\Delta AIC > 10,000$), representing a middle ground between
exponential and power-law decay. Applications span environmental economics,
banking, and healthcare policy. The continuous functional framework provides
predictive capability ($d^*(t) = \xi^* t$), parameter sensitivity
($\partial d^*/\partial \nu$), and diagnostic tests unavailable in traditional
difference-in-differences approaches.

arXiv link: http://arxiv.org/abs/2510.15324v2
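
One way a spatial decay rate and a detection boundary of the kind reported
above might be computed is a log-linear fit of treatment intensity on
distance. A hedged sketch; the paper's continuous functional estimator and its
threshold choice may differ.

    import numpy as np

    def exponential_decay_fit(distance_km, effect, detection_threshold):
        """Regress log(effect) on distance to obtain a decay rate kappa and
        the distance d_star at which the fitted effect falls to a chosen
        detection threshold (illustrative only)."""
        mask = effect > 0                  # log requires positive effects
        slope, intercept = np.polyfit(distance_km[mask],
                                      np.log(effect[mask]), 1)
        kappa = -slope                     # effect ~ exp(intercept - kappa*d)
        d_star = (intercept - np.log(detection_threshold)) / kappa
        return kappa, d_star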

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2025-10-16

Regression Model Selection Under General Conditions

Authors: Amaze Lusompa

Model selection criteria are one of the most important tools in statistics.
Proofs showing a model selection criterion is asymptotically optimal are
tailored to the type of model (linear regression, quantile regression,
penalized regression, etc.), the estimation method (linear smoothers, maximum
likelihood, generalized method of moments, etc.), the type of data (i.i.d.,
dependent, high dimensional, etc.), and the type of model selection criterion.
Moreover, assumptions are often restrictive and unrealistic making it a slow
and winding process for researchers to determine if a model selection criterion
is selecting an optimal model. This paper provides general proofs showing
asymptotic optimality for a wide range of model selection criteria under
general conditions. This paper not only asymptotically justifies model
selection criteria for most situations, but it also unifies and extends a range
of previously disparate results.

arXiv link: http://arxiv.org/abs/2510.14822v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-10-16

Evaluating Policy Effects under Network Interference without Network Information: A Transfer Learning Approach

Authors: Tadao Hoshino

This paper develops a sensitivity analysis framework that transfers the
average total treatment effect (ATTE) from source data with a fully observed
network to target data whose network is completely unknown. The ATTE represents
the average social impact of a policy that assigns the treatment to every
individual in the dataset. We postulate a covariate-shift type assumption that
both source and target datasets share the same conditional mean outcome.
However, because the target network is unobserved, this assumption alone is not
sufficient to pin down the ATTE for the target data. To address this issue, we
consider a sensitivity analysis based on the uncertainty of the target
network's degree distribution, where the extent of uncertainty is measured by
the Wasserstein distance from a given reference degree distribution. We then
construct bounds on the target ATTE using a linear programming-based estimator.
The limiting distribution of the bound estimator is derived via the functional
delta method, and we develop a wild bootstrap approach to approximate the
distribution. As an empirical illustration, we revisit the social network
experiment on farmers' weather insurance adoption in China by Cai et al.
(2015).

arXiv link: http://arxiv.org/abs/2510.14415v1

Econometrics arXiv updated paper (originally submitted: 2025-10-16)

Dynamic Spatial Treatment Effect Boundaries: A Continuous Functional Framework from Navier-Stokes Equations

Authors: Tatsuru Kikuchi

I develop a comprehensive theoretical framework for dynamic spatial treatment
effect boundaries using continuous functional definitions grounded in
Navier-Stokes partial differential equations. Rather than discrete treatment
effect estimators, the framework characterizes treatment intensity as a
continuous function $\tau(x, t)$ over space-time, enabling rigorous
analysis of propagation dynamics, boundary evolution, and cumulative exposure
patterns. Building on exact self-similar solutions expressible through Kummer
confluent hypergeometric and modified Bessel functions, I establish that
treatment effects follow scaling laws $\tau(d, t) = t^{-\alpha} f(d/t^\beta)$
where exponents characterize diffusion mechanisms. Empirical validation using
42 million TROPOMI satellite observations of NO$_2$ pollution from U.S.
coal-fired power plants demonstrates strong exponential spatial decay
($\kappa_s = 0.004$ per km, $R^2 = 0.35$) with detectable boundaries at 572 km.
Monte Carlo simulations confirm superior performance over discrete parametric
methods in boundary detection and false positive avoidance (94% vs 27%
correct rejection). Regional heterogeneity analysis validates diagnostic
capability: positive decay parameters within 100 km confirm coal plant
dominance; negative parameters beyond 100 km correctly signal when urban
sources dominate. The continuous functional perspective unifies spatial
econometrics with mathematical physics, providing theoretically grounded
methods for boundary detection, exposure quantification, and policy evaluation
across environmental economics, banking, and healthcare applications.

arXiv link: http://arxiv.org/abs/2510.14409v2

Econometrics arXiv paper, submitted: 2025-10-16

Debiased Kernel Estimation of Spot Volatility in the Presence of Infinite Variation Jumps

Authors: B. Cooper Boniece, José E. Figueroa-López, Tianwei Zhou

Volatility estimation is a central problem in financial econometrics, but
becomes particularly challenging when jump activity is high, a phenomenon
observed empirically in highly traded financial securities. In this paper, we
revisit the problem of spot volatility estimation for an Itô semimartingale
with jumps of unbounded variation. We construct truncated kernel-based
estimators and debiased variants that extend the efficiency frontier for spot
volatility estimation in terms of the jump activity index $Y$, raising the
previous bound $Y<4/3$ to $Y<20/11$, thereby covering nearly the entire
admissible range $Y<2$. Compared with earlier work, our approach attains
smaller asymptotic variances through the use of unbounded kernels, is simpler
to implement, and has broader applicability under more flexible model
assumptions. A comprehensive simulation study confirms that our procedures
substantially outperform competing methods in finite samples.

arXiv link: http://arxiv.org/abs/2510.14285v1
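
For intuition, a basic (non-debiased) truncated kernel spot volatility
estimator, assuming roughly equally spaced observations and a Gaussian kernel;
the paper's debiased variants and unbounded kernels are not reproduced here,
and the truncation constants are user-chosen.

    import numpy as np

    def truncated_kernel_spot_vol(X, times, t0, bandwidth, trunc_const=4.0,
                                  trunc_power=0.49):
        """Kernel-weight squared increments near t0 and discard increments
        larger than u_n = c * dt^w, which filters out (most) jumps.
        Illustrative sketch only."""
        dX = np.diff(X)
        dt = np.diff(times)
        mid = times[:-1]
        u_n = trunc_const * np.mean(dt) ** trunc_power   # truncation level
        keep = np.abs(dX) <= u_n
        K = np.exp(-0.5 * ((mid - t0) / bandwidth) ** 2) / np.sqrt(2 * np.pi)
        return np.sum(K * (dX ** 2) * keep) / bandwidth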

Econometrics arXiv updated paper (originally submitted: 2025-10-15)

Nonparametric Identification of Spatial Treatment Effect Boundaries: Evidence from Bank Branch Consolidation

Authors: Tatsuru Kikuchi

I develop a nonparametric framework for identifying spatial boundaries of
treatment effects without imposing parametric functional form restrictions. The
method employs local linear regression with data-driven bandwidth selection to
flexibly estimate spatial decay patterns and detect treatment effect
boundaries. Monte Carlo simulations demonstrate that the nonparametric approach
exhibits lower bias and correctly identifies the absence of boundaries when
none exist, unlike parametric methods that may impose spurious spatial
patterns. I apply this framework to bank branch openings during 2015--2020,
matching 5,743 new branches to 5.9 million mortgage applications across 14,209
census tracts. The analysis reveals that branch proximity significantly affects
loan application volume (8.5% decline per 10 miles) but not approval rates,
consistent with branches stimulating demand through local presence while credit
decisions remain centralized. Examining branch survival during the digital
transformation era (2010--2023), I find a non-monotonic relationship with area
income: high-income areas experience more closures despite conventional wisdom.
This counterintuitive pattern reflects strategic consolidation of redundant
branches in over-banked wealthy urban areas rather than discrimination against
poor neighborhoods. Controlling for branch density, urbanization, and
competition, the direct income effect diminishes substantially, with branch
density emerging as the primary determinant of survival. These findings
demonstrate the necessity of flexible nonparametric methods for detecting
complex spatial patterns that parametric models would miss, and challenge
simplistic narratives about banking deserts by revealing the organizational
complexity underlying spatial consolidation decisions.

arXiv link: http://arxiv.org/abs/2510.13148v2
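
A hedged sketch of the generic building block: local linear regression of an
outcome on distance with cross-validated bandwidth, here via statsmodels. The
boundary-detection rule the paper applies on top of such a fit is not
reproduced.

    import numpy as np
    from statsmodels.nonparametric.kernel_regression import KernelReg

    def spatial_decay_local_linear(distance, outcome, grid):
        """Local linear fit with data-driven (least-squares cross-validated)
        bandwidth, evaluated on a distance grid (illustrative only)."""
        kr = KernelReg(endog=outcome, exog=distance, var_type="c",
                       reg_type="ll", bw="cv_ls")
        fitted, _ = kr.fit(grid)
        return fitted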

Econometrics arXiv paper, submitted: 2025-10-14

Beyond Returns: A Candlestick-Based Approach to Spot Covariance Estimation

Authors: Yasin Simsek

Spot covariance estimation is commonly based on high-frequency open-to-close
return data over short time windows, but such approaches face a trade-off
between statistical accuracy and localization. In this paper, I introduce a new
estimation framework using high-frequency candlestick data, which include open,
high, low, and close prices, effectively addressing this trade-off. By
exploiting the information contained in candlesticks, the proposed method
improves estimation accuracy relative to benchmarks while preserving local
structure. I further develop a test for spot covariance inference based on
candlesticks that demonstrates reasonable size control and a notable increase
in power, particularly in small samples. Motivated by recent work in the
finance literature, I empirically test the market neutrality of the iShares
Bitcoin Trust ETF (IBIT) using 1-minute candlestick data for the full year of
2024. The results show systematic deviations from market neutrality, especially
in periods of market stress. An event study around FOMC announcements further
illustrates the new method's ability to detect subtle shifts in response to
relatively mild information events.

arXiv link: http://arxiv.org/abs/2510.12911v1
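
As a point of reference for how OHLC candles add information beyond
close-to-close returns, here is the classical Garman-Klass range-based
variance computed bar by bar and averaged locally as a crude spot proxy. This
is explicitly not the paper's estimator or inference procedure.

    import numpy as np

    def garman_klass_spot_variance(open_, high, low, close, window=30):
        """Per-bar Garman-Klass variance from OHLC data, smoothed with a
        rolling mean over the last `window` bars (illustrative only)."""
        gk = 0.5 * np.log(high / low) ** 2 \
             - (2.0 * np.log(2.0) - 1.0) * np.log(close / open_) ** 2
        kernel = np.ones(window) / window
        return np.convolve(gk, kernel, mode="valid")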

Econometrics arXiv updated paper (originally submitted: 2025-10-14)

Nonparametric Identification and Estimation of Spatial Treatment Effect Boundaries: Evidence from 42 Million Pollution Observations

Authors: Tatsuru Kikuchi

This paper develops a nonparametric framework for identifying and estimating
spatial boundaries of treatment effects in settings with geographic spillovers.
While atmospheric dispersion theory predicts exponential decay of pollution
under idealized assumptions, these assumptions -- steady winds, homogeneous
atmospheres, flat terrain -- are systematically violated in practice. I
establish nonparametric identification of spatial boundaries under weak
smoothness and monotonicity conditions, propose a kernel-based estimator with
data-driven bandwidth selection, and derive asymptotic theory for inference.
Using 42 million satellite observations of NO$_2$ concentrations near coal
plants (2019-2021), I find that nonparametric kernel regression reduces
prediction errors by 1.0 percentage point on average compared to parametric
exponential decay assumptions, with largest improvements at policy-relevant
distances: 2.8 percentage points at 10 km (near-source impacts) and 3.7
percentage points at 100 km (long-range transport). Parametric methods
systematically underestimate near-source concentrations while overestimating
long-range decay. The COVID-19 pandemic provides a natural experiment
validating the framework's temporal sensitivity: NO$_2$ concentrations dropped
4.6% in 2020, then recovered 5.7% in 2021. These results demonstrate that
flexible, data-driven spatial methods substantially outperform restrictive
parametric assumptions in environmental policy applications.

arXiv link: http://arxiv.org/abs/2510.12289v2

Econometrics arXiv paper, submitted: 2025-10-14

Optimal break tests for large linear time series models

Authors: Abhimanyu Gupta, Myung Hwan Seo

We develop a class of optimal tests for a structural break occurring at an
unknown date in infinite and growing-order time series regression models, such
as AR($\infty$), linear regression with increasingly many covariates, and
nonparametric regression. Under an auxiliary i.i.d. Gaussian error assumption,
we derive an average power optimal test, establishing a growing-dimensional
analog of the exponential tests of Andrews and Ploberger (1994) to handle
identification failure under the null hypothesis of no break. Relaxing the
i.i.d. Gaussian assumption to a more general dependence structure, we establish
a functional central limit theorem for the underlying stochastic processes,
which features an extra high-order serial dependence term due to the growing
dimension. We robustify our test both against this term and finite sample bias
and illustrate its excellent performance and practical relevance in a Monte
Carlo study and a real data empirical example.

arXiv link: http://arxiv.org/abs/2510.12262v1

Econometrics arXiv paper, submitted: 2025-10-14

L2-relaxation for Economic Prediction

Authors: Zhentao Shi, Yishu Wang

We leverage an ensemble of many regressors, the number of which can exceed
the sample size, for economic prediction. An underlying latent factor structure
implies a dense regression model with highly correlated covariates. We propose
the L2-relaxation method for estimating the regression coefficients and
extrapolating the out-of-sample (OOS) outcomes. This framework can be applied
to policy evaluation using the panel data approach (PDA), where we further
establish inference for the average treatment effect. In addition, we extend
the traditional single unit setting in PDA to allow for many treated units with
a short post-treatment period. Monte Carlo simulations demonstrate that our
approach exhibits excellent finite sample performance for both OOS prediction
and policy evaluation. We illustrate our method with two empirical examples:
(i) predicting China's producer price index growth rate and evaluating the
effect of real estate regulations, and (ii) estimating the impact of Brexit on
the stock returns of British and European companies.

arXiv link: http://arxiv.org/abs/2510.12183v1
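
A hedged sketch in the spirit of an L2-relaxation: choose the minimum-norm
coefficient vector whose least-squares normal equations hold up to a sup-norm
tolerance. The authors' exact program, tuning rule, and the PDA-specific
constraints may differ; cvxpy is assumed for the convex solver interface.

    import numpy as np
    import cvxpy as cp

    def l2_relaxation(X, y, tau):
        """Minimize 0.5*||w||^2 subject to a sup-norm bound on the sample
        normal-equations residual (illustrative only)."""
        n, p = X.shape
        w = cp.Variable(p)
        constraints = [cp.norm(X.T @ (y - X @ w) / n, "inf") <= tau]
        cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)), constraints).solve()
        return w.value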

Econometrics arXiv paper, submitted: 2025-10-13

Estimating Variances for Causal Panel Data Estimators

Authors: Alexander Almeida, Susan Athey, Guido Imbens, Eva Lestant, Alexia Olaizola

This paper studies variance estimators in panel data settings. There has been
a recent surge in research on panel data models with a number of new estimators
proposed. However, there has been less attention paid to the quantification of
the precision of these estimators. Of the variance estimators that have been
proposed, their relative merits are not well understood. In this paper we
develop a common framework for comparing some of the proposed variance
estimators for generic point estimators. We reinterpret three commonly used
approaches as targeting different conditional variances under an
exchangeability assumption. We find that the estimators we consider are all
valid on average, but that their performance in terms of power differs
substantially depending on the heteroskedasticity structure of the data.
Building on these insights, we propose a new variance estimator that flexibly
accounts for heteroskedasticity in both the unit and time dimensions, and
delivers superior statistical power in realistic panel data settings.

arXiv link: http://arxiv.org/abs/2510.11841v1

Econometrics arXiv paper, submitted: 2025-10-13

Compositional difference-in-differences for categorical outcomes

Authors: Onil Boussim

In difference-in-differences (DiD) settings with categorical outcomes,
treatment effects often operate on both total quantities (e.g., voter turnout)
and category shares (e.g., vote distribution across parties). In this context,
linear DiD models can be problematic: they suffer from scale dependence, may
produce negative counterfactual quantities, and are inconsistent with discrete
choice theory. We propose compositional DiD (CoDiD), a new method that
identifies counterfactual categorical quantities, and thus total levels and
shares, under a parallel growths assumption. The assumption states that, absent
treatment, each category's size grows or shrinks at the same proportional rate
in treated and control groups. In a random utility framework, we show that this
implies parallel evolution of relative preferences between any pair of
categories. Analytically, we show that it also means the shares are reallocated
in the same way in both groups in the absence of treatment. Finally,
geometrically, it corresponds to parallel trajectories (or movements) of
probability mass functions of the two groups in the probability simplex under
Aitchison geometry. We extend CoDiD to i) derive bounds under relaxed
assumptions, ii) handle staggered adoption, and iii) propose a synthetic DiD
analog. We illustrate the method's empirical relevance through two
applications: first, we examine how early voting reforms affect voter choice in
U.S. presidential elections; second, we analyze how the Regional Greenhouse Gas
Initiative (RGGI) affected the composition of electricity generation across
sources such as coal, natural gas, nuclear, and renewables.

arXiv link: http://arxiv.org/abs/2510.11659v1
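
The parallel growths counterfactual described in the abstract has a direct
numerical form: scale each treated pre-period category quantity by the control
group's category-specific growth factor. A minimal illustration with toy
numbers (not from the paper's applications).

    import numpy as np

    def codid_counterfactual(q_treat_pre, q_ctrl_pre, q_ctrl_post):
        """Counterfactual treated post-period quantities under parallel
        growths, plus the implied counterfactual shares (sketch only)."""
        growth = q_ctrl_post / q_ctrl_pre        # category-specific growth
        q_cf = q_treat_pre * growth              # counterfactual quantities
        shares_cf = q_cf / q_cf.sum()            # counterfactual shares
        return q_cf, shares_cf

    # toy usage: three categories (e.g., vote counts for three parties)
    q_cf, s_cf = codid_counterfactual(np.array([120., 80., 50.]),
                                      np.array([100., 90., 60.]),
                                      np.array([110., 81., 72.]))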

Econometrics arXiv cross-link from Computer Science – Computational Engineering, Finance, and Science (cs.CE), submitted: 2025-10-13

A mathematical model for pricing perishable goods for quick-commerce applications

Authors: Milon Bhattacharya

Quick commerce (q-commerce) is one of the fastest-growing sectors in India.
It provides informal employment to approximately 450,000 workers and is
estimated to become a USD 200 billion industry by 2026. A significant portion
of this industry deals with perishable goods (e.g., milk, dosa batter): food
items that consumers buy relatively fresh, so order volumes are high and
repetitive even when the average basket size is small. The fundamental
challenge for the retailer is that raising the selling price hampers sales and
leads to unsold inventory, while setting the price too low forgoes potential
revenue. This paper proposes a mathematical model that formalizes this
dilemma. The problem is important not only for improving the unit economics of
perennially loss-making quick-commerce firms, but also for the trickle-down
effect on the conditions of gig workers observed in [4]. The sections below
describe the mathematical formulation; results from the simulation will be
published in a follow-up study.

arXiv link: http://arxiv.org/abs/2510.11360v1

Econometrics arXiv paper, submitted: 2025-10-13

Superstars or Super-Villains? Productivity Spillovers and Firm Dynamics in Indonesia

Authors: Mohammad Zeqi Yasin

Do industrial "superstars" help others up or crowd them out? We examine the
relationship between the spillovers of superstar firms (those with the top
market share in their industry) and the productivity dynamics in Indonesia.
Employing data on Indonesian manufacturing firms from 2001 to 2015, we find
that superstar exposures in the market raise both the productivity level and
the growth of non-superstar firms through horizontal (within a sector-province)
and vertical (across sectors) channels. When we distinguish by ownership,
foreign superstars consistently encourage productivity except through the
horizontal channel. In contrast, domestic superstars generate positive
spillovers through both horizontal and vertical linkages, indicating that
foreign firms do not solely drive positive externalities. Furthermore, despite
overall productivity growth being positive in 2001-2015, the source of negative
growth is mainly driven by within-group reallocation, evidence of misallocation
among surviving firms, notably by domestic superstars. Although Indonesian
superstar firms are more efficient in their operations, their relatively modest
growth rates suggest a potential stagnation, which can be plausibly attributed
to limited innovation activity or a slow pace of adopting new technologies.

arXiv link: http://arxiv.org/abs/2510.11139v1

Econometrics arXiv paper, submitted: 2025-10-13

Spatial and Temporal Boundaries in Difference-in-Differences: A Framework from Navier-Stokes Equation

Authors: Tatsuru Kikuchi

This paper develops a unified framework for identifying spatial and temporal
boundaries of treatment effects in difference-in-differences designs. Starting
from fundamental fluid dynamics equations (Navier-Stokes), we derive conditions
under which treatment effects decay exponentially in space and time, enabling
researchers to calculate explicit boundaries beyond which effects become
undetectable. The framework encompasses both linear (pure diffusion) and
nonlinear (advection-diffusion with chemical reactions) regimes, with testable
scope conditions based on dimensionless numbers from physics (Péclet and
Reynolds numbers). We demonstrate the framework's diagnostic capability using
air pollution from coal-fired power plants. Analyzing 791 ground-based
PM$_{2.5}$ monitors and 189,564 satellite-based NO$_2$ grid cells in the
Western United States over 2019-2021, we find striking regional heterogeneity:
within 100 km of coal plants, both pollutants show positive spatial decay
(PM$_{2.5}$: $\kappa_s = 0.00200$, $d^* = 1,153$ km; NO$_2$: $\kappa_s =
0.00112$, $d^* = 2,062$ km), validating the framework. Beyond 100 km, negative
decay parameters correctly signal that urban sources dominate and diffusion
assumptions fail. Ground-level PM$_{2.5}$ decays approximately twice as fast as
satellite column NO$_2$, consistent with atmospheric transport physics. The
framework successfully diagnoses its own validity in four of eight analyzed
regions, providing researchers with physics-based tools to assess whether their
spatial difference-in-differences setting satisfies diffusion assumptions
before applying the estimator. Our results demonstrate that rigorous boundary
detection requires both theoretical derivation from first principles and
empirical validation of underlying physical assumptions.

arXiv link: http://arxiv.org/abs/2510.11013v1

Econometrics arXiv paper, submitted: 2025-10-13

Macroeconomic Forecasting and Machine Learning

Authors: Ta-Chung Chi, Ting-Han Fan, Raffaele M. Ghigliazza, Domenico Giannone, Zixuan Wang

We forecast the full conditional distribution of macroeconomic outcomes by
systematically integrating three key principles: using high-dimensional data
with appropriate regularization, adopting rigorous out-of-sample validation
procedures, and incorporating nonlinearities. By exploiting the rich
information embedded in a large set of macroeconomic and financial predictors,
we produce accurate predictions of the entire profile of macroeconomic risk in
real time. Our findings show that regularization via shrinkage is essential to
control model complexity, while introducing nonlinearities yields limited
improvements in predictive accuracy. Out-of-sample validation plays a critical
role in selecting model architecture and preventing overfitting.

arXiv link: http://arxiv.org/abs/2510.11008v1
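
One simple way to produce a shrinkage-regularized forecast of the conditional
distribution is a penalized quantile regression fitted per quantile level; a
hedged sketch with scikit-learn. The paper's model set, nonlinear
specifications, and out-of-sample validation scheme are considerably richer.

    import numpy as np
    from sklearn.linear_model import QuantileRegressor

    def distributional_forecast(X_train, y_train, x_new,
                                quantiles=(0.05, 0.25, 0.5, 0.75, 0.95),
                                penalty=0.1):
        """L1-penalized linear quantile regressions, one per quantile level,
        evaluated at a new predictor vector (illustrative only)."""
        preds = {}
        for q in quantiles:
            model = QuantileRegressor(quantile=q, alpha=penalty)
            model.fit(X_train, y_train)
            preds[q] = float(model.predict(np.atleast_2d(x_new))[0])
        return preds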

Econometrics arXiv paper, submitted: 2025-10-13

Identifying treatment effects on categorical outcomes in IV models

Authors: Onil Boussim

This paper provides a nonparametric framework for causal inference with
categorical outcomes under binary treatment and binary instrument settings. We
decompose the observed joint probability of outcomes and treatment into
marginal probabilities of potential outcomes and treatment, and association
parameters that capture selection bias due to unobserved heterogeneity. Under a
novel identifying assumption, association similarity, which requires the
dependence between unobserved factors and potential outcomes to be invariant
across treatment states, we achieve point identification of the full
distribution of potential outcomes. Recognizing that this assumption may be
strong in some contexts, we propose two weaker alternatives: monotonic
association, which restricts the direction of selection heterogeneity, and
bounded association, which constrains its magnitude. These relaxed assumptions
deliver sharp partial identification bounds that nest point identification as a
special case and facilitate transparent sensitivity analysis. We illustrate the
framework in an empirical application, estimating the causal effect of private
health insurance on health outcomes.

arXiv link: http://arxiv.org/abs/2510.10946v1

Econometrics arXiv paper, submitted: 2025-10-12

Denoised IPW-Lasso for Heterogeneous Treatment Effect Estimation in Randomized Experiments

Authors: Mingqian Guan, Komei Fujita, Naoya Sueishi, Shota Yasui

This paper proposes a new method for estimating conditional average treatment
effects (CATE) in randomized experiments. We adopt inverse probability
weighting (IPW) for identification; however, IPW-transformed outcomes are known
to be noisy, even when true propensity scores are used. To address this issue,
we introduce a noise reduction procedure and estimate a linear CATE model using
Lasso, achieving both accuracy and interpretability. We theoretically show that
denoising reduces the prediction error of the Lasso. The method is particularly
effective when treatment effects are small relative to the variability of
outcomes, which is often the case in empirical applications. Applications to
the Get-Out-the-Vote dataset and Criteo Uplift Modeling dataset demonstrate
that our method outperforms fully nonparametric machine learning methods in
identifying individuals with higher treatment effects. Moreover, our method
uncovers informative heterogeneity patterns that are consistent with previous
empirical findings.

arXiv link: http://arxiv.org/abs/2510.10527v1
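
The IPW outcome transformation and Lasso step read as follows in a randomized
experiment with known assignment probability p; the paper's noise-reduction
procedure applied to the transformed outcome is its novel ingredient and is
omitted from this sketch.

    import numpy as np
    from sklearn.linear_model import Lasso

    def ipw_lasso_cate(X, y, d, p, alpha=0.1):
        """IPW-transformed-outcome regression: E[y_star | X] equals the CATE,
        and a Lasso fit gives an interpretable linear approximation
        (illustrative only; no denoising step)."""
        y_star = d * y / p - (1 - d) * y / (1 - p)   # unbiased CATE transform
        return Lasso(alpha=alpha).fit(X, y_star)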

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2025-10-10

Ranking Policies Under Loss Aversion and Inequality Aversion

Authors: Martyna Kobus, Radosław Kurek, Thomas Parker

Strong empirical evidence from laboratory experiments, and more recently from
population surveys, shows that individuals, when evaluating their situations,
pay attention to whether they experience gains or losses, with losses weighing
more heavily than gains. The electorate's loss aversion, in turn, influences
politicians' choices. We propose a new framework for welfare analysis of policy
outcomes that, in addition to the traditional focus on post-policy incomes,
also accounts for individuals' gains and losses resulting from policies. We
develop several bivariate stochastic dominance criteria for ranking policy
outcomes that are sensitive to features of the joint distribution of
individuals' income changes and absolute incomes. The main social objective
assumes that individuals are loss averse with respect to income gains and
losses, inequality averse with respect to absolute incomes, and hold varying
preferences regarding the association between incomes and income changes. We
translate these and other preferences into functional inequalities that can be
tested using sample data. The concepts and methods are illustrated using data
from an income support experiment conducted in Connecticut.

arXiv link: http://arxiv.org/abs/2510.09590v1

Econometrics arXiv paper, submitted: 2025-10-10

Boundary estimation in the regression-discontinuity design: Evidence for a merit- and need-based financial aid program

Authors: Eugenio Felipe Merlano

In the conventional regression-discontinuity (RD) design, the probability
that units receive a treatment changes discontinuously as a function of one
covariate exceeding a threshold or cutoff point. This paper studies an extended
RD design where assignment rules simultaneously involve two or more continuous
covariates. We show that assignment rules with more than one variable allow the
estimation of a more comprehensive set of treatment effects, relaxing in a
research-driven style the local and sometimes limiting nature of univariate RD
designs. We then propose a flexible nonparametric approach to estimate the
multidimensional discontinuity by univariate local linear regression and
compare its performance to existing methods. We present an empirical
application to a large-scale and countrywide financial aid program for
low-income students in Colombia. The program uses a merit-based (academic
achievement) and need-based (wealth index) assignment rule to select students
for the program. We show that our estimation strategy fully exploits the
multidimensional assignment rule and reveals heterogeneous effects along the
treatment boundaries.

arXiv link: http://arxiv.org/abs/2510.09257v1

Econometrics arXiv paper, submitted: 2025-10-10

Flexibility without foresight: the predictive limitations of mixture models

Authors: Stephane Hess, Sander van Cranenburgh

Models allowing for random heterogeneity, such as mixed logit and latent
class, are generally observed to obtain superior model fit and yield detailed
insights into unobserved preference heterogeneity. Using theoretical arguments
and two case studies on revealed and stated choice data, this paper highlights
that these advantages do not translate into any benefits in forecasting,
whether looking at prediction performance or the recovery of market shares. The
only exception arises when using conditional distributions in making
predictions for the same individuals included in the estimation sample, which
obviously precludes any out-of-sample forecasting.

arXiv link: http://arxiv.org/abs/2510.09185v1

Econometrics arXiv paper, submitted: 2025-10-10

Sensitivity Analysis for Causal ML: A Use Case at Booking.com

Authors: Philipp Bach, Victor Chernozhukov, Carlos Cinelli, Lin Jia, Sven Klaassen, Nils Skotara, Martin Spindler

Causal Machine Learning has emerged as a powerful tool for flexibly
estimating causal effects from observational data in both industry and
academia. However, causal inference from observational data relies on
untestable assumptions about the data-generating process, such as the absence
of unobserved confounders. When these assumptions are violated, causal effect
estimates may become biased, undermining the validity of research findings. In
these contexts, sensitivity analysis plays a crucial role, by enabling data
scientists to assess the robustness of their findings to plausible violations
of unconfoundedness. This paper introduces sensitivity analysis and
demonstrates its practical relevance through a (simulated) data example based
on a use case at Booking.com. We focus our presentation on a recently proposed
method by Chernozhukov et al. (2023), which derives general non-parametric
bounds on biases due to omitted variables, and is fully compatible with (though
not limited to) modern inferential tools of Causal Machine Learning. By
presenting this use case, we aim to raise awareness of sensitivity analysis and
highlight its importance in real-world scenarios.

arXiv link: http://arxiv.org/abs/2510.09109v1

Econometrics arXiv paper, submitted: 2025-10-10

Sensitivity Analysis for Treatment Effects in Difference-in-Differences Models using Riesz Representation

Authors: Philipp Bach, Sven Klaassen, Jannis Kueck, Mara Mattes, Martin Spindler

Difference-in-differences (DiD) is one of the most popular approaches for
empirical research in economics, political science, and beyond. Identification
in these models is based on the conditional parallel trends assumption: In the
absence of treatment, the average outcome of the treated and untreated group
are assumed to evolve in parallel over time, conditional on pre-treatment
covariates. We introduce a novel approach to sensitivity analysis for DiD
models that assesses the robustness of DiD estimates to violations of this
assumption due to unobservable confounders, allowing researchers to
transparently assess and communicate the credibility of their causal estimation
results. Our method focuses on estimation by Double Machine Learning and
extends previous work on sensitivity analysis based on Riesz Representation in
cross-sectional settings. We establish asymptotic bounds for point estimates
and confidence intervals in the canonical $2\times2$ setting and group-time
causal parameters in settings with staggered treatment adoption. Our approach
makes it possible to relate the formulation of parallel trends violation to
empirical evidence from (1) pre-testing, (2) covariate benchmarking and (3)
standard reporting statistics and visualizations. We provide extensive
simulation experiments demonstrating the validity of our sensitivity approach
and diagnostics and apply our approach to two empirical applications.

arXiv link: http://arxiv.org/abs/2510.09064v1

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2025-10-09

Blackwell without Priors

Authors: Maxwell Rosenthal

This paper proposes a fully prior-free model of experimentation in which the
decision maker observes the entire distribution of signals generated by a known
experiment under an unknown distribution of the state of the world. One
experiment is robustly more informative than another if the decision maker's
maxmin expected utility after observing the output of the former is always at
least her maxmin expected utility after observing the latter. We show that this
ranking holds if and only if the less informative experiment is a linear
transformation of the more informative experiment; equivalently, the null space
of the more informative experiment is a subset of the null space of the less
informative experiment. Our criterion is implied by Blackwell's order but does
not imply it, and we show by example that our ranking admits strictly more
comparable pairs of experiments than the classical ranking.

arXiv link: http://arxiv.org/abs/2510.08709v2
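
The linear-transformation / null-space criterion can be checked numerically
with a rank test, treating each experiment as a matrix whose columns are the
signal distributions induced by the states. A minimal sketch with a toy
garbling example (the matrices are illustrative, not from the paper).

    import numpy as np

    def robustly_more_informative(A, B, tol=1e-10):
        """True if B = M A for some matrix M, i.e. the null space of A is
        contained in the null space of B: every row of B must lie in the
        row space of A."""
        rank_A = np.linalg.matrix_rank(A, tol=tol)
        rank_AB = np.linalg.matrix_rank(np.vstack([A, B]), tol=tol)
        return rank_AB == rank_A

    # toy usage: B merges two of A's signals, so B = M A holds
    A = np.array([[0.7, 0.2, 0.1],
                  [0.2, 0.5, 0.1],
                  [0.1, 0.3, 0.8]])
    B = np.array([[0.9, 0.7, 0.2],   # sum of A's first two signal rows
                  [0.1, 0.3, 0.8]])
    assert robustly_more_informative(A, B)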

Econometrics arXiv paper, submitted: 2025-10-09

Stochastic Volatility-in-mean VARs with Time-Varying Skewness

Authors: Leonardo N. Ferreira, Haroon Mumtaz, Ana Skoblar

This paper introduces a Bayesian vector autoregression (BVAR) with stochastic
volatility-in-mean and time-varying skewness. Unlike previous approaches, the
proposed model allows both volatility and skewness to directly affect
macroeconomic variables. We provide a Gibbs sampling algorithm for posterior
inference and apply the model to quarterly data for the US and the UK.
Empirical results show that skewness shocks have economically significant
effects on output, inflation and spreads, often exceeding the impact of
volatility shocks. In a pseudo-real-time forecasting exercise, the proposed
model outperforms existing alternatives in many cases. Moreover, the model
produces sharper measures of tail risk, revealing that standard stochastic
volatility models tend to overstate uncertainty. These findings highlight the
importance of incorporating time-varying skewness for capturing macro-financial
risks and improving forecast performance.

arXiv link: http://arxiv.org/abs/2510.08415v1

Econometrics arXiv paper, submitted: 2025-10-08

Beyond the Oracle Property: Adaptive LASSO in Cointegrating Regressions

Authors: Karsten Reichold, Ulrike Schneider

This paper establishes new asymptotic results for the adaptive LASSO
estimator in cointegrating regression models. We study model selection
probabilities, estimator consistency, and limiting distributions under both
standard and moving-parameter asymptotics. We also derive uniform convergence
rates and the fastest local-to-zero rates that can still be detected by the
estimator, complementing and extending the results of Lee, Shi, and Gao (2022,
Journal of Econometrics, 229, 322--349). Our main findings include that under
conservative tuning, the adaptive LASSO estimator is uniformly $T$-consistent
and the cut-off rate for local-to-zero coefficients that can be detected by the
procedure is $1/T$. Under consistent tuning, however, both rates are slower and
depend on the tuning parameter. The theoretical results are complemented by a
detailed simulation study showing that the finite-sample distribution of the
adaptive LASSO estimator deviates substantially from what is suggested by the
oracle property, whereas the limiting distributions derived under
moving-parameter asymptotics provide much more accurate approximations.
Finally, we show that our results also extend to models with local-to-unit-root
regressors and to predictive regressions with unit-root predictors.

arXiv link: http://arxiv.org/abs/2510.07204v1
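
For reference, the generic two-step adaptive LASSO recipe (not the paper's
cointegration-specific theory or tuning) can be implemented by rescaling
columns with first-step OLS coefficients.

    import numpy as np
    from sklearn.linear_model import Lasso

    def adaptive_lasso(X, y, gamma=1.0, alpha=0.1):
        """First-step OLS coefficients define penalty weights |b|^(-gamma),
        implemented by scaling column j of X by |b_j|^gamma, running a
        standard Lasso, and mapping coefficients back (illustrative only)."""
        b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
        w = np.abs(b_ols) ** gamma + 1e-12      # avoid division by zero
        X_scaled = X * w                        # column j scaled by w_j
        fit = Lasso(alpha=alpha).fit(X_scaled, y)
        return fit.coef_ * w                    # back to the original scale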

Econometrics arXiv paper, submitted: 2025-10-08

Bayesian Portfolio Optimization by Predictive Synthesis

Authors: Masahiro Kato, Kentaro Baba, Hibiki Kaibuchi, Ryo Inokuchi

Portfolio optimization is a critical task in investment. Most existing
portfolio optimization methods require information on the distribution of
returns of the assets that make up the portfolio. However, such distribution
information is usually unknown to investors. Various methods have been proposed
to estimate distribution information, but their accuracy greatly depends on the
uncertainty of the financial markets. Due to this uncertainty, a model that
could well predict the distribution information at one point in time may
perform less accurately compared to another model at a different time. To solve
this problem, we investigate a method for portfolio optimization based on
Bayesian predictive synthesis (BPS), one of the Bayesian ensemble methods for
meta-learning. We assume that investors have access to multiple asset return
prediction models. By using BPS with dynamic linear models to combine these
predictions, we can obtain a Bayesian predictive posterior about the mean
rewards of assets that accommodate the uncertainty of the financial markets. In
this study, we examine how to construct mean-variance portfolios and
quantile-based portfolios based on the predicted distribution information.

arXiv link: http://arxiv.org/abs/2510.07180v1

Econometrics arXiv paper, submitted: 2025-10-07

Robust Inference for Convex Pairwise Difference Estimators

Authors: Matias D. Cattaneo, Michael Jansson, Kenichi Nagasawa

This paper develops distribution theory and bootstrap-based inference methods
for a broad class of convex pairwise difference estimators. These estimators
minimize a kernel-weighted convex-in-parameter function over observation pairs
that are similar in terms of certain covariates, where the similarity is
governed by a localization (bandwidth) parameter. While classical results
establish asymptotic normality under restrictive bandwidth conditions, we show
that valid Gaussian and bootstrap-based inference remains possible under
substantially weaker assumptions. First, we extend the theory of small
bandwidth asymptotics to convex pairwise estimation settings, deriving robust
Gaussian approximations even when a smaller than standard bandwidth is used.
Second, we employ a debiasing procedure based on generalized jackknifing to
enable inference with larger bandwidths, while preserving convexity of the
objective function. Third, we construct a novel bootstrap method that adjusts
for bandwidth-induced variance distortions, yielding valid inference across a
wide range of bandwidth choices. Our proposed inference method enjoys
demonstrably greater robustness, while retaining the practical appeal of convex
pairwise difference estimators.
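
As a stylized member of this class (not the paper's general setting or its
bootstrap), the sketch below computes a kernel-weighted least-squares fit on
pairwise differences in a partially linear model, with a Gaussian kernel
localizing pairs on the nonparametric covariate; the bandwidth and simulated
data are hypothetical.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 300
    z = rng.uniform(size=n)                  # covariate entering nonparametrically
    w = rng.normal(size=(n, 2))              # parametric covariates
    beta0 = np.array([1.0, -0.5])
    y = w @ beta0 + np.sin(2 * np.pi * z) + rng.normal(scale=0.5, size=n)

    h = 0.1                                  # bandwidth (localization parameter)
    i, j = np.triu_indices(n, k=1)           # all observation pairs
    k_ij = np.exp(-0.5 * ((z[i] - z[j]) / h) ** 2)   # Gaussian kernel weights

    dy = y[i] - y[j]
    dw = w[i] - w[j]

    # Weighted least squares on pairwise differences (a convex objective)
    A = (dw * k_ij[:, None]).T @ dw
    b = (dw * k_ij[:, None]).T @ dy
    print(np.linalg.solve(A, b))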

arXiv link: http://arxiv.org/abs/2510.05991v1

Econometrics arXiv paper, submitted: 2025-10-07

Assessing the Effects of Monetary Shocks on Macroeconomic Stars: A SMUC-IV Framework

Authors: Bowen Fu, Chenghan Hou, Jan Prüser

This paper proposes a structural multivariate unobserved components model
with external instrument (SMUC-IV) to investigate the effects of monetary
policy shocks on key U.S. macroeconomic "stars" -- namely, the level of potential
output, the growth rate of potential output, trend inflation, and the neutral
interest rate. A key feature of our approach is the use of an external
instrument to identify monetary policy shocks within the multivariate
unobserved components modeling framework. We develop an MCMC estimation method
to facilitate posterior inference within our proposed SMUC-IV framework. In
addition, we propose a marginal likelihood estimator to enable model
comparison across alternative specifications. Our empirical analysis shows that
contractionary monetary policy shocks have significant negative effects on the
macroeconomic stars, highlighting the nonzero long-run effects of transitory
monetary policy shocks.

arXiv link: http://arxiv.org/abs/2510.05802v1

Econometrics arXiv paper, submitted: 2025-10-07

Correcting sample selection bias with categorical outcomes

Authors: Onil Boussim

In this paper, we propose a method for correcting sample selection bias when
the outcome of interest is categorical, such as occupational choice, health
status, or field of study. Classical approaches to sample selection rely on
strong parametric distributional assumptions, which may be restrictive in
practice. The recent framework of Chernozhukov et al. (2023) offers
nonparametric identification using a local Gaussian representation (LGR) that
holds for any bivariate joint distribution, but this approach is limited
to ordered discrete outcomes. We therefore extend it by developing a local
representation that applies to joint probabilities, thereby eliminating the
need to impose an artificial ordering on categories. Our representation
decomposes each joint probability into marginal probabilities and a
category-specific association parameter that captures how selection
differentially affects each outcome. Under exclusion restrictions analogous to
those in the LGR model, we establish nonparametric point identification of the
latent categorical distribution. Building on this identification result, we
introduce a semiparametric multinomial logit model with sample selection,
propose a computationally tractable two-step estimator, and derive its
asymptotic properties. This framework significantly broadens the set of tools
available for analyzing selection in categorical and other discrete outcomes,
offering substantial relevance for empirical work across economics, health
sciences, and social sciences.

arXiv link: http://arxiv.org/abs/2510.05551v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-10-07

Can language models boost the power of randomized experiments without statistical bias?

Authors: Xinrui Ruan, Xinwei Ma, Yingfei Wang, Waverly Wei, Jingshen Wang

Randomized experiments or randomized controlled trials (RCTs) are gold
standards for causal inference, yet cost and sample-size constraints limit
power. Meanwhile, modern RCTs routinely collect rich, unstructured data that
are highly prognostic of outcomes but rarely used in causal analyses. We
introduce CALM (Causal Analysis leveraging Language Models), a statistical
framework that integrates large language model (LLM) predictions with
established causal estimators to increase precision while preserving
statistical validity. CALM treats LLM outputs as auxiliary prognostic
information and corrects their potential bias via a heterogeneous calibration
step that residualizes and optimally reweights predictions. We prove that CALM
remains consistent even when LLM predictions are biased and achieves efficiency
gains over augmented inverse probability weighting estimators for various
causal effects. In addition, CALM includes a few-shot variant that aggregates
predictions across randomly sampled demonstration sets. The resulting
U-statistic-like predictor restores i.i.d. structure and also mitigates
prompt-selection variability. Empirically, in simulations calibrated to a
mobile-app depression RCT, CALM delivers lower variance relative to other
benchmark methods, is effective in zero- and few-shot settings, and remains
stable across prompt designs. By principled use of LLMs to harness unstructured
data and external knowledge learned during pretraining, CALM provides a
practical path to more precise causal analyses in RCTs.
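
The sketch below is not CALM itself; it only illustrates the basic idea of
treating an auxiliary (possibly biased) prediction as a prognostic covariate in
a randomized experiment, via a fully interacted regression adjustment with the
prediction centered. statsmodels and the simulated data are assumptions.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    n = 1000
    treat = rng.binomial(1, 0.5, size=n)
    prognostic = rng.normal(size=n)          # stands in for an LLM outcome prediction
    y = 1.0 * treat + 2.0 * prognostic + rng.normal(size=n)

    # Interacted adjustment: treatment, centered prediction, and their interaction
    g = prognostic - prognostic.mean()
    X = sm.add_constant(np.column_stack([treat, g, treat * g]))
    fit = sm.OLS(y, X).fit(cov_type="HC2")
    print(fit.params[1], fit.bse[1])         # adjusted ATE estimate and robust s.e.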

arXiv link: http://arxiv.org/abs/2510.05545v1

Econometrics arXiv paper, submitted: 2025-10-06

Estimating Treatment Effects Under Bounded Heterogeneity

Authors: Soonwoo Kwon, Liyang Sun

Researchers often use specifications that correctly estimate the average
treatment effect under the assumption of constant effects. When treatment
effects are heterogeneous, however, such specifications generally fail to
recover this average effect. Augmenting these specifications with interaction
terms between demeaned covariates and treatment eliminates this bias, but often
leads to imprecise estimates and becomes infeasible under limited overlap. We
propose a generalized ridge regression estimator, $regulaTE$, that
penalizes the coefficients on the interaction terms to achieve an optimal
trade-off between worst-case bias and variance in estimating the average effect
under limited treatment effect heterogeneity. Building on this estimator, we
construct confidence intervals that remain valid under limited overlap and can
also be used to assess sensitivity to violations of the constant effects
assumption. We illustrate the method in empirical applications under
unconfoundedness and staggered adoption, providing a practical approach to
inference under limited overlap.
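
A minimal sketch of the underlying idea, under strong simplifications:
ridge-penalize only the coefficients on the interactions between treatment and
demeaned covariates, leaving the main treatment coefficient unpenalized. It
does not reproduce the regulaTE bias-variance calibration or its honest
confidence intervals; the data and penalty level are placeholders.

    import numpy as np

    rng = np.random.default_rng(4)
    n, p = 500, 5
    x = rng.normal(size=(n, p))
    d = rng.binomial(1, 0.5, size=n)
    y = 1.0 * d + x @ np.full(p, 0.5) + 0.3 * d * x[:, 0] + rng.normal(size=n)

    xc = x - x.mean(axis=0)                  # demeaned covariates
    Z = np.column_stack([np.ones(n), d, xc, d[:, None] * xc])

    # Penalize only the interaction block (last p columns)
    lam = 10.0
    P = np.zeros(Z.shape[1]); P[-p:] = 1.0
    coef = np.linalg.solve(Z.T @ Z + lam * np.diag(P), Z.T @ y)
    print("treatment effect estimate:", coef[1])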

arXiv link: http://arxiv.org/abs/2510.05454v1

Econometrics arXiv paper, submitted: 2025-10-06

Risk-Adjusted Policy Learning and the Social Cost of Uncertainty: Theory and Evidence from CAP evaluation

Authors: Giovanni Cerulli, Francesco Caracciolo

This paper develops a risk-adjusted alternative to standard optimal policy
learning (OPL) for observational data by importing Roy's (1952) safety-first
principle into the treatment assignment problem. We formalize a welfare
functional that maximizes the probability that outcomes exceed a socially
required threshold and show that the associated pointwise optimal rule ranks
treatments by the ratio of conditional means to conditional standard
deviations. We implement the framework using microdata from the Italian Farm
Accountancy Data Network to evaluate the allocation of subsidies under the EU
Common Agricultural Policy. Empirically, risk-adjusted optimal policies
systematically dominate the realized allocation across specifications, while
risk aversion lowers overall welfare relative to the risk-neutral benchmark,
making transparent the social cost of insurance against uncertainty. The
results illustrate how safety-first OPL provides an implementable,
interpretable tool for risk-sensitive policy design, quantifying the
efficiency-insurance trade-off that policymakers face when outcomes are
volatile.
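
Given fitted conditional means and standard deviations for each treatment arm
(simulated placeholders below), the pointwise safety-first rule described above
can be applied directly: assign each unit the arm with the largest standardized
distance of the conditional mean from the required threshold. This is a toy
illustration, not the paper's estimation pipeline.

    import numpy as np

    rng = np.random.default_rng(5)
    n_units, n_arms = 10, 3
    threshold = 0.0                                      # socially required outcome level

    mu = rng.normal(1.0, 0.5, size=(n_units, n_arms))    # conditional means by arm (placeholders)
    sigma = rng.uniform(0.5, 2.0, size=(n_units, n_arms))  # conditional std devs by arm

    # Safety-first score: P(outcome > threshold) is monotone in (mu - threshold)/sigma
    score = (mu - threshold) / sigma
    print(score.argmax(axis=1))                          # chosen arm for each unit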

arXiv link: http://arxiv.org/abs/2510.05007v1

Econometrics arXiv updated paper (originally submitted: 2025-10-06)

Identification in Auctions with Truncated Transaction Prices

Authors: Tonghui Qi

I establish nonparametric identification results in first- and second-price
auctions when transaction prices are truncated by a binding reserve price under
a range of information structures. When the number of potential bidders is
fixed and known across all auctions, if only the transaction price is observed,
the bidders' private-value distribution is identified in second-price auctions
but not in first-price auctions. Identification in first-price auctions can be
achieved if either the number of active bidders or the number of auctions with
no sales is observed. When the number of potential bidders varies across
auctions and is unknown, the bidders' private-value distribution is identified
in first-price auctions but not in second-price auctions, provided that both
the transaction price and the number of active bidders are observed. I derive
analogous results for auctions with entry costs, which face a similar truncation
issue when data on potential bidders who do not enter are missing.

arXiv link: http://arxiv.org/abs/2510.04464v2

Econometrics arXiv paper, submitted: 2025-10-03

Forecasting Inflation Based on Hybrid Integration of the Riemann Zeta Function and the FPAS Model (FPAS + $\zeta$): Cyclical Flexibility, Socio-Economic Challenges and Shocks, and Comparative Analysis of Models

Authors: Davit Gondauri

Inflation forecasting is a core socio-economic challenge in modern
macroeconomic modeling, especially when cyclical, structural, and shock factors
act simultaneously. Traditional systems such as FPAS and ARIMA often struggle
with cyclical asymmetry and unexpected fluctuations. This study proposes a
hybrid framework (FPAS + $\zeta$) that integrates a structural macro model
(FPAS) with cyclical components derived from the Riemann zeta function
$\zeta(1/2 + i t)$. Using Georgia's macro data (2005-2024), a nonlinear
argument $t$ is constructed from core variables (e.g., GDP, M3, policy rate),
and the hybrid forecast is calibrated by minimizing RMSE via a modulation
coefficient $\alpha$. Fourier-based spectral analysis and a Hidden Markov Model
(HMM) are employed for cycle/phase identification, and a multi-criteria
AHP-TOPSIS scheme compares FPAS, FPAS + $\zeta$, and ARIMA. Results show lower
RMSE and superior cyclical responsiveness for FPAS + $\zeta$, along with
early-warning capability for shocks and regime shifts, indicating practical
value for policy institutions.

arXiv link: http://arxiv.org/abs/2510.02966v1

Econometrics arXiv paper, submitted: 2025-10-03

Repeated Matching Games: An Empirical Framework

Authors: Pauline Corblet, Jeremy Fox, Alfred Galichon

We introduce a model of dynamic matching with transferable utility, extending
the static model of Shapley and Shubik (1971). Forward-looking agents have
individual states that evolve with current matches. Each period, a matching
market with market-clearing prices takes place. We prove the existence of an
equilibrium with time-varying distributions of agent types and show it is the
solution to a social planner's problem. We also prove that a stationary
equilibrium exists. We introduce econometric shocks to account for unobserved
heterogeneity in match formation. We propose two algorithms to compute a
stationary equilibrium. We adapt both algorithms for estimation. We estimate a
model of accumulation of job-specific human capital using data on Swedish
engineers.

arXiv link: http://arxiv.org/abs/2510.02737v1

Econometrics arXiv paper, submitted: 2025-10-02

"Post" Pre-Analysis Plans: Valid Inference for Non-Preregistered Specifications

Authors: Reca Sarfati, Vod Vilfort

Pre-analysis plans (PAPs) have become standard in experimental economics
research, but it is nevertheless common to see researchers deviating from their
PAPs to supplement preregistered estimates with non-prespecified findings.
While such ex-post analysis can yield valuable insights, there is broad
uncertainty over how to interpret -- or whether to even acknowledge --
non-preregistered results. In this paper, we consider the case of a
truth-seeking researcher who, after seeing the data, earnestly wishes to report
additional estimates alongside those preregistered in their PAP. We show that,
even absent "nefarious" behavior, conventional confidence intervals and point
estimators are invalid due to the fact that non-preregistered estimates are
only reported in a subset of potential data realizations. We propose inference
procedures that account for this conditional reporting. We apply these
procedures to Bessone et al. (2021), which studies the economic effects of
increased sleep among the urban poor. We demonstrate that, depending on the
reason for deviating, the adjustments from our procedures can range from
negligible to economically significant relative to conventional practice.
Finally, we consider the robustness of our procedure to
certain forms of misspecification, motivating possible heuristic checks and
norms for journals to adopt.

arXiv link: http://arxiv.org/abs/2510.02507v1

Econometrics arXiv paper, submitted: 2025-10-02

Cautions on Tail Index Regressions

Authors: Thomas T. Yang

We revisit tail-index regressions. For linear specifications, we find that
the usual full-rank condition can fail because conditioning on extreme outcomes
causes regressors to degenerate to constants. More generally, the conditional
distribution of the covariates in the tails concentrates on the values at which
the tail index is minimized. Away from those points, the conditional density
tends to zero. For local nonparametric tail index regression, the convergence
rate can be very slow. We conclude with practical suggestions for applied work.

arXiv link: http://arxiv.org/abs/2510.01535v1

Econometrics arXiv paper, submitted: 2025-10-01

Generalized Bayes in Conditional Moment Restriction Models

Authors: Sid Kankanala

This paper develops a generalized (quasi-) Bayes framework for conditional
moment restriction models, where the parameter of interest is a nonparametric
structural function of endogenous variables. We establish contraction rates for
a class of Gaussian process priors and provide conditions under which a
Bernstein-von Mises theorem holds for the quasi-Bayes posterior. Consequently,
we show that optimally weighted quasi-Bayes credible sets achieve exact
asymptotic frequentist coverage, extending classical results for parametric GMM
models. As an application, we estimate firm-level production functions using
Chilean plant-level data. Simulations illustrate the favorable performance of
generalized Bayes estimators relative to common alternatives.

arXiv link: http://arxiv.org/abs/2510.01036v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-10-01

An alternative bootstrap procedure for factor-augmented regression models

Authors: Peiyun Jiang, Takashi Yamagata

In this paper, we propose a novel bootstrap algorithm that is more efficient
than existing methods for approximating the distribution of the
factor-augmented regression estimator for a rotated parameter vector. The
regression is augmented by $r$ factors extracted from a large panel of $N$
variables observed over $T$ time periods. We consider general weak factor (WF)
models with $r$ signal eigenvalues that may diverge at different rates,
$N^{\alpha _{k}}$, where $0<\alpha _{k}\leq 1$ for $k=1,2,...,r$. We establish
the asymptotic validity of our bootstrap method using not only the conventional
data-dependent rotation matrix $\mathbf{H}$, but also an alternative
data-dependent rotation matrix, $\mathbf{H}_q$, which typically exhibits smaller
asymptotic bias and achieves a faster convergence rate. Furthermore, we
demonstrate the asymptotic validity of the bootstrap under a purely
signal-dependent rotation matrix, which is unique and can be regarded
as the population analogue of both $\mathbf{H}$ and $\mathbf{H}_q$. Experimental
results provide compelling evidence that the proposed bootstrap procedure
achieves superior performance relative to the existing procedure.
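
As background, the sketch below shows a plain factor-augmented regression with
principal-components factors and a naive second-stage residual bootstrap. It
ignores the rotation-matrix issues and weak-factor asymptotics that are the
point of the paper, so it should be read only as a baseline; all names and the
simulated panel are assumptions.

    import numpy as np

    rng = np.random.default_rng(6)
    T, N, r = 200, 50, 2

    # Simulate a factor panel and a target regression augmented by the factors
    F = rng.normal(size=(T, r))
    Lambda = rng.normal(size=(N, r))
    X = F @ Lambda.T + rng.normal(size=(T, N))
    y = F @ np.array([1.0, -0.5]) + rng.normal(size=T)

    # Extract r factors by principal components (scaled left singular vectors)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    F_hat = np.sqrt(T) * U[:, :r]

    # Factor-augmented regression and a simple residual bootstrap of the second stage
    W = np.column_stack([np.ones(T), F_hat])
    b_hat, *_ = np.linalg.lstsq(W, y, rcond=None)
    resid = y - W @ b_hat

    B = 500
    draws = np.empty((B, W.shape[1]))
    for b in range(B):
        y_star = W @ b_hat + rng.choice(resid, size=T, replace=True)
        draws[b], *_ = np.linalg.lstsq(W, y_star, rcond=None)
    print(draws.std(axis=0))                 # naive bootstrap standard errors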

arXiv link: http://arxiv.org/abs/2510.00947v1

Econometrics arXiv updated paper (originally submitted: 2025-10-01)

A Unified Framework for Spatial and Temporal Treatment Effect Boundaries: Theory and Identification

Authors: Tatsuru Kikuchi

This paper develops a unified theoretical framework for detecting and
estimating boundaries in treatment effects across both spatial and temporal
dimensions. We formalize the concept of treatment effect boundaries as
structural parameters characterizing regime transitions where causal effects
cease to operate. Building on reaction-diffusion models of information
propagation, we establish conditions under which spatial and temporal
boundaries share common dynamics governed by diffusion parameters $(\delta,
\lambda)$, yielding the testable prediction $d^*/\tau^* = 3.32\lambda\sqrt{\delta}$
for standard detection thresholds. We derive formal identification results
under staggered treatment adoption and develop a three-stage estimation
procedure implementable with standard panel data. Monte Carlo simulations
demonstrate excellent finite-sample performance, with boundary estimates
achieving RMSE below 10% in realistic configurations. We apply the framework to
two empirical settings: EU broadband diffusion (2006-2021) and US wildfire
economic impacts (2017-2022). The broadband application reveals a scope
limitation -- our framework assumes depreciation dynamics and fails when
effects exhibit increasing returns through network externalities. The wildfire
application provides strong validation: estimated boundaries satisfy $d^* = 198$
km and $\tau^* = 2.7$ years, with the empirical ratio (72.5) exactly matching the
theoretical prediction $3.32\lambda\sqrt{\delta} = 72.5$. The framework provides
practical tools for detecting when localized treatments become systemic and
identifying critical thresholds for policy intervention.

arXiv link: http://arxiv.org/abs/2510.00754v2

Econometrics arXiv paper, submitted: 2025-09-30

Persuasion Effects in Regression Discontinuity Designs

Authors: Sung Jae Jun, Sokbae Lee

We develop a framework for identifying and estimating persuasion effects in
regression discontinuity (RD) designs. The RD persuasion rate measures the
probability that individuals at the threshold would take the action if exposed
to a persuasive message, given that they would not take the action without
exposure. We present identification results for both sharp and fuzzy RD
designs, derive sharp bounds under various data scenarios, and extend the
analysis to local compliers. Estimation and inference rely on local polynomial
regression, enabling straightforward implementation with standard RD tools.
Applications to public health and media illustrate its empirical relevance.

arXiv link: http://arxiv.org/abs/2509.26517v1

Econometrics arXiv paper, submitted: 2025-09-30

Triadic Network Formation

Authors: Chris Muris, Cavit Pakel

We study estimation and inference for triadic link formation with dyad-level
fixed effects in a nonlinear binary choice logit framework. Dyad-level effects
provide a richer and more realistic representation of heterogeneity across
pairs of dimensions (e.g. importer-exporter, importer-product,
exporter-product), yet their sheer number creates a severe incidental parameter
problem. We propose a novel “hexad logit” estimator and establish its
consistency and asymptotic normality. Identification is achieved through a
conditional likelihood approach that eliminates the fixed effects by
conditioning on sufficient statistics, in the form of hexads -- wirings that
involve two nodes from each part of the network. Our central finding is that
dyad-level heterogeneity fundamentally changes how information accumulates.
Unlike under node-level heterogeneity, where informative wirings automatically
grow with link formation, under dyad-level heterogeneity the network may
generate infinitely many links yet asymptotically zero informative wirings. We
derive explicit sparsity thresholds that determine when consistency holds and
when asymptotic normality is attainable. These results have important practical
implications, as they reveal that there is a limit to how granular or
disaggregate a dataset one can employ under dyad-level heterogeneity.

arXiv link: http://arxiv.org/abs/2509.26420v1

Econometrics arXiv paper, submitted: 2025-09-30

Joint Inference for the Regression Discontinuity Effect and Its External Validity

Authors: Yuta Okamoto

The external validity of regression discontinuity (RD) designs is essential
for informing policy and remains an active research area in econometrics and
statistics. However, we document that only a limited number of empirical
studies explicitly address the external validity of standard RD effects. To
advance empirical practice, we propose a simple joint inference procedure for
the RD effect and its local external validity, building on Calonico, Cattaneo,
and Titiunik (2014, Econometrica) and Dong and Lewbel (2015, Review of
Economics and Statistics). We further introduce a locally linear treatment
effects assumption, which enhances the interpretability of the treatment effect
derivative proposed by Dong and Lewbel. Under this assumption, we establish
identification and derive a uniform confidence band for the extrapolated
treatment effects. Our approaches require no additional covariates or design
features, making them applicable to virtually all RD settings and thereby
enhancing the policy relevance of many empirical RD studies. The usefulness of
the method is demonstrated through an empirical application, highlighting its
complementarity to existing approaches.

arXiv link: http://arxiv.org/abs/2509.26380v1

Econometrics arXiv paper, submitted: 2025-09-30

Leveraging LLMs to Improve Experimental Design: A Generative Stratification Approach

Authors: George Gui, Seungwoo Kim

Pre-experiment stratification, or blocking, is a well-established technique
for designing more efficient experiments and increasing the precision of the
experimental estimates. However, when researchers have access to many
covariates at the experiment design stage, they often face challenges in
effectively selecting or weighting covariates when creating their strata. This
paper proposes a Generative Stratification procedure that leverages Large
Language Models (LLMs) to synthesize high-dimensional covariate data to improve
experimental design. We demonstrate the value of this approach by applying it
to a set of experiments and find that our method would have reduced the
variance of the treatment effect estimate by 10%-50% compared to simple
randomization in our empirical applications. When combined with other standard
stratification methods, it can be used to further improve efficiency. Our
results demonstrate that LLM-based simulation is a practical and
easy-to-implement way to improve experimental design in covariate-rich
settings.
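
The LLM step is not reproduced here; assuming one already has model-generated
outcome predictions for each experimental unit, a simple way to turn them into
a design is to form strata by k-means on the predictions and randomize within
strata, as in the hypothetical sketch below.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(7)
    n = 200
    pred = rng.normal(size=(n, 1))           # placeholder for LLM-synthesized outcome predictions

    # Strata from the predictions, then a 50/50 randomization within each stratum
    strata = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(pred)
    treat = np.zeros(n, dtype=int)
    for s in np.unique(strata):
        idx = rng.permutation(np.flatnonzero(strata == s))
        treat[idx[: len(idx) // 2]] = 1
    print(np.bincount(strata), treat.mean())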

arXiv link: http://arxiv.org/abs/2509.25709v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-09-29

Efficient Difference-in-Differences Estimation when Outcomes are Missing at Random

Authors: Lorenzo Testa, Edward H. Kennedy, Matthew Reimherr

The Difference-in-Differences (DiD) method is a fundamental tool for causal
inference, yet its application is often complicated by missing data. Although
recent work has developed robust DiD estimators for complex settings like
staggered treatment adoption, these methods typically assume complete data and
fail to address the critical challenge of outcomes that are missing at random
(MAR) -- a common problem that invalidates standard estimators. We develop a
rigorous framework, rooted in semiparametric theory, for identifying and
efficiently estimating the Average Treatment Effect on the Treated (ATT) when
either pre- or post-treatment (or both) outcomes are missing at random. We
first establish nonparametric identification of the ATT under two minimal sets
of sufficient conditions. For each, we derive the semiparametric efficiency
bound, which provides a formal benchmark for asymptotic optimality. We then
propose novel estimators that are asymptotically efficient, achieving this
theoretical bound. A key feature of our estimators is their multiple
robustness, which ensures consistency even if some nuisance function models are
misspecified. We validate the properties of our estimators and showcase their
broad applicability through an extensive simulation study.

arXiv link: http://arxiv.org/abs/2509.25009v1

Econometrics arXiv paper, submitted: 2025-09-29

Nowcasting and aggregation: Why small Euro area countries matter

Authors: Andrii Babii, Luca Barbaglia, Eric Ghysels, Jonas Striaukas

The paper studies the nowcasting of Euro area Gross Domestic Product (GDP)
growth using mixed data sampling machine learning panel data regressions with
both standard macro releases and daily news data. Using a panel of 19 Euro area
countries, we investigate whether directly nowcasting the Euro area aggregate
is better than weighted individual country nowcasts. Our results highlight the
importance of the information from small- and medium-sized countries,
particularly when including the COVID-19 pandemic period. The empirical
analysis is supplemented by studying the so-called Big Four -- France, Germany,
Italy, and Spain -- and the value added of news data when official statistics
are lagging. From a theoretical perspective, we formally show that the
aggregation of individual components forecasted with pooled panel data
regressions is superior to direct aggregate forecasting due to lower estimation
error.

arXiv link: http://arxiv.org/abs/2509.24780v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-09-29

Robust Semiparametric Inference for Bayesian Additive Regression Trees

Authors: Christoph Breunig, Ruixuan Liu, Zhengfei Yu

We develop a semiparametric framework for inference on the mean response in
missing-data settings using a corrected posterior distribution. Our approach is
tailored to Bayesian Additive Regression Trees (BART), which is a powerful
predictive method but whose nonsmoothness complicates asymptotic theory with
multi-dimensional covariates. When using BART combined with Bayesian bootstrap
weights, we establish a new Bernstein-von Mises theorem and show that the limit
distribution generally contains a bias term. To address this, we introduce
RoBART, a posterior bias-correction that robustifies BART for valid inference
on the mean response. Monte Carlo studies support our theory, demonstrating
reduced bias and improved coverage relative to existing procedures using BART.

arXiv link: http://arxiv.org/abs/2509.24634v1

Econometrics arXiv updated paper (originally submitted: 2025-09-27)

Automatic Order, Bandwidth Selection and Flaws of Eigen Adjustment in HAC Estimation

Authors: Zhuoxun Li, Clifford M. Hurvich

In this paper, we propose a new heteroskedasticity and autocorrelation
consistent covariance matrix estimator based on the prewhitened kernel
estimator and a localized leave-one-out frequency domain cross-validation
(FDCV). We adapt the cross-validated log likelihood (CVLL) function to
simultaneously select the order of the prewhitening vector autoregression (VAR)
and the bandwidth. The prewhitening VAR is estimated by the Burg method without
eigen adjustment as we find the eigen adjustment rule of Andrews and Monahan
(1992) can be triggered unnecessarily and harmfully when regressors have
nonzero mean. Through Monte Carlo simulations and three empirical examples, we
illustrate the flaws of eigen adjustment and the reliability of our method.

arXiv link: http://arxiv.org/abs/2509.23256v2

Econometrics arXiv paper, submitted: 2025-09-27

Nonparametric and Semiparametric Estimation of Upward Rank Mobility Curves

Authors: Tsung-Chih Lai, Jia-Han Shih, Yi-Hau Chen

We introduce the upward rank mobility curve as a new measure of
intergenerational mobility that captures upward movements across the entire
parental income distribution. Our approach extends Bhattacharya and Mazumder
(2011) by conditioning on a single parental income rank, thereby eliminating
aggregation bias. We show that the measure can be characterized solely by the
copula of parent and child income, and we propose a nonparametric copula-based
estimator with better properties than kernel-based alternatives. For a
conditional version of the measure without such a representation, we develop a
two-step semiparametric estimator based on distribution regression and
establish its asymptotic properties. An application to U.S. data reveals that
whites exhibit significant upward mobility dominance over blacks among
lower-middle-income families.
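
A crude nonparametric stand-in for the object of interest (not the copula-based
or distribution-regression estimators proposed in the paper): among families
whose parental income rank lies near a given rank, compute the share of
children whose rank exceeds their parent's. The data and window width below are
placeholders.

    import numpy as np
    from scipy.stats import rankdata

    rng = np.random.default_rng(8)
    n = 5000
    parent = rng.lognormal(size=n)
    child = 0.4 * np.log(parent) + rng.normal(size=n)   # hypothetical income process

    pr = rankdata(parent) / n                # parental income ranks
    cr = rankdata(child) / n                 # child income ranks

    def upward_mobility(r, window=0.05):
        near = np.abs(pr - r) < window       # families with parental rank near r
        return np.mean(cr[near] > pr[near])

    grid = np.linspace(0.1, 0.9, 9)
    print([round(upward_mobility(r), 3) for r in grid])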

arXiv link: http://arxiv.org/abs/2509.23174v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2025-09-26

Differentially Private Two-Stage Gradient Descent for Instrumental Variable Regression

Authors: Haodong Liang, Yanhao Jin, Krishnakumar Balasubramanian, Lifeng Lai

We study instrumental variable regression (IVaR) under differential privacy
constraints. Classical IVaR methods (like two-stage least squares regression)
rely on solving moment equations that directly use sensitive covariates and
instruments, creating significant risks of privacy leakage and posing
challenges in designing algorithms that are both statistically efficient and
differentially private. We propose a noisy two-stage gradient descent algorithm
that ensures $\rho$-zero-concentrated differential privacy by injecting
carefully calibrated noise into the gradient updates. Our analysis establishes
finite-sample convergence rates for the proposed method, showing that the
algorithm achieves consistency while preserving privacy. In particular, we
derive precise bounds quantifying the trade-off among privacy parameters,
sample size, and iteration-complexity. To the best of our knowledge, this is
the first work to provide both privacy guarantees and provable convergence
rates for instrumental variable regression in linear models. We further
validate our theoretical findings with experiments on both synthetic and real
datasets, demonstrating that our method offers practical accuracy-privacy
trade-offs.
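
A heavily simplified sketch of the flavor of such an algorithm in a univariate
just-identified design: run noisy gradient descent on the first-stage
regression of the endogenous regressor on the instrument, then on the
second-stage regression of the outcome on the fitted regressor. The clipping
rule and noise scale are placeholders, not the calibrated zCDP mechanism of the
paper.

    import numpy as np

    rng = np.random.default_rng(9)
    n = 2000
    z = rng.normal(size=n)                   # instrument
    d = 0.8 * z + rng.normal(size=n)         # endogenous regressor
    y = 1.5 * d + rng.normal(size=n)         # outcome

    def noisy_gd(x, t, sigma, clip=1.0, eta=0.1, steps=200):
        """Noisy gradient descent for a univariate least-squares slope."""
        b = 0.0
        for _ in range(steps):
            g = -(t - b * x) * x                         # per-observation gradients
            g = np.clip(g, -clip, clip)                  # bound gradient sensitivity
            b -= eta * (g.mean() + rng.normal(scale=sigma / len(x)))
        return b

    pi_hat = noisy_gd(z, d, sigma=1.0)                   # stage 1: first-stage slope
    beta_hat = noisy_gd(pi_hat * z, y, sigma=1.0)        # stage 2: outcome on fitted regressor
    print(beta_hat)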

arXiv link: http://arxiv.org/abs/2509.22794v1

Econometrics arXiv paper, submitted: 2025-09-26

Direct Bias-Correction Term Estimation for Propensity Scores and Average Treatment Effect Estimation

Authors: Masahiro Kato

This study considers the estimation of the average treatment effect (ATE).
For ATE estimation, we estimate the propensity score through direct
bias-correction term estimation. Let $\{(X_i, D_i, Y_i)\}_{i=1}^{n}$ be the
observations, where $X_i \in \mathbb{R}^p$ denotes $p$-dimensional covariates,
$D_i \in \{0, 1\}$ denotes a binary treatment assignment indicator, and $Y_i
\in \mathbb{R}$ is an outcome. In ATE estimation, the bias-correction term
$h_0(X_i, D_i) = \frac{1[D_i = 1]}{e_0(X_i)} - \frac{1[D_i = 0]}{1 - e_0(X_i)}$
plays an important role, where $e_0(X_i)$ is the propensity score, the
probability of being assigned treatment $1$. In this study, we propose
estimating $h_0$ (or equivalently the propensity score $e_0$) by directly
minimizing the prediction error of $h_0$. Since the bias-correction term $h_0$
is essential for ATE estimation, this direct approach is expected to improve
estimation accuracy for the ATE. For example, existing studies often employ
maximum likelihood or covariate balancing to estimate $e_0$, but these
approaches may not be optimal for accurately estimating $h_0$ or the ATE. We
present a general framework for this direct bias-correction term estimation
approach from the perspective of Bregman divergence minimization and conduct
simulation studies to evaluate the effectiveness of the proposed method.
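
To see where the bias-correction term enters, the generic AIPW sketch below
plugs an estimated h(X, D) into the usual doubly robust ATE formula. The
nuisance models are simple linear and logistic fits, and h is obtained
indirectly through the propensity score, unlike the direct Bregman-divergence
approach proposed in the paper.

    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression

    rng = np.random.default_rng(10)
    n = 2000
    x = rng.normal(size=(n, 3))
    e0 = 1 / (1 + np.exp(-x[:, 0]))          # true propensity score
    d = rng.binomial(1, e0)
    y = x @ np.array([1.0, 0.5, -0.5]) + 1.0 * d + rng.normal(size=n)

    # Nuisance fits: outcome regressions and propensity score
    mu1 = LinearRegression().fit(x[d == 1], y[d == 1]).predict(x)
    mu0 = LinearRegression().fit(x[d == 0], y[d == 0]).predict(x)
    e_hat = LogisticRegression().fit(x, d).predict_proba(x)[:, 1]

    # Bias-correction term h(X, D) = 1{D=1}/e(X) - 1{D=0}/(1 - e(X))
    h = d / e_hat - (1 - d) / (1 - e_hat)
    mu_d = np.where(d == 1, mu1, mu0)
    print(np.mean(mu1 - mu0 + h * (y - mu_d)))   # AIPW estimate of the ATE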

arXiv link: http://arxiv.org/abs/2509.22122v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2025-09-25

Inverse Reinforcement Learning Using Just Classification and a Few Regressions

Authors: Lars van der Laan, Nathan Kallus, Aurélien Bibaut

Inverse reinforcement learning (IRL) aims to explain observed behavior by
uncovering an underlying reward. In the maximum-entropy or
Gumbel-shocks-to-reward frameworks, this amounts to fitting a reward function
and a soft value function that together satisfy the soft Bellman consistency
condition and maximize the likelihood of observed actions. While this
perspective has had enormous impact in imitation learning for robotics and
understanding dynamic choices in economics, practical learning algorithms often
involve delicate inner-loop optimization, repeated dynamic programming, or
adversarial training, all of which complicate the use of modern, highly
expressive function approximators like neural nets and boosting. We revisit
softmax IRL and show that the population maximum-likelihood solution is
characterized by a linear fixed-point equation involving the behavior policy.
This observation reduces IRL to two off-the-shelf supervised learning problems:
probabilistic classification to estimate the behavior policy, and iterative
regression to solve the fixed point. The resulting method is simple and modular
across function approximation classes and algorithms. We provide a precise
characterization of the optimal solution, a generic oracle-based algorithm,
finite-sample error bounds, and empirical results showing competitive or
superior performance to MaxEnt IRL.

arXiv link: http://arxiv.org/abs/2509.21172v1

Econometrics arXiv paper, submitted: 2025-09-25

Overidentification testing with weak instruments and heteroskedasticity

Authors: Stuart Lane, Frank Windmeijer

Exogeneity is key for IV estimators and can be assessed via
overidentification (OID) tests. We discuss the Kleibergen-Paap (KP) rank test
as a heteroskedasticity-robust OID test and compare to the typical J-test. We
derive the heteroskedastic weak-instrument limiting distributions for J and KP
as special cases of the robust score test estimated via 2SLS and LIML
respectively. Monte Carlo simulations show that KP usually performs better than
J, which is prone to severe size distortions. Test size depends on model
parameters not consistently estimable with weak instruments, so a conservative
approach is recommended. This generalises recommendations to use LIML-based OID
tests under homoskedasticity. We then revisit the classic problem of estimating
the elasticity of intertemporal substitution (EIS) in lifecycle consumption
models. Lagged macroeconomic indicators should provide naturally valid but
frequently weak instruments. The literature provides a wide range of estimates
for this parameter. J frequently rejects the null of valid instruments whereas
KP does not; we suggest that J over-rejects, sometimes severely. We argue that
the KP test should be used over the J-test, and that instrument
invalidity/misspecification is unlikely to be the cause of the range of EIS
estimates in the literature.
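
For reference, the sketch below computes a heteroskedasticity-robust Hansen J
statistic from 2SLS residuals in a simulated over-identified design; the KP
rank test itself is more involved and is not reproduced. Instrument strength
and the error structure are placeholders.

    import numpy as np
    from scipy.stats import chi2

    rng = np.random.default_rng(11)
    n, L = 500, 3                            # L instruments, one endogenous regressor
    Z = rng.normal(size=(n, L))
    d = Z @ np.array([0.3, 0.2, 0.1]) + rng.normal(size=n)
    y = 1.0 * d + rng.normal(size=n) * (1 + 0.5 * np.abs(Z[:, 0]))  # heteroskedastic errors

    # 2SLS with a single endogenous regressor and no exogenous controls
    Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    beta = (d @ Pz @ y) / (d @ Pz @ d)
    u = y - d * beta

    # Heteroskedasticity-robust J statistic, chi-square with L - 1 degrees of freedom
    gbar = Z.T @ u / n
    Omega = (Z * (u ** 2)[:, None]).T @ Z / n
    J = n * gbar @ np.linalg.solve(Omega, gbar)
    print(J, 1 - chi2.cdf(J, df=L - 1))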

arXiv link: http://arxiv.org/abs/2509.21096v1

Econometrics arXiv paper, submitted: 2025-09-25

Recidivism and Peer Influence with LLM Text Embeddings in Low Security Correctional Facilities

Authors: Shanjukta Nath, Jiwon Hong, Jae Ho Chang, Keith Warren, Subhadeep Paul

We find that embeddings of 80,000-120,000 written affirmations and correction
exchanges among residents in low-security correctional facilities, obtained
using a pre-trained transformer-based Large Language Model (LLM), are highly
predictive of recidivism. The prediction accuracy is 30% higher with embedding
vectors than with only pre-entry covariates. However, since the text embedding
vectors are high-dimensional, we perform Zero-Shot classification of these
texts to a low-dimensional vector of user-defined classes to aid interpretation
while retaining the predictive power. To shed light on the social dynamics
inside the correctional facilities, we estimate peer effects in these
LLM-generated numerical representations of language with a multivariate peer
effect model, adjusting for network endogeneity. We develop new methodology and
theory for peer effect estimation that accommodate sparse networks,
multivariate latent variables, and correlated multivariate outcomes. With these
new methods, we find significant peer effects in language usage for interaction
and feedback.

arXiv link: http://arxiv.org/abs/2509.20634v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-09-24

Identification and Semiparametric Estimation of Conditional Means from Aggregate Data

Authors: Cory McCartan, Shiro Kuriwaki

We introduce a new method for estimating the mean of an outcome variable
within groups when researchers only observe the average of the outcome and
group indicators across a set of aggregation units, such as geographical areas.
Existing methods for this problem, also known as ecological inference,
implicitly make strong assumptions about the aggregation process. We first
formalize weaker conditions for identification, which motivates estimators that
can efficiently control for many covariates. We propose a debiased machine
learning estimator that is based on nuisance functions restricted to a
partially linear form. Our estimator also admits a semiparametric sensitivity
analysis for violations of the key identifying assumption, as well as
asymptotically valid confidence intervals for local, unit-level estimates under
additional assumptions. Simulations and validation on real-world data where
ground truth is available demonstrate the advantages of our approach over
existing methods. Open-source software is available which implements the
proposed methods.

arXiv link: http://arxiv.org/abs/2509.20194v1

Econometrics arXiv paper, submitted: 2025-09-24

Identification and Estimation of Seller Risk Aversion in Ascending Auctions

Authors: Nathalie Gimenes, Tonghui Qi, Sorawoot Srisuma

How sellers choose reserve prices is central to auction theory, and the
optimal reserve price depends on the seller's risk attitude. Numerous studies
have found that observed reserve prices lie below the optimal level implied by
risk-neutral sellers, while the theoretical literature suggests that
risk-averse sellers can rationalize these empirical findings. In this paper, we
develop an econometric model of ascending auctions with a risk-averse seller
under independent private values. We provide primitive conditions for the
identification of the Arrow-Pratt measures of risk aversion and an estimator
for these measures that is consistent and converges in distribution to a normal
distribution at the parametric rate under standard regularity conditions. A
Monte Carlo study demonstrates good finite-sample performance of the estimator,
and we illustrate the approach using data from foreclosure real estate auctions
in São Paulo.

arXiv link: http://arxiv.org/abs/2509.19945v1

Econometrics arXiv paper, submitted: 2025-09-24

Decomposing Co-Movements in Matrix-Valued Time Series: A Pseudo-Structural Reduced-Rank Approach

Authors: Alain Hecq, Ivan Ricardo, Ines Wilms

We propose a pseudo-structural framework for analyzing contemporaneous
co-movements in reduced-rank matrix autoregressive (RRMAR) models. Unlike
conventional vector-autoregressive (VAR) models that would discard the matrix
structure, our formulation preserves it, enabling a decomposition of
co-movements into three interpretable components: row-specific,
column-specific, and joint (row-column) interactions across the matrix-valued
time series. Our estimator admits standard asymptotic inference and we propose
a BIC-type criterion for the joint selection of the reduced ranks and the
autoregressive lag order. We validate the method's finite-sample performance in
terms of estimation accuracy, coverage and rank selection in simulation
experiments, including cases of rank misspecification. We illustrate the
method's practical usefulness in identifying co-movement structures in two
empirical applications: U.S. state-level coincident and leading indicators, and
cross-country macroeconomic indicators.

arXiv link: http://arxiv.org/abs/2509.19911v1

Econometrics arXiv paper, submitted: 2025-09-23

Driver Identification and PCA Augmented Selection Shrinkage Framework for Nordic System Price Forecasting

Authors: Yousef Adeli Sadabad, Mohammad Reza Hesamzadeh, Gyorgy Dan, Matin Bagherpour, Darryl R. Biggar

The System Price (SP) of the Nordic electricity market serves as a key
reference for financial hedge contracts such as Electricity Price Area
Differentials (EPADs) and other risk management instruments. Therefore, the
identification of drivers and the accurate forecasting of SP are essential for
market participants to design effective hedging strategies. This paper develops
a systematic framework that combines interpretable driver analysis with robust
forecasting methods. It proposes an interpretable feature engineering algorithm
to identify the main drivers of the Nordic SP based on a novel combination of
K-means clustering, Multiple Seasonal-Trend Decomposition (MSTD), and Seasonal
Autoregressive Integrated Moving Average (SARIMA) model. Then, it applies
principal component analysis (PCA) to the identified data matrix, which is
adapted to the downstream task of price forecasting to mitigate the issue of
imperfect multicollinearity in the data. Finally, we propose a multi-forecast
selection-shrinkage algorithm for Nordic SP forecasting, which selects a subset
of complementary forecast models based on their bias-variance tradeoff at the
ensemble level and then computes the optimal weights for the retained forecast
models to minimize the error variance of the combined forecast. Using
historical data from the Nordic electricity market, we demonstrate that the
proposed approach outperforms individual input models uniformly, robustly, and
significantly, while maintaining a comparable computational cost. Notably, our
systematic framework produces superior results using simple input models,
outperforming the state-of-the-art Temporal Fusion Transformer (TFT).
Furthermore, we show that our approach also exceeds the performance of several
well-established practical forecast combination methods.
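
The final combination step can be illustrated in a few lines: given the
forecast errors of the retained models, the weights minimizing the variance of
the combined error (subject to summing to one) are obtained from the inverse
error covariance. This classical variance-minimizing combination is shown only
as context for the selection-shrinkage algorithm; the data are simulated.

    import numpy as np

    rng = np.random.default_rng(12)
    T, M = 300, 4                            # T periods, M retained forecast models

    # Hypothetical forecast errors of the retained models (correlated across models)
    errors = rng.multivariate_normal(np.zeros(M), 0.5 * np.eye(M) + 0.5, size=T)

    Sigma = np.cov(errors, rowvar=False)
    ones = np.ones(M)

    # Variance-minimizing combination weights subject to summing to one
    w = np.linalg.solve(Sigma, ones)
    w = w / w.sum()
    print(w.round(3))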

arXiv link: http://arxiv.org/abs/2509.18887v1

Econometrics arXiv paper, submitted: 2025-09-23

Optimal estimation for regression discontinuity design with binary outcomes

Authors: Takuya Ishihara, Masayuki Sawada, Kohei Yata

We develop a finite-sample optimal estimator for regression discontinuity
designs when the outcomes are bounded, including binary outcomes as the leading
case. Our finite-sample optimal estimator achieves the exact minimax mean
squared error among linear shrinkage estimators with nonnegative weights when
the regression function of a bounded outcome lies in a Lipschitz class.
Although the original minimax problem involves an iterated $(n+1)$-dimensional
non-convex optimization problem, where $n$ is the sample size, we show that our
estimator is obtained by solving a convex optimization problem. A key advantage
of our estimator is that the Lipschitz constant is the only tuning parameter.
We also propose a uniformly valid inference procedure without a large-sample
approximation. In a simulation exercise for small samples, our estimator
exhibits smaller mean squared errors and shorter confidence intervals than
conventional large-sample techniques which may be unreliable when the effective
sample size is small. We apply our method to an empirical multi-cutoff design
where the sample size for each cutoff is small. In the application, our method
yields informative confidence intervals, in contrast to the leading
large-sample approach.

arXiv link: http://arxiv.org/abs/2509.18857v1

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2025-09-23

Filtering amplitude dependence of correlation dynamics in complex systems: application to the cryptocurrency market

Authors: Marcin Wątorek, Marija Bezbradica, Martin Crane, Jarosław Kwapień, Stanisław Drożdż

Based on the cryptocurrency market dynamics, this study presents a general
methodology for analyzing evolving correlation structures in complex systems
using the $q$-dependent detrended cross-correlation coefficient $\rho(q,s)$. By
extending traditional metrics, this approach captures correlations at varying
fluctuation amplitudes and time scales. The method employs $q$-dependent
minimum spanning trees ($q$MSTs) to visualize evolving network structures.
Using minute-by-minute exchange rate data for 140 cryptocurrencies on Binance
(Jan 2021-Oct 2024), a rolling window analysis reveals significant shifts in
$q$MSTs, notably around April 2022 during the Terra/Luna crash. Initially
centralized around Bitcoin (BTC), the network later decentralized, with
Ethereum (ETH) and others gaining prominence. Spectral analysis confirms BTC's
declining dominance and increased diversification among assets. A key finding
is that medium-scale fluctuations exhibit stronger correlations than
large-scale ones, with $q$MSTs based on the latter being more decentralized.
Properly exploiting such facts may offer the possibility of a more flexible
optimal portfolio construction. Distance metrics highlight that major
disruptions amplify correlation differences, leading to fully decentralized
structures during crashes. These results demonstrate $q$MSTs' effectiveness in
uncovering fluctuation-dependent correlations, with potential applications
beyond finance, including biology, social and other complex systems.

arXiv link: http://arxiv.org/abs/2509.18820v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2025-09-22

An Econometric Analysis of the Impact of Telecare on the Length of Stay in Hospital

Authors: Kevin Momanyi

In this paper, we develop a theoretical model that links the demand for
telecare to the length of stay in hospital and formulate three models that can
be used to derive the treatment effect by making various assumptions about the
probability distribution of the outcome measure. We then fit the models to data
and estimate them using a strategy that controls for the effects of confounding
variables and unobservable factors, and compare the treatment effects with that
of the Propensity Score Matching (PSM) technique which adopts a
quasi-experimental study design. To ensure comparability, the covariates are
kept identical in all cases. An important finding that emerges from our
analysis is that the treatment effects derived from our econometric models of
interest are better than those obtained from an experimental study design, as the
latter does not account for all the relevant unobservable factors. In
particular, the results show that estimating the treatment effect of telecare
in the way that an experimental study design entails fails to account for the
systematic variations in individuals' health production functions within each
experimental arm.

arXiv link: http://arxiv.org/abs/2509.22706v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2025-09-22

Functional effects models: Accounting for preference heterogeneity in panel data with machine learning

Authors: Nicolas Salvadé, Tim Hillel

In this paper, we present a general specification for Functional Effects
Models, which use Machine Learning (ML) methodologies to learn
individual-specific preference parameters from socio-demographic
characteristics, therefore accounting for inter-individual heterogeneity in
panel choice data. We identify three specific advantages of the Functional
Effects Model over traditional fixed, and random/mixed effects models: (i) by
mapping individual-specific effects as a function of socio-demographic
variables, we can account for these effects when forecasting choices of
previously unobserved individuals; (ii) the (approximate) maximum-likelihood
estimation of functional effects avoids the incidental parameters problem of
the fixed effects model, even when the number of observed choices per
individual is small; and (iii) we do not rely on the strong distributional
assumptions of the random effects model, which may not match reality. We learn
functional intercept and functional slopes with powerful non-linear machine
learning regressors for tabular data, namely gradient boosting decision trees
and deep neural networks. We validate our proposed methodology on a synthetic
experiment and three real-world panel case studies, demonstrating that the
Functional Effects Model: (i) can identify the true values of
individual-specific effects when the data generation process is known; (ii)
outperforms both state-of-the-art ML choice modelling techniques that omit
individual heterogeneity in terms of predictive performance, as well as
traditional static panel choice models in terms of learning inter-individual
heterogeneity. The results indicate that the FI-RUMBoost model, which combines
the individual-specific constants of the Functional Effects Model with the
complex, non-linear utilities of RUMBoost, performs marginally best on
large-scale revealed preference panel data.

arXiv link: http://arxiv.org/abs/2509.18047v1

Econometrics arXiv paper, submitted: 2025-09-22

Local Projections Bootstrap Inference

Authors: María Dolores Gadea, Òscar Jordà

Bootstrap procedures for local projections typically rely on assuming that
the data generating process (DGP) is a finite order vector autoregression
(VAR), often taken to be that implied by the local projection at horizon 1.
Although convenient, it is well documented that a VAR can be a poor
approximation to impulse dynamics at horizons beyond its lag length. In this
paper we assume instead that the precise form of the parametric model
generating the data is not known. If one is willing to assume that the DGP is
perhaps an infinite order process, a larger class of models can be accommodated
and more tailored bootstrap procedures can be constructed. Using the moving
average representation of the data, we construct appropriate bootstrap
procedures.
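
For readers unfamiliar with local projections, the sketch below estimates an
impulse response by regressing y_{t+h} on a period-t shock and one lag of y for
each horizon h. It shows only the point estimation on simulated data; the
moving-average-based bootstrap developed in the paper is not reproduced.

    import numpy as np

    rng = np.random.default_rng(13)
    T, H = 400, 8
    shock = rng.normal(size=T)
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = 0.6 * y[t - 1] + shock[t] + 0.3 * rng.normal()

    def local_projection(y, shock, H):
        """IRF at horizons 0..H: regress y_{t+h} on shock_t and one lag of y."""
        T = len(y)
        irf = np.zeros(H + 1)
        for h in range(H + 1):
            yh = y[1 + h : T]
            X = np.column_stack([np.ones(T - h - 1), shock[1 : T - h], y[0 : T - h - 1]])
            b, *_ = np.linalg.lstsq(X, yh, rcond=None)
            irf[h] = b[1]
        return irf

    print(local_projection(y, shock, H).round(2))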

arXiv link: http://arxiv.org/abs/2509.17949v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-09-22

Bayesian Semi-supervised Inference via a Debiased Modeling Approach

Authors: Gözde Sert, Abhishek Chakrabortty, Anirban Bhattacharya

Inference in semi-supervised (SS) settings has gained substantial attention
in recent years due to increased relevance in modern big-data problems. In a
typical SS setting, there is a much larger-sized unlabeled data, containing
only observations of predictors, and a moderately sized labeled data containing
observations for both an outcome and the set of predictors. Such data naturally
arises when the outcome, unlike the predictors, is costly or difficult to
obtain. One of the primary statistical objectives in SS settings is to explore
whether parameter estimation can be improved by exploiting the unlabeled data.
We propose a novel Bayesian method for estimating the population mean in SS
settings. The approach yields estimators that are both efficient and optimal
for estimation and inference. The method itself has several interesting
artifacts. The central idea behind the method is to model certain summary
statistics of the data in a targeted manner, rather than the entire raw data
itself, along with a novel Bayesian notion of debiasing. Specifying appropriate
summary statistics crucially relies on a debiased representation of the
population mean that incorporates unlabeled data through a flexible nuisance
function while also learning its estimation bias. Combined with careful usage
of sample splitting, this debiasing approach mitigates the effect of bias due
to slow rates or misspecification of the nuisance parameter from the posterior
of the final parameter of interest, ensuring its robustness and efficiency.
Concrete theoretical results, via Bernstein--von Mises theorems, are
established, validating all claims, and are further supported through extensive
numerical studies. To our knowledge, this is possibly the first work on
Bayesian inference in SS settings, and its central ideas also apply more
broadly to other Bayesian semi-parametric inference problems.

arXiv link: http://arxiv.org/abs/2509.17385v1

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2025-09-21

Improving S&P 500 Volatility Forecasting through Regime-Switching Methods

Authors: Ava C. Blake, Nivika A. Gandhi, Anurag R. Jakkula

Accurate prediction of financial market volatility is critical for risk
management, derivatives pricing, and investment strategy. In this study, we
propose a multitude of regime-switching methods to improve the prediction of
S&P 500 volatility by capturing structural changes in the market across time.
We use eleven years of SPX data, from May 1st, 2014 to May 27th, 2025, to
compute daily realized volatility (RV) from 5-minute intraday log returns,
adjusted for irregular trading days. To enhance forecast accuracy, we
engineered features to capture both historical dynamics and forward-looking
market sentiment across regimes. The regime-switching methods include a soft
Markov switching algorithm to estimate soft-regime probabilities, a
distributional spectral clustering method that uses XGBoost to assign clusters
at prediction time, and a coefficient-based soft regime algorithm that extracts
HAR coefficients from time segments identified through the Mood test and
clusters them with a Bayesian GMM to obtain soft regime weights, using XGBoost to predict
regime probabilities. Models were evaluated across three time periods--before,
during, and after the COVID-19 pandemic. The coefficient-based clustering
algorithm outperformed all other models, including the baseline autoregressive
model, during all time periods. Additionally, each model was evaluated on its
recursive forecasting performance for 5- and 10-day horizons during each time
period. The findings of this study demonstrate the value of regime-aware
modeling frameworks and soft clustering approaches in improving volatility
forecasting, especially during periods of heightened uncertainty and structural
change.
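
As context for the HAR coefficients mentioned above, the sketch below fits a
plain HAR-RV regression of next-day realized volatility on daily, weekly, and
monthly averages. The regime-switching and clustering layers of the paper are
not reproduced, and the volatility series is a simulated placeholder.

    import numpy as np

    rng = np.random.default_rng(14)
    T = 1000
    rv = np.abs(rng.normal(size=T)) + 0.1    # placeholder daily realized volatility series

    def lagged_mean(x, window, t):
        return x[t - window : t].mean()

    rows, target = [], []
    for t in range(22, T):
        rows.append([1.0, rv[t - 1], lagged_mean(rv, 5, t), lagged_mean(rv, 22, t)])
        target.append(rv[t])
    X, y = np.array(rows), np.array(target)

    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # HAR coefficients: const, daily, weekly, monthly
    print(beta.round(3))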

arXiv link: http://arxiv.org/abs/2510.03236v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2025-09-21

Regularizing Extrapolation in Causal Inference

Authors: David Arbour, Harsh Parikh, Bijan Niknam, Elizabeth Stuart, Kara Rudolph, Avi Feller

Many common estimators in machine learning and causal inference are linear
smoothers, where the prediction is a weighted average of the training outcomes.
Some estimators, such as ordinary least squares and kernel ridge regression,
allow for arbitrarily negative weights, which improve feature imbalance but
often at the cost of increased dependence on parametric modeling assumptions
and higher variance. By contrast, estimators like importance weighting and
random forests (sometimes implicitly) restrict weights to be non-negative,
reducing dependence on parametric modeling and variance at the cost of worse
imbalance. In this paper, we propose a unified framework that directly
penalizes the level of extrapolation, replacing the current practice of a hard
non-negativity constraint with a soft constraint and corresponding
hyperparameter. We derive a worst-case extrapolation error bound and introduce
a novel "bias-bias-variance" tradeoff, encompassing biases due to feature
imbalance, model misspecification, and estimator variance; this tradeoff is
especially pronounced in high dimensions, particularly when positivity is poor.
We then develop an optimization procedure that regularizes this bound while
minimizing imbalance and outline how to use this approach as a sensitivity
analysis for dependence on parametric modeling assumptions. We demonstrate the
effectiveness of our approach through synthetic experiments and a real-world
application, involving the generalization of randomized controlled trial
estimates to a target population of interest.

arXiv link: http://arxiv.org/abs/2509.17180v1

Econometrics arXiv paper, submitted: 2025-09-19

KRED: Korea Research Economic Database for Macroeconomic Research

Authors: Changryong Baek, Seunghyun Moon, Seunghyeon Lee

We introduce KRED (Korea Research Economic Database), a new FRED-MD style
macroeconomic dataset for South Korea. KRED is constructed by aggregating 88
key monthly time series from multiple official sources (e.g., Bank of Korea
ECOS, Statistics Korea KOSIS) into a unified, publicly available database. The
dataset is aligned with the FRED MD format, enabling standardized
transformations and direct comparability; an Appendix maps each Korean series
to its FRED MD counterpart. Using a balanced panel of 80 series from 2009 to
2024, we extract four principal components via PCA that explain approximately
40% of the total variance. These four factors have intuitive economic
interpretations, capturing monetary conditions, labor market activity, real
output, and housing demand, analogous to diffusion indexes summarizing broad
economic movements. Notably, the factor based diffusion indexes derived from
KRED clearly trace major macroeconomic fluctuations over the sample period such
as the 2020 COVID 19 recession. Our results demonstrate that KRED's factor
structure can effectively condense complex economic information into a few
informative indexes, yielding new insights into South Korea's business cycles
and co movements.

arXiv link: http://arxiv.org/abs/2509.16115v1
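
A minimal replication-flavored sketch of the factor extraction step (not the authors' code; `panel` is a hypothetical DataFrame of stationarity-transformed monthly series):

    import pandas as pd
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    def extract_factors(panel: pd.DataFrame, n_factors: int = 4):
        """PCA factors from a balanced T x N panel, plus the variance share they explain."""
        X = StandardScaler().fit_transform(panel.to_numpy())
        pca = PCA(n_components=n_factors)
        factors = pca.fit_transform(X)
        return pd.DataFrame(factors, index=panel.index), pca.explained_variance_ratio_.sum()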

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-09-19

Beyond the Average: Distributional Causal Inference under Imperfect Compliance

Authors: Undral Byambadalai, Tomu Hirata, Tatsushi Oka, Shota Yasui

We study the estimation of distributional treatment effects in randomized
experiments with imperfect compliance. When participants do not adhere to their
assigned treatments, we leverage treatment assignment as an instrumental
variable to identify the local distributional treatment effect -- the difference
in outcome distributions between treatment and control groups for the
subpopulation of compliers. We propose a regression-adjusted estimator based on
a distribution regression framework with Neyman-orthogonal moment conditions,
enabling robustness and flexibility with high-dimensional covariates. Our
approach accommodates continuous, discrete, and mixed discrete-continuous
outcomes, and applies under a broad class of covariate-adaptive randomization
schemes, including stratified block designs and simple random sampling. We
derive the estimator's asymptotic distribution and show that it achieves the
semiparametric efficiency bound. Simulation results demonstrate favorable
finite-sample performance, and we demonstrate the method's practical relevance
in an application to the Oregon Health Insurance Experiment.

arXiv link: http://arxiv.org/abs/2509.15594v1
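
As a rough point of reference (the authors' estimator is regression-adjusted and Neyman-orthogonal; the sketch below is only the unadjusted Wald-type analogue), the complier distributional effect at each threshold divides the difference in outcome-distribution ordinates across instrument arms by the first-stage difference in take-up:

    import numpy as np

    def distributional_late(y, d, z, grid):
        """Unadjusted Wald-type estimate of the complier effect on P(Y <= c) at each c in grid.

        y: outcomes, d: treatment take-up (0/1), z: random assignment (0/1).
        """
        y, d, z = map(np.asarray, (y, d, z))
        first_stage = d[z == 1].mean() - d[z == 0].mean()
        itt = np.array([(y[z == 1] <= c).mean() - (y[z == 0] <= c).mean() for c in grid])
        return itt / first_stage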

Econometrics arXiv paper, submitted: 2025-09-18

Inference on the Distribution of Individual Treatment Effects in Nonseparable Triangular Models

Authors: Jun Ma, Vadim Marmer, Zhengfei Yu

In this paper, we develop inference methods for the distribution of
heterogeneous individual treatment effects (ITEs) in the nonseparable
triangular model with a binary endogenous treatment and a binary instrument of
Vuong and Xu (2017) and Feng, Vuong, and Xu (2019). We focus on the estimation
of the cumulative distribution function (CDF) of the ITE, which can be used to
address a wide range of practically important questions such as inference on
the proportion of individuals with positive ITEs, the quantiles of the
distribution of ITEs, and the interquartile range as a measure of the spread of
the ITEs, as well as comparison of the ITE distributions across
sub-populations. Moreover, our CDF-based approach can deliver more precise
results than the density-based approach previously considered in the literature. We
establish weak convergence to tight Gaussian processes for the empirical CDF
and quantile function computed from nonparametric ITE estimates of Feng, Vuong,
and Xu (2019). Using those results, we develop bootstrap-based nonparametric
inferential methods, including uniform confidence bands for the CDF and
quantile function of the ITE distribution.

arXiv link: http://arxiv.org/abs/2509.15401v1

Econometrics arXiv paper, submitted: 2025-09-18

Efficient and Accessible Discrete Choice Experiments: The DCEtool Package for R

Authors: Daniel Pérez-Troncoso

Discrete Choice Experiments (DCEs) are widely used to elicit preferences for
products or services by analyzing choices among alternatives described by their
attributes. The quality of the insights obtained from a DCE heavily depends on
the properties of its experimental design. While early DCEs often relied on
linear criteria such as orthogonality, these approaches were later found to be
inappropriate for discrete choice models, which are inherently non-linear. As a
result, statistically efficient design methods, based on minimizing the D-error
to reduce parameter variance, have become the standard. Although such methods
are implemented in several commercial tools, researchers seeking free and
accessible solutions often face limitations. This paper presents DCEtool, an R
package with a Shiny-based graphical interface designed to support both novice
and experienced users in constructing, decoding, and analyzing statistically
efficient DCE designs. DCEtool facilitates the implementation of serial DCEs,
offers flexible design settings, and enables rapid estimation of discrete
choice models. By making advanced design techniques more accessible, DCEtool
contributes to the broader adoption of rigorous experimental practices in
choice modelling.

arXiv link: http://arxiv.org/abs/2509.15326v1
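
To make the D-error criterion concrete, the generic computation below (a textbook formula, not DCEtool's internal code) evaluates a candidate design under an MNL model with assumed prior parameters beta, accumulating the information matrix choice set by choice set and returning det(I)^(-1/K):

    import numpy as np

    def d_error(design, beta):
        """D-error of a choice design under a multinomial logit model.

        design: list of (J x K) attribute matrices, one per choice set.
        beta:   length-K vector of assumed parameters.
        """
        beta = np.asarray(beta, dtype=float)
        K = beta.size
        info = np.zeros((K, K))
        for X in design:
            X = np.asarray(X, dtype=float)
            u = X @ beta
            p = np.exp(u - u.max())
            p /= p.sum()                                  # MNL choice probabilities
            info += X.T @ (np.diag(p) - np.outer(p, p)) @ X
        return np.linalg.det(info) ** (-1.0 / K)          # lower is better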

Econometrics arXiv paper, submitted: 2025-09-18

Monetary Policy and Exchange Rate Fluctuations

Authors: Yongheng Hu

In this paper, we model USD-CNY bilateral exchange rate fluctuations as a
general stochastic process and incorporate a monetary policy shock to examine
how bilateral exchange rate fluctuations affect the Revealed Comparative
Advantage (RCA) index. Numerical simulations indicate that as the mean of
bilateral exchange rate fluctuations increases, i.e., currency devaluation, the
RCA index rises. Moreover, smaller bilateral exchange rate fluctuations after
the policy shock cause the RCA index to gradually converge toward its mean
level. For the empirical analysis, we select the USD-CNY bilateral exchange
rate and provincial manufacturing industry export competitiveness data in China
from 2008 to 2021. We find that, in the short term, when exchange rate
fluctuations stabilize within a range of less than 0.2, RMB depreciation will
effectively boost export competitiveness. Then, the 8.11 exchange rate policy
reversed the previous linear trend of the CNY, stabilizing it within a narrow
fluctuation range over the long term. This policy leads to a gradual
convergence of provincial RCA indices toward a relatively high level, which is
consistent with our numerical simulations, and indirectly enhances provincial
export competitiveness.

arXiv link: http://arxiv.org/abs/2509.15169v1
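
For reference, the RCA index used above is conventionally the Balassa index: a region's export share of a good relative to the benchmark's export share of that good, with values above one indicating comparative advantage. A tiny worked example:

    import numpy as np

    def rca(exports):
        """Balassa RCA matrix for a (regions x goods) export matrix."""
        exports = np.asarray(exports, dtype=float)
        region_share = exports / exports.sum(axis=1, keepdims=True)
        benchmark_share = exports.sum(axis=0) / exports.sum()
        return region_share / benchmark_share

    # Two regions, two goods: region 0 specializes in good 0, region 1 in good 1.
    print(rca([[80.0, 20.0], [30.0, 70.0]]))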

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2025-09-18

Forecasting in small open emerging economies: Evidence from Thailand

Authors: Paponpat Taveeapiradeecharoen, Nattapol Aunsri

Forecasting inflation in small open economies is difficult because limited
time series and strong external exposures create an imbalance between few
observations and many potential predictors. We study this challenge using
Thailand as a representative case, combining more than 450 domestic and
international indicators. We evaluate modern Bayesian shrinkage and factor
models, including Horseshoe regressions, factor-augmented autoregressions,
factor-augmented VARs, dynamic factor models, and Bayesian additive regression
trees.
Our results show that factor models dominate at short horizons, when global
shocks and exchange rate movements drive inflation, while shrinkage-based
regressions perform best at longer horizons. These models not only improve
point and density forecasts but also enhance tail-risk performance at the
one-year horizon.
Shrinkage diagnostics additionally reveal that Google Trends variables,
especially those related to food, essential goods, and housing costs,
progressively rotate into predictive importance as the horizon
lengthens. This underscores their role as forward-looking indicators of
household inflation expectations in small open economies.

arXiv link: http://arxiv.org/abs/2509.14805v1

Econometrics arXiv paper, submitted: 2025-09-17

Time-Varying Heterogeneous Treatment Effects in Event Studies

Authors: Irene Botosaru, Laura Liu

This paper examines the identification and estimation of heterogeneous
treatment effects in event studies, emphasizing the importance of both lagged
dependent variables and treatment effect heterogeneity. We show that omitting
lagged dependent variables can induce omitted variable bias in the estimated
time-varying treatment effects. We develop a novel semiparametric approach
based on a short-T dynamic linear panel model with correlated random
coefficients, where the time-varying heterogeneous treatment effects can be
modeled by a time-series process to reduce dimensionality. We construct a
two-step estimator employing quasi-maximum likelihood for common parameters and
empirical Bayes for the heterogeneous treatment effects. The procedure is
flexible, easy to implement, and achieves ratio optimality asymptotically. Our
results also provide insights into common assumptions in the event study
literature, such as no anticipation, homogeneous treatment effects across
treatment timing cohorts, and state dependence structure.

arXiv link: http://arxiv.org/abs/2509.13698v1

Econometrics arXiv paper, submitted: 2025-09-16

Generalized Covariance Estimator under Misspecification and Constraints

Authors: Aryan Manafi Neyazi

This paper investigates the properties of the Generalized Covariance (GCov)
estimator under misspecification and constraints with application to processes
with local explosive patterns, such as causal-noncausal and double
autoregressive (DAR) processes. We show that GCov is consistent and has an
asymptotically normal distribution under misspecification. We then construct
GCov-based Wald-type and score-type tests for testing one specification against
another, all of which follow a $\chi^2$ distribution. Furthermore, we propose the
constrained GCov (CGCov) estimator, which extends the use of the GCov estimator
to a broader range of models with constraints on their parameters. We
investigate the asymptotic distribution of the CGCov estimator when the true
parameters are far from the boundary and on the boundary of the parameter
space. We validate the finite sample performance of the proposed estimators and
tests in the context of causal-noncausal and DAR models. Finally, we provide
two empirical applications by applying the noncausal model to the final energy
demand commodity index and also the DAR model to the US 3-month treasury bill.

arXiv link: http://arxiv.org/abs/2509.13492v1

Econometrics arXiv paper, submitted: 2025-09-16

Dynamic Local Average Treatment Effects in Time Series

Authors: Alessandro Casini, Adam McCloskey, Luca Rolla, Raimondo Pala

This paper discusses identification, estimation, and inference on dynamic
local average treatment effects (LATEs) in instrumental variables (IVs)
settings. First, we show that compliers--observations whose treatment status is
affected by the instrument--can be identified individually in time series data
using smoothness assumptions and local comparisons of treatment assignments.
Second, we show that this result enables not only better interpretability of IV
estimates but also direct testing of the exclusion restriction by comparing
outcomes among identified non-compliers across instrument values. Third, we
document pervasive weak identification in applied work using IVs with time
series data by surveying recent publications in leading economics journals.
However, we find that strong identification often holds in large subsamples for
which the instrument induces changes in the treatment. Motivated by this, we
introduce a method based on dynamic programming to detect the most
strongly-identified subsample and show how to use this subsample to improve
estimation and inference. We also develop new identification-robust inference
procedures that focus on the most strongly-identified subsample, offering
efficiency gains relative to existing full sample identification-robust
inference when identification fails over parts of the sample. Finally, we apply
our results to heteroskedasticity-based identification of monetary policy
effects. We find that about 75% of observations are compliers (i.e., cases
where the variance of the policy shifts up on FOMC announcement days), and we
fail to reject the exclusion restriction. Estimation using the most
strongly-identified subsample helps reconcile conflicting IV and GMM estimates
in the literature.

arXiv link: http://arxiv.org/abs/2509.12985v1

Econometrics arXiv paper, submitted: 2025-09-16

Policy-relevant causal effect estimation using instrumental variables with interference

Authors: Didier Nibbering, Matthijs Oosterveen

Many policy evaluations using instrumental variable (IV) methods include
individuals who interact with each other, potentially violating the standard IV
assumptions. This paper defines and partially identifies direct and spillover
effects with a clear policy-relevant interpretation under relatively mild
assumptions on interference. Our framework accommodates both spillovers from
the instrument to treatment and from treatment to outcomes and allows for
multiple peers. By generalizing monotone treatment response and selection
assumptions, we derive informative bounds on policy-relevant effects without
restricting the type or direction of interference. The results extend IV
estimation to more realistic social contexts, informing program evaluation and
treatment scaling when interference is present.

arXiv link: http://arxiv.org/abs/2509.12538v1

Econometrics arXiv paper, submitted: 2025-09-15

A Decision Theoretic Perspective on Artificial Superintelligence: Coping with Missing Data Problems in Prediction and Treatment Choice

Authors: Jeff Dominitz, Charles F. Manski

Enormous attention and resources are being devoted to the quest for
artificial general intelligence and, even more ambitiously, artificial
superintelligence. We wonder about the implications for our methodological
research, which aims to help decision makers cope with what econometricians
call identification problems, inferential problems in empirical research that
do not diminish as sample size grows. Of particular concern are missing data
problems in prediction and treatment choice. Essentially all data collection
intended to inform decision making is subject to missing data, which gives rise
to identification problems. Thus far, we see no indication that the current
dominant architecture of machine learning (ML)-based artificial intelligence
(AI) systems will outperform humans in this context. In this paper, we explain
why we have reached this conclusion and why we see the missing data problem as
a cautionary case study in the quest for superintelligence more generally. We
first discuss the concept of intelligence, before presenting a
decision-theoretic perspective that formalizes the connection between
intelligence and identification problems. We next apply this perspective to two
leading cases of missing data problems. Then we explain why we are skeptical
that AI research is currently on a path toward machines doing better than
humans at solving these identification problems.

arXiv link: http://arxiv.org/abs/2509.12388v1

Econometrics arXiv paper, submitted: 2025-09-15

Fairness-Aware and Interpretable Policy Learning

Authors: Nora Bearth, Michael Lechner, Jana Mareckova, Fabian Muny

Fairness and interpretability play an important role in the adoption of
decision-making algorithms across many application domains. These requirements
are intended to avoid undesirable group differences and to alleviate concerns
related to transparency. This paper proposes a framework that integrates
fairness and interpretability into algorithmic decision making by combining
data transformation with policy trees, a class of interpretable policy
functions. The approach is based on pre-processing the data to remove
dependencies between sensitive attributes and decision-relevant features,
followed by a tree-based optimization to obtain the policy. Since data
pre-processing compromises interpretability, an additional transformation maps
the parameters of the resulting tree back to the original feature space. This
procedure enhances fairness by yielding policy allocations that are pairwise
independent of sensitive attributes, without sacrificing interpretability.
Using administrative data from Switzerland to analyze the allocation of
unemployed individuals to active labor market programs (ALMP), the framework is
shown to perform well in a realistic policy setting. Effects of integrating
fairness and interpretability constraints are measured through the change in
expected employment outcomes. The results indicate that, for this particular
application, fairness can be substantially improved at relatively low cost.

arXiv link: http://arxiv.org/abs/2509.12119v1
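
The pre-processing step described above can be illustrated, very roughly, by linear residualization: each decision-relevant feature is replaced by the part of it that is orthogonal to the sensitive attributes before any policy tree is fitted. This is only a simplified stand-in for the paper's transformation, and the variable names are hypothetical.

    import numpy as np

    def residualize(features, sensitive):
        """Remove the linear dependence of each feature on the sensitive attributes.

        features:  (n x p) decision-relevant covariates.
        sensitive: (n x s) sensitive attributes (e.g., dummy-coded group membership).
        """
        X = np.asarray(features, dtype=float)
        S = np.column_stack([np.ones(len(X)), np.asarray(sensitive, dtype=float)])
        coef, *_ = np.linalg.lstsq(S, X, rcond=None)
        # Residuals are uncorrelated with the sensitive attributes by construction;
        # adding back the column means keeps the original location of each feature.
        return X - S @ coef + X.mean(axis=0)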

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2025-09-14

The Honest Truth About Causal Trees: Accuracy Limits for Heterogeneous Treatment Effect Estimation

Authors: Matias D. Cattaneo, Jason M. Klusowski, Ruiqi Rae Yu

Recursive decision trees have emerged as a leading methodology for
heterogeneous causal treatment effect estimation and inference in experimental
and observational settings. These procedures are fitted using the celebrated
CART (Classification And Regression Tree) algorithm [Breiman et al., 1984], or
custom variants thereof, and hence are believed to be "adaptive" to
high-dimensional data, sparsity, or other specific features of the underlying
data generating process. Athey and Imbens [2016] proposed several "honest"
causal decision tree estimators, which have become the standard in both
academia and industry. We study their estimators, and variants thereof, and
establish lower bounds on their estimation error. We demonstrate that these
popular heterogeneous treatment effect estimators cannot achieve a
polynomial-in-$n$ convergence rate under basic conditions, where $n$ denotes
the sample size. Contrary to common belief, honesty does not resolve these
limitations and at best delivers negligible logarithmic improvements in sample
size or dimension. As a result, these commonly used estimators can exhibit poor
performance in practice, and even be inconsistent in some settings. Our
theoretical insights are empirically validated through simulations.

arXiv link: http://arxiv.org/abs/2509.11381v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2025-09-14

What is in a Price? Estimating Willingness-to-Pay with Bayesian Hierarchical Models

Authors: Srijesh Pillai, Rajesh Kumar Chandrawat

For premium consumer products, pricing strategy is not about a single number,
but about understanding the perceived monetary value of the features that
justify a higher cost. This paper proposes a robust methodology to deconstruct
a product's price into the tangible value of its constituent parts. We employ
Bayesian Hierarchical Conjoint Analysis, a sophisticated statistical technique,
to solve this high-stakes business problem using the Apple iPhone as a
universally recognizable case study. We first simulate a realistic choice-based
conjoint survey where consumers choose between different hypothetical iPhone
configurations. We then develop a Bayesian Hierarchical Logit Model to infer
consumer preferences from this choice data. The core innovation of our model is
its ability to directly estimate the Willingness-to-Pay (WTP) in dollars for
specific feature upgrades, such as a "Pro" camera system or increased storage.
Our results demonstrate that the model successfully recovers the true,
underlying feature valuations from noisy data, providing not just a point
estimate but a full posterior probability distribution for the dollar value of
each feature. This work provides a powerful, practical framework for
data-driven product design and pricing strategy, enabling businesses to make
more intelligent decisions about which features to build and how to price them.

arXiv link: http://arxiv.org/abs/2509.11089v1

Econometrics arXiv paper, submitted: 2025-09-14

Large-Scale Curve Time Series with Common Stochastic Trends

Authors: Degui Li, Yu-Ning Li, Peter C. B. Phillips

This paper studies high-dimensional curve time series with common stochastic
trends. A dual functional factor model structure is adopted with a
high-dimensional factor model for the observed curve time series and a
low-dimensional factor model for the latent curves with common trends. A
functional PCA technique is applied to estimate the common stochastic trends
and functional factor loadings. Under some regularity conditions we derive the
mean square convergence and limit distribution theory for the developed
estimates, allowing the dimension and sample size to jointly diverge to
infinity. We propose an easy-to-implement criterion to consistently select the
number of common stochastic trends and further discuss model estimation when
the nonstationary factors are cointegrated. Extensive Monte-Carlo simulations
and two empirical applications to large-scale temperature curves in Australia
and log-price curves of S&P 500 stocks are conducted, showing finite-sample
performance and providing practical implementations of the new methodology.

arXiv link: http://arxiv.org/abs/2509.11060v1

Econometrics arXiv cross-link from Physics – Atmospheric and Oceanic Physics (physics.ao-ph), submitted: 2025-09-11

Climate change: across time and frequencies

Authors: Luis Aguiar-Conraria, Vasco J. Gabriel, Luis F. Martins, Anthoulla Phella

We use continuous wavelet tools to characterize the dynamics of climate
change across time and frequencies. This approach allows us to capture the
changing patterns in the relationship between global mean temperature anomalies
and climate forcings. Using historical data from 1850 to 2022, we find that
greenhouse gases, and CO$_2$ in particular, play a significant role in driving
the very low frequency trending behaviour in temperatures, even after
controlling for the effects of natural forcings. At shorter frequencies, the
effect of forcings on temperatures switches on and off, most likely because of
complex feedback mechanisms in Earth's climate system.

arXiv link: http://arxiv.org/abs/2509.21334v1

Econometrics arXiv paper, submitted: 2025-09-11

Taking the Highway or the Green Road? Conditional Temperature Forecasts Under Alternative SSP Scenarios

Authors: Anthoulla Phella, Vasco J. Gabriel, Luis F. Martins

In this paper, using the Bayesian VAR framework suggested by Chan et al.
(2025), we produce conditional temperature forecasts up until 2050, by
exploiting both equality and inequality constraints on climate drivers like
carbon dioxide or methane emissions. Engaging in a counterfactual scenario
analysis by imposing a Shared Socioeconomic Pathway (SSP) scenario of
"business-as-usual", with no mitigation and high emissions, we observe that
conditional and unconditional forecasts would follow a similar path. Instead,
if a high-mitigation, low-emissions scenario were to be followed, the
conditional temperature paths would remain below the unconditional trajectory
after 2040; i.e., temperature increases can potentially slow down in a
meaningful way, but the lags before changes in emissions take effect are
quite substantial. The latter should be given considerable weight when
designing policy responses to climate change.

arXiv link: http://arxiv.org/abs/2509.09384v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-09-10

Functional Regression with Nonstationarity and Error Contamination: Application to the Economic Impact of Climate Change

Authors: Kyungsik Nam, Won-Ki Seo

This paper studies a regression model with functional dependent and
explanatory variables, both of which exhibit nonstationary dynamics. The model
assumes that the nonstationary stochastic trends of the dependent variable are
explained by those of the explanatory variables, and hence that there exists a
stable long-run relationship between the two variables despite their
nonstationary behavior. We also assume that the functional observations may be
error-contaminated. We develop novel autocovariance-based estimation and
inference methods for this model. The methodology is broadly applicable to
economic and statistical functional time series with nonstationary dynamics. To
illustrate our methodology and its usefulness, we apply it to evaluating the
global economic impact of climate change, an issue of intrinsic importance.

arXiv link: http://arxiv.org/abs/2509.08591v3

Econometrics arXiv updated paper (originally submitted: 2025-09-10)

On the Identification of Diagnostic Expectations: Econometric Insights from DSGE Models

Authors: Jinting Guo

This paper provides the first econometric evidence for diagnostic
expectations (DE) in DSGE models. Using the identification framework of Qu and
Tkachenko (2017), I show that DE generate dynamics unreplicable under rational
expectations (RE), with no RE parameterization capable of matching the
autocovariance implied by DE. Consequently, DE are not observationally
equivalent to RE and constitute an endogenous source of macroeconomic
fluctuations, distinct from both structural frictions and exogenous shocks.
From an econometric perspective, DE preserve overall model identification but
weaken the identification of shock variances. To ensure robust conclusions
across estimation methods and equilibrium conditions, I extend Bayesian
estimation with Sequential Monte Carlo sampling to the indeterminacy domain.
These findings advance the econometric study of expectations and highlight the
macroeconomic relevance of diagnostic beliefs.

arXiv link: http://arxiv.org/abs/2509.08472v2

Econometrics arXiv paper, submitted: 2025-09-10

Posterior inference of attitude-behaviour relationships using latent class choice models

Authors: Akshay Vij, Stephane Hess

The link between attitudes and behaviour has been a key topic in choice
modelling for two decades, with the widespread application of ever more complex
hybrid choice models. This paper proposes a flexible and transparent
alternative framework for empirically examining the relationship between
attitudes and behaviours using latent class choice models (LCCMs). Rather than
embedding attitudinal constructs within the structural model, as in hybrid
choice frameworks, we recover class-specific attitudinal profiles through
posterior inference. This approach enables analysts to explore
attitude-behaviour associations without the complexity and convergence issues
often associated with integrated estimation. Two case studies are used to
demonstrate the framework: one on employee preferences for working from home,
and another on public acceptance of COVID-19 vaccines. Across both studies, we
compare posterior profiling of indicator means, fractional multinomial logit
(FMNL) models, factor-based representations, and hybrid specifications. We find
that posterior inference methods provide behaviourally rich insights with
minimal additional complexity, while factor-based models risk discarding key
attitudinal information, and full-information hybrid models offer little gain in
explanatory power and incur substantially greater estimation burden. Our
findings suggest that when the goal is to explain preference heterogeneity,
posterior inference offers a practical alternative to hybrid models, one that
retains interpretability and robustness without sacrificing behavioural depth.

arXiv link: http://arxiv.org/abs/2509.08373v1

Econometrics arXiv cross-link from Quantitative Finance – Risk Management (q-fin.RM), submitted: 2025-09-09

Chaotic Bayesian Inference: Strange Attractors as Risk Models for Black Swan Events

Authors: Crystal Rust

We introduce a new risk modeling framework where chaotic attractors shape the
geometry of Bayesian inference. By combining heavy-tailed priors with Lorenz
and Rossler dynamics, the models naturally generate volatility clustering, fat
tails, and extreme events. We compare two complementary approaches: Model A,
which emphasizes geometric stability, and Model B, which highlights rare bursts
using Fibonacci diagnostics. Together, they provide a dual perspective for
systemic risk analysis, linking Black Swan theory to practical tools for stress
testing and volatility monitoring.

arXiv link: http://arxiv.org/abs/2509.08183v1

Econometrics arXiv paper, submitted: 2025-09-09

Estimating Peer Effects Using Partial Network Data

Authors: Vincent Boucher, Aristide Houndetoungan

We study the estimation of peer effects through social networks when
researchers do not observe the entire network structure. Special cases include
sampled networks, censored networks, and misclassified links. We assume that
researchers can obtain a consistent estimator of the distribution of the
network. We show that this assumption is sufficient for estimating peer effects
using a linear-in-means model. We provide an empirical application to the study
of peer effects on students' academic achievement using the widely used Add
Health database, and show that network data errors induce a large downward bias
in estimated peer effects.

arXiv link: http://arxiv.org/abs/2509.08145v1

Econometrics arXiv paper, submitted: 2025-09-09

Epsilon-Minimax Solutions of Statistical Decision Problems

Authors: Andrés Aradillas Fernández, José Blanchet, José Luis Montiel Olea, Chen Qiu, Jörg Stoye, Lezhi Tan

A decision rule is epsilon-minimax if it is minimax up to an additive factor
epsilon. We present an algorithm for provably obtaining epsilon-minimax
solutions of statistical decision problems. We are interested in problems where
the statistician chooses randomly among I decision rules. The minimax solution
of these problems admits a convex programming representation over the
(I-1)-simplex. Our suggested algorithm is a well-known mirror subgradient
descent routine, designed to approximately solve the convex optimization
problem that defines the minimax decision rule. This iterative routine is known
in the computer science literature as the hedge algorithm and it is used in
algorithmic game theory as a practical tool to find approximate solutions of
two-person zero-sum games. We apply the suggested algorithm to different
minimax problems in the econometrics literature. An empirical application to
the problem of optimally selecting sites to maximize the external validity of
an experimental policy evaluation illustrates the usefulness of the suggested
procedure.

arXiv link: http://arxiv.org/abs/2509.08107v1
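
The hedge routine referenced above is easy to sketch for a finite zero-sum problem: with a loss matrix A whose rows index the statistician's I decision rules and whose columns index nature's states, multiplicative-weight updates against nature's best responses yield an approximately minimax mixture over the rows. The step size and iteration count below are illustrative choices, not the paper's tuning.

    import numpy as np

    def hedge_minimax(A, n_iter=5000, eta=None):
        """Approximate minimax mixture over the rows of a zero-sum loss matrix A (I x J)."""
        A = np.asarray(A, dtype=float)
        I = A.shape[0]
        eta = np.sqrt(np.log(I) / n_iter) if eta is None else eta
        w = np.ones(I)
        avg = np.zeros(I)
        for _ in range(n_iter):
            p = w / w.sum()
            avg += p
            j = int(np.argmax(p @ A))       # nature's best response to the current mixture
            w *= np.exp(-eta * A[:, j])     # exponential-weights (hedge) update
            w /= w.max()                    # rescale to avoid numerical underflow
        return avg / n_iter                 # time-averaged mixture is epsilon-minimax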

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2025-09-09

Forecasting dementia incidence

Authors: Jérôme R. Simons, Yuntao Chen, Eric Brunner, Eric French

This paper estimates the stochastic process of how dementia incidence evolves
over time. We proceed in two steps: first, we estimate a time trend for
dementia using a multi-state Cox model. The multi-state model addresses
problems of both interval censoring arising from infrequent measurement and
also measurement error in dementia. Second, we feed the estimated mean and
variance of the time trend into a Kalman filter to infer the population level
dementia process. Using data from the English Longitudinal Study of Aging
(ELSA), we find that dementia incidence is no longer declining in England.
Furthermore, our forecast is that future incidence remains constant, although
there is considerable uncertainty in this forecast. Our two-step estimation
procedure has significant computational advantages by combining a multi-state
model with a time series method. To account for the short sample that is
available for dementia, we derive expressions for the Kalman filter's
convergence speed, size, and power to detect changes, and conclude that our
estimator performs well even in short samples.

arXiv link: http://arxiv.org/abs/2509.07874v1
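
The filtering step in the two-step procedure above can be pictured with a minimal local-level Kalman filter; the noise variances here are hypothetical placeholders for the quantities the first step estimates.

    import numpy as np

    def local_level_filter(y, obs_var, state_var, m0=0.0, P0=1e4):
        """Kalman filter for a random-walk state observed with noise (NaN = missing)."""
        m, P = m0, P0
        means, variances = [], []
        for obs in np.asarray(y, dtype=float):
            P = P + state_var                   # predict: random-walk state evolution
            if not np.isnan(obs):               # update, skipping missing periods
                K = P / (P + obs_var)           # Kalman gain
                m = m + K * (obs - m)
                P = (1.0 - K) * P
            means.append(m)
            variances.append(P)
        return np.array(means), np.array(variances)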

Econometrics arXiv paper, submitted: 2025-09-09

Estimating Social Network Models with Link Misclassification

Authors: Arthur Lewbel, Xi Qu, Xun Tang

We propose an adjusted 2SLS estimator for social network models when reported
binary network links are misclassified (some zeros reported as ones and vice
versa) due, e.g., to survey respondents' recall errors, or lapses in data
input. We show misclassification adds new sources of correlation between the
regressors and errors, which makes all covariates endogenous and invalidates
conventional estimators. We resolve these issues by constructing a novel
estimator of misclassification rates and using those estimates to both adjust
endogenous peer outcomes and construct new instruments for 2SLS estimation. A
distinctive feature of our method is that it does not require structural
modeling of link formation. Simulation results confirm our adjusted 2SLS
estimator corrects the bias from a naive, unadjusted 2SLS estimator which
ignores misclassification and uses conventional instruments. We apply our
method to study peer effects in household decisions to participate in a
microfinance program in Indian villages.

arXiv link: http://arxiv.org/abs/2509.07343v1

Econometrics arXiv paper, submitted: 2025-09-08

Optimal Policy Learning for Multi-Action Treatment with Risk Preference using Stata

Authors: Giovanni Cerulli

This paper presents the Stata community-distributed command "opl_ma_fb" (and
the companion command "opl_ma_vf"), for implementing the first-best Optimal
Policy Learning (OPL) algorithm to estimate the best treatment assignment given
the observation of an outcome, a multi-action (or multi-arm) treatment, and a
set of observed covariates (features). It allows for different risk preferences
in decision-making (i.e., risk-neutral, linear risk-averse, and quadratic
risk-averse), and provides a graphical representation of the optimal policy,
along with an estimate of the maximal welfare (i.e., the value-function
estimated at optimal policy) using regression adjustment (RA),
inverse-probability weighting (IPW), and doubly robust (DR) formulas.

arXiv link: http://arxiv.org/abs/2509.06851v1

Econometrics arXiv paper, submitted: 2025-09-08

Neural ARFIMA model for forecasting BRIC exchange rates with long memory under oil shocks and policy uncertainties

Authors: Tanujit Chakraborty, Donia Besher, Madhurima Panja, Shovon Sengupta

Accurate forecasting of exchange rates remains a persistent challenge,
particularly for emerging economies such as Brazil, Russia, India, and China
(BRIC). These series exhibit long memory, nonlinearity, and non-stationarity
properties that conventional time series models struggle to capture.
Additionally, there exist several key drivers of exchange rate dynamics,
including global economic policy uncertainty, US equity market volatility, US
monetary policy uncertainty, oil price growth rates, and country-specific
short-term interest rate differentials. These empirical complexities underscore
the need for a flexible modeling framework that can jointly accommodate long
memory, nonlinearity, and the influence of external drivers. To address these
challenges, we propose a Neural AutoRegressive Fractionally Integrated Moving
Average (NARFIMA) model that combines the long-memory representation of ARFIMA
with the nonlinear learning capacity of neural networks, while flexibly
incorporating exogenous causal variables. We establish theoretical properties
of the model, including asymptotic stationarity of the NARFIMA process using
Markov chains and nonlinear time series techniques. We quantify forecast
uncertainty using conformal prediction intervals within the NARFIMA framework.
Empirical results across six forecast horizons show that NARFIMA consistently
outperforms various state-of-the-art statistical and machine learning models in
forecasting BRIC exchange rates. These findings provide new insights for
policymakers and market participants navigating volatile financial conditions.
The narfima R package provides an implementation of our
approach.

arXiv link: http://arxiv.org/abs/2509.06697v1
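
The long-memory ingredient of ARFIMA-type models is the fractional difference (1 - L)^d, whose weights follow a simple recursion; the numpy sketch below (unrelated to the narfima package's implementation) applies the truncated filter to a series for a given memory parameter d.

    import numpy as np

    def frac_diff(x, d):
        """Apply the truncated fractional-differencing filter (1 - L)^d to a series x."""
        x = np.asarray(x, dtype=float)
        n = len(x)
        w = np.empty(n)
        w[0] = 1.0
        for k in range(1, n):                  # w_k = w_{k-1} * (k - 1 - d) / k
            w[k] = w[k - 1] * (k - 1 - d) / k
        return np.array([w[: t + 1] @ x[t::-1] for t in range(n)])

    # d = 1 recovers the ordinary first difference; 0 < d < 0.5 gives stationary long memory.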

Econometrics arXiv paper, submitted: 2025-09-08

Largevars: An R Package for Testing Large VARs for the Presence of Cointegration

Authors: Anna Bykhovskaya, Vadim Gorin, Eszter Kiss

Cointegration is a property of multivariate time series that determines
whether its non-stationary, growing components have a stationary linear
combination. The Largevars R package conducts a cointegration test for
high-dimensional vector autoregressions of order k based on the large N, T
asymptotics of Bykhovskaya and Gorin (2022, 2025). The implemented test is a
modification of the Johansen likelihood ratio test. In the absence of
cointegration the test converges to the partial sum of the Airy_1 point
process, an object arising in random matrix theory.
The package and this article contain simulated quantiles of the first ten
partial sums of the Airy_1 point process that are precise up to the first 3
digits. We also include two examples using Largevars: an empirical example on
S&P100 stocks and a simulated VAR(2) example.

arXiv link: http://arxiv.org/abs/2509.06295v1

Econometrics arXiv cross-link from Quantitative Finance – Statistical Finance (q-fin.ST), submitted: 2025-09-07

Predicting Market Troughs: A Machine Learning Approach with Causal Interpretation

Authors: Peilin Rao, Randall R. Rojas

This paper provides robust, new evidence on the causal drivers of market
troughs. We demonstrate that conclusions about these triggers are critically
sensitive to model specification, moving beyond restrictive linear models to a
flexible double machine learning (DML) framework for average partial effects. Our
robust estimates identify the volatility of options-implied risk appetite and
market liquidity as key causal drivers, relationships misrepresented or
obscured by simpler models. These findings provide high-frequency empirical
support for intermediary asset pricing theories. This causal analysis is
enabled by a high-performance nowcasting model that accurately identifies
capitulation events in real-time.

arXiv link: http://arxiv.org/abs/2509.05922v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2025-09-06

Polynomial Log-Marginals and Tweedie's Formula : When Is Bayes Possible?

Authors: Jyotishka Datta, Nicholas G. Polson

Motivated by Tweedie's formula for the Compound Decision problem, we examine
the theoretical foundations of empirical Bayes estimators that directly model
the marginal density $m(y)$. Our main result shows that polynomial
log-marginals of degree $k \ge 3$ cannot arise from any valid prior
distribution in exponential family models, while quadratic forms correspond
exactly to Gaussian priors. This provides theoretical justification for why
certain empirical Bayes decision rules, while practically useful, do not
correspond to any formal Bayes procedures. We also strengthen the diagnostic by
showing that a marginal is a Gaussian convolution only if it extends to a
bounded solution of the heat equation in a neighborhood of the smoothing
parameter, beyond the convexity of $c(y)=\tfrac12 y^2+\log m(y)$.

arXiv link: http://arxiv.org/abs/2509.05823v1
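
For context, the Tweedie formula behind these results reads, in the Gaussian case $y \mid \theta \sim N(\theta, \sigma^2)$ with marginal density $m(y)$,
\[
  E[\theta \mid y] \;=\; y + \sigma^{2}\,\frac{d}{dy}\log m(y),
\]
so a quadratic $\log m(y)$ makes the posterior mean linear in $y$, which is exactly the Gaussian-prior case singled out above.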

Econometrics arXiv updated paper (originally submitted: 2025-09-05)

Utilitarian or Quantile-Welfare Evaluation of Health Policy?

Authors: Charles F. Manski, John Mullahy

This paper considers quantile-welfare evaluation of health policy as an
alternative to utilitarian evaluation. Manski (1988) originally proposed and
studied maximization of quantile utility as a model of individual decision
making under uncertainty, juxtaposing it with maximization of expected utility.
That paper's primary motivation was to exploit the fact that maximization of
quantile utility requires only an ordinal formalization of utility, not a
cardinal one. This paper transfers these ideas from analysis of individual
decision making to analysis of social planning. We begin by summarizing basic
theoretical properties of quantile welfare in general terms rather than related
specifically to health policy. We then turn attention to health policy and
propose a procedure to nonparametrically bound the quantile welfare of health
states using data from binary-choice time-tradeoff (TTO) experiments of the
type regularly performed by health economists. After this we assess related
econometric considerations concerning measurement, using the EQ-5D framework to
structure our discussion.

arXiv link: http://arxiv.org/abs/2509.05529v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-09-05

Bayesian Inference for Confounding Variables and Limited Information

Authors: Ellis Scharfenaker, Duncan K. Foley

A central challenge in statistical inference is the presence of confounding
variables that may distort observed associations between treatment and outcome.
Conventional "causal" methods, grounded in assumptions such as ignorability,
exclude the possibility of unobserved confounders, leading to posterior
inferences that overstate certainty. We develop a Bayesian framework that
relaxes these assumptions by introducing entropy-favoring priors over
hypothesis spaces that explicitly allow for latent confounding variables and
partial information. Using the case of Simpson's paradox, we demonstrate how
this approach produces logically consistent posterior distributions that widen
credible intervals in the presence of potential confounding. Our method
provides a generalizable, information-theoretic foundation for more robust
predictive inference in observational sciences.

arXiv link: http://arxiv.org/abs/2509.05520v1

Econometrics arXiv paper, submitted: 2025-09-05

Causal mechanism and mediation analysis for macroeconomics dynamics: a bridge of Granger and Sims causality

Authors: Jean-Marie Dufour, Endong Wang

This paper introduces a novel concept of impulse response decomposition to
disentangle the dynamic contributions of the mediator variables in the
transmission of structural shocks. We justify our decomposition by drawing on
causal mediation analysis and demonstrating its equivalence to the average
mediation effect. Our result establishes a formal link between Sims and Granger
causality. Sims causality captures the total effect, while Granger causality
corresponds to the mediation effect. We construct a dynamic mediation index
that quantifies the evolving role of mediator variables in shock propagation.
Applying our framework to studies of the transmission channels of US monetary
policy, we find that investor sentiment explains approximately 60% of the peak
aggregate output response in the three months following a policy shock, while
expected default risk contributes negligibly across all horizons.

arXiv link: http://arxiv.org/abs/2509.05284v1

Econometrics arXiv paper, submitted: 2025-09-05

Treatment Effects of Multi-Valued Treatments in Hyper-Rectangle Model

Authors: Xunkang Tian

This study investigates the identification of marginal treatment responses
within multi-valued treatment models. Extending the hyper-rectangle model
introduced by Lee and Salanie (2018), this paper relaxes restrictive
assumptions, including the requirement of known treatment selection thresholds
and the dependence of treatments on all unobserved heterogeneity. By
incorporating an additional ranked treatment assumption, this study
demonstrates that the marginal treatment responses can be identified under a
broader set of conditions, either point or set identification. The framework
further enables the derivation of various treatment effects from the marginal
treatment responses. Additionally, this paper introduces a hypothesis testing
method to evaluate the effectiveness of policies on treatment effects,
enhancing its applicability to empirical policy analysis.

arXiv link: http://arxiv.org/abs/2509.05177v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2025-09-05

Optimal Estimation for General Gaussian Processes

Authors: Tetsuya Takabatake, Jun Yu, Chen Zhang

This paper proposes a novel exact maximum likelihood (ML) estimation method
for general Gaussian processes, where all parameters are estimated jointly. The
exact ML estimator (MLE) is consistent and asymptotically normally distributed.
We prove the local asymptotic normality (LAN) property of the sequence of
statistical experiments for general Gaussian processes in the sense of Le Cam,
thereby enabling optimal estimation and facilitating statistical inference. The
results rely solely on the asymptotic behavior of the spectral density near
zero, allowing them to be widely applied. The established optimality not only
addresses the gap left by Adenstedt (1974), who proposed an efficient but
infeasible estimator for the long-run mean $\mu$, but also enables us to
evaluate the finite-sample performance of the existing method -- the commonly
used plug-in MLE, in which the sample mean is substituted into the likelihood.
Our simulation results show that the plug-in MLE performs nearly as well as the
exact MLE, alleviating concerns that inefficient estimation of $\mu$ would
compromise the efficiency of the remaining parameter estimates.

arXiv link: http://arxiv.org/abs/2509.04987v1

Econometrics arXiv paper, submitted: 2025-09-05

A Bayesian Gaussian Process Dynamic Factor Model

Authors: Tony Chernis, Niko Hauzenberger, Haroon Mumtaz, Michael Pfarrhofer

We propose a dynamic factor model (DFM) where the latent factors are linked
to observed variables with unknown and potentially nonlinear functions. The key
novelty and source of flexibility of our approach is a nonparametric
observation equation, specified via Gaussian Process (GP) priors for each
series. Factor dynamics are modeled with a standard vector autoregression
(VAR), which facilitates computation and interpretation. We discuss a
computationally efficient estimation algorithm and consider two empirical
applications. First, we forecast key series from the FRED-QD dataset and show
that the model yields improvements in predictive accuracy relative to linear
benchmarks. Second, we extract driving factors of global inflation dynamics
with the GP-DFM, which allows for capturing international asymmetries.

arXiv link: http://arxiv.org/abs/2509.04928v1

Econometrics arXiv updated paper (originally submitted: 2025-09-04)

The exact distribution of the conditional likelihood-ratio test in instrumental variables regression

Authors: Malte Londschien

We derive the exact asymptotic distribution of the conditional
likelihood-ratio test in instrumental variables regression under weak
instrument asymptotics and for multiple endogenous variables. The distribution
is conditional on all eigenvalues of the concentration matrix, rather than only
the smallest eigenvalue as in an existing asymptotic upper bound. This exact
characterization leads to a substantially more powerful test if there are
differently identified endogenous variables. We provide computational methods
implementing the test and demonstrate the power gains through numerical
analysis.

arXiv link: http://arxiv.org/abs/2509.04144v2

Econometrics arXiv paper, submitted: 2025-09-04

Selecting the Best Arm in One-Shot Multi-Arm RCTs: The Asymptotic Minimax-Regret Decision Framework for the Best-Population Selection Problem

Authors: Joonhwi Joo

We develop a frequentist decision-theoretic framework for selecting the best
arm in one-shot, multi-arm randomized controlled trials (RCTs). Our approach
characterizes the minimax-regret (MMR) optimal decision rule for any
location-family reward distribution with full support. We show that the MMR
rule is deterministic, unique, and computationally tractable, as it can be
derived by solving the dual problem with nature's least-favorable prior. We
then specialize to the case of multivariate normal (MVN) rewards with an
arbitrary covariance matrix, and establish the local asymptotic minimaxity of a
plug-in version of the rule when only estimated means and covariances are
available. This asymptotic MMR (AMMR) procedure maps a covariance-matrix
estimate directly into decision boundaries, allowing straightforward
implementation in practice. Our analysis highlights a sharp contrast between
two-arm and multi-arm designs. With two arms, the empirical success rule
("pick-the-winner") remains MMR-optimal, regardless of the arm-specific
variances. By contrast, with three or more arms and heterogeneous variances,
the empirical success rule is no longer optimal: the MMR decision boundaries
become nonlinear and systematically penalize high-variance arms, requiring
stronger evidence to select them. This result underscores that variance plays
no role in optimal two-arm comparisons, but it matters critically when more
than two options are on the table. Our multi-arm AMMR framework extends
classical decision theory to multi-arm RCTs, offering a rigorous foundation and
a practical tool for comparing multiple policies simultaneously.

arXiv link: http://arxiv.org/abs/2509.03796v1

Econometrics arXiv paper, submitted: 2025-09-03

Data driven modeling of multiple interest rates with generalized Vasicek-type models

Authors: Pauliina Ilmonen, Milla Laurikkala, Kostiantyn Ralchenko, Tommi Sottinen, Lauri Viitasaari

The Vasicek model is a commonly used interest rate model, and there exist
many extensions and generalizations of it. However, most generalizations of the
model are either univariate or assume the noise process to be Gaussian, or
both. In this article, we study a generalized multivariate Vasicek model that
allows simultaneous modeling of multiple interest rates while making minimal
assumptions. In the model, we only assume that the noise process has stationary
increments with a suitably decaying autocovariance structure. We provide
estimators for the unknown parameters and prove their consistencies. We also
derive limiting distributions for each estimator and provide theoretical
examples. Furthermore, the model is tested empirically with both simulated data
and real data.

arXiv link: http://arxiv.org/abs/2509.03208v1
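
For orientation, the classical univariate Vasicek model $dr_t = \kappa(\theta - r_t)\,dt + \sigma\,dW_t$ reduces at a fixed sampling interval to an AR(1) regression, so a simple OLS-based estimator looks as follows; the generalized multivariate estimators studied in the paper are, of course, different.

    import numpy as np

    def fit_vasicek(r, dt=1.0 / 252):
        """OLS estimates of (kappa, theta, sigma) in the classical Vasicek model."""
        r = np.asarray(r, dtype=float)
        x, y = r[:-1], r[1:]
        X = np.column_stack([np.ones_like(x), x])
        (a, b), *_ = np.linalg.lstsq(X, y, rcond=None)    # exact discretization: b = exp(-kappa*dt)
        kappa = -np.log(b) / dt                           # mean-reversion speed (requires 0 < b < 1)
        theta = a / (1.0 - b)                             # long-run mean
        resid = y - (a + b * x)
        sigma2 = resid.var(ddof=2) * 2.0 * kappa / (1.0 - b ** 2)
        return kappa, theta, np.sqrt(sigma2)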

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-09-02

Bias Correction in Factor-Augmented Regression Models with Weak Factors

Authors: Peiyun Jiang, Yoshimasa Uematsu, Takashi Yamagata

In this paper, we study the asymptotic bias of the factor-augmented
regression estimator and its reduction, which is augmented by the $r$ factors
extracted from a large number of $N$ variables with $T$ observations. In
particular, we consider general weak latent factor models with $r$ signal
eigenvalues that may diverge at different rates, $N^{\alpha _{k}}$, $0<\alpha
_{k}\leq 1$, $k=1,\dots,r$. In the existing literature, the bias has been
derived using an approximation for the estimated factors with a specific
data-dependent rotation matrix $H$ for the model with $\alpha_{k}=1$ for
all $k$, whereas we derive the bias for weak factor models. In addition, we
derive the bias using the approximation with a different rotation matrix
$H_q$, which generally has a smaller bias than with $H$. We also
derive the bias using our preferred approximation with a purely
signal-dependent rotation, which is unique and can be regarded as the
population version of $H$ and $H_q$. Since this bias is
parametrically inestimable, we propose a split-panel jackknife bias correction,
and theory shows that it successfully reduces the bias. Extensive
finite-sample experiments suggest that the proposed bias correction works very
well, and the empirical application illustrates its usefulness in practice.

arXiv link: http://arxiv.org/abs/2509.02066v2

Econometrics arXiv paper, submitted: 2025-09-02

Interpretational errors with instrumental variables

Authors: Luca Locher, Mats J. Stensrud, Aaron L. Sarvet

Instrumental variables (IV) are often used to identify causal effects in
observational settings and experiments subject to non-compliance. Under
canonical assumptions, IVs allow us to identify a so-called local average
treatment effect (LATE). The use of IVs is often accompanied by a pragmatic
decision to abandon the identification of the causal parameter that corresponds
to the original research question and target the LATE instead. This pragmatic
decision presents a potential source of error: an investigator mistakenly
interprets findings as if they had made inference on their original causal
parameter of interest. We conducted a systematic review and meta-analysis of
patterns of pragmatism and interpretational errors in the applied IV literature
published in leading journals of economics, political science, epidemiology,
and clinical medicine (n = 309 unique studies). We found that a large fraction
of studies targeted the LATE, although specific interest in this parameter was
rare. Of these studies, 61% contained claims that mistakenly suggested that
another parameter was targeted -- one whose value likely differs from, and
could even have the opposite sign of, the parameter actually estimated. Our
findings suggest that the validity of conclusions drawn from IV applications is
often compromised by interpretational errors.

arXiv link: http://arxiv.org/abs/2509.02045v1

Econometrics arXiv paper, submitted: 2025-09-02

On the role of the design phase in a linear regression

Authors: Junho Choi

The "design phase" refers to a stage in observational studies, during which a
researcher constructs a subsample that achieves a better balance in covariate
distributions between the treated and untreated units. In this paper, we study
the role of this preliminary phase in the context of linear regression,
offering a justification for its utility. To that end, we first formalize the
design phase as a process of estimand adjustment via selecting a subsample.
Then, we show that covariate balance of a subsample is indeed a justifiable
criterion for guiding the selection: it informs on the maximum degree of model
misspecification that can be allowed for a subsample, when a researcher wishes
to restrict the bias of the estimand for the parameter of interest within a
target level of precision. In this sense, the pursuit of a balanced subsample
in the design phase is interpreted as identifying an estimand that is less
susceptible to bias in the presence of model misspecification. Also, we
demonstrate that covariate imbalance can serve as a sensitivity measure in
regression analysis, and illustrate how it can structure a communication
between a researcher and the readers of her report.

arXiv link: http://arxiv.org/abs/2509.01861v1

Econometrics arXiv updated paper (originally submitted: 2025-09-01)

Cohort-Anchored Robust Inference for Event-Study with Staggered Adoption

Authors: Ziyi Liu

This paper proposes a cohort-anchored framework for robust inference in event
studies with staggered adoption, building on Rambachan and Roth (2023). Robust
inference based on event-study coefficients aggregated across cohorts can be
misleading due to the dynamic composition of treated cohorts, especially when
pre-trends differ across cohorts. My approach avoids this problem by operating
at the cohort-period level. To address the additional challenge posed by
time-varying control groups in modern DiD estimators, I introduce the concept
of block bias: the parallel-trends violation for a cohort relative to its fixed
initial control group. I show that the biases of these estimators can be
decomposed invertibly into block biases. Because block biases maintain a
consistent comparison across pre- and post-treatment periods, researchers can
impose transparent restrictions on them to conduct robust inference. In
simulations and a reanalysis of minimum-wage effects on teen employment, my
framework yields better-centered (and sometimes narrower) confidence sets than
the aggregated approach when pre-trends vary across cohorts. The framework is
most useful in settings with multiple cohorts, sufficient within-cohort
precision, and substantial cross-cohort heterogeneity.

arXiv link: http://arxiv.org/abs/2509.01829v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-09-01

Finite-Sample Non-Parametric Bounds with an Application to the Causal Effect of Workforce Gender Diversity on Firm Performance

Authors: Grace Lordan, Kaveh Salehzadeh Nobari

Classical Manski bounds identify average treatment effects under minimal
assumptions but, in finite samples, assume that latent conditional expectations
are bounded by the sample's own extrema or that the population extrema are
known a priori -- often untrue in firm-level data with heavy tails. We develop
a finite-sample, concentration-driven band (concATE) that replaces that
assumption with a Dvoretzky--Kiefer--Wolfowitz tail bound, combines it with
delta-method variance, and allocates size via Bonferroni. The band extends to a
group-sequential design that controls the family-wise error when the first
“significant” diversity threshold is data-chosen. Applied to 945 listed firms
(2015 Q2--2022 Q1), concATE shows that senior-level gender diversity raises
Tobin's Q once representation exceeds approximately 30% in growth sectors and
approximately 65% in cyclical sectors.

arXiv link: http://arxiv.org/abs/2509.01622v1

Econometrics arXiv paper, submitted: 2025-09-01

Constrained Recursive Logit for Route Choice Analysis

Authors: Hung Tran, Tien Mai, Minh Ha Hoang

The recursive logit (RL) model has become a widely used framework for route
choice modeling, but it suffers from a key limitation: it assigns nonzero
probabilities to all paths in the network, including those that are
unrealistic, such as routes exceeding travel time deadlines or violating energy
constraints. To address this gap, we propose a novel Constrained Recursive
Logit (CRL) model that explicitly incorporates feasibility constraints into the
RL framework. CRL retains the main advantages of RL, namely no path sampling and
ease of prediction, but systematically excludes infeasible paths from the universal
choice set. The model is inherently non-Markovian; to address this, we develop
a tractable estimation approach based on extending the state space, which
restores the Markov property and enables estimation using standard value
iteration methods. We prove that our estimation method admits a unique solution
under positive discrete costs and establish its equivalence to a multinomial
logit model defined over restricted universal path choice sets. Empirical
experiments on synthetic and real networks demonstrate that CRL improves
behavioral realism and estimation stability, particularly in cyclic networks.

arXiv link: http://arxiv.org/abs/2509.01595v1
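
The abstract's estimation step rests on solving the recursive logit value
function by value iteration. As a rough illustration of that inner step only
(not the paper's CRL state-space extension), the Python sketch below runs
log-sum-exp value iteration on a toy network; node names, arc utilities and
the network itself are hypothetical.

    import numpy as np
    from scipy.special import logsumexp

    # Toy network: arcs (from, to) -> deterministic utility v (e.g. minus travel cost).
    # Node "D" is the destination. All names and numbers are hypothetical.
    arcs = {("A", "B"): -1.0, ("A", "C"): -2.0, ("B", "C"): -1.0,
            ("B", "D"): -3.0, ("C", "D"): -1.0}
    nodes = ["A", "B", "C", "D"]

    V = {k: 0.0 for k in nodes}          # V["D"] stays fixed at 0
    for _ in range(200):                 # iterate V(k) = log sum_a exp(v(k,a) + V(k_a))
        V_new = {"D": 0.0}
        for k in nodes:
            if k == "D":
                continue
            vals = [v + V[j] for (i, j), v in arcs.items() if i == k]
            V_new[k] = logsumexp(vals)
        if max(abs(V_new[k] - V[k]) for k in nodes) < 1e-10:
            V = V_new
            break
        V = V_new

    # Recursive-logit probability of choosing arc (k, j): exp(v(k, j) + V(j) - V(k)).
    p_AB = np.exp(arcs[("A", "B")] + V["B"] - V["A"])
    print({k: round(V[k], 3) for k in nodes}, round(p_AB, 3))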

Econometrics arXiv paper, submitted: 2025-09-01

On the Estimation of Multinomial Logit and Nested Logit Models: A Conic Optimization Approach

Authors: Hoang Giang Pham, Tien Mai, Minh Ha Hoang

In this paper, we revisit parameter estimation for multinomial logit (MNL),
nested logit (NL), and tree-nested logit (TNL) models through the framework of
convex conic optimization. Traditional approaches typically solve the maximum
likelihood estimation (MLE) problem using gradient-based methods, which are
sensitive to step-size selection and initialization, and may therefore suffer
from slow or unstable convergence. In contrast, we propose a novel estimation
strategy that reformulates these models as conic optimization problems,
enabling more robust and reliable estimation procedures. Specifically, we show
that the MLE for MNL admits an equivalent exponential cone program (ECP). For
NL and TNL, we prove that when the dissimilarity (scale) parameters are fixed,
the estimation problem is convex and likewise reducible to an ECP. Leveraging
these results, we design a two-stage procedure: an outer loop that updates the
scale parameters and an inner loop that solves the ECP to update the utility
coefficients. The inner problems are handled by interior-point methods with
iteration counts that grow only logarithmically in the target accuracy, as
implemented in off-the-shelf solvers (e.g., MOSEK). Extensive experiments
across estimation instances of varying size show that our conic approach
attains better MLE solutions, greater robustness to initialization, and
substantial speedups compared to standard gradient-based MLE, particularly on
large-scale instances with high-dimensional specifications and large choice
sets. Our findings establish exponential cone programming as a practical and
scalable alternative for estimating a broad class of discrete choice models.

arXiv link: http://arxiv.org/abs/2509.01562v1
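
As a concrete reminder of why the MNL likelihood fits the exponential-cone
framework, the short Python sketch below writes the MNL log-likelihood as a
concave maximization in CVXPY, whose log_sum_exp atom is handled internally
via exponential cones. The simulated data and the setup (one common
coefficient vector, alternative-specific covariates) are assumptions of this
sketch, not the paper's exact specification or solver configuration.

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(0)
    n, J, d = 300, 4, 3                        # observations, alternatives, covariates
    X = rng.normal(size=(n, J, d))             # alternative-specific covariates
    beta_true = np.array([1.0, -0.5, 0.25])
    U = X @ beta_true + rng.gumbel(size=(n, J))
    y = U.argmax(axis=1)                       # chosen alternative

    beta = cp.Variable(d)
    loglik = 0
    for i in range(n):
        utilities = X[i] @ beta                # affine in beta
        loglik += utilities[y[i]] - cp.log_sum_exp(utilities)

    prob = cp.Problem(cp.Maximize(loglik))     # concave, hence a convex program
    prob.solve()                               # solved with an exp-cone-capable solver
    print(np.round(beta.value, 3))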

Econometrics arXiv updated paper (originally submitted: 2025-09-01)

Using Aggregate Relational Data to Infer Social Networks

Authors: Xunkang Tian

This study introduces a novel approach for inferring social network
structures using Aggregate Relational Data (ARD), addressing the challenge of
limited detailed network data availability. By integrating ARD with variational
approximation methods, we provide a computationally efficient and
cost-effective solution for network analysis. Our methodology demonstrates the
potential of ARD to offer insightful approximations of network dynamics, as
evidenced by Monte Carlo Simulations. This paper not only showcases the utility
of ARD in social network inference but also opens avenues for future research
in enhancing estimation precision and exploring diverse network datasets.
Through this work, we contribute to the field of network analysis by offering
an alternative strategy for understanding complex social networks with
constrained data.

arXiv link: http://arxiv.org/abs/2509.01503v2

Econometrics arXiv paper, submitted: 2025-09-01

Handling Sparse Non-negative Data in Finance

Authors: Agostino Capponi, Zhaonan Qu

We show that Poisson regression, though often recommended over log-linear
regression for modeling count and other non-negative variables in finance and
economics, can be far from optimal when heteroskedasticity and sparsity -- two
common features of such data -- are both present. We propose a general class of
moment estimators, encompassing Poisson regression, that balances the
bias-variance trade-off under these conditions. A simple cross-validation
procedure selects the optimal estimator. Numerical simulations and applications
to corporate finance data reveal that the best choice varies substantially
across settings and often departs from Poisson regression, underscoring the
need for a more flexible estimation framework.

arXiv link: http://arxiv.org/abs/2509.01478v1
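
To make the trade-off concrete, here is a minimal Python sketch comparing the
two familiar endpoints, Poisson regression and log-linear OLS on log(1+y), by
K-fold cross-validated prediction error on simulated sparse, heteroskedastic
counts. The paper's moment-estimator class and its selection rule are more
general than this illustration, and the naive retransformation below is a
deliberate simplification.

    import numpy as np
    import statsmodels.api as sm
    from sklearn.model_selection import KFold

    rng = np.random.default_rng(1)
    n = 2000
    x = rng.normal(size=n)
    X = sm.add_constant(x)
    mu = np.exp(-1.0 + 0.8 * x)                              # sparse: many zeros
    y = rng.poisson(mu * rng.lognormal(sigma=1.0, size=n))   # heteroskedastic counts

    def cv_mse(fit_predict, X, y, splits=5):
        errs = []
        for tr, te in KFold(splits, shuffle=True, random_state=0).split(X):
            yhat = fit_predict(X[tr], y[tr], X[te])
            errs.append(np.mean((y[te] - yhat) ** 2))
        return np.mean(errs)

    def poisson(Xtr, ytr, Xte):
        return sm.GLM(ytr, Xtr, family=sm.families.Poisson()).fit().predict(Xte)

    def loglinear(Xtr, ytr, Xte):
        b = sm.OLS(np.log1p(ytr), Xtr).fit().params
        return np.expm1(Xte @ b)               # naive retransformation, no smearing

    print("Poisson CV-MSE:   ", round(cv_mse(poisson, X, y), 3))
    print("log-linear CV-MSE:", round(cv_mse(loglinear, X, y), 3))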

Econometrics arXiv updated paper (originally submitted: 2025-09-01)

Bootstrap Diagnostic Tests

Authors: Giuseppe Cavaliere, Luca Fanelli, Iliyan Georgiev

Violation of the assumptions underlying classical (Gaussian) limit theory
often yields unreliable statistical inference. This paper shows that the
bootstrap can detect such violations by delivering simple and powerful
diagnostic tests that (a) induce no pre-testing bias, (b) use the same critical
values across applications, and (c) are consistent against deviations from
asymptotic normality. The tests compare the conditional distribution of a
bootstrap statistic with the Gaussian limit implied by valid specification and
assess whether the resulting discrepancy is large enough to indicate failure of
the asymptotic Gaussian approximation. The method is computationally
straightforward and only requires a sample of i.i.d. draws of the bootstrap
statistic. We derive sufficient conditions for the randomness in the data to
mix with the randomness in the bootstrap repetitions in a way such that (a),
(b) and (c) above hold. We demonstrate the practical relevance and broad
applicability of bootstrap diagnostics by considering several scenarios where
the asymptotic Gaussian approximation may fail, including weak instruments,
non-stationarity, parameters on the boundary of the parameter space, infinite
variance data and singular Jacobian in applications of the delta method. An
illustration drawn from the empirical macroeconomic literature concludes.

arXiv link: http://arxiv.org/abs/2509.01351v2
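
To fix ideas, a small sketch of the general recipe under assumptions of my own
(the authors' actual test statistic and critical values are not reproduced
here): draw i.i.d. bootstrap replications of a t-statistic and compare their
distribution with the standard normal limit, here via a Kolmogorov-Smirnov
distance.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    x = rng.standard_cauchy(500)          # heavy tails: Gaussian asymptotics suspect

    B = 999
    boot_t = np.empty(B)
    xbar = x.mean()
    for b in range(B):
        xs = rng.choice(x, size=x.size, replace=True)
        boot_t[b] = np.sqrt(x.size) * (xs.mean() - xbar) / xs.std(ddof=1)

    # Compare the bootstrap distribution with the N(0,1) limit implied by valid
    # specification; a large discrepancy flags failure of the Gaussian approximation.
    ks = stats.kstest(boot_t, "norm")
    print(f"KS distance = {ks.statistic:.3f}, p-value = {ks.pvalue:.4f}")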

Econometrics arXiv paper, submitted: 2025-08-29

Treatment effects at the margin: Everyone is marginal

Authors: Haotian Deng

This paper develops a framework for identifying treatment effects when a
policy simultaneously alters both the incentive to participate and the outcome
of interest -- such as hiring decisions and wages in response to employment
subsidies; or working decisions and wages in response to job trainings. This
framework was inspired by my PhD project on a Belgian reform that subsidised
first-time hiring, inducing entry by marginal firms yet meanwhile changing the
wages they pay. Standard methods addressing selection-into-treatment concepts
(like Heckman selection equations and local average treatment effects), or
before-after comparisons (including simple DiD or RDD), cannot isolate effects
at this shifting margin where treatment defines who is observed. I introduce
marginality-weighted estimands that recover causal effects among policy-induced
entrants, offering a policy-relevant alternative in settings with endogenous
selection. This method can thus be applied widely to understanding the economic
impacts of public programmes, especially in fields largely relying on
reduced-form causal inference estimation (e.g. labour economics, development
economics, health economics).

arXiv link: http://arxiv.org/abs/2508.21583v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-08-29

Triply Robust Panel Estimators

Authors: Susan Athey, Guido Imbens, Zhaonan Qu, Davide Viviano

This paper studies estimation of causal effects in a panel data setting. We
introduce a new estimator, the Triply RObust Panel (TROP) estimator, that
combines (i) a flexible model for the potential outcomes based on a low-rank
factor structure on top of a two-way-fixed effect specification, with (ii) unit
weights intended to upweight units similar to the treated units and (iii) time
weights intended to upweight time periods close to the treated time periods. We
study the performance of the estimator in a set of simulations designed to
closely match several commonly studied real data sets. We find that there is
substantial variation in the performance of the estimators across the settings
considered. The proposed estimator outperforms
two-way-fixed-effect/difference-in-differences, synthetic control, matrix
completion and synthetic-difference-in-differences estimators. We investigate
what features of the data generating process lead to this performance, and
assess the relative importance of the three components of the proposed
estimator. We have two recommendations. Our preferred strategy is that
researchers use simulations closely matched to the data they are interested in,
along the lines discussed in this paper, to investigate which estimators work
well in their particular setting. A simpler approach is to use more robust
estimators such as synthetic difference-in-differences or the new triply robust
panel estimator, which we find to substantially outperform two-way fixed effect
estimators in many empirically relevant settings.

arXiv link: http://arxiv.org/abs/2508.21536v2

Econometrics arXiv paper, submitted: 2025-08-28

Uniform Quasi ML based inference for the panel AR(1) model

Authors: Hugo Kruiniger

This paper proposes new inference methods for panel AR models with arbitrary
initial conditions and heteroskedasticity and possibly additional regressors
that are robust to the strength of identification. Specifically, we consider
several Maximum Likelihood based methods of constructing tests and confidence
sets (CSs) and show that (Quasi) LM tests and CSs that use the expected Hessian
rather than the observed Hessian of the log-likelihood have correct asymptotic
size (in a uniform sense). We derive the power envelope of a Fixed Effects
version of such a LM test for hypotheses involving the autoregressive parameter
when the average information matrix is estimated by a centered OPG estimator
and the model is only second-order identified, and show that it coincides with
the maximal attainable power curve in the worst case setting. We also study the
empirical size and power properties of these (Quasi) LM tests and CSs.

arXiv link: http://arxiv.org/abs/2508.20855v1

Econometrics arXiv paper, submitted: 2025-08-28

Time Series Embedding and Combination of Forecasts: A Reinforcement Learning Approach

Authors: Marcelo C. Medeiros, Jeronymo M. Pinro

The forecasting combination puzzle is a well-known phenomenon in forecasting
literature, stressing the challenge of outperforming the simple average when
aggregating forecasts from diverse methods. This study proposes a Reinforcement
Learning - based framework as a dynamic model selection approach to address
this puzzle. Our framework is evaluated through extensive forecasting exercises
using simulated and real data. Specifically, we analyze the M4 Competition
dataset and the Survey of Professional Forecasters (SPF). This research
introduces an adaptable methodology for selecting and combining forecasts under
uncertainty, offering a promising advancement in resolving the forecasting
combination puzzle.

arXiv link: http://arxiv.org/abs/2508.20795v1

Econometrics arXiv paper, submitted: 2025-08-28

A further look at Modified ML estimation of the panel AR(1) model with fixed effects and arbitrary initial conditions

Authors: Hugo Kruiniger

In this paper we consider two kinds of generalizations of Lancaster's (Review
of Economic Studies, 2002) Modified ML estimator (MMLE) for the panel AR(1)
model with fixed effects and arbitrary initial conditions and possibly
covariates when the time dimension, T, is fixed. When the autoregressive
parameter rho=1, the limiting modified profile log-likelihood function for this
model has a stationary point of inflection, and rho is first-order
underidentified but second-order identified. We show that the generalized MMLEs
exist w.p.a.1. and are uniquely defined w.p.1. and consistent for any value of
|rho| <= 1. When rho=1, the rate of convergence of the MMLEs is N^{1/4}, where
N is the cross-sectional dimension of the panel. We then develop an asymptotic
theory for GMM estimators when one of the parameters is only second-order
identified and use this to derive the limiting distributions of the MMLEs. They
are generally asymmetric when rho=1. We also show that Quasi LM tests that are
based on the modified profile log-likelihood and use its expected rather than
observed Hessian, with an additional modification for rho=1, and confidence
regions based on inverting these tests have correct asymptotic size in a
uniform sense when |rho| <= 1. Finally, we investigate the finite sample
properties of the MMLEs and the QLM test in a Monte Carlo study.

arXiv link: http://arxiv.org/abs/2508.20753v1

Econometrics arXiv paper, submitted: 2025-08-27

Inference on Partially Identified Parameters with Separable Nuisance Parameters: a Two-Stage Method

Authors: Xunkang Tian

This paper develops a two-stage method for inference on partially identified
parameters in moment inequality models with separable nuisance parameters. In
the first stage, the nuisance parameters are estimated separately, and in the
second stage, the identified set for the parameters of interest is constructed
using a refined chi-squared test with variance correction that accounts for the
first-stage estimation error. We establish the asymptotic validity of the
proposed method under mild conditions and characterize its finite-sample
properties. The method is broadly applicable to models where direct elimination
of nuisance parameters is difficult or introduces conservativeness. Its
practical performance is illustrated through an application: structural
estimation of entry and exit costs in the U.S. vehicle market based on Wollmann
(2018).

arXiv link: http://arxiv.org/abs/2508.19853v1

Econometrics arXiv cross-link from q-fin.PR (q-fin.PR), submitted: 2025-08-26

Is attention truly all we need? An empirical study of asset pricing in pretrained RNN sparse and global attention models

Authors: Shanyan Lai

This study investigates the pretrained RNN attention models with the
mainstream attention mechanisms such as additive attention, Luong's three
attentions, global self-attention (Self-att) and sliding window sparse
attention (Sparse-att) for the empirical asset pricing research on top 420
large-cap US stocks. This is the first paper to apply large-scale
state-of-the-art (SOTA) attention mechanisms in the asset pricing context.
These mechanisms overcome the limitations of traditional machine learning (ML)
based asset pricing, such as mis-capturing temporal dependency and short
memory. Moreover, the enforced causal masks in the attention mechanisms address
the future data leaking issue ignored by the more advanced attention-based
models, such as the classic Transformer. The proposed attention models also
consider the temporal sparsity characteristic of asset pricing data and
mitigate potential overfitting issues by deploying the simplified model
structures. This provides some insights for future empirical economic research.
All models are examined in three periods, which cover pre-COVID-19 (mild
uptrend), COVID-19 (steep uptrend with a large drawdown) and one year
post-COVID-19 (sideways movement with high fluctuations), for testing the
stability of these models under extreme market conditions. The study finds
that, in value-weighted portfolio backtesting, Model Self-att and Model
Sparse-att show strong ability to deliver absolute returns and hedge downside
risks, achieving annualized Sortino ratios of 2.0 and 1.80, respectively,
during the COVID-19 period. Model Sparse-att also performs more stably than
Model Self-att in terms of absolute portfolio returns across stocks of
different market capitalizations.

arXiv link: http://arxiv.org/abs/2508.19006v1
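
As a pointer to what the "enforced causal masks" and "sliding window sparse
attention" amount to mechanically, a small NumPy sketch of the two mask
patterns applied to attention scores; the window length and dimensions are
arbitrary choices for illustration, not the paper's configuration.

    import numpy as np

    def masked_softmax(scores, mask):
        scores = np.where(mask, scores, -np.inf)   # blocked positions get zero weight
        scores = scores - scores.max(axis=-1, keepdims=True)
        w = np.exp(scores)
        return w / w.sum(axis=-1, keepdims=True)

    T = 6
    rng = np.random.default_rng(0)
    scores = rng.normal(size=(T, T))               # raw query-key scores for one head

    i, j = np.indices((T, T))
    causal_mask = j <= i                           # no attention to future time steps
    window = 2
    sparse_mask = causal_mask & (i - j <= window)  # causal + sliding-window sparsity

    print(np.round(masked_softmax(scores, causal_mask), 2))
    print(np.round(masked_softmax(scores, sparse_mask), 2))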

Econometrics arXiv paper, submitted: 2025-08-21

A bias test for heteroscedastic linear least-squares regression

Authors: Eric Blankmeyer

Linear least squares regression is subject to bias due to an omitted
variable, a mismeasured regressor, or simultaneity. A simple test to detect the
bias is proposed and explored in simulation and in real data sets.

arXiv link: http://arxiv.org/abs/2508.15969v1

Econometrics arXiv paper, submitted: 2025-08-21

Multivariate quantile regression

Authors: Antonio F. Galvao, Gabriel Montes-Rojas

This paper introduces a new framework for multivariate quantile regression
based on the multivariate distribution function, termed multivariate quantile
regression (MQR). In contrast to existing approaches--such as directional
quantiles, vector quantile regression, or copula-based methods--MQR defines
quantiles through the conditional probability structure of the joint
conditional distribution function. The method constructs multivariate quantile
curves using sequential univariate quantile regressions derived from
conditioning mechanisms, allowing for an intuitive interpretation and flexible
estimation of marginal effects. The paper develops theoretical foundations of
MQR, including asymptotic properties of the estimators. Through simulation
exercises, the estimator demonstrates robust finite sample performance across
different dependence structures. As an empirical application, the MQR framework
is applied to the analysis of exchange rate pass-through in Argentina from 2004
to 2024.

arXiv link: http://arxiv.org/abs/2508.15749v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-08-21

Effect Identification and Unit Categorization in the Multi-Score Regression Discontinuity Design with Application to LED Manufacturing

Authors: Philipp Alexander Schwarz, Oliver Schacht, Sven Klaassen, Johannes Oberpriller, Martin Spindler

RDD (Regression discontinuity design) is a widely used framework for
identifying and estimating causal effects at the cutoff of a single running
variable. In practice, however, decision-making often involves multiple
thresholds and criteria, especially in production systems. Standard MRD
(multi-score RDD) methods address this complexity by reducing the problem to a
one-dimensional design. This simplification allows existing approaches to be
used to identify and estimate causal effects, but it can introduce
non-compliance by misclassifying units relative to the original cutoff rules.
We develop theoretical tools to detect and reduce "fuzziness" when estimating
the cutoff effect for units that comply with individual subrules of a
multi-rule system. In particular, we propose a formal definition and
categorization of unit behavior types under multi-dimensional cutoff rules,
extending standard classifications of compliers, always-takers, and never-takers,
and incorporating defiers and indecisive units. We further identify conditions
under which cutoff effects for compliers can be estimated in multiple
dimensions, and establish when identification remains valid after excluding
never-takers and always-takers. In addition, we examine how decomposing complex
Boolean cutoff rules (such as AND- and OR-type rules) into simpler components
affects the classification of units into behavioral types and improves
estimation by making it possible to identify and remove non-compliant units
more accurately. We validate our framework using both semi-synthetic
simulations calibrated to production data and real-world data from
opto-electronic semiconductor manufacturing. The empirical results demonstrate
that our approach has practical value in refining production policies and
reduces estimation variance. This underscores the usefulness of the MRD
framework in manufacturing contexts.

arXiv link: http://arxiv.org/abs/2508.15692v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-08-21

Large-dimensional Factor Analysis with Weighted PCA

Authors: Zhongyuan Lyu, Ming Yuan

Principal component analysis (PCA) is arguably the most widely used approach
for large-dimensional factor analysis. While it is effective when the factors
are sufficiently strong, it can be inconsistent when the factors are weak
and/or the noise has complex dependence structure. We argue that the
inconsistency often stems from bias and introduce a general approach to restore
consistency. Specifically, we propose a general weighting scheme for PCA and
show that with a suitable choice of weighting matrices, it is possible to
obtain consistent and asymptotically normal estimators under much weaker conditions
than the usual PCA. While the optimal weight matrix may require knowledge about
the factors and covariance of the idiosyncratic noise that are not known a
priori, we develop an agnostic approach to adaptively choose from a large class
of weighting matrices that can be viewed as PCA for weighted linear
combinations of auto-covariances among the observations. Theoretical and
numerical results demonstrate the merits of our methodology over the usual PCA
and other recently developed techniques for large-dimensional approximate
factor models.

arXiv link: http://arxiv.org/abs/2508.15675v1
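
For intuition, here is a minimal sketch of one way a weighting scheme over
auto-covariances can replace plain PCA: form a weighted sum of lag-k
autocovariance outer products and take its leading eigenvectors as estimated
loadings. The lag set and weights below are placeholders, not the paper's
adaptive choice of weighting matrices.

    import numpy as np

    def weighted_pca_loadings(X, n_factors, lags=(1, 2, 3), weights=None):
        """X: (T, N) panel. Eigenanalysis of a weighted sum of lag-k
        autocovariance outer products (a placeholder weighting scheme)."""
        T, N = X.shape
        Xc = X - X.mean(axis=0)
        weights = np.ones(len(lags)) if weights is None else np.asarray(weights)
        M = np.zeros((N, N))
        for w, k in zip(weights, lags):
            Gk = Xc[:-k].T @ Xc[k:] / (T - k)   # lag-k autocovariance matrix
            M += w * Gk @ Gk.T                  # symmetric positive semidefinite
        eigvec = np.linalg.eigh(M)[1]
        return eigvec[:, ::-1][:, :n_factors]   # leading eigenvectors as loadings

    # Simulated one-factor panel with a serially correlated factor and noisy errors
    rng = np.random.default_rng(3)
    T, N = 400, 50
    f = np.zeros(T)
    for t in range(1, T):
        f[t] = 0.7 * f[t - 1] + rng.normal()
    loadings = rng.normal(size=N)
    X = np.outer(f, loadings) + 2.0 * rng.normal(size=(T, N))

    L_hat = weighted_pca_loadings(X, n_factors=1)
    print(round(abs(np.corrcoef(L_hat[:, 0], loadings)[0, 1]), 3))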

Econometrics arXiv paper, submitted: 2025-08-21

K-Means Panel Data Clustering in the Presence of Small Groups

Authors: Mikihito Nishi

We consider panel data models with group structure. We study the asymptotic
behavior of least-squares estimators and information criterion for the number
of groups, allowing for the presence of small groups that have an
asymptotically negligible relative size. Our contributions are threefold.
First, we derive sufficient conditions under which the least-squares estimators
are consistent and asymptotically normal. One of the conditions implies that a
longer sample period is required as there are smaller groups. Second, we show
that information criteria for the number of groups proposed in earlier works
can be inconsistent or perform poorly in the presence of small groups. Third,
we propose modified information criteria (MIC) designed to perform well in the
presence of small groups. A Monte Carlo simulation confirms their good
performance in finite samples. An empirical application illustrates that
K-means clustering paired with the proposed MIC allows one to discover small
groups without producing too many groups. This enables characterizing small
groups and differentiating them from the other large groups in a parsimonious
group structure.

arXiv link: http://arxiv.org/abs/2508.15408v1
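
The basic pairing of K-means with an information criterion for the number of
groups can be sketched in a few lines. The penalty term below is a generic
placeholder, not the modified information criterion proposed in the paper, and
the simulation uses group-specific intercepts only.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(4)
    N, T = 300, 60
    sizes = [140, 140, 20]                        # one small group
    means = [0.0, 2.0, 6.0]
    Y = np.vstack([m + rng.normal(size=(s, T)) for s, m in zip(sizes, means)])

    def ic(Y, G, penalty):
        km = KMeans(n_clusters=G, n_init=10, random_state=0).fit(Y)
        ssr = km.inertia_ / Y.size                # average within-group sum of squares
        return np.log(ssr) + penalty * G

    penalty = np.log(min(N, T)) / min(N, T)       # placeholder penalty, NOT the paper's MIC
    scores = {G: ic(Y, G, penalty) for G in range(1, 7)}
    print("selected number of groups:", min(scores, key=scores.get))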

Econometrics arXiv paper, submitted: 2025-08-19

A Nonparametric Approach to Augmenting a Bayesian VAR with Nonlinear Factors

Authors: Todd Clark, Florian Huber, Gary Koop

This paper proposes a Vector Autoregression augmented with nonlinear factors
that are modeled nonparametrically using regression trees. There are four main
advantages of our model. First, modeling potential nonlinearities
nonparametrically lessens the risk of mis-specification. Second, the use of
factor methods ensures that departures from linearity are modeled
parsimoniously. In particular, they exhibit functional pooling where a small
number of nonlinear factors are used to model common nonlinearities across
variables. Third, Bayesian computation using MCMC is straightforward even in
very high dimensional models, allowing for efficient, equation by equation
estimation, thus avoiding computational bottlenecks that arise in popular
alternatives such as the time varying parameter VAR. Fourth, existing methods
for identifying structural economic shocks in linear factor models can be
adapted for the nonlinear case in a straightforward fashion using our model.
Exercises involving artificial and macroeconomic data illustrate the properties
of our model and its usefulness for forecasting and structural economic
analysis.

arXiv link: http://arxiv.org/abs/2508.13972v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-08-19

Partial Identification of Causal Effects for Endogenous Continuous Treatments

Authors: Abhinandan Dalal, Eric J. Tchetgen Tchetgen

No unmeasured confounding is a common assumption when reasoning about
counterfactual outcomes, but such an assumption may not be plausible in
observational studies. Sensitivity analysis is often employed to assess the
robustness of causal conclusions to unmeasured confounding, but existing
methods are predominantly designed for binary treatments. In this paper, we
provide natural extensions of two extensively used sensitivity frameworks --
the Rosenbaum and Marginal sensitivity models -- to the setting of continuous
exposures. Our generalization replaces scalar sensitivity parameters with
sensitivity functions that vary with exposure level, enabling richer modeling
and sharper identification bounds. We develop a unified pseudo-outcome
regression formulation for bounding the counterfactual dose-response curve
under both models, and propose corresponding nonparametric estimators which
have second order bias. These estimators accommodate modern machine learning
methods for obtaining nuisance parameter estimators, which are shown to achieve
$L^2$-consistency and minimax rates of convergence under suitable conditions. Our
resulting estimators of bounds for the counterfactual dose-response curve are
shown to be consistent and asymptotically normal, allowing for a user-specified
bound on the degree of uncontrolled exposure endogeneity. We also offer a
geometric interpretation that relates the Rosenbaum and Marginal sensitivity
model and guides their practical usage in global versus targeted sensitivity
analysis. The methods are validated through simulations and a real-data
application on the effect of second-hand smoke exposure on blood lead levels in
children.

arXiv link: http://arxiv.org/abs/2508.13946v1

Econometrics arXiv paper, submitted: 2025-08-18

Reasonable uncertainty: Confidence intervals in empirical Bayes discrimination detection

Authors: Jiaying Gu, Nikolaos Ignatiadis, Azeem M. Shaikh

We revisit empirical Bayes discrimination detection, focusing on uncertainty
arising from both partial identification and sampling variability. While prior
work has mostly focused on partial identification, we find that some empirical
findings are not robust to sampling uncertainty. To better connect statistical
evidence to the magnitude of real-world discriminatory behavior, we propose a
counterfactual odds-ratio estimand with attractive properties and
interpretation. Our analysis reveals the importance of careful attention to
uncertainty quantification and downstream goals in empirical Bayes analyses.

arXiv link: http://arxiv.org/abs/2508.13110v1

Econometrics arXiv updated paper (originally submitted: 2025-08-18)

The purpose of an estimator is what it does: Misspecification, estimands, and over-identification

Authors: Isaiah Andrews, Jiafeng Chen, Otavio Tecchio

In over-identified models, misspecification -- the norm rather than exception
-- fundamentally changes what estimators estimate. Different estimators imply
different estimands rather than different efficiency for the same target. A
review of recent applications of generalized method of moments in the American
Economic Review suggests widespread acceptance of this fact: There is little
formal specification testing and widespread use of estimators that would be
inefficient were the model correct, including the use of "hand-selected"
moments and weighting matrices. Motivated by these observations, we review and
synthesize recent results on estimation under model misspecification, providing
guidelines for transparent and robust empirical research. We also provide a new
theoretical result, showing that Hansen's J-statistic measures, asymptotically,
the range of estimates achievable at a given standard error. Given the
widespread use of inefficient estimators and the resulting researcher degrees
of freedom, we thus particularly recommend the broader reporting of
J-statistics.

arXiv link: http://arxiv.org/abs/2508.13076v3
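
To keep the object being reinterpreted concrete, here is a minimal NumPy
sketch of the standard two-step GMM J-statistic in a simulated linear IV
model. It shows only the familiar statistic, not the paper's new result about
what the J-statistic measures under misspecification.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    n, m = 2000, 3                               # observations, instruments (1 parameter)
    Z = rng.normal(size=(n, m))
    x = Z @ np.array([0.6, 0.4, 0.2]) + rng.normal(size=n)
    y = 1.0 * x + rng.normal(size=n)             # m - 1 = 2 overidentifying restrictions

    def gmm_linear_iv(y, x, Z):
        X = x[:, None]
        def solve(W):
            A = X.T @ Z @ W @ Z.T @ X
            b = X.T @ Z @ W @ Z.T @ y
            return np.linalg.solve(A, b)
        beta = solve(np.linalg.inv(Z.T @ Z / n))  # first step: 2SLS weighting
        g = Z * (y - X @ beta)[:, None]           # moment contributions
        beta = solve(np.linalg.inv(g.T @ g / n))  # second step: efficient weighting
        g = Z * (y - X @ beta)[:, None]
        gbar, S = g.mean(axis=0), g.T @ g / n
        J = n * gbar @ np.linalg.solve(S, gbar)   # Hansen's J-statistic
        return beta, J

    beta_hat, J = gmm_linear_iv(y, x, Z)
    df = m - 1
    print(round(float(beta_hat[0]), 3), round(J, 3), round(1 - stats.chi2.cdf(J, df), 3))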

Econometrics arXiv paper, submitted: 2025-08-18

Estimation in linear models with clustered data

Authors: Anna Mikusheva, Mikkel Sølvsten, Baiyun Jing

We study linear regression models with clustered data, high-dimensional
controls, and a complicated structure of exclusion restrictions. We propose a
correctly centered internal IV estimator that accommodates a variety of
exclusion restrictions and permits within-cluster dependence. The estimator has
a simple leave-out interpretation and remains computationally tractable. We
derive a central limit theorem for its quadratic form and propose a robust
variance estimator. We also develop inference methods that remain valid under
weak identification. Our framework extends classical dynamic panel methods to
more general clustered settings. An empirical application of a large-scale
fiscal intervention in rural Kenya with spatial interference illustrates the
approach.

arXiv link: http://arxiv.org/abs/2508.12860v1

Econometrics arXiv paper, submitted: 2025-08-18

Bivariate Distribution Regression; Theory, Estimation and an Application to Intergenerational Mobility

Authors: Victor Chernozhukov, Iván Fernández-Val, Jonas Meier, Aico van Vuuren, Francis Vella

We employ distribution regression (DR) to estimate the joint distribution of
two outcome variables conditional on chosen covariates. While Bivariate
Distribution Regression (BDR) is useful in a variety of settings, it is
particularly valuable when some dependence between the outcomes persists after
accounting for the impact of the covariates. Our analysis relies on a result
from Chernozhukov et al. (2018) which shows that any conditional joint
distribution has a local Gaussian representation. We describe how BDR can be
implemented and present some associated functionals of interest. As modeling
the unexplained dependence is a key feature of BDR, we focus on functionals
related to this dependence. We decompose the difference between the joint
distributions for different groups into composition, marginal and sorting
effects. We provide a similar decomposition for the transition matrices which
describe how location in the distribution in one of the outcomes is associated
with location in the other. Our theoretical contributions are the derivation of
the properties of these estimated functionals and appropriate procedures for
inference. Our empirical illustration focuses on intergenerational mobility.
Using the Panel Survey of Income Dynamics data, we model the joint distribution
of parents' and children's earnings. By comparing the observed distribution
with constructed counterfactuals, we isolate the impact of observable and
unobservable factors on the observed joint distribution. We also evaluate the
forces responsible for the difference between the transition matrices of sons
and daughters.

arXiv link: http://arxiv.org/abs/2508.12716v1
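
A hedged sketch of the basic BDR building block: for a grid point (y1, y2), a
binary regression of the joint indicator on covariates estimates the
conditional joint distribution function. The probit link, grid point and
simulated earnings-style data are choices of mine; the paper's local Gaussian
representation and decomposition functionals go well beyond this.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(6)
    n = 3000
    x = rng.normal(size=n)
    e = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=n)
    y1 = 0.8 * x + e[:, 0]                 # e.g. parents' earnings (stylized)
    y2 = 0.5 * x + e[:, 1]                 # e.g. children's earnings (stylized)
    X = sm.add_constant(x)

    def joint_cdf_hat(t1, t2, x_eval):
        """Estimate F(t1, t2 | x) by a probit of the joint indicator on covariates."""
        d = ((y1 <= t1) & (y2 <= t2)).astype(float)
        fit = sm.Probit(d, X).fit(disp=0)
        return float(fit.predict(np.array([[1.0, x_eval]]))[0])

    # Conditional joint CDF at the unconditional medians, for two covariate values
    t1, t2 = np.median(y1), np.median(y2)
    print(round(joint_cdf_hat(t1, t2, -1.0), 3), round(joint_cdf_hat(t1, t2, 1.0), 3))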

Econometrics arXiv paper, submitted: 2025-08-18

Bayesian Double Machine Learning for Causal Inference

Authors: Francis J. DiTraglia, Laura Liu

This paper proposes a simple, novel, and fully-Bayesian approach for causal
inference in partially linear models with high-dimensional control variables.
Off-the-shelf machine learning methods can introduce biases in the causal
parameter known as regularization-induced confounding. To address this, we
propose a Bayesian Double Machine Learning (BDML) method, which modifies a
standard Bayesian multivariate regression model and recovers the causal effect
of interest from the reduced-form covariance matrix. Our BDML is related to the
burgeoning frequentist literature on DML while addressing its limitations in
finite-sample inference. Moreover, the BDML is based on a fully generative
probability model in the DML context, adhering to the likelihood principle. We
show that in high dimensional setups the naive estimator implicitly assumes no
selection on observables--unlike our BDML. The BDML exhibits lower asymptotic
bias and achieves asymptotic normality and semiparametric efficiency as
established by a Bernstein-von Mises theorem, thereby ensuring robustness to
misspecification. In simulations, our BDML achieves lower RMSE, better
frequentist coverage, and shorter confidence interval width than alternatives
from the literature, both Bayesian and frequentist.

arXiv link: http://arxiv.org/abs/2508.12688v1

Econometrics arXiv updated paper (originally submitted: 2025-08-17)

Reconstructing Subnational Labor Indicators in Colombia: An Integrated Machine and Deep Learning Approach

Authors: Jaime Vera-Jaramillo

This study proposes a unified multi-stage framework to reconstruct consistent
monthly and annual labor indicators for all 33 Colombian departments from 1993
to 2025. The approach integrates temporal disaggregation, time-series splicing
and interpolation, statistical learning, and institutional covariates to
estimate seven key variables: employment, unemployment, labor force
participation (PEA), inactivity, working-age population (PET), total
population, and informality rate, including in regions without direct survey
coverage. The framework enforces labor accounting identities, scales results to
demographic projections, and aligns all estimates with national benchmarks to
ensure internal coherence. Validation against official departmental GEIH
aggregates and city-level informality data for the 23 metropolitan areas yields
in-sample Mean Absolute Percentage Errors (MAPEs) below 2.3% across indicators,
confirming strong predictive performance. To our knowledge, this is the first
dataset to provide spatially exhaustive and temporally consistent monthly labor
measures for Colombia. By incorporating both quantitative and qualitative
dimensions of employment, the panel enhances the empirical foundation for
analysing long-term labor market dynamics, identifying regional disparities,
and designing targeted policy interventions.

arXiv link: http://arxiv.org/abs/2508.12514v2

Econometrics arXiv paper, submitted: 2025-08-17

A statistician's guide to weak-instrument-robust inference in instrumental variables regression with illustrations in Python

Authors: Malte Londschien

We provide an overview of results relating to estimation and
weak-instrument-robust inference in instrumental variables regression. Methods
are implemented in the ivmodels software package for Python, which we use to
illustrate results.

arXiv link: http://arxiv.org/abs/2508.12474v1
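
For readers wanting a feel for the kind of procedure surveyed, here is a
hand-rolled Anderson-Rubin test written from the textbook formula
(homoskedastic errors, no included exogenous regressors). It deliberately does
not reproduce the ivmodels package API.

    import numpy as np
    from scipy import stats

    def anderson_rubin_test(y, X, Z, beta0):
        """AR test of H0: beta = beta0 in y = X beta + u with instruments Z.
        Homoskedastic errors, no included exogenous regressors (a simplification)."""
        n, k = Z.shape
        u0 = y - X @ beta0
        Pu = Z @ np.linalg.solve(Z.T @ Z, Z.T @ u0)   # projection of u0 on instruments
        F = (u0 @ Pu / k) / ((u0 @ u0 - u0 @ Pu) / (n - k))
        return F, 1 - stats.f.cdf(F, k, n - k)

    rng = np.random.default_rng(7)
    n = 1000
    Z = rng.normal(size=(n, 3))
    v = rng.normal(size=n)
    x = Z @ np.array([0.05, 0.05, 0.05]) + v          # weak first stage
    y = 0.5 * x + 0.8 * v + rng.normal(size=n)        # endogeneity via v

    F, p = anderson_rubin_test(y, x[:, None], Z, np.array([0.5]))
    print(round(F, 3), round(p, 3))   # correctly sized even with weak instruments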

Econometrics arXiv updated paper (originally submitted: 2025-08-17)

The Identification Power of Combining Experimental and Observational Data for Distributional Treatment Effect Parameters

Authors: Shosei Sakaguchi

This study investigates the identification power gained by combining
experimental data, in which treatment is randomized, with observational data,
in which treatment is self-selected, for distributional treatment effect (DTE)
parameters. While experimental data identify average treatment effects, many
DTE parameters, such as the distribution of individual treatment effects, are
only partially identified. We examine whether and how combining these two data
sources tightens the identified set for such parameters. For broad classes of
DTE parameters, we derive nonparametric sharp bounds under the combined data
and clarify the mechanism through which data combination improves
identification relative to using experimental data alone. Our analysis
highlights that self-selection in observational data is a key source of
identification power. We establish necessary and sufficient conditions under
which the combined data shrink the identified set, showing that such shrinkage
generally occurs unless selection-on-observables holds in the observational
data. We also propose a linear programming approach to compute sharp bounds
that can incorporate additional structural restrictions, such as positive
dependence between potential outcomes and the generalized Roy model. An
empirical application using data on negative campaign advertisements in the
2008 U.S. presidential election illustrates the practical relevance of the
proposed approach.

arXiv link: http://arxiv.org/abs/2508.12206v3

Econometrics arXiv cross-link from stat.CO (stat.CO), submitted: 2025-08-16

A note on simulation methods for the Dirichlet-Laplace prior

Authors: Luis Gruber, Gregor Kastner, Anirban Bhattacharya, Debdeep Pati, Natesh Pillai, David Dunson

Bhattacharya et al. (2015, Journal of the American Statistical Association
110(512): 1479-1490) introduce a novel prior, the Dirichlet-Laplace (DL) prior,
and propose a Markov chain Monte Carlo (MCMC) method to simulate posterior
draws under this prior in a conditionally Gaussian setting. The original
algorithm samples from conditional distributions in the wrong order, i.e., it
does not correctly sample from the joint posterior distribution of all latent
variables. This note details the issue and provides two simple solutions: A
correction to the original algorithm and a new algorithm based on an
alternative, yet equivalent, formulation of the prior. This corrigendum does
not affect the theoretical results in Bhattacharya et al. (2015).

arXiv link: http://arxiv.org/abs/2508.11982v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-08-15

Approximate Factor Model with S-vine Copula Structure

Authors: Jialing Han, Yu-Ning Li

We propose a novel framework for approximate factor models that integrates an
S-vine copula structure to capture complex dependencies among common factors.
Our estimation procedure proceeds in two steps: first, we apply principal
component analysis (PCA) to extract the factors; second, we employ maximum
likelihood estimation that combines kernel density estimation for the margins
with an S-vine copula to model the dependence structure. Jointly fitting the
S-vine copula with the margins yields an oblique factor rotation without
resorting to ad hoc restrictions or traditional projection pursuit methods. Our
theoretical contributions include establishing the consistency of the rotation
and copula parameter estimators, developing asymptotic theory for the
factor-projected empirical process under dependent data, and proving the
uniform consistency of the projected entropy estimators. Simulation studies
demonstrate convergence with respect to both the dimensionality and the sample
size. We further assess model performance through Value-at-Risk (VaR)
estimation via Monte Carlo methods and apply our methodology to the daily
returns of S&P 500 Index constituents to forecast the VaR of the S&P 500 Index.

arXiv link: http://arxiv.org/abs/2508.11619v1

Econometrics arXiv paper, submitted: 2025-08-15

Binary choice logit models with general fixed effects for panel and network data

Authors: Kevin Dano, Bo E. Honoré, Martin Weidner

This paper systematically analyzes and reviews identification strategies for
binary choice logit models with fixed effects in panel and network data
settings. We examine both static and dynamic models with general fixed-effect
structures, including individual effects, time trends, and two-way or dyadic
effects. A key challenge is the incidental parameter problem, which arises from
the increasing number of fixed effects as the sample size grows. We explore two
main strategies for eliminating nuisance parameters: conditional likelihood
methods, which remove fixed effects by conditioning on sufficient statistics,
and moment-based methods, which derive fixed-effect-free moment conditions. We
demonstrate how these approaches apply to a variety of models, summarizing key
findings from the literature while also presenting new examples and new
results.

arXiv link: http://arxiv.org/abs/2508.11556v1
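
As a concrete instance of the conditional-likelihood strategy, the sketch
below implements Chamberlain's conditional logit for a static two-period
panel: conditioning on y_{i1} + y_{i2} = 1 eliminates the individual effects,
leaving a logit of the period-2 indicator on differenced covariates (the
constant absorbs a common time effect). The data are simulated; the paper
covers far more general fixed-effect structures.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(8)
    N, beta = 5000, 1.0
    alpha = rng.normal(size=N)                 # individual fixed effects
    x = rng.normal(size=(N, 2))                # covariate in periods 1 and 2
    p = 1 / (1 + np.exp(-(alpha[:, None] + beta * x)))
    y = (rng.uniform(size=(N, 2)) < p).astype(int)

    switchers = y.sum(axis=1) == 1             # condition on y_i1 + y_i2 = 1
    d = y[switchers, 1]                        # the "1" occurs in period 2
    dx = x[switchers, 1] - x[switchers, 0]     # differenced covariate
    res = sm.Logit(d, sm.add_constant(dx)).fit(disp=0)
    print(res.params.round(3))                 # slope estimates beta; constant ~ 0 here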

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2025-08-15

Stealing Accuracy: Predicting Day-ahead Electricity Prices with Temporal Hierarchy Forecasting (THieF)

Authors: Arkadiusz Lipiecki, Kaja Bilinska, Nicolaos Kourentzes, Rafal Weron

We introduce the concept of temporal hierarchy forecasting (THieF) in
predicting day-ahead electricity prices and show that reconciling forecasts for
hourly products, 2- to 12-hour blocks, and baseload contracts significantly (up
to 13%) improves accuracy at all levels. These results remain consistent
throughout a challenging 4-year test period (2021-2024) in the German power
market and across model architectures, including linear regression, a shallow
neural network, gradient boosting, and a state-of-the-art transformer. Given
that (i) trading of block products is becoming more common and (ii) the
computational cost of reconciliation is comparable to that of predicting hourly
prices alone, we recommend using it in daily forecasting practice.

arXiv link: http://arxiv.org/abs/2508.11372v1
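
To show what reconciliation across a temporal hierarchy involves, a minimal
NumPy sketch with four "hourly" price forecasts, two 2-hour blocks and one
baseload aggregate, reconciled by a simple least-squares projection onto the
coherent subspace. The real hierarchy is much richer and THieF implementations
typically use structural or variance-based weights rather than this equal
weighting; all numbers are hypothetical.

    import numpy as np

    # Aggregation matrix S: baseload and block prices are averages of hourly prices.
    S = np.array([
        [0.25, 0.25, 0.25, 0.25],   # baseload = average of the 4 hourly prices
        [0.5,  0.5,  0.0,  0.0 ],   # block 1 = average of hours 1-2
        [0.0,  0.0,  0.5,  0.5 ],   # block 2 = average of hours 3-4
        [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
    ])

    # Incoherent base forecasts for all 7 nodes (hypothetical EUR/MWh values).
    y_hat = np.array([52.0, 49.5, 55.0, 48.0, 50.0, 54.0, 57.0])

    # Least-squares reconciliation: project onto the column space of S.
    b_tilde = np.linalg.solve(S.T @ S, S.T @ y_hat)   # reconciled hourly forecasts
    y_tilde = S @ b_tilde                             # coherent forecasts at every level

    print(np.round(y_tilde, 2))
    print(round(y_tilde[0] - y_tilde[3:].mean(), 6))  # aggregation constraint holds (0.0)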

Econometrics arXiv updated paper (originally submitted: 2025-08-15)

Factor Models of Matrix-Valued Time Series: Nonstationarity and Cointegration

Authors: Degui Li, Yayi Yan, Qiwei Yao

In this paper, we consider the nonstationary matrix-valued time series with
common stochastic trends. Unlike the traditional factor analysis which flattens
matrix observations into vectors, we adopt a matrix factor model in order to
fully explore the intrinsic matrix structure in the data, allowing interaction
between the row and column stochastic trends, and subsequently improving the
estimation convergence. It also reduces the computation complexity in
estimation. The main estimation methodology is built on the eigenanalysis of
sample row and column covariance matrices when the nonstationary matrix factors
are of full rank and the idiosyncratic components are temporally stationary,
and is further extended to tackle a more flexible setting when the matrix
factors are cointegrated and the idiosyncratic components may be nonstationary.
Under some mild conditions which allow the existence of weak factors, we derive
the convergence theory for the estimated factor loading matrices and
nonstationary factor matrices. In particular, the developed methodology and
theory are applicable to the general case of heterogeneous strengths over weak
factors. An easy-to-implement ratio criterion is adopted to consistently
estimate the size of latent factor matrix. Both simulation and empirical
studies are conducted to examine the numerical performance of the developed
model and methodology in finite samples.

arXiv link: http://arxiv.org/abs/2508.11358v2
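
For intuition about the eigenanalysis step, a stationary-case sketch in NumPy:
the row and column loading spaces are estimated from leading eigenvectors of
the sample row and column covariance matrices of the matrix-valued series. The
nonstationary, cointegrated setting analyzed in the paper changes the scaling
and the theory; treat this purely as the mechanical core.

    import numpy as np

    rng = np.random.default_rng(9)
    T, p, q, k1, k2 = 300, 20, 15, 2, 2
    R = rng.normal(size=(p, k1))                       # row loadings
    C = rng.normal(size=(q, k2))                       # column loadings
    F = rng.normal(size=(T, k1, k2))                   # factor matrices
    X = np.einsum("pk,tkl,ql->tpq", R, F, C) + rng.normal(size=(T, p, q))

    M_row = np.einsum("tpq,trq->pr", X, X) / (T * q)   # sample row covariance
    M_col = np.einsum("tpq,tpr->qr", X, X) / (T * p)   # sample column covariance

    R_hat = np.linalg.eigh(M_row)[1][:, ::-1][:, :k1]  # leading eigenvectors
    C_hat = np.linalg.eigh(M_col)[1][:, ::-1][:, :k2]

    # The estimated row loading space should span nearly the same space as R.
    proj = R_hat @ R_hat.T                             # projector onto estimated space
    print(round(np.linalg.norm(proj @ R) / np.linalg.norm(R), 3))   # close to 1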

Econometrics arXiv cross-link from q-fin.MF (q-fin.MF), submitted: 2025-08-14

Higher-order Gini indices: An axiomatic approach

Authors: Xia Han, Ruodu Wang, Qinyu Wu

Via an axiomatic approach, we characterize the family of n-th order Gini
deviation, defined as the expected range over n independent draws from a
distribution, to quantify joint dispersion across multiple observations. This
family extends the classical Gini deviation, which relies solely on pairwise
comparisons. The normalized version is called a higher-order Gini coefficient.
The generalized indices grow increasingly sensitive to tail inequality as n
increases, offering a more nuanced view of distributional extremes. The
higher-order Gini deviations admit a Choquet integral representation,
inheriting the desirable properties of coherent deviation measures.
Furthermore, we show that both the n-th order Gini deviation and the n-th order
Gini coefficient are statistically n-observation elicitable, allowing for
direct computation through empirical risk minimization. Data analysis using
World Inequality Database data reveals that higher-order Gini coefficients
capture disparities that the classical Gini coefficient may fail to reflect,
particularly in cases of extreme income or wealth concentration.

arXiv link: http://arxiv.org/abs/2508.10663v2
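
Since the n-th order Gini deviation is the expected range over n independent
draws, it can be approximated directly by resampling. A small NumPy sketch
(Monte Carlo from the empirical distribution, rather than the paper's
elicitability-based estimation, and without the normalization that defines the
coefficient):

    import numpy as np

    def gini_deviation(x, order, n_draws=200_000, seed=0):
        """Estimate the order-th Gini deviation, E[max - min] over `order` i.i.d.
        draws, by resampling from the empirical distribution."""
        rng = np.random.default_rng(seed)
        samples = rng.choice(x, size=(n_draws, order), replace=True)
        return (samples.max(axis=1) - samples.min(axis=1)).mean()

    rng = np.random.default_rng(10)
    incomes = rng.lognormal(mean=10.0, sigma=1.0, size=50_000)   # heavy-tailed 'incomes'
    for n in (2, 3, 5):                 # n = 2 corresponds to the classical Gini deviation
        print(n, round(gini_deviation(incomes, n), 1))
    # Higher orders put more weight on the tails, so the deviation grows with n.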

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-08-14

On the implications of proportional hazards assumptions for competing risks modelling

Authors: Simon M. S. Lo, Ralf A. Wilke, Takeshi Emura

The assumption of hazard rates being proportional in covariates is widely
made in empirical research and extensive research has been done to develop
tests of its validity. This paper does not contribute on this end. Instead, it
gives new insights on the implications of proportional hazards (PH) modelling
in competing risks models. It is shown that the use of a PH model for the
cause-specific hazards or subdistribution hazards can strongly restrict the
class of copulas and marginal hazards for being compatible with a competing
risks model. The empirical researcher should be aware that working with these
models can be so restrictive that only degenerate or independent risks models
are compatible. Numerical results confirm that estimates of cause-specific
hazards models are not informative about patterns in the data generating
process.

arXiv link: http://arxiv.org/abs/2508.10577v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2025-08-14

Heterogeneity in Women's Nighttime Ride-Hailing Intention: Evidence from an LC-ICLV Model Analysis

Authors: Ke Wang, Dongmin Yao, Xin Ye, Mingyang Pei

While ride-hailing services offer increased travel flexibility and
convenience, persistent nighttime safety concerns significantly reduce women's
willingness to use them. Existing research often treats women as a homogeneous
group, neglecting the heterogeneity in their decision-making processes. To
address this gap, this study develops the Latent Class Integrated Choice and
Latent Variable (LC-ICLV) model with a mixed Logit kernel, combined with an
ordered Probit model for attitudinal indicators, to capture unobserved
heterogeneity in women's nighttime ride-hailing decisions. Based on panel data
from 543 respondents across 29 provinces in China, the analysis identifies two
distinct female subgroups. The first, labeled the "Attribute-Sensitive Group",
consists mainly of young women and students from first- and second-tier cities.
Their choices are primarily influenced by observable service attributes such as
price and waiting time, but they exhibit reduced usage intention when matched
with female drivers, possibly reflecting deeper safety heuristics. The second,
the "Perception-Sensitive Group", includes older working women and residents of
less urbanized areas. Their decisions are shaped by perceived risk and safety
concerns; notably, high-frequency use or essential nighttime commuting needs
may reinforce rather than alleviate avoidance behaviors. The findings
underscore the need for differentiated strategies: platforms should tailor
safety features and user interfaces by subgroup, policymakers must develop
targeted interventions, and female users can benefit from more personalized
risk mitigation strategies. This study offers empirical evidence to advance
gender-responsive mobility policy and improve the inclusivity of ride-hailing
services in urban nighttime contexts.

arXiv link: http://arxiv.org/abs/2508.10951v1

Econometrics arXiv paper, submitted: 2025-08-14

Two-Way Mean Group Estimators for Heterogeneous Panel Models with Fixed T

Authors: Xun Lu, Liangjun Su

We consider a correlated random coefficient panel data model with two-way
fixed effects and interactive fixed effects in a fixed T framework. We propose
a two-way mean group (TW-MG) estimator for the expected value of the slope
coefficient and propose a leave-one-out jackknife method for valid inference.
We also consider a pooled estimator and provide a Hausman-type test for
poolability. Simulations demonstrate the excellent performance of our
estimators and inference methods in finite samples. We apply our new methods to
two datasets to examine the relationship between health-care expenditure and
income, and estimate a production function.

arXiv link: http://arxiv.org/abs/2508.10302v1

Econometrics arXiv paper, submitted: 2025-08-13

Machine Learning for Detecting Collusion and Capacity Withholding in Wholesale Electricity Markets

Authors: Jeremy Proz, Martin Huber

Collusion and capacity withholding in electricity wholesale markets are
important mechanisms of market manipulation. This study applies a refined
machine learning-based cartel detection algorithm to two cartel cases in the
Italian electricity market and evaluates its out-of-sample performance.
Specifically, we consider an ensemble machine learning method that uses
statistical screens constructed from the offer price distribution as predictors
for the incidence of collusion among electricity providers in specific regions.
We propose novel screens related to the capacity-withholding behavior of
electricity providers and find that including such screens derived from the
day-ahead spot market as predictors can improve cartel detection. We find that,
under complete cartels - where collusion in a tender presumably involves all
suppliers - the method correctly classifies up to roughly 95% of tenders in our
data as collusive or competitive, improving classification accuracy compared to
using only previously available screens. However, when trained on larger
datasets including non-cartel members and applying algorithms tailored to
detect incomplete cartels, the previously existing screens are sufficient to
achieve 98% accuracy, and the addition of our newly proposed
capacity-withholding screens does not further improve performance. Overall,
this study highlights the promising potential of supervised machine learning
techniques for detecting and dismantling cartels in electricity markets.

arXiv link: http://arxiv.org/abs/2508.09885v1

Econometrics arXiv paper, submitted: 2025-08-12

Approximate Sparsity Class and Minimax Estimation

Authors: Lucas Z. Zhang

Motivated by the orthogonal series density estimation in $L^2([0,1],\mu)$, in
this project we consider a new class of functions that we call the approximate
sparsity class. This new class is characterized by the rate of decay of the
individual Fourier coefficients for a given orthonormal basis. We establish the
$L^2([0,1],\mu)$ metric entropy of such class, with which we show the minimax
rate of convergence. For the density subset in this class, we propose an
adaptive density estimator based on a hard-thresholding procedure that achieves
this minimax rate up to a $\log$ term.

arXiv link: http://arxiv.org/abs/2508.09278v1
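
The hard-thresholding estimator described above has a compact implementation:
estimate the Fourier coefficients of the density by sample averages of the
basis functions, keep only coefficients exceeding a threshold, and
reconstruct. The sketch below uses the cosine basis on [0,1] with Lebesgue
measure and a simple threshold; the basis, threshold rule and measure are
choices of mine, not the paper's.

    import numpy as np

    def cosine_basis(j, x):
        return np.ones_like(x) if j == 0 else np.sqrt(2.0) * np.cos(j * np.pi * x)

    def thresholded_density(sample, J=50, threshold=None):
        """Orthogonal series density estimate on [0,1] with hard thresholding."""
        n = sample.size
        threshold = np.sqrt(np.log(n) / n) if threshold is None else threshold
        coefs = np.array([cosine_basis(j, sample).mean() for j in range(J + 1)])
        coefs[1:] = np.where(np.abs(coefs[1:]) > threshold, coefs[1:], 0.0)  # keep theta_0
        return lambda x: sum(c * cosine_basis(j, x) for j, c in enumerate(coefs))

    rng = np.random.default_rng(11)
    sample = rng.beta(2.0, 5.0, size=2000)          # true density supported on [0,1]
    f_hat = thresholded_density(sample)
    grid = np.linspace(0.01, 0.99, 5)
    print(np.round(f_hat(grid), 2))                 # may dip slightly below zero in spots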

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-08-12

Bias correction for Chatterjee's graph-based correlation coefficient

Authors: Mona Azadkia, Leihao Chen, Fang Han

Azadkia and Chatterjee (2021) recently introduced a simple nearest neighbor
(NN) graph-based correlation coefficient that consistently detects both
independence and functional dependence. Specifically, it approximates a measure
of dependence that equals 0 if and only if the variables are independent, and 1
if and only if they are functionally dependent. However, this NN estimator
includes a bias term that may vanish at a rate slower than root-$n$, preventing
root-$n$ consistency in general. In this article, we propose a bias correction
approach that overcomes this limitation, yielding an NN-based estimator that is
both root-$n$ consistent and asymptotically normal.

arXiv link: http://arxiv.org/abs/2508.09040v1

Econometrics arXiv paper, submitted: 2025-08-11

Amazon Ads Multi-Touch Attribution

Authors: Randall Lewis, Florian Zettelmeyer, Brett R. Gordon, Cristobal Garib, Johannes Hermle, Mike Perry, Henrique Romero, German Schnaidt

Amazon's new Multi-Touch Attribution (MTA) solution allows advertisers to
measure how each touchpoint across the marketing funnel contributes to a
conversion. This gives advertisers a more comprehensive view of their Amazon
Ads performance across objectives when multiple ads influence shopping
decisions. Amazon MTA uses a combination of randomized controlled trials (RCTs)
and machine learning (ML) models to allocate credit for Amazon conversions
across Amazon Ads touchpoints in proportion to their value, i.e., their likely
contribution to shopping decisions. ML models trained purely on observational
data are easy to scale and can yield precise predictions, but the models might
produce biased estimates of ad effects. RCTs yield unbiased ad effects but can
be noisy. Our MTA methodology combines experiments, ML models, and Amazon's
shopping signals in a thoughtful manner to inform attribution credit
allocation.

arXiv link: http://arxiv.org/abs/2508.08209v1

Econometrics arXiv updated paper (originally submitted: 2025-08-11)

Treatment-Effect Estimation in Complex Designs under a Parallel-trends Assumption

Authors: Clément de Chaisemartin, Xavier D'Haultfœuille

This paper considers the identification of dynamic treatment effects with
panel data, in complex designs where the treatment may not be binary and may
not be absorbing. We first show that under no-anticipation and parallel-trends
assumptions, we can identify event-study effects comparing outcomes under the
actual treatment path and under the status-quo path where all units would have
kept their period-one treatment throughout the panel. Those effects can be
helpful to evaluate ex-post the policies that effectively took place, and once
properly normalized they estimate weighted averages of marginal effects of the
current and lagged treatments on the outcome. Yet, they may still be hard to
interpret, and they cannot be used to evaluate the effects of other policies
than the ones that were conducted. To make progress, we impose another
restriction, namely a random coefficients distributed-lag linear model, where
effects remain constant over time. Under this model, the usual distributed-lag
two-way-fixed-effects regression may be misleading. Instead, we show that this
random coefficients model can be estimated simply. We illustrate our findings
by revisiting Gentzkow, Shapiro and Sinkinson (2011).

arXiv link: http://arxiv.org/abs/2508.07808v2

Econometrics arXiv updated paper (originally submitted: 2025-08-10)

Conceptual winsorizing: An application to the social cost of carbon

Authors: Richard S. J. Tol

There are many published estimates of the social cost of carbon. Some are
clear outliers, the result of poorly constrained models. Percentile winsorizing
is an option, but I here propose conceptual winsorizing: The social cost of
carbon is either a willingness to pay, which cannot exceed the ability to pay,
or a proposed carbon tax, which cannot raise more revenue than all other taxes
combined. Conceptual winsorizing successfully removes high outliers. It
slackens as economies decarbonize, slowly without climate policy and faster with it.

arXiv link: http://arxiv.org/abs/2508.07384v2

Econometrics arXiv cross-link from q-fin.TR (q-fin.TR), submitted: 2025-08-09

Returns and Order Flow Imbalances: Intraday Dynamics and Macroeconomic News Effects

Authors: Makoto Takahashi

We study the interaction between returns and order flow imbalances in the S&P
500 E-mini futures market using a structural VAR model identified through
heteroskedasticity. The model is estimated at one-second frequency for each
15-minute interval, capturing both intraday variation and endogeneity due to
time aggregation. We find that macroeconomic news announcements sharply reshape
price-flow dynamics: price impact rises, flow impact declines, return
volatility spikes, and flow volatility falls. Pooling across days, both price
and flow impacts are significant at the one-second horizon, with estimates
broadly consistent with stylized limit-order-book predictions. Impulse
responses indicate that shocks dissipate almost entirely within a second.
Structural parameters and volatilities also exhibit pronounced intraday
variation tied to liquidity, trading intensity, and spreads. These results
provide new evidence on high-frequency price formation and liquidity,
highlighting the role of public information and order submission in shaping
market quality.

arXiv link: http://arxiv.org/abs/2508.06788v4

Econometrics arXiv updated paper (originally submitted: 2025-08-07)

Causal Mediation in Natural Experiments

Authors: Senan Hogan-Hennessy

Natural experiments are a cornerstone of applied economics, providing
settings for estimating causal effects with a compelling argument for treatment
randomisation, but give little indication of the mechanisms behind causal
effects. Causal Mediation (CM) is a framework for sufficiently identifying a
mechanism behind the treatment effect, decomposing it into an indirect effect
channel through a mediator mechanism and a remaining direct effect. By
contrast, a suggestive analysis of mechanisms gives necessary but not
sufficient evidence. Conventional CM methods require that the relevant mediator
mechanism is as-good-as-randomly assigned; when people choose the mediator
based on costs and benefits (whether to visit a doctor, to attend university,
etc.), this assumption fails and conventional CM analyses are at risk of bias.
I propose an alternative strategy that delivers unbiased estimates of CM
effects despite unobserved selection, using instrumental variation in mediator
take-up costs. The method identifies CM effects via the marginal effect of the
mediator, with parametric or semi-parametric estimation that is simple to
implement in two stages. Applying these methods to the Oregon Health Insurance
Experiment reveals a substantial portion of the Medicaid lottery's effect on
subjective health and well-being flows through increased healthcare usage -- an
effect that a conventional CM analysis would mistake. This approach gives
applied researchers an alternative method to estimate CM effects when an
initial treatment is quasi-randomly assigned, but a mediator mechanism is not,
as is common in natural experiments.

arXiv link: http://arxiv.org/abs/2508.05449v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2025-08-06

Weak Identification in Peer Effects Estimation

Authors: William W. Wang, Ali Jadbabaie

It is commonly accepted that some phenomena are social: for example,
individuals' smoking habits often correlate with those of their peers. Such
correlations can have a variety of explanations, such as direct contagion or
shared socioeconomic circumstances. The network linear-in-means model is a
workhorse statistical model which incorporates these peer effects by including
average neighborhood characteristics as regressors. Although the model's
parameters are identifiable under mild structural conditions on the network, it
remains unclear whether identification ensures reliable estimation in the
"infill" asymptotic setting, where a single network grows in size. We show that
when covariates are i.i.d. and the average network degree of nodes increases
with the population size, standard estimators suffer from bias or slow
convergence rates due to asymptotic collinearity induced by network averaging.
As an alternative, we demonstrate that linear-in-sums models, which are based
on aggregate rather than average neighborhood characteristics, do not exhibit
such issues as long as the network degrees have some nontrivial variation, a
condition satisfied by most network models.

arXiv link: http://arxiv.org/abs/2508.04897v1

Econometrics arXiv paper, submitted: 2025-08-06

Assessing Dynamic Connectedness in Global Supply Chain Infrastructure Portfolios: The Impact of Risk Factors and Extreme Events

Authors: Haibo Wang

This paper analyses the risk factors around investing in global supply chain
infrastructure: the energy market, investor sentiment, and global shipping
costs. It presents portfolio strategies associated with dynamic risks. A
time-varying parameter vector autoregression (TVP-VAR) model is used to study
the spillover and interconnectedness of the risk factors for global supply
chain infrastructure portfolios from January 5th, 2010, to June 29th, 2023,
which are associated with a set of environmental, social, and governance (ESG)
indexes. The effects of extreme events on risk spillovers and investment
strategy are calculated and compared before and after the COVID-19 outbreak.
The results of this study demonstrate that risk shocks influence the dynamic
connectedness between global supply chain infrastructure portfolios and three
risk factors and show the effects of extreme events on risk spillovers and
investment outcomes. Portfolios with higher ESG scores exhibit stronger dynamic
connectedness with other portfolios and factors. Net total directional
connectedness indicates that West Texas Intermediate (WTI), Baltic Exchange Dry
Index (BDI), and the investor sentiment volatility index (VIX) are consistently
net receivers of spillover shocks. The portfolio with ticker GLFOX appears to be
time-varying net receiver and giver. The pairwise connectedness shows that WTI
and VIX are mostly net receivers. Portfolios with tickers CSUAX, GII, and FGIAX
are mostly net givers of spillover shocks. The COVID-19 outbreak changed the
structure of dynamic connectedness across portfolios. The mean values of the
hedge ratio (HR) and hedging effectiveness (HE) indicate that the weights of
long/short positions in the investment strategy after the COVID-19 outbreak
have undergone structural changes compared to the period
before. The hedging ability of global supply chain infrastructure investment
portfolios with higher ESG scores is superior.

arXiv link: http://arxiv.org/abs/2508.04858v1

Econometrics arXiv paper, submitted: 2025-08-06

High-Dimensional Matrix-Variate Diffusion Index Models for Time Series Forecasting

Authors: Zhiren Ma, Qian Zhao, Riquan Zhang, Zhaoxing Gao

This paper proposes a novel diffusion-index model for forecasting when
predictors are high-dimensional matrix-valued time series. We apply an
$\alpha$-PCA method to extract low-dimensional matrix factors and build a
bilinear regression linking future outcomes to these factors, estimated via
iterative least squares. To handle weak factor structures, we introduce a
supervised screening step to select informative rows and columns. Theoretical
properties, including consistency and asymptotic normality, are established.
Simulations and real data show that our method significantly improves forecast
accuracy, with the screening procedure providing additional gains over standard
benchmarks in out-of-sample mean squared forecast error.

arXiv link: http://arxiv.org/abs/2508.04259v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-08-05

The Regression Discontinuity Design in Medical Science

Authors: Matias D. Cattaneo, Rocio Titiunik

This article provides an introduction to the Regression Discontinuity (RD)
design, and its application to empirical research in the medical sciences.
While the main focus of this article is on causal interpretation, key concepts
of estimation and inference are also briefly mentioned. A running medical
empirical example is provided.

arXiv link: http://arxiv.org/abs/2508.03878v1

Econometrics arXiv paper, submitted: 2025-08-04

Structural Extrapolation in Regression Discontinuity Designs with an Application to School Expenditure Referenda

Authors: Austin Feng, Francesco Ruggieri

We propose a structural approach to extrapolate average partial effects away
from the cutoff in regression discontinuity designs (RDDs). Our focus is on
applications that exploit closely contested school district referenda to
estimate the effects of changes in education spending on local economic
outcomes. We embed these outcomes in a spatial equilibrium model of local
jurisdictions in which fiscal policy is determined by majority rule voting.
This integration provides a microfoundation for the running variable, the share
of voters who approve a ballot initiative, and enables identification of
structural parameters using RDD coefficients. We then leverage the model to
simulate the effects of counterfactual referenda over a broad range of proposed
spending changes. These scenarios imply realizations of the running variable
away from the threshold, allowing extrapolation of RDD estimates to nonmarginal
referenda. Applying the method to school expenditure ballot measures in
Wisconsin, we document substantial heterogeneity in housing price
capitalization across the approval margin.

arXiv link: http://arxiv.org/abs/2508.02658v1

Econometrics arXiv paper, submitted: 2025-08-04

Estimating Causal Effects with Observational Data: Guidelines for Agricultural and Applied Economists

Authors: Arne Henningsen, Guy Low, David Wuepper, Tobias Dalhaus, Hugo Storm, Dagim Belay, Stefan Hirsch

Most research questions in agricultural and applied economics are of a causal
nature, i.e., how one or more variables (e.g., policies, prices, the weather)
affect one or more other variables (e.g., income, crop yields, pollution). Only
some of these research questions can be studied experimentally. Most empirical
studies in agricultural and applied economics thus rely on observational data.
However, estimating causal effects with observational data requires appropriate
research designs and a transparent discussion of all identifying assumptions,
together with empirical evidence to assess the probability that they hold. This
paper provides an overview of various approaches that are frequently used in
agricultural and applied economics to estimate causal effects with
observational data. It then provides advice and guidelines for agricultural and
applied economists who are intending to estimate causal effects with
observational data, e.g., how to assess and discuss the chosen identification
strategies in their publications.

arXiv link: http://arxiv.org/abs/2508.02310v1

Econometrics arXiv paper, submitted: 2025-08-04

A difference-in-differences estimator by covariate balancing propensity score

Authors: Junjie Li, Yukitoshi Matsushita

This article develops a covariate balancing approach for the estimation of
treatment effects on the treated (ATT) in a difference-in-differences (DID)
research design when panel data are available. We show that the proposed
covariate balancing propensity score (CBPS) DID estimator possesses several
desirable properties: (i) local efficiency, (ii) double robustness in terms of
consistency, (iii) double robustness in terms of inference, and (iv) faster
convergence to the ATT compared to the augmented inverse probability weighting
(AIPW) DID estimators when both working models are locally misspecified. These
latter two characteristics set the CBPS DID estimator apart from the AIPW DID
estimator theoretically. Simulation studies and an empirical study demonstrate
the desirable finite sample performance of the proposed estimator.

arXiv link: http://arxiv.org/abs/2508.02097v1

Econometrics arXiv paper, submitted: 2025-08-03

A Relaxation Approach to Synthetic Control

Authors: Chengwang Liao, Zhentao Shi, Yapeng Zheng

The synthetic control method (SCM) is widely used for constructing the
counterfactual of a treated unit based on data from control units in a donor
pool. Allowing the donor pool to contain more control units than time periods, we
propose a novel machine learning algorithm, named SCM-relaxation, for
counterfactual prediction. Our relaxation approach minimizes an
information-theoretic measure of the weights subject to a set of relaxed linear
inequality constraints in addition to the simplex constraint. When the donor
pool exhibits a group structure, SCM-relaxation approximates the equal weights
within each group to diversify the prediction risk. Asymptotically, the
proposed estimator achieves oracle performance in terms of out-of-sample
prediction accuracy. We demonstrate our method by Monte Carlo simulations and
by an empirical application that assesses the economic impact of Brexit on the
United Kingdom's real GDP.
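
The relaxed program described above can be sketched directly with a convex
solver. The snippet below is a minimal illustration under assumptions of my own
-- a maximum-entropy criterion, an elementwise relaxation level tau, and
synthetic donor data -- rather than the paper's exact criterion, constraint set,
or tuning.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
T0, J = 30, 50                        # pre-treatment periods, donors (J > T0)
X0 = rng.normal(size=(T0, J))         # donor outcomes over the pre-treatment window
x1 = X0 @ (np.ones(J) / J) + 0.05 * rng.normal(size=T0)   # treated unit's path

tau = 0.25                            # relaxation level for the fit constraints
w = cp.Variable(J, nonneg=True)

# Entropy-type criterion: maximizing entropy pushes the weights toward
# (group-wise) equal weights, subject to fitting the treated unit approximately.
objective = cp.Maximize(cp.sum(cp.entr(w)))
constraints = [cp.sum(w) == 1,                  # simplex constraint
               cp.abs(X0 @ w - x1) <= tau]      # relaxed linear inequality constraints
cp.Problem(objective, constraints).solve()

counterfactual = X0 @ w.value         # in-sample synthetic-control fit
```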

arXiv link: http://arxiv.org/abs/2508.01793v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-08-03

Bayesian Smoothed Quantile Regression

Authors: Bingqi Liu, Kangqiang Li, Tianxiao Pang

Bayesian quantile regression (BQR) based on the asymmetric Laplace
distribution (ALD) has two fundamental limitations: its posterior mean yields
biased quantile estimates, and the non-differentiable check loss precludes
gradient-based MCMC methods. We propose Bayesian smoothed quantile regression
(BSQR), a principled reformulation that constructs a novel, continuously
differentiable likelihood from a kernel-smoothed check loss, simultaneously
ensuring a consistent posterior by aligning the inferential target with the
smoothed objective and enabling efficient Hamiltonian Monte Carlo (HMC)
sampling. Our theoretical analysis establishes posterior propriety for various
priors and examines the impact of kernel choice. Simulations show BSQR reduces
predictive check loss by up to 50% at extreme quantiles over ALD-based methods
and improves MCMC efficiency by 20-40% in effective sample size. An application
to financial risk during the COVID-19 era demonstrates superior tail risk
modeling. The BSQR framework offers a theoretically grounded, computationally
efficient solution to longstanding challenges in BQR, with uniform and
triangular kernels emerging as highly effective.
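
The central construction is a check loss convolved with a smoothing kernel,
which yields a continuously differentiable working likelihood amenable to
gradient-based methods. The sketch below uses a Gaussian kernel (the paper
emphasizes uniform and triangular kernels) and a simple gradient-based point fit
rather than full HMC; the data, bandwidth h, and quantile level are illustrative.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def smoothed_check_loss(u, tau, h):
    """Check loss rho_tau convolved with a Gaussian kernel of bandwidth h.

    The convolution has the closed form tau*u - u*Phi(-u/h) + h*phi(u/h),
    which is continuously differentiable in u, unlike the raw check loss."""
    return tau * u - u * norm.cdf(-u / h) + h * norm.pdf(u / h)

def neg_log_smoothed_likelihood(beta, y, X, tau, h):
    # Working likelihood proportional to exp(-sum of smoothed check losses).
    return np.sum(smoothed_check_loss(y - X @ beta, tau, h))

rng = np.random.default_rng(1)
n, p, tau, h = 500, 3, 0.9, 0.1
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.standard_t(df=3, size=n)

fit = minimize(neg_log_smoothed_likelihood, x0=np.zeros(p),
               args=(y, X, tau, h), method="BFGS")
print(fit.x)   # smoothed quantile-regression coefficients at tau = 0.9
```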

arXiv link: http://arxiv.org/abs/2508.01738v3

Econometrics arXiv cross-link from Computer Science – Artificial Intelligence (cs.AI), submitted: 2025-08-01

Multi-Band Variable-Lag Granger Causality: A Unified Framework for Causal Time Series Inference across Frequencies

Authors: Chakattrai Sookkongwaree, Tattep Lakmuang, Chainarong Amornbunchornvej

Understanding causal relationships in time series is fundamental to many
domains, including neuroscience, economics, and behavioral science. Granger
causality is one of the well-known techniques for inferring causality in time
series. Typically, Granger causality frameworks rely on a strong fixed-lag
assumption between cause and effect, which is often unrealistic in complex
systems. While recent work on variable-lag Granger causality (VLGC) addresses
this limitation by allowing a cause to influence an effect with different time
lags at each time point, it fails to account for the fact that causal
interactions may vary not only in time delay but also across frequency bands.
For example, in brain signals, alpha-band activity may influence another region
with a shorter delay than slower delta-band oscillations. In this work, we
formalize Multi-Band Variable-Lag Granger Causality (MB-VLGC) and propose a
novel framework that generalizes traditional VLGC by explicitly modeling
frequency-dependent causal delays. We provide a formal definition of MB-VLGC,
demonstrate its theoretical soundness, and propose an efficient inference
pipeline. Extensive experiments across multiple domains demonstrate that our
framework significantly outperforms existing methods on both synthetic and
real-world datasets, confirming its broad applicability to any type of time
series data. Code and datasets are publicly available.

arXiv link: http://arxiv.org/abs/2508.00658v1

Econometrics arXiv paper, submitted: 2025-08-01

Robust Econometrics for Growth-at-Risk

Authors: Tobias Adrian, Yuya Sasaki, Yulong Wang

The Growth-at-Risk (GaR) framework has garnered attention in recent
econometric literature, yet current approaches implicitly assume a constant
Pareto exponent. We introduce novel and robust econometrics to estimate the
tails of GaR based on a rigorous theoretical framework and establish validity
and effectiveness. Simulations demonstrate consistent outperformance relative
to existing alternatives in terms of predictive accuracy. We perform a
long-term GaR analysis that provides accurate and insightful predictions,
effectively capturing financial anomalies better than current methods.

arXiv link: http://arxiv.org/abs/2508.00263v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-07-31

Relative Bias Under Imperfect Identification in Observational Causal Inference

Authors: Melody Huang, Cory McCartan

To conduct causal inference in observational settings, researchers must rely
on certain identifying assumptions. In practice, these assumptions are unlikely
to hold exactly. This paper considers the bias of selection-on-observables,
instrumental variables, and proximal inference estimates under violations of
their identifying assumptions. We develop bias expressions for IV and proximal
inference that show how violations of their respective assumptions are
amplified by any unmeasured confounding in the outcome variable. We propose a
set of sensitivity tools that quantify the sensitivity of different
identification strategies, along with an augmented bias contour plot that visualizes the
relationship between these strategies. We argue that the act of choosing an
identification strategy implicitly expresses a belief about the degree of
violations that must be present in alternative identification strategies. Even
when researchers intend to conduct an IV or proximal analysis, a sensitivity
analysis comparing different identification strategies can help to better
understand the implications of each set of assumptions. Throughout, we compare
the different approaches on a re-analysis of the impact of state surveillance
on the incidence of protest in Communist Poland.

arXiv link: http://arxiv.org/abs/2507.23743v1

Econometrics arXiv paper, submitted: 2025-07-30

Inference on Common Trends in a Cointegrated Nonlinear SVAR

Authors: James A. Duffy, Xiyu Jiao

We consider the problem of performing inference on the number of common
stochastic trends when data is generated by a cointegrated CKSVAR (a
two-regime, piecewise-linear SVAR; Mavroeidis, 2021), using a modified version
of the Breitung (2002) multivariate variance ratio test that is robust to the
presence of nonlinear cointegration (of a known form). To derive the
asymptotics of our test statistic, we prove a fundamental LLN-type result for a
class of stable but nonstationary autoregressive processes, using a novel dual
linear process approximation. We show that our modified test yields correct
inferences regarding the number of common trends in such a system, whereas the
unmodified test tends to infer a higher number of common trends than are
actually present, when cointegrating relations are nonlinear.

arXiv link: http://arxiv.org/abs/2507.22869v1

Econometrics arXiv paper, submitted: 2025-07-30

Generalized Optimal Transport

Authors: Andrei Voronin

Many causal and structural parameters in economics can be identified and
estimated by computing the value of an optimization program over all
distributions consistent with the model and the data. Existing tools apply when
the data is discrete, or when only disjoint marginals of the distribution are
identified, which is restrictive in many applications. We develop a general
framework that yields sharp bounds on a linear functional of the unknown true
distribution under i) an arbitrary collection of identified joint
subdistributions and ii) structural conditions, such as (conditional)
independence. We encode the identification restrictions as a continuous
collection of moments of characteristic kernels, and use duality and
approximation theory to rewrite the infinite-dimensional program over Borel
measures as a finite-dimensional program that is simple to compute. Our
approach yields a consistent estimator that is $\sqrt{n}$-uniformly valid for
the sharp bounds. In the special case of empirical optimal transport with
Lipschitz cost, where the minimax rate is $n^{-2/d}$, our method yields a
uniformly consistent estimator with an asymmetric rate, converging at
$\sqrt{n}$ uniformly from one side.

arXiv link: http://arxiv.org/abs/2507.22422v1

Econometrics arXiv updated paper (originally submitted: 2025-07-30)

Dimension Reduction for Conditional Density Estimation with Applications to High-Dimensional Causal Inference

Authors: Jianhua Mei, Fu Ouyang, Thomas T. Yang

We propose a novel and computationally efficient approach for nonparametric
conditional density estimation in high-dimensional settings that achieves
dimension reduction without imposing restrictive distributional or functional
form assumptions. To uncover the underlying sparsity structure of the data, we
develop an innovative conditional dependence measure and a modified
cross-validation procedure that enables data-driven variable selection, thereby
circumventing the need for subjective threshold selection. We demonstrate the
practical utility of our dimension-reduced conditional density estimation by
applying it to doubly robust estimators for average treatment effects. Notably,
our proposed procedure is able to select relevant variables for nonparametric
propensity score estimation and also inherently reduce the dimensionality of
outcome regressions through a refined ignorability condition. We evaluate the
finite-sample properties of our approach through comprehensive simulation
studies and an empirical study on the effects of 401(k) eligibility on savings
using SIPP data.

arXiv link: http://arxiv.org/abs/2507.22312v2

Econometrics arXiv paper, submitted: 2025-07-29

Testing for multiple change-points in macroeconometrics: an empirical guide and recent developments

Authors: Otilia Boldea, Alastair R. Hall

We review recent developments in detecting and estimating multiple
change-points in time series models with exogenous and endogenous regressors,
panel data models, and factor models. This review differs from others in
multiple ways: (1) it focuses on inference about the change-points in slope
parameters, rather than in the mean of the dependent variable - the latter
being common in the statistical literature; (2) it focuses on detecting - via
sequential testing and other methods - multiple change-points, and only
discusses one change-point when methods for multiple change-points are not
available; (3) it is meant as a practitioner's guide for empirical
macroeconomists first, and as a result, it focuses only on the methods derived
under the most general assumptions relevant to macroeconomic applications.

arXiv link: http://arxiv.org/abs/2507.22204v1

Econometrics arXiv paper, submitted: 2025-07-29

Low-Rank Structured Nonparametric Prediction of Instantaneous Volatility

Authors: Sung Hoon Choi, Donggyu Kim

Based on Itô semimartingale models, several studies have proposed methods
for forecasting intraday volatility using high-frequency financial data. These
approaches typically rely on restrictive parametric assumptions and are often
vulnerable to model misspecification. To address this issue, we introduce a
novel nonparametric prediction method for the future intraday instantaneous
volatility process during trading hours, which leverages both previous days'
data and the current day's observed intraday data. Our approach imposes an
interday-by-intraday matrix representation of the instantaneous volatility,
which is decomposed into a low-rank conditional expectation component and a
noise matrix. To predict the future conditional expected volatility vector, we
exploit this low-rank structure and propose the Structural Intraday-volatility
Prediction (SIP) procedure. We establish the asymptotic properties of the SIP
estimator and demonstrate its effectiveness through an out-of-sample prediction
study using real high-frequency trading data.

arXiv link: http://arxiv.org/abs/2507.22173v1

Econometrics arXiv paper, submitted: 2025-07-29

Regional Price Dynamics and Market Integration in the U.S. Beef Industry: An Econometric Analysis

Authors: Leonardo Manríquez-Méndez

The United States, a leading global producer and consumer of beef, continues
to face substantial challenges in achieving price harmonization across its
regional markets. This paper evaluates the validity of the Law of One Price
(LOP) in the U.S. beef industry and investigates causal relationships among
regional price dynamics. Through a series of econometric tests, we establish
that regional price series are integrated of order one, displaying
non-stationarity in levels and stationarity in first differences. The analysis
reveals partial LOP compliance in the Northeast and West, while full
convergence remains elusive at the national level. Although no region
demonstrates persistent price leadership, Southern prices appear particularly
sensitive to exogenous shocks. These findings reflect asymmetrical integration
across U.S. beef markets and suggest the presence of structural frictions that
hinder complete market unification.

arXiv link: http://arxiv.org/abs/2507.21950v1

Econometrics arXiv paper, submitted: 2025-07-29

Nonlinear Treatment Effects in Shift-Share Designs

Authors: Luigi Garzon, Vitor Possebom

We analyze heterogenous, nonlinear treatment effects in shift-share designs
with exogenous shares. We employ a triangular model and correct for treatment
endogeneity using a control function. Our tools identify four target
parameters. Two of them capture the observable heterogeneity of treatment
effects, while one summarizes this heterogeneity in a single measure. The last
parameter analyzes counterfactual, policy-relevant treatment assignment
mechanisms. We propose flexible parametric estimators for these parameters and
apply them to reevaluate the impact of Chinese imports on U.S. manufacturing
employment. Our results highlight substantial treatment effect heterogeneity,
which is not captured by commonly used shift-share tools.

arXiv link: http://arxiv.org/abs/2507.21915v1

Econometrics arXiv paper, submitted: 2025-07-29

Can large language models assist choice modelling? Insights into prompting strategies and current models capabilities

Authors: Georges Sfeir, Gabriel Nova, Stephane Hess, Sander van Cranenburgh

Large Language Models (LLMs) are widely used to support various workflows
across different disciplines, yet their potential in choice modelling remains
relatively unexplored. This work examines the potential of LLMs as assistive
agents in the specification and, where technically feasible, estimation of
Multinomial Logit models. We implement a systematic experimental framework
involving thirteen versions of six leading LLMs (ChatGPT, Claude, DeepSeek,
Gemini, Gemma, and Llama) evaluated under five experimental configurations.
These configurations vary along three dimensions: modelling goal (suggesting
vs. suggesting and estimating MNLs); prompting strategy (Zero-Shot vs.
Chain-of-Thought); and information availability (full dataset vs. data
dictionary only). Each LLM-suggested specification is implemented, estimated,
and evaluated based on goodness-of-fit metrics, behavioural plausibility, and
model complexity. Findings reveal that proprietary LLMs can generate valid and
behaviourally sound utility specifications, particularly when guided by
structured prompts. Open-weight models such as Llama and Gemma struggled to
produce meaningful specifications. Claude 4 Sonnet consistently produced the
best-fitting and most complex models, while GPT models suggested specifications
with robust and stable modelling outcomes. Some LLMs performed better when
provided with just the data dictionary, suggesting that limiting raw data access may enhance
internal reasoning capabilities. Among all LLMs, GPT o3 was uniquely capable of
correctly estimating its own specifications by executing self-generated code.
Overall, the results demonstrate both the promise and current limitations of
LLMs as assistive agents in choice modelling, not only for model specification
but also for supporting modelling decision and estimation, and provide
practical guidance for integrating these tools into choice modellers'
workflows.

arXiv link: http://arxiv.org/abs/2507.21790v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2025-07-29

A Bayesian Ensemble Projection of Climate Change and Technological Impacts on Future Crop Yields

Authors: Dan Li, Vassili Kitsios, David Newth, Terence John O'Kane

This paper introduces a Bayesian hierarchical modeling framework within a
fully probabilistic setting for crop yield estimation, model selection, and
uncertainty forecasting under multiple future greenhouse gas emission
scenarios. By informing on regional agricultural impacts, this approach
addresses broader risks to global food security. Extending an established
multivariate econometric crop-yield model to incorporate country-specific error
variances, the framework systematically relaxes restrictive homogeneity
assumptions and enables transparent decomposition of predictive uncertainty
into contributions from climate models, emission scenarios, and crop model
parameters. In both in-sample and out-of-sample analyses focused on global
wheat production, the results demonstrate significant improvements in
calibration and probabilistic accuracy of yield projections. These advances
provide policymakers and stakeholders with detailed, risk-sensitive information
to support the development of more resilient and adaptive agricultural and
climate strategies in response to escalating climate-related risks.

arXiv link: http://arxiv.org/abs/2507.21559v1

Econometrics arXiv paper, submitted: 2025-07-28

Policy Learning under Unobserved Confounding: A Robust and Efficient Approach

Authors: Zequn Jin, Gaoqian Xu, Xi Zheng, Yahong Zhou

This paper develops a robust and efficient method for policy learning from
observational data in the presence of unobserved confounding, complementing
existing instrumental variable (IV) based approaches. We employ the marginal
sensitivity model (MSM) to relax the commonly used yet restrictive
unconfoundedness assumption by introducing a sensitivity parameter that
captures the extent of selection bias induced by unobserved confounders.
Building on this framework, we consider two distributionally robust welfare
criteria, defined as the worst-case welfare and policy improvement functions,
evaluated over an uncertainty set of counterfactual distributions characterized
by the MSM. Closed-form expressions for both welfare criteria are derived.
Leveraging these identification results, we construct doubly robust scores and
estimate the robust policies by maximizing the proposed criteria. Our approach
accommodates flexible machine learning methods for estimating nuisance
components, even when these converge at moderately slow rates. We establish
asymptotic regret bounds for the resulting policies, providing a robust
guarantee against the most adversarial confounding scenario. The proposed
method is evaluated through extensive simulation studies and empirical
applications to the JTPA study and Head Start program.

arXiv link: http://arxiv.org/abs/2507.20550v1

Econometrics arXiv paper, submitted: 2025-07-27

Staggered Adoption DiD Designs with Misclassification and Anticipation

Authors: Clara Augustin, Daniel Gutknecht, Cenchen Liu

This paper examines the identification and estimation of treatment effects in
staggered adoption designs -- a common extension of the canonical
Difference-in-Differences (DiD) model to multiple groups and time-periods -- in
the presence of (time varying) misclassification of the treatment status as
well as of anticipation. We demonstrate that standard estimators are biased
with respect to commonly used causal parameters of interest under such forms of
misspecification. To address this issue, we provide modified estimators that
recover the Average Treatment Effect of observed and true switching units,
respectively. Additionally, we suggest a testing procedure aimed at detecting
the timing and extent of misclassification and anticipation effects. We
illustrate the proposed methods with an application to the effects of an
anti-cheating policy on school mean test scores in high stakes national exams
in Indonesia.

arXiv link: http://arxiv.org/abs/2507.20415v1

Econometrics arXiv cross-link from Quantitative Finance – Portfolio Management (q-fin.PM), submitted: 2025-07-26

Dependency Network-Based Portfolio Design with Forecasting and VaR Constraints

Authors: Zihan Lin, Haojie Liu, Randall R. Rojas

This study proposes a novel portfolio optimization framework that integrates
statistical social network analysis with time series forecasting and risk
management. Using daily stock data from the S&P 500 (2020-2024), we construct
dependency networks via Vector Autoregression (VAR) and Forecast Error Variance
Decomposition (FEVD), transforming influence relationships into a cost-based
network. Specifically, FEVD breaks down the VAR's forecast error variance to
quantify how much each stock's shocks contribute to another's uncertainty,
information that we invert to form influence-based edge weights in our network. By
applying the Minimum Spanning Tree (MST) algorithm, we extract the core
inter-stock structure and identify central stocks through degree centrality. A
dynamic portfolio is constructed using the top-ranked stocks, with capital
allocated based on Value at Risk (VaR). To refine stock selection, we
incorporate forecasts from ARIMA and Neural Network Autoregressive (NNAR)
models. Trading simulations over a one-year period demonstrate that the
MST-based strategies outperform a buy-and-hold benchmark, with the tuned
NNAR-enhanced strategy achieving a 63.74% return versus 18.00% for the
benchmark. Our results highlight the potential of combining network structures,
predictive modeling, and risk metrics to improve adaptive financial
decision-making.
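
A rough sketch of the network-construction steps described above -- VAR, FEVD,
inversion of influence into costs, MST extraction, and centrality ranking --
using simulated returns in place of S&P 500 data. The VaR-based capital
allocation and the ARIMA/NNAR screening are omitted, and the symmetrization of
pairwise influence is a simplification of my own.

```python
import numpy as np
import pandas as pd
import networkx as nx
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(2)
tickers = [f"S{i}" for i in range(8)]                  # placeholder tickers
returns = pd.DataFrame(rng.normal(scale=0.01, size=(750, len(tickers))),
                       columns=tickers)

# 1) Fit a VAR and compute the forecast error variance decomposition (FEVD).
res = VAR(returns).fit(maxlags=1)
fevd = res.fevd(10).decomp[:, -1, :]   # [i, j]: share of i's FEV due to j's shocks

# 2) Invert influence into a cost-based network: strong influence = low cost.
eps = 1e-8
G = nx.Graph()
for i, a in enumerate(tickers):
    for j, b in enumerate(tickers):
        if i < j:
            cost = 2.0 / (fevd[i, j] + fevd[j, i] + eps)   # symmetrized influence
            G.add_edge(a, b, weight=cost)

# 3) Extract the core structure with an MST and rank stocks by degree centrality.
mst = nx.minimum_spanning_tree(G, weight="weight")
central = sorted(nx.degree_centrality(mst).items(), key=lambda kv: -kv[1])
print([name for name, _ in central[:3]])   # candidate stocks for the portfolio
```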

arXiv link: http://arxiv.org/abs/2507.20039v1

Econometrics arXiv paper, submitted: 2025-07-26

Semiparametric Identification of the Discount Factor and Payoff Function in Dynamic Discrete Choice Models

Authors: Yu Hao, Hiroyuki Kasahara, Katsumi Shimotsu

This paper investigates how the discount factor and payoff functions can be
identified in stationary infinite-horizon dynamic discrete choice models. In
single-agent models, we show that common nonparametric assumptions on
per-period payoffs -- such as homogeneity of degree one, monotonicity,
concavity, zero cross-differences, and complementarity -- provide identifying
restrictions on the discount factor. These restrictions take the form of
polynomial equalities and inequalities with degrees bounded by the cardinality
of the state space. These restrictions also identify payoff functions under
standard normalization at one action. In dynamic game models, we show that
firm-specific discount factors can be identified using assumptions such as
irrelevance of other firms' lagged actions, exchangeability, and the
independence of adjustment costs from other firms' actions. Our results
demonstrate that widely used nonparametric assumptions in economic analysis can
provide substantial identifying power in dynamic structural models.

arXiv link: http://arxiv.org/abs/2507.19814v1

Econometrics arXiv paper, submitted: 2025-07-25

Binary Classification with the Maximum Score Model and Linear Programming

Authors: Joel L. Horowitz, Sokbae Lee

This paper presents a computationally efficient method for binary
classification using Manski's (1975, 1985) maximum score model when covariates
are discretely distributed and parameters are partially but not point
identified. We establish conditions under which it is minimax optimal to allow
for either non-classification or random classification and derive finite-sample
and asymptotic lower bounds on the probability of correct classification. We
also describe an extension of our method to continuous covariates. Our approach
avoids the computational difficulty of maximum score estimation by
reformulating the problem as two linear programs. Compared to parametric and
nonparametric methods, our method balances extrapolation ability with minimal
distributional assumptions. Monte Carlo simulations and empirical applications
demonstrate its effectiveness and practical relevance.

arXiv link: http://arxiv.org/abs/2507.19654v1

Econometrics arXiv paper, submitted: 2025-07-25

Beyond Bonferroni: Hierarchical Multiple Testing in Empirical Research

Authors: Sebastian Calonico, Sebastian Galiani

Empirical research in the social and medical sciences frequently involves
testing multiple hypotheses simultaneously, increasing the risk of false
positives due to chance. Classical multiple testing procedures, such as the
Bonferroni correction, control the family-wise error rate (FWER) but tend to be
overly conservative, reducing statistical power. Stepwise alternatives like the
Holm and Hochberg procedures offer improved power while maintaining error
control under certain dependence structures. However, these standard approaches
typically ignore hierarchical relationships among hypotheses -- structures that
are common in settings such as clinical trials and program evaluations, where
outcomes are often logically or causally linked. Hierarchical multiple testing
procedures -- including fixed sequence, fallback, and gatekeeping methods --
explicitly incorporate these relationships, providing more powerful and
interpretable frameworks for inference. This paper reviews key hierarchical
methods, compares their statistical properties and practical trade-offs, and
discusses implications for applied empirical research.
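
To make the hierarchical idea concrete, the sketch below implements two of the
simpler procedures mentioned above, fixed sequence and fallback, for a
pre-specified ordering of hypotheses. The p-values and weights are purely
illustrative.

```python
def fixed_sequence_test(p_values, alpha=0.05):
    """Fixed-sequence procedure: test hypotheses in a pre-specified order at the
    full level alpha and stop at the first non-rejection. Controls the FWER
    when the ordering is fixed before seeing the data."""
    rejected = []
    for i, p in enumerate(p_values):
        if p <= alpha:
            rejected.append(i)
        else:
            break
    return rejected

def fallback_test(p_values, weights, alpha=0.05):
    """Fallback procedure: hypothesis i gets budget alpha*weights[i]; the budget
    of each rejected hypothesis is passed on to the next one in the sequence."""
    assert abs(sum(weights) - 1.0) < 1e-12
    rejected, carry = [], 0.0
    for i, (p, w) in enumerate(zip(p_values, weights)):
        level = alpha * w + carry
        if p <= level:
            rejected.append(i)
            carry = level          # pass the accumulated budget forward
        else:
            carry = 0.0            # budget is lost on non-rejection
    return rejected

# Example: primary, key secondary, exploratory outcome (ordered a priori).
print(fixed_sequence_test([0.003, 0.020, 0.200]))           # -> [0, 1]
print(fallback_test([0.003, 0.060, 0.010], [0.5, 0.3, 0.2]))
```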

arXiv link: http://arxiv.org/abs/2507.19610v1

Econometrics arXiv paper, submitted: 2025-07-25

Uniform Critical Values for Likelihood Ratio Tests in Boundary Problems

Authors: Giuseppe Cavaliere, Adam McCloskey, Rasmus S. Pedersen, Anders Rahbek

Limit distributions of likelihood ratio statistics are well-known to be
discontinuous in the presence of nuisance parameters at the boundary of the
parameter space, which lead to size distortions when standard critical values
are used for testing. In this paper, we propose a new and simple way of
constructing critical values that yields uniformly correct asymptotic size,
regardless of whether nuisance parameters are at, near or far from the boundary
of the parameter space. Importantly, the proposed critical values are trivial
to compute and at the same time provide powerful tests in most settings. In
comparison to existing size-correction methods, the new approach exploits the
monotonicity of the two components of the limiting distribution of the
likelihood ratio statistic, in conjunction with rectangular confidence sets for
the nuisance parameters, to gain computational tractability. Uniform validity
is established for likelihood ratio tests based on the new critical values, and
we provide illustrations of their construction in two key examples: (i) testing
a coefficient of interest in the classical linear regression model with
non-negativity constraints on control coefficients, and, (ii) testing for the
presence of exogenous variables in autoregressive conditional heteroskedastic
models (ARCH) with exogenous regressors. Simulations confirm that the tests
have desirable size and power properties. A brief empirical illustration
demonstrates the usefulness of our proposed test in relation to testing for
spill-overs and ARCH effects.

arXiv link: http://arxiv.org/abs/2507.19603v1

Econometrics arXiv paper, submitted: 2025-07-25

Sequential Decision Problems with Missing Feedback

Authors: Filippo Palomba

This paper investigates the challenges of optimal online policy learning
under missing data. State-of-the-art algorithms implicitly assume that rewards
are always observable. I show that when rewards are missing at random, the
Upper Confidence Bound (UCB) algorithm maintains optimal regret bounds;
however, it selects suboptimal policies with high probability as soon as this
assumption is relaxed. To overcome this limitation, I introduce a fully
nonparametric algorithm -- Doubly-Robust Upper Confidence Bound (DR-UCB) -- which
explicitly models the form of missingness through observable covariates and
achieves a nearly-optimal worst-case regret rate of $O(\sqrt{T})$.
To prove this result, I derive high-probability bounds for a class of
doubly-robust estimators that hold under broad dependence structures.
Simulation results closely match the theoretical predictions, validating the
proposed framework.
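
The abstract does not spell out the DR-UCB algorithm, so the following is only a
schematic sketch of the general idea it points to: run a UCB index on AIPW-style
pseudo-rewards built from an observability propensity and a simple outcome
model. The propensity is treated as known and all quantities are simulated
placeholders.

```python
import numpy as np

rng = np.random.default_rng(4)
K, T = 3, 5000
true_means = np.array([0.2, 0.8, 0.5])     # illustrative arm means

counts = np.zeros(K)       # pulls per arm
dr_sums = np.zeros(K)      # running sums of doubly-robust pseudo-rewards
obs_sums = np.zeros(K)     # running sums of observed rewards (crude outcome model)
obs_counts = np.zeros(K)

for t in range(1, T + 1):
    if t <= K:                                     # pull each arm once to start
        a = t - 1
    else:                                          # UCB index on pseudo-rewards
        ucb = dr_sums / counts + np.sqrt(2 * np.log(t) / counts)
        a = int(np.argmax(ucb))

    x = rng.uniform()                              # observable covariate
    e_x = 0.2 + 0.6 * x                            # known propensity P(reward observed | x)
    observed = rng.uniform() < e_x
    reward = rng.normal(true_means[a], 0.1)

    m_hat = obs_sums[a] / obs_counts[a] if obs_counts[a] > 0 else 0.0
    if observed:
        pseudo = m_hat + (reward - m_hat) / e_x    # AIPW-style pseudo-reward
        obs_sums[a] += reward
        obs_counts[a] += 1
    else:
        pseudo = m_hat                             # fall back on the outcome model

    counts[a] += 1
    dr_sums[a] += pseudo

print(counts / T)   # the arm with the highest true mean should get most pulls
```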

arXiv link: http://arxiv.org/abs/2507.19596v1

Econometrics arXiv updated paper (originally submitted: 2025-07-25)

Interactive, Grouped and Non-separable Fixed Effects: A Practitioner's Guide to the New Panel Data Econometrics

Authors: Jan Ditzen, Yiannis Karavias

The past 20 years have brought fundamental advances in modeling unobserved
heterogeneity in panel data. Interactive Fixed Effects (IFE) proved to be a
foundational framework, generalizing the standard one-way and two-way fixed
effects models by allowing the unit-specific unobserved heterogeneity to be
interacted with unobserved time-varying common factors, allowing for more
general forms of omitted variables. The IFE framework laid the theoretical
foundations for other forms of heterogeneity, such as grouped fixed effects
(GFE) and non-separable two-way fixed effects (NSTW). The existence of IFE, GFE
or NSTW has significant implications for identification, estimation, and
inference, leading to the development of many new estimators for panel data
models. This paper provides an accessible review of the new estimation methods
and their associated diagnostic tests, and offers a guide to empirical
practice. In two separate empirical investigations we demonstrate that there is
empirical support for the new forms of fixed effects and that the results can
differ significantly from those obtained using traditional fixed effects
estimators.

arXiv link: http://arxiv.org/abs/2507.19099v2

Econometrics arXiv paper, submitted: 2025-07-25

Flexible estimation of skill formation models

Authors: Antonia Antweiler, Joachim Freyberger

This paper examines estimation of skill formation models, a critical
component in understanding human capital development and its effects on
individual outcomes. Existing estimators are either based on moment conditions
and only applicable in specific settings or rely on distributional
approximations that often do not align with the model. Our method employs an
iterative likelihood-based procedure, which flexibly estimates latent variable
distributions and recursively incorporates model restrictions across time
periods. This approach reduces computational complexity while accommodating
nonlinear production functions and measurement systems. Inference can be based
on a bootstrap procedure that does not require re-estimating the model for
bootstrap samples. Monte Carlo simulations and an empirical application
demonstrate that our estimator outperforms existing methods, whose estimators
can be substantially biased or noisy.

arXiv link: http://arxiv.org/abs/2507.18995v1

Econometrics arXiv paper, submitted: 2025-07-25

Batched Adaptive Network Formation

Authors: Yan Xu, Bo Zhou

Networks are central to many economic and organizational applications,
including workplace team formation, social platform recommendations, and
classroom friendship development. In these settings, networks are modeled as
graphs, with agents as nodes, agent pairs as edges, and edge weights capturing
pairwise production or interaction outcomes. This paper develops an adaptive,
or online, policy that learns to form increasingly effective networks
as data accumulates over time, progressively improving total network output
measured by the sum of edge weights.
Our approach builds on the weighted stochastic block model (WSBM), which
captures agents' unobservable heterogeneity through discrete latent types and
models their complementarities in a flexible, nonparametric manner. We frame
the online network formation problem as a non-standard batched
multi-armed bandit, where each type pair corresponds to an arm, and pairwise
reward depends on type complementarity. This strikes a balance between
exploration -- learning latent types and complementarities -- and exploitation
-- forming high-weighted networks. We establish two key results: a
batched local asymptotic normality result for the WSBM and an
asymptotic equivalence between maximum likelihood and variational estimates of
the intractable likelihood. Together, they provide a theoretical foundation for
treating variational estimates as normal signals, enabling principled Bayesian
updating across batches. The resulting posteriors are then incorporated into a
tailored maximum-weight matching problem to determine the policy for the next
batch. Simulations show that our algorithm substantially improves outcomes
within a few batches, yields increasingly accurate parameter estimates, and
remains effective even in nonstationary settings with evolving agent pools.

arXiv link: http://arxiv.org/abs/2507.18961v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-07-24

How weak are weak factors? Uniform inference for signal strength in signal plus noise models

Authors: Anna Bykhovskaya, Vadim Gorin, Sasha Sodin

The paper analyzes four classical signal-plus-noise models: the factor model,
spiked sample covariance matrices, the sum of a Wigner matrix and a low-rank
perturbation, and canonical correlation analysis with low-rank dependencies.
The objective is to construct confidence intervals for the signal strength that
are uniformly valid across all regimes - strong, weak, and critical signals. We
demonstrate that traditional Gaussian approximations fail in the critical
regime. Instead, we introduce a universal transitional distribution that
enables valid inference across the entire spectrum of signal strengths. The
approach is illustrated through applications in macroeconomics and finance.

arXiv link: http://arxiv.org/abs/2507.18554v1

Econometrics arXiv paper, submitted: 2025-07-24

Partitioned Wild Bootstrap for Panel Data Quantile Regression

Authors: Antonio F. Galvao, Carlos Lamarche, Thomas Parker

Practical inference procedures for quantile regression models of panel data
have been a pervasive concern in empirical work, and can be especially
challenging when the panel is observed over many time periods and temporal
dependence needs to be taken into account. In this paper, we propose a new
bootstrap method that applies random weighting to a partition of the data --
partition-invariant weights are used in the bootstrap data generating process
-- to conduct statistical inference for conditional quantiles in panel data
that have significant time-series dependence. We demonstrate that the procedure
is asymptotically valid for approximating the distribution of the fixed effects
quantile regression estimator. The bootstrap procedure offers a viable
alternative to existing resampling methods. Simulation studies show numerical
evidence that the novel approach has accurate small sample behavior, and an
empirical application illustrates its use.
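
The core device is a set of bootstrap weights that are constant within each cell
of a partition of the time dimension. The abstract does not specify the weight
distribution or the partition rule, so the time blocks and mean-one exponential
weights below are assumptions for illustration; in the actual procedure such
weights would enter a weighted fixed-effects quantile-regression objective.

```python
import numpy as np

def partition_invariant_weights(T, block_length, rng):
    """Draw one random weight per time block and repeat it within the block, so
    the bootstrap perturbation is invariant across the cells of the partition."""
    n_blocks = int(np.ceil(T / block_length))
    block_w = rng.exponential(scale=1.0, size=n_blocks)   # mean-one positive weights
    return np.repeat(block_w, block_length)[:T]

rng = np.random.default_rng(5)
T, block_length, B = 200, 10, 500
weights = np.stack([partition_invariant_weights(T, block_length, rng)
                    for _ in range(B)])
print(weights.shape)   # (B, T): one weight path per bootstrap replication
```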

arXiv link: http://arxiv.org/abs/2507.18494v1

Econometrics arXiv paper, submitted: 2025-07-23

A general randomized test for Alpha

Authors: Daniele Massacci, Lucio Sarno, Lorenzo Trapani, Pierluigi Vallarino

We propose a methodology to construct tests for the null hypothesis that the
pricing errors of a panel of asset returns are jointly equal to zero in a
linear factor asset pricing model -- that is, the null of "zero alpha". We
consider, as a leading example, a model with observable, tradable factors, but
we also develop extensions to accommodate for non-tradable and latent factors.
The test is based on equation-by-equation estimation, using a randomized
version of the estimated alphas, which only requires rates of convergence. The
distinct features of the proposed methodology are that it does not require the
estimation of any covariance matrix, and that it allows for both N and T to
pass to infinity, with the former possibly faster than the latter. Further,
unlike extant approaches, the procedure can accommodate conditional
heteroskedasticity, non-Gaussianity, and even strong cross-sectional dependence
in the error terms. We also propose a de-randomized decision rule to choose in
favor or against the correct specification of a linear factor pricing model.
Monte Carlo simulations show that the test has satisfactory properties and it
compares favorably to several existing tests. The usefulness of the testing
procedure is illustrated through an application of linear factor pricing models
to price the constituents of the S&P 500.

arXiv link: http://arxiv.org/abs/2507.17599v1

Econometrics arXiv paper, submitted: 2025-07-23

Decoding Consumer Preferences Using Attention-Based Language Models

Authors: Joshua Foster, Fredrik Odegaard

This paper proposes a new demand estimation method using attention-based
language models. An encoder-only language model is trained in a two-stage
process to analyze the natural language descriptions of used cars from a large
US-based online auction marketplace. The approach enables
semi-nonparametric estimation of the demand primitives of a structural
model representing the private valuations and market size for each vehicle
listing. In the first stage, the language model is fine-tuned to encode the
target auction outcomes using the natural language vehicle descriptions. In the
second stage, the trained language model's encodings are projected into the
parameter space of the structural model. The model's capability to conduct
counterfactual analyses within the trained market space is validated using a
subsample of withheld auction data, which includes a set of unique "zero shot"
instances.

arXiv link: http://arxiv.org/abs/2507.17564v1

Econometrics arXiv cross-link from Quantitative Finance – Statistical Finance (q-fin.ST), submitted: 2025-07-22

Adaptive Market Intelligence: A Mixture of Experts Framework for Volatility-Sensitive Stock Forecasting

Authors: Diego Vallarino

This study develops and empirically validates a Mixture of Experts (MoE)
framework for stock price prediction across heterogeneous volatility regimes
using real market data. The proposed model combines a Recurrent Neural Network
(RNN) optimized for high-volatility stocks with a linear regression model
tailored to stable equities. A volatility-aware gating mechanism dynamically
weights the contributions of each expert based on asset classification. Using a
dataset of 30 publicly traded U.S. stocks spanning diverse sectors, the MoE
approach consistently outperforms both standalone models.
Specifically, it achieves up to 33% improvement in MSE for volatile assets
and 28% for stable assets relative to their respective baselines. Stratified
evaluation across volatility classes demonstrates the model's ability to adapt
complexity to underlying market dynamics. These results confirm that no single
model suffices across market regimes and highlight the advantage of adaptive
architectures in financial prediction. Future work should explore real-time
gate learning, dynamic volatility segmentation, and applications to portfolio
optimization.
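
As a toy illustration of a volatility-aware gate, the snippet below blends a
"volatile-regime" expert and a "stable-regime" expert with a logistic weight in
realized volatility. The paper's gate is based on asset classification and its
experts are an RNN and a linear regression; the smooth logistic form, threshold,
and numbers here are assumptions made for illustration only.

```python
import numpy as np

def gate_weight(realized_vol, threshold, sharpness=25.0):
    """Volatility-aware gate: the weight on the volatile-regime expert rises
    smoothly from 0 to 1 as realized volatility crosses the threshold."""
    return 1.0 / (1.0 + np.exp(-sharpness * (realized_vol - threshold)))

def moe_forecast(pred_volatile, pred_stable, realized_vol, threshold):
    w = gate_weight(realized_vol, threshold)
    return w * pred_volatile + (1.0 - w) * pred_stable

# Placeholder expert forecasts for a single asset and day.
print(moe_forecast(pred_volatile=101.3, pred_stable=100.8,
                   realized_vol=0.35, threshold=0.25))
```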

arXiv link: http://arxiv.org/abs/2508.02686v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2025-07-22

Can we have it all? Non-asymptotically valid and asymptotically exact confidence intervals for expectations and linear regressions

Authors: Alexis Derumigny, Lucas Girard, Yannick Guyonvarch

We contribute to bridging the gap between large- and finite-sample inference
by studying confidence sets (CSs) that are both non-asymptotically valid and
asymptotically exact uniformly (NAVAE) over semi-parametric statistical models.
NAVAE CSs are not easily obtained; for instance, we show they do not exist over
the set of Bernoulli distributions. We first derive a generic sufficient
condition: NAVAE CSs are available as soon as uniform asymptotically exact CSs
are. Second, building on that connection, we construct closed-form NAVAE
confidence intervals (CIs) in two standard settings -- scalar expectations and
linear combinations of OLS coefficients -- under moment conditions only. For
expectations, our sole requirement is a bounded kurtosis. In the OLS case, our
moment constraints accommodate heteroskedasticity and weak exogeneity of the
regressors. Under those conditions, we enlarge the Central Limit Theorem-based
CIs, which are asymptotically exact, to ensure non-asymptotic guarantees. Those
modifications vanish asymptotically so that our CIs coincide with the classical
ones in the limit. We illustrate the potential and limitations of our approach
through a simulation study.

arXiv link: http://arxiv.org/abs/2507.16776v2

Econometrics arXiv paper, submitted: 2025-07-22

Dyadic data with ordered outcome variables

Authors: Chris Muris, Cavit Pakel, Qichen Zhang

We consider ordered logit models for directed network data that allow for
flexible sender and receiver fixed effects that can vary arbitrarily across
outcome categories. This structure poses a significant incidental parameter
problem, particularly challenging under network sparsity or when some outcome
categories are rare. We develop the first estimation method for this setting by
extending tetrad-differencing conditional maximum likelihood (CML) techniques
from binary choice network models. This approach yields conditional
probabilities free of the fixed effects, enabling consistent estimation even
under sparsity. Applying the CML principle to ordered data yields multiple
likelihood contributions corresponding to different outcome thresholds. We
propose and analyze two distinct estimators based on aggregating these
contributions: an Equally-Weighted Tetrad Logit Estimator (ETLE) and a Pooled
Tetrad Logit Estimator (PTLE). We prove PTLE is consistent under weaker
identification conditions, requiring only sufficient information when pooling
across categories, rather than sufficient information in each category. Monte
Carlo simulations confirm the theoretical preference for PTLE, and an empirical
application to friendship networks among Dutch university students demonstrates
the method's value. Our approach reveals significant positive homophily effects
for gender, smoking behavior, and academic program similarities, while standard
methods without fixed effects produce counterintuitive results.

arXiv link: http://arxiv.org/abs/2507.16689v1

Econometrics arXiv paper, submitted: 2025-07-22

Binary Response Forecasting under a Factor-Augmented Framework

Authors: Tingting Cheng, Jiachen Cong, Fei Liu, Xuanbin Yang

In this paper, we propose a novel factor-augmented forecasting regression
model with a binary response variable. We develop a maximum likelihood
estimation method for the regression parameters and establish the asymptotic
properties of the resulting estimators. Monte Carlo simulation results show
that the proposed estimation method performs very well in finite samples.
Finally, we demonstrate the usefulness of the proposed model through an
application to U.S. recession forecasting. The proposed model consistently
outperforms conventional Probit regression across both in-sample and
out-of-sample exercises, by effectively utilizing high-dimensional information
through latent factors.
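
The paper develops a joint maximum likelihood estimator; as a rough two-step
stand-in that conveys the idea of compressing a high-dimensional predictor panel
into latent factors before a binary-response fit, the sketch below extracts
principal-component factors and plugs them into a probit. All data and
dimensions are simulated placeholders.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
T, N, r = 300, 100, 3                        # periods, predictors, factors (illustrative)
F = rng.normal(size=(T, r))                  # latent factors
Lam = rng.normal(size=(N, r))                # loadings
X = F @ Lam.T + rng.normal(size=(T, N))      # high-dimensional predictor panel
y = (F @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=T) > 0).astype(int)

# Two-step illustration: extract factors by PCA, then fit a probit on the factors.
F_hat = PCA(n_components=r).fit_transform((X - X.mean(0)) / X.std(0))
Xc = sm.add_constant(F_hat)
probit = sm.Probit(y, Xc).fit(disp=False)
print(probit.params)
print(probit.predict(Xc[-1:]))   # fitted recession-style probability for the last period
```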

arXiv link: http://arxiv.org/abs/2507.16462v1

Econometrics arXiv paper, submitted: 2025-07-20

Volatility Spillovers and Interconnectedness in OPEC Oil Markets: A Network-Based log-ARCH Approach

Authors: Fayçal Djebari, Kahina Mehidi, Khelifa Mazouz, Philipp Otto

This paper examines several network-based volatility models for oil prices,
capturing spillovers among OPEC oil-exporting countries by embedding novel
network structures into ARCH-type models. We apply a network-based log-ARCH
framework that incorporates weight matrices derived from time-series clustering
and model-implied distances into the conditional variance equation. These
weight matrices are constructed from return data and standard multivariate
GARCH model outputs (CCC, DCC, and GO-GARCH), enabling a comparative analysis
of volatility transmission across specifications. Through a rolling-window
forecast evaluation, the network-based models demonstrate competitive
forecasting performance relative to traditional specifications and uncover
intricate spillover effects. These results provide a deeper understanding of
the interconnectedness within the OPEC network, with important implications for
financial risk assessment, market integration, and coordinated policy among
oil-producing economies.

arXiv link: http://arxiv.org/abs/2507.15046v1

Econometrics arXiv updated paper (originally submitted: 2025-07-19)

Testing Clustered Equal Predictive Ability with Unknown Clusters

Authors: Oguzhan Akgun, Alain Pirotte, Giovanni Urga, Zhenlin Yang

This paper proposes a selective inference procedure for testing equal
predictive ability in panel data settings with unknown heterogeneity. The
framework allows predictive performance to vary across unobserved clusters and
accounts for the data-driven selection of these clusters using the Panel Kmeans
Algorithm. A post-selection Wald-type statistic is constructed, and valid
$p$-values are derived under general forms of autocorrelation and
cross-sectional dependence in forecast loss differentials. The method
accommodates conditioning on covariates or common factors and permits both
strong and weak dependence across units. Simulations demonstrate the
finite-sample validity of the procedure and show that it has very high power.
An empirical application to exchange rate forecasting using machine learning
methods illustrates the practical relevance of accounting for unknown clusters
in forecast evaluation.
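
The sketch below illustrates only the mechanics of the first two steps:
clustering units by their average forecast loss differentials and computing a
per-cluster test statistic. The naive t-test shown ignores the data-driven
selection of clusters and the dependence structure, which is precisely what
the paper's post-selection Wald statistic corrects; the data are simulated.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy import stats

rng = np.random.default_rng(2)
n_units, T = 40, 120
# Forecast loss differentials d_{it} = L(model A) - L(model B) (illustrative)
d = rng.normal(loc=np.r_[np.zeros(25), 0.3 * np.ones(15)][:, None],
               scale=1.0, size=(n_units, T))

# Step 1: data-driven clustering of units by their average loss differential
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(d.mean(axis=1, keepdims=True))

# Step 2: naive per-cluster test of equal predictive ability (mean differential = 0).
# Valid p-values require the paper's selective inference adjustment; the plain
# t-test below ignores the selection step and serves only to fix ideas.
for g in range(2):
    dg = d[km.labels_ == g].mean(axis=0)          # cluster-average differential path
    tstat, p_naive = stats.ttest_1samp(dg, 0.0)
    print(f"cluster {g}: t = {tstat:.2f}, naive p = {p_naive:.3f}")
```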

arXiv link: http://arxiv.org/abs/2507.14621v2

Econometrics arXiv paper, submitted: 2025-07-18

A New Perspective of the Meese-Rogoff Puzzle: Application of Sparse Dynamic Shrinkage

Authors: Zheng Fan, Worapree Maneesoonthorn, Yong Song

We propose the Markov Switching Dynamic Shrinkage process (MSDSP), nesting
the Dynamic Shrinkage Process (DSP) of Kowal et al. (2019). We revisit the
Meese-Rogoff puzzle (Meese and Rogoff, 1983a,b, 1988) by applying the MSDSP to
the economic models deemed inferior to the random walk model for exchange rate
predictions. The flexibility of the MSDSP model captures the possibility of
zero coefficients (sparsity), constant coefficient (dynamic shrinkage), as well
as sudden and gradual parameter movements (structural change) in the
time-varying parameter model setting. We also apply MSDSP in the context of
Bayesian predictive synthesis (BPS) (McAlinn and West, 2019), where dynamic
combination schemes exploit the information from the alternative economic
models. Our analysis provides a new perspective on the Meese-Rogoff puzzle,
illustrating that the economic models, enhanced with the parameter flexibility
of the MSDSP, produce predictive distributions that are superior to the random
walk model, even when stochastic volatility is considered.

arXiv link: http://arxiv.org/abs/2507.14408v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-07-18

Policy relevance of causal quantities in networks

Authors: Sahil Loomba, Dean Eckles

In settings where units' outcomes are affected by others' treatments, there
has been a proliferation of ways to quantify effects of treatments on outcomes.
Here we describe how many proposed estimands can be represented as involving
one of two ways of averaging over units and treatment assignments. The more
common representation often results in quantities that are irrelevant, or at
least insufficient, for optimal choice of policies governing treatment
assignment. The other representation often yields quantities that lack an
interpretation as summaries of unit-level effects, but that we argue may still
be relevant to policy choice. Among various estimands, the expected average
outcome -- or its contrast between two different policies -- can be represented
both ways and, we argue, merits further attention.

arXiv link: http://arxiv.org/abs/2507.14391v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2025-07-18

Regional compositional trajectories and structural change: A spatiotemporal multivariate autoregressive framework

Authors: Matthias Eckardt, Philipp Otto

Compositional data, such as regional shares of economic sectors or property
transactions, are central to understanding structural change in economic
systems across space and time. This paper introduces a spatiotemporal
multivariate autoregressive model tailored for panel data with
composition-valued responses at each areal unit and time point. The proposed
framework enables the joint modelling of temporal dynamics and spatial
dependence under compositional constraints and is estimated via a quasi maximum
likelihood approach. We build on recent theoretical advances to establish
identifiability and asymptotic properties of the estimator when both the number
of regions and time points grow. The utility and flexibility of the model are
demonstrated through two applications: analysing property transaction
compositions in an intra-city housing market (Berlin), and regional sectoral
compositions in Spain's economy. These case studies highlight how the proposed
framework captures key features of spatiotemporal economic processes that are
often missed by conventional methods.

arXiv link: http://arxiv.org/abs/2507.14389v1

Econometrics arXiv paper, submitted: 2025-07-18

Leveraging Covariates in Regression Discontinuity Designs

Authors: Matias D. Cattaneo, Filippo Palomba

It is common practice to incorporate additional covariates in empirical
economics. In the context of Regression Discontinuity (RD) designs, covariate
adjustment plays multiple roles, making it essential to understand its impact
on analysis and conclusions. Typically implemented via local least squares
regressions, covariate adjustment can serve three main distinct purposes: (i)
improving the efficiency of RD average causal effect estimators, (ii) learning
about heterogeneous RD policy effects, and (iii) changing the RD parameter of
interest. This article discusses and illustrates empirically how to leverage
covariates effectively in RD designs.
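
For purpose (i), efficiency-oriented covariate adjustment is typically a local
least squares regression that adds covariates linearly to the usual RD
specification. The sketch below illustrates that textbook adjustment on
simulated data with a fixed, non-data-driven bandwidth; it does not capture
the cases in which adjustment changes the estimand, which the article
discusses.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 2000
x = rng.uniform(-1, 1, n)                 # running variable, cutoff at 0
z = rng.normal(size=n)                    # pre-treatment covariate
d = (x >= 0).astype(float)                # sharp RD treatment
y = 0.5 * d + 1.0 * x + 0.3 * z + rng.normal(scale=0.5, size=n)

h = 0.25                                  # illustrative bandwidth (not data-driven)
w = np.abs(x) <= h                        # uniform kernel for simplicity

# Covariate-adjusted local linear RD regression:
# y = tau*d + slope terms on each side of the cutoff + linear covariate adjustment
X = np.column_stack([d, x, d * x, z])[w]
res = sm.OLS(y[w], sm.add_constant(X)).fit(cov_type="HC1")
print(f"RD effect (covariate-adjusted): {res.params[1]:.3f} "
      f"(se {res.bse[1]:.3f})")
```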

arXiv link: http://arxiv.org/abs/2507.14311v1

Econometrics arXiv paper, submitted: 2025-07-18

Debiased Machine Learning for Unobserved Heterogeneity: High-Dimensional Panels and Measurement Error Models

Authors: Facundo Argañaraz, Juan Carlos Escanciano

Developing robust inference for models with nonparametric Unobserved
Heterogeneity (UH) is both important and challenging. We propose novel Debiased
Machine Learning (DML) procedures for valid inference on functionals of UH,
allowing for partial identification of multivariate target and high-dimensional
nuisance parameters. Our main contribution is a full characterization of all
relevant Neyman-orthogonal moments in models with nonparametric UH, where
relevance means informativeness about the parameter of interest. Under
additional support conditions, orthogonal moments are globally robust to the
distribution of the UH. They may still involve other high-dimensional nuisance
parameters, but their local robustness reduces regularization bias and enables
valid DML inference. We apply these results to: (i) common parameters, average
marginal effects, and variances of UH in panel data models with
high-dimensional controls; (ii) moments of the common factor in the Kotlarski
model with a factor loading; and (iii) smooth functionals of teacher
value-added. Monte Carlo simulations show substantial efficiency gains from
using efficient orthogonal moments relative to ad-hoc choices. We illustrate
the practical value of our approach by showing that existing estimates of the
average and variance effects of maternal smoking on child birth weight are
robust.

arXiv link: http://arxiv.org/abs/2507.13788v1

Econometrics arXiv paper, submitted: 2025-07-17

Who With Whom? Learning Optimal Matching Policies

Authors: Yagan Hazard, Toru Kitagawa

There are many economic contexts where the productivity and welfare
performance of institutions and policies depend on who matches with whom.
Examples include caseworkers and job seekers in job search assistance programs,
medical doctors and patients, teachers and students, attorneys and defendants,
and tax auditors and taxpayers, among others. Although reallocating individuals
through a change in matching policy can be less costly than training personnel
or introducing a new program, methods for learning optimal matching policies
and their statistical performance are less studied than methods for other
policy interventions. This paper develops a method to learn welfare optimal
matching policies for two-sided matching problems in which a planner matches
individuals based on the rich set of observable characteristics of the two
sides. We formulate the learning problem as an empirical optimal transport
problem with a match cost function estimated from training data, and propose
estimating an optimal matching policy by maximizing the entropy regularized
empirical welfare criterion. We derive a welfare regret bound for the estimated
policy and characterize its convergence. We apply our proposal to the problem
of matching caseworkers and job seekers in a job search assistance program, and
assess its welfare performance in a simulation study calibrated with French
administrative data.
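
The computational core described above is an entropy-regularized optimal
transport problem with an estimated match cost. A minimal Sinkhorn-style
sketch with a made-up cost matrix and uniform marginals is shown below; the
paper's welfare criterion, cost estimation, and regret analysis are not
reproduced.

```python
import numpy as np

def sinkhorn(cost, a, b, reg=0.1, n_iter=500):
    """Entropy-regularized optimal transport via Sinkhorn iterations."""
    K = np.exp(-cost / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]      # transport (matching) plan

rng = np.random.default_rng(4)
n_case, n_seek = 5, 5
# Estimated match cost, e.g. minus a predicted re-employment score (illustrative)
cost = rng.uniform(size=(n_case, n_seek))
a = np.full(n_case, 1 / n_case)              # caseworker capacities
b = np.full(n_seek, 1 / n_seek)              # job-seeker mass
plan = sinkhorn(cost, a, b, reg=0.05)
print(np.round(plan, 3))                      # rows ~ caseworkers, cols ~ job seekers
```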

arXiv link: http://arxiv.org/abs/2507.13567v1

Econometrics arXiv paper, submitted: 2025-07-17

Combining stated and revealed preferences

Authors: Romuald Meango, Marc Henry, Ismael Mourifie

Can stated preferences inform counterfactual analyses of actual choice? This
research proposes a novel approach to researchers who have access to both
stated choices in hypothetical scenarios and actual choices, matched or
unmatched. The key idea is to use stated choices to identify the distribution
of individual unobserved heterogeneity. If this unobserved heterogeneity is the
source of endogeneity, the researcher can correct for its influence in a demand
function estimation using actual choices and recover causal effects. Bounds on
causal effects are derived for the case where stated and actual choices are
observed in unmatched data sets. These data combination bounds are of
independent interest. We derive a valid bootstrap inference for the bounds and
show its good performance in a simulation experiment.

arXiv link: http://arxiv.org/abs/2507.13552v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-07-17

Refining the Notion of No Anticipation in Difference-in-Differences Studies

Authors: Marco Piccininni, Eric J. Tchetgen Tchetgen, Mats J. Stensrud

We address an ambiguity in identification strategies using
difference-in-differences, which are widely applied in empirical research,
particularly in economics. The assumption commonly referred to as the
"no-anticipation assumption" states that treatment has no effect on outcomes
before its implementation. However, because standard causal models rely on a
temporal structure in which causes precede effects, such an assumption seems to
be inherently satisfied. This raises the question of whether the assumption is
repeatedly stated out of redundancy or because the formal statements fail to
capture the intended subject-matter interpretation. We argue that confusion
surrounding the no-anticipation assumption arises from ambiguity in the
intervention considered and that current formulations of the assumption are
ambiguous. Therefore, new definitions and identification results are proposed.

arXiv link: http://arxiv.org/abs/2507.12891v2

Econometrics arXiv paper, submitted: 2025-07-16

Placebo Discontinuity Design

Authors: Rahul Singh, Moses Stewart

Standard regression discontinuity design (RDD) models rely on the continuity
of expected potential outcomes at the cutoff. The standard continuity
assumption can be violated by strategic manipulation of the running variable,
which is realistic when the cutoff is widely known and when the treatment of
interest is a social program or government benefit. In this work, we identify
the treatment effect despite such a violation, by leveraging a placebo
treatment and a placebo outcome. We introduce a local instrumental variable
estimator. Our estimator decomposes into two terms: the standard RDD estimator
of the target outcome's discontinuity, and a new adjustment term based on the
placebo outcome's discontinuity. We show that our estimator is consistent, and
we justify a robust bias-corrected inference procedure. Our method expands the
applicability of RDD to settings with strategic behavior around the cutoff,
which commonly arise in social science.

arXiv link: http://arxiv.org/abs/2507.12693v1

Econometrics arXiv paper, submitted: 2025-07-16

NA-DiD: Extending Difference-in-Differences with Capabilities

Authors: Stanisław M. S. Halkiewicz

This paper introduces the Non-Additive Difference-in-Differences (NA-DiD)
framework, which extends classical DiD by incorporating non-additive measures,
specifically the Choquet integral, for effect aggregation. It serves as a novel econometric
tool for impact evaluation, particularly in settings with non-additive
treatment effects. First, we introduce the integral representation of the
classical DiD model, and then extend it to non-additive measures, thereby
deriving the formulae for NA-DiD estimation. Then, we give its theoretical
properties. Applying NA-DiD to a simulated hospital hygiene intervention, we
find that classical DiD can overestimate treatment effects, e.g., by failing to
account for compliance erosion. In contrast, NA-DiD provides a more accurate
estimate by incorporating non-linear aggregation. The Julia implementation of
the techniques used and introduced in this article is provided in the
appendices.
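
Since the aggregation step rests on the Choquet integral, a small
self-contained computation of a discrete Choquet integral with respect to a
capacity may help fix ideas. The capacity and values below are illustrative;
the sketch is written in Python rather than the paper's Julia code and does
not reproduce the NA-DiD formulae.

```python
def choquet_integral(values, capacity):
    """Discrete Choquet integral of `values` (dict item -> value) w.r.t. a
    capacity given as a dict frozenset -> weight, with capacity(empty set) = 0."""
    items = sorted(values, key=values.get)            # ascending by value
    total, prev = 0.0, 0.0
    for i, it in enumerate(items):
        level_set = frozenset(items[i:])              # items with value >= current
        total += (values[it] - prev) * capacity[level_set]
        prev = values[it]
    return total

# Illustrative 2-item example with a sub-additive capacity
values = {"A": 0.4, "B": 0.9}
capacity = {frozenset(): 0.0, frozenset({"A"}): 0.5,
            frozenset({"B"}): 0.6, frozenset({"A", "B"}): 1.0}
print(choquet_integral(values, capacity))   # 0.4*1.0 + (0.9-0.4)*0.6 = 0.70
```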

arXiv link: http://arxiv.org/abs/2507.12690v1

Econometrics arXiv paper, submitted: 2025-07-16

Semiparametric Learning of Integral Functionals on Submanifolds

Authors: Xiaohong Chen, Wayne Yuan Gao

This paper studies the semiparametric estimation and inference of integral
functionals on submanifolds, which arise naturally in a variety of econometric
settings. For linear integral functionals on a regular submanifold, we show
that the semiparametric plug-in estimator attains the minimax-optimal
convergence rate $n^{-s/(2s+d-m)}$, where $s$ is the H\"{o}lder
smoothness order of the underlying nonparametric function, $d$ is the dimension
of the first-stage nonparametric estimation, and $m$ is the dimension of the
submanifold over which the integral is taken. This rate coincides with the
standard minimax-optimal rate for a $(d-m)$-dimensional nonparametric
estimation problem, illustrating that integration over the $m$-dimensional
manifold effectively reduces the problem's dimensionality. We then provide a
general asymptotic normality theorem for linear/nonlinear submanifold
integrals, along with a consistent variance estimator. We provide simulation
evidence in support of our theoretical results.

arXiv link: http://arxiv.org/abs/2507.12673v1

Econometrics arXiv updated paper (originally submitted: 2025-07-16)

Catching Bid-rigging Cartels with Graph Attention Neural Networks

Authors: David Imhof, Emanuel W Viklund, Martin Huber

We propose a novel application of graph attention networks (GATs), a type of
graph neural network enhanced with attention mechanisms, to develop a deep
learning algorithm for detecting collusive behavior, leveraging predictive
features suggested in prior research. We test our approach on a large dataset
covering 13 markets across seven countries. Our results show that predictive
models based on GATs, trained on a subset of the markets, can be effectively
transferred to other markets, achieving accuracy rates between 80% and 90%,
depending on the hyperparameter settings. The best-performing configuration,
applied to eight markets from Switzerland and the Japanese region of Okinawa,
yields an average accuracy of 91% for cross-market prediction. When extended to
12 markets, the method maintains a strong performance with an average accuracy
of 84%, surpassing traditional ensemble approaches in machine learning. These
results suggest that GAT-based detection methods offer a promising tool for
competition authorities to screen markets for potential cartel activity.
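
For readers unfamiliar with the building block, a single attention head of a
GAT-style layer can be written in a few lines. The NumPy sketch below is a
generic illustration of masked attention over a tender graph's adjacency
structure; it is not the authors' architecture, features, or training
pipeline.

```python
import numpy as np

def gat_head(H, A, W, a_src, a_dst, alpha=0.2):
    """One graph-attention head: H (n,f) node features, A (n,n) adjacency
    with self-loops, W (f,f') weights, a_src/a_dst (f',) attention vectors."""
    Z = H @ W                                              # transformed features
    logits = Z @ a_src[:, None] + (Z @ a_dst)[None, :]     # e_ij before activation
    logits = np.where(logits > 0, logits, alpha * logits)  # LeakyReLU
    logits = np.where(A > 0, logits, -1e9)                 # mask non-edges
    att = np.exp(logits - logits.max(axis=1, keepdims=True))
    att = att / att.sum(axis=1, keepdims=True)             # softmax over neighbors
    return att @ Z                                         # aggregated node output

rng = np.random.default_rng(5)
n, f, fp = 6, 4, 3                       # bids in a tender, features, hidden dim
H = rng.normal(size=(n, f))              # e.g. bid-level screening features
A = np.ones((n, n))                      # fully connected tender (with self-loops)
out = gat_head(H, A, rng.normal(size=(f, fp)),
               rng.normal(size=fp), rng.normal(size=fp))
print(out.shape)                          # (6, 3)
```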

arXiv link: http://arxiv.org/abs/2507.12369v2

Econometrics arXiv paper, submitted: 2025-07-16

Forecasting Climate Policy Uncertainty: Evidence from the United States

Authors: Donia Besher, Anirban Sengupta, Tanujit Chakraborty

Forecasting Climate Policy Uncertainty (CPU) is essential as policymakers
strive to balance economic growth with environmental goals. High levels of CPU
can slow down investments in green technologies, make regulatory planning more
difficult, and increase public resistance to climate reforms, especially during
times of economic stress. This study addresses the challenge of forecasting the
US CPU index by building the Bayesian Structural Time Series (BSTS) model with
a large set of covariates, including economic indicators, financial cycle data,
and public sentiments captured through Google Trends. The key strength of the
BSTS model lies in its ability to efficiently manage a large number of
covariates through its dynamic feature selection mechanism based on the
spike-and-slab prior. To validate the effectiveness of the selected features of
the BSTS model, an impulse response analysis is performed. The results show
that macro-financial shocks impact CPU in different ways over time. Numerical
experiments are performed to evaluate the performance of the BSTS model with
exogenous variables on the US CPU dataset over different forecasting horizons.
The empirical results confirm that BSTS consistently outperforms classical and
deep learning frameworks, particularly for semi-long-term and long-term
forecasts.

arXiv link: http://arxiv.org/abs/2507.12276v1

Econometrics arXiv paper, submitted: 2025-07-16

Data Synchronization at High Frequencies

Authors: Xinbing Kong, Cheng Liu, Bin Wu

Asynchronous trading in high-frequency financial markets introduces
significant biases into econometric analysis, distorting risk estimates and
leading to suboptimal portfolio decisions. Existing synchronization methods,
such as the previous-tick approach, suffer from information loss and create
artificial price staleness. We introduce a novel framework that recasts the
data synchronization challenge as a constrained matrix completion problem. Our
approach recovers the potential matrix of high-frequency price increments by
minimizing its nuclear norm -- capturing the underlying low-rank factor
structure -- subject to a large-scale linear system derived from observed,
asynchronous price changes. Theoretically, we prove the existence and
uniqueness of our estimator and establish its convergence rate. A key
theoretical insight is that our method accurately and robustly leverages
information from both frequently and infrequently traded assets, overcoming a
critical difficulty of efficiency loss in traditional methods. Empirically,
using extensive simulations and a large panel of S&P 500 stocks, we demonstrate
that our method substantially outperforms established benchmarks. It not only
achieves significantly lower synchronization errors, but also corrects the bias
in systematic risk estimates (i.e., eigenvalues) and the estimate of betas
caused by stale prices. Crucially, portfolios constructed using our
synchronized data yield consistently and economically significant higher
out-of-sample Sharpe ratios. Our framework provides a powerful tool for
uncovering the true dynamics of asset prices, with direct implications for
high-frequency risk management, algorithmic trading, and econometric inference.
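
A stylized version of the optimization problem, minimizing the nuclear norm of
the increment matrix subject to linear constraints implied by observed
asynchronous returns, can be written with a generic convex solver. The
constraint construction and dimensions below are simplified, made-up
illustrations of the general linear system described in the abstract.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(6)
T, N, r = 30, 8, 2                          # fine grid length, assets, factors

# Latent low-rank high-frequency increments (for simulation only)
X_true = rng.normal(size=(T, r)) @ rng.normal(size=(r, N)) * 0.01

# Asynchronous observations: asset j trades only at random times; each observed
# return aggregates the latent increments since its previous trade.
constraints, X = [], cp.Variable((T, N))
for j in range(N):
    trade_times = np.sort(rng.choice(np.arange(1, T), size=10, replace=False))
    prev = 0
    for t in trade_times:
        obs = X_true[prev:t + 1, j].sum()
        constraints.append(cp.sum(X[prev:t + 1, j]) == obs)
        prev = t + 1

prob = cp.Problem(cp.Minimize(cp.normNuc(X)), constraints)
prob.solve()
print("recovery error:", np.linalg.norm(X.value - X_true) / np.linalg.norm(X_true))
```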

arXiv link: http://arxiv.org/abs/2507.12220v1

Econometrics arXiv paper, submitted: 2025-07-15

Inference on Optimal Policy Values and Other Irregular Functionals via Smoothing

Authors: Justin Whitehouse, Morgane Austern, Vasilis Syrgkanis

Constructing confidence intervals for the value of an optimal treatment
policy is an important problem in causal inference. Insight into the optimal
policy value can guide the development of reward-maximizing, individualized
treatment regimes. However, because the functional that defines the optimal
value is non-differentiable, standard semi-parametric approaches for performing
inference fail to be directly applicable. Existing approaches for handling this
non-differentiability fall roughly into two camps. In one camp are estimators
based on constructing smooth approximations of the optimal value. These
approaches are computationally lightweight, but typically place unrealistic
parametric assumptions on outcome regressions. In another camp are approaches
that directly de-bias the non-smooth objective. These approaches don't place
parametric assumptions on nuisance functions, but they either require the
computation of intractably-many nuisance estimates, assume unrealistic
$L^\infty$ nuisance convergence rates, or make strong margin assumptions that
prohibit non-response to a treatment. In this paper, we revisit the problem of
constructing smooth approximations of non-differentiable functionals. By
carefully controlling first-order bias and second-order remainders, we show
that a softmax smoothing-based estimator can be used to estimate parameters
that are specified as a maximum of scores involving nuisance components. In
particular, this includes the value of the optimal treatment policy as a
special case. Our estimator obtains $\sqrt{n}$ convergence rates, avoids
parametric restrictions/unrealistic margin assumptions, and is often
statistically efficient.
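
The smoothing device itself is simple to state: replace the maximum over
treatment-specific scores with a softmax-weighted average at a temperature
tau. The toy computation below illustrates only this plug-in step on simulated
scores; the paper's bias corrections, cross-fitting, and efficiency theory are
the actual contribution and are not shown.

```python
import numpy as np

def smoothed_value(scores, tau=0.05):
    """Softmax-smoothed approximation to E[max_a score_a(X)].
    `scores` has shape (n, n_treatments): estimated scores per unit and arm."""
    z = scores / tau
    w = np.exp(z - z.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)          # softmax over arms
    return float(np.mean(np.sum(w * scores, axis=1)))

rng = np.random.default_rng(7)
scores = rng.normal(size=(10_000, 3))             # e.g. estimated CATE-based scores
exact = float(np.mean(scores.max(axis=1)))
for tau in (1.0, 0.1, 0.01):
    print(f"tau={tau:<5} smoothed={smoothed_value(scores, tau):.4f}  exact={exact:.4f}")
```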

arXiv link: http://arxiv.org/abs/2507.11780v1

Econometrics arXiv cross-link from stat.CO (stat.CO), submitted: 2025-07-14

FARS: Factor Augmented Regression Scenarios in R

Authors: Gian Pietro Bellocca, Ignacio Garrón, Vladimir Rodríguez-Caballero, Esther Ruiz

In the context of macroeconomic/financial time series, the FARS package
provides a comprehensive framework in R for the construction of conditional
densities of the variable of interest based on the factor-augmented quantile
regressions (FA-QRs) methodology, with the factors extracted from multi-level
dynamic factor models (ML-DFMs) with potential overlapping group-specific
factors. Furthermore, the package also allows the construction of measures of
risk as well as modeling and designing economic scenarios based on the
conditional densities. In particular, the package enables users to: (i) extract
global and group-specific factors using a flexible multi-level factor
structure; (ii) compute asymptotically valid confidence regions for the
estimated factors, accounting for uncertainty in the factor loadings; (iii)
obtain estimates of the parameters of the FA-QRs together with their standard
deviations; (iv) recover full predictive conditional densities from estimated
quantiles; (v) obtain risk measures based on extreme quantiles of the
conditional densities; and (vi) estimate the conditional density and the
corresponding extreme quantiles when the factors are stressed.

arXiv link: http://arxiv.org/abs/2507.10679v3

Econometrics arXiv updated paper (originally submitted: 2025-07-14)

Breakdown Analysis for Instrumental Variables with Binary Outcomes

Authors: Pedro Picchetti

This paper studies the partial identification of treatment effects in
Instrumental Variables (IV) settings with binary outcomes under violations of
independence. I derive the identified sets for the treatment parameters of
interest in the setting, as well as breakdown values for conclusions regarding
the true treatment effects. I derive $\sqrt{N}$-consistent nonparametric
estimators for the bounds of treatment effects and for breakdown values. These
results can be used to assess the robustness of empirical conclusions obtained
under the assumption that the instrument is independent from potential
quantities, which is a pervasive concern in studies that use IV methods with
observational data. In the empirical application, I show that the conclusions
regarding the effects of family size on female unemployment using same-sex
siblings as the instrument are highly sensitive to violations of independence.

arXiv link: http://arxiv.org/abs/2507.10242v4

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2025-07-13

An Algorithm for Identifying Interpretable Subgroups With Elevated Treatment Effects

Authors: Albert Chiu

We introduce an algorithm for identifying interpretable subgroups with
elevated treatment effects, given an estimate of individual or conditional
average treatment effects (CATE). Subgroups are characterized by “rule sets”
-- easy-to-understand statements of the form (Condition A AND Condition B) OR
(Condition C) -- which can capture high-order interactions while retaining
interpretability. Our method complements existing approaches for estimating the
CATE, which often produce high dimensional and uninterpretable results, by
summarizing and extracting critical information from fitted models to aid
decision making, policy implementation, and scientific understanding. We
propose an objective function that trades off subgroup size and effect size,
and varying the hyperparameter that controls this trade-off results in a
“frontier” of Pareto optimal rule sets, none of which dominates the others
across all criteria. Valid inference is achievable through sample splitting. We
demonstrate the utility and limitations of our method using simulated and
empirical examples.

arXiv link: http://arxiv.org/abs/2507.09494v1

Econometrics arXiv paper, submitted: 2025-07-11

Propensity score with factor loadings: the effect of the Paris Agreement

Authors: Angelo Forino, Andrea Mercatanti, Giacomo Morelli

Factor models for longitudinal data, where policy adoption is unconfounded
with respect to a low-dimensional set of latent factor loadings, have become
increasingly popular for causal inference. Most existing approaches, however,
rely on a causal finite-sample approach or computationally intensive methods,
limiting their applicability and external validity. In this paper, we propose a
novel causal inference method for panel data based on inverse propensity score
weighting where the propensity score is a function of latent factor loadings
within a framework of causal inference from a super-population. The approach
relaxes the traditional restrictive assumptions of causal panel methods, while
offering advantages in terms of causal interpretability, policy relevance, and
computational efficiency. Under standard assumptions, we outline a three-step
estimation procedure for the ATT and derive its large-sample properties using
M-estimation theory. We apply the method to assess the causal effect of the
Paris Agreement, a policy aimed at fostering the transition to a low-carbon
economy, on European stock returns. Our empirical results suggest a
statistically significant and negative short-run effect on the stock returns of
firms that issued green bonds.
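
The three-step logic (recover loadings, model the propensity score as a
function of them, reweight to obtain the ATT) can be sketched as follows. PCA
loadings and a logistic propensity model are used as stand-ins for the paper's
estimators, the data are simulated, and the inference step is omitted.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(8)
N, T0, r = 300, 40, 2                       # units, pre-treatment periods, factors

# Simulated pre-treatment outcome panel driven by unit loadings (illustrative)
lam = rng.normal(size=(N, r))
F = rng.normal(size=(T0, r))
Y_pre = lam @ F.T + rng.normal(scale=0.5, size=(N, T0))
D = rng.binomial(1, 1 / (1 + np.exp(-lam[:, 0])))          # selection on loadings
Y_post = lam[:, 0] + 1.0 * D + rng.normal(scale=0.5, size=N)

# Step 1: recover loadings (up to rotation) from the pre-treatment panel
lam_hat = PCA(n_components=r).fit_transform(Y_pre)

# Step 2: propensity score as a function of the estimated loadings
ps = LogisticRegression().fit(lam_hat, D).predict_proba(lam_hat)[:, 1]

# Step 3: inverse-propensity-weighted ATT (normalized weights for controls)
w0 = ps / (1 - ps)
att = Y_post[D == 1].mean() - np.average(Y_post[D == 0], weights=w0[D == 0])
print(f"ATT estimate: {att:.3f} (true effect 1.0)")
```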

arXiv link: http://arxiv.org/abs/2507.08764v1

Econometrics arXiv paper, submitted: 2025-07-11

Correlated Synthetic Controls

Authors: Tzvetan Moev

Synthetic Control methods have recently gained considerable attention in
applications with only one treated unit. Their popularity is partly based on
the key insight that we can predict good synthetic counterfactuals for our
treated unit. However, this insight of predicting counterfactuals is
generalisable to microeconometric settings where we often observe many treated
units. We propose the Correlated Synthetic Controls (CSC) estimator for such
situations: intuitively, it creates synthetic controls that are correlated
across individuals with similar observables. When treatment assignment is
correlated with unobservables, we show that the CSC estimator has more
desirable theoretical properties than the difference-in-differences estimator.
We also utilise CSC in practice to obtain heterogeneous treatment effects in
the well-known Mariel Boatlift study, leveraging additional information from
the PSID.

arXiv link: http://arxiv.org/abs/2507.08918v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2025-07-10

Efficient and Scalable Estimation of Distributional Treatment Effects with Multi-Task Neural Networks

Authors: Tomu Hirata, Undral Byambadalai, Tatsushi Oka, Shota Yasui, Shingo Uto

We propose a novel multi-task neural network approach for estimating
distributional treatment effects (DTE) in randomized experiments. While DTE
provides more granular insights into the experiment outcomes over conventional
methods focusing on the Average Treatment Effect (ATE), estimating it with
regression adjustment methods presents significant challenges. Specifically,
precision in the distribution tails suffers due to data imbalance, and
computational inefficiencies arise from the need to solve numerous regression
problems, particularly in large-scale datasets commonly encountered in
industry. To address these limitations, our method leverages multi-task neural
networks to estimate conditional outcome distributions while incorporating
monotonic shape constraints and multi-threshold label learning to enhance
accuracy. To demonstrate the practical effectiveness of our proposed method, we
apply our method to both simulated and real-world datasets, including a
randomized field experiment aimed at reducing water consumption in the US and a
large-scale A/B test from a leading streaming platform in Japan. The
experimental results consistently demonstrate superior performance across
various datasets, establishing our method as a robust and practical solution
for modern causal inference applications requiring a detailed understanding of
treatment effect heterogeneity.

arXiv link: http://arxiv.org/abs/2507.07738v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2025-07-10

Galerkin-ARIMA: A Two-Stage Polynomial Regression Framework for Fast Rolling One-Step-Ahead Forecasting

Authors: Haojie Liu, Zihan Lin

We introduce Galerkin-ARIMA, a novel time-series forecasting framework that
integrates Galerkin projection techniques with the classical ARIMA model to
capture potentially nonlinear dependencies in lagged observations. By replacing
the fixed linear autoregressive component with a spline-based basis expansion,
Galerkin-ARIMA flexibly approximates the underlying relationship among past
values via ordinary least squares, while retaining the moving-average structure
and Gaussian innovation assumptions of ARIMA. We derive closed-form solutions
for both the AR and MA components using two-stage Galerkin projections,
establish conditions for asymptotic unbiasedness and consistency, and analyze
the bias-variance trade-off under basis-size growth. Complexity analysis
reveals that, for moderate basis dimensions, our approach can substantially
reduce computational cost compared to maximum-likelihood ARIMA estimation.
Through extensive simulations on four synthetic processes (noisy ARMA,
seasonal, trend-AR, and nonlinear recursion series), we demonstrate that
Galerkin-ARIMA matches or closely approximates ARIMA's forecasting accuracy
while achieving orders-of-magnitude speedups in rolling forecasting tasks.
These results suggest that Galerkin-ARIMA offers a powerful, efficient
alternative for modeling complex time series dynamics in high-volume or
real-time applications.
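
The first stage, projecting the next observation onto spline basis functions
of the lags by ordinary least squares, can be sketched directly. The
single-lag setup and the scikit-learn spline basis below are illustrative
assumptions; the MA stage and the asymptotic analysis are omitted.

```python
import numpy as np
from sklearn.preprocessing import SplineTransformer

rng = np.random.default_rng(9)
T = 600
y = np.zeros(T)
for t in range(1, T):                          # nonlinear AR(1) data (illustrative)
    y[t] = 0.8 * np.tanh(2 * y[t - 1]) + 0.3 * rng.normal()

lag, target = y[:-1].reshape(-1, 1), y[1:]

# Spline basis expansion of the lag (stand-in for the Galerkin basis choice)
B = SplineTransformer(n_knots=8, degree=3, include_bias=True).fit_transform(lag)

# Galerkin step: ordinary least squares projection of y_t onto the basis of y_{t-1}
coef, *_ = np.linalg.lstsq(B, target, rcond=None)
resid = target - B @ coef                       # residuals would feed the MA stage
print("in-sample RMSE:", float(np.sqrt(np.mean(resid ** 2))))
```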

arXiv link: http://arxiv.org/abs/2507.07469v2

Econometrics arXiv paper, submitted: 2025-07-10

Tracking the economy at high frequency

Authors: Freddy García-Albán, Juan Jarrín

This paper develops a high-frequency economic indicator using a Bayesian
Dynamic Factor Model estimated with mixed-frequency data. The model
incorporates weekly, monthly, and quarterly official indicators, and allows for
dynamic heterogeneity and stochastic volatility. To ensure temporal consistency
and avoid irregular aggregation artifacts, we introduce a pseudo-week structure
that harmonizes the timing of observations. Our framework integrates dispersed
and asynchronous official statistics into a unified High-Frequency Economic
Index (HFEI), enabling real-time economic monitoring even in environments
characterized by severe data limitations. We apply this framework to construct
a high-frequency indicator for Ecuador, a country where official data are
sparse and highly asynchronous, and compute pseudo-weekly recession
probabilities using a time-varying mean regime-switching model fitted to the
resulting index.

arXiv link: http://arxiv.org/abs/2507.07450v1

Econometrics arXiv paper, submitted: 2025-07-09

Identifying Present-Biased Discount Functions in Dynamic Discrete Choice Models

Authors: Jaap H. Abbring, Øystein Daljord, Fedor Iskhakov

We study the identification of dynamic discrete choice models with
sophisticated, quasi-hyperbolic time preferences under exclusion restrictions.
We consider both standard finite horizon problems and empirically useful
infinite horizon ones, which we prove to always have solutions. We reduce
identification to finding the present-bias and standard discount factors that
solve a system of polynomial equations with coefficients determined by the data
and use this to bound the cardinality of the identified set. The discount
factors are usually identified but hard to estimate precisely, because
exclusion restrictions poorly capture the defining feature of present bias:
preference reversals.

arXiv link: http://arxiv.org/abs/2507.07286v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-07-09

On a Debiased and Semiparametric Efficient Changes-in-Changes Estimator

Authors: Jinghao Sun, Eric J. Tchetgen Tchetgen

We present a novel extension of the influential changes-in-changes (CiC)
framework of Athey and Imbens (2006) for estimating the average treatment
effect on the treated (ATT) and distributional causal effects in panel data
with unmeasured confounding. While CiC relaxes the parallel trends assumption
in difference-in-differences (DiD), existing methods typically assume a scalar
unobserved confounder and monotonic outcome relationships, and lack inference
tools that accommodate continuous covariates flexibly. Motivated by empirical
settings with complex confounding and rich covariate information, we make two
main contributions. First, we establish nonparametric identification under
relaxed assumptions that allow high-dimensional, non-monotonic unmeasured
confounding. Second, we derive semiparametrically efficient estimators that are
Neyman orthogonal to infinite-dimensional nuisance parameters, enabling valid
inference even with machine learning-based estimation of nuisance components.
We illustrate the utility of our approach in an empirical analysis of mass
shootings and U.S. electoral outcomes, where key confounders, such as political
mobilization or local gun culture, are typically unobserved and challenging to
quantify.

arXiv link: http://arxiv.org/abs/2507.07228v2

Econometrics arXiv paper, submitted: 2025-07-08

Equity Markets Volatility, Regime Dependence and Economic Uncertainty: The Case of Pacific Basin

Authors: Bahram Adrangi, Arjun Chatrath, Saman Hatamerad, Kambiz Raffiee

This study investigates the relationship between the market volatility of the
iShares Asia 50 ETF (AIA) and economic and market sentiment indicators from the
United States, China, and globally during periods of economic uncertainty.
Specifically, it examines the association between AIA volatility and key
indicators such as the US Economic Uncertainty Index (ECU), the US Economic
Policy Uncertainty Index (EPU), China's Economic Policy Uncertainty Index
(EPUCH), the Global Economic Policy Uncertainty Index (GEPU), and the Chicago
Board Options Exchange's Volatility Index (VIX), spanning the years 2007 to
2023. Employing methodologies such as the two-covariate GARCH-MIDAS model,
regime-switching Markov Chain (MSR), and quantile regressions (QR), the study
explores the regime-dependent dynamics between AIA volatility and
economic/market sentiment, taking into account investors' sensitivity to market
uncertainties across different regimes. The findings reveal that the
relationship between realized volatility and sentiment varies significantly
between high- and low-volatility regimes, reflecting differences in investors'
responses to market uncertainties under these conditions. Additionally, a weak
association is observed between short-term volatility and economic/market
sentiment indicators, suggesting that these indicators may have limited
predictive power, especially during high-volatility regimes. The QR results
further demonstrate the robustness of MSR estimates across most quantiles.
Overall, the study provides valuable insights into the complex interplay
between market volatility and economic/market sentiment, offering practical
implications for investors and policymakers.

arXiv link: http://arxiv.org/abs/2507.05552v1

Econometrics arXiv paper, submitted: 2025-07-07

Identification of Causal Effects with a Bunching Design

Authors: Carolina Caetano, Gregorio Caetano, Leonard Goff, Eric Nielsen

We show that causal effects can be identified when there is bunching in the
distribution of a continuous treatment variable, without imposing any
parametric assumptions. This yields a new nonparametric method for overcoming
selection bias in the absence of instrumental variables, panel data, or other
popular research designs for causal inference. The method leverages the change
of variables theorem from integration theory, relating the selection bias to
the ratio of the density of the treatment and the density of the part of the
outcome that varies with confounders. At the bunching point, the treatment
level is constant, so the variation in the outcomes is due entirely to
unobservables, allowing us to identify the denominator. Our main result
identifies the average causal response to the treatment among individuals who
marginally select into the bunching point. We further show that under
additional smoothness assumptions on the selection bias, treatment effects away
from the bunching point may also be identified. We propose estimators based on
standard software packages and apply the method to estimate the effect of
maternal smoking during pregnancy on birth weight.

arXiv link: http://arxiv.org/abs/2507.05210v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-07-07

Blind Targeting: Personalization under Third-Party Privacy Constraints

Authors: Anya Shchetkina

Major advertising platforms recently increased privacy protections by
limiting advertisers' access to individual-level data. Instead of providing
access to granular raw data, the platforms only allow a limited number of
aggregate queries to a dataset, which is further protected by adding
differentially private noise. This paper studies whether and how advertisers
can design effective targeting policies within these restrictive privacy
preserving data environments. To achieve this, I develop a probabilistic
machine learning method based on Bayesian optimization, which facilitates
dynamic data exploration. Since Bayesian optimization was designed to sample
points from a function to find its maximum, it is not directly applicable to
aggregate queries or to targeting. Therefore, I introduce two innovations: (i)
integral updating of posteriors, which allows selecting the best regions of the
data to query rather than individual points, and (ii) a targeting-aware
acquisition
function that dynamically selects the most informative regions for the
targeting task. I identify the conditions of the dataset and privacy
environment that necessitate the use of such a "smart" querying strategy. I
apply the strategic querying method to the Criteo AI Labs dataset for uplift
modeling (Diemert et al., 2018) that contains visit and conversion data from
14M users. I show that an intuitive benchmark strategy only achieves 33% of the
non-privacy-preserving targeting potential in some cases, while my strategic
querying method achieves 97-101% of that potential, and is statistically
indistinguishable from Causal Forest (Athey et al., 2019): a state-of-the-art
non-privacy-preserving machine learning targeting method.

arXiv link: http://arxiv.org/abs/2507.05175v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-07-07

Forward Variable Selection in Ultra-High Dimensional Linear Regression Using Gram-Schmidt Orthogonalization

Authors: Jialuo Chen, Zhaoxing Gao, Ruey S. Tsay

We investigate forward variable selection for ultra-high dimensional linear
regression using a Gram-Schmidt orthogonalization procedure. Unlike the
commonly used Forward Regression (FR) method, which computes regression
residuals using an increasing number of selected features, or the Orthogonal
Greedy Algorithm (OGA), which selects variables based on their marginal
correlations with the residuals, our proposed Gram-Schmidt Forward Regression
(GSFR) simplifies the selection process by evaluating marginal correlations
between the residuals and the orthogonalized new variables. Moreover, we
introduce a new model size selection criterion that determines the number of
selected variables by detecting the most significant change in their unique
contributions, effectively filtering out redundant predictors along the
selection path. While GSFR is theoretically equivalent to FR except for the
stopping rule, our refinement and the newly proposed stopping rule
significantly improve computational efficiency. In ultra-high dimensional
settings, where the dimensionality far exceeds the sample size and predictors
exhibit strong correlations, we establish that GSFR achieves a convergence rate
comparable to OGA and ensures variable selection consistency under mild
conditions. We demonstrate the proposed method using simulations and real
data examples. Extensive numerical studies show that GSFR outperforms commonly
used methods in ultra-high dimensional variable selection.
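
The selection rule described above, orthogonalizing each remaining candidate
against the already-selected set and picking the one most correlated with the
current residual, is straightforward to implement. The minimal sketch below
uses a fixed number of steps in place of the paper's stopping rule and
simulated data.

```python
import numpy as np

def gsfr(X, y, n_steps):
    """Gram-Schmidt forward regression: greedy selection on orthogonalized columns."""
    n, p = X.shape
    Q = np.empty((n, 0))                     # orthonormal basis of selected columns
    resid, selected = y - y.mean(), []
    for _ in range(n_steps):
        X_orth = X - Q @ (Q.T @ X)           # orthogonalize candidates against basis
        norms = np.linalg.norm(X_orth, axis=0)
        norms[selected] = np.inf             # never reselect
        corr = np.abs(X_orth.T @ resid) / np.where(norms > 1e-10, norms, np.inf)
        j = int(np.argmax(corr))
        q = X_orth[:, j] / norms[j]
        Q = np.column_stack([Q, q])
        resid = resid - q * (q @ resid)      # update residual
        selected.append(j)
    return selected

rng = np.random.default_rng(11)
n, p = 200, 5000                              # ultra-high dimensional setting
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[[3, 77, 1500]] = [2.0, -1.5, 1.0]
y = X @ beta + rng.normal(size=n)
print("selected:", gsfr(X - X.mean(0), y, n_steps=3))
```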

arXiv link: http://arxiv.org/abs/2507.04668v1

Econometrics arXiv paper, submitted: 2025-07-07

A General Class of Model-Free Dense Precision Matrix Estimators

Authors: Mehmet Caner, Agostino Capponi, Mihailo Stojnic

We introduce prototype consistent model-free, dense precision matrix
estimators that have broad application in economics. Using quadratic form
concentration inequalities and novel algebraic characterizations of confounding
dimension reductions, we are able to: (i) obtain non-asymptotic bounds for
precision matrix estimation errors and (ii) establish consistency in high
dimensions; (iii) uncover the existence of an intrinsic signal-to-noise --
underlying dimensions tradeoff; and (iv) avoid exact population sparsity
assumptions. In addition to its desirable theoretical properties, a thorough
empirical study of the S&P 500 index shows that a tuning parameter-free special
case of our general estimator exhibits a doubly ascending Sharpe Ratio pattern,
thereby establishing a link with the double descent phenomenon prominent in
the recent statistics and machine learning literature.

arXiv link: http://arxiv.org/abs/2507.04663v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-07-06

A Test for Jumps in Metric-Space Conditional Means

Authors: David Van Dijcke

Standard methods for detecting discontinuities in conditional means are not
applicable to outcomes that are complex, non-Euclidean objects like
distributions, networks, or covariance matrices. This article develops a
nonparametric test for jumps in conditional means when outcomes lie in a
non-Euclidean metric space. Using local Fr\'echet regression, the method
estimates a mean path on either side of a candidate cutoff. This extends
existing $k$-sample tests to a non-parametric regression setting with
metric-space valued outcomes. I establish the asymptotic distribution of the
test and its consistency against contiguous alternatives. For this, I derive a
central limit theorem for the local estimator of the conditional Fr\'echet
variance and a consistent estimator of its asymptotic variance. Simulations
confirm nominal size control and robust power in finite samples. Two empirical
illustrations demonstrate the method's ability to reveal discontinuities missed
by scalar-based tests. I find sharp changes in (i) work-from-home compositions
at an income threshold for non-compete enforceability and (ii) national
input-output networks following the loss of preferential U.S. trade access.
These findings show the value of analyzing regression outcomes in their native
metric spaces.

arXiv link: http://arxiv.org/abs/2507.04560v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-07-05

A New and Efficient Debiased Estimation of General Treatment Models by Balanced Neural Networks Weighting

Authors: Zeqi Wu, Meilin Wang, Wei Huang, Zheng Zhang

Estimation and inference of treatment effects under unconfounded treatment
assignments often suffer from bias and the `curse of dimensionality' due to the
nonparametric estimation of nuisance parameters for high-dimensional
confounders. Although debiased state-of-the-art methods have been proposed for
binary treatments under particular treatment models, they can be unstable for
small sample sizes. Moreover, directly extending them to general treatment
models can lead to computational complexity. We propose a balanced neural
networks weighting method for general treatment models, which leverages deep
neural networks to alleviate the curse of dimensionality while retaining
optimal covariate balance through calibration, thereby achieving debiased and
robust estimation. Our method accommodates a wide range of treatment models,
including average, quantile, distributional, and asymmetric least squares
treatment effects, for discrete, continuous, and mixed treatments. Under
regularity conditions, we show that our estimator achieves rate double
robustness and $\sqrt{N}$-asymptotic normality, and its asymptotic variance
achieves the semiparametric efficiency bound. We further develop a statistical
inference procedure based on weighted bootstrap, which avoids estimating the
efficient influence/score functions. Simulation results reveal that the
proposed method consistently outperforms existing alternatives, especially when
the sample size is small. Applications to the 401(k) dataset and the Mother's
Significant Features dataset further illustrate the practical value of the
method for estimating both average and quantile treatment effects under binary
and continuous treatments, respectively.

arXiv link: http://arxiv.org/abs/2507.04044v1

Econometrics arXiv cross-link from General Economics (econ.GN), submitted: 2025-07-05

Increasing Systemic Resilience to Socioeconomic Challenges: Modeling the Dynamics of Liquidity Flows and Systemic Risks Using Navier-Stokes Equations

Authors: Davit Gondauri

Modern economic systems face unprecedented socioeconomic challenges, making
systemic resilience and effective liquidity flow management essential.
Traditional models such as CAPM, VaR, and GARCH often fail to reflect real
market fluctuations and extreme events. This study develops and validates an
innovative mathematical model based on the Navier-Stokes equations, aimed at
the quantitative assessment, forecasting, and simulation of liquidity flows and
systemic risks. The model incorporates 13 macroeconomic and financial
parameters, including liquidity velocity, market pressure, internal stress,
stochastic fluctuations, and risk premiums, all based on real data and formally
included in the modified equation. The methodology employs econometric testing,
Fourier analysis, stochastic simulation, and AI-based calibration to enable
dynamic testing and forecasting. Simulation-based sensitivity analysis
evaluates the impact of parameter changes on financial balance. The model is
empirically tested using Georgian macroeconomic and financial data from
2010-2024, including GDP, inflation, the Gini index, CDS spreads, and LCR
metrics. Results show that the model effectively describes liquidity dynamics,
systemic risk, and extreme scenarios, while also offering a robust framework
for multifactorial analysis, crisis prediction, and countercyclical policy
planning.

arXiv link: http://arxiv.org/abs/2507.05287v1

Econometrics arXiv updated paper (originally submitted: 2025-07-04)

Nonparametric regression for cost-effectiveness analyses with observational data -- a tutorial

Authors: Jonas Esser, Mateus Maia, Judith Bosmans, Johanna van Dongen

Healthcare decision-making often requires selecting among treatment options
under budget constraints, particularly when one option is more effective but
also more costly. Cost-effectiveness analysis (CEA) provides a framework for
evaluating whether the health benefits of a treatment justify its additional
costs. A key component of CEA is the estimation of treatment effects on both
health outcomes and costs, which becomes challenging when using observational
data, due to potential confounding. While advanced causal inference methods
exist for use in such circumstances, their adoption in CEAs remains limited,
with many studies relying on overly simplistic methods such as linear
regression or propensity score matching. We believe that this is mainly due to
health economists being generally unfamiliar with superior methodology. In this
paper, we address this gap by introducing cost-effectiveness researchers to
modern nonparametric regression models, with a particular focus on Bayesian
Additive Regression Trees (BART). We provide practical guidance on how to
implement BART in CEAs, including code examples, and discuss its advantages in
producing more robust and credible estimates from observational data.
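
The tutorial's workflow, fitting flexible outcome regressions for both costs
and effects and standardizing over the confounder distribution, can be
mimicked with any flexible regressor. The sketch below uses gradient boosting
purely as a stand-in for BART on simulated data and reports an incremental
cost-effectiveness ratio; it is not the paper's code.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(10)
n = 2000
X = rng.normal(size=(n, 3))                              # confounders
A = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))          # treatment
effect = 0.30 * A + X[:, 0] + rng.normal(scale=0.5, size=n)   # health outcome (QALYs)
cost = 1000 * A + 500 * X[:, 1] + rng.normal(scale=200, size=n)

def g_computation(y):
    """Fit a flexible outcome model and standardize over the confounders."""
    m = GradientBoostingRegressor().fit(np.column_stack([A, X]), y)
    y1 = m.predict(np.column_stack([np.ones(n), X]))
    y0 = m.predict(np.column_stack([np.zeros(n), X]))
    return float(np.mean(y1 - y0))

d_effect, d_cost = g_computation(effect), g_computation(cost)
print(f"incremental effect: {d_effect:.3f} QALYs, incremental cost: {d_cost:.0f}")
print(f"ICER: {d_cost / d_effect:.0f} per QALY")
```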

arXiv link: http://arxiv.org/abs/2507.03511v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2025-07-03

Multi-Agent Reinforcement Learning for Dynamic Pricing in Supply Chains: Benchmarking Strategic Agent Behaviours under Realistically Simulated Market Conditions

Authors: Thomas Hazenberg, Yao Ma, Seyed Sahand Mohammadi Ziabari, Marijn van Rijswijk

This study investigates how Multi-Agent Reinforcement Learning (MARL) can
improve dynamic pricing strategies in supply chains, particularly in contexts
where traditional ERP systems rely on static, rule-based approaches that
overlook strategic interactions among market actors. While recent research has
applied reinforcement learning to pricing, most implementations remain
single-agent and fail to model the interdependent nature of real-world supply
chains. This study addresses that gap by evaluating the performance of three
MARL algorithms: MADDPG, MADQN, and QMIX against static rule-based baselines,
within a simulated environment informed by real e-commerce transaction data and
a LightGBM demand prediction model. Results show that rule-based agents achieve
near-perfect fairness (Jain's Index: 0.9896) and the highest price stability
(volatility: 0.024), but they fully lack competitive dynamics. Among MARL
agents, MADQN exhibits the most aggressive pricing behaviour, with the highest
volatility and the lowest fairness (0.5844). MADDPG provides a more balanced
approach, supporting market competition (share volatility: 9.5 pp) while
maintaining relatively high fairness (0.8819) and stable pricing. These
findings suggest that MARL introduces emergent strategic behaviour not captured
by static pricing rules and may inform future developments in dynamic pricing.

arXiv link: http://arxiv.org/abs/2507.02698v1

Econometrics arXiv paper, submitted: 2025-07-03

Large-Scale Estimation under Unknown Heteroskedasticity

Authors: Sheng Chao Ho

This paper studies nonparametric empirical Bayes methods in a heterogeneous
parameters framework that features unknown means and variances. We provide
extended Tweedie's formulae that express the (infeasible) optimal estimators of
heterogeneous parameters, such as unit-specific means or quantiles, in terms of
the density of certain sufficient statistics. These are used to propose
feasible versions with nearly parametric regret bounds of the order of $(\log
n)^\kappa / n$. The estimators are employed in a study of teachers'
value-added, where we find that allowing for heterogeneous variances across
teachers is crucial for delivering optimal estimates of teacher quality and
for detecting low-performing teachers.
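
For the classical known-variance case, Tweedie's formula expresses the
posterior mean of a unit-level parameter through the marginal density of its
noisy estimate; the paper's extended formulae generalize this to unknown,
heterogeneous variances. The sketch below illustrates only the baseline
formula with a kernel estimate of the marginal density on simulated data.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(12)
n, sigma = 5000, 1.0
theta = rng.normal(loc=0.0, scale=0.7, size=n)      # heterogeneous unit means
x = theta + sigma * rng.normal(size=n)              # noisy unit-level estimates

# Tweedie: E[theta | x] = x + sigma^2 * d/dx log f(x), with f the marginal density
kde = gaussian_kde(x)
eps = 1e-3
score = (np.log(kde(x + eps)) - np.log(kde(x - eps))) / (2 * eps)
theta_hat = x + sigma**2 * score

mse_raw = np.mean((x - theta) ** 2)
mse_eb = np.mean((theta_hat - theta) ** 2)
print(f"MSE raw: {mse_raw:.3f}  MSE empirical Bayes: {mse_eb:.3f}")
```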

arXiv link: http://arxiv.org/abs/2507.02293v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2025-07-03

It's Hard to Be Normal: The Impact of Noise on Structure-agnostic Estimation

Authors: Jikai Jin, Lester Mackey, Vasilis Syrgkanis

Structure-agnostic causal inference studies how well one can estimate a
treatment effect given black-box machine learning estimates of nuisance
functions (like the impact of confounders on treatment and outcomes). Here, we
find that the answer depends in a surprising way on the distribution of the
treatment noise. Focusing on the partially linear model of Robinson (1988), we
first show that the widely adopted double machine learning (DML) estimator is
minimax rate-optimal for Gaussian treatment noise, resolving an open problem of
Mackey et al. (2018). Meanwhile, for
independent non-Gaussian treatment noise, we show that DML is always suboptimal
by constructing new practical procedures with higher-order robustness to
nuisance errors. These ACE procedures use structure-agnostic cumulant
estimators to achieve $r$-th order insensitivity to nuisance errors whenever
the $(r+1)$-st treatment cumulant is non-zero. We complement these core results
with novel minimax guarantees for binary treatments in the partially linear
model. Finally, using synthetic demand estimation experiments, we demonstrate
the practical benefits of our higher-order robust estimators.
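
As background, the widely adopted DML estimator referenced above is, in the
partially linear model, a cross-fitted residual-on-residual regression. The
sketch below shows that standard estimator on simulated data; the paper's ACE
procedures, which exploit higher-order treatment cumulants, are not shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(13)
n = 4000
X = rng.normal(size=(n, 5))
T = np.sin(X[:, 0]) + rng.normal(size=n)            # treatment with nuisance part
Y = 1.5 * T + np.cos(X[:, 1]) + rng.normal(size=n)  # true effect theta = 1.5

res_T, res_Y = np.zeros(n), np.zeros(n)
for train, test in KFold(n_splits=2, shuffle=True, random_state=0).split(X):
    mT = RandomForestRegressor(n_estimators=100).fit(X[train], T[train])
    mY = RandomForestRegressor(n_estimators=100).fit(X[train], Y[train])
    res_T[test] = T[test] - mT.predict(X[test])      # cross-fitted residuals
    res_Y[test] = Y[test] - mY.predict(X[test])

theta_hat = np.sum(res_T * res_Y) / np.sum(res_T ** 2)
print(f"DML estimate of theta: {theta_hat:.3f}")
```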

arXiv link: http://arxiv.org/abs/2507.02275v2

Econometrics arXiv paper, submitted: 2025-07-02

Meta-emulation: An application to the social cost of carbon

Authors: Richard S. J. Tol

A large database of published model results is used to estimate the
distribution of the social cost of carbon as a function of the underlying
assumptions. The literature on the social cost of carbon deviates in its
assumptions from the literatures on the impacts of climate change, discounting,
and risk aversion. The proposed meta-emulator corrects this. The social cost of
carbon is higher than reported in the literature.

arXiv link: http://arxiv.org/abs/2507.01804v1

Econometrics arXiv paper, submitted: 2025-07-02

Covariance Matrix Estimation for Positively Correlated Assets

Authors: Weilong Liu, Yanchu Liu

The comovement phenomenon in financial markets creates decision scenarios
with positively correlated asset returns. This paper addresses covariance
matrix estimation under such conditions, motivated by observations of
significant positive correlations in factor-sorted portfolio monthly returns.
We demonstrate that fine-tuning eigenvectors linked to weak factors within
rotation-equivariant frameworks produces well-conditioned covariance matrix
estimates. Our Eigenvector Rotation Shrinkage Estimator (ERSE) pairwise rotates
eigenvectors while preserving orthogonality, equivalent to performing multiple
linear shrinkage on two distinct eigenvalues. Empirical results on
factor-sorted portfolios from the Ken French data library demonstrate that ERSE
outperforms existing rotation-equivariant estimators in reducing out-of-sample
portfolio variance, achieving average risk reductions of 10.52% versus linear
shrinkage methods and 12.46% versus nonlinear shrinkage methods. Further
checks indicate that ERSE yields covariance matrices with lower condition
numbers, produces more concentrated and stable portfolio weights, and provides
consistent improvements across different subperiods and estimation windows.

arXiv link: http://arxiv.org/abs/2507.01545v1

Econometrics arXiv paper, submitted: 2025-07-02

Heterogeneity Analysis with Heterogeneous Treatments

Authors: Phillip Heiler, Michael C. Knaus

Analysis of effect heterogeneity at the group level is standard practice in
empirical treatment evaluation research. However, treatments analyzed are often
aggregates of multiple underlying treatments which are themselves
heterogeneous, e.g. different modules of a training program or varying
exposures. In these settings, conventional approaches such as comparing
(adjusted) differences-in-means across groups can produce misleading
conclusions when underlying treatment propensities differ systematically
between groups. This paper develops a novel decomposition framework that
disentangles contributions of effect heterogeneity and qualitatively distinct
components of treatment heterogeneity to observed group-level differences. We
propose semiparametric debiased machine learning estimators that are robust to
complex treatments and limited overlap. We revisit a widely documented gender
gap in training returns of an active labor market policy. The decomposition
reveals that it is almost entirely driven by women being treated differently
than men and not by heterogeneous returns from identical treatments. In
particular, women are disproportionately targeted towards vocational training
tracks with lower unconditional returns.

arXiv link: http://arxiv.org/abs/2507.01517v1

Econometrics arXiv paper, submitted: 2025-07-01

Shrinkage-Based Regressions with Many Related Treatments

Authors: Enes Dilber, Colin Gray

When using observational causal models, practitioners often want to
disentangle the effects of many related, partially-overlapping treatments.
Examples include estimating treatment effects of different marketing
touchpoints, ordering different types of products, or signing up for different
services. Common approaches that estimate separate treatment coefficients are
too noisy for practical decision-making. We propose a computationally light
model that uses a customized ridge regression to move between a heterogeneous
and a homogeneous model: it substantially reduces MSE for the effects of each
individual sub-treatment while allowing us to easily reconstruct the effects of
an aggregated treatment. We demonstrate the properties of this estimator in
theory and simulation, and illustrate how it has unlocked targeted
decision-making at Wayfair.

arXiv link: http://arxiv.org/abs/2507.01202v1
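
A minimal numpy sketch of the interpolation idea described above, under simple
assumptions (k binary sub-treatments, no other covariates): a ridge-type
penalty on deviations of each sub-treatment coefficient from their common mean
moves the fit between a fully heterogeneous model (lam = 0) and a fully
homogeneous one (large lam). This illustrates the mechanism, not the authors'
exact estimator or tuning rule.

    import numpy as np

    def shrunken_treatment_effects(X, y, lam):
        """Ridge-type fit penalizing deviations of coefficients from their mean."""
        k = X.shape[1]
        # P projects coefficients onto deviations from their common mean,
        # so the penalty is lam * sum_j (beta_j - beta_bar)^2.
        P = np.eye(k) - np.ones((k, k)) / k
        return np.linalg.solve(X.T @ X + lam * P, X.T @ y)

    rng = np.random.default_rng(1)
    n, k = 2000, 8
    X = rng.binomial(1, 0.3, size=(n, k)).astype(float)   # overlapping sub-treatments
    beta = 1.0 + 0.2 * rng.standard_normal(k)              # related effects around 1.0
    y = X @ beta + rng.standard_normal(n)

    print(shrunken_treatment_effects(X, y, lam=0.0))       # heterogeneous (plain OLS)
    print(shrunken_treatment_effects(X, y, lam=1e6))       # ~homogeneous (common effect)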

Econometrics arXiv paper, submitted: 2025-07-01

Uniform Validity of the Subset Anderson-Rubin Test under Heteroskedasticity and Nonlinearity

Authors: Atsushi Inoue, Òscar Jordà, Guido M. Kuersteiner

We consider the Anderson-Rubin (AR) statistic for a general set of nonlinear
moment restrictions. The statistic is based on the criterion function of the
continuous updating estimator (CUE) for a subset of parameters not constrained
under the Null. We treat the data distribution nonparametrically with
parametric moment restrictions imposed under the Null. We show that subset
tests and confidence intervals based on the AR statistic are uniformly valid
over a wide range of distributions that include moment restrictions with
general forms of heteroskedasticity. We show that the AR based tests have
correct asymptotic size when parameters are unidentified, partially identified,
weakly or strongly identified. We obtain these results by constructing an upper
bound using a novel perturbation and regularization approach applied to
the first order conditions of the CUE. Our theory applies to both
cross-sections and time series data and does not assume stationarity in time
series settings or homogeneity in cross-sectional settings.

arXiv link: http://arxiv.org/abs/2507.01167v1

Econometrics arXiv paper, submitted: 2025-07-01

rdhte: Conditional Average Treatment Effects in RD Designs

Authors: Sebastian Calonico, Matias D. Cattaneo, Max H. Farrell, Filippo Palomba, Rocio Titiunik

Understanding causal heterogeneous treatment effects based on pretreatment
covariates is a crucial aspect of empirical work. Building on Calonico,
Cattaneo, Farrell, Palomba, and Titiunik (2025), this article discusses the
software package rdhte for estimation and inference of heterogeneous treatment
effects in sharp regression discontinuity (RD) designs. The package includes
three main commands: rdhte conducts estimation and robust bias-corrected
inference for heterogeneous RD treatment effects, for a given choice of the
bandwidth parameter; rdbwhte implements automatic bandwidth selection methods;
and rdhte lincom computes point estimates and robust bias-corrected confidence
intervals for linear combinations, a post-estimation command specifically
tailored to rdhte. We also provide an overview of heterogeneous effects for
sharp RD designs, give basic details on the methodology, and illustrate using
an empirical application. Finally, we discuss how the package rdhte
complements, and in specific cases recovers, the canonical RD package rdrobust
(Calonico, Cattaneo, Farrell, and Titiunik 2017).

arXiv link: http://arxiv.org/abs/2507.01128v1

Econometrics arXiv paper, submitted: 2025-07-01

Randomization Inference with Sample Attrition

Authors: Xinran Li, Peizan Sheng, Zeyang Yu

Although appealing, randomization inference for treatment effects can suffer
from severe size distortion due to sample attrition. We propose new,
computationally efficient methods for randomization inference that remain valid
under a range of potentially informative missingness mechanisms. We begin by
constructing valid p-values for testing sharp null hypotheses, using the
worst-case p-value from the Fisher randomization test over all possible
imputations of missing outcomes. Leveraging distribution-free test statistics,
this worst-case p-value admits a closed-form solution, connecting naturally to
bounds in the partial identification literature. Our test statistics
incorporate both potential outcomes and missingness indicators, allowing us to
exploit structural assumptions, such as monotone missingness, for increased
power. We further extend our framework to test non-sharp null hypotheses
concerning quantiles of individual treatment effects. The methods are
illustrated through simulations and an empirical application.

arXiv link: http://arxiv.org/abs/2507.00795v1
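
For context, a minimal sketch of the standard Fisher randomization test of a
sharp null with fully observed outcomes, which is the building block the paper
extends; the proposed worst-case p-value over imputations of missing outcomes
and its closed-form solution are not reproduced here.

    import numpy as np

    def fisher_randomization_pvalue(y, d, n_draws=5000, seed=0):
        """Fisher randomization test of the sharp null of no effect for any unit,
        using the difference in means as the test statistic."""
        rng = np.random.default_rng(seed)
        observed = y[d == 1].mean() - y[d == 0].mean()
        n_treated = int(d.sum())
        count = 0
        for _ in range(n_draws):
            d_star = np.zeros_like(d)
            d_star[rng.choice(len(d), size=n_treated, replace=False)] = 1
            stat = y[d_star == 1].mean() - y[d_star == 0].mean()
            count += abs(stat) >= abs(observed)
        return count / n_draws

    rng = np.random.default_rng(2)
    d = rng.permutation(np.repeat([1, 0], 50))
    y = 0.5 * d + rng.standard_normal(100)   # a true effect, so a small p-value is expected
    print(fisher_randomization_pvalue(y, d))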

Econometrics arXiv paper, submitted: 2025-07-01

Comparing Misspecified Models with Big Data: A Variational Bayesian Perspective

Authors: Yong Li, Sushanta K. Mallick, Tao Zeng, Junxing Zhang

In recent years, Variational Bayes (VB) has emerged as a widely used method
for addressing statistical inference in the context of massive data. This study
focuses on misspecified models and examines the risk functions associated with
predictive distributions derived from variational posterior distributions.
These risk functions, defined as the expectation of the Kullback-Leibler (KL)
divergence between the true data-generating density and the variational
predictive distributions, provide a framework for assessing predictive
performance. We propose two novel information criteria for predictive model
comparison based on these risk functions. Under certain regularity conditions,
we demonstrate that the proposed information criteria are asymptotically
unbiased estimators of their respective risk functions. Through comprehensive
numerical simulations and empirical applications in economics and finance, we
demonstrate the effectiveness of these information criteria in comparing
misspecified models in the context of massive data.

arXiv link: http://arxiv.org/abs/2507.00763v1

Econometrics arXiv paper, submitted: 2025-07-01

Plausible GMM: A Quasi-Bayesian Approach

Authors: Victor Chernozhukov, Christian B. Hansen, Lingwei Kong, Weining Wang

Structural estimation in economics often makes use of models formulated in
terms of moment conditions. While these moment conditions are generally
well-motivated, it is often unknown whether the moment restrictions hold
exactly. We consider a framework where researchers model their belief about the
potential degree of misspecification via a prior distribution and adopt a
quasi-Bayesian approach for performing inference on structural parameters. We
provide quasi-posterior concentration results, verify that quasi-posteriors can
be used to obtain approximately optimal Bayesian decision rules under the
maintained prior structure over misspecification, and provide a form of
frequentist coverage results. We illustrate the approach through empirical
examples where we obtain informative inference for structural objects allowing
for substantial relaxations of the requirement that moment conditions hold
exactly.

arXiv link: http://arxiv.org/abs/2507.00555v1

Econometrics arXiv updated paper (originally submitted: 2025-06-30)

Robust Inference when Nuisance Parameters may be Partially Identified with Applications to Synthetic Controls

Authors: Joseph Fry

When conducting inference for the average treatment effect on the treated
with a Synthetic Control Estimator, the vector of control weights is a nuisance
parameter which is often constrained, high-dimensional, and may be only
partially identified even when the average treatment effect on the treated is
point-identified. All three of these features of a nuisance parameter can lead
to failure of asymptotic normality for the estimate of the parameter of
interest when using standard methods. I provide a new method yielding
asymptotic normality for an estimate of the parameter of interest, even when
all three of these complications are present. This is accomplished by first
estimating the nuisance parameter using a regularization penalty to achieve a
form of identification, and then estimating the parameter of interest using
moment conditions that have been orthogonalized with respect to the nuisance
parameter. I present high-level sufficient conditions for the estimator and
verify these conditions in an example involving Synthetic Controls.

arXiv link: http://arxiv.org/abs/2507.00307v2

Econometrics arXiv paper, submitted: 2025-06-30

Extrapolation in Regression Discontinuity Design Using Comonotonicity

Authors: Ben Deaner, Soonwoo Kwon

We present a novel approach for extrapolating causal effects away from the
margin between treatment and non-treatment in sharp regression discontinuity
designs with multiple covariates. Our methods apply both to settings in which
treatment is a function of multiple observables and settings in which treatment
is determined based on a single running variable. Our key identifying
assumption is that conditional average treated and untreated potential outcomes
are comonotonic: covariate values associated with higher average untreated
potential outcomes are also associated with higher average treated potential
outcomes. We provide an estimation method based on local linear regression. Our
estimands are weighted average causal effects, even if comonotonicity fails. We
apply our methods to evaluate counterfactual mandatory summer school policies.

arXiv link: http://arxiv.org/abs/2507.00289v1

Econometrics arXiv updated paper (originally submitted: 2025-06-30)

Minimax and Bayes Optimal Best-Arm Identification

Authors: Masahiro Kato

This study investigates minimax and Bayes optimal strategies in fixed-budget
best-arm identification. We consider an adaptive procedure consisting of a
sampling phase followed by a recommendation phase, and we design an adaptive
experiment within this framework to efficiently identify the best arm, defined
as the one with the highest expected outcome. In our proposed strategy, the
sampling phase consists of two stages. The first stage is a pilot phase, in
which we allocate each arm uniformly in equal proportions to eliminate clearly
suboptimal arms and estimate outcome variances. In the second stage, arms are
allocated in proportion to the variances estimated during the first stage.
After the sampling phase, the procedure enters the recommendation phase, where
we select the arm with the highest sample mean as our estimate of the best arm.
We prove that this single strategy is simultaneously asymptotically minimax and
Bayes optimal for the simple regret, with upper bounds that coincide exactly
with our lower bounds, including the constant terms.

arXiv link: http://arxiv.org/abs/2506.24007v3
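
A minimal numpy sketch of the two-stage structure described above under
simplifying assumptions (Gaussian arms, no elimination step): a uniform pilot
estimates outcome variances, the remaining budget is allocated in proportion to
those estimates, and the arm with the highest sample mean is recommended. The
paper's exact allocation and elimination rules differ.

    import numpy as np

    def two_stage_best_arm(means, sds, budget, pilot_frac=0.3, seed=3):
        rng = np.random.default_rng(seed)
        k = len(means)
        # Stage 1: uniform pilot allocation to estimate outcome variances.
        n_pilot = int(pilot_frac * budget) // k
        samples = [list(rng.normal(means[a], sds[a], n_pilot)) for a in range(k)]
        var_hat = np.array([np.var(s, ddof=1) for s in samples])
        # Stage 2: allocate the remaining budget in proportion to estimated variances.
        remaining = budget - k * n_pilot
        alloc = np.round(remaining * var_hat / var_hat.sum()).astype(int)
        for a in range(k):
            samples[a].extend(rng.normal(means[a], sds[a], alloc[a]))
        # Recommendation phase: arm with the highest sample mean.
        return int(np.argmax([np.mean(s) for s in samples]))

    # Expected to recommend arm 2, which has the highest mean outcome.
    print(two_stage_best_arm(means=[0.0, 0.1, 0.3], sds=[1.0, 2.0, 1.5], budget=3000))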

Econometrics arXiv paper, submitted: 2025-06-30

Robust Inference with High-Dimensional Instruments

Authors: Qu Feng, Sombut Jaidee, Wenjie Wang

We propose a weak-identification-robust test for linear instrumental variable
(IV) regressions with high-dimensional instruments, whose number is allowed to
exceed the sample size. In addition, our test is robust to general error
dependence, such as network dependence and spatial dependence. The test
statistic takes a self-normalized form and the asymptotic validity of the test
is established by using random matrix theory. Simulation studies are conducted
to assess the numerical performance of the test, confirming good size control
and satisfactory testing power across various error dependence
structures.

arXiv link: http://arxiv.org/abs/2506.23834v1

Econometrics arXiv paper, submitted: 2025-06-30

Testing parametric additive time-varying GARCH models

Authors: Niklas Ahlgren, Alexander Back, Timo Teräsvirta

We develop misspecification tests for building additive time-varying
(ATV-)GARCH models. In the model, the volatility equation of the GARCH model is
augmented by a deterministic time-varying intercept modeled as a linear
combination of logistic transition functions. The intercept is specified by a
sequence of tests, moving from specific to general. The first test is the test
of the standard stationary GARCH model against an ATV-GARCH model with one
transition. The alternative model is unidentified under the null hypothesis,
which makes the usual LM test invalid. To overcome this problem, we use the
standard method of approximating the transition function by a Taylor expansion
around the null hypothesis. Testing proceeds until the first non-rejection. We
investigate the small-sample properties of the tests in a comprehensive
simulation study. An application to the VIX index indicates that the volatility
of the index is not constant over time but begins a slow increase around the
2007-2008 financial crisis.

arXiv link: http://arxiv.org/abs/2506.23821v1

Econometrics arXiv paper, submitted: 2025-06-30

An Improved Inference for IV Regressions

Authors: Liyu Dou, Pengjin Min, Wenjie Wang, Yichong Zhang

Researchers often report empirical results that are based on low-dimensional
IVs, such as the shift-share IV, together with many IVs. Could we combine these
results in an efficient way and take advantage of the information from both
sides? In this paper, we propose a combination inference procedure to solve the
problem. Specifically, we consider a linear combination of three test
statistics: a standard cluster-robust Wald statistic based on the
low-dimensional IVs, a leave-one-cluster-out Lagrangian Multiplier (LM)
statistic, and a leave-one-cluster-out Anderson-Rubin (AR) statistic. We first
establish the joint asymptotic normality of the Wald, LM, and AR statistics and
derive the corresponding limit experiment under local alternatives. Then, under
the assumption that at least the low-dimensional IVs can strongly identify the
parameter of interest, we derive the optimal combination test based on the
three statistics and establish that our procedure leads to the uniformly most
powerful (UMP) unbiased test among the class of tests considered. In
particular, the efficiency gain from the combined test comes as a “free lunch”
in the sense that it is always at least as powerful as the test based only on
the low-dimensional IVs or only on the many IVs.

arXiv link: http://arxiv.org/abs/2506.23816v1

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2025-06-30

Overparametrized models with posterior drift

Authors: Guillaume Coqueret, Martial Laguerre

This paper investigates the impact of posterior drift on out-of-sample
forecasting accuracy in overparametrized machine learning models. We document
the loss in performance when the loadings of the data generating process change
between the training and testing samples. This matters crucially in settings in
which regime changes are likely to occur, for instance, in financial markets.
Applied to equity premium forecasting, our results underline the sensitivity of
a market timing strategy to sub-periods and to the bandwidth parameters that
control the complexity of the model. For the average investor, we find that
focusing on holding periods of 15 years can generate very heterogeneous
returns, especially for small bandwidths. Large bandwidths yield much more
consistent outcomes, but are far less appealing from a risk-adjusted return
standpoint. All in all, our findings tend to recommend cautiousness when
resorting to large linear models for stock market predictions.

arXiv link: http://arxiv.org/abs/2506.23619v1

Econometrics arXiv paper, submitted: 2025-06-29

P-CRE-DML: A Novel Approach for Causal Inference in Non-Linear Panel Data

Authors: Amarendra Sharma

This paper introduces a novel Proxy-Enhanced Correlated Random Effects Double
Machine Learning (P-CRE-DML) framework to estimate causal effects in panel data
with non-linearities and unobserved heterogeneity. Combining Double Machine
Learning (DML, Chernozhukov et al., 2018), Correlated Random Effects (CRE,
Mundlak, 1978), and lagged variables (Arellano & Bond, 1991) and innovating
within the CRE-DML framework (Chernozhukov et al., 2022; Clarke & Polselli,
2025; Fuhr & Papies, 2024), we apply P-CRE-DML to investigate the effect of
social trust on GDP growth across 89 countries (2010-2020). We find a positive
and statistically significant relationship between social trust and economic
growth, in line with prior findings on the trust-growth relationship (e.g.,
Knack & Keefer, 1997). Furthermore, a Monte Carlo simulation demonstrates
P-CRE-DML's advantage in terms of lower bias over CRE-DML and System GMM.
P-CRE-DML offers a robust and flexible alternative for panel data causal
inference, with applications beyond economic growth.

arXiv link: http://arxiv.org/abs/2506.23297v1
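
A minimal sketch combining two ingredients named above, cross-fitted double
machine learning for a partially linear model and Mundlak-style unit means (the
correlated random effects device), using generic scikit-learn learners on
simulated panel data; the proxy enhancement, lagged variables, and inference
steps of P-CRE-DML are omitted.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import KFold

    rng = np.random.default_rng(4)
    n_units, T = 200, 10
    unit = np.repeat(np.arange(n_units), T)
    alpha = rng.standard_normal(n_units)[unit]                      # unobserved heterogeneity
    x = alpha + rng.standard_normal(n_units * T)                    # observed control
    d = 0.5 * alpha + np.sin(x) + rng.standard_normal(n_units * T)  # treatment
    y = 1.0 * d + np.cos(x) + alpha + rng.standard_normal(n_units * T)

    # Correlated random effects: add unit means of the controls (Mundlak device).
    x_bar = np.bincount(unit, weights=x) / T
    W = np.column_stack([x, x_bar[unit]])

    # Cross-fitted partialling-out for the partially linear model.
    res_y, res_d = np.zeros_like(y), np.zeros_like(d)
    for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(W):
        my = RandomForestRegressor(n_estimators=200, random_state=0).fit(W[train], y[train])
        md = RandomForestRegressor(n_estimators=200, random_state=0).fit(W[train], d[train])
        res_y[test] = y[test] - my.predict(W[test])
        res_d[test] = d[test] - md.predict(W[test])

    theta_hat = (res_d @ res_y) / (res_d @ res_d)
    print(theta_hat)   # roughly recovers the true effect of 1.0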

Econometrics arXiv paper, submitted: 2025-06-29

Modeling European Electricity Market Integration during turbulent times

Authors: Francesco Ravazzolo, Luca Rossini, Andrea Viselli

This paper introduces a novel Bayesian reverse unrestricted mixed-frequency
model applied to a panel of nine European electricity markets. Our model
analyzes the impact of daily fossil fuel prices and hourly renewable energy
generation on hourly electricity prices, employing a hierarchical structure to
capture cross-country interdependencies and idiosyncratic factors. The
inclusion of random effects demonstrates that electricity market integration
both mitigates and amplifies shocks. Our results highlight that while renewable
energy sources consistently reduce electricity prices across all countries, gas
prices remain a dominant driver of cross-country electricity price disparities
and instability. This finding underscores the critical importance of energy
diversification, above all toward renewable energy sources, and coordinated fossil
fuel supply strategies for bolstering European energy security.

arXiv link: http://arxiv.org/abs/2506.23289v1

Econometrics arXiv updated paper (originally submitted: 2025-06-28)

Design-Based and Network Sampling-Based Uncertainties in Network Experiments

Authors: Kensuke Sakamoto, Yuya Shimizu

Ordinary least squares (OLS) estimators are widely used in network
experiments to estimate spillover effects. We study the causal interpretation
of, and inference for the OLS estimator under both design-based uncertainty
from random treatment assignment and sampling-based uncertainty in network
links. We show that correlations among regressors that capture the exposure to
neighbors' treatments can induce contamination bias, preventing OLS from
aggregating heterogeneous spillover effects for a clear causal interpretation.
We derive the OLS estimator's asymptotic distribution and propose a
network-robust variance estimator. Simulations and an empirical application
demonstrate that contamination bias can be substantial, leading to inflated
spillover estimates.

arXiv link: http://arxiv.org/abs/2506.22989v3

Econometrics arXiv paper, submitted: 2025-06-28

Causal Inference for Aggregated Treatment

Authors: Carolina Caetano, Gregorio Caetano, Brantly Callaway, Derek Dyal

In this paper, we study causal inference when the treatment variable is an
aggregation of multiple sub-treatment variables. Researchers often report
marginal causal effects for the aggregated treatment, implicitly assuming that
the target parameter corresponds to a well-defined average of sub-treatment
effects. We show that, even in an ideal scenario for causal inference such as
random assignment, the weights underlying this average have some key
undesirable properties: they are not unique, they can be negative, and, holding
all else constant, these issues become exponentially more likely to occur as
the number of sub-treatments increases and the support of each sub-treatment
grows. We propose approaches to avoid these problems, depending on whether or
not the sub-treatment variables are observed.

arXiv link: http://arxiv.org/abs/2506.22885v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-06-28

Doubly robust estimation of causal effects for random object outcomes with continuous treatments

Authors: Satarupa Bhattacharjee, Bing Li, Xiao Wu, Lingzhou Xue

Causal inference is central to statistics and scientific discovery, enabling
researchers to identify cause-and-effect relationships beyond associations.
While traditionally studied within Euclidean spaces, contemporary applications
increasingly involve complex, non-Euclidean data structures that reside in
abstract metric spaces, known as random objects, such as images, shapes,
networks, and distributions. This paper introduces a novel framework for causal
inference with continuous treatments applied to non-Euclidean data. To address
the challenges posed by the lack of linear structures, we leverage Hilbert
space embeddings of the metric spaces to facilitate Fr\'echet mean estimation
and causal effect mapping. Motivated by a study on the impact of exposure to
fine particulate matter on age-at-death distributions across U.S. counties, we
propose a nonparametric, doubly-debiased causal inference approach for outcomes
as random objects with continuous treatments. Our framework can accommodate
moderately high-dimensional vector-valued confounders and derive efficient
influence functions for estimation to ensure both robustness and
interpretability. We establish rigorous asymptotic properties of the
cross-fitted estimators and employ conformal inference techniques for
counterfactual outcome prediction. Validated through numerical experiments and
applied to real-world environmental data, our framework extends causal
inference methodologies to complex data structures, broadening its
applicability across scientific disciplines.

arXiv link: http://arxiv.org/abs/2506.22754v1

Econometrics arXiv paper, submitted: 2025-06-27

Optimal Estimation of Two-Way Effects under Limited Mobility

Authors: Xu Cheng, Sheng Chao Ho, Frank Schorfheide

We propose an empirical Bayes estimator for two-way effects in linked data
sets based on a novel prior that leverages patterns of assortative matching
observed in the data. To capture limited mobility we model the bipartite graph
associated with the matched data in an asymptotic framework where its Laplacian
matrix has small eigenvalues that converge to zero. The prior hyperparameters
that control the shrinkage are determined by minimizing an unbiased risk
estimate. We show the proposed empirical Bayes estimator is asymptotically
optimal in compound loss, despite the weak connectivity of the bipartite graph
and the potential misspecification of the prior. We estimate teacher
values-added from a linked North Carolina Education Research Data Center
student-teacher data set.

arXiv link: http://arxiv.org/abs/2506.21987v1

Econometrics arXiv paper, submitted: 2025-06-26

Multilevel Decomposition of Generalized Entropy Measures Using Constrained Bayes Estimation: An Application to Japanese Regional Data

Authors: Yuki Kawakubo, Kazuhiko Kakamu

We propose a method for multilevel decomposition of generalized entropy (GE)
measures that explicitly accounts for nested population structures such as
national, regional, and subregional levels. Standard approaches that estimate
GE separately at each level do not guarantee compatibility with multilevel
decomposition. Our method constrains lower-level GE estimates to match
higher-level benchmarks while preserving hierarchical relationships across
layers. We apply the method to Japanese income data to estimate GE at the
national, prefectural, and municipal levels, decomposing national inequality
into between-prefecture and within-prefecture inequality, and further
decomposing prefectural GE into between-municipality and within-municipality
inequality.

arXiv link: http://arxiv.org/abs/2506.21213v1
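
A minimal numpy sketch of the standard between/within decomposition of a
generalized entropy index across groups (e.g., prefectures), which is the
additive structure the proposed constrained Bayes approach preserves across
levels; the benchmarking constraints themselves are not shown.

    import numpy as np

    def ge(y, alpha=2.0):
        """Generalized entropy index GE(alpha) for alpha not in {0, 1}."""
        mu = y.mean()
        return ((y / mu) ** alpha - 1).mean() / (alpha * (alpha - 1))

    def ge_decomposition(y, group, alpha=2.0):
        """Decompose GE(alpha) into between-group and within-group components."""
        mu, n = y.mean(), len(y)
        within, between = 0.0, 0.0
        for g in np.unique(group):
            y_g = y[group == g]
            share, mu_g = len(y_g) / n, y_g.mean()
            within += share * (mu_g / mu) ** alpha * ge(y_g, alpha)
            between += share * ((mu_g / mu) ** alpha - 1) / (alpha * (alpha - 1))
        return between, within

    rng = np.random.default_rng(5)
    group = rng.integers(0, 4, size=5000)
    income = np.exp(0.3 * group + 0.5 * rng.standard_normal(5000))   # log-normal incomes
    b, w = ge_decomposition(income, group)
    print(ge(income), b + w)    # total GE equals between + within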

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-06-26

Orthogonality conditions for convex regression

Authors: Sheng Dai, Timo Kuosmanen, Xun Zhou

Econometric identification generally relies on orthogonality conditions,
which usually state that the random error term is uncorrelated with the
explanatory variables. In convex regression, the orthogonality conditions for
identification are unknown. Applying Lagrangian duality theory, we establish
the sample orthogonality conditions for convex regression, including additive
and multiplicative formulations of the regression model, with and without
monotonicity and homogeneity constraints. We then propose a hybrid instrumental
variable control function approach to mitigate the impact of potential
endogeneity in convex regression. The superiority of the proposed approach is
shown in a Monte Carlo study and examined in an empirical application to
Chilean manufacturing data.

arXiv link: http://arxiv.org/abs/2506.21110v1

Econometrics arXiv paper, submitted: 2025-06-26

Heterogeneous Exposures to Systematic and Idiosyncratic Risk across Crypto Assets: A Divide-and-Conquer Approach

Authors: Nektarios Aslanidis, Aurelio Bariviera, George Kapetanios, Vasilis Sarafidis

This paper analyzes realized return behavior across a broad set of crypto
assets by estimating heterogeneous exposures to idiosyncratic and systematic
risk. A key challenge arises from the latent nature of broader economy-wide
risk sources: macro-financial proxies are unavailable at high-frequencies,
while the abundance of low-frequency candidates offers limited guidance on
empirical relevance. To address this, we develop a two-stage
“divide-and-conquer” approach. The first stage estimates exposures to
high-frequency idiosyncratic and market risk only, using asset-level IV
regressions. The second stage identifies latent economy-wide factors by
extracting the leading principal component from the model residuals and mapping
it to lower-frequency macro-financial uncertainty and sentiment-based
indicators via high-dimensional variable selection. Structured patterns of
heterogeneity in exposures are uncovered using Mean Group estimators across
asset categories. The method is applied to a broad sample of crypto assets,
covering more than 80% of total market capitalization. We document short-term
mean reversion and significant average exposures to idiosyncratic volatility
and illiquidity. Green and DeFi assets are, on average, more exposed to
market-level and economy-wide risk than their non-Green and non-DeFi
counterparts. By contrast, stablecoins are less exposed to idiosyncratic,
market-level, and economy-wide risk factors relative to non-stablecoins. At a
conceptual level, our study develops a coherent framework for isolating
distinct layers of risk in crypto markets. Empirically, it sheds light on how
return sensitivities vary across digital asset categories -- insights that are
important for both portfolio design and regulatory oversight.

arXiv link: http://arxiv.org/abs/2506.21100v1

Econometrics arXiv paper, submitted: 2025-06-26

Wild Bootstrap Inference for Linear Regressions with Many Covariates

Authors: Wenze Li

We propose a simple modification to the wild bootstrap procedure and
establish its asymptotic validity for linear regression models with many
covariates and heteroskedastic errors. Monte Carlo simulations show that the
modified wild bootstrap has excellent finite sample performance compared with
alternative methods that are based on standard normal critical values,
especially when the sample size is small and/or the number of controls is of
the same order of magnitude as the sample size.

arXiv link: http://arxiv.org/abs/2506.20972v1
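
For reference, a minimal sketch of a standard restricted wild bootstrap with
Rademacher weights for one coefficient in a heteroskedastic linear regression
with relatively many controls; the specific modification proposed in the paper
is not reproduced here.

    import numpy as np

    def wild_bootstrap_pvalue(y, X, j=0, n_boot=2000, seed=6):
        """Wild bootstrap p-value for H0: beta_j = 0 using Rademacher weights."""
        rng = np.random.default_rng(seed)
        XtX_inv = np.linalg.inv(X.T @ X)
        beta = XtX_inv @ X.T @ y
        # Restricted fit imposing beta_j = 0, used to generate bootstrap samples.
        X0 = np.delete(X, j, axis=1)
        resid0 = y - X0 @ np.linalg.lstsq(X0, y, rcond=None)[0]
        fitted0 = y - resid0
        t_obs = abs(beta[j])
        count = 0
        for _ in range(n_boot):
            y_star = fitted0 + resid0 * rng.choice([-1.0, 1.0], size=len(y))
            beta_star = XtX_inv @ X.T @ y_star
            count += abs(beta_star[j]) >= t_obs
        return count / n_boot

    rng = np.random.default_rng(7)
    n, p = 100, 40                               # many controls relative to n
    X = rng.standard_normal((n, p))
    y = X[:, 1:] @ rng.standard_normal(p - 1) * 0.2 \
        + rng.standard_normal(n) * (1 + X[:, 1]**2) ** 0.5   # heteroskedastic errors
    print(wild_bootstrap_pvalue(y, X, j=0))      # beta_0 = 0 holds in this DGP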

Econometrics arXiv paper, submitted: 2025-06-25

Analytic inference with two-way clustering

Authors: Laurent Davezies, Xavier D'Haultfœuille, Yannick Guyonvarch

This paper studies analytic inference along two dimensions of clustering. In
such setups, the commonly used approach has two drawbacks. First, the
corresponding variance estimator is not necessarily positive. Second, inference
is invalid in non-Gaussian regimes, namely when the estimator of the parameter
of interest is not asymptotically Gaussian. We consider a simple fix that
addresses both issues. In Gaussian regimes, the corresponding tests are
asymptotically exact and equivalent to usual ones. Otherwise, the new tests are
asymptotically conservative. We also establish their uniform validity over a
certain class of data generating processes. Independently of our tests, we
highlight potential issues with multiple testing and nonlinear estimators under
two-way clustering. Finally, we compare our approach with existing ones through
simulations.

arXiv link: http://arxiv.org/abs/2506.20749v1
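
For context, a minimal numpy sketch of the commonly used two-way cluster-robust
variance estimator referenced above (cluster-by-A plus cluster-by-B minus their
intersection), whose possible failure to be positive semi-definite is one of the
drawbacks the paper's fix addresses.

    import numpy as np

    def cluster_meat(X, u, ids):
        """Sum over clusters of X_g' u_g u_g' X_g."""
        k = X.shape[1]
        meat = np.zeros((k, k))
        for g in np.unique(ids):
            s = (X[ids == g] * u[ids == g, None]).sum(axis=0)
            meat += np.outer(s, s)
        return meat

    def twoway_cluster_vcov(X, y, id_a, id_b):
        bread = np.linalg.inv(X.T @ X)
        u = y - X @ (bread @ X.T @ y)            # OLS residuals
        id_ab = np.array([f"{a}-{b}" for a, b in zip(id_a, id_b)])
        meat = cluster_meat(X, u, id_a) + cluster_meat(X, u, id_b) - cluster_meat(X, u, id_ab)
        return bread @ meat @ bread              # not guaranteed positive semi-definite

    rng = np.random.default_rng(8)
    n = 500
    id_a, id_b = rng.integers(0, 20, n), rng.integers(0, 20, n)
    X = np.column_stack([np.ones(n), rng.standard_normal(n)])
    y = X @ np.array([1.0, 0.5]) + rng.standard_normal(n)
    print(np.diag(twoway_cluster_vcov(X, y, id_a, id_b)))   # variance estimates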

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-06-25

Anytime-Valid Inference in Adaptive Experiments: Covariate Adjustment and Balanced Power

Authors: Daniel Molitor, Samantha Gold

Adaptive experiments such as multi-armed bandits offer efficiency gains over
traditional randomized experiments but pose two major challenges: invalid
inference on the Average Treatment Effect (ATE) due to adaptive sampling and
low statistical power for sub-optimal treatments. We address both issues by
extending the Mixture Adaptive Design framework (arXiv:2311.05794). First, we
propose MADCovar, a covariate-adjusted ATE estimator that is unbiased and
preserves anytime-valid inference guarantees while substantially improving ATE
precision. Second, we introduce MADMod, which dynamically reallocates samples
to underpowered arms, enabling more balanced statistical power across
treatments without sacrificing valid inference. Both methods retain MAD's core
advantage of constructing asymptotic confidence sequences (CSs) that allow
researchers to continuously monitor ATE estimates and stop data collection once
a desired precision or significance criterion is met. Empirically, we validate
both methods using simulations and real-world data. In simulations, MADCovar
reduces CS width by up to 60% relative to MAD. In a large-scale political
RCT with approximately 32,000 participants, MADCovar achieves similar precision
gains. MADMod improves statistical power and inferential precision across all
treatment arms, particularly for suboptimal treatments. Simulations show that
MADMod sharply reduces Type II error while preserving the efficiency benefits
of adaptive allocation. Together, MADCovar and MADMod make adaptive experiments
more practical, reliable, and efficient for applied researchers across many
domains. Our proposed methods are implemented through an open-source software
package.

arXiv link: http://arxiv.org/abs/2506.20523v3

Econometrics arXiv updated paper (originally submitted: 2025-06-25)

Daily Fluctuations in Weather and Economic Growth at the Subnational Level: Evidence from Thailand

Authors: Sarun Kamolthip

This paper examines the effects of daily temperature fluctuations on
subnational economic growth in Thailand. Using annual gross provincial product
(GPP) per capita data from 1982 to 2022 and high-resolution reanalysis weather
data, I estimate fixed-effects panel regressions that isolate plausibly
exogenous within-province year-to-year variation in temperature. The results
indicate a statistically significant inverted-U relationship between
temperature and annual growth in GPP per capita, with adverse effects
concentrated in the agricultural sector. Industrial and service outputs appear
insensitive to short-term weather variation. Distributed lag models suggest
that temperature shocks have persistent effects on growth trajectories,
particularly in lower-income provinces with higher average temperatures. I
combine these estimates with climate projections under RCP4.5 and RCP8.5
emission scenarios to evaluate province-level economic impacts through 2090.
Without adjustments for biases in climate projections or lagged temperature
effects, climate change is projected to reduce per capita output for 63-86% of
the Thai population, with median GDP per capita impacts ranging from -4% to +56%
for RCP4.5 and from -52% to -15% for RCP8.5. When correcting for projected
warming biases, but omitting lagged dynamics, median losses increase to
57-63% (RCP4.5) and 80-86% (RCP8.5). Accounting for delayed temperature effects
further raises the upper-bound estimates to near-total loss. These results
highlight the importance of accounting for model uncertainty and temperature
dynamics in subnational climate impact assessments. All projections should be
interpreted with appropriate caution.

arXiv link: http://arxiv.org/abs/2506.20105v2

Econometrics arXiv paper, submitted: 2025-06-24

A Sharp and Robust Test for Selective Reporting

Authors: Stefan Faridani

This paper proposes a test that is consistent against every detectable form
of selective reporting and remains interpretable even when the t-scores are not
exactly normal. The test statistic is the distance between the smoothed
empirical t-curve and the set of all t-curves that would be possible in the
absence of any selective reporting. This novel projection test can only be
evaded in large meta-samples by selective reporting that also evades all other
valid tests of restrictions on the t-curve. A second benefit of the projection
test is that under the null we can interpret the projection residual as noise
plus bias incurred from approximating the t-score's exact distribution with the
normal. Applying the test to the Brodeur et al. (2020) meta-data, we find that
the t-curves for RCTs, IVs, and DIDs are more distorted than could arise by
chance. But an Edgeworth Expansion reveals that these distortions are small
enough to be plausibly explained by the only approximate normality of the
individual t-scores. The detection of selective reporting in this meta-sample
is therefore more fragile than previously known.

arXiv link: http://arxiv.org/abs/2506.20035v1

Econometrics arXiv paper, submitted: 2025-06-24

Single-Index Quantile Factor Model with Observed Characteristics

Authors: Ruofan Xu, Qingliang Fan

We propose a characteristics-augmented quantile factor (QCF) model, where
unknown factor loading functions are linked to a large set of observed
individual-level (e.g., bond- or stock-specific) covariates via a single-index
projection. The single-index specification offers a parsimonious,
interpretable, and statistically efficient way to nonparametrically
characterize the time-varying loadings, while avoiding the curse of
dimensionality in flexible nonparametric models. Using a three-step sieve
estimation procedure, the QCF model demonstrates high in-sample and
out-of-sample accuracy in simulations. We establish asymptotic properties for
estimators of the latent factor, loading functions, and index parameters. In an
empirical study, we analyze the dynamic distributional structure of U.S.
corporate bond returns from 2003 to 2020. Our method outperforms the benchmark
quantile Fama-French five-factor model and quantile latent factor model,
particularly in the tails ($\tau=0.05, 0.95$). The model reveals
state-dependent risk exposures driven by characteristics such as bond and
equity volatility, coupon, and spread. Finally, we provide economic
interpretations of the latent factors.

arXiv link: http://arxiv.org/abs/2506.19586v1

Econometrics arXiv paper, submitted: 2025-06-23

100-Day Analysis of USD/IDR Exchange Rate Dynamics Around the 2025 U.S. Presidential Inauguration

Authors: Sandy H. S. Herho, Siti N. Kaban, Cahya Nugraha

Using a 100-day symmetric window around the January 2025 U.S. presidential
inauguration, this study applies non-parametric statistical methods with
bootstrap resampling (10,000 iterations) to analyze distributional properties
and anomalies. Results
indicate a statistically significant 3.61% Indonesian rupiah depreciation
post-inauguration, with a large effect size (Cliff's Delta $= -0.9224$, CI:
$[-0.9727, -0.8571]$). Central tendency shifted markedly, yet volatility
remained stable (variance ratio $= 0.9061$, $p = 0.504$). Four significant
anomalies exhibiting temporal clustering are detected. These findings provide
quantitative evidence of political transition effects on emerging market
currencies, highlighting implications for monetary policy and currency risk
management.

arXiv link: http://arxiv.org/abs/2506.18738v1
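
A minimal numpy sketch of the effect-size calculation described above: Cliff's
delta comparing pre- and post-inauguration returns, with a percentile bootstrap
confidence interval. The data here are illustrative draws, not the USD/IDR
series.

    import numpy as np

    def cliffs_delta(x, y):
        """Cliff's delta: P(x > y) - P(x < y) over all pairs."""
        diff = x[:, None] - y[None, :]
        return (diff > 0).mean() - (diff < 0).mean()

    def bootstrap_ci(x, y, n_boot=10000, seed=9):
        rng = np.random.default_rng(seed)
        stats = [cliffs_delta(rng.choice(x, len(x)), rng.choice(y, len(y)))
                 for _ in range(n_boot)]
        return np.percentile(stats, [2.5, 97.5])

    rng = np.random.default_rng(10)
    pre = rng.normal(0.000, 0.005, 50)            # illustrative daily log returns
    post = rng.normal(-0.001, 0.005, 50)
    print(cliffs_delta(pre, post), bootstrap_ci(pre, post))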

Econometrics arXiv paper, submitted: 2025-06-23

The Persistent Effects of Peru's Mining MITA: Double Machine Learning Approach

Authors: Alper Deniz Karakas

This study examines the long-term economic impact of the colonial Mita system
in Peru, building on Melissa Dell's foundational work on the enduring effects
of forced labor institutions. The Mita, imposed by the Spanish colonial
authorities from 1573 to 1812, required indigenous communities within a
designated boundary to supply labor to mines, primarily near Potosi. Dell's
original regression discontinuity design (RDD) analysis, leveraging the Mita
boundary to estimate the Mita's legacy on modern economic outcomes, indicates
that regions subjected to the Mita exhibit lower household consumption levels
and higher rates of child stunting. In this paper, I replicate Dell's results
and extend this analysis. I apply Double Machine Learning (DML) methods--the
Partially Linear Regression (PLR) model and the Interactive Regression Model
(IRM)--to further investigate the Mita's effects. DML allows for the inclusion
of high-dimensional covariates and enables more flexible, non-linear modeling
of treatment effects, potentially capturing complex relationships that a
polynomial-based approach may overlook. While the PLR model provides some
additional flexibility, the IRM model allows for fully heterogeneous treatment
effects, offering a nuanced perspective on the Mita's impact across regions and
district characteristics. My findings suggest that the Mita's economic legacy
is more substantial and spatially heterogeneous than originally estimated. The
IRM results reveal that proximity to Potosi and other district-specific factors
intensify the Mita's adverse impact, suggesting a deeper persistence of
regional economic inequality. These findings underscore that machine learning
addresses the realistic non-linearity present in complex, real-world systems.
By modeling hypothetical counterfactuals more accurately, DML enhances my
ability to estimate the true causal impact of historical interventions.

arXiv link: http://arxiv.org/abs/2506.18947v1

Econometrics arXiv paper, submitted: 2025-06-22

Poverty Targeting with Imperfect Information

Authors: Juan C. Yamin

A key challenge for targeted antipoverty programs in developing countries is
that policymakers must rely on estimated rather than observed income, which
leads to substantial targeting errors. I propose a statistical decision
framework in which a benevolent planner, subject to a budget constraint and
equipped only with noisy income estimates, allocates cash transfers to the
poorest individuals. In this setting, the commonly used plug-in rule, which
allocates transfers based on point estimates, is inadmissible and uniformly
dominated by a shrinkage-based alternative. Building on this result, I propose
an empirical Bayes (EB) targeting rule. I show that the regret of the empirical
Bayes rule converges at the same rate as that of the posterior mean estimator,
despite applying a nonsmooth transformation to it. Simulations show that the EB
rule delivers large improvements over the plug-in approach in an idealized
setting and modest but consistent gains in a more realistic application.

arXiv link: http://arxiv.org/abs/2506.18188v1
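
A minimal numpy sketch of the plug-in versus shrinkage comparison under
simplifying normal-normal assumptions with known hyperparameters: noisy income
estimates of varying precision are shrunk toward the grand mean, and transfers
go to the units with the lowest estimates. The paper's empirical Bayes rule and
regret analysis are richer than this illustration.

    import numpy as np

    rng = np.random.default_rng(11)
    n, budget = 5000, 500                              # 500 transfers to allocate
    true_income = rng.normal(10.0, 2.0, n)             # latent incomes
    noise_sd = rng.uniform(0.5, 3.0, n)                # estimate precision varies by person
    noisy = true_income + rng.normal(0.0, noise_sd)    # noisy income estimates

    # Normal-normal posterior means: individual-specific shrinkage toward the mean.
    prior_mean, prior_var = 10.0, 2.0**2               # treated as known for this sketch
    shrink = prior_var / (prior_var + noise_sd**2)
    posterior = prior_mean + shrink * (noisy - prior_mean)

    def targeting_accuracy(score):
        """Share of transfers reaching the truly poorest `budget` individuals."""
        chosen = np.argsort(score)[:budget]
        truly_poor = np.argsort(true_income)[:budget]
        return np.intersect1d(chosen, truly_poor).size / budget

    print(targeting_accuracy(noisy), targeting_accuracy(posterior))  # plug-in vs shrinkage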

Econometrics arXiv paper, submitted: 2025-06-22

Beyond utility: incorporating eye-tracking, skin conductance and heart rate data into cognitive and econometric travel behaviour models

Authors: Thomas O. Hancock, Stephane Hess, Charisma F. Choudhury

Choice models for large-scale applications have historically relied on
economic theories (e.g. utility maximisation) that establish relationships
between the choices of individuals, their characteristics, and the attributes
of the alternatives. In a parallel stream, choice models in cognitive
psychology have focused on modelling the decision-making process, but typically
in controlled scenarios. Recent research developments have attempted to bridge
the modelling paradigms, with choice models that are based on psychological
foundations, such as decision field theory (DFT), outperforming traditional
econometric choice models for travel mode and route choice behaviour. The use
of physiological data, which can provide indications about the choice-making
process and mental states, opens up the opportunity to further advance the
models. In particular, the use of such data to enrich 'process' parameters
within a cognitive theory-driven choice model has not yet been explored. This
research gap is addressed by incorporating physiological data into both
econometric and DFT models for understanding decision-making in two different
contexts: stated-preference responses (static) for accommodation choice and
gap-acceptance decisions within a driving simulator experiment (dynamic).
Results from models for the static scenarios demonstrate that both models can
improve substantially through the incorporation of eye-tracking information.
Results from models for the dynamic scenarios suggest that stress measurement
and eye-tracking data can be linked with process parameters in DFT, resulting
in larger improvements in comparison to simpler methods for incorporating this
data in either DFT or econometric models. The findings provide insights into
the value added by physiological data as well as the performance of different
candidate modelling frameworks for integrating such data.

arXiv link: http://arxiv.org/abs/2506.18068v1

Econometrics arXiv paper, submitted: 2025-06-22

An Empirical Comparison of Weak-IV-Robust Procedures in Just-Identified Models

Authors: Wenze Li

Instrumental variable (IV) regression is recognized as one of the five core
methods for causal inference, as identified by Angrist and Pischke (2008). This
paper compares two leading approaches to inference under weak identification
for just-identified IV models: the classical Anderson-Rubin (AR) procedure and
the recently popular tF method proposed by Lee et al. (2022). Using replication
data from the American Economic Review (AER) and Monte Carlo simulation
experiments, we evaluate the two procedures in terms of statistical
significance testing and confidence interval (CI) length. Empirically, we find
that the AR procedure typically offers higher power and yields shorter CIs than
the tF method. Nonetheless, as noted by Lee et al. (2022), tF has a theoretical
advantage in terms of expected CI length. Our findings suggest that the two
procedures may be viewed as complementary tools in empirical applications
involving potentially weak instruments.

arXiv link: http://arxiv.org/abs/2506.18001v1
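
A minimal sketch of the heteroskedasticity-robust Anderson-Rubin test in a
just-identified model with one instrument: under H0: beta = beta0 the
instrument is uncorrelated with y - beta0*x, which is checked by a robust Wald
test of the instrument coefficient in that auxiliary regression. The tF
procedure of Lee et al. (2022) is not reproduced here.

    import numpy as np
    from scipy.stats import chi2

    def ar_test(y, x, z, beta0):
        """Robust AR test of H0: beta = beta0 in y = beta*x + u with instrument z."""
        e = y - beta0 * x                       # structural residual under H0
        Z = np.column_stack([np.ones(len(z)), z])
        gamma = np.linalg.solve(Z.T @ Z, Z.T @ e)
        u = e - Z @ gamma
        bread = np.linalg.inv(Z.T @ Z)
        meat = (Z * (u**2)[:, None]).T @ Z      # heteroskedasticity-robust middle term
        V = bread @ meat @ bread
        stat = gamma[1]**2 / V[1, 1]            # Wald statistic on the instrument
        return stat, 1 - chi2.cdf(stat, df=1)

    rng = np.random.default_rng(12)
    n = 1000
    z = rng.standard_normal(n)
    v = rng.standard_normal(n)
    x = 0.3 * z + v                                  # possibly weak first stage
    y = 1.0 * x + 0.8 * v + rng.standard_normal(n)   # endogenous regressor
    print(ar_test(y, x, z, beta0=1.0))   # H0 true: large p-value expected
    print(ar_test(y, x, z, beta0=0.0))   # H0 false: small p-value expected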

Econometrics arXiv paper, submitted: 2025-06-21

Efficient Difference-in-Differences and Event Study Estimators

Authors: Xiaohong Chen, Pedro H. C. Sant'Anna, Haitian Xie

This paper investigates efficient Difference-in-Differences (DiD) and Event
Study (ES) estimation using short panel data sets within the heterogeneous
treatment effect framework, free from parametric functional form assumptions
and allowing for variation in treatment timing. We provide an equivalent
characterization of the DiD potential outcome model using sequential
conditional moment restrictions on observables, which shows that the DiD
identification assumptions typically imply nonparametric overidentification
restrictions. We derive the semiparametric efficient influence function (EIF)
in closed form for DiD and ES causal parameters under commonly imposed parallel
trends assumptions. The EIF is automatically Neyman orthogonal and yields the
smallest variance among all asymptotically normal, regular estimators of the
DiD and ES parameters. Leveraging the EIF, we propose simple-to-compute
efficient estimators. Our results highlight how to optimally explore different
pre-treatment periods and comparison groups to obtain the tightest (asymptotic)
confidence intervals, offering practical tools for improving inference in
modern DiD and ES applications even in small samples. Calibrated simulations
and an empirical application demonstrate substantial precision gains of our
efficient estimators in finite samples.

arXiv link: http://arxiv.org/abs/2506.17729v1

Econometrics arXiv paper, submitted: 2025-06-19

Leave No One Undermined: Policy Targeting with Regret Aversion

Authors: Toru Kitagawa, Sokbae Lee, Chen Qiu

While the importance of personalized policymaking is widely recognized, fully
personalized implementation remains rare in practice. We study the problem of
policy targeting for a regret-averse planner when training data gives a rich
set of observable characteristics while the assignment rules can only depend on
its subset. Grounded in decision theory, our regret-averse criterion reflects a
planner's concern about regret inequality across the population, which
generally leads to a fractional optimal rule due to treatment effect
heterogeneity beyond the average treatment effects conditional on the subset
characteristics. We propose a debiased empirical risk minimization approach to
learn the optimal rule from data. Viewing our debiased criterion as a weighted
least squares problem, we establish new upper and lower bounds for the excess
risk, indicating a convergence rate of 1/n and asymptotic efficiency in certain
cases. We apply our approach to the National JTPA Study and the International
Stroke Trial.

arXiv link: http://arxiv.org/abs/2506.16430v1

Econometrics arXiv paper, submitted: 2025-06-18

Fast Learning of Optimal Policy Trees

Authors: James Cussens, Julia Hatamyar, Vishalie Shah, Noemi Kreif

We develop and implement a version of the popular "policytree" method (Athey
and Wager, 2021) using discrete optimisation techniques. We test the
performance of our algorithm in finite samples and find an improvement in the
runtime of optimal policy tree learning by a factor of nearly 50 compared to
the original version. We provide an R package, "fastpolicytree", for public
use.

arXiv link: http://arxiv.org/abs/2506.15435v1

Econometrics arXiv paper, submitted: 2025-06-17

On the relationship between prediction intervals, tests of sharp nulls and inference on realized treatment effects in settings with few treated units

Authors: Luis Alvarez, Bruno Ferman

We study how inference methods for settings with few treated units that rely
on treatment effect homogeneity extend to alternative inferential targets when
treatment effects are heterogeneous -- namely, tests of sharp null hypotheses,
inference on realized treatment effects, and prediction intervals. We show that
inference methods for these alternative targets are deeply interconnected: they
are either equivalent or become equivalent under additional assumptions. Our
results show that methods designed under treatment effect homogeneity can
remain valid for these alternative targets when treatment effects are
stochastic, offering new theoretical justifications and insights on their
applicability.

arXiv link: http://arxiv.org/abs/2506.14998v1

Econometrics arXiv paper, submitted: 2025-06-17

Heterogeneous economic growth vulnerability across Euro Area countries under stressed scenarios

Authors: Claudio Lissona, Esther Ruiz

We analyse economic growth vulnerability of the four largest Euro Area (EA)
countries under stressed macroeconomic and financial conditions. Vulnerability,
measured as a lower quantile of the growth distribution conditional on EA-wide
and country-specific underlying factors, is found to be higher in Germany,
which is more exposed to EA-wide economic conditions, and in Spain, which has
large country-specific sectoral dynamics. We show that, under stress, financial
factors amplify adverse macroeconomic conditions. Furthermore, even severe
sectoral (financial or macro) shocks, whether common or country-specific, fail
to fully explain the vulnerability observed under overall stress. Our results
underscore the importance of monitoring both local and EA-wide macro-financial
conditions to design effective policies for mitigating growth vulnerability.

arXiv link: http://arxiv.org/abs/2506.14321v1

Econometrics arXiv paper, submitted: 2025-06-17

A break from the norm? Parametric representations of preference heterogeneity for discrete choice models in health

Authors: John Buckell, Alice Wreford, Matthew Quaife, Thomas O. Hancock

Background: Any sample of individuals has its own, unique distribution of
preferences for choices that they make. Discrete choice models try to capture
these distributions. Mixed logits are by far the most commonly used choice
model in health. A raft of parametric model specifications for these models are
available. We test a range of alternative assumptions, and model averaging,
to assess whether and how model outputs are affected. Design: Scoping review
of current
modelling practices. Seven alternative distributions, and model averaging over
all distributional assumptions, were compared on four datasets: two were stated
preference, one was revealed preference, and one was simulated. Analyses
examined model fit, preference distributions, willingness-to-pay, and
forecasting. Results: Almost universally, using normal distributions is the
standard practice in health. Alternative distributional assumptions
outperformed standard practice. Preference distributions and the mean
willingness-to-pay varied significantly across specifications, and were seldom
comparable to those derived from normal distributions. Model averaging over
distributions allowed for greater flexibility, further gains in fit, reproduced
underlying distributions in simulations, and mitigated analyst bias
arising from distribution selection. There was no evidence that distributional
assumptions impacted predictions from models. Limitations: Our focus was on
mixed logit models since these models are the most common in health, though
latent class models are also used. Conclusions: The standard practice of using
all normal distributions appears to be an inferior approach for capturing
random preference heterogeneity. Implications: Researchers should test
alternative assumptions to normal distributions in their models.

arXiv link: http://arxiv.org/abs/2506.14099v1

Econometrics arXiv updated paper (originally submitted: 2025-06-17)

Machine Learning-Based Estimation of Monthly GDP

Authors: Yonggeun Jung

This paper proposes a scalable framework to estimate monthly GDP using
machine learning methods. We apply Multi-Layer Perceptron (MLP), Long
Short-Term Memory networks (LSTM), Extreme Gradient Boosting (XGBoost), and
Elastic Net regression to map monthly indicators to quarterly GDP growth, and
reconcile the outputs with actual aggregates. Using data from China, Germany,
the UK, and the US, our method delivers robust performance across varied data
environments. Benchmark comparisons with prior US studies and UK official
statistics validate its accuracy. We also explore nighttime light as a proxy,
finding its usefulness varies by economic structure. The approach offers a
flexible and data-driven tool for high-frequency macroeconomic monitoring and
policy analysis.

arXiv link: http://arxiv.org/abs/2506.14078v2
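
A minimal scikit-learn sketch of the mapping step with Elastic Net on simulated
data: monthly indicators are averaged to quarterly frequency for training
against quarterly GDP growth, and the fitted model is then applied at monthly
frequency. The reconciliation of monthly estimates with actual quarterly
aggregates used in the paper is omitted.

    import numpy as np
    from sklearn.linear_model import ElasticNet

    rng = np.random.default_rng(13)
    months, n_ind = 240, 12                          # 20 years of monthly indicators
    X_monthly = rng.standard_normal((months, n_ind))
    beta = rng.standard_normal(n_ind) * 0.3
    monthly_growth = X_monthly @ beta + 0.2 * rng.standard_normal(months)  # latent target

    # Aggregate to quarterly frequency; treating quarterly growth as the average
    # of its three months is a simplification for this sketch.
    X_quarterly = X_monthly.reshape(-1, 3, n_ind).mean(axis=1)
    gdp_quarterly = monthly_growth.reshape(-1, 3).mean(axis=1)

    model = ElasticNet(alpha=0.05, l1_ratio=0.5).fit(X_quarterly, gdp_quarterly)
    monthly_gdp_estimate = model.predict(X_monthly)  # monthly-frequency estimate
    print(monthly_gdp_estimate[:6])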

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-06-16

Causal Mediation Analysis with Multiple Mediators: A Simulation Approach

Authors: Jesse Zhou, Geoffrey T. Wodtke

Analyses of causal mediation often involve exposure-induced confounders or,
relatedly, multiple mediators. In such applications, researchers aim to
estimate a variety of different quantities, including interventional direct and
indirect effects, multivariate natural direct and indirect effects, and/or
path-specific effects. This study introduces a general approach to estimating
all these quantities by simulating potential outcomes from a series of
distribution models for each mediator and the outcome. Building on similar
methods developed for analyses with only a single mediator (Imai et al. 2010),
we first outline how to implement this approach with parametric models. The
parametric implementation can accommodate linear and nonlinear relationships,
both continuous and discrete mediators, and many different types of outcomes.
However, it depends on correct specification of each model used to simulate the
potential outcomes. To address the risk of misspecification, we also introduce
an alternative implementation using a novel class of nonparametric models,
which leverage deep neural networks to approximate the relevant distributions
without relying on strict assumptions about functional form. We illustrate both
methods by reanalyzing the effects of media framing on attitudes toward
immigration (Brader et al. 2008) and the effects of prenatal care on preterm
birth (VanderWeele et al. 2014).

arXiv link: http://arxiv.org/abs/2506.14019v1
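
A minimal sketch of the simulation approach in the single-mediator, parametric
case the authors build on (Imai et al. 2010): fit models for the mediator and
the outcome, then simulate mediator draws under each treatment condition to
obtain natural direct and indirect effects. The multiple-mediator,
interventional-effect, and neural-network extensions are not shown.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(14)
    n = 5000
    x = rng.standard_normal(n)                                 # baseline confounder
    d = rng.binomial(1, 0.5, n)                                # randomized treatment
    m = 0.8 * d + 0.5 * x + rng.standard_normal(n)             # mediator
    y = 0.4 * d + 1.0 * m + 0.5 * x + rng.standard_normal(n)   # outcome

    med_model = LinearRegression().fit(np.column_stack([d, x]), m)
    out_model = LinearRegression().fit(np.column_stack([d, m, x]), y)

    def simulate_y(d_val, m_d_val, n_sims=200):
        """Average simulated outcome with treatment d_val and mediator drawn
        from its fitted model under treatment m_d_val."""
        sims = []
        for _ in range(n_sims):
            m_sim = med_model.predict(np.column_stack([np.full(n, m_d_val), x]))
            # Unit-variance mediator noise matches the simulated DGP; in practice
            # the residual variance would be estimated.
            m_sim = m_sim + rng.standard_normal(n)
            sims.append(out_model.predict(np.column_stack([np.full(n, d_val), m_sim, x])).mean())
        return np.mean(sims)

    nie = simulate_y(1, 1) - simulate_y(1, 0)   # natural indirect effect (~0.8)
    nde = simulate_y(1, 0) - simulate_y(0, 0)   # natural direct effect (~0.4)
    print(nde, nie)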

Econometrics arXiv paper, submitted: 2025-06-16

High-Dimensional Spatial-Plus-Vertical Price Relationships and Price Transmission: A Machine Learning Approach

Authors: Mindy L. Mallory, Rundong Peng, Meilin Ma, H. Holly Wang

Price transmission has been studied extensively in agricultural economics
through the lens of spatial and vertical price relationships. Classical time
series econometric techniques suffer from the "curse of dimensionality" and are
applied almost exclusively to small sets of price series, either prices of one
commodity in a few regions or prices of a few commodities in one region.
However, an agrifood supply chain usually contains several commodities (e.g.,
cattle and beef) and spans numerous regions. Failing to jointly examine
multi-region, multi-commodity price relationships limits researchers' ability
to derive insights from increasingly high-dimensional price datasets of
agrifood supply chains. We apply a machine-learning method - specifically,
regularized regression - to augment the classical vector error correction model
(VECM) and study large spatial-plus-vertical price systems. Leveraging weekly
provincial-level data on the piglet-hog-pork supply chain in China, we uncover
economically interesting changes in price relationships in the system before
and after the outbreak of a major hog disease. To quantify price transmission
in the large system, we rely on the spatial-plus-vertical price relationships
identified by the regularized VECM to visualize comprehensive spatial and
vertical price transmission of hypothetical shocks through joint impulse
response functions. Price transmission shows considerable heterogeneity across
regions and commodities, as the VECM outcomes imply, and displays different
dynamics over time.

arXiv link: http://arxiv.org/abs/2506.13967v1
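
A minimal sketch of the regularization idea described above, assuming the
cointegrating relationships are simply each price's deviation from the
cross-sectional average: each VECM equation is estimated by lasso, shrinking
coefficients on irrelevant regions or commodities to zero. Lag selection, the
full cointegration analysis, and the joint impulse responses in the paper are
omitted.

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(15)
    T, k = 400, 12                                   # 12 regional/vertical price series
    common = np.cumsum(rng.standard_normal(T))       # shared stochastic trend
    P = np.array([common + rng.standard_normal(T) for _ in range(k)]).T  # cointegrated prices

    dP = np.diff(P, axis=0)                              # price changes
    ect = P[:-1] - P[:-1].mean(axis=1, keepdims=True)    # crude error-correction terms
    lag_dP = np.vstack([np.zeros((1, k)), dP[:-1]])      # one lag of price changes

    X = np.column_stack([ect, lag_dP])               # regressors for every equation
    coefs = np.array([Lasso(alpha=0.05).fit(X, dP[:, j]).coef_ for j in range(k)])
    print((np.abs(coefs) > 1e-8).sum(), "nonzero coefficients out of", coefs.size)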

Econometrics arXiv paper, submitted: 2025-06-16

Gradient Boosting for Spatial Regression Models with Autoregressive Disturbances

Authors: Michael Balzer

Researchers in urban and regional studies increasingly deal with spatial data
that reflects geographic location and spatial relationships. As a framework for
dealing with the unique nature of spatial data, various spatial regression
models have been introduced. In this article, a novel model-based gradient
boosting algorithm for spatial regression models with autoregressive
disturbances is proposed. Due to the modular nature, the approach provides an
alternative estimation procedure which is feasible even in high-dimensional
settings where established quasi-maximum likelihood or generalized method of
moments estimators do not yield unique solutions. The approach additionally
enables data-driven variable and model selection in low- as well as
high-dimensional settings. Since the bias-variance trade-off is also controlled
in the algorithm, implicit regularization is imposed which improves prediction
accuracy on out-of-sample spatial data. Detailed simulation studies regarding
the performance of estimation, prediction and variable selection in low- and
high-dimensional settings confirm proper functionality of the proposed
methodology. To illustrate the functionality of the model-based gradient
boosting algorithm, a case study is presented in which life expectancy in
German districts is modeled incorporating a potential spatial dependence
structure.

arXiv link: http://arxiv.org/abs/2506.13682v1

Econometrics arXiv updated paper (originally submitted: 2025-06-16)

Identification of Impulse Response Functions for Nonlinear Dynamic Models

Authors: Christian Gourieroux, Quinlan Lee

We explore the issues of identification for nonlinear Impulse Response
Functions in nonlinear dynamic models and discuss the settings in which the
problem can be mitigated. In particular, we introduce the nonlinear
autoregressive representation with Gaussian innovations and characterize the
identified set. This set arises from the multiplicity of nonlinear innovations
and transformations which leave invariant the standard normal density. We then
discuss possible identifying restrictions, such as non-Gaussianity of
independent sources, or identifiable parameters by means of learning
algorithms, and the possibility of identification in nonlinear dynamic factor
models when the underlying latent factors have different dynamics. We also
explain how these identification results depend ultimately on the set of series
under consideration.

arXiv link: http://arxiv.org/abs/2506.13531v2

Econometrics arXiv updated paper (originally submitted: 2025-06-16)

Production Function Estimation without Invertibility: Imperfectly Competitive Environments and Demand Shocks

Authors: Ulrich Doraszelski, Lixiong Li

We advance the proxy variable approach to production function estimation. We
show that the invertibility assumption at its heart is testable. We
characterize what goes wrong if invertibility fails and what can still be done.
We show that rethinking how the estimation procedure is implemented either
eliminates or mitigates the bias that arises if invertibility fails. In
particular, a simple change to the first step of the estimation procedure
provides a first-order bias correction for the GMM estimator in the second
step. Furthermore, a modification of the moment condition in the second step
ensures Neyman orthogonality and enhances efficiency and robustness by
rendering the asymptotic distribution of the GMM estimator invariant to
estimation noise from the first step.

arXiv link: http://arxiv.org/abs/2506.13520v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-06-16

Joint Quantile Shrinkage: A State-Space Approach toward Non-Crossing Bayesian Quantile Models

Authors: David Kohns, Tibor Szendrei

Crossing of fitted conditional quantiles is a prevalent problem for quantile
regression models. We propose a new Bayesian modelling framework that penalises
multiple quantile regression functions toward the desired non-crossing space.
We achieve this by estimating multiple quantiles jointly with a prior on
variation across quantiles, a fused shrinkage prior with quantile adaptivity.
The posterior is derived from a decision-theoretic general Bayes perspective,
whose form yields a natural state-space interpretation aligned with
Time-Varying Parameter (TVP) models. Taken together, our approach leads to a
Quantile-Varying Parameter (QVP) model, for which we develop efficient sampling
algorithms. We demonstrate that our proposed modelling framework provides
superior parameter recovery and predictive performance compared to competing
Bayesian and frequentist quantile regression estimators in simulated
experiments and a real-data application to multivariate quantile estimation in
macroeconomics.
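
For readers unfamiliar with the crossing problem the paper targets, the short
sketch below (separate linear quantile regressions on simulated data plus a
monotone rearrangement, not the authors' Bayesian QVP model) checks whether
fitted conditional quantiles violate monotonicity and applies a crude post-hoc
fix:

    import numpy as np
    from sklearn.linear_model import QuantileRegressor

    rng = np.random.default_rng(7)
    n = 300
    x = rng.uniform(0, 1, size=(n, 1))
    y = 1 + 2 * x[:, 0] + (0.2 + x[:, 0]) * rng.normal(size=n)   # heteroskedastic noise

    taus = [0.1, 0.3, 0.5, 0.7, 0.9]
    x_grid = np.linspace(0, 1, 50).reshape(-1, 1)
    preds = np.column_stack([
        QuantileRegressor(quantile=t, alpha=0.001, solver="highs").fit(x, y).predict(x_grid)
        for t in taus
    ])

    print("crossing violations on the grid:", np.sum(np.diff(preds, axis=1) < 0))
    preds_sorted = np.sort(preds, axis=1)    # crude fix: rearrange quantiles monotonically
    print("violations after rearrangement: ", np.sum(np.diff(preds_sorted, axis=1) < 0))

The paper's joint shrinkage prior addresses the same issue at the modelling
stage rather than by post-processing.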

arXiv link: http://arxiv.org/abs/2506.13257v2

Econometrics arXiv paper, submitted: 2025-06-15

Quantile Peer Effect Models

Authors: Aristide Houndetoungan

I propose a flexible structural model to estimate peer effects across various
quantiles of the peer outcome distribution. The model allows peers with low,
intermediate, and high outcomes to exert distinct influences, thereby capturing
more nuanced patterns of peer effects than standard approaches that are based
on aggregate measures. I establish the existence and uniqueness of the Nash
equilibrium and demonstrate that the model parameters can be estimated using a
straightforward instrumental variable strategy. Applying the model to a range
of outcomes that are commonly studied in the literature, I uncover diverse and
rich patterns of peer influences that challenge assumptions inherent in
standard models. These findings carry important policy implications: key player
status in a network depends not only on network structure, but also on the
distribution of outcomes within the population.

arXiv link: http://arxiv.org/abs/2506.12920v1

Econometrics arXiv updated paper (originally submitted: 2025-06-15)

Rethinking Distributional IVs: KAN-Powered D-IV-LATE & Model Choice

Authors: Charles Shaw

The double/debiased machine learning (DML) framework has become a cornerstone
of modern causal inference, allowing researchers to utilise flexible machine
learning models for the estimation of nuisance functions without introducing
first-order bias into the final parameter estimate. However, the choice of
machine learning model for the nuisance functions is often treated as a minor
implementation detail. In this paper, we argue that this choice can have a
profound impact on the substantive conclusions of the analysis. We demonstrate
this by presenting and comparing two distinct Distributional Instrumental
Variable Local Average Treatment Effect (D-IV-LATE) estimators. The first
estimator leverages standard machine learning models like Random Forests for
nuisance function estimation, while the second is a novel estimator employing
Kolmogorov-Arnold Networks (KANs). We establish the asymptotic properties of
these estimators and evaluate their performance through Monte Carlo
simulations. An empirical application analysing the distributional effects of
401(k) participation on net financial assets reveals that the choice of machine
learning model for nuisance functions can significantly alter substantive
conclusions, with the KAN-based estimator suggesting more complex treatment
effect heterogeneity. These findings underscore a critical "caveat emptor". The
selection of nuisance function estimators is not a mere implementation detail.
Instead, it is a pivotal choice that can profoundly impact research outcomes in
causal inference.
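
A hedged toy example of that broader point (it is not the paper's D-IV-LATE or
KAN-based estimator, and the data-generating process below is invented): a
cross-fitted, partialled-out IV estimate of a treatment effect computed twice,
once with random forest nuisance functions and once with linear ones, so the
sensitivity of the point estimate to the learner can be inspected directly.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_predict

    rng = np.random.default_rng(1)
    n = 4000
    X = rng.normal(size=(n, 5))
    Z = (rng.uniform(size=n) < 0.5).astype(float)            # binary instrument
    U = rng.normal(size=n)                                    # unobserved confounder
    D = (np.sin(X[:, 0]) + Z + U + rng.normal(size=n) > 0.5).astype(float)
    Y = 1.0 * D + np.cos(X[:, 0]) + U + rng.normal(size=n)    # true effect of D is 1

    def dml_iv(learner):
        # residualize Y, D and Z on X with cross-fitting, then take a Wald-type ratio
        y_res = Y - cross_val_predict(learner, X, Y, cv=5)
        d_res = D - cross_val_predict(learner, X, D, cv=5)
        z_res = Z - cross_val_predict(learner, X, Z, cv=5)
        return np.sum(z_res * y_res) / np.sum(z_res * d_res)

    print("random forest nuisances:", dml_iv(RandomForestRegressor(n_estimators=200)))
    print("linear nuisances:       ", dml_iv(LinearRegression()))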

arXiv link: http://arxiv.org/abs/2506.12765v2

Econometrics arXiv paper, submitted: 2025-06-14

Dynamic allocation: extremes, tail dependence, and regime shifts

Authors: Yin Luo, Sheng Wang, Javed Jussa

By capturing outliers, volatility clustering, and tail dependence in the
asset return distribution, we build a sophisticated model to predict the
downside risk of the global financial market. We further develop a dynamic
regime switching model that can forecast real-time risk regime of the market.
Our GARCH-DCC-Copula risk model can significantly improve both risk- and
alpha-based global tactical asset allocation strategies. Our risk regime has
strong predictive power for quantitative equity factor performance, which can
help equity investors to build better factor models and asset allocation
managers to construct more efficient risk premia portfolios.

arXiv link: http://arxiv.org/abs/2506.12587v1

Econometrics arXiv updated paper (originally submitted: 2025-06-14)

Moment Restrictions for Nonlinear Panel Data Models with Feedback

Authors: Stéphane Bonhomme, Kevin Dano, Bryan S. Graham

Many panel data methods, while allowing for general dependence between
covariates and time-invariant agent-specific heterogeneity, place strong a
priori restrictions on feedback: how past outcomes, covariates, and
heterogeneity map into future covariate levels. Ruling out feedback entirely,
as often occurs in practice, is unattractive in many dynamic economic settings.
We provide a general characterization of all feedback and heterogeneity robust
(FHR) moment conditions for nonlinear panel data models and present
constructive methods to derive feasible moment-based estimators for specific
models. We also use our moment characterization to compute semiparametric
efficiency bounds, allowing for a quantification of the information loss
associated with accommodating feedback, as well as providing insight into how
to construct estimators with good efficiency properties in practice. Our
results apply both to the finite dimensional parameter indexing the parametric
part of the model as well as to estimands that involve averages over the
distribution of unobserved heterogeneity. We illustrate our methods by
providing a complete characterization of all FHR moment functions in the
multi-spell mixed proportional hazards model. We compute efficient moment
functions for both model parameters and average effects in this setting.

arXiv link: http://arxiv.org/abs/2506.12569v2

Econometrics arXiv updated paper (originally submitted: 2025-06-13)

Optimal treatment assignment rules under capacity constraints

Authors: Keita Sunada, Kohei Izumi

We study treatment assignment problems under capacity constraints, where a
planner aims to maximize social welfare by assigning treatments based on
observable covariates. Such constraints, common when treatments are costly or
limited in supply, introduce nontrivial challenges for deriving optimal
statistical assignment rules because the planner needs to coordinate treatment
assignment probabilities across the entire covariate distribution. To address
these challenges, we reformulate the planner's constrained maximization problem
as an optimal transport problem, which makes the problem effectively
unconstrained. We then establish local asymptotic optimality results of
assignment rules using a limits of experiments framework. Finally, we
illustrate our method with a voucher assignment problem for private secondary
school attendance using data from Angrist et al. (2006).
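
To make the role of the capacity constraint concrete, here is a deliberately
tiny numerical illustration with discrete covariate cells and made-up numbers,
solved as a plain linear program rather than via the paper's optimal transport
reformulation:

    import numpy as np
    from scipy.optimize import linprog

    tau = np.array([0.8, 0.3, -0.1, 0.5, 0.05])   # estimated welfare gain per cell
    freq = np.array([0.3, 0.2, 0.2, 0.2, 0.1])    # population share of each cell
    capacity = 0.4                                 # at most 40% of the population treated

    # linprog minimizes, so the welfare objective is negated
    res = linprog(c=-(freq * tau),
                  A_ub=freq.reshape(1, -1), b_ub=[capacity],
                  bounds=[(0, 1)] * len(tau))
    print("optimal treatment probabilities by cell:", np.round(res.x, 3))

The solution treats the highest-gain cells first and rations the marginal cell,
which illustrates the kind of coordination across the covariate distribution
that the constraint induces.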

arXiv link: http://arxiv.org/abs/2506.12225v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-06-13

Partial identification via conditional linear programs: estimation and policy learning

Authors: Eli Ben-Michael

Many important quantities of interest are only partially identified from
observable data: the data can limit them to a set of plausible values, but not
uniquely determine them. This paper develops a unified framework for
covariate-assisted estimation, inference, and decision making in partial
identification problems where the parameter of interest satisfies a series of
linear constraints, conditional on covariates. In such settings, bounds on the
parameter can be written as expectations of solutions to conditional linear
programs that optimize a linear function subject to linear constraints, where
both the objective function and the constraints may depend on covariates and
need to be estimated from data. Examples include estimands involving the joint
distributions of potential outcomes, policy learning with inequality-aware
value functions, and instrumental variable settings. We propose two de-biased
estimators for bounds defined by conditional linear programs. The first
directly solves the conditional linear programs with plugin estimates and uses
output from standard LP solvers to de-bias the plugin estimate, avoiding the
need for computationally demanding vertex enumeration of all possible solutions
for symbolic bounds. The second uses entropic regularization to create smooth
approximations to the conditional linear programs, trading a small amount of
approximation error for improved estimation and computational efficiency. We
establish conditions for asymptotic normality of both estimators, show that
both estimators are robust to first-order errors in estimating the conditional
constraints and objectives, and construct Wald-type confidence intervals for
the partially identified parameters. These results also extend to policy
learning problems where the value of a decision policy is only partially
identified. We apply our methods to a study on the effects of Medicaid
enrollment.

arXiv link: http://arxiv.org/abs/2506.12215v2

Econometrics arXiv paper, submitted: 2025-06-13

Evaluating Program Sequences with Double Machine Learning: An Application to Labor Market Policies

Authors: Fabian Muny

Many programs evaluated in observational studies incorporate a sequential
structure, where individuals may be assigned to various programs over time.
While this complexity is often simplified by analyzing programs at single
points in time, this paper reviews, explains, and applies methods for program
evaluation within a sequential framework. It outlines the assumptions required
for identification under dynamic confounding and demonstrates how extending
sequential estimands to dynamic policies enables the construction of more
realistic counterfactuals. Furthermore, the paper explores recently developed
methods for estimating effects across multiple treatments and time periods,
utilizing Double Machine Learning (DML), a flexible estimator that avoids
parametric assumptions while preserving desirable statistical properties. Using
Swiss administrative data, the methods are demonstrated through an empirical
application assessing the participation of unemployed individuals in active
labor market policies, where assignment decisions by caseworkers can be
reconsidered between two periods. The analysis identifies a temporary wage
subsidy as the most effective intervention, on average, even after adjusting
for its extended duration compared to other programs. Overall, DML-based
analysis of dynamic policies proves to be a useful approach within the program
evaluation toolkit.

arXiv link: http://arxiv.org/abs/2506.11960v1

Econometrics arXiv updated paper (originally submitted: 2025-06-13)

Structural Representations and Identification of Marginal Policy Effects

Authors: Zhixin Wang, Yu Zhang, Zhengyu Zhang

This paper investigates the structural interpretation of the marginal policy
effect (MPE) within nonseparable models. We demonstrate that, for a smooth
functional of the outcome distribution, the MPE equals its functional
derivative evaluated at the outcome-conditioned weighted average structural
derivative. This equivalence is definitional rather than identification-based.
Building on this theoretical result, we propose an alternative identification
strategy for the MPE that complements existing methods.

arXiv link: http://arxiv.org/abs/2506.11694v2

Econometrics arXiv paper, submitted: 2025-06-13

Identification and Inference of Partial Effects in Sharp Regression Kink Designs

Authors: Zhixin Wang, Zhengyu Zhang

The partial effect refers to the impact of a change in a target variable D on
the distribution of an outcome variable Y . This study examines the
identification and inference of a wide range of partial effects at the
threshold in the sharp regression kink (RK) design under general policy
interventions. We establish a unifying framework for conducting inference on
the effect of an infinitesimal change in D on smooth functionals of the
distribution of Y, particularly when D is endogenous and instrumental variables
are unavailable. This framework yields a general formula that clarifies the
causal interpretation of numerous existing sharp RK estimands in the
literature.
We develop the relevant asymptotic theory, introduce a multiplier bootstrap
procedure for inference, and provide practical implementation guidelines.
Applying our method to the effect of unemployment insurance (UI) benefits on
unemployment duration, we find that while higher benefits lead to longer
durations, they also tend to reduce their dispersion. Furthermore, our results
show that the magnitude of the partial effect can change substantially
depending on the specific form of the policy intervention.

arXiv link: http://arxiv.org/abs/2506.11663v1

Econometrics arXiv paper, submitted: 2025-06-13

Let the Tree Decide: FABART A Non-Parametric Factor Model

Authors: Sofia Velasco

This article proposes a novel framework that integrates Bayesian Additive
Regression Trees (BART) into a Factor-Augmented Vector Autoregressive (FAVAR)
model to forecast macro-financial variables and examine asymmetries in the
transmission of oil price shocks. By employing nonparametric techniques for
dimension reduction, the model captures complex, nonlinear relationships
between observables and latent factors that are often missed by linear
approaches. A simulation experiment comparing FABART to linear alternatives and
a Monte Carlo experiment demonstrate that the framework accurately recovers the
relationship between latent factors and observables in the presence of
nonlinearities, while remaining consistent under linear data-generating
processes. The empirical application shows that FABART substantially improves
forecast accuracy for industrial production relative to linear benchmarks,
particularly during periods of heightened volatility and economic stress. In
addition, the model reveals pronounced sign asymmetries in the transmission of
oil supply news shocks to the U.S. economy, with positive shocks generating
stronger and more persistent contractions in real activity and inflation than
the expansions triggered by negative shocks. A similar pattern emerges at the
U.S. federal state level, where negative shocks lead to modest declines in
employment compared to the substantially larger contractions observed after
positive shocks.

arXiv link: http://arxiv.org/abs/2506.11551v1

Econometrics arXiv paper, submitted: 2025-06-12

Inference on panel data models with a generalized factor structure

Authors: Juan M. Rodriguez-Poo, Alexandra Soberon, Stefan Sperlich

We consider identification, inference and validation of linear panel data
models when both factors and factor loadings are accounted for by a
nonparametric function. This general specification encompasses rather popular
models such as the two-way fixed effects and the interactive fixed effects
ones. By applying a conditional mean independence assumption between unobserved
heterogeneity and the covariates, we obtain consistent estimators of the
parameters of interest at the optimal rate of convergence, for fixed and large
$T$. We also provide a specification test for the modeling assumption based on
the methodology of conditional moment tests and nonparametric estimation
techniques. Using degenerate and nondegenerate theories of U-statistics we show
its convergence and asymptotic distribution under the null, and that it
diverges under the alternative at a rate arbitrarily close to $NT$.
Finite-sample inference is based on the bootstrap. Simulations reveal an excellent
performance of our methods and an empirical application is conducted.

arXiv link: http://arxiv.org/abs/2506.10690v1

Econometrics arXiv paper, submitted: 2025-06-12

Nowcasting the euro area with social media data

Authors: Konstantin Boss, Luigi Longo, Luca Onorante

Using a state-of-the-art large language model, we extract forward-looking and
context-sensitive signals related to inflation and unemployment in the euro
area from millions of Reddit submissions and comments. We develop daily
indicators that incorporate, in addition to posts, the social interaction among
users. Our empirical results show consistent gains in out-of-sample nowcasting
accuracy relative to daily newspaper sentiment and financial variables,
especially in unusual times such as the (post-)COVID-19 period. We conclude
that the application of AI tools to the analysis of social media, specifically
Reddit, provides useful signals about inflation and unemployment in Europe at
daily frequency and constitutes a useful addition to the toolkit available to
economic forecasters and nowcasters.

arXiv link: http://arxiv.org/abs/2506.10546v1

Econometrics arXiv updated paper (originally submitted: 2025-06-11)

How much is too much? Measuring divergence from Benford's Law with the Equivalent Contamination Proportion (ECP)

Authors: Manuel Cano-Rodriguez

Conformity with Benford's Law is widely used to detect irregularities in
numerical datasets, particularly in accounting, finance, and economics.
However, the statistical tools commonly used for this purpose (such as
Chi-squared, MAD, or KS) suffer from three key limitations: sensitivity to
sample size, lack of interpretability of their scale, and the absence of a
common metric that allows for comparison across different statistics. This
paper introduces the Equivalent Contamination Proportion (ECP) to address these
issues. Defined as the proportion of contamination in a hypothetical
Benford-conforming sample such that the expected value of the divergence
statistic matches the one observed in the actual data, the ECP provides a
continuous and interpretable measure of deviation (ranging from 0 to 1), is
robust to sample size, and offers consistent results across different
divergence statistics under mild conditions. Closed-form and simulation-based
methods are developed for estimating the ECP, and, through a retrospective
analysis of three influential studies, it is shown how the ECP can complement
the information provided by traditional divergence statistics and enhance the
interpretation of results.
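
A rough simulation sketch of the ECP logic, under assumptions made here for
illustration only (a chi-square divergence, uniform first-digit contamination,
and placeholder observed values); the paper's closed-form and simulation-based
estimators are more refined:

    import numpy as np

    benford = np.log10(1 + 1 / np.arange(1, 10))       # P(first digit = d), d = 1..9

    def chi2_stat(digits, n):
        obs = np.bincount(digits, minlength=10)[1:]
        exp = n * benford
        return np.sum((obs - exp) ** 2 / exp)

    def expected_chi2(c, n, reps=300, rng=np.random.default_rng(0)):
        stats = []
        for _ in range(reps):
            d = rng.choice(np.arange(1, 10), size=n, p=benford)
            k = int(round(c * n))
            d[:k] = rng.integers(1, 10, size=k)        # contaminate a fraction c
            stats.append(chi2_stat(d, n))
        return np.mean(stats)

    observed_stat, n = 35.0, 1500                      # placeholder observed values
    grid = np.linspace(0, 0.5, 51)
    ecp = grid[np.argmin([abs(expected_chi2(c, n) - observed_stat) for c in grid])]
    print("equivalent contamination proportion (approx.):", round(float(ecp), 3))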

arXiv link: http://arxiv.org/abs/2506.09915v2

Econometrics arXiv paper, submitted: 2025-06-11

Estimating the Number of Components in Panel Data Finite Mixture Regression Models with an Application to Production Function Heterogeneity

Authors: Yu Hao, Hiroyuki Kasahara

This paper develops statistical methods for determining the number of
components in panel data finite mixture regression models with regression
errors independently distributed as normal or more flexible normal mixtures. We
analyze the asymptotic properties of the likelihood ratio test (LRT) and
information criteria (AIC and BIC) for model selection in both conditionally
independent and dynamic panel settings. Unlike cross-sectional normal mixture
models, we show that panel data structures eliminate higher-order degeneracy
problems while retaining issues of unbounded likelihood and infinite Fisher
information. Addressing these challenges, we derive the asymptotic null
distribution of the LRT statistic as the maximum of random variables and
develop a sequential testing procedure for consistent selection of the number
of components. Our theoretical analysis also establishes the consistency of BIC
and the inconsistency of AIC. Empirical application to Chilean manufacturing
data reveals significant heterogeneity in production technology, with
substantial variation in output elasticities of material inputs and
factor-augmented technological processes within narrowly defined industries,
indicating plant-specific variation in production functions beyond
Hicks-neutral technological differences. These findings contrast sharply with
the standard practice of assuming a homogeneous production function and
highlight the necessity of accounting for unobserved plant heterogeneity in
empirical production analysis.
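
As a hedged, much-simplified illustration of BIC-based selection of the number
of components (the paper treats panel mixture regressions; the snippet below
uses a plain cross-sectional Gaussian mixture purely to show the mechanics):

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(2)
    X = np.concatenate([rng.normal(-2, 1.0, 400),
                        rng.normal(3, 0.8, 600)]).reshape(-1, 1)   # two true components

    bic = {m: GaussianMixture(n_components=m, random_state=0).fit(X).bic(X)
           for m in range(1, 6)}
    print("BIC by number of components:", bic)
    print("selected number of components:", min(bic, key=bic.get))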

arXiv link: http://arxiv.org/abs/2506.09666v1

Econometrics arXiv paper, submitted: 2025-06-11

Diffusion index forecasts under weaker loadings: PCA, ridge regression, and random projections

Authors: Tom Boot, Bart Keijsers

We study the accuracy of forecasts in the diffusion index forecast model with
possibly weak loadings. The default option to construct forecasts is to
estimate the factors through principal component analysis (PCA) on the
available predictor matrix, and use the estimated factors to forecast the
outcome variable. Alternatively, we can directly relate the outcome variable to
the predictors through either ridge regression or random projections. We
establish that forecasts based on PCA, ridge regression and random projections
are consistent for the conditional mean under the same assumptions on the
strength of the loadings. However, under weaker loadings the convergence rate
is lower for ridge and random projections if the time dimension is small
relative to the cross-section dimension. We assess the relevance of these
findings in an empirical setting by comparing relative forecast accuracy for
monthly macroeconomic and financial variables using different window sizes. The
findings support the theoretical results, and at the same time show that
regularization-based procedures may be more robust in settings not covered by
the developed theory.
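
The three forecasting routes compared in the abstract can be lined up in a few
lines on simulated factor data (this only illustrates the estimators, not the
paper's theory or its empirical design):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression, RidgeCV
    from sklearn.random_projection import GaussianRandomProjection

    rng = np.random.default_rng(3)
    T, N, r = 200, 100, 3
    F = rng.normal(size=(T, r))                          # latent factors
    X = F @ rng.normal(size=(r, N)) + rng.normal(size=(T, N))
    y = F[:, 0] + 0.5 * rng.normal(size=T)               # target driven by factor 1
    train, test = slice(0, 150), slice(150, T)

    # (1) diffusion index: PCA factors, then OLS
    f_hat = PCA(n_components=r).fit(X[train]).transform(X)
    pred_pca = LinearRegression().fit(f_hat[train], y[train]).predict(f_hat[test])

    # (2) ridge regression directly on all predictors
    pred_ridge = RidgeCV(alphas=np.logspace(-2, 4, 20)).fit(X[train], y[train]).predict(X[test])

    # (3) random projection of the predictors, then OLS
    Z = GaussianRandomProjection(n_components=10, random_state=0).fit(X[train]).transform(X)
    pred_rp = LinearRegression().fit(Z[train], y[train]).predict(Z[test])

    for name, p in [("PCA", pred_pca), ("ridge", pred_ridge), ("random proj.", pred_rp)]:
        print(name, "out-of-sample MSE:", np.mean((y[test] - p) ** 2))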

arXiv link: http://arxiv.org/abs/2506.09575v1

Econometrics arXiv paper, submitted: 2025-06-10

Fragility in Average Treatment Effect on the Treated under Limited Covariate Support

Authors: Mengqi Li

This paper studies the identification of the average treatment effect on the
treated (ATT) under unconfoundedness when covariate overlap is partial. A
formal diagnostic is proposed to characterize empirical support -- the subset
of the covariate space where ATT is point-identified due to the presence of
comparable untreated units. Where support is absent, standard estimators remain
computable but cease to identify meaningful causal parameters. A general
sensitivity framework is developed, indexing identified sets by curvature
constraints on the selection mechanism. This yields a structural selection
frontier tracing the trade-off between assumption strength and inferential
precision. Two diagnostic statistics are introduced: the minimum assumption
strength for sign identification (MAS-SI), and a fragility index that
quantifies the minimal deviation from ignorability required to overturn
qualitative conclusions. Applied to the LaLonde (1986) dataset, the framework
reveals that nearly half the treated strata lack empirical support, rendering
the ATT undefined in those regions. Simulations confirm that ATT estimates may
be stable in magnitude yet fragile in epistemic content. These findings reframe
overlap not as a regularity condition but as a prerequisite for identification,
and recast sensitivity analysis as integral to empirical credibility rather
than auxiliary robustness.

arXiv link: http://arxiv.org/abs/2506.08950v1

Econometrics arXiv paper, submitted: 2025-06-10

Testing Shape Restrictions with Continuous Treatment: A Transformation Model Approach

Authors: Arkadiusz Szydłowski

We propose tests for the convexity/linearity/concavity of a transformation of
the dependent variable in a semiparametric transformation model. These tests
can be used to verify monotonicity of the treatment effect, or, equivalently,
concavity/convexity of the outcome with respect to the treatment, in
(quasi-)experimental settings. Our procedure does not require estimation of the
transformation or the distribution of the error terms, thus it is easy to
implement. The statistic takes the form of a U statistic or a localised U
statistic, and we show that critical values can be obtained by bootstrapping.
In our application we test the convexity of loan demand with respect to the
interest rate using experimental data from South Africa.

arXiv link: http://arxiv.org/abs/2506.08914v1

Econometrics arXiv paper, submitted: 2025-06-09

Enterprise value, economic and policy uncertainties: the case of US air carriers

Authors: Bahram Adrangi, Arjun Chatrath, Madhuparna Kolay, Kambiz Raffiee

The enterprise value (EV) is a crucial metric in company valuation as it
encompasses not only equity but also assets and liabilities, offering a
comprehensive measure of total value, especially for companies with diverse
capital structures. The relationship between economic uncertainty and firm
value is rooted in economic theory, with early studies dating back to Sandmo's
work in 1971, later elaborated upon by John Kenneth Galbraith in 1977.
Subsequent significant events have underscored the pivotal role of uncertainty
in the financial and economic realm. Using a VAR-MIDAS methodology, analysis of
accumulated impulse responses reveals that the EV of air carrier firms responds
heterogeneously to financial and economic uncertainties, suggesting unique
coping strategies. Most firms exhibit negative reactions to recessionary risks
and economic policy uncertainties. Financial shocks also elicit varied
responses, with positive impacts observed on EV in response to increases in the
current ratio and operating income after depreciation. However, high debt
levels are unfavorably received by the market, leading to negative EV responses
to debt-to-asset ratio shocks. Other financial shocks show mixed or
indeterminate impacts on EV.

arXiv link: http://arxiv.org/abs/2506.07766v1

Econometrics arXiv paper, submitted: 2025-06-09

Economic and Policy Uncertainties and Firm Value: The Case of Consumer Durable Goods

Authors: Bahram Adrangi, Saman Hatamerad, Madhuparna Kolay, Kambiz Raffiee

The objective of this study is to analyze the response of firm value,
represented by Tobin's Q (Q), of a group of twelve U.S. durable goods
producers to uncertainties in the US economy. The results, based on estimated
panel quantile regressions (PQR) and a panel vector autoregressive MIDAS model
(PVM), show that Q for these firms reacts negatively to positive shocks to the
current ratio and the debt-to-asset ratio, and positively to operating income
after depreciation and the quick ratio, in most quantiles.
Q of the firms under study reacts negatively to the economic policy
uncertainty, risk of recession, and inflationary expectation, but positively to
consumer confidence in most quantiles of its distribution. Finally, Granger
causality tests confirm that the uncertainty indicators considered in the study
are significant predictors of changes in the value of these companies as
reflected by Q.

arXiv link: http://arxiv.org/abs/2506.07476v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-06-09

Individual Treatment Effect: Prediction Intervals and Sharp Bounds

Authors: Zhehao Zhang, Thomas S. Richardson

Individual treatment effect (ITE) is often regarded as the ideal target of
inference in causal analyses and has been the focus of several recent studies.
In this paper, we describe the intrinsic limits regarding what can be learned
concerning ITEs given data from large randomized experiments. We consider when
a valid prediction interval for the ITE is informative and when it can be
bounded away from zero. The joint distribution over potential outcomes is only
partially identified from a randomized trial. Consequently, to be valid, an ITE
prediction interval must be valid for all joint distributions consistent with
the observed data and hence will in general be wider than that resulting from
knowledge of this joint distribution. We characterize prediction intervals in
the binary treatment and outcome setting, and extend these insights to models
with continuous and ordinal outcomes. We derive sharp bounds on the probability
mass function (pmf) of the individual treatment effect (ITE). Finally, we
contrast prediction intervals for the ITE and confidence intervals for the
average treatment effect (ATE). This also leads to the consideration of Fisher
versus Neyman null hypotheses. While confidence intervals for the ATE shrink
with increasing sample size due to its status as a population parameter,
prediction intervals for the ITE generally do not vanish, leading to scenarios
where one may reject the Neyman null yet still find evidence consistent with
the Fisher null, highlighting the challenges of individualized decision-making
under partial identification.

arXiv link: http://arxiv.org/abs/2506.07469v1

Econometrics arXiv paper, submitted: 2025-06-09

Does Residuals-on-Residuals Regression Produce Representative Estimates of Causal Effects?

Authors: Apoorva Lal, Winston Chou

Double Machine Learning is commonly used to estimate causal effects in large
observational datasets. The "residuals-on-residuals" regression estimator
(RORR) is especially popular for its simplicity and computational tractability.
However, when treatment effects are heterogeneous, the proper interpretation of
RORR may not be well understood. We show that, for many-valued treatments with
continuous dose-response functions, RORR converges to a conditional
variance-weighted average of derivatives evaluated at points not in the
observed dataset, which generally differs from the Average Causal Derivative
(ACD). Hence, even if all units share the same dose-response function, RORR
does not in general converge to an average treatment effect in the population
represented by the sample. We propose an alternative estimator suitable for
large datasets. We demonstrate the pitfalls of RORR and the favorable
properties of the proposed estimator in both an illustrative numerical example
and an application to real-world data from Netflix.
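
The gap between RORR and the average causal derivative (ACD) is easy to
reproduce on an invented data-generating process in which the conditional
variance of the treatment is correlated with the local slope of the
dose-response function (this sketch is not the authors' proposed alternative
estimator):

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import cross_val_predict

    rng = np.random.default_rng(4)
    n = 20000
    X = rng.uniform(0, 1, size=(n, 1))
    D = X[:, 0] + np.sqrt(X[:, 0]) * rng.normal(size=n)   # Var(D | X) = X
    Y = D ** 2 + rng.normal(size=n)                       # dY/dD = 2D, so ACD = E[2D] = 1

    # cross-fitted residuals-on-residuals
    y_res = Y - cross_val_predict(GradientBoostingRegressor(), X, Y, cv=5)
    d_res = D - cross_val_predict(GradientBoostingRegressor(), X, D, cv=5)
    rorr = np.sum(d_res * y_res) / np.sum(d_res ** 2)

    print("RORR estimate:", round(rorr, 3))            # population value is 4/3 in this DGP
    print("sample ACD:   ", round(np.mean(2 * D), 3))  # population value is 1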

arXiv link: http://arxiv.org/abs/2506.07462v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2025-06-08

Quantile-Optimal Policy Learning under Unmeasured Confounding

Authors: Zhongren Chen, Siyu Chen, Zhengling Qi, Xiaohong Chen, Zhuoran Yang

We study quantile-optimal policy learning where the goal is to find a policy
whose reward distribution has the largest $\alpha$-quantile for some $\alpha
\in (0, 1)$. We focus on the offline setting whose generating process involves
unobserved confounders. Such a problem suffers from three main challenges: (i)
nonlinearity of the quantile objective as a functional of the reward
distribution, (ii) unobserved confounding issue, and (iii) insufficient
coverage of the offline dataset. To address these challenges, we propose a
suite of causal-assisted policy learning methods that provably enjoy strong
theoretical guarantees under mild conditions. In particular, to address (i) and
(ii), using causal inference tools such as instrumental variables and negative
controls, we propose to estimate the quantile objectives by solving nonlinear
functional integral equations. Then we adopt a minimax estimation approach with
nonparametric models to solve these integral equations, and propose to
construct conservative policy estimates that address (iii). The final policy is
the one that maximizes these pessimistic estimates. In addition, we propose a
novel regularized policy learning method that is more amenable to computation.
Finally, we prove that the policies learned by these methods are
$\mathscr{O}(n^{-1/2})$ quantile-optimal under a mild coverage
assumption on the offline dataset. Here, $\mathscr{O}(\cdot)$ omits
poly-logarithmic factors. To the best of our knowledge, we propose the first
sample-efficient policy learning algorithms for estimating the quantile-optimal
policy when there exists unmeasured confounding.

arXiv link: http://arxiv.org/abs/2506.07140v1

Econometrics arXiv updated paper (originally submitted: 2025-06-07)

Inference on the value of a linear program

Authors: Leonard Goff, Eric Mbakop

This paper studies inference on the value of a linear program (LP) when both
the objective function and constraints are possibly unknown and must be
estimated from data. We show that many inference problems in partially
identified models can be reformulated in this way. Building on Shapiro (1991)
and Fang and Santos (2019), we develop a pointwise valid inference procedure
for the value of an LP. We modify this pointwise inference procedure to
construct one-sided inference procedures that are uniformly valid over large
classes of data-generating processes. Our results provide alternative testing
procedures for problems considered in Andrews et al. (2023), Cox and Shi
(2023), and Fang et al. (2023) (in the low-dimensional case), and remain valid
when key components--such as the coefficient matrix--are unknown and must be
estimated. Moreover, our framework also accommodates inference on the
identified set of a subvector, in models defined by linear moment inequalities,
and does so under weaker constraint qualifications than those in Gafarov
(2025).

arXiv link: http://arxiv.org/abs/2506.06776v2

Econometrics arXiv paper, submitted: 2025-06-06

Practically significant differences between conditional distribution functions

Authors: Holger Dette, Kathrin Möllenhoff, Dominik Wied

In the framework of semiparametric distribution regression, we consider the
problem of comparing the conditional distribution functions corresponding to
two samples. In contrast to testing for exact equality, we are interested in
the (null) hypothesis that the $L^2$ distance between the conditional
distribution functions does not exceed a certain threshold in absolute value.
The consideration of these hypotheses is motivated by the observation that in
applications, it is rare, and perhaps impossible, that a null hypothesis of
exact equality is satisfied and that the real question of interest is to detect
a practically significant deviation between the two conditional distribution
functions.
The consideration of a composite null hypothesis makes the testing problem
challenging, and in this paper we develop a pivotal test for such hypotheses.
Our approach is based on self-normalization and therefore requires neither the
estimation of (complicated) variances nor bootstrap approximations. We derive
the asymptotic limit distribution of the (appropriately normalized) test
statistic and show consistency under local alternatives. A simulation study and
an application to German SOEP data reveal the usefulness of the method.

arXiv link: http://arxiv.org/abs/2506.06545v1

Econometrics arXiv updated paper (originally submitted: 2025-06-06)

Statistical significance in choice modelling: computation, usage and reporting

Authors: Stephane Hess, Andrew Daly, Michiel Bliemer, Angelo Guevara, Ricardo Daziano, Thijs Dekker

This paper offers a commentary on the use of notions of statistical
significance in choice modelling. We argue that, as in many other areas of
science, there is an over-reliance on 95% confidence levels, and
misunderstandings of the meaning of significance. We also observe a lack of
precision in the reporting of measures of uncertainty in many studies,
especially when using p-values and even more so with star measures. The paper
provides a precise discussion on the computation of measures of uncertainty and
confidence intervals, discusses the use of statistical tests, and also stresses
the importance of considering behavioural or policy significance in addition to
statistical significance.

arXiv link: http://arxiv.org/abs/2506.05996v2

Econometrics arXiv paper, submitted: 2025-06-06

On Efficient Estimation of Distributional Treatment Effects under Covariate-Adaptive Randomization

Authors: Undral Byambadalai, Tomu Hirata, Tatsushi Oka, Shota Yasui

This paper focuses on the estimation of distributional treatment effects in
randomized experiments that use covariate-adaptive randomization (CAR). These
include designs such as Efron's biased-coin design and stratified block
randomization, where participants are first grouped into strata based on
baseline covariates and assigned treatments within each stratum to ensure
balance across groups. In practice, datasets often contain additional
covariates beyond the strata indicators. We propose a flexible distribution
regression framework that leverages off-the-shelf machine learning methods to
incorporate these additional covariates, enhancing the precision of
distributional treatment effect estimates. We establish the asymptotic
distribution of the proposed estimator and introduce a valid inference
procedure. Furthermore, we derive the semiparametric efficiency bound for
distributional treatment effects under CAR and demonstrate that our
regression-adjusted estimator attains this bound. Simulation studies and
empirical analyses of microcredit programs highlight the practical advantages
of our method.
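
A stripped-down sketch of the distribution regression idea (simulated data and
plain regression adjustment; not the paper's efficient, CAR-specific
estimator): for each threshold y, fit a logistic regression of 1{Y <= y} on
treatment and covariates and average the predicted probabilities under the two
treatment assignments.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(5)
    n = 3000
    X = rng.normal(size=(n, 3))
    D = rng.integers(0, 2, size=n)                    # randomized binary treatment
    Y = 0.5 * D + X[:, 0] + rng.normal(size=n)        # treatment shifts the outcome distribution

    for y in np.quantile(Y, [0.25, 0.5, 0.75]):
        clf = LogisticRegression(max_iter=1000).fit(np.column_stack([D, X]), (Y <= y).astype(int))
        p1 = clf.predict_proba(np.column_stack([np.ones(n), X]))[:, 1].mean()
        p0 = clf.predict_proba(np.column_stack([np.zeros(n), X]))[:, 1].mean()
        print(f"threshold {y:.2f}: estimated effect on P(Y <= y) = {p1 - p0:.3f}")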

arXiv link: http://arxiv.org/abs/2506.05945v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2025-06-05

Admissibility of Completely Randomized Trials: A Large-Deviation Approach

Authors: Guido Imbens, Chao Qin, Stefan Wager

When an experimenter has the option of running an adaptive trial, is it
admissible to ignore this option and run a non-adaptive trial instead? We
provide a negative answer to this question in the best-arm identification
problem, where the experimenter aims to allocate measurement efforts
judiciously to confidently deploy the most effective treatment arm. We find
that, whenever there are at least three treatment arms, there exist simple
adaptive designs that universally and strictly dominate non-adaptive completely
randomized trials. This dominance is characterized by a notion called
efficiency exponent, which quantifies a design's statistical efficiency when
the experimental sample is large. Our analysis focuses on the class of batched
arm elimination designs, which progressively eliminate underperforming arms at
pre-specified batch intervals. We characterize simple sufficient conditions
under which these designs universally and strictly dominate completely
randomized trials. These results resolve the second open problem posed in Qin
[2022].

arXiv link: http://arxiv.org/abs/2506.05329v1

Econometrics arXiv paper, submitted: 2025-06-05

Enhancing the Merger Simulation Toolkit with ML/AI

Authors: Harold D. Chiang, Jack Collison, Lorenzo Magnolfi, Christopher Sullivan

This paper develops a flexible approach to predict the price effects of
horizontal mergers using ML/AI methods. While standard merger simulation
techniques rely on restrictive assumptions about firm conduct, we propose a
data-driven framework that relaxes these constraints when rich market data are
available. We develop and identify a flexible nonparametric model of supply
that nests a broad range of conduct models and cost functions. To overcome the
curse of dimensionality, we adapt the Variational Method of Moments (VMM)
(Bennett and Kallus, 2023) to estimate the model, allowing for various forms of
strategic interaction. Monte Carlo simulations show that our method
significantly outperforms an array of misspecified models and rivals the
performance of the true model, both in predictive performance and
counterfactual merger simulations. As a way to interpret the economics of the
estimated function, we simulate pass-through and reveal that the model learns
markup and cost functions that imply approximately correct pass-through
behavior. Applied to the American Airlines-US Airways merger, our method
produces more accurate post-merger price predictions than traditional
approaches. The results demonstrate the potential for machine learning
techniques to enhance merger analysis while maintaining economic structure.

arXiv link: http://arxiv.org/abs/2506.05225v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-06-05

The Spurious Factor Dilemma: Robust Inference in Heavy-Tailed Elliptical Factor Models

Authors: Jiang Hu, Jiahui Xie, Yangchun Zhang, Wang Zhou

Factor models are essential tools for analyzing high-dimensional data,
particularly in economics and finance. However, standard methods for
determining the number of factors often overestimate the true number when data
exhibit heavy-tailed randomness, misinterpreting noise-induced outliers as
genuine factors. This paper addresses this challenge within the framework of
Elliptical Factor Models (EFM), which accommodate both heavy tails and
potential non-linear dependencies common in real-world data. We demonstrate
theoretically and empirically that heavy-tailed noise generates spurious
eigenvalues that mimic true factor signals. To distinguish these, we propose a
novel methodology based on a fluctuation magnification algorithm. We show that
under magnifying perturbations, the eigenvalues associated with real factors
exhibit significantly less fluctuation (stabilizing asymptotically) compared to
spurious eigenvalues arising from heavy-tailed effects. This differential
behavior allows the identification and detection of the true and spurious
factors. We develop a formal testing procedure based on this principle and
apply it to the problem of accurately selecting the number of common factors in
heavy-tailed EFMs. Simulation studies and real data analysis confirm the
effectiveness of our approach compared to existing methods, particularly in
scenarios with pronounced heavy-tailedness.

arXiv link: http://arxiv.org/abs/2506.05116v1

Econometrics arXiv updated paper (originally submitted: 2025-06-05)

Finite-Sample Distortion in Kernel Specification Tests: A Perturbation Analysis of Empirical Directional Components

Authors: Cui Rui, Li Yuhao, Song Xiaojun

This paper provides a new theoretical lens for understanding the
finite-sample performance of kernel-based specification tests, such as the
Kernel Conditional Moment (KCM) test. Rather than introducing a fundamentally
new test, we isolate and rigorously analyze the finite-sample distortion
arising from the discrepancy between the empirical and population eigenspaces
of the kernel operator. Using perturbation theory for compact operators, we
demonstrate that the estimation error in directional components is governed by
local eigengaps: components associated with small eigenvalues are highly
unstable and contribute primarily noise rather than signal under fixed
alternatives. Although this error vanishes asymptotically under the null, it
can substantially degrade power in finite samples. This insight explains why
the effective power of omnibus kernel tests is often concentrated in a
low-dimensional subspace. We illustrate how truncating unstable high-frequency
components--a natural consequence of our analysis--can improve finite-sample
performance, but emphasize that the core contribution is the diagnostic
understanding of why and when such instability occurs. The
analysis is largely non-asymptotic and applies broadly to reproducing kernel
Hilbert space-based inference.

arXiv link: http://arxiv.org/abs/2506.04900v2

Econometrics arXiv updated paper (originally submitted: 2025-06-04)

Latent Variable Autoregression with Exogenous Inputs

Authors: Daniil Bargman

This paper introduces a new least squares regression methodology called
(C)LARX: a (constrained) latent variable autoregressive model with exogenous
inputs. Two additional contributions are made as a side effect: First, a new
matrix operator is introduced for matrices and vectors with blocks along one
dimension; Second, a new latent variable regression (LVR) framework is proposed
for economics and finance. The empirical section examines how well the stock
market predicts real economic activity in the United States. (C)LARX models
outperform the baseline OLS specification in out-of-sample forecasts and offer
novel analytical insights about the underlying functional relationship.

arXiv link: http://arxiv.org/abs/2506.04488v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2025-06-04

What Makes Treatment Effects Identifiable? Characterizations and Estimators Beyond Unconfoundedness

Authors: Yang Cai, Alkis Kalavasis, Katerina Mamali, Anay Mehrotra, Manolis Zampetakis

Most of the widely used estimators of the average treatment effect (ATE) in
causal inference rely on the assumptions of unconfoundedness and overlap.
Unconfoundedness requires that the observed covariates account for all
correlations between the outcome and treatment. Overlap requires the existence
of randomness in treatment decisions for all individuals. Nevertheless, many
types of studies frequently violate unconfoundedness or overlap, for instance,
observational studies with deterministic treatment decisions - popularly known
as Regression Discontinuity designs - violate overlap.
In this paper, we initiate the study of general conditions that enable the
identification of the average treatment effect, extending beyond
unconfoundedness and overlap. In particular, following the paradigm of
statistical learning theory, we provide an interpretable condition that is
sufficient and necessary for the identification of ATE. Moreover, this
condition also characterizes the identification of the average treatment effect
on the treated (ATT) and can be used to characterize other treatment effects as
well. To illustrate the utility of our condition, we present several
well-studied scenarios where our condition is satisfied and, hence, we prove
that ATE can be identified in regimes that prior works could not capture. For
example, under mild assumptions on the data distributions, this holds for the
models proposed by Tan (2006) and Rosenbaum (2002), and the Regression
Discontinuity design model introduced by Thistlethwaite and Campbell (1960).
For each of these scenarios, we also show that, under natural additional
assumptions, ATE can be estimated from finite samples.
We believe these findings open new avenues for bridging learning-theoretic
insights and causal inference methodologies, particularly in observational
studies with complex treatment mechanisms.

arXiv link: http://arxiv.org/abs/2506.04194v2

Econometrics arXiv cross-link from cs.CY (cs.CY), submitted: 2025-06-04

Evaluating Large Language Model Capabilities in Assessing Spatial Econometrics Research

Authors: Giuseppe Arbia, Luca Morandini, Vincenzo Nardelli

This paper investigates Large Language Models (LLMs) ability to assess the
economic soundness and theoretical consistency of empirical findings in spatial
econometrics. We created original and deliberately altered "counterfactual"
summaries from 28 published papers (2005-2024), which were evaluated by a
diverse set of LLMs. The LLMs provided qualitative assessments and structured
binary classifications on variable choice, coefficient plausibility, and
publication suitability. The results indicate that while LLMs can expertly
assess the coherence of variable choices (with top models like GPT-4o achieving
an overall F1 score of 0.87), their performance varies significantly when
evaluating deeper aspects such as coefficient plausibility and overall
publication suitability. The results further revealed that the choice of LLM,
the specific characteristics of the paper and the interaction between these two
factors significantly influence the accuracy of the assessment, particularly
for nuanced judgments. These findings highlight LLMs' current strengths in
assisting with initial, more surface-level checks and their limitations in
performing comprehensive, deep economic reasoning, suggesting a potential
assistive role in peer review that still necessitates robust human oversight.

arXiv link: http://arxiv.org/abs/2506.06377v1

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2025-06-04

High-Dimensional Learning in Finance

Authors: Hasan Fallahgoul

Recent advances in machine learning have shown promising results for
financial prediction using large, over-parameterized models. This paper
provides theoretical foundations and empirical validation for understanding
when and how these methods achieve predictive success. I examine two key
aspects of high-dimensional learning in finance. First, I prove that
within-sample standardization in Random Fourier Features implementations
fundamentally alters the underlying Gaussian kernel approximation, replacing
shift-invariant kernels with training-set dependent alternatives. Second, I
establish information-theoretic lower bounds that identify when reliable
learning is impossible no matter how sophisticated the estimator. A detailed
quantitative calibration of the polynomial lower bound shows that with typical
parameter choices, e.g., 12,000 features, 12 monthly observations, and an
R-squared of 2-3%, the required sample size to escape the bound exceeds 25-30
years of data--well beyond any rolling window actually used. Thus, observed
out-of-sample success must originate from lower-complexity artefacts rather
than from the intended high-dimensional mechanism.
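
The first point about Random Fourier Features (RFF) can be checked numerically
in a short sketch (the bandwidth, sample size and number of features below are
arbitrary): raw RFFs approximate the Gaussian (RBF) kernel, whereas
standardizing the features within sample yields an object that depends on the
training data and no longer tracks that kernel.

    import numpy as np

    rng = np.random.default_rng(6)
    n, p, n_feat, gamma = 200, 10, 4000, 0.1

    X = rng.normal(size=(n, p))
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(p, n_feat))
    b = rng.uniform(0, 2 * np.pi, size=n_feat)
    Z = np.sqrt(2 / n_feat) * np.cos(X @ W + b)        # standard RFF map

    # exact RBF kernel k(x, x') = exp(-gamma * ||x - x'||^2)
    K = np.exp(-gamma * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))

    Zs = (Z - Z.mean(axis=0)) / Z.std(axis=0)          # within-sample standardization
    print("raw RFF deviation from RBF kernel:         ", np.abs(Z @ Z.T - K).mean())
    print("standardized RFF deviation from RBF kernel:", np.abs(Zs @ Zs.T / n_feat - K).mean())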

arXiv link: http://arxiv.org/abs/2506.03780v3

Econometrics arXiv paper, submitted: 2025-06-04

Conventional and Fuzzy Data Envelopment Analysis with deaR

Authors: Vicente J. Bolos, Rafael Benitez, Vicente Coll-Serrano

deaR is a recently developed R package for data envelopment analysis (DEA)
that implements a large number of conventional and fuzzy models, along with
super-efficiency models, cross-efficiency analysis, Malmquist index,
bootstrapping, and metafrontier analysis. It should be noted that deaR is the
only package to date that incorporates Kao-Liu, Guo-Tanaka and possibilistic
fuzzy models. The versatility of the package allows the user to work with
different returns to scale and orientations, as well as to consider special
features, namely non-controllable, non-discretionary or undesirable variables.
Moreover, it includes novel graphical representations that can help the user to
display the results. This paper is a comprehensive description of deaR,
reviewing all implemented models and giving examples of use.

arXiv link: http://arxiv.org/abs/2506.03766v1

Econometrics arXiv paper, submitted: 2025-06-04

Combine and conquer: model averaging for out-of-distribution forecasting

Authors: Stephane Hess, Sander van Cranenburgh

Travel behaviour modellers have an increasingly diverse set of models at
their disposal, ranging from traditional econometric structures to models from
mathematical psychology and data-driven approaches from machine learning. A key
question arises as to how well these different models perform in prediction,
especially when considering trips of different characteristics from those used
in estimation, i.e. out-of-distribution prediction, and whether better
predictions can be obtained by combining insights from the different models.
Across two case studies, we show that while data-driven approaches excel in
predicting mode choice for trips within the distance bands used in estimation,
beyond that range, the picture is fuzzy. To leverage the relative advantages of
the different model families and capitalise on the notion that multiple `weak'
models can result in more robust models, we put forward the use of a model
averaging approach that allocates weights to different model families as a
function of the distance between the characteristics of the trip for
which predictions are made, and those used in model estimation. Overall, we see
that the model averaging approach gives larger weight to models with stronger
behavioural or econometric underpinnings the more we move outside the interval
of trip distances covered in estimation. Across both case studies, we show that
our model averaging approach obtains improved performance both on the
estimation and validation data, and crucially also when predicting mode choices
for trips of distances outside the range used in estimation.

arXiv link: http://arxiv.org/abs/2506.03693v1

Econometrics arXiv cross-link from q-fin.CP (q-fin.CP), submitted: 2025-06-03

Deep Learning Enhanced Multivariate GARCH

Authors: Haoyuan Wang, Chen Liu, Minh-Ngoc Tran, Chao Wang

This paper introduces a novel multivariate volatility modeling framework,
named Long Short-Term Memory enhanced BEKK (LSTM-BEKK), that integrates deep
learning into multivariate GARCH processes. By combining the flexibility of
recurrent neural networks with the econometric structure of BEKK models, our
approach is designed to better capture nonlinear, dynamic, and high-dimensional
dependence structures in financial return data. The proposed model addresses
key limitations of traditional multivariate GARCH-based methods, particularly
in capturing persistent volatility clustering and asymmetric co-movement across
assets. Leveraging the data-driven nature of LSTMs, the framework adapts
effectively to time-varying market conditions, offering improved robustness and
forecasting performance. Empirical results across multiple equity markets
confirm that the LSTM-BEKK model achieves superior performance in terms of
out-of-sample portfolio risk forecast, while maintaining the interpretability
from the BEKK models. These findings highlight the potential of hybrid
econometric-deep learning models in advancing financial risk management and
multivariate volatility forecasting.

arXiv link: http://arxiv.org/abs/2506.02796v1

Econometrics arXiv paper, submitted: 2025-06-03

Orthogonality-Constrained Deep Instrumental Variable Model for Causal Effect Estimation

Authors: Shunxin Yao

OC-DeepIV is a neural network model designed for estimating causal effects.
It characterizes heterogeneity by adding interaction features and reduces
redundancy through orthogonal constraints. The model includes two feature
extractors, one for the instrumental variable Z and the other for the covariate
X*. The training process is divided into two stages: the first stage uses the
mean squared error (MSE) loss function, and the second stage incorporates
orthogonal regularization. Experimental results show that this model
outperforms DeepIV and DML in terms of accuracy and stability. Future research
directions include applying the model to real-world problems and handling scenarios with multiple treatment variables.

arXiv link: http://arxiv.org/abs/2506.02790v1

Econometrics arXiv paper, submitted: 2025-06-03

Get me out of this hole: a profile likelihood approach to identifying and avoiding inferior local optima in choice models

Authors: Stephane Hess, David Bunch, Andrew Daly

Choice modellers routinely acknowledge the risk of convergence to inferior
local optima when using structures other than a simple linear-in-parameters
logit model. At the same time, there is no consensus on appropriate mechanisms
for addressing this issue. Most analysts seem to ignore the problem, while
others try a set of different starting values, or put their faith in what they
believe to be more robust estimation approaches. This paper puts forward the
use of a profile likelihood approach that systematically analyses the parameter
space around an initial maximum likelihood estimate and tests for the existence
of better local optima in that space. We extend this to an iterative algorithm
which then progressively searches for the best local optimum under given
settings for the algorithm. Using a well known stated choice dataset, we show
how the approach identifies better local optima for both latent class and mixed
logit, with the potential for substantially different policy implications. In
the case studies we conduct, an added benefit of the approach is that the new
solutions adhere more closely to asymptotic normality, which also highlights the usefulness of the approach for analysing the statistical properties of a solution.

arXiv link: http://arxiv.org/abs/2506.02722v1

Econometrics arXiv updated paper (originally submitted: 2025-06-02)

Analysis of Multiple Long-Run Relations in Panel Data Models

Authors: Alexander Chudik, M. Hashem Pesaran, Ron P. Smith

The literature on panel cointegration is extensive but does not cover data
sets where the cross section dimension, $n$, is larger than the time series
dimension $T$. This paper proposes a novel methodology that filters out the
short run dynamics using sub-sample time averages as deviations from their
full-sample counterpart, and estimates the number of long-run relations and
their coefficients using eigenvalues and eigenvectors of the pooled covariance
matrix of these sub-sample deviations. We refer to this procedure as pooled
minimum eigenvalue (PME). We show that the PME estimator is consistent and
asymptotically normal as $n$ and $T \rightarrow \infty$ jointly, such that
$T\approx n^{d}$, with $d>0$ for consistency and $d>1/2$ for asymptotic
normality. Extensive Monte Carlo studies show that the number of long-run
relations can be estimated with high precision, and the PME estimators have
good size and power properties. The utility of our approach is illustrated by
micro and macro applications using Compustat and Penn World Tables.

arXiv link: http://arxiv.org/abs/2506.02135v3

Econometrics arXiv paper, submitted: 2025-06-02

Stock Market Telepathy: Graph Neural Networks Predicting the Secret Conversations between MINT and G7 Countries

Authors: Nurbanu Bursa

Emerging economies, particularly the MINT countries (Mexico, Indonesia,
Nigeria, and Türkiye), are gaining influence in global stock markets,
although they remain susceptible to the economic conditions of developed
countries like the G7 (Canada, France, Germany, Italy, Japan, the United
Kingdom, and the United States). This interconnectedness and sensitivity of
financial markets make understanding these relationships crucial for investors
and policymakers to predict stock price movements accurately. To this end, we
examined the main stock market indices of G7 and MINT countries from 2012 to
2024, using a recent graph neural network (GNN) algorithm called multivariate
time series forecasting with graph neural network (MTGNN). This method allows
for considering complex spatio-temporal connections in multivariate time
series. In the implementations, MTGNN revealed that the US and Canada are the
most influential G7 countries regarding stock indices in the forecasting
process, and Indonesia and Türkiye are the most influential MINT countries.
Additionally, our results showed that MTGNN outperformed traditional methods in
forecasting the prices of stock market indices for MINT and G7 countries.
Consequently, the study offers valuable insights into economic blocks' markets
and presents a compelling empirical approach to analyzing global stock market
dynamics using MTGNN.

arXiv link: http://arxiv.org/abs/2506.01945v1

Econometrics arXiv paper, submitted: 2025-06-02

Life Sequence Transformer: Generative Modelling for Counterfactual Simulation

Authors: Alberto Cabezas, Carlotta Montorsi

Social sciences rely on counterfactual analysis using surveys and
administrative data, generally depending on strong assumptions or the existence
of suitable control groups, to evaluate policy interventions and estimate
causal effects. We propose a novel approach that leverages the Transformer
architecture to simulate counterfactual life trajectories from large-scale
administrative records. Our contributions are: the design of a novel encoding
method that transforms longitudinal administrative data to sequences and the
proposal of a generative model tailored to life sequences with overlapping
events across life domains. We test our method using data from the Istituto
Nazionale di Previdenza Sociale (INPS), showing that it enables the realistic
and coherent generation of life trajectories. This framework offers a scalable
alternative to classical counterfactual identification strategies, such as
difference-in-differences and synthetic controls, particularly in contexts
where these methods are infeasible or their assumptions unverifiable. We
validate the model's utility by comparing generated life trajectories against
established findings from causal studies, demonstrating its potential to enrich
labour market research and policy evaluation through individual-level
simulations.

arXiv link: http://arxiv.org/abs/2506.01874v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2025-06-02

Spillovers and Effect Attenuation in Firearm Policy Research in the United States

Authors: Lee Kennedy-Shaffer, Alan Hamilton Kennedy

In the United States, firearm-related deaths and injuries are a major public
health issue. Because of limited federal action, state policies are
particularly important, and their evaluation informs the actions of other
policymakers. The movement of firearms across state and local borders, however,
can undermine the effectiveness of these policies and have statistical
consequences for their empirical evaluation. This movement causes spillover and
bypass effects of policies, wherein interventions affect nearby control states
and the lack of intervention in nearby states reduces the effectiveness in the
intervention states. While some causal inference methods exist to account for
spillover effects and reduce bias, these do not necessarily align well with the
data available for firearm research or with the most policy-relevant estimands.
Integrated data infrastructure and new methods are necessary for a better
understanding of the effects these policies would have if widely adopted. In
the meantime, appropriately understanding and interpreting effect estimates
from quasi-experimental analyses is crucial for ensuring that effective
policies are not dismissed due to these statistical challenges.

arXiv link: http://arxiv.org/abs/2506.01695v1

Econometrics arXiv paper, submitted: 2025-06-02

Large Bayesian VARs for Binary and Censored Variables

Authors: Joshua C. C. Chan, Michael Pfarrhofer

We extend the standard VAR to jointly model the dynamics of binary, censored
and continuous variables, and develop an efficient estimation approach that
scales well to high-dimensional settings. In an out-of-sample forecasting
exercise, we show that the proposed VARs forecast recessions and short-term
interest rates well. We demonstrate the utility of the proposed framework using
a wide range of empirical applications, including conditional forecasting and a
structural analysis that examines the dynamic effects of a financial shock on
recession probabilities.

arXiv link: http://arxiv.org/abs/2506.01422v1

Econometrics arXiv updated paper (originally submitted: 2025-06-01)

Can AI Master Econometrics? Evidence from Econometrics AI Agent on Expert-Level Tasks

Authors: Qiang Chen, Tianyang Han, Jin Li, Ye Luo, Yuxiao Wu, Xiaowei Zhang, Tuo Zhou

Can AI effectively perform complex econometric analysis traditionally
requiring human expertise? This paper evaluates AI agents' capability to master
econometrics, focusing on empirical analysis performance. We develop an
“Econometrics AI Agent” built on the open-source MetaGPT framework. This
agent exhibits outstanding performance in: (1) planning econometric tasks
strategically, (2) generating and executing code, (3) employing error-based
reflection for improved robustness, and (4) allowing iterative refinement
through multi-round conversations. We construct two datasets from academic
coursework materials and published research papers to evaluate performance
against real-world challenges. Comparative testing shows our domain-specialized
AI agent significantly outperforms both benchmark large language models (LLMs)
and general-purpose AI agents. This work establishes a testbed for exploring
AI's impact on social science research and enables cost-effective integration
of domain expertise, making advanced econometric methods accessible to users
with minimal coding skills. Furthermore, our AI agent enhances research
reproducibility and offers promising pedagogical applications for econometrics
teaching.

arXiv link: http://arxiv.org/abs/2506.00856v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2025-05-31

Learning from Double Positive and Unlabeled Data for Potential-Customer Identification

Authors: Masahiro Kato, Yuki Ikeda, Kentaro Baba, Takashi Imai, Ryo Inokuchi

In this study, we propose a method for identifying potential customers in
targeted marketing by applying learning from positive and unlabeled data (PU
learning). We consider a scenario in which a company sells a product and can
observe only the customers who purchased it. Decision-makers seek to market
products effectively based on whether people have loyalty to the company.
Individuals with loyalty are those who are likely to remain interested in the
company even without additional advertising. Consequently, those loyal
customers would likely purchase from the company if they are interested in the
product. In contrast, people with lower loyalty may overlook the product or buy
similar products from other companies unless they receive marketing attention.
Therefore, by focusing marketing efforts on individuals who are interested in
the product but do not have strong loyalty, we can achieve more efficient
marketing. To achieve this goal, we consider how to learn, from limited data, a
classifier that identifies potential customers who (i) have interest in the
product and (ii) do not have loyalty to the company. Although our algorithm
comprises a single-stage optimization, its objective function implicitly
contains two losses derived from standard PU learning settings. For this
reason, we refer to our approach as double PU learning. We verify the validity
of the proposed algorithm through numerical experiments, confirming that it
functions appropriately for the problem at hand.

arXiv link: http://arxiv.org/abs/2506.00436v2

Econometrics arXiv paper, submitted: 2025-05-30

Residual Income Valuation and Stock Returns: Evidence from a Value-to-Price Investment Strategy

Authors: Ahmad Haboub, Aris Kartsaklas, Vasilis Sarafidis

We hypothesize that portfolio sorts based on the V/P ratio generate excess
returns and consist of companies that are undervalued for prolonged periods.
Results for the US market show that high V/P portfolios outperform low V/P
portfolios across horizons extending from one to three years. The V/P ratio is
positively correlated to future stock returns after controlling for firm
characteristics, which are well-known risk proxies. Findings also indicate that profitability and investment add explanatory power to the Fama and French three-factor model, notably for stocks with a V/P ratio close to 1. However, these factors cannot explain all of the variation in excess returns, especially for years two and three and for stocks with a high V/P ratio. Finally, portfolios with the highest
V/P stocks select companies that are significantly mispriced relative to their
equity (investment) and profitability growth persistence in the future.

arXiv link: http://arxiv.org/abs/2506.00206v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2025-05-30

Aligning Language Models with Observational Data: Opportunities and Risks from a Causal Perspective

Authors: Erfan Loghmani

Large language models are being widely used across industries to generate
content that contributes directly to key performance metrics, such as
conversion rates. Pretrained models, however, often fall short when it comes to
aligning with human preferences or optimizing for business objectives. As a
result, fine-tuning with good-quality labeled data is essential to guide models
to generate content that achieves better results. Controlled experiments, like
A/B tests, can provide such data, but they are often expensive and come with
significant engineering and logistical challenges. Meanwhile, companies have
access to a vast amount of historical (observational) data that remains
underutilized. In this work, we study the challenges and opportunities of
fine-tuning LLMs using observational data. We show that while observational
outcomes can provide valuable supervision, directly fine-tuning models on such
data can lead them to learn spurious correlations. We present empirical
evidence of this issue using various real-world datasets and propose
DeconfoundLM, a method that explicitly removes the effect of known confounders
from reward signals. Using simulation experiments, we demonstrate that
DeconfoundLM improves the recovery of causal relationships and mitigates
failure modes found in fine-tuning methods that ignore or naively incorporate
confounding variables. Our findings highlight that while observational data
presents risks, with the right causal corrections, it can be a powerful source
of signal for LLM alignment. Please refer to the project page for code and
related resources.

arXiv link: http://arxiv.org/abs/2506.00152v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-05-30

Data Fusion for Partial Identification of Causal Effects

Authors: Quinn Lanners, Cynthia Rudin, Alexander Volfovsky, Harsh Parikh

Data fusion techniques integrate information from heterogeneous data sources
to improve learning, generalization, and decision making across data sciences.
In causal inference, these methods leverage rich observational data to improve
causal effect estimation, while maintaining the trustworthiness of randomized
controlled trials. Existing approaches often relax the strong no unobserved
confounding assumption by instead assuming exchangeability of counterfactual
outcomes across data sources. However, when both assumptions simultaneously
fail - a common scenario in practice - current methods cannot identify or
estimate causal effects. We address this limitation by proposing a novel
partial identification framework that enables researchers to answer key
questions such as: Is the causal effect positive or negative? and How severe
must assumption violations be to overturn this conclusion? Our approach
introduces interpretable sensitivity parameters that quantify assumption
violations and derives corresponding causal effect bounds. We develop doubly
robust estimators for these bounds and operationalize breakdown frontier
analysis to understand how causal conclusions change as assumption violations
increase. We apply our framework to the Project STAR study, which investigates
the effect of classroom size on students' third-grade standardized test
performance. Our analysis reveals that the Project STAR results are robust to
simultaneous violations of key assumptions, both on average and across various
subgroups of interest. This strengthens confidence in the study's conclusions
despite potential unmeasured biases in the data.

arXiv link: http://arxiv.org/abs/2505.24296v1

Econometrics arXiv updated paper (originally submitted: 2025-05-29)

A Gibbs Sampler for Efficient Bayesian Inference in Sign-Identified SVARs

Authors: Jonas E. Arias, Juan F. Rubio-Ramírez, Minchul Shin

We develop a new algorithm for inference based on structural vector
autoregressions (SVARs) identified with sign restrictions. The key insight of
our algorithm is to break apart from the accept-reject tradition associated
with sign-identified SVARs. We show that embedding an elliptical slice sampling
within a Gibbs sampler approach can deliver dramatic gains in speed and turn
previously infeasible applications into feasible ones. We provide a tractable
example to illustrate the power of the elliptical slice sampling applied to
sign-identified SVARs. We demonstrate the usefulness of our algorithm by
applying it to a well-known small-SVAR model of the oil market featuring a
tight identified set, as well as to a large SVAR model with more than 100 sign
restrictions.

arXiv link: http://arxiv.org/abs/2505.23542v2
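
As background for the algorithm described above, here is a generic elliptical slice sampling step (Murray, Adams and MacKay, 2010) of the kind that can be embedded within a Gibbs sweep. The Gaussian prior and log-likelihood below are placeholders, not the sign-identified SVAR posterior.

```python
# One elliptical slice sampling update for f ~ N(0, Sigma), Sigma = L L'.
import numpy as np

def elliptical_slice(f, prior_chol, log_lik, rng):
    nu = prior_chol @ rng.standard_normal(f.shape)     # auxiliary draw from the prior
    log_y = log_lik(f) + np.log(rng.uniform())         # slice level
    theta = rng.uniform(0.0, 2.0 * np.pi)
    lo, hi = theta - 2.0 * np.pi, theta
    while True:
        f_prop = f * np.cos(theta) + nu * np.sin(theta)
        if log_lik(f_prop) > log_y:
            return f_prop
        # shrink the angle bracket towards the current state and retry
        if theta < 0.0:
            lo = theta
        else:
            hi = theta
        theta = rng.uniform(lo, hi)

# Toy use: N(0, I) prior with a Gaussian "likelihood" centred at 1.
rng = np.random.default_rng(1)
f = np.zeros(3)
loglik = lambda x: -0.5 * np.sum((x - 1.0) ** 2)
for _ in range(100):
    f = elliptical_slice(f, np.eye(3), loglik, rng)
print(f)
```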

Econometrics arXiv paper, submitted: 2025-05-29

Evaluating financial tail risk forecasts: Testing Equal Predictive Ability

Authors: Lukas Bauer

This paper provides comprehensive simulation results on the finite sample
properties of the Diebold-Mariano (DM) test by Diebold and Mariano (1995) and
the model confidence set (MCS) testing procedure by Hansen et al. (2011)
applied to the asymmetric loss functions specific to financial tail risk
forecasts, such as Value-at-Risk (VaR) and Expected Shortfall (ES). We focus on
statistical loss functions that are strictly consistent in the sense of
Gneiting (2011a). We find that the tests show little power against models that
underestimate the tail risk at the most extreme quantile levels, while the
finite sample properties generally improve with the quantile level and the
out-of-sample size. For the small quantile levels and out-of-sample sizes of up
to two years, we observe heavily skewed test statistics and non-negligible type
III errors, which implies that researchers should be cautious about using
standard normal or bootstrapped critical values. We demonstrate both
empirically and theoretically how these unfavorable finite sample results
relate to the asymmetric loss functions and the time-varying volatility
inherent in financial return data.

arXiv link: http://arxiv.org/abs/2505.23333v1
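
To fix ideas, the sketch below compares two VaR forecast series with a Diebold-Mariano statistic under the quantile ("tick") loss, which is strictly consistent for the quantile. The HAC lag choice and the toy forecasts are illustrative assumptions, not the paper's simulation design.

```python
# Illustrative DM comparison of two VaR forecasts under the tick loss.
import numpy as np
from math import erfc, sqrt

def tick_loss(returns, var_forecast, alpha):
    """Quantile (pinball) loss for an alpha-quantile VaR forecast."""
    u = returns - var_forecast
    return (alpha - (u < 0).astype(float)) * u

def dm_test(loss1, loss2, lags=5):
    d = loss1 - loss2
    T = d.size
    # Newey-West (Bartlett) long-run variance of the loss differential
    lrv = np.var(d, ddof=0)
    for k in range(1, lags + 1):
        gamma_k = np.cov(d[k:], d[:-k], ddof=0)[0, 1]
        lrv += 2.0 * (1.0 - k / (lags + 1)) * gamma_k
    dm = d.mean() / np.sqrt(lrv / T)
    return dm, erfc(abs(dm) / sqrt(2.0))        # two-sided normal p-value

rng = np.random.default_rng(2)
r = rng.standard_t(df=5, size=500) * 0.01
var_a = np.full_like(r, np.quantile(r, 0.025))   # constant toy VaR forecast
var_b = var_a * 0.8                              # a model that underestimates tail risk
dm, pval = dm_test(tick_loss(r, var_a, 0.025), tick_loss(r, var_b, 0.025))
print(dm, pval)
```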

Econometrics arXiv paper, submitted: 2025-05-28

A Synthetic Business Cycle Approach to Counterfactual Analysis with Nonstationary Macroeconomic Data

Authors: Zhentao Shi, Jin Xi, Haitian Xie

This paper investigates the use of synthetic control methods for causal
inference in macroeconomic settings when dealing with possibly nonstationary
data. While the synthetic control approach has gained popularity for estimating
counterfactual outcomes, we caution researchers against assuming a common
nonstationary trend factor across units for macroeconomic outcomes, as doing so
may result in misleading causal estimation-a pitfall we refer to as the
spurious synthetic control problem. To address this issue, we propose a
synthetic business cycle framework that explicitly separates trend and cyclical
components. By leveraging the treated unit's historical data to forecast its
trend and using control units only for cyclical fluctuations, our
divide-and-conquer strategy eliminates spurious correlations and improves the
robustness of counterfactual prediction in macroeconomic applications. As
empirical illustrations, we examine the cases of German reunification and the
handover of Hong Kong, demonstrating the advantages of the proposed approach.

arXiv link: http://arxiv.org/abs/2505.22388v1

Econometrics arXiv updated paper (originally submitted: 2025-05-28)

Causal Inference for Experiments with Latent Outcomes: Key Results and Their Implications for Design and Analysis

Authors: Jiawei Fu, Donald P. Green

How should researchers analyze randomized experiments in which the main
outcome is measured in multiple ways but each measure contains some degree of
error? We describe modeling approaches that enable researchers to identify
causal parameters of interest, suggest ways that experimental designs can be
augmented so as to make linear latent variable models more credible, and
discuss empirical tests of key modeling assumptions. We show that when
experimental researchers invest appropriately in multiple outcome measures, an
optimally weighted index of the outcome measures enables researchers to obtain
efficient and interpretable estimates of causal parameters by applying standard
regression methods, and that weights may be obtained using instrumental
variables regression. Maximum likelihood and generalized method of moments
estimators can be used to obtain estimates and standard errors in a single
step. An empirical application illustrates the gains in precision and
robustness that multiple outcome measures can provide.

arXiv link: http://arxiv.org/abs/2505.21909v2

Econometrics arXiv cross-link from cs.AI (cs.AI), submitted: 2025-05-27

Learning Individual Behavior in Agent-Based Models with Graph Diffusion Networks

Authors: Francesco Cozzi, Marco Pangallo, Alan Perotti, André Panisson, Corrado Monti

Agent-Based Models (ABMs) are powerful tools for studying emergent properties
in complex systems. In ABMs, agent behaviors are governed by local interactions
and stochastic rules. However, these rules are, in general, non-differentiable,
limiting the use of gradient-based methods for optimization, and thus
integration with real-world data. We propose a novel framework to learn a
differentiable surrogate of any ABM by observing its generated data. Our method
combines diffusion models to capture behavioral stochasticity and graph neural
networks to model agent interactions. Distinct from prior surrogate approaches,
our method introduces a fundamental shift: rather than approximating
system-level outputs, it models individual agent behavior directly, preserving
the decentralized, bottom-up dynamics that define ABMs. We validate our
approach on two ABMs (Schelling's segregation model and a Predator-Prey
ecosystem) showing that it replicates individual-level patterns and accurately
forecasts emergent dynamics beyond training. Our results demonstrate the
potential of combining diffusion models and graph learning for data-driven ABM
simulation.

arXiv link: http://arxiv.org/abs/2505.21426v1

Econometrics arXiv paper, submitted: 2025-05-27

Conditional Method Confidence Set

Authors: Lukas Bauer, Ekaterina Kazak

This paper proposes a Conditional Method Confidence Set (CMCS), which allows one to select the best subset of forecasting methods with equal predictive ability
conditional on a specific economic regime. The test resembles the Model
Confidence Set by Hansen et al. (2011) and is adapted for conditional forecast
evaluation. We show the asymptotic validity of the proposed test and illustrate
its properties in a simulation study. The proposed testing procedure is
particularly suitable for stress-testing of financial risk models required by
the regulators. We showcase the empirical relevance of the CMCS using the
stress-testing scenario of Expected Shortfall. The empirical evidence suggests
that the proposed CMCS procedure can be used as a robust tool for forecast
evaluation of market risk models for different economic regimes.

arXiv link: http://arxiv.org/abs/2505.21278v1

Econometrics arXiv updated paper (originally submitted: 2025-05-27)

Nonparametric "rich covariates" without saturation

Authors: Ludgero Glorias, Federico Martellosio, J. M. C. Santos Silva

We consider two nonparametric approaches to ensure that linear instrumental
variables estimators satisfy the rich-covariates condition emphasized by
Blandhol et al. (2025), even when the instrument is not unconditionally
randomly assigned and the model is not saturated. Both approaches start with a
nonparametric estimate of the expectation of the instrument conditional on the
covariates, and ensure that the rich-covariates condition is satisfied either
by using as the instrument the difference between the original instrument and
its estimated conditional expectation, or by adding the estimated conditional
expectation to the set of regressors. We derive asymptotic properties when the
first step uses kernel regression, and assess finite-sample performance in
simulations where we also use neural networks in the first step. Finally, we
present an empirical illustration that highlights some significant advantages
of the proposed methods.

arXiv link: http://arxiv.org/abs/2505.21213v2
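
A minimal sketch of the first of the two approaches described above: estimate E[Z | X] by Nadaraya-Watson kernel regression and use the residualized instrument in a just-identified linear IV regression. The bandwidth, the data-generating process, and the function names are assumptions for illustration only.

```python
# Residualized-instrument IV with a kernel-regression first step.
import numpy as np

def nw_fit(x, z, bandwidth):
    """Nadaraya-Watson estimate of E[Z | X = x_i] at the sample points."""
    diffs = (x[:, None] - x[None, :]) / bandwidth
    K = np.exp(-0.5 * diffs ** 2)
    return (K @ z) / K.sum(axis=1)

def iv_estimate(y, d, instrument):
    """Just-identified IV slope: cov(instrument, y) / cov(instrument, d)."""
    zc = instrument - instrument.mean()
    return (zc @ (y - y.mean())) / (zc @ (d - d.mean()))

rng = np.random.default_rng(3)
n = 2000
x = rng.uniform(-1, 1, n)
z = (rng.uniform(size=n) < 0.3 + 0.4 * (x > 0)).astype(float)   # Z depends on X
u = rng.standard_normal(n)
d = 0.8 * z + 0.5 * x + 0.5 * u + rng.standard_normal(n)         # endogenous treatment
y = 1.0 * d + x + u + rng.standard_normal(n)                     # true effect = 1

z_resid = z - nw_fit(x, z, bandwidth=0.2)   # instrument satisfying the rich-covariates idea
print(iv_estimate(y, d, z_resid))            # close to 1 despite Z depending on X
```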

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-05-27

Debiased Ill-Posed Regression

Authors: AmirEmad Ghassami, James M. Robins, Andrea Rotnitzky

In various statistical settings, the goal is to estimate a function which is
restricted by the statistical model only through a conditional moment
restriction. Prominent examples include the nonparametric instrumental variable
framework for estimating the structural function of the outcome variable, and
the proximal causal inference framework for estimating the bridge functions. A
common strategy in the literature is to find the minimizer of the projected
mean squared error. However, this approach can be sensitive to misspecification
or slow convergence rate of the estimators of the involved nuisance components.
In this work, we propose a debiased estimation strategy based on the influence
function of a modification of the projected error and demonstrate its
finite-sample convergence rate. Our proposed estimator possesses a second-order
bias with respect to the involved nuisance functions and a desirable robustness
property with respect to the misspecification of one of the nuisance functions.
The proposed estimator involves a hyper-parameter, for which the optimal value
depends on potentially unknown features of the underlying data-generating
process. Hence, we further propose a hyper-parameter selection approach based
on cross-validation and derive an error bound for the resulting estimator. This
analysis highlights the potential rate loss due to hyper-parameter selection
and underscores the importance and advantages of incorporating debiasing in this
setting. We also study the application of our approach to the estimation of
regular parameters in a specific parameter class, which are linear functionals
of the solutions to the conditional moment restrictions and provide sufficient
conditions for achieving root-n consistency using our debiased estimator.

arXiv link: http://arxiv.org/abs/2505.20787v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2025-05-26

Covariate-Adjusted Deep Causal Learning for Heterogeneous Panel Data Models

Authors: Guanhao Zhou, Yuefeng Han, Xiufan Yu

This paper studies the task of estimating heterogeneous treatment effects in
causal panel data models, in the presence of covariate effects. We propose a
novel Covariate-Adjusted Deep Causal Learning (CoDEAL) for panel data models,
that employs flexible model structures and powerful neural network
architectures to cohesively deal with the underlying heterogeneity and
nonlinearity of both panel units and covariate effects. The proposed CoDEAL
integrates nonlinear covariate effect components (parameterized by a
feed-forward neural network) with nonlinear factor structures (modeled by a
multi-output autoencoder) to form a heterogeneous causal panel model. The
nonlinear covariate component offers a flexible framework for capturing the
complex influences of covariates on outcomes. The nonlinear factor analysis
enables CoDEAL to effectively capture both cross-sectional and temporal
dependencies inherent in the data panel. This latent structural information is
subsequently integrated into a customized matrix completion algorithm, thereby
facilitating more accurate imputation of missing counterfactual outcomes.
Moreover, the use of a multi-output autoencoder explicitly accounts for
heterogeneity across units and enhances the model interpretability of the
latent factors. We establish theoretical guarantees on the convergence of the
estimated counterfactuals, and demonstrate the compelling performance of the
proposed method using extensive simulation studies and a real data application.

arXiv link: http://arxiv.org/abs/2505.20536v1

Econometrics arXiv paper, submitted: 2025-05-26

Intraday Functional PCA Forecasting of Cryptocurrency Returns

Authors: Joann Jasiak, Cheng Zhong

We study the Functional PCA (FPCA) forecasting method in application to
functions of intraday returns on Bitcoin. We show that improved interval
forecasts of future return functions are obtained when the conditional
heteroscedasticity of return functions is taken into account. The
Karhunen-Loeve (KL) dynamic factor model is introduced to bridge the functional
and discrete time dynamic models. It offers a convenient framework for
functional time series analysis. For intraday forecasting, we introduce a new
algorithm based on the FPCA applied by rolling, which can be used for any data
observed continuously 24/7. The proposed FPCA forecasting methods are applied
to return functions computed from data sampled hourly and at 15-minute
intervals. Next, the functional forecasts evaluated at discrete points in time
are compared with the forecasts based on other methods, including machine
learning and a traditional ARMA model. The proposed FPCA-based methods perform
well in terms of forecast accuracy and outperform competitors in terms of
the direction (sign) of return forecasts at fixed points in time.

arXiv link: http://arxiv.org/abs/2505.20508v1

Econometrics arXiv updated paper (originally submitted: 2025-05-25)

Large structural VARs with multiple linear shock and impact inequality restrictions

Authors: Lukas Berend, Jan Prüser

We propose a high-dimensional structural vector autoregression framework that
features a factor structure in the error terms and accommodates a large number
of linear inequality restrictions on impact impulse responses, structural
shocks, and their element-wise products. In particular, we demonstrate that
narrative restrictions can be imposed via constraints on the structural shocks,
which can be used to sharpen inference and disentangle structurally
interpretable shocks. To estimate the model, we develop a highly efficient
sampling algorithm that scales well with both the model dimension and the
number of inequality restrictions on impact responses and structural shocks. It
remains computationally feasible even in settings where existing algorithms may
break down. To illustrate the practical utility of our approach, we identify
five structural shocks and examine the dynamic responses of thirty
macroeconomic variables, highlighting the model's flexibility and feasibility
in complex empirical applications. We provide empirical evidence that financial
shocks are the most important driver of business cycle dynamics.

arXiv link: http://arxiv.org/abs/2505.19244v2

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2025-05-25

Comparative analysis of financial data differentiation techniques using LSTM neural network

Authors: Dominik Stempień, Janusz Gajda

We compare traditional approach of computing logarithmic returns with the
fractional differencing method and its tempered extension as methods of data
preparation before their usage in advanced machine learning models.
Differencing parameters are estimated using multiple techniques. The empirical
investigation is conducted on data from four major stock indices covering the
most recent 10-year period. The set of explanatory variables is additionally
extended with technical indicators. The effectiveness of the differencing
methods is evaluated using both forecast error metrics and risk-adjusted return
trading performance metrics. The findings suggest that fractional
differentiation methods provide a suitable data transformation technique,
improving the forecasting performance of the predictive models. Furthermore, the
generated predictions appeared to be effective in constructing profitable
trading strategies for both individual assets and a portfolio of stock indices.
These results underline the importance of appropriate data transformation
techniques in financial time series forecasting, supporting the application of
memory-preserving techniques.

arXiv link: http://arxiv.org/abs/2505.19243v1
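
As a concrete illustration of the transformation compared above, the sketch below computes the binomial weights of (1 - B)^d and applies a truncated fractional-differencing filter to a toy price series. The tempered variant and the estimation of the differencing order d are not reproduced.

```python
# Truncated fractional differencing of a (log) price series.
import numpy as np

def fracdiff_weights(d, n_weights):
    """Weights of the binomial expansion of (1 - B)^d, with w_0 = 1."""
    w = [1.0]
    for k in range(1, n_weights):
        w.append(-w[-1] * (d - k + 1) / k)
    return np.array(w)

def fracdiff(series, d, window=100):
    """Apply the truncated fractional-differencing filter."""
    w = fracdiff_weights(d, window)
    out = np.full(series.shape, np.nan)
    for t in range(window - 1, len(series)):
        # sum_k w_k * X_{t-k}, truncated at the window length
        out[t] = w @ series[t - window + 1:t + 1][::-1]
    return out

rng = np.random.default_rng(4)
log_prices = np.cumsum(rng.standard_normal(1000) * 0.01)  # toy random-walk prices
x = fracdiff(log_prices, d=0.4, window=100)
print(np.nanstd(x))
```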

Econometrics arXiv updated paper (originally submitted: 2025-05-23)

Potential Outcome Modeling and Estimation in DiD Designs with Staggered Treatments

Authors: Siddhartha Chib, Kenichi Shimizu

We propose the first potential outcome modeling of Difference-in-Differences
designs with multiple time periods and variation in treatment timing.
Importantly, the modeling respects the two key identifying assumptions:
parallel trends and no-anticipation. We then introduce a straightforward
Bayesian approach for estimation and inference of the time-varying group
specific Average Treatment Effects on the Treated (ATT). To improve parsimony
and guide prior elicitation, we reparametrize the model in a way that reduces
the effective number of parameters. Prior information about the ATTs is incorporated through black-box training sample priors and, in small-sample settings, by thick-tailed t-priors that shrink ATTs of small magnitude toward
zero. We provide a computationally efficient Bayesian estimation procedure and
establish a Bernstein-von Mises-type result that justifies posterior inference
for the treatment effects. Simulation studies confirm that our method performs
well in both large and small samples, offering credible uncertainty
quantification even in settings that challenge standard estimators. We
illustrate the practical value of the method through an empirical application
that examines the effect of minimum wage increases on teen employment in the
United States.

arXiv link: http://arxiv.org/abs/2505.18391v2

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2025-05-23

Bayesian Deep Learning for Discrete Choice

Authors: Daniel F. Villarraga, Ricardo A. Daziano

Discrete choice models (DCMs) are used to analyze individual decision-making
in contexts such as transportation choices, political elections, and consumer
preferences. DCMs play a central role in applied econometrics by enabling
inference on key economic variables, such as marginal rates of substitution,
rather than focusing solely on predicting choices on new unlabeled data.
However, while traditional DCMs offer high interpretability and support for
point and interval estimation of economic quantities, these models often
underperform in predictive tasks compared to deep learning (DL) models. Despite
their predictive advantages, DL models remain largely underutilized in discrete
choice due to concerns about their lack of interpretability, unstable parameter
estimates, and the absence of established methods for uncertainty
quantification. Here, we introduce a deep learning model architecture
specifically designed to integrate with approximate Bayesian inference methods,
such as Stochastic Gradient Langevin Dynamics (SGLD). Our proposed model
collapses to behaviorally informed hypotheses when data is limited, mitigating
overfitting and instability in underspecified settings while retaining the
flexibility to capture complex nonlinear relationships when sufficient data is
available. We demonstrate our approach using SGLD through a Monte Carlo
simulation study, evaluating both predictive metrics--such as out-of-sample
balanced accuracy--and inferential metrics--such as empirical coverage for
marginal rates of substitution interval estimates. Additionally, we present
results from two empirical case studies: one using revealed mode choice data in
NYC, and the other based on the widely used Swiss train choice stated
preference data.

arXiv link: http://arxiv.org/abs/2505.18077v1
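
The approximate Bayesian machinery mentioned above, Stochastic Gradient Langevin Dynamics, reduces to a noisy gradient step. The sketch below applies SGLD to a plain Bayesian logistic regression as a stand-in target; the deep choice architecture itself is not reproduced, and the step size and batch size are arbitrary.

```python
# Generic SGLD update: theta <- theta + (eps/2) * grad log posterior + N(0, eps I).
import numpy as np

def sgld_step(theta, grad_log_post, step_size, rng):
    noise = rng.standard_normal(theta.shape) * np.sqrt(step_size)
    return theta + 0.5 * step_size * grad_log_post(theta) + noise

rng = np.random.default_rng(5)
n, p = 5000, 3
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, -0.5, 0.25])
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)

def grad_log_post(beta, batch=256):
    idx = rng.choice(n, batch, replace=False)
    Xb, yb = X[idx], y[idx]
    resid = yb - 1.0 / (1.0 + np.exp(-Xb @ beta))
    return (n / batch) * (Xb.T @ resid) - beta        # minibatch likelihood grad + N(0, I) prior

beta = np.zeros(p)
draws = []
for t in range(3000):
    beta = sgld_step(beta, grad_log_post, step_size=1e-4, rng=rng)
    draws.append(beta.copy())
print(np.mean(draws[1000:], axis=0))                   # rough posterior mean for this toy problem
```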

Econometrics arXiv cross-link from cs.CY (cs.CY), submitted: 2025-05-23

Twin-2K-500: A dataset for building digital twins of over 2,000 people based on their answers to over 500 questions

Authors: Olivier Toubia, George Z. Gui, Tianyi Peng, Daniel J. Merlau, Ang Li, Haozhe Chen

LLM-based digital twin simulation, where large language models are used to
emulate individual human behavior, holds great promise for research in AI,
social science, and digital experimentation. However, progress in this area has
been hindered by the scarcity of real, individual-level datasets that are both
large and publicly available. This lack of high-quality ground truth limits
both the development and validation of digital twin methodologies. To address
this gap, we introduce a large-scale, public dataset designed to capture a rich
and holistic view of individual human behavior. We survey a representative
sample of $N = 2,058$ participants (average 2.42 hours per person) in the US
across four waves with 500 questions in total, covering a comprehensive battery
of demographic, psychological, economic, personality, and cognitive measures,
as well as replications of behavioral economics experiments and a pricing
survey. The final wave repeats tasks from earlier waves to establish a
test-retest accuracy baseline. Initial analyses suggest the data are of high
quality and show promise for constructing digital twins that predict human
behavior well at the individual and aggregate levels. By making the full
dataset publicly available, we aim to establish a valuable testbed for the
development and benchmarking of LLM-based persona simulations. Beyond LLM
applications, due to its unique breadth and scale, the dataset also enables
broad social science research, including studies of cross-construct
correlations and heterogeneous treatment effects.

arXiv link: http://arxiv.org/abs/2505.17479v1

Econometrics arXiv paper, submitted: 2025-05-21

Analysis of Distributional Dynamics for Repeated Cross-Sectional and Intra-Period Observations

Authors: Bo Hu, Joon Y. Park, Junhui Qian

This paper introduces a novel approach to investigate the dynamics of state
distributions, which accommodate both cross-sectional distributions of repeated
panels and intra-period distributions of a time series observed at high
frequency. In our approach, densities of the state distributions are regarded
as functional elements in a Hilbert space, and are assumed to follow a
functional autoregressive model. We propose an estimator for the autoregressive
operator, establish its consistency, and provide tools and asymptotics to
analyze the forecast of state density and the moment dynamics of state
distributions. We apply our methodology to study the time series of
distributions of the GBP/USD exchange rate intra-month returns and the time
series of cross-sectional distributions of the NYSE stocks monthly returns.
Finally, we conduct simulations to evaluate the density forecasts based on our
model.

arXiv link: http://arxiv.org/abs/2505.15763v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2025-05-21

SplitWise Regression: Stepwise Modeling with Adaptive Dummy Encoding

Authors: Marcell T. Kurbucz, Nikolaos Tzivanakis, Nilufer Sari Aslam, Adam M. Sykulski

Capturing nonlinear relationships without sacrificing interpretability
remains a persistent challenge in regression modeling. We introduce SplitWise,
a novel framework that enhances stepwise regression. It adaptively transforms
numeric predictors into threshold-based binary features using shallow decision
trees, but only when such transformations improve model fit, as assessed by the
Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC).
This approach preserves the transparency of linear models while flexibly
capturing nonlinear effects. Implemented as a user-friendly R package,
SplitWise is evaluated on both synthetic and real-world datasets. The results
show that it consistently produces more parsimonious and generalizable models
than traditional stepwise and penalized regression techniques.

arXiv link: http://arxiv.org/abs/2505.15423v1
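
A hedged sketch of the adaptive dummy-encoding mechanism described above: a depth-1 regression tree proposes a threshold for a numeric predictor, and the binary version is kept only if it lowers the AIC of the linear model. This mirrors the described idea in Python rather than the SplitWise R package, and the toy data are an assumption.

```python
# Threshold encoding of a single predictor, kept only when it improves AIC.
import numpy as np
import statsmodels.api as sm
from sklearn.tree import DecisionTreeRegressor

def maybe_dummy_encode(x, y):
    """Return the encoded predictor (raw or thresholded), whichever has lower AIC."""
    stump = DecisionTreeRegressor(max_depth=1).fit(x.reshape(-1, 1), y)
    threshold = stump.tree_.threshold[0]
    x_dummy = (x > threshold).astype(float)
    aic_raw = sm.OLS(y, sm.add_constant(x)).fit().aic
    aic_dummy = sm.OLS(y, sm.add_constant(x_dummy)).fit().aic
    return (x_dummy, threshold) if aic_dummy < aic_raw else (x, None)

rng = np.random.default_rng(6)
x = rng.uniform(0, 10, 500)
y = 2.0 * (x > 6.0) + rng.standard_normal(500) * 0.5    # genuinely threshold-shaped effect
encoded, thr = maybe_dummy_encode(x, y)
print("chosen threshold:", thr)
```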

Econometrics arXiv paper, submitted: 2025-05-20

Dynamic Decision-Making under Model Misspecification

Authors: Xinyu Dai

In this study, I investigate the dynamic decision problem with a finite
parameter space when the functional form of conditional expected rewards is
misspecified. Traditional algorithms, such as Thompson Sampling, guarantee
neither an $O(e^{-T})$ rate of posterior parameter concentration nor an
$O(T^{-1})$ rate of average regret. However, under mild conditions, we can
still achieve an exponential convergence rate of the parameter to a pseudo
truth set, an extension of the pseudo truth parameter concept introduced by
White (1982). I further characterize the necessary conditions for the
convergence of the expected posterior within this pseudo-truth set. Simulations
demonstrate that while the maximum a posteriori (MAP) estimate of the
parameters fails to converge under misspecification, the algorithm's average
regret remains relatively robust compared to the correctly specified case.
These findings suggest opportunities to design simple yet robust algorithms
that achieve desirable outcomes even in the presence of model
misspecifications.

arXiv link: http://arxiv.org/abs/2505.14913v1

Econometrics arXiv paper, submitted: 2025-05-20

Bubble Detection with Application to Green Bubbles: A Noncausal Approach

Authors: Francesco Giancaterini, Alain Hecq, Joann Jasiak, Aryan Manafi Neyazi

This paper introduces a new approach to detect bubbles based on mixed causal
and noncausal processes and their tail process representation during explosive
episodes. Departing from traditional definitions of bubbles as nonstationary
and temporarily explosive processes, we adopt a perspective in which prices are
viewed as following a strictly stationary process, with the bubble considered
an intrinsic component of its non-linear dynamics. We illustrate our approach
on the phenomenon referred to as the "green bubble" in the field of renewable
energy investment.

arXiv link: http://arxiv.org/abs/2505.14911v1

Econometrics arXiv paper, submitted: 2025-05-20

The Post Double LASSO for Efficiency Analysis

Authors: Christopher Parmeter, Artem Prokhorov, Valentin Zelenyuk

Big data and machine learning methods have become commonplace across economic
milieus. One area that has not seen as much attention to these important topics
yet is efficiency analysis. We show how the availability of big (wide) data can
actually make detection of inefficiency more challenging. We then show how
machine learning methods can be leveraged to adequately estimate the primitives
of the frontier itself as well as inefficiency using the `post double LASSO' by
deriving Neyman orthogonal moment conditions for this problem. Finally, an
application is presented to illustrate key differences of the post-double LASSO
compared to other approaches.

arXiv link: http://arxiv.org/abs/2505.14282v1
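
For context, the generic post-double-selection LASSO step (in the spirit of Belloni, Chernozhukov and Hansen) looks as follows. The Neyman orthogonal moment conditions for the frontier and inefficiency terms derived in the paper are not reproduced; the data-generating process below is a toy assumption.

```python
# Post-double-selection LASSO for the coefficient on a variable of interest d.
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(7)
n, p = 500, 50
X = rng.standard_normal((n, p))                            # many candidate controls
d = X[:, 0] + 0.5 * X[:, 1] + rng.standard_normal(n)       # variable of interest
y = 1.0 * d + X[:, 0] - X[:, 2] + rng.standard_normal(n)   # true coefficient on d = 1

sel_y = np.flatnonzero(LassoCV(cv=5).fit(X, y).coef_)      # controls predicting y
sel_d = np.flatnonzero(LassoCV(cv=5).fit(X, d).coef_)      # controls predicting d
union = np.union1d(sel_y, sel_d)                           # take the union of both selections

design = sm.add_constant(np.column_stack([d, X[:, union]]))
fit = sm.OLS(y, design).fit()
print(fit.params[1], fit.bse[1])                           # post-double-LASSO estimate of d's coefficient
```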

Econometrics arXiv paper, submitted: 2025-05-20

The Impact of Research and Development (R&D) Expenditures on the Value Added in the Agricultural Sector of Iran

Authors: Soheil Hataminia, Tania Khosravi

In this study, the impact of research and development (R&D) expenditures on
the value added of the agricultural sector in Iran was investigated for the
period 1971-2021. For data analysis, the researchers utilized the ARDL
econometric model and EViews software. The results indicated that R&D
expenditures, both in the short and long run, have a significant positive
effect on the value added in the agricultural sector. The estimated elasticity
coefficient for R&D expenditures in the short run was 0.45 and in the long run
was 0.35, indicating that with a 1 percent increase in research and development
expenditures, the value added in the agricultural sector would increase by 0.45
percent in the short run and by 0.35 percent in the long run. Moreover,
variables such as capital stock, number of employees in the agricultural
sector, and working days also had a significant and positive effect on the
value added in the agricultural sector.

arXiv link: http://arxiv.org/abs/2505.14746v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-05-20

Adaptive stable distribution and Hurst exponent by method of moments moving estimator for nonstationary time series

Authors: Jarek Duda

Nonstationarity of real-life time series requires model adaptation. Classical approaches like ARMA-ARCH assume some arbitrarily chosen dependence type. To avoid this bias, we focus on a novel, more agnostic approach: a moving estimator, which estimates parameters separately for every time $t$ by optimizing the local log-likelihood $F_t=\sum_{\tau<t} (1-\eta)^{t-\tau} \ln(\rho_\theta(x_\tau))$ with exponentially weakening weights on old values. In practice such moving estimates can be found by EMA (exponential
moving average) of some parameters, like $m_p=E[|x-\mu|^p]$ absolute central
moments, updated by $m_{p,t+1} = m_{p,t} + \eta (|x_t-\mu_t|^p-m_{p,t})$. We
will focus here on its application to the alpha-stable distribution, which also influences the Hurst exponent and hence can be used for its adaptive estimation. The application is illustrated on financial data, namely the DJIA time series: besides standard estimation of the evolution of the center $\mu$ and scale parameter $\sigma$, the evolution of the $\alpha$ parameter is also estimated, allowing continuous evaluation of market stability, since the tails behave as $\rho(x) \sim 1/|x|^{\alpha+1}$ and control the probability of potentially dangerous extreme events.

arXiv link: http://arxiv.org/abs/2506.05354v1
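
The EMA update quoted in the abstract can be transcribed almost verbatim; the sketch below tracks the centre and two absolute central moments. The mapping from these moments to the stable parameters and the Hurst exponent is the paper's contribution and is omitted; the learning rate, powers, and toy data are assumptions.

```python
# Moving estimation of mu_t and m_{p,t} = E|x - mu|^p by exponential moving averages.
import numpy as np

def moving_moments(x, eta=0.02, powers=(0.5, 1.0)):
    mu = float(x[0])
    m = {p: float((np.abs(x[:20] - mu) ** p).mean()) for p in powers}   # crude initialisation
    path = []
    for xt in x:
        for p in powers:
            # m_{p,t+1} = m_{p,t} + eta * (|x_t - mu_t|^p - m_{p,t})
            m[p] += eta * (np.abs(xt - mu) ** p - m[p])
        mu += eta * (xt - mu)                                           # EMA update of the centre
        path.append((mu, *(m[p] for p in powers)))
    return np.array(path)

rng = np.random.default_rng(8)
returns = rng.standard_t(df=3, size=2000) * 0.01                        # heavy-tailed toy returns
print(moving_moments(returns)[-1])                                      # final (mu, m_0.5, m_1)
```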

Econometrics arXiv cross-link from quant-ph (quant-ph), submitted: 2025-05-20

Quantum Reservoir Computing for Realized Volatility Forecasting

Authors: Qingyu Li, Chiranjib Mukhopadhyay, Abolfazl Bayat, Ali Habibnia

Recent advances in quantum computing have demonstrated its potential to
significantly enhance the analysis and forecasting of complex classical data.
Among these, quantum reservoir computing has emerged as a particularly powerful
approach, combining quantum computation with machine learning for modeling
nonlinear temporal dependencies in high-dimensional time series. As with many
data-driven disciplines, quantitative finance and econometrics can hugely
benefit from emerging quantum technologies. In this work, we investigate the
application of quantum reservoir computing for realized volatility forecasting.
Our model employs a fully connected transverse-field Ising Hamiltonian as the
reservoir with distinct input and memory qubits to capture temporal
dependencies. The quantum reservoir computing approach is benchmarked against
several econometric models and standard machine learning algorithms. The models
are evaluated using multiple error metrics and the model confidence set
procedures. To enhance interpretability and mitigate current quantum hardware
limitations, we utilize wrapper-based forward selection for feature selection,
identifying optimal subsets, and quantifying feature importance via Shapley
values. Our results indicate that the proposed quantum reservoir approach
consistently outperforms benchmark models across various metrics, highlighting
its potential for financial forecasting despite existing quantum hardware
constraints. This work serves as a proof-of-concept for the applicability of
quantum computing in econometrics and financial analysis, paving the way for
further research into quantum-enhanced predictive modeling as quantum hardware
capabilities continue to advance.

arXiv link: http://arxiv.org/abs/2505.13933v1

Econometrics arXiv paper, submitted: 2025-05-20

Valid Post-Contextual Bandit Inference

Authors: Ramon van den Akker, Bas J. M. Werker, Bo Zhou

We establish an asymptotic framework for the statistical analysis of the
stochastic contextual multi-armed bandit problem (CMAB), which is widely
employed in adaptively randomized experiments across various fields. While
algorithms for maximizing rewards or, equivalently, minimizing regret have
received considerable attention, our focus centers on statistical inference
with adaptively collected data under the CMAB model. To this end we derive the
limit experiment (in the Hajek-Le Cam sense). This limit experiment is highly
nonstandard and, applying Girsanov's theorem, we obtain a structural
representation in terms of stochastic differential equations. This structural
representation, and a general weak convergence result we develop, allow us to
obtain the asymptotic distribution of statistics for the CMAB problem. In
particular, we obtain the asymptotic distributions for the classical t-test
(non-Gaussian), Adaptively Weighted tests, and Inverse Propensity Weighted
tests (non-Gaussian). We show that, when comparing both arms, validity of these
tests requires the sampling scheme to be translation invariant in a way we make
precise. We propose translation-invariant versions of Thompson, tempered
greedy, and tempered Upper Confidence Bound sampling. Simulation results
corroborate our asymptotic analysis.

arXiv link: http://arxiv.org/abs/2505.13897v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2025-05-20

Characterization of Efficient Influence Function for Off-Policy Evaluation Under Optimal Policies

Authors: Haoyu Wei

Off-policy evaluation (OPE) provides a powerful framework for estimating the
value of a counterfactual policy using observational data, without the need for
additional experimentation. Despite recent progress in robust and efficient OPE
across various settings, rigorous efficiency analysis of OPE under an estimated
optimal policy remains limited. In this paper, we establish a concise
characterization of the efficient influence function (EIF) for the value
function under optimal policy within canonical Markov decision process models.
Specifically, we provide the sufficient conditions for the existence of the EIF
and characterize its expression. We also give the conditions under which the
EIF does not exist.

arXiv link: http://arxiv.org/abs/2505.13809v3

Econometrics arXiv paper, submitted: 2025-05-19

Machine learning the first stage in 2SLS: Practical guidance from bias decomposition and simulation

Authors: Connor Lennon, Edward Rubin, Glen Waddell

Machine learning (ML) primarily evolved to solve "prediction problems." The
first stage of two-stage least squares (2SLS) is a prediction problem,
suggesting potential gains from ML first-stage assistance. However, little
guidance exists on when ML helps 2SLS, or when it hurts. We
investigate the implications of inserting ML into 2SLS, decomposing the bias
into three informative components. Mechanically, ML-in-2SLS procedures face
issues common to prediction and causal-inference settings, and
their interaction. Through simulation, we show linear ML methods (e.g.,
post-Lasso) work well, while nonlinear methods (e.g., random forests, neural
nets) generate substantial bias in second-stage
estimates, potentially exceeding the bias of endogenous OLS.

arXiv link: http://arxiv.org/abs/2505.13422v1
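
The procedure under study can be sketched generically: fit the first stage with a (post-)Lasso on many instruments and carry the fitted values into the second stage. The data-generating process below is a toy, not the paper's simulation design, and the naive second-stage standard errors are ignored.

```python
# ML-in-2SLS with a post-Lasso first stage.
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(9)
n, k = 1000, 30
Z = rng.standard_normal((n, k))                              # many candidate instruments
u = rng.standard_normal(n)
d = Z[:, 0] + 0.5 * Z[:, 1] + u + rng.standard_normal(n)     # endogenous regressor
y = 1.0 * d + 2.0 * u + rng.standard_normal(n)               # true effect = 1

# First stage: Lasso selection, then OLS on the selected instruments (post-Lasso)
selected = np.flatnonzero(LassoCV(cv=5).fit(Z, d).coef_)
d_hat = sm.OLS(d, sm.add_constant(Z[:, selected])).fit().fittedvalues

# Second stage: regress the outcome on the first-stage fitted values
second = sm.OLS(y, sm.add_constant(d_hat)).fit()
print(second.params[1])                                      # should be close to 1
```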

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2025-05-19

From What Ifs to Insights: Counterfactuals in Causal Inference vs. Explainable AI

Authors: Galit Shmueli, David Martens, Jaewon Yoo, Travis Greene

Counterfactuals play a pivotal role in the two distinct data science fields
of causal inference (CI) and explainable artificial intelligence (XAI). While
the core idea behind counterfactuals remains the same in both fields--the
examination of what would have happened under different circumstances--there
are key differences in how they are used and interpreted. We introduce a formal
definition that encompasses the multi-faceted concept of the counterfactual in
CI and XAI. We then discuss how counterfactuals are used, evaluated, generated,
and operationalized in CI vs. XAI, highlighting conceptual and practical
differences. By comparing and contrasting the two, we hope to identify
opportunities for cross-fertilization across CI and XAI.

arXiv link: http://arxiv.org/abs/2505.13324v1

Econometrics arXiv paper, submitted: 2025-05-19

CATS: Clustering-Aggregated and Time Series for Business Customer Purchase Intention Prediction

Authors: Yingjie Kuang, Tianchen Zhang, Zhen-Wei Huang, Zhongjie Zeng, Zhe-Yuan Li, Ling Huang, Yuefang Gao

Accurately predicting customers' purchase intentions is critical to the
success of a business strategy. Current research mainly focuses on analyzing the specific types of products that customers are likely to purchase in the future, while little attention has been paid to the critical factor of whether
customers will engage in repurchase behavior. Predicting whether a customer
will make the next purchase is a classic time series forecasting task. However,
in real-world purchasing behavior, customer groups typically exhibit imbalance
- i.e., there are a large number of occasional buyers and a small number of
loyal customers. Because of this head-to-tail distribution, traditional time series forecasting methods face certain limitations when dealing with such problems.
To address the above challenges, this paper proposes a unified Clustering and
Attention mechanism GRU model (CAGRU) that leverages multi-modal data for
customer purchase intention prediction. The framework first performs customer
profiling with respect to the customer characteristics and clusters the
customers to delineate the different customer clusters that contain similar
features. Then, the time series features of different customer clusters are
extracted by GRU neural network and an attention mechanism is introduced to
capture the significance of sequence locations. Furthermore, to mitigate the
head-to-tail distribution of customer segments, we train the model separately
for each customer segment, so as to capture more accurately both the differences in behavioral characteristics across customer segments and the similarities among customers within the same segment.
We constructed four datasets and conducted extensive experiments to demonstrate
the superiority of the proposed CAGRU approach.

arXiv link: http://arxiv.org/abs/2505.13558v1

Econometrics arXiv updated paper (originally submitted: 2025-05-18)

Opening the Black Box of Local Projections

Authors: Philippe Goulet Coulombe, Karin Klieber

Local projections (LPs) are widely used in empirical macroeconomics to
estimate impulse responses to policy interventions. Yet, in many ways, they are
black boxes. It is often unclear what mechanism or historical episodes drive a
particular estimate. We introduce a new decomposition of LP estimates into the
sum of contributions of historical events, which is the product, for each time
stamp, of a weight and the realization of the response variable. In the least
squares case, we show that these weights admit two interpretations. First, they
represent purified and standardized shocks. Second, they serve as proximity
scores between the projected policy intervention and past interventions in the
sample. Notably, this second interpretation extends naturally to machine
learning methods, many of which yield impulse responses that, while nonlinear
in predictors, still aggregate past outcomes linearly via proximity-based
weights. Applying this framework to shocks in monetary and fiscal policy,
global temperature, and the excess bond premium, we find that easily
identifiable events-such as Nixon's interference with the Fed, stagflation,
World War II, and the Mount Agung volcanic eruption-emerge as dominant drivers
of often heavily concentrated impulse response estimates.

arXiv link: http://arxiv.org/abs/2505.12422v2
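
The least-squares version of the decomposition described above is easy to reproduce: the LP coefficient equals a weighted sum of realised future outcomes, with weights given by the shock's row of (X'X)^{-1} X'. Everything below (data, horizon, controls) is a placeholder.

```python
# Decomposing a local projection estimate into per-observation contributions.
import numpy as np

rng = np.random.default_rng(10)
T, h = 300, 4
shock = rng.standard_normal(T)
controls = rng.standard_normal((T, 2))
y = np.convolve(shock, [0.0, 0.8, 0.5, 0.3, 0.2], mode="full")[:T] + 0.3 * rng.standard_normal(T)

X = np.column_stack([np.ones(T - h), shock[:T - h], controls[:T - h]])
y_lead = y[h:]                                   # outcome h periods ahead

beta = np.linalg.solve(X.T @ X, X.T @ y_lead)    # standard LP regression
weights = np.linalg.solve(X.T @ X, X.T)[1]       # shock's row of (X'X)^{-1} X'
contributions = weights * y_lead                 # per-observation contribution

print(beta[1], contributions.sum())              # the two numbers coincide
largest = np.argsort(np.abs(contributions))[-3:] # the most influential historical episodes
print(largest)
```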

Econometrics arXiv paper, submitted: 2025-05-18

Multivariate Affine GARCH with Heavy Tails: A Unified Framework for Portfolio Optimization and Option Valuation

Authors: Ayush Jha, Abootaleb Shirvani, Ali Jaffri, Svetlozar T. Rachev, Frank J. Fabozzi

This paper develops and estimates a multivariate affine GARCH(1,1) model with
Normal Inverse Gaussian innovations that captures time-varying volatility,
heavy tails, and dynamic correlation across asset returns. We generalize the
Heston-Nandi framework to a multivariate setting and apply it to 30 Dow Jones
Industrial Average stocks. The model jointly supports three core financial
applications: dynamic portfolio optimization, wealth path simulation, and
option pricing. Closed-form solutions are derived for a Constant Relative Risk
Aversion (CRRA) investor's intertemporal asset allocation, and we implement a
forward-looking risk-adjusted performance comparison against Merton-style
constant strategies. Using the model's conditional volatilities, we also
construct implied volatility surfaces for European options, capturing skew and
smile features. Empirically, we document substantial wealth-equivalent utility
losses from ignoring time-varying correlation and tail risk. These findings
underscore the value of a unified econometric framework for analyzing joint
asset dynamics and for managing portfolio and derivative exposures under
non-Gaussian risks.

arXiv link: http://arxiv.org/abs/2505.12198v1

Econometrics arXiv paper, submitted: 2025-05-17

(Visualizing) Plausible Treatment Effect Paths

Authors: Simon Freyaldenhoven, Christian Hansen

We consider point estimation and inference for the treatment effect path of a
policy. Examples include dynamic treatment effects in microeconomics, impulse
response functions in macroeconomics, and event study paths in finance. We
present two sets of plausible bounds to quantify and visualize the uncertainty
associated with this object. Both plausible bounds are often substantially
tighter than traditional confidence intervals, and can provide useful insights
even when traditional (uniform) confidence bands appear uninformative. Our
bounds can also lead to markedly different conclusions when there is
significant correlation in the estimates, reflecting the fact that traditional
confidence bands can be ineffective at visualizing the impact of such
correlation. Our first set of bounds covers the average (or overall) effect
rather than the entire treatment path. Our second set of bounds imposes
data-driven smoothness restrictions on the treatment path. Post-selection
Inference (Berk et al. [2013]) provides formal coverage guarantees for these
bounds. The chosen restrictions also imply novel point estimates that perform
well across our simulations.

arXiv link: http://arxiv.org/abs/2505.12014v1

Econometrics arXiv paper, submitted: 2025-05-17

A New Bayesian Bootstrap for Quantitative Trade and Spatial Models

Authors: Bas Sanders

Economists use quantitative trade and spatial models to make counterfactual
predictions. Because such predictions often inform policy decisions, it is
important to communicate the uncertainty surrounding them. Three key challenges
arise in this setting: the data are dyadic and exhibit complex dependence; the
number of interacting units is typically small; and counterfactual predictions
depend on the data in two distinct ways-through the estimation of structural
parameters and through their role as inputs into the model's counterfactual
equilibrium. I address these challenges by proposing a new Bayesian bootstrap
procedure tailored to this context. The method is simple to implement and
provides both finite-sample Bayesian and asymptotic frequentist guarantees.
Revisiting the results in Waugh (2010), Caliendo and Parro (2015), and
Artuc et al. (2010) illustrates the practical advantages of the approach.
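
The generic Bayesian bootstrap ingredient can be sketched in a few lines
(illustrative only: the weights below are standard Dirichlet(1,...,1) draws
applied to a simple statistic, whereas the paper's procedure is tailored to
dyadic data and propagates the weights through the counterfactual equilibrium):

import numpy as np

rng = np.random.default_rng(1)
data = rng.lognormal(size=50)             # stand-in for unit-level observations

def estimator(x, w):
    """Weighted statistic of interest; here simply a weighted mean."""
    return np.sum(w * x)

draws = []
for _ in range(2000):
    # Dirichlet(1,...,1) weights, i.e. normalized Exponential(1) draws.
    g = rng.exponential(size=data.size)
    draws.append(estimator(data, g / g.sum()))

draws = np.array(draws)
print("posterior mean:", draws.mean())
print("90% credible interval:", np.quantile(draws, [0.05, 0.95]))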

arXiv link: http://arxiv.org/abs/2505.11967v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2025-05-16

IISE PG&E Energy Analytics Challenge 2025: Hourly-Binned Regression Models Beat Transformers in Load Forecasting

Authors: Millend Roy, Vladimir Pyltsov, Yinbo Hu

Accurate electricity load forecasting is essential for grid stability,
resource optimization, and renewable energy integration. While
transformer-based deep learning models like TimeGPT have gained traction in
time-series forecasting, their effectiveness in long-term electricity load
prediction remains uncertain. This study evaluates forecasting models ranging
from classical regression techniques to advanced deep learning architectures
using data from the ESD 2025 competition. The dataset includes two years of
historical electricity load data, alongside temperature and global horizontal
irradiance (GHI) across five sites, with a one-day-ahead forecasting horizon.
Since actual test set load values remain undisclosed, leveraging predicted
values would accumulate errors, making this a long-term forecasting challenge.
We (i) employ Principal Component Analysis (PCA) for dimensionality reduction,
(ii) frame the task as a regression problem, using temperature and GHI as
covariates to predict load for each hour, and (iii) stack 24 hourly models to
generate yearly forecasts.
Our results reveal that deep learning models, including TimeGPT, fail to
consistently outperform simpler statistical and machine learning approaches due
to the limited availability of training data and exogenous variables. In
contrast, XGBoost, with minimal feature engineering, delivers the lowest error
rates across all test cases while maintaining computational efficiency. This
highlights the limitations of deep learning in long-term electricity
forecasting and reinforces the importance of model selection based on dataset
characteristics rather than complexity. Our study provides insights into
practical forecasting applications and contributes to the ongoing discussion on
the trade-offs between traditional and modern forecasting methods.
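
The hourly-binned regression strategy itself is straightforward to reproduce
schematically: fit one model per hour of the day with temperature and GHI as
covariates, then route each observation to its hourly model (synthetic data;
GradientBoostingRegressor is used here as a stand-in for the XGBoost model in
the study, and the PCA step is shown only in stylized form):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n_days = 400
hours = np.tile(np.arange(24), n_days)
temp = 15 + 10 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 2, hours.size)
ghi = np.clip(np.sin(np.pi * (hours - 6) / 12), 0, None) + rng.normal(0, 0.1, hours.size)
load = 50 + 0.8 * temp + 20 * ghi + rng.normal(0, 3, hours.size)

X = PCA(n_components=2).fit_transform(np.column_stack([temp, ghi]))

# One regression model per hourly bin ("stacking 24 models").
models = {h: GradientBoostingRegressor().fit(X[hours == h], load[hours == h])
          for h in range(24)}

# Forecast by routing each observation to the model for its hour.
pred = np.array([models[h].predict(x.reshape(1, -1))[0] for h, x in zip(hours, X)])
print("in-sample MAE:", np.abs(pred - load).mean())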

arXiv link: http://arxiv.org/abs/2505.11390v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-05-16

A Cautionary Tale on Integrating Studies with Disparate Outcome Measures for Causal Inference

Authors: Harsh Parikh, Trang Quynh Nguyen, Elizabeth A. Stuart, Kara E. Rudolph, Caleb H. Miles

Data integration approaches are increasingly used to enhance the efficiency
and generalizability of studies. However, a key limitation of these methods is
the assumption that outcome measures are identical across datasets -- an
assumption that often does not hold in practice. Consider the following opioid
use disorder (OUD) studies: the XBOT trial and the POAT study, both evaluating
the effect of medications for OUD on withdrawal symptom severity (not the
primary outcome of either trial). While XBOT measures withdrawal severity using
the subjective opiate withdrawal scale, POAT uses the clinical opiate
withdrawal scale. We analyze this realistic yet challenging setting where
outcome measures differ across studies and where neither study records both
types of outcomes. Our paper studies whether and when integrating studies with
disparate outcome measures leads to efficiency gains. We introduce three sets
of assumptions -- with varying degrees of strength -- linking both outcome
measures. Our theoretical and empirical results highlight a cautionary tale:
integration can improve asymptotic efficiency only under the strongest
assumption linking the outcomes. However, misspecification of this assumption
leads to bias. In contrast, a milder assumption may yield finite-sample
efficiency gains, yet these benefits diminish as sample size increases. We
illustrate these trade-offs via a case study integrating the XBOT and POAT
datasets to estimate the comparative effect of two medications for opioid use
disorder on withdrawal symptoms. By systematically varying the assumptions
linking the SOW and COW scales, we show potential efficiency gains and the
risks of bias. Our findings emphasize the need for careful assumption selection
when fusing datasets with differing outcome measures, offering guidance for
researchers navigating this common challenge in modern data integration.

arXiv link: http://arxiv.org/abs/2505.11014v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-05-16

Tractable Unified Skew-t Distribution and Copula for Heterogeneous Asymmetries

Authors: Lin Deng, Michael Stanley Smith, Worapree Maneesoonthorn

Multivariate distributions that allow for asymmetry and heavy tails are
important building blocks in many econometric and statistical models. The
Unified Skew-t (UST) is a promising choice because it is both scalable and
allows for a high level of flexibility in the asymmetry in the distribution.
However, it suffers from parameter identification and computational hurdles
that have to date inhibited its use for modeling data. In this paper we propose
a new tractable variant of the unified skew-t (TrUST) distribution that
addresses both challenges. Moreover, the copula of this distribution is shown
to also be tractable, while allowing for greater heterogeneity in asymmetric
dependence over variable pairs than the popular skew-t copula. We show how
Bayesian posterior inference for both the distribution and its copula can be
computed using an extended likelihood derived from a generative representation
of the distribution. The efficacy of this Bayesian method, and the enhanced
flexibility of both the TrUST distribution and its implicit copula, is first
demonstrated using simulated data. Applications of the TrUST distribution to
highly skewed regional Australian electricity prices, and the TrUST copula to
intraday U.S. equity returns, demonstrate how our proposed distribution and its
copula can provide substantial increases in accuracy over the popular skew-t
and its copula in practice.

arXiv link: http://arxiv.org/abs/2505.10849v1

Econometrics arXiv paper, submitted: 2025-05-16

Distribution Regression with Censored Selection

Authors: Ivan Fernandez-Val, Seoyun Hong

We develop a distribution regression model with a censored selection rule,
offering a semi-parametric generalization of the Heckman selection model. Our
approach applies to the entire distribution, extending beyond the mean or
median, accommodates non-Gaussian error structures, and allows for
heterogeneous effects of covariates on both the selection and outcome
distributions. By employing a censored selection rule, our model can uncover
richer selection patterns according to both outcome and selection variables,
compared to the binary selection case. We analyze identification, estimation,
and inference of model functionals such as sorting parameters and distributions
purged of sample selection. An application to labor supply using data from the
UK reveals different selection patterns into full-time and overtime work across
gender, marital status, and time. Additionally, decompositions of wage
distributions by gender show that selection effects contribute to a decrease in
the observed gender wage gap at low quantiles and an increase in the gap at
high quantiles for full-time workers. The observed gender wage gap among
overtime workers is smaller, which may be driven by different selection
behaviors into overtime work across genders.

arXiv link: http://arxiv.org/abs/2505.10814v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-05-15

Statistically Significant Linear Regression Coefficients Solely Driven By Outliers In Finite-sample Inference

Authors: Felix Reichel

In this paper, we investigate the impact of outliers on the statistical
significance of coefficients in linear regression. We demonstrate, through
numerical simulation using R, that a single outlier can cause an otherwise
insignificant coefficient to appear statistically significant. We compare this
with robust Huber regression, which reduces the effects of outliers.
Afterwards, we approximate the influence of a single outlier on estimated
regression coefficients and discuss common diagnostic statistics to detect
influential observations in regression (e.g., studentized residuals).
Furthermore, we relate this issue to the optional normality assumption in
simple linear regression [14], required for exact finite-sample inference but
asymptotically justified for large n by the Central Limit Theorem (CLT). We
also address the general dangers of relying solely on p-values without
performing adequate regression diagnostics. Finally, we provide a brief
overview of regression methods and discuss how they relate to the assumptions
of the Gauss-Markov theorem.
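
The central point is easy to reproduce (a Python sketch rather than the R
simulation used in the paper, with arbitrary simulation settings):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 50
x = rng.normal(size=n)
y = rng.normal(size=n)                  # true slope is zero

X = sm.add_constant(x)
print("clean-data slope p-value:", sm.OLS(y, X).fit().pvalues[1])

# Add a single extreme observation in both x and y.
x_out = np.append(x, 10.0)
y_out = np.append(y, 10.0)
X_out = sm.add_constant(x_out)
print("slope p-value with one outlier:", sm.OLS(y_out, X_out).fit().pvalues[1])

# Huber-type robust regression downweights the outlier.
rlm = sm.RLM(y_out, X_out, M=sm.robust.norms.HuberT()).fit()
print("Huber slope estimate:", rlm.params[1])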

arXiv link: http://arxiv.org/abs/2505.10738v2

Econometrics arXiv updated paper (originally submitted: 2025-05-15)

Optimal Post-Hoc Theorizing

Authors: Andrew Y. Chen

For many economic questions, the empirical results are not interesting unless
they are strong. For these questions, theorizing before the results are known
is not always optimal. Instead, the optimal sequencing of theory and empirics
trades off a “Darwinian Learning” effect from theorizing first with a
“Statistical Learning” effect from examining the data first. This short paper
formalizes the tradeoff in a Bayesian model. In the modern era of mature
economic theory and enormous datasets, I argue that post hoc theorizing is
typically optimal.

arXiv link: http://arxiv.org/abs/2505.10370v2

Econometrics arXiv updated paper (originally submitted: 2025-05-15)

Better Understanding Triple Differences Estimators

Authors: Marcelo Ortiz-Villavicencio, Pedro H. C. Sant'Anna

Triple Differences (DDD) designs are widely used in empirical work to relax
parallel trends assumptions in Difference-in-Differences (DiD) settings. This
paper highlights that common DDD implementations -- such as taking the
difference between two DiDs or applying three-way fixed effects regressions --
are generally invalid when identification requires conditioning on covariates.
In staggered adoption settings, the common DiD practice of pooling all
not-yet-treated units as a comparison group can introduce additional bias, even
when covariates are not required for identification. These insights challenge
conventional empirical strategies and underscore the need for estimators
tailored specifically to DDD structures. We develop regression adjustment,
inverse probability weighting, and doubly robust estimators that remain valid
under covariate-adjusted DDD parallel trends. For staggered designs, we
demonstrate how to effectively utilize multiple comparison groups to obtain
more informative inferences. Simulations and three empirical applications
highlight bias reductions and precision gains relative to standard approaches.
A companion R package is available.

arXiv link: http://arxiv.org/abs/2505.09942v3

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2025-05-14

Sequential Scoring Rule Evaluation for Forecast Method Selection

Authors: David T. Frazier, Donald S. Poskitt

This paper shows that sequential statistical analysis techniques can be
generalised to the problem of selecting between alternative forecasting methods
using scoring rules. A return to basic principles is necessary in order to show
that ideas and concepts from sequential statistical methods can be adapted and
applied to sequential scoring rule evaluation (SSRE). One key technical
contribution of this paper is the development of a large deviations type result
for SSRE schemes using a change of measure that parallels a traditional
exponential tilting form. Further, we also show that SSRE will terminate in
finite time with probability one, and that the moments of the SSRE stopping
time exist. A second key contribution is to show that the exponential tilting
form underlying our large deviations result allows us to cast SSRE within the
framework of generalised e-values. Relying on this formulation, we devise
sequential testing approaches that are both powerful and maintain control on
error probabilities underlying the analysis. Through several simulated
examples, we demonstrate that our e-values based SSRE approach delivers
reliable results that are more powerful than more commonly applied testing
methods precisely in the situations where these commonly applied methods can be
expected to fail.

arXiv link: http://arxiv.org/abs/2505.09090v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-05-13

Assumption-robust Causal Inference

Authors: Aditya Ghosh, Dominik Rothenhäusler

In observational causal inference, it is common to encounter multiple
adjustment sets that appear equally plausible. It is often untestable which of
these adjustment sets are valid to adjust for (i.e., satisfies ignorability).
This discrepancy can pose practical challenges as it is typically unclear how
to reconcile multiple, possibly conflicting estimates of the average treatment
effect (ATE). A naive approach is to report the whole range (convex hull of the
union) of the resulting confidence intervals. However, the width of this
interval might not shrink to zero in large samples and can be unnecessarily
wide in real applications. To address this issue, we propose a summary
procedure that generates a single estimate, one confidence interval, and
identifies a set of units for which the causal effect estimate remains valid,
provided at least one adjustment set is valid. The width of our proposed
confidence interval shrinks to zero with sample size at $n^{-1/2}$ rate, unlike
the original range which is of constant order. Thus, our assumption-robust
approach enables reliable causal inference on the ATE even in scenarios where
most of the adjustment sets are invalid. Admittedly, this robustness comes at a
cost: our inferential guarantees apply to a target population close to, but
different from, the one originally intended. We use synthetic and real-data
examples to demonstrate that our proposed procedure provides substantially
tighter confidence intervals for the ATE as compared to the whole range.

arXiv link: http://arxiv.org/abs/2505.08729v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-05-13

An Efficient Multi-scale Leverage Effect Estimator under Dependent Microstructure Noise

Authors: Ziyang Xiong, Zhao Chen, Christina Dan Wang

Estimating the leverage effect from high-frequency data is vital but
challenged by complex, dependent microstructure noise, often exhibiting
non-Gaussian higher-order moments. This paper introduces a novel multi-scale
framework for efficient and robust leverage effect estimation under such
flexible noise structures. We develop two new estimators, the
Subsampling-and-Averaging Leverage Effect (SALE) and the Multi-Scale Leverage
Effect (MSLE), which adapt subsampling and multi-scale approaches holistically
using a unique shifted window technique. This design simplifies the multi-scale
estimation procedure and enhances noise robustness without requiring the
pre-averaging approach. We establish central limit theorems and stable
convergence, with MSLE attaining an optimal $n^{-1/4}$ rate in the noise-free
setting and a near-optimal $n^{-1/9}$ rate in the noisy setting.
A cornerstone of our framework's efficiency is a specifically designed MSLE
weighting strategy that leverages covariance structures across scales. This
significantly reduces asymptotic variance and, critically, yields substantially
smaller finite-sample errors than existing methods under both noise-free and
realistic noisy settings. Extensive simulations and empirical analyses confirm
the superior efficiency, robustness, and practical advantages of our approach.

arXiv link: http://arxiv.org/abs/2505.08654v2

Econometrics arXiv updated paper (originally submitted: 2025-05-13)

On Selection of Cross-Section Averages in Non-stationary Environments

Authors: Jan Ditzen, Ovidijus Stauskas

Information criteria (IC) have been widely used in factor models to estimate
an unknown number of latent factors. It has recently been shown that IC perform
well in Common Correlated Effects (CCE) and related setups in selecting a set
of cross-section averages (CAs) sufficient for the factor space under
stationary factors. As CAs can proxy non-stationary factors, it is tempting to
claim such generality of IC, too. We show formally and in simulations that IC
have a severe underselection issue even under very mild forms of factor
non-stationarity, which goes against the sentiment in the literature.

arXiv link: http://arxiv.org/abs/2505.08615v4

Econometrics arXiv updated paper (originally submitted: 2025-05-13)

Team Networks with Partially Observed Links

Authors: Yang Xu

This paper studies a linear production model in team networks with missing
links. In the model, heterogeneous workers, represented as nodes, produce
jointly and repeatedly within teams, represented as links. Links are omitted
when their associated outcome variables fall below a threshold, resulting in
partial observability of the network. To address this, I propose a Generalized
Method of Moments estimator under normally distributed errors and develop a
distribution-free test for detecting link truncation. Applied to academic
publication data, the estimator reveals and corrects a substantial downward
bias in the estimated scaling factor that aggregates individual fixed effects
into team-specific fixed effects. This finding suggests that the collaboration
premium may be systematically underestimated when missing links are not
properly accounted for.

arXiv link: http://arxiv.org/abs/2505.08405v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-05-12

rd2d: Causal Inference in Boundary Discontinuity Designs

Authors: Matias D. Cattaneo, Rocio Titiunik, Ruiqi Rae Yu

Boundary discontinuity designs -- also known as Multi-Score Regression
Discontinuity (RD) designs, with Geographic RD designs as a prominent example
-- are often used in empirical research to learn about causal treatment effects
along a continuous assignment boundary defined by a bivariate score. This
article introduces the R package rd2d, which implements and extends the
methodological results developed in Cattaneo, Titiunik and Yu (2025) for
boundary discontinuity designs. The package employs local polynomial estimation
and inference using either the bivariate score or a univariate
distance-to-boundary metric. It features novel data-driven bandwidth selection
procedures, and offers both pointwise and uniform estimation and inference
along the assignment boundary. The numerical performance of the package is
demonstrated through a simulation study.

arXiv link: http://arxiv.org/abs/2505.07989v2

Econometrics arXiv paper, submitted: 2025-05-10

Exploring Monetary Policy Shocks with Large-Scale Bayesian VARs

Authors: Dimitris Korobilis

I introduce a high-dimensional Bayesian vector autoregressive (BVAR)
framework designed to estimate the effects of conventional monetary policy
shocks. The model captures structural shocks as latent factors, enabling
computationally efficient estimation in high-dimensional settings through a
straightforward Gibbs sampler. By incorporating time variation in the effects
of monetary policy while maintaining tractability, the methodology offers a
flexible and scalable approach to empirical macroeconomic analysis using BVARs,
well-suited to handle data irregularities observed in recent times. Applied to
the U.S. economy, I identify monetary shocks using a combination of
high-frequency surprises and sign restrictions, yielding results that are
robust across a wide range of specification choices. The findings indicate that
the Federal Reserve's influence on disaggregated consumer prices fluctuated
significantly during the 2022-24 high-inflation period, shedding new light on
the evolving dynamics of monetary policy transmission.

arXiv link: http://arxiv.org/abs/2505.06649v1

Econometrics arXiv paper, submitted: 2025-05-09

Beyond the Mean: Limit Theory and Tests for Infinite-Mean Autoregressive Conditional Durations

Authors: Giuseppe Cavaliere, Thomas Mikosch, Anders Rahbek, Frederik Vilandt

Integrated autoregressive conditional duration (ACD) models serve as natural
counterparts to the well-known integrated GARCH models used for financial
returns. However, despite their resemblance, asymptotic theory for ACD is
challenging and still incomplete, in particular for integrated ACD. Central
challenges arise from the facts that (i) integrated ACD processes imply
durations with infinite expectation, and (ii) even in the non-integrated case,
conventional asymptotic approaches break down due to the randomness in the
number of durations within a fixed observation period. Addressing these
challenges, we provide here unified asymptotic theory for the (quasi-) maximum
likelihood estimator for ACD models; a unified theory which includes integrated
ACD models. Based on the new results, we also provide a novel framework for
hypothesis testing in duration models, enabling inference on a key empirical
question: whether durations possess a finite or infinite expectation. We apply
our results to high-frequency cryptocurrency ETF trading data. Motivated by
parameter estimates near the integrated ACD boundary, we assess whether
durations between trades in these markets have finite expectation, an
assumption often made implicitly in the literature on point process models. Our
empirical findings indicate infinite-mean durations for all the five
cryptocurrencies examined, with the integrated ACD hypothesis rejected --
against alternatives with tail index less than one -- for four out of the five
cryptocurrencies considered.
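
For reference, the exponential QMLE for a standard ACD(1,1) model, to which
the asymptotic theory above applies, can be sketched as follows (an
illustrative simulation; the infinite-mean tests proposed in the paper are not
implemented here):

import numpy as np
from scipy.optimize import minimize

def simulate_acd(n, omega, alpha, beta, rng):
    """ACD(1,1): x_i = psi_i * eps_i with unit-mean exponential errors."""
    x, psi = np.empty(n), np.empty(n)
    psi[0] = omega / (1 - alpha - beta) if alpha + beta < 1 else omega
    x[0] = psi[0] * rng.exponential()
    for i in range(1, n):
        psi[i] = omega + alpha * x[i - 1] + beta * psi[i - 1]
        x[i] = psi[i] * rng.exponential()
    return x

def neg_exponential_qlik(params, x):
    omega, alpha, beta = params
    if omega <= 0 or alpha < 0 or beta < 0:
        return np.inf
    psi = np.empty_like(x)
    psi[0] = x.mean()
    for i in range(1, x.size):
        psi[i] = omega + alpha * x[i - 1] + beta * psi[i - 1]
    return np.sum(np.log(psi) + x / psi)

rng = np.random.default_rng(0)
durations = simulate_acd(5000, omega=0.05, alpha=0.10, beta=0.85, rng=rng)
fit = minimize(neg_exponential_qlik, x0=np.array([0.1, 0.1, 0.8]),
               args=(durations,), method="Nelder-Mead")
# Estimated (omega, alpha, beta); alpha + beta near one points toward the
# integrated case, where durations have infinite expectation.
print(fit.x)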

arXiv link: http://arxiv.org/abs/2505.06190v1

Econometrics arXiv paper, submitted: 2025-05-08

Estimation and Inference in Boundary Discontinuity Designs

Authors: Matias D. Cattaneo, Rocio Titiunik, Ruiqi Rae Yu

Boundary Discontinuity Designs are used to learn about treatment effects
along a continuous boundary that splits units into control and treatment groups
according to a bivariate score variable. These research designs are also called
Multi-Score Regression Discontinuity Designs, a leading special case being
Geographic Regression Discontinuity Designs. We study the statistical
properties of commonly used local polynomial treatment effects estimators along
the continuous treatment assignment boundary. We consider two distinct
approaches: one based explicitly on the bivariate score variable for each unit,
and the other based on their univariate distance to the boundary. For each
approach, we present pointwise and uniform estimation and inference methods for
the treatment effect function over the assignment boundary. Notably, we show
that methods based on univariate distance to the boundary exhibit an
irreducible large misspecification bias when the assignment boundary has kinks
or other irregularities, making the distance-based approach unsuitable for
empirical work in those settings. In contrast, methods based on the bivariate
score variable do not suffer from that drawback. We illustrate our methods with
an empirical application. Companion general-purpose software is provided.

arXiv link: http://arxiv.org/abs/2505.05670v1

Econometrics arXiv cross-link from q-fin.RM (q-fin.RM), submitted: 2025-05-08

Comparative Evaluation of VaR Models: Historical Simulation, GARCH-Based Monte Carlo, and Filtered Historical Simulation

Authors: Xin Tian

This report presents a comprehensive evaluation of three Value-at-Risk (VaR)
modeling approaches: Historical Simulation (HS), GARCH with Normal
approximation (GARCH-N), and GARCH with Filtered Historical Simulation (FHS),
using both in-sample and multi-day forecasting frameworks. We compute daily 5
percent VaR estimates using each method and assess their accuracy via empirical
breach frequencies and visual breach indicators. Our findings reveal severe
miscalibration in the HS and GARCH-N models, with empirical breach rates far
exceeding theoretical levels. In contrast, the FHS method consistently aligns
with theoretical expectations and exhibits desirable statistical and visual
behavior. We further simulate 5-day cumulative returns under both GARCH-N and
GARCH-FHS frameworks to compute multi-period VaR and Expected Shortfall.
Results show that GARCH-N underestimates tail risk due to its reliance on the
Gaussian assumption, whereas GARCH-FHS provides more robust and conservative
tail estimates. Overall, the study demonstrates that the GARCH-FHS model offers
superior performance in capturing fat-tailed risks and provides more reliable
short-term risk forecasts.
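
A stripped-down version of the three VaR constructions on a single return
series looks as follows (toy data; the GARCH(1,1) parameters are fixed at
plausible values purely for illustration, whereas in practice they would be
estimated, e.g. by maximum likelihood):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
r = rng.standard_t(df=5, size=2000) * 0.01          # daily returns (toy data)
alpha_level = 0.05

# 1) Historical Simulation: empirical quantile of past returns.
var_hs = -np.quantile(r, alpha_level)

# 2) GARCH-N: GARCH(1,1) volatility filter with a Gaussian quantile.
omega, a, b = 1e-6, 0.08, 0.90                      # assumed parameters
sigma2 = np.empty(r.size + 1)
sigma2[0] = r.var()
for t in range(r.size):
    sigma2[t + 1] = omega + a * r[t] ** 2 + b * sigma2[t]
sigma_next = np.sqrt(sigma2[-1])
var_garch_n = -sigma_next * norm.ppf(alpha_level)

# 3) Filtered Historical Simulation: empirical quantile of the standardized
#    residuals, rescaled by the volatility forecast.
z = r / np.sqrt(sigma2[:-1])
var_fhs = -sigma_next * np.quantile(z, alpha_level)

print("HS:", var_hs, "GARCH-N:", var_garch_n, "FHS:", var_fhs)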

arXiv link: http://arxiv.org/abs/2505.05646v1

Econometrics arXiv updated paper (originally submitted: 2025-05-08)

Nonparametric Testability of Slutsky Symmetry

Authors: Florian Gunsilius, Lonjezo Sithole

Economic theory implies strong limitations on what types of consumption
behavior are considered rational. Rationality implies that the Slutsky matrix,
which captures the substitution effects of compensated price changes on demand
for different goods, is symmetric and negative semi-definite. While empirically
informed versions of negative semi-definiteness have been shown to be
nonparametrically testable, the analogous question for Slutsky symmetry has
remained open. Recently, it has even been shown that the symmetry condition is
not testable via the average Slutsky matrix, prompting conjectures about its
non-testability. We settle this question by deriving nonparametric conditional
quantile restrictions on observable data that permit construction of a fully
nonparametric test for Slutsky symmetry in an empirical setting with individual
heterogeneity and endogeneity. The theoretical contribution is a multivariate
generalization of identification results for partial effects in nonseparable
models without monotonicity, which is of independent interest. This result has
implications for different areas in econometric theory, including nonparametric
welfare analysis with individual heterogeneity for which, in the case of more
than two goods, the symmetry condition introduces a nonlinear correction
factor.

arXiv link: http://arxiv.org/abs/2505.05603v2

Econometrics arXiv paper, submitted: 2025-05-08

Measuring the Euro Area Output Gap

Authors: Matteo Barigozzi, Claudio Lissona, Matteo Luciani

We measure the Euro Area (EA) output gap and potential output using a
non-stationary dynamic factor model estimated on a large dataset of
macroeconomic and financial variables. From 2012 to 2024, we estimate that the
EA economy was tighter than policy institutions estimate, suggesting that the
slow EA growth results from a potential output issue, not a business cycle
issue. Moreover, we find that a decline in trend inflation, not slack in the
economy, kept core inflation below 2% before the pandemic and that demand
forces account for at least 30% of the post-pandemic increase in core
inflation.

arXiv link: http://arxiv.org/abs/2505.05536v1

Econometrics arXiv updated paper (originally submitted: 2025-05-08)

Forecasting Thai inflation from univariate Bayesian regression perspective

Authors: Paponpat Taveeapiradeecharoen, Popkarn Arwatchanakarn

This study investigates the forecasting performance of Bayesian shrinkage
priors in predicting Thai inflation in a univariate setup, with a particular
interest in comparing these more advanced shrinkage priors with a
likelihood-dominated/noninformative prior. Our forecasting exercises are
evaluated using
Root Mean Squared Error (RMSE), Quantile-Weighted Continuous Ranked Probability
Scores (qwCRPS), and Log Predictive Likelihood (LPL). The empirical results
reveal several interesting findings: SV-augmented models consistently
underperform compared to their non-SV counterparts, particularly in large
predictor settings. Notably, HS, DL, and LASSO in the large model setting
without SV exhibit superior performance across multiple horizons. This
indicates that a broader range of predictors captures economic dynamics more
effectively than modeling time-varying volatility. Furthermore, while left-tail
risks (deflationary pressures) are well-controlled by advanced priors (HS, HS+,
and DL), right-tail risks (inflationary surges) remain challenging to forecast
accurately. The results underscore the trade-off between model complexity and
forecast accuracy, with simpler models delivering more reliable predictions in
both normal and crisis periods (e.g., the COVID-19 pandemic). This study
contributes to the literature by highlighting the limitations of SV models in
high-dimensional environments and advocating for a balanced approach that
combines advanced shrinkage techniques with broad predictor coverage. These
insights are crucial for policymakers and researchers aiming to enhance the
precision of inflation forecasts in emerging economies.

arXiv link: http://arxiv.org/abs/2505.05334v2

Econometrics arXiv paper, submitted: 2025-05-08

Scenario Synthesis and Macroeconomic Risk

Authors: Tobias Adrian, Domenico Giannone, Matteo Luciani, Mike West

We introduce methodology to bridge scenario analysis and model-based risk
forecasting, leveraging their respective strengths in policy settings. Our
Bayesian framework addresses the fundamental challenge of reconciling
judgmental narrative approaches with statistical forecasting. Analysis
evaluates explicit measures of concordance of scenarios with a reference
forecasting model, delivers Bayesian predictive synthesis of the scenarios to
best match that reference, and addresses scenario set incompleteness. This
underlies systematic evaluation and integration of risks from different
scenarios, and quantifies relative support for scenarios modulo the defined
reference forecasts. The framework offers advances in forecasting in policy
institutions that supports clear and rigorous communication of evolving risks.
We also discuss broader questions of integrating judgmental information with
statistical model-based forecasts in the face of unexpected circumstances.

arXiv link: http://arxiv.org/abs/2505.05193v1

Econometrics arXiv paper, submitted: 2025-05-07

A Powerful Chi-Square Specification Test with Support Vectors

Authors: Yuhao Li, Xiaojun Song

Specification tests, such as Integrated Conditional Moment (ICM) and Kernel
Conditional Moment (KCM) tests, are crucial for model validation but often lack
power in finite samples. This paper proposes a novel framework to enhance
specification test performance using Support Vector Machines (SVMs) for
direction learning. We introduce two alternative SVM-based approaches: one
maximizes the discrepancy between nonparametric and parametric classes, while
the other maximizes the separation between residuals and the origin. Both
approaches lead to a $t$-type test statistic that converges to a standard
chi-square distribution under the null hypothesis. Our method is
computationally efficient and capable of detecting any arbitrary alternative.
Simulation studies demonstrate its superior performance compared to existing
methods, particularly in large-dimensional settings.

arXiv link: http://arxiv.org/abs/2505.04414v1

Econometrics arXiv cross-link from General Economics (econ.GN), submitted: 2025-05-07

Shocking concerns: public perception about climate change and the macroeconomy

Authors: Giovanni Angelini, Maria Elena Bontempi, Luca De Angelis, Paolo Neri, Marco Maria Sorge

Public perceptions of climate change arguably contribute to shaping private
adaptation and support for policy intervention. In this paper, we propose a
novel Climate Concern Index (CCI), based on disaggregated web-search volumes
related to climate change topics, to gauge the intensity and dynamic evolution
of collective climate perceptions, and evaluate its impacts on the business
cycle. Using data from the United States over the 2004:2024 span, we capture
widespread shifts in perceived climate-related risks, particularly those
consistent with the postcognitive interpretation of affective responses to
extreme climate events. To assess the aggregate implications of evolving public
concerns about the climate, we estimate a proxy-SVAR model and find that
exogenous variation in the CCI entails a statistically significant drop in both
employment and private consumption and a persistent surge in stock market
volatility, while core inflation remains largely unaffected. These results
suggest that, even in the absence of direct physical risks, heightened concerns
for climate-related phenomena can trigger behavioral adaptation with nontrivial
consequences for the macroeconomy, thereby demanding attention from
institutional players in the macro-financial field.

arXiv link: http://arxiv.org/abs/2505.04669v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-05-06

Causal Inference in Counterbalanced Within-Subjects Designs

Authors: Justin Ho, Jonathan Min

Experimental designs are fundamental for estimating causal effects. In some
fields, within-subjects designs, which expose participants to both control and
treatment at different time periods, are used to address practical and
logistical concerns. Counterbalancing, a common technique in within-subjects
designs, aims to remove carryover effects by randomizing treatment sequences.
Despite its appeal, counterbalancing relies on the assumption that carryover
effects are symmetric and cancel out, which is often unverifiable a priori. In
this paper, we formalize the challenges of counterbalanced within-subjects
designs using the potential outcomes framework. We introduce sequential
exchangeability as an additional identification assumption necessary for valid
causal inference in these designs. To address identification concerns, we
propose diagnostic checks, the use of washout periods, covariate adjustments,
and alternative experimental designs to the counterbalanced within-subjects
design. Our findings demonstrate the limitations of
counterbalancing and provide guidance on when and how within-subjects designs
can be appropriately used for causal inference.

arXiv link: http://arxiv.org/abs/2505.03937v1

Econometrics arXiv updated paper (originally submitted: 2025-05-05)

Slope Consistency of Quasi-Maximum Likelihood Estimator for Binary Choice Models

Authors: Yoosoon Chang, Joon Y. Park, Guo Yan

Although QMLE is generally inconsistent, logistic regression relying on the
binary choice model (BCM) with logistic errors is widely used, especially in
machine learning contexts with many covariates and high-dimensional slope
coefficients. This paper revisits the slope consistency of QMLE for BCMs. Ruud
(1983) introduced a set of conditions under which QMLE may yield a constant
multiple of the slope coefficient of BCMs asymptotically. However, he did not
fully establish slope consistency of QMLE, which requires the existence of a
positive multiple of the slope coefficient identified as an interior maximizer of
the population QMLE likelihood function over an appropriately restricted
parameter space. We fill this gap by providing a formal proof of slope
consistency under the same set of conditions for any binary choice model
identified as in Manski (1975, 1985). Our result implies that logistic
regression yields a consistent estimate for the slope coefficient of BCMs under
suitable conditions.

arXiv link: http://arxiv.org/abs/2505.02327v2

Econometrics arXiv cross-link from q-fin.PM (q-fin.PM), submitted: 2025-05-04

Latent Variable Estimation in Bayesian Black-Litterman Models

Authors: Thomas Y. L. Lin, Jerry Yao-Chieh Hu, Paul W. Chiou, Peter Lin

We revisit the Bayesian Black-Litterman (BL) portfolio model and remove its
reliance on subjective investor views. Classical BL requires an investor
"view": a forecast vector $q$ and its uncertainty matrix $\Omega$ that describe
how much a chosen portfolio should outperform the market. Our key idea is to
treat $(q,\Omega)$ as latent variables and learn them from market data within a
single Bayesian network. Consequently, the resulting posterior estimation
admits a closed-form expression, enabling fast inference and stable portfolio
weights. Building on these, we propose two mechanisms to capture how features
interact with returns: shared-latent parametrization and feature-influenced
views; both recover classical BL and Markowitz portfolios as special cases.
Empirically, on 30-year Dow-Jones and 20-year sector-ETF data, we improve
Sharpe ratios by 50% and cut turnover by 55% relative to Markowitz and the
index baselines. This work turns BL into a fully data-driven, view-free, and
coherent Bayesian framework for portfolio optimization.

arXiv link: http://arxiv.org/abs/2505.02185v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2025-05-03

Unemployment Dynamics Forecasting with Machine Learning Regression Models

Authors: Kyungsu Kim

In this paper, I explored how a range of regression and machine learning
techniques can be applied to monthly U.S. unemployment data to produce timely
forecasts. I compared seven models: Linear Regression, SGDRegressor, Random
Forest, XGBoost, CatBoost, Support Vector Regression, and an LSTM network,
training each on a historical span of data and then evaluating on a later
hold-out period. Input features include macro indicators (GDP growth, CPI),
labor market measures (job openings, initial claims), financial variables
(interest rates, equity indices), and consumer sentiment.
I tuned model hyperparameters via cross-validation and assessed performance
with standard error metrics and the ability to predict the correct unemployment
direction. Across the board, tree-based ensembles (and CatBoost in particular)
deliver noticeably better forecasts than simple linear approaches, while the
LSTM captures underlying temporal patterns more effectively than other
nonlinear methods. SVR and SGDRegressor yield modest gains over standard
regression but don't match the consistency of the ensemble and deep-learning
models.
Interpretability tools (feature importance rankings and SHAP values) point to
job openings and consumer sentiment as the most influential predictors across
all methods. By directly comparing linear, ensemble, and deep-learning
approaches on the same dataset, our study shows how modern machine-learning
techniques can enhance real-time unemployment forecasting, offering economists
and policymakers richer insights into labor market trends.
In the comparative evaluation of the models, I employed a dataset comprising
thirty distinct features over the period from January 2020 through December
2024.
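
A schematic version of the model comparison on synthetic data is given below
(GradientBoostingRegressor stands in for the gradient-boosting variants; the
LSTM, hyperparameter tuning, and the actual macro features are omitted):

import numpy as np
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
n = 200                                    # months
X = rng.normal(size=(n, 8))                # stand-ins for macro/labor features
y = 4 + X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 0.3, n)   # unemployment rate

split = int(0.8 * n)                       # chronological hold-out split
X_tr, X_te, y_tr, y_te = X[:split], X[split:], y[:split], y[split:]

models = {
    "linear": LinearRegression(),
    "sgd": SGDRegressor(max_iter=5000),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "boosting": GradientBoostingRegressor(random_state=0),
    "svr": SVR(),
}
for name, m in models.items():
    pred = m.fit(X_tr, y_tr).predict(X_te)
    mae = mean_absolute_error(y_te, pred)
    # Direction accuracy: share of months where the predicted change has the
    # same sign as the realized change.
    direction = np.mean(np.sign(np.diff(pred)) == np.sign(np.diff(y_te)))
    print(f"{name}: MAE={mae:.3f}, direction accuracy={direction:.2f}")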

arXiv link: http://arxiv.org/abs/2505.01933v1

Econometrics arXiv paper, submitted: 2025-05-02

Identification and estimation of dynamic random coefficient models

Authors: Wooyong Lee

I study panel data linear models with predetermined regressors (such as
lagged dependent variables) where coefficients are individual-specific,
allowing for heterogeneity in the effects of the regressors on the dependent
variable. I show that the model is not point-identified in a short panel
context but rather partially identified, and I characterize the identified sets
for the mean, variance, and CDF of the coefficient distribution. This
characterization is general, accommodating discrete, continuous, and unbounded
data, and it leads to computationally tractable estimation and inference
procedures. I apply the method to study lifecycle earnings dynamics among U.S.
households using the Panel Study of Income Dynamics (PSID) dataset. The results
suggest substantial unobserved heterogeneity in earnings persistence, implying
that households face varying levels of earnings risk which, in turn, contribute
to heterogeneity in their consumption and savings behaviors.

arXiv link: http://arxiv.org/abs/2505.01600v1

Econometrics arXiv cross-link from q-fin.CP (q-fin.CP), submitted: 2025-05-02

Asset Pricing in Pre-trained Transformer

Authors: Shanyan Lai

This paper proposes an innovative Transformer model, Single-directional
representative from Transformer (SERT), for US large capital stock pricing. It
also innovatively applies the pre-trained Transformer models under the stock
pricing and factor investment context. They are compared with standard
Transformer models and encoder-only Transformer models in three periods
covering the entire COVID-19 pandemic to examine the model adaptivity and
suitability during the extreme market fluctuations. Namely, pre-COVID-19 period
(mild up-trend), COVID-19 period (sharp up-trend with deep down shock) and
1-year post-COVID-19 (high-fluctuation sideways movement). The best of the
proposed SERT models achieves the highest out-of-sample R2, 11.2% and 10.91%
respectively, during the periods of extreme market fluctuation, followed by
the pre-trained Transformer models (10.38% and 9.15%). The performance of
their trend-following strategies likewise demonstrates a strong capability for
hedging downside risk during market shocks. The proposed SERT model achieves a
Sortino ratio 47% higher than the buy-and-hold benchmark in the equal-weighted
portfolio and 28% higher in the value-weighted portfolio over the pandemic
period. These results indicate that Transformer models are well suited to
capturing patterns in temporally sparse data within asset pricing factor
models, especially under high volatility. We also find that the softmax signal
filter, a common configuration of Transformer models in other contexts, merely
eliminates differences between models without improving strategy performance,
that adding attention heads improves model performance only marginally, and
that the 'layer norm first' method does not boost performance in our case.

arXiv link: http://arxiv.org/abs/2505.01575v2

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2025-05-02

Multiscale Causal Analysis of Market Efficiency via News Uncertainty Networks and the Financial Chaos Index

Authors: Masoud Ataei

This study evaluates the scale-dependent informational efficiency of stock
markets using the Financial Chaos Index, a tensor-eigenvalue-based measure of
realized volatility. Incorporating Granger causality and network-theoretic
analysis across a range of economic, policy, and news-based uncertainty
indices, we assess whether public information is efficiently incorporated into
asset price fluctuations. Based on a 34-year time period from 1990 to 2023, at
the daily frequency, the semi-strong form of the Efficient Market Hypothesis is
rejected at the 1% level of significance, indicating that asset price changes
respond predictably to lagged news-based uncertainty. In contrast, at the
monthly frequency, such predictive structure largely vanishes, supporting
informational efficiency at coarser temporal resolutions. A structural analysis
of the Granger causality network reveals that fiscal and monetary policy
uncertainties act as core initiators of systemic volatility, while peripheral
indices, such as those related to healthcare and consumer prices, serve as
latent bridges that become activated under crisis conditions. These findings
underscore the role of time-scale decomposition and structural asymmetries in
diagnosing market inefficiencies and mapping the propagation of macro-financial
uncertainty.
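
The pairwise building block of such an analysis, a Granger causality test
between an uncertainty index and a volatility measure, can be run with
statsmodels (synthetic series; the network construction and multiscale
decomposition used in the paper are not reproduced):

import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
n = 500
uncertainty = rng.normal(size=n)
volatility = np.zeros(n)
for t in range(1, n):
    # Volatility responds to lagged uncertainty, so causality should be found.
    volatility[t] = 0.3 * volatility[t - 1] + 0.5 * uncertainty[t - 1] + rng.normal()

# Tests whether the second column Granger-causes the first.
data = np.column_stack([volatility, uncertainty])
results = grangercausalitytests(data, maxlag=5)
pvals = {lag: res[0]["ssr_ftest"][1] for lag, res in results.items()}
print(pvals)    # small p-values indicate predictive content at that lag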

arXiv link: http://arxiv.org/abs/2505.01543v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2025-05-02

Predicting the Price of Gold in the Financial Markets Using Hybrid Models

Authors: Mohammadhossein Rashidi, Mohammad Modarres

Producing price forecasts with minimal error and the highest possible accuracy
has been one of the most challenging and critical concerns for capital market
participants and researchers, so models that deliver highly accurate results
are of considerable interest. In this project, we combine a time series model
(ARIMA) that estimates the price with technical analysis variables and
indicators that reflect the behavior of traders and the psychological factors
involved. Linking all of these variables through stepwise regression, we
identify the variables with the greatest influence on the prediction target,
and we then feed the selected variables into an artificial neural network. We
refer to this overall prediction pipeline as the "ARIMA_Stepwise
Regression_Neural Network" model and use it to predict the price of gold in
international financial markets. The approach is expected to be applicable to
stocks, commodities, currency pairs, financial market indices, and other
instruments traded in local and international markets. We also compare the
results of this method with those of pure time series methods. Based on the
results, the hybrid model attains the highest accuracy relative to the time
series, regression, and stepwise regression benchmarks.

arXiv link: http://arxiv.org/abs/2505.01402v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-05-02

Design-Based Inference under Random Potential Outcomes via Riesz Representation

Authors: Yukai Yang

We introduce a design-based framework for causal inference that accommodates
random potential outcomes, thereby extending the classical Neyman-Rubin model
in which outcomes are treated as fixed. Each unit's potential outcome is
modelled as a structural mapping $y_i(z, \omega)$, where $z$ denotes
the treatment assignment and $\omega$ represents latent outcome-level
randomness. Inspired by recent connections between design-based inference and
the Riesz representation theorem, we embed potential outcomes in a Hilbert
space and define treatment effects as linear functionals, yielding estimators
constructed via their Riesz representers. This approach preserves the core
identification logic of randomised assignment while enabling valid inference
under stochastic outcome variation. We establish large-sample properties under
local dependence and develop consistent variance estimators that remain valid
under weaker structural assumptions, including partially known dependence. A
simulation study illustrates the robustness and finite-sample behaviour of the
estimators. Overall, the framework unifies design-based reasoning with
stochastic outcome modelling, broadening the scope of causal inference in
complex experimental settings.

arXiv link: http://arxiv.org/abs/2505.01324v5

Econometrics arXiv updated paper (originally submitted: 2025-05-02)

Detecting multiple change points in linear models with heteroscedastic errors

Authors: Lajos Horvath, Gregory Rice, Yuqian Zhao

The problem of detecting change points in the regression parameters of a
linear regression model with errors and covariates exhibiting
heteroscedasticity is considered. Asymptotic results for weighted functionals
of the cumulative sum (CUSUM) processes of model residuals are established when
the model errors are weakly dependent and non-stationary, allowing for either
abrupt or smooth changes in their variance. These theoretical results
illuminate how to adapt standard change point test statistics for linear models
to this setting. We study such adapted change-point tests in simulation
experiments, along with a finite sample adjustment to the proposed testing
procedures. The results suggest that these methods perform well in practice for
detecting multiple change points in the linear model parameters and controlling
the Type I error rate in the presence of heteroscedasticity. We illustrate the
use of these approaches in applications to test for instability in predictive
regression models and explanatory asset pricing models.
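
The basic residual-CUSUM statistic that these adapted tests build on can be
illustrated as follows (a plain, unweighted version under homoscedastic
errors; the weighting and finite sample adjustments studied in the paper are
not included):

import numpy as np

rng = np.random.default_rng(0)
n = 400
x = rng.normal(size=n)
intercept = np.where(np.arange(n) < n // 2, 0.0, 1.0)   # level shift mid-sample
y = intercept + 0.5 * x + rng.normal(size=n)

# OLS residuals under the no-change null.
X = np.column_stack([np.ones(n), x])
resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

# CUSUM process of the residuals and its sup functional.
sigma = resid.std(ddof=2)
cusum = np.cumsum(resid) / (sigma * np.sqrt(n))
stat = np.abs(cusum).max()
print("sup-CUSUM statistic:", stat)   # compare with Brownian-bridge critical values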

arXiv link: http://arxiv.org/abs/2505.01296v2

Econometrics arXiv paper, submitted: 2025-05-02

Model Checks in a Kernel Ridge Regression Framework

Authors: Yuhao Li

We propose new reproducing kernel-based tests for model checking in
conditional moment restriction models. By regressing estimated residuals on
kernel functions via kernel ridge regression (KRR), we obtain a coefficient
function in a reproducing kernel Hilbert space (RKHS) that is zero if and only
if the model is correctly specified. We introduce two classes of test
statistics: (i) projection-based tests, using RKHS inner products to capture
global deviations, and (ii) random location tests, evaluating the KRR estimator
at randomly chosen covariate points to detect local departures. The tests are
consistent against fixed alternatives and sensitive to local alternatives at
the $n^{-1/2}$ rate. When nuisance parameters are estimated, Neyman
orthogonality projections ensure valid inference without repeated estimation in
bootstrap samples. The random location tests are interpretable and can
visualize model misspecification. Simulations show strong power and size
control, especially in higher dimensions, outperforming existing methods.
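
A simplified illustration of the KRR-of-residuals idea (not the paper's exact
projection or random-location statistics): regress the residuals of a fitted
parametric model on the covariates via kernel ridge regression and inspect the
magnitude of the fitted function, which should be close to zero only under
correct specification.

import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-2, 2, size=(n, 1))
y = np.sin(2 * x[:, 0]) + rng.normal(0, 0.3, n)        # true model is nonlinear

# Misspecified parametric model: linear in x.
X_lin = np.column_stack([np.ones(n), x[:, 0]])
resid = y - X_lin @ np.linalg.lstsq(X_lin, y, rcond=None)[0]

# KRR of residuals on covariates; the fitted function is (near) zero if and
# only if the conditional moment restriction E[resid | x] = 0 holds.
krr = KernelRidge(alpha=1.0, kernel="rbf", gamma=1.0).fit(x, resid)
fitted = krr.predict(x)
stat = n * np.mean(fitted ** 2)
print("illustrative specification statistic:", stat)
# Critical values would come from a multiplier bootstrap in a full procedure.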

arXiv link: http://arxiv.org/abs/2505.01161v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-05-01

Proper Correlation Coefficients for Nominal Random Variables

Authors: Jan-Lukas Wermuth

This paper develops an intuitive concept of perfect dependence between two
variables of which at least one has a nominal scale that is attainable for all
marginal distributions and proposes a set of dependence measures that are 1 if
and only if this perfect dependence is satisfied. The advantages of these
dependence measures relative to classical dependence measures like contingency
coefficients, Goodman-Kruskal's lambda and tau and the so-called uncertainty
coefficient are twofold. Firstly, they are defined if one of the variables is
real-valued and exhibits continuities. Secondly, they satisfy the property of
attainability. That is, they can take all values in the interval [0,1]
irrespective of the marginals involved. Both properties are not shared by the
classical dependence measures which need two discrete marginal distributions
and can in some situations yield values close to 0 even though the dependence
is strong or even perfect.
Additionally, I provide a consistent estimator for one of the new dependence
measures together with its asymptotic distribution under independence as well
as in the general case. This allows the construction of confidence intervals and an
independence test, whose finite sample performance I subsequently examine in a
simulation study. Finally, I illustrate the use of the new dependence measure
in two applications on the dependence between the variables country and income
or country and religion, respectively.

arXiv link: http://arxiv.org/abs/2505.00785v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2025-05-01

Explainable AI in Spatial Analysis

Authors: Ziqi Li

This chapter discusses the opportunities of eXplainable Artificial
Intelligence (XAI) within the realm of spatial analysis. A key objective in
spatial analysis is to model spatial relationships and infer spatial processes
to generate knowledge from spatial data, which has been largely based on
spatial statistical methods. More recently, machine learning offers scalable
and flexible approaches that complement traditional methods and has been
increasingly applied in spatial data science. Despite its advantages, machine
learning is often criticized for being a black box, which limits our
understanding of model behavior and output. Recognizing this limitation, XAI
has emerged as a pivotal field in AI that provides methods to explain the
output of machine learning models to enhance transparency and understanding.
These methods are crucial for model diagnosis, bias detection, and ensuring the
reliability of results obtained from machine learning models. This chapter
introduces key concepts and methods in XAI with a focus on Shapley value-based
approaches, arguably the most popular class of XAI methods, and their
integration with spatial analysis. An empirical example of county-level voting
behaviors in the 2020 Presidential election is presented to demonstrate the use
of Shapley values and spatial analysis with a comparison to multi-scale
geographically weighted regression. The chapter concludes with a discussion on
the challenges and limitations of current XAI techniques and proposes new
directions.
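
A minimal example of computing Shapley values for a fitted tree-based model is
shown below (assuming the shap package; the features are synthetic stand-ins
for county-level covariates, and the comparison with multi-scale
geographically weighted regression is not included):

import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.normal(size=n),        # e.g. median income
    rng.normal(size=n),        # e.g. share with a college degree
    rng.uniform(size=n),       # e.g. population density (scaled)
])
y = 0.4 * X[:, 0] - 0.3 * X[:, 1] + 0.1 * rng.normal(size=n)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# TreeExplainer computes Shapley values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)       # shape: (n_obs, n_features)
print("mean |SHAP| per feature:", np.abs(shap_values).mean(axis=0))
# The local attributions can then be mapped or contrasted with local
# coefficients from geographically weighted regression.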

arXiv link: http://arxiv.org/abs/2505.00591v1

Econometrics arXiv updated paper (originally submitted: 2025-05-01)

Pre-Training Estimators for Structural Models: Application to Consumer Search

Authors: Yanhao 'Max' Wei, Zhenling Jiang

We explore pretraining estimators for structural econometric models. The
estimator is "pretrained" in the sense that the bulk of the computational cost
and researcher effort occur during the construction of the estimator.
Subsequent applications of the estimator to different datasets require little
computational cost or researcher effort. The estimation leverages a neural net
to recognize the structural model's parameters from data patterns. As an initial
trial, this paper builds a pretrained estimator for a sequential search model
that is known to be difficult to estimate. We evaluate the pretrained estimator
on 12 real datasets. The estimation takes seconds to run and shows high
accuracy. We provide the estimator at pnnehome.github.io. More generally,
pretrained, off-the-shelf estimators can make structural models more accessible
to researchers and practitioners.
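
A heavily simplified, hypothetical sketch of the pretraining idea, using a toy
model rather than the paper's consumer-search model: simulate many datasets
with known parameters, summarize each with a few moments, train a neural
network to map moments to the parameter, and then apply the trained network to
new data at negligible cost.

```python
# Hypothetical toy sketch of a "pretrained" estimator: a neural net learns the
# map from data moments to a structural parameter on simulated data.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)

def simulate(theta, n=500):
    """Toy structural model: y = exp(theta * x) + noise, x ~ N(0,1)."""
    x = rng.normal(size=n)
    y = np.exp(theta * x) + rng.normal(scale=0.1, size=n)
    return x, y

def moments(x, y):
    """Cheap summary statistics used as inputs to the neural net."""
    return np.array([y.mean(), y.std(), np.corrcoef(x, y)[0, 1],
                     np.mean(x * y), np.mean(x**2 * y)])

# "Pretraining" step: costly, done once.
thetas = rng.uniform(0.1, 1.0, size=5000)
M = np.vstack([moments(*simulate(t)) for t in thetas])
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                   random_state=0).fit(M, thetas)

# Application step: near-instant estimation on a new dataset.
x_new, y_new = simulate(theta=0.7)
theta_hat = net.predict(moments(x_new, y_new).reshape(1, -1))[0]
print(f"true theta = 0.70, pretrained estimate = {theta_hat:.3f}")
```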

arXiv link: http://arxiv.org/abs/2505.00526v3

Econometrics arXiv updated paper (originally submitted: 2025-05-01)

A Unifying Framework for Robust and Efficient Inference with Unstructured Data

Authors: Jacob Carlson, Melissa Dell

This paper presents a general framework for conducting efficient inference on
parameters derived from unstructured data, which include text, images, audio,
and video. Economists have long used unstructured data by first extracting
low-dimensional structured features (e.g., the topic or sentiment of a text),
since the raw data are too high-dimensional and uninterpretable to include
directly in empirical analyses. The rise of deep neural networks has
accelerated this practice by greatly reducing the costs of extracting
structured data at scale, but neural networks do not make generically unbiased
predictions. This potentially propagates bias to the downstream estimators that
incorporate imputed structured data, and the availability of different
off-the-shelf neural networks with different biases moreover raises p-hacking
concerns. To address these challenges, we reframe inference with unstructured
data as a problem of missing structured data, where structured variables are
imputed from high-dimensional unstructured inputs. This perspective allows us
to apply classic results from semiparametric inference, leading to estimators
that are valid, efficient, and robust. We formalize this approach with MAR-S, a
framework that unifies and extends existing methods for debiased inference
using machine learning predictions, connecting them to familiar problems such
as causal inference. Within this framework, we develop robust and efficient
estimators for both descriptive and causal estimands and address challenges
like inference with aggregated and transformed missing structured data, a
common scenario that is not covered by existing work. These methods, and the
accompanying implementation package, provide economists with accessible tools
for constructing unbiased estimators using unstructured data in a wide range of
applications, as we demonstrate by re-analyzing several influential studies.
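
The basic debiasing logic that the missing-data reframing builds on can be
sketched in a few lines; the following hypothetical example corrects an
ML-imputed mean with a small hand-labeled subsample and illustrates only the
general idea, not the MAR-S estimators themselves.

```python
# Minimal hypothetical sketch of debiasing an ML-imputed mean with a small
# labeled subsample; this illustrates the general logic, not the MAR-S
# estimators developed in the paper.
import numpy as np

rng = np.random.default_rng(2)
n, n_lab = 20000, 500

# True (unobserved) structured variable, e.g. a text label coded 0/1.
truth = rng.binomial(1, 0.30, size=n)
# Neural-net "predictions" with a systematic bias (too many positives).
pred = np.clip(truth + rng.binomial(1, 0.10, size=n) * (1 - truth), 0, 1)

# A random subsample is hand-labeled.
lab_idx = rng.choice(n, size=n_lab, replace=False)

naive = pred.mean()                                   # biased plug-in estimate
correction = (truth[lab_idx] - pred[lab_idx]).mean()  # estimated bias
debiased = naive + correction
se = (truth[lab_idx] - pred[lab_idx]).std(ddof=1) / np.sqrt(n_lab)

print(f"true mean     : {truth.mean():.3f}")
print(f"naive ML mean : {naive:.3f}")
print(f"debiased mean : {debiased:.3f} (se of correction ~ {se:.3f})")
```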

arXiv link: http://arxiv.org/abs/2505.00282v2

Econometrics arXiv paper, submitted: 2025-05-01

Policy Learning with $α$-Expected Welfare

Authors: Yanqin Fan, Yuan Qi, Gaoqian Xu

This paper proposes an optimal policy that targets the average welfare of the
worst-off $\alpha$-fraction of the post-treatment outcome distribution. We
refer to this policy as the $\alpha$-Expected Welfare Maximization
($\alpha$-EWM) rule, where $\alpha \in (0,1]$ denotes the size of the
subpopulation of interest. The $\alpha$-EWM rule interpolates between the
expected welfare ($\alpha=1$) and the Rawlsian welfare ($\alpha\rightarrow 0$).
For $\alpha\in (0,1)$, an $\alpha$-EWM rule can be interpreted as a
distributionally robust EWM rule that allows the target population to have a
different distribution than the study population. Using the dual formulation of
our $\alpha$-expected welfare function, we propose a debiased estimator for the
optimal policy and establish its asymptotic upper regret bounds. In addition,
we develop asymptotically valid inference for the optimal welfare based on the
proposed debiased estimator. We examine the finite sample performance of the
debiased estimator and inference via both real and synthetic data.
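
A hypothetical plug-in illustration of the welfare criterion itself (not the
paper's debiased estimator or its dual formulation): for each candidate
treatment rule, compute the average outcome of the worst-off $\alpha$-fraction
and compare rules on that basis.

```python
# Hypothetical sketch: compare two treatment rules by the alpha-expected
# welfare criterion, i.e. the average outcome of the worst-off alpha-fraction.
# This is a plug-in illustration, not the paper's debiased estimator.
import numpy as np

rng = np.random.default_rng(3)
n, alpha = 50000, 0.25

x = rng.normal(size=n)                       # a covariate
y0 = x + rng.normal(size=n)                  # outcome without treatment
y1 = 0.5 + 0.5 * x + rng.normal(size=n)      # treatment helps low-x units most

def alpha_welfare(y, alpha):
    """Average of the worst-off alpha-fraction of outcomes."""
    cutoff = np.quantile(y, alpha)
    return y[y <= cutoff].mean()

def realized(policy):
    """Outcomes when units with policy(x)=1 are treated."""
    treat = policy(x)
    return np.where(treat, y1, y0)

rule_treat_low = lambda x: x < 0.0           # treat the disadvantaged
rule_treat_high = lambda x: x > 0.0          # treat the already well-off

for name, rule in [("treat low-x", rule_treat_low), ("treat high-x", rule_treat_high)]:
    y = realized(rule)
    print(f"{name:12s}: mean welfare = {y.mean():.3f}, "
          f"{alpha:.0%}-expected welfare = {alpha_welfare(y, alpha):.3f}")
```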

arXiv link: http://arxiv.org/abs/2505.00256v1

Econometrics arXiv paper, submitted: 2025-04-30

On the Robustness of Mixture Models in the Presence of Hidden Markov Regimes with Covariate-Dependent Transition Probabilities

Authors: Demian Pouzo, Martin Sola, Zacharias Psaradakis

This paper studies the robustness of quasi-maximum-likelihood (QML)
estimation in hidden Markov models (HMMs) when the regime-switching structure
is misspecified. Specifically, we examine the case where the true
data-generating process features a hidden Markov regime sequence with
covariate-dependent transition probabilities, but estimation proceeds under a
simplified mixture model that assumes regimes are independent and identically
distributed. We show that the parameters governing the conditional distribution
of the observables can still be consistently estimated under this
misspecification, provided certain regularity conditions hold. Our results
highlight a practical benefit of using computationally simpler mixture models
in settings where regime dependence is complex or difficult to model directly.

arXiv link: http://arxiv.org/abs/2504.21669v1

Econometrics arXiv paper, submitted: 2025-04-30

Real-time Program Evaluation using Anytime-valid Rank Tests

Authors: Sam van Meer, Nick W. Koning

Counterfactual mean estimators such as difference-in-differences and
synthetic control have grown into workhorse tools for program evaluation.
Inference for these estimators is well-developed in settings where all
post-treatment data is available at the time of analysis. However, in settings
where data arrives sequentially, these tests do not permit real-time inference,
as they require a pre-specified sample size T. We introduce real-time inference
for program evaluation through anytime-valid rank tests. Our methodology relies
on interpreting the absence of a treatment effect as exchangeability of the
treatment estimates. We then convert these treatment estimates into sequential
ranks, and construct optimal finite-sample valid sequential tests for
exchangeability. We illustrate our methods in the context of
difference-in-differences and synthetic control. In simulations, they control
size even under mild exchangeability violations. While our methods suffer
slight power loss at T, they allow for early rejection (before T) and preserve
the ability to reject later (after T).

arXiv link: http://arxiv.org/abs/2504.21595v1

Econometrics arXiv updated paper (originally submitted: 2025-04-29)

Publication Design with Incentives in Mind

Authors: Ravi Jagadeesan, Davide Viviano

The publication process both determines which research receives the most
attention, and influences the supply of research through its impact on
researchers' private incentives. We introduce a framework to study optimal
publication decisions when researchers can choose (i) whether or how to conduct
a study and (ii) whether or how to manipulate the research findings (e.g., via
selective reporting or data manipulation). When manipulation is not possible,
but research entails substantial private costs for the researchers, it may be
optimal to incentivize cheaper research designs even if they are less accurate.
When manipulation is possible, it is optimal to publish some manipulated
results, as well as results that would not have received attention in the
absence of manipulability. Even if it is possible to deter manipulation, such
as by requiring pre-registered experiments instead of (potentially manipulable)
observational studies, it is suboptimal to do so when experiments entail high
research costs. We illustrate the implications of our model in an application
to medical studies.

arXiv link: http://arxiv.org/abs/2504.21156v2

Econometrics arXiv updated paper (originally submitted: 2025-04-29)

An Axiomatic Approach to Comparing Sensitivity Parameters

Authors: Paul Diegert, Matthew A. Masten, Alexandre Poirier

Many methods are available for assessing the importance of omitted variables.
These methods typically make different, non-falsifiable assumptions. Hence the
data alone cannot tell us which method is most appropriate. Since it is
unreasonable to expect results to be robust against all possible robustness
checks, researchers often use methods deemed "interpretable", a subjective
criterion with no formal definition. In contrast, we develop the first formal,
axiomatic framework for comparing and selecting among these methods. Our
framework is analogous to the standard approach for comparing estimators based
on their sampling distributions. We propose that sensitivity parameters be
selected based on their covariate sampling distributions, a design distribution
of parameter values induced by an assumption on how covariates are assigned to
be observed or unobserved. Using this idea, we define a new concept of
parameter consistency, and argue that a reasonable sensitivity parameter should
be consistent. We prove that the literature's most popular approach is
inconsistent, while several alternatives are consistent.

arXiv link: http://arxiv.org/abs/2504.21106v2

Econometrics arXiv cross-link from General Economics (econ.GN), submitted: 2025-04-29

Construct to Commitment: The Effect of Narratives on Economic Growth

Authors: Hanyuan Jiang, Yi Man

We study how government-led narratives through mass media evolve from
construct, a mechanism for framing expectations, into commitment, a sustainable
pillar for growth. We propose the "Narratives-Construct-Commitment (NCC)"
framework outlining the mechanism and institutionalization of narratives, and
formalize it as a dynamic Bayesian game. Using the Innovation-Driven
Development Strategy (2016) as a case study, we identify the narrative shock
from high-frequency financial data and trace its impact using the local
projection method. By shaping expectations, credible narratives
institutionalize investment incentives, channel resources into R&D, and
facilitate sustained improvements in total factor productivity (TFP). Our
findings provide insights into the New Quality Productive Forces initiative,
highlighting the
role of narratives in transforming vision into tangible economic growth.

arXiv link: http://arxiv.org/abs/2504.21060v3

Econometrics arXiv updated paper (originally submitted: 2025-04-28)

Inference with few treated units

Authors: Luis Alvarez, Bruno Ferman, Kaspar Wüthrich

In many causal inference applications, only one or a few units (or clusters
of units) are treated. An important challenge in such settings is that standard
inference methods that rely on asymptotic theory may be unreliable, even when
the total number of units is large. This survey reviews and categorizes
inference methods that are designed to accommodate few treated units,
considering both cross-sectional and panel data methods. We discuss trade-offs
and connections between different approaches. In doing so, we propose slight
modifications to improve the finite-sample validity of some methods, and we
also provide theoretical justifications for existing heuristic approaches that
have been proposed in the literature.

arXiv link: http://arxiv.org/abs/2504.19841v2

Econometrics arXiv updated paper (originally submitted: 2025-04-28)

Assignment at the Frontier: Identifying the Frontier Structural Function and Bounding Mean Deviations

Authors: Dan Ben-Moshe, David Genesove

This paper analyzes a model in which an outcome equals a frontier function of
inputs minus a nonnegative unobserved deviation. We allow the distribution of
the deviation to depend on inputs. If zero lies in the support of the deviation
given inputs -- an assumption we term assignment at the frontier -- then the
frontier is identified by the supremum of the outcome at those inputs,
obviating the need for instrumental variables. We then estimate the frontier,
allowing for random error whose distribution may also depend on inputs.
Finally, we derive a lower bound on the mean deviation, using only variance and
skewness, that is robust to a scarcity of data near the frontier. We apply our
methods to estimate a firm-level frontier production function and mean
inefficiency.
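
A hypothetical sketch of the core identification idea under assignment at the
frontier: within bins of the input, the frontier is estimated by the maximum
observed outcome and deviations are measured relative to it (measurement error,
which the paper treats explicitly, is ignored here).

```python
# Hypothetical sketch of the sup-based frontier idea: within input bins, the
# frontier is estimated by the maximum observed outcome (no measurement error
# here, unlike the paper's full procedure).
import numpy as np

rng = np.random.default_rng(4)
n = 5000

x = rng.uniform(0, 1, n)                       # input
frontier = 1 + 2 * x - 0.8 * x**2              # true frontier function
u = rng.exponential(scale=0.3, size=n)         # nonnegative deviation (inefficiency)
y = frontier - u                               # observed outcome

# Bin the input and take the within-bin maximum as the frontier estimate.
bins = np.linspace(0, 1, 21)
idx = np.digitize(x, bins) - 1
frontier_hat = np.array([y[idx == b].max() for b in range(len(bins) - 1)])
mids = 0.5 * (bins[:-1] + bins[1:])

# Mean deviation implied by the estimated frontier.
dev_hat = frontier_hat[idx] - y
print(f"true mean deviation : {u.mean():.3f}")
print(f"estimated mean dev. : {dev_hat.mean():.3f}")
print("frontier at x=0.5   : true "
      f"{1 + 2*0.5 - 0.8*0.25:.3f}, estimated {frontier_hat[np.argmin(np.abs(mids - 0.5))]:.3f}")
```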

arXiv link: http://arxiv.org/abs/2504.19832v5

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-04-26

Finite-Sample Properties of Generalized Ridge Estimators in Nonlinear Models

Authors: Masamune Iwasawa

This paper addresses the longstanding challenge of analyzing the mean squared
error (MSE) of ridge-type estimators in nonlinear models, including duration,
Poisson, and multinomial choice models, where theoretical results have been
scarce. Using a finite-sample approximation technique from the econometrics
literature, we derive new results showing that the generalized ridge maximum
likelihood estimator (MLE) with a sufficiently small penalty achieves lower
finite-sample MSE for both estimation and prediction than the conventional MLE,
regardless of whether the hypotheses incorporated in the penalty are valid. A
key theoretical contribution is to demonstrate that generalized ridge
estimators generate a variance-bias trade-off in the first-order MSE of
nonlinear likelihood-based models -- a feature absent for the conventional MLE
-- which enables ridge-type estimators to attain smaller MSE when the penalty
is properly selected. Extensive simulations and an empirical application to the
estimation of marginal mean and quantile treatment effects further confirm the
superior performance and practical relevance of the proposed method.
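
A hypothetical sketch of a ridge-type penalized MLE in a nonlinear model: a
Poisson regression whose coefficients are shrunk toward a hypothesized value
via an L2 penalty, estimated with scipy. The penalty form and tuning below are
illustrative and do not reproduce the paper's estimator or its MSE analysis.

```python
# Hypothetical sketch: Poisson MLE with a ridge-type penalty that shrinks the
# coefficients toward a hypothesized value beta0 (an illustrative "generalized
# ridge", not the paper's exact estimator or tuning rule).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n, p = 300, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([0.5, 0.3, -0.2, 0.1])
y = rng.poisson(np.exp(X @ beta_true))

beta0 = np.zeros(p)        # hypothesized coefficient values to shrink toward

def neg_penalized_loglik(beta, lam):
    eta = X @ beta
    loglik = np.sum(y * eta - np.exp(eta))       # Poisson log-likelihood (up to a constant)
    penalty = lam * np.sum((beta - beta0) ** 2)  # ridge penalty toward beta0
    return -loglik + penalty

def fit(lam):
    return minimize(neg_penalized_loglik, x0=np.zeros(p), args=(lam,), method="BFGS").x

beta_mle = fit(lam=0.0)      # conventional MLE
beta_ridge = fit(lam=2.0)    # small ridge penalty

for name, b in [("MLE", beta_mle), ("ridge MLE", beta_ridge)]:
    print(name, np.round(b, 3), " squared error:", round(np.sum((b - beta_true) ** 2), 4))
```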

arXiv link: http://arxiv.org/abs/2504.19018v2

Econometrics arXiv paper, submitted: 2025-04-26

Inference in High-Dimensional Panel Models: Two-Way Dependence and Unobserved Heterogeneity

Authors: Kaicheng Chen

Panel data allows for the modeling of unobserved heterogeneity, significantly
raising the number of nuisance parameters and making high dimensionality a
practical issue. Meanwhile, temporal and cross-sectional dependence in panel
data further complicates high-dimensional estimation and inference. This paper
proposes a toolkit for high-dimensional panel models with large cross-sectional
and time sample sizes. To reduce the dimensionality, I propose a weighted LASSO
using two-way cluster-robust penalty weights. Although consistent, the
convergence rate of LASSO is slow due to the cluster dependence, rendering
inference challenging in general. Nevertheless, asymptotic normality can be
established in a semiparametric moment-restriction model by leveraging a
clustered-panel cross-fitting approach and, as a special case, in a partial
linear model using the full sample. In a panel estimation of the government
spending multiplier, I demonstrate how high dimensionality could be hidden and
how the proposed toolkit enables flexible modeling and robust inference.

arXiv link: http://arxiv.org/abs/2504.18772v1

Econometrics arXiv paper, submitted: 2025-04-25

Regularized Generalized Covariance (RGCov) Estimator

Authors: Francesco Giancaterini, Alain Hecq, Joann Jasiak, Aryan Manafi Neyazi

We introduce a regularized Generalized Covariance (RGCov) estimator as an
extension of the GCov estimator to the high-dimensional setting, which results either
from high-dimensional data or a large number of nonlinear transformations used
in the objective function. The approach relies on a ridge-type regularization
for high-dimensional matrix inversion in the objective function of the GCov.
The RGCov estimator is consistent and asymptotically normally distributed. We
provide the conditions under which it can reach semiparametric efficiency and
discuss the selection of the optimal regularization parameter. We also examine
the diagonal GCov estimator, which simplifies the computation of the objective
function. The GCov-based specification test and the test for nonlinear serial
dependence (NLSD) are extended to the regularized RGCov specification and RNLSD
tests with asymptotic chi-square distributions. Simulation studies show that
the RGCov estimator and the regularized tests perform well in the high
dimensional setting. We apply the RGCov to estimate the mixed causal and
noncausal VAR model of stock prices of green energy companies.

arXiv link: http://arxiv.org/abs/2504.18678v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-04-23

Common Functional Decompositions Can Mis-attribute Differences in Outcomes Between Populations

Authors: Manuel Quintero, William T. Stephenson, Advik Shreekumar, Tamara Broderick

In science and social science, we often wish to explain why an outcome is
different in two populations. For instance, if a jobs program benefits members
of one city more than another, is that due to differences in program
participants (particular covariates) or the local labor markets (outcomes given
covariates)? The Kitagawa-Oaxaca-Blinder (KOB) decomposition is a standard tool
in econometrics that explains the difference in the mean outcome across two
populations. However, the KOB decomposition assumes a linear relationship
between covariates and outcomes, while the true relationship may be
meaningfully nonlinear. Modern machine learning boasts a variety of nonlinear
functional decompositions for the relationship between outcomes and covariates
in one population. It seems natural to extend the KOB decomposition using these
functional decompositions. We observe that a successful extension should not
attribute the differences to covariates -- or, respectively, to outcomes given
covariates -- if those are the same in the two populations. Unfortunately, we
demonstrate that, even in simple examples, two common decompositions --
functional ANOVA and Accumulated Local Effects -- can attribute differences to
outcomes given covariates, even when they are identical in two populations. We
provide a characterization of when functional ANOVA misattributes, as well as a
general property that any discrete decomposition must satisfy to avoid
misattribution. We show that if the decomposition is independent of its input
distribution, it does not misattribute. We further conjecture that
misattribution arises in any reasonable additive decomposition that depends on
the distribution of the covariates.
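
For reference, the linear KOB decomposition discussed above takes only a few
lines; the following example computes the standard twofold decomposition on
simulated data (the paper's concern is with nonlinear extensions, which are not
implemented here).

```python
# Standard twofold Kitagawa-Oaxaca-Blinder (KOB) decomposition on simulated
# data: gap = "explained" (covariates) + "unexplained" (coefficients).
import numpy as np

rng = np.random.default_rng(6)

def simulate(n, mean_x, beta):
    X = np.column_stack([np.ones(n), rng.normal(loc=mean_x, size=n)])
    y = X @ beta + rng.normal(size=n)
    return X, y

# Population A has both different covariates and different coefficients.
XA, yA = simulate(4000, mean_x=1.0, beta=np.array([1.0, 2.0]))
XB, yB = simulate(4000, mean_x=0.0, beta=np.array([0.5, 1.5]))

bA, *_ = np.linalg.lstsq(XA, yA, rcond=None)
bB, *_ = np.linalg.lstsq(XB, yB, rcond=None)
xbarA, xbarB = XA.mean(axis=0), XB.mean(axis=0)

gap = yA.mean() - yB.mean()
explained = (xbarA - xbarB) @ bB      # differences in covariates
unexplained = xbarA @ (bA - bB)       # differences in coefficients
print(f"gap = {gap:.3f} = explained {explained:.3f} + unexplained {unexplained:.3f}")
```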

arXiv link: http://arxiv.org/abs/2504.16864v1

Econometrics arXiv paper, submitted: 2025-04-23

MLOps Monitoring at Scale for Digital Platforms

Authors: Yu Jeffrey Hu, Jeroen Rombouts, Ines Wilms

Machine learning models are widely recognized for their strong performance in
forecasting. To keep that performance in streaming data settings, they have to
be monitored and frequently re-trained. This can be done with machine learning
operations (MLOps) techniques under supervision of an MLOps engineer. However,
in digital platform settings where the number of data streams is typically
large and unstable, standard monitoring becomes either suboptimal or too labor
intensive for the MLOps engineer. As a consequence, companies often fall back
on very simple, worse-performing ML models without monitoring. We solve this
problem by adopting a design science approach and introducing a new monitoring
framework, the Machine Learning Monitoring Agent (MLMA), that is designed to
work at scale for any ML model with reasonable labor cost. A key feature of our
framework concerns test-based automated re-training based on a data-adaptive
reference loss batch. The MLOps engineer is kept in the loop via key metrics
and also acts, pro-actively or retrospectively, to maintain performance of the
ML model in the production stage. We conduct a large-scale test at a last-mile
delivery platform to empirically validate our monitoring framework.
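
A minimal, hypothetical sketch of test-based automated re-training (not the
MLMA framework itself): each new batch of forecast losses is compared with a
reference loss batch via a one-sided test, and the model is re-trained and the
reference refreshed when performance degrades significantly.

```python
# Hypothetical sketch of test-based automated re-training: compare each new
# batch of losses with a reference loss batch and re-train on significant
# degradation. This is not the paper's MLMA framework, just the basic loop.
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)

def make_batch(n=200, drift=0.0):
    X = rng.normal(size=(n, 3))
    y = X @ np.array([1.0, -0.5, 0.2 + drift]) + rng.normal(scale=0.5, size=n)
    return X, y

X0, y0 = make_batch()
model = LinearRegression().fit(X0, y0)
ref_losses = (y0 - model.predict(X0)) ** 2        # data-adaptive reference batch

for t in range(1, 11):
    drift = 0.8 if t >= 6 else 0.0                # concept drift starts at t=6
    X, y = make_batch(drift=drift)
    losses = (y - model.predict(X)) ** 2
    # One-sided Welch test: are current losses significantly larger?
    stat, pval = stats.ttest_ind(losses, ref_losses, equal_var=False,
                                 alternative="greater")
    if pval < 0.01:
        print(f"t={t}: degradation detected (p={pval:.4f}) -> re-train")
        model = LinearRegression().fit(X, y)
        ref_losses = (y - model.predict(X)) ** 2  # refresh reference batch
    else:
        print(f"t={t}: OK (p={pval:.4f})")
```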

arXiv link: http://arxiv.org/abs/2504.16789v1

Econometrics arXiv updated paper (originally submitted: 2025-04-23)

Evaluating Meta-Regression Techniques: A Simulation Study on Heterogeneity in Location and Time

Authors: Ali Habibnia, Jonathan Gendron

In this paper, we conduct a simulation study with subject-level data to
evaluate conventional meta-regression approaches (study-level random, fixed,
and mixed effects) against seven methodology specifications new to
meta-regressions that control for joint heterogeneity in location and time
(including a new one that we introduce). We systematically vary heterogeneity
levels to assess statistical power, estimator bias, and model robustness for
each methodology specification. This assessment focuses on three aspects:
performance under joint heterogeneity in location and time, the effectiveness
of our proposed settings incorporating location fixed effects and study-level
fixed effects with a time trend, and guidelines for model selection. The
results show that jointly modeling heterogeneity when heterogeneity is in both
dimensions improves performance compared to modeling only one type of
heterogeneity.

arXiv link: http://arxiv.org/abs/2504.16696v2

Econometrics arXiv paper, submitted: 2025-04-23

Dynamic Discrete-Continuous Choice Models: Identification and Conditional Choice Probability Estimation

Authors: Christophe Bruneel-Zupanc

This paper develops a general framework for dynamic models in which
individuals simultaneously make both discrete and continuous choices. The
framework incorporates a wide range of unobserved heterogeneity. I show that
such models are nonparametrically identified. Based on constructive
identification arguments, I build a novel two-step estimation method in the
lineage of Hotz and Miller (1993) and Arcidiacono and Miller (2011) but
extended to simultaneous discrete-continuous choice. In the first step, I
recover the (type-dependent) optimal choices with an expectation-maximization
algorithm and instrumental variable quantile regression. In the second step, I
estimate the primitives of the model taking the estimated optimal choices as
given. The method is especially attractive for complex dynamic models because
it significantly reduces the computational burden associated with their
estimation compared to alternative full solution methods.

arXiv link: http://arxiv.org/abs/2504.16630v1

Econometrics arXiv paper, submitted: 2025-04-19

Global identification of dynamic panel models with interactive effects

Authors: Jushan Bai, Pablo Mones

This paper examines the problem of global identification in dynamic panel
models with interactive effects, a fundamental issue in econometric theory. We
focus on the setting where the number of cross-sectional units (N) is large,
but the time dimension (T) remains fixed. While local identification based on
the Jacobian matrix is well understood and relatively straightforward to
establish, achieving global identification remains a significant challenge.
Under a set of mild and easily satisfied conditions, we demonstrate that the
parameters of the model are globally identified, ensuring that no two distinct
parameter values generate the same probability distribution of the observed
data. Our findings contribute to the broader literature on identification in
panel data models and have important implications for empirical research that
relies on interactive effects.

arXiv link: http://arxiv.org/abs/2504.14354v1

Econometrics arXiv updated paper (originally submitted: 2025-04-19)

Finite Population Identification and Design-Based Sensitivity Analysis

Authors: Brendan Kline, Matthew A. Masten

We develop a new approach for quantifying uncertainty in finite populations,
by using design distributions to calibrate sensitivity parameters in finite
population identified sets. This yields uncertainty intervals that can be
interpreted as identified sets, Bayesian credible sets, or frequentist
design-based confidence sets. We focus on quantifying uncertainty about the
average treatment effect (ATE) due to missing potential outcomes in a
randomized experiment, where our approach (1) yields design-based confidence
intervals for ATE which allow for heterogeneous treatment effects but do not
rely on asymptotics, (2) provides a new motivation for examining covariate
balance, and (3) gives a new formal analysis of the role of randomized
treatment assignment. We illustrate our approach in three empirical
applications.

arXiv link: http://arxiv.org/abs/2504.14127v2

Econometrics arXiv updated paper (originally submitted: 2025-04-18)

Projection Inference for set-identified SVARs

Authors: Bulat Gafarov, Matthias Meier, José Luis Montiel Olea

We study the properties of projection inference for set-identified Structural
Vector Autoregressions. A nominal $1-\alpha$ projection region collects the
structural parameters that are compatible with a $1-\alpha$ Wald ellipsoid for
the model's reduced-form parameters (autoregressive coefficients and the
covariance matrix of residuals).
We show that projection inference can be applied to a general class of
stationary models, is computationally feasible, and -- as the sample size grows
large -- it produces regions for the structural parameters and their identified
set with both frequentist coverage and robust Bayesian credibility of at
least $1-\alpha$.
A drawback of the projection approach is that both coverage and robust
credibility may be strictly above their nominal level. Following the work of
Kaido, Molinari, and Stoye (2014), we "calibrate" the radius of the Wald
ellipsoid to guarantee that -- for a given posterior on the reduced-form
parameters -- the robust Bayesian credibility of the projection method is
exactly $1-\alpha$. If the bounds of the identified set are differentiable, our
calibrated projection also covers the identified set with probability
$1-\alpha$.
We illustrate the main results of the paper using the demand/supply model for
the U.S. labor market in Baumeister and Hamilton (2015).

arXiv link: http://arxiv.org/abs/2504.14106v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-04-18

Bayesian Model Averaging in Causal Instrumental Variable Models

Authors: Gregor Steiner, Mark Steel

Instrumental variables are a popular tool to infer causal effects under
unobserved confounding, but choosing suitable instruments is challenging in
practice. We propose gIVBMA, a Bayesian model averaging procedure that
addresses this challenge by averaging across different sets of instrumental
variables and covariates in a structural equation model. Our approach extends
previous work through a scale-invariant prior structure and accommodates
non-Gaussian outcomes and treatments, offering greater flexibility than
existing methods. The computational strategy uses conditional Bayes factors to
update models separately for the outcome and treatments. We prove that this
model selection procedure is consistent. By explicitly accounting for model
uncertainty, gIVBMA allows instruments and covariates to switch roles and
provides robustness against invalid instruments. In simulation experiments,
gIVBMA outperforms current state-of-the-art methods. We demonstrate its
usefulness in two empirical applications: the effects of malaria and
institutions on income per capita and the returns to schooling. A software
implementation of gIVBMA is available in Julia.

arXiv link: http://arxiv.org/abs/2504.13520v4

Econometrics arXiv paper, submitted: 2025-04-17

Using Multiple Outcomes to Adjust Standard Errors for Spatial Correlation

Authors: Stefano DellaVigna, Guido Imbens, Woojin Kim, David M. Ritzwoller

Empirical research in economics often examines the behavior of agents located
in a geographic space. In such cases, statistical inference is complicated by
the interdependence of economic outcomes across locations. A common approach to
account for this dependence is to cluster standard errors based on a predefined
geographic partition. A second strategy is to model dependence in terms of the
distance between units. Dependence, however, does not necessarily stop at
borders and is typically not determined by distance alone. This paper
introduces a method that leverages observations of multiple outcomes to adjust
standard errors for cross-sectional dependence. Specifically, a researcher,
while interested in a particular outcome variable, often observes dozens of
other variables for the same units. We show that these outcomes can be used to
estimate dependence under the assumption that the cross-sectional correlation
structure is shared across outcomes. We develop a procedure, which we call
Thresholding Multiple Outcomes (TMO), that uses this estimate to adjust
standard errors in a given regression setting. We show that adjustments of this
form can lead to sizable reductions in the bias of standard errors in
calibrated U.S. county-level regressions. Re-analyzing nine recent papers, we
find that the proposed correction can make a substantial difference in
practice.
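
A rough, hypothetical sketch of the general idea of borrowing a cross-sectional
correlation structure from auxiliary outcomes, thresholding small correlations,
and plugging the result into a sandwich variance; the actual TMO procedure
differs in important details.

```python
# Rough, hypothetical sketch of using auxiliary outcomes to estimate (and
# threshold) a shared cross-sectional correlation structure, then plugging it
# into a sandwich variance. The actual TMO procedure differs in its details.
import numpy as np

rng = np.random.default_rng(8)
n, k_aux = 400, 30

# Cross-sectional dependence: two "regions" with correlated shocks.
region = rng.integers(0, 2, size=n)
def correlated_noise(scale=1.0):
    common = rng.normal(size=2)[region] * 0.6
    return scale * (common + rng.normal(size=n))

x = rng.normal(size=n) + correlated_noise(0.5)
y = 1.0 + 0.5 * x + correlated_noise()
X = np.column_stack([np.ones(n), x])

# Auxiliary outcomes sharing the same cross-sectional correlation structure.
aux = np.column_stack([correlated_noise() for _ in range(k_aux)])

# Residualize the main and auxiliary outcomes on X.
H = X @ np.linalg.solve(X.T @ X, X.T)
resid_main = y - H @ y
resid_aux = aux - H @ aux

# Estimate pairwise correlations from auxiliary residuals and threshold them.
R = np.corrcoef(resid_aux)                    # n x n correlation across units
R_thr = np.where(np.abs(R) > 0.3, R, 0.0)
np.fill_diagonal(R_thr, 1.0)

# Sandwich variance with Omega built from the thresholded correlations.
s = np.abs(resid_main)
Omega = R_thr * np.outer(s, s)
bread = np.linalg.inv(X.T @ X)
V = bread @ (X.T @ Omega @ X) @ bread

V_iid = bread * (resid_main @ resid_main / (n - 2))
print(f"slope SE, iid       : {np.sqrt(V_iid[1, 1]):.4f}")
print(f"slope SE, TMO-style : {np.sqrt(V[1, 1]):.4f}")
```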

arXiv link: http://arxiv.org/abs/2504.13295v1

Econometrics arXiv updated paper (originally submitted: 2025-04-17)

How Much Weak Overlap Can Doubly Robust T-Statistics Handle?

Authors: Jacob Dorn

In the presence of sufficiently weak overlap, it is known that no regular
root-n-consistent estimators exist and standard estimators may fail to be
asymptotically normal. This paper shows that a thresholded version of the
standard doubly robust estimator is asymptotically normal with well-calibrated
Wald confidence intervals even when constructed using nonparametric estimates
of the propensity score and conditional mean outcome. The analysis implies a
cost of weak overlap in terms of black-box nuisance rates, borne when the
semiparametric bound is infinite, and the contribution of outcome smoothness to
the outcome regression rate, which is incurred even when the semiparametric
bound is finite. As a byproduct of this analysis, I show that under weak
overlap, the optimal global regression rate is the same as the optimal
pointwise regression rate, without the usual polylogarithmic penalty. The
high-level conditions yield new rules of thumb for thresholding in practice. In
simulations, thresholded AIPW can exhibit moderate overrejection in small
samples, but I am unable to reject a null hypothesis of exact coverage in large
samples. In an empirical application, the clipped AIPW estimator that targets
the standard average treatment effect yields similar precision to a heuristic
10% fixed-trimming approach that changes the target sample.
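
A minimal sketch of a doubly robust (AIPW) estimator with propensity-score
thresholding and cross-fitted nonparametric nuisances; the clipping value used
below is an arbitrary placeholder rather than the paper's data-driven
threshold.

```python
# Minimal sketch: AIPW (doubly robust) ATE estimator with a propensity-score
# threshold and cross-fitted nuisances. The clipping value 0.05 is an
# illustrative placeholder, not the paper's data-driven rule.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(9)
n = 4000
X = rng.normal(size=(n, 3))
ps_true = 1 / (1 + np.exp(-(1.8 * X[:, 0])))   # weak overlap in the tails
D = rng.binomial(1, ps_true)
Y = X[:, 0] + 0.5 * X[:, 1] + D * 1.0 + rng.normal(size=n)

eps = 0.05                                     # threshold for the propensity score
e_hat = np.zeros(n)
mu1_hat, mu0_hat = np.zeros(n), np.zeros(n)

for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    e_model = GradientBoostingClassifier().fit(X[train], D[train])
    e_hat[test] = e_model.predict_proba(X[test])[:, 1]
    for d, out in [(1, mu1_hat), (0, mu0_hat)]:
        idx = train[D[train] == d]
        out[test] = GradientBoostingRegressor().fit(X[idx], Y[idx]).predict(X[test])

e_clip = np.clip(e_hat, eps, 1 - eps)          # thresholding step
psi = (mu1_hat - mu0_hat
       + D * (Y - mu1_hat) / e_clip
       - (1 - D) * (Y - mu0_hat) / (1 - e_clip))
ate, se = psi.mean(), psi.std(ddof=1) / np.sqrt(n)
print(f"ATE estimate {ate:.3f} (true 1.0), 95% CI [{ate - 1.96*se:.3f}, {ate + 1.96*se:.3f}]")
```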

arXiv link: http://arxiv.org/abs/2504.13273v2

Econometrics arXiv cross-link from q-bio.PE (q-bio.PE), submitted: 2025-04-17

Anemia, weight, and height among children under five in Peru from 2007 to 2022: A Panel Data analysis

Authors: Luis-Felipe Arizmendi, Carlos De la Torre-Domingo, Erick W. Rengifo

Econometrics in general, and Panel Data methods in particular, are becoming
crucial in Public Health Economics and Social Policy analysis. In this
discussion paper, we employ a Feasible Generalized Least Squares (FGLS)
approach to assess whether there are statistically relevant relationships
between hemoglobin (adjusted to sea level), weight, and height from 2007 to
2022 in children up to five years of age in Peru. This method offers a tool to
assess whether the relationships between the target variables assumed by the
Peruvian agencies and authorities point in the right direction for the fight
against chronic malnutrition and stunting.

arXiv link: http://arxiv.org/abs/2504.12888v1

Econometrics arXiv cross-link from General Economics (econ.GN), submitted: 2025-04-17

The heterogeneous causal effects of the EU's Cohesion Fund

Authors: Angelos Alexopoulos, Ilias Kostarakos, Christos Mylonakis, Petros Varthalitis

This paper quantifies the causal effect of cohesion policy on EU regional
output and investment focusing on one of its least studied instruments, i.e.,
the Cohesion Fund (CF). We employ modern causal inference methods to estimate
not only the local average treatment effect but also its time-varying and
heterogeneous effects across regions. Utilizing this method, we propose a novel
framework for evaluating the effectiveness of CF as an EU cohesion policy tool.
Specifically, we estimate the time varying distribution of the CF's causal
effects across EU regions and derive key distribution metrics useful for policy
evaluation. Our analysis shows that relying solely on average treatment effects
masks significant heterogeneity and can lead to misleading conclusions about
the effectiveness of the EU's cohesion policy. We find that the impact of the
CF is frontloaded, peaking within the first seven years after a region's
initial inclusion in the program. The distribution of the effects during this
first seven-year cycle of funding is right skewed with relatively thick tails.
This indicates positive effects that are, however, unevenly distributed across regions.
Moreover, the magnitude of the CF effect is inversely related to a region's
relative position in the initial distribution of output, i.e., relatively
poorer recipient regions experience higher effects compared to relatively
richer regions. Finally, we find a non-linear relationship with diminishing
returns, whereby the impact of CF declines as the ratio of CF funds received to
a region's gross value added (GVA) increases.

arXiv link: http://arxiv.org/abs/2504.13223v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2025-04-16

Can Moran Eigenvectors Improve Machine Learning of Spatial Data? Insights from Synthetic Data Validation

Authors: Ziqi Li, Zhan Peng

Moran Eigenvector Spatial Filtering (ESF) approaches have shown promise in
accounting for spatial effects in statistical models. Can this extend to
machine learning? This paper examines the effectiveness of using Moran
Eigenvectors as additional spatial features in machine learning models. We
generate synthetic datasets with known processes involving spatially varying
and nonlinear effects across two different geometries. Moran Eigenvectors
calculated from different spatial weights matrices, with and without a priori
eigenvector selection, are tested. We assess the performance of popular machine
learning models, including Random Forests, LightGBM, XGBoost, and TabNet, and
benchmark their accuracies in terms of cross-validated R2 values against models
that use only coordinates as features. We also extract coefficients and
functions from the models using GeoShapley and compare them with the true
processes. Results show that machine learning models using only location
coordinates achieve better accuracies than eigenvector-based approaches across
various experiments and datasets. Furthermore, we note that while these
findings hold for spatial processes that exhibit positive spatial
autocorrelation, they do not necessarily apply to network autocorrelation or to
cases with negative spatial autocorrelation, where Moran Eigenvectors would
still be useful.
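
A hypothetical sketch of how Moran Eigenvectors are constructed from a spatial
weights matrix and appended as machine learning features, mirroring the spirit
of the comparison above (coordinates only versus eigenvector-augmented) on a
small synthetic dataset.

```python
# Hypothetical sketch: construct Moran Eigenvectors from a k-nearest-neighbour
# spatial weights matrix and compare coordinate-only vs eigenvector-augmented
# random forests, in the spirit of the comparison described above.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(10)
n = 800
coords = rng.uniform(0, 1, size=(n, 2))
x1 = rng.normal(size=n)
# Spatially varying coefficient process.
y = (1 + np.sin(2 * np.pi * coords[:, 0])) * x1 + rng.normal(scale=0.3, size=n)

# Symmetric binary kNN weights and the doubly centred matrix M W M.
W = kneighbors_graph(coords, n_neighbors=8, mode="connectivity").toarray()
W = np.maximum(W, W.T)                       # symmetrize
M = np.eye(n) - np.ones((n, n)) / n
MWM = M @ W @ M
evals, evecs = np.linalg.eigh(MWM)
order = np.argsort(evals)[::-1]
E = evecs[:, order[:30]]                     # leading Moran Eigenvectors

X_coord = np.column_stack([coords, x1])
X_esf = np.column_stack([E, x1])

for name, X in [("coordinates", X_coord), ("Moran eigenvectors", X_esf)]:
    r2 = cross_val_score(RandomForestRegressor(n_estimators=200, random_state=0),
                         X, y, cv=5, scoring="r2").mean()
    print(f"{name:20s}: CV R^2 = {r2:.3f}")
```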

arXiv link: http://arxiv.org/abs/2504.12450v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2025-04-13

Ordinary Least Squares as an Attention Mechanism

Authors: Philippe Goulet Coulombe

I show that ordinary least squares (OLS) predictions can be rewritten as the
output of a restricted attention module, akin to those forming the backbone of
large language models. This connection offers an alternative perspective on
attention beyond the conventional information retrieval framework, making it
more accessible to researchers and analysts with a background in traditional
statistics. It falls into place when OLS is framed as a similarity-based method
in a transformed regressor space, distinct from the standard view based on
partial correlations. In fact, the OLS solution can be recast as the outcome of
an alternative problem: minimizing squared prediction errors by optimizing the
embedding space in which training and test vectors are compared via inner
products. Rather than estimating coefficients directly, we equivalently learn
optimal encoding and decoding operations for predictors. From this vantage
point, OLS maps naturally onto the query-key-value structure of attention
mechanisms. Building on this foundation, I discuss key elements of
Transformer-style attention and draw connections to classic ideas from time
series econometrics.
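
The equivalence can be checked numerically in a few lines: factoring
$(X'X)^{-1} = BB'$ and embedding the training and test rows with $B$, the
attention-style weighted sum of the training outcomes reproduces the OLS
predictions exactly. This is a generic numerical check consistent with the
framing above, not the paper's code.

```python
# Numerical check of the OLS-as-attention framing: with (X'X)^{-1} = B B',
# queries Q = X_test B and keys K = X B, the "attention" output Q K' y equals
# the OLS prediction X_test beta_hat.
import numpy as np

rng = np.random.default_rng(11)
n, n_test, p = 200, 20, 5
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
y = X @ beta + rng.normal(size=n)
X_test = rng.normal(size=(n_test, p))

# Standard OLS prediction.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
pred_ols = X_test @ beta_hat

# Attention-style prediction: similarity of embedded queries and keys, applied to y.
B = np.linalg.cholesky(np.linalg.inv(X.T @ X))   # (X'X)^{-1} = B B'
Q, K = X_test @ B, X @ B                         # embedded queries and keys
A = Q @ K.T                                      # attention weights (no softmax)
pred_attn = A @ y                                # values are the training outcomes

print("max abs difference:", np.abs(pred_ols - pred_attn).max())
```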

arXiv link: http://arxiv.org/abs/2504.09663v1

Econometrics arXiv paper, submitted: 2025-04-11

Robust Tests for Factor-Augmented Regressions with an Application to the novel EA-MD Dataset

Authors: Alessandro Morico, Ovidijus Stauskas

We present four novel tests of equal predictive accuracy and encompassing for
out-of-sample forecasts based on factor-augmented regression. We extend the
work of Pitarakis (2023a,b) to develop the inferential theory of predictive
regressions with generated regressors which are estimated by using Common
Correlated Effects (henceforth CCE) - a technique that utilizes cross-sectional
averages of grouped series. It is particularly useful since large datasets of
such structure are becoming increasingly popular. Under our framework,
CCE-based tests are asymptotically normal and robust to overspecification of
the number of factors, which is in stark contrast to existing methodologies in
the CCE context. Our tests are highly applicable in practice as they
accommodate different predictor types (e.g., stationary and highly
persistent factors) and remain invariant to the location of structural breaks
in loadings. Extensive Monte Carlo simulations indicate that our tests exhibit
excellent local power properties. Finally, we apply our tests to a novel
EA-MD-QD dataset by Barigozzi et al. (2024b), which covers the Euro Area as a
whole and its primary member countries. We demonstrate that CCE factors offer
substantial predictive power even under varying data persistence and structural
breaks.

arXiv link: http://arxiv.org/abs/2504.08455v1

Econometrics arXiv paper, submitted: 2025-04-11

An Introduction to Double/Debiased Machine Learning

Authors: Achim Ahrens, Victor Chernozhukov, Christian Hansen, Damian Kozbur, Mark Schaffer, Thomas Wiemann

This paper provides a practical introduction to Double/Debiased Machine
Learning (DML). DML provides a general approach to performing inference about a
target parameter in the presence of nuisance parameters. The aim of DML is to
reduce the impact of nuisance parameter estimation on estimators of the
parameter of interest. We describe DML and its two essential components: Neyman
orthogonality and cross-fitting. We highlight that DML reduces functional form
dependence and accommodates the use of complex data types, such as text data.
We illustrate its application through three empirical examples that demonstrate
DML's applicability in cross-sectional and panel settings.
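
A compact sketch of the two ingredients, Neyman orthogonality and
cross-fitting, in the canonical partially linear model: out-of-fold machine
learning predictions of the outcome and the treatment are residualized, and the
target coefficient comes from a residual-on-residual regression. The
data-generating process and learners below are illustrative, not those of the
paper's examples.

```python
# Compact sketch of DML for the partially linear model: cross-fitted ML
# predictions of Y and D given X, then a residual-on-residual regression
# (the Neyman-orthogonal moment). DGP and learners are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(12)
n, p, theta = 2000, 10, 0.5
X = rng.normal(size=(n, p))
g = np.sin(X[:, 0]) + X[:, 1] ** 2               # nonlinear nuisance in the outcome
m = np.cos(X[:, 0]) + 0.5 * X[:, 2]              # nonlinear nuisance in the treatment
D = m + rng.normal(size=n)
Y = theta * D + g + rng.normal(size=n)

y_res, d_res = np.zeros(n), np.zeros(n)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    ml_y = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[train], Y[train])
    ml_d = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[train], D[train])
    y_res[test] = Y[test] - ml_y.predict(X[test])   # out-of-fold residuals
    d_res[test] = D[test] - ml_d.predict(X[test])

theta_hat = (d_res @ y_res) / (d_res @ d_res)
psi = (y_res - theta_hat * d_res) * d_res
se = np.sqrt(psi.var() / n) / (d_res @ d_res / n)
print(f"theta_hat = {theta_hat:.3f} (true {theta}), se = {se:.3f}")
```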

arXiv link: http://arxiv.org/abs/2504.08324v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2025-04-10

Double Machine Learning for Causal Inference under Shared-State Interference

Authors: Chris Hays, Manish Raghavan

Researchers and practitioners often wish to measure treatment effects in
settings where units interact via markets and recommendation systems. In these
settings, units are affected by certain shared states, like prices, algorithmic
recommendations or social signals. We formalize this structure, calling it
shared-state interference, and argue that our formulation captures many
relevant applied settings. Our key modeling assumption is that individuals'
potential outcomes are independent conditional on the shared state. We then
prove an extension of a double machine learning (DML) theorem providing
conditions for achieving efficient inference under shared-state interference.
We also instantiate our general theorem in several models of interest where it
is possible to efficiently estimate the average direct effect (ADE) or global
average treatment effect (GATE).

arXiv link: http://arxiv.org/abs/2504.08836v1

Econometrics arXiv paper, submitted: 2025-04-09

Causal Inference under Interference through Designed Markets

Authors: Evan Munro

Equilibrium effects make it challenging to evaluate the impact of an
individual-level treatment on outcomes in a single market, even with data from
a randomized trial. In some markets, however, a centralized mechanism allocates
goods and imposes useful structure on spillovers. For a class of strategy-proof
"cutoff" mechanisms, we propose an estimator for global treatment effects using
individual-level data from one market, where treatment assignment is
unconfounded. Algorithmically, we re-run a weighted and perturbed version of
the mechanism. Under a continuum market approximation, the estimator is
asymptotically normal and semi-parametrically efficient. We extend this
approach to learn spillover-aware treatment rules with vanishing asymptotic
regret. Empirically, adjusting for equilibrium effects notably diminishes the
estimated effect of information on inequality in the Chilean school system.

arXiv link: http://arxiv.org/abs/2504.07217v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-04-08

Randomization Inference in Two-Sided Market Experiments

Authors: Jizhou Liu, Azeem M. Shaikh, Panos Toulis

Randomized experiments are increasingly employed in two-sided markets, such
as buyer-seller platforms, to evaluate treatment effects from marketplace
interventions. These experiments must reflect the underlying two-sided market
structure in their design (e.g., sellers and buyers), making them particularly
challenging to analyze. In this paper, we propose a randomization inference
framework to analyze outcomes from such two-sided experiments. Our approach is
finite-sample valid under sharp null hypotheses for any test statistic and
maintains asymptotic validity under weak null hypotheses through
studentization. Moreover, we provide heuristic guidance for choosing among
multiple valid randomization tests to enhance statistical power, which we
demonstrate empirically. Finally, we illustrate the performance of our
methodology through a series of simulation studies.

arXiv link: http://arxiv.org/abs/2504.06215v1

Econometrics arXiv paper, submitted: 2025-04-08

Optimizing Data-driven Weights In Multidimensional Indexes

Authors: Lidia Ceriani, Chiara Gigliarano, Paolo Verme

Multidimensional indexes are ubiquitous, and popular, but present
non-negligible normative choices when it comes to attributing weights to their
dimensions. This paper provides a more rigorous approach to the choice of
weights by defining a set of desirable properties that weighting models should
meet. It shows that the Bayesian Network is the only model, among statistical,
econometric, and machine learning computational models, that meets these
properties. An example with EU-SILC data illustrates this new approach,
highlighting its potential for policy.

arXiv link: http://arxiv.org/abs/2504.06012v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-04-07

Bayesian Shrinkage in High-Dimensional VAR Models: A Comparative Study

Authors: Harrison Katz, Robert E. Weiss

High-dimensional vector autoregressive (VAR) models offer a versatile
framework for multivariate time series analysis, yet face critical challenges
from over-parameterization and uncertain lag order. In this paper, we
systematically compare three Bayesian shrinkage priors (horseshoe, lasso, and
normal) and two frequentist regularization approaches (ridge and nonparametric
shrinkage) under three carefully crafted simulation scenarios. These scenarios
encompass (i) overfitting in a low-dimensional setting, (ii) sparse
high-dimensional processes, and (iii) a combined scenario where both large
dimension and overfitting complicate inference.
We evaluate each method on the quality of parameter estimation (root mean
squared error, coverage, and interval length) and on out-of-sample forecasting
(one-step-ahead forecast RMSE). Our findings show that local-global Bayesian
methods, particularly the horseshoe, dominate in maintaining accurate coverage
and minimizing parameter error, even when the model is heavily
over-parameterized. Frequentist ridge often yields competitive point forecasts
but underestimates uncertainty, leading to sub-nominal coverage. A real-data
application using macroeconomic variables from Canada illustrates how these
methods perform in practice, reinforcing the advantages of local-global priors
in stabilizing inference when dimension or lag order is inflated.

arXiv link: http://arxiv.org/abs/2504.05489v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-04-07

Eigenvalue-Based Randomness Test for Residual Diagnostics in Panel Data Models

Authors: Marcell T. Kurbucz, Betsabé Pérez Garrido, Antal Jakovác

This paper introduces the Eigenvalue-Based Randomness (EBR) test - a novel
approach rooted in the Tracy-Widom law from random matrix theory - and applies
it to the context of residual analysis in panel data models. Unlike traditional
methods, which target specific issues like cross-sectional dependence or
autocorrelation, the EBR test simultaneously examines multiple assumptions by
analyzing the largest eigenvalue of a symmetrized residual matrix. Monte Carlo
simulations demonstrate that the EBR test is particularly robust in detecting
not only standard violations such as autocorrelation and linear cross-sectional
dependence (CSD) but also more intricate non-linear and non-monotonic
dependencies, making it a comprehensive and highly flexible tool for enhancing
the reliability of panel data analyses.

arXiv link: http://arxiv.org/abs/2504.05297v1

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2025-04-07

Rationalizing dynamic choices

Authors: Henrique de Oliveira, Rohit Lamba

An analyst observes an agent take a sequence of actions. The analyst does not
have access to the agent's information and ponders whether the observed actions
could be justified through a rational Bayesian model with a known utility
function. We show that the observed actions cannot be justified if and only if
there is a single deviation argument that leaves the agent better off,
regardless of the information. The result is then extended to allow for
distributions over possible action sequences. Four applications are presented:
monotonicity of rationalization with risk aversion, a potential rejection of
the Bayesian model with observable data, feasible outcomes in dynamic
information design, and partial identification of preferences without
assumptions on information.

arXiv link: http://arxiv.org/abs/2504.05251v1

Econometrics arXiv paper, submitted: 2025-04-06

Non-linear Phillips Curve for India: Evidence from Explainable Machine Learning

Authors: Shovon Sengupta, Bhanu Pratap, Amit Pawar

The conventional linear Phillips curve model, while widely used in
policymaking, often struggles to deliver accurate forecasts in the presence of
structural breaks and inherent nonlinearities. This paper addresses these
limitations by leveraging machine learning methods within a New Keynesian
Phillips Curve framework to forecast and explain headline inflation in India, a
major emerging economy. Our analysis demonstrates that machine learning-based
approaches significantly outperform standard linear models in forecasting
accuracy. Moreover, by employing explainable machine learning techniques, we
reveal that the Phillips curve relationship in India is highly nonlinear,
characterized by thresholds and interaction effects among key variables.
Headline inflation is primarily driven by inflation expectations, followed by
past inflation and the output gap, while supply shocks, except rainfall, exert
only a marginal influence. These findings highlight the ability of machine
learning models to improve forecast accuracy and uncover complex, nonlinear
dynamics in inflation data, offering valuable insights for policymakers.

arXiv link: http://arxiv.org/abs/2504.05350v1

Econometrics arXiv paper, submitted: 2025-04-05

Estimating Demand with Recentered Instruments

Authors: Kirill Borusyak, Mauricio Caceres Bravo, Peter Hull

We develop a new approach to estimating flexible demand models with exogenous
supply-side shocks. Our approach avoids conventional assumptions of exogenous
product characteristics, putting no restrictions on product entry, despite
using instrumental variables that incorporate characteristic variation. The
proposed instruments are model-predicted responses of endogenous variables to
the exogenous shocks, recentered to avoid bias from endogenous characteristics.
We illustrate the approach in a series of Monte Carlo simulations.

arXiv link: http://arxiv.org/abs/2504.04056v1

Econometrics arXiv paper, submitted: 2025-04-04

Regression Discontinuity Design with Distribution-Valued Outcomes

Authors: David Van Dijcke

This article introduces Regression Discontinuity Design (RDD) with
Distribution-Valued Outcomes (R3D), extending the standard RDD framework to
settings where the outcome is a distribution rather than a scalar. Such
settings arise when treatment is assigned at a higher level of aggregation than
the outcome-for example, when a subsidy is allocated based on a firm-level
revenue cutoff while the outcome of interest is the distribution of employee
wages within the firm. Since standard RDD methods cannot accommodate such
two-level randomness, I propose a novel approach based on random distributions.
The target estimand is a "local average quantile treatment effect", which
averages across random quantiles. To estimate this target, I introduce two
related approaches: one that extends local polynomial regression to random
quantiles and another based on local Fréchet regression, a form of functional
regression. For both estimators, I establish asymptotic normality and develop
uniform, debiased confidence bands together with a data-driven bandwidth
selection procedure. Simulations validate these theoretical properties and show
existing methods to be biased and inconsistent in this setting. I then apply
the proposed methods to study the effects of gubernatorial party control on
within-state income distributions in the US, using a close-election design. The
results suggest a classic equality-efficiency tradeoff under Democratic
governorship, driven by reductions in income at the top of the distribution.

arXiv link: http://arxiv.org/abs/2504.03992v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-04-04

Flatness-Robust Critical Bandwidth

Authors: Scott Kostyshak

Critical bandwidth (CB) is used to test the multimodality of densities and
regression functions, as well as for clustering methods. CB tests are known to
be inconsistent if the function of interest is constant ("flat") over even a
small interval, and to suffer from low power and incorrect size in finite
samples if the function has a relatively small derivative over an interval.
This paper proposes a solution, flatness-robust CB (FRCB), that exploits the
novel observation that the inconsistency manifests only from regions consistent
with the null hypothesis, and thus identifying and excluding them does not
alter the null or alternative sets. I provide sufficient conditions for
consistency of FRCB, and simulations of a test of regression monotonicity
demonstrate the finite-sample properties of FRCB compared with CB for various
regression functions. Surprisingly, FRCB performs better than CB in some cases
where there are no flat regions, which can be explained by FRCB essentially
giving more importance to parts of the function where there are larger
violations of the null hypothesis. I illustrate the usefulness of FRCB with an
empirical analysis of the monotonicity of the conditional mean function of
radiocarbon age with respect to calendar age.

arXiv link: http://arxiv.org/abs/2504.03594v1

Econometrics arXiv updated paper (originally submitted: 2025-04-04)

Weak instrumental variables due to nonlinearities in panel data: A Super Learner Control Function estimator

Authors: Monika Avila Marquez

A triangular structural panel data model with additive separable
individual-specific effects is used to model the causal effect of a covariate
on an outcome variable when there are unobservable confounders with some of
them time-invariant. In this setup, a linear reduced-form equation might be
problematic when the conditional mean of the endogenous covariate and the
instrumental variables is nonlinear. The reason is that ignoring the
nonlinearity could lead to weak instruments. As a solution, we propose a
triangular simultaneous equation model for panel data with additive separable
individual-specific fixed effects composed of a linear structural equation with
a nonlinear reduced form equation. The parameter of interest is the structural
parameter of the endogenous variable. The identification of this parameter is
obtained under the assumption of available exclusion restrictions and using a
control function approach. The parameter of interest is estimated with an
estimator that we call the Super Learner Control Function estimator (SLCFE).
The estimation procedure is composed of two main steps and sample splitting. In
the first step, we estimate the control function with a super learner, using
sample splitting. In the following step, we use the estimated control function
to control for
endogeneity in the structural equation. Sample splitting is done across the
individual dimension. We perform a Monte Carlo simulation to test the
performance of the estimators proposed. We conclude that the Super Learner
Control Function Estimators significantly outperform Within 2SLS estimators.
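
A simplified sketch of the two-step control-function logic with a stacked
("super learner"-style) first stage and sample splitting; the cross-sectional
toy design below ignores the paper's panel fixed-effects structure and is not
the SLCFE implementation.

```python
# Simplified sketch of a control-function estimator with a stacked ("super
# learner"-style) nonlinear first stage and sample splitting. Cross-sectional
# DGP only: the paper's panel fixed-effects structure is ignored here.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(13)
n, beta = 3000, 1.0
z = rng.normal(size=n)                         # instrument
v = rng.normal(size=n)                         # first-stage error
x = np.sin(2 * z) + z ** 2 / 4 + v             # nonlinear first stage -> weak linear IV
u = 0.8 * v + rng.normal(scale=0.5, size=n)    # endogeneity via v
y = beta * x + u

Z = z.reshape(-1, 1)
stack = StackingRegressor(
    estimators=[("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
                ("ridge", Ridge())],
    final_estimator=LinearRegression())

# Step 1 (cross-fitted): control function = first-stage residual v_hat.
v_hat = np.zeros(n)
for train, test in KFold(n_splits=2, shuffle=True, random_state=0).split(Z):
    v_hat[test] = x[test] - stack.fit(Z[train], x[train]).predict(Z[test])

# Step 2: include the control function in the structural equation.
W = np.column_stack([np.ones(n), x, v_hat])
coef = np.linalg.lstsq(W, y, rcond=None)[0]
print(f"control-function estimate of beta: {coef[1]:.3f} (true {beta})")

naive = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)[0][1]
print(f"naive OLS estimate of beta       : {naive:.3f}")
```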

arXiv link: http://arxiv.org/abs/2504.03228v4

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2025-04-03

Online Multivariate Regularized Distributional Regression for High-dimensional Probabilistic Electricity Price Forecasting

Authors: Simon Hirsch

Probabilistic electricity price forecasting (PEPF) is vital for short-term
electricity markets, yet the multivariate nature of day-ahead prices - spanning
24 consecutive hours - remains underexplored. At the same time, real-time
decision-making requires methods that are both accurate and fast. We introduce
an online algorithm for multivariate distributional regression models, allowing
an efficient modelling of the conditional means, variances, and dependence
structures of electricity prices. The approach combines multivariate
distributional regression with online coordinate descent and LASSO-type
regularization, enabling scalable estimation in high-dimensional covariate
spaces. Additionally, we propose a regularized estimation path over
increasingly complex dependence structures, allowing for early stopping and
avoiding overfitting. In a case study of the German day-ahead market, our
method outperforms a wide range of benchmarks, showing that modeling dependence
improves both calibration and predictive accuracy. Furthermore, we analyse the
trade-off between predictive accuracy and computational costs for batch and
online estimation and provide a high-performing open-source Python
implementation in the ondil package.

arXiv link: http://arxiv.org/abs/2504.02518v2

Econometrics arXiv paper, submitted: 2025-04-02

Estimation of the complier causal hazard ratio under dependent censoring

Authors: Gilles Crommen, Jad Beyhum, Ingrid Van Keilegom

In this work, we are interested in studying the causal effect of an
endogenous binary treatment on a dependently censored duration outcome. By
dependent censoring, it is meant that the duration time ($T$) and right
censoring time ($C$) are not statistically independent of each other, even
after conditioning on the measured covariates. The endogeneity issue is handled
by making use of a binary instrumental variable for the treatment. To deal with
the dependent censoring problem, it is assumed that on the stratum of
compliers: (i) $T$ follows a semiparametric proportional hazards model; (ii)
$C$ follows a fully parametric model; and (iii) the relation between $T$ and
$C$ is modeled by a parametric copula, such that the association parameter can
be left unspecified. In this framework, the treatment effect of interest is the
complier causal hazard ratio (CCHR). We devise an estimation procedure that is
based on a weighted maximum likelihood approach, where the weights are the
probabilities of an observation coming from a complier. The weights are
estimated non-parametrically in a first stage, followed by the estimation of
the CCHR. Novel conditions under which the model is identifiable are given, a
two-step estimation procedure is proposed and some important asymptotic
properties are established. Simulations are used to assess the validity and
finite-sample performance of the estimation procedure. Finally, we apply the
approach to estimate the CCHR of job training programs on unemployment
duration and of periodic screening examinations on time until death from breast
cancer. The data come from the National Job Training Partnership Act study and
the Health Insurance Plan of Greater New York experiment respectively.

arXiv link: http://arxiv.org/abs/2504.02096v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-04-02

Non-parametric Quantile Regression and Uniform Inference with Unknown Error Distribution

Authors: Haoze Hou, Wei Huang, Zheng Zhang

This paper studies the non-parametric estimation and uniform inference for
the conditional quantile regression function (CQRF) with covariates exposed to
measurement errors. We consider the case that the distribution of the
measurement error is unknown and allowed to be either ordinary or super smooth.
We estimate the density of the measurement error by the repeated measurements
and propose the deconvolution kernel estimator for the CQRF. We derive the
uniform Bahadur representation of the proposed estimator and construct the
uniform confidence bands for the CQRF, holding uniformly over all
covariates and a set of quantile indices, and establish the theoretical
validity of the proposed inference. A data-driven approach for selecting the
tuning parameter is also included. Monte Carlo simulations and a real data
application demonstrate the usefulness of the proposed method.

arXiv link: http://arxiv.org/abs/2504.01761v1

Econometrics arXiv paper, submitted: 2025-04-02

A Causal Inference Framework for Data Rich Environments

Authors: Alberto Abadie, Anish Agarwal, Devavrat Shah

We propose a formal model for counterfactual estimation with unobserved
confounding in "data-rich" settings, i.e., where there are a large number of
units and a large number of measurements per unit. Our model provides a bridge
between the structural causal model view of causal inference common in the
graphical models literature and the latent factor model view common in
the potential outcomes literature. We show how classic models for potential
outcomes and treatment assignments fit within our framework. We provide an
identification argument for the average treatment effect, the average treatment
effect on the treated, and the average treatment effect on the untreated. For
any estimator that has a fast enough estimation error rate for a certain
nuisance parameter, we establish it is consistent for these various causal
parameters. We then show principal component regression is one such estimator
that leads to consistent estimation, and we analyze the minimal smoothness
required of the potential outcomes function for consistency.
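
As a rough illustration of the principal component regression step, the sketch below imputes a treated unit's untreated outcomes from the control units' outcome matrix. The function, component count, and simulated data are hypothetical; this is not the paper's formal estimator or its identification assumptions.

    import numpy as np

    def pcr_impute(Y_control, y_treated_pre, n_components=3):
        """Impute a treated unit's untreated outcomes via principal component regression.

        Y_control: (T, N0) outcomes of control units over T periods.
        y_treated_pre: (T0,) treated unit's outcomes in the T0 pre-treatment periods.
        """
        T0 = len(y_treated_pre)
        # Principal components of the control-unit outcome matrix (pre-period rows only)
        U, s, Vt = np.linalg.svd(Y_control[:T0], full_matrices=False)
        F_pre = U[:, :n_components] * s[:n_components]      # (T0, k) factor scores
        # Regress treated pre-period outcomes on the leading components
        beta, *_ = np.linalg.lstsq(F_pre, y_treated_pre, rcond=None)
        # Project all periods onto the same right singular vectors (loadings)
        F_all = Y_control @ Vt[:n_components].T              # (T, k)
        return F_all @ beta

    rng = np.random.default_rng(0)
    Y_control = rng.normal(size=(40, 25))
    y_treated_pre = Y_control[:30, :5].mean(axis=1) + rng.normal(scale=0.1, size=30)
    y0_hat = pcr_impute(Y_control, y_treated_pre, n_components=2)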

arXiv link: http://arxiv.org/abs/2504.01702v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2025-04-02

On Robust Empirical Likelihood for Nonparametric Regression with Application to Regression Discontinuity Designs

Authors: Qin Fang, Shaojun Guo, Yang Hong, Xinghao Qiao

Empirical likelihood serves as a powerful tool for constructing confidence
intervals in nonparametric regression and regression discontinuity designs
(RDD). The original empirical likelihood framework can be naturally extended to
these settings using local linear smoothers, with Wilks' theorem holding only
when an undersmoothed bandwidth is selected. However, the generalization of
bias-corrected versions of empirical likelihood under more realistic conditions
is non-trivial and has remained an open challenge in the literature. This paper
provides a satisfactory solution by proposing a novel approach, referred to as
robust empirical likelihood, designed for nonparametric regression and RDD. The
core idea is to construct robust weights which simultaneously achieve bias
correction and account for the additional variability introduced by the
estimated bias, thereby enabling valid confidence interval construction without
extra estimation steps involved. We demonstrate that the Wilks' phenomenon
still holds under weaker conditions in nonparametric regression, sharp and
fuzzy RDD settings. Extensive simulation studies confirm the effectiveness of
our proposed approach, showing superior performance over existing methods in
terms of coverage probabilities and interval lengths. Moreover, the proposed
procedure exhibits robustness to bandwidth selection, making it a flexible and
reliable tool for empirical analyses. The practical usefulness is further
illustrated through applications to two real datasets.

arXiv link: http://arxiv.org/abs/2504.01535v1

Econometrics arXiv paper, submitted: 2025-04-02

Locally- but not Globally-identified SVARs

Authors: Emanuele Bacchiocchi, Toru Kitagawa

This paper analyzes Structural Vector Autoregressions (SVARs) where
identification of structural parameters holds locally but not globally. In this
case there exists a set of isolated structural parameter points that are
observationally equivalent under the imposed restrictions. Although the data do
not inform us which observationally equivalent point should be selected, the
common frequentist practice is to obtain one as a maximum likelihood estimate
and perform impulse response analysis accordingly. For Bayesians, the lack of
global identification translates to non-vanishing sensitivity of the posterior
to the prior, and the multi-modal likelihood gives rise to computational
challenges as posterior sampling algorithms can fail to explore all the modes.
This paper overcomes these challenges by proposing novel estimation and
inference procedures. We characterize a class of identifying restrictions and
circumstances that deliver local but non-global identification, and the
resulting number of observationally equivalent parameter values. We propose
algorithms to exhaustively compute all admissible structural parameters given
reduced-form parameters and utilize them to sample from the multi-modal
posterior. In addition, viewing the set of observationally equivalent parameter
points as the identified set, we develop Bayesian and frequentist procedures
for inference on the corresponding set of impulse responses. An empirical
example illustrates our proposal.

arXiv link: http://arxiv.org/abs/2504.01441v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-04-02

A Practical Guide to Estimating Conditional Marginal Effects: Modern Approaches

Authors: Jiehan Liu, Ziyi Liu, Yiqing Xu

This Element offers a practical guide to estimating conditional marginal
effects-how treatment effects vary with a moderating variable-using modern
statistical methods. Commonly used approaches, such as linear interaction
models, often suffer from unclarified estimands, limited overlap, and
restrictive functional forms. This guide begins by clearly defining the
estimand and presenting the main identification results. It then reviews and
improves upon existing solutions, such as the semiparametric kernel estimator,
and introduces robust estimation strategies, including augmented inverse
propensity score weighting with Lasso selection (AIPW-Lasso) and double machine
learning (DML) with modern algorithms. Each method is evaluated through
simulations and empirical examples, with practical recommendations tailored to
sample size and research context. All tools are implemented in the accompanying
interflex package for R.

arXiv link: http://arxiv.org/abs/2504.01355v1

Econometrics arXiv updated paper (originally submitted: 2025-04-01)

Partial Identification of Mean Achievement in ILSA Studies with Multi-Stage Stratified Sample Design and Student Non-Participation

Authors: Diego Cortes, Jeff Dominitz, Maximiliano Romero

International large-scale assessment (ILSA) studies collect information
across education systems with the objective of learning about the
population-wide distribution of student achievement in the assessment. In this
article, we study one of the most fundamental threats that these studies face
when justifying the conclusions reached about these distributions: the
identification problem that arises from student non-participation during data
collection. Recognizing that ILSA studies have traditionally employed a narrow
range of strategies to address non-participation, we examine this problem using
tools developed within the framework of partial identification of probability
distributions. We tailor this framework to the problem of non-participation
when data are collected using a multi-stage stratified random sample design, as
in most ILSA studies. We demonstrate this approach with application to the
International Computer and Information Literacy Study in 2018. We show how to
use the framework to assess mean achievement under reasonable and credible sets
of assumptions about the non-participating population. We also provide examples
of how these results may be reported by agencies that administer ILSA studies.
By doing so, we bring to the field of ILSA an alternative strategy for
identification, estimation, and reporting of population parameters of interest.
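
The partial identification logic can be illustrated with simple worst-case (no-assumption) bounds on a mean when some students do not participate and scores are known to lie in a bounded range. The snippet below is a hypothetical sketch with a plausible score scale and ignores the multi-stage stratified design corrections developed in the paper.

    import numpy as np

    def worst_case_bounds(scores, participated, y_min, y_max, weights=None):
        """Manski-style worst-case bounds on the population mean score when
        non-participants' scores are unobserved but known to lie in [y_min, y_max]."""
        participated = np.asarray(participated, dtype=bool)
        w = np.ones(len(participated)) if weights is None else np.asarray(weights, float)
        w = w / w.sum()
        p = w[participated].sum()                                  # weighted participation rate
        m = np.average(np.asarray(scores, float)[participated],
                       weights=w[participated])                    # mean among participants
        return p * m + (1 - p) * y_min, p * m + (1 - p) * y_max

    # Example: 70% participation, hypothetical score range 200-800
    scores = np.array([520., 480., 610., 0., 455., 0., 530., 500., 470., 0.])
    participated = scores > 0
    print(worst_case_bounds(scores, participated, 200, 800))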

arXiv link: http://arxiv.org/abs/2504.01209v2

Econometrics arXiv updated paper (originally submitted: 2025-04-01)

Nonlinearity in Dynamic Causal Effects: Making the Bad into the Good, and the Good into the Great?

Authors: Toru Kitagawa, Weining Wang, Mengshan Xu

This paper was prepared as a comment on "Dynamic Causal Effects in a
Nonlinear World: the Good, the Bad, and the Ugly" by Michal Kolesár and
Mikkel Plagborg-Møller. We make three comments, including a novel
contribution to the literature, showing how a reasonable economic
interpretation can potentially be restored for average-effect estimators with
negative weights.

arXiv link: http://arxiv.org/abs/2504.01140v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-04-01

Quantile Treatment Effects in High Dimensional Panel Data

Authors: Yihong Xu, Li Zheng

We introduce novel estimators for quantile causal effects with high
dimensional panel data (large $N$ and $T$), where only one or a few units are
affected by the intervention or policy. Our method extends the generalized
synthetic control method of Xu (2017) from average treatment effects on the
treated to quantile treatment effects on the treated, allowing the underlying
factor structure to change across quantiles of the outcome distribution of
interest. Our method involves estimating the quantile-dependent factors
using the control group, followed by a quantile regression to estimate the
quantile treatment effect using the treated units. We establish the asymptotic
properties of our estimators and propose a bootstrap procedure for statistical
inference, supported by simulation studies. An empirical application of the
2008 China Stimulus Program is provided.

arXiv link: http://arxiv.org/abs/2504.00785v2

Econometrics arXiv cross-link from General Economics (econ.GN), submitted: 2025-03-30

Where the Trees Fall: Macroeconomic Forecasts for Forest-Reliant States

Authors: Andrew Crawley, Adam Daigneault, Jonathan Gendron

Several key states in various regions of the U.S. have recently experienced
sawtimber as well as pulp and paper mill closures, which raises an important
policy question: how have key macroeconomic and industry-specific indicators
within the U.S. forest sector changed, and how are they likely to change over time? This study
provides empirical evidence to support forest-sector policy design by using a
vector error correction (VEC) model to forecast economic trends in three major
industries - forestry and logging, wood manufacturing, and paper manufacturing
- across six of the most forest-dependent states found by the location quotient
(LQ) measure: Alabama, Arkansas, Maine, Mississippi, Oregon, and Wisconsin.
Overall, the results suggest a general decline in employment and the number of
firms in the forestry and logging industry as well as the paper manufacturing
industry, while wood manufacturing is projected to see modest employment gains.
These results also offer key insights for regional policymakers, industry
leaders, and local economic development officials: communities dependent on
timber-based manufacturing may be more resilient than other forestry-based
industries in the face of economic disruptions. Our findings can help
prioritize targeted policy interventions and inform regional economic
resilience strategies. We show distinct differences across forest-dependent
industries and/or state sectors and geographies, highlighting that policies may
have to be specific to each sector and/or geographical area. Finally, our VEC
modeling framework is adaptable to other resource-dependent industries that
serve as regional economic pillars, such as mining, agriculture, and energy
production, offering a transferable tool for policy analysis in regions with
similar economic structures.
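
A minimal statsmodels sketch of the kind of VEC forecasting exercise described, using simulated series in place of the state-level industry data; the lag order, cointegration rank, and deterministic terms below are illustrative choices, not the authors' specification.

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.vector_ar.vecm import VECM, select_coint_rank

    # Hypothetical quarterly state-industry series (logs) sharing a common stochastic trend
    rng = np.random.default_rng(1)
    T = 120
    trend = np.cumsum(rng.normal(0, 0.02, size=(T, 1)), axis=0)
    data = pd.DataFrame(
        trend + rng.normal(0, 0.01, size=(T, 3)),
        columns=["log_emp_logging", "log_emp_wood_mfg", "log_emp_paper_mfg"],
    )

    # Choose the cointegration rank with Johansen's trace test, then fit and forecast
    rank = select_coint_rank(data, det_order=0, k_ar_diff=2, method="trace").rank
    res = VECM(data, k_ar_diff=2, coint_rank=max(rank, 1), deterministic="co").fit()
    forecast = res.predict(steps=8)   # 8-quarter-ahead point forecasts
    print(forecast.shape)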

arXiv link: http://arxiv.org/abs/2503.23569v2

Econometrics arXiv updated paper (originally submitted: 2025-03-30)

Reinterpreting demand estimation

Authors: Jiafeng Chen

This paper bridges the demand estimation and causal inference literatures by
interpreting nonparametric structural assumptions as restrictions on
counterfactual outcomes. It offers nontrivial and equivalent restatements of
key demand estimation assumptions in the Neyman-Rubin potential outcomes model,
for both settings with market-level data (Berry and Haile, 2014) and settings
with demographic-specific market shares (Berry and Haile, 2024). The
reformulation highlights a latent homogeneity assumption underlying structural
demand models: The relationship between counterfactual outcomes is assumed to
be identical across markets. This assumption is strong, but necessary for
identification of market-level counterfactuals. Viewing structural demand
models as misspecified but approximately correct reveals a tradeoff between
specification flexibility and robustness to latent homogeneity.

arXiv link: http://arxiv.org/abs/2503.23524v2

Econometrics arXiv paper, submitted: 2025-03-30

Forward Selection Fama-MacBeth Regression with Higher-Order Asset Pricing Factors

Authors: Nicola Borri, Denis Chetverikov, Yukun Liu, Aleh Tsyvinski

We show that the higher orders and interactions of common sparse
linear factors can effectively subsume the factor zoo. To this end, we
propose a forward selection Fama-MacBeth procedure as a method to estimate a
high-dimensional stochastic discount factor model, isolating the most relevant
higher-order factors. Applying this approach to terms derived from six widely
used factors (the Fama-French five-factor model and the momentum factor), we
show that the resulting higher-order model with only a small number of selected
higher-order terms significantly outperforms traditional benchmarks both
in-sample and out-of-sample. Moreover, it effectively subsumes a majority of
the factors from the extensive factor zoo, suggesting that the pricing power of
most zoo factors is attributable to their exposure to higher-order terms of
common linear factors.
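
The sketch below illustrates the two ingredients on simulated data: second-pass Fama-MacBeth regressions and a greedy forward selection over higher-order factor exposures. Known exposures stand in for first-pass betas, and the selection criterion (|t| of the newly added premium) is a simplification rather than the authors' procedure.

    import numpy as np

    def fama_macbeth(excess_returns, betas):
        """Second-pass Fama-MacBeth: cross-sectional OLS of returns on betas each period.
        excess_returns: (T, N); betas: (N, K). Returns mean premia and t-statistics."""
        T = excess_returns.shape[0]
        X = np.column_stack([np.ones(betas.shape[0]), betas])
        gammas = np.array([np.linalg.lstsq(X, excess_returns[t], rcond=None)[0]
                           for t in range(T)])
        mean, se = gammas.mean(0), gammas.std(0, ddof=1) / np.sqrt(T)
        return mean, mean / se

    def forward_select(excess_returns, candidate_betas, max_terms=3):
        """Greedy forward selection of (higher-order) factor exposures by |t| of the premium."""
        selected, remaining = [], list(range(candidate_betas.shape[1]))
        for _ in range(max_terms):
            best = max(remaining,
                       key=lambda j: abs(fama_macbeth(excess_returns,
                                                      candidate_betas[:, selected + [j]])[1][-1]))
            selected.append(best)
            remaining.remove(best)
        return selected

    rng = np.random.default_rng(2)
    T, N = 240, 50
    base = rng.normal(size=(N, 2))                     # exposures to two base factors
    candidates = np.column_stack([base, base**2, base[:, :1] * base[:, 1:]])  # higher orders
    premia = np.array([0.5, 0.3, 0.0, 0.2, 0.0])
    rets = 0.1 + candidates @ premia + rng.normal(scale=2.0, size=(T, N))
    print(forward_select(rets, candidates))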

arXiv link: http://arxiv.org/abs/2503.23501v1

Econometrics arXiv paper, submitted: 2025-03-29

Estimation of Latent Group Structures in Time-Varying Panel Data Models

Authors: Paul Haimerl, Stephan Smeekes, Ines Wilms

We introduce a panel data model where coefficients vary both over time and
the cross-section. Slope coefficients change smoothly over time and follow a
latent group structure, being homogeneous within but heterogeneous across
groups. The group structure is identified using a pairwise adaptive group
fused-Lasso penalty. The trajectories of time-varying coefficients are
estimated via polynomial spline functions. We derive the asymptotic
distributions of the penalized and post-selection estimators and show their
oracle efficiency. A simulation study demonstrates excellent finite sample
properties. An application to the emission intensity of GDP highlights the
relevance of addressing cross-sectional heterogeneity and time-variance in
empirical settings.

arXiv link: http://arxiv.org/abs/2503.23165v1

Econometrics arXiv updated paper (originally submitted: 2025-03-28)

Inference on effect size after multiple hypothesis testing

Authors: Andreas Dzemski, Ryo Okui, Wenjie Wang

Significant treatment effects are often emphasized when interpreting and
summarizing empirical findings in studies that estimate multiple, possibly
many, treatment effects. Under this kind of selective reporting, conventional
treatment effect estimates may be biased and their corresponding confidence
intervals may undercover the true effect sizes. We propose new estimators and
confidence intervals that provide valid inferences on the effect sizes of the
significant effects after multiple hypothesis testing. Our methods are based on
the principle of selective conditional inference and complement a wide range of
tests, including step-up tests and bootstrap-based step-down tests. Our
approach is scalable, allowing us to study an application with over 370
estimated effects. We justify our procedure for asymptotically normal treatment
effect estimators. We provide two empirical examples that demonstrate bias
correction and confidence interval adjustments for significant effects. The
magnitude and direction of the bias correction depend on the correlation
structure of the estimated effects and whether the interpretation of the
significant effects depends on the (in)significance of other effects.

arXiv link: http://arxiv.org/abs/2503.22369v2

Econometrics arXiv paper, submitted: 2025-03-28

tempdisagg: A Python Framework for Temporal Disaggregation of Time Series Data

Authors: Jaime Vera-Jaramillo

tempdisagg is a modern, extensible, and production-ready Python framework for
temporal disaggregation of time series data. It transforms low-frequency
aggregates into consistent, high-frequency estimates using a wide array of
econometric techniques-including Chow-Lin, Denton, Litterman, Fernandez, and
uniform interpolation-as well as enhanced variants with automated estimation of
key parameters such as the autocorrelation coefficient rho. The package
introduces features beyond classical methods, including robust ensemble
modeling via non-negative least squares optimization, post-estimation
correction of negative values under multiple aggregation rules, and optional
regression-based imputation of missing values through a dedicated
Retropolarizer module. Architecturally, it follows a modular design inspired by
scikit-learn, offering a clean API for validation, modeling, visualization, and
result interpretation.
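
For intuition, here is a from-scratch sketch of an additive first-difference Denton-type disaggregation, i.e., the smoothest high-frequency path consistent with the low-frequency totals; it does not use or reproduce the tempdisagg API, and the annual figures are made up.

    import numpy as np

    def denton_additive(y_low, periods_per_obs=4):
        """Minimal additive first-difference Denton-type disaggregation (no indicator series):
        choose the smoothest high-frequency path whose sums match the low-frequency totals."""
        n_low = len(y_low)
        n = n_low * periods_per_obs
        # Aggregation matrix C (n_low x n): each row sums one low-frequency period
        C = np.kron(np.eye(n_low), np.ones(periods_per_obs))
        # First-difference matrix D ((n-1) x n)
        D = np.diff(np.eye(n), axis=0)
        # KKT system for  min ||D x||^2  subject to  C x = y_low
        KKT = np.block([[2 * D.T @ D, C.T], [C, np.zeros((n_low, n_low))]])
        rhs = np.concatenate([np.zeros(n), np.asarray(y_low, float)])
        return np.linalg.solve(KKT, rhs)[:n]

    annual = np.array([100.0, 112.0, 121.0])
    quarterly = denton_additive(annual, periods_per_obs=4)
    print(quarterly.reshape(3, 4).sum(axis=1))   # recovers the annual totals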

arXiv link: http://arxiv.org/abs/2503.22054v1

Econometrics arXiv paper, submitted: 2025-03-27

An Artificial Trend Index for Private Consumption Using Google Trends

Authors: Juan Tenorio, Heidi Alpiste, Jakelin Remón, Arian Segil

In recent years, the use of databases that analyze trends, sentiments or news
to make economic projections or create indicators has gained significant
popularity, particularly with the Google Trends platform. This article explores
the potential of Google search data to develop a new index that improves
economic forecasts, with a particular focus on one of the key components of
economic activity: private consumption (64% of GDP in Peru). By selecting and
estimating categorized variables, machine learning techniques are applied,
demonstrating that Google data can identify patterns to generate a leading
indicator in real time and improve the accuracy of forecasts. Finally, the
results show that Google's "Food" and "Tourism" categories significantly reduce
projection errors, highlighting the importance of using this information in a
segmented manner to improve macroeconomic forecasts.

arXiv link: http://arxiv.org/abs/2503.21981v1

Econometrics arXiv paper, submitted: 2025-03-27

Identification and estimation of treatment effects in a linear factor model with fixed number of time periods

Authors: Koki Fusejima, Takuya Ishihara

This paper provides a new approach for identifying and estimating the Average
Treatment Effect on the Treated under a linear factor model that allows for
multiple time-varying unobservables. Unlike the majority of the literature on
treatment effects in linear factor models, our approach does not require the
number of pre-treatment periods to go to infinity to obtain a valid estimator.
Our identification approach employs certain nonlinear transformations of the
time-invariant observed covariates that are sufficiently correlated with the
unobserved variables. This relevance condition can be checked with the
available data on pre-treatment periods by validating the correlation of the
transformed covariates and the pre-treatment outcomes. Based on our
identification approach, we provide an asymptotically unbiased estimator of the
effect of participating in the treatment when there is only one treated unit
and the number of control units is large.

arXiv link: http://arxiv.org/abs/2503.21763v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-03-27

A Powerful Bootstrap Test of Independence in High Dimensions

Authors: Mauricio Olivares, Tomasz Olma, Daniel Wilhelm

This paper proposes a nonparametric test of pairwise independence of one
random variable from a large pool of other random variables. The test statistic
is the maximum of several Chatterjee's rank correlations and critical values
are computed via a block multiplier bootstrap. The test is shown to
asymptotically control size uniformly over a large class of data-generating
processes, even when the number of variables is much larger than sample size.
The test is consistent against any fixed alternative. It can be combined with a
stepwise procedure for selecting those variables from the pool that violate
independence, while controlling the family-wise error rate. All formal results
leave the dependence among variables in the pool completely unrestricted. In
simulations, we find that our test is very powerful, outperforming existing
tests in most scenarios considered, particularly in high dimensions and/or when
the variables in the pool are dependent.
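
A minimal sketch of the building blocks: Chatterjee's rank correlation and the maximum over a pool of candidate variables. The block multiplier bootstrap critical values and the stepwise selection procedure from the paper are omitted, and the simulated data and dependence direction are purely illustrative.

    import numpy as np

    def chatterjee_xi(x, y):
        """Chatterjee's rank correlation coefficient (no-ties formula)."""
        n = len(x)
        order = np.argsort(x, kind="stable")
        r = np.argsort(np.argsort(y[order])) + 1          # ranks of y after sorting by x
        return 1.0 - 3.0 * np.abs(np.diff(r)).sum() / (n**2 - 1)

    def max_xi_statistic(x, Y_pool):
        """Test statistic: maximum Chatterjee correlation of x with each pool variable."""
        return max(chatterjee_xi(x, Y_pool[:, j]) for j in range(Y_pool.shape[1]))

    rng = np.random.default_rng(3)
    n, p = 300, 200                       # pool dimension comparable to the sample size
    x = rng.normal(size=n)
    Y_pool = rng.normal(size=(n, p))
    Y_pool[:, 0] += 0.5 * x**2            # one nonlinearly dependent variable in the pool
    print(max_xi_statistic(x, Y_pool))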

arXiv link: http://arxiv.org/abs/2503.21715v2

Econometrics arXiv updated paper (originally submitted: 2025-03-26)

Inferring Treatment Effects in Large Panels by Uncovering Latent Similarities

Authors: Ben Deaner, Chen-Wei Hsiang, Andrei Zeleneev

The presence of unobserved confounders is one of the main challenges in
identifying treatment effects. In this paper, we propose a new approach to
causal inference using panel data with large $N$ and $T$. Our approach
imputes the untreated potential outcomes for treated units using the outcomes
for untreated individuals with similar values of the latent confounders. In
order to find units with similar latent characteristics, we utilize long
pre-treatment histories of the outcomes. Our analysis is based on a
nonparametric, nonlinear, and nonseparable factor model for untreated potential
outcomes and treatments. The model satisfies minimal smoothness requirements.
We impute both missing counterfactual outcomes and propensity scores using
kernel smoothing based on the constructed measure of latent similarity between
units, and demonstrate that our estimates can achieve the optimal nonparametric
rate of convergence up to log terms. Using these estimates, we construct a
doubly robust estimator of the period-specific average treatment effect on the
treated (ATT), and provide conditions under which this estimator is
$\sqrt{N}$-consistent, asymptotically normal, and unbiased. Our simulation
study demonstrates that our method provides accurate inference for a wide range
of data generating processes.

arXiv link: http://arxiv.org/abs/2503.20769v2

Econometrics arXiv paper, submitted: 2025-03-26

Large Structural VARs with Multiple Sign and Ranking Restrictions

Authors: Joshua Chan, Christian Matthes, Xuewen Yu

Large VARs are increasingly used in structural analysis as a unified
framework to study the impacts of multiple structural shocks simultaneously.
However, the concurrent identification of multiple shocks using sign and
ranking restrictions poses significant practical challenges to the point where
existing algorithms cannot be used with such large VARs. To address this, we
introduce a new numerically efficient algorithm that facilitates the estimation
of impulse responses and related measures in large structural VARs identified
with a large number of structural restrictions on impulse responses. The
methodology is illustrated using a 35-variable VAR with over 100 sign and
ranking restrictions to identify 8 structural shocks.

arXiv link: http://arxiv.org/abs/2503.20668v1

Econometrics arXiv updated paper (originally submitted: 2025-03-26)

Quasi-Bayesian Local Projections: Simultaneous Inference and Extension to the Instrumental Variable Method

Authors: Masahiro Tanaka

Local projections (LPs) are widely used for impulse response analysis, but
Bayesian methods face challenges due to the absence of a likelihood function.
Existing approaches rely on pseudo-likelihoods, which often result in poorly
calibrated posteriors. We propose a quasi-Bayesian method based on the
Laplace-type estimator, where a quasi-likelihood is constructed using a
generalized method of moments criterion. This approach avoids strict
distributional assumptions, ensures well-calibrated inferences, and supports
simultaneous credible bands. Additionally, it can be naturally extended to the
instrumental variable method. We validate our approach through Monte Carlo
simulations.

arXiv link: http://arxiv.org/abs/2503.20249v2

Econometrics arXiv paper, submitted: 2025-03-26

Treatment Effects Inference with High-Dimensional Instruments and Control Variables

Authors: Xiduo Chen, Xingdong Feng, Antonio F. Galvao, Yeheng Ge

Obtaining valid treatment effect inferences remains a challenging problem
when dealing with numerous instruments and non-sparse control variables. In
this paper, we propose a novel ridge regularization-based instrumental
variables method for estimation and inference in the presence of both
high-dimensional instrumental variables and high-dimensional control variables.
These methods are applicable both with and without sparsity assumptions. To
address the bias caused by high-dimensional instruments, we introduce a
two-step procedure incorporating a data-splitting strategy. We establish
statistical properties of the estimator, including consistency and asymptotic
normality. Furthermore, we develop statistical inference procedures by
providing a consistent estimator for the asymptotic variance of the estimator.
The finite sample performance of the proposed method is evaluated through
numerical simulations. Results indicate that the new estimator consistently
outperforms existing sparsity-based approaches across various settings,
offering valuable insights for more complex scenarios. Finally, we provide an
empirical application estimating the causal effect of schooling on earnings by
addressing potential endogeneity through the use of high-dimensional
instrumental variables and high-dimensional covariates.

arXiv link: http://arxiv.org/abs/2503.20149v1

Econometrics arXiv paper, submitted: 2025-03-25

EASI Drugs in the Streets of Colombia: Modeling Heterogeneous and Endogenous Drug Preferences

Authors: Santiago Montoya-Blandón, Andrés Ramírez-Hassan

The response of illicit drug consumers to large-scale policy changes, such as
legalization, is heavily mediated by their demand behavior. Since individual
drug use is driven by many unobservable factors, accounting for unobserved
heterogeneity is crucial for modeling demand and designing targeted public
policies. This paper introduces a finite Gaussian mixture of Exact Affine Stone
Index (EASI) demand systems to estimate the joint demand for marijuana,
cocaine, and basuco (cocaine residual or "crack") in Colombia, accounting for
corner solutions and endogenous price variation. Our results highlight the
importance of unobserved heterogeneity in identifying reliable price
elasticities. The method reveals two regular consumer subpopulations: "safe"
(recreational) and "addict" users, with the majority falling into the first
group. For the "safe" group, whose estimates are precise and nationally
representative, all three drugs exhibit unitary price elasticities, with
cocaine being complementary to marijuana and basuco an inferior substitute to
cocaine. Given the low production cost of marijuana in Colombia, legalization
is likely to drive prices down significantly. Our counterfactual analysis
suggests that a 50% price decrease would result in a $363 USD gain in
utility-equivalent expenditure per representative consumer, $120 million USD
in government tax revenue, and a $127 million USD revenue loss for drug
dealers. Legalization, therefore, has the potential to reduce the incentive for
drug-related criminal activity, the current largest source of violent crime in
Colombia.

arXiv link: http://arxiv.org/abs/2503.20100v1

Econometrics arXiv paper, submitted: 2025-03-25

Identification of Average Treatment Effects in Nonparametric Panel Models

Authors: Susan Athey, Guido Imbens

This paper studies identification of average treatment effects in a panel
data setting. It introduces a novel nonparametric factor model and proves
identification of average treatment effects. The identification proof is based
on the introduction of a consistent estimator. Underlying the proof is a result
that there is a consistent estimator for the expected outcome in the absence of
the treatment for each unit and time period; this result can be applied more
broadly, for example in problems of decompositions of group-level differences
in outcomes, such as the much-studied gender wage gap.

arXiv link: http://arxiv.org/abs/2503.19873v1

Econometrics arXiv updated paper (originally submitted: 2025-03-25)

Two Level Nested and Sequential Logit

Authors: Davide Luparello

This technical note provides comprehensive derivations of fundamental
equations in two-level nested and sequential logit models for analyzing
hierarchical choice structures. We present derivations of the Berry (1994)
inversion formula, nested inclusive values computation, and multi-level market
share equations, complementing existing literature. While conceptually
distinct, nested and sequential logit models share mathematical similarities
and, under specific distributional assumptions, yield identical inversion
formulas-offering valuable analytical insights. These notes serve as a
practical reference for researchers implementing multi-level discrete choice
models in empirical applications, particularly in industrial organization and
demand estimation contexts, and complement Mansley et al. (2019).
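
A direct implementation of the two-level nested logit inversion discussed in the note, delta_j = ln(s_j) - ln(s_0) - sigma * ln(s_{j|g}), where s_{j|g} is the within-nest share; the nest assignments, shares, and sigma below are made up for illustration.

    import numpy as np

    def nested_logit_inversion(shares, outside_share, groups, sigma):
        """Berry (1994) inversion for the two-level nested logit:
        delta_j = ln(s_j) - ln(s_0) - sigma * ln(s_{j|g})."""
        shares = np.asarray(shares, float)
        groups = np.asarray(groups)
        group_totals = np.array([shares[groups == g].sum() for g in groups])
        within_share = shares / group_totals
        return np.log(shares) - np.log(outside_share) - sigma * np.log(within_share)

    # Three products in two nests; inside shares sum to 0.8, outside good has share 0.2
    shares = np.array([0.3, 0.2, 0.3])
    groups = np.array([0, 0, 1])
    print(nested_logit_inversion(shares, 0.2, groups, sigma=0.5))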

arXiv link: http://arxiv.org/abs/2503.21808v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-03-25

Bayesian Outlier Detection for Matrix-variate Models

Authors: Monica Billio, Roberto Casarin, Fausto Corradin, Antonio Peruzzi

Anomalies in economic and financial data -- often linked to rare yet
impactful events -- are of theoretical interest, but can also severely distort
inference. Although outlier-robust methodologies can be used, many researchers
prefer pre-processing strategies that remove outliers. In this work, an
efficient sequential Bayesian framework is proposed for outlier detection based
on the predictive Bayes Factor (BF). The proposed method is specifically
designed for large, multidimensional datasets and extends univariate Bayesian
model outlier detection procedures to the matrix-variate setting. Leveraging
power-discounted priors, tractable predictive BF are obtained, thereby avoiding
computationally intensive techniques. The BF finite sample distribution, the
test critical region, and robust extensions of the test are introduced by
exploiting the sampling variability. The framework supports online detection
with analytical tractability, ensuring both accuracy and scalability. Its
effectiveness is demonstrated through simulations, and three applications to
reference datasets in macroeconomics and finance are provided.

arXiv link: http://arxiv.org/abs/2503.19515v2

Econometrics arXiv paper, submitted: 2025-03-24

Automatic Inference for Value-Added Regressions

Authors: Tian Xie

It is common to use shrinkage methods such as empirical Bayes to improve
estimates of teacher value-added. However, when the goal is to perform
inference on coefficients in the regression of long-term outcomes on
value-added, it's unclear whether shrinking the value-added estimators can help
or hurt. In this paper, we consider a general class of value-added estimators
and the properties of their corresponding regression coefficients. Our main
finding is that regressing long-term outcomes on shrinkage estimates of
value-added performs an automatic bias correction: the associated regression
estimator is asymptotically unbiased, asymptotically normal, and efficient in
the sense that it is asymptotically equivalent to regressing on the true
(latent) value-added. Further, OLS standard errors from regressing on shrinkage
estimates are consistent. As such, efficient inference is easy for
practitioners to implement: simply regress outcomes on shrinkage estimates of
value added.
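
A small simulation sketch of this message: shrink noisy value-added estimates with a linear empirical Bayes rule and regress outcomes on the shrunk estimates. The homoskedastic noise and method-of-moments variance estimate are simplifying assumptions, not the general estimator class analyzed in the paper.

    import numpy as np

    def eb_shrink(va_hat, se):
        """Linear empirical Bayes shrinkage of noisy value-added estimates toward the grand mean."""
        var_mu = max(np.var(va_hat, ddof=1) - np.mean(se**2), 0.0)   # signal variance (method of moments)
        w = var_mu / (var_mu + se**2)
        return np.mean(va_hat) + w * (va_hat - np.mean(va_hat))

    rng = np.random.default_rng(4)
    J = 500
    va_true = rng.normal(scale=0.15, size=J)                 # latent teacher value-added
    se = np.full(J, 0.10)
    va_hat = va_true + rng.normal(scale=se)                  # noisy estimates
    outcome = 1.0 + 2.0 * va_true + rng.normal(scale=0.5, size=J)

    va_eb = eb_shrink(va_hat, se)
    X = np.column_stack([np.ones(J), va_eb])
    beta = np.linalg.lstsq(X, outcome, rcond=None)[0]
    print(beta)   # slope is close to the coefficient (2.0) on latent value-added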

arXiv link: http://arxiv.org/abs/2503.19178v1

Econometrics arXiv paper, submitted: 2025-03-24

Empirical Bayes shrinkage (mostly) does not correct the measurement error in regression

Authors: Jiafeng Chen, Jiaying Gu, Soonwoo Kwon

In the value-added literature, it is often claimed that regressing on
empirical Bayes shrinkage estimates corrects for the measurement error problem
in linear regression. We clarify the conditions needed; we argue that these
conditions are stronger than those needed for classical measurement error
correction, which we advocate for instead. Moreover, we show that the classical
estimator cannot be improved without stronger assumptions. We extend these
results to regressions on nonlinear transformations of the latent attribute and
find generically slow minimax estimation rates.

arXiv link: http://arxiv.org/abs/2503.19095v1

Econometrics arXiv paper, submitted: 2025-03-24

Forecasting Labor Demand: Predicting JOLT Job Openings using Deep Learning Model

Authors: Kyungsu Kim

This thesis studies the effectiveness of the Long Short-Term Memory (LSTM) model in
forecasting future Job Openings and Labor Turnover Survey data in the United
States. Drawing on multiple economic indicators from various sources, the data
are fed directly into the LSTM model to predict JOLT job openings in subsequent
periods. The performance of the LSTM model is compared with conventional
autoregressive approaches, including ARIMA, SARIMA, and Holt-Winters. Findings
suggest that the LSTM model outperforms these traditional models in predicting
JOLT job openings, as it not only captures the dependent variable's trends but
also harmonizes with key economic factors. These results highlight the
potential of deep learning techniques in capturing complex temporal
dependencies in economic data, offering valuable insights for policymakers and
stakeholders in developing data-driven labor market strategies.
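
A generic Keras sketch of this kind of forecasting setup, assuming a (T, k) matrix of monthly indicators with job openings in the first column; the architecture, window length, and data are placeholders rather than the thesis's specification.

    import numpy as np
    import tensorflow as tf

    def make_windows(series, lookback=12):
        """Turn a (T, k) array of monthly indicators into (samples, lookback, k) inputs
        and next-month job-openings targets (assumed to be column 0)."""
        X, y = [], []
        for t in range(lookback, len(series)):
            X.append(series[t - lookback:t])
            y.append(series[t, 0])
        return np.array(X), np.array(y)

    rng = np.random.default_rng(9)
    data = rng.normal(size=(300, 4)).cumsum(axis=0)     # placeholder for JOLT series + indicators
    X, y = make_windows(data)

    model = tf.keras.Sequential([
        tf.keras.Input(shape=X.shape[1:]),
        tf.keras.layers.LSTM(32),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X[:-24], y[:-24], epochs=5, batch_size=32, verbose=0)
    print(model.evaluate(X[-24:], y[-24:], verbose=0))  # held-out MSE on the last 24 months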

arXiv link: http://arxiv.org/abs/2503.19048v1

Econometrics arXiv updated paper (originally submitted: 2025-03-24)

Simultaneous Inference Bands for Autocorrelations

Authors: Uwe Hassler, Marc-Oliver Pohle, Tanja Zahn

Sample autocorrelograms typically come with significance bands (non-rejection
regions) for the null hypothesis of no temporal correlation. These bands have
two shortcomings. First, they build on pointwise intervals and suffer from
joint undercoverage (overrejection) under the null hypothesis. Second, if this
null is clearly violated one would rather prefer to see confidence bands to
quantify estimation uncertainty. We propose and discuss both simultaneous
significance bands and simultaneous confidence bands for time series and series
of regression residuals. They are as easy to construct as their pointwise
counterparts and at the same time provide an intuitive and visual
quantification of sampling uncertainty as well as valid statistical inference.
For regression residuals, we show that for static regressions the asymptotic
variances underlying the construction of the bands are the same as those for
observed time series, and for dynamic regressions (with lagged endogenous
regressors) we show how they need to be adjusted. We study theoretical
properties of simultaneous significance bands and two types of simultaneous
confidence bands (sup-t and Bonferroni) and analyse their finite-sample
performance in a simulation study. Finally, we illustrate the use of the bands
in an application to monthly US inflation and residuals from Phillips curve
regressions.
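
The contrast between pointwise and joint bands can be seen with a simple Bonferroni-style adjustment under an i.i.d. null, as sketched below; this is only one rough way to obtain joint coverage and does not reproduce the paper's sup-t bands or the adjustments for regression residuals.

    import numpy as np
    from scipy.stats import norm
    from statsmodels.tsa.stattools import acf

    def significance_bands(x, nlags=20, alpha=0.05):
        """Pointwise vs. Bonferroni simultaneous significance bands for the sample ACF
        under the null of no autocorrelation (i.i.d. data, asymptotic variance 1/n)."""
        n = len(x)
        rho = acf(x, nlags=nlags, fft=True)[1:]                     # lags 1..nlags
        pointwise = norm.ppf(1 - alpha / 2) / np.sqrt(n)
        simultaneous = norm.ppf(1 - alpha / (2 * nlags)) / np.sqrt(n)
        return rho, pointwise, simultaneous

    rng = np.random.default_rng(5)
    x = rng.normal(size=400)
    rho, pw, simul = significance_bands(x)
    print((np.abs(rho) > pw).sum(), (np.abs(rho) > simul).sum())    # joint band rejects less often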

arXiv link: http://arxiv.org/abs/2503.18560v2

Econometrics arXiv paper, submitted: 2025-03-22

Spatiotemporal Impact of Trade Policy Variables on Asian Manufacturing Hubs: Bayesian Global Vector Autoregression Model

Authors: Lutfu S. Sua, Haibo Wang, Jun Huang

A novel spatiotemporal framework using diverse econometric approaches is
proposed in this research to analyze relationships among eight economy-wide
variables in varying market conditions. Employing Vector Autoregression (VAR)
and Granger causality, we explore trade policy effects on emerging
manufacturing hubs in China, India, Malaysia, Singapore, and Vietnam. A
Bayesian Global Vector Autoregression (BGVAR) model also assesses cross-unit
interactions and performs unconditional and conditional forecasts. Utilizing
time-series data from the Asian Development Bank, our study reveals multi-way
cointegration and dynamic connectedness relationships among key economy-wide
variables. This innovative framework enhances investment decisions and
policymaking through a data-driven approach.

arXiv link: http://arxiv.org/abs/2503.17790v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2025-03-21

Calibration Strategies for Robust Causal Estimation: Theoretical and Empirical Insights on Propensity Score-Based Estimators

Authors: Sven Klaassen, Jan Rabenseifner, Jannis Kueck, Philipp Bach

The partitioning of data for estimation and calibration critically impacts
the performance of propensity score based estimators like inverse probability
weighting (IPW) and double/debiased machine learning (DML) frameworks. We
extend recent advances in calibration techniques for propensity score
estimation, improving the robustness of propensity scores in challenging
settings such as limited overlap, small sample sizes, or unbalanced data. Our
contributions are twofold: First, we provide a theoretical analysis of the
properties of calibrated estimators in the context of DML. To this end, we
refine existing calibration frameworks for propensity score models, with a
particular emphasis on the role of sample-splitting schemes in ensuring valid
causal inference. Second, through extensive simulations, we show that
calibration reduces the variance of inverse propensity score-based estimators while
also mitigating bias in IPW, even in small-sample regimes. Notably, calibration
improves stability for flexible learners (e.g., gradient boosting) while
preserving the doubly robust properties of DML. A key insight is that, even
when methods perform well without calibration, incorporating a calibration step
does not degrade performance, provided that an appropriate sample-splitting
approach is chosen.
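
A compact illustration of plugging calibrated versus raw propensity scores into IPW, using scikit-learn's isotonic calibration on simulated data; this is not the DML setup or the sample-splitting schemes analyzed in the paper, and all names and parameters are illustrative.

    import numpy as np
    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.ensemble import GradientBoostingClassifier

    def ipw_ate(y, d, p):
        """Inverse probability weighting estimate of the average treatment effect."""
        p = np.clip(p, 1e-3, 1 - 1e-3)
        return np.mean(d * y / p) - np.mean((1 - d) * y / (1 - p))

    rng = np.random.default_rng(6)
    n = 4000
    X = rng.normal(size=(n, 5))
    true_p = 1 / (1 + np.exp(-(1.5 * X[:, 0] - X[:, 1])))     # limited overlap in the tails
    d = rng.binomial(1, true_p)
    y = 1.0 * d + X[:, 0] + rng.normal(size=n)                # true ATE is 1.0

    raw = GradientBoostingClassifier().fit(X, d)
    calibrated = CalibratedClassifierCV(GradientBoostingClassifier(),
                                        method="isotonic", cv=5).fit(X, d)

    print(ipw_ate(y, d, raw.predict_proba(X)[:, 1]))
    print(ipw_ate(y, d, calibrated.predict_proba(X)[:, 1]))   # often closer to the true ATE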

arXiv link: http://arxiv.org/abs/2503.17290v3

Econometrics arXiv updated paper (originally submitted: 2025-03-21)

Local Projections or VARs? A Primer for Macroeconomists

Authors: José Luis Montiel Olea, Mikkel Plagborg-Møller, Eric Qian, Christian K. Wolf

What should applied macroeconomists know about local projection (LP) and
vector autoregression (VAR) impulse response estimators? The two methods share
the same estimand, but in finite samples lie on opposite ends of a
bias-variance trade-off. While the low bias of LPs comes at a quite steep
variance cost, this cost must be paid to achieve robust uncertainty
assessments. Hence, when the goal is to convey what can be learned about
dynamic causal effects from the data, VARs should only be used with long lag
lengths, ensuring equivalence with LP. For LP estimation, we provide guidance
on selection of lag length and controls, bias correction, and confidence
interval construction.
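
A bare-bones local projection estimator for reference, regressing y_{t+h} on the period-t shock plus lagged controls with Newey-West standard errors; it omits the lag-length selection, bias correction, and confidence-interval refinements the primer discusses, and the AR(1) data are simulated.

    import numpy as np
    import statsmodels.api as sm

    def local_projection_irf(y, shock, horizons=12, lags=4):
        """Local projection IRF: for each horizon h, regress y_{t+h} on the shock at t
        plus lagged controls, with HAC (Newey-West) standard errors."""
        irf, se = [], []
        for h in range(horizons + 1):
            lhs = y[lags + h:]
            rows = len(lhs)
            controls = np.column_stack([y[lags - l: lags - l + rows] for l in range(1, lags + 1)])
            X = sm.add_constant(np.column_stack([shock[lags: lags + rows], controls]))
            res = sm.OLS(lhs, X).fit(cov_type="HAC", cov_kwds={"maxlags": h + 1})
            irf.append(res.params[1])
            se.append(res.bse[1])
        return np.array(irf), np.array(se)

    rng = np.random.default_rng(7)
    T = 400
    shock = rng.normal(size=T)
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = 0.7 * y[t - 1] + shock[t] + 0.3 * rng.normal()
    irf, se = local_projection_irf(y, shock)
    print(np.round(irf[:5], 2))   # should be close to 1, 0.7, 0.49, ...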

arXiv link: http://arxiv.org/abs/2503.17144v2

Econometrics arXiv updated paper (originally submitted: 2025-03-20)

Partial Identification in Moment Models with Incomplete Data--A Conditional Optimal Transport Approach

Authors: Yanqin Fan, Hyeonseok Park, Brendan Pass, Xuetao Shi

In this paper, we develop a unified approach to study partial identification
of a finite-dimensional parameter defined by a general moment model with
incomplete data. We establish a novel characterization of the identified set
for the true parameter in terms of a continuum of inequalities defined by
conditional optimal transport. For the special case of an affine moment model,
we show that the identified set is convex and that its support function can be
easily computed by solving a conditional optimal transport problem. For
parameters that may not satisfy the moment model, we propose a two-step
procedure to construct its identified set. Finally, we demonstrate the
generality and effectiveness of our approach through several running examples.

arXiv link: http://arxiv.org/abs/2503.16098v2

Econometrics arXiv updated paper (originally submitted: 2025-03-19)

Has the Paris Agreement Shaped Emission Trends? A Panel VECM Analysis of Energy, Growth, and CO$_2$ in 106 Middle-Income Countries

Authors: Tuhin G. M. Al Mamun, Ehsanullah, Md. Sharif Hassan, Mohammad Bin Amin, Judit Oláh

Rising CO$_2$ emissions remain a critical global challenge, particularly in
middle-income countries where economic growth drives environmental degradation.
This study examines the long-run and short-run relationships between CO$_2$
emissions, energy use, GDP per capita, and population across 106 middle-income
countries from 1980 to 2023. Using a Panel Vector Error Correction Model
(VECM), we assess the impact of the Paris Agreement (2015) on emissions while
conducting cointegration tests to confirm long-run equilibrium relationships.
The findings reveal a strong long-run relationship among the variables, with
energy use as the dominant driver of emissions, while GDP per capita has a
moderate impact. However, the Paris Agreement has not significantly altered
emissions trends in middle-income economies. Granger causality tests indicate
that energy use strongly causes emissions, but GDP per capita and population do
not exhibit significant short-run causal effects. Variance decomposition
confirms that energy shocks have the most persistent effects, and impulse
response functions (IRFs) show emissions trajectories are primarily shaped by
economic activity rather than climate agreements. Robustness checks, including
autocorrelation tests, polynomial root stability, and Yamagata-Pesaran slope
homogeneity tests, validate model consistency. These results suggest that while
global agreements set emissions reduction goals, their effectiveness remains
limited without stronger national climate policies, sectoral energy reforms,
and financial incentives for clean energy adoption to ensure sustainable
economic growth.

arXiv link: http://arxiv.org/abs/2503.14946v2

Econometrics arXiv paper, submitted: 2025-03-19

Linear programming approach to partially identified econometric models

Authors: Andrei Voronin

Sharp bounds on partially identified parameters are often given by the values
of linear programs (LPs). This paper introduces a novel estimator of the LP
value. Unlike existing procedures, our estimator is root-n-consistent,
pointwise in the probability measure, whenever the population LP is feasible
and finite. Our estimator is valid under point-identification, over-identifying
constraints, and solution multiplicity. Turning to uniformity properties, we
prove that the LP value cannot be uniformly consistently estimated without
restricting the set of possible distributions. We then show that our estimator
achieves uniform consistency under a condition that is minimal for the
existence of any such estimator. We obtain computationally efficient,
asymptotically normal inference procedure with exact asymptotic coverage at any
fixed probability measure. To complement our estimation results, we derive LP
sharp bounds in a general identification setting. We apply our findings to
estimating returns to education. To that end, we propose the conditionally
monotone IV assumption (cMIV) that tightens the classical monotone IV (MIV)
bounds and is testable under a mild regularity condition. Under cMIV,
university education in Colombia is shown to increase the average wage by at
least 5.5%, whereas classical conditions fail to yield an informative bound.
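
The idea that sharp bounds are values of linear programs can be seen in a toy example: bounding P(Y1=1, Y0=1) from its identified marginals with scipy's linprog. This is a population-level illustration with made-up marginals, not the paper's estimator or the cMIV application.

    import numpy as np
    from scipy.optimize import linprog

    # Decision variables: joint probabilities p = (p00, p01, p10, p11); theta = p11.
    p_y1, p_y0 = 0.6, 0.7
    A_eq = np.array([
        [1, 1, 1, 1],      # probabilities sum to one
        [0, 0, 1, 1],      # P(Y1=1) = p10 + p11
        [0, 1, 0, 1],      # P(Y0=1) = p01 + p11
    ])
    b_eq = np.array([1.0, p_y1, p_y0])
    c = np.array([0, 0, 0, 1.0])          # objective picks out p11 = theta

    lower = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 4).fun
    upper = -linprog(-c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 4).fun
    print(lower, upper)    # Frechet bounds: max(p_y1 + p_y0 - 1, 0) and min(p_y1, p_y0)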

arXiv link: http://arxiv.org/abs/2503.14940v1

Econometrics arXiv updated paper (originally submitted: 2025-03-18)

Testing Conditional Stochastic Dominance at Target Points

Authors: Federico A. Bugni, Ivan A. Canay, Deborah Kim

This paper introduces a novel test for conditional stochastic dominance (CSD)
at specific values of the conditioning covariates, referred to as target
points. The test is relevant for analyzing income inequality, evaluating
treatment effects, and studying discrimination. We propose a
Kolmogorov--Smirnov-type test statistic that utilizes induced order statistics
from independent samples. Notably, the test features a data-independent
critical value, eliminating the need for resampling techniques such as the
bootstrap. Our approach avoids kernel smoothing and parametric assumptions,
instead relying on a tuning parameter to select relevant observations. We
establish the asymptotic properties of our test, showing that the induced order
statistics converge to independent draws from the true conditional
distributions and that the test is asymptotically of level $\alpha$ under weak
regularity conditions. While our results apply to both continuous and discrete
data, in the discrete case, the critical value only provides a valid upper
bound. To address this, we propose a refined critical value that significantly
enhances power, requiring only knowledge of the support size of the
distributions. Additionally, we analyze the test's behavior in the limit
experiment, demonstrating that it reduces to a problem analogous to testing
unconditional stochastic dominance in finite samples. This framework allows us
to prove the validity of permutation-based tests for stochastic dominance when
the random variables are continuous. Monte Carlo simulations confirm the strong
finite-sample performance of our method.

arXiv link: http://arxiv.org/abs/2503.14747v2

Econometrics arXiv paper, submitted: 2025-03-18

Bounds for within-household encouragement designs with interference

Authors: Santiago Acerenza, Julian Martinez-Iriarte, Alejandro Sánchez-Becerra, Pietro Emilio Spini

We obtain partial identification of direct and spillover effects in settings
with strategic interaction and discrete treatments, outcome and independent
instruments. We consider a framework with two decision-makers who play
pure-strategy Nash equilibria in treatment take-up, whose outcomes are
determined by their joint take-up decisions. We obtain a latent-type
representation at the pair level. We enumerate all types that are consistent
with pure-strategy Nash equilibria and exclusion restrictions, and then impose
conditions such as symmetry, strategic complementarity/substitution, several
notions of monotonicity, and homogeneity. Under any combination of the above
restrictions, we provide sharp bounds for our parameters of interest via a
simple Python optimization routine. Our framework allows the empirical
researcher to tailor the above menu of assumptions to their empirical
application and to assess their individual and joint identifying power.

arXiv link: http://arxiv.org/abs/2503.14314v1

Econometrics arXiv paper, submitted: 2025-03-18

How does Bike Absence Influence Mode Shifts Among Dockless Bike-Sharing Users? Evidence From Nanjing, China

Authors: Hongjun Cui, Zhixiao Ren, Xinwei Ma, Minqing Zhu

Dockless bike-sharing (DBS) users often encounter difficulties in finding
available bikes at their preferred times and locations. This study examines the
determinants of the users' mode shifts in the context of bike absence, using
survey data from Nanjing, China. An integrated choice and latent variable based
on multinomial logit was employed to investigate the impact of
socio-demographic, trip characteristics, and psychological factors on travel
mode choices. Mode choice models were estimated with seven mode alternatives,
including bike-sharing related choices (waiting in place, picking up bikes on
the way, and picking up bikes on a detour), bus, taxi, riding hailing, and
walk. The findings show that under shared-bike unavailability, users prefer to
pick up bikes on the way rather than take detours, with buses and walking as
favored alternatives to shared bikes. Lower-educated users tend to wait in
place, showing greater concern for waiting time compared to riding time.
Lower-income users, commuters, and females prefer picking up bikes on the way,
while non-commuters and males opt for detours. The insights gained in this
study can provide ideas for solving the problems of demand estimation, parking
area siting, and multi-modal synergies of bike sharing to enhance utilization
and user satisfaction.

arXiv link: http://arxiv.org/abs/2503.14265v1

Econometrics arXiv paper, submitted: 2025-03-18

A Note on the Asymptotic Properties of the GLS Estimator in Multivariate Regression with Heteroskedastic and Autocorrelated Errors

Authors: Koichiro Moriya, Akihiko Noda

We study the asymptotic properties of the GLS estimator in multivariate
regression with heteroskedastic and autocorrelated errors. We derive Wald
statistics for linear restrictions and assess their performance. The statistics
remain robust to heteroskedasticity and autocorrelation.

arXiv link: http://arxiv.org/abs/2503.13950v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-03-17

Minnesota BART

Authors: Pedro A. Lima, Carlos M. Carvalho, Hedibert F. Lopes, Andrew Herren

Vector autoregression (VAR) models are widely used for forecasting and
macroeconomic analysis, yet they remain limited by their reliance on a linear
parameterization. Recent research has introduced nonparametric alternatives,
such as Bayesian additive regression trees (BART), which provide flexibility
without strong parametric assumptions. However, existing BART-based frameworks
do not account for time dependency or allow for sparse estimation in the
construction of regression tree priors, leading to noisy and inefficient
high-dimensional representations. This paper introduces a sparsity-inducing
Dirichlet hyperprior on the regression tree's splitting probabilities, allowing
for automatic variable selection and high-dimensional VARs. Additionally, we
propose a structured shrinkage prior that decreases the probability of
splitting on higher-order lags, aligning with the Minnesota prior's principles.
Empirical results demonstrate that our approach improves predictive accuracy
over the baseline BART prior and Bayesian VAR (BVAR), particularly in capturing
time-dependent relationships and enhancing density forecasts. These findings
highlight the potential of developing domain-specific nonparametric methods in
macroeconomic forecasting.

arXiv link: http://arxiv.org/abs/2503.13759v1

Econometrics arXiv updated paper (originally submitted: 2025-03-17)

Treatment Effect Heterogeneity in Regression Discontinuity Designs

Authors: Sebastian Calonico, Matias D. Cattaneo, Max H. Farrell, Filippo Palomba, Rocio Titiunik

Empirical studies using Regression Discontinuity (RD) designs often explore
heterogeneous treatment effects based on pretreatment covariates, even though
no formal statistical methods exist for such analyses. This has led to the
widespread use of ad hoc approaches in applications. Motivated by common
empirical practice, we develop a unified, theoretically grounded framework for
RD heterogeneity analysis. We show that a fully interacted local linear (in
functional parameters) model effectively captures heterogeneity while still
being tractable and interpretable in applications. The model structure holds
without loss of generality for discrete covariates. Although our proposed model
is potentially restrictive for continuous covariates, it naturally aligns with
standard empirical practice and offers a causal interpretation for RD
applications. We establish principled bandwidth selection and robust
bias-corrected inference methods to analyze heterogeneous treatment effects and
test group differences. We provide companion software to facilitate
implementation of our results. An empirical application illustrates the
practical relevance of our methods.

arXiv link: http://arxiv.org/abs/2503.13696v3

Econometrics arXiv updated paper (originally submitted: 2025-03-17)

Difference-in-Differences Designs: A Practitioner's Guide

Authors: Andrew Baker, Brantly Callaway, Scott Cunningham, Andrew Goodman-Bacon, Pedro H. C. Sant'Anna

Difference-in-differences (DiD) is arguably the most popular
quasi-experimental research design. Its canonical form, with two groups and two
periods, is well-understood. However, empirical practices can be ad hoc when
researchers go beyond that simple case. This article provides an organizing
framework for discussing different types of DiD designs and their associated
DiD estimators. It discusses covariates, weights, handling multiple periods,
and staggered treatments. The organizational framework, however, applies to
other extensions of DiD methods as well.
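
For completeness, the canonical two-group, two-period case reduces to an interaction coefficient in OLS, as in the simulated sketch below; the staggered-treatment and covariate-adjusted estimators the guide covers require more than this.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Canonical 2x2 DiD: under parallel trends, the interaction coefficient is the ATT.
    rng = np.random.default_rng(8)
    n = 2000
    df = pd.DataFrame({
        "treated": rng.integers(0, 2, n),        # group indicator
        "post": rng.integers(0, 2, n),           # period indicator
    })
    att = 1.5
    df["y"] = (0.5 + 1.0 * df["treated"] + 0.8 * df["post"]
               + att * df["treated"] * df["post"] + rng.normal(size=n))

    res = smf.ols("y ~ treated + post + treated:post", data=df).fit(cov_type="HC1")
    print(res.params["treated:post"])            # close to the simulated ATT of 1.5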

arXiv link: http://arxiv.org/abs/2503.13323v3

Econometrics arXiv paper, submitted: 2025-03-17

Tracking the Hidden Forces Behind Laos' 2022 Exchange Rate Crisis and Balance of Payments Instability

Authors: Mariza Cooray, Rolando Gonzales Martinez

This working paper uses a Dynamic Factor Model ('the model') to identify
underlying factors contributing to the debt-induced economic crisis in the
People's Democratic Republic of Laos ('Laos'). The analysis aims to use the
latent macroeconomic insights to propose ways forward for forecasting. We focus
on Laos's historic structural weaknesses to identify when a balance of payments
crisis with either a persistent current account imbalance or rapid capital
outflows would occur. By extracting latent economic factors from macroeconomic
indicators, the model provides a starting point for analyzing the structural
vulnerabilities that led to the depreciation of the kip against the USD and
contributed to inflation in the country. The findings of this working paper
contribute to the broader literature on exchange rate instability and external
sector vulnerabilities in emerging economies, offering insights on what
constitutes a 'signal' as opposed to plain 'noise' from a macroeconomic
forecasting standpoint.

arXiv link: http://arxiv.org/abs/2503.13308v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2025-03-17

SNPL: Simultaneous Policy Learning and Evaluation for Safe Multi-Objective Policy Improvement

Authors: Brian Cho, Ana-Roxana Pop, Ariel Evnine, Nathan Kallus

To design effective digital interventions, experimenters face the challenge
of learning decision policies that balance multiple objectives using offline
data. Often, they aim to develop policies that maximize goal outcomes, while
ensuring there are no undesirable changes in guardrail outcomes. To provide
credible recommendations, experimenters must not only identify policies that
satisfy the desired changes in goal and guardrail outcomes, but also offer
probabilistic guarantees about the changes these policies induce. In practice,
however, policy classes are often large, and digital experiments tend to
produce datasets with small effect sizes relative to noise. In this setting,
standard approaches such as data splitting or multiple testing often result in
unstable policy selection and/or insufficient statistical power. In this paper,
we provide safe noisy policy learning (SNPL), a novel approach that leverages
the concept of algorithmic stability to address these challenges. Our method
enables policy learning while simultaneously providing high-confidence
guarantees using the entire dataset, avoiding the need for data-splitting. We
present finite-sample and asymptotic versions of our algorithm that ensure the
recommended policy satisfies high-probability guarantees for avoiding guardrail
regressions and/or achieving goal outcome improvements. We test both variants
of our approach empirically on a real-world application of
personalizing SMS delivery. Our results on real-world data suggest that our
approach offers dramatic improvements in settings with large policy classes and
low signal-to-noise across both finite-sample and asymptotic safety guarantees,
offering up to 300% improvements in detection rates and 150% improvements in
policy gains at significantly smaller sample sizes.

arXiv link: http://arxiv.org/abs/2503.12760v2

Econometrics arXiv updated paper (originally submitted: 2025-03-16)

Functional Factor Regression with an Application to Electricity Price Curve Modeling

Authors: Sven Otto, Luis Winter

We propose a function-on-function linear regression model for time-dependent
curve data that is consistently estimated by imposing factor structures on the
regressors. An integral operator based on cross-covariances identifies two
components for each functional regressor: a predictive low-dimensional
component, along with associated factors that are guaranteed to be correlated
with the dependent variable, and an infinite-dimensional component that has no
predictive power. In order to consistently estimate the correct number of
factors for each regressor, we introduce a functional eigenvalue difference
test. While conventional estimators for functional linear models fail to
converge in distribution, we establish asymptotic normality, making it possible
to construct confidence bands and conduct statistical inference. The model is
applied to forecast electricity price curves in three different energy markets.
Its prediction accuracy is found to be comparable to popular machine learning
approaches, while providing statistically valid inference and interpretable
insights into the conditional correlation structures of electricity prices.

arXiv link: http://arxiv.org/abs/2503.12611v2

Econometrics arXiv paper, submitted: 2025-03-16

Identification and estimation of structural vector autoregressive models via LU decomposition

Authors: Masato Shimokawa, Kou Fujimori

Structural vector autoregressive (SVAR) models are widely used to analyze the
simultaneous relationships between multiple time-dependent data. Various
statistical inference methods have been studied to overcome the identification
problems of SVAR models. However, most of these methods impose strong
assumptions on the innovation processes, such as mutual uncorrelatedness of their components.
In this study, we relax the assumptions for innovation processes and propose an
identification method for SVAR models under the zero-restrictions on the
coefficient matrices, which correspond to sufficient conditions for LU
decomposition of the coefficient matrices of the reduced form of the SVAR
models. Moreover, we establish asymptotically normal estimators for the
coefficient matrices and impulse responses, which enable us to construct test
statistics for the simultaneous relationships of time-dependent data. The
finite-sample performance of the proposed method is elucidated by numerical
simulations. We also present an example of an empirical study that analyzes the
impact of policy rates on unemployment and prices.
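
One stylized way to see the connection the abstract describes (added here for illustration; the paper's exact restrictions may differ) is a first-order SVAR

    \[
    A_0 y_t = A_1 y_{t-1} + \varepsilon_t
    \quad\Longrightarrow\quad
    y_t = \underbrace{A_0^{-1} A_1}_{\Phi}\, y_{t-1} + A_0^{-1}\varepsilon_t .
    \]

If zero restrictions make \(A_0\) unit lower triangular and \(A_1\) upper triangular, then \(\Phi = LU\) with \(L = A_0^{-1}\) and \(U = A_1\), so a unique LU factorization of the reduced-form coefficient matrix recovers the structural matrices without restricting the correlation structure of \(\varepsilon_t\).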

arXiv link: http://arxiv.org/abs/2503.12378v1

Econometrics arXiv updated paper (originally submitted: 2025-03-14)

Nonlinear Forecast Error Variance Decompositions with Hermite Polynomials

Authors: Quinlan Lee

A novel approach to Forecast Error Variance Decompositions (FEVD) in
nonlinear Structural Vector Autoregressive models with Gaussian innovations is
proposed, called the Hermite FEVD (HFEVD). This method employs a Hermite
polynomial expansion to approximate the future trajectory of a nonlinear
process. The orthogonality of Hermite polynomials under the Gaussian density
facilitates the construction of the decomposition, providing a separation of
shock effects by time horizon, by components of the structural innovation and
by degree of nonlinearity. A link between the HFEVD and nonlinear Impulse
Response Functions is established and distinguishes between marginal and
interaction contributions of shocks. Simulation results from standard nonlinear
models are provided as illustrations and an application to fiscal policy shocks
is examined.
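
As background (standard facts, not results from the paper), the probabilists' Hermite polynomials used in such expansions satisfy

    \[
    He_0(x)=1,\qquad He_1(x)=x,\qquad He_{k+1}(x)=x\,He_k(x)-k\,He_{k-1}(x),
    \]

together with the orthogonality relation \(E[He_j(Z)\,He_k(Z)] = j!\,1\{j=k\}\) for \(Z\sim N(0,1)\). This orthogonality under the Gaussian density is what allows a forecast-error expansion to be split cleanly by polynomial degree, which is the separation "by degree of nonlinearity" referred to above.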

arXiv link: http://arxiv.org/abs/2503.11416v2

Econometrics arXiv updated paper (originally submitted: 2025-03-14)

Difference-in-Differences Meets Synthetic Control: Doubly Robust Identification and Estimation

Authors: Yixiao Sun, Haitian Xie, Yuhang Zhang

Difference-in-Differences (DiD) and Synthetic Control (SC) are widely used
methods for causal inference in panel data, each with distinct strengths and
limitations. We propose a novel method for short-panel causal inference that
integrates the advantages of both approaches. Our method delivers a doubly
robust identification strategy for the average treatment effect on the treated
(ATT) under either of two non-nested assumptions: parallel trends or a
group-level SC condition. Building on this identification result, we develop a
unified semiparametric framework for estimating the ATT. Notably, the
identification-robust moment function satisfies Neyman orthogonality under the
parallel trends assumption but not under the SC assumption, leading to
different asymptotic variances across the two identification strategies. To
ensure valid inference, we propose a multiplier bootstrap method that
consistently approximates the asymptotic distribution under either assumption.
Furthermore, we extend our methodology to accommodate repeated cross-sectional
data and staggered treatment designs. As an empirical application, we evaluate
the impact of the 2003 minimum wage increase in Alaska on family income.
Finally, in simulation studies based on empirically calibrated data-generating
processes, we demonstrate that the proposed estimation and inference methods
perform well in finite samples under either identification assumption.

arXiv link: http://arxiv.org/abs/2503.11375v2

Econometrics arXiv paper, submitted: 2025-03-13

On the numerical approximation of minimax regret rules via fictitious play

Authors: Patrik Guggenberger, Jiaqi Huang

Finding numerical approximations to minimax regret treatment rules is of key
interest. To do so when potential outcomes are in {0,1}, we discretize the
action space of nature and apply a variant of Robinson's (1951) algorithm for
iterative solutions for finite two-person zero sum games. Our approach avoids
the need to evaluate regret of each treatment rule in each iteration. When
potential outcomes are in [0,1], we apply the so-called coarsening approach. We
consider a policymaker choosing between two treatments after observing data
with unequal sample sizes per treatment and the case of testing several
innovations against the status quo.
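
As a minimal sketch of the underlying game-solving step (generic fictitious play on a finite zero-sum payoff matrix; the paper uses a variant of Robinson's algorithm adapted to the regret problem, so this is orientation only):

    import numpy as np

    def fictitious_play(payoff, iters=5000, seed=0):
        """Approximate equilibrium mixtures of a finite zero-sum game by fictitious play.
        payoff[i, j] is the row player's payoff when row plays i and column plays j."""
        rng = np.random.default_rng(seed)
        m, n = payoff.shape
        row_counts, col_counts = np.zeros(m), np.zeros(n)
        row_counts[rng.integers(m)] = 1.0
        col_counts[rng.integers(n)] = 1.0
        for _ in range(iters):
            # each player best-responds to the opponent's empirical mixture so far
            row_counts[np.argmax(payoff @ col_counts)] += 1.0
            col_counts[np.argmin(row_counts @ payoff)] += 1.0
        row_mix, col_mix = row_counts / row_counts.sum(), col_counts / col_counts.sum()
        return row_mix, col_mix, row_mix @ payoff @ col_mix

    # toy check: matching pennies has value 0 and uniform equilibrium mixtures
    print(fictitious_play(np.array([[1.0, -1.0], [-1.0, 1.0]])))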

arXiv link: http://arxiv.org/abs/2503.10932v1

Econometrics arXiv updated paper (originally submitted: 2025-03-13)

Constructing an Instrument as a Function of Covariates

Authors: Moses Stewart

Researchers often use instrumental variables (IV) models to investigate the
causal relationship between an endogenous variable and an outcome while
controlling for covariates. When an exogenous variable is unavailable to serve
as the instrument for an endogenous treatment, a recurring empirical practice
is to construct one from a nonlinear transformation of the covariates. We
investigate how reliable these estimates are under mild forms of
misspecification. Our main result shows that for instruments constructed from
covariates, the IV estimand can be arbitrarily biased under mild forms of
misspecification, even when imposing constant linear treatment effects. We
perform a semi-synthetic exercise by calibrating data to alternative models
proposed in the literature and estimating the average treatment effect. Our
results show that IV specifications that use instruments constructed from
covariates are non-robust to nonlinearity in the true structural function.

arXiv link: http://arxiv.org/abs/2503.10929v2

Econometrics arXiv updated paper (originally submitted: 2025-03-13)

A New Design-Based Variance Estimator for Finely Stratified Experiments

Authors: Yuehao Bai, Xun Huang, Joseph P. Romano, Azeem M. Shaikh, Max Tabord-Meehan

This paper considers the problem of design-based inference for the average
treatment effect in finely stratified experiments. Here, by "design-based" we
mean that the only source of uncertainty stems from the randomness in treatment
assignment; by "finely stratified" we mean that units are stratified into
groups of a fixed size according to baseline covariates and then, within each
group, a fixed number of units are assigned uniformly at random to treatment
and the remainder to control. In this setting we present a novel estimator of
the variance of the difference-in-means based on pairing "adjacent" strata.
Importantly, our estimator is well defined even in the challenging setting
where there is exactly one treated or control unit per stratum. We prove that
our estimator is upward-biased, and thus can be used for inference under mild
restrictions on the finite population. We compare our estimator with some
well-known estimators that have been proposed previously in this setting, and
demonstrate that, while these estimators are also upward-biased, our estimator
has smaller bias and therefore leads to more precise inferences whenever
adjacent strata are sufficiently similar. To further understand when our
estimator leads to more precise inferences, we introduce a framework motivated
by a thought experiment in which the finite population is modeled as having
been drawn once in an i.i.d. fashion from a well-behaved probability
distribution. In this framework, we argue that our estimator dominates the
others in terms of limiting bias and that these improvements are strict except
under strong restrictions on the treatment effects. Finally, we illustrate the
practical relevance of our theoretical results through a simulation study,
which reveals that our estimator can in fact lead to substantially more precise
inferences, especially when the quality of stratification is high.
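
A rough numerical illustration of the pairing idea in the matched-pairs case (one treated and one control unit per stratum). This is a generic pairs-of-adjacent-strata construction with simulated stratum-level data, written here for orientation; it is not the authors' exact estimator:

    import numpy as np

    def paired_strata_variance(y_treat, y_control, x):
        """y_treat[k], y_control[k]: outcomes of the treated and control unit in stratum k;
        x[k]: stratum-level covariate used to order strata so adjacent strata are similar."""
        d = (y_treat - y_control)[np.argsort(x)]   # within-stratum treated-minus-control gaps
        K = len(d)
        tau_hat = d.mean()                         # difference-in-means estimate
        pairs = d[: 2 * (K // 2)].reshape(-1, 2)   # pair each stratum with its neighbor
        sigma2_hat = np.mean((pairs[:, 0] - pairs[:, 1]) ** 2) / 2.0
        return tau_hat, sigma2_hat / K             # variance estimate is conservative

    rng = np.random.default_rng(1)
    x = rng.uniform(size=200)
    y_control = x + rng.normal(scale=0.3, size=200)
    y_treat = y_control + 0.5 + rng.normal(scale=0.3, size=200)
    print(paired_strata_variance(y_treat, y_control, x))

In this stylized version the variance estimate is upward-biased whenever stratum-level effects drift with x, and the bias shrinks as adjacent strata become more similar, mirroring the comparison discussed above.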

arXiv link: http://arxiv.org/abs/2503.10851v3

Econometrics arXiv cross-link from cs.CV (cs.CV), submitted: 2025-03-13

Visual Polarization Measurement Using Counterfactual Image Generation

Authors: Mohammad Mosaffa, Omid Rafieian, Hema Yoganarasimhan

Political polarization is a significant issue in American politics,
influencing public discourse, policy, and consumer behavior. While studies on
polarization in news media have extensively focused on verbal content,
non-verbal elements, particularly visual content, have received less attention
due to the complexity and high dimensionality of image data. Traditional
descriptive approaches often rely on feature extraction from images, leading to
biased polarization estimates due to information loss. In this paper, we
introduce the Polarization Measurement using Counterfactual Image Generation
(PMCIG) method, which combines economic theory with generative models and
multi-modal deep learning to fully utilize the richness of image data and
provide a theoretically grounded measure of polarization in visual content.
Applying this framework to a decade-long dataset featuring 30 prominent
politicians across 20 major news outlets, we identify significant polarization
in visual content, with notable variations across outlets and politicians. At
the news outlet level, we observe significant heterogeneity in visual slant.
Outlets such as Daily Mail, Fox News, and Newsmax tend to favor Republican
politicians in their visual content, while The Washington Post, USA Today, and
The New York Times exhibit a slant in favor of Democratic politicians. At the
politician level, our results reveal substantial variation in polarized
coverage, with Donald Trump and Barack Obama among the most polarizing figures,
while Joe Manchin and Susan Collins are among the least. Finally, we conduct a
series of validation tests demonstrating the consistency of our proposed
measures with external measures of media slant that rely on non-image-based
sources.

arXiv link: http://arxiv.org/abs/2503.10738v1

Econometrics arXiv paper, submitted: 2025-03-12

PLRD: Partially Linear Regression Discontinuity Inference

Authors: Aditya Ghosh, Guido Imbens, Stefan Wager

Regression discontinuity designs have become one of the most popular research
designs in empirical economics. We argue, however, that widely used approaches
to building confidence intervals in regression discontinuity designs exhibit
suboptimal behavior in practice: In a simulation study calibrated to
high-profile applications of regression discontinuity designs, existing methods
either have systematic under-coverage or have wider-than-necessary intervals.
We propose a new approach, partially linear regression discontinuity inference
(PLRD), and find it to address shortcomings of existing methods: Throughout our
experiments, confidence intervals built using PLRD are both valid and short. We
also provide large-sample guarantees for PLRD under smoothness assumptions.

arXiv link: http://arxiv.org/abs/2503.09907v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2025-03-12

Designing Graph Convolutional Neural Networks for Discrete Choice with Network Effects

Authors: Daniel F. Villarraga, Ricardo A. Daziano

We introduce a novel model architecture that incorporates network effects
into discrete choice problems, achieving higher predictive performance than
standard discrete choice models while offering greater interpretability than
general-purpose flexible model classes. Econometric discrete choice models aid
in studying individual decision-making, where agents select the option with the
highest reward from a discrete set of alternatives. Intuitively, the utility an
individual derives from a particular choice depends on their personal
preferences and characteristics, the attributes of the alternative, and the
value their peers assign to that alternative or their previous choices.
However, most applications ignore peer influence, and models that do consider
peer or network effects often lack the flexibility and predictive performance
of recently developed approaches to discrete choice, such as deep learning. We
propose a novel graph convolutional neural network architecture to model
network effects in discrete choices, achieving higher predictive performance
than standard discrete choice models while retaining the interpretability
necessary for inference--a quality often lacking in general-purpose deep
learning architectures. We evaluate our architecture using revealed commuting
choice data, extended with travel times and trip costs for each travel mode for
work-related trips in New York City, as well as 2016 U.S. election data
aggregated by county, to test its performance on datasets with highly
imbalanced classes. Given the interpretability of our models, we can estimate
relevant economic metrics, such as the value of travel time savings in New York
City. Finally, we compare the predictive performance and behavioral insights
from our architecture to those derived from traditional discrete choice and
general-purpose deep learning models.
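
A minimal numpy sketch of the generic idea of feeding a peer-averaged (graph-convolved) representation into a softmax choice model; all shapes and weights below are hypothetical placeholders, and the paper's actual architecture is considerably richer:

    import numpy as np

    def softmax(u):
        z = np.exp(u - u.max(axis=1, keepdims=True))
        return z / z.sum(axis=1, keepdims=True)

    def gcn_choice_probs(X, A, W_gcn, W_self, W_out):
        """X: (n, d) individual features; A: (n, n) adjacency among individuals."""
        A_norm = A / np.maximum(A.sum(axis=1, keepdims=True), 1.0)   # row-normalized
        H = np.maximum(A_norm @ X @ W_gcn, 0.0)    # one graph-convolution layer with ReLU
        U = X @ W_self + H @ W_out                 # (n, J) utilities over J alternatives
        return softmax(U)                          # each row: choice probabilities

    rng = np.random.default_rng(0)
    n, d, h, J = 6, 3, 5, 4
    X = rng.normal(size=(n, d))
    A = (rng.uniform(size=(n, n)) < 0.4).astype(float)
    np.fill_diagonal(A, 0.0)
    P = gcn_choice_probs(X, A, rng.normal(size=(d, h)),
                         rng.normal(size=(d, J)), rng.normal(size=(h, J)))
    print(P.round(3))

In this stylized version, interpretability comes from keeping the utility a transparent sum of an own-attribute term and a peer term rather than an unconstrained deep network.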

arXiv link: http://arxiv.org/abs/2503.09786v1

Econometrics arXiv paper, submitted: 2025-03-12

On the Wisdom of Crowds (of Economists)

Authors: Francis X. Diebold, Aaron Mora, Minchul Shin

We study the properties of macroeconomic survey forecast response averages as
the number of survey respondents grows. Such averages are "portfolios" of
forecasts. We characterize the speed and pattern of the gains from
diversification and their eventual decrease with portfolio size (the number of
survey respondents) in both (1) the key real-world data-based environment of
the U.S. Survey of Professional Forecasters (SPF), and (2) the theoretical
model-based environment of equicorrelated forecast errors. We proceed by
proposing and comparing various direct and model-based "crowd size signature
plots," which summarize the forecasting performance of k-average forecasts as a
function of k, where k is the number of forecasts in the average. We then
estimate the equicorrelation model for growth and inflation forecast errors by
choosing model parameters to minimize the divergence between direct and
model-based signature plots. The results indicate near-perfect equicorrelation
model fit for both growth and inflation, which we explicate by showing
analytically that, under conditions, the direct and fitted equicorrelation
model-based signature plots are identical at a particular model parameter
configuration, which we characterize. We find that the gains from
diversification are greater for inflation forecasts than for growth forecasts,
but that both gains nevertheless decrease quite quickly, so that fewer SPF
respondents than currently used may be adequate.
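
For orientation, the standard equicorrelation algebra behind such signature plots (a textbook identity, not a result from the paper): if each respondent's forecast error has variance \(\sigma^2\) and every pair of errors has correlation \(\rho\), the error of the \(k\)-average satisfies

    \[
    \mathrm{Var}(\bar{e}_k) = \frac{\sigma^2}{k}\bigl(1+(k-1)\rho\bigr)
                            = \sigma^2\Bigl(\rho + \frac{1-\rho}{k}\Bigr)
                            \;\longrightarrow\; \sigma^2\rho
    \quad\text{as } k \to \infty,
    \]

so the gains from adding respondents fall off at rate \(1/k\) and are bounded below by \(\sigma^2\rho\), which matches the quickly decreasing diversification gains described above.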

arXiv link: http://arxiv.org/abs/2503.09287v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-03-11

On a new robust method of inference for general time series models

Authors: Zihan Wang, Xinghao Qiao, Dong Li, Howell Tong

In this article, we propose a novel logistic quasi-maximum likelihood
estimation (LQMLE) for general parametric time series models. Compared to the
classical Gaussian QMLE and existing robust estimations, it enjoys many
distinctive advantages, such as robustness to distributional
misspecification and heavy-tailed innovations, greater resilience to
outliers, smoothness and strict concavity of the log logistic quasi-likelihood
function, and boundedness of the influence function among others. Under some
mild conditions, we establish the strong consistency and asymptotic normality
of the LQMLE. Moreover, we propose a new and vital parameter identifiability
condition to ensure desirable asymptotics of the LQMLE. Further, based on the
LQMLE, we consider the Wald test and the Lagrange multiplier test for the
unknown parameters, and derive the limiting distributions of the corresponding
test statistics. The applicability of our methodology is demonstrated by
several time series models, including DAR, GARCH, ARMA-GARCH, DTARMACH, and
EXPAR. Numerical simulation studies are carried out to assess the finite-sample
performance of our methodology, and an empirical example is analyzed to
illustrate its usefulness.

arXiv link: http://arxiv.org/abs/2503.08655v1

Econometrics arXiv updated paper (originally submitted: 2025-03-11)

Functional Linear Projection and Impulse Response Analysis

Authors: Won-Ki Seo, Dakyung Seong

This paper proposes econometric methods for studying how economic variables
respond to function-valued shocks. Our methods are developed based on linear
projection estimation of predictive regression models with a function-valued
predictor and other control variables. We show that the linear projection
coefficient associated with the functional variable allows for the impulse
response interpretation in a functional structural vector autoregressive model
under a certain identification scheme, similar to the well-known Sims' (1972)
causal chain, but with nontrivial complications in our functional setup. A
novel estimator based on an operator Schur complement is proposed and its
asymptotic properties are studied. We illustrate its empirical applicability
with two examples involving functional variables: economy sentiment
distributions and functional monetary policy shocks.

arXiv link: http://arxiv.org/abs/2503.08364v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-03-10

A primer on optimal transport for causal inference with observational data

Authors: Florian F Gunsilius

The theory of optimal transportation has developed into a powerful and
elegant framework for comparing probability distributions, with wide-ranging
applications in all areas of science. The fundamental idea of analyzing
probabilities by comparing their underlying state space naturally aligns with
the core idea of causal inference, where understanding and quantifying
counterfactual states is paramount. Despite this intuitive connection, explicit
research at the intersection of optimal transport and causal inference is only
beginning to develop. Yet, many foundational models in causal inference have
implicitly relied on optimal transport principles for decades, without
recognizing the underlying connection. Therefore, the goal of this review is to
offer an introduction to the surprisingly deep existing connections between
optimal transport and the identification of causal effects with observational
data -- where optimal transport is not just a set of potential tools, but
actually builds the foundation of model assumptions. As a result, this review
is intended to unify the language and notation between different areas of
statistics, mathematics, and econometrics, by pointing out these existing
connections, and to explore novel problems and directions for future work in
both areas derived from this realization.

arXiv link: http://arxiv.org/abs/2503.07811v2

Econometrics arXiv paper, submitted: 2025-03-10

Nonlinear Temperature Sensitivity of Residential Electricity Demand: Evidence from a Distributional Regression Approach

Authors: Kyungsik Nam, Won-Ki Seo

We estimate the temperature sensitivity of residential electricity demand
during extreme temperature events using the distribution-to-scalar regression
model. Rather than relying on simple averages or individual quantile statistics
of raw temperature data, we construct distributional summaries, such as
probability density, hazard rate, and quantile functions, to retain a more
comprehensive representation of temperature variation. This approach not only
utilizes richer information from the underlying temperature distribution but
also enables the examination of extreme temperature effects that conventional
models fail to capture. Additionally, recognizing that distribution functions
are typically estimated from limited discrete observations and may be subject
to measurement errors, our econometric framework explicitly addresses this
issue. Empirical findings from the hazard-to-demand model indicate that
residential electricity demand exhibits a stronger nonlinear response to cold
waves than to heat waves, while heat wave shocks demonstrate a more pronounced
incremental effect. Moreover, the temperature quantile-to-demand model produces
largely insignificant demand response estimates, attributed to the offsetting
influence of two counteracting forces.

arXiv link: http://arxiv.org/abs/2503.07213v1

Econometrics arXiv updated paper (originally submitted: 2025-03-09)

Taxonomy and Estimation of Multiple Breakpoints in High-Dimensional Factor Models

Authors: Jiangtao Duan, Jushan Bai, Xu Han

This paper investigates the estimation of high-dimensional factor models in
which factor loadings undergo an unknown number of structural changes over
time. Given that a model with multiple changes in factor loadings can be
observationally indistinguishable from one with constant loadings but varying
factor variances, this reduces the high-dimensional structural change problem
to a lower-dimensional one. Due to the presence of multiple breakpoints, the
factor space may expand, potentially causing the pseudo factor covariance
matrix within some regimes to be singular. We define two types of breakpoints:
a singular change, where the number of factors in the combined regime
exceeds the minimum number of factors in the two separate regimes, and a
rotational change, where the number of factors in the combined regime equals
that in each separate regime. Under a singular change, we derive the properties
of the small eigenvalues and establish the consistency of the QML estimators.
Under a rotational change, unlike in the single-breakpoint case, the pseudo
factor covariance matrix within each regime can be either full rank or
singular, yet the QML estimation error for the breakpoints remains stably
bounded. We further propose an information criterion (IC) to estimate the
number of breakpoints and show that, with probability approaching one, it
accurately identifies the true number of structural changes. Monte Carlo
simulations confirm strong finite-sample performance. Finally, we apply our
method to the FRED-MD dataset, identifying five structural breaks in factor
loadings between 1959 and 2024.

arXiv link: http://arxiv.org/abs/2503.06645v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-03-09

Bayesian Synthetic Control with a Soft Simplex Constraint

Authors: Yihong Xu, Quan Zhou

Whether the synthetic control method should be implemented with the simplex
constraint and how to implement it in a high-dimensional setting have been
widely discussed. To address both issues simultaneously, we propose a novel
Bayesian synthetic control method that integrates a soft simplex constraint
with spike-and-slab variable selection. Our model features a hierarchical
prior capturing how well the data aligns with the simplex assumption, which
enables our method to efficiently adapt to the structure and information
contained in the data by utilizing the constraint in a more flexible and
data-driven manner. A unique computational challenge posed by our model is that
conventional Markov chain Monte Carlo sampling algorithms for Bayesian variable
selection are no longer applicable, since the soft simplex constraint results
in an intractable marginal likelihood. To tackle this challenge, we propose to
update the regression coefficients of two predictors simultaneously from their
full conditional posterior distribution, which has an explicit but highly
complicated characterization. This novel Gibbs updating scheme leads to an
efficient Metropolis-within-Gibbs sampler that enables effective posterior
sampling from our model and accurate estimation of the average treatment
effect. Simulation studies demonstrate that our method performs well across a
wide range of settings, in terms of both variable selection and treatment
effect estimation, even when the true data-generating process does not adhere
to the simplex constraint. Finally, application of our method to two empirical
examples in the economic literature yields interesting insights into the impact
of economic policies.

arXiv link: http://arxiv.org/abs/2503.06454v1

Econometrics arXiv updated paper (originally submitted: 2025-03-08)

Bounding the Effect of Persuasion with Monotonicity Assumptions: Reassessing the Impact of TV Debates

Authors: Sung Jae Jun, Sokbae Lee

Televised debates between presidential candidates are often regarded as the
exemplar of persuasive communication. Yet, recent evidence from Le Pennec and
Pons (2023) indicates that they may not sway voters as strongly as popular
belief suggests. We revisit their findings through the lens of the persuasion
rate and introduce a robust framework that does not require exogenous
treatment, parallel trends, or credible instruments. Instead, we leverage
plausible monotonicity assumptions to partially identify the persuasion rate
and related parameters. Our results reaffirm that the sharp upper bounds on the
persuasive effects of TV debates remain modest.

arXiv link: http://arxiv.org/abs/2503.06046v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-03-07

A Hybrid Framework Combining Autoregression and Common Factors for Matrix Time Series Modeling

Authors: Zhiyun Fan, Xiaoyu Zhang, Mingyang Chen, Di Wang

Matrix-valued time series are increasingly common in economics and finance,
but existing approaches such as matrix autoregressive and dynamic matrix factor
models often impose restrictive assumptions and fail to capture complex
dependencies. We propose a hybrid framework that integrates autoregressive
dynamics with a shared low-rank common factor structure, enabling flexible
modeling of temporal dependence and cross-sectional correlation while achieving
dimension reduction. The model captures dynamic relationships through lagged
matrix terms and leverages low-rank structures across predictor and response
matrices, with connections between their row and column subspaces established
via common latent bases to improve interpretability and efficiency. We develop
a computationally efficient gradient-based estimation method and establish
theoretical guarantees for statistical consistency and algorithmic convergence.
Extensive simulations show robust performance under various data-generating
processes, and in an application to multinational macroeconomic data, the model
outperforms existing methods in forecasting and reveals meaningful interactions
among economic factors and countries. The proposed framework provides a
practical, interpretable, and theoretically grounded tool for analyzing
high-dimensional matrix time series.

arXiv link: http://arxiv.org/abs/2503.05340v2

Econometrics arXiv paper, submitted: 2025-03-07

When can we get away with using the two-way fixed effects regression?

Authors: Apoorva Lal

The use of the two-way fixed effects regression in empirical social science
was historically motivated by folk wisdom that it uncovers the Average
Treatment effect on the Treated (ATT) as in the canonical two-period two-group
case. This belief has recently come under scrutiny due to results in
applied econometrics showing that it fails to uncover meaningful averages of
heterogeneous treatment effects in the presence of effect heterogeneity over
time and across adoption cohorts, and several heterogeneity-robust alternatives
have been proposed. However, these estimators often have higher variance and
are therefore under-powered for many applications, which poses a bias-variance
tradeoff that is challenging for researchers to navigate. In this paper, we
propose simple tests of linear restrictions for differences in dynamic
treatment effects across adoption cohorts, which allow researchers to assess
when the two-way fixed effects regression is likely to yield biased
estimates of the ATT. These tests are implemented as methods in the pyfixest
python library.
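
The tests themselves ship with pyfixest (see the paper); as a rough stand-in for readers without that library, the same kind of linear restriction can be checked with statsmodels by interacting adoption cohort with event time and testing equality of the cohort-specific dynamic effects. Everything below (column names, data-generating process) is hypothetical and purely illustrative:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    units, years = 200, range(2008, 2018)
    df = pd.DataFrame([(i, t) for i in range(units) for t in years],
                      columns=["unit", "year"])
    df["cohort"] = np.where(df["unit"] % 2 == 0, 2012, 2014)
    df["event"] = df["year"] - df["cohort"]
    # cohort-by-event-time dummies at event time +1 (one illustrative restriction)
    df["c2012_e1"] = ((df["cohort"] == 2012) & (df["event"] == 1)).astype(float)
    df["c2014_e1"] = ((df["cohort"] == 2014) & (df["event"] == 1)).astype(float)
    df["y"] = 0.5 * df["c2012_e1"] + 0.9 * df["c2014_e1"] + rng.normal(size=len(df))

    fit = smf.ols("y ~ c2012_e1 + c2014_e1 + C(unit) + C(year)", data=df).fit()
    print(fit.f_test("c2012_e1 = c2014_e1"))   # equal dynamic effects across cohorts?

Rejecting this restriction is the kind of evidence that, per the paper, warns against relying on the plain two-way fixed effects regression.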

arXiv link: http://arxiv.org/abs/2503.05125v1

Econometrics arXiv paper, submitted: 2025-03-06

Enhancing Poverty Targeting with Spatial Machine Learning: An application to Indonesia

Authors: Rolando Gonzales Martinez, Mariza Cooray

This study leverages spatial machine learning (SML) to enhance the accuracy
of Proxy Means Testing (PMT) for poverty targeting in Indonesia. Conventional
PMT methodologies are prone to exclusion and inclusion errors due to their
inability to account for spatial dependencies and regional heterogeneity. By
integrating spatial contiguity matrices, SML models mitigate these limitations,
facilitating a more precise identification and comparison of geographical
poverty clusters. Utilizing household survey data from the Social Welfare
Integrated Data Survey (DTKS) for the periods 2016 to 2020 and 2016 to 2021,
this study examines spatial patterns in income distribution and delineates
poverty clusters at both provincial and district levels. Empirical findings
indicate that the proposed SML approach reduces exclusion errors from 28% to
20% compared to standard machine learning models, underscoring the critical
role of spatial analysis in refining machine learning-based poverty targeting.
These results highlight the potential of SML to inform the design of more
equitable and effective social protection policies, particularly in
geographically diverse contexts. Future research can explore the applicability
of spatiotemporal models and assess the generalizability of SML approaches
across varying socio-economic settings.
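
A minimal sketch of the spatial-feature construction described above (hypothetical data and a generic row-normalized contiguity matrix; the study's actual pipeline and DTKS variables are more involved):

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    def add_spatial_lags(X, W):
        """Append spatially lagged features W_norm @ X, with W a binary contiguity matrix."""
        W_norm = W / np.maximum(W.sum(axis=1, keepdims=True), 1.0)
        return np.hstack([X, W_norm @ X])

    rng = np.random.default_rng(0)
    n, d = 300, 8                                   # hypothetical districts and PMT features
    X = rng.normal(size=(n, d))
    W = (rng.uniform(size=(n, n)) < 0.05).astype(float)
    np.fill_diagonal(W, 0.0)
    neighbors_mean = (W @ X[:, 0]) / np.maximum(W.sum(axis=1), 1.0)
    income = X[:, 0] + 0.5 * neighbors_mean + rng.normal(scale=0.5, size=n)

    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(add_spatial_lags(X, W), income)       # PMT-style prediction with spatial context
    print(model.score(add_spatial_lags(X, W), income))

Including the lagged block lets a standard learner exploit spatial dependence that plain PMT regressions ignore, which is, in spirit, the mechanism the abstract credits for the reduction in exclusion errors.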

arXiv link: http://arxiv.org/abs/2503.04300v1

Econometrics arXiv updated paper (originally submitted: 2025-03-05)

Optimal Policy Choices Under Uncertainty

Authors: Sarah Moon

Policymakers often make changes to policies whose benefits and costs are
unknown and must be inferred from statistical estimates in empirical studies.
In this paper I consider the problem of a planner who changes upfront spending
on a set of policies to maximize social welfare but faces statistical
uncertainty about the impact of those changes. I set up a local optimization
problem that is tractable under statistical uncertainty and solve for the local
change in spending that maximizes the posterior expected rate of increase in
welfare. I propose an empirical Bayes approach to approximating the optimal
local spending rule, which solves the planner's local problem with posterior
mean estimates of benefits and net costs. I show theoretically that the
empirical Bayes approach performs well by deriving rates of convergence for the
rate of increase in welfare. These rates converge for a large class of decision
problems, including those where rates from a sample plug-in approach do not.

arXiv link: http://arxiv.org/abs/2503.03910v3

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2025-03-04

Extrapolating the long-term seasonal component of electricity prices for forecasting in the day-ahead market

Authors: Katarzyna Chęć, Bartosz Uniejewski, Rafał Weron

Recent studies provide evidence that decomposing the electricity price into
the long-term seasonal component (LTSC) and the remaining part, predicting both
separately, and then combining their forecasts can bring significant accuracy
gains in day-ahead electricity price forecasting. However, not much attention
has been paid to predicting the LTSC, and the last 24 hourly values of the
estimated pattern are typically copied for the target day. To address this gap,
we introduce a novel approach which extracts the trend-seasonal pattern from a
price series extrapolated using price forecasts for the next 24 hours. We
assess it using two 5-year long test periods from the German and Spanish power
markets, covering the Covid-19 pandemic, the 2021/2022 energy crisis, and the
war in Ukraine. Considering parsimonious autoregressive and LASSO-estimated
models, we find that improvements in predictive accuracy range from 3% to 15%
in terms of the root mean squared error and exceed 1% in terms of profits from
a realistic trading strategy involving day-ahead bidding and battery storage.

arXiv link: http://arxiv.org/abs/2503.02518v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2025-03-04

On the Realized Joint Laplace Transform of Volatilities with Application to Test the Volatility Dependence

Authors: XinWei Feng, Yu Jiang, Zhi Liu, Zhe Meng

In this paper, we first investigate the estimation of the empirical joint
Laplace transform of volatilities of two semi-martingales within a fixed time
interval [0, T] by using overlapped increments of high-frequency data. The
proposed estimator is robust to the presence of finite variation jumps in price
processes. The related functional central limit theorem for the proposed
estimator has been established. Compared with the estimator with non-overlapped
increments, the estimator with overlapped increments improves the asymptotic
estimation efficiency. Moreover, we study the asymptotic theory of the
estimator under a long-span setting and employ it to construct a feasible test
for the dependence between volatilities. Finally, simulation and empirical
studies demonstrate the performance of the proposed estimators.

arXiv link: http://arxiv.org/abs/2503.02283v1

Econometrics arXiv paper, submitted: 2025-03-04

Enhancing Efficiency of Local Projections Estimation with Volatility Clustering in High-Frequency Data

Authors: Chew Lian Chua, David Gunawan, Sandy Suardi

This paper advances the local projections (LP) method by addressing its
inefficiency in high-frequency economic and financial data with volatility
clustering. We incorporate a generalized autoregressive conditional
heteroskedasticity (GARCH) process to resolve serial correlation issues and
extend the model with GARCH-X and GARCH-HAR structures. Monte Carlo simulations
show that exploiting serial dependence in LP error structures improves
efficiency across forecast horizons, remains robust to persistent volatility,
and yields greater gains as sample size increases. Our findings contribute to
refining LP estimation, enhancing its applicability in analyzing economic
interventions and financial market dynamics.
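
A stylized version of the setup (added for orientation; the paper's GARCH-X and GARCH-HAR extensions add further terms to the variance equation): for horizon \(h\), a local projection with GARCH(1,1) errors is

    \[
    y_{t+h} = \alpha_h + \beta_h x_t + \gamma_h' z_t + u_{t+h},
    \qquad u_{t+h} = \sigma_{t+h}\,\epsilon_{t+h},
    \qquad \sigma_{t+h}^2 = \omega + a\,u_{t+h-1}^2 + b\,\sigma_{t+h-1}^2,
    \]

and estimating \(\beta_h\) jointly with \((\omega, a, b)\), for example by quasi-maximum likelihood, exploits the volatility clustering that plain OLS local projections leave in the residuals.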

arXiv link: http://arxiv.org/abs/2503.02217v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2025-03-03

How Do Consumers Really Choose: Exposing Hidden Preferences with the Mixture of Experts Model

Authors: Diego Vallarino

Understanding consumer choice is fundamental to marketing and management
research, as firms increasingly seek to personalize offerings and optimize
customer engagement. Traditional choice modeling frameworks, such as
multinomial logit (MNL) and mixed logit models, impose rigid parametric
assumptions that limit their ability to capture the complexity of consumer
decision-making. This study introduces the Mixture of Experts (MoE) framework
as a machine learning-driven alternative that dynamically segments consumers
based on latent behavioral patterns. By leveraging probabilistic gating
functions and specialized expert networks, MoE provides a flexible,
nonparametric approach to modeling heterogeneous preferences.
Empirical validation using large-scale retail data demonstrates that MoE
significantly enhances predictive accuracy over traditional econometric models,
capturing nonlinear consumer responses to price variations, brand preferences,
and product attributes. The findings underscore MoE's potential to improve
demand forecasting, optimize targeted marketing strategies, and refine
segmentation practices. By offering a more granular and adaptive framework,
this study bridges the gap between data-driven machine learning approaches and
marketing theory, advocating for the integration of AI techniques in managerial
decision-making and strategic consumer insights.

arXiv link: http://arxiv.org/abs/2503.05800v1

Econometrics arXiv paper, submitted: 2025-03-03

Dynamic Factor Correlation Model

Authors: Chen Tong, Peter Reinhard Hansen

We introduce a new dynamic factor correlation model with a novel
variation-free parametrization of factor loadings. The model is applicable to
high dimensions and can accommodate time-varying correlations, heterogeneous
heavy-tailed distributions, and dependent idiosyncratic shocks, such as those
observed in returns on stocks in the same subindustry. We apply the model to a
"small universe" with 12 asset returns and to a "large universe" with 323 asset
returns. The former facilitates a comprehensive empirical analysis and
comparisons and the latter demonstrates the flexibility and scalability of the
model.

arXiv link: http://arxiv.org/abs/2503.01080v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2025-03-03

Vector Copula Variational Inference and Dependent Block Posterior Approximations

Authors: Yu Fu, Michael Stanley Smith, Anastasios Panagiotelis

The key to variational inference (VI) is the selection of a tractable density to approximate the
Bayesian posterior. For large and complex models a common choice is to assume
independence between multivariate blocks in a partition of the parameter space.
While this simplifies the problem it can reduce accuracy. This paper proposes
using vector copulas to capture dependence between the blocks parsimoniously.
Tailored multivariate marginals are constructed using learnable transport maps.
We call the resulting joint distribution a “dependent block posterior”
approximation. Vector copula models are suggested that make tractable and
flexible variational approximations. They allow for differing marginals,
numbers of blocks, block sizes and forms of between block dependence. They also
allow for solution of the variational optimization using efficient stochastic
gradient methods. The approach is demonstrated using four different statistical
models and 16 datasets which have posteriors that are challenging to
approximate. This includes models that use global-local shrinkage priors for
regularization, and hierarchical models for smoothing and heteroscedastic time
series. In all cases, our method produces more accurate posterior
approximations than benchmark VI methods that either assume block independence
or factor-based dependence, at limited additional computational cost. A python
package implementing the method is available on GitHub at
https://github.com/YuFuOliver/VCVI_Rep_PyPackage.

arXiv link: http://arxiv.org/abs/2503.01072v2

Econometrics arXiv updated paper (originally submitted: 2025-03-02)

Bayesian inference for dynamic spatial quantile models with interactive effects

Authors: Tomohiro Ando, Jushan Bai, Kunpeng Li, Yong Song

With the rapid advancement of information technology and data collection
systems, large-scale spatial panel data presents new methodological and
computational challenges. This paper introduces a dynamic spatial panel
quantile model that incorporates unobserved heterogeneity. The proposed model
captures the dynamic structure of panel data, high-dimensional cross-sectional
dependence, and allows for heterogeneous regression coefficients. To estimate
the model, we propose a novel Bayesian Markov Chain Monte Carlo (MCMC)
algorithm. Contributions to Bayesian computation include the development of
quantile randomization, a new Gibbs sampler for structural parameters, and
stabilization of the tail behavior of the inverse Gaussian random generator. We
establish Bayesian consistency for the proposed estimation method as both the
time and cross-sectional dimensions of the panel approach infinity. Monte Carlo
simulations demonstrate the effectiveness of the method. Finally, we illustrate
the applicability of the approach through a case study on the quantile
co-movement structure of the gasoline market.

arXiv link: http://arxiv.org/abs/2503.00772v2

Econometrics arXiv cross-link from cs.HC (cs.HC), submitted: 2025-03-02

Wikipedia Contributions in the Wake of ChatGPT

Authors: Liang Lyu, James Siderius, Hannah Li, Daron Acemoglu, Daniel Huttenlocher, Asuman Ozdaglar

How has Wikipedia activity changed for articles with content similar to
ChatGPT following its introduction? We estimate the impact using
differences-in-differences models, with dissimilar Wikipedia articles as a
baseline for comparison, to examine how changes in voluntary knowledge
contributions and information-seeking behavior differ by article content. Our
analysis reveals that newly created, popular articles whose content overlaps
with ChatGPT 3.5 saw a greater decline in editing and viewership after the
November 2022 launch of ChatGPT than dissimilar articles did. These findings
indicate heterogeneous substitution effects, where users selectively engage
less with existing platforms when AI provides comparable content. This points
to potential uneven impacts on the future of human-driven online knowledge
contributions.

arXiv link: http://arxiv.org/abs/2503.00757v1

Econometrics arXiv paper, submitted: 2025-03-02

Causal Inference on Outcomes Learned from Text

Authors: Iman Modarressi, Jann Spiess, Amar Venugopal

We propose a machine-learning tool that yields causal inference on text in
randomized trials. Based on a simple econometric framework in which text may
capture outcomes of interest, our procedure addresses three questions: First,
is the text affected by the treatment? Second, which outcomes does the treatment affect?
And third, how complete is our description of causal effects? To answer all
three questions, our approach uses large language models (LLMs) that suggest
systematic differences across two groups of text documents and then provides
valid inference based on costly validation. Specifically, we highlight the need
for sample splitting to allow for statistical validation of LLM outputs, as
well as the need for human labeling to validate substantive claims about how
documents differ across groups. We illustrate the tool in a proof-of-concept
application using abstracts of academic manuscripts.

arXiv link: http://arxiv.org/abs/2503.00725v1

Econometrics arXiv paper, submitted: 2025-03-01

The Uncertainty of Machine Learning Predictions in Asset Pricing

Authors: Yuan Liao, Xinjie Ma, Andreas Neuhierl, Linda Schilling

Machine learning in asset pricing typically predicts expected returns as
point estimates, ignoring uncertainty. We develop new methods to construct
forecast confidence intervals for expected returns obtained from neural
networks. We show that neural network forecasts of expected returns share the
same asymptotic distribution as classic nonparametric methods, enabling a
closed-form expression for their standard errors. We also propose a
computationally feasible bootstrap to obtain the asymptotic distribution. We
incorporate these forecast confidence intervals into an uncertainty-averse
investment framework. This provides an economic rationale for shrinkage
implementations of portfolio selection. Empirically, our methods improve
out-of-sample performance.

arXiv link: http://arxiv.org/abs/2503.00549v1

Econometrics arXiv updated paper (originally submitted: 2025-03-01)

GMM and M Estimation under Network Dependence

Authors: Yuya Sasaki

This paper presents GMM and M estimators and their asymptotic properties for
network-dependent data. To this end, I build on Kojevnikov, Marmer, and Song
(KMS, 2021) and develop a novel uniform law of large numbers (ULLN), which is
essential to ensure desired asymptotic behaviors of nonlinear estimators (e.g.,
Newey and McFadden, 1994, Section 2). Using this ULLN, I establish the
consistency and asymptotic normality of both GMM and M estimators. For
practical convenience, complete estimation and inference procedures are also
provided.

arXiv link: http://arxiv.org/abs/2503.00290v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2025-02-28

Location Characteristics of Conditional Selective Confidence Intervals via Polyhedral Methods

Authors: Andreas Dzemski, Ryo Okui, Wenjie Wang

We examine the location properties of a conditional selective confidence
interval constructed via the polyhedral method. The interval is derived from
the distribution of a test statistic conditional on the event of statistical
significance. For a one-sided test, its behavior depends on whether the
parameter is highly or only marginally significant. In the highly significant
case, the interval closely resembles the conventional confidence interval that
ignores selection. By contrast, when the parameter is only marginally
significant, the interval may shift far to the left of zero, potentially
excluding all a priori plausible parameter values. This "location problem" does
not arise if significance is determined by a two-sided test or by a one-sided
test with randomized response (e.g., data carving).

arXiv link: http://arxiv.org/abs/2502.20917v2

Econometrics arXiv updated paper (originally submitted: 2025-02-28)

Structural breaks detection and variable selection in dynamic linear regression via the Iterative Fused LASSO in high dimension

Authors: Angelo Milfont, Alvaro Veiga

We aim to develop a time series modeling methodology tailored to
high-dimensional environments, addressing two critical challenges: variable
selection from a large pool of candidates, and the detection of structural
break points, where the model's parameters shift. This effort centers on
formulating a least squares estimation problem with regularization constraints,
drawing on techniques such as Fused LASSO and AdaLASSO, which are
well-established in machine learning. Our primary achievement is the creation
of an efficient algorithm capable of handling high-dimensional cases within
practical time limits. By addressing these pivotal challenges, our methodology
holds the potential for widespread adoption. To validate its effectiveness, we
detail the iterative algorithm and benchmark its performance against the widely
recognized Path Algorithm for Generalized Lasso. Comprehensive simulations and
performance analyses highlight the algorithm's strengths. Additionally, we
demonstrate the methodology's applicability and robustness through simulated
case studies and a real-world example involving a stock portfolio dataset.
These examples underscore the methodology's practical utility and potential
impact across diverse high-dimensional settings.
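
A generic version of this type of objective, added for orientation (the paper's exact formulation, weights, and algorithm may differ): with time-varying coefficients \(\beta_1,\dots,\beta_T\),

    \[
    \min_{\beta_1,\dots,\beta_T}\;
    \sum_{t=1}^{T}\bigl(y_t - x_t'\beta_t\bigr)^2
    \;+\;\lambda_1\sum_{t=1}^{T}\sum_{j} w_{t,j}\,|\beta_{t,j}|
    \;+\;\lambda_2\sum_{t=2}^{T}\bigl\lVert \beta_t - \beta_{t-1}\bigr\rVert_1 ,
    \]

where the first (AdaLASSO-weighted) penalty performs variable selection and the second (fused) penalty forces the coefficient paths to be piecewise constant, so the dates at which \(\beta_t\) changes are the estimated structural breaks.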

arXiv link: http://arxiv.org/abs/2502.20816v2

Econometrics arXiv paper, submitted: 2025-02-27

Economic Causal Inference Based on DML Framework: Python Implementation of Binary and Continuous Treatment Variables

Authors: Shunxin Yao

This study utilizes a simulated dataset to establish Python code for Double
Machine Learning (DML) using Anaconda's Jupyter Notebook and the DML software
package from GitHub. The research focuses on causal inference experiments for
both binary and continuous treatment variables. The findings reveal that the
DML model demonstrates relatively stable performance in calculating the Average
Treatment Effect (ATE) and its robustness metrics. However, the study also
highlights that the computation of Conditional Average Treatment Effect (CATE)
remains a significant challenge for future DML modeling, particularly in the
context of continuous treatment variables. This underscores the need for
further research and development in this area to enhance the model's
applicability and accuracy.
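
For readers who want to see the moving parts, here is a minimal cross-fitted DML sketch for a binary treatment in a partially linear model, written with scikit-learn only; it is an illustration under the stated simulation, not the study's code, which relies on the DML package referenced above:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
    from sklearn.model_selection import KFold

    def dml_plr(y, d, X, n_folds=5, seed=0):
        """Cross-fit nuisance functions, then regress outcome residuals on treatment residuals."""
        res_y, res_d = np.zeros(len(y)), np.zeros(len(y))
        for train, test in KFold(n_folds, shuffle=True, random_state=seed).split(X):
            m_y = RandomForestRegressor(random_state=seed).fit(X[train], y[train])
            m_d = RandomForestClassifier(random_state=seed).fit(X[train], d[train])
            res_y[test] = y[test] - m_y.predict(X[test])
            res_d[test] = d[test] - m_d.predict_proba(X[test])[:, 1]
        theta = np.sum(res_d * res_y) / np.sum(res_d ** 2)
        u = res_y - theta * res_d
        se = np.sqrt(np.mean(u ** 2 * res_d ** 2)) / (np.mean(res_d ** 2) * np.sqrt(len(y)))
        return theta, se

    rng = np.random.default_rng(0)
    n = 2000
    X = rng.normal(size=(n, 5))
    d = rng.binomial(1, 1.0 / (1.0 + np.exp(-X[:, 0])))
    y = 1.0 * d + X[:, 0] + X[:, 1] ** 2 + rng.normal(size=n)   # constant effect of 1.0
    print(dml_plr(y, d, X))   # treatment coefficient and its standard error

Under a constant treatment effect, as simulated here, the partially linear coefficient coincides with the ATE; estimating conditional effects (CATE) requires additional structure, which is the open challenge the abstract points to.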

arXiv link: http://arxiv.org/abs/2502.19898v1

Econometrics arXiv updated paper (originally submitted: 2025-02-27)

Semiparametric Triple Difference Estimators

Authors: Sina Akbari, Negar Kiyavash, AmirEmad Ghassami

The triple difference causal inference framework is an extension of the
well-known difference-in-differences framework. It relaxes the parallel trends
assumption of the difference-in-differences framework through leveraging data
from an auxiliary domain. Despite being commonly applied in empirical research,
the triple difference framework has received relatively limited attention in
the statistics literature. Specifically, investigating the intricacies of
identification and the design of robust and efficient estimators for this
framework has remained largely unexplored. This work aims to address these gaps
in the literature. From the identification standpoint, we present outcome
regression and weighting methods to identify the average treatment effect on
the treated in both panel data and repeated cross-section settings. For the
latter, we relax the commonly made assumption of time-invariant composition of
units. From the estimation perspective, we develop semiparametric estimators
for the triple difference framework in both panel data and repeated
cross-sections settings. These estimators are based on the cross-fitting
technique, and flexible machine learning tools can be used to estimate the
nuisance components. We characterize conditions under which our proposed
estimators are efficient, doubly robust, root-n consistent and asymptotically
normal. As an application of our proposed methodology, we examined the effect
of mandated maternity benefits on the hourly wages of women of childbearing age
and found that these mandates result in a 2.6% drop in hourly wages.

arXiv link: http://arxiv.org/abs/2502.19788v3

Econometrics arXiv paper, submitted: 2025-02-27

Time-Varying Identification of Structural Vector Autoregressions

Authors: Annika Camehl, Tomasz Woźniak

We propose a novel Bayesian heteroskedastic Markov-switching structural
vector autoregression with data-driven time-varying identification. The model
selects among alternative patterns of exclusion restrictions to identify
structural shocks within the Markov process regimes. We implement the selection
through a multinomial prior distribution over these patterns, which is a
spike-and-slab prior for individual parameters. By combining a Markov-switching
structural matrix with heteroskedastic structural shocks following a stochastic
volatility process, the model enables shock identification through time-varying
volatility within a regime. As a result, the exclusion restrictions become
over-identifying, and their selection is driven by the signal from the data.
Our empirical application shows that data support time variation in the US
monetary policy shock identification. We also verify that time-varying
volatility identifies the monetary policy shock within the regimes.

arXiv link: http://arxiv.org/abs/2502.19659v1

Econometrics arXiv updated paper (originally submitted: 2025-02-26)

Triple Difference Designs with Heterogeneous Treatment Effects

Authors: Laura Caron

Triple difference designs have become increasingly popular in empirical
economics. The advantage of a triple difference design is that, within a
treatment group, it allows for another subgroup of the population --
potentially less impacted by the treatment -- to serve as a control for the
subgroup of interest. While literature on difference-in-differences has
discussed heterogeneity in treatment effects between treated and control groups
or over time, little attention has been given to the implications of
heterogeneity in treatment effects between subgroups. In this paper, I show
that the parameter identified under the usual triple difference assumptions
does not allow for causal interpretation of differences between subgroups when
subgroups may differ in their underlying (unobserved) treatment effects. I
propose a new parameter of interest, the causal difference in average treatment
effects on the treated, which makes causal comparisons between subgroups. I
discuss assumptions for identification and derive the semiparametric efficiency
bounds for this parameter. I then propose doubly-robust, efficient estimators
for this parameter. I use a simulation study to highlight the desirable
finite-sample properties of these estimators, as well as to show the difference
between this parameter and the usual triple difference parameter of interest.
An empirical application shows the importance of considering treatment effect
heterogeneity in practical applications.

arXiv link: http://arxiv.org/abs/2502.19620v2

Econometrics arXiv updated paper (originally submitted: 2025-02-26)

Empirical likelihood approach for high-dimensional moment restrictions with dependent data

Authors: Jinyuan Chang, Qiao Hu, Zhentao Shi, Jia Zhang

Economic and financial models -- such as vector autoregressions, local
projections, and multivariate volatility models -- feature complex dynamic
interactions and spillovers across many time series. These models can be
integrated into a unified framework, with high-dimensional parameters
identified by moment conditions. As the number of parameters and moment
conditions may surpass the sample size, we propose adding a double penalty to
the empirical likelihood criterion to induce sparsity and facilitate dimension
reduction. Notably, we utilize a marginal empirical likelihood approach despite
temporal dependence in the data. Under regularity conditions, we provide
asymptotic guarantees for our method, making it an attractive option for
estimating large-scale multivariate time series models. We demonstrate the
versatility of our procedure through extensive Monte Carlo simulations and
three empirical applications, including analyses of US sectoral inflation
rates, fiscal multipliers, and volatility spillover in China's banking sector.

arXiv link: http://arxiv.org/abs/2502.18970v2

Econometrics arXiv paper, submitted: 2025-02-25

Minimum Distance Estimation of Quantile Panel Data Models

Authors: Blaise Melly, Martina Pons

We propose a minimum distance estimation approach for quantile panel data
models where unit effects may be correlated with covariates. This
computationally efficient method involves two stages: first, computing quantile
regression within each unit, then applying GMM to the first-stage fitted
values. Our estimators apply to (i) classical panel data, tracking units over
time, and (ii) grouped data, where individual-level data are available, but
treatment varies at the group level. Depending on the exogeneity assumptions,
this approach provides quantile analogs of classic panel data estimators,
including fixed effects, random effects, between, and Hausman-Taylor
estimators. In addition, our method offers improved precision for grouped
(instrumental) quantile regression compared to existing estimators. We
establish asymptotic properties as the number of units and observations per
unit jointly diverge to infinity. Additionally, we introduce an inference
procedure that automatically adapts to the potentially unknown convergence rate
of the estimator. Monte Carlo simulations demonstrate that our estimator and
inference procedure perform well in finite samples, even when the number of
observations per unit is moderate. In an empirical application, we examine the
impact of the food stamp program on birth weights. We find that the program's
introduction increased birth weights predominantly at the lower end of the
distribution, highlighting the ability of our method to capture heterogeneous
effects across the outcome distribution.
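
To make the two-stage procedure concrete, the sketch below runs unit-by-unit
quantile regressions on a simulated panel and then pools the first-stage
slopes with an identity-weighted minimum-distance (GMM) step. The design,
quantile level, and equal weighting are illustrative choices, not the
authors' implementation.

```python
# Two-stage sketch: unit-by-unit quantile regression, then a minimum-distance
# (identity-weighted GMM) step that pools the first-stage slopes.
import numpy as np
from statsmodels.regression.quantile_regression import QuantReg

rng = np.random.default_rng(0)
N, T, tau, beta = 200, 50, 0.5, 1.0

alpha = rng.normal(size=N)                       # unit effects
x = alpha[:, None] + rng.normal(size=(N, T))     # covariate correlated with unit effects
y = alpha[:, None] + beta * x + rng.standard_t(df=5, size=(N, T))

# Stage 1: quantile regression within each unit.
slopes = np.empty(N)
for i in range(N):
    X_i = np.column_stack([np.ones(T), x[i]])
    slopes[i] = QuantReg(y[i], X_i).fit(q=tau).params[1]

# Stage 2: identity-weighted minimum distance, i.e. the pooled quantile slope
# is the simple average of the unit-level slopes.
beta_hat = slopes.mean()
se = slopes.std(ddof=1) / np.sqrt(N)
print(f"tau={tau}: beta_hat = {beta_hat:.3f} (naive s.e. {se:.3f})")
```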

arXiv link: http://arxiv.org/abs/2502.18242v1

Econometrics arXiv paper, submitted: 2025-02-25

Certified Decisions

Authors: Isaiah Andrews, Jiafeng Chen

Hypothesis tests and confidence intervals are ubiquitous in empirical
research, yet their connection to subsequent decision-making is often unclear.
We develop a theory of certified decisions that pairs recommended decisions
with inferential guarantees. Specifically, we attach P-certificates -- upper
bounds on loss that hold with probability at least $1-\alpha$ -- to recommended
actions. We show that such certificates allow "safe," risk-controlling adoption
decisions for ambiguity-averse downstream decision-makers. We further prove
that it is without loss to limit attention to P-certificates arising as minimax
decisions over confidence sets, or what Manski (2021) terms "as-if decisions
with a set estimate." A parallel argument applies to E-certified decisions
obtained from e-values in settings with unbounded loss.

arXiv link: http://arxiv.org/abs/2502.17830v1

Econometrics arXiv paper, submitted: 2025-02-24

Optimal Salaries of Researchers with Motivational Emergence

Authors: Eldar Knar

In the context of scientific policy and science management, this study
examines the system of nonuniform wage distribution for researchers. A
nonlinear mathematical model of optimal remuneration for scientific workers has
been developed, considering key and additive aspects of scientific activity:
basic qualifications, research productivity, collaborative projects, skill
enhancement, distinctions, and international collaborations. Unlike traditional
linear schemes, the proposed approach is based on exponential and logarithmic
dependencies, allowing for the consideration of saturation effects and
preventing artificial wage growth due to mechanical increases in scientific
productivity indicators.
The study includes detailed calculations of optimal, minimum, and maximum
wages, demonstrating a fair distribution of remuneration on the basis of
researcher productivity. A linear increase in publication activity or grant
funding should not lead to uncontrolled salary growth, thus avoiding
distortions in the motivational system. The results of this study can be used
to reform and modernize the wage system for researchers in Kazakhstan and other
countries, as well as to optimize grant-based science funding mechanisms. The
proposed methodology fosters scientific motivation, long-term productivity, and
the internationalization of research while also promoting self-actualization
and ultimately forming an adequate and authentic reward system for the research
community.
Specifically, in resource-limited scientific systems, science policy should
focus on the qualitative development of individual researchers rather than
quantitative expansion (e.g., increasing the number of scientists). This can be
achieved by fostering their motivation and self-actualization.

arXiv link: http://arxiv.org/abs/2502.17271v1

Econometrics arXiv updated paper (originally submitted: 2025-02-22)

Conditional Triple Difference-in-Differences

Authors: Dor Leventer

Triple difference-in-differences designs are widely used to estimate causal
effects in empirical work. Surveying the literature, we find that most
applications include controls. We show that this standard practice is generally
biased for the target causal estimand when covariate distributions differ
across groups. To address this, we propose identifying a causal estimand by
fixing the covariate distribution to that of one group. We then develop a
double-robust estimator and illustrate its application in a canonical policy
setting.

arXiv link: http://arxiv.org/abs/2502.16126v3

Econometrics arXiv paper, submitted: 2025-02-22

Binary Outcome Models with Extreme Covariates: Estimation and Prediction

Authors: Laura Liu, Yulong Wang

This paper presents a novel semiparametric method to study the effects of
extreme events on binary outcomes and subsequently forecast future outcomes.
Our approach, based on Bayes' theorem and regularly varying (RV) functions,
facilitates a Pareto approximation in the tail without imposing parametric
assumptions beyond the tail. We analyze cross-sectional as well as static and
dynamic panel data models, incorporate additional covariates, and accommodate
the unobserved unit-specific tail thickness and RV functions in panel data. We
establish consistency and asymptotic normality of our tail estimator, and show
that our objective function converges to that of a panel Logit regression on
tail observations with the log extreme covariate as a regressor, thereby
simplifying implementation. The empirical application assesses whether small
banks become riskier when local housing prices sharply decline, a crucial
channel in the 2007--2008 financial crisis.
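
The simplification noted at the end of the abstract can be illustrated with
a cross-sectional toy example: keep observations whose covariate exceeds a
high threshold and fit a logit with the log covariate as regressor. The
Pareto design and the 95th-percentile cutoff below are placeholders, not
choices from the paper.

```python
# Cross-sectional toy version: Logit on tail observations with the log of the
# extreme covariate as regressor.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 20_000
x = rng.pareto(a=2.0, size=n) + 1.0          # heavy-tailed (regularly varying) covariate
p = 1 / (1 + np.exp(-(-2.0 + 0.8 * np.log(x))))
y = rng.binomial(1, p)

u = np.quantile(x, 0.95)                     # ad hoc tail threshold
tail = x > u
X = sm.add_constant(np.log(x[tail]))
fit = sm.Logit(y[tail], X).fit(disp=0)
print(f"tail logit slope on log(x): {fit.params[1]:.3f}")
```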

arXiv link: http://arxiv.org/abs/2502.16041v1

Econometrics arXiv paper, submitted: 2025-02-21

Clustered Network Connectedness: A New Measurement Framework with Application to Global Equity Markets

Authors: Bastien Buchwalter, Francis X. Diebold, Kamil Yilmaz

Network connections, both across and within markets, are central in countless
economic contexts. In recent decades, a large literature has developed and
applied flexible methods for measuring network connectedness and its evolution,
based on variance decompositions from vector autoregressions (VARs), as in
Diebold and Yilmaz (2014). Those VARs are, however, typically identified using
full orthogonalization (Sims, 1980), or no orthogonalization (Koop, Pesaran,
and Potter, 1996; Pesaran and Shin, 1998), which, although useful, are special
and extreme cases of a more general framework that we develop in this paper. In
particular, we allow network nodes to be connected in "clusters", such as asset
classes, industries, regions, etc., where shocks are orthogonal across clusters
(Sims style orthogonalized identification) but correlated within clusters
(Koop-Pesaran-Potter-Shin style generalized identification), so that the
ordering of network nodes is relevant across clusters but irrelevant within
clusters. After developing the clustered connectedness framework, we apply it
in a detailed empirical exploration of sixteen country equity markets spanning
three global regions.

arXiv link: http://arxiv.org/abs/2502.15458v1

Econometrics arXiv paper, submitted: 2025-02-21

A Supervised Screening and Regularized Factor-Based Method for Time Series Forecasting

Authors: Sihan Tu, Zhaoxing Gao

Factor-based forecasting using Principal Component Analysis (PCA) is an
effective machine learning tool for dimension reduction with many applications
in statistics, economics, and finance. This paper introduces a Supervised
Screening and Regularized Factor-based (SSRF) framework that systematically
addresses high-dimensional predictor sets through a structured four-step
procedure integrating both static and dynamic forecasting mechanisms. The
static approach selects predictors via marginal correlation screening and
scales them using univariate predictive slopes, while the dynamic method
screens and scales predictors based on time series regression incorporating
lagged predictors. PCA then extracts latent factors from the scaled predictors,
followed by LASSO regularization to refine predictive accuracy. In the
simulation study, we validate the effectiveness of SSRF and identify its
parameter adjustment strategies in high-dimensional data settings. An empirical
analysis of macroeconomic indices in China demonstrates that the SSRF method
generally outperforms several commonly used forecasting techniques in
out-of-sample predictions.
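
A rough sketch of the static pipeline (screen, scale, extract factors,
regularize) on simulated data is given below. The top-k screening rule, the
number of factors, and the cross-validated LASSO are illustrative tuning
choices rather than the paper's.

```python
# Static SSRF-style pipeline sketch: marginal-correlation screening, scaling
# by univariate slopes, PCA factor extraction, then LASSO.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(2)
T, p, k_keep, n_factors = 300, 200, 50, 5
X = rng.normal(size=(T, p))
y = X[:, :3] @ np.array([1.0, -0.5, 0.5]) + rng.normal(size=T)

# 1) Screening: keep predictors with the largest absolute marginal correlation.
corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
keep = np.argsort(corr)[-k_keep:]

# 2) Scaling: multiply each kept predictor by its univariate predictive slope.
slopes = np.array([np.cov(X[:, j], y)[0, 1] / np.var(X[:, j], ddof=1) for j in keep])
X_scaled = X[:, keep] * slopes

# 3) Factor extraction via PCA, then 4) LASSO on the factors.
factors = PCA(n_components=n_factors).fit_transform(X_scaled)
model = LassoCV(cv=5).fit(factors, y)
print("in-sample R^2:", round(model.score(factors, y), 3))
```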

arXiv link: http://arxiv.org/abs/2502.15275v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2025-02-20

Policy-Oriented Binary Classification: Improving (KD-)CART Final Splits for Subpopulation Targeting

Authors: Lei Bill Wang, Zhenbang Jiao, Fangyi Wang

Policymakers often use recursive binary split rules to partition populations
based on binary outcomes and target subpopulations whose probability of the
binary event exceeds a threshold. We call such problems Latent Probability
Classification (LPC). Practitioners typically employ Classification and
Regression Trees (CART) for LPC. We prove that in the context of LPC, classic
CART and the knowledge distillation method, whose student model is a CART
(referred to as KD-CART), are suboptimal. We propose Maximizing Distance Final
Split (MDFS), which generates split rules that strictly dominate CART/KD-CART
under the unique intersect assumption. MDFS identifies the unique best split
rule, is consistent, and targets more vulnerable subpopulations than
CART/KD-CART. To relax the unique intersect assumption, we additionally propose
Penalized Final Split (PFS) and weighted Empirical risk Final Split (wEFS).
Through extensive simulation studies, we demonstrate that the proposed methods
predominantly outperform CART/KD-CART. When applied to real-world datasets,
MDFS generates policies that target more vulnerable subpopulations than
CART/KD-CART.

arXiv link: http://arxiv.org/abs/2502.15072v2

Econometrics arXiv updated paper (originally submitted: 2025-02-20)

biastest: Testing parameter equality across different models in Stata

Authors: Hasraddin Guliyev

The biastest command in Stata is a powerful and user-friendly tool designed
to compare the coefficients of different regression models, enabling
researchers to assess the robustness and consistency of their empirical
findings. This command is particularly valuable for evaluating alternative
modeling approaches, such as ordinary least squares versus robust regression,
robust regression versus median regression, quantile regression across
different quantiles, and fixed effects versus random effects models in panel
data analysis. By providing both variable-specific and joint tests, the
biastest command offers a comprehensive framework for detecting bias or significant
differences in model estimates, ensuring that researchers can make informed
decisions about model selection and interpretation.

arXiv link: http://arxiv.org/abs/2502.15049v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2025-02-19

An Empirical Risk Minimization Approach for Offline Inverse RL and Dynamic Discrete Choice Model

Authors: Enoch H. Kang, Hema Yoganarasimhan, Lalit Jain

We study the problem of estimating Dynamic Discrete Choice (DDC) models, also
known as offline Maximum Entropy-Regularized Inverse Reinforcement Learning
(offline MaxEnt-IRL) in machine learning. The objective is to recover reward or
$Q^*$ functions that govern agent behavior from offline behavior data. In this
paper, we propose a globally convergent gradient-based method for solving these
problems without the restrictive assumption of linearly parameterized rewards.
The novelty of our approach lies in introducing the Empirical Risk Minimization
(ERM) based IRL/DDC framework, which circumvents the need for explicit state
transition probability estimation in the Bellman equation. Furthermore, our
method is compatible with non-parametric estimation techniques such as neural
networks. Therefore, the proposed method has the potential to be scaled to
high-dimensional, infinite state spaces. A key theoretical insight underlying
our approach is that the Bellman residual satisfies the Polyak-Lojasiewicz (PL)
condition -- a property that, while weaker than strong convexity, is sufficient
to ensure fast global convergence guarantees. Through a series of synthetic
experiments, we demonstrate that our approach consistently outperforms
benchmark methods and state-of-the-art alternatives.

arXiv link: http://arxiv.org/abs/2502.14131v5

Econometrics arXiv paper, submitted: 2025-02-19

Locally Robust Policy Learning: Inequality, Inequality of Opportunity and Intergenerational Mobility

Authors: Joël Terschuur

Policy makers need to decide whether to treat or not to treat heterogeneous
individuals. The optimal treatment choice depends on the welfare function that
the policy maker has in mind and it is referred to as the policy learning
problem. I study a general setting for policy learning with semiparametric
Social Welfare Functions (SWFs) that can be estimated by locally
robust/orthogonal moments based on U-statistics. This rich class of SWFs
substantially expands the setting in Athey and Wager (2021) and accommodates a
wider range of distributional preferences. Three main applications of the
general theory motivate the paper: (i) Inequality aware SWFs, (ii) Inequality
of Opportunity aware SWFs and (iii) Intergenerational Mobility SWFs. I use the
Panel Study of Income Dynamics (PSID) to assess the effect of attending
preschool on adult earnings and estimate optimal policy rules based on parental
years of education and parental income.

arXiv link: http://arxiv.org/abs/2502.13868v1

Econometrics arXiv cross-link from q-fin.MF (q-fin.MF), submitted: 2025-02-19

The Risk-Neutral Equivalent Pricing of Model-Uncertainty

Authors: Ken Kangda Wren

Existing approaches to asset-pricing under model-uncertainty adapt classical
utility-maximization frameworks and seek theoretical comprehensiveness. We move
toward practice by considering binary model-risks and by emphasizing
'constraints' over 'preference'. This decomposes viable economic asset-pricing
into that of model and non-model risks separately, leading to a unique and
convenient model-risk pricing formula. Its parameter, a dynamically conserved
constant of model-risk inference, allows an integrated representation of
ex-ante risk-pricing and bias such that their ex-post impacts are disentangled
via well-known anomalies, Momentum and Low-Risk, whose risk-reward patterns
acquire a fresh significance: peak-reward reveals ex-ante risk-premia, and
peak-location, bias.

arXiv link: http://arxiv.org/abs/2502.13744v9

Econometrics arXiv cross-link from q-fin.PM (q-fin.PM), submitted: 2025-02-19

Tensor dynamic conditional correlation model: A new way to pursuit "Holy Grail of investing"

Authors: Cheng Yu, Zhoufan Zhu, Ke Zhu

Style investing creates asset classes (or the so-called "styles") with low
correlations, aligning well with the principle of "Holy Grail of investing" in
terms of portfolio selection. The returns of styles naturally form a
tensor-valued time series, which requires new tools for studying the dynamics
of the conditional correlation matrix to facilitate the aforementioned
principle. Towards this goal, we introduce a new tensor dynamic conditional
correlation (TDCC) model, which is based on two novel treatments:
trace-normalization and dimension-normalization. These two normalizations adapt
to the tensor nature of the data, and they are necessary except when the tensor
data reduce to vector data. Moreover, we provide an easy-to-implement
estimation procedure for the TDCC model, and examine its finite sample
performance by simulations. Finally, we assess the usefulness of the TDCC model
in international portfolio selection across ten global markets and in large
portfolio selection for 1800 stocks from the Chinese stock market.

arXiv link: http://arxiv.org/abs/2502.13461v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-02-19

Balancing Flexibility and Interpretability: A Conditional Linear Model Estimation via Random Forest

Authors: Ricardo Masini, Marcelo Medeiros

Traditional parametric econometric models often rely on rigid functional
forms, while nonparametric techniques, despite their flexibility, frequently
lack interpretability. This paper proposes a parsimonious alternative by
modeling the outcome $Y$ as a linear function of a vector of variables of
interest $X$, conditional on additional covariates
$Z$. Specifically, the conditional expectation is expressed as
$E[Y|X,Z]=X^{T}\beta(Z)$,
where $\beta(\cdot)$ is an unknown Lipschitz-continuous function.
We introduce an adaptation of the Random Forest (RF) algorithm to estimate this
model, balancing the flexibility of machine learning methods with the
interpretability of traditional linear models. This approach addresses a key
challenge in applied econometrics by accommodating heterogeneity in the
relationship between covariates and outcomes. Furthermore, the heterogeneous
partial effects of $X$ on $Y$ are represented by
$\beta(\cdot)$ and can be directly estimated using our proposed
method. Our framework effectively unifies established parametric and
nonparametric models, including varying-coefficient, switching regression, and
additive models. We provide theoretical guarantees, such as pointwise and
$L^p$-norm rates of convergence for the estimator, and establish a pointwise
central limit theorem through subsampling, aiding inference on the function
$\boldsymbol\beta(\cdot)$. We present Monte Carlo simulation results to assess
the finite-sample performance of the method.
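
The sketch below conveys the flavor of the approach: beta(z) is estimated
pointwise by weighted least squares of Y on X with forest-based kernel
weights in Z. The pseudo-outcome used to grow the forest and all tuning
parameters are simplifications of ours, not the authors' algorithm.

```python
# Forest-kernel weighted least squares: estimate beta(z0) by a local WLS of y
# on (1, x), with weights given by leaf co-membership with z0 across trees.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
n = 3000
Z = rng.uniform(-1, 1, size=(n, 2))
x = rng.normal(size=n)
beta_true = 1.0 + 2.0 * Z[:, 0]                 # varying coefficient beta(Z)
y = beta_true * x + 0.3 * rng.normal(size=n)

# Grow the forest on Z with y*x as a pseudo-outcome, so splits track variation
# in beta(Z); this is a shortcut, not the paper's adaptation of RF.
forest = RandomForestRegressor(n_estimators=200, min_samples_leaf=20,
                               random_state=0).fit(Z, y * x)
leaves = forest.apply(Z)                        # (n, n_trees) leaf memberships

def beta_hat(z0):
    """Forest-kernel weighted least squares of y on (1, x) around z0."""
    leaf0 = forest.apply(z0.reshape(1, -1))[0]
    match = leaves == leaf0                     # co-membership with z0, per tree
    w = (match / match.sum(axis=0)).mean(axis=1)
    D = np.column_stack([np.ones(n), x])
    WD = D * w[:, None]
    return np.linalg.solve(D.T @ WD, WD.T @ y)[1]

z0 = np.array([0.5, 0.0])
print("beta(z0) true:", 1.0 + 2.0 * z0[0], " forest-kernel estimate:", round(beta_hat(z0), 2))
```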

arXiv link: http://arxiv.org/abs/2502.13438v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-02-19

Functional Network Autoregressive Models for Panel Data

Authors: Tomohiro Ando, Tadao Hoshino

This study proposes a novel functional vector autoregressive framework for
analyzing network interactions of functional outcomes in panel data settings.
In this framework, an individual's outcome function is influenced by the
outcomes of others through a simultaneous equation system. To estimate the
functional parameters of interest, we need to address the endogeneity issue
arising from these simultaneous interactions among outcome functions. This
issue is carefully handled by developing a novel functional moment-based
estimator. We establish the consistency, convergence rate, and pointwise
asymptotic normality of the proposed estimator. Additionally, we discuss the
estimation of marginal effects and impulse response analysis. As an empirical
illustration, we analyze the demand for a bike-sharing service in the U.S. The
results reveal statistically significant spatial interactions in bike
availability across stations, with interaction patterns varying over the time
of day.

arXiv link: http://arxiv.org/abs/2502.13431v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-02-18

Robust Inference for the Direct Average Treatment Effect with Treatment Assignment Interference

Authors: Matias D. Cattaneo, Yihan He, Ruiqi Yu

This paper develops methods for uncertainty quantification in causal
inference settings with random network interference. We study the large-sample
distributional properties of the classical difference-in-means Hajek treatment
effect estimator, and propose a robust inference procedure for the
(conditional) direct average treatment effect. Our framework allows for
cross-unit interference in both the outcome equation and the treatment
assignment mechanism. Drawing from statistical physics, we introduce a novel
Ising model to capture complex dependencies in treatment assignment, and derive
three results. First, we establish a Berry-Esseen-type distributional
approximation that holds pointwise in the degree of interference induced by the
Ising model. This approximation recovers existing results in the absence of
treatment interference, and highlights the fragility of inference procedures
that do not account for the presence of interference in treatment assignment.
Second, we establish a uniform distributional approximation for the Hajek
estimator and use it to develop robust inference procedures that remain valid
uniformly over all interference regimes allowed by the model. Third, we propose
a novel resampling method to implement the robust inference procedure and
validate its performance through Monte Carlo simulations. A key technical
innovation is the introduction of a conditional i.i.d. Gaussianization that may
have broader applications. We also discuss extensions and generalizations of
our results.

arXiv link: http://arxiv.org/abs/2502.13238v2

Econometrics arXiv paper, submitted: 2025-02-18

Imputation Strategies for Rightcensored Wages in Longitudinal Datasets

Authors: Jörg Drechsler, Johannes Ludsteck

Censoring from above is a common problem with wage information as the
reported wages are typically top-coded for confidentiality reasons. In
administrative databases the information is often collected only up to a
pre-specified threshold, for example, the contribution limit for the social
security system. While directly accounting for the censoring is possible for
some analyses, the most flexible solution is to impute the values above the
censoring point. This strategy offers the advantage that future users of the
data no longer need to implement possibly complicated censoring estimators.
However, standard cross-sectional imputation routines relying on the classical
Tobit model to impute right-censored data have a high risk of introducing bias
from uncongeniality (Meng, 1994) as future analyses to be conducted on the
imputed data are unknown to the imputer. Furthermore, as we show using a
large-scale administrative database from the German Federal Employment agency,
the classical Tobit model offers a poor fit to the data. In this paper, we
present some strategies to address these problems. Specifically, we use
leave-one-out means as suggested by Card et al. (2013) to avoid biases from
uncongeniality and rely on quantile regression or left censoring to improve the
model fit. We illustrate the benefits of these modeling adjustments using the
German Structure of Earnings Survey, which is (almost) unaffected by censoring
and can thus serve as a testbed to evaluate the imputation procedures.
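
As a baseline illustration of imputing top-coded values, the sketch below
fits a classical Tobit by maximum likelihood and draws imputations from the
implied truncated normal above the censoring point. The paper's refinements
(leave-one-out means, quantile regression, left censoring) are not
reproduced, and the simulated design is ours.

```python
# Single-imputation sketch for right-censored (top-coded) wages via a Tobit.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(4)
n, c = 5000, 4.0                                   # c = censoring threshold (log wage)
x = rng.normal(size=n)
y_star = 3.0 + 0.5 * x + 0.8 * rng.normal(size=n)  # latent log wages
y = np.minimum(y_star, c)                          # observed, top-coded wages
cens = y_star >= c

def neg_loglik(theta):
    b0, b1, log_s = theta
    s = np.exp(log_s)
    mu = b0 + b1 * x
    ll_unc = stats.norm.logpdf(y[~cens], loc=mu[~cens], scale=s)
    ll_cen = stats.norm.logsf(c, loc=mu[cens], scale=s)   # P(y* > c)
    return -(ll_unc.sum() + ll_cen.sum())

res = optimize.minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
b0, b1, s = res.x[0], res.x[1], np.exp(res.x[2])

# Impute censored observations from the truncated normal above c.
mu_c = b0 + b1 * x[cens]
a = (c - mu_c) / s
y_imp = y.copy()
y_imp[cens] = stats.truncnorm.rvs(a=a, b=np.inf, loc=mu_c, scale=s, random_state=rng)
print(f"share censored: {cens.mean():.2%}, sigma_hat: {s:.3f}, "
      f"mean imputed value: {y_imp[cens].mean():.2f}")
```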

arXiv link: http://arxiv.org/abs/2502.12967v1

Econometrics arXiv paper, submitted: 2025-02-18

Assortative Marriage and Geographic Sorting

Authors: Jiaming Mao, Jiayi Wen

Between 1980 and 2000, the U.S. experienced a significant rise in geographic
sorting and educational homogamy, with college graduates increasingly
concentrating in high-skill cities and marrying similarly educated spouses. We
develop and estimate a spatial equilibrium model with local labor, housing, and
marriage markets, incorporating a marriage matching framework with transferable
utility. Using the model, we estimate trends in assortative preferences,
quantify the interplay between marital and geographic sorting, and assess their
combined impact on household inequality. Welfare analyses show that after
accounting for marriage, the college well-being gap grew substantially more
than the college wage gap.

arXiv link: http://arxiv.org/abs/2502.12867v1

Econometrics arXiv updated paper (originally submitted: 2025-02-17)

Causal Inference for Qualitative Outcomes

Authors: Riccardo Di Francesco, Giovanni Mellace

Causal inference methods such as instrumental variables, regression
discontinuity, and difference-in-differences are widely used to identify and
estimate treatment effects. However, when outcomes are qualitative, their
application poses fundamental challenges. This paper highlights these
challenges and proposes an alternative framework that focuses on well-defined
and interpretable estimands. We show that conventional identification
assumptions suffice for identifying the new estimands and outline simple,
intuitive estimation strategies that remain fully compatible with conventional
econometric methods. We provide an accompanying open-source R package,
$causalQual$, which is publicly available on CRAN.

arXiv link: http://arxiv.org/abs/2502.11691v2

Econometrics arXiv updated paper (originally submitted: 2025-02-17)

Maximal Inequalities for Separately Exchangeable Empirical Processes

Authors: Harold D. Chiang

This paper derives new maximal inequalities for empirical processes
associated with separately exchangeable random arrays. For fixed index
dimension $K\ge 1$, we establish a global maximal inequality bounding the
$q$-th moment ($q\in[1,\infty)$) of the supremum of these processes. We also
obtain a refined local maximal inequality controlling the first absolute moment
of the supremum. Both results are proved for a general pointwise measurable
function class. Our approach uses a new technique partitioning the index set
into transversal groups, decoupling dependencies and enabling more
sophisticated higher moment bounds.

arXiv link: http://arxiv.org/abs/2502.11432v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-02-16

Regression Modeling of the Count Relational Data with Exchangeable Dependencies

Authors: Wenqin Du, Bailey K. Fosdick, Wen Zhou

Relational data characterized by directed edges with count measurements are
common in social science. Most existing methods either assume the count edges
are derived from continuous random variables or model the edge dependency by
parametric distributions. In this paper, we develop a latent multiplicative
Poisson model for relational data with count edges. Our approach directly
models the edge dependency of count data by the pairwise dependence of latent
errors, which are assumed to be weakly exchangeable. This assumption not only
covers a variety of common network effects, but also leads to a concise
representation of the error covariance. In addition, the identification and
inference of the mean structure, as well as the regression coefficients, depend
on the errors only through their covariance. Such a formulation provides
substantial flexibility for our model. Based on this, we propose a
pseudo-likelihood based estimator for the regression coefficients,
demonstrating its consistency and asymptotic normality. The newly suggested
method is applied to a food-sharing network, revealing interesting network
effects in gift exchange behaviors.

arXiv link: http://arxiv.org/abs/2502.11255v1

Econometrics arXiv updated paper (originally submitted: 2025-02-15)

Policy Learning with Confidence

Authors: Victor Chernozhukov, Sokbae Lee, Adam M. Rosen, Liyang Sun

This paper introduces a framework for selecting policies that maximize
expected welfare under estimation uncertainty. The proposed method explicitly
balances the size of the estimated welfare against the uncertainty inherent in
its estimation, ensuring that chosen policies meet a reporting guarantee,
namely, that actual welfare is guaranteed not to fall below the reported
estimate with a pre-specified confidence level. We produce the efficient
decision frontier, describing policies that offer maximum estimated welfare for
a given acceptable level of estimation risk. We apply this approach to a
variety of settings, including the selection of policy rules that allocate
individuals to treatments and the allocation of limited budgets among competing
social programs.

arXiv link: http://arxiv.org/abs/2502.10653v2

Econometrics arXiv paper, submitted: 2025-02-14

Residualised Treatment Intensity and the Estimation of Average Partial Effects

Authors: Julius Schäper

This paper introduces R-OLS, an estimator for the average partial effect
(APE) of a continuous treatment variable on an outcome variable in the presence
of non-linear and non-additively separable confounding of unknown form.
Identification of the APE is achieved by generalising Stein's Lemma (Stein,
1981), leveraging an exogenous error component in the treatment along with a
flexible functional relationship between the treatment and the confounders. The
identification results for R-OLS are used to characterize the properties of
Double/Debiased Machine Learning (Chernozhukov et al., 2018), specifying the
conditions under which the APE is estimated consistently. A novel decomposition
of the ordinary least squares estimand provides intuition for these results.
Monte Carlo simulations demonstrate that the proposed estimator outperforms
existing methods, delivering accurate estimates of the true APE and exhibiting
robustness to moderate violations of its underlying assumptions. The
methodology is further illustrated through an empirical application to Fetzer
(2019).

arXiv link: http://arxiv.org/abs/2502.10301v1

Econometrics arXiv updated paper (originally submitted: 2025-02-14)

Self-Normalized Inference in (Quantile, Expected Shortfall) Regressions for Time Series

Authors: Yannick Hoga, Christian Schulz

This paper proposes valid inference tools, based on self-normalization, in
time series expected shortfall regressions and, as a corollary, also in
quantile regressions. Extant methods for such time series regressions, based on
a bootstrap or direct estimation of the long-run variance, are computationally
more involved, require the choice of tuning parameters and have serious size
distortions when the regression errors are strongly serially dependent. In
contrast, our inference tools only require estimates of the (quantile, expected
shortfall) regression parameters that are computed on an expanding window, and
are correctly sized as we show in simulations. Two empirical applications to
stock return predictability and to Growth-at-Risk demonstrate the practical
usefulness of the developed inference tools.

arXiv link: http://arxiv.org/abs/2502.10065v2

Econometrics arXiv paper, submitted: 2025-02-13

Prioritized Ranking Experimental Design Using Recommender Systems in Two-Sided Platforms

Authors: Mahyar Habibi, Zahra Khanalizadeh, Negar Ziaeian

Interdependencies between units in online two-sided marketplaces complicate
estimating causal effects in experimental settings. We propose a novel
experimental design to mitigate the interference bias in estimating the total
average treatment effect (TATE) of item-side interventions in online two-sided
marketplaces. Our Two-Sided Prioritized Ranking (TSPR) design uses the
recommender system as an instrument for experimentation. TSPR strategically
prioritizes items based on their treatment status in the listings displayed to
users. We designed TSPR to provide users with a coherent platform experience by
ensuring access to all items and a consistent realization of their treatment by
all users. We evaluate our experimental design through simulations using a
search impression dataset from an online travel agency. Our methodology closely
estimates the true simulated TATE, while a baseline item-side estimator
significantly overestimates TATE.

arXiv link: http://arxiv.org/abs/2502.09806v1

Econometrics arXiv paper, submitted: 2025-02-13

High-dimensional censored MIDAS logistic regression for corporate survival forecasting

Authors: Wei Miao, Jad Beyhum, Jonas Striaukas, Ingrid Van Keilegom

This paper addresses the challenge of forecasting corporate distress, a
problem marked by three key statistical hurdles: (i) right censoring, (ii)
high-dimensional predictors, and (iii) mixed-frequency data. To overcome these
complexities, we introduce a novel high-dimensional censored MIDAS (Mixed Data
Sampling) logistic regression. Our approach handles censoring through inverse
probability weighting and achieves accurate estimation with numerous
mixed-frequency predictors by employing a sparse-group penalty. We establish
finite-sample bounds for the estimation error, accounting for censoring, the
MIDAS approximation error, and heavy tails. The superior performance of the
method is demonstrated through Monte Carlo simulations. Finally, we present an
extensive application of our methodology to predict the financial distress of
Chinese-listed firms. Our novel procedure is implemented in the R package
'Survivalml'.

arXiv link: http://arxiv.org/abs/2502.09740v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2025-02-13

On (in)consistency of M-estimators under contamination

Authors: Jens Klooster, Bent Nielsen

We consider robust location-scale estimators under contamination. We show
that commonly used robust estimators such as the median and the Huber estimator
are inconsistent under asymmetric contamination, while the Tukey estimator is
consistent. In order to make nuisance parameter free inference based on the
Tukey estimator a consistent scale estimator is required. However, standard
robust scale estimators such as the interquartile range and the median absolute
deviation are inconsistent under contamination.
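
The claim is easy to illustrate numerically: under asymmetric contamination,
the median and the Huber location estimate drift away from the true center,
while the Tukey biweight estimate does not. The contamination fraction and
location below are arbitrary, and the tuning constants are statsmodels
defaults; this is only an illustration, not the paper's analysis.

```python
# Location estimates under 10% asymmetric contamination at +10 (true center 0).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
n, eps = 100_000, 0.10
clean = rng.normal(size=n)
y = np.where(rng.uniform(size=n) < eps, 10.0, clean)   # asymmetric contamination
ones = np.ones((n, 1))                                  # intercept-only design

median = np.median(y)
huber = sm.RLM(y, ones, M=sm.robust.norms.HuberT()).fit().params[0]
tukey = sm.RLM(y, ones, M=sm.robust.norms.TukeyBiweight()).fit().params[0]
print(f"median: {median:.3f}  Huber: {huber:.3f}  Tukey: {tukey:.3f}")
```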

arXiv link: http://arxiv.org/abs/2502.09145v1

Econometrics arXiv paper, submitted: 2025-02-12

Difference-in-Differences and Changes-in-Changes with Sample Selection

Authors: Javier Viviens

Sample selection arises endogenously in causal research when the treatment
affects whether certain units are observed. It is a common pitfall in
longitudinal studies, particularly in settings where treatment assignment is
confounded. In this paper, I highlight the drawbacks of one of the most popular
identification strategies in such settings: Difference-in-Differences (DiD).
Specifically, I employ principal stratification analysis to show that the
conventional ATT estimand may not be well defined, and the DiD estimand cannot
be interpreted causally without additional assumptions. To address these
issues, I develop an identification strategy to partially identify causal
effects on the subset of units with well-defined and observed outcomes under
both treatment regimes. I adapt Lee bounds to the Changes-in-Changes (CiC)
setting (Athey & Imbens, 2006), leveraging the time dimension of the data to
relax the unconfoundedness assumption in the original trimming strategy of Lee
(2009). This setting has the DiD identification strategy as a particular case,
which I also implement in the paper. Additionally, I explore how to leverage
multiple sources of sample selection to relax the monotonicity assumption in
Lee (2009), which may be of independent interest. Alongside the identification
strategy, I present estimators and inference results. I illustrate the
relevance of the proposed methodology by analyzing a job training program in
Colombia.

arXiv link: http://arxiv.org/abs/2502.08614v1

Econometrics arXiv updated paper (originally submitted: 2025-02-12)

Scenario Analysis with Multivariate Bayesian Machine Learning Models

Authors: Michael Pfarrhofer, Anna Stelzer

We present an econometric framework that adapts tools for scenario analysis,
such as variants of conditional forecasts and generalized impulse responses,
for use with dynamic nonparametric models. The proposed algorithms are based on
predictive simulation and sequential Monte Carlo methods. Their utility is
demonstrated with three applications: (1) conditional forecasts based on stress
test scenarios, measuring (2) macroeconomic risk under varying financial
stress, and estimating the (3) asymmetric effects of financial shocks in the US
and their international spillovers. Our empirical results indicate the
importance of nonlinearities and asymmetries in relationships between
macroeconomic and financial variables.

arXiv link: http://arxiv.org/abs/2502.08440v3

Econometrics arXiv paper, submitted: 2025-02-12

Inference in dynamic models for panel data using the moving block bootstrap

Authors: Ayden Higgins, Koen Jochmans

Inference in linear panel data models is complicated by the presence of fixed
effects when (some of) the regressors are not strictly exogenous. Under
asymptotics where the number of cross-sectional observations and time periods
grow at the same rate, the within-group estimator is consistent but its limit
distribution features a bias term. In this paper we show that a panel version
of the moving block bootstrap, where blocks of adjacent cross-sections are
resampled with replacement, replicates the limit distribution of the
within-group estimator. Confidence ellipsoids and hypothesis tests based on the
reverse-percentile bootstrap are thus asymptotically valid without the need to
take the presence of bias into account.
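
A bare-bones version of the resampling scheme is sketched below: blocks of
adjacent cross-sections (consecutive time periods) are drawn with
replacement and the within-group estimator is recomputed on each bootstrap
panel. The static design, block length, and number of replications are
illustrative.

```python
# Panel moving block bootstrap over the time dimension for the within-group
# estimator, with serially correlated errors.
import numpy as np

rng = np.random.default_rng(5)
N, T, beta, block_len, B = 100, 60, 1.0, 5, 499

alpha = rng.normal(size=(N, 1))
x = rng.normal(size=(N, T))
u = np.empty((N, T))
u[:, 0] = rng.normal(size=N)
for t in range(1, T):                        # AR(1) errors within units
    u[:, t] = 0.5 * u[:, t - 1] + rng.normal(size=N)
y = alpha + beta * x + u

def within_group(y, x):
    yd = y - y.mean(axis=1, keepdims=True)   # demean within units
    xd = x - x.mean(axis=1, keepdims=True)
    return (xd * yd).sum() / (xd ** 2).sum()

beta_hat = within_group(y, x)

boot = np.empty(B)
n_blocks = int(np.ceil(T / block_len))
starts_all = np.arange(T - block_len + 1)
for b in range(B):
    starts = rng.choice(starts_all, size=n_blocks, replace=True)
    idx = np.concatenate([np.arange(s, s + block_len) for s in starts])[:T]
    boot[b] = within_group(y[:, idx], x[:, idx])

print(f"within-group beta_hat = {beta_hat:.3f}, MBB s.e. = {boot.std(ddof=1):.3f}")
```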

arXiv link: http://arxiv.org/abs/2502.08311v1

Econometrics arXiv cross-link from General Economics (econ.GN), submitted: 2025-02-11

Are Princelings Truly Busted? Evaluating Transaction Discounts in China's Land Market

Authors: Julia Manso

This paper narrowly replicates Chen and Kung's 2019 paper (The Quarterly
Journal of Economics 134(1): 185-226). Inspecting the data reveals that
nearly one-third of the transactions (388,903 out of 1,208,621) are perfect
duplicates of other rows, excluding the transaction number. Replicating the
analysis on the data sans-duplicates yields a slightly smaller but still
statistically significant princeling effect, robust across the regression
results. Further analysis also reveals that coefficients interpreted as the
effect of logarithm of area actually reflect the effect of scaled values of
area; this paper also reinterprets and contextualizes these results in light of
the true scaled values.

arXiv link: http://arxiv.org/abs/2502.07692v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-02-10

Comment on "Generic machine learning inference on heterogeneous treatment effects in randomized experiments."

Authors: Kosuke Imai, Michael Lingzhi Li

We analyze the split-sample robust inference (SSRI) methodology proposed by
Chernozhukov, Demirer, Duflo, and Fernandez-Val (CDDF) for quantifying
uncertainty in heterogeneous treatment effect estimation. While SSRI
effectively accounts for randomness in data splitting, its computational cost
can be prohibitive when combined with complex machine learning (ML) models. We
present an alternative randomization inference (RI) approach that maintains
SSRI's generality without requiring repeated data splitting. By leveraging
cross-fitting and design-based inference, RI achieves valid confidence
intervals while significantly reducing computational burden. We compare the two
methods through simulation, demonstrating that RI retains statistical
efficiency while being more practical for large-scale applications.

arXiv link: http://arxiv.org/abs/2502.06758v1

Econometrics arXiv paper, submitted: 2025-02-10

Grouped fixed effects regularization for binary choice models

Authors: Claudia Pigini, Alessandro Pionati, Francesco Valentini

We study the application of the Grouped Fixed Effects (GFE) estimator
(Bonhomme et al., ECMTA 90(2):625-643, 2022) to binary choice models for
network and panel data. This approach discretizes unobserved heterogeneity via
k-means clustering and performs maximum likelihood estimation, reducing the
number of fixed effects in finite samples. This regularization helps analyze
small/sparse networks and rare events by mitigating complete separation, which
can lead to data loss. We focus on dynamic models with few state transitions
and network formation models for sparse networks. The effectiveness of this
method is demonstrated through simulations and real data applications.
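
A one-pass simplification of the idea is sketched below: cluster units by
k-means on unit-level summaries, then fit a logit with group dummies in
place of unit fixed effects. The full GFE estimator iterates between
classification and likelihood maximization; the summaries and number of
groups used here are ad hoc.

```python
# Grouped-fixed-effects flavored logit: k-means on unit summaries, then a
# logit with G group dummies instead of N unit dummies.
import numpy as np
import statsmodels.api as sm
from sklearn.cluster import KMeans

rng = np.random.default_rng(6)
N, T, G, beta = 300, 10, 3, 1.0
alpha = rng.choice([-1.5, 0.0, 1.5], size=N)          # latent grouped heterogeneity
x = rng.normal(size=(N, T))
p = 1 / (1 + np.exp(-(alpha[:, None] + beta * x)))
y = rng.binomial(1, p)

# Step 1: discretize heterogeneity with k-means on unit-level summaries.
summaries = np.column_stack([y.mean(axis=1), x.mean(axis=1)])
groups = KMeans(n_clusters=G, n_init=10, random_state=0).fit_predict(summaries)

# Step 2: logit with group dummies in place of unit fixed effects.
D = np.eye(G)[np.repeat(groups, T)]                   # group dummies, long format
X = np.column_stack([x.reshape(-1), D])
fit = sm.Logit(y.reshape(-1), X).fit(disp=0)
print("beta_hat:", round(fit.params[0], 3))
```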

arXiv link: http://arxiv.org/abs/2502.06446v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2025-02-10

Dynamic Pricing with Adversarially-Censored Demands

Authors: Jianyu Xu, Yining Wang, Xi Chen, Yu-Xiang Wang

We study an online dynamic pricing problem where the potential demand at each
time period $t=1,2,\ldots, T$ is stochastic and dependent on the price.
However, a perishable inventory is imposed at the beginning of each time $t$,
censoring the potential demand if it exceeds the inventory level. To address
this problem, we introduce a pricing algorithm based on the optimistic
estimates of derivatives. We show that our algorithm achieves
$\tilde O(\sqrt{T})$ optimal regret even with adversarial inventory series.
Our findings advance the state-of-the-art in online decision-making problems
with censored feedback, offering a theoretically optimal solution against
adversarial observations.

arXiv link: http://arxiv.org/abs/2502.06168v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2025-02-08

Global Ease of Living Index: a machine learning framework for longitudinal analysis of major economies

Authors: Tanay Panat, Rohitash Chandra

The drastic changes in the global economy, geopolitical conditions, and
disruptions such as the COVID-19 pandemic have impacted the cost of living and
quality of life. It is important to understand the long-term nature of the cost
of living and quality of life in major economies. A transparent and
comprehensive living index must include multiple dimensions of living
conditions. In this study, we present an approach to quantifying the quality of
life through the Global Ease of Living Index that combines various
socio-economic and infrastructural factors into a single composite score. Our
index utilises economic indicators that define living standards, which could
help in targeted interventions to improve specific areas. We present a machine
learning framework for addressing the problem of missing data for some of the
economic indicators for specific countries. We then curate and update the data
and use a dimensionality reduction approach (principal component analysis) to
create the Ease of Living Index for major economies since 1970. Our work
significantly adds to the literature by offering a practical tool for
policymakers to identify areas needing improvement, such as healthcare systems,
employment opportunities, and public safety. Our approach with open data and
code can be easily reproduced and applied to various contexts. This
transparency and accessibility make our work a valuable resource for ongoing
research and policy development in quality-of-life assessment.

arXiv link: http://arxiv.org/abs/2502.06866v2

Econometrics arXiv paper, submitted: 2025-02-07

Point-Identifying Semiparametric Sample Selection Models with No Excluded Variable

Authors: Dongwoo Kim, Young Jun Lee

Sample selection is pervasive in applied economic studies. This paper
develops semiparametric selection models that achieve point identification
without relying on exclusion restrictions, an assumption long believed
necessary for identification in semiparametric selection models. Our
identification conditions require at least one continuously distributed
covariate and certain nonlinearity in the selection process. We propose a
two-step plug-in estimator that is root-n-consistent, asymptotically normal,
and computationally straightforward (readily available in statistical
software), allowing for heteroskedasticity. Our approach provides a middle
ground between Lee (2009)'s nonparametric bounds and Honoré and Hu (2020)'s
linear selection bounds, while ensuring point identification. Simulation
evidence confirms its excellent finite-sample performance. We apply our method
to estimate the racial and gender wage disparity using data from the US Current
Population Survey. Our estimates tend to lie outside the Honoré and Hu
bounds.

arXiv link: http://arxiv.org/abs/2502.05353v1

Econometrics arXiv paper, submitted: 2025-02-07

Estimating Parameters of Structural Models Using Neural Networks

Authors: Yanhao Wei, Zhenling Jiang

We study an alternative use of machine learning. We train neural nets to
provide the parameter estimate of a given (structural) econometric model, for
example, discrete choice or consumer search. Training examples consist of
datasets generated by the econometric model under a range of parameter values.
The neural net takes the moments of a dataset as input and tries to recognize
the parameter value underlying that dataset. Besides the point estimate, the
neural net can also output statistical accuracy. This neural net estimator
(NNE) tends to the limited-information Bayesian posterior as the number of training
datasets increases. We apply NNE to a consumer search model. It gives more
accurate estimates at lighter computational costs than the prevailing approach.
NNE is also robust to redundant moment inputs. In general, NNE offers the most
benefits in applications where other estimation approaches require very heavy
simulation costs. We provide code at: https://nnehome.github.io.
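
A toy version of the mechanics (not the consumer search application) is
sketched below: simulate many datasets from a simple structural model,
summarize each by a few moments, and train a neural net to map moments to
the parameter. The probit model, moment choice, and network size are
placeholders.

```python
# Toy neural net estimator: learn the map from dataset moments to the
# parameter of a one-parameter probit model.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(7)
n_obs, n_train = 2000, 3000

def simulate(theta):
    """One dataset from a toy probit model, summarized by three moments."""
    x = rng.normal(size=n_obs)
    y = (theta * x + rng.normal(size=n_obs) > 0).astype(float)
    return np.array([np.mean(x * y), y[x > 0].mean(), y[x < 0].mean()])

thetas = rng.uniform(0.0, 2.0, size=n_train)           # training labels
moments = np.array([simulate(t) for t in thetas])      # training inputs

net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                   random_state=0).fit(moments, thetas)

theta_hat = net.predict(simulate(1.2).reshape(1, -1))[0]   # estimate for a new dataset
print("theta_hat for data generated at theta = 1.2:", round(theta_hat, 2))
```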

arXiv link: http://arxiv.org/abs/2502.04945v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2025-02-07

A sliced Wasserstein and diffusion approach to random coefficient models

Authors: Keunwoo Lim, Ting Ye, Fang Han

We propose a new minimum-distance estimator for linear random coefficient
models. This estimator integrates the recently advanced sliced Wasserstein
distance with the nearest neighbor methods, both of which enhance computational
efficiency. We demonstrate that the proposed method is consistent in
approximating the true distribution. Moreover, our formulation naturally leads
to a diffusion process-based algorithm and is closely connected to treatment
effect distribution estimation -- both of which are of independent interest and
hold promise for broader applications.
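
For readers unfamiliar with the distance itself, the sketch below computes a
Monte Carlo sliced Wasserstein distance between two samples via random
one-dimensional projections. In the paper this distance is minimized over
the random-coefficient distribution, a step not shown here.

```python
# Monte Carlo sliced p-Wasserstein distance between two equal-size samples:
# project onto random directions and match sorted projections.
import numpy as np

def sliced_wasserstein(X, Y, n_proj=200, p=2, rng=None):
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)           # random direction on the sphere
        x_proj, y_proj = np.sort(X @ theta), np.sort(Y @ theta)
        total += np.mean(np.abs(x_proj - y_proj) ** p)
    return (total / n_proj) ** (1 / p)

rng = np.random.default_rng(8)
X = rng.normal(size=(1000, 2))
Y = rng.normal(loc=0.5, size=(1000, 2))
print("SW2 distance:", round(sliced_wasserstein(X, Y, rng=0), 3))  # ~0.5 for this shift
```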

arXiv link: http://arxiv.org/abs/2502.04654v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-02-06

Estimation of large approximate dynamic matrix factor models based on the EM algorithm and Kalman filtering

Authors: Matteo Barigozzi, Luca Trapin

This paper considers an approximate dynamic matrix factor model that accounts
for the time series nature of the data by explicitly modelling the time
evolution of the factors. We study estimation of the model parameters based on
the Expectation Maximization (EM) algorithm, implemented jointly with the
Kalman smoother which gives estimates of the factors. We establish the
consistency of the estimated loadings and factor matrices as the sample size
$T$ and the matrix dimensions $p_1$ and $p_2$ diverge to infinity. We then
illustrate two immediate extensions of this approach to: (a) the case of
arbitrary patterns of missing data and (b) the presence of common stochastic
trends. The finite sample properties of the estimators are assessed through a
large simulation study and two applications on: (i) a financial dataset of
volatility proxies and (ii) a macroeconomic dataset covering the main euro area
countries.

arXiv link: http://arxiv.org/abs/2502.04112v2

Econometrics arXiv paper, submitted: 2025-02-06

Combining Clusters for the Approximate Randomization Test

Authors: Chun Pong Lau

This paper develops procedures to combine clusters for the approximate
randomization test proposed by Canay, Romano, and Shaikh (2017). Their test can
be used to conduct inference with a small number of clusters and imposes weak
requirements on the correlation structure. However, their test requires the
target parameter to be identified within each cluster. A leading example where
this requirement fails to hold is when a variable has no variation within
clusters. For instance, this happens in difference-in-differences designs
because the treatment variable equals zero in the control clusters. Under this
scenario, combining control and treated clusters can solve the identification
problem, and the test remains valid. However, there is an arbitrariness in how
the clusters are combined. In this paper, I develop computationally efficient
procedures to combine clusters when this identification requirement does not
hold. Clusters are combined to maximize local asymptotic power. The simulation
study and empirical application show that the procedures to combine clusters
perform well in various settings.
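
For context, the sketch below implements a bare-bones version of the
approximate randomization test that the paper builds on, applying sign
changes to cluster-level estimates. The paper's procedures for choosing how
to combine clusters are not implemented, and the example data are simulated.

```python
# Approximate randomization test with sign changes on cluster-level estimates.
import numpy as np
from itertools import product

def cluster_sign_test(cluster_estimates, theta0=0.0):
    s = np.asarray(cluster_estimates) - theta0
    q = len(s)
    t_obs = abs(s.mean()) / (s.std(ddof=1) / np.sqrt(q))
    count = 0
    for signs in product([-1.0, 1.0], repeat=q):       # all 2^q sign changes
        sb = s * np.array(signs)
        t_b = abs(sb.mean()) / (sb.std(ddof=1) / np.sqrt(q))
        count += (t_b >= t_obs)
    return count / 2 ** q                               # randomization p-value

# Example: estimates of the same parameter from 8 clusters.
rng = np.random.default_rng(9)
est = 0.4 + rng.normal(scale=0.3, size=8)
print("p-value for H0: theta = 0:", round(cluster_sign_test(est, 0.0), 4))
```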

arXiv link: http://arxiv.org/abs/2502.03865v1

Econometrics arXiv paper, submitted: 2025-02-06

Misspecification-Robust Shrinkage and Selection for VAR Forecasts and IRFs

Authors: Oriol González-Casasús, Frank Schorfheide

VARs are often estimated with Bayesian techniques to cope with model
dimensionality. The posterior means define a class of shrinkage estimators,
indexed by hyperparameters that determine the relative weight on maximum
likelihood estimates and prior means. In a Bayesian setting, it is natural to
choose these hyperparameters by maximizing the marginal data density. However,
this is undesirable if the VAR is misspecified. In this paper, we derive
asymptotically unbiased estimates of the multi-step forecasting risk and the
impulse response estimation risk to determine hyperparameters in settings where
the VAR is (potentially) misspecified. The proposed criteria can be used to
jointly select the optimal shrinkage hyperparameter, VAR lag length, and to
choose among different types of multi-step-ahead predictors; or among IRF
estimates based on VARs and local projections. The selection approach is
illustrated in a Monte Carlo study and an empirical application.

arXiv link: http://arxiv.org/abs/2502.03693v1

Econometrics arXiv paper, submitted: 2025-02-05

Type 2 Tobit Sample Selection Models with Bayesian Additive Regression Trees

Authors: Eoghan O'Neill

This paper introduces Type 2 Tobit Bayesian Additive Regression Trees
(TOBART-2). BART can produce accurate individual-specific treatment effect
estimates. However, in practice estimates are often biased by sample selection.
We extend the Type 2 Tobit sample selection model to account for nonlinearities
and model uncertainty by including sums of trees in both the selection and
outcome equations. A Dirichlet Process Mixture distribution for the error terms
allows for departure from the assumption of bivariate normally distributed
errors. Soft trees and a Dirichlet prior on splitting probabilities improve
modeling of smooth and sparse data generating processes. We include a
simulation study and an application to the RAND Health Insurance Experiment
data set.

arXiv link: http://arxiv.org/abs/2502.03600v1

Econometrics arXiv updated paper (originally submitted: 2025-02-05)

Wald inference on varying coefficients

Authors: Abhimanyu Gupta, Xi Qu, Sorawoot Srisuma, Jiajun Zhang

We present simple-to-implement Wald-type statistics that deliver a general
nonparametric inference theory for linear restrictions on varying coefficients
in a range of regression models allowing for cross-sectional or spatial
dependence. We provide a general central limit theorem that covers a broad
range of error spatial dependence structures, allows for a degree of
misspecification robustness via nonparametric spatial weights and permits
inference on both varying regression and spatial dependence parameters. Using
our method, we first uncover evidence of constant returns to scale in the
Chinese nonmetal mineral industry's production function, and then show that
Boston house prices respond nonlinearly to proximity to employment centers. A
simulation study confirms that our tests perform very well in finite samples.

arXiv link: http://arxiv.org/abs/2502.03084v2

Econometrics arXiv updated paper (originally submitted: 2025-02-05)

Panel Data Estimation and Inference: Homogeneity versus Heterogeneity

Authors: Jiti Gao, Fei Liu, Bin Peng, Yayi Yan

In this paper, we define an underlying data generating process that allows
for different magnitudes of cross-sectional dependence, along with time series
autocorrelation. This is achieved via high-dimensional moving average processes
of infinite order (HDMA($\infty$)). Our setup and investigation integrates and
enhances homogeneous and heterogeneous panel data estimation and testing in a
unified way. To study HDMA($\infty$), we extend the Beveridge-Nelson
decomposition to a high-dimensional time series setting, and derive a complete
toolkit. We examine homogeneity versus heterogeneity using Gaussian
approximation, a prevalent technique for establishing uniform inference. For
post-testing inference, we derive central limit theorems through Edgeworth
expansions for both homogeneous and heterogeneous settings. Additionally, we
showcase the practical relevance of the established asymptotic theory by (1)
connecting our results with the literature on grouping structure analysis, (2)
examining a nonstationary panel data generating process, and (3) revisiting
the common correlated effects (CCE) estimators. Finally, we verify our
theoretical findings via extensive numerical studies using both simulated and
real datasets.

arXiv link: http://arxiv.org/abs/2502.03019v2

Econometrics arXiv paper, submitted: 2025-02-04

Kotlarski's lemma for dyadic models

Authors: Grigory Franguridi, Hyungsik Roger Moon

We show how to identify the distributions of the error components in the
two-way dyadic model $y_{ij}=c+\alpha_i+\eta_j+\varepsilon_{ij}$. To this end,
we extend the lemma of Kotlarski (1967), mimicking the arguments of Evdokimov
and White (2012). We allow the characteristic functions of the error components
to have real zeros, as long as they do not overlap with zeros of their first
derivatives.

arXiv link: http://arxiv.org/abs/2502.02734v1

Econometrics arXiv updated paper (originally submitted: 2025-02-04)

Improving volatility forecasts of the Nikkei 225 stock index using a realized EGARCH model with realized and realized range-based volatilities

Authors: Yaming Chang

This paper applies the realized exponential generalized autoregressive
conditional heteroskedasticity (REGARCH) model to analyze the Nikkei 225 index
from 2010 to 2017, utilizing realized variance (RV) and realized range-based
volatility (RRV) as high-frequency measures of volatility. The findings show
that REGARCH models outperform standard GARCH family models in both in-sample
fitting and out-of-sample forecasting, driven by the dynamic information
embedded in high-frequency realized measures. Incorporating multiple realized
measures within a joint REGARCH framework further enhances model performance.
Notably, RRV demonstrates superior predictive power compared to RV, as
evidenced by improvements in forecast accuracy metrics. Moreover, the
forecasting results remain robust under both rolling-window and recursive
evaluation schemes.

arXiv link: http://arxiv.org/abs/2502.02695v2

Econometrics arXiv cross-link from math.OC (math.OC), submitted: 2025-02-04

Loss Functions for Inventory Control

Authors: Steven R. Pauly

In this paper, we provide analytic expressions for the first-order loss
function, the complementary loss function and the second-order loss function
for several probability distributions. These loss functions are important
functions in inventory optimization and other quantitative fields. For several
reasons, which will become apparent throughout this paper, it is preferable to
implement these loss functions with analytic expressions that use only standard
probability functions. However, complete and consistent references of analytic
expressions for these loss functions are lacking in the literature. This
paper aims to close this gap and can serve as a reference for researchers,
software engineers, and practitioners concerned with the optimization of a
quantitative system. This should make it straightforward to use different
probability distributions in quantitative models, which is at the core of
optimization. Also, this paper serves as a broad introduction to loss functions
and their use in inventory control.
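
As a concrete example of the kind of expression the paper catalogues, the
sketch below evaluates the first-order loss function for normally
distributed demand with the standard closed form and checks it against
numerical integration; the parameter values are arbitrary.

```python
# First-order loss function E[(D - x)^+] for normal demand D ~ N(mu, sigma^2):
# closed form sigma * (phi(z) - z * (1 - Phi(z))) with z = (x - mu) / sigma.
import numpy as np
from scipy import stats
from scipy.integrate import quad

def normal_first_order_loss(x, mu, sigma):
    z = (x - mu) / sigma
    return sigma * (stats.norm.pdf(z) - z * stats.norm.sf(z))

mu, sigma, x = 100.0, 15.0, 110.0
analytic = normal_first_order_loss(x, mu, sigma)
numeric, _ = quad(lambda d: (d - x) * stats.norm.pdf(d, mu, sigma), x, np.inf)
print(round(analytic, 6), round(numeric, 6))   # the two values agree
```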

arXiv link: http://arxiv.org/abs/2502.05212v1

Econometrics arXiv cross-link from cs.SI (cs.SI), submitted: 2025-02-03

Estimating Network Models using Neural Networks

Authors: Angelo Mele

Exponential random graph models (ERGMs) are very flexible for modeling
network formation but pose difficult estimation challenges due to their
intractable normalizing constant. Existing methods, such as MCMC-MLE, rely on
sequential simulation at every optimization step. We propose a neural network
approach that trains on a single, large set of parameter-simulation pairs to
learn the mapping from parameters to average network statistics. Once trained,
this map can be inverted, yielding a fast and parallelizable estimation method.
The procedure also accommodates extra network statistics to mitigate model
misspecification. Some simple illustrative examples show that the method
performs well in practice.

arXiv link: http://arxiv.org/abs/2502.01810v1

Econometrics arXiv updated paper (originally submitted: 2025-02-03)

Comment on "Sequential validation of treatment heterogeneity" and "Comment on generic machine learning inference on heterogeneous treatment effects in randomized experiments"

Authors: Victor Chernozhukov, Mert Demirer, Esther Duflo, Iván Fernández-Val

We warmly thank Kosuke Imai, Michael Lingzhi Li, and Stefan Wager for their
gracious and insightful comments. We are particularly encouraged that both
pieces recognize the importance of the research agenda the lecture laid out,
which we see as critical for applied researchers. It is also great to see that
both underscore the potential of the basic approach we propose - targeting
summary features of the CATE after proxy estimation with sample splitting. We
are also happy that both papers push us (and the reader) to continue thinking
about the inference problem associated with sample splitting. We recognize that
our current paper is only scratching the surface of this interesting agenda.
Our proposal is certainly not the only option, and it is exciting that both
papers provide and assess alternatives. Hopefully, this will generate even more
work in this area.

arXiv link: http://arxiv.org/abs/2502.01548v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2025-02-03

Can We Validate Counterfactual Estimations in the Presence of General Network Interference?

Authors: Sadegh Shirani, Yuwei Luo, William Overman, Ruoxuan Xiong, Mohsen Bayati

In experimental settings with network interference, a unit's treatment can
influence outcomes of other units, challenging both causal effect estimation
and its validation. Classic validation approaches fail as outcomes are only
observable under one treatment scenario and exhibit complex correlation
patterns due to interference. To address these challenges, we introduce a new
framework enabling cross-validation for counterfactual estimation. At its core
is our distribution-preserving network bootstrap method -- a
theoretically-grounded approach inspired by approximate message passing. This
method creates multiple subpopulations while preserving the underlying
distribution of network effects. We extend recent causal message-passing
developments by incorporating heterogeneous unit-level characteristics and
varying local interactions, ensuring reliable finite-sample performance through
non-asymptotic analysis. We also develop and publicly release a comprehensive
benchmark toolbox with diverse experimental environments, from networks of
interacting AI agents to opinion formation in real-world communities and
ride-sharing applications. These environments provide known ground truth values
while maintaining realistic complexities, enabling systematic examination of
causal inference methods. Extensive evaluation across these environments
demonstrates our method's robustness to diverse forms of network interference.
Our work provides researchers with both a practical estimation framework and a
standardized platform for testing future methodological developments.

arXiv link: http://arxiv.org/abs/2502.01106v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-02-02

Online Generalized Method of Moments for Time Series

Authors: Man Fung Leung, Kin Wai Chan, Xiaofeng Shao

Online learning has gained popularity in recent years due to the urgent need
to analyse large-scale streaming data, which can be collected in perpetuity and
serially dependent. This motivates us to develop the online generalized method
of moments (OGMM), an explicitly updated estimation and inference framework in
the time series setting. The OGMM inherits many properties of offline GMM, such
as its broad applicability to many problems in econometrics and statistics,
natural accommodation for over-identification, and achievement of
semiparametric efficiency under temporal dependence. As an online method, the
key gain relative to offline GMM is the vast improvement in time complexity and
memory requirement.
Building on the OGMM framework, we propose improved versions of online
Sargan--Hansen and structural stability tests following recent work in
econometrics and statistics. Through Monte Carlo simulations, we observe
encouraging finite-sample performance in online instrumental variables
regression, online over-identifying restrictions test, online quantile
regression, and online anomaly detection. Interesting applications of OGMM to
stochastic volatility modelling and inertial sensor calibration are presented
to demonstrate the effectiveness of OGMM.

arXiv link: http://arxiv.org/abs/2502.00751v1
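
To convey the online flavour in code, here is a rough Python sketch of an
explicitly updated, over-identified linear IV-GMM estimator with an identity
weight matrix, maintained from running sums of moments. It is meant only to
illustrate streaming estimation from serially arriving data; it is not the
paper's OGMM recursion, weighting scheme, or inference procedure.

import numpy as np

class OnlineLinearIVGMM:
    def __init__(self, dim_x, dim_z):
        self.Szx = np.zeros((dim_z, dim_x))   # running sum of z_t x_t'
        self.szy = np.zeros(dim_z)            # running sum of z_t y_t
        self.n = 0

    def update(self, x_t, z_t, y_t):
        self.Szx += np.outer(z_t, x_t)
        self.szy += z_t * y_t
        self.n += 1

    def estimate(self):
        # GMM with identity weight on the averaged moments (Z'y - Z'X b) / n.
        G = self.Szx / self.n
        g = self.szy / self.n
        return np.linalg.solve(G.T @ G, G.T @ g)

# Synthetic stream: one endogenous regressor, two instruments, true slope 2.0.
rng = np.random.default_rng(1)
est = OnlineLinearIVGMM(dim_x=1, dim_z=2)
for _ in range(10_000):
    z = rng.normal(size=2)
    u = rng.normal()
    x = np.array([z[0] + 0.5 * z[1] + 0.8 * u])
    y = 2.0 * x[0] + u
    est.update(x, z, y)
print(est.estimate())    # close to 2.0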

Econometrics arXiv paper, submitted: 2025-02-01

Serial-Dependence and Persistence Robust Inference in Predictive Regressions

Authors: Jean-Yves Pitarakis

This paper introduces a new method for testing the statistical significance
of estimated parameters in predictive regressions. The approach features a new
family of test statistics that are robust to the degree of persistence of the
predictors. Importantly, the method accounts for serial correlation and
conditional heteroskedasticity without requiring any corrections or
adjustments. This is achieved through a mechanism embedded within the test
statistics that effectively decouples serial dependence present in the data.
The limiting null distributions of these test statistics are shown to follow a
chi-square distribution, and their asymptotic power under local alternatives is
derived. A comprehensive set of simulation experiments illustrates their finite
sample size and power properties.

arXiv link: http://arxiv.org/abs/2502.00475v1

Econometrics arXiv paper, submitted: 2025-02-01

Confidence intervals for intentionally biased estimators

Authors: David M. Kaplan, Xin Liu

We propose and study three confidence intervals (CIs) centered at an
estimator that is intentionally biased to reduce mean squared error. The first
CI simply uses an unbiased estimator's standard error; compared to centering at
the unbiased estimator, this CI has higher coverage probability for confidence
levels above 91.7%, even if the biased and unbiased estimators have equal mean
squared error. The second CI trades some of this "excess" coverage for shorter
length. The third CI is centered at a convex combination of the two estimators
to further reduce length. Practically, these CIs apply broadly and are simple
to compute.

arXiv link: http://arxiv.org/abs/2502.00450v1
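
For intuition, the short Python sketch below implements the first construction
as described: centre the interval at an intentionally biased estimator (here
an arbitrary shrinkage toward zero, chosen purely for illustration) while
using the unbiased estimator's standard error for the width. The second and
third constructions are not reproduced.

import numpy as np
from scipy.stats import norm

def ci_biased_center(theta_unbiased, se_unbiased, shrink=0.8, level=0.95):
    # Intentionally biased (lower-variance) centre, unbiased estimator's SE.
    theta_biased = shrink * theta_unbiased
    z = norm.ppf(0.5 + level / 2)
    return theta_biased - z * se_unbiased, theta_biased + z * se_unbiased

print(ci_biased_center(theta_unbiased=1.2, se_unbiased=0.3))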

Econometrics arXiv updated paper (originally submitted: 2025-01-31)

Fixed-Population Causal Inference for Models of Equilibrium

Authors: Konrad Menzel

In contrast to problems of interference in (exogenous) treatments, models of
interference in unit-specific (endogenous) outcomes do not usually produce a
reduced-form representation where outcomes depend on other units' treatment
status only at a short network distance, or only through a known exposure
mapping. This remains true even if the structural mechanism depends on outcomes of
peers only at a short network distance, or through a known exposure mapping. In
this paper, we first define causal estimands that are identified and estimable
from a single experiment on the network under minimal assumptions on the
structure of interference, and which represent average partial causal responses
which generally vary with other global features of the realized assignment.
Under a fixed-population, design-based approach, we show unbiasedness and
consistency for inverse-probability weighting (IPW) estimators for those causal
parameters from a randomized experiment on a single network. We also analyze
more closely the case of marginal interventions in a model of equilibrium with
smooth response functions where we can recover LATE-type weighted averages of
derivatives of those response functions. Under additional structural
assumptions, these "agnostic" causal estimands can be combined to recover
model parameters, but also retain their less restrictive causal interpretation.

arXiv link: http://arxiv.org/abs/2501.19394v4

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2025-01-31

PUATE: Efficient Average Treatment Effect Estimation from Treated (Positive) and Unlabeled Units

Authors: Masahiro Kato, Fumiaki Kozai, Ryo Inokuchi

The estimation of average treatment effects (ATEs), defined as the difference
in expected outcomes between treatment and control groups, is a central topic
in causal inference. This study develops semiparametric efficient estimators
for ATE in a setting where only a treatment group and an unlabeled group,
consisting of units whose treatment status is unknown, are observed. This
scenario constitutes a variant of learning from positive and unlabeled data (PU
learning) and can be viewed as a special case of ATE estimation with missing
data. For this setting, we derive the semiparametric efficiency bounds, which
characterize the lowest achievable asymptotic variance for regular estimators.
We then construct semiparametric efficient ATE estimators that attain these
bounds. Our results contribute to the literature on causal inference with
missing data and weakly supervised learning.

arXiv link: http://arxiv.org/abs/2501.19345v2

Econometrics arXiv paper, submitted: 2025-01-31

Untestability of Average Slutsky Symmetry

Authors: Haruki Kono

Slutsky symmetry and negative semidefiniteness are necessary and sufficient
conditions for the rationality of demand functions. While the empirical
implications of Slutsky negative semidefiniteness in repeated cross-sectional
demand data are well understood, the empirical content of Slutsky symmetry
remains largely unexplored. This paper takes an important first step toward
addressing this gap. We demonstrate that the average Slutsky matrix is not
identified and that its identified set always contains a symmetric matrix. A
key implication of our findings is that the symmetry of the average Slutsky
matrix is untestable, and consequently, individual Slutsky symmetry cannot be
tested using the average Slutsky matrix.

arXiv link: http://arxiv.org/abs/2501.18923v1

Econometrics arXiv updated paper (originally submitted: 2025-01-30)

Model-Adaptive Approach to Dynamic Discrete Choice Models with Large State Spaces

Authors: Ertian Chen

Estimation and counterfactual experiments in dynamic discrete choice models
with large state spaces pose computational difficulties. This paper develops a
novel model-adaptive approach to solve the linear system of fixed point
equations of the policy valuation operator. We propose a model-adaptive sieve
space, constructed by iteratively augmenting the space with the residual from
the previous iteration. We show both theoretically and numerically that
model-adaptive sieves dramatically improve performance. In particular, the
approximation error decays at a superlinear rate in the sieve dimension, unlike
a linear rate achieved using conventional methods. Our method works for both
conditional choice probability estimators and full-solution estimators with
policy iteration. We apply the method to analyze consumer demand for laundry
detergent using Kantar's Worldpanel Take Home data. On average, our method is
51.5% faster than conventional methods in solving the dynamic programming
problem, making the Bayesian MCMC estimator computationally feasible.

arXiv link: http://arxiv.org/abs/2501.18746v2

Econometrics arXiv paper, submitted: 2025-01-30

IV Estimation of Heterogeneous Spatial Dynamic Panel Models with Interactive Effects

Authors: Jia Chen, Guowei Cui, Vasilis Sarafidis, Takashi Yamagata

This paper develops a Mean Group Instrumental Variables (MGIV) estimator for
spatial dynamic panel data models with interactive effects, under large N and T
asymptotics. Unlike existing approaches that typically impose slope-parameter
homogeneity, MGIV accommodates cross-sectional heterogeneity in slope
coefficients. The proposed estimator is linear, making it computationally
efficient and robust. Furthermore, it avoids the incidental parameters problem,
enabling asymptotically valid inferences without requiring bias correction. The
Monte Carlo experiments indicate strong finite-sample performance of the MGIV
estimator across various sample sizes and parameter configurations. The
practical utility of the estimator is illustrated through an application to
regional economic growth in Europe. By explicitly incorporating heterogeneity,
our approach provides fresh insights into the determinants of regional growth,
underscoring the critical roles of spatial and temporal dependencies.

arXiv link: http://arxiv.org/abs/2501.18467v1
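
The mean-group step itself is simple to sketch. In the Python snippet below, a
generic unit-by-unit 2SLS regression stands in for the paper's spatial dynamic
specification with interactive effects; the point is only that unit-specific
IV slopes are averaged across the cross-section, so the code should be read as
illustrative rather than as the MGIV estimator.

import numpy as np

def unit_iv(y, X, Z):
    # Standard 2SLS for one cross-sectional unit.
    Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    return np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)

rng = np.random.default_rng(2)
unit_slopes = []
for i in range(50):                          # N = 50 units, heterogeneous slopes
    T = 200
    beta_i = 1.0 + rng.normal(0.0, 0.2)
    z = rng.normal(size=(T, 1))
    u = rng.normal(size=T)
    x = z[:, 0] + 0.5 * u
    y = beta_i * x + u
    unit_slopes.append(unit_iv(y, x.reshape(-1, 1), z))

mgiv = np.mean(np.stack(unit_slopes), axis=0)   # mean-group average of unit estimates
print(mgiv)                                     # close to the average slope, about 1.0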

Econometrics arXiv paper, submitted: 2025-01-29

Universal Inference for Incomplete Discrete Choice Models

Authors: Hiroaki Kaido, Yi Zhang

A growing number of empirical models exhibit set-valued predictions. This
paper develops a tractable inference method with finite-sample validity for
such models. The proposed procedure uses a robust version of the universal
inference framework by Wasserman et al. (2020) and avoids using moment
selection tuning parameters, resampling, or simulations. The method is designed
for constructing confidence intervals for counterfactual objects and other
functionals of the underlying parameter. It can be used in applications that
involve model incompleteness, discrete and continuous covariates, and
parameters containing nuisance components.

arXiv link: http://arxiv.org/abs/2501.17973v1

Econometrics arXiv updated paper (originally submitted: 2025-01-29)

Uniform Confidence Band for Marginal Treatment Effect Function

Authors: Toshiki Tsuda, Yanchun Jin, Ryo Okui

This paper presents a method for constructing uniform confidence bands for
the marginal treatment effect (MTE) function. The shape of the MTE function
offers insight into how the unobserved propensity to receive treatment is
related to the treatment effect. Our approach visualizes the statistical
uncertainty of an estimated function, facilitating inferences about the
function's shape. The proposed method is computationally inexpensive and
requires only minimal information: sample size, standard errors, kernel
function, and bandwidth. This minimal data requirement enables applications to
both new analyses and published results without access to original data. We
derive a Gaussian approximation for a local quadratic estimator and consider
the approximation of the distribution of its supremum in polynomial order.
Monte Carlo simulations demonstrate that our bands provide the desired coverage
and are less conservative than those based on the Gumbel approximation. An
empirical illustration regarding the returns to education is included.

arXiv link: http://arxiv.org/abs/2501.17455v2

Econometrics arXiv paper, submitted: 2025-01-28

Demand Analysis under Price Rigidity and Endogenous Assortment: An Application to China's Tobacco Industry

Authors: Hui Liu, Yao Luo

We observe nominal price rigidity in tobacco markets across China. The
monopolistic seller responds by adjusting product assortments, which remain
unobserved by the analyst. We develop and estimate a logit demand model that
incorporates assortment discrimination and nominal price rigidity. We find that
consumers are significantly more responsive to price changes than conventional
models predict. Simulated tax increases reveal that neglecting the role of
endogenous assortments results in underestimations of the decline in
higher-tier product sales, incorrect directional predictions of lower-tier
product sales, and overestimation of tax revenue by more than 50%. Finally, we
extend our methodology to settings with competition and random coefficient
models.

arXiv link: http://arxiv.org/abs/2501.17251v1

Econometrics arXiv cross-link from q-fin.TR (q-fin.TR), submitted: 2025-01-28

Why is the estimation of metaorder impact with public market data so challenging?

Authors: Manuel Naviglio, Giacomo Bormetti, Francesco Campigli, German Rodikov, Fabrizio Lillo

Estimating market impact and transaction costs of large trades (metaorders)
is a very important topic in finance. However, models of price and trade
based on public market data provide average price trajectories that are
qualitatively different from what is observed during real metaorder executions:
the price increases linearly, rather than in a concave way, during the
execution and the amount of reversion after its end is very limited. We claim
that this is a generic phenomenon due to the fact that even sophisticated
statistical models are unable to correctly describe the origin of the
autocorrelation of the order flow. We propose a modified Transient Impact Model
which provides more realistic trajectories by assuming that only a fraction of
the metaorder trading triggers market order flow. Interestingly, in our model
there is a critical condition on the kernels of the price and order flow
equations in which market impact becomes permanent.

arXiv link: http://arxiv.org/abs/2501.17096v1

Econometrics arXiv paper, submitted: 2025-01-28

Bayesian Analyses of Structural Vector Autoregressions with Sign, Zero, and Narrative Restrictions Using the R Package bsvarSIGNs

Authors: Xiaolei Wang, Tomasz Woźniak

The R package bsvarSIGNs implements state-of-the-art algorithms for the
Bayesian analysis of Structural Vector Autoregressions identified by sign,
zero, and narrative restrictions. It offers fast and efficient estimation
thanks to the deployment of frontier econometric and numerical techniques and
algorithms written in C++. The core model is based on a flexible Vector
Autoregression with estimated hyper-parameters of the Minnesota prior and the
dummy observation priors. The structural model can be identified by sign, zero,
and narrative restrictions, including a novel solution, making it possible to
use the three types of restrictions at once. The package facilitates predictive
and structural analyses using impulse responses, forecast error variance and
historical decompositions, forecasting and conditional forecasting, as well as
analyses of structural shocks and fitted values. All this is complemented by
colourful plots, user-friendly summary functions, and comprehensive
documentation. The package was granted the Di Cook Open-Source Statistical
Software Award by the Statistical Society of Australia in 2024.

arXiv link: http://arxiv.org/abs/2501.16711v1

Econometrics arXiv updated paper (originally submitted: 2025-01-27)

Copyright and Competition: Estimating Supply and Demand with Unstructured Data

Authors: Sukjin Han, Kyungho Lee

We study the competitive and welfare effects of copyright in creative
industries in the face of cost-reducing technologies such as generative
artificial intelligence. Creative products often feature unstructured
attributes (e.g., images and text) that are complex and high-dimensional. To
address this challenge, we study a stylized design product -- fonts -- using
data from the world's largest font marketplace. We construct neural network
embeddings to quantify unstructured attributes and measure visual similarity in
a manner consistent with human perception. Spatial regression and event-study
analyses demonstrate that competition is local in the visual characteristics
space. Building on this evidence, we develop a structural model of supply and
demand that incorporates embeddings and captures product positioning under
copyright-based similarity constraints. Our estimates reveal consumers'
heterogeneous design preferences and producers' cost-effective mimicry
advantages. Counterfactual analyses show that copyright protection can raise
consumer welfare by encouraging product relocation, and that the optimal policy
depends on the interaction between copyright and cost-reducing technologies.

arXiv link: http://arxiv.org/abs/2501.16120v2

Econometrics arXiv paper, submitted: 2025-01-27

Advancing Portfolio Optimization: Adaptive Minimum-Variance Portfolios and Minimum Risk Rate Frameworks

Authors: Ayush Jha, Abootaleb Shirvani, Ali Jaffri, Svetlozar T. Rachev, Frank J. Fabozzi

This study presents the Adaptive Minimum-Variance Portfolio (AMVP) framework
and the Adaptive Minimum-Risk Rate (AMRR) metric, innovative tools designed to
optimize portfolios dynamically in volatile and nonstationary financial
markets. Unlike traditional minimum-variance approaches, the AMVP framework
incorporates real-time adaptability through advanced econometric models,
including ARFIMA-FIGARCH processes and non-Gaussian innovations. Empirical
applications on cryptocurrency and equity markets demonstrate the proposed
framework's superior performance in risk reduction and portfolio stability,
particularly during periods of structural market breaks and heightened
volatility. The findings highlight the practical implications of using the AMVP
and AMRR methodologies to address modern investment challenges, offering
actionable insights for portfolio managers navigating uncertain and rapidly
changing market conditions.

arXiv link: http://arxiv.org/abs/2501.15793v1

Econometrics arXiv updated paper (originally submitted: 2025-01-27)

Universal Factor Models

Authors: Songnian Chen, Junlong Feng

We propose a new factor analysis framework and estimators of the factors and
loadings that are robust to weak factors in a large $N$ and large $T$ setting.
Our framework, by simultaneously considering all quantile levels of the outcome
variable, induces standard mean and quantile factor models, but the factors can
have an arbitrarily weak influence on the outcome's mean or quantile at most
quantile levels. Our method estimates the factor space at the $N$-rate
without requiring the knowledge of weak factors' presence or strength, and
achieves $N$- and $T$-asymptotic normality for the factors and
loadings based on a novel sample splitting approach that handles incidental
nuisance parameters. We also develop a weak-factor-robust estimator of the
number of factors and consistent selectors of factors of any tolerated level of
influence on the outcome's mean or quantiles. Monte Carlo simulations
demonstrate the effectiveness of our method.

arXiv link: http://arxiv.org/abs/2501.15761v2

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2025-01-27

Scale-Insensitive Neural Network Significance Tests

Authors: Hasan Fallahgoul

This paper develops a scale-insensitive framework for neural network
significance testing, substantially generalizing existing approaches through
three key innovations. First, we replace metric entropy calculations with
Rademacher complexity bounds, enabling the analysis of neural networks without
requiring bounded weights or specific architectural constraints. Second, we
weaken the regularity conditions on the target function to require only Sobolev
space membership $H^s([-1,1]^d)$ with $s > d/2$, significantly relaxing
previous smoothness assumptions while maintaining optimal approximation rates.
Third, we introduce a modified sieve space construction based on moment bounds
rather than weight constraints, providing a more natural theoretical framework
for modern deep learning practices. Our approach achieves these generalizations
while preserving optimal convergence rates and establishing valid asymptotic
distributions for test statistics. The technical foundation combines
localization theory, sharp concentration inequalities, and scale-insensitive
complexity measures to handle unbounded weights and general Lipschitz
activation functions. This framework better aligns theoretical guarantees with
contemporary deep learning practice while maintaining mathematical rigor.

arXiv link: http://arxiv.org/abs/2501.15753v3

Econometrics arXiv paper, submitted: 2025-01-26

Simple Inference on a Simplex-Valued Weight

Authors: Nathan Canen, Kyungchul Song

In many applications, the parameter of interest involves a simplex-valued
weight which is identified as a solution to an optimization problem. Examples
include synthetic control methods with group-level weights and various methods
of model averaging and forecast combination. The simplex constraint on the
weight poses a challenge in statistical inference due to the constraint
potentially binding. In this paper, we propose a simple method of constructing
a confidence set for the weight and prove that the method is asymptotically
uniformly valid. The procedure does not require tuning parameters or
simulations to compute critical values. The confidence set accommodates both
the cases of point-identification or set-identification of the weight. We
illustrate the method with an empirical example.

arXiv link: http://arxiv.org/abs/2501.15692v1

Econometrics arXiv updated paper (originally submitted: 2025-01-26)

Philip G. Wright, directed acyclic graphs, and instrumental variables

Authors: Jaap H. Abbring, Victor Chernozhukov, Iván Fernández-Val

Wright (1928) deals with demand and supply of oils and butter. In Appendix B
of this book, Philip Wright made several fundamental contributions to causal
inference. He introduced a structural equation model of supply and demand,
established the identification of supply and demand elasticities via the method
of moments and directed acyclical graphs, developed empirical methods for
estimating demand elasticities using weather conditions as instruments, and
proposed methods for counterfactual analysis of the welfare effect of imposing
tariffs and taxes. Moreover, he took all of these methods to data. These ideas
were far ahead of, and much more profound than, any contemporary theoretical and
empirical developments on causal inference in statistics or econometrics. This
editorial aims to present P. Wright's work in a more modern framework, in a
lecture note format that can be useful for teaching and linking to contemporary
research.

arXiv link: http://arxiv.org/abs/2501.16395v2

Econometrics arXiv paper, submitted: 2025-01-26

A General Approach to Relaxing Unconfoundedness

Authors: Matthew A. Masten, Alexandre Poirier, Muyang Ren

This paper defines a general class of relaxations of the unconfoundedness
assumption. This class includes several previous approaches as special cases,
including the marginal sensitivity model of Tan (2006). This class therefore
allows us to precisely compare and contrast these previously disparate
relaxations. We use this class to derive a variety of new identification
results which can be used to assess sensitivity to unconfoundedness. In
particular, the prior literature focuses on average parameters, like the
average treatment effect (ATE). We move beyond averages by providing sharp
bounds for a large class of parameters, including both the quantile treatment
effect (QTE) and the distribution of treatment effects (DTE), results which
were previously unknown even for the marginal sensitivity model.

arXiv link: http://arxiv.org/abs/2501.15400v1

Econometrics arXiv paper, submitted: 2025-01-25

Influence Function: Local Robustness and Efficiency

Authors: Ruonan Xu, Xiye Yang

We propose a direct approach to calculating influence functions based on the
concept of functional derivatives. The relative simplicity of our direct method
is demonstrated through well-known examples. Using influence functions as a key
device, we examine the connection and difference between local robustness and
efficiency in both joint and sequential identification/estimation procedures.
We show that the joint procedure is associated with efficiency, while the
sequential procedure is linked to local robustness. Furthermore, we provide
conditions that are theoretically verifiable and empirically testable on when
efficient and locally robust estimation for the parameter of interest in a
semiparametric model can be achieved simultaneously. In addition, we present
straightforward conditions for an adaptive procedure in the presence of
nuisance parameters.

arXiv link: http://arxiv.org/abs/2501.15307v1

Econometrics arXiv paper, submitted: 2025-01-25

Multiscale risk spillovers and external driving factors: Evidence from the global futures and spot markets of staple foods

Authors: Yun-Shi Dai, Peng-Fei Dai, Stéphane Goutte, Duc Khuong Nguyen, Wei-Xing Zhou

Stable and efficient food markets are crucial for global food security, yet
international staple food markets are increasingly exposed to complex risks,
including intensified risk contagion and escalating external uncertainties.
This paper systematically investigates risk spillovers in global staple food
markets and explores the key determinants of these spillover effects, combining
innovative decomposition-reconstruction techniques, risk connectedness
analysis, and random forest models. The findings reveal that short-term
components exhibit the highest volatility, with futures components generally
more volatile than spot components. Further analysis identifies two main risk
transmission patterns, namely cross-grain and cross-timescale transmission, and
clarifies the distinct roles of each component in various net risk spillover
networks. Additionally, price drivers, external uncertainties, and core
supply-demand indicators significantly influence these spillover effects, with
heterogeneous importance of varying factors in explaining different risk
spillovers. This study provides valuable insights into the risk dynamics of
staple food markets, offers evidence-based guidance for policymakers and market
participants to enhance risk warning and mitigation efforts, and supports the
stabilization of international food markets and the safeguarding of global food
security.

arXiv link: http://arxiv.org/abs/2501.15173v1

Econometrics arXiv paper, submitted: 2025-01-24

Quantitative Theory of Money or Prices? A Historical, Theoretical, and Econometric Analysis

Authors: Jose Mauricio Gomez Julian

This research studies the relation between money and prices and its practical
implications, analyzing quarterly data from the United States (1959-2022),
Canada (1961-2022), the United Kingdom (1986-2022), and Brazil (1996-2022).
The historical, logical, and econometric consistency of the logical core of
the two main theories of money is analyzed using objective Bayesian and
frequentist machine learning models, Bayesian regularized artificial neural
networks, and ensemble learning. It is concluded that money is not neutral at
any time horizon and that, although money is ultimately subordinated to
prices, there is a reciprocal influence over time between money and prices,
which together constitute a complex system. Non-neutrality is transmitted
through aggregate demand and is based on the exchange value of money as a
monetary unit.

arXiv link: http://arxiv.org/abs/2501.14623v1

Econometrics arXiv paper, submitted: 2025-01-24

Triple Instrumented Difference-in-Differences

Authors: Sho Miyaji

In this paper, we formalize a triple instrumented difference-in-differences
(DID-IV). In this design, a triple Wald-DID estimand, which divides the
difference-in-difference-in-differences (DDD) estimand of the outcome by the
DDD estimand of the treatment, captures the local average treatment effect on
the treated. The identifying assumptions mainly comprise a monotonicity
assumption, and the common acceleration assumptions in the treatment and the
outcome. We extend the canonical triple DID-IV design to staggered instrument
cases. We also describe the estimation and inference in this design in
practice.

arXiv link: http://arxiv.org/abs/2501.14405v1
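
The estimand has a simple arithmetic form: a DDD in the outcome divided by a
DDD in the treatment. The Python sketch below computes it from made-up cell
means over three binary dimensions (instrument-defined stratum, group, and
period); the numbers and labels are purely illustrative and are not taken from
the paper.

import numpy as np

def ddd(cell_means):
    # cell_means[z][g][t]: mean in instrument stratum z, group g, period t (all binary).
    m = np.asarray(cell_means, dtype=float)
    def did(a):                          # DiD over group and period within a stratum
        return (a[1, 1] - a[1, 0]) - (a[0, 1] - a[0, 0])
    return did(m[1]) - did(m[0])         # difference the two strata

Y = [[[1.0, 1.2], [1.1, 1.3]],           # z = 0: outcome cell means by (group, period)
     [[2.0, 2.1], [2.2, 3.0]]]           # z = 1
D = [[[0.1, 0.1], [0.1, 0.1]],           # z = 0: treatment take-up cell means
     [[0.2, 0.2], [0.2, 0.6]]]           # z = 1
print(ddd(Y) / ddd(D))                   # triple Wald-DID: 0.7 / 0.4 = 1.75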

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-01-23

Detecting Sparse Cointegration

Authors: Jesus Gonzalo, Jean-Yves Pitarakis

We propose a two-step procedure to detect cointegration in high-dimensional
settings, focusing on sparse relationships. First, we use the adaptive LASSO to
identify the small subset of integrated covariates driving the equilibrium
relationship with a target series, ensuring model-selection consistency.
Second, we adopt an information-theoretic model choice criterion to distinguish
between stationarity and nonstationarity in the resulting residuals, avoiding
dependence on asymptotic distributional assumptions. Monte Carlo experiments
confirm robust finite-sample performance, even under endogeneity and serial
correlation.

arXiv link: http://arxiv.org/abs/2501.13839v1
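
A stripped-down Python sketch of the first step (adaptive LASSO selection of
the integrated covariates) is given below, using first-stage least-squares
coefficients as adaptive weights and a fixed, arbitrary penalty. The second
step, the information-theoretic stationarity check on the residuals, is not
reproduced here.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(6)
T, p = 400, 20
X = np.cumsum(rng.normal(size=(T, p)), axis=0)          # I(1) candidate covariates
y = 1.0 * X[:, 0] - 0.5 * X[:, 3] + rng.normal(size=T)  # sparse cointegrating relation

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
w = 1.0 / np.abs(beta_ols)                              # adaptive weights
lasso = Lasso(alpha=0.1, max_iter=50_000).fit(X / w, y) # LASSO on rescaled covariates
beta_hat = lasso.coef_ / w                              # undo the rescaling
print(np.nonzero(beta_hat)[0])                          # indices of selected covariates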

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2025-01-23

A Non-Parametric Approach to Heterogeneity Analysis

Authors: Avner Seror

We develop a non-parametric methodology to quantify preference heterogeneity
in consumer choices. By repeatedly sampling individual observations and
partitioning agents into groups consistent with the Generalized Axiom of
Revealed Preferences (GARP), we construct a similarity matrix capturing latent
preference structures. Under mild assumptions, this matrix consistently and
asymptotically normally estimates the probability that any pair of agents share
a common utility function. Leveraging this, we develop hypothesis tests to
assess whether demographic characteristics systematically explain unobserved
heterogeneity. Simulations confirm the test's validity, and we apply the method
to a standard grocery expenditure dataset.

arXiv link: http://arxiv.org/abs/2501.13721v4

Econometrics arXiv updated paper (originally submitted: 2025-01-23)

Generalizability with ignorance in mind: learning what we do (not) know for archetypes discovery

Authors: Emily Breza, Arun G. Chandrasekhar, Davide Viviano

When studying policy interventions, researchers often pursue two goals: i)
identifying for whom the program has the largest effects (heterogeneity) and
ii) determining whether those patterns of treatment effects have predictive
power across environments (generalizability). We develop a framework to learn
when and how to partition observations into groups of individual and
environmental characteristics within which treatment effects are predictively
stable, and when instead extrapolation is unwarranted and further evidence is
needed. Our procedure determines in which contexts effects are generalizable
and when, instead, researchers should admit ignorance and collect more data. We
provide a decision-theoretic foundation, derive finite-sample regret
guarantees, and establish asymptotic inference results. We illustrate the
benefits of our approach by reanalyzing a multifaceted anti-poverty program
across six countries.

arXiv link: http://arxiv.org/abs/2501.13355v2

Econometrics arXiv paper, submitted: 2025-01-22

Continuity of the Distribution Function of the argmax of a Gaussian Process

Authors: Matias D. Cattaneo, Gregory Fletcher Cox, Michael Jansson, Kenichi Nagasawa

An increasingly important class of estimators has members whose asymptotic
distribution is non-Gaussian, yet characterizable as the argmax of a Gaussian
process. This paper presents high-level sufficient conditions under which such
asymptotic distributions admit a continuous distribution function. The
plausibility of the sufficient conditions is demonstrated by verifying them in
three prominent examples, namely maximum score estimation, empirical risk
minimization, and threshold regression estimation. In turn, the continuity
result buttresses several recently proposed inference procedures whose validity
seems to require a result of the kind established herein. A notable feature of
the high-level assumptions is that one of them is designed to enable us to
employ the celebrated Cameron-Martin theorem. In a leading special case, the
assumption in question is demonstrably weak and appears to be close to minimal.

arXiv link: http://arxiv.org/abs/2501.13265v1

Econometrics arXiv paper, submitted: 2025-01-22

An Adaptive Moving Average for Macroeconomic Monitoring

Authors: Philippe Goulet Coulombe, Karin Klieber

The use of moving averages is pervasive in macroeconomic monitoring,
particularly for tracking noisy series such as inflation. The choice of the
look-back window is crucial. Too long a moving average is not timely enough
when economic conditions evolve rapidly, while too short a window yields noisy
averages, limiting signal extraction. As is well known, this is a
bias-variance trade-off. However, it is a time-varying one: the optimal size of
the look-back window depends on current macroeconomic conditions. In this
paper, we introduce a simple adaptive moving average estimator based on a
Random Forest using as sole predictor a time trend. Then, we compare the
narratives inferred from the new estimator to those derived from common
alternatives across series such as headline inflation, core inflation, and real
activity indicators. Notably, we find that this simple tool provides a
different account of the post-pandemic inflation acceleration and subsequent
deceleration.

arXiv link: http://arxiv.org/abs/2501.13222v1
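
A minimal Python sketch of the idea described above: fit a Random Forest on a
time trend as the sole predictor and read the fitted values as an adaptive
moving average of a noisy series. The data and hyperparameters below are
illustrative and do not reproduce the paper's estimator or tuning.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
T = 300
trend = np.arange(T).reshape(-1, 1)                 # sole predictor: a time trend
signal = np.where(np.arange(T) < 200, 2.0, 5.0)     # level shift late in the sample
y = signal + rng.normal(0.0, 1.0, T)                # noisy observed series

rf = RandomForestRegressor(n_estimators=500, min_samples_leaf=5, random_state=0)
rf.fit(trend, y)
adaptive_ma = rf.predict(trend)                     # window width adapts to the data
print(adaptive_ma[195:205].round(2))                # tracks the jump around t = 200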

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-01-21

Bias Analysis of Experiments for Multi-Item Multi-Period Inventory Control Policies

Authors: Xinqi Chen, Xingyu Bai, Zeyu Zheng, Nian Si

Randomized experiments, or A/B testing, are the gold standard for evaluating
interventions but are underutilized in the area of inventory management. This
study addresses this gap by analyzing A/B testing strategies in multi-item,
multi-period inventory systems with lost sales and capacity constraints. We
examine switchback experiments, item-level randomization, pairwise
randomization, and staggered rollouts, analyzing their biases theoretically and
comparing them through numerical experiments. Our findings provide actionable
guidance for selecting experimental designs across various contexts in
inventory management.

arXiv link: http://arxiv.org/abs/2501.11996v1

Econometrics arXiv paper, submitted: 2025-01-18

Estimation of Linear Models from Coarsened Observations: A Method of Moments Approach

Authors: Bernard M. S. van Praag, J. Peter Hop, William H. Greene

In the last few decades, the study of ordinal data in which the variable of
interest is not exactly observed but only known to be in a specific ordinal
category has become important. In Psychometrics such variables are analysed
under the heading of item response models (IRM). In econometric studies of
subjective well-being (SWB) and self-assessed health (SAH), and in marketing
research, Ordered Probit, Ordered Logit, and Interval Regression models are
common research platforms. To emphasize that the problem is not tied to a
single discipline, we use the neutral term coarsened observation. For
single-equation models, estimation of the latent linear model by Maximum
Likelihood (ML) is routine. But for higher-dimensional multivariate models it
is computationally cumbersome, as estimation requires large-scale evaluation
of multivariate normal distribution functions. Our proposed
alternative estimation method, based on the Generalized Method of Moments
(GMM), circumvents this multivariate integration problem. The method is based
on the assumed zero correlations between explanatory variables and generalized
residuals. This is more general than ML but coincides with ML if the error
distribution is multivariate normal. It can be implemented by repeated
application of standard techniques. GMM provides a simpler and faster approach
than the usual ML approach. It is applicable to multiple-equation models with
general error correlation matrices and multiple response categories per
equation. It also yields a simple method to estimate polyserial and polychoric
correlations. Comparison of our method with the outcomes of the Stata ML
procedure cmp yields estimates that are not statistically different, while
estimation by our method requires only a fraction of the computing time.

arXiv link: http://arxiv.org/abs/2501.10726v1

Econometrics arXiv updated paper (originally submitted: 2025-01-18)

Recovering Unobserved Network Links from Aggregated Relational Data: Discussions on Bayesian Latent Surface Modeling and Penalized Regression

Authors: Yen-hsuan Tseng

Accurate network data are essential in fields such as economics, sociology,
and computer science. Aggregated Relational Data (ARD) provides a way to
capture network structures using partial data. This article compares two main
frameworks for recovering network links from ARD: Bayesian Latent Surface
Modeling (BLSM) and Frequentist Penalized Regression (FPR). Using simulation
studies and real-world applications, we evaluate their theoretical properties,
computational efficiency, and practical utility in domains like financial risk
assessment and epidemiology. Key findings emphasize the importance of trait
design, privacy considerations, and hybrid modeling approaches to improve
scalability and robustness.

arXiv link: http://arxiv.org/abs/2501.10675v2

Econometrics arXiv updated paper (originally submitted: 2025-01-17)

Prediction Sets and Conformal Inference with Interval Outcomes

Authors: Weiguang Liu, Áureo de Paula, Elie Tamer

Given data on a scalar random variable $Y$, a prediction set for $Y$ with
miscoverage level $\alpha$ is a set of values for $Y$ that contains a randomly
drawn $Y$ with probability $1 - \alpha$, where $\alpha \in (0,1)$. Among all
prediction sets that satisfy this coverage property, the oracle prediction set
is the one with the smallest volume. This paper provides estimation methods of
such prediction sets given observed conditioning covariates when $Y$ is
censored or measured in intervals. We first characterise the
oracle prediction set under interval censoring and develop a consistent
estimator for the shortest prediction interval that satisfies this
coverage property. These consistency results are extended to accommodate cases
where the prediction set consists of multiple disjoint intervals. We use
conformal inference to construct a prediction set that achieves finite-sample
validity under censoring and maintains consistency as sample size increases,
using a conformity score function designed for interval data. The procedure
accommodates the prediction uncertainty that is irreducible (due to the
stochastic nature of outcomes), the modelling uncertainty due to partial
identification and also sampling uncertainty that gets reduced as samples get
larger. We conduct a set of Monte Carlo simulations and an application to data
from the Current Population Survey. The results highlight the robustness and
efficiency of the proposed methods.

arXiv link: http://arxiv.org/abs/2501.10117v3
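
For a flavour of conformal prediction with interval outcomes, the Python
sketch below runs a generic split-conformal procedure in which the conformity
score is the distance from a point prediction to the observed interval. Both
the score and the point predictor (a linear regression fitted to interval
midpoints) are simplifying choices made for this illustration and do not
reproduce the paper's construction or its guarantees under censoring.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
n = 2000
X = rng.normal(size=(n, 1))
y = 1.5 * X[:, 0] + rng.normal(size=n)
lo, hi = np.floor(y), np.floor(y) + 1.0              # interval-censored outcomes

# Split: fit a point predictor on interval midpoints, calibrate on the rest.
tr, cal = np.arange(n) < n // 2, np.arange(n) >= n // 2
model = LinearRegression().fit(X[tr], (lo[tr] + hi[tr]) / 2)

pred_cal = model.predict(X[cal])
score = np.maximum.reduce([lo[cal] - pred_cal, pred_cal - hi[cal],
                           np.zeros(cal.sum())])     # distance to the interval
alpha = 0.1
n_cal = cal.sum()
q = np.quantile(score, min(1.0, np.ceil((1 - alpha) * (n_cal + 1)) / n_cal))

# Prediction set for a new covariate value: a band around the point prediction.
x_new = np.array([[0.5]])
c = model.predict(x_new)[0]
print((c - q, c + q))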

Econometrics arXiv paper, submitted: 2025-01-16

Convergence Rates of GMM Estimators with Nonsmooth Moments under Misspecification

Authors: Byunghoon Kang, Seojeong Lee, Juha Song

The asymptotic behavior of GMM estimators depends critically on whether the
underlying moment condition model is correctly specified. Hong and Li (2023,
Econometric Theory) showed that GMM estimators with nonsmooth
(non-directionally differentiable) moment functions are at best
$n^{1/3}$-consistent under misspecification. Through simulations, we verify the
slower convergence rate of GMM estimators in such cases. For the two-step GMM
estimator with an estimated weight matrix, our results align with theory.
However, for the one-step GMM estimator with the identity weight matrix, the
convergence rate remains $\sqrt{n}$, even under severe misspecification.

arXiv link: http://arxiv.org/abs/2501.09540v1

Econometrics arXiv paper, submitted: 2025-01-16

Recovering latent linkage structures and spillover effects with structural breaks in panel data models

Authors: Ryo Okui, Yutao Sun, Wendun Wang

This paper introduces a framework to analyze time-varying spillover effects
in panel data. We consider panel models where a unit's outcome depends not only
on its own characteristics (private effects) but also on the characteristics of
other units (spillover effects). The linkage of units is allowed to be latent
and may shift at an unknown breakpoint. We propose a novel procedure to
estimate the breakpoint, linkage structure, spillover and private effects. We
address the high-dimensionality of spillover effect parameters using penalized
estimation, and estimate the breakpoint with refinement. We establish the
super-consistency of the breakpoint estimator, ensuring that inferences about
other parameters can proceed as if the breakpoint were known. The private
effect parameters are estimated using a double machine learning method. The
proposed method is applied to estimate the cross-country R&D spillovers, and we
find that the R&D spillovers become sparser after the financial crisis.

arXiv link: http://arxiv.org/abs/2501.09517v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2025-01-16

Semiparametrics via parametrics and contiguity

Authors: Adam Lee, Emil A. Stoltenberg, Per A. Mykland

Inference on the parametric part of a semiparametric model is no trivial
task. If one approximates the infinite dimensional part of the semiparametric
model by a parametric function, one obtains a parametric model that is in some
sense close to the semiparametric model and inference may proceed by the method
of maximum likelihood. Under regularity conditions, the ensuing maximum
likelihood estimator is asymptotically normal and efficient in the
approximating parametric model. Thus one obtains a sequence of asymptotically
normal and efficient estimators in a sequence of growing parametric models that
approximate the semiparametric model and, intuitively, the limiting
'semiparametric' estimator should be asymptotically normal and efficient as
well. In this paper we make this intuition rigorous: we move much of the
semiparametric analysis back into classical parametric terrain, and then
translate our parametric results back to the semiparametric world by way of
contiguity. Our approach departs from the conventional sieve literature by
being more specific about the approximating parametric models, by working not
only with but also under these when treating the parametric models, and by
taking full advantage of the mutual contiguity that we require between the
parametric and semiparametric models. We illustrate our theory with two
canonical examples of semiparametric models, namely the partially linear
regression model and the Cox regression model. An upshot of our theory is a
new, relatively simple, and rather parametric proof of the efficiency of the
Cox partial likelihood estimator.

arXiv link: http://arxiv.org/abs/2501.09483v2

Econometrics arXiv paper, submitted: 2025-01-14

The Impact of Digitalisation and Sustainability on Inclusiveness: Inclusive Growth Determinants

Authors: Radu Rusu, Camelia Oprean-Stan

Inclusiveness and economic development have been slowed by pandemics and
military conflicts. This study investigates the main determinants of
inclusiveness at the European level. A multi-method approach is used, with
Principal Component Analysis (PCA) applied to create the Inclusiveness Index
and Generalised Method of Moments (GMM) analysis used to investigate the
determinants of inclusiveness. The data comprises a range of 22 years, from
2000 to 2021, for 32 European countries. The determinants of inclusiveness and
their effects were identified. First, economic growth, industrial upgrading,
electricity consumption, digitalisation, and the quantitative aspect of
governance, all have a positive impact on inclusive growth in Europe. Second,
the level of CO2 emissions and inflation have a negative impact on
inclusiveness. Tomorrow's inclusive and sustainable growth must include
investments in renewable energy, digital infrastructure, inequality policies,
sustainable governance, human capital, and inflation management. These findings
can help decision makers design inclusive growth policies.

arXiv link: http://arxiv.org/abs/2501.07880v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2025-01-14

Bridging Root-$n$ and Non-standard Asymptotics: Adaptive Inference in M-Estimation

Authors: Kenta Takatsu, Arun Kumar Kuchibhotla

This manuscript studies a general approach to construct confidence sets for
the solution of population-level optimization, commonly referred to as
M-estimation. Statistical inference for M-estimation poses significant
challenges due to the non-standard limiting behaviors of the corresponding
estimator, which arise in settings with increasing dimension of parameters,
non-smooth objectives, or constraints. We propose a simple and unified method
that guarantees validity in both regular and irregular cases. Moreover, we
provide a comprehensive width analysis of the proposed confidence set, showing
that the convergence rate of the diameter is adaptive to the unknown degree of
instance-specific regularity. We apply the proposed method to several
high-dimensional and irregular statistical problems.

arXiv link: http://arxiv.org/abs/2501.07772v3

Econometrics arXiv updated paper (originally submitted: 2025-01-13)

disco: Distributional Synthetic Controls

Authors: Florian Gunsilius, David Van Dijcke

The method of synthetic controls is widely used for evaluating causal effects
of policy changes in settings with observational data. Often, researchers aim
to estimate the causal impact of policy interventions on a treated unit at an
aggregate level while also possessing data at a finer granularity. In this
article, we introduce the new disco command, which implements the
Distributional Synthetic Controls method introduced in Gunsilius (2023). This
command allows researchers to construct entire synthetic distributions for the
treated unit based on an optimally weighted average of the distributions of the
control units. Several aggregation schemes are provided to facilitate clear
reporting of the distributional effects of the treatment. The package offers
both quantile-based and CDF-based approaches, comprehensive inference
procedures via bootstrap and permutation methods, and visualization
capabilities. We empirically illustrate the use of the package by replicating
the results in Van Dijcke et al. (2024).

arXiv link: http://arxiv.org/abs/2501.07550v2
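
The quantile-based construction can be sketched directly: choose simplex
weights on the control units so that the weighted average of their quantile
functions is close to the treated unit's quantile function on a grid. The
Python snippet below does exactly that with made-up data and a generic solver;
it illustrates the idea rather than the disco command or the estimator in
Gunsilius (2023).

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
taus = np.linspace(0.05, 0.95, 19)

treated = rng.normal(1.0, 1.2, size=500)
controls = [rng.normal(m, s, size=500) for m, s in [(0.0, 1.0), (2.0, 1.5), (1.0, 0.8)]]

Q_treated = np.quantile(treated, taus)
Q_controls = np.stack([np.quantile(c, taus) for c in controls])   # shape (J, len(taus))

def loss(w):
    return np.mean((Q_treated - w @ Q_controls) ** 2)

J = Q_controls.shape[0]
res = minimize(loss, x0=np.full(J, 1.0 / J), method="SLSQP",
               bounds=[(0.0, 1.0)] * J,
               constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
print(res.x)                                   # simplex weights on the control units
synthetic_quantiles = res.x @ Q_controls       # synthetic distribution for the treated unit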

Econometrics arXiv updated paper (originally submitted: 2025-01-13)

Estimating Sequential Search Models Based on a Partial Ranking Representation

Authors: Tinghan Zhang

The rapid growth of online shopping has made consumer search data
increasingly available, opening up new possibilities for empirical research.
Sequential search models offer a structured approach for analyzing such data,
but their estimation remains difficult. This is because consumers make optimal
decisions based on private information revealed in search, which is not
observed in typical data. As a result, the model's likelihood function involves
high-dimensional integrals that require intensive simulation. This paper
introduces a new representation that shows a consumer's optimal search
decision-making can be recast as a partial ranking over all actions available
throughout the consumer's search process. This reformulation yields the same
choice probabilities as the original model but leads to a simpler likelihood
function that relies less on simulation. Based on this insight, we provide
identification arguments and propose a modified GHK-style simulator that
improves both estimation performance and ease of implementation. The proposed
approach also generalizes to a wide range of model variants, including those
with incomplete search data and structural extensions such as search with
product discovery. It enables a tractable and unified estimation strategy
across different settings in sequential search models, offering both a new
perspective on understanding sequential search and a practical tool for its
application.

arXiv link: http://arxiv.org/abs/2501.07514v3

Econometrics arXiv paper, submitted: 2025-01-13

Forecasting for monetary policy

Authors: Laura Coroneo

This paper discusses three key themes in forecasting for monetary policy
highlighted in the Bernanke (2024) review: the challenges in economic
forecasting, the conditional nature of central bank forecasts, and the
importance of forecast evaluation. In addition, a formal evaluation of the Bank
of England's inflation forecasts indicates that, despite the large forecast
errors in recent years, they were still accurate relative to common benchmarks.

arXiv link: http://arxiv.org/abs/2501.07386v1

Econometrics arXiv cross-link from General Economics (econ.GN), submitted: 2025-01-13

Social and Genetic Ties Drive Skewed Cross-Border Media Coverage of Disasters

Authors: Thiemo Fetzer, Prashant Garg

Climate change is increasing the frequency and severity of natural disasters
worldwide. Media coverage of these events may be vital to generate empathy and
mobilize global populations to address the common threat posed by climate
change. Using a dataset of 466 news sources from 123 countries, covering 135
million news articles since 2016, we apply an event study framework to measure
cross-border media activity following natural disasters. Our results show that
while media attention rises after disasters, it is heavily skewed towards
certain events, notably earthquakes, accidents, and wildfires. In contrast,
climatologically salient events such as floods, droughts, or extreme
temperatures receive less coverage. This cross-border disaster reporting is
strongly related to the number of deaths associated with the event, especially
when the affected populations share strong social ties or genetic similarities
with those in the reporting country. Achieving more balanced media coverage
across different types of natural disasters may be essential to counteract
skewed perceptions. Further, fostering closer social connections between
countries may enhance empathy and mobilize the resources necessary to confront
the global threat of climate change.

arXiv link: http://arxiv.org/abs/2501.07615v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-01-12

Doubly Robust Inference on Causal Derivative Effects for Continuous Treatments

Authors: Yikun Zhang, Yen-Chi Chen

Statistical methods for causal inference with continuous treatments mainly
focus on estimating the mean potential outcome function, commonly known as the
dose-response curve. However, it is often not the dose-response curve but its
derivative function that signals the treatment effect. In this paper, we
investigate nonparametric inference on the derivative of the dose-response
curve with and without the positivity condition. Under the positivity and other
regularity conditions, we propose a doubly robust (DR) inference method for
estimating the derivative of the dose-response curve using kernel smoothing.
When the positivity condition is violated, we demonstrate the inconsistency of
conventional inverse probability weighting (IPW) and DR estimators, and
introduce novel bias-corrected IPW and DR estimators. In all settings, our DR
estimator achieves asymptotic normality at the standard nonparametric rate of
convergence with nonparametric efficiency guarantees. Additionally, our
approach reveals an interesting connection to nonparametric support and level
set estimation problems. Finally, we demonstrate the applicability of our
proposed estimators through simulations and a case study of evaluating a job
training program.

arXiv link: http://arxiv.org/abs/2501.06969v2

Econometrics arXiv updated paper (originally submitted: 2025-01-12)

Identification and Estimation of Simultaneous Equation Models Using Higher-Order Cumulant Restrictions

Authors: Ziyu Jiang

Identifying structural parameters in linear simultaneous-equation models is a
longstanding challenge. Recent work exploits information in higher-order
moments of non-Gaussian data. In this literature, the structural errors are
typically assumed to be uncorrelated so that, after standardizing the
covariance matrix of the observables (whitening), the structural parameter
matrix becomes orthogonal -- a device that underpins many identification proofs
but can be restrictive in econometric applications. We show that neither zero
covariance nor whitening is necessary. For any order $h>2$, a simple
diagonality condition on the $h$th-order cumulants alone identifies the
structural parameter matrix -- up to unknown scaling and permutation -- as the
solution to an eigenvector problem; no restrictions on cumulants of other
orders are required. This general, single-order result enlarges the class of
models covered by our framework and yields a sample-analogue estimator that is
$\sqrt{n}$-consistent, asymptotically normal, and easy to compute. Furthermore,
when uncorrelatedness is intrinsic -- as in vector autoregressive (VAR) models
-- our framework provides a transparent overidentification test. Monte Carlo
experiments show favorable finite-sample performance, and two applications --
"Returns to Schooling" and "Uncertainty and the Business Cycle" -- demonstrate
its practical value.

arXiv link: http://arxiv.org/abs/2501.06777v2

Econometrics arXiv cross-link from physics.soc-ph (physics.soc-ph), submitted: 2025-01-12

The Causal Impact of Dean's List Recognition on Academic Performance: Evidence from a Regression Discontinuity Design

Authors: Luc, Chen

This study examines the causal impact of being placed on the Dean's List, a
positive education incentive, on future student performance using a regression
discontinuity design. The results suggest that for students with low prior
academic performance and who are native English speakers, there is a positive
impact of being on the Dean's List on the probability of getting onto the
Dean's List in the following year. However, being on the Dean's List does not
appear to have a statistically significant effect on subsequent GPA, total
credits taken, dropout rates, or the probability of graduating within four
years. These findings suggest that a place on the Dean's List may not be a
strong motivator for students to improve their academic performance and achieve
better outcomes.

arXiv link: http://arxiv.org/abs/2501.09763v1
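
A minimal sketch of the sharp regression discontinuity logic described above,
using simulated data rather than the paper's student records; the cutoff,
bandwidth, and simulated effect size are illustrative assumptions.

    # Minimal sketch of a sharp RD estimate: a local linear regression with
    # separate slopes on each side of the cutoff, fit within a bandwidth; the
    # jump at the cutoff is the estimated treatment effect.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 5000
    gpa = rng.uniform(2.0, 4.0, n)                 # running variable (prior GPA)
    cutoff = 3.5
    deans_list = (gpa >= cutoff).astype(float)     # sharp assignment at the cutoff
    # simulated next-year outcome with a 0.10 jump at the cutoff
    outcome = 0.5 * gpa + 0.10 * deans_list + rng.normal(0, 0.3, n)

    h = 0.25                                       # bandwidth around the cutoff
    keep = np.abs(gpa - cutoff) <= h
    x = gpa[keep] - cutoff
    d = deans_list[keep]
    X = sm.add_constant(np.column_stack([d, x, d * x]))
    fit = sm.OLS(outcome[keep], X).fit(cov_type="HC1")
    print("RD estimate at the cutoff:", fit.params[1])   # coefficient on d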

Econometrics arXiv paper, submitted: 2025-01-11

Optimizing Financial Data Analysis: A Comparative Study of Preprocessing Techniques for Regression Modeling of Apple Inc.'s Net Income and Stock Prices

Authors: Kevin Ungar, Camelia Oprean-Stan

This article presents a comprehensive methodology for processing financial
datasets of Apple Inc., encompassing quarterly income and daily stock prices,
spanning from March 31, 2009, to December 31, 2023. Leveraging 60 observations
for quarterly income and 3774 observations for daily stock prices, sourced from
Macrotrends and Yahoo Finance respectively, the study outlines five distinct
datasets crafted through varied preprocessing techniques. Through detailed
explanations of aggregation, interpolation (linear, polynomial, and cubic
spline) and lagged variables methods, the study elucidates the steps taken to
transform raw data into analytically rich datasets. Subsequently, the article
delves into regression analysis, aiming to decipher which of the five data
processing methods best suits capital market analysis, by employing both linear
and polynomial regression models on each preprocessed dataset and evaluating
their performance using a range of metrics, including cross-validation score,
MSE, MAE, RMSE, R-squared, and Adjusted R-squared. The research findings reveal
that linear interpolation with polynomial regression emerges as the
top-performing method, boasting the lowest validation MSE and MAE values,
alongside the highest R-squared and Adjusted R-squared values.

arXiv link: http://arxiv.org/abs/2501.06587v1
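
The abstract's pipeline can be mimicked in a few lines: the sketch below
linearly interpolates a sparsely observed quarterly series to a daily grid and
fits a degree-2 polynomial regression, reporting MSE and R-squared. The
simulated series stand in for the paper's Apple data, and all tuning choices
are illustrative.

    # Minimal sketch of "interpolate quarterly fundamentals to daily, then run a
    # polynomial regression against daily prices"; data are simulated stand-ins.
    import numpy as np
    import pandas as pd
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error, r2_score

    dates = pd.date_range("2015-01-01", "2019-12-31", freq="D")
    rng = np.random.default_rng(2)
    income = pd.Series(np.nan, index=dates)              # quarterly values, NaN elsewhere
    idx = np.arange(0, len(dates), 91)                   # roughly quarterly observation dates
    income.iloc[idx] = np.linspace(10, 20, len(idx)) + rng.normal(0, 0.5, len(idx))
    income_daily = income.interpolate(method="linear").ffill()   # linear interpolation

    price = 5 + 3 * income_daily + rng.normal(0, 2, len(dates))  # simulated daily price

    X = PolynomialFeatures(degree=2).fit_transform(income_daily.to_numpy().reshape(-1, 1))
    model = LinearRegression().fit(X, price)
    pred = model.predict(X)
    print("MSE:", mean_squared_error(price, pred), "R^2:", r2_score(price, pred))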

Econometrics arXiv paper, submitted: 2025-01-11

A Hybrid Framework for Reinsurance Optimization: Integrating Generative Models and Reinforcement Learning

Authors: Stella C. Dong, James R. Finlay

Reinsurance optimization is critical for insurers to manage risk exposure,
ensure financial stability, and maintain solvency. Traditional approaches often
struggle with dynamic claim distributions, high-dimensional constraints, and
evolving market conditions. This paper introduces a novel hybrid framework that
integrates {Generative Models}, specifically Variational Autoencoders (VAEs),
with {Reinforcement Learning (RL)} using Proximal Policy Optimization (PPO).
The framework enables dynamic and scalable optimization of reinsurance
strategies by combining the generative modeling of complex claim distributions
with the adaptive decision-making capabilities of reinforcement learning.
The VAE component generates synthetic claims, including rare and catastrophic
events, addressing data scarcity and variability, while the PPO algorithm
dynamically adjusts reinsurance parameters to maximize surplus and minimize
ruin probability. The framework's performance is validated through extensive
experiments, including out-of-sample testing, stress-testing scenarios (e.g.,
pandemic impacts, catastrophic events), and scalability analysis across
portfolio sizes. Results demonstrate its superior adaptability, scalability,
and robustness compared to traditional optimization techniques, achieving
higher final surpluses and computational efficiency.
Key contributions include the development of a hybrid approach for
high-dimensional optimization, dynamic reinsurance parameterization, and
validation against stochastic claim distributions. The proposed framework
offers a transformative solution for modern reinsurance challenges, with
potential applications in multi-line insurance operations, catastrophe
modeling, and risk-sharing strategy design.

arXiv link: http://arxiv.org/abs/2501.06404v1

Econometrics arXiv cross-link from General Economics (econ.GN), submitted: 2025-01-09

Sectorial Exclusion Criteria in the Marxist Analysis of the Average Rate of Profit: The United States Case (1960-2020)

Authors: Jose Mauricio Gomez Julian

The long-term estimation of the Marxist average rate of profit does not
adhere to a theoretically grounded standard regarding which economic activities
should or should not be included for such purposes, which is relevant because
methodological non-uniformity can be a significant source of overestimation or
underestimation, generating a less accurate reflection of the capital
accumulation dynamics. This research aims to provide a standard Marxist
decision criterion regarding the inclusion and exclusion of economic activities
for the calculation of the Marxist average profit rate for the case of United
States economic sectors from 1960 to 2020, based on the Marxist definition of
productive labor, its location in the circuit of capital, and its relationship
with the production of surplus value. Using wavelet-transformed Daubechies
filters with increased symmetry, empirical mode decomposition, a
Hodrick-Prescott filter embedded in an unobserved components model, and a wide
variety of unit root tests, the internal theoretical consistency of the
proposed criteria is evaluated. The objective consistency of the theory is also evaluated by a
dynamic factor auto-regressive model, Principal Component Analysis, Singular
Value Decomposition and Backward Elimination with Linear and Generalized Linear
Models. The results are consistent both theoretically and econometrically with
the logic of Marx's political economy.

arXiv link: http://arxiv.org/abs/2501.06270v1

Econometrics arXiv paper, submitted: 2025-01-09

Comparing latent inequality with ordinal data

Authors: David M. Kaplan, Wei Zhao

We propose new ways to compare two latent distributions when only ordinal
data are available and without imposing parametric assumptions on the
underlying continuous distributions. First, we contribute identification
results. We show how certain ordinal conditions provide evidence of
between-group inequality, quantified by particular quantiles being higher in
one latent distribution than in the other. We also show how other ordinal
conditions provide evidence of higher within-group inequality in one
distribution than in the other, quantified by particular interquantile ranges
being wider in one latent distribution than in the other. Second, we propose an
"inner" confidence set for the quantiles that are higher for the first latent
distribution. We also describe frequentist and Bayesian inference on features
of the ordinal distributions relevant to our identification results. Our
contributions are illustrated by empirical examples with mental health and
general health.

arXiv link: http://arxiv.org/abs/2501.05338v1

Econometrics arXiv cross-link from q-fin.GN (q-fin.GN), submitted: 2025-01-09

Time-Varying Bidirectional Causal Relationships Between Transaction Fees and Economic Activity of Subsystems Utilizing the Ethereum Blockchain Network

Authors: Lennart Ante, Aman Saggu

The Ethereum blockchain network enables transaction processing and
smart-contract execution through levies of transaction fees, commonly known as
gas fees. This framework mediates economic participation via a market-based
mechanism for gas fees, permitting users to offer higher gas fees to expedite
processing. Historically, the ensuing gas fee volatility led to critical
disequilibria between supply and demand for block space, presenting stakeholder
challenges. This study examines the dynamic causal interplay between
transaction fees and economic subsystems leveraging the network. By utilizing
data related to unique active wallets and transaction volume of each subsystem
and applying time-varying Granger causality analysis, we reveal temporal
heterogeneity in causal relationships between economic activity and transaction
fees across all subsystems. This includes (a) a bidirectional causal feedback
loop between cross-blockchain bridge user activity and transaction fees, which
diminishes over time, potentially signaling user migration; (b) a bidirectional
relationship between centralized cryptocurrency exchange deposit and withdrawal
transaction volume and fees, indicative of increased competition for block
space; (c) decentralized exchange volumes causally influence fees, while fees
causally influence user activity, although this relationship is weakening,
potentially due to the diminished significance of decentralized finance; (d)
intermittent causal relationships with maximal extractable value bots; (e) fees
causally influence non-fungible token transaction volumes; and (f) a highly
significant and growing causal influence of transaction fees on stablecoin
activity and transaction volumes, highlighting the rising prominence of stablecoins.

arXiv link: http://arxiv.org/abs/2501.05299v1
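
A minimal rolling-window Granger causality sketch in Python, using statsmodels
on simulated series that stand in for a subsystem's activity and gas fees; the
paper's time-varying Granger procedure is more sophisticated than this
fixed-window loop.

    # Minimal sketch: does lagged "activity" help predict "fees" within rolling
    # windows? Data are simulated; window, step, and lag choices are illustrative.
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.stattools import grangercausalitytests

    rng = np.random.default_rng(3)
    n = 600
    fees = np.zeros(n)
    activity = np.zeros(n)
    for t in range(1, n):
        activity[t] = 0.5 * activity[t - 1] + rng.normal()
        fees[t] = 0.4 * fees[t - 1] + 0.3 * activity[t - 1] + rng.normal()

    df = pd.DataFrame({"fees": fees, "activity": activity})
    window, lag = 200, 2
    pvals = []
    for start in range(0, n - window, 50):
        chunk = df.iloc[start:start + window]
        # column order: (caused, causing); null is "activity does not Granger-cause fees"
        res = grangercausalitytests(chunk[["fees", "activity"]], maxlag=lag, verbose=False)
        pvals.append(res[lag][0]["ssr_ftest"][1])
    print("rolling p-values for 'activity Granger-causes fees':", np.round(pvals, 3))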

Econometrics arXiv paper, submitted: 2025-01-09

RUM-NN: A Neural Network Model Compatible with Random Utility Maximisation for Discrete Choice Setups

Authors: Niousha Bagheri, Milad Ghasri, Michael Barlow

This paper introduces a framework for capturing stochasticity of choice
probabilities in neural networks, derived from and fully consistent with the
Random Utility Maximization (RUM) theory, referred to as RUM-NN. Neural network
models show remarkable performance compared with statistical models; however,
they are often criticized for their lack of transparency and interpretability.
The proposed RUM-NN is introduced in both linear and nonlinear structures. The
linear RUM-NN retains the interpretability and identifiability of traditional
econometric discrete choice models while using neural network-based estimation
techniques. The nonlinear RUM-NN extends the model's flexibility and predictive
capabilities to capture nonlinear relationships between variables within
utility functions. Additionally, the RUM-NN allows for the implementation of
various parametric distributions for unobserved error components in the utility
function and captures correlations among error terms. The performance of RUM-NN
in parameter recovery and prediction accuracy is rigorously evaluated using
synthetic datasets through Monte Carlo experiments. Additionally, RUM-NN is
evaluated on the Swissmetro and the London Passenger Mode Choice (LPMC)
datasets with different sets of distribution assumptions for the error
component. The results demonstrate that RUM-NN under a linear utility structure
and IID Gumbel error terms can replicate the performance of the Multinomial
Logit (MNL) model, but relaxing those constraints leads to superior performance
for both Swissmetro and LPMC datasets. By introducing a novel estimation
approach aligned with statistical theories, this study empowers econometricians
to harness the advantages of neural network models.

arXiv link: http://arxiv.org/abs/2501.05221v1

Econometrics arXiv updated paper (originally submitted: 2025-01-09)

DisSim-FinBERT: Text Simplification for Core Message Extraction in Complex Financial Texts

Authors: Wonseong Kim, Christina Niklaus, Choong Lyol Lee, Siegfried Handschuh

This study proposes DisSim-FinBERT, a novel framework that integrates
Discourse Simplification (DisSim) with Aspect-Based Sentiment Analysis (ABSA)
to enhance sentiment prediction in complex financial texts. By simplifying
intricate documents such as Federal Open Market Committee (FOMC) minutes,
DisSim improves the precision of aspect identification, resulting in sentiment
predictions that align more closely with economic events. The model preserves
the original informational content and captures the inherent volatility of
financial language, offering a more nuanced and accurate interpretation of
long-form financial communications. This approach provides a practical tool for
policymakers and analysts aiming to extract actionable insights from central
bank narratives and other detailed economic documents.

arXiv link: http://arxiv.org/abs/2501.04959v2

Econometrics arXiv updated paper (originally submitted: 2025-01-08)

Identification of dynamic treatment effects when treatment histories are partially observed

Authors: Akanksha Negi, Didier Nibbering

This paper presents a general difference-in-differences framework for
identifying path-dependent treatment effects when treatment histories are
partially observed. We introduce a novel robust estimator that adjusts for
missing histories using a combination of outcome, propensity score, and missing
treatment models. We show that this approach identifies the target parameter as
long as any two of the three models are correctly specified. The
method delivers improved robustness against competing alternatives under the
same set of identifying assumptions. Theoretical results and numerical
experiments demonstrate how the proposed method yields more accurate inference
compared to conventional and doubly robust estimators, particularly under
nontrivial missingness and misspecification scenarios. Two applications
demonstrate that the robust method can produce substantively different
estimates of path-dependent treatment effects relative to conventional
approaches.

arXiv link: http://arxiv.org/abs/2501.04853v2

Econometrics arXiv paper, submitted: 2025-01-08

Monthly GDP Growth Estimates for the U.S. States

Authors: Gary Koop, Stuart McIntyre, James Mitchell, Aristeidis Raftapostolos

This paper develops a mixed frequency vector autoregressive (MF-VAR) model to
produce nowcasts and historical estimates of monthly real state-level GDP for
the 50 U.S. states, plus Washington DC, from 1964 through the present day. The
MF-VAR model incorporates state and U.S. data at the monthly, quarterly, and
annual frequencies. Temporal and cross-sectional constraints are imposed to
ensure that the monthly state-level estimates are consistent with official
estimates of quarterly GDP at the U.S. and state-levels. We illustrate the
utility of the historical estimates in better understanding state business
cycles and cross-state dependencies. We show how the model produces accurate
nowcasts of state GDP three months ahead of the BEA's quarterly estimates,
after conditioning on the latest estimates of U.S. GDP.

arXiv link: http://arxiv.org/abs/2501.04607v1

Econometrics arXiv paper, submitted: 2025-01-07

Sequential Monte Carlo for Noncausal Processes

Authors: Gianluca Cubadda, Francesco Giancaterini, Stefano Grassi

This paper proposes a Sequential Monte Carlo approach for the Bayesian
estimation of mixed causal and noncausal models. Unlike previous Bayesian
estimation methods developed for these models, Sequential Monte Carlo offers
extensive parallelization opportunities, significantly reducing estimation time
and mitigating the risk of becoming trapped in local minima, a common issue in
noncausal processes. Simulation studies demonstrate the strong ability of the
algorithm to produce accurate estimates and correctly identify the process. In
particular, we propose a novel identification methodology that leverages the
Marginal Data Density and the Bayesian Information Criterion. Unlike previous
studies, this methodology determines not only the causal and noncausal
polynomial orders but also the error term distribution that best fits the data.
Finally, Sequential Monte Carlo is applied to a bivariate process containing
S&P Europe 350 ESG Index and Brent crude oil prices.

arXiv link: http://arxiv.org/abs/2501.03945v1

Econometrics arXiv paper, submitted: 2025-01-06

High-frequency Density Nowcasts of U.S. State-Level Carbon Dioxide Emissions

Authors: Ignacio Garrón, Andrey Ramos

Accurate tracking of anthropogenic carbon dioxide (CO2) emissions is crucial
for shaping climate policies and meeting global decarbonization targets.
However, energy consumption and emissions data are released annually and with
substantial publication lags, hindering timely decision-making. This paper
introduces a panel nowcasting framework to produce higher-frequency predictions
of the state-level growth rate of per-capita energy consumption and CO2
emissions in the United States (U.S.). Our approach employs a panel mixed-data
sampling (MIDAS) model to predict per-capita energy consumption growth,
considering quarterly personal income, monthly electricity consumption, and a
weekly economic conditions index as predictors. A bridge equation linking
per-capita CO2 emissions growth with the nowcasts of energy consumption is
estimated using panel quantile regression methods. A pseudo out-of-sample study
(2009-2018), simulating the real-time data release calendar, confirms the
improved accuracy of our nowcasts with respect to a historical benchmark. Our
results suggest that by leveraging the availability of higher-frequency
indicators, we not only enhance predictive accuracy for per-capita energy
consumption growth but also provide more reliable estimates of the distribution
of CO2 emissions growth.

arXiv link: http://arxiv.org/abs/2501.03380v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2025-01-06

A data-driven merit order: Learning a fundamental electricity price model

Authors: Paul Ghelasi, Florian Ziel

Power prices can be forecasted using data-driven models or fundamental
models. Data-driven models learn from historical patterns, while fundamental
models simulate electricity markets. Traditionally, fundamental models have
been too computationally demanding to allow for intrinsic parameter estimation
or frequent updates, which are essential for short-term forecasting. In this
paper, we propose a novel data-driven fundamental model that combines the
strengths of both approaches. We estimate the parameters of a fully fundamental
merit order model using historical data, similar to how data-driven models
work. This removes the need for fixed technical parameters or expert
assumptions, allowing most parameters to be calibrated directly to
observations. The model is efficient enough for quick parameter estimation and
forecast generation. We apply it to forecast German day-ahead electricity
prices and demonstrate that it outperforms both classical fundamental and
purely data-driven models. The hybrid model effectively captures price
volatility and sequential price clusters, which are becoming increasingly
important with the expansion of renewable energy sources. It also provides
valuable insights, such as fuel switches, marginal power plant contributions,
estimated parameters, dispatched plants, and power generation.

arXiv link: http://arxiv.org/abs/2501.02963v1

Econometrics arXiv cross-link from General Economics (econ.GN), submitted: 2025-01-05

Identifying the Hidden Nexus between Benford Law Establishment in Stock Market and Market Efficiency: An Empirical Investigation

Authors: M. R. Sarkandiz

Benford's law, or the law of the first significant digit, has been subjected
to numerous studies due to its unique applications in financial fields,
especially accounting and auditing. However, studies that addressed the law's
establishment in the stock markets generally concluded that stock prices do not
comply with the underlying distribution. The present research, emphasizing data
randomness as the underlying assumption of Benford's law, has conducted an
empirical investigation of the Warsaw Stock Exchange. The results show that,
because stock prices are not randomly distributed, Benford's law does not hold
in this market, a conclusion also supported by a chi-square goodness-of-fit
test. The paper further argues that this lack of randomness stems from market
inefficiency: violations of the efficient market hypothesis produce non-random
price series and hence the failure of Benford's law to hold.

arXiv link: http://arxiv.org/abs/2501.02674v1
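
Benford's first-digit distribution and the chi-square goodness-of-fit test
mentioned above are straightforward to compute; the sketch below applies them
to simulated prices rather than the Warsaw Stock Exchange sample.

    # Minimal sketch of a Benford first-digit check with a chi-square
    # goodness-of-fit test on a simulated (lognormal) price sample.
    import numpy as np
    from scipy.stats import chisquare

    rng = np.random.default_rng(4)
    prices = rng.lognormal(mean=3.0, sigma=1.0, size=5000)   # stand-in price sample

    first_digits = np.array([int(str(p).lstrip("0.")[0]) for p in prices])
    observed = np.array([(first_digits == d).sum() for d in range(1, 10)])
    benford = np.log10(1 + 1 / np.arange(1, 10))             # Benford probabilities
    expected = benford * observed.sum()

    stat, pval = chisquare(f_obs=observed, f_exp=expected)
    print("chi-square:", stat, "p-value:", pval)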

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2025-01-05

Re-examining Granger Causality from Causal Bayesian Networks Perspective

Authors: S. A. Adedayo

Characterizing cause-effect relationships in complex systems could be
critical to understanding these systems. For many, Granger causality (GC)
remains a computational tool of choice to identify causal relations in time
series data. Like other causal discovery tools, GC has limitations and has been
criticized as a non-causal framework. Here, we addressed one of the recurring
criticisms of GC by endowing it with proper causal interpretation. This was
achieved by analyzing GC from Reichenbach's Common Cause Principles (RCCPs) and
causal Bayesian networks (CBNs) lenses. We showed theoretically and graphically
that this reformulation endowed GC with a proper causal interpretation under
certain assumptions and achieved satisfactory results on simulation.

arXiv link: http://arxiv.org/abs/2501.02672v1

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2025-01-05

Revealed Social Networks

Authors: Christopher P. Chambers, Yusufcan Masatlioglu, Christopher Turansick

The linear-in-means model is the standard empirical model of peer effects.
Using choice data and exogenous group variation, we first develop a revealed
preference style test for the linear-in-means model. This test is formulated as
a linear program and can be interpreted as a no money pump condition with an
additional incentive compatibility constraint. We then study the identification
properties of the linear-in-means model. A key takeaway from our analysis is
that there is a close relationship between the dimension of the outcome
variable and the identifiability of the model. Importantly, when the outcome
variable is one-dimensional, failures of identification are generic. On the
other hand, when the outcome variable is multi-dimensional, we provide natural
conditions under which identification is generic.

arXiv link: http://arxiv.org/abs/2501.02609v2

Econometrics arXiv updated paper (originally submitted: 2025-01-04)

Estimating Discrete Choice Demand Models with Sparse Market-Product Shocks

Authors: Zhentong Lu, Kenichi Shimizu

We propose a new approach to estimating the random coefficient logit demand
model for differentiated products when the vector of market-product level
shocks is sparse. Assuming sparsity, we establish nonparametric identification
of the distribution of random coefficients and demand shocks under mild
conditions. Then we develop a Bayesian procedure, which exploits the sparsity
structure using shrinkage priors, to conduct inference about the model
parameters and counterfactual quantities. Comparing to the standard BLP (Berry,
Levinsohn, & Pakes, 1995) method, our approach does not require demand
inversion or instrumental variables (IVs), thus provides a compelling
alternative when IVs are not available or their validity is questionable. Monte
Carlo simulations validate our theoretical findings and demonstrate the
effectiveness of our approach, while empirical applications reveal evidence of
sparse demand shocks in well-known datasets.

arXiv link: http://arxiv.org/abs/2501.02381v2

Econometrics arXiv paper, submitted: 2025-01-04

Prediction with Differential Covariate Classification: Illustrated by Racial/Ethnic Classification in Medical Risk Assessment

Authors: Charles F. Manski, John Mullahy, Atheendar S. Venkataramani

A common practice in evidence-based decision-making uses estimates of
conditional probabilities P(y|x) obtained from research studies to predict
outcomes y on the basis of observed covariates x. Given this information,
decisions are then based on the predicted outcomes. Researchers commonly assume
that the predictors used in the generation of the evidence are the same as
those used in applying the evidence: i.e., the meaning of x in the two
circumstances is the same. This may not be the case in real-world settings.
Across a wide-range of settings, ranging from clinical practice or education
policy, demographic attributes (e.g., age, race, ethnicity) are often
classified differently in research studies than in decision settings. This
paper studies identification in such settings. We propose a formal framework
for prediction with what we term differential covariate classification (DCC).
Using this framework, we analyze partial identification of probabilistic
predictions and assess how various assumptions influence the identification
regions. We apply the findings to a range of settings, focusing mainly on
differential classification of individuals' race and ethnicity in clinical
medicine. We find that bounds on P(y|x) can be wide, and the information needed
to narrow them is available only in special cases. These findings highlight an
important problem in using evidence in decision making, a problem that has not
yet been fully appreciated in debates on classification in public policy and
medicine.

arXiv link: http://arxiv.org/abs/2501.02318v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2025-01-04

Efficient estimation of average treatment effects with unmeasured confounding and proxies

Authors: Chunrong Ai, Jiawei Shan

One approach to estimating the average treatment effect in binary treatment
with unmeasured confounding is the proximal causal inference, which assumes the
availability of outcome and treatment confounding proxies. The key identifying
result relies on the existence of a so-called bridge function. A parametric
specification of the bridge function is usually postulated and estimated using
standard techniques. The estimated bridge function is then plugged in to
estimate the average treatment effect. This approach may have two efficiency
losses. First, the bridge function may not be efficiently estimated since it
solves an integral equation. Second, the sequential procedure may fail to
account for the correlation between the two steps. This paper proposes to
approximate the integral equation with increasing moment restrictions and
jointly estimate the bridge function and the average treatment effect. Under
sufficient conditions, we show that the proposed estimator is efficient. To
assist implementation, we propose a data-driven procedure for selecting the
tuning parameter (i.e., number of moment restrictions). Simulation studies
reveal that the proposed method performs well in finite samples, and
application to the right heart catheterization dataset from the SUPPORT study
demonstrates its practical value.

arXiv link: http://arxiv.org/abs/2501.02214v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2025-01-03

Grid-level impacts of renewable energy on thermal generation: efficiency, emissions and flexibility

Authors: Dhruv Suri, Jacques de Chalendar, Ines Azevedo

Wind and solar generation constitute an increasing share of electricity
supply globally. We find that this leads to shifts in the operational dynamics
of thermal power plants. Using fixed effects panel regression across seven
major U.S. balancing authorities, we analyze the impact of renewable generation
on coal, natural gas combined cycle plants, and natural gas combustion
turbines. Wind generation consistently displaces thermal output, while effects
from solar vary significantly by region, achieving substantial displacement in
areas with high solar penetration such as the California Independent System
Operator but limited impacts in coal-reliant grids such as the Midcontinent
Independent System Operator. Renewable energy sources effectively reduce carbon
dioxide emissions in regions with flexible thermal plants, achieving
displacement effectiveness as high as 102 percent in the California Independent
System Operator and the Electric Reliability Council of Texas. However, in
coal-heavy areas such as the Midcontinent Independent System Operator and the
Pennsylvania New Jersey Maryland Interconnection, inefficiencies from ramping
and cycling reduce carbon dioxide displacement to as low as 17 percent and
often lead to elevated nitrogen oxides and
sulfur dioxide emissions. These findings underscore the critical role of grid
design, fuel mix, and operational flexibility in shaping the emissions benefits
of renewables. Targeted interventions, including retrofitting high emitting
plants and deploying energy storage, are essential to maximize emissions
reductions and support the decarbonization of electricity systems.

arXiv link: http://arxiv.org/abs/2501.01954v2

Econometrics arXiv cross-link from q-fin.GN (q-fin.GN), submitted: 2025-01-03

Quantifying A Firm's AI Engagement: Constructing Objective, Data-Driven, AI Stock Indices Using 10-K Filings

Authors: Lennart Ante, Aman Saggu

Following an analysis of existing AI-related exchange-traded funds (ETFs), we
reveal the selection criteria for determining which stocks qualify as
AI-related are often opaque and rely on vague phrases and subjective judgments.
This paper proposes a new, objective, data-driven approach using natural
language processing (NLP) techniques to classify AI stocks by analyzing annual
10-K filings from 3,395 NASDAQ-listed firms between 2011 and 2023. This
analysis quantifies each company's engagement with AI through binary indicators
and weighted AI scores based on the frequency and context of AI-related terms.
Using these metrics, we construct four AI stock indices: the Equally Weighted
AI Index (AII), the Size-Weighted AI Index (SAII), and two Time-Discounted AI
Indices (TAII05 and TAII5X), each offering a different perspective on AI investment.
We validate our methodology through an event study on the launch of OpenAI's
ChatGPT, demonstrating that companies with higher AI engagement saw
significantly greater positive abnormal returns, with analyses supporting the
predictive power of our AI measures. Our indices perform on par with or surpass
14 existing AI-themed ETFs and the Nasdaq Composite Index in risk-return
profiles, market responsiveness, and overall performance, achieving higher
average daily returns and risk-adjusted metrics without increased volatility.
These results suggest our NLP-based approach offers a reliable,
market-responsive, and cost-effective alternative to existing AI-related ETF
products. Our innovative methodology can also guide investors, asset managers,
and policymakers in using corporate data to construct other thematic
portfolios, contributing to a more transparent, data-driven, and competitive
approach.

arXiv link: http://arxiv.org/abs/2501.01763v1
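
A minimal sketch of the term-frequency idea: count AI-related phrases in filing
text and convert the counts into a per-1,000-words score plus a binary
indicator. The term list, toy filings, and scaling are illustrative
assumptions, not the paper's classification scheme.

    # Minimal sketch of a keyword-based AI-engagement score for filing text.
    import re

    AI_TERMS = ["artificial intelligence", "machine learning", "deep learning",
                "neural network", "natural language processing"]

    def ai_score(text: str) -> float:
        """Count AI-related term occurrences per 1,000 words of a filing."""
        words = len(text.split())
        hits = sum(len(re.findall(term, text.lower())) for term in AI_TERMS)
        return 1000.0 * hits / max(words, 1)

    filings = {   # toy stand-ins for 10-K text
        "FirmA": "We invest heavily in artificial intelligence and machine learning tools.",
        "FirmB": "Our retail operations expanded across several new regions this year.",
    }
    scores = {firm: ai_score(text) for firm, text in filings.items()}
    binary_indicator = {firm: int(score > 0) for firm, score in scores.items()}
    print(scores, binary_indicator)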

Econometrics arXiv paper, submitted: 2025-01-03

Instrumental Variables with Time-Varying Exposure: New Estimates of Revascularization Effects on Quality of Life

Authors: Joshua D. Angrist, Bruno Ferman, Carol Gao, Peter Hull, Otavio L. Tecchio, Robert W. Yeh

The ISCHEMIA Trial randomly assigned patients with ischemic heart disease to
an invasive treatment strategy centered on revascularization with a control
group assigned non-invasive medical therapy. As is common in such “strategy
trials,” many participants assigned to treatment remained untreated while many
assigned to control crossed over into treatment. Intention-to-treat (ITT)
analyses of strategy trials preserve randomization-based comparisons, but ITT
effects are diluted by non-compliance. Conventional per-protocol analyses that
condition on treatment received are likely biased by discarding random
assignment. In trials where compliance choices are made shortly after
assignment, instrumental variables (IV) methods solve both problems --
recovering an undiluted average causal effect of treatment for treated subjects
who comply with trial protocol. In ISCHEMIA, however, some controls were
revascularized as long as five years after random assignment. This paper
extends the IV framework for strategy trials, allowing for such dynamic
non-random compliance behavior. IV estimates of long-run revascularization
effects on quality of life are markedly larger than previously reported ITT and
per-protocol estimates. We also show how to estimate complier characteristics
in a dynamic-treatment setting. These estimates reveal increasing selection
bias in naive time-varying per-protocol estimates of revascularization effects.
Compliers have baseline health similar to that of the study population, while
control-group crossovers are far sicker.

arXiv link: http://arxiv.org/abs/2501.01623v1
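
For readers unfamiliar with the static building block the paper extends, the
sketch below computes the classic Wald/IV estimate in a simulated strategy
trial with compliers, never-takers, and always-takers; the dynamic-compliance
extension that is the paper's contribution is not reproduced.

    # Minimal sketch of the static IV (Wald) estimator: random assignment Z
    # instruments treatment received D. Simulated data with a true effect of 1.5.
    import numpy as np

    rng = np.random.default_rng(6)
    n = 10000
    Z = rng.integers(0, 2, n)                                  # random assignment
    types = rng.choice(["complier", "never", "always"], size=n, p=[0.6, 0.25, 0.15])
    D = np.where(types == "complier", Z, np.where(types == "always", 1, 0))
    Y = 1.5 * D + rng.normal(0, 1, n)                          # outcome

    # Wald estimator = ITT effect on Y divided by ITT effect on D
    itt_y = Y[Z == 1].mean() - Y[Z == 0].mean()
    itt_d = D[Z == 1].mean() - D[Z == 0].mean()
    print("ITT:", itt_y, "first stage:", itt_d, "IV (LATE):", itt_y / itt_d)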

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2025-01-01

HMM-LSTM Fusion Model for Economic Forecasting

Authors: Guhan Sivakumar

This paper explores the application of Hidden Markov Models (HMM) and Long
Short-Term Memory (LSTM) neural networks for economic forecasting, focusing on
predicting CPI inflation rates. The study explores a new approach that
integrates HMM-derived hidden states and means as additional features for LSTM
modeling, aiming to enhance the interpretability and predictive performance of
the models. The research begins with data collection and preprocessing,
followed by the implementation of the HMM to identify hidden states
representing distinct economic conditions. Subsequently, LSTM models are
trained using the original and augmented data sets, allowing for comparative
analysis and evaluation. The results demonstrate that incorporating HMM-derived
data improves the predictive accuracy of LSTM models, particularly in capturing
complex temporal patterns and mitigating the impact of volatile economic
conditions. Additionally, the paper discusses the implementation of Integrated
Gradients for model interpretability and provides insights into the economic
dynamics reflected in the forecasting outcomes.

arXiv link: http://arxiv.org/abs/2501.02002v1
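
A minimal sketch of the feature-augmentation step described above: fit a
Gaussian HMM (via the hmmlearn package, assumed installed) to an inflation-like
series and append the decoded regime and its regime mean as extra features; the
LSTM stage is omitted.

    # Minimal sketch: decode HMM hidden states from a simulated inflation series
    # and attach them as additional regressors for a downstream sequence model.
    import numpy as np
    import pandas as pd
    from hmmlearn.hmm import GaussianHMM

    rng = np.random.default_rng(7)
    low = rng.normal(2.0, 0.3, 120)     # "low-inflation" regime
    high = rng.normal(6.0, 0.8, 60)     # "high-inflation" regime
    cpi = np.concatenate([low, high, low[:60]])

    X = cpi.reshape(-1, 1)
    hmm = GaussianHMM(n_components=2, covariance_type="full", n_iter=200, random_state=0)
    hmm.fit(X)
    states = hmm.predict(X)

    features = pd.DataFrame({
        "cpi": cpi,
        "hmm_state": states,
        "hmm_state_mean": hmm.means_[states, 0],   # regime-specific mean as a feature
    })
    print(features.tail())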

Econometrics arXiv paper, submitted: 2025-01-01

The Impact of Socio-Economic Challenges and Technological Progress on Economic Inequality: An Estimation with the Perelman Model and Ricci Flow Methods

Authors: Davit Gondauri

The article examines the impact of 16 key parameters of the Georgian economy
on economic inequality, using the Perelman model and Ricci flow mathematical
methods. The study aims to conduct a deep analysis of the impact of
socio-economic challenges and technological progress on the dynamics of the
Gini coefficient. The article examines the following parameters: income
distribution, productivity (GDP per hour), unemployment rate, investment rate,
inflation rate, migration (net negative), education level, social mobility,
trade infrastructure, capital flows, innovative activities, access to
healthcare, fiscal policy (budget deficit), international trade (turnover
relative to GDP), social protection programs, and technological access. The
results of the study confirm that technological innovations and social
protection programs have a positive impact on reducing inequality. Productivity
growth, improving the quality of education, and strengthening R&D investments
increase the possibility of inclusive development. Sensitivity analysis shows
that social mobility and infrastructure are important factors that affect
economic stability. The accuracy of the model is confirmed by high R^2 values
(80-90%) and statistically significant Z-statistics (p < 0.05). The study
uses Ricci flow methods, which allow for a geometric analysis of the
transformation of economic parameters in time and space. Recommendations
include the strategic introduction of technological progress, the expansion of
social protection programs, improving the quality of education, and encouraging
international trade, which will contribute to economic sustainability and
reduce inequality. The article highlights multifaceted approaches that combine
technological innovation and responses to socio-economic challenges to ensure
sustainable and inclusive economic development.

arXiv link: http://arxiv.org/abs/2501.00800v1

Econometrics arXiv updated paper (originally submitted: 2024-12-31)

Copula Central Asymmetry of Equity Portfolios

Authors: Lorenzo Frattarolo

Financial crises are usually associated with increased cross-sectional
dependence between asset returns, causing asymmetry between the lower and upper
tail of return distribution. The detection of asymmetric dependence is now
understood to be essential for market supervision, risk management, and
portfolio allocation. I propose a non-parametric test procedure for the
hypothesis of copula central symmetry based on the Cramér-von Mises distance
of the empirical copula and its survival counterpart, deriving the asymptotic
properties of the test under standard assumptions for stationary time series. I
use the powerful tie-break bootstrap, which, as the included simulation study
shows, allows me to detect asymmetries with up to 25 series and a number of
observations corresponding to one year of daily returns. Applying the procedure
to US portfolio returns separately for each year shows that the amount of
copula central asymmetry is time-varying and less present in the recent past.
Asymmetry is more critical in portfolios based on size and less in portfolios
based on book-to-market and momentum. In portfolios based on industry
classification, asymmetry is present during market downturns, coherently with
the financial contagion narrative.

arXiv link: http://arxiv.org/abs/2501.00634v2
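
A minimal bivariate sketch of the distance underlying the test: compare the
empirical copula with its survival counterpart at the sample points and sum
the squared differences. The simulated data, and the omission of the
multivariate extension and the tie-break bootstrap, are simplifications.

    # Minimal sketch of a Cramér-von Mises-type asymmetry distance between the
    # empirical copula and its survival counterpart for a bivariate sample.
    import numpy as np
    from scipy.stats import rankdata

    rng = np.random.default_rng(5)
    n = 500
    x = rng.standard_normal(n)
    y = 0.6 * x + 0.8 * rng.standard_normal(n)     # centrally symmetric dependence

    u = rankdata(x) / (n + 1)                      # pseudo-observations
    v = rankdata(y) / (n + 1)

    def emp_copula(uu, vv):
        return np.mean((u[None, :] <= uu[:, None]) & (v[None, :] <= vv[:, None]), axis=1)

    def survival_copula(uu, vv):
        return np.mean((u[None, :] > 1 - uu[:, None]) & (v[None, :] > 1 - vv[:, None]), axis=1)

    # evaluate the distance at the sample points themselves
    cvm = np.sum((emp_copula(u, v) - survival_copula(u, v)) ** 2)
    print("CvM-type asymmetry distance:", cvm)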

Econometrics arXiv paper, submitted: 2024-12-31

Panel Estimation of Taxable Income Elasticities with Heterogeneity and Endogenous Budget Sets

Authors: Soren Blomquist, Anil Kumar, Whitney K. Newey

This paper introduces an estimator for the average of heterogeneous
elasticities of taxable income (ETI), addressing key econometric challenges
posed by nonlinear budget sets. Building on an isoelastic utility framework, we
derive a linear-in-logs taxable income specification that incorporates the
entire budget set while allowing for individual-specific ETI and productivity
growth. To account for endogenous budget sets, we employ panel data and
estimate individual-specific ridge regressions, constructing a debiased average
of ridge coefficients to obtain the average ETI.

arXiv link: http://arxiv.org/abs/2501.00633v1

Econometrics arXiv paper, submitted: 2024-12-31

Regression discontinuity aggregation, with an application to the union effects on inequality

Authors: Kirill Borusyak, Matan Kolerman-Shemer

We extend the regression discontinuity (RD) design to settings where each
unit's treatment status is an average or aggregate across multiple
discontinuity events. Such situations arise in many studies where the outcome
is measured at a higher level of spatial or temporal aggregation (e.g., by
state with district-level discontinuities) or when spillovers from
discontinuity events are of interest. We propose two novel estimation
procedures - one at the level at which the outcome is measured and the other in
the sample of discontinuities - and show that both identify a local average
causal effect under continuity assumptions similar to those of standard RD
designs. We apply these ideas to study the effect of unionization on inequality
in the United States. Using credible variation from close unionization
elections at the establishment level, we show that a higher rate of newly
unionized workers in a state-by-industry cell reduces wage inequality within
the cell.

arXiv link: http://arxiv.org/abs/2501.00428v1

Econometrics arXiv paper, submitted: 2024-12-30

Causal Hangover Effects

Authors: Andreas Santucci, Eric Lax

It's not unreasonable to think that in-game sporting performance can be
affected partly by what takes place off the court. We can't observe what
happens between games directly. Instead, we proxy for the possibility of
athletes partying by looking at play following games in party cities. We are
interested to see if teams exhibit a decline in performance the day following a
game in a city with active nightlife; we call this a "hangover effect". Part of
the question is determining a reasonable way to measure levels of nightlife,
and correspondingly which cities are notorious for it; we colloquially refer to
such cities as "party cities". To carry out this study, we exploit data on
bookmaker spreads: the expected score differential between two teams after
conditioning on observable performance in past games and expectations about the
upcoming game. We expect a team to meet the spread half the time, since this is
one of the easiest ways for bookmakers to guarantee a profit. We construct a
model which attempts to estimate the causal effect of visiting a "party city"
on subsequent day performance as measured by the odds of beating the spread. In
particular, we only consider the hangover effect on games played back-to-back
within 24 hours of each other. To the extent that the odds of beating the
spread against the next-day opponent are uncorrelated with playing in a party
city the day before, which should be the case under an efficient betting
market, we have identification for our variable of interest. We find that visiting a city with
active nightlife the day prior to a game does have a statistically significant
negative effect on a team's likelihood of meeting bookmakers' expectations for
both NBA and MLB.

arXiv link: http://arxiv.org/abs/2412.21181v1

Econometrics arXiv cross-link from General Economics (econ.GN), submitted: 2024-12-30

Analyzing Country-Level Vaccination Rates and Determinants of Practical Capacity to Administer COVID-19 Vaccines

Authors: Sharika J. Hegde, Max T. M. Ng, Marcos Rios, Hani S. Mahmassani, Ying Chen, Karen Smilowitz

The COVID-19 vaccine development, manufacturing, transportation, and
administration proved an extreme logistics operation of global magnitude.
Global vaccination levels, however, remain a key concern in preventing the
emergence of new strains and minimizing the impact of the pandemic's disruption
of daily life. In this paper, country-level vaccination rates are analyzed
through a queuing framework to extract service rates that represent the
practical capacity of a country to administer vaccines. These rates are further
characterized through regression and interpretable machine learning methods
with country-level demographic, governmental, and socio-economic variates.
Model results show that participation in multi-governmental collaborations such
as COVAX may improve the ability to vaccinate. Similarly, improved
transportation and accessibility variates such as roads per area for low-income
countries and rail lines per area for high-income countries can improve rates.
It was also found that for low-income countries specifically, improvements in
basic and health infrastructure (as measured through spending on healthcare,
number of doctors and hospital beds per 100k, population percent with access to
electricity, life expectancy, and vehicles per 1000 people) resulted in higher
vaccination rates. Of the high-income countries, those with larger 65-plus
populations struggled to vaccinate at high rates, indicating potential
accessibility issues for the elderly. This study finds that improving basic and
health infrastructure, focusing on accessibility in the last mile, particularly
for the elderly, and fostering global partnerships can improve logistical
operations of such a scale. Such structural impediments and inequities in
global health care must be addressed in preparation for future global public
health crises.

arXiv link: http://arxiv.org/abs/2501.01447v2

Econometrics arXiv paper, submitted: 2024-12-30

Econometric Analysis of Pandemic Disruption and Recovery Trajectory in the U.S. Rail Freight Industry

Authors: Max T. M. Ng, Hani S. Mahmassani, Joseph L. Schofer

To measure the impacts of the economic disruptions of the 2007-09 Great
Recession and the COVID-19 pandemic on U.S. rail and intermodal freight, this
paper uses time series analysis with the AutoRegressive Integrated Moving
Average (ARIMA) family of models and covariates to model intermodal and
commodity-specific rail freight volumes based on pre-disruption data. A
framework to construct scenarios and select parameters and variables is
demonstrated. By comparing actual freight volumes during the disruptions
against three counterfactual scenarios, Trend Continuation, Covariate-adapted
Trend Continuation, and Full Covariate-adapted Prediction, the characteristics
and differences in magnitude and timing between the two disruptions and their
effects across nine freight components are examined.
Results show that the estimated disruption impacts differ from those obtained
by simple comparison with pre-disruption levels or by year-on-year comparison,
depending on the structural trend and seasonal pattern. Recovery Pace Plots are introduced
to support comparison in recovery speeds across freight components. Accounting
for economic variables helps improve model fitness. It also enables evaluation
of the change in association between freight volumes and covariates, where
intermodal freight was found to respond more slowly during the pandemic,
potentially due to supply constraints.

arXiv link: http://arxiv.org/abs/2412.20669v1
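
A minimal sketch of the counterfactual logic: fit a seasonal ARIMA on
pre-disruption data, project a trend-continuation path over the disruption
window, and difference it against observed volumes. The simulated series and
the (1,1,1)x(1,0,0,12) orders are illustrative, not the paper's specifications.

    # Minimal sketch of an ARIMA-based counterfactual for a disrupted series.
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    rng = np.random.default_rng(8)
    idx = pd.date_range("2015-01-01", "2021-12-01", freq="MS")
    trend = np.linspace(100, 115, len(idx))
    seasonal = 5 * np.sin(2 * np.pi * (np.arange(len(idx)) % 12) / 12)
    volume = pd.Series(trend + seasonal + rng.normal(0, 1.5, len(idx)), index=idx)
    volume.loc["2020-03":] -= 12                 # simulated pandemic disruption

    pre = volume.loc[:"2020-02"]                 # pre-disruption training sample
    model = SARIMAX(pre, order=(1, 1, 1), seasonal_order=(1, 0, 0, 12)).fit(disp=False)
    steps = len(volume) - len(pre)
    counterfactual = model.get_forecast(steps=steps).predicted_mean

    impact = volume.loc["2020-03":].to_numpy() - counterfactual.to_numpy()
    print("estimated monthly impacts:", np.round(impact[:6], 1))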

Econometrics arXiv paper, submitted: 2024-12-29

Automated Demand Forecasting in small to medium-sized enterprises

Authors: Thomas Gaertner, Christoph Lippert, Stefan Konigorski

In response to the growing demand for accurate demand forecasts, this
research proposes a generalized automated sales forecasting pipeline tailored
for small- to medium-sized enterprises (SMEs). Unlike large corporations with
dedicated data scientists for sales forecasting, SMEs often lack such
resources. To address this, we developed a comprehensive forecasting pipeline
that automates time series sales forecasting, encompassing data preparation,
model training, and selection based on validation results.
The development included two main components: model preselection and the
forecasting pipeline. In the first phase, state-of-the-art methods were
evaluated on a showcase dataset, leading to the selection of ARIMA, SARIMAX,
Holt-Winters Exponential Smoothing, Regression Tree, Dilated Convolutional
Neural Networks, and Generalized Additive Models. An ensemble prediction of
these models was also included. Long-Short-Term Memory (LSTM) networks were
excluded due to suboptimal prediction accuracy, and Facebook Prophet was
omitted for compatibility reasons.
In the second phase, the proposed forecasting pipeline was tested with SMEs
in the food and electric industries, revealing variable model performance
across different companies. While one project-based company derived no benefit,
others achieved superior forecasts compared to naive estimators.
Our findings suggest that no single model is universally superior. Instead, a
diverse set of models, when integrated within an automated validation
framework, can significantly enhance forecasting accuracy for SMEs. These
results emphasize the importance of model diversity and automated validation in
addressing the unique needs of each business. This research contributes to the
field by providing SMEs access to state-of-the-art sales forecasting tools,
enabling data-driven decision-making and improving operational efficiency.

arXiv link: http://arxiv.org/abs/2412.20420v1

Econometrics arXiv updated paper (originally submitted: 2024-12-28)

Fitting Dynamically Misspecified Models: An Optimal Transportation Approach

Authors: Jean-Jacques Forneron, Zhongjun Qu

This paper considers filtering, parameter estimation, and testing for
potentially dynamically misspecified state-space models. When dynamics are
misspecified, filtered values of state variables often do not satisfy model
restrictions, making them hard to interpret, and parameter estimates may fail
to characterize the dynamics of filtered variables. To address this, a
sequential optimal transportation approach is used to generate a
model-consistent sample by mapping observations from a flexible reduced-form to
the structural conditional distribution iteratively. Filtered series from the
generated sample are model-consistent. Specializing to linear processes, a
closed-form Optimal Transport Filtering algorithm is derived. Minimizing the
discrepancy between generated and actual observations defines an Optimal
Transport Estimator. Its large sample properties are derived. A specification
test determines if the model can reproduce the sample path, or if the
discrepancy is statistically significant. Empirical applications to trend-cycle
decomposition, DSGE models, and affine term structure models illustrate the
methodology and the results.

arXiv link: http://arxiv.org/abs/2412.20204v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-12-28

Debiased Nonparametric Regression for Statistical Inference and Distributionally Robustness

Authors: Masahiro Kato

This study proposes a debiasing method for smooth nonparametric estimators.
While machine learning techniques such as random forests and neural networks
have demonstrated strong predictive performance, their theoretical properties
remain relatively underexplored. In particular, many modern algorithms lack
guarantees of pointwise and uniform risk convergence, as well as asymptotic
normality. These properties are essential for statistical inference and robust
estimation and have been well-established for classical methods such as
Nadaraya-Watson regression. To ensure these properties for various
nonparametric regression estimators, we introduce a model-free debiasing
method. By incorporating a correction term that estimates the conditional
expected residual of the original estimator, or equivalently, its estimation
error, into the initial nonparametric regression estimator, we obtain a
debiased estimator that satisfies pointwise and uniform risk convergence, along
with asymptotic normality, under mild smoothness conditions. These properties
facilitate statistical inference and enhance robustness to covariate shift,
making the method broadly applicable to a wide range of nonparametric
regression problems.

arXiv link: http://arxiv.org/abs/2412.20173v3
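
A minimal sketch of the debiasing recipe described above: take an initial
nonparametric fit, estimate its conditional expected residual with a second
smoother, and add the correction back. The choice of learners (random forest
plus k-nearest neighbors) and the use of in-sample residuals are illustrative
simplifications; in practice one would typically use sample splitting.

    # Minimal sketch: debias an initial nonparametric fit by adding an estimate
    # of its conditional expected residual E[y - fhat(X) | X].
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.neighbors import KNeighborsRegressor

    rng = np.random.default_rng(9)
    n = 3000
    X = rng.uniform(-2, 2, (n, 1))
    y = np.sin(3 * X[:, 0]) + 0.3 * rng.standard_normal(n)

    initial = RandomForestRegressor(n_estimators=200, min_samples_leaf=20,
                                    random_state=0).fit(X, y)
    residuals = y - initial.predict(X)

    # smooth the residuals over X and add the correction back to the initial fit
    correction = KNeighborsRegressor(n_neighbors=100).fit(X, residuals)
    x_grid = np.linspace(-2, 2, 5).reshape(-1, 1)
    debiased = initial.predict(x_grid) + correction.predict(x_grid)
    print(np.column_stack([x_grid[:, 0], debiased, np.sin(3 * x_grid[:, 0])]))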

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2024-12-27

Assets Forecasting with Feature Engineering and Transformation Methods for LightGBM

Authors: Konstantinos-Leonidas Bisdoulis

Fluctuations in the stock market rapidly shape the economic world and
consumer markets, impacting millions of individuals. Hence, accurately
forecasting it is essential for mitigating risks, including those associated
with inactivity. Although research shows that hybrid models of Deep Learning
(DL) and Machine Learning (ML) yield promising results, their computational
requirements often exceed the capabilities of average personal computers,
rendering them inaccessible to many. To address this challenge, in this
paper we optimize LightGBM (an efficient implementation of gradient-boosted
decision trees (GBDT)) for maximum performance, while maintaining low
computational requirements. We introduce novel feature engineering techniques
including indicator-price slope ratios and differences of close and open prices
divided by the corresponding 14-period Exponential Moving Average (EMA),
designed to capture market dynamics and enhance predictive accuracy.
Additionally, we test seven different feature and target variable
transformation methods, including returns, logarithmic returns, EMA ratios and
their standardized counterparts as well as EMA difference ratios, so as to
identify the most effective ones, weighing both efficiency and accuracy. The
results demonstrate that Log Returns, Returns, and EMA Difference Ratio constitute
the best target variable transformation methods, with EMA ratios having a lower
percentage of correct directional forecasts, and standardized versions of
target variable transformations requiring significantly more training time.
Moreover, the introduced features demonstrate high feature importance in
predictive performance across all target variable transformation methods. This
study highlights an accessible, computationally efficient approach to stock
market forecasting using LightGBM, making advanced forecasting techniques more
widely attainable.

arXiv link: http://arxiv.org/abs/2501.07580v1
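
A minimal sketch of this style of feature engineering with LightGBM (assumed
installed): a 14-period EMA, an EMA ratio, and a (close - open)/EMA feature
predicting next-period log returns, scored by the share of correct directional
forecasts. The synthetic prices and exact feature definitions are stand-ins,
not the paper's.

    # Minimal sketch: EMA-based features feeding a LightGBM regressor on
    # simulated prices; next-period log return is the target.
    import numpy as np
    import pandas as pd
    import lightgbm as lgb

    rng = np.random.default_rng(10)
    n = 1500
    close = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, n)))
    open_ = close * (1 + rng.normal(0, 0.003, n))
    df = pd.DataFrame({"open": open_, "close": close})

    df["ema14"] = df["close"].ewm(span=14, adjust=False).mean()
    df["ema_ratio"] = df["close"] / df["ema14"]
    df["co_over_ema"] = (df["close"] - df["open"]) / df["ema14"]
    df["target"] = np.log(df["close"]).diff().shift(-1)      # next-period log return
    df = df.dropna()

    features = ["ema_ratio", "co_over_ema"]
    split = int(0.8 * len(df))
    model = lgb.LGBMRegressor(n_estimators=300, learning_rate=0.05, verbose=-1)
    model.fit(df[features].iloc[:split], df["target"].iloc[:split])
    pred = model.predict(df[features].iloc[split:])
    direction_hit = np.mean(np.sign(pred) == np.sign(df["target"].iloc[split:]))
    print("correct directional forecasts:", round(float(direction_hit), 3))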

Econometrics arXiv updated paper (originally submitted: 2024-12-27)

Asymptotic Properties of the Maximum Likelihood Estimator for Markov-switching Observation-driven Models

Authors: Frederik Krabbe

A Markov-switching observation-driven model is a stochastic process
$((S_t,Y_t))_{t \in Z}$ where (i) $(S_t)_{t \in Z}$ is an
unobserved Markov process taking values in a finite set and (ii) $(Y_t)_{t \in
Z}$ is an observed process such that the conditional distribution of
$Y_t$ given all past $Y$'s and the current and all past $S$'s depends only on
all past $Y$'s and $S_t$. In this paper, we prove the consistency and
asymptotic normality of the maximum likelihood estimator for such models. As a
special case hereof, we give conditions under which the maximum likelihood
estimator for the widely applied Markov-switching generalised autoregressive
conditional heteroscedasticity model introduced by Haas et al. (2004b) is
consistent and asymptotically normal.

arXiv link: http://arxiv.org/abs/2412.19555v2

Econometrics arXiv cross-link from q-fin.CP (q-fin.CP), submitted: 2024-12-26

Sentiment trading with large language models

Authors: Kemal Kirtac, Guido Germano

We investigate the efficacy of large language models (LLMs) in sentiment
analysis of U.S. financial news and their potential in predicting stock market
returns. We analyze a dataset comprising 965,375 news articles that span from
January 1, 2010, to June 30, 2023; we focus on the performance of various LLMs,
including BERT, OPT, FINBERT, and the traditional Loughran-McDonald dictionary
model, which has been a dominant methodology in the finance literature. The
study documents a significant association between LLM scores and subsequent
daily stock returns. Specifically, OPT, which is a GPT-3 based LLM, shows the
highest accuracy in sentiment prediction with an accuracy of 74.4%, slightly
ahead of BERT (72.5%) and FINBERT (72.2%). In contrast, the Loughran-McDonald
dictionary model demonstrates considerably lower effectiveness with only 50.1%
accuracy. Regression analyses highlight a robust positive impact of OPT model
scores on next-day stock returns, with coefficients of 0.274 and 0.254 in
different model specifications. BERT and FINBERT also exhibit predictive
relevance, though to a lesser extent. Notably, we do not observe a significant
relationship between the Loughran-McDonald dictionary model scores and stock
returns, challenging the efficacy of this traditional method in the current
financial context. In portfolio performance, the long-short OPT strategy excels
with a Sharpe ratio of 3.05, compared to 2.11 for BERT and 2.07 for FINBERT
long-short strategies. Strategies based on the Loughran-McDonald dictionary
yield the lowest Sharpe ratio of 1.23. Our findings emphasize the superior
performance of advanced LLMs, especially OPT, in financial market prediction
and portfolio management, marking a significant shift in the landscape of
financial analysis tools with implications to financial regulation and policy
analysis.

arXiv link: http://arxiv.org/abs/2412.19245v1

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2024-12-25

Using Ordinal Voting to Compare the Utilitarian Welfare of a Status Quo and A Proposed Policy: A Simple Nonparametric Analysis

Authors: Charles F. Manski

The relationship of policy choice by majority voting and by maximization of
utilitarian welfare has long been discussed. I consider choice between a status
quo and a proposed policy when persons have interpersonally comparable cardinal
utilities taking values in a bounded interval, voting is compulsory, and each
person votes for a policy that maximizes utility. I show that knowledge of the
attained status quo welfare and the voting outcome yields an informative bound
on welfare with the proposed policy. The bound contains the value of status quo
welfare, so the better utilitarian policy is not known. The minimax-regret
decision and certain Bayes decisions choose the proposed policy if its vote
share exceeds the known value of status quo welfare. This procedure differs
from majority rule, which chooses the proposed policy if its vote share exceeds
1/2.

arXiv link: http://arxiv.org/abs/2412.18714v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2024-12-24

Conditional Influence Functions

Authors: Victor Chernozhukov, Whitney K. Newey, Vasilis Syrgkanis

There are many nonparametric objects of interest that are a function of a
conditional distribution. One important example is an average treatment effect
conditional on a subset of covariates. Many of these objects have a conditional
influence function that generalizes the classical influence function of a
functional of a (unconditional) distribution. Conditional influence functions
have important uses analogous to those of the classical influence function.
They can be used to construct Neyman orthogonal estimating equations for
conditional objects of interest that depend on high dimensional regressions.
They can be used to formulate local policy effects and describe the effect of
local misspecification on conditional objects of interest. We derive
conditional influence functions for functionals of conditional means and other
features of the conditional distribution of an outcome variable. We show how
these can be used for locally linear estimation of conditional objects of
interest. We give rate conditions for first step machine learners to have no
effect on asymptotic distributions of locally linear estimators. We also give a
general construction of Neyman orthogonal estimating equations for conditional
objects of interest.

arXiv link: http://arxiv.org/abs/2412.18080v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2024-12-23

Minimax Optimal Simple Regret in Two-Armed Best-Arm Identification

Authors: Masahiro Kato

This study investigates an asymptotically minimax optimal algorithm in the
two-armed fixed-budget best-arm identification (BAI) problem. Given two
treatment arms, the objective is to identify the arm with the highest expected
outcome through an adaptive experiment. We focus on the Neyman allocation,
where treatment arms are allocated following the ratio of their outcome
standard deviations. Our primary contribution is to prove the minimax
optimality of the Neyman allocation for the simple regret, defined as the
difference between the expected outcomes of the true best arm and the estimated
best arm. Specifically, we first derive a minimax lower bound for the expected
simple regret, which characterizes the worst-case performance achievable under
the location-shift distributions, including Gaussian distributions. We then
show that the simple regret of the Neyman allocation asymptotically matches
this lower bound, including the constant term, not just the rate in terms of
the sample size, under the worst-case distribution. Notably, our optimality
result holds without imposing locality restrictions on the distribution, such
as the local asymptotic normality. Furthermore, we demonstrate that the Neyman
allocation reduces to the uniform allocation, i.e., the standard randomized
controlled trial, under Bernoulli distributions.
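
A minimal sketch of the Neyman allocation described above (illustrative, not the
paper's algorithm): sample shares are proportional to the outcome standard deviations
of the two arms.

    def neyman_allocation(sd0: float, sd1: float) -> tuple[float, float]:
        # Allocate each arm in proportion to its outcome standard deviation.
        total = sd0 + sd1
        return sd0 / total, sd1 / total

    print(neyman_allocation(1.0, 2.0))  # (1/3, 2/3)
    print(neyman_allocation(1.0, 1.0))  # (0.5, 0.5): the uniform allocation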

arXiv link: http://arxiv.org/abs/2412.17753v2

Econometrics arXiv paper, submitted: 2024-12-23

A large non-Gaussian structural VAR with application to Monetary Policy

Authors: Jan Prüser

We propose a large structural VAR which is identified by higher moments
without the need to impose economically motivated restrictions. The model
scales well to higher dimensions, allowing the inclusion of a larger number of
variables. We develop an efficient Gibbs sampler to estimate the model. We also
present an estimator of the deviance information criterion to facilitate model
comparison. Finally, we discuss how economically motivated restrictions can be
added to the model. Experiments with artificial data show that the model
possesses good estimation properties. Using real data we highlight the benefits
of including more variables in the structural analysis. Specifically, we
identify a monetary policy shock and provide empirical evidence that prices and
economic output respond with a large delay to the monetary policy shock.

arXiv link: http://arxiv.org/abs/2412.17598v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2024-12-23

A Necessary and Sufficient Condition for Size Controllability of Heteroskedasticity Robust Test Statistics

Authors: Benedikt M. Pötscher, David Preinerstorfer

We revisit size controllability results in P\"otscher and Preinerstorfer
(2025) concerning heteroskedasticity robust test statistics in regression
models. For the special, but important, case of testing a single restriction
(e.g., a zero restriction on a single coefficient), we provide a necessary and
sufficient condition for size controllability, whereas the condition in
P\"otscher and Preinerstorfer (2025) is, in general, only sufficient (even in
the case of testing a single restriction).

arXiv link: http://arxiv.org/abs/2412.17470v2

Econometrics arXiv paper, submitted: 2024-12-23

Advanced Models for Hourly Marginal CO2 Emission Factor Estimation: A Synergy between Fundamental and Statistical Approaches

Authors: Souhir Ben Amor, Smaranda Sgarciu, Taimyra BatzLineiro, Felix Muesgens

Global warming is caused by increasing concentrations of greenhouse gases,
particularly carbon dioxide (CO2). A metric used to quantify the change in CO2
emissions is the marginal emission factor, defined as the marginal change in
CO2 emissions resulting from a marginal change in electricity demand over a
specified period. This paper aims to present two methodologies to estimate the
marginal emission factor in a decarbonized electricity system with high
temporal resolution. First, we present an energy systems model that
incrementally calculates the marginal emission factors. Second, we examine a
Markov Switching Dynamic Regression model, a statistical model designed to
estimate marginal emission factors faster and use an incremental marginal
emission factor as a benchmark to assess its precision. For the German
electricity market, we estimate the marginal emission factor time series
historically (2019, 2020) using Agora Energiewende data and for the future (2025,
2030, and 2040) using estimated energy system data. The results indicate that
the Markov Switching Dynamic Regression model is more accurate in estimating
marginal emission factors than the Dynamic Linear Regression models, which are
frequently used in the literature. Hence, the Markov Switching Dynamic
Regression model is a simpler alternative to the computationally intensive
incremental marginal emissions factor, especially when short-term marginal
emissions factor estimation is needed. The results of the marginal emission
factor estimation are applied to an exemplary low-emission vehicle charging
scenario to estimate CO2 savings by shifting the charge hours to those
corresponding to the lower marginal emissions factor. By implementing this
emission-minimized charging approach, an average reduction of 31% in the
marginal emission factor was achieved over the five years considered.
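
The sketch below is a generic two-regime Markov switching regression in statsmodels
on simulated data; the regime-specific slope on the demand increment plays the role
of a regime-dependent marginal emission factor. It is an assumption-laden
illustration, not the paper's model or data.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    demand_change = rng.normal(size=500)                      # hypothetical demand increments
    emission_change = 0.4 * demand_change + rng.normal(scale=0.1, size=500)

    # Two-regime Markov switching regression with regime-specific variances;
    # the slope in each regime is read as a regime-dependent marginal emission factor.
    model = sm.tsa.MarkovRegression(emission_change, k_regimes=2,
                                    exog=demand_change, switching_variance=True)
    result = model.fit()
    print(result.params)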

arXiv link: http://arxiv.org/abs/2412.17379v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-12-23

Bayesian penalized empirical likelihood and Markov Chain Monte Carlo sampling

Authors: Jinyuan Chang, Cheng Yong Tang, Yuanzheng Zhu

In this study, we introduce a novel methodological framework called Bayesian
Penalized Empirical Likelihood (BPEL), designed to address the computational
challenges inherent in empirical likelihood (EL) approaches. Our approach has
two primary objectives: (i) to enhance the inherent flexibility of EL in
accommodating diverse model conditions, and (ii) to facilitate the use of
well-established Markov Chain Monte Carlo (MCMC) sampling schemes as a
convenient alternative to the complex optimization typically required for
statistical inference using EL. To achieve the first objective, we propose a
penalized approach that regularizes the Lagrange multipliers, significantly
reducing the dimensionality of the problem while accommodating a comprehensive
set of model conditions. For the second objective, our study designs and
thoroughly investigates two popular sampling schemes within the BPEL context.
We demonstrate that the BPEL framework is highly flexible and efficient,
enhancing the adaptability and practicality of EL methods. Our study highlights
the practical advantages of using sampling techniques over traditional
optimization methods for EL problems, showing rapid convergence to the global
optima of posterior distributions and ensuring the effective resolution of
complex statistical inference challenges.

arXiv link: http://arxiv.org/abs/2412.17354v3

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2024-12-22

Gaussian and Bootstrap Approximation for Matching-based Average Treatment Effect Estimators

Authors: Zhaoyang Shi, Chinmoy Bhattacharjee, Krishnakumar Balasubramanian, Wolfgang Polonik

We establish Gaussian approximation bounds for covariate and
rank-matching-based Average Treatment Effect (ATE) estimators. By analyzing
these estimators through the lens of stabilization theory, we employ the
Malliavin-Stein method to derive our results. Our bounds precisely quantify the
impact of key problem parameters, including the number of matches and treatment
balance, on the accuracy of the Gaussian approximation. Additionally, we
develop multiplier bootstrap procedures to estimate the limiting distribution
in a fully data-driven manner, and we leverage the derived Gaussian
approximation results to further obtain bootstrap approximation bounds. Our
work not only introduces a novel theoretical framework for commonly used ATE
estimators, but also provides data-driven methods for constructing
non-asymptotically valid confidence intervals.

arXiv link: http://arxiv.org/abs/2412.17181v1

Econometrics arXiv cross-link from math.OC (math.OC), submitted: 2024-12-22

Competitive Facility Location with Market Expansion and Customer-centric Objective

Authors: Cuong Le, Tien Mai, Ngan Ha Duong, Minh Hoang Ha

We study a competitive facility location problem, where customer behavior is
modeled and predicted using a discrete choice random utility model. The goal is
to strategically place new facilities to maximize the overall captured customer
demand in a competitive marketplace. In this work, we introduce two novel
considerations. First, the total customer demand in the market is not fixed but
is modeled as an increasing function of the customers' total utilities. Second,
we incorporate a new term into the objective function, aiming to balance the
firm's benefits and customer satisfaction. Our new formulation exhibits a
highly nonlinear structure and is not directly solved by existing approaches.
To address this, we first demonstrate that, under a concave market expansion
function, the objective function is concave and submodular, allowing for a
$(1-1/e)$ approximation solution by a simple polynomial-time greedy algorithm.
We then develop a new method, called Inner-approximation, which enables us to
approximate the mixed-integer nonlinear problem (MINLP), with arbitrary
precision, by an MILP without introducing additional integer variables. We
further demonstrate that our inner-approximation method consistently yields
lower approximations than the outer-approximation methods typically used in the
literature. Moreover, we extend our settings by considering a general
(non-concave) market-expansion function and show that the Inner-approximation
mechanism enables us to approximate the resulting MINLP, with arbitrary
precision, by an MILP. To further enhance this MILP, we show how to
significantly reduce the number of additional binary variables by leveraging
concave areas of the objective function. Extensive experiments demonstrate the
efficiency of our approaches.
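
A minimal sketch of the polynomial-time greedy heuristic referenced above, assuming
`objective` evaluates the (concave, submodular) captured-demand objective for a set
of candidate locations; it is a generic (1-1/e) greedy routine, not the paper's
implementation.

    def greedy_facility_selection(candidates: set, objective, budget: int) -> set:
        # Repeatedly add the location with the largest marginal gain in the objective.
        selected: set = set()
        for _ in range(budget):
            remaining = candidates - selected
            if not remaining:
                break
            best = max(remaining,
                       key=lambda c: objective(selected | {c}) - objective(selected))
            selected.add(best)
        return selected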

arXiv link: http://arxiv.org/abs/2412.17021v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-12-21

Sharp Results for Hypothesis Testing with Risk-Sensitive Agents

Authors: Flora C. Shi, Stephen Bates, Martin J. Wainwright

Statistical protocols are often used for decision-making involving multiple
parties, each with their own incentives, private information, and ability to
influence the distributional properties of the data. We study a game-theoretic
version of hypothesis testing in which a statistician, also known as a
principal, interacts with strategic agents that can generate data. The
statistician seeks to design a testing protocol with controlled error, while
the data-generating agents, guided by their utility and prior information,
choose whether or not to opt in based on expected utility maximization. This
strategic behavior affects the data observed by the statistician and,
consequently, the associated testing error. We analyze this problem for general
concave and monotonic utility functions and prove an upper bound on the Bayes
false discovery rate (FDR). Underlying this bound is a form of prior
elicitation: we show how an agent's choice to opt in implies a certain upper
bound on their prior null probability. Our FDR bound is unimprovable in a
strong sense, achieving equality at a single point for an individual agent and
at any countable number of points for a population of agents. We also
demonstrate that our testing protocols exhibit a desirable maximin property
when the principal's utility is considered. To illustrate the qualitative
predictions of our theory, we examine the effects of risk aversion, reward
stochasticity, and signal-to-noise ratio, as well as the implications for the
Food and Drug Administration's testing protocols.

arXiv link: http://arxiv.org/abs/2412.16452v1

Econometrics arXiv updated paper (originally submitted: 2024-12-20)

Counting Defiers: A Design-Based Model of an Experiment Can Reveal Evidence Beyond the Average Effect

Authors: Neil Christy, Amanda Ellen Kowalski

Leveraging structure from the randomization process, a design-based model of
an experiment with a binary intervention and outcome can reveal evidence beyond
the average effect without additional data. Our proposed statistical decision
rule yields a design-based maximum likelihood estimate (MLE) of the joint
distribution of potential outcomes in intervention and control, specified by
the numbers of always takers, compliers, defiers, and never takers in the
sample. With a visualization, we explain why the likelihood varies with the
number of defiers within the Frechet bounds determined by the estimated
marginal distributions. We illustrate how the MLE varies with all possible data
in samples of 50 and 200: when the estimated average effect is positive, the
MLE includes defiers if takeup is below half in control and above half in
intervention, unless takeup is zero in control or full in intervention. Under
optimality conditions, for increasing sample sizes in which exhaustive grid
search is possible, our rule's performance increases relative to a rule that
places equal probability on all numbers of defiers within the estimated Frechet
bounds. We offer insights into effect heterogeneity in two published
experiments with positive, statistically significant average effects on takeup
of desired health behaviors and plausible defiers. Our 95% smallest credible
sets for defiers include zero and the estimated upper Frechet bound,
demonstrating that evidence is weak. Yet, our rule yields no defiers in one
experiment. In the other, our rule yields the estimated upper Frechet bound on
defiers -- a count representing over 18% of the sample.
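
A minimal sketch of the Frechet bounds referenced above, taking as given the
estimated marginal takeup rates in control (p0) and intervention (p1); it is
illustrative only and not the paper's estimator or decision rule.

    def defier_frechet_bounds(p0: float, p1: float) -> tuple[float, float]:
        # Defiers take up in control but not in intervention, so their share is
        # bounded by the Frechet inequalities for the joint of the two margins.
        lower = max(0.0, p0 - p1)
        upper = min(p0, 1.0 - p1)
        return lower, upper

    print(defier_frechet_bounds(0.3, 0.6))  # (0.0, 0.3): zero defiers cannot be ruled out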

arXiv link: http://arxiv.org/abs/2412.16352v3

Econometrics arXiv updated paper (originally submitted: 2024-12-19)

Testing linearity of spatial interaction functions à la Ramsey

Authors: Abhimanyu Gupta, Jungyoon Lee, Francesca Rossi

We propose a computationally straightforward test for the linearity of a
spatial interaction function. Such functions arise commonly, either as
practitioner imposed specifications or due to optimizing behaviour by agents.
Our conditional heteroskedasticity robust test is nonparametric, but based on
the Lagrange Multiplier principle and reminiscent of the Ramsey RESET approach.
This entails estimation only under the null hypothesis, which yields an easy to
estimate linear spatial autoregressive model. Monte Carlo simulations show
excellent size control and power. An empirical study with Finnish data
illustrates the test's practical usefulness, shedding light on debates on the
presence of tax competition among neighbouring municipalities.

arXiv link: http://arxiv.org/abs/2412.14778v2

Econometrics arXiv updated paper (originally submitted: 2024-12-19)

Good Controls Gone Bad: Difference-in-Differences with Covariates

Authors: Sunny Karim, Matthew D. Webb

This paper introduces the two-way common causal covariates (CCC) assumption,
which is necessary to get an unbiased estimate of the ATT when using
time-varying covariates in existing Difference-in-Differences methods. The
two-way CCC assumption implies that the effect of the covariates remains the
same between groups and across time periods. This assumption has been implied
in previous literature, but has not been explicitly addressed. Through
theoretical proofs and a Monte Carlo simulation study, we show that the
standard TWFE and the CS-DID estimators are biased when the two-way CCC
assumption is violated. We propose a new estimator called the Intersection
Difference-in-differences (DID-INT) which can provide an unbiased estimate of
the ATT under two-way CCC violations. DID-INT can also identify the ATT under
heterogeneous treatment effects and with staggered treatment rollout. The
estimator relies on parallel trends of the residuals of the outcome variable,
after appropriately adjusting for covariates. This covariate residualization
can recover parallel trends that are hidden with conventional estimators.

arXiv link: http://arxiv.org/abs/2412.14447v2

Econometrics arXiv updated paper (originally submitted: 2024-12-18)

An Analysis of the Relationship Between the Characteristics of Innovative Consumers and the Degree of Serious Leisure in User Innovation

Authors: Taichi Abe, Yasunobu Morita

This study examines the relationship between the concept of serious leisure
and user innovation. We adopted the characteristics of innovative consumers
identified by Luthje (2004), namely product use experience, information exchange,
and new product adoption speed, to analyze their correlation with serious leisure
engagement. The analysis utilized consumer behavior survey data from the
"Marketing Analysis Contest 2023" sponsored by Nomura Research Institute,
examining the relationship between innovative consumer characteristics and the
degree of serious leisure (Serious Leisure Inventory and Measure: SLIM). Since
the contest data did not directly measure innovative consumer characteristics
or serious leisure engagement, we established alternative variables for
quantitative analysis. The results showed that the SLIM alternative variable
had positive correlations with diverse product experiences and early adoption
of new products. However, no clear relationship was found with information
exchange among consumers. These findings suggest that serious leisure practice
may serve as a potential antecedent to user innovation. The leisure career
perspective of the serious leisure concept may capture the motivations of user
innovators that Okada and Nishikawa (2019) identified.

arXiv link: http://arxiv.org/abs/2412.13556v2

Econometrics arXiv paper, submitted: 2024-12-17

Dual Interpretation of Machine Learning Forecasts

Authors: Philippe Goulet Coulombe, Maximilian Goebel, Karin Klieber

Machine learning predictions are typically interpreted as the sum of
contributions of predictors. Yet, each out-of-sample prediction can also be
expressed as a linear combination of in-sample values of the predicted
variable, with weights corresponding to pairwise proximity scores between
current and past economic events. While this dual route leads nowhere in some
contexts (e.g., large cross-sectional datasets), it provides sparser
interpretations in settings with many regressors and little training data, such as
macroeconomic forecasting. In this case, the sequence of contributions can be
visualized as a time series, allowing analysts to explain predictions as
quantifiable combinations of historical analogies. Moreover, the weights can be
viewed as those of a data portfolio, inspiring new diagnostic measures such as
forecast concentration, short position, and turnover. We show how weights can
be retrieved seamlessly for (kernel) ridge regression, random forest, boosted
trees, and neural networks. Then, we apply these tools to analyze post-pandemic
forecasts of inflation, GDP growth, and recession probabilities. In all cases,
the approach opens the black box from a new angle and demonstrates how machine
learning models leverage history partly repeating itself.
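
For (non-kernel) ridge regression the dual weights have a closed form,
w(x) = X (X'X + lambda I)^{-1} x, so each forecast is a weighted combination of
in-sample values of the target. A minimal sketch with simulated data, not the
authors' code:

    import numpy as np

    def ridge_forecast_weights(X: np.ndarray, x_new: np.ndarray, lam: float) -> np.ndarray:
        # Weights on the in-sample target values implied by a ridge forecast at x_new.
        k = X.shape[1]
        return X @ np.linalg.solve(X.T @ X + lam * np.eye(k), x_new)

    rng = np.random.default_rng(1)
    X, y = rng.normal(size=(100, 5)), rng.normal(size=100)
    x_new = rng.normal(size=5)
    w = ridge_forecast_weights(X, x_new, lam=1.0)
    beta = np.linalg.solve(X.T @ X + np.eye(5), X.T @ y)   # ridge coefficients (lam = 1)
    assert np.isclose(w @ y, x_new @ beta)                 # same prediction, dual view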

arXiv link: http://arxiv.org/abs/2412.13076v1

Econometrics arXiv paper, submitted: 2024-12-15

Moderating the Mediation Bootstrap for Causal Inference

Authors: Kees Jan van Garderen, Noud van Giersbergen

Mediation analysis is a form of causal inference that investigates indirect
effects and causal mechanisms. Confidence intervals for indirect effects play a
central role in conducting inference. The problem is non-standard, leading to
coverage rates that deviate considerably from their nominal level. The default
inference method in the mediation model is the paired bootstrap, which
resamples directly from the observed data. However, a residual bootstrap that
explicitly exploits the assumed causal structure (X->M->Y) could also be
applied. There is also a debate whether the bias-corrected (BC) bootstrap
method is superior to the percentile method, with the former showing liberal
behavior (actual coverage too low) in certain circumstances. Moreover,
bootstrap methods tend to be very conservative (coverage higher than required)
when mediation effects are small. Finally, iterated bootstrap methods like the
double bootstrap have not been considered due to their high computational
demands. We investigate these issues in the simple mediation model through a
large-scale simulation. Results are explained using graphical methods and the
newly derived finite-sample distribution. The main findings are: (i)
conservative behavior of the bootstrap is caused by extreme dependence of the
bootstrap distribution's shape on the estimated coefficients; (ii) this
dependence leads to counterproductive correction of the double bootstrap.
The added randomness of the BC method inflates the coverage in the absence of
mediation, but still leads to (invalid) liberal inference when the mediation
effect is small.
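
A minimal sketch of a paired percentile bootstrap for the indirect effect a*b in the
simple mediation model X -> M -> Y; the data, helper names, and settings are
hypothetical, and this is not the paper's simulation code.

    import numpy as np

    def indirect_effect(x: np.ndarray, m: np.ndarray, y: np.ndarray) -> float:
        # a: slope of M on X;  b: coefficient on M in the regression of Y on (1, X, M).
        a = np.polyfit(x, m, 1)[0]
        b = np.linalg.lstsq(np.column_stack([np.ones_like(x), x, m]), y, rcond=None)[0][2]
        return a * b

    def percentile_ci(x, m, y, n_boot: int = 2000, alpha: float = 0.05, seed: int = 0):
        # Paired bootstrap: resample (x, m, y) rows jointly and recompute a*b each time.
        rng = np.random.default_rng(seed)
        n = len(x)
        stats = []
        for _ in range(n_boot):
            idx = rng.integers(0, n, n)
            stats.append(indirect_effect(x[idx], m[idx], y[idx]))
        return np.quantile(stats, [alpha / 2, 1 - alpha / 2])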

arXiv link: http://arxiv.org/abs/2412.11285v1

Econometrics arXiv updated paper (originally submitted: 2024-12-15)

VAR models with an index structure: A survey with new results

Authors: Gianluca Cubadda

The main aim of this paper is to review recent advances in the multivariate
autoregressive index model [MAI], originally proposed by Reinsel (1983), and
their applications to economic and financial time series. MAI has recently
gained momentum because it can be seen as a link between two popular but
distinct multivariate time series approaches: vector autoregressive modeling
[VAR] and the dynamic factor model [DFM]. Indeed, on the one hand, the MAI is a
VAR model with a peculiar reduced-rank structure; on the other hand, it allows
for identification of common components and common shocks in a similar way as
the DFM. The focus is on recent developments of the MAI, which include
extending the original model with individual autoregressive structures,
stochastic volatility, time-varying parameters, high-dimensionality, and
cointegration. In addition, new insights on previous contributions and a novel
model are also provided.

arXiv link: http://arxiv.org/abs/2412.11278v2

Econometrics arXiv paper, submitted: 2024-12-15

Treatment Evaluation at the Intensive and Extensive Margins

Authors: Phillip Heiler, Asbjørn Kaufmann, Bezirgen Veliyev

This paper provides a solution to the evaluation of treatment effects in
selective samples when neither instruments nor parametric assumptions are
available. We provide sharp bounds for average treatment effects under a
conditional monotonicity assumption for all principal strata, i.e. units
characterizing the complete intensive and extensive margins. Most importantly,
we allow for a large share of units whose selection is indifferent to
treatment, e.g. due to non-compliance. The existence of such a population is
crucially tied to the regularity of sharp population bounds and thus
conventional asymptotic inference for methods such as Lee bounds can be
misleading. It can be solved using smoothed outer identification regions for
inference. We provide semiparametrically efficient debiased machine learning
estimators for both regular and smooth bounds that can accommodate
high-dimensional covariates and flexible functional forms. Our study of active
labor market policy reveals the empirical prevalence of the aforementioned
indifference population and supports results from previous impact analysis
under much weaker assumptions.

arXiv link: http://arxiv.org/abs/2412.11179v1

Econometrics arXiv paper, submitted: 2024-12-14

Forecasting realized covariances using HAR-type models

Authors: Matias Quiroz, Laleh Tafakori, Hans Manner

We investigate methods for forecasting multivariate realized covariance
matrices applied to a set of 30 assets that were included in the DJ30 index at
some point, including two novel methods that use existing (univariate) log of
realized variance models that account for attenuation bias and time-varying
parameters. We consider the implications of some modeling choices within the
class of heterogeneous autoregressive models. The following are our key
findings. First, modeling the logs of the marginal volatilities is strongly
preferred over direct modeling of marginal volatility. Thus, our proposed model
that accounts for attenuation bias (for the log-response) provides superior
one-step-ahead forecasts over existing multivariate realized covariance
approaches. Second, accounting for measurement errors in marginal realized
variances generally improves multivariate forecasting performance, but to a
lesser degree than previously found in the literature. Third, time-varying
parameter models based on state-space models perform almost equally well.
Fourth, statistical and economic criteria for comparing the forecasting
performance lead to some differences in the models' rankings, which can
partially be explained by the turbulent post-pandemic data in our out-of-sample
validation dataset using sub-sample analyses.
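
A minimal sketch of a univariate HAR regression for the log of realized variance, a
building block of the model class discussed above rather than the paper's
multivariate specification; `rv` is a hypothetical pd.Series of daily realized
variances.

    import numpy as np
    import pandas as pd

    def har_design(rv: pd.Series) -> pd.DataFrame:
        # Daily, weekly (5-day) and monthly (22-day) averages of lagged log-RV.
        log_rv = np.log(rv)
        design = pd.DataFrame({
            "target": log_rv,
            "daily": log_rv.shift(1),
            "weekly": log_rv.rolling(5).mean().shift(1),
            "monthly": log_rv.rolling(22).mean().shift(1),
        })
        return design.dropna()

    # The HAR coefficients are then obtained by OLS of "target" on the three lag
    # terms (e.g. via np.linalg.lstsq) and iterated for one-step-ahead forecasts.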

arXiv link: http://arxiv.org/abs/2412.10791v1

Econometrics arXiv paper, submitted: 2024-12-14

Do LLMs Act as Repositories of Causal Knowledge?

Authors: Nick Huntington-Klein, Eleanor J. Murray

Large language models (LLMs) offer the potential to automate a large number
of tasks that previously have not been possible to automate, including some in
science. There is considerable interest in whether LLMs can automate the
process of causal inference by providing the information about causal links
necessary to build a structural model. We use the case of confounding in the
Coronary Drug Project (CDP), for which there are several studies listing
expert-selected confounders that can serve as a ground truth. LLMs exhibit
mediocre performance in identifying confounders in this setting, even though
text about the ground truth is in their training data. Variables that experts
identify as confounders are only slightly more likely to be labeled as
confounders by LLMs compared to variables that experts consider
non-confounders. Further, LLM judgment on confounder status is highly
inconsistent across models, prompts, and irrelevant concerns like
multiple-choice option ordering. LLMs do not yet have the ability to automate
the reporting of causal links.

arXiv link: http://arxiv.org/abs/2412.10635v1

Econometrics arXiv updated paper (originally submitted: 2024-12-13)

An overview of meta-analytic methods for economic research

Authors: Amin Haghnejad, Mahboobeh Farahati

Meta-analysis employs statistical techniques to synthesize the results of
individual studies, providing an estimate of the overall effect size for a
specific outcome of interest. The direction and magnitude of this estimate,
along with its confidence interval, offer valuable insights into the underlying
phenomenon or relationship. As an extension of standard meta-analysis,
meta-regression analysis incorporates multiple moderators -- capturing key
study characteristics -- into the model to explain heterogeneity in true effect
sizes across studies. This study provides a comprehensive overview of
meta-analytic procedures tailored to economic research, addressing key
challenges such as between-study heterogeneity, publication bias, and effect
size dependence. It equips researchers with essential tools and insights to
conduct rigorous and informative meta-analyses in economics and related fields.

arXiv link: http://arxiv.org/abs/2412.10608v2

Econometrics arXiv updated paper (originally submitted: 2024-12-13)

A Neyman-Orthogonalization Approach to the Incidental Parameter Problem

Authors: Stéphane Bonhomme, Koen Jochmans, Martin Weidner

A popular approach to perform inference on a target parameter in the presence
of nuisance parameters is to construct estimating equations that are orthogonal
to the nuisance parameters, in the sense that their expected first derivative
is zero. Such first-order orthogonalization may, however, not suffice when the
nuisance parameters are very imprecisely estimated. Leading examples where this
is the case are models for panel and network data that feature fixed effects.
In this paper, we show how, in the conditional-likelihood setting, estimating
equations can be constructed that are orthogonal to any chosen order. Combining
these equations with sample splitting yields higher-order bias-corrected
estimators of target parameters. In an empirical application we apply our
method to a fixed-effect model of team production and obtain estimates of
complementarity in production and impacts of counterfactual re-allocations.

arXiv link: http://arxiv.org/abs/2412.10304v2

Econometrics arXiv cross-link from q-fin.CP (q-fin.CP), submitted: 2024-12-12

Geometric Deep Learning for Realized Covariance Matrix Forecasting

Authors: Andrea Bucci, Michele Palma, Chao Zhang

Traditional methods employed in matrix volatility forecasting often overlook
the inherent Riemannian manifold structure of symmetric positive definite
matrices, treating them as elements of Euclidean space, which can lead to
suboptimal predictive performance. Moreover, they often struggle to handle
high-dimensional matrices. In this paper, we propose a novel approach for
forecasting realized covariance matrices of asset returns using a
Riemannian-geometry-aware deep learning framework. In this way, we account for
the geometric properties of the covariance matrices, including possible
non-linear dynamics and efficient handling of high-dimensionality. Moreover,
building upon a Fr\'echet sample mean of realized covariance matrices, we are
able to extend the HAR model to the matrix-variate setting. We demonstrate the efficacy
of our approach using daily realized covariance matrices for the 50 most
capitalized companies in the S&P 500 index, showing that our method outperforms
traditional approaches in terms of predictive accuracy.

arXiv link: http://arxiv.org/abs/2412.09517v1

Econometrics arXiv updated paper (originally submitted: 2024-12-12)

A Kernel Score Perspective on Forecast Disagreement and the Linear Pool

Authors: Fabian Krüger

The variance of a linearly combined forecast distribution (or linear pool)
consists of two components: The average variance of the component distributions
(`average uncertainty'), and the average squared difference between the
components' means and the pool's mean (`disagreement'). This paper shows that
similar decompositions hold for a class of uncertainty measures that can be
constructed as entropy functions of kernel scores. The latter are a rich family
of scoring rules that covers point and distribution forecasts for univariate
and multivariate, discrete and continuous settings. We further show that the
disagreement term is useful for understanding the ex-post performance of the
linear pool (as compared to the component distributions), and motivates using
the linear pool instead of other forecast combination techniques. From a
practical perspective, the results in this paper suggest principled measures of
forecast disagreement in a wide range of applied settings.
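
A minimal numerical illustration (not from the paper) of the variance decomposition
stated above, for an equally weighted linear pool of three Gaussian components:

    import numpy as np

    means = np.array([0.0, 1.0, 2.5])
    variances = np.array([1.0, 0.5, 2.0])
    weights = np.full(3, 1 / 3)

    pool_mean = weights @ means
    average_uncertainty = weights @ variances             # average component variance
    disagreement = weights @ (means - pool_mean) ** 2     # spread of component means
    pool_variance = average_uncertainty + disagreement    # variance of the mixture
    print(pool_mean, average_uncertainty, disagreement, pool_variance)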

arXiv link: http://arxiv.org/abs/2412.09430v2

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2024-12-12

The Global Carbon Budget as a cointegrated system

Authors: Mikkel Bennedsen, Eric Hillebrand, Morten Ørregaard Nielsen

The Global Carbon Budget, maintained by the Global Carbon Project, summarizes
Earth's global carbon cycle through four annual time series beginning in 1959:
atmospheric CO$_2$ concentrations, anthropogenic CO$_2$ emissions, and CO$_2$
uptake by land and ocean. We analyze these four time series as a multivariate
(cointegrated) system. Statistical tests show that the four time series are
cointegrated with rank three and identify anthropogenic CO$_2$ emissions as the
single stochastic trend driving the nonstationary dynamics of the system. The
three cointegrated relations correspond to the physical relations that the
sinks are linearly related to atmospheric concentrations and that the change in
concentrations equals emissions minus the combined uptake by land and ocean.
Furthermore, likelihood ratio tests show that a parametrically restricted
error-correction model that embodies these physical relations and accounts for
the El Ni\~no/Southern Oscillation cannot be rejected on the data. The model
can be used for both in-sample and out-of-sample analysis. In an application of
the latter, we demonstrate that projections based on this model, using Shared
Socioeconomic Pathways scenarios, yield results consistent with established
climate science.

arXiv link: http://arxiv.org/abs/2412.09226v3

Econometrics arXiv updated paper (originally submitted: 2024-12-12)

Panel Stochastic Frontier Models with Latent Group Structures

Authors: Kazuki Tomioka, Thomas T. Yang, Xibin Zhang

Stochastic frontier models have attracted significant interest over the years
due to their unique feature of including a distinct inefficiency term alongside
the usual error term. To effectively separate these two components, strong
distributional assumptions are often necessary. To overcome this limitation,
numerous studies have sought to relax or generalize these models for more
robust estimation. In line with these efforts, we introduce a latent group
structure that accommodates heterogeneity across firms, addressing not only the
stochastic frontiers but also the distribution of the inefficiency term. This
framework accounts for the distinctive features of stochastic frontier models,
and we propose a practical estimation procedure to implement it. Simulation
studies demonstrate the strong performance of our proposed method, which is
further illustrated through an application to study the cost efficiency of the
U.S. commercial banking sector.

arXiv link: http://arxiv.org/abs/2412.08831v2

Econometrics arXiv paper, submitted: 2024-12-10

Machine Learning the Macroeconomic Effects of Financial Shocks

Authors: Niko Hauzenberger, Florian Huber, Karin Klieber, Massimiliano Marcellino

We propose a method to learn the nonlinear impulse responses to structural
shocks using neural networks, and apply it to uncover the effects of US
financial shocks. The results reveal substantial asymmetries with respect to
the sign of the shock. Adverse financial shocks have powerful effects on the US
economy, while benign shocks trigger much smaller reactions. In contrast, with
respect to the size of the shocks, we find no discernible asymmetries.

arXiv link: http://arxiv.org/abs/2412.07649v1

Econometrics arXiv updated paper (originally submitted: 2024-12-10)

Inference after discretizing time-varying unobserved heterogeneity

Authors: Jad Beyhum, Martin Mugnier

Approximating time-varying unobserved heterogeneity by discrete types has
become increasingly popular in economics. Yet, provably valid post-clustering
inference for target parameters in models that do not impose an exact group
structure is still lacking. This paper fills this gap in the leading case of a
linear panel data model with nonseparable two-way unobserved heterogeneity.
Building on insights from the double machine learning literature, we propose a
simple inference procedure based on a bias-reducing moment. Asymptotic theory
and simulations suggest excellent performance. In the fiscal policy application
that we revisit, the novel approach yields conclusions in line with economic
theory.

arXiv link: http://arxiv.org/abs/2412.07352v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-12-10

Automatic Doubly Robust Forests

Authors: Zhaomeng Chen, Junting Duan, Victor Chernozhukov, Vasilis Syrgkanis

This paper proposes the automatic Doubly Robust Random Forest (DRRF)
algorithm for estimating the conditional expectation of a moment functional in
the presence of high-dimensional nuisance functions. DRRF extends the automatic
debiasing framework based on the Riesz representer to the conditional setting
and enables nonparametric, forest-based estimation (Athey et al., 2019; Oprescu
et al., 2019). In contrast to existing methods, DRRF does not require prior
knowledge of the form of the debiasing term or impose restrictive parametric or
semi-parametric assumptions on the target quantity. Additionally, it is
computationally efficient in making predictions at multiple query points. We
establish consistency and asymptotic normality results for the DRRF estimator
under general assumptions, allowing for the construction of valid confidence
intervals. Through extensive simulations in heterogeneous treatment effect
(HTE) estimation, we demonstrate the superior performance of DRRF over
benchmark approaches in terms of estimation accuracy, robustness, and
computational efficiency.

arXiv link: http://arxiv.org/abs/2412.07184v2

Econometrics arXiv updated paper (originally submitted: 2024-12-09)

Large Language Models: An Applied Econometric Framework

Authors: Jens Ludwig, Sendhil Mullainathan, Ashesh Rambachan

How can we use the novel capacities of large language models (LLMs) in
empirical research? And how can we do so while accounting for their
limitations, which are themselves only poorly understood? We develop an
econometric framework to answer this question that distinguishes between two
types of empirical tasks. Using LLMs for prediction problems (including
hypothesis generation) is valid under one condition: no “leakage” between the
LLM's training dataset and the researcher's sample. No leakage can be ensured
by using open-source LLMs with documented training data and published weights.
Using LLM outputs for estimation problems to automate the measurement of some
economic concept (expressed either by some text or from human subjects)
requires the researcher to collect at least some validation data: without such
data, the errors of the LLM's automation cannot be assessed and accounted for.
As long as these steps are taken, LLM outputs can be used in empirical research
with the familiar econometric guarantees we desire. Using two illustrative
applications to finance and political economy, we find that these requirements
are stringent; when they are violated, the limitations of LLMs result in
unreliable empirical estimates. Our results suggest the excitement around the
empirical uses of LLMs is warranted -- they allow researchers to effectively
use even small amounts of language data for both prediction and estimation --
but only with these safeguards in place.

arXiv link: http://arxiv.org/abs/2412.07031v2

Econometrics arXiv updated paper (originally submitted: 2024-12-09)

Probabilistic Targeted Factor Analysis

Authors: Miguel C. Herculano, Santiago Montoya-Blandón

We develop a probabilistic variant of Partial Least Squares (PLS) we call
Probabilistic Targeted Factor Analysis (PTFA), which can be used to extract
common factors in predictors that are useful to predict a set of predetermined
target variables. Along with the technique, we provide an efficient
expectation-maximization (EM) algorithm to learn the parameters and forecast
the targets of interest. We develop a number of extensions to missing-at-random
data, stochastic volatility, factor dynamics, and mixed-frequency data for
real-time forecasting. In a simulation exercise, we show that PTFA outperforms
PLS at recovering the common underlying factors affecting both features and
target variables, delivering better in-sample fit and providing valid forecasts
under contamination such as measurement error or outliers. Finally, we provide
three applications in economics and finance in which PTFA outperforms both PLS
and Principal Component Analysis (PCA) at out-of-sample forecasting.

arXiv link: http://arxiv.org/abs/2412.06688v3

Econometrics arXiv paper, submitted: 2024-12-08

Density forecast transformations

Authors: Matteo Mogliani, Florens Odendahl

The popular choice of using a $direct$ forecasting scheme implies that the
individual predictions do not contain information on cross-horizon dependence.
However, this dependence is needed if the forecaster has to construct, based on
$direct$ density forecasts, predictive objects that are functions of several
horizons ($e.g.$ when constructing annual-average growth rates from
quarter-on-quarter growth rates). To address this issue we propose to use
copulas to combine the individual $h$-step-ahead predictive distributions into
a joint predictive distribution. Our method is particularly appealing to
practitioners for whom changing the $direct$ forecasting specification is too
costly. In a Monte Carlo study, we demonstrate that our approach leads to a
better approximation of the true density than an approach that ignores the
potential dependence. We show the superior performance of our method in several
empirical examples, where we construct (i) quarterly forecasts using
month-on-month $direct$ forecasts, (ii) annual-average forecasts using monthly
year-on-year $direct$ forecasts, and (iii) annual-average forecasts using
quarter-on-quarter $direct$ forecasts.
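
A minimal sketch in the spirit of the approach described above, assuming Gaussian
marginal direct forecasts, a Gaussian copula, and an illustrative dependence
parameter; it is not the paper's implementation.

    import numpy as np
    from scipy import stats

    rho = 0.6                                                  # assumed cross-horizon dependence
    marginals = [stats.norm(0.5, 1.0), stats.norm(0.7, 1.2)]   # direct h-step-ahead densities

    rng = np.random.default_rng(0)
    z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=100_000)
    u = stats.norm.cdf(z)                                      # Gaussian copula draws
    draws = np.column_stack([m.ppf(u[:, j]) for j, m in enumerate(marginals)])
    aggregate = draws.mean(axis=1)                             # e.g. a two-period average
    print(np.quantile(aggregate, [0.05, 0.5, 0.95]))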

arXiv link: http://arxiv.org/abs/2412.06092v1

Econometrics arXiv paper, submitted: 2024-12-08

Estimating Spillover Effects in the Presence of Isolated Nodes

Authors: Bora Kim

In estimating spillover effects under network interference, practitioners
often use linear regression with either the number or fraction of treated
neighbors as regressors. An often overlooked fact is that the latter is
undefined for units without neighbors ("isolated nodes"). The common practice
is to impute this fraction as zero for isolated nodes. This paper shows that
such practice introduces bias through theoretical derivations and simulations.
Causal interpretations of the commonly used spillover regression coefficients
are also provided.

arXiv link: http://arxiv.org/abs/2412.05919v1

Econometrics arXiv paper, submitted: 2024-12-08

Bundle Choice Model with Endogenous Regressors: An Application to Soda Tax

Authors: Tao Sun

This paper proposes a Bayesian factor-augmented bundle choice model to
estimate joint consumption as well as the substitutability and complementarity
of multiple goods in the presence of endogenous regressors. The model extends
the two primary treatments of endogeneity in existing bundle choice models: (1)
endogenous market-level prices and (2) time-invariant unobserved individual
heterogeneity. A Bayesian sparse factor approach is employed to capture
high-dimensional error correlations that induce taste correlation and
endogeneity. Time-varying factor loadings allow for more general
individual-level and time-varying heterogeneity and endogeneity, while the
sparsity induced by the shrinkage prior on loadings balances flexibility with
parsimony. Applied to a soda tax in the context of complementarities, the new
approach captures broader effects of the tax that were previously overlooked.
Results suggest that a soda tax could yield additional health benefits by
marginally decreasing the consumption of salty snacks along with sugary drinks,
extending the health benefits beyond the reduction in sugar consumption alone.

arXiv link: http://arxiv.org/abs/2412.05794v1

Econometrics arXiv paper, submitted: 2024-12-07

Convolution Mode Regression

Authors: Eduardo Schirmer Finn, Eduardo Horta

For highly skewed or fat-tailed distributions, mean or median-based methods
often fail to capture the central tendencies in the data. Despite being a
viable alternative, estimating the conditional mode given certain covariates
(or mode regression) presents significant challenges. Nonparametric approaches
suffer from the "curse of dimensionality", while semiparametric strategies
often lead to non-convex optimization problems. In order to avoid these issues,
we propose a novel mode regression estimator that relies on an intermediate
step of inverting the conditional quantile density. In contrast to existing
approaches, we employ a convolution-type smoothed variant of the quantile
regression. Our estimator converges uniformly over the design points of the
covariates and, unlike previous quantile-based mode regressions, is uniform
with respect to the smoothing bandwidth. Additionally, the Convolution Mode
Regression is dimension-free and poses no optimization issues, and preliminary
simulations suggest the estimator is normally distributed in finite
samples.

arXiv link: http://arxiv.org/abs/2412.05736v1

Econometrics arXiv paper, submitted: 2024-12-07

Property of Inverse Covariance Matrix-based Financial Adjacency Matrix for Detecting Local Groups

Authors: Minseog Oh, Donggyu Kim

In financial applications, we often observe both global and local factors
that are modeled by a multi-level factor model. When detecting unknown local
group memberships under such a model, employing a covariance matrix as an
adjacency matrix for local group memberships is inadequate due to the
predominant effect of global factors. Thus, to detect a local group structure
more effectively, this study introduces an inverse covariance matrix-based
financial adjacency matrix (IFAM) that utilizes negative values of the inverse
covariance matrix. We show that IFAM ensures that the edge density between
different groups vanishes, while that within the same group remains
non-vanishing. This reduces falsely detected connections and helps identify
local group membership accurately. To estimate IFAM under the multi-level
factor model, we introduce a factor-adjusted GLASSO estimator to address the
prevalent global factor effect in the inverse covariance matrix. An empirical
study using returns from international stocks across 20 financial markets
demonstrates that incorporating IFAM effectively detects latent local groups,
which helps improve the minimum variance portfolio allocation performance.
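
A minimal sketch in the spirit of IFAM, without the factor adjustment that the paper
develops: estimate a sparse precision matrix with the graphical lasso and keep only
its negative off-diagonal entries as adjacency weights. The data and tuning value
below are hypothetical.

    import numpy as np
    from sklearn.covariance import GraphicalLasso

    rng = np.random.default_rng(0)
    returns = rng.normal(size=(500, 20))                   # hypothetical (factor-adjusted) returns

    precision = GraphicalLasso(alpha=0.05).fit(returns).precision_
    adjacency = np.where(precision < 0, -precision, 0.0)   # negative entries -> positive weights
    np.fill_diagonal(adjacency, 0.0)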

arXiv link: http://arxiv.org/abs/2412.05664v1

Econometrics arXiv paper, submitted: 2024-12-07

Minimum Sliced Distance Estimation in a Class of Nonregular Econometric Models

Authors: Yanqin Fan, Hyeonseok Park

This paper proposes minimum sliced distance estimation in structural
econometric models with possibly parameter-dependent supports. In contrast to
likelihood-based estimation, we show that under mild regularity conditions, the
minimum sliced distance estimator is asymptotically normally distributed
leading to simple inference regardless of the presence/absence of parameter
dependent supports. We illustrate the performance of our estimator on an
auction model.

arXiv link: http://arxiv.org/abs/2412.05621v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-12-07

Optimizing Returns from Experimentation Programs

Authors: Timothy Sudijono, Simon Ejdemyr, Apoorva Lal, Martin Tingley

Experimentation in online digital platforms is used to inform decision
making. Specifically, the goal of many experiments is to optimize a metric of
interest. Null hypothesis statistical testing can be ill-suited to this task,
as it is indifferent to the magnitude of effect sizes and opportunity costs.
Given access to a pool of related past experiments, we discuss how
experimentation practice should change when the goal is optimization. We survey
the literature on empirical Bayes analyses of A/B test portfolios, and single
out the A/B Testing Problem (Azevedo et al., 2020) as a starting point, which
treats experimentation as a constrained optimization problem. We show that the
framework can be solved with dynamic programming and implemented by
appropriately tuning $p$-value thresholds. Furthermore, we develop several
extensions of the A/B Testing Problem and discuss the implications of these
results on experimentation programs in industry. For example, under no-cost
assumptions, firms should be testing many more ideas, reducing test allocation
sizes, and relaxing $p$-value thresholds away from $p = 0.05$.

arXiv link: http://arxiv.org/abs/2412.05508v1

Econometrics arXiv paper, submitted: 2024-12-06

Linear Regressions with Combined Data

Authors: Xavier D'Haultfoeuille, Christophe Gaillac, Arnaud Maurel

We study best linear predictions in a context where the outcome of interest
and some of the covariates are observed in two different datasets that cannot
be matched. Traditional approaches obtain point identification by relying,
often implicitly, on exclusion restrictions. We show that without such
restrictions, coefficients of interest can still be partially identified and we
derive a constructive characterization of the sharp identified set. We then
build on this characterization to develop computationally simple and
asymptotically normal estimators of the corresponding bounds. We show that
these estimators exhibit good finite-sample performance.

arXiv link: http://arxiv.org/abs/2412.04816v1

Econometrics arXiv updated paper (originally submitted: 2024-12-05)

Semiparametric Bayesian Difference-in-Differences

Authors: Christoph Breunig, Ruixuan Liu, Zhengfei Yu

This paper studies semiparametric Bayesian inference for the average
treatment effect on the treated (ATT) within the difference-in-differences
(DiD) research design. We propose two new Bayesian methods with frequentist
validity. The first one places a standard Gaussian process prior on the
conditional mean function of the control group. The second method is a double
robust Bayesian procedure that adjusts the prior distribution of the
conditional mean function and subsequently corrects the posterior distribution
of the resulting ATT. We prove new semiparametric Bernstein-von Mises (BvM)
theorems for both proposals. Monte Carlo simulations and an empirical
application demonstrate that the proposed Bayesian DiD methods exhibit strong
finite-sample performance compared to existing frequentist methods. We also
present extensions of the canonical DiD approach, incorporating both the
staggered design and the repeated cross-sectional design.

arXiv link: http://arxiv.org/abs/2412.04605v3

Econometrics arXiv updated paper (originally submitted: 2024-12-05)

Large Volatility Matrix Prediction using Tensor Factor Structure

Authors: Sung Hoon Choi, Donggyu Kim

Several approaches for predicting large volatility matrices have been
developed based on high-dimensional factor-based It\^o processes. These methods
often impose restrictions to reduce the model complexity, such as constant
eigenvectors or factor loadings over time. However, several studies indicate
that eigenvector processes are also time-varying. To address this feature, this
paper generalizes the factor structure by representing the integrated
volatility matrix process as a cubic (order-3 tensor) form, which is decomposed
into low-rank tensor and idiosyncratic tensor components. To predict
conditional expected large volatility matrices, we propose the Projected Tensor
Principal Orthogonal componEnt Thresholding (PT-POET) procedure and establish
its asymptotic properties. The advantages of PT-POET are validated through a
simulation study and demonstrated in an application to minimum variance
portfolio allocation using high-frequency trading data.

arXiv link: http://arxiv.org/abs/2412.04293v2

Econometrics arXiv updated paper (originally submitted: 2024-12-05)

On Extrapolation of Treatment Effects in Multiple-Cutoff Regression Discontinuity Designs

Authors: Yuta Okamoto, Yuuki Ozaki

We investigate how to learn treatment effects away from the cutoff in
multiple-cutoff regression discontinuity designs. Using a microeconomic model,
we demonstrate that the parallel-trend type assumption proposed in the
literature is justified when cutoff positions are assigned as if randomly and
the running variable is non-manipulable (e.g., parental income). However, when
the running variable is partially manipulable (e.g., test scores),
extrapolations based on that assumption can be biased. As a complementary
strategy, we propose a novel partial identification approach based on
empirically motivated assumptions. We also develop a uniform inference
procedure and provide two empirical illustrations.

arXiv link: http://arxiv.org/abs/2412.04265v3

Econometrics arXiv updated paper (originally submitted: 2024-12-03)

Endogenous Heteroskedasticity in Linear Models

Authors: Javier Alejo, Antonio F. Galvao, Julian Martinez-Iriarte, Gabriel Montes-Rojas

Linear regressions with endogeneity are widely used to estimate causal
effects. This paper studies a framework that involves two common issues:
endogeneity of the regressors and heteroskedasticity that depends on endogenous
regressors, i.e., endogenous heteroskedasticity. We show that the presence of
endogenous heteroskedasticity in the structural regression renders the
two-stage least squares estimator inconsistent. To address this issue, we
propose sufficient conditions and a control function approach to identify and
estimate the causal parameters of interest. We establish the limiting
properties of the estimator--namely, consistency and asymptotic normality--and
propose inference procedures. Monte Carlo simulations provide evidence on the
finite-sample performance of the proposed methods and evaluate different
implementation strategies. We revisit an empirical application on job training
to illustrate the methods.
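
A minimal generic control-function sketch under the usual two-step logic (not the
paper's exact estimator or its inference procedure): regress the endogenous regressor
on the instrument, then add the first-stage residual to the outcome equation.

    import numpy as np

    def control_function_ols(y: np.ndarray, x_endog: np.ndarray, z: np.ndarray) -> np.ndarray:
        # First stage: project the endogenous regressor on a constant and the instrument.
        Z = np.column_stack([np.ones_like(z), z])
        pi = np.linalg.lstsq(Z, x_endog, rcond=None)[0]
        v_hat = x_endog - Z @ pi                               # control function
        # Second stage: include the first-stage residual as an extra regressor.
        X = np.column_stack([np.ones_like(y), x_endog, v_hat])
        return np.linalg.lstsq(X, y, rcond=None)[0]            # [const, effect of x, coef on v]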

arXiv link: http://arxiv.org/abs/2412.02767v3

Econometrics arXiv paper, submitted: 2024-12-03

A Markowitz Approach to Managing a Dynamic Basket of Moving-Band Statistical Arbitrages

Authors: Kasper Johansson, Thomas Schmelzer, Stephen Boyd

We consider the problem of managing a portfolio of moving-band statistical
arbitrages (MBSAs), inspired by the Markowitz optimization framework. We show
how to manage a dynamic basket of MBSAs, and illustrate the method on recent
historical data, showing that it can perform very well in terms of
risk-adjusted return, essentially uncorrelated with the market.

arXiv link: http://arxiv.org/abs/2412.02660v1

Econometrics arXiv paper, submitted: 2024-12-03

Simple and Effective Portfolio Construction with Crypto Assets

Authors: Kasper Johansson, Stephen Boyd

We consider the problem of constructing a portfolio that combines traditional
financial assets with crypto assets. We show that despite the documented
attributes of crypto assets, such as high volatility, heavy tails, excess
kurtosis, and skewness, a simple extension of traditional risk allocation
provides robust solutions for integrating these emerging assets into broader
investment strategies. Examination of the risk allocation holdings suggests an
even simpler method, analogous to the traditional 60/40 stocks/bonds
allocation, involving a fixed allocation to crypto and traditional assets,
dynamically diluted with cash to achieve a target risk level.

arXiv link: http://arxiv.org/abs/2412.02654v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2024-12-03

Use of surrogate endpoints in health technology assessment: a review of selected NICE technology appraisals in oncology

Authors: Lorna Wheaton, Sylwia Bujkiewicz

Objectives: Surrogate endpoints, used to substitute for and predict final
clinical outcomes, are increasingly being used to support submissions to health
technology assessment agencies. The increase in use of surrogate endpoints has
been accompanied by literature describing frameworks and statistical methods to
ensure their robust validation. The aim of this review was to assess how
surrogate endpoints have recently been used in oncology technology appraisals
by the National Institute for Health and Care Excellence (NICE) in England and
Wales.
Methods: This paper identified technology appraisals in oncology published by
NICE between February 2022 and May 2023. Data were extracted on methods for the
use and validation of surrogate endpoints.
Results: Of the 47 technology appraisals in oncology available for review, 18
(38 percent) utilised surrogate endpoints, with 37 separate surrogate endpoints
being discussed. However, the evidence supporting the validity of the putative
surrogate relationships varied considerably: 11 were supported by RCT evidence,
7 by evidence from observational studies, 12 by clinical opinion, and 7 by no
evidence for the use of the surrogate endpoint.
Conclusions: This review supports the assertion that surrogate endpoints are
frequently used in oncology technology appraisals in England and Wales. Despite
increasing availability of statistical methods and guidance on appropriate
validation of surrogate endpoints, this review highlights that use and
validation of surrogate endpoints can vary between technology appraisals, which
can lead to uncertainty in decision-making.

arXiv link: http://arxiv.org/abs/2412.02380v2

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2024-12-03

Selective Reviews of Bandit Problems in AI via a Statistical View

Authors: Pengjie Zhou, Haoyu Wei, Huiming Zhang

Reinforcement Learning (RL) is a widely researched area in artificial
intelligence that focuses on teaching agents decision-making through
interactions with their environment. A key subset includes stochastic
multi-armed bandit (MAB) and stochastic continuum-armed bandit (SCAB)
problems, which
model sequential decision-making under uncertainty. This review outlines the
foundational models and assumptions of bandit problems, explores non-asymptotic
theoretical tools like concentration inequalities and minimax regret bounds,
and compares frequentist and Bayesian algorithms for managing
exploration-exploitation trade-offs. Additionally, we explore K-armed
contextual bandits and SCAB, focusing on their methodologies and regret
analyses. We also examine the connections between SCAB problems and functional
data analysis. Finally, we highlight recent advances and ongoing challenges in
the field.

arXiv link: http://arxiv.org/abs/2412.02251v3
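
As a concrete instance of the frequentist index policies surveyed above, the sketch below runs the classical UCB1 rule on a simulated Bernoulli K-armed bandit; it is purely illustrative and not taken from the review.

```python
# Minimal UCB1 sketch for a stochastic K-armed Bernoulli bandit (illustrative).
import numpy as np

rng = np.random.default_rng(1)
true_means = np.array([0.2, 0.5, 0.7])      # assumed arm reward probabilities
K, T = len(true_means), 10_000
counts, sums = np.zeros(K), np.zeros(K)

for t in range(T):
    if t < K:                               # play each arm once to initialize
        arm = t
    else:                                   # UCB1 index: mean + sqrt(2 log t / n)
        ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
        arm = int(np.argmax(ucb))
    reward = rng.random() < true_means[arm]
    counts[arm] += 1
    sums[arm] += reward

regret = T * true_means.max() - sums.sum()
print(f"estimated means: {np.round(sums / counts, 3)}, regret approx {regret:.1f}")
```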

Econometrics arXiv paper, submitted: 2024-12-03

Endogenous Interference in Randomized Experiments

Authors: Mengsi Gao

This paper investigates the identification and inference of treatment effects
in randomized controlled trials with social interactions. Two key network
features characterize the setting and introduce endogeneity: (1) latent
variables may affect both network formation and outcomes, and (2) the
intervention may alter network structure, mediating treatment effects. I make
three contributions. First, I define parameters within a post-treatment network
framework, distinguishing direct effects of treatment from indirect effects
mediated through changes in network structure. I provide a causal
interpretation of the coefficients in a linear outcome model. For estimation
and inference, I focus on a specific form of peer effects, represented by the
fraction of treated friends. Second, in the absence of endogeneity, I establish
the consistency and asymptotic normality of ordinary least squares estimators.
Third, if endogeneity is present, I propose addressing it through shift-share
instrumental variables, demonstrating the consistency and asymptotic normality
of instrumental variable estimators in relatively sparse networks. For denser
networks, I propose a denoised estimator based on eigendecomposition to restore
consistency. Finally, I revisit Prina (2015) as an empirical illustration,
demonstrating that treatment can influence outcomes both directly and through
network structure changes.

arXiv link: http://arxiv.org/abs/2412.02183v1

Econometrics arXiv updated paper (originally submitted: 2024-12-02)

A Dimension-Agnostic Bootstrap Anderson-Rubin Test For Instrumental Variable Regressions

Authors: Dennis Lim, Wenjie Wang, Yichong Zhang

Weak-identification-robust tests for instrumental variable (IV) regressions
are typically developed separately depending on whether the number of IVs is
treated as fixed or increasing with the sample size, forcing researchers to
make a stance on the asymptotic behavior, which is often ambiguous in practice.
This paper proposes a bootstrap-based, dimension-agnostic Anderson-Rubin (AR)
test that achieves correct asymptotic size regardless of whether the number of
IVs is fixed or diverging, and even accommodates cases where the number of IVs
exceeds the sample size. By incorporating ridge regularization, our approach
reduces the effective rank of the projection matrix and yields regimes where
the limiting distribution of the AR statistic can be a weighted chi-squared, a
normal, or a mixture of the two. Strong approximation results ensure that the
bootstrap procedure remains uniformly valid across all regimes, while also
delivering substantial power gains over existing methods by exploiting rank
reduction.

arXiv link: http://arxiv.org/abs/2412.01603v2

Econometrics arXiv paper, submitted: 2024-12-02

From rotational to scalar invariance: Enhancing identifiability in score-driven factor models

Authors: Giuseppe Buccheri, Fulvio Corsi, Emilija Dzuverovic

We show that, for a certain class of scaling matrices including the commonly
used inverse square-root of the conditional Fisher Information, score-driven
factor models are identifiable up to a multiplicative scalar constant under
very mild restrictions. This result has no analogue in parameter-driven models,
as it exploits the different structure of the score-driven factor dynamics.
Consequently, score-driven models offer a clear advantage in terms of economic
interpretability compared to parameter-driven factor models, which are
identifiable only up to orthogonal transformations. Our restrictions are
order-invariant and can be generalized to score-driven factor models with
dynamic loadings and nonlinear factor models. We test extensively the
identification strategy using simulated and real data. The empirical analysis
on financial and macroeconomic data reveals a substantial increase of
log-likelihood ratios and significantly improved out-of-sample forecast
performance when switching from the classical restrictions adopted in the
literature to our more flexible specifications.

arXiv link: http://arxiv.org/abs/2412.01367v1

Econometrics arXiv paper, submitted: 2024-12-02

Locally robust semiparametric estimation of sample selection models without exclusion restrictions

Authors: Zhewen Pan, Yifan Zhang

Existing identification and estimation methods for semiparametric sample
selection models rely heavily on exclusion restrictions. However, it is
difficult in practice to find a credible excluded variable that has a
correlation with selection but no correlation with the outcome. In this paper,
we establish a new identification result for a semiparametric sample selection
model without the exclusion restriction. The key identifying assumptions are
nonlinearity on the selection equation and linearity on the outcome equation.
The difference in the functional form plays the role of an excluded variable
and provides identification power. According to the identification result, we
propose to estimate the model by a partially linear regression with a
nonparametrically generated regressor. To accommodate modern machine learning
methods in generating the regressor, we construct an orthogonalized moment by
adding the first-step influence function and develop a locally robust estimator
by solving the cross-fitted orthogonalized moment condition. We prove
root-n-consistency and asymptotic normality of the proposed estimator under
mild regularity conditions. A Monte Carlo simulation shows the satisfactory
performance of the estimator in finite samples, and an application to wage
regression illustrates its usefulness in the absence of exclusion restrictions.

arXiv link: http://arxiv.org/abs/2412.01208v1

Econometrics arXiv paper, submitted: 2024-12-02

Iterative Distributed Multinomial Regression

Authors: Yanqin Fan, Yigit Okar, Xuetao Shi

This article introduces an iterative distributed computing estimator for the
multinomial logistic regression model with large choice sets. Compared to the
maximum likelihood estimator, the proposed iterative distributed estimator
achieves significantly faster computation and, when initialized with a
consistent estimator, attains asymptotic efficiency under a weak dominance
condition. Additionally, we propose a parametric bootstrap inference procedure
based on the iterative distributed estimator and establish its consistency.
Extensive simulation studies validate the effectiveness of the proposed methods
and highlight the computational efficiency of the iterative distributed
estimator.

arXiv link: http://arxiv.org/abs/2412.01030v1

Econometrics arXiv paper, submitted: 2024-12-01

Optimization of Delivery Routes for Fresh E-commerce in Pre-warehouse Mode

Authors: Alice Harward, Junjie Lin, Yun Wang, Xiaoke Xie

With the development of the economy, fresh food e-commerce has experienced
rapid growth. One of the core competitive advantages of fresh food e-commerce
platforms lies in selecting an appropriate logistics distribution model. This
study focuses on the front warehouse model, aiming to minimize distribution
costs. Considering the perishable nature and short shelf life of fresh food, a
distribution route optimization model is constructed, and the mileage-savings
method is used to determine the optimal distribution scheme. The results
indicate that under certain conditions, different distribution schemes
significantly impact the performance of fresh food e-commerce platforms. Based
on a review of domestic and international research, this paper takes Dingdong
Maicai as an example to systematically introduce the basic concepts of
distribution route optimization in fresh food e-commerce platforms under the
front warehouse model, analyze the advantages of logistics distribution, and
thoroughly examine the importance of distribution routes for fresh products.

arXiv link: http://arxiv.org/abs/2412.00634v1
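
A minimal sketch of the mileage-savings heuristic mentioned above (commonly attributed to Clarke and Wright) is shown below for a single front warehouse and a toy distance matrix; vehicle capacities and freshness time windows, which a real pre-warehouse application would need, are deliberately omitted.

```python
# Simplified savings heuristic for routing from one depot (illustrative only).
import itertools
import numpy as np

# hypothetical symmetric distances; index 0 is the front warehouse (depot)
D = np.array([
    [0, 4, 6, 7, 5],
    [4, 0, 3, 8, 6],
    [6, 3, 0, 2, 5],
    [7, 8, 2, 0, 4],
    [5, 6, 5, 4, 0],
], dtype=float)
customers = range(1, len(D))

# savings of serving i and j on one route: s(i, j) = d(0, i) + d(0, j) - d(i, j)
savings = sorted(
    ((D[0, i] + D[0, j] - D[i, j], i, j)
     for i, j in itertools.combinations(customers, 2)),
    reverse=True,
)

routes = {i: [i] for i in customers}   # start with one out-and-back route per customer
for s, i, j in savings:
    ri, rj = routes[i], routes[j]
    if ri is rj:                       # already on the same route
        continue
    # simplified merge rule: only join when i ends one route and j starts the other
    if ri[-1] == i and rj[0] == j:
        merged = ri + rj
    elif rj[-1] == j and ri[0] == i:
        merged = rj + ri
    else:
        continue
    for k in merged:
        routes[k] = merged

unique = {id(r): r for r in routes.values()}.values()
print([[0] + r + [0] for r in unique])   # each route starts and ends at the depot
```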

Econometrics arXiv paper, submitted: 2024-11-29

Peer Effects and Herd Behavior: An Empirical Study Based on the "Double 11" Shopping Festival

Authors: Hambur Wang

This study employs a Bayesian Probit model to empirically analyze peer
effects and herd behavior among consumers during the "Double 11" shopping
festival, using data collected through a questionnaire survey. The results
demonstrate that peer effects significantly influence consumer decision-making,
with the probability of participation in the shopping event increasing notably
when roommates are involved. Additionally, factors such as gender, online
shopping experience, and fashion consciousness significantly impact consumers'
herd behavior. This research not only enhances the understanding of online
shopping behavior among college students but also provides empirical evidence
for e-commerce platforms to formulate targeted marketing strategies. Finally,
the study discusses the fragility of online consumption activities, the need
for adjustments in corporate marketing strategies, and the importance of
promoting a healthy online culture.

arXiv link: http://arxiv.org/abs/2412.00233v1

Econometrics arXiv updated paper (originally submitted: 2024-11-29)

Canonical correlation analysis of stochastic trends via functional approximation

Authors: Massimo Franchi, Iliyan Georgiev, Paolo Paruolo

This paper proposes a novel approach for semiparametric inference on the
number $s$ of common trends and their loading matrix $\psi$ in $I(1)/I(0)$
systems. It combines functional approximation of limits of random walks and
canonical correlations analysis, performed between the $p$ observed time series
of length $T$ and the first $K$ discretized elements of an $L^2$ basis. Tests
and selection criteria on $s$, and estimators and tests on $\psi$ are proposed;
their properties are discussed as $T$ and $K$ diverge sequentially for fixed
$p$ and $s$. It is found that tests on $s$ are asymptotically pivotal,
selection criteria of $s$ are consistent, estimators of $\psi$ are
$T$-consistent, mixed-Gaussian and efficient, so that Wald tests on $\psi$ are
asymptotically Normal or $\chi^2$. The paper also discusses asymptotically
pivotal misspecification tests for checking model assumptions. The approach can
be coherently applied to subsets or aggregations of variables in a given panel.
Monte Carlo simulations show that these tools have reasonable performance for
$T\geq 10 p$ and $p\leq 300$. An empirical analysis of 20 exchange rates
illustrates the methods.

arXiv link: http://arxiv.org/abs/2411.19572v2

Econometrics arXiv updated paper (originally submitted: 2024-11-28)

Warfare Ignited Price Contagion Dynamics in Early Modern Europe

Authors: Emile Esmaili, Michael J. Puma, Francis Ludlow, Poul Holm, Eva Jobbova

Economic historians have long studied market integration and contagion
dynamics during periods of warfare and global stress, but there is a lack of
model-based evidence on these phenomena. This paper uses an econometric
contagion model, the Diebold-Yilmaz framework, to examine the dynamics of
economic shocks across European markets in the early modern period. Our
findings suggest that key periods of violent conflicts significantly increased
food price spillover across cities, causing widespread disruptions across
Europe. We also demonstrate the ability of this framework to capture relevant
historical dynamics between the main trade centers of the period.

arXiv link: http://arxiv.org/abs/2411.18978v3
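
For readers unfamiliar with the Diebold-Yilmaz framework, the sketch below computes a total spillover index from the forecast-error variance decomposition of a small VAR fitted to simulated price-change series. It uses a Cholesky-identified decomposition and placeholder data, so it illustrates only the mechanics, not the paper's estimates.

```python
# Sketch of a Diebold-Yilmaz-style total spillover index on simulated series.
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(2)
T, k, H = 500, 3, 10                        # observations, markets, FEVD horizon
A = np.array([[0.5, 0.2, 0.0],
              [0.1, 0.4, 0.2],
              [0.0, 0.3, 0.5]])             # assumed VAR(1) dynamics
y = np.zeros((T, k))
for t in range(1, T):
    y[t] = y[t - 1] @ A.T + rng.normal(scale=0.1, size=k)
data = pd.DataFrame(y, columns=["city_A", "city_B", "city_C"])

res = VAR(data).fit(maxlags=1)
Psi = res.orth_ma_rep(H)                    # orthogonalized MA coefficients, (H+1, k, k)
contrib = (Psi ** 2).sum(axis=0)            # contrib[i, j]: shock j's share in var of i
shares = contrib / contrib.sum(axis=1, keepdims=True)
spillover = 100 * (shares.sum() - np.trace(shares)) / k
print(f"total spillover index: {spillover:.1f}%")
```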

Econometrics arXiv paper, submitted: 2024-11-28

Contrasting the optimal resource allocation to cybersecurity and cyber insurance using prospect theory versus expected utility theory

Authors: Chaitanya Joshi, Jinming Yang, Sergeja Slapnicar, Ryan K L Ko

Protecting against cyber-threats is vital for every organization and can be
done by investing in cybersecurity controls and purchasing cyber insurance.
However, these are interlinked since insurance premiums could be reduced by
investing more in cybersecurity controls. The expected utility theory and the
prospect theory are two alternative theories explaining decision-making under
risk and uncertainty, which can inform strategies for optimizing resource
allocation. While the former is considered a rational approach, research has
shown that most people make decisions consistent with the latter, including on
insurance uptake. We compare and contrast these two approaches to show how they
can lead to different optimal allocations, resulting in differing risk exposures
as well as financial costs. We
introduce the concept of a risk curve and show that identifying the nature of
the risk curve is a key step in deriving the optimal resource allocation.

arXiv link: http://arxiv.org/abs/2411.18838v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-11-27

Difference-in-differences Design with Outcomes Missing Not at Random

Authors: Sooahn Shin

This paper addresses one of the most prevalent problems encountered by
political scientists working with difference-in-differences (DID) design:
missingness in panel data. A common practice for handling missing data, known
as complete case analysis, is to drop cases with any missing values over time.
A more principled approach involves using nonparametric bounds on causal
effects or applying inverse probability weighting based on baseline covariates.
Yet, these methods are general remedies that often under-utilize the
assumptions already imposed on panel structure for causal identification. In
this paper, I outline the pitfalls of complete case analysis and propose an
alternative identification strategy based on principal strata. To be specific,
I impose parallel trends assumption within each latent group that shares the
same missingness pattern (e.g., always-respondents, if-treated-respondents) and
leverage missingness rates over time to estimate the proportions of these
groups. Building on this, I tailor Lee bounds, well-known nonparametric bounds
under selection bias, to partially identify the causal effect within the
DID design. Unlike complete case analysis, the proposed method does not require
independence between treatment selection and missingness patterns, nor does it
assume homogeneous effects across these patterns.

arXiv link: http://arxiv.org/abs/2411.18772v1

Econometrics arXiv cross-link from q-fin.RM (q-fin.RM), submitted: 2024-11-26

Autoencoder Enhanced Realised GARCH on Volatility Forecasting

Authors: Qianli Zhao, Chao Wang, Richard Gerlach, Giuseppe Storti, Lingxiang Zhang

Realised volatility has become increasingly prominent in volatility
forecasting due to its ability to capture intraday price fluctuations. With a
growing variety of realised volatility estimators, each with unique advantages
and limitations, selecting an optimal estimator may introduce challenges. In
this paper, aiming to synthesise the impact of various realised volatility
measures on volatility forecasting, we propose an extension of the Realised
GARCH model that incorporates an autoencoder-generated synthetic realised
measure, combining the information from multiple realised measures in a
nonlinear manner. Our proposed model extends existing linear methods, such as
Principal Component Analysis and Independent Component Analysis, to reduce the
dimensionality of realised measures. The empirical evaluation, conducted across
four major stock markets from January 2000 to June 2022 and including the
period of COVID-19, demonstrates both the feasibility of applying an
autoencoder to synthesise volatility measures and the superior effectiveness of
the proposed model in one-step-ahead rolling volatility forecasting. The model
exhibits enhanced flexibility in parameter estimations across each rolling
window, outperforming traditional linear approaches. These findings indicate
that nonlinear dimension reduction offers further adaptability and flexibility
in improving the synthetic realised measure, with promising implications for
future volatility forecasting applications.

arXiv link: http://arxiv.org/abs/2411.17136v1

Econometrics arXiv updated paper (originally submitted: 2024-11-25)

Normal Approximation for U-Statistics with Cross-Sectional Dependence

Authors: Weiguang Liu

We establish normal approximation in the Wasserstein metric and central limit
theorems for both non-degenerate and degenerate U-statistics with
cross-sectionally dependent samples using Stein's method. For the
non-degenerate case, our results extend recent studies on the asymptotic
properties of sums of cross-sectionally dependent random variables. The
degenerate case is more challenging due to the additional dependence induced by
the nonlinearity of the U-statistic kernel. Through a specific implementation
of Stein's method, we derive convergence rates under conditions on the mixing
rate, the sparsity of the cross-sectional dependence structure, and the moments
of the U-statistic kernel. Finally, we demonstrate the application of our
theoretical results with a nonparametric specification test for data with
cross-sectional dependence.

arXiv link: http://arxiv.org/abs/2411.16978v2

Econometrics arXiv paper, submitted: 2024-11-25

Anomaly Detection in California Electricity Price Forecasting: Enhancing Accuracy and Reliability Using Principal Component Analysis

Authors: Joseph Nyangon, Ruth Akintunde

Accurate and reliable electricity price forecasting has significant practical
implications for grid management, renewable energy integration, power system
planning, and price volatility management. This study focuses on enhancing
electricity price forecasting in California's grid, addressing challenges from
complex generation data and heteroskedasticity. Utilizing principal component
analysis (PCA), we analyze CAISO's hourly electricity prices and demand from
2016-2021 to improve day-ahead forecasting accuracy. Initially, we apply
traditional outlier analysis with the interquartile range method, followed by
robust PCA (RPCA) for more effective outlier elimination. This approach
improves data symmetry and reduces skewness. We then construct multiple linear
regression models using both raw and PCA-transformed features. The model with
transformed features, refined through traditional and SAS Sparse Matrix outlier
removal methods, shows superior forecasting performance. The SAS Sparse Matrix
method, in particular, significantly enhances model accuracy. Our findings
demonstrate that PCA-based methods are key in advancing electricity price
forecasting, supporting renewable integration and grid management in day-ahead
markets.
Keywords: Electricity price forecasting, principal component analysis (PCA),
power system planning, heteroskedasticity, renewable energy integration.

arXiv link: http://arxiv.org/abs/2412.07787v1
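
The PCA-plus-regression pipeline described above can be sketched in a few lines. The series below are simulated stand-ins for the CAISO price and generation data, and the component count and train/test split are arbitrary choices.

```python
# Minimal PCA-plus-linear-regression sketch for day-ahead price forecasting,
# on simulated stand-in data (illustrative only).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(6)
n, p = 2_000, 12
latent = rng.normal(size=(n, 3))                    # a few underlying drivers
X = latent @ rng.normal(size=(3, p)) + 0.2 * rng.normal(size=(n, p))
price = latent @ np.array([5.0, -3.0, 2.0]) + rng.normal(scale=1.0, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, price, test_size=0.25, random_state=0)
pca = PCA(n_components=3).fit(X_tr)
model = LinearRegression().fit(pca.transform(X_tr), y_tr)
print("R^2 on held-out hours:", round(model.score(pca.transform(X_te), y_te), 3))
```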

Econometrics arXiv updated paper (originally submitted: 2024-11-25)

A Binary IV Model for Persuasion: Profiling Persuasion Types among Compliers

Authors: Zeyang Yu

In an empirical study of persuasion, researchers often use a binary
instrument to encourage individuals to consume information and take some
action. We show that, with a binary Imbens-Angrist instrumental variable model
and the monotone treatment response assumption, it is possible to identify the
joint distribution of potential outcomes among compliers. This is necessary to
identify the percentage of mobilised voters and their statistical
characteristic defined by the moments of the joint distribution of treatment
and covariates. Specifically, we develop a method that enables researchers to
identify the statistical characteristic of persuasion types: always-voters,
never-voters, and mobilised voters among compliers. These findings extend the
kappa weighting results in Abadie (2003). We also provide a sharp test for the
two sets of identification assumptions. The test boils down to testing whether
there exists a nonnegative solution to a possibly under-determined system of
linear equations with known coefficients. An application based on Green et al.
(2003) is provided.

arXiv link: http://arxiv.org/abs/2411.16906v2

Econometrics arXiv updated paper (originally submitted: 2024-11-25)

A Supervised Machine Learning Approach for Assessing Grant Peer Review Reports

Authors: Gabriel Okasa, Alberto de León, Michaela Strinzel, Anne Jorstad, Katrin Milzow, Matthias Egger, Stefan Müller

Peer review in grant evaluation informs funding decisions, but the contents
of peer review reports are rarely analyzed. In this work, we develop a
thoroughly tested pipeline to analyze the texts of grant peer review reports
using methods from applied Natural Language Processing (NLP) and machine
learning. We start by developing twelve categories reflecting content of grant
peer review reports that are of interest to research funders. This is followed
by multiple human annotators' iterative annotation of these categories in a
novel text corpus of grant peer review reports submitted to the Swiss National
Science Foundation. After validating the human annotation, we use the annotated
texts to fine-tune pre-trained transformer models to classify these categories
at scale, while conducting several robustness and validation checks. Our
results show that many categories can be reliably identified by human
annotators and machine learning approaches. However, the choice of text
classification approach considerably influences the classification performance.
We also find a high correspondence between out-of-sample classification
performance and human annotators' perceived difficulty in identifying
categories. Our results and publicly available fine-tuned transformer models
will allow researchers, research funders, and anybody interested in peer
review to examine and report on the contents of these reports in a structured
manner. Ultimately, we hope our approach can contribute to ensuring the quality
and trustworthiness of grant peer review.

arXiv link: http://arxiv.org/abs/2411.16662v2

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2024-11-25

When Is Heterogeneity Actionable for Personalization?

Authors: Anya Shchetkina, Ron Berman

Targeting and personalization policies can be used to improve outcomes beyond
the uniform policy that assigns the best performing treatment in an A/B test to
everyone. Personalization relies on the presence of heterogeneity of treatment
effects, yet, as we show in this paper, heterogeneity alone is not sufficient
for personalization to be successful. We develop a statistical model to
quantify "actionable heterogeneity," or the conditions when personalization is
likely to outperform the best uniform policy. We show that actionable
heterogeneity can be visualized as crossover interactions in outcomes across
treatments and depends on three population-level parameters: within-treatment
heterogeneity, cross-treatment correlation, and the variation in average
responses. Our model can be used to predict the expected gain from
personalization prior to running an experiment and also allows for sensitivity
analysis, providing guidance on how changing treatments can affect the
personalization gain. To validate our model, we apply five common
personalization approaches to two large-scale field experiments with many
interventions that encouraged flu vaccination. We find an 18% gain from
personalization in one and a more modest 4% gain in the other, which is
consistent with our model. Counterfactual analysis shows that this difference
in the gains from personalization is driven by a drastic difference in
within-treatment heterogeneity. However, reducing cross-treatment correlation
holds a larger potential to further increase personalization gains. Our
findings provide a framework for assessing the potential from personalization
and offer practical recommendations for improving gains from targeting in
multi-intervention settings.

arXiv link: http://arxiv.org/abs/2411.16552v1

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2024-11-25

What events matter for exchange rate volatility ?

Authors: Igor Martins, Hedibert Freitas Lopes

This paper expands on stochastic volatility models by proposing a data-driven
method to select the macroeconomic events most likely to impact volatility. The
paper identifies and quantifies the effects of macroeconomic events across
multiple countries on exchange rate volatility using high-frequency currency
returns, while accounting for persistent stochastic volatility effects and
seasonal components capturing time-of-day patterns. Given the hundreds of
macroeconomic announcements and their lags, we rely on sparsity-based methods
to select relevant events for the model. We contribute to the exchange rate
literature in four ways: First, we identify the macroeconomic events that drive
currency volatility, estimate their effects and connect them to macroeconomic
fundamentals. Second, we find a link between intraday seasonality, trading
volume, and the opening hours of major markets across the globe. We provide a
simple labor-based explanation for this observed pattern. Third, we show that
including macroeconomic events and seasonal components is crucial for
forecasting exchange rate volatility. Fourth, our proposed model yields the
lowest volatility and highest Sharpe ratio in portfolio allocations when
compared to standard SV and GARCH models.

arXiv link: http://arxiv.org/abs/2411.16244v1

Econometrics arXiv paper, submitted: 2024-11-24

Ranking probabilistic forecasting models with different loss functions

Authors: Tomasz Serafin, Bartosz Uniejewski

In this study, we introduced various statistical performance metrics, based
on the pinball loss and the empirical coverage, for the ranking of
probabilistic forecasting models. We tested the ability of the proposed metrics
to identify the top-performing forecasting model and investigated which metric
corresponds to the highest average per-trade profit in the out-of-sample period.
Our findings show that, for the considered trading strategy, ranking the
forecasting models according to the coverage of the quantile forecasts used in
the trading hours yields superior economic performance.

arXiv link: http://arxiv.org/abs/2411.17743v1
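
For reference, the two ingredients behind the proposed metrics, the pinball (quantile) loss and empirical coverage, can be implemented as follows; the numbers in the usage example are hypothetical.

```python
# Minimal implementations of the pinball (quantile) loss and empirical coverage.
import numpy as np

def pinball_loss(y, q_pred, tau):
    """Average pinball loss of quantile forecasts q_pred at level tau."""
    y, q_pred = np.asarray(y, float), np.asarray(q_pred, float)
    diff = y - q_pred
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

def empirical_coverage(y, lower, upper):
    """Share of observations falling inside the [lower, upper] forecast interval."""
    y = np.asarray(y, float)
    return np.mean((y >= np.asarray(lower)) & (y <= np.asarray(upper)))

# toy usage with hypothetical prices and forecasts
y = np.array([101.0, 98.5, 103.2, 99.9])
print(pinball_loss(y, q_pred=[100, 99, 102, 100], tau=0.5))
print(empirical_coverage(y, lower=[97, 96, 100, 97], upper=[104, 101, 106, 103]))
```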

Econometrics arXiv paper, submitted: 2024-11-24

Homeopathic Modernization and the Middle Science Trap: conceptual context of ergonomics, econometrics and logic of some national scientific case

Authors: Eldar Knar

This article analyses the structural and institutional barriers hindering the
development of scientific systems in transition economies, such as Kazakhstan.
The main focus is on the concept of the "middle science trap," which is
characterized by steady growth in quantitative indicators (publications,
grants) but a lack of qualitative advancement. Excessive bureaucracy, weak
integration into the international scientific community, and ineffective
science management are key factors limiting development. This paper proposes an
approach of "homeopathic modernization," which focuses on minimal yet
strategically significant changes aimed at reducing bureaucratic barriers and
enhancing the effectiveness of the scientific ecosystem. A comparative analysis
of international experience (China, India, and the European Union) is provided,
demonstrating how targeted reforms in the scientific sector can lead to
significant results. Social and cultural aspects, including the influence of
mentality and institutional structure, are also examined, and practical
recommendations for reforming the scientific system in Kazakhstan and Central
Asia are offered. The conclusions of the article could be useful for developing
national science modernization programs, particularly in countries with high
levels of bureaucracy and conservatism.

arXiv link: http://arxiv.org/abs/2411.15996v1

Econometrics arXiv updated paper (originally submitted: 2024-11-24)

Utilization and Profitability of Tractor Services for Maize Farming in Ejura-Sekyedumase Municipality, Ghana

Authors: Fred Nimoh, Innocent Yao Yevu, Attah-Nyame Essampong, Asante Emmanuel Addo, Addai Kevin

Maize farming is a major livelihood activity for many farmers in Ghana.
Unfortunately, farmers usually do not obtain the expected returns on their
investment due to reliance on rudimentary, labor-intensive, and inefficient
methods of production. Using cross-sectional data from 359 maize farmers, this
study investigates the profitability and determinants of the use of tractor
services for maize production in Ejura-Sekyedumase, Ashanti Region of Ghana.
Results from descriptive and profitability analyses reveal that tractor
services such as ploughing and shelling are widely used, but their
profitability varies significantly among farmers. Key factors influencing
profitability include farm size, fertilizer quantity applied, and farmer
experience. Results from a multivariate probit analysis also showed that
farming experience, fertilizer quantity, and profit per acre have a positive
influence on tractor service use for shelling, while household size, farm size,
and farmer-based organization (FBO) membership have a negative influence.
Farming experience, fertilizer quantity, and
profit per acre positively influence tractor service use for ploughing, while
farm size has a negative influence. A t-test result reveals a statistically
significant difference in profit between farmers who use tractor services and
those who do not. Specifically, farmers who utilize tractor services on their
maize farms achieved a return to cost about 9 percent higher than those who do
not (p-value < 0.05). Kendall's coefficient of concordance showed moderate
agreement among the maize farmers in their ranking of the factors affecting
their ability to access and utilize tractor services on their farms, with
financial issues ranked first.

arXiv link: http://arxiv.org/abs/2411.15797v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-11-23

Canonical Correlation Analysis: review

Authors: Anna Bykhovskaya, Vadim Gorin

For over a century canonical correlations, variables, and related concepts
have been studied across various fields, with contributions dating back to
Jordan [1875] and Hotelling [1936]. This text surveys the evolution of
canonical correlation analysis, a fundamental statistical tool, beginning with
its foundational theorems and progressing to recent developments and open
research problems. Along the way we introduce and review methods, notions, and
fundamental concepts from linear algebra, random matrix theory, and
high-dimensional statistics, placing particular emphasis on rigorous
mathematical treatment.
The survey is intended for technically proficient graduate students and other
researchers with an interest in this area. The content is organized into five
chapters, supplemented by six sets of exercises found in Chapter 6. These
exercises introduce additional material, reinforce key concepts, and serve to
bridge ideas across chapters. We recommend the following sequence: first, solve
Problem Set 0, then proceed with Chapter 1, solve Problem Set 1, and so on
through the text.

arXiv link: http://arxiv.org/abs/2411.15625v1

Econometrics arXiv updated paper (originally submitted: 2024-11-22)

From Replications to Revelations: Heteroskedasticity-Robust Inference

Authors: Sebastian Kranz

Analysing the Stata regression commands from 4,420 reproduction packages of
leading economic journals, we find that, among the 40,571 regressions
specifying heteroskedasticity-robust standard errors, 98.1% adhere to Stata's
default HC1 specification. We then compare several heteroskedasticity-robust
inference methods with a large-scale Monte Carlo study based on regressions
from 155 reproduction packages. Our results show that t-tests based on HC1 or
HC2 with default degrees of freedom exhibit substantial over-rejection.
Inference methods with customized degrees of freedom, as proposed by Bell and
McCaffrey (2002), Hansen (2024), and a novel approach based on partial
leverages, perform best. Additionally, we provide deeper insights into the role
of leverages and partial leverages across different inference methods.

arXiv link: http://arxiv.org/abs/2411.14763v2
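
A minimal sketch of the comparison at issue, computing HC1 and HC2 standard errors alongside the leverages that drive the difference between them, is given below on simulated heteroskedastic data; it mirrors the general setup rather than the study's Monte Carlo design.

```python
# Contrast HC1 and HC2 robust standard errors and inspect leverages (illustrative).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 50
x = rng.exponential(size=n)                 # skewed regressor -> some high-leverage points
X = sm.add_constant(x)
y = 1 + 2 * x + rng.normal(scale=1 + x, size=n)   # heteroskedastic errors

ols = sm.OLS(y, X).fit()
h = (X * np.linalg.solve(X.T @ X, X.T).T).sum(axis=1)   # leverages (hat-matrix diagonal)

print("leverage range:", h.min().round(3), h.max().round(3))
print("HC1 se:", ols.get_robustcov_results("HC1").bse.round(3))
print("HC2 se:", ols.get_robustcov_results("HC2").bse.round(3))
```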

Econometrics arXiv updated paper (originally submitted: 2024-11-21)

Dynamic Spatial Interaction Models for a Resource Allocator's Decisions and Local Agents' Multiple Activities

Authors: Hanbat Jeong

This paper introduces a novel spatial interaction model to explore the
decision-making processes of a resource allocator and local agents, with
central and local governments serving as empirical representations. The model
captures two key features: (i) resource allocations from the allocator to local
agents and the resulting strategic interactions, and (ii) local agents'
multiple activities and their interactions. We develop a network game for the
micro-foundations of these processes. In this game, local agents engage in
multiple activities, while the allocator distributes resources by monitoring
the externalities arising from their interactions. The game's unique Nash
equilibrium establishes our econometric framework. To estimate the agent payoff
parameters, we employ the quasi-maximum likelihood (QML) estimation method and
examine the asymptotic properties of the QML estimator to ensure robust
statistical inference. Empirically, we study interactions among U.S. states in
public welfare and housing and community development expenditures, focusing on
how federal grants influence these expenditures and the interdependencies among
state governments. Our findings reveal significant spillovers across the
states' two expenditures. Additionally, we detect positive effects of federal
grants on both types of expenditures, inducing a responsive grant scheme based
on states' decisions. Last, we compare state expenditures and social welfare
through counterfactual simulations under two scenarios: (i) responsive
intervention by monitoring states' decisions and (ii) autonomous transfers. We
find that responsive intervention enhances social welfare by leading to an
increase in the states' two expenditures. However, due to the heavy reliance on
autonomous transfers, the magnitude of these improvements remains relatively
small compared to the share of federal grants in total state revenues.

arXiv link: http://arxiv.org/abs/2411.13810v2

Econometrics arXiv paper, submitted: 2024-11-20

Clustering with Potential Multidimensionality: Inference and Practice

Authors: Ruonan Xu, Luther Yap

We show how clustering standard errors in one or more dimensions can be
justified in M-estimation when there is sampling or assignment uncertainty.
Since existing procedures for variance estimation are either conservative or
invalid, we propose a variance estimator that refines a conservative procedure
and remains valid. We then interpret environments where clustering is
frequently employed in empirical work from our design-based perspective and
provide insights on their estimands and inference procedures.

arXiv link: http://arxiv.org/abs/2411.13372v1

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2024-11-20

Revealed Information

Authors: Laura Doval, Ran Eilat, Tianhao Liu, Yangfan Zhou

An analyst observes the frequency with which a decision maker (DM) takes
actions, but not the frequency conditional on payoff-relevant states. We ask
when the analyst can rationalize the DM's choices as if the DM first learns
something about the state before acting. We provide a support-function
characterization of the triples of utility functions, prior beliefs, and
(marginal) distributions over actions such that the DM's action distribution is
consistent with information given the DM's prior and utility function.
Assumptions on the cardinality of the state space and the utility function
allow us to refine this characterization, obtaining a sharp system of finitely
many inequalities the utility function, prior, and action distribution must
satisfy. We apply our characterization to study comparative statics and to
identify conditions under which a single information structure rationalizes
choices across multiple decision problems. We characterize the set of
distributions over posterior beliefs that are consistent with the DM's choices.
We extend our results to settings with a continuum of actions and states
assuming the first-order approach applies, and to simple multi-agent settings.

arXiv link: http://arxiv.org/abs/2411.13293v2

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2024-11-18

Prediction-Guided Active Experiments

Authors: Ruicheng Ao, Hongyu Chen, David Simchi-Levi

In this work, we introduce a new framework for active experimentation, the
Prediction-Guided Active Experiment (PGAE), which leverages predictions from an
existing machine learning model to guide sampling and experimentation.
Specifically, at each time step, an experimental unit is sampled according to a
designated sampling distribution, and the actual outcome is observed based on
an experimental probability. Otherwise, only a prediction for the outcome is
available. We begin by analyzing the non-adaptive case, where full information
on the joint distribution of the predictor and the actual outcome is assumed.
For this scenario, we derive an optimal experimentation strategy by minimizing
the semi-parametric efficiency bound for the class of regular estimators. We
then introduce an estimator that meets this efficiency bound, achieving
asymptotic optimality. Next, we move to the adaptive case, where the predictor
is continuously updated with newly sampled data. We show that the adaptive
version of the estimator remains efficient and attains the same semi-parametric
bound under certain regularity assumptions. Finally, we validate PGAE's
performance through simulations and a semi-synthetic experiment using data from
the US Census Bureau. The results underscore the PGAE framework's effectiveness
and superiority compared to other existing methods.

arXiv link: http://arxiv.org/abs/2411.12036v2

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2024-11-18

Debiased Regression for Root-N-Consistent Conditional Mean Estimation

Authors: Masahiro Kato

This study introduces a debiasing method for regression estimators, including
high-dimensional and nonparametric regression estimators. For example,
nonparametric regression methods allow for the estimation of regression
functions in a data-driven manner with minimal assumptions; however, these
methods typically fail to achieve $\sqrt{n}$-consistency in their convergence
rates, and many, including those in machine learning, lack guarantees that
their estimators asymptotically follow a normal distribution. To address these
challenges, we propose a debiasing technique for nonparametric estimators by
adding a bias-correction term to the original estimators, extending the
conventional one-step estimator used in semiparametric analysis. Specifically,
for each data point, we estimate the conditional expected residual of the
original nonparametric estimator, which can, for instance, be computed using
kernel (Nadaraya-Watson) regression, and incorporate it as a bias-reduction
term. Our theoretical analysis demonstrates that the proposed estimator
achieves $\sqrt{n}$-consistency and asymptotic normality under a mild
convergence rate condition for both the original nonparametric estimator and
the conditional expected residual estimator. Notably, this approach remains
model-free as long as the original estimator and the conditional expected
residual estimator satisfy the convergence rate condition. The proposed method
offers several advantages, including improved estimation accuracy and
simplified construction of confidence intervals.

arXiv link: http://arxiv.org/abs/2411.11748v3
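
A compact sketch of the bias-correction step described above: fit a flexible base regression, estimate the conditional expected residual with Nadaraya-Watson regression, and add it back to the original fit. The base learner, bandwidth, and data-generating process are placeholder assumptions.

```python
# Debiasing sketch: base nonparametric fit plus a Nadaraya-Watson estimate of
# the conditional expected residual (illustrative assumptions throughout).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(4)
n = 2_000
x = rng.uniform(-2, 2, size=n)
y = np.sin(2 * x) + 0.3 * rng.normal(size=n)

# original (possibly slowly converging) nonparametric estimator
base = RandomForestRegressor(n_estimators=200, min_samples_leaf=20, random_state=0)
base.fit(x.reshape(-1, 1), y)

def nw_residual(x0, x_train, resid, bandwidth=0.3):
    """Nadaraya-Watson estimate of E[residual | x = x0] with a Gaussian kernel."""
    w = np.exp(-0.5 * ((x_train - x0) / bandwidth) ** 2)
    return np.sum(w * resid) / np.sum(w)

resid = y - base.predict(x.reshape(-1, 1))
x_eval = np.linspace(-2, 2, 9)
debiased = base.predict(x_eval.reshape(-1, 1)) + np.array(
    [nw_residual(x0, x, resid) for x0 in x_eval]
)
print(np.round(debiased - np.sin(2 * x_eval), 3))   # errors of the corrected fit
```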

Econometrics arXiv updated paper (originally submitted: 2024-11-18)

Treatment Effect Estimators as Weighted Outcomes

Authors: Michael C. Knaus

Estimators that weight observed outcomes to form effect estimates have a long
tradition. Their outcome weights are widely used in established procedures,
such as checking covariate balance, characterizing target populations, or
detecting and managing extreme weights. This paper introduces a general
framework for deriving such outcome weights. It establishes when and how
numerical equivalence between an original estimator representation as moment
condition and a unique weighted representation can be obtained. The framework
is applied to derive novel outcome weights for the six seminal instances of
double machine learning and generalized random forests, while recovering
existing results for other estimators as special cases. The analysis highlights
that implementation choices determine (i) the availability of outcome weights
and (ii) their properties. Notably, standard implementations of partially
linear regression-based estimators, like causal forests, employ outcome weights
that do not sum to (minus) one in the (un)treated group, not fulfilling a
property often considered desirable.

arXiv link: http://arxiv.org/abs/2411.11559v2

Econometrics arXiv paper, submitted: 2024-11-17

Econometrics and Formalism of Psychological Archetypes of Scientific Workers with Introverted Thinking Type

Authors: Eldar Knar

The chronological hierarchy and classification of psychological types of
individuals are examined. The anomalous nature of psychological activity in
individuals involved in scientific work is highlighted. Certain aspects of the
introverted thinking type in scientific activities are analyzed. For the first
time, psychological archetypes of scientists with pronounced introversion are
postulated in the context of twelve hypotheses about the specifics of
professional attributes of introverted scientific activities.
A linear regression and Bayesian equation are proposed for quantitatively
assessing the econometric degree of introversion in scientific employees,
considering a wide range of characteristics inherent to introverts in
scientific processing. Specifically, expressions for a comprehensive assessment
of introversion in a linear model and the posterior probability of the
econometric (scientometric) degree of introversion in a Bayesian model are
formulated.
The models are based on several econometric (scientometric) hypotheses
regarding various aspects of professional activities of introverted scientists,
such as a preference for solo publications, low social activity, narrow
specialization, high research depth, and so forth. Empirical data and multiple
linear regression methods can be used to calibrate the equations. The model can
be applied to gain a deeper understanding of the psychological characteristics
of scientific employees, which is particularly useful in ergonomics and the
management of scientific teams and projects. The proposed method also provides
scientists with pronounced introversion the opportunity to develop their
careers, focusing on individual preferences and features.

arXiv link: http://arxiv.org/abs/2411.11058v1

Econometrics arXiv updated paper (originally submitted: 2024-11-17)

Program Evaluation with Remotely Sensed Outcomes

Authors: Ashesh Rambachan, Rahul Singh, Davide Viviano

Economists often estimate treatment effects in experiments using remotely
sensed variables (RSVs), e.g. satellite images or mobile phone activity, in
place of directly measured economic outcomes. A common practice is to use an
observational sample to train a predictor of the economic outcome from the RSV,
and then to use its predictions as the outcomes in the experiment. We show that
this method is biased whenever the RSV is post-outcome, i.e. if variation in
the economic outcome causes variation in the RSV. In program evaluation,
changes in poverty or environmental quality cause changes in satellite images,
but not vice versa. As our main result, we nonparametrically identify the
treatment effect by formalizing the intuition that underlies common practice:
the conditional distribution of the RSV given the outcome and treatment is
stable across the samples. Based on our identifying formula, we find that the
efficient representation of RSVs for causal inference requires three
predictions rather than one. Valid inference does not require any rate
conditions on RSV predictions, justifying the use of complex deep learning
algorithms with unknown statistical properties. We re-analyze the effect of an
anti-poverty program in India using satellite images.

arXiv link: http://arxiv.org/abs/2411.10959v2

Econometrics arXiv updated paper (originally submitted: 2024-11-16)

Building Interpretable Climate Emulators for Economics

Authors: Aryan Eftekhari, Doris Folini, Aleksandra Friedl, Felix Kübler, Simon Scheidegger, Olaf Schenk

We introduce a framework for developing efficient and interpretable climate
emulators (CEs) for economic models of climate change. The paper makes two main
contributions. First, we propose a general framework for constructing
carbon-cycle emulators (CCEs) for macroeconomic models. The framework is
implemented as a generalized linear multi-reservoir (box) model that conserves
key physical quantities and can be customized for specific applications. We
consider three versions of the CCE, which we evaluate within a simple
representative agent economic model: (i) a three-box setting comparable to
DICE-2016, (ii) a four-box extension, and (iii) a four-box version that
explicitly captures land-use change. While the three-box model reproduces
benchmark results well and the fourth reservoir adds little, incorporating the
impact of land-use change on the carbon storage capacity of the terrestrial
biosphere substantially alters atmospheric carbon stocks, temperature
trajectories, and the optimal mitigation path. Second, we investigate
pattern-scaling techniques that transform global-mean temperature projections
from CEs into spatially heterogeneous warming fields. We show how regional
baseline climates, non-uniform warming, and the associated uncertainties
propagate into economic damages.

arXiv link: http://arxiv.org/abs/2411.10768v2
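
To illustrate the kind of carbon-cycle emulator the framework describes, the sketch below iterates a mass-conserving linear three-box model forward under an emissions path. The transfer matrix, initial stocks, and emissions are illustrative placeholders, not calibrated values from the paper.

```python
# Linear three-reservoir (box) carbon-cycle sketch with a mass-conserving
# transfer matrix; all numbers are placeholders, not calibrated values.
import numpy as np

# boxes: [atmosphere, upper ocean/biosphere, deep ocean]; columns sum to one,
# so total carbon is conserved apart from the emission inflow
Phi = np.array([
    [0.88, 0.04, 0.000],
    [0.12, 0.94, 0.003],
    [0.00, 0.02, 0.997],
])
assert np.allclose(Phi.sum(axis=0), 1.0)

M = np.array([850.0, 460.0, 1740.0])        # initial stocks (GtC, illustrative)
emissions = np.full(20, 10.0)               # GtC per period, illustrative path

path = [M.copy()]
for e in emissions:
    M = Phi @ M + np.array([e, 0.0, 0.0])   # emissions enter the atmosphere box
    path.append(M.copy())
path = np.array(path)
print("atmospheric carbon after 20 periods:", path[-1, 0].round(1), "GtC")
```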

Econometrics arXiv paper, submitted: 2024-11-15

Feature Importance of Climate Vulnerability Indicators with Gradient Boosting across Five Global Cities

Authors: Lidia Cano Pecharroman, Melissa O. Tier, Elke U. Weber

Efforts are needed to identify and measure both communities' exposure to
climate hazards and the social vulnerabilities that interact with these
hazards, but the science of validating hazard vulnerability indicators is still
in its infancy. Progress is needed to improve: 1) the selection of variables
that are used as proxies to represent hazard vulnerability; 2) the
applicability and scale for which these indicators are intended, including
their transnational applicability. We administered an international urban
survey in Buenos Aires, Argentina; Johannesburg, South Africa; London, United
Kingdom; New York City, United States; and Seoul, South Korea in order to
collect data on exposure to various types of extreme weather events,
socioeconomic characteristics commonly used as proxies for vulnerability (i.e.,
income, education level, gender, and age), and additional characteristics not
often included in existing composite indices (i.e., queer identity, disability
identity, non-dominant primary language, and self-perceptions of both
discrimination and vulnerability to flood risk). We then use feature importance
analysis with gradient-boosted decision trees to measure the importance that
these variables have in predicting exposure to various types of extreme weather
events. Our results show that non-traditional variables were more relevant to
self-reported exposure to extreme weather events than traditionally employed
variables such as income or age. Furthermore, differences in variable relevance
across different types of hazards and across urban contexts suggest that
vulnerability indicators need to be fit to context and should not be used in a
one-size-fits-all fashion.

arXiv link: http://arxiv.org/abs/2411.10628v1
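
The feature-importance step can be illustrated with a short sketch: fit gradient-boosted trees to survey-style predictors of self-reported exposure and inspect the resulting importances. The variable names and data below are synthetic placeholders, not the survey data.

```python
# Gradient-boosted feature importance on synthetic survey-style data (illustrative).
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(5)
n = 3_000
X = pd.DataFrame({
    "income":            rng.normal(size=n),
    "age":               rng.integers(18, 80, size=n),
    "disability":        rng.integers(0, 2, size=n),
    "perceived_risk":    rng.normal(size=n),
    "non_dominant_lang": rng.integers(0, 2, size=n),
})
# synthetic exposure outcome driven mostly by the "non-traditional" variables
logit = 1.2 * X["perceived_risk"] + 0.8 * X["disability"] - 0.2 * X["income"]
y = (logit + rng.logistic(size=n) > 0).astype(int)

gb = GradientBoostingClassifier(n_estimators=300, max_depth=3, random_state=0)
gb.fit(X, y)
importance = pd.Series(gb.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importance.round(3))
```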

Econometrics arXiv paper, submitted: 2024-11-15

Monetary Incentives, Landowner Preferences: Estimating Cross-Elasticities in Farmland Conversion to Renewable Energy

Authors: Chad Fiechter, Binayak Kunwar, Guy Tchuente

This study examines the impact of monetary factors on the conversion of
farmland to renewable energy generation, specifically solar and wind, in the
context of expanding U.S. energy production. We propose a new econometric
method that accounts for the diverse circumstances of landowners, including
their unordered alternative land use options, non-monetary benefits from
farming, and the influence of local regulations. We demonstrate that
identifying the cross elasticity of landowners' farming income in relation to
the conversion of farmland to renewable energy requires an understanding of
their preferences. By utilizing county legislation that we assume to be shaped
by land-use preferences, we estimate the cross-elasticities of farming income.
Our findings indicate that monetary incentives may only influence landowners'
decisions in areas with potential for future residential development,
underscoring the importance of considering both preferences and regulatory
contexts.

arXiv link: http://arxiv.org/abs/2411.10600v1

Econometrics arXiv updated paper (originally submitted: 2024-11-15)

Dynamic Causal Effects in a Nonlinear World: the Good, the Bad, and the Ugly

Authors: Michal Kolesár, Mikkel Plagborg-Møller

Applied macroeconomists frequently use impulse response estimators motivated
by linear models. We study whether the estimands of such procedures have a
causal interpretation when the true data generating process is in fact
nonlinear. We show that vector autoregressions and linear local projections
onto observed shocks or proxies identify weighted averages of causal effects
regardless of the extent of nonlinearities. By contrast, identification
approaches that exploit heteroskedasticity or non-Gaussianity of latent shocks
are highly sensitive to departures from linearity. Our analysis is based on new
results on the identification of marginal treatment effects through weighted
regressions, which may also be of interest to researchers outside
macroeconomics.

arXiv link: http://arxiv.org/abs/2411.10415v4

Econometrics arXiv paper, submitted: 2024-11-15

Semiparametric inference for impulse response functions using double/debiased machine learning

Authors: Daniele Ballinari, Alexander Wehrli

We introduce a double/debiased machine learning (DML) estimator for the
impulse response function (IRF) in settings where a time series of interest is
subjected to multiple discrete treatments, assigned over time, which can have a
causal effect on future outcomes. The proposed estimator can rely on fully
nonparametric relations between treatment and outcome variables, opening up the
possibility to use flexible machine learning approaches to estimate IRFs. To
this end, we extend the theory of DML from an i.i.d. to a time series setting
and show that the proposed DML estimator for the IRF is consistent and
asymptotically normally distributed at the parametric rate, allowing for
semiparametric inference for dynamic effects in a time series setting. The
properties of the estimator are validated numerically in finite samples by
applying it to learn the IRF in the presence of serial dependence in both the
confounder and observation innovation processes. We also illustrate the
methodology empirically by applying it to the estimation of the effects of
macroeconomic shocks.

arXiv link: http://arxiv.org/abs/2411.10009v1

Econometrics arXiv updated paper (originally submitted: 2024-11-14)

Sharp Testable Implications of Encouragement Designs

Authors: Yuehao Bai, Shunzhuang Huang, Max Tabord-Meehan

This paper studies a potential outcome model with a continuous or discrete
outcome, a discrete multi-valued treatment, and a discrete multi-valued
instrument. We derive sharp, closed-form testable implications for a class of
restrictions on potential treatments where each value of the instrument
encourages towards at most one unique treatment choice; such restrictions serve
as the key identifying assumption in several prominent recent empirical papers.
Borrowing the terminology used in randomized experiments, we call such a
setting an encouragement design. The testable implications are inequalities in
terms of the conditional distributions of choices and the outcome given the
instrument. Through a novel constructive argument, we show these inequalities
are sharp in the sense that any distribution of the observed data that
satisfies these inequalities is compatible with this class of restrictions on
potential treatments. Based on these inequalities, we propose tests of the
restrictions. In an empirical application, we show some of these restrictions
are violated and pinpoint the substitution pattern that leads to the violation.

arXiv link: http://arxiv.org/abs/2411.09808v3

Econometrics arXiv paper, submitted: 2024-11-14

Bayesian estimation of finite mixtures of Tobit models

Authors: Caio Waisman

This paper outlines a Bayesian approach to estimate finite mixtures of Tobit
models. The method consists of an MCMC approach that combines Gibbs sampling
with data augmentation and is simple to implement. I show through simulations
that the flexibility provided by this method is especially helpful when
censoring is not negligible. In addition, I demonstrate the broad utility of
this methodology with applications to a job training program, labor supply, and
demand for medical care. I find that this approach allows for non-trivial
additional flexibility that can alter results considerably and beyond improving
model fit.

arXiv link: http://arxiv.org/abs/2411.09771v1
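
To give a flavour of the Gibbs-with-data-augmentation machinery, the sketch below estimates a single Tobit model (the mixture layer is omitted) with left-censoring at zero and flat priors; the priors, censoring point, and data are illustrative assumptions.

```python
# Gibbs sampler with data augmentation for a single Tobit model (illustrative).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 1_000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta_true, sigma_true = np.array([-0.5, 1.0]), 1.0
y_star = X @ beta_true + sigma_true * rng.normal(size=n)
y = np.maximum(y_star, 0.0)                       # observed outcome, censored at 0
cens = y <= 0.0

beta, sigma2 = np.zeros(2), 1.0
draws = []
for it in range(2_000):
    # 1) data augmentation: impute latent outcomes for censored observations
    mu = X @ beta
    z = y.copy()
    b_std = (0.0 - mu[cens]) / np.sqrt(sigma2)
    z[cens] = stats.truncnorm.rvs(-np.inf, b_std, loc=mu[cens],
                                  scale=np.sqrt(sigma2), random_state=rng)
    # 2) draw regression coefficients given the completed data (flat prior)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ z
    beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)
    # 3) draw the variance from its inverse-gamma full conditional
    resid = z - X @ beta
    sigma2 = 1.0 / rng.gamma(shape=n / 2.0, scale=2.0 / (resid @ resid))
    if it >= 500:
        draws.append(np.append(beta, np.sqrt(sigma2)))

print(np.mean(draws, axis=0).round(2))   # compare with the true (-0.5, 1.0, 1.0)
```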

Econometrics arXiv paper, submitted: 2024-11-14

Sparse Interval-valued Time Series Modeling with Machine Learning

Authors: Haowen Bao, Yongmiao Hong, Yuying Sun, Shouyang Wang

By treating intervals as inseparable sets, this paper proposes sparse machine
learning regressions for high-dimensional interval-valued time series. With
LASSO or adaptive LASSO techniques, we develop a penalized minimum distance
estimator, which covers point-based estimators as special cases. We establish
the consistency and oracle properties of the proposed penalized estimator,
regardless of whether the number of predictors is diverging with the sample
size. Monte Carlo simulations demonstrate the favorable finite sample
properties of the proposed estimation. Empirical applications to
interval-valued crude oil price forecasting and sparse index-tracking portfolio
construction illustrate the robustness and effectiveness of our method against
competing approaches, including random forest and multilayer perceptron for
interval-valued data. Our findings highlight the potential of machine learning
techniques in interval-valued time series analysis, offering new insights for
financial forecasting and portfolio management.

arXiv link: http://arxiv.org/abs/2411.09452v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2024-11-14

On Asymptotic Optimality of Least Squares Model Averaging When True Model Is Included

Authors: Wenchao Xu, Xinyu Zhang

Asymptotic optimality is a key theoretical property in model averaging. Due
to technical difficulties, existing studies rely on restricted weight sets or
the assumption that there is no true model with fixed dimensions in the
candidate set. The focus of this paper is to overcome these difficulties.
Surprisingly, we discover that when the penalty factor in the weight selection
criterion diverges with a certain order and the true model dimension is fixed,
asymptotic loss optimality does not hold, but asymptotic risk optimality does.
This result differs from the corresponding result of Fang et al. (2023,
Econometric Theory 39, 412-441) and reveals that using the discrete weight set
of Hansen (2007, Econometrica 75, 1175-1189) can yield opposite asymptotic
properties compared to using the usual weight set. Simulation studies
illustrate the theoretical findings in a variety of settings.

arXiv link: http://arxiv.org/abs/2411.09258v1

Econometrics arXiv updated paper (originally submitted: 2024-11-14)

Difference-in-Differences with Sample Selection

Authors: Gayani Rathnayake, Akanksha Negi, Otavio Bartalotti, Xueyan Zhao

We consider identification of average treatment effects on the treated (ATT)
within the difference-in-differences (DiD) framework in the presence of
endogenous sample selection. First, we establish that the usual DiD estimand
fails to recover meaningful treatment effects, even if selection and treatment
assignment are independent. Next, we partially identify the ATT for individuals
who are always observed post-treatment regardless of their treatment status,
and derive bounds on this parameter under different sets of assumptions about
the relationship between sample selection and treatment assignment. Extensions
to the repeated cross-section and two-by-two comparisons in the staggered
adoption case are explored. Furthermore, we provide identification results for
the ATT of three additional empirically relevant latent groups by incorporating
outcome mean dominance assumptions which have intuitive appeal in applications.
Finally, two empirical illustrations demonstrate the approach's usefulness by
revisiting (i) the effect of a job training program on earnings (Calonico &
Smith, 2017) and (ii) the effect of a working-from-home policy on employee
performance (Bloom, Liang, Roberts, & Ying, 2015).

arXiv link: http://arxiv.org/abs/2411.09221v2

Econometrics arXiv updated paper (originally submitted: 2024-11-14)

On the (Mis)Use of Machine Learning with Panel Data

Authors: Augusto Cerqua, Marco Letta, Gabriele Pinto

We provide the first systematic assessment of data leakage issues in the use
of machine learning on panel data. Our organizing framework clarifies why
neglecting the cross-sectional and longitudinal structure of these data leads
to hard-to-detect data leakage, inflated out-of-sample performance, and an
inadvertent overestimation of the real-world usefulness and applicability of
machine learning models. We then offer empirical guidelines for practitioners
to ensure the correct implementation of supervised machine learning in panel
data environments. An empirical application, using data from over 3,000 U.S.
counties spanning 2000-2019 and focused on income prediction, illustrates the
practical relevance of these points across nearly 500 models for both
classification and regression tasks.
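
One concrete way to respect the longitudinal structure the authors emphasize is a walk-forward split that trains only on earlier years and evaluates on a later year, rather than shuffling unit-year rows at random. The sketch below is a minimal, hypothetical example; the column names, learner, and metric are assumptions, not the paper's protocol.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

def walk_forward_panel_eval(df, features, target="income", time_col="year"):
    """Fit on all years strictly before t, evaluate on year t, for each t."""
    scores = {}
    for t in sorted(df[time_col].unique())[1:]:
        train = df[df[time_col] < t]            # only past periods enter training
        test = df[df[time_col] == t]            # evaluation period is held out in time
        model = RandomForestRegressor(n_estimators=200, random_state=0)
        model.fit(train[features], train[target])
        scores[t] = mean_absolute_error(test[target], model.predict(test[features]))
    return scores
```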

arXiv link: http://arxiv.org/abs/2411.09218v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-11-13

Covariate Adjustment in Randomized Experiments Motivated by Higher-Order Influence Functions

Authors: Sihui Zhao, Xinbo Wang, Lin Liu, Xin Zhang

Higher-Order Influence Functions (HOIF), developed in a series of papers over
the past twenty years, is a fundamental theoretical device for constructing
rate-optimal causal-effect estimators from observational studies. However, the
value of HOIF for analyzing well-conducted randomized controlled trials (RCTs)
has not been explicitly explored. In the recent U.S. Food and Drug
Administration (FDA) and European Medicines Agency (EMA) guidelines on the
practice of covariate adjustment in analyzing RCTs, in addition to the simple,
unadjusted difference-in-mean estimator, it was also recommended to report the
estimator adjusting for baseline covariates via a simple parametric working
model, such as a linear model. In this paper, we show that a HOIF-motivated
estimator for the treatment-specific mean has significantly improved
statistical properties compared to popular adjusted estimators in practice when
the number of baseline covariates $p$ is relatively large compared to the
sample size $n$. We also characterize the conditions under which the
HOIF-motivated estimator improves upon the unadjusted one. Furthermore, we
demonstrate that a novel debiased adjusted estimator proposed recently by Lu et
al. is, in fact, another HOIF-motivated estimator in disguise. Numerical and
empirical studies are conducted to corroborate our theoretical findings.

arXiv link: http://arxiv.org/abs/2411.08491v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-11-12

MSTest: An R-Package for Testing Markov Switching Models

Authors: Gabriel Rodriguez-Rondon, Jean-Marie Dufour

We present the R package MSTest, which implements hypothesis testing
procedures to identify the number of regimes in Markov switching models. These
models have wide-ranging applications in economics, finance, and numerous other
fields. The MSTest package includes the Monte Carlo likelihood ratio test
procedures proposed by Rodriguez-Rondon and Dufour (2024), the moment-based
tests of Dufour and Luger (2017), the parameter stability tests of Carrasco,
Hu, and Ploberger (2014), and the likelihood ratio test of Hansen (1992).
Additionally, the package enables users to simulate and estimate univariate and
multivariate Markov switching and hidden Markov processes, using the
expectation-maximization (EM) algorithm or maximum likelihood estimation (MLE).
We demonstrate the functionality of the MSTest package through both simulation
experiments and an application to U.S. GNP growth data.

arXiv link: http://arxiv.org/abs/2411.08188v1

Econometrics arXiv updated paper (originally submitted: 2024-11-12)

A Note on Doubly Robust Estimator in Regression Discontinuity Designs

Authors: Masahiro Kato

This note introduces a doubly robust (DR) estimator for regression
discontinuity (RD) designs. RD designs provide a quasi-experimental framework
for estimating treatment effects, where treatment assignment depends on whether
a running variable surpasses a predefined cutoff. A common approach in RD
estimation is the use of nonparametric regression methods, such as local linear
regression. However, the validity of these methods still relies on the
consistency of the nonparametric estimators. In this study, we propose the
DR-RD estimator, which combines two distinct estimators for the conditional
expected outcomes. The primary advantage of the DR-RD estimator lies in its
ability to ensure the consistency of the treatment effect estimation as long as
at least one of the two estimators is consistent. Consequently, our DR-RD
estimator enhances robustness of treatment effect estimators in RD designs.

arXiv link: http://arxiv.org/abs/2411.07978v4

Econometrics arXiv updated paper (originally submitted: 2024-11-12)

Matching $\leq$ Hybrid $\leq$ Difference in Differences

Authors: Yechan Park, Yuya Sasaki

Since LaLonde's (1986) seminal paper, there has been ongoing interest in
estimating treatment effects using pre- and post-intervention data. Scholars
have traditionally used experimental benchmarks to evaluate the accuracy of
alternative econometric methods, including Matching, Difference-in-Differences
(DID), and their hybrid forms (e.g., Heckman et al., 1998b; Dehejia and Wahba,
2002; Smith and Todd, 2005). We revisit these methodologies in the evaluation
of job training and educational programs using four datasets (LaLonde, 1986;
Heckman et al., 1998a; Smith and Todd, 2005; Chetty et al., 2014a; Athey et
al., 2020), and show that the inequality relationship, Matching $\leq$ Hybrid
$\leq$ DID, appears as a consistent norm, rather than a mere coincidence. We
provide a formal theoretical justification for this puzzling phenomenon under
plausible conditions such as negative selection, by generalizing the classical
bracketing (Angrist and Pischke, 2009, Section 5). Consequently, when
treatments are expected to be non-negative, DID tends to provide optimistic
estimates, while Matching offers more conservative ones.

arXiv link: http://arxiv.org/abs/2411.07952v2

Econometrics arXiv paper, submitted: 2024-11-12

Impact of R&D and AI Investments on Economic Growth and Credit Rating

Authors: Davit Gondauri, Ekaterine Mikautadze

The research and development (R&D) phase is essential for fostering
innovation and aligns with long-term strategies in both public and private
sectors. This study addresses two primary research questions: (1) assessing the
relationship between R&D investments and GDP through regression analysis, and
(2) estimating the economic value added (EVA) that Georgia must generate to
progress from a BB to a BBB credit rating. Using World Bank data from
2014-2022, this analysis found that increasing R&D, with an emphasis on AI, by
30-35% has a measurable impact on GDP. Regression results reveal a coefficient
of 7.02%, indicating a 10% increase in R&D leads to a 0.70% GDP rise, with an
81.1% determination coefficient and a strong 90.1% correlation.
Georgia's EVA model was calculated to determine the additional value needed
for a BBB rating, comparing indicators from Greece, Hungary, India, and
Kazakhstan as benchmarks. Key economic indicators considered were nominal GDP,
GDP per capita, real GDP growth, and fiscal indicators (government balance/GDP,
debt/GDP). The EVA model projects that to achieve a BBB rating within nine
years, Georgia requires $61.7 billion in investments. Utilizing EVA and
comprehensive economic indicators will support informed decision-making and
enhance the analysis of Georgia's economic trajectory.

arXiv link: http://arxiv.org/abs/2411.07817v1

Econometrics arXiv updated paper (originally submitted: 2024-11-12)

Spatial Competition on Psychological Pricing Strategies -- Preliminary Evidence from an Online Marketplace

Authors: Magdalena Schindl, Felix Reichel

This paper investigates whether spatial proximity shapes
psychological-pricing choices on Austria's C2C marketplace willhaben. Two
web-scraped snapshots of 826 Woom Bike listings (a standardised product sold
on the platform) reveal that sellers near direct competitors are more likely to
adopt 9-, 90-, or 99-ending prices, and that they do so regardless of product
characteristics or underlying spatiotemporal differences. This strategy is
associated with an average premium of approximately 3.4%, ceteris paribus.
Information asymmetry persists: buyer trust hinges on signals such as the
"Trusted Seller" badge, and data on the "PayLivery" feature are missing. The
lack of final transaction prices limits inference.

arXiv link: http://arxiv.org/abs/2411.07808v3

Econometrics arXiv paper, submitted: 2024-11-12

Dynamic Evolutionary Game Analysis of How Fintech in Banking Mitigates Risks in Agricultural Supply Chain Finance

Authors: Qiang Wan, Jun Cui

This paper explores the impact of banking fintech on reducing financial risks
in the agricultural supply chain, focusing on the secondary allocation of
commercial credit. The study constructs a three-player evolutionary game model
involving banks, core enterprises, and SMEs to analyze how fintech innovations,
such as big data credit assessment, blockchain, and AI-driven risk evaluation,
influence financial risks and access to credit. The findings reveal that
banking fintech reduces financing costs and mitigates financial risks by
improving transaction reliability, enhancing risk identification, and
minimizing information asymmetry. By optimizing cooperation between banks, core
enterprises, and SMEs, fintech solutions enhance the stability of the
agricultural supply chain, contributing to rural revitalization goals and
sustainable agricultural development. The study provides new theoretical
insights and practical recommendations for improving agricultural finance
systems and reducing financial risks.
Keywords: banking fintech, agricultural supply chain, financial risk,
commercial credit, SMEs, evolutionary game model, big data, blockchain,
AI-driven risk evaluation.

arXiv link: http://arxiv.org/abs/2411.07604v1

Econometrics arXiv cross-link from cs.AI (cs.AI), submitted: 2024-11-11

Evaluating the Accuracy of Chatbots in Financial Literature

Authors: Orhan Erdem, Kristi Hassett, Feyzullah Egriboyun

We evaluate the reliability of two chatbots, ChatGPT (4o and o1-preview
versions), and Gemini Advanced, in providing references on financial literature
and employing novel methodologies. Alongside the conventional binary approach
commonly used in the literature, we developed a nonbinary approach and a
recency measure to assess how hallucination rates vary with how recent a topic
is. After analyzing 150 citations, ChatGPT-4o had a hallucination rate of 20.0%
(95% CI, 13.6%-26.4%), while the o1-preview had a hallucination rate of 21.3%
(95% CI, 14.8%-27.9%). In contrast, Gemini Advanced exhibited higher
hallucination rates: 76.7% (95% CI, 69.9%-83.4%). While hallucination rates
increased for more recent topics, this trend was not statistically significant
for Gemini Advanced. These findings emphasize the importance of verifying
chatbot-provided references, particularly in rapidly evolving fields.

arXiv link: http://arxiv.org/abs/2411.07031v1

Econometrics arXiv updated paper (originally submitted: 2024-11-10)

Return and Volatility Forecasting Using On-Chain Flows in Cryptocurrency Markets

Authors: Yeguang Chi, Qionghua Chu, Wenyan Hao

We empirically examine the intraday return- and volatility-forecasting power
of on-chain flow data for Bitcoin (BTC), Ethereum (ETH), and Tether (USDT) at
intraday frequencies of 1-6 hours over the 2017-2023 period. We find that ETH
net inflows strongly predict ETH returns and volatility and that, differing
significantly from the forecasting patterns for BTC, they predict both
negatively. First, we find that USDT flowing out of investors' wallets and into
cryptocurrency exchanges, namely, USDT net inflows into the exchanges,
positively predicts BTC and ETH returns at multiple intervals and negatively
predicts ETH volatility at various intervals and BTC volatility at the 6-hour
interval. Second, we find that ETH net inflows negatively predict ETH returns
and volatility for all intraday intervals. Third, BTC net inflows generally
lack predictive power for BTC returns (except at 4 hours) but are negatively
associated with volatility across all intraday
intervals. We illustrate our findings on return forecasting via case studies.
Moreover, we develop option strategies to assess profits and losses on ETH
investments based on ETH net inflows. Our findings contribute to the growing
literature on on-chain activity and its asset pricing implications, offering
economically relevant insights for intraday portfolio management in
cryptocurrency markets.

arXiv link: http://arxiv.org/abs/2411.06327v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2024-11-08

On the limiting variance of matching estimators

Authors: Songliang Chen, Fang Han

This paper examines the limiting variance of nearest neighbor matching
estimators for average treatment effects with a fixed number of matches. We
present, for the first time, a closed-form expression for this limit. Here the
key is the establishment of the limiting second moment of the catchment area's
volume, which resolves a question of Abadie and Imbens. At the core of our
approach is a new universality theorem on the measures of high-order Voronoi
cells, extending a result by Devroye, Györfi, Lugosi, and Walk.

arXiv link: http://arxiv.org/abs/2411.05758v1

Econometrics arXiv paper, submitted: 2024-11-08

Firm Heterogeneity and Macroeconomic Fluctuations: a Functional VAR model

Authors: Massimiliano Marcellino, Andrea Renzetti, Tommaso Tornese

We develop a Functional Augmented Vector Autoregression (FunVAR) model to
explicitly incorporate firm-level heterogeneity observed in more than one
dimension and study its interaction with aggregate macroeconomic fluctuations.
Our methodology employs dimensionality reduction techniques for tensor data
objects to approximate the joint distribution of firm-level characteristics.
More broadly, our framework can be used for assessing predictions from
structural models that account for micro-level heterogeneity observed on
multiple dimensions. Leveraging firm-level data from the Compustat database, we
use the FunVAR model to analyze the propagation of total factor productivity
(TFP) shocks, examining their impact on both macroeconomic aggregates and the
cross-sectional distribution of capital and labor across firms.

arXiv link: http://arxiv.org/abs/2411.05695v1

Econometrics arXiv paper, submitted: 2024-11-08

Nowcasting distributions: a functional MIDAS model

Authors: Massimiliano Marcellino, Andrea Renzetti, Tommaso Tornese

We propose a functional MIDAS model to leverage high-frequency information
for forecasting and nowcasting distributions observed at a lower frequency. We
approximate the low-frequency distribution using Functional Principal Component
Analysis and consider a group lasso spike-and-slab prior to identify the
relevant predictors in the finite-dimensional SUR-MIDAS approximation of the
functional MIDAS model. In our application, we use the model to nowcast the
U.S. households' income distribution. Our findings indicate that the model
enhances forecast accuracy for the entire target distribution and for key
features of the distribution that signal changes in inequality.

arXiv link: http://arxiv.org/abs/2411.05629v1

Econometrics arXiv updated paper (originally submitted: 2024-11-08)

Detecting Cointegrating Relations in Non-stationary Matrix-Valued Time Series

Authors: Alain Hecq, Ivan Ricardo, Ines Wilms

This paper proposes a Matrix Error Correction Model to identify cointegration
relations in matrix-valued time series. We hereby allow separate cointegrating
relations along the rows and columns of the matrix-valued time series and use
information criteria to select the cointegration ranks. Through Monte Carlo
simulations and a macroeconomic application, we demonstrate that our approach
provides a reliable estimation of the number of cointegrating relationships.

arXiv link: http://arxiv.org/abs/2411.05601v2

Econometrics arXiv updated paper (originally submitted: 2024-11-07)

Inference for Treatment Effects Conditional on Generalized Principal Strata using Instrumental Variables

Authors: Yuehao Bai, Shunzhuang Huang, Sarah Moon, Andres Santos, Azeem M. Shaikh, Edward J. Vytlacil

We propose a general approach for inference for a broad class of treatment
effect parameters in a setting of a discrete valued treatment and instrument
with a general outcome variable. The class of parameters considered are those
that can be expressed as the expectation of a function of the response type
conditional on a generalized principal stratum. Here, the response type refers
to the vector of potential outcomes and potential treatments, and a generalized
principal stratum is a set of possible values for the response type. In
addition to instrument exogeneity, the main substantive restriction imposed
rules out certain values for the response types in the sense that they are
assumed to occur with probability zero. It is shown through a series of
examples that this framework includes a wide variety of parameters and
assumptions that have been considered in the previous literature. A key result
in our analysis is a characterization of the identified set for such parameters
under these assumptions in terms of existence of a non-negative solution to
linear systems of equations with a special structure. We propose methods for
inference exploiting this special structure and recent results in Fang et al.
(2023).
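
Schematically, the characterization above reduces membership of a candidate value in the identified set to a linear-programming feasibility question: does a non-negative distribution over response types exist that matches the observed distribution and is consistent with the candidate value? The sketch below illustrates only this structure; the matrices A_obs, b_obs and the row a_theta encoding the candidate value are placeholders the user must construct for a specific model, and the special structure exploited in the paper is not used here.

```python
import numpy as np
from scipy.optimize import linprog

def in_identified_set(A_obs, b_obs, a_theta):
    """Feasibility of: A_obs @ q = b_obs,  a_theta @ q = 0,  q >= 0.

    q is a distribution over response types; A_obs and b_obs encode consistency
    with the observed distribution (and any zero-probability restrictions);
    a_theta encodes the candidate parameter value, e.g. entries f(r) - theta on
    the relevant stratum and 0 elsewhere for a conditional-mean parameter.
    """
    A_eq = np.vstack([A_obs, a_theta])
    b_eq = np.concatenate([np.asarray(b_obs, dtype=float), [0.0]])
    res = linprog(c=np.zeros(A_eq.shape[1]), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * A_eq.shape[1], method="highs")
    return res.status == 0      # status 0: a feasible point was found
```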

arXiv link: http://arxiv.org/abs/2411.05220v2

Econometrics arXiv paper, submitted: 2024-11-07

The role of expansion strategies and operational attributes on hotel performance: a compositional approach

Authors: Carles Mulet-Forteza, Berta Ferrer-Rosell, Onofre Martorell Cunill, Salvador Linares-Mustarós

This study aims to explore the impact of expansion strategies and specific
attributes of hotel establishments on the performance of international hotel
chains, focusing on four key performance indicators: RevPAR, efficiency,
occupancy, and asset turnover. Data were collected from 255 hotels across
various international hotel chains, providing a comprehensive assessment of how
different expansion strategies and hotel attributes influence performance. The
research employs compositional data analysis (CoDA) to address the
methodological limitations of traditional financial ratios in statistical
analysis. The findings indicate that ownership-based expansion strategies
result in higher operational performance, as measured by revenue per available
room, but yield lower economic performance due to the high capital investment
required. Non-ownership strategies, such as management contracts and
franchising, show superior economic efficiency, offering more flexibility and
reduced financial risk. This study contributes to the hospitality management
literature by applying CoDA, a novel methodological approach in this field, to
examine the performance of different hotel expansion strategies with a sound
and more appropriate method. The insights provided can guide hotel managers and
investors in making informed decisions to optimize both operational and
economic performance.

arXiv link: http://arxiv.org/abs/2411.04640v1

Econometrics arXiv paper, submitted: 2024-11-07

Partial Identification of Distributional Treatment Effects in Panel Data using Copula Equality Assumptions

Authors: Heshani Madigasekara, D. S. Poskitt, Lina Zhang, Xueyan Zhao

This paper aims to partially identify the distributional treatment effects
(DTEs) that depend on the unknown joint distribution of treated and untreated
potential outcomes. We construct the DTE bounds using panel data and allow
individuals to switch between the treated and untreated states more than once
over time. Individuals are grouped based on their past treatment history, and
DTEs are allowed to be heterogeneous across different groups. We provide two
alternative group-wise copula equality assumptions to bound the unknown joint
and the DTEs, both of which leverage information from the past observations.
The testability of these two assumptions is also discussed, and test results are
presented. We apply this method to study the treatment effect heterogeneity of
exercising on adults' body weight. These results demonstrate that our
method improves the identification power of the DTE bounds compared to the
existing methods.

arXiv link: http://arxiv.org/abs/2411.04450v1

Econometrics arXiv paper, submitted: 2024-11-07

Identification of Long-Term Treatment Effects via Temporal Links, Observational, and Experimental Data

Authors: Filip Obradović

Recent literature proposes combining short-term experimental and long-term
observational data to provide credible alternatives to conventional
observational studies for identification of long-term average treatment effects
(LTEs). I show that experimental data have an auxiliary role in this context.
They bring no identifying power without additional modeling assumptions. When
modeling assumptions are imposed, experimental data serve to amplify their
identifying power. If the assumptions fail, adding experimental data may only
yield results that are farther from the truth. Motivated by this, I introduce
two assumptions on treatment response that may be defensible based on economic
theory or intuition. To utilize them, I develop a novel two-step identification
approach that centers on bounding temporal link functions -- the relationship
between short-term and mean long-term potential outcomes. The approach provides
sharp bounds on LTEs for a general class of assumptions, and allows for
imperfect experimental compliance -- extending existing results.

arXiv link: http://arxiv.org/abs/2411.04380v1

Econometrics arXiv updated paper (originally submitted: 2024-11-06)

Lee Bounds with a Continuous Treatment in Sample Selection

Authors: Ying-Ying Lee, Chu-An Liu

We study causal inference in sample selection models where a continuous or
multivalued treatment affects both outcomes and their observability (e.g.,
employment or survey responses). We generalized the widely used Lee (2009)'s
bounds for binary treatment effects. Our key innovation is a sufficient
treatment values assumption that imposes weak restrictions on selection
heterogeneity and is implicit in separable threshold-crossing models, including
monotone effects on selection. Our double debiased machine learning estimator
enables nonparametric and high-dimensional methods, using covariates to tighten
the bounds and capture heterogeneity. Applications to Job Corps and CCC program
evaluations reinforce prior findings under weaker assumptions.

arXiv link: http://arxiv.org/abs/2411.04312v4

Econometrics arXiv paper, submitted: 2024-11-06

Bounded Rationality in Central Bank Communication

Authors: Wonseong Kim, Choong Lyol Lee

This study explores the influence of FOMC sentiment on market expectations,
focusing on cognitive differences between experts and non-experts. Using
sentiment analysis of FOMC minutes, we integrate these insights into a bounded
rationality model to examine the impact on inflation expectations. Results show
that experts form more conservative expectations, anticipating FOMC
stabilization actions, while non-experts react more directly to inflation
concerns. A lead-lag analysis indicates that institutions adjust faster, though
the gap with individual investors narrows in the short term. These findings
highlight the need for tailored communication strategies to better align public
expectations with policy goals.

arXiv link: http://arxiv.org/abs/2411.04286v1

Econometrics arXiv updated paper (originally submitted: 2024-11-06)

An Adversarial Approach to Identification

Authors: Irene Botosaru, Isaac Loh, Chris Muris

We introduce a new framework for characterizing identified sets of structural
and counterfactual parameters in econometric models. By reformulating the
identification problem as a set membership question, we leverage the separating
hyperplane theorem in the space of observed probability measures to
characterize the identified set through the zeros of a discrepancy function
with an adversarial game interpretation. The set can be a singleton, resulting
in point identification. A feature of many econometric models, with or without
distributional assumptions on the error terms, is that the probability measure
of observed variables can be expressed as a linear transformation of the
probability measure of latent variables. This structure provides a unifying
framework and facilitates computation and inference via linear programming. We
demonstrate the versatility of our approach by applying it to nonlinear panel
models with fixed effects, with parametric and nonparametric error
distributions, and across various exogeneity restrictions, including strict and
sequential.

arXiv link: http://arxiv.org/abs/2411.04239v2

Econometrics arXiv updated paper (originally submitted: 2024-11-06)

Identification and Inference in General Bunching Designs

Authors: Myunghyun Song

This paper develops an econometric framework and tools for the identification
and inference of a structural parameter in general bunching designs. We present
point and partial identification results, which generalize previous approaches
in the literature. The key assumption for point identification is the
analyticity of the counterfactual density, which defines a broader class of
distributions than many commonly used parametric families. In the partial
identification approach, the analyticity condition is relaxed and various
inequality restrictions can be incorporated. Both of our identification
approaches allow for observed covariates in the model, which has previously
been permitted only in limited ways. These covariates allow us to account for
observable factors that influence decisions regarding the running variable. We
provide a suite of counterfactual estimation and inference methods, termed the
generalized polynomial strategy. Our method restores the merits of the original
polynomial strategy proposed by Chetty et al. (2011) while addressing several
weaknesses in the widespread practice. The efficacy of the proposed method is
demonstrated compared to the polynomial estimator in a series of Monte Carlo
studies within the augmented isoelastic model. We revisit the data used in Saez
(2010) and find substantially different results relative to those from the
polynomial strategy.

arXiv link: http://arxiv.org/abs/2411.03625v3

Econometrics arXiv updated paper (originally submitted: 2024-11-05)

Improving precision of A/B experiments using trigger intensity

Authors: Tanmoy Das, Dohyeon Lee, Arnab Sinha

In industry, online randomized controlled experiment (a.k.a. A/B experiment)
is a standard approach to measure the impact of a causal change. These
experiments have small treatment effects to reduce the potential blast radius.
As a result, these experiments often lack statistical significance due to low
signal-to-noise ratio. A standard approach for improving the precision (or
reducing the standard error) focuses only on the trigger observations, where
the output of the treatment and the control model are different. Although
evaluation with full information about trigger observations (full knowledge)
improves the precision, detecting all such trigger observations is a costly
affair. In this paper, we propose a sampling based evaluation method (partial
knowledge) to reduce this cost. The randomness of sampling introduces bias in
the estimated outcome. We theoretically analyze this bias and show that the
bias is inversely proportional to the number of observations used for sampling.
We also compare the proposed evaluation methods using simulation and empirical
data. In simulation, bias in evaluation with partial knowledge effectively
reduces to zero when a limited number of observations (<= 0.1%) are sampled for
trigger estimation. In empirical setup, evaluation with partial knowledge
reduces the standard error by 36.48%.

arXiv link: http://arxiv.org/abs/2411.03530v2

Econometrics arXiv updated paper (originally submitted: 2024-11-05)

Randomly Assigned First Differences?

Authors: Facundo Argañaraz, Clément de Chaisemartin, Ziteng Lei

We consider treatment-effect estimation using a first-difference regression
of an outcome evolution $\Delta Y$ on a treatment evolution $\Delta D$. Under a
causal model in levels with a time-varying effect, the regression residual is a
function of the period-one treatment $D_{1}$. Then, researchers should test if
$\Delta D$ and $D_{1}$ are correlated: if they are, the regression may suffer
from an omitted variable bias. To solve it, researchers may control
nonparametrically for $E(\Delta D|D_{1})$. We use our results to revisit
first-difference regressions estimated on the data of
Acemoglu et al. (2016), who study the effect of imports from China on US
employment. $\Delta D$ and $D_{1}$ are strongly correlated, thus implying that
first-difference regressions may be biased if the effect of Chinese imports
changes over time. The coefficient on $\Delta D$ is no longer significant when
controlling for $E(\Delta D|D_{1})$.
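
A minimal sketch of the diagnostic and the suggested fix, under illustrative assumptions: (i) regress $\Delta D$ on $D_{1}$ to test whether they are correlated, and (ii) re-estimate the first-difference regression controlling nonparametrically for $E(\Delta D|D_{1})$, here approximated by quantile-bin dummies of $D_{1}$. The binning choice is hypothetical, not the authors' implementation.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def fd_with_d1_control(dY, dD, D1, n_bins=10):
    dY, dD, D1 = map(np.asarray, (dY, dD, D1))
    # (i) diagnostic: is the treatment change correlated with the period-one treatment?
    diag = sm.OLS(dD, sm.add_constant(D1)).fit(cov_type="HC1")
    # (ii) first-difference regression with quantile-bin dummies of D1, which absorb
    #      a step-function approximation of E(Delta D | D1)
    bins = pd.qcut(D1, q=n_bins, duplicates="drop")
    dummies = pd.get_dummies(bins, drop_first=True, dtype=float)
    X = sm.add_constant(pd.concat([pd.Series(dD, name="dD"), dummies], axis=1))
    fd = sm.OLS(dY, X).fit(cov_type="HC1")
    return diag, fd
```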

arXiv link: http://arxiv.org/abs/2411.03208v7

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2024-11-05

Robust Market Interventions

Authors: Andrea Galeotti, Benjamin Golub, Sanjeev Goyal, Eduard Talamàs, Omer Tamuz

When can interventions in markets be designed to increase surplus robustly --
i.e., with high probability -- accounting for uncertainty due to imprecise
information about economic primitives? In a setting with many strategic firms,
each possessing some market power, we present conditions for such interventions
to exist. The key condition, recoverable structure, requires large-scale
complementarities among families of products. The analysis works by decomposing
the incidence of interventions in terms of principal components of a Slutsky
matrix. Under recoverable structure, a noisy signal of this matrix reveals
enough about these principal components to design robust interventions. Our
results demonstrate the usefulness of spectral methods for analyzing
imperfectly observed strategic interactions with many agents.

arXiv link: http://arxiv.org/abs/2411.03026v2

Econometrics arXiv paper, submitted: 2024-11-05

Beyond the Traditional VIX: A Novel Approach to Identifying Uncertainty Shocks in Financial Markets

Authors: Ayush Jha, Abootaleb Shirvani, Svetlozar T. Rachev, Frank J. Fabozzi

We introduce a new identification strategy for uncertainty shocks to explain
macroeconomic volatility in financial markets. The Chicago Board Options
Exchange Volatility Index (VIX) measures market expectations of future
volatility, but traditional methods based on second-moment shocks and
time-varying volatility of the VIX often fail to capture the non-Gaussian,
heavy-tailed nature of asset returns. To address this, we construct a revised
VIX by fitting a double-subordinated Normal Inverse Gaussian Levy process to
S&P 500 option prices, providing a more comprehensive measure of volatility
that reflects the extreme movements and heavy tails observed in financial data.
Using an axiomatic approach, we introduce a general family of risk-reward
ratios, computed with our revised VIX and fitted over a fractional time series
to more accurately identify uncertainty shocks in financial markets.

arXiv link: http://arxiv.org/abs/2411.02804v1

Econometrics arXiv paper, submitted: 2024-11-04

Does Regression Produce Representative Causal Rankings?

Authors: Apoorva Lal

We examine the challenges in ranking multiple treatments based on their
estimated effects when using linear regression or its popular
double-machine-learning variant, the Partially Linear Model (PLM), in the
presence of treatment effect heterogeneity. We demonstrate by example that
overlap-weighting performed by linear models like PLM can produce Weighted
Average Treatment Effects (WATE) that have rankings that are inconsistent with
the rankings of the underlying Average Treatment Effects (ATE). We define this
as ranking reversals and derive a necessary and sufficient condition for
ranking reversals under the PLM. We conclude with several simulation studies
exploring conditions under which ranking reversals occur.
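
The following small simulation sketches the phenomenon with plain linear regression standing in for the PLM; all numbers are illustrative assumptions. Treatment A has the larger ATE, but its effect is concentrated where it is rarely assigned, so the implicit variance weighting of regression ranks it below treatment B.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200_000
x = rng.binomial(1, 0.5, n)                      # binary covariate

# Treatment A: large effect exactly where it is rarely assigned (x = 1).
pA = np.where(x == 1, 0.05, 0.50)
tauA = np.where(x == 1, 3.0, 0.5)                # ATE_A = 1.75
dA = rng.binomial(1, pA)

# Treatment B: constant effect, balanced assignment.
tauB = 1.6                                       # ATE_B = 1.60
dB = rng.binomial(1, 0.5, n)

y = 1.0 + x + tauA * dA + tauB * dB + rng.normal(size=n)

X = sm.add_constant(np.column_stack([dA, dB, x]))
b = sm.OLS(y, X).fit().params
print(f"True ATEs:        A = {tauA.mean():.2f}, B = {tauB:.2f}")   # A ranked above B
print(f"Regression coefs: A = {b[1]:.2f}, B = {b[2]:.2f}")          # B ranked above A
```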

arXiv link: http://arxiv.org/abs/2411.02675v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-11-04

Comment on 'Sparse Bayesian Factor Analysis when the Number of Factors is Unknown' by S. Frühwirth-Schnatter, D. Hosszejni, and H. Freitas Lopes

Authors: Roberto Casarin, Antonio Peruzzi

The techniques suggested in Frühwirth-Schnatter et al. (2024) concern
sparsity and factor selection and have enormous potential beyond standard
factor analysis applications. We show how these techniques can be applied to
Latent Space (LS) models for network data. These models suffer from well-known
identification issues of the latent factors due to likelihood invariance to
factor translation, reflection, and rotation (see Hoff et al., 2002). A set of
observables can be instrumental in identifying the latent factors via auxiliary
equations (see Liu et al., 2021). These, in turn, share many analogies with the
equations used in factor modeling, and we argue that the factor loading
restrictions may be beneficial for achieving identification.

arXiv link: http://arxiv.org/abs/2411.02531v3

Econometrics arXiv cross-link from cs.CY (cs.CY), submitted: 2024-11-04

Identifying Economic Factors Affecting Unemployment Rates in the United States

Authors: Alrick Green, Ayesha Nasim, Jaydeep Radadia, Devi Manaswi Kallam, Viswas Kalyanam, Samfred Owenga, Huthaifa I. Ashqar

In this study, we seek to understand how macroeconomic factors such as GDP,
inflation, Unemployment Insurance, and the S&P 500 index, as well as microeconomic
factors such as health, race, and educational attainment impacted the
unemployment rate for about 20 years in the United States. Our research
question is to identify which factor(s) contributed the most to the
unemployment rate surge using linear regression. Results from our studies
showed that GDP (negative), inflation (positive), Unemployment Insurance
(contrary to popular opinion; negative), and S&P 500 index (negative) were all
significant factors, with inflation being the most important one. As for health
issue factors, our model produced correlation scores for occurrences
of Cardiovascular Disease, Neurological Disease, and Interpersonal Violence
with unemployment. Race as a factor showed large discrepancies in the
unemployment rate between Black Americans and their counterparts.
Asians had the lowest unemployment rate throughout the years. As for educational
attainment, results showed that higher educational attainment
significantly reduced one's chance of unemployment. People with higher degrees
had the lowest unemployment rate. Results of this study will be beneficial for
policymakers and researchers in understanding the unemployment rate during the
pandemic.

arXiv link: http://arxiv.org/abs/2411.02374v1

Econometrics arXiv paper, submitted: 2024-11-04

On the Asymptotic Properties of Debiased Machine Learning Estimators

Authors: Amilcar Velez

This paper studies the properties of debiased machine learning (DML)
estimators under a novel asymptotic framework, offering insights for improving
the performance of these estimators in applications. DML is an estimation
method suited to economic models where the parameter of interest depends on
unknown nuisance functions that must be estimated. It requires weaker
conditions than previous methods while still ensuring standard asymptotic
properties. Existing theoretical results do not distinguish between two
alternative versions of DML estimators, DML1 and DML2. Under a new asymptotic
framework, this paper demonstrates that DML2 asymptotically dominates DML1 in
terms of bias and mean squared error, formalizing a previous conjecture based
on simulation results regarding their relative performance. Additionally, this
paper provides guidance for improving the performance of DML2 in applications.
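
For concreteness, the sketch below shows the computational difference between the two versions in a cross-fitted partially linear model: DML1 solves the moment condition fold by fold and averages the solutions, while DML2 solves a single moment condition pooled across folds. The nuisance learner and fold count are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

def dml_plm(y, d, X, n_folds=5, seed=0):
    """Cross-fitted partialling-out for Y = theta*D + g(X) + e.
    Returns (theta_DML1, theta_DML2)."""
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    res_y = np.zeros(len(y))
    res_d = np.zeros(len(d))
    fold_thetas = []
    for train, test in kf.split(X):
        m_y = RandomForestRegressor(random_state=seed).fit(X[train], y[train])
        m_d = RandomForestRegressor(random_state=seed).fit(X[train], d[train])
        ry = y[test] - m_y.predict(X[test])          # residualized outcome
        rd = d[test] - m_d.predict(X[test])          # residualized treatment
        res_y[test], res_d[test] = ry, rd
        fold_thetas.append(np.sum(rd * ry) / np.sum(rd * rd))   # per-fold solution
    theta_dml1 = float(np.mean(fold_thetas))                    # average of fold solutions
    theta_dml2 = float(np.sum(res_d * res_y) / np.sum(res_d * res_d))  # pooled moment
    return theta_dml1, theta_dml2
```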

arXiv link: http://arxiv.org/abs/2411.01864v1

Econometrics arXiv updated paper (originally submitted: 2024-11-04)

Estimating Nonseparable Selection Models: A Functional Contraction Approach

Authors: Fan Wu, Yi Xin

We propose a novel method for estimating nonseparable selection models. We
show that, given the selection rule and the observed selected outcome
distribution, the potential outcome distribution can be characterized as the
fixed point of an operator, which we prove to be a functional contraction. We
propose a two-step semiparametric maximum likelihood estimator to estimate the
selection model and the potential outcome distribution. The consistency and
asymptotic normality of the estimator are established. Our approach performs
well in Monte Carlo simulations and is applicable in a variety of empirical
settings where only a selected sample of outcomes is observed. Examples include
consumer demand models with only transaction prices, auctions with incomplete
bid data, and Roy models with data on accepted wages.

arXiv link: http://arxiv.org/abs/2411.01799v2

Econometrics arXiv updated paper (originally submitted: 2024-11-03)

Understanding the decision-making process of choice modellers

Authors: Gabriel Nova, Sander van Cranenburgh, Stephane Hess

Discrete Choice Modelling serves as a robust framework for modelling human
choice behaviour across various disciplines. Building a choice model is a
semi-structured research process that involves a combination of a priori
assumptions, behavioural theories, and statistical methods. This complex set of
decisions, coupled with diverse workflows, can lead to substantial variability
in model outcomes. To better understand these dynamics, we developed the
Serious Choice Modelling Game, which simulates the real world modelling process
and tracks modellers' decisions in real time using a stated preference dataset.
Participants were asked to develop choice models to estimate Willingness to Pay
values to inform policymakers about strategies for reducing noise pollution.
The game recorded actions across multiple phases, including descriptive
analysis, model specification, and outcome interpretation, allowing us to
analyse both individual decisions and differences in modelling approaches.
While our findings reveal a strong preference for using data visualisation
tools in descriptive analysis, they also identify gaps in the handling of
missing values before model specification. We also found significant variation in the
modelling approach, even when modellers were working with the same choice
dataset. Despite the availability of more complex models, simpler models such
as Multinomial Logit were often preferred, suggesting that modellers tend to
avoid complexity when time and resources are limited. Participants who engaged
in more comprehensive data exploration and iterative model comparison tended to
achieve better model fit and parsimony, which demonstrates that the
methodological choices made throughout the workflow have significant
implications, particularly when modelling outcomes are used for policy
formulation.

arXiv link: http://arxiv.org/abs/2411.01704v2

Econometrics arXiv paper, submitted: 2024-11-03

Changes-In-Changes For Discrete Treatment

Authors: Onil Boussim

This paper generalizes the changes-in-changes (CIC) model to handle discrete
treatments with more than two categories, extending the binary case of Athey
and Imbens (2006). While the original CIC model is well-suited for binary
treatments, it cannot accommodate multi-category discrete treatments often
found in economic and policy settings. Although recent work has extended CIC to
continuous treatments, there remains a gap for multi-category discrete
treatments. I introduce a generalized CIC model that adapts the rank invariance
assumption to multiple treatment levels, allowing for robust modeling while
capturing the distinct effects of varying treatment intensities.
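
For reference, a plug-in version of the binary-treatment CIC estimator of Athey and Imbens (2006), which the paper generalizes, can be written as below. This empirical-CDF implementation ignores ties and support conditions and is only a sketch of the binary baseline, not the paper's multi-category estimator.

```python
import numpy as np

def cic_binary(y00, y01, y10, y11):
    """Changes-in-changes effect on the treated: y_gt holds outcomes for
    group g (0 control, 1 treated) in period t (0 pre, 1 post)."""
    # rank treated period-0 outcomes within the control period-0 distribution
    ranks = np.searchsorted(np.sort(y00), y10, side="right") / len(y00)
    ranks = np.clip(ranks, 1e-6, 1 - 1e-6)
    # map those ranks through the control period-1 quantile function to build
    # the counterfactual untreated distribution for the treated group
    counterfactual = np.quantile(y01, ranks)
    return y11.mean() - counterfactual.mean()
```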

arXiv link: http://arxiv.org/abs/2411.01617v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-11-03

Educational Effects in Mathematics: Conditional Average Treatment Effect depending on the Number of Treatments

Authors: Tomoko Nagai, Takayuki Okuda, Tomoya Nakamura, Yuichiro Sato, Yusuke Sato, Kensaku Kinjo, Kengo Kawamura, Shin Kikuta, Naoto Kumano-go

This study examines the educational effect of the Academic Support Center at
Kogakuin University. Following the initial assessment, it was suggested that
group bias had led to an underestimation of the Center's true impact. To
address this issue, the authors applied the theory of causal inference. By
using T-learner, the conditional average treatment effect (CATE) of the
Center's face-to-face (F2F) personal assistance program was evaluated.
Extending T-learner, the authors produced a new CATE function that depends on
the number of treatments (F2F sessions) and used the estimated function to
predict the CATE performance of F2F assistance.

arXiv link: http://arxiv.org/abs/2411.01498v1

Econometrics arXiv paper, submitted: 2024-11-01

Empirical Welfare Analysis with Hedonic Budget Constraints

Authors: Debopam Bhattacharya, Ekaterina Oparina, Qianya Xu

We analyze demand settings where heterogeneous consumers maximize utility for
product attributes subject to a nonlinear budget constraint. We develop
nonparametric methods for welfare-analysis of interventions that change the
constraint. Two new findings are Roy's identity for smooth, nonlinear budgets,
which yields a Partial Differential Equation system, and a Slutsky-like
symmetry condition for demand. Under scalar unobserved heterogeneity and
single-crossing preferences, the coefficient functions in the PDEs are
nonparametrically identified, and under symmetry, lead to path-independent,
money-metric welfare. We illustrate our methods with welfare evaluation of a
hypothetical change in relationship between property rent and neighborhood
school-quality using British microdata.

arXiv link: http://arxiv.org/abs/2411.01064v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2024-11-01

Higher-Order Causal Message Passing for Experimentation with Complex Interference

Authors: Mohsen Bayati, Yuwei Luo, William Overman, Sadegh Shirani, Ruoxuan Xiong

Accurate estimation of treatment effects is essential for decision-making
across various scientific fields. This task, however, becomes challenging in
areas like social sciences and online marketplaces, where treating one
experimental unit can influence outcomes for others through direct or indirect
interactions. Such interference can lead to biased treatment effect estimates,
particularly when the structure of these interactions is unknown. We address
this challenge by introducing a new class of estimators based on causal
message-passing, specifically designed for settings with pervasive, unknown
interference. Our estimator draws on information from the sample mean and
variance of unit outcomes and treatments over time, enabling efficient use of
observed data to estimate the evolution of the system state. Concretely, we
construct non-linear features from the moments of unit outcomes and treatments
and then learn a function that maps these features to future mean and variance
of unit outcomes. This allows for the estimation of the treatment effect over
time. Extensive simulations across multiple domains, using synthetic and real
network data, demonstrate the efficacy of our approach in estimating total
treatment effect dynamics, even in cases where interference exhibits
non-monotonic behavior in the probability of treatment.

arXiv link: http://arxiv.org/abs/2411.00945v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-11-01

Calibrated quantile prediction for Growth-at-Risk

Authors: Pietro Bogani, Matteo Fontana, Luca Neri, Simone Vantini

Accurate computation of robust estimates for extremal quantiles of empirical
distributions is an essential task for a wide range of applicative fields,
including economic policymaking and the financial industry. Such estimates are
particularly critical in calculating risk measures, such as Growth-at-Risk
(GaR). This work proposes a conformal prediction (CP) framework to
estimate calibrated quantiles, and presents an extensive simulation study and a
real-world analysis of GaR to examine its benefits with respect to the state of
the art. Our findings show that CP methods consistently improve the calibration
and robustness of quantile estimates at all levels. The calibration gains are
appreciated especially at extremal quantiles, which are critical for risk
assessment and where traditional methods tend to fall short. In addition, we
introduce a novel property that guarantees coverage under the exchangeability
assumption, providing a valuable tool for managing risks by quantifying and
controlling the likelihood of future extreme observations.
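
A generic split-conformal calibration of a lower-tail quantile forecast, in the spirit of the framework described above, is sketched below; the learner, conformity score, and quantile level are illustrative assumptions rather than the paper's procedure.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def conformal_lower_quantile(X_train, y_train, X_cal, y_cal, X_new, alpha=0.05):
    # 1) fit a quantile regressor for the lower tail (e.g. Growth-at-Risk at 5%)
    qr = GradientBoostingRegressor(loss="quantile", alpha=alpha)
    qr.fit(X_train, y_train)
    # 2) conformity scores on a held-out calibration set: by how much does the
    #    predicted lower bound exceed the realized outcome?
    scores = qr.predict(X_cal) - y_cal
    n = len(y_cal)
    k = int(np.ceil((n + 1) * (1 - alpha)))     # conformal rank (capped at n below)
    c = np.sort(scores)[min(k, n) - 1]
    # 3) shift the predicted quantile down by c to restore finite-sample coverage
    #    under exchangeability
    return qr.predict(X_new) - c
```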

arXiv link: http://arxiv.org/abs/2411.00520v1

Econometrics arXiv paper, submitted: 2024-11-01

Inference in a Stationary/Nonstationary Autoregressive Time-Varying-Parameter Model

Authors: Donald W. K. Andrews, Ming Li

This paper considers nonparametric estimation and inference in first-order
autoregressive (AR(1)) models with deterministically time-varying parameters. A
key feature of the proposed approach is to allow for time-varying stationarity
in some time periods, time-varying nonstationarity (i.e., unit root or
local-to-unit root behavior) in other periods, and smooth transitions between
the two. The estimation of the AR parameter at any time point is based on a
local least squares regression method, where the relevant initial condition is
endogenous. We obtain limit distributions for the AR parameter estimator and
t-statistic at a given point $\tau$ in time when the parameter exhibits unit
root, local-to-unity, or stationary/stationary-like behavior at time $\tau$.
These results are used to construct confidence intervals and median-unbiased
interval estimators for the AR parameter at any specified point in time. The
confidence intervals have correct asymptotic coverage probabilities with the
coverage holding uniformly over stationary and nonstationary behavior of the
observations.
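
A stripped-down version of local least squares estimation of a time-varying AR(1) coefficient is sketched below; the Gaussian kernel, the bandwidth, and the omission of an intercept and of the paper's treatment of the initial condition are all simplifying assumptions.

```python
import numpy as np

def local_ar1(y, tau, bandwidth=0.1):
    """Kernel-weighted least squares estimate of the AR(1) coefficient at
    rescaled time tau in (0, 1)."""
    T = len(y)
    t = np.arange(1, T) / T                            # rescaled time of each transition
    w = np.exp(-0.5 * ((t - tau) / bandwidth) ** 2)    # Gaussian kernel weights
    y_lag, y_cur = y[:-1], y[1:]
    # weighted least squares of y_t on y_{t-1}, no intercept for simplicity
    return np.sum(w * y_lag * y_cur) / np.sum(w * y_lag ** 2)
```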

arXiv link: http://arxiv.org/abs/2411.00358v1

Econometrics arXiv paper, submitted: 2024-10-31

The ET Interview: Professor Joel L. Horowitz

Authors: Sokbae Lee

Joel L. Horowitz has made profound contributions to many areas in
econometrics and statistics. These include bootstrap methods, semiparametric
and nonparametric estimation, specification testing, nonparametric instrumental
variables estimation, high-dimensional models, functional data analysis, and
shape restrictions, among others. Originally trained as a physicist, Joel made
a pivotal transition to econometrics, greatly benefiting our profession.
Throughout his career, he has collaborated extensively with a diverse range of
coauthors, including students, departmental colleagues, and scholars from
around the globe. Joel was born in 1941 in Pasadena, California. He attended
Stanford for his undergraduate studies and obtained his Ph.D. in physics from
Cornell in 1967. He has been Charles E. and Emma H. Morrison Professor of
Economics at Northwestern University since 2001. Prior to that, he was a
faculty member at the University of Iowa (1982-2001). He has served as a
co-editor of Econometric Theory (1992-2000) and Econometrica (2000-2004). He is
a Fellow of the Econometric Society and of the American Statistical
Association, and an elected member of the International Statistical Institute.
The majority of this interview took place in London during June 2022.

arXiv link: http://arxiv.org/abs/2411.00886v1

Econometrics arXiv updated paper (originally submitted: 2024-10-31)

Bagging the Network

Authors: Ming Li, Zhentao Shi, Yapeng Zheng

This paper studies parametric estimation and inference in a dyadic network
formation model with nontransferable utilities, incorporating observed
covariates and unobservable individual fixed effects. We address both
theoretical and computational challenges of maximum likelihood estimation in
this complex network model by proposing a new bootstrap aggregating (bagging)
estimator, which is asymptotically normal, unbiased, and efficient. We extend
the approach to estimating average partial effects and analyzing link function
misspecification. Simulations demonstrate strong finite-sample performance. Two
empirical applications to Nyakatoke risk-sharing networks and Indian
microfinance data find insignificant roles of wealth differences in link
formation and the strong influence of caste in Indian villages, respectively.

arXiv link: http://arxiv.org/abs/2410.23852v2

Econometrics arXiv paper, submitted: 2024-10-31

Machine Learning Debiasing with Conditional Moment Restrictions: An Application to LATE

Authors: Facundo Argañaraz, Juan Carlos Escanciano

Models with Conditional Moment Restrictions (CMRs) are popular in economics.
These models involve finite and infinite dimensional parameters. The infinite
dimensional components include conditional expectations, conditional choice
probabilities, or policy functions, which might be flexibly estimated using
Machine Learning tools. This paper presents a characterization of locally
debiased moments for regular models defined by general semiparametric CMRs with
possibly different conditioning variables. These moments are appealing as they
are known to be less affected by first-step bias. Additionally, we study their
existence and relevance. Such results apply to a broad class of smooth
functionals of finite and infinite dimensional parameters that do not
necessarily appear in the CMRs. As a leading application of our theory, we
characterize debiased machine learning for settings of treatment effects with
endogeneity, giving necessary and sufficient conditions. We present a large
class of relevant debiased moments in this context. We then propose the
Compliance Machine Learning Estimator (CML), based on a practically convenient
orthogonal relevant moment. We show that the resulting estimand can be written
as a convex combination of conditional local average treatment effects (LATE).
Altogether, CML enjoys three appealing properties in the LATE framework: (1)
local robustness to first-stage estimation, (2) an estimand that can be
identified under a minimal relevance condition, and (3) a meaningful causal
interpretation. Our numerical experimentation shows satisfactory relative
performance of such an estimator. Finally, we revisit the Oregon Health
Insurance Experiment, analyzed by Finkelstein et al. (2012). We find that the
use of machine learning and CML suggests larger positive effects on health care
utilization than previously determined.

arXiv link: http://arxiv.org/abs/2410.23785v1

Econometrics arXiv updated paper (originally submitted: 2024-10-31)

Moments by Integrating the Moment-Generating Function

Authors: Peter Reinhard Hansen, Chen Tong

We introduce a novel method for obtaining a wide variety of moments of a
random variable with a well-defined moment-generating function (MGF). We derive
new expressions for fractional moments and fractional absolute moments, both
central and non-central moments. The new moment expressions are relatively
simple integrals that involve the MGF, but do not require its derivatives. We
label the new method CMGF because it uses a complex extension of the MGF and
can be used to obtain complex moments. We illustrate the new method with three
applications where the MGF is available in closed-form, while the corresponding
densities and the derivatives of the MGF are either unavailable or very
difficult to obtain.

arXiv link: http://arxiv.org/abs/2410.23587v3

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2024-10-31

On the consistency of bootstrap for matching estimators

Authors: Ziming Lin, Fang Han

In a landmark paper, Abadie and Imbens (2008) showed that the naive bootstrap
is inconsistent when applied to nearest neighbor matching estimators of the
average treatment effect with a fixed number of matches. Since then, this
finding has inspired numerous efforts to address the inconsistency issue,
typically by employing alternative bootstrap methods. In contrast, this paper
shows that the naive bootstrap is provably consistent for the original matching
estimator, provided that the number of matches, $M$, diverges. The bootstrap
inconsistency identified by Abadie and Imbens (2008) thus arises solely from
the use of a fixed $M$.
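
The simulation sketch below illustrates the setting: an M-nearest-neighbor matching estimate of the ATE with M diverging (here M = floor(sqrt(n)), an arbitrary illustrative choice) combined with the naive bootstrap. It is a toy example under assumed data-generating choices, not a replication of the paper's theory.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def matching_ate(y, d, X, M):
    """M-nearest-neighbor matching estimate of the ATE (no bias correction).
    Requires each treatment arm to contain at least M observations."""
    y_hat = np.empty((len(y), 2))
    for arm in (0, 1):
        nn = NearestNeighbors(n_neighbors=M).fit(X[d == arm])
        _, idx = nn.kneighbors(X)                 # M matches within the given arm
        y_hat[:, arm] = y[d == arm][idx].mean(axis=1)
    y_hat[d == 0, 0] = y[d == 0]                  # keep observed outcomes in own arm
    y_hat[d == 1, 1] = y[d == 1]
    return float((y_hat[:, 1] - y_hat[:, 0]).mean())

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))
d = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
y = X.sum(axis=1) + d + rng.normal(size=n)

M = int(np.sqrt(n))                               # number of matches grows with n
point = matching_ate(y, d, X, M)
boot = []
for _ in range(200):
    i = rng.integers(0, n, n)                     # naive i.i.d. bootstrap resample
    boot.append(matching_ate(y[i], d[i], X[i], M))
print(f"ATE estimate: {point:.3f}, naive bootstrap s.e.: {np.std(boot):.3f}")
```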

arXiv link: http://arxiv.org/abs/2410.23525v2

Econometrics arXiv paper, submitted: 2024-10-29

Inference in Partially Linear Models under Dependent Data with Deep Neural Networks

Authors: Chad Brown

I consider inference in a partially linear regression model under stationary
$\beta$-mixing data after first stage deep neural network (DNN) estimation.
Using the DNN results of Brown (2024), I show that the estimator for the finite
dimensional parameter, constructed using DNN-estimated nuisance components,
achieves $\sqrt{n}$-consistency and asymptotic normality. By avoiding sample
splitting, I address one of the key challenges in applying machine learning
techniques to econometric models with dependent data. In a future version of
this work, I plan to extend these results to obtain general conditions for
semiparametric inference after DNN estimation of nuisance components, which
will allow for considerations such as more efficient estimation procedures and
instrumental variable settings.

arXiv link: http://arxiv.org/abs/2410.22574v1
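
A minimal sketch of the Robinson-style partialling-out step implied by the setup
above, with neural-network first stages and no sample splitting (the feature the
abstract emphasizes). The network architecture, data generating process, and
variable names are illustrative assumptions, not the paper's conditions or proofs.

```python
# Partially linear model Y = theta*X + g(W) + e: estimate E[Y|W] and E[X|W]
# with neural networks on the full sample (no sample splitting), then regress
# the outcome residuals on the treatment residuals to recover theta. A stylized
# Robinson-type partialling-out sketch; architecture and DGP are illustrative.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n = 3000
W = rng.uniform(-2, 2, size=(n, 3))
g = np.sin(W[:, 0]) + W[:, 1] ** 2
X = 0.5 * g + rng.normal(size=n)                 # treatment depends on W
theta = 1.0
Y = theta * X + g + rng.normal(size=n)

def fit_resid(target):
    # nuisance regression of `target` on W, returning the residuals
    nn = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
    return target - nn.fit(W, target).predict(W)

eY, eX = fit_resid(Y), fit_resid(X)
theta_hat = (eX @ eY) / (eX @ eX)                # final OLS of residual on residual
print(theta_hat)                                 # should be close to theta = 1
```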

Econometrics arXiv updated paper (originally submitted: 2024-10-28)

Forecasting Political Stability in GCC Countries

Authors: Mahdi Goldani

Political stability is crucial for the socioeconomic development of nations,
particularly in geopolitically sensitive regions such as the Gulf Cooperation
Council countries: Saudi Arabia, the UAE, Kuwait, Qatar, Oman, and Bahrain. This
study focuses on predicting the political stability index for these six
countries using machine learning techniques. The study uses data from the World
Bank's comprehensive dataset, comprising 266 indicators covering economic,
political, social, and environmental factors. Employing the Edit Distance on
Real Sequence method for feature selection and XGBoost for model training, the
study forecasts political stability trends for the next five years. The model
achieves high accuracy, with mean absolute percentage error values under 10 percent,
indicating reliable predictions. The forecasts suggest that Oman, the UAE, and
Qatar will experience relatively stable political conditions, while Saudi
Arabia and Bahrain may continue to face negative political stability indices.
The findings underscore the significance of economic factors such as GDP and
foreign investment, along with variables related to military expenditure and
international tourism, as key predictors of political stability. These results
provide valuable insights for policymakers, enabling proactive measures to
enhance governance and mitigate potential risks.

arXiv link: http://arxiv.org/abs/2410.21516v2

Econometrics arXiv updated paper (originally submitted: 2024-10-28)

Economic Diversification and Social Progress in the GCC Countries: A Study on the Transition from Oil-Dependency to Knowledge-Based Economies

Authors: Mahdi Goldani, Soraya Asadi Tirvan

The Gulf Cooperation Council countries -- Oman, Bahrain, Kuwait, the UAE, Qatar,
and Saudi Arabia -- hold strategic significance due to their large oil reserves.
However, these nations face considerable challenges in shifting from
oil-dependent economies to more diversified, knowledge-based systems. This
study examines the progress of Gulf Cooperation Council (GCC) countries in
achieving economic diversification and social development, focusing on the
Social Progress Index (SPI), which provides a broader measure of societal
well-being beyond just economic growth. Using data from the World Bank,
covering 2010 to 2023, the study employs the XGBoost machine learning model to
forecast SPI values for the period of 2024 to 2026. Key components of the
methodology include data preprocessing, feature selection, and the simulation
of independent variables through ARIMA modeling. The results highlight
significant improvements in education, healthcare, and women's rights,
contributing to enhanced SPI performance across the GCC countries. However,
notable challenges persist in areas like personal rights and inclusivity. The
study further indicates that despite economic setbacks caused by global
disruptions, including the COVID-19 pandemic and oil price volatility, GCC
nations are expected to see steady improvements in their SPI scores through
2027. These findings underscore the critical importance of economic
diversification, investment in human capital, and ongoing social reforms to
reduce dependence on hydrocarbons and build knowledge-driven economies. This
research offers valuable insights for policymakers aiming to strengthen both
social and economic resilience in the region while advancing long-term
sustainable development goals.

arXiv link: http://arxiv.org/abs/2410.21505v2

Econometrics arXiv paper, submitted: 2024-10-28

Difference-in-Differences with Time-varying Continuous Treatments using Double/Debiased Machine Learning

Authors: Michel F. C. Haddad, Martin Huber, Lucas Z. Zhang

We propose a difference-in-differences (DiD) method for a time-varying
continuous treatment and multiple time periods. Our framework assesses the
average treatment effect on the treated (ATET) when comparing two non-zero
treatment doses. The identification is based on a conditional parallel trend
assumption imposed on the mean potential outcome under the lower dose, given
observed covariates and past treatment histories. We employ kernel-based ATET
estimators for repeated cross-sections and panel data adopting the
double/debiased machine learning framework to control for covariates and past
treatment histories in a data-adaptive manner. We also demonstrate the
asymptotic normality of our estimation approach under specific regularity
conditions. In a simulation study, we find a compelling finite sample
performance of undersmoothed versions of our estimators in setups with several
thousand observations.

arXiv link: http://arxiv.org/abs/2410.21105v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-10-28

On Spatio-Temporal Stochastic Frontier Models

Authors: Elisa Fusco, Giuseppe Arbia, Francesco Vidoli, Vincenzo Nardelli

In the literature on stochastic frontier models until the early 2000s, the
joint consideration of spatial and temporal dimensions was often inadequately
addressed, if not completely neglected. However, from an evolutionary economics
perspective, the production process of the decision-making units constantly
changes over both dimensions: it is not stable over time due to managerial
enhancements and/or internal or external shocks, and is influenced by the
nearest territorial neighbours. This paper proposes an extension of the Fusco
and Vidoli [2013] SEM-like approach, which globally accounts for spatial and
temporal effects in the inefficiency term. In particular, consistently with
the stochastic panel frontier literature, two different versions of the model
are proposed: the time-invariant and the time-varying spatial stochastic
frontier models. In order to evaluate the inferential properties of the
proposed estimators, we first run Monte Carlo experiments and we then present
the results of an application to a set of commonly referenced data,
demonstrating robustness and stability of estimates across all scenarios.

arXiv link: http://arxiv.org/abs/2410.20915v1

Econometrics arXiv paper, submitted: 2024-10-28

A Distributed Lag Approach to the Generalised Dynamic Factor Model (GDFM)

Authors: Philipp Gersing

We provide estimation and inference for the Generalised Dynamic Factor Model
(GDFM) under the assumption that the dynamic common component can be expressed
in terms of a finite number of lags of contemporaneously pervasive factors. The
proposed estimator is simply an OLS regression of the observed variables on
factors extracted via static principal components and therefore avoids
frequency domain techniques entirely.

arXiv link: http://arxiv.org/abs/2410.20885v1
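
A minimal sketch of the estimation idea described above: extract static
principal-component factors from the panel, then run an OLS regression of each
observed series on current and lagged factors to recover the common component.
The number of factors r, the lag order q, and the simulated design are
illustrative choices, not the paper's.

```python
# Sketch of the estimator described above: r static principal-component factors
# from a (T x N) panel, then OLS of each series on current and q lagged factors.
# r and q are illustrative choices.
import numpy as np

def common_component(X, r=2, q=1):
    T, N = X.shape
    Xc = X - X.mean(axis=0)
    # static PCA factors: top eigenvectors of the sample covariance of the panel
    eigval, eigvec = np.linalg.eigh(np.cov(Xc, rowvar=False))
    loadings = eigvec[:, ::-1][:, :r]
    F = Xc @ loadings                                # (T x r) factor estimates
    # stack current and lagged factors as regressors (distributed lag)
    Z = np.column_stack([F[q - l: T - l] for l in range(q + 1)])
    Z = np.column_stack([np.ones(T - q), Z])
    # OLS of each observed series on the distributed lag of the factors
    beta, *_ = np.linalg.lstsq(Z, Xc[q:], rcond=None)
    return Z @ beta                                  # ((T-q) x N) common component

rng = np.random.default_rng(1)
T, N = 200, 30
f = rng.normal(size=(T, 2))
lam0, lam1 = rng.normal(size=(2, N)), rng.normal(size=(2, N))
X = f @ lam0 + np.vstack([np.zeros((1, 2)), f[:-1]]) @ lam1 + 0.5 * rng.normal(size=(T, N))
chi_hat = common_component(X, r=2, q=1)
print(chi_hat.shape)
```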

Econometrics arXiv updated paper (originally submitted: 2024-10-28)

Robust Network Targeting with Multiple Nash Equilibria

Authors: Guanyi Wang

Many policy problems involve designing individualized treatment allocation
rules to maximize the equilibrium social welfare of interacting agents.
Focusing on large-scale simultaneous decision games with strategic
complementarities, we develop a method to estimate an optimal treatment
allocation rule that is robust to the presence of multiple equilibria. Our
approach remains agnostic about changes in the equilibrium selection mechanism
under counterfactual policies, and we provide a closed-form expression for the
boundary of the set-identified equilibrium outcomes. To address the
incompleteness that arises when an equilibrium selection mechanism is not
specified, we use the maximin welfare criterion to select a policy, and
implement this policy using a greedy algorithm. We establish a performance
guarantee for our method by deriving a welfare regret bound, which accounts for
sampling uncertainty and the use of the greedy algorithm. We demonstrate our
method with an application to the microfinance dataset of Banerjee et al.
(2013).

arXiv link: http://arxiv.org/abs/2410.20860v2

Econometrics arXiv updated paper (originally submitted: 2024-10-27)

International vulnerability of inflation

Authors: Ignacio Garrón, C. Vladimir Rodríguez-Caballero, Esther Ruiz

In a globalised world, inflation in a given country may be becoming less
responsive to domestic economic activity, while being increasingly determined
by international conditions. Consequently, understanding the international
sources of vulnerability of domestic inflation is becoming fundamental for
policymakers. In this paper, we propose the construction of Inflation-at-risk
and Deflation-at-risk measures of vulnerability obtained using factor-augmented
quantile regressions estimated with international factors extracted from a
multi-level Dynamic Factor Model with overlapping blocks of inflations
corresponding to economies grouped either in a given geographical region or
according to their development level. The methodology is applied to
inflation observed monthly from 1999 to 2022 for over 115 countries. We
conclude that, in a large number of developed countries, international factors
are relevant to explain the right tail of the distribution of inflation, and,
consequently, they are more relevant for the vulnerability related to high
inflation than for average or low inflation. However, while inflation of
developing low-income countries is hardly affected by international conditions,
the results for middle-income countries are mixed. Finally, based on a
rolling-window out-of-sample forecasting exercise, we show that the predictive
power of international factors has increased in the most recent years of high
inflation.

arXiv link: http://arxiv.org/abs/2410.20628v2

Econometrics arXiv paper, submitted: 2024-10-26

Jacobian-free Efficient Pseudo-Likelihood (EPL) Algorithm

Authors: Takeshi Fukasawa

This study proposes a simple procedure to compute the Efficient Pseudo-Likelihood
(EPL) estimator of Dearing and Blevins (2024) for estimating dynamic
discrete games, without computing Jacobians of equilibrium constraints. The EPL
estimator is efficient, convergent, and computationally fast. However, the
original algorithm requires deriving and coding the Jacobians, which is
cumbersome and prone to coding mistakes, especially for complicated
models. The current study avoids computing the Jacobians by
combining numerical derivatives (for computing Jacobian-vector
products) with a Krylov method (for solving linear equations). Numerical
experiments show good computational performance of the proposed method.

arXiv link: http://arxiv.org/abs/2410.20029v1
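
A minimal sketch of the two numerical ingredients named in the abstract, combined:
a finite-difference Jacobian-vector product wrapped as a scipy LinearOperator and
passed to a Krylov solver (GMRES) to solve $J(x)s = -G(x)$ without ever forming the
Jacobian. The toy system G below is illustrative and is not a dynamic-game
equilibrium constraint.

```python
# Jacobian-free Newton-Krylov step: solve J(x) s = -G(x), where the
# Jacobian-vector product J(x) v is approximated by finite differences and the
# linear system is solved with GMRES. The function G is a toy example.
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def G(x):                                   # toy nonlinear system G(x) = 0
    return np.array([x[0] ** 2 + x[1] - 2.0,
                     x[0] + x[1] ** 2 - 2.0])

def newton_krylov_step(G, x, eps=1e-7):
    gx = G(x)
    def jvp(v):                             # finite-difference approximation of J(x) v
        return (G(x + eps * v) - gx) / eps
    J = LinearOperator((len(x), len(x)), matvec=jvp)
    s, info = gmres(J, -gx)                 # Krylov solve, no Jacobian assembled
    return x + s

x = np.array([2.0, 0.5])
for _ in range(8):                          # a few Newton iterations
    x = newton_krylov_step(G, x)
print(x, G(x))                              # converges to the root (1, 1)
```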

Econometrics arXiv paper, submitted: 2024-10-25

Testing the effects of an unobservable factor: Do marriage prospects affect college major choice?

Authors: Hayri Alper Arslan, Brantly Callaway, Tong Li

Motivated by studying the effects of marriage prospects on students' college
major choices, this paper develops a new econometric test for analyzing the
effects of an unobservable factor in a setting where this factor potentially
influences both agents' decisions and a binary outcome variable. Our test is
built upon a flexible copula-based estimation procedure and leverages the
ordered nature of latent utilities of the polychotomous choice model. Using the
proposed method, we demonstrate that marriage prospects significantly influence
the college major choices of college graduates participating in the National
Longitudinal Survey of Youth 1997. Furthermore, we validate the
robustness of our findings with alternative tests that use stated marriage
expectation measures from our data, thereby demonstrating the applicability and
validity of our testing procedure in real-life scenarios.

arXiv link: http://arxiv.org/abs/2410.19947v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-10-25

Unified Causality Analysis Based on the Degrees of Freedom

Authors: András Telcs, Marcell T. Kurbucz, Antal Jakovác

Temporally evolving systems are typically modeled by dynamic equations. A key
challenge in accurate modeling is understanding the causal relationships
between subsystems, as well as identifying the presence and influence of
unobserved hidden drivers on the observed dynamics. This paper presents a
unified method capable of identifying fundamental causal relationships between
pairs of systems, whether deterministic or stochastic. Notably, the method also
uncovers hidden common causes beyond the observed variables. By analyzing the
degrees of freedom in the system, our approach provides a more comprehensive
understanding of both causal influence and hidden confounders. This unified
framework is validated through theoretical models and simulations,
demonstrating its robustness and potential for broader application.

arXiv link: http://arxiv.org/abs/2410.19469v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2024-10-25

Robust Time Series Causal Discovery for Agent-Based Model Validation

Authors: Gene Yu, Ce Guo, Wayne Luk

Agent-Based Model (ABM) validation is crucial as it helps ensure the
reliability of simulations, and causal discovery has become a powerful tool in
this context. However, current causal discovery methods often face accuracy and
robustness challenges when applied to complex and noisy time series data, which
is typical in ABM scenarios. This study addresses these issues by proposing a
Robust Cross-Validation (RCV) approach to enhance causal structure learning for
ABM validation. We develop RCV-VarLiNGAM and RCV-PCMCI, novel extensions of two
prominent causal discovery algorithms. These extensions aim to better mitigate the
impact of noise and deliver more reliable causal relations, even with
high-dimensional, time-dependent data. The proposed approach is then integrated
into an enhanced ABM validation framework, which is designed to handle diverse
data and model structures.
The approach is evaluated using synthetic datasets and a complex simulated
fMRI dataset. The results demonstrate greater reliability in causal structure
identification. The study examines how various characteristics of datasets
affect the performance of established causal discovery methods. These
characteristics include linearity, noise distribution, stationarity, and causal
structure density. This analysis is then extended to the RCV method to see how
it compares in these different situations. This examination helps confirm
whether the results are consistent with existing literature and also reveals
the strengths and weaknesses of the novel approaches.
By tackling key methodological challenges, the study aims to enhance ABM
validation with a more resilient framework. These
improvements increase the reliability of model-driven decision-making processes
in complex systems analysis.

arXiv link: http://arxiv.org/abs/2410.19412v1

Econometrics arXiv updated paper (originally submitted: 2024-10-24)

Inference on Multiple Winners with Applications to Economic Mobility

Authors: Andreas Petrou-Zeniou, Azeem M. Shaikh

This paper considers the problem of inference on multiple winners. In our
setting, a winner is defined abstractly as any population whose rank according
to some random quantity, such as an estimated treatment effect, a measure of
value-added, or benefit (net of cost), falls in a pre-specified range of
values. As such, this framework generalizes the inference on a single winner
setting previously considered in Andrews et al. (2023), in which a winner is
understood to be the single population whose rank according to some random
quantity is highest. We show that this richer setting accommodates a broad
variety of empirically-relevant applications. We develop a two-step method for
inference in the spirit of Romano et al. (2014), which we compare to existing
methods or their natural generalizations to this setting. We first show the
finite-sample validity of this method in a normal location model and then
develop asymptotic counterparts to these results by proving uniform validity
over a large class of distributions satisfying a weak uniform integrability
condition. Importantly, our results permit degeneracy in the covariance matrix
of the limiting distribution, which arises naturally in many applications. In
an application to the literature on economic mobility, we find that it is
difficult to distinguish between high and low mobility census tracts when
correcting for selection. Finally, we demonstrate the practical relevance of
our theoretical results through an extensive set of simulations.

arXiv link: http://arxiv.org/abs/2410.19212v4

Econometrics arXiv updated paper (originally submitted: 2024-10-24)

Heterogeneous Treatment Effects via Linear Dynamic Panel Data Models

Authors: Philip Marx, Elie Tamer, Xun Tang

We study the identification of heterogeneous, intertemporal treatment effects
(TE) when potential outcomes depend on past treatments. First, applying a
dynamic panel data model to observed outcomes, we show that an instrumental
variable (IV) version of the estimand in Arellano and Bond (1991) recovers a
non-convex (negatively weighted) aggregate of TE plus non-vanishing trends. We
then provide conditions on sequential exchangeability (SE) of treatment and on
TE heterogeneity that reduce such an IV estimand to a convex (positively
weighted) aggregate of TE. Second, even when SE is generically violated, such
estimands identify causal parameters when potential outcomes are generated by
dynamic panel data models with some homogeneity or mild selection assumptions.
Finally, we motivate SE and compare it with parallel trends (PT) in various
settings with experimental data (when treatments are sequentially randomized)
and observational data (when treatments are dynamic, rational choices under
learning).

arXiv link: http://arxiv.org/abs/2410.19060v2

Econometrics arXiv updated paper (originally submitted: 2024-10-24)

Inference on High Dimensional Selective Labeling Models

Authors: Shakeeb Khan, Elie Tamer, Qingsong Yao

A class of simultaneous equation models arises in many domains where
observed binary outcomes are themselves a consequence of the choices
of one of the agents in the model. These models are gaining increasing
interest in the computer science and machine learning literatures, where the
potentially endogenous sample selection is referred to as the {\em selective labels}
problem. Empirical settings for such models arise in fields as diverse as
criminal justice, health care, and insurance. For important recent work in this
area, see for example Lakkaraju et al. (2017), Kleinberg et al. (2018), and
Coston et al. (2021), where the authors focus on judicial bail decisions, and
where one observes the outcome of whether a defendant failed to return for their
court appearance only if the judge in the case decides to release the defendant
on bail. Identifying and estimating such models can be computationally
challenging for two reasons. One is the nonconcavity of the bivariate
likelihood function, and the other is the large number of covariates in each
equation. Despite these challenges, in this paper we propose a novel
distribution-free estimation procedure that is computationally friendly in
settings with many covariates. The new method combines the semiparametric batched
gradient descent algorithm introduced in Khan et al. (2023) with a novel sorting
algorithm incorporated to control for selection bias. Asymptotic properties of
the new procedure are established under increasing dimension conditions in both
equations, and its finite sample properties are explored through a simulation
study and an application using judicial bail data.

arXiv link: http://arxiv.org/abs/2410.18381v3

Econometrics arXiv updated paper (originally submitted: 2024-10-23)

Partially Identified Rankings from Pairwise Interactions

Authors: Federico Crippa, Danil Fedchenko

This paper considers the problem of ranking objects based on their latent
merits using data from pairwise interactions. We allow for incomplete
observation of these interactions and study what can be inferred about rankings
in such settings. First, we show that identification of the ranking depends on
a trade-off between the tournament graph and the interaction function: in
parametric models, such as the Bradley-Terry-Luce model, rankings are point
identified even with sparse graphs, whereas nonparametric models require dense
graphs. Second, moving beyond point identification, we characterize the
identified set in the nonparametric model under any tournament structure and
represent it through moment inequalities. Finally, we propose a
likelihood-based statistic to test whether a ranking belongs to the identified
set. We study two testing procedures: one is finite-sample valid but
computationally intensive; the other is easy to implement and valid
asymptotically. We illustrate our results using Brazilian employer-employee
data to study how workers rank firms when moving across jobs.

arXiv link: http://arxiv.org/abs/2410.18272v2
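
For the parametric benchmark mentioned in the abstract, the sketch below fits a
Bradley-Terry model to (possibly sparse) pairwise win counts by maximum likelihood
and ranks objects by the estimated merits. It does not implement the paper's
partial-identification analysis or tests; the win matrix and normalization are
illustrative.

```python
# Fit a Bradley-Terry model to pairwise win counts W (W[i, j] = number of times
# i beat j) by maximum likelihood and rank objects by estimated merit. This is
# only the parametric benchmark referenced in the abstract.
import numpy as np
from scipy.optimize import minimize

def bradley_terry_ranking(W):
    n = W.shape[0]
    def neg_loglik(theta):
        theta = np.append(theta, 0.0)                 # normalize theta_n = 0
        diff = theta[:, None] - theta[None, :]
        # log P(i beats j) = (theta_i - theta_j) - log(1 + exp(theta_i - theta_j))
        return -(W * (diff - np.log1p(np.exp(diff)))).sum()
    res = minimize(neg_loglik, np.zeros(n - 1), method="BFGS")
    merits = np.append(res.x, 0.0)
    return np.argsort(-merits), merits                # best-to-worst ordering

# Example: 4 objects observed in a sparse tournament
W = np.array([[0, 3, 1, 0],
              [1, 0, 2, 1],
              [0, 1, 0, 4],
              [0, 0, 1, 0]])
ranking, merits = bradley_terry_ranking(W)
print(ranking)
```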

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-10-23

Detecting Spatial Outliers: the Role of the Local Influence Function

Authors: Giuseppe Arbia, Vincenzo Nardelli

In the analysis of large spatial datasets, identifying and treating spatial
outliers is essential for accurately interpreting geographical phenomena. While
spatial correlation measures, particularly Local Indicators of Spatial
Association (LISA), are widely used to detect spatial patterns, the presence of
abnormal observations frequently distorts the landscape and conceals critical
spatial relationships. These outliers can significantly impact analysis due to
the inherent spatial dependencies present in the data. Traditional influence
function (IF) methodologies, commonly used in statistical analysis to measure
the impact of individual observations, are not directly applicable in the
spatial context because the influence of an observation is determined not only
by its own value but also by its spatial location, its connections with
neighboring regions, and the values of those neighboring observations. In this
paper, we introduce a local version of the influence function (LIF) that
accounts for these spatial dependencies. Through the analysis of both simulated
and real-world datasets, we demonstrate how the LIF provides a more nuanced and
accurate detection of spatial outliers compared to traditional LISA measures
and local impact assessments, improving our understanding of spatial patterns.

arXiv link: http://arxiv.org/abs/2410.18261v1

Econometrics arXiv updated paper (originally submitted: 2024-10-23)

On the Existence of One-Sided Representations for the Generalised Dynamic Factor Model

Authors: Philipp Gersing

We show that the common component of the Generalised Dynamic Factor Model
(GDFM) can be represented using only current and past observations basically
whenever it is purely non-deterministic.

arXiv link: http://arxiv.org/abs/2410.18159v3

Econometrics arXiv paper, submitted: 2024-10-22

A Bayesian Perspective on the Maximum Score Problem

Authors: Christopher D. Walker

This paper presents a Bayesian inference framework for a linear index
threshold-crossing binary choice model that satisfies a median independence
restriction. The key idea is that the model is observationally equivalent to a
probit model with nonparametric heteroskedasticity. Consequently, Gibbs
sampling techniques from Albert and Chib (1993) and Chib and Greenberg (2013)
lead to a computationally attractive Bayesian inference procedure in which a
Gaussian process forms a conditionally conjugate prior for the natural
logarithm of the skedastic function.

arXiv link: http://arxiv.org/abs/2410.17153v1

Econometrics arXiv updated paper (originally submitted: 2024-10-22)

General Seemingly Unrelated Local Projections

Authors: Florian Huber, Christian Matthes, Michael Pfarrhofer

We develop a Bayesian framework for the efficient estimation of impulse
responses using Local Projections (LPs) with instrumental variables. It
accommodates multiple shocks and instruments, accounts for autocorrelation in
multi-step forecasts by jointly modeling all LPs as a seemingly unrelated
system of equations, defines a flexible yet parsimonious joint prior for
impulse responses based on a Gaussian Process, and allows for joint inference
about the entire vector of impulse responses. We show via Monte Carlo
simulations that our approach delivers more accurate point and uncertainty
estimates than standard methods. To address potential misspecification, we
propose an optional robustification step based on power posteriors.

arXiv link: http://arxiv.org/abs/2410.17105v3

Econometrics arXiv paper, submitted: 2024-10-22

Identifying Conduct Parameters with Separable Demand: A Counterexample to Lau (1982)

Authors: Yuri Matsumura, Suguru Otani

We provide a counterexample to the conduct parameter identification result
established in the foundational work of Lau (1982), which generalizes the
identification theorem of Bresnahan (1982) by relaxing the linearity
assumptions. We identify a separable demand function that still permits
identification and validate this case both theoretically and through numerical
simulations.

arXiv link: http://arxiv.org/abs/2410.16998v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-10-21

A Dynamic Spatiotemporal and Network ARCH Model with Common Factors

Authors: Osman Doğan, Raffaele Mattera, Philipp Otto, Süleyman Taşpınar

We introduce a dynamic spatiotemporal volatility model that extends
traditional approaches by incorporating spatial, temporal, and spatiotemporal
spillover effects, along with volatility-specific observed and latent factors.
The model offers a more general network interpretation, making it applicable
for studying various types of network spillovers. The primary innovation lies
in incorporating volatility-specific latent factors into the dynamic
spatiotemporal volatility model. Using Bayesian estimation via the Markov Chain
Monte Carlo (MCMC) method, the model offers a robust framework for analyzing
the spatial, temporal, and spatiotemporal effects of a log-squared outcome
variable on its volatility. We recommend using the deviance information
criterion (DIC) and a regularized Bayesian MCMC method to select the number of
relevant factors in the model. The model's flexibility is demonstrated through
two applications: a spatiotemporal model applied to the U.S. housing market and
another applied to financial stock market networks, both highlighting the
model's ability to capture varying degrees of interconnectedness. In both
applications, we find strong spatial/network interactions with relatively
stronger spillover effects in the stock market.

arXiv link: http://arxiv.org/abs/2410.16526v1

Econometrics arXiv paper, submitted: 2024-10-21

Asymmetries in Financial Spillovers

Authors: Florian Huber, Karin Klieber, Massimiliano Marcellino, Luca Onorante, Michael Pfarrhofer

This paper analyzes nonlinearities in the international transmission of
financial shocks originating in the US. To do so, we develop a flexible
nonlinear multi-country model. Our framework is capable of producing
asymmetries in the responses to financial shocks for shock size and sign, and
over time. We show that international reactions to US-based financial shocks
are asymmetric along these dimensions. Particularly, we find that adverse
shocks trigger stronger declines in output, inflation, and stock markets than
benign shocks. Further, we investigate time variation in the estimated dynamic
effects and characterize the responsiveness of three major central banks to
financial shocks.

arXiv link: http://arxiv.org/abs/2410.16214v1

Econometrics arXiv paper, submitted: 2024-10-21

Dynamic Biases of Static Panel Data Estimators

Authors: Sylvia Klosin

This paper identifies an important bias - termed dynamic bias - in fixed
effects panel estimators that arises when dynamic feedback is ignored in the
estimating equation. Dynamic feedback occurs if past outcomes impact current
outcomes, a feature of many settings ranging from economic growth to
agricultural and labor markets. When estimating equations omit past outcomes,
dynamic bias can lead to significantly inaccurate treatment effect estimates,
even with randomly assigned treatments. In simulations, this dynamic bias is
larger than the Nickell bias. I show that dynamic bias stems from the estimation of
fixed effects, as their estimation generates confounding in the data. To
recover consistent treatment effects, I develop a flexible estimator that
provides fixed-T bias correction. I apply this approach to study the impact of
temperature shocks on GDP, a canonical example where economic theory points to
an important feedback from past to future outcomes. Accounting for dynamic bias
lowers the estimated effects of higher yearly temperatures on GDP growth by 10%
and GDP levels by 120%.

arXiv link: http://arxiv.org/abs/2410.16112v1
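
A minimal simulation sketch of the phenomenon described: outcomes follow a dynamic
panel process in which past outcomes feed back into current outcomes, the treatment
is randomly assigned, and yet a static fixed effects (within) regression that omits
the lagged outcome is biased away from the true contemporaneous effect. The DGP
parameters are illustrative, not taken from the paper.

```python
# Simulate a dynamic panel with feedback from past outcomes, then estimate a
# *static* fixed effects (within) regression that omits the lagged outcome.
# The treatment is randomly assigned, yet the static estimate is noticeably
# below the true contemporaneous effect of 1. Illustrative DGP values only.
import numpy as np

rng = np.random.default_rng(0)
N, T, rho, beta = 2000, 10, 0.8, 1.0
alpha = rng.normal(size=N)                       # unit fixed effects
D = rng.binomial(1, 0.5, size=(N, T))            # randomly assigned treatment
Y = np.zeros((N, T))
for t in range(T):
    lag = Y[:, t - 1] if t > 0 else np.zeros(N)
    Y[:, t] = alpha + rho * lag + beta * D[:, t] + rng.normal(size=N)

def within_estimator(y, d):                      # static one-way FE (within) estimator
    y = y - y.mean(axis=1, keepdims=True)
    d = d - d.mean(axis=1, keepdims=True)
    return (d * y).sum() / (d * d).sum()

print("true contemporaneous effect:", beta)
print("static FE (within) estimate:", within_estimator(Y, D))
```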

Econometrics arXiv paper, submitted: 2024-10-21

Semiparametric Bayesian Inference for a Conditional Moment Equality Model

Authors: Christopher D. Walker

Conditional moment equality models are regularly encountered in empirical
economics, yet they are difficult to estimate. These models map a conditional
distribution of data to a structural parameter via the restriction that a
conditional mean equals zero. Using this observation, I introduce a Bayesian
inference framework in which an unknown conditional distribution is replaced
with a nonparametric posterior, and structural parameter inference is then
performed using an implied posterior. The method has the same flexibility as
frequentist semiparametric estimators and does not require converting
conditional moments to unconditional moments. Importantly, I prove a
semiparametric Bernstein-von Mises theorem, providing conditions under which,
in large samples, the posterior for the structural parameter is approximately
normal, centered at an efficient estimator, and has variance equal to the
Chamberlain (1987) semiparametric efficiency bound. As byproducts, I show that
Bayesian uncertainty quantification methods are asymptotically optimal
frequentist confidence sets and derive low-level sufficient conditions for
Gaussian process priors. The latter sheds light on a key prior stability
condition and relates to the numerical aspects of the paper in which these
priors are used to predict the welfare effects of price changes.

arXiv link: http://arxiv.org/abs/2410.16017v1

Econometrics arXiv cross-link from physics.soc-ph (physics.soc-ph), submitted: 2024-10-21

Quantifying world geography as seen through the lens of Soviet propaganda

Authors: M. V. Tamm, M. Oiva, K. D. Mukhina, M. Mets, M. Schich

Cultural data typically contains a variety of biases. In particular,
geographical locations are unequally portrayed in media, creating a distorted
representation of the world. Identifying and measuring such biases is crucial
to understand both the data and the socio-cultural processes that have produced
them. Here we propose to measure geographical biases in a large historical news
media corpus by studying the representation of cities. Leveraging ideas of
quantitative urban science, we develop a mixed quantitative-qualitative
procedure, which allows us to get robust quantitative estimates of the biases.
These biases can be further qualitatively interpreted resulting in a
hermeneutic feedback loop. We apply this procedure to a corpus of the Soviet
newsreel series 'Novosti Dnya' (News of the Day) and show that city
representation grows super-linearly with city size, and is further biased by
city specialization and geographical location. This allows to systematically
identify geographical regions which are explicitly or sneakily emphasized by
Soviet propaganda and quantify their importance.

arXiv link: http://arxiv.org/abs/2410.15938v2

Econometrics arXiv paper, submitted: 2024-10-21

A Kernelization-Based Approach to Nonparametric Binary Choice Models

Authors: Guo Yan

We propose a new estimator for nonparametric binary choice models that does
not impose a parametric structure on either the systematic function of
covariates or the distribution of the error term. A key advantage of our
approach is its computational efficiency. For instance, even when assuming a
normal error distribution as in probit models, commonly used sieves for
approximating an unknown function of covariates can lead to a large-dimensional
optimization problem when the number of covariates is moderate. Our approach,
motivated by kernel methods in machine learning, views certain reproducing
kernel Hilbert spaces as special sieve spaces, coupled with spectral cut-off
regularization for dimension reduction. We establish the consistency of the
proposed estimator for both the systematic function of covariates and the
distribution function of the error term, and asymptotic normality of the
plug-in estimator for weighted average partial derivatives. Simulation studies
show that, compared to parametric estimation methods, the proposed method
effectively improves finite sample performance in cases of misspecification,
and has a rather mild efficiency loss if the model is correctly specified.
Using administrative data on the grant decisions of US asylum applications to
immigration courts, along with nine case-day variables on weather and
pollution, we re-examine the effect of outdoor temperature on court judges'
"mood", and thus, their grant decisions.

arXiv link: http://arxiv.org/abs/2410.15734v1

Econometrics arXiv updated paper (originally submitted: 2024-10-21)

Distributionally Robust Instrumental Variables Estimation

Authors: Zhaonan Qu, Yongchan Kwon

Instrumental variables (IV) estimation is a fundamental method in
econometrics and statistics for estimating causal effects in the presence of
unobserved confounding. However, challenges such as untestable model
assumptions and poor finite sample properties have undermined its reliability
in practice. Viewing common issues in IV estimation as distributional
uncertainties, we propose DRIVE, a distributionally robust IV estimation
method. We show that DRIVE minimizes a square root variant of ridge regularized
two stage least squares (TSLS) objective when the ambiguity set is based on a
Wasserstein distance. In addition, we develop a novel asymptotic theory for
this estimator, showing that it achieves consistency without requiring the
regularization parameter to vanish. This novel property ensures that the
estimator is robust to distributional uncertainties that persist in large
samples. We further derive the asymptotic distribution of Wasserstein DRIVE and
propose data-driven procedures to select the regularization parameter based on
theoretical results. Simulation studies demonstrate the superior finite sample
performance of Wasserstein DRIVE in terms of estimation error and out-of-sample
prediction. Due to its regularization and robustness properties, Wasserstein
DRIVE presents an appealing option when the practitioner is uncertain about
model assumptions or distributional shifts in data.

arXiv link: http://arxiv.org/abs/2410.15634v2

Econometrics arXiv cross-link from q-fin.PM (q-fin.PM), submitted: 2024-10-19

Conformal Predictive Portfolio Selection

Authors: Masahiro Kato

This study examines portfolio selection using predictive models for portfolio
returns. Portfolio selection is a fundamental task in finance, and a variety of
methods have been developed to achieve this goal. For instance, the
mean-variance approach constructs portfolios by balancing the trade-off between
the mean and variance of asset returns, while the quantile-based approach
optimizes portfolios by considering tail risk. These methods often depend on
distributional information estimated from historical data using predictive
models, each of which carries its own uncertainty. To address this, we propose
a framework for predictive portfolio selection via conformal prediction,
called Conformal Predictive Portfolio Selection (CPPS). Our approach
forecasts future portfolio returns, computes the corresponding prediction
intervals, and selects the portfolio of interest based on these intervals. The
framework is flexible and can accommodate a wide range of predictive models,
including autoregressive (AR) models, random forests, and neural networks. We
demonstrate the effectiveness of the CPPS framework by applying it to an AR
model and validate its performance through empirical studies, showing that it
delivers superior returns compared to simpler strategies.

arXiv link: http://arxiv.org/abs/2410.16333v2
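
A minimal sketch of the general recipe described above: forecast each candidate
portfolio's return with an AR(1) model, form a split-conformal prediction interval
around the forecast, and pick the portfolio with the best interval lower bound.
The AR(1) forecaster, the sample split, the selection rule, and all parameters are
illustrative simplifications, not the paper's CPPS procedure.

```python
# Split-conformal prediction intervals for one-step-ahead portfolio returns
# from an AR(1) forecaster, followed by a simple "highest lower bound" rule.
# A stylized sketch of the CPPS recipe described above.
import numpy as np

def ar1_forecast(r, fit_idx):
    y, x = r[fit_idx][1:], r[fit_idx][:-1]
    b1, b0 = np.polyfit(x, y, 1)                             # slope, intercept
    return lambda last: b0 + b1 * last

def conformal_interval(returns, alpha=0.1):
    T = len(returns)
    fit, cal = np.arange(T // 2), np.arange(T // 2, T - 1)   # fit / calibration split
    f = ar1_forecast(returns, fit)
    scores = np.abs(returns[cal + 1] - f(returns[cal]))      # calibration residuals
    k = int(np.ceil((1 - alpha) * (len(scores) + 1))) - 1
    q = np.sort(scores)[min(k, len(scores) - 1)]             # conformal quantile
    point = f(returns[-1])                                   # forecast for period T+1
    return point - q, point + q

rng = np.random.default_rng(0)
T, n_assets = 300, 4
R = 0.001 + 0.02 * rng.standard_normal((T, n_assets))        # asset returns
weights = [np.ones(n_assets) / n_assets,                     # candidate portfolios
           np.array([0.7, 0.1, 0.1, 0.1]),
           np.array([0.1, 0.1, 0.1, 0.7])]
intervals = [conformal_interval(R @ w) for w in weights]
best = int(np.argmax([lo for lo, hi in intervals]))
print(intervals, "selected portfolio:", best)
```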

Econometrics arXiv paper, submitted: 2024-10-19

Predictive Quantile Regression with High-Dimensional Predictors: The Variable Screening Approach

Authors: Hongqi Chen, Ji Hyung Lee

This paper advances a variable screening approach to enhance conditional
quantile forecasts using high-dimensional predictors. We have refined and
augmented the quantile partial correlation (QPC)-based variable screening
proposed by Ma et al. (2017) to accommodate $\beta$-mixing time-series data.
Our approach is inclusive of i.i.d. scenarios but introduces new convergence
bounds for time-series contexts, suggesting the performance of QPC-based
screening is influenced by the degree of time-series dependence. Through Monte
Carlo simulations, we validate the effectiveness of QPC under weak dependence.
Our empirical assessment of variable selection for growth-at-risk (GaR)
forecasting underscores the method's advantages, revealing that specific labor
market determinants play a pivotal role in forecasting GaR. While prior
empirical research has predominantly considered a limited set of predictors, we
employ the comprehensive FRED-QD dataset, retaining a richer breadth of
information for GaR forecasts.

arXiv link: http://arxiv.org/abs/2410.15097v1
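
A simplified marginal quantile-correlation screen in the spirit of the QPC approach
described above, not the exact statistic of Ma et al. (2017) and without the partial
adjustment or time-series refinements: rank predictors by the absolute quantile
correlation $qcor_\tau(y, x_j) = cov(\psi_\tau(y - Q_\tau(y)), x_j)/\sqrt{(\tau - \tau^2)\,var(x_j)}$
with $\psi_\tau(u) = \tau - 1\{u < 0\}$, and keep the top $d$. The DGP and cutoffs
are illustrative.

```python
# Simple marginal quantile-correlation screen: rank predictors by |qcor_tau|
# and keep the top d. A simplified stand-in for the QPC screening described
# above; it ignores the "partial" adjustment and time-series dependence.
import numpy as np

def quantile_corr_screen(y, X, tau=0.05, d=10):
    psi = tau - (y - np.quantile(y, tau) < 0).astype(float)  # psi_tau(y - Q_tau(y))
    psi -= psi.mean()
    Xc = X - X.mean(axis=0)
    qcor = (Xc * psi[:, None]).mean(axis=0) / np.sqrt((tau - tau ** 2) * Xc.var(axis=0))
    return np.argsort(-np.abs(qcor))[:d]

rng = np.random.default_rng(0)
n, p = 400, 200
X = rng.normal(size=(n, p))
# column 0 shifts the location of y; column 1 drives the thickness of its tails
y = X[:, 0] + np.exp(0.4 * X[:, 1]) * rng.normal(size=n)
print(quantile_corr_screen(y, X, tau=0.05, d=10))
```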

Econometrics arXiv updated paper (originally submitted: 2024-10-19)

Fast and Efficient Bayesian Analysis of Structural Vector Autoregressions Using the R Package bsvars

Authors: Tomasz Woźniak

The R package bsvars provides a wide range of tools for empirical
macroeconomic and financial analyses using Bayesian Structural Vector
Autoregressions. It uses frontier econometric techniques and C++ code to ensure
fast and efficient estimation of these multivariate dynamic structural models,
possibly with many variables, complex identification strategies, and non-linear
characteristics. The models can be identified using adjustable exclusion
restrictions and heteroskedastic or non-normal shocks. They feature a flexible
three-level equation-specific local-global hierarchical prior distribution for
the estimated level of shrinkage for autoregressive and structural parameters.
Additionally, the package facilitates predictive and structural analyses such
as impulse responses, forecast error variance and historical decompositions,
forecasting, statistical verification of identification and hypotheses on
autoregressive parameters, and analyses of structural shocks, volatilities, and
fitted values. These features differentiate bsvars from existing R packages
that either focus on a specific structural model, do not consider
heteroskedastic shocks, or lack the implementation using compiled code.

arXiv link: http://arxiv.org/abs/2410.15090v2

Econometrics arXiv cross-link from cs.GT (cs.GT), submitted: 2024-10-18

Switchback Price Experiments with Forward-Looking Demand

Authors: Yifan Wu, Ramesh Johari, Vasilis Syrgkanis, Gabriel Y. Weintraub

We consider a retailer running a switchback experiment for the price of a
single product, with infinite supply. In each period, the seller chooses a
price $p$ from a set of predefined prices that consist of a reference price and
a few discounted price levels. The goal is to estimate the demand gradient at
the reference price point, with the aim of adjusting the reference price to
improve revenue after the experiment. In our model, in each period, a unit mass
of buyers arrives on the market, with values distributed based on a
time-varying process. Crucially, buyers are forward-looking with discounted
utility and will choose not to purchase now if they expect to face a discounted
price in the near future. We show that forward-looking demand introduces bias
in naive estimators of the demand gradient, due to intertemporal interference.
Furthermore, we prove that there is no estimator that uses data from price
experiments with only two price points that can recover the correct demand
gradient, even in the limit of an infinitely long experiment with an
infinitesimal price discount. Moreover, we characterize the form of the bias of
naive estimators. Finally, we show that with a simple three price level
experiment, the seller can remove the bias due to strategic forward-looking
behavior and construct an estimator for the demand gradient that asymptotically
recovers the truth.

arXiv link: http://arxiv.org/abs/2410.14904v1

Econometrics arXiv updated paper (originally submitted: 2024-10-18)

Learning the Effect of Persuasion via Difference-In-Differences

Authors: Sung Jae Jun, Sokbae Lee

We develop a difference-in-differences framework to measure the persuasive
impact of informational treatments on behavior. We introduce two causal
parameters, the forward and backward average persuasion rates on the treated,
which refine the average treatment effect on the treated. The forward rate
excludes cases of "preaching to the converted," while the backward rate omits
"talking to a brick wall" cases. We propose both regression-based and
semiparametrically efficient estimators. The framework applies to both
two-period and staggered treatment settings, including event studies, and we
demonstrate its usefulness with applications to a British election and a
Chinese curriculum reform.

arXiv link: http://arxiv.org/abs/2410.14871v3

Econometrics arXiv paper, submitted: 2024-10-18

A GARCH model with two volatility components and two driving factors

Authors: Luca Vincenzo Ballestra, Enzo D'Innocenzo, Christian Tezza

We introduce a novel GARCH model that integrates two sources of uncertainty
to better capture the rich, multi-component dynamics often observed in the
volatility of financial assets. This model provides a quasi closed-form
representation of the characteristic function for future log-returns, from
which semi-analytical formulas for option pricing can be derived. A theoretical
analysis is conducted to establish sufficient conditions for strict
stationarity and geometric ergodicity, while also obtaining the continuous-time
diffusion limit of the model. Empirical evaluations, conducted both in-sample
and out-of-sample using S&P500 time series data, show that our model
outperforms widely used single-factor models in predicting returns and option
prices.

arXiv link: http://arxiv.org/abs/2410.14585v1

Econometrics arXiv paper, submitted: 2024-10-18

GARCH option valuation with long-run and short-run volatility components: A novel framework ensuring positive variance

Authors: Luca Vincenzo Ballestra, Enzo D'Innocenzo, Christian Tezza

Christoffersen, Jacobs, Ornthanalai, and Wang (2008) (CJOW) proposed an
improved Generalized Autoregressive Conditional Heteroskedasticity (GARCH)
model for valuing European options, where the return volatility is comprised of
two distinct components. Empirical studies indicate that the model developed by
CJOW outperforms widely-used single-component GARCH models and provides a
superior fit to options data than models that combine conditional
heteroskedasticity with Poisson-normal jumps. However, a significant limitation
of this model is that it allows the variance process to become negative. Oh and
Park [2023] partially addressed this issue by developing a related model, yet
the positivity of the volatility components is not guaranteed, both
theoretically and empirically. In this paper we introduce a new GARCH model
that improves upon the models by CJOW and Oh and Park [2023], ensuring the
positivity of the return volatility. In comparison to the two earlier GARCH
approaches, our novel methodology shows comparable in-sample performance on
returns data and superior performance on S&P500 options data.

arXiv link: http://arxiv.org/abs/2410.14513v1
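
To illustrate why positivity is a concern in two-component GARCH models, the sketch
below simulates a textbook component-GARCH recursion in the spirit of Engle and Lee
(1999), in which the conditional variance is the sum of a slow-moving long-run
component and a mean-zero short-run component. This is neither the CJOW model nor
the paper's new specification, and the parameter values are illustrative: the point
is only that the short-run component is not sign-restricted, so positivity of the
total variance requires parameter restrictions.

```python
# Simulate a stylized two-component GARCH recursion: h_t = q_t + s_t with a
# persistent long-run component q_t and a mean-zero short-run component s_t.
# NOT the CJOW or the paper's model; parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(0)
T = 2000
omega, rho, phi = 0.02, 0.97, 0.05      # long-run component parameters
alpha, beta = 0.08, 0.85                # short-run component parameters

q = np.full(T, omega / (1 - rho))       # long-run variance component
s = np.zeros(T)                         # short-run (transitory) component
h = q.copy()                            # conditional variance h_t = q_t + s_t
eps = np.zeros(T)
for t in range(1, T):
    q[t] = omega + rho * q[t - 1] + phi * (eps[t - 1] ** 2 - h[t - 1])
    s[t] = (alpha + beta) * s[t - 1] + alpha * (eps[t - 1] ** 2 - h[t - 1])
    h[t] = q[t] + s[t]
    # guard in case h_t dips below zero, which this recursion does not rule out
    eps[t] = np.sqrt(max(h[t], 1e-12)) * rng.standard_normal()

print("min short-run component:", s.min())   # can be negative
print("min conditional variance:", h.min())
```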

Econometrics arXiv updated paper (originally submitted: 2024-10-18)

Identification of a Rank-dependent Peer Effect Model

Authors: Eyo I. Herstad, Myungkou Shin

We develop a model that captures peer effect heterogeneity by modeling the
endogenous spillover to be linear in ordered peer outcomes. Unlike the
canonical linear-in-means model, our approach accounts for the distribution of
peer outcomes as well as the size of peer groups. Under a minimal condition,
our model admits a unique equilibrium and is therefore tractable and
identified. Simulations show our estimator has good finite sample performance.
Finally, we apply our model to educational data from Norway, finding that
higher-performing friends disproportionately drive GPA spillovers. Our
framework provides new insights into the structure of peer effects beyond
aggregate measures.

arXiv link: http://arxiv.org/abs/2410.14317v2

Econometrics arXiv updated paper (originally submitted: 2024-10-17)

The Subtlety of Optimal Paternalism in a Population with Bounded Rationality

Authors: Charles F. Manski, Eytan Sheshinski

We study optimal policy when a paternalistic utilitarian planner has the
power to design a discrete choice set for a heterogeneous population with
bounded rationality. We show that the policy that most effectively constrains
or influences choices depends in a particular multiplicative way on the
preferences of the population and on the choice probabilities conditional on
preferences that measure the suboptimality of behavior. We first consider the
planning problem in abstraction. We then study two settings in which the
planner may mandate an action or decentralize decision making. In one setting,
we suppose that individuals measure utility with additive random error and
maximize mismeasured rather than actual utility. Then optimal planning requires
knowledge of the distribution of measurement errors. In the second setting, we
consider binary treatment choice under uncertainty when the planner can mandate
a treatment conditional on publicly observed personal covariates or can enable
individuals to choose their own treatments conditional on private information.
We focus on situations where bounded rationality takes the form of deviations
between subjective personal beliefs and objective probabilities of uncertain
outcomes. To illustrate, we consider clinical decision making in medicine. In
toto, our analysis is cautionary. It characterizes the subtle nature of optimal
policy, whose determination requires the planner to possess extensive knowledge
that is rarely available. We conclude that studies of policy choice by a
paternalistic utilitarian planner should view not only the population but also
the planner to be boundedly rational.

arXiv link: http://arxiv.org/abs/2410.13658v2

Econometrics arXiv paper, submitted: 2024-10-16

Counterfactual Analysis in Empirical Games

Authors: Brendan Kline, Elie Tamer

We address counterfactual analysis in empirical models of games with
partially identified parameters, and multiple equilibria and/or randomized
strategies, by constructing and analyzing the counterfactual predictive
distribution set (CPDS). This framework accommodates various outcomes of
interest, including behavioral and welfare outcomes. It allows a variety of
changes to the environment to generate the counterfactual, including
modifications of the utility functions, the distribution of utility
determinants, the number of decision makers, and the solution concept. We use a
Bayesian approach to summarize statistical uncertainty. We establish conditions
under which the population CPDS is sharp from the point of view of
identification. We also establish conditions under which the posterior CPDS is
consistent if the posterior distribution for the underlying model parameter is
consistent. Consequently, our results can be employed to conduct counterfactual
analysis after a preliminary step of identifying and estimating the underlying
model parameter based on the existing literature. Our consistency results
involve the development of a new general theory for Bayesian consistency of
posterior distributions for mappings of sets. Although we primarily focus on a
model of a strategic game, our approach is applicable to other structural
models with similar features.

arXiv link: http://arxiv.org/abs/2410.12731v1

Econometrics arXiv paper, submitted: 2024-10-16

A Simple Interactive Fixed Effects Estimator for Short Panels

Authors: Robert F. Phillips, Benjamin D. Williams

We study the interactive effects (IE) model as an extension of the
conventional additive effects (AE) model. For the AE model, the fixed effects
estimator can be obtained by applying least squares to a regression that adds a
linear projection of the fixed effect on the explanatory variables (Mundlak,
1978; Chamberlain, 1984). In this paper, we develop a novel estimator -- the
projection-based IE (PIE) estimator -- for the IE model that is based on a
similar approach. We show that, for the IE model, fixed effects estimators that
have appeared in the literature are not equivalent to our PIE estimator, though
both can be expressed as a generalized within estimator. Unlike the fixed
effects estimators for the IE model, the PIE estimator is consistent for a
fixed number of time periods with no restrictions on serial correlation or
conditional heteroskedasticity in the errors. We also derive a statistic for
testing the consistency of the two-way fixed effects estimator in the possible
presence of interactive effects. Moreover, although the PIE estimator is the
solution to a high-dimensional nonlinear least squares problem, we show that it
can be computed by iterating between two steps, both of which have simple
analytical solutions. The computational simplicity is an important advantage
relative to other strategies that have been proposed for estimating the IE
model for short panels. Finally, we compare the finite sample performance of IE
estimators through simulations.

arXiv link: http://arxiv.org/abs/2410.12709v1

Econometrics arXiv paper, submitted: 2024-10-15

Testing Identifying Assumptions in Parametric Separable Models: A Conditional Moment Inequality Approach

Authors: Leonard Goff, Désiré Kédagni, Huan Wu

In this paper, we propose a simple method for testing identifying assumptions
in parametric separable models, namely treatment exogeneity, instrument
validity, and/or homoskedasticity. We show that the testable implications can
be written in the intersection bounds framework, which is easy to implement
using the inference method proposed in Chernozhukov, Lee, and Rosen (2013), and
the Stata package of Chernozhukov et al. (2015). Monte Carlo simulations
confirm that our test is consistent and controls size. We use our proposed
method to test the validity of some commonly used instrumental variables, such
as the average price in other markets in Nevo and Rosen (2012) and the Bartik
instrument in Card (2009); the test rejects both instrumental variable
models. When the identifying assumptions are rejected, we discuss solutions
that allow researchers to identify some causal parameters of interest after
relaxing functional form assumptions. We show that the IV model is nontestable
if no functional form assumption is made on the outcome equation, when there
exists a one-to-one mapping between the continuous treatment variable, the
instrument, and the first-stage unobserved heterogeneity.

arXiv link: http://arxiv.org/abs/2410.12098v1

Econometrics arXiv updated paper (originally submitted: 2024-10-15)

Aggregation Trees

Authors: Riccardo Di Francesco

Uncovering the heterogeneous effects of particular policies or "treatments"
is a key concern for researchers and policymakers. A common approach is to
report average treatment effects across subgroups based on observable
covariates. However, the choice of subgroups is crucial as it poses the risk of
$p$-hacking and requires balancing interpretability with granularity. This
paper proposes a nonparametric approach to construct heterogeneous subgroups.
The approach enables a flexible exploration of the trade-off between
interpretability and the discovery of more granular heterogeneity by
constructing a sequence of nested groupings, each with an optimality property.
By integrating our approach with "honesty" and debiased machine learning, we
provide valid inference about the average treatment effect of each group. We
validate the proposed methodology through an empirical Monte-Carlo study and
apply it to revisit the impact of maternal smoking on birth weight, revealing
systematic heterogeneity driven by parental and birth-related characteristics.

arXiv link: http://arxiv.org/abs/2410.11408v2

Econometrics arXiv updated paper (originally submitted: 2024-10-15)

Closed-form estimation and inference for panels with attrition and refreshment samples

Authors: Grigory Franguridi, Lidia Kosenkova

It has long been established that, if a panel dataset suffers from attrition,
auxiliary (refreshment) sampling restores full identification under additional
assumptions that still allow for nontrivial attrition mechanisms. Such
identification results rely on implausible assumptions about the attrition
process or lead to theoretically and computationally challenging estimation
procedures. We propose an alternative identifying assumption that, despite its
nonparametric nature, suggests a simple estimation algorithm based on a
transformation of the empirical cumulative distribution function of the data.
This estimation procedure requires neither tuning parameters nor optimization
in the first step, i.e., it has a closed form. We prove that our estimator is
consistent and asymptotically normal and demonstrate its good performance in
simulations. We provide an empirical illustration with income data from the
Understanding America Study.

arXiv link: http://arxiv.org/abs/2410.11263v2

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2024-10-14

Statistical Properties of Deep Neural Networks with Dependent Data

Authors: Chad Brown

This paper establishes statistical properties of deep neural network (DNN)
estimators under dependent data. Two general results for nonparametric sieve
estimators directly applicable to DNN estimators are given. The first
establishes rates for convergence in probability under nonstationary data. The
second provides non-asymptotic probability bounds on $L^{2}$-errors
under stationary $\beta$-mixing data. I apply these results to DNN estimators
in both regression and classification contexts imposing only a standard
H\"older smoothness assumption. The DNN architectures considered are common in
applications, featuring fully connected feedforward networks with any
continuous piecewise linear activation function, unbounded weights, and a width
and depth that grow with sample size. The framework provided also offers
potential for research into other DNN architectures and time-series
applications.

arXiv link: http://arxiv.org/abs/2410.11113v3

Econometrics arXiv paper, submitted: 2024-10-14

Testing the order of fractional integration in the presence of smooth trends, with an application to UK Great Ratios

Authors: Mustafa R. Kılınç, Michael Massmann, Maximilian Ambros

This note proposes semi-parametric tests for investigating whether a
stochastic process is fractionally integrated of order $\delta$, where
$|\delta| < 1/2$, when smooth trends are present in the model. We combine the
semi-parametric approach by Iacone, Nielsen & Taylor (2022) to model the short
range dependence with the use of Chebyshev polynomials by Cuestas & Gil-Alana
to describe smooth trends. Our proposed statistics have standard limiting null
distributions and match the asymptotic local power of infeasible tests based on
unobserved errors. We also establish the conditions under which an information
criterion can consistently estimate the order of the Chebyshev polynomial. The
finite sample performance is evaluated using simulations, and an empirical
application is given for the UK Great Ratios.
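
For readers unfamiliar with the trend component, the sketch below builds
Chebyshev-polynomial time-trend regressors on a time index rescaled to [-1, 1]
and uses them to purge a smooth trend by OLS. It illustrates only the
deterministic-trend part of the setup; the rescaling and the polynomial order
are illustrative assumptions, and the paper's fractional-integration test
statistics are not implemented.

    import numpy as np

    def chebyshev_trend(T, m):
        """T x (m+1) matrix of Chebyshev time-trend regressors.

        Column j is the Chebyshev polynomial of degree j evaluated at the
        time index rescaled to [-1, 1]; column 0 is a constant.
        """
        s = 2.0 * np.arange(1, T + 1) / T - 1.0            # rescale t = 1..T
        return np.polynomial.chebyshev.chebvander(s, m)    # design matrix

    # Example: purge a simulated smooth trend by OLS on Chebyshev terms
    rng = np.random.default_rng(0)
    T = 500
    trend = np.sin(np.linspace(0, np.pi, T))               # smooth trend
    y = trend + rng.standard_normal(T)
    X = chebyshev_trend(T, m=3)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)           # OLS trend fit
    residuals = y - X @ beta                               # detrended series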

arXiv link: http://arxiv.org/abs/2410.10749v1

Econometrics arXiv paper, submitted: 2024-10-13

Large Scale Longitudinal Experiments: Estimation and Inference

Authors: Apoorva Lal, Alexander Fischer, Matthew Wardrop

Large-scale randomized experiments are seldom analyzed using panel regression
methods because of computational challenges arising from the presence of
millions of nuisance parameters. We leverage Mundlak's insight that unit
intercepts can be eliminated by using carefully chosen averages of the
regressors to rewrite several common estimators in a form that is amenable to
weighted least-squares estimation with frequency weights. This renders
regressions involving arbitrary strata intercepts tractable with very large
datasets, optionally with the key compression step computed out-of-memory in
SQL. We demonstrate that these methods yield more precise estimates than other
commonly used estimators, and also find that the compression strategy greatly
increases computational efficiency. We provide in-memory (pyfixest) and
out-of-memory (duckreg) Python libraries to implement these estimators.
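
A minimal sketch of the compression idea, not the packages' actual interface:
group observations that share identical regressor values, keep each cell's
count and mean outcome, and run weighted least squares with the counts as
frequency weights, which reproduces the full-data OLS point estimates. The
simulated two-by-two design with strata below is an illustrative assumption;
see pyfixest and duckreg for the authors' implementations.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    # Simulated experiment: treatment, post indicator, and a stratum id
    rng = np.random.default_rng(1)
    n = 200_000
    df = pd.DataFrame({
        "treat": rng.integers(0, 2, n),
        "post": rng.integers(0, 2, n),
        "stratum": rng.integers(0, 50, n),
    })
    df["y"] = 1.0 + 0.5 * df["treat"] * df["post"] + rng.standard_normal(n)

    # Compression: one row per unique regressor combination, carrying the
    # cell count and the cell-mean outcome.
    cells = (df.groupby(["treat", "post", "stratum"], as_index=False)
               .agg(n_obs=("y", "size"), y_mean=("y", "mean")))

    # WLS on the compressed data with frequency weights reproduces the
    # point estimates of OLS on the full microdata.
    X = pd.get_dummies(cells[["treat", "post", "stratum"]].astype("category"),
                       drop_first=True).astype(float)
    X["treat_x_post"] = cells["treat"] * cells["post"]
    X = sm.add_constant(X)
    wls = sm.WLS(cells["y_mean"], X, weights=cells["n_obs"]).fit()
    print(wls.params["treat_x_post"])   # close to the true interaction 0.5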

arXiv link: http://arxiv.org/abs/2410.09952v1

Econometrics arXiv paper, submitted: 2024-10-13

Nickell Meets Stambaugh: A Tale of Two Biases in Panel Predictive Regressions

Authors: Chengwang Liao, Ziwei Mei, Zhentao Shi

In panel predictive regressions with persistent covariates, coexistence of
the Nickell bias and the Stambaugh bias imposes challenges for hypothesis
testing. This paper introduces a new estimator, the IVX-X-Jackknife (IVXJ),
which effectively removes this composite bias and reinstates standard
inferential procedures. The IVXJ estimator is inspired by the IVX technique in
time series. In panel data where the cross section is of the same order as the
time dimension, the bias of the baseline panel IVX estimator can be corrected
via an analytical formula by leveraging an innovative X-Jackknife scheme that
divides the time dimension into the odd and even indices. IVXJ is the first
procedure that achieves unified inference across a wide range of modes of
persistence in panel predictive regressions, whereas such unified inference is
unattainable for the popular within-group estimator. Extended to accommodate
long-horizon predictions with multiple regressions, IVXJ is used to examine the
impact of debt levels on financial crises by panel local projection. Our
empirics provide comparable results across different categories of debt.

arXiv link: http://arxiv.org/abs/2410.09825v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-10-11

Variance reduction combining pre-experiment and in-experiment data

Authors: Zhexiao Lin, Pablo Crespo

Online controlled experiments (A/B testing) are essential in data-driven
decision-making for many companies. Increasing the sensitivity of these
experiments, particularly with a fixed sample size, relies on reducing the
variance of the estimator for the average treatment effect (ATE). Existing
methods like CUPED and CUPAC use pre-experiment data to reduce variance, but
their effectiveness depends on the correlation between the pre-experiment data
and the outcome. In contrast, in-experiment data is often more strongly
correlated with the outcome and thus more informative. In this paper, we
introduce a novel method that combines both pre-experiment and in-experiment
data to achieve greater variance reduction than CUPED and CUPAC, without
introducing bias or additional computational complexity. We also establish
asymptotic theory and provide consistent variance estimators for our method.
Applying this method to multiple online experiments at Etsy, we achieve
substantial variance reduction over CUPAC with the inclusion of only a few
in-experiment covariates. These results highlight the potential of our approach
to significantly improve experiment sensitivity and accelerate decision-making.
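
For context, a minimal sketch of the standard CUPED adjustment that the paper
uses as a baseline: the in-experiment outcome is residualized on a
pre-experiment covariate before taking the difference in means. The paper's
combined pre- and in-experiment estimator is not shown; the simulated data and
the single covariate are illustrative assumptions.

    import numpy as np

    def cuped_ate(y, x, treated):
        """Difference-in-means ATE on CUPED-adjusted outcomes.

        y       : outcome measured during the experiment
        x       : pre-experiment covariate (e.g., same metric, pre-period)
        treated : boolean treatment indicator
        """
        theta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)  # adjustment coefficient
        y_adj = y - theta * (x - x.mean())              # variance-reduced outcome
        return y_adj[treated].mean() - y_adj[~treated].mean()

    rng = np.random.default_rng(2)
    n = 100_000
    x = rng.normal(size=n)                              # pre-experiment metric
    treated = rng.integers(0, 2, n).astype(bool)
    y = 0.1 * treated + 0.8 * x + rng.normal(size=n)    # in-experiment outcome
    print(cuped_ate(y, x, treated))                     # near the true effect 0.1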

arXiv link: http://arxiv.org/abs/2410.09027v1

Econometrics arXiv updated paper (originally submitted: 2024-10-09)

On the Lower Confidence Band for the Optimal Welfare in Policy Learning

Authors: Kirill Ponomarev, Vira Semenova

We study inference on the optimal welfare in a policy learning problem and
propose reporting a lower confidence band (LCB). A natural approach to
constructing an LCB is to invert a one-sided t-test based on an efficient
estimator for the optimal welfare. However, we show that for an empirically
relevant class of DGPs, such an LCB can be first-order dominated by an LCB
based on a welfare estimate for a suitable suboptimal treatment policy. We show
that such first-order dominance is possible if and only if the optimal
treatment policy is not “well-separated” from the rest, in the sense of the
commonly imposed margin condition. When this condition fails, standard debiased
inference methods are not applicable. We show that uniformly valid and
easy-to-compute LCBs can be constructed analytically by inverting
moment-inequality tests with the maximum and quasi-likelihood-ratio test
statistics. As an empirical illustration, we revisit the National JTPA study
and find that the proposed LCBs achieve reliable coverage and competitive
length.

arXiv link: http://arxiv.org/abs/2410.07443v3

Econometrics arXiv paper, submitted: 2024-10-09

Collusion Detection with Graph Neural Networks

Authors: Lucas Gomes, Jannis Kueck, Mara Mattes, Martin Spindler, Alexey Zaytsev

Collusion is a complex phenomenon in which companies secretly collaborate to
engage in fraudulent practices. This paper presents an innovative methodology
for detecting and predicting collusion patterns in different national markets
using neural networks (NNs) and graph neural networks (GNNs). GNNs are
particularly well suited to this task because they can exploit the inherent
network structures present in collusion and many other economic problems. Our
approach consists of two phases: In Phase I, we develop and train models on
individual market datasets from Japan, the United States, two regions in
Switzerland, Italy, and Brazil, focusing on predicting collusion in single
markets. In Phase II, we extend the models' applicability through zero-shot
learning, employing a transfer learning approach that can detect collusion in
markets in which training data is unavailable. This phase also incorporates
out-of-distribution (OOD) generalization to evaluate the models' performance on
unseen datasets from other countries and regions. In our empirical study, we
show that GNNs outperform NNs in detecting complex collusive patterns. This
research contributes to the ongoing discourse on preventing collusion and
optimizing detection methodologies, providing valuable guidance on the use of
NNs and GNNs in economic applications to enhance market fairness and economic
welfare.

arXiv link: http://arxiv.org/abs/2410.07091v1

Econometrics arXiv paper, submitted: 2024-10-09

Group Shapley Value and Counterfactual Simulations in a Structural Model

Authors: Yongchan Kwon, Sokbae Lee, Guillaume A. Pouliot

We propose a variant of the Shapley value, the group Shapley value, to
interpret counterfactual simulations in structural economic models by
quantifying the importance of different components. Our framework compares two
sets of parameters, partitioned into multiple groups, and applying group
Shapley value decomposition yields unique additive contributions to the changes
between these sets. The relative contributions sum to one, enabling us to
generate an importance table that is as easily interpretable as a regression
table. The group Shapley value can be characterized as the solution to a
constrained weighted least squares problem. Using this property, we develop
robust decomposition methods to address scenarios where inputs for the group
Shapley value are missing. We first apply our methodology to a simple Roy model
and then illustrate its usefulness by revisiting two published papers.
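
A minimal sketch of a group-level Shapley decomposition of the change in a
model output between two parameter vectors, using the standard Shapley
weights. The toy value function and the grouping are illustrative assumptions,
and the paper's constrained weighted least squares characterization and
missing-input robustness are not implemented.

    import numpy as np
    from itertools import combinations
    from math import factorial

    def group_shapley(value_fn, groups, theta_old, theta_new):
        """Additive Shapley contributions of parameter groups to the change
        value_fn(theta_new) - value_fn(theta_old).

        groups : list of index arrays partitioning the parameter vector
        """
        G = len(groups)
        contrib = np.zeros(G)

        def v(switched):
            theta = theta_old.copy()
            for g in switched:
                theta[groups[g]] = theta_new[groups[g]]  # switch these groups
            return value_fn(theta)

        for g in range(G):
            others = [k for k in range(G) if k != g]
            for size in range(G):
                w = factorial(size) * factorial(G - size - 1) / factorial(G)
                for S in combinations(others, size):
                    contrib[g] += w * (v(set(S) | {g}) - v(set(S)))
        return contrib

    # Toy example: the output depends on three parameter groups
    value_fn = lambda th: th[0] * th[1] + th[2] ** 2 + th[3]
    groups = [np.array([0, 1]), np.array([2]), np.array([3])]
    theta_old = np.array([1.0, 2.0, 0.5, 0.0])
    theta_new = np.array([1.5, 2.5, 1.0, 1.0])
    phi = group_shapley(value_fn, groups, theta_old, theta_new)
    # Contributions sum exactly to the total change; relative shares sum to one
    print(phi, phi.sum(), value_fn(theta_new) - value_fn(theta_old))
    print(phi / phi.sum())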

arXiv link: http://arxiv.org/abs/2410.06875v1

Econometrics arXiv paper, submitted: 2024-10-09

Green bubbles: a four-stage paradigm for detection and propagation

Authors: Gian Luca Vriz, Luigi Grossi

Climate change has emerged as a significant global concern, attracting
increasing attention worldwide. While green bubbles may be examined through a
social bubble hypothesis, it is essential not to neglect a Climate Minsky
moment triggered by sudden asset price changes. The significant increase in
green investments highlights the urgent need for a comprehensive understanding
of these market dynamics. Therefore, the current paper introduces a novel
paradigm for studying such phenomena. Focusing on the renewable energy sector,
Statistical Process Control (SPC) methodologies are employed to identify green
bubbles within time series data. Furthermore, search volume indexes and social
factors are incorporated into established econometric models to reveal
potential implications for the financial system. Inspired by Joseph
Schumpeter's perspectives on business cycles, this study recognizes green
bubbles as a necessary evil for facilitating a successful transition towards a
more sustainable future.

arXiv link: http://arxiv.org/abs/2410.06564v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-10-08

Persistence-Robust Break Detection in Predictive Quantile and CoVaR Regressions

Authors: Yannick Hoga

Forecasting risk (as measured by quantiles) and systemic risk (as measured by
Adrian and Brunnermeier's (2016) CoVaR) is important in economics and finance.
However, past research has shown that predictive relationships may be unstable
over time. Therefore, this paper develops structural break tests in predictive
quantile and CoVaR regressions. These tests can detect changes in the
forecasting power of covariates, and are based on the principle of
self-normalization. We show that our tests are valid irrespective of whether
the predictors are stationary or near-stationary, rendering the tests suitable
for a range of practical applications. Simulations illustrate the good
finite-sample properties of our tests. Two empirical applications concerning
equity premium and systemic risk forecasting models show the usefulness of the
tests.

arXiv link: http://arxiv.org/abs/2410.05861v1

Econometrics arXiv updated paper (originally submitted: 2024-10-08)

The Transmission of Monetary Policy via Common Cycles in the Euro Area

Authors: Lukas Berend, Jan Prüser

We use a FAVAR model with proxy variables and sign restrictions to
investigate the role of the euro area's common output and inflation cycles in
the transmission of monetary policy shocks. Our findings indicate that common
cycles explain most of the variation in output and inflation across member
countries. However, Southern European economies exhibit a notable divergence
from these cycles in the aftermath of the financial crisis. Building on this
evidence, we demonstrate that monetary policy is homogeneously propagated to
member countries via the common cycles. In contrast, country-specific
transmission channels lead to heterogeneous country responses to monetary
policy shocks. Consequently, our empirical results suggest that the divergent
effects of ECB monetary policy are attributable to heterogeneous
country-specific exposures to financial markets, rather than to
desynchronized economies within the euro area.

arXiv link: http://arxiv.org/abs/2410.05741v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-10-08

Identification and estimation for matrix time series CP-factor models

Authors: Jinyuan Chang, Yue Du, Guanglin Huang, Qiwei Yao

We propose a new method for identifying and estimating the CP-factor models
for matrix time series. Unlike the generalized eigenanalysis-based method of
Chang et al. (2023) for which the convergence rates of the associated
estimators may suffer from small eigengaps as the asymptotic theory is based on
some matrix perturbation analysis, the proposed new method enjoys faster
convergence rates which are free from any eigengaps. It achieves this by
turning the problem into a joint diagonalization of several matrices whose
elements are determined by a basis of a linear system, and by choosing the
basis carefully to avoid near co-linearity (see Proposition 5 and Section 4.3).
Furthermore, unlike Chang et al. (2023) which requires the two factor loading
matrices to be full-ranked, the proposed new method can handle rank-deficient
factor loading matrices. Illustration with both simulated and real matrix time
series data shows the advantages of the proposed new method.

arXiv link: http://arxiv.org/abs/2410.05634v3

Econometrics arXiv paper, submitted: 2024-10-08

Navigating Inflation in Ghana: How Can Machine Learning Enhance Economic Stability and Growth Strategies

Authors: Theophilus G. Baidoo, Ashley Obeng

Inflation remains a persistent challenge for many African countries. This
research investigates the critical role of machine learning (ML) in
understanding and managing inflation in Ghana, emphasizing its significance for
the country's economic stability and growth. Utilizing a comprehensive dataset
spanning from 2010 to 2022, the study aims to employ advanced ML models,
particularly those adept in time series forecasting, to predict future
inflation trends. The methodology is designed to provide accurate and reliable
inflation forecasts, offering valuable insights for policymakers and advocating
for a shift towards data-driven approaches in economic decision-making. This
study aims to significantly advance the academic field of economic analysis by
applying machine learning (ML) and offering practical guidance for integrating
advanced technological tools into economic governance, ultimately demonstrating
ML's potential to enhance Ghana's economic resilience and support sustainable
development through effective inflation management.

arXiv link: http://arxiv.org/abs/2410.05630v1

Econometrics arXiv paper, submitted: 2024-10-07

$\texttt{rdid}$ and $\texttt{rdidstag}$: Stata commands for robust difference-in-differences

Authors: Kyunghoon Ban, Désiré Kédagni

This article provides a Stata package for the implementation of the robust
difference-in-differences (RDID) method developed in Ban and K\'edagni (2023).
It contains three main commands: $\texttt{rdid}$, $\texttt{rdid\_dy}$, and
$\texttt{rdidstag}$, which we describe in the introduction and the main text.
We illustrate these commands through simulations and empirical examples.

arXiv link: http://arxiv.org/abs/2410.05212v1

Econometrics arXiv paper, submitted: 2024-10-07

Large datasets for the Euro Area and its member countries and the dynamic effects of the common monetary policy

Authors: Matteo Barigozzi, Claudio Lissona, Lorenzo Tonni

We present and describe a new publicly available large dataset which
encompasses quarterly and monthly macroeconomic time series for both the Euro
Area (EA) as a whole and its ten primary member countries. The dataset, which
is called EA-MD-QD, includes more than 800 time series and spans the period
from January 2000 to the latest available month. Since January 2024, EA-MD-QD
has been updated monthly and constantly revised, making it an essential
resource for conducting policy analysis related to economic outcomes in the EA.
To illustrate the usefulness of EA-MD-QD, we study the country-specific impulse
responses of the EA-wide monetary policy shock by means of the Common Component
VAR plus either Instrumental Variables or Sign Restrictions identification
schemes. The results reveal asymmetries in the transmission of the monetary
policy shock across countries, particularly between core and peripheral
countries. Additionally, we find comovements across Euro Area countries'
business cycles to be driven mostly by real variables, compared to nominal
ones.

arXiv link: http://arxiv.org/abs/2410.05082v1

Econometrics arXiv updated paper (originally submitted: 2024-10-07)

Democratizing Strategic Planning in Master-Planned Communities

Authors: Christopher K. Allsup, Irene S. Gabashvili

This paper introduces a strategic planning tool for master-planned
communities designed specifically to quantify residents' subjective preferences
about large investments in amenities and infrastructure projects. Drawing on
data obtained from brief online surveys, the tool ranks alternative plans by
considering the aggregate anticipated utilization of each proposed amenity and
cost sensitivity to it (or risk sensitivity for infrastructure plans). In
addition, the tool estimates the percentage of households that favor the
preferred plan and predicts whether residents would actually be willing to fund
the project. The mathematical underpinnings of the tool are borrowed from
utility theory, incorporating exponential functions to model diminishing
marginal returns on quality, cost, and risk mitigation.

arXiv link: http://arxiv.org/abs/2410.04676v2

Econometrics arXiv paper, submitted: 2024-10-06

A Structural Approach to Growth-at-Risk

Authors: Robert Wojciechowski

We identify the structural impulse responses of quantiles of the outcome
variable to a shock. Our estimation strategy explicitly distinguishes treatment
from control variables, allowing us to model responses of unconditional
quantiles while using controls for identification. Disentangling the effect of
adding control variables on identification versus interpretation brings our
structural quantile impulse responses conceptually closer to structural mean
impulse responses. Applying our methodology to study the impact of financial
shocks on lower quantiles of output growth confirms that financial shocks have
an outsized effect on growth-at-risk, but the magnitude of our estimates is
more extreme than in previous studies.

arXiv link: http://arxiv.org/abs/2410.04431v1

Econometrics arXiv paper, submitted: 2024-10-06

Inference in High-Dimensional Linear Projections: Multi-Horizon Granger Causality and Network Connectedness

Authors: Eugene Dettaa, Endong Wang

This paper presents a Wald test for multi-horizon Granger causality within a
high-dimensional sparse Vector Autoregression (VAR) framework. The null
hypothesis focuses on the causal coefficients of interest in a local projection
(LP) at a given horizon. Nevertheless, the post-double-selection method on LP
may not be applicable in this context, as a sparse VAR model does not
necessarily imply a sparse LP for horizon h>1. To validate the proposed test,
we develop two types of de-biased estimators for the causal coefficients of
interest, both relying on first-step machine learning estimators of the VAR
slope parameters. The first estimator is derived from the Least Squares method,
while the second is obtained through a two-stage approach that offers potential
efficiency gains. We further derive heteroskedasticity- and
autocorrelation-consistent (HAC) inference for each estimator. Additionally, we
propose a robust inference method for the two-stage estimator, eliminating the
need to correct for serial correlation in the projection residuals. Monte Carlo
simulations show that the two-stage estimator with robust inference outperforms
the Least Squares method in terms of the Wald test size, particularly for
longer projection horizons. We apply our methodology to analyze the
interconnectedness of policy-related economic uncertainty among a large set of
countries in both the short and long run. Specifically, we construct a causal
network to visualize how economic uncertainty spreads across countries over
time. Our empirical findings reveal, among other insights, that in the short
run (1 and 3 months), the U.S. influences China, while in the long run (9 and
12 months), China influences the U.S. Identifying these connections can help
anticipate a country's potential vulnerabilities and propose proactive
solutions to mitigate the transmission of economic uncertainty.
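
For orientation, a minimal low-dimensional local projection at horizon h with
HAC standard errors, which is the kind of object whose coefficients the test
targets. The paper's actual de-biased, machine-learning-based estimators for
the high-dimensional sparse VAR are not implemented here, and the lag length
and HAC bandwidth below are illustrative assumptions.

    import numpy as np
    import statsmodels.api as sm

    def local_projection(y, x, h, lags=4):
        """OLS local projection of y_{t+h} on x_t and lags of (y, x)."""
        T = len(y)
        rows = []
        for t in range(lags, T - h):
            past = ([y[t - l] for l in range(1, lags + 1)]
                    + [x[t - l] for l in range(1, lags + 1)])
            rows.append([x[t]] + past)
        X = sm.add_constant(np.array(rows))
        Y = y[lags + h:]
        fit = sm.OLS(Y, X).fit(cov_type="HAC", cov_kwds={"maxlags": h + lags})
        return fit.params[1], fit.bse[1]   # coefficient on x_t, HAC std. error

    rng = np.random.default_rng(3)
    T = 600
    x = rng.standard_normal(T)
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = 0.5 * y[t - 1] + 0.3 * x[t - 1] + rng.standard_normal()
    print(local_projection(y, x, h=3))     # multi-horizon effect of x on y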

arXiv link: http://arxiv.org/abs/2410.04330v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-10-05

How to Compare Copula Forecasts?

Authors: Tobias Fissler, Yannick Hoga

This paper lays out a principled approach to compare copula forecasts via
strictly consistent scores. We first establish the negative result that, in
general, copulas fail to be elicitable, implying that copula predictions cannot
sensibly be compared on their own. A notable exception is on Fr\'echet classes,
that is, when the marginal distribution structure is given and fixed, in which
case we give suitable scores for the copula forecast comparison. As a remedy
for the general non-elicitability of copulas, we establish novel
multi-objective scores for copula forecasts along with marginal forecasts. They
give rise to two-step tests of equal or superior predictive ability which admit
attribution of the forecast ranking to the accuracy of the copulas or the
marginals. Simulations show that our two-step tests work well in terms of size
and power. We illustrate our new methodology via an empirical example using
copula forecasts for international stock market indices.

arXiv link: http://arxiv.org/abs/2410.04165v1

Econometrics arXiv cross-link from q-fin.CP (q-fin.CP), submitted: 2024-10-04

A Dynamic Approach to Stock Price Prediction: Comparing RNN and Mixture of Experts Models Across Different Volatility Profiles

Authors: Diego Vallarino

This study evaluates the effectiveness of a Mixture of Experts (MoE) model
for stock price prediction by comparing it to a Recurrent Neural Network (RNN)
and a linear regression model. The MoE framework combines an RNN for volatile
stocks and a linear model for stable stocks, dynamically adjusting the weight
of each model through a gating network. Results indicate that the MoE approach
significantly improves predictive accuracy across different volatility
profiles. The RNN effectively captures non-linear patterns for volatile
companies but tends to overfit stable data, whereas the linear model performs
well for predictable trends. The MoE model's adaptability allows it to
outperform each individual model, reducing errors such as Mean Squared Error
(MSE) and Mean Absolute Error (MAE). Future work should focus on enhancing the
gating mechanism and validating the model with real-world datasets to optimize
its practical applicability.

arXiv link: http://arxiv.org/abs/2410.07234v1

Econometrics arXiv updated paper (originally submitted: 2024-10-04)

A new GARCH model with a deterministic time-varying intercept

Authors: Niklas Ahlgren, Alexander Back, Timo Teräsvirta

It is common for long financial time series to exhibit gradual change in the
unconditional volatility. We propose a new model that captures this type of
nonstationarity in a parsimonious way. The model augments the volatility
equation of a standard GARCH model by a deterministic time-varying intercept.
It captures structural change that slowly affects the amplitude of a time
series while keeping the short-run dynamics constant. We parameterize the
intercept as a linear combination of logistic transition functions. We show
that the model can be derived from a multiplicative decomposition of volatility
and preserves the financial motivation of variance decomposition. We use the
theory of locally stationary processes to show that the quasi maximum
likelihood estimator (QMLE) of the parameters of the model is consistent and
asymptotically normally distributed. We examine the quality of the asymptotic
approximation in a small simulation study. An empirical application to Oracle
Corporation stock returns demonstrates the usefulness of the model. We find
that the persistence implied by the GARCH parameter estimates is reduced by
including a time-varying intercept in the volatility equation.
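
A minimal simulation sketch of a GARCH(1,1) whose volatility intercept moves
along a single logistic transition in rescaled time, matching the model class
described above; the parameter values, the single transition function, and the
initialization are illustrative assumptions, and the QMLE is not implemented.

    import numpy as np

    def simulate_tv_garch(T, alpha=0.08, beta=0.85, omega0=0.05, gamma=0.25,
                          c=0.5, s=0.05, seed=0):
        """Simulate a GARCH(1,1) with a logistic time-varying intercept.

        sigma2_t = g(t/T) + alpha * eps_{t-1}^2 + beta * sigma2_{t-1}
        g(u)     = omega0 + gamma / (1 + exp(-(u - c) / s))
        """
        rng = np.random.default_rng(seed)
        eps = np.zeros(T)
        sigma2 = np.zeros(T)
        u = np.arange(1, T + 1) / T                         # rescaled time
        g = omega0 + gamma / (1.0 + np.exp(-(u - c) / s))   # intercept path
        sigma2[0] = g[0] / (1.0 - alpha - beta)             # local variance start
        eps[0] = np.sqrt(sigma2[0]) * rng.standard_normal()
        for t in range(1, T):
            sigma2[t] = g[t] + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
            eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
        return eps, sigma2

    returns, sigma2 = simulate_tv_garch(T=4000)
    # The later sample is more volatile because the intercept, not the GARCH
    # dynamics, has shifted upward.
    print(returns[:2000].var(), returns[2000:].var())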

arXiv link: http://arxiv.org/abs/2410.03239v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-10-02

Smaller Confidence Intervals From IPW Estimators via Data-Dependent Coarsening

Authors: Alkis Kalavasis, Anay Mehrotra, Manolis Zampetakis

Inverse propensity-score weighted (IPW) estimators are prevalent in causal
inference for estimating average treatment effects in observational studies.
Under unconfoundedness, given accurate propensity scores and $n$ samples, the
size of confidence intervals of IPW estimators scales down with $n$, and,
several of their variants improve the rate of scaling. However, neither IPW
estimators nor their variants are robust to inaccuracies: even if a single
covariate has an $\varepsilon>0$ additive error in the propensity score, the
size of confidence intervals of these estimators can increase arbitrarily.
Moreover, even without errors, the rate with which the confidence intervals of
these estimators go to zero with $n$ can be arbitrarily slow in the presence of
extreme propensity scores (those close to 0 or 1).
We introduce a family of Coarse IPW (CIPW) estimators that captures existing
IPW estimators and their variants. Each CIPW estimator is an IPW estimator on a
coarsened covariate space, where certain covariates are merged. Under mild
assumptions, e.g., Lipschitzness in expected outcomes and sparsity of extreme
propensity scores, we give an efficient algorithm to find a robust estimator:
given $\varepsilon$-inaccurate propensity scores and $n$ samples, its
confidence interval size scales with $\varepsilon+1/n$. In contrast,
under the same assumptions, existing estimators' confidence interval sizes are
$\Omega(1)$ irrespective of $\varepsilon$ and $n$. Crucially, our estimator is
data-dependent and we show that no data-independent CIPW estimator can be
robust to inaccuracies.
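
For reference, a minimal sketch of the standard IPW estimator of the ATE that
the CIPW family generalizes by coarsening the covariate space; the
data-dependent coarsening algorithm itself is not implemented, and the
simulated design is an illustrative assumption.

    import numpy as np

    def ipw_ate(y, treat, pscore):
        """Standard inverse propensity-score weighted estimate of the ATE."""
        treat = treat.astype(float)
        return (np.mean(treat * y / pscore)
                - np.mean((1.0 - treat) * y / (1.0 - pscore)))

    rng = np.random.default_rng(4)
    n = 50_000
    x = rng.normal(size=n)
    pscore = 1.0 / (1.0 + np.exp(-x))              # true propensity score
    treat = rng.uniform(size=n) < pscore
    y = 1.0 * treat + x + rng.normal(size=n)       # true ATE = 1
    print(ipw_ate(y, treat, pscore))               # accurate with true scores

    # With an additive error in the propensity score the weights are distorted;
    # the paper shows such errors can inflate confidence intervals arbitrarily
    # when scores are extreme, motivating the coarsened (CIPW) construction.
    print(ipw_ate(y, treat, np.clip(pscore + 0.05, 0.01, 0.99)))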

arXiv link: http://arxiv.org/abs/2410.01658v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2024-10-02

Transformers Handle Endogeneity in In-Context Linear Regression

Authors: Haodong Liang, Krishnakumar Balasubramanian, Lifeng Lai

We explore the capability of transformers to address endogeneity in
in-context linear regression. Our main finding is that transformers inherently
possess a mechanism to handle endogeneity effectively using instrumental
variables (IV). First, we demonstrate that the transformer architecture can
emulate a gradient-based bi-level optimization procedure that converges to the
widely used two-stage least squares (2SLS) solution at an
exponential rate. Next, we propose an in-context pretraining scheme and provide
theoretical guarantees showing that the global minimizer of the pre-training
loss achieves a small excess loss. Our extensive experiments validate these
theoretical findings, showing that the trained transformer provides more robust
and reliable in-context predictions and coefficient estimates than the
2SLS method in the presence of endogeneity.
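
For reference, a minimal textbook two-stage least squares computation of the
kind the transformer is shown to emulate; the simulated design (one endogenous
regressor, one instrument) is an illustrative assumption, and the in-context
pretraining scheme is not shown.

    import numpy as np

    def two_stage_least_squares(y, X, Z):
        """Textbook 2SLS: project X on Z, then regress y on the fitted X."""
        X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # first stage
        beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]    # second stage
        return beta

    rng = np.random.default_rng(5)
    n = 20_000
    z = rng.normal(size=(n, 1))                    # instrument
    u = rng.normal(size=n)                         # unobserved confounder
    x = z[:, 0] + u + rng.normal(size=n)           # endogenous regressor
    y = 2.0 * x + u + rng.normal(size=n)           # structural eq., beta = 2
    X = np.column_stack([np.ones(n), x])
    Z = np.column_stack([np.ones(n), z])
    print(np.linalg.lstsq(X, y, rcond=None)[0][1])  # OLS is biased upward
    print(two_stage_least_squares(y, X, Z)[1])      # 2SLS recovers about 2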

arXiv link: http://arxiv.org/abs/2410.01265v3

Econometrics arXiv paper, submitted: 2024-10-02

Forecasting short-term inflation in Argentina with Random Forest Models

Authors: Federico Daniel Forte

This paper examines the performance of Random Forest models in forecasting
short-term monthly inflation in Argentina, based on a database of monthly
indicators since 1962. It is found that these models achieve forecast accuracy
that is statistically comparable to the consensus of market analysts'
expectations surveyed by the Central Bank of Argentina (BCRA) and to
traditional econometric models. One advantage of Random Forest models is that,
as they are non-parametric, they allow for the exploration of nonlinear effects
in the predictive power of certain macroeconomic variables on inflation. Among
other findings, the relative importance of the exchange rate gap in forecasting
inflation increases when the gap between the parallel and official exchange
rates exceeds 60%. The predictive power of the exchange rate on inflation rises
when the BCRA's net international reserves are negative or close to zero
(specifically, below USD 2 billion). The relative importance of inflation
inertia and the nominal interest rate in forecasting the following month's
inflation increases when the nominal levels of inflation and/or interest rates
rise.

arXiv link: http://arxiv.org/abs/2410.01175v1

Econometrics arXiv updated paper (originally submitted: 2024-10-02)

Partially Identified Heterogeneous Treatment Effect with Selection: An Application to Gender Gaps

Authors: Xiaolin Sun, Xueyan Zhao, D. S. Poskitt

This paper addresses the sample selection model within the context of the
gender gap problem, where even random treatment assignment is affected by
selection bias. By offering a robust alternative free from distributional or
specification assumptions, we bound the treatment effect under the sample
selection model with an exclusion restriction, an assumption whose validity is
tested in the literature. This exclusion restriction allows for further
segmentation of the population into distinct types based on observed and
unobserved characteristics. For each type, we derive the proportions and bound
the gender gap accordingly. Notably, trends in type proportions and gender gap
bounds reveal an increasing proportion of always-working individuals over time,
alongside variations in bounds, including a general decline across time and
consistently higher bounds for those in high-potential wage groups. Further
analysis, considering additional assumptions, highlights persistent gender gaps
for some types, while other types exhibit differing or inconclusive trends.
This underscores the necessity of separating individuals by type to understand
the heterogeneous nature of the gender gap.

arXiv link: http://arxiv.org/abs/2410.01159v2

Econometrics arXiv paper, submitted: 2024-10-01

A Nonparametric Test of Heterogeneous Treatment Effects under Interference

Authors: Julius Owusu

Statistical inference of heterogeneous treatment effects (HTEs) across
predefined subgroups is challenging when units interact because treatment
effects may vary by pre-treatment variables, post-treatment exposure variables
(that measure the exposure to other units' treatment statuses), or both. Thus,
the conventional HTEs testing procedures may be invalid under interference. In
this paper, I develop statistical methods to infer HTEs and disentangle the
drivers of treatment effects heterogeneity in populations where units interact.
Specifically, I incorporate clustered interference into the potential outcomes
model and propose kernel-based test statistics for the null hypotheses of (i)
no HTEs by treatment assignment (or post-treatment exposure variables) for all
pre-treatment variables values and (ii) no HTEs by pre-treatment variables for
all treatment assignment vectors. I recommend a multiple-testing algorithm to
disentangle the source of heterogeneity in treatment effects. I prove the
asymptotic properties of the proposed test statistics. Finally, I illustrate
the application of the test procedures in an empirical setting using an
experimental data set from a Chinese weather insurance program.

arXiv link: http://arxiv.org/abs/2410.00733v1

Econometrics arXiv updated paper (originally submitted: 2024-09-30)

Inference for the Marginal Value of Public Funds

Authors: Vedant Vohra

Economists often estimate causal effects of policies on multiple outcomes and
summarize them into scalar measures of cost-effectiveness or welfare, such as
the Marginal Value of Public Funds (MVPF). In many settings, microdata
underlying these estimates are unavailable, leaving researchers with only
published estimates and their standard errors. We develop tools for valid
inference on functions of causal effects, such as the MVPF, when the
correlation structure is unknown. Our approach is to construct worst-case
confidence intervals, leveraging experimental designs to tighten them, and to
assess robustness using breakdown analyses. We illustrate our method with MVPFs
for eight policies.
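
Not the paper's construction, but a minimal sketch of the underlying idea in
the simplest case: a delta-method confidence interval for a ratio of two
published estimates (say, benefits over net cost) whose correlation is
unknown, taking the worst case over correlations in [-1, 1]. The ratio form,
the normal approximation, and the numbers are illustrative assumptions.

    import numpy as np
    from scipy import stats

    def worst_case_ratio_ci(a, se_a, b, se_b, level=0.95):
        """Worst-case delta-method CI for a/b when corr(a_hat, b_hat) is unknown."""
        point = a / b
        ga, gb = 1.0 / b, -a / b ** 2                 # gradient of the ratio
        # Variance g' Sigma g maximized over the correlation rho in [-1, 1]
        worst_var = ((ga * se_a) ** 2 + (gb * se_b) ** 2
                     + 2 * abs(ga * gb) * se_a * se_b)
        z = stats.norm.ppf(0.5 + level / 2)
        half = z * np.sqrt(worst_var)
        return point - half, point + half

    # Example: benefit estimate 3.0 (se 0.4), net-cost estimate 1.5 (se 0.3)
    print(worst_case_ratio_ci(3.0, 0.4, 1.5, 0.3))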

arXiv link: http://arxiv.org/abs/2410.00217v3

Econometrics arXiv updated paper (originally submitted: 2024-09-30)

New Tests of Equal Forecast Accuracy for Factor-Augmented Regressions with Weaker Loadings

Authors: Luca Margaritella, Ovidijus Stauskas

We provide the theoretical foundation for the recent tests of equal forecast
accuracy and encompassing by Pitarakis (2023) and Pitarakis (2025), when the
competing forecast specification is that of a factor-augmented regression
model. This should be of interest for practitioners, as there is no theory
justifying the use of these simple and powerful tests in such context. In
pursuit of this, we employ a novel theory to incorporate the empirically
well-documented fact of homogeneously/heterogeneously weak factor loadings, and
track their effect on the forecast comparison problem.

arXiv link: http://arxiv.org/abs/2409.20415v3

Econometrics arXiv paper, submitted: 2024-09-30

Synthetic Difference in Differences for Repeated Cross-Sectional Data

Authors: Yoann Morin

The synthetic difference-in-differences method provides an efficient way
to estimate a causal effect with a latent factor model. However, it relies on
the use of panel data. This paper presents an adaptation of the synthetic
difference-in-differences method for repeated cross-sectional data. The
treatment is considered to be at the group level so that it is possible to
aggregate data by group to compute the two types of synthetic
difference-in-differences weights on these aggregated data. Then, I develop and
compute a third type of weight that accounts for the different number of
observations in each cross-section. Simulation results show that the
performance of the synthetic difference-in-differences estimator is improved
when using the third type of weights on repeated cross-sectional data.

arXiv link: http://arxiv.org/abs/2409.20199v1

Econometrics arXiv paper, submitted: 2024-09-28

Factors in Fashion: Factor Analysis towards the Mode

Authors: Zhe Sun, Yundong Tu

The modal factor model represents a new factor model for dimension reduction
in high dimensional panel data. Unlike the approximate factor model that
targets for the mean factors, it captures factors that influence the
conditional mode of the distribution of the observables. Statistical inference
is developed with the aid of mode estimation, where the modal factors and the
loadings are estimated through maximizing a kernel-type objective function. An
easy-to-implement alternating maximization algorithm is designed to obtain the
estimators numerically. Two model selection criteria are further proposed to
determine the number of factors. The asymptotic properties of the proposed
estimators are established under some regularity conditions. Simulations
demonstrate the good finite-sample performance of our proposed estimators, even
in the presence of heavy-tailed and asymmetric idiosyncratic error
distributions. Finally, the application to inflation forecasting illustrates
the practical merits of modal factors.

arXiv link: http://arxiv.org/abs/2409.19287v1

Econometrics arXiv paper, submitted: 2024-09-24

Large Bayesian Tensor VARs with Stochastic Volatility

Authors: Joshua C. C. Chan, Yaling Qi

We consider Bayesian tensor vector autoregressions (TVARs) in which the VAR
coefficients are arranged as a three-dimensional array or tensor, and this
coefficient tensor is parameterized using a low-rank CP decomposition. We
develop a family of TVARs using a general stochastic volatility specification,
which includes a wide variety of commonly-used multivariate stochastic
volatility and COVID-19 outlier-augmented models. In a forecasting exercise
involving 40 US quarterly variables, we show that these TVARs outperform the
standard Bayesian VAR with the Minnesota prior. The results also suggest that
the parsimonious common stochastic volatility model tends to forecast better
than the more flexible Cholesky stochastic volatility model.
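
A minimal numerical illustration of the parameterization only: assembling a
three-way array of VAR coefficients from rank-R CP factors and counting the
resulting free parameters. The dimensions and the rank are illustrative
assumptions, and the Bayesian estimation with stochastic volatility is not
shown.

    import numpy as np

    def cp_coefficient_tensor(U, V, W):
        """Coefficient tensor A[i, j, p] = sum_r U[i, r] V[j, r] W[p, r]."""
        return np.einsum("ir,jr,pr->ijp", U, V, W)

    rng = np.random.default_rng(6)
    n, p, R = 40, 4, 3                      # 40 variables, 4 lags, CP rank 3
    U = rng.standard_normal((n, R))         # loadings over equations
    V = rng.standard_normal((n, R))         # loadings over lagged variables
    W = rng.standard_normal((p, R))         # loadings over lags
    A = cp_coefficient_tensor(U, V, W)      # n x n x p array of VAR coefficients

    # Parsimony: the CP form uses R * (2n + p) free numbers instead of n * n * p
    print(A.shape, R * (2 * n + p), n * n * p)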

arXiv link: http://arxiv.org/abs/2409.16132v1

Econometrics arXiv paper, submitted: 2024-09-23

Identifying Elasticities in Autocorrelated Time Series Using Causal Graphs

Authors: Silvana Tiedemann, Jorge Sanchez Canales, Felix Schur, Raffaele Sgarlato, Lion Hirth, Oliver Ruhnau, Jonas Peters

The price elasticity of demand can be estimated from observational data using
instrumental variables (IV). However, naive IV estimators may be inconsistent
in settings with autocorrelated time series. We argue that causal time graphs
can simplify IV identification and help select consistent estimators. To do so,
we propose to first model the equilibrium condition by an unobserved
confounder, deriving a directed acyclic graph (DAG) while maintaining the
assumption of a simultaneous determination of prices and quantities. We then
exploit recent advances in graphical inference to derive valid IV estimators,
including estimators that achieve consistency by simultaneously estimating
nuisance effects. We further argue that observing significant differences
between the estimates of presumably valid estimators can help to reject false
model assumptions, thereby improving our understanding of underlying economic
dynamics. We apply this approach to the German electricity market, estimating
the price elasticity of demand on simulated and real-world data. The findings
underscore the importance of accounting for structural autocorrelation in
IV-based analysis.

arXiv link: http://arxiv.org/abs/2409.15530v1

Econometrics arXiv updated paper (originally submitted: 2024-09-23)

Non-linear dependence and Granger causality: A vine copula approach

Authors: Roberto Fuentes-Martínez, Irene Crimaldi, Armando Rungi

Inspired by Jang et al. (2022), we propose a Granger causality-in-the-mean
test for bivariate $k$-Markov stationary processes based on a recently
introduced class of non-linear models, i.e., vine copula models. By means of a
simulation study, we show that the proposed test improves on the statistical
properties of the original test in Jang et al. (2022), and also of other
previous methods, constituting an excellent tool for testing Granger causality
in the presence of non-linear dependence structures. Finally, we apply our test
to study the pairwise relationships between energy consumption, GDP and
investment in the U.S. and, notably, we find that Granger-causality runs two
ways between GDP and energy consumption.

arXiv link: http://arxiv.org/abs/2409.15070v2

Econometrics arXiv updated paper (originally submitted: 2024-09-23)

Inequality Sensitive Optimal Treatment Assignment

Authors: Eduardo Zambrano

The egalitarian equivalent, $ee$, of a societal distribution of outcomes with
mean $m$ is the outcome level such that the evaluator is indifferent between
the distribution of outcomes and a society in which everyone obtains an outcome
of $ee$. For an inequality averse evaluator, $ee < m$. In this paper, I extend
the optimal treatment choice framework in Manski (2024) to the case where the
welfare evaluation is made using egalitarian equivalent measures, and derive
optimal treatment rules for the Bayesian, maximin and minimax regret inequality
averse evaluators. I illustrate how the methodology operates in the context of
the JobCorps education and training program for disadvantaged youth (Schochet,
Burghardt, and McConnell 2008) and in Meager (2022)'s Bayesian meta-analysis of
the microcredit literature.
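
A minimal sketch of the egalitarian-equivalent computation itself, using a
CRRA (Atkinson-type) welfare function as one illustrative choice of
inequality-averse evaluator; the treatment-assignment rules derived in the
paper are not implemented, and outcomes are assumed strictly positive.

    import numpy as np

    def egalitarian_equivalent(outcomes, eta):
        """Outcome level ee such that giving ee to everyone is welfare-equivalent
        to the observed distribution, for a CRRA evaluator with aversion eta >= 0.
        Requires strictly positive outcomes."""
        y = np.asarray(outcomes, dtype=float)
        if eta == 1.0:
            return np.exp(np.mean(np.log(y)))                 # geometric mean
        return np.mean(y ** (1 - eta)) ** (1.0 / (1 - eta))

    y = np.array([10.0, 20.0, 30.0, 100.0])
    print(y.mean())                          # mean outcome m
    print(egalitarian_equivalent(y, 0.0))    # eta = 0: no aversion, ee = m
    print(egalitarian_equivalent(y, 2.0))    # inequality averse: ee < m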

arXiv link: http://arxiv.org/abs/2409.14776v2

Econometrics arXiv cross-link from math.PR (math.PR), submitted: 2024-09-23

The continuous-time limit of quasi score-driven volatility models

Authors: Yinhao Wu, Ping He

This paper explores the continuous-time limit of a class of Quasi
Score-Driven (QSD) models that characterize volatility. As the sampling
frequency increases and the time interval tends to zero, the model weakly
converges to a continuous-time stochastic volatility model where the two
Brownian motions are correlated, thereby capturing the leverage effect in the
market. Subsequently, we identify that a necessary condition for non-degenerate
correlation is that the distribution of the driving innovations differs from
the distribution used to compute the score, with at least one of the two being
asymmetric. We then illustrate this
with two typical examples. As an application, the QSD model is used as an
approximation for correlated stochastic volatility diffusions and quasi maximum
likelihood estimation is performed. Simulation results confirm the method's
effectiveness, particularly in estimating the correlation coefficient.

arXiv link: http://arxiv.org/abs/2409.14734v2

Econometrics arXiv updated paper (originally submitted: 2024-09-21)

Mining Causality: AI-Assisted Search for Instrumental Variables

Authors: Sukjin Han

The instrumental variables (IVs) method is a leading empirical strategy for
causal inference. Finding IVs is a heuristic and creative process, and
justifying its validity -- especially exclusion restrictions -- is largely
rhetorical. We propose using large language models (LLMs) to search for new IVs
through narratives and counterfactual reasoning, similar to how a human
researcher would. The stark difference, however, is that LLMs can dramatically
accelerate this process and explore an extremely large search space. We
demonstrate how to construct prompts to search for potentially valid IVs. We
contend that multi-step and role-playing prompting strategies are effective for
simulating the endogenous decision-making processes of economic agents and for
navigating language models through the realm of real-world scenarios, rather
than anchoring them within the narrow realm of academic discourses on IVs. We
apply our method to three well-known examples in economics: returns to
schooling, supply and demand, and peer effects. We then extend our strategy to
finding (i) control variables in regression and difference-in-differences and
(ii) running variables in regression discontinuity designs.

arXiv link: http://arxiv.org/abs/2409.14202v3

Econometrics arXiv paper, submitted: 2024-09-20

A simple but powerful tail index regression

Authors: João Nicolau, Paulo M. M. Rodrigues

This paper introduces a flexible framework for the estimation of the
conditional tail index of heavy tailed distributions. In this framework, the
tail index is computed from an auxiliary linear regression model that
facilitates estimation and inference based on established econometric methods,
such as ordinary least squares (OLS), least absolute deviations, or
M-estimation. We show theoretically and via simulations that OLS provides
interesting results. Our Monte Carlo results highlight the adequate finite
sample properties of the OLS tail index estimator computed from the proposed
new framework and contrast its behavior to that of tail index estimates
obtained by maximum likelihood estimation of exponential regression models,
which is one of the approaches currently in use in the literature. An empirical
analysis of the impact of determinants of the conditional left- and right-tail
indexes of commodities' return distributions highlights the empirical relevance
of our proposed approach. The novel framework's flexibility allows for
extensions and generalizations in various directions, empowering researchers
and practitioners to straightforwardly explore a wide range of research
questions.

arXiv link: http://arxiv.org/abs/2409.13531v1

Econometrics arXiv paper, submitted: 2024-09-20

Dynamic tail risk forecasting: what do realized skewness and kurtosis add?

Authors: Giampiero Gallo, Ostap Okhrin, Giuseppe Storti

This paper compares the accuracy of tail risk forecasts with a focus on
including realized skewness and kurtosis in "additive" and "multiplicative"
models. Utilizing a panel of 960 US stocks, we conduct diagnostic tests, employ
scoring functions, and implement rolling window forecasting to evaluate the
performance of Value at Risk (VaR) and Expected Shortfall (ES) forecasts.
Additionally, we examine the impact of the window length on forecast accuracy.
We propose model specifications that incorporate realized skewness and kurtosis
for enhanced precision. Our findings provide insights into the importance of
considering skewness and kurtosis in tail risk modeling, contributing to the
existing literature and offering practical implications for risk practitioners
and researchers.

arXiv link: http://arxiv.org/abs/2409.13516v1

Econometrics arXiv paper, submitted: 2024-09-19

Testing for equal predictive accuracy with strong dependence

Authors: Laura Coroneo, Fabrizio Iacone

We analyse the properties of the Diebold and Mariano (1995) test in the
presence of autocorrelation in the loss differential. We show that the power of
the Diebold and Mariano (1995) test decreases as the dependence increases,
making it more difficult to obtain statistically significant evidence of
superior predictive ability against less accurate benchmarks. We also find
that, after a certain threshold, the test has no power and the correct null
hypothesis is spuriously rejected. Taken together, these results are a caution
to consider seriously the dependence properties of the loss differential before
applying the Diebold and Mariano (1995) test.
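
For reference, a minimal implementation of the Diebold and Mariano (1995)
statistic with a Bartlett-kernel (Newey-West) long-run variance, the object
whose behavior under strong dependence the paper analyzes; the truncation-lag
rule of thumb below is an illustrative assumption.

    import numpy as np
    from scipy import stats

    def diebold_mariano(loss1, loss2, max_lag=None):
        """DM test of equal predictive accuracy based on the loss differential."""
        d = np.asarray(loss1, dtype=float) - np.asarray(loss2, dtype=float)
        n = d.size
        if max_lag is None:
            max_lag = int(np.floor(4 * (n / 100.0) ** (2.0 / 9.0)))  # rule of thumb
        d_c = d - d.mean()
        # Newey-West (Bartlett kernel) estimate of the long-run variance of d
        lrv = np.mean(d_c ** 2)
        for k in range(1, max_lag + 1):
            w = 1.0 - k / (max_lag + 1.0)
            lrv += 2.0 * w * np.mean(d_c[k:] * d_c[:-k])
        dm = d.mean() / np.sqrt(lrv / n)
        return dm, 2 * (1 - stats.norm.cdf(abs(dm)))   # statistic, two-sided p

    rng = np.random.default_rng(7)
    e1 = rng.standard_normal(300)              # forecast errors of model 1
    e2 = 1.2 * rng.standard_normal(300)        # model 2 is noisier
    print(diebold_mariano(e1 ** 2, e2 ** 2))   # squared-error losses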

arXiv link: http://arxiv.org/abs/2409.12662v1

Econometrics arXiv paper, submitted: 2024-09-19

Parameters on the boundary in predictive regression

Authors: Giuseppe Cavaliere, Iliyan Georgiev, Edoardo Zanelli

We consider bootstrap inference in predictive (or Granger-causality)
regressions when the parameter of interest may lie on the boundary of the
parameter space, here defined by means of a smooth inequality constraint. For
instance, this situation occurs when the definition of the parameter space
allows for the cases of either no predictability or sign-restricted
predictability. We show that in this context constrained estimation gives rise
to bootstrap statistics whose limit distribution is, in general, random, and
thus distinct from the limit null distribution of the original statistics of
interest. This is due to both (i) the possible location of the true parameter
vector on the boundary of the parameter space, and (ii) the possible
non-stationarity of the posited predicting (resp. Granger-causing) variable. We
discuss a modification of the standard fixed-regressor wild bootstrap scheme
where the bootstrap parameter space is shifted by a data-dependent function in
order to eliminate the portion of limiting bootstrap randomness attributable to
the boundary, and prove validity of the associated bootstrap inference under
non-stationarity of the predicting variable as the only remaining source of
limiting bootstrap randomness. Our approach, which is initially presented in a
simple location model, has bearing on inference in parameter-on-the-boundary
situations beyond the predictive regression problem.

arXiv link: http://arxiv.org/abs/2409.12611v1

Econometrics arXiv paper, submitted: 2024-09-19

Robust Bond Risk Premia Predictability Test in the Quantiles

Authors: Xiaosai Liao, Xinjue Li, Qingliang Fan

Different from existing literature on testing the macro-spanning hypothesis
of bond risk premia, which only considers mean regressions, this paper
investigates whether the yield curve represented by CP factor (Cochrane and
Piazzesi, 2005) contains all available information about future bond returns in
a predictive quantile regression with many other macroeconomic variables. In
this study, we introduce the Trend in Debt Holding (TDH) as a novel predictor,
testing it alongside established macro indicators such as Trend Inflation (TI)
(Cieslak and Povala, 2015), and macro factors from Ludvigson and Ng (2009). A
significant challenge in this study is the invalidity of traditional quantile
model inference approaches, given the high persistence of many macro variables
involved. Furthermore, the existing methods addressing this issue do not
perform well in the marginal test with many highly persistent predictors. Thus,
we suggest a robust inference approach, whose size and power performance are
shown to be better than existing tests. Using data from 1980-2022, the
macro-spanning hypothesis is strongly supported at center quantiles by the
empirical finding that the CP factor has predictive power while all other macro
variables have negligible predictive power in this case. On the other hand, the
evidence against the macro-spanning hypothesis is found at tail quantiles, in
which TDH has predictive power at right tail quantiles while TI has predictive
power at both tail quantiles. Finally, we show that the in-sample and
out-of-sample predictive performance of the proposed method is better than
that of existing methods.

arXiv link: http://arxiv.org/abs/2410.03557v1

Econometrics arXiv updated paper (originally submitted: 2024-09-18)

A Way to Synthetic Triple Difference

Authors: Castiel Chen Zhuang

This paper discusses a practical approach that combines synthetic control
with triple difference to address violations of the parallel trends assumption.
By transforming triple difference into a DID structure, we can apply synthetic
control to a triple-difference framework, enabling more robust estimates when
parallel trends are violated across multiple dimensions. The proposed procedure
is applied to a real-world dataset to illustrate when and how we should apply
this practice, and the relevant caveats are discussed. This method contributes
to improving causal inference in policy evaluations and offers a valuable tool
for researchers dealing with heterogeneous treatment effects across subgroups.

arXiv link: http://arxiv.org/abs/2409.12353v2

Econometrics arXiv paper, submitted: 2024-09-17

Simple robust two-stage estimation and inference for generalized impulse responses and multi-horizon causality

Authors: Jean-Marie Dufour, Endong Wang

This paper introduces a novel two-stage estimation and inference procedure
for generalized impulse responses (GIRs). GIRs encompass all coefficients in a
multi-horizon linear projection model of future outcomes of y on lagged values
(Dufour and Renault, 1998), which include the Sims' impulse response. The
conventional use of Least Squares (LS) with heteroskedasticity- and
autocorrelation-consistent covariance estimation is less precise and often
results in unreliable finite sample tests, further complicated by the selection
of bandwidth and kernel functions. Our two-stage method surpasses the LS
approach in terms of estimation efficiency and inference robustness. The
robustness stems from our proposed covariance matrix estimates, which eliminate
the need to correct for serial correlation in the multi-horizon projection
residuals. Our method accommodates non-stationary data and allows the
projection horizon to grow with sample size. Monte Carlo simulations
demonstrate that our two-stage method outperforms the LS method. We apply the
two-stage method to investigate the GIRs, implement a multi-horizon Granger
causality test, and find that economic uncertainty exerts both short-run (1-3
months) and long-run (30 months) effects on economic activities.

arXiv link: http://arxiv.org/abs/2409.10820v1

Econometrics arXiv paper, submitted: 2024-09-16

GPT takes the SAT: Tracing changes in Test Difficulty and Math Performance of Students

Authors: Vikram Krishnaveti, Saannidhya Rawat

The Scholastic Aptitude Test (SAT) is crucial for college admissions, but its
effectiveness and relevance are increasingly questioned. This paper enhances
Synthetic Control methods by introducing "Transformed Control", a novel method
that employs Large Language Models (LLMs) powered by Artificial Intelligence to
generate control groups. We utilize OpenAI's API to generate a control group
where GPT-4, or ChatGPT, takes multiple SATs annually from 2008 to 2023. This
control group helps analyze shifts in SAT math difficulty over time, starting
from the baseline year of 2008. Using parallel trends, we calculate the Average
Difference in Scores (ADS) to assess changes in high school students' math
performance. Our results indicate a significant decrease in the difficulty of
the SAT math section over time, alongside a decline in students' math
performance. The analysis shows a 71-point drop in the rigor of SAT math from
2008 to 2023, with student performance decreasing by 36 points, resulting in a
107-point total divergence in average student math performance. We investigate
possible mechanisms for this decline in math proficiency, such as changing
university selection criteria, increased screen time, grade inflation, and
worsening adolescent mental health. Disparities among demographic groups show a
104-point drop for White students, 84 points for Black students, and 53 points
for Asian students. Male students saw a 117-point reduction, while female
students had a 100-point decrease.

arXiv link: http://arxiv.org/abs/2409.10750v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-09-16

Why you should also use OLS estimation of tail exponents

Authors: Thiago Trafane Oliveira Santos, Daniel Oliveira Cajueiro

Even though practitioners often estimate Pareto exponents running OLS
rank-size regressions, the usual recommendation is to use the Hill MLE with a
small-sample correction instead, due to its unbiasedness and efficiency. In
this paper, we advocate that you should also apply OLS in empirical
applications. On the one hand, we demonstrate that, with a small-sample
correction, the OLS estimator is also unbiased. On the other hand, we show that
the MLE assigns significantly greater weight to smaller observations. This
suggests that the OLS estimator may outperform the MLE in cases where the
distribution is (i) strictly Pareto but only in the upper tail or (ii)
regularly varying rather than strictly Pareto. We substantiate our theoretical
findings with Monte Carlo simulations and real-world applications,
demonstrating the practical relevance of the OLS method in estimating tail
exponents.
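
For concreteness, minimal versions of the two estimators being compared: the
Hill MLE and the OLS log rank-size regression. The rank-minus-one-half shift
used below is the Gabaix-Ibragimov small-sample correction, shown here as an
illustrative stand-in for the corrections discussed in the paper; the choice
of k is also illustrative.

    import numpy as np

    def hill_estimator(x, k):
        """Hill MLE of the Pareto tail exponent from the k largest observations."""
        x_sorted = np.sort(x)[::-1]
        logs = np.log(x_sorted[:k]) - np.log(x_sorted[k])
        return 1.0 / logs.mean()

    def ols_rank_size(x, k):
        """OLS log rank-size regression: log(rank - 1/2) on log(size)."""
        x_sorted = np.sort(x)[::-1][:k]
        ranks = np.arange(1, k + 1)
        slope = np.polyfit(np.log(x_sorted), np.log(ranks - 0.5), 1)[0]
        return -slope                          # tail exponent estimate

    rng = np.random.default_rng(8)
    alpha = 1.5
    # Pareto(alpha) draws via inverse-CDF sampling
    x = (1.0 / rng.uniform(size=20_000)) ** (1.0 / alpha)
    print(hill_estimator(x, k=2_000), ols_rank_size(x, k=2_000))  # both near 1.5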

arXiv link: http://arxiv.org/abs/2409.10448v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-09-16

Econometric Inference for High Dimensional Predictive Regressions

Authors: Zhan Gao, Ji Hyung Lee, Ziwei Mei, Zhentao Shi

LASSO introduces shrinkage bias into estimated coefficients, which can
adversely affect the desirable asymptotic normality and invalidate the standard
inferential procedure based on the $t$-statistic. The desparsified LASSO has
emerged as a well-known remedy for this issue. In the context of high
dimensional predictive regression, the desparsified LASSO faces an additional
challenge: the Stambaugh bias arising from nonstationary regressors. To restore
the standard inferential procedure, we propose a novel estimator called
IVX-desparsified LASSO (XDlasso). XDlasso eliminates the shrinkage bias and the
Stambaugh bias simultaneously and does not require prior knowledge about the
identities of nonstationary and stationary regressors. We establish the
asymptotic properties of XDlasso for hypothesis testing, and our theoretical
findings are supported by Monte Carlo simulations. Applying our method to
real-world applications from the FRED-MD database -- which includes a rich set
of control variables -- we investigate two important empirical questions: (i)
the predictability of the U.S. stock returns based on the earnings-price ratio,
and (ii) the predictability of the U.S. inflation using the unemployment rate.

arXiv link: http://arxiv.org/abs/2409.10030v2

Econometrics arXiv paper, submitted: 2024-09-16

A Simple and Adaptive Confidence Interval when Nuisance Parameters Satisfy an Inequality

Authors: Gregory Fletcher Cox

Inequalities may appear in many models. They can be as simple as assuming a
parameter is nonnegative, possibly a regression coefficient or a treatment
effect. This paper focuses on the case that there is only one inequality and
proposes a confidence interval that is particularly attractive, called the
inequality-imposed confidence interval (IICI). The IICI is simple. It does not
require simulations or tuning parameters. The IICI is adaptive. It reduces to
the usual confidence interval (calculated by adding and subtracting the
standard error times the $1 - \alpha/2$ standard normal quantile) when the
inequality is sufficiently slack. When the inequality is sufficiently violated,
the IICI reduces to an equality-imposed confidence interval (the usual
confidence interval for the submodel where the inequality holds with equality).
Also, the IICI is uniformly valid and has (weakly) shorter length than the
usual confidence interval; it is never longer. The first empirical application
considers a linear regression when a coefficient is known to be nonpositive. A
second empirical application considers an instrumental variables regression
when the endogeneity of a regressor is known to be nonnegative.

arXiv link: http://arxiv.org/abs/2409.09962v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2024-09-15

Estimating Wage Disparities Using Foundation Models

Authors: Keyon Vafa, Susan Athey, David M. Blei

The rise of foundation models marks a paradigm shift in machine learning:
instead of training specialized models from scratch, foundation models are
first trained on massive datasets before being adapted or fine-tuned to make
predictions on smaller datasets. Initially developed for text, foundation
models have also excelled at making predictions about social science data.
However, while many estimation problems in the social sciences use prediction
as an intermediate step, they ultimately require different criteria for
success. In this paper, we develop methods for fine-tuning foundation models to
perform these estimation problems. We first characterize an omitted variable
bias that can arise when a foundation model is only fine-tuned to maximize
predictive accuracy. We then provide a novel set of conditions for fine-tuning
under which estimates derived from a foundation model are root-n-consistent.
Based on this theory, we develop new fine-tuning algorithms that empirically
mitigate this omitted variable bias. To demonstrate our ideas, we study gender
wage decomposition. This is a statistical estimation problem from econometrics
where the goal is to decompose the gender wage gap into components that can and
cannot be explained by career histories of workers. Classical methods for
decomposing the wage gap employ simple predictive models of wages which
condition on coarse summaries of career history that may omit factors that are
important for explaining the gap. Instead, we use a custom-built foundation
model to decompose the gender wage gap, which captures a richer representation
of career history. Using data from the Panel Study of Income Dynamics, we find
that career history explains more of the gender wage gap than standard
econometric models can measure, and we identify elements of career history that
are omitted by standard models but are important for explaining the wage gap.

arXiv link: http://arxiv.org/abs/2409.09894v2

Econometrics arXiv paper, submitted: 2024-09-15

Structural counterfactual analysis in macroeconomics: theory and inference

Authors: Endong Wang

We propose a structural model-free methodology to analyze two types of
macroeconomic counterfactuals related to policy path deviation: hypothetical
trajectory and policy intervention. Our model-free approach is built on a
structural vector moving-average (SVMA) model that relies solely on the
identification of policy shocks, thereby eliminating the need to specify an
entire structural model. Analytical solutions are derived for the
counterfactual parameters, and statistical inference for these parameter
estimates is provided using the Delta method. By utilizing external
instruments, we introduce a projection-based method for the identification,
estimation, and inference of these parameters. This approach connects our
counterfactual analysis with the Local Projection literature. A
simulation-based approach with a nonlinear model is provided to aid in
addressing the Lucas critique. The innovative model-free methodology is applied
in three counterfactual studies of U.S. monetary policy: (1) a historical
scenario
analysis for a hypothetical interest rate path in the post-pandemic era, (2) a
future scenario analysis under either hawkish or dovish interest rate policy,
and (3) an evaluation of the policy intervention effect of an oil price shock
by zeroing out the systematic responses of the interest rate.

arXiv link: http://arxiv.org/abs/2409.09577v1

Econometrics arXiv updated paper (originally submitted: 2024-09-14)

Unconditional Randomization Tests for Interference

Authors: Liang Zhong

Researchers are often interested in the existence and extent of interference
between units when conducting causal inference or designing policy. However,
testing for interference presents significant econometric challenges,
particularly due to complex clustering patterns and dependencies that can
invalidate standard methods. This paper introduces the pairwise
imputation-based randomization test (PIRT), a general and robust framework for
assessing the existence and extent of interference in experimental settings.
PIRT employs unconditional randomization testing and pairwise comparisons,
enabling straightforward implementation and ensuring finite-sample validity
under minimal assumptions about network structure. The method's practical value
is demonstrated through an application to a large-scale policing experiment in
Bogota, Colombia (Blattman et al., 2021), which evaluates the effects of
hotspot policing on crime at the street segment level. The analysis reveals
that increased police patrolling in hotspots significantly displaces violent
crime, but not property crime. Simulations calibrated to this context further
underscore the power and robustness of PIRT.

arXiv link: http://arxiv.org/abs/2409.09243v3

Econometrics arXiv paper, submitted: 2024-09-13

The Clustered Dose-Response Function Estimator for continuous treatment with heterogeneous treatment effects

Authors: Cerqua Augusto, Di Stefano Roberta, Mattera Raffaele

Many treatments are non-randomly assigned, continuous in nature, and exhibit
heterogeneous effects even at identical treatment intensities. Taken together,
these characteristics pose significant challenges for identifying causal
effects, as no existing estimator can provide an unbiased estimate of the
average causal dose-response function. To address this gap, we introduce the
Clustered Dose-Response Function (Cl-DRF), a novel estimator designed to
discern the continuous causal relationships between treatment intensity and the
dependent variable across different subgroups. This approach leverages both
theoretical and data-driven sources of heterogeneity and operates under relaxed
versions of the conditional independence and positivity assumptions, which are
required to be met only within each identified subgroup. To demonstrate the
capabilities of the Cl-DRF estimator, we present both simulation evidence and
an empirical application examining the impact of European Cohesion funds on
economic growth.

arXiv link: http://arxiv.org/abs/2409.08773v1

Econometrics arXiv cross-link from General Economics (econ.GN), submitted: 2024-09-12

Machine Learning and Econometric Approaches to Fiscal Policies: Understanding Industrial Investment Dynamics in Uruguay (1974-2010)

Authors: Diego Vallarino

This paper examines the impact of fiscal incentives on industrial investment
in Uruguay from 1974 to 2010. Using a mixed-method approach that combines
econometric models with machine learning techniques, the study investigates
both the short-term and long-term effects of fiscal benefits on industrial
investment. The results confirm the significant role of fiscal incentives in
driving long-term industrial growth, while also highlighting the importance of
a stable macroeconomic environment, public investment, and access to credit.
Machine learning models provide additional insights into nonlinear interactions
between fiscal benefits and other macroeconomic factors, such as exchange
rates, emphasizing the need for tailored fiscal policies. The findings have
important policy implications, suggesting that fiscal incentives, when combined
with broader economic reforms, can effectively promote industrial development
in emerging economies.

arXiv link: http://arxiv.org/abs/2410.00002v1

Econometrics arXiv updated paper (originally submitted: 2024-09-12)

Bayesian Dynamic Factor Models for High-dimensional Matrix-valued Time Series

Authors: Wei Zhang

We introduce a class of Bayesian matrix dynamic factor models that
accommodates time-varying volatility, outliers, and cross-sectional correlation
in the idiosyncratic components. For model comparison, we employ an
importance-sampling estimator of the marginal likelihood based on the
cross-entropy method to determine: (1) the optimal dimension of the factor
matrix; (2) whether a vector- or matrix-valued structure is more suitable; and
(3) whether an approximate or exact factor model is favored by the data.
Through a series of Monte Carlo experiments, we demonstrate the accuracy of the
factor estimates and the effectiveness of the marginal likelihood estimator in
correctly identifying the true model. Applications to macroeconomic and
financial datasets illustrate the model's ability to capture key features in
matrix-valued time series.

arXiv link: http://arxiv.org/abs/2409.08354v3

Econometrics arXiv updated paper (originally submitted: 2024-09-12)

Sensitivity analysis of the perturbed utility stochastic traffic equilibrium

Authors: Mogens Fosgerau, Nikolaj Nielsen, Mads Paulsen, Thomas Kjær Rasmussen, Rui Yao

This paper develops a novel sensitivity analysis framework for the perturbed
utility route choice (PURC) model and the accompanying stochastic traffic
equilibrium model. We provide general results that determine the marginal
change in link flows following a marginal change in link costs across the
network in the cases of flow-independent and flow-dependent link costs. We
derive analytical sensitivity expressions for the Jacobian of the individual
optimal PURC flow and equilibrium link flows with respect to link cost
parameters under mild differentiability assumptions. Numerical examples
illustrate the robustness of our method, demonstrating its use for estimating
equilibrium link flows after link cost shifts, identifying critical design
parameters, and quantifying uncertainty in performance predictions. The
findings have implications for network design, pricing strategies, and policy
analysis in transportation planning and economics, providing a bridge between
theoretical models and real-world applications.

arXiv link: http://arxiv.org/abs/2409.08347v2

Econometrics arXiv paper, submitted: 2024-09-12

Trends and biases in the social cost of carbon

Authors: Richard S. J. Tol

An updated and extended meta-analysis confirms that the central estimate of
the social cost of carbon is around $200/tC with a large, right-skewed
uncertainty and trending up. The pure rate of time preference and the inverse
of the elasticity of intertemporal substitution are key assumptions, the total
impact of 2.5K warming less so. The social cost of carbon is much higher if
climate change is assumed to affect economic growth rather than the level of
output and welfare. The literature is dominated by a relatively small network
of authors, based in a few countries. Publication and citation bias have pushed
the social cost of carbon up.

arXiv link: http://arxiv.org/abs/2409.08158v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-09-12

Bootstrap Adaptive Lasso Solution Path Unit Root Tests

Authors: Martin C. Arnold, Thilo Reinschlüssel

We propose sieve wild bootstrap analogues to the adaptive Lasso solution path
unit root tests of Arnold and Reinschlüssel (2024) arXiv:2404.06205 to
improve finite sample properties and extend their applicability to a
generalised framework, allowing for non-stationary volatility. Numerical
evidence shows the bootstrap to improve the tests' precision for error
processes that promote spurious rejections of the unit root null, depending on
the detrending procedure. The bootstrap mitigates finite-sample size
distortions and restores asymptotically valid inference when the data features
time-varying unconditional variance. We apply the bootstrap tests to real
residential property prices of the top six Eurozone economies and find evidence
of stationarity to be period-specific, supporting the conjecture that
exuberance in the housing market characterises the development of Euro-era
residential property prices in the recent past.

arXiv link: http://arxiv.org/abs/2409.07859v1

Econometrics arXiv paper, submitted: 2024-09-11

Testing for a Forecast Accuracy Breakdown under Long Memory

Authors: Jannik Kreye, Philipp Sibbertsen

We propose a test to detect a forecast accuracy breakdown in a long memory
time series and provide theoretical and simulation evidence on the memory
transfer from the time series to the forecast residuals. The proposed method
uses a double sup-Wald test against the alternative of a structural break in
the mean of an out-of-sample loss series. To address the problem of estimating
the long-run variance under long memory, a robust estimator is applied. The
corresponding breakpoint results from a long memory robust CUSUM test. The
finite sample size and power properties of the test are derived in a Monte
Carlo simulation. A monotonic power function is obtained for the fixed
forecasting scheme. In our practical application, we find that the global
energy crisis that began in 2021 led to a forecast break in European
electricity prices, while the results for the U.S. are mixed.

arXiv link: http://arxiv.org/abs/2409.07087v1

Econometrics arXiv paper, submitted: 2024-09-10

Estimation and Inference for Causal Functions with Multiway Clustered Data

Authors: Nan Liu, Yanbo Liu, Yuya Sasaki

This paper proposes methods of estimation and uniform inference for a general
class of causal functions, such as the conditional average treatment effects
and the continuous treatment effects, under multiway clustering. The causal
function is identified as a conditional expectation of an adjusted
(Neyman-orthogonal) signal that depends on high-dimensional nuisance
parameters. We propose a two-step procedure where the first step uses machine
learning to estimate the high-dimensional nuisance parameters. The second step
projects the estimated Neyman-orthogonal signal onto a dictionary of basis
functions whose dimension grows with the sample size. For this two-step
procedure, we propose both the full-sample and the multiway cross-fitting
estimation approaches. A functional limit theory is derived for these
estimators. To construct the uniform confidence bands, we develop a novel
resampling procedure, called the multiway cluster-robust sieve score bootstrap,
that extends the sieve score bootstrap (Chen and Christensen, 2018) to the
novel setting with multiway clustering. Extensive numerical simulations
showcase that our methods achieve desirable finite-sample behaviors. We apply
the proposed methods to analyze the causal relationship between mistrust levels
in Africa and the historical slave trade. Our analysis rejects the null
hypothesis of uniformly zero effects and reveals heterogeneous treatment
effects, with significant impacts at higher levels of trade volumes.
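
As a rough illustration of the second step only, the snippet below projects an
already-computed Neyman-orthogonal signal (a stand-in for the cross-fitted
AIPW score from the first step) onto a polynomial dictionary of a scalar
covariate. The variance shown is the usual heteroskedasticity-robust one; the
multiway cluster-robust sieve score bootstrap used for uniform bands in the
paper is not reproduced.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 2000
    v = rng.uniform(-1, 1, n)
    psi = np.sin(np.pi * v) + rng.standard_normal(n)        # stand-in orthogonal signal

    K = 8                                                    # basis dimension (grows with n)
    B = np.column_stack([v**k for k in range(K)])            # polynomial dictionary
    theta, *_ = np.linalg.lstsq(B, psi, rcond=None)          # series projection

    grid = np.linspace(-1, 1, 9)
    Bg = np.column_stack([grid**k for k in range(K)])
    e = psi - B @ theta
    BtB_inv = np.linalg.inv(B.T @ B)
    V = BtB_inv @ (B.T * e**2) @ B @ BtB_inv                 # robust (non-clustered) variance
    se = np.sqrt(np.einsum("ij,jk,ik->i", Bg, V, Bg))
    print(np.round(Bg @ theta, 2))                           # estimated causal function on a grid
    print(np.round(se, 2))                                   # pointwise standard errors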

arXiv link: http://arxiv.org/abs/2409.06654v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2024-09-09

Enhancing Preference-based Linear Bandits via Human Response Time

Authors: Shen Li, Yuyang Zhang, Zhaolin Ren, Claire Liang, Na Li, Julie A. Shah

Interactive preference learning systems infer human preferences by presenting
queries as pairs of options and collecting binary choices. Although binary
choices are simple and widely used, they provide limited information about
preference strength. To address this, we leverage human response times, which
are inversely related to preference strength, as an additional signal. We
propose a computationally efficient method that combines choices and response
times to estimate human utility functions, grounded in the EZ diffusion model
from psychology. Theoretical and empirical analyses show that for queries with
strong preferences, response times complement choices by providing extra
information about preference strength, leading to significantly improved
utility estimation. We incorporate this estimator into preference-based linear
bandits for fixed-budget best-arm identification. Simulations on three
real-world datasets demonstrate that using response times significantly
accelerates preference learning compared to choice-only approaches. Additional
materials, such as code, slides, and talk video, are available at
https://shenlirobot.github.io/pages/NeurIPS24.html

arXiv link: http://arxiv.org/abs/2409.05798v4

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2024-09-09

Uniform Estimation and Inference for Nonparametric Partitioning-Based M-Estimators

Authors: Matias D. Cattaneo, Yingjie Feng, Boris Shigida

This paper presents uniform estimation and inference theory for a large class
of nonparametric partitioning-based M-estimators. The main theoretical results
include: (i) uniform consistency for convex and non-convex objective functions;
(ii) rate-optimal uniform Bahadur representations; (iii) rate-optimal uniform
(and mean square) convergence rates; (iv) valid strong approximations and
feasible uniform inference methods; and (v) extensions to functional
transformations of underlying estimators. Uniformity is established over both
the evaluation point of the nonparametric functional parameter and a Euclidean
parameter indexing the class of loss functions. The results also account
explicitly for the smoothness degree of the loss function (if any), and allow
for a possibly non-identity (inverse) link function. We illustrate the
theoretical and methodological results in four examples: quantile regression,
distribution regression, $L_p$ regression, and Logistic regression. Many other
possibly non-smooth, nonlinear, generalized, robust M-estimation settings are
covered by our results. We provide detailed comparisons with the existing
literature and demonstrate substantive improvements: we achieve the best (in
some cases optimal) known results under improved (in some cases minimal)
requirements in terms of regularity conditions and side rate restrictions. The
supplemental appendix reports complementary technical results that may be of
independent interest, including a novel uniform strong approximation result
based on Yurinskii's coupling.

arXiv link: http://arxiv.org/abs/2409.05715v2

Econometrics arXiv paper, submitted: 2024-09-09

The Surprising Robustness of Partial Least Squares

Authors: João B. Assunção, Pedro Afonso Fernandes

Partial least squares (PLS) is a simple factorisation method that works well
with high dimensional problems in which the number of observations is limited
given the number of independent variables. In this article, we show that PLS
can perform better than ordinary least squares (OLS), least absolute shrinkage
and selection operator (LASSO) and ridge regression in forecasting quarterly
gross domestic product (GDP) growth, covering the period from 2000 to 2023. In
fact, through dimension reduction, PLS proved to be effective in lowering the
out-of-sample forecasting error, especially since 2020. For the period
2000-2019, the four methods produce similar results, suggesting that PLS is a
valid regularisation technique like LASSO or ridge.
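
A minimal sketch of this kind of horse race with scikit-learn, using
placeholder data in place of the GDP-growth dataset and an expanding
one-step-ahead forecasting window:

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.linear_model import LinearRegression, Lasso, Ridge

    def expanding_window_rmse(X, y, model, start):
        """Root mean squared error of one-step-ahead forecasts produced with an
        expanding estimation window."""
        errors = []
        for t in range(start, len(y)):
            model.fit(X[:t], y[:t])
            pred = float(np.ravel(model.predict(X[t:t + 1]))[0])
            errors.append(y[t] - pred)
        return float(np.sqrt(np.mean(np.square(errors))))

    # Placeholder data standing in for quarterly GDP growth and many predictors
    rng = np.random.default_rng(1)
    X = rng.standard_normal((96, 30))
    y = X[:, :3].sum(axis=1) + rng.standard_normal(96)

    models = {
        "PLS (2 components)": PLSRegression(n_components=2),
        "OLS": LinearRegression(),
        "LASSO": Lasso(alpha=0.1),
        "Ridge": Ridge(alpha=1.0),
    }
    for name, m in models.items():
        print(name, round(expanding_window_rmse(X, y, m, start=60), 3))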

arXiv link: http://arxiv.org/abs/2409.05713v1

Econometrics arXiv cross-link from q-fin.TR (q-fin.TR), submitted: 2024-09-08

Bellwether Trades: Characteristics of Trades influential in Predicting Future Price Movements in Markets

Authors: Tejas Ramdas, Martin T. Wells

In this study, we leverage powerful non-linear machine learning methods to
identify the characteristics of trades that contain valuable information.
First, we demonstrate the effectiveness of our optimized neural network
predictor in accurately predicting future market movements. Then, we utilize
the information from this successful neural network predictor to pinpoint the
individual trades within each data point (trading window) that had the most
impact on the optimized neural network's prediction of future price movements.
This approach helps us uncover important insights about the heterogeneity in
information content provided by trades of different sizes, venues, trading
contexts, and over time.

arXiv link: http://arxiv.org/abs/2409.05192v1

Econometrics arXiv updated paper (originally submitted: 2024-09-08)

Difference-in-Differences with Multiple Events

Authors: Lin-Tung Tsai

This paper studies staggered Difference-in-Differences (DiD) design when
there is a second event confounding the target event. When the events are
correlated, the treatment and the control group are unevenly exposed to the
effects of the second event, causing an omitted event bias. To address this
bias, I propose a two-stage DiD design. In the first stage, I estimate the
combined effects of both treatments using a control group that is neither
treated nor confounded. In the second stage, I isolate the effects of the
target treatment by leveraging a parallel treatment effect assumption and a
control group that is treated but not yet confounded. Finally, I apply this
method to revisit the effect of minimum wage increases on teen employment using
state-level hikes between 2010 and 2020. I find that the Medicaid expansion
under the ACA is a significant confounder: controlling for this bias reduces
the short-term estimate of the minimum wage effect by two-thirds.

arXiv link: http://arxiv.org/abs/2409.05184v3

Econometrics arXiv paper, submitted: 2024-09-07

DEPLOYERS: An agent based modeling tool for multi country real world data

Authors: Martin Jaraiz, Ruth Pinacho

We present recent progress in the design and development of DEPLOYERS, an
agent-based macroeconomics modeling (ABM) framework capable of deploying and
simulating a full economic system (individual workers, goods and services
firms, government, central and private banks, financial market, external
sectors) whose structure and activity analysis reproduce the desired
calibration data, which can be, for example, a Social Accounting Matrix (SAM),
a Supply-Use Table (SUT), or an Input-Output Table (IOT). Here we extend our
previous work to a multi-country version and show an example using data from a
46-country, 64-sector FIGARO Inter-Country IOT. The simulation of each country
runs on a separate thread or CPU core to simulate the activity of one step
(month, week, or day), then interacts (updating imports, exports, and
transfers) with that country's foreign partners, and proceeds to the next
step. This interaction can
be chosen to be aggregated (a single row and column IO account) or
disaggregated (64 rows and columns) with each partner. A typical run simulates
thousands of individuals and firms engaged in their monthly activity and then
records the results, much like a survey of the country's economic system. This
data can then be subjected to, for example, an Input-Output analysis to find
out the sources of observed stylized effects as a function of time in the
detailed and realistic modeling environment that can be easily implemented in
an ABM framework.

arXiv link: http://arxiv.org/abs/2409.04876v1

Econometrics arXiv updated paper (originally submitted: 2024-09-07)

Improving the Finite Sample Estimation of Average Treatment Effects using Double/Debiased Machine Learning with Propensity Score Calibration

Authors: Daniele Ballinari, Nora Bearth

In the last decade, machine learning techniques have gained popularity for
estimating causal effects. One machine learning approach that can be used for
estimating an average treatment effect is Double/debiased machine learning
(DML) (Chernozhukov et al., 2018). This approach uses a double-robust score
function that relies on the prediction of nuisance functions, such as the
propensity score, which is the probability of treatment assignment conditional
on covariates. Estimators relying on double-robust score functions are highly
sensitive to errors in propensity score predictions. Machine learners increase
the severity of this problem as they tend to over- or underestimate these
probabilities. Several calibration approaches have been proposed to improve
probabilistic forecasts of machine learners. This paper investigates the use of
probability calibration approaches within the DML framework. Simulation results
demonstrate that calibrating propensity scores may significantly reduce the
root mean squared error of DML estimates of the average treatment effect in
finite samples. We showcase this in an empirical example and provide conditions
under which calibration does not alter the asymptotic properties of the DML
estimator.
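
One way to slot calibration into a cross-fitted AIPW/DML estimate of the ATE
is sketched below, using scikit-learn's CalibratedClassifierCV (isotonic
recalibration) around the propensity model; the paper's specific calibration
approaches and conditions may differ.

    import numpy as np
    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
    from sklearn.model_selection import KFold

    def dml_ate_calibrated(Y, D, X, n_folds=5, seed=0):
        """Cross-fitted AIPW estimate of the ATE in which the propensity model
        is wrapped in an isotonic calibrator before prediction.
        Y, D, X are numpy arrays with D taking values in {0, 1}."""
        n = len(Y)
        psi = np.zeros(n)
        for train, test in KFold(n_folds, shuffle=True, random_state=seed).split(X):
            ps = CalibratedClassifierCV(
                GradientBoostingClassifier(random_state=seed),
                method="isotonic", cv=3).fit(X[train], D[train])
            m1 = GradientBoostingRegressor(random_state=seed).fit(
                X[train][D[train] == 1], Y[train][D[train] == 1])
            m0 = GradientBoostingRegressor(random_state=seed).fit(
                X[train][D[train] == 0], Y[train][D[train] == 0])
            p = np.clip(ps.predict_proba(X[test])[:, 1], 0.01, 0.99)
            g1, g0 = m1.predict(X[test]), m0.predict(X[test])
            psi[test] = (g1 - g0
                         + D[test] * (Y[test] - g1) / p
                         - (1 - D[test]) * (Y[test] - g0) / (1 - p))
        ate = psi.mean()
        se = psi.std(ddof=1) / np.sqrt(n)
        return ate, se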

arXiv link: http://arxiv.org/abs/2409.04874v2

Econometrics arXiv updated paper (originally submitted: 2024-09-06)

Horowitz-Manski-Lee Bounds with Multilayered Sample Selection

Authors: Kory Kroft, Ismael Mourifié, Atom Vayalinkal

This paper investigates the causal effect of job training on wage rates in
the presence of firm heterogeneity. When training affects the sorting of
workers to firms, sample selection is no longer binary but "multilayered".
This paper extends the canonical Heckman (1979) sample selection model -- which
assumes selection is binary -- to a setting where it is multilayered. In this
setting, Lee bounds set-identify a total effect that combines a
weighted average of the causal effect of job training on wage rates across
firms with a weighted average of the contrast in wages between different firms
for a fixed level of training. Thus, Lee bounds set-identify a
policy-relevant estimand only when firms pay homogeneous wages and/or when job
training does not affect worker sorting across firms. We derive analytic
expressions for sharp bounds for the causal effect of job training on wage
rates at each firm that leverage information on firm-specific wages. We
illustrate our partial identification approach with two empirical applications
to job training experiments. Our estimates demonstrate that even when
conventional Lee bounds are strictly positive, our within-firm bounds can be
tight around 0, showing that the canonical Lee bounds may capture only a pure
sorting effect of job training.
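
For reference, the canonical Lee (2009) bounds that the paper builds on can be
computed by trimming the selected treated outcomes, as sketched below on
placeholder data; the within-firm bounds derived in the paper require
additional firm-level wage information and are not shown.

    import numpy as np

    def lee_bounds(y, d, s):
        """Canonical Lee (2009) bounds on the treatment effect for always-
        selected units, assuming treatment weakly increases selection.
        y: outcome (observed only when s == 1), d: treatment, s: selection."""
        y, d, s = map(np.asarray, (y, d, s))
        p1, p0 = s[d == 1].mean(), s[d == 0].mean()
        q = (p1 - p0) / p1                            # share of treated to trim
        y1 = y[(d == 1) & (s == 1)]
        y0_mean = y[(d == 0) & (s == 1)].mean()
        lower = y1[y1 <= np.quantile(y1, 1 - q)].mean() - y0_mean
        upper = y1[y1 >= np.quantile(y1, q)].mean() - y0_mean
        return lower, upper

    # Placeholder experiment where treatment raises both selection and wages
    rng = np.random.default_rng(0)
    n = 5000
    d = rng.integers(0, 2, n)
    s = (rng.uniform(size=n) < 0.5 + 0.2 * d).astype(int)
    y = np.where(s == 1, 1.0 + 0.3 * d + rng.normal(size=n), np.nan)
    print(lee_bounds(y, d, s))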

arXiv link: http://arxiv.org/abs/2409.04589v2

Econometrics arXiv paper, submitted: 2024-09-06

An MPEC Estimator for the Sequential Search Model

Authors: Shinji Koiso, Suguru Otani

This paper proposes a constrained maximum likelihood estimator for sequential
search models, using the MPEC (Mathematical Programming with Equilibrium
Constraints) approach. This method enhances numerical accuracy while avoiding
ad hoc components and errors related to equilibrium conditions. Monte Carlo
simulations show that the estimator performs better in small samples, with
lower bias and root-mean-squared error, though less effectively in large
samples. Despite these mixed results, the MPEC approach remains valuable for
identifying candidate parameters comparable to the benchmark, without relying
on ad hoc look-up tables, as it generates the table through solved equilibrium
constraints.

arXiv link: http://arxiv.org/abs/2409.04378v1

Econometrics arXiv paper, submitted: 2024-09-06

Extreme Quantile Treatment Effects under Endogeneity: Evaluating Policy Effects for the Most Vulnerable Individuals

Authors: Yuya Sasaki, Yulong Wang

We introduce a novel method for estimating and conducting inference about
extreme quantile treatment effects (QTEs) in the presence of endogeneity. Our
approach is applicable to a broad range of empirical research designs,
including instrumental variables design and regression discontinuity design,
among others. By leveraging regular variation and subsampling, the method
ensures robust performance even in extreme tails, where data may be sparse or
entirely absent. Simulation studies confirm the theoretical robustness of our
approach. Applying our method to assess the impact of job training provided by
the Job Training Partnership Act (JTPA), we find significantly negative QTEs
for the lowest quantiles (i.e., the most disadvantaged individuals),
contrasting with previous literature that emphasizes positive QTEs for
intermediate quantiles.

arXiv link: http://arxiv.org/abs/2409.03979v1

Econometrics arXiv updated paper (originally submitted: 2024-09-05)

Performance of Empirical Risk Minimization For Principal Component Regression

Authors: Christian Brownlees, Guðmundur Stefán Guðmundsson, Yaping Wang

This paper establishes bounds on the predictive performance of empirical risk
minimization for principal component regression. Our analysis is nonparametric,
in the sense that the relation between the prediction target and the predictors
is not specified. In particular, we do not rely on the assumption that the
prediction target is generated by a factor model. In our analysis we consider
the cases in which the largest eigenvalues of the covariance matrix of the
predictors grow linearly in the number of predictors (strong signal regime) or
sublinearly (weak signal regime). The main result of this paper shows that
empirical risk minimization for principal component regression is consistent
for prediction and, under appropriate conditions, it achieves near-optimal
performance in both the strong and weak signal regimes.
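
Computationally, the estimator studied here is least squares on the leading
principal components of the predictors; a minimal sketch on placeholder data
(the simulated factor structure is only for illustration and is not assumed in
the paper's analysis):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline

    # Principal component regression: regress the target on the leading
    # principal components of the predictors.
    rng = np.random.default_rng(0)
    n, p, k = 500, 100, 3
    F = rng.standard_normal((n, k))                       # latent components
    X = F @ rng.standard_normal((k, p)) + 0.5 * rng.standard_normal((n, p))
    y = F @ np.array([1.0, -1.0, 0.5]) + rng.standard_normal(n)

    pcr = make_pipeline(PCA(n_components=k), LinearRegression()).fit(X, y)
    print("in-sample R^2:", round(pcr.score(X, y), 3))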

arXiv link: http://arxiv.org/abs/2409.03606v2

Econometrics arXiv paper, submitted: 2024-09-05

Automatic Pricing and Replenishment Strategies for Vegetable Products Based on Data Analysis and Nonlinear Programming

Authors: Mingpu Ma

In the field of fresh produce retail, vegetables generally have a relatively
limited shelf life, and their quality deteriorates with time. Most vegetable
varieties, if not sold on the day of delivery, become difficult to sell the
following day. Therefore, retailers usually perform daily quantitative
replenishment based on historical sales data and demand conditions. Vegetable
pricing typically uses a "cost-plus pricing" method, with retailers often
discounting products affected by transportation loss and quality decline. In
this context, reliable market demand analysis is crucial as it directly impacts
replenishment and pricing decisions. Given the limited retail space, a rational
sales mix becomes essential. This paper first uses data analysis and
visualization techniques to examine the distribution patterns and
interrelationships of vegetable sales quantities by category and individual
item, based on provided data on vegetable types, sales records, wholesale
prices, and recent loss rates. Next, it constructs a functional relationship
between total sales volume and cost-plus pricing for vegetable categories,
forecasts future wholesale prices using the ARIMA model, and establishes a
sales profit function and constraints. A nonlinear programming model is then
developed and solved to provide daily replenishment quantities and pricing
strategies for each vegetable category for the upcoming week. Further, we
optimize the profit function and constraints based on the actual sales
conditions and requirements, providing replenishment quantities and pricing
strategies for individual items on July 1 to maximize retail profit. Finally,
to better formulate replenishment and pricing decisions for vegetable products,
we discuss the data that retailers need to collect and analyse how the
collected data can be applied to the above issues.

arXiv link: http://arxiv.org/abs/2409.09065v1

Econometrics arXiv paper, submitted: 2024-09-04

Momentum Dynamics in Competitive Sports: A Multi-Model Analysis Using TOPSIS and Logistic Regression

Authors: Mingpu Ma

This paper explores the concept of "momentum" in sports competitions through
the use of the TOPSIS model and 0-1 logistic regression model. First, the
TOPSIS model is employed to evaluate the performance of two tennis players,
with visualizations used to analyze the situation's evolution at every moment
in the match, explaining how "momentum" manifests in sports. Then, the 0-1
logistic regression model is utilized to verify the impact of "momentum" on
match outcomes, demonstrating that fluctuations in player performance and the
successive occurrence of successes are not random. Additionally, this paper
examines the indicators that influence the reversal of game situations by
analyzing key match data and testing the accuracy of the models with match
data. The findings show that the model accurately explains the conditions
during matches and can be generalized to other sports competitions. Finally,
the strengths, weaknesses, and potential future improvements of the model are
discussed.
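
TOPSIS itself is a standard multi-criteria scoring procedure; a compact
implementation of the closeness score that can be tracked over the course of a
match is sketched below, with hypothetical indicators and weights.

    import numpy as np

    def topsis_scores(M, weights, benefit):
        """TOPSIS closeness scores for a decision matrix M (rows = moments in
        the match, columns = performance indicators). `benefit` flags whether
        a larger value of each indicator is better."""
        M = np.asarray(M, dtype=float)
        norm = M / np.sqrt((M**2).sum(axis=0))              # vector normalisation
        V = norm * np.asarray(weights)                      # weighted matrix
        ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
        anti = np.where(benefit, V.min(axis=0), V.max(axis=0))
        d_plus = np.sqrt(((V - ideal) ** 2).sum(axis=1))    # distance to ideal
        d_minus = np.sqrt(((V - anti) ** 2).sum(axis=1))    # distance to anti-ideal
        return d_minus / (d_plus + d_minus)                 # closeness in [0, 1]

    # Hypothetical indicators: points won, serve success rate, unforced errors
    M = [[12, 0.70, 3], [9, 0.55, 5], [15, 0.80, 2]]
    print(topsis_scores(M, weights=[0.5, 0.3, 0.2], benefit=[True, True, False]))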

arXiv link: http://arxiv.org/abs/2409.02872v1

Econometrics arXiv paper, submitted: 2024-09-04

The Impact of Data Elements on Narrowing the Urban-Rural Consumption Gap in China: Mechanisms and Policy Analysis

Authors: Mingpu Ma

The urban-rural consumption gap, as one of the important indicators in social
development, directly reflects the imbalance in urban and rural economic and
social development. Data elements, as an important component of New Quality
Productivity, are of significant importance in promoting economic development
and improving people's living standards in the information age. This study,
through the analysis of fixed-effects regression models, system GMM regression
models, and the intermediate effect model, found that the development level of
data elements to some extent promotes the narrowing of the urban-rural
consumption gap. At the same time, the intermediate variable of urban-rural
income gap plays an important role between data elements and consumption gap,
with a significant intermediate effect. The results of the study indicate that
the advancement of data elements can promote the balance of urban and rural
residents' consumption levels by reducing the urban-rural income gap, providing
theoretical support and policy recommendations for achieving common prosperity
and promoting coordinated urban-rural development. Building upon this, this
paper emphasizes the complex correlation between the development of data
elements and the urban-rural consumption gap, and puts forward policy
suggestions such as promoting the development of the data element market,
strengthening the construction of the digital economy and e-commerce, and
promoting integrated urban-rural development. Overall, the development of data
elements is not only an important path to reducing the urban-rural consumption
gap but also one of the key drivers of balanced economic and social development
in China. This study has theoretical
and practical significance for understanding the mechanism of the urban-rural
consumption gap and improving policies for urban-rural economic development.

arXiv link: http://arxiv.org/abs/2409.02662v1

Econometrics arXiv paper, submitted: 2024-09-04

The Application of Green GDP and Its Impact on Global Economy and Environment: Analysis of GGDP based on SEEA model

Authors: Mingpu Ma

This paper presents an analysis of Green Gross Domestic Product (GGDP) using
the System of Environmental-Economic Accounting (SEEA) model to evaluate its
impact on global climate mitigation and economic health. GGDP is proposed as a
superior measure to traditional GDP by incorporating natural resource
consumption, environmental pollution control, and degradation factors. The
study develops a GGDP model and employs grey correlation analysis and grey
prediction models to assess its relationship with these factors. Key findings
demonstrate that replacing GDP with GGDP can positively influence climate
change, particularly in reducing CO2 emissions and stabilizing global
temperatures. The analysis further explores the implications of GGDP adoption
across developed and developing countries, with specific predictions for China
and the United States. The results indicate a potential increase in economic
levels for developing countries, while developed nations may experience a
decrease. Additionally, the shift to GGDP is shown to significantly reduce
natural resource depletion and population growth rates in the United States,
suggesting broader environmental and economic benefits. This paper highlights
the universal applicability of the GGDP model and its potential to enhance
environmental and economic policies globally.

arXiv link: http://arxiv.org/abs/2409.02642v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-09-04

Fitting an Equation to Data Impartially

Authors: Chris Tofallis

We consider the problem of fitting a relationship (e.g. a potential
scientific law) to data involving multiple variables. Ordinary (least squares)
regression is not suitable for this because the estimated relationship will
differ according to which variable is chosen as being dependent, and the
dependent variable is unrealistically assumed to be the only variable which has
any measurement error (noise). We present a very general method for estimating
a linear functional relationship between multiple noisy variables, which are
treated impartially, i.e., with no distinction between dependent and independent
variables. The data are not assumed to follow any distribution, but all
variables are treated as being equally reliable. Our approach extends the
geometric mean functional relationship to multiple dimensions. This is
especially useful with variables measured in different units, as it is
naturally scale-invariant, whereas orthogonal regression is not. This is
because our approach is not based on minimizing distances, but on the symmetric
concept of correlation. The estimated coefficients are easily obtained from the
covariances or correlations, and correspond to geometric means of associated
least squares coefficients. The ease of calculation will hopefully allow
widespread application of impartial fitting to estimate relationships in a
neutral way.
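
In the two-variable case the impartial slope reduces to the geometric mean of
the y-on-x least squares slope and the reciprocal of the x-on-y slope, i.e.
sign(r) * s_y / s_x; a small sketch (the paper's multivariate extension works
from the full correlation matrix and is not shown):

    import numpy as np

    def gmfr_slope(x, y):
        """Two-variable geometric mean functional relationship: the impartial
        slope is the geometric mean of the y-on-x OLS slope and the inverse of
        the x-on-y OLS slope, carrying the sign of the correlation."""
        x, y = np.asarray(x, float), np.asarray(y, float)
        r = np.corrcoef(x, y)[0, 1]
        b_yx = r * y.std(ddof=1) / x.std(ddof=1)            # OLS slope of y on x
        b_xy = r * x.std(ddof=1) / y.std(ddof=1)            # OLS slope of x on y
        slope = np.sign(r) * np.sqrt(b_yx / b_xy)           # = sign(r) * s_y / s_x
        intercept = y.mean() - slope * x.mean()
        return slope, intercept

    rng = np.random.default_rng(0)
    x = rng.normal(size=200)
    y = 2.0 * x + rng.normal(scale=2.0, size=200)
    print(gmfr_slope(x, y))            # scale-invariant and symmetric in x and y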

arXiv link: http://arxiv.org/abs/2409.02573v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2024-09-03

Double Machine Learning at Scale to Predict Causal Impact of Customer Actions

Authors: Sushant More, Priya Kotwal, Sujith Chappidi, Dinesh Mandalapu, Chris Khawand

The Causal Impact (CI) of customer actions is broadly used across the industry
to inform both short- and long-term investment decisions of various types. In
this paper, we apply the double machine learning (DML) methodology to estimate
the CI values across 100s of customer actions of business interest and 100s of
millions of customers. We operationalize DML through a causal ML library based
on Spark with a flexible, JSON-driven model configuration approach to estimate
CI at scale (i.e., across hundreds of actions and millions of customers). We
outline the DML methodology and implementation, and associated benefits over
the traditional potential outcomes based CI model. We show population-level as
well as customer-level CI values along with confidence intervals. The
validation metrics show a 2.2% gain over the baseline methods and a 2.5X gain
in the computational time. Our contribution is to advance the scalable
application of CI, while also providing an interface that allows faster
experimentation, cross-platform support, ability to onboard new use cases, and
improves accessibility of underlying code for partner teams.

arXiv link: http://arxiv.org/abs/2409.02332v1

Econometrics arXiv updated paper (originally submitted: 2024-09-03)

Distribution Regression Difference-In-Differences

Authors: Iván Fernández-Val, Jonas Meier, Aico van Vuuren, Francis Vella

We provide a simple distribution regression estimator for treatment effects
in the difference-in-differences (DiD) design. Our procedure is particularly
useful when the treatment effect differs across the distribution of the outcome
variable. Our proposed estimator easily incorporates covariates and,
importantly, can be extended to settings where the treatment potentially
affects the joint distribution of multiple outcomes. Our key identifying
restriction is that the counterfactual distribution of the treated in the
untreated state has no interaction effect between treatment and time. This
assumption results in a parallel trend assumption on a transformation of the
distribution. We highlight the relationship between our procedure and
assumptions with the changes-in-changes approach of Athey and Imbens (2006). We
also reexamine the Card and Krueger (1994) study of the impact of minimum wages
on employment to illustrate the utility of our approach.
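
A heavily simplified caricature of the idea, without covariates, is a 2x2 DiD
applied to the threshold indicators 1{Y <= c} over a grid of thresholds; the
sketch below is not the paper's estimator or counterfactual construction, only
an illustration of distribution regression in a DiD layout.

    import numpy as np

    def dr_did(y, treated, post, grid):
        """Difference-in-differences applied to the indicators 1{Y <= c} for
        each threshold c in `grid`, giving a distributional effect curve
        (no covariates; simple 2x2 design)."""
        y, treated, post = map(np.asarray, (y, treated, post))
        effects = []
        for c in grid:
            z = (y <= c).astype(float)
            cell = lambda d, t: z[(treated == d) & (post == t)].mean()
            effects.append((cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0)))
        return np.array(effects)

    # Placeholder data: treatment shifts part of the outcome distribution after t = 1
    rng = np.random.default_rng(0)
    n = 4000
    treated = rng.integers(0, 2, n)
    post = rng.integers(0, 2, n)
    y = (rng.normal(size=n) + 0.5 * treated + 0.3 * post
         + 0.8 * treated * post * (rng.uniform(size=n) > 0.5))
    grid = np.quantile(y, np.linspace(0.1, 0.9, 9))
    print(np.round(dr_did(y, treated, post, grid), 3))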

arXiv link: http://arxiv.org/abs/2409.02311v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-09-03

Variable selection in convex nonparametric least squares via structured Lasso: An application to the Swedish electricity distribution networks

Authors: Zhiqiang Liao

We study the problem of variable selection in convex nonparametric least
squares (CNLS). Whereas the least absolute shrinkage and selection operator
(Lasso) is a popular technique for least squares, its variable selection
performance is unknown in CNLS problems. In this work, we investigate the
performance of the Lasso estimator and find that it is usually unable to select
variables efficiently. Exploiting the unique structure of the subgradients in
CNLS, we develop a structured Lasso method by combining $\ell_1$-norm and
$\ell_{\infty}$-norm. The relaxed version of the structured Lasso is proposed
for achieving model sparsity and predictive performance simultaneously, where
we can control the two effects--variable selection and model shrinkage--using
separate tuning parameters. A Monte Carlo study is implemented to verify the
finite sample performance of the proposed approaches. We also use real data
from Swedish electricity distribution networks to illustrate the effects of the
proposed variable selection techniques. The results from the simulation and
application confirm that the proposed structured Lasso performs favorably,
generally leading to sparser and more accurate predictive models, relative to
the conventional Lasso methods in the literature.

arXiv link: http://arxiv.org/abs/2409.01911v2

Econometrics arXiv paper, submitted: 2024-09-02

Double Machine Learning meets Panel Data -- Promises, Pitfalls, and Potential Solutions

Authors: Jonathan Fuhr, Dominik Papies

Estimating causal effect using machine learning (ML) algorithms can help to
relax functional form assumptions if used within appropriate frameworks.
However, most of these frameworks assume settings with cross-sectional data,
whereas researchers often have access to panel data, which in traditional
methods helps to deal with unobserved heterogeneity between units. In this
paper, we explore how we can adapt double/debiased machine learning (DML)
(Chernozhukov et al., 2018) for panel data in the presence of unobserved
heterogeneity. This adaptation is challenging because DML's cross-fitting
procedure assumes independent data and the unobserved heterogeneity is not
necessarily additively separable in settings with nonlinear observed
confounding. We assess the performance of several intuitively appealing
estimators in a variety of simulations. While we find violations of the
cross-fitting assumptions to be largely inconsequential for the accuracy of the
effect estimates, many of the considered methods fail to adequately account for
the presence of unobserved heterogeneity. However, we find that using
predictive models based on the correlated random effects approach (Mundlak,
1978) within DML leads to accurate coefficient estimates across settings, given
a sample size that is large relative to the number of observed confounders. We
also show that the influence of the unobserved heterogeneity on the observed
confounders plays a significant role for the performance of most alternative
methods.
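
The correlated random effects device that performs well in these simulations
amounts to augmenting the covariates passed to the ML learners with their
within-unit means; a schematic sketch of that feature construction is below
(the surrounding cross-fitted DML loop follows Chernozhukov et al., 2018 and
is not repeated).

    import numpy as np
    import pandas as pd

    def mundlak_augment(df, unit_col, x_cols):
        """Correlated random effects (Mundlak, 1978) device: augment each
        covariate with its within-unit mean so that ML nuisance models in DML
        can absorb time-invariant unobserved heterogeneity."""
        means = df.groupby(unit_col)[x_cols].transform("mean")
        means.columns = [f"{c}_unit_mean" for c in x_cols]
        return pd.concat([df, means], axis=1)

    # Placeholder panel: 100 units, 8 periods, two observed confounders
    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "unit": np.repeat(np.arange(100), 8),
        "x1": rng.normal(size=800),
        "x2": rng.normal(size=800),
    })
    panel = mundlak_augment(df, "unit", ["x1", "x2"])
    print(panel.columns.tolist())
    # The augmented columns (x1, x2, x1_unit_mean, x2_unit_mean) would then be
    # passed as the covariates to the treatment and outcome learners inside a
    # standard cross-fitted DML procedure.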

arXiv link: http://arxiv.org/abs/2409.01266v1

Econometrics arXiv paper, submitted: 2024-08-31

Bandit Algorithms for Policy Learning: Methods, Implementation, and Welfare-performance

Authors: Toru Kitagawa, Jeff Rowley

Static supervised learning, in which experimental data serves as a training
sample for the estimation of an optimal treatment assignment policy, is a
commonly assumed framework of policy learning. An arguably more realistic but
challenging scenario is a dynamic setting in which the planner performs
experimentation and exploitation simultaneously with subjects that arrive
sequentially. This paper studies bandit algorithms for learning an optimal
individualised treatment assignment policy. Specifically, we study
applicability of the EXP4.P (Exponential weighting for Exploration and
Exploitation with Experts) algorithm developed by Beygelzimer et al. (2011) to
policy learning. Assuming that the class of policies has a finite
Vapnik-Chervonenkis dimension and that the number of subjects to be allocated
is known, we present a high probability welfare-regret bound of the algorithm.
To implement the algorithm, we use an incremental enumeration algorithm for
hyperplane arrangements. We perform extensive numerical analysis to assess the
algorithm's sensitivity to its tuning parameters and its welfare-regret
performance. Further simulation exercises are calibrated to the National Job
Training Partnership Act (JTPA) Study sample to determine how the algorithm
performs when applied to economic data. Our findings highlight various
computational challenges and suggest that the limited welfare gain from the
algorithm is due to substantial heterogeneity in causal effects in the JTPA
data.

arXiv link: http://arxiv.org/abs/2409.00379v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-08-30

Weighted Regression with Sybil Networks

Authors: Nihar Shah

In many online domains, Sybil networks -- or cases where a single user
assumes multiple identities -- are a pervasive feature. This complicates
experiments, as off-the-shelf regression estimators at least assume known
network topologies (if not fully independent observations) when Sybil network
topologies in practice are often unknown. The literature has exclusively
focused on techniques to detect Sybil networks, leading many experimenters to
subsequently exclude suspected networks entirely before estimating treatment
effects. I present a more efficient solution in the presence of these suspected
Sybil networks: a weighted regression framework that applies weights based on
the probabilities that sets of observations are controlled by single actors. I
show in the paper that the MSE-minimizing solution is to set the weight matrix
equal to the inverse of the expected network topology. I demonstrate the
methodology on simulated data, and then I apply the technique to a competition
with suspected Sybil networks run on the Sui blockchain and show reductions in
the standard error of the estimate by 6 - 24%.
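
The prescription of weighting by the inverse of the expected network topology
is a form of generalised least squares; a small sketch with a hypothetical
probabilistic topology matrix is below (simplified relative to the paper's
framework).

    import numpy as np

    def sybil_weighted_ols(X, y, Omega):
        """Weighted (GLS-style) regression with weight matrix equal to the
        inverse of the expected network topology Omega, whose (i, j) entry is
        the probability that observations i and j are the same actor
        (diagonal = 1)."""
        W = np.linalg.inv(Omega)
        XtWX = X.T @ W @ X
        beta = np.linalg.solve(XtWX, X.T @ W @ y)
        cov = np.linalg.inv(XtWX)     # up to the error variance scale
        return beta, np.sqrt(np.diag(cov))

    # Hypothetical example: observations 0 and 1 are the same actor with prob. 0.9
    rng = np.random.default_rng(0)
    n = 50
    X = np.column_stack([np.ones(n), rng.integers(0, 2, n)])   # intercept + treatment
    y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)
    Omega = np.eye(n)
    Omega[0, 1] = Omega[1, 0] = 0.9
    print(sybil_weighted_ols(X, y, Omega))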

arXiv link: http://arxiv.org/abs/2408.17426v3

Econometrics arXiv paper, submitted: 2024-08-30

State Space Model of Realized Volatility under the Existence of Dependent Market Microstructure Noise

Authors: Toru Yano

Volatility, the degree of variation of a stock price, is an important concept
in finance. Realized Volatility (RV) is an estimator of volatility calculated
from high-frequency observed prices, and it has lately attracted considerable
attention in econometrics and mathematical finance. However, high-frequency
data are known to include observation errors called market microstructure noise
(MN). Nagakura and Watanabe [2015] proposed a state space model that decomposes
RV into true volatility and the influence of MN. In this paper, we assume a
dependent MN that is autocorrelated and correlated with returns, as reported by
Hansen and Lunde [2006], extend the results of Nagakura and Watanabe [2015], and
compare models using simulations and actual data.

arXiv link: http://arxiv.org/abs/2408.17187v1

Econometrics arXiv paper, submitted: 2024-08-29

Sensitivity Analysis for Dynamic Discrete Choice Models

Authors: Chun Pong Lau

In dynamic discrete choice models, some parameters, such as the discount
factor, are being fixed instead of being estimated. This paper proposes two
sensitivity analysis procedures for dynamic discrete choice models with respect
to the fixed parameters. First, I develop a local sensitivity measure that
estimates the change in the target parameter for a unit change in the fixed
parameter. This measure is fast to compute as it does not require model
re-estimation. Second, I propose a global sensitivity analysis procedure that
uses model primitives to study the relationship between target parameters and
fixed parameters. I show how to apply the sensitivity analysis procedures of
this paper through two empirical applications.

arXiv link: http://arxiv.org/abs/2408.16330v1

Econometrics arXiv paper, submitted: 2024-08-28

Marginal homogeneity tests with panel data

Authors: Federico Bugni, Jackson Bunting, Muyang Ren

A panel dataset satisfies marginal homogeneity if the time-specific marginal
distributions are homogeneous or time-invariant. Marginal homogeneity is
relevant in economic settings such as dynamic discrete games. In this paper, we
propose several tests for the hypothesis of marginal homogeneity and
investigate their properties. We consider an asymptotic framework in which the
number of individuals n in the panel diverges, and the number of periods T is
fixed. We implement our tests by comparing a studentized or non-studentized
T-sample version of the Cramer-von Mises statistic with a suitable critical
value. We propose three methods to construct the critical value: asymptotic
approximations, the bootstrap, and time permutations. We show that the first
two methods result in asymptotically exact hypothesis tests. The permutation
test based on a non-studentized statistic is asymptotically exact when T=2, but
is asymptotically invalid when T>2. In contrast, the permutation test based on
a studentized statistic is always asymptotically exact. Finally, under a
time-exchangeability assumption, the permutation test is exact in finite
samples, both with and without studentization.
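
For T = 2, the time-permutation version can be sketched as follows: a
two-sample Cramér-von Mises-type statistic comparing the two marginal ECDFs,
with the permutation distribution generated by randomly swapping the two
periods within each individual. The studentisation needed for validity when
T > 2 is omitted.

    import numpy as np

    def cvm_stat(y1, y2):
        """Cramér-von Mises-type distance between the empirical marginal
        distributions of the two periods."""
        pooled = np.concatenate([y1, y2])
        F1 = np.searchsorted(np.sort(y1), pooled, side="right") / len(y1)
        F2 = np.searchsorted(np.sort(y2), pooled, side="right") / len(y2)
        return np.mean((F1 - F2) ** 2)

    def marginal_homogeneity_permutation_test(panel, n_perm=999, seed=0):
        """panel: (n, 2) array of outcomes for n individuals in T = 2 periods.
        Returns the permutation p-value for the null of marginal homogeneity."""
        rng = np.random.default_rng(seed)
        observed = cvm_stat(panel[:, 0], panel[:, 1])
        count = 0
        for _ in range(n_perm):
            swap = rng.integers(0, 2, len(panel)).astype(bool)
            permuted = panel.copy()
            permuted[swap] = permuted[swap][:, ::-1]     # swap the two periods
            count += cvm_stat(permuted[:, 0], permuted[:, 1]) >= observed
        return (1 + count) / (1 + n_perm)

    # Placeholder panel whose second-period marginal is shifted upward
    rng = np.random.default_rng(1)
    panel = np.column_stack([rng.normal(size=300), rng.normal(loc=0.3, size=300)])
    print(marginal_homogeneity_permutation_test(panel))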

arXiv link: http://arxiv.org/abs/2408.15862v1

Econometrics arXiv paper, submitted: 2024-08-28

BayesSRW: Bayesian Sampling and Re-weighting approach for variance reduction

Authors: Carol Liu

In this paper, we address the challenge of sampling in scenarios where
limited resources prevent exhaustive measurement across all subjects. We
consider a setting where samples are drawn from multiple groups, each following
a distribution with unknown mean and variance parameters. We introduce a novel
sampling strategy, motivated simply by the Cauchy-Schwarz inequality, which
minimizes the variance of the population mean estimator by allocating samples
proportionally to both the group size and the standard deviation. This approach
improves the efficiency of sampling by focusing resources on groups with
greater variability, thereby enhancing the precision of the overall estimate.
Additionally, we extend our method to a two-stage sampling procedure in a Bayes
approach, named BayesSRW, where a preliminary stage is used to estimate the
variance, which then informs the optimal allocation of the remaining sampling
budget. Through simulation examples, we demonstrate the effectiveness of our
approach in reducing estimation uncertainty and providing more reliable
insights in applications ranging from user experience surveys to
high-dimensional peptide array studies.
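
The core allocation rule, sampling proportionally to group size times standard
deviation, is the classical Neyman allocation; a minimal sketch is below (the
Bayesian two-stage refinement with a pilot variance estimate is not shown).

    import numpy as np

    def neyman_allocation(group_sizes, group_sds, budget):
        """Allocate a sampling budget across groups proportionally to
        group size times standard deviation, which minimises the variance of
        the stratified estimator of the population mean."""
        weights = np.asarray(group_sizes) * np.asarray(group_sds)
        alloc = budget * weights / weights.sum()
        return np.maximum(1, np.round(alloc).astype(int))   # at least one draw per group

    # Example: three groups with known sizes and (pilot-estimated) std. devs.
    print(neyman_allocation(group_sizes=[1000, 500, 200],
                            group_sds=[2.0, 5.0, 1.0],
                            budget=300))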

arXiv link: http://arxiv.org/abs/2408.15454v1

Econometrics arXiv paper, submitted: 2024-08-28

The effects of data preprocessing on probability of default model fairness

Authors: Di Wu

In the context of financial credit risk evaluation, the fairness of machine
learning models has become a critical concern, especially given the potential
for biased predictions that disproportionately affect certain demographic
groups. This study investigates the impact of data preprocessing, with a
specific focus on Truncated Singular Value Decomposition (SVD), on the fairness
and performance of probability of default models. Using a comprehensive dataset
sourced from Kaggle, various preprocessing techniques, including SVD, were
applied to assess their effect on model accuracy, discriminatory power, and
fairness.

arXiv link: http://arxiv.org/abs/2408.15452v1

Econometrics arXiv paper, submitted: 2024-08-26

Double/Debiased CoCoLASSO of Treatment Effects with Mismeasured High-Dimensional Control Variables

Authors: Geonwoo Kim, Suyong Song

We develop an estimator for treatment effects in high-dimensional settings
with additive measurement error, a prevalent challenge in modern econometrics.
We introduce the Double/Debiased Convex Conditioned LASSO (Double/Debiased
CoCoLASSO), which extends the double/debiased machine learning framework to
accommodate mismeasured covariates. Our principal contributions are threefold.
(1) We construct a Neyman-orthogonal score function that remains valid under
measurement error, incorporating a bias correction term to account for
error-induced correlations. (2) We propose a method of moments estimator for
the measurement error variance, enabling implementation without prior knowledge
of the error covariance structure. (3) We establish the $\sqrt{N}$-consistency
and asymptotic normality of our estimator under general conditions, allowing
for both the number of covariates and the magnitude of measurement error to
increase with the sample size. Our theoretical results demonstrate the
estimator's efficiency within the class of regularized high-dimensional
estimators accounting for measurement error. Monte Carlo simulations
corroborate our asymptotic theory and illustrate the estimator's robust
performance across various levels of measurement error. Notably, our
covariance-oblivious approach nearly matches the efficiency of methods that
assume known error variance.

arXiv link: http://arxiv.org/abs/2408.14671v1

Econometrics arXiv updated paper (originally submitted: 2024-08-26)

Modeling the Dynamics of Growth in Master-Planned Communities

Authors: Christopher K. Allsup, Irene S. Gabashvili

This paper describes how a time-varying Markov model was used to forecast
housing development at a master-planned community during a transition from high
to low growth. Our approach draws on detailed historical data to model the
dynamics of the market participants, producing results that are entirely
data-driven and free of bias. While traditional time series forecasting methods
often struggle to account for nonlinear regime changes in growth, our approach
successfully captures the onset of buildout as well as external economic
shocks, such as the 1990 and 2008-2011 recessions and the 2021 post-pandemic
boom.
This research serves as a valuable tool for urban planners, homeowner
associations, and property stakeholders aiming to navigate the complexities of
growth at master-planned communities during periods of both system stability
and instability.

arXiv link: http://arxiv.org/abs/2408.14214v2

Econometrics arXiv paper, submitted: 2024-08-26

Endogenous Treatment Models with Social Interactions: An Application to the Impact of Exercise on Self-Esteem

Authors: Zhongjian Lin, Francis Vella

We address the estimation of endogenous treatment models with social
interactions in both the treatment and outcome equations. We model the
interactions between individuals in an internally consistent manner via a game
theoretic approach based on discrete Bayesian games. This introduces a
substantial computational burden in estimation which we address through a
sequential version of the nested fixed point algorithm. We also provide some
relevant treatment effects, and procedures for their estimation, which capture
the impact on both the individual and the total sample. Our empirical
application examines the impact of an individual's exercise frequency on her
level of self-esteem. We find that an individual's exercise frequency is
influenced by her expectation of her friends' exercise frequency. We also find
that an
individual's level of self-esteem is affected by her level of exercise and, at
relatively lower levels of self-esteem, by the expectation of her friends'
self-esteem.

arXiv link: http://arxiv.org/abs/2408.13971v1

Econometrics arXiv paper, submitted: 2024-08-25

Inference on Consensus Ranking of Distributions

Authors: David M. Kaplan

Instead of testing for unanimous agreement, I propose learning how broad of a
consensus favors one distribution over another (of earnings, productivity,
asset returns, test scores, etc.). Specifically, given a sample from each of
two distributions, I propose statistical inference methods to learn about the
set of utility functions for which the first distribution has higher expected
utility than the second distribution. With high probability, an "inner"
confidence set is contained within this true set, while an "outer" confidence
set contains the true set. Such confidence sets can be formed by inverting a
proposed multiple testing procedure that controls the familywise error rate.
Theoretical justification comes from empirical process results, given that very
large classes of utility functions are generally Donsker (subject to finite
moments). The theory additionally justifies a uniform (over utility functions)
confidence band of expected utility differences, as well as tests with a
utility-based "restricted stochastic dominance" as either the null or
alternative hypothesis. Simulated and empirical examples illustrate the
methodology.
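
To make the estimand concrete, the sketch below evaluates, over a hypothetical
grid of CRRA utility functions, the sample difference in mean utility between
two simulated distributions. It only illustrates the set of utilities being
learned about; it is not the paper's multiple testing procedure, and the
distributions and risk-aversion grid are invented.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.lognormal(mean=3.0, sigma=0.8, size=2000)  # sample from distribution 1
    y = rng.lognormal(mean=3.1, sigma=0.5, size=2000)  # sample from distribution 2

    def crra(z, gamma):
        # CRRA utility with relative risk aversion gamma.
        return np.log(z) if gamma == 1.0 else (z ** (1 - gamma) - 1) / (1 - gamma)

    for gamma in [0.0, 0.5, 1.0, 2.0, 4.0]:
        diff = crra(x, gamma).mean() - crra(y, gamma).mean()
        print(f"gamma={gamma:3.1f}: mean utility difference = {diff:+.3f}")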

arXiv link: http://arxiv.org/abs/2408.13949v1

Econometrics arXiv updated paper (originally submitted: 2024-08-24)

Cross-sectional Dependence in Idiosyncratic Volatility

Authors: Ilze Kalnina, Kokouvi Tewou

This paper introduces an econometric framework for analyzing cross-sectional
dependence in the idiosyncratic volatilities of assets using high frequency
data. We first consider the estimation of standard measures of dependence in
the idiosyncratic volatilities such as covariances and correlations. Naive
estimators of these measures are biased due to the use of the error-laden
estimates of idiosyncratic volatilities. We provide bias-corrected estimators
and the relevant asymptotic theory. Next, we introduce an idiosyncratic
volatility factor model, in which we decompose the variation in idiosyncratic
volatilities into two parts: the variation related to the systematic factors
such as the market volatility, and the residual variation. Again, naive
estimators of the decomposition are biased, and we provide bias-corrected
estimators. We also provide the asymptotic theory that allows us to test
whether the residual (non-systematic) components of the idiosyncratic
volatilities exhibit cross-sectional dependence. We apply our methodology to
the S&P 100 index constituents, and document strong cross-sectional dependence
in their idiosyncratic volatilities. We consider two different sets of
idiosyncratic volatility factors, and find that neither can fully account for
the cross-sectional dependence in idiosyncratic volatilities. For each model,
we map out the network of dependencies in residual (non-systematic)
idiosyncratic volatilities across all stocks.

arXiv link: http://arxiv.org/abs/2408.13437v2

Econometrics arXiv updated paper (originally submitted: 2024-08-23)

Difference-in-differences with as few as two cross-sectional units -- A new perspective to the democracy-growth debate

Authors: Gilles Koumou, Emmanuel Selorm Tsyawo

Pooled panel analyses often mask heterogeneity in unit-specific treatment
effects. This challenge, for example, crops up in studies of the impact of
democracy on economic growth, where findings vary substantially due to
differences in country composition. To address this challenge, this paper
introduces a Difference-in-Differences (DiD) estimator that leverages temporal
variation in the data to estimate unit-specific average treatment effects on
the treated (ATT) with as few as two cross-sectional units. Under weak
identification and temporal dependence conditions, the proposed DiD estimator
is shown to be asymptotically normal. The method is further complemented with
an identification test that is more powerful than pre-trends tests and can
detect violations of parallel trends in post-treatment periods. Empirical
results using the DiD estimator suggest Benin's economy would have been 6.3%
smaller on average over the 1993-2018 period had she not democratised.
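
For intuition only, the canonical two-unit difference-in-differences
comparison underlying such an exercise can be computed as follows; the series
are made up, and this is a simplification rather than the paper's estimator or
its asymptotic theory.

    import numpy as np

    # Hypothetical log-GDP series for one treated and one control country,
    # split into pre- and post-treatment periods.
    treated_pre, treated_post = np.array([1.0, 1.1, 1.2]), np.array([1.6, 1.7, 1.8])
    control_pre, control_post = np.array([0.9, 1.0, 1.1]), np.array([1.2, 1.3, 1.4])

    # Unit-specific ATT: change for the treated unit minus change for the control.
    att = (treated_post.mean() - treated_pre.mean()) - \
          (control_post.mean() - control_pre.mean())
    print(f"DiD estimate of the ATT: {att:.2f}")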

arXiv link: http://arxiv.org/abs/2408.13047v4

Econometrics arXiv updated paper (originally submitted: 2024-08-23)

Machine Learning and the Yield Curve: Tree-Based Macroeconomic Regime Switching

Authors: Siyu Bie, Francis X. Diebold, Jingyu He, Junye Li

We explore tree-based macroeconomic regime-switching in the context of the
dynamic Nelson-Siegel (DNS) yield-curve model. In particular, we customize the
tree-growing algorithm to partition macroeconomic variables based on the DNS
model's marginal likelihood, thereby identifying regime-shifting patterns in
the yield curve. Compared to traditional Markov-switching models, our model
offers clear economic interpretation via macroeconomic linkages and ensures
computational simplicity. In an empirical application to U.S. Treasury yields,
we find (1) important yield-curve regime switching, and (2) evidence that
macroeconomic variables have predictive power for the yield curve when the
federal funds rate is high, but not in other regimes, thereby refining the
notion of yield curve "macro-spanning".

arXiv link: http://arxiv.org/abs/2408.12863v2

Econometrics arXiv updated paper (originally submitted: 2024-08-22)

A nested nonparametric logit model for microtransit revenue management supplemented with citywide synthetic data

Authors: Xiyuan Ren, Joseph Y. J. Chow, Venktesh Pandey, Linfei Yuan

As an IT-enabled multi-passenger mobility service, microtransit can improve
accessibility, reduce congestion, and enhance flexibility. However, its
heterogeneous impacts across travelers necessitate better tools for
microtransit forecasting and revenue management, especially when actual usage
data are limited. We propose a nested nonparametric model for joint travel mode
and ride pass subscription choice, estimated using marginal subscription data
and synthetic populations. The model improves microtransit choice modeling by
(1) leveraging citywide synthetic data for greater spatiotemporal granularity,
(2) employing an agent-based estimation approach to capture heterogeneous user
preferences, and (3) integrating mode choice parameters into subscription
choice modeling. We apply our methodology to a case study in Arlington, TX,
using synthetic data from Replica Inc. and microtransit data from Via. Our
model accurately predicts the number of subscribers in the upper branch and
achieves a high McFadden R2 in the lower branch (0.603 for weekday trips and
0.576 for weekend trips), while also retrieving interpretable elasticities and
consumer surplus. We further integrate the model into a simulation-based
framework for microtransit revenue management. For the ride pass pricing
policy, our simulation results show that reducing the price of the weekly pass
($25 -> $18.9) and monthly pass ($80 -> $71.5) would surprisingly increase
total revenue by $127 per day. For the subsidy policy, our simulation results
show that a 100% fare discount would reduce car trips to AT&T Stadium by 61
for a game event and increase microtransit trips to Medical City Arlington by
82, but would require subsidies of $533 per event and $483 per day,
respectively.

arXiv link: http://arxiv.org/abs/2408.12577v2

Econometrics arXiv paper, submitted: 2024-08-22

Momentum Informed Inflation-at-Risk

Authors: Tibor Szendrei, Arnab Bhattacharjee

Growth-at-Risk has recently become a key measure of macroeconomic tail-risk
and has been researched extensively. Surprisingly, the same cannot be said for
Inflation-at-Risk, even though both tails, deflation and high inflation, are
of key concern to policymakers; it has received comparatively little research.
This paper tackles this gap and provides estimates of Inflation-at-Risk. The
key insight of the paper is that inflation is best characterised by a
combination of two types of nonlinearities: quantile variation, and
conditioning on the momentum of inflation.

arXiv link: http://arxiv.org/abs/2408.12286v1

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2024-08-22

Enhancing Causal Discovery in Financial Networks with Piecewise Quantile Regression

Authors: Cameron Cornell, Lewis Mitchell, Matthew Roughan

Financial networks can be constructed using statistical dependencies found
within the price series of speculative assets. Across the various methods used
to infer these networks, there is a general reliance on predictive modelling to
capture cross-correlation effects. These methods usually model the flow of
mean-response information, or the propagation of volatility and risk within the
market. Such techniques, though insightful, do not fully capture the broader
distribution-level causality that is possible within speculative markets. This
paper introduces a novel approach, combining quantile regression with a
piecewise linear embedding scheme - allowing us to construct causality networks
that identify the complex tail interactions inherent to financial markets.
Applying this method to 260 cryptocurrency return series, we uncover
significant tail-tail causal effects and substantial causal asymmetry. We
identify a propensity for coins to be self-influencing, with comparatively
sparse cross-variable effects. Assessing all link types in conjunction, Bitcoin
stands out as the primary influencer - a nuance that is missed in conventional
linear mean-response analyses. Our findings introduce a comprehensive framework
for modelling distributional causality, paving the way towards more holistic
representations of causality in financial markets.
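
A rough flavour of combining quantile regression with a piecewise-linear
embedding can be sketched with statsmodels' QuantReg, as below; the knots,
quantile level, and synthetic data are invented, and this is not the authors'
network-construction procedure.

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.regression.quantile_regression import QuantReg

    rng = np.random.default_rng(1)
    x = rng.standard_normal(1000)                                 # lagged return of a candidate cause
    y = 0.5 * np.maximum(x - 1.0, 0) + rng.standard_normal(1000)  # effect confined to the upper tail

    # Piecewise-linear (hinge) embedding of the predictor at illustrative knots.
    knots = [-1.0, 0.0, 1.0]
    basis = np.column_stack([x] + [np.maximum(x - k, 0.0) for k in knots])
    X = sm.add_constant(basis)

    res = QuantReg(y, X).fit(q=0.9)   # upper-tail quantile regression
    print(res.params)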

arXiv link: http://arxiv.org/abs/2408.12210v1

Econometrics arXiv cross-link from eess.SY (eess.SY), submitted: 2024-08-21

An Econometric Analysis of Large Flexible Cryptocurrency-mining Consumers in Electricity Markets

Authors: Subir Majumder, Ignacio Aravena, Le Xie

In recent years, power grids have seen a surge in large cryptocurrency mining
firms, with individual consumption levels reaching 700MW. This study examines
the behavior of these firms in Texas, focusing on how their consumption is
influenced by cryptocurrency conversion rates, electricity prices, local
weather, and other factors. We transform the skewed electricity consumption
data of these firms, perform correlation analysis, and apply a seasonal
autoregressive moving average model for analysis. Our findings reveal that,
surprisingly, short-term mining electricity consumption is not directly
correlated with cryptocurrency conversion rates. Instead, the primary
influencers are the temperature and electricity prices. These firms also
adjust their consumption to avoid transmission and distribution network (T&D)
charges - commonly referred to as Four Coincident Peak (4CP) charges - during
the summer months.
As the scale of these firms is likely to surge in future years, the developed
electricity consumption model can be used to generate public, synthetic
datasets to understand the overall impact on the power grid. The developed
model could also lead to better pricing mechanisms to effectively use the
flexibility of these resources towards improving power grid reliability.
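
One way to fit a seasonal ARMA-type regression with exogenous drivers of the
kind described is sketched below using statsmodels' SARIMAX; the orders,
seasonal period, and synthetic data are placeholders, not the study's
specification.

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    rng = np.random.default_rng(2)
    n = 500
    temperature = 25 + 10 * np.sin(np.arange(n) * 2 * np.pi / 24) + rng.normal(0, 1, n)
    price = 30 + rng.normal(0, 5, n)
    load = pd.Series(400 + 2.0 * temperature - 1.5 * price + rng.normal(0, 10, n))  # synthetic MW

    exog = pd.DataFrame({"temperature": temperature, "price": price})
    model = SARIMAX(load, exog=exog, order=(1, 0, 1), seasonal_order=(1, 0, 1, 24))
    result = model.fit(disp=False)
    print(result.params)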

arXiv link: http://arxiv.org/abs/2408.12014v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2024-08-21

Valuing an Engagement Surface using a Large Scale Dynamic Causal Model

Authors: Abhimanyu Mukerji, Sushant More, Ashwin Viswanathan Kannan, Lakshmi Ravi, Hua Chen, Naman Kohli, Chris Khawand, Dinesh Mandalapu

With recent rapid growth in online shopping, AI-powered Engagement Surfaces
(ES) have become ubiquitous across retail services. These engagement surfaces
perform an increasing range of functions, including recommending new products
for purchase, reminding customers of their orders and providing delivery
notifications. Understanding the causal effect of engagement surfaces on value
driven for customers and businesses remains an open scientific question. In
this paper, we develop a dynamic causal model at scale to disentangle value
attributable to an ES, and to assess its effectiveness. We demonstrate the
application of this model to inform business decision-making by understanding
returns on investment in the ES, and identifying product lines and features
where the ES adds the most value.

arXiv link: http://arxiv.org/abs/2408.11967v1

Econometrics arXiv paper, submitted: 2024-08-21

SPORTSCausal: Spill-Over Time Series Causal Inference

Authors: Carol Liu

Randomized controlled trials (RCTs) have long been the gold standard for
causal inference across various fields, including business analysis, economic
studies, sociology, clinical research, and network learning. The primary
advantage of RCTs over observational studies lies in their ability to
significantly reduce noise from individual variance. However, RCTs depend on
strong assumptions, such as group independence, time independence, and group
randomness, which are not always feasible in real-world applications.
Traditional inferential methods, including analysis of covariance (ANCOVA),
often fail when these assumptions do not hold. In this paper, we propose a
novel approach named Spillover Time Series Causal (SPORTSCausal), which enables
the estimation of treatment effects without relying on these stringent
assumptions. We demonstrate the practical applicability of SPORTSCausal through
a real-world budget-control experiment, in which data were collected from both
a 5% live experiment and a 50% live experiment using the same treatment. Due to
the spillover effect, the vanilla estimate of the treatment effect was not
robust across different treatment sizes, whereas SPORTSCausal provided a robust
estimate.

arXiv link: http://arxiv.org/abs/2408.11951v1

Econometrics arXiv updated paper (originally submitted: 2024-08-21)

Actually, There is No Rotational Indeterminacy in the Approximate Factor Model

Authors: Philipp Gersing

We show that in the approximate factor model the population normalised
principal components converge in mean square (up to sign) under the standard
assumptions for $n\to \infty$. Consequently, we have a generic interpretation
of what the principal components estimator is actually identifying and existing
results on factor identification are reinforced and refined. Based on this
result, we provide a new asymptotic theory for the approximate factor model
entirely without rotation matrices. We show that the factor space is
consistently estimated with finite $T$ for $n\to \infty$, while consistency of
the factors themselves, i.e. the $L^2$ limit of the normalised principal
components, requires that both $n$ and $T$ tend to infinity.
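
For reference, the normalised principal components referred to above can be
computed from a $T \times n$ panel as in the generic sketch below; the
dimensions and data are illustrative, and the snippet shows the estimator, not
the paper's asymptotic argument.

    import numpy as np

    rng = np.random.default_rng(3)
    T, n, r = 50, 200, 2                          # small T, large n, r factors
    F = rng.standard_normal((T, r))               # latent factors
    L = rng.standard_normal((n, r))               # loadings
    X = F @ L.T + rng.standard_normal((T, n))     # approximate factor model

    # Normalised principal components: eigenvectors of XX'/(nT), scaled by sqrt(T).
    eigval, eigvec = np.linalg.eigh(X @ X.T / (n * T))
    F_hat = np.sqrt(T) * eigvec[:, ::-1][:, :r]   # top-r components

    # The estimated factor space should be close to the span of F when n is large.
    proj = F_hat @ np.linalg.lstsq(F_hat, F, rcond=None)[0]
    r2 = 1 - ((F - proj) ** 2).sum() / (F ** 2).sum()
    print("R^2 of true factors on the estimated factor space:", round(r2, 3))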

arXiv link: http://arxiv.org/abs/2408.11676v2

Econometrics arXiv paper, submitted: 2024-08-21

Robust Bayes Treatment Choice with Partial Identification

Authors: Andrés Aradillas Fernández, José Luis Montiel Olea, Chen Qiu, Jörg Stoye, Serdil Tinda

We study a class of binary treatment choice problems with partial
identification, through the lens of robust (multiple prior) Bayesian analysis.
We use a convenient set of prior distributions to derive ex-ante and ex-post
robust Bayes decision rules, both for decision makers who can randomize and for
decision makers who cannot.
Our main messages are as follows: First, ex-ante and ex-post robust Bayes
decision rules do not tend to agree in general, whether or not randomized rules
are allowed. Second, randomized treatment assignment for some data realizations
can be optimal in both ex-ante and, perhaps more surprisingly, ex-post
problems. Therefore, excluding randomized rules from consideration usually
entails a loss of generality, even when regret is evaluated ex-post.
We apply our results to a stylized problem where a policy maker uses
experimental data to choose whether to implement a new policy in a population
of interest, but is concerned about the external validity of the experiment at
hand (Stoye, 2012); and to the aggregation of data generated by multiple
randomized control trials in different sites to make a policy choice in a
population for which no experimental data are available (Manski, 2020; Ishihara
and Kitagawa, 2021).

arXiv link: http://arxiv.org/abs/2408.11621v1

Econometrics arXiv paper, submitted: 2024-08-21

Towards an Inclusive Approach to Corporate Social Responsibility (CSR) in Morocco: CGEM's Commitment

Authors: Gnaoui Imane, Moutahaddib Aziz

Corporate social responsibility encourages companies to integrate social and
environmental concerns into their activities and their relations with
stakeholders. It encompasses all actions aimed at the social good, above and
beyond corporate interests and legal requirements. Various international
organizations, authors and researchers have explored the notion of CSR and
proposed a range of definitions reflecting their perspectives on the concept.
In Morocco, although Moroccan companies are not overwhelmingly embracing CSR,
several factors are encouraging them to integrate the CSR approach not only
into their discourse, but also into their strategies. The CGEM is actively
involved in promoting CSR within Moroccan companies, awarding the "CGEM Label
for CSR" to companies that meet the criteria set out in the CSR Charter. The
process of labeling Moroccan companies is expanding rapidly. The graphs
presented in this article are broken down according to several criteria, such
as company size, sector of activity and listing on the Casablanca Stock
Exchange, in order to provide an overview of CSR-labeled companies in Morocco.
The approach adopted for this article is a qualitative one aimed at presenting,
firstly, the different definitions of the CSR concept and its evolution over
time. In this way, the study focuses on the Moroccan context to dissect and
analyze the state of progress of CSR integration in Morocco and the various
efforts made by the CGEM to implement it. According to the data, 124 Moroccan
companies have been awarded the CSR label. For a label in existence since 2006,
this figure reflects a certain reluctance on the part of Moroccan companies to
fully implement the CSR approach in their strategies. Nevertheless, Morocco is
in a transitional phase, marked by the gradual adoption of various socially
responsible practices.

arXiv link: http://arxiv.org/abs/2408.11519v1

Econometrics arXiv updated paper (originally submitted: 2024-08-20)

Inference with Many Weak Instruments and Heterogeneity

Authors: Luther Yap

This paper considers inference in a linear instrumental variable regression
model with many potentially weak instruments, in the presence of heterogeneous
treatment effects. I first show that existing test procedures, including those
that are robust to either weak instruments or heterogeneous treatment effects,
can be arbitrarily oversized. I propose a novel and valid test based on a score
statistic and a "leave-three-out" variance estimator. In the presence of
heterogeneity and within the class of tests that are functions of the
leave-one-out analog of a maximal invariant, this test is asymptotically the
uniformly most powerful unbiased test. In two applications to judge and
quarter-of-birth instruments, the proposed inference procedure also yields a
bounded confidence set while some existing methods yield unbounded or empty
confidence sets.

arXiv link: http://arxiv.org/abs/2408.11193v3

Econometrics arXiv paper, submitted: 2024-08-20

Conditional nonparametric variable screening by neural factor regression

Authors: Jianqing Fan, Weining Wang, Yue Zhao

High-dimensional covariates often admit linear factor structure. To
effectively screen correlated covariates in high-dimension, we propose a
conditional variable screening test based on non-parametric regression using
neural networks due to their representation power. We ask whether individual
covariates have additional contributions given the latent factors or, more
generally, a set of variables. Our test statistics are based on the estimated
partial derivative of the regression function of the candidate variable for
screening and an observable proxy for the latent factors. Hence,
our test reveals how much predictors contribute additionally to the
non-parametric regression after accounting for the latent factors. Our
derivative estimator is the convolution of a deep neural network regression
estimator and a smoothing kernel. We demonstrate that when the neural network
size diverges with the sample size, unlike estimating the regression function
itself, it is necessary to smooth the partial derivative of the neural network
estimator to recover the desired convergence rate for the derivative. Moreover,
our screening test achieves asymptotic normality under the null after finely
centering our test statistics, which makes the biases negligible, as well as
consistency for local alternatives under mild conditions. We demonstrate the
performance of our test in a simulation study and two real world applications.

arXiv link: http://arxiv.org/abs/2408.10825v1

Econometrics arXiv paper, submitted: 2024-08-20

Gradient Wild Bootstrap for Instrumental Variable Quantile Regressions with Weak and Few Clusters

Authors: Wenjie Wang, Yichong Zhang

We study the gradient wild bootstrap-based inference for instrumental
variable quantile regressions in the framework of a small number of large
clusters in which the number of clusters is viewed as fixed, and the number of
observations for each cluster diverges to infinity. For the Wald inference, we
show that our wild bootstrap Wald test, with or without studentization using
the cluster-robust covariance estimator (CRVE), controls size asymptotically up
to a small error as long as the parameter of the endogenous variable is strongly
identified in at least one of the clusters. We further show that the wild
bootstrap Wald test with CRVE studentization is more powerful for distant local
alternatives than that without. Last, we develop a wild bootstrap
Anderson-Rubin (AR) test for the weak-identification-robust inference. We show
it controls size asymptotically up to a small error, even under weak or partial
identification for all clusters. We illustrate the good finite-sample
performance of the new inference methods using simulations and provide an
empirical application to a well-known dataset about US local labor markets.

arXiv link: http://arxiv.org/abs/2408.10686v1

Econometrics arXiv updated paper (originally submitted: 2024-08-20)

Continuous difference-in-differences with double/debiased machine learning

Authors: Lucas Z. Zhang

This paper extends difference-in-differences to settings with continuous
treatments. Specifically, the average treatment effect on the treated (ATT) at
any level of treatment intensity is identified under a conditional parallel
trends assumption. Estimating the ATT in this framework requires first
estimating infinite-dimensional nuisance parameters, particularly the
conditional density of the continuous treatment, which can introduce
substantial bias. To address this challenge, we propose estimators for the
causal parameters under the double/debiased machine learning framework and
establish their asymptotic normality. Additionally, we provide consistent
variance estimators and construct uniform confidence bands based on a
multiplier bootstrap procedure. To demonstrate the effectiveness of our
approach, we apply our estimators to the 1983 Medicare Prospective Payment
System (PPS) reform studied by Acemoglu and Finkelstein (2008), reframing it as
a DiD with continuous treatment and nonparametrically estimating its effects.

arXiv link: http://arxiv.org/abs/2408.10509v4

Econometrics arXiv cross-link from stat.CO (stat.CO), submitted: 2024-08-19

kendallknight: An R Package for Efficient Implementation of Kendall's Correlation Coefficient Computation

Authors: Mauricio Vargas Sepúlveda

The kendallknight package introduces an efficient implementation of Kendall's
correlation coefficient computation, significantly improving the processing
time for large datasets without sacrificing accuracy. The kendallknight
package, following Knight (1966) and posterior literature, reduces the
computational complexity resulting in drastic reductions in computation time,
transforming operations that would take minutes or hours into milliseconds or
minutes, while maintaining precision and correctly handling edge cases and
errors. The package is particularly advantageous in econometric and statistical
contexts where rapid and accurate calculation of Kendall's correlation
coefficient is desirable. Benchmarks demonstrate substantial performance gains
over the base R implementation, especially for large datasets.
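
The package itself is written in R; purely as a language-agnostic illustration
of the Knight (1966) idea, replacing all-pairs comparisons with sorting plus
merge-sort inversion counting, here is a minimal Python sketch for tie-free
data (it is not the package's API).

    import numpy as np

    def kendall_tau_knight(x, y):
        # O(n log n) Kendall tau-a for tie-free data: sort by x, then count
        # discordant pairs as inversions in y via merge sort.
        y = np.asarray(y)[np.argsort(x)]

        def sort_count(a):
            if len(a) <= 1:
                return a, 0
            mid = len(a) // 2
            left, dl = sort_count(a[:mid])
            right, dr = sort_count(a[mid:])
            merged, i, j, d = [], 0, 0, dl + dr
            while i < len(left) and j < len(right):
                if left[i] <= right[j]:
                    merged.append(left[i]); i += 1
                else:
                    merged.append(right[j]); j += 1
                    d += len(left) - i          # remaining left elements form inversions
            merged.extend(left[i:]); merged.extend(right[j:])
            return merged, d

        n = len(y)
        _, discordant = sort_count(list(y))
        return 1.0 - 4.0 * discordant / (n * (n - 1))

    rng = np.random.default_rng(4)
    x, y = rng.normal(size=1000), rng.normal(size=1000)
    print(kendall_tau_knight(x, y + 0.5 * x))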

arXiv link: http://arxiv.org/abs/2408.09618v5

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-08-18

Experimental Design For Causal Inference Through An Optimization Lens

Authors: Jinglong Zhao

The study of experimental design offers tremendous benefits for answering
causal questions across a wide range of applications, including agricultural
experiments, clinical trials, industrial experiments, social experiments, and
digital experiments. Although valuable in such applications, the costs of
experiments often drive experimenters to seek more efficient designs. Recently,
experimenters have started to examine such efficiency questions from an
optimization perspective, as experimental design problems are fundamentally
decision-making problems. This perspective offers a lot of flexibility in
leveraging various existing optimization tools to study experimental design
problems. This manuscript thus aims to examine the foundations of experimental
design problems in the context of causal inference as viewed through an
optimization lens.

arXiv link: http://arxiv.org/abs/2408.09607v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-08-18

Anytime-Valid Inference for Double/Debiased Machine Learning of Causal Parameters

Authors: Abhinandan Dalal, Patrick Blöbaum, Shiva Kasiviswanathan, Aaditya Ramdas

Double (debiased) machine learning (DML) has seen widespread use in recent
years for learning causal/structural parameters, in part due to its flexibility
and adaptability to high-dimensional nuisance functions as well as its ability
to avoid bias from regularization or overfitting. However, the classic
double-debiased framework is only valid asymptotically for a predetermined
sample size, thus lacking the flexibility of collecting more data if sharper
inference is needed, or stopping data collection early if useful inferences can
be made earlier than expected. This can be of particular concern in large scale
experimental studies with huge financial costs or human lives at stake, as well
as in observational studies where the length of confidence intervals does not
shrink to zero even with increasing sample size due to partial identifiability
of a structural parameter. In this paper, we present time-uniform counterparts
to the asymptotic DML results, enabling valid inference and confidence
intervals for structural parameters to be constructed at any arbitrary
(possibly data-dependent) stopping time. We provide conditions which are only
slightly stronger than the standard DML conditions, but offer the stronger
guarantee for anytime-valid inference. This facilitates the transformation of
any existing DML method to provide anytime-valid guarantees with minimal
modifications, making it highly adaptable and easy to use. We illustrate our
procedure using two instances: a) local average treatment effect in online
experiments with non-compliance, and b) partial identification of average
treatment effect in observational studies with potential unmeasured
confounding.

arXiv link: http://arxiv.org/abs/2408.09598v2

Econometrics arXiv paper, submitted: 2024-08-18

Deep Learning for the Estimation of Heterogeneous Parameters in Discrete Choice Models

Authors: Stephan Hetzenecker, Maximilian Osterhaus

This paper studies the finite sample performance of the flexible estimation
approach of Farrell, Liang, and Misra (2021a), who propose to use deep learning
for the estimation of heterogeneous parameters in economic models, in the
context of discrete choice models. The approach combines the structure imposed
by economic models with the flexibility of deep learning, which assures the
interpretability of results on the one hand, and allows estimating flexible
functional forms of observed heterogeneity on the other hand. For inference
after the estimation with deep learning, Farrell et al. (2021a) derive an
influence function that can be applied to many quantities of interest. We
conduct a series of Monte Carlo experiments that investigate the impact of
regularization on the proposed estimation and inference procedure in the
context of discrete choice models. The results show that the deep learning
approach generally leads to precise estimates of the true average parameters
and that regular robust standard errors lead to invalid inference results,
showing the need for the influence function approach for inference. Without
regularization, the influence function approach can lead to substantial bias
and large estimated standard errors caused by extreme outliers. Regularization
reduces this property and stabilizes the estimation procedure, but at the
expense of inducing an additional bias. The bias in combination with decreasing
variance associated with increasing regularization leads to the construction of
invalid inferential statements in our experiments. Repeated sample splitting,
unlike regularization, stabilizes the estimation approach without introducing
an additional bias, thereby allowing for the construction of valid inferential
statements.

arXiv link: http://arxiv.org/abs/2408.09560v1

Econometrics arXiv updated paper (originally submitted: 2024-08-17)

Counterfactual and Synthetic Control Method: Causal Inference with Instrumented Principal Component Analysis

Authors: Cong Wang

In this paper, we propose a novel method for causal inference within the
framework of counterfactual and synthetic control. Building on the generalized
synthetic control method, our instrumented principal component analysis method
instruments factor loadings with predictive covariates rather than including
them as regressors. These instrumented factor loadings exhibit time-varying
dynamics, offering a better economic interpretation. Covariates are
instrumented through a transformation matrix, $\Gamma$; when there are many
covariates, this matrix can easily be reduced in accordance with a small number
of latent factors, which helps us handle high-dimensional datasets effectively
and keeps the model parsimonious. Moreover, this way of handling covariates is
less exposed to model misspecification and achieves better prediction accuracy.
Our simulations show that this method is less biased in
the presence of unobserved covariates compared to other mainstream approaches.
In the empirical application, we use the proposed method to evaluate the effect
of Brexit on foreign direct investment to the UK.

arXiv link: http://arxiv.org/abs/2408.09271v2

Econometrics arXiv updated paper (originally submitted: 2024-08-17)

Externally Valid Selection of Experimental Sites via the k-Median Problem

Authors: José Luis Montiel Olea, Brenda Prallon, Chen Qiu, Jörg Stoye, Yiwei Sun

We present a decision-theoretic justification for viewing the question of how
to best choose where to experiment in order to optimize external validity as a
$k$-median problem, a popular problem in computer science and operations
research. We present conditions under which minimizing the worst-case,
welfare-based regret among all nonrandom schemes that select $k$ sites to
experiment is approximately equal - and sometimes exactly equal - to finding
the k most central vectors of baseline site-level covariates. The k-median
problem can be formulated as a linear integer program. Two empirical
applications illustrate the theoretical and computational benefits of the
suggested procedure.
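
For very small instances the $k$-median objective described above can be
evaluated by brute force, as in the sketch below; the paper instead formulates
the problem as a linear integer program, and the covariate matrix and the
choice of $k$ here are invented.

    import numpy as np
    from itertools import combinations

    rng = np.random.default_rng(5)
    sites = rng.standard_normal((12, 3))   # baseline covariates of 12 candidate sites
    k = 3

    def kmedian_cost(chosen):
        # Sum over all sites of the distance to the nearest chosen site.
        d = np.linalg.norm(sites[:, None, :] - sites[None, chosen, :], axis=2)
        return d.min(axis=1).sum()

    best = min(combinations(range(len(sites)), k),
               key=lambda c: kmedian_cost(list(c)))
    print("Most central sites to experiment on:", best)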

arXiv link: http://arxiv.org/abs/2408.09187v2

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2024-08-17

Method of Moments Estimation for Affine Stochastic Volatility Models

Authors: Yan-Feng Wu, Xiangyu Yang, Jian-Qiang Hu

We develop moment estimators for the parameters of affine stochastic
volatility models. We first address the challenge of calculating moments for
the models by introducing a recursive equation for deriving closed-form
expressions for moments of any order. Consequently, we propose our moment
estimators. We then establish a central limit theorem for our estimators and
derive the explicit formulas for the asymptotic covariance matrix. Finally, we
provide numerical results to validate our method.

arXiv link: http://arxiv.org/abs/2408.09185v1

Econometrics arXiv updated paper (originally submitted: 2024-08-16)

Revisiting the Many Instruments Problem using Random Matrix Theory

Authors: Helmut Farbmacher, Rebecca Groh, Michael Mühlegger, Gabriel Vollert

Instrumental variables estimation with many instruments is biased.
Traditional bias-adjustments are closely connected to the Silverstein equation.
Based on the theory of random matrices, we show that Ridge estimation of the
first-stage parameters reduces the implicit price of bias-adjustments. This
leads to a trade-off, allowing for less costly estimation of the causal effect,
which comes along with improved asymptotic properties. Our theoretical results
nest existing ones on bias approximation and adjustment with ordinary
least-squares in the first-stage regression and, moreover, generalize them to
settings with more instruments than observations. Finally, we derive the
optimal tuning parameter of Ridge regressions in simultaneous equations models,
which comprises the well-known result for single equation models as a special
case with uncorrelated error terms.
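
A bare-bones version of the underlying idea, ridge regularization in the first
stage followed by a standard IV-type second stage, might look as follows; this
omits the paper's bias adjustment and optimal tuning, and all data and the
penalty value are synthetic.

    import numpy as np

    rng = np.random.default_rng(6)
    n, m = 500, 300                         # many instruments relative to n
    Z = rng.standard_normal((n, m))
    pi = np.full(m, 0.02)                   # individually weak instruments
    u = rng.standard_normal(n)
    x = Z @ pi + u                          # endogenous regressor
    y = 1.0 * x + 0.8 * u + rng.standard_normal(n)

    lam = 10.0                              # illustrative ridge penalty
    pi_hat = np.linalg.solve(Z.T @ Z + lam * np.eye(m), Z.T @ x)
    x_hat = Z @ pi_hat                      # ridge-fitted first stage
    beta_hat = (x_hat @ y) / (x_hat @ x)    # second-stage IV-type estimate
    print(f"ridge-first-stage estimate of beta: {beta_hat:.3f}")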

arXiv link: http://arxiv.org/abs/2408.08580v2

Econometrics arXiv paper, submitted: 2024-08-14

Quantile and Distribution Treatment Effects on the Treated with Possibly Non-Continuous Outcomes

Authors: Nelly K. Djuazon, Emmanuel Selorm Tsyawo

Quantile and Distribution Treatment effects on the Treated (QTT/DTT) for
non-continuous outcomes are either not identified or inference thereon is
infeasible using existing methods. By introducing functional index parallel
trends and no anticipation assumptions, this paper identifies and provides
uniform inference procedures for QTT/DTT. The inference procedure applies under
both the canonical two-group and staggered treatment designs with balanced
panels, unbalanced panels, or repeated cross-sections. Monte Carlo experiments
demonstrate the proposed method's robust and competitive performance, while an
empirical application illustrates its practical utility.

arXiv link: http://arxiv.org/abs/2408.07842v1

Econometrics arXiv paper, submitted: 2024-08-14

Your MMM is Broken: Identification of Nonlinear and Time-varying Effects in Marketing Mix Models

Authors: Ryan Dew, Nicolas Padilla, Anya Shchetkina

Recent years have seen a resurgence in interest in marketing mix models
(MMMs), which are aggregate-level models of marketing effectiveness. Often
these models incorporate nonlinear effects, and either implicitly or explicitly
assume that marketing effectiveness varies over time. In this paper, we show
that nonlinear and time-varying effects are often not identifiable from
standard marketing mix data: while certain data patterns may be suggestive of
nonlinear effects, such patterns may also emerge under simpler models that
incorporate dynamics in marketing effectiveness. This lack of identification is
problematic because nonlinearities and dynamics suggest fundamentally different
optimal marketing allocations. We examine this identification issue through
theory and simulations, wherein we explore the exact conditions under which
conflation between the two types of models is likely to occur. In doing so, we
introduce a flexible Bayesian nonparametric model that allows us to both
flexibly simulate and estimate different data-generating processes. We show
that conflating the two types of effects is especially likely in the presence
of autocorrelated marketing variables, which are common in practice, especially
given the widespread use of stock variables to capture long-run effects of
advertising. We illustrate these ideas through numerous empirical applications
to real-world marketing mix data, showing the prevalence of the conflation
issue in practice. Finally, we show how marketers can avoid this conflation, by
designing experiments that strategically manipulate spending in ways that pin
down model form.

arXiv link: http://arxiv.org/abs/2408.07678v1

Econometrics arXiv paper, submitted: 2024-08-13

A Sparse Grid Approach for the Nonparametric Estimation of High-Dimensional Random Coefficient Models

Authors: Maximilian Osterhaus

A severe limitation of many nonparametric estimators for random coefficient
models is the exponential increase of the number of parameters in the number of
random coefficients included into the model. This property, known as the curse
of dimensionality, restricts the application of such estimators to models with
moderately few random coefficients. This paper proposes a scalable
nonparametric estimator for high-dimensional random coefficient models. The
estimator uses a truncated tensor product of one-dimensional hierarchical basis
functions to approximate the underlying random coefficients' distribution. Due
to the truncation, the number of parameters increases at a much slower rate
than in the regular tensor product basis, rendering the nonparametric
estimation of high-dimensional random coefficient models feasible. The derived
estimator allows estimating the underlying distribution with constrained least
squares, making the approach computationally simple and fast. Monte Carlo
experiments and an application to data on the regulation of air pollution
illustrate the good performance of the estimator.

arXiv link: http://arxiv.org/abs/2408.07185v1

Econometrics arXiv updated paper (originally submitted: 2024-08-13)

Endogeneity Corrections in Binary Outcome Models with Nonlinear Transformations: Identification and Inference

Authors: Alexander Mayer, Dominik Wied

For binary outcome models, an endogeneity correction based on nonlinear
rank-based transformations is proposed. Identification without external
instruments is achieved under one of two assumptions: either the endogenous
regressor is a nonlinear function of one component of the error term,
conditional on the exogenous regressors, or the dependence between the
endogenous and exogenous regressors is nonlinear. Under these conditions, we
prove consistency and asymptotic normality. Monte Carlo simulations and an
application on German insolvency data illustrate the usefulness of the method.

arXiv link: http://arxiv.org/abs/2408.06977v5

Econometrics arXiv paper, submitted: 2024-08-13

Panel Data Unit Root testing: Overview

Authors: Anton Skrobotov

This review discusses methods of testing for a panel unit root. Modern
approaches to testing in cross-sectionally correlated panels are discussed,
preceded by an analysis of independent panels. In addition,
methods for testing in the case of non-linearity in the data (for example, in
the case of structural breaks) are presented, as well as methods for testing in
short panels, when the time dimension is small and finite. In conclusion, links
to existing packages that allow implementing some of the described methods are
provided.

arXiv link: http://arxiv.org/abs/2408.08908v1

Econometrics arXiv updated paper (originally submitted: 2024-08-13)

Estimation and Inference of Average Treatment Effect in Percentage Points under Heterogeneity

Authors: Ying Zeng

In semi-logarithmic regressions, treatment coefficients are often interpreted
as approximations of the average treatment effect (ATE) in percentage points.
This paper highlights the overlooked bias of this approximation under treatment
effect heterogeneity, arising from Jensen's inequality. The issue is
particularly relevant for difference-in-differences designs with
log-transformed outcomes and staggered treatment adoption, where treatment
effects often vary across groups and periods. I propose new estimation and
inference methods for the ATE in percentage points, which are applicable when
treatment effects vary across and within groups. I establish the methods'
large-sample properties and demonstrate their finite-sample performance through
simulations, revealing substantial discrepancies between conventional and
proposed measures. Two empirical applications further underscore the practical
importance of these methods.
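
A quick numerical illustration of the Jensen-inequality gap discussed above,
using made-up heterogeneous log effects, shows how exponentiating the average
coefficient can understate the average effect in percentage points.

    import numpy as np

    log_effects = np.array([0.05, 0.10, 0.60])   # heterogeneous effects on the log outcome
    avg_log = log_effects.mean()

    conventional_pct = 100 * (np.exp(avg_log) - 1)      # exp(average log effect) - 1
    ate_pct = 100 * (np.exp(log_effects) - 1).mean()    # average of exp(effect) - 1

    print(f"conventional approximation: {conventional_pct:.1f}%")
    print(f"ATE in percentage points  : {ate_pct:.1f}%")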

arXiv link: http://arxiv.org/abs/2408.06624v2

Econometrics arXiv paper, submitted: 2024-08-12

An unbounded intensity model for point processes

Authors: Kim Christensen, Alexei Kolokolov

We develop a model for point processes on the real line, where the intensity
can be locally unbounded without inducing an explosion. In contrast to an
orderly point process, for which the probability of observing more than one
event over a short time interval is negligible, the bursting intensity causes
an extreme clustering of events around the singularity. We propose a
nonparametric approach to detect such bursts in the intensity. It relies on a
heavy traffic condition, which admits inference for point processes over a
finite time interval. With Monte Carlo evidence, we show that our testing
procedure exhibits size control under the null, whereas it has high rejection
rates under the alternative. We implement our approach on high-frequency data
for the EUR/USD spot exchange rate, where the test statistic captures abnormal
surges in trading activity. We detect a nontrivial amount of intensity bursts
in these data and describe their basic properties. Trading activity during an
intensity burst is positively related to volatility, illiquidity, and the
probability of observing a drift burst. The latter effect is reinforced if the
order flow is imbalanced or the price elasticity of the limit order book is
large.

arXiv link: http://arxiv.org/abs/2408.06519v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2024-08-12

Method-of-Moments Inference for GLMs and Doubly Robust Functionals under Proportional Asymptotics

Authors: Xingyu Chen, Lin Liu, Rajarshi Mukherjee

In this paper, we consider the estimation of regression coefficients and
signal-to-noise (SNR) ratio in high-dimensional Generalized Linear Models
(GLMs), and explore their implications in inferring popular estimands such as
average treatment effects in high-dimensional observational studies. Under the
“proportional asymptotic” regime and Gaussian covariates with known
(population) covariance $\Sigma$, we derive Consistent and Asymptotically
Normal (CAN) estimators of our targets of inference through a Method-of-Moments
type of estimators that bypasses estimation of high dimensional nuisance
functions and hyperparameter tuning altogether. Additionally, under
non-Gaussian covariates, we demonstrate universality of our results under
certain additional assumptions on the regression coefficients and $\Sigma$. We
also demonstrate that knowing $\Sigma$ is not essential to our proposed
methodology when the sample covariance matrix estimator is invertible. Finally,
we complement our theoretical results with numerical experiments and
comparisons with existing literature.

arXiv link: http://arxiv.org/abs/2408.06103v3

Econometrics arXiv updated paper (originally submitted: 2024-08-11)

Correcting invalid regression discontinuity designs with multiple time period data

Authors: Dor Leventer, Daniel Nevo

Regression Discontinuity (RD) designs rely on the continuity of potential
outcome means at the cutoff, but this assumption often fails when other
treatments or policies are implemented at this cutoff. We characterize the bias
in sharp and fuzzy RD designs due to violations of continuity, and develop a
general identification framework that leverages multiple time periods to
estimate local effects on the (un)treated. We extend the framework to settings
with carry-over effects and time-varying running variables, highlighting
additional assumptions needed for valid causal inference. We propose an
estimation framework that extends the conventional and bias-corrected
single-period local linear regression framework to multiple periods and
different sampling schemes, and study its finite-sample performance in
simulations. Finally, we revisit a prior study on fiscal rules in Italy to
illustrate the practical utility of our approach.

arXiv link: http://arxiv.org/abs/2408.05847v2

Econometrics arXiv paper, submitted: 2024-08-11

Bank Cost Efficiency and Credit Market Structure Under a Volatile Exchange Rate

Authors: Mikhail Mamonov, Christopher Parmeter, Artem Prokhorov

We study the impact of exchange rate volatility on cost efficiency and market
structure in a cross-section of banks that have non-trivial exposures to
foreign currency (FX) operations. We use unique data on quarterly revaluations
of FX assets and liabilities (Revals) that Russian banks were reporting between
2004 Q1 and 2020 Q2. First, we document that Revals constitute the
largest part of the banks' total costs, 26.5% on average, with considerable
variation across banks. Second, we find that stochastic estimates of cost
efficiency are both severely downward biased -- by 30% on average -- and
generally not rank preserving when Revals are ignored, except for the tails, as
our nonparametric copulas reveal. To ensure generalizability to other emerging
market economies, we suggest a two-stage approach that does not rely on Revals
but is able to shrink the downward bias in cost efficiency estimates by
two-thirds. Third, we show that Revals are triggered by the mismatch in
the banks' FX operations, which, in turn, is driven by household FX deposits
and the instability of the Ruble's exchange rate. Fourth, we find that the
failure to account for Revals leads to the erroneous conclusion that the credit
market is inefficient, which is driven by the upper quartile of the banks'
distribution by total assets. Revals have considerable negative implications
for financial stability which can be attenuated by the cross-border
diversification of bank assets.

arXiv link: http://arxiv.org/abs/2408.05688v1

Econometrics arXiv updated paper (originally submitted: 2024-08-11)

Change-Point Detection in Time Series Using Mixed Integer Programming

Authors: Artem Prokhorov, Peter Radchenko, Alexander Semenov, Anton Skrobotov

We use cutting-edge mixed integer optimization (MIO) methods to develop a
framework for detection and estimation of structural breaks in time series
regression models. The framework is constructed based on the least squares
problem subject to a penalty on the number of breakpoints. We restate the
$l_0$-penalized regression problem as a quadratic programming problem with
integer- and real-valued arguments and show that MIO is capable of finding
provably optimal solutions using a well-known optimization solver. Compared to
the popular $l_1$-penalized regression (LASSO) and other classical methods, the
MIO framework permits simultaneous estimation of the number and location of
structural breaks as well as regression coefficients, while accommodating the
option of specifying a given or minimal number of breaks. We derive the
asymptotic properties of the estimator and demonstrate its effectiveness
through extensive numerical experiments, confirming a more accurate estimation
of multiple breaks as compared to popular non-MIO alternatives. Two empirical
examples demonstrate usefulness of the framework in applications from business
and economic statistics.
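
To show only the $l_0$-penalized objective being optimized, not the MIO
machinery, the toy sketch below runs an exhaustive search over breakpoint
locations in a mean-shift model; the penalty value and data are illustrative.

    import numpy as np
    from itertools import combinations

    rng = np.random.default_rng(7)
    y = np.concatenate([rng.normal(0, 1, 30), rng.normal(3, 1, 30), rng.normal(1, 1, 30)])
    lam = 10.0                               # illustrative penalty per breakpoint

    def penalized_sse(breaks):
        # Sum of squared errors around segment means, plus an l0 penalty.
        bounds = [0, *breaks, len(y)]
        sse = sum(((y[a:b] - y[a:b].mean()) ** 2).sum()
                  for a, b in zip(bounds, bounds[1:]))
        return sse + lam * len(breaks)

    candidates = [()]
    for k in (1, 2):
        candidates += list(combinations(range(5, len(y) - 5), k))
    print("estimated breakpoints:", min(candidates, key=penalized_sse))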

arXiv link: http://arxiv.org/abs/2408.05665v3

Econometrics arXiv updated paper (originally submitted: 2024-08-09)

ARMA-Design: Optimal Treatment Allocation Strategies for A/B Testing in Partially Observable Time Series Experiments

Authors: Ke Sun, Linglong Kong, Hongtu Zhu, Chengchun Shi

Online experiments are frequently employed in many technological companies to
evaluate the performance of a newly developed policy, product, or treatment
relative to a baseline control. In many applications, the experimental units
receive a sequence of treatments over time. To handle these time-dependent
settings, existing A/B testing solutions typically assume a fully observable
experimental environment that satisfies the Markov condition. However, this
assumption often does not hold in practice.
This paper studies the optimal design for A/B testing in partially observable
online experiments. We introduce a controlled (vector) autoregressive moving
average model to capture partial observability. We introduce a small signal
asymptotic framework to simplify the calculation of asymptotic mean squared
errors of average treatment effect estimators under various designs. We develop
two algorithms to estimate the optimal design: one utilizing constrained
optimization and the other employing reinforcement learning. We demonstrate the
superior performance of our designs using two dispatch simulators that
realistically mimic the behaviors of drivers and passengers to create virtual
environments, along with two real datasets from a ride-sharing company. A
Python implementation of our proposal is available at
https://github.com/datake/ARMADesign.

arXiv link: http://arxiv.org/abs/2408.05342v4

Econometrics arXiv paper, submitted: 2024-08-09

What are the real implications for $CO_2$ as generation from renewables increases?

Authors: Dhruv Suri, Jacques de Chalendar, Ines Azevedo

Wind and solar electricity generation account for 14% of total electricity
generation in the United States and are expected to continue to grow in the
next decades. In low carbon systems, generation from renewable energy sources
displaces conventional fossil fuel power plants resulting in lower system-level
emissions and emissions intensity. However, we find that intermittent
generation from renewables changes the way conventional thermal power plants
operate, and that the displacement of generation is not one-to-one as expected. Our
work provides a method that allows policy and decision makers to continue to
track the effect of additional renewable capacity and the resulting thermal
power plant operational responses.

arXiv link: http://arxiv.org/abs/2408.05209v1

Econometrics arXiv paper, submitted: 2024-08-08

Vela: A Data-Driven Proposal for Joint Collaboration in Space Exploration

Authors: Holly M. Dinkel, Jason K. Cornelius

The UN Office of Outer Space Affairs identifies synergy of space development
activities and international cooperation through data and infrastructure
sharing in their Sustainable Development Goal 17 (SDG17). Current multilateral
space exploration paradigms, however, are divided between the Artemis and the
Roscosmos-CNSA programs to return to the moon and establish permanent human
settlements. As space agencies work to expand human presence in space, economic
resource consolidation in pursuit of technologically ambitious space
expeditions is the most sensible path to accomplish SDG17. This paper compiles
a budget dataset for the top five federally-funded space agencies: CNSA, ESA,
JAXA, NASA, and Roscosmos. Using time-series econometric analysis methods in
STATA, this work analyzes each agency's economic contributions toward space
exploration. The dataset results are used to propose a multinational space
mission, Vela, for the development of an orbiting space station around Mars in
the late 2030s. Distribution of economic resources and technological
capabilities by the respective space programs are proposed to ensure
programmatic redundancy and increase the odds of success on the given timeline.

arXiv link: http://arxiv.org/abs/2408.04730v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2024-08-08

Difference-in-Differences for Health Policy and Practice: A Review of Modern Methods

Authors: Shuo Feng, Ishani Ganguli, Youjin Lee, John Poe, Andrew Ryan, Alyssa Bilinski

Difference-in-differences (DiD) is the most popular observational causal
inference method in health policy, employed to evaluate the real-world impact
of policies and programs. To estimate treatment effects, DiD relies on the
"parallel trends assumption", that on average treatment and comparison groups
would have had parallel trajectories in the absence of an intervention.
Historically, DiD has been considered broadly applicable and straightforward to
implement, but recent years have seen rapid advancements in DiD methods. This
paper reviews and synthesizes these innovations for medical and health policy
researchers. We focus on four topics: (1) assessing the parallel trends
assumption in health policy contexts; (2) relaxing the parallel trends
assumption when appropriate; (3) employing estimators to account for staggered
treatment timing; and (4) conducting robust inference for analyses in which
normal-based clustered standard errors are inappropriate. For each, we explain
challenges and common pitfalls in traditional DiD and modern methods available
to address these issues.

arXiv link: http://arxiv.org/abs/2408.04617v1

Econometrics arXiv paper, submitted: 2024-08-08

Semiparametric Estimation of Individual Coefficients in a Dyadic Link Formation Model Lacking Observable Characteristics

Authors: L. Sanna Stephan

Dyadic network formation models have wide applicability in economic research,
yet are difficult to estimate in the presence of individual specific effects
and in the absence of distributional assumptions regarding the model noise
component. The availability of (continuously distributed) individual or link
characteristics generally facilitates estimation. Yet, while data on social
networks has recently become more abundant, the characteristics of the entities
involved in the link may not be measured. Adapting the procedure of KS,
I propose to use network data alone in a semiparametric estimation of the
individual fixed effect coefficients, which carry the interpretation of the
individual relative popularity. This entails the possibility to anticipate how
a new-coming individual will connect in a pre-existing group. The estimator,
needed for its fast convergence, does not impose the monotonicity assumption
regarding the model noise component, thereby potentially reversing the order of
the fixed effect coefficients. This and other numerical issues can be
conveniently tackled by my novel, data-driven way of normalising the fixed
effects, which proves to outperform a conventional standardisation in many
cases. I demonstrate that the normalised coefficients converge both at the same
rate and to the same limiting distribution as if the true error distribution
were known. The cost of semiparametric estimation is thus purely computational,
while the potential benefits are large whenever the errors have a strongly
convex or strongly concave distribution.

arXiv link: http://arxiv.org/abs/2408.04552v1

Econometrics arXiv paper, submitted: 2024-08-07

Robust Estimation of Regression Models with Potentially Endogenous Outliers via a Modern Optimization Lens

Authors: Zhan Gao, Hyungsik Roger Moon

This paper addresses the robust estimation of linear regression models in the
presence of potentially endogenous outliers. Through Monte Carlo simulations,
we demonstrate that existing $L_1$-regularized estimation methods, including
the Huber estimator and the least absolute deviation (LAD) estimator, exhibit
significant bias when outliers are endogenous. Motivated by this finding, we
investigate $L_0$-regularized estimation methods. We propose systematic
heuristic algorithms, notably an iterative hard-thresholding algorithm and a
local combinatorial search refinement, to solve the combinatorial optimization
problem of the $L_0$-regularized estimation efficiently. Our Monte Carlo
simulations yield two key results: (i) The local combinatorial search algorithm
substantially improves solution quality compared to the initial
projection-based hard-thresholding algorithm while offering greater
computational efficiency than directly solving the mixed integer optimization
problem. (ii) The $L_0$-regularized estimator demonstrates superior performance
in terms of bias reduction, estimation accuracy, and out-of-sample prediction
errors compared to $L_1$-regularized alternatives. We illustrate the practical
value of our method through an empirical application to stock return
forecasting.
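
One simple instance of the iterative hard-thresholding strategy mentioned
above, applied to a mean-shift outlier formulation with an assumed number of
outliers k, is sketched below; it is not the authors' full algorithm or their
local combinatorial search refinement.

    import numpy as np

    rng = np.random.default_rng(8)
    n, p, k = 200, 3, 10                     # k = assumed number of outliers
    X = rng.standard_normal((n, p))
    beta = np.array([1.0, -2.0, 0.5])
    y = X @ beta + rng.normal(0, 1, n)
    y[:k] += 15.0                            # plant gross outliers

    outliers = np.zeros(n, dtype=bool)
    for _ in range(20):                      # alternate OLS and hard thresholding
        b_hat, *_ = np.linalg.lstsq(X[~outliers], y[~outliers], rcond=None)
        resid = np.abs(y - X @ b_hat)
        new_outliers = np.zeros(n, dtype=bool)
        new_outliers[np.argsort(resid)[-k:]] = True   # flag the k largest residuals
        if np.array_equal(new_outliers, outliers):
            break
        outliers = new_outliers

    print("estimated coefficients:", np.round(b_hat, 2))
    print("flagged outlier indices:", np.where(outliers)[0])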

arXiv link: http://arxiv.org/abs/2408.03930v1

Econometrics arXiv updated paper (originally submitted: 2024-08-07)

Robust Identification in Randomized Experiments with Noncompliance

Authors: Désiré Kédagni, Huan Wu, Yi Cui

Instrument variable (IV) methods are widely used in empirical research to
identify causal effects of a policy. In the local average treatment effect
(LATE) framework, the IV estimand identifies the LATE under three main
assumptions: random assignment, exclusion restriction, and monotonicity.
However, these assumptions are often questionable in many applications, leading
some researchers to doubt the causal interpretation of the IV estimand. This
paper considers a robust identification of causal parameters in a randomized
experiment setting with noncompliance where the standard LATE assumptions could
be violated. We discuss identification under two sets of weaker assumptions:
random assignment and exclusion restriction (without monotonicity), and random
assignment and monotonicity (without exclusion restriction). We derive sharp
bounds on some causal parameters under these two sets of relaxed LATE
assumptions. Finally, we apply our method to revisit the random information
experiment conducted in Bursztyn, Gonz\'alez, and Yanagizawa-Drott (2020) and
find that the standard LATE assumptions are jointly incompatible in this
application. We then estimate the robust identified sets under the two sets of
relaxed assumptions.

arXiv link: http://arxiv.org/abs/2408.03530v4

Econometrics arXiv updated paper (originally submitted: 2024-08-06)

Efficient Asymmetric Causality Tests

Authors: Abdulnasser Hatemi-J

Asymmetric causality tests are increasingly gaining popularity in different
scientific fields. This approach corresponds better to reality since logical
reasons behind asymmetric behavior exist and need to be considered in empirical
investigations. Hatemi-J (2012) introduced the asymmetric causality tests via
partial cumulative sums for positive and negative components of the variables
operating within the vector autoregressive (VAR) model. However, since the
residuals across the equations in the VAR model are not independent, the
ordinary least squares method for estimating the parameters is not efficient.
Additionally, asymmetric causality tests involve different causal parameters
(i.e., for positive and negative components); it is therefore crucial to assess
not only whether these causal parameters are individually statistically
significant, but also whether their difference is statistically significant.
Consequently, tests for the difference between estimated causal parameters
should be conducted explicitly, which is neglected in the existing literature. The
purpose of the current paper is to deal with these issues explicitly. An
application is provided, and ten different hypotheses pertinent to the
asymmetric causal interaction between the two largest financial markets
worldwide are efficiently tested within a multivariate setting.

arXiv link: http://arxiv.org/abs/2408.03137v4

Econometrics arXiv paper, submitted: 2024-08-05

A nonparametric test for diurnal variation in spot correlation processes

Authors: Kim Christensen, Ulrich Hounyo, Zhi Liu

The association between log-price increments of exchange-traded equities, as
measured by their spot correlation estimated from high-frequency data, exhibits
a pronounced upward-sloping and almost piecewise linear relationship at the
intraday horizon. Correlation is notably lower (on average, less positive) in
the morning than in the afternoon. We develop a nonparametric testing
procedure to detect such deterministic variation in a correlation process. The
test statistic has a known distribution under the null hypothesis, whereas it
diverges under the alternative. It is robust against stochastic correlation. We
run a Monte Carlo simulation to discover the finite sample properties of the
test statistic, which are close to the large sample predictions, even for small
sample sizes and realistic levels of diurnal variation. In an application, we
implement the test on a monthly basis for a high-frequency dataset covering the
stock market over an extended period. The test leads to rejection of the null
most of the time. This suggests diurnal variation in the correlation process is
a nontrivial effect in practice.
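
A purely descriptive way to see the pattern described above is to compute, for
each intraday bin, the correlation of two assets' returns across days and then
compare the morning and afternoon halves of the resulting profile. The sketch
below does only this; it is not the paper's test statistic, which is robust to
stochastic correlation and has a known null distribution. The array layout
(days by intraday bins) and the midpoint split of the trading day are
assumptions of the sketch.

    import numpy as np

    def diurnal_correlation_profile(ret_a, ret_b):
        # ret_a, ret_b: (n_days, n_bins) arrays of intraday returns on two
        # assets, aligned on the same intraday grid.
        n_bins = ret_a.shape[1]
        prof = np.array([np.corrcoef(ret_a[:, j], ret_b[:, j])[0, 1]
                         for j in range(n_bins)])
        half = n_bins // 2
        return prof, prof[:half].mean(), prof[half:].mean()  # profile, AM, PM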

arXiv link: http://arxiv.org/abs/2408.02757v1

Econometrics arXiv updated paper (originally submitted: 2024-08-05)

Testing identifying assumptions in Tobit Models

Authors: Santiago Acerenza, Otávio Bartalotti, Federico Veneri

This paper develops sharp testable implications for Tobit and IV-Tobit
models' identifying assumptions: linear index specification, (joint) normality
of latent errors, and treatment (instrument) exogeneity and relevance. The new
sharp testable equalities can detect all possible observable violations of the
identifying conditions. We propose a testing procedure for the model's validity
using existing inference methods for intersection bounds. Simulation results
suggest proper size in large samples and that the test is powerful in detecting
large violations of the exogeneity assumption and violations in the error
structure. Finally, we review and propose new alternative paths to partially
identify the parameters of interest under less restrictive assumptions.

arXiv link: http://arxiv.org/abs/2408.02573v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2024-08-05

Kullback-Leibler-based characterizations of score-driven updates

Authors: Ramon de Punder, Timo Dimitriadis, Rutger-Jan Lange

Score-driven models have been applied in some 400 published articles over the
last decade. Much of this literature cites the optimality result in Blasques et
al. (2015), which, roughly, states that sufficiently small score-driven updates
are unique in locally reducing the Kullback-Leibler divergence relative to the
true density for every observation. This is at odds with other well-known
optimality results; the Kalman filter, for example, is optimal in a
mean-squared-error sense, but occasionally moves away from the true state. We
show that score-driven updates are, similarly, not guaranteed to improve the
localized Kullback-Leibler divergence at every observation. The seemingly
stronger result in Blasques et al. (2015) is due to their use of an improper
(localized) scoring rule. Although a guaranteed improvement at every
observation is unattainable, we prove that sufficiently small score-driven
updates are unique in reducing the Kullback-Leibler divergence relative to the
true density in expectation. This positive, albeit weaker, result justifies the
continued use of score-driven models and places their information-theoretic
properties on solid footing.
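
For orientation on the class of models discussed here, a textbook score-driven
(GAS) update for a time-varying Gaussian mean with known variance is sketched
below: the parameter moves in the direction of the score of the log density,
here with unit scaling. This canonical example is included only for context and
is not taken from the paper; the parameter values are arbitrary.

    import numpy as np

    def gas_gaussian_mean(y, omega=0.0, a=0.5, b=0.9, sigma2=1.0, f0=0.0):
        # For y_t ~ N(f_t, sigma2), the score w.r.t. f_t is (y_t - f_t)/sigma2
        # and the update is f_{t+1} = omega + b * f_t + a * score_t.
        f = np.empty(len(y) + 1)
        f[0] = f0
        for t, yt in enumerate(y):
            score = (yt - f[t]) / sigma2
            f[t + 1] = omega + b * f[t] + a * score
        return f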

arXiv link: http://arxiv.org/abs/2408.02391v2

Econometrics arXiv paper, submitted: 2024-08-04

Analysis of Factors Affecting the Entry of Foreign Direct Investment into Indonesia (Case Study of Three Industrial Sectors in Indonesia)

Authors: Tracy Patricia Nindry Abigail Rolnmuch, Yuhana Astuti

The realization of FDI and DDI from January to December 2022 reached
Rp1,207.2 trillion. The largest FDI investment realization by sector was led by
the Basic Metal, Metal Goods, Non-Machinery, and Equipment Industry sector,
followed by the Mining sector and the Electricity, Gas, and Water sector. The
uneven amount of FDI investment realization in each industry and the impact of
the COVID-19 pandemic in Indonesia are the main issues addressed in this study.
This study aims to identify the factors that influence the entry of FDI into
industries in Indonesia and measure the extent of these factors' influence on
the entry of FDI. In this study, classical assumption tests and hypothesis
tests are conducted to investigate whether the research model is robust enough
to provide strategic options nationally. Moreover, this study uses the ordinary
least squares (OLS) method. The results show that the electricity factor does
not influence FDI inflows in the three industries. The Human Development Index
(HDI) factor has a significant negative effect on FDI in the Mining Industry
and a significant positive effect on FDI in the Basic Metal, Metal Goods,
Non-Machinery, and Equipment Industries. However, HDI does not influence FDI in
the Electricity, Gas, and Water Industries in Indonesia.

arXiv link: http://arxiv.org/abs/2408.01985v1

Econometrics arXiv updated paper (originally submitted: 2024-08-02)

Distributional Difference-in-Differences Models with Multiple Time Periods

Authors: Andrea Ciaccio

Researchers are often interested in evaluating the impact of a policy on the
entire (or specific parts of the) distribution of the outcome of interest. In
this paper, I provide a method to recover the whole distribution of the
untreated potential outcome for the treated group in non-experimental settings
with staggered treatment adoption by generalizing the existing quantile
treatment effects on the treated (QTT) estimator proposed by Callaway and Li
(2019). Besides the QTT, I consider different approaches that anonymously
summarize the quantiles of the distribution of the outcome of interest (such as
tests for stochastic dominance rankings) without relying on rank invariance
assumptions. The finite-sample properties of the estimator proposed are
analyzed via different Monte Carlo simulations. Although slightly biased in
relatively small samples, the proposed estimator's performance improves
substantially as the sample size increases.

arXiv link: http://arxiv.org/abs/2408.01208v2

Econometrics arXiv paper, submitted: 2024-08-02

Distilling interpretable causal trees from causal forests

Authors: Patrick Rehill

Machine learning methods for estimating treatment effect heterogeneity
promise greater flexibility than existing methods that test a few pre-specified
hypotheses. However, one drawback of these methods is that it can be
challenging to extract insights from complicated machine learning models. A
high-dimensional distribution of conditional average treatment effects may give
accurate, individual-level estimates, but it can be hard to understand the
underlying patterns or to know what the implications of the analysis are.
This paper proposes the Distilled Causal Tree, a method for distilling a
single, interpretable causal tree from a causal forest. This compares well to
existing methods of extracting a single tree, particularly in noisy data or
high-dimensional data where there are many correlated features. Here it even
outperforms the base causal forest in most simulations. Its estimates are
doubly robust and asymptotically normal just as those of the causal forest are.
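
The distillation idea can be illustrated generically: obtain conditional
average treatment effect (CATE) predictions from any causal forest
implementation and fit a single shallow regression tree to those predictions so
that the partition is human-readable. The sketch below uses simulated CATE
predictions (cate_hat is a placeholder) and arbitrary tree settings; it is not
the paper's Distilled Causal Tree algorithm, which is also designed to retain
the forest's statistical guarantees.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = rng.standard_normal((2000, 5))            # covariates (simulated here)
    # stand-in for CATE predictions from a causal forest fit on (Y, T, X)
    cate_hat = 0.5 + 1.0 * (X[:, 0] > 0) + 0.1 * rng.standard_normal(2000)

    tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=100)
    tree.fit(X, cate_hat)                 # distilled, interpretable tree
    leaf_effects = tree.predict(X)        # leaf-level effect summaries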

arXiv link: http://arxiv.org/abs/2408.01023v1

Econometrics arXiv cross-link from math.DS (math.DS), submitted: 2024-08-02

Application of Superconducting Technology in the Electricity Industry: A Game-Theoretic Analysis of Government Subsidy Policies and Power Company Equipment Upgrade Decisions

Authors: Mingyang Li, Maoqin Yuan, Han Pengsihua, Yuan Yuan, Zejun Wang

This study investigates the potential impact of "LK-99," a novel material
developed by a Korean research team, on the power equipment industry. Using
evolutionary game theory, the interactions between governmental subsidies and
technology adoption by power companies are modeled. A key innovation of this
research is the introduction of sensitivity analyses concerning time delays and
initial subsidy amounts, which significantly influence the strategic decisions
of both government and corporate entities. The findings indicate that these
factors are critical in determining the rate of technology adoption and the
efficiency of the market as a whole. Due to existing data limitations, the
study offers a broad overview of likely trends and recommends the inclusion of
real-world data for more precise modeling once the material demonstrates
room-temperature superconducting characteristics. The research contributes
foundational insights valuable for future policy design and has significant
implications for advancing the understanding of technology adoption and market
dynamics.

arXiv link: http://arxiv.org/abs/2408.01017v1

Econometrics arXiv updated paper (originally submitted: 2024-08-01)

Identification and Inference for Synthetic Control Methods with Spillover Effects: Estimating the Economic Cost of the Sudan Split

Authors: Shosei Sakaguchi, Hayato Tagawa

The synthetic control method (SCM) is widely used for causal inference with
panel data, particularly when there are few treated units. SCM assumes the
stable unit treatment value assumption (SUTVA), which posits that potential
outcomes are unaffected by the treatment status of other units. However,
interventions often impact not only treated units but also untreated units,
known as spillover effects. This study introduces a novel panel data method
that extends SCM to allow for spillover effects and estimate both treatment and
spillover effects. This method leverages a spatial autoregressive panel data
model to account for spillover effects. We also propose Bayesian inference
methods using Bayesian horseshoe priors for regularization. We apply the
proposed method to two empirical studies: evaluating the effect of the
California tobacco tax on consumption and estimating the economic impact of the
2011 division of Sudan on GDP per capita.

arXiv link: http://arxiv.org/abs/2408.00291v2

Econometrics arXiv paper, submitted: 2024-07-31

Methodological Foundations of Modern Causal Inference in Social Science Research

Authors: Guanghui Pan

This paper serves as a literature review of the methodology of (modern) causal
inference methods for addressing causal estimands with observational/survey
data that have been or will be used in social science research. The paper has
two main parts. The first concerns inference from the statistical estimand to
the causal estimand, reviewing the assumptions required for causal
identification and the methodological strategies for addressing violations of
some of these assumptions. The second discusses the asymptotic analysis linking
measures computed from observational data to their theoretical counterparts and
replicates the derivation of the efficient/doubly robust average treatment
effect estimator commonly used in current social science analysis.

arXiv link: http://arxiv.org/abs/2408.00032v1

Econometrics arXiv updated paper (originally submitted: 2024-07-30)

Potential weights and implicit causal designs in linear regression

Authors: Jiafeng Chen

When we interpret linear regression as estimating causal effects justified by
quasi-experimental treatment variation, what do we mean? This paper
characterizes the necessary implications when linear regressions are
interpreted causally. A minimal requirement for causal interpretation is that
the regression estimates some contrast of individual potential outcomes under
the true treatment assignment process. This requirement implies linear
restrictions on the true distribution of treatment. Solving these linear
restrictions leads to a set of implicit designs. Implicit designs are plausible
candidates for the true design if the regression were to be causal. The
implicit designs serve as a framework that unifies and extends existing
theoretical results across starkly distinct settings (including multiple
treatment, panel, and instrumental variables). They lead to new theoretical
insights for widely used but less understood specifications.

arXiv link: http://arxiv.org/abs/2407.21119v3

Econometrics arXiv updated paper (originally submitted: 2024-07-29)

On the power properties of inference for parameters with interval identified sets

Authors: Federico A. Bugni, Mengsi Gao, Filip Obradovic, Amilcar Velez

This paper studies the power properties of confidence intervals (CIs) for a
partially-identified parameter of interest with an interval identified set. We
assume the researcher has bounds estimators to construct the CIs proposed by
Stoye (2009), referred to as CI1, CI2, and CI3. We also assume that these
estimators are "ordered": the lower bound estimator is less than or equal to
the upper bound estimator.
Under these conditions, we establish two results. First, we show that CI1 and
CI2 are equally powerful, and both dominate CI3. Second, we consider a
favorable situation in which there are two possible bounds estimators to
construct these CIs, and one is more efficient than the other. One would expect
that the more efficient bounds estimator yields more powerful inference. We
prove that this desirable result holds for CI1 and CI2, but not necessarily for
CI3.

arXiv link: http://arxiv.org/abs/2407.20386v2

Econometrics arXiv cross-link from q-fin.RM (q-fin.RM), submitted: 2024-07-29

Testing for the Asymmetric Optimal Hedge Ratios: With an Application to Bitcoin

Authors: Abdulnasser Hatemi-J

Reducing financial risk is of paramount importance to investors, financial
institutions, and corporations. Since the pioneering contribution of Johnson
(1960), the optimal hedge ratio based on futures is regularly utilized. The
current paper suggests an explicit and efficient method for testing the null
hypothesis of a symmetric optimal hedge ratio against an asymmetric alternative
one within a multivariate setting. If the null is rejected, the position
dependent optimal hedge ratios can be estimated via the suggested model. This
approach is expected to enhance the accuracy of the implemented hedging
strategies compared to the standard methods since it accounts for the fact that
the source of risk depends on whether the investor is a buyer or a seller of
the risky asset. An application is provided using spot and futures prices of
Bitcoin. The results strongly support the view that the optimal hedge ratio for
this cryptocurrency is position dependent. The investor that is long in Bitcoin
has a much higher conditional optimal hedge ratio compared to the one that is
short in the asset. The difference between the two conditional optimal hedge
ratios is statistically significant, which has important repercussions for
implementing risk management strategies.

arXiv link: http://arxiv.org/abs/2407.19932v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-07-29

Improving the Estimation of Lifetime Effects in A/B Testing via Treatment Locality

Authors: Shuze Chen, David Simchi-Levi, Chonghuan Wang

Utilizing randomized experiments to evaluate the effect of short-term
treatments on short-term outcomes is well understood and has become the gold
standard in industrial practice. However, as service systems become
increasingly dynamic and personalized, much focus is shifting toward
maximizing long-term outcomes, such as customer lifetime value, through
lifetime exposure to interventions. Our goal is to assess the impact of
treatment and control policies on long-term outcomes from relatively short-term
observations, such as those generated by A/B testing. A key managerial
observation is that many practical treatments are local, affecting only
targeted states while leaving other parts of the policy unchanged. This paper
rigorously investigates whether and how such locality can be exploited to
improve estimation of long-term effects in Markov Decision Processes (MDPs), a
fundamental model of dynamic systems. We first develop optimal inference
techniques for general A/B testing in MDPs and establish corresponding
efficiency bounds. We then propose methods to harness the localized structure
by sharing information on the non-targeted states. Our new estimator can
achieve a linear reduction with the number of test arms for a major part of the
variance without sacrificing unbiasedness. It also matches a tighter variance
lower bound that accounts for locality. Furthermore, we extend our framework to
a broad class of differentiable estimators, which encompasses many widely used
approaches in practice. We show that all such estimators can benefit from
variance reduction through information sharing without increasing their bias.
Together, these results provide both theoretical foundations and practical
tools for conducting efficient experiments in dynamic service systems with
local treatments.

arXiv link: http://arxiv.org/abs/2407.19618v3

Econometrics arXiv paper, submitted: 2024-07-28

Heterogeneous Grouping Structures in Panel Data

Authors: Katerina Chrysikou, George Kapetanios

In this paper we examine the existence of heterogeneity within a group, in
panels with latent grouping structure. The assumption of within group
homogeneity is prevalent in this literature, implying that the formation of
groups alleviates cross-sectional heterogeneity, regardless of the prior
knowledge of groups. While the latter hypothesis makes inference powerful, it
can often be restrictive. We allow for models with richer heterogeneity that
can be found both in the cross-section and within a group, without imposing the
simple assumption that all groups must be heterogeneous. We further contribute
to the method proposed by Su, Shi, and Phillips (2016), by showing that the model
parameters can be consistently estimated and the groups, while unknown, can be
identifiable in the presence of different types of heterogeneity. Within the
same framework we consider the validity of assuming both cross-sectional and
within group homogeneity, using testing procedures. Simulations demonstrate
good finite-sample performance of the approach in both classification and
estimation, while empirical applications across several datasets provide
evidence of multiple clusters, as well as reject the hypothesis of within group
homogeneity.

arXiv link: http://arxiv.org/abs/2407.19509v1

Econometrics arXiv updated paper (originally submitted: 2024-07-27)

Using Total Margin of Error to Account for Non-Sampling Error in Election Polls: The Case of Nonresponse

Authors: Jeff Dominitz, Charles F. Manski

The potential impact of non-sampling errors on election polls is well known,
but measurement has focused on the margin of sampling error. Survey
statisticians have long recommended measurement of total survey error by mean
square error (MSE), which jointly measures sampling and non-sampling errors. We
think it reasonable to use the square root of maximum MSE to measure the total
margin of error (TME). Measurement of TME should encompass both sampling error
and all forms of non-sampling error. We suggest that measurement of TME should
be a standard feature in the reporting of polls. To provide a clear
illustration, and because we believe the exceedingly low response rates
commonly obtained by election polls to be a particularly worrisome source of
potential error, we demonstrate how to measure the potential impact of
nonresponse using the concept of TME. We first show how to measure TME when a
pollster lacks any knowledge of the candidate preferences of nonrespondents. We
then extend the analysis to settings where the pollster has partial knowledge
that bounds the preferences of non-respondents. In each setting, we derive a
simple poll estimate that approximately minimizes TME, a midpoint estimate, and
compare it to a conventional poll estimate.
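
To see the style of worst-case reasoning involved, consider a hypothetical poll
with response rate r and respondent share p_R favoring a candidate, with no
knowledge of nonrespondents. The snippet below computes the resulting
identification interval, its midpoint (which minimizes the maximum absolute
bias), and the worst-case bias. This is illustrative arithmetic only, not the
paper's exact TME formulas, which additionally fold in sampling error; the
numbers are made up.

    # hypothetical response rate and respondent poll share
    r, p_R = 0.05, 0.52
    # with no knowledge of nonrespondents, the population share lies in
    # [r * p_R, r * p_R + (1 - r)]
    lower, upper = r * p_R, r * p_R + (1 - r)
    midpoint = (lower + upper) / 2      # = r * p_R + (1 - r) / 2
    max_bias = (1 - r) / 2              # worst-case bias of the midpoint
    print(lower, upper, midpoint, max_bias)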

arXiv link: http://arxiv.org/abs/2407.19339v3

Econometrics arXiv paper, submitted: 2024-07-25

Starting Small: Prioritizing Safety over Efficacy in Randomized Experiments Using the Exact Finite Sample Likelihood

Authors: Neil Christy, A. E. Kowalski

We use the exact finite sample likelihood and statistical decision theory to
answer questions of “why?” and “what should you have done?” using data from
randomized experiments and a utility function that prioritizes safety over
efficacy. We propose a finite sample Bayesian decision rule and a finite sample
maximum likelihood decision rule. We show that in finite samples from 2 to 50,
it is possible for these rules to achieve better performance according to
established maximin and maximum regret criteria than a rule based on the
Boole-Frechet-Hoeffding bounds. We also propose a finite sample maximum
likelihood criterion. We apply our rules and criterion to an actual clinical
trial that yielded a promising estimate of efficacy, and our results point to
safety as a reason why results were mixed in subsequent trials.

arXiv link: http://arxiv.org/abs/2407.18206v1

Econometrics arXiv updated paper (originally submitted: 2024-07-25)

Enhanced power enhancements for testing many moment equalities: Beyond the $2$- and $\infty$-norm

Authors: Anders Bredahl Kock, David Preinerstorfer

Contemporary testing problems in statistics are increasingly complex, i.e.,
high-dimensional. Tests based on the $2$- and $\infty$-norm have received
considerable attention in such settings, as they are powerful against dense and
sparse alternatives, respectively. The power enhancement principle of Fan et
al. (2015) combines these two norms to construct improved tests that are
powerful against both types of alternatives. In the context of testing whether
a candidate parameter satisfies a large number of moment equalities, we
construct a test that harnesses the strength of all $p$-norms with $p\in[2,
\infty]$. As a result, this test is consistent against strictly more
alternatives than any test based on a single $p$-norm. In particular, our test
is consistent against more alternatives than tests based on the $2$- and
$\infty$-norm, which is what most implementations of the power enhancement
principle target.
We illustrate the scope of our general results by using them to construct a
test that simultaneously dominates the Anderson-Rubin test (based on $p=2$),
tests based on the $\infty$-norm and power enhancement based combinations of
these in terms of consistency in the linear instrumental variable model with
many instruments.

arXiv link: http://arxiv.org/abs/2407.17888v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-07-24

Formalising causal inference as prediction on a target population

Authors: Benedikt Höltgen, Robert C. Williamson

The standard approach to causal modelling especially in social and health
sciences is the potential outcomes framework due to Neyman and Rubin. In this
framework, observations are thought to be drawn from a distribution over
variables of interest, and the goal is to identify parameters of this
distribution. Even though the stated goal is often to inform decision making on
some target population, there is no straightforward way to include these target
populations in the framework. Instead of modelling the relationship between the
observed sample and the target population, the inductive assumptions in this
framework take the form of abstract sampling and independence assumptions. In
this paper, we develop a version of this framework that construes causal
inference as treatment-wise predictions for finite populations where all
assumptions are testable in retrospect; this means that one can not only test
predictions themselves (without any fundamental problem) but also investigate
sources of error when they fail. Due to close connections to the original
framework, established methods can still be analysed under the new
framework.

arXiv link: http://arxiv.org/abs/2407.17385v3

Econometrics arXiv paper, submitted: 2024-07-24

Identification and inference of outcome conditioned partial effects of general interventions

Authors: Zhengyu Zhang, Zequn Jin, Lihua Lin

This paper proposes a new class of distributional causal quantities, referred
to as the outcome conditioned partial policy effects (OCPPEs), to
measure the average effect of a general counterfactual intervention of
a target covariate on the individuals in different quantile ranges of the
outcome distribution.
The OCPPE approach is valuable in several aspects: (i) Unlike the
unconditional quantile partial effect (UQPE) that is not $\sqrt{n}$-estimable,
an OCPPE is $\sqrt{n}$-estimable. Analysts can use it to capture heterogeneity
across the unconditional distribution of $Y$ as well as obtain accurate
estimation of the aggregated effect at the upper and lower tails of $Y$. (ii)
The semiparametric efficiency bound for an OCPPE is explicitly derived. (iii)
We propose an efficient debiased estimator for OCPPE, and provide feasible
uniform inference procedures for the OCPPE process. (iv) The efficient doubly
robust score for an OCPPE can be used to optimize infinitesimal nudges to a
continuous treatment by maximizing a quantile specific Empirical Welfare
function. We illustrate the method by analyzing how anti-smoking policies
impact low percentiles of live infants' birthweights.

arXiv link: http://arxiv.org/abs/2407.16950v1

Econometrics arXiv paper, submitted: 2024-07-23

Bayesian modelling of VAR precision matrices using stochastic block networks

Authors: Florian Huber, Gary Koop, Massimiliano Marcellino, Tobias Scheckel

Commonly used priors for Vector Autoregressions (VARs) induce shrinkage on
the autoregressive coefficients. Introducing shrinkage on the error covariance
matrix is sometimes done but, in the vast majority of cases, without
considering the network structure of the shocks and by placing the prior on the
lower Cholesky factor of the precision matrix. In this paper, we propose a
prior on the VAR error precision matrix directly. Our prior, which resembles a
standard spike and slab prior, models variable inclusion probabilities through
a stochastic block model that clusters shocks into groups. Within groups, the
probability of having relations across group members is higher (inducing less
sparsity) whereas relations across groups imply a lower probability that
members of each group are conditionally related. We show in simulations that
our approach recovers the true network structure well. Using a US macroeconomic
data set, we illustrate how our approach can be used to cluster shocks together
and that this feature leads to improved density forecasts.

arXiv link: http://arxiv.org/abs/2407.16349v1

Econometrics arXiv paper, submitted: 2024-07-22

Estimating Distributional Treatment Effects in Randomized Experiments: Machine Learning for Variance Reduction

Authors: Undral Byambadalai, Tatsushi Oka, Shota Yasui

We propose a novel regression adjustment method designed for estimating
distributional treatment effect parameters in randomized experiments.
Randomized experiments have been extensively used to estimate treatment effects
in various scientific fields. However, to gain deeper insights, it is essential
to estimate distributional treatment effects rather than relying solely on
average effects. Our approach incorporates pre-treatment covariates into a
distributional regression framework, utilizing machine learning techniques to
improve the precision of distributional treatment effect estimators. The
proposed approach can be readily implemented with off-the-shelf machine
learning methods and remains valid as long as the nuisance components are
reasonably well estimated. Also, we establish the asymptotic properties of the
proposed estimator and present a uniformly valid inference method. Through
simulation results and real data analysis, we demonstrate the effectiveness of
integrating machine learning techniques in reducing the variance of
distributional treatment effect estimators in finite samples.
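
A generic version of ML-based regression adjustment for a distributional effect
at a single threshold y0 can be sketched as a cross-fitted AIPW estimator that
uses the known assignment probability of a randomized experiment. This is a
simplified stand-in, not the authors' distributional-regression estimator or
their uniform inference procedure; the learner, the threshold, and the function
name are arbitrary choices.

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import KFold

    def adjusted_dte(Y, T, X, y0, p=0.5, n_splits=5):
        # Cross-fitted, covariate-adjusted estimate of
        # P(Y <= y0 | treated) - P(Y <= y0 | control), assignment prob p known.
        D = (Y <= y0).astype(float)
        psi = np.zeros(len(Y))
        for train, test in KFold(n_splits, shuffle=True,
                                 random_state=0).split(X):
            m1 = GradientBoostingRegressor().fit(X[train][T[train] == 1],
                                                 D[train][T[train] == 1])
            m0 = GradientBoostingRegressor().fit(X[train][T[train] == 0],
                                                 D[train][T[train] == 0])
            m1_hat, m0_hat = m1.predict(X[test]), m0.predict(X[test])
            psi[test] = (m1_hat - m0_hat
                         + T[test] * (D[test] - m1_hat) / p
                         - (1 - T[test]) * (D[test] - m0_hat) / (1 - p))
        return psi.mean(), psi.std(ddof=1) / np.sqrt(len(Y))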

arXiv link: http://arxiv.org/abs/2407.16037v1

Econometrics arXiv paper, submitted: 2024-07-22

Big Data Analytics-Enabled Dynamic Capabilities and Market Performance: Examining the Roles of Marketing Ambidexterity and Competitor Pressure

Authors: Gulfam Haider, Laiba Zubair, Aman Saleem

This study, rooted in dynamic capability theory and the developing era of Big
Data Analytics, explores the transformative effect of BDA-enabled dynamic
capabilities (BDA-EDCs) on marketing ambidexterity and firms' market
performance in the textile sector of Pakistani cities. Focusing specifically on
firms that deal directly with customers, it investigates the nuanced role of
BDA-EDCs in textile retail firms' potential to navigate market dynamics.
Emphasizing the exploitation component of marketing ambidexterity, the study
examines the mediating function of marketing ambidexterity and the moderating
influence of competitive pressure. Using a survey questionnaire, the study
targets key decision makers in textile firms in Faisalabad, Chiniot, and
Lahore, Pakistan. PLS-SEM is employed as the analytical technique, allowing a
full examination of the complicated relations between BDA-EDCs, marketing
ambidexterity, competitive pressure, and market performance. The study predicts
a positive impact of Big Data on marketing ambidexterity, with a specific
emphasis on exploitation, and expects this exploitation-oriented marketing
ambidexterity to significantly enhance firms' market performance. This research
contributes to the existing literature on dynamic-capability-based frameworks
from the perspective of the retail segment of the textile industry. The study
emphasizes the role of BDA-EDCs in the retail sector, offering insights into
the direct and indirect effects of BDA-EDCs on market performance within the
retail area. The study's novelty lies in its contextualization of BDA-EDCs in
the textile sector of Faisalabad, Lahore, and Chiniot, providing a unique
perspective on the effect of BDA on marketing ambidexterity and market
performance in firms. Methodologically, the study uses multiple samples from
the retail sector to ensure broader generalizability, contributing practical
insights.

arXiv link: http://arxiv.org/abs/2407.15522v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-07-21

Nonlinear Binscatter Methods

Authors: Matias D. Cattaneo, Richard K. Crump, Max H. Farrell, Yingjie Feng

Binned scatter plots are a powerful statistical tool for empirical work in
the social, behavioral, and biomedical sciences. Available methods rely on a
quantile-based partitioning estimator of the conditional mean regression
function to primarily construct flexible yet interpretable visualization
methods, but they can also be used to estimate treatment effects, assess
uncertainty, and test substantive domain-specific hypotheses. This paper
introduces novel binscatter methods based on nonlinear, possibly nonsmooth
M-estimation methods, covering generalized linear, robust, and quantile
regression models. We provide a host of theoretical results and practical tools
for local constant estimation along with piecewise polynomial and spline
approximations, including (i) optimal tuning parameter (number of bins)
selection, (ii) confidence bands, and (iii) formal statistical tests regarding
functional form or shape restrictions. Our main results rely on novel strong
approximations for general partitioning-based estimators covering random,
data-driven partitions, which may be of independent interest. We demonstrate
our methods with an empirical application studying the relation between the
percentage of individuals without health insurance and per capita income at the
zip-code level. We provide general-purpose software packages implementing our
methods in Python, R, and Stata.
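
For orientation, a stripped-down binscatter can be produced with a quantile
partition of the covariate and a within-bin summary of the outcome, as sketched
below. This only reproduces the basic visualization idea; the paper's
contribution lies in the nonlinear and quantile M-estimation versions, optimal
bin selection, confidence bands, and formal shape tests provided in the
authors' packages. The function and argument names here are illustrative.

    import numpy as np
    import pandas as pd

    def binscatter(x, y, n_bins=20, stat="mean"):
        # Partition x into quantile bins and summarize y within each bin.
        df = pd.DataFrame({"x": x, "y": y})
        df["bin"] = pd.qcut(df["x"], q=n_bins, duplicates="drop")
        agg = "median" if stat == "median" else "mean"
        out = df.groupby("bin", observed=True).agg(x=("x", "mean"),
                                                   y=("y", agg))
        return out.reset_index(drop=True)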

arXiv link: http://arxiv.org/abs/2407.15276v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2024-07-21

Weak-instrument-robust subvector inference in instrumental variables regression: A subvector Lagrange multiplier test and properties of subvector Anderson-Rubin confidence sets

Authors: Malte Londschien, Peter Bühlmann

We propose a weak-instrument-robust subvector Lagrange multiplier test for
instrumental variables regression. We show that it is asymptotically
size-correct under a technical condition. This is the first
weak-instrument-robust subvector test for instrumental variables regression to
recover the degrees of freedom of the commonly used non-weak-instrument-robust
Wald test. Additionally, we provide a closed-form solution for subvector
confidence sets obtained by inverting the subvector Anderson-Rubin test. We
show that they are centered around a k-class estimator. Also, we show that the
subvector confidence sets for single coefficients of the causal parameter are
jointly bounded if and only if Anderson's likelihood-ratio test rejects the
hypothesis that the first-stage regression parameter is of reduced rank, that
is, that the causal parameter is not identified. Finally, we show that if a
confidence set obtained by inverting the Anderson-Rubin test is bounded and
nonempty, it is equal to a Wald-based confidence set with a data-dependent
confidence level. We explicitly compute this Wald-based confidence set.

arXiv link: http://arxiv.org/abs/2407.15256v3

Econometrics arXiv updated paper (originally submitted: 2024-07-20)

Leveraging Uniformization and Sparsity for Computation and Estimation of Continuous Time Dynamic Discrete Choice Games

Authors: Jason R. Blevins

Continuous-time empirical dynamic discrete choice games offer notable
computational advantages over discrete-time models. This paper addresses
remaining computational challenges to further improve both model solution and
maximum likelihood estimation. We establish convergence rates for value
iteration and policy evaluation with fixed beliefs, and develop
Newton-Kantorovich methods that exploit analytical Jacobians and sparse matrix
structure. We apply uniformization both to derive a new representation of the
value function that draws direct analogies to discrete-time models and to
enable stable computation of the matrix exponential and its parameter
derivatives for likelihood-based estimation with snapshot data. Critically,
these methods provide a complete chain of analytical derivatives from the
equilibrium value function through the log likelihood function, eliminating
numerical approximations in both model solution and estimation and improving
finite-sample statistical properties. Monte Carlo experiments demonstrate
substantial gains in computational time and estimator accuracy, enabling
estimation of richer models of strategic interaction.
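
Uniformization itself is a standard device that can be written in a few lines:
for a generator matrix Q with a uniformization rate L at least as large as the
largest exit rate, exp(Qt)v equals a Poisson-weighted sum of powers of the
stochastic matrix P = I + Q/L applied to v. The sketch below is a generic
implementation of that identity, not the paper's full model-solution or
estimation machinery.

    import numpy as np
    from scipy.stats import poisson

    def expm_action_uniformization(Q, t, v, tol=1e-12):
        # exp(Qt) v = sum_{k>=0} e^{-Lt} (Lt)^k / k! * P^k v,  P = I + Q/L,
        # with L >= max_i |Q_ii|; truncate once the Poisson tail is below tol.
        L = np.max(-np.diag(Q))
        if L <= 0:                            # Q == 0, so exp(Qt) = I
            return v.copy()
        P = np.eye(Q.shape[0]) + Q / L
        mean = L * t
        K = int(poisson.ppf(1.0 - tol, mean)) + 1
        w = poisson.pmf(np.arange(K + 1), mean)
        out = np.zeros_like(v, dtype=float)
        term = v.astype(float)
        for k in range(K + 1):
            out += w[k] * term
            term = P @ term
        return out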

arXiv link: http://arxiv.org/abs/2407.14914v2

Econometrics arXiv updated paper (originally submitted: 2024-07-19)

Predicting the Distribution of Treatment Effects via Covariate-Adjustment, with an Application to Microcredit

Authors: Bruno Fava

Important questions for impact evaluation require knowledge not only of
average effects, but of the distribution of treatment effects. The inability to
observe individual counterfactuals makes answering these empirical questions
challenging. I propose an inference approach for points of the distribution of
treatment effects by incorporating predicted counterfactuals through covariate
adjustment. I provide finite-sample valid inference using sample-splitting, and
asymptotically valid inference using cross-fitting, under arguably weak
conditions. Revisiting five randomized controlled trials on microcredit that
reported null average effects, I find important distributional impacts, with
some individuals helped and others harmed by the increased credit access.

arXiv link: http://arxiv.org/abs/2407.14635v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-07-19

Spatially-clustered spatial autoregressive models with application to agricultural market concentration in Europe

Authors: Roy Cerqueti, Paolo Maranzano, Raffaele Mattera

In this paper, we present an extension of the spatially-clustered linear
regression models, namely, the spatially-clustered spatial autoregression
(SCSAR) model, to deal with spatial heterogeneity issues in clustering
procedures. In particular, we extend classical spatial econometrics models,
such as the spatial autoregressive model, the spatial error model, and the
spatially-lagged model, by allowing the regression coefficients to be spatially
varying according to a cluster-wise structure. Cluster memberships and
regression coefficients are jointly estimated through a penalized maximum
likelihood algorithm which encourages neighboring units to belong to the same
spatial cluster with shared regression coefficients. Motivated by the increase
of observed values of the Gini index for the agricultural production in Europe
between 2010 and 2020, the proposed methodology is employed to assess the
presence of local spatial spillovers on the market concentration index for the
European regions in the last decade. Empirical findings support the hypothesis
of fragmentation of the European agricultural market, as the regions can be
well represented by a clustering structure partitioning the continent into
three groups, roughly approximated by a division among Western, North Central
and Southeastern regions. Also, we detect heterogeneous local effects induced
by the selected explanatory variables on the regional market concentration. In
particular, we find that variables associated with social, territorial and
economic relevance of the agricultural sector seem to act differently across
the spatial dimension (across clusters and relative to the pooled model) and
across the temporal dimension.

arXiv link: http://arxiv.org/abs/2407.15874v1

Econometrics arXiv updated paper (originally submitted: 2024-07-19)

Regression Adjustment for Estimating Distributional Treatment Effects in Randomized Controlled Trials

Authors: Tatsushi Oka, Shota Yasui, Yuta Hayakawa, Undral Byambadalai

In this paper, we address the issue of estimating and inferring
distributional treatment effects in randomized experiments. The distributional
treatment effect provides a more comprehensive understanding of treatment
heterogeneity compared to average treatment effects. We propose a regression
adjustment method that utilizes distributional regression and pre-treatment
information, establishing theoretical efficiency gains without imposing
restrictive distributional assumptions. We develop a practical inferential
framework and demonstrate its advantages through extensive simulations.
Analyzing water conservation policies, our method reveals that behavioral
nudges systematically shift consumption from high to moderate levels. Examining
health insurance coverage, we show the treatment reduces the probability of
zero doctor visits by 6.6 percentage points while increasing the likelihood of
3-6 visits. In both applications, our regression adjustment method
substantially improves precision and identifies treatment effects that were
statistically insignificant under conventional approaches.

arXiv link: http://arxiv.org/abs/2407.14074v2

Econometrics arXiv updated paper (originally submitted: 2024-07-18)

Revisiting Randomization with the Cube Method

Authors: Laurent Davezies, Guillaume Hollard, Pedro Vergara Merino

We introduce a new randomization procedure for experiments based on the cube
method, which achieves near-exact covariate balance. This ensures compliance
with standard balance tests and allows for balancing on many covariates,
enabling more precise estimation of treatment effects using pre-experimental
information. We derive theoretical bounds on imbalance as functions of sample
size and covariate dimension, and establish consistency and asymptotic
normality of the resulting estimators. Simulations show substantial
improvements in precision and covariate balance over existing methods,
particularly when the number of covariates is large.
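
The core of the cube method is its flight phase: starting from the vector of
inclusion probabilities, repeatedly move along a direction in the null space of
the balancing constraints, with step sizes and randomization chosen so that the
probabilities stay in [0, 1] and form a martingale, until (almost) every unit
is rounded to 0 or 1. The sketch below follows the flight phase of Deville and
Tillé (2004) and omits the landing phase; it is a generic illustration, not the
authors' implementation.

    import numpy as np

    def cube_flight(pi, X, rng=None):
        # pi: (n,) inclusion probabilities in (0, 1); X: (n, p) balancing
        # covariates (include pi itself as a column to fix the sample size).
        rng = np.random.default_rng() if rng is None else rng
        pi = np.asarray(pi, dtype=float).copy()
        A = X / pi[:, None]                      # a_k = x_k / pi_k
        eps = 1e-9
        while True:
            free = np.where((pi > eps) & (pi < 1 - eps))[0]
            if free.size == 0:
                break
            # direction u with A[free].T @ u = 0, via the SVD null space
            _, s, Vt = np.linalg.svd(A[free].T, full_matrices=True)
            if free.size <= np.sum(s > 1e-12):   # no null space left: stop
                break
            u = Vt[-1]
            pos, neg = u > 1e-12, u < -1e-12
            pf = pi[free]
            lam1 = min(((1 - pf[pos]) / u[pos]).min() if pos.any() else np.inf,
                       (-pf[neg] / u[neg]).min() if neg.any() else np.inf)
            lam2 = min((pf[pos] / u[pos]).min() if pos.any() else np.inf,
                       ((pf[neg] - 1) / u[neg]).min() if neg.any() else np.inf)
            if not (np.isfinite(lam1) and np.isfinite(lam2)):
                break
            # martingale step: E[new pi] = pi; at least one unit hits 0 or 1
            if rng.random() < lam2 / (lam1 + lam2):
                pi[free] = pf + lam1 * u
            else:
                pi[free] = pf - lam2 * u
            pi = np.clip(pi, 0.0, 1.0)
        return pi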

arXiv link: http://arxiv.org/abs/2407.13613v3

Econometrics arXiv paper, submitted: 2024-07-17

Conduct Parameter Estimation in Homogeneous Goods Markets with Equilibrium Existence and Uniqueness Conditions: The Case of Log-linear Specification

Authors: Yuri Matsumura, Suguru Otani

We propose a constrained generalized method of moments estimator (GMM)
incorporating theoretical conditions for the unique existence of equilibrium
prices for estimating conduct parameters in a log-linear model with homogeneous
goods markets. First, we derive such conditions. Second, Monte Carlo
simulations confirm that in a log-linear model, incorporating the conditions
resolves the problems of implausibly low or negative values of conduct
parameters.

arXiv link: http://arxiv.org/abs/2407.12422v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-07-16

Factorial Difference-in-Differences

Authors: Yiqing Xu, Anqi Zhao, Peng Ding

We formulate factorial difference-in-differences (FDID) as a research design
that extends the canonical difference-in-differences (DID) to settings without
clean controls. Such situations often arise when researchers exploit
cross-sectional variation in a baseline factor and temporal variation in an
event affecting all units. In these applications, the exact estimand is often
unspecified and justification for using the DID estimator is unclear. We
formalize FDID by characterizing its data structure, target parameters, and
identifying assumptions. Framing FDID as a factorial design with two factors --
the baseline factor G and the exposure level Z, we define effect modification
and causal moderation as the associative and causal effects of G on the effect
of Z. Under standard DID assumptions, including no anticipation and parallel
trends, the DID estimator identifies effect modification but not causal
moderation. To identify the latter, we propose an additional factorial parallel
trends assumption. We also show that the canonical DID is a special case of
FDID under an exclusion restriction. We extend the framework to conditionally
valid assumptions and clarify regression-based implementations. We then discuss
extensions to repeated cross-sectional data and continuous G. We illustrate the
approach with an empirical example on the role of social capital in famine
relief in China.

arXiv link: http://arxiv.org/abs/2407.11937v4

Econometrics arXiv paper, submitted: 2024-07-16

Nowcasting R&D Expenditures: A Machine Learning Approach

Authors: Atin Aboutorabi, Gaétan de Rassenfosse

Macroeconomic data are crucial for monitoring countries' performance and
driving policy. However, traditional data acquisition processes are slow,
subject to delays, and performed at a low frequency. We address this
'ragged-edge' problem with a two-step framework. The first step is a supervised
learning model predicting observed low-frequency figures. We propose a
neural-network-based nowcasting model that exploits mixed-frequency,
high-dimensional data. The second step uses the elasticities derived from the
previous step to interpolate unobserved high-frequency figures. We apply our
method to nowcast countries' yearly research and development (R&D) expenditure
series. These series are collected through infrequent surveys, making them
ideal candidates for this task. We exploit a range of predictors, chiefly
Internet search volume data, and document the relevance of these data in
improving out-of-sample predictions. Furthermore, we leverage the high
frequency of our data to derive monthly estimates of R&D expenditures, which
are currently unobserved. We compare our results with those obtained from the
classical regression-based and the sparse temporal disaggregation methods.
Finally, we validate our results by reporting a strong correlation with monthly
R&D employment data.

arXiv link: http://arxiv.org/abs/2407.11765v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2024-07-15

A nonparametric test for rough volatility

Authors: Carsten H. Chong, Viktor Todorov

We develop a nonparametric test for deciding whether volatility of an asset
follows a standard semimartingale process, with paths of finite quadratic
variation, or a rough process with paths of infinite quadratic variation. The
test utilizes the fact that volatility is rough if and only if volatility
increments are negatively autocorrelated at high frequencies. It is based on
the sample autocovariance of increments of spot volatility estimates computed
from high-frequency asset return data. By showing a feasible CLT for this
statistic under the null hypothesis of semimartingale volatility paths, we
construct a test with fixed asymptotic size and an asymptotic power equal to
one. The test is derived under very general conditions for the data-generating
process. In particular, it is robust to jumps with arbitrary activity and to
the presence of market microstructure noise. In an application of the test to
SPY high-frequency data, we find evidence for rough volatility.

arXiv link: http://arxiv.org/abs/2407.10659v1

Econometrics arXiv updated paper (originally submitted: 2024-07-15)

The Dynamic, the Static, and the Weak: Factor models and the analysis of high-dimensional time series

Authors: Matteo Barigozzi, Marc Hallin

Several fundamental and closely interconnected issues related to factor
models are reviewed and discussed: dynamic versus static loadings, rate-strong
versus rate-weak factors, the concept of weakly common component recently
introduced by Gersing et al. (2023), the irrelevance of cross-sectional
ordering and the assumption of cross-sectional exchangeability, the impact of
undetected strong factors, and the problem of combining common and
idiosyncratic forecasts. Conclusions all point to the advantages of the General
Dynamic Factor Model approach of Forni et al. (2000) over the widely used
Static Approximate Factor Model introduced by Chamberlain and Rothschild
(1983).

arXiv link: http://arxiv.org/abs/2407.10653v3

Econometrics arXiv cross-link from q-fin.TR (q-fin.TR), submitted: 2024-07-14

Reinforcement Learning in High-frequency Market Making

Authors: Yuheng Zheng, Zihan Ding

This paper establishes a new and comprehensive theoretical analysis for the
application of reinforcement learning (RL) in high-frequency market making. We
bridge the modern RL theory and the continuous-time statistical models in
high-frequency financial economics. Unlike most of the existing literature,
which focuses on developing various RL methods for the market making problem,
our work is a pilot study providing theoretical analysis. We target the effects
of sampling frequency and find an interesting tradeoff between the error and
the complexity of the RL algorithm when tweaking the value of the time
increment $\Delta$: as $\Delta$ becomes smaller, the error will be smaller but the
complexity will be larger. We also study the two-player case under the
general-sum game framework and establish the convergence of Nash equilibrium to
the continuous-time game equilibrium as $\Delta\rightarrow0$. The Nash
Q-learning algorithm, which is an online multi-agent RL method, is applied to
solve the equilibrium. Our theories are not only useful for practitioners to
choose the sampling frequency, but also very general and applicable to other
high-frequency financial decision making problems, e.g., optimal executions, as
long as the time-discretization of a continuous-time markov decision process is
adopted. Monte Carlo simulation evidence supports all of our theories.

arXiv link: http://arxiv.org/abs/2407.21025v2

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2024-07-14

Low Volatility Stock Portfolio Through High Dimensional Bayesian Cointegration

Authors: Parley R Yang, Alexander Y Shestopaloff

We employ a Bayesian modelling technique for high dimensional cointegration
estimation to construct low volatility portfolios from a large number of
stocks. The proposed Bayesian framework effectively identifies sparse and
important cointegration relationships amongst large baskets of stocks across
various asset spaces, resulting in portfolios with reduced volatility. Such
cointegration relationships persist well over the out-of-sample testing time,
providing practical benefits in portfolio construction and optimization.
Further studies on drawdown and volatility minimization also highlight the
benefits of including cointegrated portfolios as risk management instruments.

arXiv link: http://arxiv.org/abs/2407.10175v1

Econometrics arXiv updated paper (originally submitted: 2024-07-13)

Estimation of Integrated Volatility Functionals with Kernel Spot Volatility Estimators

Authors: José E. Figueroa-López, Jincheng Pang, Bei Wu

For a multidimensional It\^o semimartingale, we consider the problem of
estimating integrated volatility functionals. Jacod and Rosenbaum (2013)
studied a plug-in type of estimator based on a Riemann sum approximation of the
integrated functional and a spot volatility estimator with a forward uniform
kernel. Motivated by recent results that show that spot volatility estimators
with general two-side kernels of unbounded support are more accurate, in this
paper, an estimator using a general kernel spot volatility estimator as the
plug-in is considered. A biased central limit theorem for estimating the
integrated functional is established with an optimal convergence rate. Unbiased
central limit theorems for estimators with proper de-biasing terms are also
obtained both at the optimal convergence regime for the bandwidth and when
applying undersmoothing. Our results show that one can significantly reduce the
estimator's bias by adopting a general kernel instead of the standard uniform
kernel. Our proposed bias-corrected estimators are found to maintain remarkable
robustness against bandwidth selection in a variety of sampling frequencies and
functions.

arXiv link: http://arxiv.org/abs/2407.09759v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-07-13

Sparse Asymptotic PCA: Identifying Sparse Latent Factors Across Time Horizon in High-Dimensional Time Series

Authors: Zhaoxing Gao

This paper introduces a novel sparse latent factor modeling framework using
sparse asymptotic Principal Component Analysis (APCA) to analyze the
co-movements of high-dimensional panel data over time. Unlike existing methods
based on sparse PCA, which assume sparsity in the loading matrices, our
approach posits sparsity in the factor processes while allowing non-sparse
loadings. This is motivated by the fact that financial returns typically
exhibit universal and non-sparse exposure to market factors. Unlike the
commonly used $\ell_1$-relaxation in sparse PCA, the proposed sparse APCA
employs a truncated power method to estimate the leading sparse factor and a
sequential deflation method for multi-factor cases under $\ell_0$-constraints.
Furthermore, we develop a data-driven approach to identify the sparsity of risk
factors over the time horizon using a novel cross-sectional cross-validation
method. We establish the consistency of our estimators under mild conditions as
both the dimension $N$ and the sample size $T$ grow. Monte Carlo simulations
demonstrate that the proposed method performs well in finite samples.
Empirically, we apply our method to daily S&P 500 stock returns (2004--2016)
and identify nine risk factors influencing the stock market.
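
The $\ell_0$-constrained building block, the truncated power method, is simple
to state: repeatedly multiply the current vector by the target matrix, keep
only the k largest-magnitude entries, and renormalize. The generic sketch below
applies it to a symmetric covariance-type matrix S; the paper embeds this step
in its sparse APCA procedure, together with sequential deflation and a
cross-sectional cross-validation choice of the sparsity level.

    import numpy as np

    def truncated_power_method(S, k, n_iter=200, seed=0):
        # Leading k-sparse direction of a symmetric PSD matrix S.
        rng = np.random.default_rng(seed)
        v = rng.standard_normal(S.shape[0])
        v /= np.linalg.norm(v)
        for _ in range(n_iter):
            w = S @ v
            keep = np.argsort(np.abs(w))[-k:]    # k largest-magnitude entries
            v_new = np.zeros_like(w)
            v_new[keep] = w[keep]
            nrm = np.linalg.norm(v_new)
            if nrm == 0:
                break
            v_new /= nrm
            if np.linalg.norm(v_new - v) < 1e-10:
                return v_new
            v = v_new
        return v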

arXiv link: http://arxiv.org/abs/2407.09738v3

Econometrics arXiv paper, submitted: 2024-07-12

Regularizing stock return covariance matrices via multiple testing of correlations

Authors: Richard Luger

This paper develops a large-scale inference approach for the regularization
of stock return covariance matrices. The framework allows for the presence of
heavy tails and multivariate GARCH-type effects of unknown form among the stock
returns. The approach involves simultaneous testing of all pairwise
correlations, followed by setting non-statistically significant elements to
zero. This adaptive thresholding is achieved through sign-based Monte Carlo
resampling within multiple testing procedures, controlling either the
traditional familywise error rate, a generalized familywise error rate, or the
false discovery proportion. Subsequent shrinkage ensures that the final
covariance matrix estimate is positive definite and well-conditioned while
preserving the achieved sparsity. Compared to alternative estimators, this new
regularization method demonstrates strong performance in simulation experiments
and real portfolio optimization.
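
The overall pipeline (test all pairwise correlations, zero out the
non-significant ones, then shrink to restore positive definiteness) can be
sketched with ordinary t-statistics and a Benjamini-Hochberg step as stand-ins
for the paper's sign-based Monte Carlo resampling and FWER/FDP-controlling
procedures. The snippet below is therefore only a simplified illustration of
the structure of the estimator.

    import numpy as np
    from scipy import stats

    def threshold_correlations(R, T, alpha=0.05):
        # R: p x p sample correlation matrix computed from T observations.
        p = R.shape[0]
        iu = np.triu_indices(p, k=1)
        r = R[iu]
        tstat = r * np.sqrt((T - 2) / (1 - r ** 2))       # classical t-stats
        pvals = 2 * stats.t.sf(np.abs(tstat), df=T - 2)
        # Benjamini-Hochberg step-up (stand-in for the resampling-based tests)
        m = pvals.size
        order = np.argsort(pvals)
        passed = pvals[order] <= alpha * np.arange(1, m + 1) / m
        k = passed.nonzero()[0].max() + 1 if passed.any() else 0
        reject = np.zeros(m, dtype=bool)
        reject[order[:k]] = True
        # zero out non-significant correlations, keep a unit diagonal
        R_thr = np.eye(p)
        R_thr[iu] = np.where(reject, r, 0.0)
        R_thr = R_thr + R_thr.T - np.eye(p)
        # shrink toward the identity until the matrix is positive definite
        lam = 0.0
        while np.linalg.eigvalsh((1 - lam) * R_thr + lam * np.eye(p))[0] < 1e-8:
            lam += 0.05
        return (1 - lam) * R_thr + lam * np.eye(p)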

arXiv link: http://arxiv.org/abs/2407.09696v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2024-07-12

An Introduction to Permutation Processes (version 0.5)

Authors: Fang Han

These lecture notes were prepared for a special topics course in the
Department of Statistics at the University of Washington, Seattle. They
comprise the first eight chapters of a book currently in progress.

arXiv link: http://arxiv.org/abs/2407.09664v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-07-12

Computationally Efficient Estimation of Large Probit Models

Authors: Patrick Ding, Guido Imbens, Zhaonan Qu, Yinyu Ye

Probit models are useful for modeling correlated discrete responses in many
disciplines, including consumer choice data in economics and marketing.
However, the Gaussian latent variable feature of probit models coupled with
identification constraints pose significant computational challenges for its
estimation and inference, especially when the dimension of the discrete
response variable is large. In this paper, we propose a computationally
efficient Expectation-Maximization (EM) algorithm for estimating large probit
models. Our work is distinct from existing methods in two important aspects.
First, instead of simulation or sampling methods, we apply and customize
expectation propagation (EP), a deterministic method originally proposed for
approximate Bayesian inference, to estimate moments of the truncated
multivariate normal (TMVN) in the E (expectation) step. Second, we take
advantage of a symmetric identification condition to transform the constrained
optimization problem in the M (maximization) step into a one-dimensional
problem, which is solved efficiently using Newton's method instead of
off-the-shelf solvers. Our method enables the analysis of correlated choice
data in the presence of more than 100 alternatives, which is a reasonable size
in modern applications, such as online shopping and booking platforms, but has
been difficult in practice with probit models. We apply our probit estimation
method to study ordering effects in hotel search results on Expedia's online
booking platform.
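
The E step requires moments of a truncated multivariate normal. The paper
approximates them deterministically with expectation propagation; purely as a
low-dimensional point of reference, the sketch below computes the same moments
by naive rejection sampling for an orthant truncation. The orthant region and
the Monte Carlo approach are illustrative assumptions, not the authors' EP
method, and rejection sampling does not scale to large choice sets.

    import numpy as np

    def tmvn_moments_rejection(mu, Sigma, n_draws=200_000, seed=0):
        """Mean and covariance of z ~ N(mu, Sigma) truncated to the positive orthant, by rejection."""
        rng = np.random.default_rng(seed)
        z = rng.multivariate_normal(mu, Sigma, size=n_draws)
        accepted = z[(z > 0).all(axis=1)]
        return accepted.mean(axis=0), np.cov(accepted, rowvar=False)

    mu = np.array([0.2, -0.1, 0.3])
    Sigma = np.array([[1.0, 0.3, 0.2],
                      [0.3, 1.0, 0.4],
                      [0.2, 0.4, 1.0]])
    m, V = tmvn_moments_rejection(mu, Sigma)
    print(m)  # E[z | z > 0], the quantity an EP approximation would target in the E step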

arXiv link: http://arxiv.org/abs/2407.09371v2

Econometrics arXiv paper, submitted: 2024-07-11

An Introduction to Causal Discovery

Authors: Martin Huber

In social sciences and economics, causal inference traditionally focuses on
assessing the impact of predefined treatments (or interventions) on predefined
outcomes, such as the effect of education programs on earnings. Causal
discovery, in contrast, aims to uncover causal relationships among multiple
variables in a data-driven manner, by investigating statistical associations
rather than relying on predefined causal structures. This approach, more common
in computer science, seeks to understand causality in an entire system of
variables, which can be visualized by causal graphs. This survey provides an
introduction to key concepts, algorithms, and applications of causal discovery
from the perspectives of economics and social sciences. It covers fundamental
concepts like d-separation, causal faithfulness, and Markov equivalence,
sketches various algorithms for causal discovery, and discusses the back-door
and front-door criteria for identifying causal effects. The survey concludes
with more specific examples of causal discovery, e.g. for learning all
variables that directly affect an outcome of interest and/or testing
identification of causal effects in observational data.
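
Constraint-based discovery algorithms of the kind surveyed here are built on
conditional independence tests. A common choice for (roughly) Gaussian data is
the Fisher-z test on the partial correlation, sketched below on a simulated
chain graph; the data-generating graph and the choice of test are illustrative
assumptions, and the snippet is a single test, not a full discovery algorithm.

    import numpy as np
    from scipy.stats import norm

    def fisher_z_ci_test(data, i, j, cond=()):
        """p-value for H0: X_i independent of X_j given X_cond, via partial correlation."""
        idx = [i, j, *cond]
        P = np.linalg.inv(np.corrcoef(data[:, idx], rowvar=False))  # precision of the submatrix
        r = -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])                   # partial correlation of X_i, X_j
        z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(data.shape[0] - len(cond) - 3)
        return 2 * norm.sf(abs(z))

    # toy chain X0 -> X1 -> X2: X0 and X2 are dependent, but independent given X1
    rng = np.random.default_rng(0)
    n = 2000
    x0 = rng.standard_normal(n)
    x1 = 0.8 * x0 + rng.standard_normal(n)
    x2 = 0.8 * x1 + rng.standard_normal(n)
    data = np.column_stack([x0, x1, x2])

    print(fisher_z_ci_test(data, 0, 2))             # small p-value: marginal dependence
    print(fisher_z_ci_test(data, 0, 2, cond=(1,)))  # large p-value: d-separation given X1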

arXiv link: http://arxiv.org/abs/2407.08602v1

Econometrics arXiv paper, submitted: 2024-07-11

Comparative analysis of Mixed-Data Sampling (MIDAS) model compared to Lag-Llama model for inflation nowcasting

Authors: Adam Bahelka, Harmen de Weerd

Inflation is one of the most important economic indicators closely watched by
both public institutions and private agents. This study compares the
performance of a traditional econometric model, Mixed Data Sampling regression,
with one of the newest developments from the field of Artificial Intelligence,
a foundational time series forecasting model based on a Long short-term memory
neural network called Lag-Llama, in their ability to nowcast the Harmonized
Index of Consumer Prices in the Euro area. The two models were compared to
assess whether Lag-Llama can outperform the MIDAS regression, with the MIDAS
regression evaluated under a best-case scenario on a dataset spanning 2010 to
2022. The following metrics were used to evaluate
the models: Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE),
Mean Squared Error (MSE), correlation with the target, R-squared and adjusted
R-squared. The results show better performance of the pre-trained Lag-Llama
across all metrics.
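
For readers unfamiliar with the MIDAS side of the comparison, the sketch below
fits a toy MIDAS regression with an exponential Almon lag polynomial by
nonlinear least squares. The simulated data, lag length, and weighting scheme
are illustrative assumptions and do not reproduce the setup of the paper.

    import numpy as np
    from scipy.optimize import minimize

    def exp_almon(theta, n_lags):
        """Exponential Almon weights w_j, j = 0..n_lags-1, normalized to sum to one."""
        j = np.arange(n_lags)
        w = np.exp(np.clip(theta[0] * j + theta[1] * j**2, -30, 30))
        return w / w.sum()

    def midas_sse(params, y, X_hf):
        """Sum of squared errors for y_t = b0 + b1 * sum_j w_j(theta) x_{t,j} + e_t."""
        b0, b1, t1, t2 = params
        agg = X_hf @ exp_almon([t1, t2], X_hf.shape[1])
        return np.sum((y - b0 - b1 * agg) ** 2)

    # toy monthly-to-quarterly example: 12 high-frequency lags per low-frequency observation
    rng = np.random.default_rng(0)
    T, n_lags = 200, 12
    X_hf = rng.standard_normal((T, n_lags))
    true_w = exp_almon([0.1, -0.05], n_lags)
    y = 0.5 + 2.0 * X_hf @ true_w + 0.3 * rng.standard_normal(T)

    res = minimize(midas_sse, x0=[0.0, 1.0, 0.0, 0.0], args=(y, X_hf),
                   method="Nelder-Mead", options={"maxiter": 5000})
    print(res.x[:2])  # estimates of (b0, b1)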

arXiv link: http://arxiv.org/abs/2407.08510v1

Econometrics arXiv paper, submitted: 2024-07-10

Production function estimation using subjective expectations data

Authors: Agnes Norris Keiller, Aureo de Paula, John Van Reenen

Standard methods for estimating production functions in the Olley and Pakes
(1996) tradition require assumptions on input choices. We introduce a new
method that exploits (increasingly available) data on a firm's expectations of
its future output and inputs that allows us to obtain consistent production
function parameter estimates while relaxing these input demand assumptions. In
contrast to dynamic panel methods, our proposed estimator can be implemented on
very short panels (including a single cross-section), and Monte Carlo
simulations show it outperforms alternative estimators when firms' material
input choices are subject to optimization error. Implementing a range of
production function estimators on UK data, we find our proposed estimator
yields results that are either similar to or more credible than commonly-used
alternatives. These differences are larger in industries where material inputs
appear harder to optimize. We show that TFP implied by our proposed estimator
is more strongly associated with future jobs growth than existing methods,
suggesting that failing to adequately account for input endogeneity may
underestimate the degree of dynamic reallocation in the economy.

arXiv link: http://arxiv.org/abs/2407.07988v1

Econometrics arXiv paper, submitted: 2024-07-10

Reduced-Rank Matrix Autoregressive Models: A Medium $N$ Approach

Authors: Alain Hecq, Ivan Ricardo, Ines Wilms

Reduced-rank regressions are powerful tools used to identify co-movements
within economic time series. However, this task becomes challenging when we
observe matrix-valued time series, where each dimension may have a different
co-movement structure. We propose reduced-rank regressions with a tensor
structure for the coefficient matrix to provide new insights into co-movements
within and between the dimensions of matrix-valued time series. Moreover, we
relate the co-movement structures to two commonly used reduced-rank models,
namely the serial correlation common feature and the index model. Two empirical
applications involving U.S. states and economic indicators for the Eurozone
and North American countries illustrate how our new tools identify
co-movements.

arXiv link: http://arxiv.org/abs/2407.07973v1

Econometrics arXiv paper, submitted: 2024-07-09

R. A. Fisher's Exact Test Revisited

Authors: Martin Mugnier

This note provides a conceptual clarification of Ronald Aylmer Fisher's
(1935) pioneering exact test in the context of the Lady Tasting Tea experiment.
It unveils a critical implicit assumption in Fisher's calibration: the taster
minimizes expected misclassification given fixed probabilistic information.
Without similar assumptions or an explicit alternative hypothesis, the
rationale behind Fisher's specification of the rejection region remains
unclear.
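
For concreteness, the classical calibration in the lady tasting tea design
(8 cups, 4 of each kind, the taster selects 4 as milk-first) yields a
hypergeometric null distribution under randomization; the short sketch below
reproduces the exact p-value of a perfect classification.

    from scipy.stats import hypergeom

    # 8 cups, 4 truly milk-first; the taster labels 4 cups as milk-first.
    # Under the randomization null, the number of correct "milk-first" picks
    # follows a Hypergeometric(M=8, n=4, N=4) distribution.
    p_perfect = hypergeom.sf(3, 8, 4, 4)  # P(X >= 4) = P(X = 4) = 1/70
    print(p_perfect)                      # ~0.0143, the exact p-value of a perfect score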

arXiv link: http://arxiv.org/abs/2407.07251v1

Econometrics arXiv paper, submitted: 2024-07-09

The Hidden Subsidy of the Affordable Care Act

Authors: Liam Sigaud, Markus Bjoerkheim, Vitor Melo

Under the ACA, the federal government paid a substantially larger share of
medical costs of newly eligible Medicaid enrollees than previously eligible
ones. States could save up to 100% of their per-enrollee costs by reclassifying
original enrollees into the newly eligible group. We examine whether this
fiscal incentive changed states' enrollment practices. We find that Medicaid
expansion caused large declines in the number of beneficiaries enrolled in the
original Medicaid population, suggesting widespread reclassifications. In 2019
alone, this phenomenon affected 4.4 million Medicaid enrollees at a federal
cost of $8.3 billion. Our results imply that reclassifications inflated the
federal cost of Medicaid expansion by 18.2%.

arXiv link: http://arxiv.org/abs/2407.07217v1

Econometrics arXiv paper, submitted: 2024-07-09

Dealing with idiosyncratic cross-correlation when constructing confidence regions for PC factors

Authors: Diego Fresoli, Pilar Poncela, Esther Ruiz

In this paper, we propose a computationally simple estimator of the
asymptotic covariance matrix of the Principal Components (PC) factors valid in
the presence of cross-correlated idiosyncratic components. The proposed
estimator of the asymptotic Mean Square Error (MSE) of PC factors is based on
adaptive thresholding of the sample covariances of the idiosyncratic
residuals, with the threshold based on their individual variances. We compare
the finite sample performance of confidence regions for the PC factors
obtained using the
proposed asymptotic MSE with those of available extant asymptotic and bootstrap
regions and show that the former beats all alternative procedures for a wide
variety of idiosyncratic cross-correlation structures.

arXiv link: http://arxiv.org/abs/2407.06883v1

Econometrics arXiv paper, submitted: 2024-07-09

Causes and Electoral Consequences of Political Assassinations: The Role of Organized Crime in Mexico

Authors: Roxana Gutiérrez-Romero, Nayely Iturbe

Mexico has experienced a notable surge in assassinations of political
candidates and mayors. This article argues that these killings are largely
driven by organized crime, aiming to influence candidate selection, control
local governments for rent-seeking, and retaliate against government
crackdowns. Using a new dataset of political assassinations in Mexico from 2000
to 2021 and instrumental variables, we address endogeneity concerns in the
location and timing of government crackdowns. Our instruments include
historical Chinese immigration patterns linked to opium cultivation in Mexico,
local corn prices, and U.S. illicit drug prices. The findings reveal that
candidates in municipalities near oil pipelines face an increased risk of
assassination due to drug trafficking organizations expanding into oil theft,
particularly during elections and fuel price hikes. Government arrests or
killings of organized crime members trigger retaliatory violence, further
endangering incumbent mayors. This political violence has a negligible impact
on voter turnout, as it targets politicians rather than voters. However, voter
turnout increases in areas where authorities disrupt drug smuggling, raising
the chances of the local party being re-elected. These results offer new
insights into how criminal groups attempt to capture local governments and the
implications for democracy under criminal governance.

arXiv link: http://arxiv.org/abs/2407.06733v1

Econometrics arXiv updated paper (originally submitted: 2024-07-09)

Femicide Laws, Unilateral Divorce, and Abortion Decriminalization Fail to Stop Women from Being Killed in Mexico

Authors: Roxana Gutiérrez-Romero

This paper evaluates the effectiveness of femicide laws in combating
gender-based killings of women, a major cause of premature female mortality.
Focusing on Mexico, a pioneer in adopting such legislation, the paper exploits
variations in the enactment of femicide laws and prison sentences across
states. Using the difference-in-differences estimator, the analysis reveals
femicide laws have not impacted femicides, homicides, disappearances, or
suicides of women. Results remain robust when considering differences in prison
sentencing, states introducing unilateral divorce, equitable divorce asset
compensation, or decriminalizing abortion. Findings also hold with synthetic
matching, suggesting laws are insufficient to combat gender-based violence in
contexts of impunity.

arXiv link: http://arxiv.org/abs/2407.06722v2

Econometrics arXiv updated paper (originally submitted: 2024-07-08)

Conditional Rank-Rank Regression

Authors: Victor Chernozhukov, Iván Fernández-Val, Jonas Meier, Aico van Vuuren, Francis Vella

Rank-rank regression is commonly employed in economic research as a way of
capturing the relationship between two economic variables. It frequently
features in studies of intergenerational mobility as the resulting coefficient,
capturing the rank correlation between the variables, is easy to interpret and
measures overall persistence. However, in many applications it is common
practice to include other covariates to account for differences in persistence
levels between groups defined by the values of these covariates. In these
instances the resulting coefficients can be difficult to interpret. We propose
the conditional rank-rank regression, which uses conditional ranks instead of
unconditional ranks, to measure average within-group persistence. The
difference between conditional and unconditional rank-rank regression
coefficients can then be interpreted as a measure of between-group persistence.
We develop a flexible estimation approach using distribution regression and
establish a theoretical framework for large sample inference. An empirical
study on intergenerational income mobility in Switzerland demonstrates the
advantages of this approach. The study reveals stronger intergenerational
persistence between fathers and sons compared to fathers and daughters, with
the within-group persistence explaining 62% of the overall income persistence
for sons and 52% for daughters. Smaller families and those with highly educated
fathers exhibit greater persistence in economic status.
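
A minimal numerical illustration of the distinction: an unconditional
rank-rank slope versus a slope computed on within-group ranks. The grouping,
the simulated data, and the use of simple within-group empirical ranks (rather
than the distribution-regression-based conditional ranks developed in the
paper) are illustrative assumptions.

    import numpy as np
    from scipy.stats import rankdata

    rng = np.random.default_rng(0)
    n = 5000
    group = rng.integers(0, 2, n)                        # e.g. two family types
    x = rng.standard_normal(n) + 1.0 * group             # parent income shifted by group
    y = 0.3 * x + 0.8 * group + rng.standard_normal(n)   # child income

    def rank01(v):
        return rankdata(v) / (len(v) + 1)                # ranks mapped to (0, 1)

    def slope(rx, ry):
        return np.polyfit(rx, ry, 1)[0]                  # OLS slope of rank on rank

    # unconditional rank-rank regression: overall persistence
    b_uncond = slope(rank01(x), rank01(y))

    # conditional version: ranks taken within each group, measuring within-group persistence
    rx_c, ry_c = np.empty(n), np.empty(n)
    for g in (0, 1):
        m = group == g
        rx_c[m], ry_c[m] = rank01(x[m]), rank01(y[m])
    b_cond = slope(rx_c, ry_c)

    print(b_uncond, b_cond, b_uncond - b_cond)  # the gap reflects between-group persistence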

arXiv link: http://arxiv.org/abs/2407.06387v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-07-08

Dynamic Matrix Factor Models for High Dimensional Time Series

Authors: Ruofan Yu, Rong Chen, Han Xiao, Yuefeng Han

Matrix time series, which consist of matrix-valued data observed over time,
are prevalent in various fields such as economics, finance, and engineering.
Such matrix time series data are often observed in high dimensions. Matrix
factor models are employed to reduce the dimensionality of such data, but they
lack the capability to make predictions without specified dynamics in the
latent factor process. To address this issue, we propose a two-component
dynamic matrix factor model that extends the standard matrix factor model by
incorporating a matrix autoregressive structure for the low-dimensional latent
factor process. This two-component model injects prediction capability to the
matrix factor model and provides deeper insights into the dynamics of
high-dimensional matrix time series. We present the estimation procedures of
the model and their theoretical properties, as well as empirical analysis of
the estimation procedures via simulations, and a case study of New York city
taxi data, demonstrating the performance and usefulness of the model.

arXiv link: http://arxiv.org/abs/2407.05624v1

Econometrics arXiv paper, submitted: 2024-07-08

Methodology for Calculating CO2 Absorption by Tree Planting for Greening Projects

Authors: Kento Ichii, Toshiki Muraoka, Nobumichi Shinohara, Shunsuke Managi, Shutaro Takeda

In order to explore the possibility of carbon credits for greening projects,
which play an important role in climate change mitigation, this paper examines
a formula for estimating the amount of carbon fixation for greening activities
in urban areas through tree planting. The usefulness of the formula studied was
examined by conducting calculations based on actual data collected through
on-site surveys of a greening company. A series of calculation results
suggest that this formula may be useful. Recognizing carbon credits for green
businesses for the carbon sequestration of their projects is an important
incentive not only as part of environmental improvement and climate change
action, but also to improve the health and well-being of local communities and
to generate economic benefits. This study is a pioneering exploration of the
methodology.

arXiv link: http://arxiv.org/abs/2407.05596v1

Econometrics arXiv paper, submitted: 2024-07-07

A Convexified Matching Approach to Imputation and Individualized Inference

Authors: YoonHaeng Hur, Tengyuan Liang

We introduce a new convexified matching method for missing value imputation
and individualized inference inspired by computational optimal transport. Our
method integrates favorable features from mainstream imputation approaches:
optimal matching, regression imputation, and synthetic control. We impute
counterfactual outcomes based on convex combinations of observed outcomes,
defined based on an optimal coupling between the treated and control data sets.
The optimal coupling problem is considered a convex relaxation to the
combinatorial optimal matching problem. We estimate granular-level individual
treatment effects while maintaining a desirable aggregate-level summary by
properly constraining the coupling. We construct transparent, individual
confidence intervals for the estimated counterfactual outcomes. We devise fast
iterative entropic-regularized algorithms to solve the optimal coupling problem
that scales favorably when the number of units to match is large. Entropic
regularization plays a crucial role in both inference and computation; it helps
control the width of the individual confidence intervals and design fast
optimization algorithms.
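
The entropic-regularized optimal coupling at the heart of the method can be
computed with standard Sinkhorn iterations. The sketch below is a generic
implementation on a toy matching problem; the uniform marginals, the squared
distance cost, and the regularization strength are illustrative assumptions,
and the aggregate-level constraints and individualized inference developed in
the paper are not reproduced.

    import numpy as np

    def sinkhorn(C, a, b, reg=0.1, n_iter=1000):
        """Entropic-regularized optimal coupling between marginals a and b for cost matrix C."""
        K = np.exp(-C / reg)
        u = np.ones_like(a)
        for _ in range(n_iter):
            v = b / (K.T @ u)
            u = a / (K @ v)
        return u[:, None] * K * v[None, :]  # coupling P with row sums a, column sums b

    rng = np.random.default_rng(0)
    x_treat, x_ctrl = rng.standard_normal(30), rng.standard_normal(50)
    C = (x_treat[:, None] - x_ctrl[None, :]) ** 2   # matching cost between units
    C = C / C.max()                                 # rescale cost to keep exp(-C/reg) stable
    a = np.full(30, 1 / 30)
    b = np.full(50, 1 / 50)
    P = sinkhorn(C, a, b)

    # impute a counterfactual for each treated unit as a convex combination of control outcomes
    y_ctrl = 2.0 * x_ctrl + rng.standard_normal(50)
    y_imputed = (P / P.sum(axis=1, keepdims=True)) @ y_ctrl
    print(y_imputed[:5])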

arXiv link: http://arxiv.org/abs/2407.05372v1

Econometrics arXiv updated paper (originally submitted: 2024-07-05)

A Short Note on Event-Study Synthetic Difference-in-Differences Estimators

Authors: Diego Ciccia

I propose an event study extension of Synthetic Difference-in-Differences
(SDID) estimators. I show that, in simple and staggered adoption designs,
estimators from Arkhangelsky et al. (2021) can be disaggregated into dynamic
treatment effect estimators, comparing the lagged outcome differentials of
treated and synthetic controls to their pre-treatment average. Estimators
presented in this note can be computed using the sdid_event Stata package.

arXiv link: http://arxiv.org/abs/2407.09565v2

Econometrics arXiv updated paper (originally submitted: 2024-07-05)

Learning control variables and instruments for causal analysis in observational data

Authors: Nicolas Apfel, Julia Hatamyar, Martin Huber, Jannis Kueck

This study introduces a data-driven, machine learning-based method to detect
suitable control variables and instruments for assessing the causal effect of a
treatment on an outcome in observational data, if they exist. Our approach
tests the joint existence of instruments, which are associated with the
treatment but not directly with the outcome (at least conditional on
observables), and suitable control variables, conditional on which the
treatment is exogenous, and learns the partition of instruments and control
variables from the observed data. The detection of sets of instruments and
control variables relies on the condition that proper instruments are
conditionally independent of the outcome given the treatment and suitable
control variables. We establish the consistency of our method for detecting
control variables and instruments under certain regularity conditions,
investigate the finite sample performance through a simulation study, and
provide an empirical application to labor market data from the Job Corps study.

arXiv link: http://arxiv.org/abs/2407.04448v2

Econometrics arXiv paper, submitted: 2024-07-05

Overeducation under different macroeconomic conditions: The case of Spanish university graduates

Authors: Maite Blázquez Cuesta, Marco A. Pérez Navarro, Rocío Sánchez-Mangas

This paper examines the incidence and persistence of overeducation in the
early careers of Spanish university graduates. We investigate the role played
by the business cycle and field of study and their interaction in shaping both
phenomena. We also analyse the relevance of specific types of knowledge and
skills as driving factors in reducing overeducation risk. We use data from the
Survey on the Labour Insertion of University Graduates (EILU) conducted by the
Spanish National Statistics Institute in 2014 and 2019. The survey collects
rich information on cohorts that graduated in the 2009/2010 and 2014/2015
academic years during the Great Recession and the subsequent economic recovery,
respectively. Our results show, first, the relevance of the economic scenario
when graduates enter the labour market. Graduation during a recession increased
overeducation risk and persistence. Second, a clear heterogeneous pattern
occurs across fields of study, with health sciences graduates displaying better
performance in terms of both overeducation incidence and persistence and less
impact of the business cycle. Third, we find evidence that some transversal
skills (language, IT, management) can help to reduce overeducation risk in the
absence of specific knowledge required for the job, thus indicating some kind
of compensatory role. Finally, our findings have important policy implications.
Overeducation, and more importantly overeducation persistence, imply a
non-neglectable misallocation of resources. Therefore, policymakers need to
address this issue in the design of education and labour market policies.

arXiv link: http://arxiv.org/abs/2407.04437v1

Econometrics arXiv updated paper (originally submitted: 2024-07-04)

Under the null of valid specification, pre-tests cannot make post-test inference liberal

Authors: Clément de Chaisemartin, Xavier D'Haultfœuille

Consider a parameter of interest, which can be consistently estimated under
some conditions. Suppose also that we can at least partly test these conditions
with specification tests. We consider the common practice of conducting
inference on the parameter of interest conditional on not rejecting these
tests. We show that if the tested conditions hold, conditional inference is
valid, though possibly conservative. This holds generally, without imposing any
assumption on the asymptotic dependence between the estimator of the parameter
of interest and the specification test.

arXiv link: http://arxiv.org/abs/2407.03725v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-07-04

When can weak latent factors be statistically inferred?

Authors: Jianqing Fan, Yuling Yan, Yuheng Zheng

This article establishes a new and comprehensive estimation and inference
theory for principal component analysis (PCA) under the weak factor model
that allows for cross-sectionally dependent idiosyncratic components under
nearly minimal factor strength relative to the noise level (signal-to-noise
ratio).
Our theory is applicable regardless of the relative growth rate between the
cross-sectional dimension $N$ and temporal dimension $T$. This more realistic
assumption and notable result require a completely new technical device, as
the commonly-used leave-one-out trick is no longer applicable in the presence
of cross-sectional dependence. Another notable advancement of our theory
concerns PCA inference -- for example, under the regime where $N\asymp T$, we
show that
the asymptotic normality for the PCA-based estimator holds as long as the
signal-to-noise ratio (SNR) grows faster than a polynomial rate of $\log N$.
This finding significantly surpasses prior work that required a polynomial rate
of $N$. Our theory is entirely non-asymptotic, offering finite-sample
characterizations for both the estimation error and the uncertainty level of
statistical inference. A notable technical innovation is our closed-form
first-order approximation of PCA-based estimator, which paves the way for
various statistical tests. Furthermore, we apply our theories to design
easy-to-implement statistics for validating whether given factors fall in the
linear spans of unknown latent factors, testing structural breaks in the factor
loadings for an individual unit, checking whether two units have the same risk
exposures, and constructing confidence intervals for systematic risks. Our
empirical studies uncover insightful correlations between our test results and
economic cycles.

arXiv link: http://arxiv.org/abs/2407.03616v3

Econometrics arXiv updated paper (originally submitted: 2024-07-03)

Finely Stratified Rerandomization Designs

Authors: Max Cytrynbaum

We study estimation and inference on causal parameters under finely
stratified rerandomization designs, which use baseline covariates to match
units into groups (e.g. matched pairs), then rerandomize within-group treatment
assignments until a balance criterion is satisfied. We show that finely
stratified rerandomization does partially linear regression adjustment by
design, providing nonparametric control over the stratified covariates and
linear control over the rerandomized covariates. We introduce several new forms
of rerandomization, allowing for imbalance metrics based on nonlinear
estimators, and proposing a minimax scheme that minimizes the computational
cost of rerandomization subject to a bound on estimation error. While the
asymptotic distribution of GMM estimators under stratified rerandomization is
generically non-normal, we show how to restore asymptotic normality using
ex-post linear adjustment tailored to the stratification. We derive new
variance bounds that enable conservative inference on finite population causal
parameters, and provide asymptotically exact inference on their superpopulation
counterparts.
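
A stripped-down sketch of the design: match units into pairs on a baseline
covariate, randomize treatment within each pair, and rerandomize until a
Mahalanobis balance criterion on the remaining covariates is met. The pairing
rule, balance metric, and acceptance threshold are illustrative assumptions
rather than the specific schemes analyzed in the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100                                  # even number of units
    x_match = rng.standard_normal(n)         # covariate used for pairing
    X_bal = rng.standard_normal((n, 3))      # covariates to balance by rerandomization

    # form matched pairs by sorting on the matching covariate
    order = np.argsort(x_match)
    pairs = order.reshape(-1, 2)

    def assign_within_pairs(pairs, rng):
        d = np.zeros(n, dtype=int)
        flip = rng.integers(0, 2, len(pairs))
        d[pairs[np.arange(len(pairs)), flip]] = 1    # one treated unit per pair
        return d

    def mahalanobis_imbalance(d, X):
        diff = X[d == 1].mean(axis=0) - X[d == 0].mean(axis=0)
        V = np.cov(X, rowvar=False) * (4 / n)        # approx. variance of the mean difference
        return diff @ np.linalg.solve(V, diff)

    threshold = 1.0                                  # accept only well-balanced assignments
    d = assign_within_pairs(pairs, rng)
    while mahalanobis_imbalance(d, X_bal) > threshold:
        d = assign_within_pairs(pairs, rng)
    print(d.sum(), mahalanobis_imbalance(d, X_bal))  # 50 treated units, imbalance below threshold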

arXiv link: http://arxiv.org/abs/2407.03279v3

Econometrics arXiv updated paper (originally submitted: 2024-07-03)

Wild inference for wild SVARs with application to heteroscedasticity-based IV

Authors: Bulat Gafarov, Madina Karamysheva, Andrey Polbin, Anton Skrobotov

Structural vector autoregressions are used to compute impulse response
functions (IRF) for persistent data. Existing multiple-parameter inference
requires cumbersome pretesting for unit roots, cointegration, and trends with
subsequent stationarization. To avoid pretesting, we propose a novel
dependent wild bootstrap procedure for simultaneous inference on IRF
using local projections (LP) estimated in levels in possibly
nonstationary and heteroscedastic SVARs. The bootstrap also
allows efficient smoothing of LP estimates.
We study IRFs to US monetary policy shocks, identified using the FOMC meeting
count as an instrument for the heteroscedasticity of monetary shocks. We
validate our method
using DSGE model simulations and alternative SVAR methods.

arXiv link: http://arxiv.org/abs/2407.03265v2

Econometrics arXiv paper, submitted: 2024-07-02

Conditional Forecasts in Large Bayesian VARs with Multiple Equality and Inequality Constraints

Authors: Joshua C. C. Chan, Davide Pettenuzzo, Aubrey Poon, Dan Zhu

Conditional forecasts, i.e. projections of a set of variables of interest on
the future paths of some other variables, are used routinely by empirical
macroeconomists in a number of applied settings. In spite of this, the existing
algorithms used to generate conditional forecasts tend to be very
computationally intensive, especially when working with large Vector
Autoregressions or when multiple linear equality and inequality constraints are
imposed at once. We introduce a novel precision-based sampler that is fast,
scales well, and yields conditional forecasts from linear equality and
inequality constraints. We show in a simulation study that the proposed method
produces forecasts that are identical to those from the existing algorithms but
in a fraction of the time. We then illustrate the performance of our method in
a large Bayesian Vector Autoregression where we simultaneously impose a mix of
linear equality and inequality constraints on the future trajectories of key US
macroeconomic indicators over the 2020--2022 period.
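
At its core, the object being computed is a Gaussian predictive distribution
restricted by linear constraints. The sketch below illustrates the
equality-constraint case with textbook Gaussian conditioning formulas applied
to a stacked forecast vector; the paper's contribution is a precision-based
sampler that makes this scale and also handles inequality constraints, which
the dense-matrix toy below does not attempt.

    import numpy as np

    def condition_on_equalities(mu, Sigma, R, r):
        """Distribution of y ~ N(mu, Sigma) given R @ y = r (hard linear conditions)."""
        S_RR = R @ Sigma @ R.T
        gain = Sigma @ R.T @ np.linalg.inv(S_RR)
        mu_c = mu + gain @ (r - R @ mu)
        Sigma_c = Sigma - gain @ R @ Sigma
        return mu_c, Sigma_c

    # toy: unconditional forecasts of 3 variables over 2 horizons, stacked as a length-6 vector
    rng = np.random.default_rng(0)
    mu = rng.standard_normal(6)
    A = rng.standard_normal((6, 6))
    Sigma = A @ A.T + np.eye(6)

    # condition on the first variable following a fixed path over both horizons
    R = np.zeros((2, 6)); R[0, 0] = 1.0; R[1, 3] = 1.0
    r = np.array([0.5, 0.7])
    mu_c, Sigma_c = condition_on_equalities(mu, Sigma, R, r)
    print(R @ mu_c)  # equals r: the conditioned path is respected exactly

    # conditional forecast draws (tiny ridge keeps the singular covariance numerically PSD)
    draws = rng.multivariate_normal(mu_c, Sigma_c + 1e-10 * np.eye(6), size=1000)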

arXiv link: http://arxiv.org/abs/2407.02262v1

Econometrics arXiv paper, submitted: 2024-07-02

How do financial variables impact public debt growth in China? An empirical study based on Markov regime-switching model

Authors: Tianbao Zhou, Zhixin Liu, Yingying Xu

The deep financial turmoil in China caused by the COVID-19 pandemic has
exacerbated fiscal shocks and soaring public debt levels, which raises concerns
about the stability and sustainability of China's public debt growth in the
future. This paper employs the Markov regime-switching model with time-varying
transition probability (TVTP-MS) to investigate the growth pattern of China's
public debt and the impact of financial variables such as credit, house prices
and stock prices on the growth of public debt. We identify two distinct regimes
of China's public debt, i.e., the surge regime with high growth rate and high
volatility and the steady regime with low growth rate and low volatility. The
main results are twofold. On the one hand, an increase in the growth rate of
the financial variables helps to moderate the growth rate of public debt,
whereas the effects differ between the two regimes. More specifically, the
impacts of credit and house prices are significant in the surge regime, whereas
stock prices affect public debt growth significantly in the steady regime. On
the other hand, a higher growth rate of financial variables also increases the
probability of public debt either staying in or switching to the steady regime.
These findings highlight the necessity of aligning financial adjustments with
the prevailing public debt regime when developing sustainable fiscal policies.
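
Readers who want to experiment with this kind of regime classification can
start from a fixed-transition-probability Markov switching model, which
statsmodels provides out of the box. This is a simplified stand-in for the
TVTP-MS specification used in the paper (the time-varying transition
probabilities and the Chinese debt and financial series are not reproduced),
and the simulated data are purely illustrative.

    import numpy as np
    import statsmodels.api as sm

    # simulate a growth-rate series with a high-mean/high-volatility regime and a calm regime
    rng = np.random.default_rng(0)
    n = 400
    state = np.zeros(n, dtype=int)
    for t in range(1, n):
        stay = 0.95 if state[t - 1] == 0 else 0.90
        state[t] = state[t - 1] if rng.random() < stay else 1 - state[t - 1]
    y = np.where(state == 1, 3.0, 0.5) + np.where(state == 1, 2.0, 0.5) * rng.standard_normal(n)

    # two-regime Markov switching mean/variance model with constant transition probabilities
    mod = sm.tsa.MarkovRegression(y, k_regimes=2, trend="c", switching_variance=True)
    res = mod.fit()
    print(res.summary())
    probs = np.asarray(res.smoothed_marginal_probabilities)  # smoothed regime probabilities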

arXiv link: http://arxiv.org/abs/2407.02183v1

Econometrics arXiv updated paper (originally submitted: 2024-07-01)

Macroeconomic Forecasting with Large Language Models

Authors: Andrea Carriero, Davide Pettenuzzo, Shubhranshu Shekhar

This paper presents a comparative analysis evaluating the accuracy of Large
Language Models (LLMs) against traditional macro time series forecasting
approaches. In recent times, LLMs have surged in popularity for forecasting due
to their ability to capture intricate patterns in data and quickly adapt across
very different domains. However, their effectiveness in forecasting
macroeconomic time series data compared to conventional methods remains an area
of interest. To address this, we conduct a rigorous evaluation of LLMs against
traditional macro forecasting methods, using as common ground the FRED-MD
database. Our findings provide valuable insights into the strengths and
limitations of LLMs in forecasting macroeconomic time series, shedding light on
their applicability in real-world scenarios.

arXiv link: http://arxiv.org/abs/2407.00890v4

Econometrics arXiv updated paper (originally submitted: 2024-06-28)

Three Scores and 15 Years (1948-2023) of Rao's Score Test: A Brief History

Authors: Anil K. Bera, Yannis Bilias

Rao (1948) introduced the score test statistic as an alternative to the
likelihood ratio and Wald test statistics. In spite of the optimality
properties of the score statistic shown in Rao and Poti (1946), the Rao score
(RS) test remained unnoticed for almost 20 years. Today, the RS test is part of
the “Holy Trinity” of hypothesis testing and has found its place in the
Statistics and Econometrics textbooks and related software. Reviewing the
history of the RS test, we note that remarkable test statistics proposed in
the literature earlier than or around the time of Rao (1948), mostly from
intuition, such as Pearson's (1900) goodness-of-fit test, Moran's (1948) I
test for spatial dependence, and Durbin and Watson's (1950) test for serial
correlation, can be given an RS test statistic interpretation. At the same
time, recent developments in robust hypothesis testing under certain forms of
misspecification make the RS test an active area of research in Statistics and
Econometrics. From our brief account of the history of the RS test, we
conclude that its impact on science goes far beyond its calendar starting
point, with promising research directions for many years to come.

arXiv link: http://arxiv.org/abs/2406.19956v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-06-28

Vector AutoRegressive Moving Average Models: A Review

Authors: Marie-Christine Düker, David S. Matteson, Ruey S. Tsay, Ines Wilms

Vector AutoRegressive Moving Average (VARMA) models form a powerful and
general model class for analyzing dynamics among multiple time series. While
VARMA models encompass the Vector AutoRegressive (VAR) models, their
popularity in empirical applications is dominated by that of the latter. Can
this phenomenon be
explained fully by the simplicity of VAR models? Perhaps many users of VAR
models have not fully appreciated what VARMA models can provide. The goal of
this review is to provide a comprehensive resource for researchers and
practitioners seeking insights into the advantages and capabilities of VARMA
models. We start by reviewing the identification challenges inherent to VARMA
models thereby encompassing classical and modern identification schemes and we
continue along the same lines regarding estimation, specification and diagnosis
of VARMA models. We then highlight the practical utility of VARMA models in
terms of Granger Causality analysis, forecasting and structural analysis as
well as recent advances and extensions of VARMA models to further facilitate
their adoption in practice. Finally, we discuss some interesting future
research directions where VARMA models can fulfill their potentials in
applications as compared to their subclass of VAR models.
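
For practitioners who want to try a VARMA specification alongside a VAR,
statsmodels exposes one through its state-space VARMAX class. The sketch below
fits a VARMA(1,1) to simulated bivariate data; the system and the chosen order
are illustrative assumptions.

    import numpy as np
    from statsmodels.tsa.statespace.varmax import VARMAX

    # simulate a bivariate VARMA(1,1) process
    rng = np.random.default_rng(0)
    n = 500
    Phi = np.array([[0.5, 0.1], [0.0, 0.4]])    # AR coefficient matrix
    Theta = np.array([[0.3, 0.0], [0.1, 0.2]])  # MA coefficient matrix
    e = rng.standard_normal((n + 1, 2))
    y = np.zeros((n + 1, 2))
    for t in range(1, n + 1):
        y[t] = Phi @ y[t - 1] + e[t] + Theta @ e[t - 1]
    y = y[1:]

    # fit VARMA(1,1); statsmodels may warn that VARMA(p,q) estimation
    # is not generically identified without further restrictions
    model = VARMAX(y, order=(1, 1))
    res = model.fit(disp=False)
    print(res.summary())
    fcast = res.forecast(steps=8)  # 8-step-ahead forecasts for both series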

arXiv link: http://arxiv.org/abs/2406.19702v1

Econometrics arXiv paper, submitted: 2024-06-27

Factor multivariate stochastic volatility models of high dimension

Authors: Benjamin Poignard, Manabu Asai

Building upon the pertinence of the factor decomposition to break the curse
of dimensionality inherent to multivariate volatility processes, we develop a
factor model-based multivariate stochastic volatility (fMSV) framework that
relies on two viewpoints: sparse approximate factor model and sparse factor
loading matrix. We propose a two-stage estimation procedure for the fMSV model:
the first stage obtains the estimators of the factor model, and the second
stage estimates the MSV part using the estimated common factor variables. We
derive the asymptotic properties of the estimators. Simulated experiments are
performed to assess the forecasting performances of the covariance matrices.
The empirical analysis based on vectors of asset returns illustrates that the
fMSV models outperform competing conditional covariance models in forecasting
performance.

arXiv link: http://arxiv.org/abs/2406.19033v1

Econometrics arXiv updated paper (originally submitted: 2024-06-27)

A Note on Identification of Match Fixed Effects as Interpretable Unobserved Match Affinity

Authors: Suguru Otani, Tohya Sugano

We highlight that match fixed effects, represented by the coefficients of
interaction terms involving dummy variables for two elements, lack
identification without specific restrictions on parameters. Consequently, the
coefficients typically reported as relative match fixed effects by statistical
software are not interpretable. To address this, we establish normalization
conditions that enable identification of match fixed effect parameters as
interpretable indicators of unobserved match affinity, facilitating comparisons
among observed matches. Using data from middle school students in the 2007
Trends in International Mathematics and Science Study (TIMSS), we highlight the
distribution of comparable match fixed effects within a specific school.

arXiv link: http://arxiv.org/abs/2406.18913v3

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2024-06-26

Online Distributional Regression

Authors: Simon Hirsch, Jonathan Berrisch, Florian Ziel

Large-scale streaming data are common in modern machine learning applications
and have led to the development of online learning algorithms. Many fields,
such as supply chain management, weather and meteorology, energy markets, and
finance, have pivoted towards using probabilistic forecasts. This results in
the need not only for accurate learning of the expected value but also for
learning the conditional heteroskedasticity and conditional moments. Against
this backdrop, we present a methodology for online estimation of regularized,
linear distributional models. The proposed algorithm is based on a combination
of recent developments for the online estimation of LASSO models and the
well-known GAMLSS framework. We provide a case study on day-ahead electricity
price forecasting, in which we show the competitive performance of the
incremental estimation combined with strongly reduced computational effort. Our
algorithms are implemented in a computationally efficient Python package ondil.

arXiv link: http://arxiv.org/abs/2407.08750v3

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2024-06-25

LABOR-LLM: Language-Based Occupational Representations with Large Language Models

Authors: Susan Athey, Herman Brunborg, Tianyu Du, Ayush Kanodia, Keyon Vafa

Vafa et al. (2024) introduced a transformer-based econometric model, CAREER,
that predicts a worker's next job as a function of career history (an
"occupation model"). CAREER was initially estimated ("pre-trained") using a
large, unrepresentative resume dataset, which served as a "foundation model,"
and parameter estimation was continued ("fine-tuned") using data from a
representative survey. CAREER had better predictive performance than
benchmarks. This paper considers an alternative where the resume-based
foundation model is replaced by a large language model (LLM). We convert
tabular data from the survey into text files that resemble resumes and
fine-tune the LLMs on these text files with the objective of predicting the
next token (word). The resulting fine-tuned LLM is used as an input to an
occupation model. Its predictive performance surpasses all prior models. We
demonstrate the value of fine-tuning and further show that by adding more
career data from a different population, fine-tuning smaller LLMs surpasses the
performance of fine-tuning larger models.

arXiv link: http://arxiv.org/abs/2406.17972v3

Econometrics arXiv paper, submitted: 2024-06-25

Forecast Relative Error Decomposition

Authors: Christian Gourieroux, Quinlan Lee

We introduce a class of relative error decomposition measures that are
well-suited for the analysis of shocks in nonlinear dynamic models. They
include the Forecast Relative Error Decomposition (FRED), Forecast Error
Kullback Decomposition (FEKD) and Forecast Error Laplace Decomposition (FELD).
These measures are favourable over the traditional Forecast Error Variance
Decomposition (FEVD) because they account for nonlinear dependence in both a
serial and cross-sectional sense. This is illustrated by applications to
dynamic models for qualitative data, count data, stochastic volatility and
cyberrisk.

arXiv link: http://arxiv.org/abs/2406.17708v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-06-25

Estimation and Inference for CP Tensor Factor Models

Authors: Bin Chen, Yuefeng Han, Qiyang Yu

High-dimensional tensor-valued data have recently gained attention from
researchers in economics and finance. We consider the estimation and inference
of high-dimensional tensor factor models, where each dimension of the tensor
diverges. Our focus is on a factor model that admits CP-type tensor
decomposition, which allows for non-orthogonal loading vectors. Based on the
contemporary covariance matrix, we propose an iterative simultaneous projection
estimation method. Our estimator is robust to weak dependence among factors and
weak correlation across different dimensions in the idiosyncratic shocks. We
establish an inferential theory, demonstrating both consistency and asymptotic
normality under relaxed assumptions. Within a unified framework, we consider
two eigenvalue ratio-based estimators for the number of factors in a tensor
factor model and justify their consistency. Simulation studies confirm the
theoretical results and an empirical application to sorted portfolios reveals
three important factors: a market factor, a long-short factor, and a volatility
factor.

arXiv link: http://arxiv.org/abs/2406.17278v2

Econometrics arXiv paper, submitted: 2024-06-24

Efficient two-sample instrumental variable estimators with change points and near-weak identification

Authors: Bertille Antoine, Otilia Boldea, Niccolo Zaccaria

We consider estimation and inference in a linear model with endogenous
regressors where the parameters of interest change across two samples. If the
first-stage is common, we show how to use this information to obtain more
efficient two-sample GMM estimators than the standard split-sample GMM, even in
the presence of near-weak instruments. We also propose two tests to detect
change points in the parameters of interest, depending on whether the
first-stage is common or not. We derive the limiting distribution of these
tests and show that they have non-trivial power even under weaker and possibly
time-varying identification patterns. The finite sample properties of our
proposed estimators and testing procedures are illustrated in a series of
Monte-Carlo experiments, and in an application to the open-economy New
Keynesian Phillips curve. Our empirical analysis using US data provides strong
support for a New Keynesian Phillips curve with incomplete pass-through and
reveals important time variation in the relationship between inflation and
exchange rate pass-through.

arXiv link: http://arxiv.org/abs/2406.17056v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2024-06-23

F-FOMAML: GNN-Enhanced Meta-Learning for Peak Period Demand Forecasting with Proxy Data

Authors: Zexing Xu, Linjun Zhang, Sitan Yang, Rasoul Etesami, Hanghang Tong, Huan Zhang, Jiawei Han

Demand prediction is a crucial task for e-commerce and physical retail
businesses, especially during high-stake sales events. However, the limited
availability of historical data from these peak periods poses a significant
challenge for traditional forecasting methods. In this paper, we propose a
novel approach that leverages strategically chosen proxy data reflective of
potential sales patterns from similar entities during non-peak periods,
enriched by features learned from a graph neural networks (GNNs)-based
forecasting model, to predict demand during peak events. We formulate the
demand prediction as a meta-learning problem and develop the Feature-based
First-Order Model-Agnostic Meta-Learning (F-FOMAML) algorithm that leverages
proxy data from non-peak periods and GNN-generated relational metadata to learn
feature-specific layer parameters, thereby adapting to demand forecasts for
peak events. Theoretically, we show that by considering domain similarities
through task-specific metadata, our model achieves improved generalization,
where the excess risk decreases as the number of training tasks increases.
Empirical evaluations on large-scale industrial datasets demonstrate the
superiority of our approach. Compared to existing state-of-the-art models, our
method demonstrates a notable improvement in demand prediction accuracy,
reducing the Mean Absolute Error by 26.24% on an internal vending machine
dataset and by 1.04% on the publicly accessible JD.com dataset.

arXiv link: http://arxiv.org/abs/2406.16221v1

Econometrics arXiv paper, submitted: 2024-06-22

Testing for Restricted Stochastic Dominance under Survey Nonresponse with Panel Data: Theory and an Evaluation of Poverty in Australia

Authors: Rami V. Tabri, Mathew J. Elias

This paper lays the groundwork for a unifying approach to stochastic
dominance testing under survey nonresponse that integrates the partial
identification approach to incomplete data and design-based inference for
complex survey data. We propose a novel inference procedure for restricted
$s$th-order stochastic dominance, tailored to accommodate a broad spectrum of
nonresponse assumptions. The method uses pseudo-empirical likelihood to
formulate the test statistic and compares it to a critical value from the
chi-squared distribution with one degree of freedom. We detail the procedure's
asymptotic properties under both null and alternative hypotheses, establishing
its uniform validity under the null and consistency against various
alternatives. Using the Household, Income and Labour Dynamics in Australia
survey, we demonstrate the procedure's utility in a sensitivity analysis of
temporal poverty comparisons among Australian households.

arXiv link: http://arxiv.org/abs/2406.15702v1

Econometrics arXiv updated paper (originally submitted: 2024-06-21)

Identification and Estimation of Causal Effects in High-Frequency Event Studies

Authors: Alessandro Casini, Adam McCloskey

We provide precise conditions for nonparametric identification of causal
effects by high-frequency event study regressions, which have been used widely
in the recent macroeconomics, financial economics and political economy
literatures. The high-frequency event study method regresses changes in an
outcome variable on a measure of unexpected changes in a policy variable in a
narrow time window around an event or a policy announcement (e.g., a 30-minute
window around an FOMC announcement). We show that, contrary to popular belief,
the narrow size of the window is not sufficient for identification. Rather, the
population regression coefficient identifies a causal estimand when (i) the
effect of the policy shock on the outcome does not depend on the other
variables (separability) and (ii) the surprise component of the news or event
dominates all other variables that are present in the event window (relative
exogeneity). Technically, the latter condition requires the ratio between the
variance of the policy shock and that of the other variables to be infinite in
the event window. Under these conditions, we establish the causal meaning of
the event study estimand corresponding to the regression coefficient and the
consistency and asymptotic normality of the event study estimator. Notably,
this standard linear regression estimator is robust to general forms of
nonlinearity. We apply our results to Nakamura and Steinsson's (2018a) analysis
of the real economic effects of monetary policy, providing a simple empirical
procedure to analyze the extent to which the standard event study estimator
adequately estimates causal effects of interest.

arXiv link: http://arxiv.org/abs/2406.15667v5

Econometrics arXiv cross-link from cs.DL (cs.DL), submitted: 2024-06-21

The disruption index suffers from citation inflation and is confounded by shifts in scholarly citation practice

Authors: Alexander M. Petersen, Felber Arroyave, Fabio Pammolli

Measuring the rate of innovation in academia and industry is fundamental to
monitoring the efficiency and competitiveness of the knowledge economy. To this
end, a disruption index (CD) was recently developed and applied to publication
and patent citation networks (Wu et al., Nature 2019; Park et al., Nature
2023). Here we show that CD systematically decreases over time due to secular
growth in research and patent production, following two distinct mechanisms
unrelated to innovation -- one behavioral and the other structural. Whereas the
behavioral explanation reflects shifts associated with techno-social factors
(e.g. self-citation practices), the structural explanation follows from
`citation inflation' (CI), an inextricable feature of real citation networks
attributable to increasing reference list lengths, which causes CD to
systematically decrease. We demonstrate this causal link by way of mathematical
deduction, computational simulation, multi-variate regression, and
quasi-experimental comparison of the disruptiveness of PNAS versus PNAS Plus
articles, which differ only in their lengths. Accordingly, we analyze CD data
available in the SciSciNet database and find that disruptiveness incrementally
increased from 2005-2015, and that the negative relationship between disruption
and team size is remarkably small in overall effect size, and shifts
from negative to positive for team size $\geq$ 8 coauthors.

arXiv link: http://arxiv.org/abs/2406.15311v1

Econometrics arXiv updated paper (originally submitted: 2024-06-21)

Difference-in-Differences when Parallel Trends Holds Conditional on Covariates

Authors: Carolina Caetano, Brantly Callaway

In this paper, we study difference-in-differences identification and
estimation strategies when the parallel trends assumption holds after
conditioning on covariates. We consider empirically relevant settings where the
covariates can be time-varying, time-invariant, or both. We uncover a number of
weaknesses of commonly used two-way fixed effects (TWFE) regressions in this
context, even in applications with only two time periods. In addition to some
weaknesses due to estimating linear regression models that are similar to cases
with cross-sectional data, we also point out a collection of additional issues
that we refer to as hidden linearity bias that arise because the
transformations used to eliminate the unit fixed effect also transform the
covariates (e.g., taking first differences can result in the estimating
equation only including the change in covariates over time, not their level,
and also drop time-invariant covariates altogether). We provide simple
diagnostics for assessing how susceptible a TWFE regression is to hidden
linearity bias based on reformulating the TWFE regression as a weighting
estimator. Finally, we propose simple alternative estimation strategies that
can circumvent these issues.
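
A small simulation can make the hidden linearity bias point concrete: when a
time-invariant covariate shifts trends, first-differencing drops that
covariate from a TWFE regression, while a simple outcome-regression adjustment
that uses the covariate level recovers the effect. The data-generating process
and the particular adjustment below are illustrative assumptions, not the full
set of estimators proposed in the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 20000
    x = rng.standard_normal(n)                          # time-invariant covariate
    d = rng.binomial(1, 1 / (1 + np.exp(-x)))           # treatment more likely when x is high
    trend = 1.0 + 2.0 * x                               # parallel trends only conditional on x
    y0 = x + rng.standard_normal(n)                     # period-1 outcome
    y1 = y0 + trend + 1.0 * d + rng.standard_normal(n)  # period-2 outcome, true ATT = 1
    dy = y1 - y0

    # TWFE in two periods reduces to regressing dy on d (x drops out after differencing)
    tau_twfe = dy[d == 1].mean() - dy[d == 0].mean()

    # outcome-regression DiD: model the untreated trend as a function of x, then adjust
    b = np.polyfit(x[d == 0], dy[d == 0], 1)            # linear trend model fit on controls
    tau_or = (dy[d == 1] - np.polyval(b, x[d == 1])).mean()

    print(tau_twfe, tau_or)  # TWFE is biased upward here; the adjusted estimator is close to 1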

arXiv link: http://arxiv.org/abs/2406.15288v2

Econometrics arXiv paper, submitted: 2024-06-21

MIDAS-QR with 2-Dimensional Structure

Authors: Tibor Szendrei, Arnab Bhattacharjee, Mark E. Schaffer

Mixed frequency data has been shown to improve the performance of
growth-at-risk models in the literature. Most of the research has focused on
imposing structure on the high-frequency lags when estimating MIDAS-QR models
akin to what is done in mean models. However, only imposing structure on the
lag-dimension can potentially induce quantile variation that would otherwise
not be there. In this paper we extend the framework by introducing structure on
both the lag dimension and the quantile dimension. In this way we are able to
shrink unnecessary quantile variation in the high-frequency variables. This
leads to more gradual lag profiles in both dimensions compared to the MIDAS-QR
and UMIDAS-QR. We show that this proposed method leads to further gains in
nowcasting and forecasting on a pseudo-out-of-sample exercise on US data.

arXiv link: http://arxiv.org/abs/2406.15157v1

Econometrics arXiv cross-link from cs.GT (cs.GT), submitted: 2024-06-21

Statistical Inference and A/B Testing in Fisher Markets and Paced Auctions

Authors: Luofeng Liao, Christian Kroer

We initiate the study of statistical inference and A/B testing for two market
equilibrium models: linear Fisher market (LFM) equilibrium and first-price
pacing equilibrium (FPPE). LFM arises from fair resource allocation systems
such as allocation of food to food banks and notification opportunities to
different types of notifications. For LFM, we assume that the data observed is
captured by the classical finite-dimensional Fisher market equilibrium, and its
steady-state behavior is modeled by a continuous limit Fisher market. The
second type of equilibrium we study, FPPE, arises from internet advertising
where advertisers are constrained by budgets and advertising opportunities are
sold via first-price auctions. For platforms that use pacing-based methods to
smooth out the spending of advertisers, FPPE provides a hindsight-optimal
configuration of the pacing method. We propose a statistical framework for the
FPPE model, in which a continuous limit FPPE models the steady-state behavior
of the auction platform, and a finite FPPE provides the data to estimate
primitives of the limit FPPE. Both LFM and FPPE have an Eisenberg-Gale convex
program characterization, the pillar upon which we derive our statistical
theory. We start by deriving basic convergence results for the finite market to
the limit market. We then derive asymptotic distributions, and construct
confidence intervals. Furthermore, we establish the asymptotic local minimax
optimality of estimation based on finite markets. We then show that the theory
can be used for conducting statistically valid A/B testing on auction
platforms. Synthetic and semi-synthetic experiments verify the validity and
practicality of our theory.

arXiv link: http://arxiv.org/abs/2406.15522v3

Econometrics arXiv cross-link from cs.CE (cs.CE), submitted: 2024-06-20

Movement Prediction-Adjusted Naive Forecast: Is the Naive Baseline Unbeatable in Financial Time Series Forecasting?

Authors: Cheng Zhang

In financial time series forecasting, the naive forecast is a notoriously
difficult benchmark to surpass because of the stochastic nature of the data.
Motivated by this challenge, this study introduces the movement
prediction-adjusted naive forecast (MPANF), a forecast combination method that
systematically refines the naive forecast by incorporating directional
information. In particular, MPANF adjusts the naive forecast with an increment
formed by three components: the in-sample mean absolute increment as the base
magnitude, the movement prediction as the sign, and a coefficient derived from
the in-sample movement prediction accuracy as the scaling factor. The
experimental results on eight financial time series, using the RMSE, MAE, MAPE,
and sMAPE, show that with a movement prediction accuracy of approximately 0.55,
MPANF generally outperforms common benchmarks, including the naive forecast,
naive forecast with drift, IMA(1,1), and linear regression. These findings
indicate that MPANF has the potential to outperform the naive baseline when
reliable movement predictions are available.
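
A compact sketch of the adjustment rule as described above: naive forecast
plus sign times base magnitude times a scaling coefficient. The mapping from
in-sample movement accuracy to the coefficient (here 2*accuracy - 1) and the
toy series are assumptions made for illustration and need not match the exact
specification in the paper.

    import numpy as np

    def mpanf_forecast(y_insample, last_value, movement_pred, movement_acc):
        """Movement prediction-adjusted naive forecast (illustrative form).

        y_insample   : in-sample series used to calibrate the increment size
        last_value   : most recent observation (the naive forecast)
        movement_pred: +1 if the next move is predicted up, -1 if down
        movement_acc : in-sample directional accuracy of the movement predictor
        """
        base = np.mean(np.abs(np.diff(y_insample)))  # in-sample mean absolute increment
        coef = 2.0 * movement_acc - 1.0              # assumed mapping from accuracy to scaling
        return last_value + coef * movement_pred * base

    rng = np.random.default_rng(0)
    y = np.cumsum(rng.standard_normal(500))          # toy price-like series
    print(mpanf_forecast(y[:-1], y[-1], movement_pred=+1, movement_acc=0.55))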

arXiv link: http://arxiv.org/abs/2406.14469v10

Econometrics arXiv updated paper (originally submitted: 2024-06-20)

Estimating Treatment Effects under Recommender Interference: A Structured Neural Networks Approach

Authors: Ruohan Zhan, Shichao Han, Yuchen Hu, Zhenling Jiang

Recommender systems are essential for content-sharing platforms by curating
personalized content. To evaluate updates to recommender systems targeting
content creators, platforms frequently rely on creator-side randomized
experiments. The treatment effect measures the change in outcomes when a new
algorithm is implemented compared to the status quo. We show that the standard
difference-in-means estimator can lead to biased estimates due to recommender
interference that arises when treated and control creators compete for
exposure. We propose a "recommender choice model" that describes which item
gets exposed from a pool containing both treated and control items. By
combining a structural choice model with neural networks, this framework
directly models the interference pathway while accounting for rich
viewer-content heterogeneity. We construct a debiased estimator of the
treatment effect and prove it is $\sqrt n$-consistent and asymptotically normal
with potentially correlated samples. We validate our estimator's empirical
performance with a field experiment on Weixin short-video platform. In addition
to the standard creator-side experiment, we conduct a costly double-sided
randomization design to obtain a benchmark estimate free from interference
bias. We show that the proposed estimator yields results comparable to the
benchmark, whereas the standard difference-in-means estimator can exhibit
significant bias and even produce reversed signs.

arXiv link: http://arxiv.org/abs/2406.14380v3

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2024-06-20

Temperature in the Iberian Peninsula: Trend, seasonality, and heterogeneity

Authors: C. Vladimir Rodríguez-Caballero, Esther Ruiz

In this paper, we propose fitting unobserved component models to represent
the dynamic evolution of bivariate systems of centre and log-range temperatures
obtained monthly from minimum/maximum temperatures observed at a given
location. In doing so, the centre and log-range temperature are decomposed into
potentially stochastic trends, seasonal, and transitory components. Since our
model encompasses deterministic trends and seasonal components as limiting
cases, we contribute to the debate on whether stochastic or deterministic
components better represent the trend and seasonal components. The methodology
is implemented to centre and log-range temperature observed in four locations
in the Iberian Peninsula, namely, Barcelona, Coruña, Madrid, and Seville.
We show that, at each location, the centre temperature can be represented by a
smooth integrated random walk with time-varying slope, while a stochastic level
better represents the log-range. We also show that centre and log-range
temperature are unrelated. The methodology is then extended to simultaneously
model centre and log-range temperature observed at several locations in the
Iberian Peninsula. We fit a multi-level dynamic factor model to extract
potential commonalities among centre (log-range) temperature while also
allowing for heterogeneity in different areas in the Iberian Peninsula. We show
that, although the commonality in trends of average temperature is
considerable, the regional components are also relevant.
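
The single-location specification described above (a smooth integrated random
walk trend plus a seasonal component) can be prototyped directly with
statsmodels' unobserved components class. The simulated monthly series and the
'smooth trend' plus stochastic seasonal choices below are illustrative
assumptions, not the bivariate centre/log-range system estimated in the paper.

    import numpy as np
    from statsmodels.tsa.statespace.structural import UnobservedComponents

    # simulate a monthly "centre temperature": slow trend + annual cycle + noise
    rng = np.random.default_rng(0)
    n = 240
    t = np.arange(n)
    y = 15 + 0.002 * t**1.5 + 8 * np.sin(2 * np.pi * t / 12) + rng.standard_normal(n)

    # smooth trend = integrated random walk with time-varying slope; stochastic monthly seasonal
    model = UnobservedComponents(y, level="smooth trend", seasonal=12,
                                 stochastic_seasonal=True)
    res = model.fit(disp=False)
    print(res.summary())
    forecasts = res.forecast(steps=24)  # two years of out-of-sample forecasts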

arXiv link: http://arxiv.org/abs/2406.14145v1

Econometrics arXiv updated paper (originally submitted: 2024-06-20)

Estimating Time-Varying Parameters of Various Smoothness in Linear Models via Kernel Regression

Authors: Mikihito Nishi

We consider estimating nonparametric time-varying parameters in linear models
using kernel regression. Our contributions are threefold. First, we consider a
broad class of time-varying parameters including deterministic smooth
functions, the rescaled random walk, structural breaks, the threshold model and
their mixtures. We show that those time-varying parameters can be consistently
estimated by kernel regression. Our analysis exploits the smoothness of the
time-varying parameter quantified by a single parameter. The second
contribution is to reveal that the bandwidth used in kernel regression
determines a trade-off between the rate of convergence and the size of the
class of time-varying parameters that can be estimated. We demonstrate that an
improper choice of the bandwidth yields biased estimation, and argue that the
bandwidth should be selected according to the smoothness of the time-varying
parameter. Our third contribution is to propose a data-driven procedure for
bandwidth selection that is adaptive to the smoothness of the time-varying
parameter.

arXiv link: http://arxiv.org/abs/2406.14046v4
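
A minimal numpy sketch of the kernel-regression idea for a time-varying coefficient (local-constant estimation with a Gaussian kernel); the bandwidth h is the tuning parameter whose role the abstract emphasizes, and the paper's adaptive bandwidth selector is not reproduced here.

```python
# Local-constant kernel estimation of beta(t/n) in y_t = x_t * beta(t/n) + u_t.
# The bandwidth h governs the bias/variance trade-off discussed above; data and
# the structural-break path for beta are simulated for illustration.
import numpy as np

rng = np.random.default_rng(1)
n = 500
t = np.arange(n) / n
beta_true = np.where(t < 0.5, 1.0, 2.0)           # a single structural break
x = rng.normal(size=n)
y = x * beta_true + rng.normal(scale=0.5, size=n)

def kernel_tvp(y, x, grid, h):
    """Kernel-weighted least squares estimate of beta(r) at each rescaled time r."""
    n = len(y)
    t = np.arange(n) / n
    est = np.empty(len(grid))
    for i, r in enumerate(grid):
        w = np.exp(-0.5 * ((t - r) / h) ** 2)      # Gaussian kernel weights
        est[i] = np.sum(w * x * y) / np.sum(w * x * x)
    return est

grid = np.linspace(0.05, 0.95, 19)
print(np.round(kernel_tvp(y, x, grid, h=0.05), 2))
```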

Econometrics arXiv paper, submitted: 2024-06-19

Testing identification in mediation and dynamic treatment models

Authors: Martin Huber, Kevin Kloiber, Lukas Laffers

We propose a test for the identification of causal effects in mediation and
dynamic treatment models that is based on two sets of observed variables,
namely covariates to be controlled for and suspected instruments, building on
the test by Huber and Kueck (2022) for single treatment models. We consider
models with a sequential assignment of a treatment and a mediator to assess the
direct treatment effect (net of the mediator), the indirect treatment effect
(via the mediator), or the joint effect of both treatment and mediator. We
establish testable conditions for identifying such effects in observational
data. These conditions jointly imply (1) the exogeneity of the treatment and
the mediator conditional on covariates and (2) the validity of distinct
instruments for the treatment and the mediator, meaning that the instruments do
not directly affect the outcome (other than through the treatment or mediator)
and are unconfounded given the covariates. Our framework extends to
post-treatment sample selection or attrition problems when replacing the
mediator by a selection indicator for observing the outcome, enabling joint
testing of the selectivity of treatment and attrition. We propose a machine
learning-based test to control for covariates in a data-driven manner and
analyze its finite sample performance in a simulation study. Additionally, we
apply our method to Slovak labor market data and find that our testable
implications are not rejected for a sequence of training programs typically
considered in dynamic treatment evaluations.

arXiv link: http://arxiv.org/abs/2406.13826v1

Econometrics arXiv paper, submitted: 2024-06-19

Bayesian Inference for Multidimensional Welfare Comparisons

Authors: David Gunawan, William Griffiths, Duangkamon Chotikapanich

Using both single-index measures and stochastic dominance concepts, we show
how Bayesian inference can be used to make multivariate welfare comparisons. A
four-dimensional distribution for the well-being attributes income, mental
health, education, and happiness are estimated via Bayesian Markov chain Monte
Carlo using unit-record data taken from the Household, Income and Labour
Dynamics in Australia survey. Marginal distributions of beta and gamma mixtures
and discrete ordinal distributions are combined using a copula. Improvements in
both well-being generally and poverty magnitude are assessed using posterior
means of single-index measures and posterior probabilities of stochastic
dominance. The conditions for stochastic dominance depend on the class of
utility functions that is assumed to define a social welfare function and the
number of attributes in the utility function. Three classes of utility
functions are considered, and posterior probabilities of dominance are computed
for one, two, and four-attribute utility functions for three time intervals
within the period 2001 to 2019.

arXiv link: http://arxiv.org/abs/2406.13395v1

Econometrics arXiv updated paper (originally submitted: 2024-06-19)

Testing for Underpowered Literatures

Authors: Stefan Faridani

How many experimental studies would have come to different conclusions had
they been run on larger samples? I show how to estimate the expected number of
statistically significant results that a set of experiments would have reported
had their sample sizes all been counterfactually increased. The proposed
deconvolution estimator is asymptotically normal and adjusts for publication
bias. Unlike related methods, this approach requires no assumptions of any kind
about the distribution of true intervention treatment effects and allows for
point masses. Simulations find good coverage even when the t-score is only
approximately normal. An application to randomized controlled trials (RCTs) published in
economics journals finds that doubling every sample would increase the power of
t-tests by 7.2 percentage points on average. This effect is smaller than for
non-RCTs and comparable to systematic replications in laboratory psychology
where previous studies enabled more accurate power calculations. This suggests
that RCTs are on average relatively insensitive to sample size increases.
Research funders who wish to raise power should generally consider sponsoring
better-measured and higher quality experiments -- rather than only larger ones.

arXiv link: http://arxiv.org/abs/2406.13122v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-06-17

Model-Based Inference and Experimental Design for Interference Using Partial Network Data

Authors: Steven Wilkins Reeves, Shane Lubold, Arun G. Chandrasekhar, Tyler H. McCormick

The stable unit treatment value assumption states that the outcome of an
individual is not affected by the treatment statuses of others; however, in many
real-world applications, treatments can have an effect on many others beyond
the immediately treated. Interference can generically be thought of as mediated
through some network structure. In many empirically relevant situations
however, complete network data (required to adjust for these spillover effects)
are too costly or logistically infeasible to collect. Partially or indirectly
observed network data (e.g., subsamples, aggregated relational data (ARD),
egocentric sampling, or respondent-driven sampling) reduce the logistical and
financial burden of collecting network data, but the statistical properties of
treatment effect adjustments from these design strategies are only beginning to
be explored. In this paper, we present a framework for the estimation and
inference of treatment effect adjustments using partial network data through
the lens of structural causal models. We also illustrate procedures to assign
treatments using only partial network data, with the goal of either minimizing
estimator variance or optimally seeding. We derive single network asymptotic
results applicable to a variety of choices for an underlying graph model. We
validate our approach using simulated experiments on observed graphs with
applications to information diffusion in India and Malawi.

arXiv link: http://arxiv.org/abs/2406.11940v1

Econometrics arXiv cross-link from Quantitative Finance – Statistical Finance (q-fin.ST), submitted: 2024-06-17

Dynamically Consistent Analysis of Realized Covariations in Term Structure Models

Authors: Dennis Schroers

In this article we show how to analyze the covariation of bond prices
nonparametrically and robustly, staying consistent with a general no-arbitrage
setting. This is, in particular, motivated by the problem of identifying the
number of statistically relevant factors in the bond market under minimal
conditions. We apply this method in an empirical study which suggests that a
high number of factors is needed to describe the term structure evolution and
that the term structure of volatility varies over time.

arXiv link: http://arxiv.org/abs/2406.19412v1

Econometrics arXiv paper, submitted: 2024-06-17

Resilience of international oil trade networks under extreme event shock-recovery simulations

Authors: Na Wei, Wen-Jie Xie, Wei-Xing Zhou

With the frequent occurrence of black swan events, global energy security
situation has become increasingly complex and severe. Assessing the resilience
of the international oil trade network (iOTN) is crucial for evaluating its
ability to withstand extreme shocks and recover thereafter, ensuring energy
security. We overcome the limitations of discrete historical data by
developing a simulation model for extreme event shock-recovery in the iOTNs. We
introduce a network efficiency indicator to measure oil resource allocation
efficiency and evaluate network performance. We then construct a resilience
index to explore the resilience of the iOTNs along the dimensions of resistance
and recoverability. Our findings indicate that extreme events can lead to sharp
declines in the performance of the iOTNs, especially when economies with
significant trading positions and relations suffer shocks. The upward trend in
recoverability and resilience reflects the self-organizing nature of the iOTNs,
demonstrating their capacity for optimizing their own structure and functionality.
Unlike traditional energy security research based solely on discrete historical
data or resistance indicators, our model evaluates resilience from multiple
dimensions, offering insights for global energy governance systems while
providing diverse perspectives for various economies to mitigate risks and
uphold energy security.

arXiv link: http://arxiv.org/abs/2406.11467v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2024-06-17

Management Decisions in Manufacturing using Causal Machine Learning -- To Rework, or not to Rework?

Authors: Philipp Schwarz, Oliver Schacht, Sven Klaassen, Daniel Grünbaum, Sebastian Imhof, Martin Spindler

In this paper, we present a data-driven model for estimating optimal rework
policies in manufacturing systems. We consider a single production stage within
a multistage, lot-based system that allows for optional rework steps. While the
rework decision depends on an intermediate state of the lot and system, the
final product inspection, and thus the assessment of the actual yield, is
delayed until production is complete. Repair steps are applied uniformly to the
lot, potentially improving some of the individual items while degrading others.
The challenge is thus to balance potential yield improvement with the rework
costs incurred. Given the inherently causal nature of this decision problem, we
propose a causal model to estimate yield improvement. We apply methods from
causal machine learning, in particular double/debiased machine learning (DML)
techniques, to estimate conditional treatment effects from data and derive
policies for rework decisions. We validate our decision model using real-world
data from opto-electronic semiconductor manufacturing, achieving a yield
improvement of 2 - 3% during the color-conversion process of white
light-emitting diodes (LEDs).

arXiv link: http://arxiv.org/abs/2406.11308v1

Econometrics arXiv cross-link from Computer Science – Software Engineering (cs.SE), submitted: 2024-06-16

Impact of the Availability of ChatGPT on Software Development: A Synthetic Difference in Differences Estimation using GitHub Data

Authors: Alexander Quispe, Rodrigo Grijalba

Advancements in Artificial Intelligence, particularly with ChatGPT, have
significantly impacted software development. Utilizing novel data from GitHub
Innovation Graph, we hypothesize that ChatGPT enhances software production
efficiency. Utilizing natural experiments where some governments banned
ChatGPT, we employ Difference-in-Differences (DID), Synthetic Control (SC), and
Synthetic Difference-in-Differences (SDID) methods to estimate its effects. Our
findings indicate a significant positive impact on the number of git pushes,
repositories, and unique developers per 100,000 people, particularly for
high-level, general purpose, and shell scripting languages. These results
suggest that AI tools like ChatGPT can substantially boost developer
productivity, though further analysis is needed to address potential downsides
such as low quality code and privacy concerns.

arXiv link: http://arxiv.org/abs/2406.11046v1
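
A hedged sketch of the simplest of the three estimators mentioned, a two-way fixed-effects difference-in-differences regression, run on simulated country-month panel data rather than the GitHub Innovation Graph; all variable names are illustrative.

```python
# Two-way fixed-effects DiD on a simulated country-month panel: the coefficient
# on treated:post is the DiD estimate (true effect set to 0.5 here). This is an
# illustration, not a reanalysis of the GitHub Innovation Graph data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
countries, months = 40, 24
df = pd.DataFrame([(c, t) for c in range(countries) for t in range(months)],
                  columns=["country", "month"])
df["treated"] = (df["country"] < 20).astype(int)     # e.g. tool remains available
df["post"] = (df["month"] >= 12).astype(int)
df["y"] = (0.1 * df["country"] + 0.05 * df["month"]
           + 0.5 * df["treated"] * df["post"] + rng.normal(size=len(df)))

twfe = smf.ols("y ~ treated:post + C(country) + C(month)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["country"]})
print(f"DiD estimate: {twfe.params['treated:post']:.3f} "
      f"(cluster s.e. {twfe.bse['treated:post']:.3f})")
```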

Econometrics arXiv updated paper (originally submitted: 2024-06-16)

EM Estimation of Conditional Matrix Variate $t$ Distributions

Authors: Battulga Gankhuu

The conditional matrix variate Student $t$ distribution was introduced by
Battulga (2024a). In this paper, we propose a new version of the conditional
matrix variate Student $t$ distribution. The paper provides EM algorithms that
estimate the parameters of conditional matrix variate Student $t$
distributions, covering both general cases and special cases with a Minnesota prior.

arXiv link: http://arxiv.org/abs/2406.10837v3

Econometrics arXiv updated paper (originally submitted: 2024-06-13)

Randomization Inference: Theory and Applications

Authors: David M. Ritzwoller, Joseph P. Romano, Azeem M. Shaikh

We review approaches to statistical inference based on randomization.
Permutation tests are treated as an important special case. Under a certain
group invariance property, referred to as the “randomization hypothesis,”
randomization tests achieve exact control of the Type I error rate in finite
samples. Although this unequivocal precision is very appealing, the range of
problems that satisfy the randomization hypothesis is somewhat limited. We show
that randomization tests are often asymptotically, or approximately, valid and
efficient in settings that deviate from the conditions required for
finite-sample error control. When randomization tests fail to offer even
asymptotic Type I error control, their asymptotic validity may be restored by
constructing an asymptotically pivotal test statistic. Randomization tests can
then provide exact error control for tests of highly structured hypotheses with
good performance in a wider class of problems. We give a detailed overview of
several prominent applications of randomization tests, including two-sample
permutation tests, regression, and conformal inference.

arXiv link: http://arxiv.org/abs/2406.09521v2
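
As a concrete illustration of the exact finite-sample logic under the randomization hypothesis, a minimal two-sample permutation test of a difference in means on simulated data:

```python
# Two-sample permutation test for a difference in means: under exchangeability
# of treatment labels, the permutation p-value controls the Type I error rate
# exactly in finite samples.
import numpy as np

rng = np.random.default_rng(2)
treated = rng.normal(0.3, 1.0, size=40)
control = rng.normal(0.0, 1.0, size=40)

pooled = np.concatenate([treated, control])
n_t = len(treated)
obs = treated.mean() - control.mean()

n_perm = 10_000
perm_stats = np.empty(n_perm)
for b in range(n_perm):
    perm = rng.permutation(pooled)                  # re-randomize the labels
    perm_stats[b] = perm[:n_t].mean() - perm[n_t:].mean()

p_value = (1 + np.sum(np.abs(perm_stats) >= np.abs(obs))) / (1 + n_perm)
print(f"observed difference = {obs:.3f}, permutation p-value = {p_value:.4f}")
```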

Econometrics arXiv paper, submitted: 2024-06-13

Multidimensional clustering in judge designs

Authors: Johannes W. Ligtenberg, Tiemen Woutersen

Estimates in judge designs run the risk of being biased due to the many judge
identities that are implicitly or explicitly used as instrumental variables.
The usual method of analysing judge designs, via a leave-out mean instrument,
eliminates this many-instrument bias only if the data are clustered in at most
one dimension; what is left out of the mean defines this clustering dimension.
The way most judge designs cluster their standard errors, however, implies that
there are additional clustering dimensions, so that a many-instrument bias
remains. We propose two estimators that are free of many-instrument bias, also
in multidimensionally clustered judge designs. The first generalises the
one-dimensional cluster-jackknife instrumental variable estimator by removing
from this estimator the additional bias terms due to the extra dependence in
the data. The second models all but one clustering dimension by fixed effects,
and we show how these numerous fixed effects can be removed without introducing
extra bias. A Monte Carlo experiment and a revisitation of two judge designs
show the empirical relevance of properly accounting for multidimensional
clustering in estimation.

arXiv link: http://arxiv.org/abs/2406.09473v1
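
A small sketch of the leave-out-mean (jackknife) judge-leniency instrument referred to above, computed on simulated case-level data; the paper's corrections for multidimensional clustering are not reproduced.

```python
# Leave-out-mean judge instrument: for each case, the average decision of its
# judge computed over all of that judge's other cases. Simulated data; the
# multi-way clustering corrections proposed in the paper are not shown here.
import numpy as np

rng = np.random.default_rng(10)
n_cases, n_judges = 2000, 50
judge = rng.integers(0, n_judges, size=n_cases)      # judge assigned to each case
leniency = rng.uniform(0.2, 0.8, size=n_judges)
decision = rng.binomial(1, leniency[judge]).astype(float)

sums = np.bincount(judge, weights=decision, minlength=n_judges)
counts = np.bincount(judge, minlength=n_judges)
loo_instrument = (sums[judge] - decision) / (counts[judge] - 1)
print("first five leave-out-mean instruments:", np.round(loo_instrument[:5], 3))
```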

Econometrics arXiv paper, submitted: 2024-06-13

Jackknife inference with two-way clustering

Authors: James G. MacKinnon, Morten Ørregaard Nielsen, Matthew D. Webb

For linear regression models with cross-section or panel data, it is natural
to assume that the disturbances are clustered in two dimensions. However, the
finite-sample properties of two-way cluster-robust tests and confidence
intervals are often poor. We discuss several ways to improve inference with
two-way clustering. Two of these are existing methods for avoiding, or at least
ameliorating, the problem of undefined standard errors when a cluster-robust
variance matrix estimator (CRVE) is not positive definite. One is a new method
that always avoids the problem. More importantly, we propose a family of new
two-way CRVEs based on the cluster jackknife. Simulations for models with
two-way fixed effects suggest that, in many cases, the cluster-jackknife CRVE
combined with our new method yields surprisingly accurate inferences. We
provide a simple software package, twowayjack for Stata, that implements our
recommended variance estimator.

arXiv link: http://arxiv.org/abs/2406.08880v1
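
A one-way illustration of the cluster-jackknife idea behind the proposed CRVEs (delete one cluster at a time, re-estimate, and form the jackknife variance); the paper's two-way versions and the twowayjack package are not reproduced here.

```python
# One-way cluster-jackknife variance for OLS: delete each cluster, re-estimate
# beta, and build the variance from the deviations of the delete-one-cluster
# estimates. Simulated data; the paper's two-way CRVEs are not reproduced.
import numpy as np

def ols(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

def cluster_jackknife_vcov(X, y, clusters):
    ids = np.unique(clusters)
    G = len(ids)
    betas = np.array([ols(X[clusters != g], y[clusters != g]) for g in ids])
    dev = betas - betas.mean(axis=0)
    return (G - 1) / G * dev.T @ dev

rng = np.random.default_rng(3)
G, m = 30, 20                                        # 30 clusters of 20 observations
clusters = np.repeat(np.arange(G), m)
cluster_shock = rng.normal(size=G)[clusters]         # within-cluster dependence
x = rng.normal(size=G * m) + cluster_shock
X = np.column_stack([np.ones(G * m), x])
y = 1.0 + 0.5 * x + cluster_shock + rng.normal(size=G * m)

V = cluster_jackknife_vcov(X, y, clusters)
print("beta:", np.round(ols(X, y), 3),
      "| cluster-jackknife s.e.:", np.round(np.sqrt(np.diag(V)), 3))
```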

Econometrics arXiv updated paper (originally submitted: 2024-06-12)

Identification and Inference on Treatment Effects under Covariate-Adaptive Randomization and Imperfect Compliance

Authors: Federico A. Bugni, Mengsi Gao, Filip Obradovic, Amilcar Velez

Randomized controlled trials (RCTs) frequently utilize covariate-adaptive
randomization (CAR) (e.g., stratified block randomization) and commonly suffer
from imperfect compliance. This paper studies the identification and inference
for the average treatment effect (ATE) and the average treatment effect on the
treated (ATT) in such RCTs with a binary treatment.
We first develop characterizations of the identified sets for both estimands.
Since data are generally not i.i.d. under CAR, these characterizations do not
follow from existing results. We then provide consistent estimators of the
identified sets and asymptotically valid confidence intervals for the
parameters. Our asymptotic analysis leads to concrete practical recommendations
regarding how to estimate the treatment assignment probabilities that enter the
estimated bounds. For the ATE bounds, using sample analog assignment
frequencies is more efficient than relying on the true assignment
probabilities. For the ATT bounds, the most efficient approach is to use the
true assignment probability for the probabilities in the numerator and the
sample analog for those in the denominator.

arXiv link: http://arxiv.org/abs/2406.08419v3

Econometrics arXiv paper, submitted: 2024-06-12

Positive and negative word of mouth in the United States

Authors: Shawn Berry

Word of mouth is a process by which consumers transmit positive or negative
sentiment to other consumers about a business. While this process has long been
recognized as a type of promotion for businesses, the value of word of mouth is
questionable. This study examines how word of mouth correlates with
demographic variables, including the role of the trust of business owners.
Education level, region of residence, and income level were found to be
significant predictors of positive word of mouth. Although the results
generally suggest that the majority of respondents do not engage in word of
mouth, there are valuable insights to be learned.

arXiv link: http://arxiv.org/abs/2406.08279v1

Econometrics arXiv cross-link from Quantitative Finance – Statistical Finance (q-fin.ST), submitted: 2024-06-12

HARd to Beat: The Overlooked Impact of Rolling Windows in the Era of Machine Learning

Authors: Francesco Audrino, Jonathan Chassot

We investigate the predictive abilities of the heterogeneous autoregressive
(HAR) model compared to machine learning (ML) techniques across an
unprecedented dataset of 1,455 stocks. Our analysis focuses on the role of
fitting schemes, particularly the training window and re-estimation frequency,
in determining the HAR model's performance. Despite extensive hyperparameter
tuning, ML models fail to surpass the linear benchmark set by HAR when
utilizing a refined fitting approach for the latter. Moreover, the simplicity
of HAR allows for an interpretable model with drastically lower computational
costs. We assess performance using QLIKE, MSE, and realized utility metrics,
finding that HAR consistently outperforms its ML counterparts when both rely
solely on realized volatility and VIX as predictors. Our results underscore the
importance of a correctly specified fitting scheme. They suggest that properly
fitted HAR models provide superior forecasting accuracy, establishing robust
guidelines for their practical application and use as a benchmark. This study
not only reaffirms the efficacy of the HAR model but also provides a critical
perspective on the practical limitations of ML approaches in realized
volatility forecasting.

arXiv link: http://arxiv.org/abs/2406.08041v1
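
For reference, the HAR benchmark discussed here is simply an OLS regression of next-day realized volatility on its daily value and its 5-day and 22-day averages; below is a minimal rolling-window sketch on simulated data (the paper's stock dataset and its QLIKE and realized-utility metrics are not reproduced).

```python
# HAR-RV sketch: regress next-day realized volatility on the daily value and
# its 5-day and 22-day averages, refitting over a rolling window as in the
# fitting-scheme discussion above. Realized-volatility data are simulated.
import numpy as np

rng = np.random.default_rng(4)
T = 1500
rv = np.abs(rng.normal(1.0, 0.3, size=T))
for t in range(1, T):                                # add persistence to RV
    rv[t] = 0.05 + 0.9 * rv[t - 1] + 0.1 * rv[t]

def har_regressors(rv, s):
    """Constant, daily value, and 5-day and 22-day averages available at time s."""
    return np.array([1.0, rv[s], rv[s - 4:s + 1].mean(), rv[s - 21:s + 1].mean()])

X_all = np.array([har_regressors(rv, s) for s in range(21, T - 1)])
y_all = rv[22:T]                                     # next-day realized volatility

window = 1000
forecasts, actuals = [], []
for i in range(window, len(y_all)):
    beta = np.linalg.lstsq(X_all[i - window:i], y_all[i - window:i], rcond=None)[0]
    forecasts.append(X_all[i] @ beta)
    actuals.append(y_all[i])

mse = np.mean((np.array(forecasts) - np.array(actuals)) ** 2)
print(f"rolling-window HAR one-step-ahead MSE: {mse:.4f}")
```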

Econometrics arXiv paper, submitted: 2024-06-12

Did Harold Zuercher Have Time-Separable Preferences?

Authors: Jay Lu, Yao Luo, Kota Saito, Yi Xin

This paper proposes an empirical model of dynamic discrete choice to allow
for non-separable time preferences, generalizing the well-known Rust (1987)
model. Under weak conditions, we show the existence of value functions and
hence well-defined optimal choices. We construct a contraction mapping of the
value function and propose an estimation method similar to Rust's nested fixed
point algorithm. Finally, we apply the framework to the bus engine replacement
data. We improve the fit of the data with our general model and reject the null
hypothesis that Harold Zuercher has separable time preferences. Misspecifying
an agent's preferences as time-separable when they are not leads to biased
inferences about structural parameters (such as the agent's risk attitudes) and
misleading policy recommendations.

arXiv link: http://arxiv.org/abs/2406.07809v1

Econometrics arXiv paper, submitted: 2024-06-11

Cluster GARCH

Authors: Chen Tong, Peter Reinhard Hansen, Ilya Archakov

We introduce a novel multivariate GARCH model with flexible convolution-t
distributions that is applicable in high-dimensional systems. The model is
called Cluster GARCH because it can accommodate cluster structures in the
conditional correlation matrix and in the tail dependencies. The expressions
for the log-likelihood function and its derivatives are tractable, and the
latter facilitate a score-driven model for the dynamic correlation structure. We
apply the Cluster GARCH model to daily returns for 100 assets and find it
outperforms existing models, both in-sample and out-of-sample. Moreover, the
convolution-t distribution provides a better empirical performance than the
conventional multivariate t-distribution.

arXiv link: http://arxiv.org/abs/2406.06860v1

Econometrics arXiv paper, submitted: 2024-06-10

Robustness to Missing Data: Breakdown Point Analysis

Authors: Daniel Ober-Reynolds

Missing data is pervasive in econometric applications, and rarely is it
plausible that the data are missing (completely) at random. This paper proposes
a methodology for studying the robustness of results drawn from incomplete
datasets. Selection is measured as the squared Hellinger divergence between the
distributions of complete and incomplete observations, which has a natural
interpretation. The breakdown point is defined as the minimal amount of
selection needed to overturn a given result. Reporting point estimates and
lower confidence intervals of the breakdown point is a simple, concise way to
communicate the robustness of a result. An estimator of the breakdown point of
a result drawn from a generalized method of moments model is proposed and shown
to be root-$n$ consistent and asymptotically normal under mild assumptions. Lower
confidence intervals of the breakdown point are simple to construct. The paper
concludes with a simulation study illustrating the finite sample performance of
the estimators in several common models.

arXiv link: http://arxiv.org/abs/2406.06804v1
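
For concreteness, the selection measure used above is the squared Hellinger divergence; a minimal computation for two discrete distributions follows (the breakdown-point estimator for GMM models is not reproduced, and the example distributions are hypothetical).

```python
# Squared Hellinger divergence between two discrete distributions, the measure
# of selection referenced above. The GMM breakdown-point estimator itself is
# not reproduced here.
import numpy as np

def squared_hellinger(p, q):
    """H^2(p, q) = 0.5 * sum_i (sqrt(p_i) - sqrt(q_i))^2, which lies in [0, 1]."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    return 0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)

complete = [0.25, 0.25, 0.25, 0.25]      # hypothetical distribution, complete cases
incomplete = [0.40, 0.30, 0.20, 0.10]    # hypothetical distribution, incomplete cases
print(f"squared Hellinger divergence: {squared_hellinger(complete, incomplete):.4f}")
```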

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-06-10

Data-Driven Switchback Experiments: Theoretical Tradeoffs and Empirical Bayes Designs

Authors: Ruoxuan Xiong, Alex Chin, Sean J. Taylor

We study the design and analysis of switchback experiments conducted on a
single aggregate unit. The design problem is to partition the continuous time
space into intervals and switch treatments between intervals, in order to
minimize the estimation error of the treatment effect. We show that the
estimation error depends on four factors: carryover effects, periodicity,
serially correlated outcomes, and impacts from simultaneous experiments. We
derive a rigorous bias-variance decomposition and show the tradeoffs of the
estimation error from these factors. The decomposition provides three new
insights in choosing a design: First, balancing the periodicity between treated
and control intervals reduces the variance; second, switching less frequently
reduces the bias from carryover effects while increasing the variance from
correlated outcomes, and vice versa; third, randomizing interval start and end
points reduces both bias and variance from simultaneous experiments. Combining
these insights, we propose a new empirical Bayes design approach. This approach
uses prior data and experiments for designing future experiments. We illustrate
this approach using real data from a ride-sharing platform, yielding a design
that reduces MSE by 33% compared to the status quo design used on the platform.

arXiv link: http://arxiv.org/abs/2406.06768v1

Econometrics arXiv updated paper (originally submitted: 2024-06-10)

Data-Driven Real-time Coupon Allocation in the Online Platform

Authors: Jinglong Dai, Hanwei Li, Weiming Zhu, Jianfeng Lin, Binqiang Huang

Traditionally, firms have offered coupons to customer groups at predetermined
discount rates. However, advancements in machine learning and the availability
of abundant customer data now enable platforms to provide real-time customized
coupons to individuals. In this study, we partner with Meituan, a leading
shopping platform, to develop a real-time, end-to-end coupon allocation system
that is fast and effective in stimulating demand while adhering to marketing
budgets when faced with uncertain traffic from a diverse customer base.
Leveraging comprehensive customer and product features, we estimate Conversion
Rates (CVR) under various coupon values and employ isotonic regression to
ensure the monotonicity of predicted CVRs with respect to coupon value. Using
calibrated CVR predictions as input, we propose a Lagrangian Dual-based
algorithm that efficiently determines optimal coupon values for each arriving
customer within 50 milliseconds. We theoretically and numerically investigate
the model performance under parameter misspecifications and apply a control
loop to adapt to real-time updated information, thereby better adhering to the
marketing budget. Finally, we demonstrate through large-scale field experiments
and observational data that our proposed coupon allocation algorithm
outperforms traditional approaches in terms of both higher conversion rates and
increased revenue. As of May 2024, Meituan has implemented our framework to
distribute coupons to over 100 million users across more than 110 major cities
in China, resulting in an additional CNY 8 million in annual profit. We
demonstrate how to integrate a machine learning prediction model for estimating
customer CVR, a Lagrangian Dual-based coupon value optimizer, and a control
system to achieve real-time coupon delivery while dynamically adapting to
random customer arrival patterns.

arXiv link: http://arxiv.org/abs/2406.05987v3
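
A deliberately simplified sketch of two ingredients described above, on synthetic data and with an illustrative revenue proxy (this is not Meituan's production system): isotonic calibration to make predicted CVRs monotone in the coupon value, and a bisection on the Lagrange multiplier to respect an expected-spend budget.

```python
# Illustrative sketch: (1) enforce monotonicity of predicted CVR in the coupon
# value via isotonic regression; (2) pick each customer's coupon by maximizing
# CVR-weighted value net of a Lagrange penalty on expected spend, tuning the
# multiplier by bisection to meet the budget. All data are synthetic.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(5)
coupon_values = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
n_customers, order_value, budget = 1000, 20.0, 500.0

# Raw CVR predictions per (customer, coupon value): noisy, not necessarily monotone.
base = rng.uniform(0.02, 0.20, size=(n_customers, 1))
noise = rng.normal(0, 0.01, size=(n_customers, len(coupon_values)))
raw_cvr = base + 0.02 * coupon_values + noise

iso = IsotonicRegression(increasing=True, out_of_bounds="clip")
cvr = np.array([iso.fit_transform(coupon_values, row) for row in raw_cvr])

def allocate(lam):
    score = cvr * (order_value - coupon_values) - lam * cvr * coupon_values
    choice = score.argmax(axis=1)
    spend = np.sum(cvr[np.arange(n_customers), choice] * coupon_values[choice])
    return choice, spend

lo, hi = 0.0, 100.0
for _ in range(50):                       # bisection on the dual multiplier
    lam = 0.5 * (lo + hi)
    _, spend = allocate(lam)
    lo, hi = (lam, hi) if spend > budget else (lo, lam)

choice, spend = allocate(hi)
print(f"lambda = {hi:.3f}, expected coupon spend = {spend:.1f} (budget {budget})")
```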

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2024-06-09

Heterogeneous Treatment Effects in Panel Data

Authors: Retsef Levi, Elisabeth Paulson, Georgia Perakis, Emily Zhang

We address a core problem in causal inference: estimating heterogeneous
treatment effects using panel data with general treatment patterns. Many
existing methods either do not utilize the potential underlying structure in
panel data or have limitations in the allowable treatment patterns. In this
work, we propose and evaluate a new method that first partitions observations
into disjoint clusters with similar treatment effects using a regression tree,
and then leverages the (assumed) low-rank structure of the panel data to
estimate the average treatment effect for each cluster. Our theoretical results
establish the convergence of the resulting estimates to the true treatment
effects. Computational experiments with semi-synthetic data show that our method
achieves superior accuracy compared to alternative approaches, using a
regression tree with no more than 40 leaves. Hence, our method provides more
accurate and interpretable estimates than alternative methods.

arXiv link: http://arxiv.org/abs/2406.05633v1

Econometrics arXiv paper, submitted: 2024-06-08

Causal Interpretation of Regressions With Ranks

Authors: Lihua Lei

In studies of educational production functions or intergenerational mobility,
it is common to transform the key variables into percentile ranks. Yet, it
remains unclear what the regression coefficient estimates with ranks of the
outcome or the treatment. In this paper, we derive effective causal estimands
for a broad class of commonly-used regression methods, including the ordinary
least squares (OLS), two-stage least squares (2SLS), difference-in-differences
(DiD), and regression discontinuity designs (RDD). Specifically, we introduce a
novel primitive causal estimand, the Rank Average Treatment Effect (rank-ATE),
and prove that it serves as the building block of the effective estimands of
all the aforementioned econometrics methods. For 2SLS, DiD, and RDD, we show
that direct applications to outcome ranks identify parameters that are
difficult to interpret. To address this issue, we develop alternative methods
to identify more interpretable causal parameters.

arXiv link: http://arxiv.org/abs/2406.05548v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2024-06-06

Strong Approximations for Empirical Processes Indexed by Lipschitz Functions

Authors: Matias D. Cattaneo, Ruiqi Rae Yu

This paper presents new uniform Gaussian strong approximations for empirical
processes indexed by classes of functions based on $d$-variate random vectors
($d\geq1$). First, a uniform Gaussian strong approximation is established for
general empirical processes indexed by possibly Lipschitz functions, improving
on previous results in the literature. In the setting considered by Rio (1994),
and if the function class is Lipschitzian, our result improves the
approximation rate $n^{-1/(2d)}$ to $n^{-1/\max\{d,2\}}$, up to a
$\operatorname{polylog}(n)$ term, where $n$ denotes the sample size.
Remarkably, we establish a valid uniform Gaussian strong approximation at the
rate $n^{-1/2}\log n$ for $d=2$, which was previously known to be valid only
for univariate ($d=1$) empirical processes via the celebrated Hungarian
construction (Koml\'os et al., 1975). Second, a uniform Gaussian strong
approximation is established for multiplicative separable empirical processes
indexed by possibly Lipschitz functions, which addresses some outstanding
problems in the literature (Chernozhukov et al., 2014, Section 3). Finally, two
other uniform Gaussian strong approximation results are presented when the
function class is a sequence of Haar basis based on quasi-uniform partitions.
Applications to nonparametric density and regression estimation are discussed.

arXiv link: http://arxiv.org/abs/2406.04191v2

Econometrics arXiv paper, submitted: 2024-06-06

GLOBUS: Global building renovation potential by 2070

Authors: Shufan Zhang, Minda Ma, Nan Zhou, Jinyue Yan

Surpassing the two large emission sectors of transportation and industry, the
building sector accounted for 34% and 37% of global energy consumption and
carbon emissions in 2021, respectively. The building sector, the final piece to
be addressed in the transition to net-zero carbon emissions, requires a
comprehensive, multisectoral strategy for reducing emissions. Until now, the
absence of data on global building floorspace has impeded the measurement of
building carbon intensity (carbon emissions per floorspace) and the
identification of ways to achieve carbon neutrality for buildings. For this
study, we develop a global building stock model (GLOBUS) to fill that data gap.
Our study's primary contribution lies in providing a dataset of global building
stock turnover using scenarios that incorporate various levels of building
renovation. By unifying the evaluation indicators, the dataset empowers
building science researchers to perform comparative analyses based on
floorspace. Specifically, the building stock dataset establishes a reference
for measuring carbon emission intensity and decarbonization intensity of
buildings within different countries. Further, we emphasize the sufficiency of
existing buildings by incorporating building renovation into the model.
Renovation can minimize the need to expand the building stock, thereby
bolstering decarbonization of the building sector.

arXiv link: http://arxiv.org/abs/2406.04133v1

Econometrics arXiv paper, submitted: 2024-06-06

Comments on B. Hansen's Reply to "A Comment on: `A Modern Gauss-Markov Theorem'", and Some Related Discussion

Authors: Benedikt M. Pötscher

In P\"otscher and Preinerstorfer (2022) and in the abridged version
P\"otscher and Preinerstorfer (2024, published in Econometrica) we have tried
to clear up the confusion introduced in Hansen (2022a) and in the earlier
versions Hansen (2021a,b). Unfortunately, Hansen's (2024) reply to P\"otscher
and Preinerstorfer (2024) further adds to the confusion. While we are already
somewhat tired of the matter, for the sake of the econometrics community we
feel compelled to provide clarification. We also add a comment on Portnoy
(2023), a "correction" to Portnoy (2022), as well as on Lei and Wooldridge
(2022).

arXiv link: http://arxiv.org/abs/2406.03971v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-06-05

Decision synthesis in monetary policy

Authors: Tony Chernis, Gary Koop, Emily Tallman, Mike West

The macroeconomy is a sophisticated dynamic system involving significant
uncertainties that complicate modelling. In response, decision-makers consider
multiple models that provide different predictions and policy recommendations
which are then synthesized into a policy decision. In this setting, we develop
Bayesian predictive decision synthesis (BPDS) to formalize monetary policy
decision processes. BPDS draws on recent developments in model combination and
statistical decision theory that yield new opportunities in combining multiple
models, emphasizing the integration of decision goals, expectations and
outcomes into the model synthesis process. Our case study concerns central bank
policy decisions about target interest rates with a focus on implications for
multi-step macroeconomic forecasting. This application also motivates new
methodological developments in conditional forecasting and BPDS, presented and
developed here.

arXiv link: http://arxiv.org/abs/2406.03321v2

Econometrics arXiv updated paper (originally submitted: 2024-06-05)

Identification of structural shocks in Bayesian VEC models with two-state Markov-switching heteroskedasticity

Authors: Justyna Wróblewska, Łukasz Kwiatkowski

We develop a Bayesian framework for cointegrated structural VAR models
identified by two-state Markovian breaks in conditional covariances. The
resulting structural VEC specification with Markov-switching heteroskedasticity
(SVEC-MSH) is formulated in the so-called B-parameterization, in which the
prior distribution is specified directly for the matrix of the instantaneous
reactions of the endogenous variables to structural innovations. We discuss
some caveats pertaining to the identification conditions presented earlier in
the literature on stationary structural VAR-MSH models, and revise the
restrictions to actually ensure the unique global identification through the
two-state heteroskedasticity. To enable the posterior inference in the proposed
model, we design an MCMC procedure, combining the Gibbs sampler and the
Metropolis-Hastings algorithm. The methodology is illustrated with both
simulated and real-world data examples.

arXiv link: http://arxiv.org/abs/2406.03053v2

Econometrics arXiv paper, submitted: 2024-06-05

Is local opposition taking the wind out of the energy transition?

Authors: Federica Daniele, Guido de Blasio, Alessandra Pasquini

Local opposition to the installation of renewable energy sources is a
potential threat to the energy transition. According to widespread belief,
mostly based on anecdotal evidence, local communities tend to oppose the
construction of energy plants because of the associated negative externalities
(the so-called 'not in my backyard', or NIMBY, phenomenon). Using administrative data on wind
turbine installation and electoral outcomes across municipalities located in
the South of Italy during 2000-19, we estimate the impact of wind turbines'
installation on incumbent regional governments' electoral support during the
next elections. Our main findings, derived by a wind-speed based instrumental
variable strategy, point in the direction of a mild and not statistically
significant electoral backlash for right-wing regional administrations and of a
strong and statistically significant positive reinforcement for left-wing
regional administrations. Based on our analysis, the hypothesis of an electoral
effect of NIMBY-type behavior in connection with the development of wind
turbines appears not to be supported by the data.

arXiv link: http://arxiv.org/abs/2406.03022v1

Econometrics arXiv updated paper (originally submitted: 2024-06-05)

When does IV identification not restrict outcomes?

Authors: Leonard Goff

Many identification results in instrumental variables (IV) models hold
without requiring any restrictions on the distribution of potential outcomes,
or how those outcomes are correlated with selection behavior. This enables IV
models to allow for arbitrary heterogeneity in treatment effects and the
possibility of selection on gains in the outcome. I provide a necessary and
sufficient condition for treatment effects to be point identified in a manner
that does not restrict outcomes, when the instruments take a finite number of
values. The condition generalizes the well-known LATE monotonicity assumption,
and unifies a wide variety of other known IV identification results. The result
also yields a brute-force approach to reveal all selection models that allow
for point identification of treatment effects without restricting outcomes, and
then enumerate all of the identified parameters within each such selection
model. The search uncovers new selection models that yield identification,
provides impossibility results for others, and offers opportunities to relax
assumptions on selection used in existing literature. An application considers
the identification of complementarities between two cross-randomized
treatments, obtaining a necessary and sufficient condition on selection for
local average complementarities among compliers to be identified in a manner
that does not restrict outcomes. I use this result to revisit two empirical
settings, one in which the data are incompatible with this restriction on
selection, and another in which the data are compatible with the restriction.

arXiv link: http://arxiv.org/abs/2406.02835v6

Econometrics arXiv paper, submitted: 2024-06-04

The Impact of Acquisition on Product Quality in the Console Gaming Industry

Authors: Shivam Somani

The console gaming industry, a dominant force in the global entertainment
sector, has witnessed a wave of consolidation in recent years, epitomized by
Microsoft's high-profile acquisitions of Activision Blizzard and Zenimax. This
study investigates the repercussions of such mergers on consumer welfare and
innovation within the gaming landscape, focusing on product quality as a key
metric. Through a comprehensive analysis employing a difference-in-difference
model, the research evaluates the effects of acquisition on game review
ratings, drawing from a dataset comprising over 16,000 console games released
between 2000 and 2023. The research addresses key assumptions underlying the
difference-in-difference methodology, including parallel trends and spillover
effects, to ensure the robustness of the findings. The DID results suggest a
positive and statistically significant impact of acquisition on game review
ratings, when controlling for genre and release year. The study contributes to
the literature by offering empirical evidence on the direct consequences of
industry consolidation on consumer welfare and competition dynamics within the
gaming sector.

arXiv link: http://arxiv.org/abs/2406.02525v1

Econometrics arXiv paper, submitted: 2024-06-04

Enabling Decision-Making with the Modified Causal Forest: Policy Trees for Treatment Assignment

Authors: Hugo Bodory, Federica Mascolo, Michael Lechner

Decision-making plays a pivotal role in shaping outcomes in various
disciplines, such as medicine, economics, and business. This paper provides
guidance to practitioners on how to implement a decision tree designed to
address treatment assignment policies using an interpretable and non-parametric
algorithm. Our Policy Tree is motivated by the method proposed by Zhou, Athey,
and Wager (2023), distinguishing itself in the policy score calculation, the
incorporation of constraints, and the handling of categorical and continuous variables.
We demonstrate the usage of the Policy Tree for multiple, discrete treatments
on data sets from different fields. The Policy Tree is available in Python's
open-source package mcf (Modified Causal Forest).

arXiv link: http://arxiv.org/abs/2406.02241v1

Econometrics arXiv paper, submitted: 2024-06-04

A sequential test procedure for the choice of the number of regimes in multivariate nonlinear models

Authors: Andrea Bucci

This paper proposes a sequential test procedure for determining the number of
regimes in nonlinear multivariate autoregressive models. The procedure relies
on tests of linearity and of no additional nonlinearity for both multivariate
smooth transition and threshold autoregressive models. We conduct a simulation study
to evaluate the finite-sample properties of the proposed test in small samples.
Our findings indicate that the test exhibits satisfactory size properties, with
the rescaled version of the Lagrange Multiplier test statistics demonstrating
the best performance in most simulation settings. The sequential procedure is
also applied to two empirical cases, the US monthly interest rates and
Icelandic river flows. In both cases, the detected number of regimes aligns
well with the existing literature.

arXiv link: http://arxiv.org/abs/2406.02152v1

Econometrics arXiv paper, submitted: 2024-06-03

Random Subspace Local Projections

Authors: Viet Hoang Dinh, Didier Nibbering, Benjamin Wong

We show how random subspace methods can be adapted to estimating local
projections with many controls. Random subspace methods have their roots in the
machine learning literature and are implemented by averaging over regressions
estimated over different combinations of subsets of these controls. We document
three key results: (i) Our approach can successfully recover the impulse
response functions across Monte Carlo experiments representative of different
macroeconomic settings and identification schemes. (ii) Our results suggest
that random subspace methods are more accurate than other dimension reduction
methods if the underlying large dataset has a factor structure similar to
typical macroeconomic datasets such as FRED-MD. (iii) Our approach leads to
differences in the estimated impulse response functions relative to benchmark
methods when applied to two widely studied empirical applications.

arXiv link: http://arxiv.org/abs/2406.01002v1
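
A stripped-down sketch of the random-subspace idea for local projections, assuming a simulated shock series and pure-noise controls (not the authors' implementation or identification schemes): for each draw, regress y at horizon h on the shock plus a random subset of the controls, then average the shock coefficient across draws.

```python
# Random-subspace local projection sketch: average, over random subsets of the
# controls, the coefficient on the shock in a horizon-h projection of y.
# Shock and controls are simulated; this only illustrates the averaging idea.
import numpy as np

rng = np.random.default_rng(6)
T, n_controls = 300, 60
shock = rng.normal(size=T)
controls = rng.normal(size=(T, n_controls))
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + 0.8 * shock[t - 1] + rng.normal(scale=0.5)

def rs_local_projection(y, shock, controls, horizon, subspace_dim=10, n_draws=200):
    n_obs = len(y)
    irf_draws = []
    for _ in range(n_draws):
        cols = rng.choice(controls.shape[1], size=subspace_dim, replace=False)
        X = np.column_stack([np.ones(n_obs - horizon), shock[:n_obs - horizon],
                             controls[:n_obs - horizon, cols]])
        beta = np.linalg.lstsq(X, y[horizon:], rcond=None)[0]
        irf_draws.append(beta[1])                    # coefficient on the shock
    return float(np.mean(irf_draws))

for h in (1, 2, 3):
    print(f"horizon {h}: IRF estimate {rs_local_projection(y, shock, controls, h):.3f}")
```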

Econometrics arXiv updated paper (originally submitted: 2024-06-03)

A Robust Residual-Based Test for Structural Changes in Factor Models

Authors: Bin Peng, Liangjun Su, Yayi Yan

In this paper, we propose an easy-to-implement residual-based specification
testing procedure for detecting structural changes in factor models, which is
powerful against both smooth and abrupt structural changes with unknown break
dates. The proposed test is robust against an over-specified number of
factors and against serially and cross-sectionally correlated error processes. A new
central limit theorem is given for the quadratic forms of panel data with
dependence over both dimensions, thereby filling a gap in the literature. We
establish the asymptotic properties of the proposed test statistic, and
accordingly develop a simulation-based scheme to select the critical value in order
to improve finite sample performance. Through extensive simulations and a
real-world application, we confirm our theoretical results and demonstrate that
the proposed test exhibits desirable size and power in practice.

arXiv link: http://arxiv.org/abs/2406.00941v2

Econometrics arXiv updated paper (originally submitted: 2024-06-02)

Comparing Experimental and Nonexperimental Methods: What Lessons Have We Learned Four Decades After LaLonde (1986)?

Authors: Guido Imbens, Yiqing Xu

In 1986, Robert LaLonde published an article comparing nonexperimental
estimates to experimental benchmarks (LaLonde 1986). He concluded that the
nonexperimental methods at the time could not systematically replicate
experimental benchmarks, casting doubt on their credibility. Following
LaLonde's critical assessment, there have been significant methodological
advances and practical changes, including (i) an emphasis on the
unconfoundedness assumption separated from functional form considerations, (ii)
a focus on the importance of overlap in covariate distributions, (iii) the
introduction of propensity score-based methods leading to doubly robust
estimators, (iv) methods for estimating and exploiting treatment effect
heterogeneity, and (v) a greater emphasis on validation exercises to bolster
research credibility. To demonstrate the practical lessons from these advances,
we reexamine the LaLonde data. We show that modern methods, when applied in
contexts with sufficient covariate overlap, yield robust estimates for the
adjusted differences between the treatment and control groups. However, this
does not imply that these estimates are causally interpretable. To assess their
credibility, validation exercises (such as placebo tests) are essential,
whereas goodness-of-fit tests alone are inadequate. Our findings highlight the
importance of closely examining the assignment process, carefully inspecting
overlap, and conducting validation exercises when analyzing causal effects with
nonexperimental data.

arXiv link: http://arxiv.org/abs/2406.00827v3
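
To make one of the listed advances concrete, here is a doubly robust (AIPW) ATE estimate that combines a propensity-score model with outcome regressions and applies simple trimming to enforce overlap; the data are simulated, so this is not a reanalysis of the LaLonde samples.

```python
# Doubly robust (AIPW) ATE sketch: combine a propensity-score model and
# outcome regressions, trimming extreme propensity scores to enforce overlap.
# Simulated data; not a reanalysis of the LaLonde samples.
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(7)
n = 5000
X = rng.normal(size=(n, 3))
p_true = 1 / (1 + np.exp(-(0.5 * X[:, 0] - 0.5 * X[:, 1])))
D = rng.binomial(1, p_true)
Y = 1.0 * D + X @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=n)   # true ATE = 1

ps = LogisticRegression(max_iter=1000).fit(X, D).predict_proba(X)[:, 1]
mu1 = LinearRegression().fit(X[D == 1], Y[D == 1]).predict(X)
mu0 = LinearRegression().fit(X[D == 0], Y[D == 0]).predict(X)

keep = (ps > 0.05) & (ps < 0.95)                     # enforce covariate overlap
ps, mu1, mu0, D, Y = ps[keep], mu1[keep], mu0[keep], D[keep], Y[keep]

aipw = mu1 - mu0 + D * (Y - mu1) / ps - (1 - D) * (Y - mu0) / (1 - ps)
print(f"AIPW ATE estimate: {aipw.mean():.3f} (true effect 1.0)")
```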

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-06-02

On the modelling and prediction of high-dimensional functional time series

Authors: Jinyuan Chang, Qin Fang, Xinghao Qiao, Qiwei Yao

We propose a two-step procedure to model and predict high-dimensional
functional time series, where the number of function-valued time series $p$ is
large in relation to the length of time series $n$. Our first step performs an
eigenanalysis of a positive definite matrix, which leads to a one-to-one linear
transformation for the original high-dimensional functional time series, and
the transformed curve series can be segmented into several groups such that any
two subseries from any two different groups are uncorrelated both
contemporaneously and serially. Consequently in our second step those groups
are handled separately without the information loss on the overall linear
dynamic structure. The second step is devoted to establishing a
finite-dimensional dynamical structure for all the transformed functional time
series within each group. Furthermore the finite-dimensional structure is
represented by that of a vector time series. Modelling and forecasting for the
original high-dimensional functional time series are realized via those for the
vector time series in all the groups. We investigate the theoretical properties
of our proposed methods, and illustrate the finite-sample performance through
both extensive simulation and two real datasets.

arXiv link: http://arxiv.org/abs/2406.00700v1

Econometrics arXiv updated paper (originally submitted: 2024-06-02)

Cluster-robust jackknife and bootstrap inference for logistic regression models

Authors: James G. MacKinnon, Morten Ørregaard Nielsen, Matthew D. Webb

We study cluster-robust inference for logistic regression (logit) models.
Inference based on the most commonly-used cluster-robust variance matrix
estimator (CRVE) can be very unreliable. We study several alternatives.
Conceptually the simplest of these, but also the most computationally
demanding, involves jackknifing at the cluster level. We also propose a
linearized version of the cluster-jackknife variance matrix estimator as well
as linearized versions of the wild cluster bootstrap. The linearizations are
based on empirical scores and are computationally efficient. Our results can
readily be generalized to other binary response models. We also discuss a new
Stata software package called logitjack which implements these procedures.
Simulation results strongly favor the new methods, and two empirical examples
suggest that it can be important to use them in practice.

arXiv link: http://arxiv.org/abs/2406.00650v2

Econometrics arXiv cross-link from Quantitative Finance – Portfolio Management (q-fin.PM), submitted: 2024-06-02

Portfolio Optimization with Robust Covariance and Conditional Value-at-Risk Constraints

Authors: Qiqin Zhou

The measure of portfolio risk is an important input of the Markowitz
framework. In this study, we explored various methods to obtain robust
covariance estimators that are less susceptible to financial data noise. We
evaluated the performance of a large-cap portfolio using various forms of the
Ledoit shrinkage covariance and the robust Gerber covariance matrix over the
period 2012 to 2022. Out-of-sample performance indicates that robust covariance
estimators can outperform the market capitalization-weighted benchmark
portfolio, particularly during bull markets. The Gerber covariance with
mean absolute deviation (MAD) emerged as the top performer. However, robust
estimators do not manage tail risk well under extreme market conditions, for
example during the Covid-19 period. When we aim to control for tail risk, we
should add a constraint on Conditional Value-at-Risk (CVaR) to make more
conservative decisions on risk exposure. Additionally, we incorporated the
unsupervised K-means clustering algorithm into the optimization procedure
(i.e., Nested Clustering Optimization, NCO). This not only helps mitigate the
numerical instability of the optimization algorithm but also contributes to a
lower drawdown.

arXiv link: http://arxiv.org/abs/2406.00610v1
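
A minimal sketch of one piece of the pipeline described above: a Ledoit-Wolf shrinkage covariance from scikit-learn plugged into closed-form global minimum-variance weights. The Gerber covariance, the CVaR constraint, and the NCO clustering step are not reproduced, and the returns are simulated.

```python
# Ledoit-Wolf shrinkage covariance plus closed-form global minimum-variance
# weights w = S^{-1} 1 / (1' S^{-1} 1). Returns are simulated; the Gerber
# covariance, CVaR constraint, and NCO step from the abstract are not shown.
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(8)
n_days, n_assets = 500, 30
true_cov = 0.0001 * (np.eye(n_assets) + 0.3)         # equicorrelated toy covariance
returns = rng.multivariate_normal(np.zeros(n_assets), true_cov, size=n_days)

S = LedoitWolf().fit(returns).covariance_
ones = np.ones(n_assets)
w = np.linalg.solve(S, ones)
w /= ones @ w                                        # global minimum-variance weights
print("largest weight:", round(w.max(), 3), "| sum of weights:", round(w.sum(), 3))
```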

Econometrics arXiv paper, submitted: 2024-06-01

Financial Deepening and Economic Growth in Select Emerging Markets with Currency Board Systems: Theory and Evidence

Authors: Yujuan Qiu

This paper investigates some indicators of financial development in select
countries with currency board systems and raises some questions about the
connection between financial development and growth in currency board systems.
Most of those cases are long past episodes of what we would now call emerging
markets. However, the paper also looks at Hong Kong, the currency board system
that is one of the world's largest and most advanced financial markets. The
global financial crisis of 2008-09 created doubts about the efficiency of
financial markets in advanced economies, including in Hong Kong, and unsettled
the previous consensus that a large financial sector would be more stable than
a smaller one.

arXiv link: http://arxiv.org/abs/2406.00472v1

Econometrics arXiv paper, submitted: 2024-06-01

Optimizing hydrogen and e-methanol production through Power-to-X integration in biogas plants

Authors: Alberto Alamia, Behzad Partoon, Eoghan Rattigan, Gorm Brunn Andresen

The European Union strategy for net-zero emissions relies on developing
hydrogen and electro-fuel infrastructure. These fuels will be crucial as
energy carriers and balancing agents for renewable energy variability.
Large-scale production requires more renewable capacity, and various
Power-to-X (PtX) concepts are emerging in renewable-rich countries. However,
sourcing renewable carbon to scale carbon-based electro-fuels is a significant
challenge. This
study explores a PtX hub that sources renewable CO2 from biogas plants,
integrating renewable energy, hydrogen production, and methanol synthesis on
site. This concept creates an internal market for energy and materials,
interfacing with the external energy system. The size and operation of the PtX
hub were optimized, considering integration with local energy systems and a
potential hydrogen grid. The levelized costs of hydrogen and methanol were
estimated for a 2030 start, considering new legislation on renewable fuels of
non-biological origin (RFNBOs). Our results show the PtX hub can rely mainly on
on-site renewable energy, selling excess electricity to the grid. A local
hydrogen grid connection improves operations, and the behind-the-meter market
lowers energy prices, buffering against market variability. We found methanol
costs could be below 650 euros per ton and hydrogen production costs below 3
euros per kg, with standalone methanol plants costing 23 per cent more. The
ratio of CO2 recovery to methanol production is crucial, with over 90 per cent
recovery requiring significant investment in CO2 and H2 storage. Overall, our
findings support planning PtX infrastructures integrated with the agricultural
sector as a cost effective way to access renewable carbon.

arXiv link: http://arxiv.org/abs/2406.00442v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2024-06-01

From day-ahead to mid and long-term horizons with econometric electricity price forecasting models

Authors: Paul Ghelasi, Florian Ziel

The recent energy crisis starting in 2021 led to record-high gas, coal,
carbon and power prices, with electricity reaching up to 40 times the
pre-crisis average. This had dramatic consequences for operational and risk
management prompting the need for robust econometric models for mid to
long-term electricity price forecasting. After a comprehensive literature
analysis, we identify key challenges and address them with novel approaches: 1)
Fundamental information is incorporated by constraining coefficients with
bounds derived from fundamental models offering interpretability; 2) Short-term
regressors such as load and renewables can be used in long-term forecasts by
incorporating their seasonal expectations to stabilize the model; 3) Unit root
behavior of power prices, induced by fuel prices, can be managed by estimating
same-day relationships and projecting them forward. We develop interpretable
models for a range of forecasting horizons from one day to one year ahead,
providing guidelines on robust modeling frameworks and key explanatory
variables for each horizon. Our study, focused on Europe's largest energy
market, Germany, analyzes hourly electricity prices using regularized
regression methods and generalized additive models.

arXiv link: http://arxiv.org/abs/2406.00326v2

Econometrics arXiv cross-link from Computer Science – Computational Engineering, Finance, and Science (cs.CE), submitted: 2024-05-31

Transforming Japan Real Estate

Authors: Diabul Haque

The Japanese real estate market, valued at over 35 trillion USD, offers
significant investment opportunities. Accurate rent and price forecasting could
provide a substantial competitive edge. This paper explores using alternative
data variables to predict real estate performance in 1100 Japanese
municipalities. A comprehensive house price index was created, covering all
municipalities from 2005 to the present, using a dataset of over 5 million
transactions. This core dataset was enriched with economic factors spanning
decades, allowing for price trajectory predictions.
The findings show that alternative data variables can indeed forecast real
estate performance effectively. Investment signals based on these variables
yielded notable returns with low volatility. For example, the net migration
ratio delivered an annualized return of 4.6% with a Sharpe ratio of 1.5.
Taxable income growth and new dwellings ratio also performed well, with
annualized returns of 4.1% (Sharpe ratio of 1.3) and 3.3% (Sharpe ratio of
0.9), respectively. When combined with transformer models to predict
risk-adjusted returns 4 years in advance, the model achieved an R-squared score
of 0.28, explaining nearly 30% of the variation in future municipality prices.
These results highlight the potential of alternative data variables in real
estate investment. They underscore the need for further research to identify
more predictive factors. Nonetheless, the evidence suggests that such data can
provide valuable insights into real estate price drivers, enabling more
informed investment decisions in the Japanese market.

arXiv link: http://arxiv.org/abs/2405.20715v1
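
As a small worked example of the performance metrics quoted above (annualized
return and Sharpe ratio of a signal-sorted long-short strategy), here is a
hedged sketch on synthetic data; the signal, returns, and portfolio rule are
hypothetical stand-ins, not the paper's.

# Annualized return and Sharpe ratio of a quintile long-short strategy.
import numpy as np

rng = np.random.default_rng(1)
years, n_munis = 15, 1100
signal = rng.normal(size=(years, n_munis))                       # e.g. net migration ratio
fwd_ret = 0.02 * signal + rng.normal(0, 0.10, (years, n_munis))  # next-year price returns

top = signal >= np.quantile(signal, 0.8, axis=1, keepdims=True)
bot = signal <= np.quantile(signal, 0.2, axis=1, keepdims=True)
ls = (np.nanmean(np.where(top, fwd_ret, np.nan), axis=1)
      - np.nanmean(np.where(bot, fwd_ret, np.nan), axis=1))      # yearly long-short return

ann_ret = ls.mean()                      # returns are already annual
sharpe = ls.mean() / ls.std(ddof=1)
print(f"annualized return {ann_ret:.1%}, Sharpe ratio {sharpe:.2f}")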

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2024-05-30

Multidimensional spatiotemporal clustering -- An application to environmental sustainability scores in Europe

Authors: Caterina Morelli, Simone Boccaletti, Paolo Maranzano, Philipp Otto

The assessment of corporate sustainability performance is extremely relevant
in facilitating the transition to a green and low-carbon intensity economy.
However, companies located in different areas may be subject to different
sustainability and environmental risks and policies. Henceforth, the main
objective of this paper is to investigate the spatial and temporal pattern of
the sustainability evaluations of European firms. We leverage on a large
dataset containing information about companies' sustainability performances,
measured by MSCI ESG ratings, and geographical coordinates of firms in Western
Europe between 2013 and 2023. By means of a modified version of the Chavent et
al. (2018) hierarchical algorithm, we conduct a spatial clustering analysis,
combining sustainability and spatial information, and a spatiotemporal
clustering analysis, which combines the time dynamics of multiple
sustainability features and spatial dissimilarities, to detect groups of firms
with homogeneous sustainability performance. We are able to build
cross-national and cross-industry clusters with remarkable differences in terms
of sustainability scores. Among other results, in the spatio-temporal analysis,
we observe a high degree of geographical overlap among clusters, indicating
that the temporal dynamics in sustainability assessment are relevant within a
multidimensional approach. Our findings help to capture the diversity of ESG
ratings across Western Europe and may assist practitioners and policymakers in
evaluating companies facing different sustainability-linked risks in different
areas.

arXiv link: http://arxiv.org/abs/2405.20191v1
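
To make the clustering step concrete, here is a minimal sketch, in the spirit of
Chavent et al. (2018), of hierarchical clustering on a convex combination of a
feature-based and a spatial dissimilarity matrix. The data, the mixing weight,
and the normalization are illustrative assumptions, not the authors' exact
algorithm.

# Hierarchical clustering that mixes ESG-feature and geographic dissimilarities.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(2)
n = 200
esg = rng.normal(size=(n, 4))        # sustainability scores (4 synthetic features)
coords = rng.uniform(size=(n, 2))    # firm locations (synthetic coordinates)

d_feat = pdist(esg)                  # feature dissimilarity (condensed form)
d_geo = pdist(coords)                # spatial dissimilarity
d_feat, d_geo = d_feat / d_feat.max(), d_geo / d_geo.max()   # comparable scales

alpha = 0.3                          # weight on the spatial component (assumed)
d_mix = (1 - alpha) * d_feat + alpha * d_geo

Z = linkage(d_mix, method="average")
labels = fcluster(Z, t=5, criterion="maxclust")   # cut into 5 clusters
print("cluster sizes:", np.bincount(labels)[1:])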

Econometrics arXiv cross-link from stat.CO (stat.CO), submitted: 2024-05-30

The ARR2 prior: flexible predictive prior definition for Bayesian auto-regressions

Authors: David Kohns, Noa Kallioinen, Yann McLatchie, Aki Vehtari

We present the ARR2 prior, a joint prior over the auto-regressive components
in Bayesian time-series models and their induced $R^2$. Compared to other
priors designed for times-series models, the ARR2 prior allows for flexible and
intuitive shrinkage. We derive the prior for pure auto-regressive models, and
extend it to auto-regressive models with exogenous inputs, and state-space
models. Through both simulations and real-world modelling exercises, we
demonstrate the efficacy of the ARR2 prior in improving sparse and reliable
inference, while showing greater inference quality and predictive performance
than other shrinkage priors. An open-source implementation of the prior is
provided.

arXiv link: http://arxiv.org/abs/2405.19920v3
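
As a hedged illustration of the quantity such a prior targets (notation mine,
and much simpler than the paper's general construction): in a stationary AR(1)
model $y_t = \phi y_{t-1} + \varepsilon_t$ with noise variance
$\sigma^2_\varepsilon$, the model-implied coefficient of determination is
$R^2 = 1 - \sigma^2_\varepsilon / \mathrm{Var}(y_t) = \phi^2$, so a prior on the
induced $R^2$ translates into shrinkage on the auto-regressive coefficient and
vice versa.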

Econometrics arXiv paper, submitted: 2024-05-30

The Political Resource Curse Redux

Authors: Hanyuan Jiang

In the study of the Political Resource Curse (Brollo et al., 2013), the
authors identified a new channel to investigate whether the windfalls of
resources are unambiguously beneficial to society, both with theory and
empirical evidence. This paper revisits the framework with a new dataset.
Specifically, we implement a regression discontinuity design and a
difference-in-differences specification.

arXiv link: http://arxiv.org/abs/2405.19897v1

Econometrics arXiv paper, submitted: 2024-05-30

Modelling and Forecasting Energy Market Volatility Using GARCH and Machine Learning Approach

Authors: Seulki Chung

This paper presents a comparative analysis of univariate and multivariate
GARCH-family models and machine learning algorithms in modeling and forecasting
the volatility of major energy commodities: crude oil, gasoline, heating oil,
and natural gas. It uses a comprehensive dataset incorporating financial,
macroeconomic, and environmental variables to assess predictive performance and
discusses volatility persistence and transmission across these commodities.
Aspects of volatility persistence and transmission, traditionally examined by
GARCH-class models, are jointly explored using the SHAP (Shapley Additive
exPlanations) method. The findings reveal that machine learning models
demonstrate superior out-of-sample forecasting performance compared to
traditional GARCH models. Machine learning models tend to underpredict, while
GARCH models tend to overpredict energy market volatility, suggesting a hybrid
use of both types of models. There is volatility transmission from crude oil to
the gasoline and heating oil markets. The volatility transmission in the
natural gas market is less prevalent.

arXiv link: http://arxiv.org/abs/2405.19849v1
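
For readers unfamiliar with the GARCH baseline used here, the sketch below
simulates and forecasts with a GARCH(1,1) conditional-variance recursion on
synthetic returns; the parameter values are illustrative, not the paper's
fitted models.

# GARCH(1,1): sigma2_t = omega + alpha*eps_{t-1}^2 + beta*sigma2_{t-1}.
import numpy as np

rng = np.random.default_rng(3)
omega, alpha, beta = 0.05, 0.08, 0.90   # hypothetical parameters
T = 1000
eps = np.empty(T)
sigma2 = np.empty(T)
sigma2[0] = omega / (1 - alpha - beta)  # unconditional variance
eps[0] = np.sqrt(sigma2[0]) * rng.standard_normal()
for t in range(1, T):
    sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    eps[t] = np.sqrt(sigma2[t]) * rng.standard_normal()

sigma2_next = omega + alpha * eps[-1] ** 2 + beta * sigma2[-1]   # 1-step forecast
print("next-period volatility forecast:", np.sqrt(sigma2_next))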

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2024-05-29

Stochastic Optimization Algorithms for Instrumental Variable Regression with Streaming Data

Authors: Xuxing Chen, Abhishek Roy, Yifan Hu, Krishnakumar Balasubramanian

We develop and analyze algorithms for instrumental variable regression by
viewing the problem as a conditional stochastic optimization problem. In the
context of least-squares instrumental variable regression, our algorithms
neither require matrix inversions nor mini-batches and provides a fully online
approach for performing instrumental variable regression with streaming data.
When the true model is linear, we derive rates of convergence in expectation,
that are of order $O(\log T/T)$ and $O(1/T^{1-\iota})$ for
any $\iota>0$, respectively under the availability of two-sample and one-sample
oracles, respectively, where $T$ is the number of iterations. Importantly,
under the availability of the two-sample oracle, our procedure avoids
explicitly modeling and estimating the relationship between confounder and the
instrumental variables, demonstrating the benefit of the proposed approach over
recent works based on reformulating the problem as minimax optimization
problems. Numerical experiments are provided to corroborate the theoretical
results.

arXiv link: http://arxiv.org/abs/2405.19463v1
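
To convey the flavor of the streaming setting with a two-sample oracle, here is
a generic sketch of online least-squares IV, not the authors' algorithms or
step-size choices: two observations sharing the same instrument value yield an
unbiased stochastic gradient of $\frac{1}{2}E_z[(E[y - \theta x \mid z])^2]$.

# Online IV via SGD with a two-sample oracle (synthetic scalar example).
import numpy as np

rng = np.random.default_rng(4)
theta_true, gamma = 2.0, 1.0
theta = 0.0
for t in range(1, 200_001):
    z = rng.standard_normal()
    # two draws of (x, y) that are independent conditional on the same z
    u1, u2 = rng.standard_normal(2)
    e1 = 0.5 * u1 + rng.standard_normal()    # endogeneity: error correlates with u
    e2 = 0.5 * u2 + rng.standard_normal()
    x1, x2 = gamma * z + u1, gamma * z + u2
    y1, y2 = theta_true * x1 + e1, theta_true * x2 + e2

    grad = -x2 * (y1 - theta * x1)           # unbiased gradient estimate
    theta -= (2.0 / (t + 100)) * grad        # Robbins-Monro step size

print("streaming IV estimate:", theta)       # should be near 2.0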

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2024-05-29

Generalized Neyman Allocation for Locally Minimax Optimal Best-Arm Identification

Authors: Masahiro Kato

This study investigates an asymptotically locally minimax optimal algorithm
for fixed-budget best-arm identification (BAI). We propose the Generalized
Neyman Allocation (GNA) algorithm and demonstrate that its worst-case upper
bound on the probability of misidentifying the best arm aligns with the
worst-case lower bound under the small-gap regime, where the gap between the
expected outcomes of the best and suboptimal arms is small. Our lower and upper
bounds are tight, matching exactly, including constant terms, within the
small-gap regime. The GNA algorithm generalizes the Neyman allocation for
two-armed bandits (Neyman, 1934; Kaufmann et al., 2016) and refines existing
BAI algorithms, such as those proposed by Glynn & Juneja (2004). By proposing
an asymptotically minimax optimal algorithm, we address the longstanding open
issue in BAI (Kaufmann, 2020) and treatment choice (Kasy & Sautmann, 2021) by
restricting the class of distributions to the small-gap regime.

arXiv link: http://arxiv.org/abs/2405.19317v4
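
For reference, the classical two-armed Neyman allocation that the GNA
generalizes samples each arm in proportion to its outcome standard deviation,
$w_1 = \sigma_1 / (\sigma_1 + \sigma_2)$ and $w_2 = \sigma_2 / (\sigma_1 +
\sigma_2)$; this textbook form is given only for orientation and is not the
paper's GNA rule.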

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2024-05-29

Synthetic Potential Outcomes and Causal Mixture Identifiability

Authors: Bijan Mazaheri, Chandler Squires, Caroline Uhler

Heterogeneous data from multiple populations, sub-groups, or sources is often
represented as a “mixture model” with a single latent class influencing all
of the observed covariates. Heterogeneity can be resolved at multiple levels by
grouping populations according to different notions of similarity. This paper
proposes grouping with respect to the causal response of an intervention or
perturbation on the system. This definition is distinct from previous notions,
such as similar covariate values (e.g. clustering) or similar correlations
between covariates (e.g. Gaussian mixture models). To solve the problem, we
“synthetically sample” from a counterfactual distribution using higher-order
multi-linear moments of the observable data. To understand how these “causal
mixtures” fit in with more classical notions, we develop a hierarchy of
mixture identifiability.

arXiv link: http://arxiv.org/abs/2405.19225v4

Econometrics arXiv updated paper (originally submitted: 2024-05-29)

Transmission Channel Analysis in Dynamic Models

Authors: Enrico Wegner, Lenard Lieb, Stephan Smeekes, Ines Wilms

We propose a framework for analysing transmission channels in a large class
of dynamic models. We formulate our approach both using graph theory and
potential outcomes, which we show to be equivalent. Our method, labelled
Transmission Channel Analysis (TCA), allows for the decomposition of total
effects captured by impulse response functions into the effects flowing through
transmission channels, thereby providing a quantitative assessment of the
strength of various well-defined channels. We establish that this requires no
additional identification assumptions beyond the identification of the
structural shock whose effects the researcher wants to decompose. Additionally,
we prove that impulse response functions are sufficient statistics for the
computation of transmission effects. We demonstrate the empirical relevance of
TCA for policy evaluation by decomposing the effects of policy shocks arising
from a variety of popular macroeconomic models.

arXiv link: http://arxiv.org/abs/2405.18987v3

Econometrics arXiv paper, submitted: 2024-05-28

Difference-in-Discontinuities: Estimation, Inference and Validity Tests

Authors: Pedro Picchetti, Cristine C. X. Pinto, Stephanie T. Shinoki

This paper investigates the econometric theory behind the newly developed
difference-in-discontinuities design (DiDC). Despite its increasing use in
applied research, there are currently limited studies of its properties. The
method combines elements of regression discontinuity (RDD) and
difference-in-differences (DiD) designs, allowing researchers to eliminate the
effects of potential confounders at the discontinuity. We formalize the
difference-in-discontinuity theory by stating the identification assumptions
and proposing a nonparametric estimator, deriving its asymptotic properties and
examining the scenarios in which the DiDC has desirable bias properties when
compared to the standard RDD. We also provide comprehensive tests for one of
the identification assumptions of the DiDC. Monte Carlo simulation studies show
that the estimators have good performance in finite samples. Finally, we
revisit Grembi et al. (2016), which studies the effects of relaxing fiscal rules
on public finance outcomes in Italian municipalities. The results show that the
proposed estimator exhibits substantially smaller confidence intervals for the
estimated effects.

arXiv link: http://arxiv.org/abs/2405.18531v1
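
In notation of my own (the paper's formal definitions may differ), the estimand
this design targets compares the regression-discontinuity jumps at the cutoff
$c$ of the running variable $X$ before and after the policy change:
$\{\lim_{x \downarrow c} E[Y_{post} \mid X=x] - \lim_{x \uparrow c} E[Y_{post}
\mid X=x]\} - \{\lim_{x \downarrow c} E[Y_{pre} \mid X=x] - \lim_{x \uparrow c}
E[Y_{pre} \mid X=x]\}$, so that any time-invariant confounding discontinuity at
$c$ differences out.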

Econometrics arXiv paper, submitted: 2024-05-28

Semi-nonparametric models of multidimensional matching: an optimal transport approach

Authors: Dongwoo Kim, Young Jun Lee

This paper proposes empirically tractable multidimensional matching models,
focusing on worker-job matching. We generalize the parametric model proposed by
Lindenlaub (2017), which relies on the assumption of joint normality of
observed characteristics of workers and jobs. In our paper, we allow
unrestricted distributions of characteristics and show identification of the
production technology, and equilibrium wage and matching functions using tools
from optimal transport theory. Given identification, we propose efficient,
consistent, asymptotically normal sieve estimators. We revisit Lindenlaub's
empirical application and show that, between 1990 and 2010, the U.S. economy
experienced much larger technological progress favoring cognitive abilities
than the original findings suggest. Furthermore, our flexible model
specifications provide a significantly better fit for patterns in the evolution
of wage inequality.

arXiv link: http://arxiv.org/abs/2405.18089v1

Econometrics arXiv updated paper (originally submitted: 2024-05-28)

Dyadic Regression with Sample Selection

Authors: Kensuke Sakamoto

This paper addresses the sample selection problem in panel dyadic regression
analysis. Dyadic data often include many zeros in the main outcomes due to the
underlying network formation process. This not only contaminates popular
estimators used in practice but also complicates the inference due to the
dyadic dependence structure. We extend Kyriazidou (1997)'s approach to dyadic
data and characterize the asymptotic distribution of our proposed estimator.
The convergence rates are $\sqrt{n}$ or $\sqrt{n^{2}h_{n}}$, depending on the
degeneracy of the Hájek projection part of the estimator, where $n$ is the
number of nodes and $h_{n}$ is a bandwidth. We propose a bias-corrected
confidence interval and a variance estimator that adapts to the degeneracy. A
Monte Carlo simulation shows the good finite sample performance of our
estimator and highlights the importance of bias correction in both asymptotic
regimes when the fraction of zeros in outcomes varies. We illustrate our
procedure using data from Moretti and Wilson (2017)'s paper on migration.

arXiv link: http://arxiv.org/abs/2405.17787v3

Econometrics arXiv updated paper (originally submitted: 2024-05-27)

Count Data Models with Heterogeneous Peer Effects under Rational Expectations

Authors: Aristide Houndetoungan

This paper develops a peer effect model for count responses under rational
expectations. The model accounts for heterogeneity in peer effects through
groups based on observed characteristics. Identification is based on the
condition from linear models requiring the existence of friends' friends who
are not direct friends, which I show extends to a broad class of nonlinear
models. Parameters are estimated
using a nested pseudo-likelihood approach. An empirical application on
students' extracurricular participation reveals that females are more
responsive to peers than males. An easy-to-use R package, CDatanet, is
available for implementing the model.

arXiv link: http://arxiv.org/abs/2405.17290v2

Econometrics arXiv updated paper (originally submitted: 2024-05-27)

Estimating treatment-effect heterogeneity across sites, in multi-site randomized experiments with few units per site

Authors: Clément de Chaisemartin, Antoine Deeb

In multi-site randomized trials with many sites and few randomization units
per site, an Empirical-Bayes estimator can be used to estimate the variance of
the treatment effect across sites. When this estimator indicates that treatment
effects do vary, we propose estimators of the coefficients from regressions of
site-level effects on site-level characteristics that are unobserved but can be
unbiasedly estimated, such as sites' average outcome without treatment, or
site-specific treatment effects on mediator variables. In experiments with
imperfect compliance, we show that the sign of the correlation between local
average treatment effects (LATEs) and site-level characteristics is identified,
and we propose a partly testable assumption under which the variance of LATEs
is identified. We use our results to revisit Behaghel et al. (2014), who study
the effect of counseling programs on job seekers' job-finding rate, in 200 job
placement agencies in France. We find considerable treatment-effect
heterogeneity, both for intention-to-treat and LATE effects, and the treatment
effect is negatively correlated with sites' job-finding rate without treatment.

arXiv link: http://arxiv.org/abs/2405.17254v3

Econometrics arXiv updated paper (originally submitted: 2024-05-27)

Mixing it up: Inflation at risk

Authors: Maximilian Schröder

Assessing the contribution of various risk factors to future inflation risks
was crucial for guiding monetary policy during the recent high inflation
period. However, existing methodologies often provide limited insights by
focusing solely on specific percentiles of the forecast distribution. In
contrast, this paper introduces a comprehensive framework that examines how
economic indicators impact the entire forecast distribution of macroeconomic
variables, facilitating the decomposition of the overall risk outlook into its
underlying drivers. Additionally, the framework allows for the construction of
risk measures that align with central bank preferences, serving as valuable
summary statistics. Applied to the recent inflation surge, the framework
reveals that U.S. inflation risk was primarily influenced by the recovery of
the U.S. business cycle and surging commodity prices, partially mitigated by
adjustments in monetary policy and credit spreads.

arXiv link: http://arxiv.org/abs/2405.17237v2

Econometrics arXiv paper, submitted: 2024-05-27

Quantifying the Reliance of Black-Box Decision-Makers on Variables of Interest

Authors: Daniel Vebman

This paper introduces a framework for measuring how much black-box
decision-makers rely on variables of interest. The framework adapts a
permutation-based measure of variable importance from the explainable machine
learning literature. With an emphasis on applicability, I present some of the
framework's theoretical and computational properties, explain how reliance
computations have policy implications, and work through an illustrative
example. In the empirical application to interruptions by Supreme Court
Justices during oral argument, I find that the effect of gender is more muted
compared to the existing literature's estimate; I then use this paper's
framework to compare Justices' reliance on gender and alignment to their
reliance on experience, which are incomparable using regression coefficients.

arXiv link: http://arxiv.org/abs/2405.17225v1
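
Since the reliance measure adapts permutation-based variable importance, the
following minimal sketch conveys the basic idea: permute the variable of
interest and record how often the decisions change. The decision rule and data
are hypothetical stand-ins, not the paper's estimator or application.

# Permutation-based reliance of a black-box decision rule on one variable.
import numpy as np

rng = np.random.default_rng(5)
n = 10_000
gender = rng.integers(0, 2, n)        # variable of interest
experience = rng.normal(10, 5, n)     # another decision input

def decide(gender, experience):
    """Stand-in for the black-box decision-maker (deterministic toy rule)."""
    return (0.3 * gender + 0.05 * experience) > 0.6

base = decide(gender, experience)
perm = decide(rng.permutation(gender), experience)     # break the gender link
reliance_on_gender = np.mean(base != perm)
print(f"share of decisions changed by permuting gender: {reliance_on_gender:.3f}")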

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2024-05-27

Statistical Mechanism Design: Robust Pricing, Estimation, and Inference

Authors: Duarte Gonçalves, Bruno A. Furtado

This paper tackles challenges in pricing and revenue projections due to
consumer uncertainty. We propose a novel data-based approach for firms facing
unknown consumer type distributions. Unlike existing methods, we assume firms
only observe a finite sample of consumers' types. We introduce
empirically optimal mechanisms, a simple and intuitive class of
sample-based mechanisms with strong finite-sample revenue guarantees.
Furthermore, we leverage our results to develop a toolkit for statistical
inference on profits. Our approach allows one to reliably estimate the profits
associated with any particular mechanism, to construct confidence intervals,
and, more generally, to conduct valid hypothesis testing.

arXiv link: http://arxiv.org/abs/2405.17178v1

Econometrics arXiv paper, submitted: 2024-05-27

Cross-border cannibalization: Spillover effects of wind and solar energy on interconnected European electricity markets

Authors: Clemens Stiewe, Alice Lixuan Xu, Anselm Eicke, Lion Hirth

The average revenue, or market value, of wind and solar energy tends to fall
with increasing market shares, as is now evident across European electricity
markets. At the same time, these markets have become more interconnected. In
this paper, we empirically study the multiple cross-border effects on the value
of renewable energy: on one hand, interconnection is a flexibility resource
that allows exporting energy when it is locally abundant, benefitting
renewables. On the other hand, wind and solar radiation are correlated across
space, so neighboring supply adds to the local one to depress domestic prices.
We estimate both effects, using spatial panel regression on electricity market
data from 2015 to 2023 from 30 European bidding zones. We find that domestic
wind and solar value is not only depressed by domestic, but also by neighboring
renewables expansion. The better interconnected a market is, the smaller the
effect of domestic but the larger the effect of neighboring renewables. While
wind value is stabilized by interconnection, solar value is not. If wind market
share increases both at home and in neighboring markets by one percentage
point, the value factor of wind energy is reduced by just above 1 percentage
points. For solar, this number is almost 4 percentage points.

arXiv link: http://arxiv.org/abs/2405.17166v1

Econometrics arXiv paper, submitted: 2024-05-26

Estimating Dyadic Treatment Effects with Unknown Confounders

Authors: Tadao Hoshino, Takahide Yanagi

This paper proposes a statistical inference method for assessing treatment
effects with dyadic data. Under the assumption that the treatments follow an
exchangeable distribution, our approach allows for the presence of any
unobserved confounding factors that potentially cause endogeneity of treatment
choice without requiring additional information other than the treatments and
outcomes. Building on the literature of graphon estimation in network data
analysis, we propose a neighborhood kernel smoothing method for estimating
dyadic average treatment effects. We also develop a permutation inference
method for testing the sharp null hypothesis. Under certain regularity
conditions, we derive the rate of convergence of the proposed estimator and
demonstrate the size control property of our test. We apply our method to
international trade data to assess the impact of free trade agreements on
bilateral trade flows.

arXiv link: http://arxiv.org/abs/2405.16547v1

Econometrics arXiv paper, submitted: 2024-05-26

Two-way fixed effects instrumental variable regressions in staggered DID-IV designs

Authors: Sho Miyaji

Many studies run two-way fixed effects instrumental variable (TWFEIV)
regressions, leveraging variation in the timing of policy adoption across units
as an instrument for treatment. This paper studies the properties of the TWFEIV
estimator in staggered instrumented difference-in-differences (DID-IV) designs.
We show that in settings with the staggered adoption of the instrument across
units, the TWFEIV estimator can be decomposed into a weighted average of all
possible two-group/two-period Wald-DID estimators. Under staggered DID-IV
designs, a causal interpretation of the TWFEIV estimand hinges on the stable
effects of the instrument on the treatment and the outcome over time. We
illustrate the use of our decomposition theorem for the TWFEIV estimator
through an empirical application.

arXiv link: http://arxiv.org/abs/2405.16467v1

Econometrics arXiv paper, submitted: 2024-05-24

Dynamic Latent-Factor Model with High-Dimensional Asset Characteristics

Authors: Adam Baybutt

We develop novel estimation procedures with supporting econometric theory for
a dynamic latent-factor model with high-dimensional asset characteristics, that
is, the number of characteristics is on the order of the sample size. Utilizing
the Double Selection Lasso estimator, our procedure employs regularization to
eliminate characteristics with low signal-to-noise ratios yet maintains
asymptotically valid inference for asset pricing tests. The crypto asset class
is well-suited for applying this model given the limited number of tradable
assets and years of data as well as the rich set of available asset
characteristics. The empirical results present out-of-sample pricing abilities
and risk-adjusted returns for our novel estimator as compared to benchmark
methods. We provide an inference procedure for measuring the risk premium of an
observable nontradable factor, and employ this to find that the
inflation-mimicking portfolio in the crypto asset class has positive risk
compensation.

arXiv link: http://arxiv.org/abs/2405.15721v1
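
As background for the Double Selection Lasso step, here is a minimal sketch of
the post-double-selection idea of Belloni, Chernozhukov, and Hansen on synthetic
data: select controls that predict the outcome, select controls that predict the
variable of interest, and run OLS on the union. This is illustrative only and
not the paper's latent-factor estimator.

# Post-double-selection Lasso for one coefficient of interest.
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(6)
n, p = 400, 200
X = rng.standard_normal((n, p))                          # high-dimensional controls
d = X[:, 0] + 0.5 * X[:, 1] + rng.standard_normal(n)     # variable of interest
y = 1.0 * d + X[:, 1] + X[:, 2] + rng.standard_normal(n)

sel_y = np.flatnonzero(LassoCV(cv=5).fit(X, y).coef_)    # controls predicting y
sel_d = np.flatnonzero(LassoCV(cv=5).fit(X, d).coef_)    # controls predicting d
keep = np.union1d(sel_y, sel_d)

fit = LinearRegression().fit(np.column_stack([d, X[:, keep]]), y)
print("post-double-selection coefficient on d:", fit.coef_[0])   # near 1.0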

Econometrics arXiv paper, submitted: 2024-05-24

Empirical Crypto Asset Pricing

Authors: Adam Baybutt

We motivate the study of the crypto asset class with eleven empirical facts,
and study the drivers of crypto asset returns through the lens of univariate
factors. We argue crypto assets are a new, attractive, and independent asset
class. In a novel and rigorously built panel of crypto assets, we examine the
pricing ability of sixty-three asset characteristics and find rich signal
content across the characteristics and at several future horizons. Only
univariate financial factors (i.e., functions of previous returns) were
associated with statistically significant long-short strategies, suggestive of
speculatively driven returns as opposed to more fundamental pricing factors.

arXiv link: http://arxiv.org/abs/2405.15716v1

Econometrics arXiv paper, submitted: 2024-05-24

Generating density nowcasts for U.S. GDP growth with deep learning: Bayes by Backprop and Monte Carlo dropout

Authors: Kristóf Németh, Dániel Hadházi

Recent results in the literature indicate that artificial neural networks
(ANNs) can outperform the dynamic factor model (DFM) in terms of the accuracy
of GDP nowcasts. Compared to the DFM, the performance advantage of these highly
flexible, nonlinear estimators is particularly evident in periods of recessions
and structural breaks. From the perspective of policy-makers, however, nowcasts
are the most useful when they are conveyed with uncertainty attached to them.
While the DFM and other classical time series approaches analytically derive
the predictive (conditional) distribution for GDP growth, ANNs can only produce
point nowcasts based on their default training procedure (backpropagation). To
fill this gap, we are, to our knowledge, the first to adapt two deep learning
algorithms that enable ANNs to generate density nowcasts for U.S. GDP growth:
Bayes by Backprop and Monte Carlo dropout. The accuracy of point nowcasts,
defined as the mean of the empirical predictive distribution, is evaluated
relative to a naive constant growth model for GDP and a benchmark DFM
specification. Using a 1D CNN as the underlying ANN architecture, both
algorithms outperform those benchmarks during the evaluation period (2012:Q1 --
2022:Q4). Furthermore, both algorithms are able to dynamically adjust the
location (mean), scale (variance), and shape (skew) of the empirical predictive
distribution. The results indicate that both Bayes by Backprop and Monte Carlo
dropout can effectively augment the scope and functionality of ANNs, rendering
them a fully compatible and competitive alternative to classical time series
approaches.

arXiv link: http://arxiv.org/abs/2405.15579v1
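
To show what one of the two approaches amounts to in code, here is a minimal
Monte Carlo dropout sketch: keep dropout active at prediction time and collect
repeated forward passes as draws from an empirical predictive distribution. The
toy fully connected network and random inputs stand in for the paper's 1D CNN
and nowcasting data.

# Monte Carlo dropout: sample a predictive distribution from one trained net.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(
    nn.Linear(10, 32), nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(32, 1),
)
x = torch.randn(1, 10)              # one vector of (synthetic) monthly indicators

model.train()                       # keep dropout stochastic at inference time
with torch.no_grad():
    draws = torch.cat([model(x) for _ in range(1000)])   # 1000 density draws

print(f"density nowcast: mean {draws.mean().item():.3f}, "
      f"std {draws.std().item():.3f}")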

Econometrics arXiv paper, submitted: 2024-05-23

Modularity, Higher-Order Recombination, and New Venture Success

Authors: Likun Cao, Ziwen Chen, James Evans

Modularity is critical for the emergence and evolution of complex social,
natural, and technological systems robust to exploratory failure. We consider
this in the context of emerging business organizations, which can be understood
as complex systems. We build a theory of organizational emergence as
higher-order, modular recombination wherein successful start-ups assemble novel
combinations of successful modular components, rather than engage in the
lower-order combination of disparate, singular components. Lower-order
combinations are critical for long-term socio-economic transformation, but
manifest diffuse benefits requiring support as public goods. Higher-order
combinations facilitate rapid experimentation and attract private funding. We
evaluate this with U.S. venture-funded start-ups over 45 years using company
descriptions. We build a dynamic semantic space with word embedding models
constructed from evolving business discourse, which allow us to measure the
modularity of and distance between new venture components. Using event history
models, we demonstrate how ventures more likely achieve successful IPOs and
high-priced acquisitions when they combine diverse modules of clustered
components. We demonstrate how higher-order combination enables venture success
by accelerating firm development and diversifying investment, and we reflect on
its implications for social innovation.

arXiv link: http://arxiv.org/abs/2405.15042v1

Econometrics arXiv updated paper (originally submitted: 2024-05-23)

On the Identifying Power of Monotonicity for Average Treatment Effects

Authors: Yuehao Bai, Shunzhuang Huang, Sarah Moon, Azeem M. Shaikh, Edward J. Vytlacil

In the context of a binary outcome, treatment, and instrument, Balke and
Pearl (1993, 1997) establish that the monotonicity condition of Imbens and
Angrist (1994) has no identifying power beyond instrument exogeneity for
average potential outcomes and average treatment effects in the sense that
adding it to instrument exogeneity does not decrease the identified sets for
those parameters whenever those restrictions are consistent with the
distribution of the observable data. This paper shows that this phenomenon
holds in a broader setting with a multi-valued outcome, treatment, and
instrument, under an extension of the monotonicity condition that we refer to
as generalized monotonicity. We further show that this phenomenon holds for any
restriction on treatment response that is stronger than generalized
monotonicity provided that these stronger restrictions do not restrict
potential outcomes. Importantly, many models of potential treatments previously
considered in the literature imply generalized monotonicity, including the
types of monotonicity restrictions considered by Kline and Walters (2016),
Kirkeboen et al. (2016), and Heckman and Pinto (2018), and the restriction that
treatment selection is determined by particular classes of additive random
utility models. We show through a series of examples that restrictions on
potential treatments can provide identifying power beyond instrument exogeneity
for average potential outcomes and average treatment effects when the
restrictions imply that the generalized monotonicity condition is violated. In
this way, our results shed light on the types of restrictions required to help
identify average potential outcomes and average treatment effects.

arXiv link: http://arxiv.org/abs/2405.14104v3

Econometrics arXiv paper, submitted: 2024-05-22

Exogenous Consideration and Extended Random Utility

Authors: Roy Allen

In a consideration set model, an individual maximizes utility among the
considered alternatives. I relate a consideration set additive random utility
model to classic discrete choice and the extended additive random utility
model, in which utility can be $-\infty$ for infeasible alternatives. When
observable utility shifters are bounded, all three models are observationally
equivalent. Moreover, they have the same counterfactual bounds and welfare
formulas for changes in utility shifters like price. For attention
interventions, welfare cannot change in the full consideration model but is
completely unbounded in the limited consideration model. The identified set for
consideration set probabilities has a minimal width for any bounded support of
shifters, but with unbounded support it is a point: identification "towards"
infinity does not resemble identification "at" infinity.

arXiv link: http://arxiv.org/abs/2405.13945v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-05-22

Some models are useful, but for how long?: A decision theoretic approach to choosing when to refit large-scale prediction models

Authors: Kentaro Hoffman, Stephen Salerno, Jeff Leek, Tyler McCormick

Large-scale prediction models using tools from artificial intelligence (AI)
or machine learning (ML) are increasingly common across a variety of industries
and scientific domains. Despite their effectiveness, training AI and ML tools
at scale can cost tens or hundreds of thousands of dollars (or more); and even
after a model is trained, substantial resources must be invested to keep models
up-to-date. This paper presents a decision-theoretic framework for deciding
when to refit an AI/ML model when the goal is to perform unbiased statistical
inference using partially AI/ML-generated data. Drawing on portfolio
optimization theory, we treat the decision of recalibrating a model or
statistical inference versus refitting the model as a choice between
“investing” in one of two “assets.” One asset, recalibrating the model
based on another model, is quick and relatively inexpensive but bears
uncertainty from sampling and may not be robust to model drift. The other
asset, refitting the model, is costly but removes the drift concern
(though not statistical uncertainty from sampling). We present a framework for
balancing these two potential investments while preserving statistical
validity. We evaluate the framework using simulation and data on electricity
usage and predicting flu trends.

arXiv link: http://arxiv.org/abs/2405.13926v2

Econometrics arXiv cross-link from physics.soc-ph (physics.soc-ph), submitted: 2024-05-21

Integrating behavioral experimental findings into dynamical models to inform social change interventions

Authors: Radu Tanase, René Algesheimer, Manuel S. Mariani

Addressing global challenges -- from public health to climate change -- often
involves stimulating the large-scale adoption of new products or behaviors.
Research traditions that focus on individual decision making suggest that
achieving this objective requires better identifying the drivers of individual
adoption choices. On the other hand, computational approaches rooted in
complexity science focus on maximizing the propagation of a given product or
behavior throughout social networks of interconnected adopters. The integration
of these two perspectives -- although advocated by several research communities
-- has remained elusive so far. Here we show how achieving this integration
could inform seeding policies to facilitate the large-scale adoption of a given
behavior or product. Drawing on complex contagion and discrete choice theories,
we propose a method to estimate individual-level thresholds to adoption, and
validate its predictive power in two choice experiments. By integrating the
estimated thresholds into computational simulations, we show that
state-of-the-art seeding methods for social influence maximization might be
suboptimal if they neglect individual-level behavioral drivers, which can be
corrected through the proposed experimental method.

arXiv link: http://arxiv.org/abs/2405.13224v1

Econometrics arXiv paper, submitted: 2024-05-21

Conditional Choice Probability Estimation of Dynamic Discrete Choice Models with 2-period Finite Dependence

Authors: Yu Hao, Hiroyuki Kasahara

This paper extends the work of Arcidiacono and Miller (2011, 2019) by
introducing a novel characterization of finite dependence within dynamic
discrete choice models, demonstrating that numerous models display 2-period
finite dependence. We recast finite dependence as a problem of sequentially
searching for weights and introduce a computationally efficient method for
determining these weights by utilizing the Kronecker product structure embedded
in state transitions. With the estimated weights, we develop a computationally
attractive Conditional Choice Probability estimator with 2-period finite
dependence. The computational efficacy of our proposed estimator is
demonstrated through Monte Carlo simulations.

arXiv link: http://arxiv.org/abs/2405.12467v1

Econometrics arXiv paper, submitted: 2024-05-20

Estimating the Impact of Social Distance Policy in Mitigating COVID-19 Spread with Factor-Based Imputation Approach

Authors: Difang Huang, Ying Liang, Boyao Wu, Yanyi Ye

We identify the effectiveness of social distancing policies in reducing the
transmission of the COVID-19 spread. We build a model that measures the
relative frequency and geographic distribution of the virus growth rate and
provides hypothetical infection distribution in the states that enacted the
social distancing policies, where we control time-varying, observed and
unobserved, state-level heterogeneities. Using panel data on infection and
deaths in all US states from February 20 to April 20, 2020, we find that
stay-at-home orders and other types of social distancing policies significantly
reduced the growth rate of infection and deaths. We show that the effects are
time-varying and range from the weakest at the beginning of policy intervention
to the strongest by the end of our sample period. We also find that social
distancing policies were more effective in states with higher income, better
education, more white people, more democratic voters, and higher CNN
viewership.

arXiv link: http://arxiv.org/abs/2405.12180v1

Econometrics arXiv updated paper (originally submitted: 2024-05-20)

Instrumented Difference-in-Differences with Heterogeneous Treatment Effects

Authors: Sho Miyaji

Many studies exploit variation in policy adoption timing across units as an
instrument for treatment. This paper formalizes the underlying identification
strategy as an instrumented difference-in-differences (DID-IV). In this design,
a Wald-DID estimand, which scales the DID estimand of the outcome by the DID
estimand of the treatment, captures the local average treatment effect on the
treated (LATET). We extend the canonical DID-IV design to multiple period
settings with the staggered adoption of the instrument across units. Moreover,
we propose a credible estimation method in this design that is robust to
treatment effect heterogeneity. We illustrate the empirical relevance of our
findings, estimating returns to schooling in the United Kingdom. In this
application, the two-way fixed effects instrumental variable regression, the
conventional approach to implement DID-IV designs, yields a negative estimate.
By contrast, our estimation method indicates a substantial gain from schooling.

arXiv link: http://arxiv.org/abs/2405.12083v5
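
In the basic two-group, two-period case, the Wald-DID estimand described above
can be written (in notation of my own) as the ratio
$\{E[Y_{post}-Y_{pre} \mid Z=1] - E[Y_{post}-Y_{pre} \mid Z=0]\} /
\{E[D_{post}-D_{pre} \mid Z=1] - E[D_{post}-D_{pre} \mid Z=0]\}$, where $Z$
indicates exposure to the instrument: the DID estimand of the outcome divided by
the DID estimand of the treatment.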

Econometrics arXiv paper, submitted: 2024-05-20

Comparing predictive ability in presence of instability over a very short time

Authors: Fabrizio Iacone, Luca Rossini, Andrea Viselli

We consider forecast comparison in the presence of instability when this
affects only a short period of time. We demonstrate that global tests do not
perform well in this case, as they were not designed to capture very
short-lived instabilities, and their power vanishes altogether when the
magnitude of the shock is very large. We then discuss and propose approaches
that are more suitable to detect such situations, such as nonparametric methods
(S test or MAX procedure). We illustrate these results in different Monte Carlo
exercises and in evaluating the nowcast of the quarterly US nominal GDP from
the Survey of Professional Forecasters (SPF) against a naive benchmark of no
growth, over the period that includes the GDP instability brought by the
Covid-19 crisis. We recommend that the forecaster should not pool the sample,
but exclude the short periods of high local instability from the evaluation
exercise.

arXiv link: http://arxiv.org/abs/2405.11954v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2024-05-20

Revisiting Day-ahead Electricity Price: Simple Model Save Millions

Authors: Linian Wang, Jianghong Liu, Huibin Zhang, Leye Wang

Accurate day-ahead electricity price forecasting is essential for residential
welfare, yet current methods often fall short in forecast accuracy. We observe
that commonly used time series models struggle to utilize the prior correlation
between price and demand-supply, which, we found, can contribute a lot to a
reliable electricity price forecaster. Leveraging this prior, we propose a
simple piecewise linear model that significantly enhances forecast accuracy by
directly deriving prices from readily forecastable demand-supply values.
Experiments in the day-ahead electricity markets of Shanxi province and ISO New
England reveal that such forecasts could potentially save residents millions of
dollars a year compared to existing methods. Our findings underscore the value
of suitably integrating time series modeling with economic prior for enhanced
electricity price forecasting accuracy.

arXiv link: http://arxiv.org/abs/2405.14893v2
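
A hedged sketch of the general approach described above, fitting a piecewise
linear map from a demand-supply signal to price by least squares with hinge
features; the knots, data, and functional form here are assumptions for
illustration, not the authors' specification.

# Piecewise linear price model with fixed knots, fit by ordinary least squares.
import numpy as np

rng = np.random.default_rng(8)
n = 2000
ds = rng.uniform(-1, 1, n)                   # forecast demand-supply balance
price = np.where(ds < 0, 30 + 10 * ds, 30 + 80 * ds) + rng.normal(0, 3, n)

knots = np.array([-0.5, 0.0, 0.5])
X = np.column_stack([np.ones(n), ds] + [np.maximum(ds - k, 0) for k in knots])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)

def predict(balance):
    feats = np.concatenate([[1.0, balance], np.maximum(balance - knots, 0)])
    return feats @ coef

print("predicted price at balance 0.8:", predict(0.8))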

Econometrics arXiv updated paper (originally submitted: 2024-05-20)

Testing Sign Congruence Between Two Parameters

Authors: Douglas L. Miller, Francesca Molinari, Jörg Stoye

We test the null hypothesis that two parameters $(\mu_1,\mu_2)$ have the same
sign, assuming that (asymptotically) normal estimators
$(\hat{\mu}_1,\hat{\mu}_2)$ are available. Examples of this problem include the
analysis of heterogeneous treatment effects, causal interpretation of
reduced-form estimands, meta-studies, and mediation analysis. A number of tests
were recently proposed. We recommend a test that is simple and rejects more
often than many of these recent proposals. Like all other tests in the
literature, it is conservative if the truth is near $(0,0)$ and therefore also
biased. To clarify whether these features are avoidable, we also provide a test
that is unbiased and has exact size control on the boundary of the null
hypothesis, but which has counterintuitive properties and hence we do not
recommend. We use the test to improve p-values in Kowalski (2022) from
information contained in that paper's main text and to establish statistical
significance of some key estimates in Dippel et al. (2021).

arXiv link: http://arxiv.org/abs/2405.11759v4

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2024-05-20

Transfer Learning for Spatial Autoregressive Models with Application to U.S. Presidential Election Prediction

Authors: Hao Zeng, Wei Zhong, Xingbai Xu

It is important to incorporate spatial geographic information into U.S.
presidential election analysis, especially for swing states. The state-level
analysis also faces significant challenges of limited spatial data
availability. To address the challenges of spatial dependence and small sample
sizes in predicting U.S. presidential election results using spatially
dependent data, we propose a novel transfer learning framework within the SAR
model, called tranSAR. Classical SAR model estimation often loses accuracy
with small target data samples. Our framework enhances estimation and
prediction by leveraging information from similar source data. We introduce a
two-stage algorithm, consisting of a transferring stage and a debiasing stage,
to estimate parameters and establish theoretical convergence rates for the
estimators. Additionally, if the informative source data are unknown, we
propose a transferable source detection algorithm using spatial residual
bootstrap to maintain spatial dependence and derive its detection consistency.
Simulation studies show our algorithm substantially improves the classical
two-stage least squares estimator. We demonstrate our method's effectiveness in
predicting outcomes in U.S. presidential swing states, where it outperforms
traditional methods. In addition, our tranSAR model predicts that the
Democratic party will win the 2024 U.S. presidential election.

arXiv link: http://arxiv.org/abs/2405.15600v2

Econometrics arXiv cross-link from cs.AI (cs.AI), submitted: 2024-05-18

The Logic of Counterfactuals and the Epistemology of Causal Inference

Authors: Hanti Lin

The 2021 Nobel Prize in Economics recognized an epistemology of causal
inference based on the Rubin causal model (Rubin 1974), which merits broader
attention in philosophy. This model, in fact, presupposes a logical principle
of counterfactuals, Conditional Excluded Middle (CEM), the locus of a pivotal
debate between Stalnaker (1968) and Lewis (1973) on the semantics of
counterfactuals. Proponents of CEM should recognize that this connection points
to a new argument for CEM -- a Quine-Putnam indispensability argument grounded
in the Nobel-winning applications of the Rubin model in health and social
sciences. To advance the dialectic, I challenge this argument with an updated
Rubin causal model that retains its successes while dispensing with CEM. This
novel approach combines the strengths of the Rubin causal model and a causal
model familiar in philosophy, the causal Bayes net. The takeaway: deductive
logic and inductive inference, often studied in isolation, are deeply
interconnected.

arXiv link: http://arxiv.org/abs/2405.11284v3

Econometrics arXiv paper, submitted: 2024-05-17

Macroeconomic Factors, Industrial Indexes and Bank Spread in Brazil

Authors: Carlos Alberto Durigan Junior, André Taue Saito, Daniel Reed Bergmann, Nuno Manoel Martins Dias Fouto

The main objective of this paper is to identify which macroeconomic factors
and industrial indexes influenced the total Brazilian banking spread between
March 2011 and March 2015. This paper considers subclassifications of industrial
activities in Brazil. Monthly time series data were used in multivariate linear
regression models using Eviews (7.0). Eighteen variables were considered as
candidate determinants. Variables which positively influenced the bank
spread are: default rates, IPIs (Industrial Production Indexes) for capital
goods, intermediate goods, durable consumer goods, semi-durable and non-durable
goods, the Selic rate, GDP, the unemployment rate and EMBI+. Variables with a
negative influence are: consumer and general consumer goods IPIs, the IPCA, the
balance of the loan portfolio and the retail sales index. A 5% significance
level was considered.
The main conclusion of this work is that the progress of industry, job creation
and consumption can reduce bank spread. Keywords: Credit. Bank spread.
Macroeconomics. Industrial Production Indexes. Finance.

arXiv link: http://arxiv.org/abs/2405.10655v1

Econometrics arXiv cross-link from General Economics (econ.GN), submitted: 2024-05-17

Overcoming Medical Overuse with AI Assistance: An Experimental Investigation

Authors: Ziyi Wang, Lijia Wei, Lian Xue

This study evaluates the effectiveness of Artificial Intelligence (AI) in
mitigating medical overtreatment, a significant issue characterized by
unnecessary interventions that inflate healthcare costs and pose risks to
patients. We conducted a lab-in-the-field experiment at a medical school,
utilizing a novel medical prescription task, manipulating monetary incentives
and the availability of AI assistance among medical students using a
three-by-two factorial design. We tested three incentive schemes: Flat
(constant pay regardless of treatment quantity), Progressive (pay increases
with the number of treatments), and Regressive (penalties for overtreatment) to
assess their influence on the adoption and effectiveness of AI assistance. Our
findings demonstrate that AI significantly reduced overtreatment rates by up to
62% in the Regressive incentive conditions where (prospective) physician and
patient interests were most aligned. Diagnostic accuracy improved by 17% to
37%, depending on the incentive scheme. Adoption of AI advice was high, with
approximately half of the participants modifying their decisions based on AI
input across all settings. For policy implications, we quantified the monetary
(57%) and non-monetary (43%) incentives of overtreatment and highlighted AI's
potential to mitigate non-monetary incentives and enhance social welfare. Our
results provide valuable insights for healthcare administrators considering AI
integration into healthcare systems.

arXiv link: http://arxiv.org/abs/2405.10539v1

Econometrics arXiv cross-link from cs.AI (cs.AI), submitted: 2024-05-16

Simulation-Based Benchmarking of Reinforcement Learning Agents for Personalized Retail Promotions

Authors: Yu Xia, Sriram Narayanamoorthy, Zhengyuan Zhou, Joshua Mabry

The development of open benchmarking platforms could greatly accelerate the
adoption of AI agents in retail. This paper presents comprehensive simulations
of customer shopping behaviors for the purpose of benchmarking reinforcement
learning (RL) agents that optimize coupon targeting. The difficulty of this
learning problem is largely driven by the sparsity of customer purchase events.
We trained agents using offline batch data comprising summarized customer
purchase histories to help mitigate this effect. Our experiments revealed that
contextual bandit and deep RL methods that are less prone to over-fitting the
sparse reward distributions significantly outperform static policies. This
study offers a practical framework for simulating AI agents that optimize the
entire retail customer journey. It aims to inspire the further development of
simulation tools for retail AI systems.

arXiv link: http://arxiv.org/abs/2405.10469v1

Econometrics arXiv paper, submitted: 2024-05-16

Optimal Text-Based Time-Series Indices

Authors: David Ardia, Keven Bluteau

We propose an approach to construct text-based time-series indices in an
optimal way--typically, indices that maximize the contemporaneous relation or
the predictive performance with respect to a target variable, such as
inflation. We illustrate our methodology with a corpus of news articles from
the Wall Street Journal by optimizing text-based indices focusing on tracking
the VIX index and inflation expectations. Our results highlight the superior
performance of our approach compared to existing indices.

arXiv link: http://arxiv.org/abs/2405.10449v1

Econometrics arXiv updated paper (originally submitted: 2024-05-16)

Comprehensive Causal Machine Learning

Authors: Michael Lechner, Jana Mareckova

Uncovering causal effects in multiple treatment setting at various levels of
granularity provides substantial value to decision makers. Comprehensive
machine learning approaches to causal effect estimation allow to use a single
causal machine learning approach for estimation and inference of causal mean
effects for all levels of granularity. Focusing on selection-on-observables,
this paper compares three such approaches, the modified causal forest (mcf),
the generalized random forest (grf), and double machine learning (dml). It also
compares the theoretical properties of the approaches and provides proven
theoretical guarantees for the mcf. The findings indicate that dml-based
methods excel for average treatment effects at the population level (ATE) and
group level (GATE) with few groups, when selection into treatment is not too
strong. However, for finer causal heterogeneity, explicitly outcome-centred
forest-based approaches are superior. The mcf has three additional benefits:
(i) It is the most robust estimator in cases when dml-based approaches
underperform because of substantial selection into treatment; (ii) it is the
best estimator for GATEs when the number of groups gets larger; and (iii), it
is the only estimator that is internally consistent, in the sense that
low-dimensional causal ATEs and GATEs are obtained as aggregates of
finer-grained causal parameters.

arXiv link: http://arxiv.org/abs/2405.10198v2

Econometrics arXiv updated paper (originally submitted: 2024-05-15)

Double Robustness of Local Projections and Some Unpleasant VARithmetic

Authors: José Luis Montiel Olea, Mikkel Plagborg-Møller, Eric Qian, Christian K. Wolf

We consider impulse response inference in a locally misspecified vector
autoregression (VAR) model. The conventional local projection (LP) confidence
interval has correct coverage even when the misspecification is so large that
it can be detected with probability approaching 1. This result follows from a
"double robustness" property analogous to that of popular partially linear
regression estimators. In contrast, the conventional VAR confidence interval
with short-to-moderate lag length can severely undercover, even for
misspecification that is small, economically plausible, and difficult to detect
statistically. There is no free lunch: the VAR confidence interval has robust
coverage only if the lag length is so large that the interval is as wide as the
LP interval.

arXiv link: http://arxiv.org/abs/2405.09509v2
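
For orientation, the sketch below estimates local projection impulse responses
in the textbook way (Jordà, 2005): regress the $h$-step-ahead outcome on the
identified shock, with a lag as control, separately for each horizon. Synthetic
AR(1) data only; this is not the paper's simulation design.

# Local projections on a simulated AR(1): the true IRF at horizon h is 0.7**h.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
T, H = 500, 12
shock = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.7 * y[t - 1] + shock[t]

irf = []
for h in range(H + 1):
    t = np.arange(1, T - h)                                  # usable time indices
    X = sm.add_constant(np.column_stack([shock[t], y[t - 1]]))
    irf.append(sm.OLS(y[t + h], X).fit().params[1])          # coefficient on shock

print(np.round(irf, 3))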

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2024-05-15

Identifying Heterogeneous Decision Rules From Choices When Menus Are Unobserved

Authors: Larry G Epstein, Kaushil Patel

Given only aggregate choice data and limited information about how menus are
distributed across the population, we describe what can be inferred robustly
about the distribution of preferences (or more general decision rules). We
strengthen and generalize existing results on such identification and provide
an alternative analytical approach to study the problem. We show further that
our model and results are applicable, after suitable reinterpretation, to other
contexts. One application is to the robust identification of the distribution
of updating rules given only the population distribution of beliefs and limited
information about heterogeneous information sources.

arXiv link: http://arxiv.org/abs/2405.09500v1

Econometrics arXiv paper, submitted: 2024-05-15

Optimizing Sales Forecasts through Automated Integration of Market Indicators

Authors: Lina Döring, Felix Grumbach, Pascal Reusch

Recognizing that traditional forecasting models often rely solely on
historical demand, this work investigates the potential of data-driven
techniques to automatically select and integrate market indicators for
improving customer demand predictions. By adopting an exploratory methodology,
we integrate macroeconomic time series, such as national GDP growth, from the
Eurostat database into Neural Prophet and SARIMAX
forecasting models. Suitable time series are automatically identified through
different state-of-the-art feature selection methods and applied to sales data
from our industrial partner. We show that forecasts can be
significantly enhanced by incorporating external information. Notably, the
potential of feature selection methods stands out, especially due to their
capability for automation without expert knowledge and manual selection effort.
In particular, the Forward Feature Selection technique consistently yielded
superior forecasting accuracy for both SARIMAX and Neural Prophet across
different company sales datasets. In the comparative analysis of the errors of
the selected forecasting models, namely Neural Prophet and SARIMAX, it is
observed that neither model demonstrates a significant superiority over the
other.

arXiv link: http://arxiv.org/abs/2406.07564v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2024-05-14

Bounds on the Distribution of a Sum of Two Random Variables: Revisiting a problem of Kolmogorov with application to Individual Treatment Effects

Authors: Zhehao Zhang, Thomas S. Richardson

We revisit the following problem, proposed by Kolmogorov: given prescribed
marginal distributions $F$ and $G$ for random variables $X,Y$ respectively,
characterize the set of compatible distribution functions for the sum $Z=X+Y$.
Bounds on the distribution function for $Z$ were first given by Makarov (1982)
and Rüschendorf (1982) independently. Frank et al. (1987) provided a solution
to the same problem using copula theory. However, though these authors obtain
the same bounds, they make different assertions concerning their sharpness. In
addition, their solutions leave some open problems in the case when the given
marginal distribution functions are discontinuous. These issues have led to
some confusion and erroneous statements in subsequent literature, which we
correct.
Kolmogorov's problem is closely related to inferring possible distributions
for individual treatment effects $Y_1 - Y_0$ given the marginal distributions
of $Y_1$ and $Y_0$; the latter being identified from a randomized experiment.
We use our new insights to sharpen and correct the results due to Fan and Park
(2010) concerning individual treatment effects, and to fill some other logical
gaps.

arXiv link: http://arxiv.org/abs/2405.08806v2

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2024-05-14

Variational Bayes and non-Bayesian Updating

Authors: Tomasz Strzalecki

I show how variational Bayes can be used as a microfoundation for a popular
model of non-Bayesian updating.

arXiv link: http://arxiv.org/abs/2405.08796v2

Econometrics arXiv paper, submitted: 2024-05-14

Latent group structure in linear panel data models with endogenous regressors

Authors: Junho Choi, Ryo Okui

This paper concerns the estimation of linear panel data models with
endogenous regressors and a latent group structure in the coefficients. We
consider instrumental variables estimation of the group-specific coefficient
vector. We show that direct application of the Kmeans algorithm to the
generalized method of moments objective function does not yield unique
estimates. We develop and theoretically justify new two-stage estimation
methods that apply the Kmeans algorithm to a regression of the dependent
variable on predicted values of the endogenous regressors. The results of Monte
Carlo simulations demonstrate that two-stage estimation with the first stage
modeled using a latent group structure achieves good classification accuracy,
even if the true first-stage regression is fully heterogeneous. We apply our
estimation methods to revisiting the relationship between income and democracy.

arXiv link: http://arxiv.org/abs/2405.08687v1
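
The toy sketch below conveys the flavor of a two-stage idea of this kind: predict the endogenous regressor from the instrument, then apply K-means to unit-level slopes from a regression of the outcome on the predicted regressor. The data-generating process, the pooled first stage, and the unit-by-unit second stage are illustrative assumptions, not the authors' estimator.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
N, T = 200, 20
group = rng.integers(0, 2, size=N)             # latent group membership (hypothetical DGP)
beta = np.where(group == 0, 1.0, -1.0)         # group-specific coefficients

z = rng.normal(size=(N, T))                    # instrument
u = rng.normal(size=(N, T))                    # unobservable driving endogeneity
x = 0.8 * z + 0.5 * u + rng.normal(size=(N, T))
y = beta[:, None] * x + u + rng.normal(size=(N, T))

# Stage 1: pooled first-stage regression of x on z (no intercept, zero-mean data)
pi_hat = (z * x).sum() / (z * z).sum()
x_hat = pi_hat * z

# Stage 2: unit-by-unit slope of y on the predicted regressor, then K-means on slopes
slopes = (x_hat * y).sum(axis=1) / (x_hat * x_hat).sum(axis=1)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(slopes.reshape(-1, 1))

accuracy = max(np.mean(labels == group), np.mean(labels != group))  # up to label switching
print("classification accuracy:", accuracy)
```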

Econometrics arXiv paper, submitted: 2024-05-14

Predicting NVIDIA's Next-Day Stock Price: A Comparative Analysis of LSTM, MLP, ARIMA, and ARIMA-GARCH Models

Authors: Yiluan Xing, Chao Yan, Cathy Chang Xie

Forecasting stock prices remains a considerable challenge in financial
markets, bearing significant implications for investors, traders, and financial
institutions. Amid the ongoing AI revolution, NVIDIA has emerged as a key
player driving innovation across various sectors. Given its prominence, we
chose NVIDIA as the subject of our study.

arXiv link: http://arxiv.org/abs/2405.08284v1

Econometrics arXiv updated paper (originally submitted: 2024-05-13)

Random Utility Models with Skewed Random Components: the Smallest versus Largest Extreme Value Distribution

Authors: Richard T. Carson, Derrick H. Sun, Yixiao Sun

At the core of most random utility models (RUMs) is an individual agent with
a random utility component following a largest extreme value Type I (LEVI)
distribution. What if, instead, the random component follows its mirror image
-- the smallest extreme value Type I (SEVI) distribution? Differences between
these specifications, closely tied to the random component's skewness, can be
quite profound. For the same preference parameters, the two RUMs, equivalent
with only two choice alternatives, diverge progressively as the number of
alternatives increases, resulting in substantially different estimates and
predictions for key measures, such as elasticities and market shares.
The LEVI model imposes the well-known independence-of-irrelevant-alternatives
property, while SEVI does not. Instead, the SEVI choice probability for a
particular option involves enumerating all subsets that contain this option.
The SEVI model, though more complex to estimate, is shown to have
computationally tractable closed-form choice probabilities. Much of the paper
delves into explicating the properties of the SEVI model and exploring
implications of the random component's skewness.
Conceptually, the difference between the LEVI and SEVI models centers on
whether information, known only to the agent, is more likely to increase or
decrease the systematic utility parameterized using observed attributes. LEVI
does the former; SEVI the latter. An immediate implication is that if choice is
characterized by SEVI random components, then the observed choice is more
likely to correspond to the systematic-utility-maximizing choice than if
characterized by LEVI. Examining standard empirical examples from different
applied areas, we find that the SEVI model outperforms the LEVI model,
suggesting the relevance of its inclusion in applied researchers' toolkits.

arXiv link: http://arxiv.org/abs/2405.08222v2

Econometrics arXiv updated paper (originally submitted: 2024-05-13)

Simultaneous Inference for Local Structural Parameters with Random Forests

Authors: David M. Ritzwoller, Vasilis Syrgkanis

We construct simultaneous confidence intervals for solutions to conditional
moment equations. The intervals are built around a class of nonparametric
regression algorithms based on subsampled kernels. This class encompasses
various forms of subsampled random forest regression, including Generalized
Random Forests (Athey et al., 2019). Although simultaneous validity is often
desirable in practice -- for example, for fine-grained characterization of
treatment effect heterogeneity -- only confidence intervals that confer
pointwise guarantees were previously available. Our work closes this gap. As a
by-product, we obtain several new order-explicit results on the concentration
and normal approximation of high-dimensional U-statistics.

arXiv link: http://arxiv.org/abs/2405.07860v3

Econometrics arXiv updated paper (originally submitted: 2024-05-13)

Robust Estimation and Inference for High-Dimensional Panel Data Models

Authors: Jiti Gao, Fei Liu, Bin Peng, Yayi Yan

This paper provides the relevant literature with a complete toolkit for
conducting robust estimation and inference about the parameters of interest
involved in a high-dimensional panel data framework. Specifically, (1) we allow
for non-Gaussian, serially and cross-sectionally correlated and heteroskedastic
error processes, (2) we develop an estimation method for high-dimensional
long-run covariance matrix using a thresholded estimator, (3) we also allow for
the number of regressors to grow faster than the sample size.
Methodologically and technically, we develop two Nagaev-type
concentration inequalities: one for a partial sum and the other for a quadratic
form, subject to a set of easily verifiable conditions. Leveraging these two
inequalities, we derive a non-asymptotic bound for the LASSO estimator, achieve
asymptotic normality via the node-wise LASSO regression, and establish a sharp
convergence rate for the thresholded heteroskedasticity and autocorrelation
consistent (HAC) estimator.
We demonstrate the practical relevance of these theoretical results by
investigating a high-dimensional panel data model with interactive effects.
Moreover, we conduct extensive numerical studies using simulated and real data
examples.

arXiv link: http://arxiv.org/abs/2405.07420v3

Econometrics arXiv updated paper (originally submitted: 2024-05-12)

Kernel Three Pass Regression Filter

Authors: Rajveer Jat, Daanish Padha

We forecast a single time series using a high-dimensional set of predictors.
When these predictors share common underlying dynamics, an approximate latent
factor model provides a powerful characterization of their co-movements
Bai(2003). These latent factors succinctly summarize the data and can also be
used for prediction, alleviating the curse of dimensionality in
high-dimensional prediction exercises, see Stock & Watson (2002a). However,
forecasting using these latent factors suffers from two potential drawbacks.
First, not all pervasive factors among the set of predictors may be relevant,
and using all of them can lead to inefficient forecasts. The second shortcoming
is the assumption of linear dependence of predictors on the underlying factors.
The first issue can be addressed by using some form of supervision, which leads
to the omission of irrelevant information. One example is the three-pass
regression filter proposed by Kelly & Pruitt (2015). We extend their framework
to cases where the form of dependence might be nonlinear by developing a new
estimator, which we refer to as the Kernel Three-Pass Regression Filter
(K3PRF). This alleviates the aforementioned second shortcoming. The estimator
is computationally efficient and performs well empirically. The short-term
performance matches or exceeds that of established models, while the long-term
performance shows significant improvement.

arXiv link: http://arxiv.org/abs/2405.07292v3

Econometrics arXiv paper, submitted: 2024-05-12

On the Ollivier-Ricci curvature as fragility indicator of the stock markets

Authors: Joaquín Sánchez García, Sebastian Gherghe

Recently, an indicator for stock market fragility and crash size in terms of
the Ollivier-Ricci curvature has been proposed. We study analytical and
empirical properties of this indicator, test its elasticity with respect to
different parameters and provide heuristics for the parameters involved. We
show when and how the indicator accurately describes a financial crisis. We
also propose an alternate method for calculating the indicator using a specific
sub-graph with special curvature properties.

arXiv link: http://arxiv.org/abs/2405.07134v1

Econometrics arXiv paper, submitted: 2024-05-10

Identifying Peer Effects in Networks with Unobserved Effort and Isolated Students

Authors: Aristide Houndetoungan, Cristelle Kouame, Michael Vlassopoulos

Peer influence on effort devoted to some activity is often studied using
proxy variables when actual effort is unobserved. For instance, in education,
academic effort is often proxied by GPA. We propose an alternative approach
that circumvents this approximation. Our framework distinguishes unobserved
shocks to GPA that do not affect effort from preference shocks that do affect
effort levels. We show that peer effects estimates obtained using our approach
can differ significantly from classical estimates (where effort is
approximated) if the network includes isolated students. Applying our approach
to data on high school students in the United States, we find that peer effect
estimates relying on GPA as a proxy for effort are 40% lower than those
obtained using our approach.

arXiv link: http://arxiv.org/abs/2405.06850v1

Econometrics arXiv updated paper (originally submitted: 2024-05-10)

Generalization Issues in Conjoint Experiment: Attention and Salience

Authors: Jiawei Fu, Xiaojun Li

Can the causal effects estimated in an experiment be generalized to
real-world scenarios? This question lies at the heart of social science
studies. External validity primarily assesses whether experimental effects
persist across different settings, implicitly presuming the consistency of
experimental effects with their real-life counterparts. However, we argue that
this presumed consistency may not always hold, especially in experiments
involving multi-dimensional decision processes, such as conjoint experiments.
We introduce a formal model to elucidate how attention and salience effects
lead to three types of inconsistencies between experimental findings and
real-world phenomena: amplified effect magnitude, effect sign reversal, and
effect importance reversal. We derive testable hypotheses from each theoretical
outcome and test these hypotheses using data from various existing conjoint
experiments and our own experiments. Drawing on our theoretical framework, we
propose several recommendations for experimental design aimed at enhancing the
generalizability of survey experiment findings.

arXiv link: http://arxiv.org/abs/2405.06779v3

Econometrics arXiv paper, submitted: 2024-05-10

A Sharp Test for the Judge Leniency Design

Authors: Mohamed Coulibaly, Yu-Chin Hsu, Ismael Mourifié, Yuanyuan Wan

We propose a new specification test to assess the validity of the judge
leniency design. We characterize a set of sharp testable implications, which
exploit all the relevant information in the observed data distribution to
detect violations of the judge leniency design assumptions. The proposed sharp
test is asymptotically valid and consistent and will not make discordant
recommendations. When the judge's leniency design assumptions are rejected, we
propose a way to salvage the model using partial monotonicity and exclusion
assumptions, under which a variant of the Local Instrumental Variable (LIV)
estimand can recover the Marginal Treatment Effect. Simulation studies show our
test outperforms existing non-sharp tests by significant margins. We apply our
test to assess the validity of the judge leniency design using data from
Stevenson (2018), and it rejects the validity for three crime categories:
robbery, drug selling, and drug possession.

arXiv link: http://arxiv.org/abs/2405.06156v1

Econometrics arXiv paper, submitted: 2024-05-09

Advancing Distribution Decomposition Methods Beyond Common Supports: Applications to Racial Wealth Disparities

Authors: Bernardo Modenesi

I generalize state-of-the-art approaches that decompose differences in the
distribution of a variable of interest between two groups into a portion
explained by covariates and a residual portion. The method that I propose
relaxes the overlapping supports assumption, allowing the groups being compared
to not necessarily share exactly the same covariate support. I illustrate my
method by revisiting the black-white wealth gap in the U.S. as a function of labor
income and other variables. Traditionally used decomposition methods would trim
(or assign zero weight to) observations that lie outside the common covariate
support region. On the other hand, by allowing all observations to contribute
to the existing wealth gap, I find that otherwise trimmed observations
contribute from 3% to 19% to the overall wealth gap, at different portions of
the wealth distribution.

arXiv link: http://arxiv.org/abs/2405.05759v1

Econometrics arXiv paper, submitted: 2024-05-09

Sequential Validation of Treatment Heterogeneity

Authors: Stefan Wager

We use the martingale construction of Luedtke and van der Laan (2016) to
develop tests for the presence of treatment heterogeneity. The resulting
sequential validation approach can be instantiated using various validation
metrics, such as BLPs, GATES, QINI curves, etc., and provides an alternative to
cross-validation-like cross-fold application of these metrics.

arXiv link: http://arxiv.org/abs/2405.05534v1

Econometrics arXiv paper, submitted: 2024-05-08

Causal Duration Analysis with Diff-in-Diff

Authors: Ben Deaner, Hyejin Ku

In economic program evaluation, it is common to obtain panel data in which
outcomes are indicators that an individual has reached an absorbing state. For
example, they may indicate whether an individual has exited a period of
unemployment, passed an exam, left a marriage, or had their parole revoked. The
parallel trends assumption that underpins difference-in-differences generally
fails in such settings. We suggest identifying conditions that are analogous to
those of difference-in-differences but apply to hazard rates rather than mean
outcomes. These alternative assumptions motivate estimators that retain the
simplicity and transparency of standard diff-in-diff, and we suggest analogous
specification tests. Our approach can be adapted to general linear restrictions
between the hazard rates of different groups, motivating duration analogues of
the triple differences and synthetic control methods. We apply our procedures
to examine the impact of a policy that increased the generosity of unemployment
benefits, using a cross-cohort comparison.

arXiv link: http://arxiv.org/abs/2405.05220v1

Econometrics arXiv paper, submitted: 2024-05-08

SVARs with breaks: Identification and inference

Authors: Emanuele Bacchiocchi, Toru Kitagawa

In this paper we propose a class of structural vector autoregressions (SVARs)
characterized by structural breaks (SVAR-WB). Together with standard
restrictions on the parameters and on functions of them, we also consider
constraints across the different regimes. Such constraints can be either (a) in
the form of stability restrictions, indicating that not all the parameters or
impulse responses are subject to structural changes, or (b) in terms of
inequalities regarding particular characteristics of the SVAR-WB across the
regimes. We show that all these kinds of restrictions provide benefits in terms
of identification. We derive conditions for point and set identification of the
structural parameters of the SVAR-WB, mixing equality, sign, rank and stability
restrictions, as well as constraints on forecast error variances (FEVs). As
point identification, when achieved, holds locally but not globally, there will
be a set of isolated structural parameters that are observationally equivalent
in the parametric space. In this respect, both common frequentist and Bayesian
approaches produce unreliable inference: the former focuses on just one of
these observationally equivalent points, while the latter exhibits a
non-vanishing sensitivity to the prior. To overcome these issues, we propose
alternative approaches for estimation and inference that account for all
admissible observationally equivalent structural parameters. Moreover, we
develop a pure Bayesian and a robust Bayesian approach for doing inference in
set-identified SVAR-WBs. Both the theory of identification and inference are
illustrated through a set of examples and an empirical application on the
transmission of US monetary policy over the great inflation and great
moderation regimes.

arXiv link: http://arxiv.org/abs/2405.04973v1

Econometrics arXiv updated paper (originally submitted: 2024-05-08)

Testing the Fairness-Accuracy Improvability of Algorithms

Authors: Eric Auerbach, Annie Liang, Kyohei Okumura, Max Tabord-Meehan

Many organizations use algorithms that have a disparate impact, i.e., the
benefits or harms of the algorithm fall disproportionately on certain social
groups. Addressing an algorithm's disparate impact can be challenging, however,
because it is often unclear whether it is possible to reduce this impact
without sacrificing other objectives of the organization, such as accuracy or
profit. Establishing the improvability of algorithms with respect to multiple
criteria is of both conceptual and practical interest: in many settings,
disparate impact that would otherwise be prohibited under US federal law is
permissible if it is necessary to achieve a legitimate business interest. The
question is how a policy-maker can formally substantiate, or refute, this
"necessity" defense. In this paper, we provide an econometric framework for
testing the hypothesis that it is possible to improve on the fairness of an
algorithm without compromising on other pre-specified objectives. Our proposed
test is simple to implement and can be applied under any exogenous constraint
on the algorithm space. We establish the large-sample validity and consistency
of our test, and microfound the test's robustness to manipulation based on a
game between a policymaker and the analyst. Finally, we apply our approach to
evaluate a healthcare algorithm originally considered by Obermeyer et al.
(2019), and quantify the extent to which the algorithm's disparate impact can
be reduced without compromising the accuracy of its predictions.

arXiv link: http://arxiv.org/abs/2405.04816v4

Econometrics arXiv updated paper (originally submitted: 2024-05-07)

Difference-in-Differences Estimators When No Unit Remains Untreated

Authors: Clément de Chaisemartin, Diego Ciccia, Xavier D'Haultfœuille, Felix Knau

We consider treatment-effect estimation with a two-period panel, where units
are untreated at period one, and receive strictly positive doses at period two.
First, we consider designs with some quasi-untreated units, with a period-two
dose local to zero. We show that under a parallel-trends assumption, a weighted
average of slopes of units' potential outcomes is identified by a
difference-in-difference estimand using quasi-untreated units as the control
group. We leverage results from the regression-discontinuity-design literature
to propose a nonparametric estimator. Then, we propose estimators for designs
without quasi-untreated units. Finally, we propose a test of the
homogeneous-effect assumption underlying two-way-fixed-effects regressions.

arXiv link: http://arxiv.org/abs/2405.04465v5

Econometrics arXiv paper, submitted: 2024-05-07

Detailed Gender Wage Gap Decompositions: Controlling for Worker Unobserved Heterogeneity Using Network Theory

Authors: Jamie Fogel, Bernardo Modenesi

Recent advances in the literature of decomposition methods in economics have
allowed for the identification and estimation of detailed wage gap
decompositions. In this context, building reliable counterfactuals requires
using tighter controls to ensure that similar workers are correctly identified
by making sure that important unobserved variables such as skills are
controlled for, as well as comparing only workers with similar observable
characteristics. This paper contributes to the wage decomposition literature in
two main ways: (i) developing an economically principled, network-based approach to
control for unobserved worker skill heterogeneity in the presence of potential
discrimination; and (ii) extending existing generic decomposition tools to
accommodate a potential lack of overlapping support in covariates between
groups being compared, which is likely to be the norm in more detailed
decompositions. We illustrate the methodology by decomposing the gender wage
gap in Brazil.

arXiv link: http://arxiv.org/abs/2405.04365v1

Econometrics arXiv updated paper (originally submitted: 2024-05-06)

A Primer on the Analysis of Randomized Experiments and a Survey of some Recent Advances

Authors: Yuehao Bai, Azeem M. Shaikh, Max Tabord-Meehan

The past two decades have witnessed a surge of new research in the analysis
of randomized experiments. The emergence of this literature may seem surprising
given the widespread use and long history of experiments as the "gold standard"
in program evaluation, but this body of work has revealed many subtle aspects
of randomized experiments that may have been previously unappreciated. This
article provides an overview of some of these topics, primarily focused on
stratification, regression adjustment, and cluster randomization.

arXiv link: http://arxiv.org/abs/2405.03910v2

Econometrics arXiv paper, submitted: 2024-05-06

A quantile-based nonadditive fixed effects model

Authors: Xin Liu

I propose a quantile-based nonadditive fixed effects panel model to study
heterogeneous causal effects. Similar to the standard fixed effects (FE) model, my
model allows arbitrary dependence between regressors and unobserved
heterogeneity, but it generalizes the additive separability of standard FE to
allow the unobserved heterogeneity to enter nonseparably. Similar to structural
quantile models, my model's random coefficient vector depends on an unobserved,
scalar "rank" variable, in which outcomes (excluding an additive noise term)
are monotonic at a particular value of the regressor vector, which is much
weaker than the conventional monotonicity assumption that must hold at all
possible values. This rank is assumed to be stable over time, which is often
more economically plausible than the panel quantile studies that assume
individual rank is iid over time. It uncovers the heterogeneous causal effects
as functions of the rank variable. I provide identification and estimation
results, establishing uniform consistency and uniform asymptotic normality of
the heterogeneous causal effect function estimator. Simulations show reasonable
finite-sample performance and show my model complements fixed effects quantile
regression. Finally, I illustrate the proposed methods by examining the causal
effect of a country's oil wealth on its military defense spending.

arXiv link: http://arxiv.org/abs/2405.03826v1

Econometrics arXiv paper, submitted: 2024-05-05

Tuning parameter selection in econometrics

Authors: Denis Chetverikov

I review some of the main methods for selecting tuning parameters in
nonparametric and $\ell_1$-penalized estimation. For the nonparametric
estimation, I consider the methods of Mallows, Stein, Lepski, cross-validation,
penalization, and aggregation in the context of series estimation. For the
$\ell_1$-penalized estimation, I consider the methods based on the theory of
self-normalized moderate deviations, bootstrap, Stein's unbiased risk
estimation, and cross-validation in the context of Lasso estimation. I explain
the intuition behind each of the methods and discuss their comparative
advantages. I also give some extensions.

arXiv link: http://arxiv.org/abs/2405.03021v1
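
As a minimal illustration of one of the methods reviewed above, the snippet below selects the Lasso penalty by cross-validation on simulated data; it is a generic example rather than anything tied to the paper's formal treatment.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = [1.5, -2.0, 1.0, 0.5, -1.0]       # sparse true coefficients
y = X @ beta + rng.normal(size=n)

# 10-fold cross-validation over an automatically generated grid of penalties
lasso = LassoCV(cv=10).fit(X, y)
print("selected penalty (alpha):", lasso.alpha_)
print("number of nonzero coefficients:", int(np.sum(lasso.coef_ != 0)))
```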

Econometrics arXiv paper, submitted: 2024-05-03

A Network Simulation of OTC Markets with Multiple Agents

Authors: James T. Wilkinson, Jacob Kelter, John Chen, Uri Wilensky

We present a novel agent-based approach to simulating an over-the-counter
(OTC) financial market in which trades are intermediated solely by market
makers and agent visibility is constrained to a network topology. Dynamics,
such as changes in price, result from agent-level interactions that
ubiquitously occur via market maker agents acting as liquidity providers. Two
additional agents are considered: trend investors use a deep convolutional
neural network paired with a deep Q-learning framework to inform trading
decisions by analysing price history; and value investors use a static
price-target to determine their trade directions and sizes. We demonstrate that
our novel inclusion of a network topology with market makers facilitates
explorations into various market structures. First, we present the model and an
overview of its mechanics. Second, we validate our findings via comparison to
the real-world: we demonstrate a fat-tailed distribution of price changes,
auto-correlated volatility, a skew negatively correlated to market maker
positioning, predictable price-history patterns and more. Finally, we
demonstrate that our network-based model can lend insights into the effect of
market-structure on price-action. For example, we show that markets with
sparsely connected intermediaries can have a critical point of fragmentation,
beyond which the market forms distinct clusters and arbitrage becomes rapidly
possible between the prices of different market makers. A discussion is
provided on future work that would be beneficial.

arXiv link: http://arxiv.org/abs/2405.02480v1

Econometrics arXiv updated paper (originally submitted: 2024-05-03)

Identifying and exploiting alpha in linear asset pricing models with strong, semi-strong, and latent factors

Authors: M. Hashem Pesaran, Ron P. Smith

The risk premia of traded factors are the sum of factor means and a parameter
vector we denote by phi, which is identified from the cross-section
regression of the alphas of individual securities on the vector of factor loadings.
If phi is non-zero, one can construct "phi-portfolios" which exploit the
systematic components of non-zero alpha. We show that for known values of betas
and when phi is non-zero there exist phi-portfolios that dominate mean-variance
portfolios. The paper then proposes a two-step bias corrected estimator of phi
and derives its asymptotic distribution allowing for idiosyncratic pricing
errors, weak missing factors, and weak error cross-sectional dependence. Small
sample results from extensive Monte Carlo experiments show that the proposed
estimator has the correct size with good power properties. The paper also
provides an empirical application to a large number of U.S. securities with
risk factors selected from a large number of potential risk factors according
to their strength, constructs phi-portfolios, and compares their Sharpe
ratios to those of the mean-variance and S&P 500 portfolios.

arXiv link: http://arxiv.org/abs/2405.02217v4

Econometrics arXiv paper, submitted: 2024-05-03

Testing for an Explosive Bubble using High-Frequency Volatility

Authors: H. Peter Boswijk, Jun Yu, Yang Zu

Based on a continuous-time stochastic volatility model with a linear drift,
we develop a test for explosive behavior in financial asset prices at a low
frequency when prices are sampled at a higher frequency. The test exploits the
volatility information in the high-frequency data. The method consists of
devolatizing log-asset price increments with realized volatility measures and
performing a supremum-type recursive Dickey-Fuller test on the devolatized
sample. The proposed test has a nuisance-parameter-free asymptotic distribution
and is easy to implement. We study the size and power properties of the test in
Monte Carlo simulations. A real-time date-stamping strategy based on the
devolatized sample is proposed for the origination and conclusion dates of the
explosive regime. Conditions under which the real-time date-stamping strategy
is consistent are established. The test and the date-stamping strategy are
applied to study explosive behavior in cryptocurrency and stock markets.

arXiv link: http://arxiv.org/abs/2405.02087v1
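
The sketch below conveys the devolatize-then-recursive-Dickey-Fuller idea on simulated intraday data: daily increments are scaled by realized volatility and a supremum of forward-recursive DF statistics is computed. The simulation design, the volatility construction, and the SADF-style loop are illustrative assumptions, not the paper's exact statistic or critical values.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(1)
days, intraday = 300, 78
vol = 0.01 * np.exp(0.3 * rng.normal(size=days))                 # daily volatility levels
r_intra = rng.normal(scale=vol[:, None] / np.sqrt(intraday),
                     size=(days, intraday))
r_intra[200:] += 0.003 / intraday                                # upward drift mimicking a bubble episode

daily_ret = r_intra.sum(axis=1)
rv = np.sqrt((r_intra ** 2).sum(axis=1))                         # realized volatility per day
devol = np.cumsum(daily_ret / rv)                                # devolatized log-price path

# Supremum of forward-recursive Dickey-Fuller statistics (SADF-style)
min_window = 50
sup_df = max(adfuller(devol[:t], maxlag=0, regression="c", autolag=None)[0]
             for t in range(min_window, days + 1))
print("sup DF statistic:", sup_df)
```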

Econometrics arXiv paper, submitted: 2024-05-03

Unleashing the Power of AI: Transforming Marketing Decision-Making in Heavy Machinery with Machine Learning, Radar Chart Simulation, and Markov Chain Analysis

Authors: Tian Tian, Jiahao Deng

This pioneering research introduces a novel approach for decision-makers in
the heavy machinery industry, specifically focusing on production management.
The study integrates machine learning techniques like Ridge Regression, Markov
chain analysis, and radar charts to optimize North American Crawler Cranes
market production processes. Ridge Regression enables growth pattern
identification and performance assessment, facilitating comparisons and
addressing industry challenges. Markov chain analysis evaluates risk factors,
aiding in informed decision-making and risk management. Radar charts simulate
benchmark product designs, enabling data-driven decisions for production
optimization. This interdisciplinary approach equips decision-makers with
transformative insights, enhancing competitiveness in the heavy machinery
industry and beyond. By leveraging these techniques, companies can
revolutionize their production management strategies, driving success in
diverse markets.

arXiv link: http://arxiv.org/abs/2405.01913v1

Econometrics arXiv paper, submitted: 2024-05-02

Synthetic Controls with spillover effects: A comparative study

Authors: Andrii Melnychuk

This study introduces the Iterative Synthetic Control Method, a
modification of the Synthetic Control Method (SCM) designed to improve its
predictive performance by utilizing control units affected by the treatment in
question. This method is then compared to other SCM modifications: SCM without
any modifications, SCM after removing all spillover-affected units, Inclusive
SCM, and the SP SCM model. For the comparison, Monte Carlo simulations are
utilized, generating artificial datasets with known counterfactuals and
comparing the predictive performance of the methods. Generally, the Inclusive
SCM performed best in all settings and is relatively simple to implement. The
Iterative SCM, introduced in this paper, was a close second, with a small
difference in performance and a simpler implementation.

arXiv link: http://arxiv.org/abs/2405.01645v1
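
For orientation, the snippet below fits plain synthetic control weights (nonnegative and summing to one) to a simulated pre-treatment path; it illustrates the baseline SCM that the paper modifies, not the Iterative SCM itself, and the data and weights are simulated.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
T0, J = 30, 10                                      # pre-treatment periods, control units
Y0 = rng.normal(size=(T0, J)).cumsum(axis=0)        # control-unit outcome paths
true_w = np.array([0.5, 0.3, 0.2] + [0.0] * (J - 3))
y1 = Y0 @ true_w + rng.normal(scale=0.1, size=T0)   # treated unit's pre-treatment path

def loss(w):
    return np.sum((y1 - Y0 @ w) ** 2)

# Nonnegative weights summing to one, chosen to match the pre-treatment path
res = minimize(loss, x0=np.full(J, 1.0 / J), method="SLSQP",
               bounds=[(0.0, 1.0)] * J,
               constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0})
print("estimated weights:", np.round(res.x, 2))
```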

Econometrics arXiv cross-link from cs.HC (cs.HC), submitted: 2024-05-02

Designing Algorithmic Recommendations to Achieve Human-AI Complementarity

Authors: Bryce McLaughlin, Jann Spiess

Algorithms frequently assist, rather than replace, human decision-makers.
However, the design and analysis of algorithms often focus on predicting
outcomes and do not explicitly model their effect on human decisions. This
discrepancy between the design and role of algorithmic assistants becomes
particularly concerning in light of empirical evidence that suggests that
algorithmic assistants repeatedly fail to improve human decisions. In this
article, we formalize the design of recommendation algorithms that assist human
decision-makers without making restrictive ex-ante assumptions about how
recommendations affect decisions. We formulate an algorithmic-design problem
that leverages the potential-outcomes framework from causal inference to model
the effect of recommendations on a human decision-maker's binary treatment
choice. Within this model, we introduce a monotonicity assumption that leads to
an intuitive classification of human responses to the algorithm. Under this
assumption, we can express the human's response to algorithmic recommendations
in terms of their compliance with the algorithm and the active decision they
would take if the algorithm sends no recommendation. We showcase the utility of
our framework using an online experiment that simulates a hiring task. We argue
that our approach can make sense of the relative performance of different
recommendation algorithms in the experiment and can help design solutions that
realize human-AI complementarity. Finally, we leverage our approach to derive
minimax optimal recommendation algorithms that can be implemented with machine
learning using limited training data.

arXiv link: http://arxiv.org/abs/2405.01484v2

Econometrics arXiv updated paper (originally submitted: 2024-05-02)

Dynamic Local Average Treatment Effects

Authors: Ravi B. Sojitra, Vasilis Syrgkanis

We consider Dynamic Treatment Regimes (DTRs) with One Sided Noncompliance
that arise in applications such as digital recommendations and adaptive medical
trials. These are settings where decision makers encourage individuals to take
treatments over time, but adapt encouragements based on previous
encouragements, treatments, states, and outcomes. Importantly, individuals may
not comply with encouragements based on unobserved confounders. For settings
with binary treatments and encouragements, we provide nonparametric
identification, estimation, and inference for Dynamic Local Average Treatment
Effects (LATEs), which are expected values of multiple time period treatment
effect contrasts for the respective complier subpopulations. Under One Sided
Noncompliance and sequential extensions of the assumptions in Imbens and
Angrist (1994), we show that one can identify Dynamic LATEs that correspond to
treating at single time steps. In Staggered Adoption settings, we show that the
assumptions are sufficient to identify Dynamic LATEs for treating in multiple
time periods. Moreover, this result extends to any setting where the effect of
a treatment in one period is uncorrelated with the compliance event in a
subsequent period.

arXiv link: http://arxiv.org/abs/2405.01463v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-05-02

Demystifying Inference after Adaptive Experiments

Authors: Aurélien Bibaut, Nathan Kallus

Adaptive experiments such as multi-arm bandits adapt the treatment-allocation
policy and/or the decision to stop the experiment to the data observed so far.
This has the potential to improve outcomes for study participants within the
experiment, to improve the chance of identifying best treatments after the
experiment, and to avoid wasting data. Seen as an experiment (rather than just
a continually optimizing system) it is still desirable to draw statistical
inferences with frequentist guarantees. The concentration inequalities and
union bounds that generally underlie adaptive experimentation algorithms can
yield overly conservative inferences, but at the same time the asymptotic
normality we would usually appeal to in non-adaptive settings can be imperiled
by adaptivity. In this article we aim to explain why, how, and when adaptivity
is in fact an issue for inference and, when it is, understand the various ways
to fix it: reweighting to stabilize variances and recover asymptotic normality,
always-valid inference based on joint normality of an asymptotic limiting
sequence, and characterizing and inverting the non-normal distributions induced
by adaptivity.

arXiv link: http://arxiv.org/abs/2405.01281v1

Econometrics arXiv updated paper (originally submitted: 2024-05-02)

Asymptotic Properties of the Distributional Synthetic Controls

Authors: Lu Zhang, Xiaomeng Zhang, Xinyu Zhang

As an alternative to synthetic control, the distributional Synthetic Control
(DSC) method proposed by Gunsilius (2023) provides estimates of quantile treatment
effects, enabling researchers to comprehensively understand the impact
of interventions in causal inference. However, the asymptotic properties of DSC have
not yet been established. In this paper, we first establish the DSC estimator's
asymptotic optimality, in the sense that the treatment effect estimator given
by DSC achieves the lowest possible squared prediction error among all
potential estimators from averaging quantiles of control units. We then
establish the convergence rate of the DSC weights. A significant aspect of our
research is that we find the DSC synthesis forms an optimal weighted average,
particularly in situations where it is impractical to perfectly fit the treated
unit's quantiles through the weighted average of the control units' quantiles.
Simulation results verify our theoretical insights.

arXiv link: http://arxiv.org/abs/2405.00953v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2024-05-01

De-Biasing Models of Biased Decisions: A Comparison of Methods Using Mortgage Application Data

Authors: Nicholas Tenev

Prediction models can improve efficiency by automating decisions such as the
approval of loan applications. However, they may inherit bias against protected
groups from the data they are trained on. This paper adds counterfactual
(simulated) ethnic bias to real data on mortgage application decisions, and
shows that this bias is replicated by a machine learning model (XGBoost) even
when ethnicity is not used as a predictive variable. Next, several other
de-biasing methods are compared: averaging over prohibited variables, taking
the most favorable prediction over prohibited variables (a novel method), and
jointly minimizing errors as well as the association between predictions and
prohibited variables. De-biasing can recover some of the original decisions,
but the results are sensitive to whether the bias is effected through a proxy.

arXiv link: http://arxiv.org/abs/2405.00910v1
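
The sketch below illustrates one of the de-biasing ideas compared above, averaging predictions over the prohibited attribute at scoring time. The simulated data, the single income feature, and the use of scikit-learn's gradient boosting in place of XGBoost are all stand-ins, not the paper's setup.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 5000
income = rng.normal(50, 15, n)
group = rng.integers(0, 2, n)                        # prohibited attribute
# Biased historical approvals: depend on income and (unfairly) on group
approve = (income + 5 * group + rng.normal(0, 10, n)) > 52

X = np.column_stack([income, group])
model = GradientBoostingClassifier().fit(X, approve)

def debiased_score(income_values):
    """Average predicted approval probability over both values of the prohibited attribute."""
    scores = [model.predict_proba(
                  np.column_stack([income_values, np.full_like(income_values, g)]))[:, 1]
              for g in (0, 1)]
    return np.mean(scores, axis=0)

print(debiased_score(np.array([45.0, 55.0, 65.0])))
```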

Econometrics arXiv updated paper (originally submitted: 2024-05-01)

Optimal Bias-Correction and Valid Inference in High-Dimensional Ridge Regression: A Closed-Form Solution

Authors: Zhaoxing Gao, Ruey S. Tsay

Ridge regression is an indispensable tool in big data analysis. Yet its
inherent bias poses a significant and longstanding challenge, compromising both
statistical efficiency and scalability across various applications. To tackle
this critical issue, we introduce an iterative strategy to correct bias
effectively when the dimension $p$ is less than the sample size $n$. For $p>n$,
our method optimally mitigates the bias such that any remaining bias in the
proposed de-biased estimator is unattainable through linear transformations of
the response data. To address the remaining bias when $p>n$, we employ a
Ridge-Screening (RS) method, producing a reduced model suitable for bias
correction. Crucially, under certain conditions, the true model is nested
within our selected one, highlighting RS as a novel variable selection
approach. Through rigorous analysis, we establish the asymptotic properties and
valid inferences of our de-biased ridge estimators for both $p<n$ and $p>n$,
where both $p$ and $n$ may increase towards infinity, along with the number of
iterations. We further validate these results using simulated and real-world
data examples. Our method offers a transformative solution to the bias
challenge in ridge regression inferences across various disciplines.

arXiv link: http://arxiv.org/abs/2405.00424v2
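
For intuition, the snippet below runs a textbook-style fixed-point iteration based on the identity beta_ols = beta_ridge + lambda (X'X + lambda I)^{-1} beta_ols, which removes the ridge bias when p < n. This is a generic illustration on simulated data and is not necessarily the authors' exact correction scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 500, 20, 5.0
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
y = X @ beta + rng.normal(size=n)

A_inv = np.linalg.inv(X.T @ X + lam * np.eye(p))
beta_ridge = A_inv @ X.T @ y

# Fixed-point iteration: b <- beta_ridge + lam * (X'X + lam I)^{-1} b converges to OLS
b = beta_ridge.copy()
for _ in range(100):
    b = beta_ridge + lam * (A_inv @ b)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print("max |iterated - OLS|:", np.max(np.abs(b - beta_ols)))
```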

Econometrics arXiv updated paper (originally submitted: 2024-04-30)

Estimating Heterogeneous Treatment Effects with Item-Level Outcome Data: Insights from Item Response Theory

Authors: Joshua B. Gilbert, Zachary Himmelsbach, James Soland, Mridul Joshi, Benjamin W. Domingue

Analyses of heterogeneous treatment effects (HTE) are common in applied
causal inference research. However, when outcomes are latent variables assessed
via psychometric instruments such as educational tests, standard methods ignore
the potential HTE that may exist among the individual items of the outcome
measure. Failing to account for "item-level" HTE (IL-HTE) can lead to both
underestimated standard errors and identification challenges in the estimation
of treatment-by-covariate interaction effects. We demonstrate how Item Response
Theory (IRT) models that estimate a treatment effect for each assessment item
can both address these challenges and provide new insights into HTE generally.
This study articulates the theoretical rationale for the IL-HTE model and
demonstrates its practical value using 75 datasets from 48 randomized
controlled trials containing 5.8 million item responses in economics,
education, and health research. Our results show that the IL-HTE model reveals
item-level variation masked by single-number scores, provides more meaningful
standard errors in many settings, allows for estimates of the generalizability
of causal effects to untested items, resolves identification problems in the
estimation of interaction effects, and provides estimates of standardized
treatment effect sizes corrected for attenuation due to measurement error.

arXiv link: http://arxiv.org/abs/2405.00161v4

Econometrics arXiv updated paper (originally submitted: 2024-04-30)

Identification by non-Gaussianity in structural threshold and smooth transition vector autoregressive models

Authors: Savi Virolainen

We show that structural smooth transition vector autoregressive models are
statistically identified if the shocks are mutually independent and at most one
of them is Gaussian. This extends a known identification result for linear
structural vector autoregressions to a time-varying impact matrix. We also
propose an estimation method, show how a blended identification strategy can be
adopted to address weak identification, and establish a sufficient condition
for ergodic stationarity. The introduced methods are implemented in the
accompanying R package sstvars. Our empirical application finds that a positive
climate policy uncertainty shock reduces production and raises inflation under
both low and high economic policy uncertainty, but its effects, particularly on
inflation, are stronger during the latter.

arXiv link: http://arxiv.org/abs/2404.19707v5

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2024-04-30

Percentage Coefficient (bp) -- Effect Size Analysis (Theory Paper 1)

Authors: Xinshu Zhao, Dianshi Moses Li, Ze Zack Lai, Piper Liping Liu, Song Harris Ao, Fei You

Percentage coefficient (bp) has emerged in recent publications as an
additional and alternative estimator of effect size for regression analysis.
This paper retraces the theory behind the estimator. It is posited that an
estimator must first serve the fundamental function of enabling researchers and
readers to comprehend an estimand, the target of estimation. It may then serve
the instrumental function of enabling researchers and readers to compare two or
more estimands. Defined as the regression coefficient when dependent variable
(DV) and independent variable (IV) are both on conceptual 0-1 percentage
scales, percentage coefficients (bp) feature 1) a clearly comprehensible
interpretation and 2) equitable scales for comparison. The coefficient (bp)
serves the two functions effectively and efficiently. It thus serves needs
unserved by other indicators, such as raw coefficient (bw) and standardized
beta.
Another premise of the functionalist theory is that "effect" is not a
monolithic concept. Rather, it is a collection of concepts, each of which
measures a component of the conglomerate called "effect", thereby serving a
subfunction. Regression coefficient (b), for example, indicates the unit change
in DV associated with a one-unit increase in IV, thereby measuring one aspect
called unit effect, aka efficiency. Percentage coefficient (bp) indicates the
percentage change in DV associated with a whole scale increase in IV. It is not
meant to be an all-encompassing indicator of an all-encompassing concept, but
rather a comprehensible and comparable indicator of efficiency, a key aspect of
effect.

arXiv link: http://arxiv.org/abs/2404.19495v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-04-29

Orthogonal Bootstrap: Efficient Simulation of Input Uncertainty

Authors: Kaizhao Liu, Jose Blanchet, Lexing Ying, Yiping Lu

Bootstrap is a popular methodology for simulating input uncertainty. However,
it can be computationally expensive when the number of samples is large. We
propose a new approach called Orthogonal Bootstrap that reduces the
number of required Monte Carlo replications. We decompose the target being
simulated into two parts: the non-orthogonal part, which has a
closed-form result known as the Infinitesimal Jackknife, and the orthogonal
part, which is easier to simulate. We theoretically and numerically show
that Orthogonal Bootstrap significantly reduces the computational cost of
Bootstrap while improving empirical accuracy and maintaining the same width of
the constructed interval.

arXiv link: http://arxiv.org/abs/2404.19145v2

Econometrics arXiv paper, submitted: 2024-04-29

A Locally Robust Semiparametric Approach to Examiner IV Designs

Authors: Lonjezo Sithole

I propose a locally robust semiparametric framework for estimating causal
effects using the popular examiner IV design, in the presence of many examiners
and possibly many covariates relative to the sample size. The key ingredient of
this approach is an orthogonal moment function that is robust to biases and
local misspecification from the first step estimation of the examiner IV. I
derive the orthogonal moment function and show that it delivers multiple
robustness where the outcome model or at least one of the first step components
is misspecified but the estimating equation remains valid. The proposed
framework not only allows for estimation of the examiner IV in the presence of
many examiners and many covariates relative to sample size, using a wide range
of nonparametric and machine learning techniques including LASSO, Dantzig,
neural networks and random forests, but also delivers root-n consistent
estimation of the parameter of interest under mild assumptions.

arXiv link: http://arxiv.org/abs/2404.19144v1

Econometrics arXiv paper, submitted: 2024-04-28

Optimal Treatment Allocation under Constraints

Authors: Torben S. D. Johansen

In optimal policy problems where treatment effects vary at the individual
level, optimally allocating treatments to recipients is complex even when
potential outcomes are known. We present an algorithm for multi-arm treatment
allocation problems that is guaranteed to find the optimal allocation in
strongly polynomial time, and which is able to handle arbitrary potential
outcomes as well as constraints on treatment requirement and capacity. Further,
starting from an arbitrary allocation, we show how to optimally re-allocate
treatments in a Pareto-improving manner. To showcase our results, we use data
from Danish nurse home visiting for infants. We estimate nurse specific
treatment effects for children born 1959-1967 in Copenhagen, comparing nurses
against each other. We exploit random assignment of newborn children to nurses
within a district to obtain causal estimates of nurse-specific treatment
effects using causal machine learning. Using these estimates, and treating the
Danish nurse home visiting program as a case of an optimal treatment allocation
problem (where a treatment is a nurse), we document room for significant
productivity improvements by optimally re-allocating nurses to children. Our
estimates suggest that optimal allocation of nurses to children could have
improved average yearly earnings by USD 1,815 and length of education by around
two months.

arXiv link: http://arxiv.org/abs/2404.18268v1
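
As a toy illustration of allocation under capacity constraints with known potential outcomes, the snippet below casts the problem as a min-cost assignment and solves it with scipy, modeling capacities by duplicating treatment slots. This is an illustrative device with simulated capacities and outcomes, not the authors' algorithm.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
n_units, n_treatments = 8, 3
capacity = [3, 3, 2]                                  # slots available per treatment
outcomes = rng.normal(size=(n_units, n_treatments))   # (assumed known) potential outcomes

# Expand each treatment into `capacity` columns, then maximize total outcome
cols = np.repeat(np.arange(n_treatments), capacity)
cost = -outcomes[:, cols]                             # negate: assignment minimizes cost
rows, assigned_cols = linear_sum_assignment(cost)
allocation = cols[assigned_cols]

print("treatment per unit:", allocation)
print("total outcome:", outcomes[rows, allocation].sum())
```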

Econometrics arXiv paper, submitted: 2024-04-28

Testing for Asymmetric Information in Insurance with Deep Learning

Authors: Serguei Maliar, Bernard Salanie

The positive correlation test for asymmetric information developed by
Chiappori and Salanie (2000) has been applied in many insurance markets. Most
of the literature focuses on the special case of constant correlation; it also
relies on restrictive parametric specifications for the choice of coverage and
the occurrence of claims. We relax these restrictions by estimating conditional
covariances and correlations using deep learning methods. We test the positive
correlation property by using the intersection test of Chernozhukov, Lee, and
Rosen (2013) and the "sorted groups" test of Chernozhukov, Demirer, Duflo, and
Fernandez-Val (2023). Our results confirm earlier findings that the correlation
between risk and coverage is small. Random forests and gradient boosting trees
produce similar results to neural networks.

arXiv link: http://arxiv.org/abs/2404.18207v1

Econometrics arXiv paper, submitted: 2024-04-27

Sequential monitoring for explosive volatility regimes

Authors: Lajos Horvath, Lorenzo Trapani, Shixuan Wang

In this paper, we develop two families of sequential monitoring procedures for
the timely detection of changes in a GARCH(1,1) model. Whilst our methodologies can be
applied to the general analysis of changepoints in GARCH(1,1) sequences, they
are in particular designed to detect changes from stationarity to explosivity
or vice versa, thus allowing one to check for volatility bubbles. Our statistics
can be applied irrespective of whether the historical sample is stationary or
not, and indeed without prior knowledge of the regime of the observations
before and after the break. In particular, we construct our detectors as the
CUSUM process of the quasi-Fisher scores of the log likelihood function. In
order to ensure timely detection, we then construct our boundary function
(exceeding which would indicate a break) by including a weighting sequence
which is designed to shorten the detection delay in the presence of a
changepoint. We consider two types of weights: a lighter set of weights, which
ensures timely detection in the presence of changes occurring early, but not
too early after the end of the historical sample; and a heavier set of weights,
called Rényi weights, which is designed to ensure timely detection in the
presence of changepoints occurring very early in the monitoring horizon. In
both cases, we derive the limiting distribution of the detection delays,
indicating the expected delay for each set of weights. Our theoretical results
are validated via a comprehensive set of simulations, and an empirical
application to daily returns of individual stocks.

arXiv link: http://arxiv.org/abs/2404.17885v1

Econometrics arXiv updated paper (originally submitted: 2024-04-26)

A Nonresponse Bias Correction using Nonrandom Followup with an Application to the Gender Entrepreneurship Gap

Authors: Clint Harris, Jon Eckhardt, Brent Goldfarb

We develop a nonresponse correction applicable to any setting in which
multiple attempts to contact subjects affect whether researchers observe
variables without affecting the variables themselves. Our procedure produces
point estimates of population averages using selected samples without requiring
randomized incentives or assuming selection bias cancels out for any
within-respondent comparisons. Applying our correction to a 16% response rate
survey of University of Wisconsin-Madison undergraduates, we estimate a 15
percentage point male-female entrepreneurial intention gap. Our estimates
attribute the 20 percentage point uncorrected within-respondent gap to positive
bias for men and negative bias for women, highlighting the value of
within-group nonresponse corrections.

arXiv link: http://arxiv.org/abs/2404.17693v2

Econometrics arXiv paper, submitted: 2024-04-25

Overidentification in Shift-Share Designs

Authors: Jinyong Hahn, Guido Kuersteiner, Andres Santos, David Willigrod

This paper studies the testability of identifying restrictions commonly
employed to assign a causal interpretation to two stage least squares (TSLS)
estimators based on Bartik instruments. For homogeneous effects models applied
to short panels, our analysis yields testable implications previously noted in
the literature for the two major available identification strategies. We
propose overidentification tests for these restrictions that remain valid in
high dimensional regimes and are robust to heteroskedasticity and clustering.
We further show that homogeneous effect models in short panels, and their
corresponding overidentification tests, are of central importance by
establishing that: (i) In heterogeneous effects models, interpreting TSLS as a
positively weighted average of treatment effects can impose implausible
assumptions on the distribution of the data; and (ii) Alternative identifying
strategies relying on long panels can prove uninformative in short panel
applications. We highlight the empirical relevance of our results by examining
the viability of Bartik instruments for identifying the effect of rising
Chinese import competition on US local labor markets.

arXiv link: http://arxiv.org/abs/2404.17049v1

Econometrics arXiv updated paper (originally submitted: 2024-04-25)

A joint test of unconfoundedness and common trends

Authors: Martin Huber, Eva-Maria Oeß

This paper introduces an overidentification test of two alternative
assumptions to identify the average treatment effect on the treated in a
two-period panel data setting: unconfoundedness and common trends. Under the
unconfoundedness assumption, treatment assignment and post-treatment outcomes
are independent, conditional on control variables and pre-treatment outcomes,
which motivates including pre-treatment outcomes in the set of controls.
Conversely, under the common trends assumption, the trend and the treatment
assignment are independent, conditional on control variables. This motivates
employing a Difference-in-Differences (DiD) approach by comparing the
differences between pre- and post-treatment outcomes of the treatment and
control group. Given the non-nested nature of these assumptions and their often
ambiguous plausibility in empirical settings, we propose a joint test using a
doubly robust statistic that can be combined with machine learning to control
for observed confounders in a data-driven manner. We discuss various causal
models that imply the satisfaction of either common trends, unconfoundedness,
or both assumptions jointly, and we investigate the finite sample properties of
our test through a simulation study. Additionally, we apply the proposed method
to five empirical examples using publicly available datasets and find the test
to reject the null hypothesis in two cases.

arXiv link: http://arxiv.org/abs/2404.16961v3

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2024-04-23

Correlations versus noise in the NFT market

Authors: Marcin Wątorek, Paweł Szydło, Jarosław Kwapień, Stanisław Drożdż

The non-fungible token (NFT) market emerges as a recent trading innovation
leveraging blockchain technology, mirroring the dynamics of the cryptocurrency
market. The current study is based on the capitalization changes and
transaction volumes across a large number of token collections on the Ethereum
platform. In order to deepen the understanding of the market dynamics, the
collection-collection dependencies are examined by using the multivariate
formalism of detrended correlation coefficient and correlation matrix. It
appears that correlation strength is lower here than that observed in
previously studied markets. Consequently, the eigenvalue spectra of the
correlation matrix more closely follow the Marchenko-Pastur distribution,
still, some departures indicating the existence of correlations remain. The
comparison of results obtained from the correlation matrix built from the
Pearson coefficients and, independently, from the detrended cross-correlation
coefficients suggests that the global correlations in the NFT market arise from
higher frequency fluctuations. Corresponding minimal spanning trees (MSTs) for
capitalization variability exhibit a scale-free character while, for the number
of transactions, they are somewhat more decentralized.

arXiv link: http://arxiv.org/abs/2404.15495v2

Econometrics arXiv updated paper (originally submitted: 2024-04-22)

Quantifying the Internal Validity of Weighted Estimands

Authors: Alexandre Poirier, Tymon Słoczyński

In this paper we study a class of weighted estimands, which we define as
parameters that can be expressed as weighted averages of the underlying
heterogeneous treatment effects. The popular ordinary least squares (OLS),
two-stage least squares (2SLS), and two-way fixed effects (TWFE) estimands are
all special cases within our framework. Our focus is on answering two questions
concerning weighted estimands. First, under what conditions can they be
interpreted as the average treatment effect for some (possibly latent)
subpopulation? Second, when these conditions are satisfied, what is the upper
bound on the size of that subpopulation, either in absolute terms or relative
to a target population of interest? We argue that this upper bound provides a
valuable diagnostic for empirical research. When a given weighted estimand
corresponds to the average treatment effect for a small subset of the
population of interest, we say its internal validity is low. Our paper develops
practical tools to quantify the internal validity of weighted estimands. We
also apply these tools to revisit a prominent study of the effects of
unilateral divorce laws on female suicide.

arXiv link: http://arxiv.org/abs/2404.14603v4

Econometrics arXiv updated paper (originally submitted: 2024-04-22)

Stochastic Volatility in Mean: Efficient Analysis by a Generalized Mixture Sampler

Authors: Daichi Hiraki, Siddhartha Chib, Yasuhiro Omori

In this paper we consider the simulation-based Bayesian analysis of
stochastic volatility in mean (SVM) models. Extending the highly efficient
Markov chain Monte Carlo mixture sampler for the SV model proposed in Kim et
al. (1998) and Omori et al. (2007), we develop an accurate approximation of the
non-central chi-squared distribution as a mixture of thirty normal
distributions. Under this mixture representation, we sample the parameters and
latent volatilities in one block. We also detail a correction of the small
approximation error by using additional Metropolis-Hastings steps. The proposed
method is extended to the SVM model with leverage. The methodology and models
are applied to excess holding yields and S&P500 returns in empirical studies,
and the SVM models are shown to outperform other volatility models based on
marginal likelihoods.

arXiv link: http://arxiv.org/abs/2404.13986v2

Econometrics arXiv paper, submitted: 2024-04-21

Identification and Estimation of Nonseparable Triangular Equations with Mismeasured Instruments

Authors: Shaomin Wu

In this paper, I study the nonparametric identification and estimation of the
marginal effect of an endogenous variable $X$ on the outcome variable $Y$,
given a potentially mismeasured instrument variable $W^*$, without assuming
linearity or separability of the functions governing the relationship between
observables and unobservables. To address the challenges arising from the
co-existence of measurement error and nonseparability, I first employ the
deconvolution technique from the measurement error literature to identify the
joint distribution of $Y, X, W^*$ using two error-laden measurements of $W^*$.
I then recover the structural derivative of the function of interest and the
"Local Average Response" (LAR) from the joint distribution via the "unobserved
instrument" approach in Matzkin (2016). I also propose nonparametric estimators
for these parameters and derive their uniform rates of convergence. Monte Carlo
exercises show evidence that the estimators I propose have good finite sample
performance.

arXiv link: http://arxiv.org/abs/2404.13735v1

Econometrics arXiv updated paper (originally submitted: 2024-04-20)

How do applied researchers use the Causal Forest? A methodological review of a method

Authors: Patrick Rehill

This methodological review examines the use of the causal forest method by
applied researchers across 133 peer-reviewed papers. It shows that the emerging
best practice relies heavily on the approach and tools created by the original
authors of the causal forest such as their grf package and the approaches given
by them in examples. Generally researchers use the causal forest on a
relatively low-dimensional dataset relying on observed controls or in some
cases experiments to identify effects. There are several common ways to then
communicate results -- by mapping out the univariate distribution of
individual-level treatment effect estimates, displaying variable importance
results for the forest and graphing the distribution of treatment effects
across covariates that are important either for theoretical reasons or because
they have high variable importance. Some deviations from this common practice
are interesting and deserve further development and use. Others are unnecessary
or even harmful. The paper concludes by reflecting on the emerging best
practice for causal forest use and paths for future research.

arXiv link: http://arxiv.org/abs/2404.13356v2

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2024-04-19

An economically-consistent discrete choice model with flexible utility specification based on artificial neural networks

Authors: Jose Ignacio Hernandez, Niek Mouter, Sander van Cranenburgh

Random utility maximisation (RUM) models are one of the cornerstones of
discrete choice modelling. However, specifying the utility function of RUM
models is not straightforward and has a considerable impact on the resulting
interpretable outcomes and welfare measures. In this paper, we propose a new
discrete choice model based on artificial neural networks (ANNs) named
"Alternative-Specific and Shared weights Neural Network (ASS-NN)", which
provides a further balance between flexible utility approximation from the data
and consistency with two assumptions: RUM theory and fungibility of money
(i.e., "one euro is one euro"). Therefore, the ASS-NN can derive
economically-consistent outcomes, such as marginal utilities or willingness to
pay, without explicitly specifying the utility functional form. Using a Monte
Carlo experiment and empirical data from the Swissmetro dataset, we show that
ASS-NN outperforms (in terms of goodness of fit) conventional multinomial logit
(MNL) models under different utility specifications. Furthermore, we show how
the ASS-NN is used to derive marginal utilities and willingness to pay
measures.

arXiv link: http://arxiv.org/abs/2404.13198v1

Econometrics arXiv updated paper (originally submitted: 2024-04-19)

On the Asymmetric Volatility Connectedness

Authors: Abdulnasser Hatemi-J

Connectedness measures the degree to which a time-series variable spills
volatility over to other variables relative to the rate at which it receives it. The
idea is based on the percentage of variance decomposition from one variable to
the others, which is estimated by making use of a VAR model. Diebold and Yilmaz
(2012, 2014) suggested estimating this simple and useful measure of percentage
risk spillover impact. Their method is symmetric by nature, however. The
current paper offers an alternative asymmetric approach for measuring the
volatility spillover direction, which is based on estimating the asymmetric
variance decompositions introduced by Hatemi-J (2011, 2014). This approach
accounts explicitly for the asymmetric property in the estimations, which
accords better with reality. An application is provided to capture the
potential asymmetric volatility spillover impacts between the three largest
financial markets in the world.

arXiv link: http://arxiv.org/abs/2404.12997v2

Econometrics arXiv updated paper (originally submitted: 2024-04-19)

The modified conditional sum-of-squares estimator for fractionally integrated models

Authors: Mustafa R. Kılınç, Michael Massmann

In this paper, we analyse the influence of estimating a constant term on the
bias of the conditional sum-of-squares (CSS) estimator in a stationary or
non-stationary type-II ARFIMA ($p_1$,$d$,$p_2$) model. We derive expressions
for the estimator's bias and show that the leading term can be easily removed
by a simple modification of the CSS objective function. We call this new
estimator the modified conditional sum-of-squares (MCSS) estimator. We show
theoretically and by means of Monte Carlo simulations that its performance
relative to that of the CSS estimator is markedly improved even for small
sample sizes. Finally, we revisit three classical short datasets that have in
the past been described by ARFIMA($p_1$,$d$,$p_2$) models with constant term,
namely the post-second World War real GNP data, the extended Nelson-Plosser
data, and the Nile data.
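
For orientation, here is a minimal sketch of plain CSS estimation of (d, mu) in a type-II ARFIMA(0,d,0) model on simulated data; the paper's bias expressions and the exact MCSS modification of the objective function are not reproduced.

```python
# CSS estimation of (d, mu) in a type-II ARFIMA(0,d,0): fractionally difference
# the demeaned series and minimize the sum of squared residuals.
import numpy as np
from scipy.optimize import minimize

def frac_diff_weights(d, n):
    """Coefficients of (1-L)^d: pi_0 = 1, pi_j = pi_{j-1} * (j - 1 - d) / j."""
    w = np.ones(n)
    for j in range(1, n):
        w[j] = w[j - 1] * (j - 1 - d) / j
    return w

def css_objective(theta, y):
    d, mu = theta
    z = y - mu
    w = frac_diff_weights(d, len(z))
    # e_t = sum_{j=0}^{t} pi_j * z_{t-j}, with pre-sample values set to zero (type II)
    e = np.array([w[:t + 1] @ z[t::-1] for t in range(len(z))])
    return np.sum(e ** 2)

# simulate y_t = mu + (1-L)^{-d} eps_t using the weights of (1-L)^{-d}
rng = np.random.default_rng(2)
n, d_true, mu_true = 500, 0.3, 1.0
eps = rng.normal(size=n)
psi = frac_diff_weights(-d_true, n)
y = mu_true + np.array([psi[:t + 1] @ eps[t::-1] for t in range(n)])

res = minimize(css_objective, x0=[0.1, 0.0], args=(y,),
               bounds=[(-0.49, 0.49), (None, None)])
print("CSS estimates of (d, mu):", np.round(res.x, 3))
```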

arXiv link: http://arxiv.org/abs/2404.12882v2

Econometrics arXiv paper, submitted: 2024-04-19

Two-step Estimation of Network Formation Models with Unobserved Heterogeneities and Strategic Interactions

Authors: Shaomin Wu

In this paper, I characterize the network formation process as a static game
of incomplete information, where the latent payoff of forming a link between
two individuals depends on the structure of the network, as well as private
information on agents' attributes. I allow agents' private unobserved
attributes to be correlated with observed attributes through individual fixed
effects. Using data from a single large network, I propose a two-step estimator
for the model primitives. In the first step, I estimate agents' equilibrium
beliefs of other people's choice probabilities. In the second step, I plug in
the first-step estimator to the conditional choice probability expression and
estimate the model parameters and the unobserved individual fixed effects
together using Joint MLE. Assuming that the observed attributes are discrete, I
show that the first-step estimator is uniformly consistent with rate
$N^{-1/4}$, where $N$ is the total number of linking proposals. I also show
that the second-step estimator converges asymptotically to a normal
distribution at the same rate.

arXiv link: http://arxiv.org/abs/2404.12581v1

Econometrics arXiv updated paper (originally submitted: 2024-04-18)

Axiomatic modeling of fixed proportion technologies

Authors: Xun Zhou, Timo Kuosmanen

Understanding input substitution and output transformation possibilities is
critical for efficient resource allocation and firm strategy. There are
important examples of fixed proportion technologies where certain inputs are
non-substitutable and/or certain outputs are non-transformable. However, there
is widespread confusion about the appropriate modeling of fixed proportion
technologies in data envelopment analysis. We point out and rectify several
misconceptions in the existing literature, and show how fixed proportion
technologies can be correctly incorporated into the axiomatic framework. A
Monte Carlo study is performed to demonstrate the proposed solution.

arXiv link: http://arxiv.org/abs/2404.12462v2

Econometrics arXiv paper, submitted: 2024-04-18

(Empirical) Bayes Approaches to Parallel Trends

Authors: Soonwoo Kwon, Jonathan Roth

We consider Bayes and Empirical Bayes (EB) approaches for dealing with
violations of parallel trends. In the Bayes approach, the researcher specifies
a prior over both the pre-treatment violations of parallel trends
$\delta_{pre}$ and the post-treatment violations $\delta_{post}$. The
researcher then updates their posterior about the post-treatment bias
$\delta_{post}$ given an estimate of the pre-trends $\delta_{pre}$. This allows
them to form posterior means and credible sets for the treatment effect of
interest, $\tau_{post}$. In the EB approach, the prior on the violations of
parallel trends is learned from the pre-treatment observations. We illustrate
these approaches in two empirical applications.
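
A minimal sketch of the Bayes step with a single pre- and post-treatment violation and a jointly normal prior; the prior covariance, estimates and standard errors below are illustrative, and the interval ignores correlations between the event-study coefficients that a full implementation would track.

```python
# One pre- and one post-period violation, jointly normal prior; posterior of
# delta_post given the estimated pre-trend, then a debiased effect estimate.
import numpy as np

sigma = np.array([[0.04, 0.03],          # prior cov of (delta_pre, delta_post), mean zero
                  [0.03, 0.04]])

delta_pre_hat = 0.15                     # estimated pre-trend coefficient
se_pre = 0.05                            # its standard error
beta_post_hat = 0.60                     # estimated post-period event-study coefficient
se_post = 0.10

# delta_pre_hat ~ N(delta_pre, se_pre^2)  =>  conjugate normal update for delta_post
k = sigma[1, 0] / (sigma[0, 0] + se_pre ** 2)
post_mean_delta = k * delta_pre_hat
post_var_delta = sigma[1, 1] - k * sigma[0, 1]

tau_hat = beta_post_hat - post_mean_delta                 # debiased treatment effect
tau_sd = np.sqrt(se_post ** 2 + post_var_delta)           # ignores estimation covariances
print(f"tau_hat = {tau_hat:.3f}, rough 95% interval = "
      f"[{tau_hat - 1.96 * tau_sd:.3f}, {tau_hat + 1.96 * tau_sd:.3f}]")
```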

arXiv link: http://arxiv.org/abs/2404.11839v1

Econometrics arXiv updated paper (originally submitted: 2024-04-17)

Regret Analysis in Threshold Policy Design

Authors: Federico Crippa

Threshold policies are decision rules that assign treatments based on whether
an observable characteristic exceeds a certain threshold. They are widespread
across multiple domains, including welfare programs, taxation, and clinical
medicine. This paper examines the problem of designing threshold policies using
experimental data, when the goal is to maximize the population welfare. First,
I characterize the regret - a measure of policy optimality - of the Empirical
Welfare Maximizer (EWM) policy, popular in the literature. Next, I introduce
the Smoothed Welfare Maximizer (SWM) policy, which improves the EWM's regret
convergence rate under an additional smoothness condition. The two policies are
compared by studying how differently their regrets depend on the population
distribution, and investigating their finite sample performances through Monte
Carlo simulations. In many contexts, the SWM policy guarantees larger welfare
than the EWM. An empirical illustration demonstrates how the treatment
recommendations of the two policies may differ in practice.
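
A minimal sketch contrasting the EWM criterion with a smoothed, SWM-style criterion for a one-dimensional threshold on a grid, using experimental data with a known propensity score; the smoothing kernel and bandwidth are illustrative choices, not the paper's.

```python
# EWM picks the threshold maximizing an IPW welfare estimate with a hard
# indicator; the smoothed criterion replaces the indicator with a logistic kernel.
import numpy as np

rng = np.random.default_rng(3)
n = 5000
x = rng.uniform(-1, 1, n)                        # eligibility score
d = rng.binomial(1, 0.5, n)                      # randomized treatment
y = x + 0.5 * x * d + rng.normal(size=n)         # treatment helps only when x > 0

e = 0.5                                          # known propensity in the experiment
score = y * d / e - y * (1 - d) / (1 - e)        # per-unit IPW welfare contrast

def ewm_welfare(c):
    return np.mean(score * (x > c))

def smoothed_welfare(c, h=0.1):                  # illustrative bandwidth
    return np.mean(score / (1.0 + np.exp(-(x - c) / h)))

grid = np.linspace(-0.9, 0.9, 181)
c_ewm = grid[np.argmax([ewm_welfare(c) for c in grid])]
c_swm = grid[np.argmax([smoothed_welfare(c) for c in grid])]
print(f"EWM threshold: {c_ewm:.2f}, smoothed threshold: {c_swm:.2f} (true optimum: 0)")
```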

arXiv link: http://arxiv.org/abs/2404.11767v2

Econometrics arXiv updated paper (originally submitted: 2024-04-17)

Testing Mechanisms

Authors: Soonwoo Kwon, Jonathan Roth

Economists are often interested in the mechanisms by which a treatment
affects an outcome. We develop tests for the "sharp null of full mediation"
that a treatment $D$ affects an outcome $Y$ only through a particular mechanism
(or set of mechanisms) $M$. Our approach exploits connections between mediation
analysis and the econometric literature on testing instrument validity. We also
provide tools for quantifying the magnitude of alternative mechanisms when the
sharp null is rejected: we derive sharp lower bounds on the fraction of
individuals whose outcome is affected by the treatment despite having the same
value of $M$ under both treatments (“always-takers”), as well as sharp bounds
on the average effect of the treatment for such always-takers. An advantage of
our approach relative to existing tools for mediation analysis is that it does
not require stringent assumptions about how $M$ is assigned. We illustrate our
methodology in two empirical applications.

arXiv link: http://arxiv.org/abs/2404.11739v2

Econometrics arXiv paper, submitted: 2024-04-17

Weighted-Average Least Squares for Negative Binomial Regression

Authors: Kevin Huynh

Model averaging methods have become an increasingly popular tool for
improving predictions and dealing with model uncertainty, especially in
Bayesian settings. Recently, frequentist model averaging methods such as
information theoretic and least squares model averaging have emerged. This work
focuses on the issue of covariate uncertainty where managing the computational
resources is key: The model space grows exponentially with the number of
covariates such that averaged models must often be approximated.
Weighted-average least squares (WALS), first introduced for (generalized)
linear models in the econometric literature, combines Bayesian and frequentist
aspects and additionally employs a semiorthogonal transformation of the
regressors to reduce the computational burden. This paper extends WALS for
generalized linear models to the negative binomial (NB) regression model for
overdispersed count data. A simulation experiment and an empirical application
using data on doctor visits were conducted to compare the predictive power of
WALS for NB regression to traditional estimators. The results show that WALS
for NB improves on the maximum likelihood estimator in sparse situations and is
competitive with lasso while being computationally more efficient.

arXiv link: http://arxiv.org/abs/2404.11324v1

Econometrics arXiv updated paper (originally submitted: 2024-04-17)

Bayesian Markov-Switching Vector Autoregressive Process

Authors: Battulga Gankhuu

This study introduces marginal density functions of the general Bayesian
Markov-Switching Vector Autoregressive (MS-VAR) process. In the case of the
Bayesian MS-VAR process, we provide closed-form density functions and
Monte-Carlo simulation algorithms, including the importance sampling method.
The Monte-Carlo simulation method departs from the previous simulation methods
because it removes the duplication in a regime vector.

arXiv link: http://arxiv.org/abs/2404.11235v3

Econometrics arXiv updated paper (originally submitted: 2024-04-17)

Forecasting with panel data: Estimation uncertainty versus parameter heterogeneity

Authors: M. Hashem Pesaran, Andreas Pick, Allan Timmermann

We provide a comprehensive examination of the predictive performance of panel
forecasting methods based on individual, pooling, fixed effects, and empirical
Bayes estimation, and propose optimal weights for forecast combination schemes.
We consider linear panel data models, allowing for weakly exogenous regressors
and correlated heterogeneity. We quantify the gains from exploiting panel data
and demonstrate how forecasting performance depends on the degree of parameter
heterogeneity, whether such heterogeneity is correlated with the regressors,
the goodness of fit of the model, and the dimensions of the data. Monte Carlo
simulations and empirical applications to house prices and CPI inflation show
that empirical Bayes and forecast combination methods perform best overall and
rarely produce the least accurate forecasts for individual series.

arXiv link: http://arxiv.org/abs/2404.11198v2

Econometrics arXiv paper, submitted: 2024-04-17

Estimation for conditional moment models based on martingale difference divergence

Authors: Kunyang Song, Feiyu Jiang, Ke Zhu

We provide a new estimation method for conditional moment models via the
martingale difference divergence (MDD). Our MDD-based estimation method is
formed in the framework of a continuum of unconditional moment restrictions.
Unlike the existing estimation methods in this framework, the MDD-based
estimation method adopts a non-integrable weighting function, which could grab
more information from unconditional moment restrictions than the integrable
weighting function to enhance the estimation efficiency. Due to the nature of
shift-invariance in MDD, our MDD-based estimation method can not identify the
intercept parameters. To overcome this identification issue, we further provide
a two-step estimation procedure for the model with intercept parameters. Under
regularity conditions, we establish the asymptotics of the proposed estimators,
which are not only easy-to-implement with analytic asymptotic variances, but
also applicable to time series data with an unspecified form of conditional
heteroskedasticity. Finally, we illustrate the usefulness of the proposed
estimators by simulations and two real examples.
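
As a building block, here is a sketch of the sample martingale difference divergence statistic in the spirit of Shao and Zhang (2014); the paper's continuum-of-moments criterion, non-integrable weighting function and two-step intercept procedure are not reproduced.

```python
# Sample MDD statistic: MDD(Y|X)^2 = 0 iff E[Y|X] = E[Y] almost surely.
import numpy as np

def mdd_squared(y, x):
    """V-statistic: -(1/n^2) * sum_{j,k} (y_j - ybar)(y_k - ybar) * ||x_j - x_k||."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float).reshape(len(y), -1)
    yc = y - y.mean()
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)   # pairwise distances
    return -np.mean(np.outer(yc, yc) * dist)

rng = np.random.default_rng(4)
x = rng.normal(size=1000)
print(mdd_squared(x ** 2 + 0.1 * rng.normal(size=1000), x))   # mean-dependent: clearly > 0
print(mdd_squared(rng.normal(size=1000), x))                   # independent: near 0
```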

arXiv link: http://arxiv.org/abs/2404.11092v1

Econometrics arXiv updated paper (originally submitted: 2024-04-17)

Partial Identification of Structural Vector Autoregressions with Non-Centred Stochastic Volatility

Authors: Helmut Lütkepohl, Fei Shang, Luis Uzeda, Tomasz Woźniak

We consider structural vector autoregressions that are identified through
stochastic volatility under Bayesian estimation. Three contributions emerge
from our exercise. First, we show that a non-centred parameterization of
stochastic volatility yields a marginal prior for the conditional variances of
structural shocks that is centred on homoskedasticity, with strong shrinkage
and heavy tails -- unlike the common centred parameterization. This feature
makes it well suited for assessing partial identification of any shock of
interest. Second, Monte Carlo experiments on small and large systems indicate
that the non-centred setup estimates structural parameters more precisely and
normalizes conditional variances efficiently. Third, revisiting prominent
fiscal structural vector autoregressions, we show how the non-centred approach
identifies tax shocks that are consistent with estimates reported in the
literature.

arXiv link: http://arxiv.org/abs/2404.11057v2

Econometrics arXiv updated paper (originally submitted: 2024-04-15)

From Predictive Algorithms to Automatic Generation of Anomalies

Authors: Sendhil Mullainathan, Ashesh Rambachan

How can we extract theoretical insights from machine learning algorithms? We
take a familiar lesson: researchers often turn their intuitions into
theoretical insights by constructing "anomalies" -- specific examples
highlighting hypothesized flaws in a theory, such as the Allais paradox and the
Kahneman-Tversky choice experiments for expected utility. We develop procedures
that replace researchers' intuitions with predictive algorithms: given a
predictive algorithm and a theory, our procedures automatically generate
anomalies for that theory. We illustrate our procedures with a concrete
application: generating anomalies for expected utility theory. Based on a
neural network that accurately predicts lottery choices, our procedures recover
known anomalies for expected utility theory and discover new ones absent from
existing work. In incentivized experiments, subjects violate expected utility
theory on these algorithmically generated anomalies at rates similar to the
Allais paradox and common ratio effect.

arXiv link: http://arxiv.org/abs/2404.10111v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-04-15

Overfitting Reduction in Convex Regression

Authors: Zhiqiang Liao, Sheng Dai, Eunji Lim, Timo Kuosmanen

Convex regression is a method for estimating a convex function from a data
set. This method has played an important role in operations research,
economics, machine learning, and many other areas. However, it has been
empirically observed that convex regression produces inconsistent estimates of
convex functions and extremely large subgradients near the boundary as the
sample size increases. In this paper, we provide theoretical evidence of this
overfitting behavior. To eliminate this behavior, we propose two new estimators
by placing a bound on the subgradients of the convex function. We further show
that our proposed estimators can reduce overfitting by proving that they
converge to the underlying true convex function and that their subgradients
converge to the gradient of the underlying function, both uniformly over the
domain with probability one as the sample size is increasing to infinity. An
application to Finnish electricity distribution firms confirms the superior
performance of the proposed methods in predictive power over the existing
methods.
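
A minimal sketch of the basic device, least-squares convex regression with a uniform bound on the subgradients, written as a quadratic program in cvxpy; the value of the bound and the data are illustrative, and the paper's specific estimators and asymptotic analysis are not reproduced.

```python
# Least-squares convex regression with a uniform bound L on subgradients,
# posed as a quadratic program (requires cvxpy).
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(5)
n = 40
x = np.sort(rng.uniform(-1, 1, n))
y = x ** 2 + 0.1 * rng.normal(size=n)              # convex truth plus noise

theta = cp.Variable(n)                             # fitted values
xi = cp.Variable(n)                                # subgradients at each x_i
L = 5.0                                            # illustrative bound on subgradients

constraints = [cp.abs(xi) <= L]
for i in range(n):
    for j in range(n):
        # supporting-hyperplane (convexity) constraints
        constraints.append(theta[j] >= theta[i] + xi[i] * (x[j] - x[i]))

prob = cp.Problem(cp.Minimize(cp.sum_squares(y - theta)), constraints)
prob.solve()
print("max |fitted subgradient|:", float(np.abs(xi.value).max()))
```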

arXiv link: http://arxiv.org/abs/2404.09528v2

Econometrics arXiv updated paper (originally submitted: 2024-04-15)

The Role of Carbon Pricing in Food Inflation: Evidence from Canadian Provinces

Authors: Jiansong Xu

In the search for political-economic tools for greenhouse gas mitigation,
carbon pricing, which includes carbon tax and cap-and-trade, is implemented by
many governments. However, the inflating food prices in carbon-pricing
countries, such as Canada, have led many to believe such policies harm food
affordability. This study aims to identify changes in food prices induced by
carbon pricing using the case of Canadian provinces. Using the staggered
difference-in-difference (DiD) approach, we find an overall deflationary effect
of carbon pricing on food prices (measured by monthly provincial food CPI). The
average reductions in food CPI compared to before carbon pricing are 2% and
4% within and beyond two years of implementation. We further find that the
deflationary effects are partially driven by lower consumption with no
significant change via farm input costs. Evidence in this paper suggests no
inflationary effect of carbon pricing in Canadian provinces, thus giving no
support to the growing voices against carbon pricing policies.

arXiv link: http://arxiv.org/abs/2404.09467v5

Econometrics arXiv updated paper (originally submitted: 2024-04-14)

Julia as a universal platform for statistical software development

Authors: David Roodman

The julia package integrates the Julia programming language into Stata. Users
can transfer data between Stata and Julia, issue Julia commands to analyze and
plot, and pass results back to Stata. Julia's econometric ecosystem is not as
mature as Stata's or R's or Python's. But Julia is an excellent environment for
developing high-performance numerical applications, which can then be called
from many platforms. For example, the boottest program for wild bootstrap-based
inference (Roodman et al. 2019) and fwildclusterboot for R (Fischer and Roodman
2021) can use the same Julia back end. And the program reghdfejl mimics reghdfe
(Correia 2016) in fitting linear models with high-dimensional fixed effects
while calling a Julia package for tenfold acceleration on hard problems.
reghdfejl also supports nonlinear fixed-effect models that cannot otherwise be
fit in Stata--though preliminarily, as the Julia package for that purpose is
immature.

arXiv link: http://arxiv.org/abs/2404.09309v4

Econometrics arXiv paper, submitted: 2024-04-14

Identifying Causal Effects under Kink Setting: Theory and Evidence

Authors: Yi Lu, Jianguo Wang, Huihua Xie

This paper develops a generalized framework for identifying causal impacts in
a reduced-form manner under kinked settings when agents can manipulate their
choices around the threshold. The causal estimation using a bunching framework
was initially developed by Diamond and Persson (2017) under notched settings.
Many empirical applications of bunching designs involve kinked settings. We
propose a model-free causal estimator in kinked settings with sharp bunching
and then extend it to scenarios with diffuse bunching, misreporting,
optimization frictions, and heterogeneity. The estimation method is mostly
non-parametric and accounts for the interior response under kinked settings.
Applying the proposed approach, we estimate how medical subsidies affect
outpatient behaviors in China.
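
For context, here is a sketch of the descriptive first step common to bunching designs, namely a counterfactual polynomial fitted to binned counts excluding a window around the kink and the implied excess mass; the paper's causal estimator for the interior response in kinked settings goes well beyond this.

```python
# Fit a counterfactual polynomial to binned counts, excluding a window around
# the kink, and compute the excess mass at the kink.
import numpy as np

rng = np.random.default_rng(6)
kink = 1.0
z = rng.lognormal(mean=0.0, sigma=0.6, size=50_000)        # latent choice variable
movers = (z > kink) & (z < 1.4) & (rng.uniform(size=z.size) < 0.3)
z = np.where(movers, kink, z)                              # some agents bunch at the kink

edges = np.arange(0.2, 3.0, 0.05)
counts, edges = np.histogram(z, bins=edges)
centers = 0.5 * (edges[:-1] + edges[1:])

window = np.abs(centers - kink) <= 0.075                   # excluded bunching window
coef = np.polyfit(centers[~window], counts[~window], deg=5)
counterfactual = np.polyval(coef, centers)

excess = counts[window].sum() - counterfactual[window].sum()
b = excess / counterfactual[window].mean()                  # normalised excess mass
print(f"excess mass: {excess:.0f} observations, normalised b = {b:.2f}")
```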

arXiv link: http://arxiv.org/abs/2404.09117v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-04-12

Multiply-Robust Causal Change Attribution

Authors: Victor Quintas-Martinez, Mohammad Taha Bahadori, Eduardo Santiago, Jeff Mu, Dominik Janzing, David Heckerman

Comparing two samples of data, we observe a change in the distribution of an
outcome variable. In the presence of multiple explanatory variables, how much
of the change can be explained by each possible cause? We develop a new
estimation strategy that, given a causal model, combines regression and
re-weighting methods to quantify the contribution of each causal mechanism. Our
proposed methodology is multiply robust, meaning that it still recovers the
target parameter under partial misspecification. We prove that our estimator is
consistent and asymptotically normal. Moreover, it can be incorporated into
existing frameworks for causal attribution, such as Shapley values, which will
inherit the consistency and large-sample distribution properties. Our method
demonstrates excellent performance in Monte Carlo simulations, and we show its
usefulness in an empirical application. Our method is implemented as part of
the Python library DoWhy (arXiv:2011.04216, arXiv:2206.06821).
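
A stripped-down sketch of the regression and re-weighting ingredients on simulated two-sample data: the change in E[Y] is split into a covariate-shift part and a mechanism (Y given X) part. The multiply robust combination, general causal graphs and Shapley attribution are not reproduced here; the abstract points to the DoWhy implementation for those.

```python
# Split the change in E[Y] between an "old" and a "new" sample into a P(X)
# part and a Y|X (mechanism) part, by regression and by reweighting.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(7)
n = 5000
x_old = rng.normal(0.0, 1.0, size=(n, 1))
x_new = rng.normal(0.5, 1.0, size=(n, 1))                  # covariate distribution shifts
y_old = 1.0 * x_old[:, 0] + rng.normal(size=n)
y_new = 1.5 * x_new[:, 0] + rng.normal(size=n)             # mechanism shifts too

total = y_new.mean() - y_old.mean()

# regression route: evaluate the old mechanism at the new covariates
m_old = LinearRegression().fit(x_old, y_old)
covariate_part = m_old.predict(x_new).mean() - y_old.mean()
mechanism_part = y_new.mean() - m_old.predict(x_new).mean()

# reweighting route for the covariate part: density-ratio weights on the old sample
x_all = np.vstack([x_old, x_new])
s = np.r_[np.zeros(n), np.ones(n)]                         # 1 = new sample
p_new = LogisticRegression().fit(x_all, s).predict_proba(x_old)[:, 1]
covariate_part_rw = np.average(y_old, weights=p_new / (1 - p_new)) - y_old.mean()

print(f"total change   : {total:.3f}")
print(f"P(X) shift     : {covariate_part:.3f} (regression), {covariate_part_rw:.3f} (reweighting)")
print(f"mechanism shift: {mechanism_part:.3f}")
```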

arXiv link: http://arxiv.org/abs/2404.08839v4

Econometrics arXiv cross-link from cs.CL (cs.CL), submitted: 2024-04-12

Measuring the Quality of Answers in Political Q&As with Large Language Models

Authors: R. Michael Alvarez, Jacob Morrier

This article proposes a new approach for assessing the quality of answers in
political question-and-answer sessions. We measure the quality of an answer
based on how easily and accurately it can be recognized in a random set of
candidate answers given the question's text. This measure reflects the answer's
relevance and depth of engagement with the question. Like semantic search, we
can implement this approach by training a language model on the corpus of
observed questions and answers without additional human-labeled data. We
showcase and validate our methodology within the context of the Question Period
in the Canadian House of Commons. Our analysis reveals that while some answers
have a weak semantic connection to questions, hinting at some evasion or
obfuscation, they are generally at least moderately relevant, far exceeding
what we would expect from random replies. We also find a meaningful correlation
between answer quality and the party affiliation of the members of Parliament
asking the questions.

arXiv link: http://arxiv.org/abs/2404.08816v5

Econometrics arXiv updated paper (originally submitted: 2024-04-12)

Estimation and Inference for Three-Dimensional Panel Data Models

Authors: Guohua Feng, Jiti Gao, Fei Liu, Bin Peng

Hierarchical panel data models have recently garnered significant attention.
This study contributes to the relevant literature by introducing a novel
three-dimensional (3D) hierarchical panel data model, which integrates panel
regression with three sets of latent factor structures: one set of global
factors and two sets of local factors. Instead of aggregating latent factors
from various nodes, as seen in the literature of distributed principal
component analysis (PCA), we propose an estimation approach capable of
recovering the parameters of interest and disentangling latent factors at
different levels and across different dimensions. We establish an asymptotic
theory and provide a bootstrap procedure to obtain inference for the parameters
of interest while accommodating various types of cross-sectional dependence and
time series autocorrelation. Finally, we demonstrate the applicability of our
framework by examining productivity convergence in manufacturing industries
worldwide.

arXiv link: http://arxiv.org/abs/2404.08365v2

Econometrics arXiv cross-link from q-fin.GN (q-fin.GN), submitted: 2024-04-11

One Factor to Bind the Cross-Section of Returns

Authors: Nicola Borri, Denis Chetverikov, Yukun Liu, Aleh Tsyvinski

We propose a new non-linear single-factor asset pricing model
$r_{it}=h(f_{t}\lambda_{i})+\epsilon_{it}$. Despite its parsimony, this model
represents exactly any non-linear model with an arbitrary number of factors and
loadings -- a consequence of the Kolmogorov-Arnold representation theorem. It
features only one pricing component $h(f_{t}\lambda_{i})$, comprising a
nonparametric link function of the time-dependent factor and factor loading
that we jointly estimate with sieve-based estimators. Using 171 assets across
major classes, our model delivers superior cross-sectional performance with a
low-dimensional approximation of the link function. Most known finance and
macro factors become insignificant after controlling for our single factor.

arXiv link: http://arxiv.org/abs/2404.08129v1

Econometrics arXiv updated paper (originally submitted: 2024-04-11)

Uniform Inference in High-Dimensional Threshold Regression Models

Authors: Jiatong Li, Hongqiang Yan

We develop a uniform inference theory for high-dimensional slope parameters
in threshold regression models, allowing for either cross-sectional or time
series data. We first establish oracle inequalities for prediction errors, and
L1 estimation errors for the Lasso estimator of the slope parameters and the
threshold parameter, accommodating heteroskedastic non-subgaussian error terms
and non-subgaussian covariates. Next, we derive the asymptotic distribution of
tests involving an increasing number of slope parameters by debiasing (or
desparsifying) the Lasso estimator in cases with no threshold effect and with a
fixed threshold effect. We show that the asymptotic distributions in both cases
are the same, allowing us to perform uniform inference without specifying
whether the model is a linear or threshold regression. Additionally, we extend
the theory to accommodate time series data under the near-epoch dependence
assumption. Finally, we identify statistically significant factors influencing
cross-country economic growth and quantify the effects of military news shocks
on US government spending and GDP, while also estimating a data-driven
threshold point in both applications.

arXiv link: http://arxiv.org/abs/2404.08105v3

Econometrics arXiv updated paper (originally submitted: 2024-04-11)

Merger Analysis with Unobserved Prices

Authors: Paul S. Koh

Standard empirical tools for merger analysis assume price data, which are
often unavailable. I characterize sufficient conditions for identifying the
unilateral effects of mergers without price data using the first-order approach
and merger simulation. Data on merging firms' revenues, margins, and revenue
diversion ratios are sufficient to identify their gross upward pricing pressure
indices and compensating marginal cost reductions. Standard discrete-continuous
demand assumptions facilitate the identification of revenue diversion ratios as
well as the feasibility of merger simulation in terms of percentage change in
price. I apply the framework to the Albertsons/Safeway (2015) and
Staples/Office Depot (2016) mergers.
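
A tiny numeric illustration of why price data can be dispensed with in the first-order approach: with quantity diversion D12 and prices p1, p2, the revenue diversion ratio is R12 = D12 * p2 / p1, so the standard GUPPI formula D12 * m2 * p2 / p1 reduces to R12 * m2. The numbers are made up, and the paper's compensating marginal cost reductions and merger simulation are not shown.

```python
# Illustrative numbers only: GUPPI for product 1 from its partner's margin and
# the revenue diversion ratio, with no price data required.
m2 = 0.30     # percentage margin on the merging partner's product 2
r12 = 0.25    # revenue diversion ratio from product 1 to product 2 (= D12 * p2 / p1)

guppi_1 = r12 * m2
print(f"GUPPI for product 1: {guppi_1:.1%}")   # 7.5% gross upward pricing pressure
```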

arXiv link: http://arxiv.org/abs/2404.07684v6

Econometrics arXiv updated paper (originally submitted: 2024-04-09)

Regression Discontinuity Design with Spillovers

Authors: Eric Auerbach, Yong Cai, Ahnaf Rafi

This paper studies regression discontinuity designs (RDD) when
linear-in-means spillovers occur between units that are close in their running
variable. We show that the RDD estimand depends on the ratio of two terms: (1)
the radius over which spillovers occur and (2) the choice of bandwidth used for
the local linear regression. RDD estimates the direct treatment effect when the radius
is of larger order than the bandwidth and the total treatment effect when the radius is
of smaller order than the bandwidth. When the two are of similar order, the RDD
estimand need not have a causal interpretation. To recover direct and spillover
effects in the intermediate regime, we propose to incorporate estimated
spillover terms into local linear regression. Our estimator is consistent and
asymptotically normal and we provide bias-aware confidence intervals for direct
treatment effects and spillovers. In the setting of Gonzalez (2021), we detect
endogenous spillovers in voter fraud during the 2009 Afghan Presidential
election. We also clarify when the donut-hole design addresses spillovers in
RDD.

arXiv link: http://arxiv.org/abs/2404.06471v2

Econometrics arXiv updated paper (originally submitted: 2024-04-08)

Common Trends and Long-Run Identification in Nonlinear Structural VARs

Authors: James A. Duffy, Sophocles Mavroeidis

While it is widely recognised that linear (structural) VARs may fail to
capture important aspects of economic time series, the use of nonlinear SVARs
has to date been almost entirely confined to the modelling of stationary time
series, because of a lack of understanding as to how common stochastic trends
may be accommodated within nonlinear models. This has unfortunately
circumscribed the range of series to which such models can be applied -- and/or
required that these series be first transformed to stationarity, a potential
source of misspecification -- and prevented the use of long-run identifying
restrictions in these models. To address these problems, we develop a flexible
class of additively time-separable nonlinear SVARs, which subsume models with
threshold-type endogenous regime switching, both of the piecewise linear and
smooth transition varieties. We extend the Granger--Johansen representation
theorem to this class of models, obtaining conditions that specialise exactly
to the usual ones when the model is linear. We further show that, as a
corollary, these models are capable of supporting the same kinds of long-run
identifying restrictions as are available in linearly cointegrated SVARs.

arXiv link: http://arxiv.org/abs/2404.05349v2

Econometrics arXiv paper, submitted: 2024-04-08

Maximally Forward-Looking Core Inflation

Authors: Philippe Goulet Coulombe, Karin Klieber, Christophe Barrette, Maximilian Goebel

Timely monetary policy decision-making requires timely core inflation
measures. We create a new core inflation series that is explicitly designed to
succeed at that goal. Precisely, we introduce the Assemblage Regression, a
generalized nonnegative ridge regression problem that optimizes the price
index's subcomponent weights such that the aggregate is maximally predictive of
future headline inflation. Ordering subcomponents according to their rank in
each period switches the algorithm to learning supervised trimmed inflation --
or, put differently, the maximally forward-looking summary statistic of the
realized distribution of price changes. In an extensive out-of-sample forecasting
experiment for the US and the euro area, we find substantial improvements for
signaling medium-term inflation developments in both the pre- and post-Covid
years. Those coming from the supervised trimmed version are particularly
striking, and are attributable to a highly asymmetric trimming which contrasts
with conventional indicators. We also find that this metric signalled the
first upward pressures on inflation as early as mid-2020 and quickly captured
the turning point in 2022. We also consider extensions, like assembling
inflation from geographical regions, trimmed temporal aggregation, and building
core measures specialized for either upside or downside inflation risks.
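
A stripped-down sketch of the flavour of the estimator: a plain nonnegative ridge regression of h-step-ahead headline inflation on subcomponent inflation, solved by ridge augmentation plus NNLS on simulated data; the paper's generalized penalty, the supervised trimming variant and the real-time evaluation are not reproduced.

```python
# Nonnegative ridge regression of h-step-ahead headline inflation on
# subcomponents, via ridge augmentation + NNLS, on simulated data.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(8)
T, k, horizon, alpha = 300, 12, 12, 1.0
components = rng.normal(size=(T, k))                     # subcomponent inflation rates
true_w = np.maximum(rng.normal(size=k), 0)
true_w /= true_w.sum()
headline = components @ true_w + 0.2 * rng.normal(size=T)

x = components[:-horizon]                                # predictors dated t
y = headline[horizon:]                                   # headline inflation at t + horizon

# min ||y - Xw||^2 + alpha * ||w||^2  s.t.  w >= 0, as a stacked least-squares problem
x_aug = np.vstack([x, np.sqrt(alpha) * np.eye(k)])
y_aug = np.concatenate([y, np.zeros(k)])
w, _ = nnls(x_aug, y_aug)

core = components @ (w / w.sum())                        # re-normalised weights => core series
print("learned weights:", np.round(w / w.sum(), 3))
```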

arXiv link: http://arxiv.org/abs/2404.05209v1

Econometrics arXiv paper, submitted: 2024-04-08

Estimating granular house price distributions in the Australian market using Gaussian mixtures

Authors: Willem P Sijp, Anastasios Panagiotelis

A new methodology is proposed to approximate the time-dependent house price
distribution at a fine regional scale using Gaussian mixtures. The means,
variances and weights of the mixture components are related to time, location
and dwelling type through a nonlinear function trained by a deep functional
approximator. Price indices are derived as means, medians, quantiles or other
functions of the estimated distributions. Price densities for larger regions,
such as a city, are calculated via a weighted sum of the component density
functions. The method is applied to a data set covering all of Australia at a
fine spatial and temporal resolution. In addition to enabling a detailed
exploration of the data, the proposed index yields lower prediction errors in
the practical task of individual dwelling price projection from previous sales
values within the three major Australian cities. The estimated quantiles are
also found to be well calibrated empirically, capturing the complexity of house
price distributions.
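
A stripped-down sketch of the final step only: fit a Gaussian mixture to (simulated) log prices for one region-month and read off index values from the estimated density; the paper instead maps time, location and dwelling type to the mixture parameters with a deep functional approximator.

```python
# Fit a two-component Gaussian mixture to simulated log prices for one
# region-month and derive index values from the estimated density.
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(9)
log_prices = np.concatenate([rng.normal(13.2, 0.25, 700),     # e.g. a cheaper segment
                             rng.normal(13.8, 0.20, 300)])    # e.g. a pricier segment

gm = GaussianMixture(n_components=2, random_state=0).fit(log_prices.reshape(-1, 1))
w = gm.weights_
mu = gm.means_.ravel()
sd = np.sqrt(gm.covariances_.ravel())

def mixture_cdf(x):
    return float(np.sum(w * norm.cdf(x, mu, sd)))

def mixture_quantile(q):
    lo, hi = mu.min() - 6 * sd.max(), mu.max() + 6 * sd.max()
    return brentq(lambda x: mixture_cdf(x) - q, lo, hi)

print("mean price   :", np.sum(w * np.exp(mu + 0.5 * sd ** 2)))   # lognormal-mixture mean
print("median price :", np.exp(mixture_quantile(0.5)))
print("90th pct     :", np.exp(mixture_quantile(0.9)))
```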

arXiv link: http://arxiv.org/abs/2404.05178v1

Econometrics arXiv paper, submitted: 2024-04-07

Context-dependent Causality (the Non-Monotonic Case)

Authors: Nir Billfeld, Moshe Kim

We develop a novel identification strategy as well as a new estimator for
context-dependent causal inference in non-parametric triangular models with
non-separable disturbances. Departing from the common practice, our analysis
does not rely on the strict monotonicity assumption. Our key contribution lies
in leveraging on diffusion models to formulate the structural equations as a
system evolving from noise accumulation to account for the influence of the
latent context (confounder) on the outcome. Our identifiability strategy
involves a system of Fredholm integral equations expressing the distributional
relationship between a latent context variable and a vector of observables.
These integral equations involve an unknown kernel and are governed by a set of
structural form functions, inducing a non-monotonic inverse problem. We prove
that if the kernel density can be represented as an infinite mixture of
Gaussians, then there exists a unique solution for the unknown function. This
is a significant result, as it shows that it is possible to solve a
non-monotonic inverse problem even when the kernel is unknown. On the
methodological front, we leverage a novel and enriched Contaminated
Generative Adversarial (Neural) Network (CONGAN), which we provide as a
solution to the non-monotonic inverse problem.

arXiv link: http://arxiv.org/abs/2404.05021v1

Econometrics arXiv paper, submitted: 2024-04-07

Towards a generalized accessibility measure for transportation equity and efficiency

Authors: Rajat Verma, Mithun Debnath, Shagun Mittal, Satish V. Ukkusuri

Locational measures of accessibility are widely used in urban and
transportation planning to understand how the transportation system shapes
people's access to places. However, there is a considerable lack
of measurement standards and publicly available data. We propose a generalized
measure of locational accessibility that has a comprehensible form for
transportation planning analysis. This metric combines the cumulative
opportunities approach with gravity-based measures and is capable of catering
to multiple trip purposes, travel modes, cost thresholds, and scales of
analysis. Using data from multiple publicly available datasets, this metric is
computed by trip purpose and travel time threshold for all block groups in the
United States, and the data is made publicly accessible. Further, case studies
of three large metropolitan areas reveal substantial inefficiencies in
transportation infrastructure, with the most inefficiency observed in sprawling
and non-core urban areas, especially for bicycling. Subsequently, it is shown
that targeted investment in facilities can contribute to a more equitable
distribution of accessibility to essential shopping and service facilities. By
assigning greater weights to socioeconomically disadvantaged neighborhoods, the
proposed metric formally incorporates equity considerations into transportation
planning, contributing to a more equitable distribution of accessibility to
essential services and facilities.

arXiv link: http://arxiv.org/abs/2404.04985v1

Econometrics arXiv updated paper (originally submitted: 2024-04-07)

CAVIAR: Categorical-Variable Embeddings for Accurate and Robust Inference

Authors: Anirban Mukherjee, Hannah Hanwen Chang

Social science research often hinges on the relationship between categorical
variables and outcomes. We introduce CAVIAR, a novel method for embedding
categorical variables that assume values in a high-dimensional ambient space
but are sampled from an underlying manifold. Our theoretical and numerical
analyses outline challenges posed by such categorical variables in causal
inference. Specifically, dynamically varying and sparse levels can lead to
violations of the Donsker conditions and a failure of the estimation
functionals to converge to a tight Gaussian process. Traditional approaches,
including the exclusion of rare categorical levels and principled variable
selection models like LASSO, fall short. CAVIAR embeds the data into a
lower-dimensional global coordinate system. The mapping can be derived from
both structured and unstructured data, and ensures stable and robust estimates
through dimensionality reduction. In a dataset of direct-to-consumer apparel
sales, we illustrate how high-dimensional categorical variables, such as zip
codes, can be succinctly represented, facilitating inference and analysis.

arXiv link: http://arxiv.org/abs/2404.04979v2

Econometrics arXiv paper, submitted: 2024-04-07

Neural Network Modeling for Forecasting Tourism Demand in Stopića Cave: A Serbian Cave Tourism Study

Authors: Buda Bajić, Srđan Milićević, Aleksandar Antić, Slobodan Marković, Nemanja Tomić

For modeling the number of visits in Stopića cave (Serbia) we consider
the classical Auto-regressive Integrated Moving Average (ARIMA) model, Machine
Learning (ML) method Support Vector Regression (SVR), and the hybrid NeuralProphet
method, which combines classical and ML concepts. The most accurate predictions
were obtained with NeuralProphet, which includes the seasonal component and
growing trend of the time series. In addition, non-linearity is modeled by a shallow
Neural Network (NN), and Google Trends is incorporated as an exogenous variable.
Modeling tourist demand is of great importance for management structures
and decision-makers due to its applicability in establishing sustainable
tourism utilization strategies in environmentally vulnerable destinations such
as caves. The data provided insights into the tourist demand in Stopića
cave and preliminary data for addressing the issues of carrying capacity within
the most visited cave in Serbia.

arXiv link: http://arxiv.org/abs/2404.04974v1

Econometrics arXiv updated paper (originally submitted: 2024-04-06)

Stratifying on Treatment Status

Authors: Jinyong Hahn, John Ham, Geert Ridder, Shuyang Sheng

We study the estimation of treatment effects using samples stratified by
treatment status. Standard estimators of the average treatment effect and the
local average treatment effect are inconsistent in this setting. We propose
consistent estimators and characterize their asymptotic distributions.

arXiv link: http://arxiv.org/abs/2404.04700v3

Econometrics arXiv paper, submitted: 2024-04-06

Absolute Technical Efficiency Indices

Authors: Montacer Ben Cheikh Larbi, Sina Belkhiria

Technical efficiency indices (TEIs) can be estimated using the traditional
stochastic frontier analysis approach, which yields relative indices that do
not allow self-interpretations. In this paper, we introduce a single-step
estimation procedure for TEIs that eliminates the need to identify best
practices and avoids imposing restrictive hypotheses on the error term. The
resulting indices are absolute and allow for individual interpretation. In our
model, we estimate a distance function using the inverse coefficient of
resource utilization, rather than treating it as unobservable. We employ a
Tobit model with a translog distance function as our econometric framework.
Applying this model to a sample of 19 airline companies from 2012 to 2021, we
find that: (1) Absolute technical efficiency varies considerably between
companies with medium-haul European airlines being technically the most
efficient, while Asian airlines are the least efficient; (2) Our estimated TEIs
are consistent with the observed data with a decline in efficiency especially
during the Covid-19 crisis and Brexit period; (3) All airlines contained in our
sample would be able to increase their average technical efficiency by 0.209%
if they reduced their average kerosene consumption by 1%; (4) Total factor
productivity (TFP) growth slowed between 2013 and 2019 due to a decrease in
Disembodied Technical Change (DTC) and a small effect from Scale Economies
(SE). Toward the end of our study period, TFP growth seemed increasingly driven
by the SE effect, with a sharp decline in 2020 followed by an equally sharp
recovery in 2021 for most airlines.

arXiv link: http://arxiv.org/abs/2404.04590v1

Econometrics arXiv updated paper (originally submitted: 2024-04-06)

Fast and simple inner-loop algorithms of static / dynamic BLP estimations

Authors: Takeshi Fukasawa

This study investigates computationally efficient inner-loop algorithms for
estimating static/dynamic BLP models. It provides the following ideas for
reducing the number of inner-loop iterations: (1) add a term relating to the
outside option share in the BLP contraction mapping; (2) analytically
represent the mean product utilities as a function of value functions and solve
for the value functions (for dynamic BLP); (3) combine an acceleration method for
fixed-point iterations, especially Anderson acceleration. These ideas are
independent and easy to implement. The study shows the good performance of
these methods in numerical experiments.
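
A minimal random-coefficients logit demo of idea (1) as I read it (the exact update in the paper is not verified here, and ideas (2)-(3) are not shown): adding an outside-option-share term to the standard BLP contraction leaves the fixed point unchanged but can sharply cut the iteration count.

```python
# Random-coefficients logit with simulated shares. Baseline contraction:
# delta <- delta + log(S) - log(s(delta)). Variant: also subtract the analogous
# outside-option term, which preserves the fixed point (shares match at
# convergence) but behaves like a Newton-type step in the plain-logit limit.
import numpy as np

rng = np.random.default_rng(10)
J, R = 5, 500
delta_true = rng.normal(size=J)
nu = rng.normal(size=R)                                    # taste draws
x = rng.normal(size=J)                                     # one product characteristic

def shares(delta):
    u = delta[None, :] + 0.5 * nu[:, None] * x[None, :]    # R x J simulated utilities
    e = np.exp(u)
    return (e / (1.0 + e.sum(axis=1, keepdims=True))).mean(axis=0)

s_obs = shares(delta_true)
s0_obs = 1.0 - s_obs.sum()

def baseline(d):
    return d + np.log(s_obs) - np.log(shares(d))

def with_outside_term(d):
    s = shares(d)
    return d + np.log(s_obs) - np.log(s) - (np.log(s0_obs) - np.log(1.0 - s.sum()))

def solve(update, tol=1e-12, max_iter=10_000):
    d = np.zeros(J)
    for it in range(max_iter):
        new = update(d)
        if np.max(np.abs(new - d)) < tol:
            return new, it
        d = new
    return d, max_iter

for name, upd in [("baseline", baseline), ("outside-share variant", with_outside_term)]:
    d_hat, iters = solve(upd)
    print(f"{name:22s}: {iters:5d} iterations, max error {np.max(np.abs(d_hat - delta_true)):.1e}")
```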

arXiv link: http://arxiv.org/abs/2404.04494v5

Econometrics arXiv paper, submitted: 2024-04-04

Forecasting with Neuro-Dynamic Programming

Authors: Pedro Afonso Fernandes

Economic forecasting is concerned with the estimation of some variable like
gross domestic product (GDP) in the next period given a set of variables that
describes the current situation or state of the economy, including industrial
production, retail trade turnover or economic confidence. Neuro-dynamic
programming (NDP) provides tools to deal with forecasting and other sequential
problems with such high-dimensional state spaces. Whereas conventional
forecasting methods penalise the difference (or loss) between predicted and
actual outcomes, NDP focuses on the difference between temporally successive
predictions, following an interactive, trial-and-error approach. Past data
provide guidance for training the models, but in a different way from ordinary
least squares (OLS) and other supervised learning methods, by signalling the
adjustment costs between sequential states. We found that it is possible to
train a GDP forecasting model on data from other countries that
performs better than models trained on past data from the tested country
(Portugal). In addition, we found that non-linear architectures for approximating
the value function of a sequential problem, namely neural networks, can perform
better than a simple linear architecture, lowering the out-of-sample mean
absolute forecast error (MAE) by 32% from an OLS model.

arXiv link: http://arxiv.org/abs/2404.03737v1

Econometrics arXiv updated paper (originally submitted: 2024-04-04)

An early warning system for emerging markets

Authors: Artem Kraevskiy, Artem Prokhorov, Evgeniy Sokolovskiy

Financial markets of emerging economies are vulnerable to extreme and
cascading information spillovers, surges, sudden stops and reversals. With this
in mind, we develop a new online early warning system (EWS) to detect what is
referred to as `concept drift' in machine learning, as a `regime shift' in
economics and as a `change-point' in statistics. The system explores
nonlinearities in financial information flows and remains robust to heavy tails
and dependence of extremes. The key component is the use of conditional
entropy, which captures shifts in various channels of information transmission,
not only in conditional mean or variance. We design a baseline method, and
adapt it to a modern high-dimensional setting through the use of random forests
and copulas. We show the relevance of each system component to the analysis of
emerging markets. The new approach detects significant shifts where
conventional methods fail. We explore when this happens using simulations and
we provide two illustrations when the methods generate meaningful warnings. The
ability to detect changes early helps improve resilience in emerging markets
against shocks and provides new economic and financial insights into their
operation.

arXiv link: http://arxiv.org/abs/2404.03319v2

Econometrics arXiv paper, submitted: 2024-04-04

Marginal Treatment Effects and Monotonicity

Authors: Henrik Sigstad

How robust are analyses based on marginal treatment effects (MTE) to
violations of Imbens and Angrist (1994) monotonicity? In this note, I present
weaker forms of monotonicity under which popular MTE-based estimands still
identify the parameters of interest.

arXiv link: http://arxiv.org/abs/2404.03235v1

Econometrics arXiv updated paper (originally submitted: 2024-04-03)

Bayesian Bi-level Sparse Group Regressions for Macroeconomic Density Forecasting

Authors: Matteo Mogliani, Anna Simoni

We propose a Machine Learning approach for optimal macroeconomic density
forecasting in a high-dimensional setting where the underlying model exhibits a
known group structure. Our approach is general enough to encompass specific
forecasting models featuring either many covariates, or unknown nonlinearities,
or series sampled at different frequencies. By relying on the novel concept of
bi-level sparsity in time-series econometrics, we construct density forecasts
based on a prior that induces sparsity both at the group level and within
groups. We demonstrate the consistency of both posterior and predictive
distributions. We show that the posterior distribution contracts at the
minimax-optimal rate and, asymptotically, puts mass on a set that includes the
support of the model. Our theory allows for correlation between groups, while
predictors in the same group can be characterized by strong covariation as well
as common characteristics and patterns. Finite sample performance is
illustrated through comprehensive Monte Carlo experiments and a real-data
nowcasting exercise of the US GDP growth rate.

arXiv link: http://arxiv.org/abs/2404.02671v3

Econometrics arXiv paper, submitted: 2024-04-03

Moran's I 2-Stage Lasso: for Models with Spatial Correlation and Endogenous Variables

Authors: Sylvain Barde, Rowan Cherodian, Guy Tchuente

We propose a novel estimation procedure for models with endogenous variables
in the presence of spatial correlation based on Eigenvector Spatial Filtering.
The procedure, called Moran's $I$ 2-Stage Lasso (Mi-2SL), uses a two-stage
Lasso estimator where the Standardised Moran's I is used to set the Lasso
tuning parameter. Unlike existing spatial econometric methods, this has the key
benefit of not requiring the researcher to explicitly model the spatial
correlation process, which is of interest in cases where they are only
interested in removing the resulting bias when estimating the direct effect of
covariates. We show the conditions necessary for consistent and asymptotically
normal parameter estimation assuming the support (relevant) set of eigenvectors
is known. Our Monte Carlo simulation results also show that Mi-2SL performs
well against common alternatives in the presence of spatial correlation. Our
empirical application replicates Cadena and Kovak (2016) instrumental variables
estimates using Mi-2SL and shows that in that case, Mi-2SL can boost the
performance of the first stage.
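
A rough sketch of the idea on simulated data: build eigenvector filters from the doubly centred connectivity matrix, run a Lasso over a penalty grid, and pick the penalty whose residual Moran's I is closest to its null value (assessed here by permutation rather than the paper's standardisation; the Lasso below penalizes all regressors and the endogenous-variable first stage is omitted).

```python
# Build eigenvector filters, run a Lasso over a penalty grid, and choose the
# penalty whose residual Moran's I looks like spatial noise. IV first stage and
# the paper's standardised Moran's I statistic are not reproduced.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(11)
n = 200
coords = rng.uniform(size=(n, 2))
dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=2)
c = (dist < 0.2).astype(float)                             # symmetric connectivity
np.fill_diagonal(c, 0.0)
w = c / c.sum(axis=1, keepdims=True)                       # row-standardised weights

def morans_i(e):
    e = e - e.mean()
    return (n / w.sum()) * (e @ w @ e) / (e @ e)

x = rng.normal(size=(n, 1))                                # one covariate of interest
u = np.linalg.solve(np.eye(n) - 0.6 * w, rng.normal(size=n))   # spatially correlated error
y = 1.0 + 2.0 * x[:, 0] + u

m = np.eye(n) - np.ones((n, n)) / n                        # centring matrix
vals, vecs = np.linalg.eigh(m @ c @ m)
filters = vecs[:, vals > 0.1 * vals.max()]                 # candidate eigenvector filters
design = np.hstack([x, filters])

lams = np.geomspace(1.0, 1e-3, 30)
zs = []
for lam in lams:
    resid = y - Lasso(alpha=lam, max_iter=10_000).fit(design, y).predict(design)
    i_obs = morans_i(resid)
    i_null = np.array([morans_i(rng.permutation(resid)) for _ in range(200)])
    zs.append((i_obs - i_null.mean()) / i_null.std())      # permutation-based z-score

best = lams[np.argmin(np.abs(zs))]
fit = Lasso(alpha=best, max_iter=10_000).fit(design, y)
print(f"chosen penalty {best:.4f}, slope on x = {fit.coef_[0]:.2f} (true 2.0)")
```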

arXiv link: http://arxiv.org/abs/2404.02584v1

Econometrics arXiv updated paper (originally submitted: 2024-04-03)

Improved Semi-Parametric Bounds for Tail Probability and Expected Loss: Theory and Applications

Authors: Zhaolin Li, Artem Prokhorov

Many management decisions involve accumulated random realizations for which
only the first and second moments of their distribution are available. The
sharp Chebyshev-type bound for the tail probability and Scarf bound for the
expected loss are widely used in this setting. We revisit the tail behavior of
such quantities with a focus on independence. Conventional primal-dual
approaches from optimization are ineffective in this setting. Instead, we use
probabilistic inequalities to derive new bounds and offer new insights. For
non-identical distributions attaining the tail probability bounds, we show that
the extreme values are equidistant regardless of the distributional
differences. For the bound on the expected loss, we show that the impact of
each random variable on the expected sum can be isolated using an extension of
the Korkine identity. We illustrate how these new results open up abundant
practical applications, including improved pricing of product bundles, more
precise option pricing, more efficient insurance design, and better inventory
management. For example, we establish a new solution to the optimal bundling
problem, yielding a 17% uplift in per-bundle profits, and a new solution to the
inventory problem, yielding a 5.6% cost reduction for a model with 20
retailers.
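
For reference, the two classical benchmarks the paper sharpens, as closed-form formulas: the one-sided Chebyshev (Cantelli) tail bound and the Scarf bound on expected loss given only the mean and variance; the paper's improved independence-based bounds are not reproduced.

```python
# Cantelli tail bound and Scarf expected-loss bound from mean and variance only.
import numpy as np

def cantelli_tail_bound(mu, sigma, t):
    """Upper bound on P(X >= t) over all distributions with mean mu, std sigma (t > mu)."""
    return sigma ** 2 / (sigma ** 2 + (t - mu) ** 2)

def scarf_expected_loss_bound(mu, sigma, q):
    """Scarf (1958) upper bound on E[(X - q)^+] over the same moment class."""
    return 0.5 * (np.sqrt(sigma ** 2 + (q - mu) ** 2) - (q - mu))

mu, sigma = 100.0, 20.0                                   # e.g. aggregate demand moments
print(cantelli_tail_bound(mu, sigma, t=140.0))            # worst-case stock-out probability
print(scarf_expected_loss_bound(mu, sigma, q=140.0))      # worst-case expected lost sales
```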

arXiv link: http://arxiv.org/abs/2404.02400v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-04-02

Seemingly unrelated Bayesian additive regression trees for cost-effectiveness analyses in healthcare

Authors: Jonas Esser, Mateus Maia, Andrew C. Parnell, Judith Bosmans, Hanneke van Dongen, Thomas Klausch, Keefe Murphy

In recent years, theoretical results and simulation evidence have shown
Bayesian additive regression trees to be a highly-effective method for
nonparametric regression. Motivated by cost-effectiveness analyses in health
economics, where interest lies in jointly modelling the costs of healthcare
treatments and the associated health-related quality of life experienced by a
patient, we propose a multivariate extension of BART which is applicable in
regression analyses with several dependent outcome variables. Our framework
allows for continuous or binary outcomes and overcomes some key limitations of
existing multivariate BART models by allowing each individual response to be
associated with different ensembles of trees, while still handling dependencies
between the outcomes. In the case of continuous outcomes, our model is
essentially a nonparametric version of seemingly unrelated regression.
Likewise, our proposal for binary outcomes is a nonparametric generalisation of
the multivariate probit model. We give suggestions for easily interpretable
prior distributions, which allow specification of both informative and
uninformative priors. We provide detailed discussions of MCMC sampling methods
to conduct posterior inference. Our methods are implemented in the R package
"subart". We showcase their performance through extensive simulation
experiments and an application to an empirical case study from health
economics. By also accommodating propensity scores in a manner befitting a
causal analysis, we find substantial evidence for a novel trauma care
intervention's cost-effectiveness.

arXiv link: http://arxiv.org/abs/2404.02228v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-04-02

Robustly estimating heterogeneity in factorial data using Rashomon Partitions

Authors: Aparajithan Venkateswaran, Anirudh Sankar, Arun G. Chandrasekhar, Tyler H. McCormick

In both observational data and randomized control trials, researchers select
statistical models to articulate how the outcome of interest varies with
combinations of observable covariates. Choosing a model that is too simple can
obfuscate important heterogeneity in outcomes between covariate groups, while
too much complexity risks identifying spurious patterns. In this paper, we
propose a novel Bayesian framework for model uncertainty called Rashomon
Partition Sets (RPSs). The RPS consists of all models that have posterior
density close to the maximum a posteriori (MAP) model. We construct the RPS by
enumeration, rather than sampling, which ensures that we explore all models
with high evidence in the data, even if they offer dramatically
different substantive explanations. We use an $\ell_0$ prior, which allows us
to capture complex heterogeneity without imposing strong assumptions about
the associations between effects, showing this prior is minimax optimal from an
information-theoretic perspective. We characterize the approximation error of
(functions of) parameters computed conditional on being in the RPS relative to
the entire posterior. We propose an algorithm to enumerate the RPS from the
class of models that are interpretable and unique, then provide bounds on the
size of the RPS. We give simulation evidence along with three empirical
examples: price effects on charitable giving, heterogeneity in chromosomal
structure, and the introduction of microfinance.
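
A heavily simplified toy version of the "keep every model whose fit is close to the best one" idea is sketched below, using BIC over covariate subsets as a crude stand-in for the paper's posterior and its $\ell_0$ prior. The threshold and data are illustrative, and this is not the RPS enumeration algorithm itself.

```python
# Toy illustration: enumerate all covariate subsets and keep those whose
# score is within epsilon of the best one (BIC as a stand-in for the posterior).
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, p = 300, 4
X = rng.normal(size=(n, p))
y = 1.5 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=n)

def bic(subset):
    Xs = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
    resid = y - Xs @ np.linalg.lstsq(Xs, y, rcond=None)[0]
    return n * np.log(resid @ resid / n) + Xs.shape[1] * np.log(n)

models = [s for r in range(p + 1) for s in itertools.combinations(range(p), r)]
scores = {s: bic(s) for s in models}
best = min(scores.values())
epsilon = 6.0                                   # "closeness to the best model" threshold
near_best_set = [s for s, v in scores.items() if v - best <= epsilon]
print(near_best_set)
```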

arXiv link: http://arxiv.org/abs/2404.02141v4

Econometrics arXiv paper, submitted: 2024-04-02

The impact of geopolitical risk on the international agricultural market: Empirical analysis based on the GJR-GARCH-MIDAS model

Authors: Yun-Shi Dai, Peng-Fei Dai, Wei-Xing Zhou

The current international landscape is turbulent and unstable, with frequent
outbreaks of geopolitical conflicts worldwide. Geopolitical risk has emerged as
a significant threat to regional and global peace, stability, and economic
prosperity, causing serious disruptions to the global food system and food
security. Focusing on the international food market, this paper builds
different dimensions of geopolitical risk measures based on the random matrix
theory and constructs single- and two-factor GJR-GARCH-MIDAS models with fixed
time span and rolling window, respectively, to investigate the impact of
geopolitical risk on food market volatility. The findings indicate that
modeling based on a rolling window performs better in describing the overall
volatility of the wheat, maize, soybean, and rice markets, and the two-factor
models generally exhibit stronger explanatory power in most cases. In terms of
short-term fluctuations, all four staple food markets demonstrate obvious
volatility clustering and high volatility persistence, without significant
asymmetry. Regarding long-term volatility, the realized volatility of wheat,
maize, and soybean significantly exacerbates their long-run market volatility.
Additionally, geopolitical risks of different dimensions show varying
directions and degrees of effects in explaining the long-term market volatility
of the four staple food commodities. This study contributes to the
understanding of the macro-drivers of food market fluctuations, provides useful
information for investment using agricultural futures, and offers valuable
insights into maintaining the stable operation of food markets and safeguarding
global food security.
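
To fix ideas, the sketch below writes out the two-component variance recursion behind a one-factor GJR-GARCH-MIDAS model: the conditional variance is split into a slow MIDAS component driven by a low-frequency covariate (such as a geopolitical-risk index) and a daily GJR-GARCH component. The parameter values and the log form of the long-run component are illustrative choices, not the paper's estimates.

```python
import numpy as np

def beta_weights(K, omega):
    """One-parameter beta lag weights over K low-frequency lags."""
    k = np.arange(1, K + 1)
    w = (1 - k / K) ** (omega - 1)
    return w / w.sum()

def garch_midas_variance(returns, lowfreq_x, period, K=12, m=0.0, theta=0.3,
                         omega=3.0, alpha=0.05, gamma=0.10, beta=0.85, mu=0.0):
    """returns: daily returns; lowfreq_x: one covariate value per low-frequency
    period; period[t] gives the low-frequency period index of day t."""
    w = beta_weights(K, omega)
    lowfreq_x = np.asarray(lowfreq_x, float)
    # long-run MIDAS component tau (log form keeps it positive);
    # the first K periods use exp(m) as a crude initialisation
    tau = np.full(len(lowfreq_x), np.exp(m))
    for p in range(K, len(lowfreq_x)):
        tau[p] = np.exp(m + theta * np.dot(w, lowfreq_x[p - K:p][::-1]))
    # short-run GJR-GARCH component g with asymmetry parameter gamma
    g = np.empty(len(returns))
    g[0] = 1.0
    for t in range(1, len(returns)):
        e2 = (returns[t - 1] - mu) ** 2 / tau[period[t - 1]]
        neg = 1.0 if returns[t - 1] < mu else 0.0
        g[t] = (1 - alpha - beta - gamma / 2) + (alpha + gamma * neg) * e2 + beta * g[t - 1]
    return tau[np.asarray(period)] * g          # daily conditional variance
```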

arXiv link: http://arxiv.org/abs/2404.01641v1

Econometrics arXiv updated paper (originally submitted: 2024-04-02)

Heterogeneous Treatment Effects and Causal Mechanisms

Authors: Jiawei Fu, Tara Slough

The credibility revolution advances the use of research designs that permit
identification and estimation of causal effects. However, understanding which
mechanisms produce measured causal effects remains a challenge. A dominant
current approach to the quantitative evaluation of mechanisms relies on the
detection of heterogeneous treatment effects with respect to pre-treatment
covariates. This paper develops a framework to understand when the existence of
such heterogeneous treatment effects can support inferences about the
activation of a mechanism. We show first that this design cannot provide
evidence of mechanism activation without an additional, generally implicit,
assumption. Further, even when this assumption is satisfied, if a measured
outcome is produced by a non-linear transformation of a directly-affected
outcome of theoretical interest, heterogeneous treatment effects are not
informative of mechanism activation. We provide novel guidance for
interpretation and research design in light of these findings.

arXiv link: http://arxiv.org/abs/2404.01566v3

Econometrics arXiv paper, submitted: 2024-04-01

Estimating Heterogeneous Effects: Applications to Labor Economics

Authors: Stephane Bonhomme, Angela Denis

A growing number of applications involve settings where, in order to infer
heterogeneous effects, a researcher compares various units. Examples of
research designs include children moving between different neighborhoods,
workers moving between firms, patients migrating from one city to another, and
banks offering loans to different firms. We present a unified framework for
these settings, based on a linear model with normal random coefficients and
normal errors. Using the model, we discuss how to recover the mean and
dispersion of effects, other features of their distribution, and how to construct
predictors of the effects. We provide moment conditions on the model's
parameters, and outline various estimation strategies. A main objective of the
paper is to clarify some of the underlying assumptions by highlighting their
economic content, and to discuss and inform some of the key practical choices.

arXiv link: http://arxiv.org/abs/2404.01495v1

Econometrics arXiv paper, submitted: 2024-04-01

Convolution-t Distributions

Authors: Peter Reinhard Hansen, Chen Tong

We introduce a new class of multivariate heavy-tailed distributions that are
convolutions of heterogeneous multivariate t-distributions. Unlike commonly
used heavy-tailed distributions, the multivariate convolution-t distributions
embody cluster structures with flexible nonlinear dependencies and
heterogeneous marginal distributions. Importantly, convolution-t distributions
have simple density functions that facilitate estimation and likelihood-based
inference. The characteristic features of convolution-t distributions are found
to be important in an empirical analysis of realized volatility measures and
help identify their underlying factor structure.
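
A minimal simulation sketch of the construction: a convolution-t vector as the sum of two independent multivariate t terms with different degrees of freedom and loading matrices. The loadings and degrees of freedom below are illustrative, not the paper's specification.

```python
import numpy as np

def mvt(rng, df, cov, size):
    """Draw from a multivariate t with the given df and scale matrix cov."""
    z = rng.multivariate_normal(np.zeros(cov.shape[0]), cov, size=size)
    chi2 = rng.chisquare(df, size=size) / df
    return z / np.sqrt(chi2)[:, None]

rng = np.random.default_rng(0)
A1 = np.array([[1.0, 0.0], [0.8, 0.6]])         # cluster-specific loadings (made up)
A2 = np.array([[0.3, 0.0], [0.0, 0.3]])
X = mvt(rng, df=4, cov=A1 @ A1.T, size=100_000) + mvt(rng, df=10, cov=A2 @ A2.T, size=100_000)

# marginal kurtosis differs across coordinates, reflecting heterogeneous tails
print(((X - X.mean(0)) ** 4).mean(0) / X.var(0) ** 2)
```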

arXiv link: http://arxiv.org/abs/2404.00864v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2024-03-31

Estimating sample paths of Gauss-Markov processes from noisy data

Authors: Benjamin Davies

I derive the pointwise conditional means and variances of an arbitrary
Gauss-Markov process, given noisy observations of points on a sample path.
These moments depend on the process's mean and covariance functions, and on the
conditional moments of the sampled points. I study the Brownian motion and
bridge as special cases.
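
For the Brownian-motion special case mentioned in the abstract, the conditional moments follow from standard multivariate normal conditioning with covariance kernel min(s, t). A small sketch, assuming i.i.d. Gaussian observation noise with known variance:

```python
import numpy as np

def brownian_posterior(s, t_obs, y, noise_var):
    """Conditional mean and variance of W(s) given y_i = W(t_i) + eps_i."""
    t_obs = np.asarray(t_obs, float)
    K = np.minimum.outer(t_obs, t_obs)          # Cov(W(t_i), W(t_j)) = min(t_i, t_j)
    k_s = np.minimum(t_obs, s)                  # Cov(W(s), W(t_i))
    A = np.linalg.solve(K + noise_var * np.eye(len(t_obs)), k_s)
    mean = A @ np.asarray(y, float)
    var = s - A @ k_s                           # prior variance of W(s) is s
    return mean, var

print(brownian_posterior(0.5, [0.25, 0.75, 1.0], [0.1, -0.2, 0.05], 0.01))
```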

arXiv link: http://arxiv.org/abs/2404.00784v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-03-30

Policy Learning for Optimal Dynamic Treatment Regimes with Observational Data

Authors: Shosei Sakaguchi

Public policies and medical interventions often involve dynamic treatment
assignments, in which individuals receive a sequence of interventions over
multiple stages. We study the statistical learning of optimal dynamic treatment
regimes (DTRs) that determine the optimal treatment assignment for each
individual at each stage based on their evolving history. We propose a novel,
doubly robust, classification-based method for learning the optimal DTR from
observational data under the sequential ignorability assumption. The method
proceeds via backward induction: at each stage, it constructs and maximizes an
augmented inverse probability weighting (AIPW) estimator of the policy value
function to learn the optimal stage-specific policy. We show that the resulting
DTR achieves an optimal convergence rate of $n^{-1/2}$ for welfare regret under
mild convergence conditions on estimators of the nuisance components.
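
The single-stage building block is an AIPW estimate of a policy's value, which the backward induction maximizes stage by stage. Below is a simplified sketch with a binary treatment and off-the-shelf nuisance estimators; the particular learners are illustrative, and the paper allows general machine-learning estimators of the nuisance components.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def aipw_policy_value(y, a, X, policy):
    """AIPW estimate of the value of a deterministic 0/1 policy at one stage."""
    e = LogisticRegression().fit(X, a).predict_proba(X)[:, 1]      # propensity score
    m1 = LinearRegression().fit(X[a == 1], y[a == 1]).predict(X)   # E[Y | A=1, X]
    m0 = LinearRegression().fit(X[a == 0], y[a == 0]).predict(X)   # E[Y | A=0, X]
    m_pi = np.where(policy == 1, m1, m0)
    e_pi = np.where(policy == 1, e, 1 - e)
    follows = (a == policy)
    return np.mean(m_pi + follows * (y - m_pi) / e_pi)
```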

arXiv link: http://arxiv.org/abs/2404.00221v7

Econometrics arXiv updated paper (originally submitted: 2024-03-29)

Sequential Synthetic Difference in Differences

Authors: Dmitry Arkhangelsky, Aleksei Samkov

We propose the Sequential Synthetic Difference-in-Differences (Sequential
SDiD) estimator for event studies with staggered treatment adoption,
particularly when the parallel trends assumption fails. The method uses an
iterative imputation procedure on aggregated data, where estimates for
early-adopting cohorts are used to construct counterfactuals for later ones. We
prove the estimator is asymptotically equivalent to an infeasible oracle OLS
estimator within a linear model with interactive fixed effects. This key
theoretical result provides a foundation for standard inference by establishing
asymptotic normality and clarifying the estimator's efficiency. By offering a
robust and transparent method with formal statistical guarantees, Sequential
SDiD is a powerful alternative to conventional difference-in-differences
strategies.

arXiv link: http://arxiv.org/abs/2404.00164v2

Econometrics arXiv paper, submitted: 2024-03-28

Dynamic Analyses of Contagion Risk and Module Evolution on the SSE A-Shares Market Based on Minimum Information Entropy

Authors: Muzi Chen, Yuhang Wang, Boyao Wu, Difang Huang

The interactive effect is significant in the Chinese stock market,
exacerbating the abnormal market volatilities and risk contagion. Based on
daily stock returns in the Shanghai Stock Exchange (SSE) A-shares, this paper
divides the period between 2005 and 2018 into eight bull and bear market stages
to investigate interactive patterns in the Chinese financial market. We employ
the LASSO method to construct the stock network and further use the Map
Equation method to analyze the evolution of modules in the SSE A-shares market.
Empirical results show: (1) The connected effect is more significant in bear
markets than in bull markets; (2) A system module can be found in the network
during the first four stages, and the industry aggregation effect leads to
module differentiation in the last four stages; (3) Some stocks have leading
effects on others throughout eight periods, and medium- and small-cap stocks
with poor financial conditions are more likely to become risk sources,
especially in bear markets. Our conclusions are beneficial to improving
investment strategies and making regulatory policies.
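
A schematic version of the network-construction step: regress each stock's return on all others with the LASSO and connect stock i to stock j whenever j receives a nonzero coefficient. The cross-validated penalty here is a placeholder for whatever tuning the authors use, and the Map Equation module analysis is not reproduced.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def lasso_network(returns):
    """returns: T x N matrix of daily stock returns; returns a boolean adjacency."""
    T, N = returns.shape
    adjacency = np.zeros((N, N), dtype=bool)
    for i in range(N):
        others = np.delete(np.arange(N), i)
        coef = LassoCV(cv=5).fit(returns[:, others], returns[:, i]).coef_
        adjacency[i, others] = coef != 0        # edge i -> j if j helps predict i
    return adjacency
```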

arXiv link: http://arxiv.org/abs/2403.19439v1

Econometrics arXiv paper, submitted: 2024-03-28

Dynamic Correlation of Market Connectivity, Risk Spillover and Abnormal Volatility in Stock Price

Authors: Muzi Chen, Nan Li, Lifen Zheng, Difang Huang, Boyao Wu

The connectivity of stock markets reflects the information efficiency of
capital markets and contributes to interior risk contagion and spillover
effects. We compare Shanghai Stock Exchange A-shares (SSE A-shares) during
tranquil periods with those during the subprime mortgage crisis and the
high-leverage period in 2015. We use Pearson correlations of returns, the maximum strongly
connected subgraph, and $3\sigma$ principle to iteratively determine the
threshold value for building a dynamic correlation network of SSE A-shares.
Analyses are carried out based on the networking structure, intra-sector
connectivity, and node status, identifying several contributions. First,
compared with tranquil periods, the SSE A-shares network experiences a more
significant small-world and connective effect during the subprime mortgage
crisis and the high leverage period in 2015. Second, the finance, energy and
utilities sectors have a stronger intra-industry connectivity than other
sectors. Third, HUB nodes drive the growth of the SSE A-shares market during
bull periods, while stocks have a thick-tailed degree distribution in bear
periods and show distinct characteristics in terms of market value and finance.
Granger linear and non-linear causality networks are also considered for the
comparison purpose. Studies on the evolution of inter-cycle connectivity in the
SSE A-share market may help investors improve portfolios and develop more
robust risk management policies.

arXiv link: http://arxiv.org/abs/2403.19363v1

Econometrics arXiv updated paper (originally submitted: 2024-03-27)

Distributional Treatment Effect with Latent Rank Invariance

Authors: Myungkou Shin

Treatment effect heterogeneity is of a great concern when evaluating policy
impact: "is the treatment Pareto-improving?", "what is the proportion of people
who are better off under the treatment?", etc. However, even in the simple case
of a binary random treatment, existing analysis has been mostly limited to an
average treatment effect or a quantile treatment effect, due to the fundamental
limitation that we cannot simultaneously observe both treated potential outcome
and untreated potential outcome for a given unit. This paper assumes a
conditional independence assumption that the two potential outcomes are
independent of each other given a scalar latent variable. With a specific
example of strictly increasing conditional expectation, I label the latent
variable as 'latent rank' and motivate the identifying assumption as 'latent
rank invariance.' In implementation, I assume a finite support on the latent
variable and propose an estimation strategy based on a nonnegative matrix
factorization. A limiting distribution is derived for the distributional
treatment effect estimator, using Neyman orthogonality.

arXiv link: http://arxiv.org/abs/2403.18503v3

Econometrics arXiv updated paper (originally submitted: 2024-03-27)

Statistical Inference of Optimal Allocations I: Regularities and their Implications

Authors: Kai Feng, Han Hong, Denis Nekipelov

In this paper, we develop a functional differentiability approach for solving
statistical optimal allocation problems. We derive Hadamard differentiability
of the value functions through analyzing the properties of the sorting operator
using tools from geometric measure theory. Building on our Hadamard
differentiability results, we apply the functional delta method to obtain the
asymptotic properties of the value function process for the binary constrained
optimal allocation problem and the plug-in ROC curve estimator. Moreover, the
convexity of the optimal allocation value functions facilitates demonstrating
the degeneracy of first order derivatives with respect to the policy. We then
present a double / debiased estimator for the value functions. Importantly, the
conditions that validate Hadamard differentiability justify the margin
assumption from the statistical classification literature for the fast
convergence rate of plug-in methods.

arXiv link: http://arxiv.org/abs/2403.18248v3

Econometrics arXiv paper, submitted: 2024-03-26

Deconvolution from two order statistics

Authors: JoonHwan Cho, Yao Luo, Ruli Xiao

Economic data are often contaminated by measurement errors and truncated by
ranking. This paper shows that the classical measurement error model with
independent and additive measurement errors is identified nonparametrically
using only two order statistics of repeated measurements. The identification
result confirms a hypothesis by Athey and Haile (2002) for a symmetric
ascending auction model with unobserved heterogeneity. Extensions allow for
heterogeneous measurement errors, broadening the applicability to additional
empirical settings, including asymmetric auctions and wage offer models. We
adapt an existing simulated sieve estimator and illustrate its performance in
finite samples.

arXiv link: http://arxiv.org/abs/2403.17777v1

Econometrics arXiv updated paper (originally submitted: 2024-03-26)

The inclusive Synthetic Control Method

Authors: Roberta Di Stefano, Giovanni Mellace

We introduce the inclusive synthetic control method (iSCM), a modification of
synthetic control methods that includes units in the donor pool potentially
affected, directly or indirectly, by an intervention. This method is ideal for
situations where including treated units in the donor pool is essential or
where donor units may experience spillover effects. The iSCM is straightforward
to implement with most synthetic control estimators. As an empirical
illustration, we re-estimate the causal effect of German reunification on GDP
per capita, accounting for spillover effects from West Germany to Austria.

arXiv link: http://arxiv.org/abs/2403.17624v2

Econometrics arXiv paper, submitted: 2024-03-25

Resistant Inference in Instrumental Variable Models

Authors: Jens Klooster, Mikhail Zhelonkin

The classical tests in the instrumental variable model can behave arbitrarily
if the data is contaminated. For instance, one outlying observation can be
enough to change the outcome of a test. We develop a framework to construct
testing procedures that are robust to weak instruments, outliers and
heavy-tailed errors in the instrumental variable model. The framework is
constructed upon M-estimators. By deriving the influence functions of the
classical weak instrument robust tests, such as the Anderson-Rubin test, K-test
and the conditional likelihood ratio (CLR) test, we prove their unbounded
sensitivity to infinitesimal contamination. Therefore, we construct
contamination resistant/robust alternatives. In particular, we show how to
construct a robust CLR statistic based on Mallows type M-estimators and show
that its asymptotic distribution is the same as that of the (classical) CLR
statistic. The theoretical results are corroborated by a simulation study.
Finally, we revisit three empirical studies affected by outliers and
demonstrate how the new robust tests can be used in practice.
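
For concreteness, the classical (non-robust) Anderson-Rubin statistic, whose unbounded sensitivity to contamination motivates the robust construction, can be computed as below. This is the textbook statistic, not the proposed Mallows-type robust version.

```python
import numpy as np

def anderson_rubin(y, X, Z, beta0):
    """AR statistic for H0: beta = beta0 in y = X beta + u with instruments Z (n x k)."""
    n, k = Z.shape
    u0 = y - X @ beta0
    Pu = Z @ np.linalg.solve(Z.T @ Z, Z.T @ u0)   # projection of u0 onto the instruments
    rss_p = u0 @ Pu
    rss_m = u0 @ u0 - rss_p
    return (rss_p / k) / (rss_m / (n - k))        # approx F(k, n - k) under H0
```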

arXiv link: http://arxiv.org/abs/2403.16844v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-03-25

Privacy-Protected Spatial Autoregressive Model

Authors: Danyang Huang, Ziyi Kong, Shuyuan Wu, Hansheng Wang

Spatial autoregressive (SAR) models are important tools for studying network
effects. However, with an increasing emphasis on data privacy, data providers
often implement privacy protection measures that make classical SAR models
inapplicable. In this study, we introduce a privacy-protected SAR model with
noise-added response and covariates to meet privacy-protection requirements.
However, in this scenario, the traditional quasi-maximum likelihood estimator
becomes infeasible because the likelihood function cannot be directly
formulated. To address this issue, we first consider an explicit expression for
the likelihood function with only noise-added responses. Then, we develop
techniques to correct the noise-induced biases in the derivatives.
Correspondingly, a Newton-Raphson-type algorithm is proposed to obtain the
estimator, leading to a corrected likelihood estimator. To further enhance
computational efficiency, we introduce a corrected least squares estimator
based on the idea of bias correction. These two estimation methods ensure both
data security and the attainment of statistically valid estimators. Theoretical
analysis of both estimators is carefully conducted, statistical inference
methods and model extensions are discussed. The finite sample performances of
different methods are demonstrated through extensive simulations and the
analysis of a real dataset.

arXiv link: http://arxiv.org/abs/2403.16773v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-03-25

Quasi-randomization tests for network interference

Authors: Supriya Tiwari, Pallavi Basu

Network interference amounts to the treatment status of one unit affecting
the potential outcome of other units in the population. Testing for spillover
effects in this setting makes the null hypothesis non-sharp. An interesting
approach to tackling the non-sharp nature of the null hypothesis in this setup
is constructing conditional randomization tests such that the null is sharp on
the restricted population. In randomized experiments, conditional randomization
tests have finite-sample validity and are assumption-lean. In this paper, we
incorporate the network amongst the population as a random variable instead of
being fixed. We propose a new approach that builds a conditional
quasi-randomization test. To build the (non-sharp) null distribution of no
spillover effects, we use random graph null models. We show that our method is
exactly valid in finite samples under mild assumptions. Our method displays
enhanced power over state-of-the-art methods, with a substantial improvement in
cluster randomized trials. We illustrate our methodology to test for
interference in a weather insurance adoption experiment run in rural China.

arXiv link: http://arxiv.org/abs/2403.16673v3

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2024-03-25

Optimal testing in a class of nonregular models

Authors: Yuya Shimizu, Taisuke Otsu

This paper studies optimal hypothesis testing for nonregular econometric
models with parameter-dependent support. We consider both one-sided and
two-sided hypothesis testing and develop asymptotically uniformly most powerful
tests based on a limit experiment. Our two-sided test becomes asymptotically
uniformly most powerful without imposing further restrictions such as
unbiasedness, and can be inverted to construct a confidence set for the
nonregular parameter. Simulation results illustrate desirable finite sample
properties of the proposed tests.

arXiv link: http://arxiv.org/abs/2403.16413v2

Econometrics arXiv paper, submitted: 2024-03-24

The Informativeness of Combined Experimental and Observational Data under Dynamic Selection

Authors: Yechan Park, Yuya Sasaki

This paper addresses the challenge of estimating the Average Treatment Effect
on the Treated Survivors (ATETS; Vikstrom et al., 2018) in the absence of
long-term experimental data, utilizing available long-term observational data
instead. We establish two theoretical results. First, it is impossible to
obtain informative bounds for the ATETS with no model restriction and no
auxiliary data. Second, to overturn this negative result, we explore as a
promising avenue the recent econometric developments in combining experimental
and observational data (e.g., Athey et al., 2020, 2019); we indeed find that
exploiting short-term experimental data can be informative without imposing
classical model restrictions. Furthermore, building on Chesher and Rosen
(2017), we explore how to systematically derive sharp identification bounds,
exploiting both the novel data-combination principles and classical model
restrictions. Applying the proposed method, we explore what can be learned
about the long-run effects of job training programs on employment without
long-term experimental data.

arXiv link: http://arxiv.org/abs/2403.16177v1

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2024-03-24

Liquidity Jump, Liquidity Diffusion, and Treatment on Wash Trading of Crypto Assets

Authors: Qi Deng, Zhong-guo Zhou

We propose that the liquidity of an asset includes two components: liquidity
jump and liquidity diffusion. We show that liquidity diffusion has a higher
correlation with crypto wash trading than liquidity jump and demonstrate that
treatment on wash trading significantly reduces the level of liquidity
diffusion, but only marginally reduces that of liquidity jump. We confirm that
the autoregressive models are highly effective in modeling the
liquidity-adjusted return with and without the treatment on wash trading. We
argue that treatment on wash trading is unnecessary in modeling established
crypto assets that trade in unregulated but mainstream exchanges.

arXiv link: http://arxiv.org/abs/2404.07222v3

Econometrics arXiv updated paper (originally submitted: 2024-03-23)

Debiased Machine Learning when Nuisance Parameters Appear in Indicator Functions

Authors: Gyungbae Park

This paper studies debiased machine learning when nuisance parameters appear
in indicator functions. An important example is maximized average welfare gain
under optimal treatment assignment rules. For asymptotically valid inference
for a parameter of interest, the current literature on debiased machine
learning relies on Gateaux differentiability of the functions inside moment
conditions, which does not hold when nuisance parameters appear in indicator
functions. In this paper, we propose smoothing the indicator functions, and
develop an asymptotic distribution theory for this class of models. The
asymptotic behavior of the proposed estimator exhibits a trade-off between bias
and variance due to smoothing. We study how a parameter which controls the
degree of smoothing can be chosen optimally to minimize an upper bound of the
asymptotic mean squared error. A Monte Carlo simulation supports the asymptotic
distribution theory, and an empirical example illustrates the implementation of
the method.
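
The core device is to replace the indicator inside the moment function with a smoothed surrogate whose bandwidth governs the bias-variance trade-off discussed above. A minimal sketch using a normal-CDF smoother; the choice of smoother and the bandwidths shown are illustrative.

```python
import numpy as np
from scipy.stats import norm

def smoothed_indicator(v, h):
    """Smooth surrogate for 1{v > 0}; approaches the indicator as h -> 0."""
    return norm.cdf(v / h)

v = np.linspace(-1, 1, 5)
for h in (0.5, 0.1, 0.01):
    print(h, smoothed_indicator(v, h).round(3))
```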

arXiv link: http://arxiv.org/abs/2403.15934v2

Econometrics arXiv updated paper (originally submitted: 2024-03-23)

Difference-in-Differences with Unpoolable Data

Authors: Sunny Karim, Matthew D. Webb, Nichole Austin, Erin Strumpf

Difference-in-differences (DID) is commonly used to estimate treatment
effects but is infeasible in settings where data are unpoolable due to privacy
concerns or legal restrictions on data sharing, particularly across
jurisdictions. In this study, we identify and relax the assumption of data
poolability in DID estimation. We propose an innovative approach to estimate
DID with unpoolable data (UN-DID) which can accommodate covariates, multiple
groups, and staggered adoption. Through analytical proofs and Monte Carlo
simulations, we show that UN-DID and conventional DID estimates of the average
treatment effect and standard errors are equal and unbiased in settings without
covariates. With covariates, both methods produce estimates that are unbiased,
equivalent, and converge to the true value. The estimates differ slightly but
the statistical inference and substantive conclusions remain the same. Two
empirical examples with real-world data further underscore UN-DID's utility.
The UN-DID method allows the estimation of cross-jurisdictional treatment
effects with unpoolable data, enabling better counterfactuals to be used and
new research questions to be answered.
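
In the simplest no-covariates case, the idea reduces to computing the 2x2 DiD from group-by-period means that each jurisdiction can share without pooling microdata. A toy sketch with made-up means:

```python
def un_did(treated_means, control_means):
    """Each argument: dict with 'pre' and 'post' mean outcomes from one jurisdiction."""
    return ((treated_means["post"] - treated_means["pre"])
            - (control_means["post"] - control_means["pre"]))

# jurisdiction A (treated) and jurisdiction B (control) share only their means
print(un_did({"pre": 2.0, "post": 3.5}, {"pre": 1.8, "post": 2.3}))  # 1.0
```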

arXiv link: http://arxiv.org/abs/2403.15910v3

Econometrics arXiv paper, submitted: 2024-03-22

Tests for almost stochastic dominance

Authors: Amparo Baíllo, Javier Cárcamo, Carlos Mora-Corral

We introduce a 2-dimensional stochastic dominance (2DSD) index to
characterize both strict and almost stochastic dominance. Based on this index,
we derive an estimator for the minimum violation ratio (MVR), also known as the
critical parameter, of the almost stochastic ordering condition between two
variables. We determine the asymptotic properties of the empirical 2DSD index
and MVR for the most frequently used stochastic orders. We also provide
conditions under which the bootstrap estimators of these quantities are
strongly consistent. As an application, we develop consistent bootstrap testing
procedures for almost stochastic dominance. The performance of the tests is
checked via simulations and the analysis of real data.
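
As background, the standard empirical violation ratio for "almost" first-order stochastic dominance compares the area where one empirical CDF lies above the other to the total area between the two CDFs; the 2DSD index and MVR estimator build on this type of quantity. A sketch of the basic computation (not the paper's estimator or its bootstrap test):

```python
import numpy as np

def fsd_violation_ratio(x, y):
    """Empirical violation ratio for 'X almost first-order dominates Y'."""
    grid = np.sort(np.concatenate([x, y]))
    Fx = np.searchsorted(np.sort(x), grid, side="right") / len(x)
    Fy = np.searchsorted(np.sort(y), grid, side="right") / len(y)
    diff, dx = Fx - Fy, np.diff(grid)            # CDFs are constant between grid points
    violation = np.sum(np.maximum(diff[:-1], 0) * dx)   # area where Fx > Fy
    total = np.sum(np.abs(diff[:-1]) * dx)               # total area between CDFs
    return violation / total                      # undefined (nan) if the CDFs coincide
```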

arXiv link: http://arxiv.org/abs/2403.15258v1

Econometrics arXiv updated paper (originally submitted: 2024-03-22)

Modelling with Sensitive Variables

Authors: Felix Chan, Laszlo Matyas, Agoston Reguly

The paper deals with models in which the dependent variable, some explanatory
variables, or both represent sensitive data. We introduce a novel
discretization method that preserves data privacy when working with such
variables. A multiple discretization method is proposed that utilizes
information from the different discretization schemes. We show convergence in
distribution for the unobserved variable and derive the asymptotic properties
of the OLS estimator for linear models. Monte Carlo simulation experiments
presented support our theoretical findings. Finally, we contrast our method
with a differential privacy method to estimate the Australian gender wage gap.

arXiv link: http://arxiv.org/abs/2403.15220v3

Econometrics arXiv paper, submitted: 2024-03-22

Fast TTC Computation

Authors: Irene Aldridge

This paper proposes a fast Markov Matrix-based methodology for computing Top
Trading Cycles (TTC) that delivers O(1) computational speed, that is, speed
independent of the number of agents and objects in the system. The proposed
methodology is well suited for complex large-dimensional problems like housing
choice. The methodology retains all the properties of TTC, namely,
Pareto-efficiency, individual rationality and strategy-proofness.
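
For reference, the textbook cycle-finding version of TTC that the proposed Markov-matrix method is meant to accelerate can be written as below. This is the classical algorithm, not the paper's O(1) computation, and the small preference profile is made up.

```python
def top_trading_cycles(prefs, endowment):
    """prefs: agent -> ordered list of houses; endowment: agent -> owned house."""
    owner = {h: a for a, h in endowment.items()}
    assignment, remaining = {}, set(prefs)
    while remaining:
        favourite, points_to = {}, {}
        for a in remaining:
            favourite[a] = next(h for h in prefs[a] if owner[h] in remaining)
            points_to[a] = owner[favourite[a]]
        # walk the pointer graph until a node repeats: that closes a cycle
        path, cur = [], next(iter(remaining))
        while cur not in path:
            path.append(cur)
            cur = points_to[cur]
        for a in path[path.index(cur):]:
            assignment[a] = favourite[a]
            remaining.discard(a)
    return assignment

prefs = {"a1": ["h2", "h1", "h3"], "a2": ["h1", "h2", "h3"], "a3": ["h1", "h3", "h2"]}
endow = {"a1": "h1", "a2": "h2", "a3": "h3"}
print(top_trading_cycles(prefs, endow))   # a1 and a2 trade houses; a3 keeps h3
```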

arXiv link: http://arxiv.org/abs/2403.15111v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2024-03-21

Estimating Causal Effects with Double Machine Learning -- A Method Evaluation

Authors: Jonathan Fuhr, Philipp Berens, Dominik Papies

The estimation of causal effects with observational data continues to be a
very active research area. In recent years, researchers have developed new
frameworks which use machine learning to relax classical assumptions necessary
for the estimation of causal effects. In this paper, we review one of the most
prominent methods - "double/debiased machine learning" (DML) - and empirically
evaluate it by comparing its performance on simulated data relative to more
traditional statistical methods, before applying it to real-world data. Our
findings indicate that the application of a suitably flexible machine learning
algorithm within DML improves the adjustment for various nonlinear confounding
relationships. This advantage enables a departure from traditional functional
form assumptions typically necessary in causal effect estimation. However, we
demonstrate that the method continues to critically depend on standard
assumptions about causal structure and identification. When estimating the
effects of air pollution on housing prices in our application, we find that DML
estimates are consistently larger than estimates of less flexible methods. From
our overall results, we provide actionable recommendations for specific choices
researchers must make when applying DML in practice.
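
A minimal cross-fitted partialling-out sketch for a partially linear model, in the spirit of the DML estimators reviewed above. Random forests are used as an example learner, and the standard error is the usual sandwich form for this moment condition; neither is claimed to match the authors' exact implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

def dml_plm(y, d, X, n_splits=5, seed=0):
    """Cross-fitted partialling-out estimate of theta in Y = theta*D + g(X) + e."""
    y_res = np.zeros(len(y))
    d_res = np.zeros(len(d))
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        my = RandomForestRegressor(random_state=seed).fit(X[train], y[train])
        md = RandomForestRegressor(random_state=seed).fit(X[train], d[train])
        y_res[test] = y[test] - my.predict(X[test])   # residualised outcome
        d_res[test] = d[test] - md.predict(X[test])   # residualised treatment
    theta = (d_res @ y_res) / (d_res @ d_res)         # final-stage OLS on residuals
    psi2 = np.mean(((y_res - theta * d_res) * d_res) ** 2)
    se = np.sqrt(psi2 / len(y)) / np.mean(d_res ** 2)
    return theta, se
```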

arXiv link: http://arxiv.org/abs/2403.14385v2

Econometrics arXiv updated paper (originally submitted: 2024-03-21)

A Gaussian smooth transition vector autoregressive model: An application to the macroeconomic effects of severe weather shocks

Authors: Markku Lanne, Savi Virolainen

We introduce a new smooth transition vector autoregressive model with a
Gaussian conditional distribution and transition weights that, for a $p$th
order model, depend on the full distribution of the preceding $p$ observations.
Specifically, the transition weight of each regime increases in its relative
weighted likelihood. This data-driven approach facilitates capturing complex
switching dynamics, enhancing the identification of gradual regime shifts. In
an empirical application to the macroeconomic effects of a severe weather
shock, we find that in monthly U.S. data from 1961:1 to 2022:3, the shock has
a stronger impact in the regime prevailing in the early part of the sample and in
certain crisis periods than in the regime dominating the latter part of the
sample. This suggests overall adaptation of the U.S. economy to severe weather
over time.

arXiv link: http://arxiv.org/abs/2403.14216v3

Econometrics arXiv updated paper (originally submitted: 2024-03-20)

Fused LASSO as Non-Crossing Quantile Regression

Authors: Tibor Szendrei, Arnab Bhattacharjee, Mark E. Schaffer

Growth-at-Risk is vital for empirical macroeconomics but is often susceptible to
quantile crossing due to data limitations. While existing literature addresses
this through post-processing of the fitted quantiles, these methods do not
correct the estimated coefficients. We advocate for imposing non-crossing
constraints during estimation and demonstrate their equivalence to fused LASSO
with quantile-specific shrinkage parameters. By re-examining Growth-at-Risk
through an interquantile shrinkage lens, we achieve improved left-tail
forecasts and better identification of variables that drive quantile variation.
We show that these improvements have ramifications for policy tools such as
Expected Shortfall and Quantile Local Projections.

arXiv link: http://arxiv.org/abs/2403.14036v3

Econometrics arXiv updated paper (originally submitted: 2024-03-20)

Policy Relevant Treatment Effects with Multidimensional Unobserved Heterogeneity

Authors: Takuya Ura, Lina Zhang

This paper provides a unified framework for bounding policy relevant
treatment effects using instrumental variables. In this framework, the
treatment selection may depend on multidimensional unobserved heterogeneity. We
derive bilinear constraints on the target parameter by extracting information
from identifiable estimands. We apply a convex relaxation method to these
bilinear constraints and provide conservative yet computationally simple
bounds. Our convex-relaxation bounds extend and robustify the bounds by
Mogstad, Santos, and Torgovitsky (2018) which require the threshold-crossing
structure for the treatment: if this condition holds, our bounds are simplified
to theirs for a large class of target parameters; even if it does not, our
bounds include the true parameter value whereas theirs may not and are
sometimes empty. Linear shape restrictions can be easily incorporated to narrow
the proposed bounds. Numerical and simulation results illustrate the
informativeness of our convex-relaxation bounds.

arXiv link: http://arxiv.org/abs/2403.13738v2

Econometrics arXiv paper, submitted: 2024-03-20

Robust Inference in Locally Misspecified Bipartite Networks

Authors: Luis E. Candelaria, Yichong Zhang

This paper introduces a methodology to conduct robust inference in bipartite
networks under local misspecification. We focus on a class of dyadic network
models with misspecified conditional moment restrictions. The framework of
misspecification is local, as the effect of misspecification varies with the
sample size. We utilize this local asymptotic approach to construct a robust
estimator that is minimax optimal for the mean square error within a
neighborhood of misspecification. Additionally, we introduce bias-aware
confidence intervals that account for the effect of the local misspecification.
These confidence intervals have the correct asymptotic coverage for the true
parameter of interest under sparse network asymptotics. Monte Carlo experiments
demonstrate that the robust estimator performs well in finite samples and
sparse networks. As an empirical illustration, we study the formation of a
scientific collaboration network among economists.

arXiv link: http://arxiv.org/abs/2403.13725v1

Econometrics arXiv cross-link from q-fin.MF (q-fin.MF), submitted: 2024-03-20

Multifractal wavelet dynamic mode decomposition modeling for marketing time series

Authors: Mohamed Elshazli A. Zidan, Anouar Ben Mabrouk, Nidhal Ben Abdallah, Tawfeeq M. Alanazi

Marketing is the way we ensure our sales are the best in the market, our
prices the most accessible, and our clients satisfied, thus ensuring our brand
has the widest distribution. This requires sophisticated and advanced
understanding of the whole related network. Indeed, marketing data may exist in
different forms such as qualitative and quantitative data. However, in the
literature, it is easily noted that large bibliographies may be collected about
qualitative studies, while only a few studies adopt a quantitative point of
view. This is a major drawback that results in marketing science still focusing
on design, although the market is strongly dependent on quantities such as
money and time. Indeed, marketing data may form time series such as brand sales
in specified periods, brand-related prices over specified periods, market
shares, etc. The purpose of the present work is to investigate some marketing
models based on time series for various brands. This paper aims to combine the
dynamic mode decomposition and wavelet decomposition to study marketing series
for both prices and volume sales, in order to explore the effect of the time
scale on the persistence of brand sales in the market and on the forecasting of
such persistence, according to the characteristics of the brand and the related
market competition or competitors. Our study is based on a sample of Saudi
brands during the period 22 November 2017 to 30 December 2021.
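
One building block of the proposed combination is standard (exact) dynamic mode decomposition, which extracts modes and eigenvalues from a snapshot matrix via a truncated SVD. A compact sketch of that step alone; the wavelet and multifractal layers are not reproduced.

```python
import numpy as np

def dmd(snapshots, r):
    """snapshots: state x time matrix; r: truncation rank. Returns (eigvals, modes)."""
    X, Y = snapshots[:, :-1], snapshots[:, 1:]
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, V = U[:, :r], s[:r], Vh[:r].T
    A_tilde = U.T @ Y @ V / s          # low-rank linear operator U* Y V S^-1
    eigvals, W = np.linalg.eig(A_tilde)
    modes = Y @ V / s @ W              # exact DMD modes
    return eigvals, modes
```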

arXiv link: http://arxiv.org/abs/2403.13361v1

Econometrics arXiv paper, submitted: 2024-03-19

Composite likelihood estimation of stationary Gaussian processes with a view toward stochastic volatility

Authors: Mikkel Bennedsen, Kim Christensen, Peter Christensen

We develop a framework for composite likelihood inference of parametric
continuous-time stationary Gaussian processes. We derive the asymptotic theory
of the associated maximum composite likelihood estimator. We implement our
approach on a pair of models that has been proposed to describe the random
log-spot variance of financial asset returns. A simulation study shows that it
delivers good performance in these settings and improves upon a
method-of-moments estimation. In an application, we inspect the dynamic of an
intraday measure of spot variance computed with high-frequency data from the
cryptocurrency market. The empirical evidence supports a mechanism, where the
short- and long-term correlation structure of stochastic volatility are
decoupled in order to capture its properties at different time scales.

arXiv link: http://arxiv.org/abs/2403.12653v1

Econometrics arXiv paper, submitted: 2024-03-19

Inflation Target at Risk: A Time-varying Parameter Distributional Regression

Authors: Yunyun Wang, Tatsushi Oka, Dan Zhu

Macro variables frequently display time-varying distributions, driven by the
dynamic and evolving characteristics of economic, social, and environmental
factors that consistently reshape the fundamental patterns and relationships
governing these variables. To better understand the distributional dynamics
beyond the central tendency, this paper introduces a novel semi-parametric
approach for constructing time-varying conditional distributions, relying on
the recent advances in distributional regression. We present an efficient
precision-based Markov Chain Monte Carlo algorithm that simultaneously
estimates all model parameters while explicitly enforcing the monotonicity
condition on the conditional distribution function. Our model is applied to
construct the forecasting distribution of inflation for the U.S., conditional
on a set of macroeconomic and financial indicators. The risks of future
inflation deviating excessively high or low from the desired range are
carefully evaluated. Moreover, we provide a thorough discussion about the
interplay between inflation and unemployment rates during the Global Financial
Crisis, COVID, and the third quarter of 2023.

arXiv link: http://arxiv.org/abs/2403.12456v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-03-18

Robust Estimation and Inference for Categorical Data

Authors: Max Welz

While there is a rich literature on robust methodologies for contamination in
continuously distributed data, contamination in categorical data is largely
overlooked. This is regrettable because many datasets are categorical and
oftentimes suffer from contamination. Examples include inattentive responding
and bot responses in questionnaires or zero-inflated count data. We propose a
novel class of contamination-robust estimators of models for categorical data,
coined $C$-estimators (“$C$” for categorical). We show that the countable and
possibly finite sample space of categorical data results in non-standard
theoretical properties. Notably, in contrast to classic robustness theory,
$C$-estimators can be simultaneously robust and fully efficient at the
postulated model. In addition, a certain particularly robust specification
fails to be asymptotically Gaussian at the postulated model, but is
asymptotically Gaussian in the presence of contamination. We furthermore
propose a diagnostic test to identify categorical outliers and demonstrate the
enhanced robustness of $C$-estimators in a simulation study.

arXiv link: http://arxiv.org/abs/2403.11954v3

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2024-03-17

Identification of Information Structures in Bayesian Games

Authors: Masaki Miyashita

To what extent can an external observer observing an equilibrium action
distribution in an incomplete information game infer the underlying information
structure? We investigate this issue in a general linear-quadratic-Gaussian
framework. A simple class of canonical information structures is offered and
proves rich enough to rationalize any possible equilibrium action distribution
that can arise under an arbitrary information structure. We show that the class
is parsimonious in the sense that the relevant parameters can be uniquely
pinned down by an observed equilibrium outcome, up to some qualifications. Our
result implies, for example, that the accuracy of each agent's signal about the
state is identified, as measured by how much observing the signal reduces the
state variance. Moreover, we show that a canonical information structure
characterizes the lower bound on the amount by which each agent's signal can
reduce the state variance, across all observationally equivalent information
structures. The lower bound is tight, for example, when the actual information
structure is uni-dimensional, or when there are no strategic interactions among
agents, but in general, there is a gap since agents' strategic motives confound
their private information about fundamental and strategic uncertainty.

arXiv link: http://arxiv.org/abs/2403.11333v1

Econometrics arXiv paper, submitted: 2024-03-17

Nonparametric Identification and Estimation with Non-Classical Errors-in-Variables

Authors: Kirill S. Evdokimov, Andrei Zeleneev

This paper considers nonparametric identification and estimation of the
regression function when a covariate is mismeasured. The measurement error need
not be classical. Employing the small measurement error approximation, we
establish nonparametric identification under weak and easy-to-interpret
conditions on the instrumental variable. The paper also provides nonparametric
estimators of the regression function and derives their rates of convergence.

arXiv link: http://arxiv.org/abs/2403.11309v1

Econometrics arXiv updated paper (originally submitted: 2024-03-16)

Comprehensive OOS Evaluation of Predictive Algorithms with Statistical Decision Theory

Authors: Jeff Dominitz, Charles F. Manski

We argue that comprehensive out-of-sample (OOS) evaluation using statistical
decision theory (SDT) should replace the current practice of K-fold and Common
Task Framework validation in machine learning (ML) research on prediction. SDT
provides a formal frequentist framework for performing comprehensive OOS
evaluation across all possible (1) training samples, (2) populations that may
generate training data, and (3) populations of prediction interest. Regarding
feature (3), we emphasize that SDT requires the practitioner to directly
confront the possibility that the future may not look like the past and to
account for a possible need to extrapolate from one population to another when
building a predictive algorithm. For specificity, we consider treatment choice
using conditional predictions with alternative restrictions on the state space
of possible populations that may generate training data. We discuss application
of SDT to the problem of predicting patient illness to inform clinical decision
making. SDT is simple in abstraction, but it is often computationally demanding
to implement. We call on ML researchers, econometricians, and statisticians to
expand the domain within which implementation of SDT is tractable.

arXiv link: http://arxiv.org/abs/2403.11016v3

Econometrics arXiv updated paper (originally submitted: 2024-03-16)

Macroeconomic Spillovers of Weather Shocks across U.S. States

Authors: Emanuele Bacchiocchi, Andrea Bastianin, Graziano Moramarco

We estimate the short-run effects of weather-related disasters on local
economic activity and cross-border spillovers that operate through economic
linkages between U.S. states. To this end, we use emergency declarations
triggered by natural disasters and estimate their effects using a monthly
Global Vector Autoregressive (GVAR) model for U.S. states. Impulse responses
highlight the nationwide effects of weather-related disasters that hit
individual regions. Taking into account economic linkages between states allows
capturing much stronger spillovers than those associated with mere spatial
proximity. The results underscore the importance of geographic heterogeneity
for impact evaluation and the critical role of supply-side propagation
mechanisms.

arXiv link: http://arxiv.org/abs/2403.10907v3

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2024-03-15

Limits of Approximating the Median Treatment Effect

Authors: Raghavendra Addanki, Siddharth Bhandari

Average Treatment Effect (ATE) estimation is a well-studied problem in causal
inference. However, it does not necessarily capture the heterogeneity in the
data, and several approaches have been proposed to tackle the issue, including
estimating the Quantile Treatment Effects. In the finite population setting
containing $n$ individuals, with treatment and control values denoted by the
potential outcome vectors $a, b$, much of the prior work
focused on estimating median$(a) -$ median$(b)$, where
median$(x)$ denotes the median value in the sorted ordering of all the
values in vector $x$. It is known that estimating the difference of
medians is easier than the desired estimand of median$(a-b)$, called
the Median Treatment Effect (MTE). The fundamental problem of causal inference
-- for every individual $i$, we can only observe one of the potential outcome
values, i.e., either the value $a_i$ or $b_i$, but not both -- makes estimating
MTE particularly challenging. In this work, we argue that MTE is not estimable
and detail a novel notion of approximation that relies on the sorted order of
the values in $a-b$. Next, we identify a quantity called variability
that exactly captures the complexity of MTE estimation. By drawing connections
to instance-optimality studied in theoretical computer science, we show that
every algorithm for estimating the MTE obtains an approximation error that is
no better than the error of an algorithm that computes variability. Finally, we
provide a simple linear time algorithm for computing the variability exactly.
Unlike much prior work, a particular highlight of our work is that we make no
assumptions about how the potential outcome vectors are generated or how they
are correlated, except that the potential outcome values are $k$-ary, i.e.,
take one of $k$ discrete values.
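
A tiny numerical reminder of the gap between the two estimands discussed above: the difference of medians need not equal the median of the unit-level differences.

```python
import numpy as np

a = np.array([1, 5, 9])               # treated potential outcomes
b = np.array([2, 3, 10])              # control potential outcomes
print(np.median(a) - np.median(b))    # 5 - 3 = 2
print(np.median(a - b))               # median of [-1, 2, -1] = -1
```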

arXiv link: http://arxiv.org/abs/2403.10618v1

Econometrics arXiv updated paper (originally submitted: 2024-03-15)

Testing Goodness-of-Fit for Conditional Distributions: A New Perspective based on Principal Component Analysis

Authors: Cui Rui, Li Yuhao

This paper introduces a novel goodness-of-fit test technique for parametric
conditional distributions. The proposed tests are based on a residual marked
empirical process, for which we develop a conditional Principal Component
Analysis. The obtained components provide a basis for various types of new
tests in addition to the omnibus one. Component tests based on each
component serve as experts in detecting certain directions. Smooth tests that
assemble a few components are also of great use in practice. To further improve
testing performance, we introduce a component selection approach, aiming to
identify the most contributory components. The finite sample performance of the
proposed tests is illustrated through Monte Carlo experiments.

arXiv link: http://arxiv.org/abs/2403.10352v2

Econometrics arXiv cross-link from cs.CL (cs.CL), submitted: 2024-03-15

A Big Data Approach to Understand Sub-national Determinants of FDI in Africa

Authors: A. Fronzetti Colladon, R. Vestrelli, S. Bait, M. M. Schiraldi

Various macroeconomic and institutional factors hinder FDI inflows, including
corruption, trade openness, access to finance, and political instability.
Existing research mostly focuses on country-level data, with limited
exploration of firm-level data, especially in developing countries. Recognizing
this gap, recent calls for research emphasize the need for qualitative data
analysis to delve into FDI determinants, particularly at the regional level.
This paper proposes a novel methodology, based on text mining and social
network analysis, to get information from more than 167,000 online news
articles to quantify regional-level (sub-national) attributes affecting FDI
ownership in African companies. Our analysis extends information on obstacles
to industrial development as mapped by the World Bank Enterprise Surveys.
Findings suggest that regional (sub-national) structural and institutional
characteristics can play an important role in determining foreign ownership.

arXiv link: http://arxiv.org/abs/2403.10239v1

Econometrics arXiv updated paper (originally submitted: 2024-03-13)

Invalid proxies and volatility changes

Authors: Giovanni Angelini, Luca Fanelli, Luca Neri

When in proxy-SVARs the covariance matrix of VAR disturbances is subject to
exogenous, permanent breaks that cause IRFs to change across volatility
regimes, even strong, exogenous external instruments yield inconsistent
estimates of the dynamic causal effects. However, if these volatility shifts
are properly incorporated into the analysis through (testable) "stability
restrictions", we demonstrate that the target IRFs are point-identified and can
be estimated consistently under a necessary and sufficient rank condition. If
the shifts in volatility are sufficiently informative, standard asymptotic
inference remains valid even with (i) local-to-zero covariance between the
proxies and the instrumented structural shocks, and (ii) potential failures of
instrument exogeneity. Intuitively, shifts in volatility act similarly to
strong instruments that are correlated with both the target and non-target
shocks. We illustrate the effectiveness of our approach by revisiting a seminal
fiscal proxy-SVAR for the US economy. We detect a sharp change in the size of
the tax multiplier when the narrative tax instrument is complemented with the
decline in unconditional volatility observed during the transition from the
Great Inflation to the Great Moderation. The narrative tax instrument
contributes to identify the tax shock in both regimes, although our empirical
analysis raises concerns about its "statistical" validity.

arXiv link: http://arxiv.org/abs/2403.08753v3

Econometrics arXiv updated paper (originally submitted: 2024-03-13)

Identifying Treatment and Spillover Effects Using Exposure Contrasts

Authors: Michael P. Leung

To report spillover effects, a common practice is to regress outcomes on
statistics capturing treatment variation among neighboring units. This paper
studies the causal interpretation of nonparametric analogs of these estimands,
which we refer to as exposure contrasts. We demonstrate that their signs can be
inconsistent with those of the unit-level effects of interest even under
unconfounded assignment. We then provide interpretable restrictions under which
exposure contrasts are sign preserving and therefore have causal
interpretations. We discuss the implications of our results for
cluster-randomized trials, network experiments, and observational settings with
peer effects in selection into treatment.

arXiv link: http://arxiv.org/abs/2403.08183v3

Econometrics arXiv updated paper (originally submitted: 2024-03-12)

Imputation of Counterfactual Outcomes when the Errors are Predictable

Authors: Silvia Goncalves, Serena Ng

A crucial input into causal inference is the imputed counterfactual outcome.
Imputation error can arise because of sampling uncertainty from estimating the
prediction model using the untreated observations, or from out-of-sample
information not captured by the model. While the literature has focused on
sampling uncertainty, it vanishes with the sample size. Often overlooked is the
possibility that the out-of-sample error can be informative about the missing
counterfactual outcome if it is mutually or serially correlated. Motivated by
the best linear unbiased predictor (BLUP) of Goldberger (1962) in a time
series setting, we propose an improved predictor of the potential outcome when the
errors are correlated. The proposed predictor is practical as it is not restricted
to linear models, can be used with consistent estimators already developed, and
improves mean-squared error for a large class of strong mixing error processes.
Ignoring predictability in the errors can distort conditional inference.
However, the precise impact will depend on the choice of estimator as well as
the realized values of the residuals.

arXiv link: http://arxiv.org/abs/2403.08130v2

Econometrics arXiv updated paper (originally submitted: 2024-03-12)

Partial Identification of Individual-Level Parameters Using Aggregate Data in a Nonparametric Model

Authors: Sarah Moon

I develop a methodology to partially identify linear combinations of
conditional mean outcomes when the researcher only has access to aggregate
data. Unlike the existing literature, I only allow for marginal, not joint,
distributions of covariates in my model of aggregate data. Bounds are obtained
by solving an optimization program and can easily accommodate additional
polyhedral shape restrictions. I provide an empirical illustration of the
method to Rhode Island standardized exam data.

arXiv link: http://arxiv.org/abs/2403.07236v7

Econometrics arXiv updated paper (originally submitted: 2024-03-11)

Partially identified heteroskedastic SVARs

Authors: Emanuele Bacchiocchi, Andrea Bastianin, Toru Kitagawa, Elisabetta Mirto

This paper studies the identification of Structural Vector Autoregressions
(SVARs) exploiting a break in the variances of the structural shocks.
Point-identification for this class of models relies on an eigen-decomposition
involving the covariance matrices of reduced-form errors and requires that all
the eigenvalues are distinct. This point-identification, however, fails in the
presence of multiplicity of eigenvalues. This occurs in an empirically relevant
scenario where, for instance, only a subset of structural shocks had the break
in their variances, or where a group of variables shows a variance shift of the
same amount. Together with zero or sign restrictions on the structural
parameters and impulse responses, we derive the identified sets for impulse
responses and show how to compute them. We perform inference on the impulse
response functions, building on the robust Bayesian approach developed for set
identified SVARs. To illustrate our proposal, we present an empirical example
based on the literature on the global crude oil market where the identification
is expected to fail due to multiplicity of eigenvalues.

arXiv link: http://arxiv.org/abs/2403.06879v2

Econometrics arXiv updated paper (originally submitted: 2024-03-11)

Data-Driven Tuning Parameter Selection for High-Dimensional Vector Autoregressions

Authors: Anders Bredahl Kock, Rasmus Søndergaard Pedersen, Jesper Riis-Vestergaard Sørensen

Lasso-type estimators are routinely used to estimate high-dimensional time
series models. The theoretical guarantees established for these estimators
typically require the penalty level to be chosen in a suitable fashion often
depending on unknown population quantities. Furthermore, the resulting
estimates and the number of variables retained in the model depend crucially on
the chosen penalty level. However, there is currently no theoretically founded
guidance for this choice in the context of high-dimensional time series.
Instead, one resorts to selecting the penalty level in an ad hoc manner using,
e.g., information criteria or cross-validation. We resolve this problem by
considering estimation of the perhaps most commonly employed multivariate time
series model, the linear vector autoregressive (VAR) model, and propose
versions of the Lasso, post-Lasso, and square-root Lasso estimators with
penalization chosen in a fully data-driven way. The theoretical guarantees that
we establish for the resulting estimation and prediction errors match those
currently available for methods based on infeasible choices of penalization. We
thus provide a first solution for choosing the penalization in high-dimensional
time series models.
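For concreteness, a bare-bones sketch of equation-by-equation Lasso estimation of a
VAR(1) on simulated data. The penalty level below is a fixed placeholder; the paper's
contribution is precisely a fully data-driven rule for choosing it, which is not
reproduced here.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
T, k = 400, 10
A = np.diag(np.full(k, 0.5))        # sparse true VAR(1) coefficient matrix
A[0, 1] = 0.3
Y = np.zeros((T, k))
for t in range(1, T):
    Y[t] = Y[t - 1] @ A.T + rng.normal(scale=0.5, size=k)

X, Z = Y[:-1], Y[1:]                # lagged regressors and responses
A_hat = np.vstack([
    Lasso(alpha=0.05, fit_intercept=False).fit(X, Z[:, j]).coef_
    for j in range(k)
])
print("nonzero coefficients per equation:", (np.abs(A_hat) > 1e-8).sum(axis=1))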

arXiv link: http://arxiv.org/abs/2403.06657v2

Econometrics arXiv paper, submitted: 2024-03-10

Estimating Factor-Based Spot Volatility Matrices with Noisy and Asynchronous High-Frequency Data

Authors: Degui Li, Oliver Linton, Haoxuan Zhang

We propose a new estimator of high-dimensional spot volatility matrices
satisfying a low-rank plus sparse structure from noisy and asynchronous
high-frequency data collected for an ultra-large number of assets. The noise
processes are allowed to be temporally correlated, heteroskedastic,
asymptotically vanishing and dependent on the efficient prices. We define a
kernel-weighted pre-averaging method to jointly tackle the microstructure noise
and asynchronicity issues, and we obtain uniformly consistent estimates for
latent prices. We impose a continuous-time factor model with time-varying
factor loadings on the price processes, and estimate the common factors and
loadings via a local principal component analysis. Assuming a uniform sparsity
condition on the idiosyncratic volatility structure, we combine the POET and
kernel-smoothing techniques to estimate the spot volatility matrices for both
the latent prices and idiosyncratic errors. Under some mild restrictions, the
estimated spot volatility matrices are shown to be uniformly consistent under
various matrix norms. We provide Monte-Carlo simulation and empirical studies
to examine the numerical performance of the developed estimation methodology.

arXiv link: http://arxiv.org/abs/2403.06246v1

Econometrics arXiv updated paper (originally submitted: 2024-03-09)

Locally Regular and Efficient Tests in Non-Regular Semiparametric Models

Authors: Adam Lee

This paper considers hypothesis testing in semiparametric models which may be
non-regular. I show that C($\alpha$) style tests are locally regular under mild
conditions, including in cases where locally regular estimators do not exist,
such as models which are (semiparametrically) weakly identified. I characterise
the appropriate limit experiment in which to study local (asymptotic)
optimality of tests in the non-regular case and generalise classical power
bounds to this case. I give conditions under which these power bounds are
attained by the proposed C($\alpha$) style tests. The application of the theory
to a single index model and an instrumental variables model is worked out in
detail.

arXiv link: http://arxiv.org/abs/2403.05999v2

Econometrics arXiv updated paper (originally submitted: 2024-03-09)

Estimating Causal Effects of Discrete and Continuous Treatments with Binary Instruments

Authors: Victor Chernozhukov, Iván Fernández-Val, Sukjin Han, Kaspar Wüthrich

We propose an instrumental variable framework for identifying and estimating
causal effects of discrete and continuous treatments with binary instruments.
The basis of our approach is a local copula representation of the joint
distribution of the potential outcomes and unobservables determining treatment
assignment. This representation allows us to introduce an identifying
assumption, so-called copula invariance, that restricts the local dependence of
the copula with respect to the treatment propensity. We show that copula
invariance identifies treatment effects for the entire population and other
subpopulations such as the treated. The identification results are constructive
and lead to practical estimation and inference procedures based on distribution
regression. An application to estimating the effect of sleep on well-being
uncovers interesting patterns of heterogeneity.

arXiv link: http://arxiv.org/abs/2403.05850v2

Econometrics arXiv updated paper (originally submitted: 2024-03-09)

Semiparametric Inference for Regression-Discontinuity Designs

Authors: Weiwei Jiang, Rong J. B. Zhu

Treatment effects in regression discontinuity designs (RDDs) are often
estimated using local regression methods. Hahn (2001) demonstrated that the
identification of the average treatment effect at the cutoff in RDDs relies on
the unconfoundedness assumption and that, without this assumption, only the
local average treatment effect at the cutoff can be identified. In this paper,
we propose a semiparametric framework tailored for identifying the average
treatment effect in RDDs, eliminating the need for the unconfoundedness
assumption. Our approach globally conceptualizes the identification as a
partially linear modeling problem, with the coefficient of a specified
polynomial function of propensity score in the linear component capturing the
average treatment effect. This identification result underpins our
semiparametric inference for RDDs, employing the $P$-spline method to
approximate the nonparametric function and establishing a procedure for
conducting inference within this framework. Through theoretical analysis, we
demonstrate that our global approach achieves a faster convergence rate
compared to the local method. Monte Carlo simulations further confirm that the
proposed method consistently outperforms alternatives across various scenarios.
Furthermore, applications to real-world datasets illustrate that our global
approach can provide more reliable inference for practical problems.

arXiv link: http://arxiv.org/abs/2403.05803v3

Econometrics arXiv updated paper (originally submitted: 2024-03-08)

Non-robustness of diffusion estimates on networks with measurement error

Authors: Arun G. Chandrasekhar, Paul Goldsmith-Pinkham, Tyler H. McCormick, Samuel Thau, Jerry Wei

Network diffusion models are used to study things like disease transmission,
information spread, and technology adoption. However, small amounts of
mismeasurement are extremely likely in the networks constructed to
operationalize these models. We show that estimates of diffusions are highly
non-robust to this measurement error. First, we show that even when measurement
error is vanishingly small, such that the share of missed links is close to
zero, forecasts about the extent of diffusion will greatly underestimate the
truth. Second, a small mismeasurement in the identity of the initial seed
generates a large shift in the location of the expected diffusion path. We show
that both of these results still hold when the vanishing measurement error is
only local in nature. Such non-robustness in forecasting exists even under
conditions where the basic reproductive number is consistently estimable.
Possible solutions, such as estimating the measurement error or implementing
widespread detection efforts, still face difficulties because the number of
missed links is so small. Finally, we conduct Monte Carlo simulations on
simulated networks and on real networks from three settings: travel data from the
COVID-19 pandemic in the western US, a mobile phone marketing campaign in rural
India, and an insurance experiment in China.

arXiv link: http://arxiv.org/abs/2403.05704v4

Econometrics arXiv updated paper (originally submitted: 2024-03-07)

Nonparametric Regression under Cluster Sampling

Authors: Yuya Shimizu

This paper develops a general asymptotic theory for nonparametric kernel
regression in the presence of cluster dependence. We examine nonparametric
density estimation, Nadaraya-Watson kernel regression, and local linear
estimation. Our theory accommodates growing and heterogeneous cluster sizes. We
derive asymptotic conditional bias and variance, establish uniform consistency,
and prove asymptotic normality. Our findings reveal that under heterogeneous
cluster sizes, the asymptotic variance includes a new term reflecting
within-cluster dependence, which is overlooked when cluster sizes are presumed
to be bounded. We propose valid approaches for bandwidth selection and
inference, introduce estimators of the asymptotic variance, and demonstrate
their consistency. In simulations, we verify the effectiveness of the
cluster-robust bandwidth selection and show that the derived cluster-robust
confidence interval improves the coverage ratio. We illustrate the application
of these methods using a policy-targeting dataset in development economics.

arXiv link: http://arxiv.org/abs/2403.04766v2

Econometrics arXiv paper, submitted: 2024-03-07

A Logarithmic Mean Divisia Index Decomposition of CO$_2$ Emissions from Energy Use in Romania

Authors: Mariana Carmelia Balanica-Dragomir, Gabriel Murariu, Lucian Puiu Georgescu

Carbon emissions have become an alarming indicator and an intricate challenge at the
center of the debate about climate change. The continued reliance on fossil fuels for
economic progress, combined with the need to reduce carbon output, has turned into a
substantial global challenge. The aim of this paper is to examine the driving factors
of CO$_2$ emissions from the energy sector in Romania over the period 2008-2022 using
the log mean Divisia index (LMDI) method. The decomposition takes into account five
items: CO$_2$ emissions, primary energy resources, energy consumption, gross domestic
product and population, from which the contributions of carbon intensity, the energy
mix, generating efficiency, the economy, and population are calculated. The results
indicate that the generating efficiency effect (-90968.57) is the largest inhibiting
factor, while the economic effect (69084.04) is the largest positive factor, acting to
increase CO$_2$ emissions.
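As a reference point, the additive LMDI-I decomposition used in this kind of exercise
fits in a few lines. The numbers below are made up and serve only to show that the
factor contributions sum exactly to the total change in emissions; they are not the
paper's Romanian data.

import math

def log_mean(a, b):
    # logarithmic mean L(a, b) = (a - b) / (ln a - ln b), with L(a, a) = a
    return a if a == b else (a - b) / (math.log(a) - math.log(b))

def lmdi_additive(factors_0, factors_T):
    # total C is the product of the factors; attribute C_T - C_0 across factors
    c0, cT = math.prod(factors_0), math.prod(factors_T)
    w = log_mean(cT, c0)
    effects = [w * math.log(fT / f0) for f0, fT in zip(factors_0, factors_T)]
    return effects, cT - c0

# hypothetical factors: carbon intensity, energy mix, efficiency, GDP, population
effects, total_change = lmdi_additive([0.5, 1.2, 2.0, 100.0, 20.0],
                                      [0.45, 1.1, 1.6, 140.0, 19.0])
print([round(e, 2) for e in effects], "sum:", round(sum(effects), 2),
      "== total change:", round(total_change, 2))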

arXiv link: http://arxiv.org/abs/2403.04354v1

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2024-03-07

A dual approach to nonparametric characterization for random utility models

Authors: Nobuo Koida, Koji Shirai

This paper develops a novel characterization for random utility models (RUM),
which turns out to be a dual representation of the characterization by Kitamura
and Stoye (2018, ECMA). For a given family of budgets and its "patch"
representation à la Kitamura and Stoye, we construct a matrix $\Xi$ of which
each row vector indicates the structure of possible revealed preference
relations in each subfamily of budgets. Then, it is shown that a stochastic
demand system on the patches of budget lines, say $\pi$, is consistent with a
RUM, if and only if $\Xi\pi \geq 1$, where the RHS is the vector of
$1$'s. In addition to providing a concise quantifier-free characterization,
especially when $\pi$ is inconsistent with RUMs, the vector $\Xi\pi$ also
contains information concerning (1) sub-families of budgets in which cyclical
choices must occur with positive probabilities, and (2) the maximal possible
weights on rational choice patterns in a population. The notion of Chvátal
rank of polytopes and the duality theorem in linear programming play key roles
to obtain these results.
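A toy illustration of the resulting check (the matrix $\Xi$ below is made up;
constructing it from the budget family and its patches is the substantive step in the
paper): a stochastic demand system $\pi$ is consistent with a RUM if and only if every
entry of $\Xi\pi$ is at least one.

import numpy as np

Xi = np.array([[1, 1, 0, 2],            # hypothetical rows: one per subfamily of budgets
               [0, 2, 1, 1],
               [2, 0, 1, 1]])
pi = np.array([0.40, 0.30, 0.20, 0.10])  # hypothetical patch probabilities

v = Xi @ pi
print("Xi @ pi =", v)
print("consistent with a RUM:", bool(np.all(v >= 1)))
print("rows where the inequality fails:", np.where(v < 1)[0])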

arXiv link: http://arxiv.org/abs/2403.04328v3

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2024-03-07

Regularized DeepIV with Model Selection

Authors: Zihao Li, Hui Lan, Vasilis Syrgkanis, Mengdi Wang, Masatoshi Uehara

In this paper, we study nonparametric estimation of instrumental variable
(IV) regressions. While recent advancements in machine learning have introduced
flexible methods for IV estimation, they often encounter one or more of the
following limitations: (1) restricting the IV regression to be uniquely
identified; (2) requiring a minimax computation oracle, which is highly unstable
in practice; (3) the absence of a model selection procedure. In this paper, we
present the first method and analysis that can avoid all three limitations,
while still enabling general function approximation. Specifically, we propose a
minimax-oracle-free method called Regularized DeepIV (RDIV) regression that can
converge to the least-norm IV solution. Our method consists of two stages:
first, we learn the conditional distribution of covariates, and by utilizing
the learned distribution, we learn the estimator by minimizing a
Tikhonov-regularized loss function. We further show that our method allows
model selection procedures that can achieve the oracle rates in the
misspecified regime. When extended to an iterative estimator, our method
matches the current state-of-the-art convergence rate. Our method is a Tikhonov
regularized variant of the popular DeepIV method with a non-parametric MLE
first-stage estimator, and our results provide the first rigorous guarantees
for this empirically used method, showcasing the importance of regularization
which was absent from the original work.

arXiv link: http://arxiv.org/abs/2403.04236v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-03-07

Extracting Mechanisms from Heterogeneous Effects: An Identification Strategy for Mediation Analysis

Authors: Jiawei Fu

Understanding causal mechanisms is crucial for explaining and generalizing
empirical phenomena. Causal mediation analysis offers statistical techniques to
quantify the mediation effects. However, current methods often require multiple
ignorability assumptions or sophisticated research designs. In this paper, we
introduce a novel identification strategy that enables the simultaneous
identification and estimation of treatment and mediation effects. By combining
explicit and implicit mediation analysis, this strategy exploits heterogeneous
treatment effects through a new decomposition of total treatment effects. Monte
Carlo simulations demonstrate that the method is more accurate and precise
across various scenarios. To illustrate the efficiency and efficacy of our
method, we apply it to estimate the causal mediation effects in two studies
with distinct data structures, focusing on common pool resource governance and
voting information. Additionally, we have developed statistical software to
facilitate the implementation of our method.

arXiv link: http://arxiv.org/abs/2403.04131v5

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-03-06

Active Adaptive Experimental Design for Treatment Effect Estimation with Covariate Choices

Authors: Masahiro Kato, Akihiro Oga, Wataru Komatsubara, Ryo Inokuchi

This study designs an adaptive experiment for efficiently estimating average
treatment effects (ATEs). In each round of our adaptive experiment, an
experimenter sequentially samples an experimental unit, assigns a treatment,
and observes the corresponding outcome immediately. At the end of the
experiment, the experimenter estimates an ATE using the gathered samples. The
objective is to estimate the ATE with a smaller asymptotic variance. Existing
studies have designed experiments that adaptively optimize the propensity score
(treatment-assignment probability). As a generalization of such an approach, we
propose optimizing the covariate density as well as the propensity score.
First, we derive the efficient covariate density and propensity score that
minimize the semiparametric efficiency bound and find that optimizing both
covariate density and propensity score minimizes the semiparametric efficiency
bound more effectively than optimizing only the propensity score. Next, we
design an adaptive experiment using the efficient covariate density and
propensity score sequentially estimated during the experiment. Lastly, we
propose an ATE estimator whose asymptotic variance aligns with the minimized
semiparametric efficiency bound.

arXiv link: http://arxiv.org/abs/2403.03589v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-03-05

Demystifying and avoiding the OLS "weighting problem": Unmodeled heterogeneity and straightforward solutions

Authors: Tanvi Shinkre, Chad Hazlett

Researchers frequently estimate treatment effects by regressing outcomes (Y)
on treatment (D) and covariates (X). Even without unobserved confounding, the
coefficient on D yields a conditional-variance-weighted average of strata-wise
effects, not the average treatment effect. Scholars have proposed
characterizing the severity of these weights, evaluating resulting biases, or
changing investigators' target estimand to the conditional-variance-weighted
effect. We aim to demystify these weights, clarifying how they arise, what they
represent, and how to avoid them. Specifically, these weights reflect
misspecification bias from unmodeled treatment-effect heterogeneity. Rather
than diagnosing or tolerating them, we recommend avoiding the issue altogether,
by relaxing the standard regression assumption of "single linearity" to one of
"separate linearity" (of each potential outcome in the covariates),
accommodating heterogeneity. Numerous methods--including regression imputation
(g-computation), interacted regression, and mean balancing weights--satisfy
this assumption. In many settings, the efficiency cost to avoiding this
weighting problem altogether will be modest and worthwhile.
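A small simulation sketch of the point above (assumed data-generating process, not
taken from the paper): with heterogeneous effects and strata-dependent treatment
probabilities, the coefficient on D in a regression of Y on D and X is
conditional-variance-weighted, while regression imputation (g-computation) under
"separate linearity" recovers the ATE.

import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.binomial(1, 0.5, n).astype(float)   # binary covariate (stratum)
p = np.where(x == 1, 0.5, 0.1)              # treatment probability differs across strata
d = rng.binomial(1, p)
tau = np.where(x == 1, 2.0, 0.0)            # heterogeneous effect; ATE = 1.0
y = 1.0 + 0.5 * x + tau * d + rng.normal(size=n)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

# "Single linearity": Y on (1, D, X); coefficient on D is variance-weighted (about 1.47 here)
b_single = ols(np.column_stack([np.ones(n), d, x]), y)

# "Separate linearity": fit E[Y | X] within each arm, impute both potential outcomes, average
b1 = ols(np.column_stack([np.ones(d.sum()), x[d == 1]]), y[d == 1])
b0 = ols(np.column_stack([np.ones(n - d.sum()), x[d == 0]]), y[d == 0])
X_all = np.column_stack([np.ones(n), x])
ate_gcomp = np.mean(X_all @ b1 - X_all @ b0)

print("coef on D under single linearity:", round(b_single[1], 3))
print("g-computation ATE estimate:      ", round(ate_gcomp, 3))   # close to 1.0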

arXiv link: http://arxiv.org/abs/2403.03299v4

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-03-05

Triple/Debiased Lasso for Statistical Inference of Conditional Average Treatment Effects

Authors: Masahiro Kato

This study investigates the estimation and the statistical inference about
Conditional Average Treatment Effects (CATEs), which have garnered attention as
a metric representing individualized causal effects. In our data-generating
process, we assume linear models for the outcomes associated with binary
treatments and define the CATE as a difference between the expected outcomes of
these linear models. This study allows the linear models to be
high-dimensional, and our interest lies in consistent estimation and
statistical inference for the CATE. In high-dimensional linear regression, one
typical approach is to assume sparsity. However, in our study, we do not assume
sparsity directly. Instead, we consider sparsity only in the difference of the
linear models. We first use a doubly robust estimator to approximate this
difference and then regress the difference on covariates with Lasso
regularization. Although this regression estimator is consistent for the CATE,
we further reduce the bias using the techniques in double/debiased machine
learning (DML) and debiased Lasso, leading to $\sqrt{n}$-consistency and
confidence intervals. We refer to the debiased estimator as the triple/debiased
Lasso (TDL), applying both DML and debiased Lasso techniques. We confirm the
soundness of our proposed method through simulation studies.
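A stylized sketch of the core two-step idea on simulated, RCT-like data (known
propensity, no cross-fitting, and no final debiasing step, so it is not the paper's
full TDL procedure): form doubly robust pseudo-outcomes for individual effects, then
Lasso them on the covariates, exploiting sparsity of the effect difference rather
than of the two outcome models.

import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(8)
n, p = 5_000, 30
X = rng.normal(size=(n, p))
e = 0.5                                           # known propensity (randomized treatment)
D = rng.binomial(1, e, n)
cate = 1.0 + 2.0 * X[:, 0]                        # sparse CATE: depends on X[:, 0] only
y = X @ rng.normal(size=p) + D * cate + rng.normal(size=n)   # dense baseline outcome

# Outcome regressions by arm (nuisances), then the doubly robust pseudo-outcome
mu1 = LinearRegression().fit(X[D == 1], y[D == 1]).predict(X)
mu0 = LinearRegression().fit(X[D == 0], y[D == 0]).predict(X)
psi = mu1 - mu0 + D * (y - mu1) / e - (1 - D) * (y - mu0) / (1 - e)

fit = Lasso(alpha=0.05).fit(X, psi)               # Lasso of pseudo-outcome on covariates
print("intercept (true 1):", round(fit.intercept_, 2),
      "| coef on X0 (true 2):", round(fit.coef_[0], 2),
      "| nonzero coefficients:", int((np.abs(fit.coef_) > 1e-8).sum()))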

arXiv link: http://arxiv.org/abs/2403.03240v1

Econometrics arXiv updated paper (originally submitted: 2024-03-05)

Matrix-based Prediction Approach for Intraday Instantaneous Volatility Vector

Authors: Sung Hoon Choi, Donggyu Kim

In this paper, we introduce a novel method for predicting intraday
instantaneous volatility based on Ito semimartingale models using
high-frequency financial data. Several studies have highlighted stylized
volatility time series features, such as interday auto-regressive dynamics and
the intraday U-shaped pattern. To accommodate these volatility features, we
propose an interday-by-intraday instantaneous volatility matrix process that
can be decomposed into low-rank conditional expected instantaneous volatility
and noise matrices. To predict the low-rank conditional expected instantaneous
volatility matrix, we propose the Two-sIde Projected-PCA (TIP-PCA) procedure.
We establish asymptotic properties of the proposed estimators and conduct a
simulation study to assess the finite sample performance of the proposed
prediction method. Finally, we apply the TIP-PCA method to an out-of-sample
instantaneous volatility vector prediction study using high-frequency data from
the S&P 500 index and 11 sector index funds.

arXiv link: http://arxiv.org/abs/2403.02591v3

Econometrics arXiv paper, submitted: 2024-03-04

Applied Causal Inference Powered by ML and AI

Authors: Victor Chernozhukov, Christian Hansen, Nathan Kallus, Martin Spindler, Vasilis Syrgkanis

An introduction to the emerging fusion of machine learning and causal
inference. The book presents ideas from classical structural equation models
(SEMs) and their modern AI equivalent, directed acyclic graphs (DAGs) and
structural causal models (SCMs), and covers Double/Debiased Machine Learning
methods to do inference in such models using modern predictive tools.

arXiv link: http://arxiv.org/abs/2403.02467v1

Econometrics arXiv paper, submitted: 2024-03-04

Improved Tests for Mediation

Authors: Grant Hillier, Kees Jan van Garderen, Noud van Giersbergen

Testing for a mediation effect is important in many disciplines, but is made
difficult - even asymptotically - by the influence of nuisance parameters.
Classical tests such as likelihood ratio (LR) and Wald (Sobel) tests have very
poor power properties in parts of the parameter space, and many attempts have
been made to produce improved tests, with limited success. In this paper we
show that augmenting the critical region of the LR test can produce a test with
much improved behavior everywhere. In fact, we first show that there exists a
test of this type that is (asymptotically) exact for certain test levels
$\alpha$, including the common choices $\alpha = .01, .05, .10$. The critical
region of this exact test has some undesirable properties. We go on to show
that there is a very simple class of augmented LR critical regions that provides
tests which are nearly exact and avoid the issues inherent in the exact test. We
suggest an optimal and coherent member of this class, and provide the table needed
to implement the test and to report p-values if desired.
Simulation confirms validity with non-Gaussian disturbances, under
heteroskedasticity, and in a nonlinear (logit) model. A short application of
the method to an entrepreneurial attitudes study is included for illustration.

arXiv link: http://arxiv.org/abs/2403.02144v1

Econometrics arXiv updated paper (originally submitted: 2024-03-03)

Calibrating doubly-robust estimators with unbalanced treatment assignment

Authors: Daniele Ballinari

Machine learning methods, particularly the double machine learning (DML)
estimator (Chernozhukov et al., 2018), are increasingly popular for the
estimation of the average treatment effect (ATE). However, datasets often
exhibit unbalanced treatment assignments where only a few observations are
treated, leading to unstable propensity score estimations. We propose a simple
extension of the DML estimator which undersamples data for propensity score
modeling and calibrates scores to match the original distribution. The paper
provides theoretical results showing that the estimator retains the DML
estimator's asymptotic properties. A simulation study illustrates the finite
sample performance of the estimator.
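A stripped-down sketch of the undersample-then-recalibrate step under simplifying
assumptions (a logistic propensity model and a standard prior-correction formula for
undersampled controls; the paper's exact calibration and the surrounding DML machinery
are not reproduced here).

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
n = 50_000
x = rng.normal(size=(n, 2))
p_true = 1 / (1 + np.exp(-(-4.0 + x[:, 0])))     # rare treatment, a few percent treated
d = rng.binomial(1, p_true)

# Undersample controls at rate beta, fit the propensity model on the reduced sample
beta = d.mean() / (1 - d.mean())
keep = (d == 1) | (rng.random(n) < beta)
model = LogisticRegression().fit(x[keep], d[keep])
p_sub = model.predict_proba(x)[:, 1]

# Recalibrate the scores back to the original treatment prevalence (prior correction)
p_cal = beta * p_sub / (beta * p_sub + 1 - p_sub)
print("true mean propensity:", round(p_true.mean(), 4))
print("uncalibrated mean   :", round(p_sub.mean(), 4))
print("calibrated mean     :", round(p_cal.mean(), 4))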

arXiv link: http://arxiv.org/abs/2403.01585v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-03-03

Minimax-Regret Sample Selection in Randomized Experiments

Authors: Yuchen Hu, Henry Zhu, Emma Brunskill, Stefan Wager

Randomized controlled trials are often run in settings with many
subpopulations that may have differential benefits from the treatment being
evaluated. We consider the problem of sample selection, i.e., whom to enroll in
a randomized trial, such as to optimize welfare in a heterogeneous population.
We formalize this problem within the minimax-regret framework, and derive
optimal sample-selection schemes under a variety of conditions. Using data from
a COVID-19 vaccine trial, we also highlight how different objectives and
decision rules can lead to meaningfully different guidance regarding optimal
sample allocation.

arXiv link: http://arxiv.org/abs/2403.01386v2

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2024-03-02

High-Dimensional Tail Index Regression: with An Application to Text Analyses of Viral Posts in Social Media

Authors: Yuya Sasaki, Jing Tao, Yulong Wang

Motivated by the empirical observation of power-law distributions in the
credits (e.g., "likes") of viral social media posts, we introduce a
high-dimensional tail index regression model and propose methods for estimation
and inference of its parameters. First, we present a regularized estimator,
establish its consistency, and derive its convergence rate. Second, we
introduce a debiasing technique for the regularized estimator to facilitate
inference and prove its asymptotic normality. Third, we extend our approach to
handle large-scale online streaming data using stochastic gradient descent.
Simulation studies corroborate our theoretical findings. We apply these methods
to the text analysis of viral posts on X (formerly Twitter) related to LGBTQ+
topics.

arXiv link: http://arxiv.org/abs/2403.01318v2

Econometrics arXiv updated paper (originally submitted: 2024-03-01)

Electric vehicle pricing and battery costs: A misaligned assumption

Authors: Lucas Woodley, Chung Yi See, Vasco Rato Santos, Megan Yeo, Daniel Palmer, Sebastian Nosenzo, Ashley Nunes

Although electric vehicles (EVs) are a climate friendly alternative to
internal combustion engine vehicles (ICEVs), EV adoption is challenged by
higher up-front procurement prices. Existing discourse attributes this price
differential to high battery costs and reasons that lowering these costs will
reduce EV upfront price differentials. However, other factors beyond battery
price may influence prices. Leveraging data for over 400 EV models and trims
sold in the United States between 2011 and 2023, we scrutinize these factors. We
find that contrary to existing discourse, EV MSRP has increased over time
despite declining EV battery costs. We attribute this increase to the growing
accommodation of attributes that strongly influence EV prices but have long
been underappreciated in mainstream discourse. Furthermore, and relevant to
decarbonization efforts, we observe that continued reductions in pack-level
battery costs are unlikely to deliver price parity between EVs and ICEVs. Were
pack-level battery costs reduced to zero, EV MSRP would decrease by $4,025, an
estimate that is insufficient to offset observed price differences between
EVs and ICEVs. These findings warrant attention as decarbonization efforts
increasingly emphasize EVs as a pathway for complying with domestic and
international climate agreements.

arXiv link: http://arxiv.org/abs/2403.00458v2

Econometrics arXiv updated paper (originally submitted: 2024-03-01)

Inference for Interval-Identified Parameters Selected from an Estimated Set

Authors: Sukjin Han, Adam McCloskey

Interval identification of parameters such as average treatment effects,
average partial effects and welfare is particularly common when using
observational data and experimental data with imperfect compliance due to the
endogeneity of individuals' treatment uptake. In this setting, the researcher
is typically interested in a treatment or policy that is either selected from
the estimated set of best-performers or arises from a data-dependent selection
rule. In this paper, we develop new inference tools for interval-identified
parameters chosen via these forms of selection. We develop three types of
confidence intervals for data-dependent and interval-identified parameters,
discuss how they apply to several examples of interest and prove their uniform
asymptotic validity under weak assumptions.

arXiv link: http://arxiv.org/abs/2403.00422v2

Econometrics arXiv updated paper (originally submitted: 2024-03-01)

Set-Valued Control Functions

Authors: Sukjin Han, Hiroaki Kaido

The control function approach allows the researcher to identify various
causal effects of interest. While powerful, it requires a strong invertibility
assumption in the selection process, which limits its applicability. This paper
expands the scope of the nonparametric control function approach by allowing the
control function to be set-valued, and derives sharp bounds on structural
parameters. The proposed generalization accommodates a wide range of selection
processes involving discrete endogenous variables, random coefficients,
treatment selections with interference, and dynamic treatment selections. The
framework also applies to partially observed or identified controls that are
directly motivated from economic models.

arXiv link: http://arxiv.org/abs/2403.00347v3

Econometrics arXiv paper, submitted: 2024-02-29

Testing Information Ordering for Strategic Agents

Authors: Sukjin Han, Hiroaki Kaido, Lorenzo Magnolfi

A key primitive of a strategic environment is the information available to
players. Specifying a priori an information structure is often difficult for
empirical researchers. We develop a test of information ordering that allows
researchers to examine if the true information structure is at least as
informative as a proposed baseline. We construct a computationally tractable
test statistic by utilizing the notion of Bayes Correlated Equilibrium (BCE) to
translate the ordering of information structures into an ordering of functions.
We apply our test to examine whether hubs provide informational advantages to
certain airlines in addition to market power.

arXiv link: http://arxiv.org/abs/2402.19425v1

Econometrics arXiv cross-link from q-fin.TR (q-fin.TR), submitted: 2024-02-29

An Empirical Analysis of Scam Tokens on Ethereum Blockchain

Authors: Vahidin Jeleskovic

This article presents an empirical investigation into the determinants of
total revenue generated by counterfeit tokens on Uniswap. It offers a detailed
overview of the counterfeit token fraud process, along with a systematic
summary of characteristics associated with such fraudulent activities observed
in Uniswap. The study primarily examines the relationship between revenue from
counterfeit token scams and their defining characteristics, and analyzes the
influence of market economic factors such as return on market capitalization
and price return on Ethereum. Key findings include a significant increase in
overall transactions of counterfeit tokens on their first day of fraud, and a
rise in upfront fraud costs leading to corresponding increases in revenue.
Furthermore, a negative correlation is identified between the total revenue of
counterfeit tokens and the volatility of Ethereum market capitalization return,
while price return volatility on Ethereum is found to have a positive impact on
counterfeit token revenue, albeit requiring further investigation for a
comprehensive understanding. Additionally, the number of subscribers for the
real token correlates positively with the realized volume of scam tokens,
indicating that a larger community following the legitimate token may
inadvertently contribute to the visibility and success of counterfeit tokens.
Conversely, the number of Telegram subscribers exhibits a negative impact on
the realized volume of scam tokens, suggesting that a higher level of scrutiny
or awareness within Telegram communities may act as a deterrent to fraudulent
activities. Finally, the timing of the scam token's introduction on the
Ethereum blockchain may have a negative impact on its success. Notably, the
cumulative amount scammed by only 42 counterfeit tokens amounted to almost
11214 Ether.

arXiv link: http://arxiv.org/abs/2402.19399v3

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2024-02-29

Extremal quantiles of intermediate orders under two-way clustering

Authors: Harold D. Chiang, Ryutah Kato, Yuya Sasaki

This paper investigates extremal quantiles under two-way cluster dependence.
We demonstrate that the limiting distribution of the unconditional intermediate
order quantiles in the tails converges to a Gaussian distribution. This is
remarkable as two-way cluster dependence entails potential non-Gaussianity in
general, but extremal quantiles do not suffer from this issue. Building upon
this result, we extend our analysis to extremal quantile regressions of
intermediate order.

arXiv link: http://arxiv.org/abs/2402.19268v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2024-02-28

Unveiling the Potential of Robustness in Selecting Conditional Average Treatment Effect Estimators

Authors: Yiyan Huang, Cheuk Hang Leung, Siyi Wang, Yijun Li, Qi Wu

The growing demand for personalized decision-making has led to a surge of
interest in estimating the Conditional Average Treatment Effect (CATE). Various
types of CATE estimators have been developed with advancements in machine
learning and causal inference. However, selecting the desirable CATE estimator
through a conventional model validation procedure remains impractical due to
the absence of counterfactual outcomes in observational data. Existing
approaches for CATE estimator selection, such as plug-in and pseudo-outcome
metrics, face two challenges. First, they must determine the metric form and
the underlying machine learning models for fitting nuisance parameters (e.g.,
outcome function, propensity function, and plug-in learner). Second, they lack
a specific focus on selecting a robust CATE estimator. To address these
challenges, this paper introduces a Distributionally Robust Metric (DRM) for
CATE estimator selection. The proposed DRM is nuisance-free, eliminating the
need to fit models for nuisance parameters, and it effectively prioritizes the
selection of a distributionally robust CATE estimator. The experimental results
validate the effectiveness of the DRM method in selecting CATE estimators that
are robust to the distribution shift incurred by covariate shift and hidden
confounders.

arXiv link: http://arxiv.org/abs/2402.18392v2

Econometrics arXiv updated paper (originally submitted: 2024-02-27)

Quasi-Bayesian Estimation and Inference with Control Functions

Authors: Ruixuan Liu, Zhengfei Yu

This paper introduces a quasi-Bayesian method that integrates frequentist
nonparametric estimation with Bayesian inference in a two-stage process.
Applied to an endogenous discrete choice model, the approach first uses kernel
or sieve estimators to estimate the control function nonparametrically,
followed by Bayesian methods to estimate the structural parameters. This
combination leverages the advantages of both frequentist tractability for
nonparametric estimation and Bayesian computational efficiency for complicated
structural models. We analyze the asymptotic properties of the resulting
quasi-posterior distribution, finding that its mean provides a consistent
estimator for the parameters of interest, although its quantiles do not yield
valid confidence intervals. However, bootstrapping the quasi-posterior mean
accounts for the estimation uncertainty from the first stage, thereby producing
asymptotically valid confidence intervals.

arXiv link: http://arxiv.org/abs/2402.17374v2

Econometrics arXiv updated paper (originally submitted: 2024-02-27)

Treatment effects without multicollinearity? Temporal order and the Gram-Schmidt process in causal inference

Authors: Robin M. Cross, Steven T. Buccola

This paper incorporates information about the temporal order of regressors to
estimate orthogonal and economically interpretable regression coefficients. We
establish new finite sample properties for the Gram-Schmidt orthogonalization
process. Coefficients are unbiased and stable with lower standard errors than
those from Ordinary Least Squares. We provide conditions under which
coefficients represent average total treatment effects on the treated and
extend the model to groups of ordered and simultaneous regressors. Finally, we
reanalyze two studies that controlled for temporally ordered and collinear
characteristics, including race, education, and income. The new approach
expands Bohren et al.'s decomposition of systemic discrimination into
channel-specific effects and improves significance levels.
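A minimal sketch of the mechanism only (simulated, temporally ordered characteristics;
not the authors' code, data, or identification results): a thin QR decomposition
performs the same sequential Gram-Schmidt residualization, so each column is
orthogonalized against the earlier ones in temporal order before the outcome
regression is run.

import numpy as np

rng = np.random.default_rng(2)
n = 5_000
race = rng.binomial(1, 0.5, n).astype(float)            # earliest characteristic
educ = 0.8 * race + rng.normal(size=n)                  # determined later, collinear with race
income = 0.5 * race + 0.7 * educ + rng.normal(size=n)   # determined last
y = 1.0 * race + 0.5 * educ + 0.25 * income + rng.normal(size=n)

X = np.column_stack([np.ones(n), race, educ, income])   # columns in temporal order
Q, R = np.linalg.qr(X)                                  # sequential orthogonalization of columns
coefs = Q.T @ y                                         # OLS on orthonormal columns decouples
print("coefficients on orthogonalized (const, race, educ, income):", np.round(coefs, 2))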

arXiv link: http://arxiv.org/abs/2402.17103v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-02-26

Towards Generalizing Inferences from Trials to Target Populations

Authors: Melody Y Huang, Harsh Parikh

Randomized Controlled Trials (RCTs) are pivotal in generating internally
valid estimates with minimal assumptions, serving as a cornerstone for
researchers dedicated to advancing causal inference methods. However, extending
these findings beyond the experimental cohort to achieve externally valid
estimates is crucial for broader scientific inquiry. This paper delves into the
forefront of addressing these external validity challenges, encapsulating the
essence of a multidisciplinary workshop held at the Institute for Computational
and Experimental Research in Mathematics (ICERM), Brown University, in Fall
2023. The workshop congregated experts from diverse fields including social
science, medicine, public health, statistics, computer science, and education,
to tackle the unique obstacles each discipline faces in extrapolating
experimental findings. Our study presents three key contributions: we integrate
ongoing efforts, highlighting methodological synergies across fields; provide
an exhaustive review of generalizability and transportability based on the
workshop's discourse; and identify persistent hurdles while suggesting avenues
for future research. By doing so, this paper aims to enhance the collective
understanding of the generalizability and transportability of causal effects,
fostering cross-disciplinary collaboration and offering valuable insights for
researchers working on refining and applying causal inference methods.

arXiv link: http://arxiv.org/abs/2402.17042v2

Econometrics arXiv paper, submitted: 2024-02-26

Fast Algorithms for Quantile Regression with Selection

Authors: Santiago Pereda-Fernández

This paper addresses computational challenges in estimating Quantile
Regression with Selection (QRS). The estimation of the parameters that model
self-selection requires the estimation of the entire quantile process several
times. Moreover, closed-form expressions of the asymptotic variance are too
cumbersome, making the bootstrap more convenient to perform inference. Taking
advantage of recent advancements in the estimation of quantile regression,
along with some specific characteristics of the QRS estimation problem, I
propose streamlined algorithms for the QRS estimator. These algorithms
significantly reduce computation time through preprocessing techniques and
quantile grid reduction for the estimation of the copula and slope parameters.
I show the optimization enhancements with some simulations. Lastly, I show how
preprocessing methods can improve the precision of the estimates without
sacrificing computational efficiency. Hence, they constitute a practical
solution for estimators with non-differentiable and non-convex criterion
functions, such as those based on copulas.

arXiv link: http://arxiv.org/abs/2402.16693v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-02-26

Information-Enriched Selection of Stationary and Non-Stationary Autoregressions using the Adaptive Lasso

Authors: Thilo Reinschlüssel, Martin C. Arnold

We propose a novel approach to elicit the weight of a potentially
non-stationary regressor in the consistent and oracle-efficient estimation of
autoregressive models using the adaptive Lasso. The enhanced weight builds on a
statistic that exploits distinct orders in probability of the OLS estimator in
time series regressions when the degree of integration differs. We provide
theoretical results on the benefit of our approach for detecting stationarity
when a tuning criterion selects the $\ell_1$ penalty parameter. Monte Carlo
evidence shows that our proposal is superior to using OLS-based weights, as
suggested by Kock [Econom. Theory, 32, 2016, 243-259]. We apply the modified
estimator to model selection for German inflation rates after the introduction
of the Euro. The results indicate that energy commodity price inflation and
headline inflation are best described by stationary autoregressions.

arXiv link: http://arxiv.org/abs/2402.16580v2

Econometrics arXiv paper, submitted: 2024-02-26

Estimating Stochastic Block Models in the Presence of Covariates

Authors: Yuichi Kitamura, Louise Laage

In the standard stochastic block model for networks, the probability of a
connection between two nodes, often referred to as the edge probability,
depends on the unobserved communities each of these nodes belongs to. We
consider a flexible framework in which each edge probability, together with the
probability of community assignment, are also impacted by observed covariates.
We propose a computationally tractable two-step procedure to estimate the
conditional edge probabilities as well as the community assignment
probabilities. The first step relies on a spectral clustering algorithm applied
to a localized adjacency matrix of the network. In the second step, k-nearest
neighbor regression estimates are computed on the extracted communities. We
study the statistical properties of these estimators by providing
non-asymptotic bounds.
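A rough illustration of the two-step idea on simulated data (generic spectral
clustering on the raw adjacency matrix rather than the authors' localized version,
and a plain k-NN regression within one estimated block pair):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(4)
n, K = 300, 2
z = rng.integers(0, K, n)                          # true communities
x = rng.random(n)                                  # node covariate
B = np.array([[0.6, 0.1], [0.1, 0.5]])             # baseline block probabilities
P = B[z][:, z] * (0.5 + x[:, None] * x[None, :])   # covariate-shifted edge probabilities
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                                        # symmetric adjacency, no self-loops

# Step 1: spectral clustering -- k-means on the leading eigenvectors of A
vals, vecs = np.linalg.eigh(A)
labels = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(vecs[:, -K:])

# Step 2: within one estimated block pair, regress edge indicators on pair covariates
i, j = np.where(np.triu(np.ones((n, n)), 1) > 0)
mask = (labels[i] == 0) & (labels[j] == 0)
knn = KNeighborsRegressor(n_neighbors=50).fit(
    np.column_stack([x[i[mask]], x[j[mask]]]), A[i[mask], j[mask]])
print("estimated edge probability at (x_i, x_j) = (0.5, 0.5):",
      round(float(knn.predict([[0.5, 0.5]])[0]), 3))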

arXiv link: http://arxiv.org/abs/2402.16322v1

Econometrics arXiv updated paper (originally submitted: 2024-02-23)

Inference for Regression with Variables Generated by AI or Machine Learning

Authors: Laura Battaglia, Timothy Christensen, Stephen Hansen, Szymon Sacher

Researchers now routinely use AI or other machine learning methods to
estimate latent variables of economic interest, then plug in the estimates as
covariates in a regression. We show both theoretically and empirically that
naively treating AI/ML-generated variables as "data" leads to biased estimates
and invalid inference. To restore valid inference, we propose two methods: (1)
an explicit bias correction with bias-corrected confidence intervals, and (2)
joint estimation of the regression parameters and latent variables. We
illustrate these ideas through applications involving label imputation,
dimensionality reduction, and index construction via classification and
aggregation.
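A small simulation sketch of the headline point only (assumed DGP; the paper covers
general AI/ML-generated regressors and proposes corrections not shown here): treating
a noisily imputed latent variable as observed data attenuates the downstream
regression coefficient.

import numpy as np

rng = np.random.default_rng(5)
n, beta = 100_000, 1.0
x_latent = rng.normal(size=n)                     # true latent variable of interest
x_hat = x_latent + rng.normal(scale=0.7, size=n)  # "generated" regressor with imputation error
y = beta * x_latent + rng.normal(size=n)

naive = np.cov(x_hat, y)[0, 1] / np.var(x_hat)
print("naive plug-in estimate:", round(naive, 3))            # about beta / (1 + 0.49)
print("attenuation factor    :", round(1 / (1 + 0.7**2), 3))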

arXiv link: http://arxiv.org/abs/2402.15585v5

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2024-02-22

A Combinatorial Central Limit Theorem for Stratified Randomization

Authors: Purevdorj Tuvaandorj

This paper establishes a combinatorial central limit theorem for stratified
randomization, which holds under a Lindeberg-type condition. The theorem allows
for an arbitrary number or sizes of strata, with the sole requirement being
that each stratum contains at least two units. This flexibility accommodates
both a growing number of large and small strata simultaneously, while imposing
minimal conditions. We then apply this result to derive the asymptotic
distributions of two test statistics proposed for instrumental variables
settings in the presence of potentially many strata of unrestricted sizes.

arXiv link: http://arxiv.org/abs/2402.14764v2

Econometrics arXiv updated paper (originally submitted: 2024-02-22)

Functional Spatial Autoregressive Models

Authors: Tadao Hoshino

This study introduces a novel spatial autoregressive model in which the
dependent variable is a function that may exhibit functional autocorrelation
with the outcome functions of nearby units. This model can be characterized as
a simultaneous integral equation system, which, in general, does not
necessarily have a unique solution. To address this issue, we provide a simple
condition on the magnitude of the spatial interaction that ensures uniqueness
of the data realization. For estimation, to account for the endogeneity caused by
the spatial interaction, we propose a regularized two-stage least squares
estimator based on a basis approximation for the functional parameter. The
asymptotic properties of the estimator including the consistency and asymptotic
normality are investigated under certain conditions. Additionally, we propose a
simple Wald-type test for detecting the presence of spatial effects. As an
empirical illustration, we apply the proposed model and method to analyze age
distributions in Japanese cities.

arXiv link: http://arxiv.org/abs/2402.14763v3

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2024-02-22

Interference Produces False-Positive Pricing Experiments

Authors: Lars Roemheld, Justin Rao

It is standard practice in online retail to run pricing experiments by
randomizing at the article-level, i.e. by changing prices of different products
to identify treatment effects. Due to customers' cross-price substitution
behavior, such experiments suffer from interference bias: the observed
difference between treatment groups in the experiment is typically
significantly larger than the global effect that could be expected after a
roll-out decision of the tested pricing policy. We show in simulations that
such bias can be as large as 100%, and report experimental data implying bias
of similar magnitude. Finally, we discuss approaches for de-biased pricing
experiments, suggesting observational methods as a potentially attractive
alternative to clustering.
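A back-of-the-envelope sketch of the mechanism under a stylized linear demand system
(q_i = a - b*p_i + g*p_j; not the paper's simulation): with cross-price substitution
g, the article-level A/B contrast is (b + g)*delta while the global roll-out effect is
(b - g)*delta, so the experiment can overstate the roll-out effect by 100% or more.

b, g = 1.0, 1.0 / 3.0      # own-price sensitivity and cross-price substitution (stylized)
delta = 1.0                # price cut applied to treated articles

ab_contrast = (b + g) * delta      # treated demand rises b*delta, control demand falls g*delta
rollout_effect = (b - g) * delta   # under a full roll-out the substitution gains cancel
print("experiment estimate  :", round(ab_contrast, 3))
print("global rollout effect:", round(rollout_effect, 3))
print("overstatement        : %.0f%%" % (100 * (ab_contrast / rollout_effect - 1)))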

arXiv link: http://arxiv.org/abs/2402.14538v1

Econometrics arXiv updated paper (originally submitted: 2024-02-22)

Enhancing Rolling Horizon Production Planning Through Stochastic Optimization Evaluated by Means of Simulation

Authors: Manuel Schlenkrich, Wolfgang Seiringer, Klaus Altendorfer, Sophie N. Parragh

Production planning must account for uncertainty in a production system,
arising from fluctuating demand forecasts. Therefore, this article focuses on
the integration of updated customer demand into the rolling horizon planning
cycle. We use scenario-based stochastic programming to solve capacitated lot
sizing problems under stochastic demand in a rolling horizon environment. This
environment is replicated using a discrete event simulation-optimization
framework, where the optimization problem is periodically solved, leveraging
the latest demand information to continually adjust the production plan. We
evaluate the stochastic optimization approach and compare its performance to
solving a deterministic lot sizing model, using expected demand figures as
input, as well as to standard Material Requirements Planning (MRP). In the
simulation study, we analyze three different customer behaviors related to
forecasting, along with four levels of shop load, within a multi-item and
multi-stage production system. We test a range of significant parameter values
for the three planning methods and compute the overall costs to benchmark them.
The results show that the production plans obtained by MRP are outperformed by
deterministic and stochastic optimization. Particularly, when facing tight
resource restrictions and rising uncertainty in customer demand, the use of
stochastic optimization becomes preferable compared to deterministic
optimization.

arXiv link: http://arxiv.org/abs/2402.14506v2

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2024-02-22

Structure-agnostic Optimality of Doubly Robust Learning for Treatment Effect Estimation

Authors: Jikai Jin, Vasilis Syrgkanis

Average treatment effect estimation is the most central problem in causal
inference with application to numerous disciplines. While many estimation
strategies have been proposed in the literature, the statistical optimality of
these methods has still remained an open area of investigation, especially in
regimes where these methods do not achieve parametric rates. In this paper, we
adopt the recently introduced structure-agnostic framework of statistical lower
bounds, which poses no structural properties on the nuisance functions other
than access to black-box estimators that achieve some statistical estimation
rate. This framework is particularly appealing when one is only willing to
consider estimation strategies that use non-parametric regression and
classification oracles as black-box sub-processes. Within this framework, we
prove the statistical optimality of the celebrated and widely used doubly
robust estimators for both the Average Treatment Effect (ATE) and the Average
Treatment Effect on the Treated (ATT), as well as weighted variants of the
former, which arise in policy evaluation.

arXiv link: http://arxiv.org/abs/2402.14264v4

Econometrics arXiv paper, submitted: 2024-02-22

The impact of Facebook-Cambridge Analytica data scandal on the USA tech stock market: An event study based on clustering method

Authors: Vahidin Jeleskovic, Yinan Wan

This study delves into the intra-industry effects following a firm-specific
scandal, with a particular focus on the Facebook data leakage scandal and its
associated events within the U.S. tech industry and two additional relevant
groups. We employ various metrics including daily spread, volatility,
volume-weighted return, and CAPM-beta for the pre-analysis clustering, and
subsequently utilize CAR (Cumulative Abnormal Return) to evaluate the impact on
firms grouped within these clusters. From a broader industry viewpoint,
significant positive CAARs are observed across U.S. sample firms over the three
days post-scandal announcement, indicating no adverse impact on the tech sector
overall. Conversely, after Facebook's initial quarterly earnings report, it
showed a notable negative effect despite reported positive performance. The
clustering principle should aid in identifying directly related companies and
thus reducing the influence of randomness. This was indeed achieved for the
effect of the key event, namely "The Effect of Congressional Hearing on Certain
Clusters across U.S. Tech Stock Market," which was identified as delayed and
significantly negative. Therefore, we recommend applying the clustering method
when conducting such or similar event studies.

arXiv link: http://arxiv.org/abs/2402.14206v1

Econometrics arXiv cross-link from cs.CL (cs.CL), submitted: 2024-02-21

Breaking the HISCO Barrier: Automatic Occupational Standardization with OccCANINE

Authors: Christian Møller Dahl, Torben Johansen, Christian Vedel

This paper introduces a new tool, OccCANINE, to automatically transform
occupational descriptions into the HISCO classification system. The manual work
involved in processing and classifying occupational descriptions is
error-prone, tedious, and time-consuming. We finetune a preexisting language
model (CANINE) to do this automatically, thereby performing in seconds and
minutes what previously took days and weeks. The model is trained on 14 million
pairs of occupational descriptions and HISCO codes in 13 different languages
contributed by 22 different sources. Our approach is shown to have accuracy,
recall, and precision above 90 percent. Our tool breaks the metaphorical HISCO
barrier and makes this data readily available for analysis of occupational
structures with broad applicability in economics, economic history, and various
related disciplines.

arXiv link: http://arxiv.org/abs/2402.13604v2

Econometrics arXiv updated paper (originally submitted: 2024-02-20)

Vulnerability Webs: Systemic Risk in Software Networks

Authors: Cornelius Fritz, Co-Pierre Georg, Angelo Mele, Michael Schweinberger

Software development relies on code reuse to minimize costs, creating
vulnerability risks through dependencies with substantial economic impact, as
seen in the CrowdStrike and Heartbleed incidents. We analyze 52,897
dependencies across 16,102 Python repositories using a strategic network
formation model incorporating observable and unobservable heterogeneity.
Through variational approximation of conditional distributions, we demonstrate
that dependency creation generates negative externalities. Vulnerability
propagation, modeled as a contagion process, shows that popular protection
heuristics are ineffective. AI-assisted coding, on the other hand, offers an
effective alternative by enabling dependency replacement with in-house code.

arXiv link: http://arxiv.org/abs/2402.13375v3

Econometrics arXiv paper, submitted: 2024-02-20

Bridging Methodologies: Angrist and Imbens' Contributions to Causal Identification

Authors: Lucas Girard, Yannick Guyonvarch

In the 1990s, Joshua Angrist and Guido Imbens studied the causal
interpretation of Instrumental Variable estimates (a widespread methodology in
economics) through the lens of potential outcomes (a classical framework to
formalize causality in statistics). Bridging a gap between those two strands of
literature, they stress the importance of treatment effect heterogeneity and
show that, under defensible assumptions in various applications, this method
recovers an average causal effect for a specific subpopulation of individuals
whose treatment is affected by the instrument. They were awarded the Nobel
Prize primarily for this Local Average Treatment Effect (LATE). The first part
of this article presents that methodological contribution in-depth: the
origination in earlier applied articles, the different identification results
and extensions, and related debates on the relevance of LATEs for public policy
decisions. The second part reviews the main contributions of the authors beyond
the LATE. J. Angrist has pursued the search for informative and varied
empirical research designs in several fields, particularly in education. G.
Imbens has complemented the toolbox for treatment effect estimation in many
ways, notably through propensity score reweighting, matching, and, more
recently, adapting machine learning procedures.

arXiv link: http://arxiv.org/abs/2402.13023v1

Econometrics arXiv updated paper (originally submitted: 2024-02-20)

Extending the Scope of Inference About Predictive Ability to Machine Learning Methods

Authors: Juan Carlos Escanciano, Ricardo Parra

The use of machine learning methods for predictive purposes has increased
dramatically over the past two decades, but uncertainty quantification for
predictive comparisons remains elusive. This paper addresses this gap by
extending the classic inference theory for predictive ability in time series to
modern machine learners, such as the Lasso or Deep Learning. We investigate
under which conditions such extensions are possible. For standard out-of-sample
asymptotic inference to be valid with machine learning, two key properties must
hold: (i) a zero-mean condition for the score of the prediction loss function
and (ii) a "fast rate" of convergence for the machine learner. Absent any of
these conditions, the estimation risk may be unbounded, and inferences invalid
and very sensitive to sample splitting. For accurate inferences, we recommend
an 80%-20% training-test splitting rule. We illustrate the wide applicability
of our results with three applications: high-dimensional time series
regressions with the Lasso, Deep learning for binary outcomes, and a new
out-of-sample test for the Martingale Difference Hypothesis (MDH). The
theoretical results are supported by extensive Monte Carlo simulations and an
empirical application evaluating the MDH of some major exchange rates at daily
and higher frequencies.
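An illustrative sketch of the basic out-of-sample comparison the theory covers
(simulated data, squared-error loss, and a simple t-statistic on the loss
differential; not the paper's applications or its formal test): fit a machine learner
on an 80% training split and test equal predictive ability against a benchmark on the
remaining 20%.

import numpy as np
from scipy import stats
from sklearn.linear_model import Lasso

rng = np.random.default_rng(7)
T, p = 1_000, 50
X = rng.normal(size=(T, p))
y = 0.5 * X[:, 0] + rng.normal(size=T)

split = int(0.8 * T)                             # the 80%-20% rule recommended above
lasso = Lasso(alpha=0.1).fit(X[:split], y[:split])
e_lasso = (y[split:] - lasso.predict(X[split:])) ** 2
e_bench = (y[split:] - y[:split].mean()) ** 2    # benchmark: historical-mean forecast

d = e_bench - e_lasso                            # positive values favor the Lasso
tstat = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
print("mean loss differential: %.3f, t-stat: %.2f, p-value: %.3f"
      % (d.mean(), tstat, 2 * (1 - stats.norm.cdf(abs(tstat)))))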

arXiv link: http://arxiv.org/abs/2402.12838v3

Econometrics arXiv updated paper (originally submitted: 2024-02-20)

Inference on LATEs with covariates

Authors: Tom Boot, Didier Nibbering

In theory, two-stage least squares (TSLS) identifies a weighted average of
covariate-specific local average treatment effects (LATEs) from a saturated
specification, without making parametric assumptions on how available
covariates enter the model. In practice, TSLS is severely biased as saturation
leads to a large number of control dummies and an equally large number of,
arguably weak, instruments. This paper derives asymptotically valid tests and
confidence intervals for the weighted average of LATEs that is targeted, yet
missed by saturated TSLS. The proposed inference procedure is robust to
unobserved treatment effect heterogeneity, covariates with rich support, and
weak identification. We find LATEs statistically significantly different from
zero in applications in criminology, finance, health, and education.

arXiv link: http://arxiv.org/abs/2402.12607v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-02-19

Non-linear Triple Changes Estimator for Targeted Policies

Authors: Sina Akbari, Negar Kiyavash

The renowned difference-in-differences (DiD) estimator relies on the
assumption of 'parallel trends,' which does not hold in many practical
applications. To address this issue, the econometrics literature has turned to
the triple difference estimator. Both DiD and triple difference are limited to
assessing average effects exclusively. An alternative avenue is offered by the
changes-in-changes (CiC) estimator, which provides an estimate of the entire
counterfactual distribution at the cost of relying on (stronger) distributional
assumptions. In this work, we extend the triple difference estimator to
accommodate the CiC framework, presenting the 'triple changes estimator' and
its identification assumptions, thereby expanding the scope of the CiC
paradigm. Subsequently, we empirically evaluate the proposed framework and
apply it to a study examining the impact of Medicaid expansion on children's
preventive care.

arXiv link: http://arxiv.org/abs/2402.12583v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-02-18

Credible causal inference beyond toy models

Authors: Pablo Geraldo Bastías

Causal inference with observational data critically relies on untestable and
extra-statistical assumptions that have (sometimes) testable implications.
Well-known sets of assumptions that are sufficient to justify the causal
interpretation of certain estimators are called identification strategies.
These templates for causal analysis, however, do not perfectly map into
empirical research practice. Researchers are often left with the dilemma of
either abstracting away from their particular setting to fit the templates,
risking erroneous inferences, or avoiding situations in which the templates
cannot be applied, missing valuable opportunities for conducting empirical
analysis. In this article, I show how directed acyclic graphs (DAGs) can help
researchers to conduct empirical research and assess the quality of evidence
without excessively relying on research templates. First, I offer a concise
introduction to causal inference frameworks. Then I survey the arguments in the
methodological literature in favor of using research templates, while either
avoiding or limiting the use of causal graphical models. Third, I discuss the
problems with the template model, arguing for a more flexible approach to DAGs
that helps illuminate common problems in empirical settings and improve the
credibility of causal claims. I demonstrate this approach in a series of worked
examples, showing the gap between identification strategies as invoked by
researchers and their actual applications. Finally, I conclude by highlighting the
benefits that routinely incorporating causal graphical models in our scientific
discussions would have in terms of transparency, testability, and generativity.

arXiv link: http://arxiv.org/abs/2402.11659v1

Econometrics arXiv updated paper (originally submitted: 2024-02-18)

Doubly Robust Inference in Causal Latent Factor Models

Authors: Alberto Abadie, Anish Agarwal, Raaz Dwivedi, Abhin Shah

This article introduces a new estimator of average treatment effects under
unobserved confounding in modern data-rich environments featuring large numbers
of units and outcomes. The proposed estimator is doubly robust, combining
outcome imputation, inverse probability weighting, and a novel cross-fitting
procedure for matrix completion. We derive finite-sample and asymptotic
guarantees, and show that the error of the new estimator converges to a
mean-zero Gaussian distribution at a parametric rate. Simulation results
demonstrate the relevance of the formal properties of the estimators analyzed
in this article.
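
For orientation only, the display below recalls the textbook doubly robust (AIPW) form for an average treatment effect, with outcome models \(\hat{m}_1, \hat{m}_0\) and propensity score \(\hat{e}\); the estimator proposed in the paper adapts this general idea to the latent factor and matrix completion setting rather than using this exact formula.

\[
\hat{\tau}_{DR} = \frac{1}{n}\sum_{i=1}^{n}\left[\hat{m}_1(X_i) - \hat{m}_0(X_i)
+ \frac{D_i\,\bigl(Y_i - \hat{m}_1(X_i)\bigr)}{\hat{e}(X_i)}
- \frac{(1-D_i)\,\bigl(Y_i - \hat{m}_0(X_i)\bigr)}{1-\hat{e}(X_i)}\right]
\]

The double robustness comes from the fact that this expression remains consistent if either the outcome models or the propensity score is correctly specified.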

arXiv link: http://arxiv.org/abs/2402.11652v3

Econometrics arXiv cross-link from math.PR (math.PR), submitted: 2024-02-17

Maximal Inequalities for Empirical Processes under General Mixing Conditions with an Application to Strong Approximations

Authors: Demian Pouzo

This paper provides a bound for the supremum of sample averages over a class
of functions for a general class of mixing stochastic processes with arbitrary
mixing rates. Regardless of the speed of mixing, the bound comprises a
concentration rate and a novel measure of complexity. The speed of mixing,
however, affects the former quantity, implying a phase transition. Fast mixing
leads to the standard root-n concentration rate, while slow mixing leads to a
slower concentration rate whose speed depends on the mixing structure. Our
findings are applied to derive strong approximation results for a general class
of mixing processes with arbitrary mixing rates.

arXiv link: http://arxiv.org/abs/2402.11394v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2024-02-16

Functional Partial Least-Squares: Adaptive Estimation and Inference

Authors: Andrii Babii, Marine Carrasco, Idriss Tsafack

We study the functional linear regression model with a scalar response and a
Hilbert space-valued predictor, a canonical example of an ill-posed inverse
problem. We show that the functional partial least squares (PLS) estimator
attains nearly minimax-optimal convergence rates over a class of ellipsoids and
propose an adaptive early stopping procedure for selecting the number of PLS
components. In addition, we develop new test that can detect local alternatives
converging at the parametric rate which can be inverted to construct confidence
sets. Simulation results demonstrate that the estimator performs favorably
relative to several existing methods and the proposed test exhibits good power
properties. We apply our methodology to evaluate the nonlinear effects of
temperature on corn and soybean yields.

arXiv link: http://arxiv.org/abs/2402.11134v2

Econometrics arXiv updated paper (originally submitted: 2024-02-16)

Manipulation Test for Multidimensional RDD

Authors: Federico Crippa

The causal inference model proposed by Lee (2008) for the regression
discontinuity design (RDD) relies on assumptions that imply the continuity of
the density of the assignment (running) variable. The test for this implication
is commonly referred to as the manipulation test and is regularly reported in
applied research to strengthen the design's validity. The multidimensional RDD
(MRDD) extends the RDD to contexts where treatment assignment depends on
several running variables. This paper introduces a manipulation test for the
MRDD. First, it develops a theoretical model for causal inference with the
MRDD, used to derive a testable implication on the conditional marginal
densities of the running variables. Then, it constructs the test for the
implication based on a quadratic form of a vector of statistics separately
computed for each marginal density. Finally, the proposed test is compared with
alternative procedures commonly employed in applied research.
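
To fix ideas, here is a stylized Python illustration (not the paper's test) of how per-running-variable manipulation statistics could be combined into a quadratic form with a chi-square reference distribution; the vector of statistics and its covariance are hypothetical inputs, e.g. from univariate density-discontinuity tests.

# Stylized combination of per-dimension manipulation statistics into a quadratic
# form; t and V are hypothetical inputs, not outputs of the proposed procedure.
import numpy as np
from scipy.stats import chi2

t = np.array([1.1, -0.4])            # statistics for two running variables
V = np.array([[1.0, 0.2],
              [0.2, 1.0]])           # estimated covariance of t (assumption)

W = t @ np.linalg.inv(V) @ t         # quadratic-form test statistic
p_value = chi2.sf(W, df=len(t))      # chi-square reference, dim(t) degrees of freedom
print(f"W = {W:.3f}, p-value = {p_value:.3f}")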

arXiv link: http://arxiv.org/abs/2402.10836v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2024-02-16

Optimizing Adaptive Experiments: A Unified Approach to Regret Minimization and Best-Arm Identification

Authors: Chao Qin, Daniel Russo

Practitioners conducting adaptive experiments often encounter two competing
priorities: maximizing total welfare (or `reward') through effective treatment
assignment and swiftly concluding experiments to implement population-wide
treatments. Current literature addresses these priorities separately, with
regret minimization studies focusing on the former and best-arm identification
research on the latter. This paper bridges this divide by proposing a unified
model that simultaneously accounts for within-experiment performance and
post-experiment outcomes. We provide a sharp theory of optimal performance in
large populations that not only unifies canonical results in the literature but
also uncovers novel insights. Our theory reveals that familiar algorithms, such
as the recently proposed top-two Thompson sampling algorithm, can optimize a
broad class of objectives if a single scalar parameter is appropriately
adjusted. In addition, we demonstrate that substantial reductions in experiment
duration can often be achieved with minimal impact on both within-experiment
and post-experiment regret.

arXiv link: http://arxiv.org/abs/2402.10592v2

Econometrics arXiv updated paper (originally submitted: 2024-02-16)

Nowcasting with Mixed Frequency Data Using Gaussian Processes

Authors: Niko Hauzenberger, Massimiliano Marcellino, Michael Pfarrhofer, Anna Stelzer

We develop Bayesian machine learning methods for mixed data sampling (MIDAS)
regressions. This involves handling frequency mismatches and specifying
functional relationships between many predictors and the dependent variable. We
use Gaussian processes (GPs) and compress the input space with structured and
unstructured MIDAS variants. This yields several versions of GP-MIDAS with
distinct properties and implications, which we evaluate in short-horizon
nowcasting and forecasting exercises with both simulated data and data on quarterly US
output growth and inflation in the GDP deflator. It turns out that our proposed
framework leverages macroeconomic Big Data in a computationally efficient way
and offers gains in predictive accuracy compared to other machine learning
approaches along several dimensions.

arXiv link: http://arxiv.org/abs/2402.10574v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2024-02-15

mshw, a forecasting library to predict short-term electricity demand based on multiple seasonal Holt-Winters

Authors: Oscar Trull, J. Carlos García-Díaz, Angel Peiró-Signes

Transmission system operators have a growing need for more accurate
forecasting of electricity demand. Electricity systems rely heavily on demand
forecasts, both for setting prices in the electricity market and for scheduling
production units. The companies operating within the electrical system use
proprietary software to obtain predictions, built on statistical or
artificial-intelligence time series tools, and most commonly on hybrid models
that combine both technologies. Such software tends to have a complicated
structure, involves a large number of variables, and requires a heavy
computational load, yet its predictions are often not much better than those of
simpler models. In this paper we present a MATLAB toolbox for electricity
demand forecasting. The toolbox implements multiple seasonal Holt-Winters
exponential smoothing models and neural network models, including discrete
interval moving seasonalities (DIMS) to improve forecasting on special days. We
also report results from its application to several European electrical
systems. This library opens a new avenue of research into models with discrete
and complex seasonalities in other fields of application.
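
As a rough illustration of the core building block only (the toolbox itself is a MATLAB library handling multiple seasonalities and DIMS), a single-seasonality Holt-Winters forecast can be produced in Python with statsmodels; the hourly demand series below is simulated.

# Minimal single-seasonality Holt-Winters sketch; simulated hourly demand, not a
# reproduction of the mshw toolbox, which supports multiple seasonalities and DIMS.
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

rng = np.random.default_rng(1)
hours = np.arange(24 * 60)
demand = 100 + 10 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 2, hours.size)

model = ExponentialSmoothing(demand, trend="add", seasonal="add",
                             seasonal_periods=24)
fit = model.fit()
forecast = fit.forecast(24)          # next-day hourly demand forecast
print(forecast[:5])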

arXiv link: http://arxiv.org/abs/2402.10982v1

Econometrics arXiv updated paper (originally submitted: 2024-02-15)

When Can We Use Two-Way Fixed-Effects (TWFE): A Comparison of TWFE and Novel Dynamic Difference-in-Differences Estimators

Authors: Tobias Rüttenauer, Ozan Aksoy

The conventional Two-Way Fixed-Effects (TWFE) estimator has come under
scrutiny lately. Recent literature has revealed potential shortcomings of TWFE
when the treatment effects are heterogeneous. Scholars have developed new
advanced dynamic Difference-in-Differences (DiD) estimators to tackle these
potential shortcomings. However, confusion remains in applied research as to
when the conventional TWFE is biased and what issues the novel estimators can
and cannot address. In this study, we first provide an intuitive explanation of
the problems of TWFE and elucidate the key features of the novel alternative
DiD estimators. We then systematically demonstrate the conditions under which
the conventional TWFE is inconsistent. We employ Monte Carlo simulations to
assess the performance of dynamic DiD estimators under violations of key
assumptions, which are likely to occur in applied settings. While the new dynamic DiD
estimators offer notable advantages in capturing heterogeneous treatment
effects, we show that the conventional TWFE performs generally well if the
model specifies an event-time function. All estimators are equally sensitive to
violations of the parallel trends assumption, anticipation effects or
violations of time-varying exogeneity. Despite their advantages, the new
dynamic DiD estimators tackle a very specific problem and they do not serve as
a universal remedy for violations of the most critical assumptions. We finally
derive, based on our simulations, recommendations for how and when to use TWFE
and the new DiD estimators in applied research.

arXiv link: http://arxiv.org/abs/2402.09928v3

Econometrics arXiv updated paper (originally submitted: 2024-02-15)

Spatial Data Analysis

Authors: Tobias Rüttenauer

This handbook chapter provides an essential introduction to the field of
spatial econometrics, offering a comprehensive overview of techniques and
methodologies for analysing spatial data in the social sciences. Spatial
econometrics addresses the unique challenges posed by spatially dependent
observations, where spatial relationships among data points can be of
substantive interest or can significantly impact statistical analyses. The
chapter begins by exploring the fundamental concepts of spatial dependence and
spatial autocorrelation, and highlighting their implications for traditional
econometric models. It then introduces a range of spatial econometric models,
particularly spatial lag, spatial error, spatial lag of X, and spatial Durbin
models, illustrating how these models accommodate spatial relationships and
yield accurate and insightful results about the underlying spatial processes.
The chapter provides an intuitive guide on how to interpret those different
models. A practical example on London house prices demonstrates the application
of spatial econometrics, emphasising its relevance in uncovering hidden spatial
patterns, addressing endogeneity, and providing robust estimates in the
presence of spatial dependence.
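
As a brief numerical illustration of the spatial autocorrelation concept discussed in the chapter (the variable and weights below are simulated, not the London example), Moran's I can be computed directly from a variable and a row-standardized contiguity matrix:

# Illustrative computation of Moran's I; a simple ring-contiguity weights matrix
# and a simulated variable stand in for real spatial data.
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = rng.normal(size=n)

W = np.zeros((n, n))
idx = np.arange(n)
W[idx, (idx + 1) % n] = 1.0          # each unit neighbours the next one
W[idx, (idx - 1) % n] = 1.0          # and the previous one (ring structure)
W = W / W.sum(axis=1, keepdims=True) # row-standardize

z = x - x.mean()
moran_I = (n / W.sum()) * (z @ W @ z) / (z @ z)
print("Moran's I:", round(moran_I, 3))

Values near zero indicate no spatial autocorrelation, while positive values indicate clustering of similar values among neighbours.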

arXiv link: http://arxiv.org/abs/2402.09895v2

Econometrics arXiv paper, submitted: 2024-02-15

Identification with Posterior-Separable Information Costs

Authors: Martin Bustos

I provide a model of rational inattention with heterogeneity and prove it is
observationally equivalent to a state-dependent stochastic choice model subject
to attention costs. I demonstrate that additive separability of unobservable
heterogeneity, together with an independence assumption, suffice for the
empirical model to admit a representative agent. Using conditional
probabilities, I show how to identify: how covariates affect the desirability
of goods, (a measure of) welfare, factual changes in welfare, and bounds on
counterfactual market shares.

arXiv link: http://arxiv.org/abs/2402.09789v1

Econometrics arXiv updated paper (originally submitted: 2024-02-15)

Quantile Granger Causality in the Presence of Instability

Authors: Alexander Mayer, Dominik Wied, Victor Troster

We propose a new framework for assessing Granger causality in quantiles in
unstable environments, for a fixed quantile or over a continuum of quantile
levels. Our proposed test statistics are consistent against fixed alternatives,
they have nontrivial power against local alternatives, and they are pivotal in
certain important special cases. In addition, we show the validity of a
bootstrap procedure when asymptotic distributions depend on nuisance
parameters. Monte Carlo simulations reveal that the proposed test statistics
have correct empirical size and high power, even in the absence of structural
breaks. Moreover, a procedure providing additional insight into the timing of
Granger causal regimes based on our new tests is proposed. Finally, an
empirical application in energy economics highlights the applicability of our
method as the new tests provide stronger evidence of Granger causality.

arXiv link: http://arxiv.org/abs/2402.09744v2

Econometrics arXiv updated paper (originally submitted: 2024-02-14)

Cross-Temporal Forecast Reconciliation at Digital Platforms with Machine Learning

Authors: Jeroen Rombouts, Marie Ternes, Ines Wilms

Platform businesses operate on a digital core and their decision making
requires high-dimensional accurate forecast streams at different levels of
cross-sectional (e.g., geographical regions) and temporal aggregation (e.g.,
minutes to days). It also necessitates coherent forecasts across all levels of
the hierarchy to ensure aligned decision making across different planning units
such as pricing, product, controlling and strategy. Given that platform data
streams feature complex characteristics and interdependencies, we introduce a
non-linear hierarchical forecast reconciliation method that produces
cross-temporal reconciled forecasts in a direct and automated way through the
use of popular machine learning methods. The method is sufficiently fast to
allow forecast-based high-frequency decision making that platforms require. We
empirically test our framework on unique, large-scale streaming datasets from a
leading on-demand delivery platform in Europe and a bicycle sharing system in
New York City.

arXiv link: http://arxiv.org/abs/2402.09033v2

Econometrics arXiv updated paper (originally submitted: 2024-02-14)

Local-Polynomial Estimation for Multivariate Regression Discontinuity Designs

Authors: Masayuki Sawada, Takuya Ishihara, Daisuke Kurisu, Yasumasa Matsuda

We introduce a multivariate local-linear estimator for multivariate
regression discontinuity designs in which treatment is assigned by crossing a
boundary in the space of running variables. The dominant approach uses the
Euclidean distance from a boundary point as the scalar running variable; hence,
multivariate designs are handled as univariate designs. However, bandwidth
selection with the distance running variable is suboptimal and inefficient for
the underlying multivariate problem. We instead treat multivariate designs as
genuinely multivariate. In this study, we develop a novel asymptotic normality
result for multivariate local-polynomial estimators. Our estimator is asymptotically valid
and can capture heterogeneous treatment effects over the boundary. We
demonstrate the effectiveness of our estimator through numerical simulations.
Our empirical illustration of a Colombian scholarship study reveals a richer
heterogeneity of the treatment effect that is hidden in the original estimates.

arXiv link: http://arxiv.org/abs/2402.08941v2

Econometrics arXiv updated paper (originally submitted: 2024-02-14)

Inference for an Algorithmic Fairness-Accuracy Frontier

Authors: Yiqi Liu, Francesca Molinari

Algorithms are increasingly used to aid with high-stakes decision making.
Yet, their predictive ability frequently exhibits systematic variation across
population subgroups. To assess the trade-off between fairness and accuracy
using finite data, we propose a debiased machine learning estimator for the
fairness-accuracy frontier introduced by Liang, Lu, Mu, and Okumura (2024). We
derive its asymptotic distribution and propose inference methods to test key
hypotheses in the fairness literature, such as (i) whether excluding group
identity from use in training the algorithm is optimal and (ii) whether there
are less discriminatory alternatives to a given algorithm. In addition, we
construct an estimator for the distance between a given algorithm and the
fairest point on the frontier, and characterize its asymptotic distribution.
Using Monte Carlo simulations, we evaluate the finite-sample performance of our
inference methods. We apply our framework to re-evaluate algorithms used in
hospital care management and show that our approach yields alternative
algorithms that lie on the fairness-accuracy frontier, offering improvements
along both dimensions.

arXiv link: http://arxiv.org/abs/2402.08879v2

Econometrics arXiv updated paper (originally submitted: 2024-02-13)

Heterogeneity, Uncertainty and Learning: Semiparametric Identification and Estimation

Authors: Jackson Bunting, Paul Diegert, Arnaud Maurel

We provide identification results for a broad class of learning models in
which continuous outcomes depend on three types of unobservables: known
heterogeneity, initially unknown heterogeneity that may be revealed over time,
and transitory uncertainty. We consider a common environment where the
researcher only has access to a short panel on choices and realized outcomes.
We establish identification of the outcome equation parameters and the
distribution of the unobservables, under the standard assumption that unknown
heterogeneity and uncertainty are normally distributed. We also show that,
absent known heterogeneity, the model is identified without making any
distributional assumption. We then derive the asymptotic properties of a sieve
MLE estimator for the model parameters, and devise a tractable profile
likelihood-based estimation procedure. Our estimator exhibits good
finite-sample properties. Finally, we illustrate our approach with an
application to ability learning in the context of occupational choice. Our
results point to substantial ability learning based on realized wages.

arXiv link: http://arxiv.org/abs/2402.08575v2

Econometrics arXiv paper, submitted: 2024-02-12

Finding Moving-Band Statistical Arbitrages via Convex-Concave Optimization

Authors: Kasper Johansson, Thomas Schmelzer, Stephen Boyd

We propose a new method for finding statistical arbitrages that can contain
more assets than just the traditional pair. We formulate the problem as seeking
a portfolio with the highest volatility, subject to its price remaining in a
band and a leverage limit. This optimization problem is not convex, but can be
approximately solved using the convex-concave procedure, a specific sequential
convex programming method. We show how the method generalizes to finding
moving-band statistical arbitrages, where the price band midpoint varies over
time.
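
A heavily simplified Python sketch of the convex-concave idea is given below (using cvxpy; the simulated prices, band, and leverage limit are hypothetical, and this is not the authors' exact formulation): since maximizing the variance of the portfolio price is non-convex, each iteration linearizes the objective at the current weights and solves the resulting convex problem subject to the price-band and leverage constraints.

# Simplified convex-concave procedure for a band-constrained, high-variance
# portfolio; simulated prices and placeholder parameters, illustration only.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(3)
T, n = 250, 5
P = np.cumsum(rng.normal(size=(T, n)), axis=0) + 100.0   # simulated asset prices
Sigma = np.cov(P, rowvar=False)                          # covariance of prices
band, leverage = 5.0, 1.0                                # placeholder limits

s_k = rng.normal(scale=0.1, size=n)                      # initial weights
for _ in range(20):
    s = cp.Variable(n)
    # linearize the concave part: s' Sigma s ~ 2 (Sigma s_k)' s around s_k
    objective = cp.Maximize(2.0 * (Sigma @ s_k) @ s)
    constraints = [cp.norm1(s) <= leverage,              # leverage limit
                   cp.abs(P @ s) <= band]                # price stays in the band
    cp.Problem(objective, constraints).solve()
    s_k = s.value

print("portfolio weights:", np.round(s_k, 3))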

arXiv link: http://arxiv.org/abs/2402.08108v1

Econometrics arXiv paper, submitted: 2024-02-12

On Bayesian Filtering for Markov Regime Switching Models

Authors: Nigar Hashimzade, Oleg Kirsanov, Tatiana Kirsanova, Junior Maih

This paper presents a framework for empirical analysis of dynamic
macroeconomic models using Bayesian filtering, with a specific focus on the
state-space formulation of Dynamic Stochastic General Equilibrium (DSGE) models
with multiple regimes. We outline the theoretical foundations of model
estimation, provide the details of two families of powerful multiple-regime
filters, IMM and GPB, and construct corresponding multiple-regime smoothers. A
simulation exercise, based on a prototypical New Keynesian DSGE model, is used
to demonstrate the computational robustness of the proposed filters and
smoothers and evaluate their accuracy and speed for a selection of filters from
each family. We show that the canonical IMM filter is faster and no less,
and often more, accurate than its competitors within the IMM and GPB families, the
latter including the commonly used Kim and Nelson (1999) filter. Using it with
the matching smoother improves the precision in recovering unobserved variables
by about 25 percent. Furthermore, applying it to the U.S. 1947-2023
macroeconomic time series, we successfully identify significant past policy
shifts including those related to the post-Covid-19 period. Our results
demonstrate the practical applicability and potential of the proposed routines
in macroeconomic analysis.

arXiv link: http://arxiv.org/abs/2402.08051v1

Econometrics arXiv updated paper (originally submitted: 2024-02-12)

Local Projections Inference with High-Dimensional Covariates without Sparsity

Authors: Jooyoung Cha

This paper presents a comprehensive local projections (LP) framework for
estimating future responses to current shocks, robust to high-dimensional
controls without relying on sparsity assumptions. The approach is applicable to
various settings, including impulse response analysis and
difference-in-differences (DiD) estimation. While methods like LASSO exist,
they often assume most parameters are exactly zero, limiting their
effectiveness in dense data generation processes. I propose a novel technique
incorporating high-dimensional covariates in local projections using the
Orthogonal Greedy Algorithm with a high-dimensional AIC (OGA+HDAIC) model
selection method. This approach offers robustness in both sparse and dense
scenarios, improved interpretability, and more reliable causal inference in
local projections. Simulation studies show superior performance in dense and
persistent scenarios compared to conventional LP and LASSO-based approaches. In
an empirical application to Acemoglu, Naidu, Restrepo, and Robinson (2019), I
demonstrate efficiency gains and robustness to a large set of controls.
Additionally, I examine the effect of subjective beliefs on economic
aggregates, demonstrating robustness to various model specifications. A novel
state-dependent analysis reveals that inflation behaves more in line with
rational expectations in good states, but exhibits more subjective, pessimistic
dynamics in bad states.

arXiv link: http://arxiv.org/abs/2402.07743v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-02-12

A step towards the integration of machine learning and classic model-based survey methods

Authors: Tomasz Żądło, Adam Chwila

The use of machine learning methods in traditional surveys, including
official statistics, is still very limited. We therefore propose a predictor
supported by these algorithms, which can be used to predict any population or
subpopulation characteristic. Machine learning methods have already been shown
to be very powerful in identifying and modelling complex and nonlinear
relationships between variables, which means they have very good properties
in the case of strong departures from classic assumptions. We therefore
analyse the performance of our proposal under a different set-up which, in our
opinion, is of greater importance in real-life surveys. We study only small
departures from the assumed model to show that our proposal is a good
alternative, even in comparison with methods that are optimal under the model. Moreover,
we propose a method for ex ante accuracy estimation of machine learning
predictors, enabling accuracy comparisons with classic
methods. The solution to this problem is indicated in the literature as one of
the key issues in integrating these approaches. The simulation studies are
based on a real, longitudinal dataset, where the prediction of subpopulation
characteristics is considered.

arXiv link: http://arxiv.org/abs/2402.07521v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2024-02-11

Interference Among First-Price Pacing Equilibria: A Bias and Variance Analysis

Authors: Luofeng Liao, Christian Kroer, Sergei Leonenkov, Okke Schrijvers, Liang Shi, Nicolas Stier-Moses, Congshan Zhang

Online A/B testing is widely used in the internet industry to inform
decisions on new feature roll-outs. For online marketplaces (such as
advertising markets), standard approaches to A/B testing may lead to biased
results when buyers operate under a budget constraint, as budget consumption in
one arm of the experiment impacts performance of the other arm. To counteract
this interference, one can use a budget-split design where the budget
constraint operates on a per-arm basis and each arm receives an equal fraction
of the budget, leading to “budget-controlled A/B testing.” Despite clear
advantages of budget-controlled A/B testing, performance degrades when the split
budgets become too small, limiting the overall throughput of such systems. In this
paper, we propose a parallel budget-controlled A/B testing design where we use
market segmentation to identify submarkets in the larger market, and we run
parallel experiments on each submarket.
Our contributions are as follows: First, we introduce and demonstrate the
effectiveness of the parallel budget-controlled A/B test design with submarkets
in a large online marketplace environment. Second, we formally define market
interference in first-price auction markets using the first price pacing
equilibrium (FPPE) framework. Third, we propose a debiased surrogate that
eliminates the first-order bias of FPPE, drawing upon the principles of
sensitivity analysis in mathematical programs. Fourth, we derive a plug-in
estimator for the surrogate and establish its asymptotic normality. Fifth, we
provide an estimation procedure for submarket parallel budget-controlled A/B
tests. Finally, we present numerical examples on semi-synthetic data,
confirming that the debiasing technique achieves the desired coverage
properties.

arXiv link: http://arxiv.org/abs/2402.07322v3

Econometrics arXiv cross-link from General Economics (econ.GN), submitted: 2024-02-11

Research on the multi-stage impact of digital economy on rural revitalization in Hainan Province based on GPM model

Authors: Wenbo Lyu

The rapid development of the digital economy has had a profound impact on the
implementation of the rural revitalization strategy. Based on this, this study
takes Hainan Province as the research object to deeply explore the impact of
digital economic development on rural revitalization. The study collected panel
data from 2003 to 2022 to construct an evaluation index system for the digital
economy and rural revitalization and used panel regression analysis and other
methods to explore the promotion effect of the digital economy on rural
revitalization. Research results show that the digital economy has a
significant positive impact on rural revitalization, and this impact increases
as the level of fiscal expenditure increases. The issuance of digital RMB has
further exerted a regulatory effect and promoted the development of the digital
economy and the process of rural revitalization. At the same time, the
establishment of the Hainan Free Trade Port has also played a positive role in
promoting the development of the digital economy and rural revitalization. In
the prediction of the optimal strategy for rural revitalization based on the
development levels of the primary, secondary, and tertiary industries (Rate1,
Rate2, and Rate3), the results suggest that Rate1 can encourage Hainan Province
to implement digital economic innovation and that Rate3 can be encouraged to
carry out promotion behaviours; moreover, when Rate3 promotes Rate2's digital
economic innovation behaviour, Rate2's production behaviour can be standardized
to the greatest extent, the application of the digital economy to the rural
revitalization industry can be accelerated, and the technological advancement
of enterprises can be promoted.

arXiv link: http://arxiv.org/abs/2402.07170v1

Econometrics arXiv paper, submitted: 2024-02-08

High Dimensional Factor Analysis with Weak Factors

Authors: Jungjun Choi, Ming Yuan

This paper studies the principal components (PC) estimator for high
dimensional approximate factor models with weak factors in that the factor
loading ($\Lambda^0$) scales sublinearly in the number $N$ of
cross-section units, i.e., $\Lambda^{0\top} \Lambda^0
/ N^\alpha$ is positive definite in the limit for some $\alpha \in (0,1)$.
While the consistency and asymptotic normality of these estimates are by now
well known when the factors are strong, i.e., $\alpha=1$, the statistical
properties for weak factors remain less explored. Here, we show that the PC
estimator maintains consistency and asymptotic normality for any
$\alpha\in(0,1)$, provided suitable conditions regarding the dependence
structure in the noise are met. This complements an earlier result by Onatski
(2012) that the PC estimator is inconsistent when $\alpha=0$, and the more
recent work by Bai and Ng (2023), who established the asymptotic normality of
the PC estimator when $\alpha \in (1/2,1)$. Our proof strategy integrates the
traditional eigendecomposition-based approach for factor models with a
leave-one-out analysis similar in spirit to those used in matrix completion and
other settings. This combination allows us to deal with weaker factors than the
former approach permits and at the same time relax the incoherence and
independence assumptions often associated with the latter.

arXiv link: http://arxiv.org/abs/2402.05789v1

Econometrics arXiv paper, submitted: 2024-02-08

Difference-in-Differences Estimators with Continuous Treatments and no Stayers

Authors: Clément de Chaisemartin, Xavier D'Haultfœuille, Gonzalo Vazquez-Bare

Many treatments or policy interventions are continuous in nature. Examples
include prices, taxes or temperatures. Empirical researchers have usually
relied on two-way fixed effect regressions to estimate treatment effects in
such cases. However, such estimators are not robust to heterogeneous treatment
effects in general; they also rely on the linearity of treatment effects. We
propose estimators for continuous treatments that do not impose those
restrictions, and that can be used when there are no stayers: the treatment of
all units changes from one period to the next. We start by extending the
nonparametric results of de Chaisemartin et al. (2023) to cases without
stayers. We also present a parametric estimator, and use it to revisit
Deschênes and Greenstone (2012).

arXiv link: http://arxiv.org/abs/2402.05432v1

Econometrics arXiv paper, submitted: 2024-02-08

Selective linear segmentation for detecting relevant parameter changes

Authors: Arnaud Dufays, Aristide Houndetoungan, Alain Coën

Change-point processes are one flexible approach to model long time series.
We propose a method to uncover which model parameters truly vary when a
change-point is detected. Given a set of breakpoints, we use a penalized
likelihood approach to select the best set of parameters that changes over time
and we prove that the penalty function leads to a consistent selection of the
true model. Estimation is carried out via the deterministic annealing
expectation-maximization algorithm. Our method accounts for model selection
uncertainty and associates a probability to all the possible time-varying
parameter specifications. Monte Carlo simulations highlight that the method
works well for many time series models including heteroskedastic processes. For
a sample of 14 Hedge funds (HF) strategies, using an asset based style pricing
model, we shed light on the promising ability of our method to detect the
time-varying dynamics of risk exposures as well as to forecast HF returns.

arXiv link: http://arxiv.org/abs/2402.05329v1

Econometrics arXiv updated paper (originally submitted: 2024-02-07)

Inference for Two-Stage Extremum Estimators

Authors: Aristide Houndetoungan, Abdoul Haki Maoude

We present a simulation-based inference approach for two-stage estimators,
focusing on extremum estimators in the second stage. We accommodate a broad
range of first-stage estimators, including extremum estimators,
high-dimensional estimators, and other types of estimators such as Bayesian
estimators. The key contribution of our approach lies in its ability to
estimate the asymptotic distribution of two-stage estimators, even when the
distributions of both the first- and second-stage estimators are non-normal and
when the second-stage estimator's bias, scaled by the square root of the sample
size, does not vanish asymptotically. This enables reliable inference in
situations where standard methods fail. Additionally, we propose a debiased
estimator, based on the mean of the estimated distribution function, which
exhibits improved finite sample properties. Unlike resampling methods, our
approach avoids the need for multiple calculations of the two-stage estimator.
We illustrate the effectiveness of our method in an empirical application on
peer effects in adolescent fast-food consumption, where we address the issue of
biased instrumental variable estimates resulting from many weak instruments.

arXiv link: http://arxiv.org/abs/2402.05030v2

Econometrics arXiv updated paper (originally submitted: 2024-02-07)

What drives the European carbon market? Macroeconomic factors and forecasts

Authors: Andrea Bastianin, Elisabetta Mirto, Yan Qin, Luca Rossini

Putting a price on carbon -- with taxes or developing carbon markets -- is a
widely used policy measure to achieve the target of net-zero emissions by 2050.
This paper tackles the issue of producing point, direction-of-change, and
density forecasts for the monthly real price of carbon within the EU Emissions
Trading Scheme (EU ETS). We aim to uncover supply- and demand-side forces that
can contribute to improving the prediction accuracy of models at short- and
medium-term horizons. We show that a simple Bayesian Vector Autoregressive
(BVAR) model, augmented with either one or two factors capturing a set of
predictors affecting the price of carbon, provides substantial accuracy gains
over a wide set of benchmark forecasts, including survey expectations and
forecasts made available by data providers. We extend the study to verified
emissions and demonstrate that, in this case, adding stochastic volatility can
further improve the forecasting performance of a single-factor BVAR model. We
rely on emissions and price forecasts to build market monitoring tools that
track demand and price pressure in the EU ETS market. Our results are relevant
for policymakers and market practitioners interested in monitoring the carbon
market dynamics.

arXiv link: http://arxiv.org/abs/2402.04828v2

Econometrics arXiv paper, submitted: 2024-02-07

Hyperparameter Tuning for Causal Inference with Double Machine Learning: A Simulation Study

Authors: Philipp Bach, Oliver Schacht, Victor Chernozhukov, Sven Klaassen, Martin Spindler

Proper hyperparameter tuning is essential for achieving optimal performance
of modern machine learning (ML) methods in predictive tasks. While there is an
extensive literature on tuning ML learners for prediction, there is only little
guidance available on tuning ML learners for causal machine learning and how to
select among different ML learners. In this paper, we empirically assess the
relationship between the predictive performance of ML methods and the resulting
causal estimation based on the Double Machine Learning (DML) approach by
Chernozhukov et al. (2018). DML relies on estimating so-called nuisance
parameters by treating them as supervised learning problems and using them as
plug-in estimates to solve for the (causal) parameter. We conduct an extensive
simulation study using data from the 2019 Atlantic Causal Inference Conference
Data Challenge. We provide empirical insights on the role of hyperparameter
tuning and other practical decisions for causal estimation with DML. First, we
assess the importance of data splitting schemes for tuning ML learners within
Double Machine Learning. Second, we investigate how the choice of ML methods
and hyperparameters, including recent AutoML frameworks, impacts the estimation
performance for a causal parameter of interest. Third, we assess to what extent
the choice of a particular causal model, as characterized by incorporated
parametric assumptions, can be based on predictive performance metrics.
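
The sketch below illustrates the generic DML partialling-out recipe whose tuning the paper studies: nuisance functions are fit with cross-fitting and tuned learners, and the causal parameter comes from a residual-on-residual regression. The scikit-learn learners, tuning grid, and simulated data are illustrative assumptions and do not reproduce the paper's simulation design.

# Generic DML sketch for a partially linear model with cross-fitted, tuned
# nuisance learners; all learners, grids, and data are illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, cross_val_predict

rng = np.random.default_rng(4)
n, p = 2000, 10
X = rng.normal(size=(n, p))
D = X[:, 0] + rng.normal(size=n)              # treatment depends on X
Y = 0.5 * D + X[:, 0] + rng.normal(size=n)    # true causal effect is 0.5

def tuned_forest():
    # hyperparameter tuning of the nuisance learner (the theme of the paper)
    grid = {"max_depth": [2, 4, 8], "n_estimators": [100, 300]}
    return GridSearchCV(RandomForestRegressor(random_state=0), grid, cv=3)

m_hat = cross_val_predict(tuned_forest(), X, Y, cv=5)   # E[Y|X], cross-fitted
g_hat = cross_val_predict(tuned_forest(), X, D, cv=5)   # E[D|X], cross-fitted

u, v = Y - m_hat, D - g_hat                   # residualize outcome and treatment
theta = (v @ u) / (v @ v)                     # residual-on-residual estimate
print("estimated causal effect:", round(theta, 3))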

arXiv link: http://arxiv.org/abs/2402.04674v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-02-06

Fast Online Changepoint Detection

Authors: Fabrizio Ghezzi, Eduardo Rossi, Lorenzo Trapani

We study online changepoint detection in the context of a linear regression
model. We propose a class of heavily weighted statistics based on the CUSUM
process of the regression residuals, which are specifically designed to ensure
timely detection of breaks occurring early on during the monitoring horizon. We
subsequently propose a class of composite statistics, constructed using
different weighing schemes; the decision rule to mark a changepoint is based on
the largest statistic across the various weights, thus effectively working like
a veto-based voting mechanism, which ensures fast detection irrespective of the
location of the changepoint. Our theory is derived under a very general form of
weak dependence, thus being able to apply our tests to virtually all time
series encountered in economics, medicine, and other applied sciences. Monte
Carlo simulations show that our methodologies are able to control the
procedure-wise Type I Error, and have short detection delays in the presence of
breaks.
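
For intuition only, the snippet below sketches an online CUSUM-type monitoring loop on regression residuals with a weighting that boosts sensitivity early in the monitoring horizon; the weighting exponent, threshold, and simulated data are placeholders rather than the authors' composite statistics or critical values.

# Stylized online CUSUM monitoring of regression residuals; the boundary weights,
# threshold, and data are placeholders, not the paper's composite statistics.
import numpy as np

rng = np.random.default_rng(5)
m = 200                                       # training (break-free) sample size
x = rng.normal(size=600)
y = 1.0 + 2.0 * x + rng.normal(size=600)
y[400:] += 1.5                                # a level shift during monitoring

X = np.column_stack([np.ones(m), x[:m]])
beta = np.linalg.lstsq(X, y[:m], rcond=None)[0]
sigma = np.std(y[:m] - X @ beta, ddof=2)

cusum, threshold = 0.0, 3.0                   # placeholder critical value
for t in range(m, 600):
    cusum += y[t] - beta[0] - beta[1] * x[t]  # CUSUM of monitoring residuals
    k = t - m + 1
    weight = np.sqrt(m) * (1 + k / m) * (k / (k + m)) ** 0.25
    if abs(cusum) / (sigma * weight) > threshold:
        print("changepoint flagged at observation", t)
        break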

arXiv link: http://arxiv.org/abs/2402.04433v1

Econometrics arXiv paper, submitted: 2024-02-06

Monthly GDP nowcasting with Machine Learning and Unstructured Data

Authors: Juan Tenorio, Wilder Perez

In the dynamic landscape of continuous change, Machine Learning (ML)
"nowcasting" models offer a distinct advantage for informed decision-making in
both public and private sectors. This study introduces ML-based GDP growth
projection models for monthly rates in Peru, integrating structured
macroeconomic indicators with high-frequency unstructured sentiment variables.
Analyzing data from January 2007 to May 2023, encompassing 91 leading economic
indicators, the study evaluates six ML algorithms to identify optimal
predictors. Findings highlight the superior predictive capability of ML models
using unstructured data, particularly Gradient Boosting Machine, LASSO, and
Elastic Net, exhibiting a 20% to 25% reduction in prediction errors compared to
traditional AR and Dynamic Factor Models (DFM). This enhanced performance is
attributed to the ML models' better handling of data in high-uncertainty periods,
such as economic crises.

arXiv link: http://arxiv.org/abs/2402.04165v1

Econometrics arXiv updated paper (originally submitted: 2024-02-04)

Data-driven Policy Learning for Continuous Treatments

Authors: Chunrong Ai, Yue Fang, Haitian Xie

This paper studies policy learning for continuous treatments from
observational data. Continuous treatments present more significant challenges
than discrete ones because population welfare may need nonparametric
estimation, and policy space may be infinite-dimensional and may satisfy shape
restrictions. We propose to approximate the policy space with a sequence of
finite-dimensional spaces and, for any given policy, obtain the empirical
welfare by applying the kernel method. We consider two cases: known and unknown
propensity scores. In the latter case, we allow for machine learning of the
propensity score and modify the empirical welfare to account for the effect of
machine learning. The learned policy maximizes the empirical welfare or the
modified empirical welfare over the approximating space. In both cases, we
modify the penalty algorithm proposed by Mbakop and Tabord-Meehan (2021) to
data-automate the tuning parameters (i.e., bandwidth and dimension of the
approximating space) and establish an oracle inequality for the welfare regret.

arXiv link: http://arxiv.org/abs/2402.02535v2

Econometrics arXiv updated paper (originally submitted: 2024-02-04)

Decomposing Global Bank Network Connectedness: What is Common, Idiosyncratic and When?

Authors: Jonas Krampe, Luca Margaritella

We propose a novel approach to estimate high-dimensional global bank network
connectedness in both the time and frequency domains. By employing a factor
model with sparse VAR idiosyncratic components, we decompose system-wide
connectedness (SWC) into two key drivers: (i) common component shocks and (ii)
idiosyncratic shocks. We also provide bootstrap confidence bands for all SWC
measures. Furthermore, spectral density estimation allows us to disentangle SWC
into short-, medium-, and long-term frequency responses to these shocks. We
apply our methodology to two datasets of daily stock price volatilities for
over 90 global banks, spanning the periods 2003-2013 and 2014-2023. Our
empirical analysis reveals that SWC spikes during global crises, primarily
driven by common component shocks and their short-term effects. Conversely, in
normal times, SWC is largely influenced by idiosyncratic shocks and medium-term
dynamics.

arXiv link: http://arxiv.org/abs/2402.02482v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2024-02-04

Bootstrapping Fisher Market Equilibrium and First-Price Pacing Equilibrium

Authors: Luofeng Liao, Christian Kroer

The linear Fisher market (LFM) is a basic equilibrium model from economics,
which also has applications in fair and efficient resource allocation.
First-price pacing equilibrium (FPPE) is a model capturing budget-management
mechanisms in first-price auctions. In certain practical settings such as
advertising auctions, there is an interest in performing statistical inference
over these models. A popular methodology for general statistical inference is
the bootstrap procedure. Yet, for LFM and FPPE there is no existing theory for
the valid application of bootstrap procedures. In this paper, we introduce and
devise several statistically valid bootstrap inference procedures for LFM and
FPPE. The most challenging part is to bootstrap general FPPE, which reduces to
bootstrapping constrained M-estimators, a largely unexplored problem. We devise
a bootstrap procedure for FPPE under mild degeneracy conditions by using the
powerful tool of epi-convergence theory. Experiments with synthetic and
semi-real data verify our theory.

arXiv link: http://arxiv.org/abs/2402.02303v6

Econometrics arXiv updated paper (originally submitted: 2024-02-03)

One-inflated zero-truncated Poisson and negative binomial regression models

Authors: Ryan T. Godwin

The workhorse model for zero-truncated count data (y = 1, 2, ...) is the
zero-truncated negative binomial (ZTNB) model. We find it should seldom be
used. Instead, we recommend the one-inflated zero-truncated negative binomial
(OIZTNB) model developed here. Zero-truncated count data often contain an
excess of 1s, leading to bias and inconsistency in the ZTNB model. The
importance of the OIZTNB model is apparent given the obvious presence of
one-inflation in four datasets that have traditionally championed the standard
ZTNB. We provide estimation, marginal effects, and a suite of accompanying
tools in the R package oneinfl, available on CRAN.

arXiv link: http://arxiv.org/abs/2402.02272v2

Econometrics arXiv updated paper (originally submitted: 2024-02-03)

The general solution to an autoregressive law of motion

Authors: Brendan K. Beare, Massimo Franchi, Phil Howlett

We provide a complete description of the set of all solutions to an
autoregressive law of motion in a finite-dimensional complex vector space.
Every solution is shown to be the sum of three parts, each corresponding to a
directed flow of time. One part flows forward from the arbitrarily distant
past; one flows backward from the arbitrarily distant future; and one flows
outward from time zero. The three parts are obtained by applying three
complementary spectral projections to the solution, these corresponding to a
separation of the eigenvalues of the autoregressive operator according to
whether they are inside, outside or on the unit circle. We provide a
finite-dimensional parametrization of the set of all solutions.
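
As a scalar illustration of the three directed flows of time (a sketch of the familiar special case, not the paper's general finite-dimensional statement), consider the autoregressive law of motion \(x_t = a x_{t-1} + u_t\):

\[
x_t = \sum_{j=0}^{\infty} a^{j} u_{t-j} \quad (|a|<1), \qquad
x_t = -\sum_{j=1}^{\infty} a^{-j} u_{t+j} \quad (|a|>1),
\]

so that the bounded solution flows forward from the arbitrarily distant past when the eigenvalue lies inside the unit circle and backward from the arbitrarily distant future when it lies outside, while an eigenvalue on the unit circle (e.g. \(a=1\), giving \(x_t = x_0 + \sum_{j=1}^{t} u_j\) for \(t>0\)) produces solutions that flow outward from time zero.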

arXiv link: http://arxiv.org/abs/2402.01966v2

Econometrics arXiv updated paper (originally submitted: 2024-02-02)

Sparse spanning portfolios and under-diversification with second-order stochastic dominance

Authors: Stelios Arvanitis, Olivier Scaillet, Nikolas Topaloglou

We develop and implement methods for determining whether relaxing sparsity
constraints on portfolios improves the investment opportunity set for
risk-averse investors. We formulate a new estimation procedure for sparse
second-order stochastic spanning based on a greedy algorithm and Linear
Programming. We show the optimal recovery of the sparse solution asymptotically
whether spanning holds or not. From large equity datasets, we estimate the
expected utility loss due to possible under-diversification, and find that
there is no benefit from expanding a sparse opportunity set beyond 45 assets.
The optimal sparse portfolio invests in 10 industry sectors and cuts tail risk
when compared to a sparse mean-variance portfolio. On a rolling-window basis,
the number of assets shrinks to 25 assets in crisis periods, while standard
factor models cannot explain the performance of the sparse portfolios.

arXiv link: http://arxiv.org/abs/2402.01951v2

Econometrics arXiv paper, submitted: 2024-02-02

Data-driven model selection within the matrix completion method for causal panel data models

Authors: Sandro Heiniger

Matrix completion estimators are employed in causal panel data models to
regulate the rank of the underlying factor model using nuclear norm
minimization. This convex optimization problem enables concurrent
regularization of a potentially high-dimensional set of covariates to shrink
the model size. For valid finite sample inference, we adopt a permutation-based
approach and prove its validity for any treatment assignment mechanism.
Simulations illustrate the consistency of the proposed estimator in parameter
estimation and variable selection. An application to public health policies in
Germany demonstrates the data-driven model selection feature on empirical data
and finds no effect of travel restrictions on the containment of severe
Covid-19 infections.
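
As a generic illustration of the nuclear norm machinery underlying such estimators (a textbook soft-impute iteration, not the paper's estimator, its covariate regularization, or its permutation-based inference), missing entries of a low-rank panel can be imputed by iterative singular-value soft-thresholding:

# Generic soft-impute sketch: nuclear-norm-regularized matrix completion via
# iterative singular-value soft-thresholding; simulated low-rank panel only.
import numpy as np

rng = np.random.default_rng(6)
N, T, r = 60, 40, 3
Y = rng.normal(size=(N, r)) @ rng.normal(size=(r, T)) + 0.1 * rng.normal(size=(N, T))
observed = rng.random((N, T)) > 0.2           # roughly 20% of entries missing

L, lam = np.zeros((N, T)), 1.0                # lam is a placeholder penalty level
for _ in range(200):
    filled = np.where(observed, Y, L)         # plug current estimate into gaps
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    L = (U * np.maximum(s - lam, 0.0)) @ Vt   # soft-threshold singular values

rmse = np.sqrt(np.mean((L - Y)[~observed] ** 2))
print("imputation RMSE on missing entries:", round(rmse, 3))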

arXiv link: http://arxiv.org/abs/2402.01069v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2024-02-01

DoubleMLDeep: Estimation of Causal Effects with Multimodal Data

Authors: Sven Klaassen, Jan Teichert-Kluge, Philipp Bach, Victor Chernozhukov, Martin Spindler, Suhas Vijaykumar

This paper explores the use of unstructured, multimodal data, namely text and
images, in causal inference and treatment effect estimation. We propose a
neural network architecture that is adapted to the double machine learning
(DML) framework, specifically the partially linear model. An additional
contribution of our paper is a new method to generate a semi-synthetic dataset
which can be used to evaluate the performance of causal effect estimation in
the presence of text and images as confounders. The proposed methods and
architectures are evaluated on the semi-synthetic dataset and compared to
standard approaches, highlighting the potential benefit of using text and
images directly in causal studies. Our findings have implications for
researchers and practitioners in economics, marketing, finance, medicine and
data science in general who are interested in estimating causal quantities
using non-traditional data.

arXiv link: http://arxiv.org/abs/2402.01785v1

Econometrics arXiv paper, submitted: 2024-02-01

The prices of renewable commodities: A robust stationarity analysis

Authors: Manuel Landajo, María José Presno

This paper addresses the problem of testing for persistence in the effects of
the shocks affecting the prices of renewable commodities, which have potential
implications on stabilization policies and economic forecasting, among other
areas. A robust methodology is employed that enables the determination of the
potential presence and number of instant/gradual structural changes in the
series, stationarity testing conditional on the number of changes detected, and
the detection of change points. This procedure is applied to the annual real
prices of eighteen renewable commodities over the period of 1900-2018. Results
indicate that most of the series display non-linear features, including
quadratic patterns and regime transitions that often coincide with well-known
political and economic episodes. The conclusions of stationarity testing
suggest that roughly half of the series are integrated. Stationarity fails to
be rejected for grains, whereas most livestock and textile commodities do
reject stationarity. Evidence is mixed in all soft commodities and tropical
crops, where stationarity can be rejected in approximately half of the cases.
The implication would be that for these commodities, stabilization schemes
would not be recommended.

arXiv link: http://arxiv.org/abs/2402.01005v1

Econometrics arXiv paper, submitted: 2024-02-01

EU-28's progress towards the 2020 renewable energy share. A club convergence analysis

Authors: María José Presno, Manuel Landajo

This paper assesses the convergence of the EU-28 countries towards their
common goal of 20% in the renewable energy share indicator by year 2020. The
potential presence of clubs of convergence towards different steady state
equilibria is also analyzed from both the standpoints of global convergence to
the 20% goal and specific convergence to the various targets assigned to Member
States. Two clubs of convergence are detected in the former case, each
corresponding to different RES targets. A probit model is also fitted with the
aim of better understanding the determinants of club membership, that seemingly
include real GDP per capita, expenditure on environmental protection, energy
dependence, and nuclear capacity, with all of them having statistically
significant effects. Finally, convergence is also analyzed separately for the
transport, heating and cooling, and electricity sectors.

arXiv link: http://arxiv.org/abs/2402.00788v1

Econometrics arXiv updated paper (originally submitted: 2024-02-01)

Arellano-Bond LASSO Estimator for Dynamic Linear Panel Models

Authors: Victor Chernozhukov, Iván Fernández-Val, Chen Huang, Weining Wang

The Arellano-Bond estimator is a fundamental method for dynamic panel data
models, widely used in practice. However, the estimator is severely biased when
the data's time series dimension $T$ is long due to the large degree of
overidentification. We show that weak dependence along the panel's time series
dimension naturally implies approximate sparsity of the most informative moment
conditions, motivating the following approach to remove the bias: First, apply
LASSO to the cross-section data at each time period to construct most
informative (and cross-fitted) instruments, using lagged values of suitable
covariates. This step relies on approximate sparsity to select the most
informative instruments. Second, apply a linear instrumental variable estimator
after first differencing the dynamic structural equation using the constructed
instruments. Under weak time series dependence, we show the new estimator is
consistent and asymptotically normal under much weaker conditions on $T$'s
growth than the Arellano-Bond estimator. Our theory covers models with high
dimensional covariates, including multiple lags of the dependent variable,
common in modern applications. We illustrate our approach by applying it to
weekly county-level panel data from the United States to study opening K-12
schools and other mitigation policies' short and long-term effects on
COVID-19's spread.
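
A heavily stylized sketch of the two steps is given below for a pure AR(1) panel: a per-period LASSO of the first-differenced lag on deeper lagged levels constructs the instrument, and a just-identified IV regression on the first-differenced equation recovers the autoregressive parameter. Cross-fitting, additional covariates, and valid standard errors are omitted, so this only conveys the mechanics and does not reproduce the proposed estimator.

# Stylized two-step sketch: (1) per-period LASSO of the differenced lag on lagged
# levels to build an instrument, (2) just-identified IV on the differenced AR(1)
# equation. Cross-fitting and inference, central to the paper, are omitted here.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(7)
N, T, rho = 500, 12, 0.6
alpha = rng.normal(size=N)                       # unit fixed effects
y = np.zeros((N, T))
for t in range(1, T):
    y[:, t] = rho * y[:, t - 1] + alpha + rng.normal(size=N)

num = den = 0.0
for t in range(3, T):
    dy = y[:, t] - y[:, t - 1]                   # differenced outcome
    dy_lag = y[:, t - 1] - y[:, t - 2]           # differenced lag (endogenous)
    levels = y[:, :t - 1]                        # candidate instruments y_{i,0..t-2}
    z = LassoCV(cv=5).fit(levels, dy_lag).predict(levels)
    num += z @ dy
    den += z @ dy_lag

print("IV estimate of rho:", round(num / den, 3))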

arXiv link: http://arxiv.org/abs/2402.00584v4

Econometrics arXiv paper, submitted: 2024-02-01

Stochastic convergence in per capita CO$_2$ emissions. An approach from nonlinear stationarity analysis

Authors: María José Presno, Manuel Landajo, Paula Fernández González

This paper studies stochastic convergence of per capita CO$_2$ emissions in
28 OECD countries for the 1901-2009 period. The analysis is carried out at two
aggregation levels, first for the whole set of countries (joint analysis) and
then separately for developed and developing states (group analysis). A
powerful time series methodology, adapted to a nonlinear framework that allows
for quadratic trends with possibly smooth transitions between regimes, is
applied. This approach provides more robust conclusions in convergence path
analysis, enabling (a) robust detection of the presence, and if so, the number
of changes in the level and/or slope of the trend of the series, (b) inferences
on stationarity of relative per capita CO$_2$ emissions, conditionally on the
presence of breaks and smooth transitions between regimes, and (c) estimation
of change locations in the convergence paths. Finally, as stochastic
convergence is attained when both stationarity around a trend and
$\beta$-convergence hold, the linear approach proposed by Tomljanovich and
Vogelsang (2002) is extended in order to allow for more general quadratic
models. Overall, joint analysis finds some evidence of stochastic convergence
in per capita CO$_2$ emissions. Some dispersion in terms of $\beta$-convergence
is detected by group analysis, particularly among developed countries. This is
in accordance with per capita GDP not being the sole determinant of convergence
in emissions, with factors like the search for more efficient technologies, fossil
fuel substitution, innovation, and possibly the outsourcing of industries also
playing a crucial role.

arXiv link: http://arxiv.org/abs/2402.00567v1

Econometrics arXiv paper, submitted: 2024-01-31

Finite- and Large-Sample Inference for Ranks using Multinomial Data with an Application to Ranking Political Parties

Authors: Sergei Bazylik, Magne Mogstad, Joseph Romano, Azeem Shaikh, Daniel Wilhelm

It is common to rank different categories by means of preferences that are
revealed through data on choices. A prominent example is the ranking of
political candidates or parties using the estimated share of support each one
receives in surveys or polls about political attitudes. Since these rankings
are computed using estimates of the share of support rather than the true share
of support, there may be considerable uncertainty concerning the true ranking
of the political candidates or parties. In this paper, we consider the problem
of accounting for such uncertainty by constructing confidence sets for the rank
of each category. We consider both the problem of constructing marginal
confidence sets for the rank of a particular category as well as simultaneous
confidence sets for the ranks of all categories. A distinguishing feature of
our analysis is that we exploit the multinomial structure of the data to
develop confidence sets that are valid in finite samples. We additionally
develop confidence sets using the bootstrap that are valid only approximately
in large samples. We use our methodology to rank political parties in Australia
using data from the 2019 Australian Election Survey. We find that our
finite-sample confidence sets are informative across the entire ranking of
political parties, even in Australian territories with few survey respondents
and/or with parties that are chosen by only a small share of the survey
respondents. In contrast, the bootstrap-based confidence sets may sometimes be
considerably less informative. These findings motivate us to compare these
methods in an empirically-driven simulation study, in which we conclude that
our finite-sample confidence sets often perform better than their large-sample,
bootstrap-based counterparts, especially in settings that resemble our
empirical application.

arXiv link: http://arxiv.org/abs/2402.00192v1
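
For intuition, here is one simple and deliberately conservative way to turn simultaneous
(Bonferroni-adjusted, normal-approximation) confidence intervals for multinomial shares
into marginal confidence sets for ranks. It is only an illustration, not the finite-sample
construction developed in the paper.

```python
import numpy as np
from scipy.stats import norm

def rank_confidence_sets(counts, alpha=0.05):
    counts = np.asarray(counts, dtype=float)
    n, k = counts.sum(), len(counts)
    p_hat = counts / n
    se = np.sqrt(p_hat * (1 - p_hat) / n)
    z = norm.ppf(1 - alpha / (2 * k))              # Bonferroni adjustment across categories
    lo, hi = p_hat - z * se, p_hat + z * se
    sets = []
    for i in range(k):
        surely_ahead = sum(lo[j] > hi[i] for j in range(k) if j != i)   # categories certainly ahead of i
        maybe_ahead = sum(hi[j] >= lo[i] for j in range(k) if j != i)   # categories possibly ahead of i
        sets.append((1 + surely_ahead, 1 + maybe_ahead))                # (lower, upper) rank bounds
    return sets

print(rank_confidence_sets([320, 290, 210, 180]))
```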

Econometrics arXiv updated paper (originally submitted: 2024-01-31)

The Mixed Aggregate Preference Logit Model: A Machine Learning Approach to Modeling Unobserved Heterogeneity in Discrete Choice Analysis

Authors: Connor R. Forsythe, Cristian Arteaga, John P. Helveston

This paper introduces the Mixed Aggregate Preference Logit (MAPL, pronounced
"maple") model, a novel class of discrete choice models that leverages machine
learning to model unobserved heterogeneity in discrete choice analysis. The
traditional mixed logit model (also known as "random parameters logit")
parameterizes preference heterogeneity through assumptions about
feature-specific heterogeneity distributions. These parameters are also
typically assumed to be linearly added in a random utility (or random regret)
model. MAPL models relax these assumptions by instead directly relating model
inputs to parameters of alternative-specific distributions of aggregate
preference heterogeneity, with no feature-level assumptions required. MAPL
models eliminate the need to make any assumption about the functional form of
the latent decision model, freeing modelers from potential misspecification
errors. In a simulation experiment, we demonstrate that a single MAPL model
specification is capable of correctly modeling multiple different
data-generating processes with different forms of utility and heterogeneity
specifications. MAPL models advance machine-learning-based choice models by
accounting for unobserved heterogeneity. Further, MAPL models can be leveraged
by traditional choice modelers as a diagnostic tool for identifying utility and
heterogeneity misspecification.

arXiv link: http://arxiv.org/abs/2402.00184v2

Econometrics arXiv cross-link from stat.CO (stat.CO), submitted: 2024-01-31

The Fourier-Malliavin Volatility (FMVol) MATLAB library

Authors: Simona Sanfelici, Giacomo Toscano

This paper presents the Fourier-Malliavin Volatility (FMVol) estimation
library for MATLAB. This library includes functions that implement Fourier-
Malliavin estimators (see Malliavin and Mancino (2002, 2009)) of the volatility
and co-volatility of continuous stochastic volatility processes and
second-order quantities, like the quarticity (the squared volatility), the
volatility of volatility and the leverage (the covariance between changes in
the process and changes in its volatility). The Fourier-Malliavin method is
fully non-parametric, does not require equally-spaced observations and is
robust to measurement errors, or noise, without any preliminary bias correction
or pre-treatment of the observations. Further, in its multivariate version, it
is intrinsically robust to irregular and asynchronous sampling. Although
originally introduced for a specific application in financial econometrics,
namely the estimation of asset volatilities, the Fourier-Malliavin method is a
general method that can be applied whenever one is interested in reconstructing
the latent volatility and second-order quantities of a continuous stochastic
volatility process from discrete observations.

arXiv link: http://arxiv.org/abs/2402.00172v1
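
A rough numpy sketch in the spirit of the univariate Fourier (Malliavin-Mancino)
integrated-variance estimator is given below. It is not the FMVol library itself, and
normalisation conventions vary across references; observation times need not be equally
spaced.

```python
import numpy as np

def fourier_integrated_variance(times, log_prices, N):
    # rescale observation times to [0, 2*pi]
    t = 2 * np.pi * (np.asarray(times) - times[0]) / (times[-1] - times[0])
    dp = np.diff(log_prices)
    ks = np.arange(-N, N + 1)
    # Fourier coefficients of the log-price increments
    c = np.array([(np.exp(-1j * k * t[:-1]) * dp).sum() / (2 * np.pi) for k in ks])
    # averaging c_k * c_{-k} over |k| <= N recovers the 0-th Fourier coefficient of sigma^2
    c0_sigma2 = (2 * np.pi / (2 * N + 1)) * (c * c[::-1]).sum()
    return float(np.real(2 * np.pi * c0_sigma2))   # integrated variance over the sample
```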

Econometrics arXiv updated paper (originally submitted: 2024-01-31)

Regularizing Fairness in Optimal Policy Learning with Distributional Targets

Authors: Anders Bredahl Kock, David Preinerstorfer

A decision maker typically (i) incorporates training data to learn about the
relative effectiveness of treatments, and (ii) chooses an implementation
mechanism that implies an “optimal” predicted outcome distribution according
to some target functional. Nevertheless, a fairness-aware decision maker may
not be satisfied with achieving said optimality at the cost of being "unfair"
to a subgroup of the population, in the sense that the outcome
distribution in that subgroup deviates too strongly from the overall optimal
outcome distribution. We study a framework that allows the decision maker to
regularize such deviations, while allowing for a wide range of target
functionals and fairness measures to be employed. We establish regret and
consistency guarantees for empirical success policies with (possibly)
data-driven preference parameters, and provide numerical results. Furthermore,
we briefly illustrate the methods in two empirical settings.

arXiv link: http://arxiv.org/abs/2401.17909v2

Econometrics arXiv updated paper (originally submitted: 2024-01-31)

Marginal treatment effects in the absence of instrumental variables

Authors: Zhewen Pan, Zhengxin Wang, Junsen Zhang, Yahong Zhou

We propose a method for defining, identifying, and estimating the marginal
treatment effect (MTE) without imposing the instrumental variable (IV)
assumptions of independence, exclusion, and separability (or monotonicity).
Under a new definition of the MTE based on reduced-form treatment error that is
statistically independent of the covariates, we find that the relationship
between the MTE and standard treatment parameters holds in the absence of IVs.
We provide a set of sufficient conditions ensuring the identification of the
defined MTE in an environment of essential heterogeneity. The key conditions
include a linear restriction on potential outcome regression functions, a
nonlinear restriction on the propensity score, and a conditional mean
independence restriction that will lead to additive separability. We prove this
identification using the notion of semiparametric identification based on
functional form. We also provide an empirical application to the Head Start
program to illustrate the usefulness of the proposed method in analyzing
heterogeneous causal effects when IVs are elusive.

arXiv link: http://arxiv.org/abs/2401.17595v2

Econometrics arXiv paper, submitted: 2024-01-30

Partial Identification of Binary Choice Models with Misreported Outcomes

Authors: Orville Mondal, Rui Wang

This paper provides partial identification of various binary choice models
with misreported dependent variables. We propose two distinct approaches by
exploiting different instrumental variables respectively. In the first
approach, the instrument is assumed to only affect the true dependent variable
but not misreporting probabilities. The second approach uses an instrument that
influences misreporting probabilities monotonically while having no effect on
the true dependent variable. Moreover, we derive identification results under
additional restrictions on misreporting, including bounded/monotone
misreporting probabilities. We use simulations to demonstrate the robust
performance of our approaches, and apply the method to study educational
attainment.

arXiv link: http://arxiv.org/abs/2401.17137v1

Econometrics arXiv cross-link from cs.GT (cs.GT), submitted: 2024-01-30

Congestion Pricing for Efficiency and Equity: Theory and Applications to the San Francisco Bay Area

Authors: Chinmay Maheshwari, Kshitij Kulkarni, Druv Pai, Jiarui Yang, Manxi Wu, Shankar Sastry

Congestion pricing, while adopted by many cities to alleviate traffic
congestion, raises concerns about widening socioeconomic disparities due to its
disproportionate impact on low-income travelers. We address this concern by
proposing a new class of congestion pricing schemes that not only minimize
total travel time, but also incorporate an equity objective, reducing
disparities in the relative change in travel costs across populations with
different incomes, following the implementation of tolls. Our analysis builds
on a congestion game model with heterogeneous traveler populations. We present
four pricing schemes that account for practical considerations, such as the
ability to charge differentiated tolls to various traveler populations and the
option to toll all or only a subset of edges in the network. We evaluate our
pricing schemes in the calibrated freeway network of the San Francisco Bay
Area. We demonstrate that the proposed congestion pricing schemes improve both
the total travel time and the equity objective compared to the current pricing
scheme.
Our results further show that pricing schemes charging differentiated prices
to traveler populations with varying value-of-time lead to a more equitable
distribution of travel costs compared to those that charge a homogeneous price
to all.

arXiv link: http://arxiv.org/abs/2401.16844v2

Econometrics arXiv paper, submitted: 2024-01-29

Graph Neural Networks: Theory for Estimation with Application on Network Heterogeneity

Authors: Yike Wang, Chris Gu, Taisuke Otsu

This paper presents a novel application of graph neural networks for modeling
and estimating network heterogeneity. Network heterogeneity is characterized by
variations in a unit's decisions or outcomes that depend not only on its own
attributes but also on the conditions of its surrounding neighborhood. We
delineate the convergence rate of the graph neural network estimator, as well
as its applicability in semiparametric causal inference with heterogeneous
treatment effects. The finite-sample performance of our estimator is evaluated
through Monte Carlo simulations. In an empirical setting related to
microfinance program participation, we apply the new estimator to examine the
average treatment effects and outcomes of counterfactual policies, and to
propose an enhanced strategy for selecting the initial recipients of program
information in social networks.

arXiv link: http://arxiv.org/abs/2401.16275v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2024-01-28

Comparing MCMC algorithms in Stochastic Volatility Models using Simulation Based Calibration

Authors: Benjamin Wee

Simulation Based Calibration (SBC) is applied to analyse two commonly used,
competing Markov chain Monte Carlo algorithms for estimating the posterior
distribution of a stochastic volatility model. In particular, the bespoke
'off-set mixture approximation' algorithm proposed by Kim, Shephard, and Chib
(1998) is explored together with a Hamiltonian Monte Carlo algorithm
implemented through Stan. The SBC analysis involves a simulation study to
assess whether each sampling algorithm has the capacity to produce valid
inference for the correctly specified model, while also characterising
statistical efficiency through the effective sample size. Results show that
Stan's No-U-Turn sampler, an implementation of Hamiltonian Monte Carlo,
produces a well-calibrated posterior estimate while the celebrated off-set
mixture approach is less efficient and poorly calibrated, though model
parameterisation also plays a role. Limitations and restrictions of generality
are discussed.

arXiv link: http://arxiv.org/abs/2402.12384v1
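
The SBC logic can be illustrated with a toy conjugate model standing in for the
stochastic volatility samplers studied in the paper: draw a parameter from the prior,
simulate data, draw from the (here exact) posterior, and record the rank of the prior
draw, which should be uniform if the sampler is correct. The model and all settings
below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims, n_obs, L = 1000, 50, 99          # replications, data size, posterior draws kept
ranks = np.empty(n_sims, dtype=int)

for s in range(n_sims):
    theta = rng.normal(0.0, 1.0)                      # draw parameter from the prior
    y = rng.normal(theta, 1.0, size=n_obs)            # simulate data given theta
    # exact conjugate posterior; a real SBC study would run the MCMC sampler here
    post_var = 1.0 / (1.0 + n_obs)
    post_mean = post_var * y.sum()
    draws = rng.normal(post_mean, np.sqrt(post_var), size=L)
    ranks[s] = int((draws < theta).sum())             # SBC rank statistic in {0, ..., L}

hist, _ = np.histogram(ranks, bins=np.arange(L + 2) - 0.5)
print("rank counts (should be roughly uniform):", hist[:10], "...")
```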

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-01-27

Testing the Exogeneity of Instrumental Variables and Regressors in Linear Regression Models Using Copulas

Authors: Seyed Morteza Emadi

We provide a copula-based approach to test the exogeneity of instrumental
variables in linear regression models. We show that the exogeneity of
instrumental variables is equivalent to the exogeneity of their standard normal
transformations with the same CDF value. Then, we establish a Wald test for the
exogeneity of the instrumental variables. We demonstrate the performance of our
test using simulation studies. Our simulations show that if the instruments are
actually endogenous, our test rejects the exogeneity hypothesis approximately
93% of the time at the 5% significance level. Conversely, when instruments are
truly exogenous, it dismisses the exogeneity assumption less than 30% of the
time on average for data with 200 observations and less than 2% of the time for
data with 1,000 observations. Our results demonstrate our test's effectiveness,
offering significant value to applied econometricians.

arXiv link: http://arxiv.org/abs/2401.15253v1

Econometrics arXiv paper, submitted: 2024-01-26

csranks: An R Package for Estimation and Inference Involving Ranks

Authors: Denis Chetverikov, Magne Mogstad, Pawel Morgen, Joseph Romano, Azeem Shaikh, Daniel Wilhelm

This article introduces the R package csranks for estimation and inference
involving ranks. First, we review methods for the construction of confidence
sets for ranks, namely marginal and simultaneous confidence sets as well as
confidence sets for the identities of the tau-best. Second, we review methods
for estimation and inference in regressions involving ranks. Third, we describe
the implementation of these methods in csranks and illustrate their usefulness
in two examples: one about the quantification of uncertainty in the PISA
ranking of countries and one about the measurement of intergenerational
mobility using rank-rank regressions.

arXiv link: http://arxiv.org/abs/2401.15205v1

Econometrics arXiv updated paper (originally submitted: 2024-01-26)

High-dimensional forecasting with known knowns and known unknowns

Authors: M. Hashem Pesaran, Ron P. Smith

Forecasts play a central role in decision making under uncertainty. After a
brief review of the general issues, this paper considers ways of using
high-dimensional data in forecasting. We consider selecting variables from a
known active set, known knowns, using Lasso and OCMT, and approximating
unobserved latent factors, known unknowns, by various means. This combines both
sparse and dense approaches. We demonstrate the various issues involved in
variable selection in a high-dimensional setting with an application to
forecasting UK inflation at different horizons over the period 2020q1-2023q1.
This application shows both the power of parsimonious models and the importance
of allowing for global variables.

arXiv link: http://arxiv.org/abs/2401.14582v2

Econometrics arXiv updated paper (originally submitted: 2024-01-25)

Structural Periodic Vector Autoregressions

Authors: Daniel Dzikowski, Carsten Jentsch

While seasonality inherent to raw macroeconomic data is commonly removed by
seasonal adjustment techniques before it is used for structural inference, this
may distort valuable information in the data. As an alternative method to
commonly used structural vector autoregressions (SVARs) for seasonally adjusted
data, we propose to model potential periodicity in seasonally unadjusted (raw)
data directly by structural periodic vector autoregressions (SPVARs). This
approach allows not only for periodically time-varying intercepts, but also
for periodic autoregressive parameters and innovation variances. As this
larger flexibility leads to an increased number of parameters, we propose
linearly constrained estimation techniques. Moreover, based on SPVARs, we
provide two novel identification schemes and propose a general framework for
impulse response analyses that allows for direct consideration of seasonal
patterns. We provide asymptotic theory for SPVAR estimators and impulse
responses under flexible linear restrictions and introduce a test for
seasonality in impulse responses. For the construction of confidence intervals,
we discuss several residual-based (seasonal) bootstrap methods and prove their
bootstrap consistency under different assumptions. A real data application
shows that useful information about the periodic structure in the data may be
lost when relying on common seasonal adjustment methods.

arXiv link: http://arxiv.org/abs/2401.14545v2

Econometrics arXiv paper, submitted: 2024-01-25

Identification of Nonseparable Models with Endogenous Control Variables

Authors: Kaicheng Chen, Kyoo il Kim

We study identification of treatment effects in a class of nonseparable models
in the presence of potentially endogenous control variables. We show that,
provided the treatment variable and the controls are measurably separated, the
usual conditional independence condition or the availability of an excluded
instrument suffices for identification.

arXiv link: http://arxiv.org/abs/2401.14395v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2024-01-24

Entrywise Inference for Missing Panel Data: A Simple and Instance-Optimal Approach

Authors: Yuling Yan, Martin J. Wainwright

Longitudinal or panel data can be represented as a matrix with rows indexed
by units and columns indexed by time. We consider inferential questions
associated with the missing data version of panel data induced by staggered
adoption. We propose a computationally efficient procedure for estimation,
involving only simple matrix algebra and singular value decomposition, and
prove non-asymptotic and high-probability bounds on its error in estimating
each missing entry. By controlling proximity to a suitably scaled Gaussian
variable, we develop and analyze a data-driven procedure for constructing
entrywise confidence intervals with pre-specified coverage. Despite its
simplicity, our procedure turns out to be instance-optimal: we prove that the
width of our confidence intervals matches a non-asymptotic instance-wise lower
bound derived via a Bayesian Cramér-Rao argument. We illustrate the
sharpness of our theoretical characterization on a variety of numerical
examples. Our analysis is based on a general inferential toolbox for SVD-based
algorithms applied to the matrix denoising model, which might be of independent
interest.

arXiv link: http://arxiv.org/abs/2401.13665v2

Econometrics arXiv paper, submitted: 2024-01-24

New accessibility measures based on unconventional big data sources

Authors: G. Arbia, V. Nardelli, N. Salvini, I. Valentini

In health econometric studies we are often interested in quantifying aspects
related to the accessibility to medical infrastructures. The increasing
availability of data automatically collected through unconventional sources
(such as webscraping, crowdsourcing or internet of things) recently opened
previously unconceivable opportunities to researchers interested in measuring
accessibility and to use it as a tool for real-time monitoring, surveillance
and health policy definition. This paper contributes to this strand of
literature by proposing new accessibility measures that can be continuously fed
by automatic data collection. We present new measures of accessibility and
illustrate their use to study the territorial impact of supply-side shocks to
health facilities. We also illustrate the potential of our proposal with a case
study based on a huge set of data (related to the Emergency Departments in
Milan, Italy) that have been webscraped for the purpose of this paper every 5
minutes from November 2021 to March 2022, amounting to approximately 5 million
observations.

arXiv link: http://arxiv.org/abs/2401.13370v1

Econometrics arXiv updated paper (originally submitted: 2024-01-24)

Realized Stochastic Volatility Model with Skew-t Distributions for Improved Volatility and Quantile Forecasting

Authors: Makoto Takahashi, Yuta Yamauchi, Toshiaki Watanabe, Yasuhiro Omori

Accurate forecasting of volatility and return quantiles is essential for
evaluating financial tail risks such as value-at-risk and expected shortfall.
This study proposes an extension of the traditional stochastic volatility
model, termed the realized stochastic volatility model, that incorporates
realized volatility as an efficient proxy for latent volatility. To better
capture the stylized features of financial return distributions, particularly
skewness and heavy tails, we introduce three variants of skewed
t-distributions, two of which incorporate skew-normal components to flexibly
model asymmetry. The models are estimated using a Bayesian Markov chain Monte
Carlo approach and applied to daily returns and realized volatilities from
major U.S. and Japanese stock indices. Empirical results demonstrate that
incorporating both realized volatility and flexible return distributions
substantially improves the accuracy of volatility and tail risk forecasts.

arXiv link: http://arxiv.org/abs/2401.13179v3

Econometrics arXiv updated paper (originally submitted: 2024-01-23)

Inference under partial identification with minimax test statistics

Authors: Isaac Loh

We provide a means of computing and estimating the asymptotic distributions
of statistics based on an outer minimization of an inner maximization. Such
test statistics, which arise frequently in moment models, are of special
interest in providing hypothesis tests under partial identification. Under
general conditions, we provide an asymptotic characterization of such test
statistics using the minimax theorem, and a means of computing critical values
using the bootstrap. Under some light regularity assumptions, our results
augment several asymptotic approximations that have been provided for partially
identified hypothesis tests, and extend them by mitigating their dependence on
local linear approximations of the parameter space. These asymptotic results
are generally simple to state and straightforward to compute (especially
adversarially).

arXiv link: http://arxiv.org/abs/2401.13057v2

Econometrics arXiv paper, submitted: 2024-01-22

Interpreting Event-Studies from Recent Difference-in-Differences Methods

Authors: Jonathan Roth

This note discusses the interpretation of event-study plots produced by
recent difference-in-differences methods. I show that even when specialized to
the case of non-staggered treatment timing, the default plots produced by
software for three of the most popular recent methods (de Chaisemartin and
D'Haultfoeuille, 2020; Callaway and Sant'Anna, 2021; Borusyak, Jaravel and
Spiess, 2024) do not match those of traditional two-way fixed effects (TWFE)
event-studies: the new methods may show a kink or jump at the time of treatment
even when the TWFE event-study shows a straight line. This difference stems
from the fact that the new methods construct the pre-treatment coefficients
asymmetrically from the post-treatment coefficients. As a result, visual
heuristics for analyzing TWFE event-study plots should not be immediately
applied to those from these methods. I conclude with practical recommendations
for constructing and interpreting event-study plots when using these methods.

arXiv link: http://arxiv.org/abs/2401.12309v1

Econometrics arXiv updated paper (originally submitted: 2024-01-22)

Temporal Aggregation for the Synthetic Control Method

Authors: Liyang Sun, Eli Ben-Michael, Avi Feller

The synthetic control method (SCM) is a popular approach for estimating the
impact of a treatment on a single unit with panel data. Two challenges arise
with higher frequency data (e.g., monthly versus yearly): (1) achieving
excellent pre-treatment fit is typically more challenging; and (2) overfitting
to noise is more likely. Aggregating data over time can mitigate these problems
but can also destroy important signal. In this paper, we bound the bias for SCM
with disaggregated and aggregated outcomes and give conditions under which
aggregating tightens the bounds. We then propose finding weights that balance
both disaggregated and aggregated series.

arXiv link: http://arxiv.org/abs/2401.12084v2
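
As a rough illustration of the balancing idea, the hypothetical sketch below fits
synthetic control weights by constrained least squares on a stacked system of
disaggregated and time-aggregated pre-treatment outcomes. The aggregation block length,
penalty weight, and function names are assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def scm_weights(Y0_pre, y1_pre, agg=12, lam=1.0):
    """Y0_pre: (T0, J) control outcomes; y1_pre: (T0,) treated outcomes; agg: block length."""
    T0, J = Y0_pre.shape
    k = T0 // agg
    A = np.kron(np.eye(k), np.ones((1, agg)) / agg)          # (k, k*agg) averaging operator
    X = np.vstack([Y0_pre, lam * (A @ Y0_pre[: k * agg])])    # stack disaggregated + aggregated fits
    y = np.concatenate([y1_pre, lam * (A @ y1_pre[: k * agg])])

    def loss(w):
        return np.sum((y - X @ w) ** 2)

    res = minimize(loss, np.full(J, 1.0 / J),
                   bounds=[(0.0, 1.0)] * J,
                   constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
    return res.x                                              # simplex-constrained SCM weights
```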

Econometrics arXiv paper, submitted: 2024-01-22

A Bracketing Relationship for Long-Term Policy Evaluation with Combined Experimental and Observational Data

Authors: Yechan Park, Yuya Sasaki

Combining short-term experimental data with observational data enables
credible long-term policy evaluation. The literature offers two key but
non-nested assumptions, namely the latent unconfoundedness (LU; Athey et al.,
2020) and equi-confounding bias (ECB; Ghassami et al., 2022) conditions, to
correct observational selection. Committing to the wrong assumption leads to
biased estimation. To mitigate such risks, we provide a novel bracketing
relationship (cf. Angrist and Pischke, 2009) repurposed for the setting with
data combination: the LU-based estimand and the ECB-based estimand serve as the
lower and upper bounds, respectively, with the true causal effect lying in
between if either assumption holds. For researchers further seeking point
estimates, our LaLonde-style exercise suggests that the conservatively more
robust LU-based lower bounds align closely with the hold-out experimental
estimates for educational policy evaluation. We investigate the economic
substance of these findings through the lens of a nonparametric class of
selection mechanisms and sensitivity analysis. We identify as key the
sub-martingale property and the sufficient-statistic role (Chetty, 2009) of the
potential outcomes of student test scores (Chetty et al., 2011, 2014).

arXiv link: http://arxiv.org/abs/2401.12050v1

Econometrics arXiv updated paper (originally submitted: 2024-01-21)

Local Identification in Instrumental Variable Multivariate Quantile Regression Models

Authors: Haruki Kono

In the instrumental variable quantile regression (IVQR) model of Chernozhukov
and Hansen (2005), a one-dimensional unobserved rank variable monotonically
determines a single potential outcome. Even when multiple outcomes are
simultaneously of interest, it is common to apply the IVQR model to each of
them separately. This practice implicitly assumes that the rank variable of
each regression model affects only the corresponding outcome and does not
affect the other outcomes. In reality, however, it is often the case that all
rank variables together determine the outcomes, which leads to a systematic
correlation between the outcomes. To deal with this, we propose a nonlinear IV
model that allows for multivariate unobserved heterogeneity, each of which is
considered as a rank variable for an observed outcome. We show that the
structural function of our model is locally identified under the assumption
that the IV and the treatment variable are sufficiently positively correlated.

arXiv link: http://arxiv.org/abs/2401.11422v3

Econometrics arXiv updated paper (originally submitted: 2024-01-20)

Estimation with Pairwise Observations

Authors: Felix Chan, Laszlo Matyas

The paper introduces a new estimation method for the standard linear
regression model. The procedure is not driven by the optimisation of any
objective function; rather, it is a simple weighted average of slopes from
observation pairs. The paper shows that such an estimator is consistent for
carefully selected weights. Other properties, such as asymptotic distributions,
have also been derived to facilitate valid statistical inference. Unlike
traditional methods, such as Least Squares and Maximum Likelihood, among
others, the estimated residual of this estimator is not by construction
orthogonal to the explanatory variables of the model. This property allows a
wide range of practical applications, such as the testing of endogeneity, i.e.,
the correlation between the explanatory variables and the disturbance terms.

arXiv link: http://arxiv.org/abs/2401.11229v2
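
A toy version of the pairwise-slope idea for simple regression is sketched below. The
particular weighting (squared regressor distance, which reproduces the OLS slope) is
only one illustrative choice and not the weights analysed in the paper.

```python
import numpy as np

def pairwise_slope(x, y):
    i, j = np.triu_indices(len(x), k=1)          # all observation pairs (i < j)
    dx, dy = x[j] - x[i], y[j] - y[i]
    w = dx ** 2                                   # illustrative weights; this choice recovers OLS
    return float(np.sum(w * (dy / dx)) / np.sum(w))

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)
print(pairwise_slope(x, y))                       # close to the true slope of 2
```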

Econometrics arXiv paper, submitted: 2024-01-19

Information Based Inference in Models with Set-Valued Predictions and Misspecification

Authors: Hiroaki Kaido, Francesca Molinari

This paper proposes an information-based inference method for partially
identified parameters in incomplete models that is valid both when the model is
correctly specified and when it is misspecified. Key features of the method
are: (i) it is based on minimizing a suitably defined Kullback-Leibler
information criterion that accounts for incompleteness of the model and
delivers a non-empty pseudo-true set; (ii) it is computationally tractable;
(iii) its implementation is the same for both correctly and incorrectly
specified models; (iv) it exploits all information provided by variation in
discrete and continuous covariates; (v) it relies on Rao's score statistic,
which is shown to be asymptotically pivotal.

arXiv link: http://arxiv.org/abs/2401.11046v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2024-01-19

When the Universe is Too Big: Bounding Consideration Probabilities for Plackett-Luce Rankings

Authors: Ben Aoki-Sherwood, Catherine Bregou, David Liben-Nowell, Kiran Tomlinson, Thomas Zeng

The widely used Plackett-Luce ranking model assumes that individuals rank
items by making repeated choices from a universe of items. But in many cases
the universe is too big for people to plausibly consider all options. In the
choice literature, this issue has been addressed by supposing that individuals
first sample a small consideration set and then choose among the considered
items. However, inferring unobserved consideration sets (or item consideration
probabilities) in this "consider then choose" setting poses significant
challenges, because even simple models of consideration with strong
independence assumptions are not identifiable, even if item utilities are
known. We apply the consider-then-choose framework to top-$k$ rankings, where
we assume rankings are constructed according to a Plackett-Luce model after
sampling a consideration set. While item consideration probabilities remain
non-identified in this setting, we prove that we can infer bounds on the
relative values of consideration probabilities. Additionally, given a condition
on the expected consideration set size and known item utilities, we derive
absolute upper and lower bounds on item consideration probabilities. We also
provide algorithms to tighten those bounds on consideration probabilities by
propagating inferred constraints. Thus, we show that we can learn useful
information about consideration probabilities despite not being able to
identify them precisely. We demonstrate our methods on a ranking dataset from a
psychology experiment with two different ranking tasks (one with fixed
consideration sets and one with unknown consideration sets). This combination
of data allows us to estimate utilities and then learn about unknown
consideration probabilities using our bounds.

arXiv link: http://arxiv.org/abs/2401.11016v2
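
The consider-then-choose process described above can be mimicked with a small sampler:
items enter the consideration set independently, and a top-k ranking is then built by
repeated Plackett-Luce choices among the considered items. The utilities, inclusion
probabilities, and function names below are illustrative, not taken from the paper.

```python
import numpy as np

def sample_top_k(utilities, consider_probs, k, rng):
    items = np.arange(len(utilities))
    considered = items[rng.random(len(items)) < consider_probs]   # independent inclusion
    pool = list(considered)
    ranking = []
    for _ in range(min(k, len(pool))):
        w = np.exp(utilities[pool])                 # Plackett-Luce choice weights
        choice = int(rng.choice(pool, p=w / w.sum()))
        ranking.append(choice)
        pool.remove(choice)
    return ranking

rng = np.random.default_rng(0)
print(sample_top_k(np.array([1.0, 0.5, 0.0, -0.5]),
                   np.array([0.9, 0.8, 0.6, 0.4]), k=3, rng=rng))
```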

Econometrics arXiv paper, submitted: 2024-01-18

Nowcasting economic activity in European regions using a mixed-frequency dynamic factor model

Authors: Luca Barbaglia, Lorenzo Frattarolo, Niko Hauzenberger, Dominik Hirschbuehl, Florian Huber, Luca Onorante, Michael Pfarrhofer, Luca Tiozzo Pezzoli

Timely information about the state of regional economies can be essential for
planning, implementing and evaluating locally targeted economic policies.
However, European regional accounts for output are published at an annual
frequency and with a two-year delay. To obtain robust and more timely measures
in a computationally efficient manner, we propose a mixed-frequency dynamic
factor model that accounts for national information to produce high-frequency
estimates of the regional gross value added (GVA). We show that our model
produces reliable nowcasts of GVA in 162 regions across 12 European countries.

arXiv link: http://arxiv.org/abs/2401.10054v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2024-01-18

A Quantile Nelson-Siegel model

Authors: Matteo Iacopini, Aubrey Poon, Luca Rossini, Dan Zhu

We propose a novel framework for modeling the yield curve from a quantile
perspective. Building on the dynamic Nelson-Siegel model of Diebold et al.
(2006), we extend its traditional mean-based approach to a quantile regression
setting, enabling the estimation of yield curve factors - level, slope, and
curvature - at specific quantiles of the conditional distribution. A key
advantage of our framework is its ability to characterize the entire
conditional distribution of the yield curve across maturities and over time. In
an empirical analysis of the U.S. term structure of interest rates, our method
demonstrates superior out-of-sample forecasting performance, particularly in
capturing the tails of the yield distribution - an aspect increasingly
emphasized in the recent literature on distributional forecasting. In addition
to its forecasting advantages, our approach reveals rich distributional
features beyond the mean. In particular, we find that the dynamic changes in
these distributional features differ markedly between the Great Recession and
the COVID-19 pandemic period, highlighting a fundamental shift in how interest
rate markets respond to distinct economic shocks.

arXiv link: http://arxiv.org/abs/2401.09874v2
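
The basic building block can be sketched as follows: for a fixed decay parameter, a
cross-section of yields is regressed on the standard Nelson-Siegel loadings by quantile
regression, giving level, slope, and curvature factors at a chosen quantile. This static
illustration (with a conventional decay value) is not the authors' dynamic model.

```python
import numpy as np
import statsmodels.api as sm

def ns_loadings(maturities, lam=0.0609):
    tau = np.asarray(maturities, dtype=float)
    slope = (1 - np.exp(-lam * tau)) / (lam * tau)
    curv = slope - np.exp(-lam * tau)
    return np.column_stack([np.ones_like(tau), slope, curv])   # level, slope, curvature loadings

def quantile_ns_factors(yields, maturities, q=0.5):
    X = ns_loadings(maturities)
    res = sm.QuantReg(yields, X).fit(q=q)
    return res.params          # factor estimates at quantile q for this cross-section
```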

Econometrics arXiv paper, submitted: 2024-01-16

Assessing the impact of forced and voluntary behavioral changes on economic-epidemiological co-dynamics: A comparative case study between Belgium and Sweden during the 2020 COVID-19 pandemic

Authors: Tijs W. Alleman, Jan M. Baetens

During the COVID-19 pandemic, governments faced the challenge of managing
population behavior to prevent their healthcare systems from collapsing. Sweden
adopted a strategy centered on voluntary sanitary recommendations while Belgium
resorted to mandatory measures. Their consequences on pandemic progression and
associated economic impacts remain insufficiently understood. This study
leverages the divergent policies of Belgium and Sweden during the COVID-19
pandemic to relax the unrealistic (but persistently used) assumption that
social contacts are not influenced by an epidemic's dynamics. We develop an
epidemiological-economic co-simulation model where pandemic-induced behavioral
changes are a superposition of voluntary actions driven by fear, prosocial
behavior or social pressure, and compulsory compliance with government
directives. Our findings emphasize the importance of early responses, which
reduce the stringency of measures necessary to safeguard healthcare systems and
minimize ensuing economic damage. Voluntary behavioral changes lead to a
pattern of recurring epidemics, which should be regarded as the natural
long-term course of pandemics. Governments should be cautious about prolonging
lockdowns longer than necessary, because this leads to higher economic damage
and a potentially higher second surge when measures are lifted. Our model can aid
policymakers in the selection of an appropriate long-term strategy that
minimizes economic damage.

arXiv link: http://arxiv.org/abs/2401.08442v1

Econometrics arXiv updated paper (originally submitted: 2024-01-16)

Causal Machine Learning for Moderation Effects

Authors: Nora Bearth, Michael Lechner

It is valuable for any decision maker to know the impact of decisions
(treatments) on average and for subgroups. The causal machine learning
literature has recently provided tools for estimating group average treatment
effects (GATE) to better describe treatment heterogeneity. This paper addresses
the challenge of interpreting such differences in treatment effects between
groups while accounting for variations in other covariates. We propose a new
parameter, the balanced group average treatment effect (BGATE), which measures
a GATE with a specific distribution of a priori-determined covariates. By
taking the difference between two BGATEs, we can analyze heterogeneity more
meaningfully than by comparing two GATEs, as we can separate the difference due
to the different distributions of other variables and the difference due to the
variable of interest. The main estimation strategy for this parameter is based
on double/debiased machine learning for discrete treatments in an
unconfoundedness setting, and the estimator is shown to be
$\sqrt{N}$-consistent and asymptotically normal under standard conditions. We
propose two additional estimation strategies: automatic debiased machine
learning and a specific reweighting procedure. Last, we demonstrate the
usefulness of these parameters in a small-scale simulation study and in an
empirical example.

arXiv link: http://arxiv.org/abs/2401.08290v3

Econometrics arXiv paper, submitted: 2024-01-13

A Note on Uncertainty Quantification for Maximum Likelihood Parameters Estimated with Heuristic Based Optimization Algorithms

Authors: Zachary Porreca

Gradient-based solvers risk convergence to local optima, leading to incorrect
researcher inference. Heuristic-based algorithms are able to "break free" of
these local optima to eventually converge to the true global optimum. However,
given that they do not provide the gradient/Hessian needed to approximate the
covariance matrix and that the significantly longer computational time they
require for convergence likely precludes resampling procedures for inference,
researchers often are unable to quantify uncertainty in the estimates they
derive with these methods. This note presents a simple and relatively fast
two-step procedure to estimate the covariance matrix for parameters estimated
with these algorithms. This procedure relies on automatic differentiation, a
computational means of calculating derivatives that is popular in machine
learning applications. A brief empirical example demonstrates the advantages of
this procedure relative to bootstrapping and shows the similarity in standard
error estimates between this procedure and that which would normally accompany
maximum likelihood estimation with a gradient-based algorithm.

arXiv link: http://arxiv.org/abs/2401.07176v1
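
The two-step procedure lends itself to a short sketch: a gradient-free optimizer (here
scipy's differential evolution, standing in for any heuristic algorithm) finds the
maximum likelihood estimate, and JAX's automatic differentiation then supplies the
Hessian of the negative log-likelihood, whose inverse gives the usual asymptotic
covariance estimate. The toy likelihood and settings are assumptions for illustration.

```python
import numpy as np
import jax
import jax.numpy as jnp
from scipy.optimize import differential_evolution

rng = np.random.default_rng(0)
y = rng.normal(loc=1.5, scale=2.0, size=500)            # toy data: a normal sample

def negloglik(theta):
    mu, log_sigma = theta
    sigma = jnp.exp(log_sigma)
    return jnp.sum(0.5 * ((y - mu) / sigma) ** 2 + log_sigma + 0.5 * jnp.log(2 * jnp.pi))

# Step 1: heuristic (gradient-free) global optimization of the likelihood
res = differential_evolution(lambda th: float(negloglik(jnp.array(th))),
                             bounds=[(-10, 10), (-5, 5)], seed=0)
theta_hat = jnp.array(res.x)

# Step 2: autodiff Hessian at the optimum; its inverse approximates the covariance matrix
H = jax.hessian(negloglik)(theta_hat)
cov = np.linalg.inv(np.array(H))
print("standard errors:", np.sqrt(np.diag(cov)))
```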

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-01-13

Inference for Synthetic Controls via Refined Placebo Tests

Authors: Lihua Lei, Timothy Sudijono

The synthetic control method is often applied to problems with one treated
unit and a small number of control units. A common inferential task in this
setting is to test null hypotheses regarding the average treatment effect on
the treated. Inference procedures that are justified asymptotically are often
unsatisfactory due to (1) small sample sizes that render large-sample
approximation fragile and (2) simplification of the estimation procedure that
is implemented in practice. An alternative is permutation inference, which is
related to a common diagnostic called the placebo test. It has provable Type-I
error guarantees in finite samples without simplification of the method, when
the treatment is uniformly assigned. Despite this robustness, the placebo test
suffers from low resolution since the null distribution is constructed from
only $N$ reference estimates, where $N$ is the sample size. This creates a
barrier for statistical inference at a common level like $\alpha = 0.05$,
especially when $N$ is small. We propose a novel leave-two-out procedure that
bypasses this issue, while still maintaining the same finite-sample Type-I
error guarantee under uniform assignment for a wide range of $N$. Unlike the
placebo test whose Type-I error always equals the theoretical upper bound, our
procedure often achieves a lower unconditional Type-I error than theory
suggests; this enables useful inference in the challenging regime when $\alpha
< 1/N$. Empirically, our procedure achieves a higher power when the effect size
is reasonably large and a comparable power otherwise. We generalize our
procedure to non-uniform assignments and show how to conduct sensitivity
analysis. From a methodological perspective, our procedure can be viewed as a
new type of randomization inference different from permutation or rank-based
inference, which is particularly effective in small samples.

arXiv link: http://arxiv.org/abs/2401.07152v3
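
For context, the standard placebo test that the paper refines can be sketched as
follows: refit the synthetic control with each control unit treated as a placebo and
compare the treated unit's post-to-pre RMSPE ratio with the placebo distribution. The
routine scm_fit below is a placeholder for any synthetic-control fitting function and
is not part of the paper.

```python
import numpy as np

def placebo_pvalue(Y, treated, T0, scm_fit):
    """Y: (T, J+1) outcomes; treated: column index of treated unit; T0: last pre-period."""
    def rmspe_ratio(col):
        donors = [j for j in range(Y.shape[1]) if j != col]
        w = scm_fit(Y[:T0, donors], Y[:T0, col])          # weights from the pre-period fit
        gap = Y[:, col] - Y[:, donors] @ w
        return np.sqrt(np.mean(gap[T0:] ** 2)) / np.sqrt(np.mean(gap[:T0] ** 2))
    stats = np.array([rmspe_ratio(j) for j in range(Y.shape[1])])
    return float(np.mean(stats >= stats[treated]))        # rank-based placebo p-value
```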

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2024-01-13

Bubble Modeling and Tagging: A Stochastic Nonlinear Autoregression Approach

Authors: Xuanling Yang, Dong Li, Ting Zhang

Economic and financial time series can feature locally explosive behavior
when a bubble is formed. The economic or financial bubble, especially its
dynamics, is an intriguing topic that has been attracting longstanding
attention. To illustrate the dynamics of the local explosion itself, the paper
presents a novel, simple, yet useful time series model, called the stochastic
nonlinear autoregressive model, which is always strictly stationary and
geometrically ergodic and can create long swings or persistence observed in
many macroeconomic variables. When a nonlinear autoregressive coefficient is
outside of a certain range, the model has periodically explosive behaviors and
can then be used to portray the bubble dynamics. Further, the quasi-maximum
likelihood estimation (QMLE) of our model is considered, and its strong
consistency and asymptotic normality are established under minimal assumptions
on the innovations. A new diagnostic statistic is developed for checking model
adequacy. In addition, two methods for bubble tagging are proposed, one
from the residual perspective and the other from the null-state perspective.
Monte Carlo simulation studies are conducted to assess the performances of the
QMLE and the two bubble tagging methods in finite samples. Finally, the
usefulness of the model is illustrated by an empirical application to the
monthly Hang Seng Index.

arXiv link: http://arxiv.org/abs/2401.07038v2

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2024-01-12

Deep Learning With DAGs

Authors: Sourabh Balgi, Adel Daoud, Jose M. Peña, Geoffrey T. Wodtke, Jesse Zhou

Social science theories often postulate causal relationships among a set of
variables or events. Although directed acyclic graphs (DAGs) are increasingly
used to represent these theories, their full potential has not yet been
realized in practice. As non-parametric causal models, DAGs require no
assumptions about the functional form of the hypothesized relationships.
Nevertheless, to simplify the task of empirical evaluation, researchers tend to
invoke such assumptions anyway, even though they are typically arbitrary and do
not reflect any theoretical content or prior knowledge. Moreover, functional
form assumptions can engender bias, whenever they fail to accurately capture
the complexity of the causal system under investigation. In this article, we
introduce causal-graphical normalizing flows (cGNFs), a novel approach to
causal inference that leverages deep neural networks to empirically evaluate
theories represented as DAGs. Unlike conventional approaches, cGNFs model the
full joint distribution of the data according to a DAG supplied by the analyst,
without relying on stringent assumptions about functional form. In this way,
the method allows for flexible, semi-parametric estimation of any causal
estimand that can be identified from the DAG, including total effects,
conditional effects, direct and indirect effects, and path-specific effects. We
illustrate the method with a reanalysis of Blau and Duncan's (1967) model of
status attainment and Zhou's (2019) model of conditional versus controlled
mobility. To facilitate adoption, we provide open-source software together with
a series of online tutorials for implementing cGNFs. The article concludes with
a discussion of current limitations and directions for future development.

arXiv link: http://arxiv.org/abs/2401.06864v1

Econometrics arXiv paper, submitted: 2024-01-12

Robust Analysis of Short Panels

Authors: Andrew Chesher, Adam M. Rosen, Yuanqi Zhang

Many structural econometric models include latent variables on whose
probability distributions one may wish to place minimal restrictions. Leading
examples in panel data models are individual-specific variables sometimes
treated as "fixed effects" and, in dynamic models, initial conditions. This
paper presents a generally applicable method for characterizing sharp
identified sets when models place no restrictions on the probability
distribution of certain latent variables and no restrictions on their
covariation with other variables. In our analysis latent variables on which
restrictions are undesirable are removed, leading to econometric analysis
robust to misspecification of restrictions on their distributions which are
commonplace in the applied panel data literature. Endogenous explanatory
variables are easily accommodated. Examples of application to some static and
dynamic binary, ordered and multiple discrete choice and censored panel data
models are presented.

arXiv link: http://arxiv.org/abs/2401.06611v1

Econometrics arXiv updated paper (originally submitted: 2024-01-11)

Exposure effects are not automatically useful for policymaking

Authors: Eric Auerbach, Jonathan Auerbach, Max Tabord-Meehan

We thank Savje (2023) for a thought-provoking article and appreciate the
opportunity to share our perspective as social scientists. In his article,
Savje recommends misspecified exposure effects as a way to avoid strong
assumptions about interference when analyzing the results of an experiment. In
this invited discussion, we highlight a limitation of Savje's recommendation:
exposure effects are not generally useful for evaluating social policies
without the strong assumptions that Savje seeks to avoid.

arXiv link: http://arxiv.org/abs/2401.06264v2

Econometrics arXiv updated paper (originally submitted: 2024-01-11)

Covariance Function Estimation for High-Dimensional Functional Time Series with Dual Factor Structures

Authors: Chenlei Leng, Degui Li, Hanlin Shang, Yingcun Xia

We propose a flexible dual functional factor model for modelling
high-dimensional functional time series. In this model, a high-dimensional
fully functional factor parametrisation is imposed on the observed functional
processes, whereas a low-dimensional version (via series approximation) is
assumed for the latent functional factors. We extend the classic principal
component analysis technique for the estimation of a low-rank structure to the
estimation of a large covariance matrix of random functions that satisfies a
notion of (approximate) functional "low-rank plus sparse" structure; and
generalise the matrix shrinkage method to functional shrinkage in order to
estimate the sparse structure of functional idiosyncratic components. Under
appropriate regularity conditions, we derive the large sample theory of the
developed estimators, including the consistency of the estimated factors and
functional factor loadings and the convergence rates of the estimated matrices
of covariance functions measured by various (functional) matrix norms.
Consistent selection of the number of factors and a data-driven rule to choose
the shrinkage parameter are discussed. Simulation and empirical studies are
provided to demonstrate the finite-sample performance of the developed model
and estimation methodology.

arXiv link: http://arxiv.org/abs/2401.05784v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-01-10

On Efficient Inference of Causal Effects with Multiple Mediators

Authors: Haoyu Wei, Hengrui Cai, Chengchun Shi, Rui Song

This paper provides robust estimators and efficient inference of causal
effects involving multiple interacting mediators. Most existing works either
impose a linear model assumption among the mediators or are restricted to
handle conditionally independent mediators given the exposure. To overcome
these limitations, we define causal and individual mediation effects in a
general setting, and employ a semiparametric framework to develop quadruply
robust estimators for these causal effects. We further establish the asymptotic
normality of the proposed estimators and prove their local semiparametric
efficiencies. The proposed method is empirically validated via simulated and
real datasets concerning psychiatric disorders in trauma survivors.

arXiv link: http://arxiv.org/abs/2401.05517v1

Econometrics arXiv paper, submitted: 2024-01-09

A Deep Learning Representation of Spatial Interaction Model for Resilient Spatial Planning of Community Business Clusters

Authors: Haiyan Hao, Yan Wang

Existing Spatial Interaction Models (SIMs) are limited in capturing the
complex and context-aware interactions between business clusters and trade
areas. To address the limitation, we propose a SIM-GAT model to predict
spatiotemporal visitation flows between community business clusters and their
trade areas. The model innovatively represents the integrated system of
business clusters, trade areas, and transportation infrastructure within an
urban region using a connected graph. Then, a graph-based deep learning model,
i.e., Graph AttenTion network (GAT), is used to capture the complexity and
interdependencies of business clusters. We developed this model with data
collected from the Miami metropolitan area in Florida. We then demonstrated its
effectiveness in capturing varying attractiveness of business clusters to
different residential neighborhoods and across scenarios with an eXplainable AI
approach. We contribute a novel method supplementing conventional SIMs to
predict and analyze the dynamics of inter-connected community business
clusters. The analysis results can inform evidence-based, place-specific
planning strategies that help community business clusters better accommodate
their customers across scenarios and hence improve the resilience of community
businesses.

arXiv link: http://arxiv.org/abs/2401.04849v1

Econometrics arXiv paper, submitted: 2024-01-09

IV Estimation of Panel Data Tobit Models with Normal Errors

Authors: Bo E. Honore

Amemiya (1973) proposed a “consistent initial estimator” for the parameters
in a censored regression model with normal errors. This paper demonstrates that
a similar approach can be used to construct moment conditions for
fixed-effects versions of the model considered by Amemiya. This result
suggests estimators for models that have not previously been considered.

arXiv link: http://arxiv.org/abs/2401.04803v1

Econometrics arXiv updated paper (originally submitted: 2024-01-09)

Robust Bayesian Method for Refutable Models

Authors: Moyu Liao

We propose a robust Bayesian method for economic models that can be rejected
by some data distributions. The econometrician starts with a refutable
structural assumption which can be written as the intersection of several
assumptions. To avoid refutation, the econometrician first takes
a stance on which assumption $j$ will be relaxed and considers a function $m_j$
that measures the deviation from the assumption $j$. She then specifies a set
of prior beliefs $\Pi_s$ whose elements share the same marginal distribution
$\pi_{m_j}$ which measures the likelihood of deviations from assumption $j$.
Compared to the standard Bayesian method that specifies a single prior, the
robust Bayesian method allows the econometrician to take a stance only on the
likelihood of a violation of assumption $j$ while leaving other features of the
model unspecified. We show that many frequentist approaches to relax refutable
assumptions are equivalent to particular choices of robust Bayesian prior sets,
and thus we give a Bayesian interpretation to the frequentist methods. We use
the local average treatment effect ($LATE$) in the potential outcome framework
as the leading illustrating example.

arXiv link: http://arxiv.org/abs/2401.04512v3

Econometrics arXiv updated paper (originally submitted: 2024-01-08)

Teacher bias or measurement error?

Authors: Thomas van Huizen, Madelon Jacobs, Matthijs Oosterveen

Subjective teacher evaluations play a key role in shaping students'
educational trajectories. Previous studies have shown that students of low
socioeconomic status (SES) receive worse subjective evaluations than their high
SES peers, even when they score similarly on objective standardized tests. This
is often interpreted as evidence of teacher bias. Measurement error in test
scores challenges this interpretation. We discuss how both classical and
non-classical measurement error in test scores generate a biased coefficient of
the conditional SES gap, and consider three empirical strategies to address
this bias. Using administrative data from the Netherlands, where secondary
school track recommendations are pivotal teacher judgments, we find that
measurement error explains 35 to 43% of the conditional SES gap in track
recommendations.

arXiv link: http://arxiv.org/abs/2401.04200v4

Econometrics arXiv paper, submitted: 2024-01-08

Robust Estimation in Network Vector Autoregression with Nonstationary Regressors

Authors: Christis Katsouris

This article studies identification and estimation for the network vector
autoregressive model with nonstationary regressors. In particular, network
dependence is characterized by a nonstochastic adjacency matrix. The
information set includes a stationary regressand and a node-specific vector of
nonstationary regressors, both observed at the same equally spaced time
frequencies. Our proposed econometric specification correponds to the NVAR
model under time series nonstationarity which relies on the local-to-unity
parametrization for capturing the unknown form of persistence of these
node-specific regressors. Robust econometric estimation is achieved using an
IVX-type estimator and the asymptotic theory analysis for the augmented vector
of regressors is studied based on a double asymptotic regime where both the
network size and the time dimension tend to infinity.

arXiv link: http://arxiv.org/abs/2401.04050v1

Econometrics arXiv updated paper (originally submitted: 2024-01-08)

Identification with possibly invalid IVs

Authors: Christophe Bruneel-Zupanc, Jad Beyhum

This paper proposes a novel identification strategy relying on
quasi-instrumental variables (quasi-IVs). A quasi-IV is a relevant but possibly
invalid IV because it is not exogenous or not excluded. We show that a variety
of models with discrete or continuous endogenous treatment which are usually
identified with an IV - quantile models with rank invariance, additive models
with homogeneous treatment effects, and local average treatment effect models -
can be identified under the joint relevance of two complementary quasi-IVs
instead. To achieve identification, we complement one excluded but possibly
endogenous quasi-IV (e.g., "relevant proxies" such as lagged treatment choice)
with one exogenous (conditional on the excluded quasi-IV) but possibly included
quasi-IV (e.g., random assignment or exogenous market shocks). Our approach
also holds if any of the two quasi-IVs turns out to be a valid IV. In practice,
being able to address endogeneity with complementary quasi-IVs instead of IVs
is convenient since there are many applications where quasi-IVs are more
readily available. Difference-in-differences is a notable example: time is an
exogenous quasi-IV while the group assignment acts as a complementary excluded
quasi-IV.

arXiv link: http://arxiv.org/abs/2401.03990v4

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2024-01-08

Adaptive Experimental Design for Policy Learning

Authors: Masahiro Kato, Kyohei Okumura, Takuya Ishihara, Toru Kitagawa

This study investigates the contextual best arm identification (BAI) problem,
aiming to design an adaptive experiment to identify the best treatment arm
conditioned on contextual information (covariates). We consider a
decision-maker who assigns treatment arms to experimental units during an
experiment and recommends the estimated best treatment arm based on the
contexts at the end of the experiment. The decision-maker uses a policy for
recommendations, which is a function that provides the estimated best treatment
arm given the contexts. In our evaluation, we focus on the worst-case expected
regret, a relative measure between the expected outcomes of an optimal policy
and our proposed policy. We derive a lower bound for the expected simple regret
and then propose a strategy called Adaptive Sampling-Policy Learning (PLAS). We
prove that this strategy is minimax rate-optimal in the sense that its leading
factor in the regret upper bound matches the lower bound as the number of
experimental units increases.

arXiv link: http://arxiv.org/abs/2401.03756v4

Econometrics arXiv paper, submitted: 2024-01-06

Counterfactuals in factor models

Authors: Jad Beyhum

We study a new model where the potential outcomes, corresponding to the
values of a (possibly continuous) treatment, are linked through common factors.
The factors can be estimated using a panel of regressors. We propose a
procedure to estimate time-specific and unit-specific average marginal effects
in this context. Our approach can be used either with high-dimensional time
series or with large panels. It allows for treatment effects heterogeneous
across time and units and is straightforward to implement since it only relies
on principal components analysis and elementary computations. We derive the
asymptotic distribution of our estimator of the average marginal effect and
highlight its solid finite sample performance through a simulation exercise.
The approach can also be used to estimate average counterfactuals or adapted to
an instrumental variables setting and we discuss these extensions. Finally, we
illustrate our novel methodology through an empirical application on income
inequality.

arXiv link: http://arxiv.org/abs/2401.03293v1

Econometrics arXiv paper, submitted: 2024-01-05

Roughness Signature Functions

Authors: Peter Christensen

Inspired by the activity signature introduced by Todorov and Tauchen (2010),
which was used to measure the activity of a semimartingale, this paper
introduces the roughness signature function. The paper illustrates how it can
be used to determine whether a discretely observed process is generated by a
continuous process that is rougher than a Brownian motion, a pure-jump process,
or a combination of the two. Further, if a continuous rough process is present,
the function gives an estimate of the roughness index. We illustrate these uses
through an extensive simulation study, in which we find that the roughness
signature function works as expected on rough processes. We further derive some asymptotic
properties of this new signature function. The function is applied empirically
to three different volatility measures for the S&P500 index. The three measures
are realized volatility, the VIX, and the option-extracted volatility estimator
of Todorov (2019). The realized volatility and option-extracted volatility show
signs of roughness, with the option-extracted volatility appearing smoother
than the realized volatility, while the VIX appears to be driven by a
continuous martingale with jumps.

arXiv link: http://arxiv.org/abs/2401.02819v1

Econometrics arXiv updated paper (originally submitted: 2024-01-03)

Efficient Computation of Confidence Sets Using Classification on Equidistributed Grids

Authors: Lujie Zhou

Economic models produce moment inequalities, which can be used to form tests
of the true parameters. Confidence sets (CS) of the true parameters are derived
by inverting these tests. However, they often lack analytical expressions,
necessitating a grid search to obtain the CS numerically by retaining the grid
points that pass the test. When the statistic is not asymptotically pivotal,
constructing the critical value for each grid point in the parameter space adds
to the computational burden. In this paper, we convert the computational issue
into a classification problem by using a support vector machine (SVM)
classifier. Its decision function provides a faster and more systematic way of
dividing the parameter space into two regions: inside vs. outside of the
confidence set. We label those points in the CS as 1 and those outside as -1.
Researchers can train the SVM classifier on a grid of manageable size and use
it to determine whether points on denser grids are in the CS or not. We
establish certain conditions for the grid so that there is a tuning that allows
us to asymptotically reproduce the test in the CS. This means that in the
limit, a point is classified as belonging to the confidence set if and only if
it is labeled as 1 by the SVM.
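
As an illustration of the idea only (not the paper's implementation), the
sketch below trains a support vector machine on a coarse grid of parameter
values labeled by a placeholder test and then classifies a much denser grid;
the function passes_test and both grids are hypothetical stand-ins.

# Sketch: classify confidence-set membership with an SVM (hypothetical test function).
import numpy as np
from sklearn.svm import SVC

def passes_test(theta):
    # Placeholder for the moment-inequality test at parameter value theta:
    # return +1 if theta is retained in the confidence set, -1 otherwise.
    return 1 if np.sum(theta ** 2) <= 1.0 else -1

rng = np.random.default_rng(0)

# Coarse (manageable) grid: run the expensive test only here.
coarse = rng.uniform(-2, 2, size=(500, 2))
labels = np.array([passes_test(t) for t in coarse])

# Train the classifier; its decision function separates "inside" from "outside".
clf = SVC(kernel="rbf", C=10.0, gamma="scale")
clf.fit(coarse, labels)

# Dense grid: membership is predicted instead of re-running the test.
dense = rng.uniform(-2, 2, size=(100_000, 2))
in_cs = clf.predict(dense) == 1
print(f"Predicted CS share on dense grid: {in_cs.mean():.3f}")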

arXiv link: http://arxiv.org/abs/2401.01804v2

Econometrics arXiv updated paper (originally submitted: 2024-01-03)

Model Averaging and Double Machine Learning

Authors: Achim Ahrens, Christian B. Hansen, Mark E. Schaffer, Thomas Wiemann

This paper discusses pairing double/debiased machine learning (DDML) with
stacking, a model averaging method for combining multiple candidate learners,
to estimate structural parameters. In addition to conventional stacking, we
consider two stacking variants available for DDML: short-stacking exploits the
cross-fitting step of DDML to substantially reduce the computational burden,
while pooled stacking enforces common stacking weights over cross-fitting folds.
Using calibrated simulation studies and two applications estimating gender gaps
in citations and wages, we show that DDML with stacking is more robust to
partially unknown functional forms than common alternative approaches based on
single pre-selected learners. We provide Stata and R software implementing our
proposals.
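
The authors provide Stata and R implementations; purely as a language-agnostic
illustration of the idea (not their packages), the sketch below cross-fits a
partially linear model with a stacked learner in scikit-learn. The
data-generating process and all tuning choices are hypothetical.

# Sketch: DDML partialling-out with a conventional stacking learner (illustrative only).
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.default_rng(1)
n, p = 2000, 20
X = rng.normal(size=(n, p))
D = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(size=n)            # treatment
Y = 0.5 * D + np.cos(X[:, 0]) + X[:, 2] ** 2 + rng.normal(size=n)   # outcome, true effect 0.5

def stack():
    # Candidate learners combined by a linear meta-learner (conventional stacking).
    return StackingRegressor(
        estimators=[("lasso", LassoCV()),
                    ("rf", RandomForestRegressor(n_estimators=100, random_state=0))],
        final_estimator=LinearRegression(),
    )

cv = KFold(n_splits=5, shuffle=True, random_state=1)
Y_res = Y - cross_val_predict(stack(), X, Y, cv=cv)   # cross-fitted residuals of E[Y|X]
D_res = D - cross_val_predict(stack(), X, D, cv=cv)   # cross-fitted residuals of E[D|X]

theta = np.sum(D_res * Y_res) / np.sum(D_res ** 2)    # partialling-out point estimate
se = np.sqrt(np.mean(((Y_res - theta * D_res) * D_res) ** 2)
             / np.mean(D_res ** 2) ** 2 / n)
print(f"theta_hat = {theta:.3f} (se {se:.3f}), true value 0.5")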

arXiv link: http://arxiv.org/abs/2401.01645v2

Econometrics arXiv cross-link from math.OC (math.OC), submitted: 2024-01-03

Classification and Treatment Learning with Constraints via Composite Heaviside Optimization: a Progressive MIP Method

Authors: Yue Fang, Junyi Liu, Jong-Shi Pang

This paper proposes a Heaviside composite optimization approach and presents
a progressive (mixed) integer programming (PIP) method for solving multi-class
classification and multi-action treatment problems with constraints. A
Heaviside composite function is a composite of a Heaviside function (i.e., the
indicator function of either the open $( \, 0,\infty )$ or closed $[ \,
0,\infty \, )$ interval) with a possibly nondifferentiable function.
Modeling-wise, we show how Heaviside composite optimization provides a unified
formulation for learning the optimal multi-class classification and
multi-action treatment rules, subject to rule-dependent constraints stipulating
a variety of domain restrictions. A Heaviside composite function has an
equivalent discrete formulation, and the resulting optimization problem can in
principle be solved by integer programming (IP) methods. Nevertheless, for
constrained learning problems with large data sets, a straightforward
application of off-the-shelf IP solvers is usually ineffective in achieving
global optimality. To alleviate such a computational burden, our major
contribution is the proposal of the PIP method by leveraging the effectiveness
of state-of-the-art IP solvers for problems of modest sizes. We provide the
theoretical advantage of the PIP method with the connection to continuous
optimization and show that the computed solution is locally optimal for a broad
class of Heaviside composite optimization problems. The numerical performance
of the PIP method is demonstrated by extensive computational experimentation.

arXiv link: http://arxiv.org/abs/2401.01565v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2024-01-02

Robust Inference for Multiple Predictive Regressions with an Application on Bond Risk Premia

Authors: Xiaosai Liao, Xinjue Li, Qingliang Fan

We propose a robust hypothesis testing procedure for the predictability of
multiple predictors that could be highly persistent. Our method improves the
popular extended instrumental variable (IVX) testing (Phillips and Lee, 2013;
Kostakis et al., 2015) in that, besides addressing the two bias effects found
in Hosseinkouchack and Demetrescu (2021), we find and deal with the
variance-enlargement effect. We show that two types of higher-order terms
induce these distortion effects in the test statistic, leading to significant
over-rejection for one-sided tests and tests in multiple predictive
regressions. Our improved IVX-based test includes three steps to tackle all the
issues above regarding finite sample bias and variance terms. Thus, the test
statistics perform well in size control, while its power performance is
comparable with the original IVX. Monte Carlo simulations and an empirical
study on the predictability of bond risk premia are provided to demonstrate the
effectiveness of the newly proposed approach.
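
For context, the sketch below constructs the standard IVX instrument of
Kostakis et al. (2015) that the paper's three-step procedure refines; the
tuning constants and the simulated predictive regression are illustrative only.

# Sketch: standard IVX instrument for a persistent predictor (illustrative tuning).
import numpy as np

def ivx_instrument(x, beta=0.95, c_z=1.0):
    # z_t = sum_{j=0}^{t-1} rho_z**j * (x_{t-j} - x_{t-j-1}), with rho_z = 1 - c_z / T**beta,
    # i.e. a mildly integrated filter of the (possibly highly persistent) predictor.
    T = len(x)
    rho_z = 1.0 - c_z / T ** beta
    dx = np.diff(x, prepend=x[0])
    z = np.zeros(T)
    for t in range(1, T):
        z[t] = rho_z * z[t - 1] + dx[t]
    return z

# Toy predictive regression y_t = b * x_{t-1} + e_t with a near-unit-root predictor.
rng = np.random.default_rng(2)
T, rho, b = 500, 0.99, 0.05
x = np.zeros(T)
for t in range(1, T):
    x[t] = rho * x[t - 1] + rng.normal()
y = np.empty(T)
y[1:] = b * x[:-1] + rng.normal(size=T - 1)

z = ivx_instrument(x)
Y, X, Z = y[1:] - y[1:].mean(), x[:-1] - x[:-1].mean(), z[:-1]
b_ivx = np.sum(Z * Y) / np.sum(Z * X)   # IV estimate with the self-generated instrument
print(f"IVX estimate of b: {b_ivx:.3f}")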

arXiv link: http://arxiv.org/abs/2401.01064v1

Econometrics arXiv updated paper (originally submitted: 2024-01-01)

Changes-in-Changes for Ordered Choice Models: Too Many "False Zeros"?

Authors: Daniel Gutknecht, Cenchen Liu

In this paper, we develop a Difference-in-Differences model for discrete,
ordered outcomes, building upon elements from a continuous Changes-in-Changes
model. We focus on outcomes derived from self-reported survey data eliciting
socially undesirable, illegal, or stigmatized behaviors like tax evasion or
substance abuse, where too many "false zeros" or, more broadly, underreporting
are likely. We start by providing a characterization of parallel trends within
a general threshold-crossing model. We then propose a partial and point
identification framework for different distributional treatment effects when
the outcome is subject to underreporting. Applying our methodology, we
investigate the impact of recreational marijuana legalization for adults in
several U.S. states on the short-term consumption behavior of 8th-grade
high-school students. The results indicate small, but significant increases in
consumption probabilities at each level. These effects are further amplified
upon accounting for misreporting.

arXiv link: http://arxiv.org/abs/2401.00618v3

Econometrics arXiv cross-link from General Economics (econ.GN), submitted: 2023-12-31

How industrial clusters influence the growth of the regional GDP: A spatial-approach

Authors: Vahidin Jeleskovic, Steffen Loeber

In this paper, we employ spatial econometric methods to analyze panel data
from German NUTS 3 regions. Our goal is to gain a deeper understanding of the
significance and interdependence of industry clusters in shaping the dynamics
of GDP. To achieve a more nuanced spatial differentiation, we introduce
indicator matrices for each industry sector, which allows us to extend the
spatial Durbin model to a new variant. This approach is essential due to
both the economic importance of these sectors and the potential issue of
omitted variables. Failing to account for industry sectors can lead to omitted
variable bias and estimation problems. To assess the effects of the major
industry sectors, we incorporate eight distinct branches of industry into our
analysis. According to prevailing economic theory, these clusters should have a
positive impact on the regions they are associated with. Our findings indeed
reveal highly significant impacts, which can be either positive or negative, of
specific sectors on local GDP growth. Spatially, we observe that direct and
indirect effects can exhibit opposite signs, indicative of heightened
competitiveness within and between industry sectors. Therefore, we recommend
that industry sectors should be taken into consideration when conducting
spatial analysis of GDP. Doing so allows for a more comprehensive understanding
of the economic dynamics at play.

arXiv link: http://arxiv.org/abs/2401.10261v1

Econometrics arXiv updated paper (originally submitted: 2023-12-30)

Identification of Nonlinear Dynamic Panels under Partial Stationarity

Authors: Wayne Yuan Gao, Rui Wang

This paper provides a general identification approach for a wide range of
nonlinear panel data models, including binary choice, ordered response, and
other types of limited dependent variable models. Our approach accommodates
dynamic models with any number of lagged dependent variables as well as other
types of endogenous covariates. Our identification strategy relies on a partial
stationarity condition, which allows for not only an unknown distribution of
errors, but also temporal dependencies in errors. We derive partial
identification results under flexible model specifications and establish
sharpness of our identified set in the binary choice setting. We demonstrate
the robust finite-sample performance of our approach using Monte Carlo
simulations, and apply the approach to an empirical application on
income categories using various ordered choice models.

arXiv link: http://arxiv.org/abs/2401.00264v4

Econometrics arXiv updated paper (originally submitted: 2023-12-30)

Forecasting CPI inflation under economic policy and geopolitical uncertainties

Authors: Shovon Sengupta, Tanujit Chakraborty, Sunny Kumar Singh

Forecasting consumer price index (CPI) inflation is of paramount importance
for both academics and policymakers at the central banks. This study introduces
a filtered ensemble wavelet neural network (FEWNet) to forecast CPI inflation,
which is tested on BRIC countries. FEWNet breaks down inflation data into high
and low-frequency components using wavelets and utilizes them along with other
economic factors (economic policy uncertainty and geopolitical risk) to produce
forecasts. All the wavelet-transformed series and filtered exogenous variables
are fed into downstream autoregressive neural networks to make the final
ensemble forecast. Theoretically, we show that FEWNet reduces the empirical
risk compared to fully connected autoregressive neural networks. FEWNet is more
accurate than other forecasting methods and can also estimate the uncertainty
in its predictions due to its capacity to effectively capture non-linearities
and long-range dependencies in the data through its adaptable architecture.
This makes FEWNet a valuable tool for central banks to manage inflation.

arXiv link: http://arxiv.org/abs/2401.00249v2

Econometrics arXiv paper, submitted: 2023-12-29

Robust Inference in Panel Data Models: Some Effects of Heteroskedasticity and Leveraged Data in Small Samples

Authors: Annalivia Polselli

With the violation of the assumption of homoskedasticity, least squares
estimators of the variance become inefficient and statistical inference
conducted with invalid standard errors leads to misleading rejection rates.
Despite a vast cross-sectional literature on the downward bias of robust
standard errors, the problem is not extensively covered in the panel data
framework. We investigate the consequences of the simultaneous presence of
small sample size, heteroskedasticity and data points that exhibit extreme
values in the covariates ('good leverage points') on statistical inference.
Focusing on one-way linear panel data models, we examine asymptotic and finite
sample properties of a battery of heteroskedasticity-consistent estimators
using Monte Carlo simulations. We also propose a hybrid estimator of the
variance-covariance matrix. Results show that conventional standard errors are
always dominated by more conservative estimators of the variance, especially in
small samples. In addition, all types of HC standard errors perform very well
in terms of the size and power of tests under homoskedasticity.
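
As a minimal point of reference (not the paper's proposed hybrid estimator),
the sketch below applies the within transformation to a simulated one-way
panel with heteroskedasticity and a few high-leverage covariate values, then
compares conventional and HC3 standard errors via statsmodels; all numbers are
illustrative.

# Sketch: within-transformed panel OLS with conventional vs. HC3 standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
N, T = 30, 6                                  # small sample, as studied in the paper
ids = np.repeat(np.arange(N), T)
alpha = np.repeat(rng.normal(size=N), T)      # unit fixed effects
x = rng.normal(size=N * T) + 0.5 * alpha
x[:5] += 6.0                                  # a few 'good leverage' points
sigma = np.exp(0.5 * x)                       # heteroskedasticity tied to the covariate
y = 1.0 * x + alpha + sigma * rng.normal(size=N * T)

def within(v):
    # Demean by unit (one-way within transformation).
    means = np.bincount(ids, weights=v) / np.bincount(ids)
    return v - means[ids]

X = sm.add_constant(within(x))
res_conv = sm.OLS(within(y), X).fit()                  # conventional SEs
res_hc3 = sm.OLS(within(y), X).fit(cov_type="HC3")     # leverage-adjusted SEs
print("conventional SE:", res_conv.bse[1].round(3), " HC3 SE:", res_hc3.bse[1].round(3))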

arXiv link: http://arxiv.org/abs/2312.17676v1

Econometrics arXiv updated paper (originally submitted: 2023-12-29)

Decision Theory for Treatment Choice Problems with Partial Identification

Authors: José Luis Montiel Olea, Chen Qiu, Jörg Stoye

We apply classical statistical decision theory to a large class of treatment
choice problems with partial identification. We show that, in a general class
of problems with Gaussian likelihood, all decision rules are admissible; it is
maximin-welfare optimal to ignore all data; and, for severe enough partial
identification, there are infinitely many minimax-regret optimal decision
rules, all of which sometimes randomize the policy recommendation. We uniquely
characterize the minimax-regret optimal rule that least frequently randomizes,
and show that, in some cases, it can outperform other minimax-regret optimal
rules in terms of what we term profiled regret. We analyze the implications of
our results in the aggregation of experimental estimates for policy adoption,
extrapolation of Local Average Treatment Effects, and policy making in the
presence of omitted variable bias.

arXiv link: http://arxiv.org/abs/2312.17623v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-12-28

Bayesian Analysis of High Dimensional Vector Error Correction Model

Authors: Parley R Yang, Alexander Y Shestopaloff

Vector Error Correction Model (VECM) is a classic method to analyse
cointegration relationships amongst multivariate non-stationary time series. In
this paper, we focus on the high-dimensional setting and seek a
sample-size-efficient methodology to determine the level of cointegration. Our
investigation centres on a Bayesian approach to analysing the cointegration
matrix, thereby determining the cointegration rank. We design two algorithms
and implement them on simulated examples, yielding promising results,
particularly when dealing with a high number of variables and a relatively low
number of observations. Furthermore, we extend this methodology to empirically
investigate the constituents of the S&P 500 index, where low-volatility
portfolios can be found during both in-sample training and out-of-sample
testing periods.

arXiv link: http://arxiv.org/abs/2312.17061v2

Econometrics arXiv paper, submitted: 2023-12-28

Development of Choice Model for Brand Evaluation

Authors: Marina Kholod, Nikita Mokrenko

Consumer choice modeling takes center stage as we delve into understanding
how personal preferences of decision makers (customers) for products influence
demand at the level of the individual. Contemporary choice theory is built
upon the characteristics of the decision maker, alternatives available for the
choice of the decision maker, the attributes of the available alternatives and
decision rules that the decision maker uses to make a choice. The choice set in
our research is represented by six major brands (products) of laundry
detergents in the Japanese market. We use the panel data of the purchases of 98
households to which we apply the hierarchical probit model, facilitated by a
Markov Chain Monte Carlo simulation (MCMC) in order to evaluate the brand
values of six brands. The applied model also allows us to evaluate the tangible
and intangible brand values. These evaluated metrics help us to assess the
brands based on their tangible and intangible characteristics. Moreover,
consumer choice modeling also provides a framework for assessing the
environmental performance of laundry detergent brands as the model uses the
information on components (physical attributes) of laundry detergents.

arXiv link: http://arxiv.org/abs/2312.16927v1

Econometrics arXiv paper, submitted: 2023-12-27

Modeling Systemic Risk: A Time-Varying Nonparametric Causal Inference Framework

Authors: Jalal Etesami, Ali Habibnia, Negar Kiyavash

We propose a nonparametric and time-varying directed information graph
(TV-DIG) framework to estimate the evolving causal structure in time series
networks, thereby addressing the limitations of traditional econometric models
in capturing high-dimensional, nonlinear, and time-varying interconnections
among series. This framework employs an information-theoretic measure rooted in
a generalized version of Granger-causality, which is applicable to both linear
and nonlinear dynamics. Our framework offers advancements in measuring systemic
risk and establishes meaningful connections with established econometric
models, including vector autoregression and switching models. We evaluate the
efficacy of our proposed model through simulation experiments and empirical
analysis, reporting promising results in recovering simulated time-varying
networks with nonlinear and multivariate structures. We apply this framework to
identify and monitor the evolution of interconnectedness and systemic risk
among major assets and industrial sectors within the financial network. We
focus on cryptocurrencies' potential systemic risks to financial stability,
including spillover effects on other sectors during crises like the COVID-19
pandemic and the Federal Reserve's 2020 emergency response. Our findings
reveal significant, previously underrecognized pre-2020 influences of
cryptocurrencies on certain financial sectors, highlighting their potential
systemic risks and offering a systematic approach in tracking evolving
cross-sector interactions within financial networks.

arXiv link: http://arxiv.org/abs/2312.16707v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2023-12-27

Best-of-Both-Worlds Linear Contextual Bandits

Authors: Masahiro Kato, Shinji Ito

This study investigates the problem of $K$-armed linear contextual bandits,
an instance of the multi-armed bandit problem, under an adversarial corruption.
At each round, a decision-maker observes an independent and identically
distributed context and then selects an arm based on the context and past
observations. After selecting an arm, the decision-maker incurs a loss
corresponding to the selected arm. The decision-maker aims to minimize the
cumulative loss over the trial. The goal of this study is to develop a strategy
that is effective in both stochastic and adversarial environments, with
theoretical guarantees. We first formulate the problem by introducing a novel
setting of bandits with adversarial corruption, referred to as the contextual
adversarial regime with a self-bounding constraint. We assume linear models for
the relationship between the loss and the context. Then, we propose a strategy
that extends the RealLinExp3 by Neu & Olkhovskaya (2020) and the
Follow-The-Regularized-Leader (FTRL). The regret of our proposed algorithm is
shown to be upper-bounded by $O\left(\min\left\{\frac{(\log(T))^3}{\Delta_{*}}
+ \sqrt{\frac{C(\log(T))^3}{\Delta_{*}}},\ \sqrt{T}(\log(T))^2\right\}\right)$,
where $T \in \mathbb{N}$ is the number of
rounds, $\Delta_{*} > 0$ is the constant minimum gap between the best and
suboptimal arms for any context, and $C \in [0, T]$ is an adversarial corruption
parameter. This regret upper bound implies
$O\left(\frac{(\log(T))^3}{\Delta_{*}}\right)$ regret in a stochastic environment
and $O\left(\sqrt{T}(\log(T))^2\right)$ regret in an adversarial environment. We refer
to our strategy as the Best-of-Both-Worlds (BoBW) RealFTRL, due to its
theoretical guarantees in both stochastic and adversarial regimes.

arXiv link: http://arxiv.org/abs/2312.16489v1

Econometrics arXiv updated paper (originally submitted: 2023-12-26)

Incentive-Aware Synthetic Control: Accurate Counterfactual Estimation via Incentivized Exploration

Authors: Daniel Ngo, Keegan Harris, Anish Agarwal, Vasilis Syrgkanis, Zhiwei Steven Wu

We consider the setting of synthetic control methods (SCMs), a canonical
approach used to estimate the treatment effect on the treated in a panel data
setting. We shed light on a frequently overlooked but ubiquitous assumption
made in SCMs of "overlap": a treated unit can be written as some combination --
typically, convex or linear combination -- of the units that remain under
control. We show that if units select their own interventions, and there is
sufficiently large heterogeneity between units that prefer different
interventions, overlap will not hold. We address this issue by proposing a
framework which incentivizes units with different preferences to take
interventions they would not normally consider. Specifically, leveraging tools
from information design and online learning, we propose a SCM that incentivizes
exploration in panel data settings by providing incentive-compatible
intervention recommendations to units. We establish that this estimator obtains
valid counterfactual estimates without the need for an a priori overlap
assumption. We extend our results to the setting of synthetic interventions,
where the goal is to produce counterfactual outcomes under all interventions,
not just control. Finally, we provide two hypothesis tests for determining
whether unit overlap holds for a given panel dataset.
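
To make the overlap assumption concrete, the sketch below fits plain convex
synthetic-control weights to pre-treatment outcomes with scipy; the simulated
panel and optimizer settings are illustrative, and the incentive-aware
machinery of the paper is not shown.

# Sketch: convex synthetic-control weights on pre-treatment outcomes (illustrative).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
T0, J = 20, 8                                   # pre-treatment periods, donor units
Y_donors = rng.normal(size=(T0, J)).cumsum(axis=0)
w_true = np.array([0.5, 0.3, 0.2] + [0.0] * (J - 3))
Y_treated = Y_donors @ w_true + 0.1 * rng.normal(size=T0)   # overlap holds here

def loss(w):
    return np.sum((Y_treated - Y_donors @ w) ** 2)

res = minimize(
    loss,
    x0=np.full(J, 1.0 / J),
    bounds=[(0.0, 1.0)] * J,
    constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
    method="SLSQP",
)
print("fitted weights:", np.round(res.x, 2))
# If no convex combination of donors tracks the treated unit (overlap fails),
# the pre-treatment fit is poor and counterfactual estimates become unreliable.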

arXiv link: http://arxiv.org/abs/2312.16307v2

Econometrics arXiv paper, submitted: 2023-12-26

Direct Multi-Step Forecast based Comparison of Nested Models via an Encompassing Test

Authors: Jean-Yves Pitarakis

We introduce a novel approach for comparing out-of-sample multi-step
forecasts obtained from a pair of nested models that is based on the forecast
encompassing principle. Our proposed approach relies on an alternative way of
testing the population moment restriction implied by the forecast encompassing
principle and that links the forecast errors from the two competing models in a
particular way. Its key advantage is that it is able to bypass the variance
degeneracy problem afflicting model based forecast comparisons across nested
models. It results in a test statistic whose limiting distribution is standard
normal and which is particularly simple to construct and can accommodate both
single period and longer-horizon prediction comparisons. Inferences are also
shown to be robust to different predictor types, including stationary,
highly-persistent and purely deterministic processes. Finally, we illustrate
the use of our proposed approach through an empirical application that explores
the role of global inflation in enhancing individual country specific inflation
forecasts.
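
For readers new to the encompassing principle, a textbook forecast-encompassing
regression is sketched below: it regresses the small (nested) model's forecast
error on the difference between the two forecasts and tests the slope. This is
only a classical baseline with simulated placeholder series, not the paper's
statistic, which is designed precisely to bypass the variance-degeneracy
problem in nested comparisons.

# Sketch: a textbook forecast-encompassing regression with HAC standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
P = 200                                       # out-of-sample evaluation periods
y = rng.normal(size=P)                        # realized values (placeholder)
f_small = 0.2 * rng.normal(size=P)            # forecasts from the nested (small) model
f_large = f_small + 0.3 * rng.normal(size=P)  # forecasts from the larger model

e_small = y - f_small
X = sm.add_constant(f_large - f_small)
res = sm.OLS(e_small, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
# If the slope is zero, the small model "encompasses" the large one.
print("slope:", res.params[1].round(3), "t-stat:", res.tvalues[1].round(2))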

arXiv link: http://arxiv.org/abs/2312.16099v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2023-12-26

Pricing with Contextual Elasticity and Heteroscedastic Valuation

Authors: Jianyu Xu, Yu-Xiang Wang

We study an online contextual dynamic pricing problem, where customers decide
whether to purchase a product based on its features and price. We introduce a
novel approach to modeling a customer's expected demand by incorporating
feature-based price elasticity, which can be equivalently represented as a
valuation with heteroscedastic noise. To solve the problem, we propose a
computationally efficient algorithm called "Pricing with Perturbation (PwP)",
which enjoys an $O(\sqrt{dT\log T})$ regret while allowing arbitrary
adversarial input context sequences. We also prove a matching lower bound at
$\Omega(\sqrt{dT})$ to show the optimality regarding $d$ and $T$ (up to $\log
T$ factors). Our results shed light on the relationship between contextual
elasticity and heteroscedastic valuation, providing insights for effective and
practical pricing strategies.

arXiv link: http://arxiv.org/abs/2312.15999v1

Econometrics arXiv updated paper (originally submitted: 2023-12-25)

Negative Control Falsification Tests for Instrumental Variable Designs

Authors: Oren Danieli, Daniel Nevo, Itai Walk, Bar Weinstein, Dan Zeltzer

The validity of instrumental variable (IV) designs is typically tested using
two types of falsification tests. We characterize these tests as conditional
independence tests between negative control variables -- proxies for unobserved
variables posing a threat to the identification -- and the IV or the outcome.
We describe the conditions that variables must satisfy in order to serve as
negative controls. We show that these falsification tests examine not only
independence and the exclusion restriction, but also functional form
assumptions. Our analysis reveals that conventional applications of these tests
may flag problems even in valid IV designs. We offer implementation guidance to
address these issues.

arXiv link: http://arxiv.org/abs/2312.15624v3

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2023-12-25

Zero-Inflated Bandits

Authors: Haoyu Wei, Runzhe Wan, Lei Shi, Rui Song

Many real-world bandit applications are characterized by sparse rewards,
which can significantly hinder learning efficiency. Leveraging problem-specific
structures for careful distribution modeling is recognized as essential for
improving estimation efficiency in statistics. However, this approach remains
under-explored in the context of bandits. To address this gap, we initiate the
study of zero-inflated bandits, where the reward is modeled using a classic
semi-parametric distribution known as the zero-inflated distribution. We
develop algorithms based on the Upper Confidence Bound and Thompson Sampling
frameworks for this specific structure. The superior empirical performance of
these methods is demonstrated through extensive numerical studies.
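
As a toy illustration of the reward structure (not the authors' exact UCB or
Thompson Sampling algorithms), the sketch below runs Thompson sampling on
zero-inflated rewards by modeling the probability of a nonzero reward and its
magnitude separately; the reward distributions and update rules are simplifying
assumptions.

# Sketch: Thompson sampling for zero-inflated rewards (Bernoulli occurrence x Gaussian magnitude).
import numpy as np

rng = np.random.default_rng(6)
K, T = 3, 5000
p_nonzero = np.array([0.05, 0.10, 0.08])      # sparse rewards
mu_value = np.array([1.0, 0.8, 1.5])          # mean of the nonzero part

# Posterior state per arm: Beta(a, b) for the nonzero probability,
# plus a simple known-variance Gaussian update for the magnitude.
a, b = np.ones(K), np.ones(K)
m, n_obs = np.zeros(K), np.zeros(K)

total = 0.0
for t in range(T):
    theta_p = rng.beta(a, b)
    theta_m = rng.normal(m, 1.0 / np.sqrt(n_obs + 1.0))
    arm = int(np.argmax(theta_p * theta_m))   # sampled expected reward = prob * magnitude

    nonzero = rng.random() < p_nonzero[arm]
    reward = rng.normal(mu_value[arm], 0.5) if nonzero else 0.0
    total += reward

    a[arm] += nonzero
    b[arm] += 1 - nonzero
    if nonzero:                               # update the magnitude posterior on nonzero draws only
        n_obs[arm] += 1
        m[arm] += (reward - m[arm]) / n_obs[arm]

print(f"average reward: {total / T:.4f} (best arm mean = {np.max(p_nonzero * mu_value):.4f})")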

arXiv link: http://arxiv.org/abs/2312.15595v3

Econometrics arXiv cross-link from cs.AI (cs.AI), submitted: 2023-12-24

The Challenge of Using LLMs to Simulate Human Behavior: A Causal Inference Perspective

Authors: George Gui, Olivier Toubia

Large Language Models (LLMs) have shown impressive potential to simulate
human behavior. We identify a fundamental challenge in using them to simulate
experiments: when LLM-simulated subjects are blind to the experimental design
(as is standard practice with human subjects), variations in treatment
systematically affect unspecified variables that should remain constant,
violating the unconfoundedness assumption. Using demand estimation as a context
and an actual experiment as a benchmark, we show this can lead to implausible
results. While confounding may in principle be addressed by controlling for
covariates, this can compromise ecological validity in the context of LLM
simulations: controlled covariates become artificially salient in the simulated
decision process, which introduces focalism. This trade-off between
unconfoundedness and ecological validity is usually absent in traditional
experimental design and represents a unique challenge in LLM simulations. We
formalize this challenge theoretically, showing it stems from ambiguous
prompting strategies, and hence cannot be fully addressed by improving training
data or by fine-tuning. Alternative approaches that unblind the experimental
design to the LLM show promise. Our findings suggest that effectively
leveraging LLMs for experimental simulations requires fundamentally rethinking
established experimental design practices rather than simply adapting protocols
developed for human subjects.

arXiv link: http://arxiv.org/abs/2312.15524v2

Econometrics arXiv updated paper (originally submitted: 2023-12-24)

Variable Selection in High Dimensional Linear Regressions with Parameter Instability

Authors: Alexander Chudik, M. Hashem Pesaran, Mahrad Sharifvaghefi

This paper considers the problem of variable selection allowing for parameter
instability. It distinguishes between signal and pseudo-signal variables that
are correlated with the target variable, and noise variables that are not, and
investigates the asymptotic properties of the One Covariate at a Time Multiple
Testing (OCMT) method proposed by Chudik et al. (2018) under parameter
instability. It is established that OCMT continues to asymptotically select
an approximating model that includes all the signals and none of the noise
variables. Properties of post selection regressions are also investigated, and
in-sample fit of the selected regression is shown to have the oracle property.
The theoretical results support the use of unweighted observations at the
selection stage of OCMT, whilst applying down-weighting of observations only at
the forecasting stage. Monte Carlo and empirical applications show that OCMT
without down-weighting at the selection stage yields smaller mean squared
forecast errors compared to Lasso, Adaptive Lasso, and boosting.
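
As a stripped-down reminder of the first-stage selection idea behind OCMT, the
sketch below runs one bivariate regression per candidate covariate and keeps
those whose t-statistic exceeds a critical value that grows with the number of
candidates; the data and tuning are illustrative, and no down-weighting is
applied at the selection stage, in line with the paper's recommendation.

# Sketch: one-covariate-at-a-time selection with a multiple-testing critical value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
T, p = 300, 50
X = rng.normal(size=(T, p))
y = 1.0 * X[:, 0] + 0.8 * X[:, 1] + rng.normal(size=T)   # two signals, the rest noise

alpha, delta = 0.05, 1.0
crit = stats.norm.ppf(1 - alpha / (2 * p ** delta))      # critical value grows with p

selected = []
for j in range(p):
    xj = np.column_stack([np.ones(T), X[:, j]])
    beta = np.linalg.lstsq(xj, y, rcond=None)[0]
    resid = y - xj @ beta
    s2 = resid @ resid / (T - 2)
    se = np.sqrt(s2 * np.linalg.inv(xj.T @ xj)[1, 1])
    if abs(beta[1] / se) > crit:
        selected.append(j)

print("selected covariates:", selected)   # ideally [0, 1]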

arXiv link: http://arxiv.org/abs/2312.15494v2

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2023-12-23

Stochastic Equilibrium the Lucas Critique and Keynesian Economics

Authors: David Staines

In this paper, a mathematically rigorous solution overturns existing wisdom
regarding New Keynesian Dynamic Stochastic General Equilibrium. I develop a
formal concept of stochastic equilibrium. I prove uniqueness and necessity,
when agents are patient, with general application. Existence depends on
appropriately specified eigenvalue conditions. Otherwise, no solution of any
kind exists. I construct the equilibrium with Calvo pricing. I provide novel
comparative statics with the non-stochastic model of mathematical significance.
I uncover a bifurcation between neighbouring stochastic systems and
approximations taken from the Zero Inflation Non-Stochastic Steady State
(ZINSS). The correct Phillips curve agrees with the zero limit from the trend
inflation framework. It contains a large lagged inflation coefficient and a
small response to expected inflation. Price dispersion can be first or second
order depending on how shocks are scaled. The response to the output gap is always
muted and is zero at standard parameters. A neutrality result is presented to
explain why and align Calvo with Taylor pricing. Present and lagged demand
shocks enter the Phillips curve so there is no Divine Coincidence and the
system is identified from structural shocks alone. The lagged inflation slope
is increasing in the inflation response, embodying substantive policy
trade-offs. The Taylor principle is reversed, inactive settings are necessary,
pointing towards inertial policy. The observational equivalence idea of the
Lucas critique is disproven. The bifurcation results from the breakdown of the
constraints implied by lagged nominal rigidity, associated with cross-equation
cancellation possible only at ZINSS. There is a dual relationship between
restrictions on the econometrician and constraints on repricing firms. Thus, if
the model is correct, goodness of fit will jump.

arXiv link: http://arxiv.org/abs/2312.16214v4

Econometrics arXiv cross-link from math.PR (math.PR), submitted: 2023-12-22

Functional CLTs for subordinated Lévy models in physics, finance, and econometrics

Authors: Andreas Søjmark, Fabrice Wunderlich

We present a simple unifying treatment of a broad class of applications from
statistical mechanics, econometrics, mathematical finance, and insurance
mathematics, where (possibly subordinated) Lévy noise arises as a scaling
limit of some form of continuous-time random walk (CTRW). For each application,
it is natural to rely on weak convergence results for stochastic integrals on
Skorokhod space in Skorokhod's J1 or M1 topologies. As compared to earlier and
entirely separate works, we are able to give a more streamlined account while
also allowing for greater generality and providing important new insights. For
each application, we first elucidate how the fundamental conclusions for J1
convergent CTRWs emerge as special cases of the same general principles, and we
then illustrate how the specific settings give rise to different results for
strictly M1 convergent CTRWs.

arXiv link: http://arxiv.org/abs/2312.15119v2

Econometrics arXiv paper, submitted: 2023-12-21

Exploring Distributions of House Prices and House Price Indices

Authors: Jiong Liu, Hamed Farahani, R. A. Serota

We use house prices (HP) and house price indices (HPI) as a proxy to income
distribution. Specifically, we analyze sale prices in the 1970-2010 window of
over 116,000 single-family homes in Hamilton County, Ohio, including the
Cincinnati metro area of about 2.2 million people. We also analyze HPI,
published by the Federal Housing Finance Agency (FHFA), for nearly 18,000 US
ZIP codes that cover a period of over 40 years starting in the 1980s. If HP can be viewed as a
first derivative of income, HPI can be viewed as its second derivative. We use
generalized beta (GB) family of functions to fit distributions of HP and HPI
since GB naturally arises from the models of economic exchange described by
stochastic differential equations. Our main finding is that HP and multi-year
HPI exhibit a negative Dragon King (nDK) behavior, wherein power-law
distribution tail gives way to an abrupt decay to a finite upper limit value,
which is similar to our recent findings for realized volatility of S&P500
index in the US stock market. This type of tail behavior is best fitted by a
modified GB (mGB) distribution. Tails of single-year HPI appear to show more
consistency with power-law behavior, which is better described by a GB Prime
(GB2) distribution. We supplement full distribution fits by mGB and GB2 with
direct linear fits (LF) of the tails. Our numerical procedure relies on
evaluation of confidence intervals (CI) of the fits, as well as of p-values
that give the likelihood that data come from the fitted distributions.

arXiv link: http://arxiv.org/abs/2312.14325v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2023-12-21

RetailSynth: Synthetic Data Generation for Retail AI Systems Evaluation

Authors: Yu Xia, Ali Arian, Sriram Narayanamoorthy, Joshua Mabry

Significant research effort has been devoted in recent years to developing
personalized pricing, promotions, and product recommendation algorithms that
can leverage rich customer data to learn and earn. Systematic benchmarking and
evaluation of these causal learning systems remains a critical challenge, due
to the lack of suitable datasets and simulation environments. In this work, we
propose a multi-stage model for simulating customer shopping behavior that
captures important sources of heterogeneity, including price sensitivity and
past experiences. We embedded this model into a working simulation environment
-- RetailSynth. RetailSynth was carefully calibrated on publicly available
grocery data to create realistic synthetic shopping transactions. Multiple
pricing policies were implemented within the simulator and analyzed for impact
on revenue, category penetration, and customer retention. Applied researchers
can use RetailSynth to validate causal demand models for multi-category retail
and to incorporate realistic price sensitivity into emerging benchmarking
suites for personalized pricing, promotions, and product recommendations.

arXiv link: http://arxiv.org/abs/2312.14095v1

Econometrics arXiv paper, submitted: 2023-12-21

Binary Endogenous Treatment in Stochastic Frontier Models with an Application to Soil Conservation in El Salvador

Authors: Samuele Centorrino, Maria Pérez-Urdiales, Boris Bravo-Ureta, Alan J. Wall

Improving the productivity of the agricultural sector is part of one of the
Sustainable Development Goals set by the United Nations. To this end, many
international organizations have funded training and technology transfer
programs that aim to promote productivity and income growth, fight poverty and
enhance food security among smallholder farmers in developing countries.
Stochastic production frontier analysis can be a useful tool when evaluating
the effectiveness of these programs. However, accounting for treatment
endogeneity, often intrinsic to these interventions, has only recently received
attention in the stochastic frontier literature. In this work, we extend
the classical maximum likelihood estimation of stochastic production frontier
models by allowing both the production frontier and inefficiency to depend on a
potentially endogenous binary treatment. We use instrumental variables to
define an assignment mechanism for the treatment, and we explicitly model the
density of the first and second-stage composite error terms. We provide
empirical evidence of the importance of controlling for endogeneity in this
setting using farm-level data from a soil conservation program in El Salvador.

arXiv link: http://arxiv.org/abs/2312.13939v1

Econometrics arXiv cross-link from q-fin.RM (q-fin.RM), submitted: 2023-12-20

Principal Component Copulas for Capital Modelling and Systemic Risk

Authors: K. B. Gubbels, J. Y. Ypma, C. W. Oosterlee

We introduce a class of copulas that we call Principal Component Copulas
(PCCs). This class combines the strong points of copula-based techniques with
principal component analysis (PCA), which results in flexibility when modelling
tail dependence along the most important directions in high-dimensional data.
We obtain theoretical results for PCCs that are important for practical
applications. In particular, we derive tractable expressions for the
high-dimensional copula density, which can be represented in terms of
characteristic functions. We also develop algorithms to perform Maximum
Likelihood and Generalized Method of Moment estimation in high-dimensions and
show very good performance in simulation experiments. Finally, we apply the
copula to the international stock market to study systemic risk. We find that
PCCs lead to excellent performance on measures of systemic risk due to their
ability to distinguish between parallel and orthogonal movements in the global
market, which have a different impact on systemic risk and diversification. As
a result, we consider the PCC promising for capital models, which financial
institutions use to protect themselves against systemic risk.

arXiv link: http://arxiv.org/abs/2312.13195v3

Econometrics arXiv cross-link from cs.CR (cs.CR), submitted: 2023-12-20

Noisy Measurements Are Important, the Design of Census Products Is Much More Important

Authors: John M. Abowd

McCartan et al. (2023) call for "making differential privacy work for census
data users." This commentary explains why the 2020 Census Noisy Measurement
Files (NMFs) are not the best focus for that plea. The August 2021 letter from
62 prominent researchers asking for production of the direct output of the
differential privacy system deployed for the 2020 Census signaled the
engagement of the scholarly community in the design of decennial census data
products. NMFs, the raw statistics produced by the 2020 Census Disclosure
Avoidance System before any post-processing, are one component of that
design: the query strategy output. The more important component is the query
workload output: the statistics released to the public. Optimizing the query
workload, specifically the Redistricting Data (P.L. 94-171) Summary File, could
allow the privacy-loss budget to be more effectively managed. There could be
fewer noisy measurements, no post-processing bias, and direct estimates of the
uncertainty from disclosure avoidance for each published statistic.

arXiv link: http://arxiv.org/abs/2312.14191v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2023-12-20

Locally Optimal Fixed-Budget Best Arm Identification in Two-Armed Gaussian Bandits with Unknown Variances

Authors: Masahiro Kato

We address the problem of best arm identification (BAI) with a fixed budget
for two-armed Gaussian bandits. In BAI, given multiple arms, we aim to find the
best arm, an arm with the highest expected reward, through an adaptive
experiment. Kaufmann et al. (2016) develops a lower bound for the probability
of misidentifying the best arm. They also propose a strategy, assuming that the
variances of rewards are known, and show that it is asymptotically optimal in
the sense that its probability of misidentification matches the lower bound as
the budget approaches infinity. However, an asymptotically optimal strategy is
unknown when the variances are unknown. For this open issue, we propose a
strategy that estimates variances during an adaptive experiment and draws arms
with a ratio of the estimated standard deviations. We refer to this strategy as
the Neyman Allocation (NA)-Augmented Inverse Probability weighting (AIPW)
strategy. We then demonstrate that this strategy is asymptotically optimal by
showing that its probability of misidentification matches the lower bound when
the budget approaches infinity, and the gap between the expected rewards of two
arms approaches zero (small-gap regime). Our results suggest that under the
worst-case scenario characterized by the small-gap regime, our strategy, which
employs estimated variance, is asymptotically optimal even when the variances
are unknown.
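
A minimal sketch of the allocation idea follows: after a short forced-exploration
phase, arms are drawn with probabilities proportional to their estimated
standard deviations; for simplicity the final recommendation uses sample means
rather than the AIPW estimator of the paper, and all parameters are
illustrative.

# Sketch: two-armed fixed-budget BAI with Neyman-style allocation on estimated std devs.
import numpy as np

rng = np.random.default_rng(8)
mu, sd = np.array([0.00, 0.05]), np.array([1.0, 2.0])   # small gap, unknown variances
T = 2000

rewards = [[], []]
for a in range(2):                      # short forced exploration of both arms
    for _ in range(20):
        rewards[a].append(rng.normal(mu[a], sd[a]))

for t in range(T - 40):
    s = np.array([np.std(rewards[0]), np.std(rewards[1])]) + 1e-8
    probs = s / s.sum()                 # draw arms proportionally to estimated std devs
    a = rng.choice(2, p=probs)
    rewards[a].append(rng.normal(mu[a], sd[a]))

means = [np.mean(r) for r in rewards]
print("recommended arm:", int(np.argmax(means)), "sample means:", np.round(means, 3))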

arXiv link: http://arxiv.org/abs/2312.12741v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-12-18

Real-time monitoring with RCA models

Authors: Lajos Horváth, Lorenzo Trapani

We propose a family of weighted statistics based on the CUSUM process of the
WLS residuals for the online detection of changepoints in a Random Coefficient
Autoregressive model, using both the standard CUSUM and the Page-CUSUM process.
We derive the asymptotics under the null of no changepoint for all possible
weighting schemes, including the case of the standardised CUSUM, for which we
derive a Darling-Erdős-type limit theorem; our results guarantee the
procedure-wise size control under both an open-ended and a closed-ended
monitoring. In addition to considering the standard RCA model with no
covariates, we also extend our results to the case of exogenous regressors. Our
results can be applied irrespective of (and with no prior knowledge required as
to) whether the observations are stationary or not, and irrespective of whether
they change into a stationary or nonstationary regime. Hence, our methodology
is particularly suited to detect the onset, or the collapse, of a bubble or an
epidemic. Our simulations show that our procedures, especially when
standardising the CUSUM process, can ensure very good size control and short
detection delays. We complement our theory by studying the online detection of
breaks in epidemiological and housing prices series.

arXiv link: http://arxiv.org/abs/2312.11710v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2023-12-18

A Simulated Reconstruction and Reidentification Attack on the 2010 U.S. Census

Authors: John M. Abowd, Tamara Adams, Robert Ashmead, David Darais, Sourya Dey, Simson L. Garfinkel, Nathan Goldschlag, Michael B. Hawes, Daniel Kifer, Philip Leclerc, Ethan Lew, Scott Moore, Rolando A. Rodríguez, Ramy N. Tadros, Lars Vilhuber

We show that individual, confidential microdata records from the 2010 U.S.
Census of Population and Housing can be accurately reconstructed from the
published tabular summaries. Ninety-seven million person records (every
resident in 70% of all census blocks) are exactly reconstructed with provable
certainty using only public information. We further show that a hypothetical
attacker using our methods can reidentify with 95% accuracy population unique
individuals who are perfectly reconstructed and not in the modal race and
ethnicity category in their census block (3.4 million persons)--a result that
is only possible because their confidential records were used in the published
tabulations. Finally, we show that the methods used for the 2020 Census, based
on a differential privacy framework, provide better protection against this
type of attack, with better published data accuracy, than feasible
alternatives.

arXiv link: http://arxiv.org/abs/2312.11283v3

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2023-12-18

Predicting Financial Literacy via Semi-supervised Learning

Authors: David Hason Rudd, Huan Huo, Guandong Xu

Financial literacy (FL) represents a person's ability to turn assets into
income, and understanding digital currencies has been added to the modern
definition. FL can be predicted by exploiting unlabelled recorded data in
financial networks via semi-supervised learning (SSL). Measuring and predicting
FL has not been widely studied, resulting in limited understanding of customer
financial engagement consequences. Previous studies have shown that low FL
increases the risk of social harm. Therefore, it is important to accurately
estimate FL to allocate specific intervention programs to less financially
literate groups. This will not only increase company profitability, but will
also reduce government spending. Some studies considered predicting FL in
classification tasks, whereas others developed FL definitions and impacts. The
current paper investigated mechanisms to learn customer FL level from their
financial data using sampling by synthetic minority over-sampling techniques
for regression with Gaussian noise (SMOGN). We propose the SMOGN-COREG model
for semi-supervised regression, applying SMOGN to deal with unbalanced datasets
and a nonparametric multi-learner co-regression (COREG) algorithm for labeling.
We compared the SMOGN-COREG model with six well-known regressors on five
datasets to evaluate the proposed model's effectiveness on unbalanced and
unlabelled financial data. Experimental results confirmed that the proposed
method outperformed the comparator models for unbalanced and unlabelled
financial data. Therefore, SMOGN-COREG is a step towards using unlabelled data
to estimate FL level.

arXiv link: http://arxiv.org/abs/2312.10984v1

Econometrics arXiv paper, submitted: 2023-12-16

Some Finite-Sample Results on the Hausman Test

Authors: Jinyong Hahn, Zhipeng Liao, Nan Liu, Shuyang Sheng

This paper shows that the endogeneity test using the control function
approach in linear instrumental variable models is a variant of the Hausman
test. Moreover, we find that the test statistics used in these tests can be
numerically ordered, indicating their relative power properties in finite
samples.
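
To fix ideas, the control-function version of the endogeneity test in a linear
IV model can be sketched as follows: regress the endogenous regressor on the
instrument, include the first-stage residual in the outcome equation, and test
its coefficient. The simulated data and variable names below are illustrative.

# Sketch: control-function endogeneity test in a linear IV model (simulated data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 2000
z = rng.normal(size=n)                          # instrument
u = rng.normal(size=n)                          # unobserved confounder
d = 0.8 * z + u + rng.normal(size=n)            # endogenous regressor
y = 1.0 * d + u + rng.normal(size=n)            # outcome (true effect 1.0)

# First stage: endogenous regressor on the instrument.
first = sm.OLS(d, sm.add_constant(z)).fit()
v_hat = first.resid

# Second stage: include the first-stage residual as a control function.
X = sm.add_constant(np.column_stack([d, v_hat]))
second = sm.OLS(y, X).fit()

# A significant coefficient on v_hat indicates endogeneity (a Hausman-type test).
print("t-stat on control function:", round(float(second.tvalues[2]), 2))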

arXiv link: http://arxiv.org/abs/2312.10558v1

Econometrics arXiv updated paper (originally submitted: 2023-12-16)

The Dynamic Triple Gamma Prior as a Shrinkage Process Prior for Time-Varying Parameter Models

Authors: Peter Knaus, Sylvia Frühwirth-Schnatter

Many existing shrinkage approaches for time-varying parameter (TVP) models
assume constant innovation variances across time points, inducing sparsity by
shrinking these variances toward zero. However, this assumption falls short
when states exhibit large jumps or structural changes, as often seen in
empirical time series analysis. To address this, we propose the dynamic triple
gamma prior -- a stochastic process that induces time-dependent shrinkage by
modeling dependence among innovations while retaining a well-known triple gamma
marginal distribution. This framework encompasses various special and limiting
cases, including the horseshoe shrinkage prior, making it highly flexible. We
derive key properties of the dynamic triple gamma that highlight its dynamic
shrinkage behavior and develop an efficient Markov chain Monte Carlo algorithm
for posterior sampling. The proposed approach is evaluated through sparse
covariance modeling and forecasting of the returns of the EURO STOXX 50 index,
demonstrating favorable forecasting performance.

arXiv link: http://arxiv.org/abs/2312.10487v2

Econometrics arXiv paper, submitted: 2023-12-16

Logit-based alternatives to two-stage least squares

Authors: Denis Chetverikov, Jinyong Hahn, Zhipeng Liao, Shuyang Sheng

We propose logit-based IV and augmented logit-based IV estimators that serve
as alternatives to the traditionally used 2SLS estimator in the model where
both the endogenous treatment variable and the corresponding instrument are
binary. Our novel estimators are as easy to compute as the 2SLS estimator but
have an advantage over the 2SLS estimator in terms of causal interpretability.
In particular, in certain cases where the probability limits of both our
estimators and the 2SLS estimator take the form of weighted-average treatment
effects, our estimators are guaranteed to yield non-negative weights whereas
the 2SLS estimator is not.

arXiv link: http://arxiv.org/abs/2312.10333v1

Econometrics arXiv updated paper (originally submitted: 2023-12-13)

Double Machine Learning for Static Panel Models with Fixed Effects

Authors: Paul S. Clarke, Annalivia Polselli

Recent advances in causal inference have seen the development of methods
which make use of the predictive power of machine learning algorithms. In this
paper, we develop novel double machine learning (DML) procedures for panel data
in which these algorithms are used to approximate high-dimensional and
nonlinear nuisance functions of the covariates. Our new procedures are
extensions of the well-known correlated random effects, within-group and
first-difference estimators from linear to nonlinear panel models,
specifically, Robinson (1988)'s partially linear regression model with fixed
effects and unspecified nonlinear confounding. Our simulation study assesses
the performance of these procedures using different machine learning
algorithms. We use our procedures to re-estimate the impact of minimum wage on
voting behaviour in the UK. From our results, we recommend the use of
first-differencing because it imposes the fewest constraints on the
distribution of the fixed effects, and an ensemble learning strategy to ensure
optimum estimator accuracy.
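
As a rough sketch of the first-difference idea in a partially linear panel (not
the authors' exact procedures or software), the code below first-differences
within units to remove fixed effects and then runs cross-fitted partialling-out
with a generic learner; the data-generating process and learner choice are
hypothetical, and a fuller implementation would form cross-fitting folds at the
unit level.

# Sketch: first-difference DML for a partially linear panel model (illustrative).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.default_rng(10)
N, T, p = 300, 6, 3
alpha = np.repeat(rng.normal(size=N), T)                  # unit fixed effects
x = rng.normal(size=(N * T, p))
d = np.sin(x[:, 0]) + alpha + rng.normal(size=N * T)      # treatment
y = 0.7 * d + np.cos(x[:, 1]) + alpha + rng.normal(size=N * T)

# First-difference within units to sweep out the fixed effects; keep both
# current and lagged covariate levels as features for the nuisance functions.
x3, d2, y2 = x.reshape(N, T, p), d.reshape(N, T), y.reshape(N, T)
dy = np.diff(y2, axis=1).ravel()
dd = np.diff(d2, axis=1).ravel()
W = np.concatenate([x3[:, 1:, :], x3[:, :-1, :]], axis=2).reshape(N * (T - 1), 2 * p)

# Cross-fitted partialling-out on the differenced data with a flexible learner.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
dy_res = dy - cross_val_predict(GradientBoostingRegressor(), W, dy, cv=cv)
dd_res = dd - cross_val_predict(GradientBoostingRegressor(), W, dd, cv=cv)

theta = np.sum(dd_res * dy_res) / np.sum(dd_res ** 2)
print(f"estimated treatment effect: {theta:.3f} (truth 0.7)")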

arXiv link: http://arxiv.org/abs/2312.08174v5

Econometrics arXiv paper, submitted: 2023-12-13

Individual Updating of Subjective Probability of Homicide Victimization: a "Natural Experiment'' on Risk Communication

Authors: José Raimundo Carvalho, Diego de Maria André, Yuri Costa

We investigate how individuals update their subjective homicide victimization
risk after an informational shock by developing two econometric models that
accommodate optimal decisions to revise prior expectations and that enable us
to rationalize skeptical Bayesian agents who disregard new information. We
apply our models to unique household data (N = 4,030) consisting of
socioeconomic and victimization expectation
variables in Brazil, coupled with an informational “natural experiment”
brought by the sample design methodology, which randomized interviewers to
interviewees. The higher individuals set their priors about their own
subjective homicide victimization risk, the more likely they are to change
their initial perceptions. In the case of an update, we find that elders and females are
more reluctant to change priors and choose the new response level. In addition,
even though the respondents' level of education is not significant, the
interviewers' level of education has a key role in changing and updating
decisions. The results show that our econometric approach fits the available
empirical evidence reasonably well, stressing the salient role that
heterogeneity in the individual characteristics of interviewees and
interviewers plays in belief updating and in the lack of it, that is,
skepticism. Furthermore, we can
rationalize skeptics through an informational quality/credibility argument.

arXiv link: http://arxiv.org/abs/2312.08171v1

Econometrics arXiv updated paper (originally submitted: 2023-12-13)

Efficiency of QMLE for dynamic panel data models with interactive effects

Authors: Jushan Bai

This paper studies the problem of efficient estimation of panel data models
in the presence of an increasing number of incidental parameters. We formulate
the dynamic panel as a simultaneous equations system, and derive the efficiency
bound under the normality assumption. We then show that the Gaussian
quasi-maximum likelihood estimator (QMLE) applied to the system achieves the
normality efficiency bound without the normality assumption. Comparison of QMLE
with the fixed effects approach is made.

arXiv link: http://arxiv.org/abs/2312.07881v3

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2023-12-12

On Rosenbaum's Rank-based Matching Estimator

Authors: Matias D. Cattaneo, Fang Han, Zhexiao Lin

In two influential contributions, Rosenbaum (2005, 2020) advocated for using
the distances between component-wise ranks, instead of the original data
values, to measure covariate similarity when constructing matching estimators
of average treatment effects. While the intuitive benefits of using covariate
ranks for matching estimation are apparent, there is no theoretical
understanding of such procedures in the literature. We fill this gap by
demonstrating that Rosenbaum's rank-based matching estimator, when coupled with
a regression adjustment, enjoys the properties of double robustness and
semiparametric efficiency without the need to enforce restrictive covariate
moment assumptions. Our theoretical findings further emphasize the statistical
virtues of employing ranks for estimation and inference, more broadly aligning
with the insights put forth by Peter Bickel in his 2004 Rietz lecture (Bickel,
2004).
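
The basic rank-based matching step can be sketched as follows: replace each
covariate by its component-wise ranks, match every treated unit to its nearest
control in rank space, and average the outcome differences; the regression
adjustment studied in the paper is omitted and the data are simulated.

# Sketch: nearest-neighbor matching on component-wise covariate ranks (simulated data).
import numpy as np
from scipy.stats import rankdata
from scipy.spatial import cKDTree

rng = np.random.default_rng(11)
n = 1000
X = np.column_stack([rng.normal(size=n), rng.lognormal(size=n)])   # one heavy-tailed covariate
ps = 1 / (1 + np.exp(-(0.5 * X[:, 0] + 0.1 * np.log(X[:, 1]))))
D = rng.random(n) < ps
Y = 1.0 * D + X[:, 0] + np.log(X[:, 1]) + rng.normal(size=n)       # true effect 1.0

# Component-wise ranks are insensitive to heavy tails and monotone transformations.
R = np.column_stack([rankdata(X[:, j]) for j in range(X.shape[1])]) / n

tree = cKDTree(R[~D])                      # controls, in rank space
_, idx = tree.query(R[D], k=1)             # nearest control for each treated unit
att = np.mean(Y[D] - Y[~D][idx])
print(f"rank-matching ATT estimate: {att:.3f}")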

arXiv link: http://arxiv.org/abs/2312.07683v2

Econometrics arXiv updated paper (originally submitted: 2023-12-12)

Estimating Counterfactual Matrix Means with Short Panel Data

Authors: Lihua Lei, Brad Ross

We develop a new, spectral approach for identifying and estimating average
counterfactual outcomes under a low-rank factor model with short panel data and
general outcome missingness patterns. Applications include event studies and
studies of outcomes of "matches" between agents of two types, e.g. workers and
firms, typically conducted under less-flexible Two-Way-Fixed-Effects (TWFE)
models of outcomes. Given an infinite population of units and a finite number
of outcomes, we show our approach identifies all counterfactual outcome means,
including those not estimable by existing methods, if a particular graph
constructed based on overlaps in observed outcomes between subpopulations is
connected. Our analogous, computationally efficient estimation procedure yields
consistent, asymptotically normal estimates of counterfactual outcome means
under fixed-$T$ (number of outcomes), large-$N$ (sample size) asymptotics. In a
semi-synthetic simulation study based on matched employer-employee data, our
estimator has lower bias and only slightly higher variance than a
TWFE-model-based estimator when estimating average log-wages.

arXiv link: http://arxiv.org/abs/2312.07520v2

Econometrics arXiv updated paper (originally submitted: 2023-12-11)

Structural Analysis of Vector Autoregressive Models

Authors: Christis Katsouris

This set of lecture notes discusses key concepts for the Structural Analysis
of Vector Autoregressive models, prepared for the teaching of a course on
Applied Macroeconometrics with Advanced Topics.

arXiv link: http://arxiv.org/abs/2312.06402v9

Econometrics arXiv paper, submitted: 2023-12-11

Trends in Temperature Data: Micro-foundations of Their Nature

Authors: Maria Dolores Gadea, Jesus Gonzalo, Andrey Ramos

Determining whether Global Average Temperature (GAT) is an integrated process
of order 1, I(1), or is a stationary process around a trend function is crucial
for detection, attribution, impact and forecasting studies of climate change.
In this paper, we investigate the nature of trends in GAT building on the
analysis of individual temperature grids. Our 'micro-founded' evidence suggests
that GAT is stationary around a non-linear deterministic trend in the form of a
linear function with a one-period structural break. This break can be
attributed to a combination of individual grid breaks and the standard
aggregation method under acceleration in global warming. We illustrate our
findings using simulations.
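
As a small illustrative sketch of the kind of trend specification described
above, the following fits a linear trend with a single break by grid search
over candidate break dates. The data are synthetic, and the post-break slope
shift is only one reading of a "linear function with a one-period structural
break"; it is not the paper's estimation procedure.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic "temperature" series: a linear trend whose slope steepens at t = 80.
T, true_break = 150, 80
t = np.arange(T)
y = 0.01 * t + 0.02 * np.maximum(t - true_break, 0) + rng.normal(0, 0.1, T)

def broken_trend_ssr(y, tb):
    """Sum of squared residuals from an OLS fit of a level, a linear trend,
    and a post-break slope shift at candidate date tb."""
    t = np.arange(len(y), dtype=float)
    X = np.column_stack([np.ones_like(t), t, np.maximum(t - tb, 0.0)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return np.sum(resid ** 2)

# Grid search over interior candidate break dates (trimming the sample ends).
candidates = range(int(0.15 * T), int(0.85 * T))
tb_hat = min(candidates, key=lambda tb: broken_trend_ssr(y, tb))
print("estimated break date:", tb_hat)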

arXiv link: http://arxiv.org/abs/2312.06379v1

Econometrics arXiv updated paper (originally submitted: 2023-12-10)

Fused Extended Two-Way Fixed Effects for Difference-in-Differences With Staggered Adoptions

Authors: Gregory Faletto

To address the bias of the canonical two-way fixed effects estimator for
difference-in-differences under staggered adoptions, Wooldridge (2021) proposed
the extended two-way fixed effects estimator, which adds many parameters.
However, this reduces efficiency. Restricting some of these parameters to be
equal (for example, subsequent treatment effects within a cohort) helps, but ad
hoc restrictions may reintroduce bias. We propose a machine learning estimator
with a single tuning parameter, fused extended two-way fixed effects (FETWFE),
that enables automatic data-driven selection of these restrictions. We prove
that under an appropriate sparsity assumption FETWFE identifies the correct
restrictions with probability tending to one, which improves efficiency. We
also prove the consistency, oracle property, and asymptotic normality of FETWFE
for several classes of heterogeneous marginal treatment effect estimators under
either conditional or marginal parallel trends, and we prove the same results
for conditional average treatment effects under conditional parallel trends. We
provide an R package implementing fused extended two-way fixed effects, and we
demonstrate FETWFE in simulation studies and an empirical application.

arXiv link: http://arxiv.org/abs/2312.05985v4

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-12-10

Dynamic Spatiotemporal ARCH Models: Small and Large Sample Results

Authors: Philipp Otto, Osman Doğan, Süleyman Taşpınar

This paper explores the estimation of a dynamic spatiotemporal autoregressive
conditional heteroscedasticity (ARCH) model. The log-volatility term in this
model can depend on (i) the spatial lag of the log-squared outcome variable,
(ii) the time-lag of the log-squared outcome variable, (iii) the spatiotemporal
lag of the log-squared outcome variable, (iv) exogenous variables, and (v) the
unobserved heterogeneity across regions and time, i.e., the regional and time
fixed effects. We examine the small and large sample properties of two
quasi-maximum likelihood estimators and a generalized method of moments
estimator for this model. We first summarize the theoretical properties of
these estimators and then compare their finite sample properties through Monte
Carlo simulations.
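
Written out from the list (i)-(v) above, a schematic log-volatility equation
(with assumed notation: spatial weights $w_{ij}$, coefficients $\rho$,
$\gamma$, $\delta$, and regional and time fixed effects $\mu_i$, $\alpha_t$)
is

$$
\ln h_{it} = \rho \sum_{j \neq i} w_{ij} \ln y_{jt}^{2}
           + \gamma \ln y_{i,t-1}^{2}
           + \delta \sum_{j \neq i} w_{ij} \ln y_{j,t-1}^{2}
           + x_{it}'\beta + \mu_i + \alpha_t ,
$$

where $h_{it}$ denotes the conditional volatility of the outcome $y_{it}$; the
paper's exact parametrization may differ.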

arXiv link: http://arxiv.org/abs/2312.05898v1

Econometrics arXiv updated paper (originally submitted: 2023-12-10)

Causal inference and policy evaluation without a control group

Authors: Augusto Cerqua, Marco Letta, Fiammetta Menchetti

Without a control group, the most widespread methodologies for estimating
causal effects cannot be applied. To fill this gap, we propose the Machine
Learning Control Method, a new approach for causal panel analysis that
estimates causal parameters without relying on untreated units. We formalize
identification within the potential outcomes framework and then provide
estimation based on machine learning algorithms. To illustrate the practical
relevance of our method, we present simulation evidence, a replication study,
and an empirical application on the impact of the COVID-19 crisis on
educational inequality. We implement the proposed approach in the companion R
package MachineControl.

arXiv link: http://arxiv.org/abs/2312.05858v2

Econometrics arXiv paper, submitted: 2023-12-09

Influence Analysis with Panel Data

Authors: Annalivia Polselli

The presence of units with extreme values in the dependent and/or independent
variables (i.e., vertical outliers, leveraged data) has the potential to
severely bias regression coefficients and/or standard errors. This is common
with short panel data because the researcher cannot appeal to asymptotic
theory. Examples include cross-country studies, cell-group analyses, and field or
laboratory experimental studies, where the researcher is forced to use few
cross-sectional observations repeated over time due to the structure of the
data or research design. Available diagnostic tools may fail to properly detect
these anomalies, because they are not designed for panel data. In this paper,
we formalise statistical measures for panel data models with fixed effects to
quantify the degree of leverage and outlyingness of units, and the joint and
conditional influences of pairs of units. We first develop a method to visually
detect anomalous units in a panel data set, and identify their type. Second, we
investigate the effect of these units on LS estimates, and on other units'
influence on the estimated parameters. To illustrate and validate the proposed
method, we use a synthetic data set contaminated with different types of
anomalous units. We also provide an empirical example.

arXiv link: http://arxiv.org/abs/2312.05700v1

Econometrics arXiv updated paper (originally submitted: 2023-12-09)

Economic Forecasts Using Many Noises

Authors: Yuan Liao, Xinjie Ma, Andreas Neuhierl, Zhentao Shi

This paper addresses a key question in economic forecasting: does pure noise
truly lack predictive power? Economists typically conduct variable selection to
eliminate noise variables from the predictors. Yet, we prove a compelling
result: in most economic forecasts, the inclusion of noise variables in
predictions yields greater benefits than their exclusion. Furthermore, if the
total number of predictors is not sufficiently large, intentionally adding
more noise variables yields superior
forecast performance, outperforming benchmark predictors relying on dimension
reduction. The intuition lies in economic predictive signals being densely
distributed among regression coefficients, maintaining modest forecast bias
while diversifying away overall variance, even when a significant proportion of
predictors constitute pure noise. One of our empirical demonstrations shows
that intentionally adding 300 to 6,000 pure noise variables to the Welch and
Goyal (2008) dataset achieves a noteworthy out-of-sample $R^2$ of 10% in
forecasting the annual U.S. equity premium. This performance surpasses the
majority of sophisticated machine learning models.
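
As a purely illustrative sketch of the exercise (synthetic data rather than
the Welch and Goyal (2008) dataset; all names and sizes below are
assumptions), the following compares the out-of-sample $R^2$ of a
least-squares forecasting regression before and after appending pure-noise
columns:

import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a predictive regression: T periods, p dense but weak signals.
T, p, k_noise = 200, 15, 300
X = rng.standard_normal((T, p))
beta = rng.normal(0.0, 0.05, size=p)           # dense, modest coefficients
y = X @ beta + rng.standard_normal(T)          # outcome (e.g., an equity-premium proxy)

def oos_r2(X, y, split=150):
    """Least-squares forecast fit on the first `split` periods, evaluated on
    the rest, benchmarked against the prevailing-mean forecast."""
    Xtr, ytr, Xte, yte = X[:split], y[:split], X[split:], y[split:]
    coef, *_ = np.linalg.lstsq(Xtr, ytr, rcond=None)
    sse_model = np.sum((yte - Xte @ coef) ** 2)
    sse_mean = np.sum((yte - ytr.mean()) ** 2)
    return 1.0 - sse_model / sse_mean

print("baseline OOS R2: ", round(oos_r2(X, y), 3))

# Intentionally append k_noise pure-noise columns, in the spirit of the paper.
X_aug = np.hstack([X, rng.standard_normal((T, k_noise))])
print("augmented OOS R2:", round(oos_r2(X_aug, y), 3))

With more columns than training observations, np.linalg.lstsq returns the
minimum-norm solution, which loosely mirrors the implicit regularization
behind the dense-signal intuition sketched in the abstract.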

arXiv link: http://arxiv.org/abs/2312.05593v2

Econometrics arXiv updated paper (originally submitted: 2023-12-08)

GCov-Based Portmanteau Test

Authors: Joann Jasiak, Aryan Manafi Neyazi

We study nonlinear serial dependence tests for non-Gaussian time series and
residuals of dynamic models based on portmanteau statistics involving nonlinear
autocovariances. A new test with an asymptotic $\chi^2$ distribution is
introduced for testing nonlinear serial dependence (NLSD) in time series. This
test is inspired by the Generalized Covariance (GCov) residual-based
specification test, recently proposed as a diagnostic tool for semi-parametric
dynamic models with i.i.d. non-Gaussian errors. It has a $\chi^2$ distribution
when the model is correctly specified and estimated by the GCov estimator. We
derive new asymptotic results under local alternatives for testing hypotheses
on the parameters of a semi-parametric model. We extend it by introducing a
GCov bootstrap test for residual diagnostics, which is also available for
models estimated by a different method, such as the maximum likelihood
estimator under a parametric assumption on the error distribution. A
simulation study shows that the tests perform well in
applications to mixed causal-noncausal autoregressive models. The GCov
specification test is used to assess the fit of a mixed causal-noncausal model
of aluminum prices with locally explosive patterns, i.e. bubbles and spikes
between 2005 and 2024.
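
For intuition only, a generic portmanteau statistic built from nonlinear
autocovariances can be sketched as follows (assumed notation; not necessarily
the exact statistic studied in the paper). Stacking $K$ nonlinear
transformations of the residuals into a vector $v_t$ with sample
autocovariance matrices $\hat{\Gamma}(h)$, one can form

$$
\xi(H) = T \sum_{h=1}^{H}
         \operatorname{Tr}\left[\hat{\Gamma}(h)'\,\hat{\Gamma}(0)^{-1}\,
                                \hat{\Gamma}(h)\,\hat{\Gamma}(0)^{-1}\right],
$$

which is compared with a $\chi^2$ critical value under the null of no
(nonlinear) serial dependence.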

arXiv link: http://arxiv.org/abs/2312.05373v2

Econometrics arXiv paper, submitted: 2023-12-08

Occasionally Misspecified

Authors: Jean-Jacques Forneron

When fitting a particular economic model to a sample of data, the model may
turn out to be heavily misspecified for some observations. This can happen
because of unmodelled idiosyncratic events, such as an abrupt but short-lived
change in policy. These outliers can significantly alter estimates and
inferences. Robust estimation is desirable to limit their influence, but for
skewed data, robustification induces another bias that can also invalidate
estimation and inference. This paper proposes a robust GMM estimator with a simple bias
correction that does not degrade robustness significantly. The paper provides
finite-sample robustness bounds, and asymptotic uniform equivalence with an
oracle that discards all outliers. Consistency and asymptotic normality ensue
from that result. An application to the "Price-Puzzle," which finds inflation
increases when monetary policy tightens, illustrates the concerns and the
method. The proposed estimator finds the intuitive result: tighter monetary
policy leads to a decline in inflation.

arXiv link: http://arxiv.org/abs/2312.05342v1

Econometrics arXiv updated paper (originally submitted: 2023-12-07)

Probabilistic Scenario-Based Assessment of National Food Security Risks with Application to Egypt and Ethiopia

Authors: Phoebe Koundouri, Georgios I. Papayiannis, Achilleas Vassilopoulos, Athanasios N. Yannacopoulos

This study presents a novel approach to assessing food security risks at the
national level, employing a probabilistic scenario-based framework that
integrates both Shared Socioeconomic Pathways (SSP) and Representative
Concentration Pathways (RCP). This innovative method allows each scenario,
encompassing socio-economic and climate factors, to be treated as a model
capable of generating diverse trajectories. This approach offers a more dynamic
understanding of food security risks under varying future conditions. The paper
details the methodologies employed, showcasing their applicability through a
focused analysis of food security challenges in Egypt and Ethiopia, and
underscores the importance of considering a spectrum of socio-economic and
climatic factors in national food security assessments.

arXiv link: http://arxiv.org/abs/2312.04428v2

Econometrics arXiv cross-link from q-fin.CP (q-fin.CP), submitted: 2023-12-06

Alternative models for FX, arbitrage opportunities and efficient pricing of double barrier options in Lévy models

Authors: Svetlana Boyarchenko, Sergei Levendorskii

We analyze the qualitative differences between prices of double barrier
no-touch options in the Heston model and pure jump KoBoL model calibrated to
the same set of the empirical data, and discuss the potential for arbitrage
opportunities if the correct model is a pure jump model. We explain and
demonstrate with numerical examples that accurate and fast calculations of
prices of double barrier options in jump models are extremely difficult using
the numerical methods available in the literature. We develop a new efficient
method (the GWR-SINH method) based on the Gaver-Wynn-Rho acceleration applied
to the Bromwich integral; the SINH-acceleration and a simplified trapezoid
rule are used to evaluate perpetual double barrier options for each value of
the spectral parameter in the GWR algorithm. A Matlab program running on a Mac
with moderate specifications achieves a precision of the order of 1E-5 or
better in several dozen milliseconds; a precision of 1E-7 is achievable in
about 0.1 seconds. We outline the extension of the GWR-SINH method to
regime-switching models and models with stochastic parameters and stochastic
interest rates.

arXiv link: http://arxiv.org/abs/2312.03915v1

Econometrics arXiv paper, submitted: 2023-12-05

A Theory Guide to Using Control Functions to Instrument Hazard Models

Authors: William Liu

I develop the theory around using control functions to instrument hazard
models, allowing the inclusion of endogenous (e.g., mismeasured) regressors.
Simple discrete-data hazard models can be expressed as binary choice panel data
models, and the widespread Prentice and Gloeckler (1978) discrete-data
proportional hazards model can specifically be expressed as a complementary
log-log model with time fixed effects. This allows me to recast it as GMM
estimation and its instrumented version as sequential GMM estimation in a
Z-estimation (non-classical GMM) framework; this framework can then be
leveraged to establish asymptotic properties and sufficient conditions. Whilst
this paper focuses on the Prentice and Gloeckler (1978) model, the methods and
discussion developed here can be applied more generally to other hazard models
and binary choice models. I also introduce my Stata command for estimating a
complementary log-log model instrumented via control functions (available as
ivcloglog on SSC), which allows practitioners to easily instrument the Prentice
and Gloeckler (1978) model.
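
The two-step control function logic can be sketched in a stripped-down form
as follows. This is not the ivcloglog command or the paper's exact estimator:
the data are synthetic, the first stage is linear, and the second stage is a
plain complementary log-log binary likelihood with the first-stage residual
added as a regressor.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Synthetic data: w is endogenous (correlated with u), z is an instrument.
n = 5_000
z = rng.normal(size=n)
u = rng.normal(size=n)
w = 0.7 * z + u + rng.normal(size=n)                 # endogenous regressor
idx_true = -1.0 + 0.5 * w - 0.8 * u                  # latent index
y = (rng.uniform(size=n) < 1 - np.exp(-np.exp(idx_true))).astype(float)

# Step 1: first-stage residuals serve as the control function.
Z = np.column_stack([np.ones(n), z])
gamma, *_ = np.linalg.lstsq(Z, w, rcond=None)
v_hat = w - Z @ gamma

# Step 2: cloglog MLE including the control function v_hat as an extra regressor.
X = np.column_stack([np.ones(n), w, v_hat])

def negloglik(beta):
    idx = X @ beta
    p = 1.0 - np.exp(-np.exp(idx))                   # cloglog link: P(y = 1 | x)
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

res = minimize(negloglik, x0=np.zeros(X.shape[1]), method="BFGS")
print("cloglog coefficients (const, w, v_hat):", np.round(res.x, 3))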

arXiv link: http://arxiv.org/abs/2312.03165v1

Econometrics arXiv updated paper (originally submitted: 2023-12-04)

Almost Dominance: Inference and Application

Authors: Xiaojun Song, Zhenting Sun

This paper proposes a general framework for inference on three types of
almost dominances: almost Lorenz dominance, almost inverse stochastic
dominance, and almost stochastic dominance. We first generalize almost Lorenz
dominance to almost upward and downward Lorenz dominances. We then provide a
bootstrap inference procedure for the Lorenz dominance coefficients, which
measure the degrees of almost Lorenz dominance. Furthermore, we propose almost
upward and downward inverse stochastic dominances and provide inference on the
inverse stochastic dominance coefficients. We also show that our results can
easily be extended to almost stochastic dominance. Simulation studies
demonstrate the finite sample properties of the proposed estimators and the
bootstrap confidence intervals. This framework can be applied to economic
analysis, particularly in the areas of social welfare, inequality, and decision
making under uncertainty. As an empirical example, we apply the methods to the
inequality growth in the United Kingdom and find evidence for almost upward
inverse stochastic dominance.

arXiv link: http://arxiv.org/abs/2312.02288v2

Econometrics arXiv paper, submitted: 2023-12-04

Bayesian Nonlinear Regression using Sums of Simple Functions

Authors: Florian Huber

This paper proposes a new Bayesian machine learning model that can be applied
to large datasets arising in macroeconomics. Our framework sums over many
simple two-component location mixtures. The transition between components is
determined by a logistic function that depends on a single threshold variable
and two hyperparameters. Each of these individual models only accounts for a
minor portion of the variation in the endogenous variables. But many of them
are capable of capturing arbitrary nonlinear conditional mean relations.
Conjugate priors enable fast and efficient inference. In simulations, we show
that our approach produces accurate point and density forecasts. In a real-data
exercise, we forecast US macroeconomic aggregates and consider the nonlinear
effects of financial shocks in a large-scale nonlinear VAR.
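
A toy numerical sketch of the building block, under an assumed
parametrization (two component means, a threshold location, and a scale for
the logistic transition; the paper's exact specification and its Bayesian
treatment are not reproduced here):

import numpy as np

def simple_component(z, mu_low, mu_high, threshold, scale):
    """Two-component location mixture: a logistic gate moves the conditional
    mean from mu_low to mu_high as the threshold variable z crosses threshold."""
    gate = 1.0 / (1.0 + np.exp(-(z - threshold) / scale))
    return (1.0 - gate) * mu_low + gate * mu_high

def additive_fit(z, components):
    """Sum many weak components; each explains only a small share of variation."""
    return sum(simple_component(z, *c) for c in components)

# Toy illustration: a sum of many small logistic steps traces a nonlinear mean.
rng = np.random.default_rng(1)
z = np.linspace(-3, 3, 400)
components = [(0.0, rng.normal(0, 0.2), rng.uniform(-3, 3), 0.3) for _ in range(50)]
print(additive_fit(z, components)[:5])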

arXiv link: http://arxiv.org/abs/2312.01881v1

Econometrics arXiv updated paper (originally submitted: 2023-12-02)

A Method of Moments Approach to Asymptotically Unbiased Synthetic Controls

Authors: Joseph Fry

A common approach to constructing a Synthetic Control unit is to fit on the
outcome variable and covariates in pre-treatment time periods, but it has been
shown by Ferman and Pinto (2019) that this approach does not provide asymptotic
unbiasedness when the fit is imperfect and the number of controls is fixed.
Many related panel methods have a similar limitation when the number of units
is fixed. I introduce and evaluate a new method in which the Synthetic Control
is constructed using a Generalized Method of Moments (GMM) approach where units not being
included in the Synthetic Control are used as instruments. I show that a
Synthetic Control Estimator of this form will be asymptotically unbiased as the
number of pre-treatment time periods goes to infinity, even when pre-treatment
fit is imperfect and the number of units is fixed. Furthermore, if both the
number of pre-treatment and post-treatment time periods go to infinity, then
averages of treatment effects can be consistently estimated. I conduct
simulations and an empirical application to compare the performance of this
method with existing approaches in the literature.

arXiv link: http://arxiv.org/abs/2312.01209v2

Econometrics arXiv updated paper (originally submitted: 2023-12-02)

Inference on many jumps in nonparametric panel regression models

Authors: Likai Chen, Georg Keilbar, Liangjun Su, Weining Wang

We investigate the significance of change-points within fully nonparametric
regression contexts, with a particular focus on panel data where data
generation processes vary across units, and error terms may display complex
dependency structures. In our setting the threshold effect depends on one
specific covariate, and we permit the true nonparametric regression to vary
based on additional (latent) variables. We propose two uniform testing
procedures: one to assess the existence of change-points and another to
evaluate the uniformity of such effects across units. Our approach involves
deriving a straightforward analytical expression to approximate the
variance-covariance structure of change-point effects under general dependency
conditions. Notably, when Gaussian approximations are made to these test
statistics, the intricate dependency structures within the data can be safely
disregarded owing to the localized nature of the statistics. This finding bears
significant implications for obtaining critical values. Through extensive
simulations, we demonstrate that our tests exhibit excellent control over size
and reasonable power performance in finite samples, irrespective of strong
cross-sectional and weak serial dependency within the data. Furthermore,
applying our tests to two datasets reveals the existence of significant
nonsmooth effects in both cases.

arXiv link: http://arxiv.org/abs/2312.01162v3

Econometrics arXiv paper, submitted: 2023-12-01

Identification and Inference for Synthetic Controls with Confounding

Authors: Guido W. Imbens, Davide Viviano

This paper studies inference on treatment effects in panel data settings with
unobserved confounding. We model outcome variables through a factor model with
random factors and loadings. Such factors and loadings may act as unobserved
confounders: when the treatment is implemented depends on time-varying factors,
and who receives the treatment depends on unit-level confounders. We study the
identification of treatment effects and illustrate the presence of a trade-off
between time and unit-level confounding. We provide asymptotic results for
inference for several Synthetic Control estimators and show that different
sources of randomness should be considered for inference, depending on the
nature of confounding. We conclude with a comparison of Synthetic Control
estimators with alternatives for factor models.

arXiv link: http://arxiv.org/abs/2312.00955v1

Econometrics arXiv updated paper (originally submitted: 2023-12-01)

Inference on common trends in functional time series

Authors: Morten Ørregaard Nielsen, Won-Ki Seo, Dakyung Seong

We study statistical inference on unit roots and cointegration for time
series in a Hilbert space. We develop statistical inference on the number of
common stochastic trends embedded in the time series, i.e., the dimension of
the nonstationary subspace. We also consider tests of hypotheses on the
nonstationary and stationary subspaces themselves. The Hilbert space can be of
an arbitrarily large dimension, and our methods remain asymptotically valid
even when the time series of interest takes values in a subspace of possibly
unknown dimension. This has wide applicability in practice; for example, to the
case of cointegrated vector time series that are either high-dimensional or of
finite dimension, to high-dimensional factor models that include a finite
number of nonstationary factors, to cointegrated curve-valued (or
function-valued) time series, and to nonstationary dynamic functional factor
models. We include two empirical illustrations to the term structure of
interest rates and labor market indices, respectively.

arXiv link: http://arxiv.org/abs/2312.00590v4

Econometrics arXiv updated paper (originally submitted: 2023-12-01)

GMM-lev estimation and individual heterogeneity: Monte Carlo evidence and empirical applications

Authors: Maria Elena Bontempi, Jan Ditzen

We introduce a new estimator, CRE-GMM, which exploits the correlated random
effects (CRE) approach within the generalised method of moments (GMM),
specifically applied to level equations, GMM-lev. It has the advantage of
estimating the effect of measurable time-invariant covariates using all
available information. This is not possible with GMM-dif, applied to the
equations of each period transformed into first differences, while GMM-sys uses
little information as it adds the equation in levels for only one period. The
GMM-lev, by implying a two-component error term containing both individual
heterogeneity and an idiosyncratic shock, exposes the explanatory variables to
possible double endogeneity. For example, the estimation of actual persistence
could suffer from bias if instruments were correlated with the unit-specific
error component. The CRE-GMM deals with double endogeneity, captures initial
conditions, and enhances inference. Monte Carlo simulations for different
panel types and under different double endogeneity assumptions show the
advantage of our approach. The empirical applications to production and R&D
help clarify the advantages of using CRE-GMM.
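
One common way to write the CRE device in a level equation is sketched below
(assumed notation; the paper's exact specification may differ):

$$
y_{it} = \alpha y_{i,t-1} + x_{it}'\beta + f_i'\delta + \eta_i + \varepsilon_{it},
\qquad \eta_i = \bar{x}_i'\pi + a_i ,
$$

where $f_i$ collects the measurable time-invariant covariates, $\bar{x}_i$ is
the within-unit time average of the time-varying regressors, and the CRE
device absorbs the correlation between the individual heterogeneity and the
regressors so that $\delta$ can be estimated from the level equations.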

arXiv link: http://arxiv.org/abs/2312.00399v2

Econometrics arXiv paper, submitted: 2023-12-01

Stochastic volatility models with skewness selection

Authors: Igor Ferreira Batista Martins, Hedibert Freitas Lopes

This paper expands traditional stochastic volatility models by allowing for
time-varying skewness without imposing it. While dynamic asymmetry may capture
the likely direction of future asset returns, it comes at the risk of leading
to overparameterization. Our proposed approach mitigates this concern by
leveraging sparsity-inducing priors to automatically select the skewness
parameter as dynamic, static, or zero in a data-driven framework. We consider
two empirical applications. First, in a bond yield application, dynamic
skewness captures interest rate cycles of monetary easing and tightening,
which are partially explained by central banks' mandates. Second, in a
currency modeling application, our model indicates no skewness in the carry
factor after accounting for stochastic volatility, which supports the idea
that carry crashes are the result of volatility surges rather than dynamic
skewness.

arXiv link: http://arxiv.org/abs/2312.00282v1

Econometrics arXiv paper, submitted: 2023-11-30

Bootstrap Inference on Partially Linear Binary Choice Model

Authors: Wenzheng Gao, Zhenting Sun

The partially linear binary choice model can be used for estimating
structural equations where nonlinearity may appear due to diminishing marginal
returns, different life cycle regimes, or hectic physical phenomena. The
inference procedure for this model based on the analytic asymptotic
approximation could be unreliable in finite samples if the sample size is not
sufficiently large. This paper proposes a bootstrap inference approach for the
model. Monte Carlo simulations show that the proposed inference method performs
well in finite samples compared to the procedure based on the asymptotic
approximation.

arXiv link: http://arxiv.org/abs/2311.18759v1

Econometrics arXiv paper, submitted: 2023-11-30

Identification in Endogenous Sequential Treatment Regimes

Authors: Pedro Picchetti

This paper develops a novel nonparametric identification method for treatment
effects in settings where individuals self-select into treatment sequences. I
propose an identification strategy which relies on a dynamic version of
standard Instrumental Variables (IV) assumptions and builds on a dynamic
version of the Marginal Treatment Effects (MTE) as the fundamental building
block for treatment effects. The main contribution of the paper is to relax
assumptions on the support of the observed variables and on unobservable gains
of treatment that are present in the dynamic treatment effects literature.
Monte Carlo simulation studies, designed to be close to the empirical
application, illustrate the desirable finite-sample performance of a sieve
estimator for MTEs and Average Treatment Effects (ATEs).

arXiv link: http://arxiv.org/abs/2311.18555v1

Econometrics arXiv paper, submitted: 2023-11-29

Extrapolating Away from the Cutoff in Regression Discontinuity Designs

Authors: Yiwei Sun

Canonical RD designs yield credible local estimates of the treatment effect
at the cutoff under mild continuity assumptions, but they fail to identify
treatment effects away from the cutoff without additional assumptions. The
fundamental challenge of identifying treatment effects away from the cutoff is
that the counterfactual outcome under the alternative treatment status is never
observed. This paper aims to provide a methodological blueprint to identify
treatment effects away from the cutoff in various empirical settings by
offering a non-exhaustive list of assumptions on the counterfactual outcome.
Instead of assuming the exact evolution of the counterfactual outcome, this
paper bounds its variation using the data and sensitivity parameters. The
proposed assumptions are weaker than those introduced previously in the
literature, resulting in partially identified treatment effects that are less
susceptible to assumption violations. This approach accommodates both single
cutoff and multi-cutoff designs. The specific choice of the extrapolation
assumption depends on the institutional background of each empirical
application. Additionally, researchers are recommended to conduct sensitivity
analysis on the chosen parameter and assess resulting shifts in conclusions.
The paper compares the proposed identification results with results using
previous methods via an empirical application and simulated data. It
demonstrates that set identification yields a more credible conclusion about
the sign of the treatment effect.

arXiv link: http://arxiv.org/abs/2311.18136v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-11-29

On the Limits of Regression Adjustment

Authors: Daniel Ting, Kenneth Hung

Regression adjustment, sometimes known as Controlled-experiment Using
Pre-Experiment Data (CUPED), is an important technique in internet
experimentation. It decreases the variance of effect size estimates, often
cutting confidence interval widths in half or more while never making them
worse. It does so by carefully regressing the goal metric against
pre-experiment features to reduce the variance. The tremendous gains of
regression adjustment beg the question: How much better can we do by
engineering better features from pre-experiment data, for example by using
machine learning techniques or synthetic controls? Could we even reduce the
variance in our effect sizes arbitrarily close to zero with the right
predictors? Unfortunately, our answer is negative. A simple form of regression
adjustment, which uses just the pre-experiment values of the goal metric,
captures most of the benefit. Specifically, under a mild assumption that
observations closer in time are easier to predict than ones further away in
time, we upper bound the potential gains of more sophisticated feature
engineering, with respect to the gains of this simple form of regression
adjustment. The maximum reduction in variance is 50% in Theorem 1, or
equivalently, the confidence interval width can be reduced by at most an
additional 29%.
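
The simple form of regression adjustment referred to above, which uses only
the pre-experiment value of the goal metric, can be sketched as follows
(synthetic data; variable names are assumptions):

import numpy as np

rng = np.random.default_rng(0)

# Synthetic A/B test: pre-experiment metric x predicts the in-experiment metric y.
n = 10_000
x = rng.normal(size=n)                        # pre-experiment value of the goal metric
treat = rng.integers(0, 2, size=n)            # random assignment
y = 0.1 * treat + 0.8 * x + rng.normal(size=n)

# CUPED-style adjustment: remove the part of y predicted by x.
theta = np.cov(y, x)[0, 1] / np.var(x, ddof=1)
y_adj = y - theta * (x - x.mean())

def diff_in_means(y, treat):
    """Effect estimate and its independent-samples variance."""
    y1, y0 = y[treat == 1], y[treat == 0]
    return y1.mean() - y0.mean(), y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0)

est_raw, var_raw = diff_in_means(y, treat)
est_adj, var_adj = diff_in_means(y_adj, treat)
print(f"raw:      {est_raw:.3f} (var {var_raw:.2e})")
print(f"adjusted: {est_adj:.3f} (var {var_adj:.2e})")   # variance falls by roughly corr(y, x)^2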

arXiv link: http://arxiv.org/abs/2311.17858v1

Econometrics arXiv updated paper (originally submitted: 2023-11-29)

Identifying Causal Effects of Discrete, Ordered and Continuous Treatments using Multiple Instrumental Variables

Authors: Nadja van 't Hoff

Inferring causal relationships from observational data is often challenging
due to endogeneity. This paper provides new identification results for causal
effects of discrete, ordered and continuous treatments using multiple binary
instruments. The key contribution is the identification of a new causal
parameter that has a straightforward interpretation with a positive weighting
scheme and is applicable in many settings due to a mild monotonicity
assumption. This paper further leverages recent advances in causal machine
learning for both estimation and the detection of local violations of the
underlying monotonicity assumption. The methodology is applied to estimate the
returns to education and assess the impact of having an additional child on
female labor market outcomes.

arXiv link: http://arxiv.org/abs/2311.17575v3

Econometrics arXiv updated paper (originally submitted: 2023-11-28)

Optimal Categorical Instrumental Variables

Authors: Thomas Wiemann

This paper discusses estimation with a categorical instrumental variable in
settings with potentially few observations per category. The proposed
categorical instrumental variable estimator (CIV) leverages a regularization
assumption that implies existence of a latent categorical variable with fixed
finite support achieving the same first stage fit as the observed instrument.
In asymptotic regimes that allow the number of observations per category to
grow at an arbitrarily small polynomial rate with the sample size, I show that when
the cardinality of the support of the optimal instrument is known, CIV is
root-n asymptotically normal, achieves the same asymptotic variance as the
oracle IV estimator that presumes knowledge of the optimal instrument, and is
semiparametrically efficient under homoskedasticity. Under-specifying the
number of support points reduces efficiency but maintains asymptotic normality.
In an application that leverages judge fixed effects as instruments, CIV
compares favorably to commonly used jackknife-based instrumental variable
estimators.

arXiv link: http://arxiv.org/abs/2311.17021v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2023-11-28

On the adaptation of causal forests to manifold data

Authors: Yiyi Huo, Yingying Fan, Fang Han

Researchers often hold the belief that random forests are "the cure to the
world's ills" (Bickel, 2010). But how exactly do they achieve this? Focused on
the recently introduced causal forests (Athey and Imbens, 2016; Wager and
Athey, 2018), this manuscript aims to contribute to an ongoing research trend
towards answering this question, proving that causal forests can adapt to the
unknown covariate manifold structure. In particular, our analysis shows that a
causal forest estimator can achieve the optimal rate of convergence for
estimating the conditional average treatment effect, with the covariate
dimension automatically replaced by the manifold dimension. These findings
align with analogous observations in the realm of deep learning and resonate
with the insights presented in Peter Bickel's 2004 Rietz lecture.

arXiv link: http://arxiv.org/abs/2311.16486v2

Econometrics arXiv updated paper (originally submitted: 2023-11-28)

Inference for Low-rank Models without Estimating the Rank

Authors: Jungjun Choi, Hyukjun Kwon, Yuan Liao

This paper studies the inference about linear functionals of high-dimensional
low-rank matrices. While most existing inference methods would require
consistent estimation of the true rank, our procedure is robust to rank
misspecification, making it a promising approach in applications where rank
estimation can be unreliable. We estimate the low-rank spaces using
pre-specified weighting matrices, known as diversified projections. A novel
statistical insight is that, unlike the usual statistical wisdom that
overfitting mainly introduces additional variances, the over-estimated low-rank
space also gives rise to a non-negligible bias due to an implicit ridge-type
regularization. We develop a new inference procedure and show that the central
limit theorem holds as long as the pre-specified rank is no smaller than the
true rank. In one of our applications, we study multiple testing with
incomplete data in the presence of confounding factors and show that our method
remains valid as long as the number of controlled confounding factors is at
least as large as the true number, even when no confounding factors are
present.

arXiv link: http://arxiv.org/abs/2311.16440v2

Econometrics arXiv updated paper (originally submitted: 2023-11-27)

From Reactive to Proactive Volatility Modeling with Hemisphere Neural Networks

Authors: Philippe Goulet Coulombe, Mikael Frenette, Karin Klieber

We reinvigorate maximum likelihood estimation (MLE) for macroeconomic density
forecasting through a novel neural network architecture with dedicated mean and
variance hemispheres. Our architecture features several key ingredients making
MLE work in this context. First, the hemispheres share a common core at the
entrance of the network, which accommodates various forms of time variation
in the error variance. Second, we introduce a volatility emphasis constraint
that breaks mean/variance indeterminacy in this class of overparametrized
nonlinear models. Third, we conduct a blocked out-of-bag reality check to curb
overfitting in both conditional moments. Fourth, the algorithm utilizes
standard deep learning software and thus handles large data sets - both
computationally and statistically. Ergo, our Hemisphere Neural Network (HNN)
provides proactive volatility forecasts based on leading indicators when it
can, and reactive volatility based on the magnitude of previous prediction
errors when it must. We evaluate point and density forecasts with an extensive
out-of-sample experiment and benchmark against a suite of models ranging from
classics to more modern machine learning-based offerings. In all cases, HNN
fares well by consistently providing accurate mean/variance forecasts for all
targets and horizons. Studying the resulting volatility paths reveals its
versatility, while probabilistic forecasting evaluation metrics showcase its
enviable reliability. Finally, we also demonstrate how this machinery can be
merged with other structured deep learning models by revisiting Goulet Coulombe
(2022)'s Neural Phillips Curve.

arXiv link: http://arxiv.org/abs/2311.16333v2

Econometrics arXiv updated paper (originally submitted: 2023-11-27)

Using Multiple Outcomes to Improve the Synthetic Control Method

Authors: Liyang Sun, Eli Ben-Michael, Avi Feller

When there are multiple outcome series of interest, Synthetic Control
analyses typically proceed by estimating separate weights for each outcome. In
this paper, we instead propose estimating a common set of weights across
outcomes, by balancing either a vector of all outcomes or an index or average
of them. Under a low-rank factor model, we show that these approaches lead to
lower bias bounds than separate weights, and that averaging leads to further
gains when the number of outcomes grows. We illustrate this via a re-analysis
of the impact of the Flint water crisis on educational outcomes.

arXiv link: http://arxiv.org/abs/2311.16260v3

Econometrics arXiv paper, submitted: 2023-11-27

Robust Conditional Wald Inference for Over-Identified IV

Authors: David S. Lee, Justin McCrary, Marcelo J. Moreira, Jack Porter, Luther Yap

For the over-identified linear instrumental variables model, researchers
commonly report the 2SLS estimate along with the robust standard error and seek
to conduct inference with these quantities. If errors are homoskedastic, one
can control the degree of inferential distortion using the first-stage F
critical values from Stock and Yogo (2005), or use the robust-to-weak
instruments Conditional Wald critical values of Moreira (2003). If errors are
non-homoskedastic, these methods do not apply. We derive the generalization of
Conditional Wald critical values that is robust to non-homoskedastic errors
(e.g., heteroskedasticity or clustered variance structures), which can also be
applied to nonlinear weakly-identified models (e.g. weakly-identified GMM).

arXiv link: http://arxiv.org/abs/2311.15952v1

Econometrics arXiv paper, submitted: 2023-11-27

Valid Wald Inference with Many Weak Instruments

Authors: Luther Yap

This paper proposes three novel test procedures that yield valid inference in
an environment with many weak instrumental variables (MWIV). It is observed
that the t statistic of the jackknife instrumental variable estimator (JIVE)
has an asymptotic distribution that is identical to that of the two-stage
least squares (TSLS) t statistic in the just-identified environment. Consequently, test
procedures that were valid for TSLS t are also valid for the JIVE t. Two such
procedures, i.e., VtF and conditional Wald, are adapted directly. By exploiting
a feature of MWIV environments, a third, more powerful, one-sided VtF-based
test procedure can be obtained.

arXiv link: http://arxiv.org/abs/2311.15932v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-11-27

Policy Learning with Distributional Welfare

Authors: Yifan Cui, Sukjin Han

In this paper, we explore optimal treatment allocation policies that target
distributional welfare. Most literature on treatment choice has considered
utilitarian welfare based on the conditional average treatment effect (ATE).
While average welfare is intuitive, it may yield undesirable allocations
especially when individuals are heterogeneous (e.g., with outliers) - the very
reason individualized treatments were introduced in the first place. This
observation motivates us to propose an optimal policy that allocates the
treatment based on the conditional quantile of individual treatment effects
(QoTE). Depending on the choice of the quantile probability, this criterion can
accommodate a policymaker who is either prudent or negligent. The challenge of
identifying the QoTE lies in its requirement for knowledge of the joint
distribution of the counterfactual outcomes, which is not generally
point-identified. We introduce minimax policies that are robust to this model
uncertainty. A range of identifying assumptions can be used to yield more
informative policies. For both stochastic and deterministic policies, we
establish the asymptotic bound on the regret of implementing the proposed
policies. The framework can be generalized to any setting where welfare is
defined as a functional of the joint distribution of the potential outcomes.

arXiv link: http://arxiv.org/abs/2311.15878v4

Econometrics arXiv paper, submitted: 2023-11-27

On Quantile Treatment Effects, Rank Similarity, and Variation of Instrumental Variables

Authors: Sukjin Han, Haiqing Xu

This paper investigates how a certain relationship between observed and
counterfactual distributions serves as an identifying condition for treatment
effects when the treatment is endogenous, and shows that this condition holds
in a range of nonparametric models for treatment effects. To this end, we first
provide a novel characterization of the prevalent assumption restricting
treatment heterogeneity in the literature, namely rank similarity. Our
characterization demonstrates the stringency of this assumption and allows us
to relax it in an economically meaningful way, resulting in our identifying
condition. It also justifies the quest of richer exogenous variations in the
data (e.g., multi-valued or multiple instrumental variables) in exchange for
weaker identifying conditions. The primary goal of this investigation is to
provide empirical researchers with tools that are robust and easy to implement
but still yield tight policy evaluations.

arXiv link: http://arxiv.org/abs/2311.15871v1

Econometrics arXiv paper, submitted: 2023-11-27

(Frisch-Waugh-Lovell)': On the Estimation of Regression Models by Row

Authors: Damian Clarke, Nicolás Paris, Benjamín Villena-Roldán

We demonstrate that regression models can be estimated by working
independently in a row-wise fashion. We document a simple procedure which
allows for a wide class of econometric estimators to be implemented
cumulatively, where, in the limit, estimators can be produced without ever
storing more than a single line of data in a computer's memory. This result is
useful in understanding the mechanics of many common regression models. These
procedures can be used to speed up the computation of estimates computed via
OLS, IV, Ridge regression, LASSO, Elastic Net, and Non-linear models including
probit and logit, with all common modes of inference. This has implications for
estimation and inference with `big data', where memory constraints may imply
that working with all data at once is particularly costly. We additionally show
that even with moderately sized datasets, this method can reduce computation
time compared with traditional estimation routines.
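
For the OLS case, a minimal sketch of the row-wise idea is shown below; only
the running cross-products X'X and X'y are ever held in memory. The file name,
column layout, and helper name are hypothetical:

import csv
import numpy as np

def rowwise_ols(path, k):
    """OLS by accumulating X'X and X'y one data row at a time. Assumes a
    headerless CSV whose first k columns are regressors (including a constant
    if desired) and whose last column is the outcome."""
    xtx = np.zeros((k, k))
    xty = np.zeros(k)
    with open(path, newline="") as f:
        for row in csv.reader(f):
            vals = np.asarray(row, dtype=float)
            x, y = vals[:k], vals[k]
            xtx += np.outer(x, x)        # running sum of x_i x_i'
            xty += x * y                 # running sum of x_i y_i
    return np.linalg.solve(xtx, xty)     # identical to the full-sample OLS estimate

# Hypothetical usage: beta_hat = rowwise_ols("data.csv", k=5)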

arXiv link: http://arxiv.org/abs/2311.15829v1

Econometrics arXiv updated paper (originally submitted: 2023-11-26)

Causal Models for Longitudinal and Panel Data: A Survey

Authors: Dmitry Arkhangelsky, Guido Imbens

In this survey we discuss the recent causal panel data literature. This
recent literature has focused on credibly estimating causal effects of binary
interventions in settings with longitudinal data, emphasizing practical advice
for empirical researchers. It pays particular attention to heterogeneity in the
causal effects, often in situations where few units are treated and with
particular structures on the assignment pattern. The literature has extended
earlier work on difference-in-differences or two-way-fixed-effect estimators.
It has more generally incorporated factor models or interactive fixed effects.
It has also developed novel methods using synthetic control approaches.

arXiv link: http://arxiv.org/abs/2311.15458v3

Econometrics arXiv updated paper (originally submitted: 2023-11-25)

An Identification and Dimensionality Robust Test for Instrumental Variables Models

Authors: Manu Navjeevan

Using modifications of Lindeberg's interpolation technique, I propose a new
identification-robust test for the structural parameter in a heteroskedastic
instrumental variables model. While my analysis allows the number of
instruments to be much larger than the sample size, it does not require many
instruments, making my test applicable in settings that have not been well
studied. Instead, the proposed test statistic has a limiting chi-squared
distribution so long as an auxiliary parameter can be consistently estimated.
This is possible using machine learning methods even when the number of
instruments is much larger than the sample size. To improve power, a simple
combination with the sup-score statistic of Belloni et al. (2012) is proposed.
I point out that first-stage F-statistics calculated on LASSO selected
variables may be misleading indicators of identification strength and
demonstrate favorable performance of my proposed methods in both empirical data
and simulation study.

arXiv link: http://arxiv.org/abs/2311.14892v2

Econometrics arXiv paper, submitted: 2023-11-24

A Review of Cross-Sectional Matrix Exponential Spatial Models

Authors: Ye Yang, Osman Dogan, Suleyman Taspinar, Fei Jin

The matrix exponential spatial models exhibit similarities to the
conventional spatial autoregressive model in spatial econometrics but offer
analytical, computational, and interpretive advantages. This paper provides a
comprehensive review of the literature on the estimation, inference, and model
selection approaches for the cross-sectional matrix exponential spatial models.
We discuss summary measures for the marginal effects of regressors and detail
the matrix-vector product method for efficient estimation. Our aim is not only
to summarize the main findings from the spatial econometric literature but also
to make them more accessible to applied researchers. Additionally, we
contribute to the literature by introducing some new results. We propose an
M-estimation approach for models with heteroskedastic error terms and
demonstrate that the resulting M-estimator is consistent and has an asymptotic
normal distribution. We also consider some new results for model selection
exercises. In a Monte Carlo study, we examine the finite sample properties of
various estimators from the literature alongside the M-estimator.

arXiv link: http://arxiv.org/abs/2311.14813v1

Econometrics arXiv updated paper (originally submitted: 2023-11-23)

Reproducible Aggregation of Sample-Split Statistics

Authors: David M. Ritzwoller, Joseph P. Romano

Statistical inference is often simplified by sample-splitting. This
simplification comes at the cost of the introduction of randomness not native
to the data. We propose a simple procedure for sequentially aggregating
statistics constructed with multiple splits of the same sample. The user
specifies a bound and a nominal error rate. If the procedure is implemented
twice on the same data, the nominal error rate approximates the chance that the
results differ by more than the bound. We illustrate the application of the
procedure to several widely applied econometric methods.

arXiv link: http://arxiv.org/abs/2311.14204v3

Econometrics arXiv updated paper (originally submitted: 2023-11-23)

Measurement Error and Counterfactuals in Quantitative Trade and Spatial Models

Authors: Bas Sanders

Counterfactuals in quantitative trade and spatial models are functions of the
current state of the world and the model parameters. Common practice treats the
current state of the world as perfectly observed, but there is good reason to
believe that it is measured with error. This paper provides tools for
quantifying uncertainty about counterfactuals when the current state of the
world is measured with error. I recommend an empirical Bayes approach to
uncertainty quantification, and show that it is both practical and
theoretically justified. I apply the proposed method to the settings in Adao,
Costinot, and Donaldson (2017) and Allen and Arkolakis (2022) and find
non-trivial uncertainty about counterfactuals.

arXiv link: http://arxiv.org/abs/2311.14032v4

Econometrics arXiv updated paper (originally submitted: 2023-11-23)

Was Javert right to be suspicious? Marginal Treatment Effects with Duration Outcomes

Authors: Santiago Acerenza, Vitor Possebom, Pedro H. C. Sant'Anna

We identify the distributional and quantile marginal treatment effect
functions when the outcome is right-censored. Our method requires a
conditionally exogenous instrument and random censoring. We propose
asymptotically consistent semi-parametric estimators and valid inferential
procedures for the target functions. To illustrate, we evaluate the effect of
alternative sentences (fines and community service vs. no punishment) on
recidivism in Brazil. Our results highlight substantial treatment effect
heterogeneity: we find that people whom most judges would punish take longer to
recidivate, while people who would be punished only by strict judges recidivate
at an earlier date than if they were not punished.

arXiv link: http://arxiv.org/abs/2311.13969v5

Econometrics arXiv updated paper (originally submitted: 2023-11-22)

Large-Sample Properties of the Synthetic Control Method under Selection on Unobservables

Authors: Dmitry Arkhangelsky, David Hirshberg

We analyze the synthetic control (SC) method in panel data settings with many
units. We assume the treatment assignment is based on unobserved heterogeneity
and pre-treatment information, allowing for both strictly and sequentially
exogenous assignment processes. We show that the critical property that
determines the behavior of the SC method is the ability of input features to
approximate the unobserved heterogeneity. Our results imply that the SC method
delivers asymptotically normal estimators for a large class of linear panel
data models as long as the number of pre-treatment periods is sufficiently
large, making it a natural alternative to Difference-in-Differences.

arXiv link: http://arxiv.org/abs/2311.13575v2

Econometrics arXiv updated paper (originally submitted: 2023-11-22)

Regressions under Adverse Conditions

Authors: Timo Dimitriadis, Yannick Hoga

We introduce a new regression method that relates the mean of an outcome
variable to covariates, under the "adverse condition" that a distress variable
falls in its tail. This allows classical mean regressions to be tailored to
adverse scenarios, which receive increasing interest in economics and finance, among
many others. In the terminology of the systemic risk literature, our method can
be interpreted as a regression for the Marginal Expected Shortfall. We propose
a two-step procedure to estimate the new models, show consistency and
asymptotic normality of the estimator, and propose feasible inference under
weak conditions that allow for cross-sectional and time series applications.
Simulations verify the accuracy of the asymptotic approximations of the
two-step estimator. Two empirical applications show that our regressions under
adverse conditions are a valuable tool in such diverse fields as the study of
the relation between systemic risk and asset price bubbles, and dissecting
macroeconomic growth vulnerabilities into individual components.

arXiv link: http://arxiv.org/abs/2311.13327v3

Econometrics arXiv paper, submitted: 2023-11-21

Predictive Density Combination Using a Tree-Based Synthesis Function

Authors: Tony Chernis, Niko Hauzenberger, Florian Huber, Gary Koop, James Mitchell

Bayesian predictive synthesis (BPS) provides a method for combining multiple
predictive distributions based on agent/expert opinion analysis theory and
encompasses a range of existing density forecast pooling methods. The key
ingredient in BPS is a “synthesis” function. This is typically specified
parametrically as a dynamic linear regression. In this paper, we develop a
nonparametric treatment of the synthesis function using regression trees. We
show the advantages of our tree-based approach in two macroeconomic forecasting
applications. The first uses density forecasts for GDP growth from the euro
area's Survey of Professional Forecasters. The second combines density
forecasts of US inflation produced by many regression models involving
different predictors. Both applications demonstrate the benefits -- in terms of
improved forecast accuracy and interpretability -- of modeling the synthesis
function nonparametrically.

arXiv link: http://arxiv.org/abs/2311.12671v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2023-11-21

Learning Causal Representations from General Environments: Identifiability and Intrinsic Ambiguity

Authors: Jikai Jin, Vasilis Syrgkanis

We study causal representation learning, the task of recovering high-level
latent variables and their causal relationships in the form of a causal graph
from low-level observed data (such as text and images), assuming access to
observations generated from multiple environments. Prior results on the
identifiability of causal representations typically assume access to
single-node interventions which is rather unrealistic in practice, since the
latent variables are unknown in the first place. In this work, we provide the
first identifiability results based on data that stem from general
environments. We show that for linear causal models, while the causal graph can
be fully recovered, the latent variables are only identified up to the
surrounded-node ambiguity (SNA) of Varici et al. (2023). We provide a
counterpart of our guarantee, showing that SNA is basically unavoidable in our
setting. We also propose an algorithm, LiNGCReL, which provably recovers the
ground-truth model up to SNA, and we demonstrate its effectiveness
via numerical experiments. Finally, we consider general non-parametric causal
models and show that the same identification barrier holds when assuming access
to groups of soft single-node interventions.

arXiv link: http://arxiv.org/abs/2311.12267v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-11-20

Adaptive Bayesian Learning with Action and State-Dependent Signal Variance

Authors: Kaiwen Hou

This manuscript presents an advanced framework for Bayesian learning by
incorporating action and state-dependent signal variances into decision-making
models. This framework is pivotal in understanding complex data-feedback loops
and decision-making processes in various economic systems. Through a series of
examples, we demonstrate the versatility of this approach in different
contexts, ranging from simple Bayesian updating in stable environments to
complex models involving social learning and state-dependent uncertainties. The
paper uniquely contributes to the understanding of the nuanced interplay
between data, actions, outcomes, and the inherent uncertainty in economic
models.

arXiv link: http://arxiv.org/abs/2311.12878v2

Econometrics arXiv updated paper (originally submitted: 2023-11-20)

Theory coherent shrinkage of Time-Varying Parameters in VARs

Authors: Andrea Renzetti

This paper introduces a novel theory-coherent shrinkage prior for
Time-Varying Parameter VARs (TVP-VARs). The prior centers the time-varying
parameters on a path implied a priori by an underlying economic theory, chosen
to describe the dynamics of the macroeconomic variables in the system.
Leveraging information from conventional economic theory using this prior
significantly improves inference precision and forecast accuracy compared to
the standard TVP-VAR. In an application, I use this prior to incorporate
information from a New Keynesian model that includes both the Zero Lower Bound
(ZLB) and forward guidance into a medium-scale TVP-VAR model. This approach
leads to more precise estimates of the impulse response functions, revealing a
distinct propagation of risk premium shocks inside and outside the ZLB in US
data.

arXiv link: http://arxiv.org/abs/2311.11858v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-11-20

Modeling economies of scope in joint production: Convex regression of input distance function

Authors: Timo Kuosmanen, Sheng Dai

Modeling of joint production has proved a vexing problem. This paper develops
a radial convex nonparametric least squares (CNLS) approach to estimate the
input distance function with multiple outputs. We document the correct input
distance function transformation and prove that the necessary orthogonality
conditions can be satisfied in radial CNLS. A Monte Carlo study is performed to
compare the finite sample performance of radial CNLS and other deterministic
and stochastic frontier approaches in terms of the input distance function
estimation. We apply our novel approach to the Finnish electricity distribution
network regulation and empirically confirm that the input isoquants become more
curved. In addition, we introduce the weight restriction to radial CNLS to
mitigate the potential overfitting and increase the out-of-sample performance
in energy regulation.

arXiv link: http://arxiv.org/abs/2311.11637v1

Econometrics arXiv cross-link from q-fin.GN (q-fin.GN), submitted: 2023-11-17

High-Throughput Asset Pricing

Authors: Andrew Y. Chen, Chukwuma Dim

We apply empirical Bayes (EB) to mine data on 136,000 long-short strategies
constructed from accounting ratios, past returns, and ticker symbols. This
“high-throughput asset pricing” matches the out-of-sample performance of top
journals while eliminating look-ahead bias. Naively mining for the largest
Sharpe ratios leads to similar performance, consistent with our theoretical
results, though EB uniquely provides unbiased predictions with transparent
intuition. Predictability is concentrated in accounting strategies, small
stocks, and pre-2004 periods, consistent with limited attention theories.
Multiple testing methods popular in finance fail to identify most out-of-sample
performers. High-throughput methods provide a rigorous, unbiased framework for
understanding asset prices.
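
To convey the empirical Bayes shrinkage at the heart of this exercise, the
following minimal Python sketch shrinks noisy in-sample strategy means toward
the cross-sectional grand mean under a simple normal-normal model; the function
name, the homoskedastic standard errors, and the simulated inputs are
placeholders rather than the authors' implementation.

    import numpy as np

    def eb_shrink(in_sample_means, se):
        """Shrink noisy in-sample strategy means toward the grand mean
        (posterior mean under a normal-normal empirical Bayes model)."""
        grand_mean = in_sample_means.mean()
        # cross-sectional variance = signal variance + average noise variance
        noise_var = np.mean(se ** 2)
        signal_var = max(in_sample_means.var() - noise_var, 0.0)
        shrinkage = signal_var / (signal_var + se ** 2)
        return grand_mean + shrinkage * (in_sample_means - grand_mean)

    rng = np.random.default_rng(0)
    true_mu = rng.normal(0.0, 0.02, size=1000)   # latent strategy mean returns
    se = np.full(1000, 0.05)                     # sampling noise per strategy
    estimates = true_mu + rng.normal(0.0, se)    # observed in-sample means
    print(eb_shrink(estimates, se)[:5])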

arXiv link: http://arxiv.org/abs/2311.10685v3

Econometrics arXiv updated paper (originally submitted: 2023-11-16)

Inference in Auctions with Many Bidders Using Transaction Prices

Authors: Federico A. Bugni, Yulong Wang

This paper studies inference in first- and second-price sealed-bid auctions
with many bidders, using an asymptotic framework where the number of bidders
increases while the number of auctions remains fixed. Relevant applications
include online, treasury, spectrum, and art auctions. Our approach enables
asymptotically exact inference on key features such as the winner's expected
utility, the seller's expected revenue, and the tail of the valuation
distribution using only transaction price data. Our simulations demonstrate the
accuracy of the methods in finite samples. We apply our methods to Hong Kong
vehicle license auctions, focusing on high-priced, single-letter plates.

arXiv link: http://arxiv.org/abs/2311.09972v3

Econometrics arXiv paper, submitted: 2023-11-15

Estimating Functionals of the Joint Distribution of Potential Outcomes with Optimal Transport

Authors: Daniel Ober-Reynolds

Many causal parameters depend on a moment of the joint distribution of
potential outcomes. Such parameters are especially relevant in policy
evaluation settings, where noncompliance is common and accommodated through the
model of Imbens & Angrist (1994). This paper shows that the sharp identified
set for these parameters is an interval with endpoints characterized by the
value of optimal transport problems. Sample analogue estimators are proposed
based on the dual problem of optimal transport. These estimators are root-n
consistent and converge in distribution under mild assumptions. Inference
procedures based on the bootstrap are straightforward and computationally
convenient. The ideas and estimators are demonstrated in an application
revisiting the National Supported Work Demonstration job training program. I
find suggestive evidence that workers who would see below average earnings
without treatment tend to see above average benefits from treatment.
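
The bound computation can be illustrated with discretized marginals and the POT
library's exact solver; the grids, the uniform marginals, and the choice of the
moment E[Y(1)Y(0)] below are illustrative assumptions, not the paper's
estimators, which work with the dual problem and sample analogues.

    import numpy as np
    import ot  # POT: Python Optimal Transport

    # Hypothetical discretized marginals of the two potential outcomes
    y1 = np.linspace(0.0, 10.0, 50)       # support of Y(1)
    y0 = np.linspace(0.0, 10.0, 50)       # support of Y(0)
    p1 = np.full(50, 1.0 / 50)            # marginal probabilities of Y(1)
    p0 = np.full(50, 1.0 / 50)            # marginal probabilities of Y(0)

    # Target functional: the moment E[Y(1) * Y(0)] of the joint distribution
    cost = np.outer(y1, y0)

    # Sharp bounds = values of two optimal transport problems over all couplings
    lower = ot.emd2(p1, p0, cost)                              # minimum of E[c]
    upper = cost.max() - ot.emd2(p1, p0, cost.max() - cost)    # maximum of E[c]
    print(lower, upper)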

arXiv link: http://arxiv.org/abs/2311.09435v1

Econometrics arXiv paper, submitted: 2023-11-15

Incorporating Preferences Into Treatment Assignment Problems

Authors: Daido Kido

This study investigates the problem of individualizing treatment allocations
using stated preferences for treatments. If individuals know in advance how the
assignment will be individualized based on their stated preferences, they may
state false preferences. We derive an individualized treatment rule (ITR) that
maximizes welfare when individuals strategically state their preferences. We
also show that the optimal ITR is strategy-proof, that is, individuals do not
have a strong incentive to lie even if they know the optimal ITR a priori.
Constructing the optimal ITR requires information on the distribution of true
preferences and the average treatment effect conditioned on true preferences.
In practice, the information must be identified and estimated from the data. As
true preferences are hidden information, the identification is not
straightforward. We discuss two experimental designs that allow the
identification: strictly strategy-proof randomized controlled trials and doubly
randomized preference trials. Assuming that the data come from one of
these experiments, we develop data-dependent procedures for determining the ITR,
that is, statistical treatment rules (STRs). The maximum regret of the proposed
STRs converges to zero at a rate of the square root of the sample size. An
empirical application demonstrates our proposed STRs.

arXiv link: http://arxiv.org/abs/2311.08963v1

Econometrics arXiv paper, submitted: 2023-11-15

Locally Asymptotically Minimax Statistical Treatment Rules Under Partial Identification

Authors: Daido Kido

Policymakers often desire a statistical treatment rule (STR) that determines
a treatment assignment rule deployed in a future population from available
data. With the true knowledge of the data generating process, the average
treatment effect (ATE) is the key quantity characterizing the optimal treatment
rule. Unfortunately, the ATE is often not point identified but partially
identified. Presuming the partial identification of the ATE, this study
conducts a local asymptotic analysis and develops the locally asymptotically
minimax (LAM) STR. The analysis does not assume the full differentiability but
the directional differentiability of the boundary functions of the
identification region of the ATE. Accordingly, the study shows that the LAM STR
differs from the plug-in STR. A simulation study also demonstrates that the LAM
STR outperforms the plug-in STR.

arXiv link: http://arxiv.org/abs/2311.08958v1

Econometrics arXiv updated paper (originally submitted: 2023-11-14)

Estimating Conditional Value-at-Risk with Nonstationary Quantile Predictive Regression Models

Authors: Christis Katsouris

This paper develops an asymptotic distribution theory for an endogenous
instrumentation approach in quantile predictive regressions when both generated
covariates and persistent predictors are used. The generated covariates are
obtained from an auxiliary quantile predictive regression model and the
statistical problem of interest is the robust estimation and inference of the
parameters that correspond to the primary quantile predictive regression in
which this generated covariate is added to the set of nonstationary regressors.
We find that the proposed doubly IVX-corrected estimator is robust to the
degree of persistence of the predictors, regardless of the presence of a generated
regressor obtained from the first-stage procedure. The asymptotic properties of
the two-stage IVX estimator such as mixed Gaussianity are established while the
asymptotic covariance matrix is adjusted to account for the first-step
estimation error.

arXiv link: http://arxiv.org/abs/2311.08218v6

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2023-11-13

Optimal Estimation of Large-Dimensional Nonlinear Factor Models

Authors: Yingjie Feng

This paper studies optimal estimation of large-dimensional nonlinear factor
models. The key challenge is that the observed variables are possibly nonlinear
functions of some latent variables where the functional forms are left
unspecified. A local principal component analysis method is proposed to
estimate the factor structure and recover information on latent variables and
latent functions, which combines $K$-nearest neighbors matching and principal
component analysis. Large-sample properties are established, including a sharp
bound on the matching discrepancy of nearest neighbors, sup-norm error bounds
for estimated local factors and factor loadings, and the uniform convergence
rate of the factor structure estimator. Under mild conditions our estimator of
the latent factor structure can achieve the optimal rate of uniform convergence
for nonparametric regression. The method is illustrated with a Monte Carlo
experiment and an empirical application studying the effect of tax cuts on
economic growth.
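
A stripped-down version of the local principal component idea, combining
k-nearest-neighbour matching with a local PCA around each unit, is sketched
below; the neighbourhood size, the number of local factors, and the simulated
data are arbitrary choices rather than the paper's tuning rules.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def local_pca(X, k=50, r=1):
        """For each unit, fit a rank-r PCA on its k nearest neighbours and
        return the local low-rank approximation evaluated at that unit."""
        n, p = X.shape
        nn = NearestNeighbors(n_neighbors=k).fit(X)
        _, idx = nn.kneighbors(X)             # each unit's own row comes first
        fitted = np.empty_like(X)
        for i in range(n):
            local = X[idx[i]]                 # neighbourhood of unit i
            mu = local.mean(axis=0)
            U, s, Vt = np.linalg.svd(local - mu, full_matrices=False)
            fitted[i] = mu + (U[0, :r] * s[:r]) @ Vt[:r]
        return fitted

    X = np.random.default_rng(1).normal(size=(300, 10))
    print(local_pca(X)[:2])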

arXiv link: http://arxiv.org/abs/2311.07243v1

Econometrics arXiv updated paper (originally submitted: 2023-11-13)

High Dimensional Binary Choice Model with Unknown Heteroskedasticity or Instrumental Variables

Authors: Fu Ouyang, Thomas Tao Yang

This paper proposes a new method for estimating high-dimensional binary
choice models. We consider a semiparametric model that places no distributional
assumptions on the error term, allows for heteroskedastic errors, and permits
endogenous regressors. Our approaches extend the special regressor estimator
originally proposed by Lewbel (2000). This estimator becomes impractical in
high-dimensional settings due to the curse of dimensionality associated with
high-dimensional conditional density estimation. To overcome this challenge, we
introduce an innovative data-driven dimension reduction method for
nonparametric kernel estimators, which constitutes the main contribution of
this work. The method combines distance covariance-based screening with
cross-validation (CV) procedures, making special regressor estimation feasible
in high dimensions. Using this new feasible conditional density estimator, we
address variable and moment (instrumental variable) selection problems for
these models. We apply penalized least squares (LS) and generalized method of
moments (GMM) estimators with an L1 penalty. A comprehensive analysis of the
oracle and asymptotic properties of these estimators is provided. Finally,
through Monte Carlo simulations and an empirical study on the migration
intentions of rural Chinese residents, we demonstrate the effectiveness of our
proposed methods in finite sample settings.

arXiv link: http://arxiv.org/abs/2311.07067v2

Econometrics arXiv updated paper (originally submitted: 2023-11-12)

Design-based Estimation Theory for Complex Experiments

Authors: Haoge Chang

This paper considers the estimation of treatment effects in randomized
experiments with complex experimental designs, including cases with
interference between units. We develop a design-based estimation theory for
arbitrary experimental designs. Our theory facilitates the analysis of many
design-estimator pairs that researchers commonly employ in practice and provides
procedures to consistently estimate asymptotic variance bounds. We propose new
classes of estimators with favorable asymptotic properties from a design-based
point of view. In addition, we propose a scalar measure of experimental
complexity which can be linked to the design-based variance of the estimators.
We demonstrate the performance of our estimators using simulated datasets based
on an actual network experiment studying the effect of social networks on
insurance adoptions.

arXiv link: http://arxiv.org/abs/2311.06891v2

Econometrics arXiv updated paper (originally submitted: 2023-11-12)

Quasi-Bayes in Latent Variable Models

Authors: Sid Kankanala

Latent variable models are widely used to account for unobserved determinants
of economic behavior. This paper introduces a quasi-Bayes approach to
nonparametrically estimate a large class of latent variable models. As an
application, we model U.S. individual log earnings from the Panel Study of
Income Dynamics (PSID) as the sum of latent permanent and transitory
components. Simulations illustrate the favorable performance of quasi-Bayes
estimators relative to common alternatives.

arXiv link: http://arxiv.org/abs/2311.06831v3

Econometrics arXiv cross-link from General Economics (econ.GN), submitted: 2023-11-10

How Much Inflation Is Too Much? A Classification of Inflationary Regimes

Authors: Manuel de Mier, Fernando Delbianco

The classifications of inflationary regimes proposed in the literature have
mostly been based on arbitrary characterizations, subject to value judgments by
researchers. The objective of this study is to propose a new methodological
approach that reduces subjectivity and improves accuracy in the construction of
such regimes. The method is built upon a combination of clustering techniques
and classification trees, which allows for an historical periodization of
Argentina's inflationary history for the period 1943-2022. Additionally, two
procedures are introduced to smooth out the classification over time: a measure
of temporal contiguity of observations and a rolling method based on the simple
majority rule. The obtained regimes are compared against the existing
literature on the inflation-relative price variability relationship, revealing
a better performance of the proposed regimes.
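
A schematic Python version of the two-step construction (cluster the
observations, characterize the clusters with a tree, then smooth the labels
with a rolling majority vote) is given below; the simulated series, the
features, the number of regimes, and the 12-month window are placeholders, not
the paper's specification.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    infl = np.abs(rng.normal(0.02, 0.05, size=480))   # monthly inflation (toy)

    # Step 1: features per observation (level and recent volatility of inflation)
    vol = np.array([infl[max(0, t - 11): t + 1].std() for t in range(len(infl))])
    X = np.column_stack([infl, vol])

    # Step 2: cluster observations into candidate regimes, then fit a tree so
    # the regimes admit an interpretable threshold-based characterization
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
    tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, labels)
    regime = tree.predict(X)

    # Step 3: smooth the regime sequence over time with a rolling majority vote
    window = 12
    smoothed = np.array([
        np.bincount(regime[max(0, t - window + 1): t + 1]).argmax()
        for t in range(len(regime))
    ])
    print(smoothed[:24])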

arXiv link: http://arxiv.org/abs/2401.02428v1

Econometrics arXiv updated paper (originally submitted: 2023-11-10)

Time-Varying Identification of Monetary Policy Shocks

Authors: Annika Camehl, Tomasz Woźniak

We propose a new Bayesian heteroskedastic Markov-switching structural vector
autoregression with data-driven time-varying identification. The model selects
alternative exclusion restrictions over time and, as a condition for the
search, allows identification to be verified through heteroskedasticity within each
regime. Based on four alternative monetary policy rules, we show that a monthly
six-variable system supports time variation in US monetary policy shock
identification. In the sample-dominating first regime, systematic monetary
policy follows a Taylor rule extended by the term spread, effectively curbing
inflation. In the second regime, occurring after 2000 and gaining more
persistence after the global financial and COVID crises, it is characterized by
a money-augmented Taylor rule. This regime's unconventional monetary policy
provides economic stimulus, features the liquidity effect, and is complemented
by a pure term spread shock. Absent the specific monetary policy of the second
regime, inflation would be over one percentage point higher on average after
2008.

arXiv link: http://arxiv.org/abs/2311.05883v4

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-11-10

Business Policy Experiments using Fractional Factorial Designs: Consumer Retention on DoorDash

Authors: Yixin Tang, Yicong Lin, Navdeep S. Sahni

This paper investigates an approach to both speed up business decision-making
and lower the cost of learning through experimentation by factorizing business
policies and employing fractional factorial experimental designs for their
evaluation. We illustrate how this method integrates with advances in the
estimation of heterogeneous treatment effects, elaborating on its advantages
and foundational assumptions. We empirically demonstrate the implementation and
benefits of our approach and assess its validity in evaluating consumer
promotion policies at DoorDash, which is one of the largest delivery platforms
in the US. Our approach discovers a policy with 5% incremental profit at 67%
lower implementation cost.
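
The design idea itself, evaluating several binary policy levers jointly while
aliasing a main effect only with a high-order interaction, can be illustrated
with a hand-built 2^(4-1) half fraction; this is a generic textbook
construction, not DoorDash's actual design.

    import numpy as np
    from itertools import product

    # Full factorial in three two-level factors A, B, C (coded -1/+1)
    base = np.array(list(product([-1, 1], repeat=3)))

    # Half-fraction: generate the fourth factor from the defining relation
    # D = ABC, so the main effect of D is aliased only with the ABC interaction
    D = base.prod(axis=1, keepdims=True)
    design = np.hstack([base, D])        # 8 runs instead of 16 for 4 factors

    print(design)
    # Main effects of A, B, C, D remain estimable because every pair of
    # columns is orthogonal:
    print(design.T @ design)             # 8 * identity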

arXiv link: http://arxiv.org/abs/2311.14698v2

Econometrics arXiv paper, submitted: 2023-11-07

Debiased Fixed Effects Estimation of Binary Logit Models with Three-Dimensional Panel Data

Authors: Amrei Stammann

Naive maximum likelihood estimation of binary logit models with fixed effects
leads to unreliable inference due to the incidental parameter problem. We study
the case of three-dimensional panel data, where the model includes three sets
of additive and overlapping unobserved effects. This encompasses models for
network panel data, where senders and receivers maintain bilateral
relationships over time, and fixed effects account for unobserved heterogeneity
at the sender-time, receiver-time, and sender-receiver levels. In an asymptotic
framework, where all three panel dimensions grow large at constant relative
rates, we characterize the leading bias of the naive estimator. The inference
problem we identify is particularly severe, as it is not possible to balance
the order of the bias and the standard deviation. As a consequence, the naive
estimator has a degenerating asymptotic distribution, which exacerbates the
inference problem relative to other fixed effects estimators studied in the
literature. To resolve the inference problem, we derive explicit expressions to
debias the fixed effects estimator.

arXiv link: http://arxiv.org/abs/2311.04073v1

Econometrics arXiv updated paper (originally submitted: 2023-11-06)

Optimal Estimation Methodologies for Panel Data Regression Models

Authors: Christis Katsouris

This survey study discusses main aspects to optimal estimation methodologies
for panel data regression models. In particular, we present current
methodological developments for modeling stationary panel data as well as
robust methods for estimation and inference in nonstationary panel data
regression models. Some applications from the network econometrics and high
dimensional statistics literature are also discussed within a stationary time
series environment.

arXiv link: http://arxiv.org/abs/2311.03471v3

Econometrics arXiv updated paper (originally submitted: 2023-11-05)

Estimation and Inference for a Class of Generalized Hierarchical Models

Authors: Chaohua Dong, Jiti Gao, Bin Peng, Yayi Yan

In this paper, we consider estimation and inference for the unknown
parameters and function involved in a class of generalized hierarchical models.
Such models are of great interest in the literature of neural networks (such as
Bauer and Kohler, 2019). We propose a rectified linear unit (ReLU) based deep
neural network (DNN) approach, and contribute to the design of DNNs by i)
providing more transparency for practical implementation, ii) defining
different types of sparsity, iii) showing the differentiability, iv) pointing
out the set of effective parameters, and v) offering a new variant of the
rectified linear activation function. Asymptotic properties are established
accordingly, and a feasible procedure for the purpose of inference is also
proposed. We conduct extensive numerical studies to examine the finite-sample
performance of the estimation methods, and we also evaluate the empirical
relevance and applicability of the proposed models and estimation methods to
real data.

arXiv link: http://arxiv.org/abs/2311.02789v5

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-11-04

Individualized Policy Evaluation and Learning under Clustered Network Interference

Authors: Yi Zhang, Kosuke Imai

Although there is now a large literature on policy evaluation and learning,
much of the prior work assumes that the treatment assignment of one unit does
not affect the outcome of another unit. Unfortunately, ignoring interference
can lead to biased policy evaluation and ineffective learned policies. For
example, treating influential individuals who have many friends can generate
positive spillover effects, thereby improving the overall performance of an
individualized treatment rule (ITR). We consider the problem of evaluating and
learning an optimal ITR under clustered network interference (also known as
partial interference), where clusters of units are sampled from a population
and units may influence one another within each cluster. Unlike previous
methods that impose strong restrictions on spillover effects, such as anonymous
interference, the proposed methodology only assumes a semiparametric structural
model, where each unit's outcome is an additive function of individual
treatments within the cluster. Under this model, we propose an estimator that
can be used to evaluate the empirical performance of an ITR. We show that this
estimator is substantially more efficient than the standard inverse probability
weighting estimator, which does not impose any assumption about spillover
effects. We derive the finite-sample regret bound for a learned ITR, showing
that the use of our efficient evaluation estimator leads to the improved
performance of learned policies. We consider both experimental and
observational studies, and for the latter, we develop a doubly robust estimator
that is semiparametrically efficient and yields an optimal regret bound.
Finally, we conduct simulation and empirical studies to illustrate the
advantages of the proposed methodology.

arXiv link: http://arxiv.org/abs/2311.02467v3

Econometrics arXiv updated paper (originally submitted: 2023-11-04)

The Fragility of Sparsity

Authors: Michal Kolesár, Ulrich K. Müller, Sebastian T. Roelsgaard

We show, using three empirical applications, that linear regression estimates
which rely on the assumption of sparsity are fragile in two ways. First, we
document that different choices of the regressor matrix that do not impact
ordinary least squares (OLS) estimates, such as the choice of baseline category
with categorical controls, can move sparsity-based estimates by two standard
errors or more. Second, we develop two tests of the sparsity assumption based
on comparing sparsity-based estimators with OLS. The tests tend to reject the
sparsity assumption in all three applications. Unless the number of regressors
is comparable to or exceeds the sample size, OLS yields more robust inference
at little efficiency cost.

arXiv link: http://arxiv.org/abs/2311.02299v4

Econometrics arXiv paper, submitted: 2023-11-03

Pooled Bewley Estimator of Long Run Relationships in Dynamic Heterogenous Panels

Authors: Alexander Chudik, M. Hashem Pesaran, Ron P. Smith

Using a transformation of the autoregressive distributed lag model due to
Bewley, a novel pooled Bewley (PB) estimator of long-run coefficients for
dynamic panels with heterogeneous short-run dynamics is proposed. The PB
estimator is directly comparable to the widely used Pooled Mean Group (PMG)
estimator, and is shown to be consistent and asymptotically normal. Monte Carlo
simulations show good small sample performance of PB compared to the existing
estimators in the literature, namely PMG, panel dynamic OLS (PDOLS), and panel
fully-modified OLS (FMOLS). Application of two bias-correction methods and a
bootstrapping of critical values to conduct inference robust to cross-sectional
dependence of errors are also considered. The utility of the PB estimator is
illustrated in an empirical application to the aggregate consumption function.

arXiv link: http://arxiv.org/abs/2311.02196v1

Econometrics arXiv updated paper (originally submitted: 2023-11-02)

The learning effects of subsidies to bundled goods: a semiparametric approach

Authors: Luis Alvarez, Ciro Biderman

Can temporary subsidies to bundles induce long-run changes in demand due to
learning about the quality of one of the constituent goods? This paper provides
theoretical support and empirical evidence on this mechanism. Theoretically, we
introduce a model where an agent learns about the quality of an innovation
through repeated consumption. We then assess the predictions of our theory in a
randomised experiment in a ridesharing platform. The experiment subsidised car
trips integrating with a train or metro station, which we interpret as a
bundle. Given the heavy-tailed nature of our data, we propose a semiparametric
specification for treatment effects that enables the construction of more
efficient estimators. We then introduce an efficient estimator for our
specification by relying on L-moments. Our results indicate that a ten-weekday
50% discount on integrated trips leads to a large contemporaneous increase in
the demand for integration, and, consistent with our model, persistent changes
in the mean and dispersion of nonintegrated app rides. These effects last for
over four months. A calibration of our theoretical model suggests that around
40% of the contemporaneous increase in integrated rides may be attributable to
increased incentives to learning. Our results have nontrivial policy
implications for the design of public transit systems.

arXiv link: http://arxiv.org/abs/2311.01217v4

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2023-11-02

Data-driven fixed-point tuning for truncated realized variations

Authors: B. Cooper Boniece, José E. Figueroa-López, Yuchen Han

Many methods for estimating integrated volatility and related functionals of
semimartingales in the presence of jumps require specification of tuning
parameters for their use in practice. In much of the available theory, tuning
parameters are assumed to be deterministic and their values are specified only
up to asymptotic constraints. However, in empirical work and in simulation
studies, they are typically chosen to be random and data-dependent, with
explicit choices often relying entirely on heuristics. In this paper, we
consider novel data-driven tuning procedures for the truncated realized
variations of a semimartingale with jumps based on a type of random fixed-point
iteration. Being effectively automated, our approach alleviates the need for
delicate decision-making regarding tuning parameters in practice and can be
implemented using information regarding sampling frequency alone. We
demonstrate our methods can lead to asymptotically efficient estimation of
integrated volatility and exhibit superior finite-sample performance compared
to popular alternatives in the literature.
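
The flavour of the fixed-point tuning can be conveyed by iterating between the
truncated realized variance and the threshold it implies; the constant, the
exponent, and the simulated jump-diffusion returns below are illustrative
choices, not the paper's recommended specification.

    import numpy as np

    def truncated_rv(returns, threshold):
        """Truncated realized variance: sum of squared returns whose absolute
        value does not exceed the threshold."""
        return np.sum(returns[np.abs(returns) <= threshold] ** 2)

    def fixed_point_threshold(returns, dt, c=3.0, omega=0.49, n_iter=20):
        """Iterate u -> c * sqrt(TRV(u)) * dt**omega, starting from no truncation."""
        u = np.inf
        for _ in range(n_iter):
            u = c * np.sqrt(truncated_rv(returns, u)) * dt ** omega
        return u

    rng = np.random.default_rng(10)
    n = 23400                                   # one-second returns over a day
    dt = 1.0 / n
    r = 0.2 * np.sqrt(dt) * rng.normal(size=n)  # diffusive component
    jumps = rng.random(n) < 1e-3                # rare jump times
    r[jumps] += 0.01 * rng.choice([-1.0, 1.0], size=jumps.sum())

    u = fixed_point_threshold(r, dt)
    print(u, truncated_rv(r, u))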

arXiv link: http://arxiv.org/abs/2311.00905v3

Econometrics arXiv updated paper (originally submitted: 2023-11-01)

On Gaussian Process Priors in Conditional Moment Restriction Models

Authors: Sid Kankanala

This paper studies quasi-Bayesian estimation and uncertainty quantification
for an unknown function that is identified by a nonparametric conditional
moment restriction. We derive contraction rates for a class of Gaussian process
priors. Furthermore, we provide conditions under which a Bernstein-von Mises
theorem holds for the quasi-posterior distribution. As a consequence, we show
that optimally weighted quasi-Bayes credible sets have exact asymptotic
frequentist coverage.

arXiv link: http://arxiv.org/abs/2311.00662v2

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2023-11-01

Personalized Assignment to One of Many Treatment Arms via Regularized and Clustered Joint Assignment Forests

Authors: Rahul Ladhania, Jann Spiess, Lyle Ungar, Wenbo Wu

We consider learning personalized assignments to one of many treatment arms
from a randomized controlled trial. Standard methods that estimate
heterogeneous treatment effects separately for each arm may perform poorly in
this case due to excess variance. We instead propose methods that pool
information across treatment arms: First, we consider a regularized
forest-based assignment algorithm based on greedy recursive partitioning that
shrinks effect estimates across arms. Second, we augment our algorithm by a
clustering scheme that combines treatment arms with consistently similar
outcomes. In a simulation study, we compare the performance of these approaches
to predicting arm-wise outcomes separately, and document gains of directly
optimizing the treatment assignment with regularization and clustering. In a
theoretical model, we illustrate how a high number of treatment arms makes
finding the best arm hard, while we can achieve sizable utility gains from
personalization by regularized optimization.

arXiv link: http://arxiv.org/abs/2311.00577v1

Econometrics arXiv updated paper (originally submitted: 2023-11-01)

Robustify and Tighten the Lee Bounds: A Sample Selection Model under Stochastic Monotonicity and Symmetry Assumptions

Authors: Yuta Okamoto

In the presence of sample selection, Lee's (2009) nonparametric bounds are a
popular tool for estimating a treatment effect. However, the Lee bounds rely on
the monotonicity assumption, whose empirical validity is sometimes unclear.
Furthermore, the bounds are often regarded as wide and less informative even
under monotonicity. To address these issues, this study introduces a stochastic
version of the monotonicity assumption alongside a nonparametric distributional
shape constraint. The former enhances the robustness of the Lee bounds with
respect to monotonicity, while the latter helps tighten these bounds. The
obtained bounds do not rely on the exclusion restriction and are root-$n$
consistently estimable, making them practically viable. The potential
usefulness of the proposed methods is illustrated by their application on
experimental data from the after-school instruction programme studied by
Muralidharan, Singh, and Ganimian (2019).
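
For reference, the classical Lee (2009) trimming bounds that this paper
robustifies and tightens can be computed in a few lines; the sketch below
implements only the original bounds on simulated data, with none of the
stochastic-monotonicity or symmetry refinements.

    import numpy as np

    def lee_bounds(y, d, s):
        """Classical Lee (2009) bounds on the treatment effect for the
        always-selected, assuming selection rate is higher under treatment.

        y : outcomes (only meaningful where s == 1)
        d : treatment indicator
        s : selection indicator (e.g., employment), 1 if outcome observed
        """
        p1 = s[d == 1].mean()                 # selection rate under treatment
        p0 = s[d == 0].mean()                 # selection rate under control
        trim = 1.0 - p0 / p1                  # share of treated to trim
        y1 = np.sort(y[(d == 1) & (s == 1)])
        y0 = y[(d == 0) & (s == 1)]
        k = int(np.floor(trim * len(y1)))
        lower = y1[:len(y1) - k].mean() - y0.mean()  # trim largest treated outcomes
        upper = y1[k:].mean() - y0.mean()            # trim smallest treated outcomes
        return lower, upper

    rng = np.random.default_rng(2)
    d = rng.integers(0, 2, 5000)
    s = (rng.random(5000) < np.where(d == 1, 0.8, 0.6)).astype(int)
    y = rng.normal(1.0 + 0.5 * d, 1.0) * s
    print(lee_bounds(y, d, s))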

arXiv link: http://arxiv.org/abs/2311.00439v4

Econometrics arXiv updated paper (originally submitted: 2023-10-31)

Semiparametric Discrete Choice Models for Bundles

Authors: Fu Ouyang, Thomas Tao Yang

We propose two approaches to estimate semiparametric discrete choice models
for bundles. Our first approach is a kernel-weighted rank estimator based on a
matching-based identification strategy. We establish its complete asymptotic
properties and prove the validity of the nonparametric bootstrap for inference.
We then introduce a new multi-index least absolute deviations (LAD) estimator
as an alternative, whose main advantage is its capacity to estimate
preference parameters on both alternative- and agent-specific regressors. Both
methods can account for arbitrary correlation in disturbances across choices,
with the former also allowing for interpersonal heteroskedasticity. We also
demonstrate that the identification strategy underlying these procedures can be
extended naturally to panel data settings, producing an analogous localized
maximum score estimator and a LAD estimator for estimating bundle choice models
with fixed effects. We derive the limiting distribution of the former and
verify the validity of the numerical bootstrap as an inference tool. All our
proposed methods can be applied to general multi-index models. Monte Carlo
experiments show that they perform well in finite samples.

arXiv link: http://arxiv.org/abs/2311.00013v3

Econometrics arXiv paper, submitted: 2023-10-30

Robust Estimation of Realized Correlation: New Insight about Intraday Fluctuations in Market Betas

Authors: Peter Reinhard Hansen, Yiyao Luo

Time-varying volatility is an inherent feature of most economic time-series,
which causes standard correlation estimators to be inconsistent. The quadrant
correlation estimator is consistent but very inefficient. We propose a novel
subsampled quadrant estimator that improves efficiency while preserving
consistency and robustness. This estimator is particularly well-suited for
high-frequency financial data and we apply it to a large panel of US stocks.
Our empirical analysis sheds new light on intra-day fluctuations in market
betas by decomposing them into time-varying correlations and relative
volatility changes. Our results show that intraday variation in betas is
primarily driven by intraday variation in correlations.
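
The quadrant correlation and its subsampled refinement admit a short sketch:
within each block, compute the probability of sign concordance around the
medians and map it to a correlation through the Gaussian quadrant identity; the
block count and the simulated volatility path below are arbitrary
illustrations, not the paper's estimator settings.

    import numpy as np

    def quadrant_corr(x, y):
        """Quadrant correlation: sign concordance around the medians, mapped
        to a correlation via the Gaussian identity rho = sin(pi/2 * (2p - 1))."""
        cx = np.sign(x - np.median(x))
        cy = np.sign(y - np.median(y))
        p = np.mean(cx * cy > 0)
        return np.sin(0.5 * np.pi * (2.0 * p - 1.0))

    def subsampled_quadrant_corr(x, y, n_blocks=10):
        """Average the quadrant estimator over non-overlapping blocks to regain
        efficiency while keeping robustness to time-varying volatility."""
        blocks = zip(np.array_split(x, n_blocks), np.array_split(y, n_blocks))
        return np.mean([quadrant_corr(xb, yb) for xb, yb in blocks])

    rng = np.random.default_rng(3)
    z = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=2000)
    vol = np.exp(np.linspace(0.0, 1.5, 2000))      # time-varying volatility
    x, y = z[:, 0] * vol, z[:, 1] * vol
    print(quadrant_corr(x, y), subsampled_quadrant_corr(x, y))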

arXiv link: http://arxiv.org/abs/2310.19992v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2023-10-30

Worst-Case Optimal Multi-Armed Gaussian Best Arm Identification with a Fixed Budget

Authors: Masahiro Kato

This study investigates the experimental design problem for identifying the
arm with the highest expected outcome, referred to as best arm identification
(BAI). In our experiments, the number of treatment-allocation rounds is fixed.
During each round, a decision-maker allocates an arm and observes a
corresponding outcome, which follows a Gaussian distribution with variances
that can differ among the arms. At the end of the experiment, the
decision-maker recommends one of the arms as an estimate of the best arm. To
design an experiment, we first discuss lower bounds for the probability of
misidentification. Our analysis highlights that the available information on
the outcome distribution, such as means (expected outcomes), variances, and the
choice of the best arm, significantly influences the lower bounds. Because
available information is limited in actual experiments, we develop a lower
bound that is valid under unknown means and an unknown best
arm, which we refer to as the worst-case lower bound. We demonstrate that
the worst-case lower bound depends solely on the variances of the outcomes.
Then, under the assumption that the variances are known, we propose the
Generalized-Neyman-Allocation (GNA)-empirical-best-arm (EBA) strategy, an
extension of the Neyman allocation proposed by Neyman (1934). We show that the
GNA-EBA strategy is asymptotically optimal in the sense that its probability of
misidentification aligns with the lower bounds as the sample size grows to
infinity and the gaps between the expected outcomes of the best arm and the
suboptimal arms converge to a common value across arms. We refer to such
strategies as asymptotically worst-case optimal.
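
A simplified single-stage version of a variance-aware allocation followed by an
empirical-best-arm recommendation is sketched below; it uses the classical
standard-deviation-proportional Neyman rule for illustration, which is not
identical to the GNA-EBA strategy, and all numerical values are made up.

    import numpy as np

    def allocate_and_recommend(means, sds, budget, rng):
        """Allocate a fixed budget across arms in proportion to their standard
        deviations (classical Neyman rule, for illustration only), sample
        Gaussian outcomes, and recommend the arm with the highest sample mean."""
        n = np.maximum((sds / sds.sum() * budget).astype(int), 2)
        sample_means = np.array([
            rng.normal(means[k], sds[k], n[k]).mean() for k in range(len(means))
        ])
        return int(np.argmax(sample_means))

    rng = np.random.default_rng(4)
    means = np.array([0.00, 0.05, 0.10])   # unknown to the experimenter
    sds = np.array([1.0, 2.0, 1.5])        # variances treated as known
    picks = [allocate_and_recommend(means, sds, 3000, rng) for _ in range(200)]
    print("misidentification rate:", np.mean(np.array(picks) != 2))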

arXiv link: http://arxiv.org/abs/2310.19788v3

Econometrics arXiv cross-link from q-fin.CP (q-fin.CP), submitted: 2023-10-30

Characteristics of price related fluctuations in Non-Fungible Token (NFT) market

Authors: Paweł Szydło, Marcin Wątorek, Jarosław Kwapień, Stanisław Drożdż

A non-fungible token (NFT) market is a new trading invention based on the
blockchain technology which parallels the cryptocurrency market. In the present
work we study capitalization, floor price, the number of transactions, the
inter-transaction times, and the transaction volume value of a few selected
popular token collections. The results show that the fluctuations of all these
quantities are characterized by heavy-tailed probability distribution
functions, in most cases well described by the stretched exponentials, with a
trace of power-law scaling at times, long-range memory, and in several cases
even the fractal organization of fluctuations, mostly restricted to the larger
fluctuations, however. We conclude that the NFT market - even though young and
governed by somewhat different trading mechanisms - shares several
statistical properties with the regular financial markets. However, some
differences are visible in the specific quantitative indicators.

arXiv link: http://arxiv.org/abs/2310.19747v2

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2023-10-30

A Bayesian Markov-switching SAR model for time-varying cross-price spillovers

Authors: Christian Glocker, Matteo Iacopini, Tamás Krisztin, Philipp Piribauer

The spatial autoregressive (SAR) model is extended by introducing a Markov
switching dynamics for the weight matrix and spatial autoregressive parameter.
The framework enables the identification of regime-specific connectivity
patterns and strengths and the study of the spatiotemporal propagation of
shocks in a system with a time-varying spatial multiplier matrix. The proposed
model is applied to disaggregated CPI data from 15 EU countries to examine
cross-price dependencies. The analysis identifies distinct connectivity
structures and spatial weights across the states, which capture shifts in
consumer behaviour, with marked cross-country differences in the spillover from
one price category to another.

arXiv link: http://arxiv.org/abs/2310.19557v1

Econometrics arXiv paper, submitted: 2023-10-30

Spectral identification and estimation of mixed causal-noncausal invertible-noninvertible models

Authors: Alain Hecq, Daniel Velasquez-Gaviria

This paper introduces new techniques for estimating, identifying and
simulating mixed causal-noncausal invertible-noninvertible models. We propose a
framework that integrates high-order cumulants, merging both the spectrum and
bispectrum into a single estimation function. The model that most adequately
represents the data under the assumption that the error term is i.i.d. is
selected. Our Monte Carlo study reveals unbiased parameter estimates and a high
frequency with which correct models are identified. We illustrate our strategy
through an empirical analysis of returns from 24 Fama-French emerging market
stock portfolios. The findings suggest that each portfolio displays noncausal
dynamics, producing white noise residuals devoid of conditional heteroscedastic
effects.

arXiv link: http://arxiv.org/abs/2310.19543v1

Econometrics arXiv paper, submitted: 2023-10-29

Popularity, face and voice: Predicting and interpreting livestreamers' retail performance using machine learning techniques

Authors: Xiong Xiong, Fan Yang, Li Su

Livestreaming commerce, a hybrid of e-commerce and self-media, has expanded
the broad spectrum of traditional sales performance determinants. To
investigate the factors that contribute to the success of livestreaming
commerce, we construct a longitudinal firm-level database with 19,175
observations, covering an entire livestreaming subsector. By comparing the
forecasting accuracy of eight machine learning models, we identify a random
forest model that provides the best prediction of gross merchandise volume
(GMV). Furthermore, we utilize explainable artificial intelligence to open the
black box of the machine learning model, uncovering four new facts: 1) variables
representing the popularity of livestreaming events are crucial features in
predicting GMV, and voice attributes are more important than appearance; 2)
popularity is a major determinant of sales for female hosts, while vocal
aesthetics is more decisive for their male counterparts; 3) merits and
drawbacks of the voice are not equally valued in the livestreaming market; 4)
based on changes in comments, page views, and likes, sales growth can be divided
into three stages. Finally, we propose a novel 3D-SHAP diagram that
demonstrates the relationship between predicting feature importance, target
variable, and its predictors. This diagram identifies bottlenecks for both
beginner and top livestreamers, providing insights into ways to optimize their
sales performance.

arXiv link: http://arxiv.org/abs/2310.19200v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-10-28

Cluster-Randomized Trials with Cross-Cluster Interference

Authors: Michael P. Leung

The literature on cluster-randomized trials typically allows for interference
within but not across clusters. This may be implausible when units are
irregularly distributed across space without well-separated communities, as
clusters in such cases may not align with significant geographic, social, or
economic divisions. This paper develops methods for reducing bias due to
cross-cluster interference. We first propose an estimation strategy that
excludes units not surrounded by clusters assigned to the same treatment arm.
We show that this substantially reduces bias relative to conventional
difference-in-means estimators without significant cost to variance. Second, we
formally establish a bias-variance trade-off in the choice of clusters:
constructing fewer, larger clusters reduces bias due to interference but
increases variance. We provide a rule for choosing the number of clusters to
balance the asymptotic orders of the bias and variance of our estimator.
Finally, we consider unsupervised learning for cluster construction and provide
theoretical guarantees for $k$-medoids.

arXiv link: http://arxiv.org/abs/2310.18836v4

Econometrics arXiv updated paper (originally submitted: 2023-10-28)

Covariate Balancing and the Equivalence of Weighting and Doubly Robust Estimators of Average Treatment Effects

Authors: Tymon Słoczyński, S. Derya Uysal, Jeffrey M. Wooldridge

How should researchers adjust for covariates? We show that if the propensity
score is estimated using a specific covariate balancing approach, inverse
probability weighting (IPW), augmented inverse probability weighting (AIPW),
and inverse probability weighted regression adjustment (IPWRA) estimators are
numerically equivalent for the average treatment effect (ATE), and likewise for
the average treatment effect on the treated (ATT). The resulting weights are
inherently normalized, making normalized and unnormalized IPW and AIPW
identical. We discuss implications for instrumental variables and
difference-in-differences estimators and illustrate with two applications how
these numerical equivalences simplify analysis and interpretation.
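
The estimators being compared can be written down compactly; the sketch below
uses an ordinary logistic propensity score rather than the covariate balancing
estimation the result requires, so the IPW and AIPW numbers will generally
differ slightly, and it is exactly this gap that the balancing approach
eliminates.

    import numpy as np
    from sklearn.linear_model import LogisticRegression, LinearRegression

    rng = np.random.default_rng(5)
    n = 5000
    X = rng.normal(size=(n, 3))
    p = 1.0 / (1.0 + np.exp(-X[:, 0]))             # true propensity
    D = (rng.random(n) < p).astype(int)
    Y = 1.0 + X @ np.array([0.5, -0.3, 0.2]) + 2.0 * D + rng.normal(size=n)

    # Propensity score via plain logistic regression (not covariate balancing)
    e = LogisticRegression().fit(X, D).predict_proba(X)[:, 1]

    # Outcome regressions by treatment arm
    m1 = LinearRegression().fit(X[D == 1], Y[D == 1]).predict(X)
    m0 = LinearRegression().fit(X[D == 0], Y[D == 0]).predict(X)

    # Normalized (Hajek) IPW and AIPW estimators of the ATE
    w1, w0 = D / e, (1 - D) / (1 - e)
    ipw = (w1 * Y).sum() / w1.sum() - (w0 * Y).sum() / w0.sum()
    aipw = np.mean(m1 - m0 + w1 * (Y - m1) - w0 * (Y - m0))
    print(ipw, aipw)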

arXiv link: http://arxiv.org/abs/2310.18563v2

Econometrics arXiv updated paper (originally submitted: 2023-10-27)

Doubly Robust Identification of Causal Effects of a Continuous Treatment using Discrete Instruments

Authors: Yingying Dong, Ying-Ying Lee

Many empirical applications estimate causal effects of a continuous
endogenous variable (treatment) using a binary instrument. Estimation is
typically done through linear 2SLS. This approach requires a mean treatment
change and causal interpretation requires the LATE-type monotonicity in the
first stage. An alternative approach is to explore distributional changes in
the treatment, where the first-stage restriction is treatment rank similarity.
We propose causal estimands that are doubly robust in that they are valid under
either of these two restrictions. We apply the doubly robust estimation to
estimate the impacts of sleep on well-being. Our new estimates corroborate the
usual 2SLS estimates.

arXiv link: http://arxiv.org/abs/2310.18504v3

Econometrics arXiv updated paper (originally submitted: 2023-10-26)

Inside the black box: Neural network-based real-time prediction of US recessions

Authors: Seulki Chung

Long short-term memory (LSTM) and gated recurrent unit (GRU) are used to
model US recessions from 1967 to 2021. Their predictive performances are
compared to those of traditional linear models. The out-of-sample
performance supports the use of LSTM and GRU for recession forecasting,
especially at longer horizons. The Shapley additive explanations (SHAP)
method is applied to both groups of models. The differing SHAP-based weight
assignments indicate that these types of neural networks can capture
business cycle asymmetries and nonlinearities. The SHAP method delivers key
recession indicators, such as the S&P 500 index for short-term forecasting up
to 3 months and the term spread for longer-term forecasting up to 12 months.
These findings are robust against other interpretation methods, such as the
local interpretable model-agnostic explanations (LIME) and the marginal
effects.

arXiv link: http://arxiv.org/abs/2310.17571v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-10-26

Tackling Interference Induced by Data Training Loops in A/B Tests: A Weighted Training Approach

Authors: Nian Si

In modern recommendation systems, the standard pipeline involves training
machine learning models on historical data to predict user behaviors and
improve recommendations continuously. However, these data training loops can
introduce interference in A/B tests, where data generated by control and
treatment algorithms, potentially with different distributions, are combined.
To address these challenges, we introduce a novel approach called weighted
training. This approach entails training a model to predict the probability of
each data point appearing in either the treatment or control data and
subsequently applying weighted losses during model training. We demonstrate
that this approach achieves the least variance among all estimators that do not
cause shifts in the training distributions. Through simulation studies, we
demonstrate the lower bias and variance of our approach compared to other
methods.
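
One natural way to implement such a weighting, sketched below, is to fit a
classifier for the probability that an example came from the treatment pipeline
and reweight each arm's losses toward the pooled data distribution; the 50/50
split, the linear models, and the clipping constant are assumptions of this
illustration rather than the paper's exact estimator.

    import numpy as np
    from sklearn.linear_model import LogisticRegression, LinearRegression

    rng = np.random.default_rng(6)
    n = 10000
    X = rng.normal(size=(n, 4))
    arm = rng.integers(0, 2, n)                 # 0 = control, 1 = treatment
    # Logged outcomes whose distribution differs somewhat across arms
    y = X @ np.array([1.0, 0.5, 0.0, -0.5]) + 0.3 * arm * X[:, 0] + rng.normal(size=n)

    # Step 1: predict the probability that an example came from the treatment arm
    g = LogisticRegression().fit(X, arm).predict_proba(X)[:, 1]

    # Step 2: weight = pooled density / own-arm density; with a 50/50 split this
    # is 0.5/g for treatment examples and 0.5/(1-g) for control examples
    w = np.where(arm == 1,
                 0.5 / np.clip(g, 1e-3, 1.0),
                 0.5 / np.clip(1.0 - g, 1e-3, 1.0))

    # Step 3: train the downstream model with weighted losses
    model = LinearRegression().fit(X, y, sample_weight=w)
    print(model.coef_)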

arXiv link: http://arxiv.org/abs/2310.17496v5

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2023-10-26

Bayesian SAR model with stochastic volatility and multiple time-varying weights

Authors: Michele Costola, Matteo Iacopini, Casper Wichers

A novel spatial autoregressive model for panel data is introduced, which
incorporates multilayer networks and accounts for time-varying relationships.
Moreover, the proposed approach allows the structural variance to evolve
smoothly over time and enables the analysis of shock propagation in terms of
time-varying spillover effects. The framework is applied to analyse the
dynamics of international relationships among the G7 economies and their impact
on stock market returns and volatilities. The findings underscore the
substantial impact of cooperative interactions and highlight discernible
disparities in network exposure across G7 nations, along with nuanced patterns
in direct and indirect spillover effects.

arXiv link: http://arxiv.org/abs/2310.17473v1

Econometrics arXiv updated paper (originally submitted: 2023-10-26)

Dynamic Factor Models: a Genealogy

Authors: Matteo Barigozzi, Marc Hallin

Dynamic factor models have been developed out of the need of analyzing and
forecasting time series in increasingly high dimensions. While mathematical
statisticians faced with inference problems in high-dimensional observation
spaces were focusing on the so-called spiked-model-asymptotics, econometricians
adopted an entirely and considerably more effective asymptotic approach, rooted
in the factor models originally considered in psychometrics. The so-called
dynamic factor model methodology has, in two decades, grown into a wide and
successful body of techniques that are widely used in central banks, financial
institutions, and economic and statistical institutes. The objective of this
chapter is not an extensive survey of the topic but a sketch of its historical
growth, with emphasis on the various assumptions and interpretations, and a
family tree of its main variants.

arXiv link: http://arxiv.org/abs/2310.17278v2

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2023-10-25

Causal Q-Aggregation for CATE Model Selection

Authors: Hui Lan, Vasilis Syrgkanis

Accurate estimation of conditional average treatment effects (CATE) is at the
core of personalized decision making. While there is a plethora of models for
CATE estimation, model selection is a nontrivial task, due to the fundamental
problem of causal inference. Recent empirical work provides evidence in favor
of proxy loss metrics with double robust properties and in favor of model
ensembling. However, theoretical understanding is lacking. Direct application
of prior theoretical work leads to suboptimal oracle model selection rates due
to the non-convexity of the model selection problem. We provide regret rates
for the major existing CATE ensembling approaches and propose a new CATE model
ensembling approach based on Q-aggregation using the doubly robust loss. Our
main result shows that causal Q-aggregation achieves statistically optimal
oracle model selection regret rates of $\log(M)/n$ (with $M$ models and
$n$ samples), with the addition of higher-order estimation error terms related
to products of errors in the nuisance functions. Crucially, our regret rate
does not require that any of the candidate CATE models be close to the truth.
We validate our new method on many semi-synthetic datasets and also provide
extensions of our work to CATE model selection with instrumental variables and
unobserved confounding.

arXiv link: http://arxiv.org/abs/2310.16945v5

Econometrics arXiv paper, submitted: 2023-10-25

CATE Lasso: Conditional Average Treatment Effect Estimation with High-Dimensional Linear Regression

Authors: Masahiro Kato, Masaaki Imaizumi

In causal inference about two treatments, Conditional Average Treatment
Effects (CATEs) play an important role as a quantity representing an
individualized causal effect, defined as a difference between the expected
outcomes of the two treatments conditioned on covariates. This study assumes
two linear regression models between a potential outcome and covariates of the
two treatments and defines CATEs as a difference between the linear regression
models. Then, we propose a method for consistently estimating CATEs even under
high-dimensional and non-sparse parameters. In our study, we demonstrate that
desirable theoretical properties, such as consistency, remain attainable even
without assuming sparsity explicitly if we assume a weaker assumption called
implicit sparsity originating from the definition of CATEs. In this assumption,
we suppose that parameters of linear models in potential outcomes can be
divided into treatment-specific and common parameters, where the
treatment-specific parameters take different values across the two linear
regression models, while the common parameters remain identical. Thus, in the
difference between two linear regression models, the common parameters
disappear, leaving only differences in the treatment-specific parameters.
Consequently, the non-zero parameters in CATEs correspond to the differences in
the treatment-specific parameters. Leveraging this assumption, we develop a
Lasso regression method specialized for CATE estimation and present that the
estimator is consistent. Finally, we confirm the soundness of the proposed
method by simulation studies.

arXiv link: http://arxiv.org/abs/2310.16819v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-10-25

Double Debiased Covariate Shift Adaptation Robust to Density-Ratio Estimation

Authors: Masahiro Kato, Kota Matsui, Ryo Inokuchi

Consider a scenario where we have access to train data with both covariates
and outcomes, while the test data contain only covariates. In this scenario, our
primary aim is to predict the missing outcomes of the test data. With this
objective in mind, we train parametric regression models under a covariate
shift, where covariate distributions are different between the train and test
data. For this problem, existing studies have proposed covariate shift
adaptation via importance weighting using the density ratio. This approach
averages the train data losses, each weighted by an estimated ratio of the
covariate densities between the train and test data, to approximate the
test-data risk. Although it allows us to obtain a test-data risk minimizer, its
performance heavily relies on the accuracy of the density ratio estimation.
Moreover, even if the density ratio can be consistently estimated, the
estimation errors of the density ratio also yield bias in the estimators of the
regression model's parameters of interest. To mitigate these challenges, we
introduce a doubly robust estimator for covariate shift adaptation via
importance weighting, which incorporates an additional estimator for the
regression function. Leveraging double machine learning techniques, our
estimator reduces the bias arising from the density ratio estimation errors. We
demonstrate the asymptotic distribution of the regression parameter estimator.
Notably, our estimator remains consistent if either the density ratio estimator
or the regression function is consistent, showcasing its robustness against
potential errors in density ratio estimation. Finally, we confirm the soundness
of our proposed method via simulation studies.
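
The structure of the proposed correction can be sketched for the simpler
problem of estimating the test-data mean outcome: density-ratio weights
obtained from a probabilistic classifier are combined with an outcome
regression in an AIPW-style formula. The classifier-based ratio estimator and
linear outcome model below are simplifying assumptions, not the paper's exact
construction.

    import numpy as np
    from sklearn.linear_model import LogisticRegression, LinearRegression

    rng = np.random.default_rng(7)
    n_tr, n_te = 4000, 4000
    X_tr = rng.normal(0.0, 1.0, size=(n_tr, 2))
    X_te = rng.normal(0.7, 1.0, size=(n_te, 2))      # shifted test covariates
    y_tr = 1.0 + X_tr @ np.array([2.0, -1.0]) + rng.normal(size=n_tr)

    # Density ratio r(x) = p_test(x) / p_train(x) via probabilistic classification
    Z = np.vstack([X_tr, X_te])
    lab = np.r_[np.zeros(n_tr), np.ones(n_te)]
    g = LogisticRegression().fit(Z, lab).predict_proba(X_tr)[:, 1]
    ratio = (g / (1 - g)) * (n_tr / n_te)

    # Outcome regression fitted on the training data
    f = LinearRegression().fit(X_tr, y_tr)

    # Doubly robust estimate of the test-data mean outcome: regression prediction
    # on the test covariates plus a weighted correction from training residuals
    dr = f.predict(X_te).mean() + np.mean(ratio * (y_tr - f.predict(X_tr)))
    print(dr)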

arXiv link: http://arxiv.org/abs/2310.16638v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-10-25

Fair Adaptive Experiments

Authors: Waverly Wei, Xinwei Ma, Jingshen Wang

Randomized experiments have been the gold standard for assessing the
effectiveness of a treatment or policy. The classical complete randomization
approach assigns treatments based on a prespecified probability and may lead to
inefficient use of data. Adaptive experiments improve upon complete
randomization by sequentially learning and updating treatment assignment
probabilities. However, their application can also raise fairness and equity
concerns, as assignment probabilities may vary drastically across groups of
participants. Furthermore, when treatment is expected to be extremely
beneficial to certain groups of participants, it is more appropriate to expose
many of these participants to favorable treatment. In response to these
challenges, we propose a fair adaptive experiment strategy that simultaneously
enhances data use efficiency, achieves an envy-free treatment assignment
guarantee, and improves the overall welfare of participants. An important
feature of our proposed strategy is that we do not impose parametric modeling
assumptions on the outcome variables, making it more versatile and applicable
to a wider array of applications. Through our theoretical investigation, we
characterize the convergence rate of the estimated treatment effects and the
associated standard deviations at the group level and further prove that our
adaptive treatment assignment algorithm, despite not having a closed-form
expression, approaches the optimal allocation rule asymptotically. Our proof
strategy takes into account the fact that the allocation decisions in our
design depend on sequentially accumulated data, which poses a significant
challenge in characterizing the properties and conducting statistical inference
of our method. We further provide simulation evidence to showcase the
performance of our fair adaptive experiment strategy.

arXiv link: http://arxiv.org/abs/2310.16290v1

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2023-10-25

Improving Robust Decisions with Data

Authors: Xiaoyu Cheng

A decision-maker faces uncertainty governed by a data-generating process
(DGP), which is only known to belong to a set of sequences of independent but
possibly non-identical distributions. A robust decision maximizes the expected
payoff against the worst possible DGP in this set. This paper characterizes
when and how such robust decisions can be improved with data, measured by the
expected payoff under the true DGP, no matter which possible DGP is the truth.
It further develops novel and simple inference methods to achieve it, as common
methods (e.g., maximum likelihood) may fail to deliver such an improvement.

arXiv link: http://arxiv.org/abs/2310.16281v4

Econometrics arXiv paper, submitted: 2023-10-24

Testing for equivalence of pre-trends in Difference-in-Differences estimation

Authors: Holger Dette, Martin Schumann

The plausibility of the “parallel trends assumption” in
Difference-in-Differences estimation is usually assessed by a test of the null
hypothesis that the difference between the average outcomes of both groups is
constant over time before the treatment. However, failure to reject the null
hypothesis does not imply the absence of differences in time trends between
both groups. We provide equivalence tests that allow researchers to find
evidence in favor of the parallel trends assumption and thus increase the
credibility of their treatment effect estimates. While we motivate our tests in
the standard two-way fixed effects model, we discuss simple extensions to
settings in which treatment adoption is staggered over time.

arXiv link: http://arxiv.org/abs/2310.15796v1

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2023-10-24

The impact of the Russia-Ukraine conflict on the extreme risk spillovers between agricultural futures and spots

Authors: Wei-Xing Zhou, Yun-Shi Dai, Kiet Tuan Duong, Peng-Fei Dai

The ongoing Russia-Ukraine conflict between two major agricultural powers has
posed significant threats and challenges to the global food system and world
food security. Focusing on the impact of the conflict on the global
agricultural market, we propose a new analytical framework for tail dependence,
and combine the Copula-CoVaR method with the ARMA-GARCH-skewed Student-t model
to examine the tail dependence structure and extreme risk spillover between
agricultural futures and spots over the pre- and post-outbreak periods. Our
results indicate that the tail dependence structures in the futures-spot
markets of soybean, maize, wheat, and rice have all reacted to the
Russia-Ukraine conflict. Furthermore, the outbreak of the conflict has
intensified risks of the four agricultural markets in varying degrees, with the
wheat market being affected the most. Additionally, all the agricultural
futures markets exhibit significant downside and upside risk spillovers to
their corresponding spot markets before and after the outbreak of the conflict,
whereas the strengths of these extreme risk spillover effects demonstrate
significant asymmetries at the directional (downside versus upside) and
temporal (pre-outbreak versus post-outbreak) levels.
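
As a hedged illustration of the marginal modelling step described above, the sketch below fits an AR(1)-GARCH(1,1) model with skewed Student-t errors to a single return series using the Python `arch` package and converts the standardized residuals to pseudo-uniform observations; the copula and CoVaR estimation of the paper would then operate on these pseudo-observations. The series, lag orders, and scaling are illustrative assumptions, not the paper's exact specification.

```python
# Minimal sketch: ARMA-GARCH-skewed-Student-t marginal step before a copula/CoVaR analysis.
# The return series is simulated; in the paper it would be a futures or spot return series.
import numpy as np
import pandas as pd
from arch import arch_model
from scipy.stats import rankdata

def skewt_garch_pseudo_obs(returns: pd.Series):
    """Fit AR(1)-GARCH(1,1) with skewed Student-t errors and return pseudo-uniforms."""
    model = arch_model(returns.dropna() * 100, mean="AR", lags=1,
                       vol="GARCH", p=1, q=1, dist="skewt")
    res = model.fit(disp="off")
    z = res.std_resid.dropna()              # standardized residuals
    u = rankdata(z) / (len(z) + 1)          # empirical probability integral transform
    return res, pd.Series(u, index=z.index)

rng = np.random.default_rng(0)
fake_returns = pd.Series(rng.standard_t(df=5, size=1500) * 0.01)
res, u = skewt_garch_pseudo_obs(fake_returns)
print(res.params.round(3))
# u (and its counterpart for the other series) would feed the Copula-CoVaR step.
```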

arXiv link: http://arxiv.org/abs/2310.16850v1

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2023-10-24

Correlation structure analysis of the global agricultural futures market

Authors: Yun-Shi Dai, Ngoc Quang Anh Huynh, Qing-Huan Zheng, Wei-Xing Zhou

This paper adopts the random matrix theory (RMT) to analyze the correlation
structure of the global agricultural futures market from 2000 to 2020. It is
found that the distribution of correlation coefficients is asymmetric and right
skewed, and many eigenvalues of the correlation matrix deviate from the RMT
prediction. The largest eigenvalue reflects a collective market effect common
to all agricultural futures, while the other large deviating eigenvalues can be
used to identify groups of futures, with modular structures based on regional
properties or commodity type among the significant components of the
corresponding eigenvectors. Apart from the smallest eigenvalue, the other small
deviating eigenvalues correspond to the agricultural futures pairs with the
highest correlations. These findings are relevant for using agricultural
futures to manage risk and optimize asset
allocation.
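
A minimal sketch of the core RMT comparison, under the simplifying assumption of simulated i.i.d. data standing in for the futures returns: the eigenvalues of the empirical correlation matrix are compared with the Marchenko-Pastur bounds that would hold for purely random data, and eigenvalues outside those bounds are flagged as "deviating".

```python
# Minimal sketch: compare correlation-matrix eigenvalues with Marchenko-Pastur bounds.
# Data are simulated; in the paper the columns would be agricultural futures returns.
import numpy as np

rng = np.random.default_rng(1)
T, N = 2500, 30                          # observations and number of futures (illustrative)
returns = rng.standard_normal((T, N))

corr = np.corrcoef(returns, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]

q = T / N                                # aspect ratio
lam_minus = (1 - np.sqrt(1 / q)) ** 2    # RMT lower bound for i.i.d. data
lam_plus = (1 + np.sqrt(1 / q)) ** 2     # RMT upper bound for i.i.d. data

deviating = eigvals[(eigvals > lam_plus) | (eigvals < lam_minus)]
print(f"MP support: [{lam_minus:.3f}, {lam_plus:.3f}]")
print(f"largest eigenvalue: {eigvals[0]:.3f}, deviating eigenvalues: {deviating.size}")
# The eigenvectors of the large deviating eigenvalues would then be inspected for groups.
```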

arXiv link: http://arxiv.org/abs/2310.16849v1

Econometrics arXiv updated paper (originally submitted: 2023-10-24)

Inference for Rank-Rank Regressions

Authors: Denis Chetverikov, Daniel Wilhelm

The slope coefficient in a rank-rank regression is a popular measure of
intergenerational mobility. In this article, we first show that commonly used
inference methods for this slope parameter are invalid. Second, when the
underlying distribution is not continuous, the OLS estimator and its asymptotic
distribution may be highly sensitive to how ties in the ranks are handled.
Motivated by these findings we develop a new asymptotic theory for the OLS
estimator in a general class of rank-rank regression specifications without
imposing any assumptions about the continuity of the underlying distribution.
We then extend the asymptotic theory to other regressions involving ranks that
have been used in empirical work. Finally, we apply our new inference methods
to two empirical studies on intergenerational mobility, highlighting the
practical implications of our theoretical findings.
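
To fix ideas, the sketch below computes the usual rank-rank slope by regressing child income ranks on parent income ranks for simulated data; the naive OLS standard errors it produces are exactly the kind of inference the paper shows can be invalid, so the snippet only illustrates the estimand, not the authors' corrected methods.

```python
# Minimal sketch of a rank-rank regression (estimand only; the paper shows that
# naive inference on this slope can be invalid, especially when ranks have ties).
import numpy as np
import statsmodels.api as sm
from scipy.stats import rankdata

rng = np.random.default_rng(2)
parent_income = rng.lognormal(mean=10, sigma=0.7, size=5000)
child_income = np.exp(0.4 * np.log(parent_income) + rng.normal(0, 0.6, size=5000) + 6)

# Ranks normalized to (0, 1]; "average" ranks are one common way of handling ties.
parent_rank = rankdata(parent_income, method="average") / len(parent_income)
child_rank = rankdata(child_income, method="average") / len(child_income)

ols = sm.OLS(child_rank, sm.add_constant(parent_rank)).fit()
print(f"rank-rank slope (mobility measure): {ols.params[1]:.3f}")
# ols.bse holds textbook OLS standard errors, which the paper shows need correction.
```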

arXiv link: http://arxiv.org/abs/2310.15512v5

Econometrics arXiv updated paper (originally submitted: 2023-10-23)

Causal clustering: design of cluster experiments under network interference

Authors: Davide Viviano, Lihua Lei, Guido Imbens, Brian Karrer, Okke Schrijvers, Liang Shi

This paper studies the design of cluster experiments to estimate the global
treatment effect in the presence of network spillovers. We provide a framework
to choose the clustering that minimizes the worst-case mean-squared error of
the estimated global effect. We show that optimal clustering solves a novel
penalized min-cut optimization problem computed via off-the-shelf semi-definite
programming algorithms. Our analysis also characterizes simple conditions to
choose between any two cluster designs, including choosing between a cluster or
individual-level randomization. We illustrate the method's properties using
unique network data from the universe of Facebook's users and existing data
from a field experiment.

arXiv link: http://arxiv.org/abs/2310.14983v3

Econometrics arXiv paper, submitted: 2023-10-22

BVARs and Stochastic Volatility

Authors: Joshua Chan

Bayesian vector autoregressions (BVARs) are the workhorse in macroeconomic
forecasting. Research in the last decade has established the importance of
allowing time-varying volatility to capture both secular and cyclical
variations in macroeconomic uncertainty. This recognition, together with the
growing availability of large datasets, has propelled a surge in recent
research in building stochastic volatility models suitable for large BVARs.
Some of these new models are also equipped with additional features that are
especially desirable for large systems, such as order invariance -- i.e.,
estimates are not dependent on how the variables are ordered in the BVAR -- and
robustness against COVID-19 outliers. Estimation of these large, flexible
models is made possible by the recently developed equation-by-equation approach
that drastically reduces the computational cost of estimating large systems.
Despite these recent advances, there remains much ongoing work, such as the
development of parsimonious approaches for time-varying coefficients and other
types of nonlinearities in large BVARs.

arXiv link: http://arxiv.org/abs/2310.14438v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2023-10-22

On propensity score matching with a diverging number of matches

Authors: Yihui He, Fang Han

This paper reexamines Abadie and Imbens (2016)'s work on propensity score
matching for average treatment effect estimation. We explore the asymptotic
behavior of these estimators when the number of nearest neighbors, $M$, grows
with the sample size. It is shown, perhaps unsurprisingly but through a
technically nontrivial argument, that the modified estimators can improve upon
the original
fixed-$M$ estimators in terms of efficiency. Additionally, we demonstrate the
potential to attain the semiparametric efficiency lower bound when the
propensity score achieves "sufficient" dimension reduction, echoing Hahn
(1998)'s insight about the role of dimension reduction in propensity
score-based causal inference.
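
A schematic implementation of the matching idea, under simplifying assumptions (simulated data, a logit propensity score, and the ad hoc choice of M around the square root of n): each unit's missing potential outcome is imputed by the average outcome of its M nearest neighbors, in propensity score distance, from the opposite treatment arm. The paper's bias corrections and variance theory are not reproduced here.

```python
# Minimal sketch: propensity score matching for the ATE with a number of matches M
# that grows with the sample size (here M ~ sqrt(n), purely for illustration).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(3)
n = 4000
X = rng.standard_normal((n, 3))
p_true = 1 / (1 + np.exp(-(0.5 * X[:, 0] - 0.25 * X[:, 1])))
D = rng.binomial(1, p_true)
Y = 1.0 * D + X @ np.array([1.0, 0.5, -0.5]) + rng.standard_normal(n)

pscore = LogisticRegression().fit(X, D).predict_proba(X)[:, 1].reshape(-1, 1)
M = int(np.sqrt(n))                       # diverging number of matches

Y_imputed = np.empty((n, 2))              # imputed (Y(0), Y(1)) for every unit
for d in (0, 1):
    donors = np.flatnonzero(D == d)
    nn = NearestNeighbors(n_neighbors=M).fit(pscore[donors])
    _, idx = nn.kneighbors(pscore)
    Y_imputed[:, d] = Y[donors][idx].mean(axis=1)
Y_imputed[D == 1, 1] = Y[D == 1]          # keep observed outcomes where available
Y_imputed[D == 0, 0] = Y[D == 0]

ate_hat = (Y_imputed[:, 1] - Y_imputed[:, 0]).mean()
print(f"matching ATE estimate with M={M}: {ate_hat:.3f} (true effect = 1.0)")
```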

arXiv link: http://arxiv.org/abs/2310.14142v2

Econometrics arXiv updated paper (originally submitted: 2023-10-21)

Unobserved Grouped Heteroskedasticity and Fixed Effects

Authors: Jorge A. Rivero

This paper extends the linear grouped fixed effects (GFE) panel model to
allow for heteroskedasticity from a discrete latent group variable. Key
features of GFE are preserved, such as individuals belonging to one of a finite
number of groups and group membership being unrestricted and estimated. Ignoring
group heteroskedasticity may lead to poor classification, which inflates the
finite sample bias and standard errors of the estimators. I introduce the
"weighted grouped fixed effects" (WGFE) estimator that minimizes a weighted
average of group sum of squared residuals. I establish $NT$-consistency
and normality under a concept of group separation based on second moments. A
test of group homoskedasticity is discussed. A fast computation procedure is
provided. Simulations show that WGFE outperforms alternatives that exclude
second moment information. I demonstrate this approach by considering the link
between income and democracy and the effect of unionization on earnings.

arXiv link: http://arxiv.org/abs/2310.14068v2

Econometrics arXiv updated paper (originally submitted: 2023-10-20)

Bayesian Estimation of Panel Models under Potentially Sparse Heterogeneity

Authors: Hyungsik Roger Moon, Frank Schorfheide, Boyuan Zhang

We incorporate a version of a spike and slab prior, comprising a point mass at
zero (the "spike") and a Normal distribution centered at zero (the "slab"), into
a dynamic
panel data framework to model coefficient heterogeneity. In addition to
homogeneity and full heterogeneity, our specification can also capture sparse
heterogeneity, that is, there is a core group of units that share common
parameters and a set of deviators with idiosyncratic parameters. We fit a model
with unobserved components to income data from the Panel Study of Income
Dynamics. We find evidence for sparse heterogeneity for balanced panels
composed of individuals with long employment histories.
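
In schematic notation (ours, not necessarily the authors'), the prior on a unit-specific coefficient \(\lambda_i\) has the mixture form
$$\lambda_i \mid \pi, \sigma^2 \;\sim\; \pi\,\delta_0 + (1-\pi)\,N(0, \sigma^2),$$
where \(\delta_0\) is the point mass at zero (the spike), \(N(0,\sigma^2)\) is the slab, and \(\pi\) is the prior probability that a unit belongs to the homogeneous core group; \(\pi\) near one delivers (near) homogeneity, \(\pi\) near zero full heterogeneity, and intermediate values sparse heterogeneity.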

arXiv link: http://arxiv.org/abs/2310.13785v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2023-10-20

Transparency challenges in policy evaluation with causal machine learning -- improving usability and accountability

Authors: Patrick Rehill, Nicholas Biddle

Causal machine learning tools are beginning to see use in real-world policy
evaluation tasks to flexibly estimate treatment effects. One issue with these
methods is that the machine learning models used are generally black boxes,
i.e., there is no globally interpretable way to understand how a model makes
estimates. This is a clear problem in policy evaluation applications,
particularly in government, because it is difficult to understand whether such
models are functioning in ways that are fair, based on the correct
interpretation of evidence and transparent enough to allow for accountability
if things go wrong. However, there has been little discussion of transparency
problems in the causal machine learning literature and how these might be
overcome. This paper explores why transparency issues are a problem for causal
machine learning in public policy evaluation applications and considers ways
these problems might be addressed through explainable AI tools and by
simplifying models in line with interpretable AI principles. It then applies
these ideas to a case-study using a causal forest model to estimate conditional
average treatment effects for a hypothetical change in the school leaving age
in Australia. It shows that existing tools for understanding black-box
predictive models are poorly suited to causal machine learning and that
simplifying the model to make it interpretable leads to an unacceptable
increase in error (in this application). It concludes that new tools are needed
to properly understand causal machine learning models and the algorithms that
fit them.

arXiv link: http://arxiv.org/abs/2310.13240v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2023-10-19

A remark on moment-dependent phase transitions in high-dimensional Gaussian approximations

Authors: Anders Bredahl Kock, David Preinerstorfer

In this article, we study the critical growth rates of dimension below which
Gaussian critical values can be used for hypothesis testing but beyond which
they cannot. We are particularly interested in how these growth rates depend on
the number of moments that the observations possess.

arXiv link: http://arxiv.org/abs/2310.12863v3

Econometrics arXiv paper, submitted: 2023-10-19

Nonparametric Regression with Dyadic Data

Authors: Brice Romuald Gueyap Kounga

This paper studies the identification and estimation of a nonparametric
nonseparable dyadic model where the structural function and the distribution of
the unobservable random terms are assumed to be unknown. Identification and
estimation of the distribution of the unobservable random term are also
developed. I assume that the structural function is continuous and strictly
increasing in the unobservable heterogeneity, and I propose a suitable
normalization for identification that allows the structural function to have
desirable properties such as homogeneity of degree one in the unobservable
random term and some of the observables. Consistency and the asymptotic
distribution of the estimators are established, and the finite sample
properties of the proposed estimators are assessed in a Monte Carlo simulation.

arXiv link: http://arxiv.org/abs/2310.12825v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-10-18

Survey calibration for causal inference: a simple method to balance covariate distributions

Authors: Maciej Beręsewicz

This paper proposes a simple, yet powerful, method for balancing
distributions of covariates for causal inference based on observational
studies. The method makes it possible to balance an arbitrary number of
quantiles (e.g., medians, quartiles, or deciles) together with means if
necessary. The proposed approach is based on the theory of calibration
estimators (Deville and Särndal 1992), in particular the calibration estimators
for quantiles proposed by Harms and Duchesne (2006). The method does not
require numerical integration, kernel density estimation or assumptions about
the distributions. Valid estimates can be obtained by drawing on existing
asymptotic theory. An illustrative example of the proposed approach is
presented for the entropy balancing method and the covariate balancing
propensity score method. Results of a simulation study indicate that the method
efficiently estimates average treatment effects on the treated (ATT), the
average treatment effect (ATE), the quantile treatment effect on the treated
(QTT) and the quantile treatment effect (QTE), especially in the presence of
non-linearity and misspecification of the models. The proposed approach can be
further generalized to other designs (e.g., multi-category or continuous
treatments) or methods (e.g., the synthetic control method). Open-source
software implementing the proposed methods is available.

arXiv link: http://arxiv.org/abs/2310.11969v2

Econometrics arXiv paper, submitted: 2023-10-18

Machine Learning for Staggered Difference-in-Differences and Dynamic Treatment Effect Heterogeneity

Authors: Julia Hatamyar, Noemi Kreif, Rudi Rocha, Martin Huber

We combine two recently proposed nonparametric difference-in-differences
methods, extending them to enable the examination of treatment effect
heterogeneity in the staggered adoption setting using machine learning. The
proposed method, machine learning difference-in-differences (MLDID), allows for
estimation of time-varying conditional average treatment effects on the
treated, which can be used to conduct detailed inference on drivers of
treatment effect heterogeneity. We perform simulations to evaluate the
performance of MLDID and find that it accurately identifies the true predictors
of treatment effect heterogeneity. We then use MLDID to evaluate the
heterogeneous impacts of Brazil's Family Health Program on infant mortality,
and find those in poverty and urban locations experienced the impact of the
policy more quickly than other subgroups.

arXiv link: http://arxiv.org/abs/2310.11962v1

Econometrics arXiv updated paper (originally submitted: 2023-10-18)

Trimmed Mean Group Estimation of Average Effects in Ultra Short T Panels under Correlated Heterogeneity

Authors: M. Hashem Pesaran, Liying Yang

The commonly used two-way fixed effects estimator is biased under correlated
heterogeneity and can lead to misleading inference. This paper proposes a new
trimmed mean group (TMG) estimator, which is consistent at the irregular rate
of $n^{1/3}$ even if the time dimension of the panel is as small as the number of
its regressors. Extensions to panels with time effects are provided, and a
Hausman test of correlated heterogeneity is proposed. Small sample properties
of the TMG estimator (with and without time effects) are investigated by Monte
Carlo experiments and shown to be satisfactory and perform better than other
trimmed estimators proposed in the literature. The proposed test of correlated
heterogeneity is also shown to have the correct size and satisfactory power.
The utility of the TMG approach is illustrated with an empirical application.
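
A stylized sketch of the mean-group logic behind the estimator, with a naive symmetric trimming rule standing in for the paper's data-driven TMG trimming: unit-specific slopes are estimated by individual time-series regressions and then averaged after discarding the most extreme estimates. The 5% trimming fraction below is an illustrative assumption.

```python
# Minimal sketch of a trimmed mean group estimator for heterogeneous panel slopes.
# The symmetric 5% trimming is a placeholder for the paper's TMG trimming rule.
import numpy as np

rng = np.random.default_rng(4)
N, T = 500, 4                                # ultra short T panel
beta_i = 1.0 + rng.normal(0, 0.5, size=N)    # heterogeneous slopes
x = rng.standard_normal((N, T))
y = beta_i[:, None] * x + rng.standard_normal((N, T))

# Unit-by-unit OLS slopes (no intercept, for brevity).
slopes = (x * y).sum(axis=1) / (x ** 2).sum(axis=1)

trim = 0.05
lo, hi = np.quantile(slopes, [trim, 1 - trim])
kept = slopes[(slopes >= lo) & (slopes <= hi)]

print(f"mean group estimate:         {slopes.mean():.3f}")
print(f"trimmed mean group estimate: {kept.mean():.3f}  (true average effect = 1.0)")
```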

arXiv link: http://arxiv.org/abs/2310.11680v2

Econometrics arXiv updated paper (originally submitted: 2023-10-14)

Adaptive maximization of social welfare

Authors: Nicolo Cesa-Bianchi, Roberto Colomboni, Maximilian Kasy

We consider the problem of repeatedly choosing policies to maximize social
welfare. Welfare is a weighted sum of private utility and public revenue.
Earlier outcomes inform later policies. Utility is not observed, but indirectly
inferred. Response functions are learned through experimentation. We derive a
lower bound on regret, and a matching adversarial upper bound for a variant of
the Exp3 algorithm. Cumulative regret grows at a rate of $T^{2/3}$. This
implies that (i) welfare maximization is harder than the multi-armed bandit
problem (with a rate of $T^{1/2}$ for finite policy sets), and (ii) our
algorithm achieves the optimal rate. For the stochastic setting, if social
welfare is concave, we can achieve a rate of $T^{1/2}$ (for continuous policy
sets), using a dyadic search algorithm. We analyze an extension to nonlinear
income taxation, and sketch an extension to commodity taxation. We compare our
setting to monopoly pricing (which is easier), and price setting for bilateral
trade (which is harder).
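
For readers unfamiliar with the bandit machinery, the sketch below implements a textbook Exp3 update for a finite set of policies with payoffs in [0, 1]; the paper analyzes a variant of this scheme adapted to the welfare objective, so the code is a generic illustration rather than the authors' algorithm.

```python
# Minimal sketch of the textbook Exp3 algorithm for adversarial bandits.
import numpy as np

def exp3(reward_fn, K: int, T: int, gamma: float, seed: int = 0) -> float:
    """Run Exp3 with K arms (policies) for T rounds; rewards must lie in [0, 1]."""
    rng = np.random.default_rng(seed)
    weights = np.ones(K)
    total_reward = 0.0
    for t in range(T):
        probs = (1 - gamma) * weights / weights.sum() + gamma / K
        arm = rng.choice(K, p=probs)
        r = reward_fn(arm, t)                                 # observed payoff
        total_reward += r
        weights[arm] *= np.exp(gamma * r / (K * probs[arm]))  # importance-weighted update
    return total_reward

# Toy example: arm 2 has the highest expected payoff.
means = np.array([0.3, 0.5, 0.7])
rng = np.random.default_rng(1)
payoff = lambda a, t: float(rng.binomial(1, means[a]))
print(exp3(payoff, K=3, T=5000, gamma=0.05))
```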

arXiv link: http://arxiv.org/abs/2310.09597v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-10-14

A Semiparametric Instrumented Difference-in-Differences Approach to Policy Learning

Authors: Pan Zhao, Yifan Cui

Recently, there has been a surge in methodological development for the
difference-in-differences (DiD) approach to evaluate causal effects. Standard
methods in the literature rely on the parallel trends assumption to identify
the average treatment effect on the treated. However, the parallel trends
assumption may be violated in the presence of unmeasured confounding, and the
average treatment effect on the treated may not be useful in learning a
treatment assignment policy for the entire population. In this article, we
propose a general instrumented DiD approach for learning the optimal treatment
policy. Specifically, we establish identification results using a binary
instrumental variable (IV) when the parallel trends assumption fails to hold.
Additionally, we construct a Wald estimator, novel inverse probability
weighting (IPW) estimators, and a class of semiparametric efficient and
multiply robust estimators, with theoretical guarantees on consistency and
asymptotic normality, even when relying on flexible machine learning algorithms
for nuisance parameters estimation. Furthermore, we extend the instrumented DiD
to the panel data setting. We evaluate our methods in extensive simulations and
a real data application.
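
To convey the flavour of the identification strategy, a stylized Wald-type DiD estimand with a binary instrument \(Z\), periods \(t \in \{0,1\}\), outcome \(Y_t\), and treatment \(D_t\) can be written (in our schematic notation; the paper's estimators and assumptions are more general) as
$$\tau_W \;=\; \frac{\{E[Y_1\mid Z=1]-E[Y_0\mid Z=1]\}-\{E[Y_1\mid Z=0]-E[Y_0\mid Z=0]\}}{\{E[D_1\mid Z=1]-E[D_0\mid Z=1]\}-\{E[D_1\mid Z=0]-E[D_0\mid Z=0]\}},$$
that is, the difference-in-differences in the outcome induced by the instrument divided by the corresponding difference-in-differences in treatment take-up.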

arXiv link: http://arxiv.org/abs/2310.09545v1

Econometrics arXiv cross-link from cs.CR (cs.CR), submitted: 2023-10-13

An In-Depth Examination of Requirements for Disclosure Risk Assessment

Authors: Ron S. Jarmin, John M. Abowd, Robert Ashmead, Ryan Cumings-Menon, Nathan Goldschlag, Michael B. Hawes, Sallie Ann Keller, Daniel Kifer, Philip Leclerc, Jerome P. Reiter, Rolando A. Rodríguez, Ian Schmutte, Victoria A. Velkoff, Pavel Zhuravlev

The use of formal privacy to protect the confidentiality of responses in the
2020 Decennial Census of Population and Housing has triggered renewed interest
and debate over how to measure the disclosure risks and societal benefits of
the published data products. Following long-established precedent in economics
and statistics, we argue that any proposal for quantifying disclosure risk
should be based on pre-specified, objective criteria. Such criteria should be
used to compare methodologies to identify those with the most desirable
properties. We illustrate this approach, using simple desiderata, to evaluate
the absolute disclosure risk framework, the counterfactual framework underlying
differential privacy, and prior-to-posterior comparisons. We conclude that
satisfying all the desiderata is impossible, but counterfactual comparisons
satisfy the most while absolute disclosure risk satisfies the fewest.
Furthermore, we explain that many of the criticisms levied against differential
privacy would be levied against any technology that is not equivalent to
direct, unrestricted access to confidential data. Thus, more research is
needed, but in the near-term, the counterfactual approach appears best-suited
for privacy-utility analysis.

arXiv link: http://arxiv.org/abs/2310.09398v1

Econometrics arXiv updated paper (originally submitted: 2023-10-13)

Estimating Individual Responses when Tomorrow Matters

Authors: Stephane Bonhomme, Angela Denis

We propose a regression-based approach to estimate how individuals'
expectations influence their responses to a counterfactual change. We provide
conditions under which average partial effects based on regression estimates
recover structural effects. We propose a practical three-step estimation method
that relies on panel data on subjective expectations. We illustrate our
approach in a model of consumption and saving, focusing on the impact of an
income tax that not only changes current income but also affects beliefs about
future income. Applying our approach to Italian survey data, we find that
individuals' beliefs matter for evaluating the impact of tax policies on
consumption decisions.

arXiv link: http://arxiv.org/abs/2310.09105v3

Econometrics arXiv paper, submitted: 2023-10-13

Smoothed instrumental variables quantile regression

Authors: David M. Kaplan

In this article, I introduce the sivqr command, which estimates the
coefficients of the instrumental variables (IV) quantile regression model
introduced by Chernozhukov and Hansen (2005). The sivqr command offers several
advantages over the existing ivqreg and ivqreg2 commands for estimating this IV
quantile regression model, which complements the alternative "triangular model"
behind cqiv and the "local quantile treatment effect" model of ivqte.
Computationally, sivqr implements the smoothed estimator of Kaplan and Sun
(2017), who show that smoothing improves both computation time and statistical
accuracy. Standard errors are computed analytically or by Bayesian bootstrap;
for non-iid sampling, sivqr is compatible with bootstrap. I discuss syntax and
the underlying methodology, and I compare sivqr with other commands in an
example.

arXiv link: http://arxiv.org/abs/2310.09013v1

Econometrics arXiv updated paper (originally submitted: 2023-10-12)

Machine Learning Who to Nudge: Causal vs Predictive Targeting in a Field Experiment on Student Financial Aid Renewal

Authors: Susan Athey, Niall Keleher, Jann Spiess

In many settings, interventions may be more effective for some individuals
than others, so that targeting interventions may be beneficial. We analyze the
value of targeting in the context of a large-scale field experiment with over
53,000 college students, where the goal was to use "nudges" to encourage
students to renew their financial-aid applications before a non-binding
deadline. We begin with baseline approaches to targeting. First, we target
based on a causal forest that estimates heterogeneous treatment effects and
then assigns students to treatment according to those estimated to have the
highest treatment effects. Next, we evaluate two alternative targeting
policies, one targeting students with low predicted probability of renewing
financial aid in the absence of the treatment, the other targeting those with
high probability. The predicted baseline outcome is not the ideal criterion for
targeting, nor is it a priori clear whether to prioritize low, high, or
intermediate predicted probability. Nonetheless, targeting on low baseline
outcomes is common in practice, for example because the relationship between
individual characteristics and treatment effects is often difficult or
impossible to estimate with historical data. We propose hybrid approaches that
incorporate the strengths of both predictive approaches (accurate estimation)
and causal approaches (correct criterion); we show that targeting intermediate
baseline outcomes is most effective in our specific application, while
targeting based on low baseline outcomes is detrimental. In one year of the
experiment, nudging all students improved early filing by an average of 6.4
percentage points over a baseline average of 37% filing, and we estimate that
targeting half of the students using our preferred policy attains around 75% of
this benefit.

arXiv link: http://arxiv.org/abs/2310.08672v2

Econometrics arXiv updated paper (originally submitted: 2023-10-12)

Real-time Prediction of the Great Recession and the Covid-19 Recession

Authors: Seulki Chung

This paper uses standard and penalized logistic regression models to predict
the Great Recession and the Covid-19 recession in the US in real time. It
examines the predictability of various macroeconomic and financial indicators
with respect to the NBER recession indicator. The findings strongly support the
use of penalized logistic regression models in recession forecasting. These
models, particularly the ridge logistic regression model, outperform the
standard logistic regression model in predicting the Great Recession in the US
across different forecast horizons. The study also confirms the traditional
significance of the term spread as an important recession indicator. However,
it acknowledges that the Covid-19 recession remains unpredictable due to the
unprecedented nature of the pandemic. The results are validated by creating a
recession indicator through principal component analysis (PCA) on selected
variables, which strongly correlates with the NBER recession indicator and is
less affected by publication lags.
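
As a concrete illustration of the penalized approach, the sketch below fits an L2-penalized (ridge) logistic regression of a binary recession indicator on lagged predictors; the simulated predictors, forecast horizon, and penalty level are placeholders, since the paper's real-time exercise uses vintages of macro-financial indicators and formal tuning.

```python
# Minimal sketch: ridge (L2-penalized) logistic regression for recession prediction.
# Predictors, horizon, and penalty strength are illustrative placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
n, k, horizon = 600, 10, 12
X = rng.standard_normal((n, k))                 # e.g. term spread, credit spreads, ...
latent = -1.5 + 2.0 * X[:, 0] - 1.0 * X[:, 1]   # term-spread-like signal
recession = (latent + rng.logistic(size=n) > 0).astype(int)

# Align predictors with the recession indicator `horizon` months ahead.
X_train, y_train = X[:-horizon], recession[horizon:]

ridge_logit = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l2", C=1.0, max_iter=1000),
)
ridge_logit.fit(X_train, y_train)
prob_ahead = ridge_logit.predict_proba(X[-horizon:])[:, 1]
print("predicted recession probabilities:", np.round(prob_ahead[:5], 3))
```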

arXiv link: http://arxiv.org/abs/2310.08536v5

Econometrics arXiv paper, submitted: 2023-10-12

Structural Vector Autoregressions and Higher Moments: Challenges and Solutions in Small Samples

Authors: Sascha A. Keweloh

Generalized method of moments estimators based on higher-order moment
conditions derived from independent shocks can be used to identify and estimate
the simultaneous interaction in structural vector autoregressions. This study
highlights two problems that arise when using these estimators in small
samples. First, imprecise estimates of the asymptotically efficient weighting
matrix and the asymptotic variance lead to volatile estimates and inaccurate
inference. Second, many moment conditions lead to a small sample scaling bias
towards innovations with a variance smaller than the normalizing unit variance
assumption. To address the first problem, I propose utilizing the assumption of
independent structural shocks to estimate the efficient weighting matrix and
the variance of the estimator. For the second issue, I propose incorporating a
continuously updated scaling term into the weighting matrix, eliminating the
scaling bias. To demonstrate the effectiveness of these measures, I conduct a
Monte Carlo simulation, which shows a significant improvement in the performance
of the estimator.

arXiv link: http://arxiv.org/abs/2310.08173v1

Econometrics arXiv updated paper (originally submitted: 2023-10-12)

Model-Agnostic Covariate-Assisted Inference on Partially Identified Causal Effects

Authors: Wenlong Ji, Lihua Lei, Asher Spector

Many causal estimands are only partially identifiable since they depend on
the unobservable joint distribution between potential outcomes. Stratification
on pretreatment covariates can yield sharper bounds; however, unless the
covariates are discrete with relatively small support, this approach typically
requires binning covariates or estimating the conditional distributions of the
potential outcomes given the covariates. Binning can result in substantial
efficiency loss and become challenging to implement, even with a moderate
number of covariates. Estimating conditional distributions, on the other hand,
may yield invalid inference if the distributions are inaccurately estimated,
such as when a misspecified model is used or when the covariates are
high-dimensional. In this paper, we propose a unified and model-agnostic
inferential approach for a wide class of partially identified estimands. Our
method, based on duality theory for optimal transport problems, has four key
properties. First, in randomized experiments, our approach can wrap around any
estimates of the conditional distributions and provide uniformly valid
inference, even if the initial estimates are arbitrarily inaccurate. A simple
extension of our method to observational studies is doubly robust in the usual
sense. Second, if nuisance parameters are estimated at semiparametric rates,
our estimator is asymptotically unbiased for the sharp partial identification
bound. Third, we can apply the multiplier bootstrap to select covariates and
models without sacrificing validity, even if the true model is not selected.
Finally, our method is computationally efficient. Overall, in three empirical
applications, our method consistently reduces the width of estimated identified
sets and confidence intervals without making additional structural assumptions.

arXiv link: http://arxiv.org/abs/2310.08115v2

Econometrics arXiv updated paper (originally submitted: 2023-10-12)

Inference for Nonlinear Endogenous Treatment Effects Accounting for High-Dimensional Covariate Complexity

Authors: Qingliang Fan, Zijian Guo, Ziwei Mei, Cun-Hui Zhang

Nonlinearity and endogeneity are prevalent challenges in causal analysis
using observational data. This paper proposes an inference procedure for a
nonlinear and endogenous marginal effect function, defined as the derivative of
the nonparametric treatment function, with a primary focus on an additive model
that includes high-dimensional covariates. Using the control function approach
for identification, we implement a regularized nonparametric estimation to
obtain an initial estimator of the model. Such an initial estimator suffers
from two biases: the bias in estimating the control function and the
regularization bias for the high-dimensional outcome model. Our key innovation
is to devise the double bias correction procedure that corrects these two
biases simultaneously. Building on this debiased estimator, we further provide
a confidence band of the marginal effect function. Simulations and an empirical
study of air pollution and migration demonstrate the validity of our
procedures.

arXiv link: http://arxiv.org/abs/2310.08063v3

Econometrics arXiv paper, submitted: 2023-10-11

Marital Sorting, Household Inequality and Selection

Authors: Iván Fernández-Val, Aico van Vuuren, Francis Vella

Using CPS data for 1976 to 2022 we explore how wage inequality has evolved
for married couples with both spouses working full time full year, and its
impact on household income inequality. We also investigate how marriage sorting
patterns have changed over this period. To determine the factors driving income
inequality we estimate a model explaining the joint distribution of wages which
accounts for the spouses' employment decisions. We find that income inequality
has increased for these households and increased assortative matching of wages
has exacerbated the inequality resulting from individual wage growth. We find
that positive sorting partially reflects the correlation across the unobservables
influencing the wages of both members of the marriage. We decompose the changes in
sorting patterns over the 47 years comprising our sample into structural,
composition and selection effects and find that the increase in positive
sorting primarily reflects the increased skill premia for both observed and
unobserved characteristics.

arXiv link: http://arxiv.org/abs/2310.07839v1

Econometrics arXiv paper, submitted: 2023-10-11

Integration or fragmentation? A closer look at euro area financial markets

Authors: Martin Feldkircher, Karin Klieber

This paper examines the degree of integration at euro area financial markets.
To that end, we estimate overall and country-specific integration indices based
on a panel vector-autoregression with factor stochastic volatility. Our results
indicate a more heterogeneous bond market compared to the market for lending
rates. At both markets, the global financial crisis and the sovereign debt
crisis led to a severe decline in financial integration, which fully recovered
since then. We furthermore identify countries that deviate from their peers
either by responding differently to crisis events or by taking on different
roles in the spillover network. The latter analysis reveals two sets of
countries: a main body of countries that receive and transmit spillovers, and a
second, smaller group of spillover-absorbing economies.
Finally, we demonstrate by estimating an augmented Taylor rule that euro area
short-term interest rates are positively linked to the level of integration on
the bond market.

arXiv link: http://arxiv.org/abs/2310.07790v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2023-10-11

Smoothness-Adaptive Dynamic Pricing with Nonparametric Demand Learning

Authors: Zeqi Ye, Hansheng Jiang

We study the dynamic pricing problem where the demand function is
nonparametric and Hölder smooth, and we focus on adaptivity to the unknown
Hölder smoothness parameter $\beta$ of the demand function. Traditionally the
optimal dynamic pricing algorithm heavily relies on the knowledge of $\beta$ to
achieve a minimax optimal regret of
$O(T^{\beta+1{2\beta+1}})$. However, we highlight the
challenge of adaptivity in this dynamic pricing problem by proving that no
pricing policy can adaptively achieve this minimax optimal regret without
knowledge of $\beta$. Motivated by the impossibility result, we propose a
self-similarity condition to enable adaptivity. Importantly, we show that the
self-similarity condition does not compromise the problem's inherent complexity
since it preserves the regret lower bound
$\Omega(T^{\frac{\beta+1}{2\beta+1}})$. Furthermore, we develop a
smoothness-adaptive dynamic pricing algorithm and theoretically prove that the
algorithm achieves this minimax optimal regret bound without prior knowledge of
$\beta$.

arXiv link: http://arxiv.org/abs/2310.07558v2

Econometrics arXiv updated paper (originally submitted: 2023-10-11)

Identification and Estimation of a Semiparametric Logit Model using Network Data

Authors: Brice Romuald Gueyap Kounga

This paper studies the identification and estimation of a semiparametric
binary network model in which the unobserved social characteristic is
endogenous, that is, the unobserved individual characteristic influences both
the binary outcome of interest and how links are formed within the network. The
exact functional form of the latent social characteristic is not known. The
proposed estimators are obtained based on matching pairs of agents whose
network formation distributions are the same. The consistency and the
asymptotic distribution of the estimators are proposed. The finite sample
properties of the proposed estimators in a Monte-Carlo simulation are assessed.
We conclude this study with an empirical application.

arXiv link: http://arxiv.org/abs/2310.07151v2

Econometrics arXiv paper, submitted: 2023-10-10

Treatment Choice, Mean Square Regret and Partial Identification

Authors: Toru Kitagawa, Sokbae Lee, Chen Qiu

We consider a decision maker who faces a binary treatment choice when their
welfare is only partially identified from data. We contribute to the literature
by anchoring our finite-sample analysis on mean square regret, a decision
criterion advocated by Kitagawa, Lee, and Qiu (2022). We find that optimal
rules are always fractional, irrespective of the width of the identified set
and precision of its estimate. The optimal treatment fraction is a simple
logistic transformation of the commonly used t-statistic multiplied by a factor
calculated by a simple constrained optimization. This treatment fraction gets
closer to 0.5 as the width of the identified set becomes wider, implying the
decision maker becomes more cautious against the adversarial Nature.

arXiv link: http://arxiv.org/abs/2310.06242v1

Econometrics arXiv paper, submitted: 2023-10-09

Robust Minimum Distance Inference in Structural Models

Authors: Joan Alegre, Juan Carlos Escanciano

This paper proposes minimum distance inference for a structural parameter of
interest, which is robust to the lack of identification of other structural
nuisance parameters. Some choices of the weighting matrix lead to asymptotic
chi-squared distributions with degrees of freedom that can be consistently
estimated from the data, even under partial identification. In any case,
knowledge of the level of under-identification is not required. We study the
power of our robust test. Several examples show the wide applicability of the
procedure and a Monte Carlo investigates its finite sample performance. Our
identification-robust inference method can be applied to make inferences on
both calibrated (fixed) parameters and any other structural parameter of
interest. We illustrate the method's usefulness by applying it to a structural
model on the non-neutrality of monetary policy, as in Nakamura and Steinsson (2018),
where we empirically evaluate the validity of the calibrated parameters and we
carry out robust inference on the slope of the Phillips curve and the
information effect.

arXiv link: http://arxiv.org/abs/2310.05761v1

Econometrics arXiv paper, submitted: 2023-10-08

Identification and Estimation in a Class of Potential Outcomes Models

Authors: Manu Navjeevan, Rodrigo Pinto, Andres Santos

This paper develops a class of potential outcomes models characterized by
three main features: (i) Unobserved heterogeneity can be represented by a
vector of potential outcomes and a type describing the manner in which an
instrument determines the choice of treatment; (ii) The availability of an
instrumental variable that is conditionally independent of unobserved
heterogeneity; and (iii) The imposition of convex restrictions on the
distribution of unobserved heterogeneity. The proposed class of models
encompasses multiple classical and novel research designs, yet possesses a
common structure that permits a unifying analysis of identification and
estimation. In particular, we establish that these models share a common
necessary and sufficient condition for identifying certain causal parameters.
Our identification results are constructive in that they yield estimating
moment conditions for the parameters of interest. Focusing on a leading special
case of our framework, we further show how these estimating moment conditions
may be modified to be doubly robust. The corresponding double robust estimators
are shown to be asymptotically normally distributed, bootstrap based inference
is shown to be asymptotically valid, and the semi-parametric efficiency bound
is derived for those parameters that are root-n estimable. We illustrate the
usefulness of our results for developing, identifying, and estimating causal
models through an empirical evaluation of the role of mental health as a
mediating variable in the Moving To Opportunity experiment.

arXiv link: http://arxiv.org/abs/2310.05311v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-10-07

On changepoint detection in functional data using empirical energy distance

Authors: B. Cooper Boniece, Lajos Horváth, Lorenzo Trapani

We propose a novel family of test statistics to detect the presence of
changepoints in a sequence of dependent, possibly multivariate,
functional-valued observations. Our approach allows testing for a very general
class of changepoints, including the "classical" case of changes in the mean,
and even changes in the whole distribution. Our statistics are based on a
generalisation of the empirical energy distance; we propose weighted
functionals of the energy distance process, which are designed in order to
enhance the ability to detect breaks occurring at sample endpoints. The
limiting distribution of the maximally selected version of our statistics
requires only the computation of the eigenvalues of the covariance function,
thus being readily implementable in the most commonly employed packages, e.g.
R. We show that, under the alternative, our statistics are able to detect
changepoints occurring even very close to the beginning/end of the sample. In
the presence of multiple changepoints, we propose a binary segmentation
algorithm to estimate the number of breaks and the locations thereof.
Simulations show that our procedures work very well in finite samples. We
complement our theory with applications to financial and temperature data.

arXiv link: http://arxiv.org/abs/2310.04853v1

Econometrics arXiv updated paper (originally submitted: 2023-10-06)

Challenges in Statistically Rejecting the Perfect Competition Hypothesis Using Imperfect Competition Data

Authors: Yuri Matsumura, Suguru Otani

We theoretically prove why statistically rejecting the null hypothesis of
perfect competition is challenging, known as a common problem in the
literature. We also assess the finite sample performance of the conduct
parameter test in homogeneous goods markets, showing that statistical power
increases with the number of markets, a larger conduct parameter, and a
stronger demand rotation instrument. However, even with a moderate number of
markets and five firms, rejecting the null hypothesis of perfect competition
remains difficult, irrespective of instrument strength or the use of optimal
instruments. Our findings suggest that empirical results failing to reject
perfect competition are due to the limited number of markets rather than
methodological shortcomings.

arXiv link: http://arxiv.org/abs/2310.04576v4

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-10-05

Cutting Feedback in Misspecified Copula Models

Authors: Michael Stanley Smith, Weichang Yu, David J. Nott, David Frazier

In copula models the marginal distributions and copula function are specified
separately. We treat these as two modules in a modular Bayesian inference
framework, and propose conducting modified Bayesian inference by "cutting
feedback". Cutting feedback limits the influence of potentially misspecified
modules in posterior inference. We consider two types of cuts. The first limits
the influence of a misspecified copula on inference for the marginals, which is
a Bayesian analogue of the popular Inference for Margins (IFM) estimator. The
second limits the influence of misspecified marginals on inference for the
copula parameters by using a pseudo likelihood of the ranks to define the cut
model. We establish that if only one of the modules is misspecified, then the
appropriate cut posterior gives accurate uncertainty quantification
asymptotically for the parameters in the other module. Computation of the cut
posteriors is difficult, and new variational inference methods to do so are
proposed. The efficacy of the new methodology is demonstrated using both
simulated data and a substantive multivariate time series application from
macroeconomic forecasting. In the latter, cutting feedback from misspecified
marginals to a 1096-dimensional copula improves posterior inference and
predictive accuracy greatly, compared to conventional Bayesian inference.

arXiv link: http://arxiv.org/abs/2310.03521v2

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2023-10-05

Variational Inference for GARCH-family Models

Authors: Martin Magris, Alexandros Iosifidis

The Bayesian estimation of GARCH-family models has been typically addressed
through Monte Carlo sampling. Variational Inference is gaining popularity and
attention as a robust approach for Bayesian inference in complex machine
learning models; however, its adoption in econometrics and finance is limited.
This paper discusses the extent to which Variational Inference constitutes a
reliable and feasible alternative to Monte Carlo sampling for Bayesian
inference in GARCH-like models. Through a large-scale experiment involving the
constituents of the S&P 500 index, several Variational Inference optimizers, a
variety of volatility models, and a case study, we show that Variational
Inference is an attractive, remarkably well-calibrated, and competitive method
for Bayesian learning.

arXiv link: http://arxiv.org/abs/2310.03435v1

Econometrics arXiv paper, submitted: 2023-10-04

Moran's I Lasso for models with spatially correlated data

Authors: Sylvain Barde, Rowan Cherodian, Guy Tchuente

This paper proposes a Lasso-based estimator which uses information embedded
in the Moran statistic to develop a selection procedure called Moran's I Lasso
(Mi-Lasso) to solve the Eigenvector Spatial Filtering (ESF) eigenvector
selection problem. ESF uses a subset of eigenvectors from a spatial weights
matrix to efficiently account for any omitted cross-sectional correlation terms
in a classical linear regression framework, and thus does not require the
researcher to explicitly specify the spatial part of the underlying structural
model. We derive performance bounds and show the necessary conditions for
consistent eigenvector selection. The key advantages of the proposed estimator
are that it is intuitive, theoretically grounded, and substantially faster than
Lasso based on cross-validation or any proposed forward stepwise procedure. Our
main simulation results show the proposed selection procedure performs well in
finite samples. Compared to existing selection procedures, we find Mi-Lasso has
one of the smallest biases and mean squared errors across a range of sample
sizes and levels of spatial correlation. An application on house prices further
demonstrates Mi-Lasso performs well compared to existing procedures.
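
A minimal sketch of the ESF-plus-Lasso machinery the estimator builds on: eigenvectors of the doubly centred spatial weights matrix enter as candidate regressors and a Lasso selects among them, followed by post-selection OLS. The fixed penalty level is a placeholder; deriving it from the Moran statistic is precisely the Mi-Lasso contribution and is not reproduced here.

```python
# Minimal sketch of eigenvector spatial filtering (ESF) with Lasso selection.
# The fixed Lasso penalty is a placeholder; Mi-Lasso ties it to Moran's I instead.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(6)
n = 200
coords = rng.uniform(size=(n, 2))
dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=2)
W = (dist < 0.15).astype(float)                  # binary contiguity-style weights
np.fill_diagonal(W, 0.0)

# Eigenvectors of the doubly centred weights matrix M W M, with M = I - 11'/n.
M = np.eye(n) - np.ones((n, n)) / n
eigval, eigvec = np.linalg.eigh(M @ W @ M)
E = eigvec[:, np.argsort(eigval)[::-1][:50]]     # leading candidate spatial filters

x = rng.standard_normal(n)
y = 1.0 + 2.0 * x + 3.0 * E[:, 0] + rng.standard_normal(n)   # one spatial component

lasso = Lasso(alpha=0.05).fit(np.column_stack([x, E]), y)    # placeholder penalty
selected = np.flatnonzero(np.abs(lasso.coef_[1:]) > 1e-8)

# Post-Lasso: unpenalized OLS of y on x and the selected eigenvectors.
Z_sel = np.column_stack([np.ones(n), x, E[:, selected]])
beta = np.linalg.lstsq(Z_sel, y, rcond=None)[0]
print(f"selected eigenvectors: {selected[:10]}")
print(f"post-Lasso slope on x: {beta[1]:.3f} (true value 2.0)")
```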

arXiv link: http://arxiv.org/abs/2310.02773v1

Econometrics arXiv updated paper (originally submitted: 2023-10-03)

Sharp and Robust Estimation of Partially Identified Discrete Response Models

Authors: Shakeeb Khan, Tatiana Komarova, Denis Nekipelov

Semiparametric discrete choice models are widely used in a variety of
practical applications. While these models are point identified in the presence
of continuous covariates, they can become partially identified when covariates
are discrete. In this paper we find that classical estimators, including the
maximum score estimator (Manski, 1975), lose their attractive statistical
properties without point identification. First, they are not sharp, with the
estimator converging to an outer region of the identified set (Komarova, 2013),
and in many discrete designs it weakly converges to a random set.
Second, they are not robust, with their distribution limit discontinuously
changing with respect to the parameters of the model. We propose a novel class
of estimators based on the concept of a quantile of a random set, which we show
to be both sharp and robust. We demonstrate that our approach extends from
cross-sectional settings to classical static and dynamic discrete panel data
models.

arXiv link: http://arxiv.org/abs/2310.02414v4

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2023-10-03

fmeffects: An R Package for Forward Marginal Effects

Authors: Holger Löwe, Christian A. Scholbeck, Christian Heumann, Bernd Bischl, Giuseppe Casalicchio

Forward marginal effects have recently been introduced as a versatile and
effective model-agnostic interpretation method particularly suited for
non-linear and non-parametric prediction models. They provide comprehensible
model explanations of the form: if we change feature values by a pre-specified
step size, what is the change in the predicted outcome? We present the R
package fmeffects, the first software implementation of the theory surrounding
forward marginal effects. The relevant theoretical background, package
functionality and handling, as well as the software design and options for
future extensions are discussed in this paper.
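
The concept is easy to state in code: for a fitted prediction model, the forward marginal effect of a step of size h in a feature is the change in the model's prediction when that feature is moved by h. The Python sketch below mimics this idea for a generic scikit-learn model; it is not a port of the fmeffects API.

```python
# Minimal sketch of a forward marginal effect (FME) for a fitted prediction model:
# FME_i = f(x_i with feature j shifted by h) - f(x_i). Not a port of the R package.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def forward_marginal_effects(model, X: np.ndarray, feature: int, step: float) -> np.ndarray:
    X_shifted = X.copy()
    X_shifted[:, feature] += step
    return model.predict(X_shifted) - model.predict(X)

rng = np.random.default_rng(7)
X = rng.uniform(-2, 2, size=(1000, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.1, size=1000)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
fme = forward_marginal_effects(rf, X, feature=0, step=0.5)
print(f"average FME of a +0.5 step in feature 0: {fme.mean():.3f}")
```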

arXiv link: http://arxiv.org/abs/2310.02008v2

Econometrics arXiv updated paper (originally submitted: 2023-10-03)

Specification testing with grouped fixed effects

Authors: Claudia Pigini, Alessandro Pionati, Francesco Valentini

We propose a Hausman test for the correct specification of unobserved
heterogeneity in both linear and nonlinear fixed-effects panel data models. The
null hypothesis is that heterogeneity is either time-invariant or,
symmetrically, described by homogeneous time effects. We contrast the standard
one-way fixed-effects estimator with the recently developed two-way grouped
fixed-effects estimator, which is consistent in the presence of time-varying
heterogeneity (or heterogeneous time effects) under minimal specification and
distributional assumptions for the unobserved effects. The Hausman test
compares jackknife corrected estimators, removing the leading term of the
incidental parameters and approximation biases, and exploits bootstrap to
obtain the variance of the vector of contrasts. We provide Monte Carlo evidence
on the size and power properties of the test and illustrate its application in
two empirical settings.
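
In schematic form, with \(\hat\theta_{FE}\) the one-way fixed-effects estimator, \(\hat\theta_{GFE}\) the two-way grouped fixed-effects estimator (both jackknife corrected) and \(\hat V\) a bootstrap estimate of the variance of their contrast, the statistic has the familiar Hausman form (our notation, not necessarily the authors')
$$H = (\hat\theta_{GFE} - \hat\theta_{FE})'\,\hat V^{-1}\,(\hat\theta_{GFE} - \hat\theta_{FE}),$$
which is compared with a chi-square critical value with degrees of freedom equal to the dimension of the contrast.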

arXiv link: http://arxiv.org/abs/2310.01950v3

Econometrics arXiv updated paper (originally submitted: 2023-10-02)

Impact of Economic Uncertainty, Geopolitical Risk, Pandemic, Financial & Macroeconomic Factors on Crude Oil Returns -- An Empirical Investigation

Authors: Sarit Maitra

This study aims to use simultaneous quantile regression (SQR) to examine the
impact of macroeconomic and financial uncertainty including global pandemic,
geopolitical risk on the futures returns of crude oil (ROC). The data for this
study is sourced from the FRED (Federal Reserve Economic Database) economic
dataset; the importance of the factors has been validated using the variance
inflation factor (VIF) and principal component analysis (PCA). To fully
understand the combined effect of these factors on WTI, the study includes
interaction terms in the multi-factor model. Empirical results suggest that
changes in ROC can have varying impacts depending on the specific period and
market conditions. The results can be used for informed investment decisions
and to construct portfolios that are well-balanced in terms of risk and return.
Structural breaks, such as changes in global economic conditions or shifts in
demand for crude oil, can cause the return on crude oil to be sensitive to
changes in different time periods. The unique aspect of this study also lies in
its inclusion of explanatory factors related to the pandemic, geopolitical
risk, and inflation.

arXiv link: http://arxiv.org/abs/2310.01123v2

Econometrics arXiv cross-link from q-fin.MF (q-fin.MF), submitted: 2023-10-02

Multi-period static hedging of European options

Authors: Purba Banerjee, Srikanth Iyer, Shashi Jain

We consider the hedging of European options when the price of the underlying
asset follows a single-factor Markovian framework. By working in such a
setting, Carr and Wu (2014) derived a spanning relation between
a given option and a continuum of shorter-term options written on the same
asset. In this paper, we have extended their approach to simultaneously include
options over multiple short maturities. We then show a practical implementation
of this with a finite set of shorter-term options to determine the hedging
error using a Gaussian Quadrature method. We perform a wide range of
experiments for both the Black-Scholes and Merton Jump
Diffusion models, illustrating the comparative performance of the two methods.

arXiv link: http://arxiv.org/abs/2310.01104v3

Econometrics arXiv updated paper (originally submitted: 2023-10-01)

Semidiscrete optimal transport with unknown costs

Authors: Yinchu Zhu, Ilya O. Ryzhov

Semidiscrete optimal transport is a challenging generalization of the
classical transportation problem in linear programming. The goal is to design a
joint distribution for two random variables (one continuous, one discrete) with
fixed marginals, in a way that minimizes expected cost. We formulate a novel
variant of this problem in which the cost functions are unknown, but can be
learned through noisy observations; however, only one function can be sampled
at a time. We develop a semi-myopic algorithm that couples online learning with
stochastic approximation, and prove that it achieves optimal convergence rates,
despite the non-smoothness of the stochastic gradient and the lack of strong
concavity in the objective function.

arXiv link: http://arxiv.org/abs/2310.00786v3

Econometrics arXiv cross-link from stat.CO (stat.CO), submitted: 2023-10-01

CausalGPS: An R Package for Causal Inference With Continuous Exposures

Authors: Naeem Khoshnevis, Xiao Wu, Danielle Braun

Quantifying the causal effects of continuous exposures on outcomes of
interest is critical for social, economic, health, and medical research.
However, most existing software packages focus on binary exposures. We develop
the CausalGPS R package, which implements a collection of algorithms for causal
inference with continuous exposures. CausalGPS
implements a causal inference workflow, with algorithms based on generalized
propensity scores (GPS) as the core, extending propensity scores (the
probability of a unit being exposed given pre-exposure covariates) from binary
to continuous exposures. As the first step, the package implements efficient
and flexible estimations of the GPS, allowing multiple user-specified modeling
options. As the second step, the package provides two ways to adjust for
confounding: weighting and matching, generating weighted and matched data sets,
respectively. Lastly, the package provides built-in functions to fit flexible
parametric, semi-parametric, or non-parametric regression models on the
weighted or matched data to estimate the exposure-response function relating
the outcome with the exposures. The computationally intensive tasks are
implemented in C++, and efficient shared-memory parallelization is achieved by
OpenMP API. This paper outlines the main components of the CausalGPS R package
and demonstrates its application to assess the effect of long-term exposure to
PM2.5 on educational attainment using zip code-level data from the contiguous
United States from 2000-2016.
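
To illustrate the generalized propensity score idea in a language-agnostic way (the package itself is written in R), the Python sketch below estimates a GPS for a continuous exposure as a conditional normal density and forms stabilized inverse-probability weights; the homoskedastic linear exposure model and the crude weighted slope are simplifying assumptions, not the package's algorithms.

```python
# Minimal sketch: generalized propensity score (GPS) weighting for a continuous exposure.
# A homoskedastic conditional normal density stands in for the richer GPS models in CausalGPS.
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(8)
n = 5000
X = rng.standard_normal((n, 4))                                    # pre-exposure covariates
A = X @ np.array([0.6, -0.3, 0.2, 0.0]) + rng.normal(0, 1.0, n)    # continuous exposure
Y = 0.8 * A + X @ np.array([1.0, 1.0, -0.5, 0.2]) + rng.standard_normal(n)

# GPS: density of the exposure given covariates, from a linear exposure model.
exp_model = LinearRegression().fit(X, A)
gps = norm.pdf(A, loc=exp_model.predict(X), scale=(A - exp_model.predict(X)).std())

# Stabilized weights: marginal density of the exposure divided by the GPS.
weights = norm.pdf(A, loc=A.mean(), scale=A.std()) / gps
weights = np.clip(weights, None, np.quantile(weights, 0.99))       # mild truncation

# Weighted regression of the outcome on the exposure as a crude exposure-response slope.
slope = np.polyfit(A, Y, deg=1, w=weights)[0]
print(f"weighted exposure-response slope: {slope:.3f} (true effect = 0.8)")
```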

arXiv link: http://arxiv.org/abs/2310.00561v1

Econometrics arXiv cross-link from math.OC (math.OC), submitted: 2023-09-30

On Sinkhorn's Algorithm and Choice Modeling

Authors: Zhaonan Qu, Alfred Galichon, Wenzhi Gao, Johan Ugander

For a broad class of models widely used in practice for choice and ranking
data based on Luce's choice axiom, including the Bradley--Terry--Luce and
Plackett--Luce models, we show that the associated maximum likelihood
estimation problems are equivalent to a classic matrix balancing problem with
target row and column sums. This perspective opens doors between two seemingly
unrelated research areas, and allows us to unify existing algorithms in the
choice modeling literature as special instances or analogs of Sinkhorn's
celebrated algorithm for matrix balancing. We draw inspirations from these
connections and resolve some open problems on the study of Sinkhorn's
algorithm. We establish the global linear convergence of Sinkhorn's algorithm
for non-negative matrices whenever finite scaling matrices exist, and
characterize its linear convergence rate in terms of the algebraic connectivity
of a weighted bipartite graph. We further derive the sharp asymptotic rate of
linear convergence, which generalizes a classic result of Knight (2008). To our
knowledge, these are the first quantitative linear convergence results for
Sinkhorn's algorithm for general non-negative matrices and positive marginals.
Our results highlight the importance of connectivity and orthogonality
structures in matrix balancing and Sinkhorn's algorithm, which could be of
independent interest. More broadly, the connections we establish in this paper
between matrix balancing and choice modeling could also help motivate further
transmission of ideas and lead to interesting results in both disciplines.

arXiv link: http://arxiv.org/abs/2310.00260v2

Econometrics arXiv cross-link from cs.HC (cs.HC), submitted: 2023-09-30

Identification, Impacts, and Opportunities of Three Common Measurement Considerations when using Digital Trace Data

Authors: Daniel Muise, Nilam Ram, Thomas Robinson, Byron Reeves

Cataloguing specific URLs, posts, and applications with digital traces is the
new best practice for measuring media use and content consumption. Despite the
apparent accuracy that comes with greater granularity, however, digital traces
may introduce additional ambiguity and new errors into the measurement of media
use. In this note, we identify three new measurement challenges when using
Digital Trace Data that were recently uncovered using a new measurement
framework - Screenomics - that records media use at the granularity of
individual screenshots obtained every few seconds as people interact with
mobile devices. We label the considerations as follows: (1) entangling - the
common measurement error introduced by proxying exposure to content by exposure
to format; (2) flattening - aggregating unique segments of media interaction
without incorporating temporal information, most commonly intraindividually and
(3) bundling - summation of the durations of segments of media interaction,
indiscriminate with respect to variations across media segments.

arXiv link: http://arxiv.org/abs/2310.00197v1

Econometrics arXiv paper, submitted: 2023-09-28

Smoothing the Nonsmoothness

Authors: Chaohua Dong, Jiti Gao, Bin Peng, Yundong Tu

To tackle difficulties for theoretical studies in situations involving
nonsmooth functions, we propose a sequence of infinitely differentiable
functions to approximate the nonsmooth function under consideration. A rate of
approximation is established and an illustration of its application is then
provided.

arXiv link: http://arxiv.org/abs/2309.16348v1
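
As a concrete, standard illustration of the idea (not necessarily the
approximating sequence proposed in the paper): the nonsmooth function $|x|$
admits the infinitely differentiable approximations

    f_k(x) = \sqrt{x^2 + k^{-2}}, \qquad
    \sup_{x \in \mathbb{R}} \bigl| f_k(x) - |x| \bigr| \le \frac{1}{k},

so the approximation error vanishes at rate $k^{-1}$ as $k \to \infty$.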

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-09-27

Causal Panel Analysis under Parallel Trends: Lessons from a Large Reanalysis Study

Authors: Albert Chiu, Xingchen Lan, Ziyi Liu, Yiqing Xu

Two-way fixed effects (TWFE) models are widely used in political science to
establish causality, but recent methodological discussions highlight their
limitations under heterogeneous treatment effects (HTE) and violations of the
parallel trends (PT) assumption. This growing literature has introduced
numerous new estimators and procedures, causing confusion among researchers
about the reliability of existing results and best practices. To address these
concerns, we replicated and reanalyzed 49 studies from leading journals using
TWFE models for observational panel data with binary treatments. Using six
HTE-robust estimators, diagnostic tests, and sensitivity analyses, we find: (i)
HTE-robust estimators yield qualitatively similar but highly variable results;
(ii) while a few studies show clear signs of PT violations, many lack evidence
to support this assumption; and (iii) many studies are underpowered when
accounting for HTE and potential PT violations. We emphasize the importance of
strong research designs and rigorous validation of key identifying assumptions.

arXiv link: http://arxiv.org/abs/2309.15983v6

Econometrics arXiv paper, submitted: 2023-09-27

Sluggish news reactions: A combinatorial approach for synchronizing stock jumps

Authors: Nabil Bouamara, Kris Boudt, Sébastien Laurent, Christopher J. Neely

Stock prices often react sluggishly to news, producing gradual jumps and jump
delays. Econometricians typically treat these sluggish reactions as
microstructure effects and settle for a coarse sampling grid to guard against
them. Synchronizing mistimed stock returns on a fine sampling grid allows us to
automatically detect noisy jumps and better approximate the true common jumps
in related stock prices.

arXiv link: http://arxiv.org/abs/2309.15705v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-09-26

Double machine learning and design in batch adaptive experiments

Authors: Harrison H. Li, Art B. Owen

We consider an experiment with at least two stages or batches and $O(N)$
subjects per batch. First, we propose a semiparametric treatment effect
estimator that efficiently pools information across the batches, and show it
asymptotically dominates alternatives that aggregate single batch estimates.
Then, we consider the design problem of learning propensity scores for
assigning treatment in the later batches of the experiment to maximize the
asymptotic precision of this estimator. For two common causal estimands, we
estimate this precision using observations from previous batches, and then
solve a finite-dimensional concave maximization problem to adaptively learn
flexible propensity scores that converge to suitably defined optima in each
batch at rate $O_p(N^{-1/4})$. By extending the framework of double machine
learning, we show this rate suffices for our pooled estimator to attain the
targeted precision after each batch, as long as nuisance function estimates
converge at rate $o_p(N^{-1/4})$. These relatively weak rate requirements
enable the investigator to avoid the common practice of discretizing the
covariate space for design and estimation in batch adaptive experiments while
maintaining the advantages of pooling. Our numerical study shows that such
discretization often leads to substantial asymptotic and finite sample
precision losses outweighing any gains from design.

arXiv link: http://arxiv.org/abs/2309.15297v1

Econometrics arXiv updated paper (originally submitted: 2023-09-26)

Free Discontinuity Regression: With an Application to the Economic Effects of Internet Shutdowns

Authors: Florian Gunsilius, David Van Dijcke

Sharp, multidimensional changepoints (abrupt shifts in a regression surface
whose locations and magnitudes are unknown) arise in settings as varied as
gene-expression profiling, financial covariance breaks, climate-regime
detection, and urban socioeconomic mapping. Despite their prevalence, there are
no current approaches that jointly estimate the location and size of the
discontinuity set in a one-shot approach with statistical guarantees. We
therefore introduce Free Discontinuity Regression (FDR), a fully nonparametric
estimator that simultaneously (i) smooths a regression surface, (ii) segments
it into contiguous regions, and (iii) provably recovers the precise locations
and sizes of its jumps. By extending a convex relaxation of the Mumford-Shah
functional to random spatial sampling and correlated noise, FDR overcomes the
fixed-grid and i.i.d. noise assumptions of classical image-segmentation
approaches, thus enabling its application to real-world data of any dimension.
This yields the first identification and uniform consistency results for
multivariate jump surfaces: under mild SBV regularity, the estimated function,
its discontinuity set, and all jump sizes converge to their true population
counterparts. Hyperparameters are selected automatically from the data using
Stein's Unbiased Risk Estimate, and large-scale simulations up to three
dimensions validate the theoretical results and demonstrate good finite-sample
performance. Applying FDR to an internet shutdown in India reveals a 25-35%
reduction in economic activity around the estimated shutdown boundaries, much
larger than previous estimates. By unifying smoothing, segmentation, and
effect-size recovery in a general statistical setting, FDR turns
free-discontinuity ideas into a practical tool with formal guarantees for
modern multivariate data.

arXiv link: http://arxiv.org/abs/2309.14630v3

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2023-09-26

Assessing Utility of Differential Privacy for RCTs

Authors: Soumya Mukherjee, Aratrika Mustafi, Aleksandra Slavković, Lars Vilhuber

Randomized control trials, RCTs, have become a powerful tool for assessing
the impact of interventions and policies in many contexts. They are considered
the gold-standard for inference in the biomedical fields and in many social
sciences. Researchers have published an increasing number of studies that rely
on RCTs for at least part of the inference, and these studies typically include
the response data collected, de-identified and sometimes protected through
traditional disclosure limitation methods. In this paper, we empirically assess
the impact of strong privacy-preservation methodology (with differential
privacy (DP) guarantees) on published analyses from RCTs, leveraging the
availability of replication packages (research compendia) in economics and
policy analysis. We provide simulation studies and demonstrate how we can
replicate the analysis
in a published economics article on privacy-protected data under various
parametrizations. We find that relatively straightforward DP-based methods
allow for inference-valid protection of the published data, though
computational issues may limit more complex analyses from using these methods.
The results have applicability to researchers wishing to share RCT data,
especially in the context of low- and middle-income countries, with strong
privacy protection.

arXiv link: http://arxiv.org/abs/2309.14581v1

Econometrics arXiv updated paper (originally submitted: 2023-09-25)

Unified Inference for Dynamic Quantile Predictive Regression

Authors: Christis Katsouris

This paper develops unified asymptotic distribution theory for dynamic
quantile predictive regressions which is useful when examining quantile
predictability in stock returns under the possible presence of nonstationarity.

arXiv link: http://arxiv.org/abs/2309.14160v2

Econometrics arXiv updated paper (originally submitted: 2023-09-23)

Nonparametric estimation of conditional densities by generalized random forests

Authors: Federico Zincenko

Considering a continuous random variable Y together with a continuous random
vector X, I propose a nonparametric estimator $\hat f(\cdot|x)$ for the conditional
density of Y given X=x. This estimator takes the form of an exponential series
whose coefficients $T_x = (T_{x1},\ldots,T_{xJ})$ are the solution of a system of nonlinear
equations that depends on an estimator of the conditional expectation
$E[p(Y)|X=x]$, where p is a J-dimensional vector of basis functions. The
distinguishing feature of the proposed estimator is that $E[p(Y)|X=x]$ is
estimated by generalized random forest (Athey, Tibshirani, and Wager, Annals of
Statistics, 2019), targeting the heterogeneity of $T_x$ across x. I show that
$\hat f(\cdot|x)$ is uniformly consistent and asymptotically normal, allowing J to grow
to infinity. I also provide a standard error formula to construct
asymptotically valid confidence intervals. Results from Monte Carlo experiments
are provided.

arXiv link: http://arxiv.org/abs/2309.13251v4

Econometrics arXiv updated paper (originally submitted: 2023-09-22)

Nonparametric mixed logit model with market-level parameters estimated from market share data

Authors: Xiyuan Ren, Joseph Y. J. Chow, Prateek Bansal

We propose a nonparametric mixed logit model that is estimated using
market-level choice share data. The model treats each market as an agent and
represents taste heterogeneity through market-specific parameters by solving a
multiagent inverse utility maximization problem, addressing the limitations of
existing market-level choice models with parametric estimation. A simulation
study is conducted to evaluate the performance of our model in terms of
estimation time, estimation accuracy, and out-of-sample predictive accuracy. In
a real data application, we estimate the travel mode choice of 53.55 million
trips made by 19.53 million residents in New York State. These trips are
aggregated based on population segments and census block group-level
origin-destination (OD) pairs, resulting in 120,740 markets. We benchmark our
model against multinomial logit (MNL), nested logit (NL), inverse product
differentiation logit (IPDL), and the BLP models. The results show that the
proposed model improves the out-of-sample accuracy from 65.30% to 81.78%, with
a computation time less than one-tenth of that taken to estimate the BLP model.
The price elasticities and diversion ratios retrieved from our model and
benchmark models exhibit similar substitution patterns. Moreover, the
market-level parameters estimated by our model provide additional insights and
facilitate their seamless integration into supply-side optimization models for
transportation design. By measuring the compensating variation for the driving
mode, we found that a $9 congestion toll would impact roughly 60% of the total
travelers. As an application of supply-demand integration, we showed that a 50%
discount of transit fare could bring a maximum ridership increase of 9402 trips
per day under a budget of $50,000 per day.

arXiv link: http://arxiv.org/abs/2309.13159v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-09-21

Optimal Conditional Inference in Adaptive Experiments

Authors: Jiafeng Chen, Isaiah Andrews

We study batched bandit experiments and consider the problem of inference
conditional on the realized stopping time, assignment probabilities, and target
parameter, where all of these may be chosen adaptively using information up to
the last batch of the experiment. Absent further restrictions on the
experiment, we show that inference using only the results of the last batch is
optimal. When the adaptive aspects of the experiment are known to be
location-invariant, in the sense that they are unchanged when we shift all
batch-arm means by a constant, we show that there is additional information in
the data, captured by one additional linear function of the batch-arm means. In
the more restrictive case where the stopping time, assignment probabilities,
and target parameter are known to depend on the data only through a collection
of polyhedral events, we derive computationally tractable and optimal
conditional inference procedures.

arXiv link: http://arxiv.org/abs/2309.12162v1

Econometrics arXiv paper, submitted: 2023-09-21

A detection analysis for temporal memory patterns at different time-scales

Authors: Fabio Vanni, David Lambert

This paper introduces a novel methodology that utilizes latency to unveil
time-series dependence patterns. A customized statistical test detects memory
dependence in event sequences by analyzing their inter-event time
distributions. Synthetic experiments based on the renewal-aging property assess
the impact of observer latency on the renewal property. Our test uncovers
memory patterns across diverse time scales, emphasizing the event sequence's
probability structure beyond correlations. The time series analysis produces a
statistical test and graphical plots that help to detect dependence patterns
among events at different time-scales, if any. Furthermore, the test evaluates
the renewal assumption through aging experiments, offering valuable
applications in time-series analysis within economics.

arXiv link: http://arxiv.org/abs/2309.12034v1

Econometrics arXiv cross-link from q-fin.TR (q-fin.TR), submitted: 2023-09-20

Transformers versus LSTMs for electronic trading

Authors: Paul Bilokon, Yitao Qiu

With the rapid development of artificial intelligence, the long short-term
memory (LSTM) network, a kind of recurrent neural network (RNN), has been
widely applied to time series prediction.
Like the RNN, the Transformer is designed to handle sequential data. Following
the Transformer's great success in natural language processing (NLP),
researchers became interested in its performance on time series prediction,
and many Transformer-based solutions for long time series forecasting have
been proposed recently. However, when it comes to financial time series
prediction, LSTM remains the dominant architecture. The question this study
therefore seeks to answer is whether Transformer-based models can be applied
to financial time series prediction and beat LSTM.
To answer this question, various LSTM-based and Transformer-based models are
compared on multiple financial prediction tasks based on high-frequency limit
order book data. A new LSTM-based model called DLSTM is built, and a new
architecture for the Transformer-based model is designed to adapt it to
financial prediction. The experimental results show that the Transformer-based
model has only a limited advantage in absolute price sequence prediction. The
LSTM-based models show better and more robust performance on difference
sequence prediction, such as price difference and price movement.

arXiv link: http://arxiv.org/abs/2309.11400v1

Econometrics arXiv updated paper (originally submitted: 2023-09-20)

Identifying Causal Effects in Information Provision Experiments

Authors: Dylan Balla-Elliott

Information treatments often shift beliefs more for people with weaker belief
effects. Since standard TSLS and panel specifications in information provision
experiments have weights proportional to belief updating in the first-stage,
this dependence attenuates existing estimates. This is natural if people whose
decisions depend on their beliefs gather information before the experiment. I
propose a local least squares estimator that identifies unweighted average
effects in several classes of experiments under progressively stronger versions
of Bayesian updating. In five of six recent studies, average effects are larger
than (in several cases more than double) estimates in standard specifications.

arXiv link: http://arxiv.org/abs/2309.11387v4

Econometrics arXiv updated paper (originally submitted: 2023-09-20)

require: Package dependencies for reproducible research

Authors: Sergio Correia, Matthew P. Seay

The ability to conduct reproducible research in Stata is often limited by the
lack of version control for community-contributed packages. This article
introduces the require command, a tool designed to ensure Stata package
dependencies are compatible across users and computer systems. Given a list of
Stata packages, require verifies that each package is installed, checks for a
minimum or exact version or package release date, and optionally installs the
package if prompted by the researcher.

arXiv link: http://arxiv.org/abs/2309.11058v2

Econometrics arXiv updated paper (originally submitted: 2023-09-19)

Correcting Sample Selection Bias in PISA Rankings

Authors: Onil Boussim

This paper addresses the critical issue of sample selection bias in
cross-country comparisons based on international assessments such as the
Programme for International Student Assessment (PISA). Although PISA is widely
used to benchmark educational performance across countries, it samples only
students who remain enrolled in school at age 15. This introduces survival
bias, particularly in countries with high dropout rates, potentially leading to
distorted comparisons. To correct for this bias, I develop a simple adjustment
of the classical Heckman selection model tailored to settings with fully
truncated outcome data. My approach exploits the joint normality of latent
errors and leverages information on the selection rate, allowing identification
of the counterfactual mean outcome for the full population of 15-year-olds.
Applying this method to PISA 2018 data, I show that adjusting for selection
bias results in substantial changes in country rankings based on average
performance. These results highlight the importance of accounting for
non-random sample selection to ensure accurate and policy-relevant
international comparisons of educational outcomes.

arXiv link: http://arxiv.org/abs/2309.10642v6
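
For intuition, a minimal version of the kind of normal-selection adjustment
described above; this is a textbook truncated-normal calculation, not
necessarily the paper's exact estimator. Suppose the latent outcome $Y^*$ (with
mean $\mu$ and standard deviation $\sigma$) and a standard normal selection
index $U$ are jointly normal with correlation $\rho$, and a student is observed
only if $U > c$, where the enrolment rate $p$ pins down $c = \Phi^{-1}(1-p)$.
Then

    E[Y^* \mid U > c] = \mu + \rho\,\sigma\,\frac{\phi(c)}{p},
    \qquad\text{hence}\qquad
    \mu = E[Y \mid \text{enrolled}] - \rho\,\sigma\,\frac{\phi\bigl(\Phi^{-1}(1-p)\bigr)}{p},

which recovers the counterfactual mean for the full population of 15-year-olds
from the enrolled-only mean, the selection rate, and the scale and correlation
parameters.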

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-09-19

Regressing on distributions: The nonlinear effect of temperature on regional economic growth

Authors: Malte Jahn

A nonlinear regression framework is proposed for time series and panel data
for the situation where certain explanatory variables are available at a higher
temporal resolution than the dependent variable. The main idea is to use the
moments of the empirical distribution of these variables to construct
regressors with the correct resolution. As the moments are likely to display
nonlinear marginal and interaction effects, an artificial neural network
regression function is proposed. The corresponding model operates within the
traditional stochastic nonlinear least squares framework. In particular, a
numerical Hessian is employed to calculate confidence intervals. The practical
usefulness is demonstrated by analyzing the influence of daily temperatures in
260 European NUTS2 regions on the yearly growth of gross value added in these
regions in the time period 2000 to 2021. In the particular example, the model
allows for an appropriate assessment of regional economic impacts resulting
from (future) changes in the regional temperature distribution (both mean and
variance).

arXiv link: http://arxiv.org/abs/2309.10481v1
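
A schematic of the regressor construction described above, with invented data
and scikit-learn's MLPRegressor standing in for the paper's stochastic
nonlinear least squares setup: collapse each region-year's daily temperature
distribution to a few moments and feed those moments to a neural-network
regression.

    import numpy as np
    from scipy import stats
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(1)

    # Fake panel: 50 regions x 20 years, 365 daily temperatures per region-year
    n_regions, n_years, n_days = 50, 20, 365
    temps = rng.normal(10, 8, size=(n_regions, n_years, n_days))

    # Collapse each daily distribution to its first three moments (the regressors)
    mean = temps.mean(axis=2)
    var = temps.var(axis=2)
    skew = stats.skew(temps, axis=2)
    X = np.column_stack([mean.ravel(), var.ravel(), skew.ravel()])

    # Fake yearly growth of gross value added, nonlinear in the temperature moments
    y = 2.0 - 0.05 * (mean.ravel() - 12.0) ** 2 - 0.02 * var.ravel()
    y += rng.normal(0, 0.5, size=X.shape[0])

    # Neural-network regression function on the moment regressors
    model = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=5000, random_state=0)
    model.fit(X, y)
    print("in-sample R^2:", round(model.score(X, y), 3))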

Econometrics arXiv updated paper (originally submitted: 2023-09-17)

Bounds on Average Effects in Discrete Choice Panel Data Models

Authors: Cavit Pakel, Martin Weidner

In discrete choice panel data, the estimation of average effects is crucial
for quantifying the effect of covariates, and for policy evaluation and
counterfactual analysis. This task is challenging in short panels with
individual-specific effects due to partial identification and the incidental
parameter problem. In particular, estimation of the sharp identified set is
practically infeasible at realistic sample sizes whenever the number of support
points of the observed covariates is large, such as when the covariates are
continuous. In this paper, we therefore propose estimating outer bounds on the
identified set of average effects. Our bounds are easy to construct, converge
at the parametric rate, and are computationally simple to obtain even in
moderately large samples, independent of whether the covariates are discrete or
continuous. We also provide asymptotically valid confidence intervals on the
identified set.

arXiv link: http://arxiv.org/abs/2309.09299v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-09-16

Optimal Estimation under a Semiparametric Density Ratio Model

Authors: Archer Gong Zhang, Jiahua Chen

In many statistical and econometric applications, we gather individual
samples from various interconnected populations that undeniably exhibit common
latent structures. Utilizing a model that incorporates these latent structures
for such data enhances the efficiency of inferences. Recently, many researchers
have been adopting the semiparametric density ratio model (DRM) to address the
presence of latent structures. The DRM enables estimation of each population
distribution using pooled data, resulting in statistically more efficient
estimations in contrast to nonparametric methods that analyze each sample in
isolation. In this article, we investigate the limit of the efficiency
improvement attainable through the DRM. We focus on situations where one
population's sample size significantly exceeds those of the other populations.
In such scenarios, we demonstrate that the DRM-based inferences for populations
with smaller sample sizes achieve the highest attainable asymptotic efficiency
as if a parametric model were assumed. The estimands we consider include the
model parameters, distribution functions, and quantiles. We use simulation
experiments to support the theoretical findings with a specific focus on
quantile estimation. Additionally, we provide an analysis of real revenue data
from U.S. collegiate sports to illustrate the efficacy of our contribution.

arXiv link: http://arxiv.org/abs/2309.09103v1

Econometrics arXiv updated paper (originally submitted: 2023-09-16)

Least squares estimation in nonstationary nonlinear cohort panels with learning from experience

Authors: Alexander Mayer, Michael Massmann

We discuss techniques of estimation and inference for nonstationary nonlinear
cohort panels with learning from experience, showing, inter alia, the
consistency and asymptotic normality of the nonlinear least squares estimator
used in empirical practice. Potential pitfalls for hypothesis testing are
identified and solutions proposed. Monte Carlo simulations verify the
properties of the estimator and corresponding test statistics in finite
samples, while an application to a panel of survey expectations demonstrates
the usefulness of the theory developed.

arXiv link: http://arxiv.org/abs/2309.08982v4

Econometrics arXiv updated paper (originally submitted: 2023-09-16)

Total-effect Test May Erroneously Reject So-called "Full" or "Complete" Mediation

Authors: Tingxuan Han, Luxi Zhang, Xinshu Zhao, Ke Deng

The procedure for establishing mediation, i.e., determining that an
independent variable X affects a dependent variable Y through some mediator M,
has been under debate. The classic causal steps require that a "total effect"
be significant, now also known as statistically acknowledged. It has been shown
that the total-effect test can erroneously reject competitive mediation and is
superfluous for establishing complementary mediation. Little is known about the
last type, indirect-only mediation, aka "full" or "complete" mediation, in
which the indirect (ab) path passes the statistical partition test while the
direct-and-remainder (d) path fails. This study 1) provides proof that the
total-effect test can erroneously reject indirect-only mediation, including
both sub-types, assuming least squares estimation (LSE) F-test or Sobel test; 2)
provides a simulation to duplicate the mathematical proofs and extend the
conclusion to LAD-Z test; 3) provides two real-data examples, one for each
sub-type, to illustrate the mathematical conclusion; 4) in view of the
mathematical findings, proposes to revisit concepts, theories, and techniques
of mediation analysis and other causal dissection analyses, and showcase a more
comprehensive alternative, process-and-product analysis (PAPA).

arXiv link: http://arxiv.org/abs/2309.08910v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-09-15

Adaptive Neyman Allocation

Authors: Jinglong Zhao

In the experimental design literature, Neyman allocation refers to the
practice of allocating units into treated and control groups, potentially in
unequal numbers proportional to their respective standard deviations, with the
objective of minimizing the variance of the treatment effect estimator. This
widely recognized approach increases statistical power in scenarios where the
treated and control groups have different standard deviations, as is often the
case in social experiments, clinical trials, marketing research, and online A/B
testing. However, Neyman allocation cannot be implemented unless the standard
deviations are known in advance. Fortunately, the multi-stage nature of the
aforementioned applications allows the use of earlier stage observations to
estimate the standard deviations, which further guide allocation decisions in
later stages. In this paper, we introduce a competitive analysis framework to
study this multi-stage experimental design problem. We propose a simple
adaptive Neyman allocation algorithm, which almost matches the
information-theoretic limit of conducting experiments. We provide theory for
estimation and inference using data collected from our adaptive Neyman
allocation algorithm. We demonstrate the effectiveness of our adaptive Neyman
allocation algorithm using both online A/B testing data from a social media
site and synthetic data.

arXiv link: http://arxiv.org/abs/2309.08808v4
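
A stripped-down two-batch version of the idea, with made-up numbers (the
paper's algorithm handles general multi-stage settings and comes with
competitive-analysis guarantees): estimate the arm standard deviations from the
first batch and allocate the second batch in the Neyman proportion.

    import numpy as np

    rng = np.random.default_rng(0)
    sigma_treat, sigma_ctrl = 3.0, 1.0   # unknown to the experimenter

    # Batch 1: fifty-fifty allocation
    n1 = 200
    d1 = rng.binomial(1, 0.5, n1)
    y1 = np.where(d1 == 1, rng.normal(1, sigma_treat, n1), rng.normal(0, sigma_ctrl, n1))

    # Estimate standard deviations and the Neyman proportion for batch 2
    s_t, s_c = y1[d1 == 1].std(ddof=1), y1[d1 == 0].std(ddof=1)
    p2 = s_t / (s_t + s_c)

    # Batch 2: adaptive allocation
    n2 = 800
    d2 = rng.binomial(1, p2, n2)
    y2 = np.where(d2 == 1, rng.normal(1, sigma_treat, n2), rng.normal(0, sigma_ctrl, n2))

    # Pooled difference-in-means estimate of the treatment effect
    d, y = np.concatenate([d1, d2]), np.concatenate([y1, y2])
    ate_hat = y[d == 1].mean() - y[d == 0].mean()
    print("batch-2 treated share:", round(p2, 2), "| ATE estimate:", round(ate_hat, 2))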

Econometrics arXiv paper, submitted: 2023-09-15

Ordered Correlation Forest

Authors: Riccardo Di Francesco

Empirical studies in various social sciences often involve categorical
outcomes with inherent ordering, such as self-evaluations of subjective
well-being and self-assessments in health domains. While ordered choice models,
such as the ordered logit and ordered probit, are popular tools for analyzing
these outcomes, they may impose restrictive parametric and distributional
assumptions. This paper introduces a novel estimator, the ordered correlation
forest, that can naturally handle non-linearities in the data and does not
assume a specific error term distribution. The proposed estimator modifies a
standard random forest splitting criterion to build a collection of forests,
each estimating the conditional probability of a single class. Under an
"honesty" condition, predictions are consistent and asymptotically normal. The
weights induced by each forest are used to obtain standard errors for the
predicted probabilities and the covariates' marginal effects. Evidence from
synthetic data shows that the proposed estimator features superior prediction
performance relative to alternative forest-based estimators and demonstrates its
ability to construct valid confidence intervals for the covariates' marginal
effects.

arXiv link: http://arxiv.org/abs/2309.08755v1

Econometrics arXiv updated paper (originally submitted: 2023-09-15)

Fixed-b Asymptotics for Panel Models with Two-Way Clustering

Authors: Kaicheng Chen, Timothy J. Vogelsang

This paper studies a cluster robust variance estimator proposed by Chiang,
Hansen and Sasaki (2024) for linear panels. First, we show algebraically that
this variance estimator (CHS estimator, hereafter) is a linear combination of
three common variance estimators: the one-way unit cluster estimator, the "HAC
of averages" estimator, and the "average of HACs" estimator. Based on this
finding, we obtain a fixed-$b$ asymptotic result for the CHS estimator and
corresponding test statistics as the cross-section and time sample sizes
jointly go to infinity. Furthermore, we propose two simple bias-corrected
versions of the variance estimator and derive the fixed-$b$ limits. In a
simulation study, we find that the two bias-corrected variance estimators along
with fixed-$b$ critical values provide improvements in finite sample coverage
probabilities. We illustrate the impact of bias-correction and use of the
fixed-$b$ critical values on inference in an empirical example on the
relationship between industry profitability and market concentration.

arXiv link: http://arxiv.org/abs/2309.08707v4

Econometrics arXiv updated paper (originally submitted: 2023-09-14)

Causal inference in network experiments: regression-based analysis and design-based properties

Authors: Mengsi Gao, Peng Ding

Network experiments are powerful tools for studying spillover effects, which
avoid endogeneity by randomly assigning treatments to units over networks.
However, it is non-trivial to analyze network experiments properly without
imposing strong modeling assumptions. We show that regression-based point
estimators and standard errors can have strong theoretical guarantees if the
regression functions and robust standard errors are carefully specified to
accommodate the interference patterns under network experiments. We first
recall a well-known result that the Hájek estimator is numerically identical
to the coefficient from the weighted-least-squares fit based on the inverse
probability of the exposure mapping. Moreover, we demonstrate that the
regression-based approach offers three notable advantages: its ease of
implementation, the ability to derive standard errors through the same
regression fit, and the potential to integrate covariates into the analysis to
improve efficiency. Recognizing that the regression-based network-robust
covariance estimator can be anti-conservative under nonconstant effects, we
propose an adjusted covariance estimator to improve the empirical coverage
rates.

arXiv link: http://arxiv.org/abs/2309.07476v3
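
The numerical identity recalled above is easy to verify directly: with weights
equal to the inverse probability of each unit's realized exposure, the
weighted-least-squares coefficient on the exposure indicator equals the
difference of Hájek (self-normalized inverse-probability-weighted) means. A toy
check with arbitrary propensities:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 500
    pi = rng.uniform(0.2, 0.8, n)              # exposure probabilities
    d = rng.binomial(1, pi)                    # realized exposure
    y = 1.0 + 2.0 * d + rng.normal(size=n)

    w = np.where(d == 1, 1 / pi, 1 / (1 - pi))  # inverse probability of realized exposure

    # Hajek contrast: difference of self-normalized weighted means
    hajek = (w[d == 1] @ y[d == 1]) / w[d == 1].sum() \
          - (w[d == 0] @ y[d == 0]) / w[d == 0].sum()

    # Weighted least squares of y on an intercept and d with the same weights
    X = np.column_stack([np.ones(n), d])
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

    print(hajek, beta[1])                      # numerically identical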

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2023-09-13

From Deep Filtering to Deep Econometrics

Authors: Robert Stok, Paul Bilokon

Calculating true volatility is an essential task for option pricing and risk
management. However, it is made difficult by market microstructure noise.
Particle filtering has been proposed to solve this problem as it has favorable
statistical properties, but it relies on assumptions about underlying market
dynamics. Machine learning methods have also been proposed but lack
interpretability, and often lag in performance. In this paper we implement the
SV-PF-RNN: a hybrid neural network and particle filter architecture. Our
SV-PF-RNN is designed specifically with stochastic volatility estimation in
mind. We then show that it can improve on the performance of a basic particle
filter.

arXiv link: http://arxiv.org/abs/2311.06256v1

Econometrics arXiv updated paper (originally submitted: 2023-09-13)

Stochastic Learning of Semiparametric Monotone Index Models with Large Sample Size

Authors: Qingsong Yao

I study the estimation of semiparametric monotone index models in the
scenario where the number of observation points $n$ is extremely large and
conventional approaches fail to work due to heavy computational burdens.
Motivated by the mini-batch gradient descent (MBGD) algorithm that is widely
used as a stochastic optimization tool in the machine learning field, I
propose a novel subsample- and iteration-based estimation procedure. In
particular, starting from any initial guess of the true parameter, I
progressively update the parameter using a sequence of subsamples randomly
drawn from the data set whose sample size is much smaller than $n$. The update
is based on the gradient of some well-chosen loss function, where the
nonparametric component is replaced with its Nadaraya-Watson kernel estimator
based on subsamples. My proposed algorithm essentially generalizes the MBGD
algorithm to the semiparametric setup. Compared with the full-sample-based
method, the new method reduces the computational time by roughly $n$ times if
the subsample size and the kernel function are chosen properly, and so can be
easily applied when the sample size $n$ is large. Moreover, I show that if I
further average across the estimators produced during the iterations, the
difference between the average estimator and the full-sample-based estimator
will be $1/\sqrt{n}$-trivial. Consequently, the average estimator is
$\sqrt{n}$-consistent and asymptotically normally distributed. In other
words, the new estimator substantially improves the computational speed, while
at the same time maintains the estimation accuracy.

arXiv link: http://arxiv.org/abs/2309.06693v2
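
The subsample-and-average logic can be illustrated on a plain parametric
problem. The sketch below runs averaged mini-batch gradient descent on a
logistic-regression loss; it omits the paper's Nadaraya-Watson plug-in for the
nonparametric component and is only meant to show the updating and averaging
steps.

    import numpy as np

    rng = np.random.default_rng(0)
    n, dim = 100_000, 3
    X = rng.normal(size=(n, dim))
    beta0 = np.array([1.0, -0.5, 0.25])
    y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta0)))

    def grad(b, Xb, yb):
        """Mini-batch gradient of the average negative log-likelihood."""
        p = 1 / (1 + np.exp(-Xb @ b))
        return Xb.T @ (p - yb) / len(yb)

    batch, n_iter, lr = 256, 4000, 0.5
    b = np.zeros(dim)
    running_sum = np.zeros(dim)
    for t in range(1, n_iter + 1):
        idx = rng.integers(0, n, batch)                 # random subsample
        b = b - lr / np.sqrt(t) * grad(b, X[idx], y[idx])
        running_sum += b

    b_avg = running_sum / n_iter                        # averaged estimator
    print(np.round(b_avg, 3))                           # roughly recovers beta0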

Econometrics arXiv updated paper (originally submitted: 2023-09-12)

Sensitivity Analysis for Linear Estimators

Authors: Jacob Dorn, Luther Yap

We propose a novel sensitivity analysis framework for linear estimators with
identification failures that can be viewed as seeing the wrong outcome
distribution. Our approach measures the degree of identification failure
through the change in measure between the observed distribution and a
hypothetical target distribution that would identify the causal parameter of
interest. The framework yields a sensitivity analysis that generalizes existing
bounds for Average Potential Outcome (APO), Regression Discontinuity (RD), and
instrumental variables (IV) exclusion failure designs. Our partial
identification results extend results from the APO context to allow even
unbounded likelihood ratios. Our proposed sensitivity analysis consistently
estimates sharp bounds under plausible conditions and estimates valid bounds
under mild conditions. We find that our method performs well in simulations
even when targeting a discontinuous and nearly infinite bound.

arXiv link: http://arxiv.org/abs/2309.06305v3

Econometrics arXiv updated paper (originally submitted: 2023-09-11)

Forecasted Treatment Effects

Authors: Irene Botosaru, Raffaella Giacomini, Martin Weidner

We consider estimation and inference of the effects of a policy in the
absence of a control group. We obtain unbiased estimators of individual
(heterogeneous) treatment effects and a consistent and asymptotically normal
estimator of the average treatment effect. Our estimator averages over unbiased
forecasts of individual counterfactuals, based on a (short) time series of
pre-treatment data. The paper emphasizes the importance of focusing on forecast
unbiasedness rather than accuracy when the end goal is estimation of average
treatment effects. We show that simple basis function regressions ensure
forecast unbiasedness for a broad class of data-generating processes for the
counterfactuals, even in short panels. In contrast, model-based forecasting
requires stronger assumptions and is prone to misspecification and estimation
bias. We show that our method can replicate the findings of some previous
empirical studies, but without using a control group.

arXiv link: http://arxiv.org/abs/2309.05639v3
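
A minimal version of the forecasting idea on simulated data, using a
constant-plus-trend basis (the paper covers a broader class of basis functions
and supplies the accompanying inference theory): forecast each unit's
counterfactual from its own pre-treatment series, then average the individual
effects.

    import numpy as np

    rng = np.random.default_rng(0)
    n_units, T0, T1 = 200, 8, 4                 # pre- and post-treatment periods
    t = np.arange(T0 + T1)

    # Unit-specific linear trends plus noise; treatment adds +2 after period T0
    trend = rng.normal(0, 1, (n_units, 1)) + rng.normal(0.5, 0.2, (n_units, 1)) * t
    y = trend + rng.normal(0, 1, (n_units, T0 + T1))
    y[:, T0:] += 2.0

    # Basis-function forecasts of each unit's counterfactual
    B_pre = np.column_stack([np.ones(T0), t[:T0]])
    B_post = np.column_stack([np.ones(T1), t[T0:]])
    coefs = np.linalg.lstsq(B_pre, y[:, :T0].T, rcond=None)[0]   # (2, n_units)
    y0_hat = (B_post @ coefs).T                                  # forecasted counterfactuals

    tau = y[:, T0:] - y0_hat                    # individual (per-period) effect estimates
    print("ATE estimate:", round(tau.mean(), 2))   # close to 2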

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2023-09-10

Nonlinear Granger Causality using Kernel Ridge Regression

Authors: Wojciech "Victor" Fulmyk

I introduce a novel algorithm and accompanying Python library, named
mlcausality, designed for the identification of nonlinear Granger causal
relationships. This novel algorithm uses a flexible plug-in architecture that
enables researchers to employ any nonlinear regressor as the base prediction
model. Subsequently, I conduct a comprehensive performance analysis of
mlcausality when the prediction regressor is the kernel ridge regressor with
the radial basis function kernel. The results demonstrate that mlcausality
employing kernel ridge regression achieves competitive AUC scores across a
diverse set of simulated data. Furthermore, mlcausality with kernel ridge
regression yields more finely calibrated $p$-values in comparison to rival
algorithms. This enhancement enables mlcausality to attain superior accuracy
scores when using intuitive $p$-value-based thresholding criteria. Finally,
mlcausality with the kernel ridge regression exhibits significantly reduced
computation times compared to existing nonlinear Granger causality algorithms.
In fact, in numerous instances, this innovative approach achieves superior
solutions within computational timeframes that are an order of magnitude
shorter than those required by competing algorithms.

arXiv link: http://arxiv.org/abs/2309.05107v1
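
The basic recipe can be sketched generically (this is not the mlcausality API):
fit one kernel ridge regression of y on its own lags and another on lags of
both y and x, then compare out-of-sample errors; a clear improvement from
adding the x lags points to nonlinear Granger causality.

    import numpy as np
    from sklearn.kernel_ridge import KernelRidge

    rng = np.random.default_rng(0)
    T, lag = 1500, 2
    x = rng.normal(size=T)
    y = np.zeros(T)
    for t in range(lag, T):                    # y depends nonlinearly on lagged x
        y[t] = 0.3 * y[t - 1] + np.tanh(x[t - 1]) + 0.3 * rng.normal()

    rows = range(lag, T)
    Z_restricted = np.array([[y[t - 1], y[t - 2]] for t in rows])
    Z_full = np.array([[y[t - 1], y[t - 2], x[t - 1], x[t - 2]] for t in rows])
    target = y[lag:]

    split = len(target) // 2
    def oos_mse(Z):
        model = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.5)
        model.fit(Z[:split], target[:split])
        return np.mean((target[split:] - model.predict(Z[split:])) ** 2)

    print("restricted MSE:", round(oos_mse(Z_restricted), 3))
    print("full MSE:      ", round(oos_mse(Z_full), 3))   # noticeably smaller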

Econometrics arXiv updated paper (originally submitted: 2023-09-10)

Testing for Stationary or Persistent Coefficient Randomness in Predictive Regressions

Authors: Mikihito Nishi

This study considers tests for coefficient randomness in predictive
regressions. Our focus is on how tests for coefficient randomness are
influenced by the persistence of the random coefficient. We show that when the
random coefficient is stationary, or I(0), Nyblom's (1989) LM test loses its
optimality (in terms of power), which is established against the alternative of
integrated, or I(1), random coefficient. We demonstrate this by constructing a
test that is more powerful than the LM test when the random coefficient is
stationary, although the test is dominated in terms of power by the LM test
when the random coefficient is integrated. The power comparison is made under
the sequence of local alternatives that approaches the null hypothesis at
different rates depending on the persistence of the random coefficient and
which test is considered. We revisit earlier empirical research and apply
the tests considered in this study to the U.S. stock returns data. The result
mostly reverses the earlier finding.

arXiv link: http://arxiv.org/abs/2309.04926v5

Econometrics arXiv paper, submitted: 2023-09-09

Structural Econometric Estimation of the Basic Reproduction Number for Covid-19 Across U.S. States and Selected Countries

Authors: Ida Johnsson, M. Hashem Pesaran, Cynthia Fan Yang

This paper proposes a structural econometric approach to estimating the basic
reproduction number ($R_{0}$) of Covid-19. This approach identifies
$R_{0}$ in a panel regression model by filtering out the effects of
mitigating factors on disease diffusion and is easy to implement. We apply the
method to data from 48 contiguous U.S. states and a diverse set of countries.
Our results reveal a notable concentration of $R_{0}$ estimates with
an average value of 4.5. Through a counterfactual analysis, we highlight a
significant underestimation of the $R_{0}$ when mitigating factors
are not appropriately accounted for.

arXiv link: http://arxiv.org/abs/2309.08619v1

Econometrics arXiv paper, submitted: 2023-09-09

Non-linear dimension reduction in factor-augmented vector autoregressions

Authors: Karin Klieber

This paper introduces non-linear dimension reduction in factor-augmented
vector autoregressions to analyze the effects of different economic shocks. I
argue that controlling for non-linearities between a large-dimensional dataset
and the latent factors is particularly useful during turbulent times of the
business cycle. In simulations, I show that non-linear dimension reduction
techniques yield good forecasting performance, especially when data is highly
volatile. In an empirical application, I identify a monetary policy shock as
well as an uncertainty shock, both excluding and including observations from
the COVID-19 pandemic. These two applications suggest that the non-linear FAVAR approaches
are capable of dealing with the large outliers caused by the COVID-19 pandemic
and yield reliable results in both scenarios.

arXiv link: http://arxiv.org/abs/2309.04821v1

Econometrics arXiv updated paper (originally submitted: 2023-09-09)

Interpreting TSLS Estimators in Information Provision Experiments

Authors: Vod Vilfort, Whitney Zhang

To estimate the causal effects of beliefs on actions, researchers often run
information provision experiments. We consider the causal interpretation of
two-stage least squares (TSLS) estimators in these experiments. We characterize
common TSLS estimators as weighted averages of causal effects, and interpret
these weights under general belief updating conditions that nest parametric
models from the literature. Our framework accommodates TSLS estimators for both
passive and active control designs. Notably, we find that some passive control
estimators allow for negative weights, which compromises their causal
interpretation. We give practical guidance on such issues, and illustrate our
results in two empirical applications.

arXiv link: http://arxiv.org/abs/2309.04793v4

Econometrics arXiv paper, submitted: 2023-09-07

Identifying spatial interdependence in panel data with large N and small T

Authors: Deborah Gefang, Stephen G. Hall, George S. Tavlas

This paper develops a simple two-stage variational Bayesian algorithm to
estimate panel spatial autoregressive models, where N, the number of
cross-sectional units, is much larger than T, the number of time periods,
without restricting the spatial effects using a predetermined weighting matrix.
We use Dirichlet-Laplace priors for variable selection and parameter shrinkage.
Without imposing any a priori structures on the spatial linkages between
variables, we let the data speak for themselves. Extensive Monte Carlo studies
show that our method is super-fast and our estimated spatial weights matrices
strongly resemble the true spatial weights matrices. As an illustration, we
investigate the spatial interdependence of European Union regional gross value
added growth rates. In addition to a clear pattern of predominant country
clusters, we have uncovered a number of important between-country spatial
linkages which are yet to be documented in the literature. This new procedure
for estimating spatial effects is of particular relevance for researchers and
policy makers alike.

arXiv link: http://arxiv.org/abs/2309.03740v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2023-09-07

A Causal Perspective on Loan Pricing: Investigating the Impacts of Selection Bias on Identifying Bid-Response Functions

Authors: Christopher Bockel-Rickermann, Sam Verboven, Tim Verdonck, Wouter Verbeke

In lending, where prices are specific to both customers and products, having
a well-functioning personalized pricing policy in place is essential to
effective business making. Typically, such a policy must be derived from
observational data, which introduces several challenges. While the problem of
“endogeneity” is prominently studied in the established pricing literature,
the problem of selection bias (or, more precisely, bid selection bias) is not.
We take a step towards understanding the effects of selection bias by posing
pricing as a problem of causal inference. Specifically, we consider the
reaction of a customer to price as a treatment effect. In our experiments, we
simulate varying levels of selection bias on a semi-synthetic dataset on
mortgage loan applications in Belgium. We investigate the potential of
parametric and nonparametric methods for the identification of individual
bid-response functions. Our results illustrate how conventional methods such as
logistic regression and neural networks suffer adversely from selection bias.
In contrast, we implement state-of-the-art methods from causal machine learning
and show their capability to overcome selection bias in pricing data.

arXiv link: http://arxiv.org/abs/2309.03730v1

Econometrics arXiv paper, submitted: 2023-09-05

Instrumental variable estimation of the proportional hazards model by presmoothing

Authors: Lorenzo Tedesco, Jad Beyhum, Ingrid Van Keilegom

We consider instrumental variable estimation of the proportional hazards
model of Cox (1972). The instrument and the endogenous variable are discrete
but there can be (possibly continuous) exogenous covariables. By making a rank
invariance assumption, we can reformulate the proportional hazards model into a
semiparametric version of the instrumental variable quantile regression model
of Chernozhukov and Hansen (2005). A naïve estimation approach based on
conditional moment conditions generated by the model would lead to a highly
nonconvex and nonsmooth objective function. To overcome this problem, we
propose a new presmoothing methodology. First, we estimate the model
nonparametrically - and show that this nonparametric estimator has a
closed-form solution in the leading case of interest of randomized experiments
with one-sided noncompliance. Second, we use the nonparametric estimator to
generate “proxy” observations for which exogeneity holds. Third, we apply the
usual partial likelihood estimator to the “proxy” data. While the paper
focuses on the proportional hazards model, our presmoothing approach could be
applied to estimate other semiparametric formulations of the instrumental
variable quantile regression model. Our estimation procedure allows for random
right-censoring. We show asymptotic normality of the resulting estimator. The
approach is illustrated via simulation studies and an empirical application to
the Illinois

arXiv link: http://arxiv.org/abs/2309.02183v1

Econometrics arXiv paper, submitted: 2023-09-05

On the use of U-statistics for linear dyadic interaction models

Authors: G. M. Szini

Even though dyadic regressions are widely used in empirical applications, the
(asymptotic) properties of estimation methods only began to be studied recently
in the literature. This paper aims to show, in a step-by-step manner, how
U-statistics tools can be applied to obtain the asymptotic properties of
pairwise differences estimators for a two-way fixed effects model of dyadic
interactions. More specifically, we first propose an estimator for the model
that relies on pairwise differencing such that the fixed effects are
differenced out. As a result, the summands of the influence function will not
be independent anymore, showing dependence on the individual level and
translating to the fact that the usual law of large numbers and central limit
theorems do not straightforwardly apply. To overcome such obstacles, we show
how to generalize tools of U-statistics for single-index variables to the
double-indices context of dyadic datasets. A key result is that there can be
different ways of defining the Hajek projection for a directed dyadic
structure, which will lead to distinct, but equivalent, consistent estimators
for the asymptotic variances. The results presented in this paper are easily
extended to non-linear models.

arXiv link: http://arxiv.org/abs/2309.02089v1

Econometrics arXiv updated paper (originally submitted: 2023-09-05)

Global Neural Networks and The Data Scaling Effect in Financial Time Series Forecasting

Authors: Chen Liu, Minh-Ngoc Tran, Chao Wang, Richard Gerlach, Robert Kohn

Neural networks have revolutionized many empirical fields, yet their
application to financial time series forecasting remains controversial. In this
study, we demonstrate that the conventional practice of estimating models
locally in data-scarce environments may underlie the mixed empirical
performance observed in prior work. By focusing on volatility forecasting, we
employ a dataset comprising over 10,000 global stocks and implement a global
estimation strategy that pools information across cross-sections. Our
econometric analysis reveals that forecasting accuracy improves markedly as the
training dataset becomes larger and more heterogeneous. Notably, even with as
little as 12 months of data, globally trained networks deliver robust
predictions for individual stocks and portfolios that are not even in the
training dataset. Furthermore, our interpretation of the model dynamics shows
that these networks not only capture key stylized facts of volatility but also
exhibit resilience to outliers and rapid adaptation to market regime changes.
These findings underscore the importance of leveraging extensive and diverse
datasets in financial forecasting and advocate for a shift from traditional
local training approaches to integrated global estimation methods.

arXiv link: http://arxiv.org/abs/2309.02072v6

Econometrics arXiv updated paper (originally submitted: 2023-09-05)

The Local Projection Residual Bootstrap for AR(1) Models

Authors: Amilcar Velez

This paper proposes a local projection residual bootstrap method to construct
confidence intervals for impulse response coefficients of AR(1) models. Our
bootstrap method is based on the local projection (LP) approach and involves a
residual bootstrap procedure applied to AR(1) models. We present theoretical
results for our bootstrap method and proposed confidence intervals. First, we
prove the uniform consistency of the LP-residual bootstrap over a large class
of AR(1) models that allow for a unit root, conditional heteroskedasticity of
unknown form, and martingale difference shocks. Then, we prove the asymptotic
validity of our confidence intervals over the same class of AR(1) models.
Finally, we show that the LP-residual bootstrap provides asymptotic refinements
for confidence intervals on a restricted class of AR(1) models relative to
those required for the uniform consistency of our bootstrap.

arXiv link: http://arxiv.org/abs/2309.01889v5
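
A condensed version of the procedure for a single horizon, with toy parameters
and a plain percentile interval (the paper's theory covers near-unit roots,
conditional heteroskedasticity, and uniform validity of the resulting
confidence intervals):

    import numpy as np

    rng = np.random.default_rng(0)
    T, rho, h = 400, 0.8, 4

    def simulate(a, rho, eps, y0=0.0):
        y = np.empty(len(eps))
        y[0] = y0
        for t in range(1, len(eps)):
            y[t] = a + rho * y[t - 1] + eps[t]
        return y

    def lp_irf(y, h):
        """Local projection: regress y_{t+h} on y_t; the slope estimates rho^h."""
        X = np.column_stack([np.ones(len(y) - h), y[:-h]])
        return np.linalg.lstsq(X, y[h:], rcond=None)[0][1]

    y = simulate(0.0, rho, rng.normal(size=T))

    # AR(1) fit by OLS and centered residuals for the bootstrap
    X1 = np.column_stack([np.ones(T - 1), y[:-1]])
    a_hat, rho_hat = np.linalg.lstsq(X1, y[1:], rcond=None)[0]
    resid = y[1:] - X1 @ np.array([a_hat, rho_hat])
    resid -= resid.mean()

    irf_hat = lp_irf(y, h)
    boot = []
    for _ in range(999):
        eps_star = rng.choice(resid, size=T, replace=True)
        y_star = simulate(a_hat, rho_hat, eps_star, y0=y[0])
        boot.append(lp_irf(y_star, h))
    lo, hi = np.percentile(boot, [2.5, 97.5])
    print("LP IRF at h=4:", round(irf_hat, 3), "| 95% CI:", (round(lo, 3), round(hi, 3)))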

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-09-04

Non-Transitivity of the Win Ratio and the Area Under the Receiver Operating Characteristics Curve (AUC): a case for evaluating the strength of stochastic comparisons

Authors: Olga V. Demler, Ilona A. Demler

The win ratio (WR) is a novel statistic used in randomized controlled trials
that can account for hierarchies within event outcomes. In this paper we report
and study the long-run non-transitive behavior of the win ratio and the closely
related Area Under the Receiver Operating Characteristics Curve (AUC) and argue
that their transitivity cannot be taken for granted. Crucially, traditional
within-group statistics (i.e., comparison of means) are always transitive,
while the WR can detect non-transitivity. Non-transitivity provides valuable
information on the stochastic relationship between two treatment groups, which
should be tested and reported. We specify the necessary conditions for
transitivity, the sufficient conditions for non-transitivity, and demonstrate
non-transitivity in a real-life large randomized controlled trial for the WR of
time-to-death. Our results can be used to rule out or evaluate the possibility
of non-transitivity and show the importance of studying the strength of
stochastic relationships.

arXiv link: http://arxiv.org/abs/2309.01791v2
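
The possibility of non-transitivity is easy to see with a textbook example,
here three non-transitive dice standing in for outcome distributions (an
illustration of the phenomenon only, not the trial data studied in the paper):
every pairwise win ratio exceeds one, yet the comparisons form a cycle.

    from itertools import product

    A = [2, 2, 4, 4, 9, 9]
    B = [1, 1, 6, 6, 8, 8]
    C = [3, 3, 5, 5, 7, 7]

    def win_ratio(x, y):
        """P(X > Y) / P(Y > X) over independent draws (no ties occur here)."""
        wins = sum(a > b for a, b in product(x, y))
        losses = sum(a < b for a, b in product(x, y))
        return wins / losses

    # Each ratio equals 1.25: A beats B, B beats C, and C beats A
    print(win_ratio(A, B), win_ratio(B, C), win_ratio(C, A))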

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-09-04

Generalized Information Criteria for Structured Sparse Models

Authors: Eduardo F. Mendes, Gabriel J. P. Pinto

Regularized m-estimators are widely used due to their ability of recovering a
low-dimensional model in high-dimensional scenarios. Some recent efforts on
this subject focused on creating a unified framework for establishing oracle
bounds, and deriving conditions for support recovery. Under this same
framework, we propose a new Generalized Information Criteria (GIC) that takes
into consideration the sparsity pattern one wishes to recover. We obtain
non-asymptotic model selection bounds and sufficient conditions for model
selection consistency of the GIC. Furthermore, we show that the GIC can also be
used for selecting the regularization parameter within a regularized
$m$-estimation framework, which allows practical use of the GIC for model
selection in high-dimensional scenarios. We provide examples of group LASSO in
the context of generalized linear regression and low rank matrix regression.

arXiv link: http://arxiv.org/abs/2309.01764v1

Econometrics arXiv paper, submitted: 2023-09-04

Design-Based Multi-Way Clustering

Authors: Luther Yap

This paper extends the design-based framework to settings with multi-way
cluster dependence, and shows how multi-way clustering can be justified when
clustered assignment and clustered sampling occurs on different dimensions, or
when either sampling or assignment is multi-way clustered. Unlike one-way
clustering, the plug-in variance estimator in multi-way clustering is no longer
conservative, so valid inference either requires an assumption on the
correlation of treatment effects or a more conservative variance estimator.
Simulations suggest that the plug-in variance estimator is usually robust, and
the conservative variance estimator is often too conservative.

arXiv link: http://arxiv.org/abs/2309.01658v1

Econometrics arXiv updated paper (originally submitted: 2023-09-04)

The Robust F-Statistic as a Test for Weak Instruments

Authors: Frank Windmeijer

Montiel Olea and Pflueger (2013) proposed the effective F-statistic as a test
for weak instruments in terms of the Nagar bias of the two-stage least squares
(2SLS) estimator relative to a benchmark worst-case bias. We show that their
methodology applies to a class of linear generalized method of moments (GMM)
estimators with an associated class of generalized effective F-statistics. The
standard nonhomoskedasticity robust F-statistic is a member of this class. The
associated GMMf estimator, with the extension f for first-stage, is a novel and
unusual estimator as the weight matrix is based on the first-stage residuals.
As the robust F-statistic can also be used as a test for underidentification,
expressions for the calculation of the weak-instruments critical values in
terms of the Nagar bias of the GMMf estimator relative to the benchmark
simplify and no simulation methods or Patnaik (1949) distributional
approximations are needed. In the grouped-data IV designs of Andrews (2018),
where the robust F-statistic is large but the effective F-statistic is small,
the GMMf estimator is shown to behave much better in terms of bias than the
2SLS estimator, as expected by the weak-instruments test results.

arXiv link: http://arxiv.org/abs/2309.01637v3

Econometrics arXiv paper, submitted: 2023-09-04

Moment-Based Estimation of Diffusion and Adoption Parameters in Networks

Authors: L. S. Sanna Stephan

According to standard econometric theory, maximum likelihood estimation (MLE)
is the efficient estimation choice; however, it is not always a feasible one.
In network diffusion models with unobserved signal propagation, MLE requires
integrating out a large number of latent variables, which quickly becomes
computationally infeasible even for moderate network sizes and time horizons.
Limiting the model time horizon, on the other hand, entails loss of important
information, while approximation techniques entail a (small) error.
Searching for a viable alternative is thus potentially highly beneficial. This
paper proposes two estimators specifically tailored to the network diffusion
model of partially observed adoption and unobserved network diffusion.

arXiv link: http://arxiv.org/abs/2309.01489v1

Econometrics arXiv paper, submitted: 2023-09-04

A Trimming Estimator for the Latent-Diffusion-Observed-Adoption Model

Authors: L. S. Sanna Stephan

Network diffusion models are applicable to many socioeconomic interactions,
yet network interaction is hard to observe or measure. Whenever the diffusion
process is unobserved, the number of possible realizations of the latent matrix
that captures agents' diffusion statuses grows exponentially with the size of
the network. Due to interdependencies, the log likelihood function cannot be
factorized into individual components. As a consequence, exact estimation of
latent diffusion models with more than one round of interaction is
computationally infeasible. In the present paper, I propose a trimming
estimator that enables me to establish and maximize an approximate log
likelihood function that almost exactly identifies the peak of the true log
likelihood function whenever no more than one third of eligible agents are
subject to trimming.

arXiv link: http://arxiv.org/abs/2309.01471v1

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2023-09-02

iCOS: Option-Implied COS Method

Authors: Evgenii Vladimirov

This paper proposes the option-implied Fourier-cosine method, iCOS, for
non-parametric estimation of risk-neutral densities, option prices, and option
sensitivities. The iCOS method leverages the Fourier-based COS technique,
proposed by Fang and Oosterlee (2008), by utilizing the option-implied cosine
series coefficients. Notably, this procedure does not rely on any model
assumptions about the underlying asset price dynamics, it is fully
non-parametric, and it does not involve any numerical optimization. These
features make it rather general and computationally appealing. Furthermore, we
derive the asymptotic properties of the proposed non-parametric estimators and
study their finite-sample behavior in Monte Carlo simulations. Our empirical
analysis using S&P 500 index options and Amazon equity options illustrates the
effectiveness of the iCOS method in extracting valuable information from option
prices under different market conditions. Additionally, we apply our
methodology to dissect and quantify observation and discretization errors in
the VIX index.
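
For readers unfamiliar with the underlying machinery, the short Python sketch
below recovers a density from its characteristic function via the Fang and
Oosterlee (2008) cosine expansion that iCOS builds on. The standard normal
characteristic function, the truncation range, and the number of terms are
illustrative assumptions; the option-implied estimation of the cosine
coefficients developed in the paper is not shown.

    import numpy as np

    # Minimal sketch of the Fang-Oosterlee COS density expansion that iCOS
    # builds on; the characteristic function, truncation range [a, b], and
    # number of terms are illustrative choices, not taken from the paper.
    def cos_density(char_fn, x, a, b, n_terms=128):
        k = np.arange(n_terms)
        u = k * np.pi / (b - a)
        coef = 2.0 / (b - a) * np.real(char_fn(u) * np.exp(-1j * u * a))
        coef[0] *= 0.5                      # the k = 0 term enters with weight 1/2
        return coef @ np.cos(np.outer(u, x - a))

    phi_normal = lambda u: np.exp(-0.5 * u**2)           # standard normal cf
    grid = np.linspace(-4.0, 4.0, 9)
    print(cos_density(phi_normal, grid, a=-8.0, b=8.0))  # close to the N(0,1) pdf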

arXiv link: http://arxiv.org/abs/2309.00943v2

Econometrics arXiv paper, submitted: 2023-09-02

Fairness Implications of Heterogeneous Treatment Effect Estimation with Machine Learning Methods in Policy-making

Authors: Patrick Rehill, Nicholas Biddle

Causal machine learning methods which flexibly generate heterogeneous
treatment effect estimates could be very useful tools for governments trying to
make and implement policy. However, as the critical artificial intelligence
literature has shown, governments must be very careful of unintended
consequences when using machine learning models. One way to try and protect
against unintended bad outcomes is with AI Fairness methods which seek to
create machine learning models where sensitive variables like race or gender do
not influence outcomes. In this paper we argue that standard AI Fairness
approaches developed for predictive machine learning are not suitable for all
causal machine learning applications because causal machine learning generally
(at least so far) uses modelling to inform a human who is the ultimate
decision-maker while AI Fairness approaches assume a model that is making
decisions directly. We define these scenarios as indirect and direct
decision-making respectively and suggest that policy-making is best seen as a
joint decision where the causal machine learning model usually only has
indirect power. We lay out a definition of fairness for this scenario - a model
that provides the information a decision-maker needs to accurately make a value
judgement about just policy outcomes - and argue that the complexity of causal
machine learning models can make this difficult to achieve. The solution here
is not traditional AI Fairness adjustments, but careful modelling and awareness
of some of the decision-making biases that these methods might encourage which
we describe.

arXiv link: http://arxiv.org/abs/2309.00805v1

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2023-08-31

New general dependence measures: construction, estimation and application to high-frequency stock returns

Authors: Aleksy Leeuwenkamp, Wentao Hu

We propose a set of dependence measures that are non-linear, local, invariant
to a wide range of transformations on the marginals, can show tail and risk
asymmetries, are always well-defined, are easy to estimate and can be used on
any dataset. We propose a nonparametric estimator and prove its consistency and
asymptotic normality. Thereby we significantly improve on existing (extreme)
dependence measures used in asset pricing and statistics. To show practical
utility, we use these measures on high-frequency stock return data around
market distress events such as the 2010 Flash Crash and during the GFC.
Contrary to ubiquitously used correlations we find that our measures clearly
show tail asymmetry, non-linearity, lack of diversification and endogenous
buildup of risks present during these distress events. Additionally, our
measures anticipate large (joint) losses during the Flash Crash while also
anticipating the bounce back and flagging the subsequent market fragility. Our
findings have implications for risk management, portfolio construction and
hedging at any frequency.

arXiv link: http://arxiv.org/abs/2309.00025v1

Econometrics arXiv paper, submitted: 2023-08-29

Target PCA: Transfer Learning Large Dimensional Panel Data

Authors: Junting Duan, Markus Pelger, Ruoxuan Xiong

This paper develops a novel method to estimate a latent factor model for a
large target panel with missing observations by optimally using the information
from auxiliary panel data sets. We refer to our estimator as target-PCA.
Transfer learning from auxiliary panel data allows us to deal with a large
fraction of missing observations and weak signals in the target panel. We show
that our estimator is more efficient and can consistently estimate weak
factors, which are not identifiable with conventional methods. We provide the
asymptotic inferential theory for target-PCA under very general assumptions on
the approximate factor model and missing patterns. In an empirical study of
imputing data in a mixed-frequency macroeconomic panel, we demonstrate that
target-PCA significantly outperforms all benchmark methods.
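
As a rough illustration of how auxiliary panel information can help impute a
target panel (a simplified stand-in, not the target-PCA estimator or its
inferential theory), the sketch below extracts factors from a fully observed
auxiliary panel by PCA and regresses each target series on them over its
observed periods; all dimensions and the missingness rate are made up.

    import numpy as np

    # Simplified stand-in for the idea, not the target-PCA estimator: PCA
    # factors from an auxiliary panel are used to impute missing entries of
    # a target panel. All dimensions and the missingness pattern are made up.
    rng = np.random.default_rng(7)
    T, N_aux, N_tgt, r = 200, 60, 20, 2
    F = rng.normal(size=(T, r))                               # common factors
    aux = F @ rng.normal(size=(r, N_aux)) + rng.normal(size=(T, N_aux))
    tgt = F @ rng.normal(size=(r, N_tgt)) + rng.normal(size=(T, N_tgt))
    miss = rng.uniform(size=tgt.shape) < 0.4                  # 40% missing at random

    U, s, _ = np.linalg.svd(aux, full_matrices=False)
    F_hat = U[:, :r] * np.sqrt(T)                             # PCA factor estimates

    imputed = tgt.copy()
    for j in range(N_tgt):
        obs = ~miss[:, j]
        beta, *_ = np.linalg.lstsq(F_hat[obs], tgt[obs, j], rcond=None)
        imputed[miss[:, j], j] = F_hat[miss[:, j]] @ beta
    print(np.abs(imputed - tgt)[miss].mean())                 # mean imputation error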

arXiv link: http://arxiv.org/abs/2308.15627v1

Econometrics arXiv paper, submitted: 2023-08-29

Mixed-Effects Methods for Search and Matching Research

Authors: John M. Abowd, Kevin L. McKinney

We study mixed-effects methods for estimating equations containing person and
firm effects. In economics such models are usually estimated using
fixed-effects methods. Recent enhancements to those fixed-effects methods
include corrections to the bias in estimating the covariance matrix of the
person and firm effects, which we also consider.

arXiv link: http://arxiv.org/abs/2308.15445v1

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2023-08-29

Combining predictive distributions of electricity prices: Does minimizing the CRPS lead to optimal decisions in day-ahead bidding?

Authors: Weronika Nitka, Rafał Weron

Probabilistic price forecasting has recently gained attention in power
trading because decisions based on such predictions can yield significantly
higher profits than those made with point forecasts alone. At the same time,
methods are being developed to combine predictive distributions, since no model
is perfect and averaging generally improves forecasting performance. In this
article we address the question of whether using CRPS learning, a novel
weighting technique minimizing the continuous ranked probability score (CRPS),
leads to optimal decisions in day-ahead bidding. To this end, we conduct an
empirical study using hourly day-ahead electricity prices from the German EPEX
market. We find that increasing the diversity of an ensemble can have a
positive impact on accuracy. At the same time, the higher computational cost of
using CRPS learning compared to an equal-weighted aggregation of distributions
is not offset by higher profits, despite significantly more accurate
predictions.
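
To make the evaluation criterion concrete, the sketch below scores an
equal-weighted pool of two hypothetical predictive samples with the
sample-based formula CRPS = E|X - y| - 0.5 E|X - X'|. The forecast
distributions and the observed price are invented, and the CRPS learning
weights studied in the article are not reproduced.

    import numpy as np

    # Sample-based CRPS of a combined predictive distribution; the forecast
    # samples and the realized price below are purely illustrative.
    def crps_ensemble(samples, y):
        samples = np.asarray(samples, dtype=float)
        term1 = np.mean(np.abs(samples - y))
        term2 = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))
        return term1 - term2

    rng = np.random.default_rng(0)
    model_a = rng.normal(50.0, 5.0, 1000)        # hypothetical price forecasts, model A
    model_b = rng.normal(55.0, 8.0, 1000)        # hypothetical price forecasts, model B
    pooled = np.concatenate([model_a, model_b])  # equal-weighted combination
    print(crps_ensemble(pooled, y=52.0))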

arXiv link: http://arxiv.org/abs/2308.15443v1

Econometrics arXiv updated paper (originally submitted: 2023-08-29)

Another Look at the Linear Probability Model and Nonlinear Index Models

Authors: Kaicheng Chen, Robert S. Martin, Jeffrey M. Wooldridge

We reassess the use of linear models to approximate response probabilities of
binary outcomes, focusing on average partial effects (APE). We confirm that
linear projection parameters coincide with APEs in certain scenarios. Through
simulations, we identify other cases where OLS does or does not approximate
APEs and find that having a large fraction of fitted values in [0, 1] is neither
necessary nor sufficient. We also show nonlinear least squares estimation of
the ramp model is consistent and asymptotically normal and is equivalent to
using OLS on an iteratively trimmed sample to reduce bias. Our findings offer
practical guidance for empirical research.
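
As a minimal sketch of the ramp specification mentioned above, the code below
fits E[y|x] = min(max(x'b, 0), 1) by nonlinear least squares on simulated
data. The data generating process and starting values are assumptions, and
the iterative-trimming equivalence discussed in the paper is not implemented.

    import numpy as np
    from scipy.optimize import least_squares

    # Minimal sketch: nonlinear least squares on the ramp model
    # E[y|x] = min(max(x'b, 0), 1). The simulated design and starting values
    # are illustrative assumptions.
    def ramp_residuals(b, X, y):
        return y - np.clip(X @ b, 0.0, 1.0)

    rng = np.random.default_rng(1)
    n = 2000
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    p_true = np.clip(X @ np.array([0.4, 0.3]), 0.0, 1.0)
    y = rng.binomial(1, p_true)
    fit = least_squares(ramp_residuals, x0=np.array([0.5, 0.0]), args=(X, y))
    print(fit.x)                                 # NLS estimates of the ramp coefficients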

arXiv link: http://arxiv.org/abs/2308.15338v3

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2023-08-29

Forecasting with Feedback

Authors: Robert P. Lieli, Augusto Nieto-Barthaburu

Systematically biased forecasts are typically interpreted as evidence of
forecasters' irrationality and/or asymmetric loss. In this paper we propose an
alternative explanation: when forecasts inform economic policy decisions, and
the resulting actions affect the realization of the forecast target itself,
forecasts may be optimally biased even under quadratic loss. The result arises
in environments in which the forecaster is uncertain about the decision maker's
reaction to the forecast, which is presumably the case in most applications. We
illustrate the empirical relevance of our theory by reviewing some stylized
properties of Green Book inflation forecasts and relating them to the
predictions from our model. Our results point out that the presence of policy
feedback poses a challenge to traditional tests of forecast rationality.

arXiv link: http://arxiv.org/abs/2308.15062v3

Econometrics arXiv cross-link from stat.CO (stat.CO), submitted: 2023-08-29

Stochastic Variational Inference for GARCH Models

Authors: Hanwen Xuan, Luca Maestrini, Feng Chen, Clara Grazian

Stochastic variational inference algorithms are derived for fitting various
heteroskedastic time series models. We examine Gaussian, t, and skew-t response
GARCH models and fit these using Gaussian variational approximating densities.
We implement efficient stochastic gradient ascent procedures based on the use
of control variates or the reparameterization trick and demonstrate that the
proposed implementations provide a fast and accurate alternative to Markov
chain Monte Carlo sampling. Additionally, we present sequential updating
versions of our variational algorithms, which are suitable for efficient
portfolio construction and dynamic asset allocation.

arXiv link: http://arxiv.org/abs/2308.14952v1

Econometrics arXiv paper, submitted: 2023-08-28

Donut Regression Discontinuity Designs

Authors: Claudia Noack, Christoph Rothe

We study the econometric properties of so-called donut regression
discontinuity (RD) designs, a robustness exercise which involves repeating
estimation and inference without the data points in some area around the
treatment threshold. This approach is often motivated by concerns that possible
systematic sorting of units, or similar data issues, in some neighborhood of
the treatment threshold might distort estimation and inference of RD treatment
effects. We show that donut RD estimators can have substantially larger bias
and variance than conventional RD estimators, and that the corresponding
confidence intervals can be substantially longer. We also provide a formal
testing framework for comparing donut and conventional RD estimation results.

arXiv link: http://arxiv.org/abs/2308.14464v1

Econometrics arXiv updated paper (originally submitted: 2023-08-28)

Bandwidth Selection for Treatment Choice with Binary Outcomes

Authors: Takuya Ishihara

This study considers the treatment choice problem when outcome variables are
binary. We focus on statistical treatment rules that plug in fitted values
based on nonparametric kernel regression and show that optimizing two
parameters enables the calculation of the maximum regret. Using this result, we
propose a novel bandwidth selection method based on the minimax regret
criterion. Finally, we perform a numerical analysis to compare the optimal
bandwidth choices for the binary and normally distributed outcomes.

arXiv link: http://arxiv.org/abs/2308.14375v2

Econometrics arXiv paper, submitted: 2023-08-28

Can Machine Learning Catch Economic Recessions Using Economic and Market Sentiments?

Authors: Kian Tehranian

Quantitative models are an important decision-making factor for policy makers
and investors. Predicting an economic recession with high accuracy and
reliability would be very beneficial for society. This paper assesses
machine learning techniques for predicting economic recessions in the United
States using market sentiment and economic indicators (seventy-five
explanatory variables) from January 1986 to June 2022 at a monthly frequency.
To address missing time-series data points, the Autoregressive Integrated
Moving Average (ARIMA) method is used to backcast explanatory variables. The
analysis starts by reducing the high-dimensional dataset to the most important
features using the Boruta algorithm and a correlation matrix, and by
addressing multicollinearity. Afterwards, various cross-validated models, both
probability regression methods and machine learning techniques, are built to
predict the binary recession outcome. The methods considered are Probit,
Logit, Elastic Net, Random Forest, Gradient Boosting, and Neural Network.
Lastly, the performance of the different models is discussed on the basis of
the confusion matrix, accuracy and F1 score, with potential reasons for their
weaknesses and robustness.
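
For orientation, the sketch below cross-validates two of the listed
classifiers on simulated indicator data. The data, the number of predictors,
and the scoring choice are placeholders, and the ARIMA backcasting and Boruta
selection steps from the paper are not reproduced.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    # Toy cross-validated recession classifiers on simulated indicators; the
    # data generating process and predictor count are placeholders.
    rng = np.random.default_rng(9)
    n, k = 400, 10
    X = rng.normal(size=(n, k))
    y = (X[:, 0] - X[:, 1] + rng.normal(size=n) > 1.0).astype(int)   # "recession" flag

    for model in (LogisticRegression(max_iter=1000), RandomForestClassifier(200)):
        score = cross_val_score(model, X, y, cv=5, scoring="f1").mean()
        print(type(model).__name__, round(score, 3))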

arXiv link: http://arxiv.org/abs/2308.16200v1

Econometrics arXiv paper, submitted: 2023-08-27

Identification and Estimation of Demand Models with Endogenous Product Entry and Exit

Authors: Victor Aguirregabiria, Alessandro Iaria, Senay Sokullu

This paper deals with the endogeneity of firms' entry and exit decisions in
demand estimation. Product entry decisions lack a single crossing property in
terms of demand unobservables, which causes the inconsistency of conventional
methods dealing with selection. We present a novel and straightforward two-step
approach to estimate demand while addressing endogenous product entry. In the
first step, our method estimates a finite mixture model of product entry
accommodating latent market types. In the second step, it estimates demand
controlling for the propensity scores of all latent market types. We apply this
approach to data from the airline industry.

arXiv link: http://arxiv.org/abs/2308.14196v1

Econometrics arXiv paper, submitted: 2023-08-27

High Dimensional Time Series Regression Models: Applications to Statistical Learning Methods

Authors: Christis Katsouris

These lecture notes provide an overview of existing methodologies and recent
developments for estimation and inference with high dimensional time series
regression models. First, we present main limit theory results for high
dimensional dependent data which is relevant to covariance matrix structures as
well as to dependent time series sequences. Second, we present main aspects of
the asymptotic theory related to time series regression models with many
covariates. Third, we discuss various applications of statistical learning
methodologies for time series analysis purposes.

arXiv link: http://arxiv.org/abs/2308.16192v1

Econometrics arXiv paper, submitted: 2023-08-26

Break-Point Date Estimation for Nonstationary Autoregressive and Predictive Regression Models

Authors: Christis Katsouris

In this article, we study the statistical and asymptotic properties of
break-point estimators in nonstationary autoregressive and predictive
regression models for testing the presence of a single structural break at an
unknown location in the full sample. Moreover, we investigate aspects such as
how the persistence properties of covariates and the location of the
break-point affects the limiting distribution of the proposed break-point
estimators.

arXiv link: http://arxiv.org/abs/2308.13915v1

Econometrics arXiv paper, submitted: 2023-08-25

Splash! Robustifying Donor Pools for Policy Studies

Authors: Jared Amani Greathouse, Mani Bayani, Jason Coupet

Policy researchers using synthetic control methods typically choose a donor
pool in part by using policy domain expertise so the untreated units are most
like the treated unit in the pre-intervention period. This potentially leaves
estimation open to biases, especially when researchers have many potential
donors. We compare how functional principal component analysis synthetic
control, forward-selection, and the original synthetic control method select
donors. To do this, we use Gaussian Process simulations as well as policy case
studies from West German Reunification, a hotel moratorium in Barcelona, and a
sugar-sweetened beverage tax in San Francisco. We then summarize the
implications for policy research and provide avenues for future work.

arXiv link: http://arxiv.org/abs/2308.13688v1

Econometrics arXiv updated paper (originally submitted: 2023-08-25)

GARCHX-NoVaS: A Model-free Approach to Incorporate Exogenous Variables

Authors: Kejin Wu, Sayar Karmakar, Rangan Gupta

In this work, we explore the forecasting ability of a recently proposed
normalizing and variance-stabilizing (NoVaS) transformation with the possible
inclusion of exogenous variables. From an applied point-of-view, extra
knowledge such as fundamentals- and sentiments-based information could be
beneficial to improve the prediction accuracy of market volatility if they are
incorporated into the forecasting process. In the classical approach, these
models including exogenous variables are typically termed GARCHX-type models.
Being a model-free prediction method, NoVaS has generally shown more accurate,
stable and robust (to misspecification) performance than classical GARCH-type
methods. This motivates us to extend this framework to GARCHX forecasting as
well. We derive the NoVaS transformation needed to include exogenous
covariates and then construct the corresponding prediction procedure.
Extensive simulation studies bolster our claim that the NoVaS method
outperforms traditional ones, especially for long-term time-aggregated
predictions. We also provide an interesting data analysis to
exhibit how our method could possibly shed light on the role of geopolitical
risks in forecasting volatility in national stock market indices for three
different countries in Europe.

arXiv link: http://arxiv.org/abs/2308.13346v3

Econometrics arXiv updated paper (originally submitted: 2023-08-25)

SGMM: Stochastic Approximation to Generalized Method of Moments

Authors: Xiaohong Chen, Sokbae Lee, Yuan Liao, Myung Hwan Seo, Youngki Shin, Myunghyun Song

We introduce a new class of algorithms, Stochastic Generalized Method of
Moments (SGMM), for estimation and inference on (overidentified) moment
restriction models. Our SGMM is a novel stochastic approximation alternative to
the popular Hansen (1982) (offline) GMM, and offers fast and scalable
implementation with the ability to handle streaming datasets in real time. We
establish the almost sure convergence, and the (functional) central limit
theorem for the inefficient online 2SLS and the efficient SGMM. Moreover, we
propose online versions of the Durbin-Wu-Hausman and Sargan-Hansen tests that
can be seamlessly integrated within the SGMM framework. Extensive Monte Carlo
simulations show that as the sample size increases, the SGMM matches the
standard (offline) GMM in terms of estimation accuracy and gains over
computational efficiency, indicating its practical value for both large-scale
and online datasets. We demonstrate the efficacy of our approach by a proof of
concept using two well known empirical examples with large sample sizes.
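
To give a flavour of the streaming idea (this is only a toy Robbins-Monro
analogue, not the authors' SGMM algorithm or its inference procedures), the
sketch below updates a just-identified linear IV estimate one observation at
a time with Polyak-Ruppert averaging; the data generating process, burn-in,
and step size are assumptions.

    import numpy as np

    # Toy Robbins-Monro update with Polyak-Ruppert averaging for the
    # just-identified linear IV moment E[z(y - x*theta)] = 0. This is an
    # illustrative analogue only, not the SGMM algorithm of the paper.
    rng = np.random.default_rng(2)
    n = 200_000
    z = rng.normal(size=n)
    v = rng.normal(size=n)
    x = 0.8 * z + v                                   # endogenous regressor
    y = 1.0 * x + 0.5 * v + rng.normal(size=n)        # true theta = 1

    burn = 500
    zx_mean = np.mean(z[:burn] * x[:burn])            # pilot estimate of E[zx]
    theta, theta_bar = 0.0, 0.0
    for t in range(burn, n):
        i = t - burn + 1
        zx_mean += (z[t] * x[t] - zx_mean) / (t + 1)  # keep updating E[zx] online
        theta += (1.0 / i) * z[t] * (y[t] - x[t] * theta) / zx_mean
        theta_bar += (theta - theta_bar) / i          # Polyak-Ruppert average
    print(theta_bar)                                  # approaches 1 as n grows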

arXiv link: http://arxiv.org/abs/2308.13564v2

Econometrics arXiv paper, submitted: 2023-08-24

Spatial and Spatiotemporal Volatility Models: A Review

Authors: Philipp Otto, Osman Doğan, Süleyman Taşpınar, Wolfgang Schmid, Anil K. Bera

Spatial and spatiotemporal volatility models are a class of models designed
to capture spatial dependence in the volatility of spatial and spatiotemporal
data. Spatial dependence in the volatility may arise due to spatial spillovers
among locations; that is, if two locations are in close proximity, they can
exhibit similar volatilities. In this paper, we aim to provide a comprehensive
review of the recent literature on spatial and spatiotemporal volatility
models. We first briefly review time series volatility models and their
multivariate extensions to motivate their spatial and spatiotemporal
counterparts. We then review various spatial and spatiotemporal volatility
specifications proposed in the literature along with their underlying
motivations and estimation strategies. Through this analysis, we effectively
compare all models and provide practical recommendations for their appropriate
usage. We highlight possible extensions and conclude by outlining directions
for future research.

arXiv link: http://arxiv.org/abs/2308.13061v1

Econometrics arXiv updated paper (originally submitted: 2023-08-24)

Optimal Shrinkage Estimation of Fixed Effects in Linear Panel Data Models

Authors: Soonwoo Kwon

Shrinkage methods are frequently used to improve the precision of least
squares estimators of fixed effects. However, widely used shrinkage estimators
guarantee improved precision only under strong distributional assumptions. I
develop an estimator for the fixed effects that obtains the best possible mean
squared error within a class of shrinkage estimators. This class includes
conventional shrinkage estimators and the optimality does not require
distributional assumptions. The estimator has an intuitive form and is easy to
implement. Moreover, the fixed effects are allowed to vary with time and to be
serially correlated, in which case the shrinkage optimally incorporates the
underlying correlation structure. I also provide a method to forecast fixed
effects one period ahead in this setting.

arXiv link: http://arxiv.org/abs/2308.12485v4

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-08-23

Scalable Estimation of Multinomial Response Models with Random Consideration Sets

Authors: Siddhartha Chib, Kenichi Shimizu

A common assumption in the fitting of unordered multinomial response models
for $J$ mutually exclusive categories is that the responses arise from the same
set of $J$ categories across subjects. However, when responses measure a choice
made by the subject, it is more appropriate to condition the distribution of
multinomial responses on a subject-specific consideration set, drawn from the
power set of $\{1,2,\ldots,J\}$. This leads to a mixture of multinomial
response models governed by a probability distribution over the $J^{\ast} = 2^J
-1$ consideration sets. We introduce a novel method for estimating such
generalized multinomial response models based on the fundamental result that
any mass distribution over $J^{\ast}$ consideration sets can be represented as
a mixture of products of $J$ component-specific inclusion-exclusion
probabilities. Moreover, under time-invariant consideration sets, the
conditional posterior distribution of consideration sets is sparse. These
features enable a scalable MCMC algorithm for sampling the posterior
distribution of parameters, random effects, and consideration sets. Under
regularity conditions, the posterior distributions of the marginal response
probabilities and the model parameters satisfy consistency. The methodology is
demonstrated in a longitudinal data set on weekly cereal purchases that cover
$J = 101$ brands, a dimension substantially beyond the reach of existing
methods.

arXiv link: http://arxiv.org/abs/2308.12470v5

Econometrics arXiv paper, submitted: 2023-08-22

Forecasting inflation using disaggregates and machine learning

Authors: Gilberto Boaretto, Marcelo C. Medeiros

This paper examines the effectiveness of several forecasting methods for
predicting inflation, focusing on aggregating disaggregated forecasts - also
known in the literature as the bottom-up approach. Taking the Brazilian case as
an application, we consider different disaggregation levels for inflation and
employ a range of traditional time series techniques as well as linear and
nonlinear machine learning (ML) models to deal with a larger number of
predictors. For many forecast horizons, the aggregation of disaggregated
forecasts performs just as well as survey-based expectations and models that
generate forecasts using the aggregate directly. Overall, ML methods outperform
traditional time series models in predictive accuracy, with outstanding
performance in forecasting disaggregates. Our results reinforce the benefits of
using models in a data-rich environment for inflation forecasting, including
aggregating disaggregated forecasts from ML techniques, mainly during volatile
periods. Starting from the COVID-19 pandemic, the random forest model based on
both aggregate and disaggregated inflation achieves remarkable predictive
performance at intermediate and longer horizons.

arXiv link: http://arxiv.org/abs/2308.11173v1

Econometrics arXiv paper, submitted: 2023-08-21

Econometrics of Machine Learning Methods in Economic Forecasting

Authors: Andrii Babii, Eric Ghysels, Jonas Striaukas

This paper surveys the recent advances in machine learning methods for
economic forecasting. The survey covers the following topics: nowcasting,
textual data, panel and tensor data, high-dimensional Granger causality tests,
time series cross-validation, and classification with economic losses.

arXiv link: http://arxiv.org/abs/2308.10993v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-08-21

Simulation Experiments as a Causal Problem

Authors: Tyrel Stokes, Ian Shrier, Russell Steele

Simulation methods are among the most ubiquitous methodological tools in
statistical science. In particular, statisticians often use simulation to
explore properties of statistical functionals in models for which developed
statistical theory is insufficient or to assess finite sample properties of
theoretical results. We show that the design of simulation experiments can be
viewed from the perspective of causal intervention on a data generating
mechanism. We then demonstrate the use of causal tools and frameworks in this
context. Our perspective is agnostic to the particular domain of the simulation
experiment which increases the potential impact of our proposed approach. In
this paper, we consider two illustrative examples. First, we re-examine a
predictive machine learning example from a popular textbook designed to assess
the relationship between mean function complexity and the mean-squared error.
Second, we discuss a traditional causal inference method problem, simulating
the effect of unmeasured confounding on estimation, specifically to illustrate
bias amplification. In both cases, applying causal principles and using
graphical models with parameters and distributions as nodes in the spirit of
influence diagrams can 1) make precise which estimand the simulation targets,
2) suggest modifications to better attain the simulation goals, and 3) provide
scaffolding to discuss performance criteria for a particular simulation design.

arXiv link: http://arxiv.org/abs/2308.10823v1

Econometrics arXiv updated paper (originally submitted: 2023-08-20)

Genuinely Robust Inference for Clustered Data

Authors: Harold D. Chiang, Yuya Sasaki, Yulong Wang

Conventional cluster-robust inference can be invalid when data contain
clusters of unignorably large size. We formalize this issue by deriving a
necessary and sufficient condition for its validity, and show that this
condition is frequently violated in practice: specifications from 77% of
empirical research articles in American Economic Review and Econometrica during
2020-2021 appear not to meet it. To address this limitation, we propose a
genuinely robust inference procedure based on a new cluster score bootstrap. We
establish its validity and size control across broad classes of data-generating
processes where conventional methods break down. Simulation studies corroborate
our theoretical findings, and empirical applications illustrate that employing
the proposed method can substantially alter conventional statistical
conclusions.

arXiv link: http://arxiv.org/abs/2308.10138v8

Econometrics arXiv updated paper (originally submitted: 2023-08-18)

Weak Identification with Many Instruments

Authors: Anna Mikusheva, Liyang Sun

Linear instrumental variable regressions are widely used to estimate causal
effects. Many instruments arise from the use of “technical” instruments and
more recently from the empirical strategy of “judge design”. This paper
surveys and summarizes ideas from recent literature on estimation and
statistical inferences with many instruments for a single endogenous regressor.
We discuss how to assess the strength of the instruments and how to conduct
weak identification-robust inference under heteroskedasticity. We establish new
results for a jack-knifed version of the Lagrange Multiplier (LM) test
statistic. Furthermore, we extend the weak-identification-robust tests to
settings with both many exogenous regressors and many instruments. We propose a
test that properly partials out many exogenous regressors while preserving the
re-centering property of the jack-knife. The proposed tests have correct size
and good power properties.

arXiv link: http://arxiv.org/abs/2308.09535v2

Econometrics arXiv paper, submitted: 2023-08-17

Closed-form approximations of moments and densities of continuous-time Markov models

Authors: Dennis Kristensen, Young Jun Lee, Antonio Mele

This paper develops power series expansions of a general class of moment
functions, including transition densities and option prices, of continuous-time
Markov processes, including jump--diffusions. The proposed expansions extend
the ones in Kristensen and Mele (2011) to cover general Markov processes. We
demonstrate that the class of expansions nests the transition density and
option price expansions developed in Yang, Chen, and Wan (2019) and Wan and
Yang (2021) as special cases, thereby connecting seemingly different ideas in a
unified framework. We show how the general expansion can be implemented for
fully general jump--diffusion models. We provide a new theory for the validity
of the expansions which shows that series expansions are not guaranteed to
converge as more terms are added in general. Thus, these methods should be used
with caution. At the same time, the numerical studies in this paper demonstrate
good performance of the proposed implementation in practice when a small number
of terms are included.

arXiv link: http://arxiv.org/abs/2308.09009v1

Econometrics arXiv updated paper (originally submitted: 2023-08-17)

Linear Regression with Weak Exogeneity

Authors: Anna Mikusheva, Mikkel Sølvsten

This paper studies linear time series regressions with many regressors. Weak
exogeneity is the most used identifying assumption in time series. Weak
exogeneity requires the structural error to have zero conditional expectation
given the present and past regressor values, allowing errors to correlate with
future regressor realizations. We show that weak exogeneity in time series
regressions with many controls may produce substantial biases and even render
the least squares (OLS) estimator inconsistent. The bias arises in settings
with many regressors because the normalized OLS design matrix remains
asymptotically random and correlates with the regression error when only weak
(but not strict) exogeneity holds. This bias's magnitude increases with the
number of regressors and their average autocorrelation. To address this issue,
we propose an innovative approach to bias correction that yields a new
estimator with improved properties relative to OLS. We establish consistency
and conditional asymptotic Gaussianity of this new estimator and provide a
method for inference.

arXiv link: http://arxiv.org/abs/2308.08958v2

Econometrics arXiv updated paper (originally submitted: 2023-08-16)

Testing Partial Instrument Monotonicity

Authors: Hongyi Jiang, Zhenting Sun

When multi-dimensional instruments are used to identify and estimate causal
effects, the monotonicity condition may not hold due to heterogeneity in the
population. Under a partial monotonicity condition, which only requires the
monotonicity to hold for each instrument separately holding all the other
instruments fixed, the 2SLS estimand can still be a positively weighted average
of LATEs. In this paper, we provide a simple nonparametric test for partial
instrument monotonicity. We demonstrate the good finite sample properties of
the test through Monte Carlo simulations. We then apply the test to monetary
incentives and distance from results centers as instruments for the knowledge
of HIV status.

arXiv link: http://arxiv.org/abs/2308.08390v2

Econometrics arXiv cross-link from cs.CV (cs.CV), submitted: 2023-08-16

Computer vision-enriched discrete choice models, with an application to residential location choice

Authors: Sander van Cranenburgh, Francisco Garrido-Valenzuela

Visual imagery is indispensable to many multi-attribute decision situations.
Examples of such decision situations in travel behaviour research include
residential location choices, vehicle choices, tourist destination choices, and
various safety-related choices. However, current discrete choice models cannot
handle image data and thus cannot incorporate information embedded in images
into their representations of choice behaviour. This gap between discrete
choice models' capabilities and the real-world behaviour they seek to model
leads to incomplete and, possibly, misleading outcomes. To close this gap, this
study proposes "Computer Vision-enriched Discrete Choice Models" (CV-DCMs).
CV-DCMs can handle choice tasks involving numeric attributes and images by
integrating computer vision and traditional discrete choice models. Moreover,
because CV-DCMs are grounded in random utility maximisation principles, they
maintain the solid behavioural foundation of traditional discrete choice
models. We demonstrate the proposed CV-DCM by applying it to data obtained
through a novel stated choice experiment involving residential location
choices. In this experiment, respondents faced choice tasks with trade-offs
between commute time, monthly housing cost and street-level conditions,
presented using images. As such, this research contributes to the growing body
of literature in the travel behaviour field that seeks to integrate discrete
choice modelling and machine learning.

arXiv link: http://arxiv.org/abs/2308.08276v1

Econometrics arXiv updated paper (originally submitted: 2023-08-16)

Estimating Effects of Long-Term Treatments

Authors: Shan Huang, Chen Wang, Yuan Yuan, Jinglong Zhao, Brocco, Zhang

Estimating the effects of long-term treatments through A/B testing is
challenging. Treatments, such as updates to product functionalities, user
interface designs, and recommendation algorithms, are intended to persist
within the system for a long duration of time after their initial launches.
However, due to the constraints of conducting long-term experiments,
practitioners often rely on short-term experimental results to make product
launch decisions. It remains open how to accurately estimate the effects of
long-term treatments using short-term experimental data. To address this
question, we introduce a longitudinal surrogate framework that decomposes the
long-term effects into functions based on user attributes, short-term metrics,
and treatment assignments. We outline identification assumptions, estimation
strategies, inferential techniques, and validation methods under this
framework. Empirically, we demonstrate that our approach outperforms existing
solutions by using data from two real-world experiments, each involving more
than a million users on WeChat, one of the world's largest social networking
platforms.

arXiv link: http://arxiv.org/abs/2308.08152v2

Econometrics arXiv updated paper (originally submitted: 2023-08-15)

Emerging Frontiers: Exploring the Impact of Generative AI Platforms on University Quantitative Finance Examinations

Authors: Rama K. Malladi

This study evaluated three Artificial Intelligence (AI) large language model
(LLM) enabled platforms - ChatGPT, BARD, and Bing AI - to answer an
undergraduate finance exam with 20 quantitative questions across various
difficulty levels. ChatGPT scored 30 percent, outperforming Bing AI, which
scored 20 percent, while Bard lagged behind with a score of 15 percent. These
models faced common challenges, such as inaccurate computations and formula
selection. While they are currently insufficient for helping students pass the
finance exam, they serve as valuable tools for dedicated learners. Future
advancements are expected to overcome these limitations, allowing for improved
formula selection and accurate computations and potentially enabling students
to score 90 percent or higher.

arXiv link: http://arxiv.org/abs/2308.07979v2

Econometrics arXiv paper, submitted: 2023-08-15

Optimizing B2B Product Offers with Machine Learning, Mixed Logit, and Nonlinear Programming

Authors: John V. Colias, Stella Park, Elizabeth Horn

In B2B markets, value-based pricing and selling has become an important
alternative to discounting. This study outlines a modeling method that uses
customer data (product offers made to each current or potential customer,
features, discounts, and customer purchase decisions) to estimate a mixed logit
choice model. The model is estimated via hierarchical Bayes and machine
learning, delivering customer-level parameter estimates. Customer-level
estimates are input into a nonlinear programming next-offer maximization
problem to select optimal features and discount level for customer segments,
where segments are based on loyalty and discount elasticity. The mixed logit
model is integrated with economic theory (the random utility model), and it
predicts both customer perceived value for and response to alternative future
sales offers. The methodology can be implemented to support value-based pricing
and selling efforts.
Contributions to the literature include: (a) the use of customer-level
parameter estimates from a mixed logit model, delivered via a hierarchical
Bayes estimation procedure, to support value-based pricing decisions; (b)
validation that mixed logit customer-level modeling can deliver strong
predictive accuracy, not as high as random forest but comparing favorably; and
(c) a nonlinear programming problem that uses customer-level mixed logit
estimates to select optimal features and discounts.

arXiv link: http://arxiv.org/abs/2308.07830v1

Econometrics arXiv paper, submitted: 2023-08-15

Serendipity in Science

Authors: Pyung Nahm, Raviv Murciano-Goroff, Michael Park, Russell J. Funk

Serendipity plays an important role in scientific discovery. Indeed, many of
the most important breakthroughs, ranging from penicillin to the electric
battery, have been made by scientists who were stimulated by a chance exposure
to unsought but useful information. However, not all scientists are equally
likely to benefit from such serendipitous exposure. Although scholars generally
agree that scientists with a prepared mind are most likely to benefit from
serendipitous encounters, there is much less consensus over what precisely
constitutes a prepared mind, with some research suggesting the importance of
openness and others emphasizing the need for deep prior experience in a
particular domain. In this paper, we empirically investigate the role of
serendipity in science by leveraging a policy change that exogenously shifted
the shelving location of journals in university libraries and subsequently
exposed scientists to unsought scientific information. Using large-scale data
on 2.4 million papers published in 9,750 journals by 520,000 scientists at 115
North American research universities, we find that scientists with greater
openness are more likely to benefit from serendipitous encounters. Following
the policy change, these scientists tended to cite less familiar and newer
work, and ultimately published papers that were more innovative. By contrast,
we find little effect on innovativeness for scientists with greater depth of
experience, who, in our sample, tended to cite more familiar and older work
following the policy change.

arXiv link: http://arxiv.org/abs/2308.07519v1

Econometrics arXiv updated paper (originally submitted: 2023-08-12)

Quantile Time Series Regression Models Revisited

Authors: Christis Katsouris

This article discusses recent developments in the literature of quantile time
series models in the cases of stationary and nonstationary underlying stochastic
processes.

arXiv link: http://arxiv.org/abs/2308.06617v3

Econometrics arXiv cross-link from cs.HC (cs.HC), submitted: 2023-08-12

Driver Heterogeneity in Willingness to Give Control to Conditional Automation

Authors: Muhammad Sajjad Ansar, Nael Alsaleh, Bilal Farooq

The driver's willingness to give (WTG) control in conditionally automated
driving is assessed in a virtual reality based driving-rig, through their
choice to give away driving control and through the extent to which automated
driving is adopted in a mixed-traffic environment. Within- and across-class
unobserved heterogeneity and locus of control variations are taken into
account. The choice of giving away control is modelled using the mixed logit
(MIXL) and mixed latent class (LCML) model. The significant latent segments of
the locus of control are developed into internalizers and externalizers by the
latent class model (LCM) based on the taste heterogeneity identified from the
MIXL model. Results suggest that drivers choose to "giveAway" control of the
vehicle when greater concentration/attentiveness is required (e.g., in the
nighttime) or when they are interested in performing a non-driving-related task
(NDRT). In addition, it is observed that internalizers demonstrate more
heterogeneity compared to externalizers in terms of WTG.

arXiv link: http://arxiv.org/abs/2308.06426v1

Econometrics arXiv paper, submitted: 2023-08-11

Characterizing Correlation Matrices that Admit a Clustered Factor Representation

Authors: Chen Tong, Peter Reinhard Hansen

The Clustered Factor (CF) model induces a block structure on the correlation
matrix and is commonly used to parameterize correlation matrices. Our results
reveal that the CF model imposes superfluous restrictions on the correlation
matrix. This can be avoided by a different parametrization, involving the
logarithmic transformation of the block correlation matrix.
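
To visualize the block structure being discussed, the sketch below builds a
correlation matrix in which all pairs within a cluster share one correlation
and all cross-cluster pairs share another. The cluster sizes and correlation
values are illustrative, and the logarithmic parametrization proposed in the
paper is not shown.

    import numpy as np

    # Illustrative block correlation matrix: equal correlation within each
    # cluster and a common correlation across clusters. The sizes and values
    # below are made up for display purposes only.
    sizes = [3, 2]                        # two clusters with 3 and 2 variables
    within = [0.6, 0.5]                   # within-cluster correlations
    between = 0.2                         # cross-cluster correlation

    labels = np.repeat(np.arange(len(sizes)), sizes)
    R = np.full((len(labels), len(labels)), between)
    for c, w in enumerate(within):
        idx = labels == c
        R[np.ix_(idx, idx)] = w
    np.fill_diagonal(R, 1.0)
    print(R)
    print(bool(np.all(np.linalg.eigvalsh(R) > 0)))   # positive definiteness check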

arXiv link: http://arxiv.org/abs/2308.05895v1

Econometrics arXiv updated paper (originally submitted: 2023-08-10)

Large Skew-t Copula Models and Asymmetric Dependence in Intraday Equity Returns

Authors: Lin Deng, Michael Stanley Smith, Worapree Maneesoonthorn

Skew-t copula models are attractive for the modeling of financial data
because they allow for asymmetric and extreme tail dependence. We show that the
copula implicit in the skew-t distribution of Azzalini and Capitanio (2003)
allows for a higher level of pairwise asymmetric dependence than two popular
alternative skew-t copulas. Estimation of this copula in high dimensions is
challenging, and we propose a fast and accurate Bayesian variational inference
(VI) approach to do so. The method uses a generative representation of the
skew-t distribution to define an augmented posterior that can be approximated
accurately. A stochastic gradient ascent algorithm is used to solve the
variational optimization. The methodology is used to estimate skew-t factor
copula models with up to 15 factors for intraday returns from 2017 to 2021 on
93 U.S. equities. The copula captures substantial heterogeneity in asymmetric
dependence over equity pairs, in addition to the variability in pairwise
correlations. In a moving window study we show that the asymmetric dependencies
also vary over time, and that intraday predictive densities from the skew-t
copula are more accurate than those from benchmark copula models. Portfolio
selection strategies based on the estimated pairwise asymmetric dependencies
improve performance relative to the index.

arXiv link: http://arxiv.org/abs/2308.05564v4

Econometrics arXiv updated paper (originally submitted: 2023-08-10)

Money Growth and Inflation: A Quantile Sensitivity Approach

Authors: Matteo Iacopini, Aubrey Poon, Luca Rossini, Dan Zhu

An innovative method is proposed to construct a quantile dependence system
for inflation and money growth. By considering all quantiles and leveraging a
novel notion of quantile sensitivity, the method allows the assessment of
changes in the entire distribution of a variable of interest in response to a
perturbation in another variable's quantile. The construction of this
relationship is demonstrated through a system of linear quantile regressions.
Then, the proposed framework is exploited to examine the distributional effects
of money growth on the distributions of inflation and its disaggregate measures
in the United States and the Euro area. The empirical analysis uncovers
significant impacts of the upper quantile of the money growth distribution on
the distribution of inflation and its disaggregate measures. Conversely, the
lower and median quantiles of the money growth distribution are found to have a
negligible influence. Finally, this distributional impact exhibits variation
over time in both the United States and the Euro area.
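
As a stripped-down illustration of a system of linear quantile regressions
(only the building block, not the paper's quantile-sensitivity construction),
the sketch below regresses simulated inflation on simulated money growth at
several quantiles; the data generating process and quantile grid are
assumptions.

    import numpy as np
    import statsmodels.api as sm

    # Building-block sketch: linear quantile regressions of inflation on money
    # growth across several quantiles. The simulated series are placeholders.
    rng = np.random.default_rng(8)
    T = 300
    money_growth = rng.normal(5.0, 2.0, T)
    inflation = 0.5 + 0.3 * money_growth + rng.standard_t(4, T)
    X = sm.add_constant(money_growth)

    for q in (0.1, 0.5, 0.9):
        res = sm.QuantReg(inflation, X).fit(q=q)
        print(q, res.params)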

arXiv link: http://arxiv.org/abs/2308.05486v3

Econometrics arXiv paper, submitted: 2023-08-10

Solving the Forecast Combination Puzzle

Authors: David T. Frazier, Ryan Covey, Gael M. Martin, Donald Poskitt

We demonstrate that the forecasting combination puzzle is a consequence of
the methodology commonly used to produce forecast combinations. By the
combination puzzle, we refer to the empirical finding that predictions formed
by combining multiple forecasts in ways that seek to optimize forecast
performance often do not out-perform more naive, e.g. equally-weighted,
approaches. In particular, we demonstrate that, due to the manner in which such
forecasts are typically produced, tests that aim to discriminate between the
predictive accuracy of competing combination strategies can have low power, and
can lack size control, leading to an outcome that favours the naive approach.
We show that this poor performance is due to the behavior of the corresponding
test statistic, which has a non-standard asymptotic distribution under the null
hypothesis of no inferior predictive accuracy, rather than the standard normal
distribution that is typically adopted. In addition, we demonstrate that the
low power of such predictive accuracy tests in the forecast combination setting
can be completely avoided if more efficient estimation strategies are used in
the production of the combinations, when feasible. We illustrate these findings
both in the context of forecasting a functional of interest and in terms of
predictive densities. A short empirical example using daily financial returns
exemplifies how researchers can avoid the puzzle in practical settings.
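
The contrast at the heart of the puzzle can be previewed with the simulated
sketch below, which compares an equal-weighted combination with weights chosen
by least squares to minimize in-sample squared error; the forecasts are
invented and the testing results of the paper are not reproduced.

    import numpy as np

    # Equal-weighted versus in-sample "optimal" combination weights on
    # simulated forecasts; purely illustrative.
    rng = np.random.default_rng(3)
    T = 400
    y = rng.normal(size=T)
    f1 = y + rng.normal(scale=1.0, size=T)        # forecast 1
    f2 = y + rng.normal(scale=1.2, size=T)        # forecast 2
    F = np.column_stack([f1, f2])

    equal = F.mean(axis=1)                        # naive equal weights
    w, *_ = np.linalg.lstsq(F, y, rcond=None)     # least-squares weights
    optimal = F @ w

    print(np.mean((y - equal) ** 2), np.mean((y - optimal) ** 2))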

arXiv link: http://arxiv.org/abs/2308.05263v1

Econometrics arXiv paper, submitted: 2023-08-09

Interpolation of numerical series by the Fermat-Torricelli point construction method on the example of the numerical series of inflation in the Czech Republic in 2011-2021

Authors: Yekimov Sergey

The use of regression analysis for processing experimental data involves
certain difficulties: model construction rests on the assumptions that errors
follow a normal distribution and that the variables are statistically
independent. In practice, these conditions do not always hold, which may leave
the constructed economic-mathematical model without practical value. As an
alternative approach to the study of numerical series, the author suggests
smoothing the series using Fermat-Torricelli points and then interpolating
these points by series of exponents. The use of exponential series for
interpolating numerical series makes it possible to achieve model accuracy no
worse than that of regression analysis. At the same time, interpolation by
series of exponents requires neither that the errors of the numerical series
obey the normal distribution law nor that the variables be statistically
independent. Interpolation of numerical series by exponential series yields a
"black box" type model, that is, only input and output parameters matter.
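
For concreteness, the sketch below computes a Fermat-Torricelli point
(geometric median) with Weiszfeld's algorithm, one standard way to obtain such
points. The three (year, inflation) observations are invented, and the
subsequent interpolation by series of exponents described in the abstract is
not shown.

    import numpy as np

    # Weiszfeld's algorithm for a Fermat-Torricelli (geometric median) point;
    # the example observations are invented.
    def fermat_torricelli(points, iters=200, eps=1e-9):
        pts = np.asarray(points, dtype=float)
        x = pts.mean(axis=0)                       # start from the centroid
        for _ in range(iters):
            d = np.maximum(np.linalg.norm(pts - x, axis=1), eps)
            w = 1.0 / d
            x = (w[:, None] * pts).sum(axis=0) / w.sum()
        return x

    print(fermat_torricelli([[2011.0, 1.9], [2012.0, 3.3], [2013.0, 1.4]]))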

arXiv link: http://arxiv.org/abs/2308.05183v1

Econometrics arXiv paper, submitted: 2023-08-09

Statistical Decision Theory Respecting Stochastic Dominance

Authors: Charles F. Manski, Aleksey Tetenov

The statistical decision theory pioneered by Wald (1950) has used
state-dependent mean loss (risk) to measure the performance of statistical
decision functions across potential samples. We think it evident that
evaluation of performance should respect stochastic dominance, but we do not
see a compelling reason to focus exclusively on mean loss. We think it
instructive to also measure performance by other functionals that respect
stochastic dominance, such as quantiles of the distribution of loss. This paper
develops general principles and illustrative applications for statistical
decision theory respecting stochastic dominance. We modify the Wald definition
of admissibility to an analogous concept of stochastic dominance (SD)
admissibility, which uses stochastic dominance rather than mean sampling
performance to compare alternative decision rules. We study SD admissibility in
two relatively simple classes of decision problems that arise in treatment
choice. We reevaluate the relationship between the MLE, James-Stein, and
James-Stein positive part estimators from the perspective of SD admissibility.
We consider alternative criteria for choice among SD-admissible rules. We
juxtapose traditional criteria based on risk, regret, or Bayes risk with
analogous ones based on quantiles of state-dependent sampling distributions or
the Bayes distribution of loss.
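
The basic point that mean loss and quantiles of loss can rank rules
differently is easy to see numerically. In the toy sketch below the two loss
distributions are invented and are not the treatment-choice problems analysed
in the paper.

    import numpy as np

    # Toy comparison of two decision rules by quantiles of their loss
    # distributions versus mean loss; both distributions are invented.
    rng = np.random.default_rng(6)
    loss_rule_a = rng.exponential(1.0, 100_000)             # mean 1, heavy right tail
    loss_rule_b = rng.normal(1.1, 0.1, 100_000).clip(0.0)   # mean 1.1, tight spread

    for q in (0.5, 0.9, 0.99):
        print(q, np.quantile(loss_rule_a, q), np.quantile(loss_rule_b, q))
    print(loss_rule_a.mean(), loss_rule_b.mean())           # mean ranking favours rule A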

arXiv link: http://arxiv.org/abs/2308.05171v1

Econometrics arXiv paper, submitted: 2023-08-09

A Guide to Impact Evaluation under Sample Selection and Missing Data: Teacher's Aides and Adolescent Mental Health

Authors: Simon Calmar Andersen, Louise Beuchert, Phillip Heiler, Helena Skyt Nielsen

This paper is concerned with identification, estimation, and specification
testing in causal evaluation problems when data is selective and/or missing. We
leverage recent advances in the literature on graphical methods to provide a
unifying framework for guiding empirical practice. The approach integrates and
connects to prominent identification and testing strategies in the literature
on missing data, causal machine learning, panel data analysis, and more. We
demonstrate its utility in the context of identification and specification
testing in sample selection models and field experiments with attrition. We
provide a novel analysis of a large-scale cluster-randomized controlled
teacher's aide trial in Danish schools at grade 6. Even with detailed
administrative data, the handling of missing data crucially affects broader
conclusions about effects on mental health. Results suggest that teaching
assistants provide an effective way of improving internalizing behavior for
large parts of the student population.

arXiv link: http://arxiv.org/abs/2308.04963v1

Econometrics arXiv updated paper (originally submitted: 2023-08-08)

Causal Interpretation of Linear Social Interaction Models with Endogenous Networks

Authors: Tadao Hoshino

This study investigates the causal interpretation of linear social
interaction models in the presence of endogeneity in network formation under a
heterogeneous treatment effects framework. We consider an experimental setting
in which individuals are randomly assigned to treatments while no interventions
are made for the network structure. We show that running a linear regression
ignoring network endogeneity is not problematic for estimating the average
direct treatment effect. However, it leads to sample selection bias and
negative-weights problem for the estimation of the average spillover effect. To
overcome these problems, we propose using potential peer treatment as an
instrumental variable (IV), which is automatically a valid IV for actual
spillover exposure. Using this IV, we examine two IV-based estimands and
demonstrate that they have a local average treatment-effect-type causal
interpretation for the spillover effect.

arXiv link: http://arxiv.org/abs/2308.04276v2

Econometrics arXiv updated paper (originally submitted: 2023-08-08)

Threshold Regression in Heterogeneous Panel Data with Interactive Fixed Effects

Authors: Marco Barassi, Yiannis Karavias, Chongxian Zhu

This paper introduces unit-specific heterogeneity in panel data threshold
regression. We develop a comprehensive asymptotic theory for models with
heterogeneous thresholds, heterogeneous slope coefficients, and interactive
fixed effects. Our estimation methodology employs the Common Correlated Effects
approach, which is able to handle heterogeneous coefficients while maintaining
computational simplicity. We also propose a semi-homogeneous model with
heterogeneous slopes but a common threshold, revealing novel mean group
estimator convergence rates due to the interaction of heterogeneity with the
shrinking threshold assumption. Tests for linearity are provided, and also a
modified information criterion which can choose between the fully heterogeneous
and the semi-homogeneous models. Monte Carlo simulations demonstrate the good
performance of the new methods in small samples. The new theory is applied to
examine the Feldstein-Horioka puzzle and it is found that threshold
nonlinearity with respect to trade openness exists only in a small subset of
countries.

arXiv link: http://arxiv.org/abs/2308.04057v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-08-07

Measuring income inequality via percentile relativities

Authors: Vytaras Brazauskas, Francesca Greselin, Ricardas Zitikis

"The rich are getting richer" implies that the population income
distributions are getting more right skewed and heavily tailed. For such
distributions, the mean is not the best measure of the center, but the
classical indices of income inequality, including the celebrated Gini index,
are all mean-based. In view of this, Professor Gastwirth sounded an alarm back
in 2014 by suggesting to incorporate the median into the definition of the Gini
index, although he noted a few shortcomings of his proposed index. In the present
paper we make a further step in the modification of classical indices and, to
acknowledge the possibility of differing viewpoints, arrive at three
median-based indices of inequality. They avoid the shortcomings of the previous
indices and can be used even when populations are ultra heavily tailed, that
is, when their first moments are infinite. The new indices are illustrated both
analytically and numerically using parametric families of income distributions,
and further illustrated using capital incomes coming from 2001 and 2018 surveys
of fifteen European countries. We also discuss the performance of the indices
from the perspective of income transfers.
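
As a point of reference for the discussion above, the sketch below computes
the classical mean-based Gini index on a simulated heavy-tailed income sample.
The income distribution is invented, and the three median-based indices
proposed in the paper are not reproduced here.

    import numpy as np

    # Classical mean-based Gini index, the benchmark the paper modifies; the
    # simulated Pareto-type incomes are illustrative only.
    def gini(income):
        x = np.sort(np.asarray(income, dtype=float))
        n = x.size
        return 2.0 * np.sum(np.arange(1, n + 1) * x) / (n * x.sum()) - (n + 1) / n

    incomes = np.random.default_rng(10).pareto(2.5, 10_000) + 1.0
    print(gini(incomes))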

arXiv link: http://arxiv.org/abs/2308.03708v1

Econometrics arXiv paper, submitted: 2023-08-05

Treatment Effects in Staggered Adoption Designs with Non-Parallel Trends

Authors: Brantly Callaway, Emmanuel Selorm Tsyawo

This paper considers identifying and estimating causal effect parameters in a
staggered treatment adoption setting -- that is, where a researcher has access
to panel data and treatment timing varies across units. We consider the case
where untreated potential outcomes may follow non-parallel trends over time
across groups. This implies that the identifying assumptions of leading
approaches such as difference-in-differences do not hold. We mainly focus on
the case where untreated potential outcomes are generated by an interactive
fixed effects model and show that variation in treatment timing provides
additional moment conditions that can be used to recover a large class of
target causal effect parameters. Our approach exploits the variation in
treatment timing without requiring either (i) a large number of time periods or
(ii) any extra exclusion restrictions. This is in contrast to
essentially all of the literature on interactive fixed effects models which
requires at least one of these extra conditions. Rather, our approach directly
applies in settings where there is variation in treatment timing. Although our
main focus is on a model with interactive fixed effects, our idea of using
variation in treatment timing to recover causal effect parameters is quite
general and could be adapted to other settings with non-parallel trends across
groups such as dynamic panel data models.

arXiv link: http://arxiv.org/abs/2308.02899v1

Econometrics arXiv updated paper (originally submitted: 2023-08-04)

Composite Quantile Factor Model

Authors: Xiao Huang

This paper introduces the method of composite quantile factor model for
factor analysis in high-dimensional panel data. We propose to estimate the
factors and factor loadings across multiple quantiles of the data, allowing the
estimates to better adapt to features of the data at different quantiles while
still modeling the mean of the data. We derive the limiting distributions of
the estimated factors and factor loadings and discuss an information criterion
for consistent selection of the number of factors. Simulations show that the
proposed estimator and the information criterion have good finite sample
properties for several non-normal distributions under consideration. We also
consider an empirical study on the factor analysis for 246 quarterly
macroeconomic variables. A companion R package cqrfactor is developed.
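
To fix ideas, here is a minimal sketch of the composite-quantile principle in a
simple regression setting rather than the paper's factor model: the check
(pinball) loss is summed over several quantile levels, with a common slope and
quantile-specific intercepts. The quantile grid and data-generating process are
illustrative assumptions.

    import numpy as np
    from scipy.optimize import minimize

    def check_loss(u, tau):
        # Quantile (pinball) loss rho_tau(u) = u * (tau - 1{u < 0}).
        return u * (tau - (u < 0))

    def composite_qr(y, x, taus):
        # Common slope b, one intercept a_k per quantile level tau_k.
        def objective(theta):
            b, a = theta[0], theta[1:]
            return sum(check_loss(y - a[k] - b * x, tau).mean()
                       for k, tau in enumerate(taus))
        theta0 = np.zeros(1 + len(taus))
        return minimize(objective, theta0, method="Nelder-Mead",
                        options={"maxiter": 20_000}).x

    rng = np.random.default_rng(1)
    x = rng.normal(size=500)
    y = 2.0 * x + rng.standard_t(df=3, size=500)   # heavy-tailed noise

    theta = composite_qr(y, x, taus=(0.25, 0.5, 0.75))
    print("estimated common slope:", round(theta[0], 3))   # close to 2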

arXiv link: http://arxiv.org/abs/2308.02450v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-08-04

Matrix Completion When Missing Is Not at Random and Its Applications in Causal Panel Data Models

Authors: Jungjun Choi, Ming Yuan

This paper develops an inferential framework for matrix completion when
missing is not at random and without the requirement of strong signals. Our
development is based on the observation that if the number of missing entries
is small enough compared to the panel size, then they can be estimated well
even when missing is not at random. Taking advantage of this fact, we divide
the missing entries into smaller groups and estimate each group via nuclear
norm regularization. In addition, we show that with appropriate debiasing, our
proposed estimate is asymptotically normal even for fairly weak signals. Our
work is motivated by recent research on the Tick Size Pilot Program, an
experiment conducted by the Securities and Exchange Commission (SEC) to evaluate
the impact of widening the tick size on the market quality of stocks from 2016
to 2018. While previous studies were based on traditional regression or
difference-in-differences methods that assume the treatment effect is
invariant with respect to time and unit, our analyses suggest significant
heterogeneity across units and intriguing dynamics over time during the pilot
program.
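
For intuition, here is a minimal soft-impute style iteration for
nuclear-norm-regularized matrix completion on a toy low-rank panel with missing
entries. It illustrates only the regularization step; the paper's grouping of
missing entries, its treatment of missing-not-at-random patterns, and the
debiasing step are not reproduced.

    import numpy as np

    def soft_impute(Y, observed, lam, n_iter=200):
        # Iteratively fill missing entries, then soft-threshold singular values.
        X = np.where(observed, Y, 0.0)
        for _ in range(n_iter):
            U, s, Vt = np.linalg.svd(X, full_matrices=False)
            X_new = U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt
            X = np.where(observed, Y, X_new)   # keep observed entries fixed
        return X_new

    rng = np.random.default_rng(2)
    n, T, r = 60, 40, 2
    M = rng.normal(size=(n, r)) @ rng.normal(size=(r, T))   # low-rank signal
    Y = M + 0.1 * rng.normal(size=(n, T))                   # noisy panel
    observed = rng.uniform(size=(n, T)) > 0.2               # roughly 20% missing

    M_hat = soft_impute(Y, observed, lam=1.0)
    err = np.abs(M_hat - M)[~observed].mean()
    print("mean absolute error on missing entries:", round(err, 3))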

arXiv link: http://arxiv.org/abs/2308.02364v1

Econometrics arXiv paper, submitted: 2023-08-03

Amortized neural networks for agent-based model forecasting

Authors: Denis Koshelev, Alexey Ponomarenko, Sergei Seleznev

In this paper, we propose a new procedure for unconditional and conditional
forecasting in agent-based models. The proposed algorithm is based on the
application of amortized neural networks and consists of two steps. The first
step simulates artificial datasets from the model. In the second step, a neural
network is trained to predict the future values of the variables using the
history of observations. The main advantage of the proposed algorithm is its
speed: once trained, it can be used to yield predictions for almost any data
without additional simulations or re-estimation of the neural network.
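
A minimal sketch of the amortized idea with a toy simulator: artificial series
are drawn from a simple model standing in for the agent-based model, a regressor
is trained to map an observed history to the next value, and the trained network
is then reused on new data without re-simulation. The AR(1) simulator and the
scikit-learn MLP are illustrative choices, not the authors' architecture.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    def simulate_series(rng, T=40, rho=0.8):
        # Toy stand-in for an agent-based model simulator.
        y = np.zeros(T)
        for t in range(1, T):
            y[t] = rho * y[t - 1] + rng.normal(scale=0.5)
        return y

    rng = np.random.default_rng(3)
    lag = 5
    X, Y = [], []
    for _ in range(2000):                      # step 1: simulate artificial datasets
        s = simulate_series(rng)
        X.append(s[-lag - 1:-1])               # history window
        Y.append(s[-1])                        # value to predict

    net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
    net.fit(np.array(X), np.array(Y))          # step 2: train the amortized predictor

    new_history = simulate_series(rng)[-lag - 1:-1]   # no new training needed
    forecast = net.predict(new_history.reshape(1, -1))[0]
    print("one-step forecast:", round(float(forecast), 3))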

arXiv link: http://arxiv.org/abs/2308.05753v1

Econometrics arXiv updated paper (originally submitted: 2023-08-03)

Individual Shrinkage for Random Effects

Authors: Raffaella Giacomini, Sokbae Lee, Silvia Sarpietro

This paper develops a novel approach to random effects estimation and
individual-level forecasting in micropanels, targeting individual accuracy
rather than aggregate performance. The conventional shrinkage methods used in
the literature, such as the James-Stein estimator and Empirical Bayes, target
aggregate performance and can lead to inaccurate decisions at the individual
level. We propose a class of shrinkage estimators with individual weights (IW)
that leverage an individual's own past history, instead of the cross-sectional
dimension. This approach overcomes the "tyranny of the majority" inherent in
existing methods, while relying on weaker assumptions. A key contribution is
addressing the challenge of obtaining feasible weights from short time-series
data and under parameter heterogeneity. We discuss the theoretical optimality
of IW and recommend using feasible weights determined through a Minimax Regret
analysis in practice.
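
The shrinkage idea can be illustrated on a toy micropanel: each unit's time
average is pulled toward the cross-sectional mean with a unit-specific weight
built from the noisiness of that unit's own history. The weight formula below is
a simple illustrative choice, not the paper's IW estimator or its Minimax Regret
weights.

    import numpy as np

    def individual_shrinkage(panel):
        # panel: (N, T) array of outcomes for N units over T periods.
        # Illustrative unit-specific weights based on each unit's own noise
        # variance; the paper instead derives feasible weights via Minimax Regret.
        ybar_i = panel.mean(axis=1)                        # unit-specific means
        grand = ybar_i.mean()                              # cross-sectional target
        s2_i = panel.var(axis=1, ddof=1) / panel.shape[1]  # noise of each unit mean
        signal = max(ybar_i.var(ddof=1) - s2_i.mean(), 1e-8)
        w_i = signal / (signal + s2_i)                     # one weight per unit
        return w_i * ybar_i + (1.0 - w_i) * grand

    rng = np.random.default_rng(4)
    N, T = 200, 6
    theta = rng.normal(size=N)                             # heterogeneous effects
    panel = theta[:, None] + rng.normal(scale=2.0, size=(N, T))
    print(individual_shrinkage(panel)[:5].round(2))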

arXiv link: http://arxiv.org/abs/2308.01596v3

Econometrics arXiv updated paper (originally submitted: 2023-08-02)

Limit Theory under Network Dependence and Nonstationarity

Authors: Christis Katsouris

These lecture notes represent supplementary material for a short course on
time series econometrics and network econometrics. We give emphasis on limit
theory for time series regression models as well as the use of the
local-to-unity parametrization when modeling time series nonstationarity.
Moreover, we present various non-asymptotic theory results for moderate
deviation principles when considering the eigenvalues of covariance matrices as
well as asymptotics for unit root moderate deviations in nonstationary
autoregressive processes. Although not all applications from the literature are
covered, we also discuss some open problems in the time series and network
econometrics literature.

arXiv link: http://arxiv.org/abs/2308.01418v4

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2023-08-02

Analyzing the Reporting Error of Public Transport Trips in the Danish National Travel Survey Using Smart Card Data

Authors: Georges Sfeir, Filipe Rodrigues, Maya Abou Zeid, Francisco Camara Pereira

Household travel surveys have been used for decades to collect data on
individuals' and households' travel behavior. However, self-reported surveys are
subject to
recall bias, as respondents might struggle to recall and report their
activities accurately. This study examines the time reporting error of public
transit users in a nationwide household travel survey by matching, at the
individual level, five consecutive years of data from two sources, namely the
Danish National Travel Survey (TU) and the Danish Smart Card system
(Rejsekort). Survey respondents are matched with travel cards from the
Rejsekort data solely based on the respondents' declared spatiotemporal travel
behavior. Approximately 70% of the respondents were successfully matched with
Rejsekort travel cards. The findings reveal a median time reporting error of
11.34 minutes, with an Interquartile Range of 28.14 minutes. Furthermore, a
statistical analysis was performed to explore the relationships between the
survey respondents' reporting error and their socio-economic and demographic
characteristics. The results indicate that females and respondents with a fixed
schedule are in general more accurate than males and respondents with a
flexible schedule in reporting their times of travel. Moreover, trips reported
during weekdays or via the internet displayed higher accuracies compared to
trips reported during weekends and holidays or via telephone interviews. This
disaggregated analysis provides valuable insights that could help improve
the design and analysis of travel surveys, as well as account for reporting
errors/biases in travel survey-based applications. Furthermore, it offers
insights into the psychology of travel recall by survey respondents.

arXiv link: http://arxiv.org/abs/2308.01198v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-08-02

The Bayesian Context Trees State Space Model for time series modelling and forecasting

Authors: Ioannis Papageorgiou, Ioannis Kontoyiannis

A hierarchical Bayesian framework is introduced for developing tree-based
mixture models for time series, partly motivated by applications in finance and
forecasting. At the top level, meaningful discrete states are identified as
appropriately quantised values of some of the most recent samples. At the
bottom level, a different, arbitrary base model is associated with each state.
This defines a very general framework that can be used in conjunction with any
existing model class to build flexible and interpretable mixture models. We
call this the Bayesian Context Trees State Space Model, or the BCT-X framework.
Appropriate algorithmic tools are described, which allow for effective and
efficient Bayesian inference and learning; these algorithms can be updated
sequentially, facilitating online forecasting. The utility of the general
framework is illustrated in the particular instances when AR or ARCH models are
used as base models. The latter results in a mixture model that offers a
powerful way of modelling the well-known volatility asymmetries in financial
data, revealing a novel, important feature of stock market index data, in the
form of an enhanced leverage effect. In forecasting, the BCT-X methods are
found to outperform several state-of-the-art techniques, both in terms of
accuracy and computational requirements.

arXiv link: http://arxiv.org/abs/2308.00913v3

Econometrics arXiv paper, submitted: 2023-08-01

Testing for Threshold Effects in Presence of Heteroskedasticity and Measurement Error with an application to Italian Strikes

Authors: Francesco Angelini, Massimiliano Castellani, Simone Giannerini, Greta Goracci

Many macroeconomic time series are characterised by nonlinearity both in the
conditional mean and in the conditional variance and, in practice, it is
important to investigate separately these two aspects. Here we address the
issue of testing for threshold nonlinearity in the conditional mean, in the
presence of conditional heteroskedasticity. We propose a supremum Lagrange
Multiplier approach to test a linear ARMA-GARCH model against the alternative
of a TARMA-GARCH model. We derive the asymptotic null distribution of the test
statistic and this requires novel results since the difficulties of working
with nuisance parameters, absent under the null hypothesis, are amplified by
the non-linear moving average, combined with GARCH-type innovations. We show
that tests that do not account for heteroskedasticity fail to achieve the
correct size even for large sample sizes. Moreover, we show that the TARMA
specification naturally accounts for the ubiquitous presence of measurement
error that affects macroeconomic data. We apply the results to analyse the time
series of Italian strikes and we show that the TARMA-GARCH specification is
consistent with the relevant macroeconomic theory while capturing the main
features of the Italian strikes dynamics, such as asymmetric cycles and
regime-switching.

arXiv link: http://arxiv.org/abs/2308.00444v1

Econometrics arXiv updated paper (originally submitted: 2023-07-31)

Randomization Inference of Heterogeneous Treatment Effects under Network Interference

Authors: Julius Owusu

We develop randomization-based tests for heterogeneous treatment effects in
the presence of network interference. Leveraging the exposure mapping
framework, we study a broad class of null hypotheses that represent various
forms of constant treatment effects in networked populations. These null
hypotheses, unlike the classical Fisher sharp null, are not sharp due to
unknown parameters and multiple potential outcomes. Existing conditional
randomization procedures either fail to control size or suffer from low
statistical power in this setting. We propose a testing procedure that
constructs a data-dependent focal assignment set and permits variation in focal
units across focal assignments. These features complicate both estimation and
inference, necessitating new technical developments. We establish the
asymptotic validity of the proposed procedure under general conditions on the
test statistic and characterize the asymptotic size distortion in terms of
observable quantities. The procedure is applied to experimental network data
and evaluated via Monte Carlo simulations.

arXiv link: http://arxiv.org/abs/2308.00202v5

Econometrics arXiv updated paper (originally submitted: 2023-07-31)

What's Logs Got to do With it: On the Perils of log Dependent Variables and Difference-in-Differences

Authors: Brendon McConnell

The log transformation of the dependent variable is not innocuous when using
a difference-in-differences (DD) model. With a dependent variable in logs, the
DD term captures an approximation of the proportional difference in growth
rates across groups. As I show with both simulations and two empirical
examples, if the baseline outcome distributions are sufficiently different
across groups, the DD parameter for a log specification can differ in sign from
that of a levels specification. I provide a condition, based on (i) the
aggregate time effect, and (ii) the difference in relative baseline outcome
means, for when the sign-switch will occur.
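
A small numerical illustration of the sign switch: when baseline levels differ
greatly across groups, the levels DD and the log DD can have opposite signs. The
numbers below are purely illustrative.

    import numpy as np

    # Group means: (pre, post). The treated group starts much higher than control.
    control = (10.0, 20.0)     # control doubles: +10 in levels, +0.69 in logs
    treated = (100.0, 115.0)   # treated gains more in levels, far less in logs

    dd_levels = (treated[1] - treated[0]) - (control[1] - control[0])
    dd_logs = ((np.log(treated[1]) - np.log(treated[0]))
               - (np.log(control[1]) - np.log(control[0])))

    print("DD in levels:", dd_levels)          # +5.0 (positive)
    print("DD in logs:  ", round(dd_logs, 3))  # about -0.553 (negative)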

arXiv link: http://arxiv.org/abs/2308.00167v3

Econometrics arXiv updated paper (originally submitted: 2023-07-31)

A new mapping of technological interdependence

Authors: A. Fronzetti Colladon, B. Guardabascio, F. Venturini

How does technological interdependence affect innovation? We address this
question by examining the influence of neighbors' innovativeness and the
structure of the innovators' network on a sector's capacity to develop new
technologies. We study these two dimensions of technological interdependence by
applying novel methods of text mining and network analysis to the documents of
6.5 million patents granted by the United States Patent and Trademark Office
(USPTO) between 1976 and 2021. We find that, in the long run, the influence of
network linkages is as important as that of neighbor innovativeness. In the
short run, however, positive shocks to neighbor innovativeness yield relatively
rapid effects, while the impact of shocks strengthening network linkages
manifests with delay, even though it lasts longer. Our analysis also highlights
that patent text contains a wealth of information often not captured by
traditional innovation metrics, such as patent citations.

arXiv link: http://arxiv.org/abs/2308.00014v3

Econometrics arXiv cross-link from cs.AI (cs.AI), submitted: 2023-07-31

Causal Inference for Banking, Finance, and Insurance: A Survey

Authors: Satyam Kumar, Yelleti Vivek, Vadlamani Ravi, Indranil Bose

Causal inference plays a significant role in explaining the decisions taken
by statistical models and artificial intelligence models. Of late, this field
started attracting the attention of researchers and practitioners alike. This
paper presents a comprehensive survey of 37 papers published during 1992-2023
and concerning the application of causal inference to banking, finance, and
insurance. The papers are categorized according to the following families of
domains: (i) Banking, (ii) Finance and its subdomains such as corporate
finance, governance finance including financial risk and financial policy,
financial economics, and behavioral finance, and (iii) Insurance. Further, the
paper covers the primary ingredients of causal inference, namely statistical
methods such as Bayesian causal networks and Granger causality, and the
associated jargon, such as counterfactuals. The review also recommends some
important directions for future research. In conclusion, we observed that the
application of causal inference in the banking and insurance sectors is still
in its infancy, and thus there is considerable scope for further research to
turn it into a viable method.

arXiv link: http://arxiv.org/abs/2307.16427v1

Econometrics arXiv paper, submitted: 2023-07-31

Inference for Low-rank Completion without Sample Splitting with Application to Treatment Effect Estimation

Authors: Jungjun Choi, Hyukjun Kwon, Yuan Liao

This paper studies the inferential theory for estimating low-rank matrices.
It also provides an inference method for the average treatment effect as an
application. We show that the least squares estimator of the eigenvectors,
following nuclear norm penalization, is asymptotically normal. The key
contribution of our method is that it does not require sample splitting. In
addition, this paper allows dependent observation patterns and heterogeneous
observation probabilities. Empirically, we apply the proposed procedure to
estimating the impact of the presidential vote on allocating the U.S. federal
budget to the states.

arXiv link: http://arxiv.org/abs/2307.16370v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-07-30

Towards Practical Robustness Auditing for Linear Regression

Authors: Daniel Freund, Samuel B. Hopkins

We investigate practical algorithms to find or disprove the existence of
small subsets of a dataset which, when removed, reverse the sign of a
coefficient in an ordinary least squares regression involving that dataset. We
empirically study the performance of well-established algorithmic techniques
for this task -- mixed integer quadratically constrained optimization for
general linear regression problems and exact greedy methods for special cases.
We show that these methods largely outperform the state of the art and provide
a useful robustness check for regression problems in a few dimensions. However,
significant computational bottlenecks remain, especially for the important task
of disproving the existence of such small sets of influential samples for
regression problems of dimension $3$ or greater. We make some headway on this
challenge via a spectral algorithm using ideas drawn from recent innovations in
algorithmic robust statistics. We summarize the limitations of known techniques
in several challenge datasets to encourage further algorithmic innovation.
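
To illustrate the task itself rather than the paper's mixed-integer or spectral
algorithms, here is a simple greedy heuristic: repeatedly drop the single
observation whose removal pushes a chosen OLS coefficient furthest toward a sign
change, and report how many removals are needed to flip its sign.

    import numpy as np

    def ols_coef(X, y, j):
        return np.linalg.lstsq(X, y, rcond=None)[0][j]

    def greedy_sign_flip(X, y, j, max_removals=50):
        # Greedy heuristic: at each step remove the point whose deletion moves
        # coefficient j furthest toward a sign change.
        keep = np.arange(len(y))
        base_sign = np.sign(ols_coef(X, y, j))
        for k in range(1, max_removals + 1):
            coefs = [ols_coef(np.delete(X[keep], i, axis=0),
                              np.delete(y[keep], i), j) for i in range(len(keep))]
            best = int(np.argmin(base_sign * np.array(coefs)))
            keep = np.delete(keep, best)
            if np.sign(ols_coef(X[keep], y[keep], j)) != base_sign:
                return k
        return None

    rng = np.random.default_rng(5)
    n = 200
    x = rng.normal(size=n)
    y = 0.05 * x + rng.normal(size=n)      # weak signal: the sign is fragile
    X = np.column_stack([np.ones(n), x])

    print("removals needed to flip the slope's sign:", greedy_sign_flip(X, y, j=1))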

arXiv link: http://arxiv.org/abs/2307.16315v1

Econometrics arXiv paper, submitted: 2023-07-29

Panel Data Models with Time-Varying Latent Group Structures

Authors: Yiren Wang, Peter C B Phillips, Liangjun Su

This paper considers a linear panel model with interactive fixed effects and
unobserved individual and time heterogeneities that are captured by some latent
group structures and an unknown structural break, respectively. To enhance
realism, the model may have different numbers of groups and/or different group
memberships before and after the break. Using a preliminary
nuclear-norm-regularized estimation followed by row- and column-wise linear
regressions, we estimate the break point based on the idea of binary
segmentation and simultaneously estimate the latent group structures, together
with the number of groups before and after the break, via a sequential-testing
K-means algorithm. It is shown that the break point, the number of groups and the
group memberships can each be estimated correctly with probability approaching
one. Asymptotic distributions of the estimators of the slope coefficients are
established. Monte Carlo simulations demonstrate excellent finite sample
performance for the proposed estimation algorithm. An empirical application to
real house price data across 377 Metropolitan Statistical Areas in the US from
1975 to 2014 suggests the presence both of structural breaks and of changes in
group membership.

arXiv link: http://arxiv.org/abs/2307.15863v1

Econometrics arXiv paper, submitted: 2023-07-28

Group-Heterogeneous Changes-in-Changes and Distributional Synthetic Controls

Authors: Songnian Chen, Junlong Feng

We develop new methods for changes-in-changes and distributional synthetic
controls when there exists group level heterogeneity. For changes-in-changes,
we allow individuals to belong to a large number of heterogeneous groups. The
new method extends the changes-in-changes method in Athey and Imbens (2006) by
finding appropriate subgroups within the control groups which share similar
group level unobserved characteristics to the treatment groups. For
distributional synthetic control, we show that the appropriate synthetic
control needs to be constructed using units from potentially different time
periods in which they have group-level heterogeneity comparable to that of the
treatment group, instead of only units in the same time period as in
Gunsilius (2023). Implementation and data requirements for these new methods
are briefly discussed.

arXiv link: http://arxiv.org/abs/2307.15313v1

Econometrics arXiv updated paper (originally submitted: 2023-07-27)

On the Efficiency of Finely Stratified Experiments

Authors: Yuehao Bai, Jizhou Liu, Azeem M. Shaikh, Max Tabord-Meehan

This paper studies the use of finely stratified designs for the efficient
estimation of a large class of treatment effect parameters that arise in the
analysis of experiments. By a "finely stratified" design, we mean experiments
in which units are divided into groups of a fixed size and a proportion within
each group is assigned to a binary treatment uniformly at random. The class of
parameters considered are those that can be expressed as the solution to a set
of moment conditions constructed using a known function of the observed data.
They include, among other things, average treatment effects, quantile treatment
effects, and local average treatment effects as well as the counterparts to
these quantities in experiments in which the unit is itself a cluster. In this
setting, we establish three results. First, we show that under a finely
stratified design, the naive method of moments estimator achieves the same
asymptotic variance as what could typically be attained under alternative
treatment assignment mechanisms only through ex post covariate adjustment.
Second, we argue that the naive method of moments estimator under a finely
stratified design is asymptotically efficient by deriving a lower bound on the
asymptotic variance of regular estimators of the parameter of interest in the
form of a convolution theorem. In this sense, finely stratified experiments are
attractive because they lead to efficient estimators of treatment effect
parameters "by design." Finally, we strengthen this conclusion by establishing
conditions under which a "fast-balancing" property of finely stratified designs
is in fact necessary for the naive method of moments estimator to attain the
efficiency bound.
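
A stylized sketch of a finely stratified (matched-pair) assignment and the naive
difference-in-means estimator it motivates: units are sorted on a baseline
covariate, grouped into pairs, one unit per pair is treated at random, and the
effect is estimated without any ex post covariate adjustment. The
data-generating process is an illustrative assumption.

    import numpy as np

    rng = np.random.default_rng(6)
    n = 1000                                   # even number of units
    x = rng.normal(size=n)                     # baseline covariate
    y0 = x + rng.normal(scale=0.3, size=n)     # untreated potential outcome
    tau = 1.0                                  # constant treatment effect

    # Finely stratified design: pair units on x, randomize within each pair.
    order = np.argsort(x)
    d = np.zeros(n, dtype=int)
    for pair in order.reshape(-1, 2):
        d[rng.permutation(pair)[0]] = 1        # exactly one treated unit per pair

    y = y0 + tau * d
    naive = y[d == 1].mean() - y[d == 0].mean()    # no covariate adjustment
    print("naive difference-in-means:", round(naive, 3))   # close to 1.0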

arXiv link: http://arxiv.org/abs/2307.15181v6

Econometrics arXiv paper, submitted: 2023-07-27

Predictability Tests Robust against Parameter Instability

Authors: Christis Katsouris

We consider Wald type statistics designed for joint predictability and
structural break testing based on the instrumentation method of Phillips and
Magdalinos (2009). We show that under the assumption of nonstationary
predictors: (i) the tests based on the OLS estimators converge to a nonstandard
limiting distribution which depends on the nuisance coefficient of persistence;
and (ii) the tests based on the IVX estimators can filter out the persistence
under certain parameter restrictions due to the supremum functional. These
results contribute to the literature of joint predictability and parameter
instability testing by providing analytically tractable asymptotic theory when
taking into account nonstationary regressors. We compare the finite-sample size
and power performance of the Wald tests under both estimators via extensive
Monte Carlo experiments. Critical values are computed using standard bootstrap
inference methodologies. We illustrate the usefulness of the proposed framework
to test for predictability under the presence of parameter instability by
examining the stock market predictability puzzle for the US equity premium.

arXiv link: http://arxiv.org/abs/2307.15151v1

Econometrics arXiv updated paper (originally submitted: 2023-07-27)

One-step smoothing splines instrumental regression

Authors: Jad Beyhum, Elia Lapenta, Pascal Lavergne

We extend nonparametric regression smoothing splines to a context where there
is endogeneity and instrumental variables are available. Unlike popular
existing estimators, the resulting estimator is one-step and relies on a unique
regularization parameter. We derive rates of convergence for the estimator
and its first derivative, which are uniform in the support of the endogenous
variable. We also address the issue of imposing monotonicity in estimation and
extend the approach to a partly linear model. Simulations confirm the good
performance of our estimator compared to two-step procedures. Our method
yields economically sensible results when used to estimate Engel curves.

arXiv link: http://arxiv.org/abs/2307.14867v4

Econometrics arXiv paper, submitted: 2023-07-26

Weak (Proxy) Factors Robust Hansen-Jagannathan Distance For Linear Asset Pricing Models

Authors: Lingwei Kong

The Hansen-Jagannathan (HJ) distance statistic is one of the most dominant
measures of model misspecification. However, the conventional HJ specification
test procedure has poor finite sample performance, and we show that it can be
size distorted even in large samples when (proxy) factors exhibit small
correlations with asset returns. In other words, applied researchers are likely
to falsely reject a model even when it is correctly specified. We provide two
alternatives for the HJ statistic and two corresponding novel procedures for
model specification tests, which are robust against the presence of weak
(proxy) factors, and we also offer a novel robust risk premia estimator.
Simulation exercises support our theory. Our empirical application documents
the unreliability of the traditional HJ test, since it may produce
counter-intuitive results when comparing nested models, rejecting a
four-factor model but not the reduced three-factor model. At the same time, our
proposed methods are practically more appealing and show support for a
four-factor model for Fama French portfolios.

arXiv link: http://arxiv.org/abs/2307.14499v1

Econometrics arXiv paper, submitted: 2023-07-26

Bootstrapping Nonstationary Autoregressive Processes with Predictive Regression Models

Authors: Christis Katsouris

We establish the asymptotic validity of the bootstrap-based IVX estimator
proposed by Phillips and Magdalinos (2009) for the predictive regression model
parameter based on a local-to-unity specification of the autoregressive
coefficient which covers both nearly nonstationary and nearly stationary
processes. A mixed Gaussian limit distribution is obtained for the
bootstrap-based IVX estimator. The statistical validity of the theoretical
results is illustrated by Monte Carlo experiments for various statistical
inference problems.

arXiv link: http://arxiv.org/abs/2307.14463v1

Econometrics arXiv updated paper (originally submitted: 2023-07-26)

Causal Effects in Matching Mechanisms with Strategically Reported Preferences

Authors: Marinho Bertanha, Margaux Luflade, Ismael Mourifié

A growing number of central authorities use assignment mechanisms to allocate
students to schools in a way that reflects student preferences and school
priorities. However, most real-world mechanisms incentivize students to
strategically misreport their preferences. Misreporting complicates the
identification of causal parameters that depend on true preferences, which are
necessary inputs for a broad class of counterfactual analyses. In this paper,
we provide an identification approach that is robust to strategic misreporting
and derive sharp bounds on causal effects of school assignment on future
outcomes. Our approach applies to any mechanism as long as there exist
placement scores and cutoffs that characterize that mechanism's allocation
rule. We use data from a deferred acceptance mechanism that assigns students to
more than 1,000 university--major combinations in Chile. Matching theory
predicts and empirical evidence suggests that students behave strategically in
Chile because they face constraints on their submission of preferences and have
good a priori information on the schools they will have access to. Our bounds
are informative enough to reveal significant heterogeneity in graduation
success with respect to preferences and school assignment.

arXiv link: http://arxiv.org/abs/2307.14282v3

Econometrics arXiv updated paper (originally submitted: 2023-07-26)

Dynamic Regression Discontinuity: An Event-Study Approach

Authors: Francesco Ruggieri

I propose a novel argument to identify economically interpretable
intertemporal treatment effects in dynamic regression discontinuity designs
(RDDs). Specifically, I develop a dynamic potential outcomes model and
reformulate two assumptions from the difference-in-differences literature, no
anticipation and common trends, to attain point identification of
cutoff-specific impulse responses. The estimand of each target parameter can be
expressed as the sum of two static RDD contrasts, thereby allowing for
nonparametric estimation and inference with standard local polynomial methods.
I also propose a nonparametric approach to aggregate treatment effects across
calendar time and treatment paths, leveraging a limited path independence
restriction to reduce the dimensionality of the parameter space. I apply this
method to estimate the dynamic effects of school district expenditure
authorizations on housing prices in Wisconsin.

arXiv link: http://arxiv.org/abs/2307.14203v5

Econometrics arXiv paper, submitted: 2023-07-26

Using Probabilistic Stated Preference Analyses to Understand Actual Choices

Authors: Romuald Meango

Can stated preferences help in counterfactual analyses of actual choice? This
research proposes a novel approach for researchers who have access to both
stated choices in hypothetical scenarios and actual choices. The key idea is to
use probabilistic stated choices to identify the distribution of individual
unobserved heterogeneity, even in the presence of measurement error. If this
unobserved heterogeneity is the source of endogeneity, the researcher can
correct for its influence in a demand function estimation using actual choices,
and recover causal effects. Estimation is possible with an off-the-shelf Group
Fixed Effects estimator.

arXiv link: http://arxiv.org/abs/2307.13966v1

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2023-07-25

The Core of Bayesian Persuasion

Authors: Laura Doval, Ran Eilat

An analyst observes the frequency with which an agent takes actions, but not
the frequency with which she takes actions conditional on a payoff-relevant
state. In this setting, we ask when the analyst can rationalize the agent's
choices as the outcome of the agent learning something about the state before
taking action. Our characterization marries the obedience approach in
information design (Bergemann and Morris, 2016) and the belief approach in
Bayesian persuasion (Kamenica and Gentzkow, 2011) relying on a theorem by
Strassen (1965) and Hall's marriage theorem. We apply our results to
ring-network games and to identify conditions under which a data set is
consistent with a public information structure in first-order Bayesian
persuasion games.

arXiv link: http://arxiv.org/abs/2307.13849v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-07-25

Source Condition Double Robust Inference on Functionals of Inverse Problems

Authors: Andrew Bennett, Nathan Kallus, Xiaojie Mao, Whitney Newey, Vasilis Syrgkanis, Masatoshi Uehara

We consider estimation of parameters defined as linear functionals of
solutions to linear inverse problems. Any such parameter admits a doubly robust
representation that depends on the solution to a dual linear inverse problem,
where the dual solution can be thought as a generalization of the inverse
propensity function. We provide the first source condition double robust
inference method that ensures asymptotic normality around the parameter of
interest as long as either the primal or the dual inverse problem is
sufficiently well-posed, without knowledge of which inverse problem is the more
well-posed one. Our result is enabled by novel guarantees for iterated Tikhonov
regularized adversarial estimators for linear inverse problems, over general
hypothesis spaces, which are developments of independent interest.

arXiv link: http://arxiv.org/abs/2307.13793v1

Econometrics arXiv updated paper (originally submitted: 2023-07-25)

Characteristics and Predictive Modeling of Short-term Impacts of Hurricanes on the US Employment

Authors: Gan Zhang, Wenjun Zhu

The physical and economic damages of hurricanes can acutely affect employment
and the well-being of employees. However, a comprehensive understanding of
these impacts remains elusive as many studies focused on narrow subsets of
regions or hurricanes. Here we present an open-source dataset that serves
interdisciplinary research on hurricane impacts on US employment. Compared to
past domain-specific efforts, this dataset has greater spatial-temporal
granularity and variable coverage. To demonstrate potential applications of
this dataset, we focus on the short-term employment disruptions related to
hurricanes during 1990-2020. The observed county-level employment changes in
the initial month are small on average, though large employment losses (>30%)
can occur after extreme storms. The overall small changes partly result from
compensation among different employment sectors, which may obscure large,
concentrated employment losses after hurricanes. Additional econometric
analyses concur on the post-storm employment losses in hospitality and leisure
but disagree on employment changes in the other industries. The dataset also
enables data-driven analyses that highlight vulnerabilities such as pronounced
employment losses related to Puerto Rico and rainy hurricanes. Furthermore,
predictive modeling of short-term employment changes shows promising
performance for service-providing industries and high-impact storms. In the
examined cases, the nonlinear Random Forests model greatly outperforms the
multiple linear regression model. The nonlinear model also suggests that more
severe hurricane hazards projected by physical models may cause more extreme
losses in US service-providing employment. Finally, we share our dataset and
analytical code to facilitate the study and modeling of hurricane impacts in a
changing climate.

arXiv link: http://arxiv.org/abs/2307.13686v3

Econometrics arXiv paper, submitted: 2023-07-25

Smoothing of numerical series by the triangle method on the example of Hungarian GDP data 1992-2022 based on approximation by series of exponents

Authors: Yekimov Sergey

In practice, there is often a need to describe values given in a table by some
functional dependence. The observed values contain errors due to various
circumstances, so for approximation it is advisable to use a functional
dependence that smooths out the errors in the observations. Approximation also
makes it possible to determine intermediate values of the function that are not
listed in the observation table. Using exponential series for data
approximation yields results no worse than approximation by polynomials. In the
economic literature, approximation by power functions, for example the
Cobb-Douglas function, is widespread. The advantage of this type of
approximation is the simple form of the approximating function; the
disadvantage is that not all processes in nature can be described by power
functions with a given accuracy. An example is a GDP series spanning several
decades, for which it is difficult to find an approximating power function. In
this case, as shown in this article, exponential series can be used to
approximate the data. In this paper, the time series of Hungary's GDP from 1992
to 2022 is approximated by a series of thirty exponentials of a complex
variable. Smoothing the data by the method of triangles averages the data and
increases the accuracy of the approximation, which is of practical importance
when the observed variable contains outliers that need to be smoothed out.
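
Since the abstract does not spell out the triangle smoothing weights, the sketch
below uses a triangular (1, 2, 1)/4 moving average as an assumed stand-in and
then fits a small sum of real exponentials to a synthetic GDP-like series by
nonlinear least squares. It is not the paper's thirty-term complex-exponential
expansion, and the data are simulated rather than the actual Hungarian series.

    import numpy as np
    from scipy.optimize import curve_fit

    def triangle_smooth(y):
        # Assumed stand-in for the triangle smoothing: a (1, 2, 1)/4 moving average.
        z = y.copy()
        z[1:-1] = 0.25 * y[:-2] + 0.5 * y[1:-1] + 0.25 * y[2:]
        return z

    def exp_series(t, a0, a1, b1, a2, b2):
        # Small sum of real exponentials (the paper uses thirty complex exponentials).
        return a0 + a1 * np.exp(b1 * t) + a2 * np.exp(b2 * t)

    t = np.arange(31, dtype=float)                 # 1992-2022 mapped to t = 0..30
    rng = np.random.default_rng(7)
    gdp = 40 * np.exp(0.03 * t) + 5 * np.sin(t / 3) + rng.normal(scale=2, size=t.size)

    smoothed = triangle_smooth(gdp)
    params, _ = curve_fit(exp_series, t, smoothed,
                          p0=[10, 30, 0.02, 1, -0.1], maxfev=20000)
    fitted = exp_series(t, *params)
    print("max abs fit error:", round(np.abs(fitted - smoothed).max(), 2))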

arXiv link: http://arxiv.org/abs/2307.14378v1

Econometrics arXiv updated paper (originally submitted: 2023-07-25)

Large sample properties of GMM estimators under second-order identification

Authors: Hugo Kruiniger

Dovonon and Hall (Journal of Econometrics, 2018) proposed a limiting
distribution theory for GMM estimators for a p-dimensional globally
identified parameter vector {\phi} when local identification conditions fail at
first-order but hold at second-order. They assumed that the first-order
underidentification is due to the expected Jacobian having rank p-1 at the true
value {\phi}_{0}, i.e., having a rank deficiency of one. After reparametrizing
the model such that the last column of the Jacobian vanishes, they showed that
the GMM estimator of the first p-1 parameters converges at rate T^{-1/2} and
the GMM estimator of the remaining parameter, {\phi}_{p}, converges at rate
T^{-1/4}. They also provided a limiting distribution of
T^{1/4}({\phi}_{p}-{\phi}_{0,p}) subject to a (non-transparent) condition which
they claimed to be not restrictive in general. However, as we show in this
paper, their condition is in fact only satisfied when {\phi} is overidentified
and the limiting distribution of T^{1/4}({\phi}_{p}-{\phi}_{0,p}), which is
non-standard, depends on whether {\phi} is exactly identified or
overidentified. In particular, the limiting distributions of the sign of
T^{1/4}({\phi}_{p}-{\phi}_{0,p}) for the cases of exact and overidentification,
respectively, are different and are obtained by using expansions of the GMM
objective function of different orders. Unsurprisingly, we find that the
limiting distribution theories of Dovonon and Hall (2018) for Indirect
Inference (II) estimation under two different scenarios with second-order
identification where the target function is a GMM estimator of the auxiliary
parameter vector, are incomplete for similar reasons. We discuss how our
results for GMM estimation can be used to complete both theories and how they
can be used to obtain the limiting distributions of the II estimators in the
case of exact identification under either scenario.

arXiv link: http://arxiv.org/abs/2307.13475v2

Econometrics arXiv updated paper (originally submitted: 2023-07-25)

Testing for sparse idiosyncratic components in factor-augmented regression models

Authors: Jad Beyhum, Jonas Striaukas

We propose a novel bootstrap test of a dense model, namely factor regression,
against a sparse-plus-dense alternative that augments the model with sparse
idiosyncratic components. The asymptotic properties of the test are established
under time series dependence and polynomial tails. We outline a data-driven
rule to select the tuning parameter and prove its theoretical validity. In
simulation experiments, our procedure exhibits high power against sparse
alternatives and low power against dense deviations from the null. Moreover, we
apply our test to various datasets in macroeconomics and finance and often
reject the null. This suggests the presence of sparsity -- on top of a dense
model -- in commonly studied economic applications. The R package FAS
implements our approach.

arXiv link: http://arxiv.org/abs/2307.13364v4

Econometrics arXiv updated paper (originally submitted: 2023-07-24)

Inference in Experiments with Matched Pairs and Imperfect Compliance

Authors: Yuehao Bai, Hongchang Guo, Azeem M. Shaikh, Max Tabord-Meehan

This paper studies inference for the local average treatment effect in
randomized controlled trials with imperfect compliance where treatment status
is determined according to "matched pairs." By "matched pairs," we mean that
units are sampled i.i.d. from the population of interest, paired according to
observed, baseline covariates and finally, within each pair, one unit is
selected at random for treatment. Under weak assumptions governing the quality
of the pairings, we first derive the limit distribution of the usual Wald
(i.e., two-stage least squares) estimator of the local average treatment
effect. We show further that conventional heteroskedasticity-robust estimators
of the Wald estimator's limiting variance are generally conservative, in that
their probability limits are (typically strictly) larger than the limiting
variance. We therefore provide an alternative estimator of the limiting
variance that is consistent. Finally, we consider the use of additional
observed, baseline covariates not used in pairing units to increase the
precision with which we can estimate the local average treatment effect. To
this end, we derive the limiting behavior of a two-stage least squares
estimator of the local average treatment effect which includes both the
additional covariates and pair fixed effects, and show that its
limiting variance is always less than or equal to that of the Wald estimator.
To complete our analysis, we provide a consistent estimator of this limiting
variance. A simulation study confirms the practical relevance of our
theoretical results. Finally, we apply our results to revisit a prominent
experiment studying the effect of macroinsurance on microenterprise in Egypt.

arXiv link: http://arxiv.org/abs/2307.13094v2

Econometrics arXiv paper, submitted: 2023-07-24

Identification Robust Inference for the Risk Premium in Term Structure Models

Authors: Frank Kleibergen, Lingwei Kong

We propose identification robust statistics for testing hypotheses on the
risk premia in dynamic affine term structure models. We do so using the moment
equation specification proposed for these models in Adrian et al. (2013). We
extend the subset (factor) Anderson-Rubin test from Guggenberger et al. (2012)
to models with multiple dynamic factors and time-varying risk prices. Unlike
projection-based tests, it provides a computationally tractable manner to
conduct identification robust tests on a larger number of parameters. We
analyze the potential identification issues arising in empirical studies.
Statistical inference based on the three-stage estimator from Adrian et al.
(2013) requires knowledge of the factors' quality and is misleading without
full-rank betas or with sampling errors of comparable size to the loadings.
Empirical applications show that some factors, though potentially weak, may
drive the time variation of risk prices, and weak identification issues are
more prominent in multi-factor models.

arXiv link: http://arxiv.org/abs/2307.12628v1

Econometrics arXiv paper, submitted: 2023-07-21

Scenario Sampling for Large Supermodular Games

Authors: Bryan S. Graham, Andrin Pelican

This paper introduces a simulation algorithm for evaluating the
log-likelihood function of a large supermodular binary-action game. Covered
examples include (certain types of) peer effect, technology adoption, strategic
network formation, and multi-market entry games. More generally, the algorithm
facilitates simulated maximum likelihood (SML) estimation of games with large
numbers of players, $T$, and/or many binary actions per player, $M$ (e.g.,
games with tens of thousands of strategic actions, $TM=O(10^4)$). In such cases
the likelihood of the observed pure strategy combination is typically (i) very
small and (ii) a $TM$-fold integral whose region of integration has a complicated
geometry. Direct numerical integration, as well as accept-reject Monte Carlo
integration, are computationally impractical in such settings. In contrast, we
introduce a novel importance sampling algorithm which allows for accurate
likelihood simulation with modest numbers of simulation draws.
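
As a generic illustration of why importance sampling helps, the sketch below
estimates a very small orthant probability, a crude analogue of the likelihood
of a single observed pure-strategy profile, using an exponentially tilted
Gaussian proposal. It is not the paper's algorithm; the tilting location and the
dimension are illustrative choices.

    import numpy as np
    from scipy.stats import norm

    # Target: P(Z_1 > c, ..., Z_d > c) for independent standard normals, a tiny
    # probability that naive Monte Carlo essentially never hits.
    d, c = 5, 2.0
    n_draws = 10_000
    rng = np.random.default_rng(8)

    z = rng.normal(size=(n_draws, d))
    naive = (z > c).all(axis=1).mean()

    # Importance sampling: shift each coordinate's proposal to N(mu, 1) and
    # reweight by the likelihood ratio prod_j exp(-mu * z_j + mu^2 / 2).
    mu = 2.5
    z_is = rng.normal(loc=mu, size=(n_draws, d))
    weights = np.exp(-mu * z_is.sum(axis=1) + d * mu**2 / 2)
    is_est = (weights * (z_is > c).all(axis=1)).mean()

    print("naive MC:            ", naive)
    print("importance sampling: ", is_est)
    print("exact value:         ", norm.sf(c) ** d)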

arXiv link: http://arxiv.org/abs/2307.11857v1

Econometrics arXiv paper, submitted: 2023-07-21

Functional Differencing in Networks

Authors: Stéphane Bonhomme, Kevin Dano

Economic interactions often occur in networks where heterogeneous agents
(such as workers or firms) sort and produce. However, most existing estimation
approaches either require the network to be dense, which is at odds with many
empirical networks, or they require restricting the form of heterogeneity and
the network formation process. We show how the functional differencing approach
introduced by Bonhomme (2012) in the context of panel data, can be applied in
network settings to derive moment restrictions on model parameters and average
effects. Those restrictions are valid irrespective of the form of
heterogeneity, and they hold in both dense and sparse networks. We illustrate
the analysis with linear and nonlinear models of matched employer-employee
data, in the spirit of the model introduced by Abowd, Kramarz, and Margolis
(1999).

arXiv link: http://arxiv.org/abs/2307.11484v1

Econometrics arXiv updated paper (originally submitted: 2023-07-20)

Asymptotically Unbiased Synthetic Control Methods by Density Matching

Authors: Masahiro Kato, Akari Ohda

Synthetic Control Methods (SCMs) have become a fundamental tool for
comparative case studies. The core idea behind SCMs is to estimate treatment
effects by predicting counterfactual outcomes for a treated unit using a
weighted combination of observed outcomes from untreated units. The accuracy of
these predictions is crucial for evaluating the treatment effect of a policy
intervention. Subsequent research has therefore focused on estimating SC
weights. In this study, we highlight a key endogeneity issue in existing
SCMs-namely, the correlation between the outcomes of untreated units and the
error term of the synthetic control, which leads to bias in both counterfactual
outcome prediction and treatment effect estimation. To address this issue, we
propose a novel SCM based on density matching, assuming that the outcome
density of the treated unit can be approximated by a weighted mixture of the
joint density of untreated units. Under this assumption, we estimate SC weights
by matching the moments of the treated outcomes with the weighted sum of the
moments of the untreated outcomes. Our method offers three advantages: first,
under the mixture model assumption, our estimator is asymptotically unbiased;
second, this asymptotic unbiasedness reduces the mean squared error in
counterfactual predictions; and third, our method provides full densities of
the treatment effect rather than just expected values, thereby broadening the
applicability of SCMs. Finally, we present experimental results that
demonstrate the effectiveness of our approach.
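
A minimal sketch of the moment-matching step under the mixture assumption: the
treated unit's raw moments are matched to the weighted sum of the donors'
moments, with weights constrained to the simplex. The data-generating process
and the choice of three moments are illustrative; the paper's full
density-matching estimator and its asymptotic analysis are not reproduced.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(9)
    J, T0 = 4, 5000
    mu = np.array([0.0, 2.0, 4.0, 6.0])
    Y0 = rng.normal(loc=mu[:, None], scale=1.0, size=(J, T0))   # donor outcomes

    # Treated outcomes drawn from a mixture of the donor distributions.
    true_w = np.array([0.5, 0.3, 0.2, 0.0])
    comp = rng.choice(J, size=T0, p=true_w)
    y1 = rng.normal(loc=mu[comp], scale=1.0)

    def loss(w, k_max=3):
        # Mixture assumption: E[Y_1^k] = sum_j w_j * E[Y_j^k] for each order k.
        return sum((np.mean(y1 ** k) - w @ np.mean(Y0 ** k, axis=1)) ** 2
                   for k in range(1, k_max + 1))

    res = minimize(loss, np.full(J, 1.0 / J), method="SLSQP",
                   bounds=[(0.0, 1.0)] * J,
                   constraints=({"type": "eq", "fun": lambda w: w.sum() - 1.0},))
    print("true weights:     ", true_w)
    print("estimated weights:", res.x.round(2))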

arXiv link: http://arxiv.org/abs/2307.11127v4

Econometrics arXiv paper, submitted: 2023-07-20

Real-Time Detection of Local No-Arbitrage Violations

Authors: Torben G. Andersen, Viktor Todorov, Bo Zhou

This paper focuses on the task of detecting local episodes involving
violation of the standard It\^o semimartingale assumption for financial asset
prices in real time that might induce arbitrage opportunities. Our proposed
detectors, defined as stopping rules, are applied sequentially to continually
incoming high-frequency data. We show that they are asymptotically
exponentially distributed in the absence of It\^o semimartingale violations. On
the other hand, when a violation occurs, we can achieve immediate detection
under infill asymptotics. A Monte Carlo study demonstrates that the asymptotic
results provide a good approximation to the finite-sample behavior of the
sequential detectors. An empirical application to S&P 500 index futures data
corroborates the effectiveness of our detectors in swiftly identifying the
emergence of an extreme return persistence episode in real time.

arXiv link: http://arxiv.org/abs/2307.10872v1

Econometrics arXiv updated paper (originally submitted: 2023-07-20)

PySDTest: a Python/Stata Package for Stochastic Dominance Tests

Authors: Kyungho Lee, Yoon-Jae Whang

We introduce PySDTest, a Python/Stata package for statistical tests of
stochastic dominance. PySDTest implements various testing procedures such as
Barrett and Donald (2003), Linton et al. (2005), Linton et al. (2010), and
Donald and Hsu (2016), along with their extensions. Users can flexibly combine
several resampling methods and test statistics, including the numerical delta
method (D\"umbgen, 1993; Hong and Li, 2018; Fang and Santos, 2019). The package
allows for testing advanced hypotheses on stochastic dominance relations, such
as stochastic maximality among multiple prospects. We first provide an overview
of the concepts of stochastic dominance and testing methods. Then, we offer
practical guidance for using the package and the Stata command pysdtest. We
apply PySDTest to investigate the portfolio choice problem between the daily
returns of Bitcoin and the S&P 500 index as an empirical illustration. Our
findings indicate that the S&P 500 index returns second-order stochastically
dominate the Bitcoin returns.
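
For intuition only, the sketch below checks first- and second-order stochastic
dominance between two simulated return series by comparing empirical CDFs and
their running integrals on a grid. It is a plain in-sample comparison, not the
formal resampling-based tests implemented in PySDTest, and the simulated returns
are illustrative.

    import numpy as np

    def ecdf(sample, grid):
        # Empirical CDF of `sample` evaluated on `grid`.
        return np.searchsorted(np.sort(sample), grid, side="right") / len(sample)

    def dominance_check(a, b, n_grid=500):
        # Sample-based check (no inference): does `a` weakly dominate `b`
        # at first and second order on the evaluation grid?
        grid = np.linspace(min(a.min(), b.min()), max(a.max(), b.max()), n_grid)
        Fa, Fb = ecdf(a, grid), ecdf(b, grid)
        first = np.all(Fa <= Fb + 1e-12)
        dx = grid[1] - grid[0]
        # Second order: the integrated CDF of a lies below that of b everywhere.
        second = np.all(np.cumsum(Fa) * dx <= np.cumsum(Fb) * dx + 1e-12)
        return first, second

    rng = np.random.default_rng(10)
    index_ret = rng.normal(loc=0.001, scale=0.01, size=2000)   # stand-in for an index
    crypto_ret = rng.normal(loc=0.000, scale=0.04, size=2000)  # riskier alternative

    print("FSD, SSD of index over crypto:", dominance_check(index_ret, crypto_ret))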

arXiv link: http://arxiv.org/abs/2307.10694v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-07-19

Latent Gaussian dynamic factor modeling and forecasting for multivariate count time series

Authors: Younghoon Kim, Marie-Christine Düker, Zachary F. Fisher, Vladas Pipiras

This work considers estimation and forecasting in a multivariate, possibly
high-dimensional count time series model constructed from a transformation of a
latent Gaussian dynamic factor series. The estimation of the latent model
parameters is based on second-order properties of the count and underlying
Gaussian time series, yielding estimators of the underlying covariance matrices
for which standard principal component analysis applies. Theoretical
consistency results are established for the proposed estimation, building on
certain concentration results for the models of the type considered. They also
involve the memory of the latent Gaussian process, quantified through a
spectral gap, shown to be suitably bounded as the model dimension increases,
which is of independent interest. In addition, novel cross-validation schemes
are suggested for model selection. The forecasting is carried out through a
particle-based sequential Monte Carlo, leveraging Kalman filtering techniques.
A simulation study and an application are also considered.

arXiv link: http://arxiv.org/abs/2307.10454v3

Econometrics arXiv updated paper (originally submitted: 2023-07-19)

Asymptotic equivalence of Principal Components and Quasi Maximum Likelihood estimators in Large Approximate Factor Models

Authors: Matteo Barigozzi

This paper investigates the properties of Quasi Maximum Likelihood estimation
of an approximate factor model for an $n$-dimensional vector of stationary time
series. We prove that the factor loadings estimated by Quasi Maximum Likelihood
are asymptotically equivalent, as $n\to\infty$, to those estimated via
Principal Components. Both estimators are, in turn, also asymptotically
equivalent, as $n\to\infty$, to the unfeasible Ordinary Least Squares estimator
we would have if the factors were observed. We also show that the usual
sandwich form of the asymptotic covariance matrix of the Quasi Maximum
Likelihood estimator is asymptotically equivalent to the simpler asymptotic
covariance matrix of the unfeasible Ordinary Least Squares. All these results
hold in the general case in which the idiosyncratic components are
cross-sectionally heteroskedastic, as well as serially and cross-sectionally
weakly correlated. The intuition behind these results is that as $n\to\infty$
the factors can be considered as observed, thus showing that factor models
enjoy a blessing of dimensionality.

arXiv link: http://arxiv.org/abs/2307.09864v5

Econometrics arXiv paper, submitted: 2023-07-18

Risk Preference Types, Limited Consideration, and Welfare

Authors: Levon Barseghyan, Francesca Molinari

We provide sufficient conditions for semi-nonparametric point identification
of a mixture model of decision making under risk, when agents make choices in
multiple lines of insurance coverage (contexts) by purchasing a bundle. As a
first departure from the related literature, the model allows for two
preference types. In the first one, agents behave according to standard
expected utility theory with CARA Bernoulli utility function, with an
agent-specific coefficient of absolute risk aversion whose distribution is left
completely unspecified. In the other, agents behave according to the dual
theory of choice under risk(Yaari, 1987) combined with a one-parameter family
distortion function, where the parameter is agent-specific and is drawn from a
distribution that is left completely unspecified. Within each preference type,
the model allows for unobserved heterogeneity in consideration sets, where the
latter form at the bundle level -- a second departure from the related
literature. Our point identification result rests on observing sufficient
variation in covariates across contexts, without requiring any independent
variation across alternatives within a single context. We estimate the model on
data on households' deductible choices in two lines of property insurance, and
use the results to assess the welfare implications of a hypothetical market
intervention where the two lines of insurance are combined into a single one.
We study the role of limited consideration in mediating the welfare effects of
such intervention.

arXiv link: http://arxiv.org/abs/2307.09411v1

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2023-07-17

Comparative Analysis of Machine Learning, Hybrid, and Deep Learning Forecasting Models Evidence from European Financial Markets and Bitcoins

Authors: Apostolos Ampountolas

This study analyzes the transmission of market uncertainty to key European
financial markets and the cryptocurrency market over an extended period,
encompassing the pre-, during-, and post-pandemic periods. Daily financial market
indices and price observations are used to assess the forecasting models. We
compare statistical, machine learning, and deep learning forecasting models,
namely the ARIMA, hybrid ETS-ANN, and kNN predictive models, for evaluating the
financial markets. The study results indicate that predicting financial market
fluctuations is challenging, and the accuracy levels are generally low in
several instances. ARIMA and hybrid ETS-ANN models perform better over extended
periods compared to the kNN model, with ARIMA being the best-performing model
in 2018-2021 and the hybrid ETS-ANN model being the best-performing model in
most of the other subperiods. Still, the kNN model outperforms the others in
several periods, depending on the observed accuracy measure. Researchers have
advocated using parametric and non-parametric modeling combinations to generate
better results. In this study, the results suggest that the hybrid ETS-ANN
model is the best-performing model despite its moderate level of accuracy.
Thus, the hybrid ETS-ANN model is a promising financial time series forecasting
approach. The findings offer financial analysts an additional source that can
provide valuable insights for investment decisions.

arXiv link: http://arxiv.org/abs/2307.08853v1

Econometrics arXiv paper, submitted: 2023-07-15

Supervised Dynamic PCA: Linear Dynamic Forecasting with Many Predictors

Authors: Zhaoxing Gao, Ruey S. Tsay

This paper proposes a novel dynamic forecasting method using a new supervised
Principal Component Analysis (PCA) when a large number of predictors are
available. The new supervised PCA provides an effective way to bridge the gap
between predictors and the target variable of interest by scaling and combining
the predictors and their lagged values, resulting in effective dynamic
forecasting. Unlike the traditional diffusion-index approach, which does not
learn the relationships between the predictors and the target variable before
conducting PCA, we first re-scale each predictor according to its
significance in forecasting the target variable in a dynamic fashion, and a
PCA is then applied to a re-scaled and additive panel, which establishes a
connection between the predictability of the PCA factors and the target
variable. Furthermore, we also propose to use penalized methods such as the
LASSO approach to select the significant factors that have superior predictive
power over the others. Theoretically, we show that our estimators are
consistent and outperform the traditional methods in prediction under some mild
conditions. We conduct extensive simulations to verify that the proposed method
produces satisfactory forecasting results and outperforms most of the existing
methods using the traditional PCA. A real example of predicting U.S.
macroeconomic variables using a large number of predictors showcases that our
method fares better than most of the existing ones in applications. The
proposed method thus provides a comprehensive and effective approach for
dynamic forecasting in high-dimensional data analysis.
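
A rough sketch of the supervision step, illustrative rather than the paper's
estimator: each predictor and its lag are first scaled by the strength of a
univariate predictive regression on the target, PCA is then applied to the
rescaled panel, and a LASSO keeps the factors with predictive power.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LassoCV

    rng = np.random.default_rng(11)
    T, N = 200, 50
    F = rng.normal(size=(T, 2))                                # latent factors
    X = F @ rng.normal(size=(2, N)) + rng.normal(size=(T, N))  # many predictors
    y = 1.5 * F[1:, 0] - 1.0 * F[1:, 1] + rng.normal(scale=0.5, size=T - 1)

    # Panel of predictors and their first lags, aligned with the next-period target.
    Z = np.hstack([X[1:], X[:-1]])                             # (T-1, 2N)

    # Supervision: scale each column by the absolute slope from a univariate
    # predictive regression of y on that column (a simple stand-in for the
    # paper's significance-based rescaling).
    Zc = Z - Z.mean(axis=0)
    yc = y - y.mean()
    slopes = Zc.T @ yc / (Zc ** 2).sum(axis=0)
    Z_scaled = Zc * np.abs(slopes)

    factors = PCA(n_components=8).fit_transform(Z_scaled)      # PCA on rescaled panel
    lasso = LassoCV(cv=5).fit(factors, yc)                     # keep useful factors
    print("nonzero factor coefficients:", np.sum(lasso.coef_ != 0))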

arXiv link: http://arxiv.org/abs/2307.07689v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-07-14

Sparsified Simultaneous Confidence Intervals for High-Dimensional Linear Models

Authors: Xiaorui Zhu, Yichen Qin, Peng Wang

Statistical inference of the high-dimensional regression coefficients is
challenging because the uncertainty introduced by the model selection procedure
is hard to account for. A critical question remains unsettled: is it possible,
and if so how, to embed inference about the model into the simultaneous
inference on the coefficients? To this end, we propose a notion of simultaneous
confidence intervals called the sparsified simultaneous confidence intervals.
Our intervals are sparse in the sense that some of the intervals' upper and
lower bounds are shrunken to zero (i.e., $[0,0]$), indicating the unimportance
of the corresponding covariates. These covariates should be excluded from the
final model. The rest of the intervals, either containing zero (e.g., $[-1,1]$
or $[0,1]$) or not containing zero (e.g., $[2,3]$), indicate the plausible and
significant covariates, respectively. The proposed method can be coupled with
various selection procedures, making it ideal for comparing their uncertainty.
For the proposed method, we establish desirable asymptotic properties, develop
intuitive graphical tools for visualization, and justify its superior
performance through simulation and real data analysis.

arXiv link: http://arxiv.org/abs/2307.07574v2

Econometrics arXiv cross-link from physics.soc-ph (physics.soc-ph), submitted: 2023-07-14

Global path preference and local response: A reward decomposition approach for network path choice analysis in the presence of locally perceived attributes

Authors: Yuki Oyama

This study performs an attribute-level analysis of the global and local path
preferences of network travelers. To this end, a reward decomposition approach
is proposed and integrated into a link-based recursive (Markovian) path choice
model. The approach decomposes the instantaneous reward function associated
with each state-action pair into the global utility, a function of attributes
globally perceived from anywhere in the network, and the local utility, a
function of attributes that are only locally perceived from the current state.
Only the global utility then enters the value function of each state,
representing the future expected utility toward the destination. This
global-local path choice model with decomposed reward functions allows us to
analyze to what extent and which attributes affect the global and local path
choices of agents. Moreover, unlike most adaptive path choice models, the
proposed model can be estimated based on revealed path observations (without
the information of plans) and as efficiently as deterministic recursive path
choice models. The model was applied to the real pedestrian path choice
observations in an urban street network where the green view index was
extracted as a visual street quality from Google Street View images. The result
revealed that pedestrians locally perceive and react to the visual street
quality, rather than forming a global, pre-trip perception of it.
Furthermore, the simulation results using the estimated models suggested the
importance of location selection of interventions when policy-related
attributes are only locally perceived by travelers.

arXiv link: http://arxiv.org/abs/2307.08646v1

Econometrics arXiv updated paper (originally submitted: 2023-07-13)

Choice Models and Permutation Invariance: Demand Estimation in Differentiated Products Markets

Authors: Amandeep Singh, Ye Liu, Hema Yoganarasimhan

Choice modeling is at the core of understanding how changes to the
competitive landscape affect consumer choices and reshape market equilibria. In
this paper, we propose a fundamental characterization of choice functions that
encompasses a wide variety of extant choice models. We demonstrate how
non-parametric estimators like neural nets can easily approximate such
functionals and overcome the curse of dimensionality that is inherent in the
non-parametric estimation of choice functions. We demonstrate through extensive
simulations that our proposed functionals can flexibly capture underlying
consumer behavior in a completely data-driven fashion and outperform
traditional parametric models. As demand settings often exhibit endogenous
features, we extend our framework to incorporate estimation under endogenous
features. Further, we also describe a formal inference procedure to construct
valid confidence intervals on objects of interest like price elasticity.
Finally, to assess the practical applicability of our estimator, we utilize a
real-world dataset from S. Berry, Levinsohn, and Pakes (1995). Our empirical
analysis confirms that the estimator generates realistic and comparable own-
and cross-price elasticities that are consistent with the observations reported
in the existing literature.

arXiv link: http://arxiv.org/abs/2307.07090v2

Econometrics arXiv updated paper (originally submitted: 2023-07-13)

The Canonical Decomposition of Factor Models: Weak Factors are Everywhere

Authors: Philipp Gersing, Matteo Barigozzi, Christoph Rust, Manfred Deistler

There are two approaches to time series approximate factor models: the static
factor model, where the factors are loaded contemporaneously by the common
component, and the Generalised Dynamic Factor Model, where the factors are
loaded with lags. In this paper we derive a canonical decomposition which nests
both models by introducing the weak common component which is the difference
between the dynamic and the static common component. This component is driven
by potentially infinitely many non-pervasive weak factors which live in the
dynamically common space (not to be confused with rate-weak factors, which are
pervasive but associated with a slower rate). Our result shows that the
relation between the two approaches is far richer and more complex than is
usually assumed. We exemplify why the weak common component should not be
neglected by means of theoretical and empirical examples. Furthermore, we
propose a simple estimation procedure for the canonical decomposition. Our
empirical estimates on US macroeconomic data reveal that the weak common
component can account for a large part of the variation of individual
variables. Furthermore, in a pseudo real-time forecasting evaluation for
industrial production and inflation, we show that gains can be obtained from
considering the dynamic approach over the static approach.

arXiv link: http://arxiv.org/abs/2307.10067v3

Econometrics arXiv updated paper (originally submitted: 2023-07-12)

The Yule-Frisch-Waugh-Lovell Theorem for Linear Instrumental Variables Estimation

Authors: Deepankar Basu

In this paper, I discuss three aspects of the Frisch-Waugh-Lovell theorem.
First, I show that the theorem holds for linear instrumental variables
estimation of a multiple regression model that is either exactly or
overidentified. I show that with linear instrumental variables estimation: (a)
coefficients on endogenous variables are identical in full and partial (or
residualized) regressions; (b) residual vectors are identical for full and
partial regressions; and (c) estimated covariance matrices of the coefficient
vectors from full and partial regressions are equal (up to a degree of freedom
correction) if the estimator of the error vector is a function only of the
residual vectors and does not use any information about the covariate matrix
other than its dimensions. While estimation of the full model uses the full set
of instrumental variables, estimation of the partial model uses the
residualized version of the same set of instrumental variables, with
residualization carried out with respect to the set of exogenous variables.
Second, I show that: (a) the theorem applies in large samples to the K-class of
estimators, including the limited information maximum likelihood (LIML)
estimator, and (b) the theorem does not apply in general to linear GMM
estimators, but it does apply to the two step optimal linear GMM estimator.
Third, I trace the historical and analytical development of the theorem and
suggest that it be renamed as the Yule-Frisch-Waugh-Lovell (YFWL) theorem to
recognize the pioneering contribution of the statistician G. Udny Yule in its
development.
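
Result (a) above is an exact algebraic identity and is easy to verify numerically. The snippet below (an illustrative check, not the paper's code) simulates an overidentified model and confirms that the 2SLS coefficient on the endogenous regressor is the same whether one runs the full regression or the partial regression with residualized outcome, regressor, and instruments.

import numpy as np

def tsls(y, X, Z):
    # Two-stage least squares via first-stage fitted values: (Xhat'X)^{-1} Xhat'y
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

def residualize(A, W):
    # Residuals from regressing each column of A on W
    return A - W @ np.linalg.lstsq(W, A, rcond=None)[0]

rng = np.random.default_rng(0)
n = 2000
w = rng.normal(size=(n, 1))                      # included exogenous covariate
z = rng.normal(size=(n, 2))                      # two instruments (overidentified)
u = rng.normal(size=n)
x = z @ np.array([0.8, -0.5]) + w[:, 0] + u + rng.normal(size=n)  # endogenous
y = 1.5 * x + 2.0 * w[:, 0] + u + rng.normal(size=n)

const = np.ones((n, 1))
beta_full = tsls(y, np.column_stack([x, w, const]), np.column_stack([z, w, const]))
W = np.column_stack([w, const])
beta_part = tsls(residualize(y[:, None], W)[:, 0],
                 residualize(x[:, None], W),
                 residualize(z, W))
print(beta_full[0], beta_part[0])   # coefficients on x coincide, as stated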

arXiv link: http://arxiv.org/abs/2307.12731v2

Econometrics arXiv updated paper (originally submitted: 2023-07-12)

Stationarity with Occasionally Binding Constraints

Authors: James A. Duffy, Sophocles Mavroeidis, Sam Wycherley

This paper studies a class of multivariate threshold autoregressive models,
known as censored and kinked structural vector autoregressions (CKSVAR), which
are notably able to accommodate series that are subject to occasionally binding
constraints. We develop a set of sufficient conditions for the processes
generated by a CKSVAR to be stationary, ergodic, and weakly dependent. Our
conditions relate directly to the stability of the deterministic part of the
model, and are therefore less conservative than those typically available for
general vector threshold autoregressive (VTAR) models. Though our criteria
refer to quantities, such as refinements of the joint spectral radius, that
cannot feasibly be computed exactly, they can be approximated numerically to a
high degree of precision.

arXiv link: http://arxiv.org/abs/2307.06190v2

Econometrics arXiv paper, submitted: 2023-07-12

Identification in Multiple Treatment Models under Discrete Variation

Authors: Vishal Kamat, Samuel Norris, Matthew Pecenco

We develop a method to learn about treatment effects in multiple treatment
models with discrete-valued instruments. We allow selection into treatment to
be governed by a general class of threshold crossing models that permits
multidimensional unobserved heterogeneity. Under a semi-parametric restriction
on the distribution of unobserved heterogeneity, we show how a sequence of
linear programs can be used to compute sharp bounds for a number of treatment
effect parameters when the marginal treatment response functions underlying
them remain nonparametric or are additionally parameterized.

arXiv link: http://arxiv.org/abs/2307.06174v1

Econometrics arXiv paper, submitted: 2023-07-12

Robust Impulse Responses using External Instruments: the Role of Information

Authors: Davide Brignone, Alessandro Franconi, Marco Mazzali

External-instrument identification leads to biased responses when the shock
is not invertible and measurement error is present. We propose to use this
identification strategy in a structural Dynamic Factor Model, which we call
Proxy DFM. In a simulation analysis, we show that the Proxy DFM always
successfully retrieves the true impulse responses, while the Proxy SVAR
systematically fails to do so when the model is misspecified, does not
include all relevant information, or when measurement error is present. In an
application to US monetary policy, the Proxy DFM shows that a tightening shock
is unequivocally contractionary, with deteriorations in domestic demand, labor,
credit, housing, exchange, and financial markets. This holds true for all raw
instruments available in the literature. The variance decomposition analysis
highlights the importance of monetary policy shocks in explaining economic
fluctuations, albeit at different horizons.

arXiv link: http://arxiv.org/abs/2307.06145v1

Econometrics arXiv updated paper (originally submitted: 2023-07-11)

What Does it Take to Control Global Temperatures? A toolbox for testing and estimating the impact of economic policies on climate

Authors: Guillaume Chevillon, Takamitsu Kurita

This paper tests the feasibility and estimates the cost of climate control
through economic policies. It provides a toolbox for a statistical historical
assessment of a Stochastic Integrated Model of Climate and the Economy, and its
use in (possibly counterfactual) policy analysis. Recognizing that
stabilization requires supressing a trend, we use an integrated-cointegrated
Vector Autoregressive Model estimated using a newly compiled dataset ranging
between years A.D. 1000-2008, extending previous results on Control Theory in
nonstationary systems. We test statistically whether, and quantify to what
extent, carbon abatement policies can effectively stabilize or reduce global
temperatures. Our formal test of policy feasibility shows that carbon abatement
can have a significant long run impact and policies can render temperatures
stationary around a chosen long run mean. In a counterfactual empirical
illustration of the possibilities of our modeling strategy, we study a
retrospective policy aiming to keep global temperatures close to their 1900
historical level. Achieving this via carbon abatement may cost about 75% of the
observed 2008 level of world GDP, a cost equivalent to reverting to levels of
output historically observed in the mid 1960s. By contrast, investment in
carbon neutral technology could achieve the policy objective and be
self-sustainable as long as it costs less than 50% of 2008 global GDP and 75%
of consumption.

arXiv link: http://arxiv.org/abs/2307.05818v2

Econometrics arXiv updated paper (originally submitted: 2023-07-11)

Synthetic Decomposition for Counterfactual Predictions

Authors: Nathan Canen, Kyungchul Song

Counterfactual predictions are challenging when the policy variable goes
beyond its pre-policy support. However, in many cases, information about the
policy of interest is available from different ("source") regions where a
similar policy has already been implemented. In this paper, we propose a novel
method of using such data from source regions to predict a new policy in a
target region. Instead of relying on extrapolation of a structural relationship
using a parametric specification, we formulate a transferability condition and
construct a synthetic outcome-policy relationship such that it is as close as
possible to meeting the condition. The synthetic relationship weighs both the
similarity in distributions of observables and in structural relationships. We
develop a general procedure to construct asymptotic confidence intervals for
counterfactual predictions and prove its asymptotic validity. We then apply our
proposal to predict average teenage employment in Texas following a
counterfactual increase in the minimum wage.

arXiv link: http://arxiv.org/abs/2307.05122v2

Econometrics arXiv paper, submitted: 2023-07-09

Decentralized Decision-Making in Retail Chains: Evidence from Inventory Management

Authors: Victor Aguirregabiria, Francis Guiton

This paper investigates the impact of decentralizing inventory
decision-making in multi-establishment firms using data from a large retail
chain. Analyzing two years of daily data, we find significant heterogeneity
among the inventory decisions made by 634 store managers. By estimating a
dynamic structural model, we reveal substantial heterogeneity in managers'
perceived costs. Moreover, we observe a correlation between the variance of
these perceptions and managers' education and experience. Counterfactual
experiments show that centralized inventory management reduces costs by
eliminating the impact of managers' skill heterogeneity. However, these
benefits are offset by the negative impact of delayed demand information.

arXiv link: http://arxiv.org/abs/2307.05562v1

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2023-07-07

Are there Dragon Kings in the Stock Market?

Authors: Jiong Liu, M. Dashti Moghaddam, R. A. Serota

We undertake a systematic study of historic market volatility spanning
roughly five preceding decades. We focus specifically on the time series of
realized volatility (RV) of the S&P500 index and its distribution function. As
expected, the largest values of RV coincide with the largest economic upheavals
of the period: Savings and Loan Crisis, Tech Bubble, Financial Crisis and Covid
Pandemic. We address the question of whether these values belong to one of the
three categories: Black Swans (BS), that is, they lie on scale-free, power-law
tails of the distribution; Dragon Kings (DK), defined as statistically
significant upward deviations from BS; or Negative Dragon Kings (nDK), defined
as statistically significant downward deviations from BS. In analyzing the
tails of the distribution with RV > 40, we observe the appearance of
"potential" DK which eventually terminate in an abrupt plunge to nDK. This
phenomenon becomes more pronounced with the increase of the number of days over
which the average RV is calculated -- here from daily, n=1, to "monthly," n=21.
We fit the entire distribution with a modified Generalized Beta (mGB)
distribution function, which terminates at a finite value of the variable but
exhibits a long power-law stretch prior to that, as well as Generalized Beta
Prime (GB2) distribution function, which has a power-law tail. We also fit the
tails directly with a straight line on a log-log scale. In order to ascertain
BS, DK or nDK behavior, all fits include their confidence intervals and
p-values are evaluated for the data points to check if they can come from the
respective distributions.
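
For readers who want to replicate the simplest ingredient of this analysis, the fragment below (an illustrative sketch, not the authors' code) fits a straight line to the empirical complementary CDF of the RV tail on a log-log scale; the mGB/GB2 fits and the p-value analysis in the paper are considerably more involved.

import numpy as np

def loglog_tail_fit(rv, threshold=40.0):
    # Fit log P(RV > x) ~ a + b * log x over the tail; b estimates the exponent.
    rv = np.asarray(rv, dtype=float)
    tail = np.sort(rv[rv > threshold])
    n, k = len(rv), len(tail)
    ccdf = (k - np.arange(1, k + 1)) / n          # empirical P(RV > tail[i])
    keep = ccdf > 0
    slope, intercept = np.polyfit(np.log(tail[keep]), np.log(ccdf[keep]), 1)
    return slope, intercept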

arXiv link: http://arxiv.org/abs/2307.03693v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-07-07

Generalised Covariances and Correlations

Authors: Tobias Fissler, Marc-Oliver Pohle

The covariance of two random variables measures the average joint deviations
from their respective means. We generalise this well-known measure by replacing
the means with other statistical functionals such as quantiles, expectiles, or
thresholds. Deviations from these functionals are defined via generalised
errors, often induced by identification or moment functions. As a normalised
measure of dependence, a generalised correlation is constructed. Replacing the
common Cauchy-Schwarz normalisation by a novel Fr\'echet-Hoeffding
normalisation, we obtain attainability of the entire interval $[-1, 1]$ for any
given marginals. We uncover favourable properties of these new dependence
measures. The families of quantile and threshold correlations give rise to
function-valued distributional correlations, exhibiting the entire dependence
structure. They lead to tail correlations, which should arguably supersede the
coefficients of tail dependence. Finally, we construct summary covariances
(correlations), which arise as (normalised) weighted averages of distributional
covariances. We retrieve Pearson covariance and Spearman correlation as special
cases. The applicability and usefulness of our new dependence measures is
illustrated on demographic data from the Panel Study of Income Dynamics.
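
As a concrete instance of the family, the sketch below computes a quantile-based generalised covariance, replacing means with quantiles and deviations with the quantile identification function. The Frechet-Hoeffding-style normalisation shown is one plausible plug-in chosen for illustration, not necessarily the paper's exact construction.

import numpy as np

def quantile_generalised_covariance(x, y, alpha=0.9, beta=0.9):
    # Generalised errors for quantiles: 1{X <= q_alpha} - alpha, etc.
    qx, qy = np.quantile(x, alpha), np.quantile(y, beta)
    ex = (np.asarray(x) <= qx).astype(float) - alpha
    ey = (np.asarray(y) <= qy).astype(float) - beta
    return np.mean(ex * ey)

def quantile_generalised_correlation(x, y, alpha=0.9, beta=0.9):
    # Normalise by the Frechet-Hoeffding bounds on the copula at (alpha, beta),
    # so the result ranges over [-1, 1] (a crude plug-in normalisation).
    cov = quantile_generalised_covariance(x, y, alpha, beta)
    upper = min(alpha, beta) - alpha * beta               # comonotone coupling
    lower = max(alpha + beta - 1.0, 0.0) - alpha * beta   # countermonotone
    return cov / upper if cov >= 0 else -cov / lower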

arXiv link: http://arxiv.org/abs/2307.03594v2

Econometrics arXiv paper, submitted: 2023-07-07

Climate Models Underestimate the Sensitivity of Arctic Sea Ice to Carbon Emissions

Authors: Francis X. Diebold, Glenn D. Rudebusch

Arctic sea ice has steadily diminished as atmospheric greenhouse gas
concentrations have increased. Using observed data from 1979 to 2019, we
estimate a close contemporaneous linear relationship between Arctic sea ice
area and cumulative carbon dioxide emissions. For comparison, we provide
analogous regression estimates using simulated data from global climate models
(drawn from the CMIP5 and CMIP6 model comparison exercises). The carbon
sensitivity of Arctic sea ice area is considerably stronger in the observed
data than in the climate models. Thus, for a given future emissions path, an
ice-free Arctic is likely to occur much earlier than the climate models
project. Furthermore, little progress has been made in recent global climate
modeling (from CMIP5 to CMIP6) to more accurately match the observed
carbon-climate response of Arctic sea ice.

arXiv link: http://arxiv.org/abs/2307.03552v1

Econometrics arXiv paper, submitted: 2023-07-05

Panel Data Nowcasting: The Case of Price-Earnings Ratios

Authors: Andrii Babii, Ryan T. Ball, Eric Ghysels, Jonas Striaukas

The paper uses structured machine learning regressions for nowcasting with
panel data consisting of series sampled at different frequencies. Motivated by
the problem of predicting corporate earnings for a large cross-section of firms
with macroeconomic, financial, and news time series sampled at different
frequencies, we focus on the sparse-group LASSO regularization which can take
advantage of the mixed frequency time series panel data structures. Our
empirical results show the superior performance of our machine learning panel
data regression models over analysts' predictions, forecast combinations,
firm-specific time series regression models, and standard machine learning
methods.

arXiv link: http://arxiv.org/abs/2307.02673v1

Econometrics arXiv cross-link from q-fin.TR (q-fin.TR), submitted: 2023-07-05

Online Learning of Order Flow and Market Impact with Bayesian Change-Point Detection Methods

Authors: Ioanna-Yvonni Tsaknaki, Fabrizio Lillo, Piero Mazzarisi

Financial order flow exhibits a remarkable level of persistence, wherein buy
(sell) trades are often followed by subsequent buy (sell) trades over extended
periods. This persistence can be attributed to the division and gradual
execution of large orders. Consequently, distinct order flow regimes might
emerge, which can be identified through suitable time series models applied to
market data. In this paper, we propose the use of Bayesian online change-point
detection (BOCPD) methods to identify regime shifts in real-time and enable
online predictions of order flow and market impact. To enhance the
effectiveness of our approach, we have developed a novel BOCPD method using a
score-driven approach. This method accommodates temporal correlations and
time-varying parameters within each regime. Through empirical application to
NASDAQ data, we have found that: (i) Our newly proposed model demonstrates
superior out-of-sample predictive performance compared to existing models that
assume i.i.d. behavior within each regime; (ii) When examining the residuals,
our model demonstrates good specification in terms of both distributional
assumptions and temporal correlations; (iii) Within a given regime, the price
dynamics exhibit a concave relationship with respect to time and volume,
mirroring the characteristics of actual large orders; (iv) By incorporating
regime information, our model produces more accurate online predictions of
order flow and market impact compared to models that do not consider regimes.
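
As background for readers unfamiliar with BOCPD, here is a minimal version of the standard Adams-MacKay recursion with an i.i.d. Gaussian regime model, the kind of baseline the paper's score-driven variant improves upon; the hazard rate, prior, and known-variance assumption are all placeholders.

import numpy as np

def bocpd_gaussian(x, hazard=1 / 250, mu0=0.0, kappa0=1.0, sigma2=1.0):
    # Run-length filtering with Gaussian observations (known variance sigma2)
    # and a conjugate Normal prior N(mu0, sigma2 / kappa0) on each regime mean.
    T = len(x)
    R = np.zeros((T + 1, T + 1))      # R[t, r] = P(run length = r | x_1..x_t)
    R[0, 0] = 1.0
    mu, kappa = np.array([mu0]), np.array([kappa0])
    cp_prob = np.zeros(T)
    for t in range(T):
        pred_var = sigma2 * (1.0 + 1.0 / kappa)
        pred = np.exp(-0.5 * (x[t] - mu) ** 2 / pred_var) / np.sqrt(2 * np.pi * pred_var)
        R[t + 1, 1:t + 2] = R[t, :t + 1] * pred * (1 - hazard)      # growth
        R[t + 1, 0] = np.sum(R[t, :t + 1] * pred * hazard)          # changepoint
        R[t + 1] /= R[t + 1].sum()
        cp_prob[t] = R[t + 1, 0]
        # conjugate updates of the per-run-length posterior parameters
        mu = np.concatenate(([mu0], (kappa * mu + x[t]) / (kappa + 1)))
        kappa = np.concatenate(([kappa0], kappa + 1))
    return R, cp_prob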

arXiv link: http://arxiv.org/abs/2307.02375v2

Econometrics arXiv updated paper (originally submitted: 2023-07-05)

Claim Reserving via Inverse Probability Weighting: A Micro-Level Chain-Ladder Method

Authors: Sebastian Calcetero-Vanegas, Andrei L. Badescu, X. Sheldon Lin

Claim reserving primarily relies on macro-level models, with the Chain-Ladder
method being the most widely adopted. These methods were developed
heuristically, without solid statistical foundations, relying on oversimplified
data assumptions and neglecting policyholder heterogeneity, often resulting in
conservative reserve predictions. Micro-level reserving, utilizing stochastic
modeling with granular information, can improve predictions but tends to
involve less attractive and complex models for practitioners. This paper aims
to strike a practical balance between aggregate and individual models by
introducing a methodology that enables the Chain-Ladder method to incorporate
individual information. We achieve this by proposing a novel framework,
formulating the claim reserving problem within a population sampling context.
We introduce a reserve estimator in a frequency and severity distribution-free
manner that utilizes inverse probability weights (IPW) driven by individual
information, akin to propensity scores. We demonstrate that the Chain-Ladder
method emerges as a particular case of such an IPW estimator, thereby
inheriting a statistically sound foundation based on population sampling theory
that enables the use of granular information, and other extensions.

arXiv link: http://arxiv.org/abs/2307.10808v3

Econometrics arXiv paper, submitted: 2023-07-04

Asymptotics for the Generalized Autoregressive Conditional Duration Model

Authors: Giuseppe Cavaliere, Thomas Mikosch, Anders Rahbek, Frederik Vilandt

Engle and Russell (1998, Econometrica, 66:1127--1162) apply results from the
GARCH literature to prove consistency and asymptotic normality of the
(exponential) QMLE for the generalized autoregressive conditional duration
(ACD) model, the so-called ACD(1,1), under the assumption of strict
stationarity and ergodicity. The GARCH results, however, do not account for the
fact that the number of durations over a given observation period is random.
Thus, in contrast with Engle and Russell (1998), we show that strict
stationarity and ergodicity alone are not sufficient for consistency and
asymptotic normality, and provide additional sufficient conditions to account
for the random number of durations. In particular, we argue that the durations
need to satisfy the stronger requirement that they have finite mean.
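
The point about the random number of durations is easy to visualise by simulation. The sketch below (illustrative parameter values, exponential errors) draws from an ACD(1,1) and counts how many durations fall in a fixed calendar window, a quantity that varies from draw to draw.

import numpy as np

def simulate_acd(n, omega=0.1, alpha=0.1, beta=0.8, seed=0):
    # ACD(1,1): x_i = psi_i * eps_i, psi_i = omega + alpha*x_{i-1} + beta*psi_{i-1}
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    psi = x_prev = omega / (1 - alpha - beta)   # start at the unconditional mean
    for i in range(n):
        psi = omega + alpha * x_prev + beta * psi
        x[i] = psi * rng.exponential()
        x_prev = x[i]
    return x

durations = simulate_acd(100_000)
window_end = 0.5 * durations.sum()
n_in_window = np.searchsorted(np.cumsum(durations), window_end)  # random count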

arXiv link: http://arxiv.org/abs/2307.01779v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-07-04

A Double Machine Learning Approach to Combining Experimental and Observational Data

Authors: Harsh Parikh, Marco Morucci, Vittorio Orlandi, Sudeepa Roy, Cynthia Rudin, Alexander Volfovsky

Experimental and observational studies often lack validity due to untestable
assumptions. We propose a double machine learning approach to combine
experimental and observational studies, allowing practitioners to test for
assumption violations and estimate treatment effects consistently. Our
framework proposes a falsification test for external validity and ignorability
under milder assumptions. We provide consistent treatment effect estimators
even when one of the assumptions is violated. However, our no-free-lunch
theorem highlights the necessity of accurately identifying the violated
assumption for consistent treatment effect estimation. Through comparative
analyses, we show our framework's superiority over existing data fusion
methods. The practical utility of our approach is further exemplified by three
real-world case studies, underscoring its potential for widespread application
in empirical research.

arXiv link: http://arxiv.org/abs/2307.01449v3

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2023-07-03

Adaptive Principal Component Regression with Applications to Panel Data

Authors: Anish Agarwal, Keegan Harris, Justin Whitehouse, Zhiwei Steven Wu

Principal component regression (PCR) is a popular technique for fixed-design
error-in-variables regression, a generalization of the linear regression
setting in which the observed covariates are corrupted with random noise. We
provide the first time-uniform finite sample guarantees for (regularized) PCR
whenever data is collected adaptively. Since the proof techniques for analyzing
PCR in the fixed design setting do not readily extend to the online setting,
our results rely on adapting tools from modern martingale concentration to the
error-in-variables setting. We demonstrate the usefulness of our bounds by
applying them to the domain of panel data, a ubiquitous setting in econometrics
and statistics. As our first application, we provide a framework for experiment
design in panel data settings when interventions are assigned adaptively. Our
framework may be thought of as a generalization of the synthetic control and
synthetic interventions frameworks, where data is collected via an adaptive
intervention assignment policy. Our second application is a procedure for
learning such an intervention assignment policy in a setting where units arrive
sequentially to be treated. In addition to providing theoretical performance
guarantees (as measured by regret), we show that our method empirically
outperforms a baseline which does not leverage error-in-variables regression.
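
For orientation, vanilla principal component regression is only a few lines; the paper's contribution (time-uniform guarantees under adaptively collected data) concerns the analysis rather than the generic recipe sketched below.

import numpy as np

def principal_component_regression(X, y, k):
    # Project the (noisy) design onto its top-k principal subspace, then OLS.
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    X_k = (U[:, :k] * s[:k]) @ Vt[:k]                 # rank-k approximation of Xc
    beta, *_ = np.linalg.lstsq(X_k, yc, rcond=None)   # min-norm OLS = PCR estimate
    return beta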

arXiv link: http://arxiv.org/abs/2307.01357v3

Econometrics arXiv paper, submitted: 2023-07-03

Nonparametric Estimation of Large Spot Volatility Matrices for High-Frequency Financial Data

Authors: Ruijun Bu, Degui Li, Oliver Linton, Hanchao Wang

In this paper, we consider estimating spot/instantaneous volatility matrices
of high-frequency data collected for a large number of assets. We first combine
classic nonparametric kernel-based smoothing with a generalised shrinkage
technique in the matrix estimation for noise-free data under a uniform sparsity
assumption, a natural extension of the approximate sparsity commonly used in
the literature. The uniform consistency property is derived for the proposed
spot volatility matrix estimator with convergence rates comparable to the
optimal minimax one. For the high-frequency data contaminated by microstructure
noise, we introduce a localised pre-averaging estimation method that reduces
the effective magnitude of the noise. We then use the estimation tool developed
in the noise-free scenario, and derive the uniform convergence rates for the
developed spot volatility matrix estimator. We further combine the kernel
smoothing with the shrinkage technique to estimate the time-varying volatility
matrix of the high-dimensional noise vector. In addition, we consider large
spot volatility matrix estimation in time-varying factor models with observable
risk factors and derive the uniform convergence property. We provide numerical
studies including simulation and empirical application to examine the
performance of the proposed estimation methods in finite samples.
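
The two ingredients for the noise-free case, kernel smoothing in time and entrywise shrinkage across assets, can be sketched as follows; the kernel, bandwidth, and threshold are user choices here, and the localised pre-averaging step for microstructure noise is not shown.

import numpy as np

def spot_volatility_matrix(returns, times, tau, bandwidth, threshold):
    # returns: (n, p) high-frequency return vectors observed at 'times' (length n).
    dt = np.median(np.diff(times))                       # sampling interval
    w = np.exp(-0.5 * ((times - tau) / bandwidth) ** 2)  # Gaussian kernel weights
    w /= w.sum()
    sigma = (returns * w[:, None]).T @ returns / dt      # kernel-weighted spot cov
    off = sigma - np.diag(np.diag(sigma))
    off = np.sign(off) * np.maximum(np.abs(off) - threshold, 0.0)  # soft-threshold
    return np.diag(np.diag(sigma)) + off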

arXiv link: http://arxiv.org/abs/2307.01348v1

Econometrics arXiv paper, submitted: 2023-07-03

A maximal inequality for local empirical processes under weak dependence

Authors: Luis Alvarez, Cristine Pinto

We introduce a maximal inequality for a local empirical process under
strongly mixing data. Local empirical processes are defined as the (local)
averages $\frac{1}{nh}\sum_{i=1}^n 1\{x - h \leq X_i \leq
x+h\} f(Z_i)$, where $f$ belongs to a class of functions, $x \in \mathbb{R}$ and
$h > 0$ is a bandwidth. Our nonasymptotic bounds control estimation error
uniformly over the function class, evaluation point $x$ and bandwidth $h$. They
are also general enough to accommodate function classes whose complexity
increases with $n$. As an application, we apply our bounds to function classes
that exhibit polynomial decay in their uniform covering numbers. When
specialized to the problem of kernel density estimation, our bounds reveal
that, under weak dependence with exponential decay, these estimators achieve
the same (up to a logarithmic factor) sharp uniform-in-bandwidth rates derived
in the iid setting by Einmahl and Mason (2005).

arXiv link: http://arxiv.org/abs/2307.01328v1

Econometrics arXiv updated paper (originally submitted: 2023-07-03)

Does regional variation in wage levels identify the effects of a national minimum wage?

Authors: Daniel Haanwinckel

This paper evaluates the validity of estimators that exploit regional wage
differences to study the effects of a national minimum wage. It shows that
variations of the “fraction affected” and “effective minimum wage” designs
are vulnerable to bias from measurement error and functional form
misspecification, even when standard identification assumptions hold, and that
small deviations from these assumptions can substantially amplify the biases.
Using simulation exercises and a case study of Brazil's minimum wage increase,
the paper illustrates the practical relevance of these issues and assesses the
performance of potential solutions and diagnostic tools.

arXiv link: http://arxiv.org/abs/2307.01284v5

Econometrics arXiv paper, submitted: 2023-07-03

Doubly Robust Estimation of Direct and Indirect Quantile Treatment Effects with Machine Learning

Authors: Yu-Chin Hsu, Martin Huber, Yu-Min Yen

We suggest double/debiased machine learning estimators of direct and indirect
quantile treatment effects under a selection-on-observables assumption. This
permits disentangling the causal effect of a binary treatment at a specific
outcome rank into an indirect component that operates through an intermediate
variable called mediator and an (unmediated) direct impact. The proposed method
is based on the efficient score functions of the cumulative distribution
functions of potential outcomes, which are robust to certain misspecifications
of the nuisance parameters, i.e., the outcome, treatment, and mediator models.
We estimate these nuisance parameters by machine learning and use cross-fitting
to reduce overfitting bias in the estimation of direct and indirect quantile
treatment effects. We establish uniform consistency and asymptotic normality of
our effect estimators. We also propose a multiplier bootstrap for statistical
inference and show the validity of the multiplier bootstrap. Finally, we
investigate the finite sample performance of our method in a simulation study
and apply it to empirical data from the National Job Corp Study to assess the
direct and indirect earnings effects of training.

arXiv link: http://arxiv.org/abs/2307.01049v1

Econometrics arXiv updated paper (originally submitted: 2023-07-03)

Expected Shortfall LASSO

Authors: Sander Barendse

We propose an $\ell_1$-penalized estimator for high-dimensional models of
Expected Shortfall (ES). The estimator is obtained as the solution to a
least-squares problem for an auxiliary dependent variable, which is defined as
a transformation of the dependent variable and a pre-estimated tail quantile.
Leveraging a sparsity condition, we derive a nonasymptotic bound on the
prediction and estimator errors of the ES estimator, accounting for the
estimation error in the dependent variable, and provide conditions under which
the estimator is consistent. Our estimator is applicable to heavy-tailed
time-series data, and we find that the number of parameters in the model may
grow with the sample size at a rate that depends on the dependence and
heavy-tailedness in the data. In an empirical application, we consider the
systemic risk measure CoES with a set of regressors that consists of
nonlinear transformations of a set of state variables. We find that the
nonlinear model outperforms an unpenalized and untransformed benchmark
considerably.
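
A two-step sketch of the construction described above: estimate the conditional tail quantile, form the auxiliary dependent variable whose conditional mean is the ES, and run an l1-penalized least-squares regression. The exact transformation and tuning in the paper may differ; this is the generic recipe.

import numpy as np
from sklearn.linear_model import Lasso, QuantileRegressor

def expected_shortfall_lasso(X, y, tail_level=0.05, lam=0.01):
    # Step 1: conditional tail quantile (unpenalized here for simplicity).
    q_hat = QuantileRegressor(quantile=tail_level, alpha=0.0,
                              solver="highs").fit(X, y).predict(X)
    # Step 2: auxiliary variable with conditional mean equal to the ES, then LASSO.
    z = q_hat + (y - q_hat) * (y <= q_hat) / tail_level
    return Lasso(alpha=lam).fit(X, z)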

arXiv link: http://arxiv.org/abs/2307.01033v2

Econometrics arXiv cross-link from math.OC (math.OC), submitted: 2023-07-03

Quantifying Distributional Model Risk in Marginal Problems via Optimal Transport

Authors: Yanqin Fan, Hyeonseok Park, Gaoqian Xu

This paper studies distributional model risk in marginal problems, where each
marginal measure is assumed to lie in a Wasserstein ball centered at a fixed
reference measure with a given radius. Theoretically, we establish several
fundamental results including strong duality, finiteness of the proposed
Wasserstein distributional model risk, and the existence of an optimizer at
each radius. In addition, we show continuity of the Wasserstein distributional
model risk as a function of the radius. Using strong duality, we extend the
well-known Makarov bounds for the distribution function of the sum of two
random variables with given marginals to Wasserstein distributionally robust
Makarov bounds. Practically, we illustrate our results on four distinct
applications when the sample information comes from multiple data sources and
only some marginal reference measures are identified. They are: partial
identification of treatment effects; externally valid treatment choice via
robust welfare functions; Wasserstein distributionally robust estimation under
data combination; and evaluation of the worst aggregate risk measures.

arXiv link: http://arxiv.org/abs/2307.00779v1

Econometrics arXiv paper, submitted: 2023-07-01

The Yule-Frisch-Waugh-Lovell Theorem

Authors: Deepankar Basu

This paper traces the historical and analytical development of what is known
in the econometrics literature as the Frisch-Waugh-Lovell theorem. This theorem
demonstrates that the coefficients on any subset of covariates in a multiple
regression are equal to the coefficients in a regression of the residualized
outcome variable on the residualized subset of covariates, where
residualization uses the complement of the subset of covariates of interest. In
this paper, I suggest that the theorem should be renamed as the
Yule-Frisch-Waugh-Lovell (YFWL) theorem to recognize the pioneering
contribution of the statistician G. Udny Yule in its development. Second, I
highlight recent work by the statistician P. Ding, which has extended the YFWL
theorem to a comparison of estimated covariance matrices of coefficients from
multiple and partial (i.e., residualized) regressions. Third, I show that, in
cases where Ding's results do not apply, one can still resort to a
computational method to conduct statistical inference about coefficients in
multiple regressions using information from partial regressions.

arXiv link: http://arxiv.org/abs/2307.00369v1

Econometrics arXiv cross-link from q-fin.TR (q-fin.TR), submitted: 2023-06-29

Decomposing cryptocurrency high-frequency price dynamics into recurring and noisy components

Authors: Marcin Wątorek, Maria Skupień, Jarosław Kwapień, Stanisław Drożdż

This paper investigates the temporal patterns of activity in the
cryptocurrency market with a focus on Bitcoin, Ethereum, Dogecoin, and WINkLink
from January 2020 to December 2022. Market activity measures - logarithmic
returns, volume, and transaction number, sampled every 10 seconds, were divided
into intraday and intraweek periods and then further decomposed into recurring
and noise components via correlation matrix formalism. The key findings include
market behavior distinct from that of traditional stock markets, owing to the
absence of market opening and closing times. This was manifest in three
enhanced-activity phases aligning with Asian, European, and U.S. trading
sessions. An intriguing pattern of activity surge in 15-minute intervals,
particularly at full hours, was also noticed, implying the potential role of
algorithmic trading. Most notably, recurring bursts of activity in bitcoin and
ether were identified to coincide with the release times of significant U.S.
macroeconomic reports such as Nonfarm payrolls, Consumer Price Index data, and
Federal Reserve statements. The most correlated daily patterns of activity
occurred in 2022, possibly reflecting the documented correlations with U.S.
stock indices in the same period. Factors that are external to the inner market
dynamics are found to be responsible for the repeatable components of the
market dynamics, while the internal factors appear to be substantially random,
which manifests itself in a good agreement between the empirical eigenvalue
distributions in their bulk and the random matrix theory predictions expressed
by the Marchenko-Pastur distribution. The findings reported support the growing
integration of cryptocurrencies into the global financial markets.
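
The bulk-versus-Marchenko-Pastur comparison mentioned at the end is straightforward to reproduce on any return panel; the sketch below computes the eigenvalues of the correlation matrix and the theoretical density for i.i.d. data (assuming more observations than assets).

import numpy as np

def eigenvalues_vs_marchenko_pastur(returns):
    # returns: (T, N) matrix of returns with T > N.
    T, N = returns.shape
    q = N / T
    eigs = np.linalg.eigvalsh(np.corrcoef(returns, rowvar=False))
    lam_min, lam_max = (1 - np.sqrt(q)) ** 2, (1 + np.sqrt(q)) ** 2
    grid = np.linspace(lam_min, lam_max, 200)
    mp_density = np.sqrt((lam_max - grid) * (grid - lam_min)) / (2 * np.pi * q * grid)
    return eigs, grid, mp_density   # compare a histogram of eigs with mp_density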

arXiv link: http://arxiv.org/abs/2306.17095v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-06-28

Nonparametric Causal Decomposition of Group Disparities

Authors: Ang Yu, Felix Elwert

We introduce a new nonparametric causal decomposition approach that
identifies the mechanisms by which a treatment variable contributes to a
group-based outcome disparity. Our approach distinguishes three mechanisms:
group differences in 1) treatment prevalence, 2) average treatment effects, and
3) selection into treatment based on individual-level treatment effects. Our
approach reformulates classic Kitagawa-Blinder-Oaxaca decompositions in causal
and nonparametric terms, complements causal mediation analysis by explaining
group disparities instead of group effects, and isolates conceptually distinct
mechanisms conflated in recent random equalization decompositions. In contrast
to all prior approaches, our framework uniquely identifies differential
selection into treatment as a novel disparity-generating mechanism. Our
approach can be used for both the retrospective causal explanation of
disparities and the prospective planning of interventions to change
disparities. We present both an unconditional and a conditional decomposition,
where the latter quantifies the contributions of the treatment within levels of
certain covariates. We develop nonparametric estimators that are
$\sqrt{n}$-consistent, asymptotically normal, semiparametrically efficient, and
multiply robust. We apply our approach to analyze the mechanisms by which
college graduation causally contributes to intergenerational income persistence
(the disparity in adult income between the children of high- vs low-income
parents). Empirically, we demonstrate a previously undiscovered role played by
the new selection component in intergenerational income persistence.

arXiv link: http://arxiv.org/abs/2306.16591v4

Econometrics arXiv updated paper (originally submitted: 2023-06-28)

High-Dimensional Canonical Correlation Analysis

Authors: Anna Bykhovskaya, Vadim Gorin

This paper studies high-dimensional canonical correlation analysis (CCA) with
an emphasis on the vectors that define canonical variables. The paper shows
that when two dimensions of data grow to infinity jointly and proportionally,
the classical CCA procedure for estimating those vectors fails to deliver a
consistent estimate. This provides the first result on the impossibility of
identification of canonical variables in the CCA procedure when all dimensions
are large. As a countermeasure, the paper derives the magnitude of the
estimation error, which can be used in practice to assess the precision of CCA
estimates. Applications of the results to cyclical vs. non-cyclical stocks and
to a limestone grassland data set are provided.

arXiv link: http://arxiv.org/abs/2306.16393v3

Econometrics arXiv updated paper (originally submitted: 2023-06-26)

Assessing Heterogeneity of Treatment Effects

Authors: Tetsuya Kaji, Jianfei Cao

Heterogeneous treatment effects are of major interest in economics. For
example, a poverty reduction measure would be best evaluated by its effects on
those who would be poor in the absence of the treatment, or by the share among
the poor who would increase their earnings because of the treatment. While
these quantities are not identified, we derive nonparametrically sharp bounds
using only the marginal distributions of the control and treated outcomes.
Applications to microfinance and welfare reform demonstrate their utility even
when the average treatment effects are not significant and when economic theory
makes opposite predictions between heterogeneous individuals.

arXiv link: http://arxiv.org/abs/2306.15048v4

Econometrics arXiv updated paper (originally submitted: 2023-06-26)

Identifying Socially Disruptive Policies

Authors: Eric Auerbach, Yong Cai

Social disruption occurs when a policy creates or destroys many network
connections between agents. It is a costly side effect of many interventions
and so a growing empirical literature recommends measuring and accounting for
social disruption when evaluating the welfare impact of a policy. However,
there is currently little work characterizing what can actually be learned
about social disruption from data in practice. In this paper, we consider the
problem of identifying social disruption in an experimental setting. We show
that social disruption is not generally point identified, but informative
bounds can be constructed by rearranging the eigenvalues of the marginal
distribution of network connections between pairs of agents identified from the
experiment. We apply our bounds to the setting of Banerjee et al. (2021) and
find large disruptive effects that the authors miss by only considering
regression estimates.

arXiv link: http://arxiv.org/abs/2306.15000v3

Econometrics arXiv updated paper (originally submitted: 2023-06-26)

Marginal Effects for Probit and Tobit with Endogeneity

Authors: Kirill S. Evdokimov, Ilze Kalnina, Andrei Zeleneev

When evaluating partial effects, it is important to distinguish between
structural endogeneity and measurement errors. In contrast to linear models,
these two sources of endogeneity affect partial effects differently in
nonlinear models. We study this issue focusing on the Instrumental Variable
(IV) Probit and Tobit models. We show that even when a valid IV is available,
failing to differentiate between the two types of endogeneity can lead to
either under- or over-estimation of the partial effects. We develop simple
estimators of the bounds on the partial effects and provide easy to implement
confidence intervals that correctly account for both types of endogeneity. We
illustrate the methods in a Monte Carlo simulation and an empirical
application.

arXiv link: http://arxiv.org/abs/2306.14862v4

Econometrics arXiv updated paper (originally submitted: 2023-06-26)

Optimization of the Generalized Covariance Estimator in Noncausal Processes

Authors: Gianluca Cubadda, Francesco Giancaterini, Alain Hecq, Joann Jasiak

This paper investigates the performance of the Generalized Covariance
estimator (GCov) in estimating and identifying mixed causal and noncausal
models. The GCov estimator is a semi-parametric method that minimizes an
objective function without making any assumptions about the error distribution
and is based on nonlinear autocovariances to identify the causal and noncausal
orders. When the number and type of nonlinear autocovariances included in the
objective function of a GCov estimator are insufficient or inadequate, or the error
density is too close to the Gaussian, identification issues can arise. These
issues result in local minima in the objective function, which correspond to
parameter values associated with incorrect causal and noncausal orders. Then,
depending on the starting point and the optimization algorithm employed, the
algorithm can converge to a local minimum. The paper proposes the use of the
Simulated Annealing (SA) optimization algorithm as an alternative to
conventional numerical optimization methods. The results demonstrate that SA
performs well when applied to mixed causal and noncausal models, successfully
eliminating the effects of local minima. The proposed approach is illustrated
by an empirical application involving a bivariate commodity price series.
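
Swapping a local optimizer for simulated annealing is a one-line change with SciPy. The toy objective below is a generic multimodal function standing in for a criterion with spurious local minima; it is not the GCov objective itself.

import numpy as np
from scipy.optimize import dual_annealing, minimize

def objective(theta):
    # Multimodal toy criterion: quadratic plus oscillation creates local minima.
    return np.sum(theta ** 2) + 2.0 * np.sum(np.cos(3.0 * np.pi * theta))

bounds = [(-2.0, 2.0)] * 2
local = minimize(objective, x0=np.array([1.4, -1.4]), method="Nelder-Mead")
annealed = dual_annealing(objective, bounds=bounds)
print(local.fun, annealed.fun)   # the annealed value is typically (weakly) lower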

arXiv link: http://arxiv.org/abs/2306.14653v3

Econometrics arXiv paper, submitted: 2023-06-26

Hybrid unadjusted Langevin methods for high-dimensional latent variable models

Authors: Ruben Loaiza-Maya, Didier Nibbering, Dan Zhu

The exact estimation of latent variable models with big data is known to be
challenging. The latents have to be integrated out numerically, and the
dimension of the latent variables increases with the sample size. This paper
develops a novel approximate Bayesian method based on the Langevin diffusion
process. The method employs the Fisher identity to integrate out the latent
variables, which makes it accurate and computationally feasible when applied to
big data. In contrast to other approximate estimation methods, it does not
require the choice of a parametric distribution for the unknowns, which often
leads to inaccuracies. In an empirical discrete choice example with a million
observations, the proposed method accurately estimates the posterior choice
probabilities using only 2% of the computation time of exact MCMC.
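
For context, the plain unadjusted Langevin algorithm that the method builds on is shown below; the paper's hybrid scheme additionally integrates out the latent variables via the Fisher identity, which this generic sketch does not attempt.

import numpy as np

def unadjusted_langevin(grad_log_post, theta0, step=1e-3, n_iter=5000, seed=0):
    # Euler discretisation of the Langevin diffusion targeting the posterior:
    # theta <- theta + (step/2) * grad log p(theta | data) + sqrt(step) * N(0, I)
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    draws = np.empty((n_iter, theta.size))
    for i in range(n_iter):
        theta = theta + 0.5 * step * grad_log_post(theta) \
                + np.sqrt(step) * rng.normal(size=theta.shape)
        draws[i] = theta
    return draws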

arXiv link: http://arxiv.org/abs/2306.14445v1

Econometrics arXiv updated paper (originally submitted: 2023-06-25)

Simple Estimation of Semiparametric Models with Measurement Errors

Authors: Kirill S. Evdokimov, Andrei Zeleneev

We develop a practical way of addressing the Errors-In-Variables (EIV)
problem in the Generalized Method of Moments (GMM) framework. We focus on the
settings in which the variability of the EIV is a fraction of that of the
mismeasured variables, which is typical for empirical applications. For any
initial set of moment conditions our approach provides a "corrected" set of
moment conditions that are robust to the EIV. We show that the GMM estimator
based on these moments is root-n-consistent, with the standard tests and
confidence intervals providing valid inference. This is true even when the EIV
are so large that naive estimators (that ignore the EIV problem) are heavily
biased with their confidence intervals having 0% coverage. Our approach
involves no nonparametric estimation, which is especially important for
applications with many covariates, and settings with multivariate or
non-classical EIV. In particular, the approach makes it easy to use
instrumental variables to address EIV in nonlinear models.

arXiv link: http://arxiv.org/abs/2306.14311v3

Econometrics arXiv updated paper (originally submitted: 2023-06-24)

Latent Factor Analysis in Short Panels

Authors: Alain-Philippe Fortin, Patrick Gagliardini, Olivier Scaillet

We develop inferential tools for latent factor analysis in short panels. The
pseudo maximum likelihood setting under a large cross-sectional dimension n and
a fixed time series dimension T relies on a diagonal T x T covariance matrix of
the errors without imposing sphericity or Gaussianity. We outline the
asymptotic distributions of the latent factor and error covariance estimates as
well as of an asymptotically uniformly most powerful invariant (AUMPI) test for
the number of factors based on the likelihood ratio statistic. We derive the
AUMPI characterization from inequalities ensuring the monotone likelihood ratio
property for positive definite quadratic forms in normal variables. An
empirical application to a large panel of monthly U.S. stock returns separates
month after month systematic and idiosyncratic risks in short subperiods of
bear vs. bull market based on the selected number of factors. We observe an
uptrend in the paths of total and idiosyncratic volatilities while the
systematic risk explains a large part of the cross-sectional total variance in
bear markets but is not driven by a single factor. Rank tests show that
observed factors struggle to span the latent factors, with the discrepancy
between the dimensions of the two factor spaces decreasing over time.

arXiv link: http://arxiv.org/abs/2306.14004v2

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2023-06-23

Multivariate Simulation-based Forecasting for Intraday Power Markets: Modelling Cross-Product Price Effects

Authors: Simon Hirsch, Florian Ziel

Intraday electricity markets play an increasingly important role in balancing
the intermittent generation of renewable energy resources, which creates a need
for accurate probabilistic price forecasts. However, research to date has
focused on univariate approaches, while in many European intraday electricity
markets all delivery periods are traded in parallel. Thus, the dependency
structure between different traded products and the corresponding cross-product
effects cannot be ignored. We aim to fill this gap in the literature by using
copulas to model the high-dimensional intraday price return vector. We model
the marginal distribution as a zero-inflated Johnson's $S_U$ distribution with
location, scale and shape parameters that depend on market and fundamental
data. The dependence structure is modelled using latent beta regression to
account for the particular market structure of the intraday electricity market,
such as overlapping but independent trading sessions for different delivery
days. We allow the dependence parameter to be time-varying. We validate our
approach in a simulation study for the German intraday electricity market and
find that modelling the dependence structure improves the forecasting
performance. Additionally, we shed light on the impact of the single intraday
coupling (SIDC) on the trading activity and price distribution and interpret
our results in light of the market efficiency hypothesis. The approach is
directly applicable to other European electricity markets.

arXiv link: http://arxiv.org/abs/2306.13419v1

Econometrics arXiv updated paper (originally submitted: 2023-06-23)

Factor-augmented sparse MIDAS regressions with an application to nowcasting

Authors: Jad Beyhum, Jonas Striaukas

This article investigates factor-augmented sparse MIDAS (Mixed Data Sampling)
regressions for high-dimensional time series data, which may be observed at
different frequencies. Our novel approach integrates sparse and dense
dimensionality reduction techniques. We derive the convergence rate of our
estimator under misspecification, $\tau$-mixing dependence, and polynomial
tails. Our method's finite sample performance is assessed via Monte Carlo
simulations. We apply the methodology to nowcasting U.S. GDP growth and
demonstrate that it outperforms both sparse regression and standard
factor-augmented regression during the COVID-19 pandemic. To ensure the
robustness of these results, we also implement factor-augmented sparse logistic
regression, which further confirms the superior accuracy of our nowcast
probabilities during recessions. These findings indicate that recessions are
influenced by both idiosyncratic (sparse) and common (dense) shocks.

arXiv link: http://arxiv.org/abs/2306.13362v3

Econometrics arXiv paper, submitted: 2023-06-22

A Discrimination Report Card

Authors: Patrick Kline, Evan K. Rose, Christopher R. Walters

We develop an Empirical Bayes grading scheme that balances the
informativeness of the assigned grades against the expected frequency of
ranking errors. Applying the method to a massive correspondence experiment, we
grade the racial biases of 97 U.S. employers. A four-grade ranking limits the
chances that a randomly selected pair of firms is mis-ranked to 5% while
explaining nearly half of the variation in firms' racial contact gaps. The
grades are presented alongside measures of uncertainty about each firm's
contact gap in an accessible rubric that is easily adapted to other settings
where ranks and levels are of simultaneous interest.

arXiv link: http://arxiv.org/abs/2306.13005v1

Econometrics arXiv paper, submitted: 2023-06-22

Price elasticity of electricity demand: Using instrumental variable regressions to address endogeneity and autocorrelation of high-frequency time series

Authors: Silvana Tiedemann, Raffaele Sgarlato, Lion Hirth

This paper examines empirical methods for estimating the response of
aggregated electricity demand to high-frequency price signals, the short-term
elasticity of electricity demand. We investigate how the endogeneity of prices
and the autocorrelation of the time series, which are particularly pronounced
at hourly granularity, affect and distort common estimators. After developing a
controlled test environment with synthetic data that replicate key statistical
properties of electricity demand, we show that not only the ordinary least
square (OLS) estimator is inconsistent (due to simultaneity), but so is a
regular instrumental variable (IV) regression (due to autocorrelation). Using
wind as an instrument, as it is commonly done, may result in an estimate of the
demand elasticity that is inflated by an order of magnitude. We visualize the
reason for the bias using causal graphs and show that its magnitude
depends on the autocorrelation of both the instrument and the dependent
variable. We further incorporate and adapt two extensions of the IV estimation,
conditional IV and nuisance IV, which have recently been proposed by Thams et
al. (2022). We show that these extensions can identify the true short-term
elasticity in a synthetic setting and are thus particularly promising for
future empirical research in this field.

arXiv link: http://arxiv.org/abs/2306.12863v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-06-21

Estimating the Value of Evidence-Based Decision Making

Authors: Alberto Abadie, Anish Agarwal, Guido Imbens, Siwei Jia, James McQueen, Serguei Stepaniants

Business/policy decisions are often based on evidence from randomized
experiments and observational studies. In this article we propose an empirical
framework to estimate the value of evidence-based decision making (EBDM) and
the return on the investment in statistical precision.

arXiv link: http://arxiv.org/abs/2306.13681v2

Econometrics arXiv updated paper (originally submitted: 2023-06-21)

A Nonparametric Test of $m$th-degree Inverse Stochastic Dominance

Authors: Hongyi Jiang, Zhenting Sun, Shiyun Hu

This paper proposes a nonparametric test for $m$th-degree inverse stochastic
dominance which is a powerful tool for ranking distribution functions according
to social welfare. We construct the test based on empirical process theory. The
test is shown to be asymptotically size controlled and consistent. The good
finite sample properties of the test are illustrated via Monte Carlo
simulations. We apply our test to the inequality growth in the United Kingdom
from 1995 to 2010.

arXiv link: http://arxiv.org/abs/2306.12271v3

Econometrics arXiv updated paper (originally submitted: 2023-06-21)

Difference-in-Differences with Interference

Authors: Ruonan Xu

In many scenarios, such as the evaluation of place-based policies, potential
outcomes are not only dependent upon the unit's own treatment but also its
neighbors' treatment. Despite this, "difference-in-differences" (DID) type
estimators typically ignore such interference among neighbors. I show in this
paper that the canonical DID estimators generally fail to identify interesting
causal effects in the presence of neighborhood interference. To incorporate
interference structure into DID estimation, I propose doubly robust estimators
for the direct average treatment effect on the treated as well as the average
spillover effects under a modified parallel trends assumption. I later relax
common restrictions in the literature, such as immediate neighborhood
interference and correctly specified spillover functions. Moreover, robust
inference is discussed based on the asymptotic distribution of the proposed
estimators.

arXiv link: http://arxiv.org/abs/2306.12003v6

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-06-21

Qini Curves for Multi-Armed Treatment Rules

Authors: Erik Sverdrup, Han Wu, Susan Athey, Stefan Wager

Qini curves have emerged as an attractive and popular approach for evaluating
the benefit of data-driven targeting rules for treatment allocation. We propose
a generalization of the Qini curve to multiple costly treatment arms that
quantifies the value of optimally selecting among both units and treatment arms
at different budget levels. We develop an efficient algorithm for computing
these curves and propose bootstrap-based confidence intervals that are exact in
large samples for any point on the curve. These confidence intervals can be
used to conduct hypothesis tests comparing the value of treatment targeting
using an optimal combination of arms with using just a subset of arms, or with
a non-targeting assignment rule ignoring covariates, at different budget
levels. We demonstrate the statistical performance in a simulation experiment
and an application to treatment targeting for election turnout.
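
For orientation, the sketch below computes a plain single-arm Qini curve from
experimental data; it is the textbook construction only, not the multi-armed,
budget-aware generalization proposed in the paper, and the inputs `tau_hat`,
`w`, and `y` are hypothetical predicted effects, treatment indicators, and
outcomes.

```python
import numpy as np

def qini_curve(tau_hat, w, y):
    """Single-arm Qini curve: cumulative incremental gain when units are
    treated in decreasing order of the predicted effect tau_hat.

    tau_hat : predicted treatment effects
    w       : binary treatment indicators from a randomized experiment
    y       : observed outcomes
    Returns (fraction_treated, incremental_gain) as two arrays.
    """
    tau_hat, w, y = map(np.asarray, (tau_hat, w, y))
    order = np.argsort(-tau_hat)
    w, y = w[order], y[order]
    cum_t = np.cumsum(w)                    # treated units among the top-k
    cum_c = np.cumsum(1 - w)                # control units among the top-k
    gain_t = np.cumsum(y * w)               # outcome total among those treated
    gain_c = np.cumsum(y * (1 - w))         # outcome total among those controls
    # Incremental gain: treated total minus control total rescaled to the
    # treated count (zero where no controls have been seen yet).
    qini = gain_t - gain_c * np.where(cum_c > 0, cum_t / np.maximum(cum_c, 1), 0.0)
    frac = np.arange(1, len(y) + 1) / len(y)
    return frac, qini
```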

arXiv link: http://arxiv.org/abs/2306.11979v4

Econometrics arXiv updated paper (originally submitted: 2023-06-20)

Statistical Tests for Replacing Human Decision Makers with Algorithms

Authors: Kai Feng, Han Hong, Ke Tang, Jingyuan Wang

This paper proposes a statistical framework of using artificial intelligence
to improve human decision making. The performance of each human decision maker
is benchmarked against that of machine predictions. We replace the diagnoses
made by a subset of the decision makers with the recommendation from the
machine learning algorithm. We apply both a heuristic frequentist approach and
a Bayesian posterior loss function approach to abnormal birth detection using a
nationwide dataset of doctor diagnoses from prepregnancy checkups of
reproductive age couples and pregnancy outcomes. We find that our algorithm on
a test dataset results in a higher overall true positive rate and a lower false
positive rate than the diagnoses made by doctors only.

arXiv link: http://arxiv.org/abs/2306.11689v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-06-18

Assumption-lean falsification tests of rate double-robustness of double-machine-learning estimators

Authors: Lin Liu, Rajarshi Mukherjee, James M. Robins

The class of doubly-robust (DR) functionals studied by Rotnitzky et al.
(2021) is of central importance in economics and biostatistics. It strictly
includes both (i) the class of mean-square continuous functionals that can be
written as an expectation of an affine functional of a conditional expectation
studied by Chernozhukov et al. (2022b) and (ii) the class of functionals
studied by Robins et al. (2008). The present state-of-the-art estimators for DR
functionals $\psi$ are double-machine-learning (DML) estimators (Chernozhukov
et al., 2018). A DML estimator $\hat{\psi}_{1}$ of $\psi$ depends on
estimates $\hat{p}(x)$ and $\hat{b}(x)$ of a pair of nuisance
functions $p(x)$ and $b(x)$, and is said to satisfy "rate double-robustness" if
the Cauchy--Schwarz upper bound of its bias is $o(n^{-1/2})$. Were it
achievable, our scientific goal would have been to construct valid,
assumption-lean (i.e. no complexity-reducing assumptions on $b$ or $p$) tests
of the validity of a nominal $(1 - \alpha)$ Wald confidence interval (CI)
centered at $\hat{\psi}_{1}$. But this would require a test of whether the bias
is $o(n^{-1/2})$, which can be shown not to exist. We therefore adopt the less
ambitious goal of falsifying, when possible, an analyst's justification for her
claim that the reported $(1 - \alpha)$ Wald CI is valid. In many instances, an
analyst justifies her claim by imposing complexity-reducing assumptions on $b$
and $p$ to ensure "rate double-robustness". Here we exhibit valid,
assumption-lean tests of $H_{0}$: "rate double-robustness holds", with
non-trivial power against certain alternatives. If $H_{0}$ is rejected, we will
have falsified her justification. However, no assumption-lean test of $H_{0}$,
including ours, can be a consistent test. Thus, the failure of our test to
reject is not meaningful evidence in favor of $H_{0}$.

arXiv link: http://arxiv.org/abs/2306.10590v4

Econometrics arXiv paper, submitted: 2023-06-18

Formal Covariate Benchmarking to Bound Omitted Variable Bias

Authors: Deepankar Basu

Covariate benchmarking is an important part of sensitivity analysis about
omitted variable bias and can be used to bound the strength of the unobserved
confounder using information and judgments about observed covariates. It is
common to carry out formal covariate benchmarking after residualizing the
unobserved confounder on the set of observed covariates. In this paper, I
explain the rationale and details of this procedure. I clarify some important
details of the process of formal covariate benchmarking and highlight some of
the difficulties of interpretation that researchers face in reasoning about the
residualized part of unobserved confounders. I explain all the points with
several empirical examples.

arXiv link: http://arxiv.org/abs/2306.10562v1

Econometrics arXiv updated paper (originally submitted: 2023-06-16)

Testing for Peer Effects without Specifying the Network Structure

Authors: Hyunseok Jung, Xiaodong Liu

This paper proposes an Anderson-Rubin (AR) test for the presence of peer
effects in panel data without the need to specify the network structure. The
unrestricted model of our test is a linear panel data model of social
interactions with dyad-specific peer effect coefficients for all potential
peers. The proposed AR test evaluates if these peer effect coefficients are all
zero. As the number of peer effect coefficients increases with the sample size,
so does the number of instrumental variables (IVs) employed to test the
restrictions under the null, rendering Bekker's many-IV environment. By
extending existing many-IV asymptotic results to panel data, we establish the
asymptotic validity of the proposed AR test. Our Monte Carlo simulations show
the robustness and superior performance of the proposed test compared to some
existing tests with misspecified networks. We provide two applications to
demonstrate its empirical relevance.

arXiv link: http://arxiv.org/abs/2306.09806v3

Econometrics arXiv updated paper (originally submitted: 2023-06-15)

Modelling and Forecasting Macroeconomic Risk with Time Varying Skewness Stochastic Volatility Models

Authors: Andrea Renzetti

Monitoring downside risk and upside risk to the key macroeconomic indicators
is critical for effective policymaking aimed at maintaining economic stability.
In this paper I propose a parametric framework for modelling and forecasting
macroeconomic risk based on stochastic volatility models with Skew-Normal and
Skew-t shocks featuring time varying skewness. Exploiting a mixture stochastic
representation of the Skew-Normal and Skew-t random variables, in the paper I
develop efficient posterior simulation samplers for Bayesian estimation of both
univariate and VAR models of this type. In an application, I use the models to
predict downside risk to GDP growth in the US and I show that these models
represent a competitive alternative to semi-parametric approaches such as
quantile regression. Finally, estimating a medium scale VAR on US data I show
that time varying skewness is a relevant feature of macroeconomic and financial
shocks.

arXiv link: http://arxiv.org/abs/2306.09287v2

Econometrics arXiv updated paper (originally submitted: 2023-06-14)

Inference in clustered IV models with many and weak instruments

Authors: Johannes W. Ligtenberg

Data clustering reduces the effective sample size from the number of
observations towards the number of clusters. For instrumental variable models
this reduced effective sample size makes the instruments more likely to be
weak, in the sense that they contain little information about the endogenous
regressor, and many, in the sense that their number is large compared to the
sample size. Consequently, weak and many instrument problems for estimators and
tests in instrumental variable models are also more likely. None of the
previously developed many and weak instrument robust tests, however, can be
applied to clustered data as they all require independent observations.
Therefore, I adapt the many and weak instrument robust jackknife
Anderson--Rubin and jackknife score tests to clustered data by removing
clusters rather than individual observations from the statistics. Simulations
and a revisitation of a study on the effect of queenly reign on war show the
empirical relevance of the new tests.

arXiv link: http://arxiv.org/abs/2306.08559v3

Econometrics arXiv paper, submitted: 2023-06-13

Machine Learning for Zombie Hunting: Predicting Distress from Firms' Accounts and Missing Values

Authors: Falco J. Bargagli-Stoffi, Fabio Incerti, Massimo Riccaboni, Armando Rungi

In this contribution, we propose machine learning techniques to predict
zombie firms. First, we derive the risk of failure by training and testing our
algorithms on disclosed financial information and non-random missing values of
304,906 firms active in Italy from 2008 to 2017. Then, we flag firms whose
predicted financial distress lies above a threshold chosen so that a
combination of the false positive rate (falsely predicting firm failure) and
the false negative rate (falsely predicting that a firm remains active) is
minimized.
Therefore, we identify zombies as firms that persist in a state of financial
distress, i.e., their forecasts fall into the risk category above the threshold
for at least three consecutive years. For our purpose, we implement a gradient
boosting algorithm (XGBoost) that exploits information about missing values.
The inclusion of missing values in our predictive model is crucial because
patterns of undisclosed accounts are correlated with firm failure. Finally, we
show that our preferred machine learning algorithm outperforms (i) proxy models
such as Z-scores and the Distance-to-Default, (ii) traditional econometric
methods, and (iii) other widely used machine learning techniques. We provide
evidence that zombies are on average less productive and smaller, and that they
tend to increase in times of crisis. Finally, we argue that our application can
help financial institutions and public authorities design evidence-based
policies, e.g., optimal bankruptcy laws and information disclosure policies.
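
To give a flavor of the estimation step, a minimal, hypothetical sketch of a
gradient boosting classifier that exploits missing values natively is shown
below; the feature matrix (with NaNs standing in for undisclosed accounts), the
distress label, and all hyperparameters are illustrative placeholders rather
than the authors' specification.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
# Placeholder firm-level features with NaN marking undisclosed accounts
# (missingness is non-random in the paper's data; here it is random noise).
X = rng.normal(size=(1000, 20))
X[rng.random(X.shape) < 0.2] = np.nan
y = (np.nansum(X[:, :3], axis=1) + rng.normal(size=1000) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# XGBoost routes NaNs to a learned default branch at each split, so
# missingness patterns themselves can carry predictive information.
clf = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.05,
                    subsample=0.8)
clf.fit(X_tr, y_tr)
print("test AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```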

arXiv link: http://arxiv.org/abs/2306.08165v1

Econometrics arXiv updated paper (originally submitted: 2023-06-13)

Kernel Choice Matters for Local Polynomial Density Estimators at Boundaries

Authors: Shunsuke Imai, Yuta Okamoto

This paper examines kernel selection for local polynomial density (LPD)
estimators at boundary points. Contrary to conventional wisdom, we demonstrate
that the choice of kernel has a substantial impact on the efficiency of LPD
estimators. In particular, we provide theoretical results and present
simulation and empirical evidence showing that commonly used kernels, such as
the triangular kernel, suffer from several efficiency issues: They yield a
larger mean squared error than our preferred Laplace kernel. For inference, the
efficiency loss is even more pronounced, with confidence intervals based on
popular kernels being wide, whereas those based on the Laplace kernel are
markedly tighter. Furthermore, the variance of the LPD estimator with such
popular kernels explodes as the sample size decreases, reflecting the fact --
formally proven here -- that its finite-sample variance is infinite. This
small-sample problem, however, can be avoided by employing kernels with
unbounded support. Taken together, both asymptotic and finite-sample analyses
justify the use of the Laplace kernel: Simply changing the kernel function
improves the reliability of LPD estimation and inference, and its effect is
numerically significant.
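
To make the kernel comparison concrete, here is a simplified, hypothetical
local polynomial density sketch at a boundary point: the empirical distribution
function is regressed locally on a polynomial in (x - c), and the slope
coefficient estimates the density, with either a Laplace or a triangular kernel
supplying the weights. The bandwidth and polynomial order are arbitrary
illustrative choices, not the paper's recommendations.

```python
import numpy as np

def lpd_at(c, data, h, kernel="laplace", order=2):
    """Local polynomial density estimate at point c (possibly a boundary).

    Fits a kernel-weighted polynomial of the empirical CDF in (x - c); the
    coefficient on the linear term estimates the density f(c).
    """
    x = np.sort(np.asarray(data))
    n = x.size
    F_hat = np.arange(1, n + 1) / n              # empirical CDF at the data points
    u = (x - c) / h
    if kernel == "laplace":
        w = 0.5 * np.exp(-np.abs(u))             # unbounded support
    else:                                        # triangular kernel
        w = np.clip(1.0 - np.abs(u), 0.0, None)  # bounded support
    keep = w > 0
    X = np.vander(x[keep] - c, N=order + 1, increasing=True)  # 1, (x-c), (x-c)^2, ...
    sw = np.sqrt(w[keep])
    beta = np.linalg.lstsq(X * sw[:, None], F_hat[keep] * sw, rcond=None)[0]
    return beta[1]                               # slope term = density estimate

# Illustration: Exponential(1) data; the true density at the boundary x = 0 is 1.
rng = np.random.default_rng(0)
sample = rng.exponential(size=2000)
print(lpd_at(0.0, sample, h=0.5, kernel="laplace"))
print(lpd_at(0.0, sample, h=0.5, kernel="triangular"))
```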

arXiv link: http://arxiv.org/abs/2306.07619v3

Econometrics arXiv paper, submitted: 2023-06-12

Instrument-based estimation of full treatment effects with movers

Authors: Didier Nibbering, Matthijs Oosterveen

The effect of the full treatment is a primary parameter of interest in policy
evaluation, while often only the effect of a subset of treatment is estimated.
We partially identify the local average treatment effect of receiving full
treatment (LAFTE) using an instrumental variable that may induce individuals
into only a subset of treatment (movers). We show that movers violate the
standard exclusion restriction, necessary conditions on the presence of movers
are testable, and partial identification holds under a double exclusion
restriction. We identify movers in four empirical applications and estimate
informative bounds on the LAFTE in three of them.

arXiv link: http://arxiv.org/abs/2306.07018v1

Econometrics arXiv updated paper (originally submitted: 2023-06-08)

Localized Neural Network Modelling of Time Series: A Case Study on US Monetary Policy

Authors: Jiti Gao, Fei Liu, Bin Peng, Yanrong Yang

In this paper, we investigate a semiparametric regression model in the
context of treatment effects via a localized neural network (LNN) approach. Due
to the vast number of parameters involved, we reduce the number of effective
parameters by (i) exploring the use of identification restrictions; and (ii)
adopting a variable selection method based on the group-LASSO technique.
Subsequently, we derive the corresponding estimation theory and propose a
dependent wild bootstrap procedure to construct valid inferences accounting for
the dependence of data. Finally, we validate our theoretical findings through
extensive numerical studies. In an empirical study, we revisit the impacts of a
tightening monetary policy action on a variety of economic variables, including
short-/long-term interest rate, inflation, unemployment rate, industrial price
and equity return via the newly proposed framework using a monthly dataset of
the US.

arXiv link: http://arxiv.org/abs/2306.05593v2

Econometrics arXiv updated paper (originally submitted: 2023-06-08)

Maximally Machine-Learnable Portfolios

Authors: Philippe Goulet Coulombe, Maximilian Goebel

When it comes to stock returns, any form of predictability can bolster
risk-adjusted profitability. We develop a collaborative machine learning
algorithm that optimizes portfolio weights so that the resulting synthetic
security is maximally predictable. Precisely, we introduce MACE, a multivariate
extension of Alternating Conditional Expectations that achieves the
aforementioned goal by wielding a Random Forest on one side of the equation,
and a constrained Ridge Regression on the other. There are two key improvements
with respect to Lo and MacKinlay's original maximally predictable portfolio
approach. First, it accommodates any (nonlinear) forecasting algorithm and
predictor set. Second, it handles large portfolios. We conduct exercises at the
daily and monthly frequency and report significant increases in predictability
and profitability using very little conditioning information. Interestingly,
predictability is found in bad as well as good times, and MACE successfully
navigates the debacle of 2022.

arXiv link: http://arxiv.org/abs/2306.05568v2

Econometrics arXiv updated paper (originally submitted: 2023-06-08)

Heterogeneous Autoregressions in Short T Panel Data Models

Authors: M. Hashem Pesaran, Liying Yang

This paper considers a first-order autoregressive panel data model with
individual-specific effects and heterogeneous autoregressive coefficients
defined on the interval (-1,1], thus allowing for some of the individual
processes to have unit roots. It proposes estimators for the moments of the
cross-sectional distribution of the autoregressive (AR) coefficients, assuming
a random coefficient model for the autoregressive coefficients without imposing
any restrictions on the fixed effects. It is shown that the standard generalized
method of moments estimators obtained under homogeneous slopes are biased.
Small sample properties of the proposed estimators are investigated by Monte
Carlo experiments and compared with a number of alternatives, both under
homogeneous and heterogeneous slopes. It is found that a simple moment
estimator of the mean of heterogeneous AR coefficients performs very well even
for moderate sample sizes, but to reliably estimate the variance of AR
coefficients much larger samples are required. It is also required that the
true value of this variance is not too close to zero. The utility of the
heterogeneous approach is illustrated in the case of earnings dynamics.

arXiv link: http://arxiv.org/abs/2306.05299v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-06-08

Matrix GARCH Model: Inference and Application

Authors: Cheng Yu, Dong Li, Feiyu Jiang, Ke Zhu

Matrix-variate time series data are widely available in applications.
However, no attempt has been made to study their conditional heteroskedasticity
that is often observed in economic and financial data. To address this gap, we
propose a novel matrix generalized autoregressive conditional
heteroskedasticity (GARCH) model to capture the dynamics of conditional row and
column covariance matrices of matrix time series. The key innovation of the
matrix GARCH model is the use of a univariate GARCH specification for the trace
of conditional row or column covariance matrix, which allows for the
identification of conditional row and column covariance matrices. Moreover, we
introduce a quasi maximum likelihood estimator (QMLE) for model estimation and
develop a portmanteau test for model diagnostic checking. Simulation studies
are conducted to assess the finite-sample performance of the QMLE and
portmanteau test. To handle large dimensional matrix time series, we also
propose a matrix factor GARCH model. Finally, we demonstrate the superiority of
the matrix GARCH and matrix factor GARCH models over existing multivariate
GARCH-type models in volatility forecasting and portfolio allocations using
three applications on credit default swap prices, global stock sector indices,
and futures prices.

arXiv link: http://arxiv.org/abs/2306.05169v1

Econometrics arXiv paper, submitted: 2023-06-07

Network-based Representations and Dynamic Discrete Choice Models for Multiple Discrete Choice Analysis

Authors: Hung Tran, Tien Mai

In many choice modeling applications, people's demand is frequently
characterized as multiple discrete, which means that people choose multiple
items simultaneously. The analysis and prediction of people's behavior in
multiple discrete choice situations pose several challenges. In this paper, to
address this, we propose a random utility maximization (RUM) based model that
considers each subset of choice alternatives as a composite alternative, where
individuals choose a subset according to the RUM framework. While this approach
offers a natural and intuitive modeling approach for multiple-choice analysis,
the large number of subsets of choices in the formulation makes its estimation
and application intractable. To overcome this challenge, we introduce directed
acyclic graph (DAG) based representations of choices where each node of the DAG
is associated with an elemental alternative and additional information such
as the number of selected elemental alternatives. Our innovation is to show
that the multi-choice model is equivalent to a recursive route choice model on
the DAG, leading to the development of new efficient estimation algorithms
based on dynamic programming. In addition, the DAG representations enable us to
bring some advanced route choice models to capture the correlation between
subset choice alternatives. Numerical experiments based on synthetic and real
datasets show many advantages of our modeling approach and the proposed
estimation algorithms.

arXiv link: http://arxiv.org/abs/2306.04606v1

Econometrics arXiv updated paper (originally submitted: 2023-06-07)

Evaluating the Impact of Regulatory Policies on Social Welfare in Difference-in-Difference Settings

Authors: Dalia Ghanem, Désiré Kédagni, Ismael Mourifié

Quantifying the impact of regulatory policies on social welfare generally
requires the identification of counterfactual distributions. Many of these
policies (e.g. minimum wages or minimum working time) generate mass points
and/or discontinuities in the outcome distribution. Existing approaches in the
difference-in-difference literature cannot accommodate these discontinuities
while accounting for selection on unobservables and non-stationary outcome
distributions. We provide a unifying partial identification result that can
account for these features. Our main identifying assumption is the stability of
the dependence (copula) between the distribution of the untreated potential
outcome and group membership (treatment assignment) across time. Exploiting
this copula stability assumption allows us to provide an identification result
that is invariant to monotonic transformations. We provide sharp bounds on the
counterfactual distribution of the treatment group suitable for any outcome,
whether discrete, continuous, or mixed. Our bounds collapse to the
point-identification result in Athey and Imbens (2006) for continuous outcomes
with strictly increasing distribution functions. We illustrate our approach and
the informativeness of our bounds by analyzing the impact of an increase in the
legal minimum wage using data from a recent minimum wage study (Cengiz, Dube,
Lindner, and Zipperer, 2019).

arXiv link: http://arxiv.org/abs/2306.04494v2

Econometrics arXiv updated paper (originally submitted: 2023-06-07)

Semiparametric Efficiency Gains From Parametric Restrictions on Propensity Scores

Authors: Haruki Kono

We explore how much knowing a parametric restriction on propensity scores
improves semiparametric efficiency bounds in the potential outcome framework.
For stratified propensity scores, considered as a parametric model, we derive
explicit formulas for the efficiency gain from knowing how the covariate space
is split. Based on these, we find that the efficiency gain decreases as the
partition of the stratification becomes finer. For general parametric models,
where it is hard to obtain explicit representations of efficiency bounds, we
propose a novel framework that enables us to see whether knowing a parametric
model is valuable in terms of efficiency even when it is high-dimensional. In
addition to the intuitive fact that knowing the parametric model does not help
much if it is sufficiently flexible, we discover that the efficiency gain can
be nearly zero even though the parametric assumption significantly restricts
the space of possible propensity scores.

arXiv link: http://arxiv.org/abs/2306.04177v3

Econometrics arXiv updated paper (originally submitted: 2023-06-07)

Semiparametric Discrete Choice Models for Bundles

Authors: Fu Ouyang, Thomas T. Yang

We propose two approaches to estimate semiparametric discrete choice models
for bundles. Our first approach is a kernel-weighted rank estimator based on a
matching-based identification strategy. We establish its complete asymptotic
properties and prove the validity of the nonparametric bootstrap for inference.
We then introduce a new multi-index least absolute deviations (LAD) estimator
as an alternative, of which the main advantage is its capacity to estimate
preference parameters on both alternative- and agent-specific regressors. Both
methods can account for arbitrary correlation in disturbances across choices,
with the former also allowing for interpersonal heteroskedasticity. We also
demonstrate that the identification strategy underlying these procedures can be
extended naturally to panel data settings, producing an analogous localized
maximum score estimator and a LAD estimator for estimating bundle choice models
with fixed effects. We derive the limiting distribution of the former and
verify the validity of the numerical bootstrap as an inference tool. All our
proposed methods can be applied to general multi-index models. Monte Carlo
experiments show that they perform well in finite samples.

arXiv link: http://arxiv.org/abs/2306.04135v3

Econometrics arXiv cross-link from General Economics (econ.GN), submitted: 2023-06-06

Marijuana on Main Streets? The Story Continues in Colombia: An Endogenous Three-part Model

Authors: A. Ramirez-Hassan, C. Gomez, S. Velasquez, K. Tangarife

Cannabis is the most common illicit drug, and understanding its demand is
relevant to analyze the potential implications of its legalization. This paper
proposes an endogenous three-part model taking into account incidental
truncation and access restrictions to study demand for marijuana in Colombia,
and analyze the potential effects of its legalization. Our application suggests
that modeling simultaneously access, intensive and extensive margin is
relevant, and that selection into access is important for the intensive margin.
We find that younger men who have consumed alcohol and cigarettes, live in a
neighborhood with drug suppliers, and have friends who consume marijuana face a
higher probability of having access to and using this drug. In addition, we
find that marijuana is an inelastic good (-0.45 elasticity). Our results are
robust to different specifications and definitions. If marijuana were
legalized, younger individuals with a medium or low risk perception about
marijuana use would increase their probability of use by 3.8 percentage points,
from 13.6% to 17.4%. Overall, legalization would increase the probability of
consumption by 0.7 percentage points (from 2.3% to 3.0%). Different price
settings suggest that annual tax revenues would fluctuate between USD 11.0
million and USD 54.2 million; a potential benchmark is USD 32 million.

arXiv link: http://arxiv.org/abs/2306.10031v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2023-06-06

Parametrization, Prior Independence, and the Semiparametric Bernstein-von Mises Theorem for the Partially Linear Model

Authors: Christopher D. Walker

I prove a semiparametric Bernstein-von Mises theorem for a partially linear
regression model with independent priors for the low-dimensional parameter of
interest and the infinite-dimensional nuisance parameters. My result avoids a
challenging prior invariance condition that arises from a loss of information
associated with not knowing the nuisance parameter. The key idea is to employ a
feasible reparametrization of the partially linear regression model that
reflects the semiparametric structure of the model. This allows a researcher to
assume independent priors for the model parameters while automatically
accounting for the loss of information associated with not knowing the nuisance
parameters. The theorem is verified for uniform wavelet series priors and
Matérn Gaussian process priors.

arXiv link: http://arxiv.org/abs/2306.03816v5

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2023-06-06

Uniform Inference for Cointegrated Vector Autoregressive Processes

Authors: Christian Holberg, Susanne Ditlevsen

Uniformly valid inference for cointegrated vector autoregressive processes
has so far proven difficult due to certain discontinuities arising in the
asymptotic distribution of the least squares estimator. We extend asymptotic
results from the univariate case to multiple dimensions and show how inference
can be based on these results. Furthermore, we show that lag augmentation and a
recent instrumental variable procedure can also yield uniformly valid tests and
confidence regions. We verify the theoretical findings and investigate finite
sample properties in simulation experiments for two specific examples.

arXiv link: http://arxiv.org/abs/2306.03632v2

Econometrics arXiv paper, submitted: 2023-06-06

Forecasting the Performance of US Stock Market Indices During COVID-19: RF vs LSTM

Authors: Reza Nematirad, Amin Ahmadisharaf, Ali Lashgari

The US stock market experienced instability following the 2007-2009 recession,
and COVID-19 poses a significant challenge to US stock traders and investors.
To mitigate risks and improve profits, traders and investors need forecasting
models that account for the effects of the pandemic. Taking the post-recession
COVID-19 pandemic into consideration, two machine learning models, Random
Forest and LSTM, are used to forecast two major US stock market indices. Data
on historical prices after the Great Recession are used to develop the machine
learning models and to forecast index returns. Cross-validation is used to
evaluate model performance during training. Additionally, hyperparameter
optimization, regularization (such as dropout and weight decay), and
preprocessing improve the performance of the machine learning techniques. Using
high-accuracy machine learning techniques, traders and investors can forecast
stock market behavior, stay ahead of the competition, and improve
profitability. Keywords: COVID-19, LSTM, S&P500, Random Forest, Russell 2000,
Forecasting, Machine Learning, Time Series. JEL Codes: C6, C8, G4.
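
As a rough illustration of the random forest side of the comparison (not the
authors' exact setup), the hypothetical sketch below forecasts next-day index
returns from lagged returns with a chronological train/test split; the lag
length, hyperparameters, and synthetic data are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def make_lagged(returns, n_lags=5):
    """Build a supervised dataset: predict r_t from its n_lags previous values."""
    X = np.column_stack([returns[i:len(returns) - n_lags + i] for i in range(n_lags)])
    y = returns[n_lags:]
    return X, y

# returns: a 1-D array of daily index returns (placeholder synthetic data here).
rng = np.random.default_rng(0)
returns = rng.normal(scale=0.01, size=1500)

X, y = make_lagged(returns, n_lags=5)
split = int(0.8 * len(y))                       # chronological train/test split
rf = RandomForestRegressor(n_estimators=500, max_depth=5, random_state=0)
rf.fit(X[:split], y[:split])
pred = rf.predict(X[split:])
print("out-of-sample MSE:", np.mean((pred - y[split:]) ** 2))
```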

arXiv link: http://arxiv.org/abs/2306.03620v1

Econometrics arXiv paper, submitted: 2023-06-06

Robust inference for the treatment effect variance in experiments using machine learning

Authors: Alejandro Sanchez-Becerra

Experimenters often collect baseline data to study heterogeneity. I propose
the first valid confidence intervals for the VCATE, the treatment effect
variance explained by observables. Conventional approaches yield incorrect
coverage when the VCATE is zero. As a result, practitioners could be prone to
detect heterogeneity even when none exists. The reason why coverage worsens at
the boundary is that all efficient estimators have a locally-degenerate
influence function and may not be asymptotically normal. I solve the problem
for a broad class of multistep estimators with a predictive first stage. My
confidence intervals account for higher-order terms in the limiting
distribution and are fast to compute. I also find new connections between the
VCATE and the problem of deciding whom to treat. The gains of targeting
treatment are (sharply) bounded by half the square root of the VCATE. Finally,
I document excellent performance in simulation and reanalyze an experiment from
Malawi.

arXiv link: http://arxiv.org/abs/2306.03363v1

Econometrics arXiv updated paper (originally submitted: 2023-06-05)

Inference for Local Projections

Authors: Atsushi Inoue, Òscar Jordà, Guido M. Kuersteiner

Inference for impulse responses estimated with local projections presents
interesting challenges and opportunities. Analysts typically want to assess the
precision of individual estimates, explore the dynamic evolution of the
response over particular regions, and generally determine whether the impulse
generates a response that is any different from the null of no effect. Each of
these goals requires a different approach to inference. In this article, we
provide an overview of results that have appeared in the literature in the past
20 years along with some new procedures that we introduce here.

arXiv link: http://arxiv.org/abs/2306.03073v2

Econometrics arXiv paper, submitted: 2023-06-05

Improving the accuracy of bubble date estimators under time-varying volatility

Authors: Eiji Kurozumi, Anton Skrobotov

In this study, we consider a four-regime bubble model under the assumption of
time-varying volatility and propose an algorithm for estimating the break dates
with a volatility correction: First, we estimate the emergence date of the
explosive bubble, its collapse date, and the date of recovery to the normal
market under the assumption of homoskedasticity; second, we collect the
residuals and then employ WLS-based estimation of the bubble dates. We
demonstrate by Monte Carlo simulations that the accuracy of the break-date
estimators improves significantly with this two-step procedure in some cases
compared to estimators based on the OLS method.
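
The following deliberately simplified, hypothetical sketch conveys the two-step
idea for a single break in an AR(1) regression rather than the paper's
four-regime model: estimate the break date by least squares, smooth the squared
residuals into a volatility proxy, and re-estimate the break by WLS.

```python
import numpy as np

def break_date(y, sigma2=None, trim=0.15):
    """Grid-search a single break date in an AR(1) regression of y_t on y_{t-1},
    minimizing the (optionally volatility-weighted) sum of squared residuals."""
    x, z = y[1:], y[:-1]
    m = len(x)
    w = np.ones(m) if sigma2 is None else 1.0 / sigma2
    lo, hi = int(trim * m), int((1 - trim) * m)
    best_tau, best_ssr, best_resid = None, np.inf, None
    for tau in range(lo, hi):
        resid = np.empty(m)
        for seg in (slice(0, tau), slice(tau, m)):
            X = np.column_stack([np.ones(z[seg].size), z[seg]])
            sw = np.sqrt(w[seg])[:, None]
            beta = np.linalg.lstsq(X * sw, x[seg] * sw[:, 0], rcond=None)[0]
            resid[seg] = x[seg] - X @ beta
        ssr = np.sum(w * resid ** 2)
        if ssr < best_ssr:
            best_tau, best_ssr, best_resid = tau, ssr, resid.copy()
    return best_tau, best_resid

def rolling_variance(resid, window=20):
    """Crude rolling-mean estimate of the time-varying residual variance."""
    return np.convolve(resid ** 2, np.ones(window) / window, mode="same") + 1e-8

# Placeholder series: unit root, then a mildly explosive regime, with a
# volatility increase part-way through the sample.
rng = np.random.default_rng(0)
e = rng.normal(scale=np.r_[np.ones(100), 2.0 * np.ones(80)])
y = np.empty(180)
y[0] = 0.0
for t in range(1, 180):
    rho = 1.0 if t < 100 else 1.04
    y[t] = rho * y[t - 1] + e[t]

tau_ols, resid = break_date(y)                              # step 1: OLS dates
tau_wls, _ = break_date(y, sigma2=rolling_variance(resid))  # step 2: WLS refit
print(tau_ols, tau_wls)
```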

arXiv link: http://arxiv.org/abs/2306.02977v1

Econometrics arXiv updated paper (originally submitted: 2023-06-05)

Synthetic Regressing Control Method

Authors: Rong J. B. Zhu

Estimating weights in the synthetic control method, typically resulting in
sparse weights where only a few control units have non-zero weights, involves
an optimization procedure that simultaneously selects and aligns control units
to closely match the treated unit. However, this simultaneous selection and
alignment of control units may lead to a loss of efficiency. Another concern
arising from the aforementioned procedure is its susceptibility to
under-fitting due to imperfect pre-treatment fit. It is not uncommon for the
linear combination, using nonnegative weights, of pre-treatment period outcomes
for the control units to inadequately approximate the pre-treatment outcomes
for the treated unit. To address both of these issues, this paper proposes a
simple and effective method called Synthetic Regressing Control (SRC). The SRC
method begins by performing a univariate linear regression to appropriately
align the pre-treatment periods of each control unit with the treated unit.
Subsequently, an SRC estimator is obtained by synthesizing (taking a weighted
average of) the fitted controls. To determine the weights in the synthesis
procedure, we propose an approach based on an unbiased risk estimation
criterion. Theoretically, we show that this synthesis is asymptotically
optimal in the sense of achieving the lowest possible squared error. Extensive
numerical experiments highlight the advantages of the SRC method.
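
A minimal, hypothetical sketch of the regress-then-synthesize idea is given
below: the treated unit's pre-treatment outcomes are regressed on each control
unit separately, and the fitted controls are then averaged. For brevity the
weights are simple inverse-MSE weights, a stand-in for the paper's
unbiased-risk criterion.

```python
import numpy as np

def src_estimate(y_treated_pre, Y_controls_pre, Y_controls_post):
    """Regress-then-synthesize sketch in the spirit of SRC.

    y_treated_pre   : (T0,)  pre-treatment outcomes of the treated unit
    Y_controls_pre  : (T0, J) pre-treatment outcomes of the J control units
    Y_controls_post : (T1, J) post-treatment outcomes of the control units
    Returns a synthetic counterfactual path for the post-treatment periods.
    """
    T0, J = Y_controls_pre.shape
    fitted_post = np.empty((Y_controls_post.shape[0], J))
    mse = np.empty(J)
    for j in range(J):
        # Univariate regression of the treated unit on control j (with intercept).
        X = np.column_stack([np.ones(T0), Y_controls_pre[:, j]])
        coef = np.linalg.lstsq(X, y_treated_pre, rcond=None)[0]
        fitted_pre_j = X @ coef
        fitted_post[:, j] = coef[0] + coef[1] * Y_controls_post[:, j]
        mse[j] = np.mean((y_treated_pre - fitted_pre_j) ** 2)
    # Simple inverse-MSE weights as a stand-in for unbiased-risk-based weights.
    w = (1.0 / mse) / np.sum(1.0 / mse)
    return fitted_post @ w
```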

arXiv link: http://arxiv.org/abs/2306.02584v2

Econometrics arXiv paper, submitted: 2023-06-03

Individual Causal Inference Using Panel Data With Multiple Outcomes

Authors: Wei Tian

Policy evaluation in empirical microeconomics has focused on estimating
the average treatment effect and more recently the heterogeneous treatment
effects, often relying on the unconfoundedness assumption. We propose a method
based on the interactive fixed effects model to estimate treatment effects at
the individual level, which allows both the treatment assignment and the
potential outcomes to be correlated with the unobserved individual
characteristics. This method is suitable for panel datasets where multiple
related outcomes are observed for a large number of individuals over a small
number of time periods. Monte Carlo simulations show that our method
outperforms related methods. To illustrate our method, we provide an example of
estimating the effect of health insurance coverage on individual usage of
hospital emergency departments using the Oregon Health Insurance Experiment
data.

arXiv link: http://arxiv.org/abs/2306.01969v1

Econometrics arXiv paper, submitted: 2023-06-03

The Synthetic Control Method with Nonlinear Outcomes: Estimating the Impact of the 2019 Anti-Extradition Law Amendments Bill Protests on Hong Kong's Economy

Authors: Wei Tian

The synthetic control estimator (Abadie et al., 2010) is asymptotically
unbiased assuming that the outcome is a linear function of the underlying
predictors and that the treated unit can be well approximated by the synthetic
control before the treatment. When the outcome is nonlinear, the bias of the
synthetic control estimator can be severe. In this paper, we provide conditions
for the synthetic control estimator to be asymptotically unbiased when the
outcome is nonlinear, and propose a flexible and data-driven method to choose
the synthetic control weights. Monte Carlo simulations show that compared with
the competing methods, the nonlinear synthetic control method has similar or
better performance when the outcome is linear, and better performance when the
outcome is nonlinear, and that the confidence intervals have good coverage
probabilities across settings. In the empirical application, we illustrate the
method by estimating the impact of the 2019 anti-extradition law amendments
bill protests on Hong Kong's economy, and find that the year-long protests
reduced real GDP per capita in Hong Kong by 11.27% in the first quarter of
2020, which was larger in magnitude than the economic decline during the 1997
Asian financial crisis or the 2008 global financial crisis.

arXiv link: http://arxiv.org/abs/2306.01967v1

Econometrics arXiv cross-link from math.OC (math.OC), submitted: 2023-06-02

Load Asymptotics and Dynamic Speed Optimization for the Greenest Path Problem: A Comprehensive Analysis

Authors: Poulad Moradi, Joachim Arts, Josué C. Velázquez-Martínez

We study the effect of using high-resolution elevation data on the selection
of the most fuel-efficient (greenest) path for different trucks in various
urban environments. We adapt a variant of the Comprehensive Modal Emission
Model (CMEM) to show that the optimal speed and the greenest path are slope
dependent (dynamic). When there are no elevation changes in a road network, the
most fuel-efficient path is the shortest path with a constant (static) optimal
speed throughout. However, if the network is not flat, then the shortest path
is not necessarily the greenest path, and the optimal driving speed is dynamic.
We prove that the greenest path converges to an asymptotic greenest path as the
payload approaches infinity and that this limiting path is attained for a
finite load. In a set of extensive numerical experiments, we benchmark the CO2
emissions reduction of our dynamic speed and greenest path policies against
policies that ignore elevation data. We use the geospatial data of 25 major
cities across 6 continents. We observe numerically that the greenest path
quickly diverges from the shortest path and attains the asymptotic greenest
path even for moderate payloads. Based on an analysis of variance, the main
determinants of the CO2 emissions reduction potential are the variation of the
road gradients along the shortest path as well as the relative elevation of the
source from the target. Using speed data estimates for rush hour in New York
City, we test CO2 emissions reduction by comparing the greenest paths with
optimized speeds against the fastest paths with traffic speed. We observe that
selecting the greenest paths instead of the fastest paths can significantly
reduce CO2 emissions. Additionally, our results show that while speed
optimization on uphill arcs can significantly help CO2 reduction, the potential
to leverage gravity for acceleration on downhill arcs is limited due to traffic
congestion.

arXiv link: http://arxiv.org/abs/2306.01687v2

Econometrics arXiv updated paper (originally submitted: 2023-06-02)

Social Interactions in Endogenous Groups

Authors: Shuyang Sheng, Xiaoting Sun

This paper investigates social interactions in endogenous groups. We specify
a two-sided many-to-one matching model, where individuals select groups based
on preferences, while groups admit individuals based on qualifications until
reaching capacities. Endogenous formation of groups leads to selection bias in
peer effect estimation, which is complicated by equilibrium effects and
alternative groups. We propose novel methods to simplify selection bias and
develop a sieve OLS estimator for peer effects that is $\sqrt{n}$-consistent and
asymptotically normal. Using Chilean data, we find that ignoring selection into
high schools leads to overestimated peer influence and distorts the estimation
of school effectiveness.

arXiv link: http://arxiv.org/abs/2306.01544v2

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2023-06-01

Rank-heterogeneous Preference Models for School Choice

Authors: Amel Awadelkarim, Arjun Seshadri, Itai Ashlagi, Irene Lo, Johan Ugander

School choice mechanism designers use discrete choice models to understand
and predict families' preferences. The most widely-used choice model, the
multinomial logit (MNL), is linear in school and/or household attributes. While
the model is simple and interpretable, it assumes the ranked preference lists
arise from a choice process that is uniform throughout the ranking, from top to
bottom. In this work, we introduce two strategies for rank-heterogeneous choice
modeling tailored for school choice. First, we adapt a context-dependent random
utility model (CDM), considering down-rank choices as occurring in the context
of earlier up-rank choices. Second, we consider stratifying the choice modeling
by rank, regularizing rank-adjacent models towards one another when
appropriate. Using data on household preferences from the San Francisco Unified
School District (SFUSD) across multiple years, we show that the contextual
models considerably improve our out-of-sample evaluation metrics across all
rank positions over the non-contextual models in the literature. Meanwhile,
stratifying the model by rank can yield more accurate first-choice predictions
while down-rank predictions are relatively unimproved. These models provide
performance upgrades that school choice researchers can adopt to improve
predictions and counterfactual analyses.

arXiv link: http://arxiv.org/abs/2306.01801v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-06-01

Causal Estimation of User Learning in Personalized Systems

Authors: Evan Munro, David Jones, Jennifer Brennan, Roland Nelet, Vahab Mirrokni, Jean Pouget-Abadie

In online platforms, the impact of a treatment on an observed outcome may
change over time as 1) users learn about the intervention, and 2) the system
personalization, such as individualized recommendations, changes over time. We
introduce a non-parametric causal model of user actions in a personalized
system. We show that the Cookie-Cookie-Day (CCD) experiment, designed for the
measurement of the user learning effect, is biased when there is
personalization. We derive new experimental designs that intervene in the
personalization system to generate the variation necessary to separately
identify the causal effect mediated through user learning and personalization.
Making parametric assumptions allows for the estimation of long-term causal
effects based on medium-term experiments. In simulations, we show that our new
designs successfully recover the dynamic causal effects of interest.

arXiv link: http://arxiv.org/abs/2306.00485v1

Econometrics arXiv updated paper (originally submitted: 2023-06-01)

Inference in Predictive Quantile Regressions

Authors: Alex Maynard, Katsumi Shimotsu, Nina Kuriyama

This paper studies inference in predictive quantile regressions when the
predictive regressor has a near-unit root. We derive asymptotic distributions
for the quantile regression estimator and its heteroskedasticity and
autocorrelation consistent (HAC) t-statistic in terms of functionals of
Ornstein-Uhlenbeck processes. We then propose a switching-fully modified (FM)
predictive test for quantile predictability. The proposed test employs an FM
style correction with a Bonferroni bound for the local-to-unity parameter when
the predictor has a near unit root. It switches to a standard predictive
quantile regression test with a slightly conservative critical value when the
largest root of the predictor lies in the stationary range. Simulations
indicate that the test has a reliable size in small samples and good power. We
employ this new methodology to test the ability of three commonly employed,
highly persistent and endogenous lagged valuation regressors - the dividend
price ratio, earnings price ratio, and book-to-market ratio - to predict the
median, shoulders, and tails of the stock return distribution.

arXiv link: http://arxiv.org/abs/2306.00296v2

Econometrics arXiv paper, submitted: 2023-05-31

Deep Neural Network Estimation in Panel Data Models

Authors: Ilias Chronopoulos, Katerina Chrysikou, George Kapetanios, James Mitchell, Aristeidis Raftapostolos

In this paper we study neural networks and their approximating power in panel
data models. We provide asymptotic guarantees on deep feed-forward neural
network estimation of the conditional mean, building on the work of Farrell et
al. (2021), and explore latent patterns in the cross-section. We use the
proposed estimators to forecast the progression of new COVID-19 cases across
the G7 countries during the pandemic. We find significant forecasting gains
over both linear panel and nonlinear time series models. Containment or
lockdown policies, as instigated at the national level by governments, are
found to have out-of-sample predictive power for new COVID-19 cases. We
illustrate how the use of partial derivatives can help open the "black-box" of
neural networks and facilitate semi-structural analysis: school and workplace
closures are found to have been effective policies at restricting the
progression of the pandemic across the G7 countries. But our methods illustrate
significant heterogeneity and time-variation in the effectiveness of specific
containment policies.

arXiv link: http://arxiv.org/abs/2305.19921v1

Econometrics arXiv updated paper (originally submitted: 2023-05-31)

Quasi-Score Matching Estimation for Spatial Autoregressive Model with Random Weights Matrix and Regressors

Authors: Xuan Liang, Tao Zou

With the rapid advancements in technology for data collection, the
application of the spatial autoregressive (SAR) model has become increasingly
prevalent in real-world analysis, particularly when dealing with large
datasets. However, the commonly used quasi-maximum likelihood estimation (QMLE)
for the SAR model is not computationally scalable to large
datasets. In addition, when establishing the asymptotic properties of the
parameter estimators of the SAR model, both weights matrix and regressors are
assumed to be nonstochastic in classical spatial econometrics, which is perhaps
not realistic in real applications. Motivated by the machine learning
literature, this paper proposes quasi-score matching estimation for the SAR
model. This new estimation approach is developed based on the likelihood, but
significantly reduces the computational complexity of the QMLE. The asymptotic
properties of parameter estimators under the random weights matrix and
regressors are established, which provides a new theoretical framework for the
asymptotic inference of the SAR-type models. The usefulness of the quasi-score
matching estimation and its asymptotic inference is illustrated via extensive
simulation studies and a case study of an anti-conflict social network
experiment for middle school students.

arXiv link: http://arxiv.org/abs/2305.19721v2

Econometrics arXiv updated paper (originally submitted: 2023-05-31)

A Simple Method for Predicting Covariance Matrices of Financial Returns

Authors: Kasper Johansson, Mehmet Giray Ogut, Markus Pelger, Thomas Schmelzer, Stephen Boyd

We consider the well-studied problem of predicting the time-varying
covariance matrix of a vector of financial returns. Popular methods range from
simple predictors like rolling window or exponentially weighted moving average
(EWMA) to more sophisticated predictors such as generalized autoregressive
conditional heteroscedastic (GARCH) type methods. Building on a specific
covariance estimator suggested by Engle in 2002, we propose a relatively simple
extension that requires little or no tuning or fitting, is interpretable, and
produces results at least as good as MGARCH, a popular extension of GARCH that
handles multiple assets. To evaluate predictors we introduce a novel approach,
evaluating the regret of the log-likelihood over a time period such as a
quarter. This metric allows us to see not only how well a covariance predictor
does overall, but also how quickly it reacts to changes in market conditions.
Our simple predictor outperforms MGARCH in terms of regret. We also test
covariance predictors on downstream applications such as portfolio optimization
methods that depend on the covariance matrix. For these applications our simple
covariance predictor and MGARCH perform similarly.
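
For intuition on the "simple predictor" end of the spectrum, here is a standard
EWMA covariance predictor, a textbook baseline rather than the authors'
proposed extension; the half-life and burn-in length are placeholder choices.

```python
import numpy as np

def ewma_covariance(returns, halflife=63, burn=250):
    """EWMA predictor of the return covariance matrix.

    returns : (T, n) array of asset returns.
    Returns a dict mapping each day t >= burn to the covariance prediction for
    day t, built only from returns observed before day t.
    """
    T, n = returns.shape
    lam = 0.5 ** (1.0 / halflife)                  # decay implied by the half-life
    cov = np.cov(returns[:burn].T)                 # initial estimate from the burn-in
    preds = {}
    for t in range(burn, T):
        preds[t] = cov.copy()                      # prediction before seeing day t
        r = returns[t][:, None]
        cov = lam * cov + (1.0 - lam) * (r @ r.T)  # update with day t's outer product
    return preds
```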

arXiv link: http://arxiv.org/abs/2305.19484v2

Econometrics arXiv updated paper (originally submitted: 2023-05-30)

Impulse Response Analysis of Structural Nonlinear Time Series Models

Authors: Giovanni Ballarin

This paper proposes a semiparametric sieve approach to estimate impulse
response functions of nonlinear time series within a general class of
structural autoregressive models. We prove that a two-step procedure can
flexibly accommodate nonlinear specifications while avoiding the need to choose
fixed parametric forms. Sieve impulse responses are proven to be consistent by
deriving uniform estimation guarantees, and an iterative algorithm makes it
straightforward to compute them in practice. With simulations, we show that the
proposed semiparametric approach proves effective against misspecification
while suffering only from minor efficiency losses. In a U.S. monetary policy
application, the pointwise sieve GDP response associated with an interest rate
increase is larger than that of a linear model. Finally, in an analysis of
interest rate uncertainty shocks, sieve responses indicate more substantial
contractionary effects on production and inflation.
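
For context, the familiar linear local projection that the paper generalizes
fits in a few lines; the sketch below estimates the horizon-by-horizon response
of a series to an identified shock by OLS with one lag as a control, leaving
out HAC standard errors and richer controls.

```python
import numpy as np

def local_projection_irf(y, shock, horizons=12):
    """Linear local projections: for each horizon h, regress y_{t+h} on the
    shock at time t (plus an intercept and one lag of y as a control) and
    collect the shock coefficients as the impulse response.

    y and shock are 1-D arrays of equal length.
    """
    irf = np.zeros(horizons + 1)
    T = len(y)
    for h in range(horizons + 1):
        lhs = y[1 + h:]                                   # y_{t+h} for t = 1, ...
        X = np.column_stack([
            np.ones(len(lhs)),
            shock[1: T - h],                              # shock_t
            y[: T - h - 1],                               # y_{t-1} as a control
        ])
        beta = np.linalg.lstsq(X, lhs, rcond=None)[0]
        irf[h] = beta[1]                                  # coefficient on the shock
    return irf
```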

arXiv link: http://arxiv.org/abs/2305.19089v6

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2023-05-30

Incorporating Domain Knowledge in Deep Neural Networks for Discrete Choice Models

Authors: Shadi Haj-Yahia, Omar Mansour, Tomer Toledo

Discrete choice models (DCM) are widely employed in travel demand analysis as
a powerful theoretical econometric framework for understanding and predicting
choice behaviors. DCMs are formed as random utility models (RUM), with their
key advantage of interpretability. However, a core requirement for the
estimation of these models is a priori specification of the associated utility
functions, making them sensitive to modelers' subjective beliefs. Recently,
machine learning (ML) approaches have emerged as a promising avenue for
learning unobserved non-linear relationships in DCMs. However, ML models are
considered "black box" and may not correspond with expected relationships. This
paper proposes a framework that expands the potential of data-driven approaches
for DCM by supporting the development of interpretable models that incorporate
domain knowledge and prior beliefs through constraints. The proposed framework
includes pseudo data samples that represent required relationships and a loss
function that measures their fulfillment, along with observed data, for model
training. The developed framework aims to improve model interpretability by
combining ML's specification flexibility with econometrics and interpretable
behavioral analysis. A case study demonstrates the potential of this framework
for discrete choice analysis.

arXiv link: http://arxiv.org/abs/2306.00016v1

Econometrics arXiv paper, submitted: 2023-05-30

Generalized Autoregressive Score Trees and Forests

Authors: Andrew J. Patton, Yasin Simsek

We propose methods to improve the forecasts from generalized autoregressive
score (GAS) models (Creal et. al, 2013; Harvey, 2013) by localizing their
parameters using decision trees and random forests. These methods avoid the
curse of dimensionality faced by kernel-based approaches, and allow one to draw
on information from multiple state variables simultaneously. We apply the new
models to four distinct empirical analyses, and in all applications the
proposed new methods significantly outperform the baseline GAS model. In our
applications to stock return volatility and density prediction, the optimal GAS
tree model reveals a leverage effect and a variance risk premium effect. Our
study of stock-bond dependence finds evidence of a flight-to-quality effect in
the optimal GAS forest forecasts, while our analysis of high-frequency trade
durations uncovers a volume-volatility effect.

arXiv link: http://arxiv.org/abs/2305.18991v1

Econometrics arXiv updated paper (originally submitted: 2023-05-29)

Nonlinear Impulse Response Functions and Local Projections

Authors: Christian Gourieroux, Quinlan Lee

The goal of this paper is to extend the nonparametric estimation of Impulse
Response Functions (IRF) by means of local projections in the nonlinear dynamic
framework. We discuss the existence of a nonlinear autoregressive
representation for Markov processes and explain how their IRFs are directly
linked to the Nonlinear Local Projection (NLP), as in the case for the linear
setting. We present a fully nonparametric LP estimator in the one-dimensional
nonlinear framework, compare its asymptotic properties to that of IRFs implied
by the nonlinear autoregressive model and show that the two approaches are
asymptotically equivalent. This extends the well-known result in the linear
autoregressive model by Plagborg-Moller and Wolf (2017). We also consider
extensions to the multivariate framework through the lens of semiparametric
models, and demonstrate that the indirect approach by the NLP is less accurate
than the direct estimation approach of the IRF.

arXiv link: http://arxiv.org/abs/2305.18145v2

Econometrics arXiv updated paper (originally submitted: 2023-05-29)

Dynamic LATEs with a Static Instrument

Authors: Bruno Ferman, Otávio Tecchio

In many situations, researchers are interested in identifying dynamic effects
of an irreversible treatment with a time-invariant binary instrumental variable
(IV). For example, in evaluations of dynamic effects of training programs with
a single lottery determining eligibility. A common approach in these situations
is to report per-period IV estimates. Under a dynamic extension of standard IV
assumptions, we show that such IV estimands identify a weighted sum of
treatment effects for different latent groups and treatment exposures. However,
there is a possibility of negative weights. We discuss point and partial
identification of dynamic treatment effects in this setting under different
sets of assumptions.

arXiv link: http://arxiv.org/abs/2305.18114v3

Econometrics arXiv paper, submitted: 2023-05-28

Time-Varying Vector Error-Correction Models: Estimation and Inference

Authors: Jiti Gao, Bin Peng, Yayi Yan

This paper considers a time-varying vector error-correction model that allows
for different time series behaviours (e.g., unit-root and locally stationary
processes) to interact with each other and co-exist. From practical
perspectives, this framework can be used to estimate shifts in the
predictability of non-stationary variables, test whether economic theories hold
periodically, etc. We first develop a time-varying Granger Representation
Theorem, which facilitates the establishment of asymptotic properties for the
model, and then propose estimation and inferential methods and theory for both
short-run and long-run coefficients. We also propose an information criterion
to estimate the lag length, a singular-value ratio test to determine the
cointegration rank, and a hypothesis test to examine the parameter stability.
To validate the theoretical findings, we conduct extensive simulations.
Finally, we demonstrate the empirical relevance by applying the framework to
investigate the rational expectations hypothesis of the U.S. term structure.

arXiv link: http://arxiv.org/abs/2305.17829v1

Econometrics arXiv updated paper (originally submitted: 2023-05-28)

Estimating overidentified linear models with heteroskedasticity and outliers

Authors: Lei Bill Wang

A large degree of overidentification causes severe bias in TSLS. A
conventional heuristic rule used to motivate new estimators in this context is
approximate bias. This paper formalizes the definition of approximate bias and
expands the applicability of approximate bias to various classes of estimators
that bridge OLS, TSLS, and Jackknife IV estimators (JIVEs). By evaluating their
approximate biases, I propose new approximately unbiased estimators, including
UOJIVE1 and UOJIVE2. UOJIVE1 can be interpreted as a generalization of an
existing estimator UIJIVE1. Both UOJIVEs are proven to be consistent and
asymptotically normal under a fixed number of instruments and controls. The
asymptotic proofs for UOJIVE1 in this paper require the absence of high
leverage points, whereas proofs for UOJIVE2 do not. In addition, UOJIVE2 is
consistent under many-instrument asymptotics. The simulation results align with
the theorems in this paper: (i) Both UOJIVEs perform well under many-instrument
scenarios with or without heteroskedasticity, (ii) When a high leverage point
coincides with a high variance of the error term, an outlier is generated and
the performance of UOJIVE1 is much poorer than that of UOJIVE2.

arXiv link: http://arxiv.org/abs/2305.17615v5

Econometrics arXiv paper, submitted: 2023-05-26

Using Limited Trial Evidence to Credibly Choose Treatment Dosage when Efficacy and Adverse Effects Weakly Increase with Dose

Authors: Charles F. Manski

In medical treatment and elsewhere, it has become standard to base treatment
intensity (dosage) on evidence in randomized trials. Yet it has been rare to
study how outcomes vary with dosage. In trials to obtain drug approval, the
norm has been to specify some dose of a new drug and compare it with an
established therapy or placebo. Design-based trial analysis views each trial
arm as qualitatively different, but it may be highly credible to assume that
efficacy and adverse effects (AEs) weakly increase with dosage. Optimization of
patient care requires joint attention to both, as well as to treatment cost.
This paper develops methodology to credibly use limited trial evidence to
choose dosage when efficacy and AEs weakly increase with dose. I suppose that
dosage is an integer choice t in {0, 1, ..., T}, T being a specified maximum
dose. I study dosage choice when trial evidence on outcomes is available for
only K dose levels, where K < T + 1. Then the population distribution of dose
response is partially rather than point identified. The identification region
is a convex polygon determined by linear equalities and inequalities. I
characterize clinical and public-health decision making using the
minimax-regret criterion. A simple analytical solution exists when T = 2 and
computation is tractable when T is larger.

arXiv link: http://arxiv.org/abs/2305.17206v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2023-05-26

A Policy Gradient Method for Confounded POMDPs

Authors: Mao Hong, Zhengling Qi, Yanxun Xu

In this paper, we propose a policy gradient method for confounded partially
observable Markov decision processes (POMDPs) with continuous state and
observation spaces in the offline setting. We first establish a novel
identification result to non-parametrically estimate any history-dependent
policy gradient under POMDPs using the offline data. The identification enables
us to solve a sequence of conditional moment restrictions and adopt the min-max
learning procedure with general function approximation for estimating the
policy gradient. We then provide a finite-sample non-asymptotic bound for
estimating the gradient uniformly over a pre-specified policy class in terms of
the sample size, length of horizon, concentrability coefficient, and the
measure of ill-posedness in solving the conditional moment restrictions.
Lastly, by deploying the proposed gradient estimation in the gradient ascent
algorithm, we show the global convergence of the proposed algorithm in finding
the history-dependent optimal policy under some technical conditions. To the
best of our knowledge, this is the first work studying the policy gradient
method for POMDPs under the offline setting.

arXiv link: http://arxiv.org/abs/2305.17083v2

Econometrics arXiv cross-link from q-fin.TR (q-fin.TR), submitted: 2023-05-26

When is cross impact relevant?

Authors: Victor Le Coz, Iacopo Mastromatteo, Damien Challet, Michael Benzaquen

Trading pressure from one asset can move the price of another, a phenomenon
referred to as cross impact. Using tick-by-tick data spanning 5 years for 500
assets listed in the United States, we identify the features that make
cross-impact relevant to explain the variance of price returns. We show that
price formation occurs endogenously within highly liquid assets. Then, trades
in these assets influence the prices of less liquid correlated products, with
an impact velocity constrained by their minimum trading frequency. We
investigate the implications of such a multidimensional price formation
mechanism on interest rate markets. We find that the 10-year bond future serves
as the primary liquidity reservoir, influencing the prices of cash bonds and
futures contracts within the interest rate curve. Such behaviour challenges the
validity of the theory in Financial Economics that regards long-term rates as
agents' anticipations of future short-term rates.

arXiv link: http://arxiv.org/abs/2305.16915v2

Econometrics arXiv paper, submitted: 2023-05-26

Fast and Order-invariant Inference in Bayesian VARs with Non-Parametric Shocks

Authors: Florian Huber, Gary Koop

The shocks which hit macroeconomic models such as Vector Autoregressions
(VARs) have the potential to be non-Gaussian, exhibiting asymmetries and fat
tails. This consideration motivates the VAR developed in this paper which uses
a Dirichlet process mixture (DPM) to model the shocks. However, we do not
follow the obvious strategy of simply modeling the VAR errors with a DPM since
this would lead to computationally infeasible Bayesian inference in larger VARs
and potential sensitivity to the way the variables are ordered in the VAR.
Instead, we develop a particular additive error structure inspired by Bayesian
nonparametric treatments of random effects in panel data models. We show that
this leads to a model which allows for computationally fast and order-invariant
inference in large VARs with nonparametric shocks. Our empirical results with
nonparametric VARs of various dimensions show that nonparametric treatment of
the VAR errors is particularly useful in periods such as the financial crisis
and the pandemic.

arXiv link: http://arxiv.org/abs/2305.16827v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2023-05-25

Hierarchical forecasting for aggregated curves with an application to day-ahead electricity price auctions

Authors: Paul Ghelasi, Florian Ziel

Aggregated curves are common structures in economics and finance, and the
most prominent examples are supply and demand curves. In this study, we exploit
the fact that all aggregated curves have an intrinsic hierarchical structure,
and thus hierarchical reconciliation methods can be used to improve the
forecast accuracy. We provide an in-depth theory on how aggregated curves can
be constructed or deconstructed, and conclude that these methods are equivalent
under weak assumptions. We consider multiple reconciliation methods for
aggregated curves, including previously established bottom-up, top-down, and
linear optimal reconciliation approaches. We also present a new benchmark
reconciliation method called 'aggregated-down', which has complexity similar to
the bottom-up and top-down approaches but tends to provide better accuracy in
this setting. We conduct an empirical forecasting study on the German day-ahead
power auction market by predicting the demand and supply curves, where their
equilibrium determines the electricity price for the next day. Our results
demonstrate that hierarchical reconciliation methods can be used to improve the
forecasting accuracy of aggregated curves.
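
As a rough, self-contained illustration of the reconciliation idea (simple
bottom-up reconciliation, not the 'aggregated-down' method proposed above), the
Python sketch below rebuilds an aggregate forecast from bottom-level forecasts
through the summation matrix S. The hierarchy and the numbers are hypothetical.

```python
import numpy as np

# Hypothetical two-level hierarchy: one aggregate built from three bottom-level series.
# Rows of S map the bottom-level series to [aggregate, bottom_1, bottom_2, bottom_3].
S = np.array([[1, 1, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]])

# Incoherent base forecasts for all four series (aggregate first), e.g. from separate models.
y_hat = np.array([10.4, 3.0, 4.1, 2.8])

# Bottom-up reconciliation: G selects the bottom-level base forecasts, S rebuilds the rest.
G = np.hstack([np.zeros((3, 1)), np.eye(3)])
y_tilde = S @ G @ y_hat
print(y_tilde)   # [9.9, 3.0, 4.1, 2.8]: the aggregate is forced to equal the sum
```

Other reconciliation schemes (top-down, linear optimal, or the aggregated-down
method above) correspond, loosely, to different choices of the mapping G.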

arXiv link: http://arxiv.org/abs/2305.16255v1

Econometrics arXiv cross-link from General Economics (econ.GN), submitted: 2023-05-25

Validating a dynamic input-output model for the propagation of supply and demand shocks during the COVID-19 pandemic in Belgium

Authors: Tijs W. Alleman, Koen Schoors, Jan M. Baetens

This work validates a dynamic production network model, used to quantify the
impact of economic shocks caused by COVID-19 in the UK, using data for Belgium.
Because the model was published early during the 2020 COVID-19 pandemic, it
relied on several assumptions regarding the magnitude of the observed economic
shocks, for which more accurate data have become available in the meantime. We
refined the propagated shocks to align with observed data collected during the
pandemic and calibrated some less well-informed parameters using 115 economic
time series. The refined model effectively captures the evolution of GDP,
revenue, and employment during the COVID-19 pandemic in Belgium at both
individual economic activity and aggregate levels. However, the reduction in
business-to-business demand is overestimated, revealing structural shortcomings
in accounting for businesses' motivations to sustain trade despite the
pandemic-induced shocks. We confirm that the relaxation of the stringent
Leontief production function by a survey on the criticality of inputs
significantly improved the model's accuracy. However, despite a large dataset,
distinguishing between varying degrees of relaxation proved challenging.
Overall, this work demonstrates the model's validity in assessing the impact of
economic shocks caused by an epidemic in Belgium.

arXiv link: http://arxiv.org/abs/2305.16377v2

Econometrics arXiv updated paper (originally submitted: 2023-05-23)

Adapting to Misspecification

Authors: Timothy B. Armstrong, Patrick Kline, Liyang Sun

Empirical research typically involves a robustness-efficiency tradeoff. A
researcher seeking to estimate a scalar parameter can invoke strong assumptions
to motivate a restricted estimator that is precise but may be heavily biased,
or they can relax some of these assumptions to motivate a more robust, but
variable, unrestricted estimator. When a bound on the bias of the restricted
estimator is available, it is optimal to shrink the unrestricted estimator
towards the restricted estimator. For settings where a bound on the bias of the
restricted estimator is unknown, we propose adaptive estimators that minimize
the percentage increase in worst case risk relative to an oracle that knows the
bound. We show that adaptive estimators solve a weighted convex minimax problem
and provide lookup tables facilitating their rapid computation. Revisiting some
well known empirical studies where questions of model specification arise, we
examine the advantages of adapting to -- rather than testing for --
misspecification.

arXiv link: http://arxiv.org/abs/2305.14265v6

Econometrics arXiv updated paper (originally submitted: 2023-05-23)

Flexible Bayesian Quantile Analysis of Residential Rental Rates

Authors: Ivan Jeliazkov, Shubham Karnawat, Mohammad Arshad Rahman, Angela Vossmeyer

This article develops a random effects quantile regression model for panel
data that allows for increased distributional flexibility, multivariate
heterogeneity, and time-invariant covariates in situations where mean
regression may be unsuitable. Our approach is Bayesian and builds upon the
generalized asymmetric Laplace distribution to decouple the modeling of
skewness from the quantile parameter. We derive an efficient simulation-based
estimation algorithm, demonstrate its properties and performance in targeted
simulation studies, and employ it in the computation of marginal likelihoods to
enable formal Bayesian model comparisons. The methodology is applied in a study
of U.S. residential rental rates following the Global Financial Crisis. Our
empirical results provide interesting insights on the interaction between rents
and economic, demographic and policy variables, weigh in on key modeling
features, and overwhelmingly support the additional flexibility at nearly all
quantiles and across several sub-samples. The practical differences that arise
as a result of allowing for flexible modeling can be nontrivial, especially for
quantiles away from the median.

arXiv link: http://arxiv.org/abs/2305.13687v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2023-05-22

Prediction Risk and Estimation Risk of the Ridgeless Least Squares Estimator under General Assumptions on Regression Errors

Authors: Sungyoon Lee, Sokbae Lee

In recent years, there has been a significant growth in research focusing on
minimum $\ell_2$ norm (ridgeless) interpolation least squares estimators.
However, the majority of these analyses have been limited to an unrealistic
regression error structure, assuming independent and identically distributed
errors with zero mean and common variance. In this paper, we explore prediction
risk as well as estimation risk under more general regression error
assumptions, highlighting the benefits of overparameterization in a more
realistic setting that allows for clustered or serial dependence. Notably, we
establish that the estimation difficulties associated with the variance
components of both risks can be summarized through the trace of the
variance-covariance matrix of the regression errors. Our findings suggest that
the benefits of overparameterization can extend to time series, panel and
grouped data.
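
For concreteness, here is a minimal simulated sketch of the object studied
above: the minimum-$\ell_2$-norm (ridgeless) interpolating least squares
estimator, computed via the pseudoinverse in an overparameterized design. The
data-generating process (a sparse signal plus crudely clustered noise) is
invented for illustration and is not the error structure analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 200                       # overparameterized: more covariates than observations
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 1.0                       # hypothetical sparse signal
# Crude clustered (non-i.i.d.) errors, just to mimic dependence loosely:
cluster = np.repeat(rng.standard_normal(10), 5)
y = X @ beta + cluster + 0.1 * rng.standard_normal(n)

# Minimum-l2-norm interpolator: the limit of the ridge estimator as the penalty -> 0.
beta_hat = np.linalg.pinv(X) @ y

print(np.allclose(X @ beta_hat, y))      # True: the estimator interpolates the data
print(np.linalg.norm(beta_hat - beta))   # estimation error of the ridgeless fit
```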

arXiv link: http://arxiv.org/abs/2305.12883v3

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2023-05-21

Federated Offline Policy Learning

Authors: Aldo Gael Carranza, Susan Athey

We consider the problem of learning personalized decision policies from
observational bandit feedback data across multiple heterogeneous data sources.
In our approach, we introduce a novel regret analysis that establishes
finite-sample upper bounds on distinguishing notions of global regret for all
data sources on aggregate and of local regret for any given data source. We
characterize these regret bounds by expressions of source heterogeneity and
distribution shift. Moreover, we examine the practical considerations of this
problem in the federated setting where a central server aims to train a policy
on data distributed across the heterogeneous sources without collecting any of
their raw data. We present a policy learning algorithm amenable to federation
based on the aggregation of local policies trained with doubly robust offline
policy evaluation strategies. Our analysis and supporting experimental results
provide insights into tradeoffs in the participation of heterogeneous data
sources in offline policy learning.

arXiv link: http://arxiv.org/abs/2305.12407v2

Econometrics arXiv paper, submitted: 2023-05-20

Identification and Estimation of Production Function with Unobserved Heterogeneity

Authors: Hiroyuki Kasahara, Paul Schrimpf, Michio Suzuki

This paper examines the nonparametric identifiability of production
functions, considering firm heterogeneity beyond Hicks-neutral technology
terms. We propose a finite mixture model to account for unobserved
heterogeneity in production technology and productivity growth processes. Our
analysis demonstrates that the production function for each latent type can be
nonparametrically identified using four periods of panel data, relying on
assumptions similar to those employed in existing literature on production
function and panel data identification. By analyzing Japanese plant-level panel
data, we uncover significant disparities in estimated input elasticities and
productivity growth processes among latent types within narrowly defined
industries. We further show that neglecting unobserved heterogeneity in input
elasticities may lead to substantial and systematic bias in the estimation of
productivity growth.

arXiv link: http://arxiv.org/abs/2305.12067v1

Econometrics arXiv updated paper (originally submitted: 2023-05-18)

Statistical Estimation for Covariance Structures with Tail Estimates using Nodewise Quantile Predictive Regression Models

Authors: Christis Katsouris

This paper considers the specification of covariance structures with tail
estimates. We focus on two aspects: (i) the estimation of the VaR-CoVaR risk
matrix in the case of a larger number of time series observations than assets in
a portfolio, using quantile predictive regression models without assuming the
presence of nonstationary regressors; and (ii) the construction of a novel
variable selection algorithm, the so-called Feature Ordering by Centrality
Exclusion (FOCE), which is based on an assumption-lean regression framework,
has no tuning parameters and is proved to be consistent under general sparsity
assumptions. We illustrate the usefulness of our proposed methodology with
numerical studies of real and simulated datasets when modelling systemic risk
in a network.

arXiv link: http://arxiv.org/abs/2305.11282v2

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2023-05-18

Context-Dependent Heterogeneous Preferences: A Comment on Barseghyan and Molinari (2023)

Authors: Matias D. Cattaneo, Xinwei Ma, Yusufcan Masatlioglu

Barseghyan and Molinari (2023) give sufficient conditions for
semi-nonparametric point identification of parameters of interest in a mixture
model of decision-making under risk, allowing for unobserved heterogeneity in
utility functions and limited consideration. A key assumption in the model is
that the heterogeneity of risk preferences is unobservable but
context-independent. In this comment, we build on their insights and present
identification results in a setting where the risk preferences are allowed to
be context-dependent.

arXiv link: http://arxiv.org/abs/2305.10934v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-05-18

Modeling Interference Using Experiment Roll-out

Authors: Ariel Boyarsky, Hongseok Namkoong, Jean Pouget-Abadie

Experiments on online marketplaces and social networks suffer from
interference, where the outcome of a unit is impacted by the treatment status
of other units. We propose a framework for modeling interference using a
ubiquitous deployment mechanism for experiments, staggered roll-out designs,
which slowly increase the fraction of units exposed to the treatment to
mitigate any unanticipated adverse side effects. Our main idea is to leverage
the temporal variations in treatment assignments introduced by roll-outs to
model the interference structure. Since there are often multiple competing
models of interference in practice, we first develop a model selection method
that evaluates models based on their ability to explain outcome variation
observed along the roll-out. Through simulations, we show that our heuristic
model selection method, Leave-One-Period-Out, outperforms other baselines.
Next, we present a set of model identification conditions under which the
estimation of common estimands is possible and show how these conditions are
aided by roll-out designs. We conclude with a set of considerations, robustness
checks, and potential limitations for practitioners wishing to use our
framework.

arXiv link: http://arxiv.org/abs/2305.10728v2

Econometrics arXiv paper, submitted: 2023-05-17

Nowcasting with signature methods

Authors: Samuel N. Cohen, Silvia Lui, Will Malpass, Giulia Mantoan, Lars Nesheim, Áureo de Paula, Andrew Reeves, Craig Scott, Emma Small, Lingyi Yang

Key economic variables are often published with a significant delay of over a
month. The nowcasting literature has arisen to provide fast, reliable estimates
of delayed economic indicators and is closely related to filtering methods in
signal processing. The path signature is a mathematical object which captures
geometric properties of sequential data; it naturally handles missing data from
mixed frequency and/or irregular sampling -- issues often encountered when
merging multiple data sources -- by embedding the observed data in continuous
time. Calculating path signatures and using them as features in models has
achieved state-of-the-art results in fields such as finance, medicine, and
cyber security. We look at the nowcasting problem by applying regression on
signatures, a simple linear model on these nonlinear objects that we show
subsumes the popular Kalman filter. We quantify the performance via a
simulation exercise and through an application to nowcasting US GDP growth, where
we obtain a lower error than a dynamic factor model based on the New York Fed
staff nowcasting model. Finally, we demonstrate the flexibility of this method
by applying regression on signatures to nowcast weekly fuel prices using daily
data. Regression on signatures is an easy-to-apply approach that allows great
flexibility for data with complex sampling patterns.
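
As a sketch of the generic recipe (truncated path signatures used as regression
features), not of the authors' nowcasting pipeline, the code below computes
depth-2 signatures of piecewise-linear paths via Chen's identity and runs
ordinary least squares on them. All data are simulated.

```python
import numpy as np

def signature_depth2(path):
    """Depth-2 signature of a piecewise-linear path (T x d array) via Chen's identity."""
    d = path.shape[1]
    s1 = np.zeros(d)          # level-1 terms: total increments
    s2 = np.zeros((d, d))     # level-2 terms: iterated integrals
    for delta in np.diff(path, axis=0):
        s2 += np.outer(s1, delta) + 0.5 * np.outer(delta, delta)
        s1 += delta
    return np.concatenate([s1, s2.ravel()])

# Simulated toy example: regress a target on signature features of random-walk "paths".
rng = np.random.default_rng(1)
n_obs, length, d = 200, 12, 2
paths = rng.standard_normal((n_obs, length, d)).cumsum(axis=1)
X = np.array([signature_depth2(p) for p in paths])
y = X @ rng.standard_normal(X.shape[1]) + 0.1 * rng.standard_normal(n_obs)

# "Regression on signatures": ordinary least squares on the signature features.
coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(n_obs), X]), y, rcond=None)
```

Higher truncation depths and the missing-data handling that makes signatures
attractive for mixed-frequency data are omitted here for brevity.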

arXiv link: http://arxiv.org/abs/2305.10256v1

Econometrics arXiv paper, submitted: 2023-05-16

Monitoring multicountry macroeconomic risk

Authors: Dimitris Korobilis, Maximilian Schröder

We propose a multicountry quantile factor augmented vector autoregression
(QFAVAR) to model heterogeneities both across countries and across
characteristics of the distributions of macroeconomic time series. The presence
of quantile factors allows for summarizing these two heterogeneities in a
parsimonious way. We develop two algorithms for posterior inference that
feature varying levels of trade-off between estimation precision and
computational speed. Using monthly data for the euro area, we establish the
good empirical properties of the QFAVAR as a tool for assessing the effects of
global shocks on country-level macroeconomic risks. In particular, QFAVAR
short-run tail forecasts are more accurate compared to a FAVAR with symmetric
Gaussian errors, as well as univariate quantile autoregressions that ignore
comovements among quantiles of macroeconomic variables. We also illustrate how
quantile impulse response functions and quantile connectedness measures,
resulting from the new model, can be used to implement joint risk scenario
analysis.

arXiv link: http://arxiv.org/abs/2305.09563v1

Econometrics arXiv paper, submitted: 2023-05-15

Grenander-type Density Estimation under Myerson Regularity

Authors: Haitian Xie

This study presents a novel approach to the density estimation of private
values from second-price auctions, diverging from the conventional use of
smoothing-based estimators. We introduce a Grenander-type estimator,
constructed based on a shape restriction in the form of a convexity constraint.
This constraint corresponds to the renowned Myerson regularity condition in
auction theory, which is equivalent to the concavity of the revenue function
for selling the auction item. Our estimator is nonparametric and does not
require any tuning parameters. Under mild assumptions, we establish cube-root
consistency and show that the estimator asymptotically follows a scaled
Chernoff distribution. Moreover, we demonstrate that the estimator
achieves the minimax optimal convergence rate.
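
For intuition only, the sketch below implements the classical Grenander
construction for a monotone (decreasing) density: the left derivative of the
least concave majorant of the empirical CDF. The paper's estimator instead
imposes a convexity constraint tied to Myerson regularity, so this is a loosely
analogous shape-restricted estimator, not the proposed one.

```python
import numpy as np

def grenander_decreasing_density(x):
    """Left derivative of the least concave majorant (LCM) of the empirical CDF.

    Returns the knots of the LCM and the estimated (piecewise-constant,
    non-increasing) density on each segment between consecutive knots.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    # Points of the empirical CDF, anchored at the origin.
    pts = np.column_stack([np.concatenate([[0.0], x]),
                           np.arange(n + 1) / n])
    hull = [pts[0], pts[1]]
    for p in pts[2:]:
        # Pop knots lying below the chord, so that slopes stay strictly decreasing.
        while len(hull) >= 2:
            (x0, y0), (x1, y1) = hull[-2], hull[-1]
            if (y1 - y0) * (p[0] - x1) <= (p[1] - y1) * (x1 - x0):
                hull.pop()
            else:
                break
        hull.append(p)
    hull = np.array(hull)
    slopes = np.diff(hull[:, 1]) / np.diff(hull[:, 0])   # density value on each segment
    return hull[:, 0], slopes

# Toy check on a decreasing density (Exponential(1)): slopes should be non-increasing.
rng = np.random.default_rng(2)
knots, dens = grenander_decreasing_density(rng.exponential(size=500))
assert np.all(np.diff(dens) <= 1e-12)
```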

arXiv link: http://arxiv.org/abs/2305.09052v1

Econometrics arXiv cross-link from cs.IT (cs.IT), submitted: 2023-05-15

Designing Discontinuities

Authors: Ibtihal Ferwana, Suyoung Park, Ting-Yi Wu, Lav R. Varshney

Discontinuities can be fairly arbitrary but can also have a significant impact
on outcomes in larger systems. Indeed, their arbitrariness is why they have
been used to infer causal relationships among variables in numerous settings.
Regression discontinuity from econometrics assumes the existence of a
discontinuous variable that splits the population into distinct partitions to
estimate the causal effects of a given phenomenon. Here we consider the design
of partitions for a given discontinuous variable to optimize a certain effect
previously studied using regression discontinuity. To do so, we propose a
quantization-theoretic approach to optimize the effect of interest, first
learning the causal effect size of a given discontinuous variable and then
applying dynamic programming for optimal quantization design of discontinuities
to balance the gain and loss in that effect size. We also develop a
computationally efficient reinforcement learning algorithm for the dynamic
programming formulation of optimal quantization. We demonstrate our approach by
designing optimal time zone borders for counterfactuals of social capital,
social mobility, and health. This is based on regression discontinuity analyses
we perform on novel data, which may be of independent empirical interest.
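
Purely to illustrate the dynamic-programming flavor of optimal quantization
(this is the textbook O(n^2 K) contiguous-partition DP, not the paper's
algorithm or its reinforcement-learning accelerator), the sketch below splits a
sorted one-dimensional variable into K cells minimizing within-cell squared
error. The data are simulated.

```python
import numpy as np

def optimal_1d_quantization(x, k):
    """Partition sorted values into k contiguous cells minimizing within-cell SSE."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    csum = np.concatenate([[0.0], x.cumsum()])
    csum2 = np.concatenate([[0.0], (x ** 2).cumsum()])

    def sse(i, j):  # within-cell sum of squared errors for x[i:j]
        s, s2, m = csum[j] - csum[i], csum2[j] - csum2[i], j - i
        return s2 - s * s / m

    cost = np.full((k + 1, n + 1), np.inf)
    split = np.zeros((k + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cand = cost[c - 1, i] + sse(i, j)
                if cand < cost[c, j]:
                    cost[c, j], split[c, j] = cand, i

    # Recover cell boundaries by backtracking through the split table.
    cuts, j = [], n
    for c in range(k, 0, -1):
        i = split[c, j]
        cuts.append((x[i], x[j - 1]))
        j = i
    return cuts[::-1], cost[k, n]

cells, total_sse = optimal_1d_quantization(np.random.default_rng(3).normal(size=60), k=4)
```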

arXiv link: http://arxiv.org/abs/2305.08559v3

Econometrics arXiv updated paper (originally submitted: 2023-05-15)

Hierarchical DCC-HEAVY Model for High-Dimensional Covariance Matrices

Authors: Emilija Dzuverovic, Matteo Barigozzi

We introduce an HD DCC-HEAVY class of hierarchical-type factor models for
high-dimensional covariance matrices, employing the realized measures built
from higher-frequency data. The modelling approach features straightforward
estimation and forecasting schemes, independent of the cross-sectional
dimension of the assets under consideration, and accounts for sophisticated
asymmetric dynamics in the covariances. Empirical analyses suggest that the HD
DCC-HEAVY models have a better in-sample fit and deliver statistically and
economically significant out-of-sample gains relative to the existing
hierarchical factor model and standard benchmarks. The results are robust under
different frequencies and market conditions.

arXiv link: http://arxiv.org/abs/2305.08488v2

Econometrics arXiv paper, submitted: 2023-05-15

Efficient Semiparametric Estimation of Average Treatment Effects Under Covariate Adaptive Randomization

Authors: Ahnaf Rafi

Experiments that use covariate adaptive randomization (CAR) are commonplace
in applied economics and other fields. In such experiments, the experimenter
first stratifies the sample according to observed baseline covariates and then
assigns treatment randomly within these strata so as to achieve balance
according to pre-specified stratum-specific target assignment proportions. In
this paper, we compute the semiparametric efficiency bound for estimating the
average treatment effect (ATE) in such experiments with binary treatments
allowing for the class of CAR procedures considered in Bugni, Canay, and Shaikh
(2018, 2019). This is a broad class of procedures and is motivated by those
used in practice. The stratum-specific target proportions play the role of the
propensity score conditional on all baseline covariates (and not just the
strata) in these experiments. Thus, the efficiency bound is a special case of
the bound in Hahn (1998), but conditional on all baseline covariates.
Additionally, this efficiency bound is shown to be achievable under the same
conditions as those used to derive the bound by using a cross-fitted
Nadaraya-Watson kernel estimator to form nonparametric regression adjustments.

arXiv link: http://arxiv.org/abs/2305.08340v1

Econometrics arXiv cross-link from math.OC (math.OC), submitted: 2023-05-13

The Nonstationary Newsvendor with (and without) Predictions

Authors: Lin An, Andrew A. Li, Benjamin Moseley, R. Ravi

The classic newsvendor model yields an optimal decision for a “newsvendor”
selecting a quantity of inventory, under the assumption that the demand is
drawn from a known distribution. Motivated by applications such as cloud
provisioning and staffing, we consider a setting in which newsvendor-type
decisions must be made sequentially, in the face of demand drawn from a
stochastic process that is both unknown and nonstationary. All prior work on
this problem either (a) assumes that the level of nonstationarity is known, or
(b) imposes additional statistical assumptions that enable accurate predictions
of the unknown demand. Our research tackles the Nonstationary Newsvendor
without these assumptions, both with and without predictions.
We first, in the setting without predictions, design a policy which we prove
achieves order-optimal regret -- ours is the first policy to accomplish this
without being given the level of nonstationarity of the underlying demand. We
then, for the first time, introduce a model for generic (i.e. with no
statistical assumptions) predictions with arbitrary accuracy, and propose a
policy that incorporates these predictions without being given their accuracy.
We upper bound the regret of this policy, and show that it matches the best
achievable regret had the accuracy of the predictions been known.
Our findings provide valuable insights on inventory management. Managers can
make more informed and effective decisions in dynamic environments, reducing
costs and enhancing service levels despite uncertain demand patterns. We
empirically validate our new policy with experiments based on three real-world
datasets containing thousands of time-series, showing that it succeeds in
closing approximately 74% of the gap between the best approaches based on
nonstationarity and predictions alone.
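
For background, here is the classic single-period newsvendor benchmark that the
setting above moves away from: with a known (here, simulated) demand
distribution and hypothetical underage/overage costs, the optimal order
quantity is the critical-fractile quantile of demand. The nonstationary,
unknown-demand case studied in the paper is precisely where this simple rule no
longer applies directly.

```python
import numpy as np

# Classic newsvendor benchmark: order the critical-fractile quantile of demand.
rng = np.random.default_rng(4)
demand_samples = rng.gamma(shape=5.0, scale=20.0, size=10_000)  # hypothetical demand draws

underage_cost = 4.0   # lost profit per unit of unmet demand
overage_cost = 1.0    # cost per unsold unit
critical_fractile = underage_cost / (underage_cost + overage_cost)

order_quantity = np.quantile(demand_samples, critical_fractile)
print(round(order_quantity, 1))
```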

arXiv link: http://arxiv.org/abs/2305.07993v4

Econometrics arXiv paper, submitted: 2023-05-13

Semiparametrically Optimal Cointegration Test

Authors: Bo Zhou

This paper aims to address the issue of semiparametric efficiency for
cointegration rank testing in finite-order vector autoregressive models, where
the innovation distribution is considered an infinite-dimensional nuisance
parameter. Our asymptotic analysis relies on Le Cam's theory of limit
experiments, which in this context takes the form of a Locally Asymptotically
Brownian Functional (LABF). By leveraging the structural version of LABF, an
Ornstein-Uhlenbeck experiment, we develop the asymptotic power envelopes of
asymptotically invariant tests for both cases with and without a time trend. We
propose feasible tests based on a nonparametrically estimated density and
demonstrate that their power can achieve the semiparametric power envelopes,
making them semiparametrically optimal. We validate the theoretical results
through large-sample simulations and illustrate satisfactory size control and
excellent power performance of our tests in small samples. In both cases, with
and without a time trend, we show that a remarkable amount of additional
power can be obtained from non-Gaussian distributions.

arXiv link: http://arxiv.org/abs/2305.08880v1

Econometrics arXiv paper, submitted: 2023-05-11

Band-Pass Filtering with High-Dimensional Time Series

Authors: Alessandro Giovannelli, Marco Lippi, Tommaso Proietti

The paper deals with the construction of a synthetic indicator of economic
growth, obtained by projecting a quarterly measure of aggregate economic
activity, namely gross domestic product (GDP), into the space spanned by a
finite number of smooth principal components, representative of the
medium-to-long-run component of economic growth of a high-dimensional time
series, available at the monthly frequency. The smooth principal components
result from applying a cross-sectional filter distilling the low-pass component
of growth in real time. The outcome of the projection is a monthly nowcast of
the medium-to-long-run component of GDP growth. After discussing the
theoretical properties of the indicator, we deal with the assessment of its
reliability and predictive validity with reference to a panel of macroeconomic
U.S. time series.

arXiv link: http://arxiv.org/abs/2305.06618v1

Econometrics arXiv paper, submitted: 2023-05-10

The price elasticity of Gleevec in patients with Chronic Myeloid Leukemia enrolled in Medicare Part D: Evidence from a regression discontinuity design

Authors: Samantha E. Clark, Ruth Etzioni, Jerry Radich, Zachary Marcum, Anirban Basu

Objective To assess the price elasticity of branded imatinib in chronic
myeloid leukemia (CML) patients on Medicare Part D to determine if high
out-of-pocket payments (OOP) are driving the substantial levels of
non-adherence observed in this population.
Data sources and study setting We use data from the TriNetX Diamond Network
(TDN) United States database for the period from first availability in 2011
through the end of patent exclusivity following the introduction of generic
imatinib in early 2016.
Study design We implement a fuzzy regression discontinuity design to
separately estimate the effect of Medicare Part D enrollment at age 65 on
adherence and OOP in newly-diagnosed CML patients initiating branded imatinib.
The corresponding price elasticity of demand (PED) is estimated and results are
assessed across a variety of specifications and robustness checks.
Data collection/extraction methods Data from eligible patients following the
application of inclusion and exclusion criteria were analyzed.
Principal findings Our analysis suggests that there is a significant increase
in initial OOP of $232 (95% Confidence interval (CI): $102 to $362) for
individuals who enrolled in Part D due to expanded eligibility at age 65. The
relatively smaller and non-significant decrease in adherence of only 6
percentage points (95% CI: -0.21 to 0.08) led to a PED of -0.02 (95% CI:
-0.056, 0.015).
Conclusion This study provides evidence regarding the financial impact of
coinsurance-based benefit designs on Medicare-age patients with CML initiating
branded imatinib. Results indicate that factors besides high OOP are driving
the substantial non-adherence observed in this population and add to the
growing literature on PED for specialty drugs.

arXiv link: http://arxiv.org/abs/2305.06076v1

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2023-05-10

On the Time-Varying Structure of the Arbitrage Pricing Theory using the Japanese Sector Indices

Authors: Koichiro Moriya, Akihiko Noda

This paper is the first study to examine the time instability of the APT in
the Japanese stock market. In particular, we measure how changes in each risk
factor affect the stock risk premiums to investigate the validity of the APT
over time, applying the rolling window method to Fama and MacBeth's (1973)
two-step regression and Kamstra and Shi's (2023) generalized GRS test. We
summarize our empirical results as follows: (1) the changes in monetary policy
by major central banks greatly affect the validity of the APT in Japan, and (2)
the time-varying estimates of the risk premiums for each factor are also
unstable over time, and they are affected by the business cycle and economic
crises. Therefore, we conclude that the validity of the APT as an appropriate
model to explain the Japanese sector indices is not stable over time.

arXiv link: http://arxiv.org/abs/2305.05998v4

Econometrics arXiv updated paper (originally submitted: 2023-05-10)

Does Principal Component Analysis Preserve the Sparsity in Sparse Weak Factor Models?

Authors: Jie Wei, Yonghui Zhang

This paper studies the principal component (PC) method-based estimation of
weak factor models with sparse loadings. We uncover an intrinsic near-sparsity
preservation property for the PC estimators of loadings, which comes from the
approximately upper triangular (block) structure of the rotation matrix. It
implies an asymmetric relationship among factors: the rotated loadings for a
stronger factor can be contaminated by those from a weaker one, but the
loadings for a weaker factor are almost free of the impact of those from a
stronger one. More importantly, the finding implies that there is no need to
use complicated penalties to sparsify the loading estimators. Instead, we adopt
a simple screening method to recover the sparsity and construct estimators for
various factor strengths. In addition, for sparse weak factor models, we
provide a singular value thresholding-based approach to determine the number of
factors and establish uniform convergence rates for PC estimators, which
complement Bai and Ng (2023). The accuracy and efficiency of the proposed
estimators are investigated via Monte Carlo simulations. The application to the
FRED-QD dataset reveals the underlying factor strengths and loading sparsity as
well as their dynamic features.
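
A minimal simulated sketch of the two ingredients discussed above:
principal-component estimation of factor loadings followed by a simple
screening (hard-thresholding) step to recover sparsity. The threshold used here
is an ad hoc placeholder rather than the paper's rule, and the design is purely
illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
T, N, r = 200, 100, 2
# Simulated sparse-loading factor model: each factor loads on only a subset of series.
F = rng.standard_normal((T, r))
Lam = np.zeros((N, r))
Lam[:30, 0] = rng.standard_normal(30)     # factor 1: first 30 series
Lam[20:50, 1] = rng.standard_normal(30)   # factor 2: series 20-49
X = F @ Lam.T + rng.standard_normal((T, N))

# Step 1: principal-component estimates of factors and loadings.
_, eigvec = np.linalg.eigh(X @ X.T / (T * N))
F_hat = np.sqrt(T) * eigvec[:, -r:][:, ::-1]     # r leading eigenvectors, scaled
Lam_hat = X.T @ F_hat / T

# Step 2: screening -- set small estimated loadings to zero (ad hoc threshold).
threshold = 2.0 * np.sqrt(np.log(N) / T)
Lam_sparse = np.where(np.abs(Lam_hat) > threshold, Lam_hat, 0.0)

print((Lam_sparse != 0).sum(axis=0))   # estimated number of nonzero loadings per factor
```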

arXiv link: http://arxiv.org/abs/2305.05934v2

Econometrics arXiv updated paper (originally submitted: 2023-05-06)

Volatility of Volatility and Leverage Effect from Options

Authors: Carsten H. Chong, Viktor Todorov

We propose model-free (nonparametric) estimators of the volatility of
volatility and leverage effect using high-frequency observations of short-dated
options. At each point in time, we integrate available options into estimates
of the conditional characteristic function of the price increment until the
options' expiration and we use these estimates to recover spot volatility. Our
volatility of volatility estimator is then formed from the sample variance and
first-order autocovariance of the spot volatility increments, with the latter
correcting for the bias in the former due to option observation errors. The
leverage effect estimator is the sample covariance between price increments and
the estimated volatility increments. The rate of convergence of the estimators
depends on the diffusive innovations in the latent volatility process as well
as on the observation error in the options with strikes in the vicinity of the
current spot price. Feasible inference is developed in a way that does not
require prior knowledge of the source of estimation error that is
asymptotically dominating.

arXiv link: http://arxiv.org/abs/2305.04137v2

Econometrics arXiv cross-link from stat.OT (stat.OT), submitted: 2023-05-04

Risk management in the use of published statistical results for policy decisions

Authors: Duncan Ermini Leaf

Statistical inferential results generally come with a measure of reliability
for decision-making purposes. For a policy implementer, the value of
implementing published policy research depends critically upon this
reliability. For a policy researcher, the value of policy implementation may
depend weakly or not at all upon the policy's outcome. Some researchers might
benefit from overstating the reliability of statistical results. Implementers
may find it difficult or impossible to determine whether researchers are
overstating reliability. This information asymmetry between researchers and
implementers can lead to an adverse selection problem where, at best, the full
benefits of a policy are not realized or, at worst, a policy is deemed too
risky to implement at any scale. Researchers can remedy this by guaranteeing
the policy outcome. Researchers can overcome their own risk aversion and wealth
constraints by exchanging risks with other researchers or offering only partial
insurance. The problem and remedy are illustrated using a confidence interval
for the success probability of a binomial policy outcome.

arXiv link: http://arxiv.org/abs/2305.03205v2

Econometrics arXiv updated paper (originally submitted: 2023-05-04)

Debiased Inference for Dynamic Nonlinear Panels with Multi-dimensional Heterogeneities

Authors: Xuan Leng, Jiaming Mao, Yutao Sun

We introduce a generic class of dynamic nonlinear heterogeneous parameter
models that incorporate individual and time fixed effects in both the intercept
and slope. These models are subject to the incidental parameter problem, in
that the limiting distribution of the point estimator is not centered at zero,
and that test statistics do not follow their standard asymptotic distributions
as in the absence of the fixed effects. To address the problem, we develop an
analytical bias correction procedure to construct a bias-corrected likelihood.
The resulting estimator follows an asymptotic normal distribution with mean
zero. Moreover, likelihood-based test statistics -- including
likelihood-ratio, Lagrange-multiplier, and Wald tests -- follow the limiting
chi-squared distribution under the null hypothesis. Simulations demonstrate the
effectiveness of the proposed correction method, and an empirical application
on the labor force participation of single mothers underscores its practical
importance.

arXiv link: http://arxiv.org/abs/2305.03134v4

Econometrics arXiv updated paper (originally submitted: 2023-05-03)

Doubly Robust Uniform Confidence Bands for Group-Time Conditional Average Treatment Effects in Difference-in-Differences

Authors: Shunsuke Imai, Lei Qin, Takahide Yanagi

We consider a panel data analysis to examine the heterogeneity in treatment
effects with respect to groups, periods, and a pre-treatment covariate of
interest in the staggered difference-in-differences setting of Callaway and
Sant'Anna (2021). Under standard identification conditions, a doubly robust
estimand conditional on the covariate identifies the group-time conditional
average treatment effect given the covariate. Focusing on the case of a
continuous covariate, we propose a three-step estimation procedure based on
nonparametric local polynomial regressions and parametric estimation methods.
Using uniformly valid distributional approximation results for empirical
processes and weighted/multiplier bootstrapping, we develop doubly robust
inference methods to construct uniform confidence bands for the group-time
conditional average treatment effect function and a variety of useful summary
parameters. The accompanying R package didhetero allows for easy implementation
of our methods.

arXiv link: http://arxiv.org/abs/2305.02185v4

Econometrics arXiv updated paper (originally submitted: 2023-05-02)

Large Global Volatility Matrix Analysis Based on Observation Structural Information

Authors: Sung Hoon Choi, Donggyu Kim

In this paper, we develop a novel large volatility matrix estimation
procedure for analyzing global financial markets. Practitioners often use
lower-frequency data, such as weekly or monthly returns, to address the issue
of different trading hours in the international financial market. However, this
approach can lead to inefficiency due to information loss. To mitigate this
problem, our proposed method, called Structured Principal Orthogonal complEment
Thresholding (Structured-POET), incorporates observation structural information
for both global and national factor models. We establish the asymptotic
properties of the Structured-POET estimator, and also demonstrate the drawbacks
of conventional covariance matrix estimation procedures when using
lower-frequency data. Finally, we apply the Structured-POET estimator to an
out-of-sample portfolio allocation study using international stock market data.

arXiv link: http://arxiv.org/abs/2305.01464v3

Econometrics arXiv updated paper (originally submitted: 2023-05-02)

Transfer Estimates for Causal Effects across Heterogeneous Sites

Authors: Konrad Menzel

We consider the problem of extrapolating treatment effects across
heterogeneous populations (“sites”/“contexts”). We consider an idealized
scenario in which the researcher observes cross-sectional data for a large
number of units across several “experimental” sites in which an intervention
has already been implemented, and wishes to extrapolate to a new “target” site
for which a baseline survey of unit-specific, pre-treatment outcomes and
relevant attributes is
available. Our approach treats the baseline as functional data, and this choice
is motivated by the observation that unobserved site-specific confounders
manifest themselves not only in average levels of outcomes, but also how these
interact with observed unit-specific attributes. We consider the problem of
determining the optimal finite-dimensional feature space in which to solve that
prediction problem. Our approach is design-based in the sense that the
performance of the predictor is evaluated given the specific, finite selection
of experimental and target sites. Our approach is nonparametric, and our formal
results concern the construction of an optimal basis of predictors as well as
convergence rates for the estimated conditional average treatment effect
relative to the constrained-optimal population predictor for the target site.
We quantify the potential gains from adapting experimental estimates to a
target location in an application to conditional cash transfer (CCT) programs
using a combined data set from five multi-site randomized controlled trials.

arXiv link: http://arxiv.org/abs/2305.01435v7

Econometrics arXiv updated paper (originally submitted: 2023-05-02)

Estimating Input Coefficients for Regional Input-Output Tables Using Deep Learning with Mixup

Authors: Shogo Fukui

An input-output table is an important source of data for analyzing the economic
situation of a region. In Japan, the input-output table for each region
(the regional input-output table) is not always publicly available, so it
often needs to be estimated. In particular, various methods have been
developed for estimating input coefficients, which are an important part of the
input-output table. Currently, non-survey methods are often used to estimate
input coefficients because they require less data and computation, but these
methods have some problems, such as discarding information and requiring
additional data for estimation.
In this study, the input coefficients are estimated by approximating the
generation process with an artificial neural network (ANN) to mitigate the
problems of the non-survey methods and to estimate the input coefficients with
higher precision. To avoid over-fitting due to the small amount of data used, a
data augmentation technique called mixup is introduced to increase the data size by
generating virtual regions through region composition and scaling.
By comparing the estimates of the input coefficients with those of Japan as a
whole, it is shown that the accuracy of the proposed method is higher
and more stable than that of conventional non-survey methods. In addition,
the estimated input coefficients for the three cities in Japan are generally
close to the published values for each city.
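
A minimal sketch of the mixup step described above, generating "virtual
regions" as convex combinations of observed regions (features and flattened
input-coefficient targets). The feature set, dimensions, and Beta(alpha, alpha)
mixing weights are hypothetical placeholders; the region-scaling variant and
the ANN itself are omitted.

```python
import numpy as np

def mixup(features, targets, n_virtual, alpha=0.2, rng=None):
    """Create virtual regions as convex combinations of observed regions (mixup)."""
    if rng is None:
        rng = np.random.default_rng()
    i = rng.integers(0, len(features), size=n_virtual)
    j = rng.integers(0, len(features), size=n_virtual)
    lam = rng.beta(alpha, alpha, size=(n_virtual, 1))
    x_mix = lam * features[i] + (1 - lam) * features[j]
    y_mix = lam * targets[i] + (1 - lam) * targets[j]
    return x_mix, y_mix

# Hypothetical example: 20 observed regions, 8 sector-level features, 8x8 input coefficients.
rng = np.random.default_rng(6)
X_regions = rng.random((20, 8))      # e.g. sectoral output shares (made up)
A_regions = rng.random((20, 64))     # flattened input-coefficient matrices (made up)
X_aug, A_aug = mixup(X_regions, A_regions, n_virtual=200, rng=rng)
```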

arXiv link: http://arxiv.org/abs/2305.01201v3

Econometrics arXiv updated paper (originally submitted: 2023-05-01)

Estimation and Inference in Threshold Predictive Regression Models with Locally Explosive Regressors

Authors: Christis Katsouris

In this paper, we study the estimation of the threshold predictive regression
model with hybrid stochastic local unit root predictors. We demonstrate the
estimation procedure and derive the asymptotic distribution of the least square
estimator and the IV based estimator proposed by Magdalinos and Phillips
(2009), under the null hypothesis of a diminishing threshold effect. Simulation
experiments focus on the finite sample performance of our proposed estimators
and the corresponding predictability tests as in Gonzalo and Pitarakis (2012),
in the presence of threshold effects with stochastic local unit roots. An
empirical application to stock return equity indices illustrates the usefulness
of our framework in uncovering regimes of predictability during certain
periods. In particular, we focus on an aspect not previously examined in the
predictability literature, that is, the effect of economic policy uncertainty.

arXiv link: http://arxiv.org/abs/2305.00860v3

Econometrics arXiv updated paper (originally submitted: 2023-05-01)

Double and Single Descent in Causal Inference with an Application to High-Dimensional Synthetic Control

Authors: Jann Spiess, Guido Imbens, Amar Venugopal

Motivated by a recent literature on the double-descent phenomenon in machine
learning, we consider highly over-parameterized models in causal inference,
including synthetic control with many control units. In such models, there may
be so many free parameters that the model fits the training data perfectly. We
first investigate high-dimensional linear regression for imputing wage data and
estimating average treatment effects, where we find that models with many more
covariates than sample size can outperform simple ones. We then document the
performance of high-dimensional synthetic control estimators with many control
units. We find that adding control units can help improve imputation
performance even beyond the point where the pre-treatment fit is perfect. We
provide a unified theoretical perspective on the performance of these
high-dimensional models. Specifically, we show that more complex models can be
interpreted as model-averaging estimators over simpler ones, which we link to
an improvement in average performance. This perspective yields concrete
insights into the use of synthetic control when control units are many relative
to the number of pre-treatment periods.

arXiv link: http://arxiv.org/abs/2305.00700v3

Econometrics arXiv updated paper (originally submitted: 2023-04-30)

Optimal tests following sequential experiments

Authors: Karun Adusumilli

Recent years have seen tremendous advances in the theory and application of
sequential experiments. While these experiments are not always designed with
hypothesis testing in mind, researchers may still be interested in performing
tests after the experiment is completed. The purpose of this paper is to aid in
the development of optimal tests for sequential experiments by analyzing their
asymptotic properties. Our key finding is that the asymptotic power function of
any test can be matched by a test in a limit experiment where a Gaussian
process is observed for each treatment, and inference is made for the drifts of
these processes. This result has important implications, including a powerful
sufficiency result: any candidate test only needs to rely on a fixed set of
statistics, regardless of the type of sequential experiment. These statistics
are the number of times each treatment has been sampled by the end of the
experiment, along with final value of the score (for parametric models) or
efficient influence function (for non-parametric models) process for each
treatment. We then characterize asymptotically optimal tests under various
restrictions such as unbiasedness, $\alpha$-spending constraints, etc. Finally, we
apply our results to three key classes of sequential experiments: costly
sampling, group sequential trials, and bandit experiments, and show how optimal
inference can be conducted in these scenarios.

arXiv link: http://arxiv.org/abs/2305.00403v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-04-27

Augmented balancing weights as linear regression

Authors: David Bruns-Smith, Oliver Dukes, Avi Feller, Elizabeth L. Ogburn

We provide a novel characterization of augmented balancing weights, also
known as automatic debiased machine learning (AutoDML). These popular doubly
robust or de-biased machine learning estimators combine outcome modeling with
balancing weights -- weights that achieve covariate balance directly in lieu of
estimating and inverting the propensity score. When the outcome and weighting
models are both linear in some (possibly infinite) basis, we show that the
augmented estimator is equivalent to a single linear model with coefficients
that combine the coefficients from the original outcome model and coefficients
from an unpenalized ordinary least squares (OLS) fit on the same data. We see
that, under certain choices of regularization parameters, the augmented
estimator often collapses to the OLS estimator alone; this occurs for example
in a re-analysis of the Lalonde 1986 dataset. We then extend these results to
specific choices of outcome and weighting models. We first show that the
augmented estimator that uses (kernel) ridge regression for both outcome and
weighting models is equivalent to a single, undersmoothed (kernel) ridge
regression. This holds numerically in finite samples and lays the groundwork
for a novel analysis of undersmoothing and asymptotic rates of convergence.
When the weighting model is instead lasso-penalized regression, we give
closed-form expressions for special cases and demonstrate a “double
selection” property. Our framework opens the black box on this increasingly
popular class of estimators, bridges the gap between existing results on the
semiparametric efficiency of undersmoothed and doubly robust estimators, and
provides new insights into the performance of augmented balancing weights.
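
To fix ideas, the sketch below computes an augmented balancing-weights estimate
of a mean counterfactual outcome in its generic form, outcome-model prediction
plus weights applied to residuals, using an unpenalized OLS outcome model and
minimum-norm exact-balancing weights on the controls. It is a simplified
illustration under a made-up design, not the kernel-ridge or lasso cases
analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(7)
n, d = 400, 5
X = rng.standard_normal((n, d))
D = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))           # treatment depends on X
Y = X @ np.array([1.0, 0.5, 0.0, 0.0, -0.5]) + D + rng.standard_normal(n)

Xc, Yc = X[D == 0], Y[D == 0]                              # control units
target = X.mean(axis=0)                                    # balance toward the full sample

# Outcome model: OLS of Y on X among controls (with intercept).
Zc = np.column_stack([np.ones(Xc.shape[0]), Xc])
g_coef, *_ = np.linalg.lstsq(Zc, Yc, rcond=None)
g_all = np.column_stack([np.ones(n), X]) @ g_coef

# Balancing weights: minimum-norm weights on controls whose weighted mean matches `target`.
Zt = np.concatenate([[1.0], target])
w, *_ = np.linalg.lstsq(Zc.T, Zt, rcond=None)              # min-norm solution of Zc' w = Zt

# Augmented estimate of E[Y(0)]: model prediction plus weighted control residuals.
mu0_aug = g_all.mean() + w @ (Yc - Zc @ g_coef)
print(round(mu0_aug, 3))
```

Because the OLS residuals are orthogonal to the columns of Zc and the
minimum-norm weights lie in their span, the residual correction is numerically
zero here, which echoes the collapse-to-OLS phenomenon described above.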

arXiv link: http://arxiv.org/abs/2304.14545v3

Econometrics arXiv paper, submitted: 2023-04-27

Assessing Text Mining and Technical Analyses on Forecasting Financial Time Series

Authors: Ali Lashgari

Forecasting financial time series (FTS) is an essential field in finance and
economics that anticipates market movements in financial markets. This paper
investigates the accuracy of text mining and technical analyses in forecasting
financial time series. It focuses on the S&P500 stock market index, which tracks
the performance of the largest publicly traded companies in the US, during the
pandemic. The study compares two methods of forecasting the future price of
the S&P500: text mining, which uses NLP techniques to extract meaningful
insights from financial news, and technical analysis, which uses historical
price and volume data to make predictions. The study examines the advantages
and limitations of both methods and analyze their performance in predicting the
S&P500. The FinBERT model outperforms other models in terms of S&P500 price
prediction, as evidenced by its lower RMSE value, and has the potential to
revolutionize financial analysis and prediction using financial news data.
Keywords: ARIMA, BERT, FinBERT, Forecasting Financial Time Series, GARCH, LSTM,
Technical Analysis, Text Mining. JEL classifications: G4, C8

arXiv link: http://arxiv.org/abs/2304.14544v1

Econometrics arXiv updated paper (originally submitted: 2023-04-27)

Convexity Not Required: Estimation of Smooth Moment Condition Models

Authors: Jean-Jacques Forneron, Liang Zhong

Generalized and Simulated Method of Moments are often used to estimate
structural economic models. Yet, it is commonly reported that optimization is
challenging because the corresponding objective function is non-convex. For
smooth problems, this paper shows that convexity is not required: under
conditions involving the Jacobian of the moments, certain algorithms are
globally convergent. These include a gradient-descent and a Gauss-Newton
algorithm with appropriate choice of tuning parameters. The results are robust
to 1) non-convexity, 2) one-to-one moderately non-linear reparameterizations,
and 3) moderate misspecification. The conditions preclude non-global optima.
Numerical and empirical examples illustrate the condition, non-convexity, and
convergence properties of different optimizers.

arXiv link: http://arxiv.org/abs/2304.14386v2

Econometrics arXiv cross-link from physics.data-an (physics.data-an), submitted: 2023-04-27

A universal model for the Lorenz curve with novel applications for datasets containing zeros and/or exhibiting extreme inequality

Authors: Thitithep Sitthiyot, Kanyarat Holasut

Given that the existing parametric functional forms for the Lorenz curve do
not fit all possible size distributions, a universal parametric functional form
is introduced. By using the empirical data from different scientific
disciplines as well as hypothetical data, this study shows that the proposed
model fits practically well not only data whose actual Lorenz plots have a
typical convex segment but also data whose actual Lorenz plots have both
horizontal and convex segments. It also perfectly fits data in which one
observation is larger in size while the remaining observations are smaller and
equal in size, as characterized by two positive-slope linear segments. In
addition, the proposed model has a closed-form expression for the Gini index,
making it computationally convenient to calculate. Considering that the Lorenz
curve and the Gini index are widely used in various disciplines of sciences,
the proposed model and the closed-form expression for the Gini index could be
used as alternative tools to analyze size distributions of non-negative
quantities and examine their inequalities or unevennesses.
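
The generic link between a Lorenz curve and the Gini index,
$G = 1 - 2\int_0^1 L(p) dp$, can be checked numerically. The sketch below uses
the textbook power-function Lorenz curve L(p) = p**k, chosen only for
illustration (it is not the universal form proposed above), whose Gini index
has the closed form (k - 1) / (k + 1).

```python
import numpy as np

def gini_from_lorenz(lorenz, n_grid=100_001):
    """Gini index from a Lorenz curve via G = 1 - 2 * integral_0^1 L(p) dp."""
    p = np.linspace(0.0, 1.0, n_grid)
    L = lorenz(p)
    area = np.sum((L[1:] + L[:-1]) / 2.0 * np.diff(p))   # trapezoidal rule
    return 1.0 - 2.0 * area

k = 3.0
print(gini_from_lorenz(lambda p: p ** k))   # numerical: ~0.5
print((k - 1) / (k + 1))                    # closed form: 0.5
```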

arXiv link: http://arxiv.org/abs/2304.13934v1

Econometrics arXiv updated paper (originally submitted: 2023-04-27)

Difference-in-Differences with Compositional Changes

Authors: Pedro H. C. Sant'Anna, Qi Xu

This paper studies difference-in-differences (DiD) setups with repeated
cross-sectional data and potential compositional changes across time periods.
We begin our analysis by deriving the efficient influence function and the
semiparametric efficiency bound for the average treatment effect on the treated
(ATT). We introduce nonparametric estimators that attain the semiparametric
efficiency bound under mild rate conditions on the estimators of the nuisance
functions, exhibiting a type of rate doubly robust (DR) property. Additionally,
we document a trade-off related to compositional changes: We derive the
asymptotic bias of DR DiD estimators that erroneously exclude compositional
changes and the efficiency loss when one fails to correctly rule out
compositional changes. We propose a nonparametric Hausman-type test for
compositional changes based on these trade-offs. The finite sample performance
of the proposed DiD tools is evaluated through Monte Carlo experiments and an
empirical application. We consider extensions of our framework that accommodate
double machine learning procedures with cross-fitting, and setups when some
units are observed in both pre- and post-treatment periods. As a by-product of
our analysis, we present a new uniform stochastic expansion of the local
polynomial multinomial logit estimator, which may be of independent interest.

arXiv link: http://arxiv.org/abs/2304.13925v2

Econometrics arXiv paper, submitted: 2023-04-26

Estimation of Characteristics-based Quantile Factor Models

Authors: Liang Chen, Juan Jose Dolado, Jesus Gonzalo, Haozi Pan

This paper studies the estimation of characteristic-based quantile factor
models where the factor loadings are unknown functions of observed individual
characteristics while the idiosyncratic error terms are subject to conditional
quantile restrictions. We propose a three-stage estimation procedure that is
easily implementable in practice and has nice properties. The convergence
rates, the limiting distributions of the estimated factors and loading
functions, and a consistent selection criterion for the number of factors at
each quantile are derived under general conditions. The proposed estimation
methodology is shown to work satisfactorily when: (i) the idiosyncratic errors
have heavy tails, (ii) the time dimension of the panel dataset is not large,
and (iii) the number of factors exceeds the number of characteristics. Finite
sample simulations and an empirical application aimed at estimating the loading
functions of the daily returns of a large panel of S&P500 index securities
help illustrate these properties.

arXiv link: http://arxiv.org/abs/2304.13206v1

Econometrics arXiv paper, submitted: 2023-04-25

Common Correlated Effects Estimation of Nonlinear Panel Data Models

Authors: Liang Chen, Minyuan Zhang

This paper focuses on estimating the coefficients and average partial effects
of observed regressors in nonlinear panel data models with interactive fixed
effects, using the common correlated effects (CCE) framework. The proposed
two-step estimation method involves applying principal component analysis to
estimate latent factors based on cross-sectional averages of the regressors in
the first step, and jointly estimating the coefficients of the regressors and
factor loadings in the second step. The asymptotic distributions of the
proposed estimators are derived under general conditions, assuming that the
number of time-series observations is comparable to the number of
cross-sectional observations. To correct for asymptotic biases of the
estimators, we introduce both analytical and split-panel jackknife methods, and
confirm their good performance in finite samples using Monte Carlo simulations.
An empirical application utilizes the proposed method to study the arbitrage
behaviour of nonfinancial firms across different security markets.

arXiv link: http://arxiv.org/abs/2304.13199v1

Econometrics arXiv updated paper (originally submitted: 2023-04-25)

Enhanced multilayer perceptron with feature selection and grid search for travel mode choice prediction

Authors: Li Tang, Chuanli Tang, Qi Fu

Accurate and reliable prediction of individual travel mode choices is crucial
for developing multi-mode urban transportation systems, conducting
transportation planning and formulating traffic demand management strategies.
Traditional discrete choice models have dominated the modelling methods for
decades yet suffer from strict model assumptions and low prediction accuracy.
In recent years, machine learning (ML) models, such as neural networks and
boosting models, are widely used by researchers for travel mode choice
prediction and have yielded promising results. However, despite the superior
prediction performance, many ML methods, especially neural network models, are
also limited by overfitting and a tedious model-structure determination
process. To bridge this gap, this study proposes an enhanced multilayer
perceptron (MLP; a neural network) with two hidden layers for travel mode
choice prediction; the MLP is enhanced by XGBoost (a boosting method) for
feature selection and by a grid search method to determine the optimal number
of hidden neurons in each hidden layer. The proposed method was trained and
tested on a real resident travel diary dataset collected in Chengdu, China.

arXiv link: http://arxiv.org/abs/2304.12698v2
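
A minimal sketch of the described pipeline, assuming synthetic data: XGBoost importances are used for feature selection and a grid search chooses the sizes of the two hidden layers of an MLP. The dataset, the number of retained features, and the grid values are placeholders, not those used for the Chengdu travel diary data.

```python
# Sketch on synthetic data: XGBoost-based feature selection followed by a grid
# search over the sizes of a two-hidden-layer MLP. All choices are placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

X, y = make_classification(n_samples=3000, n_features=30, n_informative=8,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 1: feature selection via XGBoost importances (keep the top 10 features).
selector = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1,
                         random_state=0).fit(X_train, y_train)
top = np.argsort(selector.feature_importances_)[::-1][:10]

# Step 2: grid search over the number of neurons in each hidden layer.
grid = {"hidden_layer_sizes": [(h1, h2) for h1 in (16, 32, 64)
                               for h2 in (8, 16, 32)]}
mlp = MLPClassifier(max_iter=1000, random_state=0)
search = GridSearchCV(mlp, grid, cv=5, n_jobs=-1).fit(X_train[:, top], y_train)

print("best hidden sizes:", search.best_params_["hidden_layer_sizes"])
print("test accuracy:", search.best_estimator_.score(X_test[:, top], y_test))
```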

Econometrics arXiv paper, submitted: 2023-04-25

The Ordinary Least Eigenvalues Estimator

Authors: Yassine Sbai Sassi

We propose a rate optimal estimator for the linear regression model on
network data with interacted (unobservable) individual effects. The estimator
achieves a faster rate of convergence $N$ compared to the standard estimators'
$\sqrt{N}$ rate and is efficient in cases that we discuss. We observe that the
individual effects alter the eigenvalue distribution of the data's matrix
representation in significant and distinctive ways. We subsequently offer a
correction for the ordinary least squares' objective function to
attenuate the statistical noise that arises due to the individual effects, and
in some cases, completely eliminate it. The new estimator is asymptotically
normal and we provide a valid estimator for its asymptotic covariance matrix.
While this paper only considers models accounting for first-order interactions
between individual effects, our estimation procedure is naturally extendable to
higher-order interactions and more general specifications of the error terms.

arXiv link: http://arxiv.org/abs/2304.12554v1

Econometrics arXiv updated paper (originally submitted: 2023-04-24)

Determination of the effective cointegration rank in high-dimensional time-series predictive regressions

Authors: Puyi Fang, Zhaoxing Gao, Ruey S. Tsay

This paper proposes a new approach to identifying the effective cointegration
rank in high-dimensional unit-root (HDUR) time series from a prediction
perspective using reduced-rank regression. For an HDUR process $x_t\in
\mathbb{R}^N$ and a stationary series $y_t\in \mathbb{R}^p$ of
interest, our goal is to predict future values of $y_t$ using
$x_t$ and lagged values of $y_t$. The proposed framework
consists of a two-step estimation procedure. First, principal component
analysis is used to identify all cointegrating vectors of $x_t$.
Second, the co-integrated stationary series are used as regressors, together
with some lagged variables of $y_t$, to predict $y_t$. The
estimated reduced rank is then defined as the effective cointegration rank of
$x_t$. Under the scenario that the autoregressive coefficient matrices
are sparse (or of low-rank), we apply the Least Absolute Shrinkage and
Selection Operator (or the reduced-rank techniques) to estimate the
autoregressive coefficients when the dimension involved is high. Theoretical
properties of the estimators are established under the assumptions that the
dimensions $p$ and $N$ and the sample size $T \to \infty$. Both simulated and
real examples are used to illustrate the proposed framework, and the empirical
application suggests that the proposed procedure fares well in predicting stock
returns.

arXiv link: http://arxiv.org/abs/2304.12134v2
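
A rough sketch of the two-step idea on simulated data, under simplifying assumptions: principal components associated with the smallest eigenvalues of the sample covariance are treated as (approximately) stationary cointegrating combinations, and a Lasso regression of $y_{t+1}$ on these combinations and $y_t$ stands in for the second step. The data-generating process and selection rule are illustrative, not the paper's procedure.

```python
# Sketch of the two-step idea on simulated data: (1) PCA on a high-dimensional
# unit-root panel to recover (approximately) stationary combinations,
# (2) a Lasso regression of y_{t+1} on those combinations and y_t.
import numpy as np
from scipy.linalg import null_space
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
T, N, r = 400, 20, 3
trends = np.cumsum(rng.normal(size=(T, r)), axis=0)   # r common stochastic trends
loadings = rng.normal(size=(r, N))
x = trends @ loadings + rng.normal(size=(T, N))        # unit-root panel

beta = null_space(loadings)[:, 0]                      # a true cointegrating vector
y = x @ beta + 0.5 * rng.normal(size=T)                # stationary target series

# Step 1: PCA on x; directions with small eigenvalues are (approximately)
# stationary, since the variance diverges along the common-trend directions.
cov = np.cov(x, rowvar=False)
eigval, eigvec = np.linalg.eigh(cov)                   # ascending eigenvalues
z = x @ eigvec[:, : N - r]                             # candidate stationary combos

# Step 2: Lasso regression of y_{t+1} on the stationary combinations and y_t.
regressors = np.column_stack([z[:-1], y[:-1]])
target = y[1:]
fit = LassoCV(cv=5).fit(regressors, target)
print("nonzero coefficients:", np.sum(fit.coef_ != 0))
```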

Econometrics arXiv paper, submitted: 2023-04-23

Policy Learning under Biased Sample Selection

Authors: Lihua Lei, Roshni Sahoo, Stefan Wager

Practitioners often use data from a randomized controlled trial to learn a
treatment assignment policy that can be deployed on a target population. A
recurring concern in doing so is that, even if the randomized trial was
well-executed (i.e., internal validity holds), the study participants may not
represent a random sample of the target population (i.e., external validity
fails)--and this may lead to policies that perform suboptimally on the target
population. We consider a model where observable attributes can impact sample
selection probabilities arbitrarily but the effect of unobservable attributes
is bounded by a constant, and we aim to learn policies with the best possible
performance guarantees that hold under any sampling bias of this type. In
particular, we derive the partial identification result for the worst-case
welfare in the presence of sampling bias and show that the optimal max-min,
max-min gain, and minimax regret policies depend on both the conditional
average treatment effect (CATE) and the conditional value-at-risk (CVaR) of
potential outcomes given covariates. To avoid finite-sample inefficiencies of
plug-in estimates, we further provide an end-to-end procedure for learning the
optimal max-min and max-min gain policies that does not require the separate
estimation of nuisance parameters.

arXiv link: http://arxiv.org/abs/2304.11735v1

Econometrics arXiv paper, submitted: 2023-04-19

The Impact of Industrial Zone: Evidence from China's National High-tech Zone Policy

Authors: Li Han

Based on statistical yearbook data and related patent data for 287 Chinese
cities from 2000 to 2020, this study treats the establishment of national
high-tech zones as a quasi-natural experiment. Using this experiment, the study
first estimates the treatment effect of the policy and checks the robustness of
the estimates. It then examines heterogeneity across different geographic
regions and city tiers of China. Next, it explores the possible mechanisms
behind the policy's effect, finding that the likely channels are financial
support, agglomeration of secondary industry, and spillovers. Finally, the
study examines the spillovers in greater depth and maps the distribution of the
spillover effect.

arXiv link: http://arxiv.org/abs/2304.09775v1

Econometrics arXiv paper, submitted: 2023-04-18

A hybrid model for day-ahead electricity price forecasting: Combining fundamental and stochastic modelling

Authors: Mira Watermeyer, Thomas Möbius, Oliver Grothe, Felix Müsgens

The accurate prediction of short-term electricity prices is vital for
effective trading strategies, power plant scheduling, profit maximisation and
efficient system operation. However, uncertainties in supply and demand make
such predictions challenging. We propose a hybrid model that combines a
techno-economic energy system model with stochastic models to address this
challenge. The techno-economic model in our hybrid approach provides a deep
understanding of the market. It captures the underlying factors and their
impacts on electricity prices, which is impossible with statistical models
alone. The statistical models incorporate non-techno-economic aspects, such as
the expectations and speculative behaviour of market participants, through the
interpretation of prices. The hybrid model generates both conventional point
predictions and probabilistic forecasts, providing a comprehensive
understanding of the market landscape. Probabilistic forecasts are particularly
valuable because they account for market uncertainty, facilitating informed
decision-making and risk management. Our model delivers state-of-the-art
results, helping market participants to make informed decisions and operate
their systems more efficiently.

arXiv link: http://arxiv.org/abs/2304.09336v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2023-04-18

Club coefficients in the UEFA Champions League: Time for shift to an Elo-based formula

Authors: László Csató

One of the most popular club football tournaments, the UEFA Champions League,
will see a fundamental reform from the 2024/25 season: the traditional group
stage will be replaced by one league where each of the 36 teams plays eight
matches. To guarantee that the opponents of the clubs are of the same strength
in the new design, it is crucial to forecast the performance of the teams
before the tournament as well as possible. This paper investigates whether the
currently used rating of the teams, the UEFA club coefficient, can be improved
by taking the games played in the national leagues into account. According to
our logistic regression models, a variant of the Elo method provides a higher
accuracy in terms of explanatory power in the Champions League matches. The
Union of European Football Associations (UEFA) is encouraged to follow the
example of the FIFA World Ranking and reform the calculation of the club
coefficients in order to avoid unbalanced schedules in the novel tournament
format of the Champions League.

arXiv link: http://arxiv.org/abs/2304.09078v6
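
For reference, a generic Elo update for club match results; the constants (base 10, scale 400, K-factor) are the conventional Elo choices and not necessarily the specific variant the paper evaluates.

```python
# Generic Elo rating update for a sequence of club matches.
def expected_score(r_home, r_away):
    """Probability-like expected score of the home club."""
    return 1.0 / (1.0 + 10.0 ** ((r_away - r_home) / 400.0))

def update_elo(ratings, home, away, result, k=20.0):
    """result: 1 home win, 0.5 draw, 0 away win."""
    e_home = expected_score(ratings[home], ratings[away])
    ratings[home] += k * (result - e_home)
    ratings[away] += k * ((1.0 - result) - (1.0 - e_home))

ratings = {"Club A": 1500.0, "Club B": 1500.0}
update_elo(ratings, "Club A", "Club B", result=1)   # Club A wins
print(ratings)   # Club A gains exactly the points Club B loses
```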

Econometrics arXiv updated paper (originally submitted: 2023-04-18)

Doubly Robust Estimators with Weak Overlap

Authors: Yukun Ma, Pedro H. C. Sant'Anna, Yuya Sasaki, Takuya Ura

In this paper, we derive a new class of doubly robust estimators for
treatment effect estimands that is also robust against weak covariate overlap.
Our proposed estimator relies on trimming observations with extreme propensity
scores and uses a bias correction device for trimming bias. Our framework
accommodates many research designs, such as unconfoundedness, local treatment
effects, and difference-in-differences. Simulation exercises illustrate that
our proposed tools indeed have attractive finite sample properties, which are
aligned with our theoretical asymptotic results.

arXiv link: http://arxiv.org/abs/2304.08974v2
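
A minimal sketch, under unconfoundedness, of a doubly robust (AIPW) ATE estimate with propensity-score trimming; the paper's bias-correction device for the trimming bias is not reproduced, and the trimming threshold and data-generating process are illustrative.

```python
# Sketch: AIPW (doubly robust) estimate of the ATE with propensity-score
# trimming under unconfoundedness. The threshold below is illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=(n, 3))
pscore_true = 1.0 / (1.0 + np.exp(-(2.0 * x[:, 0])))      # weak overlap in the tails
d = rng.binomial(1, pscore_true)
y = 1.0 + d + x @ np.array([0.5, -0.5, 0.2]) + rng.normal(size=n)

# Nuisance estimates: propensity score and outcome regressions.
ps = LogisticRegression().fit(x, d).predict_proba(x)[:, 1]
mu1 = LinearRegression().fit(x[d == 1], y[d == 1]).predict(x)
mu0 = LinearRegression().fit(x[d == 0], y[d == 0]).predict(x)

# Trim observations with extreme estimated propensity scores.
eps = 0.05
keep = (ps > eps) & (ps < 1 - eps)

aipw = (mu1 - mu0
        + d * (y - mu1) / ps
        - (1 - d) * (y - mu0) / (1 - ps))
print("trimmed AIPW ATE:", aipw[keep].mean())              # true ATE is 1
```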

Econometrics arXiv updated paper (originally submitted: 2023-04-17)

Adjustment with Many Regressors Under Covariate-Adaptive Randomizations

Authors: Liang Jiang, Liyao Li, Ke Miao, Yichong Zhang

Our paper discovers a new trade-off of using regression adjustments (RAs) in
causal inference under covariate-adaptive randomizations (CARs). On one hand,
RAs can improve the efficiency of causal estimators by incorporating
information from covariates that are not used in the randomization. On the
other hand, RAs can degrade estimation efficiency due to their estimation
errors, which are not asymptotically negligible when the number of regressors
is of the same order as the sample size. Ignoring the estimation errors of RAs
may result in serious over-rejection of causal inference under the null
hypothesis. To address the issue, we construct a new ATE estimator by optimally
linearly combining the estimators with and without RAs. We then develop a
unified inference theory for this estimator under CARs. It has two features:
(1) the Wald test based on it achieves the exact asymptotic size under the null
hypothesis, regardless of whether the number of covariates is fixed or diverges
no faster than the sample size; and (2) it guarantees weak efficiency
improvement over estimators both with and without RAs.

arXiv link: http://arxiv.org/abs/2304.08184v5
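
The linear-combination step can be illustrated with the textbook minimum-variance weighting of two estimates of the same target; the paper's CAR-specific construction and inference theory involve more structure than this sketch.

```python
# Sketch of the variance-minimizing linear combination of two ATE estimators
# (with and without regression adjustment). The numbers below are toy inputs.
import numpy as np

def optimal_combination(t_adj, t_unadj, v_adj, v_unadj, cov):
    """Combine two estimates of the same target with minimum variance."""
    lam = (v_unadj - cov) / (v_adj + v_unadj - 2.0 * cov)
    estimate = lam * t_adj + (1.0 - lam) * t_unadj
    variance = (lam**2 * v_adj + (1 - lam)**2 * v_unadj
                + 2 * lam * (1 - lam) * cov)
    return estimate, variance

# Toy numbers: the adjusted estimator is noisier here, so it gets less weight.
est, var = optimal_combination(t_adj=1.10, t_unadj=0.95,
                               v_adj=0.04, v_unadj=0.02, cov=0.01)
print(est, var)   # combined variance is below both input variances
```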

Econometrics arXiv updated paper (originally submitted: 2023-04-16)

Coarsened Bayesian VARs -- Correcting BVARs for Incorrect Specification

Authors: Florian Huber, Massimiliano Marcellino, Tobias Scheckel

Model misspecification in multivariate econometric models can strongly
influence estimates of quantities of interest such as structural parameters,
forecast distributions or responses to structural shocks, even more so if
higher-order forecasts or responses are considered, due to parameter
convolution. We propose
a simple method for addressing these specification issues in the context of
Bayesian VARs. Our method, called coarsened Bayesian VARs (cBVARs), replaces
the exact likelihood with a coarsened likelihood that takes into account that
the model might be misspecified along important but unknown dimensions. Since
endogenous variables in a VAR can feature different degrees of
misspecification, our model allows for this and automatically detects the
degree of misspecification. The resulting cBVARs perform well in simulations
for several types of misspecification. Applied to US data, cBVARs improve point
and density forecasts compared to standard BVARs.

arXiv link: http://arxiv.org/abs/2304.07856v3

Econometrics arXiv paper, submitted: 2023-04-16

Penalized Likelihood Inference with Survey Data

Authors: Joann Jasiak, Purevdorj Tuvaandorj

This paper extends three Lasso inferential methods, Debiased Lasso,
$C(\alpha)$ and Selective Inference to a survey environment. We establish the
asymptotic validity of the inference procedures in generalized linear models
with survey weights and/or heteroskedasticity. Moreover, we generalize the
methods to inference on nonlinear parameter functions e.g. the average marginal
effect in survey logit models. We illustrate the effectiveness of the approach
in simulated data and Canadian Internet Use Survey 2020 data.

arXiv link: http://arxiv.org/abs/2304.07855v1

Econometrics arXiv cross-link from physics.soc-ph (physics.soc-ph), submitted: 2023-04-15

Gini-stable Lorenz curves and their relation to the generalised Pareto distribution

Authors: Lucio Bertoli-Barsotti, Marek Gagolewski, Grzegorz Siudem, Barbara Żogała-Siudem

We introduce an iterative discrete information production process where we
can extend ordered normalised vectors by new elements based on a simple affine
transformation, while preserving the predefined level of inequality, G, as
measured by the Gini index.
Then, we derive the family of empirical Lorenz curves of the corresponding
vectors and prove that it is stochastically ordered with respect to both the
sample size and G which plays the role of the uncertainty parameter. We prove
that asymptotically, we obtain all, and only, Lorenz curves generated by a new,
intuitive parametrisation of the finite-mean Pickands' Generalised Pareto
Distribution (GPD) that unifies three other families, namely: the Pareto Type
II, exponential, and scaled beta distributions. The family is not only totally
ordered with respect to the parameter G, but also, thanks to our derivations,
has a nice underlying interpretation. Our result may thus shed a new light on
the genesis of this family of distributions.
Our model fits bibliometric, informetric, socioeconomic, and environmental
data reasonably well. It is quite user-friendly for it only depends on the
sample size and its Gini index.

arXiv link: http://arxiv.org/abs/2304.07480v3

Econometrics arXiv cross-link from physics.soc-ph (physics.soc-ph), submitted: 2023-04-15

Equivalence of inequality indices: Three dimensions of impact revisited

Authors: Lucio Bertoli-Barsotti, Marek Gagolewski, Grzegorz Siudem, Barbara Żogała-Siudem

Inequality is an inherent part of our lives: we see it in the distribution of
incomes, talents, resources, and citations, amongst many others. Its intensity
varies across different environments: from relatively evenly distributed ones,
to where a small group of stakeholders controls the majority of the available
resources. We would like to understand why inequality naturally arises as a
consequence of the natural evolution of any system. Studying simple
mathematical models governed by intuitive assumptions can bring many insights
into this problem. In particular, we recently observed (Siudem et al., PNAS
117:13896-13900, 2020) that impact distribution might be modelled accurately by
a time-dependent agent-based model involving a mixture of the rich-get-richer
and sheer chance components. Here we point out its relationship to an iterative
process that generates rank distributions of any length and a predefined level
of inequality, as measured by the Gini index.
Many indices quantifying the degree of inequality have been proposed. Which
of them is the most informative? We show that, under our model, indices such as
the Bonferroni, De Vergottini, and Hoover ones are equivalent. Given one of
them, we can recreate the value of any other measure using the derived
functional relationships. Also, thanks to the obtained formulae, we can
understand how they depend on the sample size. An empirical analysis of a large
sample of citation records in economics (RePEc) as well as countrywise family
income data, confirms our theoretical observations. Therefore, we can safely
and effectively remain faithful to the simplest measure: the Gini index.

arXiv link: http://arxiv.org/abs/2304.07479v1

Econometrics arXiv paper, submitted: 2023-04-14

Generalized Automatic Least Squares: Efficiency Gains from Misspecified Heteroscedasticity Models

Authors: Bulat Gafarov

It is well known that in the presence of heteroscedasticity the ordinary least
squares estimator is not efficient. I propose a generalized automatic least
squares estimator (GALS) that makes partial correction of heteroscedasticity
based on a (potentially) misspecified model without a pretest. Such an
estimator is guaranteed to be at least as efficient as either OLS or WLS but
can provide some asymptotic efficiency gains over OLS if the misspecified model
is approximately correct. If the heteroscedasticity model is correct, the
proposed estimator achieves full asymptotic efficiency. The idea is to frame
the moment conditions corresponding to OLS and to WLS based on the misspecified
heteroscedasticity model as a joint generalized method of moments estimation problem.
The resulting optimal GMM estimator is equivalent to a feasible GLS with
estimated weight matrix. I also propose an optimal GMM variance-covariance
estimator for GALS to account for any remaining heteroscedasticity in the
residuals.

arXiv link: http://arxiv.org/abs/2304.07331v1
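
A minimal sketch of the joint-GMM idea, assuming a deliberately misspecified skedastic model: OLS and WLS moment conditions are stacked and combined by two-step GMM with an optimal weighting matrix. All data and tuning choices are illustrative, and the sketch does not reproduce the paper's variance-covariance estimator.

```python
# Sketch: joint GMM combining OLS and (possibly misspecified) WLS moments.
import numpy as np

rng = np.random.default_rng(0)
n, k = 2000, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
sigma = np.exp(0.5 * X[:, 1])                  # true heteroskedasticity
y = X @ np.array([1.0, 2.0]) + sigma * rng.normal(size=n)

sigma2_hat = 1.0 + X[:, 1] ** 2                # misspecified skedastic model
D = 1.0 / sigma2_hat                           # WLS weights

def stacked_moments(beta):
    """Per-observation stacked OLS and WLS moments, shape (n, 2k)."""
    resid = y - X @ beta
    return np.column_stack([X * resid[:, None], X * (D * resid)[:, None]])

def gmm_beta(W):
    """Closed-form GMM estimate for moments that are linear in beta."""
    M = np.vstack([X.T @ X, X.T @ (D[:, None] * X)]) / n   # moment Jacobian (2k x k)
    m = np.concatenate([X.T @ y, X.T @ (D * y)]) / n
    return np.linalg.solve(M.T @ W @ M, M.T @ W @ m)

beta1 = gmm_beta(np.eye(2 * k))                # first step: identity weighting
Omega = stacked_moments(beta1).T @ stacked_moments(beta1) / n
beta2 = gmm_beta(np.linalg.inv(Omega))         # second step: optimal weighting
print(beta2)                                   # close to (1, 2)
```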

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-04-14

Detection and Estimation of Structural Breaks in High-Dimensional Functional Time Series

Authors: Degui Li, Runze Li, Han Lin Shang

In this paper, we consider detecting and estimating breaks in heterogeneous
mean functions of high-dimensional functional time series which are allowed to
be cross-sectionally correlated and temporally dependent. A new test statistic
combining the functional CUSUM statistic and power enhancement component is
proposed with asymptotic null distribution theory comparable to the
conventional CUSUM theory derived for a single functional time series. In
particular, the extra power enhancement component enlarges the region where the
proposed test has power, and results in stable power performance when breaks
are sparse in the alternative hypothesis. Furthermore, we impose a latent group
structure on the subjects with heterogeneous break points and introduce an
easy-to-implement clustering algorithm with an information criterion to
consistently estimate the unknown group number and membership. The estimated
group structure can subsequently improve the convergence property of the
post-clustering break point estimate. Monte-Carlo simulation studies and
empirical applications show that the proposed estimation and testing techniques
have satisfactory performance in finite samples.

arXiv link: http://arxiv.org/abs/2304.07003v1

Econometrics arXiv paper, submitted: 2023-04-13

Predictive Incrementality by Experimentation (PIE) for Ad Measurement

Authors: Brett R. Gordon, Robert Moakler, Florian Zettelmeyer

We present a novel approach to causal measurement for advertising, namely to
use exogenous variation in advertising exposure (RCTs) for a subset of ad
campaigns to build a model that can predict the causal effect of ad campaigns
that were run without RCTs. This approach -- Predictive Incrementality by
Experimentation (PIE) -- frames the task of estimating the causal effect of an
ad campaign as a prediction problem, with the unit of observation being an RCT
itself. In contrast, traditional causal inference approaches with observational
data seek to adjust covariate imbalance at the user level. A key insight is to
use post-campaign features, such as last-click conversion counts, that do not
require an RCT, as features in our predictive model. We find that our PIE model
recovers RCT-derived incremental conversions per dollar (ICPD) much better than
the program evaluation approaches analyzed in Gordon et al. (forthcoming). The
prediction errors from the best PIE model are 48%, 42%, and 62% of the
RCT-based average ICPD for upper-, mid-, and lower-funnel conversion outcomes,
respectively. In contrast, across the same data, the average prediction error
of stratified propensity score matching exceeds 491%, and that of
double/debiased machine learning exceeds 2,904%. Using a decision-making
framework inspired by industry, we show that PIE leads to different decisions
compared to RCTs for only 6% of upper-funnel, 7% of mid-funnel, and 13% of
lower-funnel outcomes. We conclude that PIE could enable advertising platforms
to scale causal ad measurement by extrapolating from a limited number of RCTs
to a large set of non-experimental ad campaigns.

arXiv link: http://arxiv.org/abs/2304.06828v1

Econometrics arXiv updated paper (originally submitted: 2023-04-12)

GDP nowcasting with artificial neural networks: How much does long-term memory matter?

Authors: Kristóf Németh, Dániel Hadházi

We apply artificial neural networks (ANNs) to nowcast quarterly GDP growth
for the U.S. economy. Using the monthly FRED-MD database, we compare the
nowcasting performance of five different ANN architectures: the multilayer
perceptron (MLP), the one-dimensional convolutional neural network (1D CNN),
the Elman recurrent neural network (RNN), the long short-term memory network
(LSTM), and the gated recurrent unit (GRU). The empirical analysis presents
results from two distinctively different evaluation periods. The first (2012:Q1
-- 2019:Q4) is characterized by balanced economic growth, while the second
(2012:Q1 -- 2024:Q2) also includes periods of the COVID-19 recession. During
the first evaluation period, longer input sequences slightly improve nowcasting
performance for some ANNs, but the best accuracy is still achieved with
8-month-long input sequences at the end of the nowcasting window. Results from
the second test period depict the role of long-term memory even more clearly.
The MLP, the 1D CNN, and the Elman RNN work best with 8-month-long input
sequences at each step of the nowcasting window. The relatively weak
performance of the gated RNNs also suggests that architectural features
enabling long-term memory do not result in more accurate nowcasts for GDP
growth. The combined results indicate that the 1D CNN seems to represent a
“sweet spot” between the simple time-agnostic MLP and the more
complex (gated) RNNs. The network generates nearly as accurate nowcasts as the
best competitor for the first test period, while it achieves the overall best
accuracy during the second evaluation period. Consequently, as a first in the
literature, we propose the application of the 1D CNN for economic nowcasting.

arXiv link: http://arxiv.org/abs/2304.05805v4
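
A minimal PyTorch sketch of a 1D CNN mapping an 8-month window of monthly indicators to a quarterly GDP growth nowcast; the architecture sizes and data are placeholders, not the tuned specification evaluated in the paper.

```python
# Minimal 1D CNN nowcasting sketch: 8-month window of monthly indicators in,
# one quarterly GDP growth figure out. Sizes are placeholders.
import torch
import torch.nn as nn

class Nowcast1DCNN(nn.Module):
    def __init__(self, n_indicators, window=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_indicators, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(16, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(16 * window, 1),
        )

    def forward(self, x):             # x: (batch, n_indicators, window)
        return self.net(x).squeeze(-1)

model = Nowcast1DCNN(n_indicators=10)
x = torch.randn(32, 10, 8)            # 32 samples, 10 indicators, 8 months
y = torch.randn(32)                   # quarterly GDP growth targets
loss = nn.MSELoss()(model(x), y)
loss.backward()                       # one optimizer step would follow
print(float(loss))
```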

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2023-04-11

Financial Time Series Forecasting using CNN and Transformer

Authors: Zhen Zeng, Rachneet Kaur, Suchetha Siddagangappa, Saba Rahimi, Tucker Balch, Manuela Veloso

Time series forecasting is important across various domains for
decision-making. In particular, financial time series such as stock prices can
be hard to predict as it is difficult to model short-term and long-term
temporal dependencies between data points. Convolutional Neural Networks (CNN)
are good at capturing local patterns for modeling short-term dependencies.
However, CNNs cannot learn long-term dependencies due to the limited receptive
field. Transformers on the other hand are capable of learning global context
and long-term dependencies. In this paper, we propose to harness the power of
CNNs and Transformers to model both short-term and long-term dependencies
within a time series, and to forecast whether the price will go up, down, or
remain the same (flat) in the future. In our experiments, we demonstrate the success of
the proposed method in comparison to commonly adopted statistical and deep
learning methods on forecasting intraday stock price change of S&P 500
constituents.

arXiv link: http://arxiv.org/abs/2304.04912v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-04-06

Adaptive Student's t-distribution with method of moments moving estimator for nonstationary time series

Authors: Jarek Duda

Real-life time series are usually nonstationary, raising the difficult question
of model adaptation. Classical approaches like ARMA-ARCH assume an arbitrary
type of dependence. To avoid their bias, we focus on the recently proposed
agnostic philosophy of the moving estimator: at time $t$, finding parameters
that optimize a moving log-likelihood such as $F_t=\sum_{\tau<t}
(1-\eta)^{t-\tau} \ln(\rho_\theta (x_\tau))$, which evolves in time. It allows,
for example, estimating parameters using inexpensive exponential moving
averages (EMA), like absolute central moments $m_p=E[|x-\mu|^p]$ evolving for
one or multiple powers $p\in \mathbb{R}^+$ via $m_{p,t+1} = m_{p,t} + \eta
(|x_t-\mu_t|^p-m_{p,t})$. The application of such general adaptive methods of
moments is presented for the Student's t-distribution, popular especially in
economic applications, here applied to log-returns of DJIA companies. While
standard ARMA-ARCH approaches provide the evolution of $\mu$ and $\sigma$, here
we also obtain the evolution of $\nu$, which describes the $\rho(x)\sim
|x|^{-\nu-1}$ tail shape and hence the probability of extreme events, which
might turn out catastrophic and destabilize the market.

arXiv link: http://arxiv.org/abs/2304.03069v4
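
A small sketch of the moving method-of-moments update quoted above, $m_{p,t+1} = m_{p,t} + \eta (|x_t-\mu_t|^p-m_{p,t})$; the EMA update of the mean $\mu_t$ is an added assumption for the illustration, and the mapping from evolving moments to Student's t parameters is not reproduced.

```python
# Sketch of the moving method-of-moments updates from the abstract:
# m_{p,t+1} = m_{p,t} + eta * (|x_t - mu_t|^p - m_{p,t}).
import numpy as np

def moving_central_moments(x, powers=(1.0, 2.0), eta=0.05):
    """Exponential-moving-average estimates of the mean and central moments."""
    mu = x[0]
    m = {p: 1.0 for p in powers}                 # crude initial values
    path = []
    for x_t in x:
        for p in powers:
            m[p] += eta * (abs(x_t - mu) ** p - m[p])
        mu += eta * (x_t - mu)                   # assumed EMA update of the mean
        path.append((mu, dict(m)))
    return path

rng = np.random.default_rng(0)
returns = rng.standard_t(df=4, size=1000) * 0.01   # heavy-tailed toy "log-returns"
print(moving_central_moments(returns)[-1])          # final (mu_t, {m_p}) estimates
```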

Econometrics arXiv updated paper (originally submitted: 2023-04-05)

Faster estimation of dynamic discrete choice models using index invertibility

Authors: Jackson Bunting, Takuya Ura

Many estimators of dynamic discrete choice models with persistent unobserved
heterogeneity have desirable statistical properties but are computationally
intensive. In this paper we propose a method to quicken estimation for a broad
class of dynamic discrete choice problems by exploiting semiparametric index
restrictions. Specifically, we propose an estimator for models whose reduced
form parameters are invertible functions of one or more linear indices (Ahn,
Ichimura, Powell and Ruud 2018), a property we term index invertibility. We
establish that index invertibility implies a set of equality constraints on the
model parameters. Our proposed estimator uses the equality constraints to
decrease the dimension of the optimization problem, thereby generating
computational gains. Our main result shows that the proposed estimator is
asymptotically equivalent to the unconstrained, computationally heavy
estimator. In addition, we provide a series of results on the number of
independent index restrictions on the model parameters, providing theoretical
guidance on the extent of computational gains. Finally, we demonstrate the
advantages of our approach via Monte Carlo simulations.

arXiv link: http://arxiv.org/abs/2304.02171v4

Econometrics arXiv updated paper (originally submitted: 2023-04-04)

Individual Welfare Analysis: Random Quasilinear Utility, Independence, and Confidence Bounds

Authors: Junlong Feng, Sokbae Lee

We introduce a novel framework for individual-level welfare analysis. It
builds on a parametric model for continuous demand with a quasilinear utility
function, allowing for heterogeneous coefficients and unobserved
individual-good-level preference shocks. We obtain bounds on the
individual-level consumer welfare loss at any confidence level due to a
hypothetical price increase, solving a scalable optimization problem
constrained by a novel confidence set under an independence restriction. This
confidence set is computationally simple and robust to weak instruments,
nonlinearity, and partial identification. The validity of the confidence set is
guaranteed by our new results on the joint limiting distribution of the
independence test by Chatterjee (2021). These results together with the
confidence set may have applications beyond welfare analysis. Monte Carlo
simulations and two empirical applications on gasoline and food demand
demonstrate the effectiveness of our method.

arXiv link: http://arxiv.org/abs/2304.01921v4

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2023-04-04

Torch-Choice: A PyTorch Package for Large-Scale Choice Modeling with Python

Authors: Tianyu Du, Ayush Kanodia, Susan Athey

torch-choice is an open-source library for flexible, fast
choice modeling with Python and PyTorch. torch-choice provides a
ChoiceDataset data structure to manage databases flexibly and
memory-efficiently. The paper demonstrates how to construct a
ChoiceDataset from databases of various formats and illustrates the
functionalities of ChoiceDataset. The package implements two widely used models,
namely the multinomial logit and nested logit models, and supports
regularization during model estimation. The package incorporates the option to
take advantage of GPUs for estimation, allowing it to scale to massive datasets
while being computationally efficient. Models can be initialized using either
R-style formula strings or Python dictionaries. We conclude with a comparison
of the computational efficiencies of torch-choice and
mlogit in R as (1) the number of observations increases, (2) the
number of covariates increases, and (3) the item sets expand. Finally, we
demonstrate the scalability of torch-choice on large-scale datasets.

arXiv link: http://arxiv.org/abs/2304.01906v4

Econometrics arXiv updated paper (originally submitted: 2023-04-03)

Heterogeneity-robust granular instruments

Authors: Eric Qian

Granular instrumental variables (GIV) has experienced sharp growth in
empirical macro-finance. The methodology's rise showcases granularity's
potential for identification across many economic environments, like the
estimation of spillovers and demand systems. I propose a new estimator--called
robust granular instrumental variables (RGIV)--that enables studying unit-level
heterogeneity in spillovers. Unlike existing methods that assume heterogeneity
is a function of observables, RGIV leaves heterogeneity unrestricted. In
contrast to the baseline GIV estimator, RGIV allows for unknown shock variances
and equal-sized units. Applied to the Euro area, I find strong evidence of
country-level heterogeneity in sovereign yield spillovers.

arXiv link: http://arxiv.org/abs/2304.01273v3

Econometrics arXiv paper, submitted: 2023-04-03

Testing for idiosyncratic Treatment Effect Heterogeneity

Authors: Jaime Ramirez-Cuellar

This paper provides asymptotically valid tests for the null hypothesis of no
treatment effect heterogeneity. Importantly, I consider the presence of
heterogeneity that is not explained by observed characteristics, or so-called
idiosyncratic heterogeneity. When examining this heterogeneity, common
statistical tests encounter a nuisance parameter problem in the average
treatment effect which renders the asymptotic distribution of the test
statistic dependent on that parameter. I propose an asymptotically valid test
that circumvents the estimation of that parameter using the empirical
characteristic function. A simulation study illustrates not only the test's
validity but its higher power in rejecting a false null as compared to current
tests. Furthermore, I show the method's usefulness through its application to a
microfinance experiment in Bosnia and Herzegovina. In this experiment and for
outcomes related to loan take-up and self-employment, the tests suggest that
treatment effect heterogeneity does not seem to be completely accounted for by
baseline characteristics. For those outcomes, researchers could potentially try
to collect more baseline characteristics to inspect the remaining treatment
effect heterogeneity, and potentially, improve treatment targeting.

arXiv link: http://arxiv.org/abs/2304.01141v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-04-03

Artificial neural networks and time series of counts: A class of nonlinear INGARCH models

Authors: Malte Jahn

Time series of counts are frequently analyzed using generalized
integer-valued autoregressive models with conditional heteroskedasticity
(INGARCH). These models employ response functions to map a vector of past
observations and past conditional expectations to the conditional expectation
of the present observation. In this paper, it is shown how INGARCH models can
be combined with artificial neural network (ANN) response functions to obtain a
class of nonlinear INGARCH models. The ANN framework allows for the
interpretation of many existing INGARCH models as a degenerate version of a
corresponding neural model. Details on maximum likelihood estimation, marginal
effects and confidence intervals are given. The empirical analysis of time
series of bounded and unbounded counts reveals that the neural INGARCH models
are able to outperform reasonable degenerate competitor models in terms of
information loss.

arXiv link: http://arxiv.org/abs/2304.01025v1
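
A minimal PyTorch sketch of an INGARCH(1,1)-style recursion with a small neural response function, $\lambda_t = softplus(ANN(y_{t-1}, \lambda_{t-1}))$, trained by maximizing a Poisson likelihood; the architecture, data, and training loop are illustrative assumptions rather than the models studied in the paper.

```python
# Sketch of an INGARCH(1,1)-style recursion with a small neural response
# function, fitted by Poisson maximum likelihood on a toy count series.
import torch
import torch.nn as nn

class NeuralINGARCH(nn.Module):
    def __init__(self, hidden=8):
        super().__init__()
        self.ann = nn.Sequential(nn.Linear(2, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, y):
        lam, lams = torch.ones(()), []
        for t in range(1, len(y)):
            z = torch.stack([y[t - 1], lam])                  # (y_{t-1}, lambda_{t-1})
            lam = nn.functional.softplus(self.ann(z)).squeeze()
            lams.append(lam)
        return torch.stack(lams)

torch.manual_seed(0)
y = torch.poisson(3.0 * torch.ones(200))                      # toy count series
model = NeuralINGARCH()
opt = torch.optim.Adam(model.parameters(), lr=0.05)
for _ in range(200):
    lam = model(y)
    nll = (lam - y[1:] * torch.log(lam)).mean()               # Poisson negative log-likelihood
    opt.zero_grad()
    nll.backward()
    opt.step()
print(float(lam.mean()))   # fitted intensity should drift toward the series mean (about 3)
```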

Econometrics arXiv paper, submitted: 2023-04-03

Testing and Identifying Substitution and Complementarity Patterns

Authors: Rui Wang

This paper studies semiparametric identification of substitution and
complementarity patterns between two goods using a panel multinomial choice
model with bundles. The model allows the two goods to be either substitutes or
complements and admits heterogeneous complementarity through observed
characteristics. I first provide testable implications for the complementarity
relationship between goods. I then characterize the sharp identified set for
the model parameters and provide sufficient conditions for point
identification. The identification analysis accommodates endogenous covariates
through flexible dependence structures between observed characteristics and
fixed effects while placing no distributional assumptions on unobserved
preference shocks. My method is shown to perform more robustly than the
parametric method through Monte Carlo simulations. As an extension, I allow for
unobserved heterogeneity in the complementarity, investigate scenarios
involving more than two goods, and study a class of nonseparable utility
functions.

arXiv link: http://arxiv.org/abs/2304.00678v1

Econometrics arXiv updated paper (originally submitted: 2023-04-02)

IV Regressions without Exclusion Restrictions

Authors: Wayne Yuan Gao, Rui Wang

We study identification and estimation of endogenous linear and nonlinear
regression models without excluded instrumental variables, based on the
standard mean independence condition and a nonlinear relevance condition. Based
on the identification results, we propose two semiparametric estimators as well
as a discretization-based estimator that does not require any nonparametric
regressions. We establish their asymptotic normality and demonstrate via
simulations their robust finite-sample performances with respect to exclusion
restrictions violations and endogeneity. Our approach is applied to study the
returns to education, and to test the direct effects of college proximity
indicators as well as family background variables on the outcome.

arXiv link: http://arxiv.org/abs/2304.00626v3

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2023-03-31

Hypothesis testing on invariant subspaces of non-diagonalizable matrices with applications to network statistics

Authors: Jérôme R. Simons

We generalise the inference procedure for eigenvectors of symmetrizable
matrices of Tyler (1981) to that of invariant and singular subspaces of
non-diagonalizable matrices. Wald tests for invariant vectors and $t$-tests for
their individual coefficients perform well in simulations, despite the matrix
being not symmetric. Using these results, it is now possible to perform
inference on network statistics that depend on eigenvectors of non-symmetric
adjacency matrices as they arise in empirical applications from directed
networks. Further, we find that statisticians only need control over the
first-order Davis-Kahan bound to control convergence rates of invariant
subspace estimators to higher orders. For general invariant subspaces, the
minimal eigenvalue separation dominates the first-order bound potentially
slowing convergence rates considerably. In an example, we find that accounting
for uncertainty in network estimates changes empirical conclusions about the
ranking of nodes' popularity.

arXiv link: http://arxiv.org/abs/2303.18233v5

Econometrics arXiv paper, submitted: 2023-03-27

Under-Identification of Structural Models Based on Timing and Information Set Assumptions

Authors: Daniel Ackerberg, Garth Frazer, Kyoo il Kim, Yao Luo, Yingjun Su

We revisit identification based on timing and information set assumptions in
structural models, which have been used in the context of production functions,
demand equations, and hedonic pricing models (e.g. Olley and Pakes (1996),
Blundell and Bond (2000)). First, we demonstrate a general under-identification
problem using these assumptions in a simple version of the Blundell-Bond
dynamic panel model. In particular, the basic moment conditions can yield
multiple discrete solutions: one at the persistence parameter in the main
equation and another at the persistence parameter governing the regressor. We
then show that the problem can persist in a broader set of models but
disappears in models under stronger timing assumptions. We then propose
possible solutions in the simple setting by enforcing an assumed sign
restriction and conclude by using lessons from our basic identification
approach to propose more general practical advice for empirical researchers.

arXiv link: http://arxiv.org/abs/2303.15170v1

Econometrics arXiv updated paper (originally submitted: 2023-03-24)

Sensitivity Analysis in Unconditional Quantile Effects

Authors: Julian Martinez-Iriarte

This paper proposes a framework to analyze the effects of counterfactual
policies on the unconditional quantiles of an outcome variable. For a given
counterfactual policy, we obtain identified sets for the effect of both
marginal and global changes in the proportion of treated individuals. To
conduct a sensitivity analysis, we introduce the quantile breakdown frontier, a
curve that (i) indicates whether a sensitivity analysis is possible or not, and
(ii) when a sensitivity analysis is possible, quantifies the amount of
selection bias consistent with a given conclusion of interest across different
quantiles. To illustrate our method, we perform a sensitivity analysis on the
effect of unionizing low income workers on the quantiles of the distribution of
(log) wages.

arXiv link: http://arxiv.org/abs/2303.14298v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-03-24

Synthetic Combinations: A Causal Inference Framework for Combinatorial Interventions

Authors: Abhineet Agarwal, Anish Agarwal, Suhas Vijaykumar

Consider a setting where there are $N$ heterogeneous units and $p$
interventions. Our goal is to learn unit-specific potential outcomes for any
combination of these $p$ interventions, i.e., $N \times 2^p$ causal parameters.
Choosing a combination of interventions is a problem that naturally arises in a
variety of applications such as factorial design experiments, recommendation
engines, combination therapies in medicine, conjoint analysis, etc. Running $N
\times 2^p$ experiments to estimate the various parameters is likely expensive
and/or infeasible as $N$ and $p$ grow. Further, with observational data there
is likely confounding, i.e., whether or not a unit is seen under a combination
is correlated with its potential outcome under that combination. To address
these challenges, we propose a novel latent factor model that imposes structure
across units (i.e., the matrix of potential outcomes is approximately rank
$r$), and combinations of interventions (i.e., the coefficients in the Fourier
expansion of the potential outcomes are approximately $s$-sparse). We establish
identification for all $N \times 2^p$ parameters despite unobserved
confounding. We propose an estimation procedure, Synthetic Combinations, and
establish it is finite-sample consistent and asymptotically normal under
precise conditions on the observation pattern. Our results imply consistent
estimation given $poly(r) \times \left( N + s^2p\right)$ observations,
while previous methods have sample complexity scaling as $\min(N \times s^2p,\;
poly(r) \times (N + 2^p))$. We use Synthetic Combinations to propose a
data-efficient experimental design. Empirically, Synthetic Combinations
outperforms competing approaches on a real-world dataset on movie
recommendations. Lastly, we extend our analysis to do causal inference where
the intervention is a permutation over $p$ items (e.g., rankings).

arXiv link: http://arxiv.org/abs/2303.14226v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2023-03-24

On the failure of the bootstrap for Chatterjee's rank correlation

Authors: Zhexiao Lin, Fang Han

While researchers commonly use the bootstrap for statistical inference, many
of us have realized that the standard bootstrap, in general, does not work for
Chatterjee's rank correlation. In this paper, we provide proof of this issue
under an additional independence assumption, and complement our theory with
simulation evidence for general settings. Chatterjee's rank correlation thus
falls into a category of statistics that are asymptotically normal but
bootstrap inconsistent. Valid inferential methods in this case are Chatterjee's
original proposal (for testing independence) and Lin and Han (2022)'s analytic
asymptotic variance estimator (for more general purposes).

arXiv link: http://arxiv.org/abs/2303.14088v2
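
For reference, a small sketch of Chatterjee's rank correlation $\xi_n$ (ignoring ties), the statistic whose bootstrap inconsistency is at issue here.

```python
# Sketch of Chatterjee's rank correlation xi_n (no ties).
import numpy as np

def chatterjee_xi(x, y):
    """xi_n = 1 - 3 * sum |r_{i+1} - r_i| / (n^2 - 1), after sorting the pairs by x."""
    order = np.argsort(x)                            # sort pairs by x
    ranks = np.argsort(np.argsort(y[order])) + 1     # ranks of y in that order
    n = len(x)
    return 1.0 - 3.0 * np.abs(np.diff(ranks)).sum() / (n**2 - 1)

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
print(round(chatterjee_xi(x, x**2), 2))                    # noiseless dependence: close to 1
print(round(chatterjee_xi(x, rng.normal(size=2000)), 2))   # independence: near 0
```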

Econometrics arXiv paper, submitted: 2023-03-24

Point Identification of LATE with Two Imperfect Instruments

Authors: Rui Wang

This paper characterizes point identification results of the local average
treatment effect (LATE) using two imperfect instruments. The classical approach
(Imbens and Angrist (1994)) establishes the identification of LATE via an
instrument that satisfies exclusion, monotonicity, and independence. However,
it may be challenging to find a single instrument that satisfies all these
assumptions simultaneously. My paper uses two instruments but imposes weaker
assumptions on both instruments. The first instrument is allowed to violate the
exclusion restriction and the second instrument does not need to satisfy
monotonicity. Therefore, the first instrument can affect the outcome via both
direct effects and a shift in the treatment status. The direct effects can be
identified via exogenous variation in the second instrument and therefore the
local average treatment effect is identified. An estimator is proposed, and
using Monte Carlo simulations, it is shown to perform more robustly than the
instrumental variable estimand.

arXiv link: http://arxiv.org/abs/2303.13795v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2023-03-23

Bootstrap-Assisted Inference for Generalized Grenander-type Estimators

Authors: Matias D. Cattaneo, Michael Jansson, Kenichi Nagasawa

Westling and Carone (2020) proposed a framework for studying the large sample
distributional properties of generalized Grenander-type estimators, a versatile
class of nonparametric estimators of monotone functions. The limiting
distribution of those estimators is representable as the left derivative of the
greatest convex minorant of a Gaussian process whose monomial mean can be of
unknown order (when the degree of flatness of the function of interest is
unknown). The standard nonparametric bootstrap is unable to consistently
approximate the large sample distribution of the generalized Grenander-type
estimators even if the monomial order of the mean is known, making statistical
inference a challenging endeavour in applications. To address this inferential
problem, we present a bootstrap-assisted inference procedure for generalized
Grenander-type estimators. The procedure relies on a carefully crafted, yet
automatic, transformation of the estimator. Moreover, our proposed method can
be made “flatness robust” in the sense that it can be made adaptive to the
(possibly unknown) degree of flatness of the function of interest. The method
requires only the consistent estimation of a single scalar quantity, for which
we propose an automatic procedure based on numerical derivative estimation and
the generalized jackknife. Under random sampling, our inference method can be
implemented using a computationally attractive exchangeable bootstrap
procedure. We illustrate our methods with examples and we also provide a small
simulation study. The development of formal results is made possible by some
technical results that may be of independent interest.

arXiv link: http://arxiv.org/abs/2303.13598v3

Econometrics arXiv updated paper (originally submitted: 2023-03-23)

Sequential Cauchy Combination Test for Multiple Testing Problems with Financial Applications

Authors: Nabil Bouamara, Sébastien Laurent, Shuping Shi

We introduce a simple tool to control for false discoveries and identify
individual signals in scenarios involving many tests, dependent test
statistics, and potentially sparse signals. The tool applies the Cauchy
combination test recursively on a sequence of expanding subsets of $p$-values
and is referred to as the sequential Cauchy combination test. While the
original Cauchy combination test aims to make a global statement about a set of
null hypotheses by summing transformed $p$-values, our sequential version
determines which $p$-values trigger the rejection of the global null. The
sequential test achieves strong familywise error rate control, exhibits less
conservatism compared to existing controlling procedures when dealing with
dependent test statistics, and provides a power boost. As illustrations, we
revisit two well-known large-scale multiple testing problems in finance for
which the test statistics have either serial dependence or cross-sectional
dependence, namely monitoring drift bursts in asset prices and searching for
assets with a nonzero alpha. In both applications, the sequential Cauchy
combination test proves to be a preferable alternative. It overcomes many of
the drawbacks inherent to inequality-based controlling procedures, extreme
value approaches, resampling and screening methods, and it improves the power
in simulations, leading to distinct empirical outcomes.

arXiv link: http://arxiv.org/abs/2303.13406v2
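
A sketch of the Cauchy combination building block and an expanding-subset recursion over sorted $p$-values; the exact sequencing and rejection rule of the proposed sequential test are not reproduced here, so this only illustrates the idea described in the abstract.

```python
# Sketch: Cauchy combination p-value and a simple expanding-subset recursion
# over sorted p-values (illustration only, not the paper's sequential rule).
import numpy as np

def cauchy_combination_pvalue(pvals, weights=None):
    """Global p-value from the Cauchy combination of individual p-values."""
    p = np.asarray(pvals, dtype=float)
    w = np.full(p.size, 1.0 / p.size) if weights is None else np.asarray(weights)
    t = np.sum(w * np.tan((0.5 - p) * np.pi))
    return 0.5 - np.arctan(t) / np.pi      # survival function of a standard Cauchy

def expanding_subset_pvalues(pvals):
    """Apply the combination test to expanding subsets of the sorted p-values."""
    p_sorted = np.sort(pvals)
    return [cauchy_combination_pvalue(p_sorted[: k + 1]) for k in range(p_sorted.size)]

rng = np.random.default_rng(0)
pvals = np.concatenate([rng.uniform(size=95), 1e-6 * rng.uniform(size=5)])  # 5 true signals
print(cauchy_combination_pvalue(pvals))          # very small: global null rejected
print(expanding_subset_pvalues(pvals)[:8])       # tiny for the subsets containing the signals
```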

Econometrics arXiv updated paper (originally submitted: 2023-03-23)

Uncertain Short-Run Restrictions and Statistically Identified Structural Vector Autoregressions

Authors: Sascha A. Keweloh

This study proposes a combination of a statistical identification approach
with potentially invalid short-run zero restrictions. The estimator shrinks
towards imposed restrictions and stops shrinkage when the data provide evidence
against a restriction. Simulation results demonstrate how incorporating valid
restrictions through the shrinkage approach enhances the accuracy of the
statistically identified estimator and how the impact of invalid restrictions
decreases with the sample size. The estimator is applied to analyze the
interaction between the stock and oil market. The results indicate that
incorporating stock market data into the analysis is crucial, as it enables the
identification of information shocks, which are shown to be important drivers
of the oil price.

arXiv link: http://arxiv.org/abs/2303.13281v2

Econometrics arXiv paper, submitted: 2023-03-23

Functional-Coefficient Quantile Regression for Panel Data with Latent Group Structure

Authors: Xiaorong Yang, Jia Chen, Degui Li, Runze Li

This paper considers estimating functional-coefficient models in panel
quantile regression with individual effects, allowing the cross-sectional and
temporal dependence for large panel observations. A latent group structure is
imposed on the heterogeneous quantile regression models so that the number of
nonparametric functional coefficients to be estimated can be reduced
considerably. With the preliminary local linear quantile estimates of the
subject-specific functional coefficients, a classic agglomerative clustering
algorithm is used to estimate the unknown group structure and an
easy-to-implement ratio criterion is proposed to determine the group number.
The estimated group number and structure are shown to be consistent.
Furthermore, a post-grouping local linear smoothing method is introduced to
estimate the group-specific functional coefficients, and the relevant
asymptotic normal distribution theory is derived with a normalisation rate
comparable to that in the literature. The developed methodologies and theory
are verified through a simulation study and showcased with an application to
house price data from UK local authority districts, which reveals different
homogeneity structures at different quantile levels.

arXiv link: http://arxiv.org/abs/2303.13218v1

Econometrics arXiv cross-link from stat.CO (stat.CO), submitted: 2023-03-23

sparseDFM: An R Package to Estimate Dynamic Factor Models with Sparse Loadings

Authors: Luke Mosley, Tak-Shing Chan, Alex Gibberd

sparseDFM is an R package for the implementation of popular estimation
methods for dynamic factor models (DFMs) including the novel Sparse DFM
approach of Mosley et al. (2023). The Sparse DFM ameliorates interpretability
issues of factor structure in classic DFMs by constraining the loading matrices
to have few non-zero entries (i.e. are sparse). Mosley et al. (2023) construct
an efficient expectation maximisation (EM) algorithm to enable estimation of
model parameters using a regularised quasi-maximum likelihood. We provide
detail on the estimation strategy in this paper and show how we implement this
in a computationally efficient way. We then provide two real-data case studies
to act as tutorials on how one may use the sparseDFM package. The first case
study focuses on summarising the structure of a small subset of quarterly CPI
(consumer price inflation) index data for the UK, while the second applies the
package to a large-scale set of monthly time series for the purpose of
nowcasting nine of the main trade commodities the UK exports worldwide.

arXiv link: http://arxiv.org/abs/2303.14125v1

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2023-03-22

Forecasting Large Realized Covariance Matrices: The Benefits of Factor Models and Shrinkage

Authors: Rafael Alves, Diego S. de Brito, Marcelo C. Medeiros, Ruy M. Ribeiro

We propose a model to forecast large realized covariance matrices of returns,
applying it to the constituents of the S&P 500 daily. To address the curse of
dimensionality, we decompose the return covariance matrix using standard
firm-level factors (e.g., size, value, and profitability) and use sectoral
restrictions in the residual covariance matrix. This restricted model is then
estimated using vector heterogeneous autoregressive (VHAR) models with the
least absolute shrinkage and selection operator (LASSO). Our methodology
improves forecasting precision relative to standard benchmarks and leads to
better estimates of minimum variance portfolios.

arXiv link: http://arxiv.org/abs/2303.16151v1

Econometrics arXiv updated paper (originally submitted: 2023-03-22)

Don't (fully) exclude me, it's not necessary! Causal inference with semi-IVs

Authors: Christophe Bruneel-Zupanc

This paper proposes semi-instrumental variables (semi-IVs) as an alternative
to instrumental variables (IVs) to identify the causal effect of a binary (or
discrete) endogenous treatment. A semi-IV is a less restrictive form of
instrument: it affects the selection into treatment but is excluded only from
one, not necessarily both, potential outcomes. Having two continuously
distributed semi-IVs, one excluded from the potential outcome under treatment
and the other from the potential outcome under control, is sufficient to
nonparametrically point identify marginal treatment effect (MTE) and local
average treatment effect (LATE) parameters. In practice, semi-IVs provide a
solution to the challenge of finding valid IVs because they are often easier to
find: many selection-specific shocks, policies, prices, costs, or benefits are
valid semi-IVs. As an application, I estimate the returns to working in the
manufacturing sector on earnings using sector-specific characteristics as
semi-IVs.

arXiv link: http://arxiv.org/abs/2303.12667v5

Econometrics arXiv updated paper (originally submitted: 2023-03-21)

Quasi Maximum Likelihood Estimation of High-Dimensional Factor Models: A Critical Review

Authors: Matteo Barigozzi

We review Quasi Maximum Likelihood estimation of factor models for
high-dimensional panels of time series. We consider two cases: (1) estimation
when no dynamic model for the factors is specified (Bai and Li, 2012, 2016);
(2) estimation based on the Kalman smoother and the Expectation Maximization
algorithm thus allowing to model explicitly the factor dynamics (Doz et al.,
2012, Barigozzi and Luciani, 2019). Our interest is in approximate factor
models, i.e., when we allow for the idiosyncratic components to be mildly
cross-sectionally, as well as serially, correlated. Although such a setting
apparently makes estimation harder, we show, in fact, that factor models do not
suffer from the curse of dimensionality problem, but instead enjoy a
blessing of dimensionality property. In particular, given an approximate
factor structure, if the cross-sectional dimension of the data, $N$, grows to
infinity, we show that: (i) identification of the model is still possible, (ii)
the mis-specification error due to the use of an exact factor model
log-likelihood vanishes. Moreover, if we let also the sample size, $T$, grow to
infinity, we can also consistently estimate all parameters of the model and
make inference. The same is true for estimation of the latent factors which can
be carried out by weighted least-squares, linear projection, or Kalman
filtering/smoothing. We also compare the approaches presented with: Principal
Component analysis and the classical, fixed $N$, exact Maximum Likelihood
approach. We conclude with a discussion on efficiency of the considered
estimators.
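
For reference, the approximate factor model underlying the review has the
familiar form

\[ x_{it} = \lambda_i' F_t + e_{it}, \qquad i = 1, \dots, N, \; t = 1, \dots, T, \]

where the \(r\)-dimensional factors \(F_t\) may be given an explicit dynamic
model and the idiosyncratic components \(e_{it}\) are allowed to be mildly
cross-sectionally and serially correlated.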

arXiv link: http://arxiv.org/abs/2303.11777v5

Econometrics arXiv updated paper (originally submitted: 2023-03-21)

Using Forests in Multivariate Regression Discontinuity Designs

Authors: Yiqi Liu, Yuan Qi

We discuss estimation and inference of conditional treatment effects in
regression discontinuity (RD) designs with multiple scores. In addition to
local linear regressions and the minimax-optimal estimator more recently
proposed by Imbens and Wager (2019), we argue that two variants of random
forests, honest regression forests and local linear forests, should be added to
the toolkit of applied researchers working with multivariate RD designs; their
validity follows from results in Wager and Athey (2018) and Friedberg et al.
(2020). We design a systematic Monte Carlo study with data generating processes
built both from functional forms that we specify and from Wasserstein
Generative Adversarial Networks that closely mimic the observed data. We find
no single estimator dominates across all specifications: (i) local linear
regressions perform well in univariate settings, but the common practice of
reducing multivariate scores to a univariate one can incur under-coverage,
possibly due to vanishing density at the transformed cutoff; (ii) good
performance of the minimax-optimal estimator depends on accurate estimation of
a nuisance parameter and its current implementation only accepts up to two
scores; (iii) forest-based estimators are not designed for estimation at
boundary points and are susceptible to finite-sample bias, but their
flexibility in modeling multivariate scores opens the door to a wide range of
empirical applications, as illustrated by an empirical study of COVID-19
hospital funding with three eligibility criteria.

arXiv link: http://arxiv.org/abs/2303.11721v3

Econometrics arXiv updated paper (originally submitted: 2023-03-20)

On the Existence and Information of Orthogonal Moments

Authors: Facundo Argañaraz, Juan Carlos Escanciano

Locally Robust (LR)/Orthogonal/Debiased moments have proven useful with
machine learning first steps, but their existence has not been investigated for
general parameters. In this paper, we provide a necessary and sufficient
condition, referred to as Restricted Local Non-surjectivity (RLN), for the
existence of such orthogonal moments to conduct robust inference on general
parameters of interest in regular semiparametric models. Importantly, RLN does
not require either identification of the parameters of interest or the nuisance
parameters. However, for orthogonal moments to be informative, the efficient
Fisher Information matrix for the parameter must be non-zero (though possibly
singular). Thus, orthogonal moments exist and are informative under more
general conditions than previously recognized. We demonstrate the utility of
our general results by characterizing orthogonal moments in a class of models
with Unobserved Heterogeneity (UH). For this class of models our method
delivers functional differencing as a special case. Orthogonality for general
smooth functionals of the distribution of UH is also characterized. As a second
major application, we investigate the existence of orthogonal moments and their
relevance for models defined by moment restrictions with possibly different
conditioning variables. We find orthogonal moments for the fully saturated two
stage least squares, for heterogeneous parameters in treatment effects, for
sample selection models, and for popular models of demand for differentiated
products. We apply our results to the Oregon Health Experiment to study
heterogeneous treatment effects of Medicaid on different health outcomes.
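
As a reminder of the object under study (standard notation, not specific to
this paper), a moment function \(m(W, \theta, \eta)\) is orthogonal at
\((\theta_0, \eta_0)\) if its derivative with respect to the nuisance
parameter vanishes,

\[ \frac{\partial}{\partial r} \, E\big[ m\big(W, \theta_0, \eta_0 + r(\eta - \eta_0)\big) \big] \Big|_{r=0} = 0 \]

for all admissible directions \(\eta\); the RLN condition characterises when
such moments exist for a given parameter of interest.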

arXiv link: http://arxiv.org/abs/2303.11418v2

Econometrics arXiv updated paper (originally submitted: 2023-03-20)

How Much Should We Trust Instrumental Variable Estimates in Political Science? Practical Advice Based on Over 60 Replicated Studies

Authors: Apoorva Lal, Mac Lockhart, Yiqing Xu, Ziwen Zu

Instrumental variable (IV) strategies are widely used in political science to
establish causal relationships. However, the identifying assumptions required
by an IV design are demanding, and it remains challenging for researchers to
assess their validity. In this paper, we replicate 67 papers published in three
top journals in political science during 2010-2022 and identify several
troubling patterns. First, researchers often overestimate the strength of their
IVs due to non-i.i.d. errors, such as a clustering structure. Second, the most
commonly used t-test for the two-stage-least-squares (2SLS) estimates often
severely underestimates uncertainty. Using more robust inferential methods, we
find that around 19-30% of the 2SLS estimates in our sample are underpowered.
Third, in the majority of the replicated studies, the 2SLS estimates are much
larger than the ordinary-least-squares estimates, and their ratio is negatively
correlated with the strength of the IVs in studies where the IVs are not
experimentally generated, suggesting potential violations of unconfoundedness
or the exclusion restriction. To help researchers avoid these pitfalls, we
provide a checklist for better practice.
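
For instance, a cluster-robust first-stage \(F\) statistic, rather than the
classical homoskedastic one, can be computed along the following lines (an
illustrative sketch with simulated data and hypothetical variable names, not
the authors' replication code):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # hypothetical data: treatment d, instrument z, control x1, cluster id g
    rng = np.random.default_rng(0)
    df = pd.DataFrame({"d": rng.standard_normal(500),
                       "z": rng.standard_normal(500),
                       "x1": rng.standard_normal(500),
                       "g": np.repeat(np.arange(50), 10)})

    # first stage with cluster-robust standard errors
    fs = smf.ols("d ~ z + x1", data=df).fit(cov_type="cluster",
                                            cov_kwds={"groups": df["g"]})

    # with a single instrument, the robust F is the squared robust t-statistic
    F_robust = (fs.params["z"] / fs.bse["z"]) ** 2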

arXiv link: http://arxiv.org/abs/2303.11399v3

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2023-03-20

Network log-ARCH models for forecasting stock market volatility

Authors: Raffaele Mattera, Philipp Otto

This paper presents a novel dynamic network autoregressive conditional
heteroscedasticity (ARCH) model based on spatiotemporal ARCH models to forecast
volatility in the US stock market. To improve the forecasting accuracy, the
model integrates temporally lagged volatility information and information from
adjacent nodes, which may instantaneously spill across the entire network. The
model is also suitable for high-dimensional cases where multivariate ARCH
models are typically no longer applicable. We adopt the theoretical foundations
from spatiotemporal statistics and transfer the dynamic ARCH model for
processes to networks. This new approach is compared with independent
univariate log-ARCH models. We quantify the improvements due to the
instantaneous network ARCH effects, which are studied for the first time in
this paper. The edges are determined based on various distance and correlation
measures between the time series. The performances of the alternative networks'
definitions are compared in terms of out-of-sample accuracy. Furthermore, we
consider ensemble forecasts based on different network definitions.

arXiv link: http://arxiv.org/abs/2303.11064v1

Econometrics arXiv paper, submitted: 2023-03-18

Standard errors when a regressor is randomly assigned

Authors: Denis Chetverikov, Jinyong Hahn, Zhipeng Liao, Andres Santos

We examine asymptotic properties of the OLS estimator when the values of the
regressor of interest are assigned randomly and independently of other
regressors. We find that the OLS variance formula in this case is often
simplified, sometimes substantially. In particular, when the regressor of
interest is independent not only of other regressors but also of the error
term, the textbook homoskedastic variance formula is valid even if the error
term and auxiliary regressors exhibit a general dependence structure. In the
context of randomized controlled trials, this conclusion holds in completely
randomized experiments with constant treatment effects. When the error term is
heteroscedastic with respect to the regressor of interest, the variance formula
has to be adjusted not only for heteroscedasticity but also for correlation
structure of the error term. However, even in the latter case, some
simplifications are possible as only a part of the correlation structure of the
error term should be taken into account. In the context of randomized control
trials, this implies that the textbook homoscedastic variance formula is
typically not valid if treatment effects are heterogeneous but
heteroscedasticity-robust variance formulas are valid if treatment effects are
independent across units, even if the error term exhibits a general dependence
structure. In addition, we extend the results to the case when the regressor of
interest is assigned randomly at a group level, such as in randomized control
trials with treatment assignment determined at a group (e.g., school/village)
level.

arXiv link: http://arxiv.org/abs/2303.10306v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-03-17

Estimation of Grouped Time-Varying Network Vector Autoregression Models

Authors: Degui Li, Bin Peng, Songqiao Tang, Weibiao Wu

This paper introduces a flexible time-varying network vector autoregressive
model framework for large-scale time series. A latent group structure is
imposed on the heterogeneous and node-specific time-varying momentum and
network spillover effects so that the number of unknown time-varying
coefficients to be estimated can be reduced considerably. A classic
agglomerative clustering algorithm with nonparametrically estimated distance
matrix is combined with a ratio criterion to consistently estimate the latent
group number and membership. A post-grouping local linear smoothing method is
proposed to estimate the group-specific time-varying momentum and network
effects, substantially improving the convergence rates of the preliminary
estimates which ignore the latent structure. We further modify the methodology
and theory to allow for structural breaks in either the group membership, group
number or group-specific coefficient functions. Numerical studies including
Monte-Carlo simulation and an empirical application are presented to examine
the finite-sample performance of the developed model and methodology.

arXiv link: http://arxiv.org/abs/2303.10117v2

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2023-03-17

Multivariate Probabilistic CRPS Learning with an Application to Day-Ahead Electricity Prices

Authors: Jonathan Berrisch, Florian Ziel

This paper presents a new method for combining (or aggregating or ensembling)
multivariate probabilistic forecasts, considering dependencies between
quantiles and marginals through a smoothing procedure that allows for online
learning. We discuss two smoothing methods: dimensionality reduction using
basis matrices and penalized smoothing. The new online learning algorithm
generalizes the standard CRPS learning framework into multivariate dimensions.
It is based on Bernstein Online Aggregation (BOA) and yields optimal asymptotic
learning properties. The procedure uses horizontal aggregation, i.e.,
aggregation across quantiles. We provide an in-depth discussion on possible
extensions of the algorithm and several nested cases related to the existing
literature on online forecast combination. We apply the proposed methodology to
forecasting day-ahead electricity prices, which are 24-dimensional
distributional forecasts. The proposed method yields significant improvements
over uniform combination in terms of continuous ranked probability score
(CRPS). We discuss the temporal evolution of the weights and hyperparameters
and present the results of reduced versions of the preferred model. A fast C++
implementation of the proposed algorithm is provided in the open-source
R-Package profoc on CRAN.
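
The flavour of the procedure can be conveyed by a deliberately simplified
exponential-weights update on the pinball (quantile) loss, aggregated
horizontally across quantiles; the BOA algorithm implemented in profoc uses a
more refined update and smoothing (hypothetical sketch with simulated data,
not the package's code):

    import numpy as np

    def pinball(y, q_pred, tau):
        # quantile (pinball) loss at level(s) tau
        u = y - q_pred
        return np.maximum(tau * u, (tau - 1.0) * u)

    taus = np.array([0.1, 0.5, 0.9])
    K = 3                                 # number of component forecasters
    rng = np.random.default_rng(1)
    stream = [(rng.normal(), rng.normal(size=(K, taus.size)))
              for _ in range(100)]

    w = np.full(K, 1.0 / K)               # combination weights
    eta = 2.0                             # learning rate

    for y_t, preds_t in stream:           # preds_t: K x len(taus) quantile forecasts
        combined = w @ preds_t            # horizontally aggregated combination
        loss_k = pinball(y_t, preds_t, taus).mean(axis=1)
        w *= np.exp(-eta * loss_k)        # exponential-weights update
        w /= w.sum()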

arXiv link: http://arxiv.org/abs/2303.10019v3

Econometrics arXiv updated paper (originally submitted: 2023-03-16)

Bootstrap based asymptotic refinements for high-dimensional nonlinear models

Authors: Joel L. Horowitz, Ahnaf Rafi

We consider penalized extremum estimation of a high-dimensional, possibly
nonlinear model that is sparse in the sense that most of its parameters are
zero but some are not. We use the SCAD penalty function, which provides model
selection consistent and oracle efficient estimates under suitable conditions.
However, asymptotic approximations based on the oracle model can be inaccurate
with the sample sizes found in many applications. This paper gives conditions
under which the bootstrap, based on estimates obtained through SCAD
penalization with thresholding, provides asymptotic refinements of size \(O
\left( n^{- 2} \right)\) for the error in the rejection (coverage) probability
of a symmetric hypothesis test (confidence interval) and \(O \left( n^{- 1}
\right)\) for the error in the rejection (coverage) probability of a one-sided
or equal tailed test (confidence interval). The results of Monte Carlo
experiments show that the bootstrap can provide large reductions in errors in
rejection and coverage probabilities. The bootstrap is consistent, though it
does not necessarily provide asymptotic refinements, even if some parameters
are close but not equal to zero. Random-coefficients logit and probit models
and nonlinear moment models are examples of models to which the procedure
applies.
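
For completeness, the SCAD penalty of Fan and Li (2001) is defined through its
derivative, for \(t > 0\) and a constant \(a > 2\) (commonly \(a = 3.7\)),

\[ p_\lambda'(t) = \lambda \left\{ 1\{t \le \lambda\} + \frac{(a\lambda - t)_+}{(a - 1)\lambda} \, 1\{t > \lambda\} \right\}, \]

so that small coefficients receive the full LASSO-type penalty, moderate ones a
tapered penalty, and large ones no penalty at all, which underlies the oracle
property invoked above.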

arXiv link: http://arxiv.org/abs/2303.09680v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2023-03-15

On the robustness of posterior means

Authors: Jiafeng Chen

Consider a normal location model $X \mid \theta \sim N(\theta, \sigma^2)$
with known $\sigma^2$. Suppose $\theta \sim G_0$, where the prior $G_0$ has
zero mean and variance bounded by $V$. Let $G_1$ be a possibly misspecified
prior with zero mean and variance bounded by $V$. We show that the squared
error Bayes risk of the posterior mean under $G_1$ is bounded, subject to an
additional tail condition on $G_1$, uniformly over $G_0, G_1, \sigma^2 > 0$.
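
A concrete special case (our illustration, not taken from the paper): if the
possibly misspecified prior is \(G_1 = N(0, V)\), the posterior mean is the
linear shrinkage rule

\[ E_{G_1}[\theta \mid X] = \frac{V}{V + \sigma^2} \, X, \]

and the result bounds the squared error Bayes risk of such a rule when the
data are in fact generated under a different prior \(G_0\) with variance at
most \(V\).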

arXiv link: http://arxiv.org/abs/2303.08653v2

Econometrics arXiv updated paper (originally submitted: 2023-03-15)

Identifying an Earnings Process With Dependent Contemporaneous Income Shocks

Authors: Dan Ben-Moshe

This paper proposes a novel approach for identifying coefficients in an
earnings dynamics model with arbitrarily dependent contemporaneous income
shocks. Traditional methods relying on second moments fail to identify these
coefficients, emphasizing the need for non-Gaussianity assumptions that capture
information from higher moments. Our results contribute to the literature on
earnings dynamics by allowing, for example, the permanent income shock of a job
change to be linked to the contemporaneous transitory income shock of a
relocation bonus.

arXiv link: http://arxiv.org/abs/2303.08460v2

Econometrics arXiv updated paper (originally submitted: 2023-03-14)

Identification- and many moment-robust inference via invariant moment conditions

Authors: Tom Boot, Johannes W. Ligtenberg

Identification-robust hypothesis tests are commonly based on the continuous
updating GMM objective function. When the number of moment conditions grows
proportionally with the sample size, the large-dimensional weighting matrix
prohibits the use of conventional asymptotic approximations and the behavior of
these tests remains unknown. We show that the structure of the weighting matrix
opens up an alternative route to asymptotic results when, under the null
hypothesis, the distribution of the moment conditions satisfies a symmetry
condition known as reflection invariance. We provide several examples in which
the invariance follows from standard assumptions. Our results show that
existing tests will be asymptotically conservative, and we propose an
adjustment to attain nominal size in large samples. We illustrate our findings
through simulations for various linear and nonlinear models, and an empirical
application on the effect of the concentration of financial activities in banks
on systemic risk.

arXiv link: http://arxiv.org/abs/2303.07822v5

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2023-03-13

Tight Non-asymptotic Inference via Sub-Gaussian Intrinsic Moment Norm

Authors: Huiming Zhang, Haoyu Wei, Guang Cheng

In non-asymptotic learning, variance-type parameters of sub-Gaussian
distributions are of paramount importance. However, directly estimating these
parameters using the empirical moment generating function (MGF) is infeasible.
To address this, we suggest using the sub-Gaussian intrinsic moment norm
[Buldygin and Kozachenko (2000), Theorem 1.3] achieved by maximizing a sequence
of normalized moments. Significantly, the suggested norm can not only
reconstruct the exponential moment bounds of MGFs but also provide tighter
sub-Gaussian concentration inequalities. In practice, we provide an intuitive
method for assessing whether data with a finite sample size is sub-Gaussian,
utilizing the sub-Gaussian plot. The intrinsic moment norm can be robustly
estimated via a simple plug-in approach. Our theoretical findings are also
applicable to reinforcement learning, including the multi-armed bandit
scenario.

arXiv link: http://arxiv.org/abs/2303.07287v2

Econometrics arXiv updated paper (originally submitted: 2023-03-13)

Inflation forecasting with attention based transformer neural networks

Authors: Maximilian Tschuchnig, Petra Tschuchnig, Cornelia Ferner, Michael Gadermayr

Inflation is a major determinant for allocation decisions and its forecast is
a fundamental aim of governments and central banks. However, forecasting
inflation is not a trivial task, as its prediction relies on low frequency,
highly fluctuating data with unclear explanatory variables. While classical
models show some possibility of predicting inflation, reliably beating the
random walk benchmark remains difficult. Recently, (deep) neural networks have
shown impressive results in a multitude of applications, increasingly setting
the new state-of-the-art. This paper investigates the potential of the
transformer deep neural network architecture to forecast different inflation
rates. The results are compared to a study on classical time series and machine
learning models. We show that our adapted transformer, on average, outperforms
the baseline in 6 out of 16 experiments, showing best scores in two out of four
investigated inflation rates. Our results demonstrate that a transformer based
neural network can outperform classical regression and machine learning models
in certain inflation rates and forecasting horizons.

arXiv link: http://arxiv.org/abs/2303.15364v2

Econometrics arXiv paper, submitted: 2023-03-12

Counterfactual Copula and Its Application to the Effects of College Education on Intergenerational Mobility

Authors: Tsung-Chih Lai, Jiun-Hua Su

This paper proposes a nonparametric estimator of the counterfactual copula of
two outcome variables that would be affected by a policy intervention. The
proposed estimator allows policymakers to conduct ex-ante evaluations by
comparing the estimated counterfactual and actual copulas as well as their
corresponding measures of association. Asymptotic properties of the
counterfactual copula estimator are established under regularity conditions.
These conditions are also used to validate the nonparametric bootstrap for
inference on counterfactual quantities. Simulation results indicate that our
estimation and inference procedures perform well in moderately sized samples.
Applying the proposed method to studying the effects of college education on
intergenerational income mobility under two counterfactual scenarios, we find
that while providing some college education to all children is unlikely to
promote mobility, offering a college degree to children from less educated
families can significantly reduce income persistence across generations.

arXiv link: http://arxiv.org/abs/2303.06658v1

Econometrics arXiv paper, submitted: 2023-03-09

Distributional Vector Autoregression: Eliciting Macro and Financial Dependence

Authors: Yunyun Wang, Tatsushi Oka, Dan Zhu

Vector autoregression is an essential tool in empirical macroeconomics and
finance for understanding the dynamic interdependencies among multivariate time
series. In this study, we expand the scope of vector autoregression by
incorporating a multivariate distributional regression framework and
introducing a distributional impulse response function, providing a
comprehensive view of dynamic heterogeneity. We propose a straightforward yet
flexible estimation method and establish its asymptotic properties under weak
dependence assumptions. Our empirical analysis examines the conditional joint
distribution of GDP growth and financial conditions in the United States, with
a focus on the global financial crisis. Our results show that tight financial
conditions lead to a multimodal conditional joint distribution of GDP growth
and financial conditions, and easing financial conditions significantly impacts
long-term GDP growth, while improving the GDP growth during the global
financial crisis has limited effects on financial conditions.

arXiv link: http://arxiv.org/abs/2303.04994v1

Econometrics arXiv updated paper (originally submitted: 2023-03-08)

Inference on Optimal Dynamic Policies via Softmax Approximation

Authors: Qizhao Chen, Morgane Austern, Vasilis Syrgkanis

Estimating optimal dynamic policies from offline data is a fundamental
problem in dynamic decision making. In the context of causal inference, the
problem is known as estimating the optimal dynamic treatment regime. Even
though there exists a plethora of methods for estimation, constructing
confidence intervals for the value of the optimal regime and structural
parameters associated with it is inherently harder, as it involves non-linear
and non-differentiable functionals of unknown quantities that need to be
estimated. Prior work resorted to sub-sample approaches that can deteriorate
the quality of the estimate. We show that a simple soft-max approximation to
the optimal treatment regime, for an appropriately fast growing temperature
parameter, can achieve valid inference on the truly optimal regime. We
illustrate our result for a two-period optimal dynamic regime, though our
approach should directly extend to the finite horizon case. Our work combines
techniques from semi-parametric inference and $g$-estimation, together with an
appropriate triangular array central limit theorem, as well as a novel analysis
of the asymptotic influence and asymptotic bias of softmax approximations.
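
The softmax relaxation referred to here replaces the argmax rule with the
smooth policy (standard notation)

\[ \pi_\beta(a \mid x) = \frac{\exp\{\beta \, Q(x, a)\}}{\sum_{a'} \exp\{\beta \, Q(x, a')\}}, \]

which approaches the optimal (argmax) treatment rule as the temperature
parameter \(\beta \to \infty\); letting \(\beta\) grow at an appropriate rate
with the sample size is what preserves valid inference on the truly optimal
regime.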

arXiv link: http://arxiv.org/abs/2303.04416v3

Econometrics arXiv updated paper (originally submitted: 2023-03-06)

Just Ask Them Twice: Choice Probabilities and Identification of Ex ante returns and Willingness-To-Pay

Authors: Romuald Meango, Esther Mirjam Girsberger

One of the exciting developments in the stated preference literature is the
use of probabilistic stated preference experiments to estimate semi-parametric
population distributions of ex ante returns and willingness-to-pay (WTP) for a
choice attribute. This relies on eliciting several choices per individual, and
estimating separate demand functions, at the cost of possibly long survey
instruments. This paper shows that the distributions of interest can be
recovered from at most two stated choices, without requiring ad-hoc parametric
assumptions. Hence, it allows for significantly shorter survey instruments. The
paper also shows that eliciting probabilistic stated choices allows identifying
much richer objects than previously possible, and therefore provides better
tools for ex ante policy evaluation. Finally, it showcases the feasibility and
relevance of the results by studying the preferences of high-ability students in
Cote d'Ivoire for public sector jobs, exploiting a unique survey on this
population. Our analysis supports the claim that public sector jobs might
significantly increase the cost of hiring elite students for the private
sector.

arXiv link: http://arxiv.org/abs/2303.03009v4

Econometrics arXiv updated paper (originally submitted: 2023-03-06)

EnsembleIV: Creating Instrumental Variables from Ensemble Learners for Robust Statistical Inference

Authors: Gordon Burtch, Edward McFowland III, Mochen Yang, Gediminas Adomavicius

Despite increasing popularity in empirical studies, the integration of
machine learning generated variables into regression models for statistical
inference suffers from the measurement error problem, which can bias estimation
and threaten the validity of inferences. In this paper, we develop a novel
approach to alleviate associated estimation biases. Our proposed approach,
EnsembleIV, creates valid and strong instrumental variables from weak learners
in an ensemble model, and uses them to obtain consistent estimates that are
robust against the measurement error problem. Our empirical evaluations, using
both synthetic and real-world datasets, show that EnsembleIV can effectively
reduce estimation biases across several common regression specifications, and
can be combined with modern deep learning techniques when dealing with
unstructured data.

arXiv link: http://arxiv.org/abs/2303.02820v2

Econometrics arXiv paper, submitted: 2023-03-05

Censored Quantile Regression with Many Controls

Authors: Seoyun Hong

This paper develops estimation and inference methods for censored quantile
regression models with high-dimensional controls. The methods are based on the
application of double/debiased machine learning (DML) framework to the censored
quantile regression estimator of Buchinsky and Hahn (1998). I provide valid
inference for low-dimensional parameters of interest in the presence of
high-dimensional nuisance parameters when implementing machine learning
estimators. The proposed estimator is shown to be consistent and asymptotically
normal. The performance of the estimator with high-dimensional controls is
illustrated with numerical simulation and an empirical application that
examines the effect of 401(k) eligibility on savings.

arXiv link: http://arxiv.org/abs/2303.02784v1

Econometrics arXiv cross-link from physics.soc-ph (physics.soc-ph), submitted: 2023-03-05

Deterministic, quenched and annealed parameter estimation for heterogeneous network models

Authors: Marzio Di Vece, Diego Garlaschelli, Tiziano Squartini

At least two different approaches exist to define and solve statistical models
for the analysis of economic systems: the typical econometric one, interpreting
the Gravity Model specification as the expected link weight of an arbitrary
probability distribution, and the one rooted in statistical physics,
constructing maximum-entropy distributions constrained to satisfy certain
network properties. In two recent companion papers, these approaches have been
successfully integrated within the framework induced by the constrained
minimisation of the Kullback-Leibler divergence: specifically, two broad
classes of models have been devised, i.e. the integrated and the conditional
ones, defined by different probabilistic rules to place links, load them with
weights and turn them into proper econometric prescriptions. Still, the
recipes adopted by the two approaches to estimate the parameters entering into
the definition of each model differ. In econometrics, a likelihood that
decouples the binary and weighted parts of a model, treating a network as
deterministic, is typically maximised; to restore its random character, two
alternatives exist: either solving the likelihood maximisation on each
configuration of the ensemble and taking the average of the parameters
afterwards or taking the average of the likelihood function and maximising the
latter one. The difference between these approaches lies in the order in which
the operations of averaging and maximisation are taken - a difference that is
reminiscent of the quenched and annealed ways of averaging out the disorder in
spin glasses. The results of the present contribution, devoted to comparing
these recipes in the case of continuous, conditional network models, indicate
that the annealed estimation recipe represents the best alternative to the
deterministic one.

arXiv link: http://arxiv.org/abs/2303.02716v5

Econometrics arXiv updated paper (originally submitted: 2023-03-03)

Fast Forecasting of Unstable Data Streams for On-Demand Service Platforms

Authors: Yu Jeffrey Hu, Jeroen Rombouts, Ines Wilms

On-demand service platforms face a challenging problem of forecasting a large
collection of high-frequency regional demand data streams that exhibit
instabilities. This paper develops a novel forecast framework that is fast and
scalable, and automatically assesses changing environments without human
intervention. We empirically test our framework on a large-scale demand data
set from a leading on-demand delivery platform in Europe, and find strong
performance gains from using our framework against several industry benchmarks,
across all geographical regions, loss functions, and both pre- and post-Covid
periods. We translate forecast gains to economic impacts for this on-demand
service platform by computing financial gains and reductions in computing
costs.

arXiv link: http://arxiv.org/abs/2303.01887v2

Econometrics arXiv updated paper (originally submitted: 2023-03-03)

Constructing High Frequency Economic Indicators by Imputation

Authors: Serena Ng, Susannah Scanlan

Monthly and weekly economic indicators are often taken to be the largest
common factor estimated from high and low frequency data, either separately or
jointly. To incorporate mixed frequency information without directly modeling
them, we target a low frequency diffusion index that is already available, and
treat high frequency values as missing. We impute these values using multiple
factors estimated from the high frequency data. In the empirical examples
considered, static matrix completion that does not account for serial
correlation in the idiosyncratic errors yields imprecise estimates of the
missing values irrespective of how the factors are estimated. Single equation
and systems-based dynamic procedures that account for serial correlation yield
imputed values that are closer to the observed low frequency ones. This is the
case in the counterfactual exercise that imputes the monthly values of consumer
sentiment series before 1978 when the data was released only on a quarterly
basis. This is also the case for a weekly version of the CFNAI index of
economic activity that is imputed using seasonally unadjusted data. The imputed
series reveals episodes of increased variability of weekly economic information
that are masked by the monthly data, notably around the 2014-15 collapse in oil
prices.

arXiv link: http://arxiv.org/abs/2303.01863v3

Econometrics arXiv updated paper (originally submitted: 2023-03-02)

Debiased Machine Learning of Aggregated Intersection Bounds and Other Causal Parameters

Authors: Vira Semenova

This paper proposes a novel framework of aggregated intersection of
regression functions, where the target parameter is obtained by averaging the
minimum (or maximum) of a collection of regression functions over the covariate
space. Such quantities include the lower and upper bounds on distributional
effects (Frechet-Hoeffding, Makarov) and the optimal welfare in the statistical
treatment choice problem. The proposed estimator -- the envelope score
estimator -- is shown to have an oracle property, where the oracle knows the
identity of the minimizer for each covariate value. I apply this result to the
bounds in the Roy model and the Horowitz-Manski-Lee bounds with a discrete
outcome. The proposed approach performs well empirically on the data from the
Oregon Health Insurance Experiment.
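
In our paraphrase of the abstract's notation, the target parameter is

\[ \theta_0 = E\Big[ \min_{1 \le j \le J} \mu_j(X) \Big], \qquad \mu_j(x) = E[Y_j \mid X = x], \]

and the envelope score estimator replaces each regression function \(\mu_j\)
with a debiased machine learning estimate before averaging the pointwise
minimum over the covariate distribution.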

arXiv link: http://arxiv.org/abs/2303.00982v3

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2023-03-01

$21^{st}$ Century Statistical Disclosure Limitation: Motivations and Challenges

Authors: John M Abowd, Michael B Hawes

This chapter examines the motivations and imperatives for modernizing how
statistical agencies approach statistical disclosure limitation for official
data product releases. It discusses the implications for agencies' broader data
governance and decision-making, and it identifies challenges that agencies will
likely face along the way. In conclusion, the chapter proposes some principles
and best practices that we believe can help guide agencies in navigating the
transformation of their confidentiality programs.

arXiv link: http://arxiv.org/abs/2303.00845v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-03-01

Generalized Cumulative Shrinkage Process Priors with Applications to Sparse Bayesian Factor Analysis

Authors: Sylvia Frühwirth-Schnatter

The paper discusses shrinkage priors which impose increasing shrinkage in a
sequence of parameters. We review the cumulative shrinkage process (CUSP) prior
of Legramanti et al. (2020), which is a spike-and-slab shrinkage prior where
the spike probability is stochastically increasing and constructed from the
stick-breaking representation of a Dirichlet process prior. As a first
contribution, this CUSP prior is extended by involving arbitrary stick-breaking
representations arising from beta distributions. As a second contribution, we
prove that exchangeable spike-and-slab priors, which are popular and widely
used in sparse Bayesian factor analysis, can be represented as a finite
generalized CUSP prior, which is easily obtained from the decreasing order
statistics of the slab probabilities. Hence, exchangeable spike-and-slab
shrinkage priors imply increasing shrinkage as the column index in the loading
matrix increases, without imposing explicit order constraints on the slab
probabilities. An application to sparse Bayesian factor analysis illustrates
the usefulness of the findings of this paper. A new exchangeable spike-and-slab
shrinkage prior based on the triple gamma prior of Cadonna et al. (2020) is
introduced and shown to be helpful for estimating the unknown number of factors
in a simulation study.

arXiv link: http://arxiv.org/abs/2303.00473v1

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2023-03-01

Consumer Welfare Under Individual Heterogeneity

Authors: Charles Gauthier, Sebastiaan Maes, Raghav Malhotra

We propose a nonparametric method for estimating the distribution of consumer
welfare from cross-sectional data with no restrictions on individual
preferences. First demonstrating that moments of demand identify the curvature
of the expenditure function, we use these moments to approximate money-metric
welfare measures. Our approach captures both nonhomotheticity and heterogeneity
in preferences in the behavioral responses to price changes. We apply our
method to US household scanner data to evaluate the impacts of the price shock
between December 2020 and 2021 on the cost-of-living index. We document
substantial heterogeneity in welfare losses within and across demographic
groups. For most groups, a naive measure of consumer welfare would
significantly underestimate the welfare loss. By decomposing the behavioral
responses into the components arising from nonhomotheticity and heterogeneity
in preferences, we find that both factors are essential for accurate welfare
measurement, with heterogeneity contributing more substantially.

arXiv link: http://arxiv.org/abs/2303.01231v5

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-03-01

Disentangling Structural Breaks in Factor Models for Macroeconomic Data

Authors: Bonsoo Koo, Benjamin Wong, Ze-Yu Zhong

Through a routine normalization of the factor variance, standard methods for
estimating factor models in macroeconomics do not distinguish between breaks of
the factor variance and factor loadings. We argue that it is important to
distinguish between structural breaks in the factor variance and loadings
within factor models commonly employed in macroeconomics as both can lead to
markedly different interpretations when viewed via the lens of the underlying
dynamic factor model. We then develop a projection-based decomposition that
leads to two standard and easy-to-implement Wald tests to disentangle
structural breaks in the factor variance and factor loadings. Applying our
procedure to U.S. macroeconomic data, we find evidence of both types of breaks
associated with the Great Moderation and the Great Recession. Through our
projection-based decomposition, we estimate that the Great Moderation is
associated with an over 60% reduction in the total factor variance,
highlighting the relevance of disentangling breaks in the factor structure.

arXiv link: http://arxiv.org/abs/2303.00178v2

Econometrics arXiv updated paper (originally submitted: 2023-02-28)

Transition Probabilities and Moment Restrictions in Dynamic Fixed Effects Logit Models

Authors: Kevin Dano

Dynamic logit models are popular tools in economics to measure state
dependence. This paper introduces a new method to derive moment restrictions in
a large class of such models with strictly exogenous regressors and fixed
effects. We exploit the common structure of logit-type transition probabilities
and elementary properties of rational fractions, to formulate a systematic
procedure that scales naturally with model complexity (e.g., the lag order or the
number of observed time periods). We detail the construction of moment
restrictions in binary response models of arbitrary lag order as well as
first-order panel vector autoregressions and dynamic multinomial logit models.
Identification of common parameters and average marginal effects is also
discussed for the binary response case. Finally, we illustrate our results by
studying the dynamics of drug consumption amongst young people inspired by Deza
(2015).
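
A first-order binary example of the logit-type transition probabilities the
paper works with is (standard notation)

\[ P(y_{it} = 1 \mid y_{i,t-1}, x_i, \alpha_i) = \frac{\exp(\gamma y_{i,t-1} + x_{it}'\beta + \alpha_i)}{1 + \exp(\gamma y_{i,t-1} + x_{it}'\beta + \alpha_i)}, \]

and the paper's procedure delivers moment restrictions that are free of the
fixed effect \(\alpha_i\) for this and for higher-order, multinomial and panel
VAR extensions.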

arXiv link: http://arxiv.org/abs/2303.00083v2

Econometrics arXiv updated paper (originally submitted: 2023-02-28)

The First-stage F Test with Many Weak Instruments

Authors: Zhenhong Huang, Chen Wang, Jianfeng Yao

A widely adopted approach for detecting weak instruments is to use the
first-stage $F$ statistic. While this method was developed with a fixed number
of instruments, its performance with many instruments remains insufficiently
explored. We show that the first-stage $F$ test exhibits distorted sizes for
detecting many weak instruments, regardless of the choice of pretested
estimators or Wald tests. These distortions occur due to the inadequate
approximation using classical noncentral Chi-squared distributions. As a
byproduct of our main result, we present an alternative approach to pre-test
many weak instruments with the corrected first-stage $F$ statistic. An
empirical illustration with Angrist and Krueger (1991)'s returns to education
data confirms its usefulness.

arXiv link: http://arxiv.org/abs/2302.14423v2

Econometrics arXiv paper, submitted: 2023-02-28

A specification test for the strength of instrumental variables

Authors: Zhenhong Huang, Chen Wang, Jianfeng Yao

This paper develops a new specification test for the instrument weakness when
the number of instruments $K_n$ is large with a magnitude comparable to the
sample size $n$. The test relies on the fact that the difference between the
two-stage least squares (2SLS) estimator and the ordinary least squares (OLS)
estimator asymptotically disappears when there are many weak instruments, but
otherwise converges to a non-zero limit. We establish the limiting distribution
of the difference within the above two specifications, and introduce a
delete-$d$ Jackknife procedure to consistently estimate the asymptotic
variance/covariance of the difference. Monte Carlo experiments demonstrate the
good performance of the test procedure for both cases of single and multiple
endogenous variables. Additionally, we re-examine the analysis of returns to
education data in Angrist and Krueger (1991) using our proposed test. Both the
simulation results and empirical analysis indicate the reliability of the test.

arXiv link: http://arxiv.org/abs/2302.14396v1

Econometrics arXiv paper, submitted: 2023-02-28

Unified and robust Lagrange multiplier type tests for cross-sectional independence in large panel data models

Authors: Zhenhong Huang, Zhaoyuan Li, Jianfeng Yao

This paper revisits the Lagrange multiplier type test for the null hypothesis
of no cross-sectional dependence in large panel data models. We propose a
unified test procedure and its power enhancement version, which show robustness
for a wide class of panel model contexts. Specifically, the two procedures are
applicable to both heterogeneous and fixed effects panel data models with the
presence of weakly exogenous as well as lagged dependent regressors, allowing
for a general form of nonnormal error distribution. With the tools from Random
Matrix Theory, the asymptotic validity of the test procedures is established
under the simultaneous limit scheme where the number of time periods and the
number of cross-sectional units go to infinity proportionally. The derived
theories are accompanied by detailed Monte Carlo experiments, which confirm the
robustness of the two tests and also suggest the validity of the power
enhancement technique.
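
The classical ingredient being revisited is the Breusch-Pagan Lagrange
multiplier statistic

\[ LM = T \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \hat{\rho}_{ij}^2, \]

based on the pairwise correlations \(\hat{\rho}_{ij}\) of the estimated
residuals; the proposed procedures modify this quantity so that its asymptotic
distribution remains valid when \(N\) and \(T\) diverge proportionally.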

arXiv link: http://arxiv.org/abs/2302.14387v1

Econometrics arXiv paper, submitted: 2023-02-28

Identification and Estimation of Categorical Random Coefficient Models

Authors: Zhan Gao, M. Hashem Pesaran

This paper proposes a linear categorical random coefficient model, in which
the random coefficients follow parametric categorical distributions. The
distributional parameters are identified based on a linear recurrence structure
of moments of the random coefficients. A Generalized Method of Moments
estimation procedure is proposed, of the kind also employed by Peter Schmidt and
his coauthors to address heterogeneity in time effects in panel data models. Using
Monte Carlo simulations, we find that moments of the random coefficients can be
estimated reasonably accurately, but large samples are required for estimation
of the parameters of the underlying categorical distribution. The utility of
the proposed estimator is illustrated by estimating the distribution of returns
to education in the U.S. by gender and educational levels. We find that rising
heterogeneity between educational groups is mainly due to the increasing
returns to education for those with postsecondary education, whereas within
group heterogeneity has been rising mostly in the case of individuals with high
school or less education.

arXiv link: http://arxiv.org/abs/2302.14380v1

Econometrics arXiv updated paper (originally submitted: 2023-02-27)

Macroeconomic Forecasting using Dynamic Factor Models: The Case of Morocco

Authors: Daoui Marouane

This article discusses the use of dynamic factor models in macroeconomic
forecasting, with a focus on the Factor-Augmented Error Correction Model
(FECM). The FECM combines the advantages of cointegration and dynamic factor
models, providing a flexible and reliable approach to macroeconomic
forecasting, especially for non-stationary variables. We evaluate the
forecasting performance of the FECM model on a large dataset of 117 Moroccan
economic series with quarterly frequency. Our study shows that FECM outperforms
traditional econometric models in terms of forecasting accuracy and robustness.
The inclusion of long-term information and common factors in FECM enhances its
ability to capture economic dynamics and leads to better forecasting
performance than other competing models. Our results suggest that FECM can be a
valuable tool for macroeconomic forecasting in Morocco and other similar
economies.

arXiv link: http://arxiv.org/abs/2302.14180v3

Econometrics arXiv updated paper (originally submitted: 2023-02-27)

Forecasting Macroeconomic Tail Risk in Real Time: Do Textual Data Add Value?

Authors: Philipp Adämmer, Jan Prüser, Rainer Schüssler

We examine the incremental value of news-based data relative to the FRED-MD
economic indicators for quantile predictions of employment, output, inflation
and consumer sentiment in a high-dimensional setting. Our results suggest that
news data contain valuable information that is not captured by a large set of
economic indicators. We provide empirical evidence that this information can be
exploited to improve tail risk predictions. The added value is largest when
media coverage and sentiment are combined to compute text-based predictors.
Methods that capture quantile-specific non-linearities produce overall superior
forecasts relative to methods that feature linear predictive relationships. The
results are robust along different modeling choices.

arXiv link: http://arxiv.org/abs/2302.13999v2

Econometrics arXiv updated paper (originally submitted: 2023-02-27)

Multicell experiments for marginal treatment effect estimation of digital ads

Authors: Caio Waisman, Brett R. Gordon

Randomized experiments with treatment and control groups are an important
tool to measure the impacts of interventions. However, in experimental settings
with one-sided noncompliance extant empirical approaches may not produce the
estimands a decision maker needs to solve the problem of interest. For example,
these experimental designs are common in digital advertising settings but
typical methods do not yield effects that inform the intensive margin: how many
consumers should be reached or how much should be spent on a campaign. We
propose a solution that combines a novel multicell experimental design with
modern estimation techniques that enables decision makers to solve problems
with an intensive margin. Our design is straightforward to implement and does
not require additional budget. We illustrate our method through simulations
calibrated using an advertising experiment at Facebook, demonstrating its
superior performance in various scenarios and its advantage over direct
optimization approaches.

arXiv link: http://arxiv.org/abs/2302.13857v4

Econometrics arXiv updated paper (originally submitted: 2023-02-27)

Nickell Bias in Panel Local Projection: Financial Crises Are Worse Than You Think

Authors: Ziwei Mei, Liugang Sheng, Zhentao Shi

Panel local projection (LP) with fixed-effects (FE) estimation is widely
adopted for evaluating the economic consequences of financial crises across
countries. This paper highlights a fundamental methodological issue: the
presence of the Nickell bias in the panel FE estimator due to inherent dynamic
structures of panel predictive specifications, even if the regressors have no
lagged dependent variables. The Nickell bias invalidates the standard
inferential procedure based on the $t$-statistic. We propose the split-panel
jackknife (SPJ) estimator as a simple, easy-to-implement, and yet effective
solution to eliminate the bias and restore valid statistical inference. We
revisit four influential empirical studies on the impact of financial crises,
and find that the FE method underestimates the economic losses of financial
crises relative to the SPJ estimates.
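
In its simplest half-panel form (a stylised sketch of the construction), the
split-panel jackknife applied to a panel local projection

\[ y_{i,t+h} = \alpha_i^{(h)} + \beta_h x_{it} + \varepsilon_{i,t+h} \]

combines the full-sample FE estimate with FE estimates from the two halves of
the time dimension,

\[ \hat{\beta}_h^{SPJ} = 2\hat{\beta}_h - \tfrac{1}{2}\big(\hat{\beta}_h^{(1)} + \hat{\beta}_h^{(2)}\big), \]

which removes the leading Nickell-type bias term.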

arXiv link: http://arxiv.org/abs/2302.13455v4

Econometrics arXiv updated paper (originally submitted: 2023-02-25)

Estimating Fiscal Multipliers by Combining Statistical Identification with Potentially Endogenous Proxies

Authors: Sascha A. Keweloh, Mathias Klein, Jan Prüser

Different proxy variables used in fiscal policy SVARs lead to contradicting
conclusions regarding the size of fiscal multipliers. Our analysis suggests
that the conflicting results may stem from violations of the proxy exogeneity
assumptions. We propose a novel approach to include proxy variables into a
Bayesian non-Gaussian SVAR, tailored to accommodate potentially endogenous
proxies. Using our model, we find that increasing government spending is more
effective in stimulating the economy than reducing taxes.

arXiv link: http://arxiv.org/abs/2302.13066v6

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-02-24

On the Misspecification of Linear Assumptions in Synthetic Control

Authors: Achille Nazaret, Claudia Shi, David M. Blei

The synthetic control (SC) method is a popular approach for estimating
treatment effects from observational panel data. It rests on a crucial
assumption that we can write the treated unit as a linear combination of the
untreated units. This linearity assumption, however, can be unlikely to hold in
practice and, when violated, the resulting SC estimates are incorrect. In this
paper we examine two questions: (1) How large can the misspecification error
be? (2) How can we limit it? First, we provide theoretical bounds to quantify
the misspecification error. The bounds are comforting: small misspecifications
induce small errors. With these bounds in hand, we then develop new SC
estimators that are specially designed to minimize misspecification error. The
estimators are based on additional data about each unit, which is used to
produce the SC weights. (For example, if the units are countries then the
additional data might be demographic information about each.) We study our
estimators on synthetic data; we find they produce more accurate causal
estimates than standard synthetic controls. We then re-analyze the California
tobacco-program data of the original SC paper, now including additional data
from the US census about per-state demographics. Our estimators show that the
observations in the pre-treatment period lie within the bounds of
misspecification error, and that the observations post-treatment lie outside of
those bounds. This is evidence that our SC methods have uncovered a true
effect.
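
In its canonical form (a stylised restatement), the linearity assumption in
question states that the treated unit's untreated potential outcome is a
weighted combination of the donor units,

\[ Y_{1t}(0) = \sum_{j \ge 2} w_j Y_{jt}(0), \qquad w_j \ge 0, \; \sum_{j \ge 2} w_j = 1, \]

and the paper bounds the error of SC estimates when this relationship holds
only approximately.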

arXiv link: http://arxiv.org/abs/2302.12777v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-02-24

Personalized Pricing with Invalid Instrumental Variables: Identification, Estimation, and Policy Learning

Authors: Rui Miao, Zhengling Qi, Cong Shi, Lin Lin

Pricing based on individual customer characteristics is widely used to
maximize sellers' revenues. This work studies offline personalized pricing
under endogeneity using an instrumental variable approach. Standard
instrumental variable methods in causal inference/econometrics either focus on
a discrete treatment space or require the exclusion restriction of instruments
from having a direct effect on the outcome, which limits their applicability in
personalized pricing. In this paper, we propose a new policy learning method
for Personalized pRicing using Invalid iNsTrumental variables (PRINT) for
continuous treatments, allowing the instruments to have direct effects on the outcome. Specifically,
relying on the structural models of revenue and price, we establish the
identifiability condition of an optimal pricing strategy under endogeneity with
the help of invalid instrumental variables. Based on this new identification,
which leads to solving conditional moment restrictions with generalized
residual functions, we construct an adversarial min-max estimator and learn an
optimal pricing strategy. Furthermore, we establish an asymptotic regret bound
to find an optimal pricing strategy. Finally, we demonstrate the effectiveness
of the proposed method via extensive simulation studies as well as a real data
application from a US online auto loan company.

arXiv link: http://arxiv.org/abs/2302.12670v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-02-23

Variable Importance Matching for Causal Inference

Authors: Quinn Lanners, Harsh Parikh, Alexander Volfovsky, Cynthia Rudin, David Page

Our goal is to produce methods for observational causal inference that are
auditable, easy to troubleshoot, accurate for treatment effect estimation, and
scalable to high-dimensional data. We describe a general framework called
Model-to-Match that achieves these goals by (i) learning a distance metric via
outcome modeling, (ii) creating matched groups using the distance metric, and
(iii) using the matched groups to estimate treatment effects. Model-to-Match
uses variable importance measurements to construct a distance metric, making it
a flexible framework that can be adapted to various applications. Concentrating
on the scalability of the problem in the number of potential confounders, we
operationalize the Model-to-Match framework with LASSO. We derive performance
guarantees for settings where LASSO outcome modeling consistently identifies
all confounders (importantly without requiring the linear model to be correctly
specified). We also provide experimental results demonstrating the method's
auditability, accuracy, and scalability as well as extensions to more general
nonparametric outcome modeling.
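
A minimal sketch of the three-step logic with a LASSO outcome model (simulated
data and hypothetical variable names, not the authors' Model-to-Match
implementation):

    import numpy as np
    from sklearn.linear_model import LassoCV
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(0)
    n, p = 1000, 30
    X = rng.standard_normal((n, p))
    T = rng.integers(0, 2, n)                    # binary treatment
    y = X[:, 0] + 0.5 * X[:, 1] + T * (1 + X[:, 0]) + rng.standard_normal(n)

    # (i) learn variable importance from an outcome model on the controls
    lasso = LassoCV(cv=5).fit(X[T == 0], y[T == 0])
    w = np.abs(lasso.coef_)                      # importance weights, mostly zero

    # (ii) match each treated unit to its nearest control in the weighted metric
    Xw = X * np.sqrt(w)
    nn = NearestNeighbors(n_neighbors=1).fit(Xw[T == 0])
    _, idx = nn.kneighbors(Xw[T == 1])

    # (iii) estimate the ATT from the matched pairs
    att = np.mean(y[T == 1] - y[T == 0][idx.ravel()])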

arXiv link: http://arxiv.org/abs/2302.11715v2

Econometrics arXiv updated paper (originally submitted: 2023-02-22)

Decomposition and Interpretation of Treatment Effects in Settings with Delayed Outcomes

Authors: Federico A. Bugni, Ivan A. Canay, Steve McBride

This paper studies settings where the analyst is interested in identifying
and estimating the average direct causal effect of a binary treatment on
an outcome. We consider a setup in which the outcome is not immediately
realized after the treatment assignment, a feature that is
ubiquitous in empirical settings. The period between the treatment and the
realization of the outcome allows other observed actions to occur and affect
the outcome. In this context, we study several regression-based estimands
routinely used in empirical work to capture the average treatment effect and
shed light on interpreting them in terms of ceteris paribus effects, indirect
causal effects, and selection terms. We obtain three main and related takeaways
under a common set of assumptions. First, the three most popular estimands do
not generally satisfy what we call strong sign preservation, in the
sense that these estimands may be negative even when the treatment positively
affects the outcome conditional on any possible combination of other actions.
Second, the most popular regression that includes the other actions as controls
satisfies strong sign preservation if and only if these actions are
mutually exclusive binary variables. Finally, we show that a linear regression
that fully stratifies the other actions leads to estimands that satisfy strong
sign preservation.

arXiv link: http://arxiv.org/abs/2302.11505v5

Econometrics arXiv paper, submitted: 2023-02-20

Attitudes and Latent Class Choice Models using Machine learning

Authors: Lorena Torres Lahoz, Francisco Camara Pereira, Georges Sfeir, Ioanna Arkoudi, Mayara Moraes Monteiro, Carlos Lima Azevedo

Latent Class Choice Models (LCCM) are extensions of discrete choice models
(DCMs) that capture unobserved heterogeneity in the choice process by
segmenting the population based on the assumption of preference similarities.
We present a method of efficiently incorporating attitudinal indicators in the
specification of LCCM, by introducing Artificial Neural Networks (ANN) to
formulate latent variable constructs. This formulation goes beyond structural
equation approaches in its capability of exploring the relationship between the
attitudinal indicators and the choice decision, given the Machine Learning (ML)
flexibility and power in capturing unobserved and complex behavioural features,
such as attitudes and beliefs, all while still maintaining the
consistency of the theoretical assumptions presented in the Generalized Random
Utility model and the interpretability of the estimated parameters. We test our
proposed framework for estimating a Car-Sharing (CS) service subscription
choice with stated preference data from Copenhagen, Denmark. The results show
that our proposed approach provides a complete and realistic segmentation,
which helps design better policies.

arXiv link: http://arxiv.org/abs/2302.09871v1

Econometrics arXiv updated paper (originally submitted: 2023-02-20)

Identification-robust inference for the LATE with high-dimensional covariates

Authors: Yukun Ma

This paper presents an inference method for the local average treatment
effect (LATE) in the presence of high-dimensional covariates, irrespective of
the strength of identification. We propose a novel high-dimensional conditional
test statistic with uniformly correct asymptotic size. We provide an
easy-to-implement algorithm to infer the high-dimensional LATE by inverting our
test statistic and employing the double/debiased machine learning method.
Simulations indicate that our test is robust against both weak identification
and high dimensionality concerning size control and power performance,
outperforming other conventional tests. Applying the proposed method to
railroad and population data to study the effect of railroad access on urban
population growth, we observe that our methodology yields confidence intervals
that are 49% to 92% shorter than conventional results, depending on
specifications.

arXiv link: http://arxiv.org/abs/2302.09756v4

Econometrics arXiv updated paper (originally submitted: 2023-02-18)

Clustered Covariate Regression

Authors: Abdul-Nasah Soale, Emmanuel Selorm Tsyawo

High covariate dimensionality is increasingly common in model estimation,
and existing techniques to address this issue typically require sparsity or
discrete heterogeneity of the unobservable parameter vector. However,
neither restriction may be supported by economic theory in some empirical
contexts, leading to severe bias and misleading inference. The clustering-based
grouped parameter estimator (GPE) introduced in this paper drops both
restrictions and maintains the natural one that the parameter support be
bounded. GPE exhibits robust large sample properties under standard conditions
and accommodates both sparse and non-sparse parameters whose support can be
bounded away from zero. Extensive Monte Carlo simulations demonstrate the
excellent performance of GPE in terms of bias reduction and size control
compared to competing estimators. An empirical application of GPE to estimating
price and income elasticities of demand for gasoline highlights its practical
utility.

arXiv link: http://arxiv.org/abs/2302.09255v4

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2023-02-17

Post Reinforcement Learning Inference

Authors: Vasilis Syrgkanis, Ruohan Zhan

We study estimation and inference using data collected by reinforcement
learning (RL) algorithms. These algorithms adaptively experiment by interacting
with individual units over multiple stages, updating their strategies based on
past outcomes. Our goal is to evaluate a counterfactual policy after data
collection and estimate structural parameters, such as dynamic treatment
effects, that support credit assignment and quantify the impact of early
actions on final outcomes. These parameters can often be defined as solutions
to moment equations, motivating moment-based estimation methods developed for
static data. In RL settings, however, data are often collected adaptively under
nonstationary behavior policies. As a result, standard estimators fail to
achieve asymptotic normality due to time-varying variance. We propose a
weighted generalized method of moments (GMM) approach that uses adaptive
weights to stabilize this variance. We characterize weighting schemes that
ensure consistency and asymptotic normality of the weighted GMM estimators,
enabling valid hypothesis testing and uniform confidence region construction.
Key applications include dynamic treatment effect estimation and dynamic
off-policy evaluation.

arXiv link: http://arxiv.org/abs/2302.08854v5

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2023-02-16

New $\sqrt{n}$-consistent, numerically stable higher-order influence function estimators

Authors: Lin Liu, Chang Li

Higher-Order Influence Functions (HOIFs) provide a unified theory for
constructing rate-optimal estimators for a large class of low-dimensional
(smooth) statistical functionals/parameters (and sometimes even
infinite-dimensional functions) that arise in substantive fields including
epidemiology, economics, and the social sciences. Since the introduction of
HOIFs by Robins et al. (2008), they have been viewed mostly as a theoretical
benchmark rather than a useful tool for statistical practice. Works aimed to
flip the script are scant, but a few recent papers Liu et al. (2017, 2021b)
make some partial progress. In this paper, we take a fresh attempt at achieving
this goal by constructing new, numerically stable HOIF estimators (or sHOIF
estimators for short with “s” standing for “stable”) with provable
statistical, numerical, and computational guarantees. This new class of sHOIF
estimators (up to the 2nd order) was foreshadowed in synthetic experiments
conducted by Liu et al. (2020a).

arXiv link: http://arxiv.org/abs/2302.08097v1

Econometrics arXiv updated paper (originally submitted: 2023-02-16)

Deep Learning Enhanced Realized GARCH

Authors: Chen Liu, Chao Wang, Minh-Ngoc Tran, Robert Kohn

We propose a new approach to volatility modeling by combining deep learning
(LSTM) and realized volatility measures. This LSTM-enhanced realized GARCH
framework incorporates and distills modeling advances from financial
econometrics, high frequency trading data and deep learning. Bayesian inference
via the Sequential Monte Carlo method is employed for statistical inference and
forecasting. The new framework can jointly model the returns and realized
volatility measures, has an excellent in-sample fit and superior predictive
performance compared to several benchmark models, while being able to adapt
well to the stylized facts in volatility. The performance of the new framework
is tested using a wide range of metrics, from marginal likelihood, volatility
forecasting, to tail risk forecasting and option pricing. We report on a
comprehensive empirical study using 31 widely traded stock indices over a time
period that includes the COVID-19 pandemic.

arXiv link: http://arxiv.org/abs/2302.08002v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-02-15

A Guide to Regression Discontinuity Designs in Medical Applications

Authors: Matias D. Cattaneo, Luke Keele, Rocio Titiunik

We present a practical guide for the analysis of regression discontinuity
(RD) designs in biomedical contexts. We begin by introducing key concepts,
assumptions, and estimands within both the continuity-based framework and the
local randomization framework. We then discuss modern estimation and inference
methods within both frameworks, including approaches for bandwidth or local
neighborhood selection, optimal treatment effect point estimation, and robust
bias-corrected inference methods for uncertainty quantification. We also
overview empirical falsification tests that can be used to support key
assumptions. Our discussion focuses on two particular features that are
relevant in biomedical research: (i) fuzzy RD designs, which often arise when
therapeutic treatments are based on clinical guidelines but patients with
scores near the cutoff are treated contrary to the assignment rule; and (ii) RD
designs with discrete scores, which are ubiquitous in biomedical applications.
We illustrate our discussion with three empirical applications: the effect of
CD4 guidelines for anti-retroviral therapy on retention of HIV patients in
South Africa, the effect of genetic guidelines for chemotherapy on breast
cancer recurrence in the United States, and the effects of age-based patient
cost-sharing on healthcare utilization in Taiwan. We provide replication
materials employing publicly available statistical software in Python, R and
Stata, offering researchers all necessary tools to conduct an RD analysis.

arXiv link: http://arxiv.org/abs/2302.07413v2
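
The paper's replication materials cover the full toolkit; as a bare-bones illustration
of the continuity-based idea only, the sketch below computes a sharp RD point estimate
by local linear regression with a triangular kernel at an assumed bandwidth h, omitting
bandwidth selection and robust bias correction.

    import numpy as np

    def sharp_rd_local_linear(y, x, cutoff=0.0, h=1.0):
        """Local linear sharp RD estimate of the jump in E[y|x] at the cutoff."""
        xc = x - cutoff
        w = np.maximum(0.0, 1.0 - np.abs(xc) / h)        # triangular kernel weights
        est = {}
        for side, mask in (("left", xc < 0), ("right", xc >= 0)):
            Xd = np.column_stack([np.ones(mask.sum()), xc[mask]])
            W = w[mask]
            beta = np.linalg.lstsq(Xd * W[:, None] ** 0.5,
                                   y[mask] * W ** 0.5, rcond=None)[0]
            est[side] = beta[0]                          # fitted value at the cutoff
        return est["right"] - est["left"]

    # Synthetic example: true jump of 0.5 at the cutoff.
    rng = np.random.default_rng(1)
    x = rng.uniform(-1, 1, 2000)
    y = 0.5 * (x >= 0) + 0.3 * x + rng.normal(scale=0.2, size=2000)
    print(sharp_rd_local_linear(y, x, cutoff=0.0, h=0.3))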

Econometrics arXiv paper, submitted: 2023-02-14

Sequential Estimation of Multivariate Factor Stochastic Volatility Models

Authors: Giorgio Calzolari, Roxana Halbleib, Christian Mücher

We provide a simple method to estimate the parameters of multivariate
stochastic volatility models with latent factor structures. These models are
very useful as they alleviate the standard curse of dimensionality, allowing
the number of parameters to increase only linearly with the number of the
return series. Although theoretically very appealing, these models have only
found limited practical application due to huge computational burdens. Our
estimation method is simple to implement, as it consists of two steps:
first, we estimate the loadings and the unconditional variances by maximum
likelihood, and then we use the efficient method of moments to estimate the
parameters of the stochastic volatility structure with GARCH as an auxiliary
model. In a comprehensive Monte Carlo study we show the good performance of our
method to estimate the parameters of interest accurately. The simulation study
and an application to real vectors of daily returns of dimensions up to 148
show the method's computational advantage over existing estimation
procedures.

arXiv link: http://arxiv.org/abs/2302.07052v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-02-14

Quantiled conditional variance, skewness, and kurtosis by Cornish-Fisher expansion

Authors: Ningning Zhang, Ke Zhu

The conditional variance, skewness, and kurtosis play a central role in time
series analysis. These three conditional moments (CMs) are often studied with
parametric models, which raises two major issues: the risk of model
mis-specification and the instability of model estimation. To avoid these
issues, this paper proposes a novel method to estimate the three CMs via
the so-called quantiled CMs (QCMs). The QCM method first adopts the idea of
Cornish-Fisher expansion to construct a linear regression model, based on $n$
different estimated conditional quantiles. Next, it computes the QCMs simply
and simultaneously by using the ordinary least squares estimator of this
regression model, without any prior estimation of the conditional mean. Under
certain conditions, the QCMs are shown to be consistent with the convergence
rate $n^{-1/2}$. Simulation studies indicate that the QCMs perform well under
different scenarios of Cornish-Fisher expansion errors and quantile estimation
errors. In the application, the study of QCMs for three exchange rates
demonstrates the effectiveness of financial rescue plans during the COVID-19
pandemic outbreak, and suggests that the existing “news impact curve”
functions for the conditional skewness and kurtosis may not be suitable.

arXiv link: http://arxiv.org/abs/2302.06799v2
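
A sketch of the flavor of the QCM regression, under the assumption that the regressors
are the Cornish-Fisher basis [1, z, (z^2-1)/6, (z^3-3z)/24] and that the OLS slopes are
read as (location, scale, scale x skewness, scale x excess kurtosis); the paper's exact
construction and standardization may differ.

    import numpy as np
    from scipy.stats import norm

    def quantiled_conditional_moments(q_hat, taus):
        """
        q_hat: estimated conditional quantiles at levels taus (same length).
        Returns rough (variance, skewness, excess kurtosis) implied by a
        Cornish-Fisher-type OLS regression of q_hat on functions of z = Phi^{-1}(tau).
        """
        z = norm.ppf(np.asarray(taus))
        X = np.column_stack([np.ones_like(z), z,
                             (z ** 2 - 1) / 6.0, (z ** 3 - 3 * z) / 24.0])
        b = np.linalg.lstsq(X, np.asarray(q_hat), rcond=None)[0]
        sigma = b[1]
        return sigma ** 2, b[2] / sigma, b[3] / sigma

    # Check: quantiles of N(0, 2^2) should give variance ~4, skewness ~0, kurtosis ~0.
    taus = np.linspace(0.05, 0.95, 19)
    q = norm.ppf(taus, loc=0.0, scale=2.0)
    print(quantiled_conditional_moments(q, taus))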

Econometrics arXiv updated paper (originally submitted: 2023-02-11)

Individualized Treatment Allocation in Sequential Network Games

Authors: Toru Kitagawa, Guanyi Wang

Designing individualized allocation of treatments so as to maximize the
equilibrium welfare of interacting agents has many policy-relevant
applications. Focusing on sequential decision games of interacting agents, this
paper develops a method to obtain optimal treatment assignment rules that
maximize a social welfare criterion by evaluating stationary distributions of
outcomes. Stationary distributions in sequential decision games are given by
Gibbs distributions, which are difficult to optimize with respect to a
treatment allocation due to analytical and computational complexity. We apply a
variational approximation to the stationary distribution and optimize the
approximated equilibrium welfare with respect to treatment allocation using a
greedy optimization algorithm. We characterize the performance of the
variational approximation, deriving a performance guarantee for the greedy
optimization algorithm via a welfare regret bound. We implement our proposed
method in simulation exercises and an empirical application using the Indian
microfinance data (Banerjee et al., 2013), and show it delivers significant
welfare gains.

arXiv link: http://arxiv.org/abs/2302.05747v5

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2023-02-10

Minimax Instrumental Variable Regression and $L_2$ Convergence Guarantees without Identification or Closedness

Authors: Andrew Bennett, Nathan Kallus, Xiaojie Mao, Whitney Newey, Vasilis Syrgkanis, Masatoshi Uehara

In this paper, we study nonparametric estimation of instrumental variable
(IV) regressions. Recently, many flexible machine learning methods have been
developed for instrumental variable estimation. However, these methods have at
least one of the following limitations: (1) restricting the IV regression to be
uniquely identified; (2) only obtaining estimation error rates in terms of
pseudometrics (e.g., projected norm) rather than valid metrics
(e.g., $L_2$ norm); or (3) imposing the so-called closedness condition
that requires a certain conditional expectation operator to be sufficiently
smooth. In this paper, we present the first method and analysis that can avoid
all three limitations, while still permitting general function approximation.
Specifically, we propose a new penalized minimax estimator that can converge to
a fixed IV solution even when there are multiple solutions, and we derive a
strong $L_2$ error rate for our estimator under lax conditions. Notably, this
guarantee only needs a widely-used source condition and realizability
assumptions, but not the so-called closedness condition. We argue that the
source condition and the closedness condition are inherently conflicting, so
relaxing the latter significantly improves upon the existing literature that
requires both conditions. Our estimator can achieve this improvement because it
builds on a novel formulation of the IV estimation problem as a constrained
optimization problem.

arXiv link: http://arxiv.org/abs/2302.05404v1

Econometrics arXiv updated paper (originally submitted: 2023-02-10)

Policy Learning with Rare Outcomes

Authors: Julia Hatamyar, Noemi Kreif

Machine learning (ML) estimates of conditional average treatment effects
(CATE) can guide policy decisions, either by allowing targeting of individuals
with beneficial CATE estimates, or as inputs to decision trees that optimise
overall outcomes. There is limited information available regarding how well
these algorithms perform in real-world policy evaluation scenarios. Using
synthetic data, we compare the finite sample performance of different policy
learning algorithms, machine learning techniques employed during their learning
phases, and methods for presenting estimated policy values. For each algorithm,
we assess the resulting treatment allocation by measuring deviation from the
ideal ("oracle") policy. Our main finding is that policy trees based on
estimated CATEs outperform trees learned from doubly-robust scores. Across
settings, Causal Forests and the Normalised Double-Robust Learner perform
consistently well, while Bayesian Additive Regression Trees perform poorly.
These methods are then applied to a case study targeting optimal allocation of
subsidised health insurance, with the goal of reducing infant mortality in
Indonesia.

arXiv link: http://arxiv.org/abs/2302.05260v2
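
An illustrative sketch (not the authors' benchmark code) of one pipeline of the kind
compared above: estimate CATEs with a simple T-learner and then fit a shallow policy
tree that targets units with positive estimated CATE. The estimator choices here
(random forests, tree depth) are assumptions.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    n, p = 2000, 5
    X = rng.normal(size=(n, p))
    d = rng.integers(0, 2, size=n)                  # randomized treatment
    tau = np.where(X[:, 0] > 0, 1.0, -0.5)          # heterogeneous treatment effect
    y = X[:, 1] + tau * d + rng.normal(size=n)

    # T-learner CATE: separate outcome models for treated and control units.
    m1 = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[d == 1], y[d == 1])
    m0 = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[d == 0], y[d == 0])
    cate_hat = m1.predict(X) - m0.predict(X)

    # Shallow policy tree: an interpretable rule that treats where the CATE is positive.
    policy_tree = DecisionTreeClassifier(max_depth=2, random_state=0)
    policy_tree.fit(X, (cate_hat > 0).astype(int))
    print("share treated under learned policy:", policy_tree.predict(X).mean())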

Econometrics arXiv paper, submitted: 2023-02-10

Structural Break Detection in Quantile Predictive Regression Models with Persistent Covariates

Authors: Christis Katsouris

We propose an econometric environment for structural break detection in
nonstationary quantile predictive regressions. We establish the limit
distributions for a class of Wald and fluctuation type statistics based on both
the ordinary least squares estimator and the endogenous instrumental regression
estimator proposed by Phillips and Magdalinos (2009a, Econometric Inference in
the Vicinity of Unity. Working paper, Singapore Management University).
Although the asymptotic distribution of these test statistics appears to depend
on the chosen estimator, the IVX based tests are shown to be asymptotically
nuisance parameter-free regardless of the degree of persistence and consistent
under local alternatives. The finite-sample performance of both tests is
evaluated via simulation experiments. An empirical application to house pricing
index returns demonstrates the practicality of the proposed break tests for
regression quantiles of nonstationary time series data.

arXiv link: http://arxiv.org/abs/2302.05193v1

Econometrics arXiv paper, submitted: 2023-02-10

On semiparametric estimation of the intercept of the sample selection model: a kernel approach

Authors: Zhewen Pan

This paper presents a new perspective on the identification at infinity for
the intercept of the sample selection model as identification at the boundary
via a transformation of the selection index. This perspective suggests
generalizations of estimation at infinity to kernel regression estimation at
the boundary and further to local linear estimation at the boundary. The
proposed kernel-type estimators with an estimated transformation are proven to
be nonparametric-rate consistent and asymptotically normal under mild
regularity conditions. A fully data-driven method of selecting the optimal
bandwidths for the estimators is developed. The Monte Carlo simulation shows
the desirable finite sample properties of the proposed estimators and bandwidth
selection procedures.

arXiv link: http://arxiv.org/abs/2302.05089v1

Econometrics arXiv updated paper (originally submitted: 2023-02-09)

Covariate Adjustment in Experiments with Matched Pairs

Authors: Yuehao Bai, Liang Jiang, Joseph P. Romano, Azeem M. Shaikh, Yichong Zhang

This paper studies inference on the average treatment effect in experiments
in which treatment status is determined according to "matched pairs" and it is
additionally desired to adjust for observed, baseline covariates to gain
further precision. By a "matched pairs" design, we mean that units are sampled
i.i.d. from the population of interest, paired according to observed, baseline
covariates and finally, within each pair, one unit is selected at random for
treatment. Importantly, we presume that not all observed, baseline covariates
are used in determining treatment assignment. We study a broad class of
estimators based on a "doubly robust" moment condition that permits us to study
estimators with both finite-dimensional and high-dimensional forms of covariate
adjustment. We find that estimators with finite-dimensional, linear adjustments
need not lead to improvements in precision relative to the unadjusted
difference-in-means estimator. This phenomenon persists even if the adjustments
are interacted with treatment; in fact, doing so leads to no changes in
precision. However, gains in precision can be ensured by including fixed
effects for each of the pairs. Indeed, we show that this adjustment is the
"optimal" finite-dimensional, linear adjustment. We additionally study two
estimators with high-dimensional forms of covariate adjustment based on the
LASSO. For each such estimator, we show that it leads to improvements in
precision relative to the unadjusted difference-in-means estimator and also
provide conditions under which it leads to the "optimal" nonparametric,
covariate adjustment. A simulation study confirms the practical relevance of
our theoretical analysis, and the methods are employed to reanalyze data from
an experiment using a "matched pairs" design to study the effect of
macroinsurance on microenterprise.

arXiv link: http://arxiv.org/abs/2302.04380v3
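
A minimal sketch of the finite-dimensional adjustment highlighted above: regress the
outcome on treatment, covariates, and pair fixed effects. The data generation and the
use of statsmodels OLS are illustrative assumptions; the paper's standard-error
construction is more involved.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n_pairs = 300
    x = rng.normal(size=2 * n_pairs)
    pair = np.repeat(np.arange(n_pairs), 2)
    # Within each pair, one unit is treated at random.
    d = np.zeros(2 * n_pairs, dtype=int)
    d[np.arange(n_pairs) * 2 + rng.integers(0, 2, n_pairs)] = 1
    y = 1.0 * d + 0.8 * x + rng.normal(size=2 * n_pairs)

    df = pd.DataFrame({"y": y, "d": d, "x": x, "pair": pair})

    # Linear adjustment with pair fixed effects, the "optimal" finite-dimensional
    # linear adjustment discussed in the abstract.
    fit = smf.ols("y ~ d + x + C(pair)", data=df).fit()
    print(fit.params["d"])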

Econometrics arXiv updated paper (originally submitted: 2023-02-08)

Consider or Choose? The Role and Power of Consideration Sets

Authors: Yi-Chun Akchen, Dmitry Mitrofanov

Consideration sets play a crucial role in discrete choice modeling, where
customers often form consideration sets in the first stage and then use a
second-stage choice mechanism to select the product with the highest utility.
While many recent studies aim to improve choice models by incorporating more
sophisticated second-stage choice mechanisms, this paper takes a step back and
goes into the opposite extreme. We simplify the second-stage choice mechanism
to its most basic form and instead focus on modeling customer choice by
emphasizing the role and power of the first-stage consideration set formation.
To this end, we study a model that is parameterized solely by a distribution
over consideration sets with a bounded rationality interpretation.
Intriguingly, we show that this model is characterized by the axiom of
symmetric demand cannibalization, enabling complete statistical identification.
The latter finding highlights the critical role of consideration sets in the
identifiability of two-stage choice models. We also examine the model's
implications for assortment planning, proving that the optimal assortment is
revenue-ordered within each partition block created by consideration sets.
Despite this compelling structure, we establish that the assortment problem
under this model is NP-hard even to approximate, highlighting how consideration
sets contribute to intractability, even under the simplest uniform
second-stage choice mechanism. Finally, using real-world data, we show that the
model achieves prediction performance comparable to other advanced choice
models. Given the simplicity of the model's second-stage phase, this result
showcases the enormous power of first-stage consideration set formation in
capturing customers' decision-making processes.

arXiv link: http://arxiv.org/abs/2302.04354v4

Econometrics arXiv updated paper (originally submitted: 2023-02-08)

High-Dimensional Granger Causality for Climatic Attribution

Authors: Marina Friedrich, Luca Margaritella, Stephan Smeekes

In this paper we test for Granger causality in high-dimensional vector
autoregressive models (VARs) to disentangle and interpret the complex causal
chains linking radiative forcings and global temperatures. By allowing for high
dimensionality in the model, we can enrich the information set with relevant
natural and anthropogenic forcing variables to obtain reliable causal
relations. This provides a step forward from existing climatology literature,
which has mostly treated these variables in isolation in small models.
Additionally, our framework allows us to disregard the order of integration of
the variables by directly estimating the VAR in levels, thus avoiding accumulating
biases coming from unit-root and cointegration tests. This is of particular
appeal for climate time series which are well known to contain stochastic
trends and long memory. We are thus able to establish causal networks linking
radiative forcings to global temperatures and to connect radiative forcings
among themselves, thereby allowing for tracing the path of dynamic causal
effects through the system.

arXiv link: http://arxiv.org/abs/2302.03996v2

Econometrics arXiv cross-link from General Economics (econ.GN), submitted: 2023-02-07

Reevaluating the Taylor Rule with Machine Learning

Authors: Alper Deniz Karakas

This paper reevaluates the Taylor Rule with a linear and a nonlinear method,
so that its estimated federal funds rates better match those actually
implemented by the Federal Reserve. In the linear method, the paper uses an
OLS regression to find more accurate coefficients within the same Taylor Rule
equation, in which the dependent variable is the federal funds rate and the
independent variables are the inflation rate, the inflation gap, and the
output gap. The intercept in the OLS regression captures the constant
equilibrium target real interest rate, set at 2. The linear OLS results
suggest that the Taylor Rule overestimates the coefficients on the output gap
and the standalone inflation rate. The coefficients this paper suggests are
shown in equation (2). In the nonlinear method, this paper uses a machine
learning system in
which the two inputs are the inflation rate and the output gap and the output
is the federal funds rate. This system utilizes gradient descent error
minimization to create a model that minimizes the error between the estimated
federal funds rate and the actual previously implemented federal funds rate.
Because the machine learning system captures the more realistic nonlinear
relationship between the variables, it significantly increases estimation
accuracy. The actual and estimated federal funds rates are almost identical
except during three recessions caused by bubble bursts, which the paper
addresses in the concluding remarks. Overall, the first
method provides theoretical insight while the second suggests a model with
improved applicability.

arXiv link: http://arxiv.org/abs/2302.08323v1
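
A minimal sketch of the linear step only: an OLS regression of the federal funds rate
on the inflation rate, the inflation gap, and the output gap, estimated with
statsmodels. The data frame values are placeholders, not the paper's dataset, and the
nonlinear gradient-descent model is not reproduced.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Placeholder series; in practice these would come from FRED or similar sources.
    df = pd.DataFrame({
        "fed_funds":     [5.0, 4.8, 4.5, 3.9, 3.2, 2.8, 2.5, 2.9, 3.4, 3.8],
        "inflation":     [3.1, 3.0, 2.8, 2.4, 2.0, 1.9, 1.8, 2.1, 2.5, 2.7],
        "inflation_gap": [1.0, 1.1, 0.7, 0.5, 0.1, -0.2, -0.1, 0.0, 0.6, 0.6],
        "output_gap":    [0.5, 0.4, 0.2, -0.3, -0.8, -1.0, -0.9, -0.2, 0.3, 0.6],
    })

    # Estimated Taylor-rule coefficients; the intercept proxies the equilibrium
    # real rate discussed in the abstract.
    fit = smf.ols("fed_funds ~ inflation + inflation_gap + output_gap", data=df).fit()
    print(fit.params)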

Econometrics arXiv updated paper (originally submitted: 2023-02-07)

Covariate Adjustment in Stratified Experiments

Authors: Max Cytrynbaum

This paper studies covariate adjusted estimation of the average treatment
effect in stratified experiments. We work in a general framework that includes
matched tuples designs, coarse stratification, and complete randomization as
special cases. Regression adjustment with treatment-covariate interactions is
known to weakly improve efficiency for completely randomized designs. By
contrast, we show that for stratified designs such regression estimators are
generically inefficient, potentially even increasing estimator variance
relative to the unadjusted benchmark. Motivated by this result, we derive the
asymptotically optimal linear covariate adjustment for a given stratification.
We construct several feasible estimators that implement this efficient
adjustment in large samples. In the special case of matched pairs, for example,
the regression including treatment, covariates, and pair fixed effects is
asymptotically optimal. We also provide novel asymptotically exact inference
methods that allow researchers to report smaller confidence intervals, fully
reflecting the efficiency gains from both stratification and adjustment.
Simulations and an empirical application demonstrate the value of our proposed
methods.

arXiv link: http://arxiv.org/abs/2302.03687v4

Econometrics arXiv paper, submitted: 2023-02-07

High-Dimensional Conditionally Gaussian State Space Models with Missing Data

Authors: Joshua C. C. Chan, Aubrey Poon, Dan Zhu

We develop an efficient sampling approach for handling complex missing data
patterns and a large number of missing observations in conditionally Gaussian
state space models. Two important examples are dynamic factor models with
unbalanced datasets and large Bayesian VARs with variables in multiple
frequencies. A key insight underlying the proposed approach is that the joint
distribution of the missing data conditional on the observed data is Gaussian.
Moreover, the inverse covariance or precision matrix of this conditional
distribution is sparse, and this special structure can be exploited to
substantially speed up computations. We illustrate the methodology using two
empirical applications. The first application combines quarterly, monthly and
weekly data using a large Bayesian VAR to produce weekly GDP estimates. In the
second application, we extract latent factors from unbalanced datasets
involving over a hundred monthly variables via a dynamic factor model with
stochastic volatility.

arXiv link: http://arxiv.org/abs/2302.03172v1
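
A toy sketch of the key insight, assuming a known sparse joint precision matrix: the
missing block given the observed block is Gaussian, its conditional mean is obtained
from a sparse solve, and a draw uses a Cholesky factor of the missing-block precision
(dense here for brevity; a sparse Cholesky would be used at scale).

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import spsolve

    def draw_missing_given_observed(mu, Omega, y, miss, rng):
        """Draw missing entries of y from their Gaussian conditional given observed ones.
        mu: joint mean, Omega: sparse joint precision, miss: boolean missingness mask."""
        Omega = sp.csc_matrix(Omega)
        m_idx = np.flatnonzero(miss)
        o_idx = np.flatnonzero(~miss)
        Q_mm = Omega[m_idx][:, m_idx]
        Q_mo = Omega[m_idx][:, o_idx]
        # Conditional mean: mu_m - Q_mm^{-1} Q_mo (y_o - mu_o), solved sparsely.
        cond_mean = mu[m_idx] + spsolve(Q_mm, -Q_mo @ (y[o_idx] - mu[o_idx]))
        # Draw: mean + L'^{-1} z with Q_mm = L L'.
        L = np.linalg.cholesky(Q_mm.toarray())
        z = rng.standard_normal(len(m_idx))
        return cond_mean + np.linalg.solve(L.T, z)

    # Tiny example: AR(1)-type tridiagonal precision with a few missing observations.
    rng = np.random.default_rng(0)
    T, rho = 10, 0.8
    Omega = sp.diags([-rho * np.ones(T - 1), (1 + rho ** 2) * np.ones(T),
                      -rho * np.ones(T - 1)], offsets=[-1, 0, 1]).tocsc()
    mu = np.zeros(T)
    y = rng.standard_normal(T)
    miss = np.zeros(T, dtype=bool)
    miss[[3, 4, 7]] = True
    print(draw_missing_given_observed(mu, Omega, y, miss, rng))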

Econometrics arXiv paper, submitted: 2023-02-06

Extensions for Inference in Difference-in-Differences with Few Treated Clusters

Authors: Luis Alvarez, Bruno Ferman

In settings with few treated units, Difference-in-Differences (DID)
estimators are not consistent, and are not generally asymptotically normal.
This poses relevant challenges for inference. While there are inference methods
that are valid in these settings, some of these alternatives are not readily
available when there is variation in treatment timing and heterogeneous
treatment effects, or for deriving uniform confidence bands for event-study
plots. We present alternatives in settings with few treated units that are
valid with variation in treatment timing and/or that allow for uniform
confidence bands.

arXiv link: http://arxiv.org/abs/2302.03131v1

Econometrics arXiv updated paper (originally submitted: 2023-02-06)

Asymptotic Representations for Sequential Decisions, Adaptive Experiments, and Batched Bandits

Authors: Keisuke Hirano, Jack R. Porter

We develop asymptotic approximations that can be applied to sequential
estimation and inference problems, adaptive randomized controlled trials, and
related settings. In batched adaptive settings where the decision at one stage
can affect the observation of variables in later stages, our asymptotic
representation characterizes all limit distributions attainable through a joint
choice of an adaptive design rule and statistics applied to the adaptively
generated data. This facilitates local power analysis of tests, comparison of
adaptive treatment rules, and other analyses of batchwise sequential
statistical decision rules.

arXiv link: http://arxiv.org/abs/2302.03117v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2023-02-06

Asymptotically Optimal Fixed-Budget Best Arm Identification with Variance-Dependent Bounds

Authors: Masahiro Kato, Masaaki Imaizumi, Takuya Ishihara, Toru Kitagawa

We investigate the problem of fixed-budget best arm identification (BAI) for
minimizing expected simple regret. In an adaptive experiment, a decision maker
draws one of multiple treatment arms based on past observations and observes
the outcome of the drawn arm. After the experiment, the decision maker
recommends the treatment arm with the highest expected outcome. We evaluate the
decision based on the expected simple regret, which is the difference between
the expected outcomes of the best arm and the recommended arm. Due to inherent
uncertainty, we evaluate the regret using the minimax criterion. First, we
derive asymptotic lower bounds for the worst-case expected simple regret, which
are characterized by the variances of potential outcomes (leading factor).
Based on the lower bounds, we propose the Two-Stage (TS)-Hirano-Imbens-Ridder
(HIR) strategy, which utilizes the HIR estimator (Hirano et al., 2003) in
recommending the best arm. Our theoretical analysis shows that the TS-HIR
strategy is asymptotically minimax optimal, meaning that the leading factor of
its worst-case expected simple regret matches our derived worst-case lower
bound. Additionally, we consider extensions of our method, such as the
asymptotic optimality for the probability of misidentification. Finally, we
validate the proposed method's effectiveness through simulations.

arXiv link: http://arxiv.org/abs/2302.02988v2

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2023-02-06

In Search of Insights, Not Magic Bullets: Towards Demystification of the Model Selection Dilemma in Heterogeneous Treatment Effect Estimation

Authors: Alicia Curth, Mihaela van der Schaar

Personalized treatment effect estimates are often of interest in high-stakes
applications -- thus, before deploying a model estimating such effects in
practice, one needs to be sure that the best candidate from the ever-growing
machine learning toolbox for this task was chosen. Unfortunately, due to the
absence of counterfactual information in practice, it is usually not possible
to rely on standard validation metrics for doing so, leading to a well-known
model selection dilemma in the treatment effect estimation literature. While
some solutions have recently been investigated, systematic understanding of the
strengths and weaknesses of different model selection criteria is still
lacking. In this paper, instead of attempting to declare a global `winner', we
therefore empirically investigate success- and failure modes of different
selection criteria. We highlight that there is a complex interplay between
selection strategies, candidate estimators and the data used for comparing
them, and provide interesting insights into the relative (dis)advantages of
different criteria alongside desiderata for the design of further illuminating
empirical studies in this context.

arXiv link: http://arxiv.org/abs/2302.02923v2

Econometrics arXiv paper, submitted: 2023-02-06

Penalized Quasi-likelihood Estimation and Model Selection in Time Series Models with Parameters on the Boundary

Authors: Heino Bohn Nielsen, Anders Rahbek

We extend the theory from Fan and Li (2001) on penalized likelihood-based
estimation and model-selection to statistical and econometric models which
allow for non-negativity constraints on some or all of the parameters, as well
as time-series dependence. This differs from classic non-penalized likelihood
estimation, where the limiting distributions of likelihood-based estimators and
test statistics are non-standard and depend on the unknown number of
parameters on the boundary of the parameter space. Specifically, we establish
that joint model selection and estimation results in estimators with standard
Gaussian asymptotic distributions. The results are applied to the rich class of
autoregressive conditional heteroskedastic (ARCH) models for the modelling of
time-varying volatility. We find from simulations that the penalized estimation
and model-selection works surprisingly well even for a large number of
parameters. A simple empirical illustration for stock-market returns data
confirms the ability of the penalized estimation to select ARCH models which
fit nicely the autocorrelation function, as well as confirms the stylized fact
of long-memory in financial time series data.

arXiv link: http://arxiv.org/abs/2302.02867v1

Econometrics arXiv updated paper (originally submitted: 2023-02-06)

Out of Sample Predictability in Predictive Regressions with Many Predictor Candidates

Authors: Jesus Gonzalo, Jean-Yves Pitarakis

This paper is concerned with detecting the presence of out of sample
predictability in linear predictive regressions with a potentially large set of
candidate predictors. We propose a procedure based on out of sample MSE
comparisons that is implemented in a pairwise manner using one predictor at a
time and resulting in an aggregate test statistic that is standard normally
distributed under the global null hypothesis of no linear predictability.
Predictors can be highly persistent, purely stationary or a combination of
both. Upon rejection of the null hypothesis we subsequently introduce a
predictor screening procedure designed to identify the most active predictors.
An empirical application to key predictors of US economic activity illustrates
the usefulness of our methods and highlights the important forward looking role
played by the series of manufacturing new orders.

arXiv link: http://arxiv.org/abs/2302.02866v2

Econometrics arXiv updated paper (originally submitted: 2023-02-06)

Testing Quantile Forecast Optimality

Authors: Jack Fosten, Daniel Gutknecht, Marc-Oliver Pohle

Quantile forecasts made across multiple horizons have become an important
output of many financial institutions, central banks and international
organisations. This paper proposes misspecification tests for such quantile
forecasts that assess optimality over a set of multiple forecast horizons
and/or quantiles. The tests build on multiple Mincer-Zarnowitz quantile
regressions cast in a moment equality framework. Our main test is for the null
hypothesis of autocalibration, a concept which assesses optimality with respect
to the information contained in the forecasts themselves. We also provide an
extension that allows testing for optimality with respect to larger information
sets, as well as a multivariate extension. Importantly, our tests do not just inform
about general violations of optimality, but may also provide useful insights
into specific forms of sub-optimality. A simulation study investigates the
finite sample performance of our tests, and two empirical applications to
financial returns and U.S. macroeconomic series illustrate that our tests can
yield interesting insights into quantile forecast sub-optimality and its
causes.

arXiv link: http://arxiv.org/abs/2302.02747v2
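
A stripped-down illustration of a single Mincer-Zarnowitz quantile regression (one
horizon, one level): regress the realized outcome on the issued quantile forecast at
level tau and check intercept 0 / slope 1. The simulated forecast, the statsmodels
quantile regression, and the Wald test below are illustrative; the paper's joint
moment-equality tests are not reproduced.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    tau, n = 0.9, 1000

    # Simulated data: a well-calibrated 90% quantile forecast of y.
    sigma = 0.5 + 0.5 * rng.uniform(size=n)
    y = rng.normal(scale=sigma)
    q_forecast = sigma * 1.2816        # true conditional 90% quantile of N(0, sigma^2)

    df = pd.DataFrame({"y": y, "q": q_forecast})

    # Mincer-Zarnowitz quantile regression at level tau: under autocalibration,
    # the intercept should be near 0 and the slope near 1.
    fit = smf.quantreg("y ~ q", data=df).fit(q=tau)
    print(fit.params)
    print(fit.f_test("Intercept = 0, q = 1"))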

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-02-05

Estimating Time-Varying Networks for High-Dimensional Time Series

Authors: Jia Chen, Degui Li, Yuning Li, Oliver Linton

We explore time-varying networks for high-dimensional locally stationary time
series, using the large VAR model framework with both the transition and
(error) precision matrices evolving smoothly over time. Two types of
time-varying graphs are investigated: one containing directed edges of Granger
causality linkages, and the other containing undirected edges of partial
correlation linkages. Under the sparse structural assumption, we propose a
penalised local linear method with time-varying weighted group LASSO to jointly
estimate the transition matrices and identify their significant entries, and a
time-varying CLIME method to estimate the precision matrices. The estimated
transition and precision matrices are then used to determine the time-varying
network structures. Under some mild conditions, we derive the theoretical
properties of the proposed estimates including the consistency and oracle
properties. In addition, we extend the methodology and theory to cover
highly-correlated large-scale time series, for which the sparsity assumption
becomes invalid and we allow for common factors before estimating the
factor-adjusted time-varying networks. We provide extensive simulation studies
and an empirical application to a large U.S. macroeconomic dataset to
illustrate the finite-sample performance of our methods.

arXiv link: http://arxiv.org/abs/2302.02476v1

Econometrics arXiv paper, submitted: 2023-02-05

Testing for Structural Change under Nonstationarity

Authors: Christis Katsouris

This Appendix (dated: July 2021) includes supplementary derivations related
to the main limit results of the econometric framework for structural break
testing in predictive regression models based on the OLS-Wald and IVX-Wald test
statistics, developed by Katsouris C (2021). In particular, we derive the
asymptotic distributions of the test statistics when the predictive regression
model includes either mildly integrated or persistent regressors. Moreover, we
consider the case in which a model intercept is included in the model vis-a-vis
the case that the predictive regression model has no model intercept. In a
subsequent version of this study we reexamine these particular aspects in more
depth with respect to the demeaned versions of the variables of the predictive
regression.

arXiv link: http://arxiv.org/abs/2302.02370v1

Econometrics arXiv paper, submitted: 2023-02-03

Using bayesmixedlogit and bayesmixedlogitwtp in Stata

Authors: Matthew J. Baker

This document presents an overview of the bayesmixedlogit and
bayesmixedlogitwtp Stata packages. It mirrors closely the helpfile obtainable
in Stata (i.e., through help bayesmixedlogit or help bayesmixedlogitwtp).
Further background for the packages can be found in Baker(2014).

arXiv link: http://arxiv.org/abs/2302.01775v1

Econometrics arXiv paper, submitted: 2023-02-03

Agreed and Disagreed Uncertainty

Authors: Luca Gambetti, Dimitris Korobilis, John Tsoukalas, Francesco Zanetti

When agents' information is imperfect and dispersed, existing measures of
macroeconomic uncertainty based on the forecast error variance have two
distinct drivers: the variance of the economic shock and the variance of the
information dispersion. The former driver increases uncertainty and reduces
agents' disagreement (agreed uncertainty). The latter increases both
uncertainty and disagreement (disagreed uncertainty). We use these implications
to identify empirically the effects of agreed and disagreed uncertainty shocks,
based on a novel measure of consumer disagreement derived from survey
expectations. Disagreed uncertainty has no discernible economic effects and is
benign for economic activity, but agreed uncertainty exerts significant
depressing effects on a broad spectrum of macroeconomic indicators.

arXiv link: http://arxiv.org/abs/2302.01621v1

Econometrics arXiv updated paper (originally submitted: 2023-02-02)

Inference in Non-stationary High-Dimensional VARs

Authors: Alain Hecq, Luca Margaritella, Stephan Smeekes

In this paper we construct an inferential procedure for Granger causality in
high-dimensional non-stationary vector autoregressive (VAR) models. Our method
does not require knowledge of the order of integration of the time series under
consideration. We augment the VAR with at least as many lags as the suspected
maximum order of integration, an approach which has been proven to be robust
against the presence of unit roots in low dimensions. We prove that we can
restrict the augmentation to only the variables of interest for the testing,
thereby making the approach suitable for high dimensions. We combine this lag
augmentation with a post-double-selection procedure in which a set of initial
penalized regressions is performed to select the relevant variables for both
the Granger causing and caused variables. We then establish uniform asymptotic
normality of a second-stage regression involving only the selected variables.
Finite sample simulations show good performance, and an application
investigating the (predictive) causes and effects of economic uncertainty
illustrates the need to allow for unknown orders of integration.

arXiv link: http://arxiv.org/abs/2302.01434v2
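
A loose sketch of the mechanics described above, combining lag augmentation with a
post-double-selection step; the paper's exact selection equations, standard errors,
and handling of the augmentation lags differ, and the helper below is a hypothetical
illustration rather than the authors' procedure.

    import numpy as np
    from sklearn.linear_model import LassoCV

    def pds_granger_test(Y, cause, effect, p=2, d_max=1):
        """Rough lag-augmented post-double-selection Granger-causality Wald statistic.
        Y: (T, N) data matrix; p: VAR lag order; d_max: augmentation lags."""
        T, N = Y.shape
        total_lags = p + d_max
        # Lag matrix, column order (variable, lag).
        Z = np.column_stack([Y[total_lags - l:T - l, j]
                             for j in range(N) for l in range(1, total_lags + 1)])
        y = Y[total_lags:, effect]
        cols = [(j, l) for j in range(N) for l in range(1, total_lags + 1)]
        test_idx = [i for i, (j, l) in enumerate(cols) if j == cause and l <= p]
        ctrl_idx = [i for i in range(len(cols)) if i not in test_idx]

        # Double selection: lasso of y on the controls, and of each tested lag on them.
        selected = set()
        for target in [y] + [Z[:, i] for i in test_idx]:
            las = LassoCV(cv=5).fit(Z[:, ctrl_idx], target)
            selected |= {ctrl_idx[k] for k in np.flatnonzero(las.coef_)}

        # Post-selection OLS and a Wald test on the p Granger-causality coefficients.
        keep = test_idx + sorted(selected)
        X = np.column_stack([np.ones(len(y)), Z[:, keep]])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = resid @ resid / (len(y) - X.shape[1])
        V = sigma2 * np.linalg.inv(X.T @ X)
        b = beta[1:1 + len(test_idx)]
        Vb = V[1:1 + len(test_idx), 1:1 + len(test_idx)]
        return float(b @ np.linalg.solve(Vb, b))   # compare to chi2 with p d.o.f.

    rng = np.random.default_rng(0)
    T = 400
    x = np.cumsum(rng.normal(size=T))              # unit-root "cause"
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = 0.5 * y[t - 1] + 0.3 * x[t - 1] + rng.normal()
    Y = np.column_stack([y, x, rng.normal(size=(T, 3))])
    print(pds_granger_test(Y, cause=1, effect=0, p=2, d_max=1))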

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2023-02-02

A Machine Learning Approach to Measuring Climate Adaptation

Authors: Max Vilgalys

I measure adaptation to climate change by comparing elasticities from
short-run and long-run changes in damaging weather. I propose a debiased
machine learning approach to flexibly measure these elasticities in panel
settings. In a simulation exercise, I show that debiased machine learning has
considerable benefits relative to standard machine learning or ordinary least
squares, particularly in high-dimensional settings. I then measure adaptation
to damaging heat exposure in United States corn and soy production. Using rich
sets of temperature and precipitation variation, I find evidence that short-run
impacts from damaging heat are significantly offset in the long run. I show
that this is because the impacts of long-run changes in heat exposure do not
follow the same functional form as short-run shocks to heat exposure.

arXiv link: http://arxiv.org/abs/2302.01236v1

Econometrics arXiv updated paper (originally submitted: 2023-02-02)

Sparse High-Dimensional Vector Autoregressive Bootstrap

Authors: Robert Adamek, Stephan Smeekes, Ines Wilms

We introduce a high-dimensional multiplier bootstrap for time series data
based on capturing dependence through a sparsely estimated vector
autoregressive model. We prove its consistency for inference on
high-dimensional means under two different moment assumptions on the errors,
namely sub-gaussian moments and a finite number of absolute moments. In
establishing these results, we derive a Gaussian approximation for the maximum
mean of a linear process, which may be of independent interest.

arXiv link: http://arxiv.org/abs/2302.01233v2

Econometrics arXiv updated paper (originally submitted: 2023-02-01)

Regression Adjustment, Cross-Fitting, and Randomized Experiments with Many Controls

Authors: Harold D Chiang, Yukitoshi Matsushita, Taisuke Otsu

This paper studies estimation and inference for average treatment effects in
randomized experiments with many covariates, under a design-based framework
with a deterministic number of treated units. We show that a simple yet
powerful cross-fitted regression adjustment achieves bias-correction and leads
to sharper asymptotic properties than existing alternatives. Specifically, we
derive higher-order stochastic expansions, analyze associated inference
procedures, and propose a modified HC3 variance estimator that accounts for
terms up to second order. Our analysis reveals that cross-fitting permits substantially
faster growth in the covariate dimension $p$ relative to sample size $n$, with
asymptotic normality holding under favorable designs when $p = o(n^{3/4}/(\log
n)^{1/2})$, improving on standard rates. We also explain and address the poor
size performance of conventional variance estimators. The methodology extends
naturally to stratified experiments with many strata. Simulations confirm that
the cross-fitted estimator, combined with the modified HC3, delivers accurate
estimation and reliable inference across diverse designs.

arXiv link: http://arxiv.org/abs/2302.00469v4
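
A minimal sketch of a two-fold cross-fitted regression adjustment for the ATE in a
completely randomized experiment: the outcome models used to adjust each unit are fit
on the other fold. The AIPW-style combination below and the simulated design are
assumptions; the paper's estimator, modified HC3 variance, and design-based inference
are not reproduced.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import KFold

    def crossfit_adjusted_ate(y, d, X, n_splits=2, seed=0):
        """Cross-fitted regression-adjusted ATE estimate (illustrative)."""
        n = len(y)
        mu1, mu0 = np.zeros(n), np.zeros(n)
        for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
            # Separate outcome regressions fit on the training fold only.
            m1 = LinearRegression().fit(X[train][d[train] == 1], y[train][d[train] == 1])
            m0 = LinearRegression().fit(X[train][d[train] == 0], y[train][d[train] == 0])
            mu1[test] = m1.predict(X[test])
            mu0[test] = m0.predict(X[test])
        p1 = d.mean()
        # Adjusted difference in means with known assignment probability.
        return np.mean(d * (y - mu1) / p1 - (1 - d) * (y - mu0) / (1 - p1) + mu1 - mu0)

    rng = np.random.default_rng(0)
    n, p = 400, 60                       # many covariates relative to n
    X = rng.normal(size=(n, p))
    d = rng.integers(0, 2, size=n)
    y = 1.0 * d + X[:, :5] @ np.ones(5) + rng.normal(size=n)
    print(crossfit_adjusted_ate(y, d, X))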

Econometrics arXiv paper, submitted: 2023-02-01

Adaptive hedging horizon and hedging performance estimation

Authors: Wang Haoyu, Junpeng Di, Qing Han

In this study, we construct an adaptive hedging method based on empirical mode
decomposition (EMD) to extract the adaptive hedging horizon, and build a time
series cross-validation method for robust hedging performance estimation. Based
on the variance reduction criterion and the value-at-risk (VaR) criterion, we
find that the estimate of in-sample hedging performance is inconsistent with
that of out-of-sample hedging performance. The EMD
hedging method family exhibits superior performance on the VaR criterion
compared with the minimum variance hedging method. The matching degree of the
spot and futures contracts at the specific time scale is the key determinant of
the hedging performance in the corresponding hedging horizon.

arXiv link: http://arxiv.org/abs/2302.00251v1
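
A small sketch of the time-series cross-validation logic for hedging-performance
estimation, using the ordinary minimum-variance hedge ratio as the benchmark; the EMD
decomposition step that defines the adaptive horizon is not reproduced here, and the
walk-forward fold scheme is an assumption.

    import numpy as np

    def ts_cv_hedge_performance(spot_ret, fut_ret, n_folds=5, min_train=100):
        """Walk-forward evaluation of the minimum-variance hedge ratio."""
        T = len(spot_ret)
        fold_ends = np.linspace(min_train, T, n_folds + 1, dtype=int)
        reductions = []
        for k in range(n_folds):
            train = slice(0, fold_ends[k])
            test = slice(fold_ends[k], fold_ends[k + 1])
            # Minimum-variance hedge ratio estimated on the training window only.
            h = np.cov(spot_ret[train], fut_ret[train])[0, 1] / np.var(fut_ret[train])
            hedged = spot_ret[test] - h * fut_ret[test]
            reductions.append(1.0 - np.var(hedged) / np.var(spot_ret[test]))
        return np.array(reductions)     # out-of-sample variance reduction per fold

    rng = np.random.default_rng(0)
    f = rng.normal(size=1500)
    s = 0.9 * f + 0.3 * rng.normal(size=1500)   # spot co-moves with futures
    print(ts_cv_hedge_performance(s, f).round(3))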

Econometrics arXiv cross-link from cs.CV (cs.CV), submitted: 2023-01-31

Real Estate Property Valuation using Self-Supervised Vision Transformers

Authors: Mahdieh Yazdani, Maziar Raissi

The use of Artificial Intelligence (AI) in the real estate market has been
growing in recent years. In this paper, we propose a new method for property
valuation that utilizes self-supervised vision transformers, a recent
breakthrough in computer vision and deep learning. Our proposed algorithm uses
a combination of machine learning, computer vision and hedonic pricing models
trained on real estate data to estimate the value of a given property. We
collected and pre-processed a data set of real estate properties in the city of
Boulder, Colorado and used it to train, validate and test our algorithm. Our
data set consisted of qualitative images (including house interiors, exteriors,
and street views) as well as quantitative features such as the number of
bedrooms, bathrooms, square footage, lot square footage, property age, crime
rates, and proximity to amenities. We evaluated the performance of our model
using metrics such as Root Mean Squared Error (RMSE). Our findings indicate
that these techniques are able to accurately predict the value of properties,
with a low RMSE. The proposed algorithm outperforms traditional appraisal
methods that do not leverage property images and has the potential to be used
in real-world applications.

arXiv link: http://arxiv.org/abs/2302.00117v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-01-31

Factor Model of Mixtures

Authors: Cheng Peng, Stanislav Uryasev

This paper proposes a new approach to estimating the distribution of a
response variable conditioned on observing some factors. The proposed approach
possesses desirable properties of flexibility, interpretability, tractability
and extendability. The conditional quantile function is modeled by a mixture
(weighted sum) of basis quantile functions, with the weights depending on
factors. The calibration problem is formulated as a convex optimization
problem. It can be viewed as conducting quantile regressions for all confidence
levels simultaneously while avoiding quantile crossing by definition. The
calibration problem is equivalent to minimizing the continuous ranked
probability score (CRPS). Based on the canonical polyadic (CP) decomposition of
tensors, we propose a dimensionality reduction method that reduces the rank of
the parameter tensor and propose an alternating algorithm for estimation.
Additionally, based on Risk Quadrangle framework, we generalize the approach to
conditional distributions defined by Conditional Value-at-Risk (CVaR),
expectile and other functions of uncertainty measures. Although this paper
focuses on using splines as the weight functions, it can be extended to neural
networks. Numerical experiments demonstrate the effectiveness of our approach.

arXiv link: http://arxiv.org/abs/2301.13843v2

Econometrics arXiv paper, submitted: 2023-01-31

On Using The Two-Way Cluster-Robust Standard Errors

Authors: Harold D Chiang, Yuya Sasaki

Thousands of papers have reported two-way cluster-robust (TWCR) standard
errors. However, the recent econometrics literature points out the potential
non-gaussianity of two-way cluster sample means, and thus invalidity of the
inference based on the TWCR standard errors. Fortunately, simulation studies
nonetheless show that gaussianity is more common than exceptional. This
paper provides theoretical support for this encouraging observation.
Specifically, we derive a novel central limit theorem for two-way clustered
triangular arrays that justifies the use of the TWCR under very mild and
interpretable conditions. We, therefore, hope that this paper will provide a
theoretical justification for the legitimacy of most, if not all, of the
thousands of those empirical papers that have used the TWCR standard errors. We
provide a guide in practice as to when a researcher can employ the TWCR
standard errors.

arXiv link: http://arxiv.org/abs/2301.13775v1
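
A compact numerical sketch of the standard TWCR variance formula for OLS coefficients:
build cluster "meat" matrices by the first dimension, the second, and their
intersection, then combine them as V_1 + V_2 - V_12. Finite-sample adjustments and the
paper's theoretical conditions are omitted.

    import numpy as np

    def cluster_meat(X, resid, groups):
        """Sum over clusters of (X_g' u_g)(X_g' u_g)'."""
        meat = np.zeros((X.shape[1], X.shape[1]))
        for g in np.unique(groups):
            s = X[groups == g].T @ resid[groups == g]
            meat += np.outer(s, s)
        return meat

    def twoway_cluster_se(y, X, g1, g2):
        """OLS with two-way cluster-robust standard errors."""
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        u = y - X @ beta
        bread = np.linalg.inv(X.T @ X)
        g12 = g1.astype(np.int64) * (g2.max() + 1) + g2   # intersection clusters
        meat = cluster_meat(X, u, g1) + cluster_meat(X, u, g2) - cluster_meat(X, u, g12)
        V = bread @ meat @ bread
        return beta, np.sqrt(np.diag(V))

    rng = np.random.default_rng(0)
    n = 1000
    g1 = rng.integers(0, 20, n)          # e.g. firm clusters
    g2 = rng.integers(0, 15, n)          # e.g. time clusters
    x = rng.normal(size=n) + 0.5 * rng.normal(size=20)[g1]
    y = 1.0 + 2.0 * x + rng.normal(size=20)[g1] + rng.normal(size=15)[g2] + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    print(twoway_cluster_se(y, X, g1, g2))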

Econometrics arXiv updated paper (originally submitted: 2023-01-31)

Approximate Functional Differencing

Authors: Geert Dhaene, Martin Weidner

Inference on common parameters in panel data models with individual-specific
fixed effects is a classic example of Neyman and Scott's (1948) incidental
parameter problem (IPP). One solution to this IPP is functional differencing
(Bonhomme 2012), which works when the number of time periods T is fixed (and
may be small), but this solution is not applicable to all panel data models of
interest. Another solution, which applies to a larger class of models, is
"large-T" bias correction (pioneered by Hahn and Kuersteiner 2002 and Hahn and
Newey 2004), but this is only guaranteed to work well when T is sufficiently
large. This paper provides a unified approach that connects those two seemingly
disparate solutions to the IPP. In doing so, we provide an approximate version
of functional differencing, that is, an approximate solution to the IPP that is
applicable to a large class of panel data models even when T is relatively
small.

arXiv link: http://arxiv.org/abs/2301.13736v2

Econometrics arXiv paper, submitted: 2023-01-31

Bridging the Covid-19 Data and the Epidemiological Model using Time-Varying Parameter SIRD Model

Authors: Cem Cakmakli, Yasin Simsek

This paper extends the canonical model of epidemiology, the SIRD model, to
allow for time-varying parameters for real-time measurement and prediction of
the trajectory of the Covid-19 pandemic. Time variation in model parameters is
captured using the generalized autoregressive score modeling structure designed
for the typical daily count data related to the pandemic. The resulting
specification permits a flexible yet parsimonious model with a low
computational cost. The model is extended to allow for unreported cases using a
mixed-frequency setting. Results suggest that these cases' effects on the
parameter estimates might be sizeable. Full sample results show that the
flexible framework accurately captures the successive waves of the pandemic. A
real-time exercise indicates that the proposed structure delivers timely and
precise information on the pandemic's current stance. This superior
performance, in turn, transforms into accurate predictions of the confirmed and
death cases.

arXiv link: http://arxiv.org/abs/2301.13692v1
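
A minimal discrete-time SIRD simulation with a time-varying infection rate beta_t, to
illustrate the kind of mechanism the score-driven specification above is designed to
track; the GAS updating, unreported cases, and mixed-frequency extensions are not
implemented, and all parameter values are placeholders.

    import numpy as np

    def simulate_sird(T, beta_t, gamma=0.1, nu=0.01, N=1_000_000, I0=100):
        """Discrete-time SIRD path with a time-varying infection rate beta_t[t]."""
        S, I, R, D = N - I0, I0, 0.0, 0.0
        path = []
        for t in range(T):
            new_inf = beta_t[t] * S * I / N
            new_rec = gamma * I
            new_dead = nu * I
            S -= new_inf
            I += new_inf - new_rec - new_dead
            R += new_rec
            D += new_dead
            path.append((S, I, R, D))
        return np.array(path)

    # Two "waves": beta declines under restrictions, then rises again.
    T = 200
    beta_t = np.concatenate([np.linspace(0.35, 0.10, 100), np.linspace(0.10, 0.30, 100)])
    path = simulate_sird(T, beta_t)
    print("peak infections per wave:", path[:100, 1].max(), path[100:, 1].max())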

Econometrics arXiv updated paper (originally submitted: 2023-01-31)

Nonlinearities in Macroeconomic Tail Risk through the Lens of Big Data Quantile Regressions

Authors: Jan Prüser, Florian Huber

Modeling and predicting extreme movements in GDP is notoriously difficult and
the selection of appropriate covariates and/or possible forms of nonlinearities
are key in obtaining precise forecasts. In this paper, our focus is on using
large datasets in quantile regression models to forecast the conditional
distribution of US GDP growth. To capture possible non-linearities, we include
several nonlinear specifications. The resulting models are very high dimensional
and we thus rely on a set of shrinkage priors. Since Markov Chain Monte Carlo
estimation becomes slow in these dimensions, we rely on fast variational Bayes
approximations to the posterior distribution of the coefficients and the latent
states. We find that our proposed set of models produces precise forecasts.
These gains are especially pronounced in the tails. Using Gaussian processes to
approximate the nonlinear component of the model further improves the good
performance, in particular in the right tail.

arXiv link: http://arxiv.org/abs/2301.13604v2

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2023-01-30

STEEL: Singularity-aware Reinforcement Learning

Authors: Xiaohong Chen, Zhengling Qi, Runzhe Wan

Batch reinforcement learning (RL) aims at leveraging pre-collected data to
find an optimal policy that maximizes the expected total rewards in a dynamic
environment. The existing methods require absolutely continuous assumption
(e.g., there do not exist non-overlapping regions) on the distribution induced
by target policies with respect to the data distribution over either the state
or action or both. We propose a new batch RL algorithm that allows for
singularity for both state and action spaces (e.g., existence of
non-overlapping regions between offline data distribution and the distribution
induced by the target policies) in the setting of an infinite-horizon Markov
decision process with continuous states and actions. We call our algorithm
STEEL: SingulariTy-awarE rEinforcement Learning. Our algorithm is motivated by
a new error analysis on off-policy evaluation, where we use maximum mean
discrepancy, together with distributionally robust optimization, to
characterize the error of off-policy evaluation caused by the possible
singularity and to enable model extrapolation. By leveraging the idea of
pessimism and under some technical conditions, we derive a first finite-sample
regret guarantee for our proposed algorithm under singularity. Compared with
existing algorithms, by requiring only a minimal data-coverage assumption, STEEL
improves the applicability and robustness of batch RL. In addition, a two-step
adaptive STEEL, which is nearly tuning-free, is proposed. Extensive simulation
studies and one (semi)-real experiment on personalized pricing demonstrate the
superior performance of our methods in dealing with possible singularity in
batch RL.

arXiv link: http://arxiv.org/abs/2301.13152v5

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2023-01-30

Prediction of Customer Churn in Banking Industry

Authors: Sina Esmaeilpour Charandabi

With the growing competition in banking industry, banks are required to
follow customer retention strategies while they are trying to increase their
market share by acquiring new customers. This study compares the performance of
six supervised classification techniques to suggest an efficient model to
predict customer churn in banking industry, given 10 demographic and personal
attributes from 10000 customers of European banks. The effect of feature
selection, class imbalance, and outliers will be discussed for ANN and random
forest as the two competing models. As shown, unlike random forest, ANN does
not reveal any serious concern regarding overfitting and is also robust to
noise. Therefore, ANN structure with five nodes in a single hidden layer is
recognized as the best performing classifier.

arXiv link: http://arxiv.org/abs/2301.13099v1
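
A short sketch of the preferred specification described above, a single hidden layer
with five nodes, fit with scikit-learn. The placeholder features stand in for the 10
demographic and personal attributes; the paper's dataset is not bundled here.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.metrics import roc_auc_score

    # Placeholder data standing in for the 10 customer attributes.
    rng = np.random.default_rng(0)
    n = 10_000
    X = rng.normal(size=(n, 10))
    churn_prob = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.6 * X[:, 1] + 0.3 * X[:, 2])))
    y = rng.binomial(1, churn_prob)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

    # Single hidden layer with five nodes, as recommended in the abstract.
    clf = make_pipeline(StandardScaler(),
                        MLPClassifier(hidden_layer_sizes=(5,), max_iter=500,
                                      random_state=0))
    clf.fit(X_tr, y_tr)
    print("test AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))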

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2023-01-30

Machine Learning with High-Cardinality Categorical Features in Actuarial Applications

Authors: Benjamin Avanzi, Greg Taylor, Melantha Wang, Bernard Wong

High-cardinality categorical features are pervasive in actuarial data (e.g.
occupation in commercial property insurance). Standard categorical encoding
methods like one-hot encoding are inadequate in these settings.
In this work, we present a novel _Generalised Linear Mixed Model Neural
Network_ ("GLMMNet") approach to the modelling of high-cardinality categorical
features. The GLMMNet integrates a generalised linear mixed model in a deep
learning framework, offering the predictive power of neural networks and the
transparency of random effects estimates, the latter of which cannot be
obtained from entity embedding models. Further, its flexibility to deal
with any distribution in the exponential dispersion (ED) family makes it widely
applicable to many actuarial contexts and beyond.
We illustrate and compare the GLMMNet against existing approaches in a range
of simulation experiments as well as in a real-life insurance case study.
Notably, we find that the GLMMNet often outperforms or at least performs
comparably with an entity embedded neural network, while providing the
additional benefit of transparency, which is particularly valuable in practical
applications.
Importantly, while our model was motivated by actuarial applications, it can
have wider applicability. The GLMMNet would suit any applications that involve
high-cardinality categorical variables and where the response cannot be
sufficiently modelled by a Gaussian distribution.
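
The GLMMNet itself is beyond a short snippet, but the random-effects intuition
it builds on, partially pooling the effect of a high-cardinality category
toward the global mean, can be sketched in a few lines of Python. This is a
generic shrinkage encoder for illustration only, not the authors' model; the
prior_strength constant and variable names are assumptions.

    import numpy as np
    import pandas as pd

    def shrinkage_encode(categories, y, prior_strength=20.0):
        # Partial-pooling ("random effects" style) encoding of a high-cardinality
        # categorical: category means are shrunk toward the global mean, with more
        # shrinkage for sparsely observed categories.
        df = pd.DataFrame({"cat": categories, "y": y})
        global_mean = df["y"].mean()
        stats = df.groupby("cat")["y"].agg(["mean", "count"])
        shrunk = (stats["count"] * stats["mean"] + prior_strength * global_mean) / \
                 (stats["count"] + prior_strength)
        return df["cat"].map(shrunk).to_numpy()

    rng = np.random.default_rng(1)
    occupation = rng.integers(0, 1000, size=5000)   # 1000 occupation codes
    claims = rng.gamma(2.0, 1.0, size=5000)         # stand-in response
    encoded = shrinkage_encode(occupation, claims)  # feeds into any downstream model
    print(encoded[:5])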

arXiv link: http://arxiv.org/abs/2301.12710v1

Econometrics arXiv paper, submitted: 2023-01-29

A Note on the Estimation of Job Amenities and Labor Productivity

Authors: Arnaud Dupuy, Alfred Galichon

This paper introduces a maximum likelihood estimator of the value of job
amenities and labor productivity in a single matching market based on the
observation of equilibrium matches and wages. The estimation procedure
simultaneously fits both the matching patterns and the wage curve. While our
estimator is suited for a wide range of assignment problems, we provide an
application to the estimation of the Value of a Statistical Life using
compensating wage differentials for the risk of fatal injury on the job. Using
US data for 2017, we estimate the Value of a Statistical Life at $6.3 million
(2017 dollars).

arXiv link: http://arxiv.org/abs/2301.12542v1

Econometrics arXiv paper, submitted: 2023-01-29

Multidimensional dynamic factor models

Authors: Matteo Barigozzi, Filippo Pellegrino

This paper generalises dynamic factor models for multidimensional dependent
data. In doing so, it develops an interpretable technique to study complex
information sources ranging from repeated surveys with a varying number of
respondents to panels of satellite images. We specialise our results to model
microeconomic data on US households jointly with macroeconomic aggregates. This
results in a powerful tool able to generate localised predictions,
counterfactuals and impulse response functions for individual households,
accounting for traditional time-series complexities depicted in the state-space
literature. The model is also compatible with policymakers' growing focus on
real-time economic analysis, as it can process observations online while
handling missing values and asynchronous data releases.

arXiv link: http://arxiv.org/abs/2301.12499v1

Econometrics arXiv updated paper (originally submitted: 2023-01-27)

Synthetic Difference In Differences Estimation

Authors: Damian Clarke, Daniel Pailañir, Susan Athey, Guido Imbens

In this paper, we describe a computational implementation of the Synthetic
difference-in-differences (SDID) estimator of Arkhangelsky et al. (2021) for
Stata. Synthetic difference-in-differences can be used in a wide class of
circumstances where the treatment effects of some particular policy or event
are of interest and repeated observations on treated and untreated units are
available over time. We lay out the theory underlying SDID, both when there is
a single
treatment adoption date and when adoption is staggered over time, and discuss
estimation and inference in each of these cases. We introduce the sdid command
which implements these methods in Stata, and provide a number of examples of
use, discussing estimation, inference, and visualization of results.
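
For intuition about what is being computed in the single-adoption case, the
following minimal Python sketch (not the sdid Stata command) evaluates the SDID
point estimate as a weighted double difference, assuming the unit weights over
control units and the time weights over pre-treatment periods have already been
estimated; the weight-fitting step, staggered adoption, and inference are omitted.

    import numpy as np

    def sdid_point_estimate(Y_co, Y_tr, omega, lam):
        # Y_co : (N_control x T) outcomes of control units
        # Y_tr : (N_treated x T) outcomes of treated units
        # omega: unit weights over control units (nonnegative, sum to 1)
        # lam  : time weights over the T_pre pre-treatment periods (sum to 1)
        T_pre = len(lam)
        tr_post = Y_tr[:, T_pre:].mean()                # simple average, treated x post
        tr_pre = Y_tr[:, :T_pre].mean(axis=0) @ lam     # lambda-weighted treated pre mean
        co_post = omega @ Y_co[:, T_pre:].mean(axis=1)  # omega-weighted control post mean
        co_pre = omega @ (Y_co[:, :T_pre] @ lam)        # omega- and lambda-weighted
        return (tr_post - tr_pre) - (co_post - co_pre)

    rng = np.random.default_rng(0)
    Y_co = rng.normal(size=(30, 10))
    Y_tr = rng.normal(size=(5, 10)) + np.r_[np.zeros(8), np.ones(2)]  # effect of 1 post
    omega = np.full(30, 1 / 30)   # placeholder weights; in practice these are estimated
    lam = np.full(8, 1 / 8)
    print(sdid_point_estimate(Y_co, Y_tr, omega, lam))   # roughly 1 in this toy example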

arXiv link: http://arxiv.org/abs/2301.11859v3

Econometrics arXiv updated paper (originally submitted: 2023-01-26)

Simple Difference-in-Differences Estimation in Fixed-T Panels

Authors: Nicholas Brown, Kyle Butts, Joakim Westerlund

The present paper proposes a new treatment effects estimator that is valid
when the number of time periods is small, and the parallel trends condition
holds conditional on covariates and unobserved heterogeneity in the form of
interactive fixed effects. The estimator also allows the control variables to
be affected by treatment and enables estimation of the resulting indirect
effect on the outcome variable. The asymptotic properties of the estimator are
established and their accuracy in small samples is investigated using Monte
Carlo simulations. The empirical usefulness of the estimator is illustrated
using as an example the effect of increased trade competition on firm markups
in China.

arXiv link: http://arxiv.org/abs/2301.11358v2

Econometrics arXiv updated paper (originally submitted: 2023-01-25)

Automatic Debiased Estimation with Machine Learning-Generated Regressors

Authors: Juan Carlos Escanciano, Telmo Pérez-Izquierdo

Many parameters of interest in economics and other social sciences depend on
generated regressors. Examples in economics include structural parameters in
models with endogenous variables estimated by control functions and in models
with sample selection, treatment effect estimation with propensity score
matching, and marginal treatment effects. More recently, Machine Learning (ML)
generated regressors are becoming ubiquitous for these and other applications
such as imputation with missing regressors, dimension reduction, including
autoencoders, learned proxies, confounders and treatments, and for feature
engineering with unstructured data, among others. We provide the first general
method for valid inference with regressors generated from ML. Inference with
generated regressors is complicated by the very complex expressions for the
influence functions and asymptotic variances. Additionally, ML-generated
regressors may lead to large biases in downstream inferences. To address these
problems, we propose Automatic Locally Robust/debiased GMM estimators in a
general three-step setting with ML-generated regressors. We illustrate our
results with treatment effects and counterfactual parameters in the partially
linear and nonparametric models with ML-generated regressors. We provide
sufficient conditions for the asymptotic normality of our debiased GMM
estimators and investigate their finite-sample performance through Monte Carlo
simulations.

arXiv link: http://arxiv.org/abs/2301.10643v3

Econometrics arXiv updated paper (originally submitted: 2023-01-25)

Hierarchical Regularizers for Reverse Unrestricted Mixed Data Sampling Regressions

Authors: Alain Hecq, Marie Ternes, Ines Wilms

Reverse Unrestricted MIxed DAta Sampling (RU-MIDAS) regressions are used to
model high-frequency responses by means of low-frequency variables. However,
due to the periodic structure of RU-MIDAS regressions, the dimensionality grows
quickly if the frequency mismatch between the high- and low-frequency variables
is large. Additionally, the number of high-frequency observations available for
estimation decreases. We propose to counteract this reduction in sample size by
pooling the high-frequency coefficients and further reduce the dimensionality
through a sparsity-inducing convex regularizer that accounts for the temporal
ordering among the different lags. To this end, the regularizer prioritizes the
inclusion of lagged coefficients according to the recency of the information
they contain. We demonstrate the proposed method on two empirical applications,
one on realized volatility forecasting with macroeconomic data and another on
demand forecasting for a bicycle-sharing system with ridership data on other
transportation types.

arXiv link: http://arxiv.org/abs/2301.10592v2

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2023-01-25

Sequential Bayesian Learning for Hidden Semi-Markov Models

Authors: Patrick Aschermayr, Konstantinos Kalogeropoulos

In this paper, we explore the class of the Hidden Semi-Markov Model (HSMM), a
flexible extension of the popular Hidden Markov Model (HMM) that allows the
underlying stochastic process to be a semi-Markov chain. HSMMs are typically
used less frequently than their basic HMM counterpart due to the increased
computational challenges when evaluating the likelihood function. Moreover,
while both models are sequential in nature, parameter estimation is mainly
conducted via batch estimation methods. Thus, a major motivation of this paper
is to provide methods to estimate HSMMs (1) in a computationally feasible time,
(2) in an exact manner, i.e. only subject to Monte Carlo error, and (3) in a
sequential setting. We provide and verify an efficient computational scheme for
Bayesian parameter estimation on HSMMs. Additionally, we explore the
performance of HSMMs on the VIX time series using Autoregressive (AR) models
with hidden semi-Markov states and demonstrate how this algorithm can be used
for regime switching, model selection and clustering purposes.

arXiv link: http://arxiv.org/abs/2301.10494v1

Econometrics arXiv paper, submitted: 2023-01-23

Processes analogous to ecological interactions and dispersal shape the dynamics of economic activities

Authors: Victor Boussange, Didier Sornette, Heike Lischke, Loïc Pellissier

The processes of ecological interactions, dispersal and mutations shape the
dynamics of biological communities, and analogous eco-evolutionary processes
acting upon economic entities have been proposed to explain economic change.
This hypothesis is compelling because it explains economic change through
endogenous mechanisms, but it has not been quantitatively tested at the global
economy level. Here, we use an inverse modelling technique and 59 years of
economic data covering 77 countries to test whether the collective dynamics of
national economic activities can be characterised by eco-evolutionary
processes. We estimate the statistical support of dynamic community models in
which the dynamics of economic activities are coupled with positive and
negative interactions between the activities, the spatial dispersal of the
activities, and their transformations into other economic activities. We find
strong support for the models capturing positive interactions between economic
activities and spatial dispersal of the activities across countries. These
results suggest that processes akin to those occurring in ecosystems play a
significant role in the dynamics of economic systems. The strength-of-evidence
obtained for each model varies across countries and may be caused by
differences in the distance between countries, specific institutional contexts,
and historical contingencies. Overall, our study provides a new quantitative,
biologically inspired framework to study the forces shaping economic change.

arXiv link: http://arxiv.org/abs/2301.09486v1

Econometrics arXiv updated paper (originally submitted: 2023-01-23)

ddml: Double/debiased machine learning in Stata

Authors: Achim Ahrens, Christian B. Hansen, Mark E. Schaffer, Thomas Wiemann

We introduce the package ddml for Double/Debiased Machine Learning (DDML) in
Stata. Estimators of causal parameters for five different econometric models
are supported, allowing for flexible estimation of causal effects of endogenous
variables in settings with unknown functional forms and/or many exogenous
variables. ddml is compatible with many existing supervised machine learning
programs in Stata. We recommend using DDML in combination with stacking
estimation, which combines multiple machine learners into a final predictor. We
provide Monte Carlo evidence to support our recommendation.
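
As a language-agnostic illustration of the double/debiased ML idea behind the
package (not the Stata implementation itself), here is a minimal Python sketch
of cross-fitting in the partially linear model, with a single random forest
standing in for a stacked learner; all names and tuning choices are assumptions.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import KFold

    def ddml_plr(y, d, X, n_folds=5, seed=0):
        # Cross-fitted double/debiased ML for the partially linear model
        # y = theta*d + g(X) + e,  d = m(X) + v.
        y_res, d_res = np.zeros_like(y), np.zeros_like(d)
        for train, test in KFold(n_folds, shuffle=True, random_state=seed).split(X):
            my = RandomForestRegressor(random_state=seed).fit(X[train], y[train])
            md = RandomForestRegressor(random_state=seed).fit(X[train], d[train])
            y_res[test] = y[test] - my.predict(X[test])
            d_res[test] = d[test] - md.predict(X[test])
        return (d_res @ y_res) / (d_res @ d_res)   # final-stage residual-on-residual OLS

    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 5))
    d = X[:, 0] + rng.normal(size=2000)
    y = 0.5 * d + np.sin(X[:, 1]) + rng.normal(size=2000)
    print(ddml_plr(y, d, X))   # should be close to 0.5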

arXiv link: http://arxiv.org/abs/2301.09397v3

Econometrics arXiv updated paper (originally submitted: 2023-01-23)

Revisiting Panel Data Discrete Choice Models with Lagged Dependent Variables

Authors: Christopher R. Dobronyi, Fu Ouyang, Thomas Tao Yang

This paper revisits the identification and estimation of a class of
semiparametric (distribution-free) panel data binary choice models with lagged
dependent variables, exogenous covariates, and entity fixed effects. We provide
a novel identification strategy, using an "identification at infinity"
argument. In contrast with the celebrated Honore and Kyriazidou (2000), our
method permits time trends of any form and does not suffer from the "curse of
dimensionality". We propose an easily implementable conditional maximum score
estimator. The asymptotic properties of the proposed estimator are fully
characterized. A small-scale Monte Carlo study demonstrates that our approach
performs satisfactorily in finite samples. We illustrate the usefulness of our
method by presenting an empirical application to enrollment in private hospital
insurance using the Household, Income and Labour Dynamics in Australia (HILDA)
Survey data.

arXiv link: http://arxiv.org/abs/2301.09379v5

Econometrics arXiv cross-link from q-fin.PR (q-fin.PR), submitted: 2023-01-22

Labor Income Risk and the Cross-Section of Expected Returns

Authors: Mykola Pinchuk

This paper explores asset pricing implications of unemployment risk from
sectoral shifts. I proxy for this risk using cross-industry dispersion (CID),
defined as a mean absolute deviation of returns of 49 industry portfolios. CID
peaks during periods of accelerated sectoral reallocation and heightened
uncertainty. I find that expected stock returns are related cross-sectionally
to the sensitivities of returns to innovations in CID. Annualized returns of
the stocks with high sensitivity to CID are 5.9% lower than the returns of the
stocks with low sensitivity. Abnormal returns with respect to the best factor
model are 3.5%, suggesting that common factors cannot explain this return
spread. Stocks with high sensitivity to CID are likely to be those that
benefited from sectoral shifts. CID positively predicts unemployment through
its long-term component, consistent with the hypothesis that CID is a proxy for
unemployment risk from sectoral shifts.
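
Following the abstract's definition, CID in a given period is the mean absolute
deviation of the 49 industry portfolio returns from their cross-sectional
average. A minimal Python sketch, with simulated returns standing in for the
actual industry portfolios:

    import numpy as np

    def cross_industry_dispersion(returns):
        # returns: (T x 49) matrix of industry portfolio returns.
        # CID_t is the cross-sectional mean absolute deviation in period t.
        deviations = returns - returns.mean(axis=1, keepdims=True)
        return np.abs(deviations).mean(axis=1)

    rng = np.random.default_rng(0)
    simulated = rng.normal(0.01, 0.05, size=(120, 49))  # 10 years x 49 industries
    cid = cross_industry_dispersion(simulated)
    print(cid[:5])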

arXiv link: http://arxiv.org/abs/2301.09173v1

Econometrics arXiv updated paper (originally submitted: 2023-01-21)

Inference for Two-stage Experiments under Covariate-Adaptive Randomization

Authors: Jizhou Liu

This paper studies inference in two-stage randomized experiments under
covariate-adaptive randomization. In the initial stage of this experimental
design, clusters (e.g., households, schools, or graph partitions) are
stratified and randomly assigned to control or treatment groups based on
cluster-level covariates. Subsequently, an independent second-stage design is
carried out, wherein units within each treated cluster are further stratified
and randomly assigned to either control or treatment groups, based on
individual-level covariates. Under the homogeneous partial interference
assumption, I establish conditions under which the proposed
difference-in-“average of averages” estimators are consistent and
asymptotically normal for the corresponding average primary and spillover
effects and develop consistent estimators of their asymptotic variances.
Combining these results establishes the asymptotic validity of tests based on
these estimators. My findings suggest that ignoring covariate information in
the design stage can result in efficiency loss, and commonly used inference
methods that ignore or improperly use covariate information can lead to either
conservative or invalid inference. I then apply these results to study the
optimal use of covariate information under covariate-adaptive randomization in
large samples, and demonstrate that a specific generalized matched-pair design
achieves minimum asymptotic variance for each proposed estimator. Finally, I
discuss covariate adjustment, which incorporates additional baseline covariates
not used for treatment assignment. The practical relevance of the theoretical
results is illustrated through a simulation study and an empirical application.
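
A minimal, hedged Python sketch of a difference-in-"average of averages" point
estimate: within-cluster means are computed first and then averaged across
clusters, so clusters rather than units are the effective observations. Here the
primary-effect comparison group is taken to be units in untreated clusters;
strata, spillover contrasts, and variance estimation are omitted, and all names
are illustrative rather than the paper's notation.

    import numpy as np
    import pandas as pd

    def diff_in_average_of_averages(df):
        # df columns: cluster, cluster_treated (0/1), unit_treated (0/1), y.
        # Average within clusters first, then across clusters, then difference.
        treated_units = df[(df.cluster_treated == 1) & (df.unit_treated == 1)]
        control_clusters = df[df.cluster_treated == 0]
        treated_avg = treated_units.groupby("cluster")["y"].mean().mean()
        control_avg = control_clusters.groupby("cluster")["y"].mean().mean()
        return treated_avg - control_avg

    rng = np.random.default_rng(0)
    rows = []
    for c in range(40):
        ct = int(c < 20)                         # first 20 clusters treated
        for u in range(rng.integers(10, 30)):
            ut = int(ct and rng.random() < 0.5)  # second-stage assignment, treated clusters
            rows.append({"cluster": c, "cluster_treated": ct, "unit_treated": ut,
                         "y": rng.normal(1.0 * ut + 0.5 * ct)})
    print(diff_in_average_of_averages(pd.DataFrame(rows)))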

arXiv link: http://arxiv.org/abs/2301.09016v6

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-01-21

A Practical Introduction to Regression Discontinuity Designs: Extensions

Authors: Matias D. Cattaneo, Nicolas Idrobo, Rocio Titiunik

This monograph, together with its accompanying first part Cattaneo, Idrobo
and Titiunik (2020), collects and expands the instructional materials we
prepared for more than $50$ short courses and workshops on Regression
Discontinuity (RD) methodology that we taught between 2014 and 2023. In this
second monograph, we discuss several topics in RD methodology that build on and
extend the analysis of RD designs introduced in Cattaneo, Idrobo and Titiunik
(2020). Our first goal is to present an alternative RD conceptual framework
based on local randomization ideas. This methodological approach can be useful
in RD designs with discretely-valued scores, and can also be used more broadly
as a complement to the continuity-based approach in other settings. Then,
employing both continuity-based and local randomization approaches, we extend
the canonical Sharp RD design in multiple directions: fuzzy RD designs, RD
designs with discrete scores, and multi-dimensional RD designs. The goal of our
two-part monograph is purposely practical and hence we focus on the empirical
analysis of RD designs.

arXiv link: http://arxiv.org/abs/2301.08958v2

Econometrics arXiv cross-link from General Economics (econ.GN), submitted: 2023-01-20

Composite distributions in the social sciences: A comparative empirical study of firms' sales distribution for France, Germany, Italy, Japan, South Korea, and Spain

Authors: Arturo Ramos, Till Massing, Atushi Ishikawa, Shouji Fujimoto, Takayuki Mizuno

We study 17 different statistical distributions for sizes, obtained from the
classical and recent literature, to describe a relevant variable in the social
sciences and Economics, namely the firms' sales distribution in six countries
over an ample period. We find that the best results are obtained with mixtures
of lognormal (LN), loglogistic (LL), and log Student's $t$ (LSt) distributions.
The single lognormal, in turn, is clearly not selected. We thus find that the
whole firm size distribution is better described by a mixture, and that there
exist subgroups of firms. Depending on the method of measurement, the
best-fitting distribution is not a single distribution but a mixture of at
least three, and sometimes four or five, components. We conduct a full-sample
analysis, an in-sample and out-of-sample analysis, and a doubly truncated
sample analysis. We also provide the formulation of the preferred models as
solutions of the Fokker--Planck or forward Kolmogorov equation.
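
For the lognormal-mixture case, one minimal Python sketch fits the mixture on
the log scale (a K-component Gaussian mixture on log sales is a K-component
lognormal mixture on sales) and compares candidate K by BIC. This is a generic
illustration on simulated data, not the authors' estimation or selection protocol.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    # Simulated "sales": a 3-component lognormal mixture standing in for real data.
    sizes = (4000, 3000, 3000)
    sales = np.concatenate([rng.lognormal(mean=m, sigma=s, size=n)
                            for m, s, n in zip((1.0, 3.0, 5.0), (0.4, 0.6, 0.8), sizes)])
    log_sales = np.log(sales).reshape(-1, 1)

    # A K-component Gaussian mixture on log(sales) is a K-component lognormal mixture.
    for k in range(1, 6):
        gm = GaussianMixture(n_components=k, random_state=0).fit(log_sales)
        print(k, "BIC:", round(gm.bic(log_sales), 1))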

arXiv link: http://arxiv.org/abs/2301.09438v1

Econometrics arXiv cross-link from eess.SY (eess.SY), submitted: 2023-01-19

From prosumer to flexumer: Case study on the value of flexibility in decarbonizing the multi-energy system of a manufacturing company

Authors: Markus Fleschutz, Markus Bohlayer, Marco Braun, Michael D. Murphy

Digitalization and sector coupling enable companies to turn into flexumers.
By using the flexibility of their multi-energy system (MES), they reduce costs
and carbon emissions while stabilizing the electricity system. However, to
identify the necessary investments in energy conversion and storage
technologies to leverage demand response (DR) potentials, companies need to
assess the value of flexibility. Therefore, this study quantifies the
flexibility value of a production company's MES by optimizing the synthesis,
design, and operation of a decarbonizing MES considering self-consumption
optimization, peak shaving, and integrated DR based on hourly prices and carbon
emission factors (CEFs). The detailed case study of a beverage company in
northern Germany considers vehicle-to-X of powered industrial trucks,
power-to-heat on multiple temperatures, wind turbines, photovoltaic systems,
and energy storage systems (thermal energy, electricity, and hydrogen). We
propose and apply novel data-driven metrics to evaluate the intensity of
price-based and CEF-based DR. The results reveal that flexibility usage reduces
decarbonization costs (by 19-80% depending on electricity and carbon removal
prices), total annual costs, operating carbon emissions, energy-weighted
average prices and CEFs, and fossil energy dependency. The results also suggest
that a net-zero operational carbon emission MES requires flexibility, which, in
an economic case, is provided by a combination of different flexible
technologies and storage systems that complement each other. While the value of
flexibility depends on various market and consumer-specific factors such as
electricity or carbon removal prices, this study highlights the importance of
demand flexibility for the decarbonization of MESs.

arXiv link: http://arxiv.org/abs/2301.07997v1

Econometrics arXiv updated paper (originally submitted: 2023-01-19)

Digital Divide: Empirical Study of CIUS 2020

Authors: Joann Jasiak, Peter MacKenzie, Purevdorj Tuvaandorj

As Canada and other major economies consider implementing "digital money" or
Central Bank Digital Currencies, understanding how demographic and geographic
factors influence public engagement with digital technologies becomes
increasingly important. This paper uses data from the 2020 Canadian Internet
Use Survey and employs survey-adapted Lasso inference methods to identify
individual socio-economic and demographic characteristics determining the
digital divide in Canada. We also introduce a score to measure and compare the
digital literacy of various segments of Canadian population. Our findings
reveal that disparities in the use of, e.g., online banking, emailing, and
digital payments exist across different demographic and socio-economic groups.
In addition, we document the effects of the COVID-19 pandemic on internet use in
Canada and describe changes in the characteristics of Canadian internet users
over the last decade.

arXiv link: http://arxiv.org/abs/2301.07855v3

Econometrics arXiv paper, submitted: 2023-01-18

An MCMC Approach to Classical Estimation

Authors: Victor Chernozhukov, Han Hong

This paper studies computationally and theoretically attractive estimators
called the Laplace type estimators (LTE), which include means and quantiles of
Quasi-posterior distributions defined as transformations of general
(non-likelihood-based) statistical criterion functions, such as those in GMM,
nonlinear IV, empirical likelihood, and minimum distance methods. The approach
generates an alternative to classical extremum estimation and also falls
outside the parametric Bayesian approach. For example, it offers a new
attractive estimation method for such important semi-parametric problems as
censored and instrumental quantile, nonlinear GMM and value-at-risk models. The
LTEs are computed using Markov Chain Monte Carlo methods, which help
circumvent the computational curse of dimensionality. A large sample theory is
obtained for regular cases.
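
A minimal Python sketch of the Laplace-type idea: exponentiate a
(non-likelihood) statistical criterion to form a quasi-posterior and summarize
it with MCMC draws. Here the criterion is a simple least-absolute-deviation
objective for a location parameter, sampled with a random-walk Metropolis step;
the data-generating process and tuning constants are illustrative only.

    import numpy as np

    rng = np.random.default_rng(0)
    y = rng.standard_t(df=3, size=500) + 2.0   # data with true median 2

    def criterion(theta):
        # Non-likelihood statistical criterion: negative LAD objective.
        return -np.abs(y - theta).sum()

    # Random-walk Metropolis on the quasi-posterior  p(theta) ∝ exp{criterion(theta)}.
    draws, theta, step = [], float(np.mean(y)), 0.1
    current = criterion(theta)
    for _ in range(20_000):
        proposal = theta + step * rng.normal()
        cand = criterion(proposal)
        if np.log(rng.random()) < cand - current:
            theta, current = proposal, cand
        draws.append(theta)
    draws = np.array(draws[5_000:])            # discard burn-in

    print("quasi-posterior mean (LTE):", draws.mean())
    print("sample median:", np.median(y))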

arXiv link: http://arxiv.org/abs/2301.07782v1

Econometrics arXiv paper, submitted: 2023-01-18

Optimal Transport for Counterfactual Estimation: A Method for Causal Inference

Authors: Arthur Charpentier, Emmanuel Flachaire, Ewen Gallic

Many problems ask a question that can be formulated as a causal question:
"what would have happened if...?" For example, "would the person have had
surgery if he or she had been Black?" To address this kind of question,
calculating an average treatment effect (ATE) is often uninformative, because
one would like to know how much impact a variable (such as skin color) has on a
specific individual, characterized by certain covariates. Trying to calculate a
conditional ATE (CATE) seems more appropriate. In causal inference, the
propensity score approach assumes that the treatment is influenced by x, a
collection of covariates. Here, we will have the dual view: doing an
intervention, or changing the treatment (even just hypothetically, in a thought
experiment, for example by asking what would have happened if a person had been
Black) can have an impact on the values of x. We will see here that optimal
transport allows us to change certain characteristics that are influenced by
the variable we are trying to quantify the effect of. We propose here a mutatis
mutandis version of the CATE, which will be done simply in dimension one by
saying that the CATE must be computed relative to a level of probability,
associated to the proportion of x (a single covariate) in the control
population, and by looking for the equivalent quantile in the test population.
In higher dimension, it will be necessary to go through transport, and an
application will be proposed on the impact of some variables on the probability
of having an unnatural birth (the fact that the mother smokes, or that the
mother is Black).

arXiv link: http://arxiv.org/abs/2301.07755v1

Econometrics arXiv updated paper (originally submitted: 2023-01-18)

Unconditional Quantile Partial Effects via Conditional Quantile Regression

Authors: Javier Alejo, Antonio F. Galvao, Julian Martinez-Iriarte, Gabriel Montes-Rojas

This paper develops a semi-parametric procedure for estimation of
unconditional quantile partial effects using quantile regression coefficients.
The estimator is based on an identification result showing that, for continuous
covariates, unconditional quantile effects are a weighted average of
conditional ones at particular quantile levels that depend on the covariates.
We propose a two-step estimator for the unconditional effects where in the
first step one estimates a structural quantile regression model, and in the
second step a nonparametric regression is applied to the first step
coefficients. We establish the asymptotic properties of the estimator, namely
consistency and asymptotic normality. Monte Carlo simulations provide numerical
evidence that the estimator has very good finite-sample performance and is
robust to the choice of bandwidth and kernel. To illustrate the proposed
method, we study the canonical application of the Engel curve, i.e. food
expenditure as a share of income.

arXiv link: http://arxiv.org/abs/2301.07241v4

Econometrics arXiv updated paper (originally submitted: 2023-01-17)

Noisy, Non-Smooth, Non-Convex Estimation of Moment Condition Models

Authors: Jean-Jacques Forneron

A practical challenge for structural estimation is the requirement to
accurately minimize a sample objective function which is often non-smooth,
non-convex, or both. This paper proposes a simple algorithm designed to find
accurate solutions without performing an exhaustive search. It augments each
iteration from a new Gauss-Newton algorithm with a grid search step. A finite
sample analysis derives its optimization and statistical properties
simultaneously using only econometric assumptions. After a finite number of
iterations, the algorithm automatically transitions from global to fast local
convergence, producing accurate estimates with high probability. Simulated
examples and an empirical application illustrate the results.

arXiv link: http://arxiv.org/abs/2301.07196v3

Econometrics arXiv updated paper (originally submitted: 2023-01-17)

Testing Firm Conduct

Authors: Marco Duarte, Lorenzo Magnolfi, Mikkel Sølvsten, Christopher Sullivan

Evaluating policy in imperfectly competitive markets requires understanding
firm behavior. While researchers test conduct via model selection and
assessment, we present advantages of Rivers and Vuong (2002) (RV) model
selection under misspecification. However, degeneracy of RV invalidates
inference. With a novel definition of weak instruments for testing, we connect
degeneracy to instrument strength, derive weak instrument properties of RV, and
provide a diagnostic for weak instruments by extending the framework of Stock
and Yogo (2005) to model selection. We test vertical conduct (Villas-Boas,
2007) using common instrument sets. Some are weak, providing no power. Strong
instruments support manufacturers setting retail prices.

arXiv link: http://arxiv.org/abs/2301.06720v2

Econometrics arXiv updated paper (originally submitted: 2023-01-17)

Resolving the Conflict on Conduct Parameter Estimation in Homogeneous Goods Markets between Bresnahan (1982) and Perloff and Shen (2012)

Authors: Yuri Matsumura, Suguru Otani

We revisit conduct parameter estimation in homogeneous goods markets to
resolve the conflict between Bresnahan (1982) and Perloff and Shen (2012)
regarding the identification and the estimation of conduct parameters. We point
out that Perloff and Shen's (2012) proof is incorrect and its simulation
setting is invalid. Our simulation shows that estimation becomes accurate when
demand shifters are properly added in supply estimation and sample sizes are
increased, supporting Bresnahan (1982).

arXiv link: http://arxiv.org/abs/2301.06665v5

Econometrics arXiv paper, submitted: 2023-01-17

Statistical inference for the logarithmic spatial heteroskedasticity model with exogenous variables

Authors: Bing Su, Fukang Zhu, Ke Zhu

While spatial dependence in the mean has been well studied by plenty of models
in a large strand of literature, the investigation of spatial dependence in the
variance lags significantly behind. The existing models for spatial dependence
in the variance are scarce, with neither their probabilistic structure nor
statistical inference procedures being explored. To circumvent this deficiency,
this paper proposes a new generalized logarithmic spatial heteroscedasticity
model with exogenous variables (denoted by the log-SHE model) to study the
spatial dependence in variance. For the log-SHE model, its spatial near-epoch
dependence (NED) property is investigated, and a systematic statistical
inference procedure is provided, including the maximum likelihood and
generalized method of moments estimators, the Wald, Lagrange multiplier and
likelihood-ratio-type D tests for model parameter constraints, and the
overidentification test for model diagnostic checking. Using the tool of
spatial NED, the asymptotics of all proposed estimators and tests are
established under regular conditions. The usefulness of the proposed
methodology is illustrated by simulation results and a real data example on the
house selling price.

arXiv link: http://arxiv.org/abs/2301.06658v1

Econometrics arXiv paper, submitted: 2023-01-16

Robust M-Estimation for Additive Single-Index Cointegrating Time Series Models

Authors: Chaohua Dong, Jiti Gao, Yundong Tu, Bin Peng

Robust M-estimation uses loss functions such as least absolute deviation
(LAD), quantile loss and Huber's loss to construct its objective function, in
order, for example, to mitigate the impact of outliers; the difficulty in
analysing the resulting estimators lies in the nonsmoothness of these losses.
Generalized functions have several advantages over ordinary functions; in
particular, they possess derivatives of any order. They include locally
integrable functions, the so-called regular generalized functions, while
singular generalized functions (e.g. the Dirac delta function) can be obtained
as limits of sequences of sufficiently smooth functions, so-called regular
sequences in the generalized-function context. This makes it possible to use
singular generalized functions through approximation. A significant
contribution of this paper is to establish the convergence rate of such regular
sequences to the nonsmooth loss, answering a call from the relevant literature.
For parameter estimation with a possibly nonsmooth objective function, Section
2 first shows, using a very simple model, how the generalized-function approach
can be used as a general paradigm to tackle nonsmooth loss functions. This
approach is of general interest and applicability. We further use
the approach in robust M-estimation for additive single-index cointegrating
time series models; the asymptotic theory is established for the proposed
estimators. We evaluate the finite-sample performance of the proposed
estimation method and theory by both simulated data and an empirical analysis
of predictive regression of stock returns.
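
The smoothing idea behind the regular-sequence argument can be illustrated with
a small Python sketch: replace the nonsmooth LAD loss |u| with a smooth
surrogate sqrt(u^2 + eps) whose derivative approximates the generalized
(sign-function) derivative, and shrink eps. This is a generic illustration of
the approximation idea, not the paper's estimator for the cointegrating model.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    n = 1000
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + rng.standard_t(df=2, size=n)   # heavy-tailed errors

    def smoothed_lad(beta, eps):
        # Smooth surrogate for the LAD loss: |u| replaced by sqrt(u^2 + eps),
        # which has derivatives of all orders and converges to |u| as eps -> 0.
        u = y - beta[0] - beta[1] * x
        return np.sqrt(u ** 2 + eps).sum()

    for eps in (1.0, 1e-2, 1e-4):                      # a "regular sequence" of surrogates
        fit = minimize(smoothed_lad, x0=np.zeros(2), args=(eps,), method="BFGS")
        print(f"eps={eps:g}  estimate={fit.x.round(3)}")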

arXiv link: http://arxiv.org/abs/2301.06631v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-01-16

When it counts -- Econometric identification of the basic factor model based on GLT structures

Authors: Sylvia Frühwirth-Schnatter, Darjus Hosszejni, Hedibert Freitas Lopes

Despite the popularity of factor models with sparse loading matrices, little
attention has been given to formally addressing the identifiability of these models
beyond standard rotation-based identification such as the positive lower
triangular (PLT) constraint. To fill this gap, we review the advantages of
variance identification in sparse factor analysis and introduce the generalized
lower triangular (GLT) structures. We show that the GLT assumption is an
improvement over PLT without compromise: GLT is also unique but, unlike PLT, a
non-restrictive assumption. Furthermore, we provide a simple counting rule for
variance identification under GLT structures, and we demonstrate that within
this model class the unknown number of common factors can be recovered in an
exploratory factor analysis. Our methodology is illustrated for simulated data
in the context of post-processing posterior draws in Bayesian sparse factor
analysis.

arXiv link: http://arxiv.org/abs/2301.06354v1

Econometrics arXiv paper, submitted: 2023-01-16

Doubly-Robust Inference for Conditional Average Treatment Effects with High-Dimensional Controls

Authors: Adam Baybutt, Manu Navjeevan

Plausible identification of conditional average treatment effects (CATEs) may
rely on controlling for a large number of variables to account for confounding
factors. In these high-dimensional settings, estimation of the CATE requires
estimating first-stage models whose consistency relies on correctly specifying
their parametric forms. While doubly-robust estimators of the CATE exist,
inference procedures based on the second stage CATE estimator are not
doubly-robust. Using the popular augmented inverse propensity weighting signal,
we propose an estimator for the CATE whose resulting Wald-type confidence
intervals are doubly-robust. We assume a logistic model for the propensity
score and a linear model for the outcome regression, and estimate the
parameters of these models using an $\ell_1$ (Lasso) penalty to address the
high dimensional covariates. Our proposed estimator remains consistent at the
nonparametric rate and our proposed pointwise and uniform confidence intervals
remain asymptotically valid even if either the logistic propensity score model
or the linear outcome regression model is misspecified. These results are obtained
under similar conditions to existing analyses in the high-dimensional and
nonparametric literatures.
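
A minimal, hedged Python sketch of the two-stage construction: a logistic Lasso
for the propensity score and Lassos for the outcome regressions form an AIPW
pseudo-outcome, which is then regressed on the effect modifier of interest.
Sample splitting, the uniform confidence bands, and the paper's specific tuning
choices are omitted; all names are illustrative.

    import numpy as np
    from sklearn.linear_model import LogisticRegressionCV, LassoCV, LinearRegression

    rng = np.random.default_rng(0)
    n, p = 2000, 100
    X = rng.normal(size=(n, p))
    e = 1 / (1 + np.exp(-X[:, 0]))                   # true propensity
    D = rng.binomial(1, e)
    tau = 1.0 + X[:, 1]                              # CATE varies with X[:, 1]
    Y = tau * D + X[:, 2] + rng.normal(size=n)

    # First stage: logistic Lasso propensity and Lasso outcome regressions.
    ps = LogisticRegressionCV(penalty="l1", solver="liblinear", Cs=10).fit(X, D)
    e_hat = np.clip(ps.predict_proba(X)[:, 1], 0.01, 0.99)
    mu1 = LassoCV(cv=5).fit(X[D == 1], Y[D == 1]).predict(X)
    mu0 = LassoCV(cv=5).fit(X[D == 0], Y[D == 0]).predict(X)

    # AIPW (doubly robust) pseudo-outcome.
    psi = mu1 - mu0 + D * (Y - mu1) / e_hat - (1 - D) * (Y - mu0) / (1 - e_hat)

    # Second stage: project the pseudo-outcome on the effect modifier of interest.
    stage2 = LinearRegression().fit(X[:, [1]], psi)
    print(stage2.intercept_, stage2.coef_)   # roughly recovers the CATE's dependence on X[:, 1]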

arXiv link: http://arxiv.org/abs/2301.06283v1

Econometrics arXiv updated paper (originally submitted: 2023-01-13)

Identification in a Binary Choice Panel Data Model with a Predetermined Covariate

Authors: Stéphane Bonhomme, Kevin Dano, Bryan S. Graham

We study identification in a binary choice panel data model with a single
predetermined binary covariate (i.e., a covariate sequentially exogenous
conditional on lagged outcomes and covariates). The choice model is indexed by
a scalar parameter $\theta$, whereas the distribution of unit-specific
heterogeneity, as well as the feedback process that maps lagged outcomes into
future covariate realizations, are left unrestricted. We provide a simple
condition under which $\theta$ is never point-identified, no matter the number
of time periods available. This condition is satisfied in most models,
including the logit one. We also characterize the identified set of $\theta$
and show how to compute it using linear programming techniques. While $\theta$
is not generally point-identified, its identified set is informative in the
examples we analyze numerically, suggesting that meaningful learning about
$\theta$ may be possible even in short panels with feedback. As a complement,
we report calculations of identified sets for an average partial effect, and
find informative sets in this case as well.

arXiv link: http://arxiv.org/abs/2301.05733v2

Econometrics arXiv updated paper (originally submitted: 2023-01-13)

Stable Probability Weighting: Large-Sample and Finite-Sample Estimation and Inference Methods for Heterogeneous Causal Effects of Multivalued Treatments Under Limited Overlap

Authors: Ganesh Karapakula

In this paper, I try to tame "Basu's elephants" (data with extreme selection
on observables). I propose new practical large-sample and finite-sample methods
for estimating and inferring heterogeneous causal effects (under
unconfoundedness) in the empirically relevant context of limited overlap. I
develop a general principle called "Stable Probability Weighting" (SPW) that
can be used as an alternative to the widely used Inverse Probability Weighting
(IPW) technique, which relies on strong overlap. I show that IPW (or its
augmented version), when valid, is a special case of the more general SPW (or
its doubly robust version), which adjusts for the extremeness of the
conditional probabilities of the treatment states. The SPW principle can be
implemented using several existing large-sample parametric, semiparametric, and
nonparametric procedures for conditional moment models. In addition, I provide
new finite-sample results that apply when unconfoundedness is plausible within
fine strata. Since IPW estimation relies on the problematic reciprocal of the
estimated propensity score, I develop a "Finite-Sample Stable Probability
Weighting" (FPW) set-estimator that is unbiased in a sense. I also propose new
finite-sample inference methods for testing a general class of weak null
hypotheses. The associated computationally convenient methods, which can be
used to construct valid confidence sets and to bound the finite-sample
confidence distribution, are of independent interest. My large-sample and
finite-sample frameworks extend to the setting of multivalued treatments.

arXiv link: http://arxiv.org/abs/2301.05703v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2023-01-13

Non-Stochastic CDF Estimation Using Threshold Queries

Authors: Princewill Okoroafor, Vaishnavi Gupta, Robert Kleinberg, Eleanor Goh

Estimating the empirical distribution of a scalar-valued data set is a basic
and fundamental task. In this paper, we tackle the problem of estimating an
empirical distribution in a setting with two challenging features. First, the
algorithm does not directly observe the data; instead, it only asks a limited
number of threshold queries about each sample. Second, the data are not assumed
to be independent and identically distributed; instead, we allow for an
arbitrary process generating the samples, including an adaptive adversary.
These considerations are relevant, for example, when modeling a seller
experimenting with posted prices to estimate the distribution of consumers'
willingness to pay for a product: offering a price and observing a consumer's
purchase decision is equivalent to asking a single threshold query about their
value, and the distribution of consumers' values may be non-stationary over
time, as early adopters may differ markedly from late adopters.
Our main result quantifies, to within a constant factor, the sample
complexity of estimating the empirical CDF of a sequence of elements of $[n]$,
up to $\varepsilon$ additive error, using one threshold query per sample. The
complexity depends only logarithmically on $n$, and our result can be
interpreted as extending the existing logarithmic-complexity results for noisy
binary search to the more challenging setting where noise is non-stochastic.
Along the way to designing our algorithm, we consider a more general model in
which the algorithm is allowed to make a limited number of simultaneous
threshold queries on each sample. We solve this problem using Blackwell's
Approachability Theorem and the exponential weights method. As a side result of
independent interest, we characterize the minimum number of simultaneous
threshold queries required by deterministic CDF estimation algorithms.

arXiv link: http://arxiv.org/abs/2301.05682v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-01-13

Randomization Test for the Specification of Interference Structure

Authors: Tadao Hoshino, Takahide Yanagi

This study considers testing the specification of spillover effects in causal
inference. We focus on experimental settings in which the treatment assignment
mechanism is known to researchers. We develop a new randomization test
utilizing a hierarchical relationship between different exposures. Compared
with existing approaches, our approach is essentially applicable to any null
exposure specifications and produces powerful test statistics without a priori
knowledge of the true interference structure. As empirical illustrations, we
revisit two existing social network experiments: one on farmers' insurance
adoption and the other on anti-conflict education programs.

arXiv link: http://arxiv.org/abs/2301.05580v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-01-12

Unbiased estimation and asymptotically valid inference in multivariable Mendelian randomization with many weak instrumental variables

Authors: Yihe Yang, Noah Lorincz-Comi, Xiaofeng Zhu

Mendelian randomization (MR) is an instrumental variable (IV) approach to
infer causal relationships between exposures and outcomes with genome-wide
association studies (GWAS) summary data. However, the multivariable
inverse-variance weighting (IVW) approach, which serves as the foundation for
most MR approaches, cannot yield unbiased causal effect estimates in the
presence of many weak IVs. To address this problem, we proposed the MR using
Bias-corrected Estimating Equation (MRBEE) that can infer unbiased causal
relationships with many weak IVs and account for horizontal pleiotropy
simultaneously. While the practical significance of MRBEE was demonstrated in
our parallel work (Lorincz-Comi (2023)), this paper established the statistical
theories of multivariable IVW and MRBEE with many weak IVs. First, we showed
that the bias of the multivariable IVW estimate is caused by the
error-in-variable bias, whose scale and direction are inflated and influenced
by weak instrument bias and sample overlaps of exposures and outcome GWAS
cohorts, respectively. Second, we investigated the asymptotic properties of
multivariable IVW and MRBEE, showing that MRBEE outperforms multivariable IVW
regarding unbiasedness of causal effect estimation and asymptotic validity of
causal inference. Finally, we applied MRBEE to examine myopia and revealed that
education and outdoor activity are causal to myopia whereas indoor activity is
not.

arXiv link: http://arxiv.org/abs/2301.05130v6

Econometrics arXiv updated paper (originally submitted: 2023-01-12)

Interacting Treatments with Endogenous Takeup

Authors: Mate Kormos, Robert P. Lieli, Martin Huber

We study causal inference in randomized experiments (or quasi-experiments)
following a $2\times 2$ factorial design. There are two treatments, denoted $A$
and $B$, and units are randomly assigned to one of four categories: treatment
$A$ alone, treatment $B$ alone, joint treatment, or none. Allowing for
endogenous non-compliance with the two binary instruments representing the
intended assignment, as well as unrestricted interference across the two
treatments, we derive the causal interpretation of various instrumental
variable estimands under more general compliance conditions than in the
literature. In general, if treatment takeup is driven by both instruments for
some units, it becomes difficult to separate treatment interaction from
treatment effect heterogeneity. We provide auxiliary conditions and various
bounding strategies that may help zero in on causally interesting parameters.
As an empirical illustration, we apply our results to a program randomly
offering two different treatments, namely tutoring and financial incentives, to
first year college students, in order to assess the treatments' effects on
academic performance.

arXiv link: http://arxiv.org/abs/2301.04876v2

Econometrics arXiv updated paper (originally submitted: 2023-01-12)

Testing for Coefficient Randomness in Local-to-Unity Autoregressions

Authors: Mikihito Nishi

In this study, we propose a test for the coefficient randomness in
autoregressive models where the autoregressive coefficient is local to unity,
which is empirically relevant given the results of earlier studies. Under this
specification, we theoretically analyze the effect of the correlation between
the random coefficient and disturbance on tests' properties, which remains
largely unexplored in the literature. Our analysis reveals that the correlation
crucially affects the power of tests for coefficient randomness and that tests
proposed by earlier studies can perform poorly when the degree of the
correlation is moderate to large. The test we propose in this paper is designed
to have a power function robust to the correlation. Because the asymptotic null
distribution of our test statistic depends on the correlation $\psi$ between
the disturbance and its square as earlier tests do, we also propose a modified
version of the test statistic such that its asymptotic null distribution is
free from the nuisance parameter $\psi$. The modified test is shown to have
better power properties than existing ones in large and finite samples.

arXiv link: http://arxiv.org/abs/2301.04853v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2023-01-12

A Framework for Generalization and Transportation of Causal Estimates Under Covariate Shift

Authors: Apoorva Lal, Wenjing Zheng, Simon Ejdemyr

Randomized experiments are an excellent tool for estimating internally valid
causal effects with the sample at hand, but their external validity is
frequently debated. While classical results on the estimation of Population
Average Treatment Effects (PATE) implicitly assume random selection into
experiments, this is typically far from true in many medical,
social-scientific, and industry experiments. When the experimental sample is
different from the target sample along observable or unobservable dimensions,
experimental estimates may be of limited use for policy decisions. We begin by
decomposing the extrapolation bias from estimating the Target Average Treatment
Effect (TATE) using the Sample Average Treatment Effect (SATE) into covariate
shift, overlap, and effect modification components, which researchers can
reason about in order to diagnose the severity of extrapolation bias. Next, we
cast covariate shift as a sample selection problem and propose estimators that
re-weight the doubly-robust scores from experimental subjects to estimate
treatment effects in the overall sample (=: generalization) or in an alternate
target sample (=: transportation). We implement these estimators in the
open-source R package causalTransportR, illustrate their performance in a
simulation study, and discuss diagnostics for evaluating their performance.
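
A minimal Python sketch of the reweighting idea (not the causalTransportR
implementation): estimate the probability of belonging to the experimental
versus target sample given covariates with a classifier, convert it to
importance weights, and take a weighted average of the experimental subjects'
doubly-robust scores to target the TATE. The score construction is assumed to
have been done already, and all names are illustrative.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def transported_ate(dr_scores, X_experiment, X_target):
        # dr_scores: doubly-robust (AIPW) scores for the experimental subjects.
        # Reweight them by the odds of belonging to the target sample given X.
        X = np.vstack([X_experiment, X_target])
        s = np.r_[np.ones(len(X_experiment)), np.zeros(len(X_target))]  # 1 = experiment
        model = LogisticRegression(max_iter=1000).fit(X, s)
        p_exp = model.predict_proba(X_experiment)[:, 1]
        weights = (1 - p_exp) / np.clip(p_exp, 0.01, 0.99)  # odds of being in the target
        return np.average(dr_scores, weights=weights)

    rng = np.random.default_rng(0)
    X_exp = rng.normal(0.0, 1.0, size=(1000, 3))     # experimental sample covariates
    X_tgt = rng.normal(0.5, 1.0, size=(1000, 3))     # target sample covariates
    scores = 1.0 + X_exp[:, 0] + rng.normal(size=1000)   # stand-in DR scores
    print(transported_ate(scores, X_exp, X_tgt))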

arXiv link: http://arxiv.org/abs/2301.04776v1

Econometrics arXiv updated paper (originally submitted: 2023-01-11)

Inference on quantile processes with a finite number of clusters

Authors: Andreas Hagemann

I introduce a generic method for inference on entire quantile and regression
quantile processes in the presence of a finite number of large and arbitrarily
heterogeneous clusters. The method asymptotically controls size by generating
statistics that exhibit enough distributional symmetry such that randomization
tests can be applied. The randomization test does not require ex-ante matching
of clusters, is free of user-chosen parameters, and performs well at
conventional significance levels with as few as five clusters. The method tests
standard (non-sharp) hypotheses and can even be asymptotically similar in
empirically relevant situations. The main focus of the paper is inference on
quantile treatment effects but the method applies more broadly. Numerical and
empirical examples are provided.

arXiv link: http://arxiv.org/abs/2301.04687v2

Econometrics arXiv updated paper (originally submitted: 2023-01-11)

Fast and Reliable Jackknife and Bootstrap Methods for Cluster-Robust Inference

Authors: James G. MacKinnon, Morten Ørregaard Nielsen, Matthew D. Webb

We provide computationally attractive methods to obtain jackknife-based
cluster-robust variance matrix estimators (CRVEs) for linear regression models
estimated by least squares. We also propose several new variants of the wild
cluster bootstrap, which involve these CRVEs, jackknife-based bootstrap
data-generating processes, or both. Extensive simulation experiments suggest
that the new methods can provide much more reliable inferences than existing
ones in cases where the latter are not trustworthy, such as when the number of
clusters is small and/or cluster sizes vary substantially. Three empirical
examples illustrate the new methods.

arXiv link: http://arxiv.org/abs/2301.04527v2

Econometrics arXiv updated paper (originally submitted: 2023-01-11)

Testing for the appropriate level of clustering in linear regression models

Authors: James G. MacKinnon, Morten Ørregaard Nielsen, Matthew D. Webb

The overwhelming majority of empirical research that uses cluster-robust
inference assumes that the clustering structure is known, even though there are
often several possible ways in which a dataset could be clustered. We propose
two tests for the correct level of clustering in regression models. One test
focuses on inference about a single coefficient, and the other on inference
about two or more coefficients. We provide both asymptotic and wild bootstrap
implementations. The proposed tests work for a null hypothesis of either no
clustering or “fine” clustering against alternatives of “coarser”
clustering. We also propose a sequential testing procedure to determine the
appropriate level of clustering. Simulations suggest that the bootstrap tests
perform very well under the null hypothesis and can have excellent power. An
empirical example suggests that using the tests leads to sensible inferences.

arXiv link: http://arxiv.org/abs/2301.04522v2

Econometrics arXiv paper, submitted: 2023-01-11

Uniform Inference in Linear Error-in-Variables Models: Divide-and-Conquer

Authors: Tom Boot, Artūras Juodis

It is customary to estimate error-in-variables models using higher-order
moments of observables. This moments-based estimator is consistent only when
the coefficient of the latent regressor is assumed to be non-zero. We develop a
new estimator based on the divide-and-conquer principle that is consistent for
any value of the coefficient of the latent regressor. In an application on the
relation between investment, (mismeasured) Tobin's $q$ and cash flow, we find
time periods in which the effect of Tobin's $q$ is not statistically different
from zero. The implausibly large higher-order moment estimates in these periods
disappear when using the proposed estimator.

arXiv link: http://arxiv.org/abs/2301.04439v1

Econometrics arXiv updated paper (originally submitted: 2023-01-10)

Asymptotic Theory for Two-Way Clustering

Authors: Luther Yap

This paper proves a new central limit theorem for a sample that exhibits
two-way dependence and heterogeneity across clusters. Statistical inference for
situations with both two-way dependence and cluster heterogeneity has thus far
been an open issue. The existing theory for two-way clustering inference
requires identical distributions across clusters (implied by the so-called
separate exchangeability assumption). Yet no such homogeneity requirement is
needed in the existing theory for one-way clustering. The new result therefore
theoretically justifies the view that two-way clustering is a more robust
version of one-way clustering, consistent with applied practice. In an
application to linear regression, I show that a standard plug-in variance
estimator is valid for inference.

arXiv link: http://arxiv.org/abs/2301.03805v3

Econometrics arXiv paper, submitted: 2023-01-07

Quantile Autoregression-based Non-causality Testing

Authors: Weifeng Jin

Non-causal processes have been drawing attention recently in Macroeconomics
and Finance for their ability to display nonlinear behaviors such as asymmetric
dynamics, volatility clustering, and local explosiveness. In this paper, we
investigate the statistical properties of empirical conditional quantiles of
non-causal processes. Specifically, we show that the quantile autoregression
(QAR) estimates for non-causal processes do not remain constant across
different quantiles in contrast to their causal counterparts. Furthermore, we
demonstrate that non-causal autoregressive processes admit nonlinear
representations for conditional quantiles given past observations. Exploiting
these properties, we propose three novel testing strategies of non-causality
for non-Gaussian processes within the QAR framework. The tests are constructed
either by verifying the constancy of the slope coefficients or by applying a
misspecification test of the linear QAR model over different quantiles of the
process. Some numerical experiments are included to examine the finite sample
performance of the testing strategies, where we compare different specification
tests for dynamic quantiles with the Kolmogorov-Smirnov constancy test. The new
methodology is applied to some time series from financial markets to
investigate the presence of speculative bubbles. The extension of the approach
based on the specification tests to AR processes driven by innovations with
heteroskedasticity is studied through simulations. The performance of QAR
estimates of non-causal processes at extreme quantiles is also explored.

arXiv link: http://arxiv.org/abs/2301.02937v1

Econometrics arXiv paper, submitted: 2023-01-06

Climate change heterogeneity: A new quantitative approach

Authors: Maria Dolores Gadea, Jesus Gonzalo

Climate change is a non-uniform phenomenon. This paper proposes a new
quantitative methodology to characterize, measure, and test the existence of
climate change heterogeneity. It consists of three steps. First, we introduce a
new testable warming typology based on the evolution of the trend of the whole
temperature distribution and not only on the average. Second, we define the
concepts of warming acceleration and warming amplification in a testable
format. And third, we introduce the new testable concept of warming dominance
to determine whether region A is suffering a worse warming process than region
B. Applying this three-step methodology, we find that Spain and the Globe
experience a clear distributional warming process (beyond the standard average)
but of different types. In both cases, this process is accelerating over time
and asymmetrically amplified. Overall, warming in Spain dominates the Globe in
all the quantiles except the lower tail of the global temperature distribution
that corresponds to the Arctic region. Our climate change heterogeneity results
open the door to the need for a non-uniform causal-effect climate analysis that
goes beyond the standard causality in mean as well as for a more efficient
design of the mitigation-adaptation policies. In particular, the heterogeneity
we find suggests that these policies should contain a common global component
and a clear local-regional element. Future climate agreements should take the
whole temperature distribution into account.

arXiv link: http://arxiv.org/abs/2301.02648v1

Econometrics arXiv updated paper (originally submitted: 2023-01-05)

Relaxing Instrument Exogeneity with Common Confounders

Authors: Christian Tien

Instruments can be used to identify causal effects in the presence of
unobserved confounding, under the famous relevance and exogeneity
(unconfoundedness and exclusion) assumptions. As exogeneity is difficult to
justify and to some degree untestable, it often invites criticism in
applications. Hoping to alleviate this problem, we propose a novel
identification approach, which relaxes traditional IV exogeneity to exogeneity
conditional on some unobserved common confounders. We assume there exist some
relevant proxies for the unobserved common confounders. Unlike typical proxies,
our proxies can have a direct effect on the endogenous regressor and the
outcome. We provide point identification results with a linearly separable
outcome model in the disturbance, and alternatively with strict monotonicity in
the first stage. General doubly robust and Neyman orthogonal moments are
derived consecutively to enable the straightforward root-n estimation of
low-dimensional parameters despite the high-dimensionality of nuisances,
themselves non-uniquely defined by Fredholm integral equations. Using this
novel method with NLS97 data, we separate ability bias from general selection
bias in the economic returns to education problem.

arXiv link: http://arxiv.org/abs/2301.02052v3

Econometrics arXiv paper, submitted: 2023-01-03

Measuring tail risk at high-frequency: An $L_1$-regularized extreme value regression approach with unit-root predictors

Authors: Julien Hambuckers, Li Sun, Luca Trapin

We study tail risk dynamics in high-frequency financial markets and their
connection with trading activity and market uncertainty. We introduce a dynamic
extreme value regression model accommodating both stationary and local
unit-root predictors to appropriately capture the time-varying behaviour of the
distribution of high-frequency extreme losses. To characterize trading activity
and market uncertainty, we consider several volatility and liquidity
predictors, and propose a two-step adaptive $L_1$-regularized maximum
likelihood estimator to select the most appropriate ones. We establish the
oracle property of the proposed estimator for selecting both stationary and
local unit-root predictors, and show its good finite sample properties in an
extensive simulation study. Studying the high-frequency extreme losses of nine
large liquid U.S. stocks using 42 liquidity and volatility predictors, we find
the severity of extreme losses to be well predicted by low levels of price
impact in periods of high volatility of liquidity and volatility.
impact in periods of high volatility of liquidity and volatility.

arXiv link: http://arxiv.org/abs/2301.01362v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2023-01-03

On the causality-preservation capabilities of generative modelling

Authors: Yves-Cédric Bauwelinckx, Jan Dhaene, Tim Verdonck, Milan van den Heuvel

Modeling lies at the core of both the financial and the insurance industry
for a wide variety of tasks. The rise and development of machine learning and
deep learning models have created many opportunities to improve our modeling
toolbox. Breakthroughs in these fields often come with the requirement of large
amounts of data. Such large datasets are often not publicly available in
finance and insurance, mainly due to privacy and ethics concerns. This lack of
data is currently one of the main hurdles in developing better models. One
possible option for alleviating this issue is generative modeling. Generative
models are capable of simulating fake but realistic-looking data, also referred
to as synthetic data, that can be shared more freely. Generative Adversarial
Networks (GANs) are one such class of models, increasing our capacity to fit very
high-dimensional distributions of data. While research on GANs is an active
topic in fields like computer vision, they have found limited adoption within
the human sciences, like economics and insurance. The reason for this is that in
these fields, most questions are inherently about identification of causal
effects, while to this day neural networks, which are at the center of the GAN
framework, focus mostly on high-dimensional correlations. In this paper we
study the causal preservation capabilities of GANs and whether the produced
synthetic data can reliably be used to answer causal questions. This is done by
performing causal analyses on the synthetic data, produced by a GAN, under
increasingly lenient assumptions. We consider the cross-sectional case,
the time series case and the case with a complete structural model. It is shown
that in the simple cross-sectional scenario where correlation equals causation
the GAN preserves causality, but that challenges arise for more advanced
analyses.

arXiv link: http://arxiv.org/abs/2301.01109v1

Econometrics arXiv paper, submitted: 2023-01-03

Fitting mixed logit random regret minimization models using maximum simulated likelihood

Authors: Ziyue Zhu, Álvaro A. Gutiérrez-Vargas, Martina Vandebroek

This article describes the mixrandregret command, which extends the
randregret command introduced in Gutiérrez-Vargas et al. (2021, The Stata
Journal 21: 626-658) by incorporating random coefficients for Random Regret
Minimization models. The newly developed command mixrandregret allows the
inclusion of random coefficients in the regret function of the classical RRM
model introduced in Chorus (2010, European Journal of Transport and
Infrastructure Research 10: 181-196). The command allows the user to specify a
combination of fixed and random coefficients. In addition, the user can specify
normal and log-normal distributions for the random coefficients using the
command's options. The models are fitted by simulated maximum likelihood,
with numerical integration used to approximate the choice probabilities.

arXiv link: http://arxiv.org/abs/2301.01091v1

Econometrics arXiv updated paper (originally submitted: 2023-01-03)

The Chained Difference-in-Differences

Authors: Christophe Bellégo, David Benatia, Vincent Dortet-Bernardet

This paper studies the identification, estimation, and inference of long-term
(binary) treatment effect parameters when balanced panel data is not available,
or consists of only a subset of the available data. We develop a new estimator:
the chained difference-in-differences, which leverages the overlapping
structure of many unbalanced panel data sets. This approach consists in
aggregating a collection of short-term treatment effects estimated on multiple
incomplete panels. Our estimator accommodates (1) multiple time periods, (2)
variation in treatment timing, (3) treatment effect heterogeneity, (4) general
missing data patterns, and (5) sample selection on observables. We establish
the asymptotic properties of the proposed estimator and discuss identification
and efficiency gains in comparison to existing methods. Finally, we illustrate
its relevance through (i) numerical simulations, and (ii) an application to
the effects of an innovation policy in France.
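
As a rough sketch of the chaining idea (a simplified caricature, not the authors' estimator, which further handles staggered timing, heterogeneity, and sample selection): sum adjacent-period two-by-two difference-in-differences, each computed on the subsample of units observed in both periods. Column names and function names below are illustrative assumptions.

```python
# Simplified sketch: chain adjacent-period 2x2 DiD estimates computed on the
# units observed in both periods of an unbalanced panel. Assumes a long-format
# DataFrame with columns "unit", "period", "treated" (0/1) and outcome "y".
import pandas as pd

def two_by_two_did(df: pd.DataFrame, t0, t1) -> float:
    wide = (df[df["period"].isin([t0, t1])]
            .pivot(index=["unit", "treated"], columns="period", values="y")
            .dropna()                      # keep units observed in both periods
            .reset_index())
    delta = wide[t1] - wide[t0]
    return delta[wide["treated"] == 1].mean() - delta[wide["treated"] == 0].mean()

def chained_did(df: pd.DataFrame, periods) -> float:
    return sum(two_by_two_did(df, a, b) for a, b in zip(periods[:-1], periods[1:]))
```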

arXiv link: http://arxiv.org/abs/2301.01085v4

Econometrics arXiv paper, submitted: 2023-01-02

Time-Varying Coefficient DAR Model and Stability Measures for Stablecoin Prices: An Application to Tether

Authors: Antoine Djobenou, Emre Inan, Joann Jasiak

This paper examines the dynamics of Tether, the stablecoin with the largest
market capitalization. We show that the distributional and dynamic properties
of Tether/USD rates have been evolving from 2017 to 2021. We use local analysis
methods to detect and describe the local patterns, such as short-lived trends,
time-varying volatility and persistence. To accommodate these patterns, we
consider a time-varying parameter Double Autoregressive tvDAR(1) model under
the assumption of local stationarity of Tether/USD rates. We estimate the tvDAR
model non-parametrically and test hypotheses on the functional parameters. In
the application to Tether, the model provides a good fit and reliable
out-of-sample forecasts at short horizons, while being robust to time-varying
persistence and volatility. In addition, the model yields a simple plug-in
measure for assessing and comparing the stability of Tether and other
stablecoins.

arXiv link: http://arxiv.org/abs/2301.00509v1

Econometrics arXiv updated paper (originally submitted: 2022-12-31)

Inference for Large Panel Data with Many Covariates

Authors: Markus Pelger, Jiacheng Zou

This paper proposes a novel testing procedure for selecting a sparse set of
covariates that explains a large dimensional panel. Our selection method
provides correct false detection control while having higher power than
existing approaches. We develop the inferential theory for large panels with
many covariates by combining post-selection inference with a novel multiple
testing adjustment. Our data-driven hypotheses are conditional on the sparse
covariate selection. We control for family-wise error rates for covariate
discovery for large cross-sections. As an easy-to-use and practically relevant
procedure, we propose Panel-PoSI, which combines the data-driven adjustment for
panel multiple testing with valid post-selection p-values of a generalized
LASSO, that allows us to incorporate priors. In an empirical study, we select a
small number of asset pricing factors that explain a large cross-section of
investment strategies. Our method dominates the benchmarks out-of-sample due to
its better size and power.

arXiv link: http://arxiv.org/abs/2301.00292v6

Econometrics arXiv updated paper (originally submitted: 2022-12-31)

Higher-order Refinements of Small Bandwidth Asymptotics for Density-Weighted Average Derivative Estimators

Authors: Matias D. Cattaneo, Max H. Farrell, Michael Jansson, Ricardo Masini

The density weighted average derivative (DWAD) of a regression function is a
canonical parameter of interest in economics. Classical first-order large
sample distribution theory for kernel-based DWAD estimators relies on tuning
parameter restrictions and model assumptions that imply an asymptotic linear
representation of the point estimator. These conditions can be restrictive, and
the resulting distributional approximation may not be representative of the
actual sampling distribution of the statistic of interest. In particular, the
approximation is not robust to bandwidth choice. Small bandwidth asymptotics
offers an alternative, more general distributional approximation for
kernel-based DWAD estimators that allows for, but does not require, asymptotic
linearity. The resulting inference procedures based on small bandwidth
asymptotics were found to exhibit superior finite sample performance in
simulations, but no formal theory justifying that empirical success is
available in the literature. Employing Edgeworth expansions, this paper shows
that small bandwidth asymptotic approximations lead to inference procedures
with higher-order distributional properties that are demonstrably superior to
those of procedures based on asymptotic linear approximations.

arXiv link: http://arxiv.org/abs/2301.00277v2

Econometrics arXiv updated paper (originally submitted: 2022-12-31)

Feature Selection for Personalized Policy Analysis

Authors: Maria Nareklishvili, Nicholas Polson, Vadim Sokolov

In this paper, we propose Forest-PLS, a feature selection method for
analyzing policy effect heterogeneity in a more flexible and comprehensive
manner than is typically available with conventional methods. In particular,
our method is able to capture policy effect heterogeneity both within and
across subgroups of the population defined by observable characteristics. To
achieve this, we employ partial least squares to identify target components of
the population and causal forests to estimate personalized policy effects
across these components. We show that the method is consistent and leads to
asymptotically normally distributed policy effects. To demonstrate the efficacy
of our approach, we apply it to the data from the Pennsylvania Reemployment
Bonus Experiments, which were conducted in 1988-1989. The analysis reveals that
financial incentives can motivate some young non-white individuals to enter the
labor market. However, these incentives may also provide a temporary financial
cushion for others, dissuading them from actively seeking employment. Our
findings highlight the need for targeted, personalized measures for young
non-white male participants.

arXiv link: http://arxiv.org/abs/2301.00251v3

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2022-12-31

Inference on Time Series Nonparametric Conditional Moment Restrictions Using General Sieves

Authors: Xiaohong Chen, Yuan Liao, Weichen Wang

General nonlinear sieve learnings are classes of nonlinear sieves that can
approximate nonlinear functions of high dimensional variables much more
flexibly than various linear sieves (or series). This paper considers general
nonlinear sieve quasi-likelihood ratio (GN-QLR) based inference on expectation
functionals of time series data, where the functionals of interest are based on
some nonparametric function that satisfy conditional moment restrictions and
are learned using multilayer neural networks. While the asymptotic normality of
the estimated functionals depends on some unknown Riesz representer of the
functional space, we show that the optimally weighted GN-QLR statistic is
asymptotically Chi-square distributed, regardless of whether the expectation
functional is regular (root-$n$ estimable) or not. This holds when the data are
weakly dependent, satisfying a beta-mixing condition. We apply our method to the off-policy
evaluation in reinforcement learning, by formulating the Bellman equation into
the conditional moment restriction framework, so that we can make inference
about the state-specific value functional using the proposed GN-QLR method with
time series data. In addition, estimating the averaged partial means and
averaged partial derivatives of nonparametric instrumental variables and
quantile IV models are also presented as leading examples. Finally, a Monte
Carlo study shows the finite sample performance of the procedure.

arXiv link: http://arxiv.org/abs/2301.00092v2

Econometrics arXiv updated paper (originally submitted: 2022-12-30)

Identifying causal effects with subjective ordinal outcomes

Authors: Leonard Goff

Survey questions often ask respondents to select from ordered scales where
the meanings of the categories are subjective, leaving each individual free to
apply their own definitions in answering. This paper studies the use of these
responses as an outcome variable in causal inference, accounting for variation
in interpretation of the categories across individuals. I find that when a
continuous treatment variable is statistically independent of both i) potential
outcomes; and ii) heterogeneity in reporting styles, a nonparametric regression
of response category number on that treatment variable recovers a quantity
proportional to an average causal effect among individuals who are on the
margin between successive response categories. The magnitude of a given
regression coefficient is not meaningful on its own, but the ratio of local
regression derivatives with respect to two such treatment variables identifies
the relative magnitudes of convex averages of their effects. These results can
be seen as limiting cases of analogous results for binary treatment variables,
though comparisons of magnitude involving discrete treatments are not as
readily interpretable outside of the limit. I obtain a partial identification
result for comparisons involving discrete treatments under further assumptions.
An empirical application illustrates the results by revisiting the effects of
income comparisons on subjective well-being, without assuming cardinality or
interpersonal comparability of responses.

arXiv link: http://arxiv.org/abs/2212.14622v4

Econometrics arXiv updated paper (originally submitted: 2022-12-29)

Empirical Bayes When Estimation Precision Predicts Parameters

Authors: Jiafeng Chen

Gaussian empirical Bayes methods usually maintain a precision independence
assumption: The unknown parameters of interest are independent of the known
standard errors of the estimates. This assumption is often theoretically
questionable and empirically rejected. This paper proposes to model the
conditional distribution of the parameter given the standard errors as a
flexibly parametrized location-scale family of distributions, leading to a
family of methods that we call CLOSE. The CLOSE framework unifies and
generalizes several proposals under precision dependence. We argue that the
most flexible member of the CLOSE family is a minimalist and computationally
efficient default for accounting for precision dependence. We analyze this
method and show that it is competitive in terms of the regret of subsequent
decision rules. Empirically, using CLOSE leads to sizable gains for selecting
high-mobility Census tracts.

arXiv link: http://arxiv.org/abs/2212.14444v5

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-12-29

Near-Optimal Non-Parametric Sequential Tests and Confidence Sequences with Possibly Dependent Observations

Authors: Aurelien Bibaut, Nathan Kallus, Michael Lindon

Sequential tests and their implied confidence sequences, which are valid at
arbitrary stopping times, promise flexible statistical inference and on-the-fly
decision making. However, strong guarantees are limited to parametric
sequential tests that under-cover in practice or concentration-bound-based
sequences that over-cover and have suboptimal rejection times. In this work, we
consider classic delayed-start normal-mixture sequential probability ratio
tests, and we provide the first asymptotic type-I-error and
expected-rejection-time guarantees under general non-parametric data generating
processes, where the asymptotics are indexed by the test's burn-in time. The
type-I-error results primarily leverage a martingale strong invariance
principle and establish that these tests (and their implied confidence
sequences) have type-I error rates asymptotically equivalent to the desired
(possibly varying) $\alpha$-level. The expected-rejection-time results
primarily leverage an identity inspired by It\^o's lemma and imply that, in
certain asymptotic regimes, the expected rejection time is asymptotically
equivalent to the minimum possible among $\alpha$-level tests. We show how to
apply our results to sequential inference on parameters defined by estimating
equations, such as average treatment effects. Together, our results establish
these (ostensibly parametric) tests as general-purpose, non-parametric, and
near-optimal. We illustrate this via numerical simulations and a real-data
application to A/B testing at Netflix.
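
For intuition, here is a minimal toy version of a delayed-start normal-mixture sequential test of a zero mean for (roughly) unit-variance observations, using the standard Gaussian-mixture likelihood ratio; the burn-in, prior variance and nominal level are illustrative choices, and the non-parametric guarantees discussed above are of course not conveyed by this sketch.

```python
# Toy sketch: normal-mixture sequential probability ratio test of H0: mean = 0
# for unit-variance data, with a N(0, tau2) mixing distribution over the mean.
# Reject the first time (after a burn-in) the mixture likelihood ratio
# exceeds 1/alpha.
import numpy as np

def mixture_sprt_rejection_time(x, alpha=0.05, tau2=1.0, burn_in=50):
    x = np.asarray(x, dtype=float)
    s = np.cumsum(x)
    n = np.arange(1, len(x) + 1)
    log_mlr = 0.5 * (tau2 * s**2 / (1 + n * tau2) - np.log(1 + n * tau2))
    hits = np.nonzero((log_mlr >= np.log(1.0 / alpha)) & (n >= burn_in))[0]
    return int(hits[0] + 1) if hits.size else None   # sample size at rejection
```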

arXiv link: http://arxiv.org/abs/2212.14411v5

Econometrics arXiv paper, submitted: 2022-12-29

What Estimators Are Unbiased For Linear Models?

Authors: Lihua Lei, Jeffrey Wooldridge

The recent thought-provoking paper by Hansen [2022, Econometrica] proved that
the Gauss-Markov theorem continues to hold without the requirement that
competing estimators are linear in the vector of outcomes. Despite the elegant
proof, it was shown by the authors and other researchers that the main result
in the earlier version of Hansen's paper does not extend the classic
Gauss-Markov theorem because no nonlinear unbiased estimator exists under his
conditions. To address the issue, Hansen [2022] added statements in the latest
version with new conditions under which nonlinear unbiased estimators exist.
Motivated by the lively discussion, we study a fundamental problem: what
estimators are unbiased for a given class of linear models? We first review a
line of highly relevant work dating back to the 1960s, which, unfortunately,
has not drawn enough attention. Then, we introduce notation that allows us to
restate and unify results from earlier work and Hansen [2022]. The new
framework also allows us to highlight differences among previous conclusions.
Lastly, we establish new representation theorems for unbiased estimators under
different restrictions on the linear model, allowing the coefficients and
covariance matrix to take only a finite number of values, the higher moments of
the estimator and the dependent variable to exist, and the error distribution
to be discrete, absolutely continuous, or dominated by another probability
measure. Our results substantially generalize the claims of parallel
commentaries on Hansen [2022] and a remarkable result by Koopmann [1982].

arXiv link: http://arxiv.org/abs/2212.14185v1

Econometrics arXiv updated paper (originally submitted: 2022-12-28)

Supercompliers

Authors: Matthew L. Comey, Amanda R. Eng, Pauline Leung, Zhuan Pei

In a binary-treatment instrumental variable framework, we define
supercompliers as the subpopulation whose treatment take-up positively responds
to eligibility and whose outcome positively responds to take-up. Supercompliers
are the only subpopulation to benefit from treatment eligibility and, hence,
are important for policy. We provide tools to characterize supercompliers under
a set of jointly testable assumptions. Specifically, we require standard
assumptions from the local average treatment effect literature plus an outcome
monotonicity assumption. Estimation and inference can be conducted with
instrumental variable regression. In two job-training experiments, we
demonstrate our machinery's utility, particularly in incorporating social
welfare weights into marginal-value-of-public-funds analysis.

arXiv link: http://arxiv.org/abs/2212.14105v3

Econometrics arXiv updated paper (originally submitted: 2022-12-28)

Forward Orthogonal Deviations GMM and the Absence of Large Sample Bias

Authors: Robert F. Phillips

It is well known that generalized method of moments (GMM) estimators of
dynamic panel data regressions can have significant bias when the number of
time periods ($T$) is not small compared to the number of cross-sectional units
($n$). The bias is attributed to the use of many instrumental variables. This
paper shows that if the maximum number of instrumental variables used in a
period increases with $T$ at a rate slower than $T^{1/2}$, then GMM estimators
that exploit the forward orthogonal deviations (FOD) transformation do not have
asymptotic bias, regardless of how fast $T$ increases relative to $n$. This
conclusion is specific to using the FOD transformation. A similar conclusion
does not necessarily apply when other transformations are used to remove fixed
effects. Monte Carlo evidence illustrating the analytical results is provided.
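
For reference, here is a small sketch of the forward orthogonal deviations transformation itself (subtracting the mean of all later observations and rescaling), applied to a single unit's time series; the function name is illustrative.

```python
# Forward orthogonal deviations of a length-T series x: for t = 1, ..., T-1,
# subtract the mean of all later observations and rescale so that i.i.d.
# homoskedastic errors remain homoskedastic after the transformation.
import numpy as np

def forward_orthogonal_deviations(x):
    x = np.asarray(x, dtype=float)
    T = len(x)
    out = np.empty(T - 1)
    for t in range(T - 1):
        k = T - (t + 1)                           # number of later observations
        out[t] = np.sqrt(k / (k + 1.0)) * (x[t] - x[t + 1:].mean())
    return out
```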

arXiv link: http://arxiv.org/abs/2212.14075v2

Econometrics arXiv paper, submitted: 2022-12-28

Robustifying Markowitz

Authors: Wolfgang Karl Härdle, Yegor Klochkov, Alla Petukhina, Nikita Zhivotovskiy

Markowitz mean-variance portfolios with sample mean and covariance as input
parameters feature numerous issues in practice. They perform poorly out of
sample due to estimation error, and they exhibit extreme weights together with
high sensitivity to changes in input parameters. The heavy-tail characteristics
of financial time series are in fact the cause for these erratic fluctuations
of weights that consequently create substantial transaction costs. In
robustifying the weights we present a toolbox for stabilizing costs and weights
for global minimum Markowitz portfolios. Utilizing a projected gradient descent
(PGD) technique, we avoid the estimation and inversion of the covariance
operator as a whole and concentrate on robust estimation of the gradient
descent increment. Using modern tools of robust statistics we construct a
computationally efficient estimator with almost Gaussian properties based on
median-of-means uniformly over weights. This robustified Markowitz approach is
confirmed by empirical studies on equity markets. We demonstrate that
robustified portfolios reach the lowest turnover compared to shrinkage-based
and constrained portfolios while preserving or slightly improving out-of-sample
performance.
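
A stylized sketch of the two ingredients named above, under simplifying assumptions that are not from the paper (long-only weights on the simplex, plain full-gradient steps, no tuning): a median-of-means estimate of the gradient of the portfolio variance and a projected gradient descent update.

```python
# Stylized sketch, not the authors' procedure: long-only minimum-variance
# weights via projected gradient descent, with the gradient of w' Sigma w
# estimated by median-of-means over blocks of the (T x N) return matrix.
import numpy as np

def project_to_simplex(v):
    """Euclidean projection onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - 1)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def mom_gradient(returns, w, n_blocks=10):
    """Median-of-means estimate of grad_w w' Sigma w = 2 E[r (r'w)]."""
    blocks = np.array_split(returns, n_blocks)
    grads = [2.0 * b.T @ (b @ w) / len(b) for b in blocks]
    return np.median(np.stack(grads), axis=0)

def robust_min_variance_weights(returns, steps=500, step_size=0.05):
    w = np.full(returns.shape[1], 1.0 / returns.shape[1])
    for _ in range(steps):
        w = project_to_simplex(w - step_size * mom_gradient(returns, w))
    return w
```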

arXiv link: http://arxiv.org/abs/2212.13996v1

Econometrics arXiv updated paper (originally submitted: 2022-12-26)

Spectral and post-spectral estimators for grouped panel data models

Authors: Denis Chetverikov, Elena Manresa

In this paper, we develop spectral and post-spectral estimators for grouped
panel data models. Both estimators are consistent in the asymptotics where the
number of observations $N$ and the number of time periods $T$ simultaneously
grow large. In addition, the post-spectral estimator is $\sqrt{NT}$-consistent
and asymptotically normal with mean zero under the assumption of well-separated
groups even if $T$ is growing much slower than $N$. The post-spectral estimator
has, therefore, theoretical properties that are comparable to those of the
grouped fixed-effect estimator developed by Bonhomme and Manresa (2015). In
contrast to the grouped fixed-effect estimator, however, our post-spectral
estimator is computationally straightforward.

arXiv link: http://arxiv.org/abs/2212.13324v2

Econometrics arXiv updated paper (originally submitted: 2022-12-26)

An Effective Treatment Approach to Difference-in-Differences with General Treatment Patterns

Authors: Takahide Yanagi

We consider a general difference-in-differences model in which the treatment
variable of interest may be non-binary and its value may change in each period.
It is generally difficult to estimate treatment parameters defined with the
potential outcome given the entire path of treatment adoption, because each
treatment path may be experienced by only a small number of observations. We
propose an alternative approach using the concept of effective treatment, which
summarizes the treatment path into an empirically tractable low-dimensional
variable, and develop doubly robust identification, estimation, and inference
methods. We also provide a companion R software package.

arXiv link: http://arxiv.org/abs/2212.13226v3

Econometrics arXiv paper, submitted: 2022-12-26

Orthogonal Series Estimation for the Ratio of Conditional Expectation Functions

Authors: Kazuhiko Shinoda, Takahiro Hoshino

In various fields of data science, researchers are often interested in
estimating the ratio of conditional expectation functions (CEFR). Specifically
in causal inference problems, it is sometimes natural to consider ratio-based
treatment effects, such as odds ratios and hazard ratios, and even
difference-based treatment effects are identified as CEFR in some empirically
relevant settings. This chapter develops the general framework for estimation
and inference on CEFR, which allows the use of flexible machine learning for
infinite-dimensional nuisance parameters. In the first stage of the framework,
the orthogonal signals are constructed using debiased machine learning
techniques to mitigate the negative impacts of the regularization bias in the
nuisance estimates on the target estimates. The signals are then combined with
a novel series estimator tailored for CEFR. We derive the pointwise and uniform
asymptotic results for estimation and inference on CEFR, including the validity
of the Gaussian bootstrap, and provide low-level sufficient conditions to apply
the proposed framework to some specific examples. We demonstrate the
finite-sample performance of the series estimator constructed under the
proposed framework by numerical simulations. Finally, we apply the proposed
method to estimate the causal effect of the 401(k) program on household assets.

arXiv link: http://arxiv.org/abs/2212.13145v1

Econometrics arXiv updated paper (originally submitted: 2022-12-26)

Tensor PCA for Factor Models

Authors: Andrii Babii, Eric Ghysels, Junsu Pan

Modern empirical analysis often relies on high-dimensional panel datasets
with non-negligible cross-sectional and time-series correlations. Factor models
are natural for capturing such dependencies. A tensor factor model describes
the $d$-dimensional panel as a sum of a reduced rank component and an
idiosyncratic noise, generalizing traditional factor models for two-dimensional
panels. We consider a tensor factor model corresponding to the notion of a
reduced multilinear rank of a tensor. We show that for a strong factor model, a
simple tensor principal component analysis algorithm is optimal for estimating
factors and loadings. When the factors are weak, the convergence rate of simple
TPCA can be improved with alternating least-squares iterations. We also provide
inferential results for factors and loadings and propose the first test to
select the number of factors. The new tools are applied to the problem of
imputing missing values in a multidimensional panel of firm characteristics.
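
For intuition, here is a bare-bones version of the first step only (not the paper's full procedure, and without the alternating least-squares refinement for weak factors or the inferential results): estimate each mode's loading matrix from the leading left singular vectors of the corresponding matricization of the data tensor. Function names are illustrative.

```python
# Bare-bones sketch of simple tensor PCA for a d-way array: mode-k loadings are
# the leading left singular vectors of the mode-k unfolding (matricization).
import numpy as np

def unfold(tensor, mode):
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def tensor_pca_loadings(tensor, ranks):
    """ranks[k] is the assumed multilinear rank along mode k."""
    return [np.linalg.svd(unfold(tensor, k), full_matrices=False)[0][:, :r]
            for k, r in enumerate(ranks)]
```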

arXiv link: http://arxiv.org/abs/2212.12981v3

Econometrics arXiv updated paper (originally submitted: 2022-12-22)

Efficient Sampling for Realized Variance Estimation in Time-Changed Diffusion Models

Authors: Timo Dimitriadis, Roxana Halbleib, Jeannine Polivka, Jasper Rennspies, Sina Streicher, Axel Friedrich Wolter

This paper analyzes the benefits of sampling intraday returns in intrinsic
time for the realized variance (RV) estimator. We theoretically show in finite
samples that depending on the permitted sampling information, the RV estimator
is most efficient under either hitting time sampling that samples whenever the
price changes by a pre-determined threshold, or under the new concept of
realized business time that samples according to a combination of observed
trades and estimated tick variance. The analysis builds on the assumption that
asset prices follow a diffusion that is time-changed with a jump process that
separately models the transaction times. This provides a flexible model that
allows for leverage specifications and Hawkes-type jump processes and
separately captures the empirically varying trading intensity and tick variance
processes, which are particularly relevant for disentangling the driving forces
of the sampling schemes. Extensive simulations confirm our theoretical results
and show that for low levels of noise, hitting time sampling remains superior
while for increasing noise levels, realized business time becomes the
empirically most efficient sampling scheme. An application to stock data
provides empirical evidence for the benefits of using these intrinsic sampling
schemes to construct more efficient RV estimators as well as for an improved
forecast performance.
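
A minimal sketch of the hitting-time scheme described above: keep a log-price observation only when it has moved by at least a fixed threshold since the last sampled point, then compute the realized variance from the sampled returns. The threshold choice and the realized-business-time alternative are beyond this toy example.

```python
# Toy sketch: realized variance from hitting-time sampled log-prices.
import numpy as np

def hitting_time_rv(log_prices, threshold):
    sampled = [log_prices[0]]
    for p in log_prices[1:]:
        if abs(p - sampled[-1]) >= threshold:   # price moved by the threshold
            sampled.append(p)
    returns = np.diff(sampled)
    return float(np.sum(returns ** 2))
```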

arXiv link: http://arxiv.org/abs/2212.11833v3

Econometrics arXiv updated paper (originally submitted: 2022-12-21)

A Bootstrap Specification Test for Semiparametric Models with Generated Regressors

Authors: Elia Lapenta

This paper provides a specification test for semiparametric models with
nonparametrically generated regressors. Such variables are not observed by the
researcher but are nonparametrically identified and estimable. Applications of
the test include models with endogenous regressors identified by control
functions, semiparametric sample selection models, or binary games with
incomplete information. The statistic is built from the residuals of the
semiparametric model. A novel wild bootstrap procedure is shown to provide
valid critical values. We consider nonparametric estimators with an automatic
bias correction that makes the test implementable without undersmoothing. In
simulations the test exhibits good small sample performance, and an
application to women's labor force participation decisions shows its
implementation in a real data context.

arXiv link: http://arxiv.org/abs/2212.11112v2

Econometrics arXiv updated paper (originally submitted: 2022-12-21)

Partly Linear Instrumental Variables Regressions without Smoothing on the Instruments

Authors: Jean-Pierre Florens, Elia Lapenta

We consider a semiparametric partly linear model identified by instrumental
variables. We propose an estimation method that does not smooth on the
instruments and we extend the Landweber-Fridman regularization scheme to the
estimation of this semiparametric model. We then show the asymptotic normality
of the parametric estimator and obtain the convergence rate for the
nonparametric estimator. Our estimator that does not smooth on the instruments
coincides with a typical estimator that does smooth on the instruments but
keeps the respective bandwidth fixed as the sample size increases. We propose a
data driven method for the selection of the regularization parameter, and in a
simulation study we show the attractive performance of our estimators.

arXiv link: http://arxiv.org/abs/2212.11012v2

Econometrics arXiv paper, submitted: 2022-12-21

Inference for Model Misspecification in Interest Rate Term Structure using Functional Principal Component Analysis

Authors: Kaiwen Hou

Level, slope, and curvature are three commonly-believed principal components
in interest rate term structure and are thus widely used in modeling. This
paper characterizes the heterogeneity of how misspecified such models are
through time. Presenting an orthonormal basis for the Nelson-Siegel model that
is interpretable as the three factors, we design two nonparametric tests of
whether this basis is equivalent to the data-driven functional principal
component basis underlying the yield curve dynamics, with and without
accounting for the ordering of eigenfunctions. We find high dispersion
between the two bases when rare events occur, suggesting occasional
misspecification even if the model is overall expressive.
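
As a rough illustration of the two objects being compared (using the textbook Nelson-Siegel loadings with the commonly used Diebold-Li decay parameter, and plain principal components rather than the paper's orthonormalized basis and formal tests):

```python
# Sketch: Nelson-Siegel factor loadings over maturities versus data-driven
# principal component loadings from a (days x maturities) panel of yields.
import numpy as np

def nelson_siegel_loadings(maturities, lam=0.0609):   # Diebold-Li value, maturities in months
    tau = np.asarray(maturities, dtype=float)
    slope = (1.0 - np.exp(-lam * tau)) / (lam * tau)
    curvature = slope - np.exp(-lam * tau)
    return np.column_stack([np.ones_like(tau), slope, curvature])

def empirical_pc_loadings(yields, k=3):
    centered = yields - yields.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T                                    # loadings over maturities
```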

arXiv link: http://arxiv.org/abs/2212.10790v1

Econometrics arXiv updated paper (originally submitted: 2022-12-20)

Probabilistic Quantile Factor Analysis

Authors: Dimitris Korobilis, Maximilian Schröder

This paper extends quantile factor analysis to a probabilistic variant that
incorporates regularization and computationally efficient variational
approximations. We establish through synthetic and real data experiments that
the proposed estimator can, in many cases, achieve better accuracy than a
recently proposed loss-based estimator. We contribute to the factor analysis
literature by extracting new indexes of low, medium, and
high economic policy uncertainty, as well as loose,
median, and tight financial conditions. We show that the high
uncertainty and tight financial conditions indexes have superior predictive
ability for various measures of economic activity. In a high-dimensional
exercise involving about 1000 daily financial series, we find that quantile
factors also provide superior out-of-sample information compared to mean or
median factors.

arXiv link: http://arxiv.org/abs/2212.10301v3

Econometrics arXiv paper, submitted: 2022-12-19

Quantifying fairness and discrimination in predictive models

Authors: Arthur Charpentier

The analysis of discrimination has long interested economists and lawyers. In
recent years, the literature in computer science and machine learning has
become interested in the subject, offering an interesting re-reading of the
topic. These questions are the consequences of numerous criticisms of
algorithms used to translate texts or to identify people in images. With the
arrival of massive data and the use of increasingly opaque algorithms, it is
not surprising to obtain discriminatory algorithms, because it has become easy
to construct a proxy for a sensitive variable by enriching the data
indefinitely. According to Kranzberg (1986), "technology is neither good nor
bad, nor is it neutral", and therefore, "machine learning won't give you
anything like gender neutrality `for free' that you didn't explicitly ask for",
as claimed by Kearns et al. (2019). In this article, we return to the general
context of predictive models in classification. We present the main concepts of
group fairness, based on independence between the sensitive variable and the
prediction, possibly conditioned on other information. We then go further by
presenting the concepts of individual fairness. Finally, we show how to correct
potential discrimination in order to guarantee that a model is more ethical.
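
To make the group-fairness notions concrete, here is a small illustrative sketch (not taken from the article) of two standard diagnostics for binary predictions and a binary sensitive attribute: the demographic parity gap, and its version conditioned on the true positive class (often called equal opportunity).

```python
# Illustrative group-fairness diagnostics for binary predictions y_pred,
# true labels y_true and a binary sensitive attribute s (numpy arrays).
import numpy as np

def demographic_parity_gap(y_pred, s):
    """Difference in positive prediction rates between the two groups."""
    return abs(y_pred[s == 1].mean() - y_pred[s == 0].mean())

def equal_opportunity_gap(y_true, y_pred, s):
    """Same comparison, restricted to individuals with y_true == 1."""
    pos = y_true == 1
    return abs(y_pred[pos & (s == 1)].mean() - y_pred[pos & (s == 0)].mean())
```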

arXiv link: http://arxiv.org/abs/2212.09868v1

Econometrics arXiv updated paper (originally submitted: 2022-12-19)

Robust Design and Evaluation of Predictive Algorithms under Unobserved Confounding

Authors: Ashesh Rambachan, Amanda Coston, Edward Kennedy

Predictive algorithms inform consequential decisions in settings where the
outcome is selectively observed given choices made by human decision makers. We
propose a unified framework for the robust design and evaluation of predictive
algorithms in selectively observed data. We impose general assumptions on how
much the outcome may vary on average between unselected and selected units
conditional on observed covariates and identified nuisance parameters,
formalizing popular empirical strategies for imputing missing data such as
proxy outcomes and instrumental variables. We develop debiased machine learning
estimators for the bounds on a large class of predictive performance estimands,
such as the conditional likelihood of the outcome, a predictive algorithm's
mean square error, true/false positive rate, and many others, under these
assumptions. In an administrative dataset from a large Australian financial
institution, we illustrate how varying assumptions on unobserved confounding
leads to meaningful changes in default risk predictions and evaluations of
credit scores across sensitive groups.

arXiv link: http://arxiv.org/abs/2212.09844v5

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-12-19

Simultaneous Inference of a Partially Linear Model in Time Series

Authors: Jiaqi Li, Likai Chen, Kun Ho Kim, Tianwei Zhou

We introduce a new methodology to conduct simultaneous inference of the
nonparametric component in partially linear time series regression models where
the nonparametric part is a multivariate unknown function. In particular, we
construct a simultaneous confidence region (SCR) for the multivariate function
by extending the high-dimensional Gaussian approximation to dependent processes
with continuous index sets. Our results allow for a more general dependence
structure compared to previous works and are widely applicable to a variety of
linear and nonlinear autoregressive processes. We demonstrate the validity of
our proposed methodology by examining the finite-sample performance in the
simulation study. Finally, an application in time series, the forward premium
regression, is presented, where we construct the SCR for the foreign exchange
risk premium from the exchange rate and macroeconomic data.

arXiv link: http://arxiv.org/abs/2212.10359v2

Econometrics arXiv updated paper (originally submitted: 2022-12-18)

Identification of time-varying counterfactual parameters in nonlinear panel models

Authors: Irene Botosaru, Chris Muris

We develop a general framework for the identification of counterfactual
parameters in a class of nonlinear semiparametric panel models with fixed
effects and time effects. Our method applies to models for discrete outcomes
(e.g., two-way fixed effects binary choice) or continuous outcomes (e.g.,
censored regression), with discrete or continuous regressors. Our results do
not require parametric assumptions on the error terms or time-homogeneity on
the outcome equation. Our main results focus on static models, with a set of
results applying to models without any exogeneity conditions. We show that the
survival distribution of counterfactual outcomes is identified (point or
partial) in this class of models. This parameter is a building block for most
partial and marginal effects of interest in applied practice that are based on
the average structural function as defined by Blundell and Powell (2003, 2004).
To the best of our knowledge, ours are the first results on average partial and
marginal effects for binary choice and ordered choice models with two-way fixed
effects and non-logistic errors.

arXiv link: http://arxiv.org/abs/2212.09193v2

Econometrics arXiv updated paper (originally submitted: 2022-12-18)

PAC-Bayesian Treatment Allocation Under Budget Constraints

Authors: Daniel F. Pellatt

This paper considers the estimation of treatment assignment rules when the
policy maker faces a general budget or resource constraint. Utilizing the
PAC-Bayesian framework, we propose new treatment assignment rules that allow
for flexible notions of treatment outcome, treatment cost, and a budget
constraint. For example, the constraint setting allows for cost-savings, when
the costs of non-treatment exceed those of treatment for a subpopulation, to be
factored into the budget. It also accommodates simpler settings, such as
quantity constraints, and does not require outcome responses and costs to have
the same unit of measurement. Importantly, the approach accounts for settings
where budget or resource limitations may preclude treating all that can
benefit, where costs may vary with individual characteristics, and where there
may be uncertainty regarding the cost of treatment rules of interest. Despite
the nomenclature, our theoretical analysis examines frequentist properties of
the proposed rules. For stochastic rules that typically approach
budget-penalized empirical welfare maximizing policies in larger samples, we
derive non-asymptotic generalization bounds for the target population costs and
sharp oracle-type inequalities that compare the rules' welfare regret to that
of optimal policies in relevant budget categories. A closely related,
non-stochastic, model aggregation treatment assignment rule is shown to inherit
desirable attributes.

arXiv link: http://arxiv.org/abs/2212.09007v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-12-16

A smooth transition autoregressive model for matrix-variate time series

Authors: Andrea Bucci

In many applications, data are observed as matrices with temporal dependence.
Matrix-variate time series modeling is a new branch of econometrics. Although
regime changes that are not abrupt are a stylized fact in several fields, the
existing models do not account for such smooth regime switches in the dynamics
of matrices. In this paper, we extend linear matrix-variate autoregressive
models by introducing a regime-switching model capable of accounting for smooth
changes, the matrix smooth transition autoregressive model. We present the
estimation process and the associated asymptotic properties, which are
illustrated with simulated and real data.
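
For concreteness, a logistic transition weight of the usual smooth transition (STAR) form, written here purely for illustration since the abstract does not spell out the exact specification:

$$
G(s_t; \gamma, c) = \frac{1}{1 + \exp\{-\gamma (s_t - c)\}}, \qquad \gamma > 0,
$$

so that the autoregressive coefficient matrices move smoothly from one regime to another as the transition variable $s_t$ crosses the location parameter $c$, with $\gamma$ governing how abrupt the change is.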

arXiv link: http://arxiv.org/abs/2212.08615v1

Econometrics arXiv cross-link from q-fin.CP (q-fin.CP), submitted: 2022-12-16

Moate Simulation of Stochastic Processes

Authors: Michael E. Mura

A novel approach called Moate Simulation is presented to provide an accurate
numerical evolution of probability distribution functions represented on grids
arising from stochastic differential processes where initial conditions are
specified. Where the variables of stochastic differential equations may be
transformed via It\^o-Doeblin calculus into stochastic differentials with a
constant diffusion term, the probability distribution function for these
variables can be simulated in discrete time steps. The drift is applied
directly to a volume element of the distribution while the stochastic diffusion
term is applied through the use of convolution techniques such as Fast or
Discrete Fourier Transforms. This allows for highly accurate distributions to
be efficiently simulated to a given time horizon and may be employed in one,
two or higher dimensional expectation integrals, e.g. for pricing of financial
derivatives. The Moate Simulation approach forms a more accurate and
considerably faster alternative to Monte Carlo Simulation for many applications
while retaining the opportunity to alter the distribution in mid-simulation.
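
A rough one-dimensional sketch of the mechanics described above (a toy version, not the authors' implementation): on a uniform grid, apply the drift by shifting the density via interpolation, and apply the constant-diffusion term by FFT-based convolution with a Gaussian kernel. All names and the single-step structure are illustrative assumptions.

```python
# Toy 1-D sketch: one time step of a grid-based density evolution with constant
# diffusion. Drift shifts the density; diffusion is an FFT convolution with a
# Gaussian kernel of standard deviation sigma * sqrt(dt).
import numpy as np
from scipy.signal import fftconvolve

def evolve_density(x, p, mu, sigma, dt):
    dx = x[1] - x[0]
    p_drift = np.interp(x - mu * dt, x, p, left=0.0, right=0.0)   # shift by drift
    kernel = np.exp(-0.5 * ((x - x.mean()) / (sigma * np.sqrt(dt))) ** 2)
    kernel /= kernel.sum()
    p_new = fftconvolve(p_drift, kernel, mode="same")             # diffusion step
    return p_new / (p_new.sum() * dx)                             # renormalise
```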

arXiv link: http://arxiv.org/abs/2212.08509v1

Econometrics arXiv paper, submitted: 2022-12-14

The finite sample performance of instrumental variable-based estimators of the Local Average Treatment Effect when controlling for covariates

Authors: Hugo Bodory, Martin Huber, Michael Lechner

This paper investigates the finite sample performance of a range of
parametric, semi-parametric, and non-parametric instrumental variable
estimators when controlling for a fixed set of covariates to evaluate the local
average treatment effect. Our simulation designs are based on empirical labor
market data from the US and vary in several dimensions, including effect
heterogeneity, instrument selectivity, instrument strength, outcome
distribution, and sample size. Among the estimators and simulations considered,
non-parametric estimation based on the random forest (a machine learner
controlling for covariates in a data-driven way) performs competitively in terms
of the average coverage rates of the (bootstrap-based) 95% confidence
intervals, while also being relatively precise. Non-parametric kernel
regression as well as certain versions of semi-parametric radius matching on
the propensity score, pair matching on the covariates, and inverse probability
weighting also have a decent coverage, but are less precise than the random
forest-based method. In terms of the average root mean squared error of LATE
estimation, kernel regression performs best, closely followed by the random
forest method, which has the lowest average absolute bias.

arXiv link: http://arxiv.org/abs/2212.07379v1

Econometrics arXiv paper, submitted: 2022-12-14

Smoothing volatility targeting

Authors: Mauro Bernardi, Daniele Bianchi, Nicolas Bianco

We propose an alternative approach towards cost mitigation in
volatility-managed portfolios based on smoothing the predictive density of an
otherwise standard stochastic volatility model. Specifically, we develop a
novel variational Bayes estimation method that flexibly encompasses different
smoothness assumptions irrespective of the persistence of the underlying latent
state. Using a large set of equity trading strategies, we show that smoothing
volatility targeting helps to regularise the extreme leverage/turnover that
results from commonly used realised variance estimates. This has important
implications for both the risk-adjusted returns and the mean-variance
efficiency of volatility-managed portfolios, once transaction costs are
factored in. An extensive simulation study shows that our variational inference
scheme compares favourably against existing state-of-the-art Bayesian
estimation methods for stochastic volatility models.

arXiv link: http://arxiv.org/abs/2212.07288v1

Econometrics arXiv updated paper (originally submitted: 2022-12-14)

Robust Estimation of the non-Gaussian Dimension in Structural Linear Models

Authors: Miguel Cabello

Statistical identification of possibly non-fundamental SVARMA models requires
structural errors: (i) to be an i.i.d process, (ii) to be mutually independent
across components, and (iii) each of them must be non-Gaussian distributed.
Hence, provided the first two requisites, it is crucial to evaluate the
non-Gaussian identification condition. We address this problem by relating the
non-Gaussian dimension of structural errors vector to the rank of a matrix
built from the higher-order spectrum of reduced-form errors. This makes our
proposal robust to the roots location of the lag polynomials, and generalizes
the current procedures designed for the restricted case of a causal structural
VAR model. Simulation exercises show that our procedure satisfactorily
estimates the number of non-Gaussian components.

arXiv link: http://arxiv.org/abs/2212.07263v2

Econometrics arXiv updated paper (originally submitted: 2022-12-14)

On LASSO for High Dimensional Predictive Regression

Authors: Ziwei Mei, Zhentao Shi

This paper examines LASSO, a widely-used $L_{1}$-penalized regression method,
in high dimensional linear predictive regressions, particularly when the number
of potential predictors exceeds the sample size and numerous unit root
regressors are present. The consistency of LASSO is contingent upon two key
components: the deviation bound of the cross product of the regressors and the
error term, and the restricted eigenvalue of the Gram matrix. We present new
probabilistic bounds for these components, suggesting that LASSO's rates of
convergence are different from those typically observed in cross-sectional
cases. When applied to a mixture of stationary, nonstationary, and cointegrated
predictors, LASSO maintains its asymptotic guarantee if predictors are
scale-standardized. Leveraging machine learning and macroeconomic domain
expertise, LASSO demonstrates strong performance in forecasting the
unemployment rate, as evidenced by its application to the FRED-MD database.

arXiv link: http://arxiv.org/abs/2212.07052v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2022-12-13

Policy learning for many outcomes of interest: Combining optimal policy trees with multi-objective Bayesian optimisation

Authors: Patrick Rehill, Nicholas Biddle

Methods for learning optimal policies use causal machine learning models to
create human-interpretable rules for making choices around the allocation of
different policy interventions. However, in realistic policy-making contexts,
decision-makers often care about trade-offs between outcomes, not just
single-mindedly maximising utility for one outcome. This paper proposes an
approach termed Multi-Objective Policy Learning (MOPoL) which combines optimal
decision trees for policy learning with a multi-objective Bayesian optimisation
approach to explore the trade-off between multiple outcomes. It does this by
building a Pareto frontier of non-dominated models for different hyperparameter
settings which govern outcome weighting. The key here is that a low-cost greedy
tree can be an accurate proxy for the very computationally costly optimal tree
for the purposes of making decisions, which means models can be repeatedly fit
to learn a Pareto frontier. The method is applied to a real-world case study of
non-price rationing of anti-malarial medication in Kenya.

arXiv link: http://arxiv.org/abs/2212.06312v2

Econometrics arXiv updated paper (originally submitted: 2022-12-12)

Logs with zeros? Some problems and solutions

Authors: Jiafeng Chen, Jonathan Roth

When studying an outcome $Y$ that is weakly-positive but can equal zero (e.g.
earnings), researchers frequently estimate an average treatment effect (ATE)
for a "log-like" transformation that behaves like $\log(Y)$ for large $Y$ but
is defined at zero (e.g. $\log(1+Y)$, $\mathrm{arcsinh}(Y)$). We argue that
ATEs for log-like transformations should not be interpreted as approximating
percentage effects, since unlike a percentage, they depend on the units of the
outcome. In fact, we show that if the treatment affects the extensive margin,
one can obtain a treatment effect of any magnitude simply by re-scaling the
units of $Y$ before taking the log-like transformation. This arbitrary
unit-dependence arises because an individual-level percentage effect is not
well-defined for individuals whose outcome changes from zero to non-zero when
receiving treatment, and the units of the outcome implicitly determine how much
weight the ATE for a log-like transformation places on the extensive margin. We
further establish a trilemma: when the outcome can equal zero, there is no
treatment effect parameter that is an average of individual-level treatment
effects, unit-invariant, and point-identified. We discuss several alternative
approaches that may be sensible in settings with an intensive and extensive
margin, including (i) expressing the ATE in levels as a percentage (e.g. using
Poisson regression), (ii) explicitly calibrating the value placed on the
intensive and extensive margins, and (iii) estimating separate effects for the
two margins (e.g. using Lee bounds). We illustrate these approaches in three
empirical applications.
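
A two-line numerical illustration of the unit-dependence point: take an individual moved from zero earnings to 10 (in thousands of dollars) by treatment, and compare the implied $\log(1+Y)$ change when earnings are measured in thousands versus in dollars.

```python
# The same extensive-margin change (0 -> 10 thousand dollars) implies very
# different log(1+Y) "effects" depending on the units of Y.
import numpy as np

y0, y1 = 0.0, 10.0                                    # outcome in thousands
print(np.log1p(y1) - np.log1p(y0))                    # ~2.40
print(np.log1p(1000 * y1) - np.log1p(1000 * y0))      # ~9.21, same change in dollars
```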

arXiv link: http://arxiv.org/abs/2212.06080v7

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2022-12-12

Measuring the Driving Forces of Predictive Performance: Application to Credit Scoring

Authors: Hué Sullivan, Hurlin Christophe, Pérignon Christophe, Saurin Sébastien

As they play an increasingly important role in determining access to credit,
credit scoring models are under growing scrutiny from banking supervisors and
internal model validators. These authorities need to monitor the model
performance and identify its key drivers. To facilitate this, we introduce the
XPER methodology to decompose a performance metric (e.g., AUC, $R^2$) into
specific contributions associated with the various features of a forecasting
model. XPER is theoretically grounded on Shapley values and is both
model-agnostic and performance metric-agnostic. Furthermore, it can be
implemented either at the model level or at the individual level. Using a novel
dataset of car loans, we decompose the AUC of a machine-learning model trained
to forecast the default probability of loan applicants. We show that a small
number of features can explain a surprisingly large part of the model
performance. Notably, the features that contribute the most to the predictive
performance of the model may not be the ones that contribute the most to
individual forecasts (SHAP). Finally, we show how XPER can be used to deal with
heterogeneity issues and improve performance.

arXiv link: http://arxiv.org/abs/2212.05866v4

Econometrics arXiv paper, submitted: 2022-12-12

Dominant Drivers of National Inflation

Authors: Jan Ditzen, Francesco Ravazzolo

For western economies a long-forgotten phenomenon is on the horizon: rising
inflation rates. We propose a novel approach christened D2ML to identify
drivers of national inflation. D2ML combines machine learning for model
selection with time dependent data and graphical models to estimate the inverse
of the covariance matrix, which is then used to identify dominant drivers.
Using a dataset of 33 countries, we find that the US inflation rate and oil
prices are dominant drivers of national inflation rates. For a more general
framework, we carry out Monte Carlo simulations to show that our estimator
correctly identifies dominant drivers.

arXiv link: http://arxiv.org/abs/2212.05841v1

Econometrics arXiv paper, submitted: 2022-12-11

Robust Inference in High Dimensional Linear Model with Cluster Dependence

Authors: Ng Cheuk Fai

The cluster standard error (Liang and Zeger, 1986) is widely used by empirical
researchers to account for cluster dependence in linear models. It is well known
that this standard error is biased. We show that the bias does not vanish under
high dimensional asymptotics by revisiting Chesher and Jewitt (1987)'s
approach. An alternative leave-cluster-out crossfit (LCOC) estimator that is
unbiased, consistent and robust to cluster dependence is provided under the high
dimensional setting introduced by Cattaneo, Jansson and Newey (2018). Since the
LCOC estimator nests the leave-one-out crossfit estimator of Kline, Saggio and
Solvsten (2019), the two papers are unified. Monte Carlo comparisons are
provided to give insights into its finite sample properties. The LCOC estimator
is then applied to Angrist and Lavy's (2009) study of the effects of high
school achievement award and Donohue III and Levitt's (2001) study of the
impact of abortion on crime.

arXiv link: http://arxiv.org/abs/2212.05554v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2022-12-11

On regression-adjusted imputation estimators of the average treatment effect

Authors: Zhexiao Lin, Fang Han

Imputing missing potential outcomes using an estimated regression function is
a natural idea for estimating causal effects. In the literature, estimators
that combine imputation and regression adjustments are believed to be
comparable to augmented inverse probability weighting. Accordingly, people for
a long time conjectured that such estimators, while avoiding directly
constructing the weights, are also doubly robust (Imbens, 2004; Stuart, 2010).
Generalizing an earlier result of the authors (Lin et al., 2021), this paper
formalizes this conjecture, showing that a large class of regression-adjusted
imputation methods are indeed doubly robust for estimating the average
treatment effect. In addition, they are provably semiparametrically efficient
as long as both the density and regression models are correctly specified.
Notable examples of imputation methods covered by our theory include kernel
matching, (weighted) nearest neighbor matching, local linear matching, and
(honest) random forests.

arXiv link: http://arxiv.org/abs/2212.05424v2

Econometrics arXiv updated paper (originally submitted: 2022-12-09)

The Falsification Adaptive Set in Linear Models with Instrumental Variables that Violate the Exclusion or Conditional Exogeneity Restriction

Authors: Nicolas Apfel, Frank Windmeijer

Masten and Poirier (2021) introduced the falsification adaptive set (FAS) in
linear models with a single endogenous variable estimated with multiple
correlated instrumental variables (IVs). The FAS reflects the model uncertainty
that arises from falsification of the baseline model. We show that it applies
to cases where a conditional exogeneity assumption holds and invalid
instruments violate the exclusion assumption only. We propose a generalized FAS
that reflects the model uncertainty when some instruments violate the exclusion
assumption and/or some instruments violate the conditional exogeneity
assumption. Under the assumption that invalid instruments are not themselves
endogenous explanatory variables, if there is at least one relevant instrument
that satisfies both the exclusion and conditional exogeneity assumptions then
this generalized FAS is guaranteed to contain the parameter of interest.

arXiv link: http://arxiv.org/abs/2212.04814v2

Econometrics arXiv updated paper (originally submitted: 2022-12-09)

On the Non-Identification of Revenue Production Functions

Authors: David Van Dijcke

Production functions are potentially misspecified when revenue is used as a
proxy for output. I formalize and strengthen this common knowledge by showing
that neither the production function nor Hicks-neutral productivity can be
identified with such a revenue proxy. This result obtains when relaxing the
standard assumptions used in the literature to allow for imperfect competition.
It holds for a large class of production functions, including all commonly used
parametric forms. Among the prevalent approaches to address this issue, only
those that impose assumptions on the underlying demand system can possibly
identify the production function.

arXiv link: http://arxiv.org/abs/2212.04620v3

Econometrics arXiv paper, submitted: 2022-12-08

Optimal Model Selection in RDD and Related Settings Using Placebo Zones

Authors: Nathan Kettlewell, Peter Siminski

We propose a new model-selection algorithm for Regression Discontinuity
Design, Regression Kink Design, and related IV estimators. Candidate models are
assessed within a 'placebo zone' of the running variable, where the true
effects are known to be zero. The approach yields an optimal combination of
bandwidth, polynomial, and any other choice parameters. It can also inform
choices between classes of models (e.g. RDD versus cohort-IV) and any other
choices, such as covariates, kernel, or other weights. We outline sufficient
conditions under which the approach is asymptotically optimal. The approach
also performs favorably under more general conditions in a series of Monte
Carlo simulations. We demonstrate the approach in an evaluation of changes to
Minimum Supervised Driving Hours in the Australian state of New South Wales. We
also re-evaluate evidence on the effects of Head Start and Minimum Legal
Drinking Age. Our Stata commands implement the procedure and compare its
performance to other approaches.
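
A hedged toy version of the placebo-zone idea is sketched below: candidate bandwidths for a simple local-linear RDD estimator are scored by the mean squared placebo effect estimated at cutoffs where the true effect is zero, and the bandwidth with the smallest score is selected. The estimator, kernel, placebo cutoffs, and data are illustrative assumptions and do not reproduce the authors' Stata implementation.

```python
# Illustrative placebo-zone bandwidth selection for a simple local-linear RDD estimator.
import numpy as np

def rdd_estimate(x, y, cutoff, bandwidth):
    """Local-linear RDD jump estimate with a uniform kernel."""
    w = np.abs(x - cutoff) <= bandwidth
    xs, ys = x[w] - cutoff, y[w]
    d = (xs >= 0).astype(float)
    X = np.column_stack([np.ones_like(xs), d, xs, d * xs])
    coef, *_ = np.linalg.lstsq(X, ys, rcond=None)
    return coef[1]                                   # discontinuity at the cutoff

def placebo_zone_bandwidth(x, y, placebo_cutoffs, candidate_bandwidths):
    """Pick the bandwidth minimizing the mean squared placebo estimate."""
    scores = [np.mean([rdd_estimate(x, y, c, h) ** 2 for c in placebo_cutoffs])
              for h in candidate_bandwidths]
    return candidate_bandwidths[int(np.argmin(scores))]

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 5000)
y = 0.3 * x + 1.0 * (x >= 1.5) + rng.normal(scale=0.5, size=x.size)   # true cutoff at 1.5
h = placebo_zone_bandwidth(x, y,
                           placebo_cutoffs=np.linspace(-1.5, 0.5, 9),  # zone with zero effect
                           candidate_bandwidths=np.array([0.1, 0.2, 0.3, 0.5]))
print("selected bandwidth:", h)
```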

arXiv link: http://arxiv.org/abs/2212.04043v1

Econometrics arXiv paper, submitted: 2022-12-07

Semiparametric Distribution Regression with Instruments and Monotonicity

Authors: Dominik Wied

This paper proposes IV-based estimators for the semiparametric distribution
regression model in the presence of an endogenous regressor, which are based on
an extension of IV probit estimators. We discuss the causal interpretation of
the estimators and two methods (monotone rearrangement and isotonic regression)
to ensure a monotonically increasing distribution function. Asymptotic
properties and simulation evidence are provided. An application to wage
equations reveals statistically significant and heterogeneous differences to
the inconsistent OLS-based estimator.

arXiv link: http://arxiv.org/abs/2212.03704v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2022-12-07

Neighborhood Adaptive Estimators for Causal Inference under Network Interference

Authors: Alexandre Belloni, Fei Fang, Alexander Volfovsky

Estimating causal effects has become an integral part of most applied fields.
In this work we consider the violation of the classical no-interference
assumption with units connected by a network. For tractability, we consider a
known network that describes how interference may spread. Unlike previous work,
the radius (and intensity) of the interference experienced by a unit is unknown
and can depend on different (local) sub-networks and the assigned treatments.
We study estimators for the average direct treatment effect on the treated in
such a setting under additive treatment effects. We establish rates of
convergence and distributional results. The proposed estimators consider all
possible radii for each (local) treatment assignment pattern. In contrast to
previous work, we approximate the relevant network interference patterns that
lead to good estimates of the interference. To handle feature engineering, a
key innovation is to propose the use of synthetic treatments to decouple the
dependence. We provide simulations, an empirical illustration and insights for
the general study of interference.

arXiv link: http://arxiv.org/abs/2212.03683v2

Econometrics arXiv updated paper (originally submitted: 2022-12-07)

Bayesian Forecasting in Economics and Finance: A Modern Review

Authors: Gael M. Martin, David T. Frazier, Worapree Maneesoonthorn, Ruben Loaiza-Maya, Florian Huber, Gary Koop, John Maheu, Didier Nibbering, Anastasios Panagiotelis

The Bayesian statistical paradigm provides a principled and coherent approach
to probabilistic forecasting. Uncertainty about all unknowns that characterize
any forecasting problem -- model, parameters, latent states -- is able to be
quantified explicitly, and factored into the forecast distribution via the
process of integration or averaging. Allied with the elegance of the method,
Bayesian forecasting is now underpinned by the burgeoning field of Bayesian
computation, which enables Bayesian forecasts to be produced for virtually any
problem, no matter how large, or complex. The current state of play in Bayesian
forecasting in economics and finance is the subject of this review. The aim is
to provide the reader with an overview of modern approaches to the field, set
in some historical context; and with sufficient computational detail given to
assist the reader with implementation.

arXiv link: http://arxiv.org/abs/2212.03471v2

Econometrics arXiv updated paper (originally submitted: 2022-12-06)

The long-term effect of childhood exposure to technology using surrogates

Authors: Sylvia Klosin, Nicolaj Søndergaard Mühlbach

We study how childhood exposure to technology at ages 5-15, via the occupation
of the parents, affects the ability to climb the social ladder in terms of
income at ages 45-49, using Danish micro data for the years 1961-2019. Our
measure of technology exposure covers the degree to which using computers
(hardware and software) is required to perform an occupation, and it is created
by merging occupational codes with detailed data from O*NET. The challenge in
estimating this effect is that the long-term outcome is observed over a different
time horizon than our treatment of interest. We therefore adapt the surrogate
index methodology, linking the effect of our childhood treatment on
intermediate surrogates, such as income and education at ages 25-29, to the
effect on adulthood income. We estimate that a one standard error increase in
exposure to technology increases the income rank by 2 percentage points, which is
economically and statistically significant and robust to cluster correlation
within families. The derived policy recommendation is to update the educational
curriculum to expose children to computers to a greater degree, which may then
act as a social leveler.
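
The surrogate-index logic can be illustrated with the minimal sketch below, which (i) projects the long-term outcome on surrogates in an auxiliary sample, (ii) imputes the long-term outcome from surrogates in the treatment sample, and (iii) regresses the imputed outcome on the treatment. All data are synthetic placeholders, not the Danish registers used in the paper, and the linear projection stands in for whatever estimator the authors employ.

```python
# Minimal surrogate-index sketch on synthetic data (not the paper's implementation).
import numpy as np

rng = np.random.default_rng(2)
n = 5000
# auxiliary sample: surrogates s and long-term outcome y are both observed
s_aux = rng.normal(size=(n, 2))
y_aux = 1.0 + s_aux @ np.array([0.8, 0.4]) + rng.normal(size=n)

# step 1: surrogate index = projection of the long-term outcome on the surrogates
S = np.column_stack([np.ones(n), s_aux])
gamma, *_ = np.linalg.lstsq(S, y_aux, rcond=None)

# treatment sample: treatment d and surrogates observed; long-term outcome not yet
d = rng.binomial(1, 0.5, n)
s_trt = rng.normal(size=(n, 2)) + 0.3 * d[:, None]        # treatment shifts the surrogates
y_hat = np.column_stack([np.ones(n), s_trt]) @ gamma      # step 2: imputed long-term outcome

# step 3: effect of the treatment on the imputed outcome
X = np.column_stack([np.ones(n), d])
beta, *_ = np.linalg.lstsq(X, y_hat, rcond=None)
print("estimated long-term effect via surrogates:", round(beta[1], 3))
```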

arXiv link: http://arxiv.org/abs/2212.03351v2

Econometrics arXiv paper, submitted: 2022-12-05

Identification of Unobservables in Observations

Authors: Yingyao Hu

In empirical studies, the data usually do not include all the variables of
interest in an economic model. This paper shows the identification of
unobserved variables in observations at the population level. When the
observables are distinct in each observation, there exists a function mapping
from the observables to the unobservables. Such a function guarantees the
uniqueness of the latent value in each observation. The key lies in the
identification of the joint distribution of observables and unobservables from
the distribution of observables. The joint distribution of observables and
unobservables then reveals the latent value in each observation. Three examples
of this result are discussed.

arXiv link: http://arxiv.org/abs/2212.02585v1

Econometrics arXiv updated paper (originally submitted: 2022-12-05)

Educational Inequality of Opportunity and Mobility in Europe

Authors: Joël Terschuur

Educational attainment generates labor market returns, societal gains and has
intrinsic value for individuals. We study Inequality of Opportunity (IOp) and
intergenerational mobility in the distribution of educational attainment. We
propose to use debiased IOp estimators based on the Gini coefficient and the
Mean Logarithmic Deviation (MLD) which are robust to machine learning biases.
We also measure the effect of each circumstance on IOp, and we provide tests to
compare IOp in two populations and to test the joint significance of a group of
circumstances. We find that circumstances explain between 38% and 74% of
total educational inequality in European countries. Mother's education is the
most important circumstance in most countries. There is high intergenerational
persistence and there is evidence of an educational Great Gatsby curve. We also
construct IOp-aware educational Great Gatsby curves and find that countries with
high income IOp also exhibit high educational IOp and lower mobility.
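
As a rough illustration of ex-ante IOp measurement (not the debiased estimators proposed in the paper), the sketch below applies the Gini coefficient and the Mean Logarithmic Deviation to outcomes predicted from a single circumstance and reports the share of total inequality they explain. The data and the plug-in group-mean prediction are assumptions for illustration only.

```python
# Illustrative plug-in IOp calculation (not the paper's debiased estimators).
import numpy as np

def gini(x):
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    return 2 * np.sum(np.arange(1, n + 1) * x) / (n * x.sum()) - (n + 1) / n

def mld(x):
    x = np.asarray(x, dtype=float)
    return np.log(x.mean()) - np.mean(np.log(x))

rng = np.random.default_rng(3)
n = 10000
mother_edu = rng.integers(0, 4, n)                                  # a single circumstance
y = np.exp(0.3 * mother_edu + rng.normal(scale=0.5, size=n))        # positive outcome index
group_means = {k: y[mother_edu == k].mean() for k in np.unique(mother_edu)}
y_hat = np.array([group_means[k] for k in mother_edu])              # circumstance-predicted outcome

print("total Gini:", round(gini(y), 3), " IOp (Gini of predictions):", round(gini(y_hat), 3))
print("share of MLD explained by the circumstance:", round(mld(y_hat) / mld(y), 3))
```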

arXiv link: http://arxiv.org/abs/2212.02407v3

Econometrics arXiv paper, submitted: 2022-12-05

A Data Fusion Approach for Ride-sourcing Demand Estimation: A Discrete Choice Model with Sampling and Endogeneity Corrections

Authors: Rico Krueger, Michel Bierlaire, Prateek Bansal

Ride-sourcing services offered by companies like Uber and Didi have grown
rapidly in the last decade. Understanding the demand for these services is
essential for planning and managing modern transportation systems. Existing
studies develop statistical models for ride-sourcing demand estimation at an
aggregate level due to limited data availability. These models lack foundations
in microeconomic theory, ignore competition of ride-sourcing with other travel
modes, and cannot be seamlessly integrated into existing individual-level
(disaggregate) activity-based models to evaluate system-level impacts of
ride-sourcing services. In this paper, we present and apply an approach for
estimating ride-sourcing demand at a disaggregate level using discrete choice
models and multiple data sources. We first construct a sample of trip-based
mode choices in Chicago, USA by enriching a household travel survey with publicly
available ride-sourcing and taxi trip records. We then formulate a multivariate
extreme value-based discrete choice model with sampling and endogeneity corrections
to account for the construction of the estimation sample from multiple data
sources and endogeneity biases arising from supply-side constraints and surge
pricing mechanisms in ride-sourcing systems. Our analysis of the constructed
dataset reveals insights into the influence of various socio-economic, land use
and built environment features on ride-sourcing demand. We also derive
elasticities of ride-sourcing demand relative to travel cost and time. Finally,
we illustrate how the developed model can be employed to quantify the welfare
implications of ride-sourcing policies and regulations such as terminating
certain types of services and introducing ride-sourcing taxes.

arXiv link: http://arxiv.org/abs/2212.02178v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2022-12-04

Counterfactual Learning with General Data-generating Policies

Authors: Yusuke Narita, Kyohei Okumura, Akihiro Shimizu, Kohei Yata

Off-policy evaluation (OPE) attempts to predict the performance of
counterfactual policies using log data from a different policy. We extend its
applicability by developing an OPE method for a class of both full support and
deficient support logging policies in contextual-bandit settings. This class
includes deterministic bandit algorithms (such as Upper Confidence Bound) as well as
deterministic decision-making based on supervised and unsupervised learning. We
prove that our method's prediction converges in probability to the true
performance of a counterfactual policy as the sample size increases. We
validate our method with experiments on partly and entirely deterministic
logging policies. Finally, we apply it to evaluate coupon targeting policies by
a major online platform and show how to improve the existing policy.

arXiv link: http://arxiv.org/abs/2212.01925v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2022-11-30

mCube: Multinomial Micro-level reserving Model

Authors: Emmanuel Jordy Menvouta, Jolien Ponnet, Robin Van Oirbeek, Tim Verdonck

This paper presents a multinomial multi-state micro-level reserving model,
denoted mCube. We propose a unified framework for modelling the time and the
payment process for IBNR and RBNS claims and for modelling IBNR claim counts. We
use multinomial distributions for the time process and spliced mixture models
for the payment process. We illustrate the excellent performance of the
proposed model on a real data set of a major insurance company consisting of
bodily injury claims. It is shown that the proposed model produces a best
estimate distribution that is centered around the true reserve.

arXiv link: http://arxiv.org/abs/2212.00101v1

Econometrics arXiv updated paper (originally submitted: 2022-11-30)

Incorporating Prior Knowledge of Latent Group Structure in Panel Data Models

Authors: Boyuan Zhang

The assumption of group heterogeneity has become popular in panel data
models. We develop a constrained Bayesian grouped estimator that exploits
researchers' prior beliefs on groups in the form of pairwise constraints,
indicating whether a pair of units is likely to belong to the same group or to
different groups. We propose a prior to incorporate the pairwise constraints
with varying degrees of confidence. The whole framework is built on the
nonparametric Bayesian method, which implicitly specifies a distribution over
the group partitions, and so the posterior analysis takes the uncertainty of
the latent group structure into account. Monte Carlo experiments reveal that
adding prior knowledge yields more accurate coefficient estimates and scores
predictive gains over alternative estimators. We apply our method to two
empirical applications. In a first application to forecasting U.S. CPI
inflation, we illustrate that prior knowledge of groups improves density
forecasts when the data is not entirely informative. A second application
revisits the relationship between a country's income and its democratic
transition; we identify heterogeneous income effects on democracy with five
distinct groups over ninety countries.

arXiv link: http://arxiv.org/abs/2211.16714v3

Econometrics arXiv updated paper (originally submitted: 2022-11-29)

Score-based calibration testing for multivariate forecast distributions

Authors: Malte Knüppel, Fabian Krüger, Marc-Oliver Pohle

Calibration tests based on the probability integral transform (PIT) are
routinely used to assess the quality of univariate distributional forecasts.
However, PIT-based calibration tests for multivariate distributional forecasts
face various challenges. We propose two new types of tests based on proper
scoring rules, which overcome these challenges. They arise from a general
framework for calibration testing in the multivariate case, introduced in this
work. The new tests have good size and power properties in simulations and
solve various problems of existing tests. We apply the tests to forecast
distributions for macroeconomic and financial time series data.
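
For context, the sketch below shows the univariate PIT diagnostic that the paper's multivariate, score-based tests generalize: under correct calibration the PIT values are i.i.d. uniform, which a simple chi-square test on histogram bins can check. This is only background, not one of the tests proposed in the paper; the N(0,1) forecast and bin count are illustrative assumptions.

```python
# Background sketch: a univariate PIT uniformity check (chi-square on histogram bins).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
T = 500
y = rng.normal(loc=0.2, scale=1.3, size=T)       # realizations
pit = stats.norm.cdf(y, loc=0.0, scale=1.0)      # PIT under a (misspecified) N(0,1) forecast
counts, _ = np.histogram(pit, bins=10, range=(0, 1))
expected = T / 10
chi2 = np.sum((counts - expected) ** 2 / expected)
pval = 1 - stats.chi2.cdf(chi2, df=9)
print(f"chi-square = {chi2:.2f}, p-value = {pval:.4f}")  # small p-value flags miscalibration
```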

arXiv link: http://arxiv.org/abs/2211.16362v3

Econometrics arXiv updated paper (originally submitted: 2022-11-29)

Double Robust Bayesian Inference on Average Treatment Effects

Authors: Christoph Breunig, Ruixuan Liu, Zhengfei Yu

We propose a double robust Bayesian inference procedure on the average
treatment effect (ATE) under unconfoundedness. For our new Bayesian approach,
we first adjust the prior distributions of the conditional mean functions, and
then correct the posterior distribution of the resulting ATE. Both adjustments
make use of pilot estimators motivated by the semiparametric influence function
for ATE estimation. We prove asymptotic equivalence of our Bayesian procedure
and efficient frequentist ATE estimators by establishing a new semiparametric
Bernstein-von Mises theorem under double robustness; i.e., the lack of
smoothness of conditional mean functions can be compensated by high regularity
of the propensity score and vice versa. Consequently, the resulting Bayesian
credible sets form confidence intervals with asymptotically exact coverage
probability. In simulations, our method provides precise point estimates of the
ATE through the posterior mean and credible intervals that closely align with
the nominal coverage probability. Furthermore, our approach achieves a shorter
interval length in comparison to existing methods. We illustrate our method in
an application to the National Supported Work Demonstration following LaLonde
[1986] and Dehejia and Wahba [1999].
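
As a frequentist point of reference (not the paper's Bayesian procedure), the sketch below computes the AIPW, doubly robust ATE estimator built from the efficient influence function that also underlies the paper's prior and posterior corrections. The nuisance models (a logistic propensity score and linear outcome regressions) and the simulated data are assumptions for illustration.

```python
# Sketch of the frequentist AIPW (doubly robust) ATE estimator on simulated data.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(5)
n = 4000
X = rng.normal(size=(n, 3))
p = 1 / (1 + np.exp(-(X[:, 0] - 0.5 * X[:, 1])))
D = rng.binomial(1, p)
Y = 1.0 * D + X @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=n)   # true ATE = 1

ps = LogisticRegression().fit(X, D).predict_proba(X)[:, 1]          # propensity score
m1 = LinearRegression().fit(X[D == 1], Y[D == 1]).predict(X)        # E[Y | X, D=1]
m0 = LinearRegression().fit(X[D == 0], Y[D == 0]).predict(X)        # E[Y | X, D=0]

psi = m1 - m0 + D * (Y - m1) / ps - (1 - D) * (Y - m0) / (1 - ps)   # influence-function score
ate, se = psi.mean(), psi.std(ddof=1) / np.sqrt(n)
print(f"AIPW ATE = {ate:.3f} (se {se:.3f}); true effect = 1.0")
```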

arXiv link: http://arxiv.org/abs/2211.16298v6

Econometrics arXiv updated paper (originally submitted: 2022-11-29)

Bayesian Multivariate Quantile Regression with alternative Time-varying Volatility Specifications

Authors: Matteo Iacopini, Francesco Ravazzolo, Luca Rossini

This article proposes a novel Bayesian multivariate quantile regression to
forecast the tail behavior of energy commodities, where the homoskedasticity
assumption is relaxed to allow for time-varying volatility. In particular, we
exploit the mixture representation of the multivariate asymmetric Laplace
likelihood and the Cholesky-type decomposition of the scale matrix to introduce
stochastic volatility and GARCH processes and then provide an efficient MCMC to
estimate them. The proposed models outperform the homoskedastic benchmark
mainly when predicting the distribution's tails. We provide a model combination
using a quantile score-based weighting scheme, which leads to improved
performances, notably when no single model uniformly outperforms the other
across quantiles, time, or variables.

arXiv link: http://arxiv.org/abs/2211.16121v2

Econometrics arXiv paper, submitted: 2022-11-28

Synthetic Principal Component Design: Fast Covariate Balancing with Synthetic Controls

Authors: Yiping Lu, Jiajin Li, Lexing Ying, Jose Blanchet

The optimal design of experiments typically involves solving an NP-hard
combinatorial optimization problem. In this paper, we aim to develop a globally
convergent and practically efficient optimization algorithm. Specifically, we
consider a setting where the pre-treatment outcome data is available and the
synthetic control estimator is invoked. The average treatment effect is
estimated via the difference between the weighted average outcomes of the
treated and control units, where the weights are learned from the observed
data. Under this setting, we surprisingly observed that the optimal
experimental design problem could be reduced to a so-called phase
synchronization problem. We solve this problem via a normalized variant of
the generalized power method with spectral initialization. On the theoretical
side, we establish the first global optimality guarantee for experiment design
when pre-treatment data is sampled from certain data-generating processes.
Empirically, we conduct extensive experiments to demonstrate the effectiveness
of our method on both the US Bureau of Labor Statistics and the
Abadie-Diamond-Hainmueller California Smoking Data. In terms of the root mean
square error, our algorithm surpasses the random design by a large margin.

arXiv link: http://arxiv.org/abs/2211.15241v1

Econometrics arXiv updated paper (originally submitted: 2022-11-27)

Inference in Cluster Randomized Trials with Matched Pairs

Authors: Yuehao Bai, Jizhou Liu, Azeem M. Shaikh, Max Tabord-Meehan

This paper studies inference in cluster randomized trials where treatment
status is determined according to a "matched pairs" design. Here, by a cluster
randomized experiment, we mean one in which treatment is assigned at the level
of the cluster; by a "matched pairs" design, we mean that a sample of clusters
is paired according to baseline, cluster-level covariates and, within each
pair, one cluster is selected at random for treatment. We study the
large-sample behavior of a weighted difference-in-means estimator and derive
two distinct sets of results depending on whether the matching procedure does or
does not match on cluster size. We then propose a single variance estimator
which is consistent in either regime. Combining these results establishes the
asymptotic exactness of tests based on these estimators. Next, we consider the
properties of two common testing procedures based on t-tests constructed from
linear regressions, and argue that both are generally conservative in our
framework. We additionally study the behavior of a randomization test which
permutes the treatment status for clusters within pairs, and establish its
finite-sample and asymptotic validity for testing specific null hypotheses.
Finally, we propose a covariate-adjusted estimator which adjusts for additional
baseline covariates not used for treatment assignment, and establish conditions
under which such an estimator leads to strict improvements in precision. A
simulation study confirms the practical relevance of our theoretical results.
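
A toy sketch of the estimand being studied is given below: a cluster-size-weighted difference in means under a matched-pairs design in which one cluster per pair is treated at random. The weighting convention and data-generating process are simplifying assumptions; the paper's variance estimators and tests are not reproduced.

```python
# Toy sketch of a cluster-size-weighted difference in means under a matched-pairs design.
import numpy as np

rng = np.random.default_rng(6)
n_pairs = 50
sizes = rng.integers(20, 60, size=2 * n_pairs)              # cluster sizes
treat = np.zeros(2 * n_pairs, dtype=int)
for p in range(n_pairs):                                    # one treated cluster per pair
    treat[2 * p + rng.integers(0, 2)] = 1
cluster_means = 0.5 * treat + rng.normal(size=2 * n_pairs)  # cluster-level mean outcomes

w = sizes / sizes.sum()
tau_hat = (np.sum(w * treat * cluster_means) / np.sum(w * treat)
           - np.sum(w * (1 - treat) * cluster_means) / np.sum(w * (1 - treat)))
print("weighted difference in means:", round(tau_hat, 3), "(true effect 0.5)")
```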

arXiv link: http://arxiv.org/abs/2211.14903v5

Econometrics arXiv updated paper (originally submitted: 2022-11-27)

Extreme Changes in Changes

Authors: Yuya Sasaki, Yulong Wang

Policy analysts are often interested in treating the units with extreme
outcomes, such as infants with extremely low birth weights. Existing
changes-in-changes (CIC) estimators are tailored to middle quantiles and do not
work well for such subpopulations. This paper proposes a new CIC estimator to
accurately estimate treatment effects at extreme quantiles. With its asymptotic
normality, we also propose a method of statistical inference, which is simple
to implement. Based on simulation studies, we propose to use our extreme CIC
estimator for extreme quantiles (such as those below 5% and above 95%), while
the conventional CIC estimator should be used for intermediate quantiles. Applying
the proposed method, we study the effects of income gains from the 1993 EITC
reform on infant birth weights for those in the most critical conditions. This
paper is accompanied by a Stata command.

arXiv link: http://arxiv.org/abs/2211.14870v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2022-11-25

Machine Learning Algorithms for Time Series Analysis and Forecasting

Authors: Rameshwar Garg, Shriya Barpanda, Girish Rao Salanke N S, Ramya S

Time series data is used everywhere, from sales records to patients' health
evolution metrics. The ability to work with such data has become a necessity,
and time series analysis and forecasting serve that purpose. These are
important tools for any machine learning practitioner, as they deepen the
understanding of the characteristics of the data. Forecasting is used to
predict the value of a variable in the future based on its past occurrences.
This paper presents a detailed survey of the various methods used for
forecasting. The complete process of forecasting, from preprocessing to
validation, is also explained thoroughly. Various statistical and deep learning
models are considered, notably ARIMA, Prophet and LSTMs. Hybrid versions of
machine learning models are also explored and elucidated. Our work can be used
by anyone to develop a good understanding of the forecasting process and to
identify the various state-of-the-art models in use today.
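
As a tiny illustration of one classical model covered by the survey, the sketch below fits an ARIMA(1,0,0) model with statsmodels to synthetic AR(1) data and produces out-of-sample forecasts. The order and sample split are arbitrary choices for demonstration.

```python
# Tiny ARIMA fit-and-forecast example with statsmodels on synthetic AR(1) data.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(11)
y = np.zeros(300)
e = rng.normal(size=300)
for t in range(1, 300):
    y[t] = 0.7 * y[t - 1] + e[t]

model = ARIMA(y[:250], order=(1, 0, 0)).fit()     # train on the first 250 observations
forecast = model.forecast(steps=50)               # forecast the remaining 50
print("first five forecasts:", np.round(forecast[:5], 3))
```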

arXiv link: http://arxiv.org/abs/2211.14387v1

Econometrics arXiv paper, submitted: 2022-11-25

A Design-Based Approach to Spatial Correlation

Authors: Ruonan Xu, Jeffrey M. Wooldridge

When observing spatial data, what standard errors should we report? With the
finite population framework, we identify three channels of spatial correlation:
sampling scheme, assignment design, and model specification. The
Eicker-Huber-White standard error, the cluster-robust standard error, and the
spatial heteroskedasticity and autocorrelation consistent standard error are
compared under different combinations of the three channels. Then, we provide
guidelines for whether standard errors should be adjusted for spatial
correlation for both linear and nonlinear estimators. As it turns out, the
answer to this question also depends on the magnitude of the sampling
probability.

arXiv link: http://arxiv.org/abs/2211.14354v1

Econometrics arXiv updated paper (originally submitted: 2022-11-25)

Strategyproof Decision-Making in Panel Data Settings and Beyond

Authors: Keegan Harris, Anish Agarwal, Chara Podimata, Zhiwei Steven Wu

We consider the problem of decision-making using panel data, in which a
decision-maker gets noisy, repeated measurements of multiple units (or agents).
We consider a setup where there is a pre-intervention period, when the
principal observes the outcomes of each unit, after which the principal uses
these observations to assign a treatment to each unit. Unlike this classical
setting, we permit the units generating the panel data to be strategic, i.e.
units may modify their pre-intervention outcomes in order to receive a more
desirable intervention. The principal's goal is to design a strategyproof
intervention policy, i.e. a policy that assigns units to their
utility-maximizing interventions despite their potential strategizing. We first
identify a necessary and sufficient condition under which a strategyproof
intervention policy exists, and provide a strategyproof mechanism with a simple
closed form when one does exist. Along the way, we prove impossibility results
for strategic multiclass classification, which may be of independent interest.
When there are two interventions, we establish that there always exists a
strategyproof mechanism, and provide an algorithm for learning such a
mechanism. For three or more interventions, we provide an algorithm for
learning a strategyproof mechanism if there exists a sufficiently large gap in
the principal's rewards between different interventions. Finally, we
empirically evaluate our model using real-world panel data collected from
product sales over 18 months. We find that our methods compare favorably to
baselines which do not take strategic interactions into consideration, even in
the presence of model misspecification.

arXiv link: http://arxiv.org/abs/2211.14236v4

Econometrics arXiv paper, submitted: 2022-11-24

Spectral estimation for mixed causal-noncausal autoregressive models

Authors: Alain Hecq, Daniel Velasquez-Gaviria

This paper investigates new ways of estimating and identifying causal,
noncausal, and mixed causal-noncausal autoregressive models driven by a
non-Gaussian error sequence. We do not assume any parametric distribution
function for the innovations. Instead, we use the information of higher-order
cumulants, combining the spectrum and the bispectrum in a minimum distance
estimation. We show how to circumvent the nonlinearity of the parameters and
the multimodality in the noncausal and mixed models by selecting the
appropriate initial values in the estimation. In addition, we propose a method
of identification using a simple comparison criterion based on the global
minimum of the estimation function. By means of a Monte Carlo study, we find
that the parameter estimates are unbiased and the identification is correct as
the data depart from normality. We present an empirical application to eight
monthly commodity prices, finding noncausal and mixed causal-noncausal dynamics.

arXiv link: http://arxiv.org/abs/2211.13830v1

Econometrics arXiv updated paper (originally submitted: 2022-11-24)

Cross-Sectional Dynamics Under Network Structure: Theory and Macroeconomic Applications

Authors: Marko Mlikota

Many environments in economics involve units linked by bilateral ties. I
develop an econometric framework that rationalizes the dynamics of
cross-sectional variables as the innovation transmission along fixed bilateral
links and that can accommodate rich patterns of how network effects of higher
order accumulate over time. The proposed Network-VAR (NVAR) can be used to
estimate dynamic network effects, with the network given or inferred from
dynamic cross-correlations in the data. In the latter case, it also offers a
dimensionality-reduction technique for modeling high-dimensional
(cross-sectional) processes, owing to networks' ability to summarize complex
relations among variables (units) by relatively few bilateral links. In a first
application, I show that sectoral output growth in an RBC economy with lagged
input-output conversion follows an NVAR. I characterize impulse-responses to
TFP shocks in this environment, and I estimate that the lagged transmission of
productivity shocks along supply chains can account for a third of the
persistence in aggregate output growth. The remainder is due to persistence in
the aggregate TFP process, leaving a negligible role for persistence in
sectoral TFP. In a second application, I forecast macroeconomic aggregates
across OECD countries by assuming and estimating a network that underlies the
dynamics. In line with an equivalence result I provide, this reduces
out-of-sample mean squared errors relative to a dynamic factor model. The
reductions range from -12% for quarterly real GDP growth to -68% for monthly
CPI inflation.
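
A stylized, first-order special case of the idea, assumed here for illustration and not the paper's general NVAR, is sketched below: each unit responds to the lagged, network-weighted average of its neighbours, and a single network coefficient is recovered by pooled OLS given a known adjacency matrix.

```python
# Stylized first-order network autoregression: x_t = beta * W x_{t-1} + noise.
import numpy as np

rng = np.random.default_rng(9)
n, T, beta = 30, 400, 0.4
A = (rng.random((n, n)) < 0.1).astype(float)              # random bilateral links
np.fill_diagonal(A, 0)
W = A / np.maximum(A.sum(axis=1, keepdims=True), 1)       # row-normalized network

x = np.zeros((T, n))
for t in range(1, T):
    x[t] = beta * W @ x[t - 1] + rng.normal(scale=0.5, size=n)

# pooled OLS of x_{it} on the lagged network average (W x_{t-1})_i
y = x[1:].ravel()
z = (x[:-1] @ W.T).ravel()
beta_hat = (z @ y) / (z @ z)
print("true beta:", beta, "pooled-OLS estimate:", round(beta_hat, 3))
```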

arXiv link: http://arxiv.org/abs/2211.13610v6

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2022-11-23

Simulation-based Forecasting for Intraday Power Markets: Modelling Fundamental Drivers for Location, Shape and Scale of the Price Distribution

Authors: Simon Hirsch, Florian Ziel

During the last years, European intraday power markets have gained importance
for balancing forecast errors due to the rising volumes of intermittent
renewable generation. However, compared to day-ahead markets, the drivers for
the intraday price process are still sparsely researched. In this paper, we
propose a modelling strategy for the location, shape and scale parameters of
the return distribution in intraday markets, based on fundamental variables. We
consider wind and solar forecasts and their intraday updates, outages, price
information and a novel measure for the shape of the merit-order, derived from
spot auction curves as explanatory variables. We validate our modelling by
simulating price paths and compare the probabilistic forecasting performance of
our model to benchmark models in a forecasting study for the German market. The
approach yields significant improvements in the forecasting performance,
especially in the tails of the distribution. At the same time, we are able to
derive the contribution of the driving variables. We find that, apart from the
first lag of the price changes, none of our fundamental variables have
explanatory power for the expected value of the intraday returns. This implies
weak-form market efficiency, as renewable forecast changes and outage
information seem to be priced in by the market. We find that the volatility is
driven by the merit-order regime, the time to delivery and the closure of
cross-border order books. The tail of the distribution is mainly influenced by
past price differences and trading activity. Our approach is directly
transferable to other continuous intraday markets in Europe.

arXiv link: http://arxiv.org/abs/2211.13002v1

Econometrics arXiv paper, submitted: 2022-11-22

Macroeconomic Effects of Active Labour Market Policies: A Novel Instrumental Variables Approach

Authors: Ulrike Unterhofer, Conny Wunsch

This study evaluates the macroeconomic effects of active labour market
policies (ALMP) in Germany over the period 2005 to 2018. We propose a novel
identification strategy to overcome the simultaneity of ALMP and labour market
outcomes at the regional level. It exploits the imperfect overlap of local
labour markets and local employment agencies that decide on the local
implementation of policies. Specifically, we instrument for the use of ALMP in
a local labour market with the mix of ALMP implemented outside this market but
in local employment agencies that partially overlap with this market. We find
no effects of short-term activation measures and further vocational training on
aggregate labour market outcomes. In contrast, wage subsidies substantially
increase the share of workers in unsubsidised employment while lowering
long-term unemployment and welfare dependency. Our results suggest that
negative externalities of ALMP partially offset the effects for program
participants and that some segments of the labour market benefit more than
others.

arXiv link: http://arxiv.org/abs/2211.12437v1

Econometrics arXiv updated paper (originally submitted: 2022-11-22)

Peer Effects in Labor Market Training

Authors: Ulrike Unterhofer

This paper shows that group composition shapes the effectiveness of labor
market training programs for jobseekers. Using rich administrative data from
Germany and a novel measure of employability, I find that participants benefit
from greater average exposure to highly employable peers through increased
long-term employment and earnings. The effects vary significantly by own
employability: jobseekers with low employability experience larger long-term
gains, whereas highly employable individuals benefit primarily in the short
term through higher entry wages. An analysis of mechanisms suggests that
within-group competition in job search attenuates part of the positive effects
that operate through knowledge spillovers.

arXiv link: http://arxiv.org/abs/2211.12366v3

Econometrics arXiv paper, submitted: 2022-11-22

Asymptotic Properties of the Synthetic Control Method

Authors: Xiaomeng Zhang, Wendun Wang, Xinyu Zhang

This paper provides new insights into the asymptotic properties of the
synthetic control method (SCM). We show that the synthetic control (SC) weight
converges to a limiting weight that minimizes the mean squared prediction risk
of the treatment-effect estimator when the number of pretreatment periods goes
to infinity, and we also quantify the rate of convergence. Observing the link
between the SCM and model averaging, we further establish the asymptotic
optimality of the SC estimator under imperfect pretreatment fit, in the sense
that it achieves the lowest possible squared prediction error among all
possible treatment effect estimators that are based on an average of control
units, such as matching, inverse probability weighting and
difference-in-differences. The asymptotic optimality holds regardless of
whether the number of control units is fixed or divergent. Thus, our results
provide justifications for the SCM in a wide range of applications. The
theoretical results are verified via simulations.
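
The SC weights analysed in the paper can be illustrated with the minimal sketch below, which chooses simplex-constrained weights on control units to match the treated unit's pretreatment path via constrained least squares. The data and the scipy optimizer are illustrative assumptions, not the paper's setup.

```python
# Minimal synthetic control: simplex-constrained weights matching the pretreatment path.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
T0, J = 40, 10                                          # pretreatment periods, control units
Y_controls = rng.normal(size=(T0, J)).cumsum(axis=0)    # control outcome paths
true_w = np.array([0.6, 0.4] + [0.0] * (J - 2))
Y_treated = Y_controls @ true_w + rng.normal(scale=0.1, size=T0)

loss = lambda w: np.sum((Y_treated - Y_controls @ w) ** 2)
res = minimize(loss, x0=np.full(J, 1 / J), bounds=[(0, 1)] * J,
               constraints=({"type": "eq", "fun": lambda w: w.sum() - 1},))
print("estimated SC weights:", np.round(res.x, 2))
```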

arXiv link: http://arxiv.org/abs/2211.12095v1

Econometrics arXiv paper, submitted: 2022-11-22

Contextual Bandits in a Survey Experiment on Charitable Giving: Within-Experiment Outcomes versus Policy Learning

Authors: Susan Athey, Undral Byambadalai, Vitor Hadad, Sanath Kumar Krishnamurthy, Weiwen Leung, Joseph Jay Williams

We design and implement an adaptive experiment (a “contextual bandit”) to
learn a targeted treatment assignment policy, where the goal is to use a
participant's survey responses to determine which charity to expose them to in
a donation solicitation. The design balances two competing objectives:
optimizing the outcomes for the subjects in the experiment (“cumulative regret
minimization”) and gathering data that will be most useful for policy
learning, that is, for learning an assignment rule that will maximize welfare
if used after the experiment (“simple regret minimization”). We evaluate
alternative experimental designs by collecting pilot data and then conducting a
simulation study. Next, we implement our selected algorithm. Finally, we
perform a second simulation study anchored to the collected data that evaluates
the benefits of the algorithm we chose. Our first result is that the value of a
learned policy in this setting is higher when data is collected via a uniform
randomization rather than collected adaptively using standard cumulative regret
minimization or policy learning algorithms. We propose a simple heuristic for
adaptive experimentation that improves upon uniform randomization from the
perspective of policy learning at the expense of increasing cumulative regret
relative to alternative bandit algorithms. The heuristic modifies an existing
contextual bandit algorithm by (i) imposing a lower bound on assignment
probabilities that decays slowly, so that no arm is discarded too quickly, and
(ii) after adaptively collecting data, restricting policy learning to select
from arms where sufficient data has been gathered.
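
The probability-floor part of the heuristic can be sketched as follows, with illustrative (not the authors') constants: raw assignment probabilities from any bandit rule are mixed with a uniform floor that decays slowly in the sample size, so that no arm's probability is driven to zero too quickly.

```python
# Sketch of a slowly decaying probability floor applied to raw assignment probabilities.
import numpy as np

def apply_probability_floor(probs, t, c=0.1, alpha=0.25):
    """Mix raw probabilities with a per-arm floor c * t**(-alpha), capped at uniform."""
    probs = np.asarray(probs, dtype=float)
    k = probs.size
    floor = min(c * t ** (-alpha), 1.0 / k)
    return floor + (1 - k * floor) * probs      # still sums to one, every arm >= floor

raw = np.array([0.95, 0.04, 0.01])              # a near-deterministic assignment rule
for t in [10, 100, 1000, 10000]:
    print(t, np.round(apply_probability_floor(raw, t), 3))
```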

arXiv link: http://arxiv.org/abs/2211.12004v1

Econometrics arXiv updated paper (originally submitted: 2022-11-21)

A Misuse of Specification Tests

Authors: Naoya Sueishi

Empirical researchers often perform model specification tests, such as
Hausman tests and overidentifying restrictions tests, to assess the validity of
estimators rather than that of models. This paper examines the effectiveness of
such specification pretests in detecting invalid estimators. We analyze the
local asymptotic properties of test statistics and estimators and show that
locally unbiased specification tests cannot determine whether asymptotically
efficient estimators are asymptotically biased. In particular, an estimator may
remain valid even when the null hypothesis of correct model specification is
false, and it may be invalid even when the null hypothesis is true. The main
message of the paper is that correct model specification and valid estimation
are distinct issues: correct specification is neither necessary nor sufficient
for asymptotically unbiased estimation.
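
For background on the kind of pretest the paper scrutinizes, the sketch below computes a textbook Hausman statistic contrasting OLS with a just-identified IV estimator on simulated data with an endogenous regressor; homoskedastic variance formulas are assumed for simplicity.

```python
# Textbook Hausman test contrasting OLS and just-identified IV (homoskedastic variances).
import numpy as np

rng = np.random.default_rng(8)
n = 2000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + 0.6 * u + rng.normal(size=n)      # regressor endogenous through u
y = 1.0 * x + u

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)        # just-identified IV estimator

s2 = np.var(y - X @ b_iv, ddof=2)               # error variance from IV residuals
V_ols = s2 * np.linalg.inv(X.T @ X)
V_iv = s2 * np.linalg.inv(Z.T @ X) @ (Z.T @ Z) @ np.linalg.inv(X.T @ Z)
H = (b_iv[1] - b_ols[1]) ** 2 / (V_iv[1, 1] - V_ols[1, 1])
print("Hausman statistic:", round(H, 2))        # compare with a chi-square(1) critical value
```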

arXiv link: http://arxiv.org/abs/2211.11915v2

Econometrics arXiv paper, submitted: 2022-11-21

Structural Modelling of Dynamic Networks and Identifying Maximum Likelihood

Authors: Christian Gourieroux, Joann Jasiak

This paper considers nonlinear dynamic models where the main parameter of
interest is a nonnegative matrix characterizing the network (contagion)
effects. This network matrix is usually constrained either by assuming a
limited number of nonzero elements (sparsity), or by considering a reduced rank
approach for nonnegative matrix factorization (NMF). We follow the latter
approach and develop a new probabilistic NMF method. We introduce a new
Identifying Maximum Likelihood (IML) method for consistent estimation of the
identified set of admissible NMFs and derive its asymptotic distribution.
Moreover, we propose a maximum likelihood estimator of the parameter matrix for
a given non-negative rank, derive its asymptotic distribution and the
associated efficiency bound.

arXiv link: http://arxiv.org/abs/2211.11876v1

Econometrics arXiv paper, submitted: 2022-11-18

Fractional integration and cointegration

Authors: Javier Hualde, Morten Ørregaard Nielsen

In this chapter we present an overview of the main ideas and methods in the
fractional integration and cointegration literature. We do not attempt to give
a complete survey of this enormous literature, but rather a more introductory
treatment suitable for a researcher or graduate student wishing to learn about
this exciting field of research. With this aim, we have surely overlooked many
relevant references for which we apologize in advance. Knowledge of standard
time series methods, and in particular methods related to nonstationary time
series, at the level of a standard graduate course or advanced undergraduate
course is assumed.

arXiv link: http://arxiv.org/abs/2211.10235v1

Econometrics arXiv updated paper (originally submitted: 2022-11-17)

Cointegration with Occasionally Binding Constraints

Authors: James A. Duffy, Sophocles Mavroeidis, Sam Wycherley

In the literature on nonlinear cointegration, a long-standing open problem
relates to how a (nonlinear) vector autoregression, which provides a unified
description of the short- and long-run dynamics of a vector of time series, can
generate 'nonlinear cointegration' in the profound sense of those series
sharing common nonlinear stochastic trends. We consider this problem in the
setting of the censored and kinked structural VAR (CKSVAR), which provides a
flexible yet tractable framework within which to model time series that are
subject to threshold-type nonlinearities, such as those arising due to
occasionally binding constraints, of which the zero lower bound (ZLB) on
short-term nominal interest rates provides a leading example. We provide a
complete characterisation of how common linear and nonlinear stochastic trends
may be generated in this model, via unit roots and appropriate generalisations
of the usual rank conditions, providing the first extension to date of the
Granger-Johansen representation theorem to a nonlinearly cointegrated setting,
and thereby giving the first successful treatment of the open problem. The
limiting common trend processes include regulated, censored and kinked Brownian
motions, none of which have previously appeared in the literature on
cointegrated VARs. Our results and running examples illustrate that the CKSVAR
is capable of supporting a far richer variety of long-run behaviour than is a
linear VAR, in ways that may be particularly useful for the identification of
structural parameters.

arXiv link: http://arxiv.org/abs/2211.09604v4

Econometrics arXiv paper, submitted: 2022-11-17

On the Role of the Zero Conditional Mean Assumption for Causal Inference in Linear Models

Authors: Federico Crudu, Michael C. Knaus, Giovanni Mellace, Joeri Smits

Many econometrics textbooks imply that under mean independence of the
regressors and the error term, the OLS parameters have a causal interpretation.
We show that even when this assumption is satisfied, OLS might identify a
pseudo-parameter that does not have a causal interpretation. Even assuming that
the linear model is "structural" creates some ambiguity in what the regression
error represents and whether the OLS estimand is causal. This issue applies
equally to linear IV and panel data models. To give these estimands a causal
interpretation, one needs to impose assumptions on a "causal" model, e.g.,
using the potential outcome framework. This highlights that causal inference
requires causal, and not just stochastic, assumptions.

arXiv link: http://arxiv.org/abs/2211.09502v1

Econometrics arXiv paper, submitted: 2022-11-16

Estimating Dynamic Spillover Effects along Multiple Networks in a Linear Panel Model

Authors: Clemens Possnig, Andreea Rotărescu, Kyungchul Song

Spillover of economic outcomes often arises over multiple networks, and
distinguishing their separate roles is important in empirical research. For
example, the direction of spillover between two groups (such as banks and
industrial sectors linked in a bipartite graph) has important economic
implications, and a researcher may want to learn which direction is supported
in the data. For this, we need to have an empirical methodology that allows for
both directions of spillover simultaneously. In this paper, we develop a
dynamic linear panel model and asymptotic inference with large $n$ and small
$T$, where both directions of spillover are accommodated through multiple
networks. Using the methodology developed here, we perform an empirical study
of spillovers between bank weakness and zombie-firm congestion in industrial
sectors, using firm-bank matched data from Spain between 2005 and 2012.
Overall, we find that there is positive spillover in both directions between
banks and sectors.

arXiv link: http://arxiv.org/abs/2211.08995v1

Econometrics arXiv updated paper (originally submitted: 2022-11-16)

Causal Bandits: Online Decision-Making in Endogenous Settings

Authors: Jingwen Zhang, Yifang Chen, Amandeep Singh

The deployment of Multi-Armed Bandits (MAB) has become commonplace in many
economic applications. However, regret guarantees for even state-of-the-art
linear bandit algorithms (such as Optimism in the Face of Uncertainty Linear
bandit (OFUL)) make strong exogeneity assumptions w.r.t. arm covariates. This
assumption is very often violated in many economic contexts and using such
algorithms can lead to sub-optimal decisions. Further, in social science
analysis, it is also important to understand the asymptotic distribution of
estimated parameters. To this end, in this paper, we consider the problem of
online learning in linear stochastic contextual bandit problems with endogenous
covariates. We propose an algorithm, termed $\epsilon$-BanditIV, which uses
instrumental variables to correct for this bias, and prove an
$\mathcal{O}(kT)$ upper bound for the expected regret of the
algorithm. Further, we demonstrate the asymptotic consistency and normality of
the $\epsilon$-BanditIV estimator. We carry out extensive Monte Carlo
simulations to demonstrate the performance of our algorithms compared to other
methods. We show that $\epsilon$-BanditIV significantly outperforms other
existing methods in endogenous settings. Finally, we use data from a real-time
bidding (RTB) system to demonstrate how $\epsilon$-BanditIV can be used to
estimate the causal impact of advertising in such settings and compare its
performance with other existing methods.

arXiv link: http://arxiv.org/abs/2211.08649v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-11-15

Robust estimation for Threshold Autoregressive Moving-Average models

Authors: Greta Goracci, Davide Ferrari, Simone Giannerini, Francesco Ravazzolo

Threshold autoregressive moving-average (TARMA) models are popular in time
series analysis due to their ability to parsimoniously describe several complex
dynamical features. However, neither theory nor estimation methods are
currently available when the data present heavy tails or anomalous
observations, which is often the case in applications. In this paper, we
provide the first theoretical framework for robust M-estimation for TARMA
models and also study its practical relevance. Under mild conditions, we show
that the robust estimator for the threshold parameter is super-consistent,
while the estimators for autoregressive and moving-average parameters are
strongly consistent and asymptotically normal. The Monte Carlo study shows that
the M-estimator is superior, in terms of both bias and variance, to the least
squares estimator, which can be heavily affected by outliers. The findings
suggest that robust M-estimation should be generally preferred to the least
squares method. Finally, we apply our methodology to a set of commodity price
time series; the robust TARMA fit presents smaller standard errors and leads to
superior forecasting accuracy compared to the least squares fit. The results
support the hypothesis of a two-regime, asymmetric nonlinearity around zero,
characterised by slow expansions and fast contractions.

arXiv link: http://arxiv.org/abs/2211.08205v1

Econometrics arXiv paper, submitted: 2022-11-15

Identification and Auto-debiased Machine Learning for Outcome Conditioned Average Structural Derivatives

Authors: Zequn Jin, Lihua Lin, Zhengyu Zhang

This paper proposes a new class of heterogeneous causal quantities, named
outcome conditioned average structural derivatives (OASD) in a general
nonseparable model. OASD is the average partial effect of a marginal change in
a continuous treatment on the individuals located at different parts of the
outcome distribution, irrespective of individuals' characteristics. OASD
combines both features of ATE and QTE: it is interpreted as straightforwardly
as ATE while at the same time more granular than ATE by breaking the entire
population up according to the rank of the outcome distribution.
One contribution of this paper is that we establish some close relationships
between the outcome conditioned average partial effects and a class of
parameters measuring the effect of counterfactually changing the distribution
of a single covariate on the unconditional outcome quantiles. By exploiting
this relationship, we can obtain a root-$n$ consistent estimator and calculate
the semi-parametric efficiency bound for these counterfactual effect
parameters. We illustrate this point by two examples: equivalence between OASD
and the unconditional partial quantile effect (Firpo et al. (2009)), and
equivalence between the marginal partial distribution policy effect (Rothe
(2012)) and a corresponding outcome conditioned parameter.
Because identification of OASD is attained under a conditional exogeneity
assumption, by controlling for a rich information about covariates, a
researcher may ideally use high-dimensional controls in data. We propose for
OASD a novel automatic debiased machine learning estimator, and present
asymptotic statistical guarantees for it. We prove our estimator is root-$n$
consistent, asymptotically normal, and semiparametrically efficient. We also
prove the validity of the bootstrap procedure for uniform inference on the OASD
process.

arXiv link: http://arxiv.org/abs/2211.07903v1

Econometrics arXiv updated paper (originally submitted: 2022-11-15)

Graph Neural Networks for Causal Inference Under Network Confounding

Authors: Michael P. Leung, Pantelis Loupos

This paper studies causal inference with observational data from a single
large network. We consider a nonparametric model with interference in potential
outcomes and selection into treatment. Both stages may be the outcomes of
simultaneous equation models, which allow for endogenous peer effects. This
results in high-dimensional network confounding where the network and
covariates of all units constitute sources of selection bias. In contrast, the
existing literature assumes that confounding can be summarized by a known,
low-dimensional function of these objects. We propose to use graph neural
networks (GNNs) to adjust for network confounding. When interference decays
with network distance, we argue that the model has low-dimensional structure
that makes estimation feasible and justifies the use of shallow GNN
architectures.

arXiv link: http://arxiv.org/abs/2211.07823v4

Econometrics arXiv updated paper (originally submitted: 2022-11-14)

Type I Tobit Bayesian Additive Regression Trees for Censored Outcome Regression

Authors: Eoghan O'Neill

Censoring occurs when an outcome is unobserved beyond some threshold value.
Methods that do not account for censoring produce biased predictions of the
unobserved outcome. This paper introduces Type I Tobit Bayesian Additive
Regression Tree (TOBART-1) models for censored outcomes. Simulation results and
real data applications demonstrate that TOBART-1 produces accurate predictions
of censored outcomes. TOBART-1 provides posterior intervals for the conditional
expectation and other quantities of interest. The error term distribution can
have a large impact on the expectation of the censored outcome. Therefore the
error is flexibly modeled as a Dirichlet process mixture of normal
distributions.
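
As background, the sketch below maximizes the classical Type I Tobit likelihood with normal errors and left-censoring at zero, the baseline that TOBART-1 extends by replacing the linear mean with a sum of trees and the normal error with a Dirichlet process mixture. The data and censoring point are illustrative assumptions.

```python
# Classical Type I Tobit by maximum likelihood (normal errors, left-censoring at zero).
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(10)
n = 2000
x = rng.normal(size=n)
y_star = 0.5 + 1.2 * x + rng.normal(size=n)     # latent outcome
y = np.maximum(y_star, 0.0)                     # observed, censored from below at zero

def neg_loglik(theta):
    b0, b1, log_sigma = theta
    sigma = np.exp(log_sigma)
    mu = b0 + b1 * x
    censored = y <= 0
    ll = np.where(censored,
                  stats.norm.logcdf(-mu / sigma),             # P(latent outcome <= 0)
                  stats.norm.logpdf(y, loc=mu, scale=sigma))  # density for uncensored points
    return -ll.sum()

res = optimize.minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
b0, b1, sigma = res.x[0], res.x[1], np.exp(res.x[2])
print(f"estimates: b0={b0:.2f}, b1={b1:.2f}, sigma={sigma:.2f} (true 0.5, 1.2, 1.0)")
```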

arXiv link: http://arxiv.org/abs/2211.07506v4

Econometrics arXiv updated paper (originally submitted: 2022-11-12)

Robust Difference-in-differences Models

Authors: Kyunghoon Ban, Désiré Kédagni

The difference-in-differences (DID) method identifies average treatment
effects on the treated (ATT) mainly under the so-called parallel trends (PT)
assumption. The most common and widely used approach to justify the PT
assumption is the pre-treatment period examination. If a null hypothesis of the
same trend in the outcome means for both treatment and control groups in the
pre-treatment periods is rejected, researchers believe less in PT and the DID
results. This paper develops a robust generalized DID method that utilizes all
the information available not only from the pre-treatment periods but also from
multiple data sources. Our approach interprets PT in a different way using a
notion of selection bias, which enables us to generalize the standard DID
estimand by defining an information set that may contain multiple pre-treatment
periods or other baseline covariates. Our main assumption states that the
selection bias in the post-treatment period lies within the convex hull of all
selection biases in the pre-treatment periods. We provide a sufficient
condition for this assumption to hold. Based on the baseline information set we
construct, we provide an identified set for the ATT that always contains the
true ATT under our identifying assumption, and also the standard DID estimand.
We extend our proposed approach to multiple treatment periods DID settings. We
propose a flexible and easy way to implement the method. Finally, we illustrate
our methodology through some numerical and empirical examples.
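
A heavily simplified, two-group illustration of the identification idea is sketched below: the post-treatment selection bias is assumed to lie within the range of the pre-treatment selection biases, which turns the post-period mean gap into an identified interval for the ATT. The numbers are made up, and the paper's general implementation is not reproduced.

```python
# Simplified two-group illustration of bounding the ATT via pre-period selection biases.
import numpy as np

# group-by-period mean outcomes; periods 0-2 are pre-treatment, treatment occurs at t = 3
treated_means = np.array([1.0, 1.2, 1.5, 2.6])
control_means = np.array([0.8, 0.9, 1.3, 1.6])

pre_bias = treated_means[:3] - control_means[:3]   # selection biases in pre-treatment periods
post_gap = treated_means[3] - control_means[3]     # post-treatment mean gap
att_lower = post_gap - pre_bias.max()              # bias assumed within the pre-period range
att_upper = post_gap - pre_bias.min()
print(f"identified set for the ATT: [{att_lower:.2f}, {att_upper:.2f}]")
```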

arXiv link: http://arxiv.org/abs/2211.06710v5

Econometrics arXiv updated paper (originally submitted: 2022-11-12)

Multiple Structural Breaks in Interactive Effects Panel Data and the Impact of Quantitative Easing on Bank Lending

Authors: Jan Ditzen, Yiannis Karavias, Joakim Westerlund

This paper develops a new toolbox for multiple structural break detection in
panel data models with interactive effects. The toolbox includes tests for the
presence of structural breaks, a break date estimator, and a break date
confidence interval. The new toolbox is applied to a large panel of US banks
for a period characterized by massive quantitative easing programs aimed at
lessening the impact of the global financial crisis and the COVID--19 pandemic.
The question we ask is: Have these programs been successful in spurring bank
lending in the US economy? The short answer turns out to be: “No”.

arXiv link: http://arxiv.org/abs/2211.06707v2

Econometrics arXiv updated paper (originally submitted: 2022-11-11)

A Residuals-Based Nonparametric Variance Ratio Test for Cointegration

Authors: Karsten Reichold

This paper derives asymptotic theory for Breitung's (2002, Journal of
Econometrics 108, 343-363) nonparametric variance ratio unit root test when
applied to regression residuals. The test requires neither the specification of
the correlation structure in the data nor the choice of tuning parameters.
Compared with popular residuals-based no-cointegration tests, the variance
ratio test is less prone to size distortions but has smaller local asymptotic
power. However, this paper shows that local asymptotic power properties do not
serve as a useful indicator for the power of residuals-based no-cointegration
tests in finite samples. In terms of size-corrected power, the variance ratio
test performs relatively well and, in particular, does not suffer from power
reversal problems detected for, e.g., the frequently used augmented
Dickey-Fuller type no-cointegration test. An application to daily prices of
cryptocurrencies illustrates the usefulness of the variance ratio test in
practice.

arXiv link: http://arxiv.org/abs/2211.06288v3

Econometrics arXiv updated paper (originally submitted: 2022-11-09)

Bayesian Neural Networks for Macroeconomic Analysis

Authors: Niko Hauzenberger, Florian Huber, Karin Klieber, Massimiliano Marcellino

Macroeconomic data is characterized by a limited number of observations
(small T) and many time series (big K), but also by temporal dependence.
Neural networks, by contrast, are designed for datasets with millions of
observations and covariates. In this paper, we develop Bayesian neural networks
(BNNs) that are well-suited for handling datasets commonly used for
macroeconomic analysis in policy institutions. Our approach avoids extensive
specification searches through a novel mixture specification for the activation
function that appropriately selects the form of nonlinearities. Shrinkage
priors are used to prune the network and force irrelevant neurons to zero. To
cope with heteroskedasticity, the BNN is augmented with a stochastic volatility
model for the error term. We illustrate how the model can be used in a policy
institution by first showing that our different BNNs produce precise density
forecasts, typically better than those from other machine learning methods.
Finally, we showcase how our model can be used to recover nonlinearities in the
reaction of macroeconomic aggregates to financial shocks.

arXiv link: http://arxiv.org/abs/2211.04752v4

Econometrics arXiv updated paper (originally submitted: 2022-11-08)

Crises Do Not Cause Lower Short-Term Growth

Authors: Kaiwen Hou, David Hou, Yang Ouyang, Lulu Zhang, Aster Liu

It is commonly believed that financial crises "lead to" lower growth of a
country during the two-year recession period, which can be reflected by their
post-crisis GDP growth. However, by contrasting a causal model with a standard
prediction model, this paper argues that such a belief is non-causal. To make
causal inferences, we design a two-stage staggered difference-in-differences
model to estimate the average treatment effects. Interpreting the residuals as
the contribution of each crisis to the treatment effects, we reach the striking
conclusion that cross-sectional crises are of limited use in providing relevant
causal information to policymakers.

arXiv link: http://arxiv.org/abs/2211.04558v3

Econometrics arXiv updated paper (originally submitted: 2022-11-08)

On the Past, Present, and Future of the Diebold-Yilmaz Approach to Dynamic Network Connectedness

Authors: Francis X. Diebold, Kamil Yilmaz

We offer retrospective and prospective assessments of the Diebold-Yilmaz
connectedness research program, combined with personal recollections of its
development. Its centerpiece in many respects is Diebold and Yilmaz (2014),
around which our discussion is organized.

arXiv link: http://arxiv.org/abs/2211.04184v2

Econometrics arXiv updated paper (originally submitted: 2022-11-08)

Bootstraps for Dynamic Panel Threshold Models

Authors: Woosik Gong, Myung Hwan Seo

This paper develops valid bootstrap inference methods for the dynamic short
panel threshold regression. We demonstrate that the standard nonparametric
bootstrap is inconsistent for the first-differenced generalized method of
moments (GMM) estimator. The inconsistency arises because the threshold
estimator is $n^{1/4}$-consistent with a non-normal asymptotic distribution when
the true parameter lies in the continuity region of the parameter space, which stems
from the rank deficiency of the approximate Jacobian of the sample moment
conditions on the continuity region. To address this, we propose a grid
bootstrap to construct confidence intervals for the threshold and a residual
bootstrap to construct confidence intervals for the coefficients. They are
shown to be valid regardless of the model's continuity. Moreover, we establish
a uniform validity for the grid bootstrap. A set of Monte Carlo experiments
demonstrates that the proposed bootstraps improve upon the standard
nonparametric bootstrap. An empirical application to a firm investment model
illustrates our methods.

arXiv link: http://arxiv.org/abs/2211.04027v4

Econometrics arXiv updated paper (originally submitted: 2022-11-04)

Fast, Robust Inference for Linear Instrumental Variables Models using Self-Normalized Moments

Authors: Eric Gautier, Christiern Rose

We propose and implement an approach to inference in linear instrumental
variables models which is simultaneously robust and computationally tractable.
Inference is based on self-normalization of sample moment conditions, and
allows for (but does not require) many (relative to the sample size), weak,
potentially invalid or potentially endogenous instruments, as well as for many
regressors and conditional heteroskedasticity. Our coverage results are uniform
and can deliver a small sample guarantee. We develop a new computational
approach based on semidefinite programming, which we show can equally be
applied to rapidly invert existing tests (e.g., AR, LM, CLR).

arXiv link: http://arxiv.org/abs/2211.02249v3

Econometrics arXiv updated paper (originally submitted: 2022-11-04)

Boosted p-Values for High-Dimensional Vector Autoregression

Authors: Xiao Huang

Assessing the statistical significance of parameter estimates is an important
step in high-dimensional vector autoregression modeling. Using the
least-squares boosting method, we compute the p-value for each selected
parameter at every boosting step in a linear model. The p-values are
asymptotically valid and also adapt to the iterative nature of the boosting
procedure. Our simulation experiment shows that the p-values keep the false
positive rate under control in high-dimensional vector autoregressions. In an
application with more than 100 macroeconomic time series, we further show that
the p-values can not only select a sparser model with good prediction
performance but also help control model stability. A companion R package
boostvar is developed.

arXiv link: http://arxiv.org/abs/2211.02215v2
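
The p-value construction is the paper's contribution and is not reproduced here. The sketch below only illustrates the underlying componentwise least-squares boosting step for a single equation, with a hypothetical step size `nu` and number of steps, and assumes no centered column of `X` is identically zero.

```python
import numpy as np

def componentwise_l2_boost(X, y, steps=100, nu=0.1):
    """At each step, fit the current residual with the single best column of X
    (smallest squared error) and move that coefficient a small step nu."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    beta = np.zeros(p)
    resid = y - y.mean()
    selected = []
    for _ in range(steps):
        b = Xc.T @ resid / np.sum(Xc ** 2, axis=0)          # univariate LS slopes
        sse = np.sum((resid[:, None] - Xc * b) ** 2, axis=0)
        j = int(np.argmin(sse))
        beta[j] += nu * b[j]
        resid = resid - nu * b[j] * Xc[:, j]
        selected.append(j)
    return beta, selected
```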

Econometrics arXiv updated paper (originally submitted: 2022-11-03)

Asymptotic Theory of Principal Component Analysis for High-Dimensional Time Series Data under a Factor Structure

Authors: Matteo Barigozzi

We review Principal Components (PC) estimation of a large approximate factor
model for a panel of $n$ stationary time series and we provide new derivations
of the asymptotic properties of the estimators, which are derived under a
minimal set of assumptions requiring only the existence of 4th order moments.
To this end, we also review various alternative sets of primitive sufficient
conditions for mean-squared consistency of the sample covariance matrix.
Finally, we discuss in detail the issue of identification of the loadings and
factors as well as its implications for inference.

arXiv link: http://arxiv.org/abs/2211.01921v4
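
As a point of reference, the standard PC estimator under the usual normalization F'F/T = I_r can be sketched in a few lines, assuming a T x n data matrix `X` of centered, stationary series and a chosen number of factors `r`; the identification and inference results reviewed in the paper are not captured by this sketch.

```python
import numpy as np

def pc_estimate(X, r):
    """Principal-components estimation of an approximate factor model.
    X is T x n; returns factors F (T x r), loadings L (n x r), and the
    estimated common component, using the normalization F'F / T = I_r."""
    T, n = X.shape
    eigval, eigvec = np.linalg.eigh(X @ X.T / (T * n))
    order = np.argsort(eigval)[::-1][:r]      # leading eigenvectors
    F = np.sqrt(T) * eigvec[:, order]
    L = X.T @ F / T
    common = F @ L.T
    return F, L, common
```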

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-11-03

Are Synthetic Control Weights Balancing Scores?

Authors: Harsh Parikh

In this short note, I outline conditions under which conditioning on
Synthetic Control (SC) weights emulates a randomized control trial where the
treatment status is independent of potential outcomes. Specifically, I
demonstrate that if there exist SC weights such that (i) the treatment effects
are exactly identified and (ii) these weights are uniformly and cumulatively
bounded, then SC weights are balancing scores.

arXiv link: http://arxiv.org/abs/2211.01575v1

Econometrics arXiv updated paper (originally submitted: 2022-11-03)

Estimating interaction effects with panel data

Authors: Chris Muris, Konstantin Wacker

This paper analyzes how interaction effects can be consistently estimated
under economically plausible assumptions in linear panel models with a fixed
$T$-dimension. We advocate for a correlated interaction term estimator
(CITE) and show that it is consistent under conditions that are not sufficient
for consistency of the interaction term estimator that is most common in
applied econometric work. Our paper discusses the empirical content of these
conditions, shows that standard inference procedures can be applied to CITE,
and analyzes consistency, relative efficiency, inference, and their finite
sample properties in a simulation study. In an empirical application, we test
whether labor displacement effects of robots are stronger in countries at
higher income levels. The results are in line with our theoretical and
simulation results and indicate that standard interaction term estimation
underestimates the importance of a country's income level in the relationship
between robots and employment and may prematurely reject a null hypothesis
about interaction effects in the presence of misspecification.

arXiv link: http://arxiv.org/abs/2211.01557v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-11-03

A Systematic Paradigm for Detecting, Surfacing, and Characterizing Heterogeneous Treatment Effects (HTE)

Authors: John Cai, Weinan Wang

To effectively optimize and personalize treatments, it is necessary to
investigate the heterogeneity of treatment effects. With the wide range of
users being treated over many online controlled experiments, the typical
approach of manually investigating each dimension of heterogeneity becomes
overly cumbersome and prone to subjective human biases. We need an efficient
way to search through thousands of experiments with hundreds of target
covariates and hundreds of breakdown dimensions. In this paper, we propose a
systematic paradigm for detecting, surfacing and characterizing heterogeneous
treatment effects. First, we detect if treatment effect variation is present in
an experiment, prior to specifying any breakdowns. Second, we surface the most
relevant dimensions for heterogeneity. Finally, we characterize the
heterogeneity beyond just the conditional average treatment effects (CATE) by
studying the conditional distributions of the estimated individual treatment
effects. We show the effectiveness of our methods using simulated data and
empirical studies.

arXiv link: http://arxiv.org/abs/2211.01547v1

Econometrics arXiv updated paper (originally submitted: 2022-11-03)

Stochastic Treatment Choice with Empirical Welfare Updating

Authors: Toru Kitagawa, Hugo Lopez, Jeff Rowley

This paper proposes a novel method to estimate individualised treatment
assignment rules. The method is designed to find rules that are stochastic,
reflecting uncertainty in estimation of an assignment rule and about its
welfare performance. Our approach is to form a prior distribution over
assignment rules, not over data generating processes, and to update this prior
based upon an empirical welfare criterion, not likelihood. The social planner
then assigns treatment by drawing a policy from the resulting posterior. We
show analytically a welfare-optimal way of updating the prior using empirical
welfare; this posterior is not feasible to compute, so we propose a variational
Bayes approximation for the optimal posterior. We characterise the welfare
regret convergence of the assignment rule based upon this variational Bayes
approximation, showing that it converges to zero at a rate of ln(n)/sqrt(n). We
apply our methods to experimental data from the Job Training Partnership Act
Study to illustrate the implementation of our methods.

arXiv link: http://arxiv.org/abs/2211.01537v3

Econometrics arXiv paper, submitted: 2022-11-02

A New Test for Market Efficiency and Uncovered Interest Parity

Authors: Richard T. Baillie, Francis X. Diebold, George Kapetanios, Kun Ho Kim

We suggest a new single-equation test for Uncovered Interest Parity (UIP)
based on a dynamic regression approach. The method provides consistent and
asymptotically efficient parameter estimates, and is not dependent on
assumptions of strict exogeneity. This new approach is asymptotically more
efficient than the common approach of using OLS with HAC robust standard errors
in the static forward premium regression. The coefficient estimates when spot
return changes are regressed on the forward premium are all positive and
remarkably stable across currencies. These estimates are considerably larger
than those of previous studies, which frequently find negative coefficients.
The method also has the advantage of showing dynamic effects of risk premia, or
other events that may lead to rejection of UIP or the efficient markets
hypothesis.

arXiv link: http://arxiv.org/abs/2211.01344v1
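
The paper's dynamic regression estimator is not reproduced here. For context, the benchmark static forward-premium regression that it improves upon can be sketched as follows (argument names and the HAC lag length are illustrative).

```python
import numpy as np
import statsmodels.api as sm

def forward_premium_regression(spot, forward):
    """Static forward-premium (Fama) regression:
    s_{t+1} - s_t = alpha + beta (f_t - s_t) + e_{t+1}; UIP implies beta = 1."""
    s = np.log(np.asarray(spot, dtype=float))
    f = np.log(np.asarray(forward, dtype=float))
    ds = s[1:] - s[:-1]                 # spot return changes
    fp = f[:-1] - s[:-1]                # forward premium
    return sm.OLS(ds, sm.add_constant(fp)).fit(
        cov_type="HAC", cov_kwds={"maxlags": 4})
```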

Econometrics arXiv cross-link from physics.soc-ph (physics.soc-ph), submitted: 2022-11-02

Effects of syndication network on specialisation and performance of venture capital firms

Authors: Qing Yao, Shaodong Ma, Jing Liang, Kim Christensen, Wanru Jing, Ruiqi Li

The Chinese venture capital (VC) market is a young and rapidly expanding
financial subsector. Gaining a deeper understanding of the investment
behaviours of VC firms is crucial for the development of a more sustainable and
healthier market and economy. Existing evidence is mixed on whether
specialisation or diversification leads to better investment performance, and
the impact of the syndication network is typically overlooked, even though the
syndication network strongly influences the propagation of information and
trust. By exploiting an authoritative VC dataset covering thirty-five years of investment
information in China, we construct a joint-investment network of VC firms and
analyse the effects of syndication and diversification on specialisation and
investment performance. There is a clear correlation between the syndication
network degree and specialisation level of VC firms, which implies that the
well-connected VC firms are diversified. More connections generally bring about
more information or other resources, and VC firms are more likely to enter a
new stage or industry with some new co-investing VC firms when compared to a
randomised null model. Moreover, autocorrelation analysis of both
specialisation and success rate on the syndication network indicates that
clustering of similar VC firms is roughly limited to the secondary
neighbourhood. When analysing local clustering patterns, we discover that,
contrary to popular beliefs, there is no apparent successful club of investors.
In contrast, investors with low success rates are more likely to cluster. Our
discoveries enrich the understanding of VC investment behaviours and can assist
policymakers in designing better strategies to promote the development of the
VC industry.

arXiv link: http://arxiv.org/abs/2211.00873v1

Econometrics arXiv updated paper (originally submitted: 2022-11-01)

Cover It Up! Bipartite Graphs Uncover Identifiability in Sparse Factor Analysis

Authors: Darjus Hosszejni, Sylvia Frühwirth-Schnatter

Despite the popularity of factor models with sparse loading matrices, little
attention has been given to formally address identifiability of these models
beyond standard rotation-based identification such as the positive lower
triangular constraint. To fill this gap, we present a counting rule on the
number of nonzero factor loadings that is sufficient for achieving generic
uniqueness of the variance decomposition in the factor representation. This is
formalized in the framework of sparse matrix spaces and some classical elements
from graph and network theory. Furthermore, we provide a computationally
efficient tool for verifying the counting rule. Our methodology is illustrated
for real data in the context of post-processing posterior draws in Bayesian
sparse factor analysis.

arXiv link: http://arxiv.org/abs/2211.00671v4

Econometrics arXiv paper, submitted: 2022-11-01

Population and Technological Growth: Evidence from Roe v. Wade

Authors: John T. H. Wong, Matthias Hei Man, Alex Li Cheuk Hung

We exploit the heterogeneous impact of the Roe v. Wade ruling by the US
Supreme Court, which ruled most abortion restrictions unconstitutional. Our
identifying assumption is that states which had not liberalized their abortion
laws prior to Roe would experience a negative birth shock of greater proportion
than states which had undergone pre-Roe reforms. We estimate the
difference-in-difference in births and use estimated births as an exogenous
treatment variable to predict patents per capita. Our results show that a one
standard deviation increase in cohort starting population increases per capita
patents by 0.24 standard deviations. These results suggest that, at the margin,
increasing fertility can increase patent production. Insofar as patent
production is a sufficient proxy for technological growth, increasing births
has a positive impact on technological growth. This paper and its results do
not pertain to the issue of abortion itself.

arXiv link: http://arxiv.org/abs/2211.00410v1

Econometrics arXiv updated paper (originally submitted: 2022-11-01)

Reservoir Computing for Macroeconomic Forecasting with Mixed Frequency Data

Authors: Giovanni Ballarin, Petros Dellaportas, Lyudmila Grigoryeva, Marcel Hirt, Sophie van Huellen, Juan-Pablo Ortega

Macroeconomic forecasting has recently started embracing techniques that can
deal with large-scale datasets and series with unequal release periods.
MIxed-DAta Sampling (MIDAS) and Dynamic Factor Models (DFM) are the two main
state-of-the-art approaches that allow modeling series with non-homogeneous
frequencies. We introduce a new framework called the Multi-Frequency Echo State
Network (MFESN) based on a relatively novel machine learning paradigm called
reservoir computing. Echo State Networks (ESN) are recurrent neural networks
formulated as nonlinear state-space systems with random state coefficients
where only the observation map is subject to estimation. MFESNs are
considerably more efficient than DFMs and allow for incorporating many series,
as opposed to MIDAS models, which are prone to the curse of dimensionality. All
methods are compared in extensive multistep forecasting exercises targeting US
GDP growth. We find that our MFESN models achieve superior or comparable
performance over MIDAS and DFMs at a much lower computational cost.

arXiv link: http://arxiv.org/abs/2211.00363v3
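
A minimal single-frequency echo state network under stated assumptions (standardized inputs `Z` of shape T x k, target `y`, a random fixed reservoir, and a ridge-regression readout); the multi-frequency alignment that defines the MFESN, and the DFM/MIDAS benchmarks, are not shown.

```python
import numpy as np

def esn_fit(Z, y, n_res=200, spectral_radius=0.9, ridge=1e-2, seed=0):
    """Echo state network: x_t = tanh(W x_{t-1} + W_in z_t), with only the
    linear readout of y_t on x_t estimated (by ridge regression)."""
    rng = np.random.default_rng(seed)
    T, k = Z.shape
    W = rng.normal(size=(n_res, n_res))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))  # rescale reservoir
    W_in = rng.normal(size=(n_res, k))
    states = np.zeros((T, n_res))
    x = np.zeros(n_res)
    for t in range(T):
        x = np.tanh(W @ x + W_in @ Z[t])
        states[t] = x
    beta = np.linalg.solve(states.T @ states + ridge * np.eye(n_res),
                           states.T @ y)                         # ridge readout
    return states @ beta, beta
```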

Econometrics arXiv updated paper (originally submitted: 2022-11-01)

Weak Identification in Low-Dimensional Factor Models with One or Two Factors

Authors: Gregory Cox

This paper describes how to reparameterize low-dimensional factor models with
one or two factors to fit weak identification theory developed for generalized
method of moments models. Some identification-robust tests, here called
"plug-in" tests, require a reparameterization to distinguish weakly identified
parameters from strongly identified parameters. The reparameterizations in this
paper make plug-in tests available for subvector hypotheses in low-dimensional
factor models with one or two factors. Simulations show that the plug-in tests
are less conservative than identification-robust tests that use the original
parameterization. An empirical application to a factor model of parental
investments in children is included.

arXiv link: http://arxiv.org/abs/2211.00329v2

Econometrics arXiv updated paper (originally submitted: 2022-10-31)

Shrinkage Methods for Treatment Choice

Authors: Takuya Ishihara, Daisuke Kurisu

This study examines the problem of determining whether to treat individuals
based on observed covariates. The most common decision rule is the conditional
empirical success (CES) rule proposed by Manski (2004), which assigns
individuals to treatments that yield the best experimental outcomes conditional
on the observed covariates. Conversely, using shrinkage estimators, which
shrink unbiased but noisy preliminary estimates toward the average of these
estimates, is a common approach in statistical estimation problems because it
is well-known that shrinkage estimators may have smaller mean squared errors
than unshrunk estimators. Inspired by this idea, we propose a computationally
tractable shrinkage rule that selects the shrinkage factor by minimizing an
upper bound of the maximum regret. Then, we compare the maximum regret of the
proposed shrinkage rule with those of the CES and pooling rules when the space
of conditional average treatment effects (CATEs) is correctly specified or
misspecified. Our theoretical results demonstrate that the shrinkage rule
performs well in many cases and these findings are further supported by
numerical experiments. Specifically, we show that the maximum regret of the
shrinkage rule can be strictly smaller than those of the CES and pooling rules
in certain cases when the space of CATEs is correctly specified. In addition,
we find that the shrinkage rule is robust against misspecification of the space
of CATEs. Finally, we apply our method to experimental data from the National
Job Training Partnership Act Study.

arXiv link: http://arxiv.org/abs/2210.17063v4
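
A stylized sketch of the shrinkage idea, assuming `est` holds the covariate-cell treatment-effect estimates: the rule interpolates between the CES rule (no shrinkage) and the pooling rule (full shrinkage). The paper's choice of the shrinkage factor, which minimizes an upper bound on maximum regret, is not reproduced.

```python
import numpy as np

def shrinkage_rule(est, lam):
    """Shrink cell-level estimates toward their average and treat a cell if
    the shrunk estimate is positive. lam = 0 gives the CES rule, lam = 1 the
    pooling rule; intermediate lam gives a shrinkage rule."""
    est = np.asarray(est, dtype=float)
    shrunk = (1.0 - lam) * est + lam * est.mean()
    return shrunk > 0
```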

Econometrics arXiv updated paper (originally submitted: 2022-10-31)

Non-Robustness of the Cluster-Robust Inference: with a Proposal of a New Robust Method

Authors: Yuya Sasaki, Yulong Wang

The conventional cluster-robust (CR) standard errors may not be robust. They
are vulnerable to data that contain a small number of large clusters. When a
researcher uses the 51 states in the U.S. as clusters, the largest cluster
(California) accounts for about 10% of the total sample. Such a case in fact
violates the assumptions under which the widely used CR methods are guaranteed
to work. We formally show that the conventional CR methods fail if the
distribution of cluster sizes follows a power law with exponent less than two.
Besides the example of 51 state clusters, some examples are drawn from a list
of recent original research articles published in a top journal. In light of
these negative results about the existing CR methods, we propose a weighted CR
(WCR) method as a simple fix. Simulation studies support our arguments that the
WCR method is robust while the conventional CR methods are not.

arXiv link: http://arxiv.org/abs/2210.16991v3
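
For reference, the conventional Liang-Zeger cluster-robust sandwich that the paper shows can fail under heavy-tailed cluster sizes can be sketched as below, assuming OLS residuals `u`, a design matrix `X`, and cluster labels `g`; the proposed weighted CR correction is not reproduced.

```python
import numpy as np

def cluster_robust_vcov(X, u, g):
    """Conventional CR variance estimator:
    (X'X)^{-1} [ sum_c X_c' u_c u_c' X_c ] (X'X)^{-1}."""
    g = np.asarray(g)
    XtX_inv = np.linalg.inv(X.T @ X)
    k = X.shape[1]
    meat = np.zeros((k, k))
    for c in np.unique(g):
        score = X[g == c].T @ u[g == c]   # cluster score vector
        meat += np.outer(score, score)
    return XtX_inv @ meat @ XtX_inv
```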

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-10-29

Flexible machine learning estimation of conditional average treatment effects: a blessing and a curse

Authors: Richard Post, Isabel van den Heuvel, Marko Petkovic, Edwin van den Heuvel

Causal inference from observational data requires untestable identification
assumptions. If these assumptions apply, machine learning (ML) methods can be
used to study complex forms of causal effect heterogeneity. Recently, several
ML methods were developed to estimate the conditional average treatment effect
(CATE). If the features at hand cannot explain all heterogeneity, the
individual treatment effects (ITEs) can seriously deviate from the CATE. In
this work, we demonstrate how the distributions of the ITE and the CATE can
differ when a causal random forest (CRF) is applied. We extend the CRF to
estimate the difference in conditional variance between treated and controls.
If the ITE distribution equals the CATE distribution, this estimated difference
in variance should be small. If they differ, an additional causal assumption is
necessary to quantify the heterogeneity not captured by the CATE distribution.
The conditional variance of the ITE can be identified when the individual
effect is independent of the outcome under no treatment given the measured
features. Then, in the cases where the ITE and CATE distributions differ, the
extended CRF can appropriately estimate the variance of the ITE distribution
while the CRF fails to do so.

arXiv link: http://arxiv.org/abs/2210.16547v2

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2022-10-29

Spectral Representation Learning for Conditional Moment Models

Authors: Ziyu Wang, Yucen Luo, Yueru Li, Jun Zhu, Bernhard Schölkopf

Many problems in causal inference and economics can be formulated in the
framework of conditional moment models, which characterize the target function
through a collection of conditional moment restrictions. For nonparametric
conditional moment models, efficient estimation often relies on preimposed
conditions on various measures of ill-posedness of the hypothesis space, which
are hard to validate when flexible models are used. In this work, we address
this issue by proposing a procedure that automatically learns representations
with controlled measures of ill-posedness. Our method approximates a linear
representation defined by the spectral decomposition of a conditional
expectation operator, which can be used for kernelized estimators and is known
to facilitate minimax optimal estimation in certain settings. We show this
representation can be efficiently estimated from data, and establish L2
consistency for the resulting estimator. We evaluate the proposed method on
proximal causal inference tasks, exhibiting promising performance on
high-dimensional, semi-synthetic data.

arXiv link: http://arxiv.org/abs/2210.16525v2

Econometrics arXiv paper, submitted: 2022-10-28

Eigenvalue tests for the number of latent factors in short panels

Authors: Alain-Philippe Fortin, Patrick Gagliardini, Olivier Scaillet

This paper studies new tests for the number of latent factors in a large
cross-sectional factor model with small time dimension. These tests are based
on the eigenvalues of variance-covariance matrices of (possibly weighted) asset
returns, and rely on either the assumption of spherical errors, or instrumental
variables for factor betas. We establish the asymptotic distributional results
using expansion theorems based on perturbation theory for symmetric matrices.
Our framework accommodates semi-strong factors in the systematic components. We
propose a novel statistical test for weak factors against strong or semi-strong
factors. We provide an empirical application to US equity data. Evidence for a
different number of latent factors in market downturns versus market upturns is
statistically ambiguous in the considered subperiods. In particular, our
results contradict the common wisdom of a single factor model
in bear markets.

arXiv link: http://arxiv.org/abs/2210.16042v1

Econometrics arXiv updated paper (originally submitted: 2022-10-28)

How to sample and when to stop sampling: The generalized Wald problem and minimax policies

Authors: Karun Adusumilli

We study sequential experiments where sampling is costly and a decision-maker
aims to determine the best treatment for full scale implementation by (1)
adaptively allocating units between two possible treatments, and (2) stopping
the experiment when the expected welfare (inclusive of sampling costs) from
implementing the chosen treatment is maximized. Working under a continuous time
limit, we characterize the optimal policies under the minimax regret criterion.
We show that the same policies also remain optimal under both parametric and
non-parametric outcome distributions in an asymptotic regime where sampling
costs approach zero. The minimax optimal sampling rule is just the Neyman
allocation: it is independent of sampling costs and does not adapt to observed
outcomes. The decision-maker halts sampling when the product of the average
treatment difference and the number of observations surpasses a specific
threshold. The results derived also apply to the so-called best-arm
identification problem, where the number of observations is exogenously
specified.

arXiv link: http://arxiv.org/abs/2210.15841v7
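
A stylized rendering of the two ingredients described in the abstract, with hypothetical names: the Neyman allocation of sampling shares and a stopping check based on the product of the running average treatment difference and the number of observations. The actual threshold depends on the sampling cost and is derived in the paper.

```python
import numpy as np

def neyman_share(sd_treated, sd_control):
    """Share of observations allocated to the treated arm under Neyman
    allocation: proportional to the arm's outcome standard deviation."""
    return sd_treated / (sd_treated + sd_control)

def keep_sampling(outcomes_t, outcomes_c, threshold):
    """Continue while |mean difference| x (number of observations) is still
    below the (paper-specific) threshold."""
    n = len(outcomes_t) + len(outcomes_c)
    diff = np.mean(outcomes_t) - np.mean(outcomes_c)
    return abs(diff) * n < threshold
```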

Econometrics arXiv updated paper (originally submitted: 2022-10-28)

Estimation of Heterogeneous Treatment Effects Using a Conditional Moment Based Approach

Authors: Xiaolin Sun

We propose a new estimator for heterogeneous treatment effects in a partially
linear model (PLM) with multiple exogenous covariates and a potentially
endogenous treatment variable. Our approach integrates a Robinson
transformation to handle the nonparametric component, the Smooth Minimum
Distance (SMD) method to leverage conditional mean independence restrictions,
and a Neyman-Orthogonalized first-order condition (FOC). By employing
regularized model selection techniques like the Lasso method, our estimator
accommodates numerous covariates while exhibiting reduced bias, consistency,
and asymptotic normality. Simulations demonstrate its robust performance with
diverse instrument sets compared to traditional GMM-type estimators. Applying
this method to estimate Medicaid's heterogeneous treatment effects from the
Oregon Health Insurance Experiment reveals more robust and reliable results
than conventional GMM approaches.

arXiv link: http://arxiv.org/abs/2210.15829v4
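
Only the familiar Robinson partialling-out component of the estimator is sketched here, using a generic nonparametric learner; the SMD step, the Neyman-orthogonal first-order condition, and any cross-fitting are omitted, so this is a simplification rather than the paper's estimator.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

def robinson_partial_out(y, d, X):
    """Residualize the outcome y and treatment d on covariates X, then
    regress residual on residual to recover a (homogeneous) slope."""
    y_hat = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y).predict(X)
    d_hat = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, d).predict(X)
    fit = LinearRegression(fit_intercept=False).fit(
        (d - d_hat).reshape(-1, 1), y - y_hat)
    return float(fit.coef_[0])
```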

Econometrics arXiv updated paper (originally submitted: 2022-10-25)

Unit Averaging for Heterogeneous Panels

Authors: Christian Brownlees, Vladislav Morozov

In this work we introduce a unit averaging procedure to efficiently recover
unit-specific parameters in a heterogeneous panel model. The procedure consists
in estimating the parameter of a given unit using a weighted average of all the
unit-specific parameter estimators in the panel. The weights of the average are
determined by minimizing an MSE criterion we derive. We analyze the properties
of the resulting minimum MSE unit averaging estimator in a local heterogeneity
framework inspired by the literature on frequentist model averaging, and we
derive the local asymptotic distribution of the estimator and the corresponding
weights. The benefits of the procedure are showcased with an application to
forecasting unemployment rates for a panel of German regions.

arXiv link: http://arxiv.org/abs/2210.14205v3

Econometrics arXiv updated paper (originally submitted: 2022-10-25)

GLS under Monotone Heteroskedasticity

Authors: Yoichi Arai, Taisuke Otsu, Mengshan Xu

Generalized least squares (GLS) is one of the most basic tools in
regression analysis. A major issue in implementing GLS is estimation of the
conditional variance function of the error term, which typically requires a
restrictive functional form assumption for parametric estimation or smoothing
parameters for nonparametric estimation. In this paper, we propose an
alternative approach to estimate the conditional variance function under
nonparametric monotonicity constraints by utilizing the isotonic regression
method. Our GLS estimator is shown to be asymptotically equivalent to the
infeasible GLS estimator with knowledge of the conditional error variance, and
involves only some tuning to trim boundary observations, not only for point
estimation but also for interval estimation or hypothesis testing. Our analysis
extends the scope of the isotonic regression method by showing that the
isotonic estimates, possibly with generated variables, can be employed as first
stage estimates to be plugged in for semiparametric objects. Simulation studies
illustrate excellent finite sample performances of the proposed method. As an
empirical example, we revisit Acemoglu and Restrepo's (2017) study on the
relationship between an aging population and economic growth to illustrate how
our GLS estimator effectively reduces estimation errors.

arXiv link: http://arxiv.org/abs/2210.13843v2
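
A minimal two-step sketch, assuming the error variance is monotone increasing in a scalar variable `x_mono`: squared OLS residuals are smoothed by isotonic regression and the fitted variances are used as WLS weights. The boundary trimming and the asymptotic equivalence results are in the paper and are not reproduced.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.isotonic import IsotonicRegression

def isotonic_gls(y, X, x_mono):
    """Feasible GLS with an isotonic estimate of the conditional error variance."""
    ols = sm.OLS(y, X).fit()
    var_hat = IsotonicRegression(increasing=True, out_of_bounds="clip").fit(
        x_mono, ols.resid ** 2).predict(x_mono)
    weights = 1.0 / np.clip(var_hat, 1e-8, None)   # guard against zero variances
    return sm.WLS(y, X, weights=weights).fit()
```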

Econometrics arXiv updated paper (originally submitted: 2022-10-24)

Prediction intervals for economic fixed-event forecasts

Authors: Fabian Krüger, Hendrik Plett

The fixed-event forecasting setup is common in economic policy. It involves a
sequence of forecasts of the same (`fixed') predictand, so that the difficulty
of the forecasting problem decreases over time. Fixed-event point forecasts are
typically published without a quantitative measure of uncertainty. To construct
such a measure, we consider forecast postprocessing techniques tailored to the
fixed-event case. We develop regression methods that impose constraints
motivated by the problem at hand, and use these methods to construct prediction
intervals for gross domestic product (GDP) growth in Germany and the US.

arXiv link: http://arxiv.org/abs/2210.13562v3

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2022-10-24

Spatio-temporal Event Studies for Air Quality Assessment under Cross-sectional Dependence

Authors: Paolo Maranzano, Matteo Maria Pelagatti

Event Studies (ES) are statistical tools that assess whether a particular
event of interest has caused changes in the level of one or more relevant time
series. We are interested in ES applied to multivariate time series
characterized by high spatial (cross-sectional) and temporal dependence. We
pursue two goals. First, we propose to extend the existing taxonomy on ES,
mainly deriving from the financial field, by generalizing the underlying
statistical concepts and then adapting them to the time series analysis of
airborne pollutant concentrations. Second, we address the spatial
cross-sectional dependence by adopting a twofold adjustment. Initially, we use
a linear mixed spatio-temporal regression model (HDGM) to estimate the
relationship between the response variable and a set of exogenous factors,
while accounting for the spatio-temporal dynamics of the observations. Later,
we apply a set of sixteen ES test statistics, both parametric and
nonparametric, some of which directly adjusted for cross-sectional dependence.
We apply ES to evaluate the impact on NO2 concentrations generated by the
lockdown restrictions adopted in the Lombardy region (Italy) during the
COVID-19 pandemic in 2020. The HDGM model distinctly reveals the level shift
caused by the event of interest, while reducing the volatility and isolating
the spatial dependence of the data. Moreover, all the test statistics
unanimously suggest that the lockdown restrictions generated significant
reductions in the average NO2 concentrations.

arXiv link: http://arxiv.org/abs/2210.17529v1

Econometrics arXiv paper, submitted: 2022-10-22

Choosing The Best Incentives for Belief Elicitation with an Application to Political Protests

Authors: Nathan Canen, Anujit Chakraborty

Many experiments elicit subjects' prior and posterior beliefs about a random
variable to assess how information affects one's own actions. However, beliefs
are multi-dimensional objects, and experimenters often only elicit a single
response from subjects. In this paper, we discuss how the incentives offered by
experimenters map subjects' true belief distributions to the responses that
profit-maximizing subjects give in the elicitation task. In particular, we show how slightly
different incentives may induce subjects to report the mean, mode, or median of
their belief distribution. If beliefs are not symmetric and unimodal, then
using an elicitation scheme that is mismatched with the research question may
affect both the magnitude and the sign of identified effects, or may even make
identification impossible. As an example, we revisit Cantoni et al.'s (2019)
study of whether political protests are strategic complements or substitutes.
We show that they elicit modal beliefs, while modal and mean beliefs may be
updated in opposite directions following their experiment. Hence, the sign of
their effects may change, allowing an alternative interpretation of their
results.

arXiv link: http://arxiv.org/abs/2210.12549v1

Econometrics arXiv paper, submitted: 2022-10-20

Allowing for weak identification when testing GARCH-X type models

Authors: Philipp Ketz

In this paper, we use the results in Andrews and Cheng (2012), extended to
allow for parameters to be near or at the boundary of the parameter space, to
derive the asymptotic distributions of the two test statistics that are used in
the two-step (testing) procedure proposed by Pedersen and Rahbek (2019). The
latter aims at testing the null hypothesis that a GARCH-X type model, with
exogenous covariates (X), reduces to a standard GARCH type model, while
allowing the "GARCH parameter" to be unidentified. We then provide a
characterization result for the asymptotic size of any test for testing this
null hypothesis before numerically establishing a lower bound on the asymptotic
size of the two-step procedure at the 5% nominal level. This lower bound
exceeds the nominal level, revealing that the two-step procedure does not
control asymptotic size. In a simulation study, we show that this finding is
relevant in finite samples, where the two-step procedure can suffer from
overrejection. We also propose a new test that, by
construction, controls asymptotic size and is found to be more powerful than
the two-step procedure when the "ARCH parameter" is "very small" (in which case
the two-step procedure underrejects).

arXiv link: http://arxiv.org/abs/2210.11398v1

Econometrics arXiv updated paper (originally submitted: 2022-10-20)

Network Synthetic Interventions: A Causal Framework for Panel Data Under Network Interference

Authors: Anish Agarwal, Sarah H. Cen, Devavrat Shah, Christina Lee Yu

We propose a generalization of the synthetic controls and synthetic
interventions methodology to incorporate network interference. We consider the
estimation of unit-specific potential outcomes from panel data in the presence
of spillover across units and unobserved confounding. Key to our approach is a
novel latent factor model that takes into account network interference and
generalizes the factor models typically used in panel data settings. We propose
an estimator, Network Synthetic Interventions (NSI), and show that it
consistently estimates the mean outcomes for a unit under an arbitrary set of
counterfactual treatments for the network. We further establish that the
estimator is asymptotically normal. We furnish two validity tests for whether
the NSI estimator reliably generalizes to produce accurate counterfactual
estimates. We provide a novel graph-based experiment design that guarantees the
NSI estimator produces accurate counterfactual estimates, and also analyze the
sample complexity of the proposed design. We conclude with simulations that
corroborate our theoretical findings.

arXiv link: http://arxiv.org/abs/2210.11355v2

Econometrics arXiv paper, submitted: 2022-10-20

Low-rank Panel Quantile Regression: Estimation and Inference

Authors: Yiren Wang, Liangjun Su, Yichong Zhang

In this paper, we propose a class of low-rank panel quantile regression
models which allow for unobserved slope heterogeneity over both individuals and
time. We estimate the heterogeneous intercept and slope matrices via nuclear
norm regularization followed by sample splitting, row- and column-wise quantile
regressions and debiasing. We show that the estimators of the factors and
factor loadings associated with the intercept and slope matrices are
asymptotically normally distributed. In addition, we develop two specification
tests: one for the null hypothesis that the slope coefficient is a constant
over time and/or individuals in the case that the true rank of the slope matrix
equals one, and the other for the null hypothesis that the slope coefficient
exhibits an additive structure in the case that the true rank of the slope
matrix equals two. We illustrate the finite sample performance of estimation
and inference via Monte Carlo simulations and real datasets.

arXiv link: http://arxiv.org/abs/2210.11062v1

Econometrics arXiv updated paper (originally submitted: 2022-10-20)

Efficient variational approximations for state space models

Authors: Rubén Loaiza-Maya, Didier Nibbering

Variational Bayes methods are a potential scalable estimation approach for
state space models. However, existing methods are inaccurate or computationally
infeasible for many state space models. This paper proposes a variational
approximation that is accurate and fast for any model with a closed-form
measurement density function and a state transition distribution within the
exponential family of distributions. We show that our method can accurately and
quickly estimate a multivariate Skellam stochastic volatility model with
high-frequency tick-by-tick discrete price changes of four stocks, and a
time-varying parameter vector autoregression with a stochastic volatility model
using eight macroeconomic variables.

arXiv link: http://arxiv.org/abs/2210.11010v3

Econometrics arXiv updated paper (originally submitted: 2022-10-20)

Synthetic Blips: Generalizing Synthetic Controls for Dynamic Treatment Effects

Authors: Anish Agarwal, Sukjin Han, Dwaipayan Saha, Vasilis Syrgkanis, Haeyeon Yoon

We propose a generalization of the synthetic control and interventions
methods to the setting with dynamic treatment effects. We consider the
estimation of unit-specific treatment effects from panel data collected under a
general treatment sequence. Here, each unit receives multiple treatments
sequentially, according to an adaptive policy that depends on a latent,
endogenously time-varying confounding state. Under a low-rank latent factor
model assumption, we develop an identification strategy for any unit-specific
mean outcome under any sequence of interventions. The latent factor model we
propose admits linear time-varying and time-invariant dynamical systems as
special cases. Our approach can be viewed as an identification strategy for
structural nested mean models -- a widely used framework for dynamic treatment
effects -- under a low-rank latent factor assumption on the blip effects.
Unlike these models, however, it is more permissive in observational settings,
thereby broadening its applicability. Our method, which we term synthetic blip
effects, is a backwards induction process in which the blip effect of a
treatment at each period and for a target unit is recursively expressed as a
linear combination of the blip effects of a group of other units that received
the designated treatment. This strategy avoids the combinatorial explosion in
the number of units that would otherwise be required by a naive application of
prior synthetic control and intervention methods in dynamic treatment settings.
We provide estimation algorithms that are easy to implement in practice and
yield estimators with desirable properties. Using unique Korean firm-level
panel data, we demonstrate how the proposed framework can be used to estimate
individualized dynamic treatment effects and to derive optimal treatment
allocation rules in the context of financial support for exporting firms.

arXiv link: http://arxiv.org/abs/2210.11003v2

Econometrics arXiv paper, submitted: 2022-10-18

Linear Regression with Centrality Measures

Authors: Yong Cai

This paper studies the properties of linear regression on centrality measures
when network data is sparse -- that is, when there are many more agents than
links per agent -- and when they are measured with error. We make three
contributions in this setting: (1) We show that OLS estimators can become
inconsistent under sparsity and characterize the threshold at which this
occurs, with and without measurement error. This threshold depends on the
centrality measure used. Specifically, regression on eigenvector centrality is
less robust to sparsity than regression on degree or diffusion centrality. (2)
We develop distributional theory
for OLS estimators under measurement error and sparsity, finding that OLS
estimators are subject to asymptotic bias even when they are consistent.
Moreover, bias can be large relative to their variances, so that bias
correction is necessary for inference. (3) We propose novel bias correction and
inference methods for OLS with sparse noisy networks. Simulation evidence
suggests that our theory and methods perform well, particularly in settings
where the usual OLS estimators and heteroskedasticity-consistent/robust t-tests
are deficient. Finally, we demonstrate the utility of our results in an
application inspired by De Weerdt and Deacon (2006), in which we consider
consumption smoothing and social insurance in Nyakatoke, Tanzania.

arXiv link: http://arxiv.org/abs/2210.10024v1
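
For orientation, the naive centrality regression that the paper shows requires care under sparsity and measurement error can be sketched with networkx and statsmodels, assuming integer node labels aligned with the outcome vector `y`; the proposed bias correction and inference methods are not implemented here.

```python
import numpy as np
import networkx as nx
import statsmodels.api as sm

def centrality_ols(G, y, measure="degree"):
    """OLS of an outcome on a node-level centrality measure of graph G."""
    if measure == "degree":
        cent = nx.degree_centrality(G)
    elif measure == "eigenvector":
        cent = nx.eigenvector_centrality_numpy(G)
    else:
        raise ValueError("unknown centrality measure")
    x = np.array([cent[node] for node in sorted(G.nodes())])
    return sm.OLS(y, sm.add_constant(x)).fit(cov_type="HC1")
```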

Econometrics arXiv updated paper (originally submitted: 2022-10-18)

Modelling Large Dimensional Datasets with Markov Switching Factor Models

Authors: Matteo Barigozzi, Daniele Massacci

We study a novel large dimensional approximate factor model with regime
changes in the loadings driven by a latent first order Markov process. By
exploiting the equivalent linear representation of the model, we first recover
the latent factors by means of Principal Component Analysis. We then cast the
model in state-space form, and we estimate loadings and transition
probabilities through an EM algorithm based on a modified version of the
Baum-Lindgren-Hamilton-Kim filter and smoother that makes use of the factors
previously estimated. Our approach is appealing as it provides closed form
expressions for all estimators. More importantly, it does not require knowledge
of the true number of factors. We derive the theoretical properties of the
proposed estimation procedure, and we show their good finite sample performance
through a comprehensive set of Monte Carlo experiments. The empirical
usefulness of our approach is illustrated through three applications to large
U.S. datasets of stock returns, macroeconomic variables, and inflation indexes.

arXiv link: http://arxiv.org/abs/2210.09828v5

Econometrics arXiv updated paper (originally submitted: 2022-10-17)

Party On: The Labor Market Returns to Social Networks in Adolescence

Authors: Adriana Lleras-Muney, Matthew Miller, Shuyang Sheng, Veronica Sovero

We investigate the returns to adolescent friendships on earnings in adulthood
using data from the National Longitudinal Study of Adolescent to Adult Health.
Because both education and friendships are jointly determined in adolescence,
OLS estimates of their returns are likely biased. We implement a novel
procedure to obtain bounds on the causal returns to friendships: we assume that
the returns to schooling range from 5 to 15% (based on prior literature), and
instrument for friendships using similarity in age among peers. Having one more
friend in adolescence increases earnings between 7 and 14%, substantially more
than OLS estimates would suggest.

arXiv link: http://arxiv.org/abs/2210.09426v5

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2022-10-17

Concentration inequalities of MLE and robust MLE

Authors: Xiaowei Yang, Xinqiao Liu, Haoyu Wei

The Maximum Likelihood Estimator (MLE) plays an important role in statistics
and machine learning. In this article, for i.i.d. variables, we obtain sharp
concentration inequalities and oracle inequalities with explicit constants for
the MLE under exponential moment conditions only. Furthermore, in a robust
setting, the sub-Gaussian type oracle inequalities of the log-truncated maximum
likelihood estimator are derived under the second-moment condition.

arXiv link: http://arxiv.org/abs/2210.09398v2

Econometrics arXiv paper, submitted: 2022-10-17

Modified Wilcoxon-Mann-Whitney tests of stochastic dominance

Authors: Brendan K. Beare, Jackson D. Clarke

Given independent samples from two univariate distributions, one-sided
Wilcoxon-Mann-Whitney statistics may be used to conduct rank-based tests of
stochastic dominance. We broaden the scope of applicability of such tests by
showing that the bootstrap may be used to conduct valid inference in a matched
pairs sampling framework permitting dependence between the two samples.
Further, we show that a modified bootstrap incorporating an implicit estimate
of a contact set may be used to improve power. Numerical simulations indicate
that our test using the modified bootstrap effectively controls the null
rejection rates and can deliver more or less power than that of the Donald-Hsu
test. In the course of establishing our results we obtain a weak approximation
to the empirical ordinal dominance curve permitting its population density to
diverge to infinity at zero or one at arbitrary rates.

arXiv link: http://arxiv.org/abs/2210.08892v1
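
A minimal matched-pairs bootstrap of the one-sided Wilcoxon-Mann-Whitney statistic, resampling the pairs (x_i, y_i) jointly to preserve dependence between the samples; the recentering and the contact-set modification that drive the test's validity and power improvements are described in the paper and not reproduced here.

```python
import numpy as np

def wmw_statistic(x, y):
    """One-sided Wilcoxon-Mann-Whitney statistic: fraction of pairs with x <= y."""
    x, y = np.asarray(x), np.asarray(y)
    return float(np.mean(x[:, None] <= y[None, :]))

def paired_bootstrap(x, y, B=999, seed=0):
    """Resample matched pairs jointly to approximate the statistic's sampling
    distribution while preserving dependence between the two samples."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    stat = wmw_statistic(x, y)
    draws = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)
        draws[b] = wmw_statistic(x[idx], y[idx])
    return stat, draws
```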

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-10-17

A General Design-Based Framework and Estimator for Randomized Experiments

Authors: Christopher Harshaw, Fredrik Sävje, Yitan Wang

We describe a design-based framework for drawing causal inference in general
randomized experiments. Causal effects are defined as linear functionals
evaluated at unit-level potential outcome functions. Assumptions about the
potential outcome functions are encoded as function spaces. This makes the
framework expressive, allowing experimenters to formulate and investigate a
wide range of causal questions, including about interference, that previously
could not be investigated with design-based methods. We describe a class of
estimators for estimands defined using the framework and investigate their
properties. We provide necessary and sufficient conditions for unbiasedness and
consistency. We also describe a class of conservative variance estimators,
which facilitate the construction of confidence intervals. Finally, we provide
several examples of empirical settings that previously could not be examined
with design-based methods to illustrate the use of our approach in practice.

arXiv link: http://arxiv.org/abs/2210.08698v3

Econometrics arXiv updated paper (originally submitted: 2022-10-16)

Inference on Extreme Quantiles of Unobserved Individual Heterogeneity

Authors: Vladislav Morozov

We develop a methodology for conducting inference on extreme quantiles of
unobserved individual heterogeneity (e.g., heterogeneous coefficients,
treatment effects) in panel data and meta-analysis settings. Inference is
challenging in such settings: only noisy estimates of heterogeneity are
available, and central limit approximations perform poorly in the tails. We
derive a necessary and sufficient condition under which noisy estimates are
informative about extreme quantiles, along with sufficient rate and moment
conditions. Under these conditions, we establish an extreme value theorem and
an intermediate order theorem for noisy estimates. These results yield simple
optimization-free confidence intervals for extreme quantiles. Simulations show
that our confidence intervals have favorable coverage and that the rate
conditions matter for the validity of inference. We illustrate the method with
an application to firm productivity differences between denser and less dense
areas.

arXiv link: http://arxiv.org/abs/2210.08524v4

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2022-10-15

Fair Effect Attribution in Parallel Online Experiments

Authors: Alexander Buchholz, Vito Bellini, Giuseppe Di Benedetto, Yannik Stein, Matteo Ruffini, Fabian Moerchen

A/B tests serve the purpose of reliably identifying the effect of changes
introduced in online services. It is common for online platforms to run a large
number of simultaneous experiments by splitting incoming user traffic randomly
in treatment and control groups. Despite a perfect randomization between
different groups, simultaneous experiments can interact with each other and
create a negative impact on average population outcomes such as engagement
metrics. These are measured globally and monitored to protect overall user
experience. Therefore, it is crucial to measure these interaction effects and
attribute their overall impact in a fair way to the respective experimenters.
We suggest an approach to measure and disentangle the effect of simultaneous
experiments by providing a cost sharing approach based on Shapley values. We
also provide a counterfactual perspective that predicts shared impact based on
conditional average treatment effects, making use of causal inference
techniques. We illustrate our approach in real world and synthetic data
experiments.

arXiv link: http://arxiv.org/abs/2210.08338v1
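
An exact Shapley-value computation for a small number of simultaneous experiments, assuming a user-supplied set function `value` that returns the measured joint impact of any subset of experiments (names hypothetical); with many experiments one would switch to sampled approximations.

```python
from itertools import combinations
from math import factorial

def shapley_attribution(experiments, value):
    """Attribute value(all experiments) across experiments via Shapley values
    of the set function `value`, which maps frozensets to measured impact."""
    n = len(experiments)
    phi = {}
    for e in experiments:
        others = [o for o in experiments if o != e]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                S = frozenset(subset)
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += weight * (value(S | {e}) - value(S))
        phi[e] = total
    return phi
```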

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-10-15

Distance and Kernel-Based Measures for Global and Local Two-Sample Conditional Distribution Testing

Authors: Jian Yan, Zhuoxi Li, Xianyang Zhang

Testing the equality of two conditional distributions is crucial in various
modern applications, including transfer learning and causal inference. Despite
its importance, this fundamental problem has received surprisingly little
attention in the literature, with existing works focusing exclusively on global
two-sample conditional distribution testing. Based on distance and kernel
methods, this paper presents the first unified framework for both global and
local two-sample conditional distribution testing. To this end, we introduce
distance and kernel-based measures that characterize the homogeneity of two
conditional distributions. Drawing from the concept of conditional
U-statistics, we propose consistent estimators for these measures.
Theoretically, we derive the convergence rates and the asymptotic distributions
of the estimators under both the null and alternative hypotheses. Utilizing
these measures, along with a local bootstrap approach, we develop global and
local tests that can detect discrepancies between two conditional distributions
at global and local levels, respectively. Our tests demonstrate reliable
performance through simulations and real data analysis.

arXiv link: http://arxiv.org/abs/2210.08149v3

Econometrics arXiv paper, submitted: 2022-10-15

A New Method for Generating Random Correlation Matrices

Authors: Ilya Archakov, Peter Reinhard Hansen, Yiyao Luo

We propose a new method for generating random correlation matrices that makes
it simple to control both location and dispersion. The method is based on a
vector parameterization, gamma = g(C), which maps any distribution on R^d, d =
n(n-1)/2, to a distribution on the space of non-singular n x n correlation
matrices. Correlation matrices with certain properties, such as being
well-conditioned, having block structures, and having strictly positive
elements, are simple to generate. We compare the new method with existing
methods.

arXiv link: http://arxiv.org/abs/2210.08147v1
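
One possible reading of the parameterization, assuming (as in Archakov and Hansen, 2021) that gamma collects the off-diagonal elements of the matrix logarithm of C; the inverse map then only has to solve for the diagonal so that the matrix exponential has a unit diagonal. This is a hedged sketch, not necessarily the paper's exact construction.

```python
import numpy as np
from scipy.linalg import expm

def corr_from_gamma(gamma, n, iters=100):
    """Map a vector gamma in R^{n(n-1)/2} to a correlation matrix: fix the
    off-diagonal of a symmetric matrix A and iterate on its diagonal until
    expm(A) has ones on the diagonal."""
    A = np.zeros((n, n))
    iu = np.triu_indices(n, k=1)
    A[iu] = gamma
    A = A + A.T
    x = np.zeros(n)
    for _ in range(iters):
        np.fill_diagonal(A, x)
        x = x - np.log(np.diag(expm(A)))
    np.fill_diagonal(A, x)
    return expm(A)

# Random draws: e.g. gamma = np.random.default_rng(1).normal(0.0, 0.3, n*(n-1)//2)
```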

Econometrics arXiv paper, submitted: 2022-10-14

Conditional Likelihood Ratio Test with Many Weak Instruments

Authors: Sreevidya Ayyar, Yukitoshi Matsushita, Taisuke Otsu

This paper extends validity of the conditional likelihood ratio (CLR) test
developed by Moreira (2003) to instrumental variable regression models with
unknown error variance and many weak instruments. In this setting, we argue
that the conventional CLR test with estimated error variance loses exact
similarity and is asymptotically invalid. We propose a modified critical value
function for the likelihood ratio (LR) statistic with estimated error variance,
and prove that this modified test achieves asymptotic validity under many weak
instrument asymptotics. Our critical value function is constructed by
representing the LR using four statistics, instead of two as in Moreira (2003).
A simulation study illustrates the desirable properties of our test.

arXiv link: http://arxiv.org/abs/2210.07680v1

Econometrics arXiv paper, submitted: 2022-10-13

Fast Estimation of Bayesian State Space Models Using Amortized Simulation-Based Inference

Authors: Ramis Khabibullin, Sergei Seleznev

This paper presents a fast algorithm for estimating hidden states of Bayesian
state space models. The algorithm is a variation of amortized simulation-based
inference algorithms, where a large number of artificial datasets are generated
at the first stage, and then a flexible model is trained to predict the
variables of interest. In contrast to those proposed earlier, the procedure
described in this paper makes it possible to train estimators for hidden states
by concentrating only on certain characteristics of the marginal posterior
distributions and introducing inductive bias. Illustrations using the examples
of the stochastic volatility model, nonlinear dynamic stochastic general
equilibrium model, and seasonal adjustment procedure with breaks in seasonality
show that the algorithm has sufficient accuracy for practical use. Moreover,
after pretraining, which takes several hours, finding the posterior
distribution for any dataset takes from hundredths to tenths of a second.

arXiv link: http://arxiv.org/abs/2210.07154v1

Econometrics arXiv updated paper (originally submitted: 2022-10-13)

Robust Estimation and Inference in Panels with Interactive Fixed Effects

Authors: Timothy B. Armstrong, Martin Weidner, Andrei Zeleneev

We consider estimation and inference for a regression coefficient in panels
with interactive fixed effects (i.e., with a factor structure). We demonstrate
that existing estimators and confidence intervals (CIs) can be heavily biased
and size-distorted when some of the factors are weak. We propose estimators
with improved rates of convergence and bias-aware CIs that remain valid
uniformly, regardless of factor strength. Our approach applies the theory of
minimax linear estimation to form a debiased estimate, using a nuclear norm
bound on the error of an initial estimate of the interactive fixed effects. Our
resulting bias-aware CIs take into account the remaining bias caused by weak
factors. Monte Carlo experiments show substantial improvements over
conventional methods when factors are weak, with minimal costs to estimation
accuracy when factors are strong.

arXiv link: http://arxiv.org/abs/2210.06639v4

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2022-10-12

Sample Constrained Treatment Effect Estimation

Authors: Raghavendra Addanki, David Arbour, Tung Mai, Cameron Musco, Anup Rao

Treatment effect estimation is a fundamental problem in causal inference. We
focus on designing efficient randomized controlled trials, to accurately
estimate the effect of some treatment on a population of $n$ individuals. In
particular, we study sample-constrained treatment effect estimation, where we
must select a subset of $s \ll n$ individuals from the population to experiment
on. This subset must be further partitioned into treatment and control groups.
Algorithms for partitioning the entire population into treatment and control
groups, or for choosing a single representative subset, have been well-studied.
The key challenge in our setting is jointly choosing a representative subset
and a partition for that set.
We focus on both individual and average treatment effect estimation, under a
linear effects model. We give provably efficient experimental designs and
corresponding estimators, by identifying connections to discrepancy
minimization and leverage-score-based sampling used in randomized numerical
linear algebra. Our theoretical results obtain a smooth transition to known
guarantees when $s$ equals the population size. We also empirically demonstrate
the performance of our algorithms.

arXiv link: http://arxiv.org/abs/2210.06594v1

Econometrics arXiv paper, submitted: 2022-10-12

Estimating Option Pricing Models Using a Characteristic Function-Based Linear State Space Representation

Authors: H. Peter Boswijk, Roger J. A. Laeven, Evgenii Vladimirov

We develop a novel filtering and estimation procedure for parametric option
pricing models driven by general affine jump-diffusions. Our procedure is based
on the comparison between an option-implied, model-free representation of the
conditional log-characteristic function and the model-implied conditional
log-characteristic function, which is functionally affine in the model's state
vector. We formally derive an associated linear state space representation and
establish the asymptotic properties of the corresponding measurement errors.
The state space representation allows us to use a suitably modified Kalman
filtering technique to learn about the latent state vector and a quasi-maximum
likelihood estimator of the model parameters, which brings important
computational advantages. We analyze the finite-sample behavior of our
procedure in Monte Carlo simulations. The applicability of our procedure is
illustrated in two case studies that analyze S&P 500 option prices and the
impact of exogenous state variables capturing Covid-19 reproduction and
economic policy uncertainty.

arXiv link: http://arxiv.org/abs/2210.06217v1

Econometrics arXiv updated paper (originally submitted: 2022-10-11)

Bayesian analysis of mixtures of lognormal distribution with an unknown number of components from grouped data

Authors: Kazuhiko Kakamu

This study proposes a reversible jump Markov chain Monte Carlo method for
estimating parameters of lognormal distribution mixtures for income. Using
simulated data examples, we examined the proposed algorithm's performance and
the accuracy of the posterior distributions of the Gini coefficients. Results
suggest that the parameters were estimated accurately and that the posterior
distributions are close to the true distributions even when a different data
generating process is considered. Moreover, promising results for the Gini
coefficients encouraged us to apply our method to real data from Japan. The
empirical examples indicate two subgroups in Japan (2020) and support the
reliability of the estimated Gini coefficients.

arXiv link: http://arxiv.org/abs/2210.05115v3

Econometrics arXiv updated paper (originally submitted: 2022-10-10)

Uncertainty Quantification in Synthetic Controls with Staggered Treatment Adoption

Authors: Matias D. Cattaneo, Yingjie Feng, Filippo Palomba, Rocio Titiunik

We propose principled prediction intervals to quantify the uncertainty of a
large class of synthetic control predictions (or estimators) in settings with
staggered treatment adoption, offering precise non-asymptotic coverage
probability guarantees. From a methodological perspective, we provide a
detailed discussion of different causal quantities to be predicted, which we
call causal predictands, allowing for multiple treated units with treatment
adoption at possibly different points in time. From a theoretical perspective,
our uncertainty quantification methods improve on prior literature by (i)
covering a large class of causal predictands in staggered adoption settings,
(ii) allowing for synthetic control methods with possibly nonlinear
constraints, (iii) proposing scalable robust conic optimization methods and
principled data-driven tuning parameter selection, and (iv) offering valid
uniform inference across post-treatment periods. We illustrate our methodology
with an empirical application studying the effects of economic liberalization
on real GDP per capita for Sub-Saharan African countries. Companion software
packages are provided in Python, R, and Stata.

arXiv link: http://arxiv.org/abs/2210.05026v5

Econometrics arXiv updated paper (originally submitted: 2022-10-10)

Policy Learning with New Treatments

Authors: Samuel Higbee

I study the problem of a decision maker choosing a policy which allocates
treatment to a heterogeneous population on the basis of experimental data that
includes only a subset of possible treatment values. The effects of new
treatments are partially identified by shape restrictions on treatment
response. Policies are compared according to the minimax regret criterion, and
I show that the empirical analog of the population decision problem has a
tractable linear- and integer-programming formulation. I prove the maximum
regret of the estimated policy converges to the lowest possible maximum regret
at a rate which is the maximum of $N^{-1/2}$ and the rate at which conditional
average treatment effects are estimated in the experimental data. In an
application to designing targeted subsidies for electrical grid connections in
rural Kenya, I find that nearly the entire population should be given a
treatment not implemented in the experiment, reducing maximum regret by over
60% compared to the policy that restricts to the treatments implemented in the
experiment.

arXiv link: http://arxiv.org/abs/2210.04703v4

Econometrics arXiv updated paper (originally submitted: 2022-10-10)

An identification and testing strategy for proxy-SVARs with weak proxies

Authors: Giovanni Angelini, Giuseppe Cavaliere, Luca Fanelli

When proxies (external instruments) used to identify target structural shocks
are weak, inference in proxy-SVARs (SVAR-IVs) is nonstandard and the
construction of asymptotically valid confidence sets for the impulse responses
of interest requires weak-instrument robust methods. In the presence of
multiple target shocks, test inversion techniques require extra restrictions on
the proxy-SVAR parameters other than those implied by the proxies, which may be
difficult to interpret and test. We show that frequentist asymptotic inference
in these situations can be conducted through Minimum Distance estimation and
standard asymptotic methods if the proxy-SVAR can be identified by using
`strong' instruments for the non-target shocks; i.e. the shocks which are not
of primary interest in the analysis. The suggested identification strategy
hinges on a novel pre-test for the null of instrument relevance based on
bootstrap resampling which is not subject to pre-testing issues, in the sense
that the validity of post-test asymptotic inferences is not affected by the
outcomes of the test. The test is robust to conditional heteroskedasticity
and/or zero-censored proxies, is computationally straightforward and applicable
regardless of the number of shocks being instrumented. Some illustrative
examples show the empirical usefulness of the suggested identification and
testing strategy.

arXiv link: http://arxiv.org/abs/2210.04523v4

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2022-10-08

A Structural Equation Modeling Approach to Understand User's Perceptions of Acceptance of Ride-Sharing Services in Dhaka City

Authors: Md. Mohaimenul Islam Sourav, Mohammed Russedul Islam, H M Imran Kays, Md. Hadiuzzaman

This research aims at building a multivariate statistical model for assessing
users' perceptions of acceptance of ride-sharing services in Dhaka City. A
structured questionnaire is developed based on the users' reported attitudes
and perceived risks. A total of 350 normally distributed responses are
collected from ride-sharing service users and stakeholders of Dhaka City.
Respondents are interviewed to express their experience and opinions on
ride-sharing services through the stated preference questionnaire. Structural
Equation Modeling (SEM) is used to validate the research hypotheses.
Statistical parameters and several trials are used to choose the best SEM. The
responses are also analyzed using the Relative Importance Index (RII) method,
validating the chosen SEM. Inside SEM, the quality of ride-sharing services is
measured by two latent and eighteen observed variables. The latent variable
'safety & security' is more influential than 'service performance' on the
overall quality of service index. Under 'safety & security' the other two
variables, i.e., 'account information' and 'personal information' are found to
be the most significant that impact the decision to share rides with others. In
addition, 'risk of conflict' and 'possibility of accident' are identified using
the perception model as the lowest contributing variables. Factor analysis
reveals the suitability and reliability of the proposed SEM. Identifying the
influential parameters in this study will help service providers understand and
improve the quality of ride-sharing services for users.

arXiv link: http://arxiv.org/abs/2210.04086v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-10-08

Empirical Bayes Selection for Value Maximization

Authors: Dominic Coey, Kenneth Hung

We study the problem of selecting the best $m$ units from a set of $n$ as $m
/ n \to \alpha \in (0, 1)$, where noisy, heteroskedastic measurements of the
units' true values are available and the decision-maker wishes to maximize the
aggregate true value of the units selected. Given a parametric prior
distribution, the empirical Bayes decision rule incurs $O_p(n^{-1})$ regret
relative to the Bayesian oracle that knows the true prior. More generally, if
the error in the estimated prior is of order $O_p(r_n)$, regret is
$O_p(r_n^2)$. In this sense selection of the best units is fundamentally
easier than estimation of their values. We show this regret bound is
sharp in the parametric case, by giving an example in which it is attained.
Using priors calibrated from a dataset of over four thousand internet
experiments, we confirm that empirical Bayes methods perform well in detecting
the best treatments with only a modest number of experiments.
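
The following Python sketch illustrates the selection problem on synthetic data with a normal prior: the prior is estimated by method of moments, posterior means are computed with heteroskedasticity-dependent shrinkage, and the top fraction of units is selected. The prior, noise levels, and moment-based calibration are illustrative assumptions, not the paper's procedure.

# Toy empirical Bayes selection of the best m = alpha * n units.
import numpy as np

rng = np.random.default_rng(1)
n, alpha = 5000, 0.1                 # select the best m = alpha * n units
theta = rng.normal(0.0, 1.0, n)      # true unit values
s = rng.uniform(0.5, 2.0, n)         # known heteroskedastic noise s.d.
x = theta + s * rng.normal(size=n)   # noisy measurements

# Estimate the normal prior N(mu, tau^2) by a simple method of moments.
mu_hat = x.mean()
tau2_hat = max(x.var() - (s ** 2).mean(), 1e-8)

# Posterior means: shrink toward mu_hat, with more shrinkage for noisier units.
shrink = tau2_hat / (tau2_hat + s ** 2)
post_mean = mu_hat + shrink * (x - mu_hat)

m = int(alpha * n)
eb_pick = np.argsort(post_mean)[-m:]   # empirical Bayes selection
naive_pick = np.argsort(x)[-m:]        # selection on raw measurements

print("aggregate true value, EB selection   :", round(theta[eb_pick].sum(), 1))
print("aggregate true value, naive selection:", round(theta[naive_pick].sum(), 1))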

arXiv link: http://arxiv.org/abs/2210.03905v3

Econometrics arXiv paper, submitted: 2022-10-07

Order Statistics Approaches to Unobserved Heterogeneity in Auctions

Authors: Yao Luo, Peijun Sang, Ruli Xiao

We establish nonparametric identification of auction models with continuous
and nonseparable unobserved heterogeneity using three consecutive order
statistics of bids. We then propose sieve maximum likelihood estimators for the
joint distribution of unobserved heterogeneity and the private value, as well
as their conditional and marginal distributions. Lastly, we apply our
methodology to a novel dataset from judicial auctions in China. Our estimates
suggest substantial gains from accounting for unobserved heterogeneity when
setting reserve prices. We propose a simple scheme that achieves nearly optimal
revenue by using the appraisal value as the reserve price.

arXiv link: http://arxiv.org/abs/2210.03547v1

Econometrics arXiv updated paper (originally submitted: 2022-10-06)

On estimating Armington elasticities for Japan's meat imports

Authors: Satoshi Nakano, Kazuhiko Nishimura

By fully accounting for the distinct tariff regimes levied on imported meat,
we estimate substitution elasticities of Japan's two-stage import aggregation
functions for beef, chicken and pork. While the regression analysis crucially
depends on the price that consumers face, the post-tariff price of imported
meat depends not only on ad valorem duties but also on tariff rate quotas and
gate price system regimes. The effective tariff rate is consequently evaluated
by utilizing monthly transaction data. To address potential endogeneity
problems, we apply exchange rates that we believe to be independent of the
demand shocks for imported meat. The panel nature of the data allows us to
retrieve the first-stage aggregates via time dummy variables, free of demand
shocks, to be used as part of the explanatory variable and as an instrument in
the second-stage regression.

arXiv link: http://arxiv.org/abs/2210.05358v2

Econometrics arXiv updated paper (originally submitted: 2022-10-06)

Testing the Number of Components in Finite Mixture Normal Regression Model with Panel Data

Authors: Yu Hao, Hiroyuki Kasahara

This paper develops a likelihood ratio-based test of the null hypothesis of
an $M_0$-component model against the alternative of an $(M_0 + 1)$-component model in the
normal mixture panel regression by extending the Expectation-Maximization (EM)
test of Chen and Li (2009a) and Kasahara and Shimotsu (2015) to the case of
panel data. We show that, unlike the cross-sectional normal mixture, the
first-order derivative of the density function for the variance parameter in
the panel normal mixture is linearly independent of its second-order
derivatives for the mean parameter. On the other hand, like the cross-sectional
normal mixture, the likelihood ratio test statistic of the panel normal mixture
is unbounded. We consider the Penalized Maximum Likelihood Estimator to deal
with the unboundedness, where we obtain the data-driven penalty function via
computational experiments. We derive the asymptotic distribution of the
Penalized Likelihood Ratio Test (PLRT) and EM test statistics by expanding the
log-likelihood function up to five times for the reparameterized parameters.
The simulation experiment indicates good finite sample performance of the
proposed EM test. We apply our EM test to estimate the number of production
technology types for the finite mixture Cobb-Douglas production function model
studied by Kasahara et al. (2022), using panel data on Japanese and
Chilean manufacturing firms. We find evidence of heterogeneity in the
elasticities of output for intermediate goods, suggesting that the production
function is heterogeneous across firms beyond their Hicks-neutral productivity
terms.

arXiv link: http://arxiv.org/abs/2210.02824v2

Econometrics arXiv updated paper (originally submitted: 2022-10-05)

The Local to Unity Dynamic Tobit Model

Authors: Anna Bykhovskaya, James A. Duffy

This paper considers highly persistent time series that are subject to
nonlinearities in the form of censoring or an occasionally binding constraint,
such as are regularly encountered in macroeconomics. A tractable candidate
model for such series is the dynamic Tobit with a root local to unity. We show
that this model generates a process that converges weakly to a non-standard
limiting process that is constrained (regulated) to be positive. Surprisingly,
despite the presence of censoring, the OLS estimators of the model parameters
are consistent. We show that this allows OLS-based inferences to be drawn on
the overall persistence of the process (as measured by the sum of the
autoregressive coefficients), and for the null of a unit root to be tested in
the presence of censoring. Our simulations illustrate that the conventional ADF
test substantially over-rejects when the data is generated by a dynamic Tobit
with a unit root, whereas our proposed test is correctly sized. We provide an
application of our methods to testing for a unit root in the Swiss franc / euro
exchange rate, during a period when this was subject to an occasionally binding
lower bound.
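
A small simulation along the lines of the size-distortion claim can be sketched as follows: generate a censored (regulated) random walk and record how often a conventional ADF test rejects a unit root at the 5% level. The sample size, censoring point at zero, and number of replications are illustrative choices, and the paper's corrected test is not reproduced here.

# Naive ADF testing on a dynamic Tobit with a unit root (illustrative only).
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(2)

def censored_random_walk(T, rng):
    """Dynamic Tobit with a unit root: y_t = max(0, y_{t-1} + e_t)."""
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = max(0.0, y[t - 1] + rng.normal())
    return y

n_reps, rejections = 200, 0
for _ in range(n_reps):
    y = censored_random_walk(300, rng)
    pvalue = adfuller(y, regression="c", autolag="AIC")[1]
    rejections += pvalue < 0.05   # nominal 5% test applied naively

print("ADF rejection rate under a censored unit root:", rejections / n_reps)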

arXiv link: http://arxiv.org/abs/2210.02599v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-10-05

Regression discontinuity design with right-censored survival data

Authors: Emil Aas Stoltenberg

In this paper the regression discontinuity design is adapted to the survival
analysis setting with right-censored data, studied in an intensity based
counting process framework. In particular, a local polynomial regression
version of the Aalen additive hazards estimator is introduced as an estimator
of the difference between two covariate dependent cumulative hazard rate
functions. Large-sample theory for this estimator is developed, including
confidence intervals that take into account the uncertainty associated with
bias correction. As is standard in the causality literature, the models and the
theory are embedded in the potential outcomes framework. Two general results
concerning potential outcomes and the multiplicative hazards model for survival
data are presented.

arXiv link: http://arxiv.org/abs/2210.02548v1

Econometrics arXiv updated paper (originally submitted: 2022-10-05)

Bikeability and the induced demand for cycling

Authors: Mogens Fosgerau, Miroslawa Lukawska, Mads Paulsen, Thomas Kjær Rasmussen

To what extent is the volume of urban bicycle traffic affected by the
provision of bicycle infrastructure? In this study, we exploit a large dataset
of observed bicycle trajectories in combination with a fine-grained
representation of the Copenhagen bicycle-relevant network. We apply a novel
model for bicyclists' choice of route from origin to destination that takes the
complete network into account. This enables us to determine bicyclists'
preferences for a range of infrastructure and land-use types. We use the
estimated preferences to compute a subjective cost of bicycle travel, which we
correlate with the number of bicycle trips across a large number of
origin-destination pairs. Simulations suggest that the extensive Copenhagen
bicycle lane network has caused the number of bicycle trips and the bicycle
kilometers traveled to increase by 60% and 90%, respectively, compared with a
counterfactual without the bicycle lane network. This translates into an annual
benefit of EUR 0.4M per km of bicycle lane owing to changes in subjective
travel cost, health, and accidents. Our results thus strongly support the
provision of bicycle infrastructure.

arXiv link: http://arxiv.org/abs/2210.02504v2

Econometrics arXiv updated paper (originally submitted: 2022-10-04)

Probability of Causation with Sample Selection: A Reanalysis of the Impacts of Jóvenes en Acción on Formality

Authors: Vitor Possebom, Flavio Riva

This paper identifies the probability of causation when there is sample
selection. We show that the probability of causation is partially identified
for individuals who are always observed regardless of treatment status and
derive sharp bounds under three increasingly restrictive sets of assumptions.
The first set imposes an exogenous treatment and a monotone sample selection
mechanism. To tighten these bounds, the second set also imposes the monotone
treatment response assumption, while the third set additionally imposes a
stochastic dominance assumption. Finally, we use experimental data from the
Colombian job training program J\'ovenes en Acci\'on to empirically illustrate
our approach's usefulness. We find that, among always-employed women, at least
10.2% and at most 13.4% transitioned to the formal labor market because of the
program. However, our 90%-confidence region does not reject the null hypothesis
that the lower bound is equal to zero.

arXiv link: http://arxiv.org/abs/2210.01938v6

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2022-10-04

Revealing Unobservables by Deep Learning: Generative Element Extraction Networks (GEEN)

Authors: Yingyao Hu, Yang Liu, Jiaxiong Yao

Latent variable models are crucial in scientific research, where a key
variable, such as effort, ability, and belief, is unobserved in the sample but
needs to be identified. This paper proposes a novel method for estimating
realizations of a latent variable $X^*$ in a random sample that contains its
multiple measurements. With the key assumption that the measurements are
independent conditional on $X^*$, we provide sufficient conditions under which
realizations of $X^*$ in the sample are locally unique in a class of
deviations, which allows us to identify realizations of $X^*$. To the best of
our knowledge, this paper is the first to provide such identification at the
observation level. We then use the Kullback-Leibler distance between the two
probability densities with and without the conditional independence as the loss
function to train a Generative Element Extraction Networks (GEEN) that maps
from the observed measurements to realizations of $X^*$ in the sample. The
simulation results imply that this proposed estimator works quite well and the
estimated values are highly correlated with realizations of $X^*$. Our
estimator can be applied to a large class of latent variable models and we
expect it will change how people deal with latent variables.

arXiv link: http://arxiv.org/abs/2210.01300v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2022-10-04

Structural Estimation of Markov Decision Processes in High-Dimensional State Space with Finite-Time Guarantees

Authors: Siliang Zeng, Mingyi Hong, Alfredo Garcia

We consider the task of estimating a structural model of dynamic decisions by
a human agent based upon the observable history of implemented actions and
visited states. This problem has an inherent nested structure: in the inner
problem, an optimal policy for a given reward function is identified while in
the outer problem, a measure of fit is maximized. Several approaches have been
proposed to alleviate the computational burden of this nested-loop structure,
but these methods still suffer from high complexity when the state space is
either discrete with large cardinality or continuous in high dimensions. Other
approaches in the inverse reinforcement learning (IRL) literature emphasize
policy estimation at the expense of reduced reward estimation accuracy. In this
paper we propose a single-loop estimation algorithm with finite time guarantees
that is equipped to deal with high-dimensional state spaces without
compromising reward estimation accuracy. In the proposed algorithm, each policy
improvement step is followed by a stochastic gradient step for likelihood
maximization. We show that the proposed algorithm converges to a stationary
solution with a finite-time guarantee. Further, if the reward is parameterized
linearly, we show that the algorithm approximates the maximum likelihood
estimator sublinearly. Finally, by using robotics control problems in MuJoCo
and their transfer settings, we show that the proposed algorithm achieves
superior performance compared with other IRL and imitation learning benchmarks.

arXiv link: http://arxiv.org/abs/2210.01282v3

Econometrics arXiv cross-link from physics.soc-ph (physics.soc-ph), submitted: 2022-10-03

Reconciling econometrics with continuous maximum-entropy network models

Authors: Marzio Di Vece, Diego Garlaschelli, Tiziano Squartini

In the study of economic networks, econometric approaches interpret the
traditional Gravity Model specification as the expected link weight coming from
a probability distribution whose functional form can be chosen arbitrarily,
while statistical-physics approaches construct maximum-entropy distributions of
weighted graphs, constrained to satisfy a given set of measurable network
properties. In a recent, companion paper, we integrated the two approaches and
applied them to the World Trade Web, i.e. the network of international trade
among world countries. While the companion paper dealt only with
discrete-valued link weights, the present paper extends the theoretical
framework to continuous-valued link weights. In particular, we construct two
broad classes of maximum-entropy models, namely the integrated and the
conditional ones, defined by different criteria to derive and combine the
probabilistic rules for placing links and loading them with weights. In the
integrated models, both rules follow from a single, constrained optimization of
the continuous Kullback-Leibler divergence; in the conditional models, the two
rules are disentangled and the functional form of the weight distribution
follows from a conditional optimization procedure. After deriving the general
functional form of the two classes, we turn each of them into a proper family
of econometric models via a suitable identification of the econometric function
relating the corresponding expected link weights to macroeconomic factors.
After testing the two classes of models on World Trade Web data, we discuss
their strengths and weaknesses.

arXiv link: http://arxiv.org/abs/2210.01179v3

Econometrics arXiv updated paper (originally submitted: 2022-10-02)

Conditional Distribution Model Specification Testing Using Chi-Square Goodness-of-Fit Tests

Authors: Miguel A. Delgado, Julius Vainora

This paper introduces chi-square goodness-of-fit tests to check for
conditional distribution model specification. The data is cross-classified
according to the Rosenblatt transform of the dependent variable and the
explanatory variables, resulting in a contingency table with expected joint
frequencies equal to the product of the row and column marginals, which are
independent of the model parameters. The test statistics assess whether the
difference between observed and expected frequencies is due to chance. We
propose three types of test statistics: the classical trinity of tests based on
the likelihood of grouped data, and two statistics based on the efficient raw
data estimator -- namely, a Chernoff-Lehmann and a generalized Wald statistic.
The asymptotic distribution of these statistics is invariant to
sample-dependent partitions. Monte Carlo experiments demonstrate the good
performance of the proposed tests.
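
A minimal Python sketch of the cross-classification idea, assuming a correctly specified normal linear regression: compute the Rosenblatt (probability integral) transform of the dependent variable given the regressor, bin it jointly with the regressor, and form a Pearson-type statistic against the product-of-marginals expectation. The partition and the plain Pearson statistic are illustrative; the paper's test statistics and their asymptotic corrections are not reproduced.

# Cross-classification of PIT values and a regressor into a contingency table.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 2000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)   # data generated under the null model

# Fit y | x ~ N(a + b x, sigma^2) and compute the Rosenblatt/PIT values.
b, a = np.polyfit(x, y, 1)               # slope, intercept
resid = y - (a + b * x)
sigma = resid.std(ddof=2)
u = stats.norm.cdf(resid / sigma)        # conditional PIT of y given x

# Cross-classify into a K x L contingency table (rows: u bins, cols: x bins).
K, L = 5, 5
row = np.digitize(u, np.quantile(u, np.linspace(0, 1, K + 1)[1:-1]))
col = np.digitize(x, np.quantile(x, np.linspace(0, 1, L + 1)[1:-1]))
table = np.zeros((K, L))
np.add.at(table, (row, col), 1)

# Pearson-type statistic against the product-of-marginals expected frequencies.
expected = table.sum(1, keepdims=True) * table.sum(0, keepdims=True) / n
chi2 = ((table - expected) ** 2 / expected).sum()
print("Pearson chi-square statistic:", round(chi2, 2), "over", K * L, "cells")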

arXiv link: http://arxiv.org/abs/2210.00624v4

Econometrics arXiv cross-link from cs.SC (cs.SC), submitted: 2022-10-02

AI-Assisted Discovery of Quantitative and Formal Models in Social Science

Authors: Julia Balla, Sihao Huang, Owen Dugan, Rumen Dangovski, Marin Soljacic

In social science, formal and quantitative models, such as ones describing
economic growth and collective action, are used to formulate mechanistic
explanations, provide predictions, and uncover questions about observed
phenomena. Here, we demonstrate the use of a machine learning system to aid the
discovery of symbolic models that capture nonlinear and dynamical relationships
in social science datasets. By extending neuro-symbolic methods to find compact
functions and differential equations in noisy and longitudinal data, we show
that our system can be used to discover interpretable models from real-world
data in economics and sociology. Augmenting existing workflows with symbolic
regression can help uncover novel relationships and explore counterfactual
models during the scientific process. We propose that this AI-assisted
framework can bridge parametric and non-parametric models commonly employed in
social science research by systematically exploring the space of nonlinear
models and enabling fine-grained control over expressivity and
interpretability.

arXiv link: http://arxiv.org/abs/2210.00563v3

Econometrics arXiv paper, submitted: 2022-10-02

Large-Scale Allocation of Personalized Incentives

Authors: Lucas Javaudin, Andrea Araldo, André de Palma

We consider a regulator willing to drive individual choices towards
increasing social welfare by providing incentives to a large population of
individuals.
For that purpose, we formalize and solve the problem of finding an optimal
personalized-incentive policy: optimal in the sense that it maximizes social
welfare under an incentive budget constraint, personalized in the sense that
the incentives proposed depend on the alternatives available to each
individual, as well as her preferences.
We propose a polynomial time approximation algorithm that computes a policy
within a few seconds, and we analytically prove that it is boundedly close to the
optimum.
We then extend the problem to efficiently calculate the Maximum Social
Welfare Curve, which gives the maximum social welfare achievable for a range of
incentive budgets (not just one value).
This curve is a valuable practical tool for the regulator to determine the
right incentive budget to invest.
Finally, we simulate a large-scale application to mode choice in a French
department (about 200 thousand individuals) and illustrate the effectiveness
of the proposed personalized-incentive policy in reducing CO2 emissions.
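
To fix ideas, the toy Python sketch below allocates a fixed incentive budget greedily, funding the cheapest switches to the socially preferred alternative first. This is only an illustrative heuristic under invented utilities and a two-alternative choice set; it is not the authors' approximation algorithm or their welfare objective.

# Greedy budget allocation of personalized incentives (illustrative heuristic).
import numpy as np

rng = np.random.default_rng(4)
n, budget = 1000, 50.0
private_util = rng.normal(size=(n, 2))   # utility of alternative 0 vs alternative 1
social_gain = 1.0                        # welfare gain per individual switched to alternative 1

# Smallest incentive that makes alternative 1 weakly preferred for each person.
required = np.maximum(private_util[:, 0] - private_util[:, 1], 0.0)

# Fund the cheapest switches first (equal gains, so this maximizes switches).
spent, switched = 0.0, 0
for cost in np.sort(required[required > 0.0]):
    if spent + cost > budget:
        break
    spent += cost
    switched += 1

print(f"switched {switched} individuals, spent {spent:.1f} of budget {budget},"
      f" welfare gain {switched * social_gain:.1f}")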

arXiv link: http://arxiv.org/abs/2210.00463v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2022-10-01

Yurinskii's Coupling for Martingales

Authors: Matias D. Cattaneo, Ricardo P. Masini, William G. Underwood

Yurinskii's coupling is a popular theoretical tool for non-asymptotic
distributional analysis in mathematical statistics and applied probability,
offering a Gaussian strong approximation with an explicit error bound under
easily verifiable conditions. Originally stated in $\ell_2$-norm for sums of
independent random vectors, it has recently been extended both to the
$\ell_p$-norm, for $1 \leq p \leq \infty$, and to vector-valued martingales in
$\ell_2$-norm, under some strong conditions. We present as our main result a
Yurinskii coupling for approximate martingales in $\ell_p$-norm, under
substantially weaker conditions than those previously imposed. Our formulation
further allows for the coupling variable to follow a more general Gaussian
mixture distribution, and we provide a novel third-order coupling method which
gives tighter approximations in certain settings. We specialize our main result
to mixingales, martingales, and independent data, and derive uniform Gaussian
mixture strong approximations for martingale empirical processes. Applications
to nonparametric partitioning-based and local polynomial regression procedures
are provided, alongside central limit theorems for high-dimensional martingale
vectors.

arXiv link: http://arxiv.org/abs/2210.00362v4

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2022-09-30

A Posteriori Risk Classification and Ratemaking with Random Effects in the Mixture-of-Experts Model

Authors: Spark C. Tseung, Ian Weng Chan, Tsz Chai Fung, Andrei L. Badescu, X. Sheldon Lin

A well-designed framework for risk classification and ratemaking in
automobile insurance is key to insurers' profitability and risk management,
while also ensuring that policyholders are charged a fair premium according to
their risk profile. In this paper, we propose to adapt a flexible regression
model, called the Mixed LRMoE, to the problem of a posteriori risk
classification and ratemaking, where policyholder-level random effects are
incorporated to better infer their risk profile reflected by the claim history.
We also develop a stochastic variational Expectation-Conditional-Maximization
algorithm for estimating model parameters and inferring the posterior
distribution of random effects, which is numerically efficient and scalable to
large insurance portfolios. We then apply the Mixed LRMoE model to a real,
multiyear automobile insurance dataset, where the proposed framework is shown
to offer better fit to data and produce posterior premium which accurately
reflects policyholders' claim history.

arXiv link: http://arxiv.org/abs/2209.15212v1

Econometrics arXiv updated paper (originally submitted: 2022-09-29)

Statistical Inference for Fisher Market Equilibrium

Authors: Luofeng Liao, Yuan Gao, Christian Kroer

Statistical inference under market equilibrium effects has attracted
increasing attention recently. In this paper we focus on the specific case of
linear Fisher markets. They have been widely used in fair resource allocation of
food/blood donations and budget management in large-scale Internet ad auctions.
In resource allocation, it is crucial to quantify the variability of the
resource received by the agents (such as blood banks and food banks) in
addition to fairness and efficiency properties of the systems. For ad auction
markets, it is important to establish statistical properties of the platform's
revenues in addition to their expected values. To this end, we propose a
statistical framework based on the concept of infinite-dimensional Fisher
markets. In our framework, we observe a market formed by a finite number of
items sampled from an underlying distribution (the "observed market") and aim
to infer several important equilibrium quantities of the underlying long-run
market. These equilibrium quantities include individual utilities, social
welfare, and pacing multipliers. Through the lens of sample average
approximation (SAA), we derive a collection of statistical results and show
that the observed market provides useful statistical information of the
long-run market. In other words, the equilibrium quantities of the observed
market converge to the true ones of the long-run market with strong statistical
guarantees. These include consistency, finite sample bounds, asymptotics, and
confidence intervals. As an extension, we discuss revenue inference in quasilinear Fisher
markets.

arXiv link: http://arxiv.org/abs/2209.15422v3

Econometrics arXiv paper, submitted: 2022-09-29

With big data come big problems: pitfalls in measuring basis risk for crop index insurance

Authors: Matthieu Stigler, Apratim Dey, Andrew Hobbs, David Lobell

New satellite sensors will soon make it possible to estimate field-level crop
yields, showing a great potential for agricultural index insurance. This paper
identifies an important threat to better insurance from these new technologies:
data with many fields and few years can yield downward biased estimates of
basis risk, a fundamental metric in index insurance. To demonstrate this bias,
we use state-of-the-art satellite-based data on agricultural yields in the US
and in Kenya to estimate and simulate basis risk. We find a substantive
downward bias leading to a systematic overestimation of insurance quality.
In this paper, we argue that big data in crop insurance can lead to a new
situation where the number of variables $N$ largely exceeds the number of
observations $T$. In such a situation where $T\ll N$, conventional asymptotics
break, as evidenced by the large bias we find in simulations. We show how the
high-dimension, low-sample-size (HDLSS) asymptotics, together with the spiked
covariance model, provide a more relevant framework for the $T\ll N$ case
encountered in index insurance. More precisely, we derive the asymptotic
distribution of the relative share of the first eigenvalue of the covariance
matrix, a measure of systematic risk in index insurance. Our formula accurately
approximates the empirical bias simulated from the satellite data, and provides
a useful tool for practitioners to quantify bias in insurance quality.

arXiv link: http://arxiv.org/abs/2209.14611v1

Econometrics arXiv updated paper (originally submitted: 2022-09-29)

Fast Inference for Quantile Regression with Tens of Millions of Observations

Authors: Sokbae Lee, Yuan Liao, Myung Hwan Seo, Youngki Shin

Big data analytics has opened new avenues in economic research, but the
challenge of analyzing datasets with tens of millions of observations is
substantial. Conventional econometric methods based on extreme estimators
require large amounts of computing resources and memory, which are often not
readily available. In this paper, we focus on linear quantile regression
applied to "ultra-large" datasets, such as U.S. decennial censuses. A fast
inference framework is presented, utilizing stochastic subgradient descent
(S-subGD) updates. The inference procedure handles cross-sectional data
sequentially: (i) updating the parameter estimate with each incoming "new
observation", (ii) aggregating it as a $Polyak-Ruppert$ average, and
(iii) computing a pivotal statistic for inference using only a solution path.
The methodology draws from time-series regression to create an asymptotically
pivotal statistic through random scaling. Our proposed test statistic is
calculated in a fully online fashion and critical values are calculated without
resampling. We conduct extensive numerical studies to showcase the
computational merits of our proposed inference. For inference problems as large
as $(n, d) \sim (10^7, 10^3)$, where $n$ is the sample size and $d$ is the
number of regressors, our method generates new insights, surpassing current
inference methods in computation. Our method specifically reveals trends in the
gender gap in the U.S. college wage premium using millions of observations,
while controlling over $10^3$ covariates to mitigate confounding effects.
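
A minimal Python sketch of the estimation step, assuming a synthetic median-regression design: the coefficient vector is updated with a stochastic subgradient of the check loss for each incoming observation and aggregated as a Polyak-Ruppert average. The step sizes and the data generating process are illustrative, and the random-scaling inference procedure is not reproduced here.

# Online quantile regression via stochastic subgradient descent with averaging.
import numpy as np

rng = np.random.default_rng(5)
n, d, tau = 200_000, 5, 0.5            # observations, regressors, quantile level
beta_true = np.arange(1, d + 1, dtype=float)

beta = np.zeros(d)       # running S-subGD iterate
beta_bar = np.zeros(d)   # Polyak-Ruppert average of the iterates

for t in range(1, n + 1):
    x = rng.normal(size=d)
    y = x @ beta_true + rng.standard_t(df=3)      # one incoming observation
    # Subgradient of the check loss rho_tau(y - x'beta) with respect to beta.
    grad = -x * (tau - float(y - x @ beta < 0))
    beta -= 0.5 * t ** (-0.51) * grad             # diminishing step size
    beta_bar += (beta - beta_bar) / t             # online averaging

print("averaged estimate:", beta_bar.round(2))
print("true coefficients:", beta_true)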

arXiv link: http://arxiv.org/abs/2209.14502v5

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2022-09-28

Minimax Optimal Kernel Operator Learning via Multilevel Training

Authors: Jikai Jin, Yiping Lu, Jose Blanchet, Lexing Ying

Learning mappings between infinite-dimensional function spaces has achieved
empirical success in many disciplines of machine learning, including generative
modeling, functional data analysis, causal inference, and multi-agent
reinforcement learning. In this paper, we study the statistical limit of
learning a Hilbert-Schmidt operator between two infinite-dimensional Sobolev
reproducing kernel Hilbert spaces. We establish the information-theoretic lower
bound in terms of the Sobolev Hilbert-Schmidt norm and show that a
regularization that learns the spectral components below the bias contour and
ignores the ones that are above the variance contour can achieve the optimal
learning rate. At the same time, the spectral components between the bias and
variance contours give us flexibility in designing computationally feasible
machine learning algorithms. Based on this observation, we develop a multilevel
kernel operator learning algorithm that is optimal when learning linear
operators between infinite-dimensional function spaces.

arXiv link: http://arxiv.org/abs/2209.14430v3

Econometrics arXiv paper, submitted: 2022-09-28

The Network Propensity Score: Spillovers, Homophily, and Selection into Treatment

Authors: Alejandro Sanchez-Becerra

I establish primitive conditions for unconfoundedness in a coherent model
that features heterogeneous treatment effects, spillovers,
selection-on-observables, and network formation. I identify average partial
effects under minimal exchangeability conditions. If social interactions are
also anonymous, I derive a three-dimensional network propensity score,
characterize its support conditions, relate it to recent work on network
pseudo-metrics, and study extensions. I propose a two-step semiparametric
estimator for a random coefficients model which is consistent and
asymptotically normal as the number and size of the networks grow. I apply my
estimator to a political participation intervention in Uganda and a microfinance
application in India.

arXiv link: http://arxiv.org/abs/2209.14391v1

Econometrics arXiv paper, submitted: 2022-09-28

Economic effects of Chile FTAs and an eventual CTPP accession

Authors: Vargas Sepulveda, Mauricio "Pacha"

In this article, we show the benefits derived from the Chile-USA (in force
since January 2004) and Chile-China (in force since October 2006) FTAs for GDP,
consumers, and producers, and conclude that Chile's welfare improved after
signing these agreements. From that point, we extrapolate to show the direct
and indirect benefits of CTPP accession.

arXiv link: http://arxiv.org/abs/2209.14748v1

Econometrics arXiv updated paper (originally submitted: 2022-09-28)

Linear estimation of global average treatment effects

Authors: Stefan Faridani, Paul Niehaus

We study the problem of estimating the average causal effect of treating
every member of a population, as opposed to none, using an experiment that
treats only some. We consider settings where spillovers have global support and
decay slowly with (a generalized notion of) distance. We derive the minimax
rate over both estimators and designs, and show that it increases with the
spatial rate of spillover decay. Estimators based on OLS regressions like those
used to analyze recent large-scale experiments are consistent (though only
after de-weighting), achieve the minimax rate when the DGP is linear, and
converge faster than IPW-based alternatives when treatment clusters are small,
providing one justification for OLS's ubiquity. When the DGP is nonlinear they
remain consistent but converge slowly. We further address inference and
bandwidth selection. Applied to the cash transfer experiment studied by Egger
et al. (2022), these methods yield a 20% larger estimated effect on consumption.

arXiv link: http://arxiv.org/abs/2209.14181v6

Econometrics arXiv updated paper (originally submitted: 2022-09-25)

Sentiment Analysis on Inflation after Covid-19

Authors: Xinyu Li, Zihan Tang

We implement traditional machine learning and deep learning methods for
global tweets from 2017-2022 to build a high-frequency measure of the public's
sentiment index on inflation and analyze its correlation with other online data
sources such as Google Trends and a market-oriented inflation index. We use
manually labeled trigrams to test the prediction performance of several machine
learning models (logistic regression, random forest, etc.) and choose the BERT model
for the final demonstration. We then sum the daily tweets' sentiment scores obtained
from the BERT model to construct the predicted inflation sentiment index, and we
further analyze the regional and pre/post-Covid patterns of these inflation
indexes. Lastly, taking other empirical inflation-related data as references, we
show that the Twitter-based inflation sentiment analysis method has an
outstanding capability to predict inflation. The results suggest that Twitter
combined with deep learning methods can be a novel and timely method to utilize
existing abundant data sources on inflation expectations and provide daily
indicators of consumers' perception on inflation.

arXiv link: http://arxiv.org/abs/2209.14737v2

Econometrics arXiv updated paper (originally submitted: 2022-09-24)

Bayesian Modeling of TVP-VARs Using Regression Trees

Authors: Niko Hauzenberger, Florian Huber, Gary Koop, James Mitchell

In light of widespread evidence of parameter instability in macroeconomic
models, many time-varying parameter (TVP) models have been proposed. This paper
proposes a nonparametric TVP-VAR model using Bayesian additive regression trees
(BART) that models the TVPs as an unknown function of effect modifiers. The
novelty of this model arises from the fact that the law of motion driving the
parameters is treated nonparametrically. This leads to great flexibility in the
nature and extent of parameter change, both in the conditional mean and in the
conditional variance. Parsimony is achieved through adopting nonparametric
factor structures and use of shrinkage priors. In an application to US
macroeconomic data, we illustrate the use of our model in tracking both the
evolving nature of the Phillips curve and how the effects of business cycle
shocks on inflation measures vary nonlinearly with changes in the effect
modifiers.

arXiv link: http://arxiv.org/abs/2209.11970v3

Econometrics arXiv updated paper (originally submitted: 2022-09-23)

Revisiting the Analysis of Matched-Pair and Stratified Experiments in the Presence of Attrition

Authors: Yuehao Bai, Meng Hsuan Hsieh, Jizhou Liu, Max Tabord-Meehan

In this paper we revisit some common recommendations regarding the analysis
of matched-pair and stratified experimental designs in the presence of
attrition. Our main objective is to clarify a number of well-known claims about
the practice of dropping pairs with an attrited unit when analyzing
matched-pair designs. Contradictory advice appears in the literature about
whether or not dropping pairs is beneficial or harmful, and stratifying into
larger groups has been recommended as a resolution to the issue. To address
these claims, we derive the estimands obtained from the difference-in-means
estimator in a matched-pair design both when the observations from pairs with
an attrited unit are retained and when they are dropped. We find limited
evidence to support the claims that dropping pairs helps recover the average
treatment effect, but we find that it may potentially help in recovering a
convex weighted average of conditional average treatment effects. We report
similar findings for stratified designs when studying the estimands obtained
from a regression of outcomes on treatment with and without strata fixed
effects.

arXiv link: http://arxiv.org/abs/2209.11840v6

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2022-09-23

Doubly Fair Dynamic Pricing

Authors: Jianyu Xu, Dan Qiao, Yu-Xiang Wang

We study the problem of online dynamic pricing with two types of fairness
constraints: a "procedural fairness" which requires the proposed prices to be
equal in expectation among different groups, and a "substantive fairness" which
requires the accepted prices to be equal in expectation among different groups.
A policy that is simultaneously procedurally and substantively fair is referred to
as "doubly fair". We show that a doubly fair policy must be random to have
higher revenue than the best trivial policy that assigns the same price to
different groups. In a two-group setting, we propose an online learning
algorithm for the 2-group pricing problems that achieves $O(T)$
regret, zero procedural unfairness and $O(T)$ substantive
unfairness over $T$ rounds of learning. We also prove two lower bounds showing
that these results on regret and unfairness are both information-theoretically
optimal up to iterated logarithmic factors. To the best of our knowledge, this
is the first dynamic pricing algorithm that learns to price while satisfying
two fairness constraints at the same time.

arXiv link: http://arxiv.org/abs/2209.11837v1

Econometrics arXiv updated paper (originally submitted: 2022-09-23)

Linear Multidimensional Regression with Interactive Fixed-Effects

Authors: Hugo Freeman

This paper studies a linear model for multidimensional panel data of three or
more dimensions with unobserved interactive fixed-effects. The main estimator
uses double debias methods, and requires two preliminary steps. First, the
model is embedded within a two-dimensional panel framework where factor model
methods in Bai (2009) lead to consistent, but slowly converging, estimates. The
second step develops a weighted-within transformation that is robust to
multidimensional interactive fixed-effects and achieves the parametric rate of
consistency. This is combined with a double debias procedure for asymptotically
normal estimates. The methods are implemented to estimate the demand elasticity
for beer.

arXiv link: http://arxiv.org/abs/2209.11691v6

Econometrics arXiv updated paper (originally submitted: 2022-09-23)

Treatment Effects with Multidimensional Unobserved Heterogeneity: Identification of the Marginal Treatment Effect

Authors: Toshiki Tsuda

This paper establishes sufficient conditions for the identification of the
marginal treatment effects with multivalued treatments. Our model is based on a
multinomial choice model with utility maximization. Our MTE generalizes the MTE
defined in Heckman and Vytlacil (2005) in binary treatment models. As in the
binary case, we can interpret the MTE as the treatment effect for persons who
are indifferent between two treatments at a particular level. Our MTE enables
one to obtain the treatment effects of those with specific preference orders
over the choice set. Further, our results can identify other parameters such as
the marginal distribution of potential outcomes.

arXiv link: http://arxiv.org/abs/2209.11444v5

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2022-09-22

Forecasting Cryptocurrencies Log-Returns: a LASSO-VAR and Sentiment Approach

Authors: Federico D'Amario, Milos Ciganovic

Cryptocurrencies have become a trendy topic recently, primarily due to their
disruptive potential and reports of unprecedented returns. In addition,
academics increasingly acknowledge the predictive power of Social Media in many
fields and, more specifically, for financial markets and economics. In this
paper, we leverage the predictive power of Twitter and Reddit sentiment
together with Google Trends indexes and volume to forecast the log returns of
ten cryptocurrencies. Specifically, we consider Bitcoin, Ethereum,
Tether, Binance Coin, Litecoin, Enjin Coin, Horizen, Namecoin,
Peercoin, and Feathercoin. We evaluate the performance of LASSO-VAR using
daily data from January 2018 to January 2022. In a 30-day recursive forecast,
we can retrieve the correct direction of the actual series more than 50% of the
time. We compare this result with the main benchmarks, and we see a 10%
improvement in Mean Directional Accuracy (MDA). The use of sentiment and
attention variables as predictors significantly increases the forecast accuracy
in terms of MDA but not in terms of Root Mean Squared Errors. We perform a
Granger causality test using a post-double LASSO selection for high-dimensional
VARs. Results show no "causality" from Social Media sentiment to
cryptocurrencies returns
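
For illustration, the Python sketch below runs a LASSO-VAR(1) with a lagged sentiment regressor on synthetic return series and reports the mean directional accuracy of recursive one-step forecasts. The data, lag order, and penalty level are illustrative assumptions rather than the paper's specification.

# Recursive one-step LASSO-VAR forecasts and directional accuracy (toy data).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(6)
T, k = 400, 5                                    # days, number of coins
sentiment = rng.normal(size=T)                   # toy daily sentiment index
returns = 0.1 * np.roll(sentiment, 1)[:, None] + rng.normal(scale=0.5, size=(T, k))

# Predictors: one lag of all returns plus lagged sentiment.
X = np.column_stack([returns[:-1], sentiment[:-1]])
Y = returns[1:]

hits = []
for t in range(300, T - 1):                      # recursive one-step-ahead forecasts
    preds = []
    for j in range(k):                           # one penalized equation per series
        model = Lasso(alpha=0.01).fit(X[:t], Y[:t, j])
        preds.append(model.predict(X[t:t + 1])[0])
    hits.append(np.mean(np.sign(preds) == np.sign(Y[t])))

print("mean directional accuracy:", round(float(np.mean(hits)), 3))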

arXiv link: http://arxiv.org/abs/2210.00883v1

Econometrics arXiv paper, submitted: 2022-09-22

Multiscale Comparison of Nonparametric Trend Curves

Authors: Marina Khismatullina, Michael Vogt

We develop new econometric methods for the comparison of nonparametric time
trends. In many applications, practitioners are interested in whether the
observed time series all have the same time trend. Moreover, they would often
like to know which trends are different and in which time intervals they
differ. We design a multiscale test to formally approach these questions.
Specifically, we develop a test which allows us to make rigorous confidence
statements about which time trends are different and where (that is, in which
time intervals) they differ. Based on our multiscale test, we further develop a
clustering algorithm which allows us to cluster the observed time series into
groups with the same trend. We derive asymptotic theory for our test and
clustering methods. The theory is complemented by a simulation study and two
applications to GDP growth data and house pricing data.

arXiv link: http://arxiv.org/abs/2209.10841v1

Econometrics arXiv paper, submitted: 2022-09-21

Modelling the Frequency of Home Deliveries: An Induced Travel Demand Contribution of Aggrandized E-shopping in Toronto during COVID-19 Pandemics

Authors: Yicong Liu, Kaili Wang, Patrick Loa, Khandker Nurul Habib

The COVID-19 pandemic dramatically catalyzed the proliferation of e-shopping.
The dramatic growth of e-shopping will undoubtedly cause significant impacts on
travel demand. As a result, transportation modellers' ability to model
e-shopping demand is becoming increasingly important. This study developed
models to predict households' weekly home delivery frequencies. We used both
classical econometric and machine learning techniques to obtain the best model.
It is found that socioeconomic factors such as having an online grocery
membership, household members' average age, the percentage of male household
members, the number of workers in the household and various land use factors
influence home delivery demand. This study also compared the interpretations
and performances of the machine learning models and the classical econometric
model. Agreement is found in the variable's effects identified through the
machine learning and econometric models. However, with similar recall accuracy,
the ordered probit model, a classical econometric model, can accurately predict
the aggregate distribution of household delivery demand. In contrast, both
machine learning models failed to match the observed distribution.
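
As a hedged illustration of the classical econometric benchmark mentioned above, the Python sketch below fits an ordered probit model to synthetic household data using statsmodels. The covariate names, cut points, and data generating process are invented for the example.

# Ordered probit on a synthetic ordinal delivery-frequency band.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(7)
n = 1500
df = pd.DataFrame({
    "online_grocery_member": rng.integers(0, 2, n),
    "avg_age": rng.normal(45, 12, n),
    "n_workers": rng.integers(0, 4, n),
})
# Latent weekly-delivery propensity and an ordinal outcome band (invented DGP).
latent = (0.8 * df["online_grocery_member"] - 0.02 * df["avg_age"]
          + 0.3 * df["n_workers"] + rng.normal(size=n))
df["delivery_band"] = pd.cut(latent, bins=[-np.inf, -0.5, 0.7, np.inf],
                             labels=["none", "low", "high"])

model = OrderedModel(df["delivery_band"],
                     df[["online_grocery_member", "avg_age", "n_workers"]],
                     distr="probit")
result = model.fit(method="bfgs", disp=False)
print(result.params.round(3))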

arXiv link: http://arxiv.org/abs/2209.10664v1

Econometrics arXiv updated paper (originally submitted: 2022-09-21)

Efficient Integrated Volatility Estimation in the Presence of Infinite Variation Jumps via Debiased Truncated Realized Variations

Authors: B. Cooper Boniece, José E. Figueroa-López, Yuchen Han

Statistical inference for stochastic processes based on high-frequency
observations has been an active research area for more than two decades. One of
the most well-known and widely studied problems has been the estimation of the
quadratic variation of the continuous component of an It\^o semimartingale with
jumps. Several rate- and variance-efficient estimators have been proposed in
the literature when the jump component is of bounded variation. However, to
date, very few methods can deal with jumps of unbounded variation. By
developing new high-order expansions of the truncated moments of a locally
stable L\'evy process, we propose a new rate- and variance-efficient volatility
estimator for a class of It\^o semimartingales whose jumps behave locally like
those of a stable L\'evy process with Blumenthal-Getoor index $Y\in (1,8/5)$
(hence, of unbounded variation). The proposed method is based on a two-step
debiasing procedure for the truncated realized quadratic variation of the
process and can also cover the case $Y<1$. Our Monte Carlo experiments indicate
that the method outperforms other efficient alternatives in the literature in
the setting covered by our theoretical framework.

arXiv link: http://arxiv.org/abs/2209.10128v3

Econometrics arXiv updated paper (originally submitted: 2022-09-20)

The boosted HP filter is more general than you might think

Authors: Ziwei Mei, Peter C. B. Phillips, Zhentao Shi

The global financial crisis and Covid recession have renewed discussion
concerning trend-cycle discovery in macroeconomic data, and boosting has
recently upgraded the popular HP filter to a modern machine learning device
suited to data-rich and rapid computational environments. This paper extends
boosting's trend determination capability to higher order integrated processes
and time series with roots that are local to unity. The theory is established
by understanding the asymptotic effect of boosting on a simple exponential
function. Given a universe of time series in FRED databases that exhibit
various dynamic patterns, boosting captures downturns at crises and the
recoveries that follow in a timely manner.
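
A minimal Python sketch of the boosting mechanics: the HP smoother is applied repeatedly to the remaining cycle and the fitted components are accumulated into the trend. The smoothing parameter and the fixed number of boosting iterations are illustrative; the paper's data-driven stopping rules and asymptotic analysis are not reproduced.

# Boosted HP filter via repeated application of the HP smoother matrix.
import numpy as np

def hp_smoother(n, lam=1600.0):
    """HP smoother matrix S = (I + lam * D'D)^{-1} for a series of length n."""
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]   # second differences
    return np.linalg.inv(np.eye(n) + lam * D.T @ D)

def boosted_hp(y, lam=1600.0, iterations=4):
    """Boosted HP filter: re-apply the smoother to the remaining cycle."""
    S = hp_smoother(len(y), lam)
    trend = np.zeros_like(y)
    cycle = y.copy()
    for _ in range(iterations):
        step = S @ cycle   # filter what is left in the cycle
        trend += step
        cycle -= step
    return trend, cycle

rng = np.random.default_rng(8)
t = np.arange(200, dtype=float)
y = 0.05 * t + np.cumsum(rng.normal(scale=0.3, size=200))   # toy trending series
trend, cycle = boosted_hp(y)
print("std of extracted cycle:", round(float(cycle.std()), 3))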

arXiv link: http://arxiv.org/abs/2209.09810v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-09-19

A Dynamic Stochastic Block Model for Multidimensional Networks

Authors: Ovielt Baltodano López, Roberto Casarin

The availability of relational data can offer new insights into the
functioning of the economy. Nevertheless, modeling the dynamics in network data
with multiple types of relationships is still a challenging issue. Stochastic
block models provide a parsimonious and flexible approach to network analysis.
We propose a new stochastic block model for multidimensional networks, where
layer-specific hidden Markov-chain processes drive the changes in community
formation. The changes in the block membership of a node in a given layer may
be influenced by its own past membership in other layers. This allows for
clustering overlap, clustering decoupling, or more complex relationships
between layers, including settings of unidirectional, or bidirectional,
non-linear Granger block causality. We address the overparameterization issue
of a saturated specification by assuming a Multi-Laplacian prior distribution
within a Bayesian framework. Data augmentation and Gibbs sampling are used to
make the inference problem more tractable. Through simulations, we show that
standard linear models and the pairwise approach are unable to detect block
causality in most scenarios. In contrast, our model can recover the true
Granger causality structure. As an application to international trade, we show
that our model offers a unified framework, encompassing community detection and
Gravity equation modeling. We found new evidence of block Granger causality of
trade agreements and flows and core-periphery structure in both layers on a
large sample of countries.

arXiv link: http://arxiv.org/abs/2209.09354v2

Econometrics arXiv updated paper (originally submitted: 2022-09-19)

Statistical Treatment Rules under Social Interaction

Authors: Seungjin Han, Julius Owusu, Youngki Shin

In this paper we study treatment assignment rules in the presence of social
interaction. We construct an analytical framework under the anonymous
interaction assumption, where the decision problem becomes choosing a treatment
fraction. We propose a multinomial empirical success (MES) rule that includes
the empirical success rule of Manski (2004) as a special case. We investigate
the non-asymptotic bounds of the expected utility based on the MES rule.
Finally, we prove that the MES rule achieves the asymptotic optimality with the
minimax regret criterion.
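
A toy Python sketch of the multinomial empirical success idea under anonymous interaction: each candidate treatment fraction is evaluated on an experimental arm, and the rule selects the fraction with the highest sample mean outcome. The spillover outcome model and arm sizes are illustrative assumptions.

# Multinomial empirical success rule over a grid of treatment fractions.
import numpy as np

rng = np.random.default_rng(9)
fractions = np.array([0.0, 0.25, 0.5, 0.75, 1.0])   # candidate treatment fractions
n_per_arm = 400

def outcome(frac, size, rng):
    # Outcomes depend on own treatment and on the treated fraction (the
    # spillover channel); the functional form is purely illustrative.
    d = rng.random(size) < frac
    return 0.2 * d + 0.6 * frac * (1 - frac) + rng.normal(scale=1.0, size=size)

means = np.array([outcome(f, n_per_arm, rng).mean() for f in fractions])
print("empirical success by fraction:", means.round(3))
print("MES rule selects treatment fraction:", fractions[means.argmax()])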

arXiv link: http://arxiv.org/abs/2209.09077v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2022-09-19

Causal Effect Estimation with Global Probabilistic Forecasting: A Case Study of the Impact of Covid-19 Lockdowns on Energy Demand

Authors: Ankitha Nandipura Prasanna, Priscila Grecov, Angela Dieyu Weng, Christoph Bergmeir

The electricity industry is heavily implementing smart grid technologies to
improve reliability, availability, security, and efficiency. This
implementation needs technological advancements, the development of standards
and regulations, as well as testing and planning. Smart grid load forecasting
and management are critical for reducing demand volatility and improving the
market mechanism that connects generators, distributors, and retailers. During
policy implementations or external interventions, it is necessary to analyse
the uncertainty of their impact on the electricity demand to enable a more
accurate response of the system to fluctuating demand. This paper analyses the
uncertainties of external intervention impacts on electricity demand. It
implements a framework that combines probabilistic and global forecasting
models using a deep learning approach to estimate the causal impact
distribution of an intervention. The causal effect is assessed by predicting
the counterfactual distribution outcome for the affected instances and then
contrasting it to the real outcomes. We consider the impact of Covid-19
lockdowns on energy usage as a case study to evaluate the non-uniform effect of
this intervention on the electricity demand distribution. We show that
during the initial lockdowns in Australia and some European countries, there
was often a more significant decrease in the troughs than in the peaks, while
the mean remained almost unaffected.

arXiv link: http://arxiv.org/abs/2209.08885v2

Econometrics arXiv paper, submitted: 2022-09-19

A Generalized Argmax Theorem with Applications

Authors: Gregory Cox

The argmax theorem is a useful result for deriving the limiting distribution
of estimators in many applications. The conclusion of the argmax theorem states
that the argmax of a sequence of stochastic processes converges in distribution
to the argmax of a limiting stochastic process. This paper generalizes the
argmax theorem to allow the maximization to take place over a sequence of
subsets of the domain. If the sequence of subsets converges to a limiting
subset, then the conclusion of the argmax theorem continues to hold. We
demonstrate the usefulness of this generalization in three applications:
estimating a structural break, estimating a parameter on the boundary of the
parameter space, and estimating a weakly identified parameter. The generalized
argmax theorem simplifies the proofs for existing results and can be used to
prove new results in these literatures.

arXiv link: http://arxiv.org/abs/2209.08793v1

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2022-09-17

A Structural Model for Detecting Communities in Networks

Authors: Alex Centeno

The objective of this paper is to identify and analyze the response actions
of a set of players embedded in sub-networks in the context of interaction and
learning. We characterize strategic network formation as a static game of
interactions where players maximize their utility depending on the connections
they establish and multiple interdependent actions that permit group-specific
parameters of players. It is challenging to apply this type of model to
real-life scenarios for two reasons: The computation of the Bayesian Nash
Equilibrium is highly demanding and the identification of social influence
requires the use of excluded variables that are oftentimes unavailable. Building
on the theoretical framework, we propose a set of simultaneous equations and
discuss the identification of the social interaction effect using a multi-modal
network autoregressive approach.

arXiv link: http://arxiv.org/abs/2209.08380v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2022-09-15

Best Arm Identification with Contextual Information under a Small Gap

Authors: Masahiro Kato, Masaaki Imaizumi, Takuya Ishihara, Toru Kitagawa

We study the best-arm identification (BAI) problem with a fixed budget and
contextual (covariate) information. In each round of an adaptive experiment,
after observing contextual information, we choose a treatment arm using past
observations and current context. Our goal is to identify the best treatment
arm, which is a treatment arm with the maximal expected reward marginalized
over the contextual distribution, with a minimal probability of
misidentification. In this study, we consider a class of nonparametric bandit
models that converge to location-shift models when the gaps go to zero. First,
we derive lower bounds of the misidentification probability for a certain class
of strategies and bandit models (probabilistic models of potential outcomes)
under a small-gap regime. A small-gap regime is a situation where gaps of the
expected rewards between the best and suboptimal treatment arms go to zero,
which corresponds to one of the worst cases in identifying the best treatment
arm. We then develop the “Random Sampling (RS)-Augmented Inverse Probability
weighting (AIPW) strategy,” which is asymptotically optimal in the sense that
the probability of misidentification under the strategy matches the lower bound
when the budget goes to infinity in the small-gap regime. The RS-AIPW strategy
consists of the RS rule tracking a target sample allocation ratio and the
recommendation rule using the AIPW estimator.
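
A stripped-down sketch of the recommendation step: an AIPW estimate of each
arm's context-marginalized mean reward, followed by recommending the argmax.
Sampling probabilities are taken as known and uniform, the outcome models are
plain linear regressions, the adaptive target-allocation tracking of the RS rule
is omitted, and all data below are simulated placeholders.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
T, n_arms = 2000, 3
X = rng.normal(size=(T, 2))                       # contexts
pi = np.full(n_arms, 1.0 / n_arms)                # known sampling probabilities
A = rng.integers(n_arms, size=T)                  # arms drawn uniformly at random
theta = np.array([[0.2, 0.0], [0.0, 0.3], [0.1, 0.1]])
Y = 0.1 * A + (X * theta[A]).sum(axis=1) + rng.normal(scale=0.5, size=T)

aipw = np.empty(n_arms)
for a in range(n_arms):
    mu = LinearRegression().fit(X[A == a], Y[A == a])   # outcome model for arm a
    m = mu.predict(X)
    aipw[a] = np.mean(m + (A == a) * (Y - m) / pi[a])   # doubly robust arm mean
best_arm = int(np.argmax(aipw))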

arXiv link: http://arxiv.org/abs/2209.07330v4

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-09-15

$ρ$-GNF: A Copula-based Sensitivity Analysis to Unobserved Confounding Using Normalizing Flows

Authors: Sourabh Balgi, Jose M. Peña, Adel Daoud

We propose a novel sensitivity analysis to unobserved confounding in
observational studies using copulas and normalizing flows. Using the idea of
interventional equivalence of structural causal models, we develop $\rho$-GNF
($\rho$-graphical normalizing flow), where $\rho\in[-1,+1]$ is a bounded
sensitivity parameter. This parameter represents the back-door non-causal
association due to unobserved confounding, which is encoded with a Gaussian
copula. In other words, the $\rho$-GNF enables scholars to estimate the average
causal effect (ACE) as a function of $\rho$, while accounting for various
assumed strengths of the unobserved confounding. The output of the $\rho$-GNF
is what we denote as the $\rho_{curve}$ that provides the bounds for the ACE
given an interval of assumed $\rho$ values. In particular, the $\rho_{curve}$
enables scholars to identify the confounding strength required to nullify the
ACE, similar to other sensitivity analysis methods (e.g., the E-value).
Leveraging experiments on simulated and real-world data, we show the
benefits of $\rho$-GNF. One benefit is that the $\rho$-GNF uses a Gaussian
copula to encode the distribution of the unobserved causes, which is commonly
used in many applied settings. This distributional assumption produces narrower
ACE bounds compared to other popular sensitivity analysis methods.

arXiv link: http://arxiv.org/abs/2209.07111v2

Econometrics arXiv updated paper (originally submitted: 2022-09-14)

Do shared e-scooter services cause traffic accidents? Evidence from six European countries

Authors: Cannon Cloud, Simon Heß, Johannes Kasinger

We estimate the causal effect of shared e-scooter services on traffic
accidents by exploiting variation in availability of e-scooter services,
induced by the staggered rollout across 93 cities in six countries.
Police-reported accidents in the average month increased by around 8.2% after
shared e-scooters were introduced. For cities with limited cycling
infrastructure and where mobility relies heavily on cars, estimated effects are
largest. In contrast, no effects are detectable in cities with high bike-lane
density. This heterogeneity suggests that public policy can play a crucial role
in mitigating accidents related to e-scooters and, more generally, to changes
in urban mobility.
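
A stylized version of the staggered-rollout comparison can be written as a
two-way fixed-effects regression with city and month effects, a
post-introduction indicator, and city-clustered standard errors. The simulated
data, variable names, and effect size below are placeholders; the paper's actual
design and estimation are richer than this sketch.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n_cities, n_months = 30, 48
df = pd.DataFrame([(c, m) for c in range(n_cities) for m in range(n_months)],
                  columns=["city", "month"])
rollout = rng.integers(12, 36, size=n_cities)          # staggered introduction month
df["treated"] = (df["month"].to_numpy() >= rollout[df["city"].to_numpy()]).astype(int)
df["log_accidents"] = 0.08 * df["treated"] + rng.normal(scale=0.2, size=len(df))

twfe = smf.ols("log_accidents ~ treated + C(city) + C(month)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["city"]})
print(twfe.params["treated"])                          # approx. the 8% effect built in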

arXiv link: http://arxiv.org/abs/2209.06870v2

Econometrics arXiv paper, submitted: 2022-09-14

Sample Fit Reliability

Authors: Gabriel Okasa, Kenneth A. Younge

Researchers frequently test and improve model fit by holding a sample
constant and varying the model. We propose methods to test and improve sample
fit by holding a model constant and varying the sample. Much as the bootstrap
is a well-known method to re-sample data and estimate the uncertainty of the
fit of parameters in a model, we develop Sample Fit Reliability (SFR) as a set
of computational methods to re-sample data and estimate the reliability of the
fit of observations in a sample. SFR uses Scoring to assess the reliability of
each observation in a sample, Annealing to check the sensitivity of results to
removing unreliable data, and Fitting to re-weight observations for more robust
analysis. We provide simulation evidence to demonstrate the advantages of using
SFR, and we replicate three empirical studies with treatment effects to
illustrate how SFR reveals new insights about each study.
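
The resampling logic behind Scoring can be sketched by refitting the same model
on many random subsamples and recording, for each observation, its average
out-of-sample absolute residual as a reliability score. This is only an
illustration of the idea under invented data; it is not the authors' exact SFR
procedure.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
n, B = 500, 200
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, -0.5, 0.0]) + rng.normal(size=n)
y[:10] += 6.0                                      # a few contaminated observations

score_sum = np.zeros(n)
score_count = np.zeros(n)
for _ in range(B):
    in_fit = rng.random(n) < 0.7                   # fit on a 70% subsample
    model = LinearRegression().fit(X[in_fit], y[in_fit])
    out = ~in_fit
    score_sum[out] += np.abs(y[out] - model.predict(X[out]))
    score_count[out] += 1
reliability_score = score_sum / np.maximum(score_count, 1)  # higher = less reliable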

arXiv link: http://arxiv.org/abs/2209.06631v1

Econometrics arXiv cross-link from physics.data-an (physics.data-an), submitted: 2022-09-13

Carbon Monitor-Power: near-real-time monitoring of global power generation on hourly to daily scales

Authors: Biqing Zhu, Xuanren Song, Zhu Deng, Wenli Zhao, Da Huo, Taochun Sun, Piyu Ke, Duo Cui, Chenxi Lu, Haiwang Zhong, Chaopeng Hong, Jian Qiu, Steven J. Davis, Pierre Gentine, Philippe Ciais, Zhu Liu

We constructed a frequently updated, near-real-time global power generation
dataset, Carbon Monitor-Power, available from January 2016 onward, at national levels with
near-global coverage and hourly-to-daily time resolution. The data presented
here are collected from 37 countries across all continents for eight source
groups, including three types of fossil sources (coal, gas, and oil), nuclear
energy and four groups of renewable energy sources (solar energy, wind energy,
hydro energy and other renewables including biomass, geothermal, etc.). The
global near-real-time power dataset shows the dynamics of the global power
system, including its hourly, daily, weekly and seasonal patterns as influenced
by daily periodical activities, weekends, seasonal cycles, regular and
irregular events (e.g., holidays) and extreme events (e.g., the COVID-19
pandemic). The Carbon Monitor-Power dataset reveals that the COVID-19 pandemic
caused strong disruptions in some countries (e.g., China and India), leading to
a temporary or long-lasting shift to low carbon intensity, while it had
little impact in some other countries (e.g., Australia). This dataset offers a
large range of opportunities for power-related scientific research and
policy-making.

arXiv link: http://arxiv.org/abs/2209.06086v1

Econometrics arXiv paper, submitted: 2022-09-13

Estimation of Average Derivatives of Latent Regressors: With an Application to Inference on Buffer-Stock Saving

Authors: Hao Dong, Yuya Sasaki

This paper proposes a density-weighted average derivative estimator based on
two noisy measures of a latent regressor. Both measures have classical errors
with possibly asymmetric distributions. We show that the proposed estimator
achieves the root-n rate of convergence, and derive its asymptotic normal
distribution for statistical inference. Simulation studies demonstrate
excellent small-sample performance supporting the root-n asymptotic normality.
Based on the proposed estimator, we construct a formal test on the sub-unity of
the marginal propensity to consume out of permanent income (MPCP) under a
nonparametric consumption model and a permanent-transitory model of income
dynamics with nonparametric distribution. Applying the test to four recent
waves of U.S. Panel Study of Income Dynamics (PSID), we reject the null
hypothesis of the unit MPCP in favor of a sub-unit MPCP, supporting the
buffer-stock model of saving.

arXiv link: http://arxiv.org/abs/2209.05914v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-09-13

Bayesian Functional Emulation of CO2 Emissions on Future Climate Change Scenarios

Authors: Luca Aiello, Matteo Fontana, Alessandra Guglielmi

We propose a statistical emulator for a climate-economy deterministic
integrated assessment model ensemble, based on a functional regression
framework. Inference on the unknown parameters is carried out through a mixed
effects hierarchical model using a fully Bayesian framework with a prior
distribution on the vector of all parameters. We also suggest an autoregressive
parameterization of the covariance matrix of the error, with matching marginal
prior. In this way, we allow for a functional framework for the discretized
output of the simulators that allows their time continuous evaluation.

arXiv link: http://arxiv.org/abs/2209.05767v1

Econometrics arXiv paper, submitted: 2022-09-12

Testing Endogeneity of Spatial Weights Matrices in Spatial Dynamic Panel Data Models

Authors: Jieun Lee

I propose Robust Rao's Score (RS) test statistic to determine endogeneity of
spatial weights matrices in a spatial dynamic panel data (SDPD) model (Qu, Lee,
and Yu, 2017). I firstly introduce the bias-corrected score function since the
score function is not centered around zero due to the two-way fixed effects. I
further adjust score functions to rectify the over-rejection of the null
hypothesis under a presence of local misspecification in contemporaneous
dependence over space, dependence over time, or spatial time dependence. I then
derive the explicit forms of our test statistic. A Monte Carlo simulation
supports the analytics and shows nice finite sample properties. Finally, an
empirical illustration is provided using data from Penn World Table version
6.1.

arXiv link: http://arxiv.org/abs/2209.05563v1

Econometrics arXiv paper, submitted: 2022-09-12

Evidence and Strategy on Economic Distance in Spatially Augmented Solow-Swan Growth Model

Authors: Jieun Lee

Economists' interests in growth theory have a very long history (Harrod,
1939; Domar, 1946; Solow, 1956; Swan, 1956; Mankiw, Romer, and Weil, 1992).
Recently, starting from the neoclassical growth model, Ertur and Koch (2007)
developed the spatially augmented Solow-Swan growth model with the exogenous
spatial weights matrices ($W$). While the exogenous $W$ assumption could be
true only with the geographical/physical distance, it may not be true when
economic/social distances play a role. Using Penn World Table version 7.1,
which covers the years 1960-2010, I conducted the robust Rao's score test (Bera,
Dogan, and Taspinar, 2018) to determine whether $W$ is endogenous and used the
maximum likelihood estimation (Qu and Lee, 2015). The key finding is that the
significant, positive effects of physical capital externalities and spatial
externalities (technological interdependence) reported in Ertur and Koch (2007)
were no longer found with the exogenous $W$, but they were still found with the
endogenous $W$ models. I also propose an empirical strategy for choosing which
economic distance to use when the data have recently been subject to heavy
shocks from the worldwide financial crises during 1996-2010.

arXiv link: http://arxiv.org/abs/2209.05562v1

Econometrics arXiv updated paper (originally submitted: 2022-09-11)

Testing the martingale difference hypothesis in high dimension

Authors: Jinyuan Chang, Qing Jiang, Xiaofeng Shao

In this paper, we consider testing the martingale difference hypothesis for
high-dimensional time series. Our test is built on the sum of squares of the
element-wise max-norm of the proposed matrix-valued nonlinear dependence
measure at different lags. To conduct the inference, we approximate the null
distribution of our test statistic by Gaussian approximation and provide a
simulation-based approach to generate critical values. The asymptotic behavior
of the test statistic under the alternative is also studied. Our approach is
nonparametric as the null hypothesis only assumes the time series concerned is
martingale difference without specifying any parametric forms of its
conditional moments. As an advantage of Gaussian approximation, our test is
robust to the cross-series dependence of unknown magnitude. To the best of our
knowledge, this is the first valid test for the martingale difference
hypothesis that not only allows for large dimension but also captures nonlinear
serial dependence. The practical usefulness of our test is illustrated via
simulation and a real data analysis. The test is implemented in a user-friendly
R-function.

arXiv link: http://arxiv.org/abs/2209.04770v2

Econometrics arXiv updated paper (originally submitted: 2022-09-09)

Heterogeneous Treatment Effect Bounds under Sample Selection with an Application to the Effects of Social Media on Political Polarization

Authors: Phillip Heiler

We propose a method for estimation and inference for bounds for heterogeneous
causal effect parameters in general sample selection models where the treatment
can affect whether an outcome is observed and no exclusion restrictions are
available. The method provides conditional effect bounds as functions of policy
relevant pre-treatment variables. It allows for conducting valid statistical
inference on the unidentified conditional effects. We use a flexible
debiased/double machine learning approach that can accommodate non-linear
functional forms and high-dimensional confounders. Easily verifiable high-level
conditions for estimation, misspecification robust confidence intervals, and
uniform confidence bands are provided as well. We re-analyze data from a large
scale field experiment on Facebook on counter-attitudinal news subscription
with attrition. Our method yields substantially tighter effect bounds compared
to conventional methods and suggests depolarization effects for younger users.

arXiv link: http://arxiv.org/abs/2209.04329v5

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2022-09-08

W-Transformers : A Wavelet-based Transformer Framework for Univariate Time Series Forecasting

Authors: Lena Sasal, Tanujit Chakraborty, Abdenour Hadid

Deep learning utilizing transformers has recently achieved a lot of success
in many vital areas such as natural language processing, computer vision,
anomaly detection, and recommendation systems, among many others. Among several
merits of transformers, the ability to capture long-range temporal dependencies
and interactions is desirable for time series forecasting, leading to its
progress in various time series applications. In this paper, we build a
transformer model for non-stationary time series. The problem is challenging
yet crucially important. We present a novel framework for univariate time
series representation learning based on the wavelet-based transformer encoder
architecture and call it W-Transformer. The proposed W-Transformers apply a
maximal overlap discrete wavelet transformation (MODWT) to the time series data
and build local transformers on the decomposed datasets to vividly capture the
nonstationarity and long-range nonlinear dependencies in the time series.
Evaluating our framework on several publicly available benchmark time series
datasets from various domains and with diverse characteristics, we demonstrate
that it performs, on average, significantly better than the baseline
forecasters for short-term and long-term forecasting, even for datasets that
consist of only a few hundred training samples.

arXiv link: http://arxiv.org/abs/2209.03945v1

Econometrics arXiv paper, submitted: 2022-09-08

Modified Causal Forest

Authors: Michael Lechner, Jana Mareckova

Uncovering the heterogeneity of causal effects of policies and business
decisions at various levels of granularity provides substantial value to
decision makers. This paper develops estimation and inference procedures for
multiple treatment models in a selection-on-observed-variables framework by
modifying the Causal Forest approach (Wager and Athey, 2018) in several
dimensions. The new estimators have desirable theoretical, computational, and
practical properties for various aggregation levels of the causal effects.
While an Empirical Monte Carlo study suggests that they outperform previously
suggested estimators, an application to the evaluation of an active labour
market programme shows their value for applied research.

arXiv link: http://arxiv.org/abs/2209.03744v1

Econometrics arXiv updated paper (originally submitted: 2022-09-07)

A Ridge-Regularised Jackknifed Anderson-Rubin Test

Authors: Max-Sebastian Dovì, Anders Bredahl Kock, Sophocles Mavroeidis

We consider hypothesis testing in instrumental variable regression models
with few included exogenous covariates but many instruments -- possibly more
than the number of observations. We show that a ridge-regularised version of
the jackknifed Anderson Rubin (1949, henceforth AR) test controls asymptotic
size in the presence of heteroskedasticity, and when the instruments may be
arbitrarily weak. Asymptotic size control is established under weaker
assumptions than those imposed for recently proposed jackknifed AR tests in the
literature. Furthermore, ridge-regularisation extends the scope of jackknifed
AR tests to situations in which there are more instruments than observations.
Monte-Carlo simulations indicate that our method has favourable finite-sample
size and power properties compared to recently proposed alternative approaches
in the literature. An empirical application on the elasticity of substitution
between immigrants and natives in the US illustrates the usefulness of the
proposed method for practitioners.

arXiv link: http://arxiv.org/abs/2209.03259v2

Econometrics arXiv updated paper (originally submitted: 2022-09-07)

Local Projection Inference in High Dimensions

Authors: Robert Adamek, Stephan Smeekes, Ines Wilms

In this paper, we estimate impulse responses by local projections in
high-dimensional settings. We use the desparsified (de-biased) lasso to
estimate the high-dimensional local projections, while leaving the impulse
response parameter of interest unpenalized. We establish the uniform asymptotic
normality of the proposed estimator under general conditions. Finally, we
demonstrate small sample performance through a simulation study and consider
two canonical applications in macroeconomic research on monetary policy and
government spending.
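
For readers unfamiliar with local projections, the low-dimensional version of
the estimator looks as follows: for each horizon h, regress y_{t+h} on the
period-t shock and a few lags, and collect the shock coefficient as the impulse
response. The paper's contribution, desparsified-lasso inference with many
controls, is replaced here by plain OLS with HAC standard errors purely for
illustration, and the data are simulated.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
T, H, L = 400, 12, 4
shock = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.6 * y[t - 1] + shock[t] + 0.3 * rng.normal()

irf = []
for h in range(H + 1):
    lhs = y[L + h:]                                          # y_{t+h}
    rhs = np.column_stack([shock[L:T - h]] +                 # shock_t
                          [y[L - l:T - h - l] for l in range(1, L + 1)])  # lags of y
    fit = sm.OLS(lhs, sm.add_constant(rhs)).fit(cov_type="HAC",
                                                cov_kwds={"maxlags": h + 1})
    irf.append(fit.params[1])                                # impulse response at h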

arXiv link: http://arxiv.org/abs/2209.03218v3

Econometrics arXiv paper, submitted: 2022-09-07

An Assessment Tool for Academic Research Managers in the Third World

Authors: Fernando Delbianco, Andres Fioriti, Fernando Tohmé

The academic evaluation of the publication record of researchers is relevant
for identifying talented candidates for promotion and funding. A key tool for
this is the use of the indexes provided by Web of Science and SCOPUS, costly
databases that sometimes exceed the possibilities of academic institutions in
many parts of the world. We show here how the data in one of the bases can be
used to infer the main index of the other one. Methods of data analysis used in
Machine Learning allow us to select just a few of the hundreds of variables in
a database, which later are used in a panel regression, yielding a good
approximation to the main index in the other database. Since the information of
SCOPUS can be freely scraped from the Web, this approach makes it possible to
infer, at no cost, the Impact Factor of publications, the main index used in
research assessments around the globe.
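
The two-step idea can be sketched with standard tools: a sparse learner selects
a handful of predictors from the many candidate variables in one database, and a
simple regression of the other database's index on the selected variables
provides the approximation. Variable names and the data-generating process below
are invented; the paper works with Web of Science and SCOPUS indicators and a
panel regression.

import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(7)
n, p = 300, 150                                   # many candidate indicators
X = rng.normal(size=(n, p))
target_index = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.5 * X[:, 10] + rng.normal(size=n)

lasso = LassoCV(cv=5).fit(X, target_index)        # step 1: variable selection
selected = np.flatnonzero(lasso.coef_)
approx = LinearRegression().fit(X[:, selected], target_index)  # step 2: regression
print(selected, approx.score(X[:, selected], target_index))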

arXiv link: http://arxiv.org/abs/2209.03199v1

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2022-09-05

Rethinking Generalized Beta Family of Distributions

Authors: Jiong Liu, R. A. Serota

We approach the Generalized Beta (GB) family of distributions using a
mean-reverting stochastic differential equation (SDE) for a power of the
variable, whose steady-state (stationary) probability density function (PDF) is
a modified GB (mGB) distribution. The SDE approach allows for a lucid
explanation of Generalized Beta Prime (GB2) and Generalized Beta (GB1) limits
of GB distribution and, further down, of Generalized Inverse Gamma (GIGa) and
Generalized Gamma (GGa) limits, as well as describe the transition between the
latter two. We provide an alternative form to the "traditional" GB PDF to
underscore that a great deal of usefulness of GB distribution lies in its
allowing a long-range power-law behavior to be ultimately terminated at a
finite value. We derive the cumulative distribution function (CDF) of the
"traditional" GB, which belongs to the family generated by the regularized beta
function and is crucial for analysis of the tails of the distribution. We
analyze fifty years of historical data on realized market volatility,
specifically for S&P500, as a case study of the use of GB/mGB distributions
and show that its behavior is consistent with that of negative Dragon Kings.

arXiv link: http://arxiv.org/abs/2209.05225v1

Econometrics arXiv paper, submitted: 2022-09-05

Bayesian Mixed-Frequency Quantile Vector Autoregression: Eliciting tail risks of Monthly US GDP

Authors: Matteo Iacopini, Aubrey Poon, Luca Rossini, Dan Zhu

Timely characterizations of risks in economic and financial systems play an
essential role in both economic policy and private sector decisions. However,
the informational content of low-frequency variables and the results from
conditional mean models provide only limited evidence to investigate this
problem. We propose a novel mixed-frequency quantile vector autoregression
(MF-QVAR) model to address this issue. Inspired by the univariate Bayesian
quantile regression literature, the multivariate asymmetric Laplace
distribution is exploited under the Bayesian framework to form the likelihood.
A data augmentation approach coupled with a precision sampler efficiently
estimates the missing low-frequency variables at higher frequencies under the
state-space representation. The proposed methods allow us to nowcast
conditional quantiles for multiple variables of interest and to derive
quantile-related risk measures at high frequency, thus enabling timely policy
interventions. The main application of the model is to nowcast conditional
quantiles of the US GDP, which is strictly related to the quantification of
Value-at-Risk and the Expected Shortfall.

arXiv link: http://arxiv.org/abs/2209.01910v1

Econometrics arXiv paper, submitted: 2022-09-05

Robust Causal Learning for the Estimation of Average Treatment Effects

Authors: Yiyan Huang, Cheuk Hang Leung, Xing Yan, Qi Wu, Shumin Ma, Zhiri Yuan, Dongdong Wang, Zhixiang Huang

Many practical decision-making problems in economics and healthcare seek to
estimate the average treatment effect (ATE) from observational data. The
Double/Debiased Machine Learning (DML) is one of the prevalent methods to
estimate ATE in the observational study. However, the DML estimators can suffer
an error-compounding issue and even give an extreme estimate when the
propensity scores are misspecified or very close to 0 or 1. Previous studies
have overcome this issue through some empirical tricks such as propensity score
trimming, yet none of the existing literature solves this problem from a
theoretical standpoint. In this paper, we propose a Robust Causal Learning
(RCL) method to offset the deficiencies of the DML estimators. Theoretically,
the RCL estimators i) are as consistent and doubly robust as the DML
estimators, and ii) can get rid of the error-compounding issue. Empirically,
the comprehensive experiments show that i) the RCL estimators give more stable
estimations of the causal parameters than the DML estimators, and ii) the RCL
estimators outperform the traditional estimators and their variants when
applying different machine learning models on both simulation and benchmark
datasets.
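
For context, the cross-fitted doubly robust (DML/AIPW-style) ATE estimator that
the paper takes as its starting point can be sketched as below, including the
propensity-score clipping that the authors describe as a common empirical fix.
This is the baseline construction under simulated data, not the proposed RCL
estimator.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.model_selection import KFold

def aipw_ate(X, D, Y, n_splits=5, eps=0.01):
    psi = np.zeros(len(Y))
    for train, test in KFold(n_splits, shuffle=True, random_state=0).split(X):
        e = GradientBoostingClassifier().fit(X[train], D[train]).predict_proba(X[test])[:, 1]
        e = np.clip(e, eps, 1 - eps)               # propensity-score trimming/clipping
        m1 = GradientBoostingRegressor().fit(X[train][D[train] == 1], Y[train][D[train] == 1])
        m0 = GradientBoostingRegressor().fit(X[train][D[train] == 0], Y[train][D[train] == 0])
        mu1, mu0 = m1.predict(X[test]), m0.predict(X[test])
        psi[test] = (mu1 - mu0
                     + D[test] * (Y[test] - mu1) / e
                     - (1 - D[test]) * (Y[test] - mu0) / (1 - e))
    return psi.mean(), psi.std(ddof=1) / np.sqrt(len(Y))   # ATE estimate, std. error

rng = np.random.default_rng(8)
X = rng.normal(size=(2000, 5))
D = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
Y = 1.0 * D + X[:, 0] + rng.normal(size=2000)
print(aipw_ate(X, D, Y))                           # true ATE is 1 by construction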

arXiv link: http://arxiv.org/abs/2209.01805v1

Econometrics arXiv updated paper (originally submitted: 2022-09-04)

Combining Forecasts under Structural Breaks Using Graphical LASSO

Authors: Tae-Hwy Lee, Ekaterina Seregina

In this paper we develop a novel method of combining many forecasts based on
a machine learning algorithm called Graphical LASSO (GL). We visualize forecast
errors from different forecasters as a network of interacting entities and
generalize network inference in the presence of common factor structure and
structural breaks. First, we note that forecasters often use common information
and hence make common mistakes, which makes the forecast errors exhibit common
factor structures. We use the Factor Graphical LASSO (FGL, Lee and Seregina
(2023)) to separate common forecast errors from the idiosyncratic errors and
exploit sparsity of the precision matrix of the latter. Second, since the
network of experts changes over time as a response to unstable environments
such as recessions, it is unreasonable to assume constant forecast combination
weights. Hence, we propose Regime-Dependent Factor Graphical LASSO (RD-FGL)
that allows factor loadings and idiosyncratic precision matrix to be
regime-dependent. We develop its scalable implementation using the Alternating
Direction Method of Multipliers (ADMM) to estimate regime-dependent forecast
combination weights. The empirical application to forecasting macroeconomic
series using the data of the European Central Bank's Survey of Professional
Forecasters (ECB SPF) demonstrates superior performance of a combined forecast
using FGL and RD-FGL.
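
The core combination step can be sketched in a few lines: estimate a sparse
precision matrix of forecast errors with the Graphical LASSO and form
minimum-variance combination weights proportional to the precision matrix times
a vector of ones. The factor structure and regime dependence that define FGL and
RD-FGL are omitted here, and the simulated errors are placeholders.

import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(9)
T, k = 300, 8                                      # periods, forecasters
common = rng.normal(size=(T, 1))                   # shared mistakes (factor-like)
errors = 0.5 * common + 0.5 * rng.normal(size=(T, k))

theta = GraphicalLasso(alpha=0.1).fit(errors).precision_
ones = np.ones(k)
weights = theta @ ones / (ones @ theta @ ones)     # combination weights sum to one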

arXiv link: http://arxiv.org/abs/2209.01697v2

Econometrics arXiv updated paper (originally submitted: 2022-09-03)

Instrumental variable quantile regression under random right censoring

Authors: Jad Beyhum, Lorenzo Tedesco, Ingrid Van Keilegom

This paper studies a semiparametric quantile regression model with endogenous
variables and random right censoring. The endogeneity issue is solved using
instrumental variables. It is assumed that the structural quantile of the
logarithm of the outcome variable is linear in the covariates and censoring is
independent. The regressors and instruments can be either continuous or
discrete. The specification generates a continuum of equations of which the
quantile regression coefficients are a solution. Identification is obtained
when this system of equations has a unique solution. Our estimation procedure
solves an empirical analogue of the system of equations. We derive conditions
under which the estimator is asymptotically normal and prove the validity of a
bootstrap procedure for inference. The finite sample performance of the
approach is evaluated through numerical simulations. An application to the
national Job Training Partnership Act study illustrates the method.

arXiv link: http://arxiv.org/abs/2209.01429v2

Econometrics arXiv updated paper (originally submitted: 2022-09-01)

Instrumental variables with unordered treatments: Theory and evidence from returns to fields of study

Authors: Eskil Heinesen, Christian Hvid, Lars Kirkebøen, Edwin Leuven, Magne Mogstad

We revisit the identification argument of Kirkeboen et al. (2016) who showed
how one may combine instruments for multiple unordered treatments with
information about individuals' ranking of these treatments to achieve
identification while allowing for both observed and unobserved heterogeneity in
treatment effects. We show that the key assumptions underlying their
identification argument have testable implications. We also provide a new
characterization of the bias that may arise if these assumptions are violated.
Taken together, these results allow researchers not only to test the underlying
assumptions, but also to argue whether the bias from violation of these
assumptions are likely to be economically meaningful. Guided and motivated by
these results, we estimate and compare the earnings payoffs to post-secondary
fields of study in Norway and Denmark. In each country, we apply the
identification argument of Kirkeboen et al. (2016) to data on individuals'
ranking of fields of study and field-specific instruments from discontinuities
in the admission systems. We empirically examine whether and why the payoffs to
fields of study differ across the two countries. We find strong cross-country
correlation in the payoffs to fields of study, especially after removing fields
with violations of the assumptions underlying the identification argument.

arXiv link: http://arxiv.org/abs/2209.00417v3

Econometrics arXiv paper, submitted: 2022-09-01

A Unified Framework for Estimation of High-dimensional Conditional Factor Models

Authors: Qihui Chen

This paper develops a general framework for estimation of high-dimensional
conditional factor models via nuclear norm regularization. We establish large
sample properties of the estimators, and provide an efficient computing
algorithm for finding the estimators as well as a cross validation procedure
for choosing the regularization parameter. The general framework allows us to
estimate a variety of conditional factor models in a unified way and quickly
deliver new asymptotic results. We apply the method to analyze the cross
section of individual US stock returns, and find that imposing homogeneity may
improve the model's out-of-sample predictability.

arXiv link: http://arxiv.org/abs/2209.00391v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-09-01

Switchback Experiments under Geometric Mixing

Authors: Yuchen Hu, Stefan Wager

The switchback is an experimental design that measures treatment effects by
repeatedly turning an intervention on and off for a whole system. Switchback
experiments are a robust way to overcome cross-unit spillover effects; however,
they are vulnerable to bias from temporal carryovers. In this paper, we
consider properties of switchback experiments in Markovian systems that mix at
a geometric rate. We find that, in this setting, standard switchback designs
suffer considerably from carryover bias: Their estimation error decays as
$T^{-1/3}$ in terms of the experiment horizon $T$, whereas in the absence of
carryovers a faster rate of $T^{-1/2}$ would have been possible. We also show,
however, that judicious use of burn-in periods can considerably improve the
situation, and enables errors that decay as $\log(T)^{1/2}T^{-1/2}$. Our formal
results are mirrored in an empirical evaluation.
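
A small simulation in the spirit of this setting: a geometrically mixing AR(1)
outcome whose long-run mean shifts under treatment, a switchback schedule that
flips treatment every block, and a difference-in-means estimator that optionally
discards a burn-in at the start of each block to reduce carryover bias. All
parameters are illustrative, and the theoretical rates above are not reproduced
here.

import numpy as np

rng = np.random.default_rng(10)
T, block, burn, tau = 20000, 100, 20, 1.0

treat = (np.arange(T) // block) % 2                # on/off switchback schedule
y = np.zeros(T)
for t in range(1, T):
    # AR(1) state mixes geometrically; treatment shifts its long-run mean by tau
    y[t] = 0.8 * y[t - 1] + 0.2 * tau * treat[t] + rng.normal(scale=0.5)

keep = (np.arange(T) % block) >= burn              # drop a burn-in within each block
naive = y[treat == 1].mean() - y[treat == 0].mean()
with_burn_in = y[keep & (treat == 1)].mean() - y[keep & (treat == 0)].mean()
print(naive, with_burn_in)                         # burn-in estimate is closer to tau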

arXiv link: http://arxiv.org/abs/2209.00197v3

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2022-08-30

Modeling Volatility and Dependence of European Carbon and Energy Prices

Authors: Jonathan Berrisch, Sven Pappert, Florian Ziel, Antonia Arsova

We study the prices of European Emission Allowances (EUA), whereby we analyze
their uncertainty and dependencies on related energy prices (natural gas, coal,
and oil). We propose a probabilistic multivariate conditional time series model
with a VECM-Copula-GARCH structure which exploits key characteristics of the
data. Data are normalized with respect to inflation and carbon emissions to
allow for proper cross-series evaluation. The forecasting performance is
evaluated in an extensive rolling-window forecasting study, covering eight
years out-of-sample. We discuss our findings for both levels- and
log-transformed data, focusing on time-varying correlations, and in view of the
Russian invasion of Ukraine.
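
As a hedged, univariate stand-in for one building block of the
VECM-Copula-GARCH structure, the sketch below fits a GARCH(1,1) with Student-t
innovations to a single simulated return series using the arch package; the
cointegration and copula components of the paper's model are not shown.

import numpy as np
from arch import arch_model

rng = np.random.default_rng(11)
returns = rng.standard_t(df=6, size=1000)          # placeholder daily returns (in %)

res = arch_model(returns, vol="GARCH", p=1, q=1, dist="t").fit(disp="off")
print(res.params)                                  # mu, omega, alpha[1], beta[1], nu
cond_vol = res.conditional_volatility              # fitted conditional volatility path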

arXiv link: http://arxiv.org/abs/2208.14311v4

Econometrics arXiv updated paper (originally submitted: 2022-08-29)

A Consistent ICM-based $χ^2$ Specification Test

Authors: Feiyu Jiang, Emmanuel Selorm Tsyawo

In spite of the omnibus property of Integrated Conditional Moment (ICM)
specification tests, they are not commonly used in empirical practice owing to,
e.g., the non-pivotality of the test and the high computational cost of
available bootstrap schemes especially in large samples. This paper proposes
specification and mean independence tests based on a class of ICM metrics
termed the generalized martingale difference divergence (GMDD). The proposed
tests exhibit consistency, asymptotic $\chi^2$-distribution under the null
hypothesis, and computational efficiency. Moreover, they demonstrate robustness
to heteroskedasticity of unknown form and can be adapted to enhance power
towards specific alternatives. A power comparison with classical
bootstrap-based ICM tests using Bahadur slopes is also provided. Monte Carlo
simulations are conducted to showcase the proposed tests' excellent size
control and competitive power.

arXiv link: http://arxiv.org/abs/2208.13370v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-08-29

Safe Policy Learning under Regression Discontinuity Designs with Multiple Cutoffs

Authors: Yi Zhang, Eli Ben-Michael, Kosuke Imai

The regression discontinuity (RD) design is widely used for program
evaluation with observational data. The primary focus of the existing
literature has been the estimation of the local average treatment effect at the
existing treatment cutoff. In contrast, we consider policy learning under the
RD design. Because the treatment assignment mechanism is deterministic,
learning better treatment cutoffs requires extrapolation. We develop a robust
optimization approach to finding optimal treatment cutoffs that improve upon
the existing ones. We first decompose the expected utility into
point-identifiable and unidentifiable components. We then propose an efficient
doubly-robust estimator for the identifiable parts. To account for the
unidentifiable components, we leverage the existence of multiple cutoffs that
are common under the RD design. Specifically, we assume that the heterogeneity
in the conditional expectations of potential outcomes across different groups
varies smoothly along the running variable. Under this assumption, we minimize
the worst case utility loss relative to the status quo policy. The resulting
new treatment cutoffs have a safety guarantee that they will not yield a worse
overall outcome than the existing cutoffs. Finally, we establish the asymptotic
regret bounds for the learned policy using semi-parametric efficiency theory.
We apply the proposed methodology to empirical and simulated data sets.

arXiv link: http://arxiv.org/abs/2208.13323v4

Econometrics arXiv paper, submitted: 2022-08-28

Comparing Stochastic Volatility Specifications for Large Bayesian VARs

Authors: Joshua C. C. Chan

Large Bayesian vector autoregressions with various forms of stochastic
volatility have become increasingly popular in empirical macroeconomics. One
main difficulty for practitioners is to choose the most suitable stochastic
volatility specification for their particular application. We develop Bayesian
model comparison methods -- based on marginal likelihood estimators that
combine conditional Monte Carlo and adaptive importance sampling -- to choose
among a variety of stochastic volatility specifications. The proposed methods
can also be used to select an appropriate shrinkage prior on the VAR
coefficients, which is a critical component for avoiding over-fitting in
high-dimensional settings. Using US quarterly data of different dimensions, we
find that both the Cholesky stochastic volatility and factor stochastic
volatility outperform the common stochastic volatility specification. Their
superior performance, however, can mostly be attributed to the more flexible
priors that accommodate cross-variable shrinkage.

arXiv link: http://arxiv.org/abs/2208.13255v1

Econometrics arXiv cross-link from q-fin.GN (q-fin.GN), submitted: 2022-08-28

An agent-based modeling approach for real-world economic systems: Example and calibration with a Social Accounting Matrix of Spain

Authors: Martin Jaraiz

The global economy is one of today's major challenges, with increasing
relevance in recent decades. A frequent observation by policy makers is the
lack of tools that help at least to understand, if not predict, economic
crises. Currently, macroeconomic modeling is dominated by Dynamic Stochastic
General Equilibrium (DSGE) models. The limitations of DSGE in coping with the
complexity of today's global economy are often recognized and are the subject
of intense research to find possible solutions. As an alternative or complement
to DSGE, the last two decades have seen the rise of agent-based models (ABM).
An attractive feature of ABM is that it can model very complex systems because
it is a bottom-up approach that can describe the specific behavior of
heterogeneous agents. The main obstacle, however, is the large number of
parameters that need to be known or calibrated. To enable the use of ABM with
data from the real-world economy, this paper describes an agent-based
macroeconomic modeling approach that can read a Social Accounting Matrix (SAM)
and deploy from scratch an economic system (labor, activity sectors operating
as firms, a central bank, the government, external sectors...) whose structure
and activity produce a SAM with values very close to those of the actual SAM
snapshot. This approach paves the way for unleashing the expected high
performance of ABMs to deal with the complexities of current global
macroeconomics, including other layers of interest such as ecology,
epidemiology, and social networks.

arXiv link: http://arxiv.org/abs/2208.13254v3

Econometrics arXiv paper, submitted: 2022-08-27

A Descriptive Method of Firm Size Transition Dynamics Using Markov Chain

Authors: Boyang You, Kerry Papps

Social employment, which is mostly provided by firms of different types,
determines the prosperity and stability of a country. Over time, fluctuations
in firm employment reflect the process of creating and destroying jobs.
Therefore, it is instructive to investigate the firm
employment (size) dynamics. Drawing on the firm-level panel data extracted from
the Chinese Industrial Enterprises Database 1998-2013, this paper proposes a
Markov-chain-based descriptive approach to clearly demonstrate the firm size
transfer dynamics between different size categories. With this method, any firm
size transition path in a short time period can be intuitively demonstrated.
Furthermore, utilizing the properties of Markov transition matrices, we
introduce and estimate the transition trend and the transition entropy. As a
result, the tendency of firm size transitions between small,
medium and large can be exactly revealed, and the uncertainty of size change
can be quantified. Generally from the evidence of this paper, it can be
inferred that small and medium manufacturing firms in China have greater job
creation potentials compared to large firms over this time period.
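
The descriptive machinery is simple to reproduce: tabulate year-over-year
transitions between size categories into a row-stochastic Markov matrix and
summarize the uncertainty of size change with row-wise entropy. The panel below
is simulated as a stand-in for the firm-level data, and the transition
probabilities are invented.

import numpy as np
import pandas as pd
from scipy.stats import entropy

rng = np.random.default_rng(12)
true_P = np.array([[0.85, 0.13, 0.02],     # small  -> small/medium/large
                   [0.10, 0.80, 0.10],     # medium -> ...
                   [0.02, 0.18, 0.80]])    # large  -> ...

n_firms, n_years = 5000, 10
state = rng.integers(3, size=n_firms)
transitions = []
for _ in range(n_years - 1):
    nxt = np.array([rng.choice(3, p=true_P[s]) for s in state])
    transitions.append(np.column_stack([state, nxt]))
    state = nxt
transitions = np.vstack(transitions)

counts = pd.crosstab(transitions[:, 0], transitions[:, 1])   # size-category counts
P_hat = counts.div(counts.sum(axis=1), axis=0).to_numpy()    # estimated transition matrix
transition_entropy = entropy(P_hat, axis=1)                  # row-wise uncertainty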

arXiv link: http://arxiv.org/abs/2208.13012v1

Econometrics arXiv paper, submitted: 2022-08-27

A restricted eigenvalue condition for unit-root non-stationary data

Authors: Etienne Wijler

In this paper, we develop a restricted eigenvalue condition for unit-root
non-stationary data and derive its validity under the assumption of independent
Gaussian innovations that may be contemporaneously correlated. The method of
proof relies on matrix concentration inequalities and offers sufficient
flexibility to enable extensions of our results to alternative time series
settings. As an application of this result, we show the consistency of the
lasso estimator on ultra high-dimensional cointegrated data in which the number
of integrated regressors may grow exponentially in relation to the sample size.

arXiv link: http://arxiv.org/abs/2208.12990v1

Econometrics arXiv updated paper (originally submitted: 2022-08-25)

Large Volatility Matrix Analysis Using Global and National Factor Models

Authors: Sung Hoon Choi, Donggyu Kim

Several large volatility matrix inference procedures have been developed,
based on the latent factor model. They often assume that there are a few
common factors that can account for volatility dynamics. However, several
studies have demonstrated the presence of local factors. In particular, when
analyzing the global stock market, we often observe that nation-specific
factors explain their own country's volatility dynamics. To account for this,
we propose the Double Principal Orthogonal complEment Thresholding
(Double-POET) method, based on multi-level factor models, and also establish
its asymptotic properties. Furthermore, we demonstrate the drawback of using
the regular principal orthogonal component thresholding (POET) when the local
factor structure exists. We also describe the blessing of dimensionality using
Double-POET for local covariance matrix estimation. Finally, we investigate the
performance of the Double-POET estimator in an out-of-sample portfolio
allocation study using international stocks from 20 financial markets.

arXiv link: http://arxiv.org/abs/2208.12323v2

Econometrics arXiv updated paper (originally submitted: 2022-08-25)

What Impulse Response Do Instrumental Variables Identify?

Authors: Bonsoo Koo, Seojeong Lee, Myung Hwan Seo, Masaya Takano

Macroeconomic shocks are often composites of multiple components. We show
that the local projection-IV (LP-IV) estimand aggregates component-wise impulse
responses with potentially negative weights, challenging its causal
interpretation. To address this, we propose identification strategies using
multiple sign-restricted IVs or disaggregated data, which recover structurally
meaningful responses even when individual LP-IV estimands are non-causal. We
also show that, under weak stationarity, the identified sets are sharp and
cannot be further narrowed in some key cases. Applications to fiscal and
monetary policy demonstrate the practical value of our approach.

arXiv link: http://arxiv.org/abs/2208.11828v3

Econometrics arXiv updated paper (originally submitted: 2022-08-24)

Robust Tests of Model Incompleteness in the Presence of Nuisance Parameters

Authors: Shuowen Chen, Hiroaki Kaido

Economic models may exhibit incompleteness depending on whether or not they
admit certain policy-relevant features such as strategic interaction,
self-selection, or state dependence. We develop a novel test of model
incompleteness and analyze its asymptotic properties. A key observation is that
one can identify the least-favorable parametric model that represents the most
challenging scenario for detecting local alternatives without knowledge of the
selection mechanism. We build a robust test of incompleteness on a score
function constructed from such a model. The proposed procedure remains
computationally tractable even with nuisance parameters because it suffices to
estimate them only under the null hypothesis of model completeness. We
illustrate the test by applying it to a market entry model and a triangular
model with a set-valued control function.

arXiv link: http://arxiv.org/abs/2208.11281v2

Econometrics arXiv updated paper (originally submitted: 2022-08-23)

Beta-Sorted Portfolios

Authors: Matias D. Cattaneo, Richard K. Crump, Weining Wang

Beta-sorted portfolios -- portfolios comprised of assets with similar
covariation to selected risk factors -- are a popular tool in empirical finance
to analyze models of (conditional) expected returns. Despite their widespread
use, little is known of their statistical properties in contrast to comparable
procedures such as two-pass regressions. We formally investigate the properties
of beta-sorted portfolio returns by casting the procedure as a two-step
nonparametric estimator with a nonparametric first step and a beta-adaptive
portfolio construction. Our framework rationalizes the well-known estimation
algorithm with precise economic and statistical assumptions on the general data
generating process and characterize its key features. We study beta-sorted
portfolios for both a single cross-section as well as for aggregation over time
(e.g., the grand mean), offering conditions that ensure consistency and
asymptotic normality along with new uniform inference procedures allowing for
uncertainty quantification and testing of various relevant hypotheses in
financial applications. We also highlight some limitations of current empirical
practices and discuss what inferences can and cannot be drawn from returns to
beta-sorted portfolios for either a single cross-section or across the whole
sample. Finally, we illustrate the functionality of our new procedures in an
empirical application.
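
A bare-bones version of the procedure under study: estimate each asset's beta on
a risk factor over a rolling window, sort assets into portfolios by those betas,
and record next-period equal-weighted portfolio returns. The simulated returns,
window length, and number of portfolios are placeholders, and none of the
paper's inference procedures are implemented here.

import numpy as np
import pandas as pd

rng = np.random.default_rng(13)
T, N, window, n_pf = 240, 200, 60, 5
factor = rng.normal(scale=0.04, size=T)
true_beta = rng.uniform(0.2, 1.8, size=N)
returns = np.outer(factor, true_beta) + rng.normal(scale=0.06, size=(T, N))

pf_returns = []
for t in range(window, T - 1):
    f = factor[t - window:t]
    r = returns[t - window:t]
    betas = ((r - r.mean(0)) * (f - f.mean())[:, None]).mean(0) / f.var()  # OLS slopes
    bucket = pd.qcut(betas, n_pf, labels=False)        # sort assets into beta quintiles
    pf_returns.append([returns[t + 1, bucket == q].mean() for q in range(n_pf)])
pf_returns = np.array(pf_returns)                      # (time) x (portfolio) returns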

arXiv link: http://arxiv.org/abs/2208.10974v3

Econometrics arXiv updated paper (originally submitted: 2022-08-23)

pystacked: Stacking generalization and machine learning in Stata

Authors: Achim Ahrens, Christian B. Hansen, Mark E. Schaffer

pystacked implements stacked generalization (Wolpert, 1992) for regression
and binary classification via Python's scikit-learn. Stacking combines multiple
supervised machine learners -- the "base" or "level-0" learners -- into a
single learner. The currently supported base learners include regularized
regression, random forest, gradient boosted trees, support vector machines, and
feed-forward neural nets (multi-layer perceptron). pystacked can also be used
as a `regular' machine learning program to fit a single base learner and,
thus, provides an easy-to-use API for scikit-learn's machine learning
algorithms.
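
Because pystacked wraps scikit-learn, the stacking step it performs can be
illustrated directly in Python with StackingRegressor; the snippet below shows
the underlying machinery with a few base learners and a ridge meta-learner, not
pystacked's Stata syntax or defaults.

from sklearn.datasets import make_regression
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=20, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base_learners = [                         # the "level-0" learners
    ("lasso", LassoCV(cv=5)),
    ("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
    ("gbt", GradientBoostingRegressor(random_state=0)),
]
stack = StackingRegressor(estimators=base_learners, final_estimator=RidgeCV(), cv=5)
stack.fit(X_tr, y_tr)
print(stack.score(X_te, y_te))            # out-of-sample R^2 of the stacked learner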

arXiv link: http://arxiv.org/abs/2208.10896v2

Econometrics arXiv updated paper (originally submitted: 2022-08-20)

Optimal Pre-Analysis Plans: Statistical Decisions Subject to Implementability

Authors: Maximilian Kasy, Jann Spiess

What is the purpose of pre-analysis plans, and how should they be designed?
We model the interaction between an agent who analyzes data and a principal who
makes a decision based on agent reports. The agent could be the manufacturer of
a new drug, and the principal a regulator deciding whether the drug is
approved. Or the agent could be a researcher submitting a research paper, and
the principal an editor deciding whether it is published. The agent decides
which statistics to report to the principal. The principal cannot verify
whether the analyst reported selectively. Absent a pre-analysis message, if
there are conflicts of interest, then many desirable decision rules cannot be
implemented. Allowing the agent to send a message before seeing the data
increases the set of decision rules that can be implemented, and allows the
principal to leverage agent expertise. The optimal mechanisms that we
characterize require pre-analysis plans. Applying these results to hypothesis
testing, we show that optimal rejection rules pre-register a valid test, and
make worst-case assumptions about unreported statistics. Optimal tests can be
found as a solution to a linear-programming problem.

arXiv link: http://arxiv.org/abs/2208.09638v3

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2022-08-19

Deep Learning for Choice Modeling

Authors: Zhongze Cai, Hanzhao Wang, Kalyan Talluri, Xiaocheng Li

Choice modeling has been a central topic in the study of individual
preference or utility across many fields including economics, marketing,
operations research, and psychology. While the vast majority of the literature
on choice models has been devoted to the analytical properties that lead to
managerial and policy-making insights, the existing methods to learn a choice
model from empirical data are often either computationally intractable or
sample inefficient. In this paper, we develop deep learning-based choice models
under two settings of choice modeling: (i) feature-free and (ii) feature-based.
Our model captures both the intrinsic utility for each candidate choice and the
effect that the assortment has on the choice probability. Synthetic and real
data experiments demonstrate the performance of the proposed models in terms of
the recovery of the existing choice models, sample complexity, assortment
effect, architecture design, and model interpretation.

arXiv link: http://arxiv.org/abs/2208.09325v1

Econometrics arXiv paper, submitted: 2022-08-19

Understanding Volatility Spillover Relationship Among G7 Nations And India During Covid-19

Authors: Avik Das, Devanjali Nandi Das

Purpose: In the context of a COVID pandemic in 2020-21, this paper attempts
to capture the interconnectedness and volatility transmission dynamics. The
nature of change in volatility spillover effects and time-varying conditional
correlation among the G7 countries and India is investigated. Methodology: To
assess the volatility spillover effects, the bivariate BEKK and t- DCC (1,1)
GARCH (1,1) models have been used. Our research shows how the dynamics of
volatility spillover between India and the G7 countries shift before and during
COVID-19. Findings: The findings reveal that the extent of volatility spillover
has altered during COVID compared to the pre-COVID environment. During this
pandemic, a sharp increase in conditional correlation indicates an increase in
systematic risk between countries. Originality: The study contributes to a
better understanding of the dynamics of volatility spillover between G7
countries and India. Asset managers and foreign corporations can use the
changing spillover dynamics to improve investment decisions and implement
effective hedging measures to protect their interests. Furthermore, this
research will assist financial regulators in assessing market risk in the
future owing to crises such as COVID-19.

arXiv link: http://arxiv.org/abs/2208.09148v1

Econometrics arXiv paper, submitted: 2022-08-19

On the Estimation of Peer Effects for Sampled Networks

Authors: Mamadou Yauck

This paper deals with the estimation of exogenous peer effects for partially
observed networks under the new inferential paradigm of design identification,
which characterizes the missing data challenge arising with sampled networks
with the central idea that two full data versions which are topologically
compatible with the observed data may give rise to two different probability
distributions. We show that peer effects cannot be identified by design when
network links between sampled and unsampled units are not observed. Under
realistic modeling conditions, and under the assumption that sampled units
report on the size of their network of contacts, the asymptotic bias arising
from estimating peer effects with incomplete network data is characterized, and
a bias-corrected estimator is proposed. The finite sample performance of our
methodology is investigated via simulations.

arXiv link: http://arxiv.org/abs/2208.09102v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-08-18

Matrix Quantile Factor Model

Authors: Xin-Bing Kong, Yong-Xin Liu, Long Yu, Peng Zhao

This paper introduces a matrix quantile factor model for matrix-valued data
with low-rank structure. We estimate the row and column factor spaces via
minimizing the empirical check loss function with orthogonal rotation
constraints. We show that the estimates converge at rate
$(\min\{p_1p_2,p_2T,p_1T\})^{-1/2}$ in the average Frobenius norm, where $p_1$,
$p_2$ and $T$ are the row dimensionality, column dimensionality and length of
the matrix sequence, respectively. This rate is faster than that of the
quantile estimates obtained by "flattening" the matrix model into a large vector
model. To derive the central limit theorem, we introduce a novel augmented
Lagrangian function, which is equivalent to the original constrained empirical
check loss minimization problem. Via the equivalence, we prove that the Hessian
matrix of the augmented Lagrangian function is locally positive definite,
resulting in a locally convex penalized loss function around the true factors
and their loadings. This easily leads to a feasible second-order expansion of
the score function and readily established central limit theorems of the
smoothed estimates of the loadings. We provide three consistent criteria to
determine the pair of row and column factor numbers. Extensive simulation
studies and an empirical study justify our theory.

arXiv link: http://arxiv.org/abs/2208.08693v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-08-17

Inference on Strongly Identified Functionals of Weakly Identified Functions

Authors: Andrew Bennett, Nathan Kallus, Xiaojie Mao, Whitney Newey, Vasilis Syrgkanis, Masatoshi Uehara

In a variety of applications, including nonparametric instrumental variable
(NPIV) analysis, proximal causal inference under unmeasured confounding, and
missing-not-at-random data with shadow variables, we are interested in
inference on a continuous linear functional (e.g., average causal effects) of
a nuisance function (e.g., NPIV regression) defined by conditional moment
restrictions. These nuisance functions are generally weakly identified, in that
the conditional moment restrictions can be severely ill-posed as well as admit
multiple solutions. This is sometimes resolved by imposing strong conditions
that imply the function can be estimated at rates that make inference on the
functional possible. In this paper, we study a novel condition for the
functional to be strongly identified even when the nuisance function is not;
that is, the functional is amenable to asymptotically-normal estimation at
root-$n$ rates. The condition implies the existence of debiasing nuisance
functions, and we propose penalized minimax estimators for both the primary and
debiasing nuisance functions. The proposed nuisance estimators can accommodate
flexible function classes, and importantly they can converge to fixed limits
determined by the penalization regardless of the identifiability of the
nuisances. We use the penalized nuisance estimators to form a debiased
estimator for the functional of interest and prove its asymptotic normality
under generic high-level conditions, which provide for asymptotically valid
confidence intervals. We also illustrate our method in a novel partially linear
proximal causal inference problem and a partially linear instrumental variable
regression problem.

arXiv link: http://arxiv.org/abs/2208.08291v3

Econometrics arXiv paper, submitted: 2022-08-17

Time is limited on the road to asymptopia

Authors: Ivonne Schwartz, Mark Kirstein

One challenge in the estimation of financial market agent-based models
(FABMs) is to infer reliable insights using numerical simulations validated by
only a single observed time series. Ergodicity (besides stationarity) is a
strong precondition for any estimation; however, it has not been systematically
explored and is often simply presumed. For finite sample lengths and limited
computational resources, empirical estimation always takes place in
pre-asymptopia. Thus, broken ergodicity must be considered the rule, but it
remains largely unclear how to deal with the remaining uncertainty in
non-ergodic observables. Here we show how an understanding of the ergodic
properties of moment functions can help to improve the estimation of (F)ABMs.
We run Monte Carlo experiments and study the convergence behaviour of moment
functions of two prototype models. We find infeasibly long convergence times
for most of them. Choosing an efficient mix of ensemble size and simulated time length
guided our estimation and might help in general.

arXiv link: http://arxiv.org/abs/2208.08169v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2022-08-17

Characterizing M-estimators

Authors: Timo Dimitriadis, Tobias Fissler, Johanna Ziegel

We characterize the full classes of M-estimators for semiparametric models of
general functionals by formally connecting the theory of consistent loss
functions from forecast evaluation with the theory of M-estimation. This novel
characterization result opens up the possibility for theoretical research on
efficient and equivariant M-estimation and, more generally, it makes it
possible to leverage existing results on loss functions from the forecast
evaluation literature in estimation theory.

arXiv link: http://arxiv.org/abs/2208.08108v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-08-13

Optimal Recovery for Causal Inference

Authors: Ibtihal Ferwana, Lav R. Varshney

Problems in causal inference can be fruitfully addressed using signal
processing techniques. As an example, it is crucial to successfully quantify
the causal effects of an intervention to determine whether the intervention
achieved desired outcomes. We present a new geometric signal processing
approach to classical synthetic control called ellipsoidal optimal recovery
(EOpR), for estimating the unobservable outcome of a treatment unit. EOpR
provides policy evaluators with both worst-case and typical outcomes to help in
decision making. It is an approximation-theoretic technique that relates to the
theory of principal components, which recovers unknown observations given a
learned signal class and a set of known observations. We show EOpR can improve
pre-treatment fit and mitigate bias of the post-treatment estimate relative to
other methods in causal inference. Beyond recovery of the unit of interest, an
advantage of EOpR is that it produces worst-case limits over the estimates
produced. We assess our approach on artificially-generated data, on datasets
commonly used in the econometrics literature, and in the context of the
COVID-19 pandemic, showing better performance than baseline techniques.

arXiv link: http://arxiv.org/abs/2208.06729v3

Econometrics arXiv cross-link from General Economics (econ.GN), submitted: 2022-08-13

From the historical Roman road network to modern infrastructure in Italy

Authors: Luca De Benedictis, Vania Licio, Anna Pinna

An integrated and widespread road system, like the one built during the Roman
Empire in Italy, plays an important role today in facilitating the construction
of new infrastructure. This paper investigates the historical path of Roman
roads as a main determinant of both motorways and railways in the country. The
empirical analysis shows how the modern Italian transport infrastructure
followed the path traced in ancient times by the Romans in constructing their
roads. Being paved and connecting Italy from north to south, the consular
routes persisted over time, providing the initial physical capital for
developing the new transport networks.

arXiv link: http://arxiv.org/abs/2208.06675v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2022-08-12

A Nonparametric Approach with Marginals for Modeling Consumer Choice

Authors: Yanqiu Ruan, Xiaobo Li, Karthyek Murthy, Karthik Natarajan

Given data on the choices made by consumers for different offer sets, a key
challenge is to develop parsimonious models that describe and predict consumer
choice behavior while being amenable to prescriptive tasks such as pricing and
assortment optimization. The marginal distribution model (MDM) is one such
model, which requires only the specification of marginal distributions of the
random utilities. This paper aims to establish necessary and sufficient
conditions for given choice data to be consistent with the MDM hypothesis,
inspired by the usefulness of similar characterizations for the random utility
model (RUM). This endeavor leads to an exact characterization of the set of
choice probabilities that the MDM can represent. Verifying the consistency of
choice data with this characterization is equivalent to solving a
polynomial-sized linear program. Since the analogous verification task for RUM
is computationally intractable and neither of these models subsumes the other,
MDM is helpful in striking a balance between tractability and representational
power. The characterization is then used with robust optimization for making
data-driven sales and revenue predictions for new unseen assortments. When the
choice data lacks consistency with the MDM hypothesis, finding the best-fitting
MDM choice probabilities reduces to solving a mixed integer convex program.
Numerical results using real world data and synthetic data demonstrate that MDM
exhibits competitive representational power and prediction performance compared
to RUM and parametric models while being significantly faster in computation
than RUM.

arXiv link: http://arxiv.org/abs/2208.06115v6

Econometrics arXiv updated paper (originally submitted: 2022-08-10)

Testing for homogeneous treatment effects in linear and nonparametric instrumental variable models

Authors: Jad Beyhum, Jean-Pierre Florens, Elia Lapenta, Ingrid Van Keilegom

The hypothesis of homogeneous treatment effects is central to the
instrumental variables literature. This assumption signifies that treatment
effects are constant across all subjects. It allows instrumental variable
estimates to be interpreted as average treatment effects over the whole study
population. When this assumption does not hold, the bias of instrumental
variable estimators can be larger than that of naive estimators ignoring
endogeneity. This paper develops two tests for the assumption of homogeneous
treatment effects when the treatment is endogenous and an instrumental variable
is available. The tests leverage a covariate that is (jointly with the error
terms) independent of a coordinate of the instrument. This covariate does not
need to be exogenous. The first test assumes that the potential outcomes are
linear in the regressors and is computationally simple. The second test is
nonparametric and relies on Tikhonov regularization. The treatment can be
either discrete or continuous. We show that the tests have asymptotically
correct level and asymptotic power equal to one against a range of
alternatives. Simulations demonstrate that the proposed tests attain excellent
finite sample performances. The methodology is also applied to the evaluation
of returns to schooling and the effect of price on demand in a fish market.

arXiv link: http://arxiv.org/abs/2208.05344v4

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-08-10

Selecting Valid Instrumental Variables in Linear Models with Multiple Exposure Variables: Adaptive Lasso and the Median-of-Medians Estimator

Authors: Xiaoran Liang, Eleanor Sanderson, Frank Windmeijer

In a linear instrumental variables (IV) setting for estimating the causal
effects of multiple confounded exposure/treatment variables on an outcome, we
investigate the adaptive Lasso method for selecting valid instrumental
variables from a set of available instruments that may contain invalid ones. An
instrument is invalid if it fails the exclusion conditions and enters the model
as an explanatory variable. We extend the results developed in Windmeijer et
al. (2019) for the single-exposure model to the multiple-exposure case. In
particular, we propose a median-of-medians estimator and show that the
conditions on the minimum number of valid instruments under which this
estimator is consistent for the causal effects are only moderately stronger
than the simple majority rule that applies to the median estimator for the
single exposure case. The adaptive Lasso method which uses the initial
median-of-medians estimator for the penalty weights achieves consistent
selection with oracle properties of the resulting IV estimator. This is
confirmed by some Monte Carlo simulation results. We apply the method to
estimate the causal effects of educational attainment and cognitive ability on
body mass index (BMI) in a Mendelian Randomization setting.

arXiv link: http://arxiv.org/abs/2208.05278v1
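
For readers unfamiliar with the majority rule mentioned above, the
single-exposure baseline that the paper generalizes can be sketched as follows:
compute the just-identified IV (ratio) estimate instrument by instrument and
take the median, which is consistent when more than half of the candidate
instruments are valid. The simulated data, variable names, and use of numpy
below are illustrative assumptions, not the authors' code.

    import numpy as np

    rng = np.random.default_rng(0)
    n, L = 2000, 9                                 # observations, candidate instruments
    Z = rng.normal(size=(n, L))
    u = rng.normal(size=n)                         # structural error
    d = Z @ np.full(L, 0.5) + 0.8 * u + rng.normal(size=n)   # endogenous exposure
    alpha = np.array([0, 0, 0, 0, 0, 0, 0.3, 0.3, 0.3])      # last three instruments invalid
    y = 1.0 * d + Z @ alpha + u                               # true causal effect = 1.0

    # Per-instrument ratio estimates: reduced-form slope / first-stage slope
    gamma = np.array([np.polyfit(Z[:, j], d, 1)[0] for j in range(L)])
    Gamma = np.array([np.polyfit(Z[:, j], y, 1)[0] for j in range(L)])
    beta_med = np.median(Gamma / gamma)            # robust to a minority of invalid instruments

The paper's median-of-medians estimator plays the analogous role when there are
several exposures, and the adaptive Lasso then uses it to build the penalty
weights.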

Econometrics arXiv paper, submitted: 2022-08-09

Endogeneity in Weakly Separable Models without Monotonicity

Authors: Songnian Chen, Shakeeb Khan, Xun Tang

We identify and estimate treatment effects when potential outcomes are weakly
separable with a binary endogenous treatment. Vytlacil and Yildiz (2007)
proposed an identification strategy that exploits the mean of observed
outcomes, but their approach requires a monotonicity condition. In comparison,
we exploit full information in the entire outcome distribution, instead of just
its mean. As a result, our method does not require monotonicity and is also
applicable to general settings with multiple indices. We provide examples where
our approach can identify treatment effect parameters of interest whereas
existing methods would fail. These include models where potential outcomes
depend on multiple unobserved disturbance terms, such as a Roy model, a
multinomial choice model, as well as a model with endogenous random
coefficients. We establish consistency and asymptotic normality of our
estimators.

arXiv link: http://arxiv.org/abs/2208.05047v1

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2022-08-07

Finite Tests from Functional Characterizations

Authors: Charles Gauthier, Raghav Malhotra, Agustin Troccoli Moretti

Classically, testing whether decision makers belong to specific preference
classes involves two main approaches. The first, known as the functional
approach, assumes access to an entire demand function. The second, the revealed
preference approach, constructs inequalities to test finite demand data. This
paper bridges these methods by using the functional approach to test finite
data through preference learnability results. We develop a computationally
efficient algorithm that generates tests for choice data based on functional
characterizations of preference families. We provide these restrictions for
various applications, including homothetic and weakly separable preferences,
where the latter's revealed preference characterization is provably NP-Hard. We
also address choice under uncertainty, offering tests for betweenness
preferences. Lastly, we perform a simulation exercise demonstrating that our
tests are effective in finite samples and accurately reject demands not
belonging to a specified class.

arXiv link: http://arxiv.org/abs/2208.03737v5

Econometrics arXiv cross-link from cs.DL (cs.DL), submitted: 2022-08-07

Strategic differences between regional investments into graphene technology and how corporations and universities manage patent portfolios

Authors: Ai Linh Nguyen, Wenyuan Liu, Khiam Aik Khor, Andrea Nanetti, Siew Ann Cheong

Nowadays, patenting activities are essential in converting applied science to
technology in the prevailing innovation model. To gain strategic advantages in
the technological competitions between regions, nations need to leverage the
investments of public and private funds to diversify over all technologies or
specialize in a small number of technologies. In this paper, we investigated
who the leaders are at the regional and assignee levels, how they attained
their leadership positions, and whether they adopted diversification or
specialization strategies, using a dataset of 176,193 patent records on
graphene between 1986 and 2017 downloaded from Derwent Innovation. By applying
a co-clustering method to the IPC subclasses in the patents and using a z-score
method to extract keywords from their titles and abstracts, we identified seven
graphene technology areas emerging in the sequence: synthesis, composites,
sensors, devices, catalysts, batteries, and water treatment. We then examined
the top regions in their investment preferences and their changes in rankings
over time and found that they invested in all seven technology areas. In
contrast, at the assignee level, some were diversified while others were
specialized. We found that large entities diversified their portfolios across
multiple technology areas, while small entities specialized around their core
competencies. In addition, we found that universities had higher entropy values
than corporations on average, leading us to the hypothesis that corporations
file, buy, or sell patents to enable product development, whereas
universities focus only on licensing their patents. We validated this
hypothesis through an aggregate analysis of reassignment and licensing and a
more detailed analysis of three case studies - SAMSUNG, RICE UNIVERSITY, and
DYSON.

arXiv link: http://arxiv.org/abs/2208.03719v1

Econometrics arXiv updated paper (originally submitted: 2022-08-07)

Quantile Random-Coefficient Regression with Interactive Fixed Effects: Heterogeneous Group-Level Policy Evaluation

Authors: Ruofan Xu, Jiti Gao, Tatsushi Oka, Yoon-Jae Whang

We propose a quantile random-coefficient regression with interactive fixed
effects to study the effects of group-level policies that are heterogeneous
across individuals. Our approach is the first to use a latent factor structure
to handle the unobservable heterogeneities in the random coefficient. The
asymptotic properties and an inferential method for the policy estimators are
established. The model is applied to evaluate the effect of the minimum wage
policy on earnings between 1967 and 1980 in the United States. Our results
suggest that the minimum wage policy has significant and persistent positive
effects on black workers and female workers up to the median. Our results also
indicate that the policy helps reduce income disparity up to the median between
two groups: black, female workers versus white, male workers. However, the
policy is shown to have little effect on narrowing the income gap between low-
and high-income workers within the subpopulations.

arXiv link: http://arxiv.org/abs/2208.03632v3

Econometrics arXiv updated paper (originally submitted: 2022-08-06)

Forecasting Algorithms for Causal Inference with Panel Data

Authors: Jacob Goldin, Julian Nyarko, Justin Young

Conducting causal inference with panel data is a core challenge in social
science research. We adapt a deep neural architecture for time series
forecasting (the N-BEATS algorithm) to more accurately impute the
counterfactual evolution of a treated unit had treatment not occurred. Across a
range of settings, the resulting estimator (“SyNBEATS”) significantly
outperforms commonly employed methods (synthetic controls, two-way fixed
effects), and attains comparable or more accurate performance compared to
recently proposed methods (synthetic difference-in-differences, matrix
completion). An implementation of this estimator is available for public use.
Our results highlight how advances in the forecasting literature can be
harnessed to improve causal inference in panel data settings.

arXiv link: http://arxiv.org/abs/2208.03489v3
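
The general recipe described above can be illustrated with any time-series
forecaster: fit it to the treated unit's pre-treatment outcomes, forecast the
post-treatment path as the imputed counterfactual, and average the
observed-minus-forecast gaps. The sketch below uses a simple OLS autoregression
as a stand-in for N-BEATS; the simulated data, lag order, and function names
are assumptions of the illustration.

    import numpy as np

    def ar_forecast(y_pre, p=4, horizon=8):
        """Fit an AR(p) by OLS on the pre-treatment series and iterate forecasts forward."""
        X = np.column_stack([y_pre[i:len(y_pre) - p + i] for i in range(p)])
        coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(len(y_pre) - p), X]),
                                   y_pre[p:], rcond=None)
        hist, out = list(y_pre[-p:]), []
        for _ in range(horizon):
            out.append(coef[0] + np.dot(coef[1:], hist[-p:]))
            hist.append(out[-1])
        return np.array(out)

    rng = np.random.default_rng(1)
    T, T0, effect = 60, 48, 2.0                           # treatment starts at period T0
    y_obs = np.cumsum(rng.normal(0.1, 1.0, T))
    y_obs[T0:] += effect
    y_cf = ar_forecast(y_obs[:T0], p=4, horizon=T - T0)   # imputed counterfactual path
    att_hat = np.mean(y_obs[T0:] - y_cf)                  # observed minus forecast gap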

Econometrics arXiv updated paper (originally submitted: 2022-08-05)

Partial Identification of Personalized Treatment Response with Trial-reported Analyses of Binary Subgroups

Authors: Sheyu Li, Valentyn Litvin, Charles F. Manski

Medical journals have adhered to a reporting practice that seriously limits
the usefulness of published trial findings. Medical decision makers commonly
observe many patient covariates and seek to use this information to personalize
treatment choices. Yet standard summaries of trial findings only partition
subjects into broad subgroups, typically into binary categories. Given this
reporting practice, we study the problem of inference on long mean treatment
outcomes $E[y(t)|x]$, where $t$ is a treatment, $y(t)$ is a treatment outcome, and
the covariate vector $x$ has length $K$, each component being a binary variable.
The available data are estimates of $\{E[y(t)|x_k = 0], E[y(t)|x_k = 1], P(x_k)\}$,
$k = 1, \dots, K$, reported in journal articles. We show that reported trial
findings partially identify $\{E[y(t)|x], P(x)\}$. Illustrative computations
demonstrate that the summaries of trial findings in journal articles may imply
only wide bounds on long mean outcomes. One can realistically tighten
inferences if one can combine reported trial findings with credible assumptions
having identifying power, such as bounded-variation assumptions.

arXiv link: http://arxiv.org/abs/2208.03381v2
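
The source of partial identification can be stated in one display. By the law
of total probability, each reported subgroup summary is a known mixture of the
finer-grained quantities of interest (notation as in the abstract):

    \[
    E[y(t)\mid x_k = 1]\,P(x_k = 1) \;=\; \sum_{\xi:\,\xi_k = 1} E[y(t)\mid x = \xi]\,P(x = \xi),
    \qquad
    P(x_k = 1) \;=\; \sum_{\xi:\,\xi_k = 1} P(x = \xi),
    \]

for $k = 1,\dots,K$, and analogously for $x_k = 0$. The identified set for
$\{E[y(t)|x], P(x)\}$ is the collection of values satisfying these linear
restrictions together with any maintained assumptions, such as bounded
variation.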

Econometrics arXiv updated paper (originally submitted: 2022-08-04)

Factor Network Autoregressions

Authors: Matteo Barigozzi, Giuseppe Cavaliere, Graziano Moramarco

We propose a factor network autoregressive (FNAR) model for time series with
complex network structures. The coefficients of the model reflect many
different types of connections between economic agents ("multilayer network"),
which are summarized into a smaller number of network matrices ("network
factors") through a novel tensor-based principal component approach. We provide
consistency and asymptotic normality results for the estimation of the factors,
their loadings, and the coefficients of the FNAR, as the number of layers,
nodes and time points diverges to infinity. Our approach combines two different
dimension-reduction techniques and can be applied to high-dimensional datasets.
Simulation results show the goodness of our estimators in finite samples. In an
empirical application, we use the FNAR to investigate the cross-country
interdependence of GDP growth rates based on a variety of international trade
and financial linkages. The model provides a rich characterization of
macroeconomic network effects as well as good forecasts of GDP growth rates.

arXiv link: http://arxiv.org/abs/2208.02925v7

Econometrics arXiv cross-link from math.PR (math.PR), submitted: 2022-08-04

Weak convergence to derivatives of fractional Brownian motion

Authors: Søren Johansen, Morten Ørregaard Nielsen

It is well known that, under suitable regularity conditions, the normalized
fractional process with fractional parameter $d$ converges weakly to fractional
Brownian motion for $d>1/2$. We show that, for any non-negative integer $M$,
derivatives of order $m=0,1,\dots,M$ of the normalized fractional process with
respect to the fractional parameter $d$, jointly converge weakly to the
corresponding derivatives of fractional Brownian motion. As an illustration we
apply the results to the asymptotic distribution of the score vectors in the
multifractional vector autoregressive model.

arXiv link: http://arxiv.org/abs/2208.02516v2

Econometrics arXiv paper, submitted: 2022-08-04

Difference-in-Differences with a Misclassified Treatment

Authors: Akanksha Negi, Digvijay Singh Negi

This paper studies identification and estimation of the average treatment
effect on the treated (ATT) in difference-in-difference (DID) designs when the
variable that classifies individuals into treatment and control groups
(treatment status, D) is endogenously misclassified. We show that
misclassification in D hampers consistent estimation of the ATT because (1) it
prevents us from distinguishing the truly treated from those misclassified as
treated and (2) differential misclassification in counterfactual trends
may result in parallel trends being violated with D even when they hold with
the true but unobserved D*. We propose a solution to correct for endogenous
one-sided misclassification in the context of a parametric DID regression which
allows for considerable heterogeneity in treatment effects and establish its
asymptotic properties in panel and repeated cross section settings.
Furthermore, we illustrate the method by using it to estimate the insurance
impact of a large-scale in-kind food transfer program in India which is known
to suffer from large targeting errors.

arXiv link: http://arxiv.org/abs/2208.02412v1

Econometrics arXiv updated paper (originally submitted: 2022-08-03)

The Econometrics of Financial Duration Modeling

Authors: Giuseppe Cavaliere, Thomas Mikosch, Anders Rahbek, Frederik Vilandt

We establish new results for estimation and inference in financial durations
models, where events are observed over a given time span, such as a trading
day, or a week. For the classical autoregressive conditional duration (ACD)
models by Engle and Russell (1998, Econometrica 66, 1127-1162), we show that
the large sample behavior of likelihood estimators is highly sensitive to the
tail behavior of the financial durations. In particular, even under
stationarity, asymptotic normality breaks down for tail indices smaller than
one or, equivalently, when the clustering behaviour of the observed events is
such that the unconditional distribution of the durations has no finite mean.
Instead, we find that estimators are mixed Gaussian and have non-standard rates
of convergence. The results are based on exploiting the crucial fact that for
duration data the number of observations within any given time span is random.
Our results apply to general econometric models where the number of observed
events is random.

arXiv link: http://arxiv.org/abs/2208.02098v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-08-03

Bayesian ranking and selection with applications to field studies, economic mobility, and forecasting

Authors: Dillon Bowen

Decision-making often involves ranking and selection. For example, to
assemble a team of political forecasters, we might begin by narrowing our
choice set to the candidates we are confident rank among the top 10% in
forecasting ability. Unfortunately, we do not know each candidate's true
ability but observe a noisy estimate of it. This paper develops new Bayesian
algorithms to rank and select candidates based on noisy estimates. Using
simulations based on empirical data, we show that our algorithms often
outperform frequentist ranking and selection algorithms. Our Bayesian ranking
algorithms yield shorter rank confidence intervals while maintaining
approximately correct coverage. Our Bayesian selection algorithms select more
candidates while maintaining correct error rates. We apply our ranking and
selection procedures to field experiments, economic mobility, forecasting, and
similar problems. Finally, we implement our ranking and selection techniques in
a user-friendly Python package documented here:
https://dsbowen-conditional-inference.readthedocs.io/en/latest/.

arXiv link: http://arxiv.org/abs/2208.02038v1
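
A generic illustration of the approach, not the API of the linked package:
combine each noisy estimate with a normal prior and use posterior simulation to
obtain rank intervals and top-decile selection probabilities. The normal-normal
model, prior values, and variable names are assumptions of this sketch.

    import numpy as np

    rng = np.random.default_rng(2)
    K = 50
    theta = rng.normal(0.0, 1.0, K)                 # true (unknown) abilities
    se = rng.uniform(0.3, 1.0, K)                   # standard errors of the estimates
    est = theta + rng.normal(0.0, se)               # observed noisy estimates

    # Normal prior N(mu0, tau0^2) combined with est_k ~ N(theta_k, se_k^2)
    mu0, tau0 = 0.0, 1.0
    post_var = 1.0 / (1.0 / tau0**2 + 1.0 / se**2)
    post_mean = post_var * (mu0 / tau0**2 + est / se**2)

    # Posterior simulation of ranks (rank 1 = best candidate)
    draws = rng.normal(post_mean, np.sqrt(post_var), size=(5000, K))
    ranks = (-draws).argsort(axis=1).argsort(axis=1) + 1
    rank_ci = np.percentile(ranks, [2.5, 97.5], axis=0)    # 95% rank intervals
    p_top10 = (ranks <= int(0.1 * K)).mean(axis=0)         # P(candidate ranks in top 10%)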

Econometrics arXiv updated paper (originally submitted: 2022-08-03)

Bootstrap inference in the presence of bias

Authors: Giuseppe Cavaliere, Sílvia Gonçalves, Morten Ørregaard Nielsen, Edoardo Zanelli

We consider bootstrap inference for estimators which are (asymptotically)
biased. We show that, even when the bias term cannot be consistently estimated,
valid inference can be obtained by proper implementations of the bootstrap.
Specifically, we show that the prepivoting approach of Beran (1987, 1988),
originally proposed to deliver higher-order refinements, restores bootstrap
validity by transforming the original bootstrap p-value into an asymptotically
uniform random variable. We propose two different implementations of
prepivoting (plug-in and double bootstrap), and provide general high-level
conditions that imply validity of bootstrap inference. To illustrate the
practical relevance and implementation of our results, we discuss five
examples: (i) inference on a target parameter based on model averaging; (ii)
ridge-type regularized estimators; (iii) nonparametric regression; (iv) a
location model for infinite variance data; and (v) dynamic panel data models.

arXiv link: http://arxiv.org/abs/2208.02028v3

Econometrics arXiv paper, submitted: 2022-08-03

Weak Instruments, First-Stage Heteroskedasticity, the Robust F-Test and a GMM Estimator with the Weight Matrix Based on First-Stage Residuals

Authors: Frank Windmeijer

This paper is concerned with the findings related to the robust first-stage
F-statistic in the Monte Carlo analysis of Andrews (2018), who found in a
heteroskedastic grouped-data design that even for very large values of the
robust F-statistic, the standard 2SLS confidence intervals had large coverage
distortions. This finding appears to discredit the robust F-statistic as a test
for underidentification. However, it is shown here that large values of the
robust F-statistic do imply that there is first-stage information, but this may
not be utilized well by the 2SLS estimator, or the standard GMM estimator. An
estimator that corrects for this is a robust GMM estimator, denoted GMMf, with
the robust weight matrix not based on the structural residuals, but on the
first-stage residuals. For the grouped-data setting of Andrews (2018), this
GMMf estimator gives the weights to the group specific estimators according to
the group specific concentration parameters in the same way as 2SLS does under
homoskedasticity, which is formally shown using weak instrument asymptotics.
The GMMf estimator is much better behaved than the 2SLS estimator in the
Andrews (2018) design, behaving well in terms of relative bias and Wald-test
size distortion at more standard values of the robust F-statistic. We show that
the same patterns can occur in a dynamic panel data model when the error
variance is heteroskedastic over time. We further derive the conditions under
which the Stock and Yogo (2005) weak instruments critical values apply to the
robust F-statistic in relation to the behaviour of the GMMf estimator.

arXiv link: http://arxiv.org/abs/2208.01967v1
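
The weighting idea is simple to write down for a linear IV model with a single
endogenous regressor and no included exogenous controls (a simplification of
this sketch, not of the paper): build the heteroskedasticity-robust weight
matrix from first-stage rather than structural residuals.

    import numpy as np

    def gmmf(y, x, Z):
        """GMM for y = x*beta + u with instruments Z, weight matrix from first-stage residuals."""
        n = len(y)
        pi_hat = np.linalg.lstsq(Z, x, rcond=None)[0]          # first stage: x on Z
        v_hat = x - Z @ pi_hat                                  # first-stage residuals
        W = np.linalg.inv((Z * v_hat[:, None] ** 2).T @ Z / n)  # robust weight based on v_hat
        Zx, Zy = Z.T @ x / n, Z.T @ y / n
        return (Zx @ W @ Zy) / (Zx @ W @ Zx)                    # GMMf point estimate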

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2022-08-02

Multifractal cross-correlations of bitcoin and ether trading characteristics in the post-COVID-19 time

Authors: Marcin Wątorek, Jarosław Kwapień, Stanisław Drożdż

Unlike price fluctuations, the temporal structure of cryptocurrency trading
has seldom been a subject of systematic study. In order to fill this gap, we
analyse detrended correlations of the price returns, the average number of
trades in time unit, and the traded volume based on high-frequency data
representing two major cryptocurrencies: bitcoin and ether. We apply the
multifractal detrended cross-correlation analysis, which is considered the most
reliable method for identifying nonlinear correlations in time series. We find
that all the quantities considered in our study show an unambiguous
multifractal structure from both the univariate (auto-correlation) and
bivariate (cross-correlation) perspectives. We looked at the bitcoin-ether
cross-correlations in simultaneously recorded signals, as well as in
time-lagged signals, in which a time series for one of the cryptocurrencies is
shifted with respect to the other. Such a shift suppresses the
cross-correlations partially for short time scales, but does not remove them
completely. We did not observe any qualitative asymmetry in the results for the
two choices of a leading asset. The cross-correlations for the simultaneous and
lagged time series became the same in magnitude at sufficiently long
scales.

arXiv link: http://arxiv.org/abs/2208.01445v1

Econometrics arXiv updated paper (originally submitted: 2022-08-02)

Doubly Robust Estimation of Local Average Treatment Effects Using Inverse Probability Weighted Regression Adjustment

Authors: Tymon Słoczyński, S. Derya Uysal, Jeffrey M. Wooldridge

We revisit the problem of estimating the local average treatment effect
(LATE) and the local average treatment effect on the treated (LATT) when
control variables are available, either to render the instrumental variable
(IV) suitably exogenous or to improve precision. Unlike previous approaches,
our doubly robust (DR) estimation procedures use quasi-likelihood methods
weighted by the inverse of the IV propensity score - so-called inverse
probability weighted regression adjustment (IPWRA) estimators. By properly
choosing models for the propensity score and outcome models, fitted values are
ensured to be in the logical range determined by the response variable,
producing DR estimators of LATE and LATT with appealing small sample
properties. Inference is relatively straightforward both analytically and using
the nonparametric bootstrap. Our DR LATE and DR LATT estimators work well in
simulations. We also propose a DR version of the Hausman test that can be used
to assess the unconfoundedness assumption through a comparison of different
estimates of the average treatment effect on the treated (ATT) under one-sided
noncompliance. Unlike the usual test that compares OLS and IV estimates, this
procedure is robust to treatment effect heterogeneity.

arXiv link: http://arxiv.org/abs/2208.01300v2

Econometrics arXiv paper, submitted: 2022-08-01

A penalized two-pass regression to predict stock returns with time-varying risk premia

Authors: Gaetan Bakalli, Stéphane Guerrier, Olivier Scaillet

We develop a penalized two-pass regression with time-varying factor loadings.
The penalization in the first pass enforces sparsity for the time-variation
drivers while also maintaining compatibility with the no-arbitrage restrictions
by regularizing appropriate groups of coefficients. The second pass delivers
risk premia estimates to predict equity excess returns. Our Monte Carlo results
and our empirical results on a large cross-sectional data set of US individual
stocks show that penalization without grouping can lead to nearly all
estimated time-varying models violating the no-arbitrage restrictions.
Moreover, our results demonstrate that the proposed method reduces the
prediction errors compared to a penalized approach without appropriate grouping
or a time-invariant factor model.

arXiv link: http://arxiv.org/abs/2208.00972v1

Econometrics arXiv updated paper (originally submitted: 2022-08-01)

The Effect of Omitted Variables on the Sign of Regression Coefficients

Authors: Matthew A. Masten, Alexandre Poirier

We show that, depending on how the impact of omitted variables is measured,
it can be substantially easier for omitted variables to flip coefficient signs
than to drive them to zero. This behavior occurs with "Oster's delta" (Oster
2019), a widely reported robustness measure. Consequently, any time this
measure is large -- suggesting that omitted variables may be unimportant -- a
much smaller value reverses the sign of the parameter of interest. We propose a
modified measure of robustness to address this concern. We illustrate our
results in four empirical applications and two meta-analyses. We implement our
methods in the companion Stata module regsensitivity.

arXiv link: http://arxiv.org/abs/2208.00552v4

Econometrics arXiv paper, submitted: 2022-07-31

Interpreting and predicting the economy flows: A time-varying parameter global vector autoregressive integrated the machine learning model

Authors: Yukang Jiang, Xueqin Wang, Zhixi Xiong, Haisheng Yang, Ting Tian

The paper proposes a time-varying parameter global vector autoregressive
(TVP-GVAR) framework for predicting and analysing economic variables in
developed regions. We aim to provide an easily accessible approach for applied
settings in which a variety of machine learning models can be incorporated for
out-of-sample prediction. A LASSO-type technique is selected for numerically
efficient model selection based on mean squared errors (MSEs). We show
convincing in-sample performance of our proposed model for all economic
variables and relatively high-precision out-of-sample predictions
with different-frequency economic inputs. Furthermore, the time-varying
orthogonal impulse responses provide novel insights into the connectedness of
economic variables at critical time points across developed regions. We also
derive the corresponding asymptotic bands (confidence intervals) for the
orthogonal impulse response functions under standard assumptions.

arXiv link: http://arxiv.org/abs/2209.05998v1

Econometrics arXiv cross-link from math.OC (math.OC), submitted: 2022-07-29

Compact representations of structured BFGS matrices

Authors: Johannes J. Brust, Zichao Di, Sven Leyffer, Cosmin G. Petra

For general large-scale optimization problems compact representations exist
in which recursive quasi-Newton update formulas are represented as compact
matrix factorizations. For problems in which the objective function contains
additional structure, so-called structured quasi-Newton methods exploit
available second-derivative information and approximate unavailable second
derivatives. This article develops the compact representations of two
structured Broyden-Fletcher-Goldfarb-Shanno update formulas. The compact
representations enable efficient limited memory and initialization strategies.
Two limited memory line search algorithms are described and tested on a
collection of problems, including a real world large scale imaging application.

arXiv link: http://arxiv.org/abs/2208.00057v1
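
For context, the classical compact representation of the unstructured BFGS
matrix (Byrd, Nocedal and Schnabel, 1994), which the structured formulas in the
paper extend, reads as follows. With $S_k = [s_0,\dots,s_{k-1}]$,
$Y_k = [y_0,\dots,y_{k-1}]$, $D_k = \mathrm{diag}(s_0^\top y_0,\dots,s_{k-1}^\top y_{k-1})$
and $L_k$ the strictly lower-triangular part of $S_k^\top Y_k$,

    \[
    B_k \;=\; B_0 \;-\;
    \begin{bmatrix} B_0 S_k & Y_k \end{bmatrix}
    \begin{bmatrix} S_k^\top B_0 S_k & L_k \\ L_k^\top & -D_k \end{bmatrix}^{-1}
    \begin{bmatrix} S_k^\top B_0 \\ Y_k^\top \end{bmatrix},
    \]

so that only two thin factors and a small middle matrix need to be stored,
which is what makes limited-memory implementations and cheap initialization
strategies possible.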

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2022-07-29

Tangential Wasserstein Projections

Authors: Florian Gunsilius, Meng Hsuan Hsieh, Myung Jin Lee

We develop a notion of projections between sets of probability measures using
the geometric properties of the 2-Wasserstein space. It is designed for general
multivariate probability measures, is computationally efficient to implement,
and provides a unique solution in regular settings. The idea is to work on
regular tangent cones of the Wasserstein space using generalized geodesics. Its
structure and computational properties make the method applicable in a variety
of settings, from causal inference to the analysis of object data. An
application to estimating causal effects yields a generalization of the notion
of synthetic controls to multivariate data with individual-level heterogeneity,
as well as a way to estimate optimal weights jointly over all time periods.

arXiv link: http://arxiv.org/abs/2207.14727v2

Econometrics arXiv updated paper (originally submitted: 2022-07-29)

Same Root Different Leaves: Time Series and Cross-Sectional Methods in Panel Data

Authors: Dennis Shen, Peng Ding, Jasjeet Sekhon, Bin Yu

A central goal in social science is to evaluate the causal effect of a
policy. One dominant approach is through panel data analysis in which the
behaviors of multiple units are observed over time. The information across time
and space motivates two general approaches: (i) horizontal regression (i.e.,
unconfoundedness), which exploits time series patterns, and (ii) vertical
regression (e.g., synthetic controls), which exploits cross-sectional patterns.
Conventional wisdom states that the two approaches are fundamentally different.
We establish this position to be partly false for estimation but generally true
for inference. In particular, we prove that both approaches yield identical
point estimates under several standard settings. For the same point estimate,
however, each approach quantifies uncertainty with respect to a distinct
estimand. In turn, the confidence interval developed for one estimand may have
incorrect coverage for another. This emphasizes that the source of randomness
that researchers assume has direct implications for the accuracy of inference.

arXiv link: http://arxiv.org/abs/2207.14481v2

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2022-07-28

Stable Matching with Mistaken Agents

Authors: Georgy Artemov, Yeon-Koo Che, YingHua He

Motivated by growing evidence of agents' mistakes in strategically simple
environments, we propose a solution concept -- robust equilibrium -- that
requires only an asymptotically optimal behavior. We use it to study large
random matching markets operated by the applicant-proposing Deferred Acceptance
(DA). Although truth-telling is a dominant strategy, almost all applicants may
be non-truthful in robust equilibrium; however, the outcome must be arbitrarily
close to the stable matching. Our results imply that one can assume truthful
agents to study DA outcomes, theoretically or counterfactually. However, to
estimate the preferences of mistaken agents, one should assume stable matching
but not truth-telling.

arXiv link: http://arxiv.org/abs/2207.13939v4

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-07-27

Identification and Inference with Min-over-max Estimators for the Measurement of Labor Market Fairness

Authors: Karthik Rajkumar

These notes show how to do inference on the Demographic Parity (DP) metric.
Although the metric is a complex statistic involving min and max computations,
we propose a smooth approximation of those functions and derive its asymptotic
distribution. The limits of these approximations and of their gradients converge
to those of the true max and min functions, wherever the latter exist. More importantly,
when the true max and min functions are not differentiable, the approximations
still are, and they provide valid asymptotic inference everywhere in the
domain. We conclude with some directions on how to compute confidence intervals
for DP, how to test if it is under 0.8 (the U.S. Equal Employment Opportunity
Commission fairness threshold), and how to do inference in an A/B test.

arXiv link: http://arxiv.org/abs/2207.13797v1
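
The abstract does not specify the smoothing; one standard differentiable
surrogate for max and min is the log-sum-exp approximation, sketched below
purely for illustration. The temperature parameter, the scipy call, and the
min/max form of the DP ratio are assumptions of this sketch.

    import numpy as np
    from scipy.special import logsumexp

    def smooth_max(x, t=0.05):
        """Differentiable approximation of max(x); approaches max(x) as t -> 0."""
        return t * logsumexp(np.asarray(x) / t)

    def smooth_min(x, t=0.05):
        return -smooth_max(-np.asarray(x), t)

    rates = np.array([0.42, 0.35, 0.47])               # group-wise selection rates (illustrative)
    dp_smooth = smooth_min(rates) / smooth_max(rates)  # smooth analogue of min(rates)/max(rates)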

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-07-27

Conformal Prediction Bands for Two-Dimensional Functional Time Series

Authors: Niccolò Ajroldi, Jacopo Diquigiovanni, Matteo Fontana, Simone Vantini

Time-evolving surfaces can be modeled as two-dimensional functional time
series, exploiting the tools of functional data analysis. Leveraging this
approach, a forecasting framework for such complex data is developed. The main
focus revolves around Conformal Prediction, a versatile nonparametric paradigm
used to quantify uncertainty in prediction problems. Building upon recent
variations of Conformal Prediction for Functional time series, a probabilistic
forecasting scheme for two-dimensional functional time series is presented,
while providing an extension of Functional Autoregressive Processes of order
one to this setting. Estimation techniques for the latter process are
introduced and their performance is compared in terms of the resulting
prediction regions. Finally, the proposed forecasting procedure and the
uncertainty quantification technique are applied to a real dataset, collecting
daily observations of Sea Level Anomalies of the Black Sea.

arXiv link: http://arxiv.org/abs/2207.13656v2
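
As background for readers new to the paradigm, the split-conformal construction
in its simplest scalar form is sketched below; the paper's contribution is the
extension of such bands to two-dimensional functional time series, which this
toy helper does not attempt. Function and argument names are illustrative.

    import numpy as np

    def split_conformal_interval(residuals_cal, y_pred_new, alpha=0.1):
        """Symmetric split-conformal interval from absolute calibration residuals."""
        r = np.sort(np.abs(np.asarray(residuals_cal)))
        n = len(r)
        k = min(n, int(np.ceil((n + 1) * (1 - alpha))))   # finite-sample conformal index
        q = r[k - 1]
        return y_pred_new - q, y_pred_new + q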

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2022-07-26

Differentially Private Estimation via Statistical Depth

Authors: Ryan Cumings-Menon

Constructing a differentially private (DP) estimator requires deriving the
maximum influence of an observation, which can be difficult in the absence of
exogenous bounds on the input data or the estimator, especially in high
dimensional settings. This paper shows that standard notions of statistical
depth, i.e., halfspace depth and regression depth, are particularly
advantageous in this regard, both in the sense that the maximum influence of a
single observation is easy to analyze and that this value is typically low.
This is used to motivate new approximate DP location and regression estimators
using the maximizers of these two notions of statistical depth. A more
computationally efficient variant of the approximate DP regression estimator is
also provided. Also, to avoid requiring that users specify a priori bounds on
the estimates and/or the observations, variants of these DP mechanisms are
described that satisfy random differential privacy (RDP), which is a relaxation
of differential privacy provided by Hall, Wasserman, and Rinaldo (2013). We
also provide simulations of the two DP regression methods proposed here. The
proposed estimators appear to perform favorably relative to the existing DP
regression methods we consider in these simulations when either the sample size
is at least 100-200 or the privacy-loss budget is sufficiently high.

arXiv link: http://arxiv.org/abs/2207.12602v1

Econometrics arXiv paper, submitted: 2022-07-25

Forecasting euro area inflation using a huge panel of survey expectations

Authors: Florian Huber, Luca Onorante, Michael Pfarrhofer

In this paper, we forecast euro area inflation and its main components using
an econometric model which exploits a massive number of time series on survey
expectations from the European Commission's Business and Consumer Survey. To
make estimation of such a huge model tractable, we use recent advances in
computational statistics to carry out posterior simulation and inference. Our
findings suggest that the inclusion of a wide range of firms' and consumers'
opinions about future economic developments offers useful information to
forecast prices and assess tail risks to inflation. These predictive
improvements do not only arise from surveys related to expected inflation but
also from other questions related to the general economic environment. Finally,
we find that firms' expectations about the future seem to have more predictive
content than consumer expectations.

arXiv link: http://arxiv.org/abs/2207.12225v1

Econometrics arXiv paper, submitted: 2022-07-25

Sparse Bayesian State-Space and Time-Varying Parameter Models

Authors: Sylvia Frühwirth-Schnatter, Peter Knaus

In this chapter, we review variance selection for time-varying parameter
(TVP) models for univariate and multivariate time series within a Bayesian
framework. We show how both continuous as well as discrete spike-and-slab
shrinkage priors can be transferred from variable selection for regression
models to variance selection for TVP models by using a non-centered
parametrization. We discuss efficient MCMC estimation and provide an
application to US inflation modeling.

arXiv link: http://arxiv.org/abs/2207.12147v1
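
The non-centered parametrization referred to above can be written, for a single
time-varying coefficient (following Frühwirth-Schnatter and Wagner, 2010), as

    \[
    \beta_{jt} \;=\; \beta_j + \sqrt{\theta_j}\,\tilde\beta_{jt},
    \qquad
    \tilde\beta_{jt} \;=\; \tilde\beta_{j,t-1} + \tilde u_{jt},
    \quad \tilde u_{jt} \sim N(0,1), \quad \tilde\beta_{j0} = 0,
    \]

so that the process standard deviation $\sqrt{\theta_j}$ enters like an
ordinary regression coefficient on the standardized state $\tilde\beta_{jt}$,
and shrinkage or spike-and-slab priors developed for variable selection can be
placed on $\pm\sqrt{\theta_j}$ to perform variance selection.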

Econometrics arXiv updated paper (originally submitted: 2022-07-25)

Misclassification in Difference-in-differences Models

Authors: Augustine Denteh, Désiré Kédagni

The difference-in-differences (DID) design is one of the most popular methods
used in empirical economics research. However, there is almost no work
examining what the DID method identifies in the presence of a misclassified
treatment variable. This paper studies the identification of treatment effects
in DID designs when the treatment is misclassified. Misclassification arises in
various ways, including when the timing of a policy intervention is ambiguous
or when researchers need to infer treatment from auxiliary data. We show that
the DID estimand is biased and recovers a weighted average of the average
treatment effects on the treated (ATT) in two subpopulations -- the correctly
classified and misclassified groups. In some cases, the DID estimand may yield
the wrong sign and is otherwise attenuated. We provide bounds on the ATT when
the researcher has access to information on the extent of misclassification in
the data. We demonstrate our theoretical results using simulations and provide
two empirical applications to guide researchers in performing sensitivity
analysis using our proposed methods.

arXiv link: http://arxiv.org/abs/2207.11890v2

Econometrics arXiv paper, submitted: 2022-07-23

Detecting common bubbles in multivariate mixed causal-noncausal models

Authors: Gianluca Cubadda, Alain Hecq, Elisa Voisin

This paper proposes methods to investigate whether the bubble patterns
observed in individual series are common to various series. We detect the
non-linear dynamics using the recent mixed causal and noncausal models. Both a
likelihood ratio test and information criteria are investigated, the former
having better performance in our Monte Carlo simulations. Implementing our
approach on three commodity prices, we do not find evidence of commonalities,
although some series look very similar.

arXiv link: http://arxiv.org/abs/2207.11557v1

Econometrics arXiv updated paper (originally submitted: 2022-07-22)

A Conditional Linear Combination Test with Many Weak Instruments

Authors: Dennis Lim, Wenjie Wang, Yichong Zhang

We consider a linear combination of jackknife Anderson-Rubin (AR), jackknife
Lagrangian multiplier (LM), and orthogonalized jackknife LM tests for inference
in IV regressions with many weak instruments and heteroskedasticity. Following
I. Andrews (2016), we choose the weights in the linear combination based on a
decision-theoretic rule that is adaptive to the identification strength. Under
both weak and strong identifications, the proposed test controls asymptotic
size and is admissible within a certain class of tests. Under strong
identification, our linear combination test has optimal power against local
alternatives among the class of invariant or unbiased tests which are
constructed based on jackknife AR and LM tests. Simulations and an empirical
application to Angrist and Krueger's (1991) dataset confirm the good power
properties of our test.

arXiv link: http://arxiv.org/abs/2207.11137v3

Econometrics arXiv paper, submitted: 2022-07-22

Time-Varying Poisson Autoregression

Authors: Giovanni Angelini, Giuseppe Cavaliere, Enzo D'Innocenzo, Luca De Angelis

In this paper we propose a new time-varying econometric model, called
Time-Varying Poisson AutoRegressive with eXogenous covariates (TV-PARX), suited
to model and forecast time series of counts. We show that the score-driven
framework is particularly suitable to recover the evolution of time-varying
parameters and provides the required flexibility to model and forecast time
series of counts characterized by convoluted nonlinear dynamics and structural
breaks. We study the asymptotic properties of the TV-PARX model and prove
that, under mild conditions, maximum likelihood estimation (MLE) yields
strongly consistent and asymptotically normal parameter estimates.
Finite-sample performance and forecasting accuracy are evaluated through Monte
Carlo simulations. The empirical usefulness of the time-varying specification
of the proposed TV-PARX model is shown by analyzing the number of new daily
COVID-19 infections in Italy and the number of corporate defaults in the US.

arXiv link: http://arxiv.org/abs/2207.11003v1

Econometrics arXiv paper, submitted: 2022-07-20

Testing for a Threshold in Models with Endogenous Regressors

Authors: Mario P. Rothfelder, Otilia Boldea

We show by simulation that the test for an unknown threshold in models with
endogenous regressors - proposed in Caner and Hansen (2004) - can exhibit
severe size distortions both in small and in moderately large samples,
pertinent to empirical applications. We propose three new tests that rectify
these size distortions. The first test is based on GMM estimators. The other
two are based on unconventional 2SLS estimators, that use additional
information about the linearity (or lack of linearity) of the first stage. Just
like the test in Caner and Hansen (2004), our tests are non-pivotal, and we
prove their bootstrap validity. The empirical application revisits the question
in Ramey and Zubairy (2018) whether government spending multipliers are larger
in recessions, but using tests for an unknown threshold. Consistent with Ramey
and Zubairy (2018), we do not find strong evidence that these multipliers are
larger in recessions.

arXiv link: http://arxiv.org/abs/2207.10076v1

Econometrics arXiv updated paper (originally submitted: 2022-07-20)

Efficient Bias Correction for Cross-section and Panel Data

Authors: Jinyong Hahn, David W. Hughes, Guido Kuersteiner, Whitney K. Newey

Bias correction can often improve the finite sample performance of
estimators. We show that the choice of bias correction method has no effect on
the higher-order variance of semiparametrically efficient parametric
estimators, so long as the estimate of the bias is asymptotically linear. It is
also shown that bootstrap, jackknife, and analytical bias estimates are
asymptotically linear for estimators with higher-order expansions of a standard
form. In particular, we find that for a variety of estimators the
straightforward bootstrap bias correction gives the same higher-order variance
as more complicated analytical or jackknife bias corrections. In contrast, bias
corrections that do not estimate the bias at the parametric rate, such as the
split-sample jackknife, result in larger higher-order variances in the i.i.d.
setting we focus on. For both a cross-sectional MLE and a panel model with
individual fixed effects, we show that the split-sample jackknife has a
higher-order variance term that is twice as large as that of the
'leave-one-out' jackknife.

arXiv link: http://arxiv.org/abs/2207.09943v4
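
The simplest version of the bootstrap bias correction discussed above, for an
i.i.d. sample and a generic scalar estimator, is sketched below; the estimator,
data, and number of replications are illustrative.

    import numpy as np

    def bootstrap_bias_correct(data, estimator, B=999, seed=0):
        """Return (theta_hat, bias-corrected estimate) via the nonparametric bootstrap."""
        rng = np.random.default_rng(seed)
        data = np.asarray(data)
        n = len(data)
        theta_hat = estimator(data)
        boot = np.array([estimator(data[rng.integers(0, n, n)]) for _ in range(B)])
        bias_hat = boot.mean() - theta_hat            # estimated bias
        return theta_hat, theta_hat - bias_hat        # equivalently 2*theta_hat - boot.mean()

    # Example: the plug-in variance estimator (dividing by n) is biased downward
    x = np.random.default_rng(1).normal(size=30)
    print(bootstrap_bias_correct(x, lambda d: d.var()))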

Econometrics arXiv updated paper (originally submitted: 2022-07-19)

Asymptotic Properties of Endogeneity Corrections Using Nonlinear Transformations

Authors: Jörg Breitung, Alexander Mayer, Dominik Wied

This paper considers a linear regression model with an endogenous regressor
which arises from a nonlinear transformation of a latent variable. It is shown
that the corresponding coefficient can be consistently estimated without
external instruments by adding a rank-based transformation of the regressor to
the model and performing standard OLS estimation. In contrast to other
approaches, our nonparametric control function approach does not rely on a
conformably specified copula. Furthermore, the approach allows for the presence
of additional exogenous regressors which may be (linearly) correlated with the
endogenous regressor(s). Consistency and asymptotic normality of the estimator
are proved and the estimator is compared with copula based approaches by means
of Monte Carlo simulations. An empirical application on wage data of the US
current population survey demonstrates the usefulness of our method.

arXiv link: http://arxiv.org/abs/2207.09246v3
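
The estimation recipe described above is simple enough to sketch directly:
augment the OLS regression with a rank-based transformation of the endogenous
regressor. The particular transformation used below (the normal score of the
empirical rank) and the function name are assumptions of this sketch; the paper
characterizes when such an augmented OLS is consistent.

    import numpy as np
    from scipy.stats import norm, rankdata

    def rank_augmented_ols(y, x, X_exog=None):
        """OLS of y on [1, x, exogenous controls, normal score of rank(x)]; returns coef on x."""
        n = len(y)
        score = norm.ppf(rankdata(x) / (n + 1))       # rank-based transformation of x
        cols = [np.ones(n), x]
        if X_exog is not None:
            cols.append(np.asarray(X_exog).reshape(n, -1))
        cols.append(score)
        coef, *_ = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)
        return coef[1]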

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-07-19

The role of the geometric mean in case-control studies

Authors: Amanda Coston, Edward H. Kennedy

Historically used in settings where the outcome is rare or data collection is
expensive, outcome-dependent sampling is relevant to many modern settings where
data is readily available for a biased sample of the target population, such as
public administrative data. Under outcome-dependent sampling, common effect
measures such as the average risk difference and the average risk ratio are not
identified, but the conditional odds ratio is. Aggregation of the conditional
odds ratio is challenging since summary measures are generally not identified.
Furthermore, the marginal odds ratio can be larger (or smaller) than all
conditional odds ratios. This so-called non-collapsibility of the odds ratio is
avoidable if we use an alternative aggregation to the standard arithmetic mean.
We provide a new definition of collapsibility that makes this choice of
aggregation method explicit, and we demonstrate that the odds ratio is
collapsible under geometric aggregation. We describe how to partially identify,
estimate, and do inference on the geometric odds ratio under outcome-dependent
sampling. Our proposed estimator is based on the efficient influence function
and therefore has doubly robust-style properties.

arXiv link: http://arxiv.org/abs/2207.09016v1
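
In symbols, the aggregation the abstract argues for is the geometric mean of
the conditional odds ratios,

    \[
    \mathrm{OR}_{G} \;=\; \exp\!\left( E\!\left[ \log \mathrm{OR}(V) \right] \right),
    \]

where $\mathrm{OR}(V)$ is the covariate-conditional odds ratio. Because a
geometric mean of positive quantities always lies between their minimum and
maximum, this aggregate cannot fall outside the range of the conditional odds
ratios, which is the sense in which the odds ratio is collapsible under
geometric aggregation.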

Econometrics arXiv paper, submitted: 2022-07-19

Bias correction and uniform inference for the quantile density function

Authors: Grigory Franguridi

For the kernel estimator of the quantile density function (the derivative of
the quantile function), I show how to perform the boundary bias correction,
establish the rate of strong uniform consistency of the bias-corrected
estimator, and construct the confidence bands that are asymptotically exact
uniformly over the entire domain $[0,1]$. The proposed procedures rely on the
pivotality of the studentized bias-corrected estimator and known
anti-concentration properties of the Gaussian approximation for its supremum.

arXiv link: http://arxiv.org/abs/2207.09004v1

Econometrics arXiv updated paper (originally submitted: 2022-07-18)

Isotonic propensity score matching

Authors: Mengshan Xu, Taisuke Otsu

We propose a one-to-many matching estimator of the average treatment effect
based on propensity scores estimated by isotonic regression. This approach is
predicated on the assumption of monotonicity in the propensity score function,
a condition that can be justified in many economic applications. We show that
the nature of the isotonic estimator can help us to fix many problems of
existing matching methods, including efficiency, choice of the number of
matches, choice of tuning parameters, robustness to propensity score
misspecification, and bootstrap validity. As a by-product, a uniformly
consistent isotonic estimator is developed for our proposed matching method.

arXiv link: http://arxiv.org/abs/2207.08868v3
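
A minimal sketch of the ingredients described above: estimate the propensity
score by isotonic regression of the treatment indicator on a scalar covariate
assumed to act monotonically, then match each treated unit to the control units
sharing its (piecewise-constant) fitted score. The one-to-many matching rule
and the ATT target used here are simplifying assumptions of the sketch, not
necessarily the paper's exact estimator.

    import numpy as np
    from sklearn.isotonic import IsotonicRegression

    def isotonic_ps_match_att(y, d, x):
        """ATT estimate matching treated units to controls with the same isotonic fitted score."""
        p_hat = IsotonicRegression(y_min=1e-3, y_max=1 - 1e-3).fit_transform(x, d)
        gaps = []
        for i in np.flatnonzero(d == 1):
            controls = np.flatnonzero((d == 0) & np.isclose(p_hat, p_hat[i]))
            if controls.size:                         # skip treated units without matches
                gaps.append(y[i] - y[controls].mean())
        return float(np.mean(gaps))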

Econometrics arXiv updated paper (originally submitted: 2022-07-18)

Estimating Continuous Treatment Effects in Panel Data using Machine Learning with a Climate Application

Authors: Sylvia Klosin, Max Vilgalys

Economists often estimate continuous treatment effects in panel data using
linear two-way fixed effects models (TWFE). When the treatment-outcome
relationship is nonlinear, TWFE is misspecified and potentially biased for the
average partial derivative (APD). We develop an automatic double/de-biased
machine learning (ADML) estimator that is consistent for the population APD
while allowing additive unit fixed effects, nonlinearities, and high
dimensional heterogeneity. We prove asymptotic normality and add two
refinements - optimization-based de-biasing and analytic derivatives - that
reduce bias and remove numerical approximation error. Simulations show that the
proposed method outperforms high order polynomial OLS and standard ML
estimators. Our estimator leads to significantly larger (by 50%), but equally
precise, estimates of the effect of extreme heat on corn yield compared to
standard linear models.

arXiv link: http://arxiv.org/abs/2207.08789v3

Econometrics arXiv paper, submitted: 2022-07-17

Testing for explosive bubbles: a review

Authors: Anton Skrobotov

This review discusses methods of testing for explosive bubbles in time
series. A large number of recently developed testing methods under various
assumptions about the error innovations are covered. The review also considers
the methods for dating explosive (bubble) regimes. Special attention is devoted
to time-varying volatility in the errors. Moreover, the modelling of possible
relationships between time series with explosive regimes is discussed.

arXiv link: http://arxiv.org/abs/2207.08249v1

Econometrics arXiv updated paper (originally submitted: 2022-07-15)

Simultaneity in Binary Outcome Models with an Application to Employment for Couples

Authors: Bo E. Honoré, Luojia Hu, Ekaterini Kyriazidou, Martin Weidner

Two of Peter Schmidt's many contributions to econometrics have been to
introduce a simultaneous logit model for bivariate binary outcomes and to study
estimation of dynamic linear fixed effects panel data models using short
panels. In this paper, we study a dynamic panel data version of the bivariate
model introduced in Schmidt and Strauss (1975) that allows for lagged dependent
variables and fixed effects as in Ahn and Schmidt (1995). We combine a
conditional likelihood approach with a method of moments approach to obtain an
estimation strategy for the resulting model. We apply this estimation strategy
to a simple model for the intra-household relationship in employment. Our main
conclusion is that the within-household dependence in employment differs
significantly by the ethnicity composition of the couple even after one allows
for unobserved household-specific heterogeneity.

arXiv link: http://arxiv.org/abs/2207.07343v2

Econometrics arXiv updated paper (originally submitted: 2022-07-15)

Flexible global forecast combinations

Authors: Ryan Thompson, Yilin Qian, Andrey L. Vasnev

Forecast combination -- the aggregation of individual forecasts from multiple
experts or models -- is a proven approach to economic forecasting. To date,
research on economic forecasting has concentrated on local combination methods,
which handle separate but related forecasting tasks in isolation. Yet, it has
been known for over two decades in the machine learning community that global
methods, which exploit task-relatedness, can improve on local methods that
ignore it. Motivated by the possibility of improvement, this paper introduces
a framework for globally combining forecasts while being flexible to the level
of task-relatedness. Through our framework, we develop global versions of
several existing forecast combinations. To evaluate the efficacy of these new
global forecast combinations, we conduct extensive comparisons using synthetic
and real data. Our real data comparisons, which involve forecasts of core
economic indicators in the Eurozone, provide empirical evidence that the
accuracy of global combinations of economic forecasts can surpass local
combinations.

arXiv link: http://arxiv.org/abs/2207.07318v3

Econometrics arXiv updated paper (originally submitted: 2022-07-14)

High Dimensional Generalised Penalised Least Squares

Authors: Ilias Chronopoulos, Katerina Chrysikou, George Kapetanios

In this paper we develop inference for high dimensional linear models, with
serially correlated errors. We examine Lasso under the assumption of strong
mixing in the covariates and error process, allowing for fatter tails in their
distribution. Since the Lasso estimator performs poorly under such
circumstances, we estimate the parameters of interest via a GLS Lasso and
extend the asymptotic properties of the Lasso to more general conditions. Our
theoretical results indicate that the non-asymptotic bounds for stationary
dependent processes are sharper, while the rate of Lasso under general
conditions appears slower as $T,p\to \infty$. Further, we employ the debiased
Lasso to perform inference uniformly on the parameters of interest. Monte Carlo
results support the proposed estimator, as it has significant efficiency gains
over traditional methods.
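
The two-step flavour of the approach can be sketched as follows; the AR(1) prewhitening, the cross-validated penalty, and the function name gls_lasso are simplifying assumptions rather than the paper's exact procedure:

import numpy as np
from sklearn.linear_model import LassoCV

def gls_lasso(y, X):
    # Step 1: preliminary Lasso and the AR(1) coefficient of its residuals.
    prelim = LassoCV(cv=5).fit(X, y)
    u = y - prelim.predict(X)
    rho = (u[:-1] @ u[1:]) / (u[:-1] @ u[:-1])
    # Step 2: quasi-difference (prewhiten) y and X, then refit the Lasso.
    y_t = y[1:] - rho * y[:-1]
    X_t = X[1:] - rho * X[:-1]
    return LassoCV(cv=5).fit(X_t, y_t).coef_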

arXiv link: http://arxiv.org/abs/2207.07055v4

Econometrics arXiv updated paper (originally submitted: 2022-07-14)

Parallel Trends and Dynamic Choices

Authors: Philip Marx, Elie Tamer, Xun Tang

Difference-in-differences is a common method for estimating treatment
effects, and the parallel trends condition is its main identifying assumption:
the trend in mean untreated outcomes is independent of the observed treatment
status. In observational settings, treatment is often a dynamic choice made or
influenced by rational actors, such as policy-makers, firms, or individual
agents. This paper relates parallel trends to economic models of dynamic
choice. We clarify the implications of parallel trends on agent behavior and
study when dynamic selection motives lead to violations of parallel trends.
Finally, we consider identification under alternative assumptions that
accommodate features of dynamic choice.

arXiv link: http://arxiv.org/abs/2207.06564v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-07-13

Parametric quantile regression for income data

Authors: Helton Saulo, Roberto Vila, Giovanna V. Borges, Marcelo Bourguignon

Univariate normal regression models are statistical tools widely applied in
many areas of economics. Nevertheless, income data have asymmetric behavior and
are best modeled by non-normal distributions. The modeling of income plays an
important role in determining workers' earnings, as well as being an important
research topic in labor economics. Thus, the objective of this work is to
propose parametric quantile regression models based on two important asymmetric
income distributions, namely, Dagum and Singh-Maddala distributions. The
proposed quantile models are based on reparameterizations of the original
distributions by inserting a quantile parameter. We present the
reparameterizations, some properties of the distributions, and the quantile
regression models with their inferential aspects. We proceed with Monte Carlo
simulation studies, evaluating the performance of maximum likelihood estimation
and analyzing the empirical distribution of two residuals. The
Monte Carlo results show that both models meet the expected outcomes. We apply
the proposed quantile regression models to a household income data set provided
by the National Institute of Statistics of Chile. Both proposed models
performed well in terms of model fit. Thus, we conclude
that results were favorable to the use of Singh-Maddala and Dagum quantile
regression models for positive asymmetric data, such as income data.

arXiv link: http://arxiv.org/abs/2207.06558v1

Econometrics arXiv paper, submitted: 2022-07-13

Two-stage differences in differences

Authors: John Gardner

A recent literature has shown that when adoption of a treatment is staggered
and average treatment effects vary across groups and over time,
difference-in-differences regression does not identify an easily interpretable
measure of the typical effect of the treatment. In this paper, I extend this
literature in two ways. First, I provide some simple underlying intuition for
why difference-in-differences regression does not identify a
group$\times$period average treatment effect. Second, I propose an alternative
two-stage estimation framework, motivated by this intuition. In this framework,
group and period effects are identified in a first stage from the sample of
untreated observations, and average treatment effects are identified in a
second stage by comparing treated and untreated outcomes, after removing these
group and period effects. The two-stage approach is robust to treatment-effect
heterogeneity under staggered adoption, and can be used to identify a host of
different average treatment effect measures. It is also simple, intuitive, and
easy to implement. I establish the theoretical properties of the two-stage
approach and demonstrate its effectiveness and applicability using Monte-Carlo
evidence and an example from the literature.
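
A minimal sketch of the two stages, assuming a long-format pandas DataFrame with columns y, group, period and a 0/1 column treated; the second-stage standard errors would need the adjustment derived in the paper, which is not shown here:

import statsmodels.formula.api as smf

def two_stage_did(df):
    # Assumes every group and period is observed untreated at least once
    # (e.g., staggered adoption with not-yet-treated observations).
    # Stage 1: group and period effects from the untreated observations only.
    stage1 = smf.ols("y ~ C(group) + C(period)", data=df[df["treated"] == 0]).fit()
    # Stage 2: regress the adjusted outcome on treatment status.
    adj = df.assign(y_tilde=df["y"] - stage1.predict(df))
    stage2 = smf.ols("y_tilde ~ treated", data=adj).fit()
    return stage2.params["treated"]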

arXiv link: http://arxiv.org/abs/2207.05943v1

Econometrics arXiv updated paper (originally submitted: 2022-07-10)

Detecting Grouped Local Average Treatment Effects and Selecting True Instruments

Authors: Nicolas Apfel, Helmut Farbmacher, Rebecca Groh, Martin Huber, Henrika Langen

Under an endogenous binary treatment with heterogeneous effects and multiple
instruments, we propose a two-step procedure for identifying complier groups
with identical local average treatment effects (LATE) despite relying on
distinct instruments, even if several instruments violate the identifying
assumptions. We use the fact that the LATE is homogeneous for instruments which
(i) satisfy the LATE assumptions (instrument validity and treatment
monotonicity in the instrument) and (ii) generate identical complier groups in
terms of treatment propensities given the respective instruments. In the first
step of our procedure, we cluster the propensity scores; in the second step, we
find groups of IVs with the same reduced-form parameters. Under the plurality
assumption that within each set of instruments with
identical treatment propensities, instruments truly satisfying the LATE
assumptions are the largest group, our procedure permits identifying these true
instruments in a data-driven way. We show that our procedure is consistent and
provides consistent and asymptotically normal estimators of underlying LATEs.
We also provide a simulation study investigating the finite sample properties
of our approach and an empirical application investigating the effect of
incarceration on recidivism in the US with judge assignments serving as
instruments.
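
The flavour of the two steps can be sketched as follows for binary instruments held in a numpy array Z (one column per instrument); the k-means step, the crude rounding-based grouping of Wald ratios, and the function name select_valid_ivs are illustrative stand-ins for the paper's clustering and testing procedure:

import numpy as np
from sklearn.cluster import KMeans

def select_valid_ivs(y, d, Z, n_ps_groups=2, tol=0.05):
    # Step 1: first-stage (treatment propensity) effect of each binary instrument.
    first = np.array([d[Z[:, j] == 1].mean() - d[Z[:, j] == 0].mean()
                      for j in range(Z.shape[1])])
    reduced = np.array([y[Z[:, j] == 1].mean() - y[Z[:, j] == 0].mean()
                        for j in range(Z.shape[1])])
    wald = reduced / first
    labels = KMeans(n_clusters=n_ps_groups, n_init=10).fit_predict(first.reshape(-1, 1))
    keep = []
    # Step 2: within each propensity cluster, group instruments with similar
    # Wald ratios and keep the largest group (the plurality rule).
    for g in range(n_ps_groups):
        members = np.flatnonzero(labels == g)
        bins = {}
        for j in members:
            bins.setdefault(int(round(wald[j] / tol)), []).append(j)
        keep.extend(max(bins.values(), key=len))
    return sorted(keep)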

arXiv link: http://arxiv.org/abs/2207.04481v2

Econometrics arXiv paper, submitted: 2022-07-09

Identification and Inference for Welfare Gains without Unconfoundedness

Authors: Undral Byambadalai

This paper studies identification and inference of the welfare gain that
results from switching from one policy (such as the status quo policy) to
another policy. The welfare gain is not point identified in general when data
are obtained from an observational study or a randomized experiment with
imperfect compliance. I characterize the sharp identified region of the welfare
gain and obtain bounds under various assumptions on the unobservables with and
without instrumental variables. Estimation and inference of the lower and upper
bounds are conducted using orthogonalized moment conditions to deal with the
presence of infinite-dimensional nuisance parameters. I illustrate the analysis
by considering hypothetical policies of assigning individuals to job training
programs using experimental data from the National Job Training Partnership Act
Study. Monte Carlo simulations are conducted to assess the finite sample
performance of the estimators.

arXiv link: http://arxiv.org/abs/2207.04314v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-07-09

Model diagnostics of discrete data regression: a unifying framework using functional residuals

Authors: Zewei Lin, Dungang Liu

Model diagnostics is an indispensable component of regression analysis, yet
it is not well addressed in standard textbooks on generalized linear models.
The lack of exposition is attributed to the fact that when outcome data are
discrete, classical methods (e.g., Pearson/deviance residual analysis and
goodness-of-fit tests) have limited utility in model diagnostics and treatment.
This paper establishes a novel framework for model diagnostics of discrete data
regression. Unlike the literature defining a single-valued quantity as the
residual, we propose to use a function as a vehicle to retain the residual
information. In the presence of discreteness, we show that such a functional
residual is appropriate for summarizing the residual randomness that cannot be
captured by the structural part of the model. We establish its theoretical
properties, which lead to new diagnostic tools, including the
functional-residual-vs-covariate plot and the Function-to-Function (Fn-Fn) plot.
Our numerical studies demonstrate that the use of these tools can reveal a
variety of model misspecifications, such as not properly including a
higher-order term, an explanatory variable, an interaction effect, a dispersion
parameter, or a zero-inflation component. The functional residual yields, as a
byproduct, Liu-Zhang's surrogate residual mainly developed for cumulative link
models for ordinal data (Liu and Zhang, 2018, JASA). As a general notion, it
considerably broadens the diagnostic scope as it applies to virtually all
parametric models for binary, ordinal and count data, all in a unified
diagnostic scheme.

arXiv link: http://arxiv.org/abs/2207.04299v1

Econometrics arXiv paper, submitted: 2022-07-08

Spatial Econometrics for Misaligned Data

Authors: Guillaume Allaire Pouliot

We produce methodology for regression analysis when the geographic locations
of the independent and dependent variables do not coincide, in which case we
speak of misaligned data. We develop and investigate two complementary methods
for regression analysis with misaligned data that circumvent the need to
estimate or specify the covariance of the regression errors. We carry out a
detailed reanalysis of Maccini and Yang (2009) and find economically
significant quantitative differences but sustain most qualitative conclusions.

arXiv link: http://arxiv.org/abs/2207.04082v1

Econometrics arXiv paper, submitted: 2022-07-08

Large Bayesian VARs with Factor Stochastic Volatility: Identification, Order Invariance and Structural Analysis

Authors: Joshua Chan, Eric Eisenstat, Xuewen Yu

Vector autoregressions (VARs) with multivariate stochastic volatility are
widely used for structural analysis. Often the structural model identified
through economically meaningful restrictions--e.g., sign restrictions--is
supposed to be independent of how the dependent variables are ordered. But
since the reduced-form model is not order invariant, results from the
structural analysis depend on the order of the variables. We consider a VAR
based on factor stochastic volatility that is constructed to be order
invariant. We show that the presence of multivariate stochastic volatility
allows for statistical identification of the model. We further prove that, with
a suitable set of sign restrictions, the corresponding structural model is
point-identified. An additional appeal of the proposed approach is that it can
easily handle a large number of dependent variables as well as sign
restrictions. We demonstrate the methodology through a structural analysis in
which we use a 20-variable VAR with sign restrictions to identify 5 structural
shocks.

arXiv link: http://arxiv.org/abs/2207.03988v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-07-07

On the instrumental variable estimation with many weak and invalid instruments

Authors: Yiqi Lin, Frank Windmeijer, Xinyuan Song, Qingliang Fan

We discuss the fundamental issue of identification in linear instrumental
variable (IV) models with unknown IV validity. With the assumption of the
"sparsest rule", which is equivalent to the plurality rule but becomes
operational in computation algorithms, we investigate and prove the advantages
of non-convex penalized approaches over other IV estimators based on two-step
selections, in terms of selection consistency and accommodation for
individually weak IVs. Furthermore, we propose a surrogate sparsest penalty
that aligns with the identification condition and provides oracle sparse
structure simultaneously. Desirable theoretical properties are derived for the
proposed estimator with weaker IV strength conditions compared to the previous
literature. Finite sample properties are demonstrated using simulations and the
selection and estimation method is applied to an empirical study concerning the
effect of BMI on diastolic blood pressure.

arXiv link: http://arxiv.org/abs/2207.03035v2

Econometrics arXiv paper, submitted: 2022-07-06

Degrees of Freedom and Information Criteria for the Synthetic Control Method

Authors: Guillaume Allaire Pouliot, Zhen Xie

We provide an analytical characterization of the model flexibility of the
synthetic control method (SCM) in the familiar form of degrees of freedom. We
obtain estimable information criteria. These may be used to circumvent
cross-validation when selecting either the weighting matrix in the SCM with
covariates, or the tuning parameter in model averaging or penalized variants of
SCM. We assess the impact of car license rationing in Tianjin and make a novel
use of SCM; while a natural match is available, it and other donors are noisy,
inviting the use of SCM to average over approximately matching donors. The very
large number of candidate donors calls for model averaging or penalized
variants of SCM and, with short pre-treatment series, model selection per
information criteria outperforms that per cross-validation.

arXiv link: http://arxiv.org/abs/2207.02943v1

Econometrics arXiv updated paper (originally submitted: 2022-07-04)

csa2sls: A complete subset approach for many instruments using Stata

Authors: Seojeong Lee, Siha Lee, Julius Owusu, Youngki Shin

We develop a Stata command $csa2sls$ that implements the complete
subset averaging two-stage least squares (CSA2SLS) estimator in Lee and Shin
(2021). The CSA2SLS estimator is an alternative to the two-stage least squares
estimator that remedies the bias issue caused by many correlated instruments.
We conduct Monte Carlo simulations and confirm that the CSA2SLS estimator
reduces both the mean squared error and the estimation bias substantially when
instruments are correlated. We illustrate the usage of $csa2sls$ in
Stata by an empirical application.
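
Outside Stata, the core averaging step can be sketched in Python as follows; the functions tsls and csa_2sls, the equal weights, and the fixed subset size are simplifications of the CSA2SLS estimator, and exogenous controls would have to be kept in every subset:

import numpy as np
from itertools import combinations

def tsls(y, X, Z):
    # Standard 2SLS with endogenous regressors X (n x k) and instruments Z (n x l).
    Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)
    return np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)

def csa_2sls(y, X, Z, subset_size):
    # Average the 2SLS estimates over all instrument subsets of a fixed size
    # (subset_size must be at least the number of endogenous regressors).
    estimates = [tsls(y, X, Z[:, list(s)])
                 for s in combinations(range(Z.shape[1]), subset_size)]
    return np.mean(estimates, axis=0)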

arXiv link: http://arxiv.org/abs/2207.01533v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-07-01

A Comparison of Methods for Adaptive Experimentation

Authors: Samantha Horn, Sabina J. Sloman

We use a simulation study to compare three methods for adaptive
experimentation: Thompson sampling, Tempered Thompson sampling, and Exploration
sampling. We gauge the performance of each in terms of social welfare and
estimation accuracy, and as a function of the number of experimental waves. We
further construct a set of novel "hybrid" loss measures to identify which
methods are optimal for researchers pursuing a combination of experimental
aims. Our main results are: 1) the relative performance of Thompson sampling
depends on the number of experimental waves, 2) Tempered Thompson sampling
uniquely distributes losses across multiple experimental aims, and 3) in most
cases, Exploration sampling performs similarly to random assignment.
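
For readers unfamiliar with the first of these methods, the toy simulation below (Python, Bernoulli outcomes, Beta(1,1) priors, wave-by-wave assignment) illustrates plain Thompson sampling; Tempered Thompson sampling and Exploration sampling modify the assignment probabilities and are not shown:

import numpy as np

def thompson_bernoulli(true_probs, n_waves=20, wave_size=50, seed=0):
    # Beta(1,1) priors on each arm's success probability, updated wave by wave.
    rng = np.random.default_rng(seed)
    k = len(true_probs)
    a, b = np.ones(k), np.ones(k)
    for _ in range(n_waves):
        draws = rng.beta(a, b, size=(wave_size, k))   # one posterior draw per unit
        arms = draws.argmax(axis=1)                   # assign each unit to its best draw
        outcomes = rng.random(wave_size) < np.take(true_probs, arms)
        for arm, success in zip(arms, outcomes):
            a[arm] += success
            b[arm] += 1 - success
    return a / (a + b)    # posterior mean success probability of each arm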

arXiv link: http://arxiv.org/abs/2207.00683v1

Econometrics arXiv paper, submitted: 2022-07-01

Valid and Unobtrusive Measurement of Returns to Advertising through Asymmetric Budget Split

Authors: Johannes Hermle, Giorgio Martini

Ad platforms require reliable measurement of advertising returns: what
increase in performance (such as clicks or conversions) can an advertiser
expect in return for additional budget on the platform? Even from the
perspective of the platform, accurately measuring advertising returns is hard.
Selection and omitted variable biases make estimates from observational methods
unreliable, and straightforward experimentation is often costly or infeasible.
We introduce Asymmetric Budget Split, a novel methodology for valid measurement
of ad returns from the perspective of the platform. Asymmetric budget split
creates small asymmetries in ad budget allocation across comparable partitions
of the platform's userbase. By observing performance of the same ad at
different budget levels while holding all other factors constant, the platform
can obtain a valid measure of ad returns. The methodology is unobtrusive and
cost-effective in that it does not require holdout groups or sacrifices in ad
or marketplace performance. We discuss a successful deployment of asymmetric
budget split to LinkedIn's Jobs Marketplace, an ad marketplace where it is used
to measure returns from promotion budgets in terms of incremental job
applicants. We outline operational considerations for practitioners and discuss
further use cases such as budget-aware performance forecasting.
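
In its simplest form, the resulting estimand is a slope across matched partition pairs; the sketch below, with the illustrative function name budget_split_returns, shows that calculation and is in no way LinkedIn's production implementation:

import numpy as np

def budget_split_returns(perf_hi, perf_lo, budget_hi, budget_lo):
    # perf_* and budget_* are arrays over matched partition pairs, where the
    # "hi" partition received a slightly larger budget than the "lo" partition.
    slopes = (np.asarray(perf_hi) - np.asarray(perf_lo)) / (
        np.asarray(budget_hi) - np.asarray(budget_lo))
    se = slopes.std(ddof=1) / np.sqrt(slopes.size)
    return slopes.mean(), se    # average return per unit of budget and its standard error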

arXiv link: http://arxiv.org/abs/2207.00206v1

Econometrics arXiv paper, submitted: 2022-06-30

Unique futures in China: studies on volatility spillover effects of ferrous metal futures

Authors: Tingting Cao, Weiqing Sun, Cuiping Sun, Lin Hao

Ferrous metal futures have become unique commodity futures with Chinese
characteristics. Due to the late listing time, it has received less attention
from scholars. Our research focuses on volatility spillover effects, that is,
the transmission of price volatility across financial instruments. We use
DCC-GARCH, BEKK-GARCH, and DY(2012) index methods to conduct empirical tests on
the volatility spillover effects of the Chinese ferrous metal futures market
and other parts of the Chinese commodity futures market, as well as industries
related to the steel industry chain in stock markets. It can be seen that there
is a close volatility spillover relationship between ferrous metal futures and
nonferrous metal futures. Energy futures and chemical futures have a
significant transmission effect on the fluctuations of ferrous metals. In
addition, ferrous metal futures have a significant spillover effect on the
stock indices of the steel, real estate, building materials, machinery
equipment, and household appliance industries.
Studying the volatility spillover effect of the ferrous metal futures market
can reveal the operating laws of this field and provide ideas and theoretical
references for investors to hedge their risks. The results show that the ferrous metal
futures market has an essential role as a "barometer" for the Chinese commodity
futures market and the stock market.

arXiv link: http://arxiv.org/abs/2206.15039v1

Econometrics arXiv updated paper (originally submitted: 2022-06-28)

Dynamic CoVaR Modeling and Estimation

Authors: Timo Dimitriadis, Yannick Hoga

The popular systemic risk measure CoVaR (conditional Value-at-Risk) and its
variants are widely used in economics and finance. In this article, we propose
joint dynamic forecasting models for the Value-at-Risk (VaR) and CoVaR. The
CoVaR version we consider is defined as a large quantile of one variable (e.g.,
losses in the financial system) conditional on some other variable (e.g.,
losses in a bank's shares) being in distress. We introduce a two-step
M-estimator for the model parameters drawing on recently proposed bivariate
scoring functions for the pair (VaR, CoVaR). We prove consistency and
asymptotic normality of our parameter estimator and analyze its finite-sample
properties in simulations. Finally, we apply a specific subclass of our dynamic
forecasting models, which we call CoCAViaR models, to log-returns of large US
banks. A formal forecast comparison shows that our CoCAViaR models generate
CoVaR predictions which are superior to forecasts issued from current benchmark
models.

arXiv link: http://arxiv.org/abs/2206.14275v4

Econometrics arXiv paper, submitted: 2022-06-28

Business Cycle Synchronization in the EU: A Regional-Sectoral Look through Soft-Clustering and Wavelet Decomposition

Authors: Saulius Jokubaitis, Dmitrij Celov

This paper elaborates on the sectoral-regional view of the business cycle
synchronization in the EU -- a necessary condition for the optimal currency
area. We argue that complete and tidy clustering of the data improves the
decision maker's understanding of the business cycle and, by extension, the
quality of economic decisions. We define the business cycles by applying a
wavelet approach to drift-adjusted gross value added data spanning over 2000Q1
to 2021Q2. For the application of the synchronization analysis, we propose the
novel soft-clustering approach, which adjusts hierarchical clustering in
several aspects. First, the method relies on synchronicity dissimilarity
measures, noting that, for time series data, the feature space is the set of
all points in time. Then, the “soft” part of the approach strengthens the
synchronization signal by using silhouette measures. Finally, we add a
probabilistic sparsity algorithm to drop out the most asynchronous “noisy”
data improving the silhouette scores of the most and less synchronous groups.
The method, hence, splits the sectoral-regional data into three groups: the
synchronous group that shapes the EU business cycle; the less synchronous group
that may contain information relevant for forecasting the cycle; the asynchronous group
that may help investors to diversify through-the-cycle risks of the investment
portfolios. The results support the core-periphery hypothesis.

arXiv link: http://arxiv.org/abs/2206.14128v1

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2022-06-28

Estimating the Currency Composition of Foreign Exchange Reserves

Authors: Matthew Ferranti

Central banks manage about $12 trillion in foreign exchange reserves,
influencing global exchange rates and asset prices. However, some of the
largest holders of reserves report minimal information about their currency
composition, hindering empirical analysis. I describe a Hidden Markov Model to
estimate the composition of a central bank's reserves by relating the
fluctuation in the portfolio's valuation to the exchange rates of major reserve
currencies. I apply the model to China and Singapore, two countries that
collectively hold about $3.4 trillion in reserves and conceal their
composition. I find that China's reserve composition likely resembles the
global average, while Singapore probably holds fewer US dollars.

arXiv link: http://arxiv.org/abs/2206.13751v4

Econometrics arXiv paper, submitted: 2022-06-27

Misspecification and Weak Identification in Asset Pricing

Authors: Frank Kleibergen, Zhaoguo Zhan

The widespread co-existence of misspecification and weak identification in
asset pricing has led to an overstated performance of risk factors. Because the
conventional Fama and MacBeth (1973) methodology is jeopardized by
misspecification and weak identification, we infer risk premia by using a
double robust Lagrange multiplier test that remains reliable in the presence of
these two empirically relevant issues. Moreover, we show how the
identification, and the resulting appropriate interpretation, of the risk
premia is governed by the relative magnitudes of the misspecification
J-statistic and the identification IS-statistic. We revisit several prominent
empirical applications and all specifications with one to six factors from the
factor zoo of Feng, Giglio, and Xiu (2020) to emphasize the widespread
occurrence of misspecification and weak identification.

arXiv link: http://arxiv.org/abs/2206.13600v1

Econometrics arXiv updated paper (originally submitted: 2022-06-26)

Instrumented Common Confounding

Authors: Christian Tien

Causal inference is difficult in the presence of unobserved confounders. We
introduce the instrumented common confounding (ICC) approach to
(nonparametrically) identify causal effects with instruments, which are
exogenous only conditional on some unobserved common confounders. The ICC
approach is most useful in rich observational data with multiple sources of
unobserved confounding, where instruments are at most exogenous conditional on
some unobserved common confounders. Suitable examples of this setting are
various identification problems in the social sciences, nonlinear dynamic
panels, and problems with multiple endogenous confounders. The ICC identifying
assumptions are closely related to those in mixture models, negative control
and IV. Compared to mixture models [Bonhomme et al., 2016], we require fewer
conditionally independent variables and do not need to model the unobserved
confounder. Compared to negative control [Cui et al., 2020], we allow for
non-common confounders, with respect to which the instruments are exogenous.
Compared to IV [Newey and Powell, 2003], we allow instruments to be exogenous
conditional on some unobserved common confounders, for which a set of relevant
observed variables exists. We prove point identification under outcome-model
restrictions and, alternatively, under first-stage restrictions. We provide a
practical step-by-step
guide to the ICC model assumptions and present the causal effect of education
on income as a motivating example.

arXiv link: http://arxiv.org/abs/2206.12919v2

Econometrics arXiv updated paper (originally submitted: 2022-06-24)

Estimation and Inference in High-Dimensional Panel Data Models with Interactive Fixed Effects

Authors: Maximilian Ruecker, Michael Vogt, Oliver Linton, Christopher Walsh

We develop new econometric methods for estimation and inference in
high-dimensional panel data models with interactive fixed effects. Our approach
can be regarded as a non-trivial extension of the very popular common
correlated effects (CCE) approach. Roughly speaking, we proceed as follows: We
first construct a projection device to eliminate the unobserved factors from
the model by applying a dimensionality reduction transform to the matrix of
cross-sectionally averaged covariates. The unknown parameters are then
estimated by applying lasso techniques to the projected model. For inference
purposes, we derive a desparsified version of our lasso-type estimator. While
the original CCE approach is restricted to the low-dimensional case where the
number of regressors is small and fixed, our methods can deal with both low-
and high-dimensional situations where the number of regressors is large and may
even exceed the overall sample size. We derive theory for our estimation and
inference methods both in the large-T-case, where the time series length T
tends to infinity, and in the small-T-case, where T is a fixed natural number.
Specifically, we derive the convergence rate of our estimator and show that its
desparsified version is asymptotically normal under suitable regularity
conditions. The theoretical analysis of the paper is complemented by a
simulation study and an empirical application to characteristic based asset
pricing.
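
A stripped-down sketch of the estimation idea; the plain projection on cross-sectional averages below stands in for the paper's dimensionality-reduction transform, and the cross-validated penalty and the function name cce_lasso are illustrative choices:

import numpy as np
from sklearn.linear_model import LassoCV

def cce_lasso(Y, X):
    # Y: (N, T) outcomes; X: (N, T, p) regressors; assumes T exceeds the
    # number of cross-sectional averages used as factor proxies.
    N, T, p = X.shape
    proxies = np.column_stack([np.ones(T), Y.mean(axis=0), X.mean(axis=0)])
    M = np.eye(T) - proxies @ np.linalg.pinv(proxies)   # annihilator of the proxies
    X_proj = np.concatenate([M @ X[i] for i in range(N)], axis=0)
    Y_proj = np.concatenate([M @ Y[i] for i in range(N)])
    return LassoCV(cv=5).fit(X_proj, Y_proj).coef_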

arXiv link: http://arxiv.org/abs/2206.12152v3

Econometrics arXiv updated paper (originally submitted: 2022-06-21)

Assessing and Comparing Fixed-Target Forecasts of Arctic Sea Ice: Glide Charts for Feature-Engineered Linear Regression and Machine Learning Models

Authors: Francis X. Diebold, Maximilian Goebel, Philippe Goulet Coulombe

We use "glide charts" (plots of sequences of root mean squared forecast
errors as the target date is approached) to evaluate and compare fixed-target
forecasts of Arctic sea ice. We first use them to evaluate the simple
feature-engineered linear regression (FELR) forecasts of Diebold and Goebel
(2021), and to compare FELR forecasts to naive pure-trend benchmark forecasts.
Then we introduce a much more sophisticated feature-engineered machine learning
(FEML) model, and we use glide charts to evaluate FEML forecasts and compare
them to a FELR benchmark. Our substantive results include the frequent
appearance of predictability thresholds, which differ across months, meaning
that accuracy initially fails to improve as the target date is approached but
then increases progressively once a threshold lead time is crossed. Also, we
find that FEML can improve appreciably over FELR when forecasting "turning
point" months in the annual cycle at horizons of one to three months ahead.

arXiv link: http://arxiv.org/abs/2206.10721v2

Econometrics arXiv updated paper (originally submitted: 2022-06-21)

New possibilities in identification of binary choice models with fixed effects

Authors: Yinchu Zhu

We study the identification of binary choice models with fixed effects. We
propose a condition called sign saturation and show that this condition is
sufficient for identifying the model. In particular, this condition can
guarantee identification even when all the regressors are bounded, including
multiple discrete regressors. We also establish that without this condition,
the model is not identified unless the error distribution belongs to a special
class. Moreover, we show that sign saturation is also essential for identifying
the sign of treatment effects. Finally, we introduce a measure for sign
saturation and develop tools for its estimation and inference.

arXiv link: http://arxiv.org/abs/2206.10475v7

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-06-21

Symmetric generalized Heckman models

Authors: Helton Saulo, Roberto Vila, Shayane S. Cordeiro

The sample selection bias problem arises when a variable of interest is
correlated with a latent variable, and involves situations in which the
response variable had part of its observations censored. Heckman (1976)
proposed a sample selection model based on the bivariate normal distribution
that fits both the variable of interest and the latent variable. Recently, this
assumption of normality has been relaxed by more flexible models such as the
Student-t distribution (Marchenko and Genton, 2012; Lachos et al., 2021). The
aim of this work is to propose generalized Heckman sample selection models
based on symmetric distributions (Fang et al., 1990). This is a new class of
sample selection models, in which variables are added to the dispersion and
correlation parameters. A Monte Carlo simulation study is performed to assess
the behavior of the parameter estimation method. Two real data sets are
analyzed to illustrate the proposed approach.

arXiv link: http://arxiv.org/abs/2206.10054v1

Econometrics arXiv updated paper (originally submitted: 2022-06-20)

Policy Learning under Endogeneity Using Instrumental Variables

Authors: Yan Liu

This paper studies the identification and estimation of individualized
intervention policies in observational data settings characterized by
endogenous treatment selection and the availability of instrumental variables.
We introduce encouragement rules that manipulate an instrument. Incorporating
the marginal treatment effects (MTE) as policy invariant structural parameters,
we establish the identification of the social welfare criterion for the optimal
encouragement rule. Focusing on binary encouragement rules, we propose to
estimate the optimal policy via the Empirical Welfare Maximization (EWM) method
and derive convergence rates of the regret (welfare loss). We consider
extensions to accommodate multiple instruments and budget constraints. Using
data from the Indonesian Family Life Survey, we apply the EWM encouragement
rule to advise on the optimal tuition subsidy assignment. Our framework offers
interpretability regarding why a certain subpopulation is targeted.

arXiv link: http://arxiv.org/abs/2206.09883v3

Econometrics arXiv paper, submitted: 2022-06-20

Unbiased estimation of the OLS covariance matrix when the errors are clustered

Authors: Tom Boot, Gianmaria Niccodemi, Tom Wansbeek

When data are clustered, common practice has become to do OLS and use an
estimator of the covariance matrix of the OLS estimator that comes close to
unbiasedness. In this paper we derive an estimator that is unbiased when the
random-effects model holds. We do the same for two more general structures. We
study the usefulness of these estimators against others by simulation, the size
of the $t$-test being the criterion. Our findings suggest that the choice of
estimator hardly matters when the regressor has the same distribution over the
clusters. But when the regressor is a cluster-specific treatment variable, the
choice does matter and the unbiased estimator we propose for the random-effects
model shows excellent performance, even when the clusters are highly
unbalanced.

arXiv link: http://arxiv.org/abs/2206.09644v1

Econometrics arXiv paper, submitted: 2022-06-19

Optimal data-driven hiring with equity for underrepresented groups

Authors: Yinchu Zhu, Ilya O. Ryzhov

We present a data-driven prescriptive framework for fair decisions, motivated
by hiring. An employer evaluates a set of applicants based on their observable
attributes. The goal is to hire the best candidates while avoiding bias with
regard to a certain protected attribute. Simply ignoring the protected
attribute will not eliminate bias due to correlations in the data. We present a
hiring policy that depends on the protected attribute functionally, but not
statistically, and we prove that, among all possible fair policies, ours is
optimal with respect to the firm's objective. We test our approach on both
synthetic and real data, and find that it shows great practical potential to
improve equity for underrepresented and historically marginalized groups.

arXiv link: http://arxiv.org/abs/2206.09300v1

Econometrics arXiv paper, submitted: 2022-06-18

Interpretable and Actionable Vehicular Greenhouse Gas Emission Prediction at Road link-level

Authors: S. Roderick Zhang, Bilal Farooq

To help systematically lower anthropogenic Greenhouse gas (GHG) emissions,
accurate and precise GHG emission prediction models have become a key focus of
the climate research. The appeal is that the predictive models will inform
policymakers, and hopefully, in turn, they will bring about systematic changes.
Since the transportation sector is constantly among the top GHG emission
contributors, especially in populated urban areas, substantial effort has been
going into building more accurate and informative GHG prediction models to help
create more sustainable urban environments. In this work, we seek to establish
a predictive framework of GHG emissions at the urban road segment or link level
of transportation networks. The key theme of the framework centers around model
interpretability and actionability for high-level decision-makers using
econometric Discrete Choice Modelling (DCM). We illustrate that DCM is capable
of predicting link-level GHG emission levels on urban road networks in a
parsimonious and effective manner. Our DCM models achieve up to 85.4%
prediction accuracy. We also argue that since most GHG emission prediction
models aim to involve high-level decision-makers in making changes and curbing
emissions, the DCM-based GHG emission prediction framework is the most suitable
choice.

arXiv link: http://arxiv.org/abs/2206.09073v1

Econometrics arXiv updated paper (originally submitted: 2022-06-17)

Semiparametric Single-Index Estimation for Average Treatment Effects

Authors: Difang Huang, Jiti Gao, Tatsushi Oka

We propose a semiparametric method to estimate the average treatment effect
under the assumption of unconfoundedness given observational data. Our
estimation method alleviates misspecification issues of the propensity score
function by estimating the single-index link function involved through Hermite
polynomials. Our approach is computationally tractable and allows for
moderately large dimension covariates. We provide the large sample properties
of the estimator and show its validity. Also, the average treatment effect
estimator achieves the parametric rate and asymptotic normality. Our extensive
Monte Carlo study shows that the proposed estimator is valid in finite samples.
Applying our method to maternal smoking and infant health, we find that
conventional estimates of smoking's impact on birth weight may be biased due to
propensity score misspecification, and our analysis of job training programs
reveals earnings effects that are more precisely estimated than in prior work.
These applications demonstrate how addressing model misspecification can
substantively affect our understanding of key policy-relevant treatment
effects.

arXiv link: http://arxiv.org/abs/2206.08503v4

Econometrics arXiv paper, submitted: 2022-06-16

Fast and Accurate Variational Inference for Large Bayesian VARs with Stochastic Volatility

Authors: Joshua C. C. Chan, Xuewen Yu

We propose a new variational approximation of the joint posterior
distribution of the log-volatility in the context of large Bayesian VARs. In
contrast to existing approaches that are based on local approximations, the new
proposal provides a global approximation that takes into account the entire
support of the joint distribution. In a Monte Carlo study we show that the new
global approximation is over an order of magnitude more accurate than existing
alternatives. We illustrate the proposed methodology with an application of a
96-variable VAR with stochastic volatility to measure global bank network
connectedness.

arXiv link: http://arxiv.org/abs/2206.08438v1

Econometrics arXiv updated paper (originally submitted: 2022-06-16)

Likelihood ratio test for structural changes in factor models

Authors: Jushan Bai, Jiangtao Duan, Xu Han

A factor model with a break in its factor loadings is observationally
equivalent to a model without changes in the loadings but a change in the
variance of its factors. This effectively transforms a structural change
problem of high dimension into a problem of low dimension. This paper considers
the likelihood ratio (LR) test for a variance change in the estimated factors.
The LR test implicitly explores a special feature of the estimated factors: the
pre-break and post-break variances can be a singular matrix under the
alternative hypothesis, making the LR test diverging faster and thus more
powerful than Wald-type tests. The better power property of the LR test is also
confirmed by simulations. We also consider mean changes and multiple breaks. We
apply the procedure to the factor modelling and structural change of the US
employment using monthly industry-level data.

arXiv link: http://arxiv.org/abs/2206.08052v2

Econometrics arXiv paper, submitted: 2022-06-15

Optimality of Matched-Pair Designs in Randomized Controlled Trials

Authors: Yuehao Bai

In randomized controlled trials (RCTs), treatment is often assigned by
stratified randomization. I show that among all stratified randomization
schemes which treat all units with probability one half, a certain matched-pair
design achieves the maximum statistical precision for estimating the average
treatment effect (ATE). In an important special case, the optimal design pairs
units according to the baseline outcome. In a simulation study based on
datasets from 10 RCTs, this design lowers the standard error for the estimator
of the ATE by 10% on average, and by up to 34%, relative to the original
designs.
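
A toy version of such a design is easy to write down; the function name matched_pair_assignment is illustrative, and the sketch assumes an even number of units with a scalar baseline outcome:

import numpy as np

def matched_pair_assignment(baseline, seed=0):
    # Sort units on the baseline outcome, pair adjacent units, and treat
    # exactly one unit per pair, chosen uniformly at random.
    rng = np.random.default_rng(seed)
    pairs = np.argsort(baseline).reshape(-1, 2)
    treat = np.zeros(len(baseline), dtype=int)
    picks = pairs[np.arange(len(pairs)), rng.integers(0, 2, size=len(pairs))]
    treat[picks] = 1
    return treat    # the ATE can then be estimated by a difference in means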

arXiv link: http://arxiv.org/abs/2206.07845v1

Econometrics arXiv paper, submitted: 2022-06-15

Finite-Sample Guarantees for High-Dimensional DML

Authors: Victor Quintas-Martinez

Debiased machine learning (DML) offers an attractive way to estimate
treatment effects in observational settings, where identification of causal
parameters requires a conditional independence or unconfoundedness assumption,
since it allows one to control flexibly for a potentially very large number of
covariates. This paper gives novel finite-sample guarantees for joint inference
on high-dimensional DML, bounding how far the finite-sample distribution of the
estimator is from its asymptotic Gaussian approximation. These guarantees are
useful to applied researchers, as they are informative about how far off the
coverage of joint confidence bands can be from the nominal level. There are
many settings where high-dimensional causal parameters may be of interest, such
as the ATE of many treatment profiles, or the ATE of a treatment on many
outcomes. We also cover infinite-dimensional parameters, such as impacts on the
entire marginal distribution of potential outcomes. The finite-sample
guarantees in this paper complement the existing results on consistency and
asymptotic normality of DML estimators, which are either asymptotic or treat
only the one-dimensional case.

arXiv link: http://arxiv.org/abs/2206.07386v1

Econometrics arXiv paper, submitted: 2022-06-14

A new algorithm for structural restrictions in Bayesian vector autoregressions

Authors: Dimitris Korobilis

A comprehensive methodology for inference in vector autoregressions (VARs)
using sign and other structural restrictions is developed. The reduced-form VAR
disturbances are driven by a few common factors and structural identification
restrictions can be incorporated in their loadings in the form of parametric
restrictions. A Gibbs sampler is derived that allows for reduced-form
parameters and structural restrictions to be sampled efficiently in one step. A
key benefit of the proposed approach is that it allows for treating parameter
estimation and structural inference as a joint problem. An additional benefit
is that the methodology can scale to large VARs with multiple shocks, and it
can be extended to accommodate non-linearities, asymmetries, and numerous other
interesting empirical features. The excellent properties of the new algorithm
for inference are explored using synthetic data experiments, and by revisiting
the role of financial factors in economic fluctuations using identification
based on sign restrictions.

arXiv link: http://arxiv.org/abs/2206.06892v1

Econometrics arXiv paper, submitted: 2022-06-14

Nowcasting the Portuguese GDP with Monthly Data

Authors: João B. Assunção, Pedro Afonso Fernandes

In this article, we present a method to forecast the Portuguese gross
domestic product (GDP) in each current quarter (nowcasting). It combines bridge
equations of the real GDP on readily available monthly data like the Economic
Sentiment Indicator (ESI), industrial production index, cement sales or exports
and imports, with forecasts for the jagged missing values computed with the
well-known Hodrick and Prescott (HP) filter. As shown, this simple multivariate
approach can perform as well as a Targeted Diffusion Index (TDI) model and
slightly better than the univariate Theta method in terms of out-of-sample mean
errors.
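
A simplified sketch of such a bridge-equation nowcast in Python with pandas and statsmodels; carrying the last HP-trend value forward to fill the jagged edge is a crude placeholder for the paper's treatment of missing months, and the function name nowcast_gdp is illustrative:

import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.filters.hp_filter import hpfilter

def nowcast_gdp(monthly, quarterly_gdp, lamb=14400):
    # monthly: DataFrame of monthly indicators (DatetimeIndex) with NaNs at the
    # jagged edge; quarterly_gdp: Series of quarterly GDP on quarter-end dates.
    filled = monthly.copy()
    for col in filled:
        _, trend = hpfilter(filled[col].dropna(), lamb=lamb)
        filled[col] = filled[col].fillna(trend.iloc[-1])   # crude edge fill
    quarterly = filled.resample("Q").mean()                # bridge to quarters
    data = quarterly.join(quarterly_gdp.rename("gdp"))
    train = data.dropna()
    bridge = sm.OLS(train["gdp"], sm.add_constant(train.drop(columns="gdp"))).fit()
    newest = sm.add_constant(data.drop(columns="gdp"), has_constant="add").iloc[[-1]]
    return float(np.asarray(bridge.predict(newest))[0])    # nowcast for the latest quarter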

arXiv link: http://arxiv.org/abs/2206.06823v1

Econometrics arXiv cross-link from cs.CR (cs.CR), submitted: 2022-06-13

A novel reconstruction attack on foreign-trade official statistics, with a Brazilian case study

Authors: Danilo Fabrino Favato, Gabriel Coutinho, Mário S. Alvim, Natasha Fernandes

In this paper we describe, formalize, implement, and experimentally evaluate
a novel transaction re-identification attack against official foreign-trade
statistics releases in Brazil. The attack's goal is to re-identify the
importers of foreign-trade transactions (by revealing the identity of the
company performing that transaction), which consequently violates those
importers' fiscal secrecy (by revealing sensitive information: the value and
volume of traded goods). We provide a mathematical formalization of this fiscal
secrecy problem using principles from the framework of quantitative information
flow (QIF), then carefully identify the main sources of imprecision in the
official data releases used as auxiliary information in the attack, and model
transaction re-construction as a linear optimization problem solvable through
integer linear programming (ILP). We show that this problem is NP-complete, and
provide a methodology to identify tractable instances. We exemplify the
feasibility of our attack by performing 2,003 transaction re-identifications
that in total amount to more than $137M, and affect 348 Brazilian companies.
Further, since similar statistics are produced by other statistical agencies,
our attack is of broader concern.

arXiv link: http://arxiv.org/abs/2206.06493v1

Econometrics arXiv updated paper (originally submitted: 2022-06-13)

Clustering coefficients as measures of the complex interactions in a directed weighted multilayer network

Authors: Paolo Bartesaghi, Gian Paolo Clemente, Rosanna Grassi

In this paper, we provide novel definitions of clustering coefficient for
weighted and directed multilayer networks. We extend in the multilayer
theoretical context the clustering coefficients proposed in the literature for
weighted directed monoplex networks. We quantify how deeply a node is involved
in a cohesive structure focusing on a single node, on a single layer or on the
entire system. The coefficients convey several characteristics inherent to the
complex topology of the multilayer network. We test their effectiveness
applying them to a particularly complex structure such as the international
trade network. The trade data integrate different aspects and they can be
described by a directed and weighted multilayer network, where each layer
represents import and export relationships between countries for a given
sector. The proposed coefficients find successful application in describing the
interrelations of the trade network, allowing to disentangle the effects of
countries and sectors and jointly consider the interactions between them.

arXiv link: http://arxiv.org/abs/2206.06309v2

Econometrics arXiv paper, submitted: 2022-06-13

A Constructive GAN-based Approach to Exact Estimate Treatment Effect without Matching

Authors: Boyang You, Kerry Papps

Matching has become the mainstream approach in counterfactual inference, with
which selection bias between sample groups can be largely eliminated. However,
in practice, when estimating the average treatment effect on the treated (ATT) via
matching, regardless of the method used, a trade-off between estimation accuracy
and information loss persists. Attempting to completely replace the
matching process, this paper proposes the GAN-ATT estimator that integrates
generative adversarial network (GAN) into counterfactual inference framework.
Through GAN machine learning, the probability density functions (PDFs) of
samples in both treatment group and control group can be approximated. By
differentiating conditional PDFs of the two groups with identical input
condition, the conditional average treatment effect (CATE) can be estimated,
and the ensemble average of corresponding CATEs over all treatment group
samples is the estimate of ATT. Utilizing GAN-based infinite sample
augmentations, problems in the case of insufficient samples or lack of common
support domains can be easily solved. Theoretically, when GAN could perfectly
learn the PDFs, our estimators can provide an exact estimate of the ATT.
To check the performance of the GAN-ATT estimator, three data sets are used
for ATT estimation: two toy data sets with one- and two-dimensional covariate
inputs and constant or covariate-dependent treatment effects, and a real
firm-level data set with high-dimensional inputs. In the toy examples, the
GAN-ATT estimates are close to the ground truth and outperform traditional
matching approaches; in the firm-level application, applicability to real data
is assessed by comparison with matching approaches. Based on the evidence from
these three tests, we believe the GAN-ATT estimator has significant advantages
over traditional matching methods in estimating the ATT.

arXiv link: http://arxiv.org/abs/2206.06116v1

Econometrics arXiv paper, submitted: 2022-06-13

Robust Knockoffs for Controlling False Discoveries With an Application to Bond Recovery Rates

Authors: Konstantin Görgen, Abdolreza Nazemi, Melanie Schienle

We address challenges in variable selection with highly correlated data that
are frequently present in finance and economics, but also in complex natural
systems such as weather. We develop a robustified version of the knockoff
framework, which addresses challenges with high dependence among possibly many
influencing factors and strong time correlation. In particular, the repeated
subsampling strategy tackles the variability of the knockoffs and the
dependency of factors. Simultaneously, we also control the proportion of false
discoveries over a grid of all possible values, which mitigates variability of
selected factors from ad-hoc choices of a specific false discovery level. In
the application for corporate bond recovery rates, we identify new important
groups of relevant factors on top of the known standard drivers. But we also
show that out-of-sample, the resulting sparse model has similar predictive
power to state-of-the-art machine learning models that use the entire set of
predictors.

arXiv link: http://arxiv.org/abs/2206.06026v1

Econometrics arXiv updated paper (originally submitted: 2022-06-10)

Debiased Machine Learning U-statistics

Authors: Juan Carlos Escanciano, Joël Robert Terschuur

We propose a method to debias estimators based on U-statistics with Machine
Learning (ML) first-steps. Standard plug-in estimators often suffer from
regularization and model-selection biases, producing invalid inferences. We
show that Debiased Machine Learning (DML) estimators can be constructed within
a U-statistics framework to correct these biases while preserving desirable
statistical properties. The approach delivers simple, robust estimators with
provable asymptotic normality and good finite-sample performance. We apply our
method to three problems: inference on Inequality of Opportunity (IOp) using
the Gini coefficient of ML-predicted incomes given circumstances, inference on
predictive accuracy via the Area Under the Curve (AUC), and inference on linear
models with ML-based sample-selection corrections. Using European survey data,
we present the first debiased estimates of income IOp. In our empirical
application, commonly employed ML-based plug-in estimators systematically
underestimate IOp, while our debiased estimators are robust across ML methods.

arXiv link: http://arxiv.org/abs/2206.05235v4

Econometrics arXiv updated paper (originally submitted: 2022-06-10)

Forecasting macroeconomic data with Bayesian VARs: Sparse or dense? It depends!

Authors: Luis Gruber, Gregor Kastner

Vector autoregressions (VARs) are widely applied for modeling and
forecasting macroeconomic variables. In high dimensions, however, they are
prone to overfitting. Bayesian methods, more concretely shrinkage priors, have
proven successful in improving prediction performance. In the present
paper, we introduce the semi-global framework, in which we replace the
traditional global shrinkage parameter with group-specific shrinkage
parameters. We show how this framework can be applied to various shrinkage
priors, such as global-local priors and stochastic search variable selection
priors. We demonstrate the virtues of the proposed framework in an extensive
simulation study and in an empirical application forecasting data of the US
economy. Further, we shed more light on the ongoing “Illusion of Sparsity”
debate, finding that forecasting performances under sparse/dense priors vary
across evaluated economic variables and across time frames. Dynamic model
averaging, however, can combine the merits of both worlds.

arXiv link: http://arxiv.org/abs/2206.04902v5

Econometrics arXiv updated paper (originally submitted: 2022-06-09)

On the Performance of the Neyman Allocation with Small Pilots

Authors: Yong Cai, Ahnaf Rafi

The Neyman Allocation is used in many papers on experimental design, which
typically assume that researchers have access to large pilot studies. This may
be unrealistic. To understand the properties of the Neyman Allocation with
small pilots, we study its behavior in an asymptotic framework that takes pilot
size to be fixed even as the size of the main wave tends to infinity. Our
analysis shows that the Neyman Allocation can lead to estimates of the ATE with
higher asymptotic variance than with (non-adaptive) balanced randomization. In
particular, this happens when the outcome variable is relatively homoskedastic
with respect to treatment status or when it exhibits high kurtosis. We provide
a series of empirical examples showing that such situations can arise in
practice. Our results suggest that researchers with small pilots should not use
the Neyman Allocation if they believe that outcomes are homoskedastic or
heavy-tailed. Finally, we examine some potential methods for improving the
finite sample performance of the FNA via simulations.
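
The allocation itself is a one-line calculation from pilot data, which is what makes its sensitivity to small pilots easy to see; the sketch below, with the illustrative function name neyman_share, computes the treated share implied by pilot standard deviations:

import numpy as np

def neyman_share(pilot_y, pilot_d):
    # Neyman Allocation: treat a share s1 / (s1 + s0) of the main wave, where
    # s1 and s0 are the outcome standard deviations in the pilot's two arms.
    s1 = np.std(pilot_y[pilot_d == 1], ddof=1)
    s0 = np.std(pilot_y[pilot_d == 0], ddof=1)
    return s1 / (s1 + s0)

# With a tiny pilot these standard deviations are noisy, which is what drives the
# poor behaviour relative to balanced (50/50) randomization described above.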

arXiv link: http://arxiv.org/abs/2206.04643v4

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2022-06-09

A Two-Ball Ellsberg Paradox: An Experiment

Authors: Brian Jabarian, Simon Lazarus

We conduct an incentivized experiment on a nationally representative US
sample (N=708) to test whether people prefer to avoid ambiguity even when it
means choosing dominated options. In contrast to the literature, we find that
55% of subjects prefer a risky act to an ambiguous act that always provides a
larger probability of winning. Our experimental design shows that such a
preference is not mainly due to a lack of understanding. We conclude that
subjects avoid ambiguity per se rather than avoiding ambiguity because
it may yield a worse outcome. Such behavior cannot be reconciled with existing
models of ambiguity aversion in a straightforward manner.

arXiv link: http://arxiv.org/abs/2206.04605v6

Econometrics arXiv updated paper (originally submitted: 2022-06-08)

Inference for Matched Tuples and Fully Blocked Factorial Designs

Authors: Yuehao Bai, Jizhou Liu, Max Tabord-Meehan

This paper studies inference in randomized controlled trials with multiple
treatments, where treatment status is determined according to a "matched
tuples" design. Here, by a matched tuples design, we mean an experimental
design where units are sampled i.i.d. from the population of interest, grouped
into "homogeneous" blocks with cardinality equal to the number of treatments,
and finally, within each block, each treatment is assigned exactly once
uniformly at random. We first study estimation and inference for matched tuples
designs in the general setting where the parameter of interest is a vector of
linear contrasts over the collection of average potential outcomes for each
treatment. Parameters of this form include standard average treatment effects
used to compare one treatment relative to another, but also include parameters
which may be of interest in the analysis of factorial designs. We first
establish conditions under which a sample analogue estimator is asymptotically
normal and construct a consistent estimator of its corresponding asymptotic
variance. Combining these results establishes the asymptotic exactness of tests
based on these estimators. In contrast, we show that, for two common testing
procedures based on t-tests constructed from linear regressions, one test is
generally conservative while the other is generally invalid. We go on to apply our
results to study the asymptotic properties of what we call "fully-blocked" 2^K
factorial designs, which are simply matched tuples designs applied to a full
factorial experiment. Leveraging our previous results, we establish that our
estimator achieves a lower asymptotic variance under the fully-blocked design
than that under any stratified factorial design which stratifies the
experimental sample into a finite number of "large" strata. A simulation study
and empirical application illustrate the practical relevance of our results.
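The assignment scheme itself is easy to state in code. Below is a minimal, hypothetical sketch of a matched tuples design with one baseline covariate: units are sorted into blocks whose size equals the number of treatments, and within each block every treatment is assigned exactly once, uniformly at random. This is an illustration, not the authors' implementation.

```python
import numpy as np

def matched_tuples_assignment(x, num_treatments, rng):
    """Matched tuples assignment: blocks of size `num_treatments` are formed by
    sorting on the baseline covariate `x`, and each treatment is assigned
    exactly once within each block, uniformly at random."""
    n = len(x)
    assert n % num_treatments == 0, "n must be divisible by the number of treatments"
    order = np.argsort(x)
    treatment = np.empty(n, dtype=int)
    for start in range(0, n, num_treatments):
        block = order[start:start + num_treatments]
        treatment[block] = rng.permutation(num_treatments)
    return treatment

rng = np.random.default_rng(1)
x = rng.normal(size=12)                       # one baseline covariate
print(matched_tuples_assignment(x, num_treatments=3, rng=rng))
```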

arXiv link: http://arxiv.org/abs/2206.04157v5

Econometrics arXiv updated paper (originally submitted: 2022-06-07)

Economic activity and climate change

Authors: Aránzazu de Juan, Pilar Poncela, Vladimir Rodríguez-Caballero, Esther Ruiz

In this paper, we survey recent econometric contributions to measure the
relationship between economic activity and climate change. Due to the critical
relevance of these effects for the well-being of future generations, there is
an explosion of publications devoted to measuring this relationship and its
main channels. The relation between economic activity and climate change is
complex, with the possibility of causality running in both directions. Starting
from economic activity, the channels that relate economic activity and climate
change are energy consumption and the consequent pollution. Hence, we first
describe the main econometric contributions about the interactions between
economic activity and energy consumption, and then move on to describing the
contributions on the interactions between economic activity and pollution.
Finally, we look at the main results on the relationship between climate change
and economic activity. An important consequence of climate change is the
increasing occurrence of extreme weather phenomena. Therefore, we also survey
contributions on the economic effects of catastrophic climate phenomena.

arXiv link: http://arxiv.org/abs/2206.03187v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-06-06

The Impact of Sampling Variability on Estimated Combinations of Distributional Forecasts

Authors: Ryan Zischke, Gael M. Martin, David T. Frazier, D. S. Poskitt

We investigate the performance and sampling variability of estimated forecast
combinations, with particular attention given to the combination of forecast
distributions. Unknown parameters in the forecast combination are optimized
according to criterion functions based on proper scoring rules, which are
chosen to reward the form of forecast accuracy that matters for the problem at
hand, and forecast performance is measured using the out-of-sample expectation
of said scoring rule. Our results provide novel insights into the behavior of
estimated forecast combinations. Firstly, we show that, asymptotically, the
sampling variability in the performance of standard forecast combinations is
determined solely by estimation of the constituent models, with estimation of
the combination weights contributing no sampling variability whatsoever, at
first order. Secondly, we show that, if computationally feasible, forecast
combinations produced in a single step -- in which the constituent model and
combination function parameters are estimated jointly -- have superior
predictive accuracy and lower sampling variability than standard forecast
combinations -- where constituent model and combination function parameters are
estimated in two steps. These theoretical insights are demonstrated
numerically, both in simulation settings and in an extensive empirical
illustration using a time series of S&P500 returns.
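To fix ideas, the following self-contained sketch mimics the "standard" two-step combination described above: two constituent density forecasts are fitted first, and the mixture weight is then chosen to maximize an out-of-sample log score (a proper scoring rule). The models and data are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(2)
y = rng.standard_t(df=5, size=500)            # toy series to be forecast
train, valid = y[:300], y[300:]

# Step 1: fit two simple constituent density forecasts on the training sample.
mu, sigma = train.mean(), train.std(ddof=1)                 # Gaussian model
med = np.median(train)
b = np.mean(np.abs(train - med))                            # Laplace model (MLE scale)

def avg_log_score(w):
    """Average log score of the two-component mixture on the validation sample."""
    dens = (w * norm.pdf(valid, mu, sigma)
            + (1 - w) * np.exp(-np.abs(valid - med) / b) / (2 * b))
    return np.mean(np.log(dens))

# Step 2: choose the combination weight that maximizes the proper scoring rule.
res = minimize_scalar(lambda w: -avg_log_score(w), bounds=(0.0, 1.0), method="bounded")
print(f"Estimated weight on the Gaussian forecast: {res.x:.2f}")
```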

arXiv link: http://arxiv.org/abs/2206.02376v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2022-06-06

Markovian Interference in Experiments

Authors: Vivek F. Farias, Andrew A. Li, Tianyi Peng, Andrew Zheng

We consider experiments in dynamical systems where interventions on some
experimental units impact other units through a limiting constraint (such as a
limited inventory). Despite outsize practical importance, the best estimators
for this 'Markovian' interference problem are largely heuristic in nature, and
their bias is not well understood. We formalize the problem of inference in
such experiments as one of policy evaluation. Off-policy estimators, while
unbiased, apparently incur a large penalty in variance relative to
state-of-the-art heuristics. We introduce an on-policy estimator: the
Differences-In-Q's (DQ) estimator. We show that the DQ estimator can in general
have exponentially smaller variance than off-policy evaluation. At the same
time, its bias is second order in the impact of the intervention. This yields a
striking bias-variance tradeoff so that the DQ estimator effectively dominates
state-of-the-art alternatives. From a theoretical perspective, we introduce
three separate novel techniques that are of independent interest in the theory
of Reinforcement Learning (RL). Our empirical evaluation includes a set of
experiments on a city-scale ride-hailing simulator.

arXiv link: http://arxiv.org/abs/2206.02371v2

Econometrics arXiv updated paper (originally submitted: 2022-06-06)

Assessing Omitted Variable Bias when the Controls are Endogenous

Authors: Paul Diegert, Matthew A. Masten, Alexandre Poirier

Omitted variables are one of the most important threats to the identification
of causal effects. Several widely used methods assess the impact of omitted
variables on empirical conclusions by comparing measures of selection on
observables with measures of selection on unobservables. The recent literature
has discussed various limitations of these existing methods, however. This
includes a companion paper of ours which explains issues that arise when the
omitted variables are endogenous, meaning that they are correlated with the
included controls. In the present paper, we develop a new approach to
sensitivity analysis that avoids those limitations, while still allowing
researchers to calibrate sensitivity parameters by comparing the magnitude of
selection on observables with the magnitude of selection on unobservables as in
previous methods. We illustrate our results in an empirical study of the effect
of historical American frontier life on modern cultural beliefs. Finally, we
implement these methods in the companion Stata module regsensitivity for easy
use in practice.

arXiv link: http://arxiv.org/abs/2206.02303v5

Econometrics arXiv paper, submitted: 2022-06-05

Causal impact of severe events on electricity demand: The case of COVID-19 in Japan

Authors: Yasunobu Wakashiro

As of May 2022, the coronavirus disease 2019 (COVID-19) still has a severe
global impact on people's lives. Previous studies have reported that COVID-19
decreased the electricity demand in early 2020. However, our study found that
the electricity demand increased in summer and winter even when the infection
was widespread. The fact that the event has continued for over two years suggests
that it is essential to introduce a method that can estimate the impact of
the event over a long period while accounting for seasonal fluctuations. We employed the
Bayesian structural time-series model to estimate the causal impact of COVID-19
on electricity demand in Japan. The results indicate that behavioral
restrictions due to COVID-19 decreased the daily electricity demand (-5.1% on
weekdays, -6.1% on holidays) in April and May 2020, as indicated by previous
studies. However, even in 2020, the results show that demand increased in
the hot summer and cold winter (the increase is +14% in the period from
1st August to 15th September 2020, and +7.6% from 16th December 2020 to 15th
January 2021). This study shows that the significant decrease in electricity
demand for the business sector exceeded the increase in demand for the
household sector in April and May 2020; however, the increase in demand for the
household sector exceeded the decrease in demand for the business sector in the
hot summer and cold winter periods. Our results also imply that it is possible to
run out of electricity when people's behavior changes, even if they are less active.

arXiv link: http://arxiv.org/abs/2206.02122v1

Econometrics arXiv updated paper (originally submitted: 2022-06-03)

Debiased Machine Learning without Sample-Splitting for Stable Estimators

Authors: Qizhao Chen, Vasilis Syrgkanis, Morgane Austern

Estimation and inference on causal parameters is typically reduced to a
generalized method of moments problem, which involves auxiliary functions that
correspond to solutions to a regression or classification problem. A recent line
of work on debiased machine learning shows how one can use generic machine
learning estimators for these auxiliary problems while maintaining asymptotic
normality and root-$n$ consistency of the target parameter of interest,
requiring only mean-squared-error guarantees from the auxiliary estimation
algorithms. The literature typically requires that these auxiliary problems are
fitted on a separate sample or in a cross-fitting manner. We show that when
these auxiliary estimation algorithms satisfy natural leave-one-out stability
properties, then sample splitting is not required. This allows for sample
re-use, which can be beneficial in moderately sized sample regimes. For
instance, we show that the stability properties that we propose are satisfied
for ensemble bagged estimators, built via sub-sampling without replacement, a
popular technique in machine learning practice.
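As an illustration of this setting, here is a hedged sketch of an AIPW-style doubly robust ATE estimate whose nuisance functions are fitted with bagged ensembles built by subsampling without replacement and then evaluated on the same sample, i.e., without cross-fitting. The data-generating process and tuning choices are assumptions made for the example, not the paper's design.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier, BaggingRegressor
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

rng = np.random.default_rng(3)
n = 2000
X = rng.normal(size=(n, 5))
e = 1 / (1 + np.exp(-X[:, 0]))                # true propensity score
D = rng.binomial(1, e)                        # treatment
Y = 1.0 * D + X[:, 0] + rng.normal(size=n)    # outcome, true ATE = 1

# Bagged nuisance estimators built via sub-sampling *without* replacement,
# the case for which leave-one-out stability is argued to hold.
bag = dict(n_estimators=200, max_samples=0.5, bootstrap=False, random_state=0)
prop = BaggingClassifier(DecisionTreeClassifier(max_depth=3), **bag).fit(X, D)
out1 = BaggingRegressor(DecisionTreeRegressor(max_depth=3), **bag).fit(X[D == 1], Y[D == 1])
out0 = BaggingRegressor(DecisionTreeRegressor(max_depth=3), **bag).fit(X[D == 0], Y[D == 0])

# AIPW score evaluated on the *same* sample (no sample splitting).
e_hat = np.clip(prop.predict_proba(X)[:, 1], 0.05, 0.95)
m1, m0 = out1.predict(X), out0.predict(X)
psi = m1 - m0 + D * (Y - m1) / e_hat - (1 - D) * (Y - m0) / (1 - e_hat)
print(f"AIPW ATE estimate: {psi.mean():.3f} (se {psi.std(ddof=1) / np.sqrt(n):.3f})")
```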

arXiv link: http://arxiv.org/abs/2206.01825v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-06-03

Bayesian and Frequentist Inference for Synthetic Controls

Authors: Ignacio Martinez, Jaume Vives-i-Bastida

The synthetic control method has become a widely popular tool to estimate
causal effects with observational data. Despite this, inference for synthetic
control methods remains challenging. Often, inferential results rely on linear
factor model data generating processes. In this paper, we characterize the
conditions on the factor model primitives (the factor loadings) for which the
statistical risk minimizers are synthetic controls (in the simplex). Then, we
propose a Bayesian alternative to the synthetic control method that preserves
the main features of the standard method and provides a new way of doing valid
inference. We explore a Bernstein-von Mises style result to link our Bayesian
inference to the frequentist inference. For linear factor model frameworks we
show that a maximum likelihood estimator (MLE) of the synthetic control weights
can consistently estimate the predictive function of the potential outcomes for
the treated unit and that our Bayes estimator is asymptotically close to the
MLE in the total variation sense. Through simulations, we show that there is
convergence between the Bayes and frequentist approach even in sparse settings.
Finally, we apply the method to re-visit the study of the economic costs of the
German re-unification and the Catalan secession movement. The Bayesian
synthetic control method is available in the bsynth R-package.
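For context, the frequentist building block being generalized here is the simplex-constrained least-squares problem that defines synthetic control weights. The sketch below solves that problem on toy data; it is only an illustration of weights in the simplex, not the paper's Bayesian procedure (which is implemented in the bsynth R package).

```python
import numpy as np
from scipy.optimize import minimize

def synthetic_control_weights(y1_pre, Y0_pre):
    """Simplex-constrained least squares: min_w ||y1 - Y0 w||^2, w >= 0, sum(w) = 1."""
    J = Y0_pre.shape[1]
    objective = lambda w: np.sum((y1_pre - Y0_pre @ w) ** 2)
    res = minimize(objective, np.full(J, 1.0 / J), method="SLSQP",
                   bounds=[(0.0, 1.0)] * J,
                   constraints={"type": "eq", "fun": lambda w: np.sum(w) - 1.0})
    return res.x

rng = np.random.default_rng(4)
T0, J = 30, 8                                  # pre-treatment periods, donor units
Y0 = rng.normal(size=(T0, J)).cumsum(axis=0)   # donor outcomes
true_w = np.array([0.6, 0.4] + [0.0] * (J - 2))
y1 = Y0 @ true_w + rng.normal(scale=0.1, size=T0)
print(np.round(synthetic_control_weights(y1, Y0), 2))
```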

arXiv link: http://arxiv.org/abs/2206.01779v3

Econometrics arXiv paper, submitted: 2022-06-03

Cointegration and ARDL specification between the Dubai crude oil and the US natural gas market

Authors: Stavros Stavroyiannis

This paper examines the relationship between the price of the Dubai crude oil
and the price of the US natural gas using an updated monthly dataset from 1992
to 2018, incorporating the latest events in the energy markets. After employing
a variety of unit root and cointegration tests, the long-run relationship is
examined via the autoregressive distributed lag (ARDL) cointegration technique,
along with the Toda-Yamamoto (1995) causality test. Our results indicate that
there is a long-run relationship with a unidirectional causality running from
the Dubai crude oil market to the US natural gas market. A variety of
post-estimation specification tests indicate that the selected ARDL model is well-specified,
and the results of the Toda-Yamamoto approach via impulse response functions,
forecast error variance decompositions, and historical decompositions with
generalized weights, show that the Dubai crude oil price retains a positive
relationship and affects the US natural gas price.

arXiv link: http://arxiv.org/abs/2206.03278v1

Econometrics arXiv paper, submitted: 2022-06-02

Randomization Inference Tests for Shift-Share Designs

Authors: Luis Alvarez, Bruno Ferman, Raoni Oliveira

We consider the problem of inference in shift-share research designs. The
choice between existing approaches that allow for unrestricted spatial
correlation involves tradeoffs, varying in terms of their validity when there
are relatively few or concentrated shocks, and in terms of the assumptions on
the shock assignment process and treatment effects heterogeneity. We propose
alternative randomization inference methods that combine the advantages of
different approaches. These methods are valid in finite samples under
relatively stronger assumptions, while asymptotically valid under weaker
assumptions.
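The spirit of randomization inference in this setting can be conveyed with a toy example: hold the exposure shares fixed, re-draw (here, simply permute) the sector-level shocks, rebuild the shift-share regressor, and compare the observed coefficient with its randomization distribution. The sketch below is an illustration under simplifying assumptions, not the authors' proposed procedure.

```python
import numpy as np

rng = np.random.default_rng(5)
n_regions, n_sectors = 200, 30
shares = rng.dirichlet(np.ones(n_sectors), size=n_regions)   # region x sector shares
shocks = rng.normal(size=n_sectors)                          # sector-level shocks
x = shares @ shocks                                          # shift-share regressor
y = 0.5 * x + rng.normal(size=n_regions)                     # regional outcome

def ols_slope(x, y):
    xc = x - x.mean()
    return np.dot(xc, y - y.mean()) / np.dot(xc, xc)

beta_obs = ols_slope(x, y)

# Randomization distribution: permute shocks across sectors, shares held fixed.
n_perm = 999
beta_perm = np.array([ols_slope(shares @ rng.permutation(shocks), y)
                      for _ in range(n_perm)])
p_value = (1 + np.sum(np.abs(beta_perm) >= np.abs(beta_obs))) / (n_perm + 1)
print(f"beta_hat = {beta_obs:.3f}, randomization p-value = {p_value:.3f}")
```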

arXiv link: http://arxiv.org/abs/2206.00999v1

Econometrics arXiv paper, submitted: 2022-06-01

Human Wellbeing and Machine Learning

Authors: Ekaterina Oparina, Caspar Kaiser, Niccolò Gentile, Alexandre Tkatchenko, Andrew E. Clark, Jan-Emmanuel De Neve, Conchita D'Ambrosio

There is a vast literature on the determinants of subjective wellbeing.
International organisations and statistical offices are now collecting such
survey data at scale. However, standard regression models explain surprisingly
little of the variation in wellbeing, limiting our ability to predict it. In
response, we here assess the potential of Machine Learning (ML) to help us
better understand wellbeing. We analyse wellbeing data on over a million
respondents from Germany, the UK, and the United States. In terms of predictive
power, our ML approaches do perform better than traditional models. Although
the size of the improvement is small in absolute terms, it turns out to be
substantial when compared to that of key variables like health. We moreover
find that drastically expanding the set of explanatory variables doubles the
predictive power of both OLS and the ML approaches on unseen data. The
variables identified as important by our ML algorithms - i.e. material
conditions, health, and meaningful social relations - are similar to those that
have already been identified in the literature. In that sense, our data-driven
ML results validate the findings from conventional approaches.

arXiv link: http://arxiv.org/abs/2206.00574v1

Econometrics arXiv paper, submitted: 2022-06-01

Time-Varying Multivariate Causal Processes

Authors: Jiti Gao, Bin Peng, Wei Biao Wu, Yayi Yan

In this paper, we consider a wide class of time-varying multivariate causal
processes which nests many classic and new examples as special cases. We first
prove the existence of a weakly dependent stationary approximation for our
model which is the foundation to initiate the theoretical development.
Afterwards, we consider the QMLE estimation approach, and provide both
point-wise and simultaneous inferences on the coefficient functions. In
addition, we demonstrate the theoretical findings through both simulated and
real data examples. In particular, we show the empirical relevance of our study
using an application to evaluate the conditional correlations between the stock
markets of China and U.S. We find that the interdependence between the two
stock markets is increasing over time.

arXiv link: http://arxiv.org/abs/2206.00409v1

Econometrics arXiv updated paper (originally submitted: 2022-05-31)

Predicting Day-Ahead Stock Returns using Search Engine Query Volumes: An Application of Gradient Boosted Decision Trees to the S&P 100

Authors: Christopher Bockel-Rickermann

The internet has changed the way we live, work and take decisions. As it is
the major modern resource for research, detailed data on internet usage
exhibits vast amounts of behavioral information. This paper aims to answer the
question of whether this information can be exploited to predict future returns
of stocks on financial capital markets. In an empirical analysis it implements
gradient boosted decision trees to learn relationships between abnormal returns
of stocks within the S&P 100 index and lagged predictors derived from
historical financial data, as well as search term query volumes on the internet
search engine Google. Models predict the occurrence of day-ahead stock returns
in excess of the index median. On a time frame from 2005 to 2017, all disparate
datasets exhibit valuable information. Evaluated models have average areas
under the receiver operating characteristic curve between 54.2% and 56.7%, clearly
indicating a classification better than random guessing. Implementing a simple
statistical arbitrage strategy, models are used to create daily trading
portfolios of ten stocks and result in annual performances of more than 57%
before transaction costs. With ensembles of different data sets topping up the
performance ranking, the results further question the weak form and semi-strong
form efficiency of modern financial capital markets. Even though transaction
costs are not included, the approach adds to the existing literature. It gives
guidance on how to use and transform data on internet usage behavior for
financial and economic modeling and forecasting.
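A stripped-down version of the modelling step looks as follows: a gradient boosted classifier is trained on lagged features to predict whether the next day's abnormal return exceeds the cross-sectional median, using a chronological train/test split. The simulated features merely stand in for the historical and search-volume predictors; nothing here reproduces the paper's data or results.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(6)
n = 3000
X = rng.normal(size=(n, 4))                         # hypothetical lagged predictors
signal = 0.2 * X[:, 0] - 0.1 * X[:, 3]              # weak predictive signal
y = (signal + rng.normal(size=n) > 0).astype(int)   # 1 = return above index median

split = int(0.8 * n)                                # chronological split
model = GradientBoostingClassifier(n_estimators=300, max_depth=3,
                                   learning_rate=0.05, random_state=0)
model.fit(X[:split], y[:split])
auc = roc_auc_score(y[split:], model.predict_proba(X[split:])[:, 1])
print(f"Out-of-sample AUC: {auc:.3f}")
```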

arXiv link: http://arxiv.org/abs/2205.15853v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2022-05-31

Variable importance without impossible data

Authors: Masayoshi Mase, Art B. Owen, Benjamin B. Seiler

The most popular methods for measuring importance of the variables in a black
box prediction algorithm make use of synthetic inputs that combine predictor
variables from multiple subjects. These inputs can be unlikely, physically
impossible, or even logically impossible. As a result, the predictions for such
cases can be based on data very unlike any the black box was trained on. We
think that users cannot trust an explanation of the decision of a prediction
algorithm when the explanation uses such values. Instead we advocate a method
called Cohort Shapley that is grounded in economic game theory and unlike most
other game theoretic methods, it uses only actually observed data to quantify
variable importance. Cohort Shapley works by narrowing the cohort of subjects
judged to be similar to a target subject on one or more features. We illustrate
it on an algorithmic fairness problem where it is essential to attribute
importance to protected variables that the model was not trained on.

arXiv link: http://arxiv.org/abs/2205.15750v3

Econometrics arXiv updated paper (originally submitted: 2022-05-31)

Estimating spot volatility under infinite variation jumps with dependent market microstructure noise

Authors: Qiang Liu, Zhi Liu

Jumps and market microstructure noise are stylized features of high-frequency
financial data. It is well known that they introduce bias in the estimation of
volatility (including integrated and spot volatilities) of assets, and many
methods have been proposed to deal with this problem. When the jumps are
intensive with infinite variation, the efficient estimation of spot volatility
under serially dependent noise is not available and is thus in need. For this
purpose, we propose a novel estimator of spot volatility with a hybrid use of
the pre-averaging technique and the empirical characteristic function. Under
mild assumptions, the results of consistency and asymptotic normality of our
estimator are established. Furthermore, we show that our estimator achieves an
almost efficient convergence rate with optimal variance when the jumps are
either less active or active with symmetric structure. Simulation studies
verify our theoretical conclusions. We apply our proposed estimator to
empirical analyses, such as estimating the weekly volatility curve using
second-by-second transaction price data.

arXiv link: http://arxiv.org/abs/2205.15738v2

Econometrics arXiv updated paper (originally submitted: 2022-05-30)

Fast Two-Stage Variational Bayesian Approach to Estimating Panel Spatial Autoregressive Models with Unrestricted Spatial Weights Matrices

Authors: Deborah Gefang, Stephen G. Hall, George S. Tavlas

This paper proposes a fast two-stage variational Bayesian (VB) algorithm to
estimate unrestricted panel spatial autoregressive models. Using
Dirichlet-Laplace priors, we are able to uncover the spatial relationships
between cross-sectional units without imposing any a priori restrictions. Monte
Carlo experiments show that our approach works well for both long and short
panels. We are also the first in the literature to develop VB methods to
estimate large covariance matrices with unrestricted sparsity patterns, which
are useful for popular large data models such as Bayesian vector
autoregressions. In empirical applications, we examine the spatial
interdependence between euro area sovereign bond ratings and spreads. We find
marked differences between the spillover behaviours of the northern euro area
countries and those of the south.

arXiv link: http://arxiv.org/abs/2205.15420v3

Econometrics arXiv cross-link from cs.GT (cs.GT), submitted: 2022-05-29

Credible, Strategyproof, Optimal, and Bounded Expected-Round Single-Item Auctions for all Distributions

Authors: Meryem Essaidi, Matheus V. X. Ferreira, S. Matthew Weinberg

We consider a revenue-maximizing seller with a single item for sale to
multiple buyers with i.i.d. valuations. Akbarpour and Li (2020) show that the
only optimal, credible, strategyproof auction is the ascending price auction
with reserves which has unbounded communication complexity. Recent work of
Ferreira and Weinberg (2020) circumvents their impossibility result assuming
the existence of cryptographically secure commitment schemes, and designs a
two-round credible, strategyproof, optimal auction. However, their auction is
only credible when buyers' valuations are MHR or $\alpha$-strongly regular:
they show their auction might not be credible even when there is a single buyer
drawn from a non-MHR distribution. In this work, under the same cryptographic
assumptions, we identify a new single-item auction that is credible,
strategyproof, revenue optimal, and terminates in constant rounds in
expectation for all distributions with finite monopoly price.

arXiv link: http://arxiv.org/abs/2205.14758v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2022-05-28

Provably Auditing Ordinary Least Squares in Low Dimensions

Authors: Ankur Moitra, Dhruv Rohatgi

Measuring the stability of conclusions derived from Ordinary Least Squares
linear regression is critically important, but most metrics either only measure
local stability (i.e. against infinitesimal changes in the data), or are only
interpretable under statistical assumptions. Recent work proposes a simple,
global, finite-sample stability metric: the minimum number of samples that need
to be removed so that rerunning the analysis overturns the conclusion,
specifically meaning that the sign of a particular coefficient of the estimated
regressor changes. However, besides the trivial exponential-time algorithm, the
only approach for computing this metric is a greedy heuristic that lacks
provable guarantees under reasonable, verifiable assumptions; the heuristic
provides a loose upper bound on the stability and also cannot certify lower
bounds on it.
We show that in the low-dimensional regime where the number of covariates is
a constant but the number of samples is large, there are efficient algorithms
for provably estimating (a fractional version of) this metric. Applying our
algorithms to the Boston Housing dataset, we exhibit regression analyses where
we can estimate the stability up to a factor of $3$ better than the greedy
heuristic, and analyses where we can certify stability to dropping even a
majority of the samples.
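The greedy heuristic referred to above is easy to state: repeatedly delete the single observation whose removal moves the target coefficient furthest toward zero, and count how many deletions are needed before its sign flips. The sketch below implements that heuristic (which only yields an upper bound on the stability metric) on simulated data; it is not the authors' provable estimator.

```python
import numpy as np

def greedy_sign_flip(X, y, coef_index):
    """Greedy upper bound on how many samples must be dropped to flip the sign
    of one OLS coefficient: at each step remove the observation whose deletion
    moves that coefficient furthest toward (and past) zero."""
    idx = np.arange(len(y))
    sign = np.sign(np.linalg.lstsq(X, y, rcond=None)[0][coef_index])
    removed = 0
    while len(idx) > X.shape[1] + 1:
        best_val, best_i = None, None
        for i in range(len(idx)):
            keep = np.delete(idx, i)
            b = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0][coef_index]
            if best_val is None or sign * b < sign * best_val:
                best_val, best_i = b, i
        idx = np.delete(idx, best_i)
        removed += 1
        if np.sign(best_val) != sign:
            return removed
    return None  # sign never flipped before running out of degrees of freedom

rng = np.random.default_rng(7)
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = 0.2 * X[:, 1] + rng.normal(size=n)
print("Removals needed to flip the sign:", greedy_sign_flip(X, y, coef_index=1))
```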

arXiv link: http://arxiv.org/abs/2205.14284v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-05-27

Average Adjusted Association: Efficient Estimation with High Dimensional Confounders

Authors: Sung Jae Jun, Sokbae Lee

The log odds ratio is a well-established metric for evaluating the
association between binary outcome and exposure variables. Despite its
widespread use, there has been limited discussion on how to summarize the log
odds ratio as a function of confounders through averaging. To address this
issue, we propose the Average Adjusted Association (AAA), which is a summary
measure of association in a heterogeneous population, adjusted for observed
confounders. To facilitate the use of it, we also develop efficient
double/debiased machine learning (DML) estimators of the AAA. Our DML
estimators use two equivalent forms of the efficient influence function, and
are applicable in various sampling scenarios, including random sampling,
outcome-based sampling, and exposure-based sampling. Through real data and
simulations, we demonstrate the practicality and effectiveness of our proposed
estimators in measuring the AAA.

arXiv link: http://arxiv.org/abs/2205.14048v2

Econometrics arXiv updated paper (originally submitted: 2022-05-25)

Identification of Auction Models Using Order Statistics

Authors: Yao Luo, Ruli Xiao

Auction data often contain information on only the most competitive bids as
opposed to all bids. The usual measurement error approaches to unobserved
heterogeneity are inapplicable due to dependence among order statistics. We
bridge this gap by providing a set of positive identification results. First,
we show that symmetric auctions with discrete unobserved heterogeneity are
identifiable using two consecutive order statistics and an instrument. Second,
we extend the results to ascending auctions with unknown competition and
unobserved heterogeneity.

arXiv link: http://arxiv.org/abs/2205.12917v2

Econometrics arXiv cross-link from q-fin.CP (q-fin.CP), submitted: 2022-05-25

Machine learning method for return direction forecasting of Exchange Traded Funds using classification and regression models

Authors: Raphael P. B. Piovezan, Pedro Paulo de Andrade Junior

This article aims to propose and apply a machine learning method to analyze
the direction of returns from Exchange Traded Funds (ETFs) using the historical
return data of its components, helping to make investment strategy decisions
through a trading algorithm. In methodological terms, regression and
classification models were applied, using standard datasets from Brazilian and
American markets, in addition to algorithmic error metrics. In terms of
research results, they were analyzed and compared to those of the Naïve
forecast and the returns obtained by the buy & hold technique in the same
period of time. In terms of risk and return, the models mostly performed better
than the control metrics, with emphasis on the linear regression model and the
classification models by logistic regression, support vector machine (using the
LinearSVC model), Gaussian Naive Bayes and K-Nearest Neighbors, where in
certain datasets the returns exceeded those of the buy & hold control model by
two times and the Sharpe ratio by up to four times.

arXiv link: http://arxiv.org/abs/2205.12746v2

Econometrics arXiv updated paper (originally submitted: 2022-05-24)

Estimation and Inference for High Dimensional Factor Model with Regime Switching

Authors: Giovanni Urga, Fa Wang

This paper proposes maximum (quasi)likelihood estimation for high dimensional
factor models with regime switching in the loadings. The model parameters are
estimated jointly by the EM (expectation maximization) algorithm, which in the
current context only requires iteratively calculating regime probabilities and
principal components of the weighted sample covariance matrix. When regime
dynamics are taken into account, smoothed regime probabilities are calculated
using a recursive algorithm. Consistency, convergence rates and limit
distributions of the estimated loadings and the estimated factors are
established under weak cross-sectional and temporal dependence as well as
heteroscedasticity. It is worth noting that due to high dimension, regime
switching can be identified consistently after the switching point with only
one observation. Simulation results show good performance of the proposed
method. An application to the FRED-MD dataset illustrates the potential of the
proposed method for detection of business cycle turning points.

arXiv link: http://arxiv.org/abs/2205.12126v2

Econometrics arXiv updated paper (originally submitted: 2022-05-24)

Subgeometrically ergodic autoregressions with autoregressive conditional heteroskedasticity

Authors: Mika Meitz, Pentti Saikkonen

In this paper, we consider subgeometric (specifically, polynomial) ergodicity
of univariate nonlinear autoregressions with autoregressive conditional
heteroskedasticity (ARCH). The notion of subgeometric ergodicity was introduced
in the Markov chain literature in 1980s and it means that the transition
probability measures converge to the stationary measure at a rate slower than
geometric; this rate is also closely related to the convergence rate of
$\beta$-mixing coefficients. While the existing literature on subgeometrically
ergodic autoregressions assumes a homoskedastic error term, this paper provides
an extension to the case of conditionally heteroskedastic ARCH-type errors,
considerably widening the scope of potential applications. Specifically, we
consider suitably defined higher-order nonlinear autoregressions with possibly
nonlinear ARCH errors and show that they are, under appropriate conditions,
subgeometrically ergodic at a polynomial rate. An empirical example using
energy sector volatility index data illustrates the use of subgeometrically
ergodic AR-ARCH models.

arXiv link: http://arxiv.org/abs/2205.11953v2

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2022-05-23

Quasi Black-Box Variational Inference with Natural Gradients for Bayesian Learning

Authors: Martin Magris, Mostafa Shabani, Alexandros Iosifidis

We develop an optimization algorithm suitable for Bayesian learning in
complex models. Our approach relies on natural gradient updates within a
general black-box framework for efficient training with limited model-specific
derivations. It applies within the class of exponential-family variational
posterior distributions, for which we extensively discuss the Gaussian case for
which the updates have a rather simple form. Our Quasi Black-box Variational
Inference (QBVI) framework is readily applicable to a wide class of Bayesian
inference problems and is simple to implement, as the updates of the
variational posterior do not involve gradients with respect to the model
parameters, nor the prescription of the Fisher information matrix. We develop
QBVI under different hypotheses for the posterior covariance matrix, discuss
details about its robust and feasible implementation, and provide a number of
real-world applications to demonstrate its effectiveness.

arXiv link: http://arxiv.org/abs/2205.11568v3

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2022-05-23

Robust and Agnostic Learning of Conditional Distributional Treatment Effects

Authors: Nathan Kallus, Miruna Oprescu

The conditional average treatment effect (CATE) is the best measure of
individual causal effects given baseline covariates. However, the CATE only
captures the (conditional) average, and can overlook risks and tail events,
which are important to treatment choice. In aggregate analyses, this is usually
addressed by measuring the distributional treatment effect (DTE), such as
differences in quantiles or tail expectations between treatment groups.
Hypothetically, one can similarly fit conditional quantile regressions in each
treatment group and take their difference, but this would not be robust to
misspecification or provide agnostic best-in-class predictions. We provide a
new robust and model-agnostic methodology for learning the conditional DTE
(CDTE) for a class of problems that includes conditional quantile treatment
effects, conditional super-quantile treatment effects, and conditional
treatment effects on coherent risk measures given by $f$-divergences. Our
method is based on constructing a special pseudo-outcome and regressing it on
covariates using any regression learner. Our method is model-agnostic in that
it can provide the best projection of CDTE onto the regression model class. Our
method is robust in that even if we learn these nuisances nonparametrically at
very slow rates, we can still learn CDTEs at rates that depend on the class
complexity and even conduct inferences on linear projections of CDTEs. We
investigate the behavior of our proposal in simulations, as well as in a case
study of 401(k) eligibility effects on wealth.

arXiv link: http://arxiv.org/abs/2205.11486v3

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2022-05-23

Probabilistic forecasting of German electricity imbalance prices

Authors: Michał Narajewski

The exponential growth of renewable energy capacity has brought much
uncertainty to electricity prices and to electricity generation. To address
this challenge, the energy exchanges have been developing further trading
possibilities, especially the intraday and balancing markets. For an energy
trader participating in both markets, the forecasting of imbalance prices is of
particular interest. Therefore, in this manuscript we conduct very short-term
probabilistic forecasting of imbalance prices, contributing to the scarce
literature on this novel subject. The forecasting is performed 30 minutes
before the delivery, so that the trader might still choose the trading place.
The distribution of the imbalance prices is modelled and forecasted using
methods well-known in the electricity price forecasting literature: lasso with
bootstrap, gamlss, and probabilistic neural networks. The methods are compared
with a naive benchmark in a meaningful rolling window study. The results
provide evidence of the efficiency between the intraday and balancing markets
as the sophisticated methods do not substantially outperform the intraday
continuous price index. On the other hand, they significantly improve the
empirical coverage. The analysis was conducted on the German market, however it
could be easily applied to any other market of similar structure.

arXiv link: http://arxiv.org/abs/2205.11439v1

Econometrics arXiv cross-link from eess.SY (eess.SY), submitted: 2022-05-23

A Novel Control-Oriented Cell Transmission Model Including Service Stations on Highways

Authors: Carlo Cenedese, Michele Cucuzzella, Antonella Ferrara, John Lygeros

In this paper, we propose a novel model that describes how the traffic
evolution on a highway stretch is affected by the presence of a service
station. The presented model enhances the classical CTM dynamics by adding the
dynamics associated with the service stations, where the vehicles may stop
before merging back into the mainstream. We name it CTM-s. We discuss its
flexibility in describing different complex scenarios where multiple stations
are characterized by different drivers' average stopping times corresponding to
different services. The model has been developed to help design control
strategies aimed at decreasing traffic congestion. Thus, we discuss how
classical control schemes can interact with the proposed CTM-s. Finally,
we validate the proposed model through numerical simulations and assess the
effects of service stations on traffic evolution, which appear to be
beneficial, especially for relatively short congested periods.

arXiv link: http://arxiv.org/abs/2205.15115v3

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2022-05-23

Graph-Based Methods for Discrete Choice

Authors: Kiran Tomlinson, Austin R. Benson

Choices made by individuals have widespread impacts--for instance, people
choose between political candidates to vote for, between social media posts to
share, and between brands to purchase--moreover, data on these choices are
increasingly abundant. Discrete choice models are a key tool for learning
individual preferences from such data. Additionally, social factors like
conformity and contagion influence individual choice. Traditional methods for
incorporating these factors into choice models do not account for the entire
social network and require hand-crafted features. To overcome these
limitations, we use graph learning to study choice in networked contexts. We
identify three ways in which graph learning techniques can be used for discrete
choice: learning chooser representations, regularizing choice model parameters,
and directly constructing predictions from a network. We design methods in each
category and test them on real-world choice datasets, including county-level
2016 US election results and Android app installation and usage data. We show
that incorporating social network structure can improve the predictions of the
standard econometric choice model, the multinomial logit. We provide evidence
that app installations are influenced by social context, but we find no such
effect on app usage among the same participants, which instead is habit-driven.
In the election data, we highlight the additional insights a discrete choice
framework provides over classification or regression, the typical approaches.
On synthetic data, we demonstrate the sample complexity benefit of using social
information in choice models.

arXiv link: http://arxiv.org/abs/2205.11365v2

Econometrics arXiv paper, submitted: 2022-05-23

Regime and Treatment Effects in Duration Models: Decomposing Expectation and Transplant Effects on the Kidney Waitlist

Authors: Stephen Kastoryano

This paper proposes a causal decomposition framework for settings in which an
initial regime randomization influences the timing of a treatment duration. The
initial randomization and treatment affect in turn a duration outcome of
interest. Our empirical application considers the survival of individuals on
the kidney transplant waitlist. Upon entering the waitlist, individuals with an
AB blood type, who are universal recipients, are effectively randomized to a
regime with a higher propensity to rapidly receive a kidney transplant. Our
dynamic potential outcomes framework allows us to identify the pre-transplant
effect of the blood type, and the transplant effects depending on blood type.
We further develop dynamic assumptions which build on the LATE framework and
allow researchers to separate effects for different population substrata. Our
main empirical result is that AB blood type candidates display a higher
pre-transplant mortality. We provide evidence that this effect is due to
behavioural changes rather than biological differences.

arXiv link: http://arxiv.org/abs/2205.11189v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2022-05-22

Fast Instrument Learning with Faster Rates

Authors: Ziyu Wang, Yuhao Zhou, Jun Zhu

We investigate nonlinear instrumental variable (IV) regression given
high-dimensional instruments. We propose a simple algorithm which combines
kernelized IV methods and an arbitrary, adaptive regression algorithm, accessed
as a black box. Our algorithm enjoys faster-rate convergence and adapts to the
dimensionality of informative latent features, while avoiding an expensive
minimax optimization procedure, which has been necessary to establish similar
guarantees. It further brings the benefit of flexible machine learning models
to quasi-Bayesian uncertainty quantification, likelihood-based model selection,
and model averaging. Simulation studies demonstrate the competitive performance
of our method.

arXiv link: http://arxiv.org/abs/2205.10772v2

Econometrics arXiv cross-link from General Economics (econ.GN), submitted: 2022-05-21

The Effect of Increased Access to IVF on Women's Careers

Authors: Lingxi Chen

Motherhood is the main contributor to gender gaps in the labor market. IVF is
a method of assisted reproduction that allows women to delay fertility, which results in
a decreased motherhood income penalty. In this research, I estimate the effects
of expanded access to in vitro fertilization (IVF) arising from state insurance
mandates. I use a difference-in-differences model to estimate the effect of
increased IVF accessibility for delaying childbirth and decreasing the
motherhood income penalty. Using the fertility supplement dataset from the
Current Population Survey (CPS), I estimate how outcomes change in states when
they implement their mandates compared to how outcomes change in states that
are not changing their policies. The results indicate that IVF mandates
increase the probability of motherhood by age 38 by 3.1 percentage points (p<0.01).
However, the results provide no evidence that IVF insurance mandates impact
women's earnings.
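The research design described here is a staggered difference-in-differences. A minimal two-way fixed effects sketch on simulated data is given below; the variable names (state, year, mandate, mother) and the data are hypothetical placeholders, not the CPS fertility supplement used in the paper.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)
rows = []
adoption = {s: (2006 + s % 5 if s < 10 else None) for s in range(20)}  # staggered mandates
for s in range(20):
    for t in range(2000, 2016):
        treated = int(adoption[s] is not None and t >= adoption[s])
        mother = 0.4 + 0.03 * treated + 0.05 * rng.normal()   # motherhood proxy outcome
        rows.append({"state": s, "year": t, "mandate": treated, "mother": mother})
df = pd.DataFrame(rows)

# Two-way fixed effects DiD: state and year dummies plus the mandate indicator,
# with standard errors clustered by state.
fit = smf.ols("mother ~ mandate + C(state) + C(year)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["state"]})
print(f"DiD estimate: {fit.params['mandate']:.3f} (se {fit.bse['mandate']:.3f})")
```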

arXiv link: http://arxiv.org/abs/2205.14186v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-05-21

The Power of Prognosis: Improving Covariate Balance Tests with Outcome Information

Authors: Clara Bicalho, Adam Bouyamourn, Thad Dunning

Scholars frequently use covariate balance tests to test the validity of
natural experiments and related designs. Unfortunately, when measured
covariates are unrelated to potential outcomes, balance is uninformative about
key identification conditions. We show that balance tests can then lead to
erroneous conclusions. To build stronger tests, researchers should identify
covariates that are jointly predictive of potential outcomes; formally measure
and report covariate prognosis; and prioritize the most individually
informative variables in tests. Building on prior research on “prognostic
scores," we develop bootstrap balance tests that upweight covariates associated
with the outcome. We adapt this approach for regression-discontinuity designs
and use simulations to compare weighting methods based on linear regression and
more flexible methods, including machine learning. The results show how
prognosis weighting can avoid both false negatives and false positives. To
illustrate key points, we study empirical examples from a sample of published
studies, including an important debate over close elections.

arXiv link: http://arxiv.org/abs/2205.10478v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-05-20

What's the Harm? Sharp Bounds on the Fraction Negatively Affected by Treatment

Authors: Nathan Kallus

The fundamental problem of causal inference -- that we never observe
counterfactuals -- prevents us from identifying how many might be negatively
affected by a proposed intervention. If, in an A/B test, half of users click
(or buy, or watch, or renew, etc.), whether exposed to the standard experience
A or a new one B, hypothetically it could be because the change affects no one,
because the change positively affects half the user population to go from
no-click to click while negatively affecting the other half, or something in
between. While unknowable, this impact is clearly of material importance to the
decision to implement a change or not, whether due to fairness, long-term,
systemic, or operational considerations. We therefore derive the
tightest-possible (i.e., sharp) bounds on the fraction negatively affected (and
other related estimands) given data with only factual observations, whether
experimental or observational. Naturally, the more we can stratify individuals
by observable covariates, the tighter the sharp bounds. Since these bounds
involve unknown functions that must be learned from data, we develop a robust
inference algorithm that is efficient almost regardless of how and how fast
these functions are learned, remains consistent when some are mislearned, and
still gives valid conservative bounds when most are mislearned. Our methodology
altogether therefore strongly supports credible conclusions: it avoids
spuriously point-identifying this unknowable impact, focusing on the best
bounds instead, and it permits exceedingly robust inference on these. We
demonstrate our method in simulation studies and in a case study of career
counseling for the unemployed.
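For a binary outcome, the fraction negatively affected is P(Y(1)=0, Y(0)=1), and with only the two marginal means p1 = P(Y(1)=1) and p0 = P(Y(0)=1) the classical Frechet-Hoeffding argument already gives max(0, p0 - p1) as a lower bound and min(p0, 1 - p1) as an upper bound. The numeric sketch below illustrates these unconditional bounds for the click example in the abstract; the paper's contribution lies in sharpening them with covariates and providing robust inference, which this illustration does not attempt.

```python
def fraction_harmed_bounds(p1, p0):
    """Frechet-Hoeffding bounds on P(Y(1)=0, Y(0)=1) for a binary outcome,
    using only the marginal success rates under treatment (p1) and control (p0)."""
    lower = max(0.0, p0 - p1)
    upper = min(p0, 1.0 - p1)
    return lower, upper

# A/B test where half of users click under either arm: the fraction harmed is
# only bounded between 0 and 0.5 without further covariate stratification.
print(fraction_harmed_bounds(p1=0.5, p0=0.5))   # (0.0, 0.5)
print(fraction_harmed_bounds(p1=0.7, p0=0.5))   # (0.0, 0.3)
```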

arXiv link: http://arxiv.org/abs/2205.10327v2

Econometrics arXiv updated paper (originally submitted: 2022-05-20)

Treatment Effects in Bunching Designs: The Impact of Mandatory Overtime Pay on Hours

Authors: Leonard Goff

This paper studies the identifying power of bunching at kinks when the
researcher does not assume a parametric choice model. I find that in a general
choice model, identifying the average causal response to the policy switch at a
kink amounts to confronting two extrapolation problems, each about the
distribution of a counterfactual choice that is observed only in a censored
manner. I apply this insight to partially identify the effect of overtime pay
regulation on the hours of U.S. workers using administrative payroll data,
assuming that each distribution satisfies a weak non-parametric shape
constraint in the region where it is not observed. The resulting bounds are
informative and indicate a relatively small elasticity of demand for weekly
hours, addressing a long-standing question about the causal effects of the
overtime mandate.

arXiv link: http://arxiv.org/abs/2205.10310v4

Econometrics arXiv updated paper (originally submitted: 2022-05-20)

The Forecasting performance of the Factor model with Martingale Difference errors

Authors: Luca Mattia Rolla, Alessandro Giovannelli

This paper analyses the forecasting performance of a new class of factor
models with martingale difference errors (FMMDE) recently introduced by Lee and
Shao (2018). The FMMDE makes it possible to retrieve a transformation of the
original series so that the resulting variables can be partitioned according to
whether they are conditionally mean-independent with respect to past
information. We contribute to the literature in two respects. First, we propose
a novel methodology for selecting the number of factors in FMMDE. Through
simulation experiments, we show the good performance of our approach for finite
samples for various panel data specifications. Second, we compare the
forecasting performance of FMMDE with alternative factor model specifications
by conducting an extensive forecasting exercise using FRED-MD, a comprehensive
monthly macroeconomic database for the US economy. Our empirical findings
indicate that FMMDE provides an advantage in predicting the evolution of the
real sector of the economy when the novel methodology for factor selection is
adopted. These results are confirmed for key aggregates such as Production and
Income, the Labor Market, and Consumption.

arXiv link: http://arxiv.org/abs/2205.10256v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2022-05-20

A New Central Limit Theorem for the Augmented IPW Estimator: Variance Inflation, Cross-Fit Covariance and Beyond

Authors: Kuanhao Jiang, Rajarshi Mukherjee, Subhabrata Sen, Pragya Sur

Estimation of the average treatment effect (ATE) is a central problem in
causal inference. In recent times, inference for the ATE in the presence of
high-dimensional covariates has been extensively studied. Among the diverse
approaches that have been proposed, augmented inverse probability weighting
(AIPW) with cross-fitting has emerged as a popular choice in practice. In this
work, we study this cross-fit AIPW estimator under well-specified outcome
regression and propensity score models in a high-dimensional regime where the
number of features and samples are both large and comparable. Under assumptions
on the covariate distribution, we establish a new central limit theorem for the
suitably scaled cross-fit AIPW that applies without any sparsity assumptions on
the underlying high-dimensional parameters. Our CLT uncovers two crucial
phenomena among others: (i) the AIPW exhibits a substantial variance inflation
that can be precisely quantified in terms of the signal-to-noise ratio and
other problem parameters, (ii) the asymptotic covariance between the
pre-cross-fit estimators is non-negligible even on the root-n scale. These
findings are strikingly different from their classical counterparts. On the
technical front, our work utilizes a novel interplay between three distinct
tools--approximate message passing theory, the theory of deterministic
equivalents, and the leave-one-out approach. We believe our proof techniques
should be useful for analyzing other two-stage estimators in this
high-dimensional regime. Finally, we complement our theoretical results with
simulations that demonstrate both the finite sample efficacy of our CLT and its
robustness to our assumptions.

arXiv link: http://arxiv.org/abs/2205.10198v3

Econometrics arXiv updated paper (originally submitted: 2022-05-20)

Nonlinear Fore(Back)casting and Innovation Filtering for Causal-Noncausal VAR Models

Authors: Christian Gourieroux, Joann Jasiak

We show that the mixed causal-noncausal Vector Autoregressive (VAR) processes
satisfy the Markov property in both calendar and reverse time. Based on that
property, we introduce closed-form formulas of forward and backward predictive
densities for point and interval forecasting and backcasting out-of-sample. The
backcasting formula is used for adjusting the forecast interval to obtain a
desired coverage level when the tail quantiles are difficult to estimate. A
confidence set for the prediction interval is introduced for assessing the
uncertainty due to estimation. We also define new nonlinear past-dependent
innovations of mixed causal-noncausal VAR models for impulse response function
analysis. Our approach is illustrated by simulations and an application to oil
prices and real GDP growth rates.

arXiv link: http://arxiv.org/abs/2205.09922v4

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2022-05-19

High-dimensional Data Bootstrap

Authors: Victor Chernozhukov, Denis Chetverikov, Kengo Kato, Yuta Koike

This article reviews recent progress in high-dimensional bootstrap. We first
review high-dimensional central limit theorems for distributions of sample mean
vectors over the rectangles, bootstrap consistency results in high dimensions,
and key techniques used to establish those results. We then review selected
applications of high-dimensional bootstrap: construction of simultaneous
confidence sets for high-dimensional vector parameters, multiple hypothesis
testing via stepdown, post-selection inference, intersection bounds for
partially identified parameters, and inference on best policies in policy
evaluation. Finally, we also comment on a couple of future research directions.
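One workhorse tool behind the simultaneous confidence sets reviewed here is the Gaussian multiplier bootstrap for the maximum of scaled sample means over many coordinates. The sketch below is a bare-bones illustration of that idea on simulated data, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(9)
n, p = 400, 1000
X = rng.standard_t(df=8, size=(n, p))          # n observations of a p-dimensional vector
mean_hat = X.mean(axis=0)
resid = X - mean_hat

# Gaussian multiplier bootstrap for the sup-statistic max_j sqrt(n)|mean_j - mu_j|.
B = 1000
stats = np.empty(B)
for b in range(B):
    e = rng.normal(size=n)                     # Gaussian multipliers
    stats[b] = np.max(np.abs(e @ resid)) / np.sqrt(n)
crit = np.quantile(stats, 0.95)

# Simultaneous 95% confidence rectangle for all p coordinate means.
half_width = crit / np.sqrt(n)
lower, upper = mean_hat - half_width, mean_hat + half_width
print(f"critical value = {crit:.2f}, common half-width = {half_width:.3f}")
```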

arXiv link: http://arxiv.org/abs/2205.09691v1

Econometrics arXiv cross-link from q-fin.TR (q-fin.TR), submitted: 2022-05-18

Dynamics of a Binary Option Market with Exogenous Information and Price Sensitivity

Authors: Hannah Gampe, Christopher Griffin

In this paper, we derive and analyze a continuous model of a binary option market
with exogenous information. The resulting non-linear system has a discontinuous
right hand side, which can be analyzed using zero-dimensional Filippov
surfaces. Under general assumptions on purchasing rules, we show that when
exogenous information is constant in the binary asset market, the price always
converges. We then investigate market prices in the case of changing
information, showing empirically that price sensitivity has a strong effect on
price lag vs. information. We conclude with open questions on general $n$-ary
option markets. As a by-product of the analysis, we show that these markets are
equivalent to a simple recurrent neural network, helping to explain some of the
predictive power associated with prediction markets, which are usually designed
as $n$-ary option markets.

arXiv link: http://arxiv.org/abs/2206.07132v1

Econometrics arXiv updated paper (originally submitted: 2022-05-17)

Treatment Choice with Nonlinear Regret

Authors: Toru Kitagawa, Sokbae Lee, Chen Qiu

The literature focuses on the mean of welfare regret, which can lead to
undesirable treatment choice due to sensitivity to sampling uncertainty. We
propose to minimize the mean of a nonlinear transformation of regret and show
that singleton rules are not essentially complete for nonlinear regret.
Focusing on mean square regret, we derive closed-form fractions for
finite-sample Bayes and minimax optimal rules. Our approach is grounded in
decision theory and extends to limit experiments. The treatment fractions can
be viewed as the strength of evidence favoring treatment. We apply our
framework to a normal regression model and sample size calculation.

arXiv link: http://arxiv.org/abs/2205.08586v6

Econometrics arXiv updated paper (originally submitted: 2022-05-16)

The Power of Tests for Detecting $p$-Hacking

Authors: Graham Elliott, Nikolay Kudrin, Kaspar Wüthrich

A flourishing empirical literature investigates the prevalence of $p$-hacking
based on the distribution of $p$-values across studies. Interpreting results in
this literature requires a careful understanding of the power of methods for
detecting $p$-hacking. We theoretically study the implications of likely forms
of $p$-hacking on the distribution of $p$-values to understand the power of
tests for detecting it. Power can be low and depends crucially on the
$p$-hacking strategy and the distribution of true effects. Combined tests for
upper bounds and monotonicity and tests for continuity of the $p$-curve tend to
have the highest power for detecting $p$-hacking.

arXiv link: http://arxiv.org/abs/2205.07950v4

Econometrics arXiv updated paper (originally submitted: 2022-05-16)

2SLS with Multiple Treatments

Authors: Manudeep Bhuller, Henrik Sigstad

We study what two-stage least squares (2SLS) identifies in models with
multiple treatments under treatment effect heterogeneity. Two conditions are
shown to be necessary and sufficient for the 2SLS to identify positively
weighted sums of agent-specific effects of each treatment: average conditional
monotonicity and no cross effects. Our identification analysis allows for any
number of treatments, any number of continuous or discrete instruments, and the
inclusion of covariates. We provide testable implications and present
characterizations of choice behavior implied by our identification conditions.
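For readers who want the mechanics, here is a hand-rolled 2SLS sketch with two endogenous treatments and two instruments: each treatment is projected on the instruments (and exogenous regressors) in a first stage, and the outcome is then regressed on the fitted treatments. The data-generating process, with homogeneous effects so that 2SLS is consistent by construction, is an assumption of the example; the paper's question is precisely what 2SLS recovers once effects are heterogeneous.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 5000
Z = rng.normal(size=(n, 2))                      # two instruments
u = rng.normal(size=n)                           # unobserved confounder
D1 = (Z[:, 0] + 0.3 * u + rng.normal(size=n) > 0).astype(float)   # treatment 1
D2 = (Z[:, 1] + 0.3 * u + rng.normal(size=n) > 0).astype(float)   # treatment 2
Y = 1.0 * D1 - 0.5 * D2 + u + rng.normal(size=n) # homogeneous effects (1.0, -0.5)

const = np.ones((n, 1))                          # exogenous regressors (here: a constant)
D = np.column_stack([D1, D2])
W = np.column_stack([const, Z])                  # first-stage regressors

# First stage: fitted values of each treatment given instruments and exogenous terms.
D_hat = W @ np.linalg.lstsq(W, D, rcond=None)[0]

# Second stage: OLS of the outcome on the fitted treatments and the constant.
R = np.column_stack([const, D_hat])
beta = np.linalg.lstsq(R, Y, rcond=None)[0]
print("2SLS coefficients on (D1, D2):", np.round(beta[1:], 3))
```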

arXiv link: http://arxiv.org/abs/2205.07836v11

Econometrics arXiv paper, submitted: 2022-05-16

HARNet: A Convolutional Neural Network for Realized Volatility Forecasting

Authors: Rafael Reisenhofer, Xandro Bayer, Nikolaus Hautsch

Despite the impressive success of deep neural networks in many application
areas, neural network models have so far not been widely adopted in the context
of volatility forecasting. In this work, we aim to bridge the conceptual gap
between established time series approaches, such as the Heterogeneous
Autoregressive (HAR) model, and state-of-the-art deep neural network models.
The newly introduced HARNet is based on a hierarchy of dilated convolutional
layers, which facilitates an exponential growth of the receptive field of the
model in the number of model parameters. HARNets allow for an explicit
initialization scheme such that before optimization, a HARNet yields identical
predictions as the respective baseline HAR model. Particularly when considering
the QLIKE error as a loss function, we find that this approach significantly
stabilizes the optimization of HARNets. We evaluate the performance of HARNets
with respect to three different stock market indexes. Based on this evaluation,
we formulate clear guidelines for the optimization of HARNets and show that
HARNets can substantially improve upon the forecasting accuracy of their
respective HAR baseline models. In a qualitative analysis of the filter weights
learnt by a HARNet, we report clear patterns regarding the predictive power of
past information. Among information from the previous week, yesterday and the
day before, yesterday's volatility contributes by far the most to
today's realized volatility forecast. Moreover, within the previous month, the
importance of single weeks diminishes almost linearly when moving further into
the past.
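
As a point of reference for the baseline that HARNet is initialized to replicate, here is a minimal ordinary-least-squares HAR(1, 5, 22) sketch; the realized-volatility series is a random placeholder and should be replaced with actual data.

```python
# OLS HAR(1, 5, 22) baseline: regress realized volatility on its previous daily value
# and its previous weekly and monthly averages.
import numpy as np

def har_design(rv):
    """Return the HAR regressor matrix and the aligned dependent variable."""
    rv = np.asarray(rv, dtype=float)
    t = np.arange(22, len(rv))
    daily = rv[t - 1]
    weekly = np.array([rv[i - 5:i].mean() for i in t])
    monthly = np.array([rv[i - 22:i].mean() for i in t])
    return np.column_stack([np.ones(len(t)), daily, weekly, monthly]), rv[t]

rng = np.random.default_rng(2)
rv = np.abs(rng.normal(size=1500))   # placeholder series; use actual realized volatility

X, y = har_design(rv)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)   # intercept plus daily, weekly, and monthly HAR coefficients
```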

arXiv link: http://arxiv.org/abs/2205.07719v1

Econometrics arXiv updated paper (originally submitted: 2022-05-16)

Is climate change time reversible?

Authors: Francesco Giancaterini, Alain Hecq, Claudio Morana

This paper proposes strategies to detect time reversibility in stationary
stochastic processes by using the properties of mixed causal and noncausal
models. It shows that they can also be used for non-stationary processes when
the trend component is computed with the Hodrick-Prescott filter, which yields a
time-reversible closed-form solution. This paper also links the concept of an
environmental tipping point to the statistical property of time irreversibility
and assesses fourteen climate indicators. We find evidence of time
irreversibility in GHG emissions, global temperature, global sea levels, sea
ice area, and some natural oscillation indices. While not conclusive, our
findings urge the implementation of correction policies to avoid the worst
consequences of climate change and not miss the opportunity window, which might
still be available, despite closing quickly.
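
A hedged, much simpler diagnostic than the mixed causal-noncausal machinery used in the paper: strict time reversibility implies that $k$-period differences are symmetrically distributed, so their sample skewness should be close to zero. The series below are toy examples, not climate data.

```python
# Skewness of k-period differences: near zero for a time-reversible series,
# away from zero for an irreversible one. Toy series only.
import numpy as np
from scipy.stats import skew

def diff_skewness(x, lags=(1, 2, 3)):
    x = np.asarray(x, dtype=float)
    return {k: float(skew(x[k:] - x[:-k])) for k in lags}

rng = np.random.default_rng(3)
reversible = rng.normal(size=2000)                            # iid Gaussian: reversible
irreversible = np.maximum.accumulate(rng.normal(size=2000))   # running maximum: clearly irreversible

print(diff_skewness(reversible))     # values close to zero
print(diff_skewness(irreversible))   # strongly positive values
```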

arXiv link: http://arxiv.org/abs/2205.07579v3

Econometrics arXiv paper, submitted: 2022-05-15

Inference with Imputed Data: The Allure of Making Stuff Up

Authors: Charles F. Manski

Incomplete observability of data generates an identification problem. There
is no panacea for missing data. What one can learn about a population parameter
depends on the assumptions one finds credible to maintain. The credibility of
assumptions varies with the empirical setting. No specific assumptions can
provide a realistic general solution to the problem of inference with missing
data. Yet Rubin has promoted random multiple imputation (RMI) as a general way
to deal with missing values in public-use data. This recommendation has been
influential to empirical researchers who seek a simple fix to the nuisance of
missing data. This paper adds to my earlier critiques of imputation. It
provides a transparent assessment of the mix of Bayesian and frequentist
thinking used by Rubin to argue for RMI. It evaluates random imputation to
replace missing outcome or covariate data when the objective is to learn a
conditional expectation. It considers steps that might help combat the allure
of making stuff up.

arXiv link: http://arxiv.org/abs/2205.07388v1

Econometrics arXiv cross-link from math.OC (math.OC), submitted: 2022-05-15

Joint Location and Cost Planning in Maximum Capture Facility Location under Multiplicative Random Utility Maximization

Authors: Ngan Ha Duong, Tien Thanh Dam, Thuy Anh Ta, Tien Mai

We study a joint facility location and cost planning problem in a competitive
market under random utility maximization (RUM) models. The objective is to
locate new facilities and make decisions on the costs (or budgets) to spend on
the new facilities, aiming to maximize an expected captured customer demand,
assuming that customers choose a facility among all available facilities
according to a RUM model. We examine two RUM frameworks in the discrete choice
literature, namely, the additive and multiplicative RUM. While the former has
been widely used in facility location problems, we are the first to explore the
latter in the context. We numerically show that the two RUM frameworks can well
approximate each other in the context of the cost optimization problem. In
addition, we show that, under the additive RUM framework, the resultant cost
optimization problem becomes highly non-convex and may have several local
optima. In contrast, the use of the multiplicative RUM brings several
advantages to the competitive facility location problem. For instance, the cost
optimization problem under the multiplicative RUM can be solved efficiently by
a general convex optimization solver or can be reformulated as a conic
quadratic program and handled by a conic solver available in some off-the-shelf
solvers such as CPLEX or GUROBI. Furthermore, we consider a joint location and
cost optimization problem under the multiplicative RUM and propose three
approaches to solve the problem, namely, an equivalent conic reformulation, a
multi-cut outer-approximation algorithm, and a local search heuristic. We
provide numerical experiments based on synthetic instances of various sizes to
evaluate the performances of the proposed algorithms in solving the cost
optimization, and the joint location and cost optimization problems.

arXiv link: http://arxiv.org/abs/2205.07345v2

Econometrics arXiv paper, submitted: 2022-05-13

How do Bounce Rates vary according to product sold?

Authors: Himanshu Sharma

The bounce rate of an e-commerce website depends on several factors, and these
factors vary with the device through which traffic arrives. This paper examines
how the type of products sold by different e-commerce websites affects the
bounce rate observed on mobile versus desktop devices. It seeks to explain
observations that run counter to the general positive relationship between
mobile traffic share and bounce rate, and how this pattern differs on desktop.
To estimate how the types of products sold by e-commerce websites affect the
bounce rate over the observation period, a fixed effects model (the within-group
method) is used. In addition to the effect of product type on the bounce rate,
the effect of the individual website is also examined to verify the results
obtained for product type.
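
A minimal sketch of the within-group (fixed effects) step described above, using a simulated panel of websites with illustrative variable names; it is not the paper's dataset or specification.

```python
# Within-group (fixed effects) regression of bounce rate on mobile traffic share,
# with simulated website-level heterogeneity. Variable names are illustrative.
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
n_sites, n_periods = 30, 12
df = pd.DataFrame({
    "website": np.repeat(np.arange(n_sites), n_periods),
    "mobile_share": rng.uniform(0.2, 0.9, n_sites * n_periods),
})
site_effect = rng.normal(size=n_sites)[df["website"]]
df["bounce_rate"] = (0.4 + 0.25 * df["mobile_share"] + 0.1 * site_effect
                     + rng.normal(scale=0.02, size=len(df)))

# Within transformation: subtract website-specific means from each variable.
cols = ["mobile_share", "bounce_rate"]
demeaned = df[cols] - df.groupby("website")[cols].transform("mean")

x = demeaned["mobile_share"].to_numpy()
y = demeaned["bounce_rate"].to_numpy()
print((x @ y) / (x @ x))   # within-group slope, roughly 0.25 here
```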

arXiv link: http://arxiv.org/abs/2205.06866v1

Econometrics arXiv updated paper (originally submitted: 2022-05-13)

A Robust Permutation Test for Subvector Inference in Linear Regressions

Authors: Xavier D'Haultfœuille, Purevdorj Tuvaandorj

We develop a new permutation test for inference on a subvector of
coefficients in linear models. The test is exact when the regressors and the
error terms are independent. Then, we show that the test is asymptotically of
correct level, consistent and has power against local alternatives when the
independence condition is relaxed, under two main conditions. The first is a
slight reinforcement of the usual absence of correlation between the regressors
and the error term. The second is that the number of strata, defined by values
of the regressors not involved in the subvector test, is small compared to the
sample size. The latter implies that the vector of nuisance regressors is
discrete. Simulations and empirical illustrations suggest that the test has
good power in practice if, indeed, the number of strata is small compared to
the sample size.
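
The sketch below conveys the flavor of stratified permutation inference for a single coefficient (permuting the regressor of interest within strata defined by a discrete nuisance covariate); it is a simplification, not the authors' exact procedure, and the data are simulated.

```python
# Stratified permutation test for the coefficient on d: permute d within strata,
# recompute the t-statistic, and compare with the observed one. Simulated data.
import numpy as np

def ols_tstat(y, X, j):
    """OLS t-statistic for coefficient j with homoskedastic standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[j] / np.sqrt(cov[j, j])

def strata_permutation_pvalue(y, d, strata, n_perm=999, seed=0):
    rng = np.random.default_rng(seed)
    dummies = (strata[:, None] == np.unique(strata)).astype(float)[:, 1:]
    X = np.column_stack([np.ones(len(y)), d, dummies])
    t_obs = ols_tstat(y, X, 1)
    exceed = 0
    for _ in range(n_perm):
        d_perm = d.copy()
        for s in np.unique(strata):
            idx = np.flatnonzero(strata == s)
            d_perm[idx] = d[rng.permutation(idx)]   # permute d within the stratum
        Xp = X.copy()
        Xp[:, 1] = d_perm
        exceed += abs(ols_tstat(y, Xp, 1)) >= abs(t_obs)
    return (1 + exceed) / (1 + n_perm)

rng = np.random.default_rng(5)
n = 400
strata = rng.integers(0, 4, n)
d = rng.normal(size=n) + 0.5 * strata
y = 1.0 + 0.3 * strata + rng.normal(size=n)   # null holds: d has no effect
print(strata_permutation_pvalue(y, d, strata))
```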

arXiv link: http://arxiv.org/abs/2205.06713v4

Econometrics arXiv paper, submitted: 2022-05-12

Causal Estimation of Position Bias in Recommender Systems Using Marketplace Instruments

Authors: Rina Friedberg, Karthik Rajkumar, Jialiang Mao, Qian Yao, YinYin Yu, Min Liu

Information retrieval systems, such as online marketplaces, news feeds, and
search engines, are ubiquitous in today's digital society. They facilitate
information discovery by ranking retrieved items on predicted relevance, i.e.
likelihood of interaction (click, share) between users and items. Typically
modeled using past interactions, such rankings have a major drawback:
interaction depends on the attention items receive. A highly-relevant item
placed outside a user's attention could receive little interaction. This
discrepancy between observed interaction and true relevance is termed the
position bias. Position bias degrades relevance estimation and, when it
compounds over time, can silo users into items that merely appear relevant, causing
marketplace inefficiencies. Position bias may be identified with randomized
experiments, but such an approach can be prohibitive in cost and feasibility.
Past research has also suggested propensity score methods, which do not
adequately address unobserved confounding; and regression discontinuity
designs, which have poor external validity. In this work, we address these
concerns by leveraging the abundance of A/B tests in ranking evaluations as
instrumental variables. Historical A/B tests allow us to access exogenous
variation in rankings without manually introducing changes that would harm user
experience and platform revenue. We demonstrate our methodology in two distinct
applications at LinkedIn - feed ads and the People-You-May-Know (PYMK)
recommender. The marketplaces comprise users and campaigns on the ads side, and
invite senders and recipients on PYMK. By leveraging prior experimentation, we
obtain quasi-experimental variation in item rankings that is orthogonal to user
relevance. Our method provides robust position effect estimates that handle
unobserved confounding well, greater generalizability, and easily extends to
other information retrieval systems.

arXiv link: http://arxiv.org/abs/2205.06363v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-05-12

A single risk approach to the semiparametric copula competing risks model

Authors: Simon M. S. Lo, Ralf A. Wilke

A typical situation in competing risks analysis is that the researcher is
only interested in a subset of risks. This paper considers a dependent
competing risks model in which the distribution of one risk follows a parametric or
semi-parametric model, while the model for the other risks is left unspecified.
Identifiability is shown for popular classes of parametric models and the
semiparametric proportional hazards model. The identifiability of the
parametric models does not require a covariate, while the semiparametric model
requires at least one. Estimation approaches are suggested which are shown to
be root-$n$-consistent. Applicability and attractive finite sample
performance are demonstrated with the help of simulations and data examples.

arXiv link: http://arxiv.org/abs/2205.06087v1

Econometrics arXiv updated paper (originally submitted: 2022-05-11)

Multivariate ordered discrete response models

Authors: Tatiana Komarova, William Matcham

We introduce multivariate ordered discrete response models with general
rectangular structures. From the perspective of behavioral economics, these
non-lattice models correspond to broad bracketing in decision making, whereas
lattice models, which researchers typically estimate in practice, correspond to
narrow bracketing. In these models, we specify latent processes as a sum of an
index of covariates and an unobserved error, with unobservables for different
latent processes potentially correlated. We provide conditions that are
sufficient for identification under the independence of errors and covariates
and outline an estimation approach. We present simulations and empirical
examples, with a particular focus on probit specifications.

arXiv link: http://arxiv.org/abs/2205.05779v2

Econometrics arXiv updated paper (originally submitted: 2022-05-11)

Externally Valid Policy Choice

Authors: Christopher Adjaho, Timothy Christensen

We consider the problem of learning personalized treatment policies that are
externally valid or generalizable: they perform well in other target
populations besides the experimental (or training) population from which data
are sampled. We first show that welfare-maximizing policies for the
experimental population are robust to shifts in the distribution of outcomes
(but not characteristics) between the experimental and target populations. We
then develop new methods for learning policies that are robust to shifts in
outcomes and characteristics. In doing so, we highlight how treatment effect
heterogeneity within the experimental population affects the generalizability
of policies. Our methods may be used with experimental or observational data
(where treatment is endogenous). Many of our methods can be implemented with
linear programming.

arXiv link: http://arxiv.org/abs/2205.05561v3

Econometrics arXiv cross-link from physics.soc-ph (physics.soc-ph), submitted: 2022-05-10

On learning agent-based models from data

Authors: Corrado Monti, Marco Pangallo, Gianmarco De Francisci Morales, Francesco Bonchi

Agent-Based Models (ABMs) are used in several fields to study the evolution
of complex systems from micro-level assumptions. However, ABMs typically cannot
estimate agent-specific (or "micro") variables: this is a major limitation
which prevents ABMs from harnessing micro-level data availability and which
greatly limits their predictive power. In this paper, we propose a protocol to
learn the latent micro-variables of an ABM from data. The first step of our
protocol is to reduce an ABM to a probabilistic model, characterized by a
computationally tractable likelihood. This reduction follows two general design
principles: balance of stochasticity and data availability, and replacement of
unobservable discrete choices with differentiable approximations. Then, our
protocol proceeds by maximizing the likelihood of the latent variables via a
gradient-based expectation maximization algorithm. We demonstrate our protocol
by applying it to an ABM of the housing market, in which agents with different
incomes bid higher prices to live in high-income neighborhoods. We demonstrate
that the obtained model allows accurate estimates of the latent variables,
while preserving the general behavior of the ABM. We also show that our
estimates can be used for out-of-sample forecasting. Our protocol can be seen
as an alternative to black-box data assimilation methods, one that forces the
modeler to lay bare the assumptions of the model, to think about the
inferential process, and to spot potential identification problems.

arXiv link: http://arxiv.org/abs/2205.05052v2

Econometrics arXiv updated paper (originally submitted: 2022-05-10)

Estimating Discrete Games of Complete Information: Bringing Logit Back in the Game

Authors: Paul S. Koh

Estimating discrete games of complete information is often computationally
difficult due to partial identification and the absence of closed-form moment
characterizations. This paper proposes computationally tractable approaches to
estimation and inference that remove the computational burden associated with
equilibria enumeration, numerical simulation, and grid search. Separately for
unordered and ordered-actions games, I construct an identified set
characterized by a finite set of generalized likelihood-based conditional
moment inequalities that are convex in (a subvector of) structural model
parameters under the standard logit assumption on unobservables. I use
simulation and empirical examples to show that the proposed approaches generate
informative identified sets and can be several orders of magnitude faster than
existing estimation methods.

arXiv link: http://arxiv.org/abs/2205.05002v5

Econometrics arXiv updated paper (originally submitted: 2022-05-10)

Stable Outcomes and Information in Games: An Empirical Framework

Authors: Paul S. Koh

Empirically, many strategic settings are characterized by stable outcomes in
which players' decisions are publicly observed, yet no player takes the
opportunity to deviate. To analyze such situations in the presence of
incomplete information, we build an empirical framework by introducing a novel
solution concept that we call Bayes stable equilibrium. Our framework allows
the researcher to be agnostic about players' information and the equilibrium
selection rule. The Bayes stable equilibrium identified set collapses to the
complete information pure strategy Nash equilibrium identified set under strong
assumptions on players' information. Furthermore, all else equal, it is weakly
tighter than the Bayes correlated equilibrium identified set. We also propose
computationally tractable approaches for estimation and inference. In an
application, we study the strategic entry decisions of McDonald's and Burger
King in the US. Our results highlight the identifying power of informational
assumptions and show that the Bayes stable equilibrium identified set can be
substantially tighter than the Bayes correlated equilibrium identified set. In
a counterfactual experiment, we examine the impact of increasing access to
healthy food on the market structures in Mississippi food deserts.

arXiv link: http://arxiv.org/abs/2205.04990v2

Econometrics arXiv updated paper (originally submitted: 2022-05-10)

Distributionally Robust Policy Learning with Wasserstein Distance

Authors: Daido Kido

The effects of treatments are often heterogeneous, depending on the
observable characteristics, and it is necessary to exploit such heterogeneity
to devise individualized treatment rules (ITRs). Existing estimation methods of
such ITRs assume that the available experimental or observational data are
derived from the target population in which the estimated policy is
implemented. However, this assumption often fails in practice because of
limited useful data. In this case, policymakers must rely on the data generated
in the source population, which differs from the target population.
Unfortunately, existing estimation methods do not necessarily work as expected
in the new setting, and strategies that can achieve a reasonable goal in such a
situation are required. This study examines the application of distributionally
robust optimization (DRO), which formalizes ambiguity about the target
population and adapts to the worst-case scenario in the ambiguity set. It is shown that
DRO with Wasserstein distance-based characterization of ambiguity provides
simple intuitions and a simple estimation method. I then develop an estimator
for the distributionally robust ITR and evaluate its theoretical performance.
An empirical application shows that the proposed approach outperforms the naive
approach in the target population.

arXiv link: http://arxiv.org/abs/2205.04637v2

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2022-05-09

Robust Data-Driven Decisions Under Model Uncertainty

Authors: Xiaoyu Cheng

When sample data are governed by an unknown sequence of independent but
possibly non-identical distributions, the data-generating process (DGP) in
general cannot be perfectly identified from the data. For making decisions
facing such uncertainty, this paper presents a novel approach by studying how
the data can best be used to robustly improve decisions. That is, no matter
which DGP governs the uncertainty, one can make a better decision than without
using the data. I show that common inference methods, such as maximum likelihood
and Bayesian updating, cannot achieve this goal. To address this, I develop new
updating rules that lead to robustly better decisions either asymptotically
almost surely or in finite samples with a pre-specified probability. In particular,
they are easy to implement, as they are simple extensions of standard
statistical procedures in the case where the possible DGPs are all independent
and identically distributed. Finally, I show that the new updating rules also
lead to more intuitive conclusions in existing economic models such as asset
pricing under ambiguity.

arXiv link: http://arxiv.org/abs/2205.04573v1

Econometrics arXiv updated paper (originally submitted: 2022-05-09)

A unified test for regression discontinuity designs

Authors: Koki Fusejima, Takuya Ishihara, Masayuki Sawada

Diagnostic tests for regression discontinuity design face a size-control
problem. We document a massive over-rejection of the diagnostic restriction
among empirical studies in the top five economics journals. At least one
diagnostic test was rejected for 19 out of 59 studies, whereas less than 5% of
the collected 787 tests rejected the null hypotheses. In other words, one-third
of the studies rejected at least one of their diagnostic tests, whereas their
underlying identifying restrictions appear plausible. Multiple testing causes
this problem because the median number of tests per study was as high as 12.
Therefore, we offer unified tests to overcome the size-control problem. Our
procedure is based on the new joint asymptotic normality of local polynomial
mean and density estimates. In simulation studies, our unified tests
outperformed the Bonferroni correction. We implement the procedure as an R
package rdtest with two empirical examples in its vignettes.
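
The size-control problem can be seen with a back-of-the-envelope calculation: with 12 independent diagnostic tests each run at the 5% level (independence is an assumption made here for simplicity), the family-wise rejection probability under the null is already close to one half.

```python
# With 12 independent diagnostic tests at the 5% level, the chance of at least one
# false rejection is close to one half; Bonferroni restores (conservative) control.
m, alpha = 12, 0.05
print(round(1 - (1 - alpha) ** m, 3))        # about 0.46: unadjusted family-wise error
print(round(1 - (1 - alpha / m) ** m, 3))    # about 0.049: Bonferroni-adjusted
```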

arXiv link: http://arxiv.org/abs/2205.04345v5

Econometrics arXiv updated paper (originally submitted: 2022-05-08)

Policy Choice in Time Series by Empirical Welfare Maximization

Authors: Toru Kitagawa, Weining Wang, Mengshan Xu

This paper develops a novel method for policy choice in a dynamic setting
where the available data is a multi-variate time series. Building on the
statistical treatment choice framework, we propose Time-series Empirical
Welfare Maximization (T-EWM) methods to estimate an optimal policy rule by
maximizing an empirical welfare criterion constructed using nonparametric
potential outcome time series. We characterize conditions under which T-EWM
consistently learns a policy choice that is optimal in terms of conditional
welfare given the time-series history. We derive a nonasymptotic upper bound
for conditional welfare regret. To illustrate the implementation and uses of
T-EWM, we perform simulation studies and apply the method to estimate optimal
restriction rules against Covid-19.

arXiv link: http://arxiv.org/abs/2205.03970v4

Econometrics arXiv updated paper (originally submitted: 2022-05-08)

Dynamic demand for differentiated products with fixed-effects unobserved heterogeneity

Authors: Victor Aguirregabiria

This paper studies identification and estimation of a dynamic discrete choice
model of demand for differentiated products using consumer-level panel data with
few purchase events per consumer (i.e., short panel). Consumers are
forward-looking and their preferences incorporate two sources of dynamics: last
choice dependence due to habits and switching costs, and duration dependence
due to inventory, depreciation, or learning. A key distinguishing feature of
the model is that consumer unobserved heterogeneity has a Fixed Effects (FE)
structure -- that is, its probability distribution conditional on the initial
values of endogenous state variables is unrestricted. I apply and extend recent
results to establish the identification of all the structural parameters as
long as the dataset includes four or more purchase events per household. The
parameters can be estimated using a sufficient-statistic conditional maximum
likelihood (CML) method. An attractive feature of CML in this model is that the
sufficient statistic controls for the forward-looking value of the consumer's
decision problem such that the method does not require solving dynamic
programming problems or calculating expected present values.

arXiv link: http://arxiv.org/abs/2205.03948v2

Econometrics arXiv updated paper (originally submitted: 2022-05-07)

Identification and Estimation of Dynamic Games with Unknown Information Structure

Authors: Konan Hara, Yuki Ito, Paul Koh

We develop an empirical framework for analyzing dynamic games when the
underlying information structure is unknown to the analyst. We introduce
Markov correlated equilibrium, a dynamic analog of Bayes correlated
equilibrium, and show that its predictions coincide with the Markov perfect
equilibrium predictions attainable when players observe richer signals than the
analyst assumes. We provide tractable methods for informationally robust
estimation, inference, and counterfactual analysis. We illustrate the framework
with a dynamic entry game between Starbucks and Dunkin' in the US and study the
role of informational assumptions.

arXiv link: http://arxiv.org/abs/2205.03706v5

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2022-05-06

Benchmarking Econometric and Machine Learning Methodologies in Nowcasting

Authors: Daniel Hopp

Nowcasting can play a key role in giving policymakers timelier insight to
data published with a significant time lag, such as final GDP figures.
Currently, there are a plethora of methodologies and approaches for
practitioners to choose from. However, the literature lacks a comprehensive
comparison of these disparate approaches in terms of predictive performance and
characteristics. This paper addresses that deficiency by examining the
performance of 12 different methodologies in nowcasting US quarterly GDP
growth, including all the methods most commonly employed in nowcasting, as well
as some of the most popular traditional machine learning approaches.
Performance was assessed on three different tumultuous periods in US economic
history: the early 1980s recession, the 2008 financial crisis, and the COVID
crisis. The two best performing methodologies in the analysis were long
short-term memory artificial neural networks (LSTM) and Bayesian vector
autoregression (BVAR). To facilitate further application and testing of each of
the examined methodologies, an open-source repository containing boilerplate
code that can be applied to different datasets is published alongside the
paper, available at: github.com/dhopp1/nowcasting_benchmark.

arXiv link: http://arxiv.org/abs/2205.03318v1

Econometrics arXiv updated paper (originally submitted: 2022-05-06)

Leverage, Influence, and the Jackknife in Clustered Regression Models: Reliable Inference Using summclust

Authors: James G. MacKinnon, Morten Ørregaard Nielsen, Matthew D. Webb

We introduce a new Stata package called summclust that summarizes the cluster
structure of the dataset for linear regression models with clustered
disturbances. The key unit of observation for such a model is the cluster. We
therefore propose cluster-level measures of leverage, partial leverage, and
influence and show how to compute them quickly in most cases. The measures of
leverage and partial leverage can be used as diagnostic tools to identify
datasets and regression designs in which cluster-robust inference is likely to
be challenging. The measures of influence can provide valuable information
about how the results depend on the data in the various clusters. We also show
how to calculate two jackknife variance matrix estimators efficiently as a
byproduct of our other computations. These estimators, which are already
available in Stata, are generally more conservative than conventional variance
matrix estimators. The summclust package computes all the quantities that we
discuss.
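
A minimal numpy sketch of a cluster-level leverage measure, $L_g = tr\{X_g (X'X)^{-1} X_g'\}$, which sums to the number of regressors across clusters; this is an illustration on simulated data and is assumed to stand in for the measures discussed, not the summclust (Stata) implementation itself.

```python
# Cluster-level leverage L_g = trace(X_g (X'X)^{-1} X_g') on simulated clustered data;
# the values sum to the number of regressors, and unusually large values flag clusters
# that dominate the fit.
import numpy as np

rng = np.random.default_rng(6)
n, k, G = 600, 3, 20
cluster = rng.integers(0, G, n)
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])

XtX_inv = np.linalg.inv(X.T @ X)
leverage = np.array([
    np.trace(X[cluster == g] @ XtX_inv @ X[cluster == g].T) for g in range(G)
])
print(round(leverage.sum(), 6))   # equals k
print(leverage.round(3))
```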

arXiv link: http://arxiv.org/abs/2205.03288v3

Econometrics arXiv paper, submitted: 2022-05-06

Cluster-Robust Inference: A Guide to Empirical Practice

Authors: James G. MacKinnon, Morten Ørregaard Nielsen, Matthew D. Webb

Methods for cluster-robust inference are routinely used in economics and many
other disciplines. However, it is only recently that theoretical foundations
for the use of these methods in many empirically relevant situations have been
developed. In this paper, we use these theoretical results to provide a guide
to empirical practice. We do not attempt to present a comprehensive survey of
the (very large) literature. Instead, we bridge theory and practice by
providing a thorough guide on what to do and why, based on recently available
econometric theory and simulation evidence. To practice what we preach, we
include an empirical analysis of the effects of the minimum wage on labor
supply of teenagers using individual data.

arXiv link: http://arxiv.org/abs/2205.03285v1

Econometrics arXiv paper, submitted: 2022-05-06

Estimation and Inference by Stochastic Optimization

Authors: Jean-Jacques Forneron

In non-linear estimations, it is common to assess sampling uncertainty by
bootstrap inference. For complex models, this can be computationally intensive.
This paper combines optimization with resampling: turning stochastic
optimization into a fast resampling device. Two methods are introduced: a
resampled Newton-Raphson (rNR) and a resampled quasi-Newton (rqN) algorithm.
Both produce draws that can be used to compute consistent estimates, confidence
intervals, and standard errors in a single run. The draws are generated by a
gradient and Hessian (or an approximation) computed from batches of data that
are resampled at each iteration. The proposed methods transition quickly from
optimization to resampling when the objective is smooth and strictly convex.
Simulated and empirical applications illustrate the properties of the methods
on large scale and computationally intensive problems. Comparisons with
frequentist and Bayesian methods highlight the features of the algorithms.
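
A toy sketch of the general resample-while-optimizing idea (not the paper's rNR or rqN algorithms): Newton updates for a logistic-regression MLE in which the gradient and Hessian at each iteration come from a fresh bootstrap resample, so post-convergence iterates scatter around the estimate and their spread gives a rough gauge of sampling variability. All details below are illustrative assumptions.

```python
# Toy resample-while-optimizing loop (not the paper's rNR/rqN): logistic-regression
# Newton steps where each iteration's gradient and Hessian use a bootstrap resample.
import numpy as np

rng = np.random.default_rng(7)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0])
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-X @ beta_true))).astype(float)

def grad_hess(beta, Xb, yb):
    p = 1 / (1 + np.exp(-Xb @ beta))
    return Xb.T @ (yb - p), -(Xb * (p * (1 - p))[:, None]).T @ Xb

beta = np.zeros(2)
draws = []
for it in range(300):
    idx = rng.integers(0, n, n)             # fresh bootstrap resample each iteration
    g, H = grad_hess(beta, X[idx], y[idx])
    beta = beta - np.linalg.solve(H, g)     # Newton step on the resampled objective
    if it >= 100:                           # keep post-burn-in iterates
        draws.append(beta.copy())

draws = np.array(draws)
print(draws.mean(axis=0))   # close to the full-sample MLE (and to beta_true)
print(draws.std(axis=0))    # rough gauge of sampling variability
```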

arXiv link: http://arxiv.org/abs/2205.03254v1

Econometrics arXiv paper, submitted: 2022-05-04

Choosing Exogeneity Assumptions in Potential Outcome Models

Authors: Matthew A. Masten, Alexandre Poirier

There are many kinds of exogeneity assumptions. How should researchers choose
among them? When exogeneity is imposed on an unobservable like a potential
outcome, we argue that the form of exogeneity should be chosen based on the
kind of selection on unobservables it allows. Consequently, researchers can
assess the plausibility of any exogeneity assumption by studying the
distributions of treatment given the unobservables that are consistent with
that assumption. We use this approach to study two common exogeneity
assumptions: quantile and mean independence. We show that both assumptions
require a kind of non-monotonic relationship between treatment and the
potential outcomes. We discuss how to assess the plausibility of this kind of
treatment selection. We also show how to define a new and weaker version of
quantile independence that allows for monotonic treatment selection. We then
show the implications of the choice of exogeneity assumption for
identification. We apply these results in an empirical illustration of the
effect of child soldiering on wages.

arXiv link: http://arxiv.org/abs/2205.02288v1

Econometrics arXiv cross-link from math.OC (math.OC), submitted: 2022-05-04

Reducing Marketplace Interference Bias Via Shadow Prices

Authors: Ido Bright, Arthur Delarue, Ilan Lobel

Marketplace companies rely heavily on experimentation when making changes to
the design or operation of their platforms. The workhorse of experimentation is
the randomized controlled trial (RCT), or A/B test, in which users are randomly
assigned to treatment or control groups. However, marketplace interference
causes the Stable Unit Treatment Value Assumption (SUTVA) to be violated,
leading to bias in the standard RCT metric. In this work, we propose techniques
for platforms to run standard RCTs and still obtain meaningful estimates
despite the presence of marketplace interference. We specifically consider a
generalized matching setting, in which the platform explicitly matches supply
with demand via a linear programming algorithm. Our first proposal is for the
platform to estimate the value of global treatment and global control via
optimization. We prove that this approach is unbiased in the fluid limit. Our
second proposal is to compare the average shadow price of the treatment and
control groups rather than the total value accrued by each group. We prove that
this technique corresponds to the correct first-order approximation (in a
Taylor series sense) of the value function of interest even in a finite-size
system. We then use this result to prove that, under reasonable assumptions,
our estimator is less biased than the RCT estimator. At the heart of our result
is the idea that it is relatively easy to model interference in matching-driven
marketplaces since, in such markets, the platform mediates the spillover.
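
To make the shadow-price object concrete, the following sketch solves a small supply-demand matching LP with scipy and reads off the dual values of the capacity constraints; the marketplace, match values, and capacities are made up, and the treatment-versus-control comparison from the paper is not reproduced.

```python
# Small matching LP: maximize total match value subject to demand and supply capacity,
# then read the shadow prices (dual values) of the supply constraints.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(8)
n_demand, n_supply = 8, 3
value = rng.uniform(0, 1, size=(n_demand, n_supply))   # hypothetical match values v_ij
capacity = np.array([2, 3, 2])                          # hypothetical supply capacities

c = -value.ravel()   # linprog minimizes, so flip the sign of the match values

# Each demand unit is matched at most once; each supply unit at most its capacity.
A_demand = np.kron(np.eye(n_demand), np.ones((1, n_supply)))
A_supply = np.kron(np.ones((1, n_demand)), np.eye(n_supply))
A_ub = np.vstack([A_demand, A_supply])
b_ub = np.concatenate([np.ones(n_demand), capacity])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
shadow_prices = -res.ineqlin.marginals[n_demand:]   # duals of the capacity constraints
print(-res.fun, shadow_prices)                      # total value and per-unit capacity values
```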

arXiv link: http://arxiv.org/abs/2205.02274v4

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2022-05-04

Approximating Choice Data by Discrete Choice Models

Authors: Haoge Chang, Yusuke Narita, Kota Saito

We obtain a necessary and sufficient condition under which random-coefficient
discrete choice models, such as mixed-logit models, are rich enough to
approximate any nonparametric random utility models arbitrarily well across
choice sets. The condition turns out to be the affine-independence of the set
of characteristic vectors. When the condition fails, resulting in some random
utility models that cannot be closely approximated, we identify preferences and
substitution patterns that are challenging to approximate accurately. We also
propose algorithms to quantify the magnitude of approximation errors.
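
A small sketch of how the key condition can be checked numerically: a finite set of characteristic vectors is affinely independent exactly when stacking them as rows, appending a column of ones, and taking the matrix rank returns the number of vectors.

```python
# Rank-based check of affine independence of a set of characteristic vectors.
import numpy as np

def affinely_independent(vectors):
    """vectors: (m, d) array of m points; True iff the points are affinely independent."""
    V = np.asarray(vectors, dtype=float)
    augmented = np.column_stack([V, np.ones(len(V))])   # append a column of ones
    return np.linalg.matrix_rank(augmented) == len(V)

print(affinely_independent([[0, 0], [1, 0], [0, 1]]))   # True
print(affinely_independent([[0, 0], [1, 1], [2, 2]]))   # False: collinear points
```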

arXiv link: http://arxiv.org/abs/2205.01882v4

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2022-05-04

Machine Learning based Framework for Robust Price-Sensitivity Estimation with Application to Airline Pricing

Authors: Ravi Kumar, Shahin Boluki, Karl Isler, Jonas Rauch, Darius Walczak

We consider the problem of dynamic pricing of a product in the presence of
feature-dependent price sensitivity. Developing practical algorithms that can
estimate price elasticities robustly, especially when information about no
purchases (losses) is not available, to drive such automated pricing systems is
a challenge faced by many industries. Based on the Poisson semi-parametric
approach, we construct a flexible yet interpretable demand model where the
price related part is parametric while the remaining (nuisance) part of the
model is non-parametric and can be modeled via sophisticated machine learning
(ML) techniques. The estimation of price-sensitivity parameters of this model
via direct one-stage regression techniques may lead to biased estimates due to
regularization. To address this concern, we propose a two-stage estimation
methodology which makes the estimation of the price-sensitivity parameters
robust to biases in the estimators of the nuisance parameters of the model. In
the first-stage we construct estimators of observed purchases and prices given
the feature vector using sophisticated ML estimators such as deep neural
networks. Utilizing the estimators from the first-stage, in the second-stage we
leverage a Bayesian dynamic generalized linear model to estimate the
price-sensitivity parameters. We test the performance of the proposed
estimation schemes on simulated and real sales transaction data from the
Airline industry. Our numerical studies demonstrate that our proposed two-stage
approach reduces the estimation error in price-sensitivity parameters from 25%
to 4% in realistic simulation settings. The two-stage estimation techniques
proposed in this work allow practitioners to leverage modern ML techniques to
estimate price sensitivities robustly while maintaining interpretability and
allowing easy validation of the model's various constituent parts.
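
The following hedged sketch conveys the two-stage orthogonalization idea in its simplest partially linear form (ML first stages followed by a residual-on-residual regression); it stands in for, and is not, the paper's Poisson/Bayesian implementation, and all names and the data-generating process are illustrative.

```python
# Partially linear, residual-on-residual version of the two-stage idea with ML first
# stages (no cross-fitting here, which a careful implementation would add).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(9)
n = 4000
features = rng.normal(size=(n, 5))
g = np.sin(features[:, 0]) + features[:, 1] ** 2          # nonparametric nuisance part
price = 0.5 * g + rng.normal(size=n)                      # price depends on the features
demand = -1.5 * price + 2.0 * g + rng.normal(size=n)      # true price sensitivity: -1.5

# First stage: flexible ML predictions of demand and price given the features.
m_hat = GradientBoostingRegressor().fit(features, demand).predict(features)
p_hat = GradientBoostingRegressor().fit(features, price).predict(features)

# Second stage: regress residualized demand on residualized price.
d_res, p_res = demand - m_hat, price - p_hat
print((p_res @ d_res) / (p_res @ p_res))   # roughly -1.5; naive OLS on price alone is biased
```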

arXiv link: http://arxiv.org/abs/2205.01875v2

Econometrics arXiv paper, submitted: 2022-05-03

Efficient Score Computation and Expectation-Maximization Algorithm in Regime-Switching Models

Authors: Chaojun Li, Shi Qiu

This study proposes an efficient algorithm for score computation in
regime-switching models and, derived from it, an efficient
expectation-maximization (EM) algorithm. Unlike existing algorithms, the
proposed algorithm does not rely on forward-backward filtering for smoothed
regime probabilities and involves only forward computation. Moreover, the
score algorithm readily extends to computing the Hessian matrix.

arXiv link: http://arxiv.org/abs/2205.01565v1

Econometrics arXiv updated paper (originally submitted: 2022-05-02)

Heterogeneous Treatment Effects for Networks, Panels, and other Outcome Matrices

Authors: Eric Auerbach, Yong Cai

We are interested in the distribution of treatment effects for an experiment
where units are randomized to a treatment but outcomes are measured for pairs
of units. For example, we might measure risk sharing links between households
enrolled in a microfinance program, employment relationships between workers
and firms exposed to a trade shock, or bids from bidders to items assigned to
an auction format. Such a double randomized experimental design may be
appropriate when there are social interactions, market externalities, or other
spillovers across units assigned to the same treatment. Or it may describe a
natural or quasi-experiment given to the researcher. In this paper, we propose
a new empirical strategy that compares the eigenvalues of the outcome matrices
associated with each treatment. Our proposal is based on a new matrix analog of
the Fréchet-Hoeffding bounds that play a key role in the standard theory. We
first use this result to bound the distribution of treatment effects. We then
propose a new matrix analog of quantile treatment effects that is given by a
difference in the eigenvalues. We call this analog spectral treatment effects.
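
A bare-bones sketch of the comparison being proposed: compute the spectra of symmetric outcome matrices under each treatment arm and inspect differences in the ordered eigenvalues. The matrices below are simulated, and the paper's bounds and inference are not reproduced.

```python
# Compare the spectra of symmetric outcome matrices for two treatment arms.
import numpy as np

rng = np.random.default_rng(10)
n = 60
base = rng.normal(size=(n, n))
outcome_control = (base + base.T) / 2                       # symmetric pairwise outcomes
outcome_treated = outcome_control + 0.3 * np.ones((n, n))   # treatment raises all pairs

eig_c = np.sort(np.linalg.eigvalsh(outcome_control))
eig_t = np.sort(np.linalg.eigvalsh(outcome_treated))
print(eig_c[-1], eig_t[-1])   # the top of the spectrum moves up under treatment
print((eig_t - eig_c)[-3:])   # eigenvalue-by-eigenvalue differences
```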

arXiv link: http://arxiv.org/abs/2205.01246v2

Econometrics arXiv updated paper (originally submitted: 2022-05-02)

A short term credibility index for central banks under inflation targeting: an application to Brazil

Authors: Alain Hecq, Joao Issler, Elisa Voisin

This paper uses predictive densities obtained via mixed causal-noncausal
autoregressive models to evaluate the statistical sustainability of the Brazilian
inflation-targeting system and its tolerance bounds. The resulting probabilities give an
indication of the short-term credibility of the targeting system without
requiring a model of people's beliefs. We employ receiver operating
characteristic curves to determine the optimal probability threshold above which
the bank is predicted to be credible. We also investigate the added value of
including experts' predictions of key macroeconomic variables.
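
A generic illustration of the ROC step mentioned above (using sklearn rather than the paper's predictive densities): given predicted probabilities of inflation staying within the tolerance bounds and the realized outcomes, choose the threshold that maximizes Youden's $J$. The inputs are simulated placeholders.

```python
# Pick the probability threshold maximizing Youden's J = TPR - FPR.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(11)
prob_within = rng.uniform(size=300)                                # predicted probabilities (placeholder)
stayed_within = (rng.uniform(size=300) < prob_within).astype(int)  # realized outcomes (placeholder)

fpr, tpr, thresholds = roc_curve(stayed_within, prob_within)
best = int(np.argmax(tpr - fpr))
print(thresholds[best], tpr[best], fpr[best])   # threshold above which credibility is predicted
```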

arXiv link: http://arxiv.org/abs/2205.00924v2

Econometrics arXiv paper, submitted: 2022-05-02

A Note on "A survey of preference estimation with unobserved choice set heterogeneity" by Gregory S. Crawford, Rachel Griffith, and Alessandro Iaria

Authors: C. Angelo Guevara

Crawford et al.'s (2021) article on the estimation of discrete choice models with
unobserved or latent consideration sets presents a unified framework to
address the problem in practice by using "sufficient sets", defined as a
combination of past observed choices. The proposed approach rests on a
re-interpretation of a consistency result by McFadden (1978) for the problem of
sampling of alternatives, but the use of that result in Crawford et al.
(2021) is imprecise in an important respect. It is stated that consistency would
be attained if any subset of the true consideration set is used for estimation,
but McFadden (1978) shows that, in general, one needs to do a sampling
correction that depends on the protocol used to draw the choice set. This note
derives the sampling correction that is required when the choice set for
estimation is built from past choices. Then, it formalizes the conditions under
which such a correction would fulfill the uniform conditioning property and can
therefore be ignored when building practical estimators, such as the ones
analyzed by Crawford et al. (2021).

arXiv link: http://arxiv.org/abs/2205.00852v1

Econometrics arXiv updated paper (originally submitted: 2022-05-01)

Higher-order Expansions and Inference for Panel Data Models

Authors: Jiti Gao, Bin Peng, Yayi Yan

In this paper, we propose a simple inferential method for a wide class of
panel data models with a focus on such cases that have both serial correlation
and cross-sectional dependence. In order to establish an asymptotic theory to
support the inferential method, we develop some new and useful higher-order
expansions, such as a Berry-Esseen bound and an Edgeworth expansion, under a set of
simple and general conditions. We further demonstrate the usefulness of these
theoretical results by explicitly investigating a panel data model with
interactive effects which nests many traditional panel data models as special
cases. Finally, we show the superiority of our approach over several natural
competitors using extensive numerical studies.

arXiv link: http://arxiv.org/abs/2205.00577v2

Econometrics arXiv paper, submitted: 2022-04-30

Greenhouse Gas Emissions and its Main Drivers: a Panel Assessment for EU-27 Member States

Authors: I. Jianu, S. M. Jeloaica, M. D. Tudorache

This paper assesses the effects of greenhouse gas emissions drivers in EU-27
over the period 2010-2019, using a Panel EGLS model with period fixed effects.
In particular, we focused our research on studying the effects of GDP,
renewable energy, households' energy consumption, and waste on greenhouse gas
emissions. In this regard, we found a positive relationship between three
independent variables (real GDP per capita, households' final consumption per
capita and waste generation per capita) and greenhouse gas emissions per
capita, while the effect of the share of renewable energy in gross final energy
consumption on the dependent variable proved to be negative, but quite low. In
addition, we demonstrate that the main challenge that affects greenhouse gas
emissions is related to the structure of households' energy consumption, which
is generally composed of environmentally harmful fuels. This suggests the need
to make greater efforts to support the shift to a green economy based on a
higher energy efficiency.

arXiv link: http://arxiv.org/abs/2205.00295v1

Econometrics arXiv updated paper (originally submitted: 2022-04-30)

A Heteroskedasticity-Robust Overidentifying Restriction Test with High-Dimensional Covariates

Authors: Qingliang Fan, Zijian Guo, Ziwei Mei

This paper proposes an overidentifying restriction test for high-dimensional
linear instrumental variable models. The novelty of the proposed test is that
it allows the number of covariates and instruments to be larger than the sample
size. The test is scale-invariant and is robust to heteroskedastic errors. To
construct the final test statistic, we first introduce a test based on the
maximum norm of multiple parameters that could be high-dimensional. The
theoretical power based on the maximum norm is higher than that in the modified
Cragg-Donald test (Kolesár, 2018), the only existing test allowing for
large-dimensional covariates. Second, following the principle of power
enhancement (Fan et al., 2015), we introduce the power-enhanced test, with an
asymptotically zero component used to enhance the power to detect some extreme
alternatives with many locally invalid instruments. Finally, an empirical
example of the trade and economic growth nexus demonstrates the usefulness of
the proposed test.

arXiv link: http://arxiv.org/abs/2205.00171v3

Econometrics arXiv updated paper (originally submitted: 2022-04-28)

Controlling for Latent Confounding with Triple Proxies

Authors: Ben Deaner

We present new results for nonparametric identification of causal effects
using noisy proxies for unobserved confounders. Our approach builds on the
results of Hu (2008), which tackle the problem of general measurement error.
We call this the `triple proxy' approach because it requires three proxies that
are jointly independent conditional on unobservables. We consider three
different choices for the third proxy: it may be an outcome, a vector of
treatments, or a collection of auxiliary variables. We compare to an
alternative identification strategy introduced by Miao et al. (2018), in which
causal effects are identified using two conditionally independent proxies. We
refer to this as the `double proxy' approach. The triple proxy approach
identifies objects that are not identified by the double proxy approach,
including some that capture the variation in average treatment effects between
strata of the unobservables. Moreover, the conditional independence assumptions
in the double and triple proxy approaches are non-nested.

arXiv link: http://arxiv.org/abs/2204.13815v2

Econometrics arXiv updated paper (originally submitted: 2022-04-28)

Efficient Estimation of Structural Models via Sieves

Authors: Yao Luo, Peijun Sang

We propose a class of sieve-based efficient estimators for structural models
(SEES), which approximate the solution using a linear combination of basis
functions and impose equilibrium conditions as a penalty to determine the
best-fitting coefficients. Our estimators avoid the need to repeatedly solve
the model, apply to a broad class of models, and are consistent, asymptotically
normal, and asymptotically efficient. Moreover, they solve unconstrained
optimization problems with fewer unknowns and offer convenient standard error
calculations. As an illustration, we apply our method to an entry game between
Walmart and Kmart.

arXiv link: http://arxiv.org/abs/2204.13488v2

Econometrics arXiv cross-link from cs.GT (cs.GT), submitted: 2022-04-28

From prediction markets to interpretable collective intelligence

Authors: Alexey V. Osipov, Nikolay N. Osipov

We outline how to create a mechanism that provides an optimal way to elicit,
from an arbitrary group of experts, the probability of the truth of an
arbitrary logical proposition together with collective information that has an
explicit form and interprets this probability. Namely, we provide strong
arguments for the possibility of the development of a self-resolving prediction
market with play money that incentivizes direct information exchange between
experts. Such a system could, in particular, motivate simultaneously many
experts to collectively solve scientific or medical problems in a very
efficient manner. We also note that in our considerations, experts are not
assumed to be Bayesian.

arXiv link: http://arxiv.org/abs/2204.13424v3

Econometrics arXiv paper, submitted: 2022-04-27

Impulse response estimation via flexible local projections

Authors: Haroon Mumtaz, Michele Piffer

This paper introduces a flexible local projection that generalizes the model
by Jordà (2005) to a non-parametric setting using Bayesian Additive
Regression Trees. Monte Carlo experiments show that our BART-LP model is able
to capture non-linearities in the impulse responses. Our first application
shows that the fiscal multiplier is stronger in recession than in expansion
only in response to contractionary fiscal shocks, but not in response to
expansionary fiscal shocks. We then show that financial shocks generate effects
on the economy that increase more than proportionately in the size of the shock
when the shock is negative, but not when the shock is positive.
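
For context, a minimal linear local-projection sketch of the kind the BART-LP model generalizes: for each horizon $h$, regress $y_{t+h}$ on the shock at $t$ (with one lag of $y$ as a control) and collect the shock coefficients as the impulse response. The data-generating process below is an illustrative AR(1).

```python
# Linear local projections: for each horizon h, regress y_{t+h} on shock_t and a lag of y.
import numpy as np

rng = np.random.default_rng(12)
T = 500
shock = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.6 * y[t - 1] + shock[t] + 0.2 * rng.normal()

irf = []
for h in range(9):
    yh = y[1 + h:]                                                   # y_{t+h} for t = 1, ..., T-1-h
    X = np.column_stack([np.ones(T - 1 - h), shock[1:T - h], y[:T - 1 - h]])
    beta, *_ = np.linalg.lstsq(X, yh, rcond=None)
    irf.append(beta[1])                                              # response at horizon h

print(np.round(irf, 2))   # close to 0.6**h for this AR(1) data-generating process
```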

arXiv link: http://arxiv.org/abs/2204.13150v1

Econometrics arXiv paper, submitted: 2022-04-27

Estimation of Recursive Route Choice Models with Incomplete Trip Observations

Authors: Tien Mai, The Viet Bui, Quoc Phong Nguyen, Tho V. Le

This work concerns the estimation of recursive route choice models in the
situation that the trip observations are incomplete, i.e., there are
unconnected links (or nodes) in the observations. A direct approach to handle
this issue would be intractable because enumerating all paths between
unconnected links (or nodes) in a real network is typically not possible. We
exploit an expectation-maximization (EM) method that allows us to deal with the
missing-data issue by alternately performing two steps: sampling the
missing segments in the observations and solving maximum likelihood estimation
problems. Moreover, observing that the EM method would be expensive, we propose
a new estimation method based on the idea that the choice probabilities of
unconnected link observations can be exactly computed by solving systems of
linear equations. We further design a new algorithm, called
decomposition-composition (DC), that helps reduce the number of systems of
linear equations to be solved and speed up the estimation. We compare our
proposed algorithms with some standard baselines using a dataset from a real
network and show that the DC algorithm outperforms the other approaches in
recovering missing information in the observations. Our methods work with most
of the recursive route choice models proposed in the literature, including the
recursive logit, nested recursive logit, or discounted recursive models.

arXiv link: http://arxiv.org/abs/2204.12992v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-04-26

A Multivariate Spatial and Spatiotemporal ARCH Model

Authors: Philipp Otto

This paper introduces a multivariate spatiotemporal autoregressive
conditional heteroscedasticity (ARCH) model based on a vec-representation. The
model includes instantaneous spatial autoregressive spill-over effects in the
conditional variance, as they are usually present in spatial econometric
applications. Furthermore, spatial and temporal cross-variable effects are
explicitly modelled. We transform the model to a multivariate spatiotemporal
autoregressive model using a log-squared transformation and derive a consistent
quasi-maximum-likelihood estimator (QMLE). For finite samples and different
error distributions, the performance of the QMLE is analysed in a series of
Monte-Carlo simulations. In addition, we illustrate the practical usage of the
new model with a real-world example. We analyse the monthly real-estate price
returns for three different property types in Berlin from 2002 to 2014. We find
weak (instantaneous) spatial interactions, while the temporal autoregressive
structure in the market risks is of higher importance. Interactions between the
different property types only occur in the temporally lagged variables. Thus,
we see mainly temporal volatility clusters and weak spatial volatility
spill-overs.

arXiv link: http://arxiv.org/abs/2204.12472v1

Econometrics arXiv updated paper (originally submitted: 2022-04-26)

GMM is Inadmissible Under Weak Identification

Authors: Isaiah Andrews, Anna Mikusheva

We consider estimation in moment condition models and show that under any
bound on identification strength, asymptotically admissible (i.e. undominated)
estimators in a wide class of estimation problems must be uniformly continuous
in the sample moment function. GMM estimators are in general discontinuous in
the sample moments, and are thus inadmissible. We show, by contrast, that
bagged, or bootstrap aggregated, GMM estimators as well as quasi-Bayes
posterior means have superior continuity properties, while results in the
literature imply that they are equivalent to GMM when identification is strong.
In simulations calibrated to published instrumental variables specifications,
we find that these alternatives often outperform GMM.

arXiv link: http://arxiv.org/abs/2204.12462v3

Econometrics arXiv updated paper (originally submitted: 2022-04-26)

A One-Covariate-at-a-Time Method for Nonparametric Additive Models

Authors: Liangjun Su, Thomas Tao Yang, Yonghui Zhang, Qiankun Zhou

This paper proposes a one-covariate-at-a-time multiple testing (OCMT)
approach to choose significant variables in high-dimensional nonparametric
additive regression models. Similarly to Chudik, Kapetanios and Pesaran (2018),
we consider the statistical significance of individual nonparametric additive
components one at a time and take into account the multiple testing nature of
the problem. One-stage and multiple-stage procedures are both considered. The
former works well in terms of the true positive rate only if the marginal
effects of all signals are strong enough; the latter helps to pick up hidden
signals that have weak marginal effects. Simulations demonstrate the good
finite sample performance of the proposed procedures. As an empirical
application, we use the OCMT procedure on a dataset we extracted from the
Longitudinal Survey on Rural Urban Migration in China. We find that our
procedure works well in terms of the out-of-sample forecast root mean square
errors, compared with competing methods.

arXiv link: http://arxiv.org/abs/2204.12023v3

Econometrics arXiv updated paper (originally submitted: 2022-04-25)

Optimal Decision Rules when Payoffs are Partially Identified

Authors: Timothy Christensen, Hyungsik Roger Moon, Frank Schorfheide

We derive asymptotically optimal statistical decision rules for discrete
choice problems when payoffs depend on a partially-identified parameter
$\theta$ and the decision maker can use a point-identified parameter $\mu$ to
deduce restrictions on $\theta$. Examples include treatment choice under
partial identification and pricing with rich unobserved heterogeneity. Our
notion of optimality combines a minimax approach to handle the ambiguity from
partial identification of $\theta$ given $\mu$ with an average risk
minimization approach for $\mu$. We show how to implement optimal decision
rules using the bootstrap and (quasi-)Bayesian methods in both parametric and
semiparametric settings. We provide detailed applications to treatment choice
and optimal pricing. Our asymptotic approach is well suited for realistic
empirical settings in which the derivation of finite-sample optimal rules is
intractable.

arXiv link: http://arxiv.org/abs/2204.11748v3

Econometrics arXiv updated paper (originally submitted: 2022-04-24)

Identification and Statistical Decision Theory

Authors: Charles F. Manski

Econometricians have usefully separated study of estimation into
identification and statistical components. Identification analysis, which
assumes knowledge of the probability distribution generating observable data,
places an upper bound on what may be learned about population parameters of
interest with finite sample data. Yet Wald's statistical decision theory
studies decision making with sample data without reference to identification,
indeed without reference to estimation. This paper asks if identification
analysis is useful to statistical decision theory. The answer is positive, as
it can yield an informative and tractable upper bound on the achievable finite
sample performance of decision criteria. The reasoning is simple when the
decision relevant parameter is point identified. It is more delicate when the
true state is partially identified and a decision must be made under ambiguity.
Then the performance of some criteria, such as minimax regret, is enhanced by
randomizing choice of an action. This may be accomplished by making choice a
function of sample data. I find it useful to recast choice of a statistical
decision function as selection of choice probabilities for the elements of the
choice set. Using sample data to randomize choice conceptually differs from and
is complementary to its traditional use to estimate population parameters.

arXiv link: http://arxiv.org/abs/2204.11318v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-04-23

Local Gaussian process extrapolation for BART models with applications to causal inference

Authors: Meijiang Wang, Jingyu He, P. Richard Hahn

Bayesian additive regression trees (BART) is a semi-parametric regression
model offering state-of-the-art performance on out-of-sample prediction.
Despite this success, standard implementations of BART typically provide
inaccurate prediction and overly narrow prediction intervals at points outside
the range of the training data. This paper proposes a novel extrapolation
strategy that grafts Gaussian processes to the leaf nodes in BART for
predicting points outside the range of the observed data. The new method is
compared to standard BART implementations and recent frequentist
resampling-based methods for predictive inference. We apply the new approach to
a challenging problem from causal inference, wherein for some regions of
predictor space, only treated or untreated units are observed (but not both).
In simulation studies, the new approach boasts superior performance compared to
popular alternatives, such as Jackknife+.

arXiv link: http://arxiv.org/abs/2204.10963v2

Econometrics arXiv updated paper (originally submitted: 2022-04-22)

Adversarial Estimators

Authors: Jonas Metzger

We develop an asymptotic theory of adversarial estimators ('A-estimators').
They generalize maximum-likelihood-type estimators ('M-estimators') as their
average objective is maximized by some parameters and minimized by others. This
class subsumes the continuous-updating Generalized Method of Moments,
Generative Adversarial Networks and more recent proposals in machine learning
and econometrics. In these examples, researchers state which aspects of the
problem may in principle be used for estimation, and an adversary learns how to
emphasize them optimally. We derive the convergence rates of A-estimators under
pointwise and partial identification, and the normality of functionals of their
parameters. Unknown functions may be approximated via sieves such as deep
neural networks, for which we provide simplified low-level conditions. As a
corollary, we obtain the normality of neural-net M-estimators, overcoming
technical issues previously identified by the literature. Our theory yields
novel results about a variety of A-estimators, providing intuition and formal
justification for their success in recent applications.

arXiv link: http://arxiv.org/abs/2204.10495v3
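
As a schematic illustration of the class just described (our notation, not
necessarily the paper's), an A-estimator solves a sample min-max problem of
the form

\[
(\hat{\theta}, \hat{\lambda}) \in \arg\max_{\theta \in \Theta}\, \min_{\lambda \in \Lambda}\, \frac{1}{n}\sum_{i=1}^{n} f(X_i; \theta, \lambda),
\]

which collapses to a standard M-estimator when \(\Lambda\) is a singleton; in a
GAN, one block of parameters indexes the generator and the other the
adversarial discriminator.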

Econometrics arXiv paper, submitted: 2022-04-22

MTE with Misspecification

Authors: Julián Martínez-Iriarte, Pietro Emilio Spini

This paper studies the implications of a fraction of the population not
responding to the instrument when selecting into treatment. We show that, in
general, the presence of non-responders biases the Marginal Treatment Effect
(MTE) curve and many of its functionals. Yet, we show that, when the propensity
score is fully supported on the unit interval, it is still possible to restore
identification of the MTE curve and its functionals with an appropriate
re-weighting.

arXiv link: http://arxiv.org/abs/2204.10445v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2022-04-21

Boundary Adaptive Local Polynomial Conditional Density Estimators

Authors: Matias D. Cattaneo, Rajita Chandak, Michael Jansson, Xinwei Ma

We begin by introducing a class of conditional density estimators based on
local polynomial techniques. The estimators are boundary adaptive and easy to
implement. We then study the (pointwise and) uniform statistical properties of
the estimators, offering characterizations of both probability concentration
and distributional approximation. In particular, we establish uniform
convergence rates in probability and valid Gaussian distributional
approximations for the Studentized t-statistic process. We also discuss
implementation issues such as consistent estimation of the covariance function
for the Gaussian approximation, optimal integrated mean squared error bandwidth
selection, and valid robust bias-corrected inference. We illustrate the
applicability of our results by constructing valid confidence bands and
hypothesis tests for both parametric specification and shape constraints,
explicitly characterizing their approximation errors. A companion R software
package implementing our main results is provided.

arXiv link: http://arxiv.org/abs/2204.10359v4

Econometrics arXiv cross-link from q-fin.GN (q-fin.GN), submitted: 2022-04-21

Do t-Statistic Hurdles Need to be Raised?

Authors: Andrew Y. Chen

Many scholars have called for raising statistical hurdles to guard against
false discoveries in academic publications. I show these calls may be difficult
to justify empirically. Published data exhibit bias: results that fail to meet
existing hurdles are often unobserved. These unobserved results must be
extrapolated, which can lead to weak identification of revised hurdles. In
contrast, statistics that can target only published findings (e.g. empirical
Bayes shrinkage and the FDR) can be strongly identified, as data on published
findings is plentiful. I demonstrate these results theoretically and in an
empirical analysis of the cross-sectional return predictability literature.

arXiv link: http://arxiv.org/abs/2204.10275v4

Econometrics arXiv paper, submitted: 2022-04-21

From point forecasts to multivariate probabilistic forecasts: The Schaake shuffle for day-ahead electricity price forecasting

Authors: Oliver Grothe, Fabian Kächele, Fabian Krüger

Modeling price risks is crucial for economic decision making in energy
markets. Besides the risk of a single price, the dependence structure of
multiple prices is often relevant. We therefore propose a generic and
easy-to-implement method for creating multivariate probabilistic forecasts
based on univariate point forecasts of day-ahead electricity prices. While each
univariate point forecast refers to one of the day's 24 hours, the multivariate
forecast distribution models dependencies across hours. The proposed method is
based on simple copula techniques and an optional time series component. We
illustrate the method for five benchmark data sets recently provided by Lago et
al. (2020). Furthermore, we present an example of constructing realistic
prediction intervals for the weighted sum of consecutive electricity prices,
as, e.g., needed for pricing individual load profiles.

arXiv link: http://arxiv.org/abs/2204.10154v1
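
The core reordering step of the Schaake shuffle is short enough to sketch. The
snippet below is a minimal Python illustration, assuming the univariate point
forecasts have already been expanded into M draws per hour (e.g., via
historical forecast errors) and that a template of M historical price days is
available; names and array shapes are illustrative, not the authors' code.

    import numpy as np

    def schaake_shuffle(samples, template):
        """Reorder univariate samples to mimic the rank dependence of a template.

        samples:  (M, H) array, column h holds M draws from the hour-h marginal.
        template: (M, H) array of historical observations (M days, H hours).
        Returns an (M, H) array with the same marginal values per column but
        with the cross-hour rank structure of `template`.
        """
        M, H = samples.shape
        out = np.empty_like(samples, dtype=float)
        for h in range(H):
            sorted_col = np.sort(samples[:, h])          # order statistics of the draws
            ranks = template[:, h].argsort().argsort()   # rank of each template day
            out[:, h] = sorted_col[ranks]                # day with k-th smallest template
                                                         # value gets the k-th smallest draw
        return out

Each column keeps its own marginal values; only their pairing across hours is
changed, so the multivariate forecast inherits the template's dependence
structure.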

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-04-20

Optimal reconciliation with immutable forecasts

Authors: Bohan Zhang, Yanfei Kang, Anastasios Panagiotelis, Feng Li

The practical importance of coherent forecasts in hierarchical forecasting
has inspired many studies on forecast reconciliation. Under this approach,
so-called base forecasts are produced for every series in the hierarchy and are
subsequently adjusted to be coherent in a second reconciliation step.
Reconciliation methods have been shown to improve forecast accuracy, but will,
in general, adjust the base forecast of every series. However, in an
operational context, it is sometimes necessary or beneficial to keep forecasts
of some variables unchanged after forecast reconciliation. In this paper, we
formulate reconciliation methodology that keeps forecasts of a pre-specified
subset of variables unchanged or "immutable". In contrast to existing
approaches, these immutable forecasts need not all come from the same level of
a hierarchy, and our method can also be applied to grouped hierarchies. We
prove that our approach preserves unbiasedness in base forecasts. Our method
can also account for correlations between base forecasting errors and ensure
non-negativity of forecasts. We also perform empirical experiments, including
an application to sales of a large scale online retailer, to assess the impacts
of our proposed methodology.

arXiv link: http://arxiv.org/abs/2204.09231v1
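
Schematically (and not necessarily in the authors' exact notation), the
reconciliation problem with immutable forecasts can be written as a constrained
generalized-least-squares projection of the base forecasts \(\hat{y}\):

\[
\tilde{b} = \arg\min_{b}\; (\hat{y} - S b)' W^{-1} (\hat{y} - S b)
\quad \text{s.t.} \quad (S b)_i = \hat{y}_i \ \text{for all } i \in \mathcal{I},
\]

where \(S\) is the summing matrix mapping bottom-level series to all series,
\(W\) is a covariance matrix of base-forecast errors, and \(\mathcal{I}\)
indexes the immutable series. The coherent forecasts are \(\tilde{y} =
S\tilde{b}\), which reproduce \(\hat{y}_i\) exactly for every \(i \in \mathcal{I}\).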

Econometrics arXiv cross-link from cs.CR (cs.CR), submitted: 2022-04-19

The 2020 Census Disclosure Avoidance System TopDown Algorithm

Authors: John M. Abowd, Robert Ashmead, Ryan Cumings-Menon, Simson Garfinkel, Micah Heineck, Christine Heiss, Robert Johns, Daniel Kifer, Philip Leclerc, Ashwin Machanavajjhala, Brett Moran, William Sexton, Matthew Spence, Pavel Zhuravlev

The Census TopDown Algorithm (TDA) is a disclosure avoidance system using
differential privacy for privacy-loss accounting. The algorithm ingests the
final, edited version of the 2020 Census data and the final tabulation
geographic definitions. The algorithm then creates noisy versions of key
queries on the data, referred to as measurements, using zero-Concentrated
Differential Privacy. Another key aspect of the TDA is invariants, statistics
that the Census Bureau has determined, as a matter of policy, to exclude from the
privacy-loss accounting. The TDA post-processes the measurements together with
the invariants to produce a Microdata Detail File (MDF) that contains one
record for each person and one record for each housing unit enumerated in the
2020 Census. The MDF is passed to the 2020 Census tabulation system to produce
the 2020 Census Redistricting Data (P.L. 94-171) Summary File. This paper
describes the mathematics and testing of the TDA for this purpose.

arXiv link: http://arxiv.org/abs/2204.08986v1

Econometrics arXiv updated paper (originally submitted: 2022-04-18)

Inference for Cluster Randomized Experiments with Non-ignorable Cluster Sizes

Authors: Federico Bugni, Ivan Canay, Azeem Shaikh, Max Tabord-Meehan

This paper considers the problem of inference in cluster randomized
experiments when cluster sizes are non-ignorable. Here, by a cluster randomized
experiment, we mean one in which treatment is assigned at the cluster level. By
non-ignorable cluster sizes, we refer to the possibility that the treatment
effects may depend non-trivially on the cluster sizes. We frame our analysis in
a super-population framework in which cluster sizes are random. In this way,
our analysis departs from earlier analyses of cluster randomized experiments in
which cluster sizes are treated as non-random. We distinguish between two
different parameters of interest: the equally-weighted cluster-level average
treatment effect, and the size-weighted cluster-level average treatment effect.
For each parameter, we provide methods for inference in an asymptotic framework
where the number of clusters tends to infinity and treatment is assigned using
a covariate-adaptive stratified randomization procedure. We additionally permit
the experimenter to sample only a subset of the units within each cluster
rather than the entire cluster and demonstrate the implications of such
sampling for some commonly used estimators. A small simulation study and
empirical demonstration show the practical relevance of our theoretical
results.

arXiv link: http://arxiv.org/abs/2204.08356v7
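
To fix ideas, with random cluster size \(N_g\) and cluster-average potential
outcomes \(\bar{Y}_g(d)\), the two parameters distinguished above can be
written (schematically) as

\[
\theta_{\text{eq}} = E\big[\bar{Y}_g(1) - \bar{Y}_g(0)\big],
\qquad
\theta_{\text{size}} = \frac{E\big[N_g\,(\bar{Y}_g(1) - \bar{Y}_g(0))\big]}{E[N_g]},
\]

which coincide when cluster size is unrelated to the cluster-level treatment
effects, i.e., precisely when cluster sizes are ignorable.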

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2022-04-18

Feature-based intermittent demand forecast combinations: bias, accuracy and inventory implications

Authors: Li Li, Yanfei Kang, Fotios Petropoulos, Feng Li

Intermittent demand forecasting is a ubiquitous and challenging problem in
production systems and supply chain management. In recent years, there has been
a growing focus on developing forecasting approaches for intermittent demand
from academic and practical perspectives. However, limited attention has been
given to forecast combination methods, which have achieved competitive
performance in forecasting fast-moving time series. The current study aims to
examine the empirical outcomes of some existing forecast combination methods
and propose a generalized feature-based framework for intermittent demand
forecasting. The proposed framework has been shown to improve the accuracy of
point and quantile forecasts based on two real data sets. Further, some
analysis of features, forecasting pools and computational efficiency is also
provided. The findings indicate the intelligibility and flexibility of the
proposed approach in intermittent demand forecasting and offer insights
regarding inventory decisions.

arXiv link: http://arxiv.org/abs/2204.08283v2

Econometrics arXiv updated paper (originally submitted: 2022-04-18)

Nonlinear and Nonseparable Structural Functions in Fuzzy Regression Discontinuity Designs

Authors: Haitian Xie

Many empirical examples of regression discontinuity (RD) designs concern a
continuous treatment variable, but the theoretical aspects of such models are
less studied. This study examines the identification and estimation of the
structural function in fuzzy RD designs with a continuous treatment variable.
The structural function fully describes the causal impact of the treatment on
the outcome. We show that the nonlinear and nonseparable structural function
can be nonparametrically identified at the RD cutoff under shape restrictions,
including monotonicity and smoothness conditions. Based on the nonparametric
identification equation, we propose a three-step semiparametric estimation
procedure and establish the asymptotic normality of the estimator. The
semiparametric estimator achieves the same convergence rate as in the case of a
binary treatment variable. As an application of the method, we estimate the
causal effect of sleep time on health status by using the discontinuity in
natural light timing at time zone boundaries.

arXiv link: http://arxiv.org/abs/2204.08168v2

Econometrics arXiv updated paper (originally submitted: 2022-04-15)

Abadie's Kappa and Weighting Estimators of the Local Average Treatment Effect

Authors: Tymon Słoczyński, S. Derya Uysal, Jeffrey M. Wooldridge

Recent research has demonstrated the importance of flexibly controlling for
covariates in instrumental variables estimation. In this paper we study the
finite sample and asymptotic properties of various weighting estimators of the
local average treatment effect (LATE), motivated by Abadie's (2003) kappa
theorem and offering the requisite flexibility relative to standard practice.
We argue that two of the estimators under consideration, which are weight
normalized, are generally preferable. Several other estimators, which are
unnormalized, do not satisfy the properties of scale invariance with respect to
the natural logarithm and translation invariance, thereby exhibiting
sensitivity to the units of measurement when estimating the LATE in logs and
the centering of the outcome variable more generally. We also demonstrate that,
when noncompliance is one sided, certain weighting estimators have the
advantage of being based on a denominator that is strictly greater than zero by
construction. This is the case for only one of the two normalized estimators,
and we recommend this estimator for wider use. We illustrate our findings with
a simulation study and three empirical applications, which clearly document the
sensitivity of unnormalized estimators to how the outcome variable is coded. We
implement the proposed estimators in the Stata package kappalate.

arXiv link: http://arxiv.org/abs/2204.07672v4
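
For reference, one common statement of Abadie's (2003) kappa weight, with
treatment \(D\), instrument \(Z\), covariates \(X\), and instrument propensity
score \(\pi(X) = P(Z = 1 \mid X)\), is

\[
\kappa = 1 - \frac{D(1 - Z)}{1 - \pi(X)} - \frac{(1 - D)Z}{\pi(X)},
\]

so that \(E[g(Y, D, X) \mid \text{compliers}] = E[\kappa\, g(Y, D, X)] /
E[\kappa]\) for any function \(g\). Normalized estimators replace the population
expectations with weighted sample means whose weights sum to one within each
treatment arm, which underlies the invariance properties discussed above.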

Econometrics arXiv cross-link from q-fin.PR (q-fin.PR), submitted: 2022-04-14

Option Pricing with Time-Varying Volatility Risk Aversion

Authors: Peter Reinhard Hansen, Chen Tong

We introduce a pricing kernel with time-varying volatility risk aversion to
explain observed time variations in the shape of the pricing kernel. When
combined with the Heston-Nandi GARCH model, this framework yields a tractable
option pricing model in which the variance risk ratio (VRR) emerges as a key
variable. We show that the VRR is closely linked to economic fundamentals, as
well as sentiment and uncertainty measures. A novel approximation method
provides analytical option pricing formulas, and we demonstrate substantial
reductions in pricing errors through an empirical application to the S&P 500
index, the CBOE VIX, and option prices.

arXiv link: http://arxiv.org/abs/2204.06943v4

Econometrics arXiv updated paper (originally submitted: 2022-04-13)

Nonparametric Identification of Differentiated Products Demand Using Micro Data

Authors: Steven T. Berry, Philip A. Haile

We examine identification of differentiated products demand when one has
"micro data" linking individual consumers' characteristics and choices. Our
model nests standard specifications featuring rich observed and unobserved
consumer heterogeneity as well as product/market-level unobservables that
introduce the problem of econometric endogeneity. Previous work establishes
identification of such models using market-level data and instruments for all
prices and quantities. Micro data provides a panel structure that facilitates
richer demand specifications and reduces requirements on both the number and
types of instrumental variables. We address identification of demand in the
standard case in which non-price product characteristics are assumed exogenous,
but also cover identification of demand elasticities and other key features
when product characteristics are endogenous. We discuss implications of these
results for applied work.

arXiv link: http://arxiv.org/abs/2204.06637v2

Econometrics arXiv cross-link from eess.SY (eess.SY), submitted: 2022-04-12

Integrating Distributed Energy Resources: Optimal Prosumer Decisions and Impacts of Net Metering Tariffs

Authors: Ahmed S. Alahmed, Lang Tong

The rapid growth of the behind-the-meter (BTM) distributed generation has led
to initiatives to reform the net energy metering (NEM) policies to address
pressing concerns of rising electricity bills, fairness of cost allocation, and
the long-term growth of distributed energy resources. This article presents an
analytical framework for the optimal prosumer consumption decision using an
inclusive NEM X tariff model that covers existing and proposed NEM tariff
designs. The structure of the optimal consumption policy lends itself to near
closed-form optimal solutions suitable for practical energy management systems
that are responsive to stochastic BTM generation and dynamic pricing. The short
and long-run performance of NEM and feed-in tariffs (FiT) are considered under
a sequential rate-setting decision process. Also presented are numerical
results that characterize social welfare distributions, cross-subsidies, and
long-run solar adoption performance for selected NEM and FiT policy designs.

arXiv link: http://arxiv.org/abs/2204.06115v3

Econometrics arXiv updated paper (originally submitted: 2022-04-12)

Retrieval from Mixed Sampling Frequency: Generic Identifiability in the Unit Root VAR

Authors: Philipp Gersing, Leopold Soegner, Manfred Deistler

The "REtrieval from MIxed Sampling" (REMIS) approach based on blocking
developed in Anderson et al. (2016a) is concerned with retrieving an underlying
high frequency model from mixed frequency observations. In this paper we
investigate parameter-identifiability in the Johansen (1995) vector error
correction model for mixed frequency data. We prove that from the second
moments of the blocked process after taking differences at lag N (N is the slow
sampling rate), the parameters of the high frequency system are generically
identified. We treat the stock and the flow case as well as deterministic
terms.

arXiv link: http://arxiv.org/abs/2204.05952v2

Econometrics arXiv updated paper (originally submitted: 2022-04-12)

Coarse Personalization

Authors: Walter W. Zhang, Sanjog Misra

With advances in estimating heterogeneous treatment effects, firms can
personalize and target individuals at a granular level. However, feasibility
constraints limit full personalization. In practice, firms choose segments of
individuals and assign a treatment to each segment to maximize profits: We call
this the coarse personalization problem. We propose a two-step solution that
simultaneously makes segmentation and targeting decisions. First, the firm
personalizes by estimating conditional average treatment effects. Second, the
firm discretizes, using these treatment effects to choose which treatments to
offer and their corresponding segments. We show that a combination of available machine learning
tools for estimating heterogeneous treatment effects and a novel application of
optimal transport methods provides a viable and efficient solution. With data
from a large-scale field experiment in promotions management, we find our
methodology outperforms extant approaches that segment on consumer
characteristics, consumer preferences, or those that only search over a
prespecified grid. Using our procedure, the firm recoups over 99.5% of its
expected incremental profits under full personalization while offering only
five segments. We conclude by discussing how coarse personalization arises in
other domains.

arXiv link: http://arxiv.org/abs/2204.05793v4
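
A stripped-down sketch of the two-step logic, with one-dimensional k-means on
estimated treatment effects standing in for the paper's optimal-transport
discretization; the learners and names are illustrative, not the authors'
implementation.

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.cluster import KMeans

    def coarse_personalize(X, treated, y, n_segments=5):
        """Estimate CATEs, then coarsen them into a small number of segments."""
        # Step 1: T-learner estimate of conditional average treatment effects.
        m1 = GradientBoostingRegressor().fit(X[treated == 1], y[treated == 1])
        m0 = GradientBoostingRegressor().fit(X[treated == 0], y[treated == 0])
        cate = m1.predict(X) - m0.predict(X)

        # Step 2: coarsen the CATE distribution into n_segments groups
        # (k-means here; the paper uses optimal transport methods).
        km = KMeans(n_clusters=n_segments, n_init=10).fit(cate.reshape(-1, 1))
        return km.labels_, km.cluster_centers_.ravel()

The paper's second step additionally chooses which treatment to assign to each
segment so as to maximize profits; the sketch only groups individuals by their
estimated effects.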

Econometrics arXiv cross-link from eess.SP (eess.SP), submitted: 2022-04-12

Portfolio Optimization Using a Consistent Vector-Based MSE Estimation Approach

Authors: Maaz Mahadi, Tarig Ballal, Muhammad Moinuddin, Tareq Y. Al-Naffouri, Ubaid Al-Saggaf

This paper is concerned with optimizing the global minimum-variance
portfolio's (GMVP) weights in high-dimensional settings where both observation
and population dimensions grow at a bounded ratio. Optimizing the GMVP weights
is highly influenced by the data covariance matrix estimation. In a
high-dimensional setting, it is well known that the sample covariance matrix is
not a proper estimator of the true covariance matrix since it is not invertible
when we have fewer observations than the data dimension. Even with more
observations, the sample covariance matrix may not be well-conditioned. This
paper determines the GMVP weights based on a regularized covariance matrix
estimator to overcome the aforementioned difficulties. Unlike other methods,
the proper selection of the regularization parameter is achieved by minimizing
the mean-squared error of an estimate of the noise vector that accounts for the
uncertainty in the data mean estimation. Using random-matrix-theory tools, we
derive a consistent estimator of the achievable mean-squared error that allows
us to find the optimal regularization parameter using a simple line search.
Simulation results demonstrate the effectiveness of the proposed method when
the data dimension is larger than the number of data samples or of the same
order.

arXiv link: http://arxiv.org/abs/2204.05611v1
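
A minimal sketch of GMVP weights computed from a ridge-regularized covariance
estimate; the regularization parameter is taken as given here, whereas the
paper selects it by minimizing a consistent estimate of an MSE criterion.

    import numpy as np

    def gmvp_weights(returns, alpha):
        """GMVP weights w = S_reg^{-1} 1 / (1' S_reg^{-1} 1) from T x p returns."""
        p = returns.shape[1]
        S = np.cov(returns, rowvar=False)     # sample covariance (p x p)
        S_reg = S + alpha * np.eye(p)         # ridge-type regularized estimator
        ones = np.ones(p)
        w = np.linalg.solve(S_reg, ones)
        return w / (ones @ w)                 # normalize so the weights sum to one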

Econometrics arXiv updated paper (originally submitted: 2022-04-12)

Neyman allocation is minimax optimal for best arm identification with two arms

Authors: Karun Adusumilli

This note describes the optimal policy rule, according to the local
asymptotic minimax regret criterion, for best arm identification when there are
only two treatments. It is shown that the optimal sampling rule is the Neyman
allocation, which allocates a constant fraction of units to each treatment in a
manner that is proportional to the standard deviation of the treatment
outcomes. When the variances are equal, the optimal ratio is one-half. This
policy is independent of the data, so there is no adaptation to previous
outcomes. At the end of the experiment, the policy maker adopts the treatment
with higher average outcomes.

arXiv link: http://arxiv.org/abs/2204.05527v7
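
Concretely, with outcome standard deviations \(\sigma_1\) and \(\sigma_2\)
under the two treatments, the Neyman allocation assigns the fractions

\[
\frac{n_1}{n} = \frac{\sigma_1}{\sigma_1 + \sigma_2},
\qquad
\frac{n_2}{n} = \frac{\sigma_2}{\sigma_1 + \sigma_2},
\]

so that equal variances yield the familiar 50/50 split.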

Econometrics arXiv updated paper (originally submitted: 2022-04-12)

Tuning Parameter-Free Nonparametric Density Estimation from Tabulated Summary Data

Authors: Ji Hyung Lee, Yuya Sasaki, Alexis Akira Toda, Yulong Wang

Administrative data are often easier to access as tabulated summaries than in
the original format due to confidentiality concerns. Motivated by this
practical feature, we propose a novel nonparametric density estimation method
from tabulated summary data based on maximum entropy and prove its strong
uniform consistency. Unlike existing kernel-based estimators, our estimator is
free from tuning parameters and admits a closed-form density that is convenient
for post-estimation analysis. We apply the proposed method to the tabulated
summary data of the U.S. tax returns to estimate the income distribution.

arXiv link: http://arxiv.org/abs/2204.05480v3
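
Schematically, with bracket endpoints \(c_0 < c_1 < \dots < c_J\) and tabulated
bracket probabilities \(p_1, \dots, p_J\), a maximum-entropy estimator of this
kind solves

\[
\max_{f \ge 0}\; -\int f(x)\log f(x)\,dx
\quad \text{s.t.} \quad \int_{c_{j-1}}^{c_j} f(x)\,dx = p_j, \quad j = 1, \dots, J,
\]

whose solution is piecewise uniform within brackets; adding tabulated bracket
means as further constraints yields piecewise exponential pieces instead. The
exact constraint set used in the paper may differ; the display is only meant to
convey the tuning-parameter-free, closed-form nature of the approach.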

Econometrics arXiv updated paper (originally submitted: 2022-04-11)

Two-step estimation in linear regressions with adaptive learning

Authors: Alexander Mayer

Weak consistency and asymptotic normality of the ordinary least-squares
estimator in a linear regression with adaptive learning is derived when the
crucial, so-called, `gain' parameter is estimated in a first step by nonlinear
least squares from an auxiliary model. The singular limiting distribution of
the two-step estimator is normal and in general affected by the sampling
uncertainty from the first step. However, this `generated-regressor' issue
disappears for certain parameter combinations.

arXiv link: http://arxiv.org/abs/2204.05298v3

Econometrics arXiv updated paper (originally submitted: 2022-04-11)

Partially Linear Models under Data Combination

Authors: Xavier D'Haultfœuille, Christophe Gaillac, Arnaud Maurel

We study partially linear models when the outcome of interest and some of the
covariates are observed in two different datasets that cannot be linked. This
type of data combination problem arises very frequently in empirical
microeconomics. Using recent tools from optimal transport theory, we derive a
constructive characterization of the sharp identified set. We then build on
this result and develop a novel inference method that exploits the specific
geometric properties of the identified set. Our method exhibits good
performance in finite samples, while remaining very tractable. We apply our
approach to study intergenerational income mobility over the period 1850-1930
in the United States. Our method allows us to relax the exclusion restrictions
used in earlier work, while delivering confidence regions that are informative.

arXiv link: http://arxiv.org/abs/2204.05175v3

Econometrics arXiv paper, submitted: 2022-04-11

Bootstrap Cointegration Tests in ARDL Models

Authors: Stefano Bertelli, Gianmarco Vacca, Maria Grazia Zoia

The paper proposes a new bootstrap approach to Pesaran, Shin and Smith's
bounds tests in a conditional equilibrium correction model, with the aim of
overcoming some typical drawbacks of the latter, such as inconclusive inference
and size distortion. The bootstrap tests are worked out under several data
generating processes, including degenerate cases. Monte Carlo simulations
confirm the better performance of the bootstrap tests relative to the bounds
tests and to the asymptotic F test on the independent variables of the ARDL
model. It is also proved that any inference carried out in misspecified models,
such as unconditional ARDLs, may be misleading. Empirical applications
highlight the importance of employing the appropriate specification and provide
definitive answers to the inconclusive inference of the bound tests when
exploring the long-term equilibrium relationship between economic variables.

arXiv link: http://arxiv.org/abs/2204.04939v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2022-04-11

State capital involvement, managerial sentiment and firm innovation performance: Evidence from China

Authors: Xiangtai Zuo

In recent years, more and more state-owned enterprises (SOEs) have been
embedded in the restructuring and governance of private enterprises through
equity participation, providing a more advantageous environment for private
enterprises in financing and innovation. However, little is known about the
mechanisms through which SOE intervention affects corporate innovation
performance. Hence, in this study we investigate the association of state
capital intervention with innovation performance, and further examine the
potential mediating role of managerial sentiment and the moderating role of
financing constraints, using all listed non-ST firms from 2010 to 2020 as the
sample. The results reveal two main findings: 1) state capital intervention
increases innovation performance through managerial sentiment; 2) financing
constraints moderate the effect of state capital intervention on firms'
innovation performance.

arXiv link: http://arxiv.org/abs/2204.04860v1

Econometrics arXiv cross-link from math.OC (math.OC), submitted: 2022-04-10

Revenue Management Under the Markov Chain Choice Model with Joint Price and Assortment Decisions

Authors: Anton J. Kleywegt, Hongzhang Shao

Finding the optimal product prices and product assortment are two fundamental
problems in revenue management. Usually, a seller needs to jointly determine
the prices and assortment while managing a network of resources with limited
capacity. However, there is not yet a tractable method to efficiently solve
such a problem. Existing papers studying static joint optimization of price and
assortment cannot incorporate resource constraints. We therefore study the
revenue management problem with resource constraints and price bounds, where
the prices and the product assortments need to be jointly determined over time.
We show that, under the Markov chain (MC) choice model (which subsumes the
multinomial logit (MNL) model), the choice-based joint optimization problem can
be reformulated as a tractable convex conic optimization problem. We also prove that
an optimal solution with a constant price vector exists even with constraints
on resources. In addition, a solution with both constant assortment and price
vector can be optimal when there is no resource constraint.

arXiv link: http://arxiv.org/abs/2204.04774v1

Econometrics arXiv paper, submitted: 2022-04-06

Super-linear Scaling Behavior for Electric Vehicle Chargers and Road Map to Addressing the Infrastructure Gap

Authors: Alexius Wadell, Matthew Guttenberg, Christopher P. Kempes, Venkatasubramanian Viswanathan

Enabling widespread electric vehicle (EV) adoption requires substantial
build-out of charging infrastructure in the coming decade. We formulate the
charging infrastructure needs as a scaling analysis problem and use it to
estimate the EV infrastructure needs of the US at a county-level resolution.
Surprisingly, we find that the current EV infrastructure deployment scales
super-linearly with population, deviating from the sub-linear scaling of
gasoline stations and other infrastructure. We discuss how this demonstrates
the infancy of EV station abundance compared to other mature transportation
infrastructures. By considering the power delivery of existing gasoline
stations, and appropriate EV efficiencies, we estimate the EV infrastructure
gap at the county level, providing a road map for future EV infrastructure
expansion. Our reliance on scaling analysis allows us to make a unique forecast
in this domain.

arXiv link: http://arxiv.org/abs/2204.03094v1

Econometrics arXiv cross-link from q-fin.PM (q-fin.PM), submitted: 2022-04-06

Risk budget portfolios with convex Non-negative Matrix Factorization

Authors: Bruno Spilak, Wolfgang Karl Härdle

We propose a portfolio allocation method based on risk factor budgeting using
convex Nonnegative Matrix Factorization (NMF). Unlike classical factor
analysis, PCA, or ICA, NMF ensures positive factor loadings to obtain
interpretable long-only portfolios. As the NMF factors represent separate
sources of risk, they have a quasi-diagonal correlation matrix, promoting
diversified portfolio allocations. We evaluate our method in the context of
volatility targeting on two long-only global portfolios of cryptocurrencies and
traditional assets. Our method outperforms classical portfolio allocations
regarding diversification and presents a better risk profile than hierarchical
risk parity (HRP). We assess the robustness of our findings using Monte Carlo
simulation.

arXiv link: http://arxiv.org/abs/2204.02757v2

Econometrics arXiv updated paper (originally submitted: 2022-04-05)

Finitely Heterogeneous Treatment Effect in Event-study

Authors: Myungkou Shin

A key assumption of difference-in-differences designs is that the
average evolution of untreated potential outcomes is the same across different
treatment cohorts: a parallel trends assumption. In this paper, we relax the
parallel trends assumption by assuming a latent type variable and developing a
type-specific parallel trend assumption. With a finite support assumption on
the latent type variable and long pretreatment time periods, we show that an
extremum classifier consistently estimates the type assignment. Based on the
classification result, we propose a type-specific diff-in-diff estimator for
type-specific ATT. By estimating the type-specific ATT, we study heterogeneity
in treatment effect, in addition to heterogeneity in baseline outcomes.

arXiv link: http://arxiv.org/abs/2204.02346v5

Econometrics arXiv updated paper (originally submitted: 2022-04-05)

Asymptotic Theory for Unit Root Moderate Deviations in Quantile Autoregressions and Predictive Regressions

Authors: Christis Katsouris

We establish the asymptotic theory in quantile autoregression when the model
parameter is specified with respect to moderate deviations from the unit
boundary of the form (1 + c / k) with a convergence sequence that diverges at a
rate slower than the sample size n. Then, extending the framework proposed by
Phillips and Magdalinos (2007), we consider the limit theory for the
near-stationary and the near-explosive cases when the model is estimated with a
conditional quantile specification function and model parameters are
quantile-dependent. Additionally, a Bahadur-type representation and limiting
distributions based on the M-estimators of the model parameters are derived.
Specifically, we show that the serial correlation coefficient converges in
distribution to a ratio of two independent random variables. Monte Carlo
simulations illustrate the finite-sample performance of the estimation
procedure under investigation.

arXiv link: http://arxiv.org/abs/2204.02073v2

Econometrics arXiv cross-link from physics.soc-ph (physics.soc-ph), submitted: 2022-04-05

Microtransit adoption in the wake of the COVID-19 pandemic: evidence from a choice experiment with transit and car commuters

Authors: Jason Soria, Shelly Etzioni, Yoram Shiftan, Amanda Stathopoulos, Eran Ben-Elia

On-demand mobility platforms play an increasingly important role in urban
mobility systems. Impacts are still debated, as these platforms supply
personalized and optimized services, while also contributing to existing
sustainability challenges. Recently, microtransit services have emerged,
promising to combine advantages of pooled on-demand rides with more sustainable
fixed-route public transit services. Understanding traveler behavior becomes a
primary focus to analyze adoption likelihood and perceptions of different
microtransit attributes. The COVID-19 pandemic context adds an additional layer
of complexity to analyzing mobility innovation acceptance. This study
investigates the potential demand for microtransit options against the
background of the pandemic. We use a stated choice experiment to study the
decision-making of Israeli public transit and car commuters when offered
novel microtransit options (sedan vs. passenger van). We investigate the
tradeoffs related to traditional fare and travel time attributes, along with
microtransit features, namely walking time to the pickup location, vehicle sharing,
waiting time, minimum advance reservation time, and shelter at designated
boarding locations. Additionally, we analyze two latent constructs: attitudes
towards sharing, as well as experiences and risk-perceptions related to the
COVID-19 pandemic. We develop Integrated Choice and Latent Variable models to
compare the two commuter groups in terms of the likelihood to switch to
microtransit, attribute trade-offs, sharing preferences and pandemic impacts.
The results reveal high elasticities of several time and COVID effects for car
commuters compared to relative insensitivity of transit commuters to the risk
of COVID contraction. Moreover, for car commuters, those with strong sharing
identities were more likely to be comfortable in COVID risk situations, and to
accept microtransit.

arXiv link: http://arxiv.org/abs/2204.01974v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2022-04-04

Policy Learning with Competing Agents

Authors: Roshni Sahoo, Stefan Wager

Decision makers often aim to learn a treatment assignment policy under a
capacity constraint on the number of agents that they can treat. When agents
can respond strategically to such policies, competition arises, complicating
estimation of the optimal policy. In this paper, we study capacity-constrained
treatment assignment in the presence of such interference. We consider a
dynamic model where the decision maker allocates treatments at each time step
and heterogeneous agents myopically best respond to the previous treatment
assignment policy. When the number of agents is large but finite, we show that
the threshold for receiving treatment under a given policy converges to the
policy's mean-field equilibrium threshold. Based on this result, we develop a
consistent estimator for the policy gradient. In a semi-synthetic experiment
with data from the National Education Longitudinal Study of 1988, we
demonstrate that this estimator can be used for learning capacity-constrained
policies in the presence of strategic behavior.

arXiv link: http://arxiv.org/abs/2204.01884v5

Econometrics arXiv updated paper (originally submitted: 2022-04-04)

Kernel-weighted specification testing under general distributions

Authors: Sid Kankanala, Victoria Zinde-Walsh

Kernel-weighted test statistics have been widely used in a variety of
settings including non-stationary regression, inference on propensity score and
panel data models. We develop the limit theory for a kernel-based specification
test of a parametric conditional mean when the law of the regressors may not be
absolutely continuous with respect to the Lebesgue measure and is contaminated with singular
components. This result is of independent interest and may be useful in other
applications that utilize kernel smoothed U-statistics. Simulations illustrate
the non-trivial impact of the distribution of the conditioning variables on the
power properties of the test statistic.

arXiv link: http://arxiv.org/abs/2204.01683v3

Econometrics arXiv paper, submitted: 2022-04-04

A Bootstrap-Assisted Self-Normalization Approach to Inference in Cointegrating Regressions

Authors: Karsten Reichold, Carsten Jentsch

Traditional inference in cointegrating regressions requires tuning parameter
choices to estimate a long-run variance parameter. Even in case these choices
are "optimal", the tests are severely size distorted. We propose a novel
self-normalization approach, which leads to a nuisance parameter free limiting
distribution without estimating the long-run variance parameter directly. This
makes our self-normalized test tuning parameter free and considerably less
prone to size distortions at the cost of only small power losses. In
combination with an asymptotically justified vector autoregressive sieve
bootstrap to construct critical values, the self-normalization approach shows
further improvement in small to medium samples when the level of error serial
correlation or regressor endogeneity is large. We illustrate the usefulness of
the bootstrap-assisted self-normalized test in empirical applications by
analyzing the validity of the Fisher effect in Germany and the United States.

arXiv link: http://arxiv.org/abs/2204.01373v1

Econometrics arXiv updated paper (originally submitted: 2022-04-04)

Capturing positive network attributes during the estimation of recursive logit models: A prism-based approach

Authors: Yuki Oyama

Although the recursive logit (RL) model has been recently popular and has led
to many applications and extensions, an important numerical issue with respect
to the computation of value functions remains unsolved. This issue is
particularly significant for model estimation, during which the parameters are
updated every iteration and may violate the feasibility condition of the value
function. To solve this numerical issue of the value function in the model
estimation, this study performs an extensive analysis of a prism-constrained RL
(Prism-RL) model proposed by Oyama and Hato (2019), which has a path set
constrained by the prism defined based upon a state-extended network
representation. The numerical experiments have shown two important properties
of the Prism-RL model for parameter estimation. First, the prism-based approach
enables estimation regardless of the initial and true parameter values, even in
cases where the original RL model cannot be estimated due to the numerical
problem. We also successfully captured a positive effect of the presence of
street greenery on pedestrian route choice in a real application. Second, the
Prism-RL model achieved better fit and prediction performance than the RL
model, by implicitly restricting paths with large detour or many loops.
Defining the prism-based path set in a data-oriented manner, we demonstrated
the possibility of the Prism-RL model describing more realistic route choice
behavior. The capture of positive network attributes while retaining the
diversity of path alternatives is important in many applications such as
pedestrian route choice and sequential destination choice behavior, and thus
the prism-based approach significantly extends the practical applicability of
the RL model.

arXiv link: http://arxiv.org/abs/2204.01215v3

Econometrics arXiv updated paper (originally submitted: 2022-04-02)

Robust Estimation of Conditional Factor Models

Authors: Qihui Chen

This paper develops estimation and inference methods for conditional quantile
factor models. We first introduce a simple sieve estimation, and establish
asymptotic properties of the estimators under large $N$. We then provide a
bootstrap procedure for estimating the distributions of the estimators. We also
provide two consistent estimators for the number of factors. The methods allow
us not only to estimate conditional factor structures of distributions of asset
returns utilizing characteristics, but also to conduct robust inference in
conditional factor models, which enables us to analyze the cross section of
asset returns with heavy tails. We apply the methods to analyze the cross
section of individual US stock returns.

arXiv link: http://arxiv.org/abs/2204.00801v2

Econometrics arXiv updated paper (originally submitted: 2022-04-01)

Decomposition of Differences in Distribution under Sample Selection and the Gender Wage Gap

Authors: Santiago Pereda-Fernández

I address the decomposition of the differences between the distribution of
outcomes of two groups when individuals self-select themselves into
participation. I differentiate between the decomposition for participants and
the entire population, highlighting how the primitive components of the model
affect each of the distributions of outcomes. Additionally, I introduce two
ancillary decompositions that help uncover the sources of differences in the
distribution of unobservables and participation between the two groups. The
estimation is done using existing quantile regression methods, for which I show
how to perform uniformly valid inference. I illustrate these methods by
revisiting the gender wage gap, finding that changes in female participation
and self-selection have been the main drivers for reducing the gap.

arXiv link: http://arxiv.org/abs/2204.00551v2

Econometrics arXiv updated paper (originally submitted: 2022-04-01)

Finite Sample Inference in Incomplete Models

Authors: Lixiong Li, Marc Henry

We propose confidence regions for the parameters of incomplete models with
exact coverage of the true parameter in finite samples. Our confidence region
inverts a test, which generalizes Monte Carlo tests to incomplete models. The
test statistic is a discrete analogue of a new optimal transport
characterization of the sharp identified region. Both test statistic and
critical values rely on simulation drawn from the distribution of latent
variables and are computed using solutions to discrete optimal transport, hence
linear programming problems. We also propose a fast preliminary search in the
parameter space with an alternative, more conservative yet consistent test,
based on a parameter free critical value.

arXiv link: http://arxiv.org/abs/2204.00473v3

Econometrics arXiv paper, submitted: 2022-04-01

Estimating Separable Matching Models

Authors: Alfred Galichon, Bernard Salanié

In this paper we propose two simple methods to estimate models of matching
with transferable and separable utility introduced in Galichon and Salani\'e
(2022). The first method is a minimum distance estimator that relies on the
generalized entropy of matching. The second relies on a reformulation of the
more special but popular Choo and Siow (2006) model; it uses generalized linear
models (GLMs) with two-way fixed effects.

arXiv link: http://arxiv.org/abs/2204.00362v1
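
A stylized sketch of the second approach, i.e., a Poisson GLM of match counts
on two-way (type) fixed effects plus a surplus covariate, using toy data and
illustrative variable names rather than the authors' specification.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Toy data: one row per (x, y) type pair with observed match counts and a
    # hypothetical surplus covariate (whether the types share an education level).
    rng = np.random.default_rng(0)
    df = pd.DataFrame([(x, y) for x in range(4) for y in range(4)],
                      columns=["x_type", "y_type"])
    df["same_educ"] = (df.x_type % 2 == df.y_type % 2).astype(int)
    df["n_matches"] = rng.poisson(20 + 10 * df.same_educ)

    # Poisson GLM with two-way fixed effects and a surplus covariate.
    fit = smf.glm("n_matches ~ same_educ + C(x_type) + C(y_type)",
                  data=df, family=sm.families.Poisson()).fit()
    print(fit.params)

Here the two sets of fixed effects absorb the type-specific terms of the model,
while the remaining coefficient captures the systematic part of the surplus.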

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2022-04-01

Measuring Diagnostic Test Performance Using Imperfect Reference Tests: A Partial Identification Approach

Authors: Filip Obradović

Diagnostic tests are almost never perfect. Studies quantifying their
performance use knowledge of the true health status, measured with a reference
diagnostic test. Researchers commonly assume that the reference test is
perfect, which is often not the case in practice. When the assumption fails,
conventional studies identify "apparent" performance or performance with
respect to the reference, but not true performance. This paper provides the
smallest possible bounds on the measures of true performance - sensitivity
(true positive rate) and specificity (true negative rate), or equivalently
false positive and negative rates, in standard settings. Implied bounds on
policy-relevant parameters are derived: 1) Prevalence in screened populations;
2) Predictive values. Methods for inference based on moment inequalities are
used to construct uniformly consistent confidence sets in level over a relevant
family of data distributions. Emergency Use Authorization (EUA) and independent
study data for the BinaxNOW COVID-19 antigen test demonstrate that the bounds
can be very informative. Analysis reveals that the estimated false negative
rates for symptomatic and asymptomatic patients are up to 3.17 and 4.59 times
higher than the frequently cited "apparent" false negative rate. Further
applicability of the results in the context of imperfect proxies such as survey
responses and imputed protected classes is indicated.

arXiv link: http://arxiv.org/abs/2204.00180v4

Econometrics arXiv updated paper (originally submitted: 2022-03-29)

Testing the identification of causal effects in observational data

Authors: Martin Huber, Jannis Kueck

This study demonstrates the existence of a testable condition for the
identification of the causal effect of a treatment on an outcome in
observational data, which relies on two sets of variables: observed covariates
to be controlled for and a suspected instrument. Under a causal structure
commonly found in empirical applications, the testable conditional independence
of the suspected instrument and the outcome given the treatment and the
covariates has two implications. First, the instrument is valid, i.e. it does
not directly affect the outcome (other than through the treatment) and is
unconfounded conditional on the covariates. Second, the treatment is
unconfounded conditional on the covariates such that the treatment effect is
identified. We suggest tests of this conditional independence based on machine
learning methods that account for covariates in a data-driven way and
investigate their asymptotic behavior and finite sample performance in a
simulation study. We also apply our testing approach to evaluating the impact
of fertility on female labor supply when using the sibling sex ratio of the
first two children as supposed instrument, which by and large points to a
violation of our testable implication for the moderate set of socio-economic
covariates considered.

arXiv link: http://arxiv.org/abs/2203.15890v4

Econometrics arXiv paper, submitted: 2022-03-29

Difference-in-Differences for Policy Evaluation

Authors: Brantly Callaway

Difference-in-differences is one of the most used identification strategies
in empirical work in economics. This chapter reviews a number of important,
recent developments related to difference-in-differences. First, this chapter
reviews recent work pointing out limitations of two way fixed effects
regressions (these are panel data regressions that have been the dominant
approach to implementing difference-in-differences identification strategies)
that arise in empirically relevant settings where there are more than two time
periods, variation in treatment timing across units, and treatment effect
heterogeneity. Second, this chapter reviews recently proposed alternative
approaches that are able to circumvent these issues without being substantially
more complicated to implement. Third, this chapter covers a number of
extensions to these results, paying particular attention to (i) parallel trends
assumptions that hold only after conditioning on observed covariates and (ii)
strategies to partially identify causal effect parameters in
difference-in-differences applications in cases where the parallel trends
assumption may be violated.

arXiv link: http://arxiv.org/abs/2203.15646v1

Econometrics arXiv updated paper (originally submitted: 2022-03-29)

Estimating Nonlinear Network Data Models with Fixed Effects

Authors: David W. Hughes

I introduce a new method for bias correction of dyadic models with
agent-specific fixed effects, including the dyadic link formation model with
homophily and degree heterogeneity. The proposed approach uses a jackknife
procedure to deal with the incidental parameters problem. The method can be
applied to both directed and undirected networks, allows for non-binary outcome
variables, and can be used to bias correct estimates of average effects and
counterfactual outcomes. I also show how the jackknife can be used to bias
correct fixed-effect averages over functions that depend on multiple nodes,
e.g. triads or tetrads in the network. As an example, I implement specification
tests for dependence across dyads, such as reciprocity or transitivity.
Finally, I demonstrate the usefulness of the estimator in an application to a
gravity model for import/export relationships across countries.

arXiv link: http://arxiv.org/abs/2203.15603v3

Econometrics arXiv cross-link from physics.soc-ph (physics.soc-ph), submitted: 2022-03-28

Network structure and fragmentation of the Argentinean interbank markets

Authors: Federico Forte, Pedro Elosegui, Gabriel Montes-Rojas

This paper studies the network structure and fragmentation of the Argentinean
interbank market. Both the unsecured (CALL) and the secured (REPO) markets are
examined, applying complex network analysis. Results indicate that, although
the secured market has less participants, its nodes are more densely connected
than in the unsecured market. The interrelationships in the unsecured market
are less stable, making its structure more volatile and vulnerable to negative
shocks. The analysis identifies two 'hidden' underlying sub-networks within the
REPO market: one based on the transactions collateralized by Treasury bonds
(REPO-T) and the other based on operations collateralized by Central Bank (CB)
securities (REPO-CB). The changes in monetary policy stance and monetary
conditions seem to have a substantially smaller impact in the former than in
the latter 'sub-market'. The connectivity levels within the REPO-T market and
its structure remain relatively unaffected by the (in some periods pronounced)
swings in the other segment of the market. Hence, the REPO market shows signs
of fragmentation in its inner structure, according to the type of collateral
asset involved in the transactions, so the average REPO interest rate reflects
the interplay between these two partially fragmented sub-markets. This mixed
structure of the REPO market entails one of the main sources of differentiation
with respect to the CALL market.

arXiv link: http://arxiv.org/abs/2203.14488v1

Econometrics arXiv updated paper (originally submitted: 2022-03-25)

Automatic Debiased Machine Learning for Dynamic Treatment Effects and General Nested Functionals

Authors: Victor Chernozhukov, Whitney Newey, Rahul Singh, Vasilis Syrgkanis

We extend the idea of automated debiased machine learning to the dynamic
treatment regime and more generally to nested functionals. We show that the
multiply robust formula for the dynamic treatment regime with discrete
treatments can be re-stated in terms of a recursive Riesz representer
characterization of nested mean regressions. We then apply a recursive Riesz
representer estimation learning algorithm that estimates de-biasing corrections
without the need to characterize what the correction terms look like (for
instance, products of inverse probability weighting terms), as is done in
prior work on doubly robust estimation in the dynamic regime. Our approach
defines a sequence of loss minimization problems, whose minimizers are the
multipliers of the de-biasing correction, hence circumventing the need for
solving auxiliary propensity models and directly optimizing for the mean
squared error of the target de-biasing correction. We provide further
applications of our approach to estimation of dynamic discrete choice models
and estimation of long-term effects with surrogates.

arXiv link: http://arxiv.org/abs/2203.13887v5

Econometrics arXiv cross-link from q-fin.RM (q-fin.RM), submitted: 2022-03-24

The application of techniques derived from artificial intelligence to the prediction of the solvency of bank customers: case of the application of the CART-type decision tree (DT)

Authors: Karim Amzile, Rajaa Amzile

In this study we apply the CART-type decision tree (DT-CART) method, a
technique derived from artificial intelligence, to the prediction of the
solvency of bank customers, using historical data on bank customers. We follow
a standard data mining process: we first preprocess the data, cleaning it by
deleting all rows with outliers or missing values as well as rows with empty
columns; we then fix the variable to be explained (the dependent variable, or
target) and eliminate all explanatory (independent) variables that are not
significant, using univariate analysis and the correlation matrix; finally, we
fit the CART decision tree using the SPSS tool. After building the model, we
evaluate and test its performance and find that its accuracy and precision are
71%, corresponding to an error rate of 29%. This leads us to conclude that the
model performs at a fairly good level in terms of precision and predictive
ability for the solvency of bank customers.

arXiv link: http://arxiv.org/abs/2203.13001v1
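
For readers who want to reproduce the general workflow (though not with SPSS),
a minimal scikit-learn sketch of a CART classifier on customer data follows;
the DataFrame and its `solvent` target column are hypothetical, numeric
features are assumed, and the paper's variable-selection steps are omitted.

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    def fit_cart(df, target="solvent"):
        """Fit a CART-style decision tree and report out-of-sample accuracy."""
        data = df.dropna()                     # drop rows with missing values
        X = data.drop(columns=[target])
        y = data[target]
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.3, random_state=0)
        tree = DecisionTreeClassifier(criterion="gini")   # CART uses Gini splits
        tree.fit(X_train, y_train)
        return tree, accuracy_score(y_test, tree.predict(X_test))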

Econometrics arXiv updated paper (originally submitted: 2022-03-23)

Correcting Attrition Bias using Changes-in-Changes

Authors: Dalia Ghanem, Sarojini Hirshleifer, Désiré Kédagni, Karen Ortiz-Becerra

Attrition is a common and potentially important threat to internal validity
in treatment effect studies. We extend the changes-in-changes approach to
identify the average treatment effect for respondents and the entire study
population in the presence of attrition. Our method, which exploits baseline
outcome data, can be applied to randomized experiments as well as
quasi-experimental difference-in-differences designs. A formal comparison
highlights that while widely used corrections typically impose restrictions on
whether or how response depends on treatment, our proposed attrition correction
exploits restrictions on the outcome model. We further show that the conditions
required for our correction can accommodate a broad class of response models
that depend on treatment in an arbitrary way. We illustrate the implementation
of the proposed corrections in an application to a large-scale randomized
experiment.

arXiv link: http://arxiv.org/abs/2203.12740v5

Econometrics arXiv paper, submitted: 2022-03-23

Bounds for Bias-Adjusted Treatment Effect in Linear Econometric Models

Authors: Deepankar Basu

In linear econometric models with proportional selection on unobservables,
the omitted variable bias in estimated treatment effects is a real root of a cubic
equation involving estimated parameters from a short and an intermediate
regression. The roots of the cubic are functions of $\delta$, the degree of
selection on unobservables, and $R_{max}$, the R-squared in a hypothetical long
regression that includes the unobservable confounder and all observable
controls. In this paper I propose and implement a novel algorithm to compute
roots of the cubic equation over relevant regions of the $\delta$-$R_{max}$
plane and use the roots to construct bounding sets for the true treatment
effect. The algorithm is based on two well-known mathematical results: (a) the
discriminant of the cubic equation can be used to demarcate regions of unique
real roots from regions of three real roots, and (b) a small change in the
coefficients of a polynomial equation will lead to small change in its roots
because the latter are continuous functions of the former. I illustrate my
method by applying it to the analysis of maternal behavior on child outcomes.

arXiv link: http://arxiv.org/abs/2203.12431v1
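
A small numerical sketch of the grid computation described above; `cubic_coefs`
is a hypothetical placeholder for the paper's coefficient expressions, which
are functions of the short- and intermediate-regression estimates.

    import numpy as np

    def roots_over_grid(cubic_coefs, deltas, rmaxes):
        """Real roots and discriminant of the bias cubic over a (delta, R_max) grid."""
        results = {}
        for d in deltas:
            for r in rmaxes:
                a3, a2, a1, a0 = cubic_coefs(d, r)        # user-supplied coefficients
                roots = np.roots([a3, a2, a1, a0])
                real_roots = roots[np.isclose(roots.imag, 0.0)].real
                # The discriminant separates the one-real-root region from the
                # three-real-root region of the (delta, R_max) plane.
                disc = (18*a3*a2*a1*a0 - 4*a2**3*a0 + a2**2*a1**2
                        - 4*a3*a1**3 - 27*a3**2*a0**2)
                results[(d, r)] = (real_roots, disc)
        return results

Collecting the real roots over the grid is the ingredient used to construct the
bounding sets described in the abstract.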

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2022-03-23

Exabel's Factor Model

Authors: Øyvind Grotmol, Michael Scheuerer, Kjersti Aas, Martin Jullum

Factor models have become a common and valued tool for understanding the
risks associated with an investing strategy. In this report we describe
Exabel's factor model, we quantify the fraction of the variability of the
returns explained by the different factors, and we show some examples of annual
returns of portfolios with different factor exposure.

arXiv link: http://arxiv.org/abs/2203.12408v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2022-03-23

Performance evaluation of volatility estimation methods for Exabel

Authors: Øyvind Grotmol, Martin Jullum, Kjersti Aas, Michael Scheuerer

Quantifying both historic and future volatility is key in portfolio risk
management. This note presents and compares estimation strategies for
volatility estimation in an estimation universe consisting of 28,629 unique
companies from February 2010 to April 2021, with 858 different portfolios. The
estimation methods are compared in terms of how they rank the volatility of the
different subsets of portfolios. The overall best performing approach estimates
volatility from direct entity returns using a GARCH model for variance
estimation.

arXiv link: http://arxiv.org/abs/2203.12402v1
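
A minimal sketch of the best-performing ingredient above, a GARCH(1,1) fit on entity returns using the `arch` package; the simulated return series is a placeholder, not Exabel's estimation universe.

```python
# Sketch: conditional volatility from a GARCH(1,1) fit on a (simulated) return series.
import numpy as np
from arch import arch_model

rng = np.random.default_rng(0)
returns = rng.standard_normal(2500)                 # placeholder daily returns, in percent

model = arch_model(returns, mean="Constant", vol="GARCH", p=1, q=1)
fit = model.fit(disp="off")

sigma_t = fit.conditional_volatility                # in-sample conditional volatility
forecast = fit.forecast(horizon=1)                  # one-step-ahead variance forecast
print("latest fitted volatility:", round(float(sigma_t[-1]), 3))
print("1-step variance forecast:", round(float(forecast.variance.iloc[-1, 0]), 3))
```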

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-03-23

Bivariate Distribution Regression with Application to Insurance Data

Authors: Yunyun Wang, Tatsushi Oka, Dan Zhu

Understanding variable dependence, particularly eliciting their statistical
properties given a set of covariates, provides the mathematical foundation in
practical operations management such as risk analysis and decision-making given
observed circumstances. This article presents an estimation method for modeling
the conditional joint distribution of bivariate outcomes based on the
distribution regression and factorization methods. This method is considered
semiparametric in that it allows for flexible modeling of both the marginal and
joint distributions conditional on covariates without imposing global
parametric assumptions across the entire distribution. In contrast to existing
parametric approaches, our method can accommodate discrete, continuous, or
mixed variables, and provides a simple yet effective way to capture
distributional dependence structures between bivariate outcomes and covariates.
Various simulation results confirm that our method can perform similarly or
better in finite samples compared to the alternative methods. In an application
to the study of a motor third-party liability insurance portfolio, the proposed
method effectively estimates risk measures such as the conditional
Value-at-Risk and Expected Shortfall. This result suggests that this
semiparametric approach can serve as an alternative in insurance risk
management.

arXiv link: http://arxiv.org/abs/2203.12228v3
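
The marginal building block of distribution regression can be sketched as a sequence of binary regressions over a grid of thresholds. The minimal univariate illustration below uses simulated data and is not the authors' bivariate factorization estimator.

```python
# Sketch: distribution regression for one outcome, F(y | x) estimated by
# fitting a logit of 1{Y <= y} on X at each threshold y on a grid.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 2000
x = rng.standard_normal(n)
y = 1.0 + 0.5 * x + rng.standard_normal(n)        # simulated outcome

X = sm.add_constant(x)
thresholds = np.quantile(y, np.linspace(0.05, 0.95, 19))

coefs = []
for thr in thresholds:
    d = (y <= thr).astype(float)                   # binary outcome 1{Y <= thr}
    fit = sm.Logit(d, X).fit(disp=0)
    coefs.append(fit.params)

# F_hat(thr | x0): predicted probability at a covariate value x0 = [const, x]
x0 = np.array([1.0, 0.5])
F_hat = [1.0 / (1.0 + np.exp(-(x0 @ b))) for b in coefs]
print(list(zip(np.round(thresholds, 2), np.round(F_hat, 3))))
```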

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2022-03-22

Performance of long short-term memory artificial neural networks in nowcasting during the COVID-19 crisis

Authors: Daniel Hopp

The COVID-19 pandemic has demonstrated the increasing need of policymakers
for timely estimates of macroeconomic variables. A prior UNCTAD research paper
examined the suitability of long short-term memory artificial neural networks
(LSTM) for performing economic nowcasting of this nature. Here, the LSTM's
performance during the COVID-19 pandemic is compared and contrasted with that
of the dynamic factor model (DFM), a commonly used methodology in the field.
Three separate variables, global merchandise export values and volumes and
global services exports, were nowcast with actual data vintages and performance
evaluated for the second, third, and fourth quarters of 2020 and the first and
second quarters of 2021. In terms of both mean absolute error and root mean
square error, the LSTM obtained better performance in two-thirds of
variable/quarter combinations, and displayed more gradual forecast
evolutions with more consistent narratives and smaller revisions. Additionally,
a methodology for adding interpretability to LSTMs is introduced and made
available in the accompanying nowcast_lstm Python library, which is now also
available in R, MATLAB, and Julia.

arXiv link: http://arxiv.org/abs/2203.11872v1

Econometrics arXiv updated paper (originally submitted: 2022-03-22)

Dealing with Logs and Zeros in Regression Models

Authors: David Benatia, Christophe Bellégo, Louis Pape

The log transformation is widely used in linear regression, mainly because
coefficients are interpretable as proportional effects. Yet this practice has
fundamental limitations, most notably that the log is undefined at zero,
creating an identification problem. We propose a new estimator, iterated OLS
(iOLS), which targets the normalized average treatment effect, preserving the
percentage-change interpretation while addressing these limitations. Our
procedure is the theoretically justified analogue of the ad-hoc log(1+Y)
transformation and delivers a consistent and asymptotically normal estimator of
the parameters of the exponential conditional mean model. iOLS is
computationally efficient, globally convergent, and free of the
incidental-parameter bias, while extending naturally to endogenous regressors
through iterated 2SLS. We illustrate the methods with simulations and revisit
three influential publications.

arXiv link: http://arxiv.org/abs/2203.11820v3
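
For context, the exponential conditional mean model referenced above is often estimated by Poisson pseudo-maximum likelihood (PPML). The sketch below fits that standard benchmark with statsmodels; it is not the iOLS or iterated 2SLS estimators proposed in the paper.

```python
# Sketch: Poisson pseudo-ML (PPML) for E[Y | X] = exp(X'b) with zero-valued outcomes.
# This is the standard benchmark for the exponential conditional mean model,
# not the iOLS estimator proposed above.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 5000
x = rng.standard_normal(n)
y = rng.poisson(np.exp(0.5 + 0.8 * x)).astype(float)   # outcome with many exact zeros

X = sm.add_constant(x)
ppml = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(ppml.params)                                      # slope reads as a semi-elasticity
```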

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-03-22

Predictor Selection for Synthetic Controls

Authors: Jaume Vives-i-Bastida

Synthetic control methods often rely on matching pre-treatment
characteristics (called predictors) of the treated unit. The choice of
predictors and how they are weighted plays a key role in the performance and
interpretability of synthetic control estimators. This paper proposes the use
of a sparse synthetic control procedure that penalizes the number of predictors
used in generating the counterfactual to select the most important predictors.
We derive, in a linear factor model framework, a new model selection
consistency result and show that the penalized procedure has a faster mean
squared error convergence rate. Through a simulation study, we then show that
the sparse synthetic control achieves lower bias and has better post-treatment
performance than the un-penalized synthetic control. Finally, we apply the
method to revisit the study of the passage of Proposition 99 in California in
an augmented setting with a large number of predictors available.

arXiv link: http://arxiv.org/abs/2203.11576v2
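
For reference, the un-penalized synthetic control weights that the sparse procedure above builds on can be computed by a small constrained least-squares problem; the predictor matrices below are simulated placeholders.

```python
# Sketch: standard synthetic control weights for one treated unit, solving
#   min_w || X1 - X0 w ||^2   s.t.  w >= 0, sum(w) = 1,
# where X1 holds the treated unit's pre-treatment predictors and X0 the donors'.
# This is the un-penalized baseline, not the sparse procedure proposed above.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
k, J = 8, 20                                   # predictors, donor units (placeholders)
X0 = rng.standard_normal((k, J))
X1 = X0 @ rng.dirichlet(np.ones(J)) + 0.05 * rng.standard_normal(k)

def loss(w):
    return np.sum((X1 - X0 @ w) ** 2)

cons = ({"type": "eq", "fun": lambda w: np.sum(w) - 1.0},)
bounds = [(0.0, 1.0)] * J
res = minimize(loss, x0=np.full(J, 1.0 / J), bounds=bounds, constraints=cons, method="SLSQP")
print("donor weights:", np.round(res.x, 3))
```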

Econometrics arXiv updated paper (originally submitted: 2022-03-21)

Indirect Inference for Nonlinear Panel Models with Fixed Effects

Authors: Shuowen Chen

Fixed effect estimators of nonlinear panel data models suffer from the
incidental parameter problem. This leads to two undesirable consequences in
applied research: (1) point estimates are subject to large biases, and (2)
confidence intervals have incorrect coverages. This paper proposes a
simulation-based method for bias reduction. The method simulates data using the
model with estimated individual effects, and finds values of parameters by
equating fixed effect estimates obtained from observed and simulated data. The
asymptotic framework provides consistency, bias correction, and asymptotic
normality results. An application to female labor force participation and
accompanying simulations illustrate the finite-sample performance of the method.

arXiv link: http://arxiv.org/abs/2203.10683v2

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2022-03-17

GAM(L)A: An econometric model for interpretable Machine Learning

Authors: Emmanuel Flachaire, Gilles Hacheme, Sullivan Hué, Sébastien Laurent

Despite their high predictive performance, random forest and gradient
boosting are often considered as black boxes or uninterpretable models which
has raised concerns from practitioners and regulators. As an alternative, we
propose in this paper to use partial linear models that are inherently
interpretable. Specifically, this article introduces GAM-lasso (GAMLA) and
GAM-autometrics (GAMA), denoted as GAM(L)A in short. GAM(L)A combines
parametric and non-parametric functions to accurately capture linearities and
non-linearities between the dependent and explanatory variables, and a
variable selection procedure to control for overfitting issues. Estimation
relies on a two-step procedure building upon the double residual method. We
illustrate the predictive performance and interpretability of GAM(L)A on a
regression and a classification problem. The results show that GAM(L)A
outperforms parametric models augmented by quadratic, cubic and interaction
effects. Moreover, the results also suggest that the performance of GAM(L)A is
not significantly different from that of random forest and gradient boosting.

arXiv link: http://arxiv.org/abs/2203.11691v1

Econometrics arXiv updated paper (originally submitted: 2022-03-17)

Selection and parallel trends

Authors: Dalia Ghanem, Pedro H. C. Sant'Anna, Kaspar Wüthrich

We study the role of selection into treatment in difference-in-differences
(DiD) designs. We derive necessary and sufficient conditions for parallel
trends assumptions under general classes of selection mechanisms. These
conditions characterize the empirical content of parallel trends. We use the
necessary conditions to provide a selection-based decomposition of the bias of
DiD and provide easy-to-implement strategies for benchmarking its components.
We also provide templates for justifying DiD in applications with and without
covariates. A reanalysis of the causal effect of NSW training programs
demonstrates the usefulness of our selection-based approach to benchmarking the
bias of DiD.

arXiv link: http://arxiv.org/abs/2203.09001v12

Econometrics arXiv updated paper (originally submitted: 2022-03-17)

Lorenz map, inequality ordering and curves based on multidimensional rearrangements

Authors: Yanqin Fan, Marc Henry, Brendan Pass, Jorge A. Rivero

We propose a multivariate extension of the Lorenz curve based on multivariate
rearrangements of optimal transport theory. We define a vector Lorenz map as
the integral of the vector quantile map associated with a multivariate resource
allocation. Each component of the Lorenz map is the cumulative share of each
resource, as in the traditional univariate case. The pointwise ordering of such
Lorenz maps defines a new multivariate majorization order, which is equivalent
to preference by any social planner with inequality averse multivariate rank
dependent social evaluation functional. We define a family of multi-attribute
Gini index and complete ordering based on the Lorenz map. We propose the level
sets of an Inverse Lorenz Function as a practical tool to visualize and compare
inequality in two dimensions, and apply it to income-wealth inequality in the
United States between 1989 and 2022.

arXiv link: http://arxiv.org/abs/2203.09000v4

Econometrics arXiv updated paper (originally submitted: 2022-03-16)

A Simple and Computationally Trivial Estimator for Grouped Fixed Effects Models

Authors: Martin Mugnier

This paper introduces a new fixed effects estimator for linear panel data
models with clustered time patterns of unobserved heterogeneity. The method
avoids non-convex and combinatorial optimization by combining a preliminary
consistent estimator of the slope coefficient, an agglomerative
pairwise-differencing clustering of cross-sectional units, and a pooled
ordinary least squares regression. Asymptotic guarantees are established in a
framework where $T$ can grow at any power of $N$, as both $N$ and $T$ approach
infinity. Unlike most existing approaches, the proposed estimator is
computationally straightforward and does not require a known upper bound on the
number of groups. Like existing approaches, the method delivers consistent
estimation of well-separated groups and an estimator of common parameters
asymptotically equivalent to the infeasible regression controlling for the true
groups. An application revisits the statistical association between income and
democracy.

arXiv link: http://arxiv.org/abs/2203.08879v5
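
A simplified sketch of the three ingredients described above (a preliminary slope estimate, clustering of units on their residual time patterns, and pooled OLS with group-by-time effects). It fixes the number of groups and is not the paper's exact estimator.

```python
# Simplified sketch of the two-step grouped fixed effects idea, not the paper's
# exact procedure: residualize on a preliminary slope, cluster units on their
# residual time profiles, then re-estimate with group-by-time effects.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(4)
N, T, G = 120, 40, 3
groups = rng.integers(0, G, size=N)
alpha = 2.0 * rng.standard_normal((G, T))           # grouped time patterns
x = rng.standard_normal((N, T))
beta = 1.5
y = beta * x + alpha[groups] + rng.standard_normal((N, T))

# Step 0: preliminary (here: pooled OLS) slope estimate
beta0 = np.sum(x * y) / np.sum(x * x)

# Step 1: cluster units on their residual time profiles
resid = y - beta0 * x
labels = AgglomerativeClustering(n_clusters=G).fit_predict(resid)

# Step 2: pooled OLS of y on x and group-by-time dummies
D = np.zeros((N * T, G * T))
for i in range(N):
    for t in range(T):
        D[i * T + t, labels[i] * T + t] = 1.0
Z = np.column_stack([x.reshape(-1, 1), D])
coef, *_ = np.linalg.lstsq(Z, y.reshape(-1), rcond=None)
print("estimated slope:", round(coef[0], 3))         # true value is 1.5
```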

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2022-03-16

Measurability of functionals and of ideal point forecasts

Authors: Tobias Fissler, Hajo Holzmann

The ideal probabilistic forecast for a random variable $Y$ based on an
information set $F$ is the conditional distribution of $Y$ given
$F$. In the context of point forecasts aiming to specify a functional
$T$ such as the mean, a quantile or a risk measure, the ideal point forecast is
the respective functional applied to the conditional distribution. This paper
provides a theoretical justification why this ideal forecast is actually a
forecast, that is, an $F$-measurable random variable. To that end,
the appropriate notion of measurability of $T$ is clarified and this
measurability is established for a large class of practically relevant
functionals, including elicitable ones. More generally, the measurability of
$T$ implies the measurability of any point forecast which arises by applying
$T$ to a probabilistic forecast. Similar measurability results are established
for proper scoring rules, the main tool to evaluate the predictive accuracy of
probabilistic forecasts.

arXiv link: http://arxiv.org/abs/2203.08635v1

Econometrics arXiv updated paper (originally submitted: 2022-03-15)

Pairwise Valid Instruments

Authors: Zhenting Sun, Kaspar Wüthrich

Finding valid instruments is difficult. We propose Validity Set Instrumental
Variable (VSIV) estimation, a method for estimating local average treatment
effects (LATEs) in heterogeneous causal effect models when the instruments are
partially invalid. We consider settings with pairwise valid instruments, that
is, instruments that are valid for a subset of instrument value pairs. VSIV
estimation exploits testable implications of instrument validity to remove
invalid pairs and provides estimates of the LATEs for all remaining pairs,
which can be aggregated into a single parameter of interest using
researcher-specified weights. We show that the proposed VSIV estimators are
asymptotically normal under weak conditions and remove or reduce the asymptotic
bias relative to standard LATE estimators (that is, LATE estimators that do not
use testable implications to remove invalid variation). We evaluate the finite
sample properties of VSIV estimation in application-based simulations and apply
our method to estimate the returns to college education using parental
education as an instrument.

arXiv link: http://arxiv.org/abs/2203.08050v5

Econometrics arXiv updated paper (originally submitted: 2022-03-15)

Non-Existent Moments of Earnings Growth

Authors: Silvia Sarpietro, Yuya Sasaki, Yulong Wang

The literature often employs moment-based earnings risk measures like
variance, skewness, and kurtosis. However, under heavy-tailed distributions,
these moments may not exist in the population. Our empirical analysis reveals
that population kurtosis, skewness, and variance often do not exist for the
conditional distribution of earnings growth. This challenges moment-based
analyses. We propose robust conditional Pareto exponents as novel earnings risk
measures, developing estimation and inference methods. Using the UK New
Earnings Survey Panel Dataset (NESPD) and US Panel Study of Income Dynamics
(PSID), we find: 1) Moments often fail to exist; 2) Earnings risk increases
over the life cycle; 3) Job stayers face higher earnings risk; 4) These
patterns persist during the 2007--2008 recession and the 2015--2016 positive
growth period.

arXiv link: http://arxiv.org/abs/2203.08014v3
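
For intuition, a standard Hill estimator of the tail (Pareto) exponent is sketched below on simulated heavy-tailed data. The paper develops conditional Pareto-exponent estimators and the accompanying inference, which this sketch does not reproduce.

```python
# Sketch: Hill estimator of the tail index on simulated heavy-tailed "earnings growth".
import numpy as np

def hill_estimator(x, k):
    """Hill estimate of the tail exponent from the k largest observations of |x|."""
    z = np.sort(np.abs(x))[::-1]
    logs = np.log(z[:k]) - np.log(z[k])
    return 1.0 / np.mean(logs)

rng = np.random.default_rng(5)
growth = rng.standard_t(df=3, size=20000)          # heavy-tailed placeholder data
alpha_hat = hill_estimator(growth, k=500)
print("estimated tail exponent:", round(alpha_hat, 2))   # roughly 3 for t(3) tails
```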

Econometrics arXiv updated paper (originally submitted: 2022-03-13)

Encompassing Tests for Nonparametric Regressions

Authors: Elia Lapenta, Pascal Lavergne

We set up a formal framework to characterize encompassing of nonparametric
models through the L2 distance. We contrast it to previous literature on the
comparison of nonparametric regression models. We then develop testing
procedures for the encompassing hypothesis that are fully nonparametric. Our
test statistics depend on kernel regression, raising the issue of bandwidth's
choice. We investigate two alternative approaches to obtain a "small bias
property" for our test statistics. We show the validity of a wild bootstrap
method. We empirically study the use of a data-driven bandwidth and illustrate
the attractive features of our tests for small and moderate samples.

arXiv link: http://arxiv.org/abs/2203.06685v3

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2022-03-13

Measuring anomalies in cigarette sales by using official data from Spanish provinces: Are there only the anomalies detected by the Empty Pack Surveys (EPS) used by Transnational Tobacco Companies (TTCs)?

Authors: Pedro Cadahia, Antonio A. Golpe, Juan M. Martín Álvarez, E. Asensio

There is literature that questions the veracity of the studies commissioned
by the transnational tobacco companies (TTC) to measure the illicit tobacco
trade. Furthermore, there are studies that indicate that the Empty Pack Surveys
(EPS) ordered by the TTCs are oversized. The novelty of this study is that, in
addition to detecting the anomalies analyzed in the EPSs, it identifies provinces
in which cigarette sales are higher than reasonable values, something that the
TTCs ignore. This study simultaneously analyzed, first, whether the EPSs
established in each of the 47 Spanish provinces were fulfilled. Second,
anomalies observed in provinces where sales exceed expected values are
measured. To achieve the objective of the paper, provincial data on cigarette
sales, price and GDP per capita are used. These data are modeled with machine
learning techniques widely used to detect anomalies in other areas. The results
reveal that the provinces in which sales below reasonable values are observed
(as detected by the EPSs) present a clear geographical pattern. Furthermore,
the values provided by the EPSs in Spain, as indicated in the previous
literature, are slightly oversized. Finally, there are regions bordering other
countries or with a high tourist influence in which the observed sales are
higher than the expected values.

arXiv link: http://arxiv.org/abs/2203.06640v1
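
A minimal sketch of anomaly detection on provincial sales data using an off-the-shelf detector (IsolationForest). The simulated sales, price, and GDP-per-capita values are placeholders for the official data, and the paper's actual modeling choices may differ.

```python
# Sketch: flagging anomalous province-year observations with IsolationForest.
# The simulated sales/price/GDP data below are placeholders for the official data.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(10)
n = 470                                              # e.g. 47 provinces x 10 years
price = rng.uniform(4.0, 5.5, n)
gdp = rng.uniform(15, 35, n)                         # GDP per capita, in thousands
sales = 60 - 5 * price + 0.8 * gdp + rng.normal(0, 2, n)
sales[:10] += 25                                     # inject a few "excess sales" cases

features = np.column_stack([sales, price, gdp])
detector = IsolationForest(contamination=0.05, random_state=0)
flags = detector.fit_predict(features)               # -1 marks an anomalous observation
print("anomalous observations:", np.where(flags == -1)[0])
```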

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-03-11

Synthetic Controls in Action

Authors: Alberto Abadie, Jaume Vives-i-Bastida

In this article we propose a set of simple principles to guide empirical
practice in synthetic control studies. The proposed principles follow from
formal properties of synthetic control estimators, and pertain to the nature,
implications, and prevention of over-fitting biases within a synthetic control
framework, to the interpretability of the results, and to the availability of
validation exercises. We discuss and visually demonstrate the relevance of the
proposed principles under a variety of data configurations.

arXiv link: http://arxiv.org/abs/2203.06279v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2022-03-09

Explainable Machine Learning for Predicting Homicide Clearance in the United States

Authors: Gian Maria Campedelli

Purpose: To explore the potential of Explainable Machine Learning in the
prediction and detection of drivers of cleared homicides at the national- and
state-levels in the United States.
Methods: First, nine algorithmic approaches are compared to assess the best
performance in predicting cleared homicides country-wise, using data from the
Murder Accountability Project. The most accurate algorithm among all (XGBoost)
is then used for predicting clearance outcomes state-wise. Second, SHAP, a
framework for Explainable Artificial Intelligence, is employed to capture the
most important features in explaining clearance patterns both at the national
and state levels.
Results: At the national level, XGBoost achieves the best
performance overall. Substantial predictive variability is detected state-wise.
In terms of explainability, SHAP highlights the relevance of several features
in consistently predicting investigation outcomes. These include homicide
circumstances, weapons, victims' sex and race, as well as number of involved
offenders and victims.
Conclusions: Explainable Machine Learning proves to be a helpful
framework for predicting homicide clearance. SHAP outcomes suggest a more
organic integration of the two theoretical perspectives that have emerged in the
literature. Furthermore, jurisdictional heterogeneity highlights the importance
of developing ad hoc state-level strategies to improve police performance in
clearing homicides.

arXiv link: http://arxiv.org/abs/2203.04768v1
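
A minimal sketch of the XGBoost-plus-SHAP pipeline described above; the synthetic classification data stand in for the Murder Accountability Project features.

```python
# Sketch: gradient-boosted classifier plus SHAP attributions on placeholder data.
import xgboost as xgb
import shap
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Placeholder data standing in for the homicide clearance features
X, y = make_classification(n_samples=5000, n_features=12, n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = xgb.XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1)
clf.fit(X_train, y_train)
print("accuracy:", round(accuracy_score(y_test, clf.predict(X_test)), 3))

explainer = shap.TreeExplainer(clf)
shap_values = explainer.shap_values(X_test)          # per-feature contributions
shap.summary_plot(shap_values, X_test, show=False)   # global feature importance plot
```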

Econometrics arXiv updated paper (originally submitted: 2022-03-08)

On Robust Inference in Time Series Regression

Authors: Richard T. Baillie, Francis X. Diebold, George Kapetanios, Kun Ho Kim, Aaron Mora

Least squares regression with heteroskedasticity consistent standard errors
("OLS-HC regression") has proved very useful in cross section environments.
However, several major difficulties, which are generally overlooked, must be
confronted when transferring the HC technology to time series environments via
heteroskedasticity and autocorrelation consistent standard errors ("OLS-HAC
regression"). First, in plausible time-series environments, OLS parameter
estimates can be inconsistent, so that OLS-HAC inference fails even
asymptotically. Second, most economic time series have autocorrelation, which
renders OLS parameter estimates inefficient. Third, autocorrelation similarly
renders conditional predictions based on OLS parameter estimates inefficient.
Finally, the structure of popular HAC covariance matrix estimators is
ill-suited for capturing the autoregressive autocorrelation typically present
in economic time series, which produces large size distortions and reduced
power in HAC-based hypothesis testing, in all but the largest samples. We show
that all four problems are largely avoided by the use of a simple and
easily-implemented dynamic regression procedure, which we call DURBIN. We
demonstrate the advantages of DURBIN with detailed simulations covering a range
of practical issues.

arXiv link: http://arxiv.org/abs/2203.04080v3
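
For reference, the standard "OLS-HAC regression" whose pitfalls the paper documents looks like the following in statsmodels (Newey-West standard errors); this is the practice being critiqued, not the authors' DURBIN procedure.

```python
# Sketch: OLS with HAC (Newey-West) standard errors on a series with AR(1) errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
T = 400
x = rng.standard_normal(T)
e = np.zeros(T)
for t in range(1, T):                                # AR(1) errors induce autocorrelation
    e[t] = 0.7 * e[t - 1] + rng.standard_normal()
y = 1.0 + 2.0 * x + e

X = sm.add_constant(x)
ols_hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 8})
print(ols_hac.summary().tables[1])
```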

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2022-03-08

Honest calibration assessment for binary outcome predictions

Authors: Timo Dimitriadis, Lutz Duembgen, Alexander Henzi, Marius Puke, Johanna Ziegel

Probability predictions from binary regressions or machine learning methods
ought to be calibrated: If an event is predicted to occur with probability $x$,
it should materialize with approximately that frequency, which means that the
so-called calibration curve $p(\cdot)$ should equal the identity, $p(x) = x$
for all $x$ in the unit interval. We propose honest calibration assessment
based on novel confidence bands for the calibration curve, which are valid
subject only to the natural assumption of isotonicity. Besides testing the
classical
goodness-of-fit null hypothesis of perfect calibration, our bands facilitate
inverted goodness-of-fit tests whose rejection allows for the sought-after
conclusion of a sufficiently well specified model. We show that our bands have
a finite sample coverage guarantee, are narrower than existing approaches, and
adapt to the local smoothness of the calibration curve $p$ and the local
variance of the binary observations. In an application to model predictions of
an infant having a low birth weight, the bounds give informative insights on
model calibration.

arXiv link: http://arxiv.org/abs/2203.04065v2
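
As a point-estimate companion to the bands described above, the calibration curve $p(\cdot)$ can be estimated by isotonic regression. This sketch on simulated predictions shows only the estimate, not the paper's confidence bands.

```python
# Sketch: isotonic estimate of the calibration curve p(x) for binary predictions.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(7)
n = 5000
p_hat = rng.uniform(size=n)                          # model's predicted probabilities
y = rng.binomial(1, p_hat ** 1.3)                    # mildly miscalibrated outcomes

iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
calib = iso.fit(p_hat, y)                            # isotonic calibration curve estimate

grid = np.linspace(0.05, 0.95, 10)
print(np.round(calib.predict(grid), 3))              # compare with the identity p(x) = x
```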

Econometrics arXiv updated paper (originally submitted: 2022-03-08)

When Will Arctic Sea Ice Disappear? Projections of Area, Extent, Thickness, and Volume

Authors: Francis X. Diebold, Glenn D. Rudebusch, Maximilian Goebel, Philippe Goulet Coulombe, Boyuan Zhang

Rapidly diminishing Arctic summer sea ice is a strong signal of the pace of
global climate change. We provide point, interval, and density forecasts for
four measures of Arctic sea ice: area, extent, thickness, and volume.
Importantly, we enforce the joint constraint that these measures must
simultaneously arrive at an ice-free Arctic. We apply this constrained joint
forecast procedure to models relating sea ice to atmospheric carbon dioxide
concentration and models relating sea ice directly to time. The resulting
"carbon-trend" and "time-trend" projections are mutually consistent and predict
a nearly ice-free summer Arctic Ocean by the mid-2030s with an 80% probability.
Moreover, the carbon-trend projections show that global adoption of a lower
carbon path would likely delay the arrival of a seasonally ice-free Arctic by
only a few years.

arXiv link: http://arxiv.org/abs/2203.04040v3

Econometrics arXiv updated paper (originally submitted: 2022-03-07)

Bayesian Bilinear Neural Network for Predicting the Mid-price Dynamics in Limit-Order Book Markets

Authors: Martin Magris, Mostafa Shabani, Alexandros Iosifidis

The prediction of financial markets is a challenging yet important task. In
modern electronically-driven markets, traditional time-series econometric
methods often appear incapable of capturing the true complexity of the
multi-level interactions driving the price dynamics. While recent research has
established the effectiveness of traditional machine learning (ML) models in
financial applications, their intrinsic inability to deal with uncertainties,
which is a great concern in econometrics research and real business
applications, constitutes a major drawback. Bayesian methods naturally appear
as a suitable remedy, combining the predictive ability of ML methods with the
probabilistically-oriented practice of econometric research. By adopting a
state-of-the-art second-order optimization algorithm, we train a Bayesian
bilinear neural network with temporal attention, suitable for the challenging
time-series task of predicting mid-price movements in ultra-high-frequency
limit-order book markets. We thoroughly compare our Bayesian model with
traditional ML alternatives by addressing the use of predictive distributions
to analyze errors and uncertainties associated with the estimated parameters
and model forecasts. Our results underline the feasibility of the Bayesian
deep-learning approach and its predictive and decisional advantages in complex
econometric tasks, prompting future research in this direction.

arXiv link: http://arxiv.org/abs/2203.03613v2

Econometrics arXiv updated paper (originally submitted: 2022-03-07)

Inference in Linear Dyadic Data Models with Network Spillovers

Authors: Nathan Canen, Ko Sugiura

When using dyadic data (i.e., data indexed by pairs of units), researchers
typically assume a linear model, estimate it using Ordinary Least Squares and
conduct inference using “dyadic-robust” variance estimators. The latter
assumes that dyads are uncorrelated if they do not share a common unit (e.g.,
if the same individual is not present in both pairs of data). We show that this
assumption does not hold in many empirical applications because indirect links
may exist due to network connections, generating correlated outcomes. Hence,
“dyadic-robust” estimators can be biased in such situations. We develop a
consistent variance estimator for such contexts by leveraging results in
network statistics. Our estimator has good finite sample properties in
simulations, while allowing for decay in spillover effects. We illustrate our
message with an application to politicians' voting behavior when they are
seating neighbors in the European Parliament.

arXiv link: http://arxiv.org/abs/2203.03497v5

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2022-03-07

High-Resolution Peak Demand Estimation Using Generalized Additive Models and Deep Neural Networks

Authors: Jonathan Berrisch, Michał Narajewski, Florian Ziel

This paper covers predicting high-resolution electricity peak demand features
given lower-resolution data. This is a relevant setup as it answers whether
limited higher-resolution monitoring helps to estimate future high-resolution
peak loads when the high-resolution data is no longer available. That question
is particularly interesting for network operators considering replacing
high-resolution monitoring with predictive models due to economic
considerations. We
propose models to predict half-hourly minima and maxima of high-resolution
(every minute) electricity load data while model inputs are of a lower
resolution (30 minutes). We combine predictions of generalized additive models
(GAM) and deep artificial neural networks (DNN), which are popular in load
forecasting. We extensively analyze the prediction models, including the input
parameters' importance, focusing on load, weather, and seasonal effects. The
proposed method won a data competition organized by Western Power Distribution,
a British distribution network operator. In addition, we provide a rigorous
evaluation study that goes beyond the competition frame to analyze the models'
robustness. The results show that the proposed methods are superior to the
competition benchmark concerning the out-of-sample root mean squared error
(RMSE). This holds regarding the competition month and the supplementary
evaluation study, which covers an additional eleven months. Overall, our
proposed model combination reduces the out-of-sample RMSE by 57.4% compared to
the benchmark.

arXiv link: http://arxiv.org/abs/2203.03342v2

Econometrics arXiv paper, submitted: 2022-03-06

Estimation of a Factor-Augmented Linear Model with Applications Using Student Achievement Data

Authors: Matthew Harding, Carlos Lamarche, Chris Muris

In many longitudinal settings, economic theory does not guide practitioners
on the type of restrictions that must be imposed to solve the rotational
indeterminacy of factor-augmented linear models. We study this problem and
offer several novel results on identification using internally generated
instruments. We propose a new class of estimators and establish large sample
results using recent developments on clustered samples and high-dimensional
models. We carry out simulation studies which show that the proposed approaches
improve the performance of existing methods on the estimation of unknown
factors. Lastly, we consider three empirical applications using administrative
data of students clustered in different subjects in elementary school, high
school and college.

arXiv link: http://arxiv.org/abs/2203.03051v1

Econometrics arXiv updated paper (originally submitted: 2022-03-06)

Modelplasticity and Abductive Decision Making

Authors: Subhadeep Mukhopadhyay

`All models are wrong but some are useful' (George Box 1979). But, how to
find those useful ones starting from an imperfect model? How to make informed
data-driven decisions equipped with an imperfect model? These fundamental
questions appear to be pervasive in virtually all empirical fields -- including
economics, finance, marketing, healthcare, climate change, defense planning,
and operations research. This article presents a modern approach (built on two
core ideas: abductive thinking and the density-sharpening principle) and practical
guidelines to tackle these issues in a systematic manner.

arXiv link: http://arxiv.org/abs/2203.03040v3

Econometrics arXiv paper, submitted: 2022-03-06

Weighted-average quantile regression

Authors: Denis Chetverikov, Yukun Liu, Aleh Tsyvinski

In this paper, we introduce the weighted-average quantile regression
framework, $\int_0^1 q_{Y|X}(u)\psi(u)du = X'\beta$, where $Y$ is a dependent
variable, $X$ is a vector of covariates, $q_{Y|X}$ is the quantile function of
the conditional distribution of $Y$ given $X$, $\psi$ is a weighting function,
and $\beta$ is a vector of parameters. We argue that this framework is of
interest in many applied settings and develop an estimator of the vector of
parameters $\beta$. We show that our estimator is $\sqrt T$-consistent and
asymptotically normal with mean zero and easily estimable covariance matrix,
where $T$ is the size of the available sample. We demonstrate the usefulness of our
estimator by applying it in two empirical settings. In the first setting, we
focus on financial data and study the factor structures of the expected
shortfalls of the industry portfolios. In the second setting, we focus on wage
data and study inequality and social welfare dependence on commonly used
individual characteristics.

arXiv link: http://arxiv.org/abs/2203.03032v1
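
A numerical illustration of the left-hand-side object $\int_0^1 q_{Y|X}(u)\psi(u)du$ using quantile regressions on a grid of $u$; with $\psi(u)=1$ it approximates the conditional mean. This illustrates the target quantity only and is not the estimator developed in the paper.

```python
# Illustration: discretized weighted-average quantile at a covariate value,
# built from quantile regression fits on a grid of u. Not the paper's estimator.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 5000
x = rng.standard_normal(n)
y = 1.0 + 2.0 * x + rng.standard_normal(n)            # simple location model

X = sm.add_constant(x)
grid = np.linspace(0.05, 0.95, 19)
psi = np.ones_like(grid)                               # psi(u) = 1 targets the conditional mean

x0 = np.array([1.0, 0.5])                              # evaluate at x = 0.5
q_vals = np.array([x0 @ sm.QuantReg(y, X).fit(q=u).params for u in grid])

waq = np.sum(q_vals * psi) / np.sum(psi)               # discretized integral over the grid
print("weighted-average quantile at x = 0.5:", round(waq, 3))   # roughly 1 + 2 * 0.5 = 2
```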

Econometrics arXiv paper, submitted: 2022-03-04

Latent Unbalancedness in Three-Way Gravity Models

Authors: Daniel Czarnowske, Amrei Stammann

Many panel data sets used for pseudo-poisson estimation of three-way gravity
models are implicitly unbalanced because uninformative observations are
redundant for the estimation. We show with real data as well as simulations
that this phenomenon, which we call latent unbalancedness, amplifies the
inference problem recently studied by Weidner and Zylkin (2021).

arXiv link: http://arxiv.org/abs/2203.02235v1

Econometrics arXiv paper, submitted: 2022-03-04

A Classifier-Lasso Approach for Estimating Production Functions with Latent Group Structures

Authors: Daniel Czarnowske

I present a new estimation procedure for production functions with latent
group structures. I consider production functions that are heterogeneous across
groups but time-homogeneous within groups, and where the group membership of
the firms is unknown. My estimation procedure is fully data-driven and embeds
recent identification strategies from the production function literature into
the classifier-Lasso. Simulation experiments demonstrate that firms are
assigned to their correct latent group with probability close to one. I apply
my estimation procedure to a panel of Chilean firms and find sizable
differences in the estimates compared to the standard approach of
classification by industry.

arXiv link: http://arxiv.org/abs/2203.02220v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2022-03-02

A Modern Gauss-Markov Theorem? Really?

Authors: Benedikt M. Pötscher, David Preinerstorfer

We show that the theorems in Hansen (2021a) (the version accepted by
Econometrica), except for one, are not new as they coincide with classical
theorems like the good old Gauss-Markov or Aitken Theorem, respectively; the
exceptional theorem is incorrect. Hansen (2021b) corrects this theorem. As a
result, all theorems in the latter version coincide with the above mentioned
classical theorems. Furthermore, we also show that the theorems in Hansen
(2022) (the version published in Econometrica) either coincide with the
classical theorems just mentioned, or contain extra assumptions that are alien
to the Gauss-Markov or Aitken Theorem.

arXiv link: http://arxiv.org/abs/2203.01425v5

Econometrics arXiv paper, submitted: 2022-03-01

Minimax Risk in Estimating Kink Threshold and Testing Continuity

Authors: Javier Hidalgo, Heejun Lee, Jungyoon Lee, Myung Hwan Seo

We derive a risk lower bound in estimating the threshold parameter without
knowing whether the threshold regression model is continuous or not. The bound
goes to zero as the sample size $ n $ grows only at the cube root rate.
Motivated by this finding, we develop a continuity test for the threshold
regression model and a bootstrap to compute its p-values. The validity
of the bootstrap is established, and its finite sample property is explored
through Monte Carlo simulations.

arXiv link: http://arxiv.org/abs/2203.00349v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-02-28

Estimating causal effects with optimization-based methods: A review and empirical comparison

Authors: Martin Cousineau, Vedat Verter, Susan A. Murphy, Joelle Pineau

In the absence of randomized controlled and natural experiments, it is
necessary to balance the distributions of (observable) covariates of the
treated and control groups in order to obtain an unbiased estimate of a causal
effect of interest; otherwise, a different effect size may be estimated, and
incorrect recommendations may be given. To achieve this balance, there exist a
wide variety of methods. In particular, several methods based on optimization
models have been recently proposed in the causal inference literature. While
these optimization-based methods empirically showed an improvement over a
limited number of other causal inference methods in their relative ability to
balance the distributions of covariates and to estimate causal effects, they
have not been thoroughly compared to each other and to other noteworthy causal
inference methods. In addition, we believe that there exist several unaddressed
opportunities for operational researchers to contribute their advanced
knowledge of optimization, for the benefit of the applied researchers who use
causal inference tools. In this review paper, we present an overview of the
causal inference literature and describe in more detail the optimization-based
causal inference methods, provide a comparative analysis of the prevailing
optimization-based methods, and discuss opportunities for new methods.

arXiv link: http://arxiv.org/abs/2203.00097v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-02-28

Dynamic Spatiotemporal ARCH Models

Authors: Philipp Otto, Osman Doğan, Süleyman Taşpınar

Geo-referenced data are characterized by an inherent spatial dependence due
to the geographical proximity. In this paper, we introduce a dynamic
spatiotemporal autoregressive conditional heteroscedasticity (ARCH) process to
describe the effects of (i) the log-squared time-lagged outcome variable, i.e.,
the temporal effect, (ii) the spatial lag of the log-squared outcome variable,
i.e., the spatial effect, and (iii) the spatial lag of the log-squared
time-lagged outcome variable, i.e., the spatiotemporal effect, on the
volatility of an outcome variable. Furthermore, our suggested process allows
for the fixed effects over time and space to account for the unobserved
heterogeneity. For this dynamic spatiotemporal ARCH model, we derive a
generalized method of moments (GMM) estimator based on the linear and quadratic
moment conditions of a specific transformation. We show the consistency and
asymptotic normality of the GMM estimator, and determine the best set of moment
functions. We investigate the finite-sample properties of the proposed GMM
estimator in a series of Monte-Carlo simulations with different model
specifications and error distributions. Our simulation results show that our
suggested GMM estimator has good finite sample properties. In an empirical
application, we use monthly log-returns of the average condominium prices of
each postcode of Berlin from 1995 to 2015 (190 spatial units, 240 time points)
to demonstrate the use of our suggested model. Our estimation results show that
the temporal, spatial and spatiotemporal lags of the log-squared returns have
statistically significant effects on the volatility of the log-returns.

arXiv link: http://arxiv.org/abs/2202.13856v1

Econometrics arXiv paper, submitted: 2022-02-28

Forecasting US Inflation Using Bayesian Nonparametric Models

Authors: Todd E. Clark, Florian Huber, Gary Koop, Massimiliano Marcellino

The relationship between inflation and predictors such as unemployment is
potentially nonlinear with a strength that varies over time, and prediction
errors may be subject to large, asymmetric shocks. Inspired by these
concerns, we develop a model for inflation forecasting that is nonparametric
both in the conditional mean and in the error using Gaussian and Dirichlet
processes, respectively. We discuss how both these features may be important in
producing accurate forecasts of inflation. In a forecasting exercise involving
CPI inflation, we find that our approach has substantial benefits, both overall
and in the left tail, with nonparametric modeling of the conditional mean being
of particular importance.

arXiv link: http://arxiv.org/abs/2202.13793v1

Econometrics arXiv updated paper (originally submitted: 2022-02-28)

Personalized Subsidy Rules

Authors: Yu-Chang Chen, Haitian Xie

Subsidies are commonly used to encourage behaviors that can lead to short- or
long-term benefits. Typical examples include subsidized job training programs
and provisions of preventive health products, in which both behavioral
responses and associated gains can exhibit heterogeneity. This study uses the
marginal treatment effect (MTE) framework to study personalized assignments of
subsidies based on individual characteristics. First, we derive the optimality
condition for a welfare-maximizing subsidy rule by showing that the welfare can
be represented as a function of the MTE. Next, we show that subsidies generally
result in better welfare than directly mandating the encouraged behavior
because subsidy rules implicitly target individuals through unobserved
heterogeneity in the behavioral response. When there is positive selection,
that is, when individuals with higher returns are more likely to select the
encouraged behavior, the optimal subsidy rule achieves the first-best welfare,
which is the optimal welfare if a policy-maker can observe individuals' private
information. We then provide methods to (partially) identify the optimal
subsidy rule when the MTE is identified and when it is not. In particular,
positive selection allows for point identification of the optimal subsidy rule
even when the MTE curve itself is not identified. As an empirical application,
we study the
optimal wage subsidy using the experimental data from the Jordan New
Opportunities for Women pilot study.

arXiv link: http://arxiv.org/abs/2202.13545v2

Econometrics arXiv updated paper (originally submitted: 2022-02-25)

Variational inference for large Bayesian vector autoregressions

Authors: Mauro Bernardi, Daniele Bianchi, Nicolas Bianco

We propose a novel variational Bayes approach to estimate high-dimensional
vector autoregression (VAR) models with hierarchical shrinkage priors. Our
approach does not rely on a conventional structural VAR representation of the
parameter space for posterior inference. Instead, we elicit hierarchical
shrinkage priors directly on the matrix of regression coefficients so that (1)
the prior structure directly maps into posterior inference on the reduced-form
transition matrix, and (2) posterior estimates are more robust to variables
permutation. An extensive simulation study provides evidence that our approach
compares favourably against existing linear and non-linear Markov Chain Monte
Carlo and variational Bayes methods. We investigate both the statistical and
economic value of the forecasts from our variational inference approach within
the context of a mean-variance investor allocating her wealth in a large set of
different industry portfolios. The results show that more accurate estimates
translate into substantial statistical and economic out-of-sample gains. The
results hold across different hierarchical shrinkage priors and model
dimensions.

arXiv link: http://arxiv.org/abs/2202.12644v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-02-25

A general characterization of optimal tie-breaker designs

Authors: Harrison H. Li, Art B. Owen

Tie-breaker designs trade off a statistical design objective with short-term
gain from preferentially assigning a binary treatment to those with high values
of a running variable $x$. The design objective is any continuous function of
the expected information matrix in a two-line regression model, and short-term
gain is expressed as the covariance between the running variable and the
treatment indicator. We investigate how to specify design functions indicating
treatment probabilities as a function of $x$ to optimize these competing
objectives, under external constraints on the number of subjects receiving
treatment. Our results include sharp existence and uniqueness guarantees, while
accommodating the ethically appealing requirement that treatment probabilities
are non-decreasing in $x$. Under such a constraint, there always exists an
optimal design function that is constant below and above a single
discontinuity. When the running variable distribution is not symmetric or the
fraction of subjects receiving the treatment is not $1/2$, our optimal designs
improve upon a $D$-optimality objective without sacrificing short-term gain,
compared to the three level tie-breaker designs of Owen and Varian (2020) that
fix treatment probabilities at $0$, $1/2$, and $1$. We illustrate our optimal
designs with data from Head Start, an early childhood government intervention
program.

arXiv link: http://arxiv.org/abs/2202.12511v2

Econometrics arXiv updated paper (originally submitted: 2022-02-25)

Fast variational Bayes methods for multinomial probit models

Authors: Rubén Loaiza-Maya, Didier Nibbering

The multinomial probit model is often used to analyze choice behaviour.
However, estimation with existing Markov chain Monte Carlo (MCMC) methods is
computationally costly, which limits its applicability to large choice data
sets. This paper proposes a variational Bayes method that is accurate and fast,
even when a large number of choice alternatives and observations are
considered. Variational methods usually require an analytical expression for
the unnormalized posterior density and an adequate choice of variational
family. Both are challenging to specify in a multinomial probit, which has a
posterior that requires identifying restrictions and is augmented with a large
set of latent utilities. We employ a spherical transformation on the covariance
matrix of the latent utilities to construct an unnormalized augmented posterior
that identifies the parameters, and use the conditional posterior of the latent
utilities as part of the variational family. The proposed method is faster than
MCMC, and can be made scalable to both a large number of choice alternatives
and a large number of observations. The accuracy and scalability of our method
is illustrated in numerical experiments and real purchase data with one million
observations.

arXiv link: http://arxiv.org/abs/2202.12495v2

Econometrics arXiv paper, submitted: 2022-02-24

Confidence Intervals of Treatment Effects in Panel Data Models with Interactive Fixed Effects

Authors: Xingyu Li, Yan Shen, Qiankun Zhou

We consider the construction of confidence intervals for treatment effects
estimated using panel models with interactive fixed effects. We first use the
factor-based matrix completion technique proposed by Bai and Ng (2021) to
estimate the treatment effects, and then use bootstrap method to construct
confidence intervals of the treatment effects for treated units at each
post-treatment period. Our construction of confidence intervals requires
neither specific distributional assumptions on the error terms nor large number
of post-treatment periods. We also establish the validity of the proposed
bootstrap procedure that these confidence intervals have asymptotically correct
coverage probabilities. Simulation studies show that these confidence intervals
have satisfactory finite sample performances, and empirical applications using
classical datasets yield treatment effect estimates of similar magnitudes and
reliable confidence intervals.

arXiv link: http://arxiv.org/abs/2202.12078v1

Econometrics arXiv updated paper (originally submitted: 2022-02-24)

Semiparametric Estimation of Dynamic Binary Choice Panel Data Models

Authors: Fu Ouyang, Thomas Tao Yang

We propose a new approach to the semiparametric analysis of panel data binary
choice models with fixed effects and dynamics (lagged dependent variables). The
model we consider has the same random utility framework as in Honore and
Kyriazidou (2000). We demonstrate that, with additional serial dependence
conditions on the process of deterministic utility and tail restrictions on the
error distribution, the (point) identification of the model can proceed in two
steps, and only requires matching the value of an index function of explanatory
variables over time, as opposed to that of each explanatory variable. Our
identification approach motivates an easily implementable, two-step maximum
score (2SMS) procedure -- producing estimators whose rates of convergence, in
contrast to Honore and Kyriazidou's (2000) methods, are independent of the
model dimension. We then derive the asymptotic properties of the 2SMS procedure
and propose bootstrap-based distributional approximations for inference. Monte
Carlo evidence indicates that our procedure performs adequately in finite
samples.

arXiv link: http://arxiv.org/abs/2202.12062v4

Econometrics arXiv updated paper (originally submitted: 2022-02-23)

Distributional Counterfactual Analysis in High-Dimensional Setup

Authors: Ricardo Masini

In the context of treatment effect estimation, this paper proposes a new
methodology to recover the counterfactual distribution when there is a single
(or a few) treated unit and possibly a high-dimensional number of potential
controls observed in a panel structure. The methodology accommodates, but
does not require, a number of units larger than the number of time
periods (high-dimensional setup). As opposed to modeling only the conditional
mean, we propose to model the entire conditional quantile function (CQF)
without intervention and estimate it using the pre-intervention period by a
l1-penalized regression. We derive non-asymptotic bounds for the estimated CQF
valid uniformly over the quantiles. The bounds are explicit in terms of the
number of time periods, the number of control units, the weak dependence
coefficient (beta-mixing), and the tail decay of the random variables. The
results allow practitioners to re-construct the entire counterfactual
distribution. Moreover, we bound the probability coverage of this estimated
CQF, which can be used to construct valid confidence intervals for the
(possibly random) treatment effect for every post-intervention period. We also
propose a new hypothesis test for the sharp null of no-effect based on the Lp
norm of deviation of the estimated CQF to the population one. Interestingly,
the null distribution is quasi-pivotal in the sense that it only depends on the
estimated CQF, Lp norm, and the number of post-intervention periods, but not on
the size of the post-intervention period. For that reason, critical values can
then be easily simulated. We illustrate the methodology by revisiting the
empirical study in Acemoglu, Johnson, Kermani, Kwak and Mitton (2016).

arXiv link: http://arxiv.org/abs/2202.11671v2
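
A minimal sketch of the first step described above: l1-penalized conditional quantile fits of the treated unit on the controls over the pre-intervention period, here with scikit-learn's QuantileRegressor on simulated data. The paper's uniform bounds, coverage results, and hypothesis test are not reproduced.

```python
# Sketch: l1-penalized conditional quantile fits on the pre-intervention period,
# the building block of the counterfactual CQF described above.
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(9)
T0, J = 80, 50                                       # pre-treatment periods, control units
controls = rng.standard_normal((T0, J))
treated = controls[:, :3] @ np.array([0.5, 0.3, 0.2]) + 0.2 * rng.standard_normal(T0)

taus = np.linspace(0.1, 0.9, 9)
cqf = {}
for tau in taus:
    qr = QuantileRegressor(quantile=tau, alpha=0.05, solver="highs")
    cqf[tau] = qr.fit(controls, treated)             # counterfactual CQF at quantile tau

x_post = rng.standard_normal(J)                      # controls in a post-treatment period
print({round(t, 1): round(m.predict(x_post.reshape(1, -1))[0], 3) for t, m in cqf.items()})
```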

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2022-02-22

Differentially Private Estimation of Heterogeneous Causal Effects

Authors: Fengshi Niu, Harsha Nori, Brian Quistorff, Rich Caruana, Donald Ngwe, Aadharsh Kannan

Estimating heterogeneous treatment effects in domains such as healthcare or
social science often involves sensitive data where protecting privacy is
important. We introduce a general meta-algorithm for estimating conditional
average treatment effects (CATE) with differential privacy (DP) guarantees. Our
meta-algorithm can work with simple, single-stage CATE estimators such as
S-learner and more complex multi-stage estimators such as DR and R-learner. We
perform a tight privacy analysis by taking advantage of sample splitting in our
meta-algorithm and the parallel composition property of differential privacy.
In this paper, we implement our approach using DP-EBMs as the base learner.
DP-EBMs are interpretable, high-accuracy models with privacy guarantees, which
allow us to directly observe the impact of DP noise on the learned causal
model. Our experiments show that multi-stage CATE estimators incur larger
accuracy loss than single-stage CATE or ATE estimators and that most of the
accuracy loss from differential privacy is due to an increase in variance, not
biased estimates of treatment effects.

arXiv link: http://arxiv.org/abs/2202.11043v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-02-21

Multivariate Tie-breaker Designs

Authors: Tim P. Morrison, Art B. Owen

In a tie-breaker design (TBD), subjects with high values of a running
variable are given some (usually desirable) treatment, subjects with low values
are not, and subjects in the middle are randomized. TBDs are intermediate
between regression discontinuity designs (RDDs) and randomized controlled
trials (RCTs). TBDs allow a tradeoff between the resource allocation efficiency
of an RDD and the statistical efficiency of an RCT. We study a model where the
expected response is one multivariate regression for treated subjects and
another for control subjects. We propose a prospective D-optimality, analogous
to Bayesian optimal design, to understand design tradeoffs without reference to
a specific data set. For given covariates, we show how to use convex
optimization to choose treatment probabilities that optimize this criterion. We
can incorporate a variety of constraints motivated by economic and ethical
considerations. In our model, D-optimality for the treatment effect coincides
with D-optimality for the whole regression, and, without constraints, an RCT is
globally optimal. We show that a monotonicity constraint favoring more
deserving subjects induces sparsity in the number of distinct treatment
probabilities. We apply the convex optimization solution to a semi-synthetic
example involving triage data from the MIMIC-IV-ED database.

arXiv link: http://arxiv.org/abs/2202.10030v5

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2022-02-20

Score Driven Generalized Fitness Model for Sparse and Weighted Temporal Networks

Authors: Domenico Di Gangi, Giacomo Bormetti, Fabrizio Lillo

While the vast majority of the literature on models for temporal networks
focuses on binary graphs, often one can associate a weight to each link. In
such cases the data are better described by a weighted, or valued, network. An
important well known fact is that real world weighted networks are typically
sparse. We propose a novel time varying parameter model for sparse and weighted
temporal networks as a combination of the fitness model, appropriately
extended, and the score driven framework. We consider a zero augmented
generalized linear model to handle the weights and an observation driven
approach to describe time varying parameters. The result is a flexible approach
where the probability that a link exists is independent of its expected
weight. This represents a crucial difference with alternative specifications
proposed in the recent literature, with relevant implications for the
flexibility of the model.
Our approach also accommodates the dependence of the network dynamics on
external variables. We present a link forecasting analysis to data describing
the overnight exposures in the Euro interbank market and investigate whether
the influence of EONIA rates on the interbank network dynamics has changed over
time.

arXiv link: http://arxiv.org/abs/2202.09854v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-02-20

A Unified Nonparametric Test of Transformations on Distribution Functions with Nuisance Parameters

Authors: Xingyu Li, Xiaojun Song, Zhenting Sun

This paper proposes a simple unified approach to testing transformations on
cumulative distribution functions (CDFs) in the presence of nuisance
parameters. The proposed test is constructed based on a new characterization
that avoids the estimation of nuisance parameters. The critical values are
obtained through a numerical bootstrap method which can easily be implemented
in practice. Under suitable conditions, the proposed test is shown to be
asymptotically size controlled and consistent. The local power property of the
test is established. Finally, Monte Carlo simulations and an empirical study
show that the test performs well on finite samples.

arXiv link: http://arxiv.org/abs/2202.11031v2

Econometrics arXiv paper, submitted: 2022-02-18

Long Run Risk in Stationary Structural Vector Autoregressive Models

Authors: Christian Gourieroux, Joann Jasiak

This paper introduces a local-to-unity/small sigma process for a stationary
time series with strong persistence and non-negligible long run risk. This
process represents the stationary long run component in an unobserved short-
and long-run components model involving different time scales. More
specifically, the short run component evolves in the calendar time and the long
run component evolves in an ultra long time scale. We develop the methods of
estimation and long run prediction for the univariate and multivariate
Structural VAR (SVAR) models with unobserved components and reveal the
impossibility of consistently estimating some of the long run parameters. The
approach is illustrated by a Monte-Carlo study and an application to
macroeconomic data.

arXiv link: http://arxiv.org/abs/2202.09473v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2022-02-18

A multivariate extension of the Misspecification-Resistant Information Criterion

Authors: Gery Andrés Díaz Rubio, Simone Giannerini, Greta Goracci

The Misspecification-Resistant Information Criterion (MRIC) proposed in
[H.-L. Hsu, C.-K. Ing, H. Tong: On model selection from a finite family of
possibly misspecified time series models. The Annals of Statistics. 47 (2),
1061--1087 (2019)] is a model selection criterion for univariate parametric
time series that enjoys both consistency and asymptotic efficiency. In this
article, we extend the MRIC to the case where the response
is a multivariate time series and the predictor is univariate. The extension
requires novel derivations based upon random matrix theory. We obtain an
asymptotic expression for the mean squared prediction error matrix and the
vectorial MRIC, and prove the consistency of its method-of-moments estimator.
Moreover, we prove its asymptotic efficiency. Finally, we show with an example
that, in the presence of misspecification, the vectorial MRIC identifies the best
predictive model whereas traditional information criteria like AIC or BIC fail
to achieve the task.

arXiv link: http://arxiv.org/abs/2202.09225v1

Econometrics arXiv cross-link from cs.AI (cs.AI), submitted: 2022-02-17

Counterfactual Analysis of the Impact of the IMF Program on Child Poverty in the Global-South Region using Causal-Graphical Normalizing Flows

Authors: Sourabh Balgi, Jose M. Peña, Adel Daoud

This work demonstrates the application of a particular branch of causal
inference and deep learning models: causal-Graphical Normalizing Flows
(c-GNFs). In a recent contribution, scholars showed that normalizing flows
carry certain properties, making them particularly suitable for causal and
counterfactual analysis. However, c-GNFs have only been tested in a simulated
data setting, and no contribution to date has evaluated the application of
c-GNFs to large-scale real-world data. Focusing on AI for social good, our
study provides a counterfactual analysis of the impact of the International
Monetary Fund (IMF) program on child poverty using c-GNFs. The analysis relies
on large-scale real-world observational data: 1,941,734 children under the age
of 18, cared for by 567,344 families residing in 67 countries of the Global
South. While the primary objective of the IMF is to support governments in
achieving economic stability, our results find that an IMF program reduces
child poverty as a positive side-effect by about 1.2$\pm$0.24 degrees (`0'
equals no poverty and `7' is maximum poverty). Thus,
our article shows how c-GNFs further the use of deep learning and causal
inference in AI for social good. It shows how learning algorithms can be used
to address the untapped potential for significant social impact through
counterfactual inference at the population level (ACE), sub-population level
(CACE), and individual level (ICE). In contrast to most works that model ACE or
CACE but not ICE, c-GNFs enable personalization using `The First Law of
Causal Inference'.

arXiv link: http://arxiv.org/abs/2202.09391v1

Econometrics arXiv updated paper (originally submitted: 2022-02-17)

Synthetic Control As Online Linear Regression

Authors: Jiafeng Chen

This paper notes a simple connection between synthetic control and online
learning. Specifically, we recognize synthetic control as an instance of
Follow-The-Leader (FTL). Standard results in online convex optimization then
imply that, even when outcomes are chosen by an adversary, synthetic control
predictions of counterfactual outcomes for the treated unit perform almost as
well as an oracle weighted average of control units' outcomes. Synthetic
control on differenced data performs almost as well as oracle weighted
difference-in-differences, potentially making it an attractive choice in
practice. We argue that this observation further supports the use of synthetic
control estimators in comparative case studies.

arXiv link: http://arxiv.org/abs/2202.08426v2
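
The connection can be made concrete with a small sketch: computing synthetic
control weights by simplex-constrained least squares on the pre-treatment
periods can be read as a Follow-The-Leader step on the cumulative squared loss.
The data-generating process and the `ftl_weights` helper below are illustrative
assumptions, not code from the paper.

```python
# A minimal sketch (not the paper's code): synthetic control weights as a
# Follow-The-Leader step on pre-treatment periods, i.e. simplex-constrained
# least squares of the treated unit's outcomes on the donor units' outcomes.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
T0, J = 30, 10                                   # pre-treatment periods, donor units
Y_controls = rng.normal(size=(T0, J))            # donor-pool outcomes (toy data)
true_w = np.array([0.5, 0.3, 0.2] + [0.0] * (J - 3))
y_treated = Y_controls @ true_w + 0.1 * rng.normal(size=T0)

def ftl_weights(Y, y):
    """Follow-The-Leader step: minimize cumulative squared loss over the simplex."""
    J = Y.shape[1]
    objective = lambda w: np.sum((y - Y @ w) ** 2)
    simplex = ({"type": "eq", "fun": lambda w: np.sum(w) - 1.0},)
    res = minimize(objective, x0=np.full(J, 1.0 / J), method="SLSQP",
                   bounds=[(0.0, 1.0)] * J, constraints=simplex)
    return res.x

w_hat = ftl_weights(Y_controls, y_treated)
print("estimated weights:", np.round(w_hat, 2))
# A post-treatment counterfactual prediction is then y_post_controls @ w_hat.
```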

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2022-02-16

CAREER: A Foundation Model for Labor Sequence Data

Authors: Keyon Vafa, Emil Palikot, Tianyu Du, Ayush Kanodia, Susan Athey, David M. Blei

Labor economists regularly analyze employment data by fitting predictive
models to small, carefully constructed longitudinal survey datasets. Although
machine learning methods offer promise for such problems, these survey datasets
are too small to take advantage of them. In recent years large datasets of
online resumes have also become available, providing data about the career
trajectories of millions of individuals. However, standard econometric models
cannot take advantage of their scale or incorporate them into the analysis of
survey data. To this end we develop CAREER, a foundation model for job
sequences. CAREER is first fit to large, passively-collected resume data and
then fine-tuned to smaller, better-curated datasets for economic inferences. We
fit CAREER to a dataset of 24 million job sequences from resumes and fine-tune
it on small longitudinal survey datasets. We find that CAREER forms accurate
predictions of job sequences, outperforming econometric baselines on three
widely-used economics datasets. We further find that CAREER can be used to form
good predictions of other downstream variables. For example, incorporating
CAREER into a wage model provides better predictions than the econometric
models currently in use.

arXiv link: http://arxiv.org/abs/2202.08370v4

Econometrics arXiv paper, submitted: 2022-02-16

Fairness constraint in Structural Econometrics and Application to fair estimation using Instrumental Variables

Authors: Samuele Centorrino, Jean-Pierre Florens, Jean-Michel Loubes

A supervised machine learning algorithm determines a model from a learning
sample that will be used to predict new observations. To this end, it
aggregates individual characteristics of the observations of the learning
sample. But this information aggregation does not consider any potential
selection on unobservables or any status-quo biases that may be contained in
the training sample. The latter bias has raised concerns around the so-called
fairness of machine learning algorithms, especially towards
disadvantaged groups. In this chapter, we review the issue of fairness in
machine learning through the lens of structural econometric models in which
the unknown index is the solution of a functional equation and issues of
endogeneity are explicitly accounted for. We model fairness as a linear
operator whose null space contains the set of strictly {\it fair} indexes. A
{\it fair} solution is obtained by projecting the unconstrained index into the
null space of this operator or by directly finding the closest solution of the
functional equation into this null space. We also acknowledge that policymakers
may incur a cost when moving away from the status quo. Approximate fairness is
achieved by introducing a fairness penalty in the learning procedure and
balancing, more or less heavily, the influence of the status quo against that
of a fully fair solution.

arXiv link: http://arxiv.org/abs/2202.08977v1

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2022-02-15

An Equilibrium Model of the First-Price Auction with Strategic Uncertainty: Theory and Empirics

Authors: Bernhard Kasberger

In many first-price auctions, bidders face considerable strategic
uncertainty: They cannot perfectly anticipate the other bidders' bidding
behavior. We propose a model in which bidders do not know the entire
distribution of opponent bids but only the expected (winning) bid and lower and
upper bounds on the opponent bids. We characterize the optimal bidding
strategies and prove the existence of equilibrium beliefs. Finally, we apply
the model to estimate the cost distribution in highway procurement auctions and
find good performance out-of-sample.

arXiv link: http://arxiv.org/abs/2202.07517v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-02-15

Long-term Causal Inference Under Persistent Confounding via Data Combination

Authors: Guido Imbens, Nathan Kallus, Xiaojie Mao, Yuhao Wang

We study the identification and estimation of long-term treatment effects
when both experimental and observational data are available. Since the
long-term outcome is observed only after a long delay, it is not measured in
the experimental data, but only recorded in the observational data. However,
both types of data include observations of some short-term outcomes. In this
paper, we uniquely tackle the challenge of persistent unmeasured confounders,
i.e., some unmeasured confounders that can simultaneously affect the treatment,
short-term outcomes and the long-term outcome, noting that they invalidate
identification strategies in previous literature. To address this challenge, we
exploit the sequential structure of multiple short-term outcomes, and develop
three novel identification strategies for the average long-term treatment
effect. We further propose three corresponding estimators and prove their
asymptotic consistency and asymptotic normality. We finally apply our methods
to estimate the effect of a job training program on long-term employment using
semi-synthetic data. We numerically show that our proposals outperform existing
methods that fail to handle persistent confounders.

arXiv link: http://arxiv.org/abs/2202.07234v5

Econometrics arXiv updated paper (originally submitted: 2022-02-15)

Asymptotics of Cointegration Tests for High-Dimensional VAR($k$)

Authors: Anna Bykhovskaya, Vadim Gorin

The paper studies nonstationary high-dimensional vector autoregressions of
order $k$, VAR($k$). Additional deterministic terms such as trend or
seasonality are allowed. The number of time periods, $T$, and the number of
coordinates, $N$, are assumed to be large and of the same order. Under this
regime the first-order asymptotics of the Johansen likelihood ratio (LR),
Pillai-Bartlett, and Hotelling-Lawley tests for cointegration are derived: the
test statistics converge to nonrandom integrals. For more refined analysis, the
paper proposes and analyzes a modification of the Johansen test. The new test
for the absence of cointegration converges to the partial sum of the Airy$_1$
point process. Supporting Monte Carlo simulations indicate that the same
behavior persists universally in many situations beyond those considered in our
theorems.
The paper presents empirical implementations of the approach for the analysis
of S&P 100 stocks and of cryptocurrencies. The latter example has a strong
presence of multiple cointegrating relationships, while the results for the
former are consistent with the null of no cointegration.

arXiv link: http://arxiv.org/abs/2202.07150v4
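
For readers who want a baseline to compare against, the sketch below runs the
classical Johansen procedure on simulated random walks using statsmodels'
`coint_johansen`; the modified test and the Airy$_1$ asymptotics developed in
the paper are not implemented here, and the simulated data are purely
illustrative.

```python
# A baseline sketch only: the classical Johansen procedure via statsmodels,
# applied to simulated independent random walks (true cointegration rank 0).
# The modified test with Airy_1 asymptotics developed in the paper is not
# implemented here.
import numpy as np
from statsmodels.tsa.vector_ar.vecm import coint_johansen

rng = np.random.default_rng(1)
T, N = 500, 4
data = np.cumsum(rng.normal(size=(T, N)), axis=0)   # N independent random walks

res = coint_johansen(data, det_order=0, k_ar_diff=1)
print("trace statistics:", np.round(res.lr1, 2))
print("90/95/99% critical values:\n", res.cvt)
# The rank estimate is the first r whose trace statistic falls below its
# critical value; under this simulated null, r = 0 should typically be chosen.
```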

Econometrics arXiv paper, submitted: 2022-02-14

Sequential Monte Carlo With Model Tempering

Authors: Marko Mlikota, Frank Schorfheide

Modern macroeconometrics often relies on time series models for which it is
time-consuming to evaluate the likelihood function. We demonstrate how Bayesian
computations for such models can be drastically accelerated by reweighting and
mutating posterior draws from an approximating model that allows for fast
likelihood evaluations, into posterior draws from the model of interest, using
a sequential Monte Carlo (SMC) algorithm. We apply the technique to the
estimation of a vector autoregression with stochastic volatility and a
nonlinear dynamic stochastic general equilibrium model. The runtime reductions
we obtain range from 27% to 88%.

arXiv link: http://arxiv.org/abs/2202.07070v1
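
A toy version of the reweight-resample-mutate idea is sketched below for two
one-dimensional Gaussian "posteriors": draws from a fast approximating density
are bridged to the target through geometric tempering. The densities, tempering
schedule, and mutation step size are illustrative assumptions, not the authors'
algorithm or models.

```python
# A toy sketch of model tempering (not the authors' implementation): posterior
# draws from a fast approximating model p0 are bridged to the target posterior
# p1 through densities proportional to p0^(1-phi) * p1^phi, using reweighting,
# resampling, and a random-walk Metropolis mutation at each stage.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

logp0 = lambda th: stats.norm(0.0, 1.0).logpdf(th)   # approximating posterior
logp1 = lambda th: stats.norm(1.5, 0.5).logpdf(th)   # "expensive" target posterior
log_bridge = lambda th, phi: (1 - phi) * logp0(th) + phi * logp1(th)

M = 2000
theta = rng.normal(0.0, 1.0, size=M)                 # draws from p0
phis = np.linspace(0.0, 1.0, 21)                     # tempering schedule

for phi_prev, phi in zip(phis[:-1], phis[1:]):
    # 1) reweight by the incremental ratio between consecutive bridge densities
    logw = log_bridge(theta, phi) - log_bridge(theta, phi_prev)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # 2) resample
    theta = theta[rng.choice(M, size=M, p=w)]
    # 3) mutate with one random-walk Metropolis step targeting the current bridge
    prop = theta + 0.3 * rng.normal(size=M)
    accept = np.log(rng.uniform(size=M)) < log_bridge(prop, phi) - log_bridge(theta, phi)
    theta = np.where(accept, prop, theta)

print("posterior mean/std:", theta.mean().round(3), theta.std().round(3))  # ~1.5, ~0.5
```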

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-02-12

scpi: Uncertainty Quantification for Synthetic Control Methods

Authors: Matias D. Cattaneo, Yingjie Feng, Filippo Palomba, Rocio Titiunik

The synthetic control method offers a way to quantify the effect of an
intervention using weighted averages of untreated units to approximate the
counterfactual outcome that the treated unit(s) would have experienced in the
absence of the intervention. This method is useful for program evaluation and
causal inference in observational studies. We introduce the software package
scpi for prediction and inference using synthetic controls, implemented in
Python, R, and Stata. For point estimation or prediction of treatment effects,
the package offers an array of (possibly penalized) approaches leveraging the
latest optimization methods. For uncertainty quantification, the package offers
the prediction interval methods introduced by Cattaneo, Feng and Titiunik
(2021) and Cattaneo, Feng, Palomba and Titiunik (2022). The paper includes
numerical illustrations and a comparison with other synthetic control software.

arXiv link: http://arxiv.org/abs/2202.05984v3

Econometrics arXiv updated paper (originally submitted: 2022-02-10)

Benign-Overfitting in Conditional Average Treatment Effect Prediction with Linear Regression

Authors: Masahiro Kato, Masaaki Imaizumi

We study benign overfitting in the prediction of the conditional average
treatment effect (CATE) with linear regression models. With the development of
machine learning for causal inference, a wide range of large-scale models for
causality are gaining attention. One concern is that such large-scale models
may overfit observations subject to sample selection and may therefore be
unsuitable for causal prediction. To address this concern, we investigate the
validity of causal inference methods for overparameterized models by applying
the recent theory of benign overfitting (Bartlett et al., 2020). Specifically,
we consider samples whose distribution switches depending on an assignment
rule, and study the prediction of CATE with linear models whose dimension
diverges to infinity. We focus on two methods: the T-learner, which is based on
the difference between estimators constructed separately for each treatment
group, and the inverse probability weighting (IPW)-learner, which solves
another regression problem approximated via the propensity score. In both
methods, the estimator consists of interpolators that fit the samples
perfectly. We show that the T-learner fails to achieve consistency except under
random assignment, while the IPW-learner drives the risk to zero if the
propensity score is known. This difference arises because the T-learner is
unable to preserve the eigenspaces of the covariances, which is necessary for
benign overfitting in the overparameterized setting. Our result provides new
insights into the use of causal inference methods, in particular doubly robust
estimators, in the overparameterized setting.

arXiv link: http://arxiv.org/abs/2202.05245v2
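
The two estimators being compared can be sketched in a few lines with
minimum-norm least-squares interpolators; the dimensions, the data-generating
process, and the known propensity score below are illustrative assumptions
rather than the paper's exact setup.

```python
# A stylized sketch of the two estimators compared above, using minimum-norm
# least-squares interpolators (numpy.linalg.pinv). The dimensions, the
# data-generating process and the known propensity score are illustrative only.
import numpy as np

rng = np.random.default_rng(3)
n, p = 100, 400                                  # overparameterized: p > n
X = rng.normal(size=(n, p))
D = (rng.uniform(size=n) < 0.5).astype(float)    # random assignment
beta1 = rng.normal(size=p) / np.sqrt(p)
beta0 = rng.normal(size=p) / np.sqrt(p)
Y = np.where(D == 1, X @ beta1, X @ beta0) + 0.1 * rng.normal(size=n)

# T-learner: separate interpolators on the treated and control subsamples
b1 = np.linalg.pinv(X[D == 1]) @ Y[D == 1]
b0 = np.linalg.pinv(X[D == 0]) @ Y[D == 0]
cate_T = lambda x: x @ (b1 - b0)

# IPW-learner: one interpolator for the IPW-transformed outcome (propensity known)
e0 = 0.5
Z = (D / e0 - (1 - D) / (1 - e0)) * Y
b_ipw = np.linalg.pinv(X) @ Z
cate_IPW = lambda x: x @ b_ipw

x_new = rng.normal(size=(5, p))
print("true CATE  :", np.round(x_new @ (beta1 - beta0), 2))
print("T-learner  :", np.round(cate_T(x_new), 2))
print("IPW-learner:", np.round(cate_IPW(x_new), 2))
```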

Econometrics arXiv updated paper (originally submitted: 2022-02-10)

von Mises-Fisher distributions and their statistical divergence

Authors: Toru Kitagawa, Jeff Rowley

The von Mises-Fisher family is a parametric family of distributions on the
surface of the unit ball, summarised by a concentration parameter and a mean
direction. As a quasi-Bayesian prior, the von Mises-Fisher distribution is a
convenient and parsimonious choice when parameter spaces are isomorphic to the
hypersphere (e.g., maximum score estimation in semi-parametric discrete choice,
estimation of single-index treatment assignment rules via empirical welfare
maximisation, under-identifying linear simultaneous equation models). Despite a
long history of application, measures of statistical divergence have not been
analytically characterised for von Mises-Fisher distributions. This paper
provides analytical expressions for the $f$-divergence of a von Mises-Fisher
distribution from another, distinct, von Mises-Fisher distribution in
$R^p$ and the uniform distribution over the hypersphere. This paper also
collects several other results pertaining to the von Mises-Fisher family of
distributions, and characterises the limiting behaviour of the measures of
divergence that we consider.

arXiv link: http://arxiv.org/abs/2202.05192v2
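
As a numerical companion to the analytical results described above, the sketch
below evaluates a commonly used closed-form expression for the KL divergence
(the $f$-divergence with $f(t)=t\log t$) between two von Mises-Fisher
distributions via ratios of modified Bessel functions; the helper names and the
test directions are choices made here for illustration, not code from the
paper.

```python
# A numerical sketch of the closed-form KL divergence between vMF(mu1, k1) and
# vMF(mu2, k2) on the unit sphere in R^p,
#   KL = log C_p(k1) - log C_p(k2) + A_p(k1) * (k1 - k2 * mu1'mu2),
# where C_p is the vMF normalizing constant and A_p(k) = I_{p/2}(k)/I_{p/2-1}(k).
import numpy as np
from scipy.special import ive

def log_Cp(kappa, p):
    """Log normalizing constant of the vMF distribution on the sphere in R^p."""
    nu = p / 2.0 - 1.0
    log_bessel = np.log(ive(nu, kappa)) + kappa      # log I_nu(kappa), computed stably
    return nu * np.log(kappa) - (p / 2.0) * np.log(2.0 * np.pi) - log_bessel

def A_p(kappa, p):
    """Mean resultant length: ratio of modified Bessel functions."""
    return ive(p / 2.0, kappa) / ive(p / 2.0 - 1.0, kappa)

def kl_vmf(mu1, kappa1, mu2, kappa2):
    p = len(mu1)
    return (log_Cp(kappa1, p) - log_Cp(kappa2, p)
            + A_p(kappa1, p) * (kappa1 - kappa2 * float(mu1 @ mu2)))

mu1 = np.array([1.0, 0.0, 0.0])
mu2 = np.array([0.0, 1.0, 0.0])
print(kl_vmf(mu1, 5.0, mu2, 5.0))   # positive for distinct distributions
print(kl_vmf(mu1, 5.0, mu1, 5.0))   # approximately zero
```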

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2022-02-10

The Transfer Performance of Economic Models

Authors: Isaiah Andrews, Drew Fudenberg, Lihua Lei, Annie Liang, Chaofeng Wu

Economists often estimate models using data from a particular domain, e.g.
estimating risk preferences in a particular subject pool or for a specific
class of lotteries. Whether a model's predictions extrapolate well across
domains depends on whether the estimated model has captured generalizable
structure. We provide a tractable formulation for this "out-of-domain"
prediction problem and define the transfer error of a model based on how well
it performs on data from a new domain. We derive finite-sample forecast
intervals that are guaranteed to cover realized transfer errors with a
user-selected probability when domains are iid, and use these intervals to
compare the transferability of economic models and black box algorithms for
predicting certainty equivalents. We find that in this application, the black
box algorithms we consider outperform standard economic models when estimated
and tested on data from the same domain, but the economic models generalize
across domains better than the black-box algorithms do.

arXiv link: http://arxiv.org/abs/2202.04796v5

Econometrics arXiv updated paper (originally submitted: 2022-02-09)

Semiparametric Bayesian Estimation of Dynamic Discrete Choice Models

Authors: Andriy Norets, Kenichi Shimizu

We propose a tractable semiparametric estimation method for structural
dynamic discrete choice models. The distribution of additive utility shocks in
the proposed framework is modeled by location-scale mixtures of extreme value
distributions with varying numbers of mixture components. Our approach exploits
the analytical tractability of extreme value distributions in the multinomial
choice settings and the flexibility of the location-scale mixtures. We
implement the Bayesian approach to inference using Hamiltonian Monte Carlo and
an approximately optimal reversible jump algorithm. In our simulation
experiments, we show that the standard dynamic logit model can deliver
misleading results, especially about counterfactuals, when the shocks are not
extreme value distributed. Our semiparametric approach delivers reliable
inference in these settings. We develop theoretical results on approximations
by location-scale mixtures in an appropriate distance and posterior
concentration of the set-identified utility parameters and the distribution of
shocks in the model.

arXiv link: http://arxiv.org/abs/2202.04339v3

Econometrics arXiv cross-link from cs.CY (cs.CY), submitted: 2022-02-09

Regulatory Instruments for Fair Personalized Pricing

Authors: Renzhe Xu, Xingxuan Zhang, Peng Cui, Bo Li, Zheyan Shen, Jiazheng Xu

Personalized pricing is a business strategy to charge different prices to
individual consumers based on their characteristics and behaviors. It has
become common practice in many industries due to the growing availability of
highly granular consumer data. The discriminatory nature of
personalized pricing has triggered heated debates among policymakers and
academics on how to design regulation policies to balance market efficiency and
equity. In this paper, we propose two sound policy instruments, i.e., capping
the range of the personalized prices or their ratios. We investigate the
optimal pricing strategy of a profit-maximizing monopoly under both regulatory
constraints and the impact of imposing them on consumer surplus, producer
surplus, and social welfare. We theoretically prove that both proposed
constraints can help balance consumer surplus and producer surplus at the
expense of total surplus for common demand distributions, such as uniform,
logistic, and exponential distributions. Experiments on both simulation and
real-world datasets demonstrate the correctness of these theoretical results.
Our findings and insights shed light on regulatory policy design for
increasingly monopolized businesses in the digital era.

arXiv link: http://arxiv.org/abs/2202.04245v2

Econometrics arXiv paper, submitted: 2022-02-09

Managers versus Machines: Do Algorithms Replicate Human Intuition in Credit Ratings?

Authors: Matthew Harding, Gabriel F. R. Vasconcelos

We use machine learning techniques to investigate whether it is possible to
replicate the behavior of bank managers who assess the risk of commercial loans
made by a large commercial US bank. Even though a typical bank already relies
on an algorithmic scorecard process to evaluate risk, bank managers are given
significant latitude in adjusting the risk score in order to account for other
holistic factors based on their intuition and experience. We show that it is
possible to find machine learning algorithms that can replicate the behavior of
the bank managers. The input to the algorithms consists of a combination of
standard financials and soft information available to bank managers as part of
the typical loan review process. We also document the presence of significant
heterogeneity in the adjustment process that can be traced to differences
across managers and industries. Our results highlight the effectiveness of
machine learning based analytic approaches to banking and the potential
challenges to high-skill jobs in the financial sector.

arXiv link: http://arxiv.org/abs/2202.04218v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-02-09

Validating Causal Inference Methods

Authors: Harsh Parikh, Carlos Varjao, Louise Xu, Eric Tchetgen Tchetgen

The fundamental challenge of drawing causal inference is that counterfactual
outcomes are not fully observed for any unit. Furthermore, in observational
studies, treatment assignment is likely to be confounded. Many statistical
methods have emerged for causal inference under unconfoundedness conditions
given pre-treatment covariates, including propensity score-based methods,
prognostic score-based methods, and doubly robust methods. Unfortunately for
applied researchers, there is no `one-size-fits-all' causal method that can
perform optimally universally. In practice, causal methods are primarily
evaluated quantitatively on handcrafted simulated data. Such data-generative
procedures can be of limited value because they are typically stylized models
of reality. They are simplified for tractability and lack the complexities of
real-world data. For applied researchers, it is critical to understand how well
a method performs for the data at hand. Our work introduces a deep generative
model-based framework, Credence, to validate causal inference methods. The
framework's novelty stems from its ability to generate synthetic data anchored
at the empirical distribution for the observed sample, and therefore virtually
indistinguishable from the latter. The approach allows the user to specify
ground truth for the form and magnitude of causal effects and confounding bias
as functions of covariates. Thus simulated data sets are used to evaluate the
potential performance of various causal estimation methods when applied to data
similar to the observed sample. We demonstrate Credence's ability to accurately
assess the relative performance of causal estimation techniques in an extensive
simulation study and two real-world data applications from Lalonde and Project
STAR studies.

arXiv link: http://arxiv.org/abs/2202.04208v5

Econometrics arXiv updated paper (originally submitted: 2022-02-08)

Dynamic Heterogeneous Distribution Regression Panel Models, with an Application to Labor Income Processes

Authors: Ivan Fernandez-Val, Wayne Yuan Gao, Yuan Liao, Francis Vella

We introduce a dynamic distribution regression panel data model with
heterogeneous coefficients across units. The objects of primary interest are
functionals of these coefficients, including predicted one-step-ahead and
stationary cross-sectional distributions of the outcome variable. Coefficients
and their functionals are estimated via fixed effect methods. We investigate
how these functionals vary in response to counterfactual changes in initial
conditions or covariate values. We also identify a uniformity problem related
to the robustness of inference to the unknown degree of coefficient
heterogeneity, and propose a cross-sectional bootstrap method for uniformly
valid inference on function-valued objects. We showcase the utility of our
approach through an empirical application to individual income dynamics.
Employing the annual Panel Study of Income Dynamics data, we establish the
presence of substantial coefficient heterogeneity. We then highlight some
important empirical questions that our methodology can address. First, we
quantify the impact of a negative labor income shock on the distribution of
future labor income.

arXiv link: http://arxiv.org/abs/2202.04154v4

Econometrics arXiv updated paper (originally submitted: 2022-02-08)

A Neural Phillips Curve and a Deep Output Gap

Authors: Philippe Goulet Coulombe

Many problems plague empirical Phillips curves (PCs). Among them is the
hurdle that the two key components, inflation expectations and the output gap,
are both unobserved. Traditional remedies include proxying for the absentees or
extracting them via assumptions-heavy filtering procedures. I propose an
alternative route: a Hemisphere Neural Network (HNN) whose architecture yields
a final layer where components can be interpreted as latent states within a
Neural PC. There are benefits. First, HNN conducts the supervised estimation of
nonlinearities that arise when translating a high-dimensional set of observed
regressors into latent states. Second, forecasts are economically
interpretable. Among other findings, the contribution of real activity to
inflation appears understated in traditional PCs. In contrast, HNN captures the
2021 upswing in inflation and attributes it to a large positive output gap
starting from late 2020. The unique path of HNN's gap comes from dispensing
with unemployment and GDP in favor of an amalgam of nonlinearly processed
alternative tightness indicators.

arXiv link: http://arxiv.org/abs/2202.04146v2
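
To fix ideas about what a "hemisphere" architecture with an interpretable final
layer might look like, here is a schematic PyTorch module in which two
sub-networks map separate regressor blocks to scalar components whose scaled
sum is the inflation prediction. The layer sizes, the `kappa` slope parameter,
and the overall specification are assumptions for illustration and do not
reproduce the author's HNN.

```python
# A schematic PyTorch module (an illustrative architecture, not the author's
# exact HNN): two "hemisphere" sub-networks map separate regressor blocks to
# scalar latent components, and the final layer is their (scaled) sum, so the
# components can be read off as an expectations term and an output-gap term.
import torch
import torch.nn as nn

class Hemisphere(nn.Module):
    def __init__(self, d_in, d_hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(),
                                 nn.Linear(d_hidden, 1))

    def forward(self, x):
        return self.net(x)

class NeuralPC(nn.Module):
    """pi_hat = expectations component + kappa * output-gap component."""
    def __init__(self, d_exp, d_gap):
        super().__init__()
        self.exp_hemi = Hemisphere(d_exp)
        self.gap_hemi = Hemisphere(d_gap)
        self.kappa = nn.Parameter(torch.tensor(0.1))

    def forward(self, x_exp, x_gap):
        e, g = self.exp_hemi(x_exp), self.gap_hemi(x_gap)
        return e + self.kappa * g, e, g      # prediction plus interpretable states

model = NeuralPC(d_exp=10, d_gap=15)
x_exp, x_gap = torch.randn(8, 10), torch.randn(8, 15)
pi_hat, expectations, gap = model(x_exp, x_gap)
print(pi_hat.shape, expectations.shape, gap.shape)   # torch.Size([8, 1]) each
```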

Econometrics arXiv updated paper (originally submitted: 2022-02-08)

Continuous permanent unobserved heterogeneity in dynamic discrete choice models

Authors: Jackson Bunting

In dynamic discrete choice (DDC) analysis, it is common to use mixture models
to control for unobserved heterogeneity. However, consistent estimation
typically requires both restrictions on the support of unobserved heterogeneity
and a high-level injectivity condition that is difficult to verify. This paper
provides primitive conditions for point identification of a broad class of DDC
models with multivariate continuous permanent unobserved heterogeneity. The
results apply to both finite- and infinite-horizon DDC models, do not require a
full support assumption or a long panel, and place no parametric restriction
on the distribution of unobserved heterogeneity. In addition, I propose a
seminonparametric estimator that is computationally attractive and can be
implemented using familiar parametric methods.

arXiv link: http://arxiv.org/abs/2202.03960v4

Econometrics arXiv updated paper (originally submitted: 2022-02-07)

Threshold Asymmetric Conditional Autoregressive Range (TACARR) Model

Authors: Isuru Ratnayake, V. A. Samaranayake

This paper introduces a Threshold Asymmetric Conditional Autoregressive Range
(TACARR) formulation for modeling the daily price ranges of financial assets.
It is assumed that the process generating the conditional expected ranges at
each time point switches between two regimes, labeled as upward market and
downward market states. The disturbance term of the error process is also
allowed to switch between two distributions depending on the regime. It is
assumed that a self-adjusting threshold component that is driven by the past
values of the time series determines the current market regime. The proposed
model is able to capture aspects such as asymmetric and heteroscedastic
behavior of volatility in financial markets. The proposed model is an attempt
at addressing several potential deficits found in existing price range models
such as the Conditional Autoregressive Range (CARR), Asymmetric CARR (ACARR),
Feedback ACARR (FACARR) and Threshold Autoregressive Range (TARR) models.
Parameters of the model are estimated using the Maximum Likelihood (ML) method.
A simulation study shows that the ML method performs well in estimating the
TACARR model parameters. The empirical performance of the TACARR model was
investigated using IBM index data and results show that the proposed model is a
good alternative for in-sample prediction and out-of-sample forecasting of
volatility.
Key Words: Volatility Modeling, Asymmetric Volatility, CARR Models, Regime
Switching.

arXiv link: http://arxiv.org/abs/2202.03351v2
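
A stylized two-regime CARR-type recursion conveys the flavour of the model: the
conditional expected range follows regime-specific dynamics, with the regime
chosen by a self-adjusting threshold built from past ranges. The parameter
values, exponential disturbance, and moving-average threshold below are
illustrative assumptions, not the paper's exact TACARR specification.

```python
# A stylized two-regime CARR-type simulation (illustrative parameter values and
# regime rule; not the paper's exact TACARR specification).
import numpy as np

rng = np.random.default_rng(4)
T = 1000
params = {"up": (0.05, 0.15, 0.75),    # (omega, alpha, beta) in the upward regime
          "down": (0.10, 0.25, 0.65)}  # (omega, alpha, beta) in the downward regime

R = np.empty(T)        # daily price range
lam = np.empty(T)      # conditional expected range
R[0], lam[0] = 1.0, 1.0
threshold = 1.0        # self-adjusting threshold: a slow moving average of past ranges

for t in range(1, T):
    regime = "up" if R[t - 1] <= threshold else "down"
    omega, alpha, beta = params[regime]
    lam[t] = omega + alpha * R[t - 1] + beta * lam[t - 1]
    R[t] = lam[t] * rng.exponential(1.0)          # positive, unit-mean disturbance
    threshold = 0.95 * threshold + 0.05 * R[t]    # update the threshold

print("mean range:", R.mean().round(3), " mean conditional range:", lam.mean().round(3))
```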

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-02-07

Forecasting Environmental Data: An example to ground-level ozone concentration surfaces

Authors: Alexander Gleim, Nazarii Salish

Environmental problems are receiving increasing attention in socio-economic
and health studies. This in turn fosters advances in recording and data
collection of many related real-life processes. Available tools for data
processing are often found too restrictive as they do not account for the rich
nature of such data sets. In this paper, we propose a new statistical
perspective on forecasting spatial environmental data collected sequentially
over time. We treat this data set as a surface (functional) time series with a
possibly complicated geographical domain. By employing novel techniques from
functional data analysis we develop a new forecasting methodology. Our approach
consists of two steps. In the first step, time series of surfaces are
reconstructed from measurements sampled over some spatial domain using a finite
element spline smoother. In the second step, we adapt the dynamic functional
factor model to forecast a surface time series. The advantage of this approach
is that we can account for and explore simultaneously spatial as well as
temporal dependencies in the data. A forecasting study of ground-level ozone
concentration over the geographical domain of Germany demonstrates the
practical value of this new perspective, where we compare our approach with
standard functional benchmark models.

arXiv link: http://arxiv.org/abs/2202.03332v1

Econometrics arXiv paper, submitted: 2022-02-07

Predicting Default Probabilities for Stress Tests: A Comparison of Models

Authors: Martin Guth

Since the Great Financial Crisis (GFC), the use of stress tests as a tool for
assessing the resilience of financial institutions to adverse financial and
economic developments has increased significantly. One key part in such
exercises is the translation of macroeconomic variables into default
probabilities for credit risk by using macrofinancial linkage models. A key
requirement for such models is that they should be able to properly detect
signals from a wide array of macroeconomic variables in combination with a
mostly short data sample. The aim of this paper is to compare a large number of
different regression models to find the best-performing credit risk model. We
set up an estimation framework that allows us to systematically estimate and
evaluate a large set of models within the same environment. Our results
indicate that there are indeed better performing models than the current
state-of-the-art model. Moreover, our comparison sheds light on other potential
credit risk models, specifically highlighting the advantages of machine
learning models and forecast combinations.

arXiv link: http://arxiv.org/abs/2202.03110v1

Econometrics arXiv paper, submitted: 2022-02-07

Detecting Structural Breaks in Foreign Exchange Markets by using the group LASSO technique

Authors: Mikio Ito

This article proposes an estimation method to detect breakpoints in linear
time series models whose parameters jump only rarely. Its basic idea builds on
the group LASSO (group least absolute shrinkage and selection operator). In
practice, the method provides estimates of such time-varying parameters of the
models. An example shows that our method can detect the date and magnitude of
each structural breakpoint.

arXiv link: http://arxiv.org/abs/2202.02988v1
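
A generic version of the idea can be written as a convex program: express the
time-varying coefficient as a cumulative sum of per-period jumps and penalize
the Euclidean (group) norm of each jump so that only a few dates carry nonzero
changes. The cvxpy formulation, penalty level, and simulated single-break
design below are assumptions for illustration, not the article's estimator.

```python
# A generic group-LASSO formulation for break detection (illustrative only;
# not the article's exact estimator).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(5)
T, p = 200, 3
X = rng.normal(size=(T, p))
beta_true = np.zeros((T, p))
beta_true[:100] = np.array([1.0, -1.0, 0.5])
beta_true[100:] = np.array([2.0, -1.0, 0.5])     # a single break at t = 100
y = np.sum(X * beta_true, axis=1) + 0.1 * rng.normal(size=T)

delta = cp.Variable((T, p))                      # row 0: initial level; rows 1..T-1: jumps
beta = cp.cumsum(delta, axis=0)                  # beta_t = delta_0 + ... + delta_t
fit = cp.sum_squares(y - cp.sum(cp.multiply(X, beta), axis=1))
penalty = sum(cp.norm(delta[t, :], 2) for t in range(1, T))
cp.Problem(cp.Minimize(fit + 20.0 * penalty)).solve()

jump_size = np.linalg.norm(delta.value[1:], axis=1)
print("largest estimated jumps near t =", np.sort(np.argsort(jump_size)[-3:] + 1))
```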

Econometrics arXiv updated paper (originally submitted: 2022-02-07)

Difference in Differences with Time-Varying Covariates

Authors: Carolina Caetano, Brantly Callaway, Stroud Payne, Hugo Sant'Anna Rodrigues

This paper considers identification and estimation of causal effect
parameters from participating in a binary treatment in a difference in
differences (DID) setup when the parallel trends assumption holds after
conditioning on observed covariates. Relative to existing work in the
econometrics literature, we consider the case where the value of covariates can
change over time and, potentially, where participating in the treatment can
affect the covariates themselves. We propose new empirical strategies in both
cases. We also consider two-way fixed effects (TWFE) regressions that include
time-varying regressors, which is the most common way that DID identification
strategies are implemented under conditional parallel trends. We show that,
even in the case with only two time periods, these TWFE regressions are not
generally robust to (i) time-varying covariates being affected by the
treatment, (ii) treatment effects and/or paths of untreated potential outcomes
depending on the level of time-varying covariates in addition to only the
change in the covariates over time, (iii) treatment effects and/or paths of
untreated potential outcomes depending on time-invariant covariates, (iv)
treatment effect heterogeneity with respect to observed covariates, and (v)
violations of strong functional form assumptions, both for outcomes over time
and the propensity score, that are unlikely to be plausible in most DID
applications. Thus, TWFE regressions can deliver misleading estimates of causal
effect parameters in a number of empirically relevant cases. We propose both
doubly robust estimands and regression adjustment/imputation strategies that
are robust to these issues while not being substantially more challenging to
implement.

arXiv link: http://arxiv.org/abs/2202.02903v3

Econometrics arXiv paper, submitted: 2022-02-05

Adaptive information-based methods for determining the co-integration rank in heteroskedastic VAR models

Authors: H. Peter Boswijk, Giuseppe Cavaliere, Luca De Angelis, A. M. Robert Taylor

Standard methods, such as sequential procedures based on Johansen's
(pseudo-)likelihood ratio (PLR) test, for determining the co-integration rank
of a vector autoregressive (VAR) system of variables integrated of order one
can be significantly affected, even asymptotically, by unconditional
heteroskedasticity (non-stationary volatility) in the data. Known solutions to
this problem include wild bootstrap implementations of the PLR test or the use
of an information criterion, such as the BIC, to select the co-integration
rank. Although asymptotically valid in the presence of heteroskedasticity,
these methods can display very low finite sample power under some patterns of
non-stationary volatility. In particular, they do not exploit potential
efficiency gains that could be realised in the presence of non-stationary
volatility by using adaptive inference methods. Under the assumption of a known
autoregressive lag length, Boswijk and Zu (2022) develop adaptive PLR test
based methods using a non-parametric estimate of the covariance matrix process.
It is well known, however, that selecting an incorrect lag length can
significantly impact the efficacy of both information criteria and bootstrap
PLR tests to determine co-integration rank in finite samples. We show that
adaptive information criteria-based approaches can be used to estimate the
autoregressive lag order to use in connection with bootstrap adaptive PLR
tests, or to jointly determine the co-integration rank and the VAR lag length
and that in both cases they are weakly consistent for these parameters in the
presence of non-stationary volatility provided standard conditions hold on the
penalty term. Monte Carlo simulations are used to demonstrate the potential
gains from using adaptive methods and an empirical application to the U.S. term
structure is provided.

arXiv link: http://arxiv.org/abs/2202.02532v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-02-04

First-order integer-valued autoregressive processes with Generalized Katz innovations

Authors: Ovielt Baltodano Lopez, Federico Bassetti, Giulia Carallo, Roberto Casarin

A new integer--valued autoregressive process (INAR) with Generalised
Lagrangian Katz (GLK) innovations is defined. This process family provides a
flexible modelling framework for count data, allowing for under and
over--dispersion, asymmetry, and excess of kurtosis and includes standard INAR
models such as Generalized Poisson and Negative Binomial as special cases. We
show that the GLK--INAR process is discrete semi--self--decomposable,
infinitely divisible, and stable under aggregation, and we provide stationarity
conditions. Some
extensions are discussed, such as the Markov--Switching and the zero--inflated
GLK--INARs. A Bayesian inference framework and an efficient posterior
approximation procedure are introduced. The proposed models are applied to 130
time series from Google Trend, which proxy the worldwide public concern about
climate change. New evidence is found of heterogeneity across time, countries
and keywords in the persistence, uncertainty, and long--run public awareness
level.

arXiv link: http://arxiv.org/abs/2202.02029v2
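
For intuition about the INAR structure, the sketch below simulates a
first-order process with binomial thinning and Negative Binomial innovations,
one of the standard special cases mentioned above; the parameter values and the
`simulate_inar1` helper are illustrative, and the full GLK innovation family
and Bayesian inference are not implemented here.

```python
# A minimal simulation sketch of an INAR(1) process with binomial thinning,
#   X_t = alpha o X_{t-1} + eps_t,
# using Negative Binomial innovations (a standard special case noted above).
import numpy as np

rng = np.random.default_rng(6)

def simulate_inar1(T, alpha, r, q, x0=0):
    """alpha: thinning probability; (r, q): Negative Binomial innovation parameters."""
    x = np.empty(T, dtype=int)
    x[0] = x0
    for t in range(1, T):
        survivors = rng.binomial(x[t - 1], alpha)   # binomial thinning of the past count
        innovation = rng.negative_binomial(r, q)    # over-dispersed count innovation
        x[t] = survivors + innovation
    return x

x = simulate_inar1(T=500, alpha=0.6, r=2, q=0.5)
print("sample mean:", x.mean().round(2), " sample variance:", x.var().round(2))
# Over-dispersion relative to Poisson: the variance should exceed the mean here.
```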

Econometrics arXiv paper, submitted: 2022-02-02

Efficient Volatility Estimation for Lévy Processes with Jumps of Unbounded Variation

Authors: B. Cooper Boniece, José E. Figueroa-López, Yuchen Han

Statistical inference for stochastic processes based on high-frequency
observations has been an active research area for more than a decade. One of
the most well-known and widely studied problems is that of estimation of the
quadratic variation of the continuous component of an It\^o semimartingale with
jumps. Several rate- and variance-efficient estimators have been proposed in
the literature when the jump component is of bounded variation. However, to
date, very few methods can deal with jumps of unbounded variation. By
developing new high-order expansions of the truncated moments of a L\'evy
process, we construct a new rate- and variance-efficient estimator for a class
of L\'evy processes of unbounded variation, whose small jumps behave like those
of a stable L\'evy process with Blumenthal-Getoor index less than $8/5$. The
proposed method is based on a two-step debiasing procedure for the truncated
realized quadratic variation of the process. Our Monte Carlo experiments
indicate that the method outperforms other efficient alternatives in the
literature in the setting covered by our theoretical framework.

arXiv link: http://arxiv.org/abs/2202.00877v1
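
As background for the debiasing step, the sketch below computes the plain
truncated realized variance on a simulated path with jumps, using a threshold
of the usual form $u_n = a\Delta_n^{\varpi}$; the constants, the jump
specification, and the omission of the paper's two-step debiasing are all
simplifications for illustration.

```python
# A sketch of the first-step (non-debiased) truncated realized variance on a
# simulated path with jumps; constants and the jump specification are
# illustrative, and the paper's two-step debiasing is not reproduced.
import numpy as np

rng = np.random.default_rng(7)
n = 23_400                                   # one trading day of 1-second increments
dt = 1.0 / n
sigma = 0.2
increments = sigma * np.sqrt(dt) * rng.normal(size=n)          # Brownian part
jump_flags = rng.uniform(size=n) < 5.0 / n                     # a handful of jumps
increments += jump_flags * rng.normal(0.0, 0.05, size=n)

u_n = 4.0 * sigma * dt ** 0.49               # threshold u_n = a * dt^w, w in (0, 1/2)
trv = np.sum(increments ** 2 * (np.abs(increments) <= u_n))

print("truncated RV  :", round(trv, 5))
print("plain RV      :", round(np.sum(increments ** 2), 5))    # contaminated by jumps
print("target sigma^2:", sigma ** 2)
```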

Econometrics arXiv paper, submitted: 2022-02-01

Long-Horizon Return Predictability from Realized Volatility in Pure-Jump Point Processes

Authors: Meng-Chen Hsieh, Clifford Hurvich, Philippe Soulier

We develop and justify methodology to consistently test for long-horizon
return predictability based on realized variance. To accomplish this, we
propose a parametric transaction-level model for the continuous-time log price
process based on a pure jump point process. The model determines the returns
and realized variance at any level of aggregation with properties shown to be
consistent with the stylized facts in the empirical finance literature. Under
our model, the long-memory parameter propagates unchanged from the
transaction-level drift to the calendar-time returns and the realized variance,
leading endogenously to a balanced predictive regression equation. We propose
an asymptotic framework using power-law aggregation in the predictive
regression. Within this framework, we propose a hypothesis test for long
horizon return predictability which is asymptotically correctly sized and
consistent.

arXiv link: http://arxiv.org/abs/2202.00793v1

Econometrics arXiv paper, submitted: 2022-02-01

Black-box Bayesian inference for economic agent-based models

Authors: Joel Dyer, Patrick Cannon, J. Doyne Farmer, Sebastian Schmon

Simulation models, in particular agent-based models, are gaining popularity
in economics. The considerable flexibility they offer, as well as their
capacity to reproduce a variety of empirically observed behaviours of complex
systems, give them broad appeal, and the increasing availability of cheap
computing power has made their use feasible. Yet a widespread adoption in
real-world modelling and decision-making scenarios has been hindered by the
difficulty of performing parameter estimation for such models. In general,
simulation models lack a tractable likelihood function, which precludes a
straightforward application of standard statistical inference techniques.
Several recent works have sought to address this problem through the
application of likelihood-free inference techniques, in which parameter
estimates are determined by performing some form of comparison between the
observed data and simulation output. However, these approaches are (a) founded
on restrictive assumptions, and/or (b) typically require many hundreds of
thousands of simulations. These qualities make them unsuitable for large-scale
simulations in economics and can cast doubt on the validity of these inference
methods in such scenarios. In this paper, we investigate the efficacy of two
classes of black-box approximate Bayesian inference methods that have recently
drawn significant attention within the probabilistic machine learning
community: neural posterior estimation and neural density ratio estimation. We
present benchmarking experiments in which we demonstrate that neural network
based black-box methods provide state of the art parameter inference for
economic simulation models, and crucially are compatible with generic
multivariate time-series data. In addition, we suggest appropriate assessment
criteria for future benchmarking of approximate Bayesian inference procedures
for economic simulation models.

arXiv link: http://arxiv.org/abs/2202.00625v1

Econometrics arXiv updated paper (originally submitted: 2022-02-01)

Estimation of Impulse-Response Functions with Dynamic Factor Models: A New Parametrization

Authors: Juho Koistinen, Bernd Funovits

We propose a new parametrization for the estimation and identification of the
impulse-response functions (IRFs) of dynamic factor models (DFMs). The
theoretical contribution of this paper concerns the problem of observational
equivalence between different IRFs, which implies non-identification of the IRF
parameters without further restrictions. We show how the previously proposed
minimal identification conditions are nested in the new framework and can be
further augmented with overidentifying restrictions leading to efficiency
gains. The current standard practice for the IRF estimation of DFMs is based on
principal components, compared to which the new parametrization is less
restrictive and allows for modelling richer dynamics. As the empirical
contribution of the paper, we develop an estimation method based on the EM
algorithm, which incorporates the proposed identification restrictions. In the
empirical application, we use a standard high-dimensional macroeconomic dataset
to estimate the effects of a monetary policy shock. We estimate a strong
reaction of the macroeconomic variables, while the benchmark models appear to
give qualitatively counterintuitive results. The estimation methods are
implemented in the accompanying R package.

arXiv link: http://arxiv.org/abs/2202.00310v2

Econometrics arXiv paper, submitted: 2022-02-01

Protection or Peril of Following the Crowd in a Pandemic-Concurrent Flood Evacuation

Authors: Elisa Borowski, Amanda Stathopoulos

The decisions of whether and how to evacuate during a climate disaster are
influenced by a wide range of factors, including sociodemographics, emergency
messaging, and social influence. Further complexity is introduced when multiple
hazards occur simultaneously, such as a flood evacuation taking place amid a
viral pandemic that requires physical distancing. Such multi-hazard events can
necessitate a nuanced navigation of competing decision-making strategies
wherein a desire to follow peers is weighed against contagion risks. To better
understand these nuances, we distributed an online survey during a pandemic
surge in July 2020 to 600 individuals in three midwestern and three southern
states in the United States with high risk of flooding. In this paper, we
estimate a random parameter logit model in both preference space and
willingness-to-pay space. Our results show that the directionality and
magnitude of the influence of peers' choices of whether and how to evacuate
vary widely across respondents. Overall, the decision of whether to evacuate is
positively impacted by peer behavior, while the decision of how to evacuate is
negatively impacted by peers. Furthermore, an increase in flood threat level
lessens the magnitude of these impacts. These findings have important
implications for the design of tailored emergency messaging strategies.
Specifically, emphasizing or deemphasizing the severity of each threat in a
multi-hazard scenario may assist in: (1) encouraging a reprioritization of
competing risk perceptions and (2) magnifying or neutralizing the impacts of
social influence, thereby (3) nudging evacuation decision-making toward a
desired outcome.

arXiv link: http://arxiv.org/abs/2202.00229v1

Econometrics arXiv updated paper (originally submitted: 2022-01-31)

Partial Sum Processes of Residual-Based and Wald-type Break-Point Statistics in Time Series Regression Models

Authors: Christis Katsouris

We revisit classical asymptotics when testing for a structural break in
linear regression models by obtaining the limit theory of residual-based and
Wald-type processes. First, we establish the Brownian bridge limiting
distribution of these test statistics. Second, we study the asymptotic
behaviour of the partial-sum processes in nonstationary (linear) time series
regression models. Although the comparison of these two different modelling
environments is carried out from the perspective of the partial-sum processes,
it emphasizes that the presence of nuisance parameters can change
the asymptotic behaviour of the functionals under consideration. Simulation
experiments verify size distortions when testing for a break in nonstationary
time series regressions, which indicates that the Brownian bridge limit cannot
provide a suitable asymptotic approximation in this case. Further research is
required to establish the cause of size distortions under the null hypothesis
of parameter stability.

arXiv link: http://arxiv.org/abs/2202.00141v2

Econometrics arXiv paper, submitted: 2022-01-31

Deep Learning Macroeconomics

Authors: Rafael R. S. Guimaraes

Limited datasets and complex nonlinear relationships are among the challenges
that may emerge when applying econometrics to macroeconomic problems. This
research proposes deep learning as an approach to transfer learning in the
former case and to map relationships between variables in the latter case.
Although macroeconomists already apply transfer learning when they assume a
given a priori distribution in a Bayesian context, estimate a structural VAR
with sign restrictions, or calibrate parameters based on results observed in
other models, to name a few examples, the innovation we introduce is a more
systematic transfer learning strategy in applied macroeconomics. We explore the
proposed strategy empirically, showing that data from different but related
domains, a type of transfer learning, help identify business cycle phases when
there is no business cycle dating committee and quickly estimate an
economics-based output gap. Next, since deep learning methods learn
representations formed by the composition of multiple non-linear
transformations, yielding more abstract representations, we apply deep learning
to map low-frequency variables from high-frequency variables. The results
obtained show the suitability of deep
learning models applied to macroeconomic problems. First, models learned to
classify United States business cycles correctly. Then, applying transfer
learning, they were able to identify the business cycles of out-of-sample
Brazilian and European data. Along the same lines, the models learned to
estimate the output gap based on the U.S. data and obtained good performance
when faced with Brazilian data. Additionally, deep learning proved adequate for
mapping low-frequency variables from high-frequency data to interpolate,
distribute, and extrapolate time series by related series.

arXiv link: http://arxiv.org/abs/2201.13380v1

Econometrics arXiv updated paper (originally submitted: 2022-01-31)

Improving Estimation Efficiency via Regression-Adjustment in Covariate-Adaptive Randomizations with Imperfect Compliance

Authors: Liang Jiang, Oliver B. Linton, Haihan Tang, Yichong Zhang

We investigate how to improve efficiency using regression adjustments with
covariates in covariate-adaptive randomizations (CARs) with imperfect subject
compliance. Our regression-adjusted estimators, which are based on the doubly
robust moment for local average treatment effects, are consistent and
asymptotically normal even with heterogeneous probability of assignment and
misspecified regression adjustments. We propose an optimal but potentially
misspecified linear adjustment and its further improvement via a nonlinear
adjustment, both of which lead to more efficient estimators than the one
without adjustments. We also provide conditions for nonparametric and
regularized adjustments to achieve the semiparametric efficiency bound under
CARs.

arXiv link: http://arxiv.org/abs/2201.13004v5

Econometrics arXiv paper, submitted: 2022-01-31

A General Description of Growth Trends

Authors: Moshe Elitzur

Time series that display periodicity can be described with a Fourier
expansion. In a similar vein, a recently developed formalism enables
description of growth patterns with the optimal number of parameters (Elitzur
et al, 2020). The method has been applied to the growth of national GDP,
population and the COVID-19 pandemic; in all cases the deviations of long-term
growth patterns from pure exponential required no more than two additional
parameters, mostly only one. Here I utilize the new framework to develop a
unified formulation for all functions that describe growth deceleration,
wherein the growth rate decreases with time. The result offers the prospect of
a new general tool for trend removal in time-series analysis.

arXiv link: http://arxiv.org/abs/2201.13000v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-01-30

Pigeonhole Design: Balancing Sequential Experiments from an Online Matching Perspective

Authors: Jinglong Zhao, Zijie Zhou

Practitioners and academics have long appreciated the benefits of covariate
balancing when they conduct randomized experiments. For web-facing firms
running online A/B tests, however, it remains challenging to balance covariate
information when experimental subjects arrive sequentially. In this
paper, we study an online experimental design problem, which we refer to as the
"Online Blocking Problem." In this problem, experimental subjects with
heterogeneous covariate information arrive sequentially and must be immediately
assigned into either the control or the treated group. The objective is to
minimize the total discrepancy, which is defined as the minimum weight perfect
matching between the two groups. To solve this problem, we propose a randomized
design of experiment, which we refer to as the "Pigeonhole Design." The
pigeonhole design first partitions the covariate space into smaller spaces,
which we refer to as pigeonholes, and then, when the experimental subjects
arrive at each pigeonhole, balances the number of control and treated subjects
for each pigeonhole. We analyze the theoretical performance of the pigeonhole
design and show its effectiveness by comparing against two well-known benchmark
designs: the match-pair design and the completely randomized design. We
identify scenarios in which the pigeonhole design offers greater benefits over
the benchmark designs. To conclude, we conduct extensive simulations using
Yahoo! data to show a 10.2% reduction in variance if we use the pigeonhole
design to estimate the average treatment effect.

arXiv link: http://arxiv.org/abs/2201.12936v6
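
The assignment rule can be sketched in a few lines: covariates are discretized
into pigeonholes and each arriving subject is assigned to whichever arm is
currently under-represented in its pigeonhole, with ties broken at random. The
binning scheme and function names below are illustrative assumptions, not the
paper's exact design or its theoretical tuning.

```python
# A minimal sketch of the pigeonhole idea (illustrative binning and function
# names; not the paper's exact design or tuning).
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(8)
counts = defaultdict(lambda: {"treat": 0, "control": 0})   # per-pigeonhole tallies

def pigeonhole(x, n_bins=4):
    """Map a covariate vector in [0, 1)^d to a discrete pigeonhole label."""
    return tuple(np.minimum((np.asarray(x) * n_bins).astype(int), n_bins - 1))

def assign(x):
    c = counts[pigeonhole(x)]
    if c["treat"] < c["control"]:
        arm = "treat"
    elif c["treat"] > c["control"]:
        arm = "control"
    else:                                    # balanced pigeonhole: randomize
        arm = "treat" if rng.uniform() < 0.5 else "control"
    c[arm] += 1
    return arm

arms = [assign(rng.uniform(size=2)) for _ in range(1000)]  # subjects arrive sequentially
print("treated share:", round(arms.count("treat") / 1000, 3))
```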

Econometrics arXiv paper, submitted: 2022-01-30

On the Use of Instrumental Variables in Mediation Analysis

Authors: Bora Kim

Empirical researchers are often interested in not only whether a treatment
affects an outcome of interest, but also how the treatment effect arises.
Causal mediation analysis provides a formal framework to identify causal
mechanisms through which a treatment affects an outcome. The most popular
identification strategy relies on so-called sequential ignorability (SI)
assumption which requires that there is no unobserved confounder that lies in
the causal paths between the treatment and the outcome. Despite its popularity,
this assumption is deemed too strong in many settings, as it excludes the
existence of unobserved confounders. This limitation has inspired recent
literature to consider an alternative identification strategy based on an
instrumental variable (IV). This paper discusses the identification of causal
mediation effects in a setting with a binary treatment and a binary
instrumental variable, both of which are assumed to be random. We show that
while IV methods allow for the possible existence of unobserved confounders,
additional monotonicity assumptions are required unless a strong
constant-effect assumption is imposed. Furthermore, even when such monotonicity
assumptions are satisfied, IV
estimands are not necessarily equivalent to target parameters.

arXiv link: http://arxiv.org/abs/2201.12752v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2022-01-30

Sharing Behavior in Ride-hailing Trips: A Machine Learning Inference Approach

Authors: Morteza Taiebat, Elham Amini, Ming Xu

Ride-hailing is rapidly changing urban and personal transportation. Ride
sharing or pooling is important to mitigate negative externalities of
ride-hailing, such as increased congestion and environmental impacts. However,
there is little empirical evidence on what affects trip-level sharing behavior
in ride-hailing. Using a novel dataset of all ride-hailing trips in Chicago in
2019, we show that the willingness of riders to request a shared ride has
monotonically decreased from 27.0% to 12.8% throughout the year, while the trip
volume and mileage have remained statistically unchanged. We find that the
decline in sharing preference is due to increased per-mile costs of shared
trips and a shift of shorter trips to solo rides. Using ensemble machine learning
models, we find that the travel impedance variables (trip cost, distance, and
duration) collectively contribute to 95% and 91% of the predictive power in
determining whether a trip is requested to share and whether it is successfully
shared, respectively. Spatial and temporal attributes, sociodemographic, built
environment, and transit supply variables do not contribute predictive power
at the trip level in the presence of these travel impedance variables. This
implies that
pricing signals are most effective to encourage riders to share their rides.
Our findings shed light on sharing behavior in ride-hailing trips and can help
devise strategies that increase shared ride-hailing, especially as demand
recovers from the pandemic.

arXiv link: http://arxiv.org/abs/2201.12696v1

Econometrics arXiv paper, submitted: 2022-01-30

Meta-Learners for Estimation of Causal Effects: Finite Sample Cross-Fit Performance

Authors: Gabriel Okasa

Estimation of causal effects using machine learning methods has become an
active research field in econometrics. In this paper, we study the finite
sample performance of meta-learners for the estimation of heterogeneous treatment
effects when sample-splitting and cross-fitting are used to reduce
overfitting bias. In both synthetic and semi-synthetic simulations we find that
the performance of the meta-learners in finite samples greatly depends on the
estimation procedure. The results imply that sample-splitting and cross-fitting
are beneficial in large samples for bias reduction and efficiency of the
meta-learners, respectively, whereas full-sample estimation is preferable in
small samples. Furthermore, we derive practical recommendations for application
of specific meta-learners in empirical studies depending on particular data
characteristics such as treatment shares and sample size.

arXiv link: http://arxiv.org/abs/2201.12692v1

Econometrics arXiv updated paper (originally submitted: 2022-01-27)

A projection based approach for interactive fixed effects panel data models

Authors: Georg Keilbar, Juan M. Rodriguez-Poo, Alexandra Soberon, Weining Wang

This paper introduces a straightforward sieve-based approach for estimating
and conducting inference on regression parameters in panel data models with
interactive fixed effects. The method's key assumption is that factor loadings
can be decomposed into an unknown smooth function of individual characteristics
plus an idiosyncratic error term. Our estimator offers advantages over existing
approaches by taking a simple partial least squares form, eliminating the need
for iterative procedures or preliminary factor estimation. In deriving the
asymptotic properties, we discover that the limiting distribution exhibits a
discontinuity that depends on how well our basis functions explain the factor
loadings, as measured by the variance of the error factor loadings. This
finding reveals that conventional “plug-in” methods using the estimated
asymptotic covariance can produce excessively conservative coverage
probabilities. We demonstrate that uniformly valid non-conservative inference
can be achieved through the cross-sectional bootstrap method. Monte Carlo
simulations confirm the estimator's strong performance in terms of mean squared
error and good coverage results for the bootstrap procedure. We demonstrate the
practical relevance of our methodology by analyzing growth rate determinants
across OECD countries.

arXiv link: http://arxiv.org/abs/2201.11482v3

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2022-01-27

Towards Agnostic Feature-based Dynamic Pricing: Linear Policies vs Linear Valuation with Unknown Noise

Authors: Jianyu Xu, Yu-Xiang Wang

In feature-based dynamic pricing, a seller sets appropriate prices for a
sequence of products (described by feature vectors) on the fly by learning from
the binary outcomes of previous sales sessions ("Sold" if valuation $\geq$
price, and "Not Sold" otherwise). Existing works either assume noiseless linear
valuation or precisely-known noise distribution, which limits the applicability
of those algorithms in practice when these assumptions are hard to verify. In
this work, we study two more agnostic models: (a) a "linear policy" problem
where we aim at competing with the best linear pricing policy while making no
assumptions on the data, and (b) a "linear noisy valuation" problem where the
random valuation is linear plus an unknown and assumption-free noise. For the
former model, we show a $\Theta(d^{\frac13}T^{\frac23})$ minimax regret
up to logarithmic factors. For the latter model, we present an algorithm that
achieves an $O(T^{\frac34})$ regret, and improve the best-known lower
bound from $\Omega(T^{\frac35})$ to $\Omega(T^{\frac23})$. These
results demonstrate that no-regret learning is possible for feature-based
dynamic pricing under weak assumptions, but also reveal the disappointing fact
that the seemingly richer pricing feedback is not significantly more useful
than bandit feedback for regret reduction.

arXiv link: http://arxiv.org/abs/2201.11341v2

Econometrics arXiv updated paper (originally submitted: 2022-01-27)

Standard errors for two-way clustering with serially correlated time effects

Authors: Harold D Chiang, Bruce E Hansen, Yuya Sasaki

We propose improved standard errors and an asymptotic distribution theory for
two-way clustered panels. Our proposed estimator and theory allow for arbitrary
serial dependence in the common time effects, which is excluded by existing
two-way methods, including the popular two-way cluster standard errors of
Cameron, Gelbach, and Miller (2011) and the cluster bootstrap of Menzel (2021).
Our asymptotic distribution theory is the first which allows for this level of
inter-dependence among the observations. Under weak regularity conditions, we
demonstrate that the least squares estimator is asymptotically normal, our
proposed variance estimator is consistent, and t-ratios are asymptotically
standard normal, permitting conventional inference. We present simulation
evidence that confidence intervals constructed with our proposed standard
errors obtain superior coverage performance relative to existing methods. We
illustrate the relevance of the proposed method in an empirical application to
a standard Fama-French three-factor regression.

arXiv link: http://arxiv.org/abs/2201.11304v4

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2022-01-27

Micro-level Reserving for General Insurance Claims using a Long Short-Term Memory Network

Authors: Ihsan Chaoubi, Camille Besse, Hélène Cossette, Marie-Pier Côté

Detailed information about individual claims is completely ignored when
insurance claims data are aggregated and structured in development triangles
for loss reserving. In the hope of extracting predictive power from the
individual claims characteristics, researchers have recently proposed to move
away from these macro-level methods in favor of micro-level loss reserving
approaches. We introduce a discrete-time individual reserving framework
incorporating granular information in a deep learning approach, namely a Long
Short-Term Memory (LSTM) neural network. At each time period, the network has
two tasks: first, classifying whether there is a payment or a recovery, and
second, predicting the corresponding non-zero amount, if any. We illustrate the
estimation procedure on a simulated and a real general insurance dataset. We
compare our approach with the chain-ladder aggregate method using the
predictive outstanding loss estimates and their actual values. Based on a
generalized Pareto model for excess payments over a threshold, we adjust the
LSTM reserve prediction to account for extreme payments.

arXiv link: http://arxiv.org/abs/2201.13267v1

Econometrics arXiv paper, submitted: 2022-01-26

Bootstrap inference for fixed-effect models

Authors: Ayden Higgins, Koen Jochmans

The maximum-likelihood estimator of nonlinear panel data models with fixed
effects is consistent but asymptotically biased under rectangular-array
asymptotics. The literature has thus far concentrated its effort on devising
methods to correct the maximum-likelihood estimator for its bias as a means to
salvage standard inferential procedures. Instead, we show that the parametric
bootstrap replicates the distribution of the (uncorrected) maximum-likelihood
estimator in large samples. This justifies the use of confidence sets
constructed via standard bootstrap percentile methods. No adjustment for the
presence of bias needs to be made.

arXiv link: http://arxiv.org/abs/2201.11156v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2022-01-26

Instrumental variable estimation of dynamic treatment effects on a duration outcome

Authors: Jad Beyhum, Samuele Centorrino, Jean-Pierre Florens, Ingrid Van Keilegom

This paper considers identification and estimation of the causal effect of
the time Z until a subject is treated on a survival outcome T. The treatment is
not randomly assigned, T is randomly right censored by a random variable C and
the time to treatment Z is right censored by min(T,C). The endogeneity issue is
treated using an instrumental variable explaining Z and independent of the
error term of the model. We study identification in a fully nonparametric
framework. We show that our specification generates an integral equation, of
which the regression function of interest is a solution. We provide
identification conditions that rely on this identification equation. For
estimation purposes, we assume that the regression function follows a
parametric model. We propose an estimation procedure and give conditions under
which the estimator is asymptotically normal. The estimators exhibit good
finite sample properties in simulations. Our methodology is applied to find
evidence supporting the efficacy of a therapy for burn-out.

arXiv link: http://arxiv.org/abs/2201.10826v6

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-01-26

Combining Experimental and Observational Data for Identification and Estimation of Long-Term Causal Effects

Authors: AmirEmad Ghassami, Chang Liu, Alan Yang, David Richardson, Ilya Shpitser, Eric Tchetgen Tchetgen

We study identifying and estimating the causal effect of a treatment variable
on a long-term outcome using data from an observational and an experimental
domain. The observational data are subject to unobserved confounding.
Furthermore, subjects in the experiment are only followed for a short period;
thus, long-term effects are unobserved, though short-term effects are
available. Consequently, neither data source alone suffices for causal
inference on the long-term outcome, necessitating a principled fusion of the
two. We propose three approaches for data fusion for the purpose of identifying
and estimating the causal effect. The first assumes equal confounding bias for
short-term and long-term outcomes. The second weakens this assumption by
leveraging an observed confounder for which the short-term and long-term
potential outcomes share the same partial additive association with this
confounder. The third approach employs proxy variables of the latent confounder
of the treatment-outcome relationship, extending the proximal causal inference
framework to the data fusion setting. For each approach, we develop influence
function-based estimators and analyze their robustness properties. We
illustrate our methods by estimating the effect of class size on 8th-grade SAT
scores using data from the Project STAR experiment combined with observational
data from the Early Childhood Longitudinal Study.

arXiv link: http://arxiv.org/abs/2201.10743v4

Econometrics arXiv cross-link from q-fin.TR (q-fin.TR), submitted: 2022-01-25

Modeling bid and ask price dynamics with an extended Hawkes process and its empirical applications for high-frequency stock market data

Authors: Kyungsub Lee, Byoung Ki Seo

This study proposes a versatile model for the dynamics of the best bid and
ask prices using an extended Hawkes process. The model incorporates the zero
intensities of the spread-narrowing processes at the minimum bid-ask spread,
spread-dependent intensities, possible negative excitement, and nonnegative
intensities. We apply the model to high-frequency best bid and ask price data
from US stock markets. The empirical findings demonstrate a spread-narrowing
tendency, excitations of the intensities caused by previous events, the impact
of flash crashes, characteristic trends in fast trading over time, and the
different features of market participants in the various exchanges.
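
For context, a textbook univariate Hawkes intensity with an exponential kernel
(not the authors' extended specification) takes the form

    \[ \lambda(t) = \mu + \sum_{t_i < t} \alpha\, e^{-\beta (t - t_i)}, \]

and the extension sketched in the abstract lets such excitation terms depend on
the prevailing bid-ask spread, allows negative excitement, forces the
spread-narrowing intensities to zero at the minimum spread, and keeps all
intensities nonnegative.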

arXiv link: http://arxiv.org/abs/2201.10173v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2022-01-21

Marginal Effects for Non-Linear Prediction Functions

Authors: Christian A. Scholbeck, Giuseppe Casalicchio, Christoph Molnar, Bernd Bischl, Christian Heumann

Beta coefficients for linear regression models represent the ideal form of an
interpretable feature effect. However, for non-linear models and especially
generalized linear models, the estimated coefficients cannot be interpreted as
a direct feature effect on the predicted outcome. Hence, marginal effects are
typically used as approximations for feature effects, either in the shape of
derivatives of the prediction function or forward differences in prediction due
to a change in a feature value. While marginal effects are commonly used in
many scientific fields, they have not yet been adopted as a model-agnostic
interpretation method for machine learning models. This may stem from their
inflexibility as a univariate feature effect and their inability to deal with
the non-linearities found in black box models. We introduce a new class of
marginal effects termed forward marginal effects. We argue for abandoning
derivatives in favor of more interpretable forward differences. Furthermore,
we generalize marginal effects based on forward differences to multivariate
changes in feature values. To account for the non-linearity of prediction
functions, we introduce a non-linearity measure for marginal effects. We argue
against summarizing feature effects of a non-linear prediction function in a
single metric such as the average marginal effect. Instead, we propose to
partition the feature space to compute conditional average marginal effects on
feature subspaces, which serve as conditional feature effect estimates.
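
As a rough illustration of the forward-difference idea (our own sketch with
placeholder data and model, not the authors' implementation), the following
computes observation-wise forward marginal effects for a fitted scikit-learn
regressor and then averages them on a simple partition of the feature space:

    # Illustrative forward marginal effect (fME): forward difference in the
    # prediction due to a chosen step in one feature. Model, data, and the
    # step size are placeholders for illustration only.
    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    X = pd.DataFrame({"x1": rng.normal(size=500), "x2": rng.normal(size=500)})
    y = np.sin(X["x1"]) + 0.5 * X["x2"] + rng.normal(scale=0.1, size=500)
    model = RandomForestRegressor(random_state=0).fit(X, y)

    def forward_marginal_effect(model, X, feature, step):
        """Observation-wise forward difference in predictions for a given step."""
        X_shifted = X.copy()
        X_shifted[feature] = X_shifted[feature] + step
        return model.predict(X_shifted) - model.predict(X)

    fme = forward_marginal_effect(model, X, "x1", step=1.0)
    # Conditional average forward marginal effects on a crude partition of x2,
    # in the spirit of feature-subspace averages rather than one global number.
    print(pd.Series(fme).groupby(X["x2"] > 0).mean())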

arXiv link: http://arxiv.org/abs/2201.08837v1

Econometrics arXiv paper, submitted: 2022-01-21

Minimax-Regret Climate Policy with Deep Uncertainty in Climate Modeling and Intergenerational Discounting

Authors: Stephen J. DeCanio, Charles F. Manski, Alan H. Sanstad

Integrated assessment models have become the primary tools for comparing
climate policies that seek to reduce greenhouse gas emissions. Policy
comparisons have often been performed by considering a planner who seeks to
make optimal trade-offs between the costs of carbon abatement and the economic
damages from climate change. The planning problem has been formalized as one of
optimal control, the objective being to minimize the total costs of abatement
and damages over a time horizon. Studying climate policy as a control problem
presumes that a planner knows enough to make optimization feasible, but
physical and economic uncertainties abound. Earlier, Manski, Sanstad, and
DeCanio proposed and studied use of the minimax-regret (MMR) decision criterion
to account for deep uncertainty in climate modeling. Here we study choice of
climate policy that minimizes maximum regret with deep uncertainty regarding
both the correct climate model and the appropriate time discount rate to use in
intergenerational assessment of policy consequences. The analysis specifies a
range of discount rates to express both empirical and normative uncertainty
about the appropriate rate. The findings regarding climate policy are novel and
informative. The MMR analysis points to use of a relatively low discount rate
of 0.02 for climate policy. The MMR decision rule keeps the maximum future
temperature increase below 2°C above the 1900-10 level for most of the parameter
values used to weight costs and damages.
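
For readers unfamiliar with the criterion, the toy sketch below (with purely
hypothetical cost numbers, not the paper's integrated assessment models)
illustrates how minimax regret selects a policy: compute each policy's regret
in every state of the world and pick the policy whose worst-case regret is
smallest.

    # Toy illustration of the minimax-regret (MMR) criterion; the cost numbers
    # (abatement plus damages) are hypothetical and NOT taken from the paper.
    import numpy as np

    policies = ["no_abatement", "moderate", "aggressive"]
    states = ["low_sensitivity", "high_sensitivity"]

    # costs[i, j] = total cost of policy i if state of the world j is true
    costs = np.array([
        [1.0, 9.0],   # no_abatement
        [3.0, 5.0],   # moderate
        [6.0, 4.0],   # aggressive
    ])

    best_per_state = costs.min(axis=0)   # best achievable cost in each state
    regret = costs - best_per_state      # regret of each policy in each state
    max_regret = regret.max(axis=1)      # worst-case regret per policy
    mmr_choice = policies[int(np.argmin(max_regret))]
    print(dict(zip(policies, max_regret)), "->", mmr_choice)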

arXiv link: http://arxiv.org/abs/2201.08826v1

Econometrics arXiv updated paper (originally submitted: 2022-01-21)

High-Dimensional Sparse Multivariate Stochastic Volatility Models

Authors: Benjamin Poignard, Manabu Asai

Although multivariate stochastic volatility (MSV) models usually produce more
accurate forecasts than MGARCH models, their estimation techniques,
such as Bayesian MCMC, typically suffer from the curse of dimensionality. We
propose a fast and efficient estimation approach for MSV models based on a penalized
OLS framework. Specifying the MSV model as a multivariate state space model, we
carry out a two-step penalized procedure. We provide the asymptotic properties
of the two-step estimator and the oracle property of the first-step estimator
when the number of parameters diverges. The performance of our method is
illustrated through simulations and financial data.

arXiv link: http://arxiv.org/abs/2201.08584v2

Econometrics arXiv paper, submitted: 2022-01-20

Estimation of Conditional Random Coefficient Models using Machine Learning Techniques

Authors: Stephan Martin

Nonparametric random coefficient (RC)-density estimation has mostly been
considered in the marginal density case under strict independence of RCs and
covariates. This paper deals with the estimation of RC-densities conditional on
a (large-dimensional) set of control variables using machine learning
techniques. The conditional RC-density makes it possible to disentangle observable from
unobservable heterogeneity in partial effects of continuous treatments, adding
to a growing literature on heterogeneous effect estimation using machine
learning. It is also informative of the conditional potential outcome
distribution. This paper proposes a two-stage sieve estimation procedure. First,
a closed-form sieve approximation of the conditional RC density is derived
where each sieve coefficient can be expressed as a conditional expectation
function varying with the controls. Second, sieve coefficients are estimated with
generic machine learning procedures and under appropriate sample splitting
rules. The $L_2$-convergence rate of the conditional RC-density estimator is
derived. The rate is slower by a factor than typical rates of mean-regression
machine learning estimators, which is due to the ill-posedness of the RC density
estimation problem. The performance and applicability of the estimator are
illustrated using random forest algorithms over a range of Monte Carlo
simulations and with real data from the SOEP-IS. Here behavioral heterogeneity
in an economic experiment on portfolio choice is studied. The method reveals
two types of behavior in the population, one type complying with economic
theory and one not. The assignment to types appears largely based on
unobservables not available in the data.

arXiv link: http://arxiv.org/abs/2201.08366v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-01-20

Learning with latent group sparsity via heat flow dynamics on networks

Authors: Subhroshekhar Ghosh, Soumendu Sundar Mukherjee

Group or cluster structure on explanatory variables in machine learning
problems is a very general phenomenon, which has attracted broad interest from
practitioners and theoreticians alike. In this work we contribute an approach
to learning under such group structure, that does not require prior information
on the group identities. Our paradigm is motivated by the Laplacian geometry of
an underlying network with a related community structure, and proceeds by
directly incorporating this into a penalty that is effectively computed via a
heat flow-based local network dynamics. In fact, we demonstrate a procedure to
construct such a network based on the available data. Notably, we dispense with
computationally intensive pre-processing involving clustering of variables,
spectral or otherwise. Our technique is underpinned by rigorous theorems that
guarantee its effective performance and provide bounds on its sample
complexity. In particular, in a wide range of settings, it provably suffices to
run the heat flow dynamics for time that is only logarithmic in the problem
dimensions. We explore in detail the interfaces of our approach with key
statistical physics models in network science, such as the Gaussian Free Field
and the Stochastic Block Model. We validate our approach by successful
applications to real-world data from a wide array of application domains,
including computer science, genetics, climatology and economics. Our work
raises the possibility of applying similar diffusion-based techniques to
classical learning tasks, exploiting the interplay between geometric, dynamical
and stochastic structures underlying the data.

arXiv link: http://arxiv.org/abs/2201.08326v1

Econometrics arXiv paper, submitted: 2022-01-19

Identification of Direct Socio-Geographical Price Discrimination: An Empirical Study on iPhones

Authors: Davidson Cheng

Price discrimination is a practice where firms utilize varying sensitivities
to prices among consumers to increase profits. The welfare effects of price
discrimination are not agreed on among economists, but identification of such
actions may contribute to our standing of firms' pricing behaviors. In this
letter, I use econometric tools to analyze whether Apple Inc, one of the
largest companies in the globe, is practicing price discrimination on the basis
of socio-economical and geographical factors. My results indicate that iPhones
are significantly (p $<$ 0.01) more expensive in markets where competitions are
weak or where Apple has a strong market presence. Furthermore, iPhone prices
are likely to increase (p $<$ 0.01) in developing countries/regions or markets
with high income inequality.

arXiv link: http://arxiv.org/abs/2201.07903v1

Econometrics arXiv paper, submitted: 2022-01-18

Asymptotic properties of Bayesian inference in linear regression with a structural break

Authors: Kenichi Shimizu

This paper studies large sample properties of a Bayesian approach to
inference about slope parameters $\gamma$ in linear regression models with a
structural break. In contrast to the conventional approach to inference about
$\gamma$ that does not take into account the uncertainty of the unknown break
location $\tau$, the Bayesian approach that we consider incorporates such
uncertainty. Our main theoretical contribution is a Bernstein-von Mises type
theorem (Bayesian asymptotic normality) for $\gamma$ under a wide class of
priors, which essentially indicates an asymptotic equivalence between the
conventional frequentist and Bayesian inference. Consequently, a frequentist
researcher could look at credible intervals of $\gamma$ to check robustness
with respect to the uncertainty of $\tau$. Simulation studies show that the
conventional confidence intervals of $\gamma$ tend to undercover in finite
samples whereas the credible intervals offer more reasonable coverages in
general. As the sample size increases, the two methods coincide, as predicted
from our theoretical conclusion. Using data from Paye and Timmermann (2006) on
stock return prediction, we illustrate that the traditional confidence
intervals on $\gamma$ might underrepresent the true sampling uncertainty.

arXiv link: http://arxiv.org/abs/2201.07319v1

Econometrics arXiv updated paper (originally submitted: 2022-01-18)

Large Hybrid Time-Varying Parameter VARs

Authors: Joshua C. C. Chan

Time-varying parameter VARs with stochastic volatility are routinely used for
structural analysis and forecasting in settings involving a few endogenous
variables. Applying these models to high-dimensional datasets has proved to be
challenging due to intensive computations and over-parameterization concerns.
We develop an efficient Bayesian sparsification method for a class of models we
call hybrid TVP-VARs--VARs with time-varying parameters in some equations but
constant coefficients in others. Specifically, for each equation, the new
method automatically decides whether the VAR coefficients and contemporaneous
relations among variables are constant or time-varying. Using US datasets of
various dimensions, we find evidence that the parameters in some, but not all,
equations are time varying. The large hybrid TVP-VAR also forecasts better than
many standard benchmarks.

arXiv link: http://arxiv.org/abs/2201.07303v2

Econometrics arXiv paper, submitted: 2022-01-18

Bayesian inference of spatial and temporal relations in AI patents for EU countries

Authors: Krzysztof Rusek, Agnieszka Kleszcz, Albert Cabellos-Aparicio

In this paper, we propose two models of Artificial Intelligence (AI) patents
in European Union (EU) countries addressing spatial and temporal behaviour. In
particular, the models can quantitatively describe the interaction between
countries or explain the rapidly growing trends in AI patents. For the spatial
analysis, Poisson regression is used to explain collaboration between pairs of
countries, measured by the number of common patents. Through Bayesian inference,
we estimated the strengths of interactions between countries in the EU and the
rest of the world. In particular, a significant lack of cooperation has been
identified for some pairs of countries.
For the temporal behaviour, an inhomogeneous Poisson process combined with
logistic curve growth yields an accurate trend line. Bayesian analysis in the
time domain revealed an upcoming slowdown in
patenting intensity.

arXiv link: http://arxiv.org/abs/2201.07168v1

Econometrics arXiv updated paper (originally submitted: 2022-01-18)

Who Increases Emergency Department Use? New Insights from the Oregon Health Insurance Experiment

Authors: Augustine Denteh, Helge Liebert

We provide new insights regarding the headline result that Medicaid increased
emergency department (ED) use from the Oregon experiment. We find meaningful
heterogeneous impacts of Medicaid on ED use using causal machine learning
methods. The individualized treatment effect distribution includes a wide range
of negative and positive values, suggesting the average effect masks
substantial heterogeneity. A small group, about 14% of participants, in the right
tail of the distribution drives the overall effect. We identify priority groups
with economically significant increases in ED usage based on demographics and
previous utilization. Intensive margin effects are an important driver of
increases in ED utilization.

arXiv link: http://arxiv.org/abs/2201.07072v4

Econometrics arXiv paper, submitted: 2022-01-18

The Time-Varying Multivariate Autoregressive Index Model

Authors: G. Cubadda, S. Grassi, B. Guardabascio

Many economic variables feature changes in their conditional mean and
volatility, and Time Varying Vector Autoregressive Models are often used to
handle such complexity in the data. Unfortunately, when the number of series
grows, they present increasing estimation and interpretation problems. This
paper addresses this issue by proposing a new Multivariate Autoregressive
Index model that features time-varying means and volatility. Technically, we
develop a new estimation methodology that mixes switching algorithms with the
forgetting-factor strategy of Koop and Korobilis (2012). This substantially
reduces the computational burden and allows us to select or weight, in real time,
the number of common components and other features of the data using Dynamic
Model Selection or Dynamic Model Averaging without further computational cost.
Using US macroeconomic data, we provide a structural analysis and a
forecasting exercise that demonstrate the feasibility and usefulness of this
new model.
Keywords: Large datasets, Multivariate Autoregressive Index models,
Stochastic volatility, Bayesian VARs.

arXiv link: http://arxiv.org/abs/2201.07069v1

Econometrics arXiv updated paper (originally submitted: 2022-01-18)

Close Enough? A Large-Scale Exploration of Non-Experimental Approaches to Advertising Measurement

Authors: Brett R. Gordon, Robert Moakler, Florian Zettelmeyer

Despite their popularity, randomized controlled trials (RCTs) are not always
available for the purposes of advertising measurement. Non-experimental data is
thus required. However, Facebook and other ad platforms use complex and
evolving processes to select ads for users. Therefore, successful
non-experimental approaches need to "undo" this selection. We analyze 663
large-scale experiments at Facebook to investigate whether this is possible
with the data typically logged at large ad platforms. With access to over 5,000
user-level features, these data are richer than what most advertisers or their
measurement partners can access. We investigate how accurately two
non-experimental methods -- double/debiased machine learning (DML) and
stratified propensity score matching (SPSM) -- can recover the experimental
effects. Although DML performs better than SPSM, neither method performs well,
even using flexible deep learning models to implement the propensity and
outcome models. The median RCT lifts are 29%, 18%, and 5% for the upper,
middle, and lower funnel outcomes, respectively. Using DML (SPSM), the median
lift by funnel is 83% (173%), 58% (176%), and 24% (64%), respectively,
indicating significant relative measurement errors. We further characterize the
circumstances under which each method performs comparatively better. Overall,
despite having access to large-scale experiments and rich user-level data, we
are unable to reliably estimate an ad campaign's causal effect.

arXiv link: http://arxiv.org/abs/2201.07055v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2022-01-18

Socioeconomic disparities and COVID-19: the causal connections

Authors: Tannista Banerjee, Ayan Paul, Vishak Srikanth, Inga Strümke

The analysis of causation is a challenging task that can be approached in
various ways. With the increasing use of machine learning based models in
computational socioeconomics, explaining these models while taking causal
connections into account is a necessity. In this work, we advocate the use of
an explanatory framework from cooperative game theory augmented with $do$
calculus, namely causal Shapley values. Using causal Shapley values, we analyze
socioeconomic disparities that have a causal link to the spread of COVID-19 in
the USA. We study several phases of the disease spread to show how the causal
connections change over time. We perform a causal analysis using random effects
models and discuss the correspondence between the two methods to verify our
results. We show the distinct advantages that non-linear machine learning models
have over linear models when performing a multivariate analysis, especially
since the machine learning models can map out non-linear correlations in the
data. In addition, the causal Shapley values allow for including the causal
structure in the variable importance computed for the machine learning model.

arXiv link: http://arxiv.org/abs/2201.07026v1

Econometrics arXiv updated paper (originally submitted: 2022-01-18)

Difference-in-Differences Estimators for Treatments Continuously Distributed at Every Period

Authors: Clément de Chaisemartin, Xavier D'Haultfœuille, Félix Pasquier, Doulo Sow, Gonzalo Vazquez-Bare

When one studies the effects of taxes, tariffs, or prices using panel data,
the treatment is often continuously distributed in every period. We propose
difference-in-differences (DID) estimators for such cases. We assume that
between consecutive periods, the treatment of some units, the switchers,
changes, while the treatment of other units, the stayers, remains constant. We
show that under a parallel-trends assumption, the slopes of switchers'
potential outcomes are nonparametrically identified by
difference-in-differences estimands comparing the outcome evolutions of
switchers and stayers with the same baseline treatment. Controlling for the
baseline treatment ensures that our estimands remain valid if the treatment's
effect changes over time. We consider two weighted averages of switchers'
slopes, and discuss their respective advantages. For each weighted average, we
propose a doubly-robust, nonparametric, and root-$n$-consistent estimator. We
generalize our results to the instrumental-variable case. We apply our method
to estimate the price-elasticity of gasoline consumption.
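
To convey the identification idea, here is a stylized two-period, plug-in
version of the switcher-versus-stayer comparison with a discretized baseline
treatment (our own simplification; it is not the paper's doubly-robust,
nonparametric estimator and ignores inference):

    # Stylized two-period DID for a continuous treatment: within cells of the
    # baseline treatment, compare the outcome evolution of switchers and
    # stayers and scale by the switchers' treatment change.
    import numpy as np
    import pandas as pd

    def did_switchers_slope(df, n_bins=10):
        """df has columns d1, d2 (treatment in periods 1, 2) and y1, y2 (outcomes)."""
        df = df.assign(dy=df.y2 - df.y1, dd=df.d2 - df.d1,
                       switcher=lambda x: x.dd != 0)
        slopes, weights = [], []
        for _, cell in df.groupby(pd.cut(df.d1, bins=n_bins)):
            sw, st = cell[cell.switcher], cell[~cell.switcher]
            if len(sw) == 0 or len(st) == 0:
                continue
            # DID: excess outcome evolution of switchers per unit of treatment change.
            slopes.append((sw.dy.mean() - st.dy.mean()) / sw.dd.mean())
            weights.append(len(sw))
        return np.average(slopes, weights=weights)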

arXiv link: http://arxiv.org/abs/2201.06898v6

Econometrics arXiv updated paper (originally submitted: 2022-01-18)

Homophily in preferences or meetings? Identifying and estimating an iterative network formation model

Authors: Luis Alvarez, Cristine Pinto, Vladimir Ponczek

Is homophily in social and economic networks driven by a taste for
homogeneity (preferences) or by a higher probability of meeting individuals
with similar attributes (opportunity)? This paper studies identification and
estimation of an iterative network game that distinguishes between these two
mechanisms. Our approach enables us to assess the counterfactual effects of
changing the meeting protocol between agents. As an application, we study the
role of preferences and meetings in shaping classroom friendship networks in
Brazil. In a network structure in which homophily due to preferences is
stronger than homophily due to meeting opportunities, tracking students may
improve welfare. Still, the relative benefit of this policy diminishes over the
school year.

arXiv link: http://arxiv.org/abs/2201.06694v5

Econometrics arXiv paper, submitted: 2022-01-17

An Entropy-Based Approach for Nonparametrically Testing Simple Probability Distribution Hypotheses

Authors: Ron Mittelhammer, George Judge, Miguel Henry

In this paper, we introduce a flexible and widely applicable nonparametric
entropy-based testing procedure that can be used to assess the validity of
simple hypotheses about a specific parametric population distribution. The
testing methodology relies on the characteristic function of the population
probability distribution being tested and is attractive in that, regardless of
the null hypothesis being tested, it provides a unified framework for
conducting such tests. The testing procedure is also computationally tractable
and relatively straightforward to implement. In contrast to some alternative
test statistics, the proposed entropy test is free from user-specified kernel
and bandwidth choices, idiosyncratic and complex regularity conditions, and/or
choices of evaluation grids. Several simulation exercises were performed to
document the empirical performance of our proposed test, including a regression
example that is illustrative of how, in some contexts, the approach can be
applied to composite hypothesis-testing situations via data transformations.
Overall, the testing procedure shows notable promise, with power that
increases appreciably with the sample size for a number of alternative
distributions contrasted with the hypothesized null distributions. Possible
general extensions of the approach to composite hypothesis-testing contexts,
and directions for future work are also discussed.

arXiv link: http://arxiv.org/abs/2201.06647v1

Econometrics arXiv updated paper (originally submitted: 2022-01-17)

Inferential Theory for Granular Instrumental Variables in High Dimensions

Authors: Saman Banafti, Tae-Hwy Lee

The Granular Instrumental Variables (GIV) methodology exploits panels with
factor error structures to construct instruments to estimate structural time
series models with endogeneity even after controlling for latent factors. We
extend the GIV methodology in several dimensions. First, we extend the
identification procedure to a large $N$ and large $T$ framework, which depends
on the asymptotic Herfindahl index of the size distribution of $N$
cross-sectional units. Second, we treat both the factors and loadings as
unknown and show that the sampling error in the estimated instrument and
factors is negligible when considering the limiting distribution of the
structural parameters. Third, we show that the sampling error in the
high-dimensional precision matrix is negligible in our estimation algorithm.
Fourth, we overidentify the structural parameters with additional constructed
instruments, which leads to efficiency gains. Monte Carlo evidence is presented
to support our asymptotic theory and application to the global crude oil market
leads to new results.

arXiv link: http://arxiv.org/abs/2201.06605v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2022-01-17

On Well-posedness and Minimax Optimal Rates of Nonparametric Q-function Estimation in Off-policy Evaluation

Authors: Xiaohong Chen, Zhengling Qi

We study the off-policy evaluation (OPE) problem in an infinite-horizon
Markov decision process with continuous states and actions. We recast the
$Q$-function estimation into a special form of the nonparametric instrumental
variables (NPIV) estimation problem. We first show that under one mild
condition the NPIV formulation of $Q$-function estimation is well-posed in the
sense of $L^2$-measure of ill-posedness with respect to the data generating
distribution, bypassing a strong assumption on the discount factor $\gamma$
imposed in the recent literature for obtaining the $L^2$ convergence rates of
various $Q$-function estimators. Thanks to this new well-posed property, we
derive the first minimax lower bounds for the convergence rates of
nonparametric estimation of $Q$-function and its derivatives in both sup-norm
and $L^2$-norm, which are shown to be the same as those for the classical
nonparametric regression (Stone, 1982). We then propose a sieve two-stage least
squares estimator and establish its rate-optimality in both norms under some
mild conditions. Our general results on the well-posedness and the minimax
lower bounds are of independent interest to study not only other nonparametric
estimators for $Q$-function but also efficient estimation on the value of any
target policy in off-policy settings.
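
For background (our paraphrase of the standard policy-evaluation Bellman
restriction, not a quotation from the paper), the $Q$-function of a target
policy $\pi$ satisfies a conditional moment restriction in which the
state-action pair acts as its own instrument,

    \[ E\big[R + \gamma\, Q(S', A') - Q(S, A) \,\big|\, S, A\big] = 0,
       \qquad A' \sim \pi(\cdot \mid S'), \]

which has exactly the form of a nonparametric instrumental variables problem
and motivates a sieve two-stage least squares estimator for $Q$.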

arXiv link: http://arxiv.org/abs/2201.06169v3

Econometrics arXiv paper, submitted: 2022-01-16

Nonparametric Identification of Random Coefficients in Endogenous and Heterogeneous Aggregate Demand Models

Authors: Fabian Dunker, Stefan Hoderlein, Hiroaki Kaido

This paper studies nonparametric identification in market level demand models
for differentiated products with heterogeneous consumers. We consider a general
class of models that allows for the individual specific coefficients to vary
continuously across the population and give conditions under which the density
of these coefficients, and hence also functionals such as welfare measures, is
identified. A key finding is that two leading models, the BLP-model (Berry,
Levinsohn, and Pakes, 1995) and the pure characteristics model (Berry and
Pakes, 2007), require considerably different conditions on the support of the
product characteristics.

arXiv link: http://arxiv.org/abs/2201.06140v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2022-01-15

Treatment Effect Risk: Bounds and Inference

Authors: Nathan Kallus

Since the average treatment effect (ATE) measures the change in social
welfare, even if positive, there is a risk of negative effect on, say, some 10%
of the population. Assessing such risk is difficult, however, because any one
individual treatment effect (ITE) is never observed, so the 10% worst-affected
cannot be identified, while distributional treatment effects only compare the
first deciles within each treatment group, which does not correspond to any
10%-subpopulation. In this paper we consider how to nonetheless assess this
important risk measure, formalized as the conditional value at risk (CVaR) of
the ITE-distribution. We leverage the availability of pre-treatment covariates
and characterize the tightest-possible upper and lower bounds on ITE-CVaR given
by the covariate-conditional average treatment effect (CATE) function. We then
proceed to study how to estimate these bounds efficiently from data and
construct confidence intervals. This is challenging even in randomized
experiments as it requires understanding the distribution of the unknown CATE
function, which can be very complex if we use rich covariates so as to best
control for heterogeneity. We develop a debiasing method that overcomes this
and prove it enjoys favorable statistical properties even when CATE and other
nuisances are estimated by black-box machine learning or even inconsistently.
Studying a hypothetical change to French job-search counseling services, our
bounds and inference demonstrate that a small social benefit entails a negative
impact on a substantial subpopulation.
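
As a rough numerical companion (a naive plug-in using placeholder CATE
estimates, not the paper's debiased procedure), the sketch below computes the
CVaR of an estimated CATE distribution, which, as we read the abstract, is the
CATE-based upper bound on the CVaR of the individual treatment effects:

    # Plug-in CVaR at level alpha of an estimated CATE distribution.
    # `cate_hat` is a placeholder array of CATE estimates at sample covariates.
    import numpy as np

    def cvar(values, alpha=0.10):
        """Mean of the worst (lowest) alpha-fraction of `values`."""
        values = np.sort(np.asarray(values))
        k = max(1, int(np.ceil(alpha * len(values))))
        return values[:k].mean()

    rng = np.random.default_rng(0)
    cate_hat = rng.normal(loc=0.3, scale=0.5, size=10_000)  # hypothetical estimates
    print("ATE estimate:", cate_hat.mean())
    print("CVaR_0.10 of CATE (bound on ITE-CVaR):", cvar(cate_hat, 0.10))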

arXiv link: http://arxiv.org/abs/2201.05893v2

Econometrics arXiv paper, submitted: 2022-01-14

Measuring Changes in Disparity Gaps: An Application to Health Insurance

Authors: Paul Goldsmith-Pinkham, Karen Jiang, Zirui Song, Jacob Wallace

We propose a method for reporting how program evaluations reduce gaps between
groups, such as the gender or Black-white gap. We first show that the reduction
in disparities between groups can be written as the difference in conditional
average treatment effects (CATE) for each group. Then, using a
Kitagawa-Oaxaca-Blinder-style decomposition, we highlight how these CATE can be
decomposed into unexplained differences in CATE in other observables versus
differences in composition across other observables (e.g. the "endowment").
Finally, we apply this approach to study the impact of Medicare on Americans'
access to health insurance.

arXiv link: http://arxiv.org/abs/2201.05672v1

Econometrics arXiv updated paper (originally submitted: 2022-01-14)

Monitoring the Economy in Real Time: Trends and Gaps in Real Activity and Prices

Authors: Thomas Hasenzagl, Filippo Pellegrino, Lucrezia Reichlin, Giovanni Ricco

We propose two specifications of a real-time mixed-frequency semi-structural
time series model for evaluating the output potential, output gap, Phillips
curve, and Okun's law for the US. The baseline model uses minimal theory-based
multivariate identification restrictions to inform trend-cycle decomposition,
while the alternative model adds the CBO's output gap measure as an observed
variable. The latter model results in a smoother output potential and lower
cyclical correlation between inflation and real variables but performs worse in
forecasting beyond the short term. This methodology allows for the assessment
and real-time monitoring of official trend and gap estimates.

arXiv link: http://arxiv.org/abs/2201.05556v2

Econometrics arXiv updated paper (originally submitted: 2022-01-14)

Detecting Multiple Structural Breaks in Systems of Linear Regression Equations with Integrated and Stationary Regressors

Authors: Karsten Schweikert

In this paper, we propose a two-step procedure based on the group LASSO
estimator in combination with a backward elimination algorithm to detect
multiple structural breaks in linear regressions with multivariate responses.
Applying the two-step estimator, we jointly detect the number and location of
structural breaks, and provide consistent estimates of the coefficients. Our
framework is flexible enough to allow for a mix of integrated and stationary
regressors, as well as deterministic terms. Using simulation experiments, we
show that the proposed two-step estimator performs competitively against the
likelihood-based approach (Qu and Perron, 2007; Li and Perron, 2017; Oka and
Perron, 2018) in finite samples. However, the two-step estimator is
computationally much more efficient. An economic application to the
identification of structural breaks in the term structure of interest rates
illustrates this methodology.

arXiv link: http://arxiv.org/abs/2201.05430v4

Econometrics arXiv updated paper (originally submitted: 2022-01-13)

Kernel methods for long term dose response curves

Authors: Rahul Singh, Hannah Zhou

A core challenge in causal inference is how to extrapolate long term effects,
of possibly continuous actions, from short term experimental data. It arises in
artificial intelligence: the long term consequences of continuous actions may
be of interest, yet only short term rewards may be collected in exploration.
For this estimand, called the long term dose response curve, we propose a
simple nonparametric estimator based on kernel ridge regression. By embedding
the distribution of the short term experimental data with kernels, we derive
interpretable weights for extrapolating long term effects. Our method allows
actions, short term rewards, and long term rewards to be continuous in general
spaces. It also allows for nonlinearity and heterogeneity in the link between
short term effects and long term effects. We prove uniform consistency, with
nonasymptotic error bounds reflecting the effective dimension of the data. As
an application, we estimate the long term dose response curve of Project STAR,
a social program which randomly assigned students to various class sizes. We
extend our results to long term counterfactual distributions, proving weak
convergence.
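
To make the extrapolation step concrete, here is a stylized surrogate-style
composition of two kernel ridge regressions on placeholder data; it conveys
only the flavor of the approach and is not the authors' embedding-based
estimator or its closed-form weights:

    # Stylized sketch: extrapolate a long-term dose response by composing a
    # historical fit (short-term reward -> long-term reward) with an
    # experimental fit (dose -> short-term reward). Placeholder data only.
    import numpy as np
    from sklearn.kernel_ridge import KernelRidge

    rng = np.random.default_rng(0)

    # Historical data: both short-term reward S and long-term reward Y observed.
    S_hist = rng.uniform(0, 2, size=(400, 1))
    Y_hist = 2.0 * np.sqrt(S_hist[:, 0]) + rng.normal(scale=0.1, size=400)
    stage1 = KernelRidge(kernel="rbf", alpha=1e-2, gamma=1.0).fit(S_hist, Y_hist)

    # Experimental data: dose D and short-term reward S, no long-term follow-up.
    D_exp = rng.uniform(0, 1, size=(400, 1))
    S_exp = 1.5 * D_exp[:, 0] + rng.normal(scale=0.1, size=400)
    stage2 = KernelRidge(kernel="rbf", alpha=1e-2, gamma=1.0).fit(D_exp, S_exp)

    # Long-term dose response on a grid of doses: compose the two fits.
    doses = np.linspace(0, 1, 11).reshape(-1, 1)
    long_term_curve = stage1.predict(stage2.predict(doses).reshape(-1, 1))
    print(np.round(long_term_curve, 2))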

arXiv link: http://arxiv.org/abs/2201.05139v2

Econometrics arXiv updated paper (originally submitted: 2022-01-13)

Binary response model with many weak instruments

Authors: Dakyung Seong

This paper considers an endogenous binary response model with many weak
instruments. We employ a control function approach and a regularization scheme
to obtain better estimation results for the endogenous binary response model in
the presence of many weak instruments. Two consistent and asymptotically
normally distributed estimators are provided: a
regularized conditional maximum likelihood estimator (RCMLE) and a regularized
nonlinear least squares estimator (RNLSE). Monte Carlo simulations show that
the proposed estimators outperform the existing ones when there are many weak
instruments. We use the proposed estimation method to examine the effect of
family income on college completion.

arXiv link: http://arxiv.org/abs/2201.04811v4

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2022-01-12

Optimal Best Arm Identification in Two-Armed Bandits with a Fixed Budget under a Small Gap

Authors: Masahiro Kato, Kaito Ariu, Masaaki Imaizumi, Masahiro Nomura, Chao Qin

We consider fixed-budget best-arm identification in two-armed Gaussian bandit
problems. One of the longstanding open questions is the existence of an optimal
strategy under which the probability of misidentification matches a lower
bound. We show that a strategy following the Neyman allocation rule (Neyman,
1934) is asymptotically optimal when the gap between the expected rewards is
small. First, we review a lower bound derived by Kaufmann et al. (2016). Then,
we propose the "Neyman Allocation (NA)-Augmented Inverse Probability weighting
(AIPW)" strategy, which consists of the sampling rule using the Neyman
allocation with an estimated standard deviation and the recommendation rule
using an AIPW estimator. Our proposed strategy is optimal because the upper
bound matches the lower bound when the budget goes to infinity and the gap goes
to zero.
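
A minimal sketch of the two ingredients, as we read the abstract (two Gaussian
arms with hypothetical parameters; not the paper's exact algorithm or its
optimality guarantees): sample each arm with probability proportional to its
estimated standard deviation, then recommend the arm with the larger AIPW
estimate of its mean once the budget is exhausted.

    # Neyman-allocation sampling rule plus an AIPW recommendation rule for
    # two Gaussian arms; parameters and warm-up length are illustrative.
    import numpy as np

    rng = np.random.default_rng(0)
    means, sds, T = np.array([0.50, 0.52]), np.array([1.0, 2.0]), 5000

    rewards = [[], []]
    aipw_terms = np.zeros((T, 2))
    for t in range(T):
        if t < 20:                     # deterministic warm-up: alternate arms
            a, probs = t % 2, np.array([0.5, 0.5])
        else:                          # Neyman allocation: prob. proportional to sd
            sd_hat = np.array([np.std(rewards[0]), np.std(rewards[1])]) + 1e-6
            probs = sd_hat / sd_hat.sum()
            a = int(rng.choice(2, p=probs))
        mu_hat = np.array([np.mean(arm) if arm else 0.0 for arm in rewards])
        r = rng.normal(means[a], sds[a])
        rewards[a].append(r)
        for k in range(2):             # AIPW score for each arm's mean at round t
            aipw_terms[t, k] = mu_hat[k] + (a == k) * (r - mu_hat[k]) / probs[k]

    print("recommended arm:", int(np.argmax(aipw_terms.mean(axis=0))))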

arXiv link: http://arxiv.org/abs/2201.04469v8

Econometrics arXiv paper, submitted: 2022-01-10

A machine learning search for optimal GARCH parameters

Authors: Luke De Clerk, Sergey Savel'ev

Here, we use Machine Learning (ML) algorithms to update and improve the
efficiencies of fitting GARCH model parameters to empirical data. We employ an
Artificial Neural Network (ANN) to predict the parameters of these models. We
present a fitting algorithm for GARCH-normal(1,1) models to predict one of the
model's parameters, $\alpha_1$ and then use the analytical expressions for the
fourth order standardised moment, $\Gamma_4$ and the unconditional second order
moment, $\sigma^2$ to fit the other two parameters; $\beta_1$ and $\alpha_0$,
respectively. The speed of fitting of the parameters and quick implementation
of this approach allows for real time tracking of GARCH parameters. We further
show that different inputs to the ANN, namely higher order standardised moments
and the autocovariance of the time series, can be used to fit the model parameters
with the ANN, but not always with the same level of accuracy.
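
To make the moment-matching step concrete, here is a small sketch (our own,
using the standard GARCH(1,1)-normal moment formulas, not the authors' code)
that recovers $\beta_1$ and $\alpha_0$ from a predicted $\alpha_1$ together
with estimates of $\Gamma_4$ and $\sigma^2$:

    # Back out beta1 and alpha0 from alpha1 via the GARCH(1,1)-normal moments
    # sigma^2 = alpha0 / (1 - alpha1 - beta1) and
    # Gamma_4 = 3 (1 - s^2) / (1 - s^2 - 2 alpha1^2), with s = alpha1 + beta1.
    # alpha1 would come from the ANN; Gamma_4 and sigma^2 from the data.
    import math

    def implied_beta1_alpha0(alpha1, gamma4, sigma2):
        if gamma4 <= 3:
            raise ValueError("Gamma_4 must exceed 3 for this inversion.")
        s_sq = 1.0 - 2.0 * gamma4 * alpha1 ** 2 / (gamma4 - 3.0)
        if s_sq <= 0:
            raise ValueError("No admissible persistence for these moments.")
        s = math.sqrt(s_sq)                      # persistence alpha1 + beta1
        return s - alpha1, sigma2 * (1.0 - s)    # beta1, alpha0

    # Hypothetical inputs, purely for illustration.
    print(implied_beta1_alpha0(alpha1=0.10, gamma4=4.0, sigma2=1.0e-4))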

arXiv link: http://arxiv.org/abs/2201.03286v1

Econometrics arXiv updated paper (originally submitted: 2022-01-07)

Approximate Factor Models for Functional Time Series

Authors: Sven Otto, Nazarii Salish

We propose a novel approximate factor model tailored for analyzing
time-dependent curve data. Our model decomposes such data into two distinct
components: a low-dimensional predictable factor component and an unpredictable
error term. These components are identified through the autocovariance
structure of the underlying functional time series. The model parameters are
consistently estimated using the eigencomponents of a cumulative autocovariance
operator and an information criterion is proposed to determine the appropriate
number of factors. Applications to mortality and yield curve modeling
illustrate key advantages of our approach over the widely used functional
principal component analysis, as it offers parsimonious structural
representations of the underlying dynamics along with gains in out-of-sample
forecast performance.

arXiv link: http://arxiv.org/abs/2201.02532v4

Econometrics arXiv cross-link from cs.CE (cs.CE), submitted: 2022-01-07

Microeconomic Foundations of Decentralised Organisations

Authors: Mauricio Jacobo Romero, André Freitas

In this article, we analyse how decentralised digital infrastructures can
provide a fundamental change in the structure and dynamics of organisations.
The works of R.H.Coase and M. Olson, on the nature of the firm and the logic of
collective action, respectively, are revisited under the light of these
emerging new digital foundations. We also analyse how these technologies can
affect the fundamental assumptions on the role of organisations (either private
or public) as mechanisms for the coordination of labour. We propose that these
technologies can fundamentally affect: (i) the distribution of rewards within
an organisation and (ii) the structure of its transaction costs. These changes
bring the potential for addressing some of the trade-offs between the private
and public sectors.

arXiv link: http://arxiv.org/abs/2201.07666v2

Econometrics arXiv updated paper (originally submitted: 2022-01-07)

Unconditional Effects of General Policy Interventions

Authors: Julian Martinez-Iriarte, Gabriel Montes-Rojas, Yixiao Sun

This paper studies the unconditional effects of a general policy
intervention, which includes location-scale shifts and simultaneous shifts as
special cases. The location-scale shift is intended to study a counterfactual
policy aimed at changing not only the mean or location of a covariate but also
its dispersion or scale. The simultaneous shift refers to the situation where
shifts in two or more covariates take place simultaneously. For example, a
shift in one covariate is compensated at a certain rate by a shift in another
covariate. Not accounting for these possible scale or simultaneous shifts will
result in an incorrect assessment of the potential policy effects on an outcome
variable of interest. The unconditional policy parameters are estimated with
simple semiparametric estimators, for which asymptotic properties are studied.
Monte Carlo simulations are implemented to study their finite sample
performances. The proposed approach is applied to a Mincer equation to study
the effects of changing years of education on wages and to study the effect of
smoking during pregnancy on birth weight.
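
In symbols, one convenient way to parameterize the location-scale shift (our
notation, not necessarily the paper's exact definition) is

    \[ X(\delta) = X + \delta_1 + \delta_2\,(X - \mu_X), \]

where $\mu_X$ is the covariate's mean: $\delta_1$ moves the location while
$\delta_2$ rescales the dispersion, and the unconditional policy effect is the
change induced in the outcome's marginal distribution when $X$ is replaced by
$X(\delta)$.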

arXiv link: http://arxiv.org/abs/2201.02292v3

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2022-01-05

NumHTML: Numeric-Oriented Hierarchical Transformer Model for Multi-task Financial Forecasting

Authors: Linyi Yang, Jiazheng Li, Ruihai Dong, Yue Zhang, Barry Smyth

Financial forecasting has been an important and active area of machine
learning research because of the challenges it presents and the potential
rewards that even minor improvements in prediction accuracy or forecasting may
entail. Traditionally, financial forecasting has heavily relied on quantitative
indicators and metrics derived from structured financial statements. Earnings
conference call data, including text and audio, is an important source of
unstructured data that has been used for various prediction tasks using deep
learning and related approaches. However, current deep learning-based methods
are limited in the way that they deal with numeric data; numbers are typically
treated as plain-text tokens without taking advantage of their underlying
numeric structure. This paper describes a numeric-oriented hierarchical
transformer model to predict stock returns and financial risk using
multi-modal aligned earnings calls data by taking advantage of the different
categories of numbers (monetary, temporal, percentages etc.) and their
magnitude. We present the results of a comprehensive evaluation of NumHTML
against several state-of-the-art baselines using a real-world publicly
available dataset. The results indicate that NumHTML significantly outperforms
the current state-of-the-art across a variety of evaluation metrics and that it
has the potential to offer significant financial gains in a practical trading
context.

arXiv link: http://arxiv.org/abs/2201.01770v1

Econometrics arXiv updated paper (originally submitted: 2022-01-04)

What's Trending in Difference-in-Differences? A Synthesis of the Recent Econometrics Literature

Authors: Jonathan Roth, Pedro H. C. Sant'Anna, Alyssa Bilinski, John Poe

This paper synthesizes recent advances in the econometrics of
difference-in-differences (DiD) and provides concrete recommendations for
practitioners. We begin by articulating a simple set of “canonical”
assumptions under which the econometrics of DiD are well-understood. We then
argue that recent advances in DiD methods can be broadly classified as relaxing
some components of the canonical DiD setup, with a focus on $(i)$ multiple
periods and variation in treatment timing, $(ii)$ potential violations of
parallel trends, or $(iii)$ alternative frameworks for inference. Our
discussion highlights the different ways that the DiD literature has advanced
beyond the canonical model, and helps to clarify when each of the papers will
be relevant for empirical work. We conclude by discussing some promising areas
for future research.

arXiv link: http://arxiv.org/abs/2201.01194v3

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2022-01-04

A Multivariate Dependence Analysis for Electricity Prices, Demand and Renewable Energy Sources

Authors: Fabrizio Durante, Angelica Gianfreda, Francesco Ravazzolo, Luca Rossini

This paper examines the dependence between electricity prices, demand, and
renewable energy sources by means of a multivariate copula model, focusing on
Germany, the most widely studied market in Europe. The inter-dependencies
are investigated in-depth and monitored over time, with particular emphasis on
the tail behavior. To this end, suitable tail dependence measures are
introduced to take into account a multivariate extreme scenario appropriately
identified through Kendall's distribution function. The empirical
evidence demonstrates a strong association between electricity prices,
renewable energy sources, and demand within a day and over the studied years.
Hence, this analysis provides guidance for further and different incentives for
promoting green energy generation while considering the time-varying
dependencies of the involved variables.

arXiv link: http://arxiv.org/abs/2201.01132v1

Econometrics arXiv cross-link from stat.CO (stat.CO), submitted: 2022-01-04

Efficient Likelihood-based Estimation via Annealing for Dynamic Structural Macrofinance Models

Authors: Andras Fulop, Jeremy Heng, Junye Li

Most solved dynamic structural macrofinance models are non-linear and/or
non-Gaussian state-space models with high-dimensional and complex structures.
We propose an annealed controlled sequential Monte Carlo method that delivers
numerically stable and low variance estimators of the likelihood function. The
method relies on an annealing procedure to gradually introduce information from
observations and constructs globally optimal proposal distributions by solving
associated optimal control problems that yield zero variance likelihood
estimators. To perform parameter inference, we develop a new adaptive SMC$^2$
algorithm that employs likelihood estimators from annealed controlled
sequential Monte Carlo. We provide a theoretical stability analysis that
elucidates the advantages of our methodology and asymptotic results concerning
the consistency and convergence rates of our SMC$^2$ estimators. We illustrate
the strengths of our proposed methodology by estimating two popular
macrofinance models: a non-linear new Keynesian dynamic stochastic general
equilibrium model and a non-linear non-Gaussian consumption-based long-run risk
model.

arXiv link: http://arxiv.org/abs/2201.01094v1

Econometrics arXiv updated paper (originally submitted: 2022-01-04)

A Double Robust Approach for Non-Monotone Missingness in Multi-Stage Data

Authors: Shenshen Yang

Multivariate missingness with a non-monotone missing pattern is complicated
to deal with in empirical studies. The traditional Missing at Random (MAR)
assumption is difficult to justify in such cases. Previous studies have
strengthened the MAR assumption, suggesting that the missing mechanism of any
variable is random when conditioned on a uniform set of fully observed
variables. However, empirical evidence indicates that this assumption may be
violated for variables collected at different stages. This paper proposes a new
MAR-type assumption that fits non-monotone missing scenarios involving
multi-stage variables. Based on this assumption, we construct an Augmented
Inverse Probability Weighted GMM (AIPW-GMM) estimator. This estimator features
an asymmetric format for the augmentation term, guarantees double robustness,
and achieves the closed-form semiparametric efficiency bound. We apply this
method to cases of missingness in both endogenous regressor and outcome, using
the Oregon Health Insurance Experiment as an example. We check the correlation
between missing probabilities and partially observed variables to justify the
assumption. Moreover, we find that excluding incomplete data results in a loss
of efficiency and insignificant estimators. The proposed estimator reduces the
standard error by more than 50% for the estimated effects of the Oregon Health
Plan on the elderly.

arXiv link: http://arxiv.org/abs/2201.01010v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2022-01-02

Deep Learning and Linear Programming for Automated Ensemble Forecasting and Interpretation

Authors: Lars Lien Ankile, Kjartan Krange

This paper presents an ensemble forecasting method, termed DONUT (DO Not
UTilize human beliefs), that shows strong results on the M4 Competition dataset
by reducing feature and model selection assumptions. Our assumption
reductions, primarily consisting of auto-generated features and a more diverse
model pool for the ensemble, significantly outperform the statistical,
feature-based ensemble method FFORMA by Montero-Manso et al. (2020). We also
investigate feature extraction with a Long Short-term Memory Network (LSTM)
Autoencoder and find that such features contain crucial information not
captured by standard statistical feature approaches. The ensemble weighting
model uses LSTM and statistical features to combine the models accurately. The
analysis of feature importance and interaction shows a slight superiority for
LSTM features over the statistical ones alone. Clustering analysis shows that
essential LSTM features differ from most statistical features and each other.
We also find that increasing the solution space of the weighting model by
augmenting the ensemble with new models is something the weighting model learns
to use, thus explaining part of the accuracy gains. Moreover, we present a
formal ex-post-facto analysis of an optimal combination and selection for
ensembles, quantifying differences through linear optimization on the M4
dataset. Our findings indicate that classical statistical time series features,
such as trend and seasonality, alone do not capture all relevant information
for forecasting a time series. On the contrary, our novel LSTM features contain
significantly more predictive power than the statistical ones alone, but
combining the two feature sets proved the best in practice.

arXiv link: http://arxiv.org/abs/2201.00426v2

Econometrics arXiv cross-link from cs.GT (cs.GT), submitted: 2022-01-01

Modelling Cournot Games as Multi-agent Multi-armed Bandits

Authors: Kshitija Taywade, Brent Harrison, Adib Bagh

We investigate the use of a multi-agent multi-armed bandit (MA-MAB) setting
for modeling repeated Cournot oligopoly games, where the firms acting as agents
choose from the set of arms representing production quantity (a discrete
value). Agents interact with separate and independent bandit problems. In this
formulation, each agent makes sequential choices among arms to maximize its own
reward. Agents do not have any information about the environment; they can only
see their own rewards after taking an action. However, the market demand is a
stationary function of total industry output, and random entry or exit from the
market is not allowed. Given these assumptions, we found that an
$\epsilon$-greedy approach offers a more viable learning mechanism than other
traditional MAB approaches, as it does not require any additional knowledge of
the system to operate. We also propose two novel approaches that take advantage
of the ordered action space: $\epsilon$-greedy+HL and $\epsilon$-greedy+EL.
These new approaches help firms to focus on more profitable actions by
eliminating less profitable choices and hence are designed to optimize the
exploration. We use computer simulations to study the emergence of various
equilibria in the outcomes and do the empirical analysis of joint cumulative
regrets.
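
As an illustration of the learning mechanism described above, here is a minimal sketch (not the authors' implementation) that runs independent $\epsilon$-greedy learners in a repeated Cournot market; the linear inverse demand, unit cost, and discrete quantity grid are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FIRMS = 3                      # agents
ARMS = np.arange(1, 11)          # discrete production quantities (illustrative)
T = 5000                         # rounds
EPS = 0.1                        # exploration rate
A, B, COST = 100.0, 1.0, 10.0    # assumed linear inverse demand P = A - B*Q and unit cost

q_values = np.zeros((N_FIRMS, len(ARMS)))   # estimated mean reward per arm
counts = np.zeros((N_FIRMS, len(ARMS)))

for t in range(T):
    # each firm independently picks an arm (epsilon-greedy)
    choices = np.empty(N_FIRMS, dtype=int)
    for i in range(N_FIRMS):
        if rng.random() < EPS:
            choices[i] = rng.integers(len(ARMS))
        else:
            choices[i] = int(np.argmax(q_values[i]))
    quantities = ARMS[choices]
    price = max(A - B * quantities.sum(), 0.0)   # stationary market demand
    rewards = (price - COST) * quantities        # each firm only observes its own profit
    # incremental update of the arm-value estimates
    for i in range(N_FIRMS):
        counts[i, choices[i]] += 1
        q_values[i, choices[i]] += (rewards[i] - q_values[i, choices[i]]) / counts[i, choices[i]]

print("greedy quantities after learning:", ARMS[q_values.argmax(axis=1)])
```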

arXiv link: http://arxiv.org/abs/2201.01182v1

Econometrics arXiv updated paper (originally submitted: 2021-12-30)

Auction Throttling and Causal Inference of Online Advertising Effects

Authors: George Gui, Harikesh Nair, Fengshi Niu

Causally identifying the effect of digital advertising is challenging,
because experimentation is expensive, and observational data lacks random
variation. This paper identifies a pervasive source of naturally occurring,
quasi-experimental variation in user-level ad-exposure in digital advertising
campaigns. It shows how this variation can be utilized by ad-publishers to
identify the causal effect of advertising campaigns. The variation pertains to
auction throttling, a probabilistic method of budget pacing that is widely used
to spread an ad-campaign's budget over its deployed duration, so that the
campaign's budget is not exceeded or overly concentrated in any one period. The
throttling mechanism is implemented by computing a participation probability
based on the campaign's budget spending rate and then including the campaign in
a random subset of available ad-auctions each period according to this
probability. We show that access to logged participation probabilities enables
identifying the local average treatment effect (LATE) in the ad-campaign. We
present a new estimator that leverages this identification strategy and outline
a bootstrap procedure for quantifying its variability. We apply our method to
real-world ad-campaign data from an e-commerce advertising platform, which uses
such throttling for budget pacing. We show our estimate is statistically
different from estimates derived using other standard observational methods
such as OLS and two-stage least squares estimators. Our estimated conversion
lift is 110%, a more plausible number than 600%, the conversion lifts estimated
using naive observational methods.

arXiv link: http://arxiv.org/abs/2112.15155v2

Econometrics arXiv updated paper (originally submitted: 2021-12-30)

Estimating a Continuous Treatment Model with Spillovers: A Control Function Approach

Authors: Tadao Hoshino

We study a continuous treatment effect model in the presence of treatment
spillovers through social networks. We assume that one's outcome is affected
not only by his/her own treatment but also by a (weighted) average of his/her
neighbors' treatments, both of which are treated as endogenous variables. Using
a control function approach with appropriate instrumental variables, we show
that the conditional mean potential outcome can be nonparametrically
identified. We also consider a more empirically tractable semiparametric model
and develop a three-step estimation procedure for this model. As an empirical
illustration, we investigate the causal effect of the regional unemployment
rate on the crime rate.

arXiv link: http://arxiv.org/abs/2112.15114v3

Econometrics arXiv paper, submitted: 2021-12-30

Modeling and Forecasting Intraday Market Returns: a Machine Learning Approach

Authors: Iuri H. Ferreira, Marcelo C. Medeiros

In this paper we examine the relation between market returns and volatility
measures through machine learning methods in a high-frequency environment. We
implement a minute-by-minute rolling window intraday estimation method using
two nonlinear models: Long-Short-Term Memory (LSTM) neural networks and Random
Forests (RF). Our estimations show that the CBOE Volatility Index (VIX) is the
strongest candidate predictor for intraday market returns in our analysis,
especially when implemented through the LSTM model. This model also
significantly improves the performance of the lagged market return as a
predictive variable. Finally, intraday RF estimation outputs indicate that there is no
performance improvement with this method, and it may even worsen the results in
some cases.

arXiv link: http://arxiv.org/abs/2112.15108v1

Econometrics arXiv paper, submitted: 2021-12-29

An Analysis of an Alternative Pythagorean Expected Win Percentage Model: Applications Using Major League Baseball Team Quality Simulations

Authors: Justin Ehrlich, Christopher Boudreaux, James Boudreau, Shane Sanders

We ask if there are alternative contest models that minimize error or
information loss from misspecification and outperform the Pythagorean model.
This article aims to use simulated data to select the optimal expected win
percentage model among the choice of relevant alternatives. The choices include
the traditional Pythagorean model and the difference-form contest success
function (CSF). Method. We simulate 1,000 iterations of the 2014 MLB season for
the purpose of estimating and analyzing alternative models of expected win
percentage (team quality). We use the open-source, Strategic Baseball Simulator
and develop an AutoHotKey script that programmatically executes the SBS
application, chooses the correct settings for the 2014 season, enters a unique
ID for the simulation data file, and iterates these steps 1,000 times. We
estimate expected win percentage using the traditional Pythagorean model, as
well as the difference-form CSF model that is used in game theory and public
choice economics. Each model is estimated while accounting for fixed (team)
effects. We find that the difference-form CSF model outperforms the traditional
Pythagorean model in terms of explanatory power and in terms of
misspecification-based information loss as estimated by the Akaike Information
Criterion. Through parametric estimation, we further confirm that the simulator
yields realistic statistical outcomes. The simulation methodology offers the
advantage of greatly improved sample size. As the season is held constant, our
simulation-based statistical inference also allows for estimation and model
comparison without the (time series) issue of non-stationarity. The results
suggest that improved win (productivity) estimation can be achieved through
alternative CSF specifications.
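
The model comparison can be sketched on simulated data as below; the logistic difference-form specification and the RSS-based AIC are illustrative assumptions, not the authors' exact fixed-effects estimation.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)

# Illustrative (simulated) team-season data: runs scored, runs allowed, win pct
n = 300
rs = rng.normal(700, 60, n)
ra = rng.normal(700, 60, n)
wpct = 1 / (1 + np.exp(-0.004 * (rs - ra))) + rng.normal(0, 0.02, n)

def pythagorean(X, alpha):
    rs, ra = X
    return rs**alpha / (rs**alpha + ra**alpha)

def diff_csf(X, beta):
    # one common difference-form CSF: logistic in the run differential (assumed form)
    rs, ra = X
    return 1 / (1 + np.exp(-beta * (rs - ra)))

def aic(rss, n, k):
    # Gaussian-likelihood AIC up to an additive constant
    return n * np.log(rss / n) + 2 * k

for name, fn, p0 in [("Pythagorean", pythagorean, [2.0]),
                     ("difference-form CSF", diff_csf, [0.005])]:
    params, _ = curve_fit(fn, (rs, ra), wpct, p0=p0)
    rss = np.sum((wpct - fn((rs, ra), *params)) ** 2)
    print(f"{name}: parameter={params[0]:.4f}, AIC={aic(rss, n, 1):.1f}")
```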

arXiv link: http://arxiv.org/abs/2112.14846v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2021-12-29

Volatility of volatility estimation: central limit theorems for the Fourier transform estimator and empirical study of the daily time series stylized facts

Authors: Giacomo Toscano, Giulia Livieri, Maria Elvira Mancino, Stefano Marmi

We study the asymptotic normality of two feasible estimators of the
integrated volatility of volatility based on the Fourier methodology, which
does not require the pre-estimation of the spot volatility. We show that the
bias-corrected estimator reaches the optimal rate $n^{1/4}$, while the
estimator without bias-correction has a slower convergence rate and a smaller
asymptotic variance. Additionally, we provide simulation results that support
the theoretical asymptotic distribution of the rate-efficient estimator and
show the accuracy of the latter in comparison with a rate-optimal estimator
based on the pre-estimation of the spot volatility. Finally, using the
rate-optimal Fourier estimator, we reconstruct the time series of the daily
volatility of volatility of the S&P500 and EUROSTOXX50 indices over long
samples and provide novel insight into the existence of stylized facts about
the volatility of volatility dynamics.

arXiv link: http://arxiv.org/abs/2112.14529v3

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2021-12-28

Nested Nonparametric Instrumental Variable Regression

Authors: Isaac Meza, Rahul Singh

Several causal parameters in short panel data models are functionals of a
nested nonparametric instrumental variable regression (nested NPIV). Recent
examples include mediated, time varying, and long term treatment effects
identified using proxy variables. In econometrics, examples arise in triangular
simultaneous equations and hedonic price systems. However, it appears that
explicit mean square convergence rates for nested NPIV are unknown, preventing
inference on some of these parameters with generic machine learning. A major
challenge is compounding ill posedness due to the nested inverse problems. To
limit how ill posedness compounds, we introduce two techniques: relative well
posedness, and multiple robustness to ill posedness. With these techniques, we
provide explicit mean square rates for nested NPIV and efficient inference for
recently identified causal parameters. Our nonasymptotic analysis accommodates
neural networks, random forests, and reproducing kernel Hilbert spaces. It
extends to causal functions, e.g. heterogeneous long term treatment effects.

arXiv link: http://arxiv.org/abs/2112.14249v4

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2021-12-27

Random Rank-Dependent Expected Utility

Authors: Nail Kashaev, Victor Aguiar

We present a novel characterization of random rank-dependent expected utility
for finite datasets and finite prizes. The test lends itself to statistical
testing using the tools in Kitamura and Stoye (2018).

arXiv link: http://arxiv.org/abs/2112.13649v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2021-12-27

Estimation based on nearest neighbor matching: from density ratio to average treatment effect

Authors: Zhexiao Lin, Peng Ding, Fang Han

Nearest neighbor (NN) matching as a tool to align data sampled from different
groups is both conceptually natural and practically well-used. In a landmark
paper, Abadie and Imbens (2006) provided the first large-sample analysis of NN
matching under, however, a crucial assumption that the number of NNs, $M$, is
fixed. This manuscript reveals something new out of their study and shows that,
once allowing $M$ to diverge with the sample size, an intrinsic statistic in
their analysis actually constitutes a consistent estimator of the density
ratio. Furthermore, through selecting a suitable $M$, this statistic can attain
the minimax lower bound of estimation over a Lipschitz density function class.
Consequently, with a diverging $M$, the NN matching provably yields a doubly
robust estimator of the average treatment effect and is semiparametrically
efficient if the density functions are sufficiently smooth and the outcome
model is appropriately specified. It can thus be viewed as a precursor of
double machine learning estimators.
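
A minimal sketch of the underlying $M$-nearest-neighbor matching estimator (without the bias correction or the density-ratio refinement analyzed in the paper), on simulated data:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(2)

# Simulated observational data with a known treatment effect of 1.0
n = 2000
x = rng.normal(size=(n, 2))
p = 1 / (1 + np.exp(-x[:, 0]))           # propensity depends on covariates
d = rng.binomial(1, p)
y = x.sum(axis=1) + 1.0 * d + rng.normal(size=n)

def matching_ate(x, y, d, M=10):
    """M-nearest-neighbor matching: impute each unit's missing potential outcome
    with the average outcome of its M nearest neighbors in the opposite group."""
    y_imputed = np.empty_like(y)
    for g in (0, 1):
        donors = NearestNeighbors(n_neighbors=M).fit(x[d == g])
        _, idx = donors.kneighbors(x[d == 1 - g])
        y_imputed[d == 1 - g] = y[d == g][idx].mean(axis=1)
    y1 = np.where(d == 1, y, y_imputed)   # observed or imputed treated outcome
    y0 = np.where(d == 0, y, y_imputed)   # observed or imputed control outcome
    return (y1 - y0).mean()

print("matched ATE estimate:", round(matching_ate(x, y, d, M=20), 3))
```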

arXiv link: http://arxiv.org/abs/2112.13506v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-12-27

Multiple Randomization Designs

Authors: Patrick Bajari, Brian Burdick, Guido W. Imbens, Lorenzo Masoero, James McQueen, Thomas Richardson, Ido M. Rosen

In this study we introduce a new class of experimental designs. In a
classical randomized controlled trial (RCT), or A/B test, a randomly selected
subset of a population of units (e.g., individuals, plots of land, or
experiences) is assigned to a treatment (treatment A), and the remainder of the
population is assigned to the control treatment (treatment B). The difference
in average outcome by treatment group is an estimate of the average effect of
the treatment. However, motivating our study, the setting for modern
experiments is often different, with the outcomes and treatment assignments
indexed by multiple populations. For example, outcomes may be indexed by buyers
and sellers, by content creators and subscribers, by drivers and riders, or by
travelers and airlines and travel agents, with treatments potentially varying
across these indices. Spillovers or interference can arise from interactions
between units across populations. For example, sellers' behavior may depend on
buyers' treatment assignment, or vice versa. This can invalidate the simple
comparison of means as an estimator for the average effect of the treatment in
classical RCTs. We propose new experiment designs for settings in which
multiple populations interact. We show how these designs allow us to study
questions about interference that cannot be answered by classical randomized
experiments. Finally, we develop new statistical methods for analyzing these
Multiple Randomization Designs.

arXiv link: http://arxiv.org/abs/2112.13495v1

Econometrics arXiv updated paper (originally submitted: 2021-12-26)

Long Story Short: Omitted Variable Bias in Causal Machine Learning

Authors: Victor Chernozhukov, Carlos Cinelli, Whitney Newey, Amit Sharma, Vasilis Syrgkanis

We develop a general theory of omitted variable bias for a wide range of
common causal parameters, including (but not limited to) averages of potential
outcomes, average treatment effects, average causal derivatives, and policy
effects from covariate shifts. Our theory applies to nonparametric models,
while naturally allowing for (semi-)parametric restrictions (such as partial
linearity) when such assumptions are made. We show how simple plausibility
judgments on the maximum explanatory power of omitted variables are sufficient
to bound the magnitude of the bias, thus facilitating sensitivity analysis in
otherwise complex, nonlinear models. Finally, we provide flexible and efficient
statistical inference methods for the bounds, which can leverage modern machine
learning algorithms for estimation. These results allow empirical researchers
to perform sensitivity analyses in a flexible class of machine-learned causal
models using very simple, and interpretable, tools. We demonstrate the utility
of our approach with two empirical examples.

arXiv link: http://arxiv.org/abs/2112.13398v5

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-12-25

Robust Estimation of Average Treatment Effects from Panel Data

Authors: Sayoni Roychowdhury, Indrila Ganguly, Abhik Ghosh

In order to evaluate the impact of a policy intervention on a group of units
over time, it is important to correctly estimate the average treatment effect
(ATE) measure. Due to lack of robustness of the existing procedures of
estimating ATE from panel data, in this paper, we introduce a robust estimator
of the ATE and the subsequent inference procedures using the popular approach
of minimum density power divergence inference. Asymptotic properties of the
proposed ATE estimator are derived and used to construct robust test statistics
for testing parametric hypotheses related to the ATE. Besides asymptotic
analyses of efficiency and powers, extensive simulation studies are conducted
to study the finite-sample performances of our proposed estimation and testing
procedures under both pure and contaminated data. The robustness of the ATE
estimator is further investigated theoretically through influence function
analysis. Finally, our proposal is applied to study the long-term economic
effects of the 2004 Indian Ocean earthquake and tsunami on the (per-capita)
gross domestic products (GDP) of the five most affected countries, namely
Indonesia, Sri Lanka, Thailand, India and Maldives.

arXiv link: http://arxiv.org/abs/2112.13228v2

Econometrics arXiv paper, submitted: 2021-12-22

Bayesian Approaches to Shrinkage and Sparse Estimation

Authors: Dimitris Korobilis, Kenichi Shimizu

In all areas of human knowledge, datasets are increasing in both size and
complexity, creating the need for richer statistical models. This trend is also
true for economic data, where high-dimensional and nonlinear/nonparametric
inference is the norm in several fields of applied econometric work. The
purpose of this paper is to introduce the reader to the world of Bayesian model
determination, by surveying modern shrinkage and variable selection algorithms
and methodologies. Bayesian inference is a natural probabilistic framework for
quantifying uncertainty and learning about model parameters, and this feature
is particularly important for inference in modern models of high dimensions and
increased complexity.
We begin with a linear regression setting in order to introduce various
classes of priors that lead to shrinkage/sparse estimators of comparable value
to popular penalized likelihood estimators (e.g.\ ridge, lasso). We explore
various methods of exact and approximate inference, and discuss their pros and
cons. Finally, we explore how priors developed for the simple regression
setting can be extended in a straightforward way to various classes of
interesting econometric models. In particular, the following case-studies are
considered, that demonstrate application of Bayesian shrinkage and variable
selection strategies to popular econometric contexts: i) vector autoregressive
models; ii) factor models; iii) time-varying parameter regressions; iv)
confounder selection in treatment effects models; and v) quantile regression
models. A MATLAB package and an accompanying technical manual allow the reader
to replicate many of the algorithms described in this review.

arXiv link: http://arxiv.org/abs/2112.11751v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-12-21

Doubly-Valid/Doubly-Sharp Sensitivity Analysis for Causal Inference with Unmeasured Confounding

Authors: Jacob Dorn, Kevin Guo, Nathan Kallus

We consider the problem of constructing bounds on the average treatment
effect (ATE) when unmeasured confounders exist but have bounded influence.
Specifically, we assume that omitted confounders could not change the odds of
treatment for any unit by more than a fixed factor. We derive the sharp partial
identification bounds implied by this assumption by leveraging distributionally
robust optimization, and we propose estimators of these bounds with several
novel robustness properties. The first is double sharpness: our estimators
consistently estimate the sharp ATE bounds when one of two nuisance parameters
is misspecified and achieve semiparametric efficiency when all nuisance
parameters are suitably consistent. The second is double validity: even when
most nuisance parameters are misspecified, our estimators still provide valid
but possibly conservative bounds for the ATE and our Wald confidence intervals
remain valid even when our estimators are not asymptotically normal. As a
result, our estimators provide a highly credible method for sensitivity
analysis of causal inferences.

arXiv link: http://arxiv.org/abs/2112.11449v2

Econometrics arXiv paper, submitted: 2021-12-21

Efficient Estimation of State-Space Mixed-Frequency VARs: A Precision-Based Approach

Authors: Joshua C. C. Chan, Aubrey Poon, Dan Zhu

State-space mixed-frequency vector autoregressions are now widely used for
nowcasting. Despite their popularity, estimating such models can be
computationally intensive, especially for large systems with stochastic
volatility. To tackle the computational challenges, we propose two novel
precision-based samplers to draw the missing observations of the low-frequency
variables in these models, building on recent advances in the band and sparse
matrix algorithms for state-space models. We show via a simulation study that
the proposed methods are more numerically accurate and computationally
efficient compared to standard Kalman-filter based methods. We demonstrate how
the proposed method can be applied in two empirical macroeconomic applications:
estimating the monthly output gap and studying the response of GDP to a
monetary policy shock at the monthly frequency. Results from these two
empirical applications highlight the importance of incorporating high-frequency
indicators in macroeconomic models.
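
The generic precision-based sampling step can be sketched as follows; the toy random-walk precision matrix is an assumption, and a dense Cholesky factorization stands in for the band/sparse routines the paper exploits.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(3)

def precision_sampler(P, b, rng):
    """Draw x ~ N(P^{-1} b, P^{-1}) without ever forming P^{-1}.
    When P is banded/sparse (as in state-space models), its Cholesky factor is
    banded too, which is where the computational gains come from."""
    L = cholesky(P, lower=True)                                   # P = L L'
    mu = solve_triangular(L.T, solve_triangular(L, b, lower=True), lower=False)
    z = rng.standard_normal(P.shape[0])
    return mu + solve_triangular(L.T, z, lower=False)             # adds N(0, P^{-1}) noise

# Toy example: a random-walk smoothness prior yields a tridiagonal (banded) precision
T = 200
D = np.eye(T) - np.eye(T, k=-1)          # first-difference operator
P = D.T @ D + 0.1 * np.eye(T)
b = rng.standard_normal(T)
draw = precision_sampler(P, b, rng)
print(draw[:5])
```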

arXiv link: http://arxiv.org/abs/2112.11315v1

Econometrics arXiv paper, submitted: 2021-12-21

Ranking and Selection from Pairwise Comparisons: Empirical Bayes Methods for Citation Analysis

Authors: Jiaying Gu, Roger Koenker

We study the Stigler model of citation flows among journals adapting the
pairwise comparison model of Bradley and Terry to do ranking and selection of
journal influence based on nonparametric empirical Bayes procedures.
Comparisons with several other rankings are made.
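
A hedged sketch of Bradley-Terry fitting via the standard MM updates on a made-up citation-count matrix (the paper's nonparametric empirical Bayes selection step is not reproduced):

```python
import numpy as np

# Toy citation counts: c[i, j] = citations from journal j to journal i,
# read as journal i "beating" journal j in a pairwise comparison (Stigler model).
c = np.array([
    [0, 30, 40, 10],
    [20, 0, 25, 15],
    [10, 12, 0, 30],
    [5, 8, 20, 0],
], dtype=float)
journals = ["A", "B", "C", "D"]

def bradley_terry_mm(wins, n_iter=200):
    """Classic MM updates for Bradley-Terry strengths pi_i, where
    P(i beats j) = pi_i / (pi_i + pi_j)."""
    m = wins.shape[0]
    total = wins + wins.T            # n_ij = comparisons between i and j
    w = wins.sum(axis=1)             # total "wins" of journal i
    pi = np.ones(m)
    for _ in range(n_iter):
        denom = total / (pi[:, None] + pi[None, :])
        np.fill_diagonal(denom, 0.0)
        pi = w / denom.sum(axis=1)
        pi /= pi.sum()               # normalize for identifiability
    return pi

strengths = bradley_terry_mm(c)
ranking = [journals[i] for i in np.argsort(-strengths)]
print(dict(zip(journals, strengths.round(3))), "ranking:", ranking)
```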

arXiv link: http://arxiv.org/abs/2112.11064v1

Econometrics arXiv updated paper (originally submitted: 2021-12-20)

Heckman-Selection or Two-Part models for alcohol studies? Depends

Authors: Reka Sundaram-Stukel

Aims: To re-introduce the Heckman model as a valid empirical technique in
alcohol studies. Design: To estimate the determinants of problem drinking using
a Heckman and a two-part estimation model. Psychological and neuro-scientific
studies justify my underlying estimation assumptions and covariate exclusion
restrictions. Higher order tests checking for multicollinearity validate the
use of Heckman over the use of two-part estimation models. I discuss the
generalizability of the two models in applied research. Settings and
Participants: Two pooled national population surveys from 2016 and 2017 were
used: the Behavioral Risk Factor Surveillance Survey (BRFS), and the National
Survey of Drug Use and Health (NSDUH). Measurements: Participation in problem
drinking and meeting the criteria for problem drinking. Findings: Both U.S.
national surveys perform well with the Heckman model and pass all higher order
tests. The Heckman model corrects for selection bias and reveals the direction
of bias, where the two-part model does not. For example, in the two-part model
the coefficients on age are biased upward and those on unemployment are biased
downward, whereas the Heckman model does not exhibit selection bias. Covariate exclusion
restrictions are sensitive to survey conditions and are contextually
generalizable. Conclusions: The Heckman model can be used for alcohol (and
smoking) studies if the underlying estimation specification passes higher order
tests for multicollinearity and the exclusion restrictions are justified with
integrity for the data used. Its use is merit-worthy because it corrects for
and reveals the direction and the magnitude of selection bias where the
two-part does not.
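
A minimal sketch of the classical Heckman two-step correction on simulated data (probit selection equation, inverse Mills ratio in the outcome equation); the variables and exclusion restriction are illustrative, not the BRFS/NSDUH specification used in the paper.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(4)

# Simulated data: z is the exclusion restriction (affects selection only)
n = 5000
x = rng.normal(size=n)
z = rng.normal(size=n)
u, e = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=n).T
select = (0.5 + 1.0 * z + 0.5 * x + u > 0)      # participation (selection)
y_star = 1.0 + 0.8 * x + e                      # latent outcome
y = np.where(select, y_star, np.nan)            # observed only if selected

# Step 1: probit for selection, then the inverse Mills ratio
W = sm.add_constant(np.column_stack([x, z]))
probit = sm.Probit(select.astype(int), W).fit(disp=0)
xb = W @ probit.params
imr = norm.pdf(xb) / norm.cdf(xb)

# Step 2: outcome OLS on the selected sample, augmented with the IMR
X2 = sm.add_constant(np.column_stack([x[select], imr[select]]))
ols = sm.OLS(y[select], X2).fit()
print(ols.params)   # the IMR coefficient signals the direction of selection bias
```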

arXiv link: http://arxiv.org/abs/2112.10542v2

Econometrics arXiv updated paper (originally submitted: 2021-12-16)

Robustness, Heterogeneous Treatment Effects and Covariate Shifts

Authors: Pietro Emilio Spini

This paper studies the robustness of estimated policy effects to changes in
the distribution of covariates. Robustness to covariate shifts is important,
for example, when evaluating the external validity of quasi-experimental
results, which are often used as a benchmark for evidence-based policy-making.
I propose a novel scalar robustness metric. This metric measures the magnitude
of the smallest covariate shift needed to invalidate a claim on the policy
effect (for example, $ATE \geq 0$) supported by the quasi-experimental
evidence. My metric links the heterogeneity of policy effects and robustness in
a flexible, nonparametric way and does not require functional form assumptions.
I cast the estimation of the robustness metric as a de-biased GMM problem. This
approach guarantees a parametric convergence rate for the robustness metric
while allowing for machine learning-based estimators of policy effect
heterogeneity (for example, lasso, random forest, boosting, neural nets). I
apply my procedure to the Oregon Health Insurance experiment. I study the
robustness of policy effects estimates of health-care utilization and financial
strain outcomes, relative to a shift in the distribution of context-specific
covariates. Such covariates are likely to differ across US states, making
quantification of robustness an important exercise for adoption of the
insurance policy in states other than Oregon. I find that the effect on
outpatient visits is the most robust among the metrics of health-care
utilization considered.

arXiv link: http://arxiv.org/abs/2112.09259v2

Econometrics arXiv updated paper (originally submitted: 2021-12-16)

Reinforcing RCTs with Multiple Priors while Learning about External Validity

Authors: Frederico Finan, Demian Pouzo

This paper introduces a framework for incorporating prior information into
the design of sequential experiments. Sources of such prior information may include past
experiments, expert opinions, or the experimenter's intuition. We model the
problem using a multi-prior Bayesian approach, mapping each source to a
Bayesian model and aggregating them based on posterior probabilities. Policies
are evaluated on three criteria: learning the parameters of payoff
distributions, the probability of choosing the wrong treatment, and average
rewards. Our framework demonstrates several desirable properties, including
robustness to sources lacking external validity, while maintaining strong
finite sample performance.

arXiv link: http://arxiv.org/abs/2112.09170v5

Econometrics arXiv updated paper (originally submitted: 2021-12-16)

Lassoed Boosting and Linear Prediction in the Equities Market

Authors: Xiao Huang

We consider a two-stage estimation method for linear regression. First, it
uses the lasso in Tibshirani (1996) to screen variables and, second,
re-estimates the coefficients using the least-squares boosting method in
Friedman (2001) on every set of selected variables. Based on the large-scale
simulation experiment in Hastie et al. (2020), lassoed boosting performs as
well as the relaxed lasso in Meinshausen (2007) and, under certain scenarios,
can yield a sparser model. Applied to predicting equity returns, lassoed
boosting gives the smallest mean-squared prediction error compared to several
other methods.
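
The two-stage idea can be sketched with scikit-learn (version 1.0 or later assumed) as follows; the simulated design and tuning choices are illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=100, n_informative=10,
                       noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Stage 1: the lasso screens variables (those with nonzero coefficients are kept)
lasso = LassoCV(cv=5, random_state=0).fit(X_tr, y_tr)
keep = np.flatnonzero(lasso.coef_)

# Stage 2: least-squares boosting on the selected variables only
booster = GradientBoostingRegressor(loss="squared_error", learning_rate=0.05,
                                    n_estimators=500, random_state=0)
booster.fit(X_tr[:, keep], y_tr)

print("selected variables:", keep.size,
      "test R^2:", round(booster.score(X_te[:, keep], y_te), 3))
```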

arXiv link: http://arxiv.org/abs/2112.08934v4

Econometrics arXiv updated paper (originally submitted: 2021-12-16)

Uniform Convergence Results for the Local Linear Regression Estimation of the Conditional Distribution

Authors: Haitian Xie

This paper examines the local linear regression (LLR) estimate of the
conditional distribution function $F(y|x)$. We derive three uniform convergence
results: the uniform bias expansion, the uniform convergence rate, and the
uniform asymptotic linear representation. The uniformity in the above results
is with respect to both $x$ and $y$ and therefore has not previously been
addressed in the literature on local polynomial regression. Such uniform
convergence results are especially useful when the conditional distribution
estimator is the first stage of a semiparametric estimator. We demonstrate the
usefulness of these uniform results with two examples: the stochastic
equicontinuity condition in $y$, and the estimation of the integrated
conditional distribution function.

arXiv link: http://arxiv.org/abs/2112.08546v2

Econometrics arXiv updated paper (originally submitted: 2021-12-15)

Testing Instrument Validity with Covariates

Authors: Thomas Carr, Toru Kitagawa

We develop a novel test of the instrumental variable identifying assumptions
for heterogeneous treatment effect models with conditioning covariates. We
assume semiparametric dependence between potential outcomes and conditioning
covariates. This allows us to obtain testable equality and inequality
restrictions among the subdensities of estimable partial residuals. We propose
jointly testing these restrictions. To improve power, we introduce
distillation, where a trimmed sample is used to test the inequality
restrictions. In Monte Carlo exercises we find gains in finite sample power
from testing restrictions jointly and distillation. We apply our test procedure
to three instruments and reject the null for one.

arXiv link: http://arxiv.org/abs/2112.08092v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2021-12-15

Solving the Data Sparsity Problem in Predicting the Success of the Startups with Machine Learning Methods

Authors: Dafei Yin, Jing Li, Gaosheng Wu

Predicting the success of startup companies is of great importance for both
startup companies and investors. It is difficult due to the lack of available
data and appropriate general methods. With data platforms like Crunchbase
aggregating the information of startup companies, it is possible to make such
predictions with machine learning algorithms. Existing research suffers from the data
sparsity problem as most early-stage startup companies do not have much data
available to the public. We try to leverage the recent algorithms to solve this
problem. We investigate several machine learning algorithms with a large
dataset from Crunchbase. The results suggest that LightGBM and XGBoost perform
best and achieve 53.03% and 52.96% F1 scores. We interpret the predictions from
the perspective of feature contribution. We construct portfolios based on the
models and achieve high success rates. These findings have substantial
implications on how machine learning methods can help startup companies and
investors.

arXiv link: http://arxiv.org/abs/2112.07985v1

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2021-12-14

Behavioral Foundations of Nested Stochastic Choice and Nested Logit

Authors: Matthew Kovach, Gerelt Tserenjigmid

We provide the first behavioral characterization of nested logit, a
foundational and widely applied discrete choice model, through the introduction
of a non-parametric version of nested logit that we call Nested Stochastic
Choice (NSC). NSC is characterized by a single axiom that weakens Independence
of Irrelevant Alternatives based on revealed similarity to allow for the
similarity effect. Nested logit is characterized by an additional
menu-independence axiom. Our axiomatic characterization leads to a practical,
data-driven algorithm that identifies the true nest structure from choice data.
We also discuss limitations of generalizing nested logit by studying the
testable implications of cross-nested logit.

arXiv link: http://arxiv.org/abs/2112.07155v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-12-14

Factor Models with Sparse VAR Idiosyncratic Components

Authors: Jonas Krampe, Luca Margaritella

We reconcile the two worlds of dense and sparse modeling by exploiting the
positive aspects of both. We employ a factor model and assume that the dynamics
of the factors are non-pervasive, while the idiosyncratic term follows a sparse
vector autoregressive model (VAR) that allows for cross-sectional and time
dependence. The estimation is articulated in two steps: first, the factors and
their loadings are estimated via principal component analysis and second, the
sparse VAR is estimated by regularized regression on the estimated
idiosyncratic components. We prove the consistency of the proposed estimation
approach as the time and cross-sectional dimension diverge. In the second step,
the estimation error of the first step needs to be accounted for. Here, we do
not follow the naive approach of simply plugging in the standard rates derived
for the factor estimation. Instead, we derive a more refined expression of the
error. This enables us to derive tighter rates. We discuss the implications of
our model for forecasting, factor augmented regression, bootstrap of factor
models, and time series dependence networks via semi-parametric estimation of
the inverse of the spectral density matrix.
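
A hedged two-step sketch on simulated data: principal components for the factors, then an equation-by-equation lasso VAR(1) on the estimated idiosyncratic components. The data-generating process and penalty level are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)

# Simulated panel: T periods, N series driven by r common factors plus
# idiosyncratic terms with sparse own-lag dynamics
T, N, r = 300, 40, 3
factors = rng.normal(size=(T, r))
loadings = rng.normal(size=(N, r))
idio = np.zeros((T, N))
phi = np.zeros(N); phi[:5] = 0.6                 # only a few persistent series
for t in range(1, T):
    idio[t] = phi * idio[t - 1] + rng.normal(scale=0.5, size=N)
X = factors @ loadings.T + idio

# Step 1: factors and loadings via principal components
pca = PCA(n_components=r)
fhat = pca.fit_transform(X)
ehat = X - pca.inverse_transform(fhat)           # estimated idiosyncratic components

# Step 2: sparse VAR(1) on the estimated idiosyncratic components,
# estimated equation-by-equation with a lasso penalty
Z, Y = ehat[:-1], ehat[1:]
coefs = np.column_stack([Lasso(alpha=0.05).fit(Z, Y[:, j]).coef_ for j in range(N)]).T
print("nonzero VAR coefficients:", int((np.abs(coefs) > 1e-8).sum()))
```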

arXiv link: http://arxiv.org/abs/2112.07149v2

Econometrics arXiv updated paper (originally submitted: 2021-12-14)

Semiparametric Conditional Factor Models in Asset Pricing

Authors: Qihui Chen, Nikolai Roussanov, Xiaoliang Wang

We introduce a simple and tractable methodology for estimating semiparametric
conditional latent factor models. Our approach disentangles the roles of
characteristics in capturing factor betas of asset returns from “alpha.” We
construct factors by extracting principal components from Fama-MacBeth managed
portfolios. Applying this methodology to the cross-section of U.S. individual
stock returns, we find compelling evidence of substantial nonzero pricing
errors, even though our factors demonstrate superior performance in standard
asset pricing tests. Unexplained “arbitrage” portfolios earn high Sharpe
ratios, which decline over time. Combining factors with these orthogonal
portfolios produces out-of-sample Sharpe ratios exceeding 4.

arXiv link: http://arxiv.org/abs/2112.07121v5

Econometrics arXiv paper, submitted: 2021-12-13

Identifying Marginal Treatment Effects in the Presence of Sample Selection

Authors: Otávio Bartalotti, Désiré Kédagni, Vitor Possebom

This article presents identification results for the marginal treatment
effect (MTE) when there is sample selection. We show that the MTE is partially
identified for individuals who are always observed regardless of treatment, and
derive uniformly sharp bounds on this parameter under three increasingly
restrictive sets of assumptions. The first result imposes standard MTE
assumptions with an unrestricted sample selection mechanism. The second set of
conditions imposes monotonicity of the sample selection variable with respect
to treatment, considerably shrinking the identified set. Finally, we
incorporate a stochastic dominance assumption which tightens the lower bound
for the MTE. Our analysis extends to discrete instruments. The results rely on
a mixture reformulation of the problem where the mixture weights are
identified, extending Lee's (2009) trimming procedure to the MTE context. We
propose estimators for the bounds derived and use data made available by Deb,
Munkin and Trivedi (2006) to empirically illustrate the usefulness of our
approach.

arXiv link: http://arxiv.org/abs/2112.07014v1

Econometrics arXiv paper, submitted: 2021-12-13

Quantile Regression under Limited Dependent Variable

Authors: Javier Alejo, Gabriel Montes-Rojas

A new Stata command, ldvqreg, is developed to estimate quantile regression
models for the cases of censored (with lower and/or upper censoring) and binary
dependent variables. The estimators are implemented using a smoothed version of
the quantile regression objective function. Simulation exercises show that it
correctly estimates the parameters and it should be implemented instead of the
available quantile regression methods when censoring is present. An empirical
application to women's labor supply in Uruguay is considered.
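
The smoothing idea behind such estimators can be sketched as below for a plain (uncensored) linear quantile model, replacing the indicator in the check function with a smooth CDF; this illustrates the objective only and does not implement the ldvqreg censoring or binary-outcome corrections.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(6)

# Simulated linear quantile model
n = 2000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.standard_t(df=4, size=n)
X = np.column_stack([np.ones(n), x])
TAU, H = 0.75, 0.1    # quantile level and smoothing bandwidth (illustrative)

def smoothed_check_loss(beta):
    """Check function rho_tau(u) = u*(tau - 1{u<0}) with the indicator replaced
    by a smooth normal-CDF approximation, making the objective differentiable."""
    u = y - X @ beta
    return np.mean(u * (TAU - norm.cdf(-u / H)))

beta0 = np.linalg.lstsq(X, y, rcond=None)[0]     # OLS starting values
res = minimize(smoothed_check_loss, beta0, method="BFGS")
print("smoothed quantile regression coefficients:", res.x.round(3))
```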

arXiv link: http://arxiv.org/abs/2112.06822v1

Econometrics arXiv updated paper (originally submitted: 2021-12-13)

Risk and optimal policies in bandit experiments

Authors: Karun Adusumilli

We provide a decision theoretic analysis of bandit experiments under local
asymptotics. Working within the framework of diffusion processes, we define
suitable notions of asymptotic Bayes and minimax risk for these experiments.
For normally distributed rewards, the minimal Bayes risk can be characterized
as the solution to a second-order partial differential equation (PDE). Using a
limit of experiments approach, we show that this PDE characterization also
holds asymptotically under both parametric and non-parametric distributions of
the rewards. The approach further describes the state variables it is
asymptotically sufficient to restrict attention to, and thereby suggests a
practical strategy for dimension reduction. The PDEs characterizing minimal
Bayes risk can be solved efficiently using sparse matrix routines or
Monte-Carlo methods. We derive the optimal Bayes and minimax policies from
their numerical solutions. These optimal policies substantially dominate
existing methods such as Thompson sampling; the risk of the latter is often
twice as high.

arXiv link: http://arxiv.org/abs/2112.06363v16

Econometrics arXiv paper, submitted: 2021-12-12

Housing Price Prediction Model Selection Based on Lorenz and Concentration Curves: Empirical Evidence from Tehran Housing Market

Authors: Mohammad Mirbagherijam

This study addresses house price prediction model selection in Tehran City
based on the area between the Lorenz curve (LC) and the concentration curve (CC)
of the predicted price, using 206,556 observed transactions over the
period from March 21, 2018, to February 19, 2021. Several different methods
such as generalized linear models (GLM) and recursive partitioning and
regression trees (RPART), random forests (RF) regression models, and neural
network (NN) models were examined for house price prediction. We used a
randomly chosen 90% of the data to estimate the parameters of the pricing
models and the remaining 10% to test prediction accuracy.
Results showed that the area between the LC and CC curves (known as the
ABC criterion) of real and predicted prices in the test sample was smaller
for the random forest regression model than for the other models under study. The
comparison of the calculated ABC criteria leads us to conclude that the
nonlinear regression models such as RF regression models give an accurate
prediction of house prices in Tehran City.
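
A hedged sketch of computing such an ABC criterion; the convention of ranking units by actual price when building both curves is an assumption.

```python
import numpy as np

def abc_criterion(actual, predicted):
    """Area between the Lorenz curve of actual prices and the concentration
    curve of predicted prices, with units ranked by actual price (assumed
    convention). Smaller values mean the predictions track actual prices better."""
    order = np.argsort(actual)
    lorenz = np.cumsum(actual[order]) / actual.sum()
    concentration = np.cumsum(predicted[order]) / predicted.sum()
    # integral of |LC - CC| over the population share; uniform grid, so a mean suffices
    return float(np.mean(np.abs(lorenz - concentration)))

rng = np.random.default_rng(7)
actual = rng.lognormal(mean=10, sigma=0.5, size=5000)
good = actual * np.exp(rng.normal(0, 0.1, actual.size))   # accurate predictions
poor = np.full_like(actual, actual.mean())                # constant predictions
print("ABC, accurate model:", round(abc_criterion(actual, good), 4))
print("ABC, constant model:", round(abc_criterion(actual, poor), 4))
```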

arXiv link: http://arxiv.org/abs/2112.06192v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2021-12-11

The Past as a Stochastic Process

Authors: David H. Wolpert, Michael H. Price, Stefani A. Crabtree, Timothy A. Kohler, Jurgen Jost, James Evans, Peter F. Stadler, Hajime Shimao, Manfred D. Laubichler

Historical processes manifest remarkable diversity. Nevertheless, scholars
have long attempted to identify patterns and categorize historical actors and
influences with some success. A stochastic process framework provides a
structured approach for the analysis of large historical datasets that allows
for detection of sometimes surprising patterns, identification of relevant
causal actors both endogenous and exogenous to the process, and comparison
between different historical cases. The combination of data, analytical tools
and the organizing theoretical framework of stochastic processes complements
traditional narrative approaches in history and archaeology.

arXiv link: http://arxiv.org/abs/2112.05876v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-12-10

On the Assumptions of Synthetic Control Methods

Authors: Claudia Shi, Dhanya Sridhar, Vishal Misra, David M. Blei

Synthetic control (SC) methods have been widely applied to estimate the
causal effect of large-scale interventions, e.g., the state-wide effect of a
change in policy. The idea of synthetic controls is to approximate one unit's
counterfactual outcomes using a weighted combination of some other units'
observed outcomes. The motivating question of this paper is: how does the SC
strategy lead to valid causal inferences? We address this question by
re-formulating the causal inference problem targeted by SC with a more
fine-grained model, where we change the unit of the analysis from "large units"
(e.g., states) to "small units" (e.g., individuals in states). Under this
re-formulation, we derive sufficient conditions for the non-parametric causal
identification of the causal effect. We highlight two implications of the
reformulation: (1) it clarifies where "linearity" comes from, and how it falls
naturally out of the more fine-grained and flexible model, and (2) it suggests
new ways of using available data with SC methods for valid causal inference, in
particular, new ways of selecting observations from which to estimate the
counterfactual.
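
A minimal sketch of the classical synthetic control weighting step (simplex-constrained least squares on pre-treatment outcomes) on simulated data; it does not implement the paper's fine-grained identification analysis.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(8)

# Pre-treatment outcomes: one treated unit and J control (donor) units
T0, J = 30, 10
donors = rng.normal(size=(T0, J)).cumsum(axis=0)
treated = donors[:, :3] @ np.array([0.5, 0.3, 0.2]) + rng.normal(0, 0.1, T0)

def sc_weights(y1, Y0):
    """Nonnegative weights summing to one that make the weighted donor units
    track the treated unit over the pre-treatment period."""
    J = Y0.shape[1]
    obj = lambda w: np.sum((y1 - Y0 @ w) ** 2)
    cons = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)
    res = minimize(obj, np.full(J, 1.0 / J), bounds=[(0.0, 1.0)] * J,
                   constraints=cons, method="SLSQP")
    return res.x

w = sc_weights(treated, donors)
print("synthetic control weights:", w.round(3))
```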

arXiv link: http://arxiv.org/abs/2112.05671v2

Econometrics arXiv cross-link from q-fin.PR (q-fin.PR), submitted: 2021-12-10

Option Pricing with State-dependent Pricing Kernel

Authors: Chen Tong, Peter Reinhard Hansen, Zhuo Huang

We introduce a new volatility model for option pricing that combines Markov
switching with the Realized GARCH framework. This leads to a novel pricing
kernel with a state-dependent variance risk premium and a pricing formula for
European options, which is derived with an analytical approximation method. We
apply the Markov switching Realized GARCH model to S&P 500 index options from
1990 to 2019 and find that investors' aversion to volatility-specific risk is
time-varying. The proposed framework outperforms competing models and reduces
(in-sample and out-of-sample) option pricing errors by 15% or more.

arXiv link: http://arxiv.org/abs/2112.05308v2

Econometrics arXiv paper, submitted: 2021-12-10

Realized GARCH, CBOE VIX, and the Volatility Risk Premium

Authors: Peter Reinhard Hansen, Zhuo Huang, Chen Tong, Tianyi Wang

We show that the Realized GARCH model yields closed-form expressions for both
the Volatility Index (VIX) and the volatility risk premium (VRP). The Realized
GARCH model is driven by two shocks, a return shock and a volatility shock, and
these are natural state variables in the stochastic discount factor (SDF). The
volatility shock endows the exponentially affine SDF with a compensation for
volatility risk. This leads to dissimilar dynamic properties under the physical
and risk-neutral measures that can explain time-variation in the VRP. In an
empirical application with the S&P 500 returns, the VIX, and the VRP, we find
that the Realized GARCH model significantly outperforms conventional GARCH
models.

arXiv link: http://arxiv.org/abs/2112.05302v1

Econometrics arXiv paper, submitted: 2021-12-09

Covariate Balancing Sensitivity Analysis for Extrapolating Randomized Trials across Locations

Authors: Xinkun Nie, Guido Imbens, Stefan Wager

The ability to generalize experimental results from randomized control trials
(RCTs) across locations is crucial for informing policy decisions in targeted
regions. Such generalization is often hindered by the lack of identifiability
due to unmeasured effect modifiers that compromise direct transport of
treatment effect estimates from one location to another. We build upon
sensitivity analysis in observational studies and propose an optimization
procedure that allows us to get bounds on the treatment effects in targeted
regions. Furthermore, we construct more informative bounds by balancing on the
moments of covariates. In simulation experiments, we show that the covariate
balancing approach is promising in getting sharper identification intervals.

arXiv link: http://arxiv.org/abs/2112.04723v1

Econometrics arXiv cross-link from q-fin.CP (q-fin.CP), submitted: 2021-12-09

Deep self-consistent learning of local volatility

Authors: Zhe Wang, Ameir Shaa, Nicolas Privault, Claude Guet

We present an algorithm for the calibration of local volatility from market
option prices through deep self-consistent learning, by approximating both
market option prices and local volatility using deep neural networks. Our
method uses the initial-boundary value problem of the underlying Dupire's
partial differential equation solved by the parameterized option prices to
bring corrections to the parameterization in a self-consistent way. By
exploiting the differentiability of neural networks, we can evaluate Dupire's
equation locally at each strike-maturity pair; while by exploiting their
continuity, we sample strike-maturity pairs uniformly from a given domain,
going beyond the discrete points where the options are quoted. Moreover, the
absence of arbitrage opportunities is imposed by penalizing an associated loss
function as a soft constraint. For comparison with existing approaches, the
proposed method is tested on both synthetic and market option prices, which
shows an improved performance in terms of reduced interpolation and reprice
errors, as well as the smoothness of the calibrated local volatility. An
ablation study has been performed, asserting the robustness and significance of
the proposed method.
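
For reference, a standard textbook statement of Dupire's equation with zero dividend yield (not taken from the paper) relates call prices $C(K,T)$ to the local volatility surface:

\[
\partial_T C(K,T) \;=\; \tfrac{1}{2}\,\sigma_{\mathrm{loc}}^2(K,T)\,K^2\,\partial_{KK} C(K,T) \;-\; r\,K\,\partial_K C(K,T),
\qquad
\sigma_{\mathrm{loc}}^2(K,T) \;=\; \frac{\partial_T C + r\,K\,\partial_K C}{\tfrac{1}{2}\,K^2\,\partial_{KK} C}.
\]

Evaluating this relation at each strike-maturity pair is what the differentiability of the neural network parameterization makes possible.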

arXiv link: http://arxiv.org/abs/2201.07880v3

Econometrics arXiv paper, submitted: 2021-12-09

Efficient counterfactual estimation in semiparametric discrete choice models: a note on Chiong, Hsieh, and Shum (2017)

Authors: Grigory Franguridi

I suggest an enhancement of the procedure of Chiong, Hsieh, and Shum (2017)
for calculating bounds on counterfactual demand in semiparametric discrete
choice models. Their algorithm relies on a system of inequalities indexed by
cycles of a large number $M$ of observed markets and hence seems to require
computationally infeasible enumeration of all such cycles. I show that such
enumeration is unnecessary because solving the "fully efficient" inequality
system exploiting cycles of all possible lengths $K=1,\dots,M$ can be reduced
to finding the length of the shortest path between every pair of vertices in a
complete bidirected weighted graph on $M$ vertices. The latter problem can be
solved using the Floyd--Warshall algorithm with computational complexity
$O\left(M^3\right)$, which takes only seconds to run even for thousands of
markets. Monte Carlo simulations illustrate the efficiency gain from using
cycles of all lengths, which turns out to be positive, but small.
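
The computational core of the proposed reduction, all-pairs shortest paths on an $M$-vertex graph, can be sketched with SciPy; the uniformly drawn positive edge weights below are purely illustrative.

```python
import numpy as np
from scipy.sparse.csgraph import floyd_warshall

rng = np.random.default_rng(9)

# Complete bidirected weighted graph on M "market" vertices (illustrative weights)
M = 500
W = rng.uniform(0.1, 1.0, size=(M, M))
np.fill_diagonal(W, 0.0)

# All-pairs shortest path lengths in O(M^3); fast even for hundreds of markets
dist = floyd_warshall(W, directed=True)
print("shortest path length from vertex 0 to vertex 1:", round(float(dist[0, 1]), 4))
```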

arXiv link: http://arxiv.org/abs/2112.04637v1

Econometrics arXiv updated paper (originally submitted: 2021-12-08)

Two-Way Fixed Effects and Differences-in-Differences with Heterogeneous Treatment Effects: A Survey

Authors: Clément de Chaisemartin, Xavier D'Haultfœuille

Linear regressions with period and group fixed effects are widely used to
estimate policies' effects: 26 of the 100 most cited papers published by the
American Economic Review from 2015 to 2019 estimate such regressions. It has
recently been shown that those regressions may produce misleading estimates, if
the policy's effect is heterogeneous between groups or over time, as is often
the case. This survey reviews a fast-growing literature that documents this
issue, and that proposes alternative estimators robust to heterogeneous
effects. We use those alternative estimators to revisit Wolfers (2006).

arXiv link: http://arxiv.org/abs/2112.04565v6

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-12-08

Matching for causal effects via multimarginal unbalanced optimal transport

Authors: Florian Gunsilius, Yuliang Xu

Matching on covariates is a well-established framework for estimating causal
effects in observational studies. The principal challenge stems from the often
high-dimensional structure of the problem. Many methods have been introduced to
address this, with different advantages and drawbacks in computational and
statistical performance as well as interpretability. This article introduces a
natural optimal matching method based on multimarginal unbalanced optimal
transport that possesses many useful properties in this regard. It provides
interpretable weights based on the distance of matched individuals, can be
efficiently implemented via the iterative proportional fitting procedure, and
can match several treatment arms simultaneously. Importantly, the proposed
method only selects good matches from either group, hence is competitive with
the classical k-nearest neighbors approach in terms of bias and variance in
finite samples. Moreover, we prove a central limit theorem for the empirical
process of the potential functions of the optimal coupling in the unbalanced
optimal transport problem with a fixed penalty term. This implies a parametric
rate of convergence of the empirically obtained weights to the optimal weights
in the population for a fixed penalty term.

arXiv link: http://arxiv.org/abs/2112.04398v2

Econometrics arXiv updated paper (originally submitted: 2021-12-07)

Nonparametric Treatment Effect Identification in School Choice

Authors: Jiafeng Chen

This paper studies nonparametric identification and estimation of causal
effects in centralized school assignment. In many centralized assignment
algorithms, students are subjected to both lottery-driven variation and
regression discontinuity (RD) driven variation. We characterize the full set of
identified atomic treatment effects (aTEs), defined as the conditional average
treatment effect between a pair of schools, given student characteristics.
Atomic treatment effects are the building blocks of more aggregated notions of
treatment contrasts, and common approaches to estimating aggregations of aTEs
can mask important heterogeneity. In particular, many aggregations of aTEs put
zero weight on aTEs driven by RD variation, and estimators of such aggregations
put asymptotically vanishing weight on the RD-driven aTEs. We provide a
diagnostic and recommend new aggregation schemes. Lastly, we provide estimators
and accompanying asymptotic results for inference for those aggregations.

arXiv link: http://arxiv.org/abs/2112.03872v4

Econometrics arXiv paper, submitted: 2021-12-07

A decomposition method to evaluate the `paradox of progress' with evidence for Argentina

Authors: Javier Alejo, Leonardo Gasparini, Gabriel Montes-Rojas, Walter Sosa-Escudero

The `paradox of progress' is an empirical regularity that associates more
education with larger income inequality. Two driving and competing factors
behind this phenomenon are the convexity of the `Mincer equation' (that links
wages and education) and the heterogeneity in its returns, as captured by
quantile regressions. We propose a joint least-squares and quantile regression
statistical framework to derive a decomposition in order to evaluate the
relative contribution of each explanation. The estimators are based on the
`functional derivative' approach. We apply the proposed decomposition strategy
to the case of Argentina 1992 to 2015.

arXiv link: http://arxiv.org/abs/2112.03836v1

Econometrics arXiv cross-link from q-fin.MF (q-fin.MF), submitted: 2021-12-07

A Bayesian take on option pricing with Gaussian processes

Authors: Martin Tegner, Stephen Roberts

Local volatility is a versatile option pricing model due to its state
dependent diffusion coefficient. Calibration is, however, non-trivial as it
involves both proposing a hypothesis model of the latent function and a method
for fitting it to data. In this paper we present novel Bayesian inference with
Gaussian process priors. We obtain a rich representation of the local
volatility function with a probabilistic notion of uncertainty attached to the
calibration. We propose an inference algorithm and apply our approach to S&P 500
market data.

arXiv link: http://arxiv.org/abs/2112.03718v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2021-12-07

Phase transitions in nonparametric regressions

Authors: Ying Zhu

When the unknown regression function of a single variable is known to have
derivatives up to the $(\gamma+1)$th order bounded in absolute values by a
common constant everywhere or a.e. (i.e., $(\gamma+1)$th degree of smoothness),
the minimax optimal rate of the mean integrated squared error (MISE) is stated
as $\left(\frac{1}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$ in the literature.
This paper shows that: (i) if $n\leq\left(\gamma+1\right)^{2\gamma+3}$, the
minimax optimal MISE rate is $\frac{\log n}{n\log(\log n)}$ and the optimal
degree of smoothness to exploit is roughly
$\max\left\{\left\lfloor \frac{\log n}{2\log(\log n)}\right\rfloor,\,1\right\}$; (ii) if
$n>\left(\gamma+1\right)^{2\gamma+3}$, the minimax optimal MISE rate is
$\left(\frac{1}{n}\right)^{\frac{2\gamma+2}{2\gamma+3}}$ and the optimal degree
of smoothness to exploit is $\gamma+1$. The fundamental contribution of this
paper is a set of metric entropy bounds we develop for smooth function classes.
Some of our bounds are original, and some of them improve and/or generalize the
ones in the literature (e.g., Kolmogorov and Tikhomirov, 1959). Our metric
entropy bounds allow us to show phase transitions in the minimax optimal MISE
rates associated with some commonly seen smoothness classes as well as
non-standard smoothness classes, and can also be of independent interest
outside the nonparametric regression problems.

arXiv link: http://arxiv.org/abs/2112.03626v7

Econometrics arXiv updated paper (originally submitted: 2021-12-06)

Visual Inference and Graphical Representation in Regression Discontinuity Designs

Authors: Christina Korting, Carl Lieberman, Jordan Matsudaira, Zhuan Pei, Yi Shen

Despite the widespread use of graphs in empirical research, little is known
about readers' ability to process the statistical information they are meant to
convey ("visual inference"). We study visual inference within the context of
regression discontinuity (RD) designs by measuring how accurately readers
identify discontinuities in graphs produced from data generating processes
calibrated on 11 published papers from leading economics journals. First, we
assess the effects of different graphical representation methods on visual
inference using randomized experiments. We find that bin widths and fit lines
have the largest impacts on whether participants correctly perceive the
presence or absence of a discontinuity. Our experimental results allow us to
make evidence-based recommendations to practitioners, and we suggest using
small bins with no fit lines as a starting point to construct RD graphs.
Second, we compare visual inference on graphs constructed using our preferred
method with widely used econometric inference procedures. We find that visual
inference achieves similar or lower type I error (false positive) rates and
complements econometric inference.
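
The sketch below illustrates the recommended starting point for an RD graph, small evenly spaced bins with no fit lines, on simulated data; it is not the authors' calibration of published designs.

# Minimal illustration of an RD plot with small evenly spaced bins and no fit
# lines, in the spirit of the recommendation above (simulated data).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 4000)                              # running variable, cutoff at 0
y = 0.5 * x + 0.3 * (x >= 0) + rng.normal(0, 0.5, 4000)   # true jump of 0.3 at the cutoff

n_bins = 80                                               # "small" bins
edges = np.linspace(-1, 1, n_bins + 1)
centers = 0.5 * (edges[:-1] + edges[1:])
bin_ids = np.digitize(x, edges[1:-1])
bin_means = np.array([y[bin_ids == b].mean() for b in range(n_bins)])

plt.scatter(centers, bin_means, s=10)
plt.axvline(0.0, linestyle="--")
plt.xlabel("running variable")
plt.ylabel("binned mean outcome")
plt.title("RD plot: small bins, no fit lines")
plt.show()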

arXiv link: http://arxiv.org/abs/2112.03096v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-12-06

Deep Quantile and Deep Composite Model Regression

Authors: Tobias Fissler, Michael Merz, Mario V. Wüthrich

A main difficulty in actuarial claim size modeling is that there is no simple
off-the-shelf distribution that simultaneously provides a good distributional
model for the main body and the tail of the data. In particular, covariates may
have different effects for small and for large claim sizes. To cope with this
problem, we introduce a deep composite regression model whose splicing point is
given in terms of a quantile of the conditional claim size distribution rather
than a constant. To facilitate M-estimation for such models, we introduce and
characterize the class of strictly consistent scoring functions for the triplet
consisting of a quantile, as well as the lower and upper expected shortfall beyond
that quantile. In a second step, this elicitability result is applied to fit
deep neural network regression models. We demonstrate the applicability of our
approach and its superiority over classical approaches on a real accident
insurance data set.
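
The sketch below shows the simplest building block of such M-estimation, the pinball (check) loss for a single quantile; it is not the paper's strictly consistent scoring function for the (quantile, lower/upper expected shortfall) triplet.

# Minimal sketch of the pinball (quantile) loss underlying quantile M-estimation.
import numpy as np

def pinball_loss(y, pred, tau):
    """Average check loss at quantile level tau."""
    u = y - pred
    return np.mean(np.maximum(tau * u, (tau - 1.0) * u))

rng = np.random.default_rng(2)
y = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)   # heavy-ish tailed "claim sizes"
tau = 0.9

# Minimizing the pinball loss over constant predictions recovers the tau-quantile.
grid = np.linspace(y.min(), y.max(), 2000)
best = grid[np.argmin([pinball_loss(y, c, tau) for c in grid])]
print(best, np.quantile(y, tau))   # the two values should be close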

arXiv link: http://arxiv.org/abs/2112.03075v1

Econometrics arXiv updated paper (originally submitted: 2021-12-03)

Gaussian Process Vector Autoregressions and Macroeconomic Uncertainty

Authors: Niko Hauzenberger, Florian Huber, Massimiliano Marcellino, Nico Petz

We develop a non-parametric multivariate time series model that remains
agnostic on the precise relationship between a (possibly) large set of
macroeconomic time series and their lagged values. The main building block of
our model is a Gaussian process prior on the functional relationship that
determines the conditional mean of the model, hence the name of Gaussian
process vector autoregression (GP-VAR). A flexible stochastic volatility
specification is used to provide additional flexibility and control for
heteroskedasticity. Markov chain Monte Carlo (MCMC) estimation is carried out
through an efficient and scalable algorithm which can handle large models. The
GP-VAR is illustrated by means of simulated data and in a forecasting exercise
with US data. Moreover, we use the GP-VAR to analyze the effects of
macroeconomic uncertainty, with a particular emphasis on time variation and
asymmetries in the transmission mechanisms.
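
As a minimal illustration of the GP-VAR building block, the sketch below places a Gaussian process regression on the mapping from lagged values to each series' conditional mean, using scikit-learn's marginal-likelihood fit instead of the paper's MCMC with stochastic volatility; the bivariate process is simulated.

# Minimal sketch of the GP-VAR building block: a Gaussian process regression of
# each series on the lagged values of all series (no stochastic volatility, no MCMC).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(3)
T, k = 300, 2
Y = np.zeros((T, k))
for t in range(1, T):                     # a simple nonlinear bivariate process
    Y[t, 0] = 0.6 * np.tanh(Y[t - 1, 0]) + 0.2 * Y[t - 1, 1] + 0.3 * rng.normal()
    Y[t, 1] = -0.3 * Y[t - 1, 0] + 0.5 * Y[t - 1, 1] + 0.3 * rng.normal()

X_lag, Y_now = Y[:-1], Y[1:]
kernel = 1.0 * RBF(length_scale=np.ones(k)) + WhiteKernel()
models = [GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_lag, Y_now[:, j])
          for j in range(k)]

# One-step-ahead mean forecast with uncertainty for each equation.
x_last = Y[-1].reshape(1, -1)
for j, gp in enumerate(models):
    mean, sd = gp.predict(x_last, return_std=True)
    print(f"equation {j}: forecast {mean[0]:.3f} +/- {sd[0]:.3f}")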

arXiv link: http://arxiv.org/abs/2112.01995v3

Econometrics arXiv paper, submitted: 2021-12-03

Inference for ROC Curves Based on Estimated Predictive Indices

Authors: Yu-Chin Hsu, Robert P. Lieli

We provide a comprehensive theory of conducting in-sample statistical
inference about receiver operating characteristic (ROC) curves that are based
on predicted values from a first stage model with estimated parameters (such as
a logit regression). The term "in-sample" refers to the practice of using the
same data for model estimation (training) and subsequent evaluation, i.e., the
construction of the ROC curve. We show that in this case the first stage
estimation error has a generally non-negligible impact on the asymptotic
distribution of the ROC curve and develop the appropriate pointwise and
functional limit theory. We propose methods for simulating the distribution of
the limit process and show how to use the results in practice in comparing ROC
curves.
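
The sketch below constructs the object under study, an in-sample ROC curve based on the predicted index from a first-stage logit estimated on the same data; the paper's adjustment of the limit distribution for first-stage estimation error is not implemented here.

# Minimal sketch: in-sample ROC curve from the predicted indices of a first-stage
# logit estimated on the same data used for evaluation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(4)
n = 2000
X = rng.normal(size=(n, 3))
p = 1.0 / (1.0 + np.exp(-(0.5 + X @ np.array([1.0, -0.5, 0.25]))))
y = rng.binomial(1, p)

logit = LogisticRegression().fit(X, y)          # first stage (training)
index = logit.decision_function(X)              # estimated predictive index
fpr, tpr, _ = roc_curve(y, index)               # in-sample ROC curve
print("in-sample AUC:", roc_auc_score(y, index))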

arXiv link: http://arxiv.org/abs/2112.01772v1

Econometrics arXiv updated paper (originally submitted: 2021-12-02)

Patient-Centered Appraisal of Race-Free Clinical Risk Assessment

Authors: Charles F. Manski

Until recently, there has been a consensus that clinicians should condition
patient risk assessments on all observed patient covariates with predictive
power. The broad idea is that knowing more about patients enables more accurate
predictions of their health risks and, hence, better clinical decisions. This
consensus has recently unraveled with respect to a specific covariate, namely
race. There have been increasing calls for race-free risk assessment, arguing
that using race to predict patient outcomes contributes to racial disparities
and inequities in health care. Writers calling for race-free risk assessment
have not studied how it would affect the quality of clinical decisions.
Considering the matter from the patient-centered perspective of medical
economics yields a disturbing conclusion: Race-free risk assessment would harm
patients of all races.

arXiv link: http://arxiv.org/abs/2112.01639v2

Econometrics arXiv paper, submitted: 2021-12-02

Simple Alternatives to the Common Correlated Effects Model

Authors: Nicholas L. Brown, Peter Schmidt, Jeffrey M. Wooldridge

We study estimation of factor models in a fixed-T panel data setting and
significantly relax the common correlated effects (CCE) assumptions pioneered
by Pesaran (2006) and used in dozens of papers since. In the simplest case, we
model the unobserved factors as functions of the cross-sectional averages of
the explanatory variables and show that this is implied by Pesaran's
assumptions when the number of factors does not exceed the number of
explanatory variables. Our approach allows discrete explanatory variables and
flexible functional forms in the covariates. Plus, it extends to a framework
that easily incorporates general functions of cross-sectional moments, in
addition to heterogeneous intercepts and time trends. Our proposed estimators
include Pesaran's pooled correlated common effects (CCEP) estimator as a
special case. We also show that in the presence of heterogeneous slopes our
estimator is consistent under assumptions much weaker than those previously
used. We derive the fixed-T asymptotic normality of a general estimator and
show how to adjust for estimation of the population moments in the factor
loading equation.
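
The sketch below implements only the simplest case described here: proxying the unobserved factors with time-specific cross-sectional averages of the regressors and running pooled OLS, a bare-bones CCEP-style regression on simulated data. The paper's general cross-sectional moments, heterogeneous trends, and the adjustment for estimated population moments are all omitted.

# Minimal sketch: pooled OLS with time-specific cross-sectional averages of the
# regressor as a proxy for the unobserved factor (simplest case only).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
N, T = 200, 6
i = np.repeat(np.arange(N), T)
t = np.tile(np.arange(T), N)
f = rng.normal(size=T)                        # unobserved common factor
lam = rng.normal(size=N)                      # factor loadings
x = 0.8 * f[t] + rng.normal(size=N * T)       # regressor correlated with the factor
y = 1.0 * x + lam[i] * f[t] + rng.normal(size=N * T)

df = pd.DataFrame({"i": i, "t": t, "y": y, "x": x})
df["x_bar"] = df.groupby("t")["x"].transform("mean")   # cross-sectional average

pooled = smf.ols("y ~ x + x_bar", data=df).fit(cov_type="cluster",
                                               cov_kwds={"groups": df["i"]})
print(pooled.params)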

arXiv link: http://arxiv.org/abs/2112.01486v1

Econometrics arXiv paper, submitted: 2021-12-02

RIF Regression via Sensitivity Curves

Authors: Javier Alejo, Gabriel Montes-Rojas, Walter Sosa-Escudero

This paper proposes an empirical method to implement the recentered influence
function (RIF) regression of Firpo, Fortin and Lemieux (2009), a relevant
method to study the effect of covariates on many statistics beyond the mean. In
empirically relevant situations where the influence function is not available
or difficult to compute, we suggest using the sensitivity curve (Tukey,
1977) as a feasible alternative. This may be computationally cumbersome when
the sample size is large. The relevance of the proposed strategy derives from
the fact that, under general conditions, the sensitivity curve converges in
probability to the influence function. In order to save computational time we
propose estimating the sensitivity curve non-parametrically with cubic splines
on a random subsample and then interpolating to the remaining cases. Monte
Carlo simulations show good finite sample properties. We illustrate the
proposed estimator with an application to the polarization index of Duclos,
Esteban and Ray (2004).
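
The sketch below illustrates the idea for a simple statistic (a 90th percentile rather than the DER polarization index): approximate the influence function with a leave-one-out sensitivity curve and regress the resulting recentered values on a covariate. The spline/subsampling shortcut for large samples is omitted.

# Minimal sketch: leave-one-out sensitivity curve for a quantile, used as the
# dependent variable in an OLS "RIF-style" regression.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 1000
x = rng.normal(size=n)
y = np.exp(0.5 * x + rng.normal(size=n))      # skewed outcome (e.g., income)

def stat(v):
    return np.quantile(v, 0.9)

T_full = stat(y)
sc = np.array([n * (T_full - stat(np.delete(y, i))) for i in range(n)])
rif = T_full + sc                              # feasible recentered influence function

rif_reg = sm.OLS(rif, sm.add_constant(x)).fit(cov_type="HC1")
print(rif_reg.params)                          # effect of x on the 90th percentile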

arXiv link: http://arxiv.org/abs/2112.01435v1

Econometrics arXiv updated paper (originally submitted: 2021-12-01)

Structural Sieves

Authors: Konrad Menzel

This paper explores the use of deep neural networks for semiparametric
estimation of economic models of maximizing behavior in production or discrete
choice. We argue that certain deep networks are particularly well suited as a
nonparametric sieve to approximate regression functions that result from
nonlinear latent variable models of continuous or discrete optimization.
Multi-stage models of this type will typically generate rich interaction
effects between regressors ("inputs") in the regression function so that there
may be no plausible separability restrictions on the "reduced-form" mapping
from inputs to outputs to alleviate the curse of dimensionality. Rather,
economic shape, sparsity, or separability restrictions either at a global level
or intermediate stages are usually stated in terms of the latent variable
model. We show that restrictions of this kind are imposed in a more
straightforward manner if a sufficiently flexible version of the latent
variable model is in fact used to approximate the unknown regression function.

arXiv link: http://arxiv.org/abs/2112.01377v2

Econometrics arXiv updated paper (originally submitted: 2021-11-30)

Modelling heterogeneous treatment effects by quantile local polynomial decision tree and forest

Authors: Lai Xinglin

To further develop statistical inference for heterogeneous treatment effects,
this paper builds on Breiman's (2001) random forest tree (RFT) and Wager et
al.'s (2018) causal tree. We parameterize the nonparametric problem using the
statistical properties of classical OLS together with a division into local
intervals based on covariate quantile points, while preserving the random
forest advantages of constructible confidence intervals and asymptotic
normality (Athey and Imbens, 2016; Efron, 2014; Wager et al., 2014). The
result is a decision tree that splits at covariate quantiles according to
fixed rules and fits local polynomial estimates within each cell, which we
call the quantile local linear causal tree (QLPRT) and forest (QLPRF).

arXiv link: http://arxiv.org/abs/2111.15320v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2021-11-29

Distribution Shift in Airline Customer Behavior during COVID-19

Authors: Abhinav Garg, Naman Shukla, Lavanya Marla, Sriram Somanchi

Traditional AI approaches in customized (personalized) contextual pricing
applications assume that the data distribution at the time of online pricing is
similar to that observed during training. However, this assumption may be
violated in practice because of the dynamic nature of customer buying patterns,
particularly due to unanticipated system shocks such as COVID-19. We study the
changes in customer behavior for a major airline during the COVID-19 pandemic
by framing it as a covariate shift and concept drift detection problem. We
identify which customers changed their travel and purchase behavior and the
attributes affecting that change using (i) Fast Generalized Subset Scanning and
(ii) Causal Forests. In our experiments with simulated and real-world data, we
present how these two techniques can be used through qualitative analysis.

arXiv link: http://arxiv.org/abs/2111.14938v2

Econometrics arXiv updated paper (originally submitted: 2021-11-29)

The Fixed-b Limiting Distribution and the ERP of HAR Tests Under Nonstationarity

Authors: Alessandro Casini

We show that the nonstandard limiting distribution of HAR test statistics
under fixed-b asymptotics is not pivotal (even after studentization) when the
data are nonstationary. It takes the form of a complicated function of
Gaussian processes and depends on the integrated local long-run variance and
on the second moments of the relevant series (e.g., of the regressors and
errors for the case of the linear regression model). Hence, existing fixed-b
inference methods based on stationarity are not theoretically valid in general.
The nuisance parameters entering the fixed-b limiting distribution can be
consistently estimated under small-b asymptotics but only with nonparametric
rate of convergence. We then show that the error in rejection probability
(ERP) is an order of magnitude larger than that under stationarity and is also
larger than that of HAR tests based on HAC estimators under conventional
asymptotics. These theoretical results reconcile with recent finite-sample
evidence in Casini (2021) and Casini, Deng and Perron (2021), who show that
fixed-b HAR tests can perform poorly when the data are nonstationary. They can
be conservative under the null hypothesis and have non-monotonic power under
the alternative hypothesis irrespective of how large the sample size is.

arXiv link: http://arxiv.org/abs/2111.14590v2

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2021-11-27

Factor-augmented tree ensembles

Authors: Filippo Pellegrino

This manuscript proposes to extend the information set of time-series
regression trees with latent stationary factors extracted via state-space
methods. In doing so, this approach generalises time-series regression trees along
two dimensions. First, it can handle predictors that exhibit measurement
error, non-stationary trends, seasonality and/or irregularities such as missing
observations. Second, it gives a transparent way for using domain-specific
theory to inform time-series regression trees. Empirically, ensembles of these
factor-augmented trees provide a reliable approach for macro-finance problems.
This article illustrates this by focusing on the lead-lag effect between equity
volatility and the business cycle in the United States.

arXiv link: http://arxiv.org/abs/2111.14000v6

Econometrics arXiv updated paper (originally submitted: 2021-11-26)

Robust Permutation Tests in Linear Instrumental Variables Regression

Authors: Purevdorj Tuvaandorj

This paper develops permutation versions of identification-robust tests in
linear instrumental variables (IV) regression. Unlike the existing
randomization and rank-based tests in which independence between the
instruments and the error terms is assumed, the permutation Anderson- Rubin
(AR), Lagrange Multiplier (LM) and Conditional Likelihood Ratio (CLR) tests are
asymptotically similar and robust to conditional heteroskedasticity under
standard exclusion restriction i.e. the orthogonality between the instruments
and the error terms. Moreover, when the instruments are independent of the
structural error term, the permutation AR tests are exact, hence robust to
heavy tails. As such, these tests share the strengths of the rank-based tests
and the wild bootstrap AR tests. Numerical illustrations corroborate the
theoretical results.
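
A minimal sketch of a permutation AR test of H0: beta = beta0 is given below; for simplicity it uses the classical homoskedastic AR statistic rather than the heteroskedasticity-robust studentization developed in the paper.

# Minimal sketch of a permutation Anderson-Rubin (AR) test of H0: beta = beta0
# in a linear IV regression, using the classical (homoskedastic) AR statistic.
import numpy as np

def ar_stat(u, Z):
    """Wald statistic for the joint significance of Z in a regression of u on Z."""
    Zc = Z - Z.mean(axis=0)                    # partial out the constant
    uc = u - u.mean()
    coef, *_ = np.linalg.lstsq(Zc, uc, rcond=None)
    resid = uc - Zc @ coef
    sigma2 = resid @ resid / (len(u) - Z.shape[1] - 1)
    return coef @ (Zc.T @ Zc) @ coef / sigma2

rng = np.random.default_rng(7)
n = 400
Z = rng.normal(size=(n, 2))                    # instruments
v = rng.normal(size=n)
x = Z @ np.array([0.8, 0.5]) + v               # endogenous regressor
y = 1.0 * x + (0.7 * v + rng.normal(size=n))   # structural equation, true beta = 1

beta0 = 1.0                                    # null hypothesis to test
u0 = y - beta0 * x
observed = ar_stat(u0, Z)

perm_stats = [ar_stat(u0, Z[rng.permutation(n)]) for _ in range(999)]
p_value = (1 + sum(s >= observed for s in perm_stats)) / (1 + len(perm_stats))
print("permutation AR p-value:", p_value)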

arXiv link: http://arxiv.org/abs/2111.13774v4

Econometrics arXiv paper, submitted: 2021-11-26

Yogurts Choose Consumers? Estimation of Random-Utility Models via Two-Sided Matching

Authors: Odran Bonnet, Alfred Galichon, Yu-Wei Hsieh, Keith O'Hara, Matt Shum

The problem of demand inversion - a crucial step in the estimation of random
utility discrete-choice models - is equivalent to the determination of stable
outcomes in two-sided matching models. This equivalence applies to random
utility models that are not necessarily additive, smooth, nor even invertible.
Based on this equivalence, algorithms for the determination of stable matchings
provide effective computational methods for estimating these models. For
non-invertible models, the identified set of utility vectors is a lattice, and
the matching algorithms recover sharp upper and lower bounds on the utilities.
Our matching approach facilitates estimation of models that were previously
difficult to estimate, such as the pure characteristics model. An empirical
application to voting data from the 1999 European Parliament elections
illustrates the good performance of our matching-based demand inversion
algorithms in practice.

arXiv link: http://arxiv.org/abs/2111.13744v1

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2021-11-25

Expert Aggregation for Financial Forecasting

Authors: Carl Remlinger, Brière Marie, Alasseur Clémence, Joseph Mikael

Machine learning algorithms dedicated to financial time series forecasting
have gained a lot of interest. But choosing between several algorithms can be
challenging, as their estimation accuracy may be unstable over time. Online
aggregation of experts combines the forecasts of a finite set of models in a
single approach without making any assumption about the models. In this paper,
a Bernstein Online Aggregation (BOA) procedure is applied to the construction
of long-short strategies built from individual stock return forecasts coming
from different machine learning models. The online mixture of experts leads to
attractive portfolio performances even in environments characterised by
non-stationarity. The aggregation outperforms individual algorithms, offering a
higher portfolio Sharpe ratio and lower shortfall with similar turnover.
Extensions to expert and aggregation specialisations are also proposed to
improve the overall mixture on a family of portfolio evaluation metrics.

arXiv link: http://arxiv.org/abs/2111.15365v4

Econometrics arXiv updated paper (originally submitted: 2021-11-25)

Difference in Differences and Ratio in Ratios for Limited Dependent Variables

Authors: Myoung-jae Lee, Sanghyeok Lee

Difference in differences (DD) is widely used to find policy/treatment
effects with observational data, but applying DD to limited dependent variables
(LDVs) Y has been problematic. This paper addresses how to apply DD and
related approaches (such as "ratio in ratios" or "ratio in odds ratios") to
binary, count, fractional, multinomial or zero-censored Y under the unifying
framework of `generalized linear models with link functions'. We evaluate DD
and the related approaches with simulation and empirical studies, and recommend
'Poisson Quasi-MLE' for non-negative (such as count or zero-censored) Y and
(multinomial) logit MLE for binary, fractional or multinomial Y.
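
The sketch below implements the recommended Poisson quasi-MLE difference in differences for a nonnegative outcome with statsmodels, using robust standard errors; the data and variable names are illustrative.

# Minimal sketch of the recommended approach for a nonnegative outcome:
# difference in differences via Poisson quasi-MLE with robust standard errors.
# The treatment effect enters multiplicatively through the treat:post term.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)
n = 4000
treat = rng.binomial(1, 0.5, n)
post = rng.binomial(1, 0.5, n)
mu = np.exp(0.2 + 0.3 * treat + 0.1 * post + 0.25 * treat * post)
y = rng.poisson(mu)                                # count outcome
df = pd.DataFrame({"y": y, "treat": treat, "post": post})

dd = smf.glm("y ~ treat * post", data=df,
             family=sm.families.Poisson()).fit(cov_type="HC1")
print(dd.params["treat:post"])   # log "ratio in ratios"; exponentiate for the ratio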

arXiv link: http://arxiv.org/abs/2111.12948v2

Econometrics arXiv updated paper (originally submitted: 2021-11-25)

Network regression and supervised centrality estimation

Authors: Junhui Cai, Dan Yang, Ran Chen, Wu Zhu, Haipeng Shen, Linda Zhao

The centrality in a network is often used to measure nodes' importance and
model network effects on a certain outcome. Empirical studies widely adopt a
two-stage procedure, which first estimates the centrality from the observed
noisy network and then infers the network effect from the estimated centrality,
even though the theoretical properties of this procedure are not well understood. We propose a unified modeling
framework to study the properties of centrality estimation and inference and
the subsequent network regression analysis with noisy network observations.
Furthermore, we propose a supervised centrality estimation methodology, which
aims to simultaneously estimate both centrality and network effect. We showcase
the advantages of our method compared with the two-stage method both
theoretically and numerically via extensive simulations and a case study in
predicting currency risk premiums from the global trade network.

arXiv link: http://arxiv.org/abs/2111.12921v3

Econometrics arXiv paper, submitted: 2021-11-24

Maximum Likelihood Estimation of Differentiated Products Demand Systems

Authors: Greg Lewis, Bora Ozaltun, Georgios Zervas

We discuss estimation of the differentiated products demand system of Berry
et al (1995) (BLP) by maximum likelihood estimation (MLE). We derive the
maximum likelihood estimator in the case where prices are endogenously
generated by firms that set prices in Bertrand-Nash equilibrium. In Monte Carlo
simulations the MLE estimator outperforms the best-practice GMM estimator on
both bias and mean squared error when the model is correctly specified. This
remains true under some forms of misspecification. In our simulations, the
coverage of the ML estimator is close to its nominal level, whereas the GMM
estimator tends to under-cover. We conclude the paper by estimating BLP on the
car data used in the original Berry et al (1995) paper, obtaining similar
estimates with considerably tighter standard errors.

arXiv link: http://arxiv.org/abs/2111.12397v1

Econometrics arXiv updated paper (originally submitted: 2021-11-24)

On Recoding Ordered Treatments as Binary Indicators

Authors: Evan K. Rose, Yotam Shem-Tov

Researchers using instrumental variables to investigate ordered treatments
often recode treatment into an indicator for any exposure. We investigate this
estimand under the assumption that the instruments shift compliers from no
treatment to some but not from some treatment to more. We show that when there
are extensive margin compliers only (EMCO) this estimand captures a weighted
average of treatment effects that can be partially unbundled into each complier
group's potential outcome means. We also establish an equivalence between EMCO
and a two-factor selection model and apply our results to study treatment
heterogeneity in the Oregon Health Insurance Experiment.

arXiv link: http://arxiv.org/abs/2111.12258v4

Econometrics arXiv paper, submitted: 2021-11-22

Interactive Effects Panel Data Models with General Factors and Regressors

Authors: Bin Peng, Liangjun Su, Joakim Westerlund, Yanrong Yang

This paper considers a model with general regressors and unobservable
factors. An estimator based on iterated principal components is proposed, which
is shown to be not only asymptotically normal and oracle efficient, but under
certain conditions also free of the otherwise so common asymptotic incidental
parameters bias. Interestingly, the conditions required to achieve unbiasedness
become weaker the stronger the trends in the factors, and if the trending is
strong enough unbiasedness comes at no cost at all. In particular, the approach
does not require any knowledge of how many factors there are, or whether they
are deterministic or stochastic. The order of integration of the factors is
also treated as unknown, as is the order of integration of the regressors,
which means that there is no need to pre-test for unit roots, or to decide on
which deterministic terms to include in the model.

arXiv link: http://arxiv.org/abs/2111.11506v1

Econometrics arXiv updated paper (originally submitted: 2021-11-21)

Orthogonal Policy Learning Under Ambiguity

Authors: Riccardo D'Adamo

This paper studies the problem of estimating individualized treatment rules
when treatment effects are partially identified, as is often the case with
observational data. By drawing connections between the treatment assignment
problem and classical decision theory, we characterize several notions of
optimal treatment policies in the presence of partial identification. Our
unified framework allows us to incorporate user-defined constraints on the set of
allowable policies, such as restrictions for transparency or interpretability,
while also ensuring computational feasibility. We show how partial
identification leads to a new policy learning problem where the objective
function is directionally -- but not fully -- differentiable with respect to
the nuisance first-stage. We then propose an estimation procedure that ensures
Neyman-orthogonality with respect to the nuisance components and we provide
statistical guarantees that depend on the amount of concentration around the
points of non-differentiability in the data-generating-process. The proposed
methods are illustrated using data from the Job Training Partnership Act study.

arXiv link: http://arxiv.org/abs/2111.10904v3

Econometrics arXiv paper, submitted: 2021-11-21

Why Synthetic Control estimators are biased and what to do about it: Introducing Relaxed and Penalized Synthetic Controls

Authors: Oscar Engelbrektson

This paper extends the literature on the theoretical properties of synthetic
controls to the case of non-linear generative models, showing that the
synthetic control estimator is generally biased in such settings. I derive a
lower bound for the bias, showing that the only component of it that is
affected by the choice of synthetic control is the weighted sum of pairwise
differences between the treated unit and the untreated units in the synthetic
control. To address this bias, I propose a novel synthetic control estimator
that allows for a constant difference of the synthetic control to the treated
unit in the pre-treatment period, and that penalizes the pairwise
discrepancies. Allowing for a constant offset makes the model more flexible,
thus creating a larger set of potential synthetic controls, and the
penalization term allows for the selection of the potential solution that will
minimize bias. I study the properties of this estimator and propose a
data-driven process for parameterizing the penalization term.
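
The sketch below illustrates the two ingredients described, a constant pre-treatment offset and a penalty on pairwise discrepancies, by solving a small constrained least-squares problem with scipy; the penalty weight is fixed by hand rather than chosen by the paper's data-driven procedure.

# Minimal sketch of a relaxed and penalized synthetic control: simplex weights,
# a free intercept (offset), and a penalty on per-donor pre-treatment discrepancies.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(9)
T0, J = 30, 10
Y0_pre = rng.normal(size=(T0, J))                     # donor pool, pre-treatment outcomes
Y1_pre = Y0_pre[:, :3] @ np.array([0.5, 0.3, 0.2]) + 0.4 + 0.1 * rng.normal(size=T0)

lam = 0.1                                             # fixed penalty weight (illustrative)
pair_disc = ((Y1_pre[:, None] - Y0_pre) ** 2).mean(axis=0)   # per-donor discrepancy

def objective(theta):
    w, c = theta[:J], theta[J]
    fit = Y1_pre - c - Y0_pre @ w
    return fit @ fit / T0 + lam * w @ pair_disc

x0 = np.concatenate([np.full(J, 1.0 / J), [0.0]])
res = minimize(objective, x0, method="SLSQP",
               bounds=[(0.0, 1.0)] * J + [(None, None)],
               constraints=[{"type": "eq", "fun": lambda th: th[:J].sum() - 1.0}])
w_hat, c_hat = res.x[:J], res.x[J]
print(np.round(w_hat, 3), round(c_hat, 3))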

arXiv link: http://arxiv.org/abs/2111.10784v1

Econometrics arXiv updated paper (originally submitted: 2021-11-21)

Identifying Dynamic Discrete Choice Models with Hyperbolic Discounting

Authors: Taiga Tsubota

We study identification of dynamic discrete choice models with hyperbolic
discounting. We show that the standard discount factor, present bias factor,
and instantaneous utility functions for the sophisticated agent are
point-identified from observed conditional choice probabilities and transition
probabilities in a finite horizon model. The main idea to achieve
identification is to exploit variation in the observed conditional choice
probabilities over time. We present the estimation method and demonstrate a
good performance of the estimator by simulation.

arXiv link: http://arxiv.org/abs/2111.10721v4

Econometrics arXiv paper, submitted: 2021-11-21

Optimized Inference in Regression Kink Designs

Authors: Majed Dodin

We propose a method to remedy finite sample coverage problems and improve
upon the efficiency of commonly employed procedures for the construction of
nonparametric confidence intervals in regression kink designs. The proposed
interval is centered at the half-length optimal, numerically obtained linear
minimax estimator over distributions with Lipschitz constrained conditional
mean function. Its construction ensures excellent finite sample coverage and
length properties which are demonstrated in a simulation study and an empirical
illustration. Given the Lipschitz constant that governs how much curvature one
plausibly allows for, the procedure is fully data driven, computationally
inexpensive, incorporates shape constraints and is valid irrespective of the
distribution of the assignment variable.

arXiv link: http://arxiv.org/abs/2111.10713v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2021-11-16

An Empirical Evaluation of the Impact of New York's Bail Reform on Crime Using Synthetic Controls

Authors: Angela Zhou, Andrew Koo, Nathan Kallus, Rene Ropac, Richard Peterson, Stephen Koppel, Tiffany Bergin

We conduct an empirical evaluation of the impact of New York's bail reform on
crime. New York State's Bail Elimination Act went into effect on January 1,
2020, eliminating money bail and pretrial detention for nearly all misdemeanor
and nonviolent felony defendants. Our analysis of effects on aggregate crime
rates after the reform informs the understanding of bail reform and general
deterrence. We conduct a synthetic control analysis for a comparative case
study of the impact of bail reform. We focus on synthetic control analysis of
post-intervention changes in crime for assault, theft, burglary, robbery, and
drug crimes, constructing a dataset from publicly reported crime data of 27
large municipalities. Our findings, including placebo checks and other
robustness checks, show that for assault, theft, and drug crimes, there is no
significant impact of bail reform on crime; for burglary and robbery, we
similarly have null findings but the synthetic control is also more variable so
these are deemed less conclusive.

arXiv link: http://arxiv.org/abs/2111.08664v2

Econometrics arXiv updated paper (originally submitted: 2021-11-16)

Optimal Stratification of Survey Experiments

Authors: Max Cytrynbaum

This paper studies a two-stage model of experimentation, where the researcher
first samples representative units from an eligible pool, then assigns each
sampled unit to treatment or control. To implement balanced sampling and
assignment, we introduce a new family of finely stratified designs that
generalize matched pairs randomization to propensities p(x) not equal to 1/2.
We show that two-stage stratification nonparametrically dampens the variance of
treatment effect estimation. We formulate and solve the optimal stratification
problem with heterogeneous costs and fixed budget, providing simple heuristics
for the optimal design. In settings with pilot data, we show that implementing
a consistent estimate of this design is also efficient, minimizing asymptotic
variance subject to the budget constraint. We also provide new asymptotically
exact inference methods, allowing experimenters to fully exploit the efficiency
gains from both stratified sampling and assignment. An application to nine
papers recently published in top economics journals demonstrates the value of
our methods.

arXiv link: http://arxiv.org/abs/2111.08157v2

Econometrics arXiv updated paper (originally submitted: 2021-11-15)

Abductive Inference and C. S. Peirce: 150 Years Later

Authors: Deep Mukhopadhyay

This paper is about two things: (i) Charles Sanders Peirce (1837-1914) -- an
iconoclastic philosopher and polymath who is among the greatest of American
minds. (ii) Abductive inference -- a term coined by C. S. Peirce, which he
defined as "the process of forming explanatory hypotheses. It is the only
logical operation which introduces any new idea."
Abductive inference and quantitative economics: Abductive inference plays a
fundamental role in empirical scientific research as a tool for discovery and
data analysis. Heckman and Singer (2017) strongly advocated "Economists should
abduct." Arnold Zellner (2007) stressed that "much greater emphasis on
reductive [abductive] inference in teaching econometrics, statistics, and
economics would be desirable." But currently, there are no established theory
or practical tools that can allow an empirical analyst to abduct. This paper
attempts to fill this gap by introducing new principles and concrete procedures
to the Economics and Statistics community. I termed the proposed approach as
Abductive Inference Machine (AIM).
The historical Peirce's experiment: In 1872, Peirce conducted a series of
experiments to determine the distribution of response times to an auditory
stimulus, which is widely regarded as one of the most significant statistical
investigations in the history of nineteenth-century American mathematical
research (Stigler, 1978). On the 150th anniversary of this historical
experiment, we look back at the Peircean-style abductive inference through a
modern statistical lens. Using Peirce's data, it is shown how empirical
analysts can abduct in a systematic and automated manner using AIM.

arXiv link: http://arxiv.org/abs/2111.08054v3

Econometrics arXiv paper, submitted: 2021-11-15

An Outcome Test of Discrimination for Ranked Lists

Authors: Jonathan Roth, Guillaume Saint-Jacques, YinYin Yu

This paper extends Becker (1957)'s outcome test of discrimination to settings
where a (human or algorithmic) decision-maker produces a ranked list of
candidates. Ranked lists are particularly relevant in the context of online
platforms that produce search results or feeds, and also arise when human
decision-makers express ordinal preferences over a list of candidates. We show
that non-discrimination implies a system of moment inequalities, which
intuitively impose that one cannot permute the position of a lower-ranked
candidate from one group with a higher-ranked candidate from a second group and
systematically improve the objective. Moreover, we show that these moment
inequalities are the only testable implications of non-discrimination when the
auditor observes only outcomes and group membership by rank. We show how to
statistically test the implied inequalities, and validate our approach in an
application using data from LinkedIn.

arXiv link: http://arxiv.org/abs/2111.07889v1

Econometrics arXiv paper, submitted: 2021-11-15

Dynamic Network Quantile Regression Model

Authors: Xiu Xu, Weining Wang, Yongcheol Shin, Chaowen Zheng

We propose a dynamic network quantile regression model to investigate the
quantile connectedness using a predetermined network information. We extend the
existing network quantile autoregression model of Zhu et al. (2019b) by
explicitly allowing the contemporaneous network effects and controlling for the
common factors across quantiles. To cope with the endogeneity issue due to
simultaneous network spillovers, we adopt the instrumental variable quantile
regression (IVQR) estimation and derive the consistency and asymptotic
normality of the IVQR estimator using the near epoch dependence property of the
network process. Via Monte Carlo simulations, we confirm the satisfactory
performance of the IVQR estimator across different quantiles under the
different network structures. Finally, we demonstrate the usefulness of our
proposed approach with an application to the dataset on the stocks traded in
NYSE and NASDAQ in 2016.

arXiv link: http://arxiv.org/abs/2111.07633v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2021-11-14

Decoding Causality by Fictitious VAR Modeling

Authors: Xingwei Hu

In modeling multivariate time series for either forecast or policy analysis,
it would be beneficial to have figured out the cause-effect relations within
the data. Regression analysis, however, generally addresses correlational
relations, and little research has focused on variance analysis for causality
discovery. We first set up an equilibrium for the cause-effect relations using
a fictitious vector autoregressive model. In the equilibrium, long-run
relations are identified from noise, and spurious ones are negligibly close to
zero. The solution, called the causality distribution, measures the relative
strength causing the movement of all series or specific affected ones. If a
group of exogenous data affects the others but not vice versa, then, in theory,
the causality distribution for other variables is necessarily zero. The
hypothesis test of zero causality is the rule for deciding whether a variable
is endogenous. Our new approach has high accuracy in identifying the true
cause-effect relations among the data in the simulation studies. We also apply
the approach to estimating the causal factors' contribution to climate change.

arXiv link: http://arxiv.org/abs/2111.07465v2

Econometrics arXiv updated paper (originally submitted: 2021-11-14)

When Can We Ignore Measurement Error in the Running Variable?

Authors: Yingying Dong, Michal Kolesár

In many applications of regression discontinuity designs, the running
variable used by the administrator to assign treatment is only observed with
error. We show that, provided the observed running variable (i) correctly
classifies the treatment assignment, and (ii) affects the conditional means of
the potential outcomes smoothly, ignoring the measurement error nonetheless
yields an estimate with a causal interpretation: the average treatment effect
for units whose observed running variable equals the cutoff. We show that,
possibly after doughnut trimming, these assumptions accommodate a variety of
settings where support of the measurement error is not too wide. We propose to
conduct inference using bias-aware methods, which remain valid even when
discreteness or irregular support in the observed running variable may lead to
partial identification. We illustrate the results for both sharp and fuzzy
designs in an empirical application.

arXiv link: http://arxiv.org/abs/2111.07388v4

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2021-11-14

Rational AI: A comparison of human and AI responses to triggers of economic irrationality in poker

Authors: C. Grace Haaf, Devansh Singh, Cinny Lin, Scofield Zou

Humans exhibit irrational decision-making patterns in response to
environmental triggers, such as experiencing an economic loss or gain. In this
paper we investigate whether algorithms exhibit the same behavior by examining
the observed decisions and latent risk and rationality parameters estimated by
a random utility model with constant relative risk-aversion utility function.
We use a dataset consisting of 10,000 hands of poker played by Pluribus, the
first algorithm in the world to beat professional human players, and find that (1)
Pluribus does shift its playing style in response to economic losses and gains,
ceteris paribus; (2) Pluribus becomes more risk-averse and rational following a
trigger but the humans become more risk-seeking and irrational; (3) the
differences in playing styles between Pluribus and the humans along the dimensions
of risk-aversion and rationality are particularly distinguishable when both have
experienced a trigger. This provides support that decision-making patterns
could be used as "behavioral signatures" to identify human versus algorithmic
decision-makers in unlabeled contexts.

arXiv link: http://arxiv.org/abs/2111.07295v1

Econometrics arXiv paper, submitted: 2021-11-14

Large Order-Invariant Bayesian VARs with Stochastic Volatility

Authors: Joshua C. C. Chan, Gary Koop, Xuewen Yu

Many popular specifications for Vector Autoregressions (VARs) with
multivariate stochastic volatility are not invariant to the way the variables
are ordered due to the use of a Cholesky decomposition for the error covariance
matrix. We show that the order invariance problem in existing approaches is
likely to become more serious in large VARs. We propose the use of a
specification which avoids the use of this Cholesky decomposition. We show that
the presence of multivariate stochastic volatility allows for identification of
the proposed model and prove that it is invariant to ordering. We develop a
Markov Chain Monte Carlo algorithm which allows for Bayesian estimation and
prediction. In exercises involving artificial and real macroeconomic data, we
demonstrate that the choice of variable ordering can have non-negligible
effects on empirical results. In a macroeconomic forecasting exercise involving
VARs with 20 variables we find that our order-invariant approach leads to the
best forecasts and that some choices of variable ordering can lead to poor
forecasts using a conventional, non-order invariant, approach.

arXiv link: http://arxiv.org/abs/2111.07225v1

Econometrics arXiv paper, submitted: 2021-11-13

Asymmetric Conjugate Priors for Large Bayesian VARs

Authors: Joshua C. C. Chan

Large Bayesian VARs are now widely used in empirical macroeconomics. One
popular shrinkage prior in this setting is the natural conjugate prior as it
facilitates posterior simulation and leads to a range of useful analytical
results. This is, however, at the expense of modeling flexibility, as it rules
out cross-variable shrinkage -- i.e., shrinking coefficients on lags of other
variables more aggressively than those on own lags. We develop a prior that has
the best of both worlds: it can accommodate cross-variable shrinkage, while
maintaining many useful analytical results, such as a closed-form expression of
the marginal likelihood. This new prior also leads to fast posterior simulation
-- for a BVAR with 100 variables and 4 lags, obtaining 10,000 posterior draws
takes less than half a minute on a standard desktop. We demonstrate the
usefulness of the new prior via a structural analysis using a 15-variable VAR
with sign restrictions to identify 5 structural shocks.

arXiv link: http://arxiv.org/abs/2111.07170v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-11-12

Absolute and Relative Bias in Eight Common Observational Study Designs: Evidence from a Meta-analysis

Authors: Jelena Zurovac, Thomas D. Cook, John Deke, Mariel M. Finucane, Duncan Chaplin, Jared S. Coopersmith, Michael Barna, Lauren Vollmer Forrow

Observational studies are needed when experiments are not possible. Within
study comparisons (WSC) compare observational and experimental estimates that
test the same hypothesis using the same treatment group, outcome, and estimand.
Meta-analyzing 39 of them, we compare mean bias and its variance for the eight
observational designs that result from combining whether there is a pretest
measure of the outcome or not, whether the comparison group is local to the
treatment group or not, and whether there is a relatively rich set of other
covariates or not. Of these eight designs, one combines all three design
elements, another has none, and the remainder include any one or two. We found
that both the mean and variance of bias decline as design elements are added,
with the lowest mean and smallest variance in a design with all three elements.
The probability of bias falling within 0.10 standard deviations of the
experimental estimate varied from 59 to 83 percent in Bayesian analyses and
from 86 to 100 percent in non-Bayesian ones -- the ranges depending on the
level of data aggregation. But confounding remains possible due to each of the
eight observational study design cells including a different set of WSC
studies.

arXiv link: http://arxiv.org/abs/2111.06941v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-11-12

Dynamic treatment effects: high-dimensional inference under model misspecification

Authors: Yuqian Zhang, Weijie Ji, Jelena Bradic

Estimating dynamic treatment effects is crucial across various disciplines,
providing insights into the time-dependent causal impact of interventions.
However, this estimation poses challenges due to time-varying confounding,
leading to potentially biased estimates. Furthermore, accurately specifying the
growing number of treatment assignments and outcome models with multiple
exposures appears increasingly challenging to accomplish. Double robustness,
which permits model misspecification, holds great value in addressing these
challenges. This paper introduces a novel "sequential model doubly robust"
estimator. We develop novel moment-targeting estimates to account for
confounding effects and establish that root-$N$ inference can be achieved as
long as at least one nuisance model is correctly specified at each exposure
time, despite the presence of high-dimensional covariates. Although the
nuisance estimates themselves do not achieve root-$N$ rates, the carefully
designed loss functions in our framework ensure final root-$N$ inference for
the causal parameter of interest. Unlike off-the-shelf high-dimensional
methods, which fail to deliver robust inference under model misspecification
even within the doubly robust framework, our newly developed loss functions
address this limitation effectively.

arXiv link: http://arxiv.org/abs/2111.06818v3

Econometrics arXiv updated paper (originally submitted: 2021-11-12)

Bounds for Treatment Effects in the Presence of Anticipatory Behavior

Authors: Aibo Gong

In program evaluations, units can often anticipate the implementation of a
new policy before it occurs. Such anticipatory behavior can lead to units'
outcomes becoming dependent on their future treatment assignments. In this
paper, I employ a potential-outcomes framework to analyze the treatment effect
with anticipation. I start with a classical difference-in-differences model
with two time periods and provide identified sets with easy-to-implement
estimation and inference strategies for causal parameters. Empirical
applications and generalizations are provided. I illustrate my results by
analyzing the effect of an early retirement incentive program for teachers,
which the target units were likely to anticipate, on student achievement. The
empirical results show that the effect can be overestimated by up to 30% in the
worst case and demonstrate the potential pitfalls of failing to consider
anticipation in policy evaluation.

arXiv link: http://arxiv.org/abs/2111.06573v2

Econometrics arXiv paper, submitted: 2021-11-09

Generalized Kernel Ridge Regression for Causal Inference with Missing-at-Random Sample Selection

Authors: Rahul Singh

I propose kernel ridge regression estimators for nonparametric dose response
curves and semiparametric treatment effects in the setting where an analyst has
access to a selected sample rather than a random sample; only for select
observations, the outcome is observed. I assume selection is as good as random
conditional on treatment and a sufficiently rich set of observed covariates,
where the covariates are allowed to cause treatment or be caused by treatment
-- an extension of missingness-at-random (MAR). I propose estimators of means,
increments, and distributions of counterfactual outcomes with closed form
solutions in terms of kernel matrix operations, allowing treatment and
covariates to be discrete or continuous, and low, high, or infinite
dimensional. For the continuous treatment case, I prove uniform consistency
with finite sample rates. For the discrete treatment case, I prove root-n
consistency, Gaussian approximation, and semiparametric efficiency.
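
The sketch below gives a simplified version of the continuous-treatment case: kernel ridge regression of the outcome on treatment and covariates using only selected observations, with the dose-response curve obtained by averaging predictions over the covariate distribution. It uses scikit-learn's KernelRidge rather than the paper's closed-form counterfactual-mean estimators, and it assumes selection is ignorable given treatment and covariates, as described above.

# Minimal sketch: dose-response curve by kernel ridge regression on the selected
# sample, averaging predictions over the full-sample covariate distribution.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(10)
n = 2000
X = rng.normal(size=(n, 2))                       # covariates
D = 0.5 * X[:, 0] + rng.normal(size=n)            # continuous treatment
Y = np.sin(D) + X[:, 1] + 0.3 * rng.normal(size=n)
selected = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * D + X[:, 0]))), n).astype(bool)

features = np.column_stack([D, X])
model = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.5)
model.fit(features[selected], Y[selected])        # fit on the selected sample only

dose_grid = np.linspace(-2, 2, 9)
curve = [model.predict(np.column_stack([np.full(n, d), X])).mean() for d in dose_grid]
print(np.round(curve, 2))                         # estimated E[Y(d)] along the grid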

arXiv link: http://arxiv.org/abs/2111.05277v1

Econometrics arXiv updated paper (originally submitted: 2021-11-09)

Bounding Treatment Effects by Pooling Limited Information across Observations

Authors: Sokbae Lee, Martin Weidner

We provide novel bounds on average treatment effects (on the treated) that
are valid under an unconfoundedness assumption. Our bounds are designed to be
robust in challenging situations, for example, when the conditioning variables
take on a large number of different values in the observed sample, or when the
overlap condition is violated. This robustness is achieved by only using
limited "pooling" of information across observations. Namely, the bounds are
constructed as sample averages over functions of the observed outcomes such
that the contribution of each outcome only depends on the treatment status of a
limited number of observations. No information pooling across observations
leads to so-called "Manski bounds", while unlimited information pooling leads
to standard inverse propensity score weighting. We explore the intermediate
range between these two extremes and provide corresponding inference methods.
We show in Monte Carlo experiments and through two empirical applications that
our bounds are indeed robust and informative in practice.

arXiv link: http://arxiv.org/abs/2111.05243v5

Econometrics arXiv updated paper (originally submitted: 2021-11-09)

Optimal Decision Rules Under Partial Identification

Authors: Kohei Yata

I consider a class of statistical decision problems in which the policymaker
must decide between two policies to maximize social welfare (e.g., the
population mean of an outcome) based on a finite sample. The framework
introduced in this paper allows for various types of restrictions on the
structural parameter (e.g., the smoothness of a conditional mean potential
outcome function) and accommodates settings with partial identification of
social welfare. As the main theoretical result, I derive a finite-sample
optimal decision rule under the minimax regret criterion. This rule has a
simple form, yet achieves optimality among all decision rules; no ad hoc
restrictions are imposed on the class of decision rules. I apply my results to
the problem of whether to change an eligibility cutoff in a regression
discontinuity setup, and illustrate them in an empirical application to a
school construction program in Burkina Faso.

arXiv link: http://arxiv.org/abs/2111.04926v4

Econometrics arXiv paper, submitted: 2021-11-09

Pair copula constructions of point-optimal sign-based tests for predictive linear and nonlinear regressions

Authors: Kaveh Salehzadeh Nobari

We propose pair copula constructed point-optimal sign tests in the context of
linear and nonlinear predictive regressions with endogenous, persistent
regressors, and disturbances exhibiting serial (nonlinear) dependence. The
proposed approach entails considering the entire dependence structure of the
signs to capture the serial dependence, and building feasible test statistics
based on pair copula constructions of the sign process. The tests are exact and
valid in the presence of heavy tailed and nonstandard errors, as well as
heterogeneous and persistent volatility. Furthermore, they may be inverted to
build confidence regions for the parameters of the regression function.
Finally, we adopt an adaptive approach based on the split-sample technique to
maximize the power of the test by finding an appropriate alternative
hypothesis. In a Monte Carlo study, we compare the size and power of the
proposed "quasi"-point-optimal sign tests based on pair copula constructions
to those of existing tests that are intended to be robust against
heteroskedasticity. The simulation results confirm the superiority of our
procedures over existing popular tests.

arXiv link: http://arxiv.org/abs/2111.04919v1

Econometrics arXiv paper, submitted: 2021-11-08

Exponential GARCH-Ito Volatility Models

Authors: Donggyu Kim

This paper introduces a novel Ito diffusion process to model high-frequency
financial data, which can accommodate low-frequency volatility dynamics by
embedding the discrete-time non-linear exponential GARCH structure with
log-integrated volatility in a continuous instantaneous volatility process. The
key feature of the proposed model is that, unlike existing GARCH-Ito models,
the instantaneous volatility process has a non-linear structure, which ensures
that the log-integrated volatilities have the realized GARCH structure. We call
this the exponential realized GARCH-Ito (ERGI) model. Given the auto-regressive
structure of the log-integrated volatility, we propose a quasi-likelihood
estimation procedure for parameter estimation and establish its asymptotic
properties. We conduct a simulation study to check the finite sample
performance of the proposed model and an empirical study with 50 assets among
the S&P 500 compositions. The numerical studies show the advantages of the new
proposed model.

arXiv link: http://arxiv.org/abs/2111.04267v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-11-08

Rate-Optimal Cluster-Randomized Designs for Spatial Interference

Authors: Michael P. Leung

We consider a potential outcomes model in which interference may be present
between any two units but the extent of interference diminishes with spatial
distance. The causal estimand is the global average treatment effect, which
compares outcomes under the counterfactuals that all or no units are treated.
We study a class of designs in which space is partitioned into clusters that
are randomized into treatment and control. For each design, we estimate the
treatment effect using a Horvitz-Thompson estimator that compares the average
outcomes of units with all or no neighbors treated, where the neighborhood
radius is of the same order as the cluster size dictated by the design. We
derive the estimator's rate of convergence as a function of the design and
degree of interference and use this to obtain estimator-design pairs that
achieve near-optimal rates of convergence under relatively minimal assumptions
on interference. We prove that the estimators are asymptotically normal and
provide a variance estimator. For practical implementation of the designs, we
suggest partitioning space using clustering algorithms.
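
The sketch below illustrates the design-estimator pair on a simulated lattice: square clusters are randomized, and a Horvitz-Thompson contrast compares units whose entire neighborhood is treated with units whose neighborhood is entirely untreated, with exposure probabilities computed from the number of clusters overlapping each neighborhood. This is a toy version under stylized assumptions, not the paper's near-optimal design.

# Minimal sketch: cluster-randomized design on a lattice and a Horvitz-Thompson
# contrast between fully treated and fully untreated neighborhoods.
import numpy as np

rng = np.random.default_rng(11)
side, block = 30, 5                        # 30x30 lattice partitioned into 5x5 clusters
xx, yy = np.meshgrid(np.arange(side), np.arange(side))
coords = np.column_stack([xx.ravel(), yy.ravel()])
cluster = (coords[:, 0] // block) * (side // block) + coords[:, 1] // block
n = len(coords)

radius = 1.5                               # neighborhood radius (small relative to cluster size)
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
neighbors = dist <= radius                 # includes the unit itself

treated_cluster = rng.binomial(1, 0.5, cluster.max() + 1).astype(bool)
D = treated_cluster[cluster]               # unit-level treatment assignment

tau = 0.5                                  # true global effect built into the simulated outcome
Y = 1.0 + tau * (neighbors & D).sum(1) / neighbors.sum(1) + 0.2 * rng.normal(size=n)

k = np.array([np.unique(cluster[neighbors[i]]).size for i in range(n)])
pi = 0.5 ** k                              # P(all neighbors treated) = P(none treated)
all_treated = np.array([D[neighbors[i]].all() for i in range(n)])
none_treated = np.array([not D[neighbors[i]].any() for i in range(n)])

ht = (Y * all_treated / pi).mean() - (Y * none_treated / pi).mean()
print("Horvitz-Thompson estimate of the global effect:", round(ht, 3))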

arXiv link: http://arxiv.org/abs/2111.04219v4

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-11-06

Sequential Kernel Embedding for Mediated and Time-Varying Dose Response Curves

Authors: Rahul Singh, Liyuan Xu, Arthur Gretton

We propose simple nonparametric estimators for mediated and time-varying dose
response curves based on kernel ridge regression. By embedding Pearl's
mediation formula and Robins' g-formula with kernels, we allow treatments,
mediators, and covariates to be continuous in general spaces, and also allow
for nonlinear treatment-confounder feedback. Our key innovation is a
reproducing kernel Hilbert space technique called sequential kernel embedding,
which we use to construct simple estimators that account for complex feedback.
Our estimators preserve the generality of classic identification while also
achieving nonasymptotic uniform rates. In nonlinear simulations with many
covariates, we demonstrate strong performance. We estimate mediated and
time-varying dose response curves of the US Job Corps, and clean data that may
serve as a benchmark in future work. We extend our results to mediated and
time-varying treatment effects and counterfactual distributions, verifying
semiparametric efficiency and weak convergence.

arXiv link: http://arxiv.org/abs/2111.03950v5

Econometrics arXiv paper, submitted: 2021-11-05

Bootstrap inference for panel data quantile regression

Authors: Antonio F. Galvao, Thomas Parker, Zhijie Xiao

This paper develops bootstrap methods for practical statistical inference in
panel data quantile regression models with fixed effects. We consider
random-weighted bootstrap resampling and formally establish its validity for
asymptotic inference. The bootstrap algorithm is simple to implement in
practice by using a weighted quantile regression estimation for fixed effects
panel data. We provide results under conditions that allow for temporal
dependence of observations within individuals, thus encompassing a large class
of possible empirical applications. Monte Carlo simulations provide numerical
evidence that the proposed bootstrap methods have correct finite sample properties.
Finally, we provide an empirical illustration using the environmental Kuznets
curve.

arXiv link: http://arxiv.org/abs/2111.03626v1

Econometrics arXiv paper, submitted: 2021-11-04

Structural Breaks in Interactive Effects Panels and the Stock Market Reaction to COVID-19

Authors: Yiannis Karavias, Paresh Narayan, Joakim Westerlund

Dealing with structural breaks is an important step in most, if not all,
empirical economic research. This is particularly true in panel data comprised
of many cross-sectional units, such as individuals, firms or countries, which
are all affected by major events. The COVID-19 pandemic has affected most
sectors of the global economy, and there is by now plenty of evidence to
support this. The impact on stock markets is, however, still unclear. The fact
that most markets seem to have partly recovered while the pandemic is still
ongoing suggests that the relationship between stock returns and COVID-19 has
been subject to structural change. It is therefore important to know if a
structural break has occurred and, if it has, to infer the date of the break.
In the present paper we take this last observation as a source of motivation to
develop a new break detection toolbox that is applicable to different sized
panels, easy to implement and robust to general forms of unobserved
heterogeneity. The toolbox, which is the first of its kind, includes a test for
structural change, a break date estimator, and a break date confidence
interval. Application to a panel covering 61 countries from January 3 to
September 25, 2020, leads to the detection of a structural break that is dated
to the first week of April. The effect of COVID-19 is negative before the break
and zero thereafter, implying that while markets did react, the reaction was
short-lived. A possible explanation for this is the quantitative easing
programs announced by central banks all over the world in the second half of
March.

arXiv link: http://arxiv.org/abs/2111.03035v1

Econometrics arXiv cross-link from cs.SI (cs.SI), submitted: 2021-11-04

Monitoring COVID-19-induced gender differences in teleworking rates using Mobile Network Data

Authors: Sara Grubanov-Boskovic, Spyridon Spyratos, Stefano Maria Iacus, Umberto Minora, Francesco Sermi

The COVID-19 pandemic has created a sudden need for a wider uptake of
home-based telework as a means of sustaining production. Generally,
teleworking arrangements directly impact workers' efficiency and motivation.
The direction of this impact, however, depends on the balance between positive
effects of teleworking (e.g. increased flexibility and autonomy) and its
downsides (e.g. blurring boundaries between private and work life). Moreover,
these effects of teleworking can be amplified in case of vulnerable groups of
workers, such as women. The first step in understanding the implications of
teleworking on women is to have timely information on the extent of teleworking
by age and gender. In the absence of timely official statistics, in this paper
we propose a method for nowcasting the teleworking trends by age and gender for
20 Italian regions using mobile network operators (MNO) data. The method is
developed and validated using MNO data together with the Italian quarterly
Labour Force Survey. Our results confirm that the MNO data have the potential
to be used as a tool for monitoring gender and age differences in teleworking
patterns. This tool becomes even more important today as it could support the
adequate gender mainstreaming in the “Next Generation EU” recovery plan and
help to manage related social impacts of COVID-19 through policymaking.

arXiv link: http://arxiv.org/abs/2111.09442v2

Econometrics arXiv updated paper (originally submitted: 2021-11-03)

occ2vec: A principal approach to representing occupations using natural language processing

Authors: Nicolaj Søndergaard Mühlbach

We propose occ2vec, a principal approach to representing
occupations, which can be used in matching, predictive and causal modeling, and
other economic areas. In particular, we use it to score occupations on any
definable characteristic of interest, say the degree of greenness.
Using more than 17,000 occupation-specific text descriptors, we transform each
occupation into a high-dimensional vector using natural language processing.
Similarly, we assign a vector to the target characteristic and estimate the
occupational degree of this characteristic as the cosine similarity between the
vectors. The main advantages of this approach are its universal applicability
and verifiability contrary to existing ad-hoc approaches. We extensively
validate our approach on several exercises and then use it to estimate the
occupational degree of charisma and emotional intelligence (EQ). We find that
occupations that score high on these tend to have higher educational
requirements. Turning to wages, highly charismatic occupations are found in
either the lower or the upper tail of the wage distribution. This is not the case for EQ,
where higher levels of EQ are generally correlated with higher wages.
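
The scoring step described above reduces to a cosine similarity between vectors. The sketch below illustrates the idea with a simple TF-IDF bag-of-words standing in for the paper's NLP embeddings (so similarity here rests on literal word overlap rather than semantic similarity); the descriptor texts and target phrase are made-up examples, not O*NET data.

```python
# Illustrative cosine-similarity scoring of occupations against a target characteristic.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

descriptors = {
    "Solar photovoltaic installer": ["install solar panels and other green energy systems",
                                     "maintain sustainable renewable energy equipment"],
    "Financial analyst": ["analyse financial statements",
                          "forecast earnings and company valuations"],
}
target = "environmentally friendly green sustainable work"   # characteristic of interest

corpus = [" ".join(texts) for texts in descriptors.values()] + [target]
V = TfidfVectorizer().fit_transform(corpus).toarray()

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

target_vec = V[-1]
for occ, vec in zip(descriptors, V[:-1]):
    print(occ, round(cosine(vec, target_vec), 3))             # occupational "greenness" score
```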

arXiv link: http://arxiv.org/abs/2111.02528v2

Econometrics arXiv paper, submitted: 2021-11-03

Multiplicative Component GARCH Model of Intraday Volatility

Authors: Xiufeng Yan

This paper proposes a multiplicative component intraday volatility model. The
intraday conditional volatility is expressed as the product of intraday
periodic component, intraday stochastic volatility component and daily
conditional volatility component. I extend the multiplicative component
intraday volatility model of Engle (2012) and Andersen and Bollerslev (1998) by
incorporating the durations between consecutive transactions. The model can be
applied to both regularly and irregularly spaced returns. I also provide a
nonparametric estimation technique for the intraday volatility periodicity. The
empirical results suggest that the model can successfully capture the
interdependency of intraday returns.
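
As a schematic illustration of the decomposition described above (the notation here is assumed, not taken from the paper), the conditional variance of the return in intraday interval \(i\) of day \(t\) can be written as \(\sigma_{t,i}^2 = h_t \, s_i \, q_{t,i}\), where \(h_t\) is the daily conditional volatility component, \(s_i\) the intraday periodic component, and \(q_{t,i}\) the intraday stochastic volatility component; the paper's extension additionally conditions these components on the durations between consecutive transactions.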

arXiv link: http://arxiv.org/abs/2111.02376v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-11-03

Leveraging Causal Graphs for Blocking in Randomized Experiments

Authors: Abhishek Kumar Umrawal

Randomized experiments are often performed to study the causal effects of
interest. Blocking is a technique to precisely estimate the causal effects when
the experimental material is not homogeneous. It involves stratifying the
available experimental material based on the covariates causing non-homogeneity
and then randomizing the treatment within those strata (known as blocks). This
eliminates the unwanted effect of the covariates on the causal effects of
interest. We investigate the problem of finding a stable set of covariates to
be used to form blocks that minimizes the variance of the causal effect
estimates. Using the underlying causal graph, we provide an efficient algorithm
to obtain such a set for a general semi-Markovian causal model.

arXiv link: http://arxiv.org/abs/2111.02306v2

Econometrics arXiv paper, submitted: 2021-11-03

Autoregressive conditional duration modelling of high frequency data

Authors: Xiufeng Yan

This paper explores the duration dynamics modelling under the Autoregressive
Conditional Durations (ACD) framework (Engle and Russell 1998). I test
different distributional assumptions for the durations. The empirical results
suggest that the unconditional durations are approximately Gamma distributed.
Moreover, compared with exponential and Weibull distributions, the ACD model
with Gamma-distributed innovations provides the best fit for SPY durations.
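
For readers unfamiliar with the ACD recursion, the following minimal simulation sketches an ACD(1,1) with unit-mean Gamma innovations, the specification favoured in the comparison above; the parameter values are illustrative assumptions, not estimates from the paper.

```python
# Simulate an ACD(1,1): psi_t = omega + alpha * x_{t-1} + beta * psi_{t-1}, x_t = psi_t * eps_t.
import numpy as np

rng = np.random.default_rng(1)
omega, alpha, beta, k = 0.1, 0.1, 0.8, 2.0   # k: Gamma shape, scale chosen so E[eps] = 1
n = 5000
psi = np.empty(n)                            # conditional expected durations
x = np.empty(n)                              # observed durations
psi[0] = omega / (1 - alpha - beta)          # unconditional mean duration
x[0] = psi[0]
for t in range(1, n):
    psi[t] = omega + alpha * x[t - 1] + beta * psi[t - 1]
    x[t] = psi[t] * rng.gamma(k, 1.0 / k)    # unit-mean Gamma innovation
print(x.mean(), psi.mean())
```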

arXiv link: http://arxiv.org/abs/2111.02300v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2021-11-03

What drives the accuracy of PV output forecasts?

Authors: Thi Ngoc Nguyen, Felix Müsgens

Due to the stochastic nature of photovoltaic (PV) power generation, there is
high demand for forecasting PV output to better integrate PV generation into
power grids. Systematic knowledge regarding the factors influencing forecast
accuracy is crucially important, but still mostly unknown. In this paper, we
review 180 papers on PV forecasts and extract a database of forecast errors for
statistical analysis. We show that among the forecast models, hybrid models
consistently outperform the others and will most likely be the future of PV
output forecasting. The use of data processing techniques is positively
correlated with the forecast quality, while the lengths of the forecast horizon
and out-of-sample test set have negative effects on the forecast accuracy. We
also found that the inclusion of numerical weather prediction variables, data
normalization, and data resampling are the most effective data processing
techniques. Furthermore, we found some evidence for cherry picking in reporting
errors and recommend that test sets span at least one year to better assess
model performance. The paper also takes the first step towards establishing a
benchmark for assessing PV output forecasts.

arXiv link: http://arxiv.org/abs/2111.02092v1

Econometrics arXiv paper, submitted: 2021-11-03

Multiple-index Nonstationary Time Series Models: Robust Estimation Theory and Practice

Authors: Chaohua Dong, Jiti Gao, Bin Peng, Yundong Tu

This paper proposes a class of parametric multiple-index time series models
that involve linear combinations of time trends, stationary variables and unit
root processes as regressors. The inclusion of the three different types of
time series, along with the use of a multiple-index structure for these
variables to circumvent the curse of dimensionality, is due to both theoretical
and practical considerations. The M-type estimators (including OLS, LAD,
Huber's estimator, quantile and expectile estimators, etc.) for the index
vectors are proposed, and their asymptotic properties are established, with the
aid of the generalized function approach to accommodate a wide class of loss
functions that may not be necessarily differentiable at every point. The
proposed multiple-index model is then applied to study stock return
predictability, revealing strong nonlinear predictability under various
loss measures. Monte Carlo simulations are also included to evaluate the
finite-sample performance of the proposed estimators.

arXiv link: http://arxiv.org/abs/2111.02023v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2021-11-02

Asymptotic in a class of network models with an increasing sub-Gamma degree sequence

Authors: Jing Luo, Haoyu Wei, Xiaoyu Lei, Jiaxin Guo

Under differential privacy with sub-Gamma noise, we derive the asymptotic
properties of a class of binary network models with a general link function. In
this paper, we release the degree sequences of the binary networks under a
general noisy mechanism, with the discrete Laplace mechanism as a special case.
We establish asymptotic results, including both consistency and asymptotic
normality of the parameter estimator, as the number of parameters goes to
infinity in this class of network models. Simulations and a real data example
are provided to illustrate the asymptotic results.

arXiv link: http://arxiv.org/abs/2111.01301v4

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2021-11-01

Stock Price Prediction Using Time Series, Econometric, Machine Learning, and Deep Learning Models

Authors: Ananda Chatterjee, Hrisav Bhowmick, Jaydip Sen

For a long time, researchers have sought reliable and accurate
predictive models for stock price prediction. According to the literature, if
predictive models are correctly designed and refined, they can estimate future
stock values accurately. This paper demonstrates a set of
time series, econometric, and various learning-based models for stock price
prediction. The data of Infosys, ICICI, and SUN PHARMA from the period of
January 2004 to December 2019 was used here for training and testing the models
to know which model performs best in which sector. One time series model
(Holt-Winters Exponential Smoothing), one econometric model (ARIMA), two
machine learning models (Random Forest and MARS), and two deep learning-based
models (simple RNN and LSTM) have been included in this paper. MARS has been
proved to be the best performing machine learning model, while LSTM has proved
to be the best performing deep learning model. But overall, for all three
sectors - IT (on Infosys data), Banking (on ICICI data), and Health (on SUN
PHARMA data), MARS proved to be the best performing model for stock price
forecasting.

arXiv link: http://arxiv.org/abs/2111.01137v1

Econometrics arXiv updated paper (originally submitted: 2021-11-01)

Funding liquidity, credit risk and unconventional monetary policy in the Euro area: A GVAR approach

Authors: Graziano Moramarco

This paper investigates the transmission of funding liquidity shocks, credit
risk shocks and unconventional monetary policy within the Euro area. To this
end, we estimate a financial GVAR model for Germany, France, Italy and Spain on
monthly data over the period 2006-2017. The interactions between repo markets,
sovereign bonds and banks' CDS spreads are analyzed, explicitly accounting for
the country-specific effects of the ECB's asset purchase programmes. Impulse
response analysis signals marginally significant core-periphery heterogeneity,
flight-to-quality effects and spillovers between liquidity conditions and
credit risk. Simulated reductions in ECB programmes tend to result in higher
government bond yields and bank CDS spreads, especially for Italy and Spain, as
well as in falling repo trade volumes and rising repo rates across the Euro
area. However, only a few responses to shocks achieve statistical significance.

arXiv link: http://arxiv.org/abs/2111.01078v2

Econometrics arXiv updated paper (originally submitted: 2021-11-01)

Nonparametric Cointegrating Regression Functions with Endogeneity and Semi-Long Memory

Authors: Sepideh Mosaferi, Mark S. Kaiser

This article develops nonparametric cointegrating regression models with
endogeneity and semi-long memory. We assume that semi-long memory is produced
in the regressor process by tempering of random shock coefficients. The
fundamental properties of long memory processes are thus retained in the
regressor process. Nonparametric nonlinear cointegrating regressions with
serially dependent errors and endogenous regressors driven by long memory
innovations have been considered in Wang and Phillips (2016). That work also
implemented a statistical specification test for testing whether the regression
function follows a parametric form. The limit theory of the test statistic involves
the local time of fractional Brownian motion. The present paper modifies the
test statistic to be suitable for the semi-long memory case. With this
modification, the limit theory for the test involves the local time of the
standard Brownian motion and is free of the unknown parameter d. Through
simulation studies, we investigate the properties of the nonparametric regression
function estimator as well as of the test statistic. We also demonstrate the use
of the test statistic on actual data sets.

arXiv link: http://arxiv.org/abs/2111.00972v3

Econometrics arXiv updated paper (originally submitted: 2021-11-01)

Financial-cycle ratios and medium-term predictions of GDP: Evidence from the United States

Authors: Graziano Moramarco

Using a large quarterly macroeconomic dataset for the period 1960-2017, we
document the ability of specific financial ratios from the housing market and
firms' aggregate balance sheets to predict GDP over medium-term horizons in the
United States. A cyclically adjusted house price-to-rent ratio and the
liabilities-to-income ratio of the non-financial non-corporate business sector
provide the best in-sample and out-of-sample predictions of GDP growth over
horizons of one to five years, based on a wide variety of rankings. Small
forecasting models that include these indicators outperform popular
high-dimensional models and forecast combinations. The predictive power of the
two ratios appears strong during both recessions and expansions, stable over
time, and consistent with well-established macro-finance theory.

arXiv link: http://arxiv.org/abs/2111.00822v3

Econometrics arXiv paper, submitted: 2021-10-31

On Time-Varying VAR Models: Estimation, Testing and Impulse Response Analysis

Authors: Yayi Yan, Jiti Gao, Bin Peng

Vector autoregressive (VAR) models are widely used in practical studies,
e.g., forecasting, modelling policy transmission mechanisms, and measuring
the connectedness of economic agents. To better capture the dynamics, this paper
introduces a new class of time-varying VAR models in which the coefficients and
the covariance matrix of the error innovations are allowed to change smoothly over
time. Accordingly, we establish a set of results, including impulse response
analysis subject to both short-run timing and long-run restrictions, an
information criterion to select the optimal lag length, and a Wald-type test
for constancy of the coefficients. Simulation studies are
conducted to evaluate the theoretical findings. Finally, we demonstrate the
empirical relevance and usefulness of the proposed methods through an
application to the transmission mechanism of U.S. monetary policy.
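
As one simple way to visualise the kind of smoothly varying coefficient paths described above, the sketch below runs a kernel-weighted ("local constant") least squares fit of a time-varying VAR(1); the kernel, bandwidth, data-generating process, and absence of an intercept are illustrative assumptions, not the authors' estimator or inference procedure.

```python
# Kernel-weighted local estimation of a smoothly time-varying VAR(1) coefficient matrix.
import numpy as np

rng = np.random.default_rng(0)
T, h = 400, 0.15                                   # sample size, bandwidth (fraction of T)

def A_true(tau):
    """Smoothly varying VAR(1) coefficient matrix at rescaled time tau in [0, 1]."""
    return np.array([[0.5 + 0.3 * tau, 0.1],
                     [0.0,             0.4 - 0.2 * tau]])

y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A_true(t / T) @ y[t - 1] + 0.5 * rng.normal(size=2)

def tv_var1_at(tau):
    """Weighted least squares of y_t on y_{t-1} with Gaussian kernel weights around tau."""
    u = (np.arange(1, T) / T - tau) / h
    sw = np.sqrt(np.exp(-0.5 * u**2))[:, None]     # square roots of kernel weights
    coef, *_ = np.linalg.lstsq(y[:-1] * sw, y[1:] * sw, rcond=None)
    return coef.T                                  # estimate of A(tau)

print(tv_var1_at(0.25))
print(A_true(0.25))                                # compare with the true matrix
```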

arXiv link: http://arxiv.org/abs/2111.00450v1

Econometrics arXiv paper, submitted: 2021-10-31

Productivity Convergence in Manufacturing: A Hierarchical Panel Data Approach

Authors: Guohua Feng, Jiti Gao, Bin Peng

Despite its paramount importance in the empirical growth literature,
productivity convergence analysis has three problems that have yet to be
resolved: (1) little attempt has been made to explore the hierarchical
structure of industry-level datasets; (2) industry-level technology
heterogeneity has largely been ignored; and (3) cross-sectional dependence has
rarely been allowed for. This paper aims to address these three problems within
a hierarchical panel data framework. We propose an estimation procedure and
then derive the corresponding asymptotic theory. Finally, we apply the
framework to a dataset of 23 manufacturing industries from a wide range of
countries over the period 1963-2018. Our results show that both the
manufacturing industry as a whole and individual manufacturing industries at
the ISIC two-digit level exhibit strong conditional convergence in labour
productivity, but not unconditional convergence. In addition, our results show
that both global and industry-specific shocks are important in explaining the
convergence behaviours of the manufacturing industries.

arXiv link: http://arxiv.org/abs/2111.00449v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-10-29

CP Factor Model for Dynamic Tensors

Authors: Yuefeng Han, Dan Yang, Cun-Hui Zhang, Rong Chen

Observations in various applications are frequently represented as a time
series of multidimensional arrays, called tensor time series, preserving the
inherent multidimensional structure. In this paper, we present a factor model
approach, in a form similar to tensor CP decomposition, to the analysis of
high-dimensional dynamic tensor time series. As the loading vectors are
uniquely defined but not necessarily orthogonal, it is significantly different
from the existing tensor factor models based on Tucker-type tensor
decomposition. The model structure allows for a set of uncorrelated
one-dimensional latent dynamic factor processes, making it much more convenient
to study the underlying dynamics of the time series. A new high order
projection estimator is proposed for such a factor model, utilizing the special
structure and the idea of the higher order orthogonal iteration procedures
commonly used in Tucker-type tensor factor model and general tensor CP
decomposition procedures. Theoretical investigation provides statistical error
bounds for the proposed methods, which shows the significant advantage of
utilizing the special model structure. A simulation study is conducted to further
demonstrate the finite sample properties of the estimators. A real data
application is used to illustrate the model and its interpretations.

arXiv link: http://arxiv.org/abs/2110.15517v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2021-10-28

Coresets for Time Series Clustering

Authors: Lingxiao Huang, K. Sudhir, Nisheeth K. Vishnoi

We study the problem of constructing coresets for clustering problems with
time series data. This problem has gained importance across many fields
including biology, medicine, and economics due to the proliferation of sensors
facilitating real-time measurement and rapid drop in storage costs. In
particular, we consider the setting where the time series data on $N$ entities
is generated from a Gaussian mixture model with autocorrelations over $k$
clusters in $R^d$. Our main contribution is an algorithm to construct
coresets for the maximum likelihood objective for this mixture model. Our
algorithm is efficient, and under a mild boundedness assumption on the
covariance matrices of the underlying Gaussians, the size of the coreset is
independent of the number of entities $N$ and the number of observations for
each entity, and depends only polynomially on $k$, $d$ and $1/\varepsilon$,
where $\varepsilon$ is the error parameter. We empirically assess the
performance of our coreset with synthetic data.

arXiv link: http://arxiv.org/abs/2110.15263v1

Econometrics arXiv updated paper (originally submitted: 2021-10-27)

Testing and Estimating Structural Breaks in Time Series and Panel Data in Stata

Authors: Jan Ditzen, Yiannis Karavias, Joakim Westerlund

Identifying structural change is a crucial step in analysis of time series
and panel data. The longer the time span, the higher the likelihood that the
model parameters have changed as a result of major disruptive events, such as
the 2007--2008 financial crisis and the 2020 COVID--19 outbreak. Detecting the
existence of breaks, and dating them is therefore necessary, not only for
estimation purposes but also for understanding drivers of change and their
effect on relationships. This article introduces a new community contributed
command called xtbreak, which provides researchers with a complete toolbox for
analysing multiple structural breaks in time series and panel data. xtbreak can
detect the existence of breaks, determine their number and location, and
provide break date confidence intervals. The new command is used to explore
changes in the relationship between COVID--19 cases and deaths in the US, using
both aggregate and state level data, and in the relationship between approval
ratings and consumer confidence, using a panel of eight countries.

arXiv link: http://arxiv.org/abs/2110.14550v3

Econometrics arXiv paper, submitted: 2021-10-27

A Scalable Inference Method For Large Dynamic Economic Systems

Authors: Pratha Khandelwal, Philip Nadler, Rossella Arcucci, William Knottenbelt, Yi-Ke Guo

The nature of available economic data has changed fundamentally in the last
decade due to the economy's digitisation. With the prevalence of often black
box data-driven machine learning methods, there is a necessity to develop
interpretable machine learning methods that can conduct econometric inference,
helping policymakers leverage the new nature of economic data. We therefore
present a novel variational Bayesian inference approach for a
time-varying parameter autoregressive model that is scalable to big data.
Our model is applied to a large blockchain dataset containing prices and
transactions of individual actors, analyzing transactional flows and price
movements at a very granular level. The model is extendable to any dataset
which can be modelled as a dynamical system. We further improve the simple
state-space modelling by introducing non-linearities in the forward model with
the help of machine learning architectures.

arXiv link: http://arxiv.org/abs/2110.14346v1

Econometrics arXiv updated paper (originally submitted: 2021-10-27)

Forecasting with a Panel Tobit Model

Authors: Laura Liu, Hyungsik Roger Moon, Frank Schorfheide

We use a dynamic panel Tobit model with heteroskedasticity to generate
forecasts for a large cross-section of short time series of censored
observations. Our fully Bayesian approach allows us to flexibly estimate the
cross-sectional distribution of heterogeneous coefficients and then implicitly
use this distribution as prior to construct Bayes forecasts for the individual
time series. In addition to density forecasts, we construct set forecasts that
explicitly target the average coverage probability for the cross-section. We
present a novel application in which we forecast bank-level loan charge-off
rates for small banks.

arXiv link: http://arxiv.org/abs/2110.14117v2

Econometrics arXiv updated paper (originally submitted: 2021-10-26)

Coupling the Gini and Angles to Evaluate Economic Dispersion

Authors: Mario Schlemmer

Classical measures of inequality use the mean as the benchmark of economic
dispersion. They are not sensitive to inequality at the left tail of the
distribution, where it would matter most. This paper presents a new inequality
measurement tool that gives more weight to inequality at the lower end of the
distribution. It is based on the comparison of all value pairs and synthesizes
the dispersion of the whole distribution. The differences that sum to the Gini
coefficient are scaled by angular differences between observations. The
resulting index possesses a set of desirable properties, including
normalization, scale invariance, population invariance, transfer sensitivity,
and weak decomposability.
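
For reference, the baseline Gini coefficient that the proposed index rescales can be computed directly from all pairwise absolute differences; the sketch below implements only this standard formula, not the paper's angular scaling.

```python
# Gini coefficient as the normalised mean of all pairwise absolute differences.
import numpy as np

def gini(x):
    x = np.asarray(x, dtype=float)
    diffs = np.abs(x[:, None] - x[None, :])        # all pairwise |x_i - x_j|
    return diffs.sum() / (2.0 * len(x) ** 2 * x.mean())

print(gini([1, 1, 1, 1]))        # 0.0: perfect equality
print(gini([0, 0, 0, 10]))       # 0.75: highly concentrated
```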

arXiv link: http://arxiv.org/abs/2110.13847v2

Econometrics arXiv updated paper (originally submitted: 2021-10-26)

Regime-Switching Density Forecasts Using Economists' Scenarios

Authors: Graziano Moramarco

We propose an approach for generating macroeconomic density forecasts that
incorporate information on multiple scenarios defined by experts. We adopt a
regime-switching framework in which sets of scenarios ("views") are used as
Bayesian priors on economic regimes. Predictive densities coming from different
views are then combined by optimizing objective functions of density
forecasting. We illustrate the approach with an empirical application to
quarterly real-time forecasts of U.S. GDP growth, in which we exploit the Fed's
macroeconomic scenarios used for bank stress tests. We show that the approach
achieves good accuracy in terms of average predictive scores and good
calibration of forecast distributions. Moreover, it can be used to evaluate the
contribution of economists' scenarios to density forecast performance.

arXiv link: http://arxiv.org/abs/2110.13761v2

Econometrics arXiv updated paper (originally submitted: 2021-10-26)

Inference in Regression Discontinuity Designs with High-Dimensional Covariates

Authors: Alexander Kreiß, Christoph Rothe

We study regression discontinuity designs in which many predetermined
covariates, possibly much more than the number of observations, can be used to
increase the precision of treatment effect estimates. We consider a two-step
estimator which first selects a small number of "important" covariates through
a localized Lasso-type procedure, and then, in a second step, estimates the
treatment effect by including the selected covariates linearly into the usual
local linear estimator. We provide an in-depth analysis of the algorithm's
theoretical properties, showing that, under an approximate sparsity condition,
the resulting estimator is asymptotically normal, with asymptotic bias and
variance that are conceptually similar to those obtained in low-dimensional
settings. Bandwidth selection and inference can be carried out using standard
methods. We also provide simulations and an empirical application.
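
A heavily hedged sketch of the two-step idea described above: a localised Lasso selects covariates near the cutoff, and the selected covariates then enter a kernel-weighted local linear regression. The bandwidth, kernel, penalty level, and data-generating process are illustrative assumptions, not the authors' procedure or their recommended tuning.

```python
# Two-step sketch: localised Lasso covariate selection, then covariate-adjusted local linear RD fit.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, cutoff, h = 1000, 200, 0.0, 0.5
x = rng.uniform(-1, 1, n)                           # running variable
Z = rng.normal(size=(n, p))                         # many predetermined covariates
d = (x >= cutoff).astype(float)                     # treatment indicator
y = 0.5 * d + x + Z[:, 0] - 0.5 * Z[:, 1] + rng.normal(size=n)

w = np.clip(1 - np.abs(x - cutoff) / h, 0, None)    # triangular kernel weights

# Step 1: localised (kernel-weighted) Lasso of the outcome on the covariates
sel = np.flatnonzero(Lasso(alpha=0.1).fit(Z, y, sample_weight=w).coef_)

# Step 2: local linear regression with the selected covariates added linearly
X = np.column_stack([np.ones(n), d, x - cutoff, d * (x - cutoff), Z[:, sel]])
sw = np.sqrt(w)
beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
print(sel, round(beta[1], 3))                       # selected covariates, RD treatment effect
```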

arXiv link: http://arxiv.org/abs/2110.13725v3

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2021-10-26

Bayesian Estimation and Comparison of Conditional Moment Models

Authors: Siddhartha Chib, Minchul Shin, Anna Simoni

We consider the Bayesian analysis of models in which the unknown distribution
of the outcomes is specified up to a set of conditional moment restrictions.
The nonparametric exponentially tilted empirical likelihood function is
constructed to satisfy a sequence of unconditional moments based on an
increasing (in sample size) vector of approximating functions (such as tensor
splines based on the splines of each conditioning variable). For any given
sample size, results are robust to the number of expanded moments. We derive
Bernstein-von Mises theorems for the behavior of the posterior distribution
under both correct and incorrect specification of the conditional moments,
subject to growth rate conditions (slower under misspecification) on the number
of approximating functions. A large-sample theory for comparing different
conditional moment models is also developed. The central result is that the
marginal likelihood criterion selects the model that is less misspecified. We
also introduce sparsity-based model search for high-dimensional conditioning
variables, and provide efficient MCMC computations for high-dimensional
parameters. Along with clarifying examples, the framework is illustrated with
real-data applications to risk-factor determination in finance, and causal
inference under conditional ignorability.

arXiv link: http://arxiv.org/abs/2110.13531v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2021-10-25

Negotiating Networks in Oligopoly Markets for Price-Sensitive Products

Authors: Naman Shukla, Kartik Yellepeddi

We present a novel framework to learn functions that estimate decisions of
sellers and buyers simultaneously in an oligopoly market for a price-sensitive
product. In this setting, the aim of the seller network is to come up with a
price for a given context such that the expected revenue is maximized by
considering the buyer's satisfaction as well. On the other hand, the aim of the
buyer network is to assign a probability of purchase to the offered price,
mimicking real-world buyers' responses while also showing price sensitivity
through its actions, that is, rejecting unnecessarily high-priced
products. Similar to generative adversarial networks, this framework
corresponds to a minimax two-player game. In our experiments with simulated and
real-world transaction data, we compared our framework with the baseline model
and demonstrated its potential through proposed evaluation metrics.

arXiv link: http://arxiv.org/abs/2110.13303v1

Econometrics arXiv updated paper (originally submitted: 2021-10-25)

Covariate Balancing Methods for Randomized Controlled Trials Are Not Adversarially Robust

Authors: Hossein Babaei, Sina Alemohammad, Richard Baraniuk

The first step towards investigating the effectiveness of a treatment via a
randomized trial is to split the population into control and treatment groups
and then compare the average response of the treatment group, which receives the
treatment, to that of the control group, which receives the placebo.
In order to ensure that the difference between the two groups is caused only
by the treatment, it is crucial that the control and the treatment groups have
similar statistics. Indeed, the validity and reliability of a trial are
determined by the similarity of two groups' statistics. Covariate balancing
methods increase the similarity between the distributions of the two groups'
covariates. However, often in practice, there are not enough samples to
accurately estimate the groups' covariate distributions. In this paper, we
empirically show that covariate balancing with the Standardized Means
Difference (SMD) covariate balancing measure, as well as Pocock's sequential
treatment assignment method, are susceptible to worst-case treatment
assignments. Worst-case treatment assignments are those admitted by the
covariate balance measure but result in the highest possible ATE estimation
errors. We develop an adversarial attack to find adversarial treatment
assignments for any given trial. Then, we provide an index to measure how close
the given trial is to the worst-case. To this end, we provide an
optimization-based algorithm, namely Adversarial Treatment ASsignment in
TREatment Effect Trials (ATASTREET), to find the adversarial treatment
assignments.
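
For reference, the Standardized Mean Difference balance measure discussed above can be computed as follows; the 0.1 threshold in the example is a common rule of thumb, not taken from the paper.

```python
# Standardized Mean Difference (SMD) between treatment and control covariate values.
import numpy as np

def smd(x_treat, x_ctrl):
    pooled_sd = np.sqrt((np.var(x_treat, ddof=1) + np.var(x_ctrl, ddof=1)) / 2)
    return (np.mean(x_treat) - np.mean(x_ctrl)) / pooled_sd

rng = np.random.default_rng(0)
x = rng.normal(size=100)                       # a single covariate
assign = rng.permutation(100) < 50             # a candidate treatment assignment
print(abs(smd(x[assign], x[~assign])) < 0.1)   # "balanced" by the SMD criterion?
```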

arXiv link: http://arxiv.org/abs/2110.13262v3

Econometrics arXiv updated paper (originally submitted: 2021-10-25)

Functional instrumental variable regression with an application to estimating the impact of immigration on native wages

Authors: Dakyung Seong, Won-Ki Seo

Functional linear regression has gained popularity as a statistical tool for
studying the relationship between a function-valued response and exogenous
explanatory variables. In practice, however, the explanatory variables of
interest can hardly be expected to be perfectly exogenous, due to, for example,
the presence of omitted variables and measurement error. Despite its empirical
relevance, it was not until recently that this issue of endogeneity was studied
in the literature on functional regression, and developments in this
direction do not yet seem to meet practitioners' needs; for example,
the issue has so far been discussed with particular attention to consistent
estimation, and thus the distributional properties of the proposed estimators
remain to be further explored. To fill this gap, this paper proposes new
consistent FPCA-based instrumental variable estimators and develops their
asymptotic properties in detail. Simulation experiments under a wide range of
settings show that the proposed estimators perform considerably well. We apply
our methodology to estimate the impact of immigration on native wages.

arXiv link: http://arxiv.org/abs/2110.12722v3

Econometrics arXiv updated paper (originally submitted: 2021-10-23)

On Parameter Estimation in Unobserved Components Models subject to Linear Inequality Constraints

Authors: Abhishek K. Umrawal, Joshua C. C. Chan

We propose a new quadratic programming-based method of approximating
a nonstandard density using a multivariate Gaussian density. Such nonstandard
densities usually arise while developing posterior samplers for unobserved
components models involving inequality constraints on the parameters. For
instance, Chan et al. (2016) provided a new model of trend inflation with
linear inequality constraints on the stochastic trend. We implemented the
proposed quadratic programming-based method for this model and compared it to
the existing approximation. We observed that the proposed method works as well
as the existing approximation in terms of the final trend estimates while
achieving gains in terms of sample efficiency.

arXiv link: http://arxiv.org/abs/2110.12149v2

Econometrics arXiv paper, submitted: 2021-10-22

Slow Movers in Panel Data

Authors: Yuya Sasaki, Takuya Ura

Panel data often contain stayers (units with no within-variations) and slow
movers (units with little within-variations). In the presence of many slow
movers, conventional econometric methods can fail to work. We propose a novel
method of robust inference for the average partial effects in correlated random
coefficient models robustly across various distributions of within-variations,
including the cases with many stayers and/or many slow movers in a unified
manner. In addition to this robustness property, our proposed method entails
smaller biases and hence improves accuracy in inference compared to existing
alternatives. Simulation studies demonstrate our theoretical claims about these
properties: the conventional 95% confidence interval covers the true parameter
value with 37-93% frequencies, whereas our proposed one achieves 93-96%
coverage frequencies.

arXiv link: http://arxiv.org/abs/2110.12041v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2021-10-21

DMS, AE, DAA: methods and applications of adaptive time series model selection, ensemble, and financial evaluation

Authors: Parley Ruogu Yang, Ryan Lucas

We introduce three adaptive time series learning methods, called Dynamic
Model Selection (DMS), Adaptive Ensemble (AE), and Dynamic Asset Allocation
(DAA). The methods respectively handle model selection, ensembling, and
contextual evaluation in financial time series. Empirically, we use the methods
to forecast the returns of four key indices in the US market, incorporating
information from the VIX and Yield curves. We present financial applications of
the learning results, including fully-automated portfolios and dynamic hedging
strategies. The strategies strongly outperform long-only benchmarks over our
testing period, spanning from Q4 2015 to the end of 2021. The key outputs of
the learning methods are interpreted during the 2020 market crash.

arXiv link: http://arxiv.org/abs/2110.11156v3

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2021-10-20

Attention Overload

Authors: Matias D. Cattaneo, Paul Cheung, Xinwei Ma, Yusufcan Masatlioglu

We introduce an Attention Overload Model that captures the idea that
alternatives compete for the decision maker's attention, and hence the
attention that each alternative receives decreases as the choice problem
becomes larger. Using this nonparametric restriction on the random attention
formation, we show that a fruitful revealed preference theory can be developed
and provide testable implications on the observed choice behavior that can be
used to (point or partially) identify the decision maker's preference and
attention frequency. We then enhance our attention overload model to
accommodate heterogeneous preferences. Due to the nonparametric nature of our
identifying assumption, we must discipline the amount of heterogeneity in the
choice model: we propose the idea of List-based Attention Overload, where
alternatives are presented to the decision makers as a list that correlates
with both heterogeneous preferences and random attention. We show that
preference and attention frequencies are (point or partially) identifiable
under nonparametric assumptions on the list and attention formation mechanisms,
even when the true underlying list is unknown to the researcher. Building on
our identification results, for both preference and attention frequencies, we
develop econometric methods for estimation and inference that are valid in
settings with a large number of alternatives and choice problems, a distinctive
feature of the economic environment we consider. We provide a software package
in R implementing our empirical methods, and illustrate them in a simulation
study.

arXiv link: http://arxiv.org/abs/2110.10650v4

Econometrics arXiv updated paper (originally submitted: 2021-10-20)

One Instrument to Rule Them All: The Bias and Coverage of Just-ID IV

Authors: Joshua Angrist, Michal Kolesár

We revisit the finite-sample behavior of single-variable just-identified
instrumental variables (just-ID IV) estimators, arguing that in most
microeconometric applications, the usual inference strategies are likely
reliable. Three widely-cited applications are used to explain why this is so.
We then consider pretesting strategies of the form $t_{1}>c$, where $t_{1}$ is
the first-stage $t$-statistic, and the first-stage sign is given. Although
pervasive in empirical practice, pretesting on the first-stage $F$-statistic
exacerbates bias and distorts inference. We show, however, that median bias is
both minimized and roughly halved by setting $c=0$, that is by screening on the
sign of the estimated first stage. This bias reduction is a free
lunch: conventional confidence interval coverage is unchanged by screening on
the estimated first-stage sign. To the extent that IV analysts sign-screen
already, these results strengthen the case for a sanguine view of the
finite-sample behavior of just-ID IV.
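
A toy Monte Carlo illustrating the sign-screening rule discussed above: an IV draw is kept only if the estimated first-stage coefficient has the expected (here, positive) sign. The data-generating process and all parameter values are illustrative assumptions; the exercise is only meant to show how the screen is applied, not to reproduce the paper's results.

```python
# Just-identified IV (Wald) estimator with screening on the estimated first-stage sign.
import numpy as np

rng = np.random.default_rng(0)
n, reps, beta, pi = 200, 2000, 1.0, 0.15            # modest first-stage strength
est_all, est_screened = [], []
for _ in range(reps):
    z = rng.normal(size=n)                          # instrument
    u = rng.normal(size=n)                          # confounder
    x = pi * z + 0.5 * u + 0.8 * rng.normal(size=n) # endogenous regressor
    y = beta * x + u
    pi_hat = np.cov(z, x)[0, 1] / np.var(z, ddof=1)          # first-stage coefficient
    b_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]           # just-ID IV estimate
    est_all.append(b_iv)
    if pi_hat > 0:                                  # keep only correctly signed first stages
        est_screened.append(b_iv)

print(np.median(est_all) - beta)         # median deviation, all draws
print(np.median(est_screened) - beta)    # median deviation, sign-screened draws
```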

arXiv link: http://arxiv.org/abs/2110.10556v7

Econometrics arXiv paper, submitted: 2021-10-20

Bi-integrative analysis of two-dimensional heterogeneous panel data model

Authors: Wei Wang, Xiaodong Yan, Yanyan Ren, Zhijie Xiao

Heterogeneous panel data models that allow the coefficients to vary across
individuals and/or change over time have received increasingly more attention
in statistics and econometrics. This paper proposes a two-dimensional
heterogeneous panel regression model that incorporates a group structure for
individual heterogeneous effects together with cohort formation for their
time variation, which allows common coefficients between nonadjacent time
points. A bi-integrative procedure that detects the information regarding group
and cohort patterns simultaneously via a doubly penalized least square with
concave fused penalties is introduced. We use an alternating direction method
of multipliers (ADMM) algorithm that automatically bi-integrates the
two-dimensional heterogeneous panel data model into a common one.
Consistency and asymptotic normality for the proposed estimators are developed.
We show that the resulting estimators exhibit oracle properties, i.e., the
proposed estimator is asymptotically equivalent to the oracle estimator
obtained using the known group and cohort structures. Furthermore, the
simulation studies provide supportive evidence that the proposed method has
good finite sample performance. A real data application is
provided to highlight the proposed method.

arXiv link: http://arxiv.org/abs/2110.10480v1

Econometrics arXiv paper, submitted: 2021-10-19

Difference-in-Differences with Geocoded Microdata

Authors: Kyle Butts

This paper formalizes a common approach for estimating effects of treatment
at a specific location using geocoded microdata. This estimator compares units
immediately next to treatment (an inner-ring) to units just slightly further
away (an outer-ring). I introduce intuitive assumptions needed to identify the
average treatment effect among the affected units and illustrate pitfalls that
occur when these assumptions fail. Since one of these assumptions requires
knowledge of exactly how far treatment effects are experienced, I propose a new
method that relaxes this assumption and allows for nonparametric estimation
using partitioning-based least squares developed in Cattaneo et al. (2019).
Since treatment effects typically decay/change over distance, this estimator
improves analysis by estimating a treatment effect curve as a function of
distance from treatment. This is in contrast to the traditional method which, at
best, identifies the average effect of treatment. To illustrate the advantages
of this method, I show that Linden and Rockoff (2008) underestimate the
effects of increased crime risk on home values closest to the treatment and
overestimate how far the effects extend by selecting a treatment ring that is
too wide.
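
A minimal numerical sketch of the inner-ring versus outer-ring comparison and of a distance-binned effect curve; the ring widths, the distance variable, and the decaying-effect data-generating process are illustrative assumptions, not taken from the paper.

```python
# Ring comparison estimator and a crude treatment-effect curve over distance bins.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
dist = rng.uniform(0, 2.0, n)                           # distance (km) to treatment site
effect = np.where(dist < 0.3, -1.0 * (1 - dist / 0.3), 0.0)   # effect decays with distance
y = 10 + effect + rng.normal(size=n)                    # outcome (e.g. change in home value)

inner = dist <= 0.3                                     # "treated" inner ring
outer = (dist > 0.3) & (dist <= 0.6)                    # comparison outer ring
ate_ring = y[inner].mean() - y[outer].mean()            # ring difference estimator

# differences by distance bin versus the outer ring: a rough effect curve
bins = np.arange(0.0, 0.3, 0.1)
curve = [y[(dist >= b) & (dist < b + 0.1)].mean() - y[outer].mean() for b in bins]
print(round(ate_ring, 2), [round(c, 2) for c in curve])
```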

arXiv link: http://arxiv.org/abs/2110.10192v1

Econometrics arXiv paper, submitted: 2021-10-19

Revisiting identification concepts in Bayesian analysis

Authors: Jean-Pierre Florens, Anna Simoni

This paper studies the role played by identification in the Bayesian analysis
of statistical and econometric models. First, for unidentified models we
demonstrate that there are situations where the introduction of a
non-degenerate prior distribution can make a parameter that is nonidentified in
frequentist theory identified in Bayesian theory. In other situations, it is
preferable to work with the unidentified model and construct a Markov Chain
Monte Carlo (MCMC) algorithm for it instead of introducing identifying
assumptions. Second, for partially identified models we demonstrate how to
construct the prior and posterior distributions for the identified set
parameter and how to conduct Bayesian analysis. Finally, for models that
contain some parameters that are identified and others that are not, we show
that marginalizing out the identified parameter from the likelihood with
respect to its conditional prior, given the nonidentified parameter, allows the
data to be informative about the nonidentified and partially identified
parameter. The paper provides examples and simulations that illustrate how to
implement our techniques.

arXiv link: http://arxiv.org/abs/2110.09954v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-10-16

Exact Bias Correction for Linear Adjustment of Randomized Controlled Trials

Authors: Haoge Chang, Joel Middleton, P. M. Aronow

In an influential critique of empirical practice, Freedman (2008) showed that
the linear regression estimator was biased for the analysis of randomized
controlled trials under the randomization model. Under Freedman's assumptions,
we derive exact closed-form bias corrections for the linear regression
estimator with and without treatment-by-covariate interactions. We show that
the limiting distribution of the bias corrected estimator is identical to the
uncorrected estimator, implying that the asymptotic gains from adjustment can
be attained without introducing any risk of bias. Taken together with results
from Lin (2013), our results show that Freedman's theoretical arguments against
the use of regression adjustment can be completely resolved with minor
modifications to practice.

arXiv link: http://arxiv.org/abs/2110.08425v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-10-15

Covariate Adjustment in Regression Discontinuity Designs

Authors: Matias D. Cattaneo, Luke Keele, Rocio Titiunik

The Regression Discontinuity (RD) design is a widely used non-experimental
method for causal inference and program evaluation. While its canonical
formulation only requires a score and an outcome variable, it is common in
empirical work to encounter RD analyses where additional variables are used for
adjustment. This practice has led to misconceptions about the role of covariate
adjustment in RD analysis, from both methodological and empirical perspectives.
In this chapter, we review the different roles of covariate adjustment in RD
designs, and offer methodological guidance for its correct use.

arXiv link: http://arxiv.org/abs/2110.08410v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2021-10-15

Detecting long-range dependence for time-varying linear models

Authors: Lujia Bai, Weichi Wu

We consider the problem of testing for long-range dependence in time-varying
coefficient regression models, where the covariates and errors are locally
stationary, allowing complex temporal dynamics and heteroscedasticity. We
develop KPSS, R/S, V/S, and K/S-type statistics based on the nonparametric
residuals. Under the null hypothesis, local alternatives, and
fixed alternatives, we derive the limiting distributions of the test
statistics. As the four types of test statistics could degenerate when the
time-varying mean, variance, long-run variance of errors, covariates, and the
intercept lie in certain hyperplanes, we show the bootstrap-assisted tests are
consistent under both degenerate and non-degenerate scenarios. In particular,
in the presence of covariates the exact local asymptotic power of the
bootstrap-assisted tests can enjoy the same order as that of the classical KPSS
test of long memory for strictly stationary series. The asymptotic theory is
built on a new Gaussian approximation technique for locally stationary
long-memory processes with short-memory covariates, which is of independent
interest. The effectiveness of our tests is demonstrated by extensive
simulation studies and real data analysis.

arXiv link: http://arxiv.org/abs/2110.08089v5

Econometrics arXiv paper, submitted: 2021-10-14

Choice probabilities and correlations in closed-form route choice models: specifications and drawbacks

Authors: Fiore Tinessa, Vittorio Marzano, Andrea Papola

This paper investigates the performance, in terms of choice probabilities and
correlations, of existing and new specifications of closed-form route choice
models with flexible correlation patterns, namely the Link Nested Logit (LNL),
the Paired Combinatorial Logit (PCL) and the more recent Combination of Nested
Logit (CoNL) models. Following a consolidated track in the literature, choice
probabilities and correlations of the Multinomial Probit (MNP) model by
Daganzo and Sheffi (1977) are taken as the target. Laboratory experiments on
small/medium-size networks are illustrated, also leveraging a procedure for
the practical calculation of correlations of any GEV model, proposed by Marzano
(2014). Results show that models with inherent limitations in the coverage of
the domain of feasible correlations yield unsatisfactory performance, whilst
the specifications of the CoNL proposed in the paper appear the best in fitting
both MNP correlations and probabilities. The performance of the models is
appreciably improved by introducing lower bounds on the nesting parameters.
Overall, the paper provides guidance for the practical application of tested
models.

arXiv link: http://arxiv.org/abs/2110.07224v1

Econometrics arXiv paper, submitted: 2021-10-14

Machine Learning, Deep Learning, and Hedonic Methods for Real Estate Price Prediction

Authors: Mahdieh Yazdani

In recent years several complaints about racial discrimination in appraising
home values have been accumulating. For several decades, to estimate the sale
price of the residential properties, appraisers have been walking through the
properties, observing the property, collecting data, and making use of the
hedonic pricing models. However, this method bears some costs and by nature is
subjective and biased. To minimize human involvement and the biases in the real
estate appraisals and boost the accuracy of the real estate market price
prediction models, in this research we design data-efficient learning machines
capable of learning and extracting the relation or patterns between the inputs
(features for the house) and output (value of the houses). We compare the
performance of some machine learning and deep learning algorithms, specifically
artificial neural networks, random forest, and k nearest neighbor approaches to
that of hedonic method on house price prediction in the city of Boulder,
Colorado. Even though this study has been done over the houses in the city of
Boulder it can be generalized to the housing market in any cities. The results
indicate non-linear association between the dwelling features and dwelling
prices. In light of these findings, this study demonstrates that random forest
and artificial neural networks algorithms can be better alternatives over the
hedonic regression analysis for prediction of the house prices in the city of
Boulder, Colorado.

arXiv link: http://arxiv.org/abs/2110.07151v1

Econometrics arXiv updated paper (originally submitted: 2021-10-13)

Efficient Estimation in NPIV Models: A Comparison of Various Neural Networks-Based Estimators

Authors: Jiafeng Chen, Xiaohong Chen, Elie Tamer

Artificial Neural Networks (ANNs) can be viewed as nonlinear sieves that can
approximate complex functions of high dimensional variables more effectively
than linear sieves. We investigate the performance of various ANNs in
nonparametric instrumental variables (NPIV) models of moderately high
dimensional covariates that are relevant to empirical economics. We present two
efficient procedures for estimation and inference on a weighted average
derivative (WAD): an orthogonalized plug-in with optimally-weighted sieve
minimum distance (OP-OSMD) procedure and a sieve efficient score (ES)
procedure. Both estimators for WAD use ANN sieves to approximate the unknown
NPIV function and are root-n asymptotically normal and first-order equivalent.
We provide a detailed practitioner's recipe for implementing both efficient
procedures. We compare their finite-sample performances in various simulation
designs that involve smooth NPIV function of up to 13 continuous covariates,
different nonlinearities and covariate correlations. Some Monte Carlo findings
include: 1) tuning and optimization are more delicate in ANN estimation; 2)
given proper tuning, both ANN estimators with various architectures can perform
well; 3) ANN OP-OSMD estimators are easier to tune than ANN ES estimators; 4)
stable inferences are more difficult to achieve with ANN (than spline)
estimators; 5) there are gaps between current implementations and approximation
theories. Finally, we apply ANN NPIV to estimate average partial derivatives in
two empirical demand examples with multivariate covariates.

arXiv link: http://arxiv.org/abs/2110.06763v4

Econometrics arXiv updated paper (originally submitted: 2021-10-12)

Partial Identification of Marginal Treatment Effects with discrete instruments and misreported treatment

Authors: Santiago Acerenza

This paper provides partial identification results for the marginal treatment
effect ($MTE$) when the binary treatment variable is potentially misreported
and the instrumental variable is discrete. Identification results are derived
under different sets of nonparametric assumptions. The identification results
are illustrated in identifying the marginal treatment effects of food stamps on
health.

arXiv link: http://arxiv.org/abs/2110.06285v3

Econometrics arXiv paper, submitted: 2021-10-11

Fixed $T$ Estimation of Linear Panel Data Models with Interactive Fixed Effects

Authors: Ayden Higgins

This paper studies the estimation of linear panel data models with
interactive fixed effects, where one dimension of the panel, typically time,
may be fixed. To this end, a novel transformation is introduced that reduces
the model to a lower dimension, and, in doing so, relieves the model of
incidental parameters in the cross-section. The central result of this paper
demonstrates that transforming the model and then applying the principal
component (PC) estimator of Bai (2009) delivers $\sqrt{n}$-consistent
estimates of regression slope coefficients with $T$ fixed. Moreover,
these estimates are shown to be asymptotically unbiased in the presence of
cross-sectional dependence, serial dependence, and with the inclusion of
dynamic regressors, in stark contrast to the usual case. The large $n$, large
$T$ properties of this approach are also studied, where many of these results
carry over to the case in which $n$ is growing sufficiently fast relative to
$T$. Transforming the model also proves to be useful beyond estimation, a point
illustrated by showing that with $T$ fixed, the eigenvalue ratio test of
Ahn and Horenstein (2013) provides a consistent test for the number of factors when
applied to the transformed model.

arXiv link: http://arxiv.org/abs/2110.05579v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2021-10-11

$β$-Intact-VAE: Identifying and Estimating Causal Effects under Limited Overlap

Authors: Pengzhou Wu, Kenji Fukumizu

As an important problem in causal inference, we discuss the identification
and estimation of treatment effects (TEs) under limited overlap; that is, when
subjects with certain features belong to a single treatment group. We use a
latent variable to model a prognostic score which is widely used in
biostatistics and sufficient for TEs; i.e., we build a generative prognostic
model. We prove that the latent variable recovers a prognostic score, and the
model identifies individualized treatment effects. The model is then learned as
\beta-Intact-VAE--a new type of variational autoencoder (VAE). We derive the TE
error bounds that enable representations balanced for treatment groups
conditioned on individualized features. The proposed method is compared with
recent methods using (semi-)synthetic datasets.

arXiv link: http://arxiv.org/abs/2110.05225v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2021-10-11

Two-stage least squares with a randomly right censored outcome

Authors: Jad Beyhum

This note develops a simple two-stage least squares (2SLS) procedure to
estimate the causal effect of some endogenous regressors on a randomly right
censored outcome in the linear model. The proposal replaces the usual ordinary
least squares regressions of the standard 2SLS by weighted least squares
regressions. The weights correspond to the inverse probability of censoring. We
show consistency and asymptotic normality of the estimator. The estimator
exhibits good finite sample performance in simulations.
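
A hedged sketch of an inverse-probability-of-censoring-weighted 2SLS in the spirit of the note: ordinary least squares in both stages is replaced by weighted least squares, with weights given by the inverse of a Kaplan-Meier estimate of the censoring survival function. The data-generating process, the clipping of small survival probabilities, and all names are illustrative assumptions, not the note's exact recipe.

```python
# IPCW-weighted two-stage least squares with a randomly right censored outcome.
import numpy as np

def km_censoring_survival(t_obs, delta):
    """Kaplan-Meier estimate of G(t-) = P(C >= t), treating censoring as the event."""
    order = np.argsort(t_obs)
    surv_sorted = np.ones(len(t_obs))
    s = 1.0
    for k, idx in enumerate(order):
        surv_sorted[k] = s                       # survival just before this observation's time
        if delta[idx] == 0:                      # a censoring "event"
            s *= 1.0 - 1.0 / (len(t_obs) - k)    # 1 - 1 / (number still at risk)
    out = np.empty(len(t_obs))
    out[order] = surv_sorted
    return out

def wls(Xmat, yvec, w):
    sw = np.sqrt(w)
    return np.linalg.lstsq(Xmat * sw[:, None], yvec * sw, rcond=None)[0]

rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=n)                           # instrument
u = rng.normal(size=n)                           # unobserved confounder
x = 0.8 * z + u + rng.normal(size=n)             # endogenous regressor
y = 1.0 + 2.0 * x + u                            # latent outcome, true slope 2
c = rng.exponential(12.0, size=n)                # independent censoring times
t_obs = np.minimum(y, c)
delta = (y <= c).astype(int)                     # 1 = uncensored

G = km_censoring_survival(t_obs, delta)
w = delta / np.clip(G, 1e-3, None)               # inverse-probability-of-censoring weights

Z = np.column_stack([np.ones(n), z])
x_hat = Z @ wls(Z, x, w)                         # weighted first stage
beta = wls(np.column_stack([np.ones(n), x_hat]), t_obs, w)   # weighted second stage
print(beta)                                      # compare the slope with the true value 2
```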

arXiv link: http://arxiv.org/abs/2110.05107v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-10-10

High-dimensional Inference for Dynamic Treatment Effects

Authors: Jelena Bradic, Weijie Ji, Yuqian Zhang

Estimating dynamic treatment effects is a crucial endeavor in causal
inference, particularly when confronted with high-dimensional confounders.
Doubly robust (DR) approaches have emerged as promising tools for estimating
treatment effects due to their flexibility. However, we showcase that the
traditional DR approaches that only focus on the DR representation of the
expected outcomes may fall short of delivering optimal results. In this paper,
we propose a novel DR representation for intermediate conditional outcome
models that leads to superior robustness guarantees. The proposed method
achieves consistency even with high-dimensional confounders, as long as at
least one nuisance function is appropriately parametrized for each exposure
time and treatment path. Our results represent a significant step forward as
they provide new robustness guarantees. The key to achieving these results is
our new DR representation, which offers superior inferential performance while
requiring weaker assumptions. Lastly, we confirm our findings in practice
through simulations and a real data application.

arXiv link: http://arxiv.org/abs/2110.04924v4

Econometrics arXiv updated paper (originally submitted: 2021-10-10)

Smooth Tests for Normality in ANOVA

Authors: Peiwen Jia, Xiaojun Song, Haoyu Wei

The normality assumption for random errors is fundamental in the analysis of
variance (ANOVA) models, yet it is seldom subjected to formal testing in
practice. In this paper, we develop Neyman's smooth tests for assessing
normality in a broad class of ANOVA models. The proposed test statistics are
constructed via the Gaussian probability integral transformation of ANOVA
residuals and are shown to follow an asymptotic Chi-square distribution under
the null hypothesis, with degrees of freedom determined by the dimension of the
smooth model. We further propose a data-driven selection of the model dimension
based on a modified Schwarz's criterion. Monte Carlo simulations demonstrate
that the tests maintain the nominal size and achieve high power against a wide
range of alternatives. Our framework thus provides a systematic and effective
tool for formally validating the normality assumption in ANOVA models.
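
The sketch below illustrates a Neyman-type smooth statistic computed from the Gaussian probability integral transform of residuals in a one-way layout. The fixed dimension k = 4, the use of simply standardised residuals, and the chi-square reference distribution that ignores parameter-estimation effects are simplifying assumptions for illustration, not the paper's data-driven procedure.

```python
# Neyman-type smooth statistic from the probability integral transform of ANOVA residuals.
import numpy as np
from scipy.stats import norm, chi2
from scipy.special import eval_legendre

def smooth_test(resid, k=4):
    u = norm.cdf((resid - resid.mean()) / resid.std(ddof=1))   # PIT of standardised residuals
    n = len(u)
    stat = 0.0
    for j in range(1, k + 1):
        # orthonormal (shifted) Legendre polynomials on [0, 1]
        phi_j = np.sqrt(2 * j + 1) * eval_legendre(j, 2 * u - 1)
        stat += (phi_j.sum() / np.sqrt(n)) ** 2
    return stat, chi2.sf(stat, df=k)                           # statistic and p-value

rng = np.random.default_rng(0)
groups = np.repeat(np.arange(5), 40)                 # one-way ANOVA layout, 5 groups
y = 1.0 * groups + rng.standard_t(4, size=200)       # heavy-tailed (non-normal) errors
resid = y - np.array([y[groups == g].mean() for g in range(5)])[groups]
print(smooth_test(resid))                            # the test should tend to reject here
```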

arXiv link: http://arxiv.org/abs/2110.04849v3

Econometrics arXiv paper, submitted: 2021-10-10

Nonparametric Tests of Conditional Independence for Time Series

Authors: Xiaojun Song, Haoyu Wei

We propose consistent nonparametric tests of conditional independence for
time series data. Our methods are motivated from the difference between joint
conditional cumulative distribution function (CDF) and the product of
conditional CDFs. The difference is transformed into a proper conditional
moment restriction (CMR), which forms the basis for our testing procedure. Our
test statistics are then constructed using the integrated moment restrictions
that are equivalent to the CMR. We establish the asymptotic behavior of the
test statistics under the null, the alternative, and the sequence of local
alternatives converging to conditional independence at the parametric rate. Our
tests are implemented with the assistance of a multiplier bootstrap. Monte
Carlo simulations are conducted to evaluate the finite sample performance of
the proposed tests. We apply our tests to examine the predictability of equity
risk premium using variance risk premium for different horizons and find that
there exist various degrees of nonlinear predictability at mid-run and long-run
horizons.

arXiv link: http://arxiv.org/abs/2110.04847v1

Econometrics arXiv updated paper (originally submitted: 2021-10-10)

Various issues around the L1-norm distance

Authors: Jean-Daniel Rolle

Beyond the new results mentioned hereafter, this article aims at
familiarizing researchers working in applied fields -- such as physics or
economics -- with notions or formulas that they use daily without always
identifying all their theoretical features or potentialities. Various
situations where the L1-norm distance E|X-Y| between real-valued random
variables intervenes are closely examined. The axiomatics surrounding this
distance are also explored. We constantly try to build bridges between the
concrete uses of E|X-Y| and the underlying probabilistic model. An alternative
interpretation of this distance is also examined, as well as its relation to
the Gini index (economics) and the Lukaszyk-Karmovsky distance (physics). The
main contributions are the following: (a) We show that under independence,
triangle inequality holds for the normalized form E|X-Y|/(E|X| + E|Y|). (b) In
order to present a concrete advance, we determine the analytic form of E|X-Y|
and of its normalized expression when X and Y are independent with Gaussian or
uniform distribution. The resulting formulas generalize relevant tools already
in use in areas such as physics and economics. (c) We propose with all the
required rigor a brief one-dimensional introduction to the optimal transport
problem, essentially for a L1 cost function. The chosen illustrations and
examples should be of great help for newcomers to the field. New proofs and new
results are proposed.
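
For the independent Gaussian case mentioned in (b), E|X-Y| is the mean of a
folded normal, since X - Y is itself Gaussian. A quick Monte Carlo check of
that identity (the paper derives the exact normalized expressions; the
parameter values below are arbitrary):

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(2)
    mu_x, mu_y, s_x, s_y = 1.0, -0.5, 2.0, 1.0

    # Under independence, X - Y ~ N(m, s^2) with m = mu_x - mu_y, s^2 = s_x^2 + s_y^2.
    m, s = mu_x - mu_y, np.hypot(s_x, s_y)

    # Folded-normal mean: E|Z| = s*sqrt(2/pi)*exp(-m^2/(2 s^2)) + m*(2*Phi(m/s) - 1).
    analytic = (s * np.sqrt(2 / np.pi) * np.exp(-m**2 / (2 * s**2))
                + m * (2 * norm.cdf(m / s) - 1))

    x = rng.normal(mu_x, s_x, size=1_000_000)
    y = rng.normal(mu_y, s_y, size=1_000_000)
    mc = np.abs(x - y).mean()
    print("analytic E|X-Y|:", analytic, " Monte Carlo:", mc)
    print("normalized form:", mc / (np.abs(x).mean() + np.abs(y).mean()))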

arXiv link: http://arxiv.org/abs/2110.04787v3

Econometrics arXiv updated paper (originally submitted: 2021-10-09)

On the asymptotic behavior of bubble date estimators

Authors: Eiji Kurozumi, Anton Skrobotov

In this study, we extend the three-regime bubble model of Pang et al. (2021)
to allow for a fourth regime in which a unit root process follows the
recovery. We provide asymptotic and finite-sample justification for the
consistency of the collapse date estimator in the two-regime AR(1) model.
This consistency allows us to split the sample before and after the date of
collapse and to estimate the date of exuberance and the date of recovery
separately. We also find that the limiting behavior of the recovery date
estimator varies with the extent of explosiveness and the speed of recovery.

arXiv link: http://arxiv.org/abs/2110.04500v3

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2021-10-09

A Primer on Deep Learning for Causal Inference

Authors: Bernard Koch, Tim Sainburg, Pablo Geraldo, Song Jiang, Yizhou Sun, Jacob Gates Foster

This review systematizes the emerging literature for causal inference using
deep neural networks under the potential outcomes framework. It provides an
intuitive introduction on how deep learning can be used to estimate/predict
heterogeneous treatment effects and extend causal inference to settings where
confounding is non-linear, time varying, or encoded in text, networks, and
images. To maximize accessibility, we also introduce prerequisite concepts from
causal inference and deep learning. The survey differs from other treatments of
deep learning and causal inference in its sharp focus on observational causal
estimation, its extended exposition of key algorithms, and its detailed
tutorials for implementing, training, and selecting among deep estimators in
Tensorflow 2 available at github.com/kochbj/Deep-Learning-for-Causal-Inference.

arXiv link: http://arxiv.org/abs/2110.04442v2

Econometrics arXiv updated paper (originally submitted: 2021-10-08)

Estimating High Dimensional Monotone Index Models by Iterative Convex Optimization

Authors: Shakeeb Khan, Xiaoying Lan, Elie Tamer, Qingsong Yao

In this paper we propose new approaches to estimating large dimensional
monotone index models. This class of models has been popular in the applied and
theoretical econometrics literatures as it includes discrete choice,
nonparametric transformation, and duration models. A main advantage of our
approach is computational. For instance, rank estimation procedures such as
those proposed in Han (1987) and Cavanagh and Sherman (1998), which optimize a
nonsmooth, nonconvex objective function, are difficult to use with more than a
few regressors, and this limits their applicability to economic data sets. For such
monotone index models with increasing dimension, we propose to use a new class
of estimators based on batched gradient descent (BGD) involving nonparametric
methods such as kernel estimation or sieve estimation, and study their
asymptotic properties. The BGD algorithm uses an iterative procedure where the
key step exploits a strictly convex objective function, resulting in
computational advantages. A contribution of our approach is that our model is
large dimensional and semiparametric and so does not require the use of
parametric distributional assumptions.

arXiv link: http://arxiv.org/abs/2110.04388v2

Econometrics arXiv updated paper (originally submitted: 2021-10-08)

Dyadic double/debiased machine learning for analyzing determinants of free trade agreements

Authors: Harold D Chiang, Yukun Ma, Joel Rodrigue, Yuya Sasaki

This paper presents novel methods and theories for estimation and inference
about parameters in econometric models using machine learning for nuisance
parameters estimation when data are dyadic. We propose a dyadic cross fitting
method to remove over-fitting biases under arbitrary dyadic dependence.
Together with the use of Neyman orthogonal scores, this novel cross fitting
method enables root-$n$-consistent estimation and inference that are robust to
dyadic dependence. We illustrate an application of our general framework to
high-dimensional network link formation models. With this method applied to
empirical data of international economic networks, we reexamine determinants of
free trade agreements (FTA) viewed as links formed in the dyad composed of
world economies. We document that standard methods may lead to misleading
conclusions for numerous classic determinants of FTA formation due to biased
point estimates or standard errors which are too small.
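
A schematic sketch of how a dyadic cross-fitting split can be organized (my
reading of the construction; the full estimator additionally requires
Neyman-orthogonal scores and nuisance learners, which are not shown): nodes
are partitioned into groups, each evaluation fold consists of the dyads formed
by a pair of groups, and nuisances are fit only on dyads sharing no node with
that pair.

    import numpy as np
    from itertools import combinations_with_replacement

    def dyadic_folds(n_nodes, n_groups, seed=0):
        """Yield (train_dyads, eval_dyads) for a dyadic cross-fitting scheme."""
        rng = np.random.default_rng(seed)
        part = rng.integers(0, n_groups, size=n_nodes)      # node-level partition
        dyads = [(i, j) for i in range(n_nodes) for j in range(n_nodes) if i < j]
        for k, l in combinations_with_replacement(range(n_groups), 2):
            target = {k} if k == l else {k, l}
            eval_set = [d for d in dyads if {part[d[0]], part[d[1]]} == target]
            train_set = [d for d in dyads
                         if part[d[0]] not in (k, l) and part[d[1]] not in (k, l)]
            yield train_set, eval_set

    # 20 nodes split into 4 groups -> 10 fold pairs; nuisance functions would be
    # fit on train_set dyads and the orthogonal score evaluated on eval_set dyads.
    for train, evl in dyadic_folds(20, 4):
        shared = {i for d in train for i in d} & {i for d in evl for i in d}
        assert not shared                                    # no node in both sets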

arXiv link: http://arxiv.org/abs/2110.04365v3

Econometrics arXiv paper, submitted: 2021-10-08

Many Proxy Controls

Authors: Ben Deaner

A recent literature considers causal inference using noisy proxies for
unobserved confounding factors. The proxies are divided into two sets that are
independent conditional on the confounders. One set of proxies are `negative
control treatments' and the other are `negative control outcomes'. Existing
work applies to low-dimensional settings with a fixed number of proxies and
confounders. In this work we consider linear models with many proxy controls
and possibly many confounders. A key insight is that if each group of proxies
is strictly larger than the number of confounding factors, then a matrix of
nuisance parameters has a low-rank structure and a vector of nuisance
parameters has a sparse structure. We can exploit the rank-restriction and
sparsity to reduce the number of free parameters to be estimated. The number of
unobserved confounders is not known a priori but we show that it is identified,
and we apply penalization methods to adapt to this quantity. We provide an
estimator with a closed-form as well as a doubly-robust estimator that must be
evaluated using numerical methods. We provide conditions under which our
doubly-robust estimator is uniformly root-$n$ consistent, asymptotically
centered normal, and our suggested confidence intervals have asymptotically
correct coverage. We provide simulation evidence that our methods achieve
better performance than existing approaches in high dimensions, particularly
when the number of proxies is substantially larger than the number of
confounders.

arXiv link: http://arxiv.org/abs/2110.03973v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-10-07

Heterogeneous Overdispersed Count Data Regressions via Double Penalized Estimations

Authors: Shaomin Li, Haoyu Wei, Xiaoyu Lei

This paper studies the non-asymptotic merits of the double
$\ell_1$-regularized for heterogeneous overdispersed count data via negative
binomial regressions. Under the restricted eigenvalue conditions, we prove the
oracle inequalities for Lasso estimators of two partial regression coefficients
for the first time, using concentration inequalities of empirical processes.
Furthermore, derived from the oracle inequalities, the consistency and
convergence rate for the estimators are the theoretical guarantees for further
statistical inference. Finally, both simulations and a real data analysis
demonstrate that the new methods are effective.

arXiv link: http://arxiv.org/abs/2110.03552v2

Econometrics arXiv paper, submitted: 2021-10-07

Investigating Growth at Risk Using a Multi-country Non-parametric Quantile Factor Model

Authors: Todd E. Clark, Florian Huber, Gary Koop, Massimiliano Marcellino, Michael Pfarrhofer

We develop a Bayesian non-parametric quantile panel regression model. Within
each quantile, the response function is a convex combination of a linear model
and a non-linear function, which we approximate using Bayesian Additive
Regression Trees (BART). Cross-sectional information at the pth quantile is
captured through a conditionally heteroscedastic latent factor. The
non-parametric feature of our model enhances flexibility, while the panel
feature, by exploiting cross-country information, increases the number of
observations in the tails. We develop Bayesian Markov chain Monte Carlo (MCMC)
methods for estimation and forecasting with our quantile factor BART model
(QF-BART), and apply them to study growth at risk dynamics in a panel of 11
advanced economies.

arXiv link: http://arxiv.org/abs/2110.03411v1

Econometrics arXiv cross-link from math.OC (math.OC), submitted: 2021-10-07

Solving Multistage Stochastic Linear Programming via Regularized Linear Decision Rules: An Application to Hydrothermal Dispatch Planning

Authors: Felipe Nazare, Alexandre Street

The solution of multistage stochastic linear problems (MSLP) represents a
challenge for many application areas. Long-term hydrothermal dispatch planning
(LHDP) materializes this challenge in a real-world problem that affects
electricity markets, economies, and natural resources worldwide. No closed-form
solutions are available for MSLP and the definition of non-anticipative
policies with high-quality out-of-sample performance is crucial. Linear
decision rules (LDR) provide an interesting simulation-based framework for
finding high-quality policies for MSLP through two-stage stochastic models. In
practical applications, however, the number of parameters to be estimated when
using an LDR may be close to or higher than the number of scenarios of the
sample average approximation problem, thereby generating an in-sample overfit
and poor performance in out-of-sample simulations. In this paper, we propose a
novel regularized LDR to solve MSLP based on the AdaLASSO (adaptive least
absolute shrinkage and selection operator). The goal is to use the parsimony
principle, as largely studied in high-dimensional linear regression models, to
obtain better out-of-sample performance for LDR applied to MSLP. Computational
experiments show that the overfit threat is non-negligible when using classical
non-regularized LDR to solve the LHDP, one of the most studied MSLP with
relevant applications. Our analysis highlights the following benefits of the
proposed framework in comparison to the non-regularized benchmark: 1)
significant reductions in the number of non-zero coefficients (model
parsimony), 2) substantial cost reductions in out-of-sample evaluations, and 3)
improved spot-price profiles.

arXiv link: http://arxiv.org/abs/2110.03146v3

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2021-10-06

Robust Generalized Method of Moments: A Finite Sample Viewpoint

Authors: Dhruv Rohatgi, Vasilis Syrgkanis

For many inference problems in statistics and econometrics, the unknown
parameter is identified by a set of moment conditions. A generic method of
solving moment conditions is the Generalized Method of Moments (GMM). However,
classical GMM estimation is potentially very sensitive to outliers. Robustified
GMM estimators have been developed in the past, but suffer from several
drawbacks: computational intractability, poor dimension-dependence, and no
quantitative recovery guarantees in the presence of a constant fraction of
outliers. In this work, we develop the first computationally efficient GMM
estimator (under intuitive assumptions) that can tolerate a constant $\epsilon$
fraction of adversarially corrupted samples, and that has an $\ell_2$ recovery
guarantee of $O(\epsilon)$. To achieve this, we draw upon and extend a
recent line of work on algorithmic robust statistics for related but simpler
problems such as mean estimation, linear regression and stochastic
optimization. As two examples of the generality of our algorithm, we show how
our estimation algorithm and assumptions apply to instrumental variables linear
and logistic regression. Moreover, we experimentally validate that our
estimator outperforms classical IV regression and two-stage Huber regression on
synthetic and semi-synthetic datasets with corruption.

arXiv link: http://arxiv.org/abs/2110.03070v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2021-10-06

RieszNet and ForestRiesz: Automatic Debiased Machine Learning with Neural Nets and Random Forests

Authors: Victor Chernozhukov, Whitney K. Newey, Victor Quintas-Martinez, Vasilis Syrgkanis

Many causal and policy effects of interest are defined by linear functionals
of high-dimensional or non-parametric regression functions.
Root-$n$-consistent and asymptotically normal estimation of the object of
interest requires debiasing to reduce the effects of regularization and/or
model selection on the object of interest. Debiasing is typically achieved by
adding a correction term to the plug-in estimator of the functional, which
leads to properties such as semi-parametric efficiency, double robustness, and
Neyman orthogonality. We implement an automatic debiasing procedure based on
automatically learning the Riesz representation of the linear functional using
Neural Nets and Random Forests. Our method only relies on black-box evaluation
oracle access to the linear functional and does not require knowledge of its
analytic form. We propose a multitasking Neural Net debiasing method with
stochastic gradient descent minimization of a combined Riesz representer and
regression loss, while sharing representation layers for the two functions. We
also propose a Random Forest method which learns a locally linear
representation of the Riesz function. Even though our method applies to
arbitrary functionals, we experimentally find that it performs well compared to
the state-of-the-art neural-net-based algorithm of Shi et al. (2019) for the case
of the average treatment effect functional. We also evaluate our method on the
problem of estimating average marginal effects with continuous treatments,
using semi-synthetic data of gasoline price changes on gasoline demand.
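
A compact sketch of the automatic-debiasing ingredient for the average
treatment effect functional only, using a small PyTorch network for the Riesz
representer (this omits the multitasking regression head and the paper's
architectural and tuning details; the data, layer sizes, and training settings
are illustrative assumptions). The representer alpha minimizes
E[alpha(D,X)^2 - 2*(alpha(1,X) - alpha(0,X))].

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    n, p = 2000, 5
    X = torch.randn(n, p)
    D = (torch.rand(n, 1) < torch.sigmoid(X[:, :1])).float()   # confounded treatment

    net = nn.Sequential(nn.Linear(p + 1, 64), nn.ReLU(), nn.Linear(64, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)

    def alpha(d, x):
        return net(torch.cat([d, x], dim=1))

    # Riesz loss for the ATE functional m(W; g) = g(1, X) - g(0, X):
    # minimize E[alpha(D, X)^2 - 2 * (alpha(1, X) - alpha(0, X))].
    for _ in range(2000):
        opt.zero_grad()
        loss = (alpha(D, X) ** 2
                - 2 * (alpha(torch.ones_like(D), X)
                       - alpha(torch.zeros_like(D), X))).mean()
        loss.backward()
        opt.step()

    # The learned alpha approximates 1/e(X) for treated units and -1/(1-e(X)) for
    # controls, and would be combined with an outcome model in the debiased score.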

arXiv link: http://arxiv.org/abs/2110.03031v3

Econometrics arXiv paper, submitted: 2021-10-06

New insights into price drivers of crude oil futures markets: Evidence from quantile ARDL approach

Authors: Hao-Lin Shao, Ying-Hui Shao, Yan-Hong Yang

This paper investigates the cointegration between possible determinants of
crude oil futures prices during the COVID-19 pandemic period. We perform
comparative analysis of WTI and newly-launched Shanghai crude oil futures (SC)
via the Autoregressive Distributed Lag (ARDL) model and Quantile Autoregressive
Distributed Lag (QARDL) model. The empirical results confirm that economic
policy uncertainty, stock markets, interest rates and coronavirus panic are
important drivers of WTI futures prices. Our findings also suggest that the US
and China's stock markets play vital roles in movements of SC futures prices.
Meanwhile, CSI300 stock index has a significant positive short-run impact on SC
futures prices while S&P500 prices possess a positive nexus with SC futures
prices both in long-run and short-run. Overall, these empirical evidences
provide practical implications for investors and policymakers.

arXiv link: http://arxiv.org/abs/2110.02693v1

Econometrics arXiv cross-link from stat.CO (stat.CO), submitted: 2021-10-05

Distcomp: Comparing distributions

Authors: David M. Kaplan

The distcomp command is introduced and illustrated. The command assesses
whether or not two distributions differ at each possible value while
controlling the probability of any false positive, even in finite samples.
Syntax and the underlying methodology (from Goldman and Kaplan, 2018) are
discussed. Multiple examples illustrate the distcomp command, including
revisiting the experimental data of Gneezy and List (2006) and the regression
discontinuity design of Cattaneo, Frandsen, and Titiunik (2015).

arXiv link: http://arxiv.org/abs/2110.02327v1

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2021-10-05

Gambits: Theory and Evidence

Authors: Shiva Maharaj, Nicholas Polson, Christian Turk

Gambits are central to human decision-making. Our goal is to provide a theory
of Gambits. A Gambit is a combination of psychological and technical factors
designed to disrupt predictable play. Chess provides an environment to study
gambits and behavioral game theory. Our theory is based on the Bellman
optimality path for sequential decision-making. This allows us to calculate the
$Q$-values of a Gambit where material (usually a pawn) is sacrificed for
dynamic play. On the empirical side, we study the effectiveness of a number of
popular chess Gambits. This is a natural setting as chess Gambits require a
sequential assessment of a set of moves (a.k.a. policy) after the Gambit has
been accepted. Our analysis uses Stockfish 14.1 to calculate the optimal
Bellman $Q$ values, which fundamentally measures if a position is winning or
losing. To test whether Bellman's equation holds in play, we estimate the
transition probabilities to the next board state via a database of expert human
play. This then allows us to test whether the Gambiteer is following the
optimal path in his decision-making. Our methodology is applied to the popular
Stafford and reverse Stafford (a.k.a. Boden-Kieretsky-Morphy) Gambit and other
common ones including the Smith-Morra, Goring, Danish and Halloween Gambits. We
build on research in human decision-making by proving an irrational skewness
preference within agents in chess. We conclude with directions for future
research.

arXiv link: http://arxiv.org/abs/2110.02755v5

Econometrics arXiv paper, submitted: 2021-10-05

A New Multivariate Predictive Model for Stock Returns

Authors: Jianying Xie

One of the most important questions in finance is whether stock returns can be
predicted. This research aims to create a new multivariate model, which
includes dividend yield, earnings-to-price ratio, book-to-market ratio, and
consumption-wealth ratio as explanatory variables, for predicting future stock
returns. The new multivariate model is assessed for its forecasting
performance using empirical analysis. The empirical analysis is performed on
S&P500 quarterly data from the first quarter of 1952 to the fourth quarter of
2019, as well as on S&P500 monthly data from December 1920 to December 2019.
The results show that the new multivariate model has predictive power for
future stock returns.
When compared to other benchmark models, the new multivariate model performs
the best in terms of the Root Mean Squared Error (RMSE) most of the time.
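
A generic sketch of how such a comparison is typically run (the paper's exact
variable construction and benchmarks are not reproduced here): recursive
out-of-sample forecasts from a multivariate OLS regression versus the
historical-mean benchmark, compared by RMSE. The simulated predictors below
are placeholders and are assumed observable before the return they predict.

    import numpy as np

    rng = np.random.default_rng(3)
    T, k = 400, 4                                  # e.g. dy, e/p, b/m, cay
    X = rng.normal(size=(T, k))
    beta = np.array([0.2, 0.1, 0.1, 0.15])
    ret = 0.5 + X @ beta + rng.normal(scale=2.0, size=T)   # "returns"

    def ols_forecast(y, Z, t):
        Zt = np.column_stack([np.ones(t), Z[:t]])
        b, *_ = np.linalg.lstsq(Zt, y[:t], rcond=None)
        return np.r_[1.0, Z[t]] @ b

    start = 200
    e_model, e_mean = [], []
    for t in range(start, T):
        e_model.append(ret[t] - ols_forecast(ret, X, t))
        e_mean.append(ret[t] - ret[:t].mean())     # historical-mean benchmark

    rmse = lambda e: np.sqrt(np.mean(np.square(e)))
    print("RMSE multivariate OLS:", rmse(e_model),
          " RMSE historical mean:", rmse(e_mean))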

arXiv link: http://arxiv.org/abs/2110.01873v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-10-04

Beware the Gini Index! A New Inequality Measure

Authors: Sabiou Inoua

The Gini index underestimates inequality for heavy-tailed distributions: for
example, a Pareto distribution with exponent 1.5 (which has infinite variance)
has the same Gini index as any exponential distribution (a mere 0.5). This is
because the Gini index is relatively robust to extreme observations: while a
statistic's robustness to extremes is desirable for data potentially distorted
by outliers, it is misleading for heavy-tailed distributions, which inherently
exhibit extremes. We propose an alternative inequality index: the variance
normalized by the second moment. Paradoxically, this ratio is more stable
(hence more reliable) than the Gini index for large samples from an
infinite-variance distribution. Moreover, the new index satisfies the normative
axioms of inequality measurement; notably, it is decomposable into inequality
within and between subgroups, unlike the Gini index.
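
A quick numerical illustration of the contrast (sample sizes and parameters
are arbitrary): compare the Gini index with the proposed variance-to-second-
moment ratio on exponential and Pareto(1.5) samples.

    import numpy as np

    def gini(x):
        x = np.sort(x)
        n = x.size
        return 2 * np.sum(np.arange(1, n + 1) * x) / (n * x.sum()) - (n + 1) / n

    def var_over_second_moment(x):
        return np.var(x) / np.mean(x ** 2)

    rng = np.random.default_rng(4)
    expo = rng.exponential(scale=1.0, size=100_000)
    pareto = 1 + rng.pareto(a=1.5, size=100_000)    # Pareto with tail exponent 1.5

    for name, sample in [("exponential", expo), ("Pareto(1.5)", pareto)]:
        print(name, "Gini:", round(gini(sample), 3),
              "Var/E[X^2]:", round(var_over_second_moment(sample), 3))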

arXiv link: http://arxiv.org/abs/2110.01741v1

Econometrics arXiv updated paper (originally submitted: 2021-10-04)

Effect or Treatment Heterogeneity? Policy Evaluation with Aggregated and Disaggregated Treatments

Authors: Phillip Heiler, Michael C. Knaus

Binary treatments are often ex-post aggregates of multiple treatments or can
be disaggregated into multiple treatment versions. Thus, effects can be
heterogeneous due to either effect or treatment heterogeneity. We propose a
decomposition method that uncovers masked heterogeneity, avoids spurious
discoveries, and evaluates treatment assignment quality. The estimation and
inference procedure based on double/debiased machine learning allows for
high-dimensional confounding, many treatments and extreme propensity scores.
Our applications suggest that heterogeneous effects of smoking on birthweight
are partially due to different smoking intensities and that gender gaps in Job
Corps effectiveness are largely explained by differential selection into
vocational training.

arXiv link: http://arxiv.org/abs/2110.01427v4

Econometrics arXiv updated paper (originally submitted: 2021-10-03)

Identification and Estimation in a Time-Varying Endogenous Random Coefficient Panel Data Model

Authors: Ming Li

This paper proposes a correlated random coefficient linear panel data model,
where regressors can be correlated with time-varying and individual-specific
random coefficients through both a fixed effect and a time-varying random
shock. I develop a new panel data-based identification method to identify the
average partial effect and the local average response function. The
identification strategy employs a sufficient statistic to control for the fixed
effect and a conditional control variable for the random shock. Conditional on
these two controls, the residual variation in the regressors is driven solely
by the exogenous instrumental variables, and thus can be exploited to identify
the parameters of interest. The constructive identification analysis leads to
three-step series estimators, for which I establish rates of convergence and
asymptotic normality. To illustrate the method, I estimate a heterogeneous
Cobb-Douglas production function for manufacturing firms in China, finding
substantial variations in output elasticities across firms.

arXiv link: http://arxiv.org/abs/2110.00982v2

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2021-10-03

Hierarchical Gaussian Process Models for Regression Discontinuity/Kink under Sharp and Fuzzy Designs

Authors: Ximing Wu

We propose nonparametric Bayesian estimators for causal inference exploiting
Regression Discontinuity/Kink (RD/RK) under sharp and fuzzy designs. Our
estimators are based on Gaussian Process (GP) regression and classification.
The GP methods are powerful probabilistic machine learning approaches that are
advantageous in terms of derivative estimation and uncertainty quantification,
facilitating RK estimation and inference of RD/RK models. These estimators are
extended to hierarchical GP models with an intermediate Bayesian neural network
layer and can be characterized as hybrid deep learning models. Monte Carlo
simulations show that our estimators perform comparably to and sometimes better
than competing estimators in terms of precision, coverage and interval length.
The hierarchical GP models considerably improve upon one-layer GP models. We
apply the proposed methods to estimate the incumbency advantage of US house
elections. Our estimations suggest a significant incumbency advantage in terms
of both vote share and probability of winning in the next elections. Lastly we
present an extension to accommodate covariate adjustment.
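
A minimal sketch of the basic GP-RD idea under a sharp design using
scikit-learn (the hierarchical and Bayesian-neural-network layers, as well as
the fuzzy and kink extensions, are not reproduced): fit independent GPs on
each side of the cutoff and difference the posterior means at the threshold.
The simulated design below is an illustrative assumption.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel

    rng = np.random.default_rng(5)
    n, cutoff, tau = 500, 0.0, 1.5
    x = rng.uniform(-1, 1, size=n)
    y = 0.5 * x + tau * (x >= cutoff) + rng.normal(scale=0.3, size=n)

    kernel = ConstantKernel() * RBF(length_scale=0.3) + WhiteKernel()

    def side_fit(mask):
        gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
        gp.fit(x[mask, None], y[mask])
        mean, std = gp.predict(np.array([[cutoff]]), return_std=True)
        return mean[0], std[0]

    m_right, s_right = side_fit(x >= cutoff)
    m_left, s_left = side_fit(x < cutoff)
    rd_effect = m_right - m_left
    rd_se = np.hypot(s_right, s_left)     # treats the two posteriors as independent
    print("GP-RD estimate at the cutoff:", rd_effect, "+/-", 1.96 * rd_se)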

arXiv link: http://arxiv.org/abs/2110.00921v2

Econometrics arXiv paper, submitted: 2021-10-02

Probabilistic Prediction for Binary Treatment Choice: with focus on personalized medicine

Authors: Charles F. Manski

This paper extends my research applying statistical decision theory to
treatment choice with sample data, using maximum regret to evaluate the
performance of treatment rules. The specific new contribution is to study as-if
optimization using estimates of illness probabilities in clinical choice
between surveillance and aggressive treatment. Beyond its specifics, the paper
sends a broad message. Statisticians and computer scientists have addressed
conditional prediction for decision making in indirect ways, the former
applying classical statistical theory and the latter measuring prediction
accuracy in test samples. Neither approach is satisfactory. Statistical
decision theory provides a coherent, generally applicable methodology.

arXiv link: http://arxiv.org/abs/2110.00864v1

Econometrics arXiv updated paper (originally submitted: 2021-10-01)

Relative Contagiousness of Emerging Virus Variants: An Analysis of the Alpha, Delta, and Omicron SARS-CoV-2 Variants

Authors: Peter Reinhard Hansen

We propose a simple dynamic model for estimating the relative contagiousness
of two virus variants. Maximum likelihood estimation and inference is
conveniently invariant to variation in the total number of cases over the
sample period and can be expressed as a logistic regression. We apply the model
to Danish SARS-CoV-2 variant data. We estimate the reproduction numbers of
Alpha and Delta to be larger than that of the ancestral variant by a factor of
1.51 [CI 95%: 1.50, 1.53] and 3.28 [CI 95%: 3.01, 3.58], respectively. In a
predominately vaccinated population, we estimate Omicron to be 3.15 [CI 95%:
2.83, 3.50] times more infectious than Delta. Forecasting the proportion of an
emerging virus variant is straightforward, and we proceed to show how the
effective reproduction number for a new variant can be estimated without
contemporary sequencing results. This is useful for assessing the state of the
pandemic in real time as we illustrate empirically with the inferred effective
reproduction number for the Alpha variant.
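
A simplified sketch of the logistic-regression representation (the paper's
model and the mapping from the slope to the reproduction-number ratio depend
on generation-time assumptions not reproduced here; the counts below are
simulated placeholders): regress the daily share of the emerging variant on
time with a binomial GLM.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(6)
    days = np.arange(60)
    true_slope = 0.12                                # daily log-odds growth
    p = 1 / (1 + np.exp(-(-4 + true_slope * days)))  # share of the emerging variant
    n_seq = rng.integers(200, 400, size=days.size)   # sequenced cases per day
    k_new = rng.binomial(n_seq, p)                   # cases of the new variant

    endog = np.column_stack([k_new, n_seq - k_new])  # (successes, failures)
    exog = sm.add_constant(days)
    fit = sm.GLM(endog, exog, family=sm.families.Binomial()).fit()
    print(fit.params)                                # slope ~ 0.12

    # Under an assumed common generation time g, the relative reproduction number
    # is roughly exp(slope * g); e.g. g = 4.7 days:
    print("implied contagiousness ratio:", np.exp(fit.params[1] * 4.7))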

arXiv link: http://arxiv.org/abs/2110.00533v3

Econometrics arXiv cross-link from stat.CO (stat.CO), submitted: 2021-09-30

Stochastic volatility model with range-based correction and leverage

Authors: Yuta Kurose

This study presents contemporaneous modeling of asset return and price range
within the framework of stochastic volatility with leverage. A new
representation of the probability density function for the price range is
provided, and its accurate sampling algorithm is developed. A Bayesian
estimation using Markov chain Monte Carlo (MCMC) method is provided for the
model parameters and unobserved variables. MCMC samples can be generated
rigorously, despite the estimation procedure requiring sampling from a density
function with the sum of an infinite series. The empirical results obtained
using data from the U.S. market indices are consistent with the stylized facts
in the financial market, such as the existence of the leverage effect. In
addition, to explore the model's predictive ability, a model comparison based
on the volatility forecast performance is conducted.

arXiv link: http://arxiv.org/abs/2110.00039v2

Econometrics arXiv paper, submitted: 2021-09-30

Causal Matrix Completion

Authors: Anish Agarwal, Munther Dahleh, Devavrat Shah, Dennis Shen

Matrix completion is the study of recovering an underlying matrix from a
sparse subset of noisy observations. Traditionally, it is assumed that the
entries of the matrix are "missing completely at random" (MCAR), i.e., each
entry is revealed at random, independent of everything else, with uniform
probability. This is likely unrealistic due to the presence of "latent
confounders", i.e., unobserved factors that determine both the entries of the
underlying matrix and the missingness pattern in the observed matrix. For
example, in the context of movie recommender systems -- a canonical application
for matrix completion -- a user who vehemently dislikes horror films is
unlikely to ever watch horror films. In general, these confounders yield
"missing not at random" (MNAR) data, which can severely impact any inference
procedure that does not correct for this bias. We develop a formal causal model
for matrix completion through the language of potential outcomes, and provide
novel identification arguments for a variety of causal estimands of interest.
We design a procedure, which we call "synthetic nearest neighbors" (SNN), to
estimate these causal estimands. We prove finite-sample consistency and
asymptotic normality of our estimator. Our analysis also leads to new
theoretical results for the matrix completion literature. In particular, we
establish entry-wise, i.e., max-norm, finite-sample consistency and asymptotic
normality results for matrix completion with MNAR data. As a special case, this
also provides entry-wise bounds for matrix completion with MCAR data. Across
simulated and real data, we demonstrate the efficacy of our proposed estimator.

arXiv link: http://arxiv.org/abs/2109.15154v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2021-09-30

Towards Principled Causal Effect Estimation by Deep Identifiable Models

Authors: Pengzhou Wu, Kenji Fukumizu

As an important problem in causal inference, we discuss the estimation of
treatment effects (TEs). Representing the confounder as a latent variable, we
propose Intact-VAE, a new variant of variational autoencoder (VAE), motivated
by the prognostic score that is sufficient for identifying TEs. Our VAE also
naturally gives representations balanced for treatment groups, using its prior.
Experiments on (semi-)synthetic datasets show state-of-the-art performance
under diverse settings, including unobserved confounding. Based on the
identifiability of our model, we prove identification of TEs under
unconfoundedness, and also discuss (possible) extensions to harder settings.

arXiv link: http://arxiv.org/abs/2109.15062v2

Econometrics arXiv paper, submitted: 2021-09-30

Nonparametric Bounds on Treatment Effects with Imperfect Instruments

Authors: Kyunghoon Ban, Désiré Kédagni

This paper extends the identification results in Nevo and Rosen (2012) to
nonparametric models. We derive nonparametric bounds on the average treatment
effect when an imperfect instrument is available. As in Nevo and Rosen (2012),
we assume that the correlation between the imperfect instrument and the
unobserved latent variables has the same sign as the correlation between the
endogenous variable and the latent variables. We show that the monotone
treatment selection and monotone instrumental variable restrictions, introduced
by Manski and Pepper (2000, 2009), jointly imply this assumption. Moreover, we
show how the monotone treatment response assumption can help tighten the
bounds. The identified set can be written in the form of intersection bounds,
which is more conducive to inference. We illustrate our methodology using the
National Longitudinal Survey of Young Men data to estimate returns to
schooling.

arXiv link: http://arxiv.org/abs/2109.14785v1

Econometrics arXiv updated paper (originally submitted: 2021-09-29)

Testing the Presence of Implicit Hiring Quotas with Application to German Universities

Authors: Lena Janys

It is widely accepted that women are underrepresented in academia in general
and economics in particular. This paper introduces a test to detect an
under-researched form of hiring bias: implicit quotas. I derive a test under
the null of random hiring that requires no information about individual hires
under some assumptions. I derive the asymptotic distribution of this test
statistic and, as an alternative, propose a parametric bootstrap procedure that
samples from the exact distribution. This test can be used to analyze a variety
of other hiring settings. I analyze the distribution of female professors at
German universities across 50 different disciplines. I show that the
distribution of women, given the average number of women in the respective
field, is highly unlikely to result from a random allocation of women across
departments and more likely to stem from an implicit quota of one or two women
on the department level. I also show that a large part of the variation in the
share of women across STEM and non-STEM disciplines could be explained by a
two-women quota on the department level. These findings have important
implications for the potential effectiveness of policies aimed at reducing
underrepresentation and provide evidence on how stakeholders perceive and
evaluate diversity.
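
A schematic sketch of the parametric-bootstrap logic under the null of random
hiring (the paper's actual test statistic and data requirements differ; the
department sizes, field-wide share, and "observed" counts below are simulated
placeholders): simulate women counts per department under binomial random
hiring and compare an observed statistic with its bootstrap distribution.

    import numpy as np

    rng = np.random.default_rng(7)
    dept_sizes = rng.integers(5, 40, size=50)   # professors per department
    share = 0.2                                 # field-wide share of women (null)

    def statistic(counts):
        # Example statistic: fraction of departments with exactly one or two women.
        return np.mean((counts == 1) | (counts == 2))

    # "Observed" data generated with an implicit two-women cap, for illustration.
    observed = np.minimum(rng.binomial(dept_sizes, share), 2)
    obs_stat = statistic(observed)

    boot = np.array([statistic(rng.binomial(dept_sizes, share))
                     for _ in range(5000)])
    p_value = np.mean(boot >= obs_stat)
    print("observed statistic:", obs_stat, "bootstrap p-value:", p_value)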

arXiv link: http://arxiv.org/abs/2109.14343v2

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2021-09-28

Forecasting the COVID-19 vaccine uptake rate: An infodemiological study in the US

Authors: Xingzuo Zhou, Yiang Li

A year following the initial COVID-19 outbreak in China, many countries have
approved emergency vaccines. Public-health practitioners and policymakers must
understand the population's predicted willingness to be vaccinated and
implement measures to stimulate uptake. This study developed a framework for
predicting the vaccination uptake rate based on traditional clinical data,
modeled with an autoregressive integrated moving average (ARIMA) process, and
on web search queries, modeled with linear regression (ordinary least squares
and the least absolute shrinkage and selection operator) and machine learning
(boosting and random forests). For accuracy, we implemented a stacking
regression for the clinical data and web search queries. The stacked regression
of ARIMA (1,0,8) for clinical data and boost with support vector machine for
web data formed the best model for forecasting vaccination speed in the US. The
stacked regression provided a more accurate forecast. These results can help
governments and policymakers predict vaccine demand and finance relevant
programs.

arXiv link: http://arxiv.org/abs/2109.13971v2

Econometrics arXiv paper, submitted: 2021-09-28

No-Regret Forecasting with Egalitarian Committees

Authors: Jiun-Hua Su

The forecast combination puzzle is often found in literature: The
equal-weight scheme tends to outperform sophisticated methods of combining
individual forecasts. Exploiting this finding, we propose a hedge egalitarian
committees algorithm (HECA), which can be implemented via mixed integer
quadratic programming. Specifically, egalitarian committees are formed by the
ridge regression with shrinkage toward equal weights; subsequently, the
forecasts provided by these committees are averaged by the hedge algorithm. We
establish the no-regret property of HECA. Using data collected from the ECB
Survey of Professional Forecasters, we find the superiority of HECA relative to
the equal-weight scheme during the COVID-19 recession.
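
A stripped-down sketch of the two ingredients (this is not the paper's
mixed-integer formulation of the committees): ridge regression shrinking
combination weights toward equal weights, and hedge-style exponential
weighting of the resulting committees' forecasts. Function names and the
squared-error loss are illustrative choices.

    import numpy as np

    def egalitarian_weights(F, y, lam):
        """Ridge toward equal weights: argmin ||y - F w||^2 + lam * ||w - 1/N||^2."""
        N = F.shape[1]
        w0 = np.full(N, 1.0 / N)
        return np.linalg.solve(F.T @ F + lam * np.eye(N), F.T @ y + lam * w0)

    def hedge_combine(committee_forecasts, outcomes, eta=0.5):
        """Average committee forecasts with multiplicative-weights (hedge) updates."""
        T, M = committee_forecasts.shape
        weights, combined = np.ones(M) / M, np.empty(T)
        for t in range(T):
            combined[t] = weights @ committee_forecasts[t]
            losses = (committee_forecasts[t] - outcomes[t]) ** 2
            weights *= np.exp(-eta * losses)
            weights /= weights.sum()
        return combined

    # Usage: F holds individual forecasters' past predictions and y the realized
    # values; several lambda values give several committees, whose forecasts are
    # then averaged online by hedge_combine.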

arXiv link: http://arxiv.org/abs/2109.13801v1

Econometrics arXiv paper, submitted: 2021-09-28

Macroeconomic forecasting with LSTM and mixed frequency time series data

Authors: Sarun Kamolthip

This paper demonstrates the potential of the long short-term memory (LSTM)
network when applied to macroeconomic time series data sampled at different
frequencies. We first present how the conventional LSTM model can be adapted
to time series observed at mixed frequencies when the same mismatch ratio is
applied for all pairs of low-frequency output and higher-frequency variables.
To generalize the LSTM to the case of multiple mismatch ratios, we adopt the
unrestricted Mixed Data Sampling (U-MIDAS) scheme (Foroni et al., 2015) into
the LSTM architecture. We assess the out-of-sample predictive performance via
both Monte Carlo simulations and an empirical application. Our proposed models
outperform the restricted MIDAS model even in a setup favorable to the MIDAS
estimator. For a real-world application, we study forecasting the quarterly
growth rate of Thai real GDP using a vast array of quarterly and monthly
macroeconomic indicators. Our LSTM with the U-MIDAS scheme easily beats the
simple benchmark AR(1) model at all horizons, but outperforms the strong
benchmark univariate LSTM only at one and six months ahead. Nonetheless, we
find that our proposed model could be very helpful for short-term forecasting
during large economic downturns. Simulation and empirical results support the
use of our proposed LSTM with the U-MIDAS scheme for nowcasting applications.
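
A minimal sketch of the input arrangement for a single mismatch ratio of 3
(monthly indicators, quarterly target) feeding an LSTM, in the spirit of an
unrestricted MIDAS stacking; the architecture, simulated data, and
hyperparameters here are illustrative assumptions rather than the paper's
specification.

    import numpy as np
    import tensorflow as tf

    rng = np.random.default_rng(8)
    n_q, n_ind, lags_q = 200, 6, 4                 # quarters, indicators, lags
    monthly = rng.normal(size=(n_q * 3, n_ind))    # monthly indicator panel

    # U-MIDAS-style stacking: the 3 months within each quarter become 3*n_ind features.
    quarterly_feats = monthly.reshape(n_q, 3 * n_ind)
    X = np.stack([quarterly_feats[t - lags_q:t] for t in range(lags_q, n_q)])
    y = rng.normal(size=X.shape[0])                # placeholder quarterly GDP growth

    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(32, input_shape=(lags_q, 3 * n_ind)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, y, epochs=5, batch_size=16, verbose=0)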

arXiv link: http://arxiv.org/abs/2109.13777v1

Econometrics arXiv updated paper (originally submitted: 2021-09-28)

Gaussian and Student's $t$ mixture vector autoregressive model with application to the effects of the Euro area monetary policy shock

Authors: Savi Virolainen

A new mixture vector autoregressive model based on Gaussian and Student's $t$
distributions is introduced. As its mixture components, our model incorporates
conditionally homoskedastic linear Gaussian vector autoregressions and
conditionally heteroskedastic linear Student's $t$ vector autoregressions. For
a $p$th order model, the mixing weights depend on the full distribution of the
preceding $p$ observations, which leads to attractive practical and theoretical
properties such as ergodicity and full knowledge of the stationary distribution
of $p+1$ consecutive observations. A structural version of the model with
statistically identified shocks is also proposed. The empirical application
studies the effects of the Euro area monetary policy shock. We fit a two-regime
model to the data and find the effects, particularly on inflation, stronger in
the regime that mainly prevails before the Financial crisis than in the regime
that mainly dominates after it. The introduced methods are implemented in the
accompanying R package gmvarkit.

arXiv link: http://arxiv.org/abs/2109.13648v4

Econometrics arXiv updated paper (originally submitted: 2021-09-28)

bqror: An R package for Bayesian Quantile Regression in Ordinal Models

Authors: Prajual Maheshwari, Mohammad Arshad Rahman

This article describes an R package bqror that estimates Bayesian quantile
regression for ordinal models introduced in Rahman (2016). The paper classifies
ordinal models into two types and offers computationally efficient, yet simple,
Markov chain Monte Carlo (MCMC) algorithms for estimating ordinal quantile
regression. The generic ordinal model with 3 or more outcomes (labeled ORI
model) is estimated by a combination of Gibbs sampling and Metropolis-Hastings
algorithm, whereas an ordinal model with exactly 3 outcomes (labeled ORII
model) is estimated using Gibbs sampling only. In line with the Bayesian
literature, we suggest using marginal likelihood for comparing alternative
quantile regression models and explain how to compute the same. The models and
their estimation procedures are illustrated via multiple simulation studies and
implemented in two applications. The article also describes several other
functions contained within the bqror package, which are necessary for
estimation, inference, and assessing model fit.

arXiv link: http://arxiv.org/abs/2109.13606v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-09-27

Assessing Outcome-to-Outcome Interference in Sibling Fixed Effects Models

Authors: David C. Mallinson

Sibling fixed effects (FE) models are useful for estimating causal treatment
effects while offsetting unobserved sibling-invariant confounding. However,
treatment estimates are biased if an individual's outcome affects their
sibling's outcome. We propose a robustness test for assessing the presence of
outcome-to-outcome interference in linear two-sibling FE models. We regress a
gain-score--the difference between siblings' continuous outcomes--on both
siblings' treatments and on a pre-treatment observed FE. Under certain
restrictions, the observed FE's partial regression coefficient signals the
presence of outcome-to-outcome interference. Monte Carlo simulations
demonstrated the robustness test under several models. We found that an
observed FE signaled outcome-to-outcome spillover if it was directly associated
with a sibling-invariant confounder of treatments and outcomes, directly
associated with a sibling's treatment, or directly and equally associated with
both siblings' outcomes. However, the robustness test collapsed if the observed
FE was directly but differentially associated with siblings' outcomes or if
outcomes affected siblings' treatments.
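
A bare-bones sketch of the proposed regression (simulated data and variable
names are placeholders; the paper spells out the restrictions under which the
coefficient on the observed FE is informative): regress the within-pair gain
score on both siblings' treatments and a pre-treatment observed fixed-effect
proxy.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(9)
    n_pairs = 1000
    fe_obs = rng.normal(size=n_pairs)              # pre-treatment observed FE proxy
    d1 = rng.binomial(1, 0.5, n_pairs)
    d2 = rng.binomial(1, 0.5, n_pairs)

    # Outcome-to-outcome interference: sibling 1's outcome enters sibling 2's.
    y1 = 1.0 * d1 + 0.5 * fe_obs + rng.normal(size=n_pairs)
    y2 = 1.0 * d2 + 0.5 * fe_obs + 0.3 * y1 + rng.normal(size=n_pairs)

    gain = y1 - y2
    X = sm.add_constant(np.column_stack([d1, d2, fe_obs]))
    res = sm.OLS(gain, X).fit()
    print(res.params)   # a nonzero coefficient on fe_obs signals interference here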

arXiv link: http://arxiv.org/abs/2109.13399v2

Econometrics arXiv paper, submitted: 2021-09-26

Design and validation of an index to measure development in rural areas through stakeholder participation

Authors: Abreu I., Mesias F. J., Ramajo J.

This paper proposes the development of an index to assess rural development
based on a set of 25 demographic, economic, environmental, and social welfare
indicators previously selected through a Delphi approach. Three widely accepted
aggregation methods were then tested: a mixed arithmetic/geometric mean without
weightings for each indicator; a weighted arithmetic mean using the weights
previously generated by the Delphi panel and an aggregation through Principal
Component Analysis. These three methodologies were later applied to 9
Portuguese NUTS III regions, and the results were presented to a group of
experts in rural development who indicated which of the three forms of
aggregation best measured the levels of rural development of the different
territories. Finally, it was concluded that the unweighted arithmetic/geometric
mean was the most accurate methodology for aggregating indicators to create a
Rural Development Index.

arXiv link: http://arxiv.org/abs/2109.12568v1

Econometrics arXiv cross-link from q-fin.TR (q-fin.TR), submitted: 2021-09-24

Periodicity in Cryptocurrency Volatility and Liquidity

Authors: Peter Reinhard Hansen, Chan Kim, Wade Kimbrough

We study recurrent patterns in volatility and volume for major
cryptocurrencies, Bitcoin and Ether, using data from two centralized exchanges
(Coinbase Pro and Binance) and a decentralized exchange (Uniswap V2). We find
systematic patterns in both volatility and volume across day-of-the-week,
hour-of-the-day, and within the hour. These patterns have grown stronger over
the years and can be related to algorithmic trading and funding times in
futures markets. We also document that price formation mainly takes place on
the centralized exchanges while price adjustments on the decentralized
exchanges can be sluggish.

arXiv link: http://arxiv.org/abs/2109.12142v2

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2021-09-24

Combining Discrete Choice Models and Neural Networks through Embeddings: Formulation, Interpretability and Performance

Authors: Ioanna Arkoudi, Carlos Lima Azevedo, Francisco C. Pereira

This study proposes a novel approach that combines theory and data-driven
choice models using Artificial Neural Networks (ANNs). In particular, we use
continuous vector representations, called embeddings, for encoding categorical
or discrete explanatory variables with a special focus on interpretability and
model transparency. Although embedding representations within the logit
framework have been conceptualized by Pereira (2019), their dimensions do not
have an absolute definitive meaning, hence offering limited behavioral insights
in this earlier work. The novelty of our work lies in enforcing
interpretability on the embedding vectors by formally associating each of their
dimensions with a choice alternative. Thus, our approach brings benefits much
beyond a simple parsimonious representation improvement over dummy encoding, as
it provides behaviorally meaningful outputs that can be used in travel demand
analysis and policy decisions. Additionally, in contrast to previously
suggested ANN-based Discrete Choice Models (DCMs) that either sacrifice
interpretability for performance or are only partially interpretable, our
models preserve interpretability of the utility coefficients for all the input
variables despite being based on ANN principles. The proposed models were
tested on two real world datasets and evaluated against benchmark and baseline
models that use dummy-encoding. The results of the experiments indicate that
our models deliver state-of-the-art predictive performance, outperforming
existing ANN-based models while drastically reducing the number of required
network parameters.

arXiv link: http://arxiv.org/abs/2109.12042v2

Econometrics arXiv updated paper (originally submitted: 2021-09-24)

Linear Panel Regressions with Two-Way Unobserved Heterogeneity

Authors: Hugo Freeman, Martin Weidner

We study linear panel regression models in which the unobserved error term is
an unknown smooth function of two-way unobserved fixed effects. In standard
additive or interactive fixed effect models the individual specific and time
specific effects are assumed to enter with a known functional form (additive or
multiplicative). In this paper, we allow for this functional form to be more
general and unknown. We discuss two different estimation approaches that allow
consistent estimation of the regression parameters in this setting as the
number of individuals and the number of time periods grow to infinity. The
first approach uses the interactive fixed effect estimator in Bai (2009), which
is still applicable here, as long as the number of factors in the estimation
grows asymptotically. The second approach first discretizes the two-way
unobserved heterogeneity (similar to what Bonhomme, Lamadon and Manresa (2021)
do for one-way heterogeneity) and then estimates a simple linear fixed
effect model with additive two-way grouped fixed effects. For both estimation
methods we obtain asymptotic convergence results, perform Monte Carlo
simulations, and employ the estimators in an empirical application to UK house
price data.

arXiv link: http://arxiv.org/abs/2109.11911v3

Econometrics arXiv updated paper (originally submitted: 2021-09-23)

Treatment Effects in Market Equilibrium

Authors: Evan Munro, Xu Kuang, Stefan Wager

Policy-relevant treatment effect estimation in a marketplace setting requires
taking into account both the direct benefit of the treatment and any spillovers
induced by changes to the market equilibrium. The standard way to address these
challenges is to evaluate interventions via cluster-randomized experiments,
where each cluster corresponds to an isolated market. This approach, however,
cannot be used when we only have access to a single market (or a small number
of markets). Here, we show how to identify and estimate policy-relevant
treatment effects using a unit-level randomized trial run within a single large
market. A standard Bernoulli-randomized trial allows consistent estimation of
direct effects, and of treatment heterogeneity measures that can be used for
welfare-improving targeting. Estimating spillovers - as well as providing
confidence intervals for the direct effect - requires estimates of price
elasticities, which we provide using an augmented experimental design. Our
results rely on all spillovers being mediated via the (observed) prices of a
finite number of traded goods, and the market power of any single unit decaying
as the market gets large. We illustrate our results using a simulation
calibrated to a conditional cash transfer experiment in the Philippines.

arXiv link: http://arxiv.org/abs/2109.11647v4

Econometrics arXiv paper, submitted: 2021-09-22

A Wavelet Method for Panel Models with Jump Discontinuities in the Parameters

Authors: Oualid Bada, Alois Kneip, Dominik Liebl, Tim Mensinger, James Gualtieri, Robin C. Sickles

While a substantial literature on structural break change point analysis
exists for univariate time series, research on large panel data models has not
been as extensive. In this paper, a novel method for estimating panel models
with multiple structural changes is proposed. The breaks are allowed to occur
at unknown points in time and may affect the multivariate slope parameters
individually. Our method adapts Haar wavelets to the structure of the observed
variables in order to detect the change points of the parameters consistently.
We also develop methods to address endogenous regressors within our modeling
framework. The asymptotic property of our estimator is established. In our
application, we examine the impact of algorithmic trading on standard measures
of market quality such as liquidity and volatility over a time period that
covers the financial meltdown that began in 2007. We are able to detect jumps
in regression slope parameters automatically without using ad-hoc subsample
selection criteria.

arXiv link: http://arxiv.org/abs/2109.10950v1

Econometrics arXiv updated paper (originally submitted: 2021-09-22)

Algorithms for Inference in SVARs Identified with Sign and Zero Restrictions

Authors: Matthew Read

I develop algorithms to facilitate Bayesian inference in structural vector
autoregressions that are set-identified with sign and zero restrictions by
showing that the system of restrictions is equivalent to a system of sign
restrictions in a lower-dimensional space. Consequently, algorithms applicable
under sign restrictions can be extended to allow for zero restrictions.
Specifically, I extend algorithms proposed in Amir-Ahmadi and Drautzburg (2021)
to check whether the identified set is nonempty and to sample from the
identified set without rejection sampling. I compare the new algorithms to
alternatives by applying them to variations of the model considered by Arias et
al. (2019), who estimate the effects of US monetary policy using sign and zero
restrictions on the monetary policy reaction function. The new algorithms are
particularly useful when a rich set of sign restrictions substantially
truncates the identified set given the zero restrictions.

arXiv link: http://arxiv.org/abs/2109.10676v2

Econometrics arXiv updated paper (originally submitted: 2021-09-21)

A new look at the anthropogenic global warming consensus: an econometric forecast based on the ARIMA model of paleoclimate series

Authors: Gilmar V. F. Santos, Lucas G. Cordeiro, Claudio A. Rojo, Edison L. Leismann

This paper aims to project a climate change scenario using a stochastic
paleotemperature time series model and compare it with the prevailing
consensus. The ARIMA - Autoregressive Integrated Moving Average Process model
was used for this purpose. The results show that the parameter estimates of the
model were below what is established by the anthropogenic current and
governmental organs, such as the IPCC (UN), considering a 100-year scenario,
which suggests a period of temperature reduction and a probable cooling. Thus,
we hope with this study to contribute to the discussion by adding a statistical
element of paleoclimate in counterpoint to the current scientific consensus and
place the debate in a long-term historical dimension, in line with other
existing research on the topic.

arXiv link: http://arxiv.org/abs/2109.10419v2

Econometrics arXiv updated paper (originally submitted: 2021-09-21)

Modeling and Analysis of Discrete Response Data: Applications to Public Opinion on Marijuana Legalization in the United States

Authors: Mohit Batham, Soudeh Mirghasemi, Mohammad Arshad Rahman, Manini Ojha

This chapter presents an overview of a specific form of limited dependent
variable models, namely discrete choice models, where the dependent (response
or outcome) variable takes values which are discrete, inherently ordered, and
characterized by an underlying continuous latent variable. Within this setting,
the dependent variable may take only two discrete values (such as 0 and 1)
giving rise to binary models (e.g., probit and logit models) or more than two
values (say $j=1,2, \ldots, J$, where $J$ is some integer, typically small)
giving rise to ordinal models (e.g., ordinal probit and ordinal logit models).
In these models, the primary goal is to model the probability of
responses/outcomes conditional on the covariates. We connect the outcomes of a
discrete choice model to the random utility framework in economics, discuss
estimation techniques, present the calculation of covariate effects and
measures to assess model fitting. Some recent advances in discrete data
modeling are also discussed. Following the theoretical review, we utilize the
binary and ordinal models to analyze public opinion on marijuana legalization
and the extent of legalization -- a socially relevant but controversial topic
in the United States. We obtain several interesting results including that past
use of marijuana, belief about legalization and political partisanship are
important factors that shape the public opinion.
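
The binary and ordinal models described here can be fit in a few lines; a
minimal sketch with statsmodels, in which simulated covariates stand in for
the survey data used in the chapter:

    import numpy as np
    import pandas as pd
    from statsmodels.miscmodels.ordinal_model import OrderedModel

    rng = np.random.default_rng(10)
    n = 1500
    X = pd.DataFrame({"past_use": rng.binomial(1, 0.3, n),
                      "partisanship": rng.normal(size=n)})
    latent = 0.8 * X["past_use"] + 0.5 * X["partisanship"] + rng.normal(size=n)
    y = pd.cut(latent, bins=[-np.inf, -0.5, 0.8, np.inf],
               labels=["oppose", "partial", "full"], ordered=True)

    # Ordinal probit: latent-variable model with estimated cutpoints.
    res = OrderedModel(y, X, distr="probit").fit(method="bfgs", disp=False)
    print(res.summary())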

arXiv link: http://arxiv.org/abs/2109.10122v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-09-19

Unifying Design-based Inference: On Bounding and Estimating the Variance of any Linear Estimator in any Experimental Design

Authors: Joel A. Middleton

This paper provides a design-based framework for variance (bound) estimation
in experimental analysis. Results are applicable to virtually any combination
of experimental design, linear estimator (e.g., difference-in-means, OLS, WLS)
and variance bound, allowing for unified treatment and a basis for systematic
study and comparison of designs using matrix spectral analysis. A proposed
variance estimator reproduces Eicker-Huber-White (aka. "robust",
"heteroskedastic consistent", "sandwich", "White", "Huber-White", "HC", etc.)
standard errors and "cluster-robust" standard errors as special cases. While
past work has shown algebraic equivalences between design-based and the
so-called "robust" standard errors under some designs, this paper motivates
them for a wide array of design-estimator-bound triplets. In so doing, it
provides a clearer and more general motivation for variance estimators.

arXiv link: http://arxiv.org/abs/2109.09220v1

Econometrics arXiv updated paper (originally submitted: 2021-09-19)

Tests for Group-Specific Heterogeneity in High-Dimensional Factor Models

Authors: Antoine Djogbenou, Razvan Sufana

Standard high-dimensional factor models assume that the comovements in a
large set of variables could be modeled using a small number of latent factors
that affect all variables. In many relevant applications in economics and
finance, heterogenous comovements specific to some known groups of variables
naturally arise, and reflect distinct cyclical movements within those groups.
This paper develops two new statistical tests that can be used to investigate
whether there is evidence supporting group-specific heterogeneity in the data.
The first test statistic is designed for the alternative hypothesis of
group-specific heterogeneity appearing in at least one pair of groups; the
second is for the alternative of group-specific heterogeneity appearing in all
pairs of groups. We show that the second moment of factor loadings changes
across groups when heterogeneity is present, and use this feature to establish
the theoretical validity of the tests. We also propose and prove the validity
of a permutation approach for approximating the asymptotic distributions of the
two test statistics. The simulations and the empirical financial application
indicate that the proposed tests are useful for detecting group-specific
heterogeneity.

arXiv link: http://arxiv.org/abs/2109.09049v2

Econometrics arXiv updated paper (originally submitted: 2021-09-19)

Composite Likelihood for Stochastic Migration Model with Unobserved Factor

Authors: Antoine Djogbenou, Christian Gouriéroux, Joann Jasiak, Maygol Bandehali

We introduce the conditional Maximum Composite Likelihood (MCL) estimation
method for the stochastic factor ordered Probit model of credit rating
transitions of firms. This model is recommended for internal credit risk
assessment procedures in banks and financial institutions under the Basel III
regulations. Its exact likelihood function involves a high-dimensional
integral, which can be approximated numerically before maximization. However,
the estimated migration risk and required capital tend to be sensitive to the
quality of this approximation, potentially leading to statistical regulatory
arbitrage. The proposed conditional MCL estimator circumvents this problem and
maximizes the composite log-likelihood of the factor ordered Probit model. We
present three conditional MCL estimators of different complexity and examine
their consistency and asymptotic normality when $n$ and $T$ tend to infinity. The
performance of these estimators at finite T is examined and compared with a
granularity-based approach in a simulation study. The use of the MCL estimator
is also illustrated in an empirical application.

arXiv link: http://arxiv.org/abs/2109.09043v2

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2021-09-18

Estimations of the Local Conditional Tail Average Treatment Effect

Authors: Le-Yu Chen, Yu-Min Yen

The conditional tail average treatment effect (CTATE) is defined as a
difference between the conditional tail expectations of potential outcomes,
which can capture heterogeneity and deliver aggregated local information on
treatment effects over different quantile levels and is closely related to the
notion of second-order stochastic dominance and the Lorenz curve. These
properties render it a valuable tool for policy evaluation. In this paper, we
study estimation of the CTATE locally for a group of compliers (local CTATE or
LCTATE) under the two-sided noncompliance framework. We consider a
semiparametric treatment effect framework under endogeneity for the LCTATE
estimation using a newly introduced class of consistent loss functions jointly
for the conditional tail expectation and quantile. We establish the asymptotic
theory of our proposed LCTATE estimator and provide an efficient algorithm for
its implementation. We then apply the method to evaluate the effects of
participating in programs under the Job Training Partnership Act in the US.
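
For readers unfamiliar with the object, one common way to write a conditional tail expectation at level $\tau$ (here for the upper tail; the lower-tail version is analogous, and the notation is ours rather than necessarily the authors') is as an average of conditional quantiles:

$$\mathrm{CTE}_\tau\big(Y \mid X=x\big) \;=\; E\big[\,Y \mid Y \ge Q_{Y\mid X}(\tau \mid x),\, X=x\,\big] \;=\; \frac{1}{1-\tau}\int_\tau^1 Q_{Y\mid X}(u \mid x)\,du,$$
$$\mathrm{CTATE}_\tau(x) \;=\; \mathrm{CTE}_\tau\big(Y(1)\mid X=x\big) \;-\; \mathrm{CTE}_\tau\big(Y(0)\mid X=x\big),$$

which makes explicit how, loosely speaking, the CTATE aggregates quantile treatment effects above the level $\tau$.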

arXiv link: http://arxiv.org/abs/2109.08793v3

Econometrics arXiv updated paper (originally submitted: 2021-09-17)

Regression Discontinuity Design with Potentially Many Covariates

Authors: Yoichi Arai, Taisuke Otsu, Myung Hwan Seo

This paper studies the case of possibly high-dimensional covariates in the
regression discontinuity design (RDD) analysis. In particular, we propose
estimation and inference methods for the RDD models with covariate selection
which perform stably regardless of the number of covariates. The proposed
methods combine the local approach using kernel weights with
$\ell_{1}$-penalization to handle high-dimensional covariates. We provide
theoretical and numerical results which illustrate the usefulness of the
proposed methods. Theoretically, we present risk and coverage properties for
our point estimation and inference methods, respectively. In a certain special case, the proposed estimator becomes more efficient than the conventional covariate-adjusted estimator at the cost of an additional sparsity condition.
Numerically, our simulation experiments and empirical example show the robust
behaviors of the proposed methods to the number of covariates in terms of bias
and variance for point estimation and coverage probability and interval length
for inference.
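
To fix ideas, a stylized version of the kernel-weighted $\ell_1$-penalized regression alluded to above (a schematic rendering under our own notation, not necessarily the authors' exact formulation) selects covariates locally around the cutoff $c$:

$$\min_{\alpha,\,\beta,\,\tau,\,\gamma}\; \sum_{i=1}^{n} K\!\Big(\frac{X_i-c}{h}\Big)\Big(Y_i-\alpha-\beta\,(X_i-c)-\tau\,D_i - Z_i'\gamma\Big)^2 \;+\; \lambda \sum_{j} |\gamma_j|,$$

where $D_i=1\{X_i\ge c\}$, $Z_i$ is the (possibly high-dimensional) covariate vector, $K$ is a kernel, $h$ a bandwidth, and $\lambda$ the penalty level.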

arXiv link: http://arxiv.org/abs/2109.08351v4

Econometrics arXiv updated paper (originally submitted: 2021-09-16)

Policy Choice and Best Arm Identification: Asymptotic Analysis of Exploration Sampling

Authors: Kaito Ariu, Masahiro Kato, Junpei Komiyama, Kenichiro McAlinn, Chao Qin

We consider the "policy choice" problem -- otherwise known as best arm
identification in the bandit literature -- proposed by Kasy and Sautmann (2021)
for adaptive experimental design. Theorem 1 of Kasy and Sautmann (2021)
provides three asymptotic results that give theoretical guarantees for
exploration sampling developed for this setting. We first show that the proof
of Theorem 1 (1) has technical issues, and the proof and statement of Theorem 1
(2) are incorrect. We then show, through a counterexample, that Theorem 1 (3)
is false. For the former two, we correct the statements and provide rigorous
proofs. For Theorem 1 (3), we propose an alternative objective function, which
we call posterior weighted policy regret, and derive the asymptotic optimality
of exploration sampling.

arXiv link: http://arxiv.org/abs/2109.08229v5

Econometrics arXiv paper, submitted: 2021-09-16

Short and Simple Confidence Intervals when the Directions of Some Effects are Known

Authors: Philipp Ketz, Adam McCloskey

We provide adaptive confidence intervals on a parameter of interest in the
presence of nuisance parameters when some of the nuisance parameters have known
signs. The confidence intervals are adaptive in the sense that they tend to be
short at and near the points where the nuisance parameters are equal to zero.
We focus our results primarily on the practical problem of inference on a
coefficient of interest in the linear regression model when it is unclear
whether or not it is necessary to include a subset of control variables whose
partial effects on the dependent variable have known directions (signs). Our
confidence intervals are trivial to compute and can provide significant length
reductions relative to standard confidence intervals in cases for which the
control variables do not have large effects. At the same time, they entail
minimal length increases at any parameter values. We prove that our confidence
intervals are asymptotically valid uniformly over the parameter space and
illustrate their length properties in an empirical application to a factorial
design field experiment and a Monte Carlo study calibrated to the empirical
application.

arXiv link: http://arxiv.org/abs/2109.08222v1

Econometrics arXiv updated paper (originally submitted: 2021-09-16)

Standard Errors for Calibrated Parameters

Authors: Matthew D. Cocci, Mikkel Plagborg-Møller

Calibration, the practice of choosing the parameters of a structural model to
match certain empirical moments, can be viewed as minimum distance estimation.
Existing standard error formulas for such estimators require a consistent
estimate of the correlation structure of the empirical moments, which is often
unavailable in practice. Instead, the variances of the individual empirical
moments are usually readily estimable. Using only these variances, we derive
conservative standard errors and confidence intervals for the structural
parameters that are valid even under the worst-case correlation structure. In
the over-identified case, we show that the moment weighting scheme that
minimizes the worst-case estimator variance amounts to a moment selection
problem with a simple solution. Finally, we develop tests of over-identifying
or parameter restrictions. We apply our methods empirically to a model of menu
cost pricing for multi-product firms and to a heterogeneous agent New Keynesian
model.

arXiv link: http://arxiv.org/abs/2109.08109v3

Econometrics arXiv paper, submitted: 2021-09-16

Structural Estimation of Matching Markets with Transferable Utility

Authors: Alfred Galichon, Bernard Salanié

This paper provides an introduction to structural estimation methods for
matching markets with transferable utility.

arXiv link: http://arxiv.org/abs/2109.07932v1

Econometrics arXiv paper, submitted: 2021-09-16

Semi-parametric estimation of the EASI model: Welfare implications of taxes identifying clusters due to unobserved preference heterogeneity

Authors: Andrés Ramírez-Hassan, Alejandro López-Vera

We provide a novel inferential framework to estimate the exact affine Stone
index (EASI) model, and analyze welfare implications due to price changes
caused by taxes. Our inferential framework is based on a non-parametric
specification of the stochastic errors in the EASI incomplete demand system
using Dirichlet processes. Our proposal enables us to identify consumer clusters driven by unobserved preference heterogeneity while taking into account censoring, simultaneous endogeneity, and non-linearities. We perform an application based
on a tax on electricity consumption in the Colombian economy. Our results
suggest that there are four clusters due to unobserved preference
heterogeneity, although 95% of our sample belongs to one cluster. This suggests that observable variables describe preferences well under the EASI model in our application. We find that utilities seem to be inelastic normal
goods with non-linear Engel curves. Joint predictive distributions indicate
that electricity tax generates substitution effects between electricity and
other non-utility goods. These distributions as well as Slutsky matrices
suggest good model assessment. We find that there is a 95% probability that the
equivalent variation as a percentage of income of the representative household is between 0.60% and 1.49% given an approximately 1% electricity tariff increase. However, there are heterogeneous effects, with higher socioeconomic strata facing larger welfare losses on average. This highlights the potentially remarkable welfare implications of taxing inelastic services.

arXiv link: http://arxiv.org/abs/2109.07646v1

Econometrics arXiv updated paper (originally submitted: 2021-09-15)

Geographic Difference-in-Discontinuities

Authors: Kyle Butts

A recent econometric literature has critiqued the use of regression discontinuities where administrative borders serve as the 'cutoff'. Identification in this context is difficult since multiple treatments can change at the cutoff and individuals can easily sort on either side of the border. This note extends the difference-in-discontinuities framework discussed in Grembi et al. (2016) to a geographic setting. The paper formalizes the identifying assumptions in this context, which allow for the removal of time-invariant sorting and compound treatments, similar to the difference-in-differences methodology.
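
For concreteness, the difference-in-discontinuities estimand can be written (in a schematic form consistent with Grembi et al. (2016), not necessarily the note's exact notation) as the post-period jump at the border minus the pre-period jump:

$$\tau_{\text{DiDisc}} \;=\; \Big(\lim_{x\downarrow c} E[Y_{t_1}\mid X=x] - \lim_{x\uparrow c} E[Y_{t_1}\mid X=x]\Big) \;-\; \Big(\lim_{x\downarrow c} E[Y_{t_0}\mid X=x] - \lim_{x\uparrow c} E[Y_{t_0}\mid X=x]\Big),$$

where $X$ is the (signed) distance to the administrative border and $t_0$, $t_1$ index the pre- and post-treatment periods, so that time-invariant sorting and compound treatments difference out.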

arXiv link: http://arxiv.org/abs/2109.07406v2

Econometrics arXiv paper, submitted: 2021-09-14

Bayesian hierarchical analysis of a multifaceted program against extreme poverty

Authors: Louis Charlot

The evaluation of a multifaceted program against extreme poverty in different
developing countries gave encouraging results, but with important heterogeneity
between countries. This master thesis proposes to study this heterogeneity with
a Bayesian hierarchical analysis. The analysis we carry out with two different
hierarchical models leads to a very low amount of pooling of information
between countries, indicating that this observed heterogeneity should be
interpreted mostly as true heterogeneity, and not as sampling error. We analyze
the first order behavior of our hierarchical models, in order to understand
what leads to this very low amount of pooling. We try to give to this work a
didactic approach, with an introduction of Bayesian analysis and an explanation
of the different modeling and computational choices of our analysis.

arXiv link: http://arxiv.org/abs/2109.06759v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2021-09-13

Policy Optimization Using Semi-parametric Models for Dynamic Pricing

Authors: Jianqing Fan, Yongyi Guo, Mengxin Yu

In this paper, we study the contextual dynamic pricing problem where the
market value of a product is linear in its observed features plus some market
noise. Products are sold one at a time, and only a binary response indicating
success or failure of a sale is observed. Our model setting is similar to
Javanmard and Nazerzadeh [2019] except that we expand the demand curve to a
semiparametric model and need to learn dynamically both parametric and
nonparametric components. We propose a dynamic statistical learning and
decision-making policy that combines semiparametric estimation from a
generalized linear model with an unknown link and online decision-making to
minimize regret (maximize revenue). Under mild conditions, we show that for a
market noise c.d.f. $F(\cdot)$ with $m$-th order derivative ($m\geq 2$), our
policy achieves a regret upper bound of $\tilde{O}_{d}(T^{\frac{2m+1}{4m-1}})$, where $T$ is the time horizon and $\tilde{O}_{d}$ is the order that hides logarithmic terms and the dimensionality of feature $d$. The upper bound is further reduced to $\tilde{O}_{d}(\sqrt{T})$ if $F$ is super smooth, i.e., its Fourier transform decays exponentially. In terms of dependence on the horizon $T$, these upper bounds are close to $\Omega(\sqrt{T})$, the lower bound where $F$ belongs to a parametric class. We further generalize these results to the
case with dynamically dependent product features under the strong mixing
condition.

arXiv link: http://arxiv.org/abs/2109.06368v2

Econometrics arXiv paper, submitted: 2021-09-13

Nonparametric Estimation of Truncated Conditional Expectation Functions

Authors: Tomasz Olma

Truncated conditional expectation functions are objects of interest in a wide
range of economic applications, including income inequality measurement,
financial risk management, and impact evaluation. They typically involve
truncating the outcome variable above or below certain quantiles of its
conditional distribution. In this paper, based on local linear methods, a
novel, two-stage, nonparametric estimator of such functions is proposed. In
this estimation problem, the conditional quantile function is a nuisance
parameter that has to be estimated in the first stage. The proposed estimator
is insensitive to the first-stage estimation error owing to the use of a
Neyman-orthogonal moment in the second stage. This construction ensures that
inference methods developed for the standard nonparametric regression can be
readily adapted to conduct inference on truncated conditional expectations. As
an extension, estimation with an estimated truncation quantile level is
considered. The proposed estimator is applied in two empirical settings: sharp
regression discontinuity designs with a manipulated running variable and
randomized experiments with sample selection.

arXiv link: http://arxiv.org/abs/2109.06150v1

Econometrics arXiv paper, submitted: 2021-09-12

Estimating a new panel MSK dataset for comparative analyses of national absorptive capacity systems, economic growth, and development in low and middle income economies

Authors: Muhammad Salar Khan

Within the national innovation system literature, empirical analyses are
severely lacking for developing economies. Particularly, the low- and
middle-income countries (LMICs) eligible for the World Bank's International
Development Association (IDA) support, are rarely part of any empirical
discourse on growth, development, and innovation. One major issue hindering panel analyses of LMICs, and thus their inclusion in any empirical discussion, is the lack of complete data. This work offers a new
complete panel dataset with no missing values for LMICs eligible for IDA's
support. I use a standard, widely respected multiple imputation technique
(specifically, Predictive Mean Matching) developed by Rubin (1987). This
technique respects the structure of multivariate continuous panel data at the
country level. I employ this technique to create a large dataset consisting of
many variables drawn from publicly available established sources. These
variables, in turn, capture six crucial country-level capacities: technological
capacity, financial capacity, human capital capacity, infrastructural capacity,
public policy capacity, and social capacity. Such capacities are part and
parcel of the National Absorptive Capacity Systems (NACS). The dataset (MSK
dataset) thus produced contains data on 47 variables for 82 LMICs between 2005
and 2019. The dataset has passed a quality and reliability check and can thus
be used for comparative analyses of national absorptive capacities and
development, transition, and convergence analyses among LMICs.

arXiv link: http://arxiv.org/abs/2109.05529v1

Econometrics arXiv updated paper (originally submitted: 2021-09-10)

{did2s}: Two-Stage Difference-in-Differences

Authors: Kyle Butts, John Gardner

Recent work has highlighted the difficulties of estimating
difference-in-differences models when treatment timing occurs at different
times for different units. This article introduces the R package did2s which
implements the estimator introduced in Gardner (2021). The article provides an
approachable review of the underlying econometric theory and introduces the
syntax for the function did2s. Further, the package introduces a function, event_study, that provides a common syntax for all the modern event-study estimators, and a function, plot_event_study, to plot the results of each estimator.
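
The did2s package itself is written in R; the following Python sketch (operating on a hypothetical panel data frame with columns unit, time, y, and a 0/1 treatment d) is only meant to illustrate the two-stage logic of Gardner (2021) that the package implements: fixed effects are estimated on untreated observations, and the adjusted outcome is then regressed on treatment.

import numpy as np
import pandas as pd

def two_stage_did(df):
    """Illustrative two-stage DiD sketch; df has columns 'unit', 'time', 'y', 'd'."""
    # Stage 1: two-way fixed effects estimated on untreated observations only.
    fe = pd.get_dummies(df[["unit", "time"]].astype(str), drop_first=True)
    fe.insert(0, "const", 1.0)
    X0 = fe[df["d"] == 0].to_numpy(float)
    y0 = df.loc[df["d"] == 0, "y"].to_numpy(float)
    coef, *_ = np.linalg.lstsq(X0, y0, rcond=None)

    # Remove the estimated fixed effects from every observation.
    y_adj = df["y"].to_numpy(float) - fe.to_numpy(float) @ coef

    # Stage 2: regress the adjusted outcome on the treatment indicator.
    d = df["d"].to_numpy(float)
    D = np.column_stack([np.ones_like(d), d])
    return np.linalg.lstsq(D, y_adj, rcond=None)[0][1]

Valid standard errors must account for the first-stage estimation, which is part of what the R package handles and this sketch does not.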

arXiv link: http://arxiv.org/abs/2109.05913v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-09-10

Implicit Copulas: An Overview

Authors: Michael Stanley Smith

Implicit copulas are the most common copula choice for modeling dependence in
high dimensions. This broad class of copulas is introduced and surveyed,
including elliptical copulas, skew $t$ copulas, factor copulas, time series
copulas and regression copulas. The common auxiliary representation of implicit copulas is outlined, along with how it makes them both scalable and tractable for statistical modeling. Issues such as parameter identification, extended
likelihoods for discrete or mixed data, parsimony in high dimensions, and
simulation from the copula model are considered. Bayesian approaches to
estimate the copula parameters, and predict from an implicit copula model, are
outlined. Particular attention is given to implicit copula processes constructed from time series and regression models, which are at the forefront
of current research. Two econometric applications -- one from macroeconomic
time series and the other from financial asset pricing -- illustrate the
advantages of implicit copula models.

arXiv link: http://arxiv.org/abs/2109.04718v1

Econometrics arXiv paper, submitted: 2021-09-09

Variable Selection for Causal Inference via Outcome-Adaptive Random Forest

Authors: Daniel Jacob

Estimating a causal effect from observational data can be biased if we do not
control for self-selection. This selection is based on confounding variables
that affect the treatment assignment and the outcome. Propensity score methods
aim to correct for confounding. However, not all covariates are confounders. We
propose the outcome-adaptive random forest (OARF) that only includes desirable
variables for estimating the propensity score to decrease bias and variance.
Our approach works in high-dimensional datasets and if the outcome and
propensity score model are non-linear and potentially complicated. The OARF
excludes covariates that are not associated with the outcome, even in the
presence of a large number of spurious variables. Simulation results suggest
that the OARF produces unbiased estimates, has a smaller variance and is
superior in variable selection compared to other approaches. The results from
two empirical examples, the effect of right heart catheterization on mortality
and the effect of maternal smoking during pregnancy on birth weight, show
comparable treatment effects to previous findings but tighter confidence
intervals and more plausible selected variables.

arXiv link: http://arxiv.org/abs/2109.04154v1

Econometrics arXiv updated paper (originally submitted: 2021-09-08)

Some Impossibility Results for Inference With Cluster Dependence with Large Clusters

Authors: Denis Kojevnikov, Kyungchul Song

This paper focuses on a setting with observations having a cluster dependence
structure and presents two main impossibility results. First, we show that when
there is only one large cluster, i.e., the researcher does not have any
knowledge on the dependence structure of the observations, it is not possible
to consistently discriminate the mean. When within-cluster observations satisfy
the uniform central limit theorem, we also show that a sufficient condition for
consistent $\sqrt{n}$-discrimination of the mean is that we have at least two
large clusters. This result shows some limitations for inference when we lack
information on the dependence structure of observations. Our second result
provides a necessary and sufficient condition for the cluster structure that
the long run variance is consistently estimable. Our result implies that when
there is at least one large cluster, the long run variance is not consistently
estimable.

arXiv link: http://arxiv.org/abs/2109.03971v4

Econometrics arXiv paper, submitted: 2021-09-08

On the estimation of discrete choice models to capture irrational customer behaviors

Authors: Sanjay Dominik Jena, Andrea Lodi, Claudio Sole

The Random Utility Maximization model is by far the most adopted framework to
estimate consumer choice behavior. However, behavioral economics has provided
strong empirical evidence of irrational choice behavior, such as halo effects,
that are incompatible with this framework. Models belonging to the Random
Utility Maximization family may therefore not accurately capture such
irrational behavior. Hence, more general choice models, overcoming such
limitations, have been proposed. However, the flexibility of such models comes
at the price of increased risk of overfitting. As such, estimating such models
remains a challenge. In this work, we propose an estimation method for the
recently proposed Generalized Stochastic Preference choice model, which
subsumes the family of Random Utility Maximization models and is capable of
capturing halo effects. Specifically, we show how to use partially-ranked
preferences to efficiently model rational and irrational customer types from
transaction data. Our estimation procedure is based on column generation, where
relevant customer types are efficiently extracted by expanding a tree-like data
structure containing the customer behaviors. Further, we propose a new
dominance rule among customer types whose effect is to prioritize low orders of
interactions among products. An extensive set of experiments assesses the
predictive accuracy of the proposed approach. Our results show that accounting
for irrational preferences can boost predictive accuracy by 12.5% on average,
when tested on a real-world dataset from a large chain of grocery and drug
stores.

arXiv link: http://arxiv.org/abs/2109.03882v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-09-08

On a quantile autoregressive conditional duration model applied to high-frequency financial data

Authors: Helton Saulo, Narayanaswamy Balakrishnan, Roberto Vila

Autoregressive conditional duration (ACD) models are primarily used to deal
with data arising from times between two successive events. These models are
usually specified in terms of a time-varying conditional mean or median
duration. In this paper, we relax this assumption and consider a conditional
quantile approach to facilitate the modeling of different percentiles. The
proposed ACD quantile model is based on a skewed version of Birnbaum-Saunders
distribution, which provides better fitting of the tails than the traditional
Birnbaum-Saunders distribution, in addition to advancing the implementation of
an expectation conditional maximization (ECM) algorithm. A Monte Carlo
simulation study is performed to assess the behavior of the model as well as
the parameter estimation method and to evaluate a form of residual. A real
financial transaction data set is finally analyzed to illustrate the proposed
approach.

arXiv link: http://arxiv.org/abs/2109.03844v1

Econometrics arXiv updated paper (originally submitted: 2021-09-08)

Approximate Factor Models with Weaker Loadings

Authors: Jushan Bai, Serena Ng

Pervasive cross-section dependence is increasingly recognized as a
characteristic of economic data and the approximate factor model provides a
useful framework for analysis. Assuming a strong factor structure where
$\Lambda'\Lambda/N^\alpha$ is positive definite in the limit when $\alpha=1$, early
work established convergence of the principal component estimates of the
factors and loadings up to a rotation matrix. This paper shows that the
estimates are still consistent and asymptotically normal when $\alpha\in(0,1]$
albeit at slower rates and under additional assumptions on the sample size. The
results hold whether $\alpha$ is constant or varies across factor loadings. The
framework developed for heterogeneous loadings and the simplified proofs, which can also be used in strong factor analysis, are of independent interest.
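
As a reference point, the principal component estimator whose behavior the paper studies under weaker loadings can be computed as follows (a standard textbook sketch with the usual $F'F/T = I_r$ normalization, not code from the paper):

import numpy as np

def pc_factors(X, r):
    """Principal component estimates of an approximate factor model.

    X: (T x N) demeaned data matrix; r: number of factors.
    Returns F_hat (T x r) with F'F/T = I_r and Lambda_hat (N x r)."""
    T, N = X.shape
    # Eigen-decomposition of XX'/(TN); keep the r largest eigenvalues.
    eigval, eigvec = np.linalg.eigh(X @ X.T / (T * N))
    idx = np.argsort(eigval)[::-1][:r]
    F_hat = np.sqrt(T) * eigvec[:, idx]          # estimated factors
    Lambda_hat = X.T @ F_hat / T                 # estimated loadings
    return F_hat, Lambda_hat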

arXiv link: http://arxiv.org/abs/2109.03773v4

Econometrics arXiv updated paper (originally submitted: 2021-09-07)

The multilayer architecture of the global input-output network and its properties

Authors: Rosanna Grassi, Paolo Bartesaghi, Gian Paolo Clemente, Duc Thi Luu

We analyze the multilayer architecture of the global input-output network
using sectoral trade data (WIOD, 2016 release). With a focus on the mesoscale
structure and related properties, our multilayer analysis takes into
consideration the splitting into industry-based layers in order to catch more
peculiar relationships between countries that cannot be detected from the
analysis of the single-layer aggregated network. We can identify several large
international communities in which some countries trade more intensively in
some specific layers. However, interestingly, our results show that these
clusters can restructure and evolve over time. In general, not only does their internal composition change, but the centrality rankings of their members are also reordered, with industries from some countries diminishing in importance and industries from other countries gaining importance. These changes in the large
international clusters may reflect the outcomes and the dynamics of
cooperation, partner selection and competition among industries and among
countries in the global input-output network.

arXiv link: http://arxiv.org/abs/2109.02946v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-09-06

Semiparametric Estimation of Treatment Effects in Randomized Experiments

Authors: Susan Athey, Peter J. Bickel, Aiyou Chen, Guido W. Imbens, Michael Pollmann

We develop new semiparametric methods for estimating treatment effects. We
focus on settings where the outcome distributions may be thick tailed, where
treatment effects may be small, where sample sizes are large and where
assignment is completely random. This setting is of particular interest in
recent online experimentation. We propose using parametric models for the
treatment effects, leading to semiparametric models for the outcome
distributions. We derive the semiparametric efficiency bound for the treatment
effects for this setting, and propose efficient estimators. In the leading case
with constant quantile treatment effects one of the proposed efficient
estimators has an interesting interpretation as a weighted average of quantile
treatment effects, with the weights proportional to minus the second derivative
of the log of the density of the potential outcomes. Our analysis also suggests
an extension of Huber's model and trimmed mean to include asymmetry.
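
Written out, the interpretation described in the last sentences takes the schematic form (our rendering of the description above, not the paper's exact notation):

$$\hat\tau \;=\; \int_0^1 w(u)\,\widehat{\mathrm{QTE}}(u)\,du, \qquad w(u)\;\propto\; -\,\big(\log f\big)''\big(F^{-1}(u)\big), \qquad \int_0^1 w(u)\,du = 1,$$

where $f$ and $F$ denote the density and c.d.f. of the potential outcomes and $\widehat{\mathrm{QTE}}(u)$ is the estimated quantile treatment effect at quantile $u$.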

arXiv link: http://arxiv.org/abs/2109.02603v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-09-05

Optimal transport weights for causal inference

Authors: Eric Dunipace

Imbalance in covariate distributions leads to biased estimates of causal
effects. Weighting methods attempt to correct this imbalance but rely on
specifying models for the treatment assignment mechanism, which is unknown in
observational studies. This leaves researchers to choose the proper weighting
method and the appropriate covariate functions for these models without knowing
the correct combination to achieve distributional balance. In response to these
difficulties, we propose a nonparametric generalization of several other
weighting schemes found in the literature: Causal Optimal Transport. This new
method directly targets distributional balance by minimizing optimal transport
distances between treatment and control groups or, more generally, between any
source and target population. Our approach is semiparametrically efficient and
model-free but can also incorporate moments or any other important functions of
covariates that a researcher desires to balance. Moreover, our method can provide nonparametric estimates of the conditional mean outcome function, as well as nonparametric imputations of the missing potential outcomes, and we give rates of convergence for these estimators. We find that Causal Optimal
Transport outperforms competitor methods when both the propensity score and
outcome models are misspecified, indicating it is a robust alternative to
common weighting methods. Finally, we demonstrate the utility of our method in
an external control trial examining the effect of misoprostol versus oxytocin
for the treatment of post-partum hemorrhage.

arXiv link: http://arxiv.org/abs/2109.01991v4

Econometrics arXiv updated paper (originally submitted: 2021-09-03)

A Framework for Using Value-Added in Regressions

Authors: Antoine Deeb

As increasingly popular metrics of worker and institutional quality,
estimated value-added (VA) measures are now widely used as dependent or
explanatory variables in regressions. For example, VA is used as an explanatory
variable when examining the relationship between teacher VA and students'
long-run outcomes. Due to the multi-step nature of VA estimation, the standard
errors (SEs) researchers routinely use when including VA measures in OLS
regressions are incorrect. In this paper, I show that the assumptions
underpinning VA models naturally lead to a generalized method of moments (GMM)
framework. Using this insight, I construct correct SEs for regressions that
use VA as an explanatory variable and for regressions where VA is the outcome.
In addition, I identify the causes of incorrect SEs when using OLS, discuss the
need to adjust SEs under different sets of assumptions, and propose a more
efficient estimator for using VA as an explanatory variable. Finally, I
illustrate my results using data from North Carolina, and show that correcting
SEs results in an increase that is larger than the impact of clustering SEs.

arXiv link: http://arxiv.org/abs/2109.01741v3

Econometrics arXiv updated paper (originally submitted: 2021-09-03)

Dynamic Games in Empirical Industrial Organization

Authors: Victor Aguirregabiria, Allan Collard-Wexler, Stephen P. Ryan

This survey is organized around three main topics: models, econometrics, and
empirical applications. Section 2 presents the theoretical framework,
introduces the concept of Markov Perfect Nash Equilibrium, discusses existence
and multiplicity, and describes the representation of this equilibrium in terms
of conditional choice probabilities. We also discuss extensions of the basic
framework, including models in continuous time, the concepts of oblivious
equilibrium and experience-based equilibrium, and dynamic games where firms
have non-equilibrium beliefs. In section 3, we first provide an overview of the
types of data used in this literature, before turning to a discussion of
identification issues and results, and estimation methods. We review different
methods to deal with multiple equilibria and large state spaces. We also
describe recent developments for estimating games in continuous time and
incorporating serially correlated unobservables, and discuss the use of machine learning methods for solving and estimating dynamic games. Section 4 discusses empirical applications of dynamic games in IO. We start by describing the first empirical applications in this literature during the early 2000s. Then, we
review recent applications dealing with innovation, antitrust and mergers,
dynamic pricing, regulation, product repositioning, advertising, uncertainty
and investment, airline network competition, dynamic matching, and natural
resources. We conclude with our view of the progress made in this literature
and the remaining challenges.

arXiv link: http://arxiv.org/abs/2109.01725v2

Econometrics arXiv updated paper (originally submitted: 2021-09-01)

How to Detect Network Dependence in Latent Factor Models? A Bias-Corrected CD Test

Authors: M. Hashem Pesaran, Yimeng Xie

In a recent paper Juodis and Reese (2022) (JR) show that the application of
the CD test proposed by Pesaran (2004) to residuals from panels with latent
factors results in over-rejection. They propose a randomized test statistic to
correct for over-rejection, and add a screening component to achieve power.
This paper considers the same problem but from a different perspective, and
shows that the standard CD test remains valid if the latent factors are weak, in the sense that their strength is less than one half. In the case where latent factors are
strong, we propose a bias-corrected version, CD*, which is shown to be
asymptotically standard normal under the null of error cross-sectional
independence and have power against network type alternatives. This result is
shown to hold for pure latent factor models as well as for panel regression
models with latent factors. The case where the errors are serially correlated
is also considered. Small sample properties of the CD* test are investigated by Monte Carlo experiments, which show that the test has the correct size for strong and weak factors as well as for Gaussian and non-Gaussian errors. In contrast, it
is found that JR's test tends to over-reject in the case of panels with
non-Gaussian errors, and has low power against spatial network alternatives. In
an empirical application, using the CD* test, it is shown that there remains
spatial error dependence in a panel data model for real house price changes
across 377 Metropolitan Statistical Areas in the U.S., even after the effects
of latent factors are filtered out.
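
For reference, the uncorrected CD statistic of Pesaran (2004) that the paper starts from can be computed from an $N \times T$ matrix of residuals as below (a standard sketch; the bias correction defining CD* is given in the paper and is not reproduced here):

import numpy as np

def cd_statistic(e):
    """Pesaran (2004) CD statistic from an (N x T) array of residuals e."""
    N, T = e.shape
    R = np.corrcoef(e)                      # N x N pairwise correlations
    iu = np.triu_indices(N, k=1)            # upper-triangular pairs i < j
    return np.sqrt(2.0 * T / (N * (N - 1))) * R[iu].sum()

Under the null of error cross-sectional independence (and suitable conditions), the statistic is compared with the standard normal distribution.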

arXiv link: http://arxiv.org/abs/2109.00408v7

Econometrics arXiv updated paper (originally submitted: 2021-09-01)

Matching Theory and Evidence on Covid-19 using a Stochastic Network SIR Model

Authors: M. Hashem Pesaran, Cynthia Fan Yang

This paper develops an individual-based stochastic network SIR model for the
empirical analysis of the Covid-19 pandemic. It derives moment conditions for
the number of infected and active cases for single as well as multigroup
epidemic models. These moment conditions are used to investigate the
identification and estimation of the transmission rates. The paper then
proposes a method that jointly estimates the transmission rate and the
magnitude of under-reporting of infected cases. Empirical evidence on six
European countries matches the simulated outcomes once the under-reporting of
infected cases is addressed. It is estimated that the number of actual cases could be between 4 and 10 times higher than the reported numbers in October 2020, declining to between 2 and 3 times by April 2021. The calibrated models are used in
the counterfactual analyses of the impact of social distancing and vaccination
on the epidemic evolution, and the timing of early interventions in the UK and
Germany.

arXiv link: http://arxiv.org/abs/2109.00321v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-09-01

A generalized bootstrap procedure of the standard error and confidence interval estimation for inverse probability of treatment weighting

Authors: Tenglong Li, Jordan Lawson

The inverse probability of treatment weighting (IPTW) approach is commonly
used in propensity score analysis to infer causal effects in regression models.
Due to oversized IPTW weights and errors associated with propensity score
estimation, the IPTW approach can underestimate the standard error of the causal effect. To remedy this, bootstrap standard errors have been recommended to replace the IPTW standard error, but the ordinary bootstrap (OB) procedure
might still result in underestimation of the standard error because of its
inefficient sampling algorithm and un-stabilized weights. In this paper, we
develop a generalized bootstrap (GB) procedure for estimating the standard
error of the IPTW approach. Compared with the OB procedure, the GB procedure
has much lower risk of underestimating the standard error and is more efficient
for both point and standard error estimates. The GB procedure also has smaller
risk of standard error underestimation than the ordinary bootstrap procedure
with trimmed weights, with comparable efficiencies. We demonstrate the
effectiveness of the GB procedure via a simulation study and a dataset from the
National Educational Longitudinal Study-1988 (NELS-88).
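
To make the baseline concrete, the following Python sketch shows a textbook IPTW estimate of the average treatment effect together with an ordinary-bootstrap (OB) standard error, i.e., the procedure the proposed GB method is designed to improve upon (the GB algorithm itself is not reproduced here):

import numpy as np
from sklearn.linear_model import LogisticRegression

def iptw_ate(X, d, y):
    """IPTW estimate of the ATE with a logistic propensity score model."""
    ps = LogisticRegression(max_iter=1000).fit(X, d).predict_proba(X)[:, 1]
    w1, w0 = d / ps, (1 - d) / (1 - ps)
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)

def ordinary_bootstrap_se(X, d, y, B=500, seed=0):
    """Ordinary bootstrap SE: resample units with replacement, re-fit everything."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = []
    for _ in range(B):
        idx = rng.integers(0, n, n)
        draws.append(iptw_ate(X[idx], d[idx], y[idx]))
    return np.std(draws, ddof=1)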

arXiv link: http://arxiv.org/abs/2109.00171v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2021-08-31

Look Who's Talking: Interpretable Machine Learning for Assessing Italian SMEs Credit Default

Authors: Lisa Crosato, Caterina Liberati, Marco Repetto

Academic research and the financial industry have recently paid great
attention to Machine Learning algorithms due to their power to solve complex
learning tasks. In the field of firms' default prediction, however, the lack of
interpretability has prevented the extensive adoption of the black-box type of
models. To overcome this drawback and maintain the high performances of
black-boxes, this paper relies on a model-agnostic approach. Accumulated Local
Effects and Shapley values are used to shape the predictors' impact on the
likelihood of default and rank them according to their contribution to the
model outcome. Prediction is achieved by two Machine Learning algorithms
(eXtreme Gradient Boosting and FeedForward Neural Network) compared with three
standard discriminant models. Results show that, in our analysis of the Italian Small and Medium Enterprise manufacturing industry, the eXtreme Gradient Boosting algorithm delivers the highest overall classification power without giving up a rich interpretation framework.
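
A minimal sketch of this model-agnostic workflow, using the public xgboost and shap libraries on synthetic data (not the authors' data, and omitting the Accumulated Local Effects step), looks as follows:

import numpy as np
import shap
import xgboost as xgb
from sklearn.datasets import make_classification

# Synthetic stand-in for firm-level predictors X and 0/1 default labels y.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9], random_state=0)

model = xgb.XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.05)
model.fit(X, y)

# Shapley values attribute each prediction to the individual predictors.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Rank predictors by mean absolute contribution to the default score.
ranking = np.argsort(np.abs(shap_values).mean(axis=0))[::-1]
print("Most influential predictors (column indices):", ranking[:10])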

arXiv link: http://arxiv.org/abs/2108.13914v2

Econometrics arXiv updated paper (originally submitted: 2021-08-31)

Wild Bootstrap for Instrumental Variables Regressions with Weak and Few Clusters

Authors: Wenjie Wang, Yichong Zhang

We study the wild bootstrap inference for instrumental variable regressions
in the framework of a small number of large clusters in which the number of
clusters is viewed as fixed and the number of observations for each cluster
diverges to infinity. We first show that the wild bootstrap Wald test, with or
without using the cluster-robust covariance estimator, controls size
asymptotically up to a small error as long as the parameters of endogenous
variables are strongly identified in at least one of the clusters. Then, we
establish the required number of strong clusters for the test to have power
against local alternatives. We further develop a wild bootstrap Anderson-Rubin
test for the full-vector inference and show that it controls size
asymptotically up to a small error even under weak or partial identification in
all clusters. We illustrate the good finite sample performance of the new
inference methods using simulations and provide an empirical application to a
well-known dataset about US local labor markets.

arXiv link: http://arxiv.org/abs/2108.13707v5

Econometrics arXiv updated paper (originally submitted: 2021-08-28)

Dynamic Selection in Algorithmic Decision-making

Authors: Jin Li, Ye Luo, Xiaowei Zhang

This paper identifies and addresses dynamic selection problems in online
learning algorithms with endogenous data. In a contextual multi-armed bandit
model, a novel bias (self-fulfilling bias) arises because the endogeneity of
the data influences the choices of decisions, affecting the distribution of
future data to be collected and analyzed. We propose an
instrumental-variable-based algorithm to correct for the bias. It obtains true
parameter values and attains low (logarithmic-like) regret levels. We also
prove a central limit theorem for statistical inference. To establish the
theoretical properties, we develop a general technique that untangles the
interdependence between data and actions.

arXiv link: http://arxiv.org/abs/2108.12547v3

Econometrics arXiv updated paper (originally submitted: 2021-08-27)

Revisiting Event Study Designs: Robust and Efficient Estimation

Authors: Kirill Borusyak, Xavier Jaravel, Jann Spiess

We develop a framework for difference-in-differences designs with staggered
treatment adoption and heterogeneous causal effects. We show that conventional
regression-based estimators fail to provide unbiased estimates of relevant
estimands absent strong restrictions on treatment-effect homogeneity. We then
derive the efficient estimator addressing this challenge, which takes an
intuitive "imputation" form when treatment-effect heterogeneity is
unrestricted. We characterize the asymptotic behavior of the estimator, propose
tools for inference, and develop tests for identifying assumptions. Our method
applies with time-varying controls, in triple-difference designs, and with
certain non-binary treatments. We show the practical relevance of our results
in a simulation study and an application. Studying the consumption response to
tax rebates in the United States, we find that the notional marginal propensity
to consume is between 8 and 11 percent in the first quarter - about half as
large as benchmark estimates used to calibrate macroeconomic models - and
predominantly occurs in the first month after the rebate.
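
In its simplest form, the "imputation" estimator referenced above can be summarized as follows (a schematic two-way fixed effects version; see the paper for the general weighting and inference results):

$$\hat Y_{it}(0) \;=\; \hat\alpha_i + \hat\beta_t, \qquad \hat\tau_{it} \;=\; Y_{it} - \hat Y_{it}(0), \qquad \hat\tau \;=\; \sum_{(i,t)\ \text{treated}} w_{it}\,\hat\tau_{it},$$

where $\hat\alpha_i$ and $\hat\beta_t$ are estimated using untreated observations only and the weights $w_{it}$ define the estimand of interest.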

arXiv link: http://arxiv.org/abs/2108.12419v5

Econometrics arXiv updated paper (originally submitted: 2021-08-26)

Identification of Peer Effects using Panel Data

Authors: Marisa Miraldo, Carol Propper, Christiern Rose

We provide new identification results for panel data models with peer effects
operating through unobserved individual heterogeneity. The results apply for
general network structures governing peer interactions and allow for correlated
effects. Identification hinges on a conditional mean restriction requiring
exogenous mobility of individuals between groups over time. We apply our method
to surgeon-hospital-year data to study take-up of keyhole surgery for cancer,
finding a positive effect of the average individual heterogeneity of other surgeons practicing in the same hospital.

arXiv link: http://arxiv.org/abs/2108.11545v4

Econometrics arXiv updated paper (originally submitted: 2021-08-25)

Double Machine Learning and Automated Confounder Selection -- A Cautionary Tale

Authors: Paul Hünermund, Beyers Louw, Itamar Caspi

Double machine learning (DML) has become an increasingly popular tool for
automated variable selection in high-dimensional settings. Even though the
ability to deal with a large number of potential covariates can render
selection-on-observables assumptions more plausible, there is at the same time
a growing risk that endogenous variables are included, which would lead to the
violation of conditional independence. This paper demonstrates that DML is very
sensitive to the inclusion of only a few "bad controls" in the covariate space.
The resulting bias varies with the nature of the theoretical causal model,
which raises concerns about the feasibility of selecting control variables in a
data-driven way.

arXiv link: http://arxiv.org/abs/2108.11294v4

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2021-08-24

Continuous Treatment Recommendation with Deep Survival Dose Response Function

Authors: Jie Zhu, Blanca Gallego

We propose a general formulation for continuous treatment recommendation
problems in settings with clinical survival data, which we call the Deep
Survival Dose Response Function (DeepSDRF). That is, we consider the problem of
learning the conditional average dose response (CADR) function solely from
historical data in which observed factors (confounders) affect both observed
treatment and time-to-event outcomes. The estimated treatment effect from
DeepSDRF enables us to develop recommender algorithms with the correction for
selection bias. We compared two recommender approaches based on random search
and reinforcement learning and found similar performance in terms of patient
outcome. We tested the DeepSDRF and the corresponding recommender on extensive
simulation studies and the eICU Research Institute (eRI) database. To the best
of our knowledge, this is the first time that causal models are used to address
the continuous treatment effect with observational data in a medical context.

arXiv link: http://arxiv.org/abs/2108.10453v5

Econometrics arXiv updated paper (originally submitted: 2021-08-23)

Feasible Weighted Projected Principal Component Analysis for Factor Models with an Application to Bond Risk Premia

Authors: Sung Hoon Choi

I develop a feasible weighted projected principal component (FPPC) analysis
for factor models in which observable characteristics partially explain the
latent factors. This novel method provides more efficient and accurate
estimators than existing methods. To increase estimation efficiency, I take
into account both cross-sectional dependence and heteroskedasticity by using a
consistent estimator of the inverse error covariance matrix as the weight
matrix. To improve accuracy, I employ a projection approach using
characteristics because it removes noise components in high-dimensional factor
analysis. By using the FPPC method, estimators of the factors and loadings have
faster rates of convergence than those of the conventional factor analysis.
Moreover, I propose an FPPC-based diffusion index forecasting model. The
limiting distribution of the parameter estimates and the rate of convergence
for forecast errors are obtained. Using U.S. bond market and macroeconomic
data, I demonstrate that the proposed model outperforms models based on
conventional principal component estimators. I also show that the proposed
model performs well among a large group of machine learning techniques in
forecasting excess bond returns.

arXiv link: http://arxiv.org/abs/2108.10250v3

Econometrics arXiv updated paper (originally submitted: 2021-08-21)

Inference in high-dimensional regression models without the exact or $L^p$ sparsity

Authors: Jooyoung Cha, Harold D. Chiang, Yuya Sasaki

This paper proposes a new method of inference in high-dimensional regression
models and high-dimensional IV regression models. Estimation is based on a
combined use of the orthogonal greedy algorithm, high-dimensional Akaike
information criterion, and double/debiased machine learning. The method of
inference for any low-dimensional subvector of high-dimensional parameters is
based on a root-$N$ asymptotic normality, which is shown to hold without
requiring the exact sparsity condition or the $L^p$ sparsity condition.
Simulation studies demonstrate superior finite-sample performance of this
proposed method over those based on the LASSO or the random forest, especially
under less sparse models. We illustrate an application to production analysis
with a panel of Chilean firms.

arXiv link: http://arxiv.org/abs/2108.09520v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2021-08-21

A Maximum Entropy Copula Model for Mixed Data: Representation, Estimation, and Applications

Authors: Subhadeep Mukhopadhyay

A new nonparametric model of maximum-entropy (MaxEnt) copula density function
is proposed, which offers the following advantages: (i) it is valid for mixed random vectors. By `mixed' we mean the method works for any combination of discrete or continuous variables in a fully automated manner; (ii) it yields a bona fide density estimate with interpretable parameters. By `bona fide' we mean the estimate is guaranteed to be a non-negative function that integrates to 1; and
(iii) it plays a unifying role in our understanding of a large class of
statistical methods. Our approach utilizes modern machinery of nonparametric
statistics to represent and approximate log-copula density function via
LP-Fourier transform. Several real-data examples are also provided to explore
the key theoretical and practical implications of the theory.

arXiv link: http://arxiv.org/abs/2108.09438v2

Econometrics arXiv updated paper (originally submitted: 2021-08-20)

Regression Discontinuity Designs

Authors: Matias D. Cattaneo, Rocio Titiunik

The Regression Discontinuity (RD) design is one of the most widely used
non-experimental methods for causal inference and program evaluation. Over the
last two decades, statistical and econometric methods for RD analysis have
expanded and matured, and there is now a large number of methodological results
for RD identification, estimation, inference, and validation. We offer a
curated review of this methodological literature organized around the two most
popular frameworks for the analysis and interpretation of RD designs: the
continuity framework and the local randomization framework. For each framework,
we discuss three main topics: (i) designs and parameters, which focuses on
different types of RD settings and treatment effects of interest; (ii)
estimation and inference, which presents the most popular methods based on
local polynomial regression and analysis of experiments, as well as
refinements, extensions, and alternatives; and (iii) validation and
falsification, which summarizes an array of mostly empirical approaches to
support the validity of RD designs in practice.
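
As a reminder of the canonical setup within the continuity framework, the sharp RD parameter can be written in standard notation (not specific to this review) as

$$\tau_{\text{SRD}} \;=\; E\big[Y_i(1)-Y_i(0)\mid X_i=c\big] \;=\; \lim_{x\downarrow c} E[Y_i\mid X_i=x] \;-\; \lim_{x\uparrow c} E[Y_i\mid X_i=x],$$

with the two one-sided limits typically estimated by local polynomial (e.g., local linear) regressions of $Y_i$ on $X_i-c$ using only observations within a bandwidth $h$ of the cutoff, weighted by a kernel $K((X_i-c)/h)$.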

arXiv link: http://arxiv.org/abs/2108.09400v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2021-08-20

Efficient Online Estimation of Causal Effects by Deciding What to Observe

Authors: Shantanu Gupta, Zachary C. Lipton, David Childers

Researchers often face data fusion problems, where multiple data sources are
available, each capturing a distinct subset of variables. While problem
formulations typically take the data as given, in practice, data acquisition
can be an ongoing process. In this paper, we aim to estimate any functional of
a probabilistic model (e.g., a causal effect) as efficiently as possible, by
deciding, at each time, which data source to query. We propose online moment
selection (OMS), a framework in which structural assumptions are encoded as
moment conditions. The optimal action at each step depends, in part, on the
very moments that identify the functional of interest. Our algorithms balance
exploration with choosing the best action as suggested by current estimates of
the moments. We propose two selection strategies: (1) explore-then-commit
(OMS-ETC) and (2) explore-then-greedy (OMS-ETG), proving that both achieve zero
asymptotic regret as assessed by MSE. We instantiate our setup for average
treatment effect estimation, where structural assumptions are given by a causal
graph and data sources may include subsets of mediators, confounders, and
instrumental variables.

arXiv link: http://arxiv.org/abs/2108.09265v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2021-08-20

A Theoretical Analysis of the Stationarity of an Unrestricted Autoregression Process

Authors: Varsha S. Kulkarni

The higher dimensional autoregressive models would describe some of the
econometric processes relatively generically if they incorporate the
heterogeneity in dependence on times. This paper analyzes the stationarity of
an autoregressive process of dimension $k>1$ having a sequence of coefficients
$\beta$ multiplied by successively increasing powers of $0<\delta<1$. The
theorem gives the conditions of stationarity in simple relations between the
coefficients and $k$ in terms of $\delta$. Computationally, the evidence of
stationarity depends on the parameters. The choice of $\delta$ sets the bounds
on $\beta$ and the number of time lags for prediction of the model.

arXiv link: http://arxiv.org/abs/2108.09083v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-08-17

Causal Inference with Noncompliance and Unknown Interference

Authors: Tadao Hoshino, Takahide Yanagi

We consider a causal inference model in which individuals interact in a
social network and they may not comply with the assigned treatments. In
particular, we suppose that the form of network interference is unknown to
researchers. To estimate meaningful causal parameters in this situation, we
introduce a new concept of exposure mapping, which summarizes potentially
complicated spillover effects into a fixed dimensional statistic of
instrumental variables. We investigate identification conditions for the
intention-to-treat effects and the average treatment effects for compliers,
while explicitly considering the possibility of misspecification of exposure
mapping. Based on our identification results, we develop nonparametric
estimation procedures via inverse probability weighting. Their asymptotic
properties, including consistency and asymptotic normality, are investigated
using an approximate neighborhood interference framework. For an empirical
illustration, we apply our method to experimental data on the anti-conflict
intervention school program. The proposed methods are readily available with
the companion R package latenetwork.

arXiv link: http://arxiv.org/abs/2108.07455v5

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2021-08-17

InfoGram and Admissible Machine Learning

Authors: Subhadeep Mukhopadhyay

We have entered a new era of machine learning (ML), where the most accurate
algorithm with superior predictive power may not even be deployable, unless it
is admissible under the regulatory constraints. This has led to great interest
in developing fair, transparent and trustworthy ML methods. The purpose of this
article is to introduce a new information-theoretic learning framework
(admissible machine learning) and algorithmic risk-management tools (InfoGram,
L-features, ALFA-testing) that can guide an analyst to redesign off-the-shelf
ML methods to be regulatory compliant, while maintaining good prediction
accuracy. We have illustrated our approach using several real-data examples
from financial sectors, biomedical research, marketing campaigns, and the
criminal justice system.

arXiv link: http://arxiv.org/abs/2108.07380v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-08-16

Density Sharpening: Principles and Applications to Discrete Data Analysis

Authors: Subhadeep Mukhopadhyay

This article introduces a general statistical modeling principle called
"Density Sharpening" and applies it to the analysis of discrete count data. The
underlying foundation is based on a new theory of nonparametric approximation
and smoothing methods for discrete distributions which play a useful role in
explaining and uniting a large class of applied statistical methods. The
proposed modeling framework is illustrated using several real applications,
from seismology to healthcare to physics.

arXiv link: http://arxiv.org/abs/2108.07372v3

Econometrics arXiv paper, submitted: 2021-08-14

Dimensionality Reduction and State Space Systems: Forecasting the US Treasury Yields Using Frequentist and Bayesian VARs

Authors: Sudiksha Joshi

Using a state-space system, I forecasted the US Treasury yields by employing
frequentist and Bayesian methods after first decomposing the yields of varying maturities into their unobserved term structure factors. Then, I exploited the
structure of the state-space model to forecast the Treasury yields and compared
the forecast performance of each model using mean squared forecast error. Among
the frequentist methods, I applied the two-step Diebold-Li, two-step principal
components, and one-step Kalman filter approaches. Likewise, I imposed the five
different priors in Bayesian VARs: Diffuse, Minnesota, natural conjugate, the independent normal inverse-Wishart, and the stochastic search variable
selection priors. After forecasting the Treasury yields for 9 different
forecast horizons, I found that the BVAR with Minnesota prior generally
minimizes the loss function. I augmented the above BVARs by including
macroeconomic variables and constructed impulse response functions with a
recursive ordering identification scheme. Finally, I fitted a sign-restricted
BVAR with dummy observations.
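
For readers unfamiliar with the two-step Diebold-Li approach named above, a
minimal sketch is: fit Nelson-Siegel loadings by cross-sectional least squares
each period to extract level, slope, and curvature factors, then forecast each
factor with an AR(1). The maturities (in months), the decay parameter
lambda = 0.0609, and the simulated yields are illustrative assumptions, not the
paper's data or specification.

import numpy as np

def nelson_siegel_loadings(maturities, lam=0.0609):
    """Nelson-Siegel level/slope/curvature loadings (maturities in months)."""
    m = np.asarray(maturities, dtype=float)
    x = lam * m
    slope = (1 - np.exp(-x)) / x
    curve = slope - np.exp(-x)
    return np.column_stack([np.ones_like(m), slope, curve])

def diebold_li_forecast(yields, maturities, lam=0.0609):
    """Two-step Diebold-Li: per-period OLS of yields on the loadings, then an
    AR(1) forecast of each factor, mapped back to a forecast yield curve."""
    X = nelson_siegel_loadings(maturities, lam)            # (n_maturities, 3)
    betas = np.linalg.lstsq(X, yields.T, rcond=None)[0].T  # (T, 3) factor series
    f_next = np.empty(3)
    for j in range(3):
        b = betas[:, j]
        A = np.column_stack([np.ones(len(b) - 1), b[:-1]])
        c, rho = np.linalg.lstsq(A, b[1:], rcond=None)[0]
        f_next[j] = c + rho * b[-1]                        # one-step-ahead factor
    return X @ f_next                                      # forecast yield curve

# toy usage with simulated yields; maturities are in months
rng = np.random.default_rng(1)
mats = np.array([3, 6, 12, 24, 36, 60, 120])
T = 200
factors = np.cumsum(rng.normal(scale=0.1, size=(T, 3)), axis=0) + np.array([5.0, -1.0, 0.5])
Y = factors @ nelson_siegel_loadings(mats).T + rng.normal(scale=0.05, size=(T, len(mats)))
print(diebold_li_forecast(Y, mats))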

arXiv link: http://arxiv.org/abs/2108.06553v1

Econometrics arXiv updated paper (originally submitted: 2021-08-14)

Evidence Aggregation for Treatment Choice

Authors: Takuya Ishihara, Toru Kitagawa

Consider a planner who has limited knowledge of the policy's causal impact on
a certain local population of interest due to a lack of data, but does have
access to the publicized intervention studies performed for similar policies on
different populations. How should the planner make use of and aggregate this
existing evidence to make her policy decision? Following Manski (2020; Towards
Credible Patient-Centered Meta-Analysis, Epidemiology), we formulate
the planner's problem as a statistical decision problem with a social welfare
objective, and solve for an optimal aggregation rule under the minimax-regret
criterion. We investigate the analytical properties, computational feasibility,
and welfare regret performance of this rule. We apply the minimax regret
decision rule to two settings: whether to enact an active labor market policy
based on 14 randomized controlled trials; and whether to approve a drug
(Remdesivir) for COVID-19 treatment using a meta-database of clinical trials.

arXiv link: http://arxiv.org/abs/2108.06473v2

Econometrics arXiv updated paper (originally submitted: 2021-08-13)

Identification of Incomplete Preferences

Authors: Luca Rigotti, Arie Beresteanu

We provide a sharp identification region for discrete choice models where
consumers' preferences are not necessarily complete even if only aggregate
choice data is available. Behavior is modeled using an upper and a lower
utility for each alternative so that non-comparability can arise. The
identification region places intuitive bounds on the probability distribution
of upper and lower utilities. We show that the existence of an instrumental
variable can be used to reject the hypothesis that the preferences of all
consumers are complete. We apply our methods to data from the 2018 mid-term
elections in Ohio.

arXiv link: http://arxiv.org/abs/2108.06282v4

Econometrics arXiv updated paper (originally submitted: 2021-08-13)

A Unified Frequency Domain Cross-Validatory Approach to HAC Standard Error Estimation

Authors: Zhihao Xu, Clifford M. Hurvich

A unified frequency domain cross-validation (FDCV) method is proposed to
obtain a heteroskedasticity and autocorrelation consistent (HAC) standard
error. This method enables model/tuning parameter selection across both
parametric and nonparametric spectral estimators simultaneously. The candidate
class for this approach consists of restricted maximum likelihood-based (REML)
autoregressive spectral estimators and lag-weights estimators with the Parzen
kernel. Additionally, an efficient technique for computing the REML estimators
of autoregressive models is provided. Simulations demonstrate the reliability
of the FDCV method, which compares favorably with popular HAC
estimators such as those of Andrews-Monahan and Newey-West.

arXiv link: http://arxiv.org/abs/2108.06093v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-08-12

An Optimal Transport Approach to Estimating Causal Effects via Nonlinear Difference-in-Differences

Authors: William Torous, Florian Gunsilius, Philippe Rigollet

We propose a nonlinear difference-in-differences method to estimate
multivariate counterfactual distributions in classical treatment and control
study designs with observational data. Our approach sheds new light on
existing approaches such as changes-in-changes and the classical
semiparametric difference-in-differences estimator and generalizes them to
settings with multivariate heterogeneity in the outcomes. The main benefit of
this extension is that it allows for arbitrary dependence and heterogeneity in
the joint outcomes. We demonstrate its utility both on synthetic and real data.
In particular, we revisit the classical Card & Krueger dataset, examining the
effect of a minimum wage increase on employment in fast food restaurants; a
reanalysis with our method reveals that restaurants tend to substitute
full-time with part-time labor at a faster pace after a minimum wage increase.
A previous version of this work was entitled "An optimal transport approach to
causal inference".
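
As a point of reference for the generalization described above, here is a
minimal sketch of the classical (univariate) changes-in-changes estimator that
the paper extends to multivariate outcomes: map the treated group's period-0
outcomes through the control group's period-0 ranks and period-1 quantiles to
obtain a counterfactual. The data and the constant treatment effect are toy
assumptions.

import numpy as np

def ecdf(sample):
    s = np.sort(sample)
    return lambda v: np.searchsorted(s, v, side="right") / len(s)

def cic_att(y00, y01, y10, y11):
    """Changes-in-changes ATT: map treated period-0 outcomes through the control
    group's period-0 ranks and period-1 quantiles to build the counterfactual."""
    ranks = ecdf(y00)(y10)
    y10_counterfactual = np.quantile(y01, np.clip(ranks, 0.0, 1.0))
    return np.mean(y11) - np.mean(y10_counterfactual)

# toy usage: a common nonlinear time shift plus a treatment effect of 1.0
rng = np.random.default_rng(2)
u_control, u_treated = rng.normal(0.0, 1.0, (2, 1000)), rng.normal(0.3, 1.0, (2, 1000))
y00, y01 = u_control[0], np.exp(u_control[1])
y10, y11 = u_treated[0], np.exp(u_treated[1]) + 1.0
print(cic_att(y00, y01, y10, y11))   # roughly 1.0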

arXiv link: http://arxiv.org/abs/2108.05858v2

Econometrics arXiv updated paper (originally submitted: 2021-08-12)

Sparse Temporal Disaggregation

Authors: Luke Mosley, Idris Eckley, Alex Gibberd

Temporal disaggregation is a method commonly used in official statistics to
enable high-frequency estimates of key economic indicators, such as GDP.
Traditionally, such methods have relied on only a couple of high-frequency
indicator series to produce estimates. However, the prevalence of large, and
increasing, volumes of administrative and alternative data sources motivates
the need for such methods to be adapted for high-dimensional settings. In this
article, we propose a novel sparse temporal-disaggregation procedure and
contrast this with the classical Chow-Lin method. We demonstrate the
performance of our proposed method through a simulation study, highlighting
the various advantages realised. We also explore its application to the disaggregation
of UK gross domestic product data, demonstrating the method's ability to
operate when the number of potential indicators is greater than the number of
low-frequency observations.

arXiv link: http://arxiv.org/abs/2108.05783v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-08-10

Multiway empirical likelihood

Authors: Harold D Chiang, Yukitoshi Matsushita, Taisuke Otsu

This paper develops a general methodology to conduct statistical inference
for observations indexed by multiple sets of entities. We propose a novel
multiway empirical likelihood statistic that converges to a chi-square
distribution in the non-degenerate case, where the corresponding Hoeffding-type
decomposition is dominated by linear terms. Our methodology is related to the
notion of jackknife empirical likelihood, but the leave-out pseudo-values are
constructed by leaving out columns or rows. We further develop a modified version
of our multiway empirical likelihood statistic, which converges to a chi-square
distribution regardless of degeneracy, and discover its desirable
higher-order property compared to the t-ratio based on the conventional
Eicker-White-type variance estimator. The proposed methodology is illustrated by
several important statistical problems, such as bipartite networks, generalized
estimating equations, and three-way observations.

arXiv link: http://arxiv.org/abs/2108.04852v6

Econometrics arXiv paper, submitted: 2021-08-10

Weighted asymmetric least squares regression with fixed-effects

Authors: Amadou Barry, Karim Oualkacha, Arthur Charpentier

The fixed-effects model estimates the regressor effects on the mean of the
response, which is inadequate to summarize the variable relationships in the
presence of heteroscedasticity. In this paper, we adapt the asymmetric least
squares (expectile) regression to the fixed-effects model and propose a new
model: expectile regression with fixed effects (ERFE). The ERFE model
applies the within-transformation strategy to concentrate out the incidental
parameters and estimates the regressor effects on the expectiles of the response
distribution. The ERFE model captures the data heteroscedasticity and
eliminates any bias resulting from the correlation between the regressors and
the omitted factors. We derive the asymptotic properties of the ERFE
estimators and suggest robust estimators of their covariance matrix. Our
simulations show that the ERFE estimator is unbiased and outperforms its
competitors. Our real-data analysis shows its ability to capture data
heteroscedasticity (see our R package, github.com/AmBarry/erfe).
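
A minimal sketch of the two ingredients named in the abstract, assuming nothing
about the ERFE package itself: a within transformation to remove unit effects,
followed by asymmetric least squares (expectile) regression fit by iteratively
reweighted least squares. The panel, the heteroskedasticity pattern, and the
convergence settings are illustrative.

import numpy as np

def within_transform(a, ids):
    """Subtract unit-specific means (the within transformation)."""
    out = np.asarray(a, dtype=float).copy()
    for g in np.unique(ids):
        mask = ids == g
        out[mask] -= out[mask].mean(axis=0)
    return out

def expectile_regression(X, y, tau=0.5, n_iter=200, tol=1e-10):
    """Asymmetric least squares via IRLS: weights are tau above the fit, 1 - tau below."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        w = np.where(y - X @ beta >= 0, tau, 1 - tau)
        Xw = X * w[:, None]
        beta_new = np.linalg.solve(Xw.T @ X, Xw.T @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# toy panel: fixed effects plus errors whose scale increases with x,
# so the fitted expectile slopes should rise with tau
rng = np.random.default_rng(3)
n_units, n_periods = 100, 8
ids = np.repeat(np.arange(n_units), n_periods)
alpha = rng.normal(size=n_units)[ids]
x = 0.5 * alpha + rng.normal(size=ids.size)      # regressor correlated with the fixed effect
y = alpha + 1.0 * x + np.exp(0.3 * x) * rng.normal(size=ids.size)
xw, yw = within_transform(x, ids)[:, None], within_transform(y, ids)
for tau in (0.1, 0.5, 0.9):
    print(tau, expectile_regression(xw, yw, tau))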

arXiv link: http://arxiv.org/abs/2108.04737v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-08-09

Controlling for Unmeasured Confounding in Panel Data Using Minimal Bridge Functions: From Two-Way Fixed Effects to Factor Models

Authors: Guido Imbens, Nathan Kallus, Xiaojie Mao

We develop a new approach for identifying and estimating average causal
effects in panel data under a linear factor model with unmeasured confounders.
Compared to other methods tackling factor models such as synthetic controls and
matrix completion, our method does not require the number of time periods to
grow infinitely. Instead, we draw inspiration from the two-way fixed effect
model as a special case of the linear factor model, where a simple
difference-in-differences transformation identifies the effect. We show that
analogous, albeit more complex, transformations exist in the more general
linear factor model, providing a new means to identify the effect in that
model. In fact, many such transformations exist, called bridge functions, all
identifying the same causal effect estimand. This poses a unique challenge for
estimation and inference, which we solve by targeting the minimal bridge
function using a regularized estimation approach. We prove that our resulting
average causal effect estimator is root-N consistent and asymptotically normal,
and we provide asymptotically valid confidence intervals. Finally, we provide
extensions for the case of a linear factor model with time-varying unmeasured
confounders.

arXiv link: http://arxiv.org/abs/2108.03849v1

Econometrics arXiv paper, submitted: 2021-08-08

Improving Inference from Simple Instruments through Compliance Estimation

Authors: Stephen Coussens, Jann Spiess

Instrumental variables (IV) regression is widely used to estimate causal
treatment effects in settings where receipt of treatment is not fully random,
but there exists an instrument that generates exogenous variation in treatment
exposure. While IV methods can recover consistent treatment effect estimates,
these estimates are often noisy. Building upon earlier work in biostatistics (Joffe and Brensinger,
2003) and relating to an evolving literature in econometrics (including Abadie
et al., 2019; Huntington-Klein, 2020; Borusyak and Hull, 2020), we study how to
improve the efficiency of IV estimates by exploiting the predictable variation
in the strength of the instrument. In the case where both the treatment and
instrument are binary and the instrument is independent of baseline covariates,
we study weighting each observation according to its estimated compliance (that
is, its conditional probability of being affected by the instrument), which we
motivate from a (constrained) solution of the first-stage prediction problem
implicit in IV. The resulting estimator can leverage machine learning to
estimate compliance as a function of baseline covariates. We derive the
large-sample properties of a specific implementation of a weighted IV estimator
in the potential outcomes and local average treatment effect (LATE) frameworks,
and provide tools for inference that remain valid even when the weights are
estimated nonparametrically. With both theoretical results and a simulation
study, we demonstrate that compliance weighting meaningfully reduces the
variance of IV estimates when first-stage heterogeneity is present, and that
this improvement often outweighs any difference between the compliance-weighted
and unweighted IV estimands. These results suggest that in a variety of applied
settings, the precision of IV estimates can be substantially improved by
incorporating compliance estimation.
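
A minimal sketch of the compliance-weighting idea, under the assumption (as in
the abstract) that treatment and instrument are binary and the instrument is
independent of covariates: estimate each unit's compliance score
P(D=1 | Z=1, X) - P(D=1 | Z=0, X) and use it to weight a Wald/IV ratio. The
logistic-regression learner and the data-generating process are illustrative
stand-ins, not the paper's implementation.

import numpy as np
from sklearn.linear_model import LogisticRegression

def compliance_weighted_iv(y, d, z, X):
    """Weight each observation by its estimated compliance score
    P(D=1 | Z=1, X) - P(D=1 | Z=0, X), then form a weighted Wald/IV ratio."""
    m1 = LogisticRegression(max_iter=1000).fit(X[z == 1], d[z == 1])
    m0 = LogisticRegression(max_iter=1000).fit(X[z == 0], d[z == 0])
    w = np.clip(m1.predict_proba(X)[:, 1] - m0.predict_proba(X)[:, 1], 0.0, None)
    zbar = np.average(z, weights=w)
    num = np.average((z - zbar) * y, weights=w)   # weighted Cov(Z, Y)
    den = np.average((z - zbar) * d, weights=w)   # weighted Cov(Z, D)
    return num / den

# toy data: compliance depends strongly on the first covariate; LATE = 2
rng = np.random.default_rng(4)
n = 5000
X = rng.normal(size=(n, 2))
z = rng.binomial(1, 0.5, n)
complier = rng.binomial(1, 1 / (1 + np.exp(-3 * X[:, 0])))
d = np.where(complier == 1, z, rng.binomial(1, 0.3, n))
y = 2.0 * d + rng.normal(size=n)
print(compliance_weighted_iv(y, d, z, X))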

arXiv link: http://arxiv.org/abs/2108.03726v1

Econometrics arXiv paper, submitted: 2021-08-08

A Theoretical Analysis of Logistic Regression and Bayesian Classifiers

Authors: Roman V. Kirin

This study aims to show the fundamental difference between logistic
regression and Bayesian classifiers in the case of exponential and
non-exponential families of distributions, yielding the following findings.
First, logistic regression is a less general representation of a Bayesian
classifier. Second, one must assume class distributions in order to correctly
specify the logistic regression equations. Third, in specific cases, there is
no difference between the predicted probabilities from a correctly specified
generative Bayesian classifier and a discriminative logistic regression.
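
The third claim has a textbook special case: two Gaussian classes with a shared
covariance matrix, where the Bayes posterior is exactly logistic in the
features. The toy check below, with illustrative simulated data, compares the
predicted probabilities of a generative LDA fit and a discriminative logistic
regression.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression

# two Gaussian classes with a shared covariance: the Bayes posterior is exactly
# logistic in x, so the generative (LDA) and discriminative (logit) fits should
# produce nearly identical predicted probabilities in large samples
rng = np.random.default_rng(5)
n = 20_000
y = rng.binomial(1, 0.4, n)
X = rng.normal(size=(n, 2)) + np.column_stack([1.5 * y, -1.0 * y])

p_lda = LinearDiscriminantAnalysis().fit(X, y).predict_proba(X)[:, 1]
p_logit = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]
print(np.max(np.abs(p_lda - p_logit)))   # small, and shrinking as n grows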

arXiv link: http://arxiv.org/abs/2108.03715v1

Econometrics arXiv updated paper (originally submitted: 2021-08-08)

Including the asymmetry of the Lorenz curve into measures of economic inequality

Authors: Mario Schlemmer

The Gini index signals only the dispersion of the distribution and is not
very sensitive to income differences at the tails of the distribution. The
widely used index of inequality can be adjusted to also measure distributional
asymmetry by attaching weights to the distances between the Lorenz curve and
the 45-degree line. The measure is equivalent to the Gini if the distribution
is symmetric. The alternative measure of inequality inherits good properties
from the Gini but is more sensitive to changes in the extremes of the income
distribution.
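
A minimal sketch of the general construction described above: the Gini
coefficient equals twice the area between the 45-degree line and the Lorenz
curve, and a weighted variant attaches a weight function to that gap at each
population share. The particular weight function used in the example is a
placeholder for illustration only, not the paper's proposed asymmetry
weighting.

import numpy as np

def lorenz_curve(income):
    x = np.sort(np.asarray(income, dtype=float))
    cum_share = np.concatenate([[0.0], np.cumsum(x)]) / x.sum()
    pop_share = np.linspace(0.0, 1.0, len(x) + 1)
    return pop_share, cum_share

def weighted_gini(income, weight_fn=lambda p: np.ones_like(p)):
    """Twice the weighted area between the 45-degree line and the Lorenz curve;
    with unit weights this reduces to the ordinary Gini coefficient."""
    p, L = lorenz_curve(income)
    gap = weight_fn(p) * (p - L)
    return 2.0 * np.sum((gap[1:] + gap[:-1]) / 2.0 * np.diff(p))   # trapezoid rule

rng = np.random.default_rng(6)
income = rng.lognormal(mean=0.0, sigma=0.8, size=10_000)
print(weighted_gini(income))                               # ordinary Gini
print(weighted_gini(income, weight_fn=lambda p: 2 * p))    # placeholder weight stressing the top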

arXiv link: http://arxiv.org/abs/2108.03623v2

Econometrics arXiv paper, submitted: 2021-08-07

Fully Modified Least Squares Cointegrating Parameter Estimation in Multicointegrated Systems

Authors: Igor L. Kheifets, Peter C. B. Phillips

Multicointegration is traditionally defined as a particular long run
relationship among variables in a parametric vector autoregressive model that
introduces additional cointegrating links between these variables and partial
sums of the equilibrium errors. This paper departs from the parametric model,
using a semiparametric formulation that reveals the explicit role that
singularity of the long run conditional covariance matrix plays in determining
multicointegration. The semiparametric framework has the advantage that short
run dynamics do not need to be modeled and estimation by standard techniques
such as fully modified least squares (FM-OLS) on the original I(1) system is
straightforward. The paper derives FM-OLS limit theory in the multicointegrated
setting, showing how faster rates of convergence are achieved in the direction
of singularity and that the limit distribution depends on the distribution of
the conditional one-sided long run covariance estimator used in FM-OLS
estimation. Wald tests of restrictions on the regression coefficients have
nonstandard limit theory which depends on nuisance parameters in general. The
usual tests are shown to be conservative when the restrictions are isolated to
the directions of singularity and, under certain conditions, are invariant to
singularity otherwise. Simulations show that approximations derived in the
paper work well in finite samples. The findings are illustrated empirically in
an analysis of fiscal sustainability of the US government over the post-war
period.

arXiv link: http://arxiv.org/abs/2108.03486v1

Econometrics arXiv updated paper (originally submitted: 2021-08-07)

Culling the herd of moments with penalized empirical likelihood

Authors: Jinyuan Chang, Zhentao Shi, Jia Zhang

Models defined by moment conditions are at the center of structural
econometric estimation, but economic theory is mostly agnostic about moment
selection. While a large pool of valid moments can potentially improve
estimation efficiency, a few invalid ones may undermine
consistency. This paper investigates the empirical likelihood estimation of
these moment-defined models in high-dimensional settings. We propose a
penalized empirical likelihood (PEL) estimator and establish its oracle
property with consistent detection of invalid moments. The PEL estimator is
asymptotically normally distributed, and a projected PEL procedure further
eliminates its asymptotic bias and provides more accurate normal approximation
to the finite sample behavior. Simulation exercises demonstrate excellent
numerical performance of these methods in estimation and inference.

arXiv link: http://arxiv.org/abs/2108.03382v4

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2021-08-06

Building a Foundation for Data-Driven, Interpretable, and Robust Policy Design using the AI Economist

Authors: Alexander Trott, Sunil Srinivasa, Douwe van der Wal, Sebastien Haneuse, Stephan Zheng

Optimizing economic and public policy is critical to address socioeconomic
issues and trade-offs, e.g., improving equality, productivity, or wellness, and
poses a complex mechanism design problem. A policy designer needs to consider
multiple objectives, policy levers, and behavioral responses from strategic
actors who optimize for their individual objectives. Moreover, real-world
policies should be explainable and robust to simulation-to-reality gaps, e.g.,
due to calibration issues. Existing approaches are often limited to a narrow
set of policy levers or objectives that are hard to measure, do not yield
explicit optimal policies, or do not consider strategic behavior, for example.
Hence, it remains challenging to optimize policy in real-world scenarios. Here
we show that the AI Economist framework enables effective, flexible, and
interpretable policy design using two-level reinforcement learning (RL) and
data-driven simulations. We validate our framework on optimizing the stringency
of US state policies and Federal subsidies during a pandemic, e.g., COVID-19,
using a simulation fitted to real data. We find that log-linear policies
trained using RL significantly improve social welfare, based on both public
health and economic outcomes, compared to past outcomes. Their behavior can be
explained, e.g., well-performing policies respond strongly to changes in
recovery and vaccination rates. They are also robust to calibration errors,
e.g., infection rates that are over- or underestimated. To date, real-world
policymaking has seen little large-scale adoption of machine learning methods,
including RL and AI-driven simulations. Our results show the potential of AI to
guide policy design and improve social welfare amidst the complexity of the
real world.

arXiv link: http://arxiv.org/abs/2108.02904v1

Econometrics arXiv updated paper (originally submitted: 2021-08-05)

Sparse Generalized Yule-Walker Estimation for Large Spatio-temporal Autoregressions with an Application to NO2 Satellite Data

Authors: Hanno Reuvers, Etienne Wijler

We consider a high-dimensional model in which variables are observed over
time and space. The model consists of a spatio-temporal regression containing a
time lag and a spatial lag of the dependent variable. Unlike classical spatial
autoregressive models, we do not rely on a predetermined spatial interaction
matrix, but infer all spatial interactions from the data. Assuming sparsity, we
estimate the spatial and temporal dependence in a fully data-driven way by penalizing a
set of Yule-Walker equations. This regularization can be left unstructured, but
we also propose customized shrinkage procedures when observations originate
from spatial grids (e.g. satellite images). Finite sample error bounds are
derived and estimation consistency is established in an asymptotic framework
wherein the sample size and the number of spatial units diverge jointly.
Exogenous variables can be included as well. A simulation exercise shows strong
finite sample performance compared to competing procedures. As an empirical
application, we model satellite measured NO2 concentrations in London. Our
approach delivers forecast improvements over a competitive benchmark and we
discover evidence for strong spatial interactions.

arXiv link: http://arxiv.org/abs/2108.02864v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-08-04

Synthetic Controls for Experimental Design

Authors: Alberto Abadie, Jinglong Zhao

This article studies experimental design in settings where the experimental
units are large aggregate entities (e.g., markets), and only one or a small
number of units can be exposed to the treatment. In such settings,
randomization of the treatment may result in treated and control groups with
very different characteristics at baseline, inducing biases. We propose a
variety of experimental non-randomized synthetic control designs (Abadie,
Diamond and Hainmueller, 2010, Abadie and Gardeazabal, 2003) that select the
units to be treated, as well as the untreated units to be used as a control
group. Average potential outcomes are estimated as weighted averages of the
outcomes of treated units for potential outcomes with treatment, and weighted
averages the outcomes of control units for potential outcomes without
treatment. We analyze the properties of estimators based on synthetic control
designs and propose new inferential techniques. We show that in experimental
settings with aggregate units, synthetic control designs can substantially
reduce estimation biases in comparison to randomization of the treatment.
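
The core synthetic control step referenced above can be sketched as a
constrained least-squares problem: choose nonnegative donor weights that sum to
one and best reproduce the treated unit's pre-treatment path. The data below
are toy, and the design questions in the paper (which units to treat, how to
conduct inference) are not addressed here.

import numpy as np
from scipy.optimize import minimize

def synthetic_control_weights(treated_pre, donors_pre):
    """Nonnegative donor weights summing to one that minimize the pre-treatment
    fit error; treated_pre is (T0,), donors_pre is (T0, J)."""
    J = donors_pre.shape[1]
    objective = lambda w: np.sum((treated_pre - donors_pre @ w) ** 2)
    res = minimize(
        objective,
        x0=np.full(J, 1.0 / J),
        bounds=[(0.0, 1.0)] * J,
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
        method="SLSQP",
    )
    return res.x

# toy example: the treated unit is (roughly) a 60/40 mix of donors 0 and 1
rng = np.random.default_rng(7)
T0, J = 30, 5
donors = rng.normal(size=(T0, J)).cumsum(axis=0)
treated = 0.6 * donors[:, 0] + 0.4 * donors[:, 1] + rng.normal(scale=0.05, size=T0)
print(np.round(synthetic_control_weights(treated, donors), 2))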

arXiv link: http://arxiv.org/abs/2108.02196v5

Econometrics arXiv updated paper (originally submitted: 2021-08-04)

Nested Pseudo Likelihood Estimation of Continuous-Time Dynamic Discrete Games

Authors: Jason R. Blevins, Minhae Kim

We introduce a sequential estimator for continuous time dynamic discrete
choice models (single-agent models and games) by adapting the nested pseudo
likelihood (NPL) estimator of Aguirregabiria and Mira (2002, 2007), developed
for discrete time models with discrete time data, to the continuous time case
with data sampled either discretely (i.e., uniformly-spaced snapshot data) or
continuously. We establish conditions for consistency and asymptotic normality
of the estimator, a local convergence condition, and, for single agent models,
a zero Jacobian property assuring local convergence. We carry out a series of
Monte Carlo experiments using an entry-exit game with five heterogeneous firms
to confirm the large-sample properties and demonstrate finite-sample bias
reduction via iteration. In our simulations we show that the convergence issues
documented for the NPL estimator in discrete time models are less likely to
affect comparable continuous-time models. We also show that there can be large
bias in economically-relevant parameters, such as the competitive effect and
entry cost, from estimating a misspecified discrete time model when in fact the
data generating process is a continuous time model.

arXiv link: http://arxiv.org/abs/2108.02182v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-08-04

Semiparametric Functional Factor Models with Bayesian Rank Selection

Authors: Daniel R. Kowal, Antonio Canale

Functional data are frequently accompanied by a parametric template that
describes the typical shapes of the functions. However, these parametric
templates can incur significant bias, which undermines both utility and
interpretability. To correct for model misspecification, we augment the
parametric template with an infinite-dimensional nonparametric functional
basis. The nonparametric basis functions are learned from the data and
constrained to be orthogonal to the parametric template, which preserves
distinctness between the parametric and nonparametric terms. This distinctness
is essential to prevent functional confounding, which otherwise induces severe
bias for the parametric terms. The nonparametric factors are regularized with
an ordered spike-and-slab prior that provides consistent rank selection and
satisfies several appealing theoretical properties. The versatility of the
proposed approach is illustrated through applications to synthetic data, human
motor control data, and dynamic yield curve data. Relative to parametric and
semiparametric alternatives, the proposed semiparametric functional factor
model eliminates bias, reduces excessive posterior and predictive uncertainty,
and provides reliable inference on the effective number of nonparametric
terms--all with minimal additional computational costs.

arXiv link: http://arxiv.org/abs/2108.02151v3

Econometrics arXiv updated paper (originally submitted: 2021-08-04)

Bayesian forecast combination using time-varying features

Authors: Li Li, Yanfei Kang, Feng Li

In this work, we propose a novel framework for density forecast combination
by constructing time-varying weights based on time series features, which is
called Feature-based Bayesian Forecasting Model Averaging (FEBAMA). Our
framework estimates weights in the forecast combination via Bayesian log
predictive scores, in which the optimal forecasting combination is determined
by time series features from historical information. In particular, we use an
automatic Bayesian variable selection method to weight the importance of
different features. As a result, our approach has better interpretability
compared to other black-box forecasting combination schemes. We apply our
framework to stock market data and M3 competition data. Based on our structure,
a simple maximum-a-posteriori scheme outperforms benchmark methods, and
Bayesian variable selection can further enhance the accuracy for both point and
density forecasts.

arXiv link: http://arxiv.org/abs/2108.02082v3

Econometrics arXiv cross-link from Computer Science – Computers and Society (cs.CY), submitted: 2021-08-03

Automated Identification of Climate Risk Disclosures in Annual Corporate Reports

Authors: David Friederich, Lynn H. Kaack, Alexandra Luccioni, Bjarne Steffen

It is important for policymakers to understand which financial policies are
effective in increasing climate risk disclosure in corporate reporting. We use
machine learning to automatically identify disclosures of five different types
of climate-related risks. For this purpose, we have created a dataset of over
120 manually-annotated annual reports by European firms. Applying our approach
to reporting of 337 firms over the last 20 years, we find that risk disclosure
is increasing. Disclosure of transition risks grows more dynamically than
physical risks, and there are marked differences across industries.
Country-specific dynamics indicate that regulatory environments potentially
have an important role to play for increasing disclosure.

arXiv link: http://arxiv.org/abs/2108.01415v1

Econometrics arXiv updated paper (originally submitted: 2021-08-03)

Learning Causal Models from Conditional Moment Restrictions by Importance Weighting

Authors: Masahiro Kato, Masaaki Imaizumi, Kenichiro McAlinn, Haruo Kakehi, Shota Yasui

We consider learning causal relationships under conditional moment
restrictions. Unlike causal inference under unconditional moment restrictions,
conditional moment restrictions pose serious challenges for causal inference,
especially in high-dimensional settings. To address this issue, we propose a
method that transforms conditional moment restrictions to unconditional moment
restrictions through importance weighting, using a conditional density ratio
estimator. Using this transformation, we successfully estimate nonparametric
functions defined under conditional moment restrictions. Our proposed framework
is general and can be applied to a wide range of methods, including neural
networks. We analyze the estimation error, providing theoretical support for
our proposed method. In experiments, we confirm the soundness of our proposed
method.

arXiv link: http://arxiv.org/abs/2108.01312v2

Econometrics arXiv updated paper (originally submitted: 2021-08-02)

Partial Identification and Inference for Conditional Distributions of Treatment Effects

Authors: Sungwon Lee

This paper considers identification and inference for the distribution of
treatment effects conditional on observable covariates. Since the conditional
distribution of treatment effects is not point identified without strong
assumptions, we obtain bounds on the conditional distribution of treatment
effects by using the Makarov bounds. We also consider the case where the
treatment is endogenous and propose two stochastic dominance assumptions to
tighten the bounds. We develop a nonparametric framework to estimate the bounds
and establish the asymptotic theory that is uniformly valid over the support of
treatment effects. An empirical example illustrates the usefulness of the
methods.
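
For concreteness, the unconditional Makarov bounds mentioned above can be
evaluated on a grid from the two marginal distributions alone; the sketch below
does exactly that with empirical CDFs. The conditional-on-covariates bounds,
the stochastic dominance tightenings, and the uniform inference developed in
the paper go beyond this illustration.

import numpy as np

def makarov_bounds(y1, y0, deltas, grid_size=400):
    """Bounds on P(Y1 - Y0 <= delta) from the two marginal ECDFs:
    L(d) = sup_y max(F1(y) - F0(y - d), 0),  U(d) = 1 + inf_y min(F1(y) - F0(y - d), 0)."""
    y1, y0 = np.sort(y1), np.sort(y0)
    F1 = lambda v: np.searchsorted(y1, v, side="right") / len(y1)
    F0 = lambda v: np.searchsorted(y0, v, side="right") / len(y0)
    grid = np.linspace(min(y1[0], y0[0]) - 1.0, max(y1[-1], y0[-1]) + 1.0, grid_size)
    lower, upper = [], []
    for d in deltas:
        diff = F1(grid) - F0(grid - d)
        lower.append(max(diff.max(), 0.0))
        upper.append(1.0 + min(diff.min(), 0.0))
    return np.array(lower), np.array(upper)

# toy example: the true unit-level effect is the constant 1
rng = np.random.default_rng(8)
y0 = rng.normal(0, 1, 5000)
y1 = rng.normal(0, 1, 5000) + 1.0
deltas = np.array([-1.0, 0.0, 1.0, 2.0, 3.0])
lo, up = makarov_bounds(y1, y0, deltas)
print(np.column_stack([deltas, lo, up]))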

arXiv link: http://arxiv.org/abs/2108.00723v6

Econometrics arXiv paper, submitted: 2021-08-01

Implementing an Improved Test of Matrix Rank in Stata

Authors: Qihui Chen, Zheng Fang, Xun Huang

We develop a Stata command, bootranktest, for implementing the matrix rank
test of Chen and Fang (2019) in linear instrumental variable regression models.
Existing rank tests employ critical values that may be too small, and hence may
not even be first-order valid in the sense that they may fail to control the
Type I error. By appealing to the bootstrap, Chen and Fang (2019) devise a test
that overcomes this deficiency. The command bootranktest implements the
two-step version of their test, and also the analytic version if chosen. The
command also accommodates data with temporal and cluster dependence.

arXiv link: http://arxiv.org/abs/2108.00511v1

Econometrics arXiv updated paper (originally submitted: 2021-07-30)

Semiparametric Estimation of Long-Term Treatment Effects

Authors: Jiafeng Chen, David M. Ritzwoller

Long-term outcomes of experimental evaluations are necessarily observed after
long delays. We develop semiparametric methods for combining the short-term
outcomes of experiments with observational measurements of short-term and
long-term outcomes, in order to estimate long-term treatment effects. We
characterize semiparametric efficiency bounds for various instances of this
problem. These calculations facilitate the construction of several estimators.
We analyze the finite-sample performance of these estimators with a simulation
calibrated to data from an evaluation of the long-term effects of a poverty
alleviation program.

arXiv link: http://arxiv.org/abs/2107.14405v5

Econometrics arXiv paper, submitted: 2021-07-29

Inference in heavy-tailed non-stationary multivariate time series

Authors: Matteo Barigozzi, Giuseppe Cavaliere, Lorenzo Trapani

We study inference on the common stochastic trends in a non-stationary,
$N$-variate time series $y_{t}$, in the possible presence of heavy tails. We
propose a novel methodology which does not require any knowledge or estimation
of the tail index, or even knowledge as to whether certain moments (such as the
variance) exist or not, and develop an estimator of the number of stochastic
trends $m$ based on the eigenvalues of the sample second moment matrix of
$y_{t}$. We study the rates of such eigenvalues, showing that the first $m$
ones diverge, as the sample size $T$ passes to infinity, at a rate faster by
$O\left(T \right)$ than the remaining $N-m$ ones, irrespective of the tail
index. We thus exploit this eigen-gap by constructing, for each eigenvalue, a
test statistic which diverges to positive infinity or drifts to zero according
to whether the relevant eigenvalue belongs to the set of the first $m$
eigenvalues or not. We then construct a randomised statistic based on this,
using it as part of a sequential testing procedure, ensuring consistency of the
resulting estimator of $m$. We also discuss an estimator of the common trends
based on principal components and show that, up to an invertible linear
transformation, such an estimator is consistent in the sense that the estimation
error is of smaller order than the trend itself. Finally, we also consider the
case in which we relax the standard assumption of i.i.d. innovations,
by allowing for heterogeneity of a very general form in the scale of the
innovations. A Monte Carlo study shows that the proposed estimator for $m$
performs particularly well, even in samples of small size. We complete the
paper by presenting four illustrative applications covering commodity prices,
interest rates data, long run PPP and cryptocurrency markets.

arXiv link: http://arxiv.org/abs/2107.13894v1

Econometrics arXiv cross-link from Quantitative Finance – Portfolio Management (q-fin.PM), submitted: 2021-07-29

Machine Learning and Factor-Based Portfolio Optimization

Authors: Thomas Conlon, John Cotter, Iason Kynigakis

We examine machine learning and factor-based portfolio optimization. We find
that factors based on autoencoder neural networks exhibit a weaker relationship
with commonly used characteristic-sorted portfolios than popular dimensionality
reduction techniques. Machine learning methods also lead to covariance and
portfolio weight structures that diverge from simpler estimators.
Minimum-variance portfolios using latent factors derived from autoencoders and
sparse methods outperform simpler benchmarks in terms of risk minimization.
These effects are amplified for investors with an increased sensitivity to
risk-adjusted returns, during high volatility periods or when accounting for
tail risk.

arXiv link: http://arxiv.org/abs/2107.13866v1

Econometrics arXiv updated paper (originally submitted: 2021-07-29)

Design-Robust Two-Way-Fixed-Effects Regression For Panel Data

Authors: Dmitry Arkhangelsky, Guido W. Imbens, Lihua Lei, Xiaoman Luo

We propose a new estimator for average causal effects of a binary treatment
with panel data in settings with general treatment patterns. Our approach
augments the popular two-way-fixed-effects specification with unit-specific
weights that arise from a model for the assignment mechanism. We show how to
construct these weights in various settings, including the staggered adoption
setting, where units opt into the treatment sequentially but permanently. The
resulting estimator converges to an average (over units and time) treatment
effect under the correct specification of the assignment model, even if the
fixed effect model is misspecified. We show that our estimator is more robust
than the conventional two-way estimator: it remains consistent if either the
assignment mechanism or the two-way regression model is correctly specified. In
addition, the proposed estimator performs better than the two-way-fixed-effect
estimator if the outcome model and assignment mechanism are locally
misspecified. This strong double robustness property underlines and quantifies
the benefits of modeling the assignment process and motivates using our
estimator in practice. We also discuss an extension of our estimator to handle
dynamic treatment effects.

arXiv link: http://arxiv.org/abs/2107.13737v3

Econometrics arXiv paper, submitted: 2021-07-27

Estimating high-dimensional Markov-switching VARs

Authors: Kenwin Maung

Maximum likelihood estimation of large Markov-switching vector
autoregressions (MS-VARs) can be challenging or infeasible due to parameter
proliferation. To accommodate situations where dimensionality may be of
comparable order to or exceeds the sample size, we adopt a sparse framework and
propose two penalized maximum likelihood estimators with either the Lasso or
the smoothly clipped absolute deviation (SCAD) penalty. We show that both
estimators are estimation consistent, while the SCAD estimator also selects
relevant parameters with probability approaching one. A modified EM-algorithm
is developed for the case of Gaussian errors and simulations show that the
algorithm exhibits desirable finite sample performance. In an application to
short-horizon return predictability in the US, we estimate a 15-variable,
2-state MS-VAR(1) and obtain the often-reported counter-cyclicality in
predictability. The variable selection property of our estimators helps to
identify predictors that contribute strongly to predictability during economic
contractions but are otherwise irrelevant in expansions. Furthermore,
out-of-sample analyses indicate that large MS-VARs can significantly outperform
"hard-to-beat" predictors like the historical average.

arXiv link: http://arxiv.org/abs/2107.12552v1

Econometrics arXiv updated paper (originally submitted: 2021-07-26)

A Unifying Framework for Testing Shape Restrictions

Authors: Zheng Fang

This paper makes the following original contributions. First, we develop a
unifying framework for testing shape restrictions based on the Wald principle.
The test has asymptotic uniform size control and is uniformly consistent.
Second, we examine the applicability and usefulness of some prominent shape
enforcing operators in implementing our framework. In particular, in stark
contrast to its use in point and interval estimation, the rearrangement
operator is inapplicable due to a lack of convexity. The greatest convex
minorization and the least concave majorization are shown to enjoy the analytic
properties required to employ our framework. Third, we show that, even though
the projection operator may not be well defined or well behaved in general parameter
spaces such as those defined by uniform norms, one may nonetheless employ a
powerful distance-based test by applying our framework. Monte Carlo simulations
confirm that our test works well. We further showcase the empirical relevance
by investigating the relationship between weekly working hours and the annual
wage growth in the high-end labor market.

arXiv link: http://arxiv.org/abs/2107.12494v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-07-26

Semiparametric Estimation of Treatment Effects in Observational Studies with Heterogeneous Partial Interference

Authors: Zhaonan Qu, Ruoxuan Xiong, Jizhou Liu, Guido Imbens

In many observational studies in social science and medicine, subjects or
units are connected, and one unit's treatment and attributes may affect
another's treatment and outcome, violating the stable unit treatment value
assumption (SUTVA) and resulting in interference. To enable feasible estimation
and inference, many previous works assume exchangeability of interfering units
(neighbors). However, in many applications with distinctive units, interference
is heterogeneous and needs to be modeled explicitly. In this paper, we focus on
the partial interference setting, and only restrict units to be exchangeable
conditional on observable characteristics. Under this framework, we propose
generalized augmented inverse propensity weighted (AIPW) estimators for general
causal estimands that include heterogeneous direct and spillover effects. We
show that they are semiparametric efficient and robust to heterogeneous
interference as well as model misspecifications. We apply our methods to the
Add Health dataset to study the direct effects of alcohol consumption on
academic performance and the spillover effects of parental incarceration on
adolescent well-being.
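
As background for the generalized AIPW estimators above, here is a minimal AIPW
sketch for an average treatment effect with no interference, the building block
that the paper extends to heterogeneous direct and spillover effects. The
nuisance models (logistic and linear regressions) and the simulated data are
illustrative.

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def aipw_ate(y, t, X):
    """Augmented IPW for the ATE: outcome regressions for each arm plus an
    inverse-propensity-weighted residual correction."""
    e = np.clip(LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1], 0.01, 0.99)
    mu1 = LinearRegression().fit(X[t == 1], y[t == 1]).predict(X)
    mu0 = LinearRegression().fit(X[t == 0], y[t == 0]).predict(X)
    psi = mu1 - mu0 + t * (y - mu1) / e - (1 - t) * (y - mu0) / (1 - e)
    return psi.mean(), psi.std(ddof=1) / np.sqrt(len(y))

# toy data with confounding through X; true ATE = 1.5
rng = np.random.default_rng(9)
n = 5000
X = rng.normal(size=(n, 3))
t = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
y = 1.5 * t + X @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=n)
print(aipw_ate(y, t, X))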

arXiv link: http://arxiv.org/abs/2107.12420v3

Econometrics arXiv updated paper (originally submitted: 2021-07-25)

Adaptive Estimation and Uniform Confidence Bands for Nonparametric Structural Functions and Elasticities

Authors: Xiaohong Chen, Timothy Christensen, Sid Kankanala

We introduce two data-driven procedures for optimal estimation and inference
in nonparametric models using instrumental variables. The first is a
data-driven choice of sieve dimension for a popular class of sieve two-stage
least squares estimators. When implemented with this choice, estimators of both
the structural function $h_0$ and its derivatives (such as elasticities)
converge at the fastest possible (i.e., minimax) rates in sup-norm. The second
is for constructing uniform confidence bands (UCBs) for $h_0$ and its
derivatives. Our UCBs guarantee coverage over a generic class of
data-generating processes and contract at the minimax rate, possibly up to a
logarithmic factor. As such, our UCBs are asymptotically more efficient than
UCBs based on the usual approach of undersmoothing. As an application, we
estimate the elasticity of the intensive margin of firm exports in a
monopolistic competition model of international trade. Simulations illustrate
the good performance of our procedures in empirically calibrated designs. Our
results provide evidence against common parameterizations of the distribution
of unobserved firm heterogeneity.

arXiv link: http://arxiv.org/abs/2107.11869v3

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2021-07-25

Federated Causal Inference in Heterogeneous Observational Data

Authors: Ruoxuan Xiong, Allison Koenecke, Michael Powell, Zhu Shen, Joshua T. Vogelstein, Susan Athey

We are interested in estimating the effect of a treatment applied to
individuals at multiple sites, where data is stored locally for each site. Due
to privacy constraints, individual-level data cannot be shared across sites;
the sites may also have heterogeneous populations and treatment assignment
mechanisms. Motivated by these considerations, we develop federated methods to
draw inference on the average treatment effects of combined data across sites.
Our methods first compute summary statistics locally using propensity scores
and then aggregate these statistics across sites to obtain point and variance
estimators of average treatment effects. We show that these estimators are
consistent and asymptotically normal. To achieve these asymptotic properties,
we find that the aggregation schemes need to account for the heterogeneity in
treatment assignments and in outcomes across sites. We demonstrate the validity
of our federated methods through a comparative study of two large medical
claims databases.
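
A simplified sketch of the federated workflow, under strong illustrative
assumptions: each site computes a local IPW estimate and variance from its own
data and shares only those two numbers, and the centre combines them by
inverse-variance weighting. The paper's aggregation schemes additionally adjust
for cross-site heterogeneity in treatment assignment and outcomes, which this
toy version does not.

import numpy as np
from sklearn.linear_model import LogisticRegression

def local_summary(y, t, X):
    """Computed at each site: an IPW ATE estimate and its variance (only these
    two summaries are shared, never the individual-level data)."""
    e = np.clip(LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1], 0.01, 0.99)
    psi = t * y / e - (1 - t) * y / (1 - e)
    return psi.mean(), psi.var(ddof=1) / len(y)

def aggregate(summaries):
    """Central server: inverse-variance weighted combination of the site estimates."""
    est = np.array([s[0] for s in summaries])
    var = np.array([s[1] for s in summaries])
    w = (1 / var) / np.sum(1 / var)
    return np.sum(w * est), np.sqrt(np.sum(w ** 2 * var))

# toy sites with different sizes and assignment mechanisms but a common effect of 1.0
rng = np.random.default_rng(10)
summaries = []
for site in range(4):
    n = 2000 + 500 * site
    X = rng.normal(size=(n, 2))
    t = rng.binomial(1, 1 / (1 + np.exp(-0.5 * X[:, 0] - 0.2 * site)))
    y = 1.0 * t + X[:, 0] + rng.normal(size=n)
    summaries.append(local_summary(y, t, X))
print(aggregate(summaries))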

arXiv link: http://arxiv.org/abs/2107.11732v5

Econometrics arXiv cross-link from Quantitative Finance – General Finance (q-fin.GN), submitted: 2021-07-21

The macroeconomic cost of climate volatility

Authors: Piergiorgio Alessandri, Haroon Mumtaz

We study the impact of climate volatility on economic growth exploiting data
on 133 countries between 1960 and 2019. We show that the conditional (ex ante)
volatility of annual temperatures increased steadily over time, rendering
climate conditions less predictable across countries, with important
implications for growth. Controlling for concomitant changes in temperatures, a
+1 degree C increase in temperature volatility causes on average a 0.3 percent
decline in GDP growth and a 0.7 percent increase in the volatility of GDP.
Unlike changes in average temperatures, changes in temperature volatility
affect both rich and poor countries.

arXiv link: http://arxiv.org/abs/2108.01617v2

Econometrics arXiv paper, submitted: 2021-07-20

Recent Developments in Inference: Practicalities for Applied Economics

Authors: Jeffrey D. Michler, Anna Josephson

We provide a review of recent developments in the calculation of standard
errors and test statistics for statistical inference. While much of the focus
of the last two decades in economics has been on generating unbiased
coefficients, recent years have seen a variety of advancements in correcting for
non-standard standard errors. We synthesize these recent advances in addressing
challenges to conventional inference, like heteroskedasticity, clustering,
serial correlation, and testing multiple hypotheses. We also discuss recent
advancements in numerical methods, such as the bootstrap, wild bootstrap, and
randomization inference. We make three specific recommendations. First, applied
economists need to clearly articulate the challenges to statistical inference
that are present in data as well as the source of those challenges. Second,
modern computing power and statistical software mean that applied economists
have no excuse for not correctly calculating their standard errors and test
statistics. Third, because complicated sampling strategies and research designs
make it difficult to work out the correct formula for standard errors and test
statistics, we believe that in the applied economics profession it should
become standard practice to rely on asymptotic refinements to the distribution
of an estimator or test statistic via bootstrapping. Throughout, we reference
built-in and user-written Stata commands that allow one to quickly calculate
accurate standard errors and relevant test statistics.
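
In the spirit of the bootstrap recommendation above, here is a minimal pairs
cluster bootstrap for OLS standard errors: resample whole clusters with
replacement and take the standard deviation of the re-estimated coefficients.
It is a generic sketch with toy data, not a substitute for the wild cluster
bootstrap or the Stata commands the paper discusses.

import numpy as np

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def cluster_pairs_bootstrap_se(X, y, clusters, n_boot=999, seed=0):
    """Resample whole clusters with replacement, re-estimate the OLS coefficients,
    and report the bootstrap standard deviation as the standard error."""
    rng = np.random.default_rng(seed)
    ids = np.unique(clusters)
    draws = []
    for _ in range(n_boot):
        take = rng.choice(ids, size=len(ids), replace=True)
        idx = np.concatenate([np.flatnonzero(clusters == g) for g in take])
        draws.append(ols(X[idx], y[idx]))
    return np.std(np.array(draws), axis=0, ddof=1)

# toy clustered data: 40 clusters with within-cluster correlated errors
rng = np.random.default_rng(12)
G, m = 40, 25
clusters = np.repeat(np.arange(G), m)
x = rng.normal(size=G)[clusters] + 0.3 * rng.normal(size=G * m)
u = rng.normal(size=G)[clusters] + rng.normal(size=G * m)
X = np.column_stack([np.ones(G * m), x])
y = X @ np.array([1.0, 2.0]) + u
print(cluster_pairs_bootstrap_se(X, y, clusters))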

arXiv link: http://arxiv.org/abs/2107.09736v1

Econometrics arXiv updated paper (originally submitted: 2021-07-20)

Distributional Effects with Two-Sided Measurement Error: An Application to Intergenerational Income Mobility

Authors: Brantly Callaway, Tong Li, Irina Murtazashvili, Emmanuel Tsyawo

This paper considers identification and estimation of distributional effect
parameters that depend on the joint distribution of an outcome and another
variable of interest ("treatment") in a setting with "two-sided" measurement
error -- that is, where both variables are possibly measured with error.
Examples of these parameters in the context of intergenerational income
mobility include transition matrices, rank-rank correlations, and the poverty
rate of children as a function of their parents' income, among others. Building
on recent work on quantile regression (QR) with measurement error in the
outcome (particularly, Hausman, Liu, Luo, and Palmer (2021)), we show that,
given (i) two linear QR models separately for the outcome and treatment
conditional on other observed covariates and (ii) assumptions about the
measurement error for each variable, one can recover the joint distribution of
the outcome and the treatment. Besides these conditions, our approach does not
require an instrument, repeated measurements, or distributional assumptions
about the measurement error. Using recent data from the 1997 National
Longitudinal Study of Youth, we find that accounting for measurement error
notably reduces several estimates of intergenerational mobility parameters.

arXiv link: http://arxiv.org/abs/2107.09235v4

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-07-19

Mind the Income Gap: Bias Correction of Inequality Estimators in Small-Sized Samples

Authors: Silvia De Nicolò, Maria Rosaria Ferrante, Silvia Pacei

Income inequality estimators are biased in small samples, generally leading
to underestimation. This aspect deserves particular attention when
estimating inequality in small domains and performing small area estimation at
the area level. We propose a bias correction framework for a large class of
inequality measures comprising the Gini Index, the Generalized Entropy and the
Atkinson index families by accounting for complex survey designs. The proposed
methodology does not require any parametric assumption on the income
distribution, making it very flexible. A design-based performance evaluation of
our proposal has been carried out using EU-SILC data; the results show a
noticeable bias reduction for all the measures. Lastly, an illustrative
application to small area estimation confirms that ignoring ex-ante bias
correction leads to model misspecification.

arXiv link: http://arxiv.org/abs/2107.08950v3

Econometrics arXiv paper, submitted: 2021-07-18

Decoupling Shrinkage and Selection for the Bayesian Quantile Regression

Authors: David Kohns, Tibor Szendrei

This paper extends the idea of decoupling shrinkage and sparsity for
continuous priors to Bayesian Quantile Regression (BQR). The procedure follows
two steps: In the first step, we shrink the quantile regression posterior
through state-of-the-art continuous priors and, in the second step, we sparsify
the posterior through an efficient variant of the adaptive lasso, the signal
adaptive variable selection (SAVS) algorithm. We propose a new variant of the
SAVS which automates the choice of penalisation through quantile-specific
loss functions that are valid in high dimensions. We show in large-scale
simulations that our selection procedure decreases bias irrespective of the
true underlying degree of sparsity in the data, compared to the un-sparsified
regression posterior. We apply our two-step approach to a high dimensional
growth-at-risk (GaR) exercise. The prediction accuracy of the un-sparsified
posterior is retained while yielding interpretable quantile-specific variable
selection results. Our procedure can be used to communicate to policymakers
which variables drive downside risk to the macroeconomy.

arXiv link: http://arxiv.org/abs/2107.08498v1

Econometrics arXiv updated paper (originally submitted: 2021-07-16)

Hamiltonian Monte Carlo for Regression with High-Dimensional Categorical Data

Authors: Szymon Sacher, Laura Battaglia, Stephen Hansen

Latent variable models are increasingly used in economics for
high-dimensional categorical data like text and surveys. We demonstrate the
effectiveness of Hamiltonian Monte Carlo (HMC) with parallelized automatic
differentiation for analyzing such data in a computationally efficient and
methodologically sound manner. Our new model, Supervised Topic Model with
Covariates, shows that carefully modeling this type of data can have
significant implications on conclusions compared to a simpler, frequently used,
yet methodologically problematic, two-step approach. A simulation study and
revisiting Bandiera et al. (2020)'s study of executive time use demonstrate
these results. The approach accommodates thousands of parameters and does not
require custom algorithms specific to each model, making it accessible to
applied researchers.

arXiv link: http://arxiv.org/abs/2107.08112v2

Econometrics arXiv updated paper (originally submitted: 2021-07-16)

Flexible Covariate Adjustments in Regression Discontinuity Designs

Authors: Claudia Noack, Tomasz Olma, Christoph Rothe

Empirical regression discontinuity (RD) studies often include covariates in
their specifications to increase the precision of their estimates. In this
paper, we propose a novel class of estimators that use such covariate
information more efficiently than existing methods and can accommodate many
covariates. Our estimators are simple to implement and involve running a
standard RD analysis after subtracting a function of the covariates from the
original outcome variable. We characterize the function of the covariates that
minimizes the asymptotic variance of these estimators. We also show that the
conventional RD framework gives rise to a special robustness property which
implies that the optimal adjustment function can be estimated flexibly via
modern machine learning techniques without affecting the first-order properties
of the final RD estimator. We demonstrate our methods' scope for efficiency
improvements by reanalyzing data from a large number of recently published
empirical studies.
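
The recipe described above (subtract a function of the covariates from the
outcome, then run a standard RD analysis) can be sketched as follows; the
linear covariate fit stands in for the flexible machine-learning adjustment the
paper allows, and the local linear estimator, bandwidth, and data are
illustrative.

import numpy as np
from sklearn.linear_model import LinearRegression

def local_linear_rd(y, x, h, cutoff=0.0):
    """Sharp RD estimate: difference of local linear intercepts at the cutoff,
    triangular kernel with bandwidth h."""
    def side_fit(mask):
        xm, ym = x[mask] - cutoff, y[mask]
        w = np.maximum(1 - np.abs(xm) / h, 0)
        Xm = np.column_stack([np.ones_like(xm), xm])
        WX = Xm * w[:, None]
        return np.linalg.solve(WX.T @ Xm, WX.T @ ym)[0]      # intercept at the cutoff
    near = np.abs(x - cutoff) <= h
    return side_fit(near & (x >= cutoff)) - side_fit(near & (x < cutoff))

def covariate_adjusted_rd(y, x, Z, h, cutoff=0.0):
    """Subtract a fitted function of the covariates from y, then rerun the RD."""
    y_adj = y - LinearRegression().fit(Z, y).predict(Z)
    return local_linear_rd(y_adj, x, h, cutoff)

# toy design: true jump of 0.5; covariates soak up much of the outcome noise
rng = np.random.default_rng(11)
n = 4000
x = rng.uniform(-1, 1, n)
Z = rng.normal(size=(n, 2))
y = 0.5 * (x >= 0) + 0.3 * x + Z @ np.array([1.0, -1.0]) + 0.2 * rng.normal(size=n)
print(local_linear_rd(y, x, h=0.3), covariate_adjusted_rd(y, x, Z, h=0.3))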

arXiv link: http://arxiv.org/abs/2107.07942v5

Econometrics arXiv paper, submitted: 2021-07-16

Subspace Shrinkage in Conjugate Bayesian Vector Autoregressions

Authors: Florian Huber, Gary Koop

Macroeconomists using large datasets often face the choice of working with
either a large Vector Autoregression (VAR) or a factor model. In this paper, we
develop methods for combining the two using a subspace shrinkage prior.
Subspace priors shrink towards a class of functions rather than directly
forcing the parameters of a model towards some pre-specified location. We
develop a conjugate VAR prior which shrinks towards the subspace which is
defined by a factor model. Our approach allows for estimating the strength of
the shrinkage as well as the number of factors. After establishing the
theoretical properties of our proposed prior, we carry out simulations and
apply it to US macroeconomic data. Using simulations we show that our framework
successfully detects the number of factors. In a forecasting exercise involving
a large macroeconomic data set we find that combining VARs with factor models
using our prior can lead to forecast improvements.

arXiv link: http://arxiv.org/abs/2107.07804v1

Econometrics arXiv paper, submitted: 2021-07-14

Generalized Covariance Estimator

Authors: Christian Gourieroux, Joann Jasiak

We consider a class of semi-parametric dynamic models with strong white noise
errors. This class of processes includes the standard Vector Autoregressive
(VAR) model, the nonfundamental structural VAR, the mixed causal-noncausal
models, as well as nonlinear dynamic models such as the (multivariate) ARCH-M
model. For estimation of processes in this class, we propose the Generalized
Covariance (GCov) estimator, which is obtained by minimizing a residual-based
multivariate portmanteau statistic as an alternative to the Generalized Method
of Moments. We derive the asymptotic properties of the GCov estimator and of
the associated residual-based portmanteau statistic. Moreover, we show that the
GCov estimators are semi-parametrically efficient and the residual-based
portmanteau statistics are asymptotically chi-square distributed. The finite
sample performance of the GCov estimator is illustrated in a simulation study.
The estimator is also applied to a dynamic model of cryptocurrency prices.

arXiv link: http://arxiv.org/abs/2107.06979v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-07-14

Time Series Estimation of the Dynamic Effects of Disaster-Type Shock

Authors: Richard Davis, Serena Ng

This paper provides three results for SVARs under the assumption that the
primitive shocks are mutually independent. First, a framework is proposed to
accommodate a disaster-type variable with infinite variance into a SVAR. We
show that the least squares estimates of the SVAR are consistent but have
non-standard asymptotics. Second, the disaster shock is identified as the
component with the largest kurtosis and whose impact effect is negative. An
estimator that is robust to infinite variance is used to recover the mutually
independent components. Third, an independence test on the residuals
pre-whitened by the Choleski decomposition is proposed to test the restrictions
imposed on a SVAR. The test can be applied whether the data have fat or thin
tails, and to over-identified as well as exactly identified models. Three applications are
considered. In the first, the independence test is used to shed light on the
conflicting evidence regarding the role of uncertainty in economic
fluctuations. In the second, disaster shocks are shown to have short term
economic impact arising mostly from feedback dynamics. The third uses the
framework to study the dynamic effects of economic shocks post-COVID.

arXiv link: http://arxiv.org/abs/2107.06663v3

Econometrics arXiv cross-link from Quantitative Finance – Statistical Finance (q-fin.ST), submitted: 2021-07-14

Financial Return Distributions: Past, Present, and COVID-19

Authors: Marcin Wątorek, Jarosław Kwapień, Stanisław Drożdż

We analyze the price return distributions of currency exchange rates,
cryptocurrencies, and contracts for differences (CFDs) representing stock
indices, stock shares, and commodities. Based on recent data from the years
2017--2020, we model tails of the return distributions at different time scales
by using power-law, stretched exponential, and $q$-Gaussian functions. We focus
on the fitted function parameters and how they change over the years by
comparing our results with those from earlier studies and find that, on the
time horizons of up to a few minutes, the so-called "inverse-cubic power-law"
still constitutes an appropriate global reference. However, we no longer
observe the hypothesized universal constant acceleration of the market time
flow that was manifested before in an ever faster convergence of empirical
return distributions towards the normal distribution. Our results do not
exclude such a scenario but, rather, suggest that some other short-term
processes related to a current market situation alter market dynamics and may
mask this scenario. Real market dynamics is associated with a continuous
alternation of different regimes with different statistical properties. An
example is the COVID-19 pandemic outbreak, which had an enormous yet short-lived
impact on financial markets. We also point out that two factors -- the speed of the
market time flow and the asset cross-correlation magnitude -- while related
(the larger the speed, the larger the cross-correlations on a given time
scale), act in opposite directions with regard to the return distribution
tails, which can affect the expected distribution convergence to the normal
distribution.

arXiv link: http://arxiv.org/abs/2107.06659v1

Econometrics arXiv paper, submitted: 2021-07-13

MinP Score Tests with an Inequality Constrained Parameter Space

Authors: Giuseppe Cavaliere, Zeng-Hua Lu, Anders Rahbek, Yuhong Yang

Score tests have the advantage of requiring estimation alone of the model
restricted by the null hypothesis, which often is much simpler than models
defined under the alternative hypothesis. This is typically so when the
alternative hypothesis involves inequality constraints. However, existing score
tests address only jointly testing all parameters of interest; a leading
example is testing all ARCH parameters or variances of random coefficients
being zero or not. In such testing problems, rejection of the null hypothesis
does not provide evidence on which specific elements of the parameter of
interest should be rejected. This paper proposes a class of one-sided score
tests for testing a model parameter that is subject to inequality constraints.
The proposed tests are constructed from the minimum of a set of $p$-values. The
minimand includes the $p$-values for testing individual elements of the
parameter of interest using individual scores, and it may be extended to
include a $p$-value from existing score tests. We show that our tests perform
at least as well as existing score tests in terms of joint testing, with the
added benefit of allowing individual elements of the parameter of interest to
be tested simultaneously. This added benefit is appealing in the sense that it
can identify a model without estimating it. We illustrate our tests in linear
regression, ARCH, and random coefficient models. A detailed simulation study
examines the finite-sample performance of the proposed tests and finds that
they perform well, as expected.
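
A minimal sketch of the MinP idea, assuming the individual scores and their
information matrix are already available: compute one-sided p-values from the
individual score statistics, take their minimum, and calibrate that minimum by
simulating correlated scores under the null. The interface and the Gaussian
calibration are illustrative simplifications, not the paper's exact procedure.

```python
import numpy as np
from scipy.stats import norm

def minp_score_test(score, info, n_sim=100_000, seed=0):
    """score: length-m vector of scores at the null-restricted estimate;
    info: m x m information matrix. Returns the observed minimum p-value and a
    simulated 5% critical value (reject if the observed minimum is smaller)."""
    se = np.sqrt(np.diag(info))
    z = score / se                          # individual one-sided z-statistics
    t_obs = (1.0 - norm.cdf(z)).min()       # minimum of one-sided p-values

    corr = info / np.outer(se, se)          # correlation of the scores
    rng = np.random.default_rng(seed)
    z_sim = rng.multivariate_normal(np.zeros(len(score)), corr, size=n_sim)
    t_sim = (1.0 - norm.cdf(z_sim)).min(axis=1)
    return t_obs, np.quantile(t_sim, 0.05)
```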

arXiv link: http://arxiv.org/abs/2107.06089v1

Econometrics arXiv updated paper (originally submitted: 2021-07-13)

Testability of Reverse Causality Without Exogenous Variation

Authors: Christoph Breunig, Patrick Burauel

This paper shows that testability of reverse causality is possible even in
the absence of exogenous variation, such as in the form of instrumental
variables. Instead of relying on exogenous variation, we achieve testability by
imposing relatively weak model restrictions and exploiting that a dependence of
residual and purported cause is informative about the causal direction. Our
main assumption is that the true functional relationship is nonlinear and that
error terms are additively separable. We extend previous results by
incorporating control variables and allowing heteroskedastic errors. We build
on reproducing kernel Hilbert space (RKHS) embeddings of probability
distributions to test conditional independence and demonstrate its efficacy in
detecting the causal direction in both Monte Carlo simulations and an
application to German survey data.
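
A minimal unconditional sketch of the underlying idea: regress the purported
effect on the purported cause with a flexible method (not shown), then test
dependence between the residual and the cause, here with an HSIC permutation
test. The paper's actual procedure uses RKHS-based conditional independence
with control variables and heteroskedastic errors, which this toy version
omits.

```python
import numpy as np

def rbf_gram(x, sigma=1.0):
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased HSIC estimate between two 1-d samples."""
    n = len(x)
    K, L = rbf_gram(x, sigma), rbf_gram(y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def residual_cause_pvalue(cause, resid, n_perm=500, seed=0):
    """Permutation p-value for dependence between the purported cause and the
    regression residual; a small value speaks against that causal direction."""
    stat = hsic(cause, resid)
    rng = np.random.default_rng(seed)
    perm = [hsic(cause, rng.permutation(resid)) for _ in range(n_perm)]
    return (1 + sum(p >= stat for p in perm)) / (n_perm + 1)
```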

arXiv link: http://arxiv.org/abs/2107.05936v2

Econometrics arXiv updated paper (originally submitted: 2021-07-12)

Identification of Average Marginal Effects in Fixed Effects Dynamic Discrete Choice Models

Authors: Victor Aguirregabiria, Jesus M. Carro

In nonlinear panel data models, fixed effects methods are often criticized
because they cannot identify average marginal effects (AMEs) in short panels.
The common argument is that identifying AMEs requires knowledge of the
distribution of unobserved heterogeneity, but this distribution is not
identified in a fixed effects model with a short panel. In this paper, we
derive identification results that contradict this argument. In a panel data
dynamic logit model, and for $T$ as small as three, we prove the point
identification of different AMEs, including causal effects of changes in the
lagged dependent variable or the last choice's duration. Our proofs are
constructive and provide simple closed-form expressions for the AMEs in terms
of probabilities of choice histories. We illustrate our results using Monte
Carlo experiments and with an empirical application of a dynamic structural
model of consumer brand choice with state dependence.

arXiv link: http://arxiv.org/abs/2107.06141v2

Econometrics arXiv updated paper (originally submitted: 2021-07-12)

Inference on Individual Treatment Effects in Nonseparable Triangular Models

Authors: Jun Ma, Vadim Marmer, Zhengfei Yu

In nonseparable triangular models with a binary endogenous treatment and a
binary instrumental variable, Vuong and Xu (2017) established identification
results for individual treatment effects (ITEs) under the rank invariance
assumption. Using their approach, Feng, Vuong, and Xu (2019) proposed a
uniformly consistent kernel estimator for the density of the ITE that utilizes
estimated ITEs. In this paper, we establish the asymptotic normality of the
density estimator of Feng, Vuong, and Xu (2019) and show that the ITE
estimation errors have a non-negligible effect on the asymptotic distribution
of the estimator. We propose asymptotically valid standard errors that account
for ITE estimation, as well as a bias correction. Furthermore, we develop
uniform confidence bands for the density of the ITE using the jackknife
multiplier or nonparametric bootstrap critical values.

arXiv link: http://arxiv.org/abs/2107.05559v4

Econometrics arXiv updated paper (originally submitted: 2021-07-12)

A Lucas Critique Compliant SVAR model with Observation-driven Time-varying Parameters

Authors: Giacomo Bormetti, Fulvio Corsi

We propose an observation-driven time-varying SVAR model where, in agreement
with the Lucas Critique, structural shocks drive both the evolution of the
macro variables and the dynamics of the VAR parameters. Contrary to existing
approaches where parameters follow a stochastic process with random and
exogenous shocks, our observation-driven specification allows the evolution of
the parameters to be driven by realized past structural shocks, thus opening
the possibility to gauge the impact of observed shocks and hypothetical policy
interventions on the future evolution of the economic system.

arXiv link: http://arxiv.org/abs/2107.05263v2

Econometrics arXiv updated paper (originally submitted: 2021-07-11)

Inference for the proportional odds cumulative logit model with monotonicity constraints for ordinal predictors and ordinal response

Authors: Javier Espinosa-Brito, Christian Hennig

The proportional odds cumulative logit model (POCLM) is a standard regression
model for an ordinal response. Ordinality of predictors can be incorporated by
monotonicity constraints for the corresponding parameters. It is shown that
estimators defined by optimization, such as maximum likelihood estimators, for
an unconstrained model and for parameters in the interior set of the parameter
space of a constrained model are asymptotically equivalent. This is used in
order to derive asymptotic confidence regions and tests for the constrained
model, involving simple modifications for finite samples. The finite sample
coverage probability of the confidence regions is investigated by simulation.
Tests concern the effect of individual variables, monotonicity, and a specified
monotonicity direction. The methodology is applied to real data related to the
assessment of school performance.

arXiv link: http://arxiv.org/abs/2107.04946v4

Econometrics arXiv paper, submitted: 2021-07-10

Machine Learning for Financial Forecasting, Planning and Analysis: Recent Developments and Pitfalls

Authors: Helmut Wasserbacher, Martin Spindler

This article is an introduction to machine learning for financial
forecasting, planning and analysis (FP&A). Machine learning appears well
suited to support FP&A with the highly automated extraction of information
from large amounts of data. However, because most traditional machine learning
techniques focus on forecasting (prediction), we discuss the particular care
that must be taken to avoid the pitfalls of using them for planning and
resource allocation (causal inference). While the naive application of machine
learning usually fails in this context, the recently developed double machine
learning framework can address causal questions of interest. We review the
current literature on machine learning in FP&A and illustrate in a simulation
study how machine learning can be used for both forecasting and planning. We
also investigate how forecasting and planning improve as the number of data
points increases.
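
For readers unfamiliar with the double machine learning framework mentioned
above, the following is a generic partialling-out sketch (cross-fitted
nuisance predictions followed by a residual-on-residual regression). The
random forest learner and the interface are arbitrary illustrative choices,
not the article's implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

def dml_partialling_out(y, d, X, n_splits=5, seed=0):
    """Double/debiased ML estimate of the effect of a planning lever d on an
    outcome y, flexibly controlling for confounders X (numpy arrays)."""
    learner = lambda: RandomForestRegressor(n_estimators=200, random_state=seed)
    y_hat = cross_val_predict(learner(), X, y, cv=n_splits)  # cross-fitting
    d_hat = cross_val_predict(learner(), X, d, cv=n_splits)
    y_res, d_res = y - y_hat, d - d_hat
    theta = (d_res @ y_res) / (d_res @ d_res)     # residual-on-residual slope
    eps = y_res - theta * d_res
    se = np.sqrt(np.mean(d_res ** 2 * eps ** 2)
                 / (np.mean(d_res ** 2) ** 2 * len(y)))
    return theta, se
```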

arXiv link: http://arxiv.org/abs/2107.04851v1

Econometrics arXiv updated paper (originally submitted: 2021-07-07)

Estimation and Inference in Factor Copula Models with Exogenous Covariates

Authors: Alexander Mayer, Dominik Wied

A factor copula model is proposed in which factors are either simulable or
estimable from exogenous information. Point estimation and inference are based
on a simulated method of moments (SMM) approach with non-overlapping
simulation draws. Consistency and limiting normality of the estimator are
established and the validity of bootstrap standard errors is shown. In doing
so, previous results from the literature are verified under low-level conditions
imposed on the individual components of the factor structure. Monte Carlo
evidence confirms the accuracy of the asymptotic theory in finite samples and
an empirical application illustrates the usefulness of the model to explain the
cross-sectional dependence between stock returns.

arXiv link: http://arxiv.org/abs/2107.03366v4

Econometrics arXiv updated paper (originally submitted: 2021-07-07)

Dynamic Ordered Panel Logit Models

Authors: Bo E. Honoré, Chris Muris, Martin Weidner

This paper studies a dynamic ordered logit model for panel data with fixed
effects. The main contribution of the paper is to construct a set of valid
moment conditions that are free of the fixed effects. The moment functions can
be computed using four or more periods of data, and the paper presents
sufficient conditions for the moment conditions to identify the common
parameters of the model, namely the regression coefficients, the autoregressive
parameters, and the threshold parameters. The availability of moment conditions
suggests that these common parameters can be estimated using the generalized
method of moments, and the paper documents the performance of this estimator
using Monte Carlo simulations and an empirical illustration to self-reported
health status using the British Household Panel Survey.

arXiv link: http://arxiv.org/abs/2107.03253v4

Econometrics arXiv updated paper (originally submitted: 2021-07-06)

Causal Inference with Corrupted Data: Measurement Error, Missing Values, Discretization, and Differential Privacy

Authors: Anish Agarwal, Rahul Singh

The US Census Bureau will deliberately corrupt data sets derived from the
2020 US Census, enhancing the privacy of respondents while potentially reducing
the precision of economic analysis. To investigate whether this trade-off is
inevitable, we formulate a semiparametric model of causal inference with high
dimensional corrupted data. We propose a procedure for data cleaning,
estimation, and inference with data cleaning-adjusted confidence intervals. We
prove consistency and Gaussian approximation by finite sample arguments, with a
rate of $n^{-1/2}$ for semiparametric estimands that degrades gracefully for
nonparametric estimands. Our key assumption is that the true covariates are
approximately low rank, which we interpret as approximate repeated measurements
and empirically validate. Our analysis provides nonasymptotic theoretical
contributions to matrix completion, statistical learning, and semiparametric
statistics. Calibrated simulations verify the coverage of our data cleaning
adjusted confidence intervals and demonstrate the relevance of our results for
Census-derived data.

arXiv link: http://arxiv.org/abs/2107.02780v6

Econometrics arXiv updated paper (originally submitted: 2021-07-06)

Shapes as Product Differentiation: Neural Network Embedding in the Analysis of Markets for Fonts

Authors: Sukjin Han, Eric H. Schulman, Kristen Grauman, Santhosh Ramakrishnan

Many differentiated products have key attributes that are unstructured and
thus high-dimensional (e.g., design, text). Instead of treating unstructured
attributes as unobservables in economic models, quantifying them can be
important to answer interesting economic questions. To propose an analytical
framework for these types of products, this paper considers one of the simplest
design products, fonts, and investigates merger and product differentiation using
an original dataset from the world's largest online marketplace for fonts. We
quantify font shapes by constructing embeddings from a deep convolutional
neural network. Each embedding maps a font's shape onto a low-dimensional
vector. In the resulting product space, designers are assumed to engage in
Hotelling-type spatial competition. From the image embeddings, we construct two
alternative measures that capture the degree of design differentiation. We then
study the causal effects of a merger on the merging firm's creative decisions
using the constructed measures in a synthetic control method. We find that the
merger causes the merging firm to increase the visual variety of font design.
Notably, such effects are not captured when using traditional measures for
product offerings (e.g., specifications and the number of products) constructed
from structured data.

arXiv link: http://arxiv.org/abs/2107.02739v2

Econometrics arXiv cross-link from physics.soc-ph (physics.soc-ph), submitted: 2021-07-06

Gravity models of networks: integrating maximum-entropy and econometric approaches

Authors: Marzio Di Vece, Diego Garlaschelli, Tiziano Squartini

The World Trade Web (WTW) is the network of international trade relationships
among world countries. Characterizing both the local link weights (observed
trade volumes) and the global network structure (large-scale topology) of the
WTW via a single model is still an open issue. While the traditional Gravity
Model (GM) successfully replicates the observed trade volumes by employing
macroeconomic properties such as GDP and geographic distance, it,
unfortunately, predicts a fully connected network, thus returning a completely
unrealistic topology of the WTW. To overcome this problem, two different
classes of models have been introduced in econometrics and statistical physics.
Econometric approaches interpret the traditional GM as the expected value of a
probability distribution that can be chosen arbitrarily and tested against
alternative distributions. Statistical physics approaches construct
maximum-entropy probability distributions of (weighted) graphs from a chosen
set of measurable structural constraints and test distributions resulting from
different constraints. Here we compare and integrate the two approaches by
considering a class of maximum-entropy models that can incorporate
macroeconomic properties used in standard econometric models. We find that the
integrated approach achieves a better performance than the purely econometric
one. These results suggest that the maximum-entropy construction can serve as a
viable econometric framework wherein extensive and intensive margins can be
separately controlled for, by combining topological constraints and dyadic
macroeconomic variables.

arXiv link: http://arxiv.org/abs/2107.02650v3

Econometrics arXiv updated paper (originally submitted: 2021-07-06)

Difference-in-Differences with a Continuous Treatment

Authors: Brantly Callaway, Andrew Goodman-Bacon, Pedro H. C. Sant'Anna

This paper analyzes difference-in-differences designs with a continuous
treatment. We show that treatment effect on the treated-type parameters can be
identified under a generalized parallel trends assumption that is similar to
the binary treatment setup. However, interpreting differences in these
parameters across different values of the treatment can be particularly
challenging due to selection bias that is not ruled out by the parallel trends
assumption. We discuss alternative, typically stronger, assumptions that
alleviate these challenges. We also provide a variety of treatment effect
decomposition results, highlighting that parameters associated with popular
linear two-way fixed-effects specifications can be hard to interpret,
even when there are only two time periods. We introduce alternative
estimation procedures that do not suffer from these drawbacks and show in an
application that they can lead to different conclusions.

arXiv link: http://arxiv.org/abs/2107.02637v7

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2021-07-06

Inference for Low-Rank Models

Authors: Victor Chernozhukov, Christian Hansen, Yuan Liao, Yinchu Zhu

This paper studies inference in linear models with a high-dimensional
parameter matrix that can be well-approximated by a “spiked low-rank matrix.”
A spiked low-rank matrix has rank that grows slowly compared to its dimensions
and nonzero singular values that diverge to infinity. We show that this
framework covers a broad class of latent-variable models which can
accommodate matrix completion problems, factor models, varying coefficient
models, and heterogeneous treatment effects. For inference, we apply a
procedure that relies on an initial nuclear-norm penalized estimation step
followed by two ordinary least squares regressions. We consider the framework
of estimating incoherent eigenvectors and use a rotation argument to argue that
the eigenspace estimation is asymptotically unbiased. Using this framework we
show that our procedure provides asymptotically normal inference and achieves
the semiparametric efficiency bound. We illustrate our framework by providing
low-level conditions for its application in a treatment effects context where
treatment assignment might be strongly dependent.
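
A generic sketch of the kind of initial nuclear-norm penalized estimation step
referred to above, in a matrix completion setting: proximal gradient descent
whose proximal map soft-thresholds singular values. The subsequent OLS-based
debiasing and inference steps of the paper are not shown.

```python
import numpy as np

def svd_soft_threshold(A, tau):
    """Proximal operator of tau * nuclear norm: soft-threshold singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def nuclear_norm_estimate(Y, mask, lam, n_iter=200):
    """Minimize 0.5*||mask*(Theta - Y)||_F^2 + lam*||Theta||_* by proximal
    gradient, where mask is a 0/1 matrix of observed entries."""
    Theta = np.zeros_like(Y)
    for _ in range(n_iter):
        grad = mask * (Theta - Y)                      # gradient of the loss
        Theta = svd_soft_threshold(Theta - grad, lam)  # unit step size
    return Theta
```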

arXiv link: http://arxiv.org/abs/2107.02602v2

Econometrics arXiv paper, submitted: 2021-07-05

Big Data Information and Nowcasting: Consumption and Investment from Bank Transactions in Turkey

Authors: Ali B. Barlas, Seda Guler Mert, Berk Orkun Isa, Alvaro Ortiz, Tomasa Rodrigo, Baris Soybilgen, Ege Yazgan

We use aggregate information from individual-to-firm and firm-to-firm Garanti
BBVA Bank transactions to mimic domestic private demand. In particular, we
replicate the quarterly national accounts aggregate consumption and investment
(gross fixed capital formation) and its larger components (Machinery and
Equipment, and Construction) in real time for the case of Turkey. To validate
the usefulness of the information derived from these indicators, we test their
ability to nowcast Turkish GDP using different nowcasting models. The results
are successful and confirm the usefulness of consumption and investment
banking transactions for nowcasting purposes. The big data information is most
valuable at the beginning of the nowcasting process, when traditional hard
data are scarce. This makes the information especially relevant for countries
where statistical release lags are longer, such as emerging markets.

arXiv link: http://arxiv.org/abs/2107.03299v1

Econometrics arXiv paper, submitted: 2021-07-02

Partial Identification and Inference in Duration Models with Endogenous Censoring

Authors: Shosei Sakaguchi

This paper studies identification and inference in transformation models with
endogenous censoring. Many kinds of duration models, such as the accelerated
failure time model, proportional hazard model, and mixed proportional hazard
model, can be viewed as transformation models. We allow the censoring of a
duration outcome to be arbitrarily correlated with observed covariates and
unobserved heterogeneity. We impose no parametric restrictions on either the
transformation function or the distribution function of the unobserved
heterogeneity. In this setting, we develop bounds on the regression parameters
and the transformation function, which are characterized by conditional moment
inequalities involving U-statistics. We provide inference methods for them by
constructing an inference approach for conditional moment inequality models in
which the sample analogs of moments are U-statistics. We apply the proposed
inference methods to evaluate the effect of heart transplants on patients'
survival time using data from the Stanford Heart Transplant Study.

arXiv link: http://arxiv.org/abs/2107.00928v1

Econometrics arXiv cross-link from q-fin.MF (q-fin.MF), submitted: 2021-07-01

Feasible Implied Correlation Matrices from Factor Structures

Authors: Wolfgang Schadner

Forward-looking correlations are of interest in different financial
applications, including factor-based asset pricing, forecasting stock-price
movements or pricing index options. With a focus on non-FX markets, this paper
defines necessary conditions for option implied correlation matrices to be
mathematically and economically feasible and argues that existing models are
typically not capable of guaranteeing this. To overcome this difficulty, the
paper addresses the problem from the underlying factor structure and
introduces two approaches to solve it. Under the quantitative approach, the puzzle is
reformulated into a nearest correlation matrix problem which can be used either
as a stand-alone estimate or to re-establish positive-semi-definiteness of any
other model's estimate. From an economic approach, it is discussed how expected
correlations between stocks and risk factors (like CAPM, Fama-French) can be
translated into a feasible implied correlation matrix. Empirical experiments
are carried out on monthly option data of the S&P 100 and S&P 500 index
(1996-2020).
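
A crude sketch of the nearest-correlation-matrix idea mentioned under the
quantitative approach: clip negative eigenvalues and rescale to a unit
diagonal. This one-shot projection assumes a symmetric input; alternating
projection schemes in the spirit of Higham (2002) refine the same idea
iteratively.

```python
import numpy as np

def nearest_correlation(C, eps=1e-8):
    """Project a symmetric matrix onto (approximately) the set of feasible
    correlation matrices: positive semi-definite with unit diagonal."""
    C = (C + C.T) / 2.0
    w, V = np.linalg.eigh(C)
    C_psd = V @ np.diag(np.clip(w, eps, None)) @ V.T  # clip negative eigenvalues
    d = np.sqrt(np.diag(C_psd))
    return C_psd / np.outer(d, d)                     # restore unit diagonal
```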

arXiv link: http://arxiv.org/abs/2107.00427v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-07-01

A conditional independence test for causality in econometrics

Authors: Jaime Sevilla, Alexandra Mayn

The Y-test is a useful tool for detecting missing confounders in the context
of a multivariate regression. However, it is rarely used in practice since it
requires identifying multiple conditionally independent instruments, which is
often impossible. We propose a heuristic test which relaxes the independence
requirement. We then show how to apply this heuristic test on a price-demand
and a firm loan-productivity problem. We conclude that the test is informative
when the variables are linearly related with Gaussian additive noise, but it
can be misleading in other contexts. Still, we believe that the test can be a
useful concept for falsifying a proposed control set.

arXiv link: http://arxiv.org/abs/2107.09765v1

Econometrics arXiv cross-link from eess.SP (eess.SP), submitted: 2021-06-30

National-scale electricity peak load forecasting: Traditional, machine learning, or hybrid model?

Authors: Juyong Lee, Youngsang Cho

As the volatility of electricity demand increases owing to climate change and
electrification, the importance of accurate peak load forecasting is
increasing. Traditional peak load forecasting has been conducted through time
series-based models; however, recently, new models based on machine or deep
learning are being introduced. This study performs a comparative analysis to
determine the most accurate peak load-forecasting model for Korea, by comparing
the performance of time series, machine learning, and hybrid models. Seasonal
autoregressive integrated moving average with exogenous variables (SARIMAX) is
used for the time series model. Artificial neural network (ANN), support vector
regression (SVR), and long short-term memory (LSTM) are used for the machine
learning models. SARIMAX-ANN, SARIMAX-SVR, and SARIMAX-LSTM are used for the
hybrid models. The results indicate that the hybrid models exhibit significant
improvement over the SARIMAX model. The LSTM-based models outperformed the
others; the single and hybrid LSTM models did not exhibit a significant
performance difference. In the case of Korea's highest peak load in 2019, the
predictive power of the LSTM model proved to be greater than that of the
SARIMAX-LSTM model. The LSTM, SARIMAX-SVR, and SARIMAX-LSTM models outperformed
the current time series-based forecasting model used in Korea. Thus, Korea's
peak load-forecasting performance can be improved by including machine learning
or hybrid models.

arXiv link: http://arxiv.org/abs/2107.06174v1

Econometrics arXiv paper, submitted: 2021-06-28

A Note on the Topology of the First Stage of 2SLS with Many Instruments

Authors: Guy Tchuente

The finite sample properties of estimators are usually understood or
approximated using asymptotic theories. Two main asymptotic constructions have
been used to characterize the presence of many instruments. The first assumes
that the number of instruments increases with the sample size. I demonstrate
that in this case, one of the key assumptions used in the asymptotic
construction may imply that the number of "effective" instruments should be
finite, resulting in an internal contradiction. The second asymptotic
representation considers that the number of instrumental variables (IVs) may be
finite, infinite, or even a continuum. The number does not change with the
sample size. In this scenario, the regularized estimator obtained depends on
the topology imposed on the set of instruments as well as on a regularization
parameter. These restrictions may induce a bias or restrict the set of
admissible instruments. However, the assumptions are internally coherent. The
limitations of many-IV asymptotic assumptions provide support for
finite-sample distributional studies to better understand the behavior of
many-IV estimators.

arXiv link: http://arxiv.org/abs/2106.15003v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2021-06-26

The Role of Contextual Information in Best Arm Identification

Authors: Masahiro Kato, Kaito Ariu

We study the best-arm identification problem with fixed confidence when
contextual (covariate) information is available in stochastic bandits. Although
we can use contextual information in each round, we are interested in the
marginalized mean reward over the contextual distribution. Our goal is to
identify the best arm with a minimal number of samplings under a given value of
the error rate. We derive instance-specific sample complexity lower bounds
for the problem. Then, we propose a context-aware version of the
"Track-and-Stop" strategy, wherein the proportion of the arm draws tracks the
set of optimal allocations, and prove that the expected number of arm draws
matches the lower bound asymptotically. We demonstrate that contextual
information can be used to improve the efficiency of the identification of the
best marginalized mean reward compared with the results of Garivier & Kaufmann
(2016). We experimentally confirm that contextual information contributes to
faster best-arm identification.

arXiv link: http://arxiv.org/abs/2106.14077v3

Econometrics arXiv updated paper (originally submitted: 2021-06-25)

Nonparametric inference on counterfactuals in first-price auctions

Authors: Pasha Andreyanov, Grigory Franguridi

In a classical model of the first-price sealed-bid auction with independent
private values, we develop nonparametric estimators for several policy-relevant
targets, such as the bidder's surplus and auctioneer's revenue under
counterfactual reserve prices. Motivated by the linearity of these targets in
the quantile function of bidders' values, we propose an estimator of the latter
and derive its Bahadur-Kiefer expansion. This makes it possible to construct
uniform confidence bands and test complex hypotheses about the auction design.
Using the data on U.S. Forest Service timber auctions, we test whether setting
zero reserve prices in these auctions was revenue maximizing.

arXiv link: http://arxiv.org/abs/2106.13856v3

Econometrics arXiv updated paper (originally submitted: 2021-06-24)

Constrained Classification and Policy Learning

Authors: Toru Kitagawa, Shosei Sakaguchi, Aleksey Tetenov

Modern machine learning approaches to classification, including AdaBoost,
support vector machines, and deep neural networks, utilize surrogate loss
techniques to circumvent the computational complexity of minimizing empirical
classification risk. These techniques are also useful for causal policy
learning problems, since estimation of individualized treatment rules can be
cast as a weighted (cost-sensitive) classification problem. Consistency of the
surrogate loss approaches studied in Zhang (2004) and Bartlett et al. (2006)
crucially relies on the assumption of correct specification, meaning that the
specified set of classifiers is rich enough to contain a first-best classifier.
This assumption is, however, less credible when the set of classifiers is
constrained by interpretability or fairness, leaving the applicability of
surrogate loss based algorithms unknown in such second-best scenarios. This
paper studies consistency of surrogate loss procedures under a constrained set
of classifiers without assuming correct specification. We show that in the
setting where the constraint restricts the classifier's prediction set only,
hinge losses (i.e., $\ell_1$-support vector machines) are the only surrogate
losses that preserve consistency in second-best scenarios. If the constraint
additionally restricts the functional form of the classifier, consistency of a
surrogate loss approach is not guaranteed even with hinge loss. We therefore
characterize conditions for the constrained set of classifiers that can
guarantee consistency of hinge risk minimizing classifiers. Exploiting our
theoretical results, we develop robust and computationally attractive hinge
loss based procedures for a monotone classification problem.

arXiv link: http://arxiv.org/abs/2106.12886v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-06-23

Variational Bayes in State Space Models: Inferential and Predictive Accuracy

Authors: David T. Frazier, Ruben Loaiza-Maya, Gael M. Martin

Using theoretical and numerical results, we document the accuracy of commonly
applied variational Bayes methods across a range of state space models. The
results demonstrate that, in terms of accuracy on fixed parameters, there is a
clear hierarchy across the methods, with approaches that do not
approximate the states yielding superior accuracy to methods that do. We also
document numerically that the inferential discrepancies between the various
methods often yield only small discrepancies in predictive accuracy over small
out-of-sample evaluation periods. Nevertheless, in certain settings, these
predictive discrepancies can become meaningful over a longer out-of-sample
period. This finding indicates that the invariance of predictive results to
inferential inaccuracy, which has been an oft-touted point made by
practitioners seeking to justify the use of variational inference, is not
ubiquitous and must be assessed on a case-by-case basis.

arXiv link: http://arxiv.org/abs/2106.12262v3

Econometrics arXiv updated paper (originally submitted: 2021-06-22)

Discovering Heterogeneous Treatment Effects in Regression Discontinuity Designs

Authors: Ágoston Reguly

The paper proposes a causal supervised machine learning algorithm to uncover
treatment effect heterogeneity in sharp and fuzzy regression discontinuity (RD)
designs. We develop a criterion for building an honest “regression
discontinuity tree”, where each leaf contains the RD estimate of a treatment
conditional on the values of some pre-treatment covariates. It is a priori
unknown which covariates are relevant for capturing treatment effect
heterogeneity, and it is the task of the algorithm to discover them, without
invalidating inference, while employing a nonparametric estimator with expected
MSE optimal bandwidth. We study the performance of the method through Monte
Carlo simulations and apply it to uncover various sources of heterogeneity in
the impact of attending a better secondary school in Romania.

arXiv link: http://arxiv.org/abs/2106.11640v4

Econometrics arXiv paper, submitted: 2021-06-21

On Testing Equal Conditional Predictive Ability Under Measurement Error

Authors: Yannick Hoga, Timo Dimitriadis

Loss functions are widely used to compare several competing forecasts.
However, forecast comparisons are often based on mismeasured proxy variables
for the true target. We introduce the concept of exact robustness to
measurement error for loss functions and fully characterize this class of loss
functions as the Bregman class. For such exactly robust loss functions,
forecast loss differences are on average unaffected by the use of proxy
variables and, thus, inference on conditional predictive ability can be carried
out as usual. Moreover, we show that more precise proxies give predictive
ability tests higher power in discriminating between competing forecasts.
Simulations illustrate the different behavior of exactly robust and non-robust
loss functions. An empirical application to US GDP growth rates demonstrates
that it is easier to discriminate between forecasts issued at different
horizons if a better proxy for GDP growth is used.
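
For reference, the Bregman class characterized above consists of losses of the
standard form
$$
L_\phi(x, y) \;=\; \phi(y) - \phi(x) - \phi'(x)\,(y - x), \qquad \phi \text{ convex},
$$
where $x$ is the forecast and $y$ the realization; the choice $\phi(t) = t^2$
gives $L_\phi(x, y) = (y - x)^2$, the squared error loss.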

arXiv link: http://arxiv.org/abs/2106.11104v1

Econometrics arXiv paper, submitted: 2021-06-21

On the Use of Two-Way Fixed Effects Models for Policy Evaluation During Pandemics

Authors: Germain Gauthier

In the context of the Covid-19 pandemic, multiple studies rely on two-way
fixed effects (FE) models to assess the impact of mitigation policies on health
outcomes. Building on the SIRD model of disease transmission, I show that FE
models tend to be misspecified for three reasons. First, despite misleading
common trends in the pre-treatment period, the parallel trends assumption
generally does not hold. Second, heterogeneity in infection rates and infected
populations across regions cannot be accounted for by region-specific fixed
effects, nor by conditioning on observable time-varying confounders. Third,
epidemiological theory predicts heterogeneous treatment effects across regions
and over time. Via simulations, I find that the bias resulting from model
misspecification can be substantial, in magnitude and sometimes in sign.
Overall, my results caution against the use of FE models for mitigation policy
evaluation.

arXiv link: http://arxiv.org/abs/2106.10949v1

Econometrics arXiv updated paper (originally submitted: 2021-06-20)

A Neural Frequency-Severity Model and Its Application to Insurance Claims

Authors: Dong-Young Lim

This paper proposes a flexible and analytically tractable class of frequency
and severity models for predicting insurance claims. The proposed model is able
to capture nonlinear relationships in explanatory variables by characterizing
the logarithmic mean functions of frequency and severity distributions as
neural networks. Moreover, a potential dependence between the claim frequency
and severity can be incorporated. In particular, the paper provides analytic
formulas for mean and variance of the total claim cost, making our model ideal
for many applications such as pricing insurance contracts and computing the pure premium.
A simulation study demonstrates that our method successfully recovers nonlinear
features of explanatory variables as well as the dependency between frequency
and severity. Then, this paper uses a French auto insurance claim dataset to
illustrate that the proposed model is superior to the existing methods in
fitting and predicting the claim frequency, severity, and the total claim loss.
Numerical results indicate that the proposed model helps in maintaining the
competitiveness of an insurer by accurately predicting insurance claims and
avoiding adverse selection.

arXiv link: http://arxiv.org/abs/2106.10770v3

Econometrics arXiv paper, submitted: 2021-06-20

Semiparametric inference for partially linear regressions with Box-Cox transformation

Authors: Daniel Becker, Alois Kneip, Valentin Patilea

In this paper, a semiparametric partially linear model in the spirit of
Robinson (1988) with a Box-Cox transformed dependent variable is studied.
Transformation regression models are widely used in applied econometrics to
avoid misspecification. In addition, a partially linear semiparametric model is
an intermediate strategy that tries to balance advantages and disadvantages of
a fully parametric model and nonparametric models. A combination of
transformation and partially linear semiparametric model is, thus, a natural
strategy. The model parameters are estimated by a semiparametric extension of
the so-called smooth minimum distance (SmoothMD) approach proposed by Lavergne
and Patilea (2013). SmoothMD is suitable for models defined by conditional
moment conditions and allows the variance of the error terms to depend on the
covariates. In addition, here we allow for infinite-dimensional nuisance
parameters. The asymptotic behavior of the new SmoothMD estimator is studied
under general conditions and new inference methods are proposed. A simulation
experiment illustrates the performance of the methods for finite samples.

arXiv link: http://arxiv.org/abs/2106.10723v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-06-19

Generalized Spatial and Spatiotemporal ARCH Models

Authors: Philipp Otto, Wolfgang Schmid

In time-series analyses, particularly for finance, generalized autoregressive
conditional heteroscedasticity (GARCH) models are widely applied statistical
tools for modelling volatility clusters (i.e., periods of increased or
decreased risk). In contrast, it has not been considered to be of critical
importance until now to model spatial dependence in the conditional second
moments. Only a few models have been proposed for modelling local clusters of
increased risks. In this paper, we introduce a novel spatial GARCH process in a
unified spatial and spatiotemporal GARCH framework, which also covers all
previously proposed spatial ARCH models, exponential spatial GARCH, and
time-series GARCH models. In contrast to previous spatiotemporal and time
series models, this spatial GARCH allows for instantaneous spill-overs across
all spatial units. For this common modelling framework, estimators are derived
based on a non-linear least-squares approach. Eventually, the use of the model
is demonstrated by a Monte Carlo simulation study and by an empirical example
that focuses on real estate prices from 1995 to 2014 across the ZIP-Code areas
of Berlin. A spatial autoregressive model is applied to the data to illustrate
how locally varying model uncertainties (e.g., due to latent regressors) can be
captured by the spatial GARCH-type models.

arXiv link: http://arxiv.org/abs/2106.10477v1

Econometrics arXiv cross-link from stat.CO (stat.CO), submitted: 2021-06-18

Scalable Econometrics on Big Data -- The Logistic Regression on Spark

Authors: Aurélien Ouattara, Matthieu Bulté, Wan-Ju Lin, Philipp Scholl, Benedikt Veit, Christos Ziakas, Florian Felice, Julien Virlogeux, George Dikos

Extra-large datasets are becoming increasingly accessible, and computing
tools designed to handle huge amounts of data efficiently are democratizing
rapidly. However, conventional statistical and econometric tools still lack
fluency when dealing with such large datasets. This paper dives into
econometrics on big datasets, specifically focusing on the logistic regression
on Spark. We review the robustness of the functions available in Spark to fit
logistic regression and introduce a package that we developed in PySpark which
returns the statistical summary of the logistic regression, necessary for
statistical inference.
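
The authors' PySpark package is not shown here with a specific API, so the
sketch below only fits Spark ML's built-in logistic regression and reads off
point estimates and built-in fit diagnostics; standard errors and the rest of
the statistical summary (the package's contribution) would have to be computed
separately, e.g. from the Hessian of the log-likelihood. Column names and the
file path are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("logit-on-spark").getOrCreate()

# hypothetical CSV with a binary label column "y" and numeric regressors
df = spark.read.csv("data.csv", header=True, inferSchema=True)
features = [c for c in df.columns if c != "y"]
df = VectorAssembler(inputCols=features, outputCol="features").transform(df)

model = LogisticRegression(featuresCol="features", labelCol="y").fit(df)
print(model.coefficients, model.intercept)   # point estimates only
print(model.summary.areaUnderROC)            # built-in fit diagnostics
```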

arXiv link: http://arxiv.org/abs/2106.10341v1

Econometrics arXiv paper, submitted: 2021-06-17

Set coverage and robust policy

Authors: Marc Henry, Alexei Onatski

When conducting inference on partially identified parameters, confidence
regions may cover the whole identified set with a prescribed probability, to
which we will refer as set coverage, or they may cover each of its point with a
prescribed probability, to which we will refer as point coverage. Since set
coverage implies point coverage, confidence regions satisfying point coverage
are generally preferred on the grounds that they may be more informative. The
object of this note is to describe a decision problem in which, contrary to
received wisdom, point coverage is clearly undesirable.

arXiv link: http://arxiv.org/abs/2106.09784v1

Econometrics arXiv paper, submitted: 2021-06-15

Economic Nowcasting with Long Short-Term Memory Artificial Neural Networks (LSTM)

Authors: Daniel Hopp

Artificial neural networks (ANNs) have been the catalyst to numerous advances
in a variety of fields and disciplines in recent years. Their impact on
economics, however, has been comparatively muted. One type of ANN, the long
short-term memory network (LSTM), is particularly well-suited to deal with
economic time-series. Here, the architecture's performance and characteristics
are evaluated in comparison with the dynamic factor model (DFM), currently a
popular choice in the field of economic nowcasting. LSTMs are found to produce
superior results to DFMs in the nowcasting of three separate variables: global
merchandise export values and volumes, and global services exports. Further
advantages include their ability to handle large numbers of input features in a
variety of time frequencies. A disadvantage is the inability to ascribe
contributions of input features to model outputs, common to all ANNs. In order
to facilitate continued applied research of the methodology by avoiding the
need for any knowledge of deep-learning libraries, an accompanying Python
library was developed using PyTorch, https://pypi.org/project/nowcast-lstm/.

arXiv link: http://arxiv.org/abs/2106.08901v1

Econometrics arXiv paper, submitted: 2021-06-15

Comparisons of Australian Mental Health Distributions

Authors: David Gunawan, William Griffiths, Duangkamon Chotikapanich

Bayesian nonparametric estimates of Australian mental health distributions
are obtained to assess how the mental health status of the population has
changed over time and to compare the mental health status of female/male and
indigenous/non-indigenous population subgroups. First- and second-order
stochastic dominance are used to compare distributions, with results presented
in terms of the posterior probability of dominance and the posterior
probability of no dominance. Our results suggest that mental health has
deteriorated in recent years, that males' mental health status is better than
that of females, and that non-indigenous health status is better than that of
the indigenous population.

arXiv link: http://arxiv.org/abs/2106.08047v1

Econometrics arXiv updated paper (originally submitted: 2021-06-14)

Dynamic Asymmetric Causality Tests with an Application

Authors: Abdulnasser Hatemi-J

Testing for causation, defined as the preceding impact of the past values of
one variable on the current value of another one when all other pertinent
information is accounted for, is increasingly utilized in empirical research of
the time-series data in different scientific disciplines. A relatively recent
extension of this approach has been allowing for potential asymmetric impacts
since it is harmonious with the way reality operates in many cases according to
Hatemi-J (2012). The current paper maintains that it is also important to
account for the potential change in the parameters when asymmetric causation
tests are conducted, as there are a number of reasons why the potential causal
connection between variables may change across time. The current paper
therefore extends the static asymmetric causality tests by making them dynamic
via the use of subsamples. An application is also provided consistent with
measurable definitions of economic or financial bad as well as good news and
their potential interaction across time.

arXiv link: http://arxiv.org/abs/2106.07612v2

Econometrics arXiv paper, submitted: 2021-06-11

Sensitivity of LATE Estimates to Violations of the Monotonicity Assumption

Authors: Claudia Noack

In this paper, we develop a method to assess the sensitivity of local average
treatment effect estimates to potential violations of the monotonicity
assumption of Imbens and Angrist (1994). We parameterize the degree to which
monotonicity is violated using two sensitivity parameters: the first one
determines the share of defiers in the population, and the second one measures
differences in the distributions of outcomes between compliers and defiers. For
each pair of values of these sensitivity parameters, we derive sharp bounds on
the outcome distributions of compliers in the first-order stochastic dominance
sense. We identify the robust region that is the set of all values of
sensitivity parameters for which a given empirical conclusion, e.g. that the
local average treatment effect is positive, is valid. Researchers can assess
the credibility of their conclusion by evaluating whether all the plausible
sensitivity parameters lie in the robust region. We obtain confidence sets for
the robust region through a bootstrap procedure and illustrate the sensitivity
analysis in an empirical application. We also extend this framework to analyze
treatment effects of the entire population.

arXiv link: http://arxiv.org/abs/2106.06421v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2021-06-10

An Interpretable Neural Network for Parameter Inference

Authors: Johann Pfitzinger

Adoption of deep neural networks in fields such as economics or finance has
been constrained by the lack of interpretability of model outcomes. This paper
proposes a generative neural network architecture - the parameter encoder
neural network (PENN) - capable of estimating local posterior distributions for
the parameters of a regression model. The parameters fully explain predictions
in terms of the inputs and permit visualization, interpretation and inference
in the presence of complex heterogeneous effects and feature dependencies. The
use of Bayesian inference techniques offers an intuitive mechanism to
regularize local parameter estimates towards a stable solution, and to reduce
noise-fitting in settings of limited data availability. The proposed neural
network is particularly well-suited to applications in economics and finance,
where parameter inference plays an important role. An application to an asset
pricing problem demonstrates how the PENN can be used to explore nonlinear risk
dynamics in financial markets, and to compare empirical nonlinear effects to
behavior posited by financial theory.

arXiv link: http://arxiv.org/abs/2106.05536v1

Econometrics arXiv updated paper (originally submitted: 2021-06-10)

Panel Data with Unknown Clusters

Authors: Yong Cai

Clustered standard errors and approximate randomization tests are popular
inference methods that allow for dependence within observations. However, they
require researchers to know the cluster structure ex ante. We propose a
procedure to help researchers discover clusters in panel data. Our method is
based on thresholding an estimated long-run variance-covariance matrix and
requires the panel to be large in the time dimension, but imposes no lower
bound on the number of units. We show that our procedure recovers the true
clusters with high probability with no assumptions on the cluster structure.
The estimated clusters are of independent interest, but they can also be used
in the approximate randomization tests or with conventional cluster-robust
covariance estimators. The resulting procedures control size and have good
power.
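
A stylized sketch of the thresholding idea, with two simplifications: the
long-run variance-covariance matrix is replaced by a plain time average of
cross products (no HAC weighting), and clusters are read off as connected
components of the thresholded adjacency graph. The `scores` input and the
threshold choice are hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def discover_clusters(scores, threshold):
    """scores: (T, N) panel of unit-level scores or residuals. Returns the
    number of clusters and a length-N array of cluster labels."""
    T, N = scores.shape
    S = scores.T @ scores / T                 # cross-unit covariance (no HAC)
    d = np.sqrt(np.diag(S))
    R = S / np.outer(d, d)                    # scale-free version
    A = (np.abs(R) > threshold).astype(int)   # keep links above the threshold
    np.fill_diagonal(A, 0)
    return connected_components(csr_matrix(A), directed=False)
```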

arXiv link: http://arxiv.org/abs/2106.05503v4

Econometrics arXiv updated paper (originally submitted: 2021-06-09)

Estimation of Optimal Dynamic Treatment Assignment Rules under Policy Constraints

Authors: Shosei Sakaguchi

Many policies involve dynamics in their treatment assignments, where
individuals receive sequential interventions over multiple stages. We study
estimation of an optimal dynamic treatment regime that guides the optimal
treatment assignment for each individual at each stage based on their history.
We propose an empirical welfare maximization approach in this dynamic
framework, which estimates the optimal dynamic treatment regime using data from
an experimental or quasi-experimental study while satisfying exogenous
constraints on policies. The paper proposes two estimation methods: one solves
the treatment assignment problem sequentially through backward induction, and
the other solves the entire problem simultaneously across all stages. We
establish finite-sample upper bounds on worst-case average welfare regrets for
these methods and show their optimal $n^{-1/2}$ convergence rates. We also
modify the simultaneous estimation method to accommodate intertemporal
budget/capacity constraints.

arXiv link: http://arxiv.org/abs/2106.05031v5

Econometrics arXiv updated paper (originally submitted: 2021-06-09)

Contamination Bias in Linear Regressions

Authors: Paul Goldsmith-Pinkham, Peter Hull, Michal Kolesár

We study regressions with multiple treatments and a set of controls that is
flexible enough to purge omitted variable bias. We show that these regressions
generally fail to estimate convex averages of heterogeneous treatment effects
-- instead, estimates of each treatment's effect are contaminated by non-convex
averages of the effects of other treatments. We discuss three estimation
approaches that avoid such contamination bias, including the targeting of
easiest-to-estimate weighted average effects. A re-analysis of nine empirical
applications finds economically and statistically meaningful contamination bias
in observational studies; contamination bias in experimental studies is more
limited due to smaller variability in propensity scores.

arXiv link: http://arxiv.org/abs/2106.05024v5

Econometrics arXiv paper, submitted: 2021-06-08

Automatically Differentiable Random Coefficient Logistic Demand Estimation

Authors: Andrew Chia

We show how the random coefficient logistic demand (BLP) model can be phrased
as an automatically differentiable moment function, including the incorporation
of numerical safeguards proposed in the literature. This allows gradient-based
frequentist and quasi-Bayesian estimation using the Continuously Updating
Estimator (CUE). Drawing from the machine learning literature, we outline
hitherto under-utilized best practices in both frequentist and Bayesian
estimation techniques. Our Monte Carlo experiments compare the performance of
CUE, 2S-GMM, and LTE estimation. Preliminary findings indicate that the CUE
estimated using LTE and frequentist optimization has a lower bias but higher
MAE compared to the traditional 2-Stage GMM (2S-GMM) approach. We also find
that using credible intervals from MCMC sampling for the non-linear parameters
together with frequentist analytical standard errors for the concentrated out
linear parameters provides empirical coverage closest to the nominal level. The
accompanying admest Python package provides a platform for replication and
extensibility.

arXiv link: http://arxiv.org/abs/2106.04636v1

Econometrics arXiv updated paper (originally submitted: 2021-06-08)

Testing Monotonicity of Mean Potential Outcomes in a Continuous Treatment with High-Dimensional Data

Authors: Yu-Chin Hsu, Martin Huber, Ying-Ying Lee, Chu-An Liu

While most treatment evaluations focus on binary interventions, a growing
literature also considers continuously distributed treatments. We propose a
Cram\'{e}r-von Mises-type test for testing whether the mean potential outcome
given a specific treatment has a weakly monotonic relationship with the
treatment dose under a weak unconfoundedness assumption. In a nonseparable
structural model, applying our method amounts to testing monotonicity of the
average structural function in the continuous treatment of interest. To
flexibly control for a possibly high-dimensional set of covariates in our
testing approach, we propose a double debiased machine learning estimator that
accounts for covariates in a data-driven way. We show that the proposed test
controls asymptotic size and is consistent against any fixed alternative. These
theoretical findings are supported by the Monte-Carlo simulations. As an
empirical illustration, we apply our test to the Job Corps study and reject a
weakly negative relationship between the treatment (hours in academic and
vocational training) and labor market performance among relatively low
treatment values.

arXiv link: http://arxiv.org/abs/2106.04237v3

Econometrics arXiv paper, submitted: 2021-06-08

Modeling Portfolios with Leptokurtic and Dependent Risk Factors

Authors: Piero Quatto, Gianmarco Vacca, Maria Grazia Zoia

Recently, an approach to modeling portfolio distribution with risk factors
distributed as Gram-Charlier (GC) expansions of the Gaussian law, has been
conceived. GC expansions prove effective when dealing with moderately
leptokurtic data. In order to cover the case of possibly severe leptokurtosis,
the so-called GC-like expansions have been devised by reshaping parent
leptokurtic distributions by means of orthogonal polynomials specific to them.
In this paper, we focus on the hyperbolic-secant (HS) law as parent
distribution whose GC-like expansions fit with kurtosis levels up to 19.4. A
portfolio distribution has been obtained with risk factors modeled as GC-like
expansions of the HS law, which duly account for excess kurtosis. Empirical
evidence of the workings of the approach dealt with in the paper is included.

arXiv link: http://arxiv.org/abs/2106.04218v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2021-06-07

Superconsistency of Tests in High Dimensions

Authors: Anders Bredahl Kock, David Preinerstorfer

To assess whether there is some signal in a big database, aggregate tests for
the global null hypothesis of no effect are routinely applied in practice
before more specialized analysis is carried out. Although a plethora of
aggregate tests is available, each test has its strengths but also its blind
spots. In a Gaussian sequence model, we study whether it is possible to obtain
a test with substantially better consistency properties than the likelihood
ratio (i.e., Euclidean norm based) test. We establish an impossibility result,
showing that in the high-dimensional framework we consider, the set of
alternatives for which a test may improve upon the likelihood ratio test --
that is, its superconsistency points -- is always asymptotically negligible in
a relative volume sense.

arXiv link: http://arxiv.org/abs/2106.03700v3

Econometrics arXiv paper, submitted: 2021-06-07

On the "mementum" of Meme Stocks

Authors: Michele Costola, Matteo Iacopini, Carlo R. M. A. Santagiustina

The meme stock phenomenon is yet to be explored. In this note, we provide
evidence that these stocks display common stylized facts on the dynamics of
price, trading volume, and social media activity. Using a regime-switching
cointegration model, we identify the meme stock "mementum" which exhibits a
different characterization with respect to other stocks with high volumes of
activity (persistent and not) on social media. Understanding these properties
helps investors and market authorities in their decisions.

arXiv link: http://arxiv.org/abs/2106.03691v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2021-06-06

Fast and Robust Online Inference with Stochastic Gradient Descent via Random Scaling

Authors: Sokbae Lee, Yuan Liao, Myung Hwan Seo, Youngki Shin

We develop a new method of online inference for a vector of parameters
estimated by the Polyak-Ruppert averaging procedure of stochastic gradient
descent (SGD) algorithms. We leverage insights from time series regression in
econometrics and construct asymptotically pivotal statistics via random
scaling. Our approach is fully operational with online data and is rigorously
underpinned by a functional central limit theorem. Our proposed inference
method has a couple of key advantages over the existing methods. First, the
test statistic is computed in an online fashion with only SGD iterates and the
critical values can be obtained without any resampling methods, thereby
allowing for efficient implementation suitable for massive online data. Second,
there is no need to estimate the asymptotic variance and our inference method
is shown to be robust to changes in the tuning parameters for SGD algorithms in
simulation experiments with synthetic data.
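
A minimal sketch of the idea, under assumptions: SGD iterates are averaged in
the Polyak-Ruppert fashion, and a random-scaling matrix of the form
$n^{-2}\sum_{s\le n} s^2(\bar\theta_s-\bar\theta_n)(\bar\theta_s-\bar\theta_n)'$
is maintained from running sums. The learning-rate schedule and the `grad`
interface are hypothetical, and the resulting t-type statistics must be
compared with the nonstandard (Kiefer-Vogelsang-Bunzel-type) critical values
rather than normal ones.

```python
import numpy as np

def sgd_with_random_scaling(grad, theta0, n_steps, lr=lambda t: 0.5 / t ** 0.6):
    """grad(theta, t) returns a stochastic gradient at step t (user-supplied).
    Returns the averaged iterate and the random-scaling matrix V."""
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    d = theta.size
    avg = np.zeros(d)                 # running Polyak-Ruppert average
    sum_s2_avg = np.zeros(d)          # running sum of s^2 * avg_s
    sum_s2_outer = np.zeros((d, d))   # running sum of s^2 * avg_s avg_s'
    for t in range(1, n_steps + 1):
        theta = theta - lr(t) * grad(theta, t)
        avg = avg + (theta - avg) / t
        sum_s2_avg += t ** 2 * avg
        sum_s2_outer += t ** 2 * np.outer(avg, avg)
    n = n_steps
    sum_s2 = n * (n + 1) * (2 * n + 1) / 6           # sum of s^2 for s = 1..n
    V = (sum_s2_outer
         - np.outer(sum_s2_avg, avg) - np.outer(avg, sum_s2_avg)
         + sum_s2 * np.outer(avg, avg)) / n ** 2
    return avg, V   # use sqrt(n) * (avg - h0) / sqrt(diag(V)) as t-statistics
```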

arXiv link: http://arxiv.org/abs/2106.03156v3

Econometrics arXiv updated paper (originally submitted: 2021-06-06)

Linear Rescaling to Accurately Interpret Logarithms

Authors: Nick Huntington-Klein

The standard approximation of a natural logarithm in statistical analysis
interprets a linear change of \(p\) in \(\ln(X)\) as a \((1+p)\) proportional
change in \(X\), which is only accurate for small values of \(p\). I suggest
base-\((1+p)\) logarithms, where \(p\) is chosen ahead of time. A one-unit
change in \(\log_{1+p}(X)\) is exactly equivalent to a \((1+p)\) proportional
change in \(X\). This avoids an approximation applied too broadly, makes exact
interpretation easier and less error-prone, improves approximation quality when
approximations are used, makes the change of interest a one-log-unit change
like other regression variables, and reduces error from the use of
\(\log(1+X)\).
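
A quick numerical illustration of the proposal (my own toy numbers): with \(p = 0.10\),
consecutive 10% increases in \(X\) correspond to exactly one-unit steps in
\(\log_{1.1}(X)\), whereas natural-log differences only approximate 0.10.

```python
import numpy as np

p = 0.10                                # interpret effects as 10% proportional changes
x = np.array([100.0, 110.0, 121.0])     # two consecutive 10% increases

log_base_1p = np.log(x) / np.log(1 + p) # log base (1 + p) via the change-of-base formula
print(np.diff(log_base_1p))             # exactly one log-unit per 10% step (up to rounding)
print(np.diff(np.log(x)))               # natural-log differences are only approximately 0.10
```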

arXiv link: http://arxiv.org/abs/2106.03070v3

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2021-06-06

Truthful Self-Play

Authors: Shohei Ohsawa

We present a general framework for evolutionary learning of emergent unbiased
state representations without any supervision. Evolutionary frameworks such as
self-play converge to bad local optima in case of multi-agent reinforcement
learning in non-cooperative partially observable environments with
communication due to information asymmetry. Our proposed framework is a simple
modification of self-play inspired by mechanism design, also known as {\em
reverse game theory}, to elicit truthful signals and make the agents
cooperative. The key idea is to add imaginary rewards using the peer prediction
method, i.e., a mechanism for evaluating the validity of information exchanged
between agents in a decentralized environment. Numerical experiments with
predator-prey, traffic junction, and StarCraft tasks demonstrate the
state-of-the-art performance of our framework.

arXiv link: http://arxiv.org/abs/2106.03007v6

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2021-06-05

Learning Treatment Effects in Panels with General Intervention Patterns

Authors: Vivek F. Farias, Andrew A. Li, Tianyi Peng

The problem of causal inference with panel data is a central econometric
question. The following is a fundamental version of this problem: Let $M^*$ be
a low rank matrix and $E$ be a zero-mean noise matrix. For a `treatment' matrix
$Z$ with entries in $\{0,1\}$ we observe the matrix $O$ with entries $O_{ij} :=
M^*_{ij} + E_{ij} + T_{ij} Z_{ij}$ where $T_{ij}$ are
unknown, heterogeneous treatment effects. The problem requires that we estimate the
average treatment effect $\tau^* := \sum_{ij} T_{ij} Z_{ij} /
\sum_{ij} Z_{ij}$. The synthetic control paradigm provides an approach to
estimating $\tau^*$ when $Z$ places support on a single row. This paper extends
that framework to allow rate-optimal recovery of $\tau^*$ for general $Z$, thus
broadly expanding its applicability. Our guarantees are the first of their type
in this general setting. Computational experiments on synthetic and real-world
data show a substantial advantage over competing estimators.
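
As a hedged sketch of the setting described above (the data-generating process only,
not the authors' estimator), the snippet below builds a low-rank baseline \(M^*\), adds
noise and heterogeneous effects on treated entries, and shows why a naive
treated-minus-control mean is biased when treatment is correlated with the baseline:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 200, 100, 3
M = rng.normal(size=(n, r)) @ rng.normal(size=(r, m))     # low-rank baseline M*
E = 0.5 * rng.normal(size=(n, m))                         # zero-mean noise
prob = 1.0 / (1.0 + np.exp(-M))                           # treatment more likely where M* is high
Z = (rng.random((n, m)) < prob).astype(float)             # general treatment pattern Z
T = 1.0 + rng.normal(scale=0.3, size=(n, m))              # heterogeneous effects T_ij
O = M + E + T * Z                                         # observed matrix O

tau_star = (T * Z).sum() / Z.sum()                        # target average treatment effect
naive = O[Z == 1].mean() - O[Z == 0].mean()               # ignores M*, so it is badly biased here
print(tau_star, naive)
```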

arXiv link: http://arxiv.org/abs/2106.02780v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2021-06-03

Change-Point Analysis of Time Series with Evolutionary Spectra

Authors: Alessandro Casini, Pierre Perron

This paper develops change-point methods for the spectrum of a locally
stationary time series. We focus on series with a bounded spectral density that
changes smoothly under the null hypothesis but exhibits change-points or becomes
less smooth under the alternative. We address two local problems. The first is
the detection of discontinuities (or breaks) in the spectrum at unknown dates
and frequencies. The second involves abrupt yet continuous changes in the
spectrum over a short time period at an unknown frequency without signifying a
break. Both problems can be cast into changes in the degree of smoothness of
the spectral density over time. We consider estimation and minimax-optimal
testing. We determine the optimal rate for the minimax distinguishable
boundary, i.e., the minimum break magnitude such that we are able to uniformly
control type I and type II errors. We propose a novel procedure for the
estimation of the change-points based on a wild sequential top-down algorithm
and show its consistency under shrinking shifts and possibly growing number of
change-points. Our method can be used across many fields and a companion
program is made available in popular software packages.

arXiv link: http://arxiv.org/abs/2106.02031v3

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2021-06-03

Off-Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits

Authors: Ruohan Zhan, Vitor Hadad, David A. Hirshberg, Susan Athey

It has become increasingly common for data to be collected adaptively, for
example using contextual bandits. Historical data of this type can be used to
evaluate other treatment assignment policies to guide future innovation or
experiments. However, policy evaluation is challenging if the target policy
differs from the one used to collect data, and popular estimators, including
doubly robust (DR) estimators, can be plagued by bias, excessive variance, or
both. In particular, when the pattern of treatment assignment in the collected
data looks little like the pattern generated by the policy to be evaluated, the
importance weights used in DR estimators explode, leading to excessive
variance.
In this paper, we improve the DR estimator by adaptively weighting
observations to control its variance. We show that a t-statistic based on our
improved estimator is asymptotically normal under certain conditions, allowing
us to form confidence intervals and test hypotheses. Using synthetic data and
public benchmarks, we provide empirical evidence for our estimator's improved
accuracy and inferential properties relative to existing alternatives.
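
For reference, a plain (non-adaptively weighted) doubly robust off-policy value estimate
for a binary-action contextual bandit looks like the sketch below; the paper's
contribution is precisely to reweight these terms to control variance, which this
generic version does not do. The toy data-generating process and the always-treat target
policy are my own illustrative choices.

```python
import numpy as np

def dr_policy_value(y, a, mu_hat, pi_target, pi_logging):
    """Doubly robust estimate of the value of a target policy from logged bandit data.

    y           observed rewards
    a           logged binary actions (0/1)
    mu_hat      (n, 2) outcome-model predictions mu_hat(x, a) for a = 0, 1
    pi_target   (n, 2) target-policy action probabilities
    pi_logging  (n, 2) logging-policy action probabilities
    """
    idx = np.arange(len(y))
    direct = (pi_target * mu_hat).sum(axis=1)              # plug-in (direct method) term
    weights = pi_target[idx, a] / pi_logging[idx, a]       # importance weights
    correction = weights * (y - mu_hat[idx, a])            # residual correction term
    return np.mean(direct + correction)

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
p1 = 0.2 + 0.6 * (x > 0)                                   # logging propensity for action 1
pi_logging = np.column_stack([1 - p1, p1])
a = (rng.random(n) < p1).astype(int)
y = 1.0 * a + x + rng.normal(size=n)
mu_hat = np.column_stack([x, x + 1.0])                     # a (here well-specified) outcome model
pi_target = np.column_stack([np.zeros(n), np.ones(n)])     # evaluate the always-treat policy
print(dr_policy_value(y, a, mu_hat, pi_target, pi_logging))  # close to E[Y(1)] = 1
```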

arXiv link: http://arxiv.org/abs/2106.02029v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-06-01

Retrospective causal inference via matrix completion, with an evaluation of the effect of European integration on cross-border employment

Authors: Jason Poulos, Andrea Albanese, Andrea Mercatanti, Fan Li

We propose a method of retrospective counterfactual imputation in panel data
settings with later-treated and always-treated units, but no never-treated
units. We use the observed outcomes to impute the counterfactual outcomes of
the later-treated using a matrix completion estimator. We propose a novel
propensity-score and elapsed-time weighting of the estimator's objective
function to correct for differences in the observed covariate and unobserved
fixed effects distributions, and elapsed time since treatment between groups.
Our methodology is motivated by studying the effect of two milestones of
European integration -- the Free Movement of persons and the Schengen Agreement
-- on the share of cross-border workers in sending border regions. We apply the
proposed method to the European Labour Force Survey (ELFS) data and provide
evidence that opening the border almost doubled the probability of working
beyond the border in Eastern European regions.

arXiv link: http://arxiv.org/abs/2106.00788v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2021-05-31

A Simple and General Debiased Machine Learning Theorem with Finite Sample Guarantees

Authors: Victor Chernozhukov, Whitney K. Newey, Rahul Singh

Debiased machine learning is a meta algorithm based on bias correction and
sample splitting to calculate confidence intervals for functionals, i.e. scalar
summaries, of machine learning algorithms. For example, an analyst may desire
the confidence interval for a treatment effect estimated with a neural network.
We provide a nonasymptotic debiased machine learning theorem that encompasses
any global or local functional of any machine learning algorithm that satisfies
a few simple, interpretable conditions. Formally, we prove consistency,
Gaussian approximation, and semiparametric efficiency by finite sample
arguments. The rate of convergence is $n^{-1/2}$ for global functionals, and it
degrades gracefully for local functionals. Our results culminate in a simple
set of conditions that an analyst can use to translate modern learning theory
rates into traditional statistical inference. The conditions reveal a general
double robustness property for ill-posed inverse problems.
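
One familiar global functional covered by results of this kind is the average treatment
effect under unconfoundedness. The sketch below is a generic cross-fitted AIPW/debiased
estimator with random-forest nuisance learners; the learners, the sample-splitting
scheme, and the toy data are my own choices and are not tied to this paper's conditions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold

def dml_ate(y, d, X, n_splits=5, seed=0):
    """Cross-fitted AIPW (debiased ML) estimate of the ATE and its standard error."""
    psi = np.empty(len(y))
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        m1 = RandomForestRegressor(random_state=seed).fit(X[train][d[train] == 1], y[train][d[train] == 1])
        m0 = RandomForestRegressor(random_state=seed).fit(X[train][d[train] == 0], y[train][d[train] == 0])
        ps = RandomForestClassifier(random_state=seed).fit(X[train], d[train])
        p = np.clip(ps.predict_proba(X[test])[:, 1], 0.01, 0.99)   # trimmed propensity score
        mu1, mu0 = m1.predict(X[test]), m0.predict(X[test])
        psi[test] = (mu1 - mu0
                     + d[test] * (y[test] - mu1) / p
                     - (1 - d[test]) * (y[test] - mu0) / (1 - p))
    return psi.mean(), psi.std(ddof=1) / np.sqrt(len(y))

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
d = (rng.random(2000) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)
y = 1.0 * d + X[:, 0] + rng.normal(size=2000)
print(dml_ate(y, d, X))   # point estimate should be near the true ATE of 1
```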

arXiv link: http://arxiv.org/abs/2105.15197v3

Econometrics arXiv updated paper (originally submitted: 2021-05-31)

Regression-Adjusted Estimation of Quantile Treatment Effects under Covariate-Adaptive Randomizations

Authors: Liang Jiang, Peter C. B. Phillips, Yubo Tao, Yichong Zhang

Datasets from field experiments with covariate-adaptive randomizations (CARs)
usually contain extra covariates in addition to the strata indicators. We
propose to incorporate these additional covariates via auxiliary regressions in
the estimation and inference of unconditional quantile treatment effects (QTEs)
under CARs. We establish the consistency and limit distribution of the
regression-adjusted QTE estimator and prove that the use of multiplier
bootstrap inference is non-conservative under CARs. The auxiliary regression
may be estimated parametrically, nonparametrically, or via regularization when
the data are high-dimensional. Even when the auxiliary regression is
misspecified, the proposed bootstrap inferential procedure still achieves the
nominal rejection probability in the limit under the null. When the auxiliary
regression is correctly specified, the regression-adjusted estimator achieves
the minimum asymptotic variance. We also discuss forms of adjustments that can
improve the efficiency of the QTE estimators. The finite sample performance of
the new estimation and inferential methods is studied in simulations and an
empirical application to a well-known dataset on the effect of expanding access
to basic bank accounts on savings is reported.

arXiv link: http://arxiv.org/abs/2105.14752v4

Econometrics arXiv paper, submitted: 2021-05-29

Asset volatility forecasting: The optimal decay parameter in the EWMA model

Authors: Axel A. Araneda

The exponentially weighted moving average (EWMA) is a competitive volatility
estimator whose main strength lies in its computational simplicity, especially
in a multi-asset scenario, because it depends only on the decay parameter,
$\lambda$. But what is the best choice of $\lambda$ in the EWMA volatility
model? Using a large time-series data set of historical returns of the top US
large-cap companies, we empirically test the forecasting performance of the
EWMA approach under different time horizons and varying decay parameters. Using
a rolling window scheme, the out-of-sample performance of the
variance-covariance matrix is computed following two approaches. First, when a
single decay parameter is chosen for the full sample, the results agree with
the RiskMetrics suggestion for 1-month forecasting. In addition, we provide the
full-sample optimal decay parameter for the weekly and bi-weekly forecasting
horizons, confirming two facts: i) the optimal value is a function of the
forecasting horizon, and ii) for shorter forecasting horizons short-term memory
gains importance. Second, we evaluate the forecasting performance of EWMA using
the optimal time-varying decay parameter that minimizes the in-sample error of
the variance-covariance estimator, achieving better accuracy than a fixed
full-sample optimal parameter.
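
The recursion being tuned is \( \sigma^2_t = \lambda\,\sigma^2_{t-1} + (1-\lambda)\,r^2_{t-1} \).
A minimal sketch of the one-step-ahead forecast and a simple in-sample grid search over
\(\lambda\) (my own toy data and loss, not the paper's procedure):

```python
import numpy as np

def ewma_variance(returns, lam):
    """One-step-ahead EWMA variance forecasts: s2_t = lam*s2_{t-1} + (1-lam)*r_{t-1}^2."""
    s2 = np.empty(len(returns))
    s2[0] = np.var(returns)            # simple initialization
    for t in range(1, len(returns)):
        s2[t] = lam * s2[t - 1] + (1 - lam) * returns[t - 1] ** 2
    return s2

rng = np.random.default_rng(0)
r = 0.01 * rng.standard_t(df=5, size=2000)      # toy heavy-tailed daily returns
grid = np.arange(0.80, 1.00, 0.01)
# Pick the decay that minimizes squared forecast errors of r_t^2 (a simple MSE criterion).
losses = [np.mean((r[1:] ** 2 - ewma_variance(r, lam)[1:]) ** 2) for lam in grid]
print(grid[int(np.argmin(losses))])
```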

arXiv link: http://arxiv.org/abs/2105.14382v1

Econometrics arXiv updated paper (originally submitted: 2021-05-29)

Crime and Mismeasured Punishment: Marginal Treatment Effect with Misclassification

Authors: Vitor Possebom

I partially identify the marginal treatment effect (MTE) when the treatment
is misclassified. I explore two restrictions, allowing for dependence between
the instrument and the misclassification decision. If the signs of the
derivatives of the propensity scores are equal, I identify the MTE sign. If
those derivatives are similar, I bound the MTE. To illustrate, I analyze the
impact of alternative sentences (fines and community service v. no punishment)
on recidivism in Brazil, where Appeals processes generate misclassification.
The estimated misclassification bias may be as large as 10% of the largest
possible MTE, and the bounds contain the correctly estimated MTE.

arXiv link: http://arxiv.org/abs/2106.00536v7

Econometrics arXiv paper, submitted: 2021-05-28

Specification tests for GARCH processes

Authors: Giuseppe Cavaliere, Indeewara Perera, Anders Rahbek

This paper develops tests for the correct specification of the conditional
variance function in GARCH models when the true parameter may lie on the
boundary of the parameter space. The test statistics considered are of
Kolmogorov-Smirnov and Cram\'{e}r-von Mises type, and are based on a certain
empirical process marked by centered squared residuals. The limiting
distributions of the test statistics are not free from (unknown) nuisance
parameters, and hence critical values cannot be tabulated. A novel bootstrap
procedure is proposed to implement the tests; it is shown to be asymptotically
valid under general conditions, irrespective of the presence of nuisance
parameters on the boundary. The proposed bootstrap approach is based on
shrinking of the parameter estimates used to generate the bootstrap sample
toward the boundary of the parameter space at a proper rate. It is simple to
implement and fast in applications, as the associated test statistics have
simple closed form expressions. A simulation study demonstrates that the new
tests: (i) have excellent finite sample behavior in terms of empirical
rejection probabilities under the null as well as under the alternative; (ii)
provide a useful complement to existing procedures based on Ljung-Box type
approaches. Two data examples are considered to illustrate the tests.

arXiv link: http://arxiv.org/abs/2105.14081v1

Econometrics arXiv updated paper (originally submitted: 2021-05-27)

Identification and Estimation of Partial Effects in Nonlinear Semiparametric Panel Models

Authors: Laura Liu, Alexandre Poirier, Ji-Liang Shiu

Average partial effects (APEs) are often not point identified in panel models
with unrestricted unobserved individual heterogeneity, such as a binary
response panel model with fixed effects and logistic errors as a special case.
This lack of point identification occurs despite the identification of these
models' common coefficients. We provide a unified framework to establish the
point identification of various partial effects in a wide class of nonlinear
semiparametric models under an index sufficiency assumption on the unobserved
heterogeneity, even when the error distribution is unspecified and
non-stationary. This assumption does not impose parametric restrictions on the
unobserved heterogeneity and idiosyncratic errors. We also present partial
identification results when the support condition fails. We then propose
three-step semiparametric estimators for APEs, average structural functions,
and average marginal effects, and show their consistency and asymptotic
normality. Finally, we illustrate our approach in a study of determinants of
married women's labor supply.

arXiv link: http://arxiv.org/abs/2105.12891v6

Econometrics arXiv updated paper (originally submitted: 2021-05-26)

A Structural Model of Business Card Exchange Networks

Authors: Juan Nelson Martínez Dahbura, Shota Komatsu, Takanori Nishida, Angelo Mele

Social and professional networks affect labor market dynamics, knowledge
diffusion and new business creation. To understand the determinants of how
these networks are formed in the first place, we analyze a unique dataset of
business card exchanges among a sample of over 240,000 users of Eight, a
multi-platform contact management and professional social networking tool for
individuals. We develop a structural model of network formation with
strategic interactions, and we estimate users' payoffs that depend on the
composition of business relationships, as well as indirect business
interactions. We allow heterogeneity of users in both observable and
unobservable characteristics to affect how relationships form and are
maintained. The model's stationary equilibrium delivers a likelihood that is a
mixture of exponential random graph models that we can characterize in
closed-form. We overcome several econometric and computational challenges in
estimation, by exploiting a two-step estimation procedure, variational
approximations and minorization-maximization methods. Our algorithm is
scalable, highly parallelizable and makes efficient use of computer memory to
allow estimation in massive networks. We show that users' payoffs display
homophily in several dimensions, e.g., location; furthermore, users'
unobservable characteristics also display homophily.

arXiv link: http://arxiv.org/abs/2105.12704v3

Econometrics arXiv cross-link from math.OC (math.OC), submitted: 2021-05-26

A data-driven approach to beating SAA out-of-sample

Authors: Jun-ya Gotoh, Michael Jong Kim, Andrew E. B. Lim

While solutions of Distributionally Robust Optimization (DRO) problems can
sometimes have a higher out-of-sample expected reward than the Sample Average
Approximation (SAA), there is no guarantee. In this paper, we introduce a class
of Distributionally Optimistic Optimization (DOO) models, and show that it is
always possible to "beat" SAA out-of-sample if we consider not just worst-case
(DRO) models but also best-case (DOO) ones. We also show, however, that this
comes at a cost: Optimistic solutions are more sensitive to model error than
either worst-case or SAA optimizers, and hence are less robust; moreover,
calibrating the worst- or best-case model to outperform SAA may be difficult
when data is limited.

arXiv link: http://arxiv.org/abs/2105.12342v3

Econometrics arXiv paper, submitted: 2021-05-25

Measuring Financial Advice: aligning client elicited and revealed risk

Authors: John R. J. Thompson, Longlong Feng, R. Mark Reesor, Chuck Grace, Adam Metzler

Financial advisors use questionnaires and discussions with clients to
determine a suitable portfolio of assets that will allow clients to reach their
investment objectives. Financial institutions assign risk ratings to each
security they offer, and those ratings are used to guide clients and advisors
to choose an investment portfolio risk that suits their stated risk tolerance.
This paper compares client Know Your Client (KYC) profile risk allocations to
their investment portfolio risk selections using a value-at-risk discrepancy
methodology. Value-at-risk is used to measure elicited and revealed risk to
show whether clients are over-risked or under-risked, whether changes in KYC
risk lead to changes in portfolio configuration, and whether cash flow affects a client's
portfolio risk. We demonstrate the effectiveness of value-at-risk at measuring
clients' elicited and revealed risk on a dataset provided by a private Canadian
financial dealership covering over 50,000 accounts for over 27,000 clients and
300 advisors. By measuring both elicited and revealed risk using the same
measure, we can determine how well a client's portfolio aligns with their
stated goals. We believe that using value-at-risk to measure client risk
provides valuable insight to advisors to ensure that their practice is KYC
compliant, to better tailor their client portfolios to stated goals, to
communicate advice to clients to either align their portfolios with stated goals
or refresh their goals, and to monitor changes to the clients' risk positions
across their practice.
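
Value-at-risk itself is just a tail quantile of the return (loss) distribution. A minimal
historical-simulation version, shown as a generic illustration rather than the
dealership's methodology:

```python
import numpy as np

def historical_var(returns, alpha=0.95):
    """Historical-simulation value-at-risk: the loss level exceeded with
    probability (1 - alpha) in the historical sample."""
    return -np.quantile(returns, 1 - alpha)

rng = np.random.default_rng(0)
portfolio_returns = 0.01 * rng.standard_t(df=4, size=1000)   # toy heavy-tailed daily returns
print(historical_var(portfolio_returns, alpha=0.95))          # 95% one-day VaR of the toy portfolio
```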

arXiv link: http://arxiv.org/abs/2105.11892v1

Econometrics arXiv paper, submitted: 2021-05-24

Vector autoregression models with skewness and heavy tails

Authors: Sune Karlsson, Stepan Mazur, Hoang Nguyen

With uncertain changes of the economic environment, macroeconomic downturns
during recessions and crises can hardly be explained by a Gaussian structural
shock. There is evidence that the distribution of macroeconomic variables is
skewed and heavy tailed. In this paper, we contribute to the literature by
extending a vector autoregression (VAR) model to account for a more realistic
assumption of the multivariate distribution of the macroeconomic variables. We
propose a general class of generalized hyperbolic skew Student's t distribution
with stochastic volatility for the error term in the VAR model that allows us
to take into account skewness and heavy tails. Tools for Bayesian inference and
model selection using a Gibbs sampler are provided. In an empirical study, we
present evidence of skewness and heavy tails for monthly macroeconomic
variables. The analysis also gives a clear message that skewness should be
taken into account for better predictions during recessions and crises.

arXiv link: http://arxiv.org/abs/2105.11182v1

Econometrics arXiv paper, submitted: 2021-05-23

Inference for multi-valued heterogeneous treatment effects when the number of treated units is small

Authors: Marina Dias, Demian Pouzo

We propose a method for conducting asymptotically valid inference for
treatment effects in a multi-valued treatment framework where the number of
units in the treatment arms can be small and do not grow with the sample size.
We accomplish this by casting the model as a semi-/non-parametric conditional
quantile model and using known finite sample results about the law of the
indicator function that defines the conditional quantile. Our framework allows
for structural functions that are non-additively separable, with flexible
functional forms and heteroskedasticity in the residuals, and it also encompasses
commonly used designs like difference-in-differences. We study the finite sample
behavior of our test in a Monte Carlo study and we also apply our results to
assessing the effect of weather events on GDP growth.

arXiv link: http://arxiv.org/abs/2105.10965v1

Econometrics arXiv paper, submitted: 2021-05-20

Identification and Estimation of a Partially Linear Regression Model using Network Data: Inference and an Application to Network Peer Effects

Authors: Eric Auerbach

This paper provides additional results relevant to the setting, model, and
estimators of Auerbach (2019a). Section 1 contains results about the large
sample properties of the estimators from Section 2 of Auerbach (2019a). Section
2 considers some extensions to the model. Section 3 provides an application to
estimating network peer effects. Section 4 shows the results from some
simulations.

arXiv link: http://arxiv.org/abs/2105.10002v1

Econometrics arXiv paper, submitted: 2021-05-20

Two Sample Unconditional Quantile Effect

Authors: Atsushi Inoue, Tong Li, Qi Xu

This paper proposes a new framework to evaluate unconditional quantile
effects (UQE) in a data combination model. The UQE measures the effect of a
marginal counterfactual change in the unconditional distribution of a covariate
on quantiles of the unconditional distribution of a target outcome. Under rank
similarity and conditional independence assumptions, we provide a set of
identification results for UQEs when the target covariate is continuously
distributed and when it is discrete, respectively. Based on these
identification results, we propose semiparametric estimators and establish
their large sample properties under primitive conditions. Applying our method
to a variant of Mincer's earnings function, we study the counterfactual
quantile effect of actual work experience on income.

arXiv link: http://arxiv.org/abs/2105.09445v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2021-05-19

Multiply Robust Causal Mediation Analysis with Continuous Treatments

Authors: Yizhen Xu, Numair Sani, AmirEmad Ghassami, Ilya Shpitser

In many applications, researchers are interested in the direct and indirect
causal effects of a treatment or exposure on an outcome of interest. Mediation
analysis offers a rigorous framework for identifying and estimating these
causal effects. For binary treatments, efficient estimators for the direct and
indirect effects are presented by Tchetgen Tchetgen and Shpitser (2012) based
on the influence function of the parameter of interest. These estimators
possess desirable properties such as multiple-robustness and asymptotic
normality while allowing for slower than root-n rates of convergence for the
nuisance parameters. However, in settings involving continuous treatments,
these influence function-based estimators are not readily applicable without
making strong parametric assumptions. In this work, utilizing a
kernel-smoothing approach, we propose an estimator suitable for settings with
continuous treatments inspired by the influence function-based estimator of
Tchetgen Tchetgen and Shpitser (2012). Our proposed approach employs
cross-fitting, relaxing the smoothness requirements on the nuisance functions
and allowing them to be estimated at slower rates than the target parameter.
Additionally, similar to influence function-based estimators, our proposed
estimator is multiply robust and asymptotically normal, allowing for inference
in settings where parametric assumptions may not be justified.

arXiv link: http://arxiv.org/abs/2105.09254v3

Econometrics arXiv updated paper (originally submitted: 2021-05-18)

Trading-off Bias and Variance in Stratified Experiments and in Matching Studies, Under a Boundedness Condition on the Magnitude of the Treatment Effect

Authors: Clément de Chaisemartin

I consider estimation of the average treatment effect (ATE), in a population
composed of $S$ groups or units, when one has unbiased estimators of each
group's conditional average treatment effect (CATE). These conditions are met
in stratified experiments and in matching studies. I assume that each CATE is
bounded in absolute value by $B$ standard deviations of the outcome, for some
known $B$. This restriction may be appealing: outcomes are often standardized
in applied work, so researchers can use available literature to determine a
plausible value for $B$. I derive, across all linear combinations of the CATEs'
estimators, the minimax estimator of the ATE. In two stratified experiments, my
estimator has a worst-case mean-squared error that is half that of the commonly
used strata-fixed-effects estimator. In a matching study with limited overlap,
my estimator achieves 56% of the precision gains of a commonly used trimming
estimator, with a worst-case mean-squared error 11 times smaller.

arXiv link: http://arxiv.org/abs/2105.08766v6

Econometrics arXiv updated paper (originally submitted: 2021-05-18)

Incorporating Social Welfare in Program-Evaluation and Treatment Choice

Authors: Debopam Bhattacharya, Tatiana Komarova

The econometric literature on treatment-effects typically takes functionals
of outcome-distributions as `social welfare' and ignores program-impacts on
unobserved utilities. We show how to incorporate aggregate utility within
econometric program-evaluation and optimal treatment-targeting for a
heterogenous population. In the practically important setting of
discrete-choice, under unrestricted preference-heterogeneity and
income-effects, the indirect-utility distribution becomes a closed-form
functional of average demand. This enables nonparametric cost-benefit analysis
of policy-interventions and their optimal targeting based on planners'
redistributional preferences. For ordered/continuous choice,
utility-distributions can be bounded. Our methods are illustrated with Indian
survey-data on private-tuition, where income-paths of usage-maximizing
subsidies differ significantly from welfare-maximizing ones.

arXiv link: http://arxiv.org/abs/2105.08689v2

Econometrics arXiv paper, submitted: 2021-05-18

Identification robust inference for moments based analysis of linear dynamic panel data models

Authors: Maurice J. G. Bun, Frank Kleibergen

We use identification robust tests to show that difference, level and
non-linear moment conditions, as proposed by Arellano and Bond (1991), Arellano
and Bover (1995), Blundell and Bond (1998) and Ahn and Schmidt (1995) for the
linear dynamic panel data model, do not separately identify the autoregressive
parameter when its true value is close to one and the variance of the initial
observations is large. We prove that combinations of these moment conditions,
however, do so when there are more than three time series observations. This
identification then solely results from a set of so-called robust moment
conditions. These robust moments are spanned by the combined difference, level
and non-linear moment conditions and only depend on differenced data. We show
that, when only the robust moments contain identifying information on the
autoregressive parameter, the discriminatory power of the Kleibergen (2005) LM
test using the combined moments is identical to the largest rejection
frequencies that can be obtained from solely using the robust moments. This
shows that the KLM test implicitly uses the robust moments when only they
contain information on the autoregressive parameter.

arXiv link: http://arxiv.org/abs/2105.08346v1

Econometrics arXiv paper, submitted: 2021-05-18

Double robust inference for continuous updating GMM

Authors: Frank Kleibergen, Zhaoguo Zhan

We propose the double robust Lagrange multiplier (DRLM) statistic for testing
hypotheses specified on the pseudo-true value of the structural parameters in
the generalized method of moments. The pseudo-true value is defined as the
minimizer of the population continuous updating objective function and equals
the true value of the structural parameter in the absence of
misspecification. The (bounding) chi-squared limiting
distribution of the DRLM statistic is robust to both misspecification and weak
identification of the structural parameters, hence its name. To emphasize its
importance for applied work, we use the DRLM test to analyze the return on
education, which is often perceived to be weakly identified, using data from
Card (1995) where misspecification occurs in case of treatment heterogeneity;
and to analyze the risk premia associated with risk factors proposed in Adrian
et al. (2014) and He et al. (2017), where both misspecification and weak
identification need to be addressed.

arXiv link: http://arxiv.org/abs/2105.08345v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2021-05-17

Choice Set Confounding in Discrete Choice

Authors: Kiran Tomlinson, Johan Ugander, Austin R. Benson

Standard methods in preference learning involve estimating the parameters of
discrete choice models from data of selections (choices) made by individuals
from a discrete set of alternatives (the choice set). While there are many
models for individual preferences, existing learning methods overlook how
choice set assignment affects the data. Often, the choice set itself is
influenced by an individual's preferences; for instance, a consumer choosing a
product from an online retailer is often presented with options from a
recommender system that depend on information about the consumer's preferences.
Ignoring these assignment mechanisms can mislead choice models into making
biased estimates of preferences, a phenomenon that we call choice set
confounding; we demonstrate the presence of such confounding in widely-used
choice datasets.
To address this issue, we adapt methods from causal inference to the discrete
choice setting. We use covariates of the chooser for inverse probability
weighting and/or regression controls, accurately recovering individual
preferences in the presence of choice set confounding under certain
assumptions. When such covariates are unavailable or inadequate, we develop
methods that take advantage of structured choice set assignment to improve
prediction. We demonstrate the effectiveness of our methods on real-world
choice data, showing, for example, that accounting for choice set confounding
makes choices observed in hotel booking and commute transportation more
consistent with rational utility-maximization.

arXiv link: http://arxiv.org/abs/2105.07959v2

Econometrics arXiv cross-link from cs.GT (cs.GT), submitted: 2021-05-17

Putting a Compass on the Map of Elections

Authors: Niclas Boehmer, Robert Bredereck, Piotr Faliszewski, Rolf Niedermeier, Stanisław Szufa

Recently, Szufa et al. [AAMAS 2020] presented a "map of elections" that
visualizes a set of 800 elections generated from various statistical cultures.
While similar elections are grouped together on this map, there is no obvious
interpretation of the elections' positions. We provide such an interpretation
by introducing four canonical "extreme" elections, acting as a compass on the
map. We use them to analyze both a dataset provided by Szufa et al. and a
number of real-life elections. In effect, we find a new variant of the Mallows
model and show that it captures real-life scenarios particularly well.

arXiv link: http://arxiv.org/abs/2105.07815v1

Econometrics arXiv paper, submitted: 2021-05-17

Using social network and semantic analysis to analyze online travel forums and forecast tourism demand

Authors: A Fronzetti Colladon, B Guardabascio, R Innarella

Forecasting tourism demand has important implications for both policy makers
and companies operating in the tourism industry. In this research, we applied
methods and tools of social network and semantic analysis to study
user-generated content retrieved from online communities which interacted on
the TripAdvisor travel forum. We analyzed the forums of 7 major European
capital cities, over a period of 10 years, collecting more than 2,660,000
posts, written by about 147,000 users. We present a new methodology of analysis
of tourism-related big data and a set of variables which could be integrated
into traditional forecasting models. We implemented Factor Augmented
Autoregressive and Bridge models with social network and semantic variables
which often led to a better forecasting performance than univariate models and
models based on Google Trends data. Forum language complexity and the
centralization of the communication network, i.e., the presence of eminent
contributors, were the variables that contributed most to the forecasting of
international airport arrivals.

arXiv link: http://arxiv.org/abs/2105.07727v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2021-05-17

Classifying variety of customer's online engagement for churn prediction with mixed-penalty logistic regression

Authors: Petra Posedel Šimović, Davor Horvatic, Edward W. Sun

Using big data to analyze consumer behavior can provide effective
decision-making tools for preventing customer attrition (churn) in customer
relationship management (CRM). Focusing on a CRM dataset with several different
categories of factors that impact customer heterogeneity (i.e., usage of
self-care service channels, duration of service, and responsiveness to
marketing actions), we provide new predictive analytics of customer churn rate
based on a machine learning method that enhances the classification of logistic
regression by adding a mixed penalty term. The proposed penalized logistic
regression can prevent overfitting when dealing with big data and minimize the
loss function when balancing the cost from the median (absolute value) and mean
(squared value) regularization. We show the analytical properties of the
proposed method and its computational advantage in this research. In addition,
we investigate the performance of the proposed method with a CRM data set (that
has a large number of features) under different settings by efficiently
eliminating the disturbance of (1) least important features and (2) sensitivity
from the minority (churn) class. Our empirical results confirm the expected
performance of the proposed method in full compliance with the common
classification criteria (i.e., accuracy, precision, and recall) for evaluating
machine learning methods.
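
A penalty mixing absolute-value and squared terms is in the spirit of elastic-net
regularized logistic regression; the sketch below uses scikit-learn's generic
implementation on synthetic churn data. Treating the mixed penalty as an elastic net is
my assumption about its flavor, not the authors' exact estimator.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n, p = 5000, 40
X = rng.normal(size=(n, p))                       # toy CRM features
beta = np.concatenate([np.array([1.5, -1.0, 0.8]), np.zeros(p - 3)])
churn = rng.random(n) < 1 / (1 + np.exp(-(X @ beta - 1.0)))   # sparse true signal

model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="elasticnet", solver="saga",
                       l1_ratio=0.5, C=1.0, max_iter=5000),
)
model.fit(X, churn)
print(model.named_steps["logisticregression"].coef_.round(2))  # most coefficients shrink to ~0
```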

arXiv link: http://arxiv.org/abs/2105.07671v2

Econometrics arXiv updated paper (originally submitted: 2021-05-16)

Uniform Inference on High-dimensional Spatial Panel Networks

Authors: Victor Chernozhukov, Chen Huang, Weining Wang

We propose employing a high-dimensional generalized method of moments (GMM)
estimator, regularized for dimension reduction and subsequently debiased to
correct for shrinkage bias (referred to as a debiased-regularized estimator),
for inference on large-scale spatial panel networks. In particular, the network
structure, which incorporates a flexible sparse deviation that can be regarded
either as a latent component or as a misspecification of a predetermined
adjacency matrix, is estimated using a debiased machine learning approach. The
theoretical analysis establishes the consistency and asymptotic normality of
our proposed estimator, taking into account general temporal and spatial
dependencies inherent in the data-generating processes. A primary contribution
of our study is the development of a uniform inference theory, which enables
hypothesis testing on the parameters of interest, including zero or non-zero
elements in the network structure. Additionally, the asymptotic properties of
the estimator are derived for both linear and nonlinear moments. Simulations
demonstrate the superior performance of our proposed approach. Finally, we
apply our methodology to investigate the spatial network effects of stock
returns.

arXiv link: http://arxiv.org/abs/2105.07424v5

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2021-05-15

Cohort Shapley value for algorithmic fairness

Authors: Masayoshi Mase, Art B. Owen, Benjamin B. Seiler

Cohort Shapley value is a model-free method of variable importance grounded
in game theory that does not use any unobserved and potentially impossible
feature combinations. We use it to evaluate algorithmic fairness, using the
well known COMPAS recidivism data as our example. This approach allows one to
identify for each individual in a data set the extent to which they were
adversely or beneficially affected by their value of a protected attribute such
as their race. The method can do this even if race was not one of the original
predictors and even if it does not have access to a proprietary algorithm that
has made the predictions. The grounding in game theory lets us define aggregate
variable importance for a data set consistently with its per subject
definitions. We can investigate variable importance for multiple quantities of
interest in the fairness literature including false positive predictions.

arXiv link: http://arxiv.org/abs/2105.07168v1

Econometrics arXiv updated paper (originally submitted: 2021-05-14)

Policy Evaluation during a Pandemic

Authors: Brantly Callaway, Tong Li

National and local governments have implemented a large number of policies in
response to the Covid-19 pandemic. Evaluating the effects of these policies,
both on the number of Covid-19 cases as well as on other economic outcomes is a
key ingredient for policymakers to be able to determine which policies are most
effective as well as the relative costs and benefits of particular policies. In
this paper, we consider the relative merits of common identification strategies
that exploit variation in the timing of policies across different locations by
checking whether the identification strategies are compatible with leading
epidemic models in the epidemiology literature. We argue that unconfoundedness
type approaches, that condition on the pre-treatment "state" of the pandemic,
are likely to be more useful for evaluating policies than
difference-in-differences type approaches due to the highly nonlinear spread of
cases during a pandemic. For difference-in-differences, we further show that a
version of this problem continues to exist even when one is interested in
understanding the effect of a policy on other economic outcomes when those
outcomes also depend on the number of Covid-19 cases. We propose alternative
approaches that are able to circumvent these issues. We apply our proposed
approach to study the effect of state level shelter-in-place orders early in
the pandemic.

arXiv link: http://arxiv.org/abs/2105.06927v2

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2021-05-14

Characterization of the probability and information entropy of a process with an exponentially increasing sample space and its application to the Broad Money Supply

Authors: Laurence F Lacey

Consider a random variable (X) with a determined outcome (i.e., X = x0). Let x0
have a discrete uniform distribution over the integer interval [1, s], where the
size of the sample space is s = 1 in the initial state, so that p(x0) = 1. What
is the probability of x0 and the associated
information entropy (H), as s increases exponentially? If the sample space
expansion occurs at an exponential rate (rate constant = lambda) with time (t)
and applying time scaling, such that T = lambda t, gives p(x0|T) = exp(-T) and
H(T)=T. The characterization has also been extended to include exponential
expansion by means of simultaneous, independent processes, as well as the more
general multi-exponential case. The methodology was applied to the expansion of
the US$ broad money supply over the period 2001-2019, as a real-world
example. At any given time, the information entropy is related to the rate at
which the sample space is expanding. In the context of the expansion of the
broad money supply, the information entropy could be considered to be related
to the "velocity" of the expansion of the money supply.

arXiv link: http://arxiv.org/abs/2105.14193v1

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2021-05-13

Dynamic Portfolio Allocation in High Dimensions using Sparse Risk Factors

Authors: Bruno P. C. Levy, Hedibert F. Lopes

We propose a fast and flexible method to scale multivariate return volatility
predictions up to high-dimensions using a dynamic risk factor model. Our
approach increases parsimony via time-varying sparsity on factor loadings and
is able to sequentially learn the use of constant or time-varying parameters
and volatilities. We show in a dynamic portfolio allocation problem with 452
stocks from the S&P 500 index that our dynamic risk factor model is able to
produce more stable and sparse predictions, achieving not just considerable
portfolio performance improvements but also higher utility gains for the
mean-variance investor compared to the traditional Wishart benchmark and the
passive investment on the market index.

arXiv link: http://arxiv.org/abs/2105.06584v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-05-12

Generalized Autoregressive Moving Average Models with GARCH Errors

Authors: Tingguo Zheng, Han Xiao, Rong Chen

One of the important and widely used classes of models for non-Gaussian time
series is the class of generalized autoregressive moving average (GARMA) models, which
specifies an ARMA structure for the conditional mean process of the underlying
time series. However, in many applications one often encounters conditional
heteroskedasticity. In this paper we propose a new class of models, referred to
as GARMA-GARCH models, that jointly specify both the conditional mean and
conditional variance processes of a general non-Gaussian time series. Under the
general modeling framework, we propose three specific models, as examples, for
proportional time series, nonnegative time series, and skewed and heavy-tailed
financial time series. Maximum likelihood estimator (MLE) and quasi Gaussian
MLE (GMLE) are used to estimate the parameters. Simulation studies and three
applications are used to demonstrate the properties of the models and the
estimation procedures.

arXiv link: http://arxiv.org/abs/2105.05532v1

Econometrics arXiv updated paper (originally submitted: 2021-05-11)

Robust Inference on Income Inequality: $t$-Statistic Based Approaches

Authors: Rustam Ibragimov, Paul Kattuman, Anton Skrobotov

Empirical analyses on income and wealth inequality and those in other fields
in economics and finance often face the difficulty that the data is
heterogeneous, heavy-tailed or correlated in some unknown fashion. The paper
focuses on applications of the recently developed t-statistic based
robust inference approaches in the analysis of inequality measures and their
comparisons under the above problems. Following the approaches, in particular,
a robust large sample test on equality of two parameters of interest (e.g., a
test of equality of inequality measures in two regions or countries considered)
is conducted as follows: The data in the two samples dealt with is partitioned
into fixed numbers $q_1, q_2\ge 2$ (e.g., $q_1=q_2=2, 4, 8$) of groups, the
parameters (inequality measures dealt with) are estimated for each group, and
inference is based on a standard two-sample $t$-test with the resulting $q_1,
q_2$ group estimators. Robust $t$-statistic approaches result in valid
inference under general conditions that group estimators of parameters (e.g.,
inequality measures) considered are asymptotically independent, unbiased and
Gaussian of possibly different variances, or weakly converge, at an arbitrary
rate, to independent scale mixtures of normal random variables. These
conditions are typically satisfied in empirical applications even under
pronounced heavy-tailedness and heterogeneity and possible dependence in
observations. The methods dealt with in the paper complement and compare
favorably with other inference approaches available in the literature. The use
of robust inference approaches is illustrated by an empirical analysis of
income inequality measures and their comparisons across different regions in
Russia.
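
The group-based procedure is easy to emulate. The sketch below compares Gini
coefficients across two synthetic regions using q1 = q2 = 4 groups and a two-sample
t-test on the group estimates; the plug-in Gini estimator, the group count, and the
random partition are all my own illustrative choices.

```python
import numpy as np
from scipy import stats

def gini(x):
    """Plug-in Gini coefficient of a sample of incomes."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    return 2 * np.sum(np.arange(1, n + 1) * x) / (n * x.sum()) - (n + 1) / n

def group_estimates(sample, q, rng):
    """Randomly partition the sample into q groups and estimate the Gini in each."""
    idx = rng.permutation(len(sample))
    return np.array([gini(sample[chunk]) for chunk in np.array_split(idx, q)])

rng = np.random.default_rng(0)
region_a = rng.lognormal(mean=0.0, sigma=0.8, size=4000)   # toy incomes, region A
region_b = rng.lognormal(mean=0.0, sigma=1.0, size=4000)   # toy incomes, region B

g_a = group_estimates(region_a, q=4, rng=rng)
g_b = group_estimates(region_b, q=4, rng=rng)
# Robust inference: an ordinary two-sample t-test on the q1 + q2 group estimates.
print(stats.ttest_ind(g_a, g_b, equal_var=False))
```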

arXiv link: http://arxiv.org/abs/2105.05335v2

Econometrics arXiv updated paper (originally submitted: 2021-05-10)

Efficient Peer Effects Estimators with Group Effects

Authors: Guido M. Kuersteiner, Ingmar R. Prucha, Ying Zeng

We study linear peer effects models where peers interact in groups,
individuals' outcomes are linear in the group mean outcome and characteristics,
and group effects are random. Our specification is motivated by the moment
conditions imposed in Graham (2008). We show that these moment conditions can be
cast in terms of a linear random group effects model and lead to a class of GMM
estimators that are generally identified as long as there is sufficient
variation in group size. We also show that our class of GMM estimators contains
a Quasi Maximum Likelihood estimator (QMLE) for the random group effects model,
as well as the Wald estimator of Graham (2008) and the within estimator of Lee
(2007) as special cases. Our identification results extend insights in Graham
(2008) that show how assumptions about random group effects as well as variation
in group size can be used to overcome the reflection problem in identifying
peer effects. Our QMLE and GMM estimators accommodate additional covariates and
are valid in situations with a large but finite number of different group sizes
or types. Because our estimators are general moment based procedures, using
instruments other than binary group indicators in estimation is
straightforward. Our QMLE estimator accommodates group-level covariates in the spirit
of Mundlak and Chamberlain and offers an alternative to fixed effects
specifications. Monte-Carlo simulations show that the bias of the QMLE
estimator decreases with the number of groups and the variation in group size,
and increases with group size. We also prove the consistency and asymptotic
normality of the estimator under reasonable assumptions.

arXiv link: http://arxiv.org/abs/2105.04330v2

Econometrics arXiv updated paper (originally submitted: 2021-05-09)

The Local Approach to Causal Inference under Network Interference

Authors: Eric Auerbach, Hongchang Guo, Max Tabord-Meehan

We propose a new nonparametric modeling framework for causal inference when
outcomes depend on how agents are linked in a social or economic network. Such
network interference describes a large literature on treatment spillovers,
social interactions, social learning, information diffusion, disease and
financial contagion, social capital formation, and more. Our approach works by
first characterizing how an agent is linked in the network using the
configuration of other agents and connections nearby as measured by path
distance. The impact of a policy or treatment assignment is then learned by
pooling outcome data across similarly configured agents. We demonstrate the
approach by deriving finite-sample bounds on the mean-squared error of a
k-nearest-neighbor estimator for the average treatment response as well as
proposing an asymptotically valid test for the hypothesis of policy
irrelevance.

arXiv link: http://arxiv.org/abs/2105.03810v5

Econometrics arXiv updated paper (originally submitted: 2021-05-08)

Difference-in-Differences Estimation with Spatial Spillovers

Authors: Kyle Butts

Empirical work often uses treatment assigned following geographic boundaries.
When the effects of treatment cross over borders, classical
difference-in-differences estimation produces biased estimates for the average
treatment effect. In this paper, I introduce a potential outcomes framework to
model spillover effects and decompose the estimate's bias in two parts: (1) the
control group no longer identifies the counterfactual trend because their
outcomes are affected by treatment and (2) changes in treated units' outcomes
reflect the effect of their own treatment status and the effect from the
treatment status of 'close' units. I propose conditions for non-parametric
identification that can remove both sources of bias and semi-parametrically
estimate the spillover effects themselves including in settings with staggered
treatment timing. To highlight the importance of spillover effects, I revisit
analyses of three place-based interventions.
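
A toy simulation of the first source of bias, control units near the border absorbing
part of the treatment, and of the simple fix of comparing against distant controls only
(a hedged illustration, not the paper's estimator):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3000
treated = rng.random(n) < 0.5                  # treated-region indicator
dist = rng.uniform(0, 10, size=n)              # distance to the treated border
spill = (~treated) & (dist < 2)                # 'close' controls absorb spillovers

trend, effect, spill_effect = 0.5, 1.0, 0.4
y0 = rng.normal(size=n)                        # pre-period outcomes
y1 = y0 + trend + effect * treated + spill_effect * spill + rng.normal(scale=0.3, size=n)

dy = y1 - y0
naive_did = dy[treated].mean() - dy[~treated].mean()                 # contaminated control group
distant_did = dy[treated].mean() - dy[(~treated) & (dist >= 2)].mean()
print(naive_did, distant_did)   # the naive estimate is attenuated; the restricted one is close to 1.0
```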

arXiv link: http://arxiv.org/abs/2105.03737v3

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2021-05-06

Machine Collaboration

Authors: Qingfeng Liu, Yang Feng

We propose a new ensemble framework for supervised learning, called machine
collaboration (MaC), using a collection of base machines for prediction tasks.
Unlike bagging/stacking (a parallel & independent framework) and boosting (a
sequential & top-down framework), MaC is a type of circular & interactive
learning framework. The circular & interactive feature helps the base machines
to transfer information circularly and update their structures and parameters
accordingly. The theoretical result on the risk bound of the estimator from MaC
reveals that the circular & interactive feature can help MaC reduce risk via a
parsimonious ensemble. We conduct extensive experiments on MaC using both
simulated data and 119 benchmark real datasets. The results demonstrate that in
most cases, MaC performs significantly better than several other
state-of-the-art methods, including classification and regression trees, neural
networks, stacking, and boosting.

arXiv link: http://arxiv.org/abs/2105.02569v3

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2021-05-05

Policy Learning with Adaptively Collected Data

Authors: Ruohan Zhan, Zhimei Ren, Susan Athey, Zhengyuan Zhou

Learning optimal policies from historical data enables personalization in a
wide variety of applications including healthcare, digital recommendations, and
online education. The growing policy learning literature focuses on settings
where the data collection rule stays fixed throughout the experiment. However,
adaptive data collection is becoming more common in practice, from two primary
sources: 1) data collected from adaptive experiments that are designed to
improve inferential efficiency; 2) data collected from production systems that
progressively evolve an operational policy to improve performance over time
(e.g. contextual bandits). Yet adaptivity complicates the optimal policy
identification ex post, since samples are dependent, and each treatment may not
receive enough observations for each type of individual. In this paper, we make
initial research inquiries into addressing the challenges of learning the
optimal policy with adaptively collected data. We propose an algorithm based on
generalized augmented inverse propensity weighted (AIPW) estimators, which
non-uniformly reweight the elements of a standard AIPW estimator to control
worst-case estimation variance. We establish a finite-sample regret upper bound
for our algorithm and complement it with a regret lower bound that quantifies
the fundamental difficulty of policy learning with adaptive data. When equipped
with the best weighting scheme, our algorithm achieves minimax rate optimal
regret guarantees even with diminishing exploration. Finally, we demonstrate
our algorithm's effectiveness using both synthetic data and public benchmark
datasets.

arXiv link: http://arxiv.org/abs/2105.02344v2

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2021-05-04

Stock Price Forecasting in Presence of Covid-19 Pandemic and Evaluating Performances of Machine Learning Models for Time-Series Forecasting

Authors: Navid Mottaghi, Sara Farhangdoost

With the heightened volatility in stock prices during the Covid-19 pandemic,
the need for price forecasting has become more critical. We investigated the
forecast performance of four models including Long-Short Term Memory, XGBoost,
Autoregression, and Last Value on the stock prices of Facebook, Amazon, Tesla,
Google, and Apple during the COVID-19 pandemic, to understand the accuracy and
predictability of the models in this highly volatile period. To train the
models, the data for all stocks are split into train and test datasets. The test
dataset spans January 2020 to April 2021, which covers the COVID-19
pandemic period. The results show that the Autoregression and Last Value models
have higher accuracy in predicting the stock prices because of the strong
correlation between the previous day and the next day's price value.
Additionally, the results suggest that the machine learning models (Long-Short
Term Memory and XGBoost) are not performing as well as Autoregression models
when the market experiences high volatility.

arXiv link: http://arxiv.org/abs/2105.02785v1

Econometrics arXiv updated paper (originally submitted: 2021-05-03)

A Modified Randomization Test for the Level of Clustering

Authors: Yong Cai

Suppose a researcher observes individuals within a county within a state.
Given concerns about correlation across individuals, it is common to group
observations into clusters and conduct inference treating observations across
clusters as roughly independent. However, a researcher that has chosen to
cluster at the county level may be unsure of their decision, given knowledge
that observations are independent across states. This paper proposes a modified
randomization test as a robustness check for the chosen level of clustering in
a linear regression setting. Existing tests require either the number of states
or number of counties to be large. Our method is designed for settings with few
states and few counties. While the method is conservative, it has competitive
power in settings that may be relevant to empirical work.

arXiv link: http://arxiv.org/abs/2105.01008v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-05-03

A nonparametric instrumental approach to endogeneity in competing risks models

Authors: Jad Beyhum, Jean-Pierre Florens, Ingrid Van Keilegom

This paper discusses endogenous treatment models with duration outcomes,
competing risks and random right censoring. The endogeneity issue is solved
using a discrete instrumental variable. We show that the competing risks model
generates a non-parametric quantile instrumental regression problem. The
cause-specific cumulative incidence, the cause-specific hazard and the
subdistribution hazard can be recovered from the regression function. A
distinguishing feature of the model is that censoring and competing risks
prevent identification at some quantiles. We characterize the set of quantiles
for which exact identification is possible and give partial identification
results for other quantiles. We outline an estimation procedure and discuss its
properties. The finite sample performance of the estimator is evaluated through
simulations. We apply the proposed method to the Health Insurance Plan of
Greater New York experiment.

arXiv link: http://arxiv.org/abs/2105.00946v1

Econometrics arXiv updated paper (originally submitted: 2021-05-03)

Identification and Estimation of Average Causal Effects in Fixed Effects Logit Models

Authors: Laurent Davezies, Xavier D'Haultfœuille, Louise Laage

This paper studies identification and estimation of average causal effects,
such as average marginal or treatment effects, in fixed effects logit models
with short panels. Relating the identified set of these effects to an extremal
moment problem, we first show how to obtain sharp bounds on such effects
simply, without any optimization. We also consider even simpler outer bounds,
which, contrary to the sharp bounds, do not require any first-step
nonparametric estimators. We build confidence intervals based on these two
approaches and show their asymptotic validity. Monte Carlo simulations suggest
that both approaches work well in practice, the second being typically
competitive in terms of interval length. Finally, we show that our method is
also useful to measure treatment effect heterogeneity.

arXiv link: http://arxiv.org/abs/2105.00879v5

Econometrics arXiv paper, submitted: 2021-05-02

A model of inter-organizational network formation

Authors: Shweta Gaonkar, Angelo Mele

How do inter-organizational networks emerge? Accounting for interdependence
among ties while studying tie formation is one of the key challenges in this
area of research. We address this challenge using an equilibrium framework
where firms' decisions to form links with other firms are modeled as a
strategic game. In this game, firms weigh the costs and benefits of
establishing a relationship with other firms and form ties if their net payoffs
are positive. We characterize the equilibrium networks as exponential random
graphs (ERGM), and we estimate the firms' payoffs using a Bayesian approach. To
demonstrate the usefulness of our approach, we apply the framework to a
co-investment network of venture capital firms in the medical device industry.
The equilibrium framework allows researchers to draw economic interpretations
from the ERGM parameter estimates. We learn that firms rely on their joint
partners (transitivity) and prefer to form ties with firms similar to
themselves (homophily). These results hold after controlling for the
interdependence among ties. Another critical advantage of a structural
approach is that it allows us to simulate the effects of economic shocks or
policy counterfactuals. We test two such policy shocks, namely, firm entry and
regulatory change. We show how new firms' entry or a regulatory shock of
minimum capital requirements increases the co-investment network's density and
clustering.

arXiv link: http://arxiv.org/abs/2105.00458v1

Econometrics arXiv updated paper (originally submitted: 2021-05-01)

Local Average and Marginal Treatment Effects with a Misclassified Treatment

Authors: Santiago Acerenza, Kyunghoon Ban, Désiré Kédagni

This paper studies identification of the local average and marginal treatment
effects (LATE and MTE) with a misclassified binary treatment variable. We
derive bounds on the (generalized) LATE and exploit its relationship with the
MTE to further bound the MTE. Indeed, under some standard assumptions, the MTE
is a limit of the ratio of the variation in the conditional expectation of the
observed outcome given the instrument to the variation in the true propensity
score, which is partially identified. We characterize the identified set for
the propensity score, and then for the MTE. We show that our LATE bounds are
tighter than the existing bounds and that the sign of the MTE is locally
identified under some mild regularity conditions. We use our MTE bounds to
derive bounds on other commonly used parameters in the literature and
illustrate the practical relevance of our derived bounds through numerical and
empirical results.

arXiv link: http://arxiv.org/abs/2105.00358v8

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2021-04-30

Automatic Debiased Machine Learning via Riesz Regression

Authors: Victor Chernozhukov, Whitney K. Newey, Victor Quintas-Martinez, Vasilis Syrgkanis

A variety of interesting parameters may depend on high dimensional
regressions. Machine learning can be used to estimate such parameters.
However, estimators based on machine learners can be severely biased by regularization
and/or model selection. Debiased machine learning uses Neyman orthogonal
estimating equations to reduce such biases. Debiased machine learning generally
requires estimation of unknown Riesz representers. A primary innovation of this
paper is to provide Riesz regression estimators of Riesz representers that
depend on the parameter of interest, rather than explicit formulae, and that
can employ any machine learner, including neural nets and random forests.
End-to-end algorithms emerge where the researcher chooses the parameter of
interest and the machine learner and the debiasing follows automatically.
Another innovation here is debiased machine learners of parameters depending on
generalized regressions, including high-dimensional generalized linear models.
An empirical example of automatic debiased machine learning using neural nets
is given. We find in Monte Carlo examples that automatic debiasing sometimes
performs better than debiasing via inverse propensity scores and never worse.
Finite sample mean square error bounds for Riesz regression estimators and
asymptotic theory are also given.
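
The Riesz regression estimator itself is not reproduced here, but the flavor of
debiased machine learning can be conveyed by a minimal cross-fitted AIPW
(doubly robust) estimate of an average treatment effect, the special case in
which the Riesz representer is the inverse propensity weight. The simulated
data and all names below are illustrative only.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
    from sklearn.model_selection import KFold

    rng = np.random.default_rng(1)
    n, p = 2000, 5
    X = rng.standard_normal((n, p))
    D = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))      # confounded treatment
    Y = D * 1.0 + X[:, 0] + rng.standard_normal(n)       # true ATE = 1

    psi = np.zeros(n)
    for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        # Nuisance functions are fit on the training fold only (cross-fitting).
        m = RandomForestClassifier(n_estimators=200, random_state=0).fit(X[train], D[train])
        g1 = RandomForestRegressor(n_estimators=200, random_state=0).fit(
            X[train][D[train] == 1], Y[train][D[train] == 1])
        g0 = RandomForestRegressor(n_estimators=200, random_state=0).fit(
            X[train][D[train] == 0], Y[train][D[train] == 0])

        e = np.clip(m.predict_proba(X[test])[:, 1], 0.01, 0.99)
        mu1, mu0 = g1.predict(X[test]), g0.predict(X[test])
        # Neyman-orthogonal (doubly robust) score for the ATE.
        psi[test] = (mu1 - mu0
                     + D[test] * (Y[test] - mu1) / e
                     - (1 - D[test]) * (Y[test] - mu0) / (1 - e))

    print(f"ATE estimate: {psi.mean():.3f} (se {psi.std(ddof=1) / np.sqrt(n):.3f})")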

arXiv link: http://arxiv.org/abs/2104.14737v3

Econometrics arXiv updated paper (originally submitted: 2021-04-29)

Nonparametric Difference-in-Differences in Repeated Cross-Sections with Continuous Treatments

Authors: Xavier D'Haultfoeuille, Stefan Hoderlein, Yuya Sasaki

This paper studies the identification of causal effects of a continuous
treatment using a new difference-in-differences strategy. Our approach allows
for endogeneity of the treatment and employs repeated cross-sections. It
requires an exogenous change over time that affects the treatment in a
heterogeneous way, stationarity of the distribution of unobservables, and a
rank invariance condition on the time trend. On the other hand, we do not
impose any functional form restrictions or an additive time trend, and our
approach is invariant to the scaling of the dependent variable. Under our
conditions, the time trend can
be identified using a control group, as in the binary difference-in-differences
literature. In our scenario, however, this control group is defined by the
data. We then identify average and quantile treatment effect parameters. We
develop corresponding nonparametric estimators and study their asymptotic
properties. Finally, we apply our results to the effect of disposable income on
consumption.

arXiv link: http://arxiv.org/abs/2104.14458v2

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2021-04-29

Generalized Linear Models with Structured Sparsity Estimators

Authors: Mehmet Caner

In this paper, we introduce structured sparsity estimators in Generalized
Linear Models. Structured sparsity estimators with the least squares loss were
recently introduced by Stucky and van de Geer (2018) for fixed designs and
normal errors. We extend their results to debiased structured sparsity
estimators with a Generalized Linear Model based loss. Structured sparsity
estimation refers to penalized loss functions whose chosen penalty norm can
encode a sparsity structure; examples include the weighted group lasso, the
lasso, and norms generated from convex cones. The main difficulty is proving
two oracle inequalities. The first is for the initial penalized Generalized
Linear Model estimator. Since it is not clear how a particular feasible
weighted nodewise regression fits into an oracle inequality for the penalized
Generalized Linear Model, we need a second oracle inequality to obtain oracle
bounds for the approximate inverse of the sample estimate of the second-order
partial derivative of the Generalized Linear Model loss.
Our contributions are fivefold: 1. We generalize existing oracle inequality
results for penalized Generalized Linear Models by proving the underlying
conditions rather than assuming them; a key issue is the proof of a sample
one-point margin condition and its use in an oracle inequality. 2. Our results
cover even non-sub-Gaussian errors and regressors. 3. We provide a feasible
weighted nodewise regression proof which generalizes results in the literature
from simple l_1 norms to norms generated from convex cones. 4. We observe that
the norms used in feasible nodewise regression proofs should be weaker than or
equal to the norms in the penalized Generalized Linear Model loss. 5. We
debias the first-step estimator by obtaining an approximate inverse of the
singular sample second-order partial derivative of the Generalized Linear
Model loss.

arXiv link: http://arxiv.org/abs/2104.14371v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-04-29

Loss-Based Variational Bayes Prediction

Authors: David T. Frazier, Ruben Loaiza-Maya, Gael M. Martin, Bonsoo Koo

We propose a new approach to Bayesian prediction that caters for models with
a large number of parameters and is robust to model misspecification. Given a
class of high-dimensional (but parametric) predictive models, this new approach
constructs a posterior predictive using a variational approximation to a
generalized posterior that is directly focused on predictive accuracy. The
theoretical behavior of the new prediction approach is analyzed and a form of
optimality demonstrated. Applications to both simulated and empirical data
using high-dimensional Bayesian neural network and autoregressive mixture
models demonstrate that the approach provides more accurate results than
various alternatives, including misspecified likelihood-based predictions.

arXiv link: http://arxiv.org/abs/2104.14054v2

Econometrics arXiv updated paper (originally submitted: 2021-04-28)

Sequential Search Models: A Pairwise Maximum Rank Approach

Authors: Jiarui Liu

This paper studies sequential search models that (1) incorporate unobserved
product quality, which can be correlated with endogenous observable
characteristics (such as price) and endogenous search cost variables (such as
product rankings in online search intermediaries); and (2) do not require
researchers to know the true distribution of the match value between consumers
and products. A likelihood approach to estimate such models gives biased
results. Therefore, I propose a new estimator -- pairwise maximum rank (PMR)
estimator -- for both preference and search cost parameters. I show that the
PMR estimator is consistent using only data on consumers' search order among
one pair of products rather than data on consumers' full consideration set or
final purchase. Additionally, we can use the PMR estimator to test for the true
match value distribution in the data. In the empirical application, I apply the
PMR estimator to quantify the effect of rankings in Expedia hotel search using
two samples of the data set, to which consumers are randomly assigned. I find
the position effect to be $0.11-$0.36, and the effect estimated using the
sample with randomly generated rankings is close to the effect estimated using
the sample with endogenous rankings. Moreover, I find that the true match value
distribution in the data is unlikely to be N(0,1). Likelihood estimation
ignoring endogeneity gives an upward bias of at least $1.17; misspecification
of match value distribution as N(0,1) gives an upward bias of at least $2.99.

arXiv link: http://arxiv.org/abs/2104.13865v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2021-04-27

Changepoint detection in random coefficient autoregressive models

Authors: Lajos Horvath, Lorenzo Trapani

We propose a family of CUSUM-based statistics to detect the presence of
changepoints in the deterministic part of the autoregressive parameter in a
Random Coefficient AutoRegressive (RCA) sequence. In order to ensure the
ability to detect breaks at sample endpoints, we thoroughly study weighted
CUSUM statistics, analysing the asymptotics for virtually all possible
weighting schemes, including the standardised CUSUM process (for which we
derive a Darling-Erdős theorem) and even heavier weights (studying the
so-called Rényi statistics). Our results are valid irrespective of whether the
sequence is
stationary or not, and no prior knowledge of stationarity or lack thereof is
required. Technically, our results require strong approximations which, in the
nonstationary case, are entirely new. Similarly, we allow for
heteroskedasticity of unknown form in both the error term and in the stochastic
part of the autoregressive coefficient, proposing a family of test statistics
which are robust to heteroskedasticity, without requiring any prior knowledge
as to the presence or type thereof. Simulations show that our procedures work
very well in finite samples. We complement our theory with applications to
financial, economic and epidemiological time series.
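
The weighted statistics and RCA-specific theory are the paper's contribution
and are not reproduced here; a plain, unweighted CUSUM of OLS scores from a
fitted AR(1), sketched below on simulated data, only conveys the basic
detector.

    import numpy as np

    rng = np.random.default_rng(2)

    # AR(1) with a mid-sample break in the autoregressive parameter.
    n, kstar = 400, 200
    y = np.zeros(n)
    for t in range(1, n):
        phi = 0.2 if t < kstar else 0.7
        y[t] = phi * y[t - 1] + rng.standard_normal()

    # Fit a single AR(1) by OLS over the full sample and form score contributions.
    ylag = y[:-1]
    phi_hat = (ylag @ y[1:]) / (ylag @ ylag)
    scores = (y[1:] - phi_hat * ylag) * ylag          # OLS first-order conditions
    sigma = scores.std(ddof=1)

    # Standard CUSUM process of the centred scores; large excursions suggest a break.
    cusum = np.abs(np.cumsum(scores - scores.mean())) / (sigma * np.sqrt(len(scores)))
    print("max CUSUM statistic:", round(float(cusum.max()), 3))
    print("argmax (candidate breakpoint):", int(cusum.argmax()) + 1)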

arXiv link: http://arxiv.org/abs/2104.13440v1

Econometrics arXiv cross-link from General Economics (econ.GN), submitted: 2021-04-27

A model of multiple hypothesis testing

Authors: Davide Viviano, Kaspar Wuthrich, Paul Niehaus

Multiple hypothesis testing practices vary widely, without consensus on which
are appropriate when. This paper provides an economic foundation for these
practices designed to capture leading examples, such as regulatory approval on
the basis of clinical trials. In studies of multiple treatments or
sub-populations, adjustments may be appropriate depending on scale economies in
the research production function, with control of classical notions of compound
errors emerging in some but not all cases. In studies with multiple outcomes,
indexing is appropriate and adjustments to test levels may be appropriate if
the intended audience is heterogeneous. Data on actual costs in the drug
approval process suggest both that some adjustment is warranted in that setting
and that standard procedures may be overly conservative.

arXiv link: http://arxiv.org/abs/2104.13367v8

Econometrics arXiv updated paper (originally submitted: 2021-04-26)

Algorithm as Experiment: Machine Learning, Market Design, and Policy Eligibility Rules

Authors: Yusuke Narita, Kohei Yata

Algorithms make a growing portion of policy and business decisions. We
develop a treatment-effect estimator using algorithmic decisions as instruments
for a class of stochastic and deterministic algorithms. Our estimator is
consistent and asymptotically normal for well-defined causal effects. A special
case of our setup is multidimensional regression discontinuity designs with
complex boundaries. We apply our estimator to evaluate the Coronavirus Aid,
Relief, and Economic Security Act, which allocated many billions of dollars
worth of relief funding to hospitals via an algorithmic rule. The funding is
shown to have little effect on COVID-19-related hospital activities. Naive
estimates exhibit selection bias.

arXiv link: http://arxiv.org/abs/2104.12909v6

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2021-04-26

Valid Heteroskedasticity Robust Testing

Authors: Benedikt M. Pötscher, David Preinerstorfer

Tests based on heteroskedasticity robust standard errors are an important
technique in econometric practice. Choosing the right critical value, however,
is not simple at all: conventional critical values based on asymptotics often
lead to severe size distortions; and so do existing adjustments including the
bootstrap. To avoid these issues, we suggest using the smallest size-controlling
critical values, the generic existence of which we prove in this article for
the commonly used test statistics. Furthermore, sufficient and often also
necessary conditions for their existence are given that are easy to check.
Granted their existence, these critical values are the canonical choice: larger
critical values result in unnecessary power loss, whereas smaller critical
values lead to over-rejections under the null hypothesis, make spurious
discoveries more likely, and thus are invalid. We suggest algorithms to
numerically determine the proposed critical values and provide implementations
in accompanying software. Finally, we numerically study the behavior of the
proposed testing procedures, including their power properties.
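
The smallest size-controlling critical values proposed in the paper require
the accompanying software; for orientation only, the snippet below shows the
conventional heteroskedasticity-robust covariance options in statsmodels whose
default asymptotic critical values the paper argues can be unreliable.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    n = 200
    x = rng.standard_normal(n)
    # Heteroskedastic errors: the variance increases with |x|.
    y = 1.0 + 0.5 * x + (0.5 + np.abs(x)) * rng.standard_normal(n)

    X = sm.add_constant(x)
    fit_ols = sm.OLS(y, X).fit()                 # conventional covariance
    fit_hc = sm.OLS(y, X).fit(cov_type="HC3")    # heteroskedasticity-robust covariance

    print(fit_ols.bse)                 # classical standard errors
    print(fit_hc.bse)                  # HC3 robust standard errors
    print(fit_hc.t_test("x1 = 0"))     # robust t-test with the usual asymptotic critical values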

arXiv link: http://arxiv.org/abs/2104.12597v3

Econometrics arXiv paper, submitted: 2021-04-26

Weak Instrumental Variables: Limitations of Traditional 2SLS and Exploring Alternative Instrumental Variable Estimators

Authors: Aiwei Huang, Madhurima Chandra, Laura Malkhasyan

Instrumental variables estimation has gained considerable traction in recent
decades as a tool for causal inference, particularly amongst empirical
researchers. This paper makes three contributions. First, we provide a detailed
theoretical discussion on the properties of the standard two-stage least
squares estimator in the presence of weak instruments and introduce and derive
two alternative estimators. Second, we conduct Monte-Carlo simulations to
compare the finite-sample behavior of the different estimators, particularly in
the weak-instruments case. Third, we apply the estimators to a real-world
context; we employ the different estimators to calculate returns to schooling.
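
A small Monte Carlo of the kind described, comparing OLS with a hand-rolled
2SLS under a weak first stage, might look like the sketch below (simulated
data, plain numpy algebra; the alternative estimators introduced in the paper
are not implemented here).

    import numpy as np

    rng = np.random.default_rng(4)

    def simulate(n=500, pi=0.05, beta=1.0):
        """One draw with a weak instrument (small first-stage coefficient pi)."""
        z = rng.standard_normal(n)
        u = rng.standard_normal(n)
        v = 0.8 * u + 0.6 * rng.standard_normal(n)    # endogeneity via correlated errors
        x = pi * z + v
        y = beta * x + u
        return y, x, z

    def tsls(y, x, z):
        """2SLS with one endogenous regressor and one instrument, via fitted values."""
        Z = np.column_stack([np.ones_like(z), z])
        X = np.column_stack([np.ones_like(x), x])
        xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # first-stage fitted values
        return np.linalg.lstsq(xhat, y, rcond=None)[0][1]

    def ols(y, x):
        X = np.column_stack([np.ones_like(x), x])
        return np.linalg.lstsq(X, y, rcond=None)[0][1]

    est_2sls = [tsls(*simulate()) for _ in range(1000)]
    est_ols = [ols(*simulate()[:2]) for _ in range(1000)]
    print("median 2SLS estimate:", round(float(np.median(est_2sls)), 2))
    print("median OLS estimate: ", round(float(np.median(est_ols)), 2))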

arXiv link: http://arxiv.org/abs/2104.12370v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-04-25

Interference, Bias, and Variance in Two-Sided Marketplace Experimentation: Guidance for Platforms

Authors: Hannah Li, Geng Zhao, Ramesh Johari, Gabriel Y. Weintraub

Two-sided marketplace platforms often run experiments to test the effect of
an intervention before launching it platform-wide. A typical approach is to
randomize individuals into the treatment group, which receives the
intervention, and the control group, which does not. The platform then compares
the performance in the two groups to estimate the effect if the intervention
were launched to everyone. We focus on two common experiment types, where the
platform randomizes individuals either on the supply side or on the demand
side. The resulting estimates of the treatment effect in these experiments are
typically biased: because individuals in the market compete with each other,
individuals in the treatment group affect those in the control group and vice
versa, creating interference. We develop a simple tractable market model to
study bias and variance in these experiments with interference. We focus on two
choices available to the platform: (1) Which side of the platform should it
randomize on (supply or demand)? (2) What proportion of individuals should be
allocated to treatment? We find that both choices affect the bias and variance
of the resulting estimators but in different ways. The bias-optimal choice of
experiment type depends on the relative amounts of supply and demand in the
market, and we discuss how a platform can use market data to select the
experiment type. Importantly, we find that in many circumstances, choosing the
bias-optimal experiment type has little effect on variance. On the other hand,
the choice of treatment proportion can induce a bias-variance tradeoff, where
the bias-minimizing proportion increases variance. We discuss how a platform
can navigate this tradeoff and best choose the treatment proportion, using a
combination of modeling as well as contextual knowledge about the market, the
risk of the intervention, and reasonable effect sizes of the intervention.

arXiv link: http://arxiv.org/abs/2104.12222v1

Econometrics arXiv updated paper (originally submitted: 2021-04-25)

Performance of Empirical Risk Minimization for Linear Regression with Dependent Data

Authors: Christian Brownlees, Guðmundur Stefán Guðmundsson

This paper establishes bounds on the performance of empirical risk
minimization for large-dimensional linear regression. We generalize existing
results by allowing the data to be dependent and heavy-tailed. The analysis
covers both the cases of identically and heterogeneously distributed
observations. Our analysis is nonparametric in the sense that the relationship
between the regressand and the regressors is not specified. The main results of
this paper show that the empirical risk minimizer achieves the optimal
performance (up to a logarithmic factor) in a dependent data setting.

arXiv link: http://arxiv.org/abs/2104.12127v5

Econometrics arXiv cross-link from q-fin.CP (q-fin.CP), submitted: 2021-04-24

Hermite Polynomial-based Valuation of American Options with General Jump-Diffusion Processes

Authors: Li Chen, Guang Zhang

We present a new approximation scheme for the price and exercise policy of
American options. The scheme is based on Hermite polynomial expansions of the
transition density of the underlying asset dynamics and the early exercise
premium representation of the American option price. The advantages of the
proposed approach are threefold. First, our approach does not require the
transition density and characteristic functions of the underlying asset
dynamics to be attainable in closed form. Second, our approach is fast and
accurate, while the prices and exercise policy can be jointly produced. Third,
our approach has a wide range of applications. We show that the proposed
approximations of the price and optimal exercise boundary converge to the true
ones. We also provide a numerical method based on a step function to implement
our proposed approach. Applications to nonlinear mean-reverting models, double
mean-reverting models, Merton's and Kou's jump-diffusion models are presented
and discussed.

arXiv link: http://arxiv.org/abs/2104.11870v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2021-04-23

Correlated Dynamics in Marketing Sensitivities

Authors: Ryan Dew, Yuhao Fan

Understanding individual customers' sensitivities to prices, promotions,
brands, and other marketing mix elements is fundamental to a wide swath of
marketing problems. An important but understudied aspect of this problem is the
dynamic nature of these sensitivities, which change over time and vary across
individuals. Prior work has developed methods for capturing such dynamic
heterogeneity within product categories, but neglected the possibility of
correlated dynamics across categories. In this work, we introduce a framework
to capture such correlated dynamics using a hierarchical dynamic factor model,
where individual preference parameters are influenced by common cross-category
dynamic latent factors, estimated through Bayesian nonparametric Gaussian
processes. We apply our model to grocery purchase data, and find that a
surprising degree of dynamic heterogeneity can be accounted for by only a few
global trends. We also characterize the patterns in how consumers'
sensitivities evolve across categories. Managerially, the proposed framework
not only enhances predictive accuracy by leveraging cross-category data, but
also enables more precise estimation of quantities of interest, such as price
elasticity.

arXiv link: http://arxiv.org/abs/2104.11702v2

Econometrics arXiv updated paper (originally submitted: 2021-04-23)

Robust decision-making under risk and ambiguity

Authors: Maximilian Blesch, Philipp Eisenhauer

Economists often estimate economic models on data and use the point estimates
as a stand-in for the truth when studying the model's implications for optimal
decision-making. This practice ignores model ambiguity, exposes the decision
problem to misspecification, and ultimately leads to post-decision
disappointment. Using statistical decision theory, we develop a framework to
explore, evaluate, and optimize robust decision rules that explicitly account
for estimation uncertainty. We show how to operationalize our analysis by
studying robust decisions in a stochastic dynamic investment model in which a
decision-maker directly accounts for uncertainty in the model's transition
dynamics.

arXiv link: http://arxiv.org/abs/2104.12573v4

Econometrics arXiv paper, submitted: 2021-04-22

Investigating farming efficiency through a two stage analytical approach: Application to the agricultural sector in Northern Oman

Authors: Amar Oukil, Slim Zekri

In this paper, we develop a two-stage analytical framework to investigate
farming efficiency. In the first stage, data envelopment analysis is employed
to estimate the efficiency of the farms and conduct slack and scale economies
analyses. In the second stage, we propose a stochastic model to identify
potential sources of inefficiency. The latter model integrates within a unified
structure all variables, including inputs, outputs and contextual factors. As
an application, we use a sample of 60 farms from the Batinah coastal region,
an agricultural area representing more than 53 per cent of the total cropped
area of Oman. The findings of the study emphasise the interdependence of
groundwater salinity, irrigation technology and the operational efficiency of
a farm, with a key recommendation being the need for more regulated water
consumption and a readjustment of governmental subsidy policies.

arXiv link: http://arxiv.org/abs/2104.10943v1

Econometrics arXiv updated paper (originally submitted: 2021-04-21)

Identification of Peer Effects with Misspecified Peer Groups: Missing Data and Group Uncertainty

Authors: Christiern Rose, Lizi Yu

We consider identification of peer effects under peer group
misspecification. Two leading cases are missing data and peer group
uncertainty. Missing data can take the form of some individuals being entirely
absent from the data. The researcher need not have any information on missing
individuals and need not even know that they are missing. We show that peer
effects are nevertheless identifiable under mild restrictions on the
probabilities of observing individuals, and propose a GMM estimator to estimate
the peer effects. In practice this means that the researcher need only have
access to an individual level sample with group identifiers. Group uncertainty
arises when the relevant peer group for the outcome under study is unknown. We
show that peer effects are nevertheless identifiable if the candidate groups
are nested within one another and propose a non-linear least squares estimator.
We conduct a Monte-Carlo experiment to demonstrate our identification results
and the performance of the proposed estimators, and apply our method to study
peer effects in the career decisions of junior lawyers.

arXiv link: http://arxiv.org/abs/2104.10365v5

Econometrics arXiv paper, submitted: 2021-04-21

Automatic Double Machine Learning for Continuous Treatment Effects

Authors: Sylvia Klosin

In this paper, we introduce and prove asymptotic normality for a new
nonparametric estimator of continuous treatment effects. Specifically, we
estimate the average dose-response function - the expected value of an
outcome of interest at a particular level of the treatment. We utilize tools
from
both the double debiased machine learning (DML) and the automatic double
machine learning (ADML) literatures to construct our estimator. Our estimator
utilizes a novel debiasing method that leads to nice theoretical stability and
balancing properties. In simulations our estimator performs well compared to
current methods.

arXiv link: http://arxiv.org/abs/2104.10334v1

Econometrics arXiv cross-link from q-fin.RM (q-fin.RM), submitted: 2021-04-20

Backtesting Systemic Risk Forecasts using Multi-Objective Elicitability

Authors: Tobias Fissler, Yannick Hoga

Systemic risk measures such as CoVaR, CoES and MES are widely used in
finance, macroeconomics and by regulatory bodies. Despite their importance, we
show that they fail to be elicitable and identifiable. This renders forecast
comparison and validation, commonly summarised as `backtesting', impossible.
The novel notion of multi-objective elicitability solves this problem.
Specifically, we propose Diebold--Mariano type tests utilising two-dimensional
scores equipped with the lexicographic order. We illustrate the test decisions
by an easy-to-apply traffic-light approach. We apply our traffic-light approach
to DAX 30 and S&P 500 returns, and infer some recommendations for regulators.
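
The paper's two-dimensional lexicographic scores are not implemented here; for
orientation, a standard one-dimensional Diebold-Mariano comparison of two
squared-error loss series, on simulated forecast errors, looks like this.

    import numpy as np
    from scipy import stats

    def diebold_mariano(loss_a, loss_b, h=1):
        """Standard DM test on the loss differential d_t, with h-1 autocovariance terms."""
        d = np.asarray(loss_a) - np.asarray(loss_b)
        lrv = np.var(d, ddof=0)
        for k in range(1, h):                       # Newey-West style long-run variance
            lrv += 2 * np.cov(d[k:], d[:-k], ddof=0)[0, 1]
        dm = d.mean() / np.sqrt(lrv / len(d))
        return dm, 2 * stats.norm.sf(abs(dm))

    rng = np.random.default_rng(9)
    e_a = rng.standard_normal(250)                  # forecast errors of model A
    e_b = 1.1 * rng.standard_normal(250)            # forecast errors of model B (worse)
    dm, p = diebold_mariano(e_a ** 2, e_b ** 2)
    print(f"DM statistic {dm:.2f}, p-value {p:.3f}")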

arXiv link: http://arxiv.org/abs/2104.10673v4

Econometrics arXiv updated paper (originally submitted: 2021-04-20)

CATE meets ML -- The Conditional Average Treatment Effect and Machine Learning

Authors: Daniel Jacob

For treatment effects - one of the core issues in modern econometric analysis
- prediction and estimation are two sides of the same coin. As it turns out,
machine learning methods are the tool for generalized prediction models.
Combined with econometric theory, they allow us to estimate not only the
average but a personalized treatment effect - the conditional average treatment
effect (CATE). In this tutorial, we give an overview of novel methods, explain
them in detail, and apply them via Quantlets in real data applications. We
study the effect that microcredit availability has on the amount of money
borrowed and if 401(k) pension plan eligibility has an impact on net financial
assets, as two empirical examples. The presented toolbox of methods contains
meta-learners, like the Doubly-Robust, R-, T- and X-learner, and methods that
are specially designed to estimate the CATE, such as the causal BART and the
generalized random forest. In both the microcredit and 401(k) examples, we
find a positive treatment effect for all observations but conflicting evidence
of treatment effect heterogeneity. An additional simulation study, where the
true
treatment effect is known, allows us to compare the different methods and to
observe patterns and similarities.
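
As a taste of the meta-learners covered in the tutorial, a bare-bones T-learner
(one outcome model per treatment arm, with the predicted difference as the
CATE) can be sketched with scikit-learn as below; the Quantlets, causal BART
and generalized random forest used in the tutorial are not reproduced.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(5)
    n, p = 2000, 5
    X = rng.standard_normal((n, p))
    D = rng.binomial(1, 0.5, n)                      # randomized treatment
    tau = 1.0 + X[:, 0]                              # heterogeneous true effect
    Y = X[:, 1] + D * tau + rng.standard_normal(n)

    # T-learner: fit one outcome model per treatment arm and take the
    # difference of the two predictions as the CATE estimate.
    m1 = RandomForestRegressor(n_estimators=300, random_state=0).fit(X[D == 1], Y[D == 1])
    m0 = RandomForestRegressor(n_estimators=300, random_state=0).fit(X[D == 0], Y[D == 0])
    cate_hat = m1.predict(X) - m0.predict(X)

    print("corr(estimated CATE, true CATE):", round(float(np.corrcoef(cate_hat, tau)[0, 1]), 2))
    print("estimated ATE:", round(float(cate_hat.mean()), 2), " true ATE:", round(float(tau.mean()), 2))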

arXiv link: http://arxiv.org/abs/2104.09935v2

Econometrics arXiv updated paper (originally submitted: 2021-04-19)

Deep Reinforcement Learning in a Monetary Model

Authors: Mingli Chen, Andreas Joseph, Michael Kumhof, Xinlei Pan, Xuan Zhou

We propose using deep reinforcement learning to solve dynamic stochastic
general equilibrium models. Agents are represented by deep artificial neural
networks and learn to solve their dynamic optimisation problem by interacting
with the model environment, of which they have no a priori knowledge. Deep
reinforcement learning offers a flexible yet principled way to model bounded
rationality within this general class of models. We apply our proposed approach
to a classical model from the adaptive learning literature in macroeconomics
which looks at the interaction of monetary and fiscal policy. We find that,
contrary to adaptive learning, the artificially intelligent household can solve
the model in all policy regimes.

arXiv link: http://arxiv.org/abs/2104.09368v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-04-15

Estimating and Improving Dynamic Treatment Regimes With a Time-Varying Instrumental Variable

Authors: Shuxiao Chen, Bo Zhang

Estimating dynamic treatment regimes (DTRs) from retrospective observational
data is challenging as some degree of unmeasured confounding is often expected.
In this work, we develop a framework of estimating properly defined "optimal"
DTRs with a time-varying instrumental variable (IV) when unmeasured covariates
confound the treatment and outcome, rendering the potential outcome
distributions only partially identified. We derive a novel Bellman equation
under partial identification, use it to define a generic class of estimands
(termed IV-optimal DTRs), and study the associated estimation problem. We then
extend the IV-optimality framework to tackle the policy improvement problem,
delivering IV-improved DTRs that are guaranteed to perform no worse and
potentially better than a pre-specified baseline DTR. Importantly, our
IV-improvement framework opens up the possibility of strictly improving upon
DTRs that are optimal under the no unmeasured confounding assumption (NUCA). We
demonstrate via extensive simulations the superior performance of IV-optimal
and IV-improved DTRs over the DTRs that are optimal only under the NUCA. In a
real data example, we embed retrospective observational registry data into a
natural, two-stage experiment with noncompliance using a time-varying IV and
estimate useful IV-optimal DTRs that assign mothers to high-level or low-level
neonatal intensive care units based on their prognostic variables.

arXiv link: http://arxiv.org/abs/2104.07822v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-04-15

A robust specification test in linear panel data models

Authors: Beste Hamiye Beyaztas, Soutir Bandyopadhyay, Abhijit Mandal

The presence of outlying observations may adversely affect statistical
testing procedures, resulting in unstable test statistics and unreliable
inferences depending on the distortion in parameter estimates. Despite the
adverse effects of outliers in panel data models, only a few robust testing
procedures are available for model specification. In this paper, a new
weighted-likelihood-based robust specification test is proposed to determine
the appropriate approach in panel data including individual-specific
components. The proposed test is shown to have the same asymptotic
distribution as the commonly used Hausman specification test under the null
hypothesis of a random effects specification. The finite sample properties
of the robust testing procedure are illustrated by means of Monte Carlo
simulations and economic-growth data from the member countries of the
Organisation for Economic Co-operation and Development. Our results reveal
that the robust specification test exhibits improved performance in terms of
size and power in the presence of contamination.
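
For reference, the classical (non-robust) Hausman statistic whose null
distribution the proposed test shares is the quadratic form sketched below; the
inputs are assumed to be pre-computed fixed- and random-effects estimates and
covariance matrices (for example from linearmodels or statsmodels), and this is
not the paper's weighted-likelihood test.

    import numpy as np
    from scipy import stats

    def hausman(b_fe, b_re, V_fe, V_re):
        """Classical Hausman statistic H = (b_FE - b_RE)' (V_FE - V_RE)^{-1} (b_FE - b_RE)."""
        diff = np.asarray(b_fe) - np.asarray(b_re)
        V = np.asarray(V_fe) - np.asarray(V_re)
        H = float(diff @ np.linalg.pinv(V) @ diff)   # pinv guards against near-singular V
        return H, stats.chi2.sf(H, df=len(diff))

    # Toy numbers purely to show the call signature.
    H, p = hausman(b_fe=[0.52, 1.10], b_re=[0.48, 1.02],
                   V_fe=np.diag([0.004, 0.010]), V_re=np.diag([0.003, 0.008]))
    print(f"Hausman statistic {H:.2f}, p-value {p:.3f}")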

arXiv link: http://arxiv.org/abs/2104.07723v1

Econometrics arXiv cross-link from cs.CL (cs.CL), submitted: 2021-04-12

BERT based freedom to operate patent analysis

Authors: Michael Freunek, André Bodmer

In this paper we present a method to apply BERT to freedom-to-operate patent
analysis and patent searches. Under this method, BERT is fine-tuned by
training it to map patent descriptions to the independent claims. Each
description represents an invention which is protected by the corresponding
claims. Such a trained BERT could identify or rank freedom-to-operate-relevant
patents based on a short description of an invention or product. We tested the
method by training BERT on the patent class G06T1/00 and applied the trained
BERT to five inventions classified in G06T1/60, described via DOCDB abstracts.
The DOCDB abstracts are available on ESPACENET of the European Patent Office.

arXiv link: http://arxiv.org/abs/2105.00817v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2021-04-10

Selecting Penalty Parameters of High-Dimensional M-Estimators using Bootstrapping after Cross-Validation

Authors: Denis Chetverikov, Jesper Riis-Vestergaard Sørensen

We develop a new method for selecting the penalty parameter for
$\ell_{1}$-penalized M-estimators in high dimensions, which we refer to as
bootstrapping after cross-validation. We derive rates of convergence for the
corresponding $\ell_1$-penalized M-estimator and also for the
post-$\ell_1$-penalized M-estimator, which refits the non-zero entries of the
former estimator without penalty in the criterion function. We demonstrate via
simulations that our methods are not dominated by cross-validation in terms of
estimation errors and can outperform cross-validation in terms of inference. As
an empirical illustration, we revisit Fryer Jr (2019), who investigated racial
differences in police use of force, and confirm his findings.

arXiv link: http://arxiv.org/abs/2104.04716v5

Econometrics arXiv updated paper (originally submitted: 2021-04-09)

Identification of Dynamic Panel Logit Models with Fixed Effects

Authors: Christopher Dobronyi, Jiaying Gu, Kyoo il Kim, Thomas M. Russell

We show that identification in a general class of dynamic panel logit models
with fixed effects is related to the truncated moment problem from the
mathematics literature. We use this connection to show that the identified set
for structural parameters and functionals of the distribution of latent
individual effects can be characterized by a finite set of conditional moment
equalities subject to a certain set of shape constraints on the model
parameters. In addition to providing a general approach to identification, the
new characterization can deliver informative bounds in cases where competing
methods deliver no identifying restrictions, and can deliver point
identification in cases where competing methods deliver partial identification.
We then present an estimation and inference procedure that uses semidefinite
programming methods, is applicable with continuous or discrete covariates, and
can be used for models that are either point- or partially-identified. Finally,
we illustrate our identification result with a number of examples, and provide
an empirical application to employment dynamics using data from the National
Longitudinal Survey of Youth.

arXiv link: http://arxiv.org/abs/2104.04590v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-04-08

Average Direct and Indirect Causal Effects under Interference

Authors: Yuchen Hu, Shuangning Li, Stefan Wager

We propose a definition for the average indirect effect of a binary treatment
in the potential outcomes model for causal inference under cross-unit
interference. Our definition is analogous to the standard definition of the
average direct effect, and can be expressed without needing to compare outcomes
across multiple randomized experiments. We show that the proposed indirect
effect satisfies a decomposition theorem whereby, in a Bernoulli trial, the sum
of the average direct and indirect effects always corresponds to the effect of
a policy intervention that infinitesimally increases treatment probabilities.
We also consider a number of parametric models for interference, and find that
our non-parametric indirect effect remains a natural estimand when re-expressed
in the context of these models.

arXiv link: http://arxiv.org/abs/2104.03802v4

Econometrics arXiv updated paper (originally submitted: 2021-04-08)

Predicting Inflation with Recurrent Neural Networks

Authors: Livia Paranhos

This paper applies a recurrent neural network, the LSTM, to forecast
inflation. This is an appealing model for time series as it processes each time
step sequentially and explicitly learns dynamic dependencies. The paper also
explores the dimension reduction capability of the model to uncover
economically-meaningful factors that can explain the inflation process. Results
from an exercise with US data indicate that the estimated neural nets present
competitive, but not outstanding, performance against common benchmarks
(including other machine learning models). The LSTM in particular is found to
perform well at long horizons and during periods of heightened macroeconomic
uncertainty. Interestingly, LSTM-implied factors present high correlation with
business cycle indicators, informing on the usefulness of such signals as
inflation predictors. The paper also sheds light on the impact of network
initialization and architecture on forecast performance.
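
A minimal LSTM forecaster in the spirit of the exercise (not the author's
architecture; simulated data, a single hidden layer, PyTorch) is sketched
below.

    import numpy as np
    import torch
    from torch import nn

    torch.manual_seed(0)
    rng = np.random.default_rng(6)

    # Simulated AR(1) "inflation" series; windows of 12 lags predict the next value.
    n, window = 600, 12
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = 0.7 * y[t - 1] + 0.1 * rng.standard_normal()

    Xw = np.stack([y[t - window:t] for t in range(window, n)])
    X_t = torch.tensor(Xw, dtype=torch.float32).unsqueeze(-1)   # (samples, window, 1)
    y_t = torch.tensor(y[window:], dtype=torch.float32).unsqueeze(-1)

    class LSTMForecaster(nn.Module):
        def __init__(self, hidden=16):
            super().__init__()
            self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)

        def forward(self, x):
            out, _ = self.lstm(x)
            return self.head(out[:, -1, :])   # forecast from the last hidden state

    model = LSTMForecaster()
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.MSELoss()
    for epoch in range(200):                  # full-batch training, for brevity
        opt.zero_grad()
        loss = loss_fn(model(X_t), y_t)
        loss.backward()
        opt.step()
    print("in-sample MSE:", float(loss))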

arXiv link: http://arxiv.org/abs/2104.03757v2

Econometrics arXiv updated paper (originally submitted: 2021-04-07)

Min(d)ing the President: A text analytic approach to measuring tax news

Authors: Lenard Lieb, Adam Jassem, Rui Jorge Almeida, Nalan Baştürk, Stephan Smeekes

Economic agents react to signals about future tax policy changes.
Consequently, estimating their macroeconomic effects requires identification of
such signals. We propose a novel text analytic approach for transforming
textual information into an economically meaningful time series. Using this
method, we create a tax news measure from all publicly available post-war
communications of U.S. presidents. Our measure predicts the direction and size
of future tax changes and contains signals not present in previously considered
(narrative) measures of tax changes. We investigate the effects of tax news and
find that, for long anticipation horizons, pre-implementation effects lead
initially to contractions in output.

arXiv link: http://arxiv.org/abs/2104.03261v3

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2021-04-07

DoubleML -- An Object-Oriented Implementation of Double Machine Learning in Python

Authors: Philipp Bach, Victor Chernozhukov, Malte S. Kurz, Martin Spindler

DoubleML is an open-source Python library implementing the double machine
learning framework of Chernozhukov et al. (2018) for a variety of causal
models. It contains functionalities for valid statistical inference on causal
parameters when the estimation of nuisance parameters is based on machine
learning methods. The object-oriented implementation of DoubleML provides a
high flexibility in terms of model specifications and makes it easily
extendable. The package is distributed under the MIT license and relies on core
libraries from the scientific Python ecosystem: scikit-learn, numpy, pandas,
scipy, statsmodels and joblib. Source code, documentation and an extensive user
guide can be found at https://github.com/DoubleML/doubleml-for-py and
https://docs.doubleml.org.
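
A minimal usage example in the style of the package documentation might look
like the following; note that constructor argument names such as ml_l and ml_m
have changed across releases, so the installed version's documentation should
be checked.

    import numpy as np
    import doubleml as dml
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(7)
    n, p = 500, 10
    X = rng.standard_normal((n, p))
    d = X[:, 0] + rng.standard_normal(n)              # treatment
    y = 0.5 * d + X[:, 0] + rng.standard_normal(n)    # outcome, true effect 0.5

    data = dml.DoubleMLData.from_arrays(X, y, d)
    plr = dml.DoubleMLPLR(data,
                          ml_l=RandomForestRegressor(),   # nuisance E[Y | X]
                          ml_m=RandomForestRegressor())   # nuisance E[D | X]
    plr.fit()
    print(plr.summary)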

arXiv link: http://arxiv.org/abs/2104.03220v2

Econometrics arXiv updated paper (originally submitted: 2021-04-07)

Bootstrap Inference for Hawkes and General Point Processes

Authors: Giuseppe Cavaliere, Ye Lu, Anders Rahbek, Jacob Stærk-Østergaard

Inference and testing in general point process models such as the Hawkes
model is predominantly based on asymptotic approximations for likelihood-based
estimators and tests. As an alternative, and to improve finite sample
performance, this paper considers bootstrap-based inference for interval
estimation and testing. Specifically, for a wide class of point process models
we consider a novel bootstrap scheme labeled 'fixed intensity bootstrap' (FIB),
where the conditional intensity is kept fixed across bootstrap repetitions. The
FIB, which is very simple to implement and fast in practice, extends previous
ideas from the bootstrap literature on time series in discrete time, where the
so-called 'fixed design' and 'fixed volatility' bootstrap schemes have shown to
be particularly useful and effective. We compare the FIB with the classic
recursive bootstrap, which is here labeled 'recursive intensity bootstrap'
(RIB). In RIB algorithms, the intensity is stochastic in the bootstrap world
and implementation of the bootstrap is more involved, due to its sequential
structure. For both bootstrap schemes, we provide new bootstrap (asymptotic)
theory which allows one to assess bootstrap validity, and propose a
'non-parametric' approach based on resampling time-changed transformations of
the original waiting times. We also establish the link between the proposed
bootstraps for point process models and the related autoregressive conditional
duration (ACD) models. Lastly, we show the effectiveness of the different bootstrap
schemes in finite samples through a set of detailed Monte Carlo experiments,
and provide applications to both financial data and social media data to
illustrate the proposed methodology.

arXiv link: http://arxiv.org/abs/2104.03122v2

Econometrics arXiv updated paper (originally submitted: 2021-04-07)

The Proper Use of Google Trends in Forecasting Models

Authors: Marcelo C. Medeiros, Henrique F. Pires

It is widely known that Google Trends has become one of the most popular
free tools used by forecasters in academia as well as in the private and
public sectors. Many papers, from several different fields, conclude that
Google Trends improves forecast accuracy. However, what seems to be widely
unknown is that each sample of Google search data differs from the others,
even when the same search term, dates and location are set. This means that it
is possible to reach arbitrary conclusions merely by chance. This paper aims to
show why and when this can become a problem and how to overcome this obstacle.

arXiv link: http://arxiv.org/abs/2104.03065v3

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2021-04-07

Minimax Kernel Machine Learning for a Class of Doubly Robust Functionals with Application to Proximal Causal Inference

Authors: AmirEmad Ghassami, Andrew Ying, Ilya Shpitser, Eric Tchetgen Tchetgen

Robins et al. (2008) introduced a class of influence functions (IFs) which
could be used to obtain doubly robust moment functions for the corresponding
parameters. However, that class does not include the IF of parameters for which
the nuisance functions are solutions to integral equations. Such parameters are
particularly important in the field of causal inference, specifically in the
recently proposed proximal causal inference framework of Tchetgen Tchetgen et
al. (2020), which allows for estimating the causal effect in the presence of
latent confounders. In this paper, we first extend the class of Robins et al.
to include doubly robust IFs in which the nuisance functions are solutions to
integral equations. Then we demonstrate that the double robustness property of
these IFs can be leveraged to construct estimating equations for the nuisance
functions, which enables us to solve the integral equations without resorting
to parametric models. We frame the estimation of the nuisance functions as a
minimax optimization problem. We provide convergence rates for the nuisance
functions and conditions required for asymptotic linearity of the estimator of
the parameter of interest. The experimental results demonstrate that our proposed
methodology leads to robust and high-performance estimators for average causal
effect in the proximal causal inference framework.

arXiv link: http://arxiv.org/abs/2104.02929v3

Econometrics arXiv paper, submitted: 2021-04-06

Revisiting the empirical fundamental relationship of traffic flow for highways using a causal econometric approach

Authors: Anupriya, Daniel J. Graham, Daniel Hörcher, Prateek Bansal

The fundamental relationship of traffic flow is empirically estimated by
fitting a regression curve to a cloud of observations of traffic variables.
Such estimates, however, may suffer from the confounding/endogeneity bias due
to omitted variables such as driving behaviour and weather. To this end, this
paper adopts a causal approach to obtain an unbiased estimate of the
fundamental flow-density relationship using traffic detector data. In
particular, we apply a Bayesian non-parametric spline-based regression approach
with instrumental variables to adjust for the aforementioned confounding bias.
The proposed approach is benchmarked against standard curve-fitting methods in
estimating the flow-density relationship for three highway bottlenecks in the
United States. Our empirical results suggest that the saturated (or
hypercongested) regime of the estimated flow-density relationship using
correlational curve fitting methods may be severely biased, which in turn leads
to biased estimates of important traffic control inputs such as capacity and
capacity-drop. We emphasise that our causal approach is based on the physical
laws of vehicle movement in a traffic stream as opposed to a demand-supply
framework adopted in the economics literature. By doing so, we also aim to
conciliate the engineering and economics approaches to this empirical problem.
Our results, thus, have important implications both for traffic engineers and
transport economists.

arXiv link: http://arxiv.org/abs/2104.02399v1

Econometrics arXiv updated paper (originally submitted: 2021-04-05)

Identification and Estimation in Many-to-one Two-sided Matching without Transfers

Authors: YingHua He, Shruti Sinha, Xiaoting Sun

In a setting of many-to-one two-sided matching with non-transferable
utilities, e.g., college admissions, we study conditions under which
preferences of both sides are identified with data on one single market.
Regardless of whether the market is centralized or decentralized, assuming that
the observed matching is stable, we show nonparametric identification of
preferences of both sides under certain exclusion restrictions. To take our
results to the data, we use Monte Carlo simulations to evaluate different
estimators, including the ones that are directly constructed from the
identification. We find that a parametric Bayesian approach with a Gibbs
sampler works well in realistically sized problems. Finally, we illustrate our
methodology in decentralized admissions to public and private schools in Chile
and conduct a counterfactual analysis of an affirmative action policy.

arXiv link: http://arxiv.org/abs/2104.02009v3

Econometrics arXiv updated paper (originally submitted: 2021-04-01)

Local Projections vs. VARs: Lessons From Thousands of DGPs

Authors: Dake Li, Mikkel Plagborg-Møller, Christian K. Wolf

We conduct a simulation study of Local Projection (LP) and Vector
Autoregression (VAR) estimators of structural impulse responses across
thousands of data generating processes, designed to mimic the properties of the
universe of U.S. macroeconomic data. Our analysis considers various
identification schemes and several variants of LP and VAR estimators, employing
bias correction, shrinkage, or model averaging. A clear bias-variance trade-off
emerges: LP estimators have lower bias than VAR estimators, but they also have
substantially higher variance at intermediate and long horizons. Bias-corrected
LP is the preferred method if and only if the researcher overwhelmingly
prioritizes bias. For researchers who also care about precision, VAR methods
are the most attractive -- Bayesian VARs at short and long horizons, and
least-squares VARs at intermediate and long horizons.
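
To fix ideas, a local projection impulse response is a sequence of
horizon-specific regressions of y_{t+h} on an identified shock and controls. A
bare-bones version on simulated data, ignoring HAC standard errors and the
bias-correction and shrinkage variants compared in the paper, is sketched
below.

    import numpy as np

    rng = np.random.default_rng(8)

    # Simulated AR(1) driven by an observed shock; the true IRF at horizon h is 0.8**h.
    n = 400
    shock = rng.standard_normal(n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = 0.8 * y[t - 1] + shock[t]

    def local_projection(y, shock, horizons=8, lags=4):
        """OLS of y_{t+h} on shock_t plus lagged y controls, one regression per horizon."""
        T = len(y)
        irf = []
        for h in range(horizons + 1):
            rows = range(lags, T - h)
            Y_h = np.array([y[t + h] for t in rows])
            X_h = np.column_stack(
                [np.ones(len(Y_h)), [shock[t] for t in rows]]
                + [[y[t - l] for t in rows] for l in range(1, lags + 1)])
            irf.append(np.linalg.lstsq(X_h, Y_h, rcond=None)[0][1])   # shock coefficient
        return np.array(irf)

    print(np.round(local_projection(y, shock), 2))
    print(np.round(0.8 ** np.arange(9), 2))   # true impulse response, for comparison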

arXiv link: http://arxiv.org/abs/2104.00655v4

Econometrics arXiv updated paper (originally submitted: 2021-04-01)

Normalizations and misspecification in skill formation models

Authors: Joachim Freyberger

An important class of structural models studies the determinants of skill
formation and the optimal timing of interventions. In this paper, I provide new
identification results for these models and investigate the effects of
seemingly innocuous scale and location restrictions on parameters of interest.
To do so, I first characterize the identified set of all parameters without
these additional restrictions and show that important policy-relevant
parameters are point identified under weaker assumptions than commonly used in
the literature. The implications of imposing standard scale and location
restrictions depend on how the model is specified, but they generally impact
the interpretation of parameters and may affect counterfactuals. Importantly,
with the popular CES production function, commonly used scale restrictions fix
identified parameters and lead to misspecification. Consequently, simply
changing the units of measurements of observed variables might yield
ineffective investment strategies and misleading policy recommendations. I show
how existing estimators can easily be adapted to solve these issues. As a
byproduct, this paper also presents a general and formal definition of when
restrictions are truly normalizations.

arXiv link: http://arxiv.org/abs/2104.00473v4

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2021-03-31

Universal Prediction Band via Semi-Definite Programming

Authors: Tengyuan Liang

We propose a computationally efficient method to construct nonparametric,
heteroscedastic prediction bands for uncertainty quantification, with or
without any user-specified predictive model. Our approach provides an
alternative to the now-standard conformal prediction for uncertainty
quantification, with novel theoretical insights and computational advantages.
The data-adaptive prediction band is universally applicable with minimal
distributional assumptions, has strong non-asymptotic coverage properties, and
is easy to implement using standard convex programs. Our approach can be viewed
as a novel variance interpolation with confidence and further leverages
techniques from semi-definite programming and sum-of-squares optimization.
Theoretical and numerical performances for the proposed approach for
uncertainty quantification are analyzed.

arXiv link: http://arxiv.org/abs/2103.17203v3

Econometrics arXiv paper, submitted: 2021-03-31

Forecasting open-high-low-close data contained in candlestick chart

Authors: Huiwen Wang, Wenyang Huang, Shanshan Wang

Forecasting the open-high-low-close (OHLC) data contained in candlestick
charts is of great practical importance, as exemplified by applications in the
field of finance. Typically, the inherent constraints in OHLC data pose a
great challenge to its prediction, e.g., forecasting models may yield
unrealistic values if these constraints are ignored. To address this, a novel
transformation approach is proposed to relax these constraints, along with its
explicit inverse transformation, which ensures that the forecasting models
produce meaningful open-high-low-close values. A flexible and efficient
framework for
forecasting the OHLC data is also provided. As an example, the detailed
procedure of modelling the OHLC data via the vector auto-regression (VAR) model
and vector error correction (VEC) model is given. The new approach has high
practical utility on account of its flexibility, simple implementation and
straightforward interpretation. Extensive simulation studies are performed to
assess the effectiveness and stability of the proposed approach. Three
financial data sets of the Kweichow Moutai, CSI 100 index and 50 ETF of Chinese
stock market are employed to document the empirical effect of the proposed
methodology.
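
The paper's specific transformation is not reproduced here; the sketch below
only illustrates the general idea of mapping a constrained OHLC tuple (low <=
open, close <= high, low > 0) to an unconstrained vector and back, using an
illustrative log/logit parameterization that is not necessarily the authors'.

    import numpy as np

    def ohlc_to_unconstrained(o, h, l, c, eps=1e-8):
        """Map an OHLC tuple with l <= o, c <= h and l > 0 to four unconstrained reals."""
        rng_ = max(h - l, eps)
        logit = lambda p: np.log(p / (1 - p))
        return np.array([np.log(l),                                    # level
                         np.log(rng_),                                 # range
                         logit(np.clip((o - l) / rng_, eps, 1 - eps)),
                         logit(np.clip((c - l) / rng_, eps, 1 - eps))])

    def unconstrained_to_ohlc(z):
        """Inverse map: any real 4-vector yields an OHLC tuple satisfying the constraints."""
        sigmoid = lambda x: 1 / (1 + np.exp(-x))
        l, rng_ = np.exp(z[0]), np.exp(z[1])
        h = l + rng_
        return l + sigmoid(z[2]) * rng_, h, l, l + sigmoid(z[3]) * rng_

    z = ohlc_to_unconstrained(o=101.0, h=103.5, l=100.2, c=102.8)
    print(np.round(unconstrained_to_ohlc(z), 4))   # recovers (open, high, low, close)

A forecasting model (e.g. a VAR) can then be fit to the unconstrained series
and its forecasts mapped back through the inverse transformation.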

arXiv link: http://arxiv.org/abs/2104.00581v1

Econometrics arXiv paper, submitted: 2021-03-31

Dimension reduction of open-high-low-close data in candlestick chart based on pseudo-PCA

Authors: Wenyang Huang, Huiwen Wang, Shanshan Wang

Open-high-low-close (OHLC) data are the most common data form in the field of
finance and the object of various technical analyses. As more features of OHLC
data are collected, the issue of extracting their useful information in a
comprehensible way for visualization and easy interpretation must be resolved.
The inherent constraints of OHLC data also pose a challenge for this task.
This paper proposes a novel approach that characterizes the features of OHLC
data in a dataset and then performs dimension reduction, integrating the
feature information extraction method with principal component analysis. We
refer to it as the pseudo-PCA method. Specifically, we first propose a new way
to represent the OHLC data, which removes the inherent constraints and
provides convenience for further analysis.
Moreover, there is a one-to-one match between the original OHLC data and its
feature-based representations, which means that the analysis of the
feature-based data can be reversed to the original OHLC data. Next, we develop
the pseudo-PCA procedure for OHLC data, which can effectively identify
important information and perform dimension reduction. Finally, the
effectiveness and interpretability of the proposed method are investigated
through finite simulations and the spot data of China's agricultural product
market.

arXiv link: http://arxiv.org/abs/2103.16908v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2021-03-31

Mobility Functional Areas and COVID-19 Spread

Authors: Stefano Maria Iacus, Carlos Santamaria, Francesco Sermi, Spyridon Spyratos, Dario Tarchi, Michele Vespe

This work introduces a new concept of functional areas called Mobility
Functional Areas (MFAs), i.e., the geographic zones highly interconnected
according to the analysis of mobile positioning data. The MFAs do not coincide
necessarily with administrative borders as they are built observing natural
human mobility and, therefore, they can be used to inform, in a bottom-up
approach, local transportation, spatial planning, health and economic policies.
After presenting the methodology behind the MFAs, this study focuses on the
link between the COVID-19 pandemic and the MFAs in Austria. It emerges that the
MFAs registered an average number of infections statistically larger than the
areas in the rest of the country, suggesting the usefulness of the MFAs in the
context of targeted re-escalation policy responses to this health crisis. The
MFAs dataset is openly available to other scholars for further analyses.

arXiv link: http://arxiv.org/abs/2103.16894v2

Econometrics arXiv updated paper (originally submitted: 2021-03-30)

On a Standard Method for Measuring the Natural Rate of Interest

Authors: Daniel Buncic

I show that Holston, Laubach and Williams' (2017) implementation of Median
Unbiased Estimation (MUE) cannot recover the signal-to-noise ratio of interest
from their Stage 2 model. Moreover, their implementation of the structural
break regressions which are used as an auxiliary model in MUE deviates from
Stock and Watson's (1998) formulation. This leads to spuriously large estimates
of the signal-to-noise parameter $\lambda_{z}$ and thereby an excessive
downward trend in the 'other factor' $z_{t}$ and the natural rate. I provide a
correction to the Stage 2 model specification and the implementation of the
structural break regressions in MUE. This correction is quantitatively
important. It results in substantially smaller point estimates of
$\lambda_{z}$, which reduces the severity of the downward trend in the 'other
factor' $z_{t}$.
For the US, the estimate of $\lambda _{z}$ shrinks from $0.040$ to $0.013$ and
is statistically highly insignificant. For the Euro Area, the UK and Canada,
the MUE point estimates of $\lambda _{z}$ are exactly zero. Natural rate
estimates from HLW's model using the correct Stage 2 MUE implementation are up
to 100 basis points larger than originally computed.

arXiv link: http://arxiv.org/abs/2103.16452v2

Econometrics arXiv updated paper (originally submitted: 2021-03-29)

Empirical Welfare Maximization with Constraints

Authors: Liyang Sun

Empirical Welfare Maximization (EWM) is a framework that can be used to
select welfare program eligibility policies based on data. This paper extends
EWM by allowing for uncertainty in estimating the budget needed to implement
the selected policy, in addition to its welfare. Due to the additional
estimation error, I show there exist no rules that achieve the highest welfare
possible while satisfying a budget constraint uniformly over a wide range of
DGPs. This differs from the setting without a budget constraint where
uniformity is achievable. I propose an alternative trade-off rule and
illustrate it with Medicaid expansion, a setting with imperfect take-up and
varying program costs.

arXiv link: http://arxiv.org/abs/2103.15298v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-03-26

Divide-and-Conquer: A Distributed Hierarchical Factor Approach to Modeling Large-Scale Time Series Data

Authors: Zhaoxing Gao, Ruey S. Tsay

This paper proposes a hierarchical approximate-factor approach to analyzing
high-dimensional, large-scale heterogeneous time series data using distributed
computing. The new method employs a multiple-fold dimension reduction procedure
using Principal Component Analysis (PCA) and shows great promises for modeling
large-scale data that cannot be stored nor analyzed by a single machine. Each
computer at the basic level performs a PCA to extract common factors among the
time series assigned to it and transfers those factors to one and only one node
of the second level. Each 2nd-level computer collects the common factors from
its subordinates and performs another PCA to select the 2nd-level common
factors. This process is repeated until the central server is reached, which
collects common factors from its direct subordinates and performs a final PCA
to select the global common factors. The noise terms of the 2nd-level
approximate factor model are the unique common factors of the 1st-level
clusters. We focus on the case of 2 levels in our theoretical derivations, but
the idea can easily be generalized to any finite number of hierarchies. We
discuss some clustering methods when the group memberships are unknown and
introduce a new diffusion index approach to forecasting. We further extend the
analysis to unit-root nonstationary time series. Asymptotic properties of the
proposed method are derived for the diverging dimension of the data in each
computing unit and the sample size $T$. We use both simulated data and real
examples to assess the performance of the proposed method in finite samples,
and compare our method with the commonly used ones in the literature concerning
the forecastability of extracted factors.

arXiv link: http://arxiv.org/abs/2103.14626v1
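
The two-level scheme can be sketched in a few lines. The snippet below assumes
group memberships are known, uses plain SVD-based PCA at each level, and omits
the diffusion-index forecasting and unit-root extensions, so it is a minimal
illustration of the hierarchy rather than the paper's estimator.

    # Minimal two-level sketch of hierarchical PCA on grouped series (groups known).
    import numpy as np

    def pca_factors(X, k):
        # Return the first k principal-component factors of the (T x N) panel X.
        Xc = X - X.mean(axis=0)
        U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
        return U[:, :k] * S[:k]

    rng = np.random.default_rng(1)
    T, n_groups, n_per_group = 200, 4, 50
    global_f = rng.standard_normal((T, 1))                  # one global factor
    panels = []
    for g in range(n_groups):
        group_f = rng.standard_normal((T, 1))               # one group-specific factor
        load_global = rng.standard_normal((1, n_per_group))
        load_group = rng.standard_normal((1, n_per_group))
        panels.append(global_f @ load_global + group_f @ load_group
                      + rng.standard_normal((T, n_per_group)))

    # Level 1: each "machine" extracts its own factors and ships them upward.
    level1 = [pca_factors(X, k=2) for X in panels]

    # Level 2 (central server): PCA on the collected level-1 factors -> global factors.
    global_hat = pca_factors(np.hstack(level1), k=1)
    print(abs(np.corrcoef(global_hat[:, 0], global_f[:, 0])[0, 1]))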

Econometrics arXiv paper, submitted: 2021-03-25

Addressing spatial dependence in technical efficiency estimation: A Spatial DEA frontier approach

Authors: Julian Ramajo, Miguel A. Marquez, Geoffrey J. D. Hewings

This paper introduces a new specification for the nonparametric
production frontier based on Data Envelopment Analysis (DEA), the spatial DEA
(SpDEA), for dealing with decision-making units whose economic performances are
correlated with those of their neighbors (spatial dependence). To illustrate
the bias reduction that the SpDEA provides with respect to standard DEA
methods, an analysis of the regional production frontiers for the NUTS-2
European regions during the period 2000-2014 was carried out. The estimated
SpDEA scores show a bimodal distribution not detected by the standard DEA
estimates. The results confirm
the crucial role of space, offering important new insights on both the causes
of regional disparities in labour productivity and the observed polarization of
the European distribution of per capita income.

arXiv link: http://arxiv.org/abs/2103.14063v1
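
For reference, the snippet below computes standard input-oriented,
constant-returns DEA efficiency scores by linear programming; it is the
non-spatial baseline the paper builds on, not the SpDEA correction itself, and
the data are simulated.

    # Standard input-oriented, constant-returns (CCR) DEA scores via linear programming.
    import numpy as np
    from scipy.optimize import linprog

    def dea_ccr_input(X, Y):
        # X: (n, m) inputs, Y: (n, s) outputs. Returns the efficiency score of each DMU.
        n, m = X.shape
        s = Y.shape[1]
        scores = np.empty(n)
        for o in range(n):
            # Decision variables z = [theta, lambda_1, ..., lambda_n]; minimize theta.
            c = np.concatenate(([1.0], np.zeros(n)))
            # Inputs:  sum_j lambda_j x_ij - theta * x_io <= 0
            A_in = np.hstack([-X[o].reshape(m, 1), X.T])
            # Outputs: -sum_j lambda_j y_rj <= -y_ro
            A_out = np.hstack([np.zeros((s, 1)), -Y.T])
            res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                          b_ub=np.concatenate([np.zeros(m), -Y[o]]),
                          bounds=[(0, None)] * (n + 1), method="highs")
            scores[o] = res.x[0]
        return scores

    rng = np.random.default_rng(2)
    inputs = rng.uniform(1, 10, size=(30, 2))
    outputs = inputs.sum(axis=1, keepdims=True) * rng.uniform(0.5, 1.0, size=(30, 1))
    print(np.round(dea_ccr_input(inputs, outputs), 3))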

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-03-25

Testing for threshold effects in the TARMA framework

Authors: Greta Goracci, Simone Giannerini, Kung-Sik Chan, Howell Tong

We present supremum Lagrange Multiplier tests to compare a linear ARMA
specification against its threshold ARMA extension. We derive the asymptotic
distribution of the test statistics both under the null hypothesis and
contiguous local alternatives. Moreover, we prove the consistency of the tests.
The Monte Carlo study shows that the tests enjoy good finite-sample properties,
are robust against model mis-specification and their performance is not
affected if the order of the model is unknown. The tests present a low
computational burden and do not suffer from some of the drawbacks that affect
the quasi-likelihood ratio setting. Lastly, we apply our tests to a time series
of standardized tree-ring growth indexes and this can lead to new research in
climate studies.

arXiv link: http://arxiv.org/abs/2103.13977v1

Econometrics arXiv updated paper (originally submitted: 2021-03-25)

A perturbed utility route choice model

Authors: Mogens Fosgerau, Mads Paulsen, Thomas Kjær Rasmussen

We propose a route choice model in which traveler behavior is represented as
a utility maximizing assignment of flow across an entire network under a flow
conservation constraint. Substitution between routes depends on how much they
overlap. The model is estimated considering the full set of route
alternatives, and no choice set generation is required. Nevertheless,
estimation requires only linear regression and is very fast. Predictions from
the model can be computed using convex optimization, and computation is
straightforward even for large networks. We estimate and validate the model
using a large dataset comprising 1,337,096 GPS traces of trips in the Greater
Copenhagen road network.

arXiv link: http://arxiv.org/abs/2103.13784v3

Econometrics arXiv paper, submitted: 2021-03-24

Phase transition of the monotonicity assumption in learning local average treatment effects

Authors: Yinchu Zhu

We consider the setting in which a strong binary instrument is available for
a binary treatment. The traditional LATE approach assumes the monotonicity
condition stating that there are no defiers (or compliers). Since this
condition is not always obvious, we investigate the sensitivity and testability
of this condition. In particular, we focus on the question: does a slight
violation of monotonicity lead to a small problem or a big problem? We find a
phase transition for the monotonicity condition. On one side of the boundary of
the phase transition, it is easy to learn the sign of LATE, while on the other
side it is impossible. Unfortunately, the
impossible side of the phase transition includes data-generating processes
under which the proportion of defiers tends to zero. This boundary of phase
transition is explicitly characterized in the case of binary outcomes. Outside
a special case, it is impossible to test whether the data-generating process is
on the nice side of the boundary. However, in the special case that the
non-compliance is almost one-sided, such a test is possible. We also provide
simple alternatives to monotonicity.

arXiv link: http://arxiv.org/abs/2103.13369v1

Econometrics arXiv updated paper (originally submitted: 2021-03-24)

An investigation of higher order moments of empirical financial data and the implications to risk

Authors: Luke De Clerk, Sergey Savel'ev

Here, we analyse the behaviour of the higher order standardised moments of
financial time series when we truncate a large data set into smaller and
smaller subsets, referred to below as time windows. We look at the effect of
the economic environment on the behaviour of higher order moments in these time
windows. We observe two different scaling relations of higher order moments
when the data subsets' length decreases; one for longer time windows and
another for the shorter time windows. These scaling relations drastically
change when the time window encompasses a financial crisis. We also observe a
qualitative change of higher order standardised moments compared to the
Gaussian values in response to a shrinking time window. We extend this analysis
to incorporate the effects these scaling relations have upon risk. We decompose
the return series within these time windows and carry out a Value-at-Risk
calculation. In doing so, we observe the manifestation of the scaling relations
through the change in the Value-at-Risk level. Moreover, we model the observed
scaling laws by analysing the hierarchy of rare events on higher order moments.

arXiv link: http://arxiv.org/abs/2103.13199v3
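
A minimal sketch of the kind of computation involved: standardised skewness,
kurtosis and a historical Value-at-Risk evaluated on shrinking time windows of
a simulated heavy-tailed return series (the return decomposition and the
scaling-law modelling of the paper are not reproduced).

    # Higher-order standardised moments and historical VaR on shrinking time windows.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    returns = 0.01 * rng.standard_t(df=4, size=5000)          # heavy-tailed daily returns

    for window in (5000, 1000, 250, 50):
        sub = returns[-window:]
        skew = stats.skew(sub)
        kurt = stats.kurtosis(sub, fisher=False)              # Gaussian benchmark is 3
        var_95 = -np.quantile(sub, 0.05)                      # historical 95% Value-at-Risk
        print(f"window={window:5d}  skew={skew:+.2f}  kurt={kurt:5.2f}  VaR95={var_95:.4f}")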

Econometrics arXiv updated paper (originally submitted: 2021-03-23)

Identification at the Zero Lower Bound

Authors: Sophocles Mavroeidis

I show that the Zero Lower Bound (ZLB) on interest rates can be used to
identify the causal effects of monetary policy. Identification depends on the
extent to which the ZLB limits the efficacy of monetary policy. I propose a
simple way to test the efficacy of unconventional policies, modelled via a
`shadow rate'. I apply this method to U.S. monetary policy using a
three-equation SVAR model of inflation, unemployment and the federal funds
rate. I reject the null hypothesis that unconventional monetary policy has no
effect at the ZLB, but find some evidence that it is not as effective as
conventional monetary policy.

arXiv link: http://arxiv.org/abs/2103.12779v2

Econometrics arXiv updated paper (originally submitted: 2021-03-23)

What Do We Get from Two-Way Fixed Effects Regressions? Implications from Numerical Equivalence

Authors: Shoya Ishimaru

This paper develops numerical and causal interpretations of two-way fixed
effects (TWFE) regressions, allowing for general scalar treatments with
non-staggered designs and time-varying covariates. Building on the numerical
equivalence between TWFE and pooled first-difference regressions, I decompose
the TWFE coefficient into a weighted average of first-difference coefficients
across varying horizons, clarifying contributions of short-run versus long-run
changes. Causal interpretation of the TWFE coefficient requires common trends
assumptions for all time horizons, conditional on changes, not levels, of
time-varying covariates. I develop diagnostic procedures to assess this
assumption's plausibility across different horizons, extending beyond recent
literature's focus on binary, staggered treatments.

arXiv link: http://arxiv.org/abs/2103.12374v9
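
As a small illustration of the objects being related, the sketch below
computes, for a simulated balanced panel, the TWFE coefficient via two-way
demeaning and a one-period first-difference coefficient with period effects
removed; it does not implement the paper's decomposition across horizons or
its diagnostic procedures.

    # Balanced-panel sketch: TWFE via two-way demeaning vs. one-period first differences.
    import numpy as np

    rng = np.random.default_rng(4)
    N, T, beta = 500, 6, 1.5
    alpha_i = rng.standard_normal((N, 1))                     # unit effects
    gamma_t = rng.standard_normal((1, T))                     # time effects
    D = rng.standard_normal((N, T)) + 0.5 * alpha_i + 0.5 * gamma_t
    Y = beta * D + alpha_i + gamma_t + rng.standard_normal((N, T))

    def two_way_demean(A):
        return A - A.mean(axis=1, keepdims=True) - A.mean(axis=0, keepdims=True) + A.mean()

    # TWFE (within) estimator.
    d_til, y_til = two_way_demean(D).ravel(), two_way_demean(Y).ravel()
    beta_twfe = (d_til @ y_til) / (d_til @ d_til)

    # One-period first differences, with period effects removed by within-period demeaning.
    dD, dY = np.diff(D, axis=1), np.diff(Y, axis=1)
    dD -= dD.mean(axis=0, keepdims=True)
    dY -= dY.mean(axis=0, keepdims=True)
    beta_fd = (dD.ravel() @ dY.ravel()) / (dD.ravel() @ dD.ravel())

    print(round(beta_twfe, 3), round(beta_fd, 3))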

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2021-03-22

Uncovering Bias in Order Assignment

Authors: Darren Grant

Many real life situations require a set of items to be repeatedly placed in a
random sequence. In such circumstances, it is often desirable to test whether
such randomization indeed obtains, yet this problem has received very limited
attention in the literature. This paper articulates the key features of this
problem and presents three "untargeted" tests that require no a priori
information from the analyst. These methods are used to analyze the order in
which lottery numbers are drawn in Powerball, the order in which contestants
perform on American Idol, and the order of candidates on primary election
ballots in Texas and West Virginia. In this last application, multiple
deviations from full randomization are detected, with potentially serious
political and legal consequences. The form these deviations take varies,
depending on institutional factors, which sometimes necessitates the use of
tests that exchange power for increased robustness.

arXiv link: http://arxiv.org/abs/2103.11952v2
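
A simple untargeted check in the same spirit (not one of the paper's three
tests) compares the dispersion of items' average positions against a Monte
Carlo null of uniformly random orderings, as sketched below on simulated
draws.

    # Monte Carlo check of randomization: variance of items' average positions.
    import numpy as np

    def mean_position_test(orderings, n_sim=5000, seed=0):
        # orderings: (R, K) array; row r gives the position (0..K-1) of each of K items.
        R, K = orderings.shape
        stat = orderings.mean(axis=0).var()
        rng = np.random.default_rng(seed)
        null = np.empty(n_sim)
        for s in range(n_sim):
            sim = np.array([rng.permutation(K) for _ in range(R)])
            null[s] = sim.mean(axis=0).var()
        return stat, (null >= stat).mean()                    # Monte Carlo p-value

    rng = np.random.default_rng(5)
    draws = np.array([rng.permutation(8) for _ in range(200)])   # consistent with randomization
    stat, pval = mean_position_test(draws)
    print(round(stat, 3), round(pval, 3))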

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2021-03-22

PatentSBERTa: A Deep NLP based Hybrid Model for Patent Distance and Classification using Augmented SBERT

Authors: Hamid Bekamiri, Daniel S. Hain, Roman Jurowetzki

This study provides an efficient approach for using text data to calculate
patent-to-patent (p2p) technological similarity, and presents a hybrid
framework for leveraging the resulting p2p similarity for applications such as
semantic search and automated patent classification. We create embeddings using
Sentence-BERT (SBERT) based on patent claims. We leverage SBERT's efficiency in
creating embedding distance measures to map p2p similarity in large sets of
patent data. We deploy our framework for classification with a simple K-Nearest
Neighbors (KNN) model that predicts the Cooperative Patent Classification (CPC)
of a patent based on the class assignment of the K patents with the highest p2p
similarity. We thereby validate that the p2p similarity captures patents'
technological features in terms of CPC overlap, and at the same time demonstrate
the
usefulness of this approach for automatic patent classification based on text
data. Furthermore, the presented classification framework is simple and the
results easy to interpret and evaluate by end-users. In the out-of-sample model
validation, we are able to perform a multi-label prediction of all assigned CPC
classes on the subclass (663) level on 1,492,294 patents with an accuracy of
54% and F1 score > 66%, which suggests that our model outperforms the current
state-of-the-art in text-based multi-label and multi-class patent
classification. We furthermore discuss the applicability of the presented
framework for semantic IP search, patent landscaping, and technology
intelligence. We finally point towards a future research agenda for leveraging
multi-source patent embeddings, their appropriateness across applications, as
well as to improve and validate patent embeddings by creating domain-expert
curated Semantic Textual Similarity (STS) benchmark datasets.

arXiv link: http://arxiv.org/abs/2103.11933v3
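
A minimal sketch of the embed-then-classify pipeline using the
sentence-transformers and scikit-learn libraries; the pretrained checkpoint,
toy claims and labels below are placeholders, not the paper's augmented SBERT
model or its CPC training data.

    # Embed patent claims with a pretrained SBERT checkpoint, classify by nearest neighbours.
    from sentence_transformers import SentenceTransformer
    from sklearn.neighbors import KNeighborsClassifier

    claims = [
        "A battery cell comprising a lithium anode ...",
        "A method for encrypting network packets ...",
        "An electrode assembly for energy storage ...",
        "A protocol for secure key exchange ...",
    ]
    labels = ["H01M", "H04L", "H01M", "H04L"]                 # toy CPC-like labels

    model = SentenceTransformer("all-MiniLM-L6-v2")           # any pretrained SBERT checkpoint
    embeddings = model.encode(claims, normalize_embeddings=True)

    knn = KNeighborsClassifier(n_neighbors=3, metric="cosine")
    knn.fit(embeddings, labels)

    query = model.encode(["A rechargeable cell with a solid electrolyte ..."],
                         normalize_embeddings=True)
    print(knn.predict(query))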

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2021-03-22

Robust Orthogonal Machine Learning of Treatment Effects

Authors: Yiyan Huang, Cheuk Hang Leung, Qi Wu, Xing Yan

Causal learning is the key to obtaining stable predictions and answering
what-if questions in decision-making. In causal learning, it is
central to seek methods to estimate the average treatment effect (ATE) from
observational data. The Double/Debiased Machine Learning (DML) is one of the
prevalent methods to estimate ATE. However, the DML estimators can suffer from
an error-compounding issue and even give extreme estimates when the
propensity scores are close to 0 or 1. Previous studies have overcome this
issue through some empirical tricks such as propensity score trimming, yet none
of the existing works solves it from a theoretical standpoint. In this paper,
we propose a Robust Causal Learning (RCL) method to offset the
deficiencies of DML estimators. Theoretically, the RCL estimators i) satisfy
the (higher-order) orthogonal condition and are as consistent and
doubly robust as the DML estimators, and ii) get rid of the error-compounding
issue. Empirically, the comprehensive experiments show that: i) the RCL
estimators give more stable estimations of the causal parameters than DML; ii)
the RCL estimators outperform traditional estimators and their variants when
applying different machine learning models on both simulated and benchmark
datasets, as well as on a WGAN-generated dataset that mimics consumer credit
data.

arXiv link: http://arxiv.org/abs/2103.11869v2

Econometrics arXiv updated paper (originally submitted: 2021-03-21)

A Powerful Subvector Anderson Rubin Test in Linear Instrumental Variables Regression with Conditional Heteroskedasticity

Authors: Patrik Guggenberger, Frank Kleibergen, Sophocles Mavroeidis

We introduce a new test for a two-sided hypothesis involving a subset of the
structural parameter vector in the linear instrumental variables (IVs) model.
Guggenberger et al. (2019), GKM19 from now on, introduce a subvector
Anderson-Rubin (AR) test with data-dependent critical values that has
asymptotic size equal to nominal size for a parameter space that allows for
arbitrary strength or weakness of the IVs and has uniformly nonsmaller power
than the projected AR test studied in Guggenberger et al. (2012). However,
GKM19 imposes the restrictive assumption of conditional homoskedasticity. The
main contribution here is to robustify the procedure in GKM19 to arbitrary
forms of conditional heteroskedasticity. We first adapt the method in GKM19 to
a setup where a certain covariance matrix has an approximate Kronecker product
(AKP) structure, which nests conditional homoskedasticity. The new test equals
this adaptation when the data are consistent with the AKP structure, as decided
by a model selection procedure. Otherwise, the test equals the AR/AR test in
Andrews
(2017) that is fully robust to conditional heteroskedasticity but less powerful
than the adapted method. We show theoretically that the new test has asymptotic
size bounded by the nominal size and document improved power relative to the
AR/AR test in a wide array of Monte Carlo simulations when the covariance
matrix is not too far from AKP.

arXiv link: http://arxiv.org/abs/2103.11371v4

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2021-03-19

On Spurious Causality, CO2, and Global Temperature

Authors: Philippe Goulet Coulombe, Maximilian Göbel

Stips, Macias, Coughlan, Garcia-Gorriz, and Liang (2016, Nature Scientific
Reports) use information flows (Liang, 2008, 2014) to establish causality from
various forcings to global temperature. We show that the formulas being used
hinge on a simplifying assumption that is nearly always rejected by the data.
We propose an adequate measure of information flow based on Vector
Autoregressions, and find that most results in Stips et al. (2016) cannot be
corroborated. Then, it is discussed which modeling choices (e.g., the choice of
CO2 series and assumptions about simultaneous relationships) may help in
extracting credible estimates of causal flows and the transient climate
response simply by looking at the joint dynamics of two climatic time series.

arXiv link: http://arxiv.org/abs/2103.10605v1
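
For orientation, the snippet below fits a bivariate VAR with statsmodels and
runs a conventional Granger-causality test on simulated series; this is
neither Liang's information-flow measure nor the paper's corrected VAR-based
estimator, only the standard benchmark computation.

    # Bivariate VAR and a plain Granger-causality test on simulated series.
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.api import VAR

    rng = np.random.default_rng(6)
    T = 400
    x, y = np.zeros(T), np.zeros(T)
    for t in range(1, T):
        x[t] = 0.6 * x[t - 1] + rng.standard_normal()
        y[t] = 0.5 * y[t - 1] + 0.3 * x[t - 1] + rng.standard_normal()   # x Granger-causes y

    data = pd.DataFrame({"temp": y, "forcing": x})
    res = VAR(data).fit(maxlags=4, ic="aic")
    print(res.test_causality("temp", ["forcing"], kind="f").summary())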

Econometrics arXiv updated paper (originally submitted: 2021-03-17)

Feasible IV Regression without Excluded Instruments

Authors: Emmanuel Selorm Tsyawo

The relevance condition of Integrated Conditional Moment (ICM) estimators is
significantly weaker than the conventional IV's in at least two respects: (1)
consistent estimation without excluded instruments is possible, provided
endogenous covariates are non-linearly mean-dependent on exogenous covariates,
and (2) endogenous covariates may be uncorrelated with but mean-dependent on
instruments. These remarkable properties notwithstanding, multiplicative-kernel
ICM estimators suffer from diminished identification strength, large bias, and
severe size distortions even for a moderately sized instrument vector. This
paper proposes a computationally fast linear ICM estimator that better
preserves identification strength in the presence of multiple instruments and a
test of the ICM relevance condition. Monte Carlo simulations demonstrate a
considerably better size control in the presence of multiple instruments and a
favourably competitive performance in general. An empirical example illustrates
the practical usefulness of the estimator, where estimates remain plausible
when no excluded instrument is used.

arXiv link: http://arxiv.org/abs/2103.09621v4

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2021-03-17

DoubleML -- An Object-Oriented Implementation of Double Machine Learning in R

Authors: Philipp Bach, Victor Chernozhukov, Malte S. Kurz, Martin Spindler, Sven Klaassen

The R package DoubleML implements the double/debiased machine learning
framework of Chernozhukov et al. (2018). It provides functionalities to
estimate parameters in causal models based on machine learning methods. The
double machine learning framework consists of three key ingredients: Neyman
orthogonality, high-quality machine learning estimation and sample splitting.
Estimation of nuisance components can be performed by various state-of-the-art
machine learning methods that are available in the mlr3 ecosystem. DoubleML
makes it possible to perform inference in a variety of causal models, including
partially linear and interactive regression models and their extensions to
instrumental variable estimation. The object-oriented implementation of
DoubleML enables a high flexibility for the model specification and makes it
easily extendable. This paper serves as an introduction to the double machine
learning framework and the R package DoubleML. In reproducible code examples
with simulated and real data sets, we demonstrate how DoubleML users can
perform valid inference based on machine learning methods.

arXiv link: http://arxiv.org/abs/2103.09603v6
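
The package itself is written in R; the Python sketch below reproduces the
three ingredients for the partially linear model (a Neyman-orthogonal
partialling-out score, machine-learning nuisance estimates, and cross-fitting)
directly with scikit-learn, so it illustrates the framework rather than the
DoubleML API.

    # Cross-fitted partially linear DML with a partialling-out (orthogonal) score.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import KFold

    rng = np.random.default_rng(7)
    n, p, theta = 2000, 10, 0.5
    X = rng.standard_normal((n, p))
    D = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.standard_normal(n)        # treatment
    Y = theta * D + np.cos(X[:, 0]) + X[:, 1] ** 2 + rng.standard_normal(n)

    res_y, res_d = np.empty(n), np.empty(n)
    for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        ml_g = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[train], Y[train])
        ml_m = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[train], D[train])
        res_y[test] = Y[test] - ml_g.predict(X[test])        # partial out E[Y|X]
        res_d[test] = D[test] - ml_m.predict(X[test])        # partial out E[D|X]

    theta_hat = (res_d @ res_y) / (res_d @ res_d)            # orthogonal score estimate
    psi = (res_y - theta_hat * res_d) * res_d
    se = np.sqrt(np.mean(psi ** 2)) / np.mean(res_d ** 2) / np.sqrt(n)
    print(round(theta_hat, 3), round(se, 3))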

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-03-17

Simultaneous Decorrelation of Matrix Time Series

Authors: Yuefeng Han, Rong Chen, Cun-Hui Zhang, Qiwei Yao

We propose a contemporaneous bilinear transformation for a $p\times q$ matrix
time series to alleviate the difficulties in modeling and forecasting matrix
time series when $p$ and/or $q$ are large. The resulting transformed matrix
assumes a block structure consisting of several small matrices, and those small
matrix series are uncorrelated across all times. Hence an overall parsimonious
model is achieved by modelling each of those small matrix series separately
without the loss of information on the linear dynamics. Such a parsimonious
model often has better forecasting performance, even when the underlying true
dynamics deviates from the assumed uncorrelated block structure after
transformation. The uniform convergence rates of the estimated transformation
are derived, which vindicate an important virtue of the proposed bilinear
transformation, i.e., it is technically equivalent to the decorrelation of a
vector time series of dimension $\max(p,q)$ instead of $p\times q$. The proposed
method is illustrated numerically via both simulated and real data examples.

arXiv link: http://arxiv.org/abs/2103.09411v2

Econometrics arXiv updated paper (originally submitted: 2021-03-15)

Estimating the Long-Term Effects of Novel Treatments

Authors: Keith Battocchi, Eleanor Dillon, Maggie Hei, Greg Lewis, Miruna Oprescu, Vasilis Syrgkanis

Policy makers typically face the problem of wanting to estimate the long-term
effects of novel treatments, while only having historical data of older
treatment options. We assume access to a long-term dataset where only past
treatments were administered and a short-term dataset where novel treatments
have been administered. We propose a surrogate based approach where we assume
that the long-term effect is channeled through a multitude of available
short-term proxies. Our work combines three major recent techniques in the
causal machine learning literature: surrogate indices, dynamic treatment effect
estimation and double machine learning, in a unified pipeline. We show that our
method is consistent and provides root-n asymptotically normal estimates under
a Markovian assumption on the data and the observational policy. We use a
data-set from a major corporation that includes customer investments over a
three year period to create a semi-synthetic data distribution where the major
qualitative properties of the real dataset are preserved. We evaluate the
performance of our method and discuss practical challenges of deploying our
formal methodology and how to address them.

arXiv link: http://arxiv.org/abs/2103.08390v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-03-12

Mixture composite regression models with multi-type feature selection

Authors: Tsz Chai Fung, George Tzougas, Mario Wuthrich

The aim of this paper is to present a mixture composite regression model for
claim severity modelling. Claim severity modelling poses several challenges
such as multimodality, heavy-tailedness and systematic effects in data. We
tackle this modelling problem by studying a mixture composite regression model
for simultaneous modeling of attritional and large claims, and for considering
systematic effects in both the mixture components as well as the mixing
probabilities. For model fitting, we present a group-fused regularization
approach that allows us to select the explanatory variables which
significantly impact the mixing probabilities and the different mixture
components, respectively. We develop an asymptotic theory for this regularized
estimation approach, and fitting is performed using a novel Generalized
Expectation-Maximization algorithm. We exemplify our approach on a real motor
insurance data set.

arXiv link: http://arxiv.org/abs/2103.07200v2

Econometrics arXiv updated paper (originally submitted: 2021-03-12)

Finding Subgroups with Significant Treatment Effects

Authors: Jann Spiess, Vasilis Syrgkanis, Victor Yaneng Wang

Researchers often run resource-intensive randomized controlled trials (RCTs)
to estimate the causal effects of interventions on outcomes of interest. Yet
these outcomes are often noisy, and estimated overall effects can be small or
imprecise. Nevertheless, we may still be able to produce reliable evidence of
the efficacy of an intervention by finding subgroups with significant effects.
In this paper, we propose a machine-learning method that is specifically
optimized for finding such subgroups in noisy data. Unlike available methods
for personalized treatment assignment, our tool is fundamentally designed to
take significance testing into account: it produces a subgroup that is chosen
to maximize the probability of obtaining a statistically significant positive
treatment effect. We provide a computationally efficient implementation using
decision trees and demonstrate its gain over selecting subgroups based on
positive (estimated) treatment effects. Compared to standard tree-based
regression and classification tools, this approach tends to yield higher power
in detecting subgroups affected by the treatment.

arXiv link: http://arxiv.org/abs/2103.07066v2

Econometrics arXiv updated paper (originally submitted: 2021-03-11)

Estimating the causal effect of an intervention in a time series setting: the C-ARIMA approach

Authors: Fiammetta Menchetti, Fabrizio Cipollini, Fabrizia Mealli

The Rubin Causal Model (RCM) is a framework that allows one to define the
causal effect of an intervention as a contrast of potential outcomes. In recent
years,
several methods have been developed under the RCM to estimate causal effects in
time series settings. None of these makes use of ARIMA models, which are
instead very common in the econometrics literature. In this paper, we propose a
novel approach, C-ARIMA, to define and estimate the causal effect of an
intervention in a time series setting under the RCM. We first formalize the
assumptions enabling the definition, the estimation and the attribution of the
effect to the intervention; we then check the validity of the proposed method
with an extensive simulation study, comparing its performance against a
standard intervention analysis approach. In the empirical application, we use
C-ARIMA to assess the causal effect of a permanent price reduction on
supermarket sales. The CausalArima R package provides an implementation of our
proposed approach.

arXiv link: http://arxiv.org/abs/2103.06740v3
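
The underlying counterfactual-contrast logic can be sketched as follows: fit
an ARIMA on the pre-intervention data, forecast the post-intervention path,
and compare it with the observed series. This is only the basic idea, not the
C-ARIMA estimator or the CausalArima package; the ARIMA order and the
simulated effect are illustrative.

    # ARIMA counterfactual forecast vs. observed post-intervention outcomes.
    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(8)
    T0, T1 = 150, 30                              # pre- and post-intervention lengths
    y = np.zeros(T0 + T1)
    for t in range(1, T0 + T1):
        y[t] = 0.7 * y[t - 1] + rng.standard_normal()
    y[T0:] += 2.0                                 # true effect of the intervention

    fit = ARIMA(y[:T0], order=(1, 0, 0)).fit()
    forecast = fit.get_forecast(steps=T1)
    counterfactual = np.asarray(forecast.predicted_mean)
    ci = np.asarray(forecast.conf_int(alpha=0.05))

    effect = y[T0:] - counterfactual              # pointwise causal-effect estimates
    outside = (y[T0:] < ci[:, 0]) | (y[T0:] > ci[:, 1])
    print("average effect:", round(effect.mean(), 2),
          " share outside 95% band:", round(outside.mean(), 2))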

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2021-03-11

Regression based thresholds in principal loading analysis

Authors: J. O. Bauer, B. Drabant

Principal loading analysis is a dimension reduction method that discards
variables which have only a small distorting effect on the covariance matrix.
As a special case, principal loading analysis discards variables that are not
correlated with the remaining ones. In multivariate linear regression, on the
other hand, predictors that are correlated neither with the remaining
predictors nor with the dependent variables have regression coefficients equal
to zero. Hence, if the goal is to select a number of predictors, variables that
do not correlate are discarded, as is also done in principal loading analysis.
However, the two methods select the same variables not only in the special case
of zero correlation. We contribute conditions
under which both methods share the same variable selection. Further, we extend
those conditions to provide a choice for the threshold in principal loading
analysis, which so far has only been guided by simulation-based recommendations.

arXiv link: http://arxiv.org/abs/2103.06691v3

Econometrics arXiv paper, submitted: 2021-03-11

Convergence of Computed Dynamic Models with Unbounded Shock

Authors: Kenichiro McAlinn, Kosaku Takanashi

This paper studies the asymptotic convergence of computed dynamic models when
the shock is unbounded. Most dynamic economic models lack a closed-form
solution. As such, approximate solutions by numerical methods are utilized.
Since the researcher cannot directly evaluate the exact policy function and the
associated exact likelihood, it is imperative that the approximate likelihood
asymptotically converges -- as well as to know the conditions of convergence --
to the exact likelihood, in order to justify and validate its usage. In this
regard, Fernandez-Villaverde, Rubio-Ramirez, and Santos (2006) show convergence
of the likelihood, when the shock has compact support. However, compact support
implies that the shock is bounded, which is not an assumption met in most
dynamic economic models, e.g., with normally distributed shocks. This paper
provides theoretical justification for most dynamic models used in the
literature by showing the conditions for convergence of the approximate
invariant measure obtained from numerical simulations to the exact invariant
measure, thus providing the conditions for convergence of the likelihood.

arXiv link: http://arxiv.org/abs/2103.06483v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2021-03-11

Causal inference with misspecified exposure mappings: separating definitions and assumptions

Authors: Fredrik Sävje

Exposure mappings facilitate investigations of complex causal effects when
units interact in experiments. Current methods require experimenters to use the
same exposure mappings both to define the effect of interest and to impose
assumptions on the interference structure. However, the two roles rarely
coincide in practice, and experimenters are forced to make the often
questionable assumption that their exposures are correctly specified. This
paper argues that the two roles exposure mappings currently serve can, and
typically should, be separated, so that exposures are used to define effects
without necessarily assuming that they are capturing the complete causal
structure in the experiment. The paper shows that this approach is practically
viable by providing conditions under which exposure effects can be precisely
estimated when the exposures are misspecified. Some important questions remain
open.

arXiv link: http://arxiv.org/abs/2103.06471v2

Econometrics arXiv updated paper (originally submitted: 2021-03-11)

More Robust Estimators for Instrumental-Variable Panel Designs, With An Application to the Effect of Imports from China on US Employment

Authors: Clément de Chaisemartin, Ziteng Lei

We show that first-difference two-stage least squares regressions identify
non-convex combinations of location-and-period-specific treatment effects.
Thus, those regressions could be biased if effects are heterogeneous. We
propose an alternative instrumental-variable correlated-random-coefficient
(IV-CRC) estimator that is more robust to heterogeneous effects. We revisit
Autor et al. (2013), who use a first-difference two-stage least squares
regression to estimate the effect of imports from China on US manufacturing
employment. Their regression estimates a highly non-convex combination of
effects. The estimate from our more robust IV-CRC estimator is small and
statistically insignificant. Though its confidence interval is wide, it
significantly differs from the first-difference two-stage least squares
estimator.

arXiv link: http://arxiv.org/abs/2103.06437v10

Econometrics arXiv updated paper (originally submitted: 2021-03-10)

Optimal Targeting in Fundraising: A Causal Machine-Learning Approach

Authors: Tobias Cagala, Ulrich Glogowsky, Johannes Rincke, Anthony Strittmatter

Ineffective fundraising lowers the resources charities can use to provide
goods. We combine a field experiment and a causal machine-learning approach to
increase a charity's fundraising effectiveness. The approach optimally targets
a fundraising instrument to individuals whose expected donations exceed
solicitation costs. Our results demonstrate that machine-learning-based optimal
targeting allows the charity to substantially increase donations net of
fundraising costs relative to uniform benchmarks in which either everybody or
no one receives the gift. To that end, it (a) should direct its fundraising
efforts to a subset of past donors and (b) never address individuals who were
previously asked but never donated. Further, we show that the benefits of
machine-learning-based optimal targeting even materialize when the charity only
exploits publicly available geospatial information or applies the estimated
optimal targeting rule to later fundraising campaigns conducted in similar
samples. We conclude that charities not engaging in optimal targeting waste
significant resources.

arXiv link: http://arxiv.org/abs/2103.10251v3

Econometrics arXiv paper, submitted: 2021-03-10

Extension of the Lagrange multiplier test for error cross-section independence to large panels with non normal errors

Authors: Zhaoyuan Li, Jianfeng Yao

This paper reexamines the seminal Lagrange multiplier test for cross-section
independence in a large panel model where both the number of cross-sectional
units n and the number of time series observations T can be large. The first
contribution of the paper is an enlargement of the test with two extensions:
first, a new asymptotic normality is derived in a simultaneous limiting
scheme where the two dimensions (n, T) tend to infinity with comparable
magnitudes; second, the result is valid for general error distributions (not
necessarily normal). The second contribution of the paper is a new test
statistic based on the sum of the fourth powers of cross-section correlations
from OLS residuals, instead of their squares used in the Lagrange multiplier
statistic. This new test is generally more powerful, and the improvement is
particularly visible against alternatives with weak or sparse cross-section
dependence. Both a simulation study and a real data analysis are provided to
demonstrate the advantages of the enlarged Lagrange multiplier test and the
power enhanced test in comparison with the existing procedures.

arXiv link: http://arxiv.org/abs/2103.06075v1
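
For context, the snippet below computes the classical Breusch-Pagan LM
statistic from OLS residual correlations, together with an unnormalized
fourth-power analogue of the kind the power-enhanced test builds on; the
paper's simultaneous large-(n, T) normalizations are not reproduced.

    # Breusch-Pagan LM statistic and a raw fourth-power analogue from OLS residuals.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(9)
    n, T, k = 20, 100, 2
    X = rng.standard_normal((n, T, k))
    beta = np.array([1.0, -0.5])
    Y = X @ beta + rng.standard_normal((n, T))               # cross-sectionally independent errors

    resid = np.empty((n, T))
    for i in range(n):                                       # unit-by-unit OLS
        b = np.linalg.lstsq(X[i], Y[i], rcond=None)[0]
        resid[i] = Y[i] - X[i] @ b

    R = np.corrcoef(resid)                                   # pairwise residual correlations
    iu = np.triu_indices(n, k=1)
    lm = T * np.sum(R[iu] ** 2)                              # classical LM statistic
    lm_pval = stats.chi2.sf(lm, df=n * (n - 1) // 2)         # fixed-n, large-T reference
    fourth_power = np.sum(R[iu] ** 4)                        # unnormalized fourth-power statistic
    print(round(lm, 2), round(lm_pval, 3), round(fourth_power, 4))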

Econometrics arXiv cross-link from q-fin.CP (q-fin.CP), submitted: 2021-03-09

Portfolio risk allocation through Shapley value

Authors: Patrick S. Hagan, Andrew Lesniewski, Georgios E. Skoufis, Diana E. Woodward

We argue that using the Shapley value of cooperative game theory as the
scheme for risk allocation among non-orthogonal risk factors is a natural way
of interpreting the contribution made by each of such factors to overall
portfolio risk. We discuss a Shapley value scheme for allocating risk to
non-orthogonal greeks in a portfolio of derivatives. Such a situation arises,
for example, when using a stochastic volatility model to capture option
volatility smile. We also show that the Shapley value allows for a natural
method of interpreting components of enterprise risk measures such as VaR and
ES. For all applications discussed, we derive explicit formulas and/or numerical
algorithms to calculate the allocations.

arXiv link: http://arxiv.org/abs/2103.05453v1
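
The allocation principle can be illustrated directly: the sketch below
computes exact Shapley allocations of total portfolio variance across
correlated risk factors using the standard combinatorial formula. The toy
covariance matrix and the variance value function stand in for the greeks, VaR
and ES applications discussed in the paper.

    # Exact Shapley allocation of total portfolio variance across risk factors.
    import numpy as np
    from itertools import combinations
    from math import factorial

    def shapley_allocation(k, value):
        # value(S) maps a tuple of factor indices to the risk of that coalition.
        phi = np.zeros(k)
        for i in range(k):
            others = [j for j in range(k) if j != i]
            for r in range(k):
                for S in combinations(others, r):
                    w = factorial(len(S)) * factorial(k - len(S) - 1) / factorial(k)
                    phi[i] += w * (value(S + (i,)) - value(S))
        return phi

    cov = np.array([[4.0, 1.0, 0.5],
                    [1.0, 2.0, 0.3],
                    [0.5, 0.3, 1.0]])                         # factor P&L covariance

    def variance_of(S):
        if not S:
            return 0.0
        idx = np.array(S)
        return cov[np.ix_(idx, idx)].sum()                    # variance of the coalition's sum

    phi = shapley_allocation(cov.shape[0], variance_of)
    print(np.round(phi, 3), round(phi.sum(), 3), round(cov.sum(), 3))   # allocations add up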

Econometrics arXiv updated paper (originally submitted: 2021-03-08)

Root-n-consistent Conditional ML estimation of dynamic panel logit models with fixed effects

Authors: Hugo Kruiniger

In this paper we first propose a root-n-consistent Conditional Maximum
Likelihood (CML) estimator for all the common parameters in the panel logit
AR(p) model with strictly exogenous covariates and fixed effects. Our CML
estimator (CMLE) converges in probability faster and is more easily computed
than the kernel-weighted CMLE of Honor\'e and Kyriazidou (2000). Next, we
propose a root-n-consistent CMLE for the coefficients of the exogenous
covariates only. We also discuss new CMLEs for the panel logit AR(p) model
without covariates. Finally, we propose CMLEs for multinomial dynamic panel
logit models with and without covariates. All CMLEs are asymptotically normally
distributed.

arXiv link: http://arxiv.org/abs/2103.04973v6

Econometrics arXiv updated paper (originally submitted: 2021-03-08)

Approximate Bayesian inference and forecasting in huge-dimensional multi-country VARs

Authors: Martin Feldkircher, Florian Huber, Gary Koop, Michael Pfarrhofer

Panel Vector Autoregressions (PVARs) are a popular tool for analyzing
multi-country datasets. However, the number of estimated parameters can be
enormous, leading to computational and statistical issues. In this paper, we
develop fast Bayesian methods for estimating PVARs using integrated rotated
Gaussian approximations. We exploit the fact that domestic information is often
more important than international information and group the coefficients
accordingly. Fast approximations are used to estimate the latter while the
former are estimated with precision using Markov chain Monte Carlo techniques.
We illustrate, using a huge model of the world economy, that it produces
competitive forecasts quickly.

arXiv link: http://arxiv.org/abs/2103.04944v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-03-07

On a log-symmetric quantile tobit model applied to female labor supply data

Authors: Danúbia R. Cunha, Jose A. Divino, Helton Saulo

The classic censored regression model (tobit model) has been widely used in
the economic literature. This model assumes normality for the error
distribution and is not recommended for cases where positive skewness is
present. Moreover, in regression analysis, it is well-known that a quantile
regression approach allows us to study the influences of the explanatory
variables on the dependent variable considering different quantiles. Therefore,
we propose in this paper a quantile tobit regression model based on
quantile-based log-symmetric distributions. The proposed methodology allows us
to model data with positive skewness (which is not suitable for the classic
tobit model), and to study the influence of the quantiles of interest, in
addition to accommodating heteroscedasticity. The model parameters are
estimated using the maximum likelihood method and an elaborate Monte Carlo
study is performed to evaluate the performance of the estimates. Finally, the
proposed methodology is illustrated using two female labor supply data sets.
The results show that the proposed log-symmetric quantile tobit model has a
better fit than the classic tobit model.

arXiv link: http://arxiv.org/abs/2103.04449v1

Econometrics arXiv paper, submitted: 2021-03-07

The impact of online machine-learning methods on long-term investment decisions and generator utilization in electricity markets

Authors: Alexander J. M. Kell, A. Stephen McGough, Matthew Forshaw

Electricity supply must be matched with demand at all times. This helps
reduce the chances of issues such as load-frequency control problems and
electricity blackouts. To gain a better understanding of the load that is
likely to be required over the next 24h, estimations under uncertainty are
needed. This is especially difficult in a decentralized electricity market with
many micro-producers which are not under central control.
In this paper, we investigate the impact of eleven offline learning and five
online learning algorithms to predict the electricity demand profile over the
next 24h. We achieve this through integration within the long-term agent-based
model, ElecSim. Through the prediction of electricity demand profile over the
next 24h, we can simulate the predictions made for a day-ahead market. Once we
have made these predictions, we sample from the residual distributions and
perturb the electricity market demand using the simulation, ElecSim. This
enables us to understand the impact of errors on the long-term dynamics of a
decentralized electricity market.
We show we can reduce the mean absolute error by 30% using an online
algorithm when compared to the best offline algorithm, whilst reducing the
tendered national grid reserve required. This reduction in national
grid reserves leads to savings in costs and emissions. We also show that large
errors in prediction accuracy have a disproportionate error on investments made
over a 17-year time frame, as well as electricity mix.

arXiv link: http://arxiv.org/abs/2103.04327v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2021-03-06

Asymptotic Theory for IV-Based Reinforcement Learning with Potential Endogeneity

Authors: Jin Li, Ye Luo, Zigan Wang, Xiaowei Zhang

In the standard data analysis framework, data is collected (once and for
all), and then data analysis is carried out. However, with the advancement of
digital technology, decision-makers constantly analyze past data and generate
new data through their decisions. We model this as a Markov decision process
and show that the dynamic interaction between data generation and data analysis
leads to a new type of bias -- reinforcement bias -- that exacerbates the
endogeneity problem in standard data analysis. We propose a class of
instrumental variable (IV)-based reinforcement learning (RL) algorithms to
correct for the
bias and establish their theoretical properties by incorporating them into a
stochastic approximation (SA) framework. Our analysis accommodates
iterate-dependent Markovian structures and, therefore, can be used to study RL
algorithms with policy improvement. We also provide formulas for inference on
optimal policies of the IV-RL algorithms. These formulas highlight how
intertemporal dependencies of the Markovian environment affect the inference.

arXiv link: http://arxiv.org/abs/2103.04021v3

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2021-03-05

Autocalibration and Tweedie-dominance for Insurance Pricing with Machine Learning

Authors: Michel Denuit, Arthur Charpentier, Julien Trufin

Boosting techniques and neural networks are particularly effective machine
learning methods for insurance pricing. Often in practice, there are
nevertheless endless debates about the choice of the right loss function to be
used to train the machine learning model, as well as about the appropriate
metric to assess the performances of competing models. Also, the sum of fitted
values can depart from the observed totals to a large extent and this often
confuses actuarial analysts. The lack of balance inherent to training models by
minimizing deviance outside the familiar GLM with canonical link setting has
been empirically documented in W\"uthrich (2019, 2020) who attributes it to the
early stopping rule in gradient descent methods for model fitting. The present
paper aims to further study this phenomenon when learning proceeds by
minimizing Tweedie deviance. It is shown that minimizing deviance involves a
trade-off between the integral of weighted differences of lower partial moments
and the bias measured on a specific scale. Autocalibration is then proposed as
a remedy. This new method to correct for bias adds an extra local GLM step to
the analysis. Theoretically, it is shown that it implements the autocalibration
concept in pure premium calculation and ensures that balance also holds on a
local scale, not only at portfolio level as with existing bias-correction
techniques. The convex order appears to be the natural tool to compare
competing models, putting a new light on the diagnostic graphs and associated
metrics proposed by Denuit et al. (2019).

arXiv link: http://arxiv.org/abs/2103.03635v2

Econometrics arXiv updated paper (originally submitted: 2021-03-05)

Modeling tail risks of inflation using unobserved component quantile regressions

Authors: Michael Pfarrhofer

This paper proposes methods for Bayesian inference in time-varying parameter
(TVP) quantile regression (QR) models featuring conditional heteroskedasticity.
I use data augmentation schemes to render the model conditionally Gaussian and
develop an efficient Gibbs sampling algorithm. Regularization of the
high-dimensional parameter space is achieved via flexible dynamic shrinkage
priors. A simple version of TVP-QR based on an unobserved component model is
applied to dynamically trace the quantiles of the distribution of inflation in
the United States, the United Kingdom and the euro area. In an out-of-sample
forecast exercise, I find the proposed model to be competitive and perform
particularly well for higher-order and tail forecasts. A detailed analysis of
the resulting predictive distributions reveals that they are sometimes skewed
and occasionally feature heavy tails.

arXiv link: http://arxiv.org/abs/2103.03632v2

Econometrics arXiv paper, submitted: 2021-03-05

Prediction of financial time series using LSTM and data denoising methods

Authors: Qi Tang, Tongmei Fan, Ruchen Shi, Jingyan Huang, Yidan Ma

In order to further overcome the difficulties of the existing models in
dealing with the non-stationary and nonlinear characteristics of high-frequency
financial time series data, especially its weak generalization ability, this
paper proposes an ensemble method based on data denoising methods, including
the wavelet transform (WT) and singular spectrum analysis (SSA), and a long
short-term memory (LSTM) neural network to build a data prediction model. The
financial time series is decomposed and reconstructed by WT and SSA to denoise
it and recover a smooth sequence that retains the effective information. The
smoothed sequence is then fed into the LSTM to obtain the predicted values.
With the Dow Jones Industrial Average index (DJIA) as the research object, the
closing price of the DJIA every five minutes is divided into short-term (1
hour), medium-term (3 hours) and long-term (6 hours) horizons. Based on root
mean square error (RMSE),
mean absolute error (MAE), mean absolute percentage error (MAPE) and absolute
percentage error standard deviation (SDAPE), the experimental results show that
in the short-term, medium-term and long-term, data denoising can greatly
improve the accuracy and stability of the prediction, and can effectively
improve the generalization ability of the LSTM prediction model. As WT and SSA
can extract useful information from the original sequence and avoid
overfitting, the hybrid model can better capture the pattern of the DJIA
closing price. Moreover, the WT-LSTM model outperforms both the benchmark LSTM
model and the SSA-LSTM model.

arXiv link: http://arxiv.org/abs/2103.03505v1
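
The wavelet-denoising step can be sketched with PyWavelets as below: decompose
the series, soft-threshold the detail coefficients, and reconstruct a smoothed
version. The SSA step and the LSTM forecaster are omitted, and the wavelet and
threshold rule are illustrative choices rather than the paper's settings.

    # Wavelet denoising of a simulated price series (decompose, soft-threshold, reconstruct).
    import numpy as np
    import pywt

    rng = np.random.default_rng(10)
    t = np.linspace(0, 1, 1024)
    price = 100 + 5 * np.sin(6 * np.pi * t) + np.cumsum(0.05 * rng.standard_normal(1024))
    noisy = price + 0.5 * rng.standard_normal(1024)

    coeffs = pywt.wavedec(noisy, "db4", level=4)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745            # noise scale from finest details
    thresh = sigma * np.sqrt(2 * np.log(len(noisy)))          # universal threshold
    denoised_coeffs = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
    denoised = pywt.waverec(denoised_coeffs, "db4")[: len(noisy)]

    print("RMSE noisy   :", round(float(np.sqrt(np.mean((noisy - price) ** 2))), 4))
    print("RMSE denoised:", round(float(np.sqrt(np.mean((denoised - price) ** 2))), 4))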

Econometrics arXiv paper, submitted: 2021-03-04

Extremal points of Lorenz curves and applications to inequality analysis

Authors: Amparo Baíllo, Javier Cárcamo, Carlos Mora-Corral

We find the set of extremal points of Lorenz curves with fixed Gini index and
compute the maximal $L^1$-distance between Lorenz curves with given values of
their Gini coefficients. As an application we introduce a bidimensional index
that simultaneously measures relative inequality and dissimilarity between two
populations. This proposal employs the Gini indices of the variables and an
$L^1$-distance between their Lorenz curves. The index takes values in a
right-angled triangle, two of whose sides characterize perfect relative
inequality, expressed by the Lorenz ordering between the underlying
distributions. Further, the hypotenuse represents maximal distance between the
two distributions. As a consequence, we construct a chart to graphically
track either the evolution of (relative) inequality and distance between two
income distributions over time, or to compare the income distribution of a
specific population between a fixed time point and a range of years. We prove
the mathematical results behind the above claims and provide a full description
of the asymptotic properties of the plug-in estimator of this index. Finally,
we apply the proposed bidimensional index to several real EU-SILC income
datasets to illustrate its performance in practice.

arXiv link: http://arxiv.org/abs/2103.03286v1
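
A plug-in version of the ingredients is straightforward to compute: empirical
Lorenz curves on a common grid, Gini indices, and the L1 distance between the
two curves, as sketched below on simulated incomes. This covers only the raw
computation behind the bidimensional index, not the paper's extremal-point
results or asymptotic theory.

    # Empirical Lorenz curves, Gini indices and the L1 distance between two samples.
    import numpy as np

    def lorenz(sample, grid):
        x = np.sort(sample)
        cum = np.concatenate([[0.0], np.cumsum(x)]) / x.sum()
        p = np.linspace(0, 1, len(x) + 1)
        return np.interp(grid, p, cum)

    def gini(sample):
        x = np.sort(sample)
        n = len(x)
        return (2 * np.arange(1, n + 1) - n - 1) @ x / (n * x.sum())

    rng = np.random.default_rng(11)
    income_a = rng.lognormal(mean=3.0, sigma=0.5, size=5000)
    income_b = rng.lognormal(mean=3.0, sigma=0.9, size=5000)

    grid = np.linspace(0, 1, 1001)
    # Mean absolute gap over a uniform grid on [0, 1] approximates the L1 distance.
    l1_distance = np.mean(np.abs(lorenz(income_a, grid) - lorenz(income_b, grid)))
    print(round(gini(income_a), 3), round(gini(income_b), 3), round(l1_distance, 4))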

Econometrics arXiv paper, submitted: 2021-03-04

High-dimensional estimation of quadratic variation based on penalized realized variance

Authors: Kim Christensen, Mikkel Slot Nielsen, Mark Podolskij

In this paper, we develop a penalized realized variance (PRV) estimator of
the quadratic variation (QV) of a high-dimensional continuous It\^{o}
semimartingale. We adapt the principal idea of regularization from linear
regression to covariance estimation in a continuous-time high-frequency
setting. We show that under a nuclear norm penalization, the PRV is computed by
soft-thresholding the eigenvalues of realized variance (RV). It therefore
encourages sparsity of singular values or, equivalently, low rank of the
solution. We prove our estimator is minimax optimal up to a logarithmic factor.
We derive a concentration inequality, which reveals that the rank of PRV is --
with a high probability -- the number of non-negligible eigenvalues of the QV.
Moreover, we also provide the associated non-asymptotic analysis for the spot
variance. We suggest an intuitive data-driven bootstrap procedure to select the
shrinkage parameter. Our theory is supplemented by a simulation study and an
empirical application. The PRV detects about three to five factors in the equity
market, with a notable rank decrease during times of distress in financial
markets. This is consistent with most standard asset pricing models, where a
limited amount of systematic factors driving the cross-section of stock returns
are perturbed by idiosyncratic errors, rendering the QV -- and also RV -- of
full rank.

arXiv link: http://arxiv.org/abs/2103.03237v1
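
The core computation can be sketched numerically: build the realized variance
matrix from high-frequency returns and soft-threshold its eigenvalues at a
shrinkage level mu. The choice of mu below is arbitrary; the paper's
data-driven bootstrap selection is not reproduced.

    # Realized variance matrix with soft-thresholded eigenvalues (penalized RV sketch).
    import numpy as np

    rng = np.random.default_rng(12)
    d, n_intraday, n_factors = 50, 390, 3
    B = rng.standard_normal((d, n_factors))                   # factor loadings
    factor_ret = 0.01 * rng.standard_normal((n_intraday, n_factors)) / np.sqrt(n_intraday)
    idio_ret = 0.002 * rng.standard_normal((n_intraday, d)) / np.sqrt(n_intraday)
    returns = factor_ret @ B.T + idio_ret                     # high-frequency returns

    RV = returns.T @ returns                                  # realized variance matrix
    eigvals, eigvecs = np.linalg.eigh(RV)

    mu = 2 * np.median(eigvals)                               # illustrative shrinkage level
    shrunk = np.maximum(eigvals - mu, 0.0)                    # soft-thresholding
    PRV = eigvecs @ np.diag(shrunk) @ eigvecs.T

    print("rank of RV :", np.linalg.matrix_rank(RV))
    print("rank of PRV:", int(np.sum(shrunk > 0)))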

Econometrics arXiv updated paper (originally submitted: 2021-03-04)

Factor-Based Imputation of Missing Values and Covariances in Panel Data of Large Dimensions

Authors: Ercument Cahan, Jushan Bai, Serena Ng

Economists are blessed with a wealth of data for analysis, but more often
than not, values in some entries of the data matrix are missing. Various
methods have been proposed to handle missing observations in a few variables.
We exploit the factor structure in panel data of large dimensions. Our
tall-project algorithm first estimates the factors from a
tall block in which data for all rows are observed, and projections of
variable specific length are then used to estimate the factor loadings. A
missing value is imputed as the estimated common component which we show is
consistent and asymptotically normal without further iteration. Implications
for using imputed data in factor augmented regressions are then discussed.
To compensate for the downward bias in covariance matrices created by the
noise omitted when a data point is not observed, we overlay the imputed data
with re-sampled idiosyncratic residuals many times and use the average of the
covariances to estimate the parameters of interest. Simulations show that the
procedures have desirable finite sample properties.

arXiv link: http://arxiv.org/abs/2103.03045v3

Econometrics arXiv updated paper (originally submitted: 2021-03-04)

Theory of Evolutionary Spectra for Heteroskedasticity and Autocorrelation Robust Inference in Possibly Misspecified and Nonstationary Models

Authors: Alessandro Casini

We develop a theory of evolutionary spectra for heteroskedasticity and
autocorrelation robust (HAR) inference when the data may not satisfy
second-order stationarity. Nonstationarity is a common feature of economic time
series which may arise either from parameter variation or model
misspecification. In such a context, the theories that support HAR inference
are either not applicable or do not provide accurate approximations. HAR tests
standardized by existing long-run variance estimators then may display size
distortions and little or no power. This issue can be more severe for methods
that use long bandwidths (i.e., fixed-b HAR tests). We introduce a class of
nonstationary processes that have a time-varying spectral representation which
evolves continuously except at a finite number of time points. We present an
extension of the classical heteroskedasticity and autocorrelation consistent
(HAC) estimators that applies two smoothing procedures. One is over the lagged
autocovariances, akin to classical HAC estimators, and the other is over time.
The latter element is important to flexibly account for nonstationarity. We
name them double kernel HAC (DK-HAC) estimators. We show the consistency of the
estimators and obtain an optimal DK-HAC estimator under the mean squared error
(MSE) criterion. Overall, HAR tests standardized by the proposed DK-HAC
estimators are competitive with fixed-b HAR tests, when the latter work well,
with regards to size control even when there is strong dependence. Notably, in
those empirically relevant situations in which previous HAR tests are
undersized and have little or no power, the DK-HAC estimator leads to tests
that have good size and power.

arXiv link: http://arxiv.org/abs/2103.02981v2

Econometrics arXiv updated paper (originally submitted: 2021-03-03)

Modeling Macroeconomic Variations After COVID-19

Authors: Serena Ng

The coronavirus is a global event of historical proportions and just a few
months changed the time series properties of the data in ways that make many
pre-covid forecasting models inadequate. It also creates a new problem for
estimation of economic factors and dynamic causal effects because the
variations around the outbreak can be interpreted as outliers, as shifts to the
distribution of existing shocks, or as addition of new shocks. I take the
latter view and use covid indicators as controls to 'de-covid' the data prior
to estimation. I find that economic uncertainty remains high at the end of 2020
even though real economic activity has recovered and covid uncertainty has
receded. Dynamic responses of variables to shocks in a VAR similar in magnitude
and shape to the ones identified before 2020 can be recovered by directly or
indirectly modeling covid and treating it as exogenous. These responses to
economic shocks are distinctly different from those to a covid shock which are
much larger but shorter lived. Disentangling the two types of shocks can be
important in macroeconomic modeling post-covid.

arXiv link: http://arxiv.org/abs/2103.02732v4

Econometrics arXiv updated paper (originally submitted: 2021-03-03)

Prewhitened Long-Run Variance Estimation Robust to Nonstationarity

Authors: Alessandro Casini, Pierre Perron

We introduce a nonparametric nonlinear VAR prewhitened long-run variance
(LRV) estimator for the construction of standard errors robust to
autocorrelation and heteroskedasticity that can be used for hypothesis testing
in a variety of contexts including the linear regression model. Existing
methods either are theoretically valid only under stationarity and have poor
finite-sample properties under nonstationarity (i.e., fixed-b methods), or are
theoretically valid under the null hypothesis but lead to tests that are not
consistent under nonstationary alternative hypothesis (i.e., both fixed-b and
traditional HAC estimators). The proposed estimator accounts explicitly for
nonstationarity, unlike previous prewhitened procedures which are known to be
unreliable, and leads to tests with accurate null rejection rates and good
monotonic power. We also establish MSE bounds for LRV estimation that are
sharper than previously established and use them to determine the
data-dependent bandwidths.

arXiv link: http://arxiv.org/abs/2103.02235v3

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2021-03-02

Slow-Growing Trees

Authors: Philippe Goulet Coulombe

Random Forest's performance can be matched by a single slow-growing tree
(SGT), which uses a learning rate to tame CART's greedy algorithm. SGT exploits
the view that CART is an extreme case of an iterative weighted least square
procedure. Moreover, a unifying view of Boosted Trees (BT) and Random Forests
(RF) is presented. Greedy ML algorithms' outcomes can be improved using either
"slow learning" or diversification. SGT applies the former to estimate a single
deep tree, and Booging (bagging stochastic BT with a high learning rate) uses
the latter with additive shallow trees. The performance of this tree ensemble
quaternity (Booging, BT, SGT, RF) is assessed on simulated and real regression
tasks.

arXiv link: http://arxiv.org/abs/2103.01926v2

Econometrics arXiv updated paper (originally submitted: 2021-03-02)

Theory of Low Frequency Contamination from Nonstationarity and Misspecification: Consequences for HAR Inference

Authors: Alessandro Casini, Taosong Deng, Pierre Perron

We establish theoretical results about the low frequency contamination (i.e.,
long memory effects) induced by general nonstationarity for estimates such as
the sample autocovariance and the periodogram, and deduce consequences for
heteroskedasticity and autocorrelation robust (HAR) inference. We present
explicit expressions for the asymptotic bias of these estimates. We distinguish
cases where this contamination only occurs as a small-sample problem and cases
where the contamination continues to hold asymptotically. We show theoretically
that nonparametric smoothing over time is robust to low frequency
contamination. Our results provide new insights on the debate between
consistent versus inconsistent long-run variance (LRV) estimation. Existing LRV
estimators tend to be inflated when the data are nonstationary. This results
in HAR tests that can be undersized and exhibit dramatic power losses. Our
theory indicates that long bandwidths or fixed-b HAR tests suffer more from low
frequency contamination relative to HAR tests based on HAC estimators, whereas
recently introduced double kernel HAC estimators do not suffer from this
problem. Finally, we present second-order Edgeworth expansions under
nonstationarity about the distribution of HAC and DK-HAC estimators and about
the corresponding t-test in the linear regression model.

arXiv link: http://arxiv.org/abs/2103.01604v3

Econometrics arXiv updated paper (originally submitted: 2021-03-02)

Network Cluster-Robust Inference

Authors: Michael P. Leung

Since network data commonly consists of observations from a single large
network, researchers often partition the network into clusters in order to
apply cluster-robust inference methods. Existing such methods require clusters
to be asymptotically independent. Under mild conditions, we prove that, for
this requirement to hold for network-dependent data, it is necessary and
sufficient that clusters have low conductance, the ratio of edge boundary size
to volume. This yields a simple measure of cluster quality. We find in
simulations that when clusters have low conductance, cluster-robust methods
control size better than HAC estimators. However, for important classes of
networks lacking low-conductance clusters, the former can exhibit substantial
size distortion. To determine the number of low-conductance clusters and
construct them, we draw on results in spectral graph theory that connect
conductance to the spectrum of the graph Laplacian. Based on these results, we
propose to use the spectrum to determine the number of low-conductance clusters
and spectral clustering to construct them.
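
A minimal Python sketch of the workflow suggested above, assuming a toy planted-partition network: inspect the spectrum of the normalized Laplacian for an eigengap, form that many clusters by spectral clustering, and report each cluster's conductance; the graph and the crude eigengap heuristic are illustrative.

    import numpy as np
    import networkx as nx
    from sklearn.cluster import SpectralClustering

    G = nx.planted_partition_graph(4, 50, p_in=0.2, p_out=0.01, seed=0)  # toy network
    L = nx.normalized_laplacian_matrix(G).toarray()
    eigs = np.sort(np.linalg.eigvalsh(L))
    n_clusters = int(np.argmax(np.diff(eigs[:10])) + 1)   # crude eigengap heuristic
    print("eigengap suggests", n_clusters, "clusters")

    A = nx.to_numpy_array(G)
    labels = SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                                random_state=0).fit_predict(A)
    nodes = np.array(G.nodes())
    for c in range(n_clusters):
        S = set(nodes[labels == c].tolist())
        # conductance = edge boundary size over volume; low values indicate good clusters
        print("cluster", c, "size", len(S), "conductance", round(nx.conductance(G, S), 3))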

arXiv link: http://arxiv.org/abs/2103.01470v4

Econometrics arXiv updated paper (originally submitted: 2021-03-02)

Some Finite Sample Properties of the Sign Test

Authors: Yong Cai

This paper contains two finite-sample results concerning the sign test.
First, we show that the sign test is unbiased with independent, non-identically
distributed data for both one-sided and two-sided hypotheses. The proof for the
two-sided case is based on a novel argument that relates the derivatives of the
power function to a regular bipartite graph. Unbiasedness then follows from the
existence of perfect matchings on such graphs. Second, we provide a simple
theoretical counterexample to show that the sign test over-rejects when the
data exhibits correlation. Our results can be useful for understanding the
properties of approximate randomization tests in settings with few clusters.
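
For reference, a minimal implementation of the one-sample sign test whose finite-sample properties are studied above; it uses the Binomial(n, 1/2) null distribution of the number of positive signs and drops ties by convention.

    import numpy as np
    from scipy.stats import binom

    def sign_test(x, m0=0.0):
        # Two-sided sign test of H0: median(X_i) = m0 for independent observations.
        d = np.asarray(x) - m0
        d = d[d != 0]                      # ties with m0 are conventionally dropped
        n, s = len(d), int(np.sum(d > 0))
        p_two_sided = 2 * min(binom.cdf(s, n, 0.5), binom.sf(s - 1, n, 0.5))
        return s, n, min(1.0, p_two_sided)

    rng = np.random.default_rng(1)
    x = rng.standard_normal(30) + 0.5      # true median 0.5; test H0: median = 0
    print(sign_test(x, m0=0.0))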

arXiv link: http://arxiv.org/abs/2103.01412v2

Econometrics arXiv paper, submitted: 2021-03-02

Standing on the Shoulders of Machine Learning: Can We Improve Hypothesis Testing?

Authors: Gary Cornwall, Jeff Chen, Beau Sauley

In this paper we have updated the hypothesis testing framework by drawing
upon modern computational power and classification models from machine
learning. We show that a simple classification algorithm such as a boosted
decision stump can be used to fully recover the full size-power trade-off for
any single test statistic. This recovery implies an equivalence, under certain
conditions, between the basic building block of modern machine learning and
hypothesis testing. Second, we show that more complex algorithms such as the
random forest and gradient boosted machine can serve as mapping functions in
place of the traditional null distribution. This allows for multiple test
statistics and other information to be evaluated simultaneously and thus form a
pseudo-composite hypothesis test. Moreover, we show how practitioners can make
explicit the relative costs of Type I and Type II errors to contextualize the
test into a specific decision framework. To illustrate this approach we revisit
the case of testing for unit roots, a difficult problem in time series
econometrics for which existing tests are known to exhibit low power. Using a
simulation framework common to the literature we show that this approach can
improve upon the overall accuracy of the traditional unit root test(s) by
seventeen percentage points, and the sensitivity by thirty-six percentage points.
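
A small Python sketch of the first point, assuming a simple one-sample t-test rather than the unit-root setting: simulated draws of the test statistic under the null and under an alternative are fed to a boosted decision stump, and the classifier's (in-sample) ROC curve traces out the size-power trade-off.

    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import roc_curve

    rng = np.random.default_rng(0)
    n_sim, n_obs = 2000, 50

    def t_stats(mu):
        # one-sample t statistics for H0: mean = 0, generated at true mean mu
        x = rng.standard_normal((n_sim, n_obs)) + mu
        return x.mean(axis=1) / (x.std(axis=1, ddof=1) / np.sqrt(n_obs))

    X = np.concatenate([t_stats(0.0), t_stats(0.4)]).reshape(-1, 1)
    y = np.repeat([0, 1], n_sim)                      # 0 = null draws, 1 = alternative draws

    clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                             n_estimators=200, random_state=0).fit(X, y)
    # false positive rate plays the role of size, true positive rate the role of power
    fpr, tpr, _ = roc_curve(y, clf.decision_function(X))
    idx = np.searchsorted(fpr, 0.05)
    print("power of the stump-based test at roughly 5% size:", round(tpr[idx], 3))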

arXiv link: http://arxiv.org/abs/2103.01368v1

Econometrics arXiv updated paper (originally submitted: 2021-03-01)

Dynamic covariate balancing: estimating treatment effects over time with potential local projections

Authors: Davide Viviano, Jelena Bradic

This paper studies the estimation and inference of treatment histories in
panel data settings when treatments change dynamically over time.
We propose a method that allows for (i) treatments to be assigned dynamically
over time based on high-dimensional covariates, past outcomes and treatments;
(ii) outcomes and time-varying covariates to depend on treatment trajectories;
(iii) heterogeneity of treatment effects.
Our approach recursively projects potential outcomes' expectations on past
histories. It then controls the bias by balancing dynamically observable
characteristics. We study the asymptotic and numerical properties of the
estimator and illustrate the benefits of the procedure in an empirical
application.

arXiv link: http://arxiv.org/abs/2103.01280v4

Econometrics arXiv paper, submitted: 2021-03-01

The Kernel Trick for Nonlinear Factor Modeling

Authors: Varlam Kutateladze

Factor modeling is a powerful statistical technique that permits capturing
the common dynamics in a large panel of data with a few latent variables, or
factors, thus alleviating the curse of dimensionality. Despite its popularity
and widespread use for various applications ranging from genomics to finance,
this methodology has predominantly remained linear. This study estimates
factors nonlinearly through the kernel method, which allows flexible
nonlinearities while still avoiding the curse of dimensionality. We focus on
factor-augmented forecasting of a single time series in a high-dimensional
setting, known as diffusion index forecasting in macroeconomics literature. Our
main contribution is twofold. First, we show that the proposed estimator is
consistent and that it nests the linear PCA estimator as well as some nonlinear
estimators introduced in the literature as specific examples. Second, our
empirical application to a classical macroeconomic dataset demonstrates that
this approach can offer substantial advantages over mainstream methods.
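
A hedged Python sketch of the kernel diffusion-index idea on simulated data: factors are extracted from a large panel with kernel PCA and then used in a one-step-ahead forecasting regression. The data-generating process, RBF kernel, and number of components are illustrative choices, not the authors' estimator.

    import numpy as np
    from sklearn.decomposition import KernelPCA
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    T, N, r = 300, 80, 2
    F = rng.standard_normal((T, r))                             # latent factors
    Lam = rng.standard_normal((N, r))
    X = np.tanh(F @ Lam.T) + 0.5 * rng.standard_normal((T, N))  # nonlinear factor panel
    y = F[:, 0] ** 2 + 0.3 * rng.standard_normal(T)             # target series

    Fhat = KernelPCA(n_components=4, kernel="rbf", gamma=1.0 / N).fit_transform(X)
    reg = LinearRegression().fit(Fhat[:-1], y[1:])              # one-step-ahead regression
    print("in-sample R^2 of the kernel diffusion-index forecast:",
          round(reg.score(Fhat[:-1], y[1:]), 3))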

arXiv link: http://arxiv.org/abs/2103.01266v1

Econometrics arXiv paper, submitted: 2021-03-01

Can Machine Learning Catch the COVID-19 Recession?

Authors: Philippe Goulet Coulombe, Massimiliano Marcellino, Dalibor Stevanovic

Based on evidence gathered from a newly built large macroeconomic data set
for the UK, labeled UK-MD and comparable to similar datasets for the US and
Canada, it seems the most promising avenue for forecasting during the pandemic
is to allow for general forms of nonlinearity by using machine learning (ML)
methods. But not all nonlinear ML methods are alike. For instance, some do not
allow extrapolation (like regular trees and forests) and some do (when
complemented with linear dynamic components). This and other crucial aspects of
ML-based forecasting in unprecedented times are studied in an extensive
pseudo-out-of-sample exercise.

arXiv link: http://arxiv.org/abs/2103.01201v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2021-03-01

BERT based patent novelty search by training claims to their own description

Authors: Michael Freunek, André Bodmer

In this paper we present a method to concatenate patent claims to their own
description. By applying this method, BERT learns suitable descriptions for
claims. Such a trained BERT (claim-to-description BERT) may be able to
identify novelty-relevant descriptions for patents. In addition, we introduce a
new scoring scheme, relevance scoring or novelty scoring, to process the output
of BERT in a meaningful way. We tested the method on patent applications by
training BERT on the first claims of patents and corresponding descriptions.
BERT's output has been processed according to the relevance score and the
results compared with the cited X documents in the search reports. The test
showed that BERT has scored some of the cited X documents as highly relevant.

arXiv link: http://arxiv.org/abs/2103.01126v4

Econometrics arXiv updated paper (originally submitted: 2021-03-01)

Structural models for policy-making: Coping with parametric uncertainty

Authors: Philipp Eisenhauer, Janoś Gabler, Lena Janys, Christopher Walsh

The ex-ante evaluation of policies using structural econometric models is
based on estimated parameters as a stand-in for the true parameters. This
practice ignores uncertainty in the counterfactual policy predictions of the
model. We develop a generic approach that deals with parametric uncertainty
using uncertainty sets and frames model-informed policy-making as a decision
problem under uncertainty. The seminal human capital investment model by Keane
and Wolpin (1997) provides a well-known, influential, and empirically-grounded
test case. We document considerable uncertainty in the model's policy
predictions and highlight the resulting policy recommendations obtained from
using different formal rules of decision-making under uncertainty.

arXiv link: http://arxiv.org/abs/2103.01115v4

Econometrics arXiv cross-link from cs.SI (cs.SI), submitted: 2021-03-01

Extracting Complements and Substitutes from Sales Data: A Network Perspective

Authors: Yu Tian, Sebastian Lautz, Alisdiar O. G. Wallis, Renaud Lambiotte

The complementarity and substitutability between products are essential
concepts in retail and marketing. Qualitatively, two products are said to be
substitutable if a customer can replace one product by the other, while they
are complementary if they tend to be bought together. In this article, we take
a network perspective to help automatically identify complements and
substitutes from sales transaction data. Starting from a bipartite
product-purchase network representation, with both transaction nodes and
product nodes, we develop appropriate null models to infer significant
relations, either complements or substitutes, between products, and design
measures based on random walks to quantify their importance. The resulting
unipartite networks between products are then analysed with community detection
methods, in order to find groups of similar products for the different types of
relationships. The results are validated by combining observations from a
real-world basket dataset with the existing product hierarchy, as well as a
large-scale flavour compound and recipe dataset.

arXiv link: http://arxiv.org/abs/2103.02042v2

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2021-03-01

Panel semiparametric quantile regression neural network for electricity consumption forecasting

Authors: Xingcai Zhou, Jiangyan Wang

China has made great achievements in the electric power industry during its
long-term reform and opening up. However, owing to complex regional economic,
social, and natural conditions, electricity resources are not evenly
distributed, which accounts for the electricity deficiency in some regions of
China. It is therefore desirable to develop a robust electricity forecasting model.
Motivated by which, we propose a Panel Semiparametric Quantile Regression
Neural Network (PSQRNN) by utilizing the artificial neural network and
semiparametric quantile regression. The PSQRNN can explore potential linear
and nonlinear relationships among the variables, interpret the unobserved
provincial heterogeneity, and maintain the interpretability of parametric
models simultaneously. The PSQRNN is trained by combining the penalized
quantile regression with LASSO, ridge regression and backpropagation algorithm.
To evaluate the prediction accuracy, an empirical analysis is conducted to
analyze the provincial electricity consumption from 1999 to 2018 in China based
on three scenarios. We find that the PSQRNN model performs better
for electricity consumption forecasting by considering the economic and
climatic factors. Finally, the provincial electricity consumptions of the next
$5$ years (2019-2023) in China are reported by forecasting.

arXiv link: http://arxiv.org/abs/2103.00711v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-02-28

On the Subbagging Estimation for Massive Data

Authors: Tao Zou, Xian Li, Xuan Liang, Hansheng Wang

This article introduces subbagging (subsample aggregating) estimation
approaches for big data analysis with memory constraints of computers.
Specifically, for the whole dataset with size $N$, $m_N$ subsamples are
randomly drawn, and each subsample with a subsample size $k_N\ll N$ to meet the
memory constraint is sampled uniformly without replacement. Aggregating the
estimators of $m_N$ subsamples can lead to subbagging estimation. To analyze
the theoretical properties of the subbagging estimator, we adapt the incomplete
$U$-statistics theory with an infinite order kernel to allow overlapping drawn
subsamples in the sampling procedure. Utilizing this novel theoretical
framework, we demonstrate that via a proper hyperparameter selection of $k_N$
and $m_N$, the subbagging estimator can achieve $\sqrt{N}$-consistency and
asymptotic normality under the condition $(k_Nm_N)/N\to \alpha \in (0,\infty]$.
Compared to the full sample estimator, we theoretically show that the
$\sqrt{N}$-consistent subbagging estimator has an inflation rate of $1/\alpha$
in its asymptotic variance. Simulation experiments are presented to demonstrate
the finite sample performances. An American airline dataset is analyzed to
illustrate that the subbagging estimate is numerically close to the full sample
estimate, and can be computationally fast under the memory constraint.
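
A minimal Python sketch of the subbagging recipe described above, using an OLS slope as the estimator: draw m subsamples of size k much smaller than N without replacement, estimate on each, and average the subsample estimates; the sample sizes are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    N = 200_000
    x = rng.standard_normal(N)
    y = 1.0 + 2.0 * x + rng.standard_normal(N)

    def ols_slope(xi, yi):
        xc, yc = xi - xi.mean(), yi - yi.mean()
        return float(xc @ yc / (xc @ xc))

    k, m = 5_000, 100                       # subsample size and number of subsamples
    estimates = []
    for _ in range(m):
        idx = rng.choice(N, size=k, replace=False)   # uniform draw without replacement
        estimates.append(ols_slope(x[idx], y[idx]))

    print("subbagging estimate:", round(float(np.mean(estimates)), 4))
    print("full-sample estimate:", round(ols_slope(x, y), 4))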

arXiv link: http://arxiv.org/abs/2103.00631v1

Econometrics arXiv updated paper (originally submitted: 2021-02-28)

Algorithmic subsampling under multiway clustering

Authors: Harold D. Chiang, Jiatong Li, Yuya Sasaki

This paper proposes a novel method of algorithmic subsampling (data
sketching) for multiway cluster dependent data. We establish a new uniform weak
law of large numbers and a new central limit theorem for the multiway
algorithmic subsample means. Consequently, we discover an additional advantage
of the algorithmic subsampling that it allows for robustness against potential
degeneracy, and even non-Gaussian degeneracy, of the asymptotic distribution
under multiway clustering. Simulation studies support this novel result, and
demonstrate that inference with the algorithmic subsampling entails more
accuracy than that without the algorithmic subsampling. Applying these basic
asymptotic theories, we derive the consistency and the asymptotic normality for
the multiway algorithmic subsampling generalized method of moments estimator
and for the multiway algorithmic subsampling M-estimator. We illustrate an
application to scanner data.

arXiv link: http://arxiv.org/abs/2103.00557v4

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2021-02-28

Confronting Machine Learning With Financial Research

Authors: Kristof Lommers, Ouns El Harzli, Jack Kim

This study aims to examine the challenges and applications of machine
learning for financial research. Machine learning algorithms have been
developed for certain data environments which substantially differ from the one
we encounter in finance. Not only do difficulties arise due to some of the
idiosyncrasies of financial markets, there is a fundamental tension between the
underlying paradigm of machine learning and the research philosophy in
financial economics. Given the peculiar features of financial markets and the
empirical framework within social science, various adjustments have to be made
to the conventional machine learning methodology. We discuss some of the main
challenges of machine learning in finance and examine how these could be
accounted for. Despite some of the challenges, we argue that machine learning
could be unified with financial research to become a robust complement to the
econometrician's toolbox. Moreover, we discuss the various applications of
machine learning in the research process such as estimation, empirical
discovery, testing, causal inference and prediction.

arXiv link: http://arxiv.org/abs/2103.00366v2

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2021-02-27

Forecasting high-frequency financial time series: an adaptive learning approach with the order book data

Authors: Parley Ruogu Yang

This paper proposes a forecast-centric adaptive learning model that engages
with the past studies on the order book and high-frequency data, with
applications to hypothesis testing. In line with the past literature, we
produce brackets of summaries of statistics from the high-frequency bid and ask
data in the CSI 300 Index Futures market and aim to forecast the one-step-ahead
prices. Traditional time series issues, e.g. ARIMA order selection,
stationarity, together with potential financial applications are covered in the
exploratory data analysis, which paves the way for the adaptive learning model. By
designing and running the learning model, we found it to perform well compared
to the top fixed models, and some could improve the forecasting accuracy by
being more stable and resilient to non-stationarity. Applications to hypothesis
testing are shown with a rolling window, and further potential applications to
finance and statistics are outlined.

arXiv link: http://arxiv.org/abs/2103.00264v1

Econometrics arXiv paper, submitted: 2021-02-26

Simultaneous Bandwidths Determination for DK-HAC Estimators and Long-Run Variance Estimation in Nonparametric Settings

Authors: Federico Belotti, Alessandro Casini, Leopoldo Catania, Stefano Grassi, Pierre Perron

We consider the derivation of data-dependent simultaneous bandwidths for
double kernel heteroskedasticity and autocorrelation consistent (DK-HAC)
estimators. In addition to the usual smoothing over lagged autocovariances for
classical HAC estimators, the DK-HAC estimator also applies smoothing over the
time direction. We obtain the optimal bandwidths that jointly minimize the
global asymptotic MSE criterion and discuss the trade-off between bias and
variance with respect to smoothing over lagged autocovariances and over time.
Unlike the MSE results of Andrews (1991), we establish how nonstationarity
affects the bias-variance trade-off. We use the plug-in approach to construct
data-dependent bandwidths for the DK-HAC estimators and compare them with the
DK-HAC estimators from Casini (2021) that use data-dependent bandwidths
obtained from a sequential MSE criterion. The former performs better in terms
of size control, especially with stationary and close to stationary data.
Finally, we consider long-run variance estimation under the assumption that the
series is a function of a nonparametric estimator rather than of a
semiparametric estimator that enjoys the usual T^(1/2) rate of convergence.
Thus, we also establish the validity of consistent long-run variance estimation
in nonparametric parameter estimation settings.

arXiv link: http://arxiv.org/abs/2103.00060v1

Econometrics arXiv updated paper (originally submitted: 2021-02-26)

Permutation Tests at Nonparametric Rates

Authors: Marinho Bertanha, EunYi Chung

Classical two-sample permutation tests for equality of distributions have
exact size in finite samples, but they fail to control size for testing
equality of parameters that summarize each distribution. This paper proposes
permutation tests for equality of parameters that are estimated at root-$n$ or
slower rates. Our general framework applies to both parametric and
nonparametric models, with two samples or one sample split into two subsamples.
Our tests have correct size asymptotically while preserving exact size in
finite samples when distributions are equal. They have no loss in local
asymptotic power compared to tests that use asymptotic critical values. We
propose confidence sets with correct coverage in large samples that also have
exact coverage in finite samples if distributions are equal up to a
transformation. We apply our theory to four commonly-used hypothesis tests of
nonparametric functions evaluated at a point. Lastly, simulations show good
finite sample properties, and two empirical examples illustrate our tests in
practice.
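
For orientation, a minimal Python implementation of the classical two-sample permutation test with a studentized difference-in-means statistic, the simplest member of the family of problems the paper treats; the paper's contribution for parameters estimated at root-n or slower rates is not reproduced here.

    import numpy as np

    def perm_test(x, y, n_perm=5000, seed=0):
        # Two-sided permutation p-value for H0: equal means, studentized statistic.
        rng = np.random.default_rng(seed)
        def stat(a, b):
            return (a.mean() - b.mean()) / np.sqrt(a.var(ddof=1) / len(a)
                                                   + b.var(ddof=1) / len(b))
        t_obs = stat(x, y)
        pooled = np.concatenate([x, y])
        count = 0
        for _ in range(n_perm):
            p = rng.permutation(pooled)
            count += abs(stat(p[:len(x)], p[len(x):])) >= abs(t_obs)
        return (count + 1) / (n_perm + 1)

    rng = np.random.default_rng(1)
    x = rng.standard_normal(40)
    y = rng.standard_normal(60) + 0.6
    print("permutation p-value:", round(perm_test(x, y), 4))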

arXiv link: http://arxiv.org/abs/2102.13638v3

Econometrics arXiv paper, submitted: 2021-02-26

General Bayesian time-varying parameter VARs for predicting government bond yields

Authors: Manfred M. Fischer, Niko Hauzenberger, Florian Huber, Michael Pfarrhofer

Time-varying parameter (TVP) regressions commonly assume that time-variation
in the coefficients is determined by a simple stochastic process such as a
random walk. While such models are capable of capturing a wide range of dynamic
patterns, the true nature of time variation might stem from other sources, or
arise from different laws of motion. In this paper, we propose a flexible TVP
VAR that assumes the TVPs to depend on a panel of partially latent covariates.
The latent parts of these covariates differ in their state dynamics and thus
capture smoothly evolving or abruptly changing coefficients. To determine which
of these covariates are important, and thus to decide on the appropriate state
evolution, we introduce Bayesian shrinkage priors to perform model selection.
As an empirical application, we forecast the US term structure of interest
rates and show that our approach performs well relative to a set of competing
models. We then show how the model can be used to explain structural breaks in
coefficients related to the US yield curve.

arXiv link: http://arxiv.org/abs/2102.13393v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2021-02-25

Online Multi-Armed Bandits with Adaptive Inference

Authors: Maria Dimakopoulou, Zhimei Ren, Zhengyuan Zhou

During online decision making in Multi-Armed Bandits (MAB), one needs to
conduct inference on the true mean reward of each arm based on data collected
so far at each step. However, since the arms are adaptively selected--thereby
yielding non-iid data--conducting inference accurately is not straightforward.
In particular, sample averaging, which is used in the family of UCB and
Thompson sampling (TS) algorithms, does not provide a good choice as it suffers
from bias and a lack of good statistical properties (e.g. asymptotic
normality). Our thesis in this paper is that more sophisticated inference
schemes that take into account the adaptive nature of the sequentially
collected data can unlock further performance gains, even though both UCB and
TS type algorithms are optimal in the worst case. In particular, we propose a
variant of TS-style algorithms--which we call doubly adaptive TS--that
leverages recent advances in causal inference and adaptively reweights the
terms of a doubly robust estimator on the true mean reward of each arm. Through
20 synthetic domain experiments and a semi-synthetic experiment based on data
from an A/B test of a web service, we demonstrate that using an adaptive
inferential scheme (while still retaining the exploration efficacy of TS)
provides clear benefits in online decision making: the proposed DATS algorithm
has superior empirical performance to existing baselines (UCB and TS) in terms
of regret and sample complexity in identifying the best arm. In addition, we
also provide a finite-time regret bound of doubly adaptive TS that matches (up
to log factors) those of UCB and TS algorithms, thereby establishing that its
improved practical benefits do not come at the expense of worst-case
suboptimality.

arXiv link: http://arxiv.org/abs/2102.13202v2

Econometrics arXiv updated paper (originally submitted: 2021-02-25)

A Control Function Approach to Estimate Panel Data Binary Response Model

Authors: Amaresh K Tiwari

We propose a new control function (CF) method to estimate a binary response
model in a triangular system with multiple unobserved heterogeneities. The CFs
are the expected values of the heterogeneity terms in the reduced form
equations conditional on the histories of the endogenous and the exogenous
variables. The method requires weaker restrictions compared to CF methods with
similar imposed structures. If the support of endogenous regressors is large,
average partial effects are point-identified even when instruments are
discrete. Bounds are provided when the support assumption is violated. An
application and Monte Carlo experiments compare several alternative methods
with ours.

arXiv link: http://arxiv.org/abs/2102.12927v2

Econometrics arXiv cross-link from q-fin.RM (q-fin.RM), submitted: 2021-02-25

Next Generation Models for Portfolio Risk Management: An Approach Using Financial Big Data

Authors: Kwangmin Jung, Donggyu Kim, Seunghyeon Yu

This paper proposes a dynamic process of portfolio risk measurement to
address potential information loss. The proposed model takes advantage of
financial big data to incorporate out-of-target-portfolio information that may
be missed when one considers the Value at Risk (VaR) measures only from certain
assets of the portfolio. We investigate how the curse of dimensionality can be
overcome in the use of financial big data and discuss where and when benefits
occur from a large number of assets. In this regard, the proposed approach is
the first to suggest the use of financial big data to improve the accuracy of
risk analysis. We compare the proposed model with benchmark approaches and
empirically show that the use of financial big data improves small portfolio
risk analysis. Our findings are useful for portfolio managers and financial
regulators, who may seek innovations to improve the accuracy of portfolio
risk estimation.

arXiv link: http://arxiv.org/abs/2102.12783v3

Econometrics arXiv updated paper (originally submitted: 2021-02-25)

Quasi-maximum likelihood estimation of break point in high-dimensional factor models

Authors: Jiangtao Duan, Jushan Bai, Xu Han

This paper estimates the break point for large-dimensional factor models with
a single structural break in factor loadings at a common unknown date. First,
we propose a quasi-maximum likelihood (QML) estimator of the change point based
on the second moments of factors, which are estimated by principal component
analysis. We show that the QML estimator performs consistently when the
covariance matrix of the pre- or post-break factor loading, or both, is
singular. When the loading matrix undergoes a rotational type of change while
the number of factors remains constant over time, the QML estimator incurs a
stochastically bounded estimation error. In this case, we establish an
asymptotic distribution of the QML estimator. The simulation results validate
the feasibility of this estimator when used in finite samples. In addition, we
demonstrate empirical applications of the proposed method by applying it to
estimate the break points in a U.S. macroeconomic dataset and a stock return
dataset.

arXiv link: http://arxiv.org/abs/2102.12666v3

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2021-02-24

Overnight GARCH-Itô Volatility Models

Authors: Donggyu Kim, Minseok Shin, Yazhen Wang

Various parametric volatility models for financial data have been developed
to incorporate high-frequency realized volatilities and better capture market
dynamics. However, because high-frequency trading data are not available during
the close-to-open period, the volatility models often ignore volatility
information over the close-to-open period and thus may suffer from loss of
important information relevant to market dynamics. In this paper, to account
for whole-day market dynamics, we propose an overnight volatility model based
on It\^o diffusions to accommodate two different instantaneous volatility
processes for the open-to-close and close-to-open periods. We develop a
weighted least squares method to estimate model parameters for two different
periods and investigate its asymptotic properties. We conduct a simulation
study to check the finite sample performance of the proposed model and method.
Finally, we apply the proposed approaches to real trading data.

arXiv link: http://arxiv.org/abs/2102.13467v2

Econometrics arXiv paper, submitted: 2021-02-24

Inference in Incomplete Models

Authors: Alfred Galichon, Marc Henry

We provide a test for the specification of a structural model without
identifying assumptions. We show the equivalence of several natural
formulations of correct specification, which we take as our null hypothesis.
From a natural empirical version of the latter, we derive a Kolmogorov-Smirnov
statistic for Choquet capacity functionals, which we use to construct our test.
We derive the limiting distribution of our test statistic under the null, and
show that our test is consistent against certain classes of alternatives. When
the model is given in parametric form, the test can be inverted to yield
confidence regions for the identified parameter set. The approach can be
applied to the estimation of models with sample selection, censored observables
and to games with multiple equilibria.

arXiv link: http://arxiv.org/abs/2102.12257v1

Econometrics arXiv paper, submitted: 2021-02-24

Set Identification in Models with Multiple Equilibria

Authors: Alfred Galichon, Marc Henry

We propose a computationally feasible way of deriving the identified features
of models with multiple equilibria in pure or mixed strategies. It is shown
that in the case of Shapley regular normal form games, the identified set is
characterized by the inclusion of the true data distribution within the core of
a Choquet capacity, which is interpreted as the generalized likelihood of the
model. In turn, this inclusion is characterized by a finite set of inequalities
and efficient and easily implementable combinatorial methods are described to
check them. In all normal form games, the identified set is characterized in
terms of the value of a submodular or convex optimization program. Efficient
algorithms are then given and compared to check inclusion of a parameter in
this identified set. The latter are illustrated with family bargaining games
and oligopoly entry games.

arXiv link: http://arxiv.org/abs/2102.12249v1

Econometrics arXiv cross-link from cs.CV (cs.CV), submitted: 2021-02-24

Deep Video Prediction for Time Series Forecasting

Authors: Zhen Zeng, Tucker Balch, Manuela Veloso

Time series forecasting is essential for decision making in many domains. In
this work, we address the challenge of predicting the price evolution of
multiple potentially interacting financial assets. A solution to this problem
has obvious importance for governments, banks, and investors. Statistical
methods such as Auto Regressive Integrated Moving Average (ARIMA) are widely
applied to these problems. In this paper, we propose to approach economic time
series forecasting of multiple financial assets in a novel way via video
prediction. Given past prices of multiple potentially interacting financial
assets, we aim to predict their price evolution in the future. Instead of
treating the snapshot of prices at each time point as a vector, we spatially
layout these prices in 2D as an image, such that we can harness the power of
CNNs in learning a latent representation for these financial assets. Thus, the
history of these prices becomes a sequence of images, and our goal becomes
predicting future images. We build on a state-of-the-art video prediction
method for forecasting future images. Our experiments involve the prediction
task of the price evolution of nine financial assets traded in U.S. stock
markets. The proposed method outperforms baselines including ARIMA, Prophet,
and variations of the proposed method, demonstrating the benefits of harnessing
the power of CNNs in the problem of economic time series forecasting.

arXiv link: http://arxiv.org/abs/2102.12061v2

Econometrics arXiv updated paper (originally submitted: 2021-02-23)

Hierarchical Regularizers for Mixed-Frequency Vector Autoregressions

Authors: Alain Hecq, Marie Ternes, Ines Wilms

Mixed-frequency Vector AutoRegressions (MF-VAR) model the dynamics between
variables recorded at different frequencies. However, as the number of series
and high-frequency observations per low-frequency period grow, MF-VARs suffer
from the "curse of dimensionality". We curb this curse through a regularizer
that permits hierarchical sparsity patterns by prioritizing the inclusion of
coefficients according to the recency of the information they contain.
Additionally, we investigate the presence of nowcasting relations by sparsely
estimating the MF-VAR error covariance matrix. We study predictive Granger
causality relations in a MF-VAR for the U.S. economy and construct a coincident
indicator of GDP growth. Supplementary Materials for this article are available
online.

arXiv link: http://arxiv.org/abs/2102.11780v2

Econometrics arXiv updated paper (originally submitted: 2021-02-23)

Non-stationary GARCH modelling for fitting higher order moments of financial series within moving time windows

Authors: Luke De Clerk, Sergey Savel'ev

Here, we have analysed a GARCH(1,1) model with the aim to fit higher order
moments for different companies' stock prices. When we assume a Gaussian
conditional distribution, we fail to capture any empirical data when fitting
the first three even moments of financial time series. We show instead that a
double Gaussian conditional probability distribution better captures the higher
order moments of the data. To demonstrate this point, we construct regions
(phase diagrams), in the fourth and sixth order standardised moment space,
where a GARCH(1,1) model can be used to fit these moments and compare them with
the corresponding moments from empirical data for different sectors of the
economy. We found that the ability of the GARCH model with a double gaussian
conditional distribution to fit higher order moments is dictated by the time
window our data spans. We can only fit data collected within specific time
window lengths and only with certain parameters of the conditional double
gaussian distribution. In order to incorporate the non-stationarity of
financial series, we assume that the parameters of the GARCH model have time
dependence.
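
A short Python sketch in the spirit of the exercise above: simulate a GARCH(1,1) with Gaussian versus two-component Gaussian-mixture ("double Gaussian") innovations and compare the fourth and sixth standardized moments; all parameter values are illustrative rather than fitted.

    import numpy as np

    rng = np.random.default_rng(0)

    def simulate_garch(T, omega=0.05, alpha=0.05, beta=0.90, mixture=False):
        # Mixture innovations: two zero-mean normals scaled so the variance is one.
        p1, s1 = 0.8, 0.7
        s2 = np.sqrt((1 - p1 * s1 ** 2) / (1 - p1))
        r = np.empty(T)
        h = omega / (1 - alpha - beta)                 # start at the unconditional variance
        for t in range(T):
            z = rng.standard_normal()
            if mixture:
                z *= s1 if rng.random() < p1 else s2
            r[t] = np.sqrt(h) * z
            h = omega + alpha * r[t] ** 2 + beta * h
        return r

    def std_moment(r, p):
        z = (r - r.mean()) / r.std()
        return float(np.mean(z ** p))

    for mix in (False, True):
        r = simulate_garch(200_000, mixture=mix)
        print("mixture" if mix else "gaussian",
              "m4 =", round(std_moment(r, 4), 2), "m6 =", round(std_moment(r, 6), 2))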

arXiv link: http://arxiv.org/abs/2102.11627v4

Econometrics arXiv updated paper (originally submitted: 2021-02-22)

Bridging factor and sparse models

Authors: Jianqing Fan, Ricardo Masini, Marcelo C. Medeiros

Factor and sparse models are two widely used methods to impose a
low-dimensional structure in high-dimensions. However, they are seemingly
mutually exclusive. We propose a lifting method that combines the merits of
these two models in a supervised learning methodology that allows for
efficiently exploring all the information in high-dimensional datasets. The
method is based on a flexible model for high-dimensional panel data, called
factor-augmented regression model with observable and/or latent common factors,
as well as idiosyncratic components. This model not only includes both
principal component regression and sparse regression as specific models but
also significantly weakens the cross-sectional dependence and facilitates model
selection and interpretability. The method consists of several steps and a
novel test for (partial) covariance structure in high dimensions to infer the
remaining cross-section dependence at each step. We develop the theory for the
model and demonstrate the validity of the multiplier bootstrap for testing a
high-dimensional (partial) covariance structure. The theory is supported by a
simulation study and applications.
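
A hedged Python sketch of the basic factor-augmented sparse regression that the lifting builds on: estimate common factors by PCA, then run a lasso of the outcome on the estimated factors and idiosyncratic components. The simulated design is illustrative, and the paper's covariance tests and multiplier bootstrap are not reproduced.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LassoCV

    rng = np.random.default_rng(0)
    n, p, r = 400, 100, 3
    F = rng.standard_normal((n, r))
    Lam = rng.standard_normal((p, r))
    U = rng.standard_normal((n, p))                     # idiosyncratic components
    X = F @ Lam.T + U
    y = F @ np.array([1.0, -1.0, 0.5]) + U[:, :5] @ np.ones(5) + rng.standard_normal(n)

    pca = PCA(n_components=r).fit(X)
    Fhat = pca.transform(X)                             # estimated factors
    Uhat = X - pca.inverse_transform(Fhat)              # estimated idiosyncratic part

    # Lasso on factors plus idiosyncratic components (here the factors are penalized too,
    # which a more careful implementation would avoid).
    fit = LassoCV(cv=5).fit(np.hstack([Fhat, Uhat]), y)
    print("selected idiosyncratic regressors:", int(np.sum(np.abs(fit.coef_[r:]) > 1e-8)))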

arXiv link: http://arxiv.org/abs/2102.11341v4

Econometrics arXiv paper, submitted: 2021-02-22

Misguided Use of Observed Covariates to Impute Missing Covariates in Conditional Prediction: A Shrinkage Problem

Authors: Charles F Manski, Michael Gmeiner, Anat Tamburc

Researchers regularly perform conditional prediction using imputed values of
missing data. However, applications of imputation often lack a firm foundation
in statistical theory. This paper originated when we were unable to find
analysis substantiating claims that imputation of missing data has good
frequentist properties when data are missing at random (MAR). We focused on the
use of observed covariates to impute missing covariates when estimating
conditional means of the form E(y|x, w). Here y is an outcome whose
realizations are always observed, x is a covariate whose realizations are
always observed, and w is a covariate whose realizations are sometimes
unobserved. We examine the probability limit of simple imputation estimates of
E(y|x, w) as sample size goes to infinity. We find that these estimates are not
consistent when covariate data are MAR. To the contrary, the estimates suffer
from a shrinkage problem. They converge to points intermediate between the
conditional mean of interest, E(y|x, w), and the mean E(y|x) that conditions
only on x. We use a type of genotype imputation to illustrate.
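
A stylized Python simulation of the shrinkage phenomenon, assuming binary x and w and best-predictor (conditional-mean) imputation: the imputation-based cell estimate of E(y|x=1, w=1) lands between E(y|x=1, w=1) and E(y|x=1). The design is illustrative and is not the genotype application.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000
    x = rng.binomial(1, 0.5, n)
    w = rng.binomial(1, 0.2 + 0.6 * x)                  # w depends on x
    y = 1.0 + 2.0 * w + rng.standard_normal(n)          # E[y|x,w] = 1 + 2w
    miss = rng.random(n) < 0.5                          # w missing at random given x

    w_imp = w.astype(float).copy()
    p_w_given_x = np.array([np.mean(w[(x == v) & ~miss]) for v in (0, 1)])
    w_imp[miss] = (p_w_given_x[x[miss]] > 0.5).astype(float)   # impute by best predictor

    cell = (x == 1) & (w_imp == 1)
    print("true E[y|x=1,w=1] = 3.0")
    print("E[y|x=1] =", round(float(y[x == 1].mean()), 3))      # about 2.6
    print("imputation-based cell mean =", round(float(y[cell].mean()), 3))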

arXiv link: http://arxiv.org/abs/2102.11334v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-02-22

Estimating Sibling Spillover Effects with Unobserved Confounding Using Gain-Scores

Authors: David C. Mallinson, Felix Elwert

A growing area of research in epidemiology is the identification of
health-related sibling spillover effects, or the effect of one individual's
exposure on their sibling's outcome. The health and health care of family
members may be inextricably confounded by unobserved factors, rendering
identification of spillover effects within families particularly challenging.
We demonstrate a gain-score regression method for identifying
exposure-to-outcome spillover effects within sibling pairs in a linear fixed
effects framework. The method can identify the exposure-to-outcome spillover
effect if only one sibling's exposure affects the other's outcome; and it
identifies the difference between the spillover effects if both siblings'
exposures affect the others' outcomes. The method fails in the presence of
outcome-to-exposure spillover and outcome-to-outcome spillover. Analytic
results and Monte Carlo simulations demonstrate the method and its limitations.
To exercise this method, we estimate the spillover effect of a child's preterm
birth on an older sibling's literacy skills, measured by the Phonological
Awareness Literacy Screening-Kindergarten test. We analyze 20,010 sibling
pairs from a population-wide, Wisconsin-based (United States) birth cohort.
Without covariate adjustment, we estimate that preterm birth modestly decreases
an older sibling's test score (-2.11 points; 95% confidence interval: -3.82,
-0.40 points). In conclusion, gain-scores are a promising strategy for
identifying exposure-to-outcome spillovers in sibling pairs while controlling
for sibling-invariant unobserved confounding in linear settings.

arXiv link: http://arxiv.org/abs/2102.11150v3

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2021-02-22

Kernel Ridge Riesz Representers: Generalization, Mis-specification, and the Counterfactual Effective Dimension

Authors: Rahul Singh

Kernel balancing weights provide confidence intervals for average treatment
effects, based on the idea of balancing covariates for the treated group and
untreated group in feature space, often with ridge regularization. Previous
works on the classical kernel ridge balancing weights have certain limitations:
(i) not articulating generalization error for the balancing weights, (ii)
typically requiring correct specification of features, and (iii) justifying
Gaussian approximation for only average effects.
I interpret kernel balancing weights as kernel ridge Riesz representers
(KRRR) and address these limitations via a new characterization of the
counterfactual effective dimension. KRRR is an exact generalization of kernel
ridge regression and kernel ridge balancing weights. I prove strong properties
similar to kernel ridge regression: population $L_2$ rates controlling
generalization error, and a standalone closed form solution that can
interpolate. The framework relaxes the stringent assumption that the underlying
regression model is correctly specified by the features. It extends Gaussian
approximation beyond average effects to heterogeneous effects, justifying
confidence sets for causal functions. I use KRRR to quantify uncertainty for
heterogeneous treatment effects, by age, of 401(k) eligibility on assets.

arXiv link: http://arxiv.org/abs/2102.11076v4

Econometrics arXiv paper, submitted: 2021-02-21

Cointegrated Solutions of Unit-Root VARs: An Extended Representation Theorem

Authors: Mario Faliva, Maria Grazia Zoia

This paper establishes an extended representation theorem for unit-root VARs.
A specific algebraic technique is devised to recover stationarity from the
solution of the model in the form of a cointegrating transformation. Closed
forms of the results of interest are derived for integrated processes up to the
fourth order. An extension to higher-order processes turns out to be within
reach via an induction argument.

arXiv link: http://arxiv.org/abs/2102.10626v1

Econometrics arXiv paper, submitted: 2021-02-21

A Novel Multi-Period and Multilateral Price Index

Authors: Consuelo Rubina Nava, Maria Grazia Zoia

A novel approach to price indices, leading to an innovative solution in both
a multi-period or a multilateral framework, is presented. The index turns out
to be the generalized least squares solution of a regression model linking
values and quantities of the commodities. The index reference basket, which is
the union of the intersections of the baskets of all country/period taken in
pair, has a coverage broader than extant indices. The properties of the index
are investigated and updating formulas established. Applications to both real
and simulated data provide evidence of the better index performance in
comparison with extant alternatives.

arXiv link: http://arxiv.org/abs/2102.10528v1

Econometrics arXiv paper, submitted: 2021-02-20

Estimation and Inference by Stochastic Optimization: Three Examples

Authors: Jean-Jacques Forneron, Serena Ng

This paper illustrates two algorithms designed in Forneron & Ng (2020): the
resampled Newton-Raphson (rNR) and resampled quasi-Newton (rqN) algorithms
which speed-up estimation and bootstrap inference for structural models. An
empirical application to BLP shows that computation time decreases from nearly
5 hours with the standard bootstrap to just over 1 hour with rNR, and only 15
minutes using rqN. A first Monte-Carlo exercise illustrates the accuracy of the
method for estimation and inference in a probit IV regression. A second
exercise additionally illustrates statistical efficiency gains relative to
standard estimation for simulation-based estimation using a dynamic panel
regression example.

arXiv link: http://arxiv.org/abs/2102.10443v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2021-02-20

Logarithmic Regret in Feature-based Dynamic Pricing

Authors: Jianyu Xu, Yu-Xiang Wang

Feature-based dynamic pricing is an increasingly popular model of setting
prices for highly differentiated products with applications in digital
marketing, online sales, real estate and so on. The problem was formally
studied as an online learning problem [Javanmard & Nazerzadeh, 2019] where a
seller needs to propose prices on the fly for a sequence of $T$ products based
on their features $x$ while having a small regret relative to the best --
"omniscient" -- pricing strategy she could have come up with in hindsight. We
revisit this problem and provide two algorithms (EMLP and ONSP) for stochastic
and adversarial feature settings, respectively, and prove the optimal
$O(d\log T)$ regret bounds for both. In comparison, the best existing results
are $O\left(\min\left\{\frac{1}{\lambda_{\min}^2}\log T,
\sqrt{T}\right\}\right)$ and $O(T^{2/3})$ respectively, with $\lambda_{\min}$
being the smallest eigenvalue of $E[xx^T]$ that could be arbitrarily
close to $0$. We also prove an $\Omega(\sqrt{T})$ information-theoretic lower
bound for a slightly more general setting, which demonstrates that
"knowing-the-demand-curve" leads to an exponential improvement in feature-based
dynamic pricing.

arXiv link: http://arxiv.org/abs/2102.10221v2

Econometrics arXiv paper, submitted: 2021-02-19

Monitoring the pandemic: A fractional filter for the COVID-19 contact rate

Authors: Tobias Hartl

This paper aims to provide reliable estimates for the COVID-19 contact rate
of a Susceptible-Infected-Recovered (SIR) model. From observable data on
confirmed, recovered, and deceased cases, a noisy measurement for the contact
rate can be constructed. To filter out measurement errors and seasonality, a
novel unobserved components (UC) model is set up. It specifies the log contact
rate as a latent, fractionally integrated process of unknown integration order.
The fractional specification reflects key characteristics of aggregate social
behavior such as strong persistence and gradual adjustments to new information.
A computationally simple modification of the Kalman filter is introduced and is
termed the fractional filter. It allows estimation of UC models with richer
long-run dynamics, and provides a closed-form expression for the prediction
error of UC models. Based on the latter, a conditional-sum-of-squares (CSS)
estimator for the model parameters is set up that is shown to be consistent and
asymptotically normally distributed. The resulting contact rate estimates for
several countries are well in line with the chronology of the pandemic, and
allow to identify different contact regimes generated by policy interventions.
As the fractional filter is shown to provide precise contact rate estimates at
the end of the sample, it bears great potential for monitoring the pandemic in
real time.

arXiv link: http://arxiv.org/abs/2102.10067v1

Econometrics arXiv updated paper (originally submitted: 2021-02-19)

Approximate Bayes factors for unit root testing

Authors: Magris Martin, Iosifidis Alexandros

This paper introduces a feasible and practical Bayesian method for unit root
testing in financial time series. We propose a convenient approximation of the
Bayes factor in terms of the Bayesian Information Criterion as a
straightforward and effective strategy for testing the unit root hypothesis.
Our approximate approach relies on few assumptions, is of general
applicability, and preserves a satisfactory error rate. Among its advantages,
it does not require the prior distribution on the model's parameters to be
specified. Our simulation study and empirical application on real exchange
rates show great accordance between the suggested simple approach and both
Bayesian and non-Bayesian alternatives.
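
A minimal Python sketch of the BIC-based approximation, assuming a Gaussian AR(1) with drift as the alternative and using the standard relation log BF(H0 vs H1) ~ (BIC of the alternative minus BIC of the null)/2; the exact specification in the paper may differ.

    import numpy as np

    rng = np.random.default_rng(0)
    T = 300
    y = np.cumsum(rng.standard_normal(T))             # a random walk, so the unit root holds

    dy, ylag = np.diff(y), y[:-1]

    def gaussian_bic(resid, k):
        # BIC from the Gaussian conditional likelihood with k estimated parameters
        n = len(resid)
        sigma2 = np.mean(resid ** 2)
        loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
        return -2 * loglik + k * np.log(n)

    bic_null = gaussian_bic(dy, k=1)                  # H0: random walk, only sigma^2 estimated
    Xmat = np.column_stack([np.ones_like(ylag), ylag])
    beta = np.linalg.lstsq(Xmat, dy, rcond=None)[0]   # H1: delta y = c + rho * y_{t-1} + eps
    bic_alt = gaussian_bic(dy - Xmat @ beta, k=3)

    log_bf_null = (bic_alt - bic_null) / 2            # approximate log Bayes factor for H0
    print("approximate log Bayes factor in favor of the unit root:", round(log_bf_null, 2))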

arXiv link: http://arxiv.org/abs/2102.10048v2

Econometrics arXiv paper, submitted: 2021-02-18

Spatial Correlation Robust Inference

Authors: Ulrich K. Müller, Mark W. Watson

We propose a method for constructing confidence intervals that account for
many forms of spatial correlation. The interval has the familiar `estimator
plus and minus a standard error times a critical value' form, but we propose
new methods for constructing the standard error and the critical value. The
standard error is constructed using population principal components from a
given `worst-case' spatial covariance model. The critical value is chosen to
ensure coverage in a benchmark parametric model for the spatial correlations.
The method is shown to control coverage in large samples whenever the spatial
correlation is weak, i.e., with average pairwise correlations that vanish as
the sample size gets large. We also provide results on correct coverage in a
restricted but nonparametric class of strong spatial correlations, as well as
on the efficiency of the method. In a design calibrated to match economic
activity in U.S. states the method outperforms previous suggestions for
spatially robust inference about the population mean.

arXiv link: http://arxiv.org/abs/2102.09353v1

Econometrics arXiv paper, submitted: 2021-02-18

Deep Structural Estimation: With an Application to Option Pricing

Authors: Hui Chen, Antoine Didisheim, Simon Scheidegger

We propose a novel structural estimation framework in which we train a
surrogate of an economic model with deep neural networks. Our methodology
alleviates the curse of dimensionality and speeds up the evaluation and
parameter estimation by orders of magnitudes, which significantly enhances
one's ability to conduct analyses that require frequent parameter
re-estimation. As an empirical application, we compare two popular option
pricing models (the Heston and the Bates model with double-exponential jumps)
against a non-parametric random forest model. We document that: a) the Bates
model produces better out-of-sample pricing on average, but both structural
models fail to outperform random forest for large areas of the volatility
surface; b) random forest is more competitive at short horizons (e.g., 1-day),
for short-dated options (with less than 7 days to maturity), and on days with
poor liquidity; c) both structural models outperform random forest in
out-of-sample delta hedging; d) the Heston model's relative performance has
deteriorated significantly after the 2008 financial crisis.
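
A hedged Python sketch of the surrogate idea, using the Black-Scholes call price as a simple stand-in for the Heston/Bates models: sample parameters, evaluate the pricing model once per draw, and train a small neural network that maps parameters to prices so that later evaluations are nearly free. Parameter ranges and network size are illustrative.

    import numpy as np
    from scipy.stats import norm
    from sklearn.neural_network import MLPRegressor

    def bs_call(S, K, T, r, sigma):
        # Black-Scholes European call price (stand-in for a structural pricing model)
        d1 = (np.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * np.sqrt(T))
        d2 = d1 - sigma * np.sqrt(T)
        return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

    rng = np.random.default_rng(0)
    n = 20_000
    theta = np.column_stack([
        rng.uniform(0.8, 1.2, n),    # moneyness S/K (strike normalised to 1)
        rng.uniform(0.05, 2.0, n),   # maturity
        rng.uniform(0.0, 0.05, n),   # risk-free rate
        rng.uniform(0.1, 0.6, n),    # volatility
    ])
    prices = bs_call(theta[:, 0], 1.0, theta[:, 1], theta[:, 2], theta[:, 3])

    surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=1000,
                             random_state=0).fit(theta, prices)
    test = np.array([[1.0, 0.5, 0.01, 0.2]])
    print("surrogate:", round(float(surrogate.predict(test)[0]), 4),
          "exact:", round(float(bs_call(1.0, 1.0, 0.5, 0.01, 0.2)), 4))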

arXiv link: http://arxiv.org/abs/2102.09209v1

Econometrics arXiv updated paper (originally submitted: 2021-02-17)

On the implementation of Approximate Randomization Tests in Linear Models with a Small Number of Clusters

Authors: Yong Cai, Ivan A. Canay, Deborah Kim, Azeem M. Shaikh

This paper provides a user's guide to the general theory of approximate
randomization tests developed in Canay, Romano, and Shaikh (2017) when
specialized to linear regressions with clustered data. An important feature of
the methodology is that it applies to settings in which the number of clusters
is small -- even as small as five. We provide a step-by-step algorithmic
description of how to implement the test and construct confidence intervals for
the parameter of interest. In doing so, we additionally present three novel
results concerning the methodology: we show that the method admits an
equivalent implementation based on weighted scores; we show the test and
confidence intervals are invariant to whether the test statistic is studentized
or not; and we prove convexity of the confidence intervals for scalar
parameters. We also articulate the main requirements underlying the test,
emphasizing in particular common pitfalls that researchers may encounter.
Finally, we illustrate the use of the methodology with two applications that
further illuminate these points. The companion {\tt R} and {\tt Stata} packages
facilitate the implementation of the methodology and the replication of the
empirical exercises.
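
A compact Python sketch of the core construction with few clusters, assuming a simple slope parameter: estimate the coefficient cluster by cluster and apply a sign-change randomization test to the cluster-level estimates. Studentization choices, confidence-interval inversion, and other details follow the paper and its companion packages rather than this sketch.

    import numpy as np
    from itertools import product

    rng = np.random.default_rng(0)
    q, n_per = 6, 200
    beta0 = 0.0                            # null value; data are generated with slope 0.3

    cluster_betas = []
    for g in range(q):
        x = rng.standard_normal(n_per)
        y = 0.3 * x + rng.standard_normal(n_per) + rng.standard_normal()  # cluster effect
        xc, yc = x - x.mean(), y - y.mean()
        cluster_betas.append(float(xc @ yc / (xc @ xc)))
    b = np.array(cluster_betas) - beta0

    t_obs = abs(b.mean()) / (b.std(ddof=1) / np.sqrt(q))
    count = 0
    for signs in product([-1, 1], repeat=q):           # all 2^q sign assignments
        bs = np.array(signs) * b
        count += abs(bs.mean()) / (bs.std(ddof=1) / np.sqrt(q)) >= t_obs
    print("randomization p-value:", round(count / 2 ** q, 3))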

arXiv link: http://arxiv.org/abs/2102.09058v4

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-02-17

Big Data meets Causal Survey Research: Understanding Nonresponse in the Recruitment of a Mixed-mode Online Panel

Authors: Barbara Felderer, Jannis Kueck, Martin Spindler

Survey scientists increasingly face the problem of high-dimensionality in
their research as digitization makes it much easier to construct
high-dimensional (or "big") data sets through tools such as online surveys and
mobile applications. Machine learning methods are able to handle such data, and
they have been successfully applied to solve predictive problems.
However, in many situations, survey statisticians want to learn about
causal relationships to draw conclusions and be able to transfer the
findings of one survey to another. Standard machine learning methods provide
biased estimates of such relationships. We introduce into survey statistics the
double machine learning approach, which gives approximately unbiased estimators
of causal parameters, and show how it can be used to analyze survey nonresponse
in a high-dimensional panel setting.
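
A brief Python sketch of the double machine learning partialling-out estimator with cross-fitting, using random forests as the nuisance learners on simulated data; the variables stand in for survey covariates and a nonresponse-related regressor and are not from the paper's application.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import KFold

    rng = np.random.default_rng(0)
    n, p = 2000, 50
    X = rng.standard_normal((n, p))                               # high-dimensional controls
    d = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.standard_normal(n)     # variable of interest
    y = 0.5 * d + X[:, 0] - X[:, 2] + rng.standard_normal(n)      # true coefficient 0.5

    res_y, res_d = np.empty(n), np.empty(n)
    for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        my = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[train], y[train])
        md = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[train], d[train])
        res_y[test] = y[test] - my.predict(X[test])               # out-of-fold residuals
        res_d[test] = d[test] - md.predict(X[test])

    theta = float(res_d @ res_y / (res_d @ res_d))                # residual-on-residual slope
    psi = (res_y - theta * res_d) * res_d
    se = np.sqrt(np.mean(psi ** 2)) / (np.mean(res_d ** 2) * np.sqrt(n))
    print("DML estimate:", round(theta, 3), "s.e.:", round(se, 3))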

arXiv link: http://arxiv.org/abs/2102.08994v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-02-17

Adaptive Doubly Robust Estimator from Non-stationary Logging Policy under a Convergence of Average Probability

Authors: Masahiro Kato

Adaptive experiments, including efficient average treatment effect estimation
and multi-armed bandit algorithms, have garnered attention in various
applications, such as social experiments, clinical trials, and online
advertisement optimization. This paper considers estimating the mean outcome of
an action from samples obtained in adaptive experiments. In causal inference,
the mean outcome of an action has a crucial role, and the estimation is an
essential task, where the average treatment effect estimation and off-policy
value estimation are its variants. In adaptive experiments, the probability of
choosing an action (logging policy) is allowed to be sequentially updated based
on past observations. Due to this logging policy depending on the past
observations, the samples are often not independent and identically distributed
(i.i.d.), making developing an asymptotically normal estimator difficult. A
typical approach for this problem is to assume that the logging policy
converges to a time-invariant function. However, this assumption is restrictive
in various applications, such as when the logging policy fluctuates or becomes
zero at some periods. To mitigate this limitation, we propose another
assumption that the average logging policy converges to a time-invariant
function and show the doubly robust (DR) estimator's asymptotic normality.
Under the assumption, the logging policy itself can fluctuate or be zero for
some actions. We also show the empirical properties by simulations.

arXiv link: http://arxiv.org/abs/2102.08975v2

Econometrics arXiv updated paper (originally submitted: 2021-02-17)

Testing for Nonlinear Cointegration under Heteroskedasticity

Authors: Christoph Hanck, Till Massing

This article discusses Shin (1994, Econometric Theory)-type tests for
nonlinear cointegration in the presence of variance breaks. We build on
cointegration test approaches under heteroskedasticity (Cavaliere and Taylor,
2006, Journal of Time Series Analysis) and nonlinearity, serial correlation,
and endogeneity (Choi and Saikkonen, 2010, Econometric Theory) to propose a
bootstrap test and prove its consistency. A Monte Carlo study shows the
approach to have satisfactory finite-sample properties in a variety of
scenarios. We provide an empirical application to the environmental Kuznets
curves (EKC), finding that the cointegration test provides little evidence for
the EKC hypothesis. Additionally, we examine a nonlinear relation between the
US money demand and the interest rate, finding that our test does not reject
the null of a smooth transition cointegrating relation.

arXiv link: http://arxiv.org/abs/2102.08809v5

Econometrics arXiv updated paper (originally submitted: 2021-02-16)

On-Demand Transit User Preference Analysis using Hybrid Choice Models

Authors: Nael Alsaleh, Bilal Farooq, Yixue Zhang, Steven Farber

In light of the increasing interest to transform the fixed-route public
transit (FRT) services into on-demand transit (ODT) services, there exists a
strong need for a comprehensive evaluation of the effects of this shift on the
users. Such an analysis can help the municipalities and service providers to
design and operate more convenient, attractive, and sustainable transit
solutions. To understand the user preferences, we developed three hybrid choice
models: integrated choice and latent variable (ICLV), latent class (LC), and
latent class integrated choice and latent variable (LC-ICLV) models. We used
these models to analyze the public transit user's preferences in Belleville,
Ontario, Canada. Hybrid choice models were estimated using a rich dataset that
combined the actual level of service attributes obtained from Belleville's ODT
service and self-reported usage behaviour obtained from a revealed preference
survey of the ODT users. The latent class models divided the users into two
groups with different travel behaviour and preferences. The results showed that
the captive user's preference for ODT service was significantly affected by the
number of unassigned trips, in-vehicle time, and main travel mode before the
ODT service started. On the other hand, the non-captive user's service
preference was significantly affected by the Time Sensitivity and the Online
Service Satisfaction latent variables, as well as the performance of the ODT
service and trip purpose. This study highlights the importance of improving the
reliability and performance of the ODT service and outlines directions for
reducing operational costs by updating the required fleet size and assigning
more vehicles for work-related trips.

arXiv link: http://arxiv.org/abs/2102.08256v2

Econometrics arXiv cross-link from General Economics (econ.GN), submitted: 2021-02-16

LATE for History

Authors: Alberto Bisin, Andrea Moro

In Historical Economics, Persistence studies document the persistence of some
historical phenomenon or leverage this persistence to identify causal
relationships of interest in the present. In this chapter, we analyze the
implications of allowing for heterogeneous treatment effects in these studies.
We delineate their common empirical structure, argue that heterogeneous
treatment effects are likely in their context, and propose minimal abstract
models that help interpret results and guide the development of empirical
strategies to uncover the mechanisms generating the effects.

arXiv link: http://arxiv.org/abs/2102.08174v1

Econometrics arXiv updated paper (originally submitted: 2021-02-16)

A Unified Framework for Specification Tests of Continuous Treatment Effect Models

Authors: Wei Huang, Oliver Linton, Zheng Zhang

We propose a general framework for the specification testing of continuous
treatment effect models. We assume a general residual function, which includes
the average and quantile treatment effect models as special cases. The null
models are identified under the unconfoundedness condition and contain a
nonparametric weighting function. We propose a test statistic for the null
model in which the weighting function is estimated by solving an expanding set
of moment equations. We establish the asymptotic distributions of our test
statistic under the null hypothesis and under fixed and local alternatives. The
proposed test statistic is shown to be more efficient than that constructed
from the true weighting function and can detect local alternatives deviating
from the null models at the rate of $O(N^{-1/2})$. A simulation method is
provided to approximate the null distribution of the test statistic.
Monte-Carlo simulations show that our test exhibits a satisfactory
finite-sample performance, and an application shows its practical value.

arXiv link: http://arxiv.org/abs/2102.08063v2

Econometrics arXiv paper, submitted: 2021-02-16

Constructing valid instrumental variables in generalized linear causal models from directed acyclic graphs

Authors: Øyvind Hoveid

Unlike other techniques of causality inference, the use of valid instrumental
variables can deal with unobserved sources of variable errors, variable
omissions, and sampling bias, and still arrive at consistent estimates of
average treatment effects. The only problem is to find the valid instruments.
Using the definition of Pearl (2009) of valid instrumental variables, a formal
condition for validity can be stated for variables in generalized linear causal
models. The condition can be applied in two different ways: As a tool for
constructing valid instruments, or as a foundation for testing whether an
instrument is valid. When perfectly valid instruments are not found, the
squared bias of the IV-estimator induced by an imperfectly valid instrument --
estimated with bootstrapping -- can be added to its empirical variance in a
mean-square-error-like reliability measure.

arXiv link: http://arxiv.org/abs/2102.08056v1

Econometrics arXiv paper, submitted: 2021-02-15

Entropy methods for identifying hedonic models

Authors: Arnaud Dupuy, Alfred Galichon, Marc Henry

This paper contributes to the literature on hedonic models in two ways.
First, it makes use of Queyranne's reformulation of a hedonic model in the
discrete case as a network flow problem in order to provide a proof of
existence and integrality of a hedonic equilibrium and efficient computation of
hedonic prices. Second, elaborating on entropic methods developed in Galichon
and Salani\'{e} (2014), this paper proposes a new identification strategy for
hedonic models in a single market. This methodology allows one to introduce
heterogeneities in both consumers' and producers' attributes and to recover
producers' profits and consumers' utilities based on the observation of
production and consumption patterns and the set of hedonic prices.

arXiv link: http://arxiv.org/abs/2102.07491v1

Econometrics arXiv updated paper (originally submitted: 2021-02-13)

A Distance Covariance-based Estimator

Authors: Emmanuel Selorm Tsyawo, Abdul-Nasah Soale

This paper introduces an estimator that significantly weakens the relevance
condition of conventional instrumental variable (IV) methods, allowing
endogenous covariates to be weakly correlated, uncorrelated, or even
mean-independent, though not independent of instruments. As a result, the
estimator can exploit the maximum number of relevant instruments in any given
empirical setting. Identification is feasible without excludability, and the
disturbance term does not need to possess finite moments. Identification is
achieved under a weak conditional median independence condition on pairwise
differences in disturbances, along with mild regularity conditions.
Furthermore, the estimator is shown to be consistent and asymptotically normal.
The relevance condition required for identification is shown to be testable.
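
To illustrate the kind of dependence the title refers to, the following computes a sample distance covariance between an instrument and an endogenous covariate that are uncorrelated yet strongly dependent; it is a generic textbook quantity, not the proposed estimator.

# Minimal sketch: sample distance covariance between two one-dimensional
# samples, the kind of dependence measure that lets instruments be relevant
# without being correlated with the endogenous covariate.
import numpy as np

def distance_covariance(x, z):
    x = np.asarray(x, float).reshape(-1, 1)
    z = np.asarray(z, float).reshape(-1, 1)
    a = np.abs(x - x.T)                                 # pairwise distances
    b = np.abs(z - z.T)
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()   # double centering
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    return np.sqrt(np.mean(A * B))

# Example: x = z**2 is (nearly) uncorrelated with z yet strongly dependent on it.
rng = np.random.default_rng(0)
z = rng.normal(size=500)
x = z ** 2 + 0.1 * rng.normal(size=500)
print(np.corrcoef(x, z)[0, 1], distance_covariance(x, z))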

arXiv link: http://arxiv.org/abs/2102.07008v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-02-12

Statistical Power for Estimating Treatment Effects Using Difference-in-Differences and Comparative Interrupted Time Series Designs with Variation in Treatment Timing

Authors: Peter Z. Schochet

This article develops new closed-form variance expressions for power analyses
for commonly used difference-in-differences (DID) and comparative interrupted
time series (CITS) panel data estimators. The main contribution is to
incorporate variation in treatment timing into the analysis. The power formulas
also account for other key design features that arise in practice:
autocorrelated errors, unequal measurement intervals, and clustering due to the
unit of treatment assignment. We consider power formulas for both
cross-sectional and longitudinal models and allow for covariates. An
illustrative power analysis provides guidance on appropriate sample sizes. The
key finding is that accounting for treatment timing increases required sample
sizes. Further, DID estimators have considerably more power than standard CITS
and ITS estimators. An available Shiny R dashboard performs the sample size
calculations for the considered estimators.
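
For orientation, the sketch below shows the generic step that any such closed-form variance feeds into, converting an effective variance and a minimum detectable effect into a required number of treated clusters; the variance input is a placeholder, not the paper's DID/CITS formulas (those are implemented in the authors' Shiny R dashboard).

# Minimal sketch of the generic power-analysis step: given a design-specific
# variance expression Var(tau_hat) = sigma2_effective / J for J treated
# clusters, find the J needed to detect a minimum effect `mde`. The variance
# placeholder stands in for the paper's closed-form DID/CITS variances.
from scipy.stats import norm

def required_clusters(mde, sigma2_effective, alpha=0.05, power=0.80):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return (z ** 2) * sigma2_effective / (mde ** 2)

print(required_clusters(mde=0.20, sigma2_effective=1.5))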

arXiv link: http://arxiv.org/abs/2102.06770v2

Econometrics arXiv paper, submitted: 2021-02-12

Linear programming approach to nonparametric inference under shape restrictions: with an application to regression kink designs

Authors: Harold D. Chiang, Kengo Kato, Yuya Sasaki, Takuya Ura

We develop a novel method of constructing confidence bands for nonparametric
regression functions under shape constraints. This method can be implemented
via linear programming and is thus computationally appealing. We
illustrate the use of our proposed method with an application to the regression
kink design (RKD). Econometric analyses based on the RKD often suffer from wide
confidence intervals due to slow convergence rates of nonparametric derivative
estimators. We demonstrate that economic models and structures motivate shape
restrictions, which in turn contribute to shrinking the confidence interval for
an analysis of the causal effects of unemployment insurance benefits on
unemployment durations.
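
As a stylized illustration of shape-restricted linear programming (not the paper's confidence-band construction), the toy program below bounds a regression value on a grid subject to monotonicity and a tolerance band around binned means.

# Stylized illustration: choose grid values theta_1 <= ... <= theta_K that
# stay within +/- tol of binned outcome means, and minimize/maximize the
# value at one grid point. Toy data; not the paper's construction.
import numpy as np
from scipy.optimize import linprog

ybar = np.array([0.8, 1.0, 1.1, 1.5, 1.6])   # binned means (assumed data)
tol = 0.15                                    # tolerance band around each mean
K, k0 = len(ybar), 2                          # bound the function at bin k0

# Monotonicity: theta_j - theta_{j+1} <= 0.
A_mono = np.zeros((K - 1, K))
for j in range(K - 1):
    A_mono[j, j], A_mono[j, j + 1] = 1.0, -1.0
b_mono = np.zeros(K - 1)
bounds = [(ybar[j] - tol, ybar[j] + tol) for j in range(K)]

lo = linprog(c=np.eye(K)[k0], A_ub=A_mono, b_ub=b_mono, bounds=bounds)
hi = linprog(c=-np.eye(K)[k0], A_ub=A_mono, b_ub=b_mono, bounds=bounds)
print(lo.fun, -hi.fun)   # lower and upper bounds for theta_{k0}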

arXiv link: http://arxiv.org/abs/2102.06586v1

Econometrics arXiv paper, submitted: 2021-02-12

Identification and Inference Under Narrative Restrictions

Authors: Raffaella Giacomini, Toru Kitagawa, Matthew Read

We consider structural vector autoregressions subject to 'narrative
restrictions', which are inequality restrictions on functions of the structural
shocks in specific periods. These restrictions raise novel problems related to
identification and inference, and there is currently no frequentist procedure
for conducting inference in these models. We propose a solution that is valid
from both Bayesian and frequentist perspectives by: 1) formalizing the
identification problem under narrative restrictions; 2) correcting a feature of
the existing (single-prior) Bayesian approach that can distort inference; 3)
proposing a robust (multiple-prior) Bayesian approach that is useful for
assessing and eliminating the posterior sensitivity that arises in these models
due to the likelihood having flat regions; and 4) showing that the robust
Bayesian approach has asymptotic frequentist validity. We illustrate our
methods by estimating the effects of US monetary policy under a variety of
narrative restrictions.

arXiv link: http://arxiv.org/abs/2102.06456v1

Econometrics arXiv paper, submitted: 2021-02-11

Inference on two component mixtures under tail restrictions

Authors: Marc Henry, Koen Jochmans, Bernard Salanié

Many econometric models can be analyzed as finite mixtures. We focus on
two-component mixtures and we show that they are nonparametrically point
identified by a combination of an exclusion restriction and tail restrictions.
Our identification analysis suggests simple closed-form estimators of the
component distributions and mixing proportions, as well as a specification
test. We derive their asymptotic properties using results on tail empirical
processes and we present a simulation study that documents their finite-sample
performance.

arXiv link: http://arxiv.org/abs/2102.06232v1

Econometrics arXiv paper, submitted: 2021-02-10

Interactive Network Visualization of Opioid Crisis Related Data - Policy, Pharmaceutical, Training, and More

Authors: Olga Scrivner, Elizabeth McAvoy, Thuy Nguyen, Tenzin Choeden, Kosali Simon, Katy Börner

Responding to the U.S. opioid crisis requires a holistic approach supported
by evidence from linking and analyzing multiple data sources. This paper
discusses how 20 available resources can be combined to answer pressing public
health questions related to the crisis. It presents a network view based on
U.S. geographical units and other standard concepts, crosswalked to communicate
the coverage and interlinkage of these resources. These opioid-related datasets
can be grouped by four themes: (1) drug prescriptions, (2) opioid related
harms, (3) opioid treatment workforce, jobs, and training, and (4) drug policy.
An interactive network visualization was created and is freely available
online; it lets users explore key metadata, relevant scholarly works, and data
interlinkages in support of informed decision making through data analysis.

arXiv link: http://arxiv.org/abs/2102.05596v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2021-02-08

Sharp Sensitivity Analysis for Inverse Propensity Weighting via Quantile Balancing

Authors: Jacob Dorn, Kevin Guo

Inverse propensity weighting (IPW) is a popular method for estimating
treatment effects from observational data. However, its correctness relies on
the untestable (and frequently implausible) assumption that all confounders
have been measured. This paper introduces a robust sensitivity analysis for IPW
that estimates the range of treatment effects compatible with a given amount of
unobserved confounding. The estimated range converges to the narrowest possible
interval (under the given assumptions) that must contain the true treatment
effect. Our proposal is a refinement of the influential sensitivity analysis by
Zhao, Small, and Bhattacharya (2019), which we show gives bounds that are too
wide even asymptotically. This analysis is based on new partial identification
results for Tan (2006)'s marginal sensitivity model.
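
For intuition, the sketch below computes the simple box-constraint interval for a weighted (Hajek) mean under the marginal sensitivity model, using the fact that the extrema of a ratio of sums over box constraints occur at outcome-sorted threshold configurations; this reproduces the non-sharp ZSB-style interval, not the paper's sharp quantile-balancing bounds, and the propensities and outcomes are simulated placeholders.

# Minimal sketch of marginal-sensitivity bounds for an IPW (Hajek) mean of
# treated outcomes. Each unit's weight w_i = 1/e(x_i) may be tilted so that
# its odds change by at most a factor Lambda; the extreme weighted means over
# these box constraints occur at threshold configurations sorted by outcome.
import numpy as np

def _extreme(y, lo_w, hi_w, maximize):
    order = np.argsort(y)
    y, lo_w, hi_w = y[order], lo_w[order], hi_w[order]
    vals = []
    for k in range(len(y) + 1):
        if maximize:          # largest outcomes get the largest weights
            wk = np.concatenate([lo_w[:k], hi_w[k:]])
        else:                 # largest outcomes get the smallest weights
            wk = np.concatenate([hi_w[:k], lo_w[k:]])
        vals.append(np.sum(wk * y) / np.sum(wk))
    return max(vals) if maximize else min(vals)

def msm_bounds(y, e, Lam):
    w = 1.0 / e                                  # baseline IPW weights
    lo_w = 1.0 + (w - 1.0) / Lam                 # smallest admissible weights
    hi_w = 1.0 + (w - 1.0) * Lam                 # largest admissible weights
    return _extreme(y, lo_w, hi_w, False), _extreme(y, lo_w, hi_w, True)

rng = np.random.default_rng(1)
e = rng.uniform(0.2, 0.8, size=200)              # hypothetical propensities
y = rng.normal(size=200) + 1.0 / e               # outcomes of treated units
print(msm_bounds(y, e, Lam=2.0))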

arXiv link: http://arxiv.org/abs/2102.04543v3

Econometrics arXiv paper, submitted: 2021-02-08

Assessing Sensitivity of Machine Learning Predictions. A Novel Toolbox with an Application to Financial Literacy

Authors: Falco J. Bargagli Stoffi, Kenneth De Beckker, Joana E. Maldonado, Kristof De Witte

Despite their popularity, machine learning predictions are sensitive to
potential unobserved predictors. This paper proposes a general algorithm that
assesses how the omission of an unobserved variable with high explanatory power
could affect the predictions of the model. Moreover, the algorithm extends the
usage of machine learning from pointwise predictions to inference and
sensitivity analysis. In the application, we show how the framework can be
applied to data with inherent uncertainty, such as students' scores in a
standardized assessment on financial literacy. First, using Bayesian Additive
Regression Trees (BART), we predict students' financial literacy scores (FLS)
for a subgroup of students with missing FLS. Then, we assess the sensitivity of
predictions by comparing the predictions and performance of models with and
without a highly explanatory synthetic predictor. We find no significant
difference in the predictions and performances of the augmented (i.e., the
model with the synthetic predictor) and the original model. This evidence sheds
light on the stability of the predictive model used in the application. The
proposed methodology can be used, above and beyond our motivating empirical
example, in a wide range of machine learning applications in social and health
sciences.

arXiv link: http://arxiv.org/abs/2102.04382v1

Econometrics arXiv updated paper (originally submitted: 2021-02-08)

Duality in dynamic discrete-choice models

Authors: Khai Xiang Chiong, Alfred Galichon, Matt Shum

Using results from convex analysis, we investigate a novel approach to
identification and estimation of discrete choice models which we call the Mass
Transport Approach (MTA). We show that the conditional choice probabilities and
the choice-specific payoffs in these models are related in the sense of
conjugate duality, and that the identification problem is a mass transport
problem. Based on this, we propose a new two-step estimator for these models;
interestingly, the first step of our estimator involves solving a linear
program which is identical to the classic assignment (two-sided matching) game
of Shapley and Shubik (1971). The application of convex-analytic tools to
dynamic discrete choice models, and the connection with two-sided matching
models, is new in the literature.
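
To make the first-step object concrete, the snippet below solves a small Shapley-Shubik assignment problem for an illustrative random surplus matrix; it shows the linear program the MTA first step reduces to, not the authors' estimation code.

# Minimal sketch of the first-step object: the classic assignment (two-sided
# matching) problem, solved for an illustrative random surplus matrix.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
surplus = rng.normal(size=(6, 6))               # surplus of matching i with j
rows, cols = linear_sum_assignment(surplus, maximize=True)
print(list(zip(rows, cols)), surplus[rows, cols].sum())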

arXiv link: http://arxiv.org/abs/2102.06076v2

Econometrics arXiv paper, submitted: 2021-02-08

Extreme dependence for multivariate data

Authors: Damien Bosc, Alfred Galichon

This article proposes a generalized notion of extreme multivariate dependence
between two random vectors which relies on the extremality of the
cross-covariance matrix between these two vectors. Using a partial ordering on
the cross-covariance matrices, we also generalize the notion of positive upper
dependence. We then propose a means to quantify the strength of the dependence
between two given multivariate series and to increase this strength while
preserving the marginal distributions. This allows for the design of
stress-tests of the dependence between two sets of financial variables, which
can be useful in portfolio management or derivatives pricing.

arXiv link: http://arxiv.org/abs/2102.04461v1

Econometrics arXiv paper, submitted: 2021-02-08

Dilation bootstrap

Authors: Alfred Galichon, Marc Henry

We propose a methodology for constructing confidence regions with partially
identified models of general form. The region is obtained by inverting a test
of internal consistency of the econometric structure. We develop a dilation
bootstrap methodology to deal with sampling uncertainty without reference to
the hypothesized economic structure. It requires bootstrapping the quantile
process for univariate data and a novel generalization of the latter to higher
dimensions. Once the dilation is chosen to control the confidence level, the
unknown true distribution of the observed data can be replaced by the known
empirical distribution and confidence regions can then be obtained as in
Galichon and Henry (2011) and Beresteanu, Molchanov and Molinari (2011).

arXiv link: http://arxiv.org/abs/2102.04457v1

Econometrics arXiv updated paper (originally submitted: 2021-02-08)

Optimal transportation and the falsifiability of incompletely specified economic models

Authors: Ivar Ekeland, Alfred Galichon, Marc Henry

A general framework is given to analyze the falsifiability of economic models
based on a sample of their observable components. It is shown that, when the
restrictions implied by the economic theory are insufficient to identify the
unknown quantities of the structure, the duality of optimal transportation with
zero-one cost function delivers interpretable and operational formulations of
the hypothesis of specification correctness from which tests can be constructed
to falsify the model.

arXiv link: http://arxiv.org/abs/2102.04162v2

Econometrics arXiv paper, submitted: 2021-02-08

A test of non-identifying restrictions and confidence regions for partially identified parameters

Authors: Alfred Galichon, Marc Henry

We propose an easily implementable test of the validity of a set of
theoretical restrictions on the relationship between economic variables, which
do not necessarily identify the data generating process. The restrictions can
be derived from any model of interactions, allowing censoring and multiple
equilibria. When the restrictions are parameterized, the test can be inverted
to yield confidence regions for partially identified parameters, thereby
complementing other proposals, primarily Chernozhukov et al. [Chernozhukov, V.,
Hong, H., Tamer, E., 2007. Estimation and confidence regions for parameter sets
in econometric models. Econometrica 75, 1243-1285].

arXiv link: http://arxiv.org/abs/2102.04151v1

Econometrics arXiv updated paper (originally submitted: 2021-02-08)

A note on global identification in structural vector autoregressions

Authors: Emanuele Bacchiocchi, Toru Kitagawa

In a landmark contribution to the structural vector autoregression (SVARs)
literature, Rubio-Ramirez, Waggoner, and Zha (2010, `Structural Vector
Autoregressions: Theory of Identification and Algorithms for Inference,' Review
of Economic Studies) show a necessary and sufficient condition for equality
restrictions to globally identify the structural parameters of a SVAR. The
simplest form of the necessary and sufficient condition shown in Theorem 7 of
Rubio-Ramirez et al (2010) checks the number of zero restrictions and the ranks
of particular matrices without requiring knowledge of the true value of the
structural or reduced-form parameters. However, this note shows by
counterexample that this condition is not sufficient for global identification.
Analytical investigation of the counterexample clarifies why their sufficiency
claim breaks down. The problem with the rank condition is that it allows for
the possibility that restrictions are redundant, in the sense that one or more
restrictions may be implied by other restrictions, in which case the implied
restriction contains no identifying information. We derive a modified necessary
and sufficient condition for SVAR global identification and clarify how it can
be assessed in practice.

arXiv link: http://arxiv.org/abs/2102.04048v2

Econometrics arXiv updated paper (originally submitted: 2021-02-07)

Inference under Covariate-Adaptive Randomization with Imperfect Compliance

Authors: Federico A. Bugni, Mengsi Gao

This paper studies inference in a randomized controlled trial (RCT) with
covariate-adaptive randomization (CAR) and imperfect compliance with a binary
treatment. In this context, we study inference on the LATE. As in Bugni et al.
(2018,2019), CAR refers to randomization schemes that first stratify according
to baseline covariates and then assign treatment status so as to achieve
"balance" within each stratum. In contrast to these papers, however, we allow
participants of the RCT to endogenously decide to comply or not with the
assigned treatment status.
We study the properties of an estimator of the LATE derived from a "fully
saturated" IV linear regression, i.e., a linear regression of the outcome on
all indicators for all strata and their interaction with the treatment
decision, with the latter instrumented with the treatment assignment. We show
that the proposed LATE estimator is asymptotically normal, and we characterize
its asymptotic variance in terms of primitives of the problem. We provide
consistent estimators of the standard errors and asymptotically exact
hypothesis tests. In the special case when the target proportion of units
assigned to each treatment does not vary across strata, we can also consider
two other estimators of the LATE, including the one based on the "strata fixed
effects" IV linear regression, i.e., a linear regression of the outcome on
indicators for all strata and the treatment decision, with the latter
instrumented with the treatment assignment.
Our characterization of the asymptotic variance of the LATE estimators allows
us to understand the influence of the parameters of the RCT. We use this to
propose strategies to minimize their asymptotic variance in a hypothetical RCT
based on data from a pilot study. We illustrate the practical relevance of
these results using a simulation study and an empirical application based on
Dupas et al. (2018).
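
A minimal sketch of the "fully saturated" IV regression follows, using plain 2SLS algebra on strata dummies and their interactions with the treatment decision, instrumented by interactions with the assignment; the complier-share weighting used to aggregate stratum coefficients is one natural choice for illustration, and the CAR-appropriate standard errors derived in the paper are not reproduced.

# Minimal sketch of the fully saturated IV regression. y: outcome, d: binary
# treatment decision, z: binary treatment assignment, strata: stratum labels
# (all numpy arrays). Plain 2SLS algebra only.
import numpy as np

def saturated_iv_late(y, d, z, strata):
    s_vals = np.unique(strata)
    S = np.column_stack([(strata == s).astype(float) for s in s_vals])
    X = np.hstack([S, S * d[:, None]])      # strata dummies + endogenous interactions
    Z = np.hstack([S, S * z[:, None]])      # strata dummies + assignment interactions
    Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # first-stage fitted values
    beta = np.linalg.lstsq(Xhat, y, rcond=None)[0]    # 2SLS coefficients
    tau_s = beta[len(s_vals):]                        # stratum-level IV effects
    # One natural aggregation: weight strata by share times first-stage difference.
    first_stage = np.array([d[(strata == s) & (z == 1)].mean()
                            - d[(strata == s) & (z == 0)].mean() for s in s_vals])
    w = S.mean(0) * first_stage
    return tau_s, np.sum(w * tau_s) / np.sum(w)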

arXiv link: http://arxiv.org/abs/2102.03937v3

Econometrics arXiv paper, submitted: 2021-02-07

Identification of Matching Complementarities: A Geometric Viewpoint

Authors: Alfred Galichon

We provide a geometric formulation of the problem of identification of the
matching surplus function and we show how the estimation problem can be solved
by the introduction of a generalized entropy function over the set of
matchings.

arXiv link: http://arxiv.org/abs/2102.03875v1

Econometrics arXiv cross-link from cs.CV (cs.CV), submitted: 2021-02-05

Applications of Machine Learning in Document Digitisation

Authors: Christian M. Dahl, Torben S. D. Johansen, Emil N. Sørensen, Christian E. Westermann, Simon F. Wittrock

Data acquisition forms the primary step in all empirical research. The
availability of data directly impacts the quality and extent of conclusions and
insights. In particular, larger and more detailed datasets provide convincing
answers even to complex research questions. The main problem is that 'large and
detailed' usually implies 'costly and difficult', especially when the data
medium is paper and books. Human operators and manual transcription have been
the traditional approach for collecting historical data. We instead advocate
the use of modern machine learning techniques to automate the digitisation
process. We give an overview of the potential for applying machine digitisation
for data collection through two illustrative applications. The first
demonstrates that unsupervised layout classification applied to raw scans of
nurse journals can be used to construct a treatment indicator. Moreover, it
allows an assessment of assignment compliance. The second application uses
attention-based neural networks for handwritten text recognition in order to
transcribe age and birth and death dates from a large collection of Danish
death certificates. We describe each step in the digitisation pipeline and
provide implementation insights.

arXiv link: http://arxiv.org/abs/2102.03239v1

Econometrics arXiv paper, submitted: 2021-02-05

Hypothetical bias in stated choice experiments: Part II. Macro-scale analysis of literature and effectiveness of bias mitigation methods

Authors: Milad Haghani, Michiel C. J. Bliemer, John M. Rose, Harmen Oppewal, Emily Lancsar

This paper reviews methods of hypothetical bias (HB) mitigation in choice
experiments (CEs). It presents a bibliometric analysis and summary of empirical
evidence of their effectiveness. The paper follows the review of empirical
evidence on the existence of HB presented in Part I of this study. While the
number of CE studies has rapidly increased since 2010, the critical issue of HB
has been studied in only a small fraction of CE studies. The present review
includes both ex-ante and ex-post bias mitigation methods. Ex-ante bias
mitigation methods include cheap talk, real talk, consequentiality scripts,
solemn oath scripts, opt-out reminders, budget reminders, honesty priming,
induced truth telling, indirect questioning, time to think and pivot designs.
Ex-post methods include follow-up certainty calibration scales, respondent
perceived consequentiality scales, and revealed-preference-assisted estimation.
It is observed that the use of mitigation methods markedly varies across
different sectors of applied economics. The existing empirical evidence points
to their overall effectiveness in reducing HB, although there is some variation.
The paper further discusses how each mitigation method can counter a certain
subset of HB sources. Considering the prevalence of HB in CEs and the
effectiveness of bias mitigation methods, it is recommended that implementation
of at least one bias mitigation method (or a suitable combination where
possible) becomes standard practice in conducting CEs. Mitigation method(s)
suited to the particular application should be implemented to ensure that
inferences and subsequent policy decisions are as much as possible free of HB.

arXiv link: http://arxiv.org/abs/2102.02945v1

Econometrics arXiv paper, submitted: 2021-02-05

Hypothetical bias in stated choice experiments: Part I. Integrative synthesis of empirical evidence and conceptualisation of external validity

Authors: Milad Haghani, Michiel C. J. Bliemer, John M. Rose, Harmen Oppewal, Emily Lancsar

The notion of hypothetical bias (HB) constitutes, arguably, the most
fundamental issue in relation to the use of hypothetical survey methods.
Whether or to what extent choices of survey participants and subsequent
inferred estimates translate to real-world settings continues to be debated.
While HB has been extensively studied in the broader context of contingent
valuation, it is much less understood in relation to choice experiments (CE).
This paper reviews the empirical evidence for HB in CE in various fields of
applied economics and presents an integrative framework for how HB relates to
external validity. Results suggest mixed evidence on the prevalence, extent and
direction of HB as well as considerable context and measurement dependency.
While HB is found to be an undeniable issue when conducting CEs, the empirical
evidence on HB does not render CEs unable to represent real-world preferences.
While health-related choice experiments often find negligible degrees of HB,
experiments in consumer behaviour and transport domains suggest that
significant degrees of HB are ubiquitous. Assessments of bias in environmental
valuation studies provide mixed evidence. Also, across these disciplines many
studies display HB in their total willingness to pay estimates and opt-in rates
but not in their hypothetical marginal rates of substitution (subject to scale
correction). Further, recent findings in psychology and brain imaging studies
suggest neurocognitive mechanisms underlying HB that may explain some of the
discrepancies and unexpected findings in the mainstream CE literature. The
review also observes how the variety of operational definitions of HB prohibits
consistent measurement of HB in CE. The paper further identifies major sources
of HB and possible moderating factors. Finally, it explains how HB represents
one component of the wider concept of external validity.

arXiv link: http://arxiv.org/abs/2102.02940v1

Econometrics arXiv paper, submitted: 2021-02-04

The Econometrics and Some Properties of Separable Matching Models

Authors: Alfred Galichon, Bernard Salanié

We present a class of one-to-one matching models with perfectly transferable
utility. We discuss identification and inference in these separable models, and
we show how their comparative statics are readily analyzed.

arXiv link: http://arxiv.org/abs/2102.02564v1

Econometrics arXiv paper, submitted: 2021-02-03

Discretizing Unobserved Heterogeneity

Authors: Stéphane Bonhomme, Thibaut Lamadon, Elena Manresa

We study discrete panel data methods where unobserved heterogeneity is
revealed in a first step, in environments where population heterogeneity is not
discrete. We focus on two-step grouped fixed-effects (GFE) estimators, where
individuals are first classified into groups using k-means clustering, and the
model is then estimated allowing for group-specific heterogeneity. Our
framework relies on two key properties: heterogeneity is a function - possibly
nonlinear and time-varying - of a low-dimensional continuous latent type, and
informative moments are available for classification. We illustrate the method
in a model of wages and labor market participation, and in a probit model with
time-varying heterogeneity. We derive asymptotic expansions of two-step GFE
estimators as the number of groups grows with the two dimensions of the panel.
We propose a data-driven rule for the number of groups, and discuss bias
reduction and inference.
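
A minimal sketch of the two-step idea, assuming time averages as the informative moments: classify individuals by k-means, then regress with group dummies; the data-driven choice of the number of groups and the bias corrections discussed in the paper are omitted.

# Minimal sketch of a two-step grouped fixed-effects (GFE) estimator.
# y, x: (N, T) panels. Step 1 classifies individuals by k-means on time
# averages; step 2 runs a pooled regression with group-specific intercepts.
import numpy as np
from sklearn.cluster import KMeans

def two_step_gfe(y, x, n_groups=3, seed=0):
    moments = np.column_stack([y.mean(axis=1), x.mean(axis=1)])  # step 1 inputs
    groups = KMeans(n_clusters=n_groups, n_init=10,
                    random_state=seed).fit_predict(moments)
    N, T = y.shape
    G = np.zeros((N * T, n_groups))
    G[np.arange(N * T), np.repeat(groups, T)] = 1.0              # group dummies
    X = np.column_stack([x.reshape(-1), G])
    beta = np.linalg.lstsq(X, y.reshape(-1), rcond=None)[0]
    return beta[0], groups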

arXiv link: http://arxiv.org/abs/2102.02124v1

Econometrics arXiv paper, submitted: 2021-02-02

Teams: Heterogeneity, Sorting, and Complementarity

Authors: Stephane Bonhomme

How much do individuals contribute to team output? I propose an econometric
framework to quantify individual contributions when only the output of their
teams is observed. The identification strategy relies on following individuals
who work in different teams over time. I consider two production technologies.
For a production function that is additive in worker inputs, I propose a
regression estimator and show how to obtain unbiased estimates of variance
components that measure the contributions of heterogeneity and sorting. To
estimate nonlinear models with complementarity, I propose a mixture approach
under the assumption that individual types are discrete, and rely on a
mean-field variational approximation for estimation. To illustrate the methods,
I estimate the impact of economists on their research output, and the
contributions of inventors to the quality of their patents.

arXiv link: http://arxiv.org/abs/2102.01802v1

Econometrics arXiv paper, submitted: 2021-02-02

Adaptive Random Bandwidth for Inference in CAViaR Models

Authors: Alain Hecq, Li Sun

This paper investigates the size performance of Wald tests for CAViaR models
(Engle and Manganelli, 2004). We find that the usual estimation strategy for
the test statistics yields inaccurate inference. Indeed, we show that existing density
estimation methods cannot adapt to the time-variation in the conditional
probability densities of CAViaR models. Consequently, we develop a method
called adaptive random bandwidth which can approximate time-varying conditional
probability densities robustly for inference testing on CAViaR models based on
the asymptotic normality of the model parameter estimator. This proposed method
also avoids the problem of choosing an optimal bandwidth in estimating
probability densities, and can be extended to multivariate quantile regressions
in a straightforward manner.

arXiv link: http://arxiv.org/abs/2102.01636v1

Econometrics arXiv updated paper (originally submitted: 2021-02-02)

Efficient Estimation for Staggered Rollout Designs

Authors: Jonathan Roth, Pedro H. C. Sant'Anna

We study estimation of causal effects in staggered rollout designs, i.e.
settings where there is staggered treatment adoption and the timing of
treatment is as-good-as randomly assigned. We derive the most efficient
estimator in a class of estimators that nests several popular generalized
difference-in-differences methods. A feasible plug-in version of the efficient
estimator is asymptotically unbiased with efficiency (weakly) dominating that
of existing approaches. We provide both $t$-based and permutation-test-based
methods for inference. In an application to a training program for police
officers, confidence intervals for the proposed estimator are as much as eight
times shorter than for existing approaches.

arXiv link: http://arxiv.org/abs/2102.01291v7

Econometrics arXiv updated paper (originally submitted: 2021-02-01)

A first-stage representation for instrumental variables quantile regression

Authors: Javier Alejo, Antonio F. Galvao, Gabriel Montes-Rojas

This paper develops a first-stage linear regression representation for the
instrumental variables (IV) quantile regression (QR) model. The quantile
first-stage is analogous to the least squares case, i.e., a linear projection
of the endogenous variables on the instruments and other exogenous covariates,
with the difference that the QR case is a weighted projection. The weights are
given by the conditional density function of the innovation term in the QR
structural model, conditional on the endogenous and exogenous covariates, and
the instruments as well, at a given quantile. We also show that the required
Jacobian identification conditions for IVQR models are embedded in the quantile
first-stage. We then suggest inference procedures to evaluate the adequacy of
instruments by evaluating their statistical significance using the first-stage
result. The test is developed in an over-identification context, since
consistent estimation of the weights for implementation of the first-stage
requires at least one valid instrument to be available. Monte Carlo experiments
provide numerical evidence that the proposed tests work as expected in terms of
empirical size and power in finite samples. An empirical application
illustrates that checking for the statistical significance of the instruments
at different quantiles is important. The proposed procedures may be especially
useful in QR since the instruments may be relevant at some quantiles but not at
others.

arXiv link: http://arxiv.org/abs/2102.01212v4

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2021-02-01

Comparing hundreds of machine learning classifiers and discrete choice models in predicting travel behavior: an empirical benchmark

Authors: Shenhao Wang, Baichuan Mo, Yunhan Zheng, Stephane Hess, Jinhua Zhao

Numerous studies have compared machine learning (ML) and discrete choice
models (DCMs) in predicting travel demand. However, these studies often lack
generalizability as they compare models deterministically without considering
contextual variations. To address this limitation, our study develops an
empirical benchmark by designing a tournament model, thus efficiently
summarizing a large number of experiments, quantifying the randomness in model
comparisons, and using formal statistical tests to differentiate between the
model and contextual effects. This benchmark study compares two large-scale
data sources: a database compiled from literature review summarizing 136
experiments from 35 studies, and our own experiment data, encompassing a total
of 6,970 experiments from 105 models and 12 model families. This benchmark
study yields two key findings. Firstly, many ML models, particularly the
ensemble methods and deep learning, statistically outperform the DCM family
(i.e., multinomial, nested, and mixed logit models). However, this study also
highlights the crucial role of the contextual factors (i.e., data sources,
inputs and choice categories), which can explain models' predictive performance
more effectively than the differences in model types alone. Model performance
varies significantly with data sources, improving with larger sample sizes and
lower dimensional alternative sets. After controlling all the model and
contextual factors, significant randomness still remains, implying inherent
uncertainty in such model comparisons. Overall, we suggest that future
researchers shift more focus from context-specific model comparisons towards
examining model transferability across contexts and characterizing the inherent
uncertainty in ML, thus creating more robust and generalizable next-generation
travel demand models.

arXiv link: http://arxiv.org/abs/2102.01130v3

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2021-02-01

CRPS Learning

Authors: Jonathan Berrisch, Florian Ziel

Combination and aggregation techniques can significantly improve forecast
accuracy. This also holds for probabilistic forecasting methods where
predictive distributions are combined. There are several time-varying and
adaptive weighting schemes such as Bayesian model averaging (BMA). However, the
quality of different forecasts may vary not only over time but also within the
distribution. For example, some distribution forecasts may be more accurate in
the center of the distributions, while others are better at predicting the
tails. Therefore, we introduce a new weighting method that considers the
differences in performance over time and within the distribution. We discuss
pointwise combination algorithms based on aggregation across quantiles that
optimize with respect to the continuous ranked probability score (CRPS). After
analyzing the theoretical properties of pointwise CRPS learning, we discuss B-
and P-Spline-based estimation techniques for batch and online learning, based
on quantile regression and prediction with expert advice. We prove that the
proposed fully adaptive Bernstein online aggregation (BOA) method for pointwise
CRPS online learning has optimal convergence properties. They are confirmed in
simulations and a probabilistic forecasting study for European emission
allowance (EUA) prices.
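
As a simplified illustration of pointwise combination, the snippet below picks, for each quantile level, the convex weight on two competing quantile forecasts that minimizes average pinball loss; an in-sample grid search stands in for the paper's Bernstein online aggregation scheme.

# Minimal sketch of pointwise combination across quantiles: for each quantile
# level, choose the convex weight on two forecasts that minimizes average
# pinball (quantile) loss, the per-quantile building block of the CRPS.
import numpy as np

def pinball(y, q_pred, tau):
    u = y - q_pred
    return np.mean(np.maximum(tau * u, (tau - 1) * u))

def pointwise_weights(y, qA, qB, taus, grid=np.linspace(0, 1, 101)):
    """qA, qB: (n_obs, n_quantiles) forecasts. Returns one weight per quantile."""
    weights = []
    for j, tau in enumerate(taus):
        losses = [pinball(y, w * qA[:, j] + (1 - w) * qB[:, j], tau) for w in grid]
        weights.append(grid[int(np.argmin(losses))])
    return np.array(weights)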

arXiv link: http://arxiv.org/abs/2102.00968v3

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2021-01-30

Time Series (re)sampling using Generative Adversarial Networks

Authors: Christian M. Dahl, Emil N. Sørensen

We propose a novel bootstrap procedure for dependent data based on Generative
Adversarial networks (GANs). We show that the dynamics of common stationary
time series processes can be learned by GANs and demonstrate that GANs trained
on a single sample path can be used to generate additional samples from the
process. We find that temporal convolutional neural networks provide a suitable
design for the generator and discriminator, and that convincing samples can be
generated on the basis of a vector of iid normal noise. We demonstrate the
finite sample properties of GAN sampling and the suggested bootstrap using
simulations where we compare the performance to circular block bootstrapping in
the case of resampling an AR(1) time series process. We find that resampling
using the GAN can outperform circular block bootstrapping in terms of empirical
coverage.
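
For reference, the benchmark scheme in the comparison, the circular block bootstrap, can be sketched as follows (the AR(1) example data are simulated placeholders):

# Minimal sketch of the circular block bootstrap, the benchmark resampling
# scheme the GAN-based procedure is compared against. Not the GAN sampler.
import numpy as np

def circular_block_bootstrap(x, block_length, seed=0):
    rng = np.random.default_rng(seed)
    n = len(x)
    x_ext = np.concatenate([x, x[:block_length]])    # wrap around ("circular")
    n_blocks = int(np.ceil(n / block_length))
    starts = rng.integers(0, n, size=n_blocks)
    blocks = [x_ext[s:s + block_length] for s in starts]
    return np.concatenate(blocks)[:n]

# Example: resample an AR(1) path and look at the bootstrap mean dispersion.
rng = np.random.default_rng(1)
e = rng.normal(size=500)
ar1 = np.zeros(500)
for t in range(1, 500):
    ar1[t] = 0.7 * ar1[t - 1] + e[t]
boot_means = [circular_block_bootstrap(ar1, 25, seed=s).mean() for s in range(200)]
print(np.std(boot_means))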

arXiv link: http://arxiv.org/abs/2102.00208v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-01-29

Tree-based Node Aggregation in Sparse Graphical Models

Authors: Ines Wilms, Jacob Bien

High-dimensional graphical models are often estimated using regularization
that is aimed at reducing the number of edges in a network. In this work, we
show how even simpler networks can be produced by aggregating the nodes of the
graphical model. We develop a new convex regularized method, called the
tree-aggregated graphical lasso or tag-lasso, that estimates graphical models
that are both edge-sparse and node-aggregated. The aggregation is performed in
a data-driven fashion by leveraging side information in the form of a tree that
encodes node similarity and facilitates the interpretation of the resulting
aggregated nodes. We provide an efficient implementation of the tag-lasso by
using the locally adaptive alternating direction method of multipliers and
illustrate our proposal's practical advantages in simulation and in
applications in finance and biology.

arXiv link: http://arxiv.org/abs/2101.12503v1

Econometrics arXiv paper, submitted: 2021-01-28

The Bootstrap for Network Dependent Processes

Authors: Denis Kojevnikov

This paper focuses on the bootstrap for network dependent processes under the
conditional $\psi$-weak dependence. Such processes are distinct from other
forms of random fields studied in the statistics and econometrics literature so
that the existing bootstrap methods cannot be applied directly. We propose a
block-based approach and a modification of the dependent wild bootstrap for
constructing confidence sets for the mean of a network dependent process. In
addition, we establish the consistency of these methods for the smooth function
model and provide the bootstrap alternatives to the network
heteroskedasticity-autocorrelation consistent (HAC) variance estimator. We find
that the modified dependent wild bootstrap and the corresponding variance
estimator are consistent under weaker conditions relative to the block-based
method, which makes the former approach preferable for practical
implementation.

arXiv link: http://arxiv.org/abs/2101.12312v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2021-01-28

Simple Adaptive Estimation of Quadratic Functionals in Nonparametric IV Models

Authors: Christoph Breunig, Xiaohong Chen

This paper considers adaptive, minimax estimation of a quadratic functional
in a nonparametric instrumental variables (NPIV) model, which is an important
problem in optimal estimation of a nonlinear functional of an ill-posed inverse
regression with an unknown operator. We first show that a leave-one-out, sieve
NPIV estimator of the quadratic functional can attain a convergence rate that
coincides with the lower bound previously derived in Chen and Christensen
[2018]. The minimax rate is achieved by the optimal choice of the sieve
dimension (a key tuning parameter) that depends on the smoothness of the NPIV
function and the degree of ill-posedness, both of which are unknown in practice. We next
propose a Lepski-type data-driven choice of the key sieve dimension adaptive to
the unknown NPIV model features. The adaptive estimator of the quadratic
functional is shown to attain the minimax optimal rate in the severely
ill-posed case and in the regular mildly ill-posed case, but up to a
multiplicative $\log n$ factor in the irregular mildly ill-posed case.

arXiv link: http://arxiv.org/abs/2101.12282v2

Econometrics arXiv paper, submitted: 2021-01-28

Gaussian Process Latent Class Choice Models

Authors: Georges Sfeir, Filipe Rodrigues, Maya Abou-Zeid

We present a Gaussian Process - Latent Class Choice Model (GP-LCCM) to
integrate a non-parametric class of probabilistic machine learning within
discrete choice models (DCMs). Gaussian Processes (GPs) are kernel-based
algorithms that incorporate expert knowledge by assuming priors over latent
functions rather than priors over parameters, which makes them more flexible in
addressing nonlinear problems. By integrating a Gaussian Process within a LCCM
structure, we aim at improving discrete representations of unobserved
heterogeneity. The proposed model would assign individuals probabilistically to
behaviorally homogeneous clusters (latent classes) using GPs and simultaneously
estimate class-specific choice models by relying on random utility models.
Furthermore, we derive and implement an Expectation-Maximization (EM) algorithm
to jointly estimate/infer the hyperparameters of the GP kernel function and the
class-specific choice parameters by relying on a Laplace approximation and
gradient-based numerical optimization methods, respectively. The model is
tested on two different mode choice applications and compared against different
LCCM benchmarks. Results show that GP-LCCM allows for a more complex and
flexible representation of heterogeneity and improves both in-sample fit and
out-of-sample predictive power. Moreover, behavioral and economic
interpretability is maintained at the class-specific choice model level while
local interpretation of the latent classes can still be achieved, although the
non-parametric characteristic of GPs lessens the transparency of the model.

arXiv link: http://arxiv.org/abs/2101.12252v1

Econometrics arXiv updated paper (originally submitted: 2021-01-28)

Choice modelling in the age of machine learning -- discussion paper

Authors: S. Van Cranenburgh, S. Wang, A. Vij, F. Pereira, J. Walker

Since its inception, the choice modelling field has been dominated by
theory-driven modelling approaches. Machine learning offers an alternative
data-driven approach for modelling choice behaviour and is increasingly drawing
interest in our field. Cross-pollination of machine learning models, techniques
and practices could help overcome problems and limitations encountered in the
current theory-driven modelling paradigm, such as subjective labour-intensive
search processes for model selection, and the inability to work with text and
image data. However, despite the potential benefits of using the advances of
machine learning to improve choice modelling practices, the choice modelling
field has been hesitant to embrace machine learning. This discussion paper aims
to consolidate knowledge on the use of machine learning models, techniques and
practices for choice modelling, and discuss their potential. Thereby, we hope
not only to make the case that further integration of machine learning in
choice modelling is beneficial, but also to further facilitate it. To this end,
we clarify the similarities and differences between the two modelling
paradigms; we review the use of machine learning for choice modelling; and we
explore areas of opportunities for embracing machine learning models and
techniques to improve our practices. To conclude this discussion paper, we put
forward a set of research questions which must be addressed to better
understand if and how machine learning can benefit choice modelling.

arXiv link: http://arxiv.org/abs/2101.11948v2

Econometrics arXiv updated paper (originally submitted: 2021-01-28)

A Bayesian approach for estimation of weight matrices in spatial autoregressive models

Authors: Tamás Krisztin, Philipp Piribauer

We develop a Bayesian approach to estimate weight matrices in spatial
autoregressive (or spatial lag) models. Datasets in regional economic
literature are typically characterized by a limited number of time periods T
relative to spatial units N. When the spatial weight matrix is subject to
estimation, severe problems of over-parametrization are likely. To make
estimation feasible, our approach focusses on spatial weight matrices which are
binary prior to row-standardization. We discuss the use of hierarchical priors
which impose sparsity in the spatial weight matrix. Monte Carlo simulations
show that these priors perform very well where the number of unknown parameters
is large relative to the number of observations. The virtues of our approach are
demonstrated using global data from the early phase of the COVID-19 pandemic.

arXiv link: http://arxiv.org/abs/2101.11938v2

Econometrics arXiv paper, submitted: 2021-01-28

The sooner the better: lives saved by the lockdown during the COVID-19 outbreak. The case of Italy

Authors: Roy Cerqueti, Raffaella Coppier, Alessandro Girardi, Marco Ventura

This paper estimates the effects of non-pharmaceutical interventions -
mainly, the lockdown - on the COVID-19 mortality rate for the case of Italy,
the first Western country to impose a national shelter-in-place order. We use a
new estimator, the Augmented Synthetic Control Method (ASCM), that overcomes
some limits of the standard Synthetic Control Method (SCM). The results are
twofold. From a methodological point of view, the ASCM outperforms the SCM in
that the latter cannot select a valid donor set, assigning all the weights to
only one country (Spain) while placing zero weights to all the remaining. From
an empirical point of view, we find strong evidence of the effectiveness of
non-pharmaceutical interventions in avoiding losses of human lives in Italy:
conservative estimates indicate that for each human life actually lost, in the
absence of the lockdown there would have been on average another 1.15; in
total, the policy saved 20,400 human lives.

arXiv link: http://arxiv.org/abs/2101.11901v1

Econometrics arXiv updated paper (originally submitted: 2021-01-27)

Predictive Quantile Regression with Mixed Roots and Increasing Dimensions: The ALQR Approach

Authors: Rui Fan, Ji Hyung Lee, Youngki Shin

In this paper we propose the adaptive lasso for predictive quantile
regression (ALQR). Reflecting empirical findings, we allow predictors to have
various degrees of persistence and exhibit different signal strengths. The
number of predictors is allowed to grow with the sample size. We study
regularity conditions under which stationary, local unit root, and cointegrated
predictors are present simultaneously. We next show the convergence rates,
model selection consistency, and asymptotic distributions of ALQR. We apply the
proposed method to the out-of-sample quantile prediction problem of stock
returns and find that it outperforms the existing alternatives. We also provide
numerical evidence from additional Monte Carlo experiments, supporting the
theoretical results.
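
A minimal sketch of the adaptive-lasso idea for a single quantile follows, implemented by rescaling predictors with first-step coefficient magnitudes before an L1-penalized quantile regression; this toy i.i.d. example ignores the mixed-persistence and increasing-dimension structure treated in the paper.

# Minimal sketch of adaptive-lasso quantile regression: columns are rescaled
# by first-step coefficient magnitudes so that the standard L1 penalty applies
# weaker shrinkage to strong signals. Toy i.i.d. data only.
import numpy as np
from sklearn.linear_model import QuantileRegressor

def adaptive_lasso_qr(X, y, tau=0.5, alpha=0.05, eps=1e-6):
    init = QuantileRegressor(quantile=tau, alpha=0.0, solver="highs").fit(X, y)
    w = np.abs(init.coef_) + eps                 # adaptive weights
    model = QuantileRegressor(quantile=tau, alpha=alpha,
                              solver="highs").fit(X * w, y)
    return model.coef_ * w                       # map back to original scale

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = X[:, 0] - 0.5 * X[:, 1] + rng.standard_t(df=4, size=300)
print(np.round(adaptive_lasso_qr(X, y, tau=0.5), 2))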

arXiv link: http://arxiv.org/abs/2101.11568v4

Econometrics arXiv paper, submitted: 2021-01-26

Identifying and Estimating Perceived Returns to Binary Investments

Authors: Clint Harris

I describe a method for estimating agents' perceived returns to investments
that relies on cross-sectional data containing binary choices and prices, where
prices may be imperfectly known to agents. This method identifies the scale of
perceived returns by assuming agent knowledge of an identity that relates
profits, revenues, and costs rather than by eliciting or assuming agent beliefs
about structural parameters that are estimated by researchers. With this
assumption, modest adjustments to standard binary choice estimators enable
consistent estimation of perceived returns when using price instruments that
are uncorrelated with unobserved determinants of agents' price misperceptions
as well as other unobserved determinants of their perceived returns. I
demonstrate the method, and the importance of using price variation that is
known to agents, in a series of data simulations.

arXiv link: http://arxiv.org/abs/2101.10941v1

Econometrics arXiv updated paper (originally submitted: 2021-01-26)

Robustness of the international oil trade network under targeted attacks to economies

Authors: N. Wei, W. -J. Xie, W. -X. Zhou

In the international oil trade network (iOTN), trade shocks triggered by
extreme events may spread over the entire network along the trade links of the
central economies and even lead to the collapse of the whole system. In this
study, we focus on the concept of "too central to fail" and use traditional
centrality indicators as strategic indicators for simulating attacks on
economic nodes, and simulate various situations in which the structure and
function of the global oil trade network are lost when the economies suffer
extreme trade shocks. The simulation results show that the global oil trade
system has become more vulnerable in recent years. The regional aggregation of
oil trade is an essential source of iOTN's vulnerability. Maintaining global
oil trade stability and security requires a focus on economies with greater
influence within the network module of the iOTN. International organizations
such as OPEC and OECD established more trade links around the world, but their
influence on the iOTN is declining. We improve the framework of oil security
and trade risk assessment based on the topological index of iOTN, and provide a
reference for finding methods to maintain network robustness and trade
stability.

arXiv link: http://arxiv.org/abs/2101.10679v2

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2021-01-25

A nowcasting approach to generate timely estimates of Mexican economic activity: An application to the period of COVID-19

Authors: Francisco Corona, Graciela González-Farías, Jesús López-Pérez

In this paper, we present a new approach based on dynamic factor models
(DFMs) to perform nowcasts for the percentage annual variation of the Mexican
Global Economic Activity Indicator (IGAE in Spanish). The procedure consists of
the following steps: i) build a timely and correlated database by using
economic and financial time series and real-time variables such as social
mobility and significant topics extracted by Google Trends; ii) estimate the
common factors using the two-step methodology of Doz et al. (2011); iii) use
the common factors in univariate time-series models for test data; and iv)
according to the best results obtained in the previous step, combine the
best-performing nowcasts that are statistically indistinguishable
(Diebold-Mariano test) to generate the current nowcasts. We obtain timely and
accurate nowcasts for the IGAE,
including those for the current phase of drastic drops in the economy related
to COVID-19 sanitary measures. Additionally, the approach allows us to
disentangle the key variables in the DFM by estimating the confidence interval
for both the factor loadings and the factor estimates. This approach can be
used in official statistics to obtain preliminary estimates for IGAE up to 50
days before the official results.
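
A minimal sketch of the factor-extraction and bridge-regression steps is given below, with a simulated panel standing in for the real-time indicators; the Google Trends construction, the Kalman-smoothing refinement of the two-step estimator, and the Diebold-Mariano-based combination are omitted.

# Minimal sketch of a DFM-style nowcast: standardize the indicator panel,
# take principal-component factors, and regress the target growth rate on
# the factors. Placeholder data only.
import numpy as np

def pca_factors(panel, n_factors=2):
    Xs = (panel - panel.mean(0)) / panel.std(0)       # standardize indicators
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    return Xs @ Vt[:n_factors].T                      # principal-component factors

rng = np.random.default_rng(0)
T, N = 120, 30
f = rng.normal(size=(T, 1)).cumsum(0) * 0.1           # latent common factor
panel = f @ rng.normal(size=(1, N)) + rng.normal(size=(T, N))
target = 0.8 * f[:, 0] + 0.2 * rng.normal(size=T)     # e.g. target growth rate

F = pca_factors(panel, n_factors=2)
X = np.column_stack([np.ones(T), F])
beta = np.linalg.lstsq(X[:-1], target[:-1], rcond=None)[0]
print("nowcast for the last period:", X[-1] @ beta, "actual:", target[-1])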

arXiv link: http://arxiv.org/abs/2101.10383v1

Econometrics arXiv updated paper (originally submitted: 2021-01-25)

A Benchmark Model for Fixed-Target Arctic Sea Ice Forecasting

Authors: Francis X. Diebold, Maximilian Gobel

We propose a reduced-form benchmark predictive model (BPM) for fixed-target
forecasting of Arctic sea ice extent, and we provide a case study of its
real-time performance for target date September 2020. We visually detail the
evolution of the statistically-optimal point, interval, and density forecasts
as time passes, new information arrives, and the end of September approaches.
Comparison to the BPM may prove useful for evaluating and selecting among
various more sophisticated dynamical sea ice models, which are widely used to
quantify the likely future evolution of Arctic conditions and their two-way
interaction with economic activity.

arXiv link: http://arxiv.org/abs/2101.10359v3

Econometrics arXiv updated paper (originally submitted: 2021-01-25)

Consistent specification testing under spatial dependence

Authors: Abhimanyu Gupta, Xi Qu

We propose a series-based nonparametric specification test for a regression
function when data are spatially dependent, the `space' being of a general
economic or social nature. Dependence can be parametric, parametric with
increasing dimension, semiparametric or any combination thereof, thus covering
a vast variety of settings. These include spatial error models of varying types
and levels of complexity. Under a new smooth spatial dependence condition, our
test statistic is asymptotically standard normal. To prove the latter property,
we establish a central limit theorem for quadratic forms in linear processes in
an increasing dimension setting. Finite sample performance is investigated in a
simulation study, with a bootstrap method also justified and illustrated, and
empirical examples demonstrate the test with real-world data.

arXiv link: http://arxiv.org/abs/2101.10255v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2021-01-23

Kernel regression analysis of tie-breaker designs

Authors: Dan M. Kluger, Art B. Owen

Tie-breaker experimental designs are hybrids of Randomized Controlled Trials
(RCTs) and Regression Discontinuity Designs (RDDs) in which subjects with
moderate scores are placed in an RCT while subjects with extreme scores are
deterministically assigned to the treatment or control group. In settings where
it is unfair or uneconomical to deny the treatment to the more deserving
recipients, the tie-breaker design (TBD) trades off the practical advantages of
the RDD with the statistical advantages of the RCT. The practical costs of the
randomization in TBDs can be hard to quantify in generality, while the
statistical benefits conferred by randomization in TBDs have only been studied
under linear and quadratic models. In this paper, we discuss and quantify the
statistical benefits of TBDs without using parametric modelling assumptions. If
the goal is estimation of the average treatment effect or the treatment effect
at more than one score value, the statistical benefits of using a TBD over an
RDD are apparent. If the goal is nonparametric estimation of the mean treatment
effect at merely one score value, we prove that about 2.8 times more subjects
are needed for an RDD in order to achieve the same asymptotic mean squared
error. We further demonstrate, using both theoretical results and simulations
based on the Angrist and Lavy (1999) classroom size dataset, that larger
choices of the experimental radius for the TBD lead to greater statistical
efficiency.

arXiv link: http://arxiv.org/abs/2101.09605v5

Econometrics arXiv cross-link from General Economics (econ.GN), submitted: 2021-01-23

Inference on the New Keynesian Phillips Curve with Very Many Instrumental Variables

Authors: Max-Sebastian Dovì

Limited-information inference on New Keynesian Phillips Curves (NKPCs) and
other single-equation macroeconomic relations is characterised by weak and
high-dimensional instrumental variables (IVs). Beyond the efficiency concerns
previously raised in the literature, I show by simulation that ad-hoc selection
procedures can lead to substantial biases in post-selection inference. I
propose a Sup Score test that remains valid under dependent data, arbitrarily
weak identification, and a number of IVs that increases exponentially with the
sample size. Conducting inference on a standard NKPC with 359 IVs and 179
observations, I find substantially wider confidence sets than those commonly
found.
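
For intuition, a sup-score-type statistic for a null value of the structural coefficient can be computed as below. This is a stylized sketch on simulated i.i.d. data with a conservative Bonferroni-style critical value; the paper's test additionally handles dependent data and the formal weak-identification asymptotics.

```python
# Stylized sup-score test sketch for H0: beta = beta0 in y_i = d_i * beta + u_i
# with many instruments z_ij; simulated i.i.d. data and a conservative
# Bonferroni-type critical value (the paper's version allows dependence).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, p = 179, 359                               # mimic the sample/IV counts mentioned above
Z = rng.normal(size=(n, p))
d = 0.1 * Z[:, 0] + rng.normal(size=n)        # weakly identified endogenous regressor
beta_true = 0.5
y = d * beta_true + rng.normal(size=n)

def sup_score(beta0, y, d, Z):
    e = y - d * beta0                          # residual under the null
    num = np.abs(Z.T @ e) / np.sqrt(len(y))
    den = np.sqrt((Z**2 * e[:, None]**2).mean(axis=0))
    return np.max(num / den)

alpha = 0.05
crit = norm.ppf(1 - alpha / (2 * p))           # conservative critical value
stat = sup_score(beta_true, y, d, Z)
print(f"sup-score statistic {stat:.2f}, critical value {crit:.2f}, reject: {stat > crit}")
```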

arXiv link: http://arxiv.org/abs/2101.09543v2

Econometrics arXiv updated paper (originally submitted: 2021-01-23)

A Design-Based Perspective on Synthetic Control Methods

Authors: Lea Bottmer, Guido Imbens, Jann Spiess, Merrill Warnick

Since their introduction in Abadie and Gardeazabal (2003), Synthetic Control
(SC) methods have quickly become one of the leading methods for estimating
causal effects in observational studies in settings with panel data. Formal
discussions often motivate SC methods by the assumption that the potential
outcomes were generated by a factor model. Here we study SC methods from a
design-based perspective, assuming a model for the selection of the treated
unit(s) and period(s). We show that the standard SC estimator is generally
biased under random assignment. We propose a Modified Unbiased Synthetic
Control (MUSC) estimator that guarantees unbiasedness under random assignment
and derive its exact, randomization-based, finite-sample variance. We also
propose an unbiased estimator for this variance. We document in settings with
real data that, under random assignment, SC-type estimators can have root
mean-squared errors that are substantially lower than those of other common
estimators. We show that such an improvement is weakly guaranteed if the
treated period is similar to the other periods, for example, if the treated
period was randomly selected. While our results only directly apply in settings
where treatment is assigned randomly, we believe that they can complement
model-based approaches even for observational studies.
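
For background, the baseline SC step referenced here chooses simplex weights on control units by constrained least squares on pre-treatment outcomes. The sketch below illustrates that baseline step on simulated data; it does not implement the MUSC modification or the randomization-based variance.

```python
# Baseline synthetic control step: choose simplex weights on control units to
# match the treated unit's pre-treatment outcomes. Fake data for illustration;
# the MUSC adjustment discussed above is not implemented here.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
T0, J = 20, 10                                  # pre-treatment periods, control units
Y_controls = rng.normal(size=(T0, J)).cumsum(axis=0)
w_true = np.array([0.6, 0.4] + [0.0] * (J - 2))
Y_treated = Y_controls @ w_true + 0.1 * rng.normal(size=T0)

def pre_treatment_mse(w):
    return np.mean((Y_treated - Y_controls @ w) ** 2)

constraints = [{"type": "eq", "fun": lambda w: w.sum() - 1.0}]
bounds = [(0.0, 1.0)] * J
res = minimize(pre_treatment_mse, np.full(J, 1.0 / J),
               bounds=bounds, constraints=constraints)
print("estimated SC weights:", np.round(res.x, 2))
```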

arXiv link: http://arxiv.org/abs/2101.09398v4

Econometrics arXiv updated paper (originally submitted: 2021-01-23)

Yield Spread Selection in Predicting Recession Probabilities: A Machine Learning Approach

Authors: Jaehyuk Choi, Desheng Ge, Kyu Ho Kang, Sungbin Sohn

The literature on using yield curves to forecast recessions customarily uses
the 10-year--three-month Treasury yield spread without verifying the choice of
maturity pair. This study investigates whether the predictive ability of the spread can
be improved by letting a machine learning algorithm identify the best maturity
pair and coefficients. Our comprehensive analysis shows that, despite the
likelihood gain, the machine learning approach does not significantly improve
prediction, owing to the estimation error. This is robust to the forecasting
horizon, control variable, sample period, and oversampling of the recession
observations. Our finding supports the use of the 10-year--three-month spread.
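
The customary benchmark the paper starts from is a binary-response model of a recession indicator on the 10-year minus three-month spread. A minimal sketch with simulated data (illustrative names and numbers only) is below.

```python
# Minimal benchmark sketch: logit of a recession dummy on the 10y-3m yield spread.
# Simulated data for illustration only; the paper uses NBER recessions and
# Treasury yields, and additionally searches over maturity pairs with ML.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 480                                       # monthly observations
spread = rng.normal(1.5, 1.2, size=n)         # 10y minus 3m spread, in percentage points
prob = 1 / (1 + np.exp(-(-1.0 - 1.5 * spread)))   # inverted curve -> higher recession risk
recession = rng.binomial(1, prob)

X = sm.add_constant(spread)
fit = sm.Logit(recession, X).fit(disp=0)
print(fit.params)                             # slope should be negative
print("P(recession | spread = -0.5):", fit.predict(np.array([[1.0, -0.5]]))[0])
```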

arXiv link: http://arxiv.org/abs/2101.09394v2

Econometrics arXiv cross-link from cs.CV (cs.CV), submitted: 2021-01-22

HANA: A HAndwritten NAme Database for Offline Handwritten Text Recognition

Authors: Christian M. Dahl, Torben Johansen, Emil N. Sørensen, Simon Wittrock

Methods for linking individuals across historical data sets, typically in
combination with AI based transcription models, are developing rapidly.
Probably the single most important identifier for linking is personal names.
However, personal names are prone to enumeration and transcription errors and
although modern linking methods are designed to handle such challenges, these
sources of errors are critical and should be minimized. For this purpose,
improved transcription methods and large-scale databases are crucial
components. This paper describes and provides documentation for HANA, a newly
constructed large-scale database which consists of more than 3.3 million names.
The database contains more than 105 thousand unique names with a total of more
than 1.1 million images of personal names, which proves useful for transfer
learning to other settings. We provide three examples hereof, obtaining
significantly improved transcription accuracy on both Danish and US census
data. In addition, we present benchmark results for deep learning models
automatically transcribing the personal names from the scanned documents.
By making more challenging large-scale databases publicly available, we
hope to foster more sophisticated, accurate, and robust models for handwritten
text recognition.

arXiv link: http://arxiv.org/abs/2101.10862v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2021-01-21

Discrete Choice Analysis with Machine Learning Capabilities

Authors: Youssef M. Aboutaleb, Mazen Danaf, Yifei Xie, Moshe Ben-Akiva

This paper discusses capabilities that are essential to models applied in
policy analysis settings and the limitations of direct applications of
off-the-shelf machine learning methodologies to such settings. Traditional
econometric methodologies for building discrete choice models for policy
analysis involve combining data with modeling assumptions guided by
subject-matter considerations. Such considerations are typically most useful in
specifying the systematic component of random utility discrete choice models
but are often of limited aid in determining the form of the random
component. We identify an area where machine learning paradigms can be
leveraged, namely in specifying and systematically selecting the best
specification of the random component of the utility equations. We review two
recent novel applications where mixed-integer optimization and cross-validation
are used to algorithmically select optimal specifications for the random
utility components of nested logit and logit mixture models subject to
interpretability constraints.

arXiv link: http://arxiv.org/abs/2101.10261v1

Econometrics arXiv paper, submitted: 2021-01-17

Decomposition of Bilateral Trade Flows Using a Three-Dimensional Panel Data Model

Authors: Yufeng Mao, Bin Peng, Mervyn Silvapulle, Param Silvapulle, Yanrong Yang

This study decomposes bilateral trade flows using a three-dimensional
panel data model. Under the scenario that all three dimensions diverge to
infinity, we propose an estimation approach to identify the number of global
shocks and country-specific shocks sequentially, and establish the asymptotic
theories accordingly. From a practical point of view, being able to separate
pervasive and nonpervasive shocks in multi-dimensional panel data is
crucial for a range of applications, such as international financial linkages
and migration flows. In the numerical studies, we first conduct intensive
simulations to examine the theoretical findings, and then use the proposed
approach to investigate the international trade flows from two major trading
groups (APEC and EU) over 1982-2019, and quantify the network of bilateral
trade.

arXiv link: http://arxiv.org/abs/2101.06805v1

Econometrics arXiv paper, submitted: 2021-01-16

GDP Forecasting using Payments Transaction Data

Authors: Arunav Das

UK GDP data are published with a lag of more than a month and are often
revised for prior periods. This paper considers moving beyond the historic GDP
measure to a more dynamic approach that uses bank account, cheque, and credit
card payment transactions as possible predictors for a faster, near real-time
measure of GDP. Historical time-series data from various public sources on
payment types, values, and volumes, together with nominal UK GDP, were used for
the analysis. Low-value payments were selected for a simple ordinary least
squares linear regression, with mixed results regarding the explanatory power of
the model and its reliability as measured by the distribution and variance of
the residuals. Future research could expand this work by splitting the dataset
by periods of economic shocks to further test the OLS method, or by exploring
generalized least squares or an autoregression on the GDP time series itself.
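
A minimal sketch of the simple OLS exercise described above, using simulated stand-ins for the payment and GDP series (names and numbers are illustrative, not the actual public-domain data):

```python
# Simple OLS sketch: regress nominal GDP growth on growth of low-value payment
# values, using simulated data as a stand-in for the public-domain series.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 60                                         # quarters
payments_growth = rng.normal(0.5, 1.0, size=n)
gdp_growth = 0.2 + 0.6 * payments_growth + rng.normal(0, 0.8, size=n)

X = sm.add_constant(payments_growth)
ols = sm.OLS(gdp_growth, X).fit()
print(ols.summary().tables[1])                 # coefficient table
print("R-squared:", round(ols.rsquared, 3))
```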

arXiv link: http://arxiv.org/abs/2101.06478v1

Econometrics arXiv paper, submitted: 2021-01-15

Causal Gradient Boosting: Boosted Instrumental Variable Regression

Authors: Edvard Bakhitov, Amandeep Singh

Recent advances in the literature have demonstrated that standard supervised
learning algorithms are ill-suited for problems with endogenous explanatory
variables. To correct for the endogeneity bias, many variants of nonparametric
instrumental variable regression methods have been developed. In this paper, we
propose an alternative algorithm called boostIV that builds on the traditional
gradient boosting algorithm and corrects for the endogeneity bias. The
algorithm is very intuitive and resembles an iterative version of the standard
2SLS estimator. Moreover, our approach is data driven, meaning that the
researcher does not have to take a stance on either the form of the target
function approximation or the choice of instruments. We demonstrate that our
estimator is consistent under mild conditions. We carry out extensive Monte
Carlo simulations to demonstrate the finite sample performance of our algorithm
compared to other recently developed methods. We show that boostIV is at worst
on par with the existing methods and on average significantly outperforms them.
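
The boostIV algorithm itself is not reproduced here. As a rough illustration of combining gradient boosting with instruments, the sketch below uses a simple two-stage plug-in: boost the endogenous regressor on the instruments, then boost the outcome on the first-stage fit. This stand-in conveys the flavor but is not the authors' estimator.

```python
# Rough two-stage illustration of using gradient boosting with instruments:
# stage 1 boosts the endogenous regressor on the instruments, stage 2 boosts the
# outcome on the stage-1 fitted values. NOT the boostIV algorithm itself.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(5)
n = 2000
Z = rng.normal(size=(n, 3))                    # instruments
v = rng.normal(size=n)                         # unobserved confounder
x = np.sin(Z[:, 0]) + 0.5 * Z[:, 1] + v + 0.3 * rng.normal(size=n)
y = np.cos(x) + 2.0 * v + 0.3 * rng.normal(size=n)   # endogeneity through v

stage1 = GradientBoostingRegressor().fit(Z, x)
x_hat = stage1.predict(Z)
stage2 = GradientBoostingRegressor().fit(x_hat.reshape(-1, 1), y)

grid = np.linspace(-2, 2, 5).reshape(-1, 1)
print("fitted structural function on a grid:", np.round(stage2.predict(grid), 2))
```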

arXiv link: http://arxiv.org/abs/2101.06078v1

Econometrics arXiv updated paper (originally submitted: 2021-01-14)

Using Monotonicity Restrictions to Identify Models with Partially Latent Covariates

Authors: Minji Bang, Wayne Yuan Gao, Andrew Postlewaite, Holger Sieg

This paper develops a new method for identifying econometric models with
partially latent covariates. Such data structures arise in industrial
organization and labor economics settings where data are collected using an
input-based sampling strategy, e.g., if the sampling unit is one of multiple
labor input factors. We show that the latent covariates can be
nonparametrically identified, if they are functions of a common shock
satisfying some plausible monotonicity assumptions. With the latent covariates
identified, semiparametric estimation of the outcome equation proceeds within a
standard IV framework that accounts for the endogeneity of the covariates. We
illustrate the usefulness of our method using a new application that focuses on
the production functions of pharmacies. We find that differences in technology
between chains and independent pharmacies may partially explain the observed
transformation of the industry structure.

arXiv link: http://arxiv.org/abs/2101.05847v5

Econometrics arXiv cross-link from math.PR (math.PR), submitted: 2021-01-14

Explicit non-asymptotic bounds for the distance to the first-order Edgeworth expansion

Authors: Alexis Derumigny, Lucas Girard, Yannick Guyonvarch

In this article, we obtain explicit bounds on the uniform distance between
the cumulative distribution function of a standardized sum $S_n$ of $n$
independent centered random variables with moments of order four and its
first-order Edgeworth expansion. Those bounds are valid for any sample size
with $n^{-1/2}$ rate under moment conditions only and $n^{-1}$ rate under
additional regularity constraints on the tail behavior of the characteristic
function of $S_n$. In both cases, the bounds are further sharpened if the
variables involved in $S_n$ are unskewed. We also derive new Berry-Esseen-type
bounds from our results and discuss their links with existing ones. We finally
apply our results to illustrate the lack of finite-sample validity of one-sided
tests based on the normal approximation of the mean.
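
For reference, the first-order (one-term) Edgeworth expansion against which the distance is measured takes the familiar form below, written here in generic notation for an i.i.d. standardized sum with skewness $\lambda_3$ (the paper's setting of independent, not necessarily identically distributed, variables is more general):

$$F_{S_n}(x) \;=\; \Phi(x) \;-\; \frac{\lambda_3}{6\sqrt{n}}\,(x^2 - 1)\,\phi(x) \;+\; o(n^{-1/2}),$$

where $\Phi$ and $\phi$ denote the standard normal cdf and density. The bounds in the paper make the remainder explicit and non-asymptotic.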

arXiv link: http://arxiv.org/abs/2101.05780v3

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2021-01-13

Assessing the Impact: Does an Improvement to a Revenue Management System Lead to an Improved Revenue?

Authors: Greta Laage, Emma Frejinger, Andrea Lodi, Guillaume Rabusseau

Airlines and other industries have been making use of sophisticated Revenue
Management Systems to maximize revenue for decades. While improving the
different components of these systems has been the focus of numerous studies,
estimating the impact of such improvements on the revenue has been overlooked
in the literature despite its practical importance. Indeed, quantifying the
benefit of a change in a system serves as support for investment decisions.
This is a challenging problem as it corresponds to the difference between the
generated value and the value that would have been generated keeping the system
as before. The latter is not observable. Moreover, the expected impact can be
small in relative value. In this paper, we cast the problem as counterfactual
prediction of unobserved revenue. The impact on revenue is then the difference
between the observed and the estimated revenue. The originality of this work
lies in the innovative application of econometric methods proposed for
macroeconomic applications to a new problem setting. Broadly applicable, the
approach benefits from requiring only revenue data observed for
origin-destination pairs in the airline's network on each day, before and
after a change in the system is applied. We report results using real
large-scale data from Air Canada. We compare a deep neural network
counterfactual prediction model with econometric models. They achieve
errors of 1% and 1.1%, respectively, on the counterfactual revenue predictions,
and allow small impacts (on the order of 2%) to be estimated accurately.

arXiv link: http://arxiv.org/abs/2101.10249v2

Econometrics arXiv updated paper (originally submitted: 2021-01-12)

Full-Information Estimation of Heterogeneous Agent Models Using Macro and Micro Data

Authors: Laura Liu, Mikkel Plagborg-Møller

We develop a generally applicable full-information inference method for
heterogeneous agent models, combining aggregate time series data and repeated
cross sections of micro data. To handle unobserved aggregate state variables
that affect cross-sectional distributions, we compute a numerically unbiased
estimate of the model-implied likelihood function. Employing the likelihood
estimate in a Markov Chain Monte Carlo algorithm, we obtain fully efficient and
valid Bayesian inference. Evaluation of the micro part of the likelihood lends
itself naturally to parallel computing. Numerical illustrations in models with
heterogeneous households or firms demonstrate that the proposed
full-information method substantially sharpens inference relative to using only
macro data, and for some parameters micro data is essential for identification.

arXiv link: http://arxiv.org/abs/2101.04771v2

Econometrics arXiv updated paper (originally submitted: 2021-01-12)

Empirical Decomposition of the IV-OLS Gap with Heterogeneous and Nonlinear Effects

Authors: Shoya Ishimaru

This study proposes an econometric framework to interpret and empirically
decompose the difference between IV and OLS estimates given by a linear
regression model when the true causal effects of the treatment are nonlinear in
treatment levels and heterogeneous across covariates. I show that the IV-OLS
coefficient gap consists of three estimable components: the difference in
weights on the covariates, the difference in weights on the treatment levels,
and the difference in identified marginal effects that arises from endogeneity
bias. Applications of this framework to return-to-schooling estimates
demonstrate the empirical relevance of this distinction in properly
interpreting the IV-OLS gap.

arXiv link: http://arxiv.org/abs/2101.04346v5

Econometrics arXiv updated paper (originally submitted: 2021-01-11)

Dynamic Ordering Learning in Multivariate Forecasting

Authors: Bruno P. C. Levy, Hedibert F. Lopes

In many fields where the main goal is to produce sequential forecasts for
decision making problems, the good understanding of the contemporaneous
relations among different series is crucial for the estimation of the
covariance matrix. In recent years, the modified Cholesky decomposition
appeared as a popular approach to covariance matrix estimation. However, its
main drawback relies on the imposition of the series ordering structure. In
this work, we propose a highly flexible and fast method to deal with the
problem of ordering uncertainty in a dynamic fashion with the use of Dynamic
Order Probabilities. We apply the proposed method in two different forecasting
contexts. The first is a dynamic portfolio allocation problem, where the
investor is able to learn the contemporaneous relationships among different
currencies improving final decisions and economic performance. The second is a
macroeconomic application, where the econometrician can adapt sequentially to
new economic environments, switching the contemporaneous relations among
macroeconomic variables over time.
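
The modified Cholesky decomposition referred to here writes the covariance matrix through a unit lower-triangular matrix of regression coefficients and a diagonal matrix of residual variances, which is where the series ordering enters. A small numpy sketch of that decomposition (illustrative only; the paper's dynamic order probabilities are not implemented):

```python
# Modified Cholesky decomposition sketch: T @ Sigma @ T.T = D, where T is unit
# lower-triangular (rows hold minus the coefficients from regressing each series
# on the series ordered before it) and D is diagonal (residual variances).
# The ordering of the series determines T and D, which is the sensitivity the
# paper addresses with dynamic order probabilities (not implemented here).
import numpy as np

rng = np.random.default_rng(6)
k = 4
A = rng.normal(size=(k, k))
Sigma = A @ A.T + k * np.eye(k)                # a positive-definite covariance matrix

T = np.eye(k)
D = np.zeros(k)
D[0] = Sigma[0, 0]
for j in range(1, k):
    S11 = Sigma[:j, :j]
    s12 = Sigma[:j, j]
    phi = np.linalg.solve(S11, s12)            # regression of series j on series 0..j-1
    T[j, :j] = -phi
    D[j] = Sigma[j, j] - s12 @ phi             # residual variance

print("max |T Sigma T' - diag(D)| =", np.abs(T @ Sigma @ T.T - np.diag(D)).max())
```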

arXiv link: http://arxiv.org/abs/2101.04164v3

Econometrics arXiv paper, submitted: 2021-01-10

Bootstrapping Non-Stationary Stochastic Volatility

Authors: H. Peter Boswijk, Giuseppe Cavaliere, Anders Rahbek, Iliyan Georgiev

In this paper we investigate how the bootstrap can be applied to time series
regressions when the volatility of the innovations is random and
non-stationary. The volatility of many economic and financial time series
displays persistent changes and possible non-stationarity. However, the theory
of the bootstrap for such models has focused on deterministic changes of the
unconditional variance and little is known about the performance and the
validity of the bootstrap when the volatility is driven by a non-stationary
stochastic process. This includes near-integrated volatility processes as well
as near-integrated GARCH processes. This paper develops conditions for
bootstrap validity in time series regressions with non-stationary, stochastic
volatility. We show that in such cases the distribution of bootstrap statistics
(conditional on the data) is random in the limit. Consequently, the
conventional approaches to proving bootstrap validity, involving weak
convergence in probability of the bootstrap statistic, fail to deliver the
required results. Instead, we use the concept of `weak convergence in
distribution' to develop and establish novel conditions for validity of the
wild bootstrap, conditional on the volatility process. We apply our results to
several testing problems in the presence of non-stationary stochastic
volatility, including testing in a location model, testing for structural
change and testing for an autoregressive unit root. Sufficient conditions for
bootstrap validity include the absence of statistical leverage effects, i.e.,
correlation between the error process and its future conditional variance. The
results are illustrated using Monte Carlo simulations, which indicate that the
wild bootstrap leads to size control even in small samples.
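
As a concrete illustration of the wild bootstrap in the simplest setting mentioned (a location model), the sketch below bootstraps a t-statistic for the mean using Rademacher multipliers under non-stationary volatility. It is a generic wild bootstrap for intuition, not the paper's theory.

```python
# Wild bootstrap sketch for a t-test of the mean in a location model
# y_t = mu + sigma_t * e_t with non-stationary volatility sigma_t.
# Rademacher multipliers preserve the heteroskedasticity pattern.
import numpy as np

rng = np.random.default_rng(7)
T = 200
sigma = np.linspace(0.5, 3.0, T)               # persistent change in volatility
y = 0.0 + sigma * rng.normal(size=T)           # true mean is zero

def t_stat(x):
    return np.sqrt(len(x)) * x.mean() / x.std(ddof=1)

stat = t_stat(y)
resid = y - y.mean()
B = 999
boot = np.empty(B)
for b in range(B):
    eta = rng.choice([-1.0, 1.0], size=T)      # Rademacher multipliers
    y_star = resid * eta                       # impose the null (mu = 0)
    boot[b] = t_stat(y_star)

pval = np.mean(np.abs(boot) >= np.abs(stat))
print(f"t-statistic {stat:.2f}, wild-bootstrap p-value {pval:.3f}")
```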

arXiv link: http://arxiv.org/abs/2101.03562v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2021-01-05

Online Multivalid Learning: Means, Moments, and Prediction Intervals

Authors: Varun Gupta, Christopher Jung, Georgy Noarov, Mallesh M. Pai, Aaron Roth

We present a general, efficient technique for providing contextual
predictions that are "multivalid" in various senses, against an online sequence
of adversarially chosen examples $(x,y)$. This means that the resulting
estimates correctly predict various statistics of the labels $y$ not just
marginally -- as averaged over the sequence of examples -- but also
conditionally on $x \in G$ for any $G$ belonging to an arbitrary intersecting
collection of groups $\mathcal{G}$.
We provide three instantiations of this framework. The first is mean
prediction, which corresponds to an online algorithm satisfying the notion of
multicalibration from Hebert-Johnson et al. The second is variance and higher
moment prediction, which corresponds to an online algorithm satisfying the
notion of mean-conditioned moment multicalibration from Jung et al. Finally, we
define a new notion of prediction interval multivalidity, and give an algorithm
for finding prediction intervals which satisfy it. Because our algorithms
handle adversarially chosen examples, they can equally well be used to predict
statistics of the residuals of arbitrary point prediction methods, giving rise
to very general techniques for quantifying the uncertainty of predictions of
black box algorithms, even in an online adversarial setting. When instantiated
for prediction intervals, this solves a similar problem as conformal
prediction, but in an adversarial environment and with multivalidity guarantees
stronger than simple marginal coverage guarantees.

arXiv link: http://arxiv.org/abs/2101.01739v1

Econometrics arXiv updated paper (originally submitted: 2021-01-04)

Partial Identification in Nonseparable Binary Response Models with Endogenous Regressors

Authors: Jiaying Gu, Thomas M. Russell

This paper considers (partial) identification of a variety of counterfactual
parameters in binary response models with possibly endogenous regressors. Our
framework allows for nonseparable index functions with multi-dimensional latent
variables, and does not require parametric distributional assumptions. We
leverage results on hyperplane arrangements and cell enumeration from the
literature on computational geometry in order to provide a tractable means of
computing the identified set. We demonstrate how various functional form,
independence, and monotonicity assumptions can be imposed as constraints in our
optimization procedure to tighten the identified set. Finally, we apply our
method to study the effects of health insurance on the decision to seek medical
treatment.

arXiv link: http://arxiv.org/abs/2101.01254v5

Econometrics arXiv paper, submitted: 2021-01-04

Regression Discontinuity Design with Many Thresholds

Authors: Marinho Bertanha

Numerous empirical studies employ regression discontinuity designs with
multiple cutoffs and heterogeneous treatments. A common practice is to
normalize all the cutoffs to zero and estimate one effect. This procedure
identifies the average treatment effect (ATE) on the observed distribution of
individuals local to existing cutoffs. However, researchers often want to make
inferences on more meaningful ATEs, computed over general counterfactual
distributions of individuals, rather than simply the observed distribution of
individuals local to existing cutoffs. This paper proposes a consistent and
asymptotically normal estimator for such ATEs when heterogeneity follows a
non-parametric function of cutoff characteristics in the sharp case. The
proposed estimator converges at the minimax optimal rate of root-n for a
specific choice of tuning parameters. Identification in the fuzzy case, with
multiple cutoffs, is impossible unless heterogeneity follows a
finite-dimensional function of cutoff characteristics. Under parametric
heterogeneity, this paper proposes an ATE estimator for the fuzzy case that
optimally combines observations to maximize its precision.

arXiv link: http://arxiv.org/abs/2101.01245v1

Econometrics arXiv updated paper (originally submitted: 2021-01-04)

Better Bunching, Nicer Notching

Authors: Marinho Bertanha, Andrew H. McCallum, Nathan Seegert

This paper studies the bunching identification strategy for an elasticity
parameter that summarizes agents' responses to changes in slope (kink) or
intercept (notch) of a schedule of incentives. We show that current bunching
methods may be very sensitive to implicit assumptions in the literature about
unobserved individual heterogeneity. We overcome this sensitivity concern with
new non- and semi-parametric estimators. Our estimators allow researchers to
show how bunching elasticities depend on different identifying assumptions and
when elasticities are robust to them. We follow the literature and derive our
methods in the context of the iso-elastic utility model and an income tax
schedule that creates a piecewise-linear budget constraint. We demonstrate that
bunching behavior provides robust estimates for self-employed, unmarried
taxpayers in the context of the U.S. Earned Income Tax Credit. In contrast,
estimates for self-employed, married taxpayers depend on specific identifying
assumptions, which highlights the value of our approach. We provide
the Stata package "bunching" to implement our procedures.

arXiv link: http://arxiv.org/abs/2101.01170v3

Econometrics arXiv updated paper (originally submitted: 2021-01-04)

Should Humans Lie to Machines: The Incentive Compatibility of Lasso and General Weighted Lasso

Authors: Mehmet Caner, Kfir Eliaz

We consider situations where a user feeds her attributes to a machine
learning method that tries to predict her best option based on a random sample
of other users. The predictor is incentive-compatible if the user has no
incentive to misreport her covariates. Focusing on the popular Lasso estimation
technique, we borrow tools from high-dimensional statistics to characterize
sufficient conditions that ensure that Lasso is incentive compatible in large
samples. We extend our results to the Conservative Lasso estimator and provide
new moment bounds for this generalized weighted version of Lasso. Our results
show that incentive compatibility is achieved if the tuning parameter is kept
above some threshold. We present simulations that illustrate how this can be
done in practice.
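
To see the role of the tuning parameter concretely, the toy sketch below fits a Lasso at a small and a large penalty and compares the prediction a user receives when reporting her covariates truthfully versus with a small misreport; the larger penalty makes the prediction less sensitive to the misreport. This is only an illustration, not the paper's formal condition.

```python
# Illustration of the tuning-parameter threshold idea: with a larger Lasso penalty,
# predictions change little when a user misreports a covariate slightly.
# Toy example; the paper's condition is a formal large-sample statement.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(8)
n, p = 500, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]
y = X @ beta + rng.normal(size=n)

x_true = rng.normal(size=p)
x_misreport = x_true.copy()
x_misreport[0] += 0.2                            # small misreport of one attribute

for alpha in (0.01, 0.5):                        # small vs. large penalty
    model = Lasso(alpha=alpha).fit(X, y)
    gap = model.predict([x_misreport])[0] - model.predict([x_true])[0]
    print(f"alpha={alpha}: prediction change from misreporting = {gap:.3f}")
```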

arXiv link: http://arxiv.org/abs/2101.01144v2

Econometrics arXiv updated paper (originally submitted: 2021-01-03)

Estimation of Tempered Stable Lévy Models of Infinite Variation

Authors: José E. Figueroa-López, Ruoting Gong, Yuchen Han

We propose a new method for the estimation of a semiparametric tempered
stable L\'{e}vy model. The estimation procedure iteratively combines an
approximate semiparametric method-of-moments estimator, the Truncated Realized
Quadratic Variation (TRQV), with a newly found small-time high-order
approximation for the optimal threshold of the TRQV of tempered stable
processes. The method is tested via simulations to estimate the volatility and
the Blumenthal-Getoor index of the generalized CGMY model as well as the
integrated volatility of a Heston-type model with CGMY jumps. The method
outperforms other efficient alternatives proposed in the literature when
working with a L\'evy process (i.e., the volatility is constant), or when the
index of jump intensity $Y$ is larger than $3/2$ in the presence of stochastic
volatility.
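
The TRQV ingredient is simple to compute: sum the squared increments whose magnitude is below a threshold, which filters out jumps. A minimal sketch with an ad hoc power-law threshold (the paper derives the optimal threshold) is below.

```python
# Truncated realized quadratic variation (TRQV) sketch: sum of squared increments
# whose magnitude is below a threshold, which filters out jumps. The threshold
# here is an ad hoc power-law choice; the paper derives an optimal threshold.
import numpy as np

rng = np.random.default_rng(9)
n = 10_000
dt = 1.0 / n
sigma = 0.3
increments = sigma * np.sqrt(dt) * rng.normal(size=n)
jumps = rng.binomial(1, 0.001, size=n) * rng.normal(0, 0.5, size=n)
dX = increments + jumps                          # log-price increments with rare jumps

eps = 4 * sigma * dt**0.49                       # ad hoc truncation level ~ C * dt^w
trqv = np.sum(dX**2 * (np.abs(dX) <= eps))
rqv = np.sum(dX**2)                              # untruncated realized variance
print(f"integrated variance (true) {sigma**2:.4f}, TRQV {trqv:.4f}, RQV {rqv:.4f}")
```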

arXiv link: http://arxiv.org/abs/2101.00565v2

Econometrics arXiv paper, submitted: 2021-01-02

COVID-19 spreading in financial networks: A semiparametric matrix regression model

Authors: Billio Monica, Casarin Roberto, Costola Michele, Iacopini Matteo

Network models represent a useful tool to describe the complex set of
financial relationships among heterogeneous firms in the system. In this paper,
we propose a new semiparametric model for temporal multilayer causal networks
with both intra- and inter-layer connectivity. A Bayesian model with a
hierarchical mixture prior distribution is assumed to capture heterogeneity in
the response of the network edges to a set of risk factors including the
European COVID-19 cases. We measure the financial connectedness arising from
the interactions between two layers defined by stock returns and volatilities.
In the empirical analysis, we study the topology of the network before and
after the spreading of the COVID-19 disease.

arXiv link: http://arxiv.org/abs/2101.00422v1

Econometrics arXiv updated paper (originally submitted: 2021-01-02)

The Law of Large Numbers for Large Stable Matchings

Authors: Jacob Schwartz, Kyungchul Song

In many empirical studies of a large two-sided matching market (such as in a
college admissions problem), the researcher performs statistical inference
under the assumption that they observe a random sample from a large matching
market. In this paper, we consider a setting in which the researcher observes
either all or a nontrivial fraction of outcomes from a stable matching. We
establish a concentration inequality for empirical matching probabilities
assuming strong correlation among the colleges' preferences while allowing
students' preferences to be fully heterogeneous. Our concentration inequality
yields laws of large numbers for the empirical matching probabilities and other
statistics commonly used in empirical analyses of a large matching market. To
illustrate the usefulness of our concentration inequality, we prove consistency
for estimators of conditional matching probabilities and measures of positive
assortative matching.

arXiv link: http://arxiv.org/abs/2101.00399v8

Econometrics arXiv paper, submitted: 2020-12-31

Assessing Sensitivity to Unconfoundedness: Estimation and Inference

Authors: Matthew A. Masten, Alexandre Poirier, Linqi Zhang

This paper provides a set of methods for quantifying the robustness of
treatment effects estimated using the unconfoundedness assumption (also known
as selection on observables or conditional independence). Specifically, we
estimate and do inference on bounds on various treatment effect parameters,
like the average treatment effect (ATE) and the average effect of treatment on
the treated (ATT), under nonparametric relaxations of the unconfoundedness
assumption indexed by a scalar sensitivity parameter c. These relaxations allow
for limited selection on unobservables, depending on the value of c. For large
enough c, these bounds equal the no-assumptions bounds. Using a non-standard
bootstrap method, we show how to construct confidence bands for these bound
functions which are uniform over all values of c. We illustrate these methods
with an empirical application to effects of the National Supported Work
Demonstration program. We implement these methods in a companion Stata module
for easy use in practice.

arXiv link: http://arxiv.org/abs/2012.15716v1

Econometrics arXiv paper, submitted: 2020-12-31

Breaking Ties: Regression Discontinuity Design Meets Market Design

Authors: Atila Abdulkadiroglu, Joshua D. Angrist, Yusuke Narita, Parag Pathak

Many schools in large urban districts have more applicants than seats.
Centralized school assignment algorithms ration seats at over-subscribed
schools using randomly assigned lottery numbers, non-lottery tie-breakers like
test scores, or both. The New York City public high school match illustrates
the latter, using test scores and other criteria to rank applicants at
“screened” schools, combined with lottery tie-breaking at unscreened
“lottery” schools. We show how to identify causal effects of school
attendance in such settings. Our approach generalizes regression discontinuity
methods to allow for multiple treatments and multiple running variables, some
of which are randomly assigned. The key to this generalization is a local
propensity score that quantifies the school assignment probabilities induced by
lottery and non-lottery tie-breakers. The local propensity score is applied in
an empirical assessment of the predictive value of New York City's school
report cards. Schools that receive a high grade indeed improve SAT math scores
and increase graduation rates, though by much less than OLS estimates suggest.
Selection bias in OLS estimates is egregious for screened schools.

arXiv link: http://arxiv.org/abs/2101.01093v1

Econometrics arXiv updated paper (originally submitted: 2020-12-30)

Assessing the Sensitivity of Synthetic Control Treatment Effect Estimates to Misspecification Error

Authors: Billy Ferguson, Brad Ross

We propose a sensitivity analysis for Synthetic Control (SC) treatment effect
estimates to interrogate the assumption that the SC method is well-specified,
namely that choosing weights to minimize pre-treatment prediction error yields
accurate predictions of counterfactual post-treatment outcomes. Our data-driven
procedure recovers the set of treatment effects consistent with the assumption
that the misspecification error incurred by the SC method is at most the
observable misspecification error incurred when using the SC estimator to
predict the outcomes of some control unit. We show that under one definition of
misspecification error, our procedure provides a simple, geometric motivation
for comparing the estimated treatment effect to the distribution of placebo
residuals to assess estimate credibility. When we apply our procedure to
several canonical studies that report SC estimates, we broadly confirm the
conclusions drawn by the source papers.

arXiv link: http://arxiv.org/abs/2012.15367v3

Econometrics arXiv updated paper (originally submitted: 2020-12-30)

Adversarial Estimation of Riesz Representers

Authors: Victor Chernozhukov, Whitney Newey, Rahul Singh, Vasilis Syrgkanis

Many causal parameters are linear functionals of an underlying regression.
The Riesz representer is a key component in the asymptotic variance of a
semiparametrically estimated linear functional. We propose an adversarial
framework to estimate the Riesz representer using general function spaces. We
prove a nonasymptotic mean square rate in terms of an abstract quantity called
the critical radius, then specialize it for neural networks, random forests,
and reproducing kernel Hilbert spaces as leading cases. Our estimators are
highly compatible with targeted and debiased machine learning with sample
splitting; our guarantees directly verify general conditions for inference that
allow mis-specification. We also use our guarantees to prove inference without
sample splitting, based on stability or complexity. Our estimators achieve
nominal coverage in highly nonlinear simulations where some previous methods
break down. They shed new light on the heterogeneous effects of matching
grants.

arXiv link: http://arxiv.org/abs/2101.00009v3

Econometrics arXiv updated paper (originally submitted: 2020-12-29)

A Pairwise Strategic Network Formation Model with Group Heterogeneity: With an Application to International Travel

Authors: Tadao Hoshino

In this study, we consider a pairwise network formation model in which each
dyad of agents strategically determines the link status between them. Our model
allows the agents to have unobserved group heterogeneity in the propensity of
link formation. For the model estimation, we propose a three-step maximum
likelihood (ML) method. First, we obtain consistent estimates for the
heterogeneity parameters at the individual level using the ML estimator. Second, we
estimate the latent group structure using the binary segmentation algorithm
based on the results obtained from the first step. Finally, based on the
estimated group membership, we re-execute the ML estimation. Under certain
regularity conditions, we show that the proposed estimator is asymptotically
unbiased and distributed as normal at the parametric rate. As an empirical
illustration, we focus on the network data of international visa-free travels.
The results indicate the presence of significant strategic complementarity and
a certain level of degree heterogeneity in the network formation behavior.

arXiv link: http://arxiv.org/abs/2012.14886v2

Econometrics arXiv updated paper (originally submitted: 2020-12-29)

Bias-Aware Inference in Regularized Regression Models

Authors: Timothy B. Armstrong, Michal Kolesár, Soonwoo Kwon

We consider inference on a scalar regression coefficient under a constraint
on the magnitude of the control coefficients. A class of estimators based on a
regularized propensity score regression is shown to exactly solve a tradeoff
between worst-case bias and variance. We derive confidence intervals (CIs)
based on these estimators that are bias-aware: they account for the possible
bias of the estimator. Under homoskedastic Gaussian errors, these estimators
and CIs are near-optimal in finite samples for MSE and CI length. We also
provide conditions for asymptotic validity of the CI with unknown and possibly
heteroskedastic error distribution, and derive novel optimal rates of
convergence under high-dimensional asymptotics that allow the number of
regressors to increase more quickly than the number of observations. Extensive
simulations and an empirical application illustrate the performance of our
methods.
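
For intuition, a bias-aware interval widens the usual one using a critical value that accounts for the worst-case bias-to-standard-error ratio. The sketch below computes such a critical value numerically; it is a generic construction under an assumed bias bound, not the paper's specific estimator.

```python
# Bias-aware critical value sketch: if an estimator is approximately N(bias, se^2)
# with |bias| <= B, then estimate +/- cv(B/se) * se has coverage >= 1 - alpha,
# where cv(b) solves P(|N(b,1)| <= cv) = 1 - alpha. Generic construction for intuition.
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def bias_aware_cv(b, alpha=0.05):
    """Critical value for worst-case bias-to-se ratio b."""
    coverage = lambda c: norm.cdf(c - b) - norm.cdf(-c - b) - (1 - alpha)
    return brentq(coverage, norm.ppf(1 - alpha / 2), b + 10.0)

estimate, se, worst_case_bias = 1.2, 0.4, 0.3
cv = bias_aware_cv(worst_case_bias / se)
print(f"cv = {cv:.3f} (vs. 1.960 without bias)")
print(f"bias-aware 95% CI: [{estimate - cv*se:.3f}, {estimate + cv*se:.3f}]")
```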

arXiv link: http://arxiv.org/abs/2012.14823v2

Econometrics arXiv updated paper (originally submitted: 2020-12-29)

Bayesian analysis of seasonally cointegrated VAR model

Authors: Justyna Wróblewska

The paper aims to develop a Bayesian seasonally cointegrated model for
quarterly data. We propose the prior structure, derive the set of full
conditional posterior distributions, and propose the sampling scheme. The
identification of cointegrating spaces is obtained via orthonormality
restrictions imposed on vectors spanning them. In the case of annual frequency,
the cointegrating vectors are complex, which should be taken into account when
identifying them. The point estimation of the cointegrating spaces is also
discussed. The presented methods are illustrated by a simulation experiment and
are employed in the analysis of money and prices in the Polish economy.

arXiv link: http://arxiv.org/abs/2012.14820v2

Econometrics arXiv paper, submitted: 2020-12-29

The impact of Climate on Economic and Financial Cycles: A Markov-switching Panel Approach

Authors: Monica Billio, Roberto Casarin, Enrica De Cian, Malcolm Mistry, Anthony Osuntuyi

This paper examines the impact of climate shocks on 13 European economies
analysing jointly business and financial cycles, in different phases and
disentangling the effects for different sector channels. A Bayesian Panel
Markov-switching framework is proposed to jointly estimate the impact of
extreme weather events on the economies as well as the interaction between
business and financial cycles. Results from the empirical analysis suggest that
extreme weather events have asymmetric impacts across the different phases of
the economy and heterogeneous impacts across the EU countries. Moreover, we highlight how
the manufacturing output, a component of the industrial production index,
constitutes the main channel through which climate shocks impact the EU
economies.

arXiv link: http://arxiv.org/abs/2012.14693v1

Econometrics arXiv updated paper (originally submitted: 2020-12-27)

Time-Transformed Test for the Explosive Bubbles under Non-stationary Volatility

Authors: Eiji Kurozumi, Anton Skrobotov, Alexey Tsarev

This paper is devoted to testing for the explosive bubble under time-varying
non-stationary volatility. Because the limiting distribution of the seminal
Phillips et al. (2011) test depends on the variance function and usually
requires a bootstrap implementation under heteroskedasticity, we construct the
test based on a deformation of the time domain. The proposed test is
asymptotically pivotal under the null hypothesis and its limiting distribution
coincides with that of the standard test under homoskedasticity, so that the
test does not require computationally extensive methods for inference.
Appealing finite sample properties are demonstrated through Monte-Carlo
simulations. An empirical application demonstrates that the upsurge behavior of
cryptocurrency time series in the middle of the sample is partially explained
by the volatility change.

arXiv link: http://arxiv.org/abs/2012.13937v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2020-12-26

Weighting-Based Treatment Effect Estimation via Distribution Learning

Authors: Dongcheng Zhang, Kunpeng Zhang

Existing weighting methods for treatment effect estimation are often built
upon the idea of propensity scores or covariate balance. They usually impose
strong assumptions on treatment assignment or outcome model to obtain unbiased
estimation, such as linearity or specific functional forms, which easily leads
to the major drawback of model mis-specification. In this paper, we aim to
alleviate these issues by developing a distribution learning-based weighting
method. We first learn the true underlying distribution of covariates
conditioned on treatment assignment, then leverage the ratio of covariates'
density in the treatment group to that of the control group as the weight for
estimating treatment effects. Specifically, we propose to approximate the
distribution of covariates in both treatment and control groups through
invertible transformations via change of variables. To demonstrate the
superiority, robustness, and generalizability of our method, we conduct
extensive experiments using synthetic and real data. From the experiment
results, we find that our method for estimating average treatment effect on
treated (ATT) with observational data outperforms several cutting-edge
weighting-only benchmarking methods, and it maintains its advantage under a
doubly-robust estimation framework that combines weighting with some advanced
outcome modeling methods.
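
A stripped-down illustration of the weighting idea: estimate the covariate densities in the treated and control groups and weight control units by their ratio when computing the ATT. The sketch below uses Gaussian kernel density estimates as a stand-in for the paper's invertible-transformation (flow-type) density learner.

```python
# Density-ratio weighting sketch for the ATT: weight control units by
# p(x | treated) / p(x | control) so the reweighted controls match the treated
# covariate distribution. Gaussian KDEs stand in for the paper's invertible
# transformations (normalizing-flow-style density estimates).
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(10)
n = 2000
x = rng.normal(size=n)
p_treat = 1 / (1 + np.exp(-1.5 * x))             # treatment more likely for large x
t = rng.binomial(1, p_treat)
y = 1.0 * x + 2.0 * t + rng.normal(size=n)       # true ATT = 2

kde_treat = gaussian_kde(x[t == 1])
kde_ctrl = gaussian_kde(x[t == 0])
w = kde_treat(x[t == 0]) / kde_ctrl(x[t == 0])   # density-ratio weights for controls

att_naive = y[t == 1].mean() - y[t == 0].mean()
att_weighted = y[t == 1].mean() - np.average(y[t == 0], weights=w)
print(f"naive difference {att_naive:.2f}, density-ratio weighted ATT {att_weighted:.2f}")
```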

arXiv link: http://arxiv.org/abs/2012.13805v4

Econometrics arXiv paper, submitted: 2020-12-26

Analysis of Randomized Experiments with Network Interference and Noncompliance

Authors: Bora Kim

Randomized experiments have become a standard tool in economics. In analyzing
randomized experiments, the traditional approach has been based on the Stable
Unit Treatment Value Assumption (SUTVA, due to Rubin), which dictates that there
is no interference between individuals. However, the SUTVA assumption fails to
hold in many applications due to social interaction, general equilibrium,
and/or externality effects. While much progress has been made in relaxing the
SUTVA assumption, most of this literature has only considered a setting with
perfect compliance to treatment assignment. In practice, however, noncompliance
occurs frequently where the actual treatment receipt is different from the
assignment to the treatment. In this paper, we study causal effects in
randomized experiments with network interference and noncompliance. Spillovers
are allowed to occur at both treatment choice stage and outcome realization
stage. In particular, we explicitly model treatment choices of agents as a
binary game of incomplete information where resulting equilibrium treatment
choice probabilities affect outcomes of interest. Outcomes are further
characterized by a random coefficient model to allow for general unobserved
heterogeneity in the causal effects. After defining our causal parameters of
interest, we propose a simple control function estimator and derive its
asymptotic properties under large-network asymptotics. We apply our methods to
the randomized subsidy program of Dupas, where we find evidence of
spillover effects on both short-run and long-run adoption of
insecticide-treated bed nets. Finally, we illustrate the usefulness of our
methods by analyzing the impact of counterfactual subsidy policies.

arXiv link: http://arxiv.org/abs/2012.13710v1

Econometrics arXiv paper, submitted: 2020-12-25

Quantile regression with generated dependent variable and covariates

Authors: Jayeeta Bhattacharya

We study linear quantile regression models when regressors and/or dependent
variable are not directly observed but estimated in an initial first step and
used in the second step quantile regression for estimating the quantile
parameters. This general class of generated quantile regression (GQR) models
covers various statistical applications, for instance, the estimation of
endogenous quantile regression models and triangular structural equation models;
some new relevant applications are also discussed. We study the asymptotic distribution
of the two-step estimator, which is challenging because of the presence of
generated covariates and/or dependent variable in the non-smooth quantile
regression estimator. We employ techniques from empirical process theory to
derive a uniform Bahadur expansion for the two-step estimator, which is used to
establish the asymptotic results. We illustrate the performance of the GQR
estimator through simulations and an empirical application based on auctions.

arXiv link: http://arxiv.org/abs/2012.13614v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2020-12-24

Filtering the intensity of public concern from social media count data with jumps

Authors: Matteo Iacopini, Carlo R. M. A. Santagiustina

Count time series obtained from online social media data, such as Twitter,
have drawn increasing interest among academics and market analysts over the
past decade. Transforming Web activity records into counts yields time series
with peculiar features, including the coexistence of smooth paths and sudden
jumps, as well as cross-sectional and temporal dependence. Using Twitter posts
about country risks for the United Kingdom and the United States, this paper
proposes an innovative state space model for multivariate count data with
jumps. We use the proposed model to assess the impact of public concerns in
these countries on market systems. To do so, public concerns inferred from
Twitter data are unpacked into country-specific persistent terms, risk social
amplification events, and co-movements of the country series. The identified
components are then used to investigate the existence and magnitude of
country-risk spillovers and social amplification effects on the volatility of
financial markets.

arXiv link: http://arxiv.org/abs/2012.13267v1

Econometrics arXiv updated paper (originally submitted: 2020-12-23)

Machine Learning Advances for Time Series Forecasting

Authors: Ricardo P. Masini, Marcelo C. Medeiros, Eduardo F. Mendes

In this paper we survey the most recent advances in supervised machine
learning and high-dimensional models for time series forecasting. We consider
both linear and nonlinear alternatives. Among the linear methods we pay special
attention to penalized regressions and ensemble of models. The nonlinear
methods considered in the paper include shallow and deep neural networks, in
their feed-forward and recurrent versions, and tree-based methods, such as
random forests and boosted trees. We also consider ensemble and hybrid models
by combining ingredients from different alternatives. Tests for superior
predictive ability are briefly reviewed. Finally, we discuss applications of
machine learning in economics and finance and provide an illustration with
high-frequency financial data.

arXiv link: http://arxiv.org/abs/2012.12802v3

Econometrics arXiv updated paper (originally submitted: 2020-12-23)

Invidious Comparisons: Ranking and Selection as Compound Decisions

Authors: Jiaying Gu, Roger Koenker

There is an innate human tendency, one might call it the "league table
mentality," to construct rankings. Schools, hospitals, sports teams, movies,
and myriad other objects are ranked even though their inherent
multi-dimensionality would suggest that -- at best -- only partial orderings
were possible. We consider a large class of elementary ranking problems in
which we observe noisy, scalar measurements of merit for $n$ objects of
potentially heterogeneous precision and are asked to select a group of the
objects that are "most meritorious." The problem is naturally formulated in the
compound decision framework of Robbins's (1956) empirical Bayes theory, but it
also exhibits close connections to the recent literature on multiple testing.
The nonparametric maximum likelihood estimator for mixture models (Kiefer and
Wolfowitz (1956)) is employed to construct optimal ranking and selection rules.
Performance of the rules is evaluated in simulations and an application to
ranking U.S. kidney dialysis centers.

arXiv link: http://arxiv.org/abs/2012.12550v3

Econometrics arXiv paper, submitted: 2020-12-22

Split-then-Combine simplex combination and selection of forecasters

Authors: Antonio Martin Arroyo, Aranzazu de Juan Fernandez

This paper considers the Split-Then-Combine (STC) approach (Arroyo and de
Juan, 2014) to combine forecasts inside the simplex space, the sample space of
positive weights adding up to one. As it turns out, the simplicial statistic
given by the center of the simplex compares favorably against the fixed-weight,
average forecast. Besides, we also develop a Combine-After-Selection (CAS)
method to get rid of redundant forecasters. We apply these two approaches to
make out-of-sample one-step ahead combinations and subcombinations of forecasts
for several economic variables. This methodology is particularly useful when
the sample size is smaller than the number of forecasts, a case where other
methods (e.g., Least Squares (LS) or Principal Component Analysis (PCA)) are
not applicable.

arXiv link: http://arxiv.org/abs/2012.11935v1

Econometrics arXiv updated paper (originally submitted: 2020-12-21)

Discordant Relaxations of Misspecified Models

Authors: Lixiong Li, Désiré Kédagni, Ismaël Mourifié

In many set-identified models, it is difficult to obtain a tractable
characterization of the identified set. Therefore, researchers often rely on
non-sharp identification conditions, and empirical results are often based on
an outer set of the identified set. This practice is often viewed as
conservative yet valid because an outer set is always a superset of the
identified set. However, this paper shows that when the model is refuted by the
data, two sets of non-sharp identification conditions derived from the same
model could lead to disjoint outer sets and conflicting empirical results. We
provide a sufficient condition for the existence of such discordancy, which
covers models characterized by conditional moment inequalities and the Artstein
(1983) inequalities. We also derive sufficient conditions for the non-existence
of discordant submodels, therefore providing a class of models for which
constructing outer sets cannot lead to misleading interpretations. In the case
of discordancy, we follow Masten and Poirier (2021) by developing a method to
salvage misspecified models, but unlike them, we focus on discrete relaxations.
We consider all minimum relaxations of a refuted model that restore
data-consistency. We find that the union of the identified sets of these
minimum relaxations is robust to detectable misspecifications and has an
intuitive empirical interpretation.

arXiv link: http://arxiv.org/abs/2012.11679v5

Econometrics arXiv updated paper (originally submitted: 2020-12-21)

On the Aggregation of Probability Assessments: Regularized Mixtures of Predictive Densities for Eurozone Inflation and Real Interest Rates

Authors: Francis X. Diebold, Minchul Shin, Boyuan Zhang

We propose methods for constructing regularized mixtures of density
forecasts. We explore a variety of objectives and regularization penalties, and
we use them in a substantive exploration of Eurozone inflation and real
interest rate density forecasts. All individual inflation forecasters (even the
ex post best forecaster) are outperformed by our regularized mixtures. From the
Great Recession onward, the optimal regularization tends to move density
forecasts' probability mass from the centers to the tails, correcting for
overconfidence.
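
A minimal sketch of a regularized density-forecast mixture: choose simplex weights to maximize the average log score minus a penalty that shrinks toward equal weights. This is a generic illustration of the idea, not the paper's specific objectives or penalties.

```python
# Regularized mixture of density forecasts: pick simplex weights w to maximize the
# average log predictive score minus a penalty that shrinks toward equal weights.
# Generic illustration only; the paper studies several objectives and penalties.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(11)
T, K = 100, 4
y = rng.normal(2.0, 1.0, size=T)                       # realized inflation, say
means = np.array([1.8, 2.5, 0.0, 2.0])                 # forecasters' predictive means
dens = np.column_stack([norm.pdf(y, m, 1.0) for m in means])  # T x K density evaluations

lam = 0.5                                              # regularization strength
def objective(w):
    log_score = np.log(dens @ w + 1e-300).mean()
    penalty = lam * np.sum((w - 1.0 / K) ** 2)
    return -(log_score - penalty)

cons = [{"type": "eq", "fun": lambda w: w.sum() - 1.0}]
res = minimize(objective, np.full(K, 1.0 / K), bounds=[(0, 1)] * K, constraints=cons)
print("regularized mixture weights:", np.round(res.x, 2))
```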

arXiv link: http://arxiv.org/abs/2012.11649v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-12-21

Uncertainty on the Reproduction Ratio in the SIR Model

Authors: Sean Elliott, Christian Gourieroux

The aim of this paper is to understand the extreme variability on the
estimated reproduction ratio $R_0$ observed in practice. For expository purposes,
we consider a discrete time stochastic version of the
Susceptible-Infected-Recovered (SIR) model, and introduce different approximate
maximum likelihood (AML) estimators of $R_0$. We carefully discuss the
properties of these estimators and illustrate by a Monte-Carlo study the width
of confidence intervals on $R_0$.
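
As one concrete example of a simple approximate-ML-flavoured estimator in a discrete-time stochastic SIR model, the sketch below simulates the model and estimates the transmission rate by a no-intercept regression of new infections on $S_t I_t / N$, then forms $R_0 = \beta/\gamma$ with a known recovery rate. It is an illustration under those assumptions, not necessarily any of the paper's AML estimators.

```python
# Discrete-time stochastic SIR sketch and a simple approximate-ML-style estimate
# of R0 = beta / gamma: regress new infections on S_t * I_t / N (no intercept)
# to estimate beta, assuming gamma is known. Illustration only.
import numpy as np

rng = np.random.default_rng(12)
N, beta, gamma, T = 1_000_000, 0.30, 0.12, 120
S, I = N - 100, 100
new_inf, SI_over_N = [], []
for t in range(T):
    lam = beta * S * I / N
    inf_t = min(rng.poisson(lam), S)           # new infections (capped by susceptibles)
    rec_t = rng.binomial(I, gamma)             # recoveries
    new_inf.append(inf_t)
    SI_over_N.append(S * I / N)
    S, I = S - inf_t, I + inf_t - rec_t

x = np.array(new_inf, dtype=float)
z = np.array(SI_over_N, dtype=float)
beta_hat = (z @ x) / (z @ z)                   # no-intercept least squares
print(f"estimated R0 = {beta_hat / gamma:.2f} (true {beta / gamma:.2f})")
```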

arXiv link: http://arxiv.org/abs/2012.11542v1

Econometrics arXiv updated paper (originally submitted: 2020-12-21)

A Nearly Similar Powerful Test for Mediation

Authors: Kees Jan van Garderen, Noud van Giersbergen

This paper derives a new powerful test for mediation that is easy to use.
Testing for mediation is empirically very important in psychology, sociology,
medicine, economics and business, generating over 100,000 citations to a single
key paper. The no-mediation hypothesis $H_{0}:\theta_{1}\theta _{2}=0$ also
poses a theoretically interesting statistical problem since it defines a
manifold that is non-regular at the origin, where rejection probabilities of
standard tests are extremely low. We prove that a similar test for mediation
only exists if the size is the reciprocal of an integer. It is unique, but has
objectionable properties. We propose a new test that is nearly similar, with
power close to the envelope, that avoids these drawbacks and is easy to use in
practice. Its construction uses the general varying $g$-method that we propose.
We illustrate the results in an educational setting with gender role beliefs
and in a trade union sentiment application.

arXiv link: http://arxiv.org/abs/2012.11342v2

Econometrics arXiv updated paper (originally submitted: 2020-12-21)

Weak Identification with Bounds in a Class of Minimum Distance Models

Authors: Gregory Fletcher Cox

When parameters are weakly identified, bounds on the parameters may provide a
valuable source of information. Existing weak identification estimation and
inference results are unable to combine weak identification with bounds. Within
a class of minimum distance models, this paper proposes identification-robust
inference that incorporates information from bounds when parameters are weakly
identified. This paper demonstrates the value of the bounds and
identification-robust inference in a simple latent factor model and a simple
GARCH model. This paper also demonstrates the identification-robust inference
in an empirical application, a factor model for parental investments in
children.

arXiv link: http://arxiv.org/abs/2012.11222v5

Econometrics arXiv updated paper (originally submitted: 2020-12-21)

Binary Classification Tests, Imperfect Standards, and Ambiguous Information

Authors: Gabriel Ziegler

New binary classification tests are often evaluated relative to a
pre-established test. For example, rapid Antigen tests for the detection of
SARS-CoV-2 are assessed relative to more established PCR tests. In this paper,
I argue that the new test can be described as producing ambiguous information
when the pre-established test is imperfect. This allows for a phenomenon called
dilation -- an extreme form of non-informativeness. As an example, I present
hypothetical test data satisfying the WHO's minimum quality requirement for
rapid Antigen tests which leads to dilation. The ambiguity in the information
arises from a missing data problem due to imperfection of the established test:
the joint distribution of true infection and test results is not observed.
Using results from Copula theory, I construct the (usually non-singleton) set
of all these possible joint distributions, which allows me to assess the new
test's informativeness. This analysis leads to a simple sufficient condition to
make sure that a new test is not a dilation. I illustrate my approach with
applications to data from three COVID-19 related tests. Two rapid Antigen tests
satisfy my sufficient condition easily and are therefore informative. However,
less accurate procedures, like chest CT scans, may exhibit dilation.

arXiv link: http://arxiv.org/abs/2012.11215v3

Econometrics arXiv paper, submitted: 2020-12-20

Policy Transforms and Learning Optimal Policies

Authors: Thomas M. Russell

We study the problem of choosing optimal policy rules in uncertain
environments using models that may be incomplete and/or partially identified.
We consider a policymaker who wishes to choose a policy to maximize a
particular counterfactual quantity called a policy transform. We characterize
learnability of a set of policy options by the existence of a decision rule
that closely approximates the maximin optimal value of the policy transform
with high probability. Sufficient conditions are provided for the existence of
such a rule. However, learnability of an optimal policy is an ex-ante notion
(i.e. before observing a sample), and so ex-post (i.e. after observing a
sample) theoretical guarantees for certain policy rules are also provided. Our
entire approach is applicable when the distribution of unobservables is not
parametrically specified, although we discuss how semiparametric restrictions
can be used. Finally, we show possible applications of the procedure to a
simultaneous discrete choice example and a program evaluation example.

arXiv link: http://arxiv.org/abs/2012.11046v1

Econometrics arXiv paper, submitted: 2020-12-19

Achieving Reliable Causal Inference with Data-Mined Variables: A Random Forest Approach to the Measurement Error Problem

Authors: Mochen Yang, Edward McFowland III, Gordon Burtch, Gediminas Adomavicius

Combining machine learning with econometric analysis is becoming increasingly
prevalent in both research and practice. A common empirical strategy involves
the application of predictive modeling techniques to 'mine' variables of
interest from available data, followed by the inclusion of those variables into
an econometric framework, with the objective of estimating causal effects.
Recent work highlights that, because the predictions from machine learning
models are inevitably imperfect, econometric analyses based on the predicted
variables are likely to suffer from bias due to measurement error. We propose a
novel approach to mitigate these biases, leveraging the ensemble learning
technique known as the random forest. We propose employing random forest not
just for prediction, but also for generating instrumental variables to address
the measurement error embedded in the prediction. The random forest algorithm
performs best when composed of a set of trees that are individually accurate
in their predictions, yet which also make 'different' mistakes, i.e., have
weakly correlated prediction errors. A key observation is that these properties
are closely related to the relevance and exclusion requirements of valid
instrumental variables. We design a data-driven procedure to select tuples of
individual trees from a random forest, in which one tree serves as the
endogenous covariate and the other trees serve as its instruments. Simulation
experiments demonstrate the efficacy of the proposed approach in mitigating
estimation biases and its superior performance over three alternative methods
for bias correction.
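
A minimal sketch of the mechanics (synthetic data; not the authors' data-driven
tuple-selection procedure): one tree's prediction plays the endogenous,
error-ridden covariate and another tree's prediction serves as its instrument in
a hand-rolled two-stage least squares step.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    n = 2000
    X = rng.normal(size=(n, 5))
    latent = X[:, 0] + 0.5 * X[:, 1]              # variable to be 'mined' from X
    y = 2.0 * latent + rng.normal(size=n)         # outcome of the causal regression

    # Forest trained to predict a noisy measurement of the latent variable
    forest = RandomForestRegressor(n_estimators=100, random_state=0)
    forest.fit(X, latent + rng.normal(scale=0.5, size=n))
    w = forest.estimators_[0].predict(X)          # one tree: endogenous covariate
    z = forest.estimators_[1].predict(X)          # another tree: its instrument

    # Hand-rolled 2SLS: project w on z, then regress y on the first-stage fit
    Z = np.column_stack([np.ones(n), z])
    W = np.column_stack([np.ones(n), w])
    W_hat = Z @ np.linalg.lstsq(Z, W, rcond=None)[0]
    beta_iv = np.linalg.lstsq(W_hat, y, rcond=None)[0]
    beta_ols = np.linalg.lstsq(W, y, rcond=None)[0]
    print(beta_ols[1], beta_iv[1])                # compare both slopes to the true 2.0

The paper's contribution lies in how tree tuples are selected so that the
relevance and exclusion requirements are plausibly met; the snippet above only
illustrates the plug-in 2SLS step.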

arXiv link: http://arxiv.org/abs/2012.10790v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2020-12-18

Kernel Methods for Unobserved Confounding: Negative Controls, Proxies, and Instruments

Authors: Rahul Singh

Negative control is a strategy for learning the causal relationship between
treatment and outcome in the presence of unmeasured confounding. The treatment
effect can nonetheless be identified if two auxiliary variables are available:
a negative control treatment (which has no effect on the actual outcome), and a
negative control outcome (which is not affected by the actual treatment). These
auxiliary variables can also be viewed as proxies for a traditional set of
control variables, and they bear resemblance to instrumental variables. I
propose a family of algorithms based on kernel ridge regression for learning
nonparametric treatment effects with negative controls. Examples include dose
response curves, dose response curves with distribution shift, and
heterogeneous treatment effects. Data may be discrete or continuous, and low,
high, or infinite dimensional. I prove uniform consistency and provide finite
sample rates of convergence. I estimate the dose response curve of cigarette
smoking on infant birth weight adjusting for unobserved confounding due to
household income, using a data set of singleton births in the state of
Pennsylvania between 1989 and 1991.

arXiv link: http://arxiv.org/abs/2012.10315v5

Econometrics arXiv updated paper (originally submitted: 2020-12-18)

Two-way Fixed Effects and Differences-in-Differences Estimators with Several Treatments

Authors: Clément de Chaisemartin, Xavier D'Haultfœuille

We study two-way-fixed-effects regressions (TWFE) with several treatment
variables. Under a parallel trends assumption, we show that the coefficient on
each treatment identifies a weighted sum of that treatment's effect, with
possibly negative weights, plus a weighted sum of the effects of the other
treatments. Thus, those estimators are not robust to heterogeneous effects and
may be contaminated by other treatments' effects. We further show that omitting
a treatment from the regression can actually reduce the estimator's bias,
unlike what would happen under constant treatment effects. We propose an
alternative difference-in-differences estimator, robust to heterogeneous
effects and immune to the contamination problem. In the application we
consider, the TWFE regression identifies a highly non-convex combination of
effects, with large contamination weights, and one of its coefficients
significantly differs from our heterogeneity-robust estimator.
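
For reference, the regression under study has the familiar form (notation mine)
$Y_{g,t} = \alpha_g + \gamma_t + \sum_{k=1}^{K} \beta_k D^{k}_{g,t} + \varepsilon_{g,t}$,
and the result says that each $\hat{\beta}_k$ estimates a weighted sum of
treatment $k$'s own effects, with possibly negative weights, plus a weighted sum
of the other treatments' effects.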

arXiv link: http://arxiv.org/abs/2012.10077v8

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2020-12-17

The Variational Method of Moments

Authors: Andrew Bennett, Nathan Kallus

The conditional moment problem is a powerful formulation for describing
structural causal parameters in terms of observables, a prominent example being
instrumental variable regression. A standard approach reduces the problem to a
finite set of marginal moment conditions and applies the optimally weighted
generalized method of moments (OWGMM), but this requires we know a finite set
of identifying moments, can still be inefficient even if identifying, or can be
theoretically efficient but practically unwieldy if we use a growing sieve of
moment conditions. Motivated by a variational minimax reformulation of OWGMM,
we define a very general class of estimators for the conditional moment
problem, which we term the variational method of moments (VMM) and which
naturally enables controlling infinitely-many moments. We provide a detailed
theoretical analysis of multiple VMM estimators, including ones based on kernel
methods and neural nets, and provide conditions under which these are
consistent, asymptotically normal, and semiparametrically efficient in the full
conditional moment model. We additionally provide algorithms for valid
statistical inference based on the same kind of variational reformulations,
both for kernel- and neural-net-based varieties. Finally, we demonstrate the
strong performance of our proposed estimation and inference algorithms in a
detailed series of synthetic experiments.
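
Schematically (standard notation, written out here for readability), the
conditional moment problem seeks $\theta_0$ with $E[\rho(Z;\theta_0) \mid X] = 0$
almost surely; the usual reduction replaces this with finitely many marginal
conditions $E[f_j(X)\rho(Z;\theta_0)] = 0$, $j = 1,\ldots,m$, which OWGMM weights
optimally, whereas VMM works directly with a minimax reformulation over a rich
class of instrument functions $f$.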

arXiv link: http://arxiv.org/abs/2012.09422v4

Econometrics arXiv paper, submitted: 2020-12-16

Exact Trend Control in Estimating Treatment Effects Using Panel Data with Heterogenous Trends

Authors: Chirok Han

For a panel model considered by Abadie et al. (2010), the counterfactual
outcomes constructed by Abadie et al., Hsiao et al. (2012), and Doudchenko and
Imbens (2017) may all be confounded by uncontrolled heterogenous trends. Based
on exact-matching on the trend predictors, I propose new methods of estimating
the model-specific treatment effects, which are free from heterogenous trends.
When applied to Abadie et al.'s (2010) model and data, the new estimators
suggest considerably smaller effects of California's tobacco control program.

arXiv link: http://arxiv.org/abs/2012.08988v1

Econometrics arXiv updated paper (originally submitted: 2020-12-16)

United States FDA drug approvals are persistent and polycyclic: Insights into economic cycles, innovation dynamics, and national policy

Authors: Iraj Daizadeh

It is challenging to elucidate the effects of changes in external influences
(such as economic or policy) on the rate of US drug approvals. Here, a novel
approach, termed the Chronological Hurst Exponent (CHE), is proposed, which
hypothesizes that changes in the long-range memory latent within the dynamics
of time series data may be temporally associated with changes in such
influences. Using the monthly number of FDA Center for Drug Evaluation and
Research (CDER) approvals from 1939 to 2019 as the data source, it is
demonstrated that the CHE has a distinct S-shaped structure demarcated by an
8-year (1939-1947) Stagnation Period, a 27-year (1947-1974) Emergent
(time-varying) Period, and a 45-year (1974-2019) Saturation Period. Further,
dominant periodicities (resolved via wavelet analyses) are identified during
the most recent 45-year CHE Saturation Period at 17, 8 and 4 years; thus, US
drug approvals have been following a Juglar-Kuznets mid-term cycle with
Kitchin-like bursts. As discussed, this work suggests that (1) changes in
extrinsic factors (e.g., of economic and/or policy origin) during the Emergent
Period may have led to persistent growth in US drug approvals enjoyed since
1974, (2) the CHE may be a valued method to explore influences on time series
data, and (3) innovation-related economic cycles exist (as viewed via the proxy
metric of US drug approvals).

arXiv link: http://arxiv.org/abs/2012.09627v3

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2020-12-15

Minimax Risk and Uniform Convergence Rates for Nonparametric Dyadic Regression

Authors: Bryan S. Graham, Fengshi Niu, James L. Powell

Let $i=1,\ldots,N$ index a simple random sample of units drawn from some
large population. For each unit we observe the vector of regressors $X_{i}$
and, for each of the $N\left(N-1\right)$ ordered pairs of units, an outcome
$Y_{ij}$. The outcomes $Y_{ij}$ and $Y_{kl}$ are independent if their indices
are disjoint, but dependent otherwise (i.e., "dyadically dependent"). Let
$W_{ij}=\left(X_{i}',X_{j}'\right)'$; using the sampled data we seek to
construct a nonparametric estimate of the mean regression function
$g\left(W_{ij}\right) \equiv E\left[\left.Y_{ij}\right|X_{i},X_{j}\right]$.
We present two sets of results. First, we calculate lower bounds on the
minimax risk for estimating the regression function at (i) a point and (ii)
under the infinity norm. Second, we calculate (i) pointwise and (ii) uniform
convergence rates for the dyadic analog of the familiar Nadaraya-Watson (NW)
kernel regression estimator. We show that the NW kernel regression estimator
achieves the optimal rates suggested by our risk bounds when an appropriate
bandwidth sequence is chosen. This optimal rate differs from the one available
under iid data: the effective sample size is smaller and
$d_W=dim(W_{ij})$ influences the rate differently.
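
For concreteness, the dyadic Nadaraya-Watson estimator referenced above takes the
usual ratio form, now averaged over ordered pairs (a standard formulation, spelled
out here for readability):
$\hat{g}(w) = \sum_{i}\sum_{j\neq i} K\left((W_{ij}-w)/h\right) Y_{ij} \big/ \sum_{i}\sum_{j\neq i} K\left((W_{ij}-w)/h\right)$,
where $K$ is a kernel and $h$ a bandwidth; the dependence between pairs sharing
an index is what changes the effective sample size relative to the iid case.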

arXiv link: http://arxiv.org/abs/2012.08444v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-12-15

Long-term prediction intervals with many covariates

Authors: Sayar Karmakar, Marek Chudy, Wei Biao Wu

Accurate forecasting is a fundamental focus of the econometric time-series
literature. Often practitioners and policy makers want to predict
outcomes of an entire time horizon in the future instead of just a single
$k$-step ahead prediction. These series, apart from their own possible
non-linear dependence, are often also influenced by many external predictors.
In this paper, we construct prediction intervals of time-aggregated forecasts
in a high-dimensional regression setting. Our approach is based on quantiles of
residuals obtained by the popular LASSO routine. We allow for general
heavy-tailed, long-memory, and nonlinear stationary error processes and
stochastic predictors. Through a series of systematically arranged consistency
results we provide theoretical guarantees of our proposed quantile-based method
in all of these scenarios. After validating our approach using simulations we
also propose a novel bootstrap based method that can boost the coverage of the
theoretical intervals. Finally analyzing the EPEX Spot data, we construct
prediction intervals for hourly electricity prices over horizons spanning 17
weeks and contrast them to selected Bayesian and bootstrap interval forecasts.
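
A minimal sketch of the quantile-of-residuals idea under stylized assumptions
(simulated data and crude iid resampling of residuals; the paper's construction
and its bootstrap refinement differ in the details):

    import numpy as np
    from sklearn.linear_model import LassoCV

    rng = np.random.default_rng(1)
    T, p, h = 400, 50, 24
    X = rng.normal(size=(T, p))
    beta = np.zeros(p); beta[:3] = [1.0, -0.5, 0.25]
    y = X @ beta + rng.standard_t(df=5, size=T)       # heavy-tailed errors

    # Fit LASSO on the training window and collect in-sample residuals
    lasso = LassoCV(cv=5).fit(X[:-h], y[:-h])
    resid = y[:-h] - lasso.predict(X[:-h])

    # Time-aggregated (h-step-summed) point forecast bracketed by quantiles of
    # sums of resampled residuals
    point = lasso.predict(X[-h:]).sum()
    sums = np.array([rng.choice(resid, size=h, replace=True).sum() for _ in range(2000)])
    lo, hi = point + np.quantile(sums, [0.05, 0.95])
    print(f"90% interval for the aggregated forecast: [{lo:.2f}, {hi:.2f}]")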

arXiv link: http://arxiv.org/abs/2012.08223v2

Econometrics arXiv updated paper (originally submitted: 2020-12-15)

Real-time Inflation Forecasting Using Non-linear Dimension Reduction Techniques

Authors: Niko Hauzenberger, Florian Huber, Karin Klieber

In this paper, we assess whether using non-linear dimension reduction
techniques pays off for forecasting inflation in real-time. Several recent
methods from the machine learning literature are adopted to map a large
dimensional dataset into a lower dimensional set of latent factors. We model
the relationship between inflation and the latent factors using constant and
time-varying parameter (TVP) regressions with shrinkage priors. Our models are
then used to forecast monthly US inflation in real-time. The results suggest
that sophisticated dimension reduction methods yield inflation forecasts that
are highly competitive with linear approaches based on principal components.
Among the techniques considered, the Autoencoder and squared principal
components yield factors that have high predictive power for one-month- and
one-quarter-ahead inflation. Zooming into model performance over time reveals
that controlling for non-linear relations in the data is of particular
importance during recessionary episodes of the business cycle or the current
COVID-19 pandemic.

arXiv link: http://arxiv.org/abs/2012.08155v3

Econometrics arXiv paper, submitted: 2020-12-15

Identification of inferential parameters in the covariate-normalized linear conditional logit model

Authors: Philip Erickson

The conditional logit model is a standard workhorse approach to estimating
customers' product feature preferences using choice data. Using these models at
scale, however, can result in numerical imprecision and optimization failure
due to a combination of large-valued covariates and the softmax probability
function. Standard machine learning approaches alleviate these concerns by
applying a normalization scheme to the matrix of covariates, scaling all values
to sit within some interval (such as the unit simplex). While this type of
normalization is innocuous when using models for prediction, it has the side
effect of perturbing the estimated coefficients, which are necessary for
researchers interested in inference. This paper shows that, for two common
classes of normalizers, designated scaling and centered scaling, the
data-generating non-scaled model parameters can be analytically recovered along
with their asymptotic distributions. The paper also shows the numerical
performance of the analytical results using an example of a scaling normalizer.
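
For the simplest case of a pure scaling normalizer $x^{*}_{k} = x_{k}/s_{k}$, the
linear index is preserved when $\beta_{k} = \beta^{*}_{k}/s_{k}$, so a
back-of-the-envelope version of the recovery (the paper also treats centered
scaling and derives the full asymptotic distributions) is
$\hat{\beta}_{k} = \hat{\beta}^{*}_{k}/s_{k}$ with
$\widehat{Var}(\hat{\beta}_{k}) = \widehat{Var}(\hat{\beta}^{*}_{k})/s_{k}^{2}$.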

arXiv link: http://arxiv.org/abs/2012.08022v1

Econometrics arXiv paper, submitted: 2020-12-14

Trademark filings and patent application count time series are structurally near-identical and cointegrated: Implications for studies in innovation

Authors: Iraj Daizadeh

Through time series analysis, this paper empirically explores, confirms and
extends the trademark/patent inter-relationship as proposed in the normative
intellectual-property (IP)-oriented Innovation Agenda view of the science and
technology (S&T) firm. Beyond simple correlation, it is shown that
trademark-filing (Trademarks) and patent-application counts (Patents) have
similar (if not, identical) structural attributes (including similar
distribution characteristics and seasonal variation, cross-wavelet
synchronicity/coherency (short-term cross-periodicity) and structural breaks)
and are cointegrated (integration order of 1) over a period of approximately 40
years (given the monthly observations). The existence of cointegration strongly
suggests a "long-run" equilibrium between the two indices; that is, there is
(are) exogenous force(s) restraining the two indices from diverging from one
another. Structural breakpoints in the chrono-dynamics of the indices supports
the existence of potentially similar exogenous force(s), as the break dates
are simultaneous/near-simultaneous (Trademarks: 1987, 1993, 1999, 2005, 2011;
Patents: 1988, 1994, 2000, and 2011). A discussion of potential triggers
(affecting both time series) causing these breaks, and the concept of
equilibrium in the context of these proxy measures are presented. The
cointegration order and structural co-movements resemble other macro-economic
variables, stoking the opportunity of using econometrics approaches to further
analyze these data. As a corollary, this work further supports the inclusion of
trademark analysis in innovation studies. Lastly, the data and corresponding
analysis tools (R program) are presented as Supplementary Materials for
reproducibility and convenience to conduct future work for interested readers.
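
A generic illustration of the kind of unit-root and cointegration checks
described above, on simulated monthly counts rather than the actual filing data:

    import numpy as np
    from statsmodels.tsa.stattools import adfuller, coint

    rng = np.random.default_rng(2)
    n = 480                                   # roughly 40 years of monthly data
    common = np.cumsum(rng.normal(size=n))    # shared stochastic trend (I(1))
    trademarks = 100 + 5 * common + rng.normal(scale=3, size=n)
    patents    =  80 + 4 * common + rng.normal(scale=3, size=n)

    # Each series should look I(1): ADF fails to reject a unit root ...
    print(adfuller(trademarks)[1], adfuller(patents)[1])
    # ... while the Engle-Granger test rejects "no cointegration".
    stat, pval, _ = coint(trademarks, patents)
    print(f"Engle-Granger p-value: {pval:.4f}")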

arXiv link: http://arxiv.org/abs/2012.10400v1

Econometrics arXiv paper, submitted: 2020-12-14

Welfare Analysis via Marginal Treatment Effects

Authors: Yuya Sasaki, Takuya Ura

Consider a causal structure with endogeneity (i.e., unobserved
confoundedness) in empirical data, where an instrumental variable is available.
In this setting, we show that the mean social welfare function can be
identified and represented via the marginal treatment effect (MTE, Bjorklund
and Moffitt, 1987) as the operator kernel. This representation result can be
applied to a variety of statistical decision rules for treatment choice,
including plug-in rules, Bayes rules, and empirical welfare maximization (EWM)
rules as in Hirano and Porter (2020, Section 2.3). Focusing on the application
to the EWM framework of Kitagawa and Tetenov (2018), we provide convergence
rates of the worst case average welfare loss (regret) in the spirit of Manski
(2004).

arXiv link: http://arxiv.org/abs/2012.07624v1

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2020-12-08

Occupational segregation in a Roy model with composition preferences

Authors: Haoning Chen, Miaomiao Dong, Marc Henry, Ivan Sidorov

We propose a model of labor market sector self-selection that combines
comparative advantage, as in the Roy model, and sector composition preference.
Two groups choose between two sectors based on heterogeneous potential incomes
and group compositions in each sector. Potential incomes incorporate group
specific human capital accumulation and wage discrimination. Composition
preferences are interpreted as reflecting group specific amenity preferences as
well as homophily and aversion to minority status. We show that occupational
segregation is amplified by the composition preferences and we highlight a
resulting tension between redistribution and diversity. The model also exhibits
tipping from extreme compositions to more balanced ones. Tipping occurs when a
small nudge, associated with affirmative action, pushes the system to a very
different equilibrium, and when the set of equilibria changes abruptly when a
parameter governing the relative importance of pecuniary and composition
preferences crosses a threshold.

arXiv link: http://arxiv.org/abs/2012.04485v3

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2020-12-08

Forecasting the Olympic medal distribution during a pandemic: a socio-economic machine learning model

Authors: Christoph Schlembach, Sascha L. Schmidt, Dominik Schreyer, Linus Wunderlich

Forecasting the number of Olympic medals for each nation is highly relevant
for different stakeholders: Ex ante, sports betting companies can determine the
odds while sponsors and media companies can allocate their resources to
promising teams. Ex post, sports politicians and managers can benchmark the
performance of their teams and evaluate the drivers of success. To
significantly increase the Olympic medal forecasting accuracy, we apply machine
learning, more specifically a two-staged Random Forest, thus outperforming more
traditional na\"ive forecast for three previous Olympics held between 2008 and
2016 for the first time. Regarding the Tokyo 2020 Games in 2021, our model
suggests that the United States will lead the Olympic medal table, winning 120
medals, followed by China (87) and Great Britain (74). Intriguingly, we predict
that the current COVID-19 pandemic will not significantly alter the medal count
as all countries suffer from the pandemic to some extent (data inherent) and
because only limited historical data points on comparable diseases are available
(model inherent).

arXiv link: http://arxiv.org/abs/2012.04378v2

Econometrics arXiv updated paper (originally submitted: 2020-12-07)

Who Should Get Vaccinated? Individualized Allocation of Vaccines Over SIR Network

Authors: Toru Kitagawa, Guanyi Wang

How to allocate vaccines over heterogeneous individuals is one of the
important policy decisions in pandemic times. This paper develops a procedure
to estimate an individualized vaccine allocation policy under limited supply,
exploiting social network data containing individual demographic
characteristics and health status. We model spillover effects of the vaccines
based on a Heterogeneous-Interacted-SIR network model and estimate an
individualized vaccine allocation policy by maximizing an estimated social
welfare (public health) criterion incorporating the spillovers. While this
optimization problem is generally an NP-hard integer optimization problem, we
show that the SIR structure leads to a submodular objective function, and
provide a computationally attractive greedy algorithm for approximating a
solution that has theoretical performance guarantee. Moreover, we characterise
a finite sample welfare regret bound and examine how its uniform convergence
rate depends on the complexity and riskiness of the social network. In the
simulation, we illustrate the importance of considering spillovers by comparing
our method with targeting without network information.
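
The greedy step itself follows the standard template for maximizing a monotone
submodular welfare function under a budget (cardinality) constraint. A generic
sketch follows, with a toy coverage-style welfare function standing in for the
paper's estimated SIR-based criterion:

    # Greedy maximization of a monotone submodular set function under a
    # cardinality (vaccine budget) constraint.
    def greedy_allocate(candidates, budget, welfare):
        """welfare: callable mapping a set of treated individuals to a float."""
        allocated = set()
        for _ in range(budget):
            best, best_gain = None, 0.0
            for i in candidates - allocated:
                gain = welfare(allocated | {i}) - welfare(allocated)
                if gain > best_gain:
                    best, best_gain = i, gain
            if best is None:                  # no positive marginal gain left
                break
            allocated.add(best)
        return allocated

    # Toy usage: welfare counts individuals covered directly or via a neighbor.
    neighbors = {1: {2, 3}, 2: {1}, 3: {1, 4}, 4: {3}, 5: set()}

    def coverage_welfare(treated):
        covered = set(treated)
        for i in treated:
            covered |= neighbors[i]
        return len(covered)

    print(greedy_allocate({1, 2, 3, 4, 5}, budget=2, welfare=coverage_welfare))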

arXiv link: http://arxiv.org/abs/2012.04055v4

Econometrics arXiv updated paper (originally submitted: 2020-12-07)

Asymptotic Normality for Multivariate Random Forest Estimators

Authors: Kevin Li

Regression trees and random forests are popular and effective non-parametric
estimators in practical applications. A recent paper by Athey and Wager shows
that the random forest estimate at any point is asymptotically Gaussian; in
this paper, we extend this result to the multivariate case and show that the
vector of estimates at multiple points is jointly normal. Specifically, the
covariance matrix of the limiting normal distribution is diagonal, so that the
estimates at any two points are independent in sufficiently deep trees.
Moreover, the off-diagonal term is bounded by quantities capturing how likely
two points belong to the same partition of the resulting tree. Our results
rely on a certain stability property when constructing splits, and we
give examples of splitting rules for which this assumption is and is not
satisfied. We test our proposed covariance bound and the associated coverage
rates of confidence intervals in numerical simulations.

arXiv link: http://arxiv.org/abs/2012.03486v3

Econometrics arXiv updated paper (originally submitted: 2020-12-06)

Binary Response Models for Heterogeneous Panel Data with Interactive Fixed Effects

Authors: Jiti Gao, Fei Liu, Bin Peng, Yayi Yan

In this paper, we investigate binary response models for heterogeneous panel
data with interactive fixed effects by allowing both the cross-sectional
dimension and the temporal dimension to diverge. From a practical point of
view, the proposed framework can be applied to predict the probability of
corporate failure, conduct credit rating analysis, etc. Theoretically and
methodologically, we establish a link between a maximum likelihood estimation
and a least squares approach, provide a simple information criterion to detect
the number of factors, and achieve the asymptotic distributions accordingly. In
addition, we conduct intensive simulations to examine the theoretical findings.
In the empirical study, we focus on the sign prediction of stock returns, and
then use the results of sign forecast to conduct portfolio analysis.

arXiv link: http://arxiv.org/abs/2012.03182v2

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2020-12-04

Forecasting: theory and practice

Authors: Fotios Petropoulos, Daniele Apiletti, Vassilios Assimakopoulos, Mohamed Zied Babai, Devon K. Barrow, Souhaib Ben Taieb, Christoph Bergmeir, Ricardo J. Bessa, Jakub Bijak, John E. Boylan, Jethro Browell, Claudio Carnevale, Jennifer L. Castle, Pasquale Cirillo, Michael P. Clements, Clara Cordeiro, Fernando Luiz Cyrino Oliveira, Shari De Baets, Alexander Dokumentov, Joanne Ellison, Piotr Fiszeder, Philip Hans Franses, David T. Frazier, Michael Gilliland, M. Sinan Gönül, Paul Goodwin, Luigi Grossi, Yael Grushka-Cockayne, Mariangela Guidolin, Massimo Guidolin, Ulrich Gunter, Xiaojia Guo, Renato Guseo, Nigel Harvey, David F. Hendry, Ross Hollyman, Tim Januschowski, Jooyoung Jeon, Victor Richmond R. Jose, Yanfei Kang, Anne B. Koehler, Stephan Kolassa, Nikolaos Kourentzes, Sonia Leva, Feng Li, Konstantia Litsiou, Spyros Makridakis, Gael M. Martin, Andrew B. Martinez, Sheik Meeran, Theodore Modis, Konstantinos Nikolopoulos, Dilek Önkal, Alessia Paccagnini, Anastasios Panagiotelis, Ioannis Panapakidis, Jose M. Pavía, Manuela Pedio, Diego J. Pedregal, Pierre Pinson, Patrícia Ramos, David E. Rapach, J. James Reade, Bahman Rostami-Tabar, Michał Rubaszek, Georgios Sermpinis, Han Lin Shang, Evangelos Spiliotis, Aris A. Syntetos, Priyanga Dilini Talagala, Thiyanga S. Talagala, Len Tashman, Dimitrios Thomakos, Thordis Thorarinsdottir, Ezio Todini, Juan Ramón Trapero Arenas, Xiaoqian Wang, Robert L. Winkler, Alisa Yusupova, Florian Ziel

Forecasting has always been at the forefront of decision making and planning.
The uncertainty that surrounds the future is both exciting and challenging,
with individuals and organisations seeking to minimise risks and maximise
utilities. The large number of forecasting applications calls for a diverse set
of forecasting methods to tackle real-life challenges. This article provides a
non-systematic review of the theory and the practice of forecasting. We provide
an overview of a wide range of theoretical, state-of-the-art models, methods,
principles, and approaches to prepare, produce, organise, and evaluate
forecasts. We then demonstrate how such theoretical concepts are applied in a
variety of real-life contexts.
We do not claim that this review is an exhaustive list of methods and
applications. However, we wish that our encyclopedic presentation will offer a
point of reference for the rich work that has been undertaken over the last
decades, with some key insights for the future of forecasting theory and
practice. Given its encyclopedic nature, the intended mode of reading is
non-linear. We offer cross-references to allow the readers to navigate through
the various topics. We complement the theoretical concepts and applications
covered by large lists of free or open-source software implementations and
publicly-available databases.

arXiv link: http://arxiv.org/abs/2012.03854v4

Econometrics arXiv updated paper (originally submitted: 2020-12-04)

A Multivariate Realized GARCH Model

Authors: Ilya Archakov, Peter Reinhard Hansen, Asger Lunde

We propose a novel class of multivariate GARCH models that incorporate
realized measures of volatility and correlations. The key innovation is an
unconstrained vector parametrization of the conditional correlation matrix,
which enables the use of factor models for correlations. This approach
elegantly addresses the main challenge faced by multivariate GARCH models in
high-dimensional settings. As an illustration, we explore block correlation
matrices that naturally simplify to linear factor models for the conditional
correlations. The model is applied to the returns of nine assets, and its
in-sample and out-of-sample performance compares favorably against several
popular benchmarks.

arXiv link: http://arxiv.org/abs/2012.02708v3

Econometrics arXiv updated paper (originally submitted: 2020-12-04)

A Canonical Representation of Block Matrices with Applications to Covariance and Correlation Matrices

Authors: Ilya Archakov, Peter Reinhard Hansen

We obtain a canonical representation for block matrices. The representation
facilitates simple computation of the determinant, the matrix inverse, and
other powers of a block matrix, as well as the matrix logarithm and the matrix
exponential. These results are particularly useful for block covariance and
block correlation matrices, where evaluation of the Gaussian log-likelihood and
estimation are greatly simplified. We illustrate this with an empirical
application using a large panel of daily asset returns. Moreover, the
representation paves new ways to regularize large covariance/correlation
matrices, test block structures in matrices, and estimate regressions with many
variables.

arXiv link: http://arxiv.org/abs/2012.02698v2

Econometrics arXiv updated paper (originally submitted: 2020-12-04)

Asymmetric uncertainty : Nowcasting using skewness in real-time data

Authors: Paul Labonne

This paper presents a new way to account for downside and upside risks when
producing density nowcasts of GDP growth. The approach relies on modelling
location, scale and shape common factors in real-time macroeconomic data. While
movements in the location generate shifts in the central part of the predictive
density, the scale controls its dispersion (akin to general uncertainty) and
the shape its asymmetry, or skewness (akin to downside and upside risks). The
empirical application is centred on US GDP growth and the real-time data come
from Fred-MD. The results show that there is more to real-time data than their
levels or means: their dispersion and asymmetry provide valuable information
for nowcasting economic activity. Scale and shape common factors (i) yield more
reliable measures of uncertainty and (ii) improve precision when macroeconomic
uncertainty is at its peak.

arXiv link: http://arxiv.org/abs/2012.02601v4

Econometrics arXiv paper, submitted: 2020-12-04

A New Parametrization of Correlation Matrices

Authors: Ilya Archakov, Peter Reinhard Hansen

We introduce a novel parametrization of the correlation matrix. The
reparametrization facilitates modeling of correlation and covariance matrices
by an unrestricted vector, where positive definiteness is an innate property.
This parametrization can be viewed as a generalization of Fisher's
Z-transformation to higher dimensions and has a wide range of potential
applications. An algorithm for reconstructing the unique $n \times n$ correlation
matrix from any $d$-dimensional vector (with $d = n(n-1)/2$) is provided, and we
derive its numerical complexity.
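
One way to implement such a reconstruction, assuming (as the Fisher-z analogy
suggests) that the $d$-dimensional vector fills the off-diagonal of the matrix
logarithm of the correlation matrix, with the diagonal of the log-matrix then
pinned down by requiring a unit diagonal for the reconstructed matrix. This is a
sketch, not the paper's algorithm verbatim:

    import numpy as np
    from scipy.linalg import expm

    def vector_to_correlation(x, n, tol=1e-12, max_iter=200):
        # Fill the off-diagonal of a symmetric matrix with the vector x
        A = np.zeros((n, n))
        idx = np.triu_indices(n, k=1)
        A[idx] = x
        A = A + A.T
        d = np.zeros(n)                        # unknown diagonal of log(C)
        for _ in range(max_iter):
            C = expm(A + np.diag(d))
            if np.max(np.abs(np.diag(C) - 1.0)) < tol:
                break
            d = d - np.log(np.diag(C))         # fixed-point update on the diagonal
        return expm(A + np.diag(d))

    C = vector_to_correlation(np.array([0.3, -0.1, 0.2]), n=3)
    print(np.diag(C))                           # approximately [1, 1, 1]
    print(np.all(np.linalg.eigvalsh(C) > 0))    # positive definite by construction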

arXiv link: http://arxiv.org/abs/2012.02395v1

Econometrics arXiv updated paper (originally submitted: 2020-12-04)

Sharp Bounds in the Latent Index Selection Model

Authors: Philip Marx

A fundamental question underlying the literature on partial identification
is: what can we learn about parameters that are relevant for policy but not
necessarily point-identified by the exogenous variation we observe? This paper
provides an answer in terms of sharp, analytic characterizations and bounds for
an important class of policy-relevant treatment effects, consisting of marginal
treatment effects and linear functionals thereof, in the latent index selection
model as formalized in Vytlacil (2002). The sharp bounds use the full content
of identified marginal distributions, and analytic derivations rely on the
theory of stochastic orders. The proposed methods also make it possible to
sharply incorporate new auxiliary assumptions on distributions into the latent
index selection framework. Empirically, I apply the methods to study the
effects of Medicaid on emergency room utilization in the Oregon Health
Insurance Experiment, showing that the predictions from extrapolations based on
a distribution assumption (rank similarity) differ substantively and
consistently from existing extrapolations based on a parametric mean assumption
(linearity). This underscores the value of utilizing the model's full empirical
content in combination with auxiliary assumptions.

arXiv link: http://arxiv.org/abs/2012.02390v2

Econometrics arXiv updated paper (originally submitted: 2020-12-03)

Inference in mixed causal and noncausal models with generalized Student's t-distributions

Authors: Francesco Giancaterini, Alain Hecq

The properties of Maximum Likelihood estimator in mixed causal and noncausal
models with a generalized Student's t error process are reviewed. Several known
existing methods are typically not applicable in the heavy-tailed framework. To
this end, a new approach to make inference on causal and noncausal parameters
in finite sample sizes is proposed. It exploits the empirical variance of the
generalized Student's-t, without the existence of population variance. Monte
Carlo simulations show a good performance of the new variance construction for
fat tail series. Finally, different existing approaches are compared using
three empirical applications: the variation of daily COVID-19 deaths in
Belgium, the monthly wheat prices, and the monthly inflation rate in Brazil.

arXiv link: http://arxiv.org/abs/2012.01888v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2020-12-03

Competition analysis on the over-the-counter credit default swap market

Authors: Louis Abraham

We study two questions related to competition on the OTC CDS market using
data collected as part of the EMIR regulation.
First, we study the competition between central counterparties through
collateral requirements. We present models that successfully estimate the
initial margin requirements. However, our estimations are not precise enough to
use them as input to a predictive model for CCP choice by counterparties in the
OTC market.
Second, we model counterpart choice on the interdealer market using a novel
semi-supervised predictive task. We present our methodology as part of the
literature on model interpretability before arguing for the use of conditional
entropy as the metric of interest to derive knowledge from data through a
model-agnostic approach. In particular, we justify the use of deep neural
networks to measure conditional entropy on real-world datasets. We create the
"Razor entropy" using the framework of algorithmic information theory
and derive an explicit formula that is identical to our semi-supervised
training objective. Finally, we borrow concepts from game theory to define
"top-k Shapley values". This novel method of payoff distribution
satisfies most of the properties of Shapley values, and is of particular
interest when the value function is monotone submodular. Unlike classical
Shapley values, top-k Shapley values can be computed in quadratic time of the
number of features instead of exponential. We implement our methodology and
report the results on our particular task of counterpart choice.
Finally, we present an improvement to the node2vec algorithm that
could for example be used to further study intermediation. We show that the
neighbor sampling used in the generation of biased walks can be performed in
logarithmic time with a quasilinear time pre-computation, unlike the current
implementations that do not scale well.

arXiv link: http://arxiv.org/abs/2012.01883v1

Econometrics arXiv paper, submitted: 2020-12-03

Bull and Bear Markets During the COVID-19 Pandemic

Authors: John M. Maheu, Thomas H. McCurdy, Yong Song

The COVID-19 pandemic has caused severe disruption to economic and financial
activity worldwide. We assess what happened to the aggregate U.S. stock market
during this period, including implications for both short and long-horizon
investors. Using the model of Maheu, McCurdy and Song (2012), we provide
smoothed estimates and out-of-sample forecasts associated with stock market
dynamics during the pandemic. We identify bull and bear market regimes
including their bull correction and bear rally components, demonstrate the
model's performance in capturing periods of significant regime change, and
provide forecasts that improve risk management and investment decisions. The
paper concludes with out-of-sample forecasts of market states one year ahead.

arXiv link: http://arxiv.org/abs/2012.01623v1

Econometrics arXiv paper, submitted: 2020-12-01

Testable Implications of Multiple Equilibria in Discrete Games with Correlated Types

Authors: Aureo de Paula, Xun Tang

We study testable implications of multiple equilibria in discrete games with
incomplete information. Unlike de Paula and Tang (2012), we allow the players'
private signals to be correlated. In static games, we leverage independence of
private types across games whose equilibrium selection is correlated. In
dynamic games with serially correlated discrete unobserved heterogeneity, our
testable implication builds on the fact that the distribution of a sequence of
choices and states are mixtures over equilibria and unobserved heterogeneity.
The number of mixture components is a known function of the length of the
sequence as well as the cardinality of equilibria and unobserved heterogeneity
support. In both static and dynamic cases, these testable implications are
implementable using existing statistical tools.

arXiv link: http://arxiv.org/abs/2012.00787v1

Econometrics arXiv updated paper (originally submitted: 2020-12-01)

Evaluating (weighted) dynamic treatment effects by double machine learning

Authors: Hugo Bodory, Martin Huber, Lukáš Lafférs

We consider evaluating the causal effects of dynamic treatments, i.e. of
multiple treatment sequences in various periods, based on double machine
learning to control for observed, time-varying covariates in a data-driven way
under a selection-on-observables assumption. To this end, we make use of
so-called Neyman-orthogonal score functions, which imply the robustness of
treatment effect estimation to moderate (local) misspecifications of the
dynamic outcome and treatment models. This robustness property permits
approximating outcome and treatment models by double machine learning even
under high dimensional covariates and is combined with data splitting to
prevent overfitting. In addition to effect estimation for the total population,
we consider weighted estimation that permits assessing dynamic treatment
effects in specific subgroups, e.g. among those treated in the first treatment
period. We demonstrate that the estimators are asymptotically normal and
$\sqrt{n}$-consistent under specific regularity conditions and investigate
their finite sample properties in a simulation study. Finally, we apply the
methods to the Job Corps study in order to assess different sequences of
training programs under a large set of covariates.

arXiv link: http://arxiv.org/abs/2012.00370v5

Econometrics arXiv updated paper (originally submitted: 2020-11-30)

Double machine learning for sample selection models

Authors: Michela Bia, Martin Huber, Lukáš Lafférs

This paper considers the evaluation of discretely distributed treatments when
outcomes are only observed for a subpopulation due to sample selection or
outcome attrition. For identification, we combine a selection-on-observables
assumption for treatment assignment with either selection-on-observables or
instrumental variable assumptions concerning the outcome attrition/sample
selection process. We also consider dynamic confounding, meaning that
covariates that jointly affect sample selection and the outcome may (at least
partly) be influenced by the treatment. To control in a data-driven way for a
potentially high dimensional set of pre- and/or post-treatment covariates, we
adapt the double machine learning framework for treatment evaluation to sample
selection problems. We make use of (a) Neyman-orthogonal, doubly robust, and
efficient score functions, which imply the robustness of treatment effect
estimation to moderate regularization biases in the machine learning-based
estimation of the outcome, treatment, or sample selection models and (b) sample
splitting (or cross-fitting) to prevent overfitting bias. We demonstrate that
the proposed estimators are asymptotically normal and root-n consistent under
specific regularity conditions concerning the machine learners and investigate
their finite sample properties in a simulation study. We also apply our
proposed methodology to the Job Corps data for evaluating the effect of
training on hourly wages which are only observed conditional on employment. The
estimator is available in the causalweight package for the statistical software
R.

arXiv link: http://arxiv.org/abs/2012.00745v5

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-11-30

An Automatic Finite-Sample Robustness Metric: When Can Dropping a Little Data Make a Big Difference?

Authors: Tamara Broderick, Ryan Giordano, Rachael Meager

Study samples often differ from the target populations of inference and
policy decisions in non-random ways. Researchers typically believe that such
departures from random sampling -- due to changes in the population over time
and space, or difficulties in sampling truly randomly -- are small, and their
corresponding impact on the inference should be small as well. We might
therefore be concerned if the conclusions of our studies are excessively
sensitive to a very small proportion of our sample data. We propose a method to
assess the sensitivity of applied econometric conclusions to the removal of a
small fraction of the sample. Manually checking the influence of all possible
small subsets is computationally infeasible, so we use an approximation to find
the most influential subset. Our metric, the "Approximate Maximum Influence
Perturbation," is based on the classical influence function, and is
automatically computable for common methods including (but not limited to) OLS,
IV, MLE, GMM, and variational Bayes. We provide finite-sample error bounds on
approximation performance. At minimal extra cost, we provide an exact
finite-sample lower bound on sensitivity. We find that sensitivity is driven by
a signal-to-noise ratio in the inference problem, is not reflected in standard
errors, does not disappear asymptotically, and is not due to misspecification.
While some empirical applications are robust, results of several influential
economics papers can be overturned by removing less than 1% of the sample.
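
An illustrative sketch for the OLS case (my simplification, not the authors'
general implementation): the classical influence function approximates how much
each observation moves the coefficient of interest, and the most influential
small subset is found by ranking those approximations.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 1000
    x = rng.normal(size=n)
    y = 0.1 * x + rng.normal(size=n)              # weak positive 'effect'
    X = np.column_stack([np.ones(n), x])

    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    XtX_inv = np.linalg.inv(X.T @ X)
    # Approximate contribution of each observation to the slope estimate
    influence = (X @ XtX_inv)[:, 1] * resid

    alpha = 0.01                                  # drop at most 1% of the sample
    k = int(np.floor(alpha * n))
    drop = np.argsort(-influence)[:k]             # removing these should most reduce the slope
    keep = np.setdiff1d(np.arange(n), drop)
    beta_dropped = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    print(beta[1], beta_dropped[1])               # does the qualitative conclusion survive?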

arXiv link: http://arxiv.org/abs/2011.14999v5

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2020-11-28

Adaptive Inference in Multivariate Nonparametric Regression Models Under Monotonicity

Authors: Koohyun Kwon, Soonwoo Kwon

We consider the problem of adaptive inference on a regression function at a
point under a multivariate nonparametric regression setting. The regression
function belongs to a H\"older class and is assumed to be monotone with respect
to some or all of the arguments. We derive the minimax rate of convergence for
confidence intervals (CIs) that adapt to the underlying smoothness, and provide
an adaptive inference procedure that obtains this minimax rate. The procedure
differs from that of Cai and Low (2004), intended to yield shorter CIs under
practically relevant specifications. The proposed method applies to general
linear functionals of the regression function, and is shown to have favorable
performance compared to existing inference procedures.

arXiv link: http://arxiv.org/abs/2011.14219v1

Econometrics arXiv paper, submitted: 2020-11-28

Inference in Regression Discontinuity Designs under Monotonicity

Authors: Koohyun Kwon, Soonwoo Kwon

We provide an inference procedure for the sharp regression discontinuity
design (RDD) under monotonicity, with possibly multiple running variables.
Specifically, we consider the case where the true regression function is
monotone with respect to (all or some of) the running variables and assumed to
lie in a Lipschitz smoothness class. Such a monotonicity condition is natural
in many empirical contexts, and the Lipschitz constant has an intuitive
interpretation. We propose a minimax two-sided confidence interval (CI) and an
adaptive one-sided CI. For the two-sided CI, the researcher is required to
choose a Lipschitz constant for the smoothness class in which she believes the
true regression function to lie. This is the only tuning parameter, and the resulting CI has uniform
coverage and obtains the minimax optimal length. The one-sided CI can be
constructed to maintain coverage over all monotone functions, providing maximum
credibility in terms of the choice of the Lipschitz constant. Moreover, the
monotonicity makes it possible for the (excess) length of the CI to adapt to
the true Lipschitz constant of the unknown regression function. Overall, the
proposed procedures make it easy to see under what conditions on the underlying
regression function the given estimates are significant, which can add more
transparency to research using RDD methods.

arXiv link: http://arxiv.org/abs/2011.14216v1

Econometrics arXiv paper, submitted: 2020-11-26

A Comparison of Statistical and Machine Learning Algorithms for Predicting Rents in the San Francisco Bay Area

Authors: Paul Waddell, Arezoo Besharati-Zadeh

Urban transportation and land use models have used theory and statistical
modeling methods to develop model systems that are useful in planning
applications. Machine learning methods have been considered too 'black box',
lacking interpretability, and their use has been limited within the land use
and transportation modeling literature. We present a use case in which
predictive accuracy is of primary importance, and compare the use of random
forest regression to multiple regression using ordinary least squares, to
predict rents per square foot in the San Francisco Bay Area using a large
volume of rental listings scraped from the Craigslist website. We find that we
are able to obtain useful predictions from both models using almost exclusively
local accessibility variables, though the predictive accuracy of the random
forest model is substantially higher.
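
A generic template for this kind of comparison (a synthetic stand-in for the
scraped Craigslist listings and the local accessibility features):

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import r2_score

    rng = np.random.default_rng(4)
    n = 5000
    X = rng.normal(size=(n, 8))
    y = 3 + X[:, 0] - 0.5 * X[:, 1] ** 2 + np.sin(X[:, 2]) + rng.normal(scale=0.5, size=n)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    ols = LinearRegression().fit(X_tr, y_tr)
    rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)
    print("OLS R^2:", r2_score(y_te, ols.predict(X_te)))
    print("RF  R^2:", r2_score(y_te, rf.predict(X_te)))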

arXiv link: http://arxiv.org/abs/2011.14924v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2020-11-26

Simultaneous inference for time-varying models

Authors: Sayar Karmakar, Stefan Richter, Wei Biao Wu

A general class of time-varying regression models is considered in this
paper. We estimate the regression coefficients by using local linear
M-estimation. For these estimators, weak Bahadur representations are obtained
and are used to construct simultaneous confidence bands. For practical
implementation, we propose a bootstrap-based method to circumvent the slow
logarithmic convergence of the theoretical simultaneous bands. Our results
substantially generalize and unify the treatments for several time-varying
regression and auto-regression models. The performance for ARCH and GARCH
models is studied in simulations and a few real-life applications of our study
are presented through analysis of some popular financial datasets.

arXiv link: http://arxiv.org/abs/2011.13157v2

Econometrics arXiv cross-link from General Economics (econ.GN), submitted: 2020-11-25

Implementation of a cost-benefit analysis of Demand-Responsive Transport with a Multi-Agent Transport Simulation

Authors: Conny Grunicke, Jan Christian Schlüter, Jani-Pekka Jokinen

In this paper, the technical requirements to perform a cost-benefit analysis
of a Demand Responsive Transport (DRT) service with the traffic simulation
software MATSim are elaborated in order to achieve the long-term goal of
assessing the introduction of a DRT service in G\"ottingen and the surrounding
area. The aim was to determine if the software is suitable for a cost-benefit
analysis while providing a user manual for building a basic simulation that can
be extended with public transport and DRT. The main result is that the software
is suitable for a cost-benefit analysis of a DRT service. In particular, the
most important internal and external costs, such as usage costs of the various
modes of transport and emissions, can be integrated into the simulation
scenarios. Thus, the scenarios presented in this paper can be extended by data
from a mobility study of G\"ottingen and its surroundings in order to achieve
the long-term goal. This paper is aimed at transport economists and researchers
who are not familiar with MATSim, to provide them with a guide for the first
steps in working with a traffic simulation software.

arXiv link: http://arxiv.org/abs/2011.12869v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-11-25

Functional Principal Component Analysis for Cointegrated Functional Time Series

Authors: Won-Ki Seo

Functional principal component analysis (FPCA) has played an important role
in the development of functional time series analysis. This note investigates
how FPCA can be used to analyze cointegrated functional time series and
proposes a modification of FPCA as a novel statistical tool. Our modified FPCA
not only provides an asymptotically more efficient estimator of the
cointegrating vectors, but also leads to novel FPCA-based tests for examining
essential properties of cointegrated functional time series.

arXiv link: http://arxiv.org/abs/2011.12781v8

Econometrics arXiv paper, submitted: 2020-11-23

Doubly weighted M-estimation for nonrandom assignment and missing outcomes

Authors: Akanksha Negi

This paper proposes a new class of M-estimators that double weight for the
twin problems of nonrandom treatment assignment and missing outcomes, both of
which are common issues in the treatment effects literature. The proposed class
is characterized by a `robustness' property, which makes it resilient to
parametric misspecification in either a conditional model of interest (for
example, mean or quantile function) or the two weighting functions. As leading
applications, the paper discusses estimation of two specific causal parameters;
average and quantile treatment effects (ATE, QTEs), which can be expressed as
functions of the doubly weighted estimator, under misspecification of the
framework's parametric components. With respect to the ATE, this paper shows
that the proposed estimator is doubly robust even in the presence of missing
outcomes. Finally, to demonstrate the estimator's viability in empirical
settings, it is applied to Calonico and Smith (2017)'s reconstructed sample
from the National Supported Work training program.
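
Schematically (my notation, not the paper's), with $D_i$ the treatment indicator,
$S_i$ the observed-outcome indicator, and $\hat{p}$, $\hat{\pi}$ the estimated
assignment and selection probabilities, a doubly weighted M-estimator of this
kind solves
$\hat{\theta} = \arg\min_{\theta} \frac{1}{N}\sum_{i=1}^{N} \frac{D_i}{\hat{p}(X_i)} \frac{S_i}{\hat{\pi}(X_i)}\, q(Y_i, X_i; \theta)$,
where $q$ is the loss of the conditional model of interest (for example a
squared-error or check-function loss for means and quantiles, respectively).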

arXiv link: http://arxiv.org/abs/2011.11485v1

Econometrics arXiv updated paper (originally submitted: 2020-11-22)

Non-Identifiability in Network Autoregressions

Authors: Federico Martellosio

We study identifiability of the parameters in autoregressions defined on a
network. Most identification conditions that are available for these models
either rely on the network being observed repeatedly, are only sufficient, or
require strong distributional assumptions. This paper derives conditions that
apply even when the individuals composing the network are observed only once,
are necessary and sufficient for identification, and require weak
distributional assumptions. We find that the model parameters are generically,
in the measure theoretic sense, identified even without repeated observations,
and analyze the combinations of the interaction matrix and the regressor matrix
causing identification failures. This is done both in the original model and
after certain transformations in the sample space, the latter case being
relevant, for example, in some fixed effects specifications.

arXiv link: http://arxiv.org/abs/2011.11084v2

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2020-11-22

Exploiting network information to disentangle spillover effects in a field experiment on teens' museum attendance

Authors: Silvia Noirjean, Marco Mariani, Alessandra Mattei, Fabrizia Mealli

A key element in the education of youths is their sensitization to historical
and artistic heritage. We analyze a field experiment conducted in Florence
(Italy) to assess how appropriate incentives assigned to high-school classes
may induce teens to visit museums in their free time. Non-compliance and
spillover effects make the impact evaluation of this clustered encouragement
design challenging. We propose to blend principal stratification and causal
mediation, by defining sub-populations of units according to their compliance
behavior and using the information on their friendship networks as mediator. We
formally define principal natural direct and indirect effects and principal
controlled direct and spillover effects, and use them to disentangle spillovers
from other causal channels. We adopt a Bayesian approach for inference.

arXiv link: http://arxiv.org/abs/2011.11023v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2020-11-20

Nonparametric instrumental regression with right censored duration outcomes

Authors: Jad Beyhum, Jean-Pierre Florens, Ingrid Van Keilegom

This paper analyzes the effect of a discrete treatment Z on a duration T. The
treatment is not randomly assigned. The confounding issue is treated using a
discrete instrumental variable explaining the treatment and independent of the
error term of the model. Our framework is nonparametric and allows for random
right censoring. This specification generates a nonlinear inverse problem and
the average treatment effect is derived from its solution. We provide local and
global identification properties that rely on a nonlinear system of equations.
We propose an estimation procedure to solve this system and derive rates of
convergence and conditions under which the estimator is asymptotically normal.
When censoring makes identification fail, we develop partial identification
results. Our estimators exhibit good finite sample properties in simulations.
We also apply our methodology to the Illinois Reemployment Bonus Experiment.

arXiv link: http://arxiv.org/abs/2011.10423v1

Econometrics arXiv updated paper (originally submitted: 2020-11-20)

A Semi-Parametric Bayesian Generalized Least Squares Estimator

Authors: Ruochen Wu, Melvyn Weeks

In this paper we propose a semi-parametric Bayesian Generalized Least Squares
estimator. In a generic setting where each error is a vector, the parametric
Generalized Least Square estimator maintains the assumption that each error
vector has the same distributional parameters. In reality, however, errors are
likely to be heterogeneous regarding their distributions. To cope with such
heterogeneity, a Dirichlet process prior is introduced for the distributional
parameters of the errors, leading to the error distribution being a mixture of
a variable number of normal distributions. Our method lets the number of normal
components be data-driven. Semi-parametric Bayesian estimators for two specific
cases are then presented: the Seemingly Unrelated Regression for equation
systems and the Random Effects Model for panel data. We design a series of
simulation experiments to explore the performance of our estimators. The
results demonstrate that our estimators obtain smaller posterior standard
deviations and mean squared errors than the Bayesian estimators using a
parametric mixture of normal distributions or a normal distribution. We then
apply our semi-parametric Bayesian estimators for equation systems and panel
data models to empirical data.

arXiv link: http://arxiv.org/abs/2011.10252v2

Econometrics arXiv cross-link from cs.CV (cs.CV), submitted: 2020-11-18

Visual Time Series Forecasting: An Image-driven Approach

Authors: Srijan Sood, Zhen Zeng, Naftali Cohen, Tucker Balch, Manuela Veloso

Time series forecasting is essential for agents to make decisions.
Traditional approaches rely on statistical methods to forecast given past
numeric values. In practice, end-users often rely on visualizations such as
charts and plots to reason about their forecasts. Inspired by practitioners, we
re-imagine the topic by creating a novel framework to produce visual forecasts,
similar to the way humans intuitively do. In this work, we leverage advances in
deep learning to extend the field of time series forecasting to a visual
setting. We capture input data as an image and train a model to produce the
subsequent image. This approach results in predicting distributions as opposed
to pointwise values. We examine various synthetic and real datasets with
diverse degrees of complexity. Our experiments show that visual forecasting is
effective for cyclic data but somewhat less for irregular data such as stock
price. Importantly, when using image-based evaluation metrics, we find the
proposed visual forecasting method to outperform various numerical baselines,
including ARIMA and a numerical variation of our method. We demonstrate the
benefits of incorporating vision-based approaches in forecasting tasks -- both
for the quality of the forecasts produced, as well as the metrics that can be
used to evaluate them.

arXiv link: http://arxiv.org/abs/2011.09052v3

Econometrics arXiv paper, submitted: 2020-11-18

A Two-Way Transformed Factor Model for Matrix-Variate Time Series

Authors: Zhaoxing Gao, Ruey S. Tsay

We propose a new framework for modeling high-dimensional matrix-variate time
series by a two-way transformation, where the transformed data consist of a
matrix-variate factor process, which is dynamically dependent, and three other
blocks of white noises. Specifically, for a given $p_1\times p_2$
matrix-variate time series, we seek common nonsingular transformations to
project the rows and columns onto another $p_1$ and $p_2$ directions according
to the strength of the dynamic dependence of the series on the past values.
Consequently, we treat the data as nonsingular linear row and column
transformations of dynamically dependent common factors and white noise
idiosyncratic components. We propose a common orthonormal projection method to
estimate the front and back loading matrices of the matrix-variate factors.
Under the setting that the largest eigenvalues of the covariance of the
vectorized idiosyncratic term diverge for large $p_1$ and $p_2$, we introduce a
two-way projected Principal Component Analysis (PCA) to estimate the associated
loading matrices of the idiosyncratic terms to mitigate such diverging noise
effects. A diagonal-path white noise testing procedure is proposed to estimate
the order of the factor matrix. Asymptotic properties of the
proposed method are established for both fixed and diverging dimensions as the
sample size increases to infinity. We use simulated and real examples to assess
the performance of the proposed method. We also compare our method with some
existing ones in the literature and find that the proposed approach not only
provides interpretable results but also performs well in out-of-sample
forecasting.

arXiv link: http://arxiv.org/abs/2011.09029v1

Econometrics arXiv updated paper (originally submitted: 2020-11-16)

Policy design in experiments with unknown interference

Authors: Davide Viviano, Jess Rudder

This paper studies experimental designs for estimation and inference on
policies with spillover effects. Units are organized into a finite number of
large clusters and interact in unknown ways within each cluster. First, we
introduce a single-wave experiment that, by varying the randomization across
cluster pairs, estimates the marginal effect of a change in treatment
probabilities, taking spillover effects into account. Using the marginal
effect, we propose a test for policy optimality. Second, we design a
multiple-wave experiment to estimate welfare-maximizing treatment rules. We
provide strong theoretical guarantees and an implementation in a large-scale
field experiment.

arXiv link: http://arxiv.org/abs/2011.08174v9

Econometrics arXiv cross-link from physics.soc-ph (physics.soc-ph), submitted: 2020-11-16

Causal motifs and existence of endogenous cascades in directed networks with application to company defaults

Authors: Irena Barjašić, Hrvoje Štefančić, Vedrana Pribičević, Vinko Zlatić

Motivated by the detection of cascades of defaults in economy, we developed a
detection framework for an endogenous spreading based on causal motifs we
define in this paper. We assume that the change of state of a vertex can be
triggered by an endogenous or an exogenous event, that the underlying network
is directed and that times when vertices changed their states are available. In
addition to the data of company defaults, we also simulate cascades driven by
different stochastic processes on different synthetic networks. We show that
some of the smallest motifs can robustly detect endogenous spreading events.
Finally, we apply the method to the data of defaults of Croatian companies and
observe the time window in which an endogenous cascade was likely happening.

arXiv link: http://arxiv.org/abs/2011.08148v2

Econometrics arXiv paper, submitted: 2020-11-14

A Framework for Eliciting, Incorporating, and Disciplining Identification Beliefs in Linear Models

Authors: Francis J. DiTraglia, Camilo Garcia-Jimeno

To estimate causal effects from observational data, an applied researcher
must impose beliefs. The instrumental variables exclusion restriction, for
example, represents the belief that the instrument has no direct effect on the
outcome of interest. Yet beliefs about instrument validity do not exist in
isolation. Applied researchers often discuss the likely direction of selection
and the potential for measurement error in their articles but lack formal tools
for incorporating this information into their analyses. Failing to use all
relevant information not only leaves money on the table; it runs the risk of
leading to a contradiction in which one holds mutually incompatible beliefs
about the problem at hand. To address these issues, we first characterize the
joint restrictions relating instrument invalidity, treatment endogeneity, and
non-differential measurement error in a workhorse linear model, showing how
beliefs over these three dimensions are mutually constrained by each other and
the data. Using this information, we propose a Bayesian framework to help
researchers elicit their beliefs, incorporate them into estimation, and ensure
their mutual coherence. We conclude by illustrating our framework in a number
of examples drawn from the empirical microeconomics literature.

arXiv link: http://arxiv.org/abs/2011.07276v1

Econometrics arXiv paper, submitted: 2020-11-14

Identifying the effect of a mis-classified, binary, endogenous regressor

Authors: Francis J. DiTraglia, Camilo Garcia-Jimeno

This paper studies identification of the effect of a mis-classified, binary,
endogenous regressor when a discrete-valued instrumental variable is available.
We begin by showing that the only existing point identification result for this
model is incorrect. We go on to derive the sharp identified set under mean
independence assumptions for the instrument and measurement error. The
resulting bounds are novel and informative, but fail to point identify the
effect of interest. This motivates us to consider alternative and slightly
stronger assumptions: we show that adding second and third moment independence
assumptions suffices to identify the model.

arXiv link: http://arxiv.org/abs/2011.07272v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-11-13

Rank Determination in Tensor Factor Model

Authors: Yuefeng Han, Rong Chen, Cun-Hui Zhang

Factor model is an appealing and effective analytic tool for high-dimensional
time series, with a wide range of applications in economics, finance and
statistics. This paper develops two criteria for the determination of the
number of factors for tensor factor models where the signal part of an observed
tensor time series assumes a Tucker decomposition with the core tensor as the
factor tensor. The task is to determine the dimensions of the core tensor. One
of the proposed criteria is similar to information based criteria of model
selection, and the other is an extension of the approaches based on the ratios
of consecutive eigenvalues often used in factor analysis for panel time series.
Theoretical results, including sufficient conditions and convergence rates,
are established. The results include vector factor models as special cases,
with additional convergence rates. Simulation studies provide promising
finite sample performance for the two criteria.

arXiv link: http://arxiv.org/abs/2011.07131v3
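
The eigenvalue-ratio idea behind the second criterion is easy to illustrate in a simple setting. Below is a minimal Python sketch, assuming simulated data with a Tucker-type factor structure; it applies the familiar vector-case eigenvalue-ratio rule to each mode's unfolding and is only an illustration of the ratio idea, not the paper's tensor-specific criteria.

```python
import numpy as np

def eigenvalue_ratio_rank(X, kmax=8):
    """Eigenvalue-ratio estimate of the number of factors from a (samples x variables) matrix."""
    S = X.T @ X / X.shape[0]                      # sample second-moment matrix
    lam = np.sort(np.linalg.eigvalsh(S))[::-1]    # eigenvalues, largest first
    ratios = lam[:kmax] / lam[1:kmax + 1]         # consecutive eigenvalue ratios
    return int(np.argmax(ratios)) + 1

# Toy tensor factor data: a (T x p1 x p2) series with Tucker ranks (3, 2)
rng = np.random.default_rng(0)
T, p1, p2, r1, r2 = 300, 20, 15, 3, 2
F = rng.normal(size=(T, r1, r2))                              # core factor tensor
A, B = rng.normal(size=(p1, r1)), rng.normal(size=(p2, r2))   # loading matrices
X = np.einsum('trs,ir,js->tij', F, A, B) + 0.5 * rng.normal(size=(T, p1, p2))

print("mode-1 rank:", eigenvalue_ratio_rank(X.transpose(0, 2, 1).reshape(T * p2, p1)))
print("mode-2 rank:", eigenvalue_ratio_rank(X.reshape(T * p1, p2)))
```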

Econometrics arXiv paper, submitted: 2020-11-13

A Generalized Focused Information Criterion for GMM

Authors: Minsu Chang, Francis J. DiTraglia

This paper proposes a criterion for simultaneous GMM model and moment
selection: the generalized focused information criterion (GFIC). Rather than
attempting to identify the "true" specification, the GFIC chooses from a set of
potentially mis-specified moment conditions and parameter restrictions to
minimize the mean-squared error (MSE) of a user-specified target parameter. The
intent of the GFIC is to formalize a situation common in applied practice. An
applied researcher begins with a set of fairly weak "baseline" assumptions,
assumed to be correct, and must decide whether to impose any of a number of
stronger, more controversial "suspect" assumptions that yield parameter
restrictions, additional moment conditions, or both. Provided that the baseline
assumptions identify the model, we show how to construct an asymptotically
unbiased estimator of the asymptotic MSE to select over these suspect
assumptions: the GFIC. We go on to provide results for post-selection inference
and model averaging that can be applied both to the GFIC and various
alternative selection criteria. To illustrate how our criterion can be used in
practice, we specialize the GFIC to the problem of selecting over exogeneity
assumptions and lag lengths in a dynamic panel model, and show that it performs
well in simulations. We conclude by applying the GFIC to a dynamic panel data
model for the price elasticity of cigarette demand.

arXiv link: http://arxiv.org/abs/2011.07085v1

Econometrics arXiv updated paper (originally submitted: 2020-11-13)

Identifying Causal Effects in Experiments with Spillovers and Non-compliance

Authors: Francis J. DiTraglia, Camilo Garcia-Jimeno, Rossa O'Keeffe-O'Donovan, Alejandro Sanchez-Becerra

This paper shows how to use a randomized saturation experimental design to
identify and estimate causal effects in the presence of spillovers--one
person's treatment may affect another's outcome--and one-sided
non-compliance--subjects can only be offered treatment, not compelled to take
it up. Two distinct causal effects are of interest in this setting: direct
effects quantify how a person's own treatment changes her outcome, while
indirect effects quantify how her peers' treatments change her outcome. We
consider the case in which spillovers occur within known groups, and take-up
decisions are invariant to peers' realized offers. In this setting we point
identify the effects of treatment-on-the-treated, both direct and indirect, in
a flexible random coefficients model that allows for heterogeneous treatment
effects and endogenous selection into treatment. We go on to propose a feasible
estimator that is consistent and asymptotically normal as the number and size
of groups increases. We apply our estimator to data from a large-scale job
placement services experiment, and find negative indirect treatment effects on
the likelihood of employment for those willing to take up the program. These
negative spillovers are offset by positive direct treatment effects from own
take-up.

arXiv link: http://arxiv.org/abs/2011.07051v3

Econometrics arXiv updated paper (originally submitted: 2020-11-13)

Dynamic factor, leverage and realized covariances in multivariate stochastic volatility

Authors: Yuta Yamauchi, Yasuhiro Omori

In the stochastic volatility models for multivariate daily stock returns, it
has been found that the estimates of parameters become unstable as the
dimension of returns increases. To solve this problem, we focus on the factor
structure of multiple returns and consider two additional sources of
information: first, the realized stock index associated with the market factor,
and second, the realized covariance matrix calculated from high frequency data.
The proposed dynamic factor model with the leverage effect and realized
measures is applied to ten of the top stocks composing the exchange traded fund
linked with the investment return of the S&P 500 index, and the model is shown to
have a stable advantage in portfolio performance.

arXiv link: http://arxiv.org/abs/2011.06909v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2020-11-13

Population synthesis for urban resident modeling using deep generative models

Authors: Martin Johnsen, Oliver Brandt, Sergio Garrido, Francisco C. Pereira

The impacts of new real estate developments are strongly associated with their
population distribution (types and compositions of households, incomes, social
demographics) conditioned on aspects such as dwelling typology, price,
location, and floor level. This paper presents a Machine Learning based method
to model the population distribution of upcoming developments of new buildings
within larger neighborhood/condo settings.
We use a real data set from Ecopark Township, a real estate development
project in Hanoi, Vietnam, where we study two machine learning algorithms from
the deep generative models literature to create a population of synthetic
agents: Conditional Variational Auto-Encoder (CVAE) and Conditional Generative
Adversarial Networks (CGAN). A large experimental study was performed, showing
that the CVAE outperforms both the empirical distribution (a non-trivial
baseline model) and the CGAN in estimating the population distribution of new
real estate development projects.

arXiv link: http://arxiv.org/abs/2011.06851v1

Econometrics arXiv updated paper (originally submitted: 2020-11-13)

Weak Identification in Discrete Choice Models

Authors: David T. Frazier, Eric Renault, Lina Zhang, Xueyan Zhao

We study the impact of weak identification in discrete choice models, and
provide insights into the determinants of identification strength in these
models. Using these insights, we propose a novel test that can consistently
detect weak identification in commonly applied discrete choice models, such as
probit, logit, and many of their extensions. Furthermore, we demonstrate that
when the null hypothesis of weak identification is rejected, Wald-based
inference can be carried out using standard formulas and critical values. A
Monte Carlo study compares our proposed testing approach against commonly
applied weak identification tests. The results simultaneously demonstrate the
good performance of our approach and the fundamental failure of using
conventional weak identification tests for linear models in the discrete choice
model context. Furthermore, we compare our approach against those commonly
applied in the literature in two empirical examples: married women labor force
participation, and US food aid and civil conflicts.

arXiv link: http://arxiv.org/abs/2011.06753v2

Econometrics arXiv updated paper (originally submitted: 2020-11-13)

When Should We (Not) Interpret Linear IV Estimands as LATE?

Authors: Tymon Słoczyński

In this paper I revisit the interpretation of the linear instrumental
variables (IV) estimand as a weighted average of conditional local average
treatment effects (LATEs). I focus on a situation in which additional
covariates are required for identification while the reduced-form and
first-stage regressions may be misspecified due to an implicit homogeneity
restriction on the effects of the instrument. I show that the weights on some
conditional LATEs are negative and the IV estimand is no longer interpretable
as a causal effect under a weaker version of monotonicity, i.e. when there are
compliers but no defiers at some covariate values and defiers but no compliers
elsewhere. The problem of negative weights disappears in the interacted
specification of Angrist and Imbens (1995), which avoids misspecification and
seems to be underused in applied work. I illustrate my findings in an
application to the causal effects of pretrial detention on case outcomes. In
this setting, I reject the stronger version of monotonicity, demonstrate that
the interacted instruments are sufficiently strong for consistent estimation
using the jackknife methodology, and present several estimates that are
economically and statistically different, depending on whether the interacted
instruments are used.

arXiv link: http://arxiv.org/abs/2011.06695v7
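
As a rough illustration of the interacted specification referred to above, the following Python sketch runs 2SLS with both the instrument and the treatment interacted with covariate-group dummies, so each group gets its own first stage and its own IV coefficient. The data-generating process is made up for the example, and this is not the paper's empirical implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
g = rng.integers(0, 3, size=n)                       # discrete covariate groups
z = rng.integers(0, 2, size=n)                       # binary instrument
u = rng.normal(size=n)                               # unobserved confounder
d = ((0.6 + 0.4 * g) * z + u > 0.5).astype(float)    # first stage varies by group
y = (1.0 + g) * d + g + u + rng.normal(size=n)       # true effects: 1, 2, 3 by group

G = np.eye(3)[g]                                     # group dummies (saturated controls)
Z = np.column_stack([G, G * z[:, None]])             # instrument interacted with groups
X = np.column_stack([G, G * d[:, None]])             # treatment interacted with groups

# 2SLS: project the regressors on the instruments, then regress y on the projection
Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
beta = np.linalg.lstsq(Xhat, y, rcond=None)[0]
print("group-specific IV estimates:", beta[3:].round(2))   # roughly 1, 2 and 3
```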

Econometrics arXiv updated paper (originally submitted: 2020-11-12)

Treatment Allocation with Strategic Agents

Authors: Evan Munro

There is increasing interest in allocating treatments based on observed
individual characteristics: examples include targeted marketing, individualized
credit offers, and heterogeneous pricing. Treatment personalization introduces
incentives for individuals to modify their behavior to obtain a better
treatment. Strategic behavior shifts the joint distribution of covariates and
potential outcomes. The optimal rule without strategic behavior allocates
treatments only to those with a positive Conditional Average Treatment Effect.
With strategic behavior, we show that the optimal rule can involve
randomization, allocating treatments with less than 100% probability even to
those who respond positively on average to the treatment. We propose a
sequential experiment based on Bayesian Optimization that converges to the
optimal treatment rule without parametric assumptions on individual strategic
behavior.

arXiv link: http://arxiv.org/abs/2011.06528v5
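
To give a flavor of the sequential-experiment idea, here is a stylized Bayesian Optimization loop over a single rule parameter, using a Gaussian-process surrogate with a UCB acquisition rule. The welfare function, kernel, and noise level are placeholder assumptions, and the strategic behavior that is central to the paper is not modeled.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def observed_welfare(c):
    """Noisy welfare of the rule 'treat if score < c' (unknown to the analyst)."""
    return -(c - 0.6) ** 2 + 0.05 * rng.normal()

grid = np.linspace(0, 1, 200).reshape(-1, 1)
C = [0.1, 0.9]                                        # two initial experimental waves
W = [observed_welfare(c) for c in C]

for wave in range(10):
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=0.05 ** 2)
    gp.fit(np.array(C).reshape(-1, 1), W)
    mu, sd = gp.predict(grid, return_std=True)
    c_next = float(grid[np.argmax(mu + 1.96 * sd), 0])   # UCB acquisition rule
    C.append(c_next)
    W.append(observed_welfare(c_next))

print("recommended rule parameter:", round(C[int(np.argmax(W))], 3))
```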

Econometrics arXiv updated paper (originally submitted: 2020-11-12)

Gaussian Transforms Modeling and the Estimation of Distributional Regression Functions

Authors: Richard Spady, Sami Stouli

We propose flexible Gaussian representations for conditional cumulative
distribution functions and give a concave likelihood criterion for their
estimation. Optimal representations satisfy the monotonicity property of
conditional cumulative distribution functions, including in finite samples and
under general misspecification. We use these representations to provide a
unified framework for the flexible Maximum Likelihood estimation of conditional
density, cumulative distribution, and quantile functions at parametric rate.
Our formulation yields substantial simplifications and finite sample
improvements over related methods. An empirical application to the gender wage
gap in the United States illustrates our framework.

arXiv link: http://arxiv.org/abs/2011.06416v2

Econometrics arXiv updated paper (originally submitted: 2020-11-12)

Mostly Harmless Machine Learning: Learning Optimal Instruments in Linear IV Models

Authors: Jiafeng Chen, Daniel L. Chen, Greg Lewis

We offer straightforward theoretical results that justify incorporating
machine learning in the standard linear instrumental variable setting. The key
idea is to use machine learning, combined with sample-splitting, to predict the
treatment variable from the instrument and any exogenous covariates, and then
use this predicted treatment and the covariates as technical instruments to
recover the coefficients in the second-stage. This allows the researcher to
extract non-linear co-variation between the treatment and instrument that may
dramatically improve estimation precision and robustness by boosting instrument
strength. Importantly, we constrain the machine-learned predictions to be
linear in the exogenous covariates, thus avoiding spurious identification
arising from non-linear relationships between the treatment and the covariates.
We show that this approach delivers consistent and asymptotically normal
estimates under weak conditions and that it may be adapted to be
semiparametrically efficient (Chamberlain, 1992). Our method preserves standard
intuitions and interpretations of linear instrumental variable methods,
including under weak identification, and provides a simple, user-friendly
upgrade to the applied economics toolbox. We illustrate our method with an
example in law and criminal justice, examining the causal effect of appellate
court reversals on district court sentencing decisions.

arXiv link: http://arxiv.org/abs/2011.06158v3
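
A minimal sketch of the idea on simulated data: a random forest with sample splitting predicts the treatment from the instrument and an exogenous covariate, and the cross-fitted prediction is then used as a technical instrument in ordinary 2SLS. The paper's constraint that the machine-learned predictions be linear in the exogenous covariates, and its inference results, are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=(n, 1))                            # exogenous covariate
z = rng.normal(size=n)                                 # instrument
u = rng.normal(size=n)                                 # unobserved confounder
d = np.sin(2 * z) + 0.5 * x[:, 0] + u + 0.5 * rng.normal(size=n)   # nonlinear first stage
y = 1.5 * d + x[:, 0] - u + rng.normal(size=n)         # true effect of d is 1.5

# Cross-fitted machine-learning prediction of the treatment from (z, x)
zx = np.column_stack([z, x])
dhat = np.zeros(n)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(zx):
    rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(zx[train], d[train])
    dhat[test] = rf.predict(zx[test])

# Use dhat (plus the exogenous covariates) as technical instruments in standard 2SLS
W = np.column_stack([np.ones(n), x])                   # exogenous regressors
Z = np.column_stack([W, dhat])                         # instruments
X = np.column_stack([W, d])                            # regressors incl. endogenous d
Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]        # first-stage projection
beta = np.linalg.lstsq(Xhat, y, rcond=None)[0]
print("2SLS estimate of the treatment effect:", round(beta[-1], 3))
```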

Econometrics arXiv paper, submitted: 2020-11-10

Testing and Dating Structural Changes in Copula-based Dependence Measures

Authors: Florian Stark, Sven Otto

This paper is concerned with testing and dating structural breaks in the
dependence structure of multivariate time series. We consider a cumulative sum
(CUSUM) type test for constant copula-based dependence measures, such as
Spearman's rank correlation and quantile dependencies. The asymptotic null
distribution is not known in closed form and critical values are estimated by
an i.i.d. bootstrap procedure. We analyze size and power properties in a
simulation study under different dependence measure settings, such as skewed
and fat-tailed distributions. To date break points and to decide whether two
estimated break locations belong to the same break event, we propose a pivot
confidence interval procedure. Finally, we apply the test to the historical
data of ten large financial firms during the last financial crisis from 2002 to
mid-2013.

arXiv link: http://arxiv.org/abs/2011.05036v1
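
The sketch below illustrates a CUSUM-type statistic for the constancy of Spearman's rank correlation, with critical values from an i.i.d. bootstrap of the data pairs. It is a simplified reading of the abstract on simulated data, not the paper's exact procedure.

```python
import numpy as np
from scipy.stats import spearmanr

def cusum_spearman(x, y, step=5):
    """CUSUM-type statistic for a change in Spearman's rho (illustrative version)."""
    n = len(x)
    rho_full = spearmanr(x, y)[0]
    stats = [(k / np.sqrt(n)) * abs(spearmanr(x[:k], y[:k])[0] - rho_full)
             for k in range(20, n - 20, step)]          # trim the ends for stability
    return max(stats)

rng = np.random.default_rng(0)
n = 400
e = rng.normal(size=(n, 2))
x = e[:, 0]
y = np.where(np.arange(n) < n // 2, 0.2, 0.9) * x + e[:, 1]   # dependence strengthens at n/2

stat = cusum_spearman(x, y)
# i.i.d. bootstrap of the pairs to approximate the null distribution of the statistic
boot = [cusum_spearman(*rng.choice(np.column_stack([x, y]), size=n, replace=True).T)
        for _ in range(200)]
print("test statistic:", round(stat, 3),
      "| bootstrap 95% critical value:", round(float(np.quantile(boot, 0.95)), 3))
```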

Econometrics arXiv paper, submitted: 2020-11-10

Optimal Policy Learning: From Theory to Practice

Authors: Giovanni Cerulli

Following in the footsteps of the literature on empirical welfare
maximization, this paper wants to contribute by stressing the policymaker
perspective via a practical illustration of an optimal policy assignment
problem. More specifically, by focusing on the class of threshold-based
policies, we first set up the theoretical underpinnings of the policymaker
selection problem, to then offer a practical solution to this problem via an
empirical illustration using the popular LaLonde (1986) training program
dataset. The paper proposes an implementation protocol for the optimal solution
that is straightforward to apply and easy to program with standard statistical
software.

arXiv link: http://arxiv.org/abs/2011.04993v1
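
A minimal sketch of threshold-based policy choice on simulated experimental data: for each candidate threshold, the average outcome under the corresponding assignment rule is estimated by inverse-propensity weighting, and the threshold maximizing this estimate is selected. The data-generating process and the known propensity of 0.5 are assumptions made for the example, not the paper's protocol.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(0, 1, size=n)                  # observed characteristic
d = rng.integers(0, 2, size=n)                 # randomized treatment, P(D=1) = 0.5
tau = np.where(x < 0.4, 2.0, -1.0)             # treatment helps only below x = 0.4
y = x + tau * d + rng.normal(size=n)

def ipw_welfare(threshold, p=0.5):
    """Estimated mean outcome if exactly those with x below `threshold` were treated."""
    follow = (x < threshold) == (d == 1)       # realized treatment matches the candidate rule
    w = follow / np.where(d == 1, p, 1 - p)    # inverse-propensity weights
    return np.mean(w * y)

grid = np.linspace(0, 1, 101)
welfare = [ipw_welfare(c) for c in grid]
print("estimated optimal threshold:", round(grid[int(np.argmax(welfare))], 2))
```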

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-11-09

Reducing bias in difference-in-differences models using entropy balancing

Authors: Matthew Cefalu, Brian G. Vegetabile, Michael Dworsky, Christine Eibner, Federico Girosi

This paper illustrates the use of entropy balancing in
difference-in-differences analyses when pre-intervention outcome trends suggest
a possible violation of the parallel trends assumption. We describe a set of
assumptions under which weighting to balance intervention and comparison groups
on pre-intervention outcome trends leads to consistent
difference-in-differences estimates even when pre-intervention outcome trends
are not parallel. Simulated results verify that entropy balancing of
pre-intervention outcomes trends can remove bias when the parallel trends
assumption is not directly satisfied, and thus may enable researchers to use
difference-in-differences designs in a wider range of observational settings
than previously acknowledged.

arXiv link: http://arxiv.org/abs/2011.04826v1
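
A minimal sketch of entropy balancing in its simplest form, matching the means of pre-intervention covariates (for example, outcome trends) between comparison and treated units by solving the convex dual problem. The simulated data and the choice of balancing only first moments are assumptions for illustration; the paper's difference-in-differences setup is not reproduced.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_c, n_t = 400, 100
X_c = rng.normal(0.0, 1.0, size=(n_c, 3))      # comparison group: pre-period outcome trends
X_t = rng.normal(0.5, 1.0, size=(n_t, 3))      # treated group has different pre-trends
target = X_t.mean(axis=0)

def dual(lam):
    # Convex dual of entropy balancing: log-sum-exp of X_c @ lam minus lam'target
    return np.log(np.exp(X_c @ lam).sum()) - lam @ target

lam = minimize(dual, np.zeros(3), method="BFGS").x
w = np.exp(X_c @ lam)
w /= w.sum()                                   # entropy-balancing weights on comparison units

print("treated pre-trend means:    ", target.round(3))
print("reweighted comparison means:", (w @ X_c).round(3))
```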

Econometrics arXiv updated paper (originally submitted: 2020-11-09)

Sparse time-varying parameter VECMs with an application to modeling electricity prices

Authors: Niko Hauzenberger, Michael Pfarrhofer, Luca Rossini

In this paper we propose a time-varying parameter (TVP) vector error
correction model (VECM) with heteroskedastic disturbances. We propose tools to
carry out dynamic model specification in an automatic fashion. This involves
using global-local priors, and postprocessing the parameters to achieve truly
sparse solutions. Depending on the respective set of coefficients, we achieve
this via minimizing auxiliary loss functions. Our two-step approach limits
overfitting and reduces parameter estimation uncertainty. We apply this
framework to modeling European electricity prices. When considering daily
electricity prices for different markets jointly, our model highlights the
importance of explicitly addressing cointegration and nonlinearities. In a
forecast exercise focusing on hourly prices for Germany, our approach yields
competitive metrics of predictive accuracy.

arXiv link: http://arxiv.org/abs/2011.04577v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-11-09

DoWhy: An End-to-End Library for Causal Inference

Authors: Amit Sharma, Emre Kiciman

In addition to efficient statistical estimators of a treatment's effect,
successful application of causal inference requires specifying assumptions
about the mechanisms underlying observed data and testing whether they are
valid, and to what extent. However, most libraries for causal inference focus
only on the task of providing powerful statistical estimators. We describe
DoWhy, an open-source Python library that is built with causal assumptions as
its first-class citizens, based on the formal framework of causal graphs to
specify and test causal assumptions. DoWhy presents an API for the four steps
common to any causal analysis---1) modeling the data using a causal graph and
structural assumptions, 2) identifying whether the desired effect is estimable
under the causal model, 3) estimating the effect using statistical estimators,
and finally 4) refuting the obtained estimate through robustness checks and
sensitivity analyses. In particular, DoWhy implements a number of robustness
checks including placebo tests, bootstrap tests, and tests for unobserved
confounding. DoWhy is an extensible library that supports interoperability with
other implementations, such as EconML and CausalML, for the estimation step.
The library is available at https://github.com/microsoft/dowhy

arXiv link: http://arxiv.org/abs/2011.04216v1
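
A minimal usage sketch of the four steps on simulated data, assuming DoWhy is installed; the method names follow the library's documented API but may differ across versions.

```python
import numpy as np
import pandas as pd
from dowhy import CausalModel

# Simulated data with one observed common cause w of treatment t and outcome y
rng = np.random.default_rng(0)
n = 1000
w = rng.normal(size=n)
t = (w + rng.normal(size=n) > 0).astype(int)
y = 2.0 * t + w + rng.normal(size=n)
df = pd.DataFrame({"y": y, "t": t, "w": w})

# 1) model, 2) identify, 3) estimate, 4) refute
model = CausalModel(data=df, treatment="t", outcome="y", common_causes=["w"])
estimand = model.identify_effect(proceed_when_unidentifiable=True)
estimate = model.estimate_effect(estimand, method_name="backdoor.linear_regression")
refutation = model.refute_estimate(estimand, estimate,
                                   method_name="placebo_treatment_refuter")
print("estimated effect:", estimate.value)     # close to the true effect of 2
print(refutation)
```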

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2020-11-08

Inference under Superspreading: Determinants of SARS-CoV-2 Transmission in Germany

Authors: Patrick W. Schmidt

Superspreading complicates the study of SARS-CoV-2 transmission. I propose a
model for aggregated case data that accounts for superspreading and improves
statistical inference. In a Bayesian framework, the model is estimated on
German data featuring over 60,000 cases with date of symptom onset and age
group. Several factors were associated with a strong reduction in transmission:
public awareness rising, testing and tracing, information on local incidence,
and high temperature. Immunity after infection, school and restaurant closures,
stay-at-home orders, and mandatory face covering were associated with a smaller
reduction in transmission. The data suggests that public distancing rules
increased transmission in young adults. Information on local incidence was
associated with a reduction in transmission of up to 44% (95%-CI: [40%, 48%]),
which suggests a prominent role of behavioral adaptations to local risk of
infection. Testing and tracing reduced transmission by 15% (95%-CI: [9%,20%]),
where the effect was strongest among the elderly. Extrapolating weather
effects, I estimate that transmission increases by 53% (95%-CI: [43%, 64%]) in
colder seasons.

arXiv link: http://arxiv.org/abs/2011.04002v1

Econometrics arXiv updated paper (originally submitted: 2020-11-08)

Do We Exploit all Information for Counterfactual Analysis? Benefits of Factor Models and Idiosyncratic Correction

Authors: Jianqing Fan, Ricardo P. Masini, Marcelo C. Medeiros

Optimal pricing, i.e., determining the price level that maximizes profit or
revenue of a given product, is a vital task for the retail industry. To select
such a quantity, one needs first to estimate the price elasticity from the
product demand. Regression methods usually fail to recover such elasticities
due to confounding effects and price endogeneity. Therefore, randomized
experiments are typically required. However, elasticities can be highly
heterogeneous depending on the location of stores, for example. As the
randomization frequently occurs at the municipal level, standard
difference-in-differences methods may also fail. Possible solutions are based
on methodologies to measure the effects of treatments on a single (or just a
few) treated unit(s) based on counterfactuals constructed from artificial
controls. For example, for each city in the treatment group, a counterfactual
may be constructed from the untreated locations. In this paper, we apply a
novel high-dimensional statistical method to measure the effects of price
changes on daily sales from a major retailer in Brazil. The proposed
methodology combines principal components (factors) and sparse regressions,
resulting in a method called Factor-Adjusted Regularized Method for Treatment
evaluation (FarmTreat). The data consist of daily sales and prices of
five different products over more than 400 municipalities. The products
considered belong to the sweets and candies category, and experiments have
been conducted over the years of 2016 and 2017. Our results confirm the
hypothesis of a high degree of heterogeneity yielding very different pricing
strategies over distinct municipalities.

arXiv link: http://arxiv.org/abs/2011.03996v3

Econometrics arXiv updated paper (originally submitted: 2020-11-06)

Robust Forecasting

Authors: Timothy Christensen, Hyungsik Roger Moon, Frank Schorfheide

We use a decision-theoretic framework to study the problem of forecasting
discrete outcomes when the forecaster is unable to discriminate among a set of
plausible forecast distributions because of partial identification or concerns
about model misspecification or structural breaks. We derive "robust" forecasts
which minimize maximum risk or regret over the set of forecast distributions.
We show that for a large class of models including semiparametric panel data
models for dynamic discrete choice, the robust forecasts depend in a natural
way on a small number of convex optimization problems which can be simplified
using duality methods. Finally, we derive "efficient robust" forecasts to deal
with the problem of first having to estimate the set of forecast distributions
and develop a suitable asymptotic efficiency theory. Forecasts obtained by
replacing nuisance parameters that characterize the set of forecast
distributions with efficient first-stage estimators can be strictly dominated
by our efficient robust forecasts.

arXiv link: http://arxiv.org/abs/2011.03153v4

Econometrics arXiv updated paper (originally submitted: 2020-11-05)

Bias correction for quantile regression estimators

Authors: Grigory Franguridi, Bulat Gafarov, Kaspar Wuthrich

We study the bias of classical quantile regression and instrumental variable
quantile regression estimators. While being asymptotically first-order
unbiased, these estimators can have non-negligible second-order biases. We
derive a higher-order stochastic expansion of these estimators using empirical
process theory. Based on this expansion, we derive an explicit formula for the
second-order bias and propose a feasible bias correction procedure that uses
finite-difference estimators of the bias components. The proposed bias
correction method performs well in simulations. We provide an empirical
illustration using Engel's classical data on household food expenditure.

arXiv link: http://arxiv.org/abs/2011.03073v8

Econometrics arXiv updated paper (originally submitted: 2020-11-05)

A Basket Half Full: Sparse Portfolios

Authors: Ekaterina Seregina

The existing approaches to sparse wealth allocations (1) are limited to a
low-dimensional setup in which the number of assets is less than the sample size;
(2) lack theoretical analysis of sparse wealth allocations and their impact on
portfolio exposure; (3) are suboptimal due to the bias induced by an
$\ell_1$-penalty. We address these shortcomings and develop an approach to
construct sparse portfolios in high dimensions. Our contribution is twofold:
from the theoretical perspective, we establish the oracle bounds of sparse
weight estimators and provide guidance regarding their distribution. From the
empirical perspective, we examine the merit of sparse portfolios during
different market scenarios. We find that in contrast to non-sparse
counterparts, our strategy is robust to recessions and can be used as a hedging
vehicle during such times.

arXiv link: http://arxiv.org/abs/2011.04278v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2020-11-04

Debiasing classifiers: is reality at variance with expectation?

Authors: Ashrya Agrawal, Florian Pfisterer, Bernd Bischl, Francois Buet-Golfouse, Srijan Sood, Jiahao Chen, Sameena Shah, Sebastian Vollmer

We present an empirical study of debiasing methods for classifiers, showing
that debiasers often fail in practice to generalize out-of-sample, and can in
fact make fairness worse rather than better. A rigorous evaluation of the
debiasing treatment effect requires extensive cross-validation beyond what is
usually done. We demonstrate that this phenomenon can be explained as a
consequence of bias-variance trade-off, with an increase in variance
necessitated by imposing a fairness constraint. Follow-up experiments validate
the theoretical prediction that the estimation variance depends strongly on the
base rates of the protected class. Considering fairness--performance trade-offs
justifies the counterintuitive notion that partial debiasing can actually yield
better results in practice on out-of-sample data.

arXiv link: http://arxiv.org/abs/2011.02407v2

Econometrics arXiv paper, submitted: 2020-11-04

Adaptive Combinatorial Allocation

Authors: Maximilian Kasy, Alexander Teytelboym

We consider settings where an allocation has to be chosen repeatedly, returns
are unknown but can be learned, and decisions are subject to constraints. Our
model covers two-sided and one-sided matching, even with complex constraints.
We propose an approach based on Thompson sampling. Our main result is a
prior-independent finite-sample bound on the expected regret for this
algorithm. Although the number of allocations grows exponentially in the number
of participants, the bound does not depend on this number. We illustrate the
performance of our algorithm using data on refugee resettlement in the United
States.

arXiv link: http://arxiv.org/abs/2011.02330v1
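
The core sampling-then-allocating step can be illustrated with a plain Beta-Bernoulli Thompson sampler over a handful of feasible allocations; the combinatorial constraints and matching structure studied in the paper are omitted, and the success probabilities below are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
true_success = np.array([0.3, 0.5, 0.7])   # unknown returns of three feasible allocations
alpha = np.ones(3)                         # Beta prior: successes
beta = np.ones(3)                          # Beta prior: failures

for t in range(5000):
    draws = rng.beta(alpha, beta)          # sample a plausible return for each allocation
    a = int(np.argmax(draws))              # pick the allocation that looks best in this draw
    reward = rng.random() < true_success[a]
    alpha[a] += reward                     # update the posterior of the chosen allocation
    beta[a] += 1 - reward

print("posterior mean returns:", (alpha / (alpha + beta)).round(3))
```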

Econometrics arXiv updated paper (originally submitted: 2020-11-04)

Learning from Forecast Errors: A New Approach to Forecast Combinations

Authors: Tae-Hwy Lee, Ekaterina Seregina

Forecasters often use common information and hence make common mistakes. We
propose a new approach, Factor Graphical Model (FGM), to forecast combinations
that separates idiosyncratic forecast errors from the common errors. FGM
exploits the factor structure of forecast errors and the sparsity of the
precision matrix of the idiosyncratic errors. We prove the consistency of
forecast combination weights and mean squared forecast error estimated using
FGM, supporting the results with extensive simulations. Empirical applications
to forecasting macroeconomic series show that forecast combination using FGM
outperforms combined forecasts using equal weights and graphical models that do
not incorporate the factor structure of forecast errors.

arXiv link: http://arxiv.org/abs/2011.02077v2
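
A rough Python sketch of the low-rank-plus-sparse idea on simulated forecast errors: PCA strips out the common error component, a graphical lasso estimates a sparse precision matrix for the idiosyncratic part, the two are recombined via the Woodbury identity, and the combination weights are proportional to the row sums of the resulting precision matrix. The tuning choices are placeholders, and this is not the authors' implementation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
T, p, k = 400, 20, 2                                 # periods, forecasters, common error factors
common = rng.normal(size=(T, k)) @ rng.normal(size=(k, p))
errors = common + 0.6 * rng.normal(size=(T, p))      # simulated forecast errors

# 1) Strip out the common component of the errors with PCA
pca = PCA(n_components=k).fit(errors)
B = pca.components_.T                                # estimated loadings (p x k)
scores = pca.transform(errors)                       # estimated common error factors
idio = errors - pca.inverse_transform(scores)

# 2) Sparse precision matrix of the idiosyncratic errors via the graphical lasso
Theta_u = GraphicalLasso(alpha=0.1, max_iter=500).fit(idio).precision_

# 3) Precision of total errors via the Woodbury identity, then combination weights
Sigma_f = np.cov(scores, rowvar=False)
M = np.linalg.inv(np.linalg.inv(Sigma_f) + B.T @ Theta_u @ B)
Theta = Theta_u - Theta_u @ B @ M @ B.T @ Theta_u
ones = np.ones(p)
w = Theta @ ones / (ones @ Theta @ ones)             # minimum-MSFE combination weights
print("combination weights sum to", round(float(w.sum()), 3))
```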

Econometrics arXiv updated paper (originally submitted: 2020-11-02)

Instrumental Variable Identification of Dynamic Variance Decompositions

Authors: Mikkel Plagborg-Møller, Christian K. Wolf

Macroeconomists increasingly use external sources of exogenous variation for
causal inference. However, unless such external instruments (proxies) capture
the underlying shock without measurement error, existing methods are silent on
the importance of that shock for macroeconomic fluctuations. We show that, in a
general moving average model with external instruments, variance decompositions
for the instrumented shock are interval-identified, with informative bounds.
Various additional restrictions guarantee point identification of both variance
and historical decompositions. Unlike SVAR analysis, our methods do not require
invertibility. Applied to U.S. data, they give a tight upper bound on the
importance of monetary shocks for inflation dynamics.

arXiv link: http://arxiv.org/abs/2011.01380v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2020-11-02

Coresets for Regressions with Panel Data

Authors: Lingxiao Huang, K. Sudhir, Nisheeth K. Vishnoi

This paper introduces the problem of coresets for regression problems to
panel data settings. We first define coresets for several variants of
regression problems with panel data and then present efficient algorithms to
construct coresets whose size depends polynomially on 1/$\varepsilon$ (where
$\varepsilon$ is the error parameter) and the number of regression parameters,
independent of the number of individuals in the panel data or the number of time units
each individual is observed for. Our approach is based on the Feldman-Langberg
framework in which a key step is to upper bound the "total sensitivity" that is
roughly the sum of maximum influences of all individual-time pairs taken over
all possible choices of regression parameters. Empirically, we assess our
approach with synthetic and real-world datasets; the coreset sizes constructed
using our approach are much smaller than the full dataset and coresets indeed
accelerate the running time of computing the regression objective.

arXiv link: http://arxiv.org/abs/2011.00981v2

Econometrics arXiv updated paper (originally submitted: 2020-11-02)

Nowcasting Growth using Google Trends Data: A Bayesian Structural Time Series Model

Authors: David Kohns, Arnab Bhattacharjee

This paper investigates the benefits of internet search data in the form of
Google Trends for nowcasting real U.S. GDP growth in real time through the lens
of mixed-frequency Bayesian Structural Time Series (BSTS) models. We augment
and enhance both the model and the methodology to make them better suited to
nowcasting with a large number of potential covariates. Specifically, we allow
shrinking state variances towards zero to avoid overfitting, extend the SSVS
(spike and slab variable selection) prior to the more flexible
normal-inverse-gamma prior which stays agnostic about the underlying model
size, as well as adapt the horseshoe prior to the BSTS. The application to
nowcasting GDP growth as well as a simulation study demonstrate that the
horseshoe prior BSTS improves markedly upon the SSVS and the original BSTS
model, with the largest gains in dense data-generating processes. Our
application also shows that a large dimensional set of search terms is able to
improve nowcasts early in a specific quarter before other macroeconomic data
become available. Search terms with high inclusion probability have good
economic interpretation, reflecting leading signals of economic anxiety and
wealth effects.

arXiv link: http://arxiv.org/abs/2011.00938v2

Econometrics arXiv updated paper (originally submitted: 2020-11-01)

Optimal Portfolio Using Factor Graphical Lasso

Authors: Tae-Hwy Lee, Ekaterina Seregina

Graphical models are a powerful tool to estimate a high-dimensional inverse
covariance (precision) matrix, which has been applied for a portfolio
allocation problem. These models assume that the precision matrix is sparse.
However, when stock returns are driven by common factors, this assumption does
not hold. We address this limitation and develop a
framework, Factor Graphical Lasso (FGL), which integrates graphical models with
the factor structure in the context of portfolio allocation by decomposing a
precision matrix into low-rank and sparse components. Our theoretical results
and simulations show that FGL consistently estimates the portfolio weights and
risk exposure and also that FGL is robust to heavy-tailed distributions which
makes our method suitable for financial applications. FGL-based portfolios are
shown to exhibit superior performance over several prominent competitors
including equal-weighted and Index portfolios in the empirical application for
the S&P500 constituents.

arXiv link: http://arxiv.org/abs/2011.00435v5

Econometrics arXiv updated paper (originally submitted: 2020-10-31)

Causal Inference for Spatial Treatments

Authors: Michael Pollmann

Many events and policies (treatments) occur at specific spatial locations,
with researchers interested in their effects on nearby units of interest. I
approach the spatial treatment setting from an experimental perspective: What
ideal experiment would we design to estimate the causal effects of spatial
treatments? This perspective motivates a comparison between individuals near
realized treatment locations and individuals near counterfactual (unrealized)
candidate locations, which differs from current empirical practice. I derive
design-based standard errors that are straightforward to compute irrespective
of spatial correlations in outcomes. Furthermore, I propose machine learning
methods to find counterfactual candidate locations using observational data
under unconfounded assignment of the treatment to locations. I apply the
proposed methods to study the causal effects of grocery stores on foot traffic
to nearby businesses during COVID-19 shelter-in-place policies, finding a
substantial positive effect at a very short distance, with no effect at larger
distances.

arXiv link: http://arxiv.org/abs/2011.00373v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2020-10-31

Estimating County-Level COVID-19 Exponential Growth Rates Using Generalized Random Forests

Authors: Zhaowei She, Zilong Wang, Turgay Ayer, Asmae Toumi, Jagpreet Chhatwal

Rapid and accurate detection of community outbreaks is critical to address
the threat of resurgent waves of COVID-19. A practical challenge in outbreak
detection is balancing accuracy vs. speed. In particular, while estimation
accuracy improves with longer fitting windows, speed degrades. This paper
presents a machine learning framework to balance this tradeoff using
generalized random forests (GRF), and applies it to detect county-level
COVID-19 outbreaks. This algorithm chooses an adaptive fitting window size for
each county based on relevant features affecting the disease spread, such as
changes in social distancing policies. Experiment results show that our method
outperforms any non-adaptive window size choices in 7-day ahead COVID-19
outbreak case number predictions.

arXiv link: http://arxiv.org/abs/2011.01219v4

Econometrics arXiv paper, submitted: 2020-10-30

Nonparametric Identification of Production Function, Total Factor Productivity, and Markup from Revenue Data

Authors: Hiroyuki Kasahara, Yoichi Sugita

Commonly used methods of production function and markup estimation assume
that a firm's output quantity can be observed as data, but typical datasets
contain only revenue, not output quantity. We examine the nonparametric
identification of production function and markup from revenue data when a firm
faces a general nonparametric demand function under imperfect competition. Under
standard assumptions, we provide the constructive nonparametric identification
of various firm-level objects: gross production function, total factor
productivity, price markups over marginal costs, output prices, output
quantities, a demand system, and a representative consumer's utility function.

arXiv link: http://arxiv.org/abs/2011.00143v1

Econometrics arXiv paper, submitted: 2020-10-29

Machine Learning for Experimental Design: Methods for Improved Blocking

Authors: Brian Quistorff, Gentry Johnson

Restricting randomization in the design of experiments (e.g., using
blocking/stratification, pair-wise matching, or rerandomization) can improve
the treatment-control balance on important covariates and therefore improve the
estimation of the treatment effect, particularly for small- and medium-sized
experiments. Existing guidance on how to identify these variables and implement
the restrictions is incomplete and conflicting. We find that these differences
arise mainly because what is important in the pre-treatment data may not
translate to the post-treatment data. We highlight settings where there is
sufficient data to provide clear guidance and outline improved methods to
mostly automate the process using modern machine learning (ML) techniques. We
show in simulations using real-world data, that these methods reduce both the
mean squared error of the estimate (14%-34%) and the size of the standard error
(6%-16%).

arXiv link: http://arxiv.org/abs/2010.15966v1

Econometrics arXiv updated paper (originally submitted: 2020-10-29)

Identification and Estimation of Unconditional Policy Effects of an Endogenous Binary Treatment: An Unconditional MTE Approach

Authors: Julian Martinez-Iriarte, Yixiao Sun

This paper studies the identification and estimation of policy effects when
treatment status is binary and endogenous. We introduce a new class of marginal
treatment effects (MTEs) based on the influence function of the functional
underlying the policy target. We show that an unconditional policy effect can
be represented as a weighted average of the newly defined MTEs over the
individuals who are indifferent about their treatment status. We provide
conditions for point identification of the unconditional policy effects. When a
quantile is the functional of interest, we introduce the UNconditional
Instrumental Quantile Estimator (UNIQUE) and establish its consistency and
asymptotic distribution. In the empirical application, we estimate the effect
of changing college enrollment status, induced by higher tuition subsidy, on
the quantiles of the wage distribution.

arXiv link: http://arxiv.org/abs/2010.15864v6

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2020-10-29

Multiscale characteristics of the emerging global cryptocurrency market

Authors: Marcin Wątorek, Stanisław Drożdż, Jarosław Kwapień, Ludovico Minati, Paweł Oświęcimka, Marek Stanuszek

The review introduces the history of cryptocurrencies, offering a description
of the blockchain technology behind them. It also describes the differences
between cryptocurrencies and the exchanges on which they are traded. The central part
surveys the analysis of cryptocurrency price changes on various platforms. The
statistical properties of the fluctuations in the cryptocurrency market have
been compared to the traditional markets. With the help of the latest
statistical physics methods the non-linear correlations and multiscale
characteristics of the cryptocurrency market are analyzed. In the last part the
co-evolution of the correlation structure among the 100 cryptocurrencies having
the largest capitalization is retraced. The detailed topology of cryptocurrency
network on the Binance platform from bitcoin perspective is also considered.
Finally, an interesting observation on the Covid-19 pandemic impact on the
cryptocurrency market is presented and discussed: recently we have witnessed a
"phase transition" of the cryptocurrencies from being a hedge opportunity for
the investors fleeing the traditional markets to become a part of the global
market that is substantially coupled to the traditional financial instruments
like the currencies, stocks, and commodities.
The main contribution is an extensive demonstration that structural
self-organization in the cryptocurrency markets has caused them to attain
complexity characteristics that are nearly indistinguishable from those of the
Forex market at the level of individual time series. However, the
cross-correlations between the exchange rates on cryptocurrency platforms differ
from those of the Forex market. The
cryptocurrency market is less synchronized and the information flows more
slowly, which results in more frequent arbitrage opportunities. The methodology
used in the review allows the latter to be detected, and lead-lag relationships
to be discovered.

arXiv link: http://arxiv.org/abs/2010.15403v2

Econometrics arXiv paper, submitted: 2020-10-28

Modeling European regional FDI flows using a Bayesian spatial Poisson interaction model

Authors: Tamás Krisztin, Philipp Piribauer

This paper presents an empirical study of spatial origin and destination
effects of European regional FDI dyads. Recent regional studies primarily focus
on locational determinants, but ignore bilateral origin- and intervening
factors, as well as associated spatial dependence. This paper fills this gap by
using observations on interregional FDI flows within a spatially augmented
Poisson interaction model. We explicitly distinguish FDI activities between
three different stages of the value chain. Our results provide important
insights on drivers of regional FDI activities, both from origin and
destination perspectives. We moreover show that spatial dependence plays a key
role in both dimensions.

arXiv link: http://arxiv.org/abs/2010.14856v1

Econometrics arXiv updated paper (originally submitted: 2020-10-28)

Deep Learning for Individual Heterogeneity

Authors: Max H. Farrell, Tengyuan Liang, Sanjog Misra

This paper integrates deep neural networks (DNNs) into structural economic
models to increase flexibility and capture rich heterogeneity while preserving
interpretability. Economic structure and machine learning are complements in
empirical modeling, not substitutes: DNNs provide the capacity to learn
complex, non-linear heterogeneity patterns, while the structural model ensures
the estimates remain interpretable and suitable for decision making and policy
analysis. We start with a standard parametric structural model and then enrich
its parameters into fully flexible functions of observables, which are
estimated using a particular DNN architecture whose structure reflects the
economic model. We illustrate our framework by studying demand estimation in
consumer choice. We show that by enriching a standard demand model we can
capture rich heterogeneity, and further, exploit this heterogeneity to create a
personalized pricing strategy. This type of optimization is not possible
without economic structure, but cannot be heterogeneous without machine
learning. Finally, we provide theoretical justification of each step in our
proposed methodology. We first establish non-asymptotic bounds and convergence
rates of our structural deep learning approach. Next, a novel and quite general
influence function calculation allows for feasible inference via double machine
learning in a wide variety of contexts. These results may be of interest in
many other contexts, as they generalize prior work.

arXiv link: http://arxiv.org/abs/2010.14694v3

Econometrics arXiv paper, submitted: 2020-10-27

E-Commerce Delivery Demand Modeling Framework for An Agent-Based Simulation Platform

Authors: Takanori Sakai, Yusuke Hara, Ravi Seshadri, André Alho, Md Sami Hasnine, Peiyu Jing, ZhiYuan Chua, Moshe Ben-Akiva

E-commerce delivery demand has grown rapidly in the past two decades, and this
trend has accelerated tremendously due to the ongoing coronavirus pandemic.
Given the situation, the need for predicting e-commerce delivery
demand and evaluating relevant logistics solutions is increasing. However, the
existing simulation models for e-commerce delivery demand are still limited and
do not consider the delivery options and their attributes that shoppers face on
e-commerce order placements. We propose a novel modeling framework which
jointly predicts the average total value of e-commerce purchase, the purchase
amount per transaction, and delivery option choices. The proposed framework can
simulate the changes in e-commerce delivery demand attributable to the changes
in delivery options. We assume the model parameters based on various sources of
relevant information and conduct a demonstrative sensitivity analysis.
Furthermore, we have applied the model to the simulation for the
Auto-Innovative Prototype city. While the calibration of the model using
real-world survey data is required, the result of the analysis highlights the
applicability of the proposed framework.

arXiv link: http://arxiv.org/abs/2010.14375v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2020-10-27

The Efficiency Gap

Authors: Timo Dimitriadis, Tobias Fissler, Johanna Ziegel

Parameter estimation via M- and Z-estimation is equally powerful in
semiparametric models for one-dimensional functionals due to a one-to-one
relation between corresponding loss and identification functions via
integration and differentiation. For multivariate functionals such as multiple
moments, quantiles, or the pair (Value at Risk, Expected Shortfall), this
one-to-one relation fails and not every identification function possesses an
antiderivative. The most important implication is an efficiency gap: The most
efficient Z-estimator often outperforms the most efficient M-estimator. We
theoretically establish this phenomenon for multiple quantiles at different
levels and for the pair (Value at Risk, Expected Shortfall), and illustrate the
gap numerically. Our results further give guidance for pseudo-efficient
M-estimation for semiparametric models of the Value at Risk and Expected
Shortfall.

arXiv link: http://arxiv.org/abs/2010.14146v3

Econometrics arXiv updated paper (originally submitted: 2020-10-26)

Consumer Theory with Non-Parametric Taste Uncertainty and Individual Heterogeneity

Authors: Christopher Dobronyi, Christian Gouriéroux

We introduce two models of non-parametric random utility for demand systems:
the stochastic absolute risk aversion (SARA) model, and the stochastic
safety-first (SSF) model. In each model, individual-level heterogeneity is
characterized by a distribution $\pi\in\Pi$ of taste parameters, and
heterogeneity across consumers is introduced using a distribution $F$ over the
distributions in $\Pi$. Demand is non-separable and heterogeneity is
infinite-dimensional. Both models admit corner solutions. We consider two
frameworks for estimation: a Bayesian framework in which $F$ is known, and a
hyperparametric (or empirical Bayesian) framework in which $F$ is a member of a
known parametric family. Our methods are illustrated by an application to a
large U.S. panel of scanner data on alcohol consumption.

arXiv link: http://arxiv.org/abs/2010.13937v4

Econometrics arXiv updated paper (originally submitted: 2020-10-26)

Modeling Long Cycles

Authors: Natasha Kang, Vadim Marmer

Recurrent boom-and-bust cycles are a salient feature of economic and
financial history. Cycles found in the data are stochastic, often highly
persistent, and span substantial fractions of the sample size. We refer to such
cycles as "long". In this paper, we develop a novel approach to modeling
cyclical behavior specifically designed to capture long cycles. We show that
existing inferential procedures may produce misleading results in the presence
of long cycles, and propose a new econometric procedure for the inference on
the cycle length. Our procedure is asymptotically valid regardless of the cycle
length. We apply our methodology to a set of macroeconomic and financial
variables for the U.S. We find evidence of long stochastic cycles in the
standard business cycle variables, as well as in credit and house prices.
However, we rule out the presence of stochastic cycles in asset market data.
Moreover, according to our results, financial cycles, as characterized by credit
and house prices, tend to be twice as long as business cycles.

arXiv link: http://arxiv.org/abs/2010.13877v4

Econometrics arXiv paper, submitted: 2020-10-26

What can be learned from satisfaction assessments?

Authors: Naftali Cohen, Simran Lamba, Prashant Reddy

Companies survey their customers to measure their satisfaction levels with
the company and its services. The received responses are crucial as they allow
companies to assess their respective performances and find ways to make needed
improvements. This study focuses on the non-systematic bias that arises when
customers assign numerical values in ordinal surveys. Using real customer
satisfaction survey data of a large retail bank, we show that the common
practice of segmenting ordinal survey responses into uneven segments limits the
value that can be extracted from the data. We then show that it is possible to
assess the magnitude of the irreducible error under simple assumptions, even in
real surveys, and place the achievable modeling goal in perspective. We finish
the study by suggesting that a thoughtful survey design, which uses either a
careful binning strategy or proper calibration, can reduce the compounding
non-systematic error even in elaborated ordinal surveys. A possible application
of the calibration method we propose is efficiently conducting targeted surveys
using active learning.

arXiv link: http://arxiv.org/abs/2010.13340v1

Econometrics arXiv updated paper (originally submitted: 2020-10-26)

A Systematic Comparison of Forecasting for Gross Domestic Product in an Emergent Economy

Authors: Kleyton da Costa, Felipe Leite Coelho da Silva, Josiane da Silva Cordeiro Coelho, André de Melo Modenesi

Gross domestic product (GDP) is an important economic indicator that
aggregates useful information to assist economic agents and policymakers in
their decision-making process. In this context, GDP forecasting becomes a
powerful decision-optimization tool in several areas. To contribute in
this direction, we investigated the efficiency of classical time series models,
the state-space models, and the neural network models, applied to Brazilian
gross domestic product. The models used were: a Seasonal Autoregressive
Integrated Moving Average (SARIMA) and a Holt-Winters method, which are
classical time series models; the dynamic linear model, a state-space model;
and neural network autoregression and the multilayer perceptron, artificial
neural network models. Based on statistical metrics of model comparison, the
multilayer perceptron presented the best in-sample and out-of-sample forecasting
performance for the analyzed period, while also capturing the growth-rate
structure of the series.

arXiv link: http://arxiv.org/abs/2010.13259v2

Econometrics arXiv updated paper (originally submitted: 2020-10-25)

Recurrent Conditional Heteroskedasticity

Authors: T. -N. Nguyen, M. -N. Tran, R. Kohn

We propose a new class of financial volatility models, called the REcurrent
Conditional Heteroskedastic (RECH) models, to improve both in-sample analysis
and out-of-sample forecasting of the traditional conditional heteroskedastic
models. In particular, we incorporate auxiliary deterministic processes,
governed by recurrent neural networks, into the conditional variance of the
traditional conditional heteroskedastic models, e.g. GARCH-type models, to
flexibly capture the dynamics of the underlying volatility. RECH models can
detect interesting effects in financial volatility overlooked by the existing
conditional heteroskedastic models such as the GARCH, GJR and EGARCH. The new
models often have good out-of-sample forecasts while still explaining well the
stylized facts of financial volatility by retaining the well-established
features of econometric GARCH-type models. These properties are illustrated
through simulation studies and applications to thirty-one stock indices and
exchange rate data. A user-friendly software package, together with the
examples reported in the paper, is available at https://github.com/vbayeslab.
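
As a rough illustration of the RECH idea (not the authors' exact specification), the sketch below augments a GARCH(1,1) variance recursion with a single recurrent unit; the parameter names and the softplus link are illustrative assumptions.

```python
import numpy as np

def rech_variance(returns, omega, alpha, beta, gamma, v, w, b):
    # Hedged sketch: GARCH(1,1) conditional variance augmented by a
    # recurrent component h_t (one tanh unit, passed through a softplus
    # so its contribution stays positive). Names are illustrative only.
    T = len(returns)
    sigma2 = np.empty(T)
    sigma2[0] = np.var(returns)
    h = 0.0
    for t in range(1, T):
        h = np.tanh(v * returns[t - 1] + w * h + b)   # recurrent state
        neural = np.log1p(np.exp(h))                  # positive neural term
        sigma2[t] = (omega + gamma * neural
                     + alpha * returns[t - 1] ** 2
                     + beta * sigma2[t - 1])
    return sigma2
```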

arXiv link: http://arxiv.org/abs/2010.13061v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2020-10-23

Off-Policy Evaluation of Bandit Algorithm from Dependent Samples under Batch Update Policy

Authors: Masahiro Kato, Yusuke Kaneko

The goal of off-policy evaluation (OPE) is to evaluate a new policy using
historical data obtained via a behavior policy. However, because the contextual
bandit algorithm updates the policy based on past observations, the samples are
not independent and identically distributed (i.i.d.). This paper tackles this
problem by constructing an estimator from a martingale difference sequence
(MDS) for the dependent samples. In the data-generating process, we do not
assume the convergence of the policy, but the policy uses the same conditional
probability of choosing an action during a certain period. Then, we derive an
asymptotically normal estimator of the value of an evaluation policy. As
another advantage of our method, the batch-based approach simultaneously solves
the deficient support problem. Using benchmark and real-world datasets, we
experimentally confirm the effectiveness of the proposed method.
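
For orientation only, a bare-bones importance-weighted estimate of a policy's value looks like the sketch below; the paper's martingale-difference construction and batch structure for dependent samples are not reproduced.

```python
import numpy as np

def ipw_policy_value(rewards, behavior_probs, eval_probs):
    # Importance-weighted (IPW) estimate of the evaluation policy's value
    # from logged (action, reward) data. behavior_probs and eval_probs are
    # the probabilities each policy assigns to the logged actions.
    weights = eval_probs / behavior_probs
    return float(np.mean(weights * rewards))
```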

arXiv link: http://arxiv.org/abs/2010.13554v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2020-10-23

A Practical Guide of Off-Policy Evaluation for Bandit Problems

Authors: Masahiro Kato, Kenshi Abe, Kaito Ariu, Shota Yasui

Off-policy evaluation (OPE) is the problem of estimating the value of a
target policy from samples obtained via different policies. Recently, applying
OPE methods for bandit problems has garnered attention. For the theoretical
guarantees of an estimator of the policy value, the OPE methods require various
conditions on the target policy and the policy used to generate the samples.
However, existing studies have not carefully discussed the practical situations
in which such conditions hold, and a gap between theory and practice remains.
This paper aims to bridge that gap. Based on the properties of the
evaluation policy, we categorize OPE situations. Then, among practical
applications, we focus mainly on best policy selection. For this situation,
we propose a meta-algorithm based on existing OPE estimators. We investigate
the proposed concepts using synthetic and open real-world datasets in
experiments.

arXiv link: http://arxiv.org/abs/2010.12470v1

Econometrics arXiv updated paper (originally submitted: 2020-10-23)

Low-Rank Approximations of Nonseparable Panel Models

Authors: Iván Fernández-Val, Hugo Freeman, Martin Weidner

We provide estimation methods for nonseparable panel models based on low-rank
factor structure approximations. The factor structures are estimated by
matrix-completion methods to deal with the computational challenges of
principal component analysis in the presence of missing data. We show that the
resulting estimators are consistent in large panels, but suffer from
approximation and shrinkage biases. We correct these biases using matching and
difference-in-differences approaches. Numerical examples and an empirical
application to the effect of election day registration on voter turnout in the
U.S. illustrate the properties and usefulness of our methods.

arXiv link: http://arxiv.org/abs/2010.12439v2

Econometrics arXiv paper, submitted: 2020-10-23

Forecasting With Factor-Augmented Quantile Autoregressions: A Model Averaging Approach

Authors: Anthoulla Phella

This paper considers forecasts of the growth and inflation distributions of
the United Kingdom with factor-augmented quantile autoregressions under a model
averaging framework. We investigate model combinations using
weights that minimise the Akaike Information Criterion (AIC), the Bayesian
Information Criterion (BIC), the Quantile Regression Information Criterion
(QRIC) as well as the leave-one-out cross validation criterion. The unobserved
factors are estimated by principal components of a large panel with N
predictors over T periods under a recursive estimation scheme. We apply the
aforementioned methods to the UK GDP growth and CPI inflation rate. We find
that, on average, for GDP growth, in terms of coverage and final prediction
error, the equal weights or the weights obtained by the AIC and BIC perform
equally well but are outperformed by the QRIC and the Jackknife approach on the
majority of the quantiles of interest. In contrast, the naive QAR(1) model of
inflation outperforms all model averaging methodologies.
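
As an illustration, one common convention for turning information criteria into combination weights is the exponential ("smooth") weighting below; the paper's exact weight construction, and its QRIC and jackknife variants, may differ.

```python
import numpy as np

def ic_model_weights(ic_values):
    # Smooth information-criterion weights: w_m proportional to
    # exp(-0.5 * (IC_m - min IC)), so the best model gets the largest weight.
    ic = np.asarray(ic_values, dtype=float)
    w = np.exp(-0.5 * (ic - ic.min()))
    return w / w.sum()

# Illustrative use: combine three candidate quantile forecasts f1, f2, f3.
# weights = ic_model_weights([bic_1, bic_2, bic_3])
# combined_forecast = weights @ np.vstack([f1, f2, f3])
```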

arXiv link: http://arxiv.org/abs/2010.12263v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2020-10-22

Theory-based residual neural networks: A synergy of discrete choice models and deep neural networks

Authors: Shenhao Wang, Baichuan Mo, Jinhua Zhao

Researchers often treat data-driven and theory-driven models as two disparate
or even conflicting methods in travel behavior analysis. However, the two
methods are highly complementary because data-driven methods are more
predictive but less interpretable and robust, while theory-driven methods are
more interpretable and robust but less predictive. Using their complementary
nature, this study designs a theory-based residual neural network (TB-ResNet)
framework, which synergizes discrete choice models (DCMs) and deep neural
networks (DNNs) based on their shared utility interpretation. The TB-ResNet
framework is simple, as it uses a ($\delta$, 1-$\delta$) weighting to take
advantage of DCMs' simplicity and DNNs' richness, and to prevent underfitting
from the DCMs and overfitting from the DNNs. This framework is also flexible:
three instances of TB-ResNets are designed based on multinomial logit model
(MNL-ResNets), prospect theory (PT-ResNets), and hyperbolic discounting
(HD-ResNets), which are tested on three data sets. Compared to pure DCMs, the
TB-ResNets provide greater prediction accuracy and reveal a richer set of
behavioral mechanisms owing to the utility function augmented by the DNN
component in the TB-ResNets. Compared to pure DNNs, the TB-ResNets can modestly
improve prediction and significantly improve interpretation and robustness,
because the DCM component in the TB-ResNets stabilizes the utility functions
and input gradients. Overall, this study demonstrates that it is both feasible
and desirable to synergize DCMs and DNNs by combining their utility
specifications under a TB-ResNet framework. Although some limitations remain,
this TB-ResNet framework is an important first step to create mutual benefits
between DCMs and DNNs for travel behavior modeling, with joint improvement in
prediction, interpretation, and robustness.
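
The ($\delta$, 1-$\delta$) weighting can be sketched as follows; the linear utility and the `dnn` callable are placeholders standing in for the paper's DCM and DNN components.

```python
import numpy as np

def tb_resnet_utility(X_alt, beta, dnn, delta):
    # Convex combination of a linear discrete-choice utility and a DNN
    # utility, one value per alternative. `dnn` is any callable mapping the
    # attribute matrix to a vector of utilities; names are illustrative.
    u_dcm = X_alt @ beta
    u_dnn = dnn(X_alt)
    return delta * u_dcm + (1.0 - delta) * u_dnn

def mnl_probabilities(utilities):
    # Multinomial-logit choice probabilities over the alternatives.
    e = np.exp(utilities - utilities.max())
    return e / e.sum()
```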

arXiv link: http://arxiv.org/abs/2010.11644v1

Econometrics arXiv paper, submitted: 2020-10-22

Approximation-Robust Inference in Dynamic Discrete Choice

Authors: Ben Deaner

Estimation and inference in dynamic discrete choice models often relies on
approximation to lower the computational burden of dynamic programming.
Unfortunately, the use of approximation can impart substantial bias in
estimation and results in invalid confidence sets. We present a method for set
estimation and inference that explicitly accounts for the use of approximation
and is thus valid regardless of the approximation error. We show how one can
account for the error from approximation at low computational cost. Our
methodology allows researchers to assess the estimation error due to the use of
approximation and thus more effectively manage the trade-off between bias and
computational expedience. We provide simulation evidence to demonstrate the
practicality of our approach.

arXiv link: http://arxiv.org/abs/2010.11482v1

Econometrics arXiv updated paper (originally submitted: 2020-10-21)

A Test for Kronecker Product Structure Covariance Matrix

Authors: Patrik Guggenberger, Frank Kleibergen, Sophocles Mavroeidis

We propose a test for a covariance matrix to have Kronecker Product Structure
(KPS). KPS implies a reduced rank restriction on a certain transformation of
the covariance matrix and the new procedure is an adaptation of the Kleibergen
and Paap (2006) reduced rank test. Deriving the limiting distribution of the
Wald-type test statistic proves challenging, partly because of the singularity
of the covariance matrix estimator that appears in the weighting matrix. We
show that the test statistic has a chi-square limiting null distribution with
degrees of freedom equal to the number of restrictions tested. Local asymptotic
power results are derived. Monte Carlo simulations reveal good size and power
properties of the test. Re-examining fifteen highly cited papers conducting
instrumental variable regressions, we find that KPS is not rejected in 56 out
of 118 specifications at the 5% nominal size.

arXiv link: http://arxiv.org/abs/2010.10961v4

Econometrics arXiv paper, submitted: 2020-10-21

Worst-case sensitivity

Authors: Jun-ya Gotoh, Michael Jong Kim, Andrew E. B. Lim

We introduce the notion of Worst-Case Sensitivity, defined as the worst-case
rate of increase in the expected cost of a Distributionally Robust Optimization
(DRO) model when the size of the uncertainty set vanishes. We show that
worst-case sensitivity is a Generalized Measure of Deviation and that a large
class of DRO models are essentially mean-(worst-case) sensitivity problems when
uncertainty sets are small, unifying recent results on the relationship between
DRO and regularized empirical optimization with worst-case sensitivity playing
the role of the regularizer. More generally, DRO solutions can be sensitive to
the family and size of the uncertainty set, and reflect the properties of its
worst-case sensitivity. We derive closed-form expressions of worst-case
sensitivity for well known uncertainty sets including smooth $\phi$-divergence,
total variation, "budgeted" uncertainty sets, uncertainty sets corresponding to
a convex combination of expected value and CVaR, and the Wasserstein metric.
These can be used to select the uncertainty set and its size for a given
application.

arXiv link: http://arxiv.org/abs/2010.10794v1

Econometrics arXiv updated paper (originally submitted: 2020-10-20)

A Simple, Short, but Never-Empty Confidence Interval for Partially Identified Parameters

Authors: Jörg Stoye

This paper revisits the simple, but empirically salient, problem of inference
on a real-valued parameter that is partially identified through upper and lower
bounds with asymptotically normal estimators. A simple confidence interval is
proposed and is shown to have the following properties:
- It is never empty or awkwardly short, including when the sample analog of
the identified set is empty.
- It is valid for a well-defined pseudotrue parameter whether or not the
model is well-specified.
- It involves no tuning parameters and minimal computation.
Computing the interval requires concentrating out one scalar nuisance
parameter. In most cases, the practical result will be simple: To achieve 95%
coverage, report the union of a simple 90% (!) confidence interval for the
identified set and a standard 95% confidence interval for the pseudotrue
parameter.
For uncorrelated estimators -- notably if bounds are estimated from distinct
subsamples -- and conventional coverage levels, validity of this simple
procedure can be shown analytically. The case obtains in the motivating
empirical application (de Quidt, Haushofer, and Roth, 2018), in which
improvement over existing inference methods is demonstrated. More generally,
simulations suggest that the novel confidence interval has excellent length and
size control. This is partly because, in anticipation of never being empty, the
interval can be made shorter than conventional ones in relevant regions of
sample space.
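
A crude reading of the "report the union" recipe for uncorrelated bound estimators might look like the sketch below; the pseudotrue point estimate and its standard error are user-supplied assumptions here, whereas the paper's general procedure concentrates out a nuisance parameter.

```python
from scipy.stats import norm

def union_confidence_interval(lo_hat, hi_hat, se_lo, se_hi, mid_hat, se_mid):
    # Union of a simple 90% confidence interval for the identified set
    # [lo, hi] and a standard 95% confidence interval for a pseudotrue
    # point (mid_hat, se_mid). Intended only as an illustration of the
    # recipe quoted in the abstract.
    z90, z95 = norm.ppf(0.95), norm.ppf(0.975)
    set_ci = (lo_hat - z90 * se_lo, hi_hat + z90 * se_hi)
    point_ci = (mid_hat - z95 * se_mid, mid_hat + z95 * se_mid)
    return (min(set_ci[0], point_ci[0]), max(set_ci[1], point_ci[1]))
```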

arXiv link: http://arxiv.org/abs/2010.10484v3

Econometrics arXiv paper, submitted: 2020-10-20

Time-varying Forecast Combination for High-Dimensional Data

Authors: Bin Chen, Kenwin Maung

In this paper, we propose a new nonparametric estimator of time-varying
forecast combination weights. When the number of individual forecasts is small,
we study the asymptotic properties of the local linear estimator. When the
number of candidate forecasts exceeds or diverges with the sample size, we
consider penalized local linear estimation with the group SCAD penalty. We show
that the estimator exhibits the oracle property and correctly selects relevant
forecasts with probability approaching one. Simulations indicate that the
proposed estimators outperform existing combination schemes when structural
changes exist. Two empirical studies on inflation forecasting and equity
premium prediction highlight the merits of our approach relative to other
popular methods.
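
As a stylized fragment of the approach, kernel-weighted least squares can recover time-varying combination weights at a given date; the paper's local linear estimator and group-SCAD penalty for many forecasts are omitted here.

```python
import numpy as np

def local_combination_weights(y, F, t0, bandwidth):
    # Local constant (kernel-weighted) least squares of the target series y
    # (length T) on the panel of individual forecasts F (T x M), evaluated
    # at time t0 with a Gaussian kernel.
    T = len(y)
    u = (np.arange(T) - t0) / (bandwidth * T)
    k = np.exp(-0.5 * u ** 2)
    Fk = F * k[:, None]
    return np.linalg.solve(Fk.T @ F, Fk.T @ y)   # weighted normal equations
```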

arXiv link: http://arxiv.org/abs/2010.10435v1

Econometrics arXiv updated paper (originally submitted: 2020-10-19)

L2-Relaxation: With Applications to Forecast Combination and Portfolio Analysis

Authors: Zhentao Shi, Liangjun Su, Tian Xie

This paper tackles forecast combination with many forecasts or minimum
variance portfolio selection with many assets. A novel convex problem called
L2-relaxation is proposed. In contrast to standard formulations, L2-relaxation
minimizes the squared Euclidean norm of the weight vector subject to a set of
relaxed linear inequality constraints. The magnitude of relaxation, controlled
by a tuning parameter, balances the bias and variance. When the
variance-covariance (VC) matrix of the individual forecast errors or financial
assets exhibits latent group structures -- a block equicorrelation matrix plus
a VC for idiosyncratic noises, the solution to L2-relaxation delivers roughly
equal within-group weights. Optimality of the new method is established under
the asymptotic framework when the number of cross-sectional units $N$
potentially grows much faster than the time dimension $T$. Excellent finite
sample performance of our method is demonstrated in Monte Carlo simulations.
Its wide applicability is highlighted in three real-data examples from
microeconomics, macroeconomics, and finance.
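
A stylized version of the L2-relaxation problem can be written as a small convex program, for instance with cvxpy; the exact constraint set used in the paper may differ, and tau is the tuning parameter controlling the relaxation.

```python
import numpy as np
import cvxpy as cp

def l2_relaxation_weights(Sigma, tau):
    # Minimize the squared Euclidean norm of the weights subject to the
    # adding-up constraint and relaxed (first-order-condition-type) linear
    # inequality constraints. Stylized sketch, not the paper's exact program.
    N = Sigma.shape[0]
    w = cp.Variable(N)
    gamma = cp.Variable()
    constraints = [cp.sum(w) == 1,
                   cp.norm(Sigma @ w - gamma * np.ones(N), "inf") <= tau]
    cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints).solve()
    return w.value
```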

arXiv link: http://arxiv.org/abs/2010.09477v2

Econometrics arXiv updated paper (originally submitted: 2020-10-17)

A Decomposition Approach to Counterfactual Analysis in Game-Theoretic Models

Authors: Nathan Canen, Kyungchul Song

Decomposition methods are often used for producing counterfactual predictions
in non-strategic settings. When the outcome of interest arises from a
game-theoretic setting where agents are better off by deviating from their
strategies after a new policy, such predictions, despite their practical
simplicity, are hard to justify. We present conditions in Bayesian games under
which the decomposition-based predictions coincide with the equilibrium-based
ones. In many games, such coincidence follows from an invariance condition for
equilibrium selection rules. To illustrate our message, we revisit an empirical
analysis in Ciliberto and Tamer (2009) on firms' entry decisions in the airline
industry.

arXiv link: http://arxiv.org/abs/2010.08868v7

Econometrics arXiv updated paper (originally submitted: 2020-10-17)

Empirical likelihood and uniform convergence rates for dyadic kernel density estimation

Authors: Harold D. Chiang, Bing Yang Tan

This paper studies the asymptotic properties of and alternative inference
methods for kernel density estimation (KDE) for dyadic data. We first establish
uniform convergence rates for dyadic KDE. Secondly, we propose a modified
jackknife empirical likelihood procedure for inference. The proposed test
statistic is asymptotically pivotal regardless of the presence of dyadic
clustering. The results are further extended to cover the practically relevant
case of incomplete dyadic data. Simulations show that this modified jackknife
empirical likelihood-based inference procedure delivers precise coverage
probabilities even with modest sample sizes and with incomplete dyadic data.
Finally, we illustrate the method by studying airport congestion in the United
States.
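
For concreteness, a basic dyadic kernel density estimate simply averages kernels over all ordered pairs, as in the sketch below; the paper's contribution concerns uniform rates and jackknife empirical likelihood inference, which this fragment does not implement.

```python
import numpy as np

def dyadic_kde(Y, x, h):
    # Gaussian kernel density estimate at point x from dyadic outcomes
    # Y[i, j] (i != j); self-pairs on the diagonal are excluded. Valid
    # inference must account for dyadic clustering, which is not done here.
    n = Y.shape[0]
    mask = ~np.eye(n, dtype=bool)
    u = (x - Y[mask]) / h
    return float(np.mean(np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)) / h)
```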

arXiv link: http://arxiv.org/abs/2010.08838v5

Econometrics arXiv updated paper (originally submitted: 2020-10-17)

Synchronization analysis between exchange rates on the basis of purchasing power parity using the Hilbert transform

Authors: Makoto Muto, Yoshitaka Saiki

Synchronization is a phenomenon in which a pair of fluctuations adjust their
rhythms when interacting with each other. We measure the degree of
synchronization between the U.S. dollar (USD) and euro exchange rates and
between the USD and Japanese yen exchange rates on the basis of purchasing
power parity (PPP) over time. We employ a method of synchronization analysis
using the Hilbert transform, which is common in the field of nonlinear science.
We find that the degree of synchronization is high most of the time, suggesting
the establishment of PPP. The degree of synchronization does not remain high
across periods with economic events with asymmetric effects, such as the U.S.
real estate bubble.
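
A standard Hilbert-transform synchronization measure can be computed as below; the phase-locking value shown is one common index from nonlinear science and may differ in detail from the measure used in the paper.

```python
import numpy as np
from scipy.signal import hilbert

def phase_locking_value(x, y):
    # Instantaneous phases via the analytic signal, then the modulus of the
    # mean phase-difference phasor: 1 = perfect synchronization, 0 = none.
    phi_x = np.unwrap(np.angle(hilbert(x - np.mean(x))))
    phi_y = np.unwrap(np.angle(hilbert(y - np.mean(y))))
    return float(np.abs(np.mean(np.exp(1j * (phi_x - phi_y)))))
```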

arXiv link: http://arxiv.org/abs/2010.08825v2

Econometrics arXiv updated paper (originally submitted: 2020-10-16)

Binary Choice with Asymmetric Loss in a Data-Rich Environment: Theory and an Application to Racial Justice

Authors: Andrii Babii, Xi Chen, Eric Ghysels, Rohit Kumar

We study the binary choice problem in a data-rich environment with asymmetric
loss functions. The econometrics literature covers nonparametric binary choice
problems but does not offer computationally attractive solutions in data-rich
environments. The machine learning literature has many algorithms but is
focused mostly on loss functions that are independent of covariates. We show
that theoretically valid decisions on binary outcomes with general loss
functions can be achieved via a very simple loss-based reweighting of the
logistic regression or state-of-the-art machine learning techniques. We apply
our analysis to racial justice in pretrial detention.
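
The loss-based reweighting idea can be sketched with a weighted logistic regression; the cost parameters below are illustrative, and covariate-dependent losses would simply make them functions of X.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def asymmetric_loss_logit(X, y, cost_false_neg, cost_false_pos):
    # Fit a logistic regression with per-observation weights reflecting the
    # asymmetric costs of the two classification errors. Sketch only; any
    # other classifier accepting sample weights could be substituted.
    sample_weight = np.where(y == 1, cost_false_neg, cost_false_pos)
    return LogisticRegression().fit(X, y, sample_weight=sample_weight)
```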

arXiv link: http://arxiv.org/abs/2010.08463v5

Econometrics arXiv updated paper (originally submitted: 2020-10-16)

Measures of Model Risk in Continuous-time Finance Models

Authors: Emese Lazar, Shuyuan Qi, Radu Tunaru

Measuring model risk is required by regulators on financial and insurance
markets. We separate model risk into parameter estimation risk and model
specification risk, and we propose expected shortfall type model risk measures
applied to Lévy jump models and affine jump-diffusion models. We investigate
the impact of parameter estimation risk and model specification risk on the
models' ability to capture the joint dynamics of stock and option prices. We
estimate the parameters using Markov chain Monte Carlo techniques, under the
risk-neutral probability measure and the real-world probability measure
jointly. We find strong evidence supporting modeling of price jumps.

arXiv link: http://arxiv.org/abs/2010.08113v2

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2020-10-16

Estimating Sleep & Work Hours from Alternative Data by Segmented Functional Classification Analysis (SFCA)

Authors: Klaus Ackermann, Simon D. Angus, Paul A. Raschky

Alternative data is increasingly adapted to predict human and economic
behaviour. This paper introduces a new type of alternative data by
re-conceptualising the internet as a data-driven insights platform at global
scale. Using data from a unique internet activity and location dataset drawn
from over 1.5 trillion observations of end-user internet connections, we
construct a functional dataset covering over 1,600 cities during a 7 year
period with temporal resolution of just 15min. To predict accurate temporal
patterns of sleep and work activity from this data-set, we develop a new
technique, Segmented Functional Classification Analysis (SFCA), and compare its
performance to a wide array of linear, functional, and classification methods.
To confirm the wider applicability of SFCA, in a second application we predict
sleep and work activity using SFCA from US city-wide electricity demand
functional data. Across both problems, SFCA is shown to outperform current
methods.

arXiv link: http://arxiv.org/abs/2010.08102v1

Econometrics arXiv paper, submitted: 2020-10-15

Heteroscedasticity test of high-frequency data with jumps and microstructure noise

Authors: Qiang Liu, Zhi Liu, Chuanhai Zhang

In this paper, we are interested in testing if the volatility process is
constant or not during a given time span by using high-frequency data with the
presence of jumps and microstructure noise. Based on estimators of integrated
volatility and spot volatility, we propose a nonparametric way to depict the
discrepancy between local variation and global variation. We show that our
proposed test estimator converges to a standard normal distribution if the
volatility is constant, otherwise it diverges to infinity. Simulation studies
verify the theoretical results and show a good finite sample performance of the
test procedure. We also apply our test procedure to do the heteroscedasticity
test for some real high-frequency financial data. We observe that in almost
half of the days tested, the assumption of constant volatility within a day is
violated. This is because stock prices during opening and closing
periods are highly volatile and account for a relatively large proportion of
intraday variation.

arXiv link: http://arxiv.org/abs/2010.07659v1

Econometrics arXiv paper, submitted: 2020-10-15

Comment: Individualized Treatment Rules Under Endogeneity

Authors: Sukjin Han

This note discusses two recent studies on identification of individualized
treatment rules using instrumental variables---Cui and Tchetgen Tchetgen (2020)
and Qiu et al. (2020). It also proposes identifying assumptions that are
alternatives to those used in both studies.

arXiv link: http://arxiv.org/abs/2010.07656v1

Econometrics arXiv updated paper (originally submitted: 2020-10-11)

Interpretable Neural Networks for Panel Data Analysis in Economics

Authors: Yucheng Yang, Zhong Zheng, Weinan E

The lack of interpretability and transparency are preventing economists from
using advanced tools like neural networks in their empirical research. In this
paper, we propose a class of interpretable neural network models that can
achieve both high prediction accuracy and interpretability. The model can be
written as a simple function of a regularized number of interpretable features,
which are outcomes of interpretable functions encoded in the neural network.
Researchers can design different forms of interpretable functions based on the
nature of their tasks. In particular, we encode a class of interpretable
functions named persistent change filters in the neural network to study time
series cross-sectional data. We apply the model to predict individuals'
monthly employment status using high-dimensional administrative data. We
achieve an accuracy of 94.5% in the test set, comparable to the
best-performing conventional machine learning methods. Furthermore, the
interpretability of the model allows us to understand the mechanism that
underlies the prediction: an individual's employment status is closely related
to whether she pays different types of insurance. Our work is a useful step
towards overcoming the black-box problem of neural networks and provides a new
tool for economists to study administrative and proprietary big data.

arXiv link: http://arxiv.org/abs/2010.05311v3

Econometrics arXiv paper, submitted: 2020-10-11

Identifying causal channels of policy reforms with multiple treatments and different types of selection

Authors: Annabelle Doerr, Anthony Strittmatter

We study the identification of channels of policy reforms with multiple
treatments and different types of selection for each treatment. We disentangle
reform effects into policy effects, selection effects, and time effects under
the assumption of conditional independence, common trends, and an additional
exclusion restriction on the non-treated. Furthermore, we show the
identification of direct and indirect policy effects after imposing additional
sequential conditional independence assumptions on mediating variables. We
illustrate the approach using the German reform of the allocation system of
vocational training for unemployed persons. The reform changed the allocation
of training from a mandatory system to a voluntary voucher system.
Simultaneously, the selection criteria for participants changed, and the reform
altered the composition of course types. We consider the course composition as
a mediator of the policy reform. We show that the empirical evidence from
previous studies reverses when considering the course composition. This has
important implications for policy conclusions.

arXiv link: http://arxiv.org/abs/2010.05221v1

Econometrics arXiv updated paper (originally submitted: 2020-10-10)

Combining Observational and Experimental Data to Improve Efficiency Using Imperfect Instruments

Authors: George Z. Gui

Randomized controlled trials generate experimental variation that can
credibly identify causal effects, but often suffer from limited scale, while
observational datasets are large, but often violate desired identification
assumptions. To improve estimation efficiency, I propose a method that
leverages imperfect instruments - pretreatment covariates that satisfy the
relevance condition but may violate the exclusion restriction. I show that
these imperfect instruments can be used to derive moment restrictions that, in
combination with the experimental data, improve estimation efficiency. I
outline estimators for implementing this strategy, and show that my methods can
reduce variance by up to 50%; therefore, only half of the experimental sample
is required to attain the same statistical precision. I apply my method to a
search listing dataset from Expedia that studies the causal effect of search
rankings on clicks, and show that the method can substantially improve the
precision.

arXiv link: http://arxiv.org/abs/2010.05117v5

Econometrics arXiv paper, submitted: 2020-10-10

Valid t-ratio Inference for IV

Authors: David S. Lee, Justin McCrary, Marcelo J. Moreira, Jack Porter

In the single IV model, current practice relies on the first-stage F
exceeding some threshold (e.g., 10) as a criterion for trusting t-ratio
inferences, even though this yields an anti-conservative test. We show that a
true 5 percent test instead requires an F greater than 104.7. Maintaining 10 as
a threshold requires replacing the critical value 1.96 with 3.43. We re-examine
57 AER papers and find that corrected inference causes half of the initially
presumed statistically significant results to be insignificant. We introduce a
more powerful test, the tF procedure, which provides F-dependent adjusted
t-ratio critical values.
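
Using only the two thresholds quoted in the abstract, a crude decision rule looks like the sketch below; the full tF procedure instead supplies a smooth F-dependent critical-value function.

```python
def iv_t_test_adjusted(t_stat, first_stage_F):
    # Crude two-threshold reading of the abstract: with the usual 1.96
    # critical value one needs F > 104.7 for a true 5 percent test, while
    # keeping the F > 10 screen requires a critical value of 3.43.
    if first_stage_F > 104.7:
        return abs(t_stat) > 1.96
    if first_stage_F > 10:
        return abs(t_stat) > 3.43
    return None   # too weak an instrument for this simple rule
```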

arXiv link: http://arxiv.org/abs/2010.05058v1

Econometrics arXiv updated paper (originally submitted: 2020-10-10)

Asymptotic Properties of the Maximum Likelihood Estimator in Regime-Switching Models with Time-Varying Transition Probabilities

Authors: Chaojun Li, Yan Liu

We prove the asymptotic properties of the maximum likelihood estimator (MLE)
in time-varying transition probability (TVTP) regime-switching models. This
class of models extends the constant regime transition probability in
Markov-switching models to a time-varying probability by including information
from observations. An important feature in this proof is the mixing rate of the
regime process conditional on the observations, which is time varying owing to
the time-varying transition probabilities. Consistency and asymptotic normality
follow from the almost deterministic geometrically decaying bound of the mixing
rate. The assumptions are verified in regime-switching autoregressive models
with widely-applied TVTP specifications. A simulation study examines the
finite-sample distributions of the MLE and compares the estimates of the
asymptotic variance constructed from the Hessian matrix and the outer product
of the score. The simulation results favour the latter. As an empirical
example, we compare three leading economic indicators in terms of describing
U.S. industrial production.

arXiv link: http://arxiv.org/abs/2010.04930v3

Econometrics arXiv updated paper (originally submitted: 2020-10-10)

Kernel Methods for Causal Functions: Dose, Heterogeneous, and Incremental Response Curves

Authors: Rahul Singh, Liyuan Xu, Arthur Gretton

We propose estimators based on kernel ridge regression for nonparametric
causal functions such as dose, heterogeneous, and incremental response curves.
Treatment and covariates may be discrete or continuous in general spaces. Due
to a decomposition property specific to the RKHS, our estimators have simple
closed form solutions. We prove uniform consistency with finite sample rates
via original analysis of generalized kernel ridge regression. We extend our
main results to counterfactual distributions and to causal functions identified
by front and back door criteria. We achieve state-of-the-art performance in
nonlinear simulations with many covariates, and conduct a policy evaluation of
the US Job Corps training program for disadvantaged youths.
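
As a rough sketch of how a kernel-ridge-regression plug-in dose-response curve can be computed, the fragment below fits a Gaussian-kernel KRR of Y on (D, X) and averages the fitted surface over the empirical covariate distribution; the paper's RKHS decomposition and closed forms are not reproduced, and the kernel and tuning parameters are illustrative.

```python
import numpy as np

def krr_dose_response(D, X, Y, d_grid, lam=1e-2, gamma=1.0):
    # Kernel ridge regression of Y on Z = (D, X) with a Gaussian kernel,
    # then, for each dose d, average the fitted values over the observed
    # covariates X_i (a plug-in estimate of the dose-response curve).
    Z = np.column_stack([D, X])
    def K(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    n = len(Y)
    alpha = np.linalg.solve(K(Z, Z) + lam * n * np.eye(n), Y)
    curve = []
    for d in d_grid:
        Zd = np.column_stack([np.full(n, d), X])   # set dose to d for everyone
        curve.append(float(np.mean(K(Zd, Z) @ alpha)))
    return np.array(curve)
```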

arXiv link: http://arxiv.org/abs/2010.04855v7

Econometrics arXiv updated paper (originally submitted: 2020-10-09)

When Is Parallel Trends Sensitive to Functional Form?

Authors: Jonathan Roth, Pedro H. C. Sant'Anna

This paper assesses when the validity of difference-in-differences depends on
functional form. We provide a novel characterization: the parallel trends
assumption holds under all strictly monotonic transformations of the outcome if
and only if a stronger “parallel trends”-type condition holds for the
cumulative distribution function of untreated potential outcomes. This
condition for parallel trends to be insensitive to functional form is satisfied
if and essentially only if the population can be partitioned into a subgroup
for which treatment is effectively randomly assigned and a remaining subgroup
for which the distribution of untreated potential outcomes is stable over time.
These conditions have testable implications, and we introduce falsification
tests for the null that parallel trends is insensitive to functional form.

arXiv link: http://arxiv.org/abs/2010.04814v5

Econometrics arXiv paper, submitted: 2020-10-09

Sparse network asymptotics for logistic regression

Authors: Bryan S. Graham

Consider a bipartite network where $N$ consumers choose to buy or not to buy
$M$ different products. This paper considers the properties of the logistic
regression of the $N\times M$ array of i-buys-j purchase decisions,
$\left[Y_{ij}\right]_{1\leq i\leq N,1\leq j\leq M}$, onto known functions of
consumer and product attributes under asymptotic sequences where (i) both $N$
and $M$ grow large and (ii) the average number of products purchased per
consumer is finite in the limit. This latter assumption implies that the
network of purchases is sparse: only a (very) small fraction of all possible
purchases are actually made (concordant with many real-world settings). Under
sparse network asymptotics, the first and last terms in an extended
Hoeffding-type variance decomposition of the score of the logit composite
log-likelihood are of equal order. In contrast, under dense network
asymptotics, the last term is asymptotically negligible. Asymptotic normality
of the logistic regression coefficients is shown using a martingale central
limit theorem (CLT) for triangular arrays. Unlike in the dense case, the
normality result derived here also holds under degeneracy of the network
graphon. Relatedly, when there happens to be no dyadic dependence in the
dataset at hand, it specializes to recently derived results on the behavior of
logistic regression with rare events and iid data. Sparse network asymptotics
may lead to better inference in practice since they suggest variance estimators
which (i) incorporate additional sources of sampling variation and (ii) are
valid under varying degrees of dyadic dependence.

arXiv link: http://arxiv.org/abs/2010.04703v1

Econometrics arXiv updated paper (originally submitted: 2020-10-09)

Identification of multi-valued treatment effects with unobserved heterogeneity

Authors: Koki Fusejima

In this paper, we establish sufficient conditions for identifying treatment
effects on continuous outcomes in endogenous and multi-valued discrete
treatment settings with unobserved heterogeneity. We employ the monotonicity
assumption for multi-valued discrete treatments and instruments, and our
identification condition has a clear economic interpretation. In addition, we
identify the local treatment effects in multi-valued treatment settings and
derive closed-form expressions of the identified treatment effects. We provide
examples to illustrate the usefulness of our result.

arXiv link: http://arxiv.org/abs/2010.04385v5

Econometrics arXiv paper, submitted: 2020-10-08

Inference with a single treated cluster

Authors: Andreas Hagemann

I introduce a generic method for inference about a scalar parameter in
research designs with a finite number of heterogeneous clusters where only a
single cluster received treatment. This situation is commonplace in
difference-in-differences estimation but the test developed here applies more
generally. I show that the test controls size and has power under asymptotics
where the number of observations within each cluster is large but the number of
clusters is fixed. The test combines weighted, approximately Gaussian parameter
estimates with a rearrangement procedure to obtain its critical values. The
weights needed for most empirically relevant situations are tabulated in the
paper. Calculation of the critical values is computationally simple and does
not require simulation or resampling. The rearrangement test is highly robust
to situations where some clusters are much more variable than others. Examples
and an empirical application are provided.

arXiv link: http://arxiv.org/abs/2010.04076v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2020-10-08

Prediction intervals for Deep Neural Networks

Authors: Tullio Mancini, Hector Calvo-Pardo, Jose Olmo

The aim of this paper is to propose a suitable method for constructing
prediction intervals for the output of neural network models. To do this, we
adapt the extremely randomized trees method originally developed for random
forests to construct ensembles of neural networks. The extra-randomness
introduced in the ensemble reduces the variance of the predictions and yields
gains in out-of-sample accuracy. An extensive Monte Carlo simulation exercise
shows the good performance of this novel method for constructing prediction
intervals in terms of coverage probability and mean square prediction error.
This approach is superior to state-of-the-art methods extant in the literature
such as the widely used MC dropout and bootstrap procedures. The out-of-sample
accuracy of the novel algorithm is further evaluated using experimental
settings already adopted in the literature.

arXiv link: http://arxiv.org/abs/2010.04044v2

Econometrics arXiv updated paper (originally submitted: 2020-10-08)

Consistent Specification Test of the Quantile Autoregression

Authors: Anthoulla Phella

This paper proposes a test for the joint hypothesis of correct dynamic
specification and no omitted latent factors for the Quantile Autoregression. If
the composite null is rejected we proceed to disentangle the cause of
rejection, i.e., dynamic misspecification or an omitted variable. We establish
the asymptotic distribution of the test statistics under fairly weak conditions
and show that factor estimation error is negligible. A Monte Carlo study shows
that the suggested tests have good finite sample properties. Finally, we
undertake an empirical illustration of modelling GDP growth and CPI inflation
in the United Kingdom, where we find evidence that factor augmented models are
correctly specified in contrast with their non-augmented counterparts when it
comes to GDP growth, while also exploring the asymmetric behaviour of the
growth and inflation distributions.

arXiv link: http://arxiv.org/abs/2010.03898v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2020-10-08

The Adaptive Doubly Robust Estimator for Policy Evaluation in Adaptive Experiments and a Paradox Concerning Logging Policy

Authors: Masahiro Kato, Shota Yasui, Kenichiro McAlinn

The doubly robust (DR) estimator, which consists of two nuisance parameters,
the conditional mean outcome and the logging policy (the probability of
choosing an action), is crucial in causal inference. This paper proposes a DR
estimator for dependent samples obtained from adaptive experiments. To obtain
an asymptotically normal semiparametric estimator from dependent samples with
non-Donsker nuisance estimators, we propose adaptive-fitting as a variant of
sample-splitting. We also report an empirical paradox that our proposed DR
estimator tends to show better performances compared to other estimators
utilizing the true logging policy. While a similar phenomenon is known for
estimators with i.i.d. samples, traditional explanations based on asymptotic
efficiency cannot elucidate our case with dependent samples. We confirm this
hypothesis through simulation studies.

arXiv link: http://arxiv.org/abs/2010.03792v5

Econometrics arXiv updated paper (originally submitted: 2020-10-07)

Interpreting Unconditional Quantile Regression with Conditional Independence

Authors: David M. Kaplan

This note provides additional interpretation for the counterfactual outcome
distribution and corresponding unconditional quantile "effects" defined and
estimated by Firpo, Fortin, and Lemieux (2009) and Chernozhukov,
Fernández-Val, and Melly (2013). With conditional independence of the policy
variable of interest, these methods estimate the policy effect for certain
types of policies, but not others. In particular, they estimate the effect of a
policy change that itself satisfies conditional independence.

arXiv link: http://arxiv.org/abs/2010.03606v2

Econometrics arXiv updated paper (originally submitted: 2020-10-07)

Further results on the estimation of dynamic panel logit models with fixed effects

Authors: Hugo Kruiniger

Kitazawa (2013, 2016) showed that the common parameters in the panel logit
AR(1) model with strictly exogenous covariates and fixed effects are estimable
at the root-n rate using the Generalized Method of Moments. Honoré and
Weidner (2020) extended his results in various directions: they found
additional moment conditions for the logit AR(1) model and also considered
estimation of logit AR(p) models with $p>1$. In this note we prove a conjecture
in their paper and show that, for given values of the initial condition, the
covariates and the common parameters, $2^{T}-2T$ of their moment functions for the
logit AR(1) model are linearly independent and span the set of valid moment
functions, which is a $(2^{T}-2T)$-dimensional linear subspace of the
$2^{T}$-dimensional vector space of real-valued functions over the outcomes
$y \in \{0,1\}^{T}$. We also prove that when $p=2$ and $T \in \{3,4,5\}$,
there are, respectively, $2^{T}-4(T-1)$ and $2^{T}-(3T-2)$ linearly independent
moment functions for the panel logit AR(2) models with and without covariates.

arXiv link: http://arxiv.org/abs/2010.03382v5

Econometrics arXiv paper, submitted: 2020-10-06

Comment on Gouriéroux, Monfort, Renne (2019): Identification and Estimation in Non-Fundamental Structural VARMA Models

Authors: Bernd Funovits

This comment points out a serious flaw in the article "Gouriéroux, Monfort,
Renne (2019): Identification and Estimation in Non-Fundamental Structural VARMA
Models" with regard to mirroring complex-valued roots with Blaschke polynomial
matrices. Moreover, the (non-)feasibility of the proposed method (if the
handling of Blaschke transformations were not prohibitive) for cross-sectional
dimensions greater than two and vector moving average (VMA) polynomial matrices
of degree greater than one is discussed.

arXiv link: http://arxiv.org/abs/2010.02711v1

Econometrics arXiv updated paper (originally submitted: 2020-10-06)

A Recursive Logit Model with Choice Aversion and Its Application to Transportation Networks

Authors: Austin Knies, Jorge Lorca, Emerson Melo

We propose a recursive logit model which captures the notion of choice
aversion by imposing a penalty term that accounts for the dimension of the
choice set at each node of the transportation network. We make three
contributions. First, we show that our model overcomes the correlation problem
between routes, a common pitfall of traditional logit models, and that the
choice aversion model can be seen as an alternative to these models. Second, we
show how our model can generate violations of regularity in the path choice
probabilities. In particular, we show that removing edges in the network may
decrease the probability for existing paths. Finally, we show that under the
presence of choice aversion, adding edges to the network can make users worse
off. In other words, a type of Braess's paradox can emerge outside of
congestion and can be characterized in terms of a parameter that measures
users' degree of choice aversion. We validate these contributions by estimating
this parameter over GPS traffic data captured on a real-world transportation
network.

arXiv link: http://arxiv.org/abs/2010.02398v4

Econometrics arXiv updated paper (originally submitted: 2020-10-05)

Testing homogeneity in dynamic discrete games in finite samples

Authors: Federico A. Bugni, Jackson Bunting, Takuya Ura

The literature on dynamic discrete games often assumes that the conditional
choice probabilities and the state transition probabilities are homogeneous
across markets and over time. We refer to this as the "homogeneity assumption"
in dynamic discrete games. This assumption enables empirical studies to
estimate the game's structural parameters by pooling data from multiple markets
and from many time periods. In this paper, we propose a hypothesis test to
evaluate whether the homogeneity assumption holds in the data. Our hypothesis
test is the result of an approximate randomization test, implemented via a
Markov chain Monte Carlo (MCMC) algorithm. We show that our hypothesis test
becomes valid as the (user-defined) number of MCMC draws diverges, for any
fixed number of markets, time periods, and players. We apply our test to the
empirical study of the U.S.\ Portland cement industry in Ryan (2012).

arXiv link: http://arxiv.org/abs/2010.02297v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-10-05

Deep Distributional Time Series Models and the Probabilistic Forecasting of Intraday Electricity Prices

Authors: Nadja Klein, Michael Stanley Smith, David J. Nott

Recurrent neural networks (RNNs) with rich feature vectors of past values can
provide accurate point forecasts for series that exhibit complex serial
dependence. We propose two approaches to constructing deep time series
probabilistic models based on a variant of RNN called an echo state network
(ESN). The first is where the output layer of the ESN has stochastic
disturbances and a shrinkage prior for additional regularization. The second
approach employs the implicit copula of an ESN with Gaussian disturbances,
which is a deep copula process on the feature space. Combining this copula with
a non-parametrically estimated marginal distribution produces a deep
distributional time series model. The resulting probabilistic forecasts are
deep functions of the feature vector and also marginally calibrated. In both
approaches, Bayesian Markov chain Monte Carlo methods are used to estimate the
models and compute forecasts. The proposed models are suitable for the complex
task of forecasting intraday electricity prices. Using data from the Australian
National Electricity Market, we show that our deep time series models provide
accurate short term probabilistic price forecasts, with the copula model
dominating. Moreover, the models provide a flexible framework for incorporating
probabilistic forecasts of electricity demand as additional features, which
increases upper tail forecast accuracy from the copula model significantly.

arXiv link: http://arxiv.org/abs/2010.01844v2

Econometrics arXiv updated paper (originally submitted: 2020-10-05)

Robust and Efficient Estimation of Potential Outcome Means under Random Assignment

Authors: Akanksha Negi, Jeffrey M. Wooldridge

We study efficiency improvements in randomized experiments for estimating a
vector of potential outcome means using regression adjustment (RA) when there
are more than two treatment levels. We show that linear RA which estimates
separate slopes for each assignment level is never worse, asymptotically, than
using the subsample averages. We also show that separate RA improves over
pooled RA except in the obvious case where slope parameters in the linear
projections are identical across the different assignment levels. We further
characterize the class of nonlinear RA methods that preserve consistency of the
potential outcome means despite arbitrary misspecification of the conditional
mean functions. Finally, we apply these regression adjustment techniques to
efficiently estimate the lower bound mean willingness to pay for an oil spill
prevention program in California.
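
A minimal sketch of separate-slopes regression adjustment, under the assumption of a completely randomized assignment, is given below: each arm gets its own regression on demeaned covariates, and the fitted values are averaged over the full sample.

```python
import numpy as np

def separate_ra_means(y, d, X):
    # For each treatment level g, regress y on an intercept and demeaned
    # covariates within that arm, predict for the full sample, and average
    # to estimate the potential-outcome mean mu_g.
    Xc = X - X.mean(axis=0)
    Zfull = np.column_stack([np.ones(len(y)), Xc])
    means = {}
    for g in np.unique(d):
        idx = (d == g)
        Zg = np.column_stack([np.ones(idx.sum()), Xc[idx]])
        beta = np.linalg.lstsq(Zg, y[idx], rcond=None)[0]
        means[g] = float((Zfull @ beta).mean())
    return means
```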

arXiv link: http://arxiv.org/abs/2010.01800v2

Econometrics arXiv paper, submitted: 2020-10-04

A Class of Time-Varying Vector Moving Average Models: Nonparametric Kernel Estimation and Application

Authors: Yayi Yan, Jiti Gao, Bin Peng

Multivariate dynamic time series models are widely encountered in practical
studies, e.g., modelling policy transmission mechanism and measuring
connectedness between economic agents. To better capture the dynamics, this
paper proposes a wide class of multivariate dynamic models with time-varying
coefficients, which have a general time-varying vector moving average (VMA)
representation, and nest, for instance, time-varying vector autoregression
(VAR), time-varying vector autoregression moving-average (VARMA), and so forth
as special cases. The paper then develops a unified estimation method for the
unknown quantities before an asymptotic theory for the proposed estimators is
established. In the empirical study, we investigate the transmission mechanism
of monetary policy using U.S. data, and uncover a fall in the volatilities of
exogenous shocks. In addition, we find that (i) monetary policy shocks have
less influence on inflation before and during the so-called Great Moderation,
(ii) inflation is more anchored recently, and (iii) the long-run level of
inflation is below, but quite close to the Federal Reserve's target of two
percent after the beginning of the Great Moderation period.

arXiv link: http://arxiv.org/abs/2010.01492v1

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2020-10-02

On Statistical Discrimination as a Failure of Social Learning: A Multi-Armed Bandit Approach

Authors: Junpei Komiyama, Shunya Noda

We analyze statistical discrimination in hiring markets using a multi-armed
bandit model. Myopic firms face workers arriving with heterogeneous observable
characteristics. The association between the worker's skill and characteristics
is unknown ex ante; thus, firms need to learn it. Laissez-faire causes
perpetual underestimation: minority workers are rarely hired, and therefore,
the underestimation tends to persist. Even a marginal imbalance in the
population ratio frequently results in perpetual underestimation. We propose
two policy solutions: a novel subsidy rule (the hybrid mechanism) and the
Rooney Rule. Our results indicate that temporary affirmative actions
effectively alleviate discrimination stemming from insufficient data.

arXiv link: http://arxiv.org/abs/2010.01079v6

Econometrics arXiv updated paper (originally submitted: 2020-09-30)

Local Regression Distribution Estimators

Authors: Matias D. Cattaneo, Michael Jansson, Xinwei Ma

This paper investigates the large sample properties of local regression
distribution estimators, which include a class of boundary adaptive density
estimators as a prime example. First, we establish a pointwise Gaussian large
sample distributional approximation in a unified way, allowing for both
boundary and interior evaluation points simultaneously. Using this result, we
study the asymptotic efficiency of the estimators, and show that a carefully
crafted minimum distance implementation based on "redundant" regressors can
lead to efficiency gains. Second, we establish uniform linearizations and
strong approximations for the estimators, and employ these results to construct
valid confidence bands. Third, we develop extensions to weighted distributions
with estimated weights and to local $L^{2}$ least squares estimation. Finally,
we illustrate our methods with two applications in program evaluation:
counterfactual density testing, and IV specification and heterogeneity density
analysis. Companion software packages in Stata and R are available.

arXiv link: http://arxiv.org/abs/2009.14367v2

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2020-09-29

Online Action Learning in High Dimensions: A Conservative Perspective

Authors: Claudio Cardoso Flores, Marcelo Cunha Medeiros

Sequential learning problems are common in several fields of research and
practical applications. Examples include dynamic pricing and assortment, design
of auctions and incentives and permeate a large number of sequential treatment
experiments. In this paper, we extend one of the most popular learning
solutions, the $\epsilon_t$-greedy heuristics, to high-dimensional contexts
considering a conservative directive. We do this by allocating part of the time
the original rule uses to adopt completely new actions to a more focused search
in a restrictive set of promising actions. The resulting rule might be useful
for practical applications that still values surprises, although at a
decreasing rate, while also has restrictions on the adoption of unusual
actions. With high probability, we find reasonable bounds for the cumulative
regret of a conservative high-dimensional decaying $\epsilon_t$-greedy rule.
Also, we provide a lower bound for the cardinality of the set of viable actions
that implies in an improved regret bound for the conservative version when
compared to its non-conservative counterpart. Additionally, we show that
end-users have sufficient flexibility when establishing how much safety they
want, since it can be tuned without impacting theoretical properties. We
illustrate our proposal both in a simulation exercise and using a real dataset.
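
A toy version of the conservative decaying $\epsilon_t$-greedy rule might look like the sketch below; the decay rate, the split of exploration time, and the construction of the "promising" set are illustrative assumptions rather than the paper's specification.

```python
import numpy as np

def conservative_eps_greedy(q_hat, t, promising, eps0=1.0, share_new=0.5, rng=None):
    # With probability eps_t (decaying in t), explore; exploration time is
    # split between completely new actions and a restricted set of
    # promising actions. Otherwise exploit the current estimates q_hat.
    rng = rng if rng is not None else np.random.default_rng()
    eps_t = min(1.0, eps0 / np.sqrt(t + 1))
    if rng.random() < eps_t:
        if rng.random() < share_new:
            return int(rng.integers(len(q_hat)))     # unrestricted exploration
        return int(rng.choice(promising))            # focused exploration
    return int(np.argmax(q_hat))                     # greedy exploitation
```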

arXiv link: http://arxiv.org/abs/2009.13961v4

Econometrics arXiv updated paper (originally submitted: 2020-09-29)

A Computational Approach to Identification of Treatment Effects for Policy Evaluation

Authors: Sukjin Han, Shenshen Yang

For counterfactual policy evaluation, it is important to ensure that
treatment parameters are relevant to policies in question. This is especially
challenging under unobserved heterogeneity, as is well featured in the
definition of the local average treatment effect (LATE). Being intrinsically
local, the LATE is known to lack external validity in counterfactual
environments. This paper investigates the possibility of extrapolating local
treatment effects to different counterfactual settings when instrumental
variables are only binary. We propose a novel framework to systematically
calculate sharp nonparametric bounds on various policy-relevant treatment
parameters that are defined as weighted averages of the marginal treatment
effect (MTE). Our framework is flexible enough to fully incorporate statistical
independence (rather than mean independence) of instruments and a large menu of
identifying assumptions beyond the shape restrictions on the MTE that have been
considered in prior studies. We apply our method to understand the effects of
medical insurance policies on the use of medical services.

arXiv link: http://arxiv.org/abs/2009.13861v4

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2020-09-28

Lockdown effects in US states: an artificial counterfactual approach

Authors: Carlos B. Carneiro, Iúri H. Ferreira, Marcelo C. Medeiros, Henrique F. Pires, Eduardo Zilberman

We adopt an artificial counterfactual approach to assess the impact of
lockdowns on the short-run evolution of the number of cases and deaths in some
US states. To do so, we explore the different timing in which US states adopted
lockdown policies, and divide them among treated and control groups. For each
treated state, we construct an artificial counterfactual. On average, and in
the very short-run, the counterfactual accumulated number of cases would be two
times larger if lockdown policies were not implemented.

arXiv link: http://arxiv.org/abs/2009.13484v2

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2020-09-28

Difference-in-Differences for Ordinal Outcomes: Application to the Effect of Mass Shootings on Attitudes toward Gun Control

Authors: Soichiro Yamauchi

The difference-in-differences (DID) design is widely used in observational
studies to estimate the causal effect of a treatment when repeated observations
over time are available. Yet, almost all existing methods assume linearity in
the potential outcome (parallel trends assumption) and target the additive
effect. In social science research, however, many outcomes of interest are
measured on an ordinal scale. This makes the linearity assumption inappropriate
because the difference between two ordinal potential outcomes is not well
defined. In this paper, I propose a method to draw causal inferences for
ordinal outcomes under the DID design. Unlike existing methods, the proposed
method utilizes the latent variable framework to handle the non-numeric nature
of the outcome, enabling identification and estimation of causal effects based
on an assumption on the quantiles of the latent continuous variable. The paper
also proposes an equivalence-based test to assess the plausibility of the key
identification assumption when additional pre-treatment periods are available.
The proposed method is applied to a study estimating the causal effect of mass
shootings on the public's support for gun control. I find little evidence for a
uniform shift toward pro-gun control policies as found in the previous study,
but find that the effect is concentrated on left-leaning respondents who
experienced the shooting for the first time in more than a decade.

arXiv link: http://arxiv.org/abs/2009.13404v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2020-09-28

Learning Classifiers under Delayed Feedback with a Time Window Assumption

Authors: Masahiro Kato, Shota Yasui

We consider training a binary classifier under delayed feedback (DF learning).
For example, in conversion prediction for online ads, users who click an ad but
do not buy the item are initially recorded as negative samples; subsequently,
some of them buy the item and their labels change to positive. In DF learning,
we thus observe samples over time and learn a classifier at some point, knowing
that labels which are initially negative may later turn positive. This problem
arises in various real-world applications such as online advertisements,
where the user action takes place long after the first click. Owing to the
delayed feedback, naive classification of the positive and negative samples
returns a biased classifier. One solution is to use samples that have been
observed for more than a certain time window assuming these samples are
correctly labeled. However, existing studies reported that simply using a
subset of all samples based on the time window assumption does not perform
well, and that using all samples along with the time window assumption improves
empirical performance. We extend these existing studies and propose a method
with an unbiased and convex empirical risk constructed from all samples under
the time window assumption. To demonstrate the soundness of the proposed
method, we provide experimental results on a synthetic dataset and an open
dataset of real traffic logs from online advertising.

arXiv link: http://arxiv.org/abs/2009.13092v2

Econometrics arXiv updated paper (originally submitted: 2020-09-26)

Nonclassical Measurement Error in the Outcome Variable

Authors: Christoph Breunig, Stephan Martin

We study a semi-/nonparametric regression model with a general form of
nonclassical measurement error in the outcome variable. We show equivalence of
this model to a generalized regression model. Our main identifying assumptions
are a special regressor type restriction and monotonicity in the nonlinear
relationship between the observed and unobserved true outcome. Nonparametric
identification is then obtained under a normalization of the unknown link
function, which is a natural extension of the classical measurement error case.
We propose a novel sieve rank estimator for the regression function and
establish its rate of convergence.
In Monte Carlo simulations, we find that our estimator corrects for biases
induced by nonclassical measurement error and provides numerically stable
results. We apply our method to analyze belief formation of stock market
expectations with survey data from the German Socio-Economic Panel (SOEP) and
find evidence for nonclassical measurement error in subjective belief data.

arXiv link: http://arxiv.org/abs/2009.12665v2

Econometrics arXiv paper, submitted: 2020-09-23

A step-by-step guide to design, implement, and analyze a discrete choice experiment

Authors: Daniel Pérez-Troncoso

Discrete Choice Experiments (DCE) have been widely used in health economics,
environmental valuation, and other disciplines. However, there is a lack of
resources disclosing the whole procedure of carrying out a DCE. This document
aims to assist anyone wishing to use the power of DCEs to understand people's
behavior by providing a comprehensive guide to the procedure. This guide
contains all the code needed to design, implement, and analyze a DCE using only
free software.

arXiv link: http://arxiv.org/abs/2009.11235v1

Econometrics arXiv paper, submitted: 2020-09-21

Recent Developments on Factor Models and its Applications in Econometric Learning

Authors: Jianqing Fan, Kunpeng Li, Yuan Liao

This paper provides a selective survey of recent developments in factor models
and their applications to statistical learning. We focus on the low-rank
structure of factor models and, in particular, draw attention to estimating the
model from the low-rank recovery point of view. The survey mainly consists of
three parts: the first part reviews new factor estimation methods based on
modern techniques for recovering low-rank structures of high-dimensional
models. The second part discusses statistical inference for several
factor-augmented models and applications in econometric learning models. The
final part summarizes new developments dealing with unbalanced panels from the
matrix completion perspective.

arXiv link: http://arxiv.org/abs/2009.10103v1

Econometrics arXiv updated paper (originally submitted: 2020-09-21)

On the Existence of Conditional Maximum Likelihood Estimates of the Binary Logit Model with Fixed Effects

Authors: Martin Mugnier

By exploiting McFadden (1974)'s results on conditional logit estimation, we
show that there exists a one-to-one mapping between existence and uniqueness of
conditional maximum likelihood estimates of the binary logit model with fixed
effects and the configuration of data points. Our results extend those in
Albert and Anderson (1984) for the cross-sectional case and can be used to
build a simple algorithm that detects spurious estimates in finite samples. As
an illustration, we exhibit an artificial dataset for which Stata's clogit
command returns spurious estimates.

arXiv link: http://arxiv.org/abs/2009.09998v3

Econometrics arXiv updated paper (originally submitted: 2020-09-21)

Spillovers of Program Benefits with Missing Network Links

Authors: Lina Zhang

The issue of missing network links in partially observed networks is
frequently neglected in empirical studies. This paper addresses this issue when
investigating the spillovers of program benefits in the presence of network
interactions. Our method is flexible enough to account for non-i.i.d. missing
links. It relies on two network measures that can be easily constructed based
on the incoming and outgoing links of the same observed network. The treatment
and spillover effects can be point identified and consistently estimated if
network degrees are bounded for all units. We also demonstrate the bias
reduction property of our method if network degrees of some units are
unbounded. Monte Carlo experiments and a naturalistic simulation on real-world
network data are implemented to verify the finite-sample performance of our
method. We also re-examine the spillover effects of home computer use on
children's self-empowered learning.

arXiv link: http://arxiv.org/abs/2009.09614v3

Econometrics arXiv paper, submitted: 2020-09-21

Optimal probabilistic forecasts: When do they work?

Authors: Gael M. Martin, Rubén Loaiza-Maya, David T. Frazier, Worapree Maneesoonthorn, Andrés Ramírez Hassan

Proper scoring rules are used to assess the out-of-sample accuracy of
probabilistic forecasts, with different scoring rules rewarding distinct
aspects of forecast performance. Herein, we re-investigate the practice of
using proper scoring rules to produce probabilistic forecasts that are
`optimal' according to a given score, and assess when their out-of-sample
accuracy is superior to alternative forecasts, according to that score.
Particular attention is paid to relative predictive performance under
misspecification of the predictive model. Using numerical illustrations, we
document several novel findings within this paradigm that highlight the
important interplay between the true data generating process, the assumed
predictive model and the scoring rule. Notably, we show that only when a
predictive model is sufficiently compatible with the true process to allow a
particular score criterion to reward what it is designed to reward, will this
approach to forecasting reap benefits. Subject to this compatibility however,
the superiority of the optimal forecast will be greater, the greater is the
degree of misspecification. We explore these issues under a range of different
scenarios, and using both artificially simulated and empirical data.
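
The following minimal sketch illustrates the "optimal score" idea discussed
above under a deliberately misspecified predictive class: a Gaussian predictive
is fit to Student-t data once by maximizing the average log score and once by
minimizing the average CRPS (using the closed-form CRPS of a normal forecast
from Gneiting and Raftery, 2007). The data-generating process, sample size, and
starting values are arbitrary choices for the illustration, not taken from the
paper.

    import numpy as np
    from scipy import stats, optimize

    rng = np.random.default_rng(1)
    y = rng.standard_t(df=5, size=500)      # misspecified target: Student-t data

    def neg_log_score(params, y):
        mu, log_sigma = params
        return -np.mean(stats.norm.logpdf(y, loc=mu, scale=np.exp(log_sigma)))

    def mean_crps_normal(params, y):
        # Closed-form CRPS of a N(mu, sigma^2) forecast (lower is better)
        mu, log_sigma = params
        sigma = np.exp(log_sigma)
        z = (y - mu) / sigma
        crps = sigma * (z * (2 * stats.norm.cdf(z) - 1)
                        + 2 * stats.norm.pdf(z) - 1 / np.sqrt(np.pi))
        return np.mean(crps)

    fit_log = optimize.minimize(neg_log_score, x0=[0.0, 0.0], args=(y,))
    fit_crps = optimize.minimize(mean_crps_normal, x0=[0.0, 0.0], args=(y,))
    print("log-score fit (mu, sigma):", fit_log.x[0], np.exp(fit_log.x[1]))
    print("CRPS fit      (mu, sigma):", fit_crps.x[0], np.exp(fit_crps.x[1]))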

arXiv link: http://arxiv.org/abs/2009.09592v1

Econometrics arXiv updated paper (originally submitted: 2020-09-18)

Inference for Large-Scale Linear Systems with Known Coefficients

Authors: Zheng Fang, Andres Santos, Azeem M. Shaikh, Alexander Torgovitsky

This paper considers the problem of testing whether there exists a
non-negative solution to a possibly under-determined system of linear equations
with known coefficients. This hypothesis testing problem arises naturally in a
number of settings, including random coefficient, treatment effect, and
discrete choice models, as well as a class of linear programming problems. As a
first contribution, we obtain a novel geometric characterization of the null
hypothesis in terms of identified parameters satisfying an infinite set of
inequality restrictions. Using this characterization, we devise a test that
requires solving only linear programs for its implementation, and thus remains
computationally feasible in the high-dimensional applications that motivate our
analysis. The asymptotic size of the proposed test is shown to equal at most
the nominal level uniformly over a large class of distributions that permits
the number of linear equations to grow with the sample size.
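
At the population level, the computational building block is a linear-program
feasibility check: does $Ax = b$ admit a non-negative solution? The snippet
below shows that check on a toy system (the statistical test in the paper
additionally accounts for sampling uncertainty, which this sketch ignores).

    import numpy as np
    from scipy.optimize import linprog

    A = np.array([[1.0, 2.0, 1.0, 0.0],
                  [0.0, 1.0, 3.0, 1.0]])   # 2 equations, 4 unknowns (toy example)
    b = np.array([4.0, 5.0])

    # Minimize a zero objective subject to A x = b and x >= 0.
    res = linprog(c=np.zeros(A.shape[1]), A_eq=A, b_eq=b,
                  bounds=[(0, None)] * A.shape[1])
    print("non-negative solution exists:", res.status == 0)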

arXiv link: http://arxiv.org/abs/2009.08568v2

Econometrics arXiv paper, submitted: 2020-09-17

Semiparametric Testing with Highly Persistent Predictors

Authors: Bas Werker, Bo Zhou

We address the issue of semiparametric efficiency in the bivariate regression
problem with a highly persistent predictor, where the joint distribution of the
innovations is regarded as an infinite-dimensional nuisance parameter. Using a
structural representation of the limit experiment and exploiting invariance
relationships therein, we construct invariant point-optimal tests for the
regression coefficient of interest. This approach naturally leads to a family
of feasible tests based on the component-wise ranks of the innovations that can
gain considerable power relative to existing tests under non-Gaussian
innovation distributions, while behaving equivalently under Gaussianity. When
an i.i.d. assumption on the innovations is appropriate for the data at hand,
our tests exploit the efficiency gains possible. Moreover, we show by
simulation that our test remains well behaved under some forms of conditional
heteroskedasticity.

arXiv link: http://arxiv.org/abs/2009.08291v1

Econometrics arXiv updated paper (originally submitted: 2020-09-17)

Fixed Effects Binary Choice Models with Three or More Periods

Authors: Laurent Davezies, Xavier D'Haultfoeuille, Martin Mugnier

We consider fixed effects binary choice models with a fixed number of periods
$T$ and regressors without a large support. If the time-varying unobserved
terms are i.i.d. with known distribution $F$, Chamberlain (2010) shows that the
common slope parameter is point identified if and only if $F$ is logistic.
However, his proof only considers the case $T=2$. We show that the result does not
generalize to $T\geq 3$: the common slope parameter can be identified when $F$
belongs to a family including the logit distribution. Identification is based
on a conditional moment restriction. Under restrictions on the covariates,
these moment conditions lead to point identification of relative effects. If
$T=3$ and mild conditions hold, GMM estimators based on these conditional
moment restrictions reach the semiparametric efficiency bound. Finally, we
illustrate our method by revisiting Brender and Drazen (2008).

arXiv link: http://arxiv.org/abs/2009.08108v4

Econometrics arXiv paper, submitted: 2020-09-17

Identification and Estimation of A Rational Inattention Discrete Choice Model with Bayesian Persuasion

Authors: Moyu Liao

This paper studies the semi-parametric identification and estimation of a
rational inattention model with Bayesian persuasion. The identification
requires the observation of a cross-section of market-level outcomes. The
empirical content of the model can be characterized by three moment conditions.
A two-step estimation procedure is proposed to avoid computational complexity
in the structural model. In the empirical application, I study the persuasion
effect of Fox News in the 2000 presidential election. Welfare analysis shows
that persuasion will not influence voters with a high school education but will
generate higher dispersion in the welfare of voters with a partial college
education and decrease the dispersion in the welfare of voters with a
bachelor's degree.

arXiv link: http://arxiv.org/abs/2009.08045v1

Econometrics arXiv updated paper (originally submitted: 2020-09-16)

Manipulation-Robust Regression Discontinuity Designs

Authors: Takuya Ishihara, Masayuki Sawada

We present simple low-level conditions for identification in regression
discontinuity designs using a potential outcome framework for the manipulation
of the running variable. Using this framework, we replace the existing
identification statement with two restrictions on manipulation. Our framework
highlights the critical role of the continuous density of the running variable
in identification. In particular, we establish a low-level auxiliary assumption
under which the diagnostic density test can detect manipulation that threatens
identification, so that the design is manipulation-robust.

arXiv link: http://arxiv.org/abs/2009.07551v7

Econometrics arXiv paper, submitted: 2020-09-15

Encompassing Tests for Value at Risk and Expected Shortfall Multi-Step Forecasts based on Inference on the Boundary

Authors: Timo Dimitriadis, Xiaochun Liu, Julie Schnaitmann

We propose forecast encompassing tests for the Expected Shortfall (ES)
jointly with the Value at Risk (VaR) based on flexible link (or combination)
functions. Our setup allows testing encompassing for convex forecast
combinations and for link functions which preclude crossings of the combined
VaR and ES forecasts. As the tests based on these link functions involve
parameters which are on the boundary of the parameter space under the null
hypothesis, we derive and base our tests on nonstandard asymptotic theory on
the boundary. Our simulation study shows that the encompassing tests based on
our new link functions outperform tests based on unrestricted linear link
functions for one-step and multi-step forecasts. We further illustrate the
potential of the proposed tests in a real data analysis for forecasting VaR and
ES of the S&P 500 index.
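
As a schematic example of the link functions involved (not necessarily the
authors' exact parameterization), a convex combination of two joint (VaR, ES)
forecasts can be written as

$$
\big(\mathrm{VaR}^{c}_t,\ \mathrm{ES}^{c}_t\big)
= \big(\theta\,\mathrm{VaR}^{(1)}_t + (1-\theta)\,\mathrm{VaR}^{(2)}_t,\;
       \theta\,\mathrm{ES}^{(1)}_t + (1-\theta)\,\mathrm{ES}^{(2)}_t\big),
\qquad \theta \in [0,1].
$$

The null that one forecast encompasses the other corresponds to $\theta$ taking
the value $0$ or $1$, i.e. a point on the boundary of the parameter space, which
is why nonstandard asymptotic theory is required.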

arXiv link: http://arxiv.org/abs/2009.07341v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2020-09-14

The Frisch--Waugh--Lovell Theorem for Standard Errors

Authors: Peng Ding

The Frisch--Waugh--Lovell Theorem states the equivalence of the coefficients
from the full and partial regressions. I further show the equivalence between
various standard errors. Applying the new result to stratified experiments
reveals the discrepancy between model-based and design-based standard errors.

arXiv link: http://arxiv.org/abs/2009.06621v1

Econometrics arXiv paper, submitted: 2020-09-14

Spatial Differencing for Sample Selection Models with Unobserved Heterogeneity

Authors: Alexander Klein, Guy Tchuente

This paper derives identification, estimation, and inference results using
spatial differencing in sample selection models with unobserved heterogeneity.
We show that under the assumption of smooth changes across space of the
unobserved sub-location specific heterogeneities and inverse Mills ratio, key
parameters of a sample selection model are identified. The smoothness of the
sub-location specific heterogeneities implies a correlation in the outcomes. We
assume that the correlation is restricted within a location or cluster and
derive asymptotic results showing that as the number of independent clusters
increases, the estimators are consistent and asymptotically normal. We also
propose a formula for standard error estimation. A Monte-Carlo experiment
illustrates the small sample properties of our estimator. The application of
our procedure to estimate the determinants of the municipality tax rate in
Finland shows the importance of accounting for unobserved heterogeneity.

arXiv link: http://arxiv.org/abs/2009.06570v1

Econometrics arXiv updated paper (originally submitted: 2020-09-14)

Vector copulas

Authors: Yanqin Fan, Marc Henry

This paper introduces vector copulas associated with multivariate
distributions with given multivariate marginals, based on the theory of measure
transportation, and establishes a vector version of Sklar's theorem. The latter
provides a theoretical justification for the use of vector copulas to
characterize nonlinear or rank dependence between a finite number of random
vectors (robust to within vector dependence), and to construct multivariate
distributions with any given non-overlapping multivariate marginals. We
construct Elliptical and Kendall families of vector copulas, derive their
densities, and present algorithms to generate data from them. The use of vector
copulas is illustrated with a stylized analysis of international financial
contagion.

arXiv link: http://arxiv.org/abs/2009.06558v2

Econometrics arXiv updated paper (originally submitted: 2020-09-14)

Robust discrete choice models with t-distributed kernel errors

Authors: Rico Krueger, Michel Bierlaire, Thomas Gasos, Prateek Bansal

Outliers in discrete choice response data may result from misclassification
and misreporting of the response variable and from choice behaviour that is
inconsistent with modelling assumptions (e.g. random utility maximisation). In
the presence of outliers, standard discrete choice models produce biased
estimates and suffer from compromised predictive accuracy. Robust statistical
models are less sensitive to outliers than standard non-robust models. This
paper analyses two robust alternatives to the multinomial probit (MNP) model.
The two models are robit models whose kernel error distributions are
heavy-tailed t-distributions to moderate the influence of outliers. The first
model is the multinomial robit (MNR) model, in which a generic degrees of
freedom parameter controls the heavy-tailedness of the kernel error
distribution. The second model, the generalised multinomial robit (Gen-MNR)
model, is more flexible than MNR, as it allows for distinct heavy-tailedness in
each dimension of the kernel error distribution. For both models, we derive
Gibbs samplers for posterior inference. In a simulation study, we illustrate
the excellent finite sample properties of the proposed Bayes estimators and
show that MNR and Gen-MNR produce more accurate estimates if the choice data
contain outliers, as seen through the lens of the non-robust MNP model. In a case study
on transport mode choice behaviour, MNR and Gen-MNR outperform MNP by
substantial margins in terms of in-sample fit and out-of-sample predictive
accuracy. The case study also highlights differences in elasticity estimates
across models.

arXiv link: http://arxiv.org/abs/2009.06383v3

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2020-09-13

Bayesian modelling of time-varying conditional heteroscedasticity

Authors: Sayar Karmakar, Arkaprava Roy

Conditional heteroscedastic (CH) models are routinely used to analyze
financial datasets. The classical models such as ARCH-GARCH with time-invariant
coefficients are often inadequate to describe frequent changes over time due to
market variability. However, we can achieve significantly better insight by
considering the time-varying analogues of these models. In this paper, we
propose a Bayesian approach to the estimation of such models and develop a
computationally efficient MCMC algorithm based on Hamiltonian Monte Carlo (HMC)
sampling. We also establish posterior contraction rates with increasing sample
size in terms of the average Hellinger metric. The performance of our method is
compared with frequentist estimates and estimates from the time-constant
analogues. To conclude the paper, we obtain time-varying parameter
estimates for some popular Forex (currency conversion rate) and stock market
datasets.

arXiv link: http://arxiv.org/abs/2009.06007v2

Econometrics arXiv updated paper (originally submitted: 2020-09-12)

Regularized Solutions to Linear Rational Expectations Models

Authors: Majid M. Al-Sadoon

This paper proposes an algorithm for computing regularized solutions to
linear rational expectations models. The algorithm allows for regularization
cross-sectionally as well as across frequencies. A variety of numerical
examples illustrate the advantage of regularization.

arXiv link: http://arxiv.org/abs/2009.05875v3

Econometrics arXiv paper, submitted: 2020-09-11

Inferring hidden potentials in analytical regions: uncovering crime suspect communities in Medellín

Authors: Alejandro Puerta, Andrés Ramírez-Hassan

This paper proposes a Bayesian approach to perform inference regarding the
size of hidden populations at the level of analytical regions using reported statistics. To
do so, we propose a specification taking into account one-sided error
components and spatial effects within a panel data structure. Our simulation
exercises suggest good finite sample performance. We analyze rates of crime
suspects living per neighborhood in Medellín (Colombia) associated with four
crime activities. Our proposal seems to identify hot spots or "crime
communities", potential neighborhoods where under-reporting is more severe, and
also drivers of crime schools. Statistical evidence suggests a high level of
interaction between homicides and drug dealing on the one hand, and motorcycle
and car thefts on the other.

arXiv link: http://arxiv.org/abs/2009.05360v1

Econometrics arXiv updated paper (originally submitted: 2020-09-10)

Inference for high-dimensional exchangeable arrays

Authors: Harold D. Chiang, Kengo Kato, Yuya Sasaki

We consider inference for high-dimensional separately and jointly
exchangeable arrays where the dimensions may be much larger than the sample
sizes. For both exchangeable arrays, we first derive high-dimensional central
limit theorems over the rectangles and subsequently develop novel multiplier
bootstraps with theoretical guarantees. These theoretical results rely on new
technical tools such as Hoeffding-type decomposition and maximal inequalities
for the degenerate components in the Hoeffding-type decomposition for the
exchangeable arrays. We exhibit applications of our methods to uniform
confidence bands for density estimation under joint exchangeability and penalty
choice for $\ell_1$-penalized regression under separate exchangeability.
Extensive simulations demonstrate precise uniform coverage rates. We illustrate
by constructing uniform confidence bands for international trade network
densities.

arXiv link: http://arxiv.org/abs/2009.05150v4

Econometrics arXiv paper, submitted: 2020-09-10

Capital Flows and the Stabilizing Role of Macroprudential Policies in CESEE

Authors: Markus Eller, Niko Hauzenberger, Florian Huber, Helene Schuberth, Lukas Vashold

In line with the recent policy discussion on the use of macroprudential
measures to respond to cross-border risks arising from capital flows, this
paper tries to quantify to what extent macroprudential policies (MPPs) have
been able to stabilize capital flows in Central, Eastern and Southeastern
Europe (CESEE) -- a region that experienced a substantial boom-bust cycle in
capital flows amid the global financial crisis and where policymakers had been
quite active in adopting MPPs already before that crisis. To study the dynamic
responses of capital flows to MPP shocks, we propose a novel regime-switching
factor-augmented vector autoregressive (FAVAR) model. It allows us to capture
potential structural breaks in the policy regime and to control -- besides
domestic macroeconomic quantities -- for the impact of global factors such as
the global financial cycle. Feeding into this model a novel intensity-adjusted
macroprudential policy index, we find that tighter MPPs may be effective in
containing domestic private sector credit growth and the volumes of gross
capital inflows in a majority of the countries analyzed. However, they do not
seem to generally shield CESEE countries from capital flow volatility.

arXiv link: http://arxiv.org/abs/2009.06391v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2020-09-09

A Framework for Crop Price Forecasting in Emerging Economies by Analyzing the Quality of Time-series Data

Authors: Ayush Jain, Smit Marvaniya, Shantanu Godbole, Vitobha Munigala

Accuracy of crop price forecasting techniques is important because it enables
the supply chain planners and government bodies to take appropriate actions by
estimating market factors such as demand and supply. In emerging economies such
as India, the crop prices at marketplaces are manually entered every day, which
can be prone to human-induced errors like the entry of incorrect data or entry
of no data for many days. In addition to such human-induced errors, the
fluctuations in the prices themselves make the creation of a stable and robust
forecasting solution a challenging task. Considering such complexities in crop
price forecasting, in this paper, we present techniques to build robust crop
price prediction models considering various features such as (i) historical
price and market arrival quantity of crops, (ii) historical weather data that
influence crop production and transportation, (iii) data quality-related
features obtained by performing statistical analysis. We additionally propose a
framework for context-based model selection and retraining considering factors
such as model stability, data quality metrics, and trend analysis of crop
prices. To show the efficacy of the proposed approach, we show experimental
results on two crops - Tomato and Maize for 14 marketplaces in India and
demonstrate that the proposed approach not only improves accuracy metrics
significantly when compared against the standard forecasting techniques but
also provides robust models.

arXiv link: http://arxiv.org/abs/2009.04171v1

Econometrics arXiv updated paper (originally submitted: 2020-09-08)

Exact Computation of Maximum Rank Correlation Estimator

Authors: Youngki Shin, Zvezdomir Todorov

In this paper we provide a computation algorithm to get a global solution for
the maximum rank correlation estimator using the mixed integer programming
(MIP) approach. We construct a new constrained optimization problem by
transforming all indicator functions into binary parameters to be estimated and
show that it is equivalent to the original problem. We also consider an
application of the best subset rank prediction and show that the original
optimization problem can be reformulated as MIP. We derive the non-asymptotic
bound for the tail probability of the predictive performance measure. We
investigate the performance of the MIP algorithm by an empirical example and
Monte Carlo simulations.
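
For intuition, the sketch below evaluates the maximum rank correlation
objective $S_n(\beta) = \sum_{i \neq j} 1\{y_i > y_j\} 1\{x_i'\beta > x_j'\beta\}$
on a one-dimensional grid after the usual scale normalization of the first
coefficient; the grid search is purely illustrative, whereas the point of the
paper is to recover the global maximizer exactly via mixed integer programming.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 200
    X = rng.normal(size=(n, 2))
    y = (X @ np.array([1.0, 1.5]) + rng.logistic(size=n) > 0).astype(float)

    def mrc_objective(b2, X, y):
        """S_n(beta) for beta = (1, b2): count concordant pairs."""
        idx = X @ np.array([1.0, b2])
        return np.sum((y[:, None] > y[None, :]) & (idx[:, None] > idx[None, :]))

    grid = np.linspace(-4.0, 4.0, 161)
    values = [mrc_objective(b, X, y) for b in grid]
    print("grid maximizer of the second coefficient:", grid[int(np.argmax(values))])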

arXiv link: http://arxiv.org/abs/2009.03844v2

Econometrics arXiv updated paper (originally submitted: 2020-09-08)

Local Composite Quantile Regression for Regression Discontinuity

Authors: Xiao Huang, Zhaoguo Zhan

We introduce the local composite quantile regression (LCQR) to causal
inference in regression discontinuity (RD) designs. Kai et al. (2010) study the
efficiency property of LCQR, while we show that its nice boundary performance
translates to accurate estimation of treatment effects in RD under a variety of
data generating processes. Moreover, we propose a bias-corrected and standard
error-adjusted t-test for inference, which leads to confidence intervals with
good coverage probabilities. A bandwidth selector is also discussed. For
illustration, we conduct a simulation study and revisit a classic example from
Lee (2008). A companion R package rdcqr is developed.

arXiv link: http://arxiv.org/abs/2009.03716v3

Econometrics arXiv paper, submitted: 2020-09-07

Counterfactual and Welfare Analysis with an Approximate Model

Authors: Roy Allen, John Rehbeck

We propose a conceptual framework for counterfactual and welfare analysis for
approximate models. Our key assumption is that model approximation error is the
same magnitude at new choices as at the observed data. Applying the framework to
quasilinear utility, we obtain bounds on quantities at new prices using an
approximate law of demand. We then bound utility differences between bundles
and welfare differences between prices. All bounds are computable as linear
programs. We provide detailed analytical results describing how the data map to
the bounds including shape restrictions that provide a foundation for plug-in
estimation. An application to gasoline demand illustrates the methodology.

arXiv link: http://arxiv.org/abs/2009.03379v1

Econometrics arXiv updated paper (originally submitted: 2020-09-07)

Dimension Reduction for High Dimensional Vector Autoregressive Models

Authors: Gianluca Cubadda, Alain Hecq

This paper aims to decompose a large dimensional vector autoregressive (VAR)
model into two components, the first one being generated by a small-scale VAR
and the second one being a white noise sequence. Hence, a reduced number of
common components generates the entire dynamics of the large system through a
VAR structure. This modelling, which we label as the dimension-reducible VAR,
extends the common feature approach to high dimensional systems, and it differs
from the dynamic factor model in which the idiosyncratic component can also
embed a dynamic pattern. We show the conditions under which this decomposition
exists. We provide statistical tools to detect its presence in the data and to
estimate the parameters of the underlying small-scale VAR model. Based on our
methodology, we propose a novel approach to identify the shock that is
responsible for most of the common variability at the business cycle
frequencies. We evaluate the practical value of the proposed methods by
simulations as well as by an empirical application to a large set of US
economic variables.

arXiv link: http://arxiv.org/abs/2009.03361v3

Econometrics arXiv paper, submitted: 2020-09-07

Doubly Robust Semiparametric Difference-in-Differences Estimators with High-Dimensional Data

Authors: Yang Ning, Sida Peng, Jing Tao

This paper proposes a doubly robust two-stage semiparametric
difference-in-differences estimator for estimating heterogeneous treatment
effects with high-dimensional data. Our new estimator is robust to model
misspecifications and allows for, but does not require, many more regressors
than observations. The first stage allows a general set of machine learning
methods to be used to estimate the propensity score. In the second stage, we
derive the rates of convergence for both the parametric parameter and the
unknown function under a partially linear specification for the outcome
equation. We also provide bias correction procedures to allow for valid
inference for the heterogeneous treatment effects. We evaluate the finite
sample performance with extensive simulation studies. Additionally, a real data
analysis on the effect of Fair Minimum Wage Act on the unemployment rate is
performed as an illustration of our method. An R package for implementing the
proposed method is available on Github.

arXiv link: http://arxiv.org/abs/2009.03151v1

Econometrics arXiv updated paper (originally submitted: 2020-09-07)

Two-Stage Maximum Score Estimator

Authors: Wayne Yuan Gao, Sheng Xu, Kan Xu

This paper considers the asymptotic theory of a semiparametric M-estimator
that is generally applicable to models that satisfy a monotonicity condition in
one or several parametric indexes. We call the estimator two-stage maximum
score (TSMS) estimator since our estimator involves a first-stage nonparametric
regression when applied to the binary choice model of Manski (1975, 1985). We
characterize the asymptotic distribution of the TSMS estimator, which features
phase transitions depending on the dimension and thus the convergence rate of
the first-stage estimation. Effectively, the first-stage nonparametric
estimator serves as an imperfect smoothing function on a non-smooth criterion
function, leading to the pivotality of the first-stage estimation error with
respect to the second-stage convergence rate and asymptotic distribution.

arXiv link: http://arxiv.org/abs/2009.02854v4

Econometrics arXiv updated paper (originally submitted: 2020-09-06)

Decomposing Identification Gains and Evaluating Instrument Identification Power for Partially Identified Average Treatment Effects

Authors: Lina Zhang, David T. Frazier, D. S. Poskitt, Xueyan Zhao

This paper examines the identification power of instrumental variables (IVs)
for average treatment effect (ATE) in partially identified models. We decompose
the ATE identification gains into components of contributions driven by IV
relevancy, IV strength, direction and degree of treatment endogeneity, and
matching via exogenous covariates. Our decomposition is demonstrated with
graphical illustrations, simulation studies and an empirical example of
childbearing and women's labour supply. Our analysis offers insights for
understanding the complex role of IVs in ATE identification and for selecting
IVs in practical policy designs. Simulations also suggest potential uses of our
analysis for detecting irrelevant instruments.

arXiv link: http://arxiv.org/abs/2009.02642v3

Econometrics arXiv updated paper (originally submitted: 2020-09-05)

COVID-19: Tail Risk and Predictive Regressions

Authors: Walter Distaso, Rustam Ibragimov, Alexander Semenov, Anton Skrobotov

The paper focuses on econometrically justified robust analysis of the effects
of the COVID-19 pandemic on financial markets in different countries across the
World. It provides the results of robust estimation and inference on predictive
regressions for returns on major stock indexes in 23 countries in North and
South America, Europe, and Asia incorporating the time series of reported
infections and deaths from COVID-19. We also present a detailed study of
persistence, heavy-tailedness and tail risk properties of the time series of
the COVID-19 infections and death rates that motivate the necessity of applying
robust inference methods in the analysis. Econometrically
justified analysis is based on heteroskedasticity and autocorrelation
consistent (HAC) inference methods, recently developed robust $t$-statistic
inference approaches and robust tail index estimation.

arXiv link: http://arxiv.org/abs/2009.02486v3

Econometrics arXiv updated paper (originally submitted: 2020-09-04)

Heterogeneous Coefficients, Control Variables, and Identification of Multiple Treatment Effects

Authors: Whitney K. Newey, Sami Stouli

Multidimensional heterogeneity and endogeneity are important features of
models with multiple treatments. We consider a heterogeneous coefficients model
where the outcome is a linear combination of dummy treatment variables, with
each variable representing a different kind of treatment. We use control
variables to give necessary and sufficient conditions for identification of
average treatment effects. With mutually exclusive treatments we find that,
provided the heterogeneous coefficients are mean independent from treatments
given the controls, a simple identification condition is that the generalized
propensity scores (Imbens, 2000) be bounded away from zero and that their sum
be bounded away from one, with probability one. Our analysis extends to
distributional and quantile treatment effects, as well as corresponding
treatment effects on the treated. These results generalize the classical
identification result of Rosenbaum and Rubin (1983) for binary treatments.

arXiv link: http://arxiv.org/abs/2009.02314v3

Econometrics arXiv updated paper (originally submitted: 2020-09-04)

Cointegrating Polynomial Regressions with Power Law Trends: Environmental Kuznets Curve or Omitted Time Effects?

Authors: Yicong Lin, Hanno Reuvers

The environmental Kuznets curve predicts an inverted U-shaped relationship
between environmental pollution and economic growth. Current analyses
frequently employ models which restrict nonlinearities in the data to be
explained by the economic growth variable only. We propose a Generalized
Cointegrating Polynomial Regression (GCPR) to allow for an alternative source
of nonlinearity. More specifically, the GCPR is a seemingly unrelated
regression with (1) integer powers of deterministic and stochastic trends for
the individual units, and (2) a common flexible global trend. We estimate this
GCPR by nonlinear least squares and derive its asymptotic distribution.
Endogeneity of the regressors will introduce nuisance parameters into the
limiting distribution but a simulation-based approach nevertheless enables us
to conduct valid inference. A multivariate subsampling KPSS test is proposed to
verify the correct specification of the cointegrating relation. Our simulation
study shows good performance of the simulated inference approach and
subsampling KPSS test. We illustrate the GCPR approach using data for Austria,
Belgium, Finland, the Netherlands, Switzerland, and the UK. A single global
trend accurately captures all nonlinearities leading to a linear cointegrating
relation between GDP and CO2 for all countries. This suggests that the
environmental improvement of recent years is due to economic factors other than
GDP.

arXiv link: http://arxiv.org/abs/2009.02262v2

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2020-09-04

Unlucky Number 13? Manipulating Evidence Subject to Snooping

Authors: Uwe Hassler, Marc-Oliver Pohle

Questionable research practices like HARKing or p-hacking have generated
considerable recent interest throughout and beyond the scientific community. We
subsume such practices involving secret data snooping that influences
subsequent statistical inference under the term MESSing (manipulating evidence
subject to snooping) and discuss, illustrate and quantify the possibly dramatic
effects of several forms of MESSing using an empirical and a simple theoretical
example. The empirical example uses numbers from the most popular German
lottery, which seem to suggest that 13 is an unlucky number.

arXiv link: http://arxiv.org/abs/2009.02198v1

Econometrics arXiv updated paper (originally submitted: 2020-09-04)

Instrument Validity for Heterogeneous Causal Effects

Authors: Zhenting Sun

This paper provides a general framework for testing instrument validity in
heterogeneous causal effect models. The generalization includes the cases where
the treatment can be multivalued ordered or unordered. Based on a series of
testable implications, we propose a nonparametric test which is proved to be
asymptotically size controlled and consistent. Compared to the tests in the
literature, our test can be applied in more general settings and may achieve
power improvement. Refutation of instrument validity by the test helps detect
invalid instruments that may yield implausible results on causal effects.
Evidence that the test performs well on finite samples is provided via
simulations. We revisit the empirical study on return to schooling to
demonstrate application of the proposed test in practice. An extended
continuous mapping theorem and an extended delta method, which may be of
independent interest, are provided to establish the asymptotic distribution of
the test statistic under the null.

arXiv link: http://arxiv.org/abs/2009.01995v6

Econometrics arXiv paper, submitted: 2020-09-03

The role of parallel trends in event study settings: An application to environmental economics

Authors: Michelle Marcus, Pedro H. C. Sant'Anna

Difference-in-Differences (DID) research designs usually rely on variation of
treatment timing such that, after making an appropriate parallel trends
assumption, one can identify, estimate, and make inference about causal
effects. In practice, however, different DID procedures rely on different
parallel trends assumptions (PTA), and recover different causal parameters. In
this paper, we focus on staggered DID (also referred as event-studies) and
discuss the role played by the PTA in terms of identification and estimation of
causal parameters. We document a “robustness” vs. “efficiency” trade-off in
terms of the strength of the underlying PTA, and argue that practitioners
should be explicit about these trade-offs whenever using DID procedures. We
propose new DID estimators that reflect these trade-offs and derived their
large sample properties. We illustrate the practical relevance of these results
by assessing whether the transition from federal to state management of the
Clean Water Act affects compliance rates.

arXiv link: http://arxiv.org/abs/2009.01963v1

Econometrics arXiv cross-link from cs.CY (cs.CY), submitted: 2020-09-03

Deep Learning in Science

Authors: Stefano Bianchini, Moritz Müller, Pierre Pelletier

Much of the recent success of Artificial Intelligence (AI) has been spurred
on by impressive achievements within a broader family of machine learning
methods, commonly referred to as Deep Learning (DL). This paper provides
insights on the diffusion and impact of DL in science. Through a Natural
Language Processing (NLP) approach on the arXiv.org publication corpus, we
delineate the emerging DL technology and identify a list of relevant search
terms. These search terms allow us to retrieve DL-related publications from Web
of Science across all sciences. Based on that sample, we document the DL
diffusion process in the scientific system. We find i) an exponential growth in
the adoption of DL as a research tool across all sciences and all over the
world, ii) regional differentiation in DL application domains, and iii) a
transition from interdisciplinary DL applications to disciplinary research
within application domains. In a second step, we investigate how the adoption
of DL methods affects scientific development. Therefore, we empirically assess
how DL adoption relates to re-combinatorial novelty and scientific impact in
the health sciences. We find that DL adoption is negatively correlated with
re-combinatorial novelty, but positively correlated with expectation as well as
variance of citation performance. Our findings suggest that DL does not (yet?)
work as an autopilot to navigate complex knowledge landscapes and overthrow
their structure. However, the 'DL principle' qualifies for its versatility as
the nucleus of a general scientific method that advances science in a
measurable way.

arXiv link: http://arxiv.org/abs/2009.01575v2

Econometrics arXiv updated paper (originally submitted: 2020-09-03)

A Robust Score-Driven Filter for Multivariate Time Series

Authors: Enzo D'Innocenzo, Alessandra Luati, Mario Mazzocchi

A multivariate score-driven filter is developed to extract signals from noisy
vector processes. By assuming that the conditional location vector from a
multivariate Student's t distribution changes over time, we construct a robust
filter which is able to overcome several issues that naturally arise when
modeling heavy-tailed phenomena and, more generally, vectors of dependent
non-Gaussian time series. We derive conditions for stationarity and
invertibility and estimate the unknown parameters by maximum likelihood (ML).
Strong consistency and asymptotic normality of the estimator are proved and the
finite sample properties are illustrated by a Monte-Carlo study. From a
computational point of view, analytical formulae are derived, which enable the
development of estimation procedures based on the Fisher scoring method. The theory is
supported by a novel empirical illustration that shows how the model can be
effectively applied to estimate consumer prices from home scanner data.
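
A univariate analogue conveys the robustness mechanism (the paper's filter is
multivariate and its exact parameterization may differ): under a Student's t
likelihood, the score-driven location update downweights large residuals
through the weight $w_t$, so isolated outliers barely move the filtered signal.
All parameter values below are arbitrary illustrations.

    import numpy as np

    rng = np.random.default_rng(3)
    T = 300
    # slowly moving location observed with heavy-tailed noise
    y = np.cumsum(rng.normal(scale=0.1, size=T)) + rng.standard_t(df=3, size=T)

    omega, a, b, nu, sigma2 = 0.0, 0.3, 0.95, 3.0, 1.0
    mu = np.zeros(T + 1)
    for t in range(T):
        e = y[t] - mu[t]
        w = (nu + 1.0) / (nu + e**2 / sigma2)   # Student-t score weight
        mu[t + 1] = omega + b * mu[t] + a * w * e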

arXiv link: http://arxiv.org/abs/2009.01517v3

Econometrics arXiv updated paper (originally submitted: 2020-09-03)

Hidden Group Time Profiles: Heterogeneous Drawdown Behaviours in Retirement

Authors: Igor Balnozan, Denzil G. Fiebig, Anthony Asher, Robert Kohn, Scott A. Sisson

This article investigates retirement decumulation behaviours using the
Grouped Fixed-Effects (GFE) estimator applied to Australian panel data on
drawdowns from phased withdrawal retirement income products. Behaviours
exhibited by the distinct latent groups identified suggest that retirees may
adopt simple heuristics determining how they draw down their accumulated
wealth. Two extensions to the original GFE methodology are proposed: a latent
group label-matching procedure which broadens bootstrap inference to include
the time profile estimates, and a modified estimation procedure for models with
time-invariant additive fixed effects estimated using unbalanced data.

arXiv link: http://arxiv.org/abs/2009.01505v3

Econometrics arXiv updated paper (originally submitted: 2020-09-01)

A Vector Monotonicity Assumption for Multiple Instruments

Authors: Leonard Goff

When a researcher combines multiple instrumental variables for a single
binary treatment, the monotonicity assumption of the local average treatment
effects (LATE) framework can become restrictive: it requires that all units
share a common direction of response even when separate instruments are shifted
in opposing directions. What I call vector monotonicity, by contrast, simply
assumes treatment uptake to be monotonic in all instruments. I characterize the
class of causal parameters that are point identified under vector monotonicity,
when the instruments are binary. This class includes, for example, the average
treatment effect among units that are in any way responsive to the collection
of instruments, or those that are responsive to a given subset of them. The
identification results are constructive and yield a simple estimator for the
identified treatment effect parameters. An empirical application revisits the
labor market returns to college.

arXiv link: http://arxiv.org/abs/2009.00553v6

Econometrics arXiv updated paper (originally submitted: 2020-09-01)

Time-Varying Parameters as Ridge Regressions

Authors: Philippe Goulet Coulombe

Time-varying parameters (TVPs) models are frequently used in economics to
capture structural change. I highlight a rather underutilized fact -- that
these are actually ridge regressions. Instantly, this makes computations,
tuning, and implementation much easier than in the state-space paradigm. Among
other things, solving the equivalent dual ridge problem is computationally very
fast even in high dimensions, and the crucial "amount of time variation" is
tuned by cross-validation. Evolving volatility is dealt with using a two-step
ridge regression. I consider extensions that incorporate sparsity (the
algorithm selects which parameters vary and which do not) and reduced-rank
restrictions (variation is tied to a factor model). To demonstrate the
usefulness of the approach, I use it to study the evolution of monetary policy
in Canada using large time-varying local projections. The application requires
the estimation of about 4600 TVPs, a task well within the reach of the new
method.
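
A minimal sketch of the equivalence, assuming the common random-walk
specification $y_t = x_t'\beta_t + \varepsilon_t$, $\beta_t = \beta_{t-1} + u_t$:
stacking the initial state and the increments turns the model into one large
regression, and the "amount of time variation" becomes a ridge penalty on the
increments. Dimensions and the penalty value are illustrative; as described
above, the paper tunes the penalty by cross-validation and treats evolving
volatility with a second ridge step.

    import numpy as np

    rng = np.random.default_rng(4)
    T, k, lam = 200, 2, 50.0
    X = rng.normal(size=(T, k))
    beta_path = np.cumsum(rng.normal(scale=0.05, size=(T, k)), axis=0) + 1.0
    y = np.sum(X * beta_path, axis=1) + rng.normal(scale=0.5, size=T)

    # Design over theta = (beta_1, u_2, ..., u_T): row t carries x_t in every
    # block s <= t, since beta_t = beta_1 + sum_{s <= t} u_s.
    Z = np.zeros((T, T * k))
    for t in range(T):
        for s in range(t + 1):
            Z[t, s * k:(s + 1) * k] = X[t]

    penalty = lam * np.eye(T * k)
    penalty[:k, :k] = 0.0                    # leave the initial state unpenalized
    theta = np.linalg.solve(Z.T @ Z + penalty, Z.T @ y)
    beta_hat = np.cumsum(theta.reshape(T, k), axis=0)   # estimated beta_t path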

arXiv link: http://arxiv.org/abs/2009.00401v4

Econometrics arXiv updated paper (originally submitted: 2020-09-01)

An optimal test for strategic interaction in social and economic network formation between heterogeneous agents

Authors: Andrin Pelican, Bryan S. Graham

Consider a setting where $N$ players, partitioned into $K$ observable types,
form a directed network. Agents' preferences over the form of the network
consist of an arbitrary network benefit function (e.g., agents may have
preferences over their network centrality) and a private component which is
additively separable in own links. This latter component allows for unobserved
heterogeneity in the costs of sending and receiving links across agents
(respectively out- and in- degree heterogeneity) as well as
homophily/heterophily across the $K$ types of agents. In contrast, the network
benefit function allows agents' preferences over links to vary with the
presence or absence of links elsewhere in the network (and hence with the link
formation behavior of their peers). In the null model which excludes the
network benefit function, links form independently across dyads in the manner
described by Charbonneau (2017). Under the alternative there is
interdependence across linking decisions (i.e., strategic interaction). We show
how to test the null with power optimized in specific directions. These
alternative directions include many common models of strategic network
formation (e.g., "connections" models, "structural hole" models etc.). Our
random utility specification induces an exponential family structure under the
null which we exploit to construct a similar test which exactly controls size
(despite the null being a composite one with many nuisance parameters). We
further show how to construct locally best tests for specific alternatives
without making any assumptions about equilibrium selection. To make our tests
feasible we introduce a new MCMC algorithm for simulating the null
distributions of our test statistics.

arXiv link: http://arxiv.org/abs/2009.00212v2

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2020-08-31

InClass Nets: Independent Classifier Networks for Nonparametric Estimation of Conditional Independence Mixture Models and Unsupervised Classification

Authors: Konstantin T. Matchev, Prasanth Shyamsundar

We introduce a new machine-learning-based approach, which we call the
Independent Classifier networks (InClass nets) technique, for the
nonparametric estimation of conditional independence mixture models (CIMMs).
We approach the estimation of a CIMM as a multi-class classification problem,
since dividing the dataset into different categories naturally leads to the
estimation of the mixture model. InClass nets consist of multiple independent
classifier neural networks (NNs), each of which handles one of the variates of
the CIMM. Fitting the CIMM to the data is performed by simultaneously training
the individual NNs using suitable cost functions. The ability of NNs to
approximate arbitrary functions makes our technique nonparametric. Further
leveraging the power of NNs, we allow the conditionally independent variates of
the model to be individually high-dimensional, which is the main advantage of
our technique over existing non-machine-learning-based approaches. We derive
some new results on the nonparametric identifiability of bivariate CIMMs, in
the form of a necessary and a (different) sufficient condition for a bivariate
CIMM to be identifiable. We provide a public implementation of InClass nets as
a Python package called RainDancesVI and validate our InClass nets technique
with several worked out examples. Our method also has applications in
unsupervised and semi-supervised classification problems.

arXiv link: http://arxiv.org/abs/2009.00131v1

Econometrics arXiv updated paper (originally submitted: 2020-08-31)

Identification of Semiparametric Panel Multinomial Choice Models with Infinite-Dimensional Fixed Effects

Authors: Wayne Yuan Gao, Ming Li

This paper proposes a robust method for semiparametric identification and
estimation in panel multinomial choice models, where we allow for
infinite-dimensional fixed effects that enter into consumer utilities in an
additively nonseparable way, thus incorporating rich forms of unobserved
heterogeneity. Our identification strategy exploits multivariate monotonicity
in parametric indexes, and uses the logical contraposition of an intertemporal
inequality on choice probabilities to obtain identifying restrictions. We
provide a consistent estimation procedure, and demonstrate the practical
advantages of our method with Monte Carlo simulations and an empirical
illustration on popcorn sales with the Nielsen data.

arXiv link: http://arxiv.org/abs/2009.00085v2

Econometrics arXiv updated paper (originally submitted: 2020-08-31)

Causal Inference in Possibly Nonlinear Factor Models

Authors: Yingjie Feng

This paper develops a general causal inference method for treatment effects
models with noisily measured confounders. The key feature is that a large set
of noisy measurements are linked with the underlying latent confounders through
an unknown, possibly nonlinear factor structure. The main building block is a
local principal subspace approximation procedure that combines $K$-nearest
neighbors matching and principal component analysis. Estimators of many causal
parameters, including average treatment effects and counterfactual
distributions, are constructed based on doubly-robust score functions.
Large-sample properties of these estimators are established, which only require
relatively mild conditions on the principal subspace approximation. The results
are illustrated with an empirical application studying the effect of political
connections on stock returns of financial firms, and a Monte Carlo experiment.
The main technical and methodological results regarding the general local
principal subspace approximation method may be of independent interest.
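
A schematic sketch of the local principal subspace step described above: for
each unit, collect its K nearest neighbors in the space of noisy measurements
and fit a low-dimensional principal subspace to that local block. The
measurement model, K, and the number of components are assumptions made for
the illustration only.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(5)
    n, p, r, K = 500, 20, 1, 50
    alpha = rng.uniform(size=(n, r))                    # latent confounders
    loadings = rng.normal(size=(r, p))
    # noisy measurements linked to the confounders through a nonlinear map
    X = np.sin(alpha @ loadings) + 0.1 * rng.normal(size=(n, p))

    nn = NearestNeighbors(n_neighbors=K).fit(X)
    _, neighbor_idx = nn.kneighbors(X)

    local_subspaces = []
    for i in range(n):
        local_block = X[neighbor_idx[i]]                # K nearest neighbors of unit i
        pca = PCA(n_components=r).fit(local_block)      # local principal subspace
        local_subspaces.append(pca.components_)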

arXiv link: http://arxiv.org/abs/2008.13651v3

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2020-08-29

Efficiency Loss of Asymptotically Efficient Tests in an Instrumental Variables Regression

Authors: Marcelo J. Moreira, Geert Ridder

In an instrumental variable model, the score statistic can be bounded for any
alternative in parts of the parameter space. These regions involve a constraint
on the first-stage regression coefficients and the reduced-form covariance
matrix. Consequently, the Lagrange Multiplier test can have power close to
size, despite being efficient under standard asymptotics. This information loss
limits the power of conditional tests which use only the Anderson-Rubin and the
score statistic. The conditional quasi-likelihood ratio test also suffers
severe losses because it can be bounded for any alternative.
A necessary condition for drastic power loss to occur is that the Hermitian
of the reduced-form covariance matrix has eigenvalues of opposite signs. These
cases are denoted impossibility designs (ID). We show this happens in practice
by applying our theory to the problem of inference on the intertemporal
elasticity of substitution (IES). Of the eleven countries studied by Yogo (2004)
and Andrews (2016), nine are consistent with ID at the 95% level.

arXiv link: http://arxiv.org/abs/2008.13042v2

Econometrics arXiv updated paper (originally submitted: 2020-08-28)

The Identity Fragmentation Bias

Authors: Tesary Lin, Sanjog Misra

Consumers interact with firms across multiple devices, browsers, and
machines; these interactions are often recorded with different identifiers for
the same consumer. The failure to correctly match different identities leads to
a fragmented view of exposures and behaviors. This paper studies the identity
fragmentation bias, referring to the estimation bias that results from using
fragmented data. Using a formal framework, we decompose the contributing
factors of the estimation bias caused by data fragmentation and discuss the
direction of bias. Contrary to conventional wisdom, this bias cannot be signed
or bounded under standard assumptions. Instead, upward biases and sign
reversals can occur even in experimental settings. We then compare several
corrective measures, and discuss their respective advantages and caveats.

arXiv link: http://arxiv.org/abs/2008.12849v2

Econometrics arXiv paper, submitted: 2020-08-28

Instrumental Variable Quantile Regression

Authors: Victor Chernozhukov, Christian Hansen, Kaspar Wuthrich

This chapter reviews the instrumental variable quantile regression model of
Chernozhukov and Hansen (2005). We discuss the key conditions used for
identification of structural quantile effects within this model, which include
the availability of instruments and a restriction on the ranks of structural
disturbances. We outline several approaches to obtaining point estimates and
performing statistical inference for model parameters. Finally, we point to
possible directions for future research.
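
As a rough illustration of how point estimates can be obtained in this model,
the following sketches the inverse quantile regression idea: profile over the
coefficient on the endogenous regressor and retain the value at which the
instrument carries no explanatory power in a quantile regression of the
transformed outcome. The grid, the simple absolute-coefficient criterion, and
the variable names are illustrative simplifications, not the estimators and
inference procedures reviewed in the chapter.

    import numpy as np
    import statsmodels.api as sm

    def ivqr_grid(y, d, z, x, tau=0.5, grid=np.linspace(-2, 2, 81)):
        """Profile over alpha, the coefficient on the endogenous variable d: for
        each candidate, run a quantile regression of y - d*alpha on (x, z) and
        record the absolute coefficient on the instrument z; keep the alpha that
        makes z least informative."""
        exog = sm.add_constant(np.column_stack([x, z]))
        best_alpha, best_stat = None, np.inf
        for alpha in grid:
            res = sm.QuantReg(y - d * alpha, exog).fit(q=tau)
            stat = abs(res.params[-1])  # coefficient on the instrument
            if stat < best_stat:
                best_alpha, best_stat = alpha, stat
        return best_alpha

    # toy usage: endogenous treatment, valid instrument, true effect 1.0
    rng = np.random.default_rng(1)
    n = 2000
    z, u, x = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
    d = 0.8 * z + 0.5 * u + rng.normal(size=n)
    y = 1.0 * d + 0.3 * x + u
    print(ivqr_grid(y, d, z, x))  # close to 1.0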

arXiv link: http://arxiv.org/abs/2009.00436v1

Econometrics arXiv updated paper (originally submitted: 2020-08-28)

Generalized Lee Bounds

Authors: Vira Semenova

Lee (2009) bounds are a common approach to bounding the average causal effect
in the presence of selection bias, assuming that the treatment effect on
selection has the same sign for all subjects. This paper generalizes Lee bounds to allow the sign
of this effect to be identified by pretreatment covariates, relaxing the
standard (unconditional) monotonicity to its conditional analog. Asymptotic
theory for generalized Lee bounds is proposed in low-dimensional smooth and
high-dimensional sparse designs. The paper also generalizes Lee bounds to
accommodate multiple outcomes. Focusing on the Job Corps job training program, I
first show that unconditional monotonicity is unlikely to hold, and then
demonstrate the use of covariates to tighten the bounds.
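
For reference, the following is a minimal sketch of the classical
(unconditional) Lee (2009) trimming bounds that this paper generalizes; it
assumes monotonicity with a higher selection rate in the treated arm, and the
toy data-generating process is purely illustrative.

    import numpy as np

    def lee_bounds(y, d, s):
        """Classical Lee (2009) bounds on the treatment effect for always-selected
        units. y: outcome (used only where s == 1), d: treatment, s: selection.
        Assumes monotonicity with P(S=1|D=1) >= P(S=1|D=0)."""
        p1, p0 = s[d == 1].mean(), s[d == 0].mean()
        trim = (p1 - p0) / p1                       # share of treated-selected to trim
        y1 = y[(d == 1) & (s == 1)]
        y0 = y[(d == 0) & (s == 1)]
        q_lo, q_hi = np.quantile(y1, [trim, 1 - trim])
        upper = y1[y1 >= q_lo].mean() - y0.mean()   # trim from below -> upper bound
        lower = y1[y1 <= q_hi].mean() - y0.mean()   # trim from above -> lower bound
        return lower, upper

    # toy usage: treatment raises both selection and outcomes
    rng = np.random.default_rng(2)
    n = 5000
    d = rng.integers(0, 2, n)
    s = (rng.random(n) < np.where(d == 1, 0.8, 0.6)).astype(int)
    y = 1.0 + 0.5 * d + rng.normal(size=n)
    print(lee_bounds(y, d, s))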

arXiv link: http://arxiv.org/abs/2008.12720v4

Econometrics arXiv updated paper (originally submitted: 2020-08-28)

Nowcasting in a Pandemic using Non-Parametric Mixed Frequency VARs

Authors: Florian Huber, Gary Koop, Luca Onorante, Michael Pfarrhofer, Josef Schreiner

This paper develops Bayesian econometric methods for posterior inference in
non-parametric mixed frequency VARs using additive regression trees. We argue
that regression tree models are ideally suited for macroeconomic nowcasting in
the face of extreme observations, for instance those produced by the COVID-19
pandemic of 2020. This is due to their flexibility and ability to model
outliers. In an application involving four major euro area countries, we find
substantial improvements in nowcasting performance relative to a linear mixed
frequency VAR.

arXiv link: http://arxiv.org/abs/2008.12706v3

Econometrics arXiv paper, submitted: 2020-08-28

How is Machine Learning Useful for Macroeconomic Forecasting?

Authors: Philippe Goulet Coulombe, Maxime Leroux, Dalibor Stevanovic, Stéphane Surprenant

We move beyond "Is Machine Learning Useful for Macroeconomic Forecasting?" by
adding the "how". The current forecasting literature has focused on matching
specific variables and horizons with a particularly successful algorithm. In
contrast, we study the usefulness of the underlying features driving ML gains
over standard macroeconometric methods. We distinguish four so-called features
(nonlinearities, regularization, cross-validation and alternative loss
function) and study their behavior in both the data-rich and data-poor
environments. To do so, we design experiments that allow us to identify the
"treatment" effects of interest. We conclude that (i) nonlinearity is the true
game changer for macroeconomic prediction, (ii) the standard factor model
remains the best regularization, (iii) K-fold cross-validation is the best
practice, and (iv) the $L_2$ loss is preferred to the $\bar{\epsilon}$-insensitive
in-sample loss. The forecasting gains of nonlinear techniques are associated
with high macroeconomic uncertainty, financial stress and housing bubble
bursts. This suggests that Machine Learning is useful for macroeconomic
forecasting by mostly capturing important nonlinearities that arise in the
context of uncertainty and financial frictions.

arXiv link: http://arxiv.org/abs/2008.12477v1

Econometrics arXiv updated paper (originally submitted: 2020-08-27)

Efficient closed-form estimation of large spatial autoregressions

Authors: Abhimanyu Gupta

Newton-step approximations to pseudo maximum likelihood estimates of spatial
autoregressive models with a large number of parameters are examined, in the
sense that the parameter space grows slowly as a function of sample size. These
have the same asymptotic efficiency properties as maximum likelihood under
Gaussianity but are of closed form. Hence they are computationally simple and
free from compactness assumptions, thereby avoiding two notorious pitfalls of
implicitly defined estimates of large spatial autoregressions. For an initial
least squares estimate, the Newton step can also lead to weaker regularity
conditions for a central limit theorem than those extant in the literature. A
simulation study demonstrates excellent finite sample gains from Newton
iterations, especially in large multiparameter models for which grid search is
costly. A small empirical illustration shows improvements in estimation
precision with real data.

arXiv link: http://arxiv.org/abs/2008.12395v4

Econometrics arXiv updated paper (originally submitted: 2020-08-25)

Inference for parameters identified by conditional moment restrictions using a generalized Bierens maximum statistic

Authors: Xiaohong Chen, Sokbae Lee, Myung Hwan Seo, Myunghyun Song

Many economic panel and dynamic models, such as rational behavior and Euler
equations, imply that the parameters of interest are identified by conditional
moment restrictions. We introduce a novel inference method without any prior
information about which conditioning instruments are weak or irrelevant.
Building on Bierens (1990), we propose penalized maximum statistics and combine
bootstrap inference with model selection. Our method optimizes asymptotic power
by solving a data-dependent max-min problem for tuning parameter selection.
Extensive Monte Carlo experiments, based on an empirical example, demonstrate
the extent to which our inference procedure is superior to those available in
the literature.

arXiv link: http://arxiv.org/abs/2008.11140v7

Econometrics arXiv cross-link from cond-mat.stat-mech (cond-mat.stat-mech), submitted: 2020-08-24

On the equivalence between the Kinetic Ising Model and discrete autoregressive processes

Authors: Carlo Campajola, Fabrizio Lillo, Piero Mazzarisi, Daniele Tantari

Binary random variables are the building blocks used to describe a large
variety of systems, from magnetic spins to financial time series and neuron
activity. In Statistical Physics the Kinetic Ising Model has been introduced to
describe the dynamics of the magnetic moments of a spin lattice, while in time
series analysis discrete autoregressive processes have been designed to capture
the multivariate dependence structure across binary time series. In this
article we provide a rigorous proof of the equivalence between the two models,
in the form of a unique and invertible map unambiguously linking one model's
parameter set to the other. Our result is further justified by the observation
that both models provide maximum entropy distributions of binary
time series with given means, auto-correlations, and lagged cross-correlations
of order one. We further show that the equivalence between the two models
permits exploiting the inference methods originally developed for one model in
the inference of the other.

arXiv link: http://arxiv.org/abs/2008.10666v3

Econometrics arXiv updated paper (originally submitted: 2020-08-24)

Finite-Sample Average Bid Auction

Authors: Haitian Xie

The paper studies the problem of auction design in a setting where the
auctioneer accesses the knowledge of the valuation distribution only through
statistical samples. A new framework is established that combines
statistical decision theory with mechanism design. Two optimality criteria,
maxmin and equivariance, are studied along with their implications for the form
of the auction. The simplest form of the equivariant auction is the average bid
auction, which sets individual reservation prices proportional to the average of
the other bids and historical samples. This form of auction can be motivated by
the Gamma distribution, and it sheds new light on the estimation of the optimal
price, an irregular parameter. Theoretical results show that it is often
possible to use the regular parameter, the population mean, to approximate the
optimal price. An adaptive average bid estimator is developed based on this
idea, and it has the same asymptotic properties as the empirical Myerson
estimator. The newly proposed estimator performs significantly better in terms
of value at risk and expected shortfall when the sample size is small.

arXiv link: http://arxiv.org/abs/2008.10217v2

Econometrics arXiv updated paper (originally submitted: 2020-08-21)

Empirical Likelihood Covariate Adjustment for Regression Discontinuity Designs

Authors: Jun Ma, Zhengfei Yu

This paper proposes a versatile covariate adjustment method that directly
incorporates covariate balance in regression discontinuity (RD) designs. The
new empirical entropy balancing method reweights the standard local polynomial
RD estimator by using the entropy balancing weights that minimize the
Kullback--Leibler divergence from the uniform weights while satisfying the
covariate balance constraints. Our estimator can be formulated as an empirical
likelihood estimator that efficiently incorporates the information from the
covariate balance condition as correctly specified over-identifying moment
restrictions, and thus has an asymptotic variance no larger than that of the
standard estimator without covariates. We demystify the asymptotic efficiency
gain of Calonico, Cattaneo, Farrell, and Titiunik (2019)'s regression-based
covariate-adjusted estimator, as their estimator has the same asymptotic
variance as ours. Further efficiency improvement from balancing over sieve
spaces is possible if our entropy balancing weights are computed using stronger
covariate balance constraints that are imposed on functions of covariates. We
then show that our method enjoys favorable second-order properties from
empirical likelihood estimation and inference: the estimator has a small
(bounded) nonlinearity bias, and the likelihood ratio based confidence set
admits a simple analytical correction that can be used to improve coverage
accuracy. The coverage accuracy of our confidence set is robust against slight
perturbation to the covariate balance condition, which may happen in cases such
as data contamination and misspecified "unaffected" outcomes used as
covariates. The proposed entropy balancing approach for covariate adjustment is
applicable to other RD-related settings.
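
The following is a minimal sketch of the entropy balancing step itself:
computing weights that minimize the Kullback-Leibler divergence from uniform
weights subject to covariate balance, via the usual exponential-tilting dual.
It does not implement the paper's local polynomial RD estimator or its
empirical likelihood inference, and the function name entropy_balance and the
toy data are hypothetical.

    import numpy as np
    from scipy.optimize import minimize

    def entropy_balance(C, target):
        """Weights that minimize the KL divergence from uniform weights subject to
        the balance constraints sum_i w_i * C[i] = target, obtained by minimizing
        the dual objective log sum_i exp(lambda'(C[i] - target)) over lambda."""
        G = C - target
        def dual(lam):
            v = G @ lam
            m = v.max()
            return m + np.log(np.exp(v - m).sum())   # stable log-sum-exp
        lam = minimize(dual, x0=np.zeros(C.shape[1]), method="BFGS").x
        v = G @ lam
        w = np.exp(v - v.max())
        return w / w.sum()

    # toy usage: reweight a sample so its covariate means hit a target of zero
    rng = np.random.default_rng(3)
    C = rng.normal(loc=0.3, size=(400, 2))
    w = entropy_balance(C, target=np.zeros(2))
    print(w @ C)  # approximately [0, 0]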

arXiv link: http://arxiv.org/abs/2008.09263v3

Econometrics arXiv updated paper (originally submitted: 2020-08-20)

Inference for Moment Inequalities: A Constrained Moment Selection Procedure

Authors: Rami V. Tabri, Christopher D. Walker

Inference in models where the parameter is defined by moment inequalities is
of interest in many areas of economics. This paper develops a new method for
improving the performance of generalized moment selection (GMS) testing
procedures in finite-samples. The method modifies GMS tests by tilting the
empirical distribution in its moment selection step by an amount that maximizes
the empirical likelihood subject to the restrictions of the null hypothesis. We
characterize sets of population distributions on which a modified GMS test is
(i) asymptotically equivalent to its non-modified version to first-order, and
(ii) superior to its non-modified version according to local power when the
sample size is large enough. An important feature of the proposed modification
is that it remains computationally feasible even when the number of moment
inequalities is large. We report simulation results that show the modified
tests control size well, and have markedly improved local power over their
non-modified counterparts.

arXiv link: http://arxiv.org/abs/2008.09021v2

Econometrics arXiv updated paper (originally submitted: 2020-08-19)

A Novel Approach to Predictive Accuracy Testing in Nested Environments

Authors: Jean-Yves Pitarakis

We introduce a new approach for comparing the predictive accuracy of two
nested models that bypasses the difficulties caused by the degeneracy of the
asymptotic variance of forecast error loss differentials used in the
construction of commonly used predictive comparison statistics. Our approach
continues to rely on the out-of-sample MSE loss differentials between the two
competing models, leads to nuisance-parameter-free Gaussian asymptotics, and is
shown to remain valid under flexible assumptions that can accommodate
heteroskedasticity and the presence of mixed predictors (e.g. stationary and
local to unit root). A local power analysis also establishes its ability to
detect departures from the null in both stationary and persistent settings.
Simulations calibrated to common economic and financial applications indicate
that our methods have strong power with good size control across commonly
encountered sample sizes.

arXiv link: http://arxiv.org/abs/2008.08387v3

Econometrics arXiv paper, submitted: 2020-08-18

Bounds on Distributional Treatment Effect Parameters using Panel Data with an Application on Job Displacement

Authors: Brantly Callaway

This paper develops new techniques to bound distributional treatment effect
parameters that depend on the joint distribution of potential outcomes -- an
object not identified by standard identifying assumptions such as selection on
observables or even when treatment is randomly assigned. I show that panel data
and an additional assumption on the dependence between untreated potential
outcomes for the treated group over time (i) provide more identifying power for
distributional treatment effect parameters than existing bounds and (ii)
provide a more plausible set of conditions than existing methods that obtain
point identification. I apply these bounds to study heterogeneity in the effect
of job displacement during the Great Recession. Using standard techniques, I
find that workers who were displaced during the Great Recession lost on average
34% of their earnings relative to their counterfactual earnings had they not
been displaced. Using the methods developed in the current paper, I also show
that the average effect masks substantial heterogeneity across workers.

arXiv link: http://arxiv.org/abs/2008.08117v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-08-18

Learning Structure in Nested Logit Models

Authors: Youssef M. Aboutaleb, Moshe Ben-Akiva, Patrick Jaillet

This paper introduces a new data-driven methodology for nested logit
structure discovery. Nested logit models allow the modeling of positive
correlations between the error terms of the utility specifications of the
different alternatives in a discrete choice scenario through the specification
of a nesting structure. Current nested logit model estimation practices require
an a priori specification of a nesting structure by the modeler. In this work,
we optimize over all possible specifications of the nested logit model
that are consistent with rational utility maximization. We formulate the
problem of learning an optimal nesting structure from the data as a mixed
integer nonlinear programming (MINLP) optimization problem and solve it using a
variant of the linear outer approximation algorithm. We exploit the tree
structure of the problem and utilize the latest advances in integer
optimization to bring practical tractability to the optimization problem we
introduce. We demonstrate the ability of our algorithm to correctly recover the
true nesting structure from synthetic data in a Monte Carlo experiment. In an
empirical illustration using a stated preference survey on modes of
transportation in the U.S. state of Massachusetts, we use our algorithm to
obtain an optimal nesting tree representing the correlations between the
unobserved effects of the different travel mode choices. We provide our
implementation as a customizable and open-source code base written in the Julia
programming language.

arXiv link: http://arxiv.org/abs/2008.08048v1

Econometrics arXiv paper, submitted: 2020-08-18

Peer effects and endogenous social interactions

Authors: Koen Jochmans

We introduce an approach to deal with self-selection of peers in the
linear-in-means model. Contrary to existing proposals, we do not require
specifying a model for how the selection of peers comes about. Rather, we exploit
two restrictions that are inherent to many such specifications to construct
intuitive instrumental variables. These restrictions are that link decisions
that involve a given individual are not all independent of one another, but
that they are independent of the link behavior between other pairs of
individuals. A two-stage least-squares estimator of the linear-in-means model
is then readily obtained.

arXiv link: http://arxiv.org/abs/2008.07886v1

Econometrics arXiv cross-link from math.OC (math.OC), submitted: 2020-08-18

A Relation Analysis of Markov Decision Process Frameworks

Authors: Tien Mai, Patrick Jaillet

We study the relation between different Markov Decision Process (MDP)
frameworks in the machine learning and econometrics literatures, including the
standard MDP, the entropy and general regularized MDP, and stochastic MDP,
where the latter is based on the assumption that the reward function is
stochastic and follows a given distribution. We show that the
entropy-regularized MDP is equivalent to a stochastic MDP model, and is
strictly subsumed by the general regularized MDP. Moreover, we propose a
distributional stochastic MDP framework by assuming that the distribution of
the reward function is ambiguous. We further show that the distributional
stochastic MDP is equivalent to the regularized MDP, in the sense that they
always yield the same optimal policies. We also provide a connection between
stochastic/regularized MDP and constrained MDP. Our work gives a unified view
of several important MDP frameworks, which suggests new ways to interpret the
(entropy/general) regularized MDP frameworks through the lens of stochastic
rewards, and vice versa. Given the recent popularity of regularized MDPs in
(deep) reinforcement learning, our work brings new understanding of how such
algorithmic schemes work and suggests ideas for developing new ones.
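
As a concrete illustration of the entropy-regularized MDP discussed above, the
following sketches tabular soft value iteration, the Bellman recursion
V(s) = tau * log sum_a exp((r(s,a) + gamma * E[V(s')]) / tau), whose optimal
policy is a softmax over the soft Q-values. The tabular example and the
temperature tau are illustrative; the sketch is not taken from the paper.

    import numpy as np
    from scipy.special import logsumexp

    def soft_value_iteration(P, r, gamma=0.95, tau=0.1, n_iter=500):
        """Tabular entropy-regularized (soft) value iteration:
        V(s) = tau * log sum_a exp((r(s, a) + gamma * E[V(s')]) / tau),
        with the optimal policy equal to the softmax of the soft Q-values.
        P: (S, A, S) transition tensor, r: (S, A) reward matrix."""
        V = np.zeros(r.shape[0])
        for _ in range(n_iter):
            Q = r + gamma * (P @ V)              # (S, A) soft Q-values
            V = tau * logsumexp(Q / tau, axis=1)
        policy = np.exp((Q - V[:, None]) / tau)  # softmax policy
        return V, policy

    # toy usage: random 4-state, 2-action MDP
    rng = np.random.default_rng(4)
    P = rng.random((4, 2, 4)); P /= P.sum(axis=2, keepdims=True)
    r = rng.random((4, 2))
    V, pi = soft_value_iteration(P, r)
    print(V.round(3)); print(pi.round(3))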

arXiv link: http://arxiv.org/abs/2008.07820v1

Econometrics arXiv paper, submitted: 2020-08-17

Analysing a built-in advantage in asymmetric darts contests using causal machine learning

Authors: Daniel Goller

We analyse a sequential contest with two players in darts where one of the
contestants enjoys a technical advantage. Using methods from the causal machine
learning literature, we analyse the built-in advantage, which is the
first-mover having potentially more but never fewer moves. Our empirical
findings suggest that the technical advantage gives the first-mover an 8.6
percentage point higher probability of winning the match. Contestants
with low performance measures and little experience have the highest built-in
advantage. With regard to the fairness principle that contestants with equal
abilities should have equal winning probabilities, this contest is ex-ante fair
in the case of equal built-in advantages for both competitors and a randomized
starting right. Nevertheless, the contest design produces unequal probabilities
of winning for equally skilled contestants because of asymmetries in the
built-in advantage associated with social pressure for contestants competing at
home and away.

arXiv link: http://arxiv.org/abs/2008.07165v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2020-08-17

To Bag is to Prune

Authors: Philippe Goulet Coulombe

It is notoriously difficult to build a bad Random Forest (RF). Concurrently,
RF blatantly overfits in-sample without any apparent consequence out-of-sample.
Standard arguments, like the classic bias-variance trade-off or double descent,
cannot rationalize this paradox. I propose a new explanation: bootstrap
aggregation and model perturbation as implemented by RF automatically prune a
latent "true" tree. More generally, randomized ensembles of greedily optimized
learners implicitly perform optimal early stopping out-of-sample. So there is
no need to tune the stopping point. By construction, novel variants of Boosting
and MARS are also eligible for automatic tuning. I empirically demonstrate the
property, with simulated and real data, by reporting that these new completely
overfitting ensembles perform similarly to their tuned counterparts -- or
better.

arXiv link: http://arxiv.org/abs/2008.07063v5

Econometrics arXiv paper, submitted: 2020-08-14

Optimal selection of the number of control units in kNN algorithm to estimate average treatment effects

Authors: Andrés Ramírez-Hassan, Raquel Vargas-Correa, Gustavo García, Daniel Londoño

We propose a simple approach to optimally select the number of control units
in the k nearest neighbors (kNN) algorithm, focusing on minimizing the mean
squared error of the average treatment effects. Our approach is non-parametric;
confidence intervals for the treatment effects are calculated using asymptotic
results with bias correction. Simulation exercises show that our approach
achieves relatively small mean squared errors and a good balance between
confidence interval length and type I error. We analyze the average treatment
effect on the treated (ATET) of participation in 401(k) plans on accumulated
net financial assets, confirming significant effects on both the amount and the
probability of holding positive net assets. Our optimal k selection produces
significantly narrower ATET confidence intervals than the common practice of
using k=1.
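
For context, the following is a minimal sketch of the kNN matching estimator of
the ATET for a generic choice of k, the quantity whose optimal value the paper
proposes to select by minimizing the mean squared error; the MSE-minimizing
selection rule itself is not reproduced here and the toy data are illustrative.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def knn_atet(y, d, X, k=5):
        """ATET via kNN matching: each treated unit's missing counterfactual is
        imputed by the average outcome of its k nearest control units in
        covariate space."""
        Xt, yt = X[d == 1], y[d == 1]
        Xc, yc = X[d == 0], y[d == 0]
        nn = NearestNeighbors(n_neighbors=k).fit(Xc)
        _, idx = nn.kneighbors(Xt)
        y0_hat = yc[idx].mean(axis=1)    # matched counterfactual means
        return (yt - y0_hat).mean()

    # toy usage: true ATET is 1.0
    rng = np.random.default_rng(5)
    n = 4000
    X = rng.normal(size=(n, 3))
    d = (rng.random(n) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)
    y = X @ np.array([0.5, 0.2, -0.3]) + 1.0 * d + rng.normal(size=n)
    for k in (1, 5, 25):
        print(k, round(knn_atet(y, d, X, k=k), 3))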

arXiv link: http://arxiv.org/abs/2008.06564v1

Econometrics arXiv updated paper (originally submitted: 2020-08-14)

Bounding Infection Prevalence by Bounding Selectivity and Accuracy of Tests: With Application to Early COVID-19

Authors: Jörg Stoye

I propose novel partial identification bounds on infection prevalence from
information on test rate and test yield. The approach utilizes user-specified
bounds on (i) test accuracy and (ii) the extent to which tests are targeted,
formalized as restriction on the effect of true infection status on the odds
ratio of getting tested and thereby embeddable in logit specifications. The
motivating application is to the COVID-19 pandemic but the strategy may also be
useful elsewhere.
Evaluated on data from the pandemic's early stage, even the weakest of the
novel bounds are reasonably informative. Notably, and in contrast to
speculations that were widely reported at the time, they place the infection
fatality rate for Italy well above that of influenza by mid-April.

arXiv link: http://arxiv.org/abs/2008.06178v2

Econometrics arXiv updated paper (originally submitted: 2020-08-13)

"Big Data" and its Origins

Authors: Francis X. Diebold

Against the background of explosive growth in data volume, velocity, and
variety, I investigate the origins of the term "Big Data". Its origins are a
bit murky and hence intriguing, involving both academics and industry,
statistics and computer science, ultimately winding back to lunch-table
conversations at Silicon Graphics Inc. (SGI) in the mid 1990s. The Big Data
phenomenon continues unabated, and the ongoing development of statistical
machine learning tools continues to help us confront it.

arXiv link: http://arxiv.org/abs/2008.05835v6

Econometrics arXiv paper, submitted: 2020-08-12

A dynamic ordered logit model with fixed effects

Authors: Chris Muris, Pedro Raposo, Sotiris Vandoros

We study a fixed-$T$ panel data logit model for ordered outcomes that
accommodates fixed effects and state dependence. We provide identification
results for the autoregressive parameter, regression coefficients, and the
threshold parameters in this model. Our results require only four observations
on the outcome variable. We provide conditions under which a composite
conditional maximum likelihood estimator is consistent and asymptotically
normal. We use our estimator to explore the determinants of self-reported
health in a panel of European countries over the period 2003-2016. We find
that: (i) the autoregressive parameter is positive and analogous to a linear
AR(1) coefficient of about 0.25, indicating persistence in health status; (ii)
the association between income and health becomes insignificant once we control
for unobserved heterogeneity and persistence.

arXiv link: http://arxiv.org/abs/2008.05517v1

Econometrics arXiv updated paper (originally submitted: 2020-08-12)

Identification of Time-Varying Transformation Models with Fixed Effects, with an Application to Unobserved Heterogeneity in Resource Shares

Authors: Irene Botosaru, Chris Muris, Krishna Pendakur

We provide new results showing identification of a large class of fixed-T
panel models, where the response variable is an unknown, weakly monotone,
time-varying transformation of a latent linear index of fixed effects,
regressors, and an error term drawn from an unknown stationary distribution.
Our results identify the transformation, the coefficient on regressors, and
features of the distribution of the fixed effects. We then develop a
full-commitment intertemporal collective household model, where the implied
quantity demand equations are time-varying functions of a linear index. The
fixed effects in this index equal logged resource shares, defined as the
fractions of household expenditure enjoyed by each household member. Using
Bangladeshi data, we show that women's resource shares decline with household
budgets and that half of the variation in women's resource shares is due to
unobserved household-level heterogeneity.

arXiv link: http://arxiv.org/abs/2008.05507v2

Econometrics arXiv paper, submitted: 2020-08-11

Convergence rate of estimators of clustered panel models with misclassification

Authors: Andreas Dzemski, Ryo Okui

We study k-means clustering estimation of panel data models with a latent
group structure, $N$ units, and $T$ time periods under long panel
asymptotics. We show that the group-specific coefficients can be estimated at
the parametric root-$NT$ rate even if error variances diverge as $T \to \infty$
and some units are asymptotically misclassified. This limit case approximates
empirically relevant settings and is not covered by existing asymptotic
results.

arXiv link: http://arxiv.org/abs/2008.04708v1

Econometrics arXiv updated paper (originally submitted: 2020-08-10)

Nonparametric prediction with spatial data

Authors: Abhimanyu Gupta, Javier Hidalgo

We describe a (nonparametric) prediction algorithm for spatial data, based on
a canonical factorization of the spectral density function. We provide
theoretical results showing that the predictor has desirable asymptotic
properties. Finite sample performance is assessed in a Monte Carlo study that
also compares our algorithm to a rival nonparametric method based on the
infinite AR representation of the dynamics of the data. Finally, we apply our
methodology to predict house prices in Los Angeles.

arXiv link: http://arxiv.org/abs/2008.04269v2

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2020-08-10

Decision Conflict and Deferral in A Class of Logit Models with a Context-Dependent Outside Option

Authors: Georgios Gerasimou

Decision makers often opt for the deferral outside option when they find it
difficult to make an active choice. Contrary to existing logit models with an
outside option where the latter is assigned a fixed value exogenously, this
paper introduces and analyzes a class of logit models where that option's value
is menu-dependent, may be determined endogenously, and could be interpreted as
proxying the varying degree of decision difficulty at different menus. We focus
on the *power logit* special class of these models. We show that these predict
some observed choice-deferral effects that are caused by hard decisions,
including non-monotonic "roller-coaster" choice-overload phenomena that are
regulated by the presence or absence of a clearly dominant feasible
alternative. We illustrate the usability, novel insights and explanatory gains
of the proposed framework for empirical discrete choice analysis and
theoretical modelling of imperfectly competitive markets in the presence of
potentially indecisive consumers.

arXiv link: http://arxiv.org/abs/2008.04229v10

Econometrics arXiv updated paper (originally submitted: 2020-08-08)

Machine Learning Panel Data Regressions with Heavy-tailed Dependent Data: Theory and Application

Authors: Andrii Babii, Ryan T. Ball, Eric Ghysels, Jonas Striaukas

The paper introduces structured machine learning regressions for heavy-tailed
dependent panel data potentially sampled at different frequencies. We focus on
the sparse-group LASSO regularization. This type of regularization can take
advantage of the mixed frequency time series panel data structures and improve
the quality of the estimates. We obtain oracle inequalities for the pooled and
fixed effects sparse-group LASSO panel data estimators recognizing that
financial and economic data can have fat tails. To that end, we leverage a
new Fuk-Nagaev concentration inequality for panel data consisting of
heavy-tailed $\tau$-mixing processes.
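
As background on the regularizer used above, the following sketches the
proximal operator at the heart of sparse-group LASSO computation (elementwise
soft-thresholding followed by groupwise shrinkage, in the spirit of Simon et
al., 2013). It is a generic building block, not an implementation of the
paper's panel estimators or its Fuk-Nagaev-based theory, and the toy inputs are
illustrative.

    import numpy as np

    def sparse_group_prox(z, groups, lam, alpha=0.5):
        """Proximal operator of lam * [alpha * ||b||_1 +
        (1 - alpha) * sum_g sqrt(p_g) * ||b_g||_2] evaluated at z.
        groups: list of index arrays defining non-overlapping groups."""
        out = np.zeros_like(z)
        for g in groups:
            v = np.sign(z[g]) * np.maximum(np.abs(z[g]) - lam * alpha, 0.0)  # soft-threshold
            pen = lam * (1 - alpha) * np.sqrt(len(g))
            norm = np.linalg.norm(v)
            if norm > pen:               # group survives; shrink it toward zero
                out[g] = (1 - pen / norm) * v
        return out

    # toy usage: two groups, only the first carries signal
    z = np.array([2.0, -1.5, 0.1, 0.05, 0.02, -0.03])
    groups = [np.arange(0, 3), np.arange(3, 6)]
    print(sparse_group_prox(z, groups, lam=0.3))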

arXiv link: http://arxiv.org/abs/2008.03600v2

Econometrics arXiv paper, submitted: 2020-08-06

An Upper Bound for Functions of Estimators in High Dimensions

Authors: Mehmet Caner, Xu Han

We provide an upper bound, in the form of a random variable, for functions of
estimators in high dimensions. This upper bound may help establish the rate of
convergence of functions in high dimensions. The upper bound random variable
may converge faster, slower, or at the same rate as estimators depending on the
behavior of the partial derivative of the function. We illustrate this via
three examples. The first two examples use the upper bound for testing in high
dimensions, and the third example derives the estimated out-of-sample variance of
large portfolios. All our results allow for a larger number of parameters, p,
than the sample size, n.

arXiv link: http://arxiv.org/abs/2008.02636v1

Econometrics arXiv updated paper (originally submitted: 2020-08-05)

On the Size Control of the Hybrid Test for Predictive Ability

Authors: Deborah Kim

We analyze theoretical properties of the hybrid test for superior
predictability. We demonstrate with a simple example that the test may not be
pointwise asymptotically of level $\alpha$ at commonly used significance levels
and may lead to rejection rates over 11% when the significance level
$\alpha$ is 5%. Generalizing this observation, we provide a formal result
that pointwise asymptotic invalidity of the hybrid test persists in a setting
under reasonable conditions. As an easy alternative, we propose a modified
hybrid test based on the generalized moment selection method and show that the
modified test enjoys pointwise asymptotic validity. Monte Carlo simulations
support the theoretical findings.

arXiv link: http://arxiv.org/abs/2008.02318v2

Econometrics arXiv updated paper (originally submitted: 2020-08-04)

Macroeconomic Data Transformations Matter

Authors: Philippe Goulet Coulombe, Maxime Leroux, Dalibor Stevanovic, Stéphane Surprenant

In a low-dimensional linear regression setup, considering linear
transformations/combinations of predictors does not alter predictions. However,
when the forecasting technology either uses shrinkage or is nonlinear, it does.
This is precisely the fabric of the machine learning (ML) macroeconomic
forecasting environment. Pre-processing of the data translates to an alteration
of the regularization -- explicit or implicit -- embedded in ML algorithms. We
review old transformations and propose new ones, then empirically evaluate
their merits in a substantial pseudo-out-of-sample exercise. It is found that
traditional factors should almost always be included as predictors and moving
average rotations of the data can provide important gains for various
forecasting targets. Also, we note that while predicting directly the average
growth rate is equivalent to averaging separate horizon forecasts when using
OLS-based techniques, the latter can substantially improve on the former when
regularization and/or nonparametric nonlinearities are involved.

arXiv link: http://arxiv.org/abs/2008.01714v2

Econometrics arXiv paper, submitted: 2020-08-03

Testing error distribution by kernelized Stein discrepancy in multivariate time series models

Authors: Donghang Luo, Ke Zhu, Huan Gong, Dong Li

Knowing the error distribution is important in many multivariate time series
applications. To alleviate the risk of error distribution mis-specification,
testing methodologies are needed to detect whether the chosen error
distribution is correct. However, the majority of the existing tests only deal
with the multivariate normal distribution for some special multivariate time
series models, and thus cannot be used to test for the heavy-tailed and skewed
error distributions often observed in applications. In this paper, we
construct a new consistent test for general multivariate time series models,
based on the kernelized Stein discrepancy. To account for the estimation
uncertainty and unobserved initial values, a bootstrap method is provided to
calculate the critical values. Our new test is easy to implement for a wide
range of multivariate error distributions, and its importance is illustrated by
simulated and real data.
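
To fix ideas, the following sketches the kernelized Stein discrepancy for the
simple case of a hypothesized multivariate standard normal error distribution
with an RBF kernel. The paper's actual test additionally handles estimated
parameters, unobserved initial values, and bootstrap critical values, none of
which is reproduced here; the bandwidth and sample sizes are illustrative.

    import numpy as np

    def ksd_gaussian(X, bandwidth=1.0):
        """V-statistic estimate of the squared kernelized Stein discrepancy between
        the empirical distribution of X and N(0, I), using the RBF kernel
        k(x, y) = exp(-||x - y||^2 / (2 h^2)); the score of N(0, I) is s(x) = -x."""
        n, d = X.shape
        h2 = bandwidth ** 2
        S = -X                                               # score at each sample point
        diff = X[:, None, :] - X[None, :, :]                 # x_i - x_j
        sq = (diff ** 2).sum(-1)
        K = np.exp(-sq / (2 * h2))
        term1 = (S @ S.T) * K                                # s(x_i)'s(x_j) k
        term2 = (S[:, None, :] * (diff / h2)).sum(-1) * K    # s(x_i)' grad_y k
        term3 = (S[None, :, :] * (-diff / h2)).sum(-1) * K   # s(x_j)' grad_x k
        term4 = (d / h2 - sq / h2 ** 2) * K                  # trace(grad_x grad_y k)
        return (term1 + term2 + term3 + term4).mean()

    rng = np.random.default_rng(6)
    print(ksd_gaussian(rng.normal(size=(300, 2))))            # small under H0
    print(ksd_gaussian(rng.standard_t(df=3, size=(300, 2))))  # larger under heavy tails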

arXiv link: http://arxiv.org/abs/2008.00747v1

Econometrics arXiv paper, submitted: 2020-08-03

Estimating TVP-VAR models with time invariant long-run multipliers

Authors: Denis Belomestny, Ekaterina Krymova, Andrey Polbin

The main goal of this paper is to develop a methodology for estimating
time-varying parameter vector autoregression (TVP-VAR) models with a
time-invariant long-run relationship between endogenous variables and changes
in exogenous variables. We propose a Gibbs sampling scheme for estimation of
model parameters as well as time-invariant long-run multiplier parameters.
Further, we
demonstrate the applicability of the proposed method by analyzing examples of
the Norwegian and Russian economies based on the data on real GDP, real
exchange rate, and real oil prices. Our results show that incorporating the
time-invariance constraint on the long-run multipliers in the TVP-VAR model
significantly improves the forecasting performance.

arXiv link: http://arxiv.org/abs/2008.00718v1

Econometrics arXiv paper, submitted: 2020-08-03

A spatial multinomial logit model for analysing urban expansion

Authors: Tamás Krisztin, Philipp Piribauer, Michael Wögerer

The paper proposes a Bayesian multinomial logit model to analyse spatial
patterns of urban expansion. The specification assumes that the log-odds of
each class follow a spatial autoregressive process. Using recent advances in
Bayesian computing, our model allows for a computationally efficient treatment
of the spatial multinomial logit model. This allows us to assess spillovers
between regions and across land use classes. In a series of Monte Carlo
studies, we benchmark our model against other competing specifications. The
paper also showcases the performance of the proposed specification using
European regional data. Our results indicate that spatial dependence plays a
key role in the land sealing process of cropland and grassland. Moreover, we
uncover land sealing spillovers across multiple classes of arable land.

arXiv link: http://arxiv.org/abs/2008.00673v1

Econometrics arXiv updated paper (originally submitted: 2020-08-03)

Design-Based Uncertainty for Quasi-Experiments

Authors: Ashesh Rambachan, Jonathan Roth

Design-based frameworks of uncertainty are frequently used in settings where
the treatment is (conditionally) randomly assigned. This paper develops a
design-based framework suitable for analyzing quasi-experimental settings in
the social sciences, in which the treatment assignment can be viewed as the
realization of some stochastic process but there is concern about unobserved
selection into treatment. In our framework, treatments are stochastic, but
units may differ in their probabilities of receiving treatment, thereby
allowing for rich forms of selection. We provide conditions under which the
estimands of popular quasi-experimental estimators correspond to interpretable
finite-population causal parameters. We characterize the biases and distortions
to inference that arise when these conditions are violated. These results can
be used to conduct sensitivity analyses when there are concerns about selection
into treatment. Taken together, our results establish a rigorous foundation for
quasi-experimental analyses that more closely aligns with the way empirical
researchers discuss the variation in the data.

arXiv link: http://arxiv.org/abs/2008.00602v8

Econometrics arXiv updated paper (originally submitted: 2020-08-01)

What can we learn about SARS-CoV-2 prevalence from testing and hospital data?

Authors: Daniel W. Sacks, Nir Menachemi, Peter Embi, Coady Wing

Measuring the prevalence of active SARS-CoV-2 infections in the general
population is difficult because tests are conducted on a small and non-random
segment of the population. However, people admitted to the hospital for
non-COVID reasons are tested at very high rates, even though they do not appear
to be at elevated risk of infection. This sub-population may provide valuable
evidence on prevalence in the general population. We estimate upper and lower
bounds on the prevalence of the virus in the general population and the
population of non-COVID hospital patients under weak assumptions on who gets
tested, using Indiana data on hospital inpatient records linked to SARS-CoV-2
virological tests. The non-COVID hospital population is tested fifty times as
often as the general population, yielding much tighter bounds on prevalence. We
provide and test conditions under which this non-COVID hospitalization bound is
valid for the general population. The combination of clinical testing data and
hospital records may contain much more information about the state of the
epidemic than has been previously appreciated. The bounds we calculate for
Indiana could be constructed at relatively low cost in many other states.

arXiv link: http://arxiv.org/abs/2008.00298v2

Econometrics arXiv paper, submitted: 2020-08-01

Simpler Proofs for Approximate Factor Models of Large Dimensions

Authors: Jushan Bai, Serena Ng

Estimates of the approximate factor model are increasingly used in empirical
work. Their theoretical properties, studied some twenty years ago, also laid
the groundwork for the analysis of large-dimensional panel data models with
cross-section dependence. This paper presents simplified proofs for the
estimates by using alternative rotation matrices, exploiting properties of low
rank matrices, as well as the singular value decomposition of the data in
addition to its covariance structure. These simplifications facilitate
interpretation of results and provide a more friendly introduction to
researchers new to the field. New results are provided to allow linear
restrictions to be imposed on factor models.
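
As a reminder of the object these proofs concern, the following is a minimal
sketch of the principal components estimator of an approximate factor model
computed directly from the singular value decomposition of the data matrix,
under the usual normalization F'F/T = I. The rotation-matrix arguments and the
linear-restriction results of the paper are not reproduced here.

    import numpy as np

    def pc_factors(X, r):
        """Principal components estimates of X = F Lambda' + e from the SVD of the
        (T x N) data matrix, normalized so that F_hat'F_hat / T = I_r."""
        T = X.shape[0]
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        F_hat = np.sqrt(T) * U[:, :r]
        L_hat = (Vt[:r].T * s[:r]) / np.sqrt(T)
        return F_hat, L_hat

    # toy usage: two factors, T = 200 periods, N = 100 series
    rng = np.random.default_rng(7)
    T, N, r = 200, 100, 2
    F, L = rng.normal(size=(T, r)), rng.normal(size=(N, r))
    X = F @ L.T + rng.normal(size=(T, N))
    F_hat, L_hat = pc_factors(X, r)
    print(np.allclose(F_hat.T @ F_hat / T, np.eye(r)))     # normalization holds
    # estimated factors roughly span the true factor space, up to rotation
    resid = F_hat - F @ np.linalg.lstsq(F, F_hat, rcond=None)[0]
    print(np.linalg.norm(resid) / np.linalg.norm(F_hat))   # small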

arXiv link: http://arxiv.org/abs/2008.00254v1

Econometrics arXiv paper, submitted: 2020-07-30

Measuring the Effectiveness of US Monetary Policy during the COVID-19 Recession

Authors: Martin Feldkircher, Florian Huber, Michael Pfarrhofer

The COVID-19 recession that started in March 2020 led to an unprecedented
decline in economic activity across the globe. To fight this recession, policy
makers in central banks engaged in expansionary monetary policy. This paper
asks whether the measures adopted by the US Federal Reserve (Fed) have been
effective in boosting real activity and calming financial markets. To measure
these effects at high frequencies, we propose a novel mixed frequency vector
autoregressive (MF-VAR) model. This model allows us to combine weekly and
monthly information within a unified framework. Our model combines a set of
macroeconomic aggregates such as industrial production, unemployment rates and
inflation with high frequency information from financial markets such as stock
prices, interest rate spreads and weekly information on the Fed's balance sheet
size. The latter set of high frequency time series is used to dynamically
interpolate the monthly time series to obtain weekly macroeconomic measures. We
use this setup to simulate counterfactuals in absence of monetary stimulus. The
results show that the monetary expansion caused higher output growth and stock
market returns, more favorable long-term financing conditions and a
depreciation of the US dollar compared to a no-policy benchmark scenario.

arXiv link: http://arxiv.org/abs/2007.15419v1

Econometrics arXiv updated paper (originally submitted: 2020-07-27)

Local Projection Inference is Simpler and More Robust Than You Think

Authors: José Luis Montiel Olea, Mikkel Plagborg-Møller

Applied macroeconomists often compute confidence intervals for impulse
responses using local projections, i.e., direct linear regressions of future
outcomes on current covariates. This paper proves that local projection
inference robustly handles two issues that commonly arise in applications:
highly persistent data and the estimation of impulse responses at long
horizons. We consider local projections that control for lags of the variables
in the regression. We show that lag-augmented local projections with normal
critical values are asymptotically valid uniformly over (i) both stationary and
non-stationary data, and also over (ii) a wide range of response horizons.
Moreover, lag augmentation obviates the need to correct standard errors for
serial correlation in the regression residuals. Hence, local projection
inference is arguably both simpler than previously thought and more robust than
standard autoregressive inference, whose validity is known to depend
sensitively on the persistence of the data and on the length of the horizon.
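
A minimal sketch of the procedure described above, assuming a univariate series
and an illustrative horizon and lag length: regress y_{t+h} on y_t plus extra
lags and use plain heteroskedasticity-robust (non-HAC) standard errors. The
paper's general multivariate setting and formal uniformity results are not
reproduced here.

    import numpy as np
    import statsmodels.api as sm

    def lag_augmented_lp(y, h, p=1):
        """Local projection of y_{t+h} on y_t, augmented with p extra lags of y.
        With lag augmentation, plain heteroskedasticity-robust (HC) standard
        errors are used; no HAC correction of the residual serial correlation."""
        T = len(y)
        rows = range(p, T - h)
        Y = np.array([y[t + h] for t in rows])
        # regressors: y_t, y_{t-1}, ..., y_{t-p}
        Z = np.array([[y[t - j] for j in range(p + 1)] for t in rows])
        res = sm.OLS(Y, sm.add_constant(Z)).fit(cov_type="HC1")
        return res.params[1], res.bse[1]   # impulse response at horizon h and its s.e.

    # toy usage: persistent AR(1); true response at h = 4 is 0.9**4, about 0.656
    rng = np.random.default_rng(8)
    T = 500
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = 0.9 * y[t - 1] + rng.normal()
    print(lag_augmented_lp(y, h=4, p=1))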

arXiv link: http://arxiv.org/abs/2007.13888v3

Econometrics arXiv updated paper (originally submitted: 2020-07-27)

The Spectral Approach to Linear Rational Expectations Models

Authors: Majid M. Al-Sadoon

This paper considers linear rational expectations models in the frequency
domain. The paper characterizes existence and uniqueness of solutions to
particular as well as generic systems. The set of all solutions to a given
system is shown to be a finite dimensional affine space in the frequency
domain. It is demonstrated that solutions can be discontinuous with respect to
the parameters of the models in the context of non-uniqueness, invalidating
mainstream frequentist and Bayesian methods. The ill-posedness of the problem
motivates regularized solutions with theoretically guaranteed uniqueness,
continuity, and even differentiability properties.

arXiv link: http://arxiv.org/abs/2007.13804v6

Econometrics arXiv updated paper (originally submitted: 2020-07-27)

Unconditional Quantile Regression with High Dimensional Data

Authors: Yuya Sasaki, Takuya Ura, Yichong Zhang

This paper considers estimation and inference for heterogeneous
counterfactual effects with high-dimensional data. We propose a novel robust
score for debiased estimation of the unconditional quantile regression (Firpo,
Fortin, and Lemieux, 2009) as a measure of heterogeneous counterfactual
marginal effects. We propose multiplier bootstrap inference and develop
asymptotic theory to guarantee size control in large samples. Simulation
studies support our theory. Applying the proposed method to Job Corps survey
data, we find that a policy which counterfactually extends the duration of
exposure to the Job Corps training program will be effective, especially for
the targeted subpopulations of lower potential wage earners.
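
For context, the following sketches the baseline low-dimensional unconditional
quantile regression of Firpo, Fortin, and Lemieux (2009): regress the
recentered influence function (RIF) of a quantile on covariates by OLS. The
paper's debiased high-dimensional score and multiplier bootstrap are not
reproduced here, and the toy design is illustrative.

    import numpy as np
    from scipy.stats import gaussian_kde
    import statsmodels.api as sm

    def rif_quantile_regression(y, X, tau=0.5):
        """Unconditional quantile regression via the recentered influence function:
        RIF(y; q_tau) = q_tau + (tau - 1{y <= q_tau}) / f_Y(q_tau),
        regressed on covariates by OLS."""
        q = np.quantile(y, tau)
        f_q = gaussian_kde(y)(q)[0]                  # density of y at the quantile
        rif = q + (tau - (y <= q).astype(float)) / f_q
        res = sm.OLS(rif, sm.add_constant(X)).fit(cov_type="HC1")
        return res.params[1:], res.bse[1:]

    # toy usage: location-shift model, so effects are roughly (0.8, -0.4) at any tau
    rng = np.random.default_rng(9)
    n = 3000
    X = rng.normal(size=(n, 2))
    y = 1.0 + 0.8 * X[:, 0] - 0.4 * X[:, 1] + rng.normal(size=n)
    print(rif_quantile_regression(y, X, tau=0.9))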

arXiv link: http://arxiv.org/abs/2007.13659v4

Econometrics arXiv paper, submitted: 2020-07-27

Total Error and Variability Measures for the Quarterly Workforce Indicators and LEHD Origin-Destination Employment Statistics in OnTheMap

Authors: Kevin L. McKinney, Andrew S. Green, Lars Vilhuber, John M. Abowd

We report results from the first comprehensive total quality evaluation of
five major indicators in the U.S. Census Bureau's Longitudinal
Employer-Household Dynamics (LEHD) Program Quarterly Workforce Indicators
(QWI): total flow-employment, beginning-of-quarter employment, full-quarter
employment, average monthly earnings of full-quarter employees, and total
quarterly payroll. Beginning-of-quarter employment is also the main tabulation
variable in the LEHD Origin-Destination Employment Statistics (LODES) workplace
reports as displayed in OnTheMap (OTM), including OnTheMap for Emergency
Management. We account for errors due to coverage; record-level non-response;
edit and imputation of item missing data; and statistical disclosure
limitation. The analysis reveals that the five publication variables under
study are estimated very accurately for tabulations involving at least 10 jobs.
Tabulations involving three to nine jobs are a transition zone, where cells may
be fit for use with caution. Tabulations involving one or two jobs, which are
generally suppressed on fitness-for-use criteria in the QWI and synthesized in
LODES, have substantial total variability but can still be used to estimate
statistics for untabulated aggregates as long as the job count in the aggregate
is more than 10.

arXiv link: http://arxiv.org/abs/2007.13275v1

Econometrics arXiv updated paper (originally submitted: 2020-07-26)

Scalable Bayesian estimation in the multinomial probit model

Authors: Ruben Loaiza-Maya, Didier Nibbering

The multinomial probit model is a popular tool for analyzing choice behaviour
as it allows for correlation between choice alternatives. Because current model
specifications employ a full covariance matrix of the latent utilities for the
choice alternatives, they are not scalable to a large number of choice
alternatives. This paper proposes a factor structure on the covariance matrix,
which makes the model scalable to large choice sets. The main challenge in
estimating this structure is that the model parameters require identifying
restrictions. We identify the parameters by a trace-restriction on the
covariance matrix, which is imposed through a reparametrization of the factor
structure. We specify interpretable prior distributions on the model parameters
and develop an MCMC sampler for parameter estimation. The proposed approach
significantly improves performance in large choice sets relative to existing
multinomial probit specifications. Applications to purchase data show the
economic importance of including a large number of choice alternatives in
consumer choice analysis.

arXiv link: http://arxiv.org/abs/2007.13247v2

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2020-07-25

The role of global economic policy uncertainty in predicting crude oil futures volatility: Evidence from a two-factor GARCH-MIDAS model

Authors: Peng-Fei Dai, Xiong Xiong, Wei-Xing Zhou

This paper aims to examine whether the global economic policy uncertainty
(GEPU) and uncertainty changes have different impacts on crude oil futures
volatility. We establish single-factor and two-factor models under the
GARCH-MIDAS framework to investigate the predictive power of GEPU and GEPU
changes excluding and including realized volatility. The findings show that the
models with rolling-window specification perform better than those with
fixed-span specification. For single-factor models, the GEPU index and its
changes, as well as realized volatility, are consistent effective factors in
predicting the volatility of crude oil futures. In particular, GEPU changes have
stronger predictive power than the GEPU index. For two-factor models, GEPU is
not an effective forecast factor for the volatility of WTI crude oil futures or
Brent crude oil futures. The two-factor model with GEPU changes contains more
information and exhibits stronger forecasting ability for crude oil futures
market volatility than the single-factor models. The GEPU changes are indeed
the main source of long-term volatility of the crude oil futures.

arXiv link: http://arxiv.org/abs/2007.12838v1

Econometrics arXiv paper, submitted: 2020-07-25

Applying Data Synthesis for Longitudinal Business Data across Three Countries

Authors: M. Jahangir Alam, Benoit Dostie, Jörg Drechsler, Lars Vilhuber

Data on businesses collected by statistical agencies are challenging to
protect. Many businesses have unique characteristics, and distributions of
employment, sales, and profits are highly skewed. Attackers wishing to conduct
identification attacks often have access to much more information than they
would for any individual. As a consequence, most disclosure avoidance mechanisms
fail to strike an acceptable balance between usefulness and confidentiality
protection. Detailed aggregate statistics by geography or detailed industry
classes are rare, public-use microdata on businesses are virtually nonexistent,
and access to confidential microdata can be burdensome. Synthetic microdata have
been
proposed as a secure mechanism to publish microdata, as part of a broader
discussion of how to provide broader access to such data sets to researchers.
In this article, we document an experiment to create analytically valid
synthetic data, using the exact same model and methods previously employed for
the United States, for data from two different countries: Canada (LEAP) and
Germany (BHP). We assess utility and protection, and provide an assessment of
the feasibility of extending such an approach in a cost-effective way to other
data.

arXiv link: http://arxiv.org/abs/2008.02246v1

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2020-07-24

Are low frequency macroeconomic variables important for high frequency electricity prices?

Authors: Claudia Foroni, Francesco Ravazzolo, Luca Rossini

Recent research highlights the relevance of forecasting electricity prices.
In many applications, it might be interesting to predict daily electricity
prices by using their own lags or renewable energy sources. However, the recent
turmoil in energy prices and the Russian-Ukrainian war have increased attention
to evaluating the relevance of industrial production and the Purchasing Managers'
Index output survey in forecasting daily electricity prices. We develop a
Bayesian reverse unrestricted MIDAS model which accounts for the mismatch in
frequency between the daily prices and the monthly macro variables in Germany
and Italy. We find that the inclusion of low-frequency macroeconomic variables
is more important at short than at medium-term horizons, in terms of both point
and density measures. In particular, accuracy increases when combining hard and
soft information, while using only surveys gives less accurate forecasts than
using only industrial production data.

arXiv link: http://arxiv.org/abs/2007.13566v2

Econometrics arXiv updated paper (originally submitted: 2020-07-23)

bootUR: An R Package for Bootstrap Unit Root Tests

Authors: Stephan Smeekes, Ines Wilms

Unit root tests form an essential part of any time series analysis. We
provide practitioners with a single, unified framework for comprehensive and
reliable unit root testing in the R package bootUR. The package's backbone is
the popular augmented Dickey-Fuller test paired with a union of rejections
principle, which can be performed directly on single time series or multiple
(including panel) time series. Accurate inference is ensured through the use of
bootstrap methods. The package addresses the needs of both novice users, by
providing user-friendly and easy-to-implement functions with sensible default
options, as well as expert users, by giving full user control to adjust the
tests to one's desired settings. Our parallelized C++ implementation ensures
that all unit root tests are scalable to datasets containing many time series.

arXiv link: http://arxiv.org/abs/2007.12249v5

Econometrics arXiv updated paper (originally submitted: 2020-07-23)

Deep Dynamic Factor Models

Authors: Paolo Andreini, Cosimo Izzo, Giovanni Ricco

A novel deep neural network framework, which we refer to as the Deep Dynamic
Factor Model (D$^2$FM), is able to encode the information available from
hundreds of macroeconomic and financial time series into a handful of
unobserved latent states. While similar in spirit to traditional dynamic factor
models (DFMs), unlike those, this new class of models allows for
nonlinearities between factors and observables due to the autoencoder neural
network structure. However, by design, the latent states of the model can still
be interpreted as in a standard factor model. Both in a fully real-time
out-of-sample nowcasting and forecasting exercise with US data and in a Monte
Carlo experiment, the D$^2$FM improves on the performance of a
state-of-the-art DFM.

arXiv link: http://arxiv.org/abs/2007.11887v2

Econometrics arXiv paper, submitted: 2020-07-22

The Mode Treatment Effect

Authors: Neng-Chieh Chang

Mean, median, and mode are three essential measures of the centrality of
probability distributions. In program evaluation, the average treatment effect
(mean) and the quantile treatment effect (median) have been intensively studied
in the past decades. The mode treatment effect, however, has long been
neglected in program evaluation. This paper fills the gap by discussing both
the estimation and inference of the mode treatment effect. I propose both
traditional kernel and machine learning methods to estimate the mode treatment
effect. I also derive the asymptotic properties of the proposed estimators and
find that both estimators are asymptotically normal, but with a rate of
convergence slower than the regular $\sqrt{N}$ rate, which is different from
the rates of the classical average and quantile treatment effect estimators.
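
As a simple illustration of the estimand, the following sketches the difference
in kernel-estimated modes between treated and control outcomes in a randomized
setting. It is a naive plug-in illustration, not the paper's proposed kernel or
machine learning estimators, and the grid and default bandwidth choices are
arbitrary.

    import numpy as np
    from scipy.stats import gaussian_kde

    def kernel_mode(y, grid_size=400):
        """Mode of y estimated as the maximizer of a Gaussian kernel density
        estimate over a grid spanning the sample range."""
        grid = np.linspace(y.min(), y.max(), grid_size)
        return grid[np.argmax(gaussian_kde(y)(grid))]

    def mode_difference(y, d):
        """Difference in estimated modes between treated and control arms; with
        random assignment this targets Mode(Y(1)) - Mode(Y(0))."""
        return kernel_mode(y[d == 1]) - kernel_mode(y[d == 0])

    # toy usage: skewed outcomes, treatment shifts the mode by about 1
    rng = np.random.default_rng(10)
    n = 4000
    d = rng.integers(0, 2, n)
    y = rng.lognormal(mean=0.0, sigma=0.6, size=n) + 1.0 * d
    print(round(mode_difference(y, d), 3))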

arXiv link: http://arxiv.org/abs/2007.11606v1

Econometrics arXiv updated paper (originally submitted: 2020-07-21)

Lasso Inference for High-Dimensional Time Series

Authors: Robert Adamek, Stephan Smeekes, Ines Wilms

In this paper we develop valid inference for high-dimensional time series. We
extend the desparsified lasso to a time series setting under Near-Epoch
Dependence (NED) assumptions allowing for non-Gaussian, serially correlated and
heteroskedastic processes, where the number of regressors can possibly grow
faster than the time dimension. We first derive an error bound under weak
sparsity, which, coupled with the NED assumption, means this inequality can
also be applied to the (inherently misspecified) nodewise regressions performed
in the desparsified lasso. This allows us to establish the uniform asymptotic
normality of the desparsified lasso under general conditions, including for
inference on parameters of increasing dimensions. Additionally, we show
consistency of a long-run variance estimator, thus providing a complete set of
tools for performing inference in high-dimensional linear time series models.
Finally, we perform a simulation exercise to demonstrate the small sample
properties of the desparsified lasso in common time series settings.

arXiv link: http://arxiv.org/abs/2007.10952v6
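
For readers unfamiliar with the desparsified lasso, the following cross-sectional sketch shows the basic construction: a lasso fit, nodewise lasso regressions to approximate the inverse Gram matrix, and a one-step bias correction. The time-series inference developed in the paper additionally requires a long-run (HAC-type) variance estimator, which is omitted here; tuning by 5-fold cross-validation and pre-centered data are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def desparsified_lasso(X, y):
    """Debiased (desparsified) lasso point estimates.
    Assumes X and y are centered/standardized; inference would additionally
    require a (long-run) variance estimator, not implemented here."""
    n, p = X.shape
    beta = LassoCV(cv=5, fit_intercept=False).fit(X, y).coef_

    # Nodewise regressions: approximate inverse of the Gram matrix row by row.
    Theta = np.zeros((p, p))
    for j in range(p):
        Xj, X_j = X[:, j], np.delete(X, j, axis=1)
        node = LassoCV(cv=5, fit_intercept=False).fit(X_j, Xj)
        resid = Xj - node.predict(X_j)
        tau2 = resid @ Xj / n                     # common nodewise normalization
        row = np.insert(-node.coef_, j, 1.0)
        Theta[j] = row / tau2

    # One-step bias correction of the lasso estimate.
    return beta + Theta @ X.T @ (y - X @ beta) / n
```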

Econometrics arXiv cross-link from General Economics (econ.GN), submitted: 2020-07-21

The impact of economic policy uncertainties on the volatility of European carbon market

Authors: Peng-Fei Dai, Xiong Xiong, Toan Luu Duc Huynh, Jiqiang Wang

The European Union Emission Trading Scheme is a carbon emission allowance
trading system designed by the European Union to achieve emission reduction targets. The
amount of carbon emission caused by production activities is closely related to
the socio-economic environment. Therefore, from the perspective of economic
policy uncertainty, this article constructs the GARCH-MIDAS-EUEPU and
GARCH-MIDAS-GEPU models for investigating the impact of European and global
economic policy uncertainty on carbon price fluctuations. The results show that
both European and global economic policy uncertainty exacerbate the long-term
volatility of European carbon spot returns, with the latter having a stronger
impact for a change of the same size. Moreover, the volatility of European
carbon spot returns is forecasted better by the global economic policy
uncertainty predictor. This research can provide some implications for
market managers in grasping carbon market trends and helping participants
control the risk of fluctuations in carbon allowances.

arXiv link: http://arxiv.org/abs/2007.10564v2

Econometrics arXiv updated paper (originally submitted: 2020-07-20)

Treatment Effects with Targeting Instruments

Authors: Sokbae Lee, Bernard Salanié

Multivalued treatments are commonplace in applications. We explore the use of
discrete-valued instruments to control for selection bias in this setting. Our
discussion revolves around the concept of targeting: which instruments target
which treatments. It allows us to establish conditions under which
counterfactual averages and treatment effects are point- or
partially-identified for composite complier groups. We illustrate the
usefulness of our framework by applying it to data from the Head Start Impact
Study. Under a plausible positive selection assumption, we derive informative
bounds that suggest less beneficial effects of Head Start expansions than the
parametric estimates of Kline and Walters (2016).

arXiv link: http://arxiv.org/abs/2007.10432v5

Econometrics arXiv paper, submitted: 2020-07-20

Variable Selection in Macroeconomic Forecasting with Many Predictors

Authors: Zhenzhong Wang, Zhengyuan Zhu, Cindy Yu

In the data-rich environment, using many economic predictors to forecast a
few key variables has become a new trend in econometrics. The commonly used
approach is the factor augmentation (FA) approach. In this paper, we pursue
another direction, the variable selection (VS) approach, to handle
high-dimensional predictors. VS is an active topic in statistics and computer
science, but it has not received as much attention as FA in economics. This
paper introduces several cutting-edge VS methods to economic forecasting, including: (1)
classical greedy procedures; (2) l1 regularization; (3) gradient descent with
sparsification and (4) meta-heuristic algorithms. Comprehensive simulation
studies are conducted to compare their variable selection accuracy and
prediction performance under different scenarios. Among the reviewed methods, a
meta-heuristic algorithm called sequential Monte Carlo algorithm performs the
best. Surprisingly, classical forward selection is comparable to it and
better than other more sophisticated algorithms. In addition, we apply these VS
methods to economic forecasting and compare them with the popular FA approach. It
turns out that for the employment rate and CPI inflation, some VS methods can achieve
considerable improvement over FA, and the selected predictors can be well
explained by economic theories.

arXiv link: http://arxiv.org/abs/2007.10160v1
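
Since the paper highlights classical forward selection as a surprisingly strong baseline, here is a minimal BIC-based forward selection sketch for a forecasting regression. The stopping rule and the cap on the number of selected predictors are illustrative; the paper's comparison also covers l1 regularization, gradient-based sparsification, and meta-heuristics.

```python
import numpy as np

def forward_selection(X, y, max_vars=10):
    """Greedy forward selection of predictors by BIC (OLS refit at each step)."""
    n, p = X.shape
    selected, best_bic = [], np.inf
    while len(selected) < max_vars:
        candidates = [j for j in range(p) if j not in selected]
        bics = []
        for j in candidates:
            Z = np.column_stack([np.ones(n), X[:, selected + [j]]])
            beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
            rss = np.sum((y - Z @ beta) ** 2)
            bics.append(n * np.log(rss / n) + np.log(n) * Z.shape[1])
        if min(bics) >= best_bic:          # stop when BIC no longer improves
            break
        best_bic = min(bics)
        selected.append(candidates[int(np.argmin(bics))])
    return selected
```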

Econometrics arXiv updated paper (originally submitted: 2020-07-20)

Permutation-based tests for discontinuities in event studies

Authors: Federico A. Bugni, Jia Li, Qiyuan Li

We propose using a permutation test to detect discontinuities in an
underlying economic model at a known cutoff point. Relative to the existing
literature, we show that this test is well suited for event studies based on
time-series data. The test statistic measures the distance between the
empirical distribution functions of observed data in two local subsamples on
the two sides of the cutoff. Critical values are computed via a standard
permutation algorithm. Under a high-level condition that the observed data can
be coupled by a collection of conditionally independent variables, we establish
the asymptotic validity of the permutation test, allowing the sizes of the
local subsamples to either be fixed or grow to infinity. In the latter case,
we also establish that the permutation test is consistent. We demonstrate that
our high-level condition can be verified in a broad range of problems in the
infill asymptotic time-series setting, which justifies using the permutation
test to detect jumps in economic variables such as volatility, trading
activity, and liquidity. These potential applications are illustrated in an
empirical case study for selected FOMC announcements during the ongoing
COVID-19 pandemic.

arXiv link: http://arxiv.org/abs/2007.09837v4
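
A stripped-down version of the testing idea: compute a Cramér-von Mises-type distance between the empirical distribution functions of the observations just before and just after the cutoff, and calibrate it by permuting the pooled local subsamples. The statistic and permutation scheme below follow this general recipe but are not the paper's exact construction.

```python
import numpy as np

def permutation_discontinuity_test(left, right, n_perm=999, seed=0):
    """CvM-type distance between the EDFs of two local subsamples around a cutoff,
    with a permutation p-value. `left`/`right`: data just before/after the cutoff."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([left, right])
    grid = np.sort(pooled)

    def cvm(a, b):
        # squared distance between the two empirical distribution functions
        Fa = np.searchsorted(np.sort(a), grid, side="right") / len(a)
        Fb = np.searchsorted(np.sort(b), grid, side="right") / len(b)
        return np.mean((Fa - Fb) ** 2)

    stat = cvm(left, right)
    perm_stats = np.empty(n_perm)
    for b in range(n_perm):
        idx = rng.permutation(len(pooled))
        perm_stats[b] = cvm(pooled[idx[:len(left)]], pooled[idx[len(left):]])
    pval = (1 + np.sum(perm_stats >= stat)) / (1 + n_perm)
    return stat, pval
```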

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2020-07-17

How Flexible is that Functional Form? Quantifying the Restrictiveness of Theories

Authors: Drew Fudenberg, Wayne Gao, Annie Liang

We propose a restrictiveness measure for economic models based on how well
they fit synthetic data from a pre-defined class. This measure, together with a
measure for how well the model fits real data, outlines a Pareto frontier,
where models that rule out more regularities, yet capture the regularities that
are present in real data, are preferred. To illustrate our approach, we
evaluate the restrictiveness of popular models in two laboratory settings --
certainty equivalents and initial play -- and in one field setting -- takeup of
microfinance in Indian villages. The restrictiveness measure reveals new
insights about each of the models, including that some economic models with
only a few parameters are very flexible.

arXiv link: http://arxiv.org/abs/2007.09213v4

Econometrics arXiv cross-link from math.OC (math.OC), submitted: 2020-07-17

Tractable Profit Maximization over Multiple Attributes under Discrete Choice Models

Authors: Hongzhang Shao, Anton J. Kleywegt

A fundamental problem in revenue management is to optimally choose the
attributes of products, such that the total profit or revenue or market share
is maximized. Usually, these attributes can affect both a product's market
share (probability to be chosen) and its profit margin. For example, if a smart
phone has a better battery, then it is more costly to be produced, but is more
likely to be purchased by a customer. The decision maker then needs to choose
an optimal vector of attributes for each product that balances this trade-off.
In spite of the importance of such problems, there is not yet a method to solve
them efficiently in general. Past literature in revenue management and discrete
choice models focuses on pricing problems, where price is the only attribute to
be chosen for each product. Existing approaches to solve pricing problems
tractably cannot be generalized to the optimization problem with multiple
product attributes as decision variables. On the other hand, papers studying
product line design with multiple attributes all result in intractable
optimization problems. In contrast, we show how to reformulate the static
multi-attribute optimization problem, as well as the multi-stage fluid
optimization problem with both resource constraints and upper and lower bounds
of attributes, as a tractable convex conic optimization problem. Our result
applies to optimization problems under the multinomial logit (MNL) model, the
Markov chain (MC) choice model, and with certain conditions, the nested logit
(NL) model.

arXiv link: http://arxiv.org/abs/2007.09193v3

Econometrics arXiv updated paper (originally submitted: 2020-07-16)

Government spending and multi-category treatment effects: The modified conditional independence assumption

Authors: Koiti Yano

I devise a novel approach to evaluate the effectiveness of fiscal policy in
the short run with multi-category treatment effects and inverse probability
weighting based on the potential outcome framework. This study's main
contribution to the literature is the proposed modified conditional
independence assumption to improve the evaluation of fiscal policy. Using this
approach, I analyze the effects of government spending on the US economy from
1992 to 2019. The empirical study indicates that large fiscal contractions
generate a negative effect on the economic growth rate, while small and large
fiscal expansions realize a positive effect. However, these effects are not
significant in the traditional multiple regression approach. I conclude that
this new approach significantly improves the evaluation of fiscal policy.

arXiv link: http://arxiv.org/abs/2007.08396v3
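
To illustrate the inverse-probability-weighting building block for multi-category treatments (e.g. large contraction, small expansion, large expansion), the sketch below fits multinomial propensity scores and forms Hajek-normalized weighted outcome means for each treatment level. It assumes a (modified) conditional independence assumption holds given the covariates x; the logistic propensity model and the absence of trimming are illustrative simplifications, not the paper's specification.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def multicategory_ipw(y, t, x):
    """IPW means of the outcome for each treatment category, given covariates x."""
    model = LogisticRegression(max_iter=1000).fit(x, t)   # multinomial propensities
    probs = model.predict_proba(x)
    out = {}
    for k, level in enumerate(model.classes_):
        w = (t == level) / probs[:, k]
        out[level] = np.sum(w * y) / np.sum(w)            # Hajek-normalized IPW mean
    return out
```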

Econometrics arXiv updated paper (originally submitted: 2020-07-16)

Global Representation of the Conditional LATE Model: A Separability Result

Authors: Yu-Chang Chen, Haitian Xie

This paper studies the latent index representation of the conditional LATE
model, making explicit the role of covariates in treatment selection. We find
that if the directions of the monotonicity condition are the same across all
values of the conditioning covariate, which is often assumed in the literature,
then the treatment choice equation has to satisfy a separability condition
between the instrument and the covariate. This global representation result
establishes testable restrictions imposed on the way covariates enter the
treatment choice equation. We later extend the representation theorem to
incorporate multiple ordered levels of treatment.

arXiv link: http://arxiv.org/abs/2007.08106v3

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2020-07-15

Least Squares Estimation Using Sketched Data with Heteroskedastic Errors

Authors: Sokbae Lee, Serena Ng

Researchers may perform regressions using a sketch of data of size $m$
instead of the full sample of size $n$ for a variety of reasons. This paper
considers the case when the regression errors do not have constant variance and
heteroskedasticity robust standard errors would normally be needed for test
statistics to provide accurate inference. We show that estimates using data
sketched by random projections will behave `as if' the errors were
homoskedastic. Estimation by random sampling would not have this property. The
result arises because the sketched estimates in the case of random projections
can be expressed as degenerate $U$-statistics, and under certain conditions,
these statistics are asymptotically normal with homoskedastic variance. We
verify that the conditions hold not only in the case of least squares
regression when the covariates are exogenous, but also in instrumental
variables estimation when the covariates are endogenous. The result implies
that inference, including first-stage F tests for instrument relevance, can be
simpler than the full sample case if the sketching scheme is appropriately
chosen.

arXiv link: http://arxiv.org/abs/2007.07781v3
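
A minimal numpy sketch of the random-projection case: sketch (X, y) with a Gaussian projection of size m, run OLS on the sketched data, and report plain homoskedastic standard errors, which is what the paper's result suggests is appropriate for this sketching scheme. The Gaussian projection is one example; the paper also covers other sketches and the IV case.

```python
import numpy as np

def sketched_ols(X, y, m, seed=0):
    """OLS on a Gaussian random-projection sketch of size m, with plain
    (non-robust) standard errors, as suggested by the 'as if homoskedastic' result."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Pi = rng.standard_normal((m, n)) / np.sqrt(m)    # random projection matrix
    Xs, ys = Pi @ X, Pi @ y

    beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
    resid = ys - Xs @ beta
    sigma2 = resid @ resid / (m - p)
    cov = sigma2 * np.linalg.inv(Xs.T @ Xs)           # homoskedastic covariance
    return beta, np.sqrt(np.diag(cov))
```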

Econometrics arXiv cross-link from eess.SP (eess.SP), submitted: 2020-07-15

Understanding fluctuations through Multivariate Circulant Singular Spectrum Analysis

Authors: Juan Bógalo, Pilar Poncela, Eva Senra

We introduce Multivariate Circulant Singular Spectrum Analysis (M-CiSSA) to
provide a comprehensive framework to analyze fluctuations, extracting the
underlying components of a set of time series, disentangling their sources of
variation and assessing their relative phase or cyclical position at each
frequency. Our novel method is non-parametric and can be applied to series out
of phase, highly nonlinear and modulated both in frequency and amplitude. We
prove a uniqueness theorem that, in the case of common information and without
the need to fit a factor model, allows us to identify common sources of
variation. This technique can be quite useful in several fields such as
climatology, biometrics, engineering or economics among others. We show the
performance of M-CiSSA through a synthetic example of latent signals modulated
both in amplitude and frequency and through the real data analysis of energy
prices to understand the main drivers and co-movements of primary energy
commodity prices at various frequencies that are key to assess energy policy at
different time horizons.

arXiv link: http://arxiv.org/abs/2007.07561v5

Econometrics arXiv updated paper (originally submitted: 2020-07-14)

Persistence in Financial Connectedness and Systemic Risk

Authors: Jozef Barunik, Michael Ellington

This paper characterises dynamic linkages arising from shocks with
heterogeneous degrees of persistence. Using frequency domain techniques, we
introduce measures that identify smoothly varying links of a transitory and
persistent nature. Our approach allows us to test for statistical differences
in such dynamic links. We document substantial differences in transitory and
persistent linkages among US financial industry volatilities, argue that they
track heterogeneously persistent sources of systemic risk, and thus may serve
as a useful tool for market participants.

arXiv link: http://arxiv.org/abs/2007.07842v4

Econometrics arXiv paper, submitted: 2020-07-14

A More Robust t-Test

Authors: Ulrich K. Mueller

Standard inference about a scalar parameter estimated via GMM amounts to
applying a t-test to a particular set of observations. If the number of
observations is not very large, then moderately heavy tails can lead to poor
behavior of the t-test. This is a particular problem under clustering, since
the number of observations then corresponds to the number of clusters, and
heterogeneity in cluster sizes induces a form of heavy tails. This paper
combines extreme value theory for the smallest and largest observations with a
normal approximation for the average of the remaining observations to construct
a more robust alternative to the t-test. The new test is found to control size
much more successfully in small samples compared to existing methods.
Analytical results in the canonical inference for the mean problem demonstrate
that the new test provides a refinement over the full sample t-test under more
than two but less than three moments, while the bootstrapped t-test does not.

arXiv link: http://arxiv.org/abs/2007.07065v1

Econometrics arXiv updated paper (originally submitted: 2020-07-13)

An Adversarial Approach to Structural Estimation

Authors: Tetsuya Kaji, Elena Manresa, Guillaume Pouliot

We propose a new simulation-based estimation method, adversarial estimation,
for structural models. The estimator is formulated as the solution to a minimax
problem between a generator (which generates simulated observations using the
structural model) and a discriminator (which classifies whether an observation
is simulated). The discriminator maximizes the accuracy of its classification
while the generator minimizes it. We show that, with a sufficiently rich
discriminator, the adversarial estimator attains parametric efficiency under
correct specification and the parametric rate under misspecification. We
advocate the use of a neural network as a discriminator that can exploit
adaptivity properties and attain fast rates of convergence. We apply our method
to the elderly's saving decision model and show that our estimator uncovers the
bequest motive as an important source of saving across the wealth distribution,
not only for the rich.

arXiv link: http://arxiv.org/abs/2007.06169v3

Econometrics arXiv updated paper (originally submitted: 2020-07-10)

A Semiparametric Network Formation Model with Unobserved Linear Heterogeneity

Authors: Luis E. Candelaria

This paper analyzes a semiparametric model of network formation in the
presence of unobserved agent-specific heterogeneity. The objective is to
identify and estimate the preference parameters associated with homophily on
observed attributes when the distributions of the unobserved factors are not
parametrically specified. This paper offers two main contributions to the
literature on network formation. First, it establishes a new point
identification result for the vector of parameters that relies on the existence
of a special regressor. The identification proof is constructive and
characterizes a closed form for the parameter of interest. Second, it
introduces a simple two-step semiparametric estimator for the vector of
parameters with a first-step kernel estimator. The estimator is computationally
tractable and can be applied to both dense and sparse networks. Moreover, I
show that the estimator is consistent and has a limiting normal distribution as
the number of individuals in the network increases. Monte Carlo experiments
demonstrate that the estimator performs well in finite samples and in networks
with different levels of sparsity.

arXiv link: http://arxiv.org/abs/2007.05403v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2020-07-10

Intelligent Credit Limit Management in Consumer Loans Based on Causal Inference

Authors: Hang Miao, Kui Zhao, Zhun Wang, Linbo Jiang, Quanhui Jia, Yanming Fang, Quan Yu

Nowadays, consumer loans play an important role in promoting economic
growth, and credit cards are the most popular form of consumer loan. One of the most
essential aspects of credit cards is credit limit management. Traditionally,
credit limits are adjusted based on limited heuristic strategies, which are
developed by experienced professionals. In this paper, we present a data-driven
approach to manage credit limits intelligently. First, conditional
independence testing is conducted to acquire the data for building models.
Based on these test data, a response model is then built to measure the
heterogeneous treatment effect of increasing credit limits (i.e. treatments)
for different customers, who are described by several control variables (i.e.
features). In order to incorporate the diminishing marginal effect, a carefully
selected log transformation is introduced to the treatment variable. Moreover,
the model's capability can be further enhanced by applying a non-linear
transformation on features via GBDT encoding. Finally, a well-designed metric
is proposed to properly measure the performances of compared methods. The
experimental results demonstrate the effectiveness of the proposed approach.

arXiv link: http://arxiv.org/abs/2007.05188v1

Econometrics arXiv updated paper (originally submitted: 2020-07-09)

Structural Gaussian mixture vector autoregressive model with application to the asymmetric effects of monetary policy shocks

Authors: Savi Virolainen

A structural Gaussian mixture vector autoregressive model is introduced. The
shocks are identified by combining simultaneous diagonalization of the reduced
form error covariance matrices with constraints on the time-varying impact
matrix. This leads to flexible identification conditions, and some of the
constraints are also testable. The empirical application studies asymmetries in
the effects of the U.S. monetary policy shock and finds strong asymmetries with
respect to the sign and size of the shock and to the initial state of the
economy. The accompanying CRAN distributed R package gmvarkit provides a
comprehensive set of tools for numerical analysis.

arXiv link: http://arxiv.org/abs/2007.04713v7

Econometrics arXiv paper, submitted: 2020-07-09

Time Series Analysis of COVID-19 Infection Curve: A Change-Point Perspective

Authors: Feiyu Jiang, Zifeng Zhao, Xiaofeng Shao

In this paper, we model the trajectory of the cumulative confirmed cases and
deaths of COVID-19 (in log scale) via a piecewise linear trend model. The model
naturally captures the phase transitions of the epidemic growth rate via
change-points and further enjoys great interpretability due to its
semiparametric nature. On the methodological front, we advance the nascent
self-normalization (SN) technique (Shao, 2010) to testing and estimation of a
single change-point in the linear trend of a nonstationary time series. We
further combine the SN-based change-point test with the NOT algorithm
(Baranowski et al., 2019) to achieve multiple change-point estimation. Using
the proposed method, we analyze the trajectory of the cumulative COVID-19 cases
and deaths for 30 major countries and discover interesting patterns with
potentially relevant implications for effectiveness of the pandemic responses
by different countries. Furthermore, based on the change-point detection
algorithm and a flexible extrapolation function, we design a simple two-stage
forecasting scheme for COVID-19 and demonstrate its promising performance in
predicting cumulative deaths in the U.S.

arXiv link: http://arxiv.org/abs/2007.04553v1

Econometrics arXiv paper, submitted: 2020-07-08

Efficient Covariate Balancing for the Local Average Treatment Effect

Authors: Phillip Heiler

This paper develops an empirical balancing approach for the estimation of
treatment effects under two-sided noncompliance using a binary conditionally
independent instrumental variable. The method weighs both treatment and outcome
information with inverse probabilities to produce exact finite sample balance
across instrument level groups. It is free of functional form assumptions on
the outcome or the treatment selection step. By tailoring the loss function for
the instrument propensity scores, the resulting treatment effect estimates
exhibit both low bias and a reduced variance in finite samples compared to
conventional inverse probability weighting methods. The estimator is
automatically weight normalized and has similar bias properties compared to
conventional two-stage least squares estimation under constant causal effects
for the compliers. We provide conditions for asymptotic normality and
semiparametric efficiency and demonstrate how to utilize additional information
about the treatment selection step for bias reduction in finite samples. The
method can be easily combined with regularization or other statistical learning
approaches to deal with a high-dimensional number of observed confounding
variables. Monte Carlo simulations suggest that the theoretical advantages
translate well to finite samples. The method is illustrated in an empirical
example.

arXiv link: http://arxiv.org/abs/2007.04346v1

Econometrics arXiv updated paper (originally submitted: 2020-07-08)

Difference-in-Differences Estimators of Intertemporal Treatment Effects

Authors: Clément de Chaisemartin, Xavier D'Haultfœuille

We study treatment-effect estimation using panel data. The treatment may be
non-binary, non-absorbing, and the outcome may be affected by treatment lags.
We make a parallel-trends assumption, and propose event-study estimators of the
effect of being exposed to a weakly higher treatment dose for $\ell$ periods.
We also propose normalized estimators, that estimate a weighted average of the
effects of the current treatment and its lags. We also analyze commonly-used
two-way-fixed-effects regressions. Unlike our estimators, they can be biased in
the presence of heterogeneous treatment effects. A local-projection version of
those regressions is biased even with homogeneous effects.

arXiv link: http://arxiv.org/abs/2007.04267v13

Econometrics arXiv paper, submitted: 2020-07-08

Talents from Abroad. Foreign Managers and Productivity in the United Kingdom

Authors: Dimitrios Exadaktylos, Massimo Riccaboni, Armando Rungi

In this paper, we test the contribution of foreign management to firms'
competitiveness. We use a novel dataset on the careers of 165,084 managers
employed by 13,106 companies in the United Kingdom in the period 2009-2017. We
find that domestic manufacturing firms become, on average, between 7% and 12%
more productive after hiring the first foreign managers, whereas foreign-owned
firms register no significant improvement. In particular, we test that previous
industry-specific experience is the primary driver of productivity gains in
domestic firms (15.6%), in a way that allows the latter to catch up with
foreign-owned firms. Managers from the European Union are highly valuable, as
they represent about half of the recruits in our data. Our identification
strategy combines matching techniques, difference-in-differences, and
pre-recruitment trends to challenge reverse causality. Results are robust to
placebo tests and to different estimators of Total Factor Productivity.
Finally, we argue that upcoming limits to the mobility of foreign talent
after the Brexit event can hamper the allocation of productive managerial
resources.

arXiv link: http://arxiv.org/abs/2007.04055v1

Econometrics arXiv updated paper (originally submitted: 2020-07-08)

Optimal Decision Rules for Weak GMM

Authors: Isaiah Andrews, Anna Mikusheva

This paper studies optimal decision rules, including estimators and tests,
for weakly identified GMM models. We derive the limit experiment for weakly
identified GMM, and propose a theoretically-motivated class of priors which
give rise to quasi-Bayes decision rules as a limiting case. Together with
results in the previous literature, this establishes desirable properties for
the quasi-Bayes approach regardless of model identification status, and we
recommend quasi-Bayes for settings where identification is a concern. We
further propose weighted average power-optimal identification-robust
frequentist tests and confidence sets, and prove a Bernstein-von Mises-type
result for the quasi-Bayes posterior under weak identification.

arXiv link: http://arxiv.org/abs/2007.04050v7

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2020-07-08

Max-sum tests for cross-sectional dependence of high-dimensional panel data

Authors: Long Feng, Tiefeng Jiang, Binghui Liu, Wei Xiong

We consider a testing problem for cross-sectional dependence for
high-dimensional panel data, where the number of cross-sectional units is
potentially much larger than the number of observations. The cross-sectional
dependence is described through a linear regression model. We study three tests
named the sum test, the max test and the max-sum test, where the latter two are
new. The sum test is initially proposed by Breusch and Pagan (1980). We design
the max and sum tests for sparse and non-sparse residuals in the linear
regressions, respectively, and the max-sum test is devised as a compromise
between the two situations. Indeed, our simulation shows that the max-sum test
outperforms the previous two tests. This makes the max-sum test very useful in
practice where sparsity or not for a set of data is usually vague. Towards the
theoretical analysis of the three tests, we have settled two conjectures
regarding the sum of squares of sample correlation coefficients posed by
Pesaran (2004 and 2008). In addition, we establish the asymptotic theory for
maxima of sample correlation coefficients appearing in the linear regression
model for panel data, which is also the first successful attempt to our
knowledge. To study the max-sum test, we create a novel method to show
asymptotic independence between maxima and sums of dependent random variables.
We expect the method itself to be useful for other problems of this nature.
Finally, an extensive simulation study as well as a case study are carried out.
They demonstrate the advantages of our proposed methods in terms of both
empirical power and robustness to whether or not the residuals are sparse.

arXiv link: http://arxiv.org/abs/2007.03911v1
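
To fix ideas, the sketch below computes sum-type and max-type statistics from the pairwise correlations of regression residuals across units; the paper's max-sum test combines the two after establishing their asymptotic independence. The scaling and centering here are simplified and critical values are not implemented, so this is only a schematic of the ingredients.

```python
import numpy as np

def cd_test_statistics(resid):
    """Sum- and max-type statistics from pairwise residual correlations.
    `resid` is a (T, N) array of regression residuals, one column per unit."""
    T, N = resid.shape
    R = np.corrcoef(resid, rowvar=False)      # N x N residual correlation matrix
    iu = np.triu_indices(N, k=1)
    rho2 = R[iu] ** 2

    sum_stat = T * rho2.sum()                 # Breusch-Pagan-type LM statistic
    max_stat = T * rho2.max()                 # targets sparse alternatives
    # The paper's max-sum test combines the two; its centering, scaling, and
    # critical values are omitted in this schematic.
    return sum_stat, max_stat
```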

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2020-07-07

A Dynamic Choice Model with Heterogeneous Decision Rules: Application in Estimating the User Cost of Rail Crowding

Authors: Prateek Bansal, Daniel Hörcher, Daniel J. Graham

Crowding valuation of subway riders is an important input to various
supply-side decisions of transit operators. The crowding cost perceived by a
transit rider is generally estimated by capturing the trade-off that the rider
makes between crowding and travel time while choosing a route. However,
existing studies rely on static compensatory choice models and fail to account
for inertia and the learning behaviour of riders. To address these challenges,
we propose a new dynamic latent class model (DLCM) which (i) assigns riders to
latent compensatory and inertia/habit classes based on different decision
rules, (ii) enables transitions between these classes over time, and (iii)
adopts instance-based learning theory to account for the learning behaviour of
riders. We use the expectation-maximisation algorithm to estimate DLCM, and the
most probable sequence of latent classes for each rider is retrieved using the
Viterbi algorithm. The proposed DLCM can be applied in any choice context to
capture the dynamics of decision rules used by a decision-maker. We demonstrate
its practical advantages in estimating the crowding valuation of an Asian
metro's riders. To calibrate the model, we recover the daily route preferences
and in-vehicle crowding experiences of regular metro riders using a
two-month-long smart card and vehicle location data. The results indicate that
the average rider follows the compensatory rule on only 25.5% of route choice
occasions. DLCM estimates also show an increase of 47% in metro riders'
valuation of travel time under extremely crowded conditions relative to that
under uncrowded conditions.

arXiv link: http://arxiv.org/abs/2007.03682v1

Econometrics arXiv paper, submitted: 2020-07-06

Semi-nonparametric Latent Class Choice Model with a Flexible Class Membership Component: A Mixture Model Approach

Authors: Georges Sfeir, Maya Abou-Zeid, Filipe Rodrigues, Francisco Camara Pereira, Isam Kaysi

This study presents a semi-nonparametric Latent Class Choice Model (LCCM)
with a flexible class membership component. The proposed model formulates the
latent classes using mixture models as an alternative approach to the
traditional random utility specification with the aim of comparing the two
approaches on various measures including prediction accuracy and representation
of heterogeneity in the choice process. Mixture models are parametric
model-based clustering techniques that have been widely used in areas such as
machine learning, data mining and pattern recognition for clustering and
classification problems. An Expectation-Maximization (EM) algorithm is derived
for the estimation of the proposed model. Using two different case studies on
travel mode choice behavior, the proposed model is compared to traditional
discrete choice models on the basis of parameter estimates' signs, value of
time, statistical goodness-of-fit measures, and cross-validation tests. Results
show that mixture models improve the overall performance of latent class choice
models by providing better out-of-sample prediction accuracy in addition to
better representations of heterogeneity without weakening the behavioral and
economic interpretability of the choice models.

arXiv link: http://arxiv.org/abs/2007.02739v1

Econometrics arXiv updated paper (originally submitted: 2020-07-06)

Teacher-to-classroom assignment and student achievement

Authors: Bryan S. Graham, Geert Ridder, Petra Thiemann, Gema Zamarro

We study the effects of counterfactual teacher-to-classroom assignments on
average student achievement in elementary and middle schools in the US. We use
the Measures of Effective Teaching (MET) experiment to semiparametrically
identify the average reallocation effects (AREs) of such assignments. Our
findings suggest that changes in within-district teacher assignments could have
appreciable effects on student achievement. Unlike policies which require
hiring additional teachers (e.g., class-size reduction measures), or those
aimed at changing the stock of teachers (e.g., VAM-guided teacher tenure
policies), alternative teacher-to-classroom assignments are resource neutral;
they raise student achievement through a more efficient deployment of existing
teachers.

arXiv link: http://arxiv.org/abs/2007.02653v2

Econometrics arXiv paper, submitted: 2020-07-06

Spectral Targeting Estimation of $λ$-GARCH models

Authors: Simon Hetland

This paper presents a novel estimator of orthogonal GARCH models, which
combines (eigenvalue and -vector) targeting estimation with stepwise
(univariate) estimation. We denote this the spectral targeting estimator. This
two-step estimator is consistent under finite second order moments, while
asymptotic normality holds under finite fourth order moments. The estimator is
especially well suited for modelling larger portfolios: we compare the
empirical performance of the spectral targeting estimator to that of the quasi
maximum likelihood estimator for five portfolios of 25 assets. The spectral
targeting estimator dominates in terms of computational complexity, being up to
57 times faster in estimation, while both estimators produce similar
out-of-sample forecasts, indicating that the spectral targeting estimator is
well suited for high-dimensional empirical applications.

arXiv link: http://arxiv.org/abs/2007.02588v1
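
A schematic of the two-step idea, assuming the third-party arch package for univariate GARCH estimation: eigen-decompose the sample covariance (the targeting step), rotate returns onto the orthogonal components, and fit a univariate GARCH(1,1) to each component. The paper's $\lambda$-GARCH specification and its asymptotic theory involve more structure than this sketch.

```python
import numpy as np
from arch import arch_model   # assumed third-party dependency

def spectral_garch(returns):
    """Two-step sketch: (1) eigen-decompose the sample covariance ('targeting'),
    (2) fit univariate GARCH(1,1) models to the rotated orthogonal components.
    `returns` is a (T, N) array of demeaned returns."""
    S = np.cov(returns, rowvar=False)
    eigval, eigvec = np.linalg.eigh(S)        # step 1: spectral targeting
    components = returns @ eigvec             # orthogonal components

    fits = []
    for i in range(components.shape[1]):      # step 2: stepwise univariate estimation
        am = arch_model(components[:, i], vol="GARCH", p=1, q=1, mean="Zero")
        fits.append(am.fit(disp="off"))
    return eigval, eigvec, fits
```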

Econometrics arXiv updated paper (originally submitted: 2020-07-05)

Forecasting with Bayesian Grouped Random Effects in Panel Data

Authors: Boyuan Zhang

In this paper, we estimate and leverage latent constant group structure to
generate point, set, and density forecasts for short dynamic panel data. We
implement a nonparametric Bayesian approach to simultaneously identify
coefficients and group membership in the random effects which are heterogeneous
across groups but fixed within a group. This method allows us to flexibly
incorporate subjective prior knowledge on the group structure that potentially
improves the predictive accuracy. In Monte Carlo experiments, we demonstrate
that our Bayesian grouped random effects (BGRE) estimators produce accurate
estimates and score predictive gains over standard panel data estimators. With
a data-driven group structure, the BGRE estimators exhibit clustering accuracy
comparable to that of the Kmeans algorithm and outperform a two-step Bayesian
grouped estimator whose group structure relies on Kmeans. In the empirical
analysis, we apply our method to forecast the investment rate across a broad
range of firms and illustrate that the estimated latent group structure
improves forecasts relative to standard panel data estimators.

arXiv link: http://arxiv.org/abs/2007.02435v8

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2020-07-05

Assessing External Validity Over Worst-case Subpopulations

Authors: Sookyo Jeong, Hongseok Namkoong

Study populations are typically sampled from limited points in space and
time, and marginalized groups are underrepresented. To assess the external
validity of randomized and observational studies, we propose and evaluate the
worst-case treatment effect (WTE) across all subpopulations of a given size,
which guarantees positive findings remain valid over subpopulations. We develop
a semiparametrically efficient estimator for the WTE that analyzes the external
validity of the augmented inverse propensity weighted estimator for the average
treatment effect. Our cross-fitting procedure leverages flexible nonparametric
and machine learning-based estimates of nuisance parameters and is a regular
root-$n$ estimator even when nuisance estimates converge more slowly. On real
examples where external validity is of core concern, our proposed framework
guards against brittle findings that are invalidated by unanticipated
population shifts.

arXiv link: http://arxiv.org/abs/2007.02411v3

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2020-07-04

Off-Policy Exploitability-Evaluation in Two-Player Zero-Sum Markov Games

Authors: Kenshi Abe, Yusuke Kaneko

Off-policy evaluation (OPE) is the problem of evaluating new policies using
historical data obtained from a different policy. In the recent OPE context,
most studies have focused on single-player cases, and not on multi-player
cases. In this study, we propose OPE estimators constructed by the doubly
robust and double reinforcement learning estimators in two-player zero-sum
Markov games. The proposed estimators project exploitability that is often used
as a metric for determining how close a policy profile (i.e., a tuple of
policies) is to a Nash equilibrium in two-player zero-sum games. We prove the
exploitability estimation error bounds for the proposed estimators. We then
propose the methods to find the best candidate policy profile by selecting the
policy profile that minimizes the estimated exploitability from a given policy
profile class. We prove the regret bounds of the policy profiles selected by
our methods. Finally, we demonstrate the effectiveness and performance of the
proposed estimators through experiments.

arXiv link: http://arxiv.org/abs/2007.02141v2

Econometrics arXiv cross-link from q-bio.PE (q-bio.PE), submitted: 2020-07-03

Bridging the COVID-19 Data and the Epidemiological Model using Time Varying Parameter SIRD Model

Authors: Cem Cakmakli, Yasin Simsek

This paper extends the canonical model of epidemiology, the SIRD model, to allow
for time varying parameters for real-time measurement of the stance of the
COVID-19 pandemic. Time variation in model parameters is captured using the
generalized autoregressive score modelling structure designed for the typically
daily count data related to the pandemic. The resulting specification permits a
flexible yet parsimonious model structure with a very low computational cost.
This is especially crucial at the onset of the pandemic when the data is scarce
and the uncertainty is abundant. Full sample results show that countries
including the US, Brazil, and Russia are still not able to contain the pandemic,
with the US having the worst performance. Furthermore, Iran and South Korea are
likely to experience a second wave of the pandemic. A real-time exercise shows
that the proposed structure delivers timely and precise information on the
current stance of the pandemic ahead of competitors that use a rolling
window. This, in turn, translates into accurate short-term predictions of the
active cases. We further modify the model to allow for unreported cases.
Results suggest that the effects of the presence of these cases on the
estimation results diminish towards the end of the sample as the amount of
testing increases.

arXiv link: http://arxiv.org/abs/2007.02726v2

Econometrics arXiv updated paper (originally submitted: 2020-07-01)

When are Google data useful to nowcast GDP? An approach via pre-selection and shrinkage

Authors: Laurent Ferrara, Anna Simoni

Alternative data sets are widely used for macroeconomic nowcasting together
with machine learning-based tools. The latter are often applied without a
complete picture of their theoretical nowcasting properties. Against this
background, this paper proposes a theoretically grounded nowcasting methodology
that allows researchers to incorporate alternative Google Search Data (GSD)
among the predictors and that combines targeted preselection, Ridge
regularization, and Generalized Cross Validation. Breaking with most existing
literature, which focuses on asymptotic in-sample theoretical properties, we
establish the theoretical out-of-sample properties of our methodology and
support them by Monte-Carlo simulations. We apply our methodology to GSD to
nowcast GDP growth rate of several countries during various economic periods.
Our empirical findings support the idea that GSD tend to increase nowcasting
accuracy, even after controlling for official variables, but that the gain
differs between periods of recessions and of macroeconomic stability.

arXiv link: http://arxiv.org/abs/2007.00273v3
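
A compact sketch of the nowcasting pipeline described above: screen Google search regressors by absolute correlation with the target (targeted preselection), append them to the official predictors, and fit Ridge with a cross-validated penalty (scikit-learn's RidgeCV, whose default leave-one-out criterion plays the role of generalized cross-validation here). The number of retained regressors and the penalty grid are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

def nowcast_with_preselection(X_google, x_official, y, n_keep=20):
    """Targeted preselection of Google search regressors by absolute correlation
    with the target, then Ridge with a cross-validated penalty. Official
    predictors in `x_official` are always kept."""
    corr = np.abs([np.corrcoef(X_google[:, j], y)[0, 1]
                   for j in range(X_google.shape[1])])
    keep = np.argsort(corr)[-n_keep:]                         # preselection step
    Z = np.column_stack([x_official, X_google[:, keep]])

    ridge = RidgeCV(alphas=np.logspace(-3, 3, 25)).fit(Z, y)  # LOO/GCV-type penalty choice
    return ridge, keep
```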

Econometrics arXiv paper, submitted: 2020-07-01

Regression Discontinuity Design with Multivalued Treatments

Authors: Carolina Caetano, Gregorio Caetano, Juan Carlos Escanciano

We study identification and estimation in the Regression Discontinuity Design
(RDD) with a multivalued treatment variable. We also allow for the inclusion of
covariates. We show that without additional information, treatment effects are
not identified. We give necessary and sufficient conditions that lead to
identification of LATEs as well as of weighted averages of the conditional
LATEs. We show that if the first stage discontinuities of the multiple
treatments conditional on covariates are linearly independent, then it is
possible to identify multivariate weighted averages of the treatment effects
with convenient identifiable weights. If, moreover, treatment effects do not
vary with some covariates or a flexible parametric structure can be assumed, it
is possible to identify (in fact, over-identify) all the treatment effects. The
over-identification can be used to test these assumptions. We propose a simple
estimator, which can be programmed in packaged software as a Two-Stage Least
Squares regression, and packaged standard errors and tests can also be used.
Finally, we implement our approach to identify the effects of different types
of insurance coverage on health care utilization, as in Card, Dobkin and
Maestas (2008).

arXiv link: http://arxiv.org/abs/2007.00185v1

Econometrics arXiv updated paper (originally submitted: 2020-06-30)

Inference in Difference-in-Differences with Few Treated Units and Spatial Correlation

Authors: Luis Alvarez, Bruno Ferman

We consider the problem of inference in Difference-in-Differences (DID) when
there are few treated units and errors are spatially correlated. We first show
that, when there is a single treated unit, some existing inference methods
designed for settings with few treated and many control units remain
asymptotically valid when errors are weakly dependent. However, these methods
may be invalid with more than one treated unit. We propose alternatives that
are asymptotically valid in this setting, even when the relevant distance
metric across units is unavailable.

arXiv link: http://arxiv.org/abs/2006.16997v7

Econometrics arXiv updated paper (originally submitted: 2020-06-29)

Inference in Bayesian Additive Vector Autoregressive Tree Models

Authors: Florian Huber, Luca Rossini

Vector autoregressive (VAR) models assume linearity between the endogenous
variables and their lags. This assumption might be overly restrictive and could
have a deleterious impact on forecasting accuracy. As a solution, we propose
combining VAR with Bayesian additive regression tree (BART) models. The
resulting Bayesian additive vector autoregressive tree (BAVART) model is
capable of capturing arbitrary non-linear relations between the endogenous
variables and the covariates without much input from the researcher. Since
controlling for heteroscedasticity is key for producing precise density
forecasts, our model allows for stochastic volatility in the errors. We apply
our model to two datasets. The first application shows that the BAVART model
yields highly competitive forecasts of the US term structure of interest rates.
In a second application, we estimate our model using a moderately sized
Eurozone dataset to investigate the dynamic effects of uncertainty on the
economy.

arXiv link: http://arxiv.org/abs/2006.16333v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-06-29

Estimation of Covid-19 Prevalence from Serology Tests: A Partial Identification Approach

Authors: Panos Toulis

We propose a partial identification method for estimating disease prevalence
from serology studies. Our data are results from antibody tests in some
population sample, where the test parameters, such as the true/false positive
rates, are unknown. Our method scans the entire parameter space, and rejects
parameter values using the joint data density as the test statistic. The
proposed method is conservative for marginal inference, in general, but its key
advantage over more standard approaches is that it is valid in finite samples
even when the underlying model is not point identified. Moreover, our method
requires only independence of serology test results, and does not rely on
asymptotic arguments, normality assumptions, or other approximations. We use
recent Covid-19 serology studies in the US, and show that the parameter
confidence set is generally wide, and cannot support definite conclusions.
Specifically, recent serology studies from California suggest a prevalence
anywhere in the range 0%-2% (at the time of study), and are therefore
inconclusive. However, this range could be narrowed down to 0.7%-1.5% if the
actual false positive rate of the antibody test was indeed near its empirical
estimate (approximately 0.5%). In another study from New York state, Covid-19 prevalence is
confidently estimated in the range 13%-17% in mid-April of 2020, which also
suggests significant geographic variation in Covid-19 exposure across the US.
Combining all datasets yields a 5%-8% prevalence range. Our results overall
suggest that serology testing on a massive scale can give crucial information
for future policy design, even when such tests are imperfect and their
parameters unknown.

arXiv link: http://arxiv.org/abs/2006.16214v1
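
The scanning idea can be illustrated with a deliberately simplified marginal version: for each candidate triple of prevalence, sensitivity, and specificity, check whether the observed number of positive tests is consistent with the implied positive rate via a binomial test, and keep the prevalences that survive for some admissible test parameters. The paper's procedure instead uses the joint data density and is valid in finite samples; the grids and the 5% level below are illustrative.

```python
import numpy as np
from scipy.stats import binomtest

def prevalence_identified_set(k, n, se_grid, sp_grid, alpha=0.05):
    """Keep prevalences not rejected for some admissible (sensitivity, specificity).
    k positives out of n antibody tests; se_grid/sp_grid are plausible parameter grids."""
    kept = []
    for pi in np.linspace(0.0, 0.2, 101):
        for se in se_grid:
            for sp in sp_grid:
                q = pi * se + (1 - pi) * (1 - sp)   # implied rate of positive tests
                if binomtest(k, n, q).pvalue > alpha:
                    kept.append(pi)
    kept = sorted(set(kept))
    return (kept[0], kept[-1]) if kept else None
```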

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-06-29

Visualizing and comparing distributions with half-disk density strips

Authors: Carlo Romano Marcello Alessandro Santagiustina, Matteo Iacopini

We propose a user-friendly graphical tool, the half-disk density strip
(HDDS), for visualizing and comparing probability density functions. The HDDS
exploits color shading for representing a distribution in an intuitive way. In
univariate settings, the half-disk density strip allows one to immediately discern
the key characteristics of a density, such as symmetry, dispersion, and
multi-modality. In multivariate settings, we define HDDS tables to
generalize the concept of contingency tables. It is an array of half-disk
density strips, which compactly displays the univariate marginal and
conditional densities of a variable of interest, together with the joint and
marginal densities of the conditioning variables. Moreover, HDDSs are by
construction well suited to easily compare pairs of densities. To highlight the
concrete benefits of the proposed methods, we show how to use HDDSs for
analyzing income distribution and life-satisfaction, conditionally on
continuous and categorical controls, from survey data. The code for
implementing HDDS methods is made available through a dedicated R package.

arXiv link: http://arxiv.org/abs/2006.16063v1

Econometrics arXiv updated paper (originally submitted: 2020-06-29)

Treatment Effects in Interactive Fixed Effects Models with a Small Number of Time Periods

Authors: Brantly Callaway, Sonia Karami

This paper considers identifying and estimating the Average Treatment Effect
on the Treated (ATT) when untreated potential outcomes are generated by an
interactive fixed effects model. That is, in addition to time-period and
individual fixed effects, we consider the case where there is an unobserved
time invariant variable whose effect on untreated potential outcomes may change
over time and which can therefore cause outcomes (in the absence of
participating in the treatment) to follow different paths for the treated group
relative to the untreated group. The models that we consider in this paper
generalize many commonly used models in the treatment effects literature
including difference in differences and individual-specific linear trend
models. Unlike the majority of the literature on interactive fixed effects
models, we do not require the number of time periods to go to infinity to
consistently estimate the ATT. Our main identification result relies on having
the effect of some time invariant covariate (e.g., race or sex) not vary over
time. Using our approach, we show that the ATT can be identified with as few as
three time periods and with panel or repeated cross sections data.

arXiv link: http://arxiv.org/abs/2006.15780v3

Econometrics arXiv cross-link from q-fin.RM (q-fin.RM), submitted: 2020-06-28

Quantitative Statistical Robustness for Tail-Dependent Law Invariant Risk Measures

Authors: Wei Wang, Huifu Xu, Tiejun Ma

When estimating the risk of a financial position with empirical data or Monte
Carlo simulations via a tail-dependent law invariant risk measure such as the
Conditional Value-at-Risk (CVaR), it is important to ensure the robustness of
the statistical estimator particularly when the data contain noise. Kratscher
et al. [1] propose a new framework to examine the qualitative robustness of
estimators for tail-dependent law invariant risk measures on Orlicz spaces,
which is a step further from earlier work for studying the robustness of risk
measurement procedures by Cont et al. [2]. In this paper, we follow the stream
of research to propose a quantitative approach for verifying the statistical
robustness of tail-dependent law invariant risk measures. A distinct feature of
our approach is that we use the Fortet-Mourier metric to quantify the variation
of the true underlying probability measure in the analysis of the discrepancy
between the laws of the plug-in estimators of law invariant risk measure based
on the true data and perturbed data, which enables us to derive an explicit
error bound for the discrepancy when the risk functional is Lipschitz
continuous with respect to a class of admissible laws. Moreover, the newly
introduced notion of Lipschitz continuity allows us to examine the degree of
robustness for tail-dependent risk measures. Finally, we apply our quantitative
approach to some well-known risk measures to illustrate our theory.

arXiv link: http://arxiv.org/abs/2006.15491v1

Econometrics arXiv updated paper (originally submitted: 2020-06-26)

Real-Time Real Economic Activity: Entering and Exiting the Pandemic Recession of 2020

Authors: Francis X. Diebold

Entering and exiting the Pandemic Recession, I study the high-frequency
real-activity signals provided by a leading nowcast, the ADS Index of Business
Conditions produced and released in real time by the Federal Reserve Bank of
Philadelphia. I track the evolution of real-time vintage beliefs and compare
them to a later-vintage chronology. Real-time ADS plunges and then swings as
its underlying economic indicators swing, but the ADS paths quickly converge to
indicate a return to brisk positive growth by mid-May. I show, moreover, that
the daily real activity path was highly correlated with the daily COVID-19
cases. Finally, I provide a comparative assessment of the real-time ADS signals
provided when exiting the Great Recession.

arXiv link: http://arxiv.org/abs/2006.15183v4

Econometrics arXiv paper, submitted: 2020-06-26

Endogenous Treatment Effect Estimation with some Invalid and Irrelevant Instruments

Authors: Qingliang Fan, Yaqian Wu

Instrumental variables (IV) regression is a popular method for the estimation
of endogenous treatment effects. Conventional IV methods require that all the
instruments be relevant and valid. However, this is impractical, especially in
high-dimensional models where we consider a large set of candidate IVs. In this
paper, we propose an IV estimator robust to the existence of both the invalid
and irrelevant instruments (called R2IVE) for the estimation of endogenous
treatment effects. This paper extends the scope of Kang et al. (2016) by
considering a true high-dimensional IV model and a nonparametric reduced form
equation. It is shown that our procedure can select the relevant and valid
instruments consistently and the proposed R2IVE is root-n consistent and
asymptotically normal. Monte Carlo simulations demonstrate that the R2IVE
performs favorably compared to the existing high-dimensional IV estimators
(such as, NAIVE (Fan and Zhong, 2018) and sisVIVE (Kang et al., 2016)) when
invalid instruments exist. In the empirical study, we revisit the classic
question of trade and growth (Frankel and Romer, 1999).

arXiv link: http://arxiv.org/abs/2006.14998v1

Econometrics arXiv updated paper (originally submitted: 2020-06-25)

Identification and Formal Privacy Guarantees

Authors: Tatiana Komarova, Denis Nekipelov

Empirical economic research crucially relies on highly sensitive individual
datasets. At the same time, increasing availability of public individual-level
data makes it possible for adversaries to potentially de-identify anonymized
records in sensitive research datasets. The most commonly accepted formal
definition of an individual non-disclosure guarantee is referred to as
differential privacy. It restricts the interaction of researchers with the data
by allowing them to issue queries to the data. The differential privacy
mechanism then replaces the actual outcome of the query with a randomised
outcome.
The impact of differential privacy on the identification of empirical
economic models and on the performance of estimators in nonlinear empirical
econometric models has not been sufficiently studied. Since privacy protection
mechanisms are inherently finite-sample procedures, we define the notion of
identifiability of the parameter of interest under differential privacy as a
property of the limit of experiments. It is naturally characterized by
concepts from random set theory.
We show that particular instances of regression discontinuity design may be
problematic for inference with differential privacy as parameters turn out to
be neither point nor partially identified. The set of differentially private
estimators converges weakly to a random set. Our analysis suggests that many
other estimators that rely on nuisance parameters may have similar properties
with the requirement of differential privacy. We show that identification
becomes possible if the target parameter can be deterministically located
within the random set. In that case, a full exploration of the random set of
the weak limits of differentially private estimators can allow the data curator
to select a sequence of instances of differentially private estimators
converging to the target parameter in probability.

arXiv link: http://arxiv.org/abs/2006.14732v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2020-06-25

Empirical MSE Minimization to Estimate a Scalar Parameter

Authors: Clément de Chaisemartin, Xavier D'Haultfœuille

We consider the estimation of a scalar parameter, when two estimators are
available. The first is always consistent. The second is inconsistent in
general, but has a smaller asymptotic variance than the first, and may be
consistent if an assumption is satisfied. We propose to use the weighted sum of
the two estimators with the lowest estimated mean-squared error (MSE). We show
that this third estimator dominates the other two from a minimax-regret
perspective: the maximum asymptotic-MSE-gain one may incur by using this
estimator rather than one of the other estimators is larger than the maximum
asymptotic-MSE-loss.

arXiv link: http://arxiv.org/abs/2006.14667v1
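
A toy version of the proposal: estimate the squared bias of the lower-variance estimator from the gap between the two estimators, then pick the weight on a grid that minimizes the estimated MSE of the weighted sum. The covariance between the two estimators is ignored here for brevity, so this is only a sketch of the idea rather than the paper's estimator.

```python
import numpy as np

def min_mse_combination(theta1, var1, theta2, var2):
    """Weighted sum of a consistent estimator (theta1) and a possibly biased but
    lower-variance estimator (theta2), choosing the weight with the lowest
    estimated MSE. The squared bias of theta2 is proxied by (theta2 - theta1)^2."""
    bias2 = (theta2 - theta1) ** 2
    weights = np.linspace(0.0, 1.0, 101)
    # MSE of w*theta2 + (1-w)*theta1, ignoring the covariance term for simplicity
    mse = weights ** 2 * (var2 + bias2) + (1 - weights) ** 2 * var1
    w = weights[np.argmin(mse)]
    return w * theta2 + (1 - w) * theta1, w
```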

Econometrics arXiv paper, submitted: 2020-06-25

Inference without smoothing for large panels with cross-sectional and temporal dependence

Authors: J. Hidalgo, M. Schafgans

This paper addresses inference in large panel data models in the presence of
both cross-sectional and temporal dependence of unknown form. We are interested
in making inferences that do not rely on the choice of any smoothing parameter
as is the case with the often employed "HAC" estimator for the covariance
matrix. To that end, we propose a cluster estimator for the asymptotic
covariance of the estimators and valid bootstrap schemes that do not require
the selection of a bandwidth or smoothing parameter and accommodate the
nonparametric nature of both temporal and cross-sectional dependence. Our
approach is based on the observation that the spectral representation of the
fixed effect panel data model is such that the errors become approximately
temporally uncorrelated. Our proposed bootstrap schemes can be viewed as wild
bootstraps in the frequency domain. We present some Monte-Carlo simulations to
shed some light on the small sample performance of our inferential procedure.

arXiv link: http://arxiv.org/abs/2006.14409v1

Econometrics arXiv paper, submitted: 2020-06-25

Matching Multidimensional Types: Theory and Application

Authors: Veli Safak

Becker (1973) presents a bilateral matching model in which scalar types
describe agents. For this framework, he establishes the conditions under which
positive sorting between agents' attributes is the unique market outcome.
Becker's celebrated sorting result has been applied to address many economic
questions. However, recent empirical studies in the fields of health,
household, and labor economics suggest that agents have multiple
outcome-relevant attributes. In this paper, I study a matching model with
multidimensional types. I offer multidimensional generalizations of concordance
and supermodularity to construct three multidimensional sorting patterns and
two classes of multidimensional complementarities. For each of these sorting
patterns, I identify the sufficient conditions which guarantee its optimality.
In practice, we observe sorting patterns between observed attributes that are
aggregated over unobserved characteristics. To reconcile theory with practice,
I establish the link between production complementarities and the aggregated
sorting patterns. Finally, I examine the relationship between agents' health
status and their spouses' education levels among U.S. households within the
framework for multidimensional matching markets. Preliminary analysis reveals a
weak positive association between agents' health status and their spouses'
education levels. This weak positive association is estimated to be a product
of three factors: (a) an attraction between better-educated individuals, (b) an
attraction between healthier individuals, and (c) a weak positive association
between agents' health status and their education levels. The attraction
channel suggests that the insurance risk associated with a two-person family
plan is higher than the aggregate risk associated with two individual policies.

arXiv link: http://arxiv.org/abs/2006.14243v1

Econometrics arXiv updated paper (originally submitted: 2020-06-25)

Cointegration in large VARs

Authors: Anna Bykhovskaya, Vadim Gorin

The paper analyses cointegration in vector autoregressive processes (VARs)
for the cases when both the number of coordinates, $N$, and the number of time
periods, $T$, are large and of the same order. We propose a way to examine a
VAR of order $1$ for the presence of cointegration based on a modification of
the Johansen likelihood ratio test. The advantage of our procedure over the
original Johansen test and its finite sample corrections is that our test does
not suffer from over-rejection. This is achieved through novel asymptotic
theorems for eigenvalues of matrices in the test statistic in the regime of
proportionally growing $N$ and $T$. Our theoretical findings are supported by
Monte Carlo simulations and an empirical illustration. Moreover, we find a
surprising connection with multivariate analysis of variance (MANOVA) and
explain why it emerges.
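
For readers who want to reproduce the baseline being corrected, the snippet
below runs the standard Johansen trace test (as implemented in statsmodels) on
a simulated system of independent random walks, i.e. with no cointegration;
the dimensions are modest and purely illustrative.

import numpy as np
from statsmodels.tsa.vector_ar.vecm import coint_johansen

rng = np.random.default_rng(1)
N, T = 10, 100
data = np.cumsum(rng.standard_normal((T, N)), axis=0)  # N independent random walks

res = coint_johansen(data, det_order=0, k_ar_diff=1)
# Count how many sequential trace statistics exceed the 5% critical values.
rejections = int(np.sum(res.lr1 > res.cvt[:, 1]))
print(rejections)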

arXiv link: http://arxiv.org/abs/2006.14179v4

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-06-25

Robust and Efficient Approximate Bayesian Computation: A Minimum Distance Approach

Authors: David T. Frazier

In many instances, the application of approximate Bayesian methods is
hampered by two practical features: 1) the requirement to project the data down
to a low-dimensional summary, including the choice of this projection, which
ultimately yields inefficient inference; 2) a possible lack of robustness to
deviations from the underlying model structure. Motivated by these efficiency
and robustness concerns, we construct a new Bayesian method that can deliver
efficient estimators when the underlying model is well-specified, and which is
simultaneously robust to certain forms of model misspecification. This new
approach bypasses the calculation of summaries by considering a norm between
empirical and simulated probability measures. For specific choices of the norm,
we demonstrate that this approach can deliver point estimators that are as
efficient as those obtained using exact Bayesian inference, while also
simultaneously displaying robustness to deviations from the underlying model
assumptions.
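
The sketch below illustrates the summary-free idea with a plain accept/reject
ABC step that compares empirical distributions via the one-dimensional
Wasserstein distance; the Gaussian location model, prior and tolerance are
hypothetical choices rather than the paper's construction.

import numpy as np

rng = np.random.default_rng(2)
observed = rng.normal(loc=2.0, scale=1.0, size=200)   # "observed" sample

def wasserstein_1d(x, y):
    """1-Wasserstein distance between two equal-size one-dimensional samples."""
    return np.mean(np.abs(np.sort(x) - np.sort(y)))

accepted = []
for _ in range(20_000):
    theta = rng.normal(0.0, 5.0)                      # prior draw
    simulated = rng.normal(loc=theta, scale=1.0, size=observed.size)
    if wasserstein_1d(observed, simulated) < 0.15:    # tolerance
        accepted.append(theta)

print(len(accepted), round(float(np.mean(accepted)), 2))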

arXiv link: http://arxiv.org/abs/2006.14126v1

Econometrics arXiv paper, submitted: 2020-06-25

A Model of the Fed's View on Inflation

Authors: Thomas Hasenzagl, Filippo Pellegrino, Lucrezia Reichlin, Giovanni Ricco

We develop a medium-size semi-structural time series model of inflation
dynamics that is consistent with the view - often expressed by central banks -
that three components are important: a trend anchored by long-run expectations,
a Phillips curve and temporary fluctuations in energy prices. We find that a
stable long-term inflation trend and a well-identified steep Phillips curve are
consistent with the data, but they imply potential output declining since the
new millennium and energy prices affecting headline inflation not only via the
Phillips curve but also via an independent expectational channel. A
high-frequency energy price cycle can be related to global factors affecting
the commodity market, and often overpowers the Phillips curve, thereby
explaining the inflation puzzles of the last ten years.

arXiv link: http://arxiv.org/abs/2006.14110v1

Econometrics arXiv paper, submitted: 2020-06-24

Dynamic Effects of Persistent Shocks

Authors: Mario Alloza, Jesus Gonzalo, Carlos Sanz

We provide evidence that many narrative shocks used in the prominent literature
are persistent. We show that the two leading methods to estimate impulse
responses to an independently identified shock (local projections and
distributed lag models) treat persistence differently, hence identifying
different objects. We propose corrections to re-establish the equivalence
between local projections and distributed lag models, providing applied
researchers with methods and guidance to estimate their desired object of
interest. We apply these methods to well-known empirical work and find that how
persistence is treated has a sizable impact on the estimates of dynamic
effects.
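
A bare-bones local-projection sketch on simulated data is shown below: the
impulse response at horizon h is the coefficient from regressing y_{t+h} on the
identified shock. The persistence corrections that the paper proposes are not
implemented here; the data-generating process is hypothetical.

import numpy as np

rng = np.random.default_rng(3)
T = 500
shock = rng.standard_normal(T)                      # independently identified shock
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + 0.3 * shock[t] + 0.1 * rng.standard_normal()

irf = []
for h in range(9):                                  # horizons 0..8
    x = np.column_stack([np.ones(T - h), shock[:T - h]])
    beta, *_ = np.linalg.lstsq(x, y[h:], rcond=None)
    irf.append(beta[1])                             # local-projection IRF at horizon h
print(np.round(irf, 3))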

arXiv link: http://arxiv.org/abs/2006.14047v1

Econometrics arXiv paper, submitted: 2020-06-24

Asset Prices and Capital Share Risks: Theory and Evidence

Authors: Joseph P. Byrne, Boulis M. Ibrahim, Xiaoyu Zong

An asset pricing model using long-run capital share growth risk has recently
been found to successfully explain U.S. stock returns. Our paper adopts a
recursive preference utility framework to derive a heterogeneous asset pricing
model with capital share risks. While modeling capital share risks, we account
for the elevated consumption volatility of high income stockholders. Capital
risks have strong volatility effects in our recursive asset pricing model.
Empirical evidence is presented in which capital share growth is also a source
of risk for stock return volatility. We uncover contrasting unconditional and
conditional asset pricing evidence for capital share risks.

arXiv link: http://arxiv.org/abs/2006.14023v1

Econometrics arXiv cross-link from cs.CY (cs.CY), submitted: 2020-06-24

Interdependence in active mobility adoption: Joint modelling and motivational spill-over in walking, cycling and bike-sharing

Authors: M Said, A Biehl, A Stathopoulos

Active mobility offers an array of physical, emotional, and social wellbeing
benefits. However, with the proliferation of the sharing economy, new
nonmotorized means of transport are entering the fold, complementing some
existing mobility options while competing with others. The purpose of this
research study is to investigate the adoption of three active travel modes,
namely walking, cycling and bikesharing, in a joint modeling framework. The
analysis is based on an adaptation of the stages of change framework, which
originates from the health behavior sciences. Multivariate ordered probit
modeling drawing on U.S. survey data provides much-needed insights into
individuals' preparedness to adopt multiple active modes as a function of
personal, neighborhood and psychosocial factors. The research suggests three
important findings. 1) The joint model structure confirms interdependence among
different active mobility choices. The strongest complementarity is found for
walking and cycling adoption. 2) Each mode has a distinctive adoption path with
either three or four separate stages. We discuss the implications of derived
stage-thresholds and plot adoption contours for selected scenarios. 3)
Psychological and neighborhood variables generate more coupling among active
modes than individual and household factors. Specifically, identifying strongly
with active mobility aspirations, experiences with multimodal travel,
possessing better navigational skills, along with supportive local community
norms are the factors that appear to drive the joint adoption decisions. This
study contributes to the understanding of how decisions within the same
functional domain are related and helps to design policies that promote active
mobility by identifying positive spillovers and joint determinants.

arXiv link: http://arxiv.org/abs/2006.16920v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-06-24

Unified Principal Component Analysis for Sparse and Dense Functional Data under Spatial Dependency

Authors: Haozhe Zhang, Yehua Li

We consider spatially dependent functional data collected under a
geostatistics setting, where locations are sampled from a spatial point
process. The functional response is the sum of a spatially dependent functional
effect and a spatially independent functional nugget effect. Observations on
each function are made on discrete time points and contaminated with
measurement errors. Under the assumption of spatial stationarity and isotropy,
we propose a tensor product spline estimator for the spatio-temporal covariance
function. When a coregionalization covariance structure is further assumed, we
propose a new functional principal component analysis method that borrows
information from neighboring functions. The proposed method also generates
nonparametric estimators for the spatial covariance functions, which can be
used for functional kriging. Under a unified framework for sparse and dense
functional data, infill and increasing domain asymptotic paradigms, we develop
the asymptotic convergence rates for the proposed estimators. Advantages of the
proposed approach are demonstrated through simulation studies and two real data
applications representing sparse and dense functional data, respectively.

arXiv link: http://arxiv.org/abs/2006.13489v2

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2020-06-24

Design and Evaluation of Personalized Free Trials

Authors: Hema Yoganarasimhan, Ebrahim Barzegary, Abhishek Pani

Free trial promotions, where users are given a limited time to try the
product for free, are a commonly used customer acquisition strategy in the
Software as a Service (SaaS) industry. We examine how trial length affects
users' responsiveness, and seek to quantify the gains from personalizing the
length of the free trial promotions. Our data come from a large-scale field
experiment conducted by a leading SaaS firm, where new users were randomly
assigned to 7, 14, or 30 days of free trial. First, we show that offering the
7-day trial to all consumers is the best uniform policy, with a 5.59% increase in
subscriptions. Next, we develop a three-pronged framework for personalized
policy design and evaluation. Using our framework, we develop seven
personalized targeting policies based on linear regression, lasso, CART, random
forest, XGBoost, causal tree, and causal forest, and evaluate their
performances using the Inverse Propensity Score (IPS) estimator. We find that
the personalized policy based on lasso performs the best, followed by the one
based on XGBoost. In contrast, policies based on causal tree and causal forest
perform poorly. We then link a method's effectiveness in designing policy with
its ability to personalize the treatment sufficiently without over-fitting
(i.e., capturing spurious heterogeneity). Next, we segment consumers based on
their optimal trial length and derive some substantive insights on the drivers
of user behavior in this context. Finally, we show that policies designed to
maximize short-run conversions also perform well on long-run outcomes such as
consumer loyalty and profitability.
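
The evaluation step can be illustrated in a few lines of code: the inverse
propensity score (IPS) estimator below scores a candidate uniform policy on
simulated randomised data. The arm probabilities, conversion rates and the toy
policy are hypothetical.

import numpy as np

rng = np.random.default_rng(4)
n = 10_000
arms = np.array([7, 14, 30])
assigned = rng.choice(arms, size=n)                 # randomised trial length
propensity = np.full(n, 1.0 / arms.size)            # equal assignment probabilities
converted = rng.binomial(1, 0.06 + 0.02 * (assigned == 7))

def ips_value(policy_choice):
    """IPS estimate of the conversion rate under a candidate policy."""
    match = (policy_choice == assigned).astype(float)
    return float(np.mean(match * converted / propensity))

print(round(ips_value(np.full(n, 7)), 4))           # value of "7 days for everyone"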

arXiv link: http://arxiv.org/abs/2006.13420v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2020-06-23

Bootstrapping $\ell_p$-Statistics in High Dimensions

Authors: Alexander Giessing, Jianqing Fan

This paper considers a new bootstrap procedure to estimate the distribution
of high-dimensional $\ell_p$-statistics, i.e. the $\ell_p$-norms of the sum of
$n$ independent $d$-dimensional random vectors with $d \gg n$ and $p \in [1,
\infty]$. We provide a non-asymptotic characterization of the sampling
distribution of $\ell_p$-statistics based on Gaussian approximation and show
that the bootstrap procedure is consistent in the Kolmogorov-Smirnov distance
under mild conditions on the covariance structure of the data. As an
application of the general theory we propose a bootstrap hypothesis test for
simultaneous inference on high-dimensional mean vectors. We establish its
asymptotic correctness and consistency under high-dimensional alternatives, and
discuss the power of the test as well as the size of associated confidence
sets. We illustrate the bootstrap and testing procedure numerically on
simulated data.
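
A compact sketch of the procedure for the l-infinity case is given below, using
Gaussian multipliers on the centred observations; the dimensions are chosen only
so that d exceeds n, as in the paper's regime, and are otherwise arbitrary.

import numpy as np

rng = np.random.default_rng(5)
n, d = 100, 500
X = rng.standard_normal((n, d))
Xbar = X.mean(axis=0)
stat = np.linalg.norm(np.sqrt(n) * Xbar, ord=np.inf)   # l-infinity statistic

centered = X - Xbar
boot = np.empty(999)
for b in range(boot.size):
    e = rng.standard_normal(n)                          # Gaussian multipliers
    boot[b] = np.linalg.norm(centered.T @ e / np.sqrt(n), ord=np.inf)

print(round(stat, 2), round(float(np.quantile(boot, 0.95)), 2))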

arXiv link: http://arxiv.org/abs/2006.13099v3

Econometrics arXiv updated paper (originally submitted: 2020-06-23)

The Macroeconomy as a Random Forest

Authors: Philippe Goulet Coulombe

I develop Macroeconomic Random Forest (MRF), an algorithm adapting the
canonical Machine Learning (ML) tool to flexibly model evolving parameters in a
linear macro equation. Its main output, Generalized Time-Varying Parameters
(GTVPs), is a versatile device nesting many popular nonlinearities
(threshold/switching, smooth transition, structural breaks/change) and allowing
for sophisticated new ones. The approach delivers clear forecasting gains over
numerous alternatives, predicts the 2008 drastic rise in unemployment, and
performs well for inflation. Unlike most ML-based methods, MRF is directly
interpretable -- via its GTVPs. For instance, the successful unemployment
forecast is due to the influence of forward-looking variables (e.g., term
spreads, housing starts) nearly doubling before every recession. Interestingly,
the Phillips curve has indeed flattened, and its might is highly cyclical.

arXiv link: http://arxiv.org/abs/2006.12724v3

Econometrics arXiv paper, submitted: 2020-06-22

Locally trimmed least squares: conventional inference in possibly nonstationary models

Authors: Zhishui Hu, Ioannis Kasparis, Qiying Wang

A novel IV estimation method, which we term Locally Trimmed LS (LTLS), is
developed that yields estimators with (mixed) Gaussian limit distributions in
situations where the data may be weakly or strongly persistent. In particular,
we allow for nonlinear predictive-type regressions where the regressor can be a
stationary short/long memory process, a nonstationary long memory process, or
a nearly integrated array. The resultant t-tests have conventional limit
distributions (i.e. N(0,1)) free of (near-to-unity and long-memory) nuisance
parameters. In the case where the regressor is a fractional process, no
preliminary estimator for the memory parameter is required. Therefore, the
practitioner can conduct inference while being agnostic about the exact
dependence structure in the data. The LTLS estimator is obtained by applying
certain chronological trimming to the OLS instrument via the utilisation of
appropriate kernel functions of time trend variables. The finite-sample
performance of LTLS-based t-tests is investigated with the aid of a simulation
experiment. An empirical application to the predictability of stock returns is
also provided.

arXiv link: http://arxiv.org/abs/2006.12595v1

Econometrics arXiv updated paper (originally submitted: 2020-06-22)

A Pipeline for Variable Selection and False Discovery Rate Control With an Application in Labor Economics

Authors: Sophie-Charlotte Klose, Johannes Lederer

We introduce tools for controlled variable selection to economists. In
particular, we apply a recently introduced aggregation scheme for false
discovery rate (FDR) control to German administrative data to determine the
parts of the individual employment histories that are relevant for the career
outcomes of women. Our results suggest that career outcomes can be predicted
based on a small set of variables, such as daily earnings, wage increases in
combination with a high level of education, employment status, and working
experience.

arXiv link: http://arxiv.org/abs/2006.12296v2

Econometrics arXiv paper, submitted: 2020-06-22

Vocational Training Programs and Youth Labor Market Outcomes: Evidence from Nepal

Authors: S. Chakravarty, M. Lundberg, P. Nikolov, J. Zenker

Lack of skills is arguably one of the most important determinants of high
levels of unemployment and poverty. In response, policymakers often initiate
vocational training programs in an effort to enhance skill formation among the
youth. Using a regression-discontinuity design, we examine a large youth
training intervention in Nepal. We find, twelve months after the start of the
training program, that the intervention generated an increase in non-farm
employment of 10 percentage points (ITT estimates) and up to 31 percentage
points for program compliers (LATE estimates). We also detect sizeable gains in
monthly earnings. Women who start self-employment activities inside their homes
largely drive these impacts. We argue that low baseline levels of education
and non-farm employment, together with Nepal's social and cultural norms
towards women, drive our large program impacts. Our results suggest that the
program enables
otherwise underemployed women to earn an income while staying at home - close
to household errands and in line with the socio-cultural norms that prevent
them from taking up employment outside the house.

arXiv link: http://arxiv.org/abs/2006.13036v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-06-22

Unified Discrete-Time Factor Stochastic Volatility and Continuous-Time Ito Models for Combining Inference Based on Low-Frequency and High-Frequency

Authors: Donggyu Kim, Xinyu Song, Yazhen Wang

This paper introduces unified models for high-dimensional factor-based Ito
processes, which can accommodate both continuous-time Ito diffusions and
discrete-time stochastic volatility (SV) models by embedding the discrete SV
model in the continuous instantaneous factor volatility process. We call it the
SV-Ito model. Based on the series of daily integrated factor volatility matrix
estimators, we propose quasi-maximum likelihood and least squares estimation
methods. Their asymptotic properties are established. We apply the proposed
method to predict the future vast volatility matrix, whose asymptotic behavior is
studied. A simulation study is conducted to check the finite sample performance
of the proposed estimation and prediction method. An empirical analysis is
carried out to demonstrate the advantage of the SV-Ito model in volatility
prediction and portfolio allocation problems.

arXiv link: http://arxiv.org/abs/2006.12039v1

Econometrics arXiv paper, submitted: 2020-06-20

Mitigating Bias in Online Microfinance Platforms: A Case Study on Kiva.org

Authors: Soumajyoti Sarkar, Hamidreza Alvari

Over the last couple of decades in the lending industry, financial
disintermediation has occurred on a global scale. Traditionally, even for a
small supply of funds, banks would act as the conduit between the funds and the
borrowers. It has now become possible to overcome some of the obstacles
associated with such supply of funds with the advent of online platforms such
as Kiva, Prosper, and LendingClub. Kiva, for example, works with Micro Finance
Institutions (MFIs) in developing countries to build Internet profiles of
borrowers with a brief biography, loan requested, loan term, and purpose. Kiva,
in particular, allows lenders to fund projects in different sectors through
group or individual funding. Traditional research studies have investigated
various factors behind lender preferences purely from the perspective of loan
attributes, and only recently have some cross-country cultural preferences
been investigated. In this paper, we investigate lender perceptions of economic
factors of the borrower countries in relation to their preferences towards
loans associated with different sectors. We find that the influence from
economic factors and loan attributes can have substantially different roles to
play for different sectors in achieving faster funding. We formally investigate
and quantify the hidden biases prevalent in different loan sectors using recent
tools from causal inference and regression models that rely on Bayesian
variable selection methods. We then extend these models to incorporate fairness
constraints based on our empirical analysis and find that such models can still
achieve near comparable results with respect to baseline regression models.

arXiv link: http://arxiv.org/abs/2006.12995v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-06-19

Valid Causal Inference with (Some) Invalid Instruments

Authors: Jason Hartford, Victor Veitch, Dhanya Sridhar, Kevin Leyton-Brown

Instrumental variable methods provide a powerful approach to estimating
causal effects in the presence of unobserved confounding. But a key challenge
when applying them is the reliance on untestable "exclusion" assumptions that
rule out any relationship between the instrumental variable and the response that
is not mediated by the treatment. In this paper, we show how to perform
consistent IV estimation despite violations of the exclusion assumption. In
particular, we show that when one has multiple candidate instruments, only a
majority of these candidates---or, more generally, the modal candidate-response
relationship---needs to be valid to estimate the causal effect. Our approach
uses an estimate of the modal prediction from an ensemble of instrumental
variable estimators. The technique is simple to apply and is "black-box" in the
sense that it may be used with any instrumental variable estimator as long as
the treatment effect is identified for each valid instrument independently. As
such, it is compatible with recent machine-learning based estimators that allow
for the estimation of conditional average treatment effects (CATE) on complex,
high dimensional data. Experimentally, we achieve accurate estimates of
conditional average treatment effects using an ensemble of deep network-based
estimators, including on a challenging simulated Mendelian Randomization
problem.
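
A stylised sketch of the modal idea: estimate the effect separately with each
candidate instrument and take the mode of the estimates, which concentrates on
the truth when enough instruments are valid. The just-identified IV formula and
the KDE-based mode below stand in for the ensemble of machine-learning IV
estimators used in the paper; the data-generating process is made up.

import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(6)
n, true_effect = 5_000, 1.0
u = rng.standard_normal(n)                             # unobserved confounder
Z = rng.standard_normal((n, 7))                        # 7 candidate instruments
d = Z @ np.full(7, 0.5) + u + rng.standard_normal(n)   # treatment
direct = np.array([0, 0, 0, 0, 2.0, -1.5, 1.0])        # last three violate exclusion
y = true_effect * d + Z @ direct + u + rng.standard_normal(n)

estimates = np.array([(z @ y) / (z @ d) for z in Z.T]) # one just-identified IV per candidate
grid = np.linspace(estimates.min(), estimates.max(), 1_000)
mode = grid[np.argmax(gaussian_kde(estimates)(grid))]
print(np.round(estimates, 2), round(float(mode), 2))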

arXiv link: http://arxiv.org/abs/2006.11386v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-06-19

Do Methodological Birds of a Feather Flock Together?

Authors: Carrie E. Fry, Laura A. Hatfield

Quasi-experimental methods have proliferated over the last two decades, as
researchers develop causal inference tools for settings in which randomization
is infeasible. Two popular such methods, difference-in-differences (DID) and
comparative interrupted time series (CITS), compare observations before and
after an intervention in a treated group to an untreated comparison group
observed over the same period. Both methods rely on strong, untestable
counterfactual assumptions. Despite their similarities, the methodological
literature on CITS lacks the mathematical formality of DID. In this paper, we
use the potential outcomes framework to formalize two versions of CITS - a
general version described by Bloom (2005) and a linear version often used in
health services research. We then compare these to two corresponding DID
formulations - one with time fixed effects and one with time fixed effects and
group trends. We also re-analyze three previously published studies using these
methods. We demonstrate that the most general versions of CITS and DID impute
the same counterfactuals and estimate the same treatment effects. The only
difference between these two designs is the language used to describe them and
their popularity in distinct disciplines. We also show that these designs
diverge when one constrains them using linearity (CITS) or parallel trends
(DID). We recommend defaulting to the more flexible versions and provide advice
to practitioners on choosing between the more constrained versions by
considering the data-generating mechanism. We also recommend greater attention
to specifying the outcome model and counterfactuals in papers, allowing for
transparent evaluation of the plausibility of causal assumptions.

arXiv link: http://arxiv.org/abs/2006.11346v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-06-19

Proper scoring rules for evaluating asymmetry in density forecasting

Authors: Matteo Iacopini, Francesco Ravazzolo, Luca Rossini

This paper proposes a novel asymmetric continuous probabilistic score (ACPS)
for evaluating and comparing density forecasts. The score is then extended to a
weighted version, which emphasizes regions of interest, such as
the tails or the center of a variable's range. A test is also introduced to
statistically compare the predictive ability of different forecasts. The ACPS
is of general use in any situation where the decision maker has asymmetric
preferences in the evaluation of the forecasts. In an artificial experiment,
the implications of varying the level of asymmetry in the ACPS are illustrated.
Then, the proposed score and test are applied to assess and compare density
forecasts of macroeconomically relevant datasets (US employment growth) and of
commodity prices (oil and electricity prices) with particular focus on the
recent COVID-19 crisis period.

arXiv link: http://arxiv.org/abs/2006.11265v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-06-19

Sparse Quantile Regression

Authors: Le-Yu Chen, Sokbae Lee

We consider both $\ell _{0}$-penalized and $\ell _{0}$-constrained quantile
regression estimators. For the $\ell _{0}$-penalized estimator, we derive an
exponential inequality on the tail probability of excess quantile prediction
risk and apply it to obtain non-asymptotic upper bounds on the mean-square
parameter and regression function estimation errors. We also derive analogous
results for the $\ell _{0}$-constrained estimator. The resulting rates of
convergence are nearly minimax-optimal and the same as those for $\ell
_{1}$-penalized and non-convex penalized estimators. Further, we characterize
expected Hamming loss for the $\ell _{0}$-penalized estimator. We implement the
proposed procedure via mixed integer linear programming and also a more
scalable first-order approximation algorithm. We illustrate the finite-sample
performance of our approach in Monte Carlo experiments and its usefulness in a
real data application concerning conformal prediction of infant birth weights
(with $n\approx 10^{3}$ and up to $p>10^{3}$). In sum, our $\ell _{0}$-based
method produces a much sparser estimator than the $\ell _{1}$-penalized and
non-convex penalized approaches without compromising precision.

arXiv link: http://arxiv.org/abs/2006.11201v4

Econometrics arXiv updated paper (originally submitted: 2020-06-19)

On the Time Trend of COVID-19: A Panel Data Study

Authors: Chaohua Dong, Jiti Gao, Oliver Linton, Bin Peng

In this paper, we study the trending behaviour of COVID-19 data at country
level, and draw attention to some existing econometric tools which are
potentially helpful to understand the trend better in future studies. In our
empirical study, we find that European countries overall flatten the curves
more effectively than the other regions, while Asia & Oceania also achieve
some success; the situation is less optimistic elsewhere.
Africa and America are still facing serious challenges in terms of managing the
spread of the virus, and reducing the death rate, although in Africa the virus
spreads slower and has a lower death rate than the other regions. By comparing
the performances of different countries, our results incidentally agree with Gu
et al. (2020), though different approaches and models are considered. For
example, both works agree that countries such as USA, UK and Italy perform
relatively poorly; on the other hand, Australia, China, Japan, Korea, and
Singapore perform relatively better.

arXiv link: http://arxiv.org/abs/2006.11060v2

Econometrics arXiv paper, submitted: 2020-06-18

COVID-19 response needs to broaden financial inclusion to curb the rise in poverty

Authors: Mostak Ahamed, Roxana Gutiérrez-Romero

The ongoing COVID-19 pandemic risks wiping out years of progress made in
reducing global poverty. In this paper, we explore to what extent financial
inclusion could help mitigate the increase in poverty using cross-country data
across 78 low- and lower-middle-income countries. Unlike other recent
cross-country studies, we show that financial inclusion is a key driver of
poverty reduction in these countries. This effect is not direct, but indirect,
by mitigating the detrimental effect that inequality has on poverty. Our
findings are consistent across all the different measures of poverty used. Our
forecasts suggest that the share of the world's population living on less
than $1.90 per day could increase from 8% to 14% by 2021, pushing nearly 400
million people into
poverty. However, urgent improvements in financial inclusion could
substantially reduce the impact on poverty.

arXiv link: http://arxiv.org/abs/2006.10706v1

Econometrics arXiv paper, submitted: 2020-06-18

Conflict in Africa during COVID-19: social distancing, food vulnerability and welfare response

Authors: Roxana Gutiérrez-Romero

We study the effect of social distancing, food vulnerability, welfare and
labour COVID-19 policy responses on riots, violence against civilians and
food-related conflicts. Our analysis uses georeferenced data for 24 African
countries with monthly local prices and real-time conflict data reported in the
Armed Conflict Location and Event Data Project (ACLED) from January 2015 until
early May 2020. Lockdowns and recent welfare policies have been implemented in
light of COVID-19, but in some contexts also likely in response to ongoing
conflicts. To mitigate the potential risk of endogeneity, we use instrumental
variables. We exploit the exogeneity of global commodity prices, and three
variables that increase the risk of COVID-19 and efficiency in response, such
as countries' colonial heritage, the male mortality rate attributed to air
pollution, and the prevalence of diabetes in adults. We find that the probability of
experiencing riots, violence against civilians, food-related conflicts and food
looting has increased since lockdowns. Food vulnerability has been a
contributing factor. A 10% increase in the local price index is associated with
an increase of 0.7 percentage points in violence against civilians.
Nonetheless, for every additional anti-poverty measure implemented in response
to COVID-19 the probability of experiencing violence against civilians, riots
and food-related conflicts declines by approximately 0.2 percentage points.
These anti-poverty measures also reduce the number of fatalities associated
with these conflicts. Overall, our findings reveal that food vulnerability has
increased conflict risks, but also offer an optimistic view of the importance
of the state in providing an extensive welfare safety net.

arXiv link: http://arxiv.org/abs/2006.10696v1

Econometrics arXiv updated paper (originally submitted: 2020-06-18)

Sparse HP Filter: Finding Kinks in the COVID-19 Contact Rate

Authors: Sokbae Lee, Yuan Liao, Myung Hwan Seo, Youngki Shin

In this paper, we estimate the time-varying COVID-19 contact rate of a
Susceptible-Infected-Recovered (SIR) model. Our measurement of the contact rate
is constructed using data on actively infected, recovered and deceased cases.
We propose a new trend filtering method that is a variant of the
Hodrick-Prescott (HP) filter, constrained by the number of possible kinks. We
term it the sparse HP filter and apply it to daily data from five countries:
Canada, China, South Korea, the UK and the US. Our new method yields kinks
that are well aligned with actual events in each country. We find that the
sparse HP filter provides fewer kinks than the $\ell_1$ trend filter, while
both methods fit the data equally well. Theoretically, we establish risk
consistency of both the sparse HP and $\ell_1$ trend filters. Ultimately, we
propose to use time-varying contact growth rates to document and monitor
outbreaks of COVID-19.
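
For comparison purposes, the snippet below fits the related $\ell_1$ trend
filter (penalising the $\ell_1$ norm of second differences) to a noisy
piecewise-linear series. It is the benchmark mentioned above, not the authors'
kink-constrained sparse HP filter; the cvxpy formulation and the penalty value
are illustrative.

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(7)
T = 120
truth = np.concatenate([np.linspace(0, 3, 60), np.linspace(3, 1, 60)])  # one kink
y = truth + 0.3 * rng.standard_normal(T)

D = np.diff(np.eye(T), n=2, axis=0)            # second-difference matrix
x = cp.Variable(T)
lam = 20.0
cp.Problem(cp.Minimize(cp.sum_squares(y - x) + lam * cp.norm1(D @ x))).solve()
print(np.round(x.value[[0, 59, 119]], 2))      # fitted trend at start, kink, end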

arXiv link: http://arxiv.org/abs/2006.10555v2

Econometrics arXiv paper, submitted: 2020-06-18

Approximate Maximum Likelihood for Complex Structural Models

Authors: Veronika Czellar, David T. Frazier, Eric Renault

Indirect Inference (I-I) is a popular technique for estimating complex
parametric models whose likelihood function is intractable; however, the
statistical efficiency of I-I estimation is questionable. While the efficient
method of moments, Gallant and Tauchen (1996), promises efficiency, the price
to pay for this efficiency is a loss of parsimony and thereby a potential lack
of robustness to model misspecification. This stands in contrast to simpler I-I
estimation strategies, which are known to display less sensitivity to model
misspecification precisely due to their focus on specific elements of the
underlying structural model. In this research, we propose a new
simulation-based approach that maintains the parsimony of I-I estimation, which
is often critical in empirical applications, but can also deliver estimators
that are nearly as efficient as maximum likelihood. This new approach is based
on using a constrained approximation to the structural model, which ensures
identification and can deliver estimators that are nearly efficient. We
demonstrate this approach through several examples, and show that this approach
can deliver estimators that are nearly as efficient as maximum likelihood, when
feasible, but can be employed in many situations where maximum likelihood is
infeasible.

arXiv link: http://arxiv.org/abs/2006.10245v1

Econometrics arXiv updated paper (originally submitted: 2020-06-17)

Flexible Mixture Priors for Large Time-varying Parameter Models

Authors: Niko Hauzenberger

Time-varying parameter (TVP) models often assume that the TVPs evolve
according to a random walk. This assumption, however, might be questionable
since it implies that coefficients change smoothly and in an unbounded manner.
In this paper, we relax this assumption by proposing a flexible law of motion
for the TVPs in large-scale vector autoregressions (VARs). Instead of imposing
a restrictive random walk evolution of the latent states, we carefully design
hierarchical mixture priors on the coefficients in the state equation. These
priors effectively allow for discriminating between periods where coefficients
evolve according to a random walk and times where the TVPs are better
characterized by a stationary stochastic process. Moreover, this approach is
capable of introducing dynamic sparsity by pushing small parameter changes
towards zero if necessary. The merits of the model are illustrated by means of
two applications. Using synthetic data we show that our approach yields precise
parameter estimates. When applied to US data, the model reveals interesting
patterns of low-frequency dynamics in coefficients and forecasts well relative
to a wide range of competing models.

arXiv link: http://arxiv.org/abs/2006.10088v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-06-17

Using Experiments to Correct for Selection in Observational Studies

Authors: Susan Athey, Raj Chetty, Guido Imbens

Researchers increasingly have access to two types of data: (i) large
observational datasets where treatment (e.g., class size) is not randomized but
several primary outcomes (e.g., graduation rates) and secondary outcomes (e.g.,
test scores) are observed and (ii) experimental data in which treatment is
randomized but only secondary outcomes are observed. We develop a new method to
estimate treatment effects on primary outcomes in such settings. We use the
difference between the secondary outcome and its predicted value based on the
experimental treatment effect to measure selection bias in the observational
data. Controlling for this estimate of selection bias yields an unbiased
estimate of the treatment effect on the primary outcome under a new assumption
that we term latent unconfoundedness, which requires that the same confounders
affect the primary and secondary outcomes. Latent unconfoundedness weakens the
assumptions underlying commonly used surrogate estimators. We apply our
estimator to identify the effect of third-grade class size on students'
outcomes. Estimated impacts on test scores using OLS regressions in
observational school district data have the opposite sign of estimates from the
Tennessee STAR experiment. In contrast, selection-corrected estimates in the
observational data replicate the experimental estimates. Our estimator reveals
that reducing class sizes by 25% increases high school graduation rates by 0.7
percentage points. Controlling for observables does not change the OLS
estimates, demonstrating that experimental selection correction can remove
biases that cannot be addressed with standard controls.
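
A stylised linear sketch of the correction is given below: the gap between the
secondary outcome and its experimentally predicted value serves as a control
when regressing the primary outcome on treatment in observational data. The
data-generating process, the treatment effect assumed known from the
experiment, and the plain OLS implementation are illustrative simplifications,
not the paper's estimator.

import numpy as np

rng = np.random.default_rng(8)
n = 20_000
confounder = rng.standard_normal(n)
w = (0.8 * confounder + rng.standard_normal(n) > 0).astype(float)  # non-random treatment
tau_primary, tau_secondary = 1.0, 0.5
secondary = tau_secondary * w + confounder + 0.1 * rng.standard_normal(n)
primary = tau_primary * w + confounder + 0.5 * rng.standard_normal(n)

def ols(columns, y):
    X = np.column_stack([np.ones(n)] + columns)
    return np.linalg.lstsq(X, y, rcond=None)[0]

naive = ols([w], primary)[1]                         # biased by confounding
# tau_secondary is treated as known from the experimental sample.
selection_control = secondary - tau_secondary * w    # estimated selection bias
corrected = ols([w, selection_control], primary)[1]
print(round(naive, 2), round(corrected, 2))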

arXiv link: http://arxiv.org/abs/2006.09676v2

Econometrics arXiv updated paper (originally submitted: 2020-06-17)

Adaptive, Rate-Optimal Hypothesis Testing in Nonparametric IV Models

Authors: Christoph Breunig, Xiaohong Chen

We propose a new adaptive hypothesis test for inequality (e.g., monotonicity,
convexity) and equality (e.g., parametric, semiparametric) restrictions on a
structural function in a nonparametric instrumental variables (NPIV) model. Our
test statistic is based on a modified leave-one-out sample analog of a
quadratic distance between the restricted and unrestricted sieve two-stage
least squares estimators. We provide computationally simple, data-driven
choices of sieve tuning parameters and Bonferroni adjusted chi-squared critical
values. Our test adapts to the unknown smoothness of alternative functions in
the presence of unknown degree of endogeneity and unknown strength of the
instruments. It attains the adaptive minimax rate of testing in $L^{2}$. That
is, the sum of the supremum of type I error over the composite null and the
supremum of type II error over nonparametric alternative models cannot be
minimized by any other tests for NPIV models of unknown regularities.
Confidence sets in $L^{2}$ are obtained by inverting the adaptive test.
Simulations confirm that, across different strengths of instruments and sample
sizes, our adaptive test controls size and its finite-sample power greatly
exceeds existing non-adaptive tests for monotonicity and parametric
restrictions in NPIV models. Empirical applications to test for shape
restrictions of differentiated products demand and of Engel curves are
presented.

arXiv link: http://arxiv.org/abs/2006.09587v6

Econometrics arXiv updated paper (originally submitted: 2020-06-16)

Measuring Macroeconomic Uncertainty: The Labor Channel of Uncertainty from a Cross-Country Perspective

Authors: Andreas Dibiasi, Samad Sarferaz

This paper constructs internationally consistent measures of macroeconomic
uncertainty. Our econometric framework extracts uncertainty from revisions in
data obtained from standardized national accounts. Applying our model to
post-WWII real-time data, we estimate macroeconomic uncertainty for 39
countries. The cross-country dimension of our uncertainty data allows us to
study the impact of uncertainty shocks under different employment protection
legislation. Our empirical findings suggest that the effects of uncertainty
shocks are stronger and more persistent in countries with low employment
protection compared to countries with high employment protection. These
empirical findings are in line with a theoretical model under varying firing
costs.

arXiv link: http://arxiv.org/abs/2006.09007v2

Econometrics arXiv paper, submitted: 2020-06-14

Nonparametric Tests of Tail Behavior in Stochastic Frontier Models

Authors: William C. Horrace, Yulong Wang

This article studies tail behavior for the error components in the stochastic
frontier model, where one component has bounded support on one side, and the
other has unbounded support on both sides. Under weak assumptions on the error
components, we derive nonparametric tests that the unbounded component
distribution has thin tails and that the component tails are equivalent. The
tests are useful diagnostic tools for stochastic frontier analysis. A
simulation study and an application to a stochastic cost frontier for 6,100 US
banks from 1998 to 2005 are provided. The new tests reject the normal or
Laplace distributional assumptions, which are commonly imposed in the existing
literature.

arXiv link: http://arxiv.org/abs/2006.07780v1

Econometrics arXiv updated paper (originally submitted: 2020-06-13)

Synthetic Interventions

Authors: Anish Agarwal, Devavrat Shah, Dennis Shen

The synthetic controls (SC) methodology is a prominent tool for policy
evaluation in panel data applications. Researchers commonly justify the SC
framework with a low-rank matrix factor model that assumes the potential
outcomes are described by low-dimensional unit and time specific latent
factors. In the recent work of [Abadie '20], one of the pioneering authors of
the SC method posed the question of how the SC framework can be extended to
multiple treatments. This article offers one resolution to this open question
that we call synthetic interventions (SI). Fundamental to the SI framework is a
low-rank tensor factor model, which extends the matrix factor model by
including a latent factorization over treatments. Under this model, we propose
a generalization of the standard SC-based estimators. We prove the consistency
for one instantiation of our approach and provide conditions under which it is
asymptotically normal. Moreover, we conduct a representative simulation to
study its prediction performance and revisit the canonical SC case study of
[Abadie-Diamond-Hainmueller '10] on the impact of anti-tobacco legislation by
exploring related questions not previously investigated.
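
The standard SC building block that the SI framework generalises can be
sketched in a few lines: convex weights on donor units are chosen to match the
treated unit's pre-treatment path and then used to extrapolate the
counterfactual. The toy factor-model panel and the cvxpy solver below are
illustrative choices.

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(9)
T0, T1, J = 30, 10, 12                               # pre/post periods, donor units
factors = rng.standard_normal((T0 + T1, 2))
loadings = rng.standard_normal((2, J + 1))
panel = factors @ loadings + 0.1 * rng.standard_normal((T0 + T1, J + 1))
treated, donors = panel[:, 0].copy(), panel[:, 1:]
treated[T0:] += 2.0                                  # treatment effect of 2 after T0

w = cp.Variable(J, nonneg=True)
fit = cp.Minimize(cp.sum_squares(treated[:T0] - donors[:T0] @ w))
cp.Problem(fit, [cp.sum(w) == 1]).solve()
synthetic = donors @ w.value
print(round(float(np.mean(treated[T0:] - synthetic[T0:])), 2))  # estimated effect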

arXiv link: http://arxiv.org/abs/2006.07691v7

Econometrics arXiv updated paper (originally submitted: 2020-06-13)

Horseshoe Prior Bayesian Quantile Regression

Authors: David Kohns, Tibor Szendrei

This paper extends the horseshoe prior of Carvalho et al. (2010) to Bayesian
quantile regression (HS-BQR) and provides a fast sampling algorithm for
computation in high dimensions. The performance of the proposed HS-BQR is
evaluated on Monte Carlo simulations and a high dimensional Growth-at-Risk
(GaR) forecasting application for the U.S. The Monte Carlo design considers
several sparsity and error structures. Compared to alternative shrinkage
priors, the proposed HS-BQR yields better (or at worst similar) performance in
coefficient bias and forecast error. The HS-BQR is particularly potent in
sparse designs and in estimating extreme quantiles. As expected, the
simulations also highlight that identifying quantile specific location and
scale effects for individual regressors in dense DGPs requires substantial
data. In the GaR application, we forecast tail risks as well as complete
forecast densities using the McCracken and Ng (2020) database. Quantile
specific and density calibration score functions show that the HS-BQR provides
the best performance, especially at short and medium run horizons. The ability
to produce well calibrated density forecasts and accurate downside risk
measures in large data contexts makes the HS-BQR a promising tool for
nowcasting applications and recession modelling.

arXiv link: http://arxiv.org/abs/2006.07655v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2020-06-12

Detangling robustness in high dimensions: composite versus model-averaged estimation

Authors: Jing Zhou, Gerda Claeskens, Jelena Bradic

Robust methods, though ubiquitous in practice, are yet to be fully understood
in the context of regularized estimation and high dimensions. Even simple
questions become challenging very quickly. For example, classical statistical
theory identifies equivalence between model-averaged and composite quantile
estimation. However, little to nothing is known about such equivalence between
methods that encourage sparsity. This paper provides a toolbox to further study
robustness in these settings and focuses on prediction. In particular, we study
optimally weighted model-averaged as well as composite $l_1$-regularized
estimation. Optimal weights are determined by minimizing the asymptotic mean
squared error. This approach incorporates the effects of regularization,
without the assumption of perfect selection, as is often used in practice. Such
weights are then optimal for prediction quality. Through an extensive
simulation study, we show that no single method systematically outperforms
others. We find, however, that model-averaged and composite quantile estimators
often outperform least-squares methods, even in the case of Gaussian model
noise. Real data application witnesses the method's practical use through the
reconstruction of compressed audio signals.

arXiv link: http://arxiv.org/abs/2006.07457v1

Econometrics arXiv paper, submitted: 2020-06-12

Minimax Estimation of Conditional Moment Models

Authors: Nishanth Dikkala, Greg Lewis, Lester Mackey, Vasilis Syrgkanis

We develop an approach for estimating models described via conditional moment
restrictions, with a prototypical application being non-parametric instrumental
variable regression. We introduce a min-max criterion function, under which the
estimation problem can be thought of as solving a zero-sum game between a
modeler who is optimizing over the hypothesis space of the target model and an
adversary who identifies violating moments over a test function space. We
analyze the statistical estimation rate of the resulting estimator for
arbitrary hypothesis spaces, with respect to an appropriate analogue of the
mean squared error metric, for ill-posed inverse problems. We show that when
the minimax criterion is regularized with a second moment penalty on the test
function and the test function space is sufficiently rich, then the estimation
rate scales with the critical radius of the hypothesis and test function
spaces, a quantity which typically gives tight fast rates. Our main result
follows from a novel localized Rademacher analysis of statistical learning
problems defined via minimax objectives. We provide applications of our main
results for several hypothesis spaces used in practice such as: reproducing
kernel Hilbert spaces, high dimensional sparse linear functions, spaces defined
via shape constraints, ensemble estimators such as random forests, and neural
networks. For each of these applications we provide computationally efficient
optimization methods for solving the corresponding minimax problem (e.g.
stochastic first-order heuristics for neural networks). In several
applications, we show how our modified mean squared error rate, combined with
conditions that bound the ill-posedness of the inverse problem, leads to mean
squared error rates. We conclude with an extensive experimental analysis of the
proposed methods.

arXiv link: http://arxiv.org/abs/2006.07201v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-06-12

Seemingly Unrelated Regression with Measurement Error: Estimation via Markov chain Monte Carlo and Mean Field Variational Bayes Approximation

Authors: Georges Bresson, Anoop Chaturvedi, Mohammad Arshad Rahman, Shalabh

Linear regression with measurement error in the covariates is a heavily
studied topic; however, the statistics/econometrics literature is almost silent
on estimating a multi-equation model with measurement error. This paper
considers a seemingly unrelated regression model with measurement error in the
covariates and introduces two novel estimation methods: a pure Bayesian
algorithm (based on Markov chain Monte Carlo techniques) and its mean field
variational Bayes (MFVB) approximation. The MFVB method has the added advantage
of being computationally fast and can handle big data. An issue pertinent to
measurement error models is parameter identification, and this is resolved by
employing a prior distribution on the measurement error variance. The methods
are shown to perform well in multiple simulation studies, where we analyze the
impact on posterior estimates arising due to different values of reliability
ratio or variance of the true unobserved quantity used in the data generating
process. The paper further implements the proposed algorithms in an application
drawn from the health literature and shows that modeling measurement error in
the data can improve model fitting.

arXiv link: http://arxiv.org/abs/2006.07074v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2020-06-12

Confidence Interval for Off-Policy Evaluation from Dependent Samples via Bandit Algorithm: Approach from Standardized Martingales

Authors: Masahiro Kato

This study addresses the problem of off-policy evaluation (OPE) from
dependent samples obtained via the bandit algorithm. The goal of OPE is to
evaluate a new policy using historical data obtained from behavior policies
generated by the bandit algorithm. Because the bandit algorithm updates the
policy based on past observations, the samples are not independent and
identically distributed (i.i.d.). However, several existing methods for OPE do
not take this issue into account and are based on the assumption that samples
are i.i.d. In this study, we address this problem by constructing an estimator
from a standardized martingale difference sequence. To standardize the
sequence, we consider using evaluation data or sample splitting with a two-step
estimation. This technique produces an estimator with asymptotic normality
without restricting the class of behavior policies. In an experiment, the
proposed estimator performs better than existing methods, which assume that the
behavior policy converges to a time-invariant policy.

arXiv link: http://arxiv.org/abs/2006.06982v1

Econometrics arXiv paper, submitted: 2020-06-11

Confidence sets for dynamic poverty indexes

Authors: Guglielmo D'Amico, Riccardo De Blasis, Philippe Regnault

In this study, we extend the research on the dynamic poverty indexes, namely
the dynamic headcount ratio, the dynamic income-gap ratio, the dynamic Gini and
the dynamic Sen, proposed in D'Amico and Regnault (2018). The contribution is
twofold. First, we extend the computation of the dynamic Gini index, thus the
Sen index accordingly, with the inclusion of the inequality within each class
of poverty where people are classified according to their income. Second, for
each poverty index, we establish a central limit theorem that allows us to
construct confidence sets. An application to the Italian
income data from 1998 to 2012 confirms the effectiveness of the considered
approach and the possibility to determine the evolution of poverty and
inequality in real economies.
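
The two static building blocks behind these dynamic indexes are easy to
compute; the short sketch below evaluates the headcount ratio and the Gini
coefficient on a simulated income sample, with a hypothetical poverty line.

import numpy as np

rng = np.random.default_rng(10)
income = rng.lognormal(mean=9.5, sigma=0.8, size=5_000)
poverty_line = 8_000.0

headcount_ratio = float(np.mean(income < poverty_line))

x = np.sort(income)
n = x.size
gini = 2 * np.sum(np.arange(1, n + 1) * x) / (n * x.sum()) - (n + 1) / n

print(round(headcount_ratio, 3), round(gini, 3))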

arXiv link: http://arxiv.org/abs/2006.06595v1

Econometrics arXiv cross-link from cs.GT (cs.GT), submitted: 2020-06-11

Reserve Price Optimization for First Price Auctions

Authors: Zhe Feng, Sébastien Lahaie, Jon Schneider, Jinchao Ye

The display advertising industry has recently transitioned from second- to
first-price auctions as its primary mechanism for ad allocation and pricing. In
light of this, publishers need to re-evaluate and optimize their auction
parameters, notably reserve prices. In this paper, we propose a gradient-based
algorithm to adaptively update and optimize reserve prices based on estimates
of bidders' responsiveness to experimental shocks in reserves. Our key
innovation is to draw on the inherent structure of the revenue objective in
order to reduce the variance of gradient estimates and improve convergence
rates in both theory and practice. We show that revenue in a first-price
auction can be usefully decomposed into a demand component and a
bidding component, and introduce techniques to reduce the variance of
each component. We characterize the bias-variance trade-offs of these
techniques and validate the performance of our proposed algorithm through
experiments on synthetic data and real display ad auctions data from Google ad
exchange.

arXiv link: http://arxiv.org/abs/2006.06519v2

Econometrics arXiv paper, submitted: 2020-06-11

Text as data: a machine learning-based approach to measuring uncertainty

Authors: Rickard Nyman, Paul Ormerod

The Economic Policy Uncertainty (EPU) index has gained considerable traction with
both academics and policy practitioners. Here, we analyse news feed data to
construct a simple, general measure of uncertainty in the United States using a
highly cited machine learning methodology. Over the period January 1996 through
May 2020, we show that the series unequivocally Granger-causes the EPU index
and that there is no Granger-causality in the reverse direction.
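
The Granger-causality check itself is a one-liner with statsmodels; the
simulated monthly series below stand in for the news-based uncertainty measure
and the EPU index, with the lag length chosen arbitrarily.

import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(11)
T = 300
uncertainty = np.zeros(T)
epu = np.zeros(T)
for t in range(1, T):
    uncertainty[t] = 0.6 * uncertainty[t - 1] + rng.standard_normal()
    epu[t] = 0.5 * epu[t - 1] + 0.4 * uncertainty[t - 1] + rng.standard_normal()

# Does the second column (uncertainty) Granger-cause the first (EPU)?
results = grangercausalitytests(np.column_stack([epu, uncertainty]), maxlag=3)
print(round(results[3][0]["ssr_ftest"][1], 4))       # p-value at lag 3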

arXiv link: http://arxiv.org/abs/2006.06457v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2020-06-11

What Drives Inflation and How: Evidence from Additive Mixed Models Selected by cAIC

Authors: Philipp F. M. Baumann, Enzo Rossi, Alexander Volkmann

We analyze the forces that explain inflation using a panel of 122 countries
from 1997 to 2015 with 37 regressors. 98 models motivated by economic theory
are compared to a gradient boosting algorithm; non-linearities and structural
breaks are considered. We show that the typical estimation methods are likely
to lead to fallacious policy conclusions, which motivates the use of a new
approach that we propose in this paper. The boosting algorithm outperforms
theory-based models. We confirm that energy prices are important but what
really matters for inflation is their non-linear interplay with energy rents.
Demographic developments also make a difference. Globalization and technology,
public debt, central bank independence and political characteristics are less
relevant. GDP per capita is more relevant than the output gap, credit growth
more than M2 growth.

arXiv link: http://arxiv.org/abs/2006.06274v4

Econometrics arXiv paper, submitted: 2020-06-10

Trading Privacy for the Greater Social Good: How Did America React During COVID-19?

Authors: Anindya Ghose, Beibei Li, Meghanath Macha, Chenshuo Sun, Natasha Ying Zhang Foutz

Digital contact tracing and analysis of social distancing from smartphone
location data are two prime examples of non-therapeutic interventions used in
many countries to mitigate the impact of the COVID-19 pandemic. While many
understand the importance of trading personal privacy for the public good,
others have been alarmed at the potential for surveillance via measures enabled
through location tracking on smartphones. In our research, we analyzed massive
yet atomic individual-level location data containing over 22 billion records
from ten Blue (Democratic) and ten Red (Republican) cities in the U.S., based
on which we present, herein, some of the first evidence of how Americans
responded to the increasing concerns that government authorities, the private
sector, and public health experts might use individual-level location data to
track the COVID-19 spread. First, we found a significant decreasing trend of
mobile-app location-sharing opt-out. Whereas areas with more Democrats were
more privacy-concerned than areas with more Republicans before the advent of
the COVID-19 pandemic, there was a significant decrease in the overall opt-out
rates after COVID-19, and this effect was more salient among Democratic than
Republican cities. Second, people who practiced social distancing (i.e., those
who traveled less and interacted with fewer close contacts during the pandemic)
were also less likely to opt out, whereas the converse was true for people who
practiced less social-distancing. This relationship also was more salient among
Democratic than Republican cities. Third, high-income populations and males,
compared with low-income populations and females, were more
privacy-conscientious and more likely to opt-out of location tracking.

arXiv link: http://arxiv.org/abs/2006.05859v1

Econometrics arXiv updated paper (originally submitted: 2020-06-08)

Heterogeneous Effects of Job Displacement on Earnings

Authors: Afrouz Azadikhah Jahromi, Brantly Callaway

This paper considers how the effect of job displacement varies across
different individuals. In particular, our interest centers on features of the
distribution of the individual-level effect of job displacement. Identifying
features of this distribution is particularly challenging -- e.g., even if we
could randomly assign workers to be displaced or not, many of the parameters
that we consider would not be point identified. We exploit our access to panel
data, and our approach relies on comparing outcomes of displaced workers to
outcomes the same workers would have experienced if they had not been displaced
and if they maintained the same rank in the distribution of earnings as they
had before they were displaced. Using data from the Displaced Workers Survey,
we find that displaced workers earn about $157 per week less, on average, than
they would have earned if they had not been displaced. We also find that there
is substantial heterogeneity. We estimate that 42% of workers have higher
earnings than they would have had if they had not been displaced and that a
large fraction of workers have experienced substantially more negative effects
than the average effect of displacement. Finally, we also document major
differences in the distribution of the effect of job displacement across
education levels, sex, age, and counterfactual earnings levels. Throughout the
paper, we rely heavily on quantile regression. First, we use quantile
regression as a flexible (yet feasible) first step estimator of conditional
distributions and quantile functions that our main results build on. We also
use quantile regression to study how covariates affect the distribution of the
individual-level effect of job displacement.

arXiv link: http://arxiv.org/abs/2006.04968v2
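
A minimal sketch of a quantile-regression first step of the sort described above, fitting conditional quantile functions of earnings on covariates with statsmodels. The covariates and data are synthetic stand-ins, not the Displaced Workers Survey.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(2)
    n = 2000
    df = pd.DataFrame({
        "educ": rng.integers(10, 18, size=n),   # years of schooling (stand-in)
        "age": rng.integers(25, 60, size=n),
    })
    df["earnings"] = 200 + 30 * df["educ"] + 5 * df["age"] + rng.gumbel(scale=100, size=n)

    # Flexible first step: conditional quantile functions of (pre-displacement) earnings.
    taus = [0.1, 0.25, 0.5, 0.75, 0.9]
    fits = {tau: smf.quantreg("earnings ~ educ + age", df).fit(q=tau) for tau in taus}
    print(fits[0.5].params)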

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-06-07

False (and Missed) Discoveries in Financial Economics

Authors: Campbell R. Harvey, Yan Liu

Multiple testing plagues many important questions in finance such as fund and
factor selection. We propose a new way to calibrate both Type I and Type II
errors. Next, using a double-bootstrap method, we establish a t-statistic
hurdle that is associated with a specific false discovery rate (e.g., 5%). We
also establish a hurdle that is associated with a certain acceptable ratio of
misses to false discoveries (Type II error scaled by Type I error), which
effectively allows for differential costs of the two types of mistakes.
Evaluating current methods, we find that they lack power to detect
outperforming managers.

arXiv link: http://arxiv.org/abs/2006.04269v1
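
A simplified sketch of calibrating a t-statistic hurdle to a target false discovery rate by bootstrapping under the null. It uses a single bootstrap on synthetic fund returns rather than the authors' double-bootstrap procedure, and all numbers are invented for illustration.

    import numpy as np

    rng = np.random.default_rng(3)
    T, N = 120, 200                                     # months, funds
    alphas = np.where(rng.random(N) < 0.1, 0.5, 0.0)    # 10% of funds truly outperform
    returns = alphas + rng.normal(scale=2.0, size=(T, N))
    tstats = returns.mean(0) / (returns.std(0, ddof=1) / np.sqrt(T))

    # Bootstrap the null: demean each fund so its true alpha is zero, resample months.
    demeaned = returns - returns.mean(0)
    null_t = []
    for _ in range(500):
        b = demeaned[rng.integers(0, T, size=T)]
        null_t.append(b.mean(0) / (b.std(0, ddof=1) / np.sqrt(T)))
    null_t = np.abs(np.concatenate(null_t))

    # Smallest hurdle whose estimated false discovery rate is below 5%.
    for h in np.arange(1.5, 4.5, 0.05):
        discoveries = max(int((np.abs(tstats) > h).sum()), 1)
        est_fdr = (null_t > h).mean() * N / discoveries
        if est_fdr <= 0.05:
            print("t-statistic hurdle for a 5% FDR:", round(float(h), 2))
            break
    else:
        print("no hurdle in the grid achieves a 5% FDR")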

Econometrics arXiv paper, submitted: 2020-06-07

Ensemble Learning with Statistical and Structural Models

Authors: Jiaming Mao, Jingzhi Xu

Statistical and structural modeling represent two distinct approaches to data
analysis. In this paper, we propose a set of novel methods for combining
statistical and structural models for improved prediction and causal inference.
Our first proposed estimator has the doubly robustness property in that it only
requires the correct specification of either the statistical or the structural
model. Our second proposed estimator is a weighted ensemble that has the
ability to outperform both models when they are both misspecified. Experiments
demonstrate the potential of our estimators in various settings, including
first-price auctions, dynamic models of entry and exit, and demand estimation
with instrumental variables.

arXiv link: http://arxiv.org/abs/2006.05308v1
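
A minimal sketch of the generic idea of weighting two candidate models by their holdout performance. A random forest stands in for the "statistical" model and a linear regression for the "structural" model; the grid-searched convex weight is only an illustration, not the authors' proposed estimators.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(4)
    X = rng.normal(size=(1500, 5))
    y = np.exp(0.3 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.2, size=1500)

    X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
    stat_model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)   # "statistical"
    struct_model = LinearRegression().fit(X_tr, y_tr)                    # "structural" stand-in

    # Choose the ensemble weight w on a holdout sample to minimise squared error.
    p1, p2 = stat_model.predict(X_val), struct_model.predict(X_val)
    ws = np.linspace(0, 1, 101)
    losses = [np.mean((y_val - (w * p1 + (1 - w) * p2)) ** 2) for w in ws]
    print("ensemble weight on the statistical model:", ws[int(np.argmin(losses))])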

Econometrics arXiv paper, submitted: 2020-06-05

Inflation Dynamics of Financial Shocks

Authors: Olli Palmén

We study the effects of financial shocks on the United States economy by
using a Bayesian structural vector autoregressive (SVAR) model that exploits
the non-normalities in the data. We use this method to uniquely identify the
model and employ inequality constraints to single out financial shocks. The
results point to the existence of two distinct financial shocks that have
opposing effects on inflation, which supports the idea that financial shocks
are transmitted to the real economy through both demand and supply side
channels.

arXiv link: http://arxiv.org/abs/2006.03301v1

Econometrics arXiv paper, submitted: 2020-06-04

Evaluating the Effectiveness of Regional Lockdown Policies in the Containment of Covid-19: Evidence from Pakistan

Authors: Hamza Umer, Muhammad Salar Khan

To slow down the spread of Covid-19, administrative regions within Pakistan
imposed complete and partial lockdown restrictions on socio-economic
activities, religious congregations, and human movement. Here we examine the
impact of regional lockdown strategies on Covid-19 outcomes. After conducting
econometric analyses (Regression Discontinuity and Negative Binomial
Regressions) on official data from the National Institute of Health (NIH)
Pakistan, we find that the strategies did not lead to a similar level of
Covid-19 caseload (positive cases and deaths) in all regions. In terms of
reduction in the overall caseload (positive cases and deaths), compared to no
lockdown, complete and partial lockdown appeared to be effective in four
regions: Balochistan, Gilgit Baltistan (GT), Islamabad Capital Territory (ICT),
and Azad Jammu and Kashmir (AJK). Contrarily, complete and partial lockdowns
did not appear to be effective in containing the virus in the three largest
provinces of Punjab, Sindh, and Khyber Pakhtunkhwa (KPK). The observed regional
heterogeneity in the effectiveness of lockdowns advocates for a careful use of
lockdown strategies based on the demographic, social, and economic factors.

arXiv link: http://arxiv.org/abs/2006.02987v1
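
A minimal sketch of the negative binomial part of such an analysis, fitted with statsmodels on synthetic region-level counts; the covariates, coefficients, and overdispersion are invented for illustration, and the regression-discontinuity component is not shown.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(14)
    n = 500
    df = pd.DataFrame({
        "complete_lockdown": rng.integers(0, 2, size=n),
        "partial_lockdown": rng.integers(0, 2, size=n),
        "pop_density": rng.lognormal(mean=5, sigma=1, size=n),
    })
    mu = np.exp(2.0 - 0.4 * df["complete_lockdown"]
                - 0.2 * df["partial_lockdown"]
                + 0.0002 * df["pop_density"])
    # Overdispersed counts: Poisson with gamma-distributed means.
    df["cases"] = rng.poisson(rng.gamma(shape=1.5, scale=(mu / 1.5).to_numpy()))

    X = sm.add_constant(df[["complete_lockdown", "partial_lockdown", "pop_density"]])
    fit = sm.GLM(df["cases"], X, family=sm.families.NegativeBinomial()).fit()
    print(fit.params)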

Econometrics arXiv paper, submitted: 2020-06-04

The pain of a new idea: Do Late Bloomers Respond to Extension Services in Rural Ethiopia?

Authors: Alexander Jordan, Marco Guerzoni

The paper analyses the efficiency of extension programs in the adoption of
chemical fertilisers in Ethiopia between 1994 and 2004. Fertiliser adoption
provides a suitable strategy to ensure and stabilize food production in remote
vulnerable areas. Extension services programs have a long history in supporting
the application of fertiliser. How-ever, their efficiency is questioned. In our
analysis, we focus on seven villages with a considerable time lag in fertiliser
diffusion. Using matching techniques avoids sample selection bias in the
comparison of treated (households received extension service) and controlled
households. Additionally to common factors, measures of culture, proxied by
ethnicity and religion, aim to control for potential tensions between extension
agents and peasants that hamper the efficiency of the program. We find a
considerable impact of extension service on the first fertiliser adoption. The
impact is consistent for five of seven villages.

arXiv link: http://arxiv.org/abs/2006.02846v1
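
A minimal sketch of nearest-neighbour propensity-score matching of the kind the abstract refers to, using scikit-learn on synthetic household data; the covariates, the continuous "adoption" outcome, and the treatment assignment are all invented for illustration.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(15)
    n = 800
    X = rng.normal(size=(n, 3))                           # household covariates (stand-ins)
    treat = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))   # received extension services
    y = 0.5 * treat + X[:, 0] + rng.normal(size=n)        # fertiliser adoption outcome (stand-in)

    # Propensity-score matching: one nearest control for each treated household.
    ps = LogisticRegression().fit(X, treat).predict_proba(X)[:, 1]
    treated, controls = np.where(treat == 1)[0], np.where(treat == 0)[0]
    nn = NearestNeighbors(n_neighbors=1).fit(ps[controls].reshape(-1, 1))
    _, idx = nn.kneighbors(ps[treated].reshape(-1, 1))
    att = np.mean(y[treated] - y[controls[idx.ravel()]])
    print("matched ATT estimate:", round(float(att), 3))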

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-06-04

Tensor Factor Model Estimation by Iterative Projection

Authors: Yuefeng Han, Rong Chen, Dan Yang, Cun-Hui Zhang

Tensor time series, which is a time series consisting of tensorial
observations, has become ubiquitous. It typically exhibits high dimensionality.
One approach for dimension reduction is to use a factor model structure, in a
form similar to Tucker tensor decomposition, except that the time dimension is
treated as a dynamic process with a time dependent structure. In this paper we
introduce two approaches to estimate such a tensor factor model by using
iterative orthogonal projections of the original tensor time series. These
approaches extend the existing estimation procedures and improve the estimation
accuracy and convergence rate significantly as proven in our theoretical
investigation. Our algorithms are similar to the higher order orthogonal
projection method for tensor decomposition, but with significant differences
due to the need to unfold tensors in the iterations and the use of
autocorrelation. Consequently, our analysis is significantly different from the
existing ones. Computational and statistical lower bounds are derived to prove
the optimality of the sample size requirement and convergence rate for the
proposed methods. A simulation study is conducted to further illustrate the
statistical properties of these estimators.

arXiv link: http://arxiv.org/abs/2006.02611v3

Econometrics arXiv updated paper (originally submitted: 2020-06-03)

Testing Finite Moment Conditions for the Consistency and the Root-N Asymptotic Normality of the GMM and M Estimators

Authors: Yuya Sasaki, Yulong Wang

Common approaches to inference for structural and reduced-form parameters in
empirical economic analysis are based on the consistency and the root-n
asymptotic normality of the GMM and M estimators. The canonical consistency
(respectively, root-n asymptotic normality) for these classes of estimators
requires at least the first (respectively, second) moment of the score to be
finite. In this article, we present a method of testing these conditions for
the consistency and the root-n asymptotic normality of the GMM and M
estimators. The proposed test controls size nearly uniformly over the set of
data generating processes that are compatible with the null hypothesis.
Simulation studies support this theoretical result. Applying the proposed test
to the market share data from the Dominick's Finer Foods retail chain, we find
that a common ad hoc procedure to deal with zero market shares in
analysis of differentiated products markets results in a failure to satisfy the
conditions for both the consistency and the root-n asymptotic normality.

arXiv link: http://arxiv.org/abs/2006.02541v3

Econometrics arXiv updated paper (originally submitted: 2020-06-03)

Capital and Labor Income Pareto Exponents across Time and Space

Authors: Tjeerd de Vries, Alexis Akira Toda

We estimate capital and labor income Pareto exponents across 475 country-year
observations that span 52 countries over half a century (1967-2018). We
document two stylized facts: (i) capital income is more unequally distributed
than labor income in the tail; namely, the capital exponent (1-3, median 1.46)
is smaller than the labor exponent (2-5, median 3.35), and (ii) capital and labor exponents
are nearly uncorrelated. To explain these findings, we build an incomplete
market model with job ladders and capital income risk that gives rise to a
capital income Pareto exponent smaller than but nearly unrelated to the labor
exponent. Our results suggest the importance of distinguishing income and
wealth inequality.

arXiv link: http://arxiv.org/abs/2006.03441v3
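
A minimal sketch of estimating tail (Pareto) exponents with the Hill estimator on synthetic capital and labor income draws. The Hill estimator is a standard choice but not necessarily the exact estimator used in the paper, and the simulated exponents merely mimic the reported orders of magnitude.

    import numpy as np

    def hill_exponent(x, k):
        """Hill estimator of the Pareto exponent from the k largest observations."""
        x = np.sort(np.asarray(x))[::-1]
        logs = np.log(x[:k]) - np.log(x[k])
        return 1.0 / logs.mean()

    rng = np.random.default_rng(5)
    capital = rng.pareto(1.5, size=50_000) + 1.0    # heavier tail
    labor = rng.pareto(3.3, size=50_000) + 1.0
    print("capital exponent:", round(hill_exponent(capital, 1000), 2))
    print("labor exponent:  ", round(hill_exponent(labor, 1000), 2))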

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-06-03

A Negative Correlation Strategy for Bracketing in Difference-in-Differences

Authors: Ting Ye, Luke Keele, Raiden Hasegawa, Dylan S. Small

The method of difference-in-differences (DID) is widely used to study the
causal effect of policy interventions in observational studies. DID employs a
before and after comparison of the treated and control units to remove bias due
to time-invariant unmeasured confounders under the parallel trends assumption.
Estimates from DID, however, will be biased if the outcomes for the treated and
control units evolve differently in the absence of treatment, namely if the
parallel trends assumption is violated. We propose a general identification
strategy that leverages two groups of control units whose outcomes relative to
the treated units exhibit a negative correlation, and achieves partial
identification of the average treatment effect for the treated. The identified
set is of a union bounds form that involves the minimum and maximum operators,
which makes the canonical bootstrap generally inconsistent and naive methods
overly conservative. By utilizing the directional inconsistency of the
bootstrap distribution, we develop a novel bootstrap method to construct
uniformly valid confidence intervals for the identified set and parameter of
interest when the identified set is of a union bounds form, and we establish
the method's theoretical properties. We develop a simple falsification test and
sensitivity analysis. We apply the proposed strategy for bracketing to study
whether minimum wage laws affect employment levels.

arXiv link: http://arxiv.org/abs/2006.02423v3

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2020-06-02

Evaluating Public Supports to the Investment Activities of Business Firms: A Multilevel Meta-Regression Analysis of Italian Studies

Authors: Chiara Bocci, Annalisa Caloffi, Marco Mariani, Alessandro Sterlacchini

We conduct an extensive meta-regression analysis of counterfactual programme
evaluations from Italy, considering both published and grey literature on
enterprise and innovation policies. We specify a multilevel model for the
probability of finding positive effect estimates, also assessing correlation
possibly induced by co-authorship networks. We find that the probability of
positive effects is considerable, especially for weaker firms and outcomes that
are directly targeted by public programmes. However, these policies are less
likely to trigger change in the long run.

arXiv link: http://arxiv.org/abs/2006.01880v1

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2020-06-02

Subjective Complexity Under Uncertainty

Authors: Quitzé Valenzuela-Stookey

Complexity of the problem of choosing among uncertain acts is a salient
feature of many of the environments in which departures from expected utility
theory are observed. I propose and axiomatize a model of choice under
uncertainty in which the size of the partition with respect to which an act is
measurable arises endogenously as a measure of subjective complexity. I derive
a representation of incomplete Simple Bounds preferences in which acts that are
complex from the perspective of the decision maker are bracketed by simple acts
to which they are related by statewise dominance. The key axioms are motivated
by a model of learning from limited data. I then consider choice behavior
characterized by a "cautious completion" of Simple Bounds preferences, and
discuss the relationship between this model and models of ambiguity aversion. I
develop general comparative statics results, and explore applications to
portfolio choice, contracting, and insurance choice.

arXiv link: http://arxiv.org/abs/2006.01852v3

Econometrics arXiv updated paper (originally submitted: 2020-06-02)

On the plausibility of the latent ignorability assumption

Authors: Martin Huber

The estimation of the causal effect of an endogenous treatment based on an
instrumental variable (IV) is often complicated by attrition, sample selection,
or non-response in the outcome of interest. To tackle the latter problem, the
latent ignorability (LI) assumption imposes that attrition/sample selection is
independent of the outcome conditional on the treatment compliance type (i.e.
how the treatment behaves as a function of the instrument), the instrument, and
possibly further observed covariates. As a word of caution, this note formally
discusses the strong behavioral implications of LI in rather standard IV
models. We also provide an empirical illustration based on the Job Corps
experimental study, in which the sensitivity of the estimated program effect to
LI and alternative assumptions about outcome attrition is investigated.

arXiv link: http://arxiv.org/abs/2006.01703v2

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2020-06-02

Explaining the distribution of energy consumption at slow charging infrastructure for electric vehicles from socio-economic data

Authors: Milan Straka, Rui Carvalho, Gijs van der Poel, Ľuboš Buzna

Here, we develop a data-centric approach that enables us to analyse which
activities, functions, and characteristics of the environment surrounding
slow charging infrastructure impact the distribution of the electricity
consumed at such infrastructure. To gain a basic insight, we analysed
the probabilistic distribution of energy consumption and its relation to
indicators characterizing charging events. We collected geospatial datasets and
utilizing statistical methods for data pre-processing, we prepared features
modelling the spatial context in which the charging infrastructure operates. To
enhance the statistical reliability of results, we applied the bootstrap method
together with the Lasso method that combines regression with variable selection
ability. We evaluate the statistical distributions of the selected regression
coefficients. We identified the most influential features correlated with
energy consumption, indicating that the spatial context of the charging
infrastructure affects its utilization pattern. Many of these features are
related to the economic prosperity of residents. Application of the methodology
to a specific class of charging infrastructure enables the differentiation of
selected features, e.g. by the used rollout strategy. Overall, the paper
demonstrates the application of statistical methodologies to energy data and
provides insights on factors potentially shaping the energy consumption that
could be utilized when developing models to inform charging infrastructure
deployment and planning of power grids.

arXiv link: http://arxiv.org/abs/2006.01672v2
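
A minimal sketch of combining the bootstrap with Lasso-type variable selection, as described above, on synthetic data; the feature matrix stands in for the spatial-context features, and the selection-frequency summary is only one simple way to read the bootstrap output.

    import numpy as np
    from sklearn.linear_model import LassoCV

    rng = np.random.default_rng(6)
    n, p = 400, 30
    X = rng.normal(size=(n, p))                        # spatial-context features (stand-ins)
    beta = np.zeros(p)
    beta[:3] = [1.0, -0.8, 0.5]                        # a few truly relevant features
    y = X @ beta + rng.normal(size=n)                  # energy consumed per charge point

    B = 100
    selected = np.zeros(p)
    for _ in range(B):
        idx = rng.integers(0, n, size=n)               # bootstrap resample
        fit = LassoCV(cv=5).fit(X[idx], y[idx])
        selected += fit.coef_ != 0
    print("selection frequency of the first five features:", selected[:5] / B)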

Econometrics arXiv paper, submitted: 2020-06-02

Estimates of derivatives of (log) densities and related objects

Authors: Joris Pinkse, Karl Schurter

We estimate the density and its derivatives using a local polynomial
approximation to the logarithm of an unknown density $f$. The estimator is
guaranteed to be nonnegative and achieves the same optimal rate of convergence
in the interior as well as the boundary of the support of $f$. The estimator is
therefore well-suited to applications in which nonnegative density estimates
are required, such as in semiparametric maximum likelihood estimation. In
addition, we show that our estimator compares favorably with other kernel-based
methods, both in terms of asymptotic performance and computational ease.
Simulation results confirm that our method can perform similarly in finite
samples to these alternative methods when they are used with optimal inputs,
i.e. an Epanechnikov kernel and optimally chosen bandwidth sequence. Further
simulation evidence demonstrates that, if the researcher modifies the inputs
and chooses a larger bandwidth, our approach can even improve upon these
optimized alternatives, asymptotically. We provide code in several languages.

arXiv link: http://arxiv.org/abs/2006.01328v1

Econometrics arXiv updated paper (originally submitted: 2020-06-01)

Revisiting money and labor for valuing environmental goods and services in developing countries

Authors: Habtamu Tilahun Kassahun, Jette Bredahl Jacobsen, Charles F. Nicholson

Many Stated Preference studies conducted in developing countries provide a
low willingness to pay (WTP) for a wide range of goods and services. However,
recent studies in these countries indicate that this may partly be a result of
the choice of payment vehicle, not the preference for the good. Thus, low WTP
may not indicate a low welfare effect for public projects in developing
countries. We argue that in a setting where 1) there is imperfect
substitutability between money and other measures of wealth (e.g. labor), and
2) institutions are perceived to be corrupt, including payment vehicles that
are currently available to the individual and less prone to corruption may be
needed to obtain valid welfare estimates. Otherwise, we risk underestimating
the welfare benefit of projects. We demonstrate this through a rural household
contingent valuation (CV) survey designed to elicit the value of access to
reliable irrigation water in Ethiopia. Of the total average annual WTP for
access to reliable irrigation services, cash contributions comprise only
24.41%. The implication is that socially desirable projects might be rejected
on the basis of cost-benefit analysis as a result of underestimating welfare
gains due to a mismatch in the choice of payment vehicle in the valuation
study.

arXiv link: http://arxiv.org/abs/2006.01290v3

Econometrics arXiv updated paper (originally submitted: 2020-06-01)

New Approaches to Robust Inference on Market (Non-)Efficiency, Volatility Clustering and Nonlinear Dependence

Authors: Rustam Ibragimov, Rasmus Pedersen, Anton Skrobotov

Many financial and economic variables, including financial returns, exhibit
nonlinear dependence, heterogeneity and heavy-tailedness. These properties may
make problematic the analysis of (non-)efficiency and volatility clustering in
economic and financial markets using traditional approaches that appeal to
asymptotic normality of sample autocorrelation functions of returns and their
squares.
This paper presents new approaches to deal with the above problems. We
provide the results that motivate the use of measures of market
(non-)efficiency and volatility clustering based on (small) powers of absolute
returns and their signed versions.
We further provide new approaches to robust inference on the measures in the
case of general time series, including GARCH-type processes. The approaches are
based on robust $t$-statistic tests, and new results on their applicability are
presented. In the approaches, parameter estimates (e.g., estimates of measures
of nonlinear dependence) are computed for groups of data, and the inference is
based on $t$-statistics computed from the resulting group estimates. This results in valid
robust inference under heterogeneity and dependence assumptions satisfied in
real-world financial markets. Numerical results and empirical applications
confirm the advantages and wide applicability of the proposed approaches.

arXiv link: http://arxiv.org/abs/2006.01212v4
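
A minimal sketch of the group-based robust t-statistic approach: split the sample into q groups, compute the measure of interest in each group, and apply a standard t-test to the group estimates. The clustering measure used here (the first-order autocorrelation of a small power of absolute returns) and the simulated returns are simplified stand-ins.

    import numpy as np
    from scipy import stats

    def group_tstat_test(series, stat_fn, q=8):
        """Compute stat_fn on q groups and t-test the group estimates against zero."""
        groups = np.array_split(np.asarray(series), q)
        estimates = np.array([stat_fn(g) for g in groups])
        res = stats.ttest_1samp(estimates, popmean=0.0)
        return estimates, res.statistic, res.pvalue

    def acf1_abs_power(r, power=0.5):
        """First-order sample autocorrelation of |r|^power, a simple clustering measure."""
        x = np.abs(r) ** power
        x = x - x.mean()
        return np.sum(x[1:] * x[:-1]) / np.sum(x ** 2)

    rng = np.random.default_rng(7)
    returns = rng.standard_t(df=4, size=4000) * 0.01    # heavy-tailed stand-in returns
    est, t, p = group_tstat_test(returns, acf1_abs_power)
    print("group estimates:", np.round(est, 3))
    print("t-statistic:", round(float(t), 2), "p-value:", round(float(p), 3))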

Econometrics arXiv updated paper (originally submitted: 2020-06-01)

New robust inference for predictive regressions

Authors: Rustam Ibragimov, Jihyun Kim, Anton Skrobotov

We propose two robust methods for testing hypotheses on unknown parameters of
predictive regression models under heterogeneous and persistent volatility as
well as endogenous, persistent and/or fat-tailed regressors and errors. The
proposed robust testing approaches are applicable both in the case of discrete
and continuous time models. Both of the methods use the Cauchy estimator to
effectively handle the problems of endogeneity, persistence and/or
fat-tailedness in regressors and errors. The difference between our two methods
is how the heterogeneous volatility is controlled. The first method relies on
robust t-statistic inference using group estimators of a regression parameter
of interest proposed in Ibragimov and Müller (2010). It is simple to implement,
but requires the exogenous volatility assumption. To relax the exogenous
volatility assumption, we propose another method which relies on the
nonparametric correction of volatility. The proposed methods perform well
compared with widely used alternative inference procedures in terms of their
finite sample properties.

arXiv link: http://arxiv.org/abs/2006.01191v4

Econometrics arXiv updated paper (originally submitted: 2020-06-01)

Do Public Program Benefits Crowd Out Private Transfers in Developing Countries? A Critical Review of Recent Evidence

Authors: Plamen Nikolov, Matthew Bonci

Precipitated by rapid globalization, rising inequality, population growth,
and longevity gains, social protection programs have been on the rise in low-
and middle-income countries (LMICs) in the last three decades. However, the
introduction of public benefits could displace informal mechanisms for
risk-protection, which are especially prevalent in LMICs. If the displacement
of private transfers is considerably large, the expansion of social protection
programs could even lead to social welfare loss. In this paper, we critically
survey the recent empirical literature on crowd-out effects in response to
public policies, specifically in the context of LMICs. We review and synthesize
patterns from the behavioral response to various types of social protection
programs. Furthermore, we specifically examine for heterogeneous treatment
effects by important socioeconomic characteristics. We conclude by drawing on
lessons from our synthesis of studies. If poverty reduction objectives are
considered, along with careful program targeting that accounts for potential
crowd-out effects, there may well be a net social gain.

arXiv link: http://arxiv.org/abs/2006.00737v2

Econometrics arXiv paper, submitted: 2020-06-01

Influence via Ethos: On the Persuasive Power of Reputation in Deliberation Online

Authors: Emaad Manzoor, George H. Chen, Dokyun Lee, Michael D. Smith

Deliberation among individuals online plays a key role in shaping the
opinions that drive votes, purchases, donations and other critical offline
behavior. Yet, the determinants of opinion-change via persuasion in
deliberation online remain largely unexplored. Our research examines the
persuasive power of $ethos$ -- an individual's "reputation" -- using a
7-year panel of over a million debates from an argumentation platform
containing explicit indicators of successful persuasion. We identify the causal
effect of reputation on persuasion by constructing an instrument for reputation
from a measure of past debate competition, and by controlling for unstructured
argument text using neural models of language in the double machine-learning
framework. We find that an individual's reputation significantly impacts their
persuasion rate above and beyond the validity, strength and presentation of
their arguments. In our setting, we find that having 10 additional reputation
points causes a 31% increase in the probability of successful persuasion over
the platform average. We also find that the impact of reputation is moderated
by characteristics of the argument content, in a manner consistent with a
theoretical model that attributes the persuasive power of reputation to
heuristic information-processing under cognitive overload. We discuss
managerial implications for platforms that facilitate deliberative
decision-making for public and private organizations online.

arXiv link: http://arxiv.org/abs/2006.00707v1

Econometrics arXiv paper, submitted: 2020-05-31

Lockdown Strategies, Mobility Patterns and COVID-19

Authors: Nikos Askitas, Konstantinos Tatsiramos, Bertrand Verheyden

We develop a multiple-events model and exploit within and between country
variation in the timing, type and level of intensity of various public policies
to study their dynamic effects on the daily incidence of COVID-19 and on
population mobility patterns across 135 countries. We remove concurrent policy
bias by taking into account the contemporaneous presence of multiple
interventions. The main result of the paper is that cancelling public events
and imposing restrictions on private gatherings followed by school closures
have quantitatively the most pronounced effects on reducing the daily incidence
of COVID-19. They are followed by workplace as well as stay-at-home
requirements, whose statistical significance and levels of effect are not as
pronounced. Instead, we find no effects for international travel controls,
public transport closures and restrictions on movements across cities and
regions. We establish that these findings are mediated by their effect on
population mobility patterns in a manner consistent with time-use and
epidemiological factors.

arXiv link: http://arxiv.org/abs/2006.00531v1

Econometrics arXiv paper, submitted: 2020-05-30

Statistical Decision Properties of Imprecise Trials Assessing COVID-19 Drugs

Authors: Charles F. Manski, Aleksey Tetenov

As the COVID-19 pandemic progresses, researchers are reporting findings of
randomized trials comparing standard care with care augmented by experimental
drugs. The trials have small sample sizes, so estimates of treatment effects
are imprecise. Seeing imprecision, clinicians reading research articles may
find it difficult to decide when to treat patients with experimental drugs.
Whatever decision criterion one uses, there is always some probability that
random variation in trial outcomes will lead to prescribing sub-optimal
treatments. A conventional practice when comparing standard care and an
innovation is to choose the innovation only if the estimated treatment effect
is positive and statistically significant. This practice defers to standard
care as the status quo. To evaluate decision criteria, we use the concept of
near-optimality, which jointly considers the probability and magnitude of
decision errors. An appealing decision criterion from this perspective is the
empirical success rule, which chooses the treatment with the highest observed
average patient outcome in the trial. Considering the design of recent and
ongoing COVID-19 trials, we show that the empirical success rule yields
treatment results that are much closer to optimal than those generated by
prevailing decision criteria based on hypothesis tests.

arXiv link: http://arxiv.org/abs/2006.00343v1
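
A minimal simulation sketch contrasting the empirical success rule with a conventional significance-based rule in a small two-arm trial; the sample size, outcome probabilities, and the one-sided 5% z-test are invented for illustration.

    import numpy as np

    rng = np.random.default_rng(13)
    n_per_arm = 50                            # small trial, as in early Covid-19 studies
    p_standard, p_new = 0.60, 0.70            # true recovery probabilities (stand-ins)

    choices_es, choices_test = [], []
    for _ in range(5000):
        y0 = rng.binomial(1, p_standard, n_per_arm)
        y1 = rng.binomial(1, p_new, n_per_arm)
        # Empirical success rule: pick the arm with the higher observed mean outcome.
        choices_es.append(y1.mean() > y0.mean())
        # Conventional rule: adopt the innovation only if significantly better.
        diff = y1.mean() - y0.mean()
        se = np.sqrt(y0.var(ddof=1) / n_per_arm + y1.var(ddof=1) / n_per_arm)
        choices_test.append(diff / se > 1.645 if se > 0 else False)
    print("share choosing the better arm, empirical success rule:", np.mean(choices_es))
    print("share choosing the better arm, significance rule:     ", np.mean(choices_test))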

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-05-30

Parametric Modeling of Quantile Regression Coefficient Functions with Longitudinal Data

Authors: Paolo Frumento, Matteo Bottai, Iván Fernández-Val

In ordinary quantile regression, quantiles of different order are estimated
one at a time. An alternative approach, which is referred to as quantile
regression coefficients modeling (QRCM), is to model quantile regression
coefficients as parametric functions of the order of the quantile. In this
paper, we describe how the QRCM paradigm can be applied to longitudinal data.
We introduce a two-level quantile function, in which two different quantile
regression models are used to describe the (conditional) distribution of the
within-subject response and that of the individual effects. We propose a novel
type of penalized fixed-effects estimator, and discuss its advantages over
standard methods based on $\ell_1$ and $\ell_2$ penalization. We provide model
identifiability conditions, derive asymptotic properties, describe
goodness-of-fit measures and model selection criteria, present simulation
results, and discuss an application. The proposed method has been implemented
in the R package qrcm.

arXiv link: http://arxiv.org/abs/2006.00160v1

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2020-05-30

The impacts of asymmetry on modeling and forecasting realized volatility in Japanese stock markets

Authors: Daiki Maki, Yasushi Ota

This study investigates the impacts of asymmetry on the modeling and
forecasting of realized volatility in the Japanese futures and spot stock
markets. We employ heterogeneous autoregressive (HAR) models allowing for three
types of asymmetry: positive and negative realized semivariance (RSV),
asymmetric jumps, and leverage effects. The estimation results show that
leverage effects clearly influence the modeling of realized volatility models.
Leverage effects exist for both the spot and futures markets in the Nikkei 225.
Although realized semivariance improves modeling, the estimates of RSV
models depend on whether these models include leverage effects. Asymmetric jump
components do not have a clear influence on realized volatility models. While
leverage effects and realized semivariance also improve the out-of-sample
forecast performance of volatility models, asymmetric jumps are not useful for
predictive ability. The empirical results of this study indicate that
asymmetric information, in particular, leverage effects and realized
semivariance, yield better modeling and more accurate forecast performance.
Accordingly, asymmetric information should be included when we model and
forecast the realized volatility of Japanese stock markets.

arXiv link: http://arxiv.org/abs/2006.00158v1
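
A minimal sketch of a HAR-type regression with realized semivariances and a leverage term, estimated by OLS on synthetic daily series; the realized-variance and semivariance series are crude stand-ins rather than measures built from intraday data.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(8)
    T = 1000
    rv = pd.Series(np.exp(rng.normal(-9, 0.5, size=T)))     # daily realized variance (stand-in)
    share = rng.uniform(0.3, 0.7, size=T)
    rsv_neg = share * rv                                    # negative semivariance (stand-in)
    rsv_pos = rv - rsv_neg                                  # positive semivariance
    ret = rng.normal(scale=np.sqrt(rv.to_numpy()))          # daily return, for the leverage term

    df = pd.DataFrame({
        "rv_lead": rv.shift(-1),                            # target: next day's realized variance
        "rsv_pos": rsv_pos, "rsv_neg": rsv_neg,
        "rv_week": rv.rolling(5).mean(),
        "rv_month": rv.rolling(22).mean(),
        "leverage": np.minimum(ret, 0.0),                   # negative-return (leverage) proxy
    }).dropna()

    X = sm.add_constant(df.drop(columns="rv_lead"))
    print(sm.OLS(df["rv_lead"], X).fit().params)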

Econometrics arXiv updated paper (originally submitted: 2020-05-28)

Causal Impact of Masks, Policies, Behavior on Early Covid-19 Pandemic in the U.S.

Authors: Victor Chernozhukov, Hiroyuki Kasahara, Paul Schrimpf

This paper evaluates the dynamic impact of various policies adopted by US
states on the growth rates of confirmed Covid-19 cases and deaths as well as
social distancing behavior measured by Google Mobility Reports, where we take
into consideration people's voluntarily behavioral response to new information
of transmission risks. Our analysis finds that both policies and information on
transmission risks are important determinants of Covid-19 cases and deaths and
shows that a change in policies explains a large fraction of observed changes
in social distancing behavior. Our counterfactual experiments suggest that
nationally mandating face masks for employees on April 1st could have reduced
the growth rate of cases and deaths by more than 10 percentage points in late
April, and could have led to as much as 17 to 55 percent less deaths nationally
by the end of May, which roughly translates into 17 to 55 thousand saved lives.
Our estimates imply that removing non-essential business closures (while
maintaining school closures, restrictions on movie theaters and restaurants)
could have led to -20 to 60 percent more cases and deaths by the end of May. We
also find that, without stay-at-home orders, cases would have been larger by 25
to 170 percent, which implies that 0.5 to 3.4 million more Americans could have
been infected if stay-at-home orders had not been implemented. Finally, not
having implemented any policies could have led to at least a 7-fold increase
with an uninformative upper bound in cases (and deaths) by the end of May in
the US, with considerable uncertainty over the effects of school closures,
which had little cross-sectional variation.

arXiv link: http://arxiv.org/abs/2005.14168v4

Econometrics arXiv updated paper (originally submitted: 2020-05-28)

Machine Learning Time Series Regressions with an Application to Nowcasting

Authors: Andrii Babii, Eric Ghysels, Jonas Striaukas

This paper introduces structured machine learning regressions for
high-dimensional time series data potentially sampled at different frequencies.
The sparse-group LASSO estimator can take advantage of such time series data
structures and outperforms the unstructured LASSO. We establish oracle
inequalities for the sparse-group LASSO estimator within a framework that
allows for the mixing processes and recognizes that the financial and the
macroeconomic data may have heavier than exponential tails. An empirical
application to nowcasting US GDP growth indicates that the estimator performs
favorably compared to other alternatives and that text data can be a useful
addition to more traditional numerical data.

arXiv link: http://arxiv.org/abs/2005.14057v4
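
A minimal sketch of the nowcasting setup: lagged higher-frequency predictors stacked in groups (one group per underlying series) and a penalized regression fit. scikit-learn's plain LassoCV is used as a stand-in because the paper's sparse-group LASSO is not available there; all data are synthetic.

    import numpy as np
    from sklearn.linear_model import LassoCV

    rng = np.random.default_rng(9)
    T, n_series, n_lags = 200, 8, 6
    monthly = rng.normal(size=(T, n_series))        # higher-frequency predictors (stand-ins)

    # MIDAS-style design matrix: the lags of each series form one group of columns.
    X = np.column_stack([np.roll(monthly[:, j], l)
                         for j in range(n_series) for l in range(n_lags)])
    y = 0.5 * monthly[:, 0] + 0.3 * np.roll(monthly[:, 0], 1) + rng.normal(scale=0.3, size=T)

    # Drop the first n_lags rows contaminated by np.roll wrap-around.
    X, y = X[n_lags:], y[n_lags:]
    fit = LassoCV(cv=5).fit(X, y)
    print("selected lag coefficients:", int((fit.coef_ != 0).sum()), "of", X.shape[1])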

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2020-05-27

Breiman's "Two Cultures" Revisited and Reconciled

Authors: Subhadeep Mukhopadhyay, Kaijun Wang

In a landmark paper published in 2001, Leo Breiman described the tense
standoff between two cultures of data modeling: parametric statistical and
algorithmic machine learning. The cultural division between these two
statistical learning frameworks has been growing at a steady pace in recent
years. What is the way forward? It has become blatantly obvious that this
widening gap between "the two cultures" cannot be averted unless we find a way
to blend them into a coherent whole. This article presents a solution by
establishing a link between the two cultures. Through examples, we describe the
challenges and potential gains of this new integrated statistical thinking.

arXiv link: http://arxiv.org/abs/2005.13596v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2020-05-27

Probabilistic multivariate electricity price forecasting using implicit generative ensemble post-processing

Authors: Tim Janke, Florian Steinke

The reliable estimation of forecast uncertainties is crucial for
risk-sensitive optimal decision making. In this paper, we propose implicit
generative ensemble post-processing, a novel framework for multivariate
probabilistic electricity price forecasting. We use a likelihood-free implicit
generative model based on an ensemble of point forecasting models to generate
multivariate electricity price scenarios with a coherent dependency structure
as a representation of the joint predictive distribution. Our ensemble
post-processing method outperforms well-established model combination
benchmarks. This is demonstrated on a data set from the German day-ahead
market. As our method works on top of an ensemble of domain-specific expert
models, it can readily be deployed to other forecasting tasks.

arXiv link: http://arxiv.org/abs/2005.13417v1

Econometrics arXiv updated paper (originally submitted: 2020-05-25)

Fair Policy Targeting

Authors: Davide Viviano, Jelena Bradic

One of the major concerns of targeting interventions on individuals in social
welfare programs is discrimination: individualized treatments may induce
disparities across sensitive attributes such as age, gender, or race. This
paper addresses the question of the design of fair and efficient treatment
allocation rules. We adopt the non-maleficence perspective of first do no harm:
we select the fairest allocation within the Pareto frontier. We cast the
optimization into a mixed-integer linear program formulation, which can be
solved using off-the-shelf algorithms. We derive regret bounds on the
unfairness of the estimated policy function and small sample guarantees on the
Pareto frontier under general notions of fairness. Finally, we illustrate our
method using an application from education economics.

arXiv link: http://arxiv.org/abs/2005.12395v3

Econometrics arXiv updated paper (originally submitted: 2020-05-25)

An alternative to synthetic control for models with many covariates under sparsity

Authors: Marianne Bléhaut, Xavier D'Haultfoeuille, Jérémy L'Hour, Alexandre B. Tsybakov

The synthetic control method is an econometric tool to evaluate causal
effects when only one unit is treated. While initially aimed at evaluating the
effect of large-scale macroeconomic changes with very few available control
units, it has increasingly been used in place of more well-known
microeconometric tools in a broad range of applications, but its properties in
this context are unknown. This paper introduces an alternative to the synthetic
control method, which is developed both in the usual asymptotic framework and
in the high-dimensional scenario. We propose an estimator of average treatment
effect that is doubly robust, consistent and asymptotically normal. It is also
immunized against first-step selection mistakes. We illustrate these properties
using Monte Carlo simulations and applications to both standard and potentially
high-dimensional settings, and offer a comparison with the synthetic control
method.

arXiv link: http://arxiv.org/abs/2005.12225v2

Econometrics arXiv updated paper (originally submitted: 2020-05-25)

Bootstrap Inference for Quantile Treatment Effects in Randomized Experiments with Matched Pairs

Authors: Liang Jiang, Xiaobin Liu, Peter C. B. Phillips, Yichong Zhang

This paper examines methods of inference concerning quantile treatment
effects (QTEs) in randomized experiments with matched-pairs designs (MPDs).
Standard multiplier bootstrap inference fails to capture the negative
dependence of observations within each pair and is therefore conservative.
Analytical inference involves estimating multiple functional quantities that
require several tuning parameters. Instead, this paper proposes two bootstrap
methods that can consistently approximate the limit distribution of the
original QTE estimator and lessen the burden of tuning parameter choice. Most
especially, the inverse propensity score weighted multiplier bootstrap can be
implemented without knowledge of pair identities.

arXiv link: http://arxiv.org/abs/2005.11967v4

Econometrics arXiv paper, submitted: 2020-05-23

Macroeconomic factors for inflation in Argentina, 2013-2019

Authors: Manuel Lopez Galvan

The aim of this paper is to investigate the use of factor analysis to
identify the role of the relevant macroeconomic variables in driving
inflation. The macroeconomic predictors that usually affect inflation are
summarized using a small number of factors constructed from principal
components. This allows us to identify the crucial role of money growth,
inflation expectations and the exchange rate in driving inflation. We then use
these factors to build econometric models to forecast inflation. Specifically,
we use univariate and multivariate models such as classical autoregressive,
factor and FAVAR models. The forecasting results suggest that models which
incorporate more economic information outperform the benchmark. Furthermore,
causality tests and impulse response analyses are performed to examine the
short-run dynamics of inflation in response to shocks in the principal factors.

arXiv link: http://arxiv.org/abs/2005.11455v1
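
A minimal sketch of summarizing a panel of macroeconomic predictors with principal components and using the factors in a simple factor-augmented AR(1) for inflation; all series are synthetic stand-ins and the specification is only illustrative of the FAVAR idea.

    import numpy as np
    import statsmodels.api as sm
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(10)
    T, k = 84, 20                                          # monthly sample, macro predictors
    common = rng.normal(size=(T, 3)).cumsum(axis=0)        # latent common factors
    macro = common @ rng.normal(size=(3, k)) + rng.normal(scale=0.5, size=(T, k))
    inflation = 0.8 * common[:, 0] + rng.normal(scale=0.3, size=T)

    # Summarize the predictors with a small number of principal-component factors.
    factors = PCA(n_components=3).fit_transform(StandardScaler().fit_transform(macro))

    # Factor-augmented AR(1): regress inflation on its own lag and the lagged factors.
    X = sm.add_constant(np.column_stack([inflation[:-1], factors[:-1]]))
    print(sm.OLS(inflation[1:], X).fit().params)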

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-05-22

The probability of a robust inference for internal validity and its applications in regression models

Authors: Tenglong Li, Kenneth A. Frank

The internal validity of an observational study is often subject to debate. In
this study, we define the unobserved sample based on the counterfactuals and
formalize its relationship with the null hypothesis statistical testing (NHST)
for regression models. The probability of a robust inference for internal
validity, i.e., the PIV, is the probability of rejecting the null hypothesis
again based on the ideal sample which is defined as the combination of the
observed and unobserved samples, provided the same null hypothesis has already
been rejected for the observed sample. When the unconfoundedness assumption is
dubious, one can bound the PIV of an inference based on bounded belief about
the mean counterfactual outcomes, which is often needed in this case.
Essentially, the PIV is statistical power of the NHST that is thought to be
built on the ideal sample. We summarize the process of evaluating internal
validity with the PIV into a six-step procedure and illustrate it with an
empirical example (i.e., Hong and Raudenbush (2005)).

arXiv link: http://arxiv.org/abs/2005.12784v1

Econometrics arXiv updated paper (originally submitted: 2020-05-20)

On the Nuisance of Control Variables in Regression Analysis

Authors: Paul Hünermund, Beyers Louw

Control variables are included in regression analyses to estimate the causal
effect of a treatment on an outcome. In this paper, we argue that the estimated
effect sizes of controls are unlikely to have a causal interpretation
themselves, though. This is because even valid controls are possibly endogenous
and represent a combination of several different causal mechanisms operating
jointly on the outcome, which is hard to interpret theoretically. Therefore, we
recommend refraining from interpreting marginal effects of controls and
focusing on the main variables of interest, for which a plausible
identification argument can be established. To prevent erroneous managerial or
policy implications, coefficients of control variables should be clearly marked
as not having a causal interpretation or omitted from regression tables
altogether. Moreover, we advise against using control variable estimates for
subsequent theory building and meta-analyses.

arXiv link: http://arxiv.org/abs/2005.10314v5

Econometrics arXiv cross-link from q-fin.RM (q-fin.RM), submitted: 2020-05-20

Stochastic modeling of assets and liabilities with mortality risk

Authors: Sergio Alvares Maffra, John Armstrong, Teemu Pennanen

This paper describes a general approach for stochastic modeling of assets
returns and liability cash-flows of a typical pensions insurer. On the asset
side, we model the investment returns on equities and various classes of
fixed-income instruments including short- and long-maturity fixed-rate bonds as
well as index-linked and corporate bonds. On the liability side, the risks are
driven by future mortality developments as well as price and wage inflation.
All the risk factors are modeled as a multivariate stochastic process that
captures the dynamics and the dependencies across different risk factors. The
model is easy to interpret and to calibrate to both historical data and to
forecasts or expert views concerning the future. The simple structure of the
model allows for efficient computations. The construction of a million
scenarios takes only a few minutes on a personal computer. The approach is
illustrated with an asset-liability analysis of a defined benefit pension fund.

arXiv link: http://arxiv.org/abs/2005.09974v1

Econometrics arXiv paper, submitted: 2020-05-20

Uniform Rates for Kernel Estimators of Weakly Dependent Data

Authors: Juan Carlos Escanciano

This paper provides new uniform rate results for kernel estimators of
absolutely regular stationary processes that are uniform in the bandwidth and
in infinite-dimensional classes of dependent variables and regressors. Our
results are useful for establishing asymptotic theory for two-step
semiparametric estimators in time series models. We apply our results to obtain
nonparametric estimates and their rates for Expected Shortfall processes.

arXiv link: http://arxiv.org/abs/2005.09951v1

Econometrics arXiv updated paper (originally submitted: 2020-05-19)

Treatment recommendation with distributional targets

Authors: Anders Bredahl Kock, David Preinerstorfer, Bezirgen Veliyev

We study the problem of a decision maker who must provide the best possible
treatment recommendation based on an experiment. The desirability of the
outcome distribution resulting from the policy recommendation is measured
through a functional capturing the distributional characteristic that the
decision maker is interested in optimizing. This could be, e.g., its inherent
inequality, welfare, level of poverty or its distance to a desired outcome
distribution. If the functional of interest is not quasi-convex or if there are
constraints, the optimal recommendation may be a mixture of treatments. This
vastly expands the set of recommendations that must be considered. We
characterize the difficulty of the problem by obtaining maximal expected regret
lower bounds. Furthermore, we propose two (near) regret-optimal policies. The
first policy is static and thus applicable irrespective of whether subjects
arrive sequentially during the experimentation phase. The second
policy can exploit the sequential arrival of subjects by successively
eliminating inferior treatments and thus spends the sampling effort where it is
most needed.

arXiv link: http://arxiv.org/abs/2005.09717v4

Econometrics arXiv updated paper (originally submitted: 2020-05-19)

Evaluating Policies Early in a Pandemic: Bounding Policy Effects with Nonrandomly Missing Data

Authors: Brantly Callaway, Tong Li

During the early part of the Covid-19 pandemic, national and local
governments introduced a number of policies to combat the spread of Covid-19.
In this paper, we propose a new approach to bound the effects of such
early-pandemic policies on Covid-19 cases and other outcomes while dealing with
complications arising from (i) limited availability of Covid-19 tests, (ii)
differential availability of Covid-19 tests across locations, and (iii)
eligibility requirements for individuals to be tested. We use our approach to
study the effects of Tennessee's expansion of Covid-19 testing early in the
pandemic and find that the policy decreased Covid-19 cases.

arXiv link: http://arxiv.org/abs/2005.09605v6

Econometrics arXiv paper, submitted: 2020-05-19

Instrumental Variables with Treatment-Induced Selection: Exact Bias Results

Authors: Felix Elwert, Elan Segarra

Instrumental variables (IV) estimation suffers selection bias when the
analysis conditions on the treatment. Judea Pearl's early graphical definition
of instrumental variables explicitly prohibited conditioning on the treatment.
Nonetheless, the practice remains common. In this paper, we derive exact
analytic expressions for IV selection bias across a range of data-generating
models, and for various selection-inducing procedures. We present four sets of
results for linear models. First, IV selection bias depends on the conditioning
procedure (covariate adjustment vs. sample truncation). Second, IV selection
bias due to covariate adjustment is the limiting case of IV selection bias due
to sample truncation. Third, in certain models, the IV and OLS estimators under
selection bound the true causal effect in large samples. Fourth, we
characterize situations where IV remains preferred to OLS despite selection on
the treatment. These results broaden the notion of IV selection bias beyond
sample truncation, replace prior simulation findings with exact analytic
formulas, and enable formal sensitivity analyses.

arXiv link: http://arxiv.org/abs/2005.09583v1

Econometrics arXiv paper, submitted: 2020-05-19

A Flexible Stochastic Conditional Duration Model

Authors: Samuel Gingras, William J. McCausland

We introduce a new stochastic duration model for transaction times in asset
markets. We argue that widely accepted rules for aggregating seemingly related
trades mislead inference pertaining to durations between unrelated trades:
while any two trades executed in the same second are probably related, it is
extremely unlikely that all such pairs of trades are, in a typical sample. By
placing uncertainty about which trades are related within our model, we improve
inference for the distribution of durations between unrelated trades,
especially near zero. We introduce a normalized conditional distribution for
durations between unrelated trades that is both flexible and amenable to
shrinkage towards an exponential distribution, which we argue is an appropriate
first-order model. Thanks to highly efficient draws of state variables,
numerical efficiency of posterior simulation is much higher than in previous
studies. In an empirical application, we find that the conditional hazard
function for durations between unrelated trades varies much less than what most
studies find. We claim that this is because we avoid statistical artifacts that
arise from deterministic trade-aggregation rules and unsuitable parametric
distributions.

arXiv link: http://arxiv.org/abs/2005.09166v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2020-05-18

Is being an only child harmful to psychological health?: Evidence from an instrumental variable analysis of China's One-Child Policy

Authors: Shuxi Zeng, Fan Li, Peng Ding

This paper evaluates the effects of being an only child in a family on
psychological health, leveraging data on the One-Child Policy in China. We use
an instrumental variable approach to address the potential unmeasured
confounding between the fertility decision and psychological health, where the
instrumental variable is an index on the intensity of the implementation of the
One-Child Policy. We establish an analytical link between the local
instrumental variable approach and principal stratification to accommodate the
continuous instrumental variable. Within the principal stratification
framework, we postulate a Bayesian hierarchical model to infer various causal
estimands of policy interest while adjusting for the clustering data structure.
We apply the method to the data from the China Family Panel Studies and find
small but statistically significant negative effects of being an only child on
self-reported psychological health for some subpopulations. Our analysis
reveals treatment effect heterogeneity with respect to both observed and
unobserved characteristics. In particular, urban males suffer the most from
being only children, and the negative effect has larger magnitude if the
families were more resistant to the One-Child Policy. We also conduct
sensitivity analysis to assess the key instrumental variable assumption.

arXiv link: http://arxiv.org/abs/2005.09130v2

Econometrics arXiv updated paper (originally submitted: 2020-05-18)

Role models and revealed gender-specific costs of STEM in an extended Roy model of major choice

Authors: Marc Henry, Romuald Meango, Ismael Mourifie

We derive sharp bounds on the non-consumption utility component in an
extended Roy model of sector selection. We interpret this non-consumption
utility component as a compensating wage differential. The bounds are derived
under the assumption that potential utilities in each sector are (jointly)
stochastically monotone with respect to an observed selection shifter. The
research is motivated by the analysis of women's choice of university major,
their underrepresentation in mathematics-intensive fields, and the impact of
role models on choices and outcomes. To illustrate our methodology, we
investigate the cost of STEM fields with data from a German graduate survey,
and using the mother's education level and the proportion of women on the STEM
faculty at the time of major choice as selection shifters.

arXiv link: http://arxiv.org/abs/2005.09095v4

Econometrics arXiv paper, submitted: 2020-05-18

Irregular Identification of Structural Models with Nonparametric Unobserved Heterogeneity

Authors: Juan Carlos Escanciano

One of the most important empirical findings in microeconometrics is the
pervasiveness of heterogeneity in economic behaviour (cf. Heckman 2001). This
paper shows that cumulative distribution functions and quantiles of the
nonparametric unobserved heterogeneity have an infinite efficiency bound in
many structural economic models of interest. The paper presents a relatively
simple check of this fact. The usefulness of the theory is demonstrated with
several relevant examples in economics, including, among others, the proportion
of individuals with severe long term unemployment duration, the average
marginal effect and the proportion of individuals with a positive marginal
effect in a correlated random coefficient model with heterogenous first-stage
effects, and the distribution and quantiles of random coefficients in linear,
binary and the Mixed Logit models. Monte Carlo simulations illustrate the
finite sample implications of our findings for the distribution and quantiles
of the random coefficients in the Mixed Logit model.

arXiv link: http://arxiv.org/abs/2005.08611v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-05-16

Nested Model Averaging on Solution Path for High-dimensional Linear Regression

Authors: Yang Feng, Qingfeng Liu

We study the nested model averaging method on the solution path for a
high-dimensional linear regression problem. In particular, we propose to
combine model averaging with regularized estimators (e.g., lasso and SLOPE) on
the solution path for high-dimensional linear regression. In simulation
studies, we first conduct a systematic investigation on the impact of predictor
ordering on the behavior of nested model averaging, then show that nested model
averaging with lasso and SLOPE compares favorably with other competing methods,
including the infeasible lasso and SLOPE with the tuning parameter optimally
selected. A real data analysis on predicting the per capita violent crime in
the United States shows an outstanding performance of the nested model
averaging with lasso.

arXiv link: http://arxiv.org/abs/2005.08057v1
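
A minimal sketch of nested model averaging along the lasso solution path: order predictors by when they enter the path, fit nested OLS models, and average their predictions. The holdout-based inverse-MSE weights are a simple stand-in for the paper's weighting scheme, and the data are synthetic.

    import numpy as np
    from sklearn.linear_model import LinearRegression, lasso_path
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(11)
    n, p = 300, 40
    X = rng.normal(size=(n, p))
    beta = np.concatenate([[2.0, -1.5, 1.0], np.zeros(p - 3)])
    y = X @ beta + rng.normal(size=n)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

    # Order in which predictors enter the lasso path (alphas run from large to small).
    _, coefs, _ = lasso_path(X_tr, y_tr)
    first_nonzero = [np.flatnonzero(coefs[j])[0] if np.any(coefs[j]) else coefs.shape[1]
                     for j in range(p)]
    entry_order = np.argsort(first_nonzero)

    # Nested models: the first m predictors to enter, averaged with inverse-MSE weights.
    preds, losses = [], []
    for m in range(1, 16):
        cols = entry_order[:m]
        fit = LinearRegression().fit(X_tr[:, cols], y_tr)
        pr = fit.predict(X_val[:, cols])
        preds.append(pr)
        losses.append(np.mean((y_val - pr) ** 2))
    w = 1.0 / np.array(losses)
    w /= w.sum()
    avg_pred = np.average(np.vstack(preds), axis=0, weights=w)
    print("holdout MSE of the averaged nested model:",
          round(float(np.mean((y_val - avg_pred) ** 2)), 3))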

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2020-05-16

Conformal Prediction: a Unified Review of Theory and New Challenges

Authors: Matteo Fontana, Gianluca Zeni, Simone Vantini

In this work we provide a review of basic ideas and novel developments about
Conformal Prediction -- an innovative distribution-free, non-parametric
forecasting method, based on minimal assumptions -- that is able to yield, in a
very straightforward way, prediction sets that are valid in a statistical sense
also in the finite-sample case. The in-depth discussion provided in the
paper covers the theoretical underpinnings of Conformal Prediction, and then
proceeds to list the more advanced developments and adaptations of the original
idea.
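
As a concrete illustration of the basic idea reviewed above, the following
minimal Python sketch implements split conformal prediction for regression with
a generic scikit-learn model; the linear model and the simulated data are
illustrative choices, not part of the review.

import numpy as np
from sklearn.linear_model import LinearRegression

def split_conformal_interval(X, y, X_new, alpha=0.1, seed=0):
    """(1 - alpha) prediction intervals for X_new via split conformal prediction."""
    rng = np.random.default_rng(seed)
    n = len(y)
    idx = rng.permutation(n)
    train, calib = idx[: n // 2], idx[n // 2 :]

    model = LinearRegression().fit(X[train], y[train])

    # Absolute residuals on the calibration half serve as conformity scores.
    scores = np.abs(y[calib] - model.predict(X[calib]))

    # Finite-sample-valid quantile of the calibration scores.
    k = int(np.ceil((len(calib) + 1) * (1 - alpha)))
    q = np.sort(scores)[min(k, len(calib)) - 1]

    pred = model.predict(X_new)
    return pred - q, pred + q

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=500)
lo, hi = split_conformal_interval(X, y, X[:5], alpha=0.1)
print(lo, hi)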

arXiv link: http://arxiv.org/abs/2005.07972v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-05-15

Fast and Accurate Variational Inference for Models with Many Latent Variables

Authors: Rubén Loaiza-Maya, Michael Stanley Smith, David J. Nott, Peter J. Danaher

Models with a large number of latent variables are often used to fully
utilize the information in big or complex data. However, they can be difficult
to estimate using standard approaches, and variational inference methods are a
popular alternative. Key to the success of these is the selection of an
approximation to the target density that is accurate, tractable and fast to
calibrate using optimization methods. Most existing choices can be inaccurate
or slow to calibrate when there are many latent variables. Here, we propose a
family of tractable variational approximations that are more accurate and
faster to calibrate for this case. It combines a parsimonious parametric
approximation for the parameter posterior, with the exact conditional posterior
of the latent variables. We derive a simplified expression for the
re-parameterization gradient of the variational lower bound, which is the main
ingredient of efficient optimization algorithms used to implement variational
estimation. To do so only requires the ability to generate exactly or
approximately from the conditional posterior of the latent variables, rather
than to compute its density. We illustrate using two complex contemporary
econometric examples. The first is a nonlinear multivariate state space model
for U.S. macroeconomic variables. The second is a random coefficients tobit
model applied to two million sales by 20,000 individuals in a large consumer
panel from a marketing study. In both cases, we show that our approximating
family is considerably more accurate than mean field or structured Gaussian
approximations, and faster than Markov chain Monte Carlo. Last, we show how to
implement data sub-sampling in variational inference for our approximation,
which can lead to a further reduction in computation time. MATLAB code
implementing the method for our examples is included in supplementary material.

arXiv link: http://arxiv.org/abs/2005.07430v3

Econometrics arXiv paper, submitted: 2020-05-14

Dynamic shrinkage in time-varying parameter stochastic volatility in mean models

Authors: Florian Huber, Michael Pfarrhofer

Successful forecasting models strike a balance between parsimony and
flexibility. This is often achieved by employing suitable shrinkage priors that
penalize model complexity but also reward model fit. In this note, we modify
the stochastic volatility in mean (SVM) model proposed in Chan (2017) by
introducing state-of-the-art shrinkage techniques that allow for time-variation
in the degree of shrinkage. Using a real-time inflation forecast exercise, we
show that employing more flexible prior distributions on several key parameters
slightly improves forecast performance for the United States (US), the United
Kingdom (UK) and the Euro Area (EA). Comparing in-sample results reveals that
our proposed model yields qualitatively similar insights to the original
version of the model.

arXiv link: http://arxiv.org/abs/2005.06851v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2020-05-14

Combining Population and Study Data for Inference on Event Rates

Authors: Christoph Rothe

This note considers the problem of conducting statistical inference on the
share of individuals in some subgroup of a population that experience some
event. The specific complication is that the size of the subgroup needs to be
estimated, whereas the number of individuals that experience the event is
known. The problem is motivated by the recent study of Streeck et al. (2020),
who estimate the infection fatality rate (IFR) of SARS-CoV-2 infection in a
German town that experienced a super-spreading event in mid-February 2020. In
their case the subgroup of interest is comprised of all infected individuals,
and the event is death caused by the infection. We clarify issues with the
precise definition of the target parameter in this context, and propose
confidence intervals (CIs) based on classical statistical principles that
result in good coverage properties.
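
The structure of the problem can be illustrated with a minimal Python sketch
(not the paper's proposed procedure): the number of events D in the subgroup is
known, the subgroup share p must be estimated from a test sample, and a simple
normal-approximation interval for p is inverted into an interval for the rate
D / (p * N). All numbers below are made up.

import numpy as np
from scipy.stats import norm

D = 7            # known number of events (e.g. deaths) in the subgroup
N = 12_000       # known population size
x, m = 138, 900  # positives and sample size used to estimate the subgroup share

p_hat = x / m
se = np.sqrt(p_hat * (1 - p_hat) / m)
z = norm.ppf(0.975)
p_lo, p_hi = p_hat - z * se, p_hat + z * se

rate_hat = D / (p_hat * N)
rate_ci = (D / (p_hi * N), D / (p_lo * N))   # the rate is decreasing in p
print(rate_hat, rate_ci)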

arXiv link: http://arxiv.org/abs/2005.06769v1

Econometrics arXiv updated paper (originally submitted: 2020-05-12)

Moment Conditions for Dynamic Panel Logit Models with Fixed Effects

Authors: Bo E. Honoré, Martin Weidner

This paper investigates the construction of moment conditions in discrete
choice panel data with individual specific fixed effects. We describe how to
systematically explore the existence of moment conditions that do not depend on
the fixed effects, and we demonstrate how to construct them when they exist.
Our approach is closely related to the numerical "functional differencing"
construction in Bonhomme (2012), but our emphasis is to find explicit analytic
expressions for the moment functions. We first explain the construction and
give examples of such moment conditions in various models. Then, we focus on
the dynamic binary choice logit model and explore the implications of the
moment conditions for identification and estimation of the model parameters
that are common to all individuals.

arXiv link: http://arxiv.org/abs/2005.05942v7

Econometrics arXiv updated paper (originally submitted: 2020-05-11)

Fractional trends and cycles in macroeconomic time series

Authors: Tobias Hartl, Rolf Tschernig, Enzo Weber

We develop a generalization of correlated trend-cycle decompositions that
avoids prior assumptions about the long-run dynamic characteristics by
modelling the permanent component as a fractionally integrated process and
incorporating a fractional lag operator into the autoregressive polynomial of
the cyclical component. The model allows for an endogenous estimation of the
integration order jointly with the other model parameters and, therefore, no
prior specification tests with respect to persistence are required. We relate
the model to the Beveridge-Nelson decomposition and derive a modified Kalman
filter estimator for the fractional components. Identification, consistency,
and asymptotic normality of the maximum likelihood estimator are shown. For US
macroeconomic data we demonstrate that, unlike $I(1)$ correlated unobserved
components models, the new model estimates a smooth trend together with a cycle
hitting all NBER recessions. While $I(1)$ unobserved components models yield an
upward-biased signal-to-noise ratio whenever the integration order of the
data-generating mechanism is greater than one, the fractionally integrated
model attributes less variation to the long-run shocks due to the fractional
trend specification and a higher variation to the cycle shocks due to the
fractional lag operator, leading to more persistent cycles and smooth trend
estimates that reflect macroeconomic common sense.

arXiv link: http://arxiv.org/abs/2005.05266v2

Econometrics arXiv paper, submitted: 2020-05-11

Macroeconomic Forecasting with Fractional Factor Models

Authors: Tobias Hartl

We combine high-dimensional factor models with fractional integration methods
and derive models where nonstationary, potentially cointegrated data of
different persistence are modelled as a function of common fractionally
integrated factors. A two-stage estimator, which combines principal components
and the Kalman filter, is proposed. The forecast performance is studied for a
high-dimensional US macroeconomic data set, where we find that benefits from
the fractional factor models can be substantial, as they outperform univariate
autoregressions, principal components, and the factor-augmented
error-correction model.

arXiv link: http://arxiv.org/abs/2005.04897v1

Econometrics arXiv updated paper (originally submitted: 2020-05-11)

Posterior Probabilities for Lorenz and Stochastic Dominance of Australian Income Distributions

Authors: David Gunawan, William E. Griffiths, Duangkamon Chotikapanich

Using HILDA data for the years 2001, 2006, 2010, 2014 and 2017, we compute
posterior probabilities for dominance for all pairwise comparisons of income
distributions in these years. The dominance criteria considered are Lorenz
dominance and first and second order stochastic dominance. The income
distributions are estimated using an infinite mixture of gamma density
functions, with posterior probabilities computed as the proportion of Markov
chain Monte Carlo draws that satisfy the inequalities that define the dominance
criteria. We find welfare improvements from 2001 to 2006 and qualified
improvements from 2006 to the later three years. Evidence of an ordering
between 2010, 2014 and 2017 cannot be established.
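
The posterior-probability calculation described above can be sketched in a few
lines of Python: given MCMC draws of two distributions' CDFs on a common income
grid, the posterior probability of first-order stochastic dominance is the
share of draws in which one CDF lies weakly below the other at every grid
point. The draws below are simulated placeholders, not HILDA-based estimates.

import numpy as np

rng = np.random.default_rng(0)
n_draws, n_grid = 5000, 100
grid = np.linspace(0.0, 1.0, n_grid)

# Placeholder posterior draws of the two CDFs (distribution B shifted to the
# right, so its CDF lies below that of A).
F_A = np.clip(grid + 0.005 * rng.normal(size=(n_draws, n_grid)), 0, 1)
F_B = np.clip(grid - 0.03 + 0.005 * rng.normal(size=(n_draws, n_grid)), 0, 1)

# Draw-wise check of the inequalities defining first-order stochastic dominance.
b_dominates_a = np.all(F_B <= F_A, axis=1)
print("posterior P(B first-order dominates A):", b_dominates_a.mean())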

arXiv link: http://arxiv.org/abs/2005.04870v2

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2020-05-09

Probabilistic Multi-Step-Ahead Short-Term Water Demand Forecasting with Lasso

Authors: Jens Kley-Holsteg, Florian Ziel

Water demand is a highly important variable for operational control and
decision making. Hence, the development of accurate forecasts is a valuable
field of research to further improve the efficiency of water utilities.
Focusing on probabilistic multi-step-ahead forecasting, a time series model is
introduced to capture typical autoregressive, calendar and seasonal effects,
to account for time-varying variance, and to quantify the uncertainty and
path-dependency of the water demand process. To deal with the high complexity
of the water demand process, a high-dimensional feature space is applied, which
is efficiently tuned by an automatic shrinkage and selection operator (lasso).
This yields an accurate, easily interpretable and fast-to-compute
forecasting model, which is well suited for real-time applications. The
complete probabilistic forecasting framework allows for simulating not only the
mean and the marginal properties, but also the correlation structure between
hours within the forecasting horizon. For practitioners, complete probabilistic
multi-step-ahead forecasts are of considerable relevance as they provide
additional information about the expected aggregated or cumulative water
demand, so that a statement can be made about the probability with which a
water storage capacity can guarantee the supply over a certain period of time.
This information allows for better control of storage capacities and helps
ensure the smooth operation of pumps. To appropriately evaluate the forecasting
performance of the considered models, the energy score (ES), a strictly
proper multidimensional evaluation criterion, is introduced. The methodology is
applied to the hourly water demand data of a German water supplier.
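
The energy score used as the evaluation criterion can be computed from an
ensemble of simulated demand paths as in the following minimal Python sketch;
the ensemble here is random placeholder data rather than model output.

import numpy as np

def energy_score(samples, observed):
    """samples: (m, h) ensemble of h-step paths; observed: (h,) realized path."""
    m = samples.shape[0]
    term1 = np.mean(np.linalg.norm(samples - observed, axis=1))
    diffs = samples[:, None, :] - samples[None, :, :]
    term2 = np.sum(np.linalg.norm(diffs, axis=2)) / (m * m)
    return term1 - 0.5 * term2

rng = np.random.default_rng(0)
ensemble = rng.normal(loc=100.0, scale=5.0, size=(200, 24))  # 24-hour paths
realized = rng.normal(loc=100.0, scale=5.0, size=24)
print(energy_score(ensemble, realized))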

arXiv link: http://arxiv.org/abs/2005.04522v1

Econometrics arXiv updated paper (originally submitted: 2020-05-08)

Critical Values Robust to P-hacking

Authors: Adam McCloskey, Pascal Michaillat

P-hacking is prevalent in reality but absent from classical hypothesis
testing theory. As a consequence, significant results are much more common than
they are supposed to be when the null hypothesis is in fact true. In this
paper, we build a model of hypothesis testing with p-hacking. From the model,
we construct critical values such that, if the values are used to determine
significance, and if scientists' p-hacking behavior adjusts to the new
significance standards, significant results occur with the desired frequency.
Such robust critical values allow for p-hacking, so they are larger than
classical critical values. To illustrate the amount of correction that
p-hacking might require, we calibrate the model using evidence from the medical
sciences. In the calibrated model the robust critical value for any test
statistic is the classical critical value for the same test statistic with one
fifth of the significance level.
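
The calibrated rule quoted above translates directly into code: the robust
critical value is the classical critical value evaluated at one fifth of the
nominal significance level. A minimal Python sketch for a two-sided z-test and
a chi-square test, with the 1/5 factor taken from the abstract:

from scipy.stats import norm, chi2

alpha = 0.05
z_classical = norm.ppf(1 - alpha / 2)            # approx. 1.96
z_robust = norm.ppf(1 - (alpha / 5) / 2)         # approx. 2.58

chi2_classical = chi2.ppf(1 - alpha, df=1)       # approx. 3.84
chi2_robust = chi2.ppf(1 - alpha / 5, df=1)      # approx. 6.63

print(z_classical, z_robust, chi2_classical, chi2_robust)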

arXiv link: http://arxiv.org/abs/2005.04141v8

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2020-05-08

How Reliable are Bootstrap-based Heteroskedasticity Robust Tests?

Authors: Benedikt M. Pötscher, David Preinerstorfer

We develop theoretical finite-sample results concerning the size of wild
bootstrap-based heteroskedasticity robust tests in linear regression models. In
particular, these results provide an efficient diagnostic check, which can be
used to weed out tests that are unreliable for a given testing problem in the
sense that they overreject substantially. This allows us to assess the
reliability of a large variety of wild bootstrap-based tests in an extensive
numerical study.
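
For reference, the object under study can be sketched as follows: a wild
bootstrap of an HC-robust t-statistic with Rademacher weights and restricted
residuals. This is a generic textbook implementation in Python, not the paper's
diagnostic check.

import numpy as np

def wild_bootstrap_pvalue(y, X, j, B=999, seed=0):
    """Test H0: beta_j = 0 with an HC0-robust t-stat and a wild bootstrap."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape

    def hc_tstat(y_, X_):
        XtX_inv = np.linalg.inv(X_.T @ X_)
        b = XtX_inv @ X_.T @ y_
        u = y_ - X_ @ b
        V = XtX_inv @ (X_.T * u**2) @ X_ @ XtX_inv     # HC0 covariance
        return b[j] / np.sqrt(V[j, j])

    t_obs = hc_tstat(y, X)

    # Restricted fit imposing beta_j = 0.
    X0 = np.delete(X, j, axis=1)
    b0 = np.linalg.lstsq(X0, y, rcond=None)[0]
    u0 = y - X0 @ b0

    t_boot = np.empty(B)
    for r in range(B):
        w = rng.choice([-1.0, 1.0], size=n)            # Rademacher weights
        y_star = X0 @ b0 + w * u0
        t_boot[r] = hc_tstat(y_star, X)
    return np.mean(np.abs(t_boot) >= np.abs(t_obs))

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 1.0 + np.abs(X[:, 1]) * rng.normal(size=n)         # heteroskedastic, H0 true
print(wild_bootstrap_pvalue(y, X, j=1))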

arXiv link: http://arxiv.org/abs/2005.04089v2

Econometrics arXiv updated paper (originally submitted: 2020-05-08)

Fractional trends in unobserved components models

Authors: Tobias Hartl, Rolf Tschernig, Enzo Weber

We develop a generalization of unobserved components models that allows for a
wide range of long-run dynamics by modelling the permanent component as a
fractionally integrated process. The model does not require stationarity and
can be cast in state space form. In a multivariate setup, fractional trends may
yield a cointegrated system. We derive the Kalman filter estimator for the
common fractionally integrated component and establish consistency and
asymptotic (mixed) normality of the maximum likelihood estimator. We apply the
model to extract a common long-run component of three US inflation measures,
where we show that the $I(1)$ assumption is likely to be violated for the
common trend.

arXiv link: http://arxiv.org/abs/2005.03988v2

Econometrics arXiv updated paper (originally submitted: 2020-05-08)

Dynamic Shrinkage Priors for Large Time-varying Parameter Regressions using Scalable Markov Chain Monte Carlo Methods

Authors: Niko Hauzenberger, Florian Huber, Gary Koop

Time-varying parameter (TVP) regression models can involve a huge number of
coefficients. Careful prior elicitation is required to yield sensible posterior
and predictive inferences. In addition, the computational demands of Markov
Chain Monte Carlo (MCMC) methods mean their use is limited to the case where
the number of predictors is not too large. In light of these two concerns, this
paper proposes a new dynamic shrinkage prior which reflects the empirical
regularity that TVPs are typically sparse (i.e. time variation may occur only
episodically and only for some of the coefficients). A scalable MCMC algorithm
is developed which is capable of handling very high dimensional TVP regressions
or TVP Vector Autoregressions. In an exercise using artificial data we
demonstrate the accuracy and computational efficiency of our methods. In an
application involving the term structure of interest rates in the eurozone, we
find our dynamic shrinkage prior to effectively pick out small amounts of
parameter change and our methods to forecast well.

arXiv link: http://arxiv.org/abs/2005.03906v2

Econometrics arXiv updated paper (originally submitted: 2020-05-07)

Know Your Clients' behaviours: a cluster analysis of financial transactions

Authors: John R. J. Thompson, Longlong Feng, R. Mark Reesor, Chuck Grace

In Canada, financial advisors and dealers are required by provincial
securities commissions and self-regulatory organizations--charged with direct
regulation over investment dealers and mutual fund dealers--to respectively
collect and maintain Know Your Client (KYC) information, such as their age or
risk tolerance, for investor accounts. With this information, investors, under
their advisor's guidance, make decisions on their investments which are
presumed to be beneficial to their investment goals. Our unique dataset is
provided by a financial investment dealer with over 50,000 accounts for over
23,000 clients. We use a modified behavioural finance recency, frequency,
monetary model for engineering features that quantify investor behaviours, and
machine learning clustering algorithms to find groups of investors that behave
similarly. We show that the KYC information collected does not explain client
behaviours, whereas trade and transaction frequency and volume are most
informative. We believe the results shown herein encourage financial regulators
and advisors to use more advanced metrics to better understand and predict
investor behaviours.
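
A minimal Python sketch of the recency-frequency-monetary-plus-clustering
pipeline described above, on made-up transaction data; the feature definitions
and the choice of four clusters are placeholders rather than the paper's exact
setup.

import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
tx = pd.DataFrame({
    "account": rng.integers(0, 500, size=10_000),
    "days_ago": rng.integers(0, 365, size=10_000),
    "amount": rng.gamma(2.0, 500.0, size=10_000),
})

rfm = tx.groupby("account").agg(
    recency=("days_ago", "min"),       # days since the most recent trade
    frequency=("days_ago", "size"),    # number of trades
    monetary=("amount", "sum"),        # total traded volume
)

Z = StandardScaler().fit_transform(rfm)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(Z)
rfm["cluster"] = labels
print(rfm.groupby("cluster").mean())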

arXiv link: http://arxiv.org/abs/2005.03625v2

Econometrics arXiv paper, submitted: 2020-05-07

Diffusion Copulas: Identification and Estimation

Authors: Ruijun Bu, Kaddour Hadri, Dennis Kristensen

We propose a new semiparametric approach for modelling nonlinear univariate
diffusions, where the observed process is a nonparametric transformation of an
underlying parametric diffusion (UPD). This modelling strategy yields a general
class of semiparametric Markov diffusion models with parametric dynamic copulas
and nonparametric marginal distributions. We provide primitive conditions for
the identification of the UPD parameters together with the unknown
transformations from discrete samples. Likelihood-based estimators of both
parametric and nonparametric components are developed, and we analyze their
asymptotic properties. Kernel-based drift and diffusion estimators are
also proposed and shown to be normally distributed in large samples. A
simulation study investigates the finite sample performance of our estimators
in the context of modelling US short-term interest rates. We also present a
simple application of the proposed method for modelling the CBOE volatility
index data.

arXiv link: http://arxiv.org/abs/2005.03513v1

Econometrics arXiv updated paper (originally submitted: 2020-05-07)

Distributional robustness of K-class estimators and the PULSE

Authors: Martin Emil Jakobsen, Jonas Peters

While causal models are robust in that they are prediction optimal under
arbitrarily strong interventions, they may not be optimal when the
interventions are bounded. We prove that the classical K-class estimator
satisfies such optimality by establishing a connection between K-class
estimators and anchor regression. This connection further motivates a novel
estimator in instrumental variable settings that minimizes the mean squared
prediction error subject to the constraint that the estimator lies in an
asymptotically valid confidence region of the causal coefficient. We call this
estimator PULSE (p-uncorrelated least squares estimator), relate it to work on
invariance, show that it can be computed efficiently as a data-driven K-class
estimator, even though the underlying optimization problem is non-convex, and
prove consistency. We evaluate the estimators on real data and perform
simulation experiments illustrating that PULSE suffers from less variability.
In several settings, including weak instrument settings, it
outperforms other estimators.
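
For reference, the classical K-class estimator that the paper connects to
anchor regression is beta(k) = (X'(I - k M_Z)X)^{-1} X'(I - k M_Z)y, with M_Z
the residual-maker of the instruments; k = 0 gives OLS and k = 1 gives
two-stage least squares. The Python sketch below illustrates this on simulated
data; PULSE's data-driven choice of k is not reproduced here.

import numpy as np

def k_class(y, X, Z, k):
    n = len(y)
    M_Z = np.eye(n) - Z @ np.linalg.solve(Z.T @ Z, Z.T)
    W = np.eye(n) - k * M_Z
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Simulated example with one endogenous regressor and one instrument.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=(n, 1))
u = rng.normal(size=n)
x = z[:, 0] + 0.8 * u + rng.normal(size=n)
y = 2.0 * x + u
X, Z = x[:, None], z

print("OLS  (k=0):", k_class(y, X, Z, 0.0))   # biased upward
print("2SLS (k=1):", k_class(y, X, Z, 1.0))   # close to 2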

arXiv link: http://arxiv.org/abs/2005.03353v3

Econometrics arXiv updated paper (originally submitted: 2020-05-07)

Detecting Latent Communities in Network Formation Models

Authors: Shujie Ma, Liangjun Su, Yichong Zhang

This paper proposes a logistic undirected network formation model which
allows for assortative matching on observed individual characteristics and the
presence of edge-wise fixed effects. We model the coefficients of observed
characteristics to have a latent community structure and the edge-wise fixed
effects to be of low rank. We propose a multi-step estimation procedure
involving nuclear norm regularization, sample splitting, iterative logistic
regression and spectral clustering to detect the latent communities. We show
that the latent communities can be exactly recovered when the expected degree
of the network is of order log n or higher, where n is the number of nodes in
the network. The finite sample performance of the new estimation and inference
methods is illustrated through both simulated and real datasets.

arXiv link: http://arxiv.org/abs/2005.03226v3

Econometrics arXiv cross-link from physics.soc-ph (physics.soc-ph), submitted: 2020-05-06

Spatial dependence in the rank-size distribution of cities

Authors: Rolf Bergs

Power law distributions characterise several natural and social phenomena.
The Zipf law for cities is one of those. The study examines the question of
whether that global regularity is independent of different spatial
distributions of cities. For that purpose, a typical Zipfian rank-size
distribution of cities is generated with random numbers. This distribution is
then cast into different settings of spatial coordinates. For the estimation,
the variables rank and size are supplemented by spatial spillover effects in a
standard spatial econometric approach. Results suggest that distance and
contiguity effects matter. This finding is further corroborated by three
country analyses.
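
For context, the non-spatial baseline in this literature is the rank-size
regression log(rank) = a - b * log(size), with Zipf's law corresponding to b
close to 1. A minimal Python sketch on simulated Zipf-like city sizes; the
spatial spillover terms used in the paper are not included.

import numpy as np

rng = np.random.default_rng(0)
sizes = np.sort((rng.pareto(a=1.0, size=500) + 1) * 10_000)[::-1]  # Zipf-like
ranks = np.arange(1, len(sizes) + 1)

# OLS of log rank on log size.
X = np.column_stack([np.ones_like(sizes), np.log(sizes)])
coef, *_ = np.linalg.lstsq(X, np.log(ranks), rcond=None)
print("estimated Zipf coefficient b:", -coef[1])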

arXiv link: http://arxiv.org/abs/2005.02836v1

Econometrics arXiv updated paper (originally submitted: 2020-05-05)

Arctic Amplification of Anthropogenic Forcing: A Vector Autoregressive Analysis

Authors: Philippe Goulet Coulombe, Maximilian Göbel

On September 15th 2020, Arctic sea ice extent (SIE) ranked second-to-lowest
in history and keeps trending downward. The understanding of how feedback loops
amplify the effects of external CO2 forcing is still limited. We propose the
VARCTIC, which is a Vector Autoregression (VAR) designed to capture and
extrapolate Arctic feedback loops. VARs are dynamic simultaneous systems of
equations, routinely estimated to predict and understand the interactions of
multiple macroeconomic time series. The VARCTIC is a parsimonious compromise
between full-blown climate models and purely statistical approaches that
usually offer little explanation of the underlying mechanism. Our completely
unconditional forecast has SIE hitting 0 in September by the 2060's. Impulse
response functions reveal that anthropogenic CO2 emission shocks have an
unusually durable effect on SIE -- a property shared by no other shock. We find
Albedo- and Thickness-based feedbacks to be the main amplification channels
through which CO2 anomalies impact SIE in the short/medium run. Further,
conditional forecast analyses reveal that the future path of SIE crucially
depends on the evolution of CO2 emissions, with outcomes ranging from
recovering SIE to it reaching 0 in the 2050's. Finally, Albedo and Thickness
feedbacks are shown to play an important role in accelerating the speed at
which predicted SIE is heading towards 0.

arXiv link: http://arxiv.org/abs/2005.02535v4

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-05-05

Modeling High-Dimensional Unit-Root Time Series

Authors: Zhaoxing Gao, Ruey S. Tsay

This paper proposes a new procedure to build factor models for
high-dimensional unit-root time series by postulating that a $p$-dimensional
unit-root process is a nonsingular linear transformation of a set of unit-root
processes, a set of stationary common factors, which are dynamically dependent,
and some idiosyncratic white noise components. For the stationary components,
we assume that the factor process captures the temporal-dependence and the
idiosyncratic white noise series explains, jointly with the factors, the
cross-sectional dependence. The estimation of nonsingular linear loading spaces
is carried out in two steps. First, we use an eigenanalysis of a nonnegative
definite matrix of the data to separate the unit-root processes from the
stationary ones and a modified method to specify the number of unit roots. We
then employ another eigenanalysis and a projected principal component analysis
to identify the stationary common factors and the white noise series. We
propose a new procedure to specify the number of white noise series and, hence,
the number of stationary common factors, establish asymptotic properties of the
proposed method for both fixed and diverging $p$ as the sample size $n$
increases, and use simulation and a real example to demonstrate the performance
of the proposed method in finite samples. We also compare our method with some
commonly used ones in the literature regarding the forecast ability of the
extracted factors and find that the proposed method performs well in
out-of-sample forecasting of a 508-dimensional PM$_{2.5}$ series in Taiwan.

arXiv link: http://arxiv.org/abs/2005.03496v2

Econometrics arXiv paper, submitted: 2020-05-05

Stocks Vote with Their Feet: Can a Piece of Paper Document Fights the COVID-19 Pandemic?

Authors: J. Su, Q. Zhong

Assessing the trend of the COVID-19 pandemic and policy effectiveness is
essential for both policymakers and stock investors, but challenging because
the crisis has unfolded with extreme speed and the previous index was not
suitable for measuring policy effectiveness for COVID-19. This paper builds an
index of policy effectiveness in fighting the COVID-19 pandemic, constructed
similarly to the Policy Uncertainty index, based on province-level
paper documents released in China from Jan. 1st to Apr. 16th, 2020. This paper
also studies the relationships among COVID-19 daily confirmed cases, stock
market volatility, and document-based policy effectiveness in China. It
uses the DCC-GARCH model to fit the time-varying conditional covariances of the
multiple series. Finally, it tests four hypotheses about spatio-temporal
differences in policy effectiveness and its spillover effects on both the
COVID-19 pandemic and the stock market. Through the interactions within this
triad structure, we can offer more specific and scientific suggestions for
maintaining stock market stability in such exceptional times.

arXiv link: http://arxiv.org/abs/2005.02034v1

Econometrics arXiv updated paper (originally submitted: 2020-05-05)

Identifying Preferences when Households are Financially Constrained

Authors: Andreas Tryphonides

This paper shows that utilizing information on the extensive margin of
financially constrained households can narrow down the set of admissible
preferences in a large class of macroeconomic models. Estimates based on
Spanish aggregate data provide further empirical support for this result and
suggest that accounting for this margin can bring estimates closer to
microeconometric evidence. Accounting for financial constraints and the
extensive margin is shown to matter for empirical asset pricing and quantifying
distortions in financial markets.

arXiv link: http://arxiv.org/abs/2005.02010v6

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-05-04

The Murphy Decomposition and the Calibration-Resolution Principle: A New Perspective on Forecast Evaluation

Authors: Marc-Oliver Pohle

I provide a unifying perspective on forecast evaluation, characterizing
accurate forecasts of all types, from simple point to complete probabilistic
forecasts, in terms of two fundamental underlying properties, autocalibration
and resolution, which can be interpreted as describing a lack of systematic
mistakes and a high information content. This "calibration-resolution
principle" gives a new insight into the nature of forecasting and generalizes
the famous sharpness principle by Gneiting et al. (2007) from probabilistic to
all types of forecasts. Among other things, it exposes the shortcomings of
several widely used forecast evaluation methods. The principle is based on a fully
general version of the Murphy decomposition of loss functions, which I provide.
Special cases of this decomposition are well-known and widely used in
meteorology.
Besides using the decomposition in this new theoretical way, after having
introduced it and the underlying properties in a proper theoretical framework,
accompanied by an illustrative example, I also employ it in its classical sense
as a forecast evaluation method, as meteorologists do: as such, it unveils
the driving forces behind forecast errors and complements classical forecast
evaluation methods. I discuss estimation of the decomposition via kernel
regression and then apply it to popular economic forecasts. Analysis of mean
forecasts from the US Survey of Professional Forecasters and quantile forecasts
derived from Bank of England fan charts indeed yields interesting new insights
and highlight the potential of the method.

arXiv link: http://arxiv.org/abs/2005.01835v1

Econometrics arXiv cross-link from q-fin.RM (q-fin.RM), submitted: 2020-05-04

Neural Networks and Value at Risk

Authors: Alexander Arimond, Damian Borth, Andreas Hoepner, Michael Klawunn, Stefan Weisheit

Utilizing a generative regime switching framework, we perform Monte-Carlo
simulations of asset returns for Value at Risk threshold estimation. Using
equity markets and long term bonds as test assets in the global, US, Euro area
and UK setting over an up to 1,250 weeks sample horizon ending in August 2018,
we investigate neural networks along three design steps relating (i) to the
initialization of the neural network, (ii) its incentive function according to
which it has been trained and (iii) the amount of data we feed. First, we
compare neural networks with random seeding with networks that are initialized
via estimations from the best-established model (i.e. the Hidden Markov). We
find the latter to outperform in terms of the frequency of VaR breaches (i.e. the
realized return falling short of the estimated VaR threshold). Second, we
balance the incentive structure of the loss function of our networks by adding
a second objective to the training instructions so that the neural networks
optimize for accuracy while also aiming to stay in empirically realistic regime
distributions (i.e. bull vs. bear market frequencies). In particular this
design feature enables the balanced incentive recurrent neural network (RNN) to
outperform the single incentive RNN as well as any other neural network or
established approach by statistically and economically significant levels.
Third, we halve our training data set of 2,000 days. We find that our networks,
when fed with substantially less data (i.e. 1,000 days), perform significantly
worse, which highlights a crucial weakness of neural networks in their
dependence on very large data sets ...

arXiv link: http://arxiv.org/abs/2005.01686v2

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2020-05-04

The Information Content of Taster's Valuation in Tea Auctions of India

Authors: Abhinandan Dalal, Diganta Mukherjee, Subhrajyoty Roy

Tea auctions across India occur as an ascending open auction, conducted
online. Before the auction, a sample of the tea lot is sent to potential
bidders and a group of tea tasters. The seller's reserve price is a
confidential function of the tea taster's valuation, which also possibly acts
as a signal to the bidders.
In this paper, we work with the dataset from a single tea auction house, J
Thomas, of tea dust category, on 49 weeks in the time span of 2018-2019, with
the following objectives in mind:
$\bullet$ Objective classification of the various categories of tea dust (25)
into a more manageable and robust grouping, based on
source and grades.
$\bullet$ Predict which tea lots would be sold in the auction market, and a
model for the final price conditioned on sale.
$\bullet$ To study the distribution of price and ratio of the sold tea
auction lots.
$\bullet$ Make a detailed analysis of the information obtained from the tea
taster's valuation and its impact on the final auction price.
The model used has shown various promising results on cross-validation. The
importance of valuation is firmly established through analysis of causal
relationship between the valuation and the actual price. The authors hope that
this study of the properties and the detailed analysis of the role played by
the various factors would be significant in the decision-making process for
the players of the auction game, pave the way to remove the manual interference
in an attempt to automate the auction procedure, and improve tea quality in
markets.

arXiv link: http://arxiv.org/abs/2005.02814v1

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2020-05-04

Ensemble Forecasting for Intraday Electricity Prices: Simulating Trajectories

Authors: Michał Narajewski, Florian Ziel

Recent studies concerning point electricity price forecasting have shown
evidence that the hourly German Intraday Continuous Market is weak-form
efficient. Therefore, we take a novel, advanced approach to the problem.
Probabilistic forecasting of the hourly intraday electricity prices is
performed by simulating trajectories in every trading window to receive a
realistic ensemble to allow for more efficient intraday trading and redispatch.
A generalized additive model is fitted to the price differences with the
assumption that they follow a zero-inflated distribution, precisely a mixture
of the Dirac and the Student's t-distributions. Moreover, the mixing term is
estimated using a high-dimensional logistic regression with lasso penalty. We
model the expected value and volatility of the series using, inter alia,
autoregressive and no-trade effects as well as load, wind and solar generation
forecasts, while accounting for non-linearities in, e.g., time to maturity. Both the in-sample
characteristics and forecasting performance are analysed using a rolling window
forecasting study. Multiple versions of the model are compared to several
benchmark models and evaluated using probabilistic forecasting measures and
significance tests. The study aims to forecast the price distribution in the
German Intraday Continuous Market in the last 3 hours of trading, but the
approach allows for application to other continuous markets, especially in
Europe. The results prove the superiority of the mixture model over the benchmarks,
with the largest gains coming from the modelling of the volatility. They also indicate that
the introduction of XBID reduced the market volatility.

arXiv link: http://arxiv.org/abs/2005.01365v3

Econometrics arXiv updated paper (originally submitted: 2020-04-30)

Two Burning Questions on COVID-19: Did shutting down the economy help? Can we (partially) reopen the economy without risking the second wave?

Authors: Anish Agarwal, Abdullah Alomar, Arnab Sarker, Devavrat Shah, Dennis Shen, Cindy Yang

As we reach the apex of the COVID-19 pandemic, the most pressing question
facing us is: can we even partially reopen the economy without risking a second
wave? We first need to understand if shutting down the economy helped. And if
it did, is it possible to achieve similar gains in the war against the pandemic
while partially opening up the economy? To do so, it is critical to understand
the effects of the various interventions that can be put into place and their
corresponding health and economic implications. Since many interventions exist,
the key challenge facing policy makers is understanding the potential
trade-offs between them, and choosing the particular set of interventions that
works best for their circumstance. In this memo, we provide an overview of
Synthetic Interventions (a natural generalization of Synthetic Control), a
data-driven and statistically principled method to perform what-if scenario
planning, i.e., for policy makers to understand the trade-offs between
different interventions before having to actually enact them. In essence, the
method leverages information from different interventions that have already
been enacted across the world and fits it to a policy maker's setting of
interest, e.g., to estimate the effect of mobility-restricting interventions on
the U.S., we use daily death data from countries that enforced severe mobility
restrictions to create a "synthetic low mobility U.S." and predict the
counterfactual trajectory of the U.S. if it had indeed applied a similar
intervention. Using Synthetic Interventions, we find that lifting severe
mobility restrictions and only retaining moderate mobility restrictions (at
retail and transit locations) seems to effectively flatten the curve. We hope
this provides guidance on weighing the trade-offs between the safety of the
population, strain on the healthcare system, and impact on the economy.

arXiv link: http://arxiv.org/abs/2005.00072v2

Econometrics arXiv paper, submitted: 2020-04-30

The Interaction Between Credit Constraints and Uncertainty Shocks

Authors: Pratiti Chatterjee, David Gunawan, Robert Kohn

Can uncertainty about credit availability trigger a slowdown in real
activity? This question is answered by using a novel method to identify shocks
to uncertainty in access to credit. Time-variation in uncertainty about credit
availability is estimated using particle Markov Chain Monte Carlo. We extract
shocks to time-varying credit uncertainty and decompose them into two parts: the
first captures the "pure" effect of a shock to the second moment; the second
captures total effects of uncertainty including effects on the first moment.
Using state-dependent local projections, we find that the "pure" effect by
itself generates a sharp slowdown in real activity and the effects are largely
countercyclical. We feed the estimated shocks into a flexible price real
business cycle model with a collateral constraint and show that when the
collateral constraint binds, an uncertainty shock about credit access is
recessionary, leading to a simultaneous decline in consumption, investment, and
output.

arXiv link: http://arxiv.org/abs/2004.14719v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-04-28

Causal Inference on Networks under Continuous Treatment Interference

Authors: Laura Forastiere, Davide Del Prete, Valerio Leone Sciabolazza

This paper investigates the case of interference, when a unit's treatment
also affects other units' outcomes. When interference is at work, policy
evaluation mostly relies on the use of randomized experiments under cluster
interference and binary treatment. Instead, we consider a non-experimental
setting under continuous treatment and network interference. In particular, we
define spillover effects by specifying the exposure to network treatment as a
weighted average of the treatment received by units connected through physical,
social or economic interactions. We provide a generalized propensity
score-based estimator to estimate both direct and spillover effects of a
continuous treatment. Our estimator also allows to consider asymmetric network
connections characterized by heterogeneous intensities. To showcase this
methodology, we investigate whether and how spillover effects shape the optimal
level of policy interventions in agricultural markets. Our results show that,
in this context, neglecting interference may underestimate the degree of policy
effectiveness.
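
The exposure mapping described above can be sketched in a few lines of Python:
each unit's network treatment is a weighted average of the continuous
treatments of its neighbours, computed from a row-normalized weight matrix. The
network and treatments below are placeholders, and the
generalized-propensity-score step is not reproduced.

import numpy as np

rng = np.random.default_rng(0)
n = 200
A = rng.binomial(1, 0.05, size=(n, n)).astype(float)   # placeholder network
np.fill_diagonal(A, 0.0)
A = np.maximum(A, A.T)                                  # undirected for simplicity

row_sums = A.sum(axis=1, keepdims=True)
W = np.divide(A, row_sums, out=np.zeros_like(A), where=row_sums > 0)

treatment = rng.gamma(shape=2.0, scale=1.0, size=n)     # continuous treatment
exposure = W @ treatment                                # network treatment
print(exposure[:5])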

arXiv link: http://arxiv.org/abs/2004.13459v2

Econometrics arXiv paper, submitted: 2020-04-27

Measuring wage inequality under right censoring

Authors: João Nicolau, Pedro Raposo, Paulo M. M. Rodrigues

In this paper we investigate potential changes which may have occurred over
the last two decades in the probability mass of the right tail of the wage
distribution, through the analysis of the corresponding tail index.
Specifically, a conditional tail index estimator is introduced which explicitly
allows for right tail censoring (top-coding), which is a feature of the widely
used current population survey (CPS), as well as of other surveys. Ignoring the
top-coding may lead to inconsistent estimates of the tail index and to under- or
overstatement of inequality and of its evolution over time. Thus, having a
tail index estimator that explicitly accounts for this sample characteristic is
of importance to better understand and compute the tail index dynamics in the
censored right tail of the wage distribution. The contribution of this paper is
threefold: i) we introduce a conditional tail index estimator that explicitly
handles the top-coding problem, and evaluate its finite sample performance and
compare it with competing methods; ii) we highlight that the factor values used
to adjust the top-coded wage have changed over time and depend on the
characteristics of individuals, occupations and industries, and propose
suitable values; and iii) we provide an in-depth empirical analysis of the
dynamics of the US wage distribution's right tail using the public-use CPS
database from 1992 to 2017.

arXiv link: http://arxiv.org/abs/2004.12856v1

Econometrics arXiv updated paper (originally submitted: 2020-04-27)

State Dependence and Unobserved Heterogeneity in the Extensive Margin of Trade

Authors: Julian Hinz, Amrei Stammann, Joschka Wanner

We study the role and drivers of persistence in the extensive margin of
bilateral trade. Motivated by a stylized heterogeneous firms model of
international trade with market entry costs, we consider dynamic three-way
fixed effects binary choice models and study the corresponding incidental
parameter problem. The standard maximum likelihood estimator is consistent
under asymptotics where all panel dimensions grow at a constant rate, but it
has an asymptotic bias in its limiting distribution, invalidating inference
even in situations where the bias appears to be small. Thus, we propose two
different bias-corrected estimators. Monte Carlo simulations confirm their
desirable statistical properties. We apply these estimators in a reassessment
of the most commonly studied determinants of the extensive margin of trade.
Both true state dependence and unobserved heterogeneity contribute considerably
to trade persistence and taking this persistence into account matters
significantly in identifying the effects of trade policies on the extensive
margin.

arXiv link: http://arxiv.org/abs/2004.12655v2

Econometrics arXiv updated paper (originally submitted: 2020-04-27)

Structural Regularization

Authors: Jiaming Mao, Zhesheng Zheng

We propose a novel method for modeling data by using structural models based
on economic theory as regularizers for statistical models. We show that even if
a structural model is misspecified, as long as it is informative about the
data-generating mechanism, our method can outperform both the (misspecified)
structural model and statistical models without structural regularization. Our method
permits a Bayesian interpretation of theory as prior knowledge and can be used
both for statistical prediction and causal inference. It contributes to
transfer learning by showing how incorporating theory into statistical modeling
can significantly improve out-of-domain predictions and offers a way to
synthesize reduced-form and structural approaches for causal effect estimation.
Simulation experiments demonstrate the potential of our method in various
settings, including first-price auctions, dynamic models of entry and exit, and
demand estimation with instrumental variables. Our method has potential
applications not only in economics, but in other scientific disciplines whose
theoretical models offer important insight but are subject to significant
misspecification concerns.
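
The general principle can be illustrated with a minimal Python sketch (not the
paper's estimator): shrink a flexible regression towards the coefficients
implied by a possibly misspecified structural model via a ridge-style penalty
||y - Xb||^2 + lam * ||b - b_struct||^2, which has a closed-form solution.

import numpy as np

def structurally_regularized_ols(y, X, b_struct, lam):
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y + lam * b_struct)

rng = np.random.default_rng(0)
n, p = 80, 10
X = rng.normal(size=(n, p))
b_true = rng.normal(size=p)
y = X @ b_true + rng.normal(size=n)

b_struct = b_true + 0.3 * rng.normal(size=p)   # informative but misspecified theory
for lam in [0.0, 5.0, 50.0]:
    b_hat = structurally_regularized_ols(y, X, b_struct, lam)
    print(lam, np.mean((b_hat - b_true) ** 2))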

arXiv link: http://arxiv.org/abs/2004.12601v4

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-04-26

Reducing Interference Bias in Online Marketplace Pricing Experiments

Authors: David Holtz, Ruben Lobel, Inessa Liskovich, Sinan Aral

Online marketplace designers frequently run A/B tests to measure the impact
of proposed product changes. However, given that marketplaces are inherently
connected, total average treatment effect estimates obtained through Bernoulli
randomized experiments are often biased due to violations of the stable unit
treatment value assumption. This can be particularly problematic for
experiments that impact sellers' strategic choices, affect buyers' preferences
over items in their consideration set, or change buyers' consideration sets
altogether. In this work, we measure and reduce bias due to interference in
online marketplace experiments by using observational data to create clusters
of similar listings, and then using those clusters to conduct
cluster-randomized field experiments. We provide a lower bound on the magnitude
of bias due to interference by conducting a meta-experiment that randomizes
over two experiment designs: one Bernoulli randomized, one cluster randomized.
In both meta-experiment arms, treatment sellers are subject to a different
platform fee policy than control sellers, resulting in different prices for
buyers. By conducting a joint analysis of the two meta-experiment arms, we find
a large and statistically significant difference between the total average
treatment effect estimates obtained with the two designs, and estimate that
32.60% of the Bernoulli-randomized treatment effect estimate is due to
interference bias. We also find weak evidence that the magnitude and/or
direction of interference bias depends on the extent to which a marketplace is
supply- or demand-constrained, and analyze a second meta-experiment to
highlight the difficulty of detecting interference bias when treatment
interventions require intention-to-treat analysis.

arXiv link: http://arxiv.org/abs/2004.12489v1

Econometrics arXiv updated paper (originally submitted: 2020-04-26)

Inference with Many Weak Instruments

Authors: Anna Mikusheva, Liyang Sun

We develop a concept of weak identification in linear IV models in which the
number of instruments can grow at the same rate or slower than the sample size.
We propose a jackknifed version of the classical weak identification-robust
Anderson-Rubin (AR) test statistic. Large-sample inference based on the
jackknifed AR is valid under heteroscedasticity and weak identification. The
feasible version of this statistic uses a novel variance estimator. The test
has uniformly correct size and good power properties. We also develop a
pre-test for weak identification that is related to the size property of a Wald
test based on the Jackknife Instrumental Variable Estimator (JIVE). This new
pre-test is valid under heteroscedasticity and with many instruments.

arXiv link: http://arxiv.org/abs/2004.12445v3

Econometrics arXiv updated paper (originally submitted: 2020-04-26)

Maximum Likelihood Estimation of Stochastic Frontier Models with Endogeneity

Authors: Samuele Centorrino, María Pérez-Urdiales

We propose and study a maximum likelihood estimator of stochastic frontier
models with endogeneity in cross-section data when the composite error term may
be correlated with inputs and environmental variables. Our framework is a
generalization of the normal half-normal stochastic frontier model with
endogeneity. We derive the likelihood function in closed form using three
fundamental assumptions: the existence of control functions that fully capture
the dependence between regressors and unobservables; the conditional
independence of the two error components given the control functions; and the
conditional distribution of the stochastic inefficiency term given the control
functions being a folded normal distribution. We also provide a Battese-Coelli
estimator of technical efficiency. Our estimator is computationally fast and
easy to implement. We study some of its asymptotic properties, and we showcase
its finite sample behavior in Monte-Carlo simulations and an empirical
application to farmers in Nepal.

arXiv link: http://arxiv.org/abs/2004.12369v3

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2020-04-25

Limiting Bias from Test-Control Interference in Online Marketplace Experiments

Authors: David Holtz, Sinan Aral

In an A/B test, the typical objective is to measure the total average
treatment effect (TATE), which measures the difference between the average
outcome if all users were treated and the average outcome if all users were
untreated. However, a simple difference-in-means estimator will give a biased
estimate of the TATE when outcomes of control units depend on the outcomes of
treatment units, an issue we refer to as test-control interference. Using a
simulation built on top of data from Airbnb, this paper considers the use of
methods from the network interference literature for online marketplace
experimentation. We model the marketplace as a network in which an edge exists
between two sellers if their goods substitute for one another. We then simulate
seller outcomes, specifically considering a "status quo" context and
"treatment" context that forces all sellers to lower their prices. We use the
same simulation framework to approximate TATE distributions produced by using
blocked graph cluster randomization, exposure modeling, and the Hajek estimator
for the difference in means. We find that while blocked graph cluster
randomization reduces the bias of the naive difference-in-means estimator by as
much as 62%, it also significantly increases the variance of the estimator. On
the other hand, the use of more sophisticated estimators produces mixed
results. While some provide (small) additional reductions in bias and small
reductions in variance, others lead to increased bias and variance. Overall,
our results suggest that experiment design and analysis techniques from the
network experimentation literature are promising tools for reducing bias due to
test-control interference in marketplace experiments.

arXiv link: http://arxiv.org/abs/2004.12162v1

Econometrics arXiv updated paper (originally submitted: 2020-04-25)

Sensitivity to Calibrated Parameters

Authors: Thomas H. Jørgensen

A common approach to estimation of economic models is to calibrate a sub-set
of model parameters and keep them fixed when estimating the remaining
parameters. Calibrated parameters likely affect conclusions based on the model
but estimation time often makes a systematic investigation of the sensitivity
to calibrated parameters infeasible. I propose a simple and computationally
low-cost measure of the sensitivity of parameters and other objects of interest
to the calibrated parameters. In the main empirical application, I revisit the
analysis of life-cycle savings motives in Gourinchas and Parker (2002) and show
that some estimates are sensitive to calibrations.

arXiv link: http://arxiv.org/abs/2004.12100v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-04-25

Bayesian Clustered Coefficients Regression with Auxiliary Covariates Assistant Random Effects

Authors: Guanyu Hu, Yishu Xue, Zhihua Ma

In regional economics research, a problem of interest is to detect
similarities between regions and estimate their shared coefficients in
economic models. In this article, we propose a mixture of finite mixtures
(MFM) clustered regression model with auxiliary covariates that account for
similarities in demographic or economic characteristics over a spatial domain.
Our Bayesian construction provides both inference for the number of clusters and
clustering configurations, and estimation of the parameters for each cluster.
Empirical performance of the proposed model is illustrated through simulation
experiments, and further applied to a study of influential factors for monthly
housing cost in Georgia.

arXiv link: http://arxiv.org/abs/2004.12022v2

Econometrics arXiv cross-link from q-fin.TR (q-fin.TR), submitted: 2020-04-24

From orders to prices: A stochastic description of the limit order book to forecast intraday returns

Authors: Johannes Bleher, Michael Bleher, Thomas Dimpfl

We propose a microscopic model to describe the dynamics of the fundamental
events in the limit order book (LOB): order arrivals and cancellations. It is
based on an operator algebra for individual orders and describes their effect
on the LOB. The model inputs are arrival and cancellation rate distributions
that emerge from individual behavior of traders, and we show how prices and
liquidity arise from the LOB dynamics. In a simulation study we illustrate how
the model works and highlight its sensitivity with respect to assumptions
regarding the collective behavior of market participants. Empirically, we test
the model on a LOB snapshot of XETRA, estimate several linearized model
specifications, and conduct in- and out-of-sample forecasts. The in-sample
results based on contemporaneous information suggest that our model describes
returns very well, resulting in an adjusted $R^2$ of roughly 80%. In the more
realistic setting where only past information enters the model, we observe an
adjusted $R^2$ around 15%. The direction of the next return can be predicted
(out-of-sample) with an accuracy above 75% for time horizons below 10 minutes.
On average, we obtain an RMSPE that is 10 times lower than values documented in
the literature.

arXiv link: http://arxiv.org/abs/2004.11953v2

Econometrics arXiv paper, submitted: 2020-04-24

Microeconometrics with Partial Identification

Authors: Francesca Molinari

This chapter reviews the microeconometrics literature on partial
identification, focusing on the developments of the last thirty years. The
topics presented illustrate that the available data combined with credible
maintained assumptions may yield much information about a parameter of
interest, even if they do not reveal it exactly. Special attention is devoted
to discussing the challenges associated with, and some of the solutions put
forward to, (1) obtain a tractable characterization of the values for the
parameters of interest which are observationally equivalent, given the
available data and maintained assumptions; (2) estimate this set of values; (3)
conduct tests of hypotheses and make confidence statements. The chapter reviews
advances in partial identification analysis both as applied to learning
(functionals of) probability distributions that are well-defined in the absence
of models, as well as to learning parameters that are well-defined only in the
context of particular models. A simple organizing principle is highlighted: the
source of the identification problem can often be traced to a collection of
random variables that are consistent with the available data and maintained
assumptions. This collection may be part of the observed data or be a model
implication. In either case, it can be formalized as a random set. Random set
theory is then used as a mathematical framework to unify a number of special
results and produce a general methodology to carry out partial identification
analysis.

arXiv link: http://arxiv.org/abs/2004.11751v1

Econometrics arXiv updated paper (originally submitted: 2020-04-24)

A Comparison of Methods for Treatment Assignment with an Application to Playlist Generation

Authors: Carlos Fernández-Loría, Foster Provost, Jesse Anderton, Benjamin Carterette, Praveen Chandar

This study presents a systematic comparison of methods for individual
treatment assignment, a general problem that arises in many applications and
has received significant attention from economists, computer scientists, and
social scientists. We group the various methods proposed in the literature into
three general classes of algorithms (or metalearners): learning models to
predict outcomes (the O-learner), learning models to predict causal effects
(the E-learner), and learning models to predict optimal treatment assignments
(the A-learner). We compare the metalearners in terms of (1) their level of
generality and (2) the objective function they use to learn models from data;
we then discuss the implications that these characteristics have for modeling
and decision making. Notably, we demonstrate analytically and empirically that
optimizing for the prediction of outcomes or causal effects is not the same as
optimizing for treatment assignments, suggesting that in general the A-learner
should lead to better treatment assignments than the other metalearners. We
demonstrate the practical implications of our findings in the context of
choosing, for each user, the best algorithm for playlist generation in order to
optimize engagement. This is the first comparison of the three different
metalearners on a real-world application at scale (based on more than half a
billion individual treatment assignments). In addition to supporting our
analytical findings, the results show how large A/B tests can provide
substantial value for learning treatment assignment policies, rather than
simply choosing the variant that performs best on average.
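
As a minimal illustration of an effect-predicting metalearner (one way to
instantiate what the paper calls the E-learner, not necessarily the
implementation compared there), the Python sketch below fits separate outcome
models on treated and control units from a randomized experiment and assigns
treatment where the predicted effect is positive; the data are simulated.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 4))
t = rng.binomial(1, 0.5, size=n)                     # randomized treatment
effect = 0.5 * X[:, 0]                               # heterogeneous true effect
y = X[:, 1] + t * effect + rng.normal(scale=0.5, size=n)

m1 = GradientBoostingRegressor().fit(X[t == 1], y[t == 1])
m0 = GradientBoostingRegressor().fit(X[t == 0], y[t == 0])

tau_hat = m1.predict(X) - m0.predict(X)              # predicted individual effects
assign = (tau_hat > 0).astype(int)                   # treat where effect looks positive
print("share assigned to treatment:", assign.mean())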

arXiv link: http://arxiv.org/abs/2004.11532v5

Econometrics arXiv cross-link from stat.CO (stat.CO), submitted: 2020-04-23

Machine Learning Econometrics: Bayesian algorithms and methods

Authors: Dimitris Korobilis, Davide Pettenuzzo

As the amount of economic and other data generated worldwide increases
vastly, a challenge for future generations of econometricians will be to master
efficient algorithms for inference in empirical models with large information
sets. This Chapter provides a review of popular estimation algorithms for
Bayesian inference in econometrics and surveys alternative algorithms developed
in machine learning and computing science that allow for efficient computation
in high-dimensional settings. The focus is on scalability and parallelizability
of each algorithm, as well as their ability to be adopted in various empirical
settings in economics and finance.

arXiv link: http://arxiv.org/abs/2004.11486v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-04-23

High-dimensional macroeconomic forecasting using message passing algorithms

Authors: Dimitris Korobilis

This paper proposes two distinct contributions to econometric analysis of
large information sets and structural instabilities. First, it treats a
regression model with time-varying coefficients, stochastic volatility and
exogenous predictors, as an equivalent high-dimensional static regression
problem with thousands of covariates. Inference in this specification proceeds
using Bayesian hierarchical priors that shrink the high-dimensional vector of
coefficients either towards zero or time-invariance. Second, it introduces the
frameworks of factor graphs and message passing as a means of designing
efficient Bayesian estimation algorithms. In particular, a Generalized
Approximate Message Passing (GAMP) algorithm is derived that has low
algorithmic complexity and is trivially parallelizable. The result is a
comprehensive methodology that can be used to estimate time-varying parameter
regressions with an arbitrarily large number of exogenous predictors. In a
forecasting exercise for U.S. price inflation this methodology is shown to work
very well.

arXiv link: http://arxiv.org/abs/2004.11485v1

Econometrics arXiv paper, submitted: 2020-04-23

Does Subjective Well-being Contribute to Our Understanding of Mexican Well-being?

Authors: Jeremy Heald, Erick Treviño Aguilar

The article reviews the history of well-being to gauge how subjective
question surveys can improve our understanding of well-being in Mexico. The
research uses data at the level of the 32 federal entities or States, taking
advantage of the heterogeneity in development indicator readings between and
within geographical areas, the product of socioeconomic inequality. The data
come principally from two innovative subjective questionnaires, BIARE and
ENVIPE, which intersect in their fully representative state-wide applications
in 2014, but also from conventional objective indicator sources such as the HDI
and conventional surveys. This study uses two approaches, a descriptive
analysis of a state-by-state landscape of indicators, both subjective and
objective, in an initial search for stand-out well-being patterns, and an
econometric study of a large selection of mainly subjective indicators inspired
by theory and the findings of previous Mexican research. Descriptive analysis
confirms that subjective well-being correlates strongly with and complements
objective data, providing interesting directions for analysis. The econometrics
literature indicates that happiness increases with income and the satisfaction
of material needs, as theory suggests, but also that Mexicans are relatively
happy given their modest incomes and high levels of insecurity; the latter, when
respondents are categorized by satisfaction with life, can be shown to affect
poorer people disproportionately. The article suggests that well-being
is a complex, multidimensional construct which can be revealed by using
exploratory multi-regression and partial correlations models which juxtapose
subjective and objective indicators.

arXiv link: http://arxiv.org/abs/2004.11420v1

Econometrics arXiv cross-link from stat.CO (stat.CO), submitted: 2020-04-21

Bayesian Optimization of Hyperparameters from Noisy Marginal Likelihood Estimates

Authors: Oskar Gustafsson, Mattias Villani, Pär Stockhammar

Bayesian models often involve a small set of hyperparameters determined by
maximizing the marginal likelihood. Bayesian optimization is a popular
iterative method where a Gaussian process posterior of the underlying function
is sequentially updated by new function evaluations. An acquisition strategy
uses this posterior distribution to decide where to place the next function
evaluation. We propose a novel Bayesian optimization framework for situations
where the user controls the computational effort, and therefore the precision
of the function evaluations. This is a common situation in econometrics where
the marginal likelihood is often computed by Markov chain Monte Carlo (MCMC) or
importance sampling methods, with the precision of the marginal likelihood
estimator determined by the number of samples. The new acquisition strategy
gives the optimizer the option to explore the function with cheap noisy
evaluations and therefore find the optimum faster. The method is applied to
estimating the prior hyperparameters in two popular models on US macroeconomic
time series data: the steady-state Bayesian vector autoregressive (BVAR) and
the time-varying parameter BVAR with stochastic volatility. The proposed method
is shown to find the optimum much quicker than traditional Bayesian
optimization or grid search.

arXiv link: http://arxiv.org/abs/2004.10092v2
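
As a loose illustration of the idea described above, the sketch below runs
Bayesian optimisation of a single hyperparameter when each objective evaluation
is a noisy Monte Carlo estimate whose precision the user controls through the
number of draws. The toy objective, the kernel choice and the cheap-then-precise
evaluation schedule are assumptions for the example, not the paper's
implementation.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, Matern

rng = np.random.default_rng(0)

def noisy_log_ml(tau, n_draws):
    """Hypothetical noisy estimate of a log marginal likelihood at hyperparameter
    tau; the Monte Carlo noise variance shrinks like 1 / n_draws."""
    true_value = -(tau - 1.3) ** 2              # stand-in for the unknown objective
    noise_var = 4.0 / n_draws
    return true_value + rng.normal(0.0, np.sqrt(noise_var)), noise_var

taus, values, noise_vars = [], [], []
for tau in np.linspace(0.0, 3.0, 5):            # cheap low-precision pilot evaluations
    v, s2 = noisy_log_ml(tau, n_draws=50)
    taus.append(tau); values.append(v); noise_vars.append(s2)

grid = np.linspace(0.0, 3.0, 200)[:, None]
for it in range(15):
    gp = GaussianProcessRegressor(
        kernel=ConstantKernel(1.0) * Matern(length_scale=0.5, nu=2.5),
        alpha=np.array(noise_vars),             # per-evaluation noise variances
        normalize_y=True,
    ).fit(np.array(taus)[:, None], np.array(values))
    mu, sd = gp.predict(grid, return_std=True)
    best = max(values)
    z = (mu - best) / np.maximum(sd, 1e-9)
    ei = (mu - best) * norm.cdf(z) + sd * norm.pdf(z)   # expected improvement
    tau_next = float(grid[np.argmax(ei)])
    n_draws = 50 if it < 10 else 1000           # explore cheaply, then refine precisely
    v, s2 = noisy_log_ml(tau_next, n_draws)
    taus.append(tau_next); values.append(v); noise_vars.append(s2)

print("approximate optimiser:", taus[int(np.argmax(values))])
```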

Econometrics arXiv updated paper (originally submitted: 2020-04-21)

Revealing Cluster Structures Based on Mixed Sampling Frequencies

Authors: Yeonwoo Rho, Yun Liu, Hie Joo Ahn

This paper proposes a new linearized mixed data sampling (MIDAS) model and
develops a framework to infer clusters in a panel regression with mixed
frequency data. The linearized MIDAS estimation method is more flexible and
substantially simpler to implement than competing approaches. We show that the
proposed clustering algorithm successfully recovers true membership in the
cross-section, both in theory and in simulations, without requiring prior
knowledge of the number of clusters. This methodology is applied to a
mixed-frequency Okun's law model for state-level data in the U.S. and uncovers
four meaningful clusters based on the dynamic features of state-level labor
markets.

arXiv link: http://arxiv.org/abs/2004.09770v2

Econometrics arXiv updated paper (originally submitted: 2020-04-20)

Inference by Stochastic Optimization: A Free-Lunch Bootstrap

Authors: Jean-Jacques Forneron, Serena Ng

Assessing sampling uncertainty in extremum estimation can be challenging when
the asymptotic variance is not analytically tractable. Bootstrap inference
offers a feasible solution but can be computationally costly especially when
the model is complex. This paper uses iterates of a specially designed
stochastic optimization algorithm as draws from which both point estimates and
bootstrap standard errors can be computed in a single run. The draws are
generated by the gradient and Hessian computed from batches of data that are
resampled at each iteration. We show that these draws yield consistent
estimates and asymptotically valid frequentist inference for a large class of
regular problems. The algorithm provides accurate standard errors in simulation
examples and empirical applications at low computational costs. The draws from
the algorithm also provide a convenient way to detect data irregularities.

arXiv link: http://arxiv.org/abs/2004.09627v3
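
The sketch below is a loose, simplified illustration of the idea rather than the
authors' algorithm: a Newton step is applied to a freshly resampled batch at
every iteration, so for a least-squares objective each iterate coincides with
the estimate on a bootstrap sample, and the collected iterates deliver both the
point estimate and bootstrap-style standard errors in a single run.

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta_true = 500, np.array([1.0, -2.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ beta_true + rng.normal(size=n)

beta = np.zeros(2)
draws = []
for it in range(1200):
    idx = rng.integers(0, n, size=n)           # resample a batch at each iteration
    Xb, yb = X[idx], y[idx]
    grad = Xb.T @ (Xb @ beta - yb) / n
    hess = Xb.T @ Xb / n
    beta = beta - np.linalg.solve(hess, grad)  # Newton step on the resampled batch
    if it >= 200:                              # discard burn-in iterates
        draws.append(beta.copy())

draws = np.array(draws)
print("point estimate:", draws.mean(axis=0))
print("bootstrap-style standard errors:", draws.std(axis=0))
```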

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-04-20

Noise-Induced Randomization in Regression Discontinuity Designs

Authors: Dean Eckles, Nikolaos Ignatiadis, Stefan Wager, Han Wu

Regression discontinuity designs assess causal effects in settings where
treatment is determined by whether an observed running variable crosses a
pre-specified threshold. Here we propose a new approach to identification,
estimation, and inference in regression discontinuity designs that uses
knowledge about exogenous noise (e.g., measurement error) in the running
variable. In our strategy, we weight treated and control units to balance a
latent variable of which the running variable is a noisy measure. Our approach
is driven by effective randomization provided by the noise in the running
variable, and complements standard formal analyses that appeal to continuity
arguments while ignoring the stochastic nature of the assignment mechanism.

arXiv link: http://arxiv.org/abs/2004.09458v5

Econometrics arXiv paper, submitted: 2020-04-20

Awareness of crash risk improves Kelly strategies in simulated financial time series

Authors: Jan-Christian Gerlach, Jerome Kreuser, Didier Sornette

We simulate a simplified version of the price process including bubbles and
crashes proposed in Kreuser and Sornette (2018). The price process is defined
as a geometric random walk combined with jumps modelled by separate, discrete
distributions associated with positive (and negative) bubbles. The key
ingredient of the model is to assume that the sizes of the jumps are
proportional to the bubble size. Thus, the jumps tend to efficiently bring back
excess bubble prices close to a normal or fundamental value (efficient
crashes). This is different from existing processes studied that assume jumps
that are independent of the mispricing. The present model is simplified
compared to Kreuser and Sornette (2018) in that we ignore the possibility of a
change of the probability of a crash as the price accelerates above the normal
price. We study the behaviour of investment strategies that maximize the
expected log of wealth (Kelly criterion) for the risky asset and a risk-free
asset. We show that the method behaves similarly to Kelly on geometric Brownian
motion in that it outperforms other methods in the long term and beats
classical Kelly. We identify knowledge of the presence of crashes as a primary
source of outperformance, but interestingly find that knowledge of only
the size, and not the time of occurrence, already provides a significant and
robust edge. We then perform an error analysis to show that the method is
robust with respect to variations in the parameters. The method is most
sensitive to errors in the expected return.

arXiv link: http://arxiv.org/abs/2004.09368v1
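
For background, the classical Kelly benchmark mentioned above has a closed form
under geometric Brownian motion; the short sketch below evaluates it for assumed
parameter values.

```python
# Classical Kelly fraction under geometric Brownian motion: f* = (mu - r) / sigma**2.
mu, r, sigma = 0.08, 0.02, 0.20    # assumed drift, risk-free rate and volatility
f_star = (mu - r) / sigma ** 2
print(f"Kelly fraction of wealth in the risky asset: {f_star:.2f}")  # 1.50, i.e. leveraged
```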

Econometrics arXiv paper, submitted: 2020-04-20

Multi-frequency-band tests for white noise under heteroskedasticity

Authors: Mengya Liu, Fukan Zhu, Ke Zhu

This paper proposes a new family of multi-frequency-band (MFB) tests for the
white noise hypothesis by using the maximum overlap discrete wavelet packet
transform (MODWPT). The MODWPT allows the variance of a process to be
decomposed into the variance of its components on different equal-length
frequency sub-bands, and the MFB tests then measure the distance between the
MODWPT-based variance ratio and its theoretical null value jointly over several
frequency sub-bands. The resulting MFB tests have the chi-squared asymptotic
null distributions under mild conditions, which allow the data to be
heteroskedastic. The MFB tests are shown to have the desirable size and power
performance by simulation studies, and their usefulness is further illustrated
by two applications.

arXiv link: http://arxiv.org/abs/2004.09161v1

Econometrics arXiv paper, submitted: 2020-04-20

Consistent Calibration of Economic Scenario Generators: The Case for Conditional Simulation

Authors: Misha van Beek

Economic Scenario Generators (ESGs) simulate economic and financial variables
forward in time for risk management and asset allocation purposes. It is often
not feasible to calibrate the dynamics of all variables within the ESG to
historical data alone. Calibration to forward-information such as future
scenarios and return expectations is needed for stress testing and portfolio
optimization, but no generally accepted methodology is available. This paper
introduces the Conditional Scenario Simulator, which is a framework for
consistently calibrating simulations and projections of economic and financial
variables both to historical data and forward-looking information. The
framework can be viewed as a multi-period, multi-factor generalization of the
Black-Litterman model, and can embed a wide array of financial and
macroeconomic models. Two practical examples demonstrate this in a frequentist
and Bayesian setting.

arXiv link: http://arxiv.org/abs/2004.09042v1

Econometrics arXiv paper, submitted: 2020-04-19

Estimating High-Dimensional Discrete Choice Model of Differentiated Products with Random Coefficients

Authors: Masayuki Sawada, Kohei Kawaguchi

We propose an estimation procedure for discrete choice models of
differentiated products with possibly high-dimensional product attributes. In
our model, high-dimensional attributes can be determinants of both mean and
variance of the indirect utility of a product. The key restriction in our model
is that the high-dimensional attributes affect the variance of indirect
utilities only through finitely many indices. In a framework of the
random-coefficients logit model, we show a bound on the error rate of a
$l_1$-regularized minimum distance estimator and prove the asymptotic linearity
of the de-biased estimator.

arXiv link: http://arxiv.org/abs/2004.08791v1

Econometrics arXiv updated paper (originally submitted: 2020-04-17)

Loss aversion and the welfare ranking of policy interventions

Authors: Sergio Firpo, Antonio F. Galvao, Martyna Kobus, Thomas Parker, Pedro Rosa-Dias

This paper develops theoretical criteria and econometric methods to rank
policy interventions in terms of welfare when individuals are loss-averse. Our
new criterion for "loss aversion-sensitive dominance" defines a weak partial
ordering of the distributions of policy-induced gains and losses. It applies to
the class of welfare functions which model individual preferences with
non-decreasing and loss-averse attitudes towards changes in outcomes. We also
develop new statistical methods to test loss aversion-sensitive dominance in
practice, using nonparametric plug-in estimates; these allow inference to be
conducted through a special resampling procedure. Since point-identification of
the distribution of policy-induced gains and losses may require strong
assumptions, we extend our comparison criteria, test statistics, and resampling
procedures to the partially-identified case. We illustrate our methods with a
simple empirical application to the welfare comparison of alternative income
support programs in the US.

arXiv link: http://arxiv.org/abs/2004.08468v4

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2020-04-17

Estimating and Projecting Air Passenger Traffic during the COVID-19 Coronavirus Outbreak and its Socio-Economic Impact

Authors: Stefano Maria Iacus, Fabrizio Natale, Carlos Satamaria, Spyridon Spyratos, Michele Vespe

The main focus of this study is to collect and prepare data on air passenger
traffic worldwide with the aim of analyzing the impact of the travel ban on the
aviation sector. Based on historical data from January 2010 to October 2019, a
forecasting model is implemented in order to set a reference baseline. Making
use of airplane movements extracted from online flight tracking platforms and
online booking systems, this study also presents a first assessment of recent
changes in flight activity around the world as a result of the COVID-19
pandemic. To study the effects of the air travel ban on aviation and, in turn,
its socio-economic impact, several scenarios are constructed based on past
pandemic crises and the observed flight volumes. It turns out that, according
to these hypothetical scenarios, in the first quarter of 2020 the impact of
aviation losses could have reduced world GDP by 0.02% to 0.12% according to the
observed data and, in the worst case scenarios, the loss at the end of 2020
could be as high as 1.41-1.67%, with job losses reaching 25-30 million.
Focusing on the EU27, the GDP loss may amount to 1.66-1.98% by the end of 2020,
with 4.2 to 5 million job losses in the worst case scenarios. Some countries
will be more affected than others in the short run, and most European airline
companies will suffer from the travel ban.

arXiv link: http://arxiv.org/abs/2004.08460v2

Econometrics arXiv updated paper (originally submitted: 2020-04-17)

Causal Inference under Outcome-Based Sampling with Monotonicity Assumptions

Authors: Sung Jae Jun, Sokbae Lee

We study causal inference under case-control and case-population sampling.
Specifically, we focus on the binary-outcome and binary-treatment case, where
the parameters of interest are causal relative and attributable risks defined
via the potential outcome framework. It is shown that strong ignorability is
not always as powerful as it is under random sampling and that certain
monotonicity assumptions yield comparable results in terms of sharp identified
intervals. Specifically, the usual odds ratio is shown to be a sharp identified
upper bound on causal relative risk under the monotone treatment response and
monotone treatment selection assumptions. We offer algorithms for inference on
the causal parameters that are aggregated over the true population distribution
of the covariates. We show the usefulness of our approach by studying three
empirical examples: the benefit of attending private school for entering a
prestigious university in Pakistan; the relationship between staying in school
and getting involved with drug-trafficking gangs in Brazil; and the link
between physicians' hours and size of the group practice in the United States.

arXiv link: http://arxiv.org/abs/2004.08318v6
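
As a worked arithmetic example of the bound described above, consider
hypothetical case-control counts (the numbers are made up for illustration):
under the monotone treatment response and monotone treatment selection
assumptions, the usual odds ratio is a sharp upper bound on the causal relative
risk.

```python
# Hypothetical case-control counts for one covariate cell.
cases_treated, cases_untreated = 40, 60
controls_treated, controls_untreated = 100, 200

# Odds ratio, which under MTR + MTS bounds the causal relative risk from above.
odds_ratio = (cases_treated / cases_untreated) / (controls_treated / controls_untreated)
print(f"odds ratio (upper bound on causal relative risk): {odds_ratio:.2f}")  # 1.33
```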

Econometrics arXiv paper, submitted: 2020-04-17

The direct and spillover effects of a nationwide socio-emotional learning program for disruptive students

Authors: Clément de Chaisemartin, Nicolás Navarrete H.

Social and emotional learning (SEL) programs teach disruptive students to
improve their classroom behavior. Small-scale programs in high-income countries
have been shown to improve treated students' behavior and academic outcomes.
Using a randomized experiment, we show that a nationwide SEL program in Chile
has no effect on eligible students. We find evidence that very disruptive
students may hamper the program's effectiveness. ADHD, a disorder correlated
with disruptiveness, is much more prevalent in Chile than in high-income
countries, so very disruptive students may be more present in Chile than in the
contexts where SEL programs have been shown to work.

arXiv link: http://arxiv.org/abs/2004.08126v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2020-04-16

Short-Term Covid-19 Forecast for Latecomers

Authors: Marcelo Medeiros, Alexandre Street, Davi Valladão, Gabriel Vasconcelos, Eduardo Zilberman

The number of Covid-19 cases is increasing dramatically worldwide. Therefore,
the availability of reliable forecasts for the number of cases in the coming
days is of fundamental importance. We propose a simple statistical method for
short-term real-time forecasting of the number of Covid-19 cases and fatalities
in countries that are latecomers -- i.e., countries where cases of the disease
started to appear some time after others. In particular, we propose a penalized
(LASSO) regression with an error correction mechanism to construct a model of a
latecomer in terms of the other countries that were at a similar stage of the
pandemic some days before. By tracking the number of cases and deaths in those
countries, we forecast through an adaptive rolling-window scheme the number of
cases and deaths in the latecomer. We apply this methodology to Brazil, and
show that (so far) it has been performing very well. These forecasts aim to
foster a better short-run management of the health system capacity.

arXiv link: http://arxiv.org/abs/2004.07977v3
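
A minimal sketch of the flavour of this approach is given below, using assumed
synthetic epidemic curves and an assumed stage lead, with the error-correction
term omitted for brevity; it is not the authors' specification.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
t = np.arange(120)
# hypothetical epidemic curves: the earlier countries lead the latecomer
lead_1 = 1000 / (1 + np.exp(-(t - 40) / 6)) + rng.normal(0, 10, t.size)
lead_2 = 800 / (1 + np.exp(-(t - 50) / 8)) + rng.normal(0, 10, t.size)
latecomer = 900 / (1 + np.exp(-(t - 70) / 7)) + rng.normal(0, 10, t.size)

shift = 25                                   # assumed stage lead (days)
X = np.column_stack([lead_1[:-shift], lead_2[:-shift]])  # predictors shifted back
y = latecomer[shift:]

window = 60
preds = []
for end in range(window, len(y)):
    model = Lasso(alpha=5.0).fit(X[end - window:end], y[end - window:end])
    preds.append(model.predict(X[end:end + 1])[0])       # one-step-ahead forecast
print("last forecast vs. realised value:", round(preds[-1], 1), round(y[-1], 1))
```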

Econometrics arXiv paper, submitted: 2020-04-16

Identification of a class of index models: A topological approach

Authors: Mogens Fosgerau, Dennis Kristensen

We establish nonparametric identification in a class of so-called index
models using a novel approach that relies on general topological results. Our
proof strategy requires substantially weaker conditions on the functions and
distributions characterizing the model compared to existing strategies; in
particular, it does not require any large support conditions on the regressors
of our model. We apply the general identification result to additive random
utility and competing risk models.

arXiv link: http://arxiv.org/abs/2004.07900v1

Econometrics arXiv paper, submitted: 2020-04-16

Non-linear interlinkages and key objectives amongst the Paris Agreement and the Sustainable Development Goals

Authors: Felix Laumann, Julius von Kügelgen, Mauricio Barahona

The United Nations' ambitions to combat climate change and prosper human
development are manifested in the Paris Agreement and the Sustainable
Development Goals (SDGs), respectively. These are inherently inter-linked as
progress towards some of these objectives may accelerate or hinder progress
towards others. We investigate how these two agendas influence each other by
defining networks of 18 nodes, consisting of the 17 SDGs and climate change,
for various groupings of countries. We compute a non-linear measure of
conditional dependence, the partial distance correlation, given any subset of
the remaining 16 variables. These correlations are treated as weights on edges,
and weighted eigenvector centralities are calculated to determine the most
important nodes. We find that SDG 6, clean water and sanitation, and SDG 4,
quality education, are most central across nearly all groupings of countries.
In developing regions, SDG 17, partnerships for the goals, is strongly
connected to the progress of other objectives in the two agendas whilst,
somewhat surprisingly, SDG 8, decent work and economic growth, is not as
important in terms of eigenvector centrality.

arXiv link: http://arxiv.org/abs/2004.09318v1
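
The final ranking step can be illustrated with a toy weighted graph; the weights
below are made up, standing in for the estimated partial distance correlations,
and networkx's weighted eigenvector centrality is used for the ranking.

```python
import networkx as nx

# hypothetical conditional-dependence weights among a few goals
edges = [("SDG4", "SDG6", 0.35), ("SDG4", "Climate", 0.20),
         ("SDG6", "Climate", 0.30), ("SDG8", "SDG4", 0.10)]
G = nx.Graph()
G.add_weighted_edges_from(edges)

# rank nodes by weighted eigenvector centrality
centrality = nx.eigenvector_centrality(G, weight="weight", max_iter=1000)
for node, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.3f}")
```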

Econometrics arXiv cross-link from q-bio.PE (q-bio.PE), submitted: 2020-04-14

Epidemic control via stochastic optimal control

Authors: Andrew Lesniewski

We study the problem of optimal control of the stochastic SIR model. Models
of this type are used in mathematical epidemiology to capture the time
evolution of highly infectious diseases such as COVID-19. Our approach relies
on reformulating the Hamilton-Jacobi-Bellman equation as a stochastic minimum
principle. This results in a system of forward backward stochastic differential
equations, which is amenable to numerical solution via Monte Carlo simulations.
We present a number of numerical solutions of the system under a variety of
scenarios.

arXiv link: http://arxiv.org/abs/2004.06680v3

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2020-04-14

On Vickrey's Income Averaging

Authors: Stefan Steinerberger, Aleh Tsyvinski

We consider a small set of axioms for income averaging -- recursivity,
continuity, and the boundary condition for the present. These properties yield
a unique averaging function that is the density of the reflected Brownian
motion with a drift started at the current income and moving over the past
incomes. When averaging is done over the short past, the weighting function is
asymptotically converging to a Gaussian. When averaging is done over the long
horizon, the weighting function converges to the exponential distribution. For
all intermediate averaging scales, we derive an explicit solution that
interpolates between the two.

arXiv link: http://arxiv.org/abs/2004.06289v1

Econometrics arXiv paper, submitted: 2020-04-13

Estimating the COVID-19 Infection Rate: Anatomy of an Inference Problem

Authors: Charles F. Manski, Francesca Molinari

As a consequence of missing data on tests for infection and imperfect
accuracy of tests, reported rates of population infection by the SARS CoV-2
virus are lower than actual rates of infection. Hence, reported rates of severe
illness conditional on infection are higher than actual rates. Understanding
the time path of the COVID-19 pandemic has been hampered by the absence of
bounds on infection rates that are credible and informative. This paper
explains the logical problem of bounding these rates and reports illustrative
findings, using data from Illinois, New York, and Italy. We combine the data
with assumptions on the infection rate in the untested population and on the
accuracy of the tests that appear credible in the current context. We find that
the infection rate might be substantially higher than reported. We also find
that the infection fatality rate in Italy is substantially lower than reported.

arXiv link: http://arxiv.org/abs/2004.06178v1

Econometrics arXiv paper, submitted: 2020-04-12

A Machine Learning Approach for Flagging Incomplete Bid-rigging Cartels

Authors: Hannes Wallimann, David Imhof, Martin Huber

We propose a new method for flagging bid rigging, which is particularly
useful for detecting incomplete bid-rigging cartels. Our approach combines
screens, i.e. statistics derived from the distribution of bids in a tender,
with machine learning to predict the probability of collusion. As a
methodological innovation, we calculate such screens for all possible subgroups
of three or four bids within a tender and use summary statistics like the mean,
median, maximum, and minimum of each screen as predictors in the machine
learning algorithm. This approach tackles the issue that competitive bids in
incomplete cartels distort the statistical signals produced by bid rigging. We
demonstrate that our algorithm outperforms previously suggested methods in
applications to incomplete cartels based on empirical data from Switzerland.

arXiv link: http://arxiv.org/abs/2004.05629v1
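
The sketch below illustrates only the feature-construction step described above,
with one hypothetical tender and the coefficient of variation as the screen; the
paper uses several screens and feeds the resulting summary statistics into a
trained classifier.

```python
from itertools import combinations
import numpy as np

bids = [100.0, 101.5, 102.0, 103.2, 118.0]   # hypothetical bids in one tender

def coef_variation(x):
    x = np.asarray(x)
    return x.std(ddof=1) / x.mean()

# compute the screen for every subgroup of three or four bids
screens = [coef_variation(sub)
           for k in (3, 4)
           for sub in combinations(bids, k)]

features = {                                  # summary statistics used as ML predictors
    "cv_mean": np.mean(screens),
    "cv_median": np.median(screens),
    "cv_min": np.min(screens),
    "cv_max": np.max(screens),
}
print(features)
```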

Econometrics arXiv updated paper (originally submitted: 2020-04-10)

Wild Bootstrap Inference for Penalized Quantile Regression for Longitudinal Data

Authors: Carlos Lamarche, Thomas Parker

The existing theory of penalized quantile regression for longitudinal data
has focused primarily on point estimation. In this work, we investigate
statistical inference. We propose a wild residual bootstrap procedure and show
that it is asymptotically valid for approximating the distribution of the
penalized estimator. The model puts no restrictions on individual effects, and
the estimator achieves consistency by letting the shrinkage decay in importance
asymptotically. The new method is easy to implement and simulation studies show
that it has accurate small sample behavior in comparison with existing
procedures. Finally, we illustrate the new approach using U.S. Census data to
estimate a model that includes more than eighty thousand parameters.

arXiv link: http://arxiv.org/abs/2004.05127v3

Econometrics arXiv cross-link from cs.SI (cs.SI), submitted: 2020-04-10

mFLICA: An R package for Inferring Leadership of Coordination From Time Series

Authors: Chainarong Amornbunchornvej

Leadership is a process in which leaders influence followers to achieve
collective goals. One special case of leadership is coordinated pattern
initiation. In this context, leaders are initiators who initiate coordinated
patterns that everyone follows. Given a set of individual-multivariate time
series of real numbers, the mFLICA package provides a framework for R users to
infer coordination events within time series, initiators and followers of these
coordination events, as well as dynamics of group merging and splitting. The
mFLICA package also has a visualization function to make results of leadership
inference more understandable. The package is available on Comprehensive R
Archive Network (CRAN) at https://CRAN.R-project.org/package=mFLICA.

arXiv link: http://arxiv.org/abs/2004.06092v3

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2020-04-10

Direct and spillover effects of a new tramway line on the commercial vitality of peripheral streets. A synthetic-control approach

Authors: Giulio Grossi, Marco Mariani, Alessandra Mattei, Patrizia Lattarulo, Özge Öner

In cities, the creation of public transport infrastructure such as light
rails can cause changes on a very detailed spatial scale, with different
stories unfolding next to each other within a same urban neighborhood. We study
the direct effect of a light rail line built in Florence (Italy) on the retail
density of the street where it was built and its spillover effect on other
streets in the treated street's neighborhood. To this aim, we investigate the
use of the Synthetic Control Group (SCG) methods in panel comparative case
studies where interference between the treated and the untreated units is
plausible, an issue still little researched in the SCG methodological
literature. We frame our discussion in the potential outcomes approach. Under a
partial interference assumption, we formally define relevant direct and
spillover causal effects. We also consider the “unrealized” spillover effect
on the treated street in the hypothetical scenario that another street in the
treated unit's neighborhood had been assigned to the intervention.

arXiv link: http://arxiv.org/abs/2004.05027v5

Econometrics arXiv paper, submitted: 2020-04-10

Forecasts with Bayesian vector autoregressions under real time conditions

Authors: Michael Pfarrhofer

This paper investigates the sensitivity of forecast performance measures to
taking a real time versus pseudo out-of-sample perspective. We use monthly
vintages for the United States (US) and the Euro Area (EA) and estimate a set
of vector autoregressive (VAR) models of different sizes with constant and
time-varying parameters (TVPs) and stochastic volatility (SV). Our results
suggest differences in the relative ordering of model performance for point and
density forecasts depending on whether real time data or truncated final
vintages in pseudo out-of-sample simulations are used for evaluating forecasts.
No clearly superior specification for the US or the EA across variable types
and forecast horizons can be identified, although larger models featuring TVPs
appear to be affected the least by missing values and data revisions. We
identify substantial differences in performance metrics with respect to whether
forecasts are produced for the US or the EA.

arXiv link: http://arxiv.org/abs/2004.04984v1

Econometrics arXiv paper, submitted: 2020-04-09

On the Factors Influencing the Choices of Weekly Telecommuting Frequencies of Post-secondary Students in Toronto

Authors: Khandker Nurul Habib, Ph. D., PEng

The paper presents an empirical investigation of telecommuting frequency
choices by post-secondary students in Toronto. It uses a dataset collected
through a large-scale travel survey conducted on post-secondary students of
four major universities in Toronto and it employs multiple alternative
econometric modelling techniques for the empirical investigation. The results
contribute on two fronts. First, the paper presents empirical investigations of
factors affecting the telecommuting frequency choices of post-secondary
students, which are rare in the literature. Second, it identifies a
better-performing econometric modelling technique for modelling telecommuting
frequency choices. The empirical investigation clearly reveals that
telecommuting for school-related activities is prevalent among post-secondary
students in Toronto. Around 80 percent of the region's 0.18 million
post-secondary students, who make roughly 36,000 trips per day, telecommute at
least once a week.
Considering that large numbers of students need to spend a long time travelling
from home to campus with around 33 percent spending more than two hours a day
on travelling, telecommuting has potential to enhance their quality of life.
Empirical investigations reveal that car ownership and living farther from the
campus have similar positive effects on the choice of a higher telecommuting
frequency. Students who use a bicycle for regular travel are the least likely
to telecommute, compared to those using transit or a private car.

arXiv link: http://arxiv.org/abs/2004.04683v1

Econometrics arXiv updated paper (originally submitted: 2020-04-08)

Bias optimal vol-of-vol estimation: the role of window overlapping

Authors: Giacomo Toscano, Maria Cristina Recchioni

We derive a feasible criterion for the bias-optimal selection of the tuning
parameters involved in estimating the integrated volatility of the spot
volatility via the simple realized estimator by Barndorff-Nielsen and Veraart
(2009). Our analytic results are obtained assuming that the spot volatility is
a continuous mean-reverting process and that consecutive local windows for
estimating the spot volatility are allowed to overlap in a finite sample
setting. Moreover, our analytic results support some optimal selections of
tuning parameters prescribed in the literature, based on numerical evidence.
Interestingly, it emerges that window-overlapping is crucial for optimizing the
finite-sample bias of volatility-of-volatility estimates.

arXiv link: http://arxiv.org/abs/2004.04013v2

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2020-04-08

Manipulation-Proof Machine Learning

Authors: Daniel Björkegren, Joshua E. Blumenstock, Samsun Knight

An increasing number of decisions are guided by machine learning algorithms.
In many settings, from consumer credit to criminal justice, those decisions are
made by applying an estimator to data on an individual's observed behavior. But
when consequential decisions are encoded in rules, individuals may
strategically alter their behavior to achieve desired outcomes. This paper
develops a new class of estimator that is stable under manipulation, even when
the decision rule is fully transparent. We explicitly model the costs of
manipulating different behaviors, and identify decision rules that are stable
in equilibrium. Through a large field experiment in Kenya, we show that
decision rules estimated with our strategy-robust method outperform those based
on standard supervised learning approaches.

arXiv link: http://arxiv.org/abs/2004.03865v1

Econometrics arXiv updated paper (originally submitted: 2020-04-07)

Robust Empirical Bayes Confidence Intervals

Authors: Timothy B. Armstrong, Michal Kolesár, Mikkel Plagborg-Møller

We construct robust empirical Bayes confidence intervals (EBCIs) in a normal
means problem. The intervals are centered at the usual linear empirical Bayes
estimator, but use a critical value accounting for shrinkage. Parametric EBCIs
that assume a normal distribution for the means (Morris, 1983b) may
substantially undercover when this assumption is violated. In contrast, our
EBCIs control coverage regardless of the means distribution, while remaining
close in length to the parametric EBCIs when the means are indeed Gaussian. If
the means are treated as fixed, our EBCIs have an average coverage guarantee:
the coverage probability is at least $1 - \alpha$ on average across the $n$
EBCIs for each of the means. Our empirical application considers the effects of
U.S. neighborhoods on intergenerational mobility.

arXiv link: http://arxiv.org/abs/2004.03448v4
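
A minimal sketch of the standard ingredients in a normal-means problem with
known standard errors is given below: the linear empirical Bayes estimate and
the parametric interval that uses the usual 1.96 critical value. The paper's
robust interval replaces that critical value with one that accounts for
shrinkage and non-normal means; that critical value is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
theta = rng.laplace(scale=1.0, size=n)        # non-normal true means
se = np.full(n, 0.8)                          # known standard errors
y = theta + rng.normal(0, se)                 # unbiased noisy estimates

mu_hat = y.mean()
sigma2_hat = max(y.var(ddof=1) - np.mean(se**2), 0.0)  # moment estimate of Var(theta)
w = sigma2_hat / (sigma2_hat + se**2)                   # shrinkage weights
eb = mu_hat + w * (y - mu_hat)                          # linear EB estimates

half_width = 1.96 * se * np.sqrt(w)                     # parametric EBCI half-width
coverage = np.mean(np.abs(eb - theta) <= half_width)
print(f"average coverage of the parametric EBCIs: {coverage:.2f}")
```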

Econometrics arXiv updated paper (originally submitted: 2020-04-07)

Inference in Unbalanced Panel Data Models with Interactive Fixed Effects

Authors: Daniel Czarnowske, Amrei Stammann

We derive the asymptotic theory of Bai (2009)'s interactive fixed effects
estimator in unbalanced panels where the source of attrition is conditionally
random. For inference, we propose a method of alternating projections algorithm
based on straightforward scalar expressions to compute the residualized
variables required for the estimation of the bias terms and the covariance
matrix. Simulation experiments confirm our asymptotic results as reliable
finite sample approximations. Furthermore, we reassess Acemoglu et al. (2019).
Allowing for a more general form of unobserved heterogeneity, we confirm
significant effects of democratization on growth.

arXiv link: http://arxiv.org/abs/2004.03414v2

Econometrics arXiv cross-link from physics.soc-ph (physics.soc-ph), submitted: 2020-04-07

Visualising the Evolution of English Covid-19 Cases with Topological Data Analysis Ball Mapper

Authors: Pawel Dlotko, Simon Rudkin

Understanding disease spread through data visualisation has concentrated on
trends and maps. Whilst these are helpful, they neglect important
multi-dimensional interactions between characteristics of communities. Using
the Topological Data Analysis Ball Mapper algorithm we construct an abstract
representation of NUTS3 level economic data, overlaying onto it the confirmed
cases of Covid-19 in England. In so doing we may understand how the disease
spreads on different socio-economical dimensions. It is observed that some
areas of the characteristic space have quickly raced to the highest levels of
infection, while others close by in the characteristic space do not show large
infection growth. Likewise, we see patterns emerging in very different areas
that command more monitoring. A strong contribution for Topological Data
Analysis, and the Ball Mapper algorithm especially, in comprehending dynamic
epidemic data is signposted.

arXiv link: http://arxiv.org/abs/2004.03282v2

Econometrics arXiv updated paper (originally submitted: 2020-04-06)

Double Debiased Machine Learning Nonparametric Inference with Continuous Treatments

Authors: Kyle Colangelo, Ying-Ying Lee

We propose a doubly robust inference method for causal effects of continuous
treatment variables, under unconfoundedness and with nonparametric or
high-dimensional nuisance functions. Our double debiased machine learning (DML)
estimators for the average dose-response function (or the average structural
function) and the partial effects are asymptotically normal with non-parametric
convergence rates. The first-step estimators for the nuisance conditional
expectation function and the conditional density can be nonparametric or ML
methods. Utilizing a kernel-based doubly robust moment function and
cross-fitting, we give high-level conditions under which the nuisance function
estimators do not affect the first-order large sample distribution of the DML
estimators. We provide sufficient low-level conditions for kernel, series, and
deep neural networks. We justify the use of kernel to localize the continuous
treatment at a given value by the Gateaux derivative. We implement various ML
methods in Monte Carlo simulations and an empirical application to a job
training program evaluation.

arXiv link: http://arxiv.org/abs/2004.03036v8
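
Below is a rough, self-contained sketch of a kernel-localised, cross-fitted
doubly robust estimate of the average dose-response at a single treatment value,
using off-the-shelf random forests for the outcome regression and an assumed
normal model for the conditional density of the treatment. It illustrates the
type of moment condition involved, not the authors' implementation.

```python
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(4)
n = 2000
X = rng.normal(size=(n, 3))
T = 0.5 * X[:, 0] + rng.normal(size=n)                   # continuous treatment
Y = np.sin(T) + X[:, 0] + rng.normal(scale=0.5, size=n)  # true dose response: sin(t)

t0, h = 0.5, 0.3                                         # evaluation point, bandwidth
scores = np.zeros(n)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    # outcome regression E[Y | T, X], fit on the other folds
    out = RandomForestRegressor(n_estimators=200, random_state=0)
    out.fit(np.column_stack([T[train], X[train]]), Y[train])
    gamma_t0 = out.predict(np.column_stack([np.full(len(test), t0), X[test]]))
    gamma_T = out.predict(np.column_stack([T[test], X[test]]))
    # conditional density of T given X, assuming normal residuals
    tmod = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[train], T[train])
    resid_sd = np.std(T[train] - tmod.predict(X[train]))
    dens = norm.pdf(t0, loc=tmod.predict(X[test]), scale=resid_sd)
    # kernel-localised doubly robust score evaluated on the held-out fold
    K = norm.pdf((T[test] - t0) / h) / h
    scores[test] = gamma_t0 + K / np.clip(dens, 1e-3, None) * (Y[test] - gamma_T)

print(f"estimated average dose response at t0={t0}: {scores.mean():.3f}")
print(f"true value sin(t0): {np.sin(t0):.3f}")
```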

Econometrics arXiv paper, submitted: 2020-04-06

What do online listings tell us about the housing market?

Authors: Michele Loberto, Andrea Luciani, Marco Pangallo

Traditional data sources for the analysis of housing markets show several
limitations, that recently started to be overcome using data coming from
housing sales advertisements (ads) websites. In this paper, using a large
dataset of ads in Italy, we provide the first comprehensive analysis of the
problems and potential of these data. The main problem is that multiple ads
("duplicates") can correspond to the same housing unit. We show that this issue
is mainly caused by sellers' attempt to increase visibility of their listings.
Duplicates lead to misrepresentation of the volume and composition of housing
supply, but this bias can be corrected by identifying duplicates with machine
learning tools. We then focus on the potential of these data. We show that the
timeliness, granularity, and online nature of these data allow monitoring of
housing demand, supply and liquidity, and that the (asking) prices posted on
the website can be more informative than transaction prices.

arXiv link: http://arxiv.org/abs/2004.02706v1

Econometrics arXiv cross-link from q-fin.PM (q-fin.PM), submitted: 2020-04-06

Spanning analysis of stock market anomalies under Prospect Stochastic Dominance

Authors: Stelios Arvanitis, Olivier Scaillet, Nikolas Topaloglou

We develop and implement methods for determining whether introducing new
securities or relaxing investment constraints improves the investment
opportunity set for prospect investors. We formulate a new testing procedure
for prospect spanning for two nested portfolio sets based on subsampling and
Linear Programming. In an application, we use the prospect spanning framework
to evaluate whether well-known anomalies are spanned by standard factors. We
find that of the strategies considered, many expand the opportunity set of the
prospect type investors, thus have real economic value for them. In-sample and
out-of-sample results prove remarkably consistent in identifying genuine
anomalies for prospect investors.

arXiv link: http://arxiv.org/abs/2004.02670v1

Econometrics arXiv updated paper (originally submitted: 2020-04-04)

Kernel Estimation of Spot Volatility with Microstructure Noise Using Pre-Averaging

Authors: José E. Figueroa-López, Bei Wu

We first revisit the problem of estimating the spot volatility of an Itô
semimartingale using a kernel estimator. We prove a Central Limit Theorem with
optimal convergence rate for a general two-sided kernel. Next, we introduce a
new pre-averaging/kernel estimator for spot volatility to handle the
microstructure noise of ultra high-frequency observations. We prove a Central
Limit Theorem for the estimation error with an optimal rate and study the
optimal selection of the bandwidth and kernel functions. We show that the
pre-averaging/kernel estimator's asymptotic variance is minimal for exponential
kernels, hence, justifying the need of working with kernels of unbounded
support as proposed in this work. We also develop a feasible implementation of
the proposed estimators with optimal bandwidth. Monte Carlo experiments confirm
the superior performance of the devised method.

arXiv link: http://arxiv.org/abs/2004.01865v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-04-03

Estimation and Uniform Inference in Sparse High-Dimensional Additive Models

Authors: Philipp Bach, Sven Klaassen, Jannis Kueck, Martin Spindler

We develop a novel method to construct uniformly valid confidence bands for a
nonparametric component $f_1$ in the sparse additive model $Y=f_1(X_1)+\ldots +
f_p(X_p) + \varepsilon$ in a high-dimensional setting. Our method integrates
sieve estimation into a high-dimensional Z-estimation framework, facilitating
the construction of uniformly valid confidence bands for the target component
$f_1$. To form these confidence bands, we employ a multiplier bootstrap
procedure. Additionally, we provide rates for the uniform lasso estimation in
high dimensions, which may be of independent interest. Through simulation
studies, we demonstrate that our proposed method delivers reliable results in
terms of estimation and coverage, even in small samples.

arXiv link: http://arxiv.org/abs/2004.01623v2

Econometrics arXiv updated paper (originally submitted: 2020-04-03)

Targeting predictors in random forest regression

Authors: Daniel Borup, Bent Jesper Christensen, Nicolaj Nørgaard Mühlbach, Mikkel Slot Nielsen

Random forest regression (RF) is an extremely popular tool for the analysis
of high-dimensional data. Nonetheless, its benefits may be lessened in sparse
settings due to weak predictors, and a pre-estimation dimension reduction
(targeting) step is required. We show that proper targeting controls the
probability of placing splits along strong predictors, thus providing an
important complement to RF's feature sampling. This is supported by simulations
using representative finite samples. Moreover, we quantify the immediate gain
from targeting in terms of increased strength of individual trees.
Macroeconomic and financial applications show that the bias-variance trade-off
implied by targeting, due to increased correlation among trees in the forest,
is balanced at a medium degree of targeting, selecting the best 10--30% of
commonly applied predictors. Improvements in predictive accuracy of targeted RF
relative to ordinary RF are considerable, up to 12-13%, occurring both in
recessions and expansions, particularly at long horizons.

arXiv link: http://arxiv.org/abs/2004.01411v4
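
A simplified sketch of the targeting idea on simulated sparse data is shown
below, using a naive correlation-based pre-selection rule (an assumption for the
example; the paper studies formal targeting procedures) and comparing
out-of-sample fit with and without targeting.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n, p, s = 600, 200, 5                           # sparse setting: only 5 strong predictors
X = rng.normal(size=(n, p))
y = X[:, :s] @ np.ones(s) + rng.normal(size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
corr = np.abs([np.corrcoef(X_tr[:, j], y_tr)[0, 1] for j in range(p)])
keep = np.argsort(corr)[-int(0.2 * p):]         # target the strongest 20% of predictors

rf_all = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)
rf_tgt = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr[:, keep], y_tr)
print("R2 untargeted:", round(rf_all.score(X_te, y_te), 3))
print("R2 targeted:  ", round(rf_tgt.score(X_te[:, keep], y_te), 3))
```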

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2020-03-31

Machine Learning Algorithms for Financial Asset Price Forecasting

Authors: Philip Ndikum

This research paper explores the performance of Machine Learning (ML)
algorithms and techniques that can be used for financial asset price
forecasting. The prediction and forecasting of asset prices and returns remains
one of the most challenging and exciting problems for quantitative finance
researchers and practitioners alike. The massive increase in data generated and
captured in
recent years presents an opportunity to leverage Machine Learning algorithms.
This study directly compares and contrasts state-of-the-art implementations of
modern Machine Learning algorithms on high performance computing (HPC)
infrastructures versus the traditional and highly popular Capital Asset Pricing
Model (CAPM) on U.S. equities data. The implemented Machine Learning models,
trained on time series data for an entire stock universe (in addition to
exogenous macroeconomic variables), significantly outperform the CAPM on
out-of-sample (OOS) test data.

arXiv link: http://arxiv.org/abs/2004.01504v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2020-03-31

Optimal Combination of Arctic Sea Ice Extent Measures: A Dynamic Factor Modeling Approach

Authors: Francis X. Diebold, Maximilian Göbel, Philippe Goulet Coulombe, Glenn D. Rudebusch, Boyuan Zhang

The diminishing extent of Arctic sea ice is a key indicator of climate change
as well as an accelerant for future global warming. Since 1978, Arctic sea ice
has been measured using satellite-based microwave sensing; however, different
measures of Arctic sea ice extent have been made available based on differing
algorithmic transformations of the raw satellite data. We propose and estimate
a dynamic factor model that combines four of these measures in an optimal way
that accounts for their differing volatility and cross-correlations. We then
use the Kalman smoother to extract an optimal combined measure of Arctic sea
ice extent. It turns out that almost all weight is put on the NSIDC Sea Ice
Index, confirming and enhancing confidence in the Sea Ice Index and the NASA
Team algorithm on which it is based.

arXiv link: http://arxiv.org/abs/2003.14276v2
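
A small sketch of the general approach on simulated data (not the actual sea ice
series): fit a one-factor dynamic factor model to several noisy measures of a
common latent series and extract the Kalman-smoothed factor as the combined
measure.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.dynamic_factor import DynamicFactor

rng = np.random.default_rng(6)
nobs = 300
factor = np.zeros(nobs)
for t in range(1, nobs):                       # simulated AR(1) latent series
    factor[t] = 0.9 * factor[t - 1] + rng.normal()

# four noisy measures of the same latent series, with differing volatilities
obs = pd.DataFrame({f"measure_{i}": factor + rng.normal(scale=s, size=nobs)
                    for i, s in enumerate([0.3, 0.5, 0.8, 1.0])})
obs = (obs - obs.mean()) / obs.std()           # standardise before estimation

model = DynamicFactor(obs, k_factors=1, factor_order=1)
res = model.fit(disp=False)
combined = np.asarray(res.factors.smoothed).ravel()   # Kalman-smoothed common factor
print("corr(smoothed factor, truth):",
      round(abs(np.corrcoef(combined, factor)[0, 1]), 3))
```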

Econometrics arXiv paper, submitted: 2020-03-31

A wavelet analysis of inter-dependence, contagion and long memory among global equity markets

Authors: Avishek Bhandari

This study attempts to investigate into the structure and features of global
equity markets from a time-frequency perspective. An analysis grounded on this
framework allows one to capture information from a different dimension, as
opposed to the traditional time domain analyses, where multiscale structures of
financial markets are clearly extracted. In financial time series, multiscale
features manifest themselves due to presence of multiple time horizons. The
existence of multiple time horizons necessitates a careful investigation of
each time horizon separately as market structures are not homogenous across
different time horizons. The presence of multiple time horizons, with varying
levels of complexity, requires one to investigate financial time series from a
heterogeneous market perspective where market players are said to operate at
different investment horizons. This thesis extends the application of
time-frequency based wavelet techniques to: i) analyse the interdependence of
global equity markets from a heterogeneous investor perspective with a special
focus on the Indian stock market, ii) investigate the contagion effect, if any,
of financial crises on Indian stock market, and iii) to study fractality and
scaling properties of global equity markets and analyse the efficiency of
Indian stock markets using wavelet based long memory methods.

arXiv link: http://arxiv.org/abs/2003.14110v1

Econometrics arXiv updated paper (originally submitted: 2020-03-30)

Specification tests for generalized propensity scores using double projections

Authors: Pedro H. C. Sant'Anna, Xiaojun Song

This paper proposes a new class of nonparametric tests for the correct
specification of models based on conditional moment restrictions, paying
particular attention to generalized propensity score models. The test procedure
is based on two different projection arguments, leading to test statistics that
are suitable to setups with many covariates, and are (asymptotically) invariant
to the estimation method used to estimate the nuisance parameters. We show that
our proposed tests are able to detect a broad class of local alternatives
converging to the null at the usual parametric rate and illustrate its
attractive power properties via simulations. We also extend our proposal to
test parametric or semiparametric single-index-type models.

arXiv link: http://arxiv.org/abs/2003.13803v2

Econometrics arXiv paper, submitted: 2020-03-30

High-dimensional mixed-frequency IV regression

Authors: Andrii Babii

This paper introduces a high-dimensional linear IV regression for the data
sampled at mixed frequencies. We show that the high-dimensional slope parameter
of a high-frequency covariate can be identified and accurately estimated
leveraging on a low-frequency instrumental variable. The distinguishing feature
of the model is that it allows handling high-dimensional datasets without
imposing the approximate sparsity restrictions. We propose a
Tikhonov-regularized estimator and derive the convergence rate of its
mean-integrated squared error for time series data. The estimator has a
closed-form expression that is easy to compute and demonstrates excellent
performance in our Monte Carlo experiments. We estimate the real-time price
elasticity of supply on the Australian electricity spot market. Our estimates
suggest that the supply is relatively inelastic and that its elasticity is
heterogeneous throughout the day.

arXiv link: http://arxiv.org/abs/2003.13478v1

Econometrics arXiv paper, submitted: 2020-03-26

Sequential monitoring for cointegrating regressions

Authors: Lorenzo Trapani, Emily Whitehouse

We develop monitoring procedures for cointegrating regressions, testing the
null of no breaks against the alternatives that there is either a change in the
slope, or a change to non-cointegration. After observing the regression for a
calibration sample m, we study a CUSUM-type statistic to detect the presence of
change during a monitoring horizon m+1,...,T. Our procedures use a class of
boundary functions which depend on a parameter whose value affects the delay in
detecting the possible break. Technically, these procedures are based on almost
sure limiting theorems whose derivation is not straightforward. We therefore
define a monitoring function which - at every point in time - diverges to
infinity under the null, and drifts to zero under alternatives. We cast this
sequence in a randomised procedure to construct an i.i.d. sequence, which we
then employ to define the detector function. Our monitoring procedure rejects
the null of no break (when correct) with a small probability, whilst it rejects
with probability one over the monitoring horizon in the presence of breaks.

arXiv link: http://arxiv.org/abs/2003.12182v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-03-26

Estimating Treatment Effects with Observed Confounders and Mediators

Authors: Shantanu Gupta, Zachary C. Lipton, David Childers

Given a causal graph, the do-calculus can express treatment effects as
functionals of the observational joint distribution that can be estimated
empirically. Sometimes the do-calculus identifies multiple valid formulae,
prompting us to compare the statistical properties of the corresponding
estimators. For example, the backdoor formula applies when all confounders are
observed and the frontdoor formula applies when an observed mediator transmits
the causal effect. In this paper, we investigate the over-identified scenario
where both confounders and mediators are observed, rendering both estimators
valid. Addressing the linear Gaussian causal model, we demonstrate that either
estimator can dominate the other by an unbounded constant factor. Next, we
derive an optimal estimator, which leverages all observed variables, and bound
its finite-sample variance. We show that it strictly outperforms the backdoor
and frontdoor estimators and that this improvement can be unbounded. We also
present a procedure for combining two datasets, one with observed confounders
and another with observed mediators. Finally, we evaluate our methods on both
simulated data and the IHDP and JTPA datasets.

arXiv link: http://arxiv.org/abs/2003.11991v3
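
The two identification formulae compared in the paper can be illustrated with a
toy linear-Gaussian simulation: with an observed confounder X and an observed
mediator M, both the backdoor and the frontdoor estimator recover the causal
effect of T on Y (here 1.05). The paper's optimal estimator, which combines all
observed variables, is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50_000
X = rng.normal(size=n)                      # observed confounder
T = 0.8 * X + rng.normal(size=n)            # treatment
M = 1.5 * T + rng.normal(size=n)            # mediator (entire effect goes through M)
Y = 0.7 * M + 1.2 * X + rng.normal(size=n)  # outcome; true effect of T on Y = 1.5 * 0.7

def ols(y, *regs):
    """Return the slope coefficients from a least-squares fit with an intercept."""
    Z = np.column_stack([np.ones(n), *regs])
    return np.linalg.lstsq(Z, y, rcond=None)[0][1:]

backdoor = ols(Y, T, X)[0]                  # coefficient on T, adjusting for X
frontdoor = ols(M, T)[0] * ols(Y, M, T)[0]  # (T -> M) times (M -> Y given T)
print(f"true effect 1.05, backdoor {backdoor:.3f}, frontdoor {frontdoor:.3f}")
```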

Econometrics arXiv updated paper (originally submitted: 2020-03-25)

Rationalizing Rational Expectations: Characterization and Tests

Authors: Xavier D'Haultfoeuille, Christophe Gaillac, Arnaud Maurel

In this paper, we build a new test of rational expectations based on the
marginal distributions of realizations and subjective beliefs. This test is
widely applicable, including in the common situation where realizations and
beliefs are observed in two different datasets that cannot be matched. We show
that whether one can rationalize rational expectations is equivalent to the
distribution of realizations being a mean-preserving spread of the distribution
of beliefs. The null hypothesis can then be rewritten as a system of many
moment inequality and equality constraints, for which tests have been recently
developed in the literature. The test is robust to measurement errors under
some restrictions and can be extended to account for aggregate shocks. Finally,
we apply our methodology to test for rational expectations about future
earnings. While individuals tend to be right on average about their future
earnings, our test strongly rejects rational expectations.

arXiv link: http://arxiv.org/abs/2003.11537v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-03-25

Missing at Random or Not: A Semiparametric Testing Approach

Authors: Rui Duan, C. Jason Liang, Pamela Shaw, Cheng Yong Tang, Yong Chen

Practical problems with missing data are common, and statistical methods have
been developed concerning the validity and/or efficiency of statistical
procedures. On a central focus, there have been longstanding interests on the
mechanism governing data missingness, and correctly deciding the appropriate
mechanism is crucially relevant for conducting proper practical investigations.
The conventional notions include the three common potential classes -- missing
completely at random, missing at random, and missing not at random. In this
paper, we present a new hypothesis testing approach for deciding between
missing at random and missing not at random. Since the potential alternatives
of missing at random are broad, we focus our investigation on a general class
of models with instrumental variables for data missing not at random. Our
setting is broadly applicable because the model concerning the missing
data is nonparametric, requiring no explicit model specification for the data
missingness. The foundational idea is to develop appropriate discrepancy
measures between estimators whose properties significantly differ only when
missing at random does not hold. We show that our new hypothesis testing
approach achieves an objective data oriented choice between missing at random
or not. We demonstrate the feasibility, validity, and efficacy of the new test
by theoretical analysis, simulation studies, and a real data analysis.

arXiv link: http://arxiv.org/abs/2003.11181v1

Econometrics arXiv updated paper (originally submitted: 2020-03-20)

A Correlated Random Coefficient Panel Model with Time-Varying Endogeneity

Authors: Louise Laage

This paper studies a class of linear panel models with random coefficients.
We do not restrict the joint distribution of the time-invariant unobserved
heterogeneity and the covariates. We investigate identification of the average
partial effect (APE) when fixed-effect techniques cannot be used to control for
the correlation between the regressors and the time-varying disturbances.
Relying on control variables, we develop a constructive two-step identification
argument. The first step identifies nonparametrically the conditional
expectation of the disturbances given the regressors and the control variables,
and the second step uses “between-group” variations, correcting for
endogeneity, to identify the APE. We propose a natural semiparametric estimator
of the APE, show its $\sqrt{n}$ asymptotic normality and compute its asymptotic
variance. The estimator is computationally easy to implement, and Monte Carlo
simulations show favorable finite sample properties. Control variables arise in
various economic and econometric models, and we propose applications of our
argument in several models. As an empirical illustration, we estimate the
average elasticity of intertemporal substitution in a labor supply model with
random coefficients.

arXiv link: http://arxiv.org/abs/2003.09367v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-03-18

Causal Simulation Experiments: Lessons from Bias Amplification

Authors: Tyrel Stokes, Russell Steele, Ian Shrier

Recent theoretical work in causal inference has explored an important class
of variables which, when conditioned on, may further amplify existing
unmeasured confounding bias (bias amplification). Despite this theoretical
work, existing simulations of bias amplification in clinical settings have
suggested that it may be less important in many practical cases than the
theoretical literature implies. We resolve this tension by using tools
from the semi-parametric regression literature leading to a general
characterization in terms of the geometry of OLS estimators which allows us to
extend current results to a larger class of DAGs, functional forms, and
distributional assumptions. We further use these results to understand the
limitations of current simulation approaches and to propose a new framework for
performing causal simulation experiments to compare estimators. We then
evaluate the challenges and benefits of extending this simulation approach to
the context of a real clinical data set with a binary treatment, laying the
groundwork for a principled approach to sensitivity analysis for bias
amplification in the presence of unmeasured confounding.

arXiv link: http://arxiv.org/abs/2003.08449v1
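
The core mechanism can be reproduced in a few lines: a covariate that affects
treatment but not the outcome amplifies the bias from an unmeasured confounder
when it is conditioned on. A minimal simulation sketch; the coefficients and
data-generating process are hypothetical and not taken from the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000
    u = rng.normal(size=n)                      # unmeasured confounder
    z = rng.normal(size=n)                      # affects treatment only
    t = z + u + rng.normal(size=n)              # treatment
    y = 0.5 * t + u + rng.normal(size=n)        # true effect of t is 0.5

    def ols_slope(y, *regressors):
        X = np.column_stack([np.ones(len(y)), *regressors])
        return np.linalg.lstsq(X, y, rcond=None)[0][1]  # coefficient on t

    print("unadjusted:      %.3f" % ols_slope(y, t))     # approx 0.83
    print("adjusting for z: %.3f" % ols_slope(y, t, z))  # approx 1.00 (amplified)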

Econometrics arXiv updated paper (originally submitted: 2020-03-18)

Experimental Design under Network Interference

Authors: Davide Viviano

This paper studies the design of two-wave experiments in the presence of
spillover effects when the researcher aims to conduct precise inference on
treatment effects. We consider units connected through a single network, local
dependence among individuals, and a general class of estimands encompassing
average treatment and average spillover effects. We introduce a statistical
framework for designing two-wave experiments with networks, where the
researcher optimizes over participants and treatment assignments to minimize
the variance of the estimators of interest, using a first-wave (pilot)
experiment to estimate the variance. We derive guarantees for inference on
treatment effects and regret guarantees on the variance obtained from the
proposed design mechanism. Our results illustrate the existence of a trade-off
in the choice of the pilot study and formally characterize the pilot's size
relative to the main experiment. Simulations on synthetic and real-world
networks illustrate the advantages of the method.

arXiv link: http://arxiv.org/abs/2003.08421v4

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2020-03-17

Interpretable Personalization via Policy Learning with Linear Decision Boundaries

Authors: Zhaonan Qu, Isabella Qian, Zhengyuan Zhou

With the rise of the digital economy and an explosion of available
information about consumers, effective personalization of goods and services
has become a core business focus for companies to improve revenues and maintain
a competitive edge. This paper studies the personalization problem through the
lens of policy learning, where the goal is to learn a decision-making rule (a
policy) that maps from consumer and product characteristics (features) to
recommendations (actions) in order to optimize outcomes (rewards). We focus on
using available historical data for offline learning with unknown data
collection procedures, where a key challenge is the non-random assignment of
recommendations. Moreover, in many business and medical applications,
interpretability of a policy is essential. We study the class of policies with
linear decision boundaries to ensure interpretability, and propose learning
algorithms using tools from causal inference to address unbalanced treatments.
We study several optimization schemes to solve the associated non-convex,
non-smooth optimization problem, and find that a Bayesian optimization
algorithm is effective. We test our algorithm with extensive simulation studies
and apply it to an anonymized online marketplace customer purchase dataset,
where the learned policy outputs a personalized discount recommendation based
on customer and product features in order to maximize gross merchandise value
(GMV) for sellers. Our learned policy improves upon the platform's baseline by
88.2% in net sales revenue, while also providing informative insights on which
features are important for the decision-making process. Our findings suggest
that our proposed policy learning framework using tools from causal inference
and Bayesian optimization provides a promising practical approach to
interpretable personalization across a wide range of applications.

arXiv link: http://arxiv.org/abs/2003.07545v4

Econometrics arXiv cross-link from physics.soc-ph (physics.soc-ph), submitted: 2020-03-16

Anomalous supply shortages from dynamic pricing in on-demand mobility

Authors: Malte Schröder, David-Maximilian Storch, Philip Marszal, Marc Timme

Dynamic pricing schemes are increasingly employed across industries to
maintain a self-organized balance of demand and supply. However, throughout
complex dynamical systems, unintended collective states exist that may
compromise their function. Here we reveal how dynamic pricing may induce
demand-supply imbalances instead of preventing them. Combining game theory and
time series analysis of dynamic pricing data from on-demand ride-hailing
services, we explain this apparent contradiction. We derive a phase diagram
demonstrating how and under which conditions dynamic pricing incentivizes
collective action of ride-hailing drivers to induce anomalous supply shortages.
By disentangling different timescales in price time series of ride-hailing
services at 137 locations across the globe, we identify characteristic patterns
in the price dynamics reflecting these anomalous supply shortages. Our results
provide systemic insights for the regulation of dynamic pricing, in particular
in publicly accessible mobility systems, by unraveling under which conditions
dynamic pricing schemes promote anomalous supply shortages.

arXiv link: http://arxiv.org/abs/2003.07736v1

Econometrics arXiv updated paper (originally submitted: 2020-03-16)

Testing Many Restrictions Under Heteroskedasticity

Authors: Stanislav Anatolyev, Mikkel Sølvsten

We propose a hypothesis test that allows for many tested restrictions in a
heteroskedastic linear regression model. The test compares the conventional F
statistic to a critical value that corrects for many restrictions and
conditional heteroskedasticity. This correction uses leave-one-out estimation
to correctly center the critical value and leave-three-out estimation to
appropriately scale it. The large sample properties of the test are established
in an asymptotic framework where the number of tested restrictions may be fixed
or may grow with the sample size, and can even be proportional to the number of
observations. We show that the test is asymptotically valid and has non-trivial
asymptotic power against the same local alternatives as the exact F test when
the latter is valid. Simulations corroborate these theoretical findings and
suggest excellent size control in moderately small samples, even under strong
heteroskedasticity.

arXiv link: http://arxiv.org/abs/2003.07320v3

Econometrics arXiv updated paper (originally submitted: 2020-03-16)

Stochastic Frontier Analysis with Generalized Errors: inference, model comparison and averaging

Authors: Kamil Makieła, Błażej Mazur

Contribution of this paper lies in the formulation and estimation of a
generalized model for stochastic frontier analysis (SFA) that nests virtually
all forms used and includes some that have not been considered so far. The
model is based on the generalized t distribution for the observation error and
the generalized beta distribution of the second kind for the
inefficiency-related term. We use this general error structure framework for
formal testing, to compare alternative specifications and to conduct model
averaging. This allows us to deal with model specification uncertainty, which
is one of the main unresolved issues in SFA, and to relax a number of
potentially restrictive assumptions embedded within existing SF models. We also
develop Bayesian inference methods that are less restrictive compared to the
ones used so far and demonstrate feasible approximate alternatives based on
maximum likelihood.

arXiv link: http://arxiv.org/abs/2003.07150v2

Econometrics arXiv updated paper (originally submitted: 2020-03-13)

Targeting customers under response-dependent costs

Authors: Johannes Haupt, Stefan Lessmann

This study provides a formal analysis of the customer targeting problem when
the cost for a marketing action depends on the customer response and proposes a
framework to estimate the decision variables for campaign profit optimization.
Targeting a customer is profitable if the impact and associated profit of the
marketing treatment are higher than its cost. Despite the growing literature on
uplift models to identify the strongest treatment-responders, no research has
investigated optimal targeting when the costs of the treatment are unknown at
the time of the targeting decision. Stochastic costs are ubiquitous in direct
marketing and customer retention campaigns because marketing incentives are
conditioned on a positive customer response. This study makes two contributions
to the literature, which are evaluated on an e-commerce coupon targeting
campaign. First, we formally analyze the targeting decision problem under
response-dependent costs. Profit-optimal targeting requires an estimate of the
treatment effect on the customer and an estimate of the customer response
probability under treatment. The empirical results demonstrate that the
consideration of treatment cost substantially increases campaign profit when
used for customer targeting in combination with an estimate of the average or
customer-level treatment effect. Second, we propose a framework to jointly
estimate the treatment effect and the response probability by combining methods
for causal inference with a hurdle mixture model. The proposed causal hurdle
model achieves competitive campaign profit while streamlining model building.
Code is available at https://github.com/Humboldt-WI/response-dependent-costs.

arXiv link: http://arxiv.org/abs/2003.06271v2
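
The targeting rule under response-dependent costs reduces to simple
expected-profit arithmetic: treat a customer when the incremental value of the
treatment effect exceeds the expected incentive cost, which is only incurred
on a positive response. A hedged sketch; the variable names and the numbers in
the example are illustrative, not the paper's estimates.

    def expected_targeting_profit(tau_hat, resp_prob_treated, margin, incentive_cost):
        """Expected incremental profit of treating one customer when the
        incentive cost is paid only if the treated customer responds.

        tau_hat           : estimated uplift in response probability
        resp_prob_treated : estimated response probability if treated
        margin            : profit per response
        incentive_cost    : cost of the incentive, incurred on response only
        """
        return tau_hat * margin - resp_prob_treated * incentive_cost

    # e.g. 4pp uplift, 30% response if treated, margin 50, coupon cost 10:
    # 0.04 * 50 - 0.30 * 10 = -1.0  -> do not target this customer
    print(expected_targeting_profit(0.04, 0.30, 50.0, 10.0))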

Econometrics arXiv updated paper (originally submitted: 2020-03-12)

Causal Spillover Effects Using Instrumental Variables

Authors: Gonzalo Vazquez-Bare

I set up a potential outcomes framework to analyze spillover effects using
instrumental variables. I characterize the population compliance types in a
setting in which spillovers can occur on both treatment take-up and outcomes,
and provide conditions for identification of the marginal distribution of
compliance types. I show that intention-to-treat (ITT) parameters aggregate
multiple direct and spillover effects for different compliance types, and hence
do not have a clear link to causally interpretable parameters. Moreover,
rescaling ITT parameters by first-stage estimands generally recovers a weighted
combination of average effects where the sum of weights is larger than one. I
then analyze identification of causal direct and spillover effects under
one-sided noncompliance, and show that causal effects can be estimated by 2SLS
in this case. I illustrate the proposed methods using data from an experiment
on social interactions and voting behavior. I also introduce an alternative
assumption, independence of peers' types, that identifies parameters of
interest under two-sided noncompliance by restricting the amount of
heterogeneity in average potential outcomes.

arXiv link: http://arxiv.org/abs/2003.06023v5

Econometrics arXiv updated paper (originally submitted: 2020-03-11)

A mixture autoregressive model based on Gaussian and Student's $t$-distributions

Authors: Savi Virolainen

We introduce a new mixture autoregressive model which combines Gaussian and
Student's $t$ mixture components. The model has very attractive properties
analogous to the Gaussian and Student's $t$ mixture autoregressive models, but
it is more flexible as it enables to model series which consist of both
conditionally homoscedastic Gaussian regimes and conditionally heteroscedastic
Student's $t$ regimes. The usefulness of our model is demonstrated in an
empirical application to the monthly U.S. interest rate spread between the
3-month Treasury bill rate and the effective federal funds rate.

arXiv link: http://arxiv.org/abs/2003.05221v3
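
A toy simulator conveys the flavour of the model: each observation is drawn
either from a conditionally homoskedastic Gaussian AR regime or a
conditionally heteroskedastic Student's t AR regime. The constant mixing
weight below is a simplification (in the actual model the mixing weights
depend on past observations), and all parameter values are hypothetical.

    import numpy as np

    def simulate_mixture_ar1(n, phi_g, sigma_g, phi_t, sigma_t, nu, alpha, seed=0):
        """Toy two-regime mixture AR(1): with probability alpha the next value
        follows a Gaussian AR(1), otherwise a Student-t AR(1)."""
        rng = np.random.default_rng(seed)
        y = np.zeros(n)
        for t in range(1, n):
            if rng.random() < alpha:
                y[t] = phi_g * y[t - 1] + sigma_g * rng.normal()
            else:
                y[t] = phi_t * y[t - 1] + sigma_t * rng.standard_t(nu)
        return y

    series = simulate_mixture_ar1(500, 0.6, 1.0, 0.9, 0.5, nu=4, alpha=0.7)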

Econometrics arXiv updated paper (originally submitted: 2020-03-09)

Identification and Estimation of Weakly Separable Models Without Monotonicity

Authors: Songnian Chen, Shakeeb Khan, Xun Tang

We study the identification and estimation of treatment effect parameters in
weakly separable models. In their seminal work, Vytlacil and Yildiz (2007)
showed how to identify and estimate the average treatment effect of a dummy
endogenous variable when the outcome is weakly separable in a single index.
Their identification result builds on a monotonicity condition with respect to
this single index. In comparison, we consider similar weakly separable models
with multiple indices, and relax the monotonicity condition for identification.
Unlike Vytlacil and Yildiz (2007), we exploit the full information in the
distribution of the outcome variable, instead of just its mean. Indeed, when
the outcome distribution function is more informative than the mean, our method
is applicable to more general settings than theirs; in particular we do not
rely on their monotonicity assumption and at the same time we also allow for
multiple indices. To illustrate the advantage of our approach, we provide
examples of models where our approach can identify parameters of interest
whereas existing methods would fail. These examples include models with
multiple unobserved disturbance terms such as the Roy model and multinomial
choice models with dummy endogenous variables, as well as potential outcome
models with endogenous random coefficients. Our method is easy to implement and
can be applied to a wide class of models. We establish standard asymptotic
properties such as consistency and asymptotic normality.

arXiv link: http://arxiv.org/abs/2003.04337v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-03-09

Fast Bayesian Record Linkage With Record-Specific Disagreement Parameters

Authors: Thomas Stringham

Researchers are often interested in linking individuals between two datasets
that lack a common unique identifier. Matching procedures often struggle to
match records with common names, birthplaces or other field values.
Computational feasibility is also a challenge, particularly when linking large
datasets. We develop a Bayesian method for automated probabilistic record
linkage and show it recovers more than 50% more true matches, holding accuracy
constant, than comparable methods in a matching of military recruitment data to
the 1900 US Census for which expert-labelled matches are available. Our
approach, which builds on a recent state-of-the-art Bayesian method, refines
the modelling of comparison data, allowing disagreement probability parameters
conditional on non-match status to be record-specific in the smaller of the two
datasets. This flexibility significantly improves matching when many records
share common field values. We show that our method is computationally feasible
in practice, despite the added complexity, with an R/C++ implementation that
achieves significant improvement in speed over comparable recent methods. We
also suggest a lightweight method for treatment of very common names and show
how to estimate true positive rate and positive predictive value when true
match status is unavailable.

arXiv link: http://arxiv.org/abs/2003.04238v2

Econometrics arXiv updated paper (originally submitted: 2020-03-09)

Unit Root Testing with Slowly Varying Trends

Authors: Sven Otto

A unit root test is proposed for time series with a general nonlinear
deterministic trend component. It is shown that asymptotically the pooled OLS
estimator of overlapping blocks filters out any trend component that satisfies
some Lipschitz condition. Under both fixed-$b$ and small-$b$ block asymptotics,
the limiting distribution of the t-statistic for the unit root hypothesis is
derived. Nuisance parameter corrections provide heteroskedasticity-robust
tests, and serial correlation is accounted for by pre-whitening. A Monte Carlo
study that considers slowly varying trends yields both good size and improved
power results for the proposed tests when compared to conventional unit root
tests.

arXiv link: http://arxiv.org/abs/2003.04066v3

Econometrics arXiv updated paper (originally submitted: 2020-03-06)

Complete Subset Averaging for Quantile Regressions

Authors: Ji Hyung Lee, Youngki Shin

We propose a novel conditional quantile prediction method based on complete
subset averaging (CSA) for quantile regressions. All models under consideration
are potentially misspecified and the dimension of regressors goes to infinity
as the sample size increases. Since we average over the complete subsets, the
number of models is much larger than in the usual model averaging methods,
which adopt sophisticated weighting schemes. We propose to use equal weights but
select the proper size of the complete subset based on the leave-one-out
cross-validation method. Building upon the theory of Lu and Su (2015), we
investigate the large sample properties of CSA and show the asymptotic
optimality in the sense of Li (1987). We check the finite sample performance
via Monte Carlo simulations and empirical applications.

arXiv link: http://arxiv.org/abs/2003.03299v3
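
A compact sketch of the equal-weight complete-subset average for a single
quantile forecast, using statsmodels' QuantReg. In practice the subset size k
would be chosen by leave-one-out cross-validation as described above; the
function interface is an assumption for illustration.

    import numpy as np
    from itertools import combinations
    import statsmodels.api as sm

    def csa_quantile_forecast(y, X, x_new, k, tau=0.5):
        """Average the tau-quantile predictions from quantile regressions
        fitted on every subset of k regressors (equal weights)."""
        preds = []
        for idx in combinations(range(X.shape[1]), k):
            cols = list(idx)
            fit = sm.QuantReg(y, sm.add_constant(X[:, cols])).fit(q=tau)
            preds.append(float(fit.predict(np.r_[1.0, x_new[cols]])))
        return float(np.mean(preds))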

Econometrics arXiv updated paper (originally submitted: 2020-03-06)

Double Machine Learning based Program Evaluation under Unconfoundedness

Authors: Michael C. Knaus

This paper reviews, applies and extends recently proposed methods based on
Double Machine Learning (DML) with a focus on program evaluation under
unconfoundedness. DML based methods leverage flexible prediction models to
adjust for confounding variables in the estimation of (i) standard average
effects, (ii) different forms of heterogeneous effects, and (iii) optimal
treatment assignment rules. An evaluation of multiple programs of the Swiss
Active Labour Market Policy illustrates how DML based methods enable a
comprehensive program evaluation. Motivated by extreme individualised treatment
effect estimates of the DR-learner, we propose the normalised DR-learner
(NDR-learner) to address this issue. The NDR-learner acknowledges that
individualised effect estimates can be stabilised by an individualised
normalisation of inverse probability weights.

arXiv link: http://arxiv.org/abs/2003.03191v5
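
As a reference point, here is a minimal cross-fitted doubly robust (AIPW)
estimate of an average effect under unconfoundedness, the basic DML building
block the paper reviews. The NDR-learner's normalisation for individualised
effects is not implemented, and the choice of random forests as nuisance
learners is an arbitrary illustration.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
    from sklearn.model_selection import KFold

    def aipw_ate(y, d, X, n_splits=2, seed=0):
        """Cross-fitted AIPW score for the ATE of a binary treatment d."""
        psi = np.zeros(len(y))
        for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
            ps = RandomForestClassifier(random_state=seed).fit(X[train], d[train])
            m1 = RandomForestRegressor(random_state=seed).fit(
                X[train][d[train] == 1], y[train][d[train] == 1])
            m0 = RandomForestRegressor(random_state=seed).fit(
                X[train][d[train] == 0], y[train][d[train] == 0])
            p = np.clip(ps.predict_proba(X[test])[:, 1], 0.01, 0.99)
            mu1, mu0 = m1.predict(X[test]), m0.predict(X[test])
            psi[test] = (mu1 - mu0
                         + d[test] * (y[test] - mu1) / p
                         - (1 - d[test]) * (y[test] - mu0) / (1 - p))
        return psi.mean(), psi.std(ddof=1) / np.sqrt(len(y))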

Econometrics arXiv updated paper (originally submitted: 2020-03-05)

Equal Predictive Ability Tests Based on Panel Data with Applications to OECD and IMF Forecasts

Authors: Oguzhan Akgun, Alain Pirotte, Giovanni Urga, Zhenlin Yang

We propose two types of equal predictive ability (EPA) tests with panels to
compare the predictions made by two forecasters. The first type, namely
$S$-statistics, focuses on the overall EPA hypothesis which states that the EPA
holds on average over all panel units and over time. The second, called
$C$-statistics, focuses on the clustered EPA hypothesis where the EPA holds
jointly for a fixed number of clusters of panel units. The asymptotic
properties of the proposed tests are evaluated under weak and strong
cross-sectional dependence. An extensive Monte Carlo simulation shows that the
proposed tests have very good finite sample properties even with little
information about the cross-sectional dependence in the data. The proposed
framework is applied to compare the economic growth forecasts of the OECD and
the IMF, and to evaluate the performance of the consumer price inflation
forecasts of the IMF.

arXiv link: http://arxiv.org/abs/2003.02803v3

Econometrics arXiv updated paper (originally submitted: 2020-03-05)

Backward CUSUM for Testing and Monitoring Structural Change with an Application to COVID-19 Pandemic Data

Authors: Sven Otto, Jörg Breitung

It is well known that the conventional cumulative sum (CUSUM) test suffers
from low power and large detection delay. In order to improve the power of the
test, we propose two alternative statistics. The backward CUSUM detector
considers the recursive residuals in reverse chronological order, whereas the
stacked backward CUSUM detector sequentially cumulates a triangular array of
backwardly cumulated residuals. A multivariate invariance principle for partial
sums of recursive residuals is given, and the limiting distributions of the
test statistics are derived under local alternatives. In the retrospective
context, the local power of the tests is shown to be substantially higher than
that of the conventional CUSUM test if a break occurs in the middle or at the
end of the sample. When applied to monitoring schemes, the detection delay of
the stacked backward CUSUM is found to be much shorter than that of the
conventional monitoring CUSUM procedure. Furthermore, we propose an estimator
of the break date based on the backward CUSUM detector and show that in
monitoring exercises this estimator tends to outperform the usual maximum
likelihood estimator. Finally, an application of the methodology to COVID-19
data is presented.

arXiv link: http://arxiv.org/abs/2003.02682v3
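
For concreteness, a bare-bones computation of standardized recursive residuals
and of a backward CUSUM path that cumulates them in reverse chronological
order. The boundary functions, the stacked variant, and the monitoring version
from the paper are omitted, and the scaling below is a simplified sketch.

    import numpy as np

    def recursive_residuals(y, X, burn=None):
        """One-step-ahead standardized recursive residuals of y on X."""
        n, k = X.shape
        burn = k if burn is None else burn
        w = []
        for t in range(burn, n):
            XtX_inv = np.linalg.pinv(X[:t].T @ X[:t])
            beta = XtX_inv @ X[:t].T @ y[:t]
            h = X[t] @ XtX_inv @ X[t]
            w.append((y[t] - X[t] @ beta) / np.sqrt(1.0 + h))
        return np.asarray(w)

    def backward_cusum(w):
        """Cumulate the recursive residuals from the end of the sample
        backwards, scaled by sigma_hat * sqrt(T)."""
        sigma = w.std(ddof=1)
        return np.cumsum(w[::-1]) / (sigma * np.sqrt(len(w)))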

Econometrics arXiv updated paper (originally submitted: 2020-03-05)

Impact of Congestion Charge and Minimum Wage on TNCs: A Case Study for San Francisco

Authors: Sen Li, Kameshwar Poolla, Pravin Varaiya

This paper describes the impact on transportation network companies (TNCs) of
the imposition of a congestion charge and a driver minimum wage. The impact is
assessed using a market equilibrium model to calculate the changes in the
number of passenger trips and trip fare, number of drivers employed, the TNC
platform profit, the number of TNC vehicles, and city revenue. Two charges are
considered: (a) a charge per TNC trip similar to an excise tax, and (b) a
charge per vehicle operating hour (whether or not it has a passenger) similar
to a road tax. Both charges reduce the number of TNC trips, but this reduction
is limited by the wage floor, and the resulting reduction in the number of TNC
vehicles is not significant. The time-based charge is preferable to the
trip-based charge
since, by penalizing idle vehicle time, the former increases vehicle occupancy.
In a case study for San Francisco, the time-based charge is found to be Pareto
superior to the trip-based charge as it yields higher passenger surplus, higher
platform profits, and higher tax revenue for the city.

arXiv link: http://arxiv.org/abs/2003.02550v4

Econometrics arXiv cross-link from math.OC (math.OC), submitted: 2020-03-04

Joint Estimation of Discrete Choice Model and Arrival Rate with Unobserved Stock-out Events

Authors: Hongzhang Shao, Anton J. Kleywegt

This paper studies the joint estimation problem of a discrete choice model
and the arrival rate of potential customers when unobserved stock-out events
occur. In this paper, we generalize [Anupindi et al., 1998] and [Conlon and
Mortimer, 2013] in the sense that (1) we work with generic choice models, (2)
we allow arbitrary numbers of products and stock-out events, and (3) we
consider the existence of the null alternative and estimate the overall
arrival rate of potential customers. In addition, we point out that the
modeling in [Conlon and Mortimer, 2013] is problematic, and present the correct
formulation.

arXiv link: http://arxiv.org/abs/2003.02313v1

Econometrics arXiv updated paper (originally submitted: 2020-03-04)

Estimating the Effect of Central Bank Independence on Inflation Using Longitudinal Targeted Maximum Likelihood Estimation

Authors: Philipp F. M. Baumann, Michael Schomaker, Enzo Rossi

The notion that an independent central bank reduces a country's inflation is
a controversial hypothesis. To date, it has not been possible to satisfactorily
answer this question because the complex macroeconomic structure that gives
rise to the data has not been adequately incorporated into statistical
analyses. We develop a causal model that summarizes the economic process of
inflation. Based on this causal model and recent data, we discuss and identify
the assumptions under which the effect of central bank independence on
inflation can be identified and estimated. Given these and alternative
assumptions, we estimate this effect using modern doubly robust effect
estimators, i.e., longitudinal targeted maximum likelihood estimators. The
estimation procedure incorporates machine learning algorithms and is tailored
to address the challenges associated with complex longitudinal macroeconomic
data. We do not find strong support for the hypothesis that having an
independent central bank for a long period of time necessarily lowers
inflation. Simulation studies evaluate the sensitivity of the proposed methods
in complex settings when certain assumptions are violated and highlight the
importance of working with appropriate learning algorithms for estimation.

arXiv link: http://arxiv.org/abs/2003.02208v7

Econometrics arXiv paper, submitted: 2020-02-29

Identification of Random Coefficient Latent Utility Models

Authors: Roy Allen, John Rehbeck

This paper provides nonparametric identification results for random
coefficient distributions in perturbed utility models. We cover discrete and
continuous choice models. We establish identification using variation in mean
quantities, and the results apply when an analyst observes aggregate demands
but not whether goods are chosen together. We require exclusion restrictions
and independence between random slope coefficients and random intercepts. We do
not require regressors to have large supports or parametric assumptions.

arXiv link: http://arxiv.org/abs/2003.00276v1

Econometrics arXiv updated paper (originally submitted: 2020-02-28)

Causal mediation analysis with double machine learning

Authors: Helmut Farbmacher, Martin Huber, Lukáš Lafférs, Henrika Langen, Martin Spindler

This paper combines causal mediation analysis with double machine learning to
control for observed confounders in a data-driven way under a
selection-on-observables assumption in a high-dimensional setting. We consider
the average indirect effect of a binary treatment operating through an
intermediate variable (or mediator) on the causal path between the treatment
and the outcome, as well as the unmediated direct effect. Estimation is based
on efficient score functions, which possess a multiple robustness property
w.r.t. misspecifications of the outcome, mediator, and treatment models. This
property is key for selecting these models by double machine learning, which is
combined with data splitting to prevent overfitting in the estimation of the
effects of interest. We demonstrate that the direct and indirect effect
estimators are asymptotically normal and root-n consistent under specific
regularity conditions and investigate the finite sample properties of the
suggested methods in a simulation study when considering lasso as machine
learner. We also provide an empirical application to the U.S. National
Longitudinal Survey of Youth, assessing the indirect effect of health insurance
coverage on general health operating via routine checkups as mediator, as well
as the direct effect. We find a moderate short term effect of health insurance
coverage on general health which is, however, not mediated by routine checkups.

arXiv link: http://arxiv.org/abs/2002.12710v6

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2020-02-28

Modelling Network Interference with Multi-valued Treatments: the Causal Effect of Immigration Policy on Crime Rates

Authors: C. Tortù, I. Crimaldi, F. Mealli, L. Forastiere

Policy evaluation studies, which intend to assess the effect of an
intervention, face some statistical challenges: in real-world settings
treatments are not randomly assigned and the analysis might be further
complicated by the presence of interference between units. Researchers have
started to develop novel methods that allow to manage spillover mechanisms in
observational studies; recent works focus primarily on binary treatments.
However, many policy evaluation studies deal with more complex interventions.
For instance, in political science, evaluating the impact of policies
implemented by administrative entities often implies a multivariate approach,
as a policy towards a specific issue operates at many different levels and can
be defined along a number of dimensions. In this work, we extend the
statistical framework about causal inference under network interference in
observational studies, allowing for a multi-valued individual treatment and an
interference structure shaped by a weighted network. The estimation strategy is
based on a joint multiple generalized propensity score and allows one to
estimate direct effects, controlling for both individual and network
covariates. We follow the proposed methodology to analyze the impact of the
national immigration policy on the crime rate. We define a multi-valued
characterization of political attitudes towards migrants and we assume that the
extent to which each country can be influenced by another country is modeled by
an appropriate indicator, summarizing their cultural and geographical
proximity. Results suggest that implementing a highly restrictive immigration
policy leads to an increase of the crime rate and the estimated effects is
larger if we take into account interference from other countries.

arXiv link: http://arxiv.org/abs/2003.10525v3

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2020-02-26

Off-Policy Evaluation and Learning for External Validity under a Covariate Shift

Authors: Masahiro Kato, Masatoshi Uehara, Shota Yasui

We consider evaluating and training a new policy for the evaluation data by
using the historical data obtained from a different policy. The goal of
off-policy evaluation (OPE) is to estimate the expected reward of a new policy
over the evaluation data, and that of off-policy learning (OPL) is to find a
new policy that maximizes the expected reward over the evaluation data.
Although the standard OPE and OPL assume the same covariate distribution
between the historical and evaluation data, a covariate shift often exists,
i.e., the distribution of the covariate of the historical data is different
from that of the evaluation data. In this paper, we derive the efficiency bound
of OPE under a covariate shift. Then, we propose doubly robust and efficient
estimators for OPE and OPL under a covariate shift by using a nonparametric
estimator of the density ratio between the historical and evaluation data
distributions. We also discuss other possible estimators and compare their
theoretical properties. Finally, we confirm the effectiveness of the proposed
estimators through experiments.

arXiv link: http://arxiv.org/abs/2002.11642v3
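
The estimator's main ingredient can be sketched as follows: reweight the
historical importance-weighted rewards by an estimate of the covariate density
ratio p_eval(x)/p_hist(x). The probabilistic-classifier density-ratio
estimator below is a common stand-in for the nonparametric estimator in the
paper (which also proposes a doubly robust version), and every name in the
sketch is illustrative.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def ope_covariate_shift(X_hist, a_hist, r_hist, behavior_prob,
                            X_eval, target_policy_prob):
        """IPW off-policy value of a target policy over the evaluation
        covariate distribution, using logged data from a behavior policy.

        behavior_prob      : P_behavior(a_hist | X_hist) for the logged actions
        target_policy_prob : function (X, a) -> P_target(a | X)
        """
        # density ratio via a classifier of eval (1) vs historical (0) covariates
        Z = np.vstack([X_hist, X_eval])
        lbl = np.r_[np.zeros(len(X_hist)), np.ones(len(X_eval))]
        p = LogisticRegression(max_iter=1000).fit(Z, lbl).predict_proba(X_hist)[:, 1]
        ratio = (p / (1 - p)) * (len(X_hist) / len(X_eval))
        w = target_policy_prob(X_hist, a_hist) / behavior_prob
        return float(np.mean(ratio * w * r_hist))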

Econometrics arXiv updated paper (originally submitted: 2020-02-26)

Econometric issues with Laubach and Williams' estimates of the natural rate of interest

Authors: Daniel Buncic

Holston, Laubach and Williams' (2017) estimates of the natural rate of
interest are driven by the downward trending behaviour of 'other factor'
$z_{t}$. I show that their implementation of Stock and Watson's (1998) Median
Unbiased Estimation (MUE) to determine the size of the $\lambda _{z}$ parameter
which drives this downward trend in $z_{t}$ is unsound. It cannot recover the
ratio of interest $\lambda _{z}=a_{r}\sigma _{z}/\sigma _{y}$ from MUE
required for the estimation of the full structural model. This failure is due
to an 'unnecessary' misspecification in Holston et al.'s (2017) formulation of
the Stage 2 model. More importantly, their implementation of MUE on this
misspecified Stage 2 model spuriously amplifies the point estimate of $\lambda
_{z}$. Using a simulation experiment, I show that their procedure generates
excessively large estimates of $\lambda _{z}$ when applied to data generated
from a model where the true $\lambda _{z}$ is equal to zero. Correcting the
misspecification in their Stage 2 model and the implementation of MUE leads to
a substantially smaller $\lambda _{z}$ estimate, and with this, a more subdued
downward trending influence of 'other factor' $z_{t}$ on the natural rate.
Moreover, the $\lambda _{z}$ point estimate is statistically highly
insignificant, suggesting that there is no role for 'other factor' $z_{t}$ in
this model. I also discuss various other estimation issues that arise in
Holston et al.'s (2017) model of the natural rate that make it unsuitable for
policy analysis.

arXiv link: http://arxiv.org/abs/2002.11583v2

Econometrics arXiv updated paper (originally submitted: 2020-02-25)

Hours Worked and the U.S. Distribution of Real Annual Earnings 1976-2019

Authors: Iván Fernández-Val, Franco Peracchi, Aico van Vuuren, Francis Vella

We examine the impact of annual hours worked on annual earnings by
decomposing changes in the real annual earnings distribution into composition,
structural and hours effects. We do so via a nonseparable simultaneous model of
hours, wages and earnings. Using the Current Population Survey for the survey
years 1976--2019, we find that changes in the female distribution of annual
hours of work are important in explaining movements in inequality in female
annual earnings. This captures the substantial changes in their employment
behavior over this period. Movements in the male hours distribution only affect
the lower part of their earnings distribution and reflect the sensitivity of
these workers' annual hours of work to cyclical factors.

arXiv link: http://arxiv.org/abs/2002.11211v3

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2020-02-25

A Practical Approach to Social Learning

Authors: Amir Ban, Moran Koren

Models of social learning feature either binary signals or abstract signal
structures often deprived of micro-foundations. Both models are limited when
analyzing interim results or performing empirical analysis. We present a method
of generating signal structures which are richer than the binary model, yet are
tractable enough to perform simulations and empirical analysis. We demonstrate
the method's usability by revisiting two classical papers: (1) we discuss the
economic significance of unbounded signals in Smith and Sorensen (2000); (2) we
use experimental data from Anderson and Holt (1997) to perform econometric
analysis. Additionally, we provide a necessary and sufficient condition for the
occurrence of action cascades.

arXiv link: http://arxiv.org/abs/2002.11017v1

Econometrics arXiv updated paper (originally submitted: 2020-02-24)

Estimating Economic Models with Testable Assumptions: Theory and Applications

Authors: Moyu Liao

This paper studies the identification, estimation, and hypothesis testing
problem in complete and incomplete economic models with testable assumptions.
Testable assumptions ($A$) give strong and interpretable empirical content to
the models but they also carry the possibility that some distribution of
observed outcomes may reject these assumptions. A natural way to avoid this is
to find a set of relaxed assumptions ($\tilde{A}$) that cannot be rejected by
any distribution of observed outcomes and under which the identified set of the
parameter of interest is unchanged when the original assumption is not rejected. The main
contribution of this paper is to characterize the properties of such a relaxed
assumption $\tilde{A}$ using a generalized definition of refutability and
confirmability. I also propose a general method to construct such $\tilde{A}$.
A general estimation and inference procedure is proposed and can be applied to
most incomplete economic models. I apply my methodology to the instrument
monotonicity assumption in Local Average Treatment Effect (LATE) estimation and
to the sector selection assumption in a binary outcome Roy model of employment
sector choice. In the LATE application, I use my general method to construct a
set of relaxed assumptions $\tilde{A}$ that can never be rejected, and the
identified set of LATE is the same as imposing $A$ when $A$ is not rejected.
LATE is point identified under my extension $\tilde{A}$ in the LATE
application. In the binary outcome Roy model, I use my method of incomplete
models to relax Roy's sector selection assumption and characterize the
identified set of the binary potential outcome as a polyhedron.

arXiv link: http://arxiv.org/abs/2002.10415v3

Econometrics arXiv paper, submitted: 2020-02-24

Bayesian Inference in High-Dimensional Time-varying Parameter Models using Integrated Rotated Gaussian Approximations

Authors: Florian Huber, Gary Koop, Michael Pfarrhofer

Researchers increasingly wish to estimate time-varying parameter (TVP)
regressions which involve a large number of explanatory variables. Including
prior information to mitigate over-parameterization concerns has led to many
using Bayesian methods. However, Bayesian Markov Chain Monte Carlo (MCMC)
methods can be very computationally demanding. In this paper, we develop
computationally efficient Bayesian methods for estimating TVP models using an
integrated rotated Gaussian approximation (IRGA). This exploits the fact that
whereas constant coefficients on regressors are often important, most of the
TVPs are often unimportant. Since Gaussian distributions are invariant to
rotations, we can split the posterior into two parts: one involving the
constant coefficients, the other involving the TVPs. Approximate methods are
used on the latter and, conditional on these, the former are estimated with
precision using MCMC methods. In empirical exercises involving artificial data
and a large macroeconomic data set, we show the accuracy and computational
benefits of IRGA methods.

arXiv link: http://arxiv.org/abs/2002.10274v1

Econometrics arXiv paper, submitted: 2020-02-23

Estimation and Inference about Tail Features with Tail Censored Data

Authors: Yulong Wang, Zhijie Xiao

This paper considers estimation and inference about tail features when the
observations beyond some threshold are censored. We first show that ignoring
such tail censoring could lead to substantial bias and size distortion, even if
the censored probability is tiny. Second, we propose a new maximum likelihood
estimator (MLE) based on the Pareto tail approximation and derive its
asymptotic properties. Third, we provide a small sample modification to the MLE
by resorting to Extreme Value theory. The MLE with this modification delivers
excellent small sample performance, as shown by Monte Carlo simulations. We
illustrate its empirical relevance by estimating (i) the tail index and the
extreme quantiles of the US individual earnings with the Current Population
Survey dataset and (ii) the tail index of the distribution of macroeconomic
disasters and the coefficient of risk aversion using the dataset collected by
Barro and Ursúa (2008). Our new empirical findings are substantially
different from the existing literature.

arXiv link: http://arxiv.org/abs/2002.09982v1
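
The censored Pareto tail likelihood has a simple closed-form maximizer. The
sketch below implements that basic MLE for exceedances over a threshold u with
right censoring at c; the paper's small-sample modification based on extreme
value theory is not included, and the interface is an assumption.

    import numpy as np

    def pareto_tail_index_censored(y, u, c):
        """MLE of the Pareto tail index alpha from exceedances over u when
        observations are right-censored at c, using
        P(Y > y | Y > u) = (u / y)^alpha for y >= u.
        Uncensored exceedances contribute the density; censored ones
        contribute the survival probability (u / c)^alpha."""
        exc = y[y > u]
        uncensored = exc[exc < c]
        n_cens = int(np.sum(exc >= c))
        denom = np.sum(np.log(uncensored / u)) + n_cens * np.log(c / u)
        return len(uncensored) / denom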

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-02-23

Testing for threshold regulation in presence of measurement error with an application to the PPP hypothesis

Authors: Kung-Sik Chan, Simone Giannerini, Greta Goracci, Howell Tong

Regulation is an important feature characterising many dynamical phenomena
and can be tested within the threshold autoregressive setting, with the null
hypothesis being a global non-stationary process. Nonetheless, this setting is
debatable since data are often corrupted by measurement errors. Thus, it is
more appropriate to consider a threshold autoregressive moving-average model as
the general hypothesis. We implement this new setting with the integrated
moving-average model of order one as the null hypothesis. We derive a Lagrange
multiplier test which has an asymptotically similar null distribution and
provide the first rigorous proof of tightness pertaining to testing for
threshold nonlinearity against difference stationarity, which is of independent
interest. Simulation studies show that the proposed approach enjoys less bias
and higher power in detecting threshold regulation than existing tests when
there are measurement errors. We apply the new approach to the daily real
exchange rates of Eurozone countries. It lends support to the purchasing power
parity hypothesis, via a nonlinear mean-reversion mechanism triggered upon
crossing a threshold located in the extreme upper tail. Furthermore, we analyse
the Eurozone series and propose a threshold autoregressive moving-average
specification, which sheds new light on the purchasing power parity debate.

arXiv link: http://arxiv.org/abs/2002.09968v3

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2020-02-23

Survey Bandits with Regret Guarantees

Authors: Sanath Kumar Krishnamurthy, Susan Athey

We consider a variant of the contextual bandit problem. In standard
contextual bandits, when a user arrives we get the user's complete feature
vector and then assign a treatment (arm) to that user. In a number of
applications (like healthcare), collecting features from users can be costly.
To address this issue, we propose algorithms that avoid needless feature
collection while maintaining strong regret guarantees.

arXiv link: http://arxiv.org/abs/2002.09814v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2020-02-21

Kernel Conditional Moment Test via Maximum Moment Restriction

Authors: Krikamol Muandet, Wittawat Jitkrittum, Jonas Kübler

We propose a new family of specification tests called kernel conditional
moment (KCM) tests. Our tests are built on a novel representation of
conditional moment restrictions in a reproducing kernel Hilbert space (RKHS)
called conditional moment embedding (CMME). After transforming the conditional
moment restrictions into a continuum of unconditional counterparts, the test
statistic is defined as the maximum moment restriction (MMR) within the unit
ball of the RKHS. We show that the MMR not only fully characterizes the
original conditional moment restrictions, leading to consistency in both
hypothesis testing and parameter estimation, but also has an analytic
expression that is easy to compute as well as closed-form asymptotic
distributions. Our empirical studies show that the KCM test has a promising
finite-sample performance compared to existing tests.

arXiv link: http://arxiv.org/abs/2002.09225v3
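
The test statistic itself is short to write down: with moment (residual)
functions e_i evaluated at the estimated parameter, the MMR is a U-statistic
of kernel-weighted cross products. A sketch for scalar moments with a Gaussian
kernel; the bandwidth choice and the bootstrap critical values used in
practice are omitted.

    import numpy as np

    def mmr_statistic(e, X, bandwidth=1.0):
        """U-statistic form of the maximum moment restriction:
        (1 / (n (n - 1))) * sum_{i != j} e_i k(x_i, x_j) e_j,
        where X is the n x d array of conditioning variables."""
        n = len(e)
        sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        K = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
        H = np.outer(e, e) * K
        return float((H.sum() - np.trace(H)) / (n * (n - 1)))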

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2020-02-20

Forecasting the Intra-Day Spread Densities of Electricity Prices

Authors: Ekaterina Abramova, Derek Bunn

Intra-day price spreads are of interest to electricity traders, storage and
electric vehicle operators. This paper formulates dynamic density functions,
based upon skewed-t and similar representations, to model and forecast the
German electricity price spreads between different hours of the day, as
revealed in the day-ahead auctions. The four specifications of the density
functions are dynamic and conditional upon exogenous drivers, thereby
permitting the location, scale and shape parameters of the densities to respond
hourly to such factors as weather and demand forecasts. The best fitting and
forecasting specifications for each spread are selected based on the Pinball
Loss function, following the closed-form analytical solutions of the cumulative
distribution functions.

arXiv link: http://arxiv.org/abs/2002.10566v1
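
The selection criterion mentioned above is the pinball (quantile) loss, which
is a one-liner; the sketch below averages it over realized spreads and a
forecast quantile (names are illustrative).

    import numpy as np

    def pinball_loss(y, q_forecast, tau):
        """Average pinball loss of the tau-quantile forecast q_forecast:
        tau * (y - q) if y >= q, (1 - tau) * (q - y) otherwise."""
        diff = np.asarray(y) - np.asarray(q_forecast)
        return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))

    # e.g. pinball_loss(realized_spread, forecast_q90, tau=0.90)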

Econometrics arXiv updated paper (originally submitted: 2020-02-20)

Combining Shrinkage and Sparsity in Conjugate Vector Autoregressive Models

Authors: Niko Hauzenberger, Florian Huber, Luca Onorante

Conjugate priors allow for fast inference in large dimensional vector
autoregressive (VAR) models but, at the same time, introduce the restriction
that each equation features the same set of explanatory variables. This paper
proposes a straightforward means of post-processing posterior estimates of a
conjugate Bayesian VAR to effectively perform equation-specific covariate
selection. Compared to existing techniques using shrinkage alone, our approach
combines shrinkage and sparsity in both the VAR coefficients and the error
variance-covariance matrices, greatly reducing estimation uncertainty in large
dimensions while maintaining computational tractability. We illustrate our
approach by means of two applications. The first application uses synthetic
data to investigate the properties of the model across different
data-generating processes, while the second analyzes the predictive gains
from sparsification in a forecasting exercise for US data.

arXiv link: http://arxiv.org/abs/2002.08760v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2020-02-20

Debiased Off-Policy Evaluation for Recommendation Systems

Authors: Yusuke Narita, Shota Yasui, Kohei Yata

Efficient methods to evaluate new algorithms are critical for improving
interactive bandit and reinforcement learning systems such as recommendation
systems. A/B tests are reliable, but are time- and money-consuming, and entail
a risk of failure. In this paper, we develop an alternative method, which
predicts the performance of algorithms given historical data that may have been
generated by a different algorithm. Our estimator has the property that its
prediction converges in probability to the true performance of a counterfactual
algorithm at a rate of $\sqrt{N}$, as the sample size $N$ increases. We also
show a correct way to estimate the variance of our prediction, thus allowing
the analyst to quantify the uncertainty in the prediction. These properties
hold even when the analyst does not know which among a large number of
potentially important state variables are actually important. We validate our
method by a simulation experiment about reinforcement learning. We finally
apply it to improve advertisement design by a major advertisement company. We
find that our method produces smaller mean squared errors than state-of-the-art
methods.

arXiv link: http://arxiv.org/abs/2002.08536v3

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2020-02-19

Forecasting Foreign Exchange Rate: A Multivariate Comparative Analysis between Traditional Econometric, Contemporary Machine Learning & Deep Learning Techniques

Authors: Manav Kaushik, A K Giri

In today's global economy, accuracy in predicting macro-economic parameters
such as the foreign exchange rate, or at least estimating the trend
correctly, is of key importance for any future investment. In recent times, the
use of computational intelligence-based techniques for forecasting
macroeconomic variables has been proven highly successful. This paper tries to
come up with a multivariate time series approach to forecast the exchange rate
(USD/INR) while comparing, in parallel, the performance of three multivariate
prediction modelling techniques: Vector Auto Regression (a Traditional
Econometric Technique), Support Vector Machine (a Contemporary Machine Learning
Technique), and Recurrent Neural Networks (a Contemporary Deep Learning
Technique). We have used monthly historical data for several macroeconomic
variables from April 1994 to December 2018 for USA and India to predict USD-INR
Foreign Exchange Rate. The results clearly depict that contemporary techniques
of SVM and RNN (Long Short-Term Memory) outperform the widely used traditional
method of Auto Regression. The RNN model with Long Short-Term Memory (LSTM)
provides the maximum accuracy (97.83%) followed by SVM Model (97.17%) and VAR
Model (96.31%). Finally, we present a brief analysis of the correlation and
interdependencies of the variables used for forecasting.

arXiv link: http://arxiv.org/abs/2002.10247v1

Econometrics arXiv updated paper (originally submitted: 2020-02-19)

Cointegration without Unit Roots

Authors: James A. Duffy, Jerome R. Simons

It has been known since Elliott (1998) that standard methods of inference on
cointegrating relationships break down entirely when autoregressive roots are
near but not exactly equal to unity. We consider this problem within the
framework of a structural VAR, arguing that it is as much a problem of
identification failure as it is of inference. We develop a characterisation of
cointegration based on the impulse response function, which allows long-run
equilibrium relationships to remain identified even in the absence of exact
unit roots. Our approach also provides a framework in which the structural
shocks driving the common persistent components continue to be identified via
long-run restrictions, just as in an SVAR with exact unit roots. We show that
inference on the cointegrating relationships is affected by nuisance
parameters, in a manner familiar from predictive regression; indeed the two
problems are asymptotically equivalent. By adapting the approach of Elliott,
M\"uller and Watson (2015) to our setting, we develop tests that robustly
control size while sacrificing little power (relative to tests that are
efficient in the presence of exact unit roots).

arXiv link: http://arxiv.org/abs/2002.08092v2

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2020-02-19

Seasonal and Trend Forecasting of Tourist Arrivals: An Adaptive Multiscale Ensemble Learning Approach

Authors: Shaolong Sun, Dan Bi, Ju-e Guo, Shouyang Wang

The accurate seasonal and trend forecasting of tourist arrivals is a very
challenging task. Despite its importance, seasonal and trend forecasting of
tourist arrivals has received limited research attention. In this study, a new
adaptive multiscale ensemble (AME)
learning approach incorporating variational mode decomposition (VMD) and least
square support vector regression (LSSVR) is developed for short-, medium-, and
long-term seasonal and trend forecasting of tourist arrivals. In the
formulation of our developed AME learning approach, the original tourist
arrivals series are first decomposed into trend, seasonal and remainder
volatility components. Then, ARIMA is used to forecast the trend component,
SARIMA the seasonal component with a 12-month cycle, and LSSVR the remainder
volatility component. Finally, the
forecasting results of the three components are aggregated to generate an
ensemble forecast of tourist arrivals by the LSSVR-based nonlinear ensemble
approach. Furthermore, a direct strategy is used to implement multi-step-ahead
forecasting. Using two accuracy measures and the Diebold-Mariano test, the
empirical results demonstrate that our proposed AME learning approach can
achieve higher level and directional forecasting accuracy compared with other
benchmarks used in this study, indicating that our proposed approach is a
promising model for forecasting tourist arrivals with high seasonality and
volatility.

arXiv link: http://arxiv.org/abs/2002.08021v2

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2020-02-19

Tourism Demand Forecasting: An Ensemble Deep Learning Approach

Authors: Shaolong Sun, Yanzhao Li, Ju-e Guo, Shouyang Wang

The availability of tourism-related big data increases the potential to
improve the accuracy of tourism demand forecasting, but presents significant
challenges for forecasting, including the curse of dimensionality and high model
complexity. A novel bagging-based multivariate ensemble deep learning approach
integrating stacked autoencoders and kernel-based extreme learning machines
(B-SAKE) is proposed to address these challenges in this study. By using
historical tourist arrival data, economic variable data and search intensity
index (SII) data, we forecast tourist arrivals in Beijing from four countries.
The consistent results of multiple schemes suggest that our proposed B-SAKE
approach outperforms benchmark models in terms of level accuracy, directional
accuracy and even statistical significance. Both bagging and stacked
autoencoder can effectively alleviate the challenges brought by tourism big
data and improve the forecasting performance of the models. The ensemble deep
learning model we propose contributes to tourism forecasting literature and
benefits relevant government officials and tourism practitioners.

arXiv link: http://arxiv.org/abs/2002.07964v3

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2020-02-18

Fair Prediction with Endogenous Behavior

Authors: Christopher Jung, Sampath Kannan, Changhwa Lee, Mallesh M. Pai, Aaron Roth, Rakesh Vohra

There is increasing regulatory interest in whether machine learning
algorithms deployed in consequential domains (e.g. in criminal justice) treat
different demographic groups "fairly." However, there are several proposed
notions of fairness, typically mutually incompatible. Using criminal justice as
an example, we study a model in which society chooses an incarceration rule.
Agents of different demographic groups differ in their outside options (e.g.
opportunity for legal employment) and decide whether to commit crimes. We show
that equalizing type I and type II errors across groups is consistent with the
goal of minimizing the overall crime rate; other popular notions of fairness
are not.

arXiv link: http://arxiv.org/abs/2002.07147v1

Econometrics arXiv paper, submitted: 2020-02-18

Hopf Bifurcation from new-Keynesian Taylor rule to Ramsey Optimal Policy

Authors: Jean-Bernard Chatelain, Kirsten Ralf

This paper compares different implementations of monetary policy in a
new-Keynesian setting. We can show that a shift from Ramsey optimal policy
under short-term commitment (based on a negative feedback mechanism) to a
Taylor rule (based on a positive feedback mechanism) corresponds to a Hopf
bifurcation with opposite policy advice and a change of the dynamic properties.
This bifurcation occurs because of the ad hoc assumption that interest rate is
a forward-looking variable when policy targets (inflation and output gap) are
forward-looking variables in the new-Keynesian theory.

arXiv link: http://arxiv.org/abs/2002.07479v1

Econometrics arXiv updated paper (originally submitted: 2020-02-17)

Double/Debiased Machine Learning for Dynamic Treatment Effects via g-Estimation

Authors: Greg Lewis, Vasilis Syrgkanis

We consider the estimation of treatment effects in settings when multiple
treatments are assigned over time and treatments can have a causal effect on
future outcomes or the state of the treated unit. We propose an extension of
the double/debiased machine learning framework to estimate the dynamic effects
of treatments, which can be viewed as a Neyman orthogonal (locally robust)
cross-fitted version of $g$-estimation in the dynamic treatment regime. Our
method applies to a general class of non-linear dynamic treatment models known
as Structural Nested Mean Models and allows the use of machine learning methods
to control for potentially high dimensional state variables, subject to a mean
square error guarantee, while still allowing parametric estimation and
construction of confidence intervals for the structural parameters of interest.
These structural parameters can be used for off-policy evaluation of any target
dynamic policy at parametric rates, subject to semi-parametric restrictions on
the data generating process. Our work is based on a recursive peeling process,
typical in $g$-estimation, and formulates a strongly convex objective at each
stage, which allows us to extend the $g$-estimation framework in multiple
directions: i) to provide finite sample guarantees, ii) to estimate non-linear
effect heterogeneity with respect to fixed unit characteristics, within
arbitrary function spaces, enabling a dynamic analogue of the RLearner
algorithm for heterogeneous effects, iii) to allow for high-dimensional sparse
parameterizations of the target structural functions, enabling automated model
selection via a recursive lasso algorithm. We also provide guarantees for data
stemming from a single treated unit over a long horizon and under stationarity
conditions.
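
The cross-fitting idea behind such Neyman-orthogonal estimators can be illustrated
in a few lines. The sketch below is not the authors' recursive g-estimation
procedure; it is a single-period partialling-out example on synthetic data, in
which random forests partial the state variables out of treatment and outcome
before a residual-on-residual regression recovers the effect.

    # Minimal cross-fitted partialling-out sketch for a single period
    # (illustrative only; the paper's recursive g-estimation handles many periods).
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import KFold

    rng = np.random.default_rng(0)
    n, p = 2000, 20
    X = rng.normal(size=(n, p))                  # high-dimensional state variables
    T = X[:, 0] + rng.normal(size=n)             # treatment depends on the state
    Y = 0.5 * T + X[:, 0] + rng.normal(size=n)   # true effect = 0.5

    res_T, res_Y = np.empty(n), np.empty(n)
    for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
        rf_t = RandomForestRegressor(random_state=0).fit(X[train], T[train])
        rf_y = RandomForestRegressor(random_state=0).fit(X[train], Y[train])
        res_T[test] = T[test] - rf_t.predict(X[test])
        res_Y[test] = Y[test] - rf_y.predict(X[test])

    theta_hat = np.sum(res_T * res_Y) / np.sum(res_T ** 2)  # residual-on-residual slope
    print(round(theta_hat, 3))                               # close to 0.5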

arXiv link: http://arxiv.org/abs/2002.07285v5

Econometrics arXiv cross-link from cs.SI (cs.SI), submitted: 2020-02-14

Fairness through Experimentation: Inequality in A/B testing as an approach to responsible design

Authors: Guillaume Saint-Jacques, Amir Sepehri, Nicole Li, Igor Perisic

As technology continues to advance, there is increasing concern about
individuals being left behind. Many businesses are striving to adopt
responsible design practices and avoid any unintended consequences of their
products and services, ranging from privacy vulnerabilities to algorithmic
bias. We propose a novel approach to fairness and inclusiveness based on
experimentation. We use experimentation because we want to assess not only the
intrinsic properties of products and algorithms but also their impact on
people. We do this by introducing an inequality approach to A/B testing,
leveraging the Atkinson index from the economics literature. We show how to
perform causal inference over this inequality measure. We also introduce the
concept of site-wide inequality impact, which captures the inclusiveness impact
of targeting specific subpopulations for experiments, and show how to conduct
statistical inference on this impact. We provide real examples from LinkedIn,
as well as an open-source, highly scalable implementation of the computation of
the Atkinson index and its variance in Spark/Scala. We also provide over a
year's worth of learnings -- gathered by deploying our method at scale and
analyzing thousands of experiments -- on which areas and which kinds of product
innovations seem to inherently foster fairness through inclusiveness.
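
The Atkinson index used as the inequality measure has a standard closed form. A
minimal sketch, on synthetic outcome data and with a hypothetical
inequality-aversion parameter eps, is:

    # Atkinson inequality index for positive outcomes y and aversion parameter eps.
    import numpy as np

    def atkinson(y, eps=1.0):
        y = np.asarray(y, dtype=float)
        mu = y.mean()
        if eps == 1.0:
            return 1.0 - np.exp(np.mean(np.log(y))) / mu        # geometric-mean form
        return 1.0 - (np.mean(y ** (1.0 - eps))) ** (1.0 / (1.0 - eps)) / mu

    # Comparing inequality between A/B arms (synthetic engagement outcomes):
    rng = np.random.default_rng(1)
    control = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)
    treated = rng.lognormal(mean=0.1, sigma=0.9, size=10_000)
    print(atkinson(control), atkinson(treated))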

arXiv link: http://arxiv.org/abs/2002.05819v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-02-13

Experimental Design in Two-Sided Platforms: An Analysis of Bias

Authors: Ramesh Johari, Hannah Li, Inessa Liskovich, Gabriel Weintraub

We develop an analytical framework to study experimental design in two-sided
marketplaces. Many of these experiments exhibit interference, where an
intervention applied to one market participant influences the behavior of
another participant. This interference leads to biased estimates of the
treatment effect of the intervention. We develop a stochastic market model and
associated mean field limit to capture dynamics in such experiments, and use
our model to investigate how the performance of different designs and
estimators is affected by marketplace interference effects. Platforms typically
use two common experimental designs: demand-side ("customer") randomization
(CR) and supply-side ("listing") randomization (LR), along with their
associated estimators. We show that good experimental design depends on market
balance: in highly demand-constrained markets, CR is unbiased, while LR is
biased; conversely, in highly supply-constrained markets, LR is unbiased, while
CR is biased. We also introduce and study a novel experimental design based on
two-sided randomization (TSR) where both customers and listings are randomized
to treatment and control. We show that appropriate choices of TSR designs can
be unbiased in both extremes of market balance, while yielding relatively low
bias in intermediate regimes of market balance.
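
One simple way to picture a TSR design is sketched below: a stylized variant in
which an interaction is exposed to the intervention only when both its customer
and its listing are treated. The treatment shares a_c and a_l are hypothetical.

    # One simple two-sided randomization (TSR) assignment rule: an interaction is
    # exposed to the new feature only if BOTH its customer and its listing are treated.
    import numpy as np

    rng = np.random.default_rng(0)
    n_customers, n_listings = 1000, 800
    a_c, a_l = 0.5, 0.5                                  # treatment shares on each side
    cust_treated = rng.random(n_customers) < a_c
    list_treated = rng.random(n_listings) < a_l

    def exposed(customer_id, listing_id):
        return cust_treated[customer_id] and list_treated[listing_id]

    print(exposed(3, 10))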

arXiv link: http://arxiv.org/abs/2002.05670v5

Econometrics arXiv paper, submitted: 2020-02-13

Long-term prediction intervals of economic time series

Authors: Marek Chudy, Sayar Karmakar, Wei Biao Wu

We construct long-term prediction intervals for time-aggregated future values
of univariate economic time series. We propose computational adjustments of the
existing methods to improve coverage probability under a small sample
constraint. A pseudo-out-of-sample evaluation shows that our methods perform at
least as well as selected alternative methods based on model-implied Bayesian
approaches and bootstrapping. Our most successful method yields prediction
intervals for eight macroeconomic indicators over a horizon spanning several
decades.

arXiv link: http://arxiv.org/abs/2002.05384v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2020-02-13

Efficient Adaptive Experimental Design for Average Treatment Effect Estimation

Authors: Masahiro Kato, Takuya Ishihara, Junya Honda, Yusuke Narita

We study how to efficiently estimate average treatment effects (ATEs) using
adaptive experiments. In adaptive experiments, experimenters sequentially
assign treatments to experimental units while updating treatment assignment
probabilities based on past data. We start by defining the efficient
treatment-assignment probability, which minimizes the semiparametric efficiency
bound for ATE estimation. Our proposed experimental design estimates and uses
the efficient treatment-assignment probability to assign treatments. At the end
of the proposed design, the experimenter estimates the ATE using a newly
proposed Adaptive Augmented Inverse Probability Weighting (A2IPW) estimator. We
show that the asymptotic variance of the A2IPW estimator using data from the
proposed design achieves the minimized semiparametric efficiency bound. We also
analyze the estimator's finite-sample properties and develop nonparametric and
nonasymptotic confidence intervals that are valid at any round of the proposed
design. These anytime valid confidence intervals allow us to conduct
rate-optimal sequential hypothesis testing, allowing for early stopping and
reducing necessary sample size.
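
A stylized version of the doubly robust score that such adaptive designs average
over rounds is sketched below; pi denotes the assignment probability actually
used in each round, and the outcome predictions mu1 and mu0 stand in for models
fitted on earlier rounds only. The data are synthetic placeholders, not the
estimator exactly as defined in the paper.

    # Stylized adaptive AIPW-type point estimate: average of doubly robust scores,
    # where pi and the outcome predictions mu1, mu0 are built from past rounds only.
    import numpy as np

    def aipw_ate(y, d, pi, mu1, mu0):
        # y: outcomes, d: binary treatments, pi: assignment prob. used in each round,
        # mu1/mu0: outcome predictions under treatment/control (from earlier data).
        score = d * (y - mu1) / pi - (1 - d) * (y - mu0) / (1 - pi) + (mu1 - mu0)
        return score.mean()

    # hypothetical data from T rounds of an adaptive experiment
    T = 5000
    rng = np.random.default_rng(2)
    pi = np.clip(rng.beta(2, 2, size=T), 0.1, 0.9)
    d = rng.binomial(1, pi)
    mu1, mu0 = np.ones(T), np.zeros(T)
    y = d * (1 + rng.normal(size=T)) + (1 - d) * rng.normal(size=T)
    print(aipw_ate(y, d, pi, mu1, mu0))   # close to the true ATE of 1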

arXiv link: http://arxiv.org/abs/2002.05308v7

Econometrics arXiv updated paper (originally submitted: 2020-02-12)

Bounds on direct and indirect effects under treatment/mediator endogeneity and outcome attrition

Authors: Martin Huber, Lukáš Lafférs

Causal mediation analysis aims at disentangling a treatment effect into an
indirect mechanism operating through an intermediate outcome or mediator, as
well as the direct effect of the treatment on the outcome of interest. However,
the evaluation of direct and indirect effects is frequently complicated by
non-ignorable selection into the treatment and/or mediator, even after
controlling for observables, as well as sample selection/outcome attrition. We
propose a method for bounding direct and indirect effects in the presence of
such complications that is based on a sequence of linear
programming problems. Considering inverse probability weighting by propensity
scores, we compute the weights that would yield identification in the absence
of complications and perturb them by an entropy parameter reflecting a specific
amount of propensity score misspecification to set-identify the effects of
interest. We apply our method to data from the National Longitudinal Survey of
Youth 1979 to derive bounds on the explained and unexplained components of a
gender wage gap decomposition that is likely prone to non-ignorable mediator
selection and outcome attrition.

arXiv link: http://arxiv.org/abs/2002.05253v3

Econometrics arXiv cross-link from cs.CY (cs.CY), submitted: 2020-02-12

A Hierarchy of Limitations in Machine Learning

Authors: Momin M. Malik

"All models are wrong, but some are useful", wrote George E. P. Box (1979).
Machine learning has focused on the usefulness of probability models for
prediction in social systems, but is only now coming to grips with the ways in
which these models are wrong---and the consequences of those shortcomings. This
paper attempts a comprehensive, structured overview of the specific conceptual,
procedural, and statistical limitations of models in machine learning when
applied to society. Machine learning modelers themselves can use the described
hierarchy to identify possible failure points and think through how to address
them, and consumers of machine learning models can know what to question when
confronted with the decision about if, where, and how to apply machine
learning. The limitations go from commitments inherent in quantification
itself, through to showing how unmodeled dependencies can lead to
cross-validation being overly optimistic as a way of assessing model
performance.

arXiv link: http://arxiv.org/abs/2002.05193v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2020-02-12

Efficient Policy Learning from Surrogate-Loss Classification Reductions

Authors: Andrew Bennett, Nathan Kallus

Recent work on policy learning from observational data has highlighted the
importance of efficient policy evaluation and has proposed reductions to
weighted (cost-sensitive) classification. But, efficient policy evaluation need
not yield efficient estimation of policy parameters. We consider the estimation
problem given by a weighted surrogate-loss classification reduction of policy
learning with any score function, either direct, inverse-propensity weighted,
or doubly robust. We show that, under a correct specification assumption, the
weighted classification formulation need not be efficient for policy
parameters. We draw a contrast to actual (possibly weighted) binary
classification, where correct specification implies a parametric model, while
for policy learning it only implies a semiparametric model. In light of this,
we instead propose an estimation approach based on generalized method of
moments, which is efficient for the policy parameters. We propose a particular
method based on recent developments on solving moment problems using neural
networks and demonstrate the efficiency and regret benefits of this method
empirically.

arXiv link: http://arxiv.org/abs/2002.05153v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-02-11

Generalized Poisson Difference Autoregressive Processes

Authors: Giulia Carallo, Roberto Casarin, Christian P. Robert

This paper introduces a new stochastic process with values in the set Z of
integers with sign. The increments of the process are Poisson differences and
the dynamics have an autoregressive structure. We study the properties of the
process and exploit the thinning representation to derive stationarity
conditions and the stationary distribution of the process. We provide a
Bayesian inference method and an efficient posterior approximation procedure
based on Monte Carlo. Numerical illustrations on both simulated and real data
show the effectiveness of the proposed inference.
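
As a rough illustration only (the paper specifies the exact dynamics), an
integer-valued series whose conditional distribution is a Poisson difference
(Skellam) with intensities that react to the previous value can be simulated as
follows; the parameterisation is ours, not the authors'.

    # Rough simulation of an integer-valued process with Poisson-difference (Skellam)
    # conditional distributions and intensities that react to the last observation.
    import numpy as np

    rng = np.random.default_rng(3)
    T, alpha, omega = 500, 0.5, 1.0
    x = np.zeros(T, dtype=int)
    for t in range(1, T):
        lam_plus = omega + alpha * max(x[t - 1], 0)      # intensity of positive part
        lam_minus = omega + alpha * max(-x[t - 1], 0)    # intensity of negative part
        x[t] = rng.poisson(lam_plus) - rng.poisson(lam_minus)   # Skellam draw
    print(x[:20])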

arXiv link: http://arxiv.org/abs/2002.04470v1

Econometrics arXiv paper, submitted: 2020-02-11

The Dimension of the Set of Causal Solutions of Linear Multivariate Rational Expectations Models

Authors: Bernd Funovits

This paper analyses the number of free parameters and solutions of the
structural difference equation obtained from a linear multivariate rational
expectations model. First, it is shown that the number of free parameters
depends on the structure of the zeros at zero of a certain matrix polynomial of
the structural difference equation and the number of inputs of the rational
expectations model. Second, the implications of requiring that some components
of the endogenous variables be predetermined are analysed. Third, a condition
for existence and uniqueness of a causal stationary solution is given.

arXiv link: http://arxiv.org/abs/2002.04369v1

Econometrics arXiv updated paper (originally submitted: 2020-02-11)

Identifiability and Estimation of Possibly Non-Invertible SVARMA Models: A New Parametrisation

Authors: Bernd Funovits

This article deals with parameterisation, identifiability, and maximum
likelihood (ML) estimation of possibly non-invertible structural vector
autoregressive moving average (SVARMA) models driven by independent and
non-Gaussian shocks. In contrast to previous literature, the novel
representation of the MA polynomial matrix using the Wiener-Hopf factorisation
(WHF) focuses on the multivariate nature of the model, generates insights into
its structure, and uses this structure for devising optimisation algorithms. In
particular, it allows one to parameterise the location of determinantal zeros
inside and outside the unit circle, and it allows for MA zeros at zero, which
can be interpreted as informational delays. This is highly relevant for
data-driven evaluation of Dynamic Stochastic General Equilibrium (DSGE) models.
Typically imposed identifying restrictions on the shock transmission matrix as
well as on the determinantal root location are made testable. Furthermore, we
provide low level conditions for asymptotic normality of the ML estimator and
analytic expressions for the score and the information matrix. As an application,
we estimate the Blanchard and Quah model and show that our method provides
further insights regarding non-invertibility using a standard macroeconometric
model. These and further analyses are implemented in a well documented
R-package.

arXiv link: http://arxiv.org/abs/2002.04346v2

Econometrics arXiv paper, submitted: 2020-02-10

Sequential Monitoring of Changes in Housing Prices

Authors: Lajos Horváth, Zhenya Liu, Shanglin Lu

We propose a sequential monitoring scheme to find structural breaks in real
estate markets. The changes in the real estate prices are modeled by a
combination of linear and autoregressive terms. The monitoring scheme is based
on a detector and a suitably chosen boundary function. If the detector crosses
the boundary function, a structural break is detected. We provide the
asymptotics for the procedure under the stability null hypothesis and the
stopping time under the change point alternative. Monte Carlo simulation is
used to show the size and the power of our method under several conditions. We
study the real estate markets in Boston, Los Angeles and at the national U.S.
level. We find structural breaks in the markets, and we segment the data into
stationary segments. It is observed that the autoregressive parameter is
increasing but stays below 1.
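
The generic detector-versus-boundary logic can be sketched as follows: fit the
linear-plus-autoregressive model on a stable training window, then track
cumulative out-of-sample residuals against a boundary function. The boundary
below is a generic illustrative choice, not necessarily the one analysed in the
paper.

    # Generic CUSUM-style sequential monitoring sketch: fit on a training window,
    # then flag a break once cumulative out-of-sample residuals cross a boundary.
    import numpy as np

    def monitor(y, m, crit=2.0):
        # y: price changes; m: length of the stable training sample.
        t = np.arange(m)
        X = np.column_stack([np.ones(m - 1), t[1:], y[:m - 1]])  # trend + AR(1) term
        beta, *_ = np.linalg.lstsq(X, y[1:m], rcond=None)
        sigma = np.std(y[1:m] - X @ beta)
        cusum = 0.0
        for k in range(m, len(y)):
            x_k = np.array([1.0, k, y[k - 1]])
            cusum += y[k] - x_k @ beta
            boundary = crit * sigma * np.sqrt(m) * (1 + (k - m + 1) / m)  # generic
            if abs(cusum) > boundary:
                return k                                                  # detection time
        return None

    rng = np.random.default_rng(12)
    y = np.concatenate([rng.normal(0, 1, 200), 2.0 + rng.normal(0, 1, 100)])  # break at 200
    print(monitor(y, m=200))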

arXiv link: http://arxiv.org/abs/2002.04101v1

Econometrics arXiv updated paper (originally submitted: 2020-02-10)

The Effect of Weather Conditions on Fertilizer Applications: A Spatial Dynamic Panel Data Analysis

Authors: Anna Gloria Billè, Marco Rogna

Given the extreme dependence of agriculture on weather conditions, this paper
analyses the effect of climatic variations on this economic sector, by
considering both a huge dataset and a flexible spatio-temporal model
specification. In particular, we study the response of N-fertilizer application
to abnormal weather conditions, while accounting for other relevant control
variables. The dataset consists of gridded data spanning over 21 years
(1993-2013), while the methodological strategy makes use of a spatial dynamic
panel data (SDPD) model that accounts for both space and time fixed effects,
besides dealing with both space and time dependences. Time-invariant short and
long term effects, as well as time-varying marginal effects are also properly
defined, revealing interesting results on the impact of both GDP and weather
conditions on fertilizer utilizations. The analysis considers four
macro-regions -- Europe, South America, South-East Asia and Africa -- to allow
for comparisons among different socio-economic societies. In addition to
finding both spatial (in the form of knowledge spillover effects) and temporal
dependences as well as a good support for the existence of an environmental
Kuznets curve for fertilizer application, the paper shows peculiar responses of
N-fertilization to deviations of moisture from normal weather conditions for
each selected region, calling for ad hoc policy interventions.

arXiv link: http://arxiv.org/abs/2002.03922v2

Econometrics arXiv paper, submitted: 2020-02-10

Markov Switching

Authors: Yong Song, Tomasz Woźniak

Markov switching models are a popular family of models that introduces
time-variation in the parameters in the form of their state- or regime-specific
values. Importantly, this time-variation is governed by a discrete-valued
latent stochastic process with limited memory. More specifically, the current
value of the state indicator is determined only by the value of the state
indicator from the previous period, thus the Markov property, and the
transition matrix. The latter characterizes the properties of the Markov
process by determining with what probability each of the states can be visited
next period, given the state in the current period. This setup gives Markov
switching models their two main advantages: the probability of each state's
occurrence in every sample period can be estimated with filtering and smoothing
methods, and the state-specific parameters can be estimated as well. These two
features open the possibility for improved
interpretations of the parameters associated with specific regimes combined
with the corresponding regime probabilities, as well as for improved
forecasting performance based on persistent regimes and parameters
characterizing them.
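
As a concrete example of the filtering step, a minimal Hamilton filter for a
two-state Gaussian mean-switching model (with hypothetical transition matrix and
state means) is:

    # Minimal Hamilton filter: filtered state probabilities for a two-state
    # Markov switching model with state-specific means and a common variance.
    import numpy as np
    from scipy.stats import norm

    def hamilton_filter(y, P, mu, sigma):
        # P[i, j] = Pr(s_t = j | s_{t-1} = i); mu: state means; sigma: common std.
        n_states = len(mu)
        # start from the ergodic (stationary) distribution of the chain
        A = np.vstack([np.eye(n_states) - P.T, np.ones(n_states)])
        pi0 = np.linalg.lstsq(A, np.append(np.zeros(n_states), 1.0), rcond=None)[0]
        filt = np.zeros((len(y), n_states))
        prob = pi0
        for t, y_t in enumerate(y):
            pred = prob @ P                              # one-step-ahead state probs
            lik = norm.pdf(y_t, loc=mu, scale=sigma)     # state-specific densities
            post = pred * lik
            filt[t] = post / post.sum()                  # filtered probabilities
            prob = filt[t]
        return filt

    P = np.array([[0.95, 0.05], [0.10, 0.90]])
    rng = np.random.default_rng(4)
    y = np.concatenate([rng.normal(0, 1, 100), rng.normal(3, 1, 100)])
    print(hamilton_filter(y, P, mu=np.array([0.0, 3.0]), sigma=1.0)[-1])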

arXiv link: http://arxiv.org/abs/2002.03598v1

Econometrics arXiv cross-link from math.PR (math.PR), submitted: 2020-02-08

Asymptotically Optimal Control of a Centralized Dynamic Matching Market with General Utilities

Authors: Jose H. Blanchet, Martin I. Reiman, Viragh Shah, Lawrence M. Wein, Linjia Wu

We consider a matching market where buyers and sellers arrive according to
independent Poisson processes at the same rate and independently abandon the
market if not matched after an exponential amount of time with the same mean.
In this centralized market, the utility for the system manager from matching
any buyer and any seller is a general random variable. We consider a sequence
of systems indexed by $n$ where the arrivals in the $n^{th}$ system
are sped up by a factor of $n$. We analyze two families of one-parameter
policies: the population threshold policy immediately matches an arriving agent
to its best available mate only if the number of mates in the system is above a
threshold, and the utility threshold policy matches an arriving agent to its
best available mate only if the corresponding utility is above a threshold.
Using a fluid analysis of the two-dimensional Markov process of buyers and
sellers, we show that when the matching utility distribution is light-tailed,
the population threshold policy with threshold $n{\ln n}$ is
asymptotically optimal among all policies that make matches only at agent
arrival epochs. In the heavy-tailed case, we characterize the optimal threshold
level for both policies. We also study the utility threshold policy in an
unbalanced matching market with heavy-tailed matching utilities and find that
the buyers and sellers have the same asymptotically optimal utility threshold.
We derive optimal thresholds when the matching utility distribution is
exponential, uniform, Pareto, and correlated Pareto. We find that as the right
tail of the matching utility distribution gets heavier, the threshold level of
each policy (and hence market thickness) increases, as does the magnitude by
which the utility threshold policy outperforms the population threshold policy.

arXiv link: http://arxiv.org/abs/2002.03205v2

Econometrics arXiv cross-link from physics.soc-ph (physics.soc-ph), submitted: 2020-02-06

On Ridership and Frequency

Authors: Simon Berrebi, Sanskruti Joshi, Kari E Watkins

Even before the start of the COVID-19 pandemic, bus ridership in the United
States had attained its lowest level since 1973. If transit agencies hope to
reverse this trend, they must understand how their service allocation policies
affect ridership. This paper is among the first to model ridership trends on a
hyper-local level over time. A Poisson fixed-effects model is developed to
evaluate the ridership elasticity to frequency on weekdays using passenger
count data from Portland, Miami, Minneapolis/St-Paul, and Atlanta between 2012
and 2018. In every agency, ridership is found to be elastic to frequency when
observing the variation between individual route-segments at one point in time.
In other words, the most frequent routes are already the most productive in
terms of passengers per vehicle-trip. When observing the variation within each
route-segment over time, however, ridership is inelastic; each additional
vehicle-trip is expected to generate less ridership than the average bus
already on the route. In three of the four agencies, the elasticity is a
decreasing function of prior frequency, meaning that low-frequency routes are
the most sensitive to changes in frequency. This paper can help transit
agencies anticipate the marginal effect of shifting service throughout the
network. As the quality and availability of passenger count data improve, this
paper can serve as the methodological basis to explore the dynamics of bus
ridership.
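
A bare-bones version of such a specification, run on synthetic data with
statsmodels, puts route-segment fixed effects in a Poisson regression and reads
the elasticity off the log-frequency coefficient; all variable names and values
below are made up.

    # Poisson regression of boardings on log frequency with route-segment fixed
    # effects; the coefficient on log(frequency) is the ridership elasticity.
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(5)
    n_segments, n_periods = 50, 24
    df = pd.DataFrame({
        "segment": np.repeat(np.arange(n_segments), n_periods),
        "freq": rng.uniform(2, 12, n_segments * n_periods),
    })
    segment_effect = rng.normal(0, 0.5, n_segments)[df["segment"]]
    df["boardings"] = rng.poisson(np.exp(1.0 + 0.7 * np.log(df["freq"]) + segment_effect))

    model = smf.glm("boardings ~ np.log(freq) + C(segment)", data=df,
                    family=sm.families.Poisson()).fit()
    print(model.params["np.log(freq)"])   # estimated elasticity, close to 0.7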

arXiv link: http://arxiv.org/abs/2002.02493v3

Econometrics arXiv updated paper (originally submitted: 2020-02-06)

Dependence-Robust Inference Using Resampled Statistics

Authors: Michael P. Leung

We develop inference procedures robust to general forms of weak dependence.
The procedures utilize test statistics constructed by resampling in a manner
that does not depend on the unknown correlation structure of the data. We prove
that the statistics are asymptotically normal under the weak requirement that
the target parameter can be consistently estimated at the parametric rate. This
holds for regular estimators under many well-known forms of weak dependence and
justifies the claim of dependence-robustness. We consider applications to
settings with unknown or complicated forms of dependence, with various forms of
network dependence as leading examples. We develop tests for both moment
equalities and inequalities.

arXiv link: http://arxiv.org/abs/2002.02097v4

Econometrics arXiv cross-link from q-fin.PM (q-fin.PM), submitted: 2020-02-05

Sharpe Ratio Analysis in High Dimensions: Residual-Based Nodewise Regression in Factor Models

Authors: Mehmet Caner, Marcelo Medeiros, Gabriel Vasconcelos

We provide a new theory for nodewise regression when the residuals from a
fitted factor model are used. We apply our results to the analysis of the
consistency of Sharpe ratio estimators when there are many assets in a
portfolio. We allow for an increasing number of assets as well as time
observations of the portfolio. Since the nodewise regression is not feasible
due to the unknown nature of idiosyncratic errors, we provide a
feasible-residual-based nodewise regression to estimate the precision matrix of
errors, which is consistent even when the number of assets, p, exceeds the time
span of the portfolio, n. In another new development, we also show that the
precision matrix of returns can be estimated consistently, even with an
increasing number of factors and p>n. We show that: (1) with p>n, the Sharpe
ratio estimators are consistent in global minimum-variance and mean-variance
portfolios; (2) with p>n, the maximum Sharpe ratio estimator is consistent when
the portfolio weights sum to one; and (3) with p<<n, the maximum-out-of-sample
Sharpe ratio estimator is consistent.
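
The core residual-based nodewise step can be sketched as below: each residual
column is regressed on the others with the lasso, and the fitted coefficients
are assembled into a precision-matrix estimate. The penalty level and the
residual matrix are placeholders; the paper's tuning and theory are not
reproduced here.

    # Nodewise lasso on factor-model residuals U (n x p): regress each residual
    # column on the others to estimate the precision matrix of the errors.
    import numpy as np
    from sklearn.linear_model import Lasso

    def nodewise_precision(U, lam=0.05):
        n, p = U.shape
        Theta = np.zeros((p, p))
        for j in range(p):
            others = np.delete(np.arange(p), j)
            gamma = Lasso(alpha=lam, fit_intercept=False).fit(U[:, others], U[:, j]).coef_
            resid = U[:, j] - U[:, others] @ gamma
            tau2 = resid @ U[:, j] / n        # tau_j^2 = u_j'(u_j - U_{-j} gamma_j)/n
            row = np.zeros(p)
            row[j] = 1.0
            row[others] = -gamma
            Theta[j] = row / tau2
        return Theta

    rng = np.random.default_rng(6)
    U = rng.normal(size=(300, 40))            # stand-in for estimated residuals
    Theta_hat = nodewise_precision(U)
    print(Theta_hat.shape)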

arXiv link: http://arxiv.org/abs/2002.01800v5

Econometrics arXiv updated paper (originally submitted: 2020-02-03)

A Neural-embedded Choice Model: TasteNet-MNL Modeling Taste Heterogeneity with Flexibility and Interpretability

Authors: Yafei Han, Francisco Camara Pereira, Moshe Ben-Akiva, Christopher Zegras

Discrete choice models (DCMs) require a priori knowledge of the utility
functions, especially how tastes vary across individuals. Utility
misspecification may lead to biased estimates, inaccurate interpretations and
limited predictability. In this paper, we utilize a neural network to learn
taste representation. Our formulation consists of two modules: a neural network
(TasteNet) that learns taste parameters (e.g., time coefficient) as flexible
functions of individual characteristics; and a multinomial logit (MNL) model
with utility functions defined with expert knowledge. Taste parameters learned
by the neural network are fed into the choice model and link the two modules.
Our approach extends the L-MNL model (Sifringer et al., 2020) by allowing the
neural network to learn the interactions between individual characteristics and
alternative attributes. Moreover, we formalize and strengthen the
interpretability condition - requiring realistic estimates of behavior
indicators (e.g., value-of-time, elasticity) at the disaggregated level, which
is crucial for a model to be suitable for scenario analysis and policy
decisions. Through a unique network architecture and parameter transformation,
we incorporate prior knowledge and guide the neural network to output realistic
behavior indicators at the disaggregated level. We show that TasteNet-MNL
reaches the ground-truth model's predictability and recovers the nonlinear
taste functions on synthetic data. Its estimated value-of-time and choice
elasticities at the individual level are close to the ground truth. On a
publicly available Swissmetro dataset, TasteNet-MNL outperforms benchmark MNL
and Mixed Logit models in predictive performance. It learns a broader spectrum of
taste variations within the population and suggests a higher average
value-of-time.
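
A toy forward pass conveys the architecture: a small network maps individual
characteristics to an individual-specific (negative) time coefficient, which
then enters ordinary MNL utilities. The weights, sign constraint and dimensions
below are illustrative, not the published specification.

    # Toy forward pass of a TasteNet-style model: a tiny network maps individual
    # characteristics to a (negative) time coefficient, which enters MNL utilities.
    import numpy as np

    def taste_net(z, W1, b1, W2, b2):
        h = np.tanh(z @ W1 + b1)                # hidden layer
        return -np.exp(h @ W2 + b2)             # enforce a negative time coefficient

    def mnl_probs(time, cost, beta_time, beta_cost=-1.0):
        v = beta_time[:, None] * time + beta_cost * cost   # utilities per alternative
        v -= v.max(axis=1, keepdims=True)
        expv = np.exp(v)
        return expv / expv.sum(axis=1, keepdims=True)

    rng = np.random.default_rng(7)
    z = rng.normal(size=(5, 3))                 # individual characteristics
    W1, b1 = rng.normal(size=(3, 8)) * 0.1, np.zeros(8)
    W2, b2 = rng.normal(size=8) * 0.1, 0.0
    beta_time = taste_net(z, W1, b1, W2, b2)    # individual-specific taste parameters
    time = rng.uniform(10, 60, size=(5, 3))     # three alternatives
    cost = rng.uniform(1, 10, size=(5, 3))
    print(mnl_probs(time, cost, beta_time))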

arXiv link: http://arxiv.org/abs/2002.00922v2

Econometrics arXiv paper, submitted: 2020-02-03

Profit-oriented sales forecasting: a comparison of forecasting techniques from a business perspective

Authors: Tine Van Calster, Filip Van den Bossche, Bart Baesens, Wilfried Lemahieu

Choosing the technique that is the best at forecasting your data, is a
problem that arises in any forecasting application. Decades of research have
resulted into an enormous amount of forecasting methods that stem from
statistics, econometrics and machine learning (ML), which leads to a very
difficult and elaborate choice to make in any forecasting exercise. This paper
aims to facilitate this process for high-level tactical sales forecasts by
comparing a large array of techniques for 35 time series that consist of both
industry data from the Coca-Cola Company and publicly available datasets.
However, instead of solely focusing on the accuracy of the resulting forecasts,
this paper introduces a novel and completely automated profit-driven approach
that takes into account the expected profit that a technique can create during
both the model building and evaluation process. The expected profit function
that is used for this purpose is easy to understand and adaptable to any
situation by combining forecasting accuracy with business expertise.
Furthermore, we examine the added value of ML techniques, the inclusion of
external factors and the use of seasonal models in order to ascertain which
type of model works best in tactical sales forecasting. Our findings show that
simple seasonal time series models consistently outperform other methodologies
and that the profit-driven approach can lead to selecting a different
forecasting model.

arXiv link: http://arxiv.org/abs/2002.00949v1

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2020-02-03

NAPLES: Mining the lead-lag Relationship from Non-synchronous and High-frequency Data

Authors: Katsuya Ito, Kei Nakagawa

In time-series analysis, the term "lead-lag effect" is used to describe a
delayed effect on a given time series caused by another time series. Lead-lag
effects are ubiquitous in practice and are especially critical in formulating
investment strategies in high-frequency trading. At present, there are three
major challenges in analyzing lead-lag effects. First, in practical
applications, not all time series are observed synchronously. Second, the
relevant datasets keep growing and the environment changes ever faster, making
it more difficult to complete the computation within a particular time limit.
Third, some lead-lag effects are time-varying and only
last for a short period, and their delay lengths are often affected by external
factors. In this paper, we propose NAPLES (Negative And Positive lead-lag
EStimator), a new statistical measure that resolves all these problems. Through
experiments on artificial and real datasets, we demonstrate that NAPLES has a
strong correlation with the actual lead-lag effects, including those triggered
by significant macroeconomic announcements.

arXiv link: http://arxiv.org/abs/2002.00724v1

Econometrics arXiv cross-link from q-fin.PR (q-fin.PR), submitted: 2020-02-02

Efficient representation of supply and demand curves on day-ahead electricity markets

Authors: Mariia Soloviova, Tiziano Vargiolu

Our paper aims to model supply and demand curves of electricity day-ahead
auction in a parsimonious way. Our main task is to build an appropriate
algorithm to present the information about electricity prices and demands with
far fewer parameters than the original representation. We represent each curve using
mesh-free interpolation techniques based on radial basis function
approximation. We describe results of this method for the day-ahead IPEX spot
price of Italy.
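
A minimal example of the mesh-free idea, fitting a radial-basis-function
interpolant to a synthetic aggregate supply curve with SciPy, is:

    # Fit a radial basis function interpolant to a synthetic aggregate supply curve
    # so that the curve can be stored with a handful of nodes and coefficients.
    import numpy as np
    from scipy.interpolate import Rbf

    quantity = np.linspace(0, 50_000, 40)                        # MWh grid (synthetic)
    price = 10 + 0.002 * quantity + 5 * np.sin(quantity / 7000)  # synthetic supply curve
    supply_rbf = Rbf(quantity, price, function="multiquadric")

    fine_grid = np.linspace(0, 50_000, 500)
    approx = supply_rbf(fine_grid)                               # parsimonious reconstruction
    print(float(np.max(np.abs(supply_rbf(quantity) - price))))   # ~0 at the nodes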

arXiv link: http://arxiv.org/abs/2002.00507v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2020-02-01

Variable-lag Granger Causality and Transfer Entropy for Time Series Analysis

Authors: Chainarong Amornbunchornvej, Elena Zheleva, Tanya Berger-Wolf

Granger causality is a fundamental technique for causal inference in time
series data, commonly used in the social and biological sciences. Typical
operationalizations of Granger causality make a strong assumption that every
time point of the effect time series is influenced by a combination of other
time series with a fixed time delay. The assumption of fixed time delay also
exists in Transfer Entropy, which is considered to be a non-linear version of
Granger causality. However, the assumption of the fixed time delay does not
hold in many applications, such as collective behavior, financial markets, and
many natural phenomena. To address this issue, we develop Variable-lag Granger
causality and Variable-lag Transfer Entropy, generalizations of both Granger
causality and Transfer Entropy that relax the assumption of the fixed time
delay and allow causes to influence effects with arbitrary time delays. In
addition, we propose methods for inferring both variable-lag Granger causality
and Transfer Entropy relations. In our approaches, we utilize an optimal
warping path of Dynamic Time Warping (DTW) to infer variable-lag causal
relations. We demonstrate our approaches on an application for studying
coordinated collective behavior and other real-world causal-inference datasets
and show that our proposed approaches perform better than several existing
methods in both simulated and real-world datasets. Our approaches can be
applied in any domain of time series analysis. The software of this work is
available in the R-CRAN package: VLTimeCausality.

arXiv link: http://arxiv.org/abs/2002.00208v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-02-01

Natural Experiments

Authors: Rocio Titiunik

The term natural experiment is used inconsistently. In one interpretation, it
refers to an experiment where a treatment is randomly assigned by someone other
than the researcher. In another interpretation, it refers to a study in which
there is no controlled random assignment, but treatment is assigned by some
external factor in a way that loosely resembles a randomized experiment---often
described as an "as if random" assignment. In yet another interpretation, it
refers to any non-randomized study that compares a treatment to a control
group, without any specific requirements on how the treatment is assigned. I
introduce an alternative definition that seeks to clarify the integral features
of natural experiments and at the same time distinguish them from randomized
controlled experiments. I define a natural experiment as a research study where
the treatment assignment mechanism (i) is neither designed nor implemented by
the researcher, (ii) is unknown to the researcher, and (iii) is probabilistic
by virtue of depending on an external factor. The main message of this
definition is that the difference between a randomized controlled experiment
and a natural experiment is not a matter of degree, but of essence, and thus
conceptualizing a natural experiment as a research design akin to a randomized
experiment is neither rigorous nor a useful guide to empirical analysis. Using
my alternative definition, I discuss how a natural experiment differs from a
traditional observational study, and offer practical recommendations for
researchers who wish to use natural experiments to study causal effects.

arXiv link: http://arxiv.org/abs/2002.00202v1

Econometrics arXiv cross-link from General Economics (econ.GN), submitted: 2020-01-31

Estimating Welfare Effects in a Nonparametric Choice Model: The Case of School Vouchers

Authors: Vishal Kamat, Samuel Norris

We develop new robust discrete choice tools to learn about the average
willingness to pay for a price subsidy and its effects on demand given
exogenous, discrete variation in prices. Our starting point is a nonparametric,
nonseparable model of choice. We exploit the insight that our welfare
parameters in this model can be expressed as functions of demand for the
different alternatives. However, while the variation in the data reveals the
value of demand at the observed prices, the parameters generally depend on its
values beyond these prices. We show how to sharply characterize what we can
learn when demand is specified to be entirely nonparametric or to be
parameterized in a flexible manner, both of which imply that the parameters are
not necessarily point identified. We use our tools to analyze the welfare
effects of price subsidies provided by school vouchers in the DC Opportunity
Scholarship Program. We find that the provision of the status quo voucher and a
wide range of counterfactual vouchers of different amounts can have positive
and potentially large benefits net of costs. The positive effect can be
explained by the popularity of low-tuition schools in the program; removing
them from the program can result in a negative net benefit. We also find that
various standard logit specifications, in comparison, limit attention to demand
functions with low demand for the voucher, which do not capture the large
magnitudes of benefits credibly consistent with the data.

arXiv link: http://arxiv.org/abs/2002.00103v6

Econometrics arXiv paper, submitted: 2020-01-29

Blocked Clusterwise Regression

Authors: Max Cytrynbaum

A recent literature in econometrics models unobserved cross-sectional
heterogeneity in panel data by assigning each cross-sectional unit a
one-dimensional, discrete latent type. Such models have been shown to allow
estimation and inference by regression clustering methods. This paper is
motivated by the finding that the clustered heterogeneity models studied in
this literature can be badly misspecified, even when the panel has significant
discrete cross-sectional structure. To address this issue, we generalize
previous approaches to discrete unobserved heterogeneity by allowing each unit
to have multiple, imperfectly-correlated latent variables that describe its
response-type to different covariates. We give inference results for a k-means
style estimator of our model and develop information criteria to jointly select
the number of clusters for each latent variable. Monte Carlo simulations confirm
our theoretical results and give intuition about the finite-sample performance
of estimation and model selection. We also contribute to the theory of
clustering with an over-specified number of clusters and derive new convergence
rates for this setting. Our results suggest that over-fitting can be severe in
k-means style estimators when the number of clusters is over-specified.

arXiv link: http://arxiv.org/abs/2001.11130v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2020-01-29

Functional Sequential Treatment Allocation with Covariates

Authors: Anders Bredahl Kock, David Preinerstorfer, Bezirgen Veliyev

We consider a multi-armed bandit problem with covariates. Given a realization
of the covariate vector, instead of targeting the treatment with highest
conditional expectation, the decision maker targets the treatment which
maximizes a general functional of the conditional potential outcome
distribution, e.g., a conditional quantile, trimmed mean, or a socio-economic
functional such as an inequality, welfare or poverty measure. We develop
expected regret lower bounds for this problem, and construct a near minimax
optimal assignment policy.

arXiv link: http://arxiv.org/abs/2001.10996v1

Econometrics arXiv paper, submitted: 2020-01-28

Frequentist Shrinkage under Inequality Constraints

Authors: Edvard Bakhitov

This paper shows how to shrink extremum estimators towards inequality
constraints motivated by economic theory. We propose an Inequality Constrained
Shrinkage Estimator (ICSE) which takes the form of a weighted average between
the unconstrained and inequality constrained estimators with the data dependent
weight. The weight drives both the direction and degree of shrinkage. We use a
local asymptotic framework to derive the asymptotic distribution and risk of
the ICSE. We provide conditions under which the asymptotic risk of the ICSE is
strictly less than that of the unrestricted extremum estimator. The degree of
shrinkage cannot be consistently estimated under the local asymptotic
framework. To address this issue, we propose a feasible plug-in estimator and
investigate its finite sample behavior. We also apply our framework to gasoline
demand estimation under the Slutsky restriction.
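
Schematically (in our notation, not necessarily the paper's), the estimator
combines the unconstrained extremum estimator $\hat{\theta}_U$ and its
inequality-constrained counterpart $\hat{\theta}_C$ as

    $\hat{\theta}_{ICSE} = \hat{w}\,\hat{\theta}_{C} + (1 - \hat{w})\,\hat{\theta}_{U}, \qquad \hat{w} \in [0, 1],$

with the data-dependent weight $\hat{w}$ governing both the direction and the
degree of shrinkage, as described above.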

arXiv link: http://arxiv.org/abs/2001.10586v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2020-01-28

Skills to not fall behind in school

Authors: Felipe Maia Polo

Many recent studies emphasize how important the role of cognitive and
social-emotional skills can be in determining people's quality of life.
Although skills are of great importance in many aspects, in this paper we will
focus our efforts to better understand the relationship between several types
of skills with academic progress delay. Our dataset contains the same students
in 2012 and 2017, and we consider that there was an academic progress delay for
a specific student if he or she progressed less than expected in school grades.
Our methodology primarily includes the use of a Bayesian logistic regression
model and our results suggest that both cognitive and social-emotional skills
may impact the conditional probability of falling behind in school, and that
the magnitudes of their impacts can be comparable.

arXiv link: http://arxiv.org/abs/2001.10519v1

Econometrics arXiv paper, submitted: 2020-01-27

Risk Fluctuation Characteristics of Internet Finance: Combining Industry Characteristics with Ecological Value

Authors: Runjie Xu, Chuanmin Mi, Nan Ye, Tom Marshall, Yadong Xiao, Hefan Shuai

The Internet plays a key role in society and is vital to economic
development. Due to the pressure of competition, most technology companies,
including Internet finance companies, continue to explore new markets and new
business. Funding subsidies and resource inputs have led to significant
business income tendencies in financial statements. This tendency of business
income is often manifested as part of the business loss or long-term
unprofitability. We propose a risk change indicator (RFR) and compare the risk
indicator of fourteen representative companies. This model combines extreme
risk value with slope, and the combination method is simple and effective. The
results of the experiment show the potential of this model. The risk volatility of
technology enterprises including Internet finance enterprises is highly
cyclical, and the risk volatility of emerging Internet fintech companies is
much higher than that of other technology companies.

arXiv link: http://arxiv.org/abs/2001.09798v1

Econometrics arXiv updated paper (originally submitted: 2020-01-27)

Estimating Marginal Treatment Effects under Unobserved Group Heterogeneity

Authors: Tadao Hoshino, Takahide Yanagi

This paper studies treatment effect models in which individuals are
classified into unobserved groups based on heterogeneous treatment rules. Using
a finite mixture approach, we propose a marginal treatment effect (MTE)
framework in which the treatment choice and outcome equations can be
heterogeneous across groups. Under the availability of instrumental variables
specific to each group, we show that the MTE for each group can be separately
identified. Based on our identification result, we propose a two-step
semiparametric procedure for estimating the group-wise MTE. We illustrate the
usefulness of the proposed method with an application to economic returns to
college education.

arXiv link: http://arxiv.org/abs/2001.09560v6

Econometrics arXiv paper, submitted: 2020-01-25

Bayesian Panel Quantile Regression for Binary Outcomes with Correlated Random Effects: An Application on Crime Recidivism in Canada

Authors: Georges Bresson, Guy Lacroix, Mohammad Arshad Rahman

This article develops a Bayesian approach for estimating panel quantile
regression with binary outcomes in the presence of correlated random effects.
We construct a working likelihood using an asymmetric Laplace (AL) error
distribution and combine it with suitable prior distributions to obtain the
complete joint posterior distribution. For posterior inference, we propose two
Markov chain Monte Carlo (MCMC) algorithms but prefer the algorithm that
exploits the blocking procedure to produce lower autocorrelation in the MCMC
draws. We also explain how to use the MCMC draws to calculate the marginal
effects, relative risk and odds ratio. The performance of our preferred
algorithm is demonstrated in multiple simulation studies and shown to perform
extremely well. Furthermore, we implement the proposed framework to study crime
recidivism in Quebec, a Canadian province, using novel data from the
administrative correctional files. Our results suggest that the recently
implemented "tough-on-crime" policy of the Canadian government has been largely
successful in reducing the probability of repeat offenses in the post-policy
period. Besides, our results support existing findings on crime recidivism and
offer new insights at various quantiles.
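
For reference, the asymmetric Laplace working likelihood that underlies such
Bayesian quantile regressions has a simple closed form; a sketch in our own
notation (quantile level p, location mu, scale sigma) is:

    # Asymmetric Laplace log-likelihood used as a working likelihood for quantile p:
    # rho_p(u) = u * (p - 1{u < 0}) is the usual check (pinball) loss.
    import numpy as np

    def al_loglik(y, mu, p=0.5, sigma=1.0):
        u = (np.asarray(y) - mu) / sigma
        rho = u * (p - (u < 0))
        return np.sum(np.log(p * (1 - p) / sigma) - rho)

    print(al_loglik(y=np.array([0.2, -1.0, 0.7]), mu=0.0, p=0.25))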

arXiv link: http://arxiv.org/abs/2001.09295v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2020-01-22

Saddlepoint approximations for spatial panel data models

Authors: Chaonan Jiang, Davide La Vecchia, Elvezio Ronchetti, Olivier Scaillet

We develop new higher-order asymptotic techniques for the Gaussian maximum
likelihood estimator in a spatial panel data model, with fixed effects,
time-varying covariates, and spatially correlated errors. Our saddlepoint
density and tail area approximation feature relative error of order
$O(1/(n(T-1)))$ with $n$ being the cross-sectional dimension and $T$ the
time-series dimension. The main theoretical tool is the tilted-Edgeworth
technique in a non-identically distributed setting. The density approximation
is always non-negative, does not need resampling, and is accurate in the tails.
Monte Carlo experiments on density approximation and testing in the presence of
nuisance parameters illustrate the good performance of our approximation over
first-order asymptotics and Edgeworth expansions. An empirical application to
the investment-saving relationship in OECD (Organisation for Economic
Co-operation and Development) countries shows disagreement between testing
results based on first-order asymptotics and saddlepoint techniques.

arXiv link: http://arxiv.org/abs/2001.10377v3

Econometrics arXiv updated paper (originally submitted: 2020-01-22)

Oracle Efficient Estimation of Structural Breaks in Cointegrating Regressions

Authors: Karsten Schweikert

In this paper, we propose an adaptive group lasso procedure to efficiently
estimate structural breaks in cointegrating regressions. It is well-known that
the group lasso estimator is not simultaneously estimation consistent and model
selection consistent in structural break settings. Hence, we use a first step
group lasso estimation of a diverging number of breakpoint candidates to
produce weights for a second adaptive group lasso estimation. We prove that
parameter changes are estimated consistently by group lasso and show that the
number of estimated breaks is greater than the true number but still
sufficiently close to it. Then, we use these results and prove that the
adaptive group lasso has oracle properties if weights are obtained from our
first step estimation. Simulation results show that the proposed estimator
delivers the expected results. An economic application to the long-run US money
demand function demonstrates the practical importance of this methodology.

arXiv link: http://arxiv.org/abs/2001.07949v4

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2020-01-20

Fundamental Limits of Testing the Independence of Irrelevant Alternatives in Discrete Choice

Authors: Arjun Seshadri, Johan Ugander

The Multinomial Logit (MNL) model and the axiom it satisfies, the
Independence of Irrelevant Alternatives (IIA), are together the most widely
used tools of discrete choice. The MNL model serves as the workhorse model for
a variety of fields, but is also widely criticized, with a large body of
experimental literature claiming to document real-world settings where IIA
fails to hold. Over the past several decades, IIA as a modelling assumption has
been the subject of many practical tests focusing on specific deviations from
it, but the formal size properties of hypothesis tests of IIA are still not
well understood. In this work we replace some of the
ambiguity in this literature with rigorous pessimism, demonstrating that any
general test for IIA with low worst-case error would require a number of
samples exponential in the number of alternatives of the choice problem. A
major benefit of our analysis over previous work is that it lies entirely in
the finite-sample domain, a feature crucial to understanding the behavior of
tests in the common data-poor settings of discrete choice. Our lower bounds are
structure-dependent, and as a potential cause for optimism, we find that if one
restricts the test of IIA to violations that can occur in a specific collection
of choice sets (e.g., pairs), one obtains structure-dependent lower bounds that
are much less pessimistic. Our analysis of this testing problem is unorthodox
in being highly combinatorial, counting Eulerian orientations of cycle
decompositions of a particular bipartite graph constructed from a data set of
choices. By identifying fundamental relationships between the comparison
structure of a given testing problem and its sample efficiency, we hope these
relationships will help lay the groundwork for a rigorous rethinking of the IIA
testing problem as well as other testing problems in discrete choice.

arXiv link: http://arxiv.org/abs/2001.07042v1

Econometrics arXiv updated paper (originally submitted: 2020-01-19)

Efficient and Robust Estimation of the Generalized LATE Model

Authors: Haitian Xie

This paper studies the estimation of causal parameters in the generalized
local average treatment effect (GLATE) model, a generalization of the classical
LATE model encompassing multi-valued treatment and instrument. We derive the
efficient influence function (EIF) and the semiparametric efficiency bound
(SPEB) for two types of parameters: local average structural function (LASF)
and local average structural function for the treated (LASF-T). The moment
condition generated by the EIF satisfies two robustness properties: double
robustness and Neyman orthogonality. Based on the robust moment condition, we
propose the double/debiased machine learning (DML) estimators for LASF and
LASF-T. The DML estimator is semiparametric efficient and suitable for high
dimensional settings. We also propose null-restricted inference methods that
are robust against weak identification issues. As an empirical application, we
study the effects across different sources of health insurance by applying the
developed methods to the Oregon Health Insurance Experiment.

arXiv link: http://arxiv.org/abs/2001.06746v2

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2020-01-18

A tail dependence-based MST and their topological indicators in modelling systemic risk in the European insurance sector

Authors: Anna Denkowska, Stanisław Wanat

In the present work we analyse the dynamics of indirect connections between
insurance companies that result from market price channels. In our analysis we
assume that the stock quotations of insurance companies reflect market
sentiments which constitute a very important systemic risk factor.
Interlinkages between insurers and their dynamics have a direct impact on
systemic risk contagion in the insurance sector. We propose herein a new hybrid
approach to the analysis of interlinkages dynamics based on combining the
copula-DCC-GARCH model and Minimum Spanning Trees (MST). Using the
copula-DCC-GARCH model we determine the tail dependence coefficients. Then, for
each analysed period we construct MST based on these coefficients. The dynamics
is analysed by means of time series of selected topological indicators of the
MSTs in the years 2005-2019. Our empirical results show the usefulness of the
proposed approach to the analysis of systemic risk in the insurance sector. The
time series obtained from the proposed hybrid approach reflect the phenomena
occurring on the market. The analysed MST topological indicators can be
considered as systemic risk predictors.
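
The MST construction step can be sketched as follows: convert the pairwise
tail-dependence coefficients into distances (the map d = sqrt(2(1 - lambda))
below is a common convention, not necessarily the paper's) and extract the
minimum spanning tree with SciPy; the coefficient matrix is synthetic.

    # Build a minimum spanning tree from a matrix of pairwise tail-dependence
    # coefficients by converting them to distances (higher dependence = shorter edge).
    import numpy as np
    from scipy.sparse.csgraph import minimum_spanning_tree

    rng = np.random.default_rng(8)
    k = 6                                         # number of insurers (synthetic example)
    lam = rng.uniform(0.1, 0.9, size=(k, k))
    lam = (lam + lam.T) / 2
    np.fill_diagonal(lam, 1.0)                    # tail-dependence coefficients in [0, 1]

    dist = np.sqrt(2.0 * (1.0 - lam))             # one common dependence-to-distance map
    mst = minimum_spanning_tree(dist).toarray()   # upper-triangular edge weights
    edges = np.argwhere(mst > 0)
    print(edges, mst[mst > 0])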

arXiv link: http://arxiv.org/abs/2001.06567v2

Econometrics arXiv updated paper (originally submitted: 2020-01-17)

Entropy Balancing for Continuous Treatments

Authors: Stefan Tübbicke

This paper introduces entropy balancing for continuous treatments (EBCT) by
extending the original entropy balancing methodology of Hainmueller (2012). In
order to estimate balancing weights, the proposed approach solves a globally
convex constrained optimization problem. EBCT weights reliably eradicate
Pearson correlations between covariates and the continuous treatment variable.
This is the case even when other methods based on the generalized propensity
score tend to yield insufficient balance due to strong selection into different
treatment intensities. Moreover, the optimization procedure is more successful
in avoiding extreme weights attached to a single unit. Extensive Monte-Carlo
simulations show that treatment effect estimates using EBCT display similar or
lower bias and uniformly lower root mean squared error. These properties make
EBCT an attractive method for the evaluation of continuous treatments.
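
A stylized version of the underlying convex problem can be solved through its
low-dimensional dual: weights proportional to exp(-lambda'c_i), with c_i the
covariate-treatment cross-moments and lambda chosen so the weighted covariances
vanish. The sketch below illustrates this general entropy-balancing logic on
synthetic data; it is not the authors' implementation or full set of constraints.

    # Stylized entropy balancing for a continuous treatment via its dual:
    # weights w_i ~ exp(-lambda'c_i) with c_i = (X_i - Xbar)(T_i - Tbar),
    # and lambda chosen so the weighted covariate-treatment covariances are zero.
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(9)
    n, p = 500, 3
    X = rng.normal(size=(n, p))
    T = X @ np.array([0.8, 0.0, -0.5]) + rng.normal(size=n)   # selection into the dose

    C = (X - X.mean(axis=0)) * (T - T.mean())[:, None]        # constraint moments, n x p

    def dual(lmbda):
        return np.log(np.mean(np.exp(-C @ lmbda)))            # convex dual objective

    lam_hat = minimize(dual, np.zeros(p), method="BFGS").x
    w = np.exp(-C @ lam_hat)
    w /= w.sum()
    print(np.abs(C.T @ w).max())                              # weighted covariances ~ 0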

arXiv link: http://arxiv.org/abs/2001.06281v2

Econometrics arXiv updated paper (originally submitted: 2020-01-17)

Distributional synthetic controls

Authors: Florian Gunsilius

This article extends the widely-used synthetic controls estimator for
evaluating causal effects of policy changes to quantile functions. The proposed
method provides a geometrically faithful estimate of the entire counterfactual
quantile function of the treated unit. Its appeal stems from an efficient
implementation via a constrained quantile-on-quantile regression. This
constitutes a novel concept of independent interest. The method provides a
unique counterfactual quantile function in any scenario: for continuous,
discrete or mixed distributions. It operates in both repeated cross-sections
and panel data with as little as a single pre-treatment period. The article
also provides abstract identification results by showing that any synthetic
controls method, classical or our generalization, provides the correct
counterfactual for causal models that preserve distances between the outcome
distributions. Working with whole quantile functions instead of aggregate
values allows for tests of equality and stochastic dominance of the
counterfactual and the observed distribution. It can provide causal inference
on standard outcomes like average or quantile treatment effects, but also more
general concepts such as counterfactual Lorenz curves or interquartile ranges.
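
A stripped-down version of the computation looks as follows: simplex weights are
chosen so that the weighted average of donor pre-treatment quantile functions
matches the treated unit's, and the same weights then build the counterfactual
post-treatment quantile function. The least-squares-on-a-grid shortcut below
stands in for the paper's constrained quantile-on-quantile regression; all data
are synthetic.

    # Stripped-down distributional synthetic control: choose simplex weights so the
    # weighted average of donor pre-treatment quantile functions matches the treated
    # unit's, then build the counterfactual post-treatment quantile function.
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(10)
    q_grid = np.linspace(0.05, 0.95, 19)
    J = 5                                                     # number of donor units
    donors_pre = np.array([np.quantile(rng.normal(j * 0.2, 1, 500), q_grid)
                           for j in range(J)])
    donors_post = np.array([np.quantile(rng.normal(j * 0.2 + 0.1, 1, 500), q_grid)
                            for j in range(J)])
    treated_pre = np.quantile(rng.normal(0.4, 1, 500), q_grid)

    def loss(w):
        return np.sum((treated_pre - w @ donors_pre) ** 2)

    cons = [{"type": "eq", "fun": lambda w: w.sum() - 1.0}]
    res = minimize(loss, np.full(J, 1.0 / J), bounds=[(0, 1)] * J,
                   constraints=cons, method="SLSQP")
    counterfactual_post = res.x @ donors_post                 # counterfactual quantiles
    print(np.round(res.x, 3))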

arXiv link: http://arxiv.org/abs/2001.06118v5

Econometrics arXiv paper, submitted: 2020-01-16

Recovering Network Structure from Aggregated Relational Data using Penalized Regression

Authors: Hossein Alidaee, Eric Auerbach, Michael P. Leung

Social network data can be expensive to collect. Breza et al. (2017) propose
aggregated relational data (ARD) as a low-cost substitute that can be used to
recover the structure of a latent social network when it is generated by a
specific parametric random effects model. Our main observation is that many
economic network formation models produce networks that are effectively
low-rank. As a consequence, network recovery from ARD is generally possible
without parametric assumptions using a nuclear-norm penalized regression. We
demonstrate how to implement this method and provide finite-sample bounds on
the mean squared error of the resulting estimator for the distribution of
network links. Computation takes seconds for samples with hundreds of
observations. Easy-to-use code in R and Python can be found at
https://github.com/mpleung/ARD.
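
The computational heart of nuclear-norm penalized estimation is singular-value
soft-thresholding. The generic proximal step below (applied directly to a noisy
low-rank matrix rather than to the ARD regression studied in the paper)
illustrates why low-rank structure can be recovered without parametric
assumptions.

    # Generic singular-value soft-thresholding step, the proximal operator of the
    # nuclear norm, which underlies low-rank recovery from aggregated relational data.
    import numpy as np

    def svt(M, penalty):
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        s_shrunk = np.maximum(s - penalty, 0.0)        # soft-threshold singular values
        return U @ np.diag(s_shrunk) @ Vt

    rng = np.random.default_rng(11)
    low_rank = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 100))   # rank-3 truth
    noisy = low_rank + rng.normal(scale=0.5, size=(100, 100))
    estimate = svt(noisy, penalty=20.0)
    print(np.linalg.matrix_rank(estimate, tol=1e-6))   # typically recovers rank 3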

arXiv link: http://arxiv.org/abs/2001.06052v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-01-14

Sparse Covariance Estimation in Logit Mixture Models

Authors: Youssef M Aboutaleb, Mazen Danaf, Yifei Xie, Moshe Ben-Akiva

This paper introduces a new data-driven methodology for estimating sparse
covariance matrices of the random coefficients in logit mixture models.
Researchers typically specify covariance matrices in logit mixture models under
one of two extreme assumptions: either an unrestricted full covariance matrix
(allowing correlations between all random coefficients), or a restricted
diagonal matrix (allowing no correlations at all). Our objective is to find
optimal subsets of correlated coefficients for which we estimate covariances.
We propose a new estimator, called MISC, that uses a mixed-integer optimization
(MIO) program to find an optimal block diagonal structure specification for the
covariance matrix, corresponding to subsets of correlated coefficients, for any
desired sparsity level using Markov Chain Monte Carlo (MCMC) posterior draws
from the unrestricted full covariance matrix. The optimal sparsity level of the
covariance matrix is determined using out-of-sample validation. We demonstrate
the ability of MISC to correctly recover the true covariance structure from
synthetic data. In an empirical illustration using a stated preference survey
on modes of transportation, we use MISC to obtain a sparse covariance matrix
indicating how preferences for attributes are related to one another.

arXiv link: http://arxiv.org/abs/2001.05034v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-01-14

A Higher-Order Correct Fast Moving-Average Bootstrap for Dependent Data

Authors: Davide La Vecchia, Alban Moor, Olivier Scaillet

We develop and implement a novel fast bootstrap for dependent data. Our
scheme is based on the i.i.d. resampling of the smoothed moment indicators. We
characterize the class of parametric and semi-parametric estimation problems
for which the method is valid. We show the asymptotic refinements of the
proposed procedure, proving that it is higher-order correct under mild
assumptions on the time series, the estimating functions, and the smoothing
kernel. We illustrate the applicability and the advantages of our procedure for
Generalized Empirical Likelihood estimation. As a by-product, our fast
bootstrap provides higher-order correct asymptotic confidence distributions.
Monte Carlo simulations on an autoregressive conditional duration model provide
numerical evidence that the novel bootstrap yields higher-order accurate
confidence intervals. A real-data application on dynamics of trading volume of
stocks illustrates the advantage of our method over the routinely-applied
first-order asymptotic theory, when the underlying distribution of the test
statistic is skewed or fat-tailed.

arXiv link: http://arxiv.org/abs/2001.04867v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-01-13

Panel Data Quantile Regression for Treatment Effect Models

Authors: Takuya Ishihara

In this study, we develop a novel estimation method for quantile treatment
effects (QTE) under rank invariance and rank stationarity assumptions. Ishihara
(2020) explores identification of the nonseparable panel data model under these
assumptions and proposes a parametric estimator based on the minimum distance
method. However, when the dimensionality of the covariates is large, the
minimum distance estimation using this process is computationally demanding. To
overcome this problem, we propose a two-step estimation method based on the
quantile regression and minimum distance methods. We then show the uniform
asymptotic properties of our estimator and the validity of the nonparametric
bootstrap. The Monte Carlo studies indicate that our estimator performs well in
finite samples. Finally, we present two empirical illustrations, estimating
the distributional effects of insurance provision on household production and
of TV watching on child cognitive development.

arXiv link: http://arxiv.org/abs/2001.04324v3

Econometrics arXiv paper, submitted: 2020-01-12

A multi-country dynamic factor model with stochastic volatility for euro area business cycle analysis

Authors: Florian Huber, Michael Pfarrhofer, Philipp Piribauer

This paper develops a dynamic factor model that uses euro area (EA)
country-specific information on output and inflation to estimate an area-wide
measure of the output gap. Our model assumes that output and inflation can be
decomposed into country-specific stochastic trends and a common cyclical
component. Comovement in the trends is introduced by imposing a factor
structure on the shocks to the latent states. We moreover introduce flexible
stochastic volatility specifications to control for heteroscedasticity in the
measurement errors and innovations to the latent states. Carefully specified
shrinkage priors allow for pushing the model towards a homoscedastic
specification, if supported by the data. Our measure of the output gap closely
tracks other commonly adopted measures, with small differences in magnitudes
and timing. To assess whether the model-based output gap helps in forecasting
inflation, we perform an out-of-sample forecasting exercise. The findings
indicate that our approach yields superior inflation forecasts, both in terms
of point and density predictions.

arXiv link: http://arxiv.org/abs/2001.03935v1

Econometrics arXiv updated paper (originally submitted: 2020-01-12)

Two-Step Estimation of a Strategic Network Formation Model with Clustering

Authors: Geert Ridder, Shuyang Sheng

This paper explores strategic network formation under incomplete information
using data from a single large network. We allow the utility function to be
nonseparable in an individual's link choices to capture the spillover effects
from friends in common. In a network with n individuals, an individual with a
nonseparable utility function chooses between 2^{n-1} overlapping portfolios of
links. We develop a novel approach that applies the Legendre transform to the
utility function so that the optimal link choices can be represented as a
sequence of correlated binary choices. The link dependence that results from
the preference for friends in common is captured by an auxiliary variable
introduced by the Legendre transform. We propose a two-step estimator that is
consistent and asymptotically normal. We also derive a limiting approximation
of the game as n grows large that simplifies the computation in large networks.
We apply these methods to favor exchange networks in rural India and find that
the direction of support from a mutual link matters in facilitating favor
provision.

arXiv link: http://arxiv.org/abs/2001.03838v4

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2020-01-04

Bayesian Median Autoregression for Robust Time Series Forecasting

Authors: Zijian Zeng, Meng Li

We develop a Bayesian median autoregressive (BayesMAR) model for time series
forecasting. The proposed method utilizes time-varying quantile regression at
the median, favorably inheriting the robustness of median regression in
contrast to the widely used mean-based methods. Motivated by a working Laplace
likelihood approach in Bayesian quantile regression, BayesMAR adopts a
parametric model bearing the same structure as autoregressive models by
altering the Gaussian error to Laplace, leading to a simple, robust, and
interpretable modeling strategy for time series forecasting. We estimate model
parameters by Markov chain Monte Carlo. Bayesian model averaging is used to
account for model uncertainty, including the uncertainty in the autoregressive
order, in addition to a Bayesian model selection approach. The proposed methods
are illustrated using simulations and real data applications. An application to
U.S. macroeconomic data forecasting shows that BayesMAR leads to favorable and
often superior predictive performance compared to the selected mean-based
alternatives under various loss functions that encompass both point and
probabilistic forecasts. The proposed methods are generic and can be used to
complement a rich class of methods that build on autoregressive models.
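
A minimal sketch of the working-Laplace-likelihood idea: our own toy version with an AR(2) median model, a flat prior, a fixed Laplace scale, and a random-walk Metropolis sampler, none of which reflects the authors' implementation.

    import numpy as np

    rng = np.random.default_rng(1)
    T, p = 300, 2

    # Simulate an AR(2) series with Laplace noise purely for illustration
    y = np.zeros(T)
    for t in range(p, T):
        y[t] = 0.3 + 0.5 * y[t - 1] - 0.2 * y[t - 2] + rng.laplace(scale=0.5)

    Y = y[p:]
    X = np.column_stack([np.ones(T - p)] + [y[p - k:T - k] for k in range(1, p + 1)])

    def log_post(beta, scale=0.5):
        # Working Laplace log-likelihood at the median (flat prior, fixed scale: assumptions)
        return -np.abs(Y - X @ beta).sum() / scale

    beta = np.zeros(p + 1)
    lp = log_post(beta)
    draws = []
    for it in range(5000):
        prop = beta + 0.05 * rng.normal(size=p + 1)    # random-walk proposal
        if np.log(rng.uniform()) < log_post(prop) - lp:  # Metropolis accept/reject
            beta, lp = prop, log_post(prop)
        draws.append(beta.copy())

    print("posterior means (after burn-in):", np.mean(draws[1000:], axis=0))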

arXiv link: http://arxiv.org/abs/2001.01116v2

Econometrics arXiv updated paper (originally submitted: 2020-01-03)

Logical Differencing in Dyadic Network Formation Models with Nontransferable Utilities

Authors: Wayne Yuan Gao, Ming Li, Sheng Xu

This paper considers a semiparametric model of dyadic network formation under
nontransferable utilities (NTU). NTU arises frequently in real-world social
interactions that require bilateral consent, but by its nature induces additive
non-separability. We show how unobserved individual heterogeneity in our model
can be canceled out without additive separability, using a novel method we call
logical differencing. The key idea is to construct events involving the
intersection of two mutually exclusive restrictions on the unobserved
heterogeneity, based on multivariate monotonicity. We provide a consistent
estimator and analyze its performance via simulation, and apply our method to
the Nyakatoke risk-sharing networks.

arXiv link: http://arxiv.org/abs/2001.00691v4

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2020-01-02

Prediction in locally stationary time series

Authors: Holger Dette, Weichi Wu

We develop an estimator for the high-dimensional covariance matrix of a
locally stationary process with a smoothly varying trend and use this statistic
to derive consistent predictors in non-stationary time series. In contrast to
the currently available methods for this problem, the predictor developed here
does not rely on fitting an autoregressive model and does not require a
vanishing trend. The finite sample properties of the new methodology are
illustrated by means of a simulation study and a financial indices study.

arXiv link: http://arxiv.org/abs/2001.00419v2

Econometrics arXiv paper, submitted: 2019-12-30

Recovering Latent Variables by Matching

Authors: Manuel Arellano, Stephane Bonhomme

We propose an optimal-transport-based matching method to nonparametrically
estimate linear models with independent latent variables. The method consists
in generating pseudo-observations from the latent variables, so that the
Euclidean distance between the model's predictions and their matched
counterparts in the data is minimized. We show that our nonparametric estimator
is consistent, and we document that it performs well in simulated data. We
apply this method to study the cyclicality of permanent and transitory income
shocks in the Panel Study of Income Dynamics. We find that the dispersion of
income shocks is approximately acyclical, whereas the skewness of permanent
shocks is procyclical. By comparison, we find that the dispersion and skewness
of shocks to hourly wages vary little with the business cycle.
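
To convey the matching idea in the simplest possible setting, here is a one-dimensional toy of our own in which the distance-minimizing (optimal-transport) matching reduces to sorting; this is not the authors' estimator.

    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(2)
    n, sigma_true = 2000, 0.7

    # Observed data generated as y = x + e with independent latent x ~ N(0,1), e ~ N(0, sigma^2)
    y_obs = np.sort(rng.normal(size=n) + sigma_true * rng.normal(size=n))

    # Fixed simulation draws used to generate pseudo-observations of the latent variables
    x_sim, e_sim = rng.normal(size=n), rng.normal(size=n)

    def matching_objective(sigma):
        # Model predictions from pseudo-observations; sorting both sides gives the
        # optimal one-to-one matching in one dimension
        y_model = np.sort(x_sim + sigma * e_sim)
        return np.mean((y_obs - y_model) ** 2)

    res = minimize_scalar(matching_objective, bounds=(0.01, 3.0), method="bounded")
    print("estimated sigma:", res.x)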

arXiv link: http://arxiv.org/abs/1912.13081v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2019-12-30

Adaptive Discrete Smoothing for High-Dimensional and Nonlinear Panel Data

Authors: Xi Chen, Ye Luo, Martin Spindler

In this paper we develop a data-driven smoothing technique for
high-dimensional and non-linear panel data models. We allow for
individual-specific (non-linear) functions and estimation with econometric or
machine learning methods by using weighted observations from other individuals.
The weights are determined in a data-driven way and depend on the similarity
between the corresponding functions, which is measured based on initial
estimates. The key feature of such a procedure is that it clusters individuals
based on the distance / similarity between them, estimated in a first stage.
Our estimation method can be combined with various statistical estimation
procedures, in particular modern machine learning methods, which are
particularly fruitful in the high-dimensional case and with complex,
heterogeneous data. The approach can be interpreted as a "soft clustering", in
comparison to traditional "hard clustering" that assigns each individual to
exactly one group. We conduct a simulation study which shows that prediction
can be greatly improved by using our estimator. Finally, we use a big data set
from didichuxing.com, a leading company in the transportation industry, to
analyze and predict the gap between supply and demand based on a large set of
covariates. Our estimator clearly performs much better in out-of-sample
prediction compared to existing linear panel data estimators.
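
A small sketch of the soft-clustering weighting: our own simplified version with Gaussian kernel weights computed on first-stage slope estimates; the bandwidth and the weighting rule are assumptions, not the paper's exact scheme.

    import numpy as np

    rng = np.random.default_rng(3)
    N, T = 50, 30
    beta_true = np.where(np.arange(N) < 25, 1.0, -1.0)      # two latent groups of individuals
    x = rng.normal(size=(N, T))
    y = beta_true[:, None] * x + rng.normal(size=(N, T))

    beta_init = (x * y).sum(axis=1) / (x * x).sum(axis=1)    # first-stage per-unit OLS slopes

    h = 0.5                                                  # bandwidth (assumption)
    dist = np.abs(beta_init[:, None] - beta_init[None, :])   # similarity on initial estimates
    w = np.exp(-(dist / h) ** 2)                             # soft-clustering weights

    beta_smooth = np.empty(N)
    for i in range(N):
        wi = np.repeat(w[i], T)                              # weight every observation of unit j
        beta_smooth[i] = (wi * x.ravel() * y.ravel()).sum() / (wi * x.ravel() ** 2).sum()

    print("RMSE initial  :", np.sqrt(np.mean((beta_init - beta_true) ** 2)))
    print("RMSE smoothed :", np.sqrt(np.mean((beta_smooth - beta_true) ** 2)))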

arXiv link: http://arxiv.org/abs/1912.12867v2

Econometrics arXiv updated paper (originally submitted: 2019-12-30)

Priority to unemployed immigrants? A causal machine learning evaluation of training in Belgium

Authors: Bart Cockx, Michael Lechner, Joost Bollens

Based on administrative data on the unemployed in Belgium, we estimate the
labour market effects of three training programmes at various aggregation
levels using Modified Causal Forests, a causal machine learning estimator.
While all programmes have positive effects after the lock-in period, we find
substantial heterogeneity across programmes and unemployed individuals.
Simulations show that 'black-box' rules that reassign unemployed individuals to
programmes that maximise estimated individual gains can considerably improve
effectiveness: up to 20 percent more (less) time spent in (un)employment within
a 30-month window. A shallow policy tree delivers a simple rule that realizes
about 70 percent of this gain.

arXiv link: http://arxiv.org/abs/1912.12864v4

Econometrics arXiv paper, submitted: 2019-12-29

Credit Risk: Simple Closed Form Approximate Maximum Likelihood Estimator

Authors: Anand Deo, Sandeep Juneja

We consider discrete default intensity based and logit type reduced form
models for conditional default probabilities for corporate loans where we
develop simple closed form approximations to the maximum likelihood estimator
(MLE) when the underlying covariates follow a stationary Gaussian process. In a
practically reasonable asymptotic regime where the default probabilities are
small, say 1-3% annually, and the number of firms and the time period of
available data are reasonably large, we rigorously show that the proposed
estimator behaves similarly to, or slightly worse than, the MLE when the
underlying model is correctly specified. For the more realistic case of model
misspecification, both estimators are seen to be equally good, or equally bad.
Further, beyond a point, both are more or less insensitive to increases in
data. These conclusions
are validated on empirical and simulated data. The proposed approximations
should also have applications outside finance, where logit-type models are used
and probabilities of interest are small.

arXiv link: http://arxiv.org/abs/1912.12611v1

Econometrics arXiv paper, submitted: 2019-12-28

Bayesian estimation of large dimensional time varying VARs using copulas

Authors: Mike Tsionas, Marwan Izzeldin, Lorenzo Trapani

This paper provides a simple, yet reliable, alternative to the (Bayesian)
estimation of large multivariate VARs with time variation in the conditional
mean equations and/or in the covariance structure. With our new methodology,
the original multivariate, n-dimensional model is treated as a set of n
univariate estimation problems, and cross-dependence is handled through the use
of a copula. Thus, only univariate distribution functions are needed when
estimating the individual equations, which are often available in closed form,
and easy to handle with MCMC (or other techniques). Estimation is carried out
in parallel for the individual equations. Thereafter, the individual posteriors
are combined with the copula, so obtaining a joint posterior which can be
easily resampled. We illustrate our approach by applying it to a large
time-varying parameter VAR with 25 macroeconomic variables.

arXiv link: http://arxiv.org/abs/1912.12527v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2019-12-27

Minimax Semiparametric Learning With Approximate Sparsity

Authors: Jelena Bradic, Victor Chernozhukov, Whitney K. Newey, Yinchu Zhu

Estimating linear, mean-square continuous functionals is a pivotal challenge
in statistics. In high-dimensional contexts, this estimation is often performed
under the assumption of exact model sparsity, meaning that only a small number
of parameters are precisely non-zero. This excludes models where linear
formulations only approximate the underlying data distribution, such as
nonparametric regression methods that use basis expansions such as splines,
kernel methods or polynomial regressions. Many recent methods for root-$n$
estimation have been proposed, but the implications of exact model sparsity
remain largely unexplored. In particular, minimax optimality for models that
are not exactly sparse has not yet been developed. This paper formalizes the
concept of approximate sparsity through classical semi-parametric theory. We
derive minimax rates under this formulation for a regression slope and an
average derivative, finding these bounds to be substantially larger than those
in low-dimensional, semi-parametric settings. We identify several new
phenomena. We discover new regimes where rate double robustness does not hold,
yet root-$n$ estimation is still possible. In these settings, we propose an
estimator that achieves minimax optimal rates. Our findings further reveal
distinct optimality boundaries for ordered versus unordered nonparametric
regression estimation.

arXiv link: http://arxiv.org/abs/1912.12213v7

Econometrics arXiv paper, submitted: 2019-12-26

Pareto models for risk management

Authors: Arthur Charpentier, Emmanuel Flachaire

The Pareto model is very popular in risk management, since simple analytical
formulas can be derived for financial downside risk measures (Value-at-Risk,
Expected Shortfall) or reinsurance premiums and related quantities (Large Claim
Index, Return Period). Nevertheless, in practice, distributions are (strictly)
Pareto only in the tails, above a (possibly very) large threshold. Therefore,
it could be interesting to take into account second-order behavior to provide
a better fit. In this article, we show how to go from a strict Pareto model to
Pareto-type distributions. We discuss inference, and derive formulas for
various measures and indices, and finally provide applications on insurance
losses and financial risks.
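
For orientation, the strict-Pareto closed forms alluded to above are the standard ones (this note is ours, not part of the paper): if a loss $Y$ is strictly Pareto above a threshold $u$ with tail index $\alpha$, i.e. $P(Y>y) = (y/u)^{-\alpha}$ for $y \ge u$, then $VaR_q = u\,(1-q)^{-1/\alpha}$ and, for $\alpha > 1$, $ES_q = \frac{\alpha}{\alpha-1}\, VaR_q$. The paper's Pareto-type (second-order) extensions refine such formulas when the Pareto behavior only holds approximately in the tail.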

arXiv link: http://arxiv.org/abs/1912.11736v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2019-12-23

Probability Assessments of an Ice-Free Arctic: Comparing Statistical and Climate Model Projections

Authors: Francis X. Diebold, Glenn D. Rudebusch

The downward trend in the amount of Arctic sea ice has a wide range of
environmental and economic consequences including important effects on the pace
and intensity of global climate change. Based on several decades of satellite
data, we provide statistical forecasts of Arctic sea ice extent during the rest
of this century. The best fitting statistical model indicates that overall sea
ice coverage is declining at an increasing rate. By contrast, average
projections from the CMIP5 global climate models foresee a gradual slowing of
Arctic sea ice loss even in scenarios with high carbon emissions. Our
long-range statistical projections also deliver probability assessments of the
timing of an ice-free Arctic. These results indicate almost a 60 percent chance
of an effectively ice-free Arctic Ocean sometime during the 2030s -- much
earlier than the average projection from the global climate models.

arXiv link: http://arxiv.org/abs/1912.10774v4

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2019-12-22

Improved Central Limit Theorem and bootstrap approximations in high dimensions

Authors: Victor Chernozhukov, Denis Chetverikov, Kengo Kato, Yuta Koike

This paper deals with the Gaussian and bootstrap approximations to the
distribution of the max statistic in high dimensions. This statistic takes the
form of the maximum over components of the sum of independent random vectors
and its distribution plays a key role in many high-dimensional econometric
problems. Using a novel iterative randomized Lindeberg method, the paper
derives new bounds for the distributional approximation errors. These new
bounds substantially improve upon existing ones and simultaneously allow for a
larger class of bootstrap methods.

arXiv link: http://arxiv.org/abs/1912.10529v2

Econometrics arXiv cross-link from q-fin.RM (q-fin.RM), submitted: 2019-12-22

Building and Testing Yield Curve Generators for P&C Insurance

Authors: Gary Venter, Kailan Shang

Interest-rate risk is a key factor for property-casualty insurer capital. P&C
companies tend to be highly leveraged, with bond holdings much greater than
capital. For GAAP capital, bonds are marked to market but liabilities are not,
so shifts in the yield curve can have a significant impact on capital.
Yield-curve scenario generators are one approach to quantifying this risk. They
produce many future simulated evolutions of the yield curve, which can be used
to quantify the probabilities of bond-value changes that would result from
various maturity-mix strategies. Some of these generators are provided as
black-box models where the user gets only the projected scenarios. One focus of
this paper is to provide methods for testing generated scenarios from such
models by comparing to known distributional properties of yield curves.
P&C insurers hold bonds to maturity and manage cash-flow risk by matching
asset and liability flows. Derivative pricing and stochastic volatility are of
little concern over the relevant time frames. This requires different models
and model testing than what is common in the broader financial markets.
To complicate things further, interest rates for the last decade have not
been following the patterns established in the sixty years following WWII. We
are now coming out of the period of very low rates, yet are still not returning
to what had been thought of as normal before that. Modeling and model testing
are in an evolving state while new patterns emerge.
Our analysis starts with a review of the literature on interest-rate model
testing, with a P&C focus, and an update of the tests for current market
behavior. We then discuss models, and use them to illustrate the fitting and
testing methods. The testing discussion does not require the model-building
section.

arXiv link: http://arxiv.org/abs/1912.10526v1

Econometrics arXiv updated paper (originally submitted: 2019-12-22)

Efficient and Convergent Sequential Pseudo-Likelihood Estimation of Dynamic Discrete Games

Authors: Adam Dearing, Jason R. Blevins

We propose a new sequential Efficient Pseudo-Likelihood (k-EPL) estimator for
dynamic discrete choice games of incomplete information. k-EPL considers the
joint behavior of multiple players simultaneously, as opposed to individual
responses to other agents' equilibrium play. This, in addition to reframing the
problem from conditional choice probability (CCP) space to value function
space, yields a computationally tractable, stable, and efficient estimator. We
show that each iteration in the k-EPL sequence is consistent and asymptotically
efficient, so the first-order asymptotic properties do not vary across
iterations. Furthermore, we show the sequence achieves higher-order equivalence
to the finite-sample maximum likelihood estimator with iteration and that the
sequence of estimators converges almost surely to the maximum likelihood
estimator at a nearly-superlinear rate when the data are generated by any
regular Markov perfect equilibrium, including equilibria that lead to
inconsistency of other sequential estimators. When utility is linear in
parameters, k-EPL iterations are computationally simple, only requiring that
the researcher solve linear systems of equations to generate pseudo-regressors
which are used in a static logit/probit regression. Monte Carlo simulations
demonstrate the theoretical results and show k-EPL's good performance in finite
samples in both small- and large-scale games, even when the game admits
spurious equilibria in addition to one that generated the data. We apply the
estimator to study the role of competition in the U.S. wholesale club industry.

arXiv link: http://arxiv.org/abs/1912.10488v6

Econometrics arXiv updated paper (originally submitted: 2019-12-20)

ResLogit: A residual neural network logit model for data-driven choice modelling

Authors: Melvin Wong, Bilal Farooq

This paper presents a novel deep learning-based travel behaviour choice
model. Our proposed Residual Logit (ResLogit) model formulation seamlessly
integrates a Deep Neural Network (DNN) architecture into a multinomial logit
model. Recently, DNN models such as the Multi-layer Perceptron (MLP) and the
Recurrent Neural Network (RNN) have shown remarkable success in modelling
complex and noisy behavioural data. However, econometric studies have argued
that machine learning techniques are a `black-box' and difficult to interpret
for use in choice analysis. We develop a data-driven choice model that extends
the systematic utility function to incorporate non-linear cross-effects using a
series of residual layers and skipped connections to handle model
identifiability in estimating a large number of parameters. The model structure
accounts for cross-effects and choice heterogeneity arising from substitution,
interactions with non-chosen alternatives and other effects in a non-linear
manner. We describe the formulation, model estimation, interpretability and
examine the relative performance and econometric implications of our proposed
model. We present an illustrative example of the model on a classic red/blue
bus choice scenario. For a real-world application, we use a travel mode choice
dataset to analyze the model characteristics compared to traditional neural
networks and Logit formulations. Our findings show that our ResLogit approach
significantly outperforms MLP models while providing similar interpretability
as a Multinomial Logit model.

arXiv link: http://arxiv.org/abs/1912.10058v2

Econometrics arXiv updated paper (originally submitted: 2019-12-20)

Optimal Dynamic Treatment Regimes and Partial Welfare Ordering

Authors: Sukjin Han

Dynamic treatment regimes are treatment allocations tailored to heterogeneous
individuals. The optimal dynamic treatment regime is a regime that maximizes
counterfactual welfare. We introduce a framework in which we can partially
learn the optimal dynamic regime from observational data, relaxing the
sequential randomization assumption commonly employed in the literature but
instead using (binary) instrumental variables. We propose the notion of sharp
partial ordering of counterfactual welfares with respect to dynamic regimes and
establish mapping from data to partial ordering via a set of linear programs.
We then characterize the identified set of the optimal regime as the set of
maximal elements associated with the partial ordering. We relate the notion of
partial ordering with a more conventional notion of partial identification
using topological sorts. Practically, topological sorts can serve as a
policy benchmark for a policymaker. We apply our method to understand returns
to schooling and post-school training as a sequence of treatments by combining
data from multiple sources. The framework of this paper can be used beyond the
current context, e.g., in establishing rankings of multiple treatments or
policies across different counterfactual scenarios.

arXiv link: http://arxiv.org/abs/1912.10014v4

Econometrics arXiv cross-link from math.OC (math.OC), submitted: 2019-12-19

Robust Product-line Pricing under Generalized Extreme Value Models

Authors: Tien Mai, Patrick Jaillet

We study robust versions of pricing problems where customers choose products
according to a generalized extreme value (GEV) choice model, and the choice
parameters are not known exactly but lie in an uncertainty set. We show that,
when the robust problem is unconstrained and the price sensitivity parameters
are homogeneous, the robust optimal prices have a constant markup over
products, and we provide formulas that allow one to compute this constant markup by
bisection. We further show that, in the case that the price sensitivity
parameters are only homogeneous in each partition of the products, under the
assumption that the choice probability generating function and the uncertainty
set are partition-wise separable, a robust solution will have a constant markup
in each subset, and this constant-markup vector can be found efficiently by
convex optimization. We provide numerical results to illustrate the advantages
of our robust approach in protecting from bad scenarios. Our results hold for
convex and bounded uncertainty sets, and for any GEV model,
including the multinomial logit, nested or cross-nested logit.
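
To illustrate the constant-markup idea in the simplest special case, a plain multinomial logit with homogeneous price sensitivity and made-up parameter values (the robust and general-GEV versions are not shown), the markup m solves the first-order condition m (1 - Q(m)) = 1/beta and can be found by bisection:

    import numpy as np

    a = np.array([1.0, 0.5, 0.2])       # alternative-specific utility intercepts (toy values)
    c = np.array([2.0, 1.5, 1.0])       # unit costs (toy values)
    beta = 1.2                          # homogeneous price-sensitivity parameter

    def purchase_prob(m):
        # Total purchase probability at prices c_j + m, with an outside option of utility 0
        expu = np.exp(a - beta * (c + m))
        return expu.sum() / (1.0 + expu.sum())

    def foc(m):
        # MNL multi-product first-order condition under a common markup m
        return m * (1.0 - purchase_prob(m)) - 1.0 / beta

    lo, hi = 0.0, 100.0
    for _ in range(100):                # bisection on the monotone function foc(m)
        mid = 0.5 * (lo + hi)
        if foc(mid) < 0:
            lo = mid
        else:
            hi = mid

    print("constant optimal markup:", 0.5 * (lo + hi))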

arXiv link: http://arxiv.org/abs/1912.09552v2

Econometrics arXiv updated paper (originally submitted: 2019-12-19)

Temporal-Difference estimation of dynamic discrete choice models

Authors: Karun Adusumilli, Dita Eckardt

We study the use of Temporal-Difference learning for estimating the
structural parameters in dynamic discrete choice models. Our algorithms are
based on the conditional choice probability approach but use functional
approximations to estimate various terms in the pseudo-likelihood function. We
suggest two approaches: The first - linear semi-gradient - provides
approximations to the recursive terms using basis functions. The second -
Approximate Value Iteration - builds a sequence of approximations to the
recursive terms by solving non-parametric estimation problems. Our approaches
are fast and naturally allow for continuous and/or high-dimensional state
spaces. Furthermore, they do not require specification of transition densities.
In dynamic games, they avoid integrating over other players' actions, further
heightening the computational advantage. Our proposals can be paired with
popular existing methods such as pseudo-maximum-likelihood, and we propose
locally robust corrections for the latter to achieve parametric rates of
convergence. Monte Carlo simulations confirm the properties of our algorithms
in practice.
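
For readers unfamiliar with the tool, here is a generic linear semi-gradient TD(0) update on simulated transitions; the basis functions, payoff, and state process are self-contained placeholders, and the paper's application to the recursive terms of a dynamic-discrete-choice pseudo-likelihood is not reproduced.

    import numpy as np

    rng = np.random.default_rng(4)

    def phi(s):
        # Placeholder polynomial basis on a scalar state
        return np.array([1.0, s, s ** 2])

    discount = 0.95
    theta = np.zeros(3)        # value-function weights
    alpha = 0.01               # learning rate

    s = 0.0
    for t in range(20000):
        s_next = 0.8 * s + rng.normal(scale=0.5)     # simulated state transition
        r = -s ** 2                                  # per-period payoff (assumption)
        td_error = r + discount * phi(s_next) @ theta - phi(s) @ theta
        theta += alpha * td_error * phi(s)           # semi-gradient TD(0) update
        s = s_next

    print("fitted value-function weights:", theta)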

arXiv link: http://arxiv.org/abs/1912.09509v2

Econometrics arXiv updated paper (originally submitted: 2019-12-19)

Causal Inference and Data Fusion in Econometrics

Authors: Paul Hünermund, Elias Bareinboim

Learning about cause and effect is arguably the main goal in applied
econometrics. In practice, the validity of these causal inferences is
contingent on a number of critical assumptions regarding the type of data that
has been collected and the substantive knowledge that is available. For
instance, unobserved confounding factors threaten the internal validity of
estimates, data availability is often limited to non-random, selection-biased
samples, causal effects need to be learned from surrogate experiments with
imperfect compliance, and causal knowledge has to be extrapolated across
structurally heterogeneous populations. A powerful causal inference framework
is required to tackle these challenges, which plague most data analysis to
varying degrees. Building on the structural approach to causality introduced by
Haavelmo (1943) and the graph-theoretic framework proposed by Pearl (1995), the
artificial intelligence (AI) literature has developed a wide array of
techniques for causal learning that allow researchers to leverage information from various
imperfect, heterogeneous, and biased data sources (Bareinboim and Pearl, 2016).
In this paper, we discuss recent advances in this literature that have the
potential to contribute to econometric methodology along three dimensions.
First, they provide a unified and comprehensive framework for causal inference,
in which the aforementioned problems can be addressed in full generality.
Second, due to their origin in AI, they come together with sound, efficient,
and complete algorithmic criteria for automatization of the corresponding
identification task. And third, because of the nonparametric description of
structural models that graph-theoretic approaches build on, they combine the
strengths of both structural econometrics as well as the potential outcomes
framework, and thus offer an effective middle ground between these two
literature streams.

arXiv link: http://arxiv.org/abs/1912.09104v4

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2019-12-19

Regularized Estimation of High-Dimensional Vector AutoRegressions with Weakly Dependent Innovations

Authors: Ricardo P. Masini, Marcelo C. Medeiros, Eduardo F. Mendes

There has been considerable advance in understanding the properties of sparse
regularization procedures in high-dimensional models. In the time series
context, this understanding is mostly restricted to Gaussian autoregressions or
mixing sequences. We study oracle properties of LASSO estimation of weakly
sparse vector-autoregressive models with heavy-tailed, weakly dependent
innovations with virtually no assumption on the conditional heteroskedasticity.
In contrast to the current literature, our innovation process satisfies an
$L^1$-mixingale-type condition on the centered conditional covariance matrices.
This condition
covers $L^1$-NED sequences and strong ($\alpha$-) mixing sequences as
particular examples.

arXiv link: http://arxiv.org/abs/1912.09002v3

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2019-12-18

Variable-lag Granger Causality for Time Series Analysis

Authors: Chainarong Amornbunchornvej, Elena Zheleva, Tanya Y. Berger-Wolf

Granger causality is a fundamental technique for causal inference in time
series data, commonly used in the social and biological sciences. Typical
operationalizations of Granger causality make a strong assumption that every
time point of the effect time series is influenced by a combination of other
time series with a fixed time delay. However, the assumption of the fixed time
delay does not hold in many applications, such as collective behavior,
financial markets, and many natural phenomena. To address this issue, we
develop variable-lag Granger causality, a generalization of Granger causality
that relaxes the assumption of the fixed time delay and allows causes to
influence effects with arbitrary time delays. In addition, we propose a method
for inferring variable-lag Granger causality relations. We demonstrate our
approach on an application for studying coordinated collective behavior and
show that it performs better than several existing methods in both simulated
and real-world datasets. Our approach can be applied in any domain of time
series analysis.

arXiv link: http://arxiv.org/abs/1912.10829v1

Econometrics arXiv updated paper (originally submitted: 2019-12-18)

Assessing Inference Methods

Authors: Bruno Ferman

We analyze different types of simulations that applied researchers can use to
assess whether their inference methods reliably control false-positive rates.
We show that different assessments involve trade-offs, varying in the types of
problems they may detect, finite-sample performance, susceptibility to
sequential-testing distortions, susceptibility to cherry-picking, and
implementation complexity. We also show that a commonly used simulation to
assess inference methods in shift-share designs can lead to misleading
conclusions and propose alternatives. Overall, we provide novel insights and
recommendations for applied researchers on how to choose, implement, and
interpret inference assessments in their empirical applications.

arXiv link: http://arxiv.org/abs/1912.08772v14

Econometrics arXiv updated paper (originally submitted: 2019-12-17)

Econometrics For Decision Making: Building Foundations Sketched By Haavelmo And Wald

Authors: Charles F. Manski

Haavelmo (1944) proposed a probabilistic structure for econometric modeling,
aiming to make econometrics useful for decision making. His fundamental
contribution has become thoroughly embedded in subsequent econometric research,
yet it could not answer all the deep issues that the author raised. Notably,
Haavelmo struggled to formalize the implications for decision making of the
fact that models can at most approximate actuality. In the same period, Wald
(1939, 1945) initiated his own seminal development of statistical decision
theory. Haavelmo favorably cited Wald, but econometrics did not embrace
statistical decision theory. Instead, it focused on study of identification,
estimation, and statistical inference. This paper proposes statistical decision
theory as a framework for evaluation of the performance of models in decision
making. I particularly consider the common practice of as-if optimization:
specification of a model, point estimation of its parameters, and use of the
point estimate to make a decision that would be optimal if the estimate were
accurate. A central theme is that one should evaluate as-if optimization or any
other model-based decision rule by its performance across the state space,
listing all states of nature that one believes feasible, not across the model
space. I apply the theme to prediction and treatment choice. Statistical
decision theory is conceptually simple, but application is often challenging.
Advancement of computation is the primary task to continue building the
foundations sketched by Haavelmo and Wald.

arXiv link: http://arxiv.org/abs/1912.08726v4

Econometrics arXiv paper, submitted: 2019-12-16

Estimation of Auction Models with Shape Restrictions

Authors: Joris Pinkse, Karl Schurter

We introduce several new estimation methods that leverage shape constraints
in auction models to estimate various objects of interest, including the
distribution of a bidder's valuations, the bidder's ex ante expected surplus,
and the seller's counterfactual revenue. The basic approach applies broadly in
that (unlike most of the literature) it works for a wide range of auction
formats and allows for asymmetric bidders. Though our approach is not
restrictive, we focus our analysis on first-price, sealed-bid auctions with
independent private valuations. We highlight two nonparametric estimation
strategies, one based on a least squares criterion and the other on a maximum
likelihood criterion. We also provide the first direct estimator of the
strategy function. We establish several theoretical properties of our methods
to guide empirical analysis and inference. In addition to providing the
asymptotic distributions of our estimators, we identify ways in which
methodological choices should be tailored to the objects of interest. For
objects like the bidders' ex ante surplus and the seller's counterfactual
expected revenue with an additional symmetric bidder, we show that our
input-parameter-free estimators achieve the semiparametric efficiency bound.
For objects like the bidders' inverse strategy function, we provide an easily
implementable boundary-corrected kernel smoothing and transformation method in
order to ensure the squared error is integrable over the entire support of the
valuations. An extensive simulation study illustrates our analytical results
and demonstrates the respective advantages of our least-squares and maximum
likelihood estimators in finite samples. Compared to estimation strategies
based on kernel density estimation, the simulations indicate that the smoothed
versions of our estimators enjoy a large degree of robustness to the choice of
an input parameter.

arXiv link: http://arxiv.org/abs/1912.07466v1

Econometrics arXiv cross-link from stat.CO (stat.CO), submitted: 2019-12-16

Analysis of Regression Discontinuity Designs with Multiple Cutoffs or Multiple Scores

Authors: Matias D. Cattaneo, Rocio Titiunik, Gonzalo Vazquez-Bare

We introduce the Stata (and R) package rdmulti,
which includes three commands (rdmc, rdmcplot, rdms)
for analyzing Regression Discontinuity (RD) designs with multiple cutoffs or
multiple scores. The command rdmc applies to non-cumulative and
cumulative multi-cutoff RD settings. It calculates pooled and cutoff-specific
RD treatment effects, and provides robust bias-corrected inference procedures.
Post-estimation and inference are allowed. The command rdmcplot offers
RD plots for multi-cutoff settings. Finally, the command rdms concerns
multi-score settings, covering in particular cumulative cutoffs and two running
variables contexts. It also calculates pooled and cutoff-specific RD treatment
effects, provides robust bias-corrected inference procedures, and allows for
post-estimation and inference. These commands employ the
Stata (and R) package rdrobust for plotting,
estimation, and inference. Companion R functions with the same syntax
and capabilities are provided.

arXiv link: http://arxiv.org/abs/1912.07346v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2019-12-15

Prediction Intervals for Synthetic Control Methods

Authors: Matias D. Cattaneo, Yingjie Feng, Rocio Titiunik

Uncertainty quantification is a fundamental problem in the analysis and
interpretation of synthetic control (SC) methods. We develop conditional
prediction intervals in the SC framework, and provide conditions under which
these intervals offer finite-sample probability guarantees. Our method allows
for covariate adjustment and non-stationary data. The construction begins by
noting that the statistical uncertainty of the SC prediction is governed by two
distinct sources of randomness: one coming from the construction of the (likely
misspecified) SC weights in the pre-treatment period, and the other coming from
the unobservable stochastic error in the post-treatment period when the
treatment effect is analyzed. Accordingly, our proposed prediction intervals
are constructed taking into account both sources of randomness. For
implementation, we propose a simulation-based approach along with
finite-sample-based probability bound arguments, naturally leading to
principled sensitivity analysis methods. We illustrate the numerical
performance of our methods using empirical applications and a small simulation
study. Python, R and Stata software packages
implementing our methodology are available.

arXiv link: http://arxiv.org/abs/1912.07120v4

Econometrics arXiv paper, submitted: 2019-12-13

Network Data

Authors: Bryan S. Graham

Many economic activities are embedded in networks: sets of agents and the
(often) rivalrous relationships connecting them to one another. Input sourcing
by firms, interbank lending, scientific research, and job search are four
examples, among many, of networked economic activities. Motivated by the
premise that networks' structures are consequential, this chapter describes
econometric methods for analyzing them. I emphasize (i) dyadic regression
analysis incorporating unobserved agent-specific heterogeneity and supporting
causal inference; (ii) techniques for estimating, and conducting inference on,
summary network parameters (e.g., the degree distribution or transitivity
index); and (iii) empirical models of strategic network formation admitting
interdependencies in preferences. Current research challenges and open
questions are also discussed.

arXiv link: http://arxiv.org/abs/1912.06346v1

Econometrics arXiv paper, submitted: 2019-12-13

Synthetic Control Inference for Staggered Adoption: Estimating the Dynamic Effects of Board Gender Diversity Policies

Authors: Jianfei Cao, Shirley Lu

We introduce a synthetic control methodology to study policies with staggered
adoption. Many policies, such as the board gender quota, are replicated by
other policy setters at different time frames. Our method estimates the dynamic
average treatment effects on the treated using variation introduced by the
staggered adoption of policies. Our method gives asymptotically unbiased
estimators of many interesting quantities and delivers asymptotically valid
inference. By using the proposed method and national labor data in Europe, we
find evidence that quota regulation on board diversity leads to a decrease in
part-time employment, and an increase in full-time employment for female
professionals.

arXiv link: http://arxiv.org/abs/1912.06320v1

Econometrics arXiv updated paper (originally submitted: 2019-12-13)

High-Dimensional Granger Causality Tests with an Application to VIX and News

Authors: Andrii Babii, Eric Ghysels, Jonas Striaukas

We study Granger causality testing for high-dimensional time series using
regularized regressions. To perform proper inference, we rely on
heteroskedasticity and autocorrelation consistent (HAC) estimation of the
asymptotic variance and develop the inferential theory in the high-dimensional
setting. To recognize the time series data structures we focus on the
sparse-group LASSO estimator, which includes the LASSO and the group LASSO as
special cases. We establish the debiased central limit theorem for low
dimensional groups of regression coefficients and study the HAC estimator of
the long-run variance based on the sparse-group LASSO residuals. This leads to
valid time series inference for individual regression coefficients as well as
groups, including Granger causality tests. The treatment relies on a new
Fuk-Nagaev inequality for a class of $\tau$-mixing processes with heavier than
Gaussian tails, which is of independent interest. In an empirical application,
we study the Granger causal relationship between the VIX and financial news.

arXiv link: http://arxiv.org/abs/1912.06307v4

Econometrics arXiv paper, submitted: 2019-12-12

A Regularized Factor-augmented Vector Autoregressive Model

Authors: Maurizio Daniele, Julie Schnaitmann

We propose a regularized factor-augmented vector autoregressive (FAVAR) model
that allows for sparsity in the factor loadings. In this framework, factors may
only load on a subset of variables which simplifies the factor identification
and their economic interpretation. We identify the factors in a data-driven
manner without imposing specific relations between the unobserved factors and
the underlying time series. Using our approach, the effects of structural
shocks can be investigated on economically meaningful factors and on all
observed time series included in the FAVAR model. We prove consistency for the
estimators of the factor loadings, the covariance matrix of the idiosyncratic
component, the factors, as well as the autoregressive parameters in the dynamic
model. In an empirical application, we investigate the effects of a monetary
policy shock on a broad range of economically relevant variables. We identify
this shock using a joint identification of the factor model and the structural
innovations in the VAR model. We find impulse response functions that are in
line with economic rationale, both at the level of the factor aggregates and at
the level of the observed time series.

arXiv link: http://arxiv.org/abs/1912.06049v1

Econometrics arXiv paper, submitted: 2019-12-10

Adaptive Dynamic Model Averaging with an Application to House Price Forecasting

Authors: Alisa Yusupova, Nicos G. Pavlidis, Efthymios G. Pavlidis

Dynamic model averaging (DMA) combines the forecasts of a large number of
dynamic linear models (DLMs) to predict the future value of a time series. The
performance of DMA critically depends on the appropriate choice of two
forgetting factors. The first of these controls the speed of adaptation of the
coefficient vector of each DLM, while the second enables time variation in the
model averaging stage. In this paper we develop a novel, adaptive dynamic model
averaging (ADMA) methodology. The proposed methodology employs a stochastic
optimisation algorithm that sequentially updates the forgetting factor of each
DLM, and uses a state-of-the-art non-parametric model combination algorithm
from the prediction with expert advice literature, which offers finite-time
performance guarantees. An empirical application to quarterly UK house price
data suggests that ADMA produces more accurate forecasts than the benchmark
autoregressive model, as well as competing DMA specifications.

arXiv link: http://arxiv.org/abs/1912.04661v1

Econometrics arXiv cross-link from q-fin.TR (q-fin.TR), submitted: 2019-12-10

Market Price of Trading Liquidity Risk and Market Depth

Authors: Masaaki Kijima, Christopher Ting

Price impact of a trade is an important element in pre-trade and post-trade
analyses. We introduce a framework to analyze the market price of liquidity
risk, which allows us to derive an inhomogeneous Bernoulli ordinary
differential equation. We obtain two closed form solutions, one of which
reproduces the linear function of the order flow in Kyle (1985) for informed
traders. However, when traders are not as asymmetrically informed, an S-shape
function of the order flow is obtained. We perform an empirical intra-day
analysis on Nikkei futures to quantify the price impact of order flow and
compare our results with the industry's heuristic price impact functions. Our
model of order flow yields a rich framework not only for estimating the
liquidity risk parameters, but also for providing a plausible explanation of
why volatility and correlation are stochastic in nature. Finally, we find that
the market depth
encapsulates the market price of liquidity risk.

arXiv link: http://arxiv.org/abs/1912.04565v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2019-12-09

Regularized Estimation of High-dimensional Factor-Augmented Vector Autoregressive (FAVAR) Models

Authors: Jiahe Lin, George Michailidis

A factor-augmented vector autoregressive (FAVAR) model is defined by a VAR
equation that captures lead-lag correlations amongst a set of observed
variables $X$ and latent factors $F$, and a calibration equation that relates
another set of observed variables $Y$ with $F$ and $X$. The latter equation is
used to estimate the factors that are subsequently used in estimating the
parameters of the VAR system. The FAVAR model has become popular in applied
economic research, since it can summarize a large number of variables of
interest as a few factors through the calibration equation and subsequently
examine their influence on core variables of primary interest through the VAR
equation. However, there is increasing need for examining lead-lag
relationships between a large number of time series, while incorporating
information from another high-dimensional set of variables. Hence, in this
paper we investigate the FAVAR model under high-dimensional scaling. We
introduce an appropriate identification constraint for the model parameters,
which when incorporated into the formulated optimization problem yields
estimates with good statistical properties. Further, we address a number of
technical challenges introduced by the fact that estimates of the VAR system
model parameters are based on estimated rather than directly observed
quantities. The performance of the proposed estimators is evaluated on
synthetic data. Further, the model is applied to commodity prices and reveals
interesting and interpretable relationships between the prices and the factors
extracted from a set of global macroeconomic indicators.

arXiv link: http://arxiv.org/abs/1912.04146v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2019-12-09

Approximate Factor Models with Strongly Correlated Idiosyncratic Errors

Authors: Jiahe Lin, George Michailidis

We consider the estimation of approximate factor models for time series data,
where strong serial and cross-sectional correlations amongst the idiosyncratic
component are present. This setting comes up naturally in many applications,
but existing approaches in the literature rely on the assumption that such
correlations are weak, leading to mis-specification of the number of factors
selected and consequently inaccurate inference. In this paper, we explicitly
incorporate the dependent structure present in the idiosyncratic component
through lagged values of the observed multivariate time series. We formulate a
constrained optimization problem to estimate the factor space and the
transition matrices of the lagged values {\em simultaneously}, wherein the
constraints reflect the low rank nature of the common factors and the sparsity
of the transition matrices. We establish theoretical properties of the obtained
estimates, and introduce an easy-to-implement computational procedure for
empirical work. The performance of the model and the implementation procedure
is evaluated on synthetic data and compared with competing approaches, and
further illustrated on a data set involving weekly log-returns of 75 US large
financial institutions for the 2001-2016 period.

arXiv link: http://arxiv.org/abs/1912.04123v1

Econometrics arXiv cross-link from physics.soc-ph (physics.soc-ph), submitted: 2019-12-08

Energy Scenario Exploration with Modeling to Generate Alternatives (MGA)

Authors: Joseph F. DeCarolis, Samaneh Babaee, Binghui Li, Suyash Kanungo

Energy system optimization models (ESOMs) should be used in an interactive
way to uncover knife-edge solutions, explore alternative system configurations,
and suggest different ways to achieve policy objectives under conditions of
deep uncertainty. In this paper, we do so by employing an existing optimization
technique called modeling to generate alternatives (MGA), which involves a
change in the model structure in order to systematically explore the
near-optimal decision space. The MGA capability is incorporated into Tools for
Energy Model Optimization and Analysis (Temoa), an open source framework that
also includes a technology-rich, bottom-up ESOM. In this analysis, Temoa is
used to explore alternative energy futures in a simplified single region energy
system that represents the U.S. electric sector and a portion of the light duty
transport sector. Given the dataset limitations, we place greater emphasis on
the methodological approach rather than specific results.

arXiv link: http://arxiv.org/abs/1912.03788v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2019-12-06

Synthetic Controls with Staggered Adoption

Authors: Eli Ben-Michael, Avi Feller, Jesse Rothstein

Staggered adoption of policies by different units at different times creates
promising opportunities for observational causal inference. Estimation remains
challenging, however, and common regression methods can give misleading
results. A promising alternative is the synthetic control method (SCM), which
finds a weighted average of control units that closely balances the treated
unit's pre-treatment outcomes. In this paper, we generalize SCM, originally
designed to study a single treated unit, to the staggered adoption setting. We
first bound the error for the average effect and show that it depends on both
the imbalance for each treated unit separately and the imbalance for the
average of the treated units. We then propose "partially pooled" SCM weights to
minimize a weighted combination of these measures; approaches that focus only
on balancing one of the two components can lead to bias. We extend this
approach to incorporate unit-level intercept shifts and auxiliary covariates.
We assess the performance of the proposed method via extensive simulations and
apply our results to the question of whether teacher collective bargaining
leads to higher school spending, finding minimal impacts. We implement the
proposed method in the augsynth R package.

arXiv link: http://arxiv.org/abs/1912.03290v2

Econometrics arXiv updated paper (originally submitted: 2019-12-06)

High-frequency and heteroskedasticity identification in multicountry models: Revisiting spillovers of monetary shocks

Authors: Michael Pfarrhofer, Anna Stelzer

We explore the international transmission of monetary policy and central bank
information shocks originating from the United States and the euro area.
Employing a panel vector autoregression, we use macroeconomic and financial
variables across several major economies to address both static and dynamic
spillovers. To identify structural shocks, we introduce a novel approach that
combines external instruments with heteroskedasticity-based identification and
sign restrictions. Our results suggest significant spillovers from European
Central Bank and Federal Reserve policies to each other's economies, global
aggregates, and other countries. These effects are more pronounced for central
bank information shocks than for pure monetary policy shocks, and the dominance
of the US in the global economy is reflected in our findings.

arXiv link: http://arxiv.org/abs/1912.03158v2

Econometrics arXiv paper, submitted: 2019-12-06

Triple the gamma -- A unifying shrinkage prior for variance and variable selection in sparse state space and TVP models

Authors: Annalisa Cadonna, Sylvia Frühwirth-Schnatter, Peter Knaus

Time-varying parameter (TVP) models are very flexible in capturing gradual
changes in the effect of a predictor on the outcome variable. However, in
particular when the number of predictors is large, there is a known risk of
overfitting and poor predictive performance, since the effect of some
predictors is constant over time. We propose a prior for variance shrinkage in
TVP models, called triple gamma. The triple gamma prior encompasses a number of
priors that have been suggested previously, such as the Bayesian lasso, the
double gamma prior and the Horseshoe prior. We present the desirable properties
of such a prior and its relationship to Bayesian Model Averaging for variance
selection. The features of the triple gamma prior are then illustrated in the
context of time varying parameter vector autoregressive models, both for
simulated datasets and for a series of macroeconomic variables in the Euro
Area.

arXiv link: http://arxiv.org/abs/1912.03100v1

Econometrics arXiv paper, submitted: 2019-12-04

Estimating Large Mixed-Frequency Bayesian VAR Models

Authors: Sebastian Ankargren, Paulina Jonéus

We discuss the issue of estimating large-scale vector autoregressive (VAR)
models with stochastic volatility in real-time situations where data are
sampled at different frequencies. In the case of a large VAR with stochastic
volatility, the mixed-frequency data warrant an additional step in the already
computationally challenging Markov Chain Monte Carlo algorithm used to sample
from the posterior distribution of the parameters. We suggest the use of a
factor stochastic volatility model to capture a time-varying error covariance
structure. Because the factor stochastic volatility model renders the equations
of the VAR conditionally independent, settling for this particular stochastic
volatility model comes with major computational benefits. First, we are able to
improve upon the mixed-frequency simulation smoothing step by leveraging a
univariate and adaptive filtering algorithm. Second, the regression parameters
can be sampled equation-by-equation in parallel. These computational features
of the model alleviate the computational burden and make it possible to move
the mixed-frequency VAR to the high-dimensional regime. We illustrate the model
by an application to US data using our mixed-frequency VAR with 20, 34 and 119
variables.

arXiv link: http://arxiv.org/abs/1912.02231v1

Econometrics arXiv updated paper (originally submitted: 2019-12-04)

High Dimensional Latent Panel Quantile Regression with an Application to Asset Pricing

Authors: Alexandre Belloni, Mingli Chen, Oscar Hernan Madrid Padilla, Zixuan Wang

We propose a generalization of the linear panel quantile regression model to
accommodate both sparse and dense parts: sparse means that, while the number of
covariates available is large, potentially only a much smaller number of them
have a nonzero impact on each conditional quantile of the response variable;
the dense part is represented by a low-rank matrix that can be approximated by
latent factors and their loadings. Such a structure poses problems for
traditional sparse estimators, such as the $\ell_1$-penalised Quantile
Regression, and for traditional latent factor estimators, such as PCA. We
propose a new estimation procedure, based on the ADMM algorithm, which combines
the quantile loss function with $\ell_1$ and nuclear norm regularization. We
show, under general conditions, that our estimator can consistently estimate
both the nonzero coefficients of the covariates and the latent low-rank matrix.
Our proposed model has a "Characteristics + Latent Factors" Asset Pricing
Model interpretation: we apply our model and estimator to a large-dimensional
panel of financial data and find that (i) characteristics have sparser
predictive power once latent factors are controlled for, and (ii) the factors
and coefficients at upper and lower quantiles are different from those at the
median.

arXiv link: http://arxiv.org/abs/1912.02151v2
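
As a rough illustration of the building blocks behind such an ADMM-type scheme,
the sketch below (not the authors' code) shows the two proximal maps that a
"quantile loss + $\ell_1$ + nuclear norm" objective typically requires:
entrywise soft-thresholding for the sparse part and singular value
thresholding for the low-rank part.

    # Illustrative sketch (not the authors' code): the two proximal maps that an
    # ADMM-type scheme for "quantile loss + l1 + nuclear norm" would alternate.
    import numpy as np

    def soft_threshold(B, lam):
        """Prox of lam*||B||_1: entrywise soft-thresholding (sparse part)."""
        return np.sign(B) * np.maximum(np.abs(B) - lam, 0.0)

    def svd_threshold(L, mu):
        """Prox of mu*||L||_*: soft-threshold the singular values (low-rank part)."""
        U, s, Vt = np.linalg.svd(L, full_matrices=False)
        return U @ np.diag(np.maximum(s - mu, 0.0)) @ Vt

    def check_loss(u, tau):
        """Quantile (check) loss at level tau."""
        return np.mean(u * (tau - (u < 0)))

    # toy usage on a noisy panel
    rng = np.random.default_rng(0)
    Y = rng.standard_normal((50, 20))
    print(check_loss(Y.ravel(), tau=0.5))
    print(np.linalg.matrix_rank(svd_threshold(Y, mu=5.0)))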

Econometrics arXiv paper, submitted: 2019-12-03

Bilinear form test statistics for extremum estimation

Authors: Federico Crudu, Felipe Osorio

This paper develops a set of test statistics based on bilinear forms in the
context of the extremum estimation framework, with particular interest in
nonlinear hypotheses. We show that the proposed statistic converges to a
conventional chi-square limit. A Monte Carlo experiment suggests that the test
statistic works well in finite samples.

arXiv link: http://arxiv.org/abs/1912.01410v1

Econometrics arXiv paper, submitted: 2019-12-03

Mean-shift least squares model averaging

Authors: Kenichiro McAlinn, Kosaku Takanashi

This paper proposes a new estimator for selecting weights to average over
least squares estimates obtained from a set of models. Our proposed estimator
builds on the Mallows model average (MMA) estimator of Hansen (2007), but,
unlike MMA, simultaneously controls for location bias and regression error
through a common constant. We show that our proposed estimator, the mean-shift
Mallows model average (MSA) estimator, is asymptotically optimal relative to
the original MMA estimator in terms of mean squared error. A simulation study
is presented in which our proposed estimator uniformly outperforms the MMA
estimator.

arXiv link: http://arxiv.org/abs/1912.01194v1
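
For reference, the sketch below implements the original Mallows model averaging
criterion of Hansen (2007) over nested OLS models, choosing simplex weights by
constrained optimization; the mean-shift adjustment proposed in the paper is
not implemented, and the nested-model setup and simulated data are illustrative
assumptions.

    # Minimal sketch of Mallows model averaging (Hansen, 2007) over nested OLS
    # models; the paper's mean-shift common constant is not implemented here.
    import numpy as np
    from scipy.optimize import minimize

    def mma_weights(y, X, model_sizes):
        n = len(y)
        fits, ks = [], []
        for k in model_sizes:                          # nested candidate models
            Xk = X[:, :k]
            beta = np.linalg.lstsq(Xk, y, rcond=None)[0]
            fits.append(Xk @ beta)
            ks.append(k)
        F = np.column_stack(fits)
        kmax = max(model_sizes)
        resid_full = y - F[:, np.argmax(model_sizes)]
        sigma2 = resid_full @ resid_full / (n - kmax)  # variance from largest model

        def mallows(w):                                # C_n(w) = ||y - Fw||^2 + 2*sigma2*k'w
            r = y - F @ w
            return r @ r + 2.0 * sigma2 * np.dot(ks, w)

        m = F.shape[1]
        cons = ({'type': 'eq', 'fun': lambda w: w.sum() - 1.0},)
        res = minimize(mallows, np.full(m, 1.0 / m), bounds=[(0, 1)] * m,
                       constraints=cons, method='SLSQP')
        return res.x

    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 8))
    y = X[:, :3] @ np.array([1.0, 0.5, 0.25]) + rng.standard_normal(200)
    print(mma_weights(y, X, model_sizes=[1, 2, 4, 8]).round(3))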

Econometrics arXiv cross-link from q-fin.GN (q-fin.GN), submitted: 2019-12-02

Stylized Facts and Agent-Based Modeling

Authors: Simon Cramer, Torsten Trimborn

The existence of stylized facts in financial data has been documented in many
studies. In the past decade the modeling of financial markets by agent-based
computational economic market models has become a frequently used modeling
approach. The main purpose of these models is to replicate stylized facts and
to identify sufficient conditions for their creation. In this paper we
introduce the most prominent examples of stylized facts, with a particular
focus on stylized facts of financial data. Furthermore, we give an
introduction to agent-based modeling. Here, we not only provide an overview of
this topic but
introduce the idea of universal building blocks for agent-based economic market
models.

arXiv link: http://arxiv.org/abs/1912.02684v1

Econometrics arXiv updated paper (originally submitted: 2019-12-02)

Clustering and External Validity in Randomized Controlled Trials

Authors: Antoine Deeb, Clément de Chaisemartin

The randomization inference literature studying randomized controlled trials
(RCTs) assumes that units' potential outcomes are deterministic. This
assumption is unlikely to hold, as stochastic shocks may take place during the
experiment. In this paper, we consider the case of an RCT with individual-level
treatment assignment, and we allow for individual-level and cluster-level (e.g.
village-level) shocks. We show that one can draw inference on the ATE
conditional on the realizations of the cluster-level shocks, using
heteroskedasticity-robust standard errors, or on the ATE netted out of those
shocks, using cluster-robust standard errors.

arXiv link: http://arxiv.org/abs/1912.01052v7
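
The sketch below illustrates the two variance choices contrasted in the
abstract on simulated data with village-level shocks: heteroskedasticity-robust
standard errors (inference on the ATE conditional on the realized cluster
shocks) versus cluster-robust standard errors (inference on the ATE netted out
of those shocks). The data-generating process is an assumption for
illustration only.

    # Sketch: the same ATE regression with the two variance estimators contrasted
    # in the abstract (simulated data; not the authors' empirical setting).
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n_clusters, n_per = 50, 20
    village = np.repeat(np.arange(n_clusters), n_per)
    cluster_shock = rng.normal(0, 1, n_clusters)[village]     # village-level shock
    d = rng.binomial(1, 0.5, n_clusters * n_per)              # individual-level assignment
    y = 1.0 * d + cluster_shock + rng.normal(0, 1, len(d))

    X = sm.add_constant(d)
    # ATE conditional on realized cluster shocks: heteroskedasticity-robust SEs
    fit_hc = sm.OLS(y, X).fit(cov_type='HC1')
    # ATE netted out of cluster shocks: cluster-robust SEs
    fit_cl = sm.OLS(y, X).fit(cov_type='cluster', cov_kwds={'groups': village})
    print(fit_hc.bse[1], fit_cl.bse[1])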

Econometrics arXiv paper, submitted: 2019-12-02

A multifactor regime-switching model for inter-trade durations in the limit order market

Authors: Zhicheng Li, Haipeng Xing, Xinyun Chen

This paper studies inter-trade durations in the NASDAQ limit order market and
finds that inter-trade durations at ultra-high frequency have two modes. One
mode is on the order of $10^{-4}$ seconds, and the other is on the order of 1
second. This phenomenon and other empirical evidence suggest that there are
two regimes associated with the dynamics of inter-trade durations, and that
the regime switchings are driven by high-frequency traders (HFTs) switching
between providing and taking liquidity. To examine how the two modes depend on
information in the limit order book (LOB), we propose a two-state multifactor
regime-switching (MF-RSD) model for inter-trade durations, in which the
transition probability matrices are time-varying and depend on lagged LOB
factors. The MF-RSD model has good in-sample fit and superior out-of-sample
performance compared with some benchmark duration models. Our findings on the
effects of LOB factors on inter-trade durations help to further the
understanding of high-frequency market microstructure.

arXiv link: http://arxiv.org/abs/1912.00764v1

Econometrics arXiv updated paper (originally submitted: 2019-11-29)

Semiparametric Quantile Models for Ascending Auctions with Asymmetric Bidders

Authors: Jayeeta Bhattacharya, Nathalie Gimenes, Emmanuel Guerre

The paper proposes a parsimonious and flexible semiparametric quantile
regression specification for asymmetric bidders within the independent private
value framework. Asymmetry is parameterized using powers of a parent private
value distribution, which is generated by a quantile regression specification.
As noted in Cantillon (2008), this covers and extends models used for
efficient collusion, joint bidding and mergers among homogeneous bidders. The
specification can be estimated for ascending auctions using the winning bids
and the winner's identity. The estimation proceeds in two stages. The
asymmetry parameters are estimated from the winner's identity using a simple
maximum likelihood procedure. The parent quantile regression specification can
be estimated using simple modifications of Gimenes (2017). Specification
testing procedures are also considered. A timber application reveals that
weaker bidders have a 30% lower chance of winning the auction than stronger
ones. It is also found that increasing participation in an asymmetric
ascending auction may not be as beneficial as using an optimal reserve price,
as would have been expected from the result of Bulow and Klemperer (1996),
which is valid under symmetry.

arXiv link: http://arxiv.org/abs/1911.13063v2

Econometrics arXiv updated paper (originally submitted: 2019-11-28)

Inference under random limit bootstrap measures

Authors: Giuseppe Cavaliere, Iliyan Georgiev

Asymptotic bootstrap validity is usually understood as consistency of the
distribution of a bootstrap statistic, conditional on the data, for the
unconditional limit distribution of a statistic of interest. From this
perspective, randomness of the limit bootstrap measure is regarded as a failure
of the bootstrap. We show that such limiting randomness does not necessarily
invalidate bootstrap inference if validity is understood as control over the
frequency of correct inferences in large samples. We first establish sufficient
conditions for asymptotic bootstrap validity in cases where the unconditional
limit distribution of a statistic can be obtained by averaging a (random)
limiting bootstrap distribution. Further, we provide results ensuring the
asymptotic validity of the bootstrap as a tool for conditional inference, the
leading case being that where a bootstrap distribution estimates consistently a
conditional (and thus, random) limit distribution of a statistic. We apply our
framework to several inference problems in econometrics, including linear
models with possibly non-stationary regressors, functional CUSUM statistics,
conditional Kolmogorov-Smirnov specification tests, the `parameter on the
boundary' problem and tests for constancy of parameters in dynamic econometric
models.

arXiv link: http://arxiv.org/abs/1911.12779v2

Econometrics arXiv paper, submitted: 2019-11-28

An Integrated Early Warning System for Stock Market Turbulence

Authors: Peiwan Wang, Lu Zong, Ye Ma

This study constructs an integrated early warning system (EWS) that
identifies and predicts stock market turbulence. Based on switching ARCH
(SWARCH) filtering probabilities of the high-volatility regime, the proposed
EWS first classifies stock market crises according to an indicator function
with thresholds dynamically selected by the two-peak method. A hybrid
algorithm is then developed in the framework of a long short-term memory
(LSTM) network to make daily predictions that warn of turmoil. In an empirical
evaluation based on ten years of Chinese stock data, the proposed EWS yields
satisfactory results, with a test-set accuracy of 96.6% and an average
forewarning window of 2.4 days. The model's stability and practical value in
real-time decision-making are also demonstrated through cross-validation and
back-testing.

arXiv link: http://arxiv.org/abs/1911.12596v1

Econometrics arXiv updated paper (originally submitted: 2019-11-25)

Predicting crashes in oil prices during the COVID-19 pandemic with mixed causal-noncausal models

Authors: Alain Hecq, Elisa Voisin

This paper aims at shedding light on how transforming or detrending a series
can substantially impact the predictions of mixed causal-noncausal (MAR)
models, namely dynamic processes that depend not only on their lags but also
on their leads. MAR models have been successfully applied to commodity prices,
as they can generate nonlinear features such as locally explosive episodes
(denoted here as bubbles) in a strictly stationary setting. We consider
multiple detrending methods and investigate, using Monte Carlo simulations, to
what extent they preserve the bubble patterns observed in the raw data. MAR
models rely on the dynamics observed in the series alone and do not require
economic theory to construct a structural model, which can sometimes be
intricate to specify or may lack parsimony. We investigate oil prices and
estimate probabilities of crashes before and during the first 2020 wave of the
COVID-19 pandemic. We consider three different mechanical detrending methods
and compare them to a detrending based on the level of strategic petroleum
reserves.

arXiv link: http://arxiv.org/abs/1911.10916v3

Econometrics arXiv paper, submitted: 2019-11-24

High-Dimensional Forecasting in the Presence of Unit Roots and Cointegration

Authors: Stephan Smeekes, Etienne Wijler

We investigate how the possible presence of unit roots and cointegration
affects forecasting with Big Data. As most macroeconomic time series are very
persistent and may contain unit roots, a proper handling of unit roots and
cointegration is of paramount importance for macroeconomic forecasting. The
high-dimensional nature of Big Data complicates the analysis of unit roots and
cointegration in two ways. First, transformations to stationarity require
performing many unit root tests, increasing room for errors in the
classification. Second, modelling unit roots and cointegration directly is more
difficult, as standard high-dimensional techniques such as factor models and
penalized regression are not directly applicable to (co)integrated data and
need to be adapted. We provide an overview of both issues and review methods
proposed to address these issues. These methods are also illustrated with two
empirical applications.

arXiv link: http://arxiv.org/abs/1911.10552v1

Econometrics arXiv paper, submitted: 2019-11-24

Topologically Mapping the Macroeconomy

Authors: Pawel Dlotko, Simon Rudkin, Wanling Qiu

An understanding of the economic landscape in a world of ever-increasing data
necessitates representations of data that can inform policy, deepen
understanding and guide future research. Topological Data Analysis offers a
set of tools which deliver on all three calls. Abstract two-dimensional
snapshots of multi-dimensional space readily capture non-monotonic
relationships, reveal similarity between points of interest in parameter
space, and map these to outcomes. Specific examples show how some, but not
all, countries have returned to Great Depression levels, and reappraise the
links between real private capital growth and the performance of the economy.
Theoretical and empirical expositions alike remind us of the dangers of
assuming monotonic relationships and of discounting combinations of factors as
determinants of outcomes; Topological Data Analysis addresses both dangers.
Policy-makers can look at outcomes and target areas of the input space where
these are unsatisfactory, academics may additionally find evidence to motivate
theoretical development, and practitioners can gain a rapid and robust basis
for decision making.

arXiv link: http://arxiv.org/abs/1911.10476v1

Econometrics arXiv cross-link from q-fin.TR (q-fin.TR), submitted: 2019-11-24

A singular stochastic control approach for optimal pairs trading with proportional transaction costs

Authors: Haipeng Xing

Optimal trading strategies for pairs trading have been studied using models
that determine either the optimal number of shares, assuming no transaction
costs, or the optimal timing of trading a fixed number of shares in the
presence of transaction costs. To find optimal strategies that determine both
the trade times and the number of shares in the pairs trading process, we use
a singular stochastic
control approach to study an optimal pairs trading problem with proportional
transaction costs. Assuming a cointegrated relationship for a pair of stock
log-prices, we consider a portfolio optimization problem which involves dynamic
trading strategies with proportional transaction costs. We show that the value
function of the control problem is the unique viscosity solution of a nonlinear
quasi-variational inequality, which is equivalent to a free boundary problem
for the singular stochastic control value function. We then develop a discrete
time dynamic programming algorithm to compute the transaction regions, and show
the convergence of the discretization scheme. We illustrate our approach with
numerical examples and discuss the impact of different parameters on
transaction regions. We study the out-of-sample performance in an empirical
study that consists of six pairs of U.S. stocks selected from different
industry sectors, and demonstrate the efficiency of the optimal strategy.

arXiv link: http://arxiv.org/abs/1911.10450v1

Econometrics arXiv updated paper (originally submitted: 2019-11-22)

Uniform inference for value functions

Authors: Sergio Firpo, Antonio F. Galvao, Thomas Parker

We propose a method to conduct uniform inference for the (optimal) value
function, that is, the function that results from optimizing an objective
function marginally over one of its arguments. Marginal optimization is not
Hadamard differentiable (that is, compactly differentiable) as a map between
the spaces of objective and value functions, which is problematic because
standard inference methods for nonlinear maps usually rely on Hadamard
differentiability. However, we show that the map from the objective function
to an $L_p$ functional of the value function, for $1 \leq p \leq \infty$, is
Hadamard directionally differentiable. As a result, we establish consistency
and weak
convergence of nonparametric plug-in estimates of Cram\'er-von Mises and
Kolmogorov-Smirnov test statistics applied to value functions. For practical
inference, we develop detailed resampling techniques that combine a bootstrap
procedure with estimates of the directional derivatives. In addition, we
establish local size control of tests which use the resampling procedure. Monte
Carlo simulations assess the finite-sample properties of the proposed methods
and show accurate empirical size and nontrivial power of the procedures.
Finally, we apply our methods to the evaluation of a job training program using
bounds for the distribution function of treatment effects.

arXiv link: http://arxiv.org/abs/1911.10215v7

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2019-11-21

A Practical Introduction to Regression Discontinuity Designs: Foundations

Authors: Matias D. Cattaneo, Nicolas Idrobo, Rocio Titiunik

In this Element and its accompanying Element, Matias D. Cattaneo, Nicolas
Idrobo, and Rocio Titiunik provide an accessible and practical guide for the
analysis and interpretation of Regression Discontinuity (RD) designs that
encourages the use of a common set of practices and facilitates the
accumulation of RD-based empirical evidence. In this Element, the authors
discuss the foundations of the canonical Sharp RD design, which has the
following features: (i) the score is continuously distributed and has only one
dimension, (ii) there is only one cutoff, and (iii) compliance with the
treatment assignment is perfect. In the accompanying Element, the authors
discuss practical and conceptual extensions to the basic RD setup.

arXiv link: http://arxiv.org/abs/1911.09511v1

Econometrics arXiv paper, submitted: 2019-11-21

Hybrid quantile estimation for asymmetric power GARCH models

Authors: Guochang Wang, Ke Zhu, Guodong Li, Wai Keung Li

Asymmetric power GARCH models have been widely used to study the higher order
moments of financial returns, while their quantile estimation has been rarely
investigated. This paper introduces a simple monotonic transformation on its
conditional quantile function to make the quantile regression tractable. The
asymptotic normality of the resulting quantile estimators is established under
either stationarity or non-stationarity. Moreover, based on the estimation
procedure, new tests for strict stationarity and asymmetry are also
constructed. This is the first attempt at quantile estimation for
non-stationary ARCH-type models in the literature. The usefulness of the
proposed methodology is illustrated by simulation results and real data
analysis.

arXiv link: http://arxiv.org/abs/1911.09343v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2019-11-21

Regression Discontinuity Design under Self-selection

Authors: Sida Peng, Yang Ning

In Regression Discontinuity (RD) design, self-selection leads to different
distributions of covariates on two sides of the policy intervention, which
essentially violates the continuity of potential outcome assumption. The
standard RD estimand becomes difficult to interpret due to the existence of
an indirect effect, i.e. the effect due to self-selection. We show that the
direct causal effect of interest can still be recovered under a class of
estimands. Specifically, we consider a class of weighted average treatment
effects tailored for potentially different target populations. We show that a
special case of our estimands can recover the average treatment effect under
the conditional independence assumption per Angrist and Rokkanen (2015), and
another example is the estimand recently proposed in Fr\"olich and Huber
(2018). We propose a set of estimators through a weighted local linear
regression framework and prove the consistency and asymptotic normality of the
estimators. Our approach can be further extended to the fuzzy RD case. In
simulation exercises, we compare the performance of our estimator with the
standard RD estimator. Finally, we apply our method to two empirical data sets:
the U.S. House elections data in Lee (2008) and a novel data set from Microsoft
Bing on Generalized Second Price (GSP) auction.

arXiv link: http://arxiv.org/abs/1911.09248v1
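
For comparison, the sketch below computes the standard sharp-RD local linear
estimate with a triangular kernel at a fixed bandwidth, i.e., the benchmark
estimator the paper compares against; the weighted estimand class proposed in
the paper is not implemented, and the bandwidth and simulated data are
illustrative assumptions.

    # Baseline sharp-RD local linear estimate with a triangular kernel, standing
    # in for the standard RD estimator used as a benchmark. Bandwidth is fixed
    # for simplicity; data are simulated.
    import numpy as np

    def local_linear_rd(y, x, cutoff=0.0, h=0.5):
        def side_fit(mask):
            xs, ys = x[mask] - cutoff, y[mask]
            w = np.maximum(1 - np.abs(xs) / h, 0)          # triangular kernel
            W = np.diag(w)
            Z = np.column_stack([np.ones_like(xs), xs])
            coef = np.linalg.solve(Z.T @ W @ Z, Z.T @ W @ ys)
            return coef[0]                                  # intercept = limit at cutoff
        above = x >= cutoff
        return side_fit(above) - side_fit(~above)

    rng = np.random.default_rng(3)
    x = rng.uniform(-1, 1, 2000)
    y = 0.5 * x + 1.0 * (x >= 0) + rng.normal(0, 0.3, 2000)
    print(local_linear_rd(y, x))    # should be close to the true jump of 1.0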

Econometrics arXiv paper, submitted: 2019-11-20

A Flexible Mixed-Frequency Vector Autoregression with a Steady-State Prior

Authors: Sebastian Ankargren, Måns Unosson, Yukai Yang

We propose a Bayesian vector autoregressive (VAR) model for mixed-frequency
data. Our model is based on the mean-adjusted parametrization of the VAR and
allows for an explicit prior on the 'steady states' (unconditional means) of
the included variables. Based on recent developments in the literature, we
discuss extensions of the model that improve the flexibility of the modeling
approach. These extensions include a hierarchical shrinkage prior for the
steady-state parameters, and the use of stochastic volatility to model
heteroskedasticity. We put the proposed model to use in a forecast evaluation
using US data consisting of 10 monthly and 3 quarterly variables. The results
show that the predictive ability typically benefits from using mixed-frequency
data, and that improvements can be obtained for both monthly and quarterly
variables. We also find that the steady-state prior generally enhances the
accuracy of the forecasts, and that accounting for heteroskedasticity by means
of stochastic volatility usually provides additional improvements, although not
for all variables.

arXiv link: http://arxiv.org/abs/1911.09151v1

Econometrics arXiv paper, submitted: 2019-11-20

A Scrambled Method of Moments

Authors: Jean-Jacques Forneron

Quasi-Monte Carlo (qMC) methods are a powerful alternative to classical
Monte-Carlo (MC) integration. Under certain conditions, they can approximate
the desired integral at a faster rate than the usual Central Limit Theorem,
resulting in more accurate estimates. This paper explores these methods in a
simulation-based estimation setting with an emphasis on the scramble of Owen
(1995). For cross-sections and short-panels, the resulting Scrambled Method of
Moments simply replaces the random number generator with the scramble
(available in most software packages) to reduce simulation noise. Scrambled Indirect
Inference estimation is also considered. For time series, qMC may not apply
directly because of a curse of dimensionality on the time dimension. A simple
algorithm and a class of moments which circumvent this issue are described.
Asymptotic results are given for each algorithm. Monte-Carlo examples
illustrate these results in finite samples, including an income process with
"lots of heterogeneity."

arXiv link: http://arxiv.org/abs/1911.09128v1
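
A minimal sketch of the idea, assuming a toy lognormal model and illustrative
moments: replace the uniform pseudo-random draws in a simulated method of
moments objective with an Owen-scrambled Sobol' sequence from scipy.stats.qmc.

    # Sketch: swap the RNG for an Owen-scrambled Sobol' sequence inside a
    # simulated method of moments objective (toy lognormal model; illustrative
    # moments, not the paper's applications).
    import numpy as np
    from scipy.stats import qmc, norm

    def simulate(theta, u):
        mu, sigma = theta
        return np.exp(mu + sigma * norm.ppf(u))        # lognormal draws from uniforms

    def smm_objective(theta, u, data_moments):
        sims = simulate(theta, u)
        sim_moments = np.array([sims.mean(), sims.var()])
        g = sim_moments - data_moments
        return g @ g                                    # identity weighting for simplicity

    rng = np.random.default_rng(4)
    data = np.exp(0.2 + 0.5 * rng.standard_normal(5000))
    data_moments = np.array([data.mean(), data.var()])

    u_mc = rng.uniform(size=4096)                                       # plain Monte Carlo
    u_qmc = qmc.Sobol(d=1, scramble=True, seed=4).random(4096).ravel()  # scrambled Sobol'
    theta0 = (0.2, 0.5)
    print(smm_objective(theta0, u_mc, data_moments),
          smm_objective(theta0, u_qmc, data_moments))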

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2019-11-20

Competition of noise and collectivity in global cryptocurrency trading: route to a self-contained market

Authors: Stanisław Drożdż, Ludovico Minati, Paweł Oświęcimka, Marek Stanuszek, Marcin Wątorek

Cross-correlations in fluctuations of the daily exchange rates within the
basket of the 100 highest-capitalization cryptocurrencies over the period
October 1, 2015, through March 31, 2019, are studied. The corresponding
dynamics predominantly involve one leading eigenvalue of the correlation
matrix, while the others largely coincide with those of Wishart random
matrices. However, the magnitude of the principal eigenvalue, and thus the
degree of collectivity, strongly depends on which cryptocurrency is used as a
base. It is largest when the base is the most peripheral cryptocurrency; when
more significant ones are taken into consideration, its magnitude
systematically decreases, nevertheless preserving a sizable gap with respect to
the random bulk, which in turn indicates that the organization of correlations
becomes more heterogeneous. This finding provides a criterion for recognizing
which currencies or cryptocurrencies play a dominant role in the global
crypto-market. The present study shows that, over the period under
consideration, Bitcoin (BTC) predominates, hallmarking exchange rate dynamics
at least as influential as those of the US dollar. BTC started dominating
around the year 2017, while other cryptocurrencies, such as Ethereum (ETH)
and even Ripple (XRP), followed similar trends. At the same time, the USD, an
original value determinant for the cryptocurrency market, became increasingly
disconnected, its related characteristics eventually approaching those of a
fictitious currency. These results are strong indicators of incipient
independence of the global cryptocurrency market, delineating a self-contained
trade resembling the Forex.

arXiv link: http://arxiv.org/abs/1911.08944v2
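
The basic diagnostic described here can be sketched as follows on synthetic
one-factor returns standing in for the cryptocurrency panel: compute the
return correlation matrix, extract its leading eigenvalue, and compare the
spectrum with the Marchenko-Pastur bulk edge expected for a Wishart random
matrix.

    # Sketch of the collectivity diagnostic: leading eigenvalue of the return
    # correlation matrix versus the Marchenko-Pastur bulk edge for a Wishart
    # (purely random) matrix. Synthetic one-factor returns, not the paper's data.
    import numpy as np

    rng = np.random.default_rng(5)
    T, N = 900, 100
    market = rng.standard_normal(T)
    returns = 0.4 * market[:, None] + rng.standard_normal((T, N))  # common factor + noise

    C = np.corrcoef(returns, rowvar=False)           # N x N correlation matrix
    eigvals = np.sort(np.linalg.eigvalsh(C))[::-1]
    q = N / T
    mp_upper = (1 + np.sqrt(q)) ** 2                 # Marchenko-Pastur upper edge
    print("leading eigenvalue:", round(eigvals[0], 2))
    print("MP bulk edge:", round(mp_upper, 2))
    print("eigenvalues above the bulk:", int((eigvals > mp_upper).sum()))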

Econometrics arXiv paper, submitted: 2019-11-20

Statistical Inference on Partially Linear Panel Model under Unobserved Linearity

Authors: Ruiqi Liu, Ben Boukai, Zuofeng Shang

A new statistical procedure, based on a modified spline basis, is proposed to
identify the linear components in the panel data model with fixed effects.
Under some mild assumptions, the proposed procedure is shown to consistently
estimate the underlying regression function, correctly select the linear
components, and effectively conduct statistical inference. When compared to
existing methods for detection of linearity in the panel model, our approach is
demonstrated to be theoretically justified as well as practically convenient.
We provide a computational algorithm that implements the proposed procedure
along with a path-based solution method for linearity detection, which avoids
the burden of selecting the tuning parameter for the penalty term. Monte Carlo
simulations are conducted to examine the finite sample performance of our
proposed procedure with detailed findings that confirm our theoretical results
in the paper. Applications to Aggregate Production and Environmental Kuznets
Curve data also illustrate the necessity for detecting linearity in the
partially linear panel model.

arXiv link: http://arxiv.org/abs/1911.08830v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2019-11-20

Equivariant online predictions of non-stationary time series

Authors: Kōsaku Takanashi, Kenichiro McAlinn

We discuss the finite sample theoretical properties of online predictions in
non-stationary time series under model misspecification. To analyze the
theoretical predictive properties of statistical methods under this setting, we
first define the Kullback-Leibler risk, in order to place the problem within a
decision theoretic framework. Under this framework, we show that a specific
class of dynamic models -- random walk dynamic linear models -- produce exact
minimax predictive densities. We show this result first under Gaussian
assumptions, and then relax the assumption using semi-martingale processes.
This result provides a theoretical baseline, under both non-stationary and
stationary time series data, against which other models can be compared. We
extend the result to the synthesis of multiple predictive densities. Three
topical applications in epidemiology, climatology, and economics, confirm and
highlight our theoretical results.

arXiv link: http://arxiv.org/abs/1911.08662v5
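
A minimal sketch of the model class highlighted above: the one-step-ahead
predictive mean and variance of a random-walk (local level) dynamic linear
model obtained from the Kalman recursions. The observation and state variances
are fixed illustrative values, not estimates.

    # Sketch: one-step-ahead predictive mean/variance from a random-walk (local
    # level) dynamic linear model via the Kalman filter. Variances are fixed and
    # illustrative, not estimated.
    import numpy as np

    def local_level_predictive(y, obs_var=1.0, state_var=0.1, m0=0.0, c0=1e6):
        m, c = m0, c0
        pred_mean, pred_var = [], []
        for yt in y:
            r = c + state_var            # predictive state variance
            f = r + obs_var              # one-step-ahead predictive variance of y_t
            pred_mean.append(m)
            pred_var.append(f)
            k = r / f                    # Kalman gain
            m = m + k * (yt - m)         # filtered mean
            c = (1 - k) * r              # filtered variance
        return np.array(pred_mean), np.array(pred_var)

    rng = np.random.default_rng(6)
    level = np.cumsum(rng.normal(0, 0.3, 200))       # non-stationary signal
    y = level + rng.normal(0, 1.0, 200)
    mean, var = local_level_predictive(y)
    print(mean[-3:], var[-3:])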

Econometrics arXiv updated paper (originally submitted: 2019-11-20)

Robust Inference on Infinite and Growing Dimensional Time Series Regression

Authors: Abhimanyu Gupta, Myung Hwan Seo

We develop a class of tests for time series models such as multiple
regression with growing dimension, infinite-order autoregression and
nonparametric sieve regression. Examples include the Chow test and general
linear restriction tests of growing rank $p$. Employing such increasing $p$
asymptotics, we introduce a new scale correction to conventional test
statistics which accounts for a high-order long-run variance (HLV) that emerges
as $ p $ grows with sample size. We also propose a bias correction via a
null-imposed bootstrap to alleviate finite sample bias without sacrificing
power unduly. A simulation study shows the importance of robustifying testing
procedures against the HLV even when $ p $ is moderate. The tests are
illustrated with an application to the oil regressions in Hamilton (2003).

arXiv link: http://arxiv.org/abs/1911.08637v4

Econometrics arXiv updated paper (originally submitted: 2019-11-19)

Synthetic Controls with Imperfect Pre-Treatment Fit

Authors: Bruno Ferman, Cristine Pinto

We analyze the properties of the Synthetic Control (SC) and related
estimators when the pre-treatment fit is imperfect. In this framework, we show
that these estimators are generally biased if treatment assignment is
correlated with unobserved confounders, even when the number of pre-treatment
periods goes to infinity. Still, we show that a demeaned version of the SC
method can substantially improve in terms of bias and variance relative to the
difference-in-difference estimator. We also derive a specification test for the
demeaned SC estimator in this setting with imperfect pre-treatment fit. Given
our theoretical results, we provide practical guidance for applied researchers
on how to justify the use of such estimators in empirical applications.

arXiv link: http://arxiv.org/abs/1911.08521v2
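
A minimal sketch of the demeaning idea on simulated data: demean treated and
donor pre-treatment outcomes, then choose non-negative weights summing to one
by constrained least squares. This illustrates only the estimator's
construction; the paper's specification test is not shown, and the simulated
setting is an assumption.

    # Sketch of a demeaned synthetic control fit on simulated data: weights are
    # non-negative, sum to one, and are chosen on demeaned pre-treatment outcomes.
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(7)
    T0, J = 40, 10                                    # pre-periods, donor units
    factor = rng.standard_normal(T0)
    donors = 0.8 * factor[:, None] + rng.normal(0, 0.5, (T0, J))
    treated = 0.8 * factor + 2.0 + rng.normal(0, 0.5, T0)   # level shift vs donors

    y1 = treated - treated.mean()                     # demeaned treated outcomes
    Y0 = donors - donors.mean(axis=0)                 # demeaned donor outcomes

    def sc_loss(w):
        r = y1 - Y0 @ w
        return r @ r

    cons = ({'type': 'eq', 'fun': lambda w: w.sum() - 1.0},)
    res = minimize(sc_loss, np.full(J, 1.0 / J), bounds=[(0, 1)] * J,
                   constraints=cons, method='SLSQP')
    print(res.x.round(3), sc_loss(res.x))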

Econometrics arXiv paper, submitted: 2019-11-16

Inference in Models of Discrete Choice with Social Interactions Using Network Data

Authors: Michael P. Leung

This paper studies inference in models of discrete choice with social
interactions when the data consists of a single large network. We provide
theoretical justification for the use of spatial and network HAC variance
estimators in applied work, the latter constructed by using network path
distance in place of spatial distance. Toward this end, we prove new central
limit theorems for network moments in a large class of social interactions
models. The results are applicable to discrete games on networks and dynamic
models where social interactions enter through lagged dependent variables. We
illustrate our results in an empirical application and simulation study.

arXiv link: http://arxiv.org/abs/1911.07106v1
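
A hedged sketch of a network HAC variance for the mean of node-level moments,
weighting covariance terms by a kernel in network path distance (computed with
networkx). The Bartlett-type kernel, the bandwidth and the simulated graph are
illustrative assumptions, not the paper's recommended choices.

    # Sketch of a network HAC variance for the mean of node-level moments,
    # weighting covariance terms by a kernel in network path distance
    # (kernel and bandwidth are illustrative choices).
    import numpy as np
    import networkx as nx

    def network_hac_variance(z, G, bandwidth=2):
        n = len(z)
        zc = z - z.mean()
        dist = dict(nx.all_pairs_shortest_path_length(G, cutoff=bandwidth))
        v = 0.0
        for i, di in dist.items():
            for j, d in di.items():
                w = max(1 - d / (bandwidth + 1), 0)      # Bartlett-type kernel
                v += w * zc[i] * zc[j]
        return v / n**2                                   # variance of the sample mean

    rng = np.random.default_rng(8)
    G = nx.watts_strogatz_graph(200, k=4, p=0.1, seed=8)
    z = rng.standard_normal(200)
    print(network_hac_variance(z, G), z.var() / 200)      # HAC vs i.i.d. variance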

Econometrics arXiv updated paper (originally submitted: 2019-11-16)

Causal Inference Under Approximate Neighborhood Interference

Authors: Michael P. Leung

This paper studies causal inference in randomized experiments under network
interference. Commonly used models of interference posit that treatments
assigned to alters beyond a certain network distance from the ego have no
effect on the ego's response. However, this assumption is violated in common
models of social interactions. We propose a substantially weaker model of
"approximate neighborhood interference" (ANI) under which treatments assigned
to alters further from the ego have a smaller, but potentially nonzero, effect
on the ego's response. We formally verify that ANI holds for well-known models
of social interactions. Under ANI, restrictions on the network topology, and
asymptotics under which the network size increases, we prove that standard
inverse-probability weighting estimators consistently estimate useful exposure
effects and are approximately normal. For inference, we consider a network HAC
variance estimator. Under a finite population model, we show that the estimator
is biased but that the bias can be interpreted as the variance of unit-level
exposure effects. This generalizes Neyman's well-known result on conservative
variance estimation to settings with interference.

arXiv link: http://arxiv.org/abs/1911.07085v4

Econometrics arXiv paper, submitted: 2019-11-15

Semiparametric Estimation of Correlated Random Coefficient Models without Instrumental Variables

Authors: Samuele Centorrino, Aman Ullah, Jing Xue

We study a linear random coefficient model where slope parameters may be
correlated with some continuous covariates. Such a model specification may
occur in empirical research, for instance, when quantifying the effect of a
continuous treatment observed at two time periods. We show that identification
and estimation can be carried out without instruments. We propose a semiparametric
estimator of average partial effects and of average treatment effects on the
treated. We showcase the small sample properties of our estimator in an
extensive simulation study. Among other things, we reveal that it compares
favorably with a control function estimator. We conclude with an application to
the effect of malaria eradication on economic development in Colombia.

arXiv link: http://arxiv.org/abs/1911.06857v1

Econometrics arXiv updated paper (originally submitted: 2019-11-14)

Bayesian state-space modeling for analyzing heterogeneous network effects of US monetary policy

Authors: Niko Hauzenberger, Michael Pfarrhofer

Understanding disaggregate channels in the transmission of monetary policy is
of crucial importance for effectively implementing policy measures. We extend
the empirical econometric literature on the role of production networks in the
propagation of shocks along two dimensions. First, we allow for
industry-specific responses that vary over time, reflecting non-linearities and
cross-sectional heterogeneities in direct transmission channels. Second, we
allow for time-varying network structures and dependence. This feature
captures both variation in the structure of the production network and
differences in cross-industry demand elasticities. We find that impacts vary substantially
over time and the cross-section. Higher-order effects appear to be particularly
important in periods of economic and financial uncertainty, often coinciding
with tight credit market conditions and financial stress. Differentials in
industry-specific responses can be explained by how close the respective
industries are to end-consumers.

arXiv link: http://arxiv.org/abs/1911.06206v3

Econometrics arXiv paper, submitted: 2019-11-13

Randomization tests of copula symmetry

Authors: Brendan K. Beare, Juwon Seo

New nonparametric tests of copula exchangeability and radial symmetry are
proposed. The novel aspect of the tests is a resampling procedure that exploits
group invariance conditions associated with the relevant symmetry hypothesis.
They may be viewed as feasible versions of randomization tests of symmetry, the
latter being inapplicable due to the unobservability of margins. Our tests are
simple to compute, control size asymptotically, consistently detect arbitrary
forms of asymmetry, and do not require the specification of a tuning parameter.
Simulations indicate excellent small sample properties compared to existing
procedures involving the multiplier bootstrap.

arXiv link: http://arxiv.org/abs/1911.05307v1
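
To convey the group-invariance idea (not the paper's feasible procedure), the
sketch below applies a naive randomization check of exchangeability to
rank-based pseudo-observations: randomly swap the coordinates of each pair and
recompute a sup-distance between the empirical copula and its transpose on a
grid. The grid, sample size and number of randomizations are illustrative.

    # Illustrative (naive) randomization check of copula exchangeability on
    # rank-based pseudo-observations. This is NOT the paper's feasible test,
    # only the symmetry idea it makes rigorous.
    import numpy as np

    def empirical_copula(u, v, grid):
        return np.array([[np.mean((u <= a) & (v <= b)) for b in grid] for a in grid])

    def exch_stat(u, v, grid):
        C = empirical_copula(u, v, grid)
        return np.max(np.abs(C - C.T))

    rng = np.random.default_rng(9)
    n = 400
    x = rng.standard_normal(n)
    y = 0.6 * x + 0.8 * rng.standard_normal(n)           # exchangeable Gaussian copula
    u = (np.argsort(np.argsort(x)) + 1) / (n + 1)        # pseudo-observations (ranks)
    v = (np.argsort(np.argsort(y)) + 1) / (n + 1)

    grid = np.linspace(0.05, 0.95, 19)
    t_obs = exch_stat(u, v, grid)
    t_perm = []
    for _ in range(199):
        swap = rng.random(n) < 0.5                       # group action: swap coordinates
        up, vp = np.where(swap, v, u), np.where(swap, u, v)
        t_perm.append(exch_stat(up, vp, grid))
    print("p-value:", np.mean(np.array(t_perm) >= t_obs))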

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2019-11-12

Combinatorial Models of Cross-Country Dual Meets: What is a Big Victory?

Authors: Kurt S. Riedel

Combinatorial/probabilistic models for cross-country dual-meets are proposed.
The first model assumes that all runners are equally likely to finish in any
possible order. The second model assumes that each team is selected from a
large identically distributed population of potential runners and with each
potential runner's ranking determined by the initial draw from the combined
population.

arXiv link: http://arxiv.org/abs/1911.05044v1

Econometrics arXiv paper, submitted: 2019-11-12

A Simple Estimator for Quantile Panel Data Models Using Smoothed Quantile Regressions

Authors: Liang Chen, Yulong Huo

Canay (2011)'s two-step estimator of quantile panel data models, due to its
simple intuition and low computational cost, has been widely used in empirical
studies in recent years. In this paper, we revisit the estimator of Canay
(2011) and point out that in his asymptotic analysis the bias of his estimator
due to the estimation of the fixed effects is mistakenly omitted, and that such
omission will lead to invalid inference on the coefficients. To solve this
problem, we propose a similar easy-to-implement estimator based on smoothed
quantile regressions. The asymptotic distribution of the new estimator is
established and the analytical expression of its asymptotic bias is derived.
Based on these results, we show how to make asymptotically valid inference
based on both analytical and split-panel jackknife bias corrections. Finally,
finite sample simulations are used to support our theoretical analysis and to
illustrate the importance of bias correction in quantile regressions for panel
data.

arXiv link: http://arxiv.org/abs/1911.04729v1

Econometrics arXiv updated paper (originally submitted: 2019-11-12)

Extended MinP Tests for Global and Multiple testing

Authors: Zeng-Hua Lu

Empirical economic studies often involve multiple propositions or hypotheses,
with researchers aiming to assess both the collective and individual evidence
against these propositions or hypotheses. To rigorously assess this evidence,
practitioners frequently employ tests with quadratic test statistics, such as
$F$-tests and Wald tests, or tests based on minimum/maximum type test
statistics. This paper introduces a combination test that merges these two
classes of tests using the minimum $p$-value principle. The proposed test
capitalizes on the global power advantages of both constituent tests while
retaining the benefits of the stepdown procedure from minimum/maximum type
tests.

arXiv link: http://arxiv.org/abs/1911.04696v2

Econometrics arXiv updated paper (originally submitted: 2019-11-11)

Identification in discrete choice models with imperfect information

Authors: Cristina Gualdani, Shruti Sinha

We study identification of preferences in static single-agent discrete choice
models where decision makers may be imperfectly informed about the state of the
world. We leverage the notion of one-player Bayes Correlated Equilibrium by
Bergemann and Morris (2016) to provide a tractable characterization of the
sharp identified set. We develop a procedure to practically construct the sharp
identified set following a sieve approach, and provide sharp bounds on
counterfactual outcomes of interest. We use our methodology and data on the
2017 UK general election to estimate a spatial voting model under weak
assumptions on agents' information about the returns to voting. Counterfactual
exercises quantify the consequences of imperfect information on the well-being
of voters and parties.

arXiv link: http://arxiv.org/abs/1911.04529v5

Econometrics arXiv paper, submitted: 2019-11-09

An Asymptotically F-Distributed Chow Test in the Presence of Heteroscedasticity and Autocorrelation

Authors: Yixiao Sun, Xuexin Wang

This study proposes a simple, trustworthy Chow test in the presence of
heteroscedasticity and autocorrelation. The test is based on a series
heteroscedasticity and autocorrelation robust variance estimator with
judiciously crafted basis functions. Like the Chow test in a classical normal
linear regression, the proposed test employs the standard F distribution as the
reference distribution, which is justified under fixed-smoothing asymptotics.
Monte Carlo simulations show that the null rejection probability of the
asymptotic F test is closer to the nominal level than that of the chi-square
test.

arXiv link: http://arxiv.org/abs/1911.03771v1
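
For context, the sketch below computes the conventional benchmark that the
paper improves upon: a Chow-type Wald test for a break at a known date,
implemented with break-interaction regressors and a Newey-West HAC covariance
and referred to a chi-square distribution. The series HAR variance and the F
reference proposed in the paper are not implemented; the simulated data and
lag choice are illustrative.

    # Conventional chi-square benchmark: a Chow-type Wald test for a break at a
    # known date, using interaction regressors and a Newey-West HAC covariance.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(10)
    T, break_t = 300, 150
    x = rng.standard_normal(T)
    e = np.convolve(rng.standard_normal(T + 20), np.ones(5) / 5, mode='same')[:T]
    y = 1.0 + 0.5 * x + e                               # no break under the null

    post = (np.arange(T) >= break_t).astype(float)
    X = np.column_stack([np.ones(T), x, post, post * x])
    fit = sm.OLS(y, X).fit(cov_type='HAC', cov_kwds={'maxlags': 8})
    # Chow hypothesis: the post-break shift and slope change are jointly zero
    print(fit.wald_test(np.eye(4)[2:], scalar=True))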

Econometrics arXiv updated paper (originally submitted: 2019-11-09)

Optimal Experimental Design for Staggered Rollouts

Authors: Ruoxuan Xiong, Susan Athey, Mohsen Bayati, Guido Imbens

In this paper, we study the design and analysis of experiments conducted on a
set of units over multiple time periods where the starting time of the
treatment may vary by unit. The design problem involves selecting an initial
treatment time for each unit in order to most precisely estimate both the
instantaneous and cumulative effects of the treatment. We first consider
non-adaptive experiments, where all treatment assignment decisions are made
prior to the start of the experiment. For this case, we show that the
optimization problem is generally NP-hard, and we propose a near-optimal
solution. Under this solution, the fraction entering treatment each period is
initially low, then high, and finally low again. Next, we study an adaptive
experimental design problem, where both the decision to continue the experiment
and treatment assignment decisions are updated after each period's data is
collected. For the adaptive case, we propose a new algorithm, the
Precision-Guided Adaptive Experiment (PGAE) algorithm, that addresses the
challenges at both the design stage and at the stage of estimating treatment
effects, ensuring valid post-experiment inference accounting for the adaptive
nature of the design. Using realistic settings, we demonstrate that our
proposed solutions can reduce the opportunity cost of the experiments by over
50%, compared to static design benchmarks.

arXiv link: http://arxiv.org/abs/1911.03764v6

Econometrics arXiv updated paper (originally submitted: 2019-11-07)

Group Average Treatment Effects for Observational Studies

Authors: Daniel Jacob

The paper proposes an estimator to make inference on heterogeneous treatment
effects sorted by impact groups (GATES) for non-randomised experiments. The
groups can be understood as a broader aggregation of the conditional average
treatment effect (CATE) where the number of groups is set in advance. In
economics, this approach is similar to pre-analysis plans. Observational
studies are standard in policy evaluation from labour markets, educational
surveys and other empirical studies. To control for potential selection bias,
we implement a doubly-robust estimator in the first stage. We use machine
learning methods to learn the conditional mean functions as well as the
propensity score. The group average treatment effect is then estimated via a
linear projection model. The linear model is easy to interpret, provides
p-values and confidence intervals, and limits the danger of finding spurious
heterogeneity due to small subgroups in the CATE. To control for confounding in
the linear model, we use Neyman-orthogonal moments to partial out the effect
that covariates have on both the treatment assignment and the outcome. The
result is a best linear predictor for effect heterogeneity based on impact
groups. We find that our proposed method has lower absolute errors as well as
smaller bias than the benchmark doubly-robust estimator. We further introduce a
bagging-type averaging of the CATE function for each observation to avoid
biases through sample splitting. The advantage of the proposed method is a
robust linear estimation of heterogeneous group treatment effects in
observational studies.

arXiv link: http://arxiv.org/abs/1911.02688v5
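
A hedged sketch of the two stages described above on simulated data:
cross-fitted machine learning estimates of the outcome regressions and the
propensity score feed doubly-robust (AIPW) scores, which are then projected on
impact-group dummies by OLS. The learners, the quartile-based group
construction from a CATE proxy, and the simple projection are illustrative
choices and omit the paper's Neyman-orthogonal refinements.

    # Sketch: cross-fitted nuisance estimates -> AIPW scores -> OLS projection on
    # impact-group dummies. All modelling choices here are illustrative.
    import numpy as np
    import statsmodels.api as sm
    from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
    from sklearn.model_selection import KFold

    rng = np.random.default_rng(11)
    n, p = 2000, 5
    X = rng.standard_normal((n, p))
    e_true = 1 / (1 + np.exp(-X[:, 0]))
    d = rng.binomial(1, e_true)
    y = X[:, 0] + (1.0 + X[:, 1]) * d + rng.standard_normal(n)

    m0hat, m1hat, ehat = np.zeros(n), np.zeros(n), np.zeros(n)
    for train, test in KFold(5, shuffle=True, random_state=0).split(X):
        ps = RandomForestClassifier(n_estimators=200, random_state=0).fit(X[train], d[train])
        ehat[test] = ps.predict_proba(X[test])[:, 1]
        for arm, mhat in [(0, m0hat), (1, m1hat)]:
            idx = train[d[train] == arm]
            reg = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[idx], y[idx])
            mhat[test] = reg.predict(X[test])
    ehat = ehat.clip(0.05, 0.95)

    # doubly-robust (AIPW) score for each observation
    score = (m1hat - m0hat
             + d * (y - m1hat) / ehat
             - (1 - d) * (y - m0hat) / (1 - ehat))

    # impact groups from a CATE proxy, then linear projection (GATES)
    cate_proxy = m1hat - m0hat
    groups = np.digitize(cate_proxy, np.quantile(cate_proxy, [0.25, 0.5, 0.75]))
    G = np.column_stack([(groups == g).astype(float) for g in range(4)])
    gates = sm.OLS(score, G).fit(cov_type='HC1')
    print(gates.params.round(2), gates.bse.round(2))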

Econometrics arXiv updated paper (originally submitted: 2019-11-06)

Quantile Factor Models

Authors: Liang Chen, Juan Jose Dolado, Jesus Gonzalo

Quantile Factor Models (QFM) represent a new class of factor models for
high-dimensional panel data. Unlike Approximate Factor Models (AFM), where only
location-shifting factors can be extracted, QFM also allow one to recover
unobserved factors shifting other relevant parts of the distributions of
observed variables. A quantile regression approach, labeled Quantile Factor
Analysis (QFA), is proposed to consistently estimate all the quantile-dependent
factors and loadings. Their asymptotic distribution is then derived using a
kernel-smoothed version of the QFA estimators. Two consistent model selection
criteria, based on information criteria and rank minimization, are developed to
determine the number of factors at each quantile. Moreover, in contrast to the
conditions required for the use of Principal Components Analysis in AFM, QFA
estimation remains valid even when the idiosyncratic errors have heavy-tailed
distributions. Three empirical applications (regarding macroeconomic, climate
and finance panel data) provide evidence that extra factors shifting the
quantiles other than the means could be relevant in practice.

arXiv link: http://arxiv.org/abs/1911.02173v2

Econometrics arXiv updated paper (originally submitted: 2019-11-05)

Nonparametric Quantile Regressions for Panel Data Models with Large T

Authors: Liang Chen

This paper considers panel data models where the conditional quantiles of the
dependent variables are additively separable as unknown functions of the
regressors and the individual effects. We propose two estimators of the
quantile partial effects while controlling for the individual heterogeneity.
The first estimator is based on local linear quantile regressions, and the
second is based on local linear smoothed quantile regressions, both of which
are easy to compute in practice. Within the large T framework, we provide
sufficient conditions under which the two estimators are shown to be
asymptotically normally distributed. In particular, for the first estimator, it
is shown that $N \ll T^{2/(d+4)}$ is needed to ignore the incidental parameter
biases, where $d$ is the dimension of the regressors. For the second estimator,
we are able to derive the analytical expression of the asymptotic biases under
the assumption that $N\approx Th^{d}$, where $h$ is the bandwidth parameter in
local linear approximations. Our theoretical results provide the basis of using
split-panel jackknife for bias corrections. A Monte Carlo simulation shows that
the proposed estimators and the bias-correction method perform well in finite
samples.

arXiv link: http://arxiv.org/abs/1911.01824v3

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2019-11-04

Cheating with (Recursive) Models

Authors: Kfir Eliaz, Ran Spiegler, Yair Weiss

To what extent can agents with misspecified subjective models predict false
correlations? We study an "analyst" who utilizes models that take the form of a
recursive system of linear regression equations. The analyst fits each equation
to minimize the sum of squared errors against an arbitrarily large sample. We
characterize the maximal pairwise correlation that the analyst can predict
given a generic objective covariance matrix, subject to the constraint that the
estimated model does not distort the mean and variance of individual variables.
We show that as the number of variables in the model grows, the false pairwise
correlation can become arbitrarily close to one, regardless of the true
correlation.

arXiv link: http://arxiv.org/abs/1911.01251v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2019-11-02

Model Specification Test with Unlabeled Data: Approach from Covariate Shift

Authors: Masahiro Kato, Hikaru Kawarazaki

We propose a novel framework for model specification testing in regression
using unlabeled test data. In practice, statistical inference is often
conducted under the assumption that the model is correctly specified. However,
it is difficult to confirm whether a model is correctly specified. To overcome
this problem, existing works have devised statistical tests for model
specification. Existing works define a correctly specified model in regression
as a model with zero conditional mean of the error term over the training data
only. Extending the definition used in conventional statistical tests, we
define a correctly specified model as a model with zero conditional mean of
the error term over any distribution of the explanatory variable. This
definition is a natural consequence of the orthogonality of the explanatory
variable and the error term. If a model does not satisfy this condition, the
model might lack robustness with regard to distribution shift. The proposed
method enables us to reject a misspecified model under our definition. By
applying the proposed method, we can obtain a model that predicts the labels
for the unlabeled test data well without losing the interpretability of the
model. In experiments, we show how the proposed method works for synthetic and
real-world datasets.

arXiv link: http://arxiv.org/abs/1911.00688v2

Econometrics arXiv updated paper (originally submitted: 2019-11-02)

A two-dimensional propensity score matching method for longitudinal quasi-experimental studies: A focus on travel behavior and the built environment

Authors: Haotian Zhong, Wei Li, Marlon G. Boarnet

The lack of longitudinal studies of the relationship between the built
environment and travel behavior has been widely discussed in the literature.
This paper discusses how standard propensity score matching estimators can be
extended to enable such studies by pairing observations across two dimensions:
longitudinal and cross-sectional. Researchers mimic randomized controlled
trials (RCTs) by matching observations in both dimensions, finding synthetic
control groups that are similar to the treatment group and matching subjects
across before-treatment and after-treatment time periods. We call
this two-dimensional propensity score matching (2DPSM). This method
demonstrates superior performance for estimating treatment effects based on
Monte Carlo evidence. A near-term opportunity for such matching is identifying
the impact of transportation infrastructure on travel behavior.

arXiv link: http://arxiv.org/abs/1911.00667v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2019-11-01

Explaining black box decisions by Shapley cohort refinement

Authors: Masayoshi Mase, Art B. Owen, Benjamin Seiler

We introduce a variable importance measure to quantify the impact of
individual input variables to a black box function. Our measure is based on the
Shapley value from cooperative game theory. Many measures of variable
importance operate by changing some predictor values with others held fixed,
potentially creating unlikely or even logically impossible combinations. Our
cohort Shapley measure uses only observed data points. Instead of changing the
value of a predictor we include or exclude subjects similar to the target
subject on that predictor to form a similarity cohort. Then we apply Shapley
value to the cohort averages. We connect variable importance measures from
explainable AI to function decompositions from global sensitivity analysis. We
introduce a squared cohort Shapley value that splits previously studied Shapley
effects over subjects, consistent with a Shapley axiom.

arXiv link: http://arxiv.org/abs/1911.00467v2
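
A minimal sketch of the cohort Shapley idea for a small number of predictors:
the value of a coalition is the mean prediction over observed points that are
similar to the target on those predictors, and exact Shapley weights combine
the marginal contributions. The similarity rule (within half a standard
deviation) and the simulated data are illustrative assumptions.

    # Sketch of cohort Shapley for small d: coalition value = mean prediction
    # over observed points similar to the target on the coalition's predictors.
    import numpy as np
    from itertools import combinations
    from math import factorial

    def cohort_value(X, preds, target, subset, tol):
        mask = np.ones(len(X), dtype=bool)
        for j in subset:
            mask &= np.abs(X[:, j] - target[j]) <= tol[j]   # similar on predictor j
        return preds[mask].mean()

    def cohort_shapley(X, preds, target, tol):
        d = X.shape[1]
        phi = np.zeros(d)
        for j in range(d):
            others = [k for k in range(d) if k != j]
            for r in range(d):
                for S in combinations(others, r):
                    w = factorial(r) * factorial(d - r - 1) / factorial(d)
                    phi[j] += w * (cohort_value(X, preds, target, S + (j,), tol)
                                   - cohort_value(X, preds, target, S, tol))
        return phi

    rng = np.random.default_rng(12)
    X = rng.standard_normal((1000, 3))
    preds = X[:, 0] + 2 * X[:, 1]                 # stand-in for black-box predictions
    target = X[5]
    tol = 0.5 * X.std(axis=0)
    print(cohort_shapley(X, preds, target, tol).round(2))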

Econometrics arXiv updated paper (originally submitted: 2019-11-01)

Regularized Quantile Regression with Interactive Fixed Effects

Authors: Junlong Feng

This paper studies large $N$ and large $T$ conditional quantile panel data
models with interactive fixed effects. We propose a nuclear norm penalized
estimator of the coefficients on the covariates and the low-rank matrix formed
by the fixed effects. The estimator solves a convex minimization problem, not
requiring pre-estimation of the (number of the) fixed effects. It also allows
the number of covariates to grow slowly with $N$ and $T$. We derive an error
bound on the estimator that holds uniformly in quantile level. The order of the
bound implies uniform consistency of the estimator and is nearly optimal for
the low-rank component. Given the error bound, we also propose a consistent
estimator of the number of fixed effects at any quantile level. To derive the
error bound, we develop new theoretical arguments under primitive assumptions
and new results on random matrices that may be of independent interest. We
demonstrate the performance of the estimator via Monte Carlo simulations.

arXiv link: http://arxiv.org/abs/1911.00166v4

Econometrics arXiv paper, submitted: 2019-10-29

Analyzing China's Consumer Price Index Comparatively with that of United States

Authors: Zhenzhong Wang, Yundong Tu, Song Xi Chen

This paper provides a thorough analysis of the dynamic structures and
predictability of China's Consumer Price Index (CPI-CN), with a comparison to
those of the United States. Despite the differences in the two leading
economies, both series can be well modeled by a class of Seasonal
Autoregressive Integrated Moving Average Models with Covariates (S-ARIMAX). The
CPI-CN series possesses regular dynamic patterns, with stable annual cycles
and strong Spring Festival effects, and fitting and forecasting errors largely
comparable to those of their US counterparts. Finally, for the CPI-CN, the
diffusion index (DI) approach offers better predictions than the S-ARIMAX
models.

arXiv link: http://arxiv.org/abs/1910.13301v1
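
A minimal sketch of fitting a seasonal ARIMA with covariates (S-ARIMAX) via
statsmodels on simulated monthly data; the orders and the single holiday dummy
are illustrative assumptions, not the specification estimated in the paper.

    # Sketch: S-ARIMAX fit and forecast with statsmodels on simulated monthly data.
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    rng = np.random.default_rng(13)
    dates = pd.date_range("2005-01-01", periods=180, freq="MS")
    seasonal = 0.5 * np.sin(2 * np.pi * dates.month.to_numpy() / 12)
    festival = (dates.month == 2).astype(float)           # stand-in holiday dummy
    y = pd.Series(100 + np.cumsum(rng.normal(0.1, 0.3, 180)) + seasonal
                  + 0.8 * festival, index=dates)

    res = SARIMAX(y, exog=festival, order=(1, 1, 1),
                  seasonal_order=(1, 0, 1, 12)).fit(disp=False)
    print(res.params.round(3))

    future = pd.date_range(dates[-1] + pd.offsets.MonthBegin(), periods=12, freq="MS")
    fcst = res.get_forecast(steps=12, exog=(future.month == 2).astype(float))
    print(fcst.predicted_mean.round(2))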

Econometrics arXiv updated paper (originally submitted: 2019-10-28)

Testing Forecast Rationality for Measures of Central Tendency

Authors: Timo Dimitriadis, Andrew J. Patton, Patrick W. Schmidt

Rational respondents to economic surveys may report as a point forecast any
measure of the central tendency of their (possibly latent) predictive
distribution, for example the mean, median, mode, or any convex combination
thereof. We propose tests of forecast rationality when the measure of central
tendency used by the respondent is unknown. We overcome an identification
problem that arises when the measures of central tendency are equal or in a
local neighborhood of each other, as is the case for (exactly or nearly)
symmetric distributions. As a building block, we also present novel tests for
the rationality of mode forecasts. We apply our tests to income forecasts from
the Federal Reserve Bank of New York's Survey of Consumer Expectations. We find
these forecasts are rationalizable as mode forecasts, but not as mean or median
forecasts. We also find heterogeneity in the measure of centrality used by
respondents when stratifying the sample by past income, age, job stability, and
survey experience.

arXiv link: http://arxiv.org/abs/1910.12545v5

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2019-10-27

Dual Instrumental Variable Regression

Authors: Krikamol Muandet, Arash Mehrjou, Si Kai Lee, Anant Raj

We present a novel algorithm for non-linear instrumental variable (IV)
regression, DualIV, which simplifies traditional two-stage methods via a dual
formulation. Inspired by problems in stochastic programming, we show that
two-stage procedures for non-linear IV regression can be reformulated as a
convex-concave saddle-point problem. Our formulation enables us to circumvent
the first-stage regression which is a potential bottleneck in real-world
applications. We develop a simple kernel-based algorithm with an analytic
solution based on this formulation. Empirical results show that we are
competitive with existing, more complicated algorithms for non-linear
instrumental variable regression.

arXiv link: http://arxiv.org/abs/1910.12358v3

Econometrics arXiv paper, submitted: 2019-10-26

Estimating a Large Covariance Matrix in Time-varying Factor Models

Authors: Jaeheon Jung

This paper deals with time-varying high-dimensional covariance matrix
estimation. We propose two covariance matrix estimators corresponding to a
time-varying approximate factor model and a time-varying approximate
characteristic-based factor model, respectively. The models allow the factor
loadings, factor covariance matrix, and error covariance matrix to change
smoothly over time. We study the rate of convergence of each estimator. Our
simulation and empirical study indicate that time-varying covariance matrix
estimators generally perform better than time-invariant covariance matrix
estimators. Also, if characteristics are available that genuinely explain true
loadings, the characteristics can be used to estimate loadings more precisely
in finite samples; their helpfulness increases when loadings rapidly change.

arXiv link: http://arxiv.org/abs/1910.11965v1

Econometrics arXiv updated paper (originally submitted: 2019-10-23)

Fast and Flexible Bayesian Inference in Time-varying Parameter Regression Models

Authors: Niko Hauzenberger, Florian Huber, Gary Koop, Luca Onorante

In this paper, we write the time-varying parameter (TVP) regression model
involving K explanatory variables and T observations as a constant coefficient
regression model with KT explanatory variables. In contrast with much of the
existing literature, which assumes that coefficients evolve according to a
random walk, a hierarchical mixture model on the TVPs is introduced. The resulting
model closely mimics a random coefficients specification which groups the TVPs
into several regimes. These flexible mixtures allow for TVPs that feature a
small, moderate or large number of structural breaks. We develop
computationally efficient Bayesian econometric methods based on the singular
value decomposition of the KT regressors. In artificial data, we find our
methods to be accurate and much faster than standard approaches in terms of
computation time. In an empirical exercise involving inflation forecasting
using a large number of predictors, we find our models to forecast better than
alternative approaches and document different patterns of parameter change than
are found with approaches which assume random walk evolution of parameters.
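
As a rough illustration of the reformulation described above, the sketch below (Python with NumPy assumed) stacks the time-varying coefficients into a single KT-dimensional vector so that the TVP model becomes a constant-coefficient regression with KT explanatory variables; the hierarchical mixture prior and the SVD-based computations the paper adds are not shown, and the data sizes are hypothetical.

    import numpy as np

    def tvp_design(X):
        # Rewrite y_t = x_t' beta_t as y = Z gamma with gamma = (beta_1', ..., beta_T')'.
        # X is the (T, K) regressor matrix; Z is (T, T*K), with x_t placed in block t of row t.
        T, K = X.shape
        Z = np.zeros((T, T * K))
        for t in range(T):
            Z[t, t * K:(t + 1) * K] = X[t]
        return Z

    # illustration on simulated regressors
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))
    Z = tvp_design(X)
    print(Z.shape)  # (50, 150): K*T explanatory variables, as in the reformulation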

arXiv link: http://arxiv.org/abs/1910.10779v4

Econometrics arXiv paper, submitted: 2019-10-23

Nonparametric identification of an interdependent value model with buyer covariates from first-price auction bids

Authors: Nathalie Gimenes, Emmanuel Guerre

This paper introduces a version of the interdependent value model of Milgrom
and Weber (1982), where the signals are given by an index gathering signal
shifters observed by the econometrician and private ones specific to each
bidder. The model primitives are shown to be nonparametrically identified from
first-price auction bids under a testable mild rank condition. Identification
holds for all possible signal values. This allows one to consider a wide range of
counterfactuals where this is important, such as the expected revenue in a
second-price auction. An estimation procedure is briefly discussed.

arXiv link: http://arxiv.org/abs/1910.10646v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2019-10-23

How well can we learn large factor models without assuming strong factors?

Authors: Yinchu Zhu

In this paper, we consider the problem of learning models with a latent
factor structure. The focus is to find what is possible and what is impossible
if the usual strong factor condition is not imposed. We study the minimax rate
and adaptivity issues in two problems: pure factor models and panel regression
with interactive fixed effects. For pure factor models, if the number of
factors is known, we develop adaptive estimation and inference procedures that
attain the minimax rate. However, when the number of factors is not specified a
priori, we show that there is a tradeoff between validity and efficiency: any
confidence interval that has uniform validity for arbitrary factor strength has
to be conservative; in particular its width is bounded away from zero even when
the factors are strong. Conversely, any data-driven confidence interval that
does not require as an input the exact number of factors (including weak ones)
and has shrinking width under strong factors does not have uniform coverage and
the worst-case coverage probability is at most 1/2. For panel regressions with
interactive fixed effects, the tradeoff is much better. We find that the
minimax rate for learning the regression coefficient does not depend on the
factor strength and propose a simple estimator that achieves this rate.
However, when weak factors are allowed, uncertainty in the number of factors
can cause a great loss of efficiency although the rate is not affected. In most
cases, we find that the strong factor condition (and/or exact knowledge of
number of factors) improves efficiency, but this condition needs to be imposed
by faith and cannot be verified in data for inference purposes.

arXiv link: http://arxiv.org/abs/1910.10382v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2019-10-22

Principal Component Analysis: A Generalized Gini Approach

Authors: Arthur Charpentier, Stephane Mussard, Tea Ouraga

A principal component analysis based on the generalized Gini correlation
index is proposed (Gini PCA). The Gini PCA generalizes the standard PCA based
on the variance. It is shown, in the Gaussian case, that the standard PCA is
equivalent to the Gini PCA. It is also proven that the dimensionality reduction
based on the generalized Gini correlation matrix, that relies on city-block
distances, is robust to outliers. Monte Carlo simulations and an application to
cars data (with outliers) show the robustness of the Gini PCA and provide
different interpretations of the results compared with the variance PCA.

arXiv link: http://arxiv.org/abs/1910.10133v1

Econometrics arXiv paper, submitted: 2019-10-22

Quasi Maximum Likelihood Estimation of Non-Stationary Large Approximate Dynamic Factor Models

Authors: Matteo Barigozzi, Matteo Luciani

This paper considers estimation of large dynamic factor models with common
and idiosyncratic trends by means of the Expectation Maximization algorithm,
implemented jointly with the Kalman smoother. We show that, as the
cross-sectional dimension $n$ and the sample size $T$ diverge to infinity, the
common component for a given unit estimated at a given point in time is
$\min(\sqrt n,\sqrt T)$-consistent. The case of local levels and/or local
linear trends is also considered. By means of a Monte Carlo simulation
exercise, we compare our approach with estimators based on principal component
analysis.

arXiv link: http://arxiv.org/abs/1910.09841v1

Econometrics arXiv updated paper (originally submitted: 2019-10-21)

A path-sampling method to partially identify causal effects in instrumental variable models

Authors: Florian Gunsilius

Partial identification approaches are a flexible and robust alternative to
standard point-identification approaches in general instrumental variable
models. However, this flexibility comes at the cost of a “curse of
cardinality”: the number of restrictions on the identified set grows
exponentially with the number of points in the support of the endogenous
treatment. This article proposes a novel path-sampling approach to this
challenge. It is designed for partially identifying causal effects of interest
in the most complex models with continuous endogenous treatments. A stochastic
process representation allows assumptions on individual behavior to be
incorporated seamlessly into the model. Some potential applications include
dose-response estimation in randomized trials with imperfect compliance, the
evaluation of social programs, welfare estimation in demand models, and
continuous choice models. As a demonstration, the method provides informative
nonparametric bounds on household expenditures under the assumption that
expenditure is continuous. The mathematical contribution is an approach to
approximately solving infinite dimensional linear programs on path spaces via
sampling.

arXiv link: http://arxiv.org/abs/1910.09502v2

Econometrics arXiv paper, submitted: 2019-10-21

Multi-Stage Compound Real Options Valuation in Residential PV-Battery Investment

Authors: Yiju Ma, Kevin Swandi, Archie Chapman, Gregor Verbic

Strategic valuation of efficient and well-timed network investments under
uncertain electricity market environment has become increasingly challenging,
because there generally exist multiple interacting options in these
investments, and failing to systematically consider these options can lead to
decisions that undervalue the investment. In our work, a real options valuation
(ROV) framework is proposed to determine the optimal strategy for executing
multiple interacting options within a distribution network investment, to
mitigate the risk of financial losses in the presence of future uncertainties.
To demonstrate the characteristics of the proposed framework, we determine the
optimal strategy to economically justify the investment in residential
PV-battery systems for additional grid supply during peak demand periods. The
options to defer, and then expand, are considered as multi-stage compound
options, since the option to expand is a subsequent option of the former. These
options are valued via the least squares Monte Carlo method, incorporating
uncertainty over growing power demand, varying diesel fuel price, and the
declining cost of PV-battery technology as random variables. Finally, a
sensitivity analysis is performed to demonstrate how the proposed framework
responds to uncertain events. The proposed framework shows that executing the
interacting options at the optimal timing increases the investment value.
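
The least squares Monte Carlo step can be illustrated on a textbook example. The sketch below prices an American put via the standard Longstaff-Schwartz backward induction; it is only a stylized stand-in for the paper's PV-battery valuation, and all parameter values are hypothetical.

    import numpy as np

    def lsm_american_put(S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0,
                         steps=50, paths=20000, seed=0):
        # Longstaff-Schwartz least squares Monte Carlo for an American put.
        rng = np.random.default_rng(seed)
        dt = T / steps
        disc = np.exp(-r * dt)
        z = rng.standard_normal((paths, steps))
        S = S0 * np.exp(np.cumsum((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z, axis=1))
        cash = np.maximum(K - S[:, -1], 0.0)           # payoff if held to maturity
        for t in range(steps - 2, -1, -1):
            itm = K - S[:, t] > 0                      # regress only on in-the-money paths
            Y = cash[itm] * disc                       # discounted continuation cash flows
            coef = np.polyfit(S[itm, t], Y, 2)         # polynomial basis for continuation value
            cont = np.polyval(coef, S[itm, t])
            exercise = (K - S[itm, t]) > cont
            cash[itm] = np.where(exercise, K - S[itm, t], Y)
            cash[~itm] *= disc
        return disc * cash.mean()

    print(round(lsm_american_put(), 3))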

arXiv link: http://arxiv.org/abs/1910.09132v1

Econometrics arXiv updated paper (originally submitted: 2019-10-20)

Feasible Generalized Least Squares for Panel Data with Cross-sectional and Serial Correlations

Authors: Jushan Bai, Sung Hoon Choi, Yuan Liao

This paper considers generalized least squares (GLS) estimation for linear
panel data models. By estimating the large error covariance matrix
consistently, the proposed feasible GLS (FGLS) estimator is more efficient than
the ordinary least squares (OLS) in the presence of heteroskedasticity, serial,
and cross-sectional correlations. To take into account the serial correlations,
we employ the banding method. To take into account the cross-sectional
correlations, we suggest using the thresholding method. We establish the
limiting distribution of the proposed estimator. A Monte Carlo study is
considered. The proposed method is applied to an empirical application.
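
A minimal sketch of the two covariance adjustments mentioned above, assuming NumPy; the paper's data-driven choice of the banding width and threshold, and the construction of the full FGLS estimator, are not reproduced.

    import numpy as np

    def band(S, h):
        # banding: keep entries within h of the diagonal (serial correlation)
        m = S.shape[0]
        mask = np.abs(np.subtract.outer(np.arange(m), np.arange(m))) <= h
        return S * mask

    def hard_threshold(S, tau):
        # thresholding: zero out small off-diagonal entries (cross-sectional correlation)
        out = np.where(np.abs(S) >= tau, S, 0.0)
        np.fill_diagonal(out, np.diag(S))
        return out

    # toy usage on sample covariances of (hypothetical) regression residuals
    rng = np.random.default_rng(1)
    U = rng.normal(size=(200, 10))        # T x n residual matrix
    S_cross = U.T @ U / U.shape[0]        # n x n cross-sectional covariance
    S_serial = U @ U.T / U.shape[1]       # T x T serial covariance
    print(hard_threshold(S_cross, 0.1).shape, band(S_serial, 3).shape)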

arXiv link: http://arxiv.org/abs/1910.09004v3

Econometrics arXiv updated paper (originally submitted: 2019-10-18)

Large Dimensional Latent Factor Modeling with Missing Observations and Applications to Causal Inference

Authors: Ruoxuan Xiong, Markus Pelger

This paper develops the inferential theory for latent factor models estimated
from large dimensional panel data with missing observations. We propose an
easy-to-use all-purpose estimator for a latent factor model by applying
principal component analysis to an adjusted covariance matrix estimated from
partially observed panel data. We derive the asymptotic distribution for the
estimated factors, loadings and the imputed values under an approximate factor
model and general missing patterns. The key application is to estimate
counterfactual outcomes in causal inference from panel data. The unobserved
control group is modeled as missing values, which are inferred from the latent
factor model. The inferential theory for the imputed values allows us to test
for individual treatment effects at any time under general adoption patterns
where the units can be affected by unobserved factors.
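
A rough sketch in the spirit of the procedure described above (a covariance matrix estimated from the overlapping observed entries, principal components, and regression-based factor estimates used to impute missing cells); the paper's exact adjustment, normalizations, and inferential theory are not reproduced, and the array shapes are hypothetical.

    import numpy as np

    def pca_impute(X, r):
        # X: (T, n) panel with np.nan for missing entries; r: number of factors.
        T, n = X.shape
        obs = ~np.isnan(X)
        Xz = np.where(obs, X, 0.0)
        counts = obs.T.astype(float) @ obs.astype(float)   # jointly observed periods per pair
        S = (Xz.T @ Xz) / np.maximum(counts, 1.0)          # covariance from observed overlaps
        _, eigvec = np.linalg.eigh(S)
        L = np.sqrt(n) * eigvec[:, ::-1][:, :r]            # loadings from top-r eigenvectors
        F = np.empty((T, r))
        for t in range(T):                                 # factors from observed entries only
            F[t] = np.linalg.lstsq(L[obs[t]], X[t, obs[t]], rcond=None)[0]
        C = F @ L.T                                        # estimated common component
        return np.where(obs, X, C)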

arXiv link: http://arxiv.org/abs/1910.08273v6

Econometrics arXiv paper, submitted: 2019-10-17

Forecasting under Long Memory and Nonstationarity

Authors: Uwe Hassler, Marc-Oliver Pohle

Long memory in the sense of slowly decaying autocorrelations is a stylized
fact in many time series from economics and finance. The fractionally
integrated process is the workhorse model for the analysis of these time
series. Nevertheless, there is mixed evidence in the literature concerning its
usefulness for forecasting and how forecasting based on it should be
implemented.
Employing pseudo-out-of-sample forecasting on inflation and realized
volatility time series and simulations we show that methods based on fractional
integration clearly are superior to alternative methods not accounting for long
memory, including autoregressions and exponential smoothing. Our proposal of
choosing a fixed fractional integration parameter of $d=0.5$ a priori yields
the best results overall, capturing long memory behavior, but overcoming the
deficiencies of methods using an estimated parameter.
Regarding the implementation of forecasting methods based on fractional
integration, we use simulations to compare local and global semiparametric and
parametric estimators of the long memory parameter from the Whittle family and
provide asymptotic theory backed up by simulations to compare different mean
estimators. Both of these analyses lead to new results, which are also of
interest outside the realm of forecasting.
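
For reference, the fractional difference filter with a fixed memory parameter, such as the d = 0.5 choice advocated above, can be applied with a few lines of NumPy; the forecasting step itself (modeling the filtered series and inverting the filter) is not shown.

    import numpy as np

    def fracdiff_weights(d, n):
        # coefficients of (1 - L)^d: pi_0 = 1, pi_k = pi_{k-1} * (k - 1 - d) / k
        w = np.empty(n)
        w[0] = 1.0
        for k in range(1, n):
            w[k] = w[k - 1] * (k - 1 - d) / k
        return w

    def fracdiff(x, d):
        # truncated fractional difference of a series x
        w = fracdiff_weights(d, len(x))
        return np.array([w[:t + 1] @ x[t::-1] for t in range(len(x))])

    rng = np.random.default_rng(0)
    x = np.cumsum(rng.normal(size=300))   # toy nonstationary series
    dx = fracdiff(x, 0.5)                 # fixed d = 0.5, as proposed above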

arXiv link: http://arxiv.org/abs/1910.08202v1

Econometrics arXiv updated paper (originally submitted: 2019-10-17)

Econometric Models of Network Formation

Authors: Aureo de Paula

This article provides a selective review on the recent literature on
econometric models of network formation. The survey starts with a brief
exposition on basic concepts and tools for the statistical description of
networks. I then offer a review of dyadic models, focussing on statistical
models on pairs of nodes and describe several developments of interest to the
econometrics literature. The article also presents a discussion of non-dyadic
models where link formation might be influenced by the presence or absence of
additional links, which themselves are subject to similar influences. This is
related to the statistical literature on conditionally specified models and the
econometrics of game theoretical models. I close with a (non-exhaustive)
discussion of potential areas for further development.

arXiv link: http://arxiv.org/abs/1910.07781v2

Econometrics arXiv updated paper (originally submitted: 2019-10-17)

A Projection Framework for Testing Shape Restrictions That Form Convex Cones

Authors: Zheng Fang, Juwon Seo

This paper develops a uniformly valid and asymptotically nonconservative test
based on projection for a class of shape restrictions. The key insight we
exploit is that these restrictions form convex cones, a simple and yet elegant
structure that has been barely harnessed in the literature. Based on a
monotonicity property afforded by such a geometric structure, we construct a
bootstrap procedure that, unlike many studies in nonstandard settings,
dispenses with estimation of local parameter spaces, and the critical values
are obtained in a way as simple as computing the test statistic. Moreover, by
appealing to strong approximations, our framework accommodates nonparametric
regression models as well as distributional/density-related and structural
settings. Since the test entails a tuning parameter (due to the nonstandard
nature of the problem), we propose a data-driven choice and prove its validity.
Monte Carlo simulations confirm that our test works well.

arXiv link: http://arxiv.org/abs/1910.07689v4

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2019-10-16

Asymptotic Theory of $L$-Statistics and Integrable Empirical Processes

Authors: Tetsuya Kaji

This paper develops asymptotic theory of integrals of empirical quantile
functions with respect to random weight functions, which is an extension of
classical $L$-statistics. They appear when sample trimming or Winsorization is
applied to asymptotically linear estimators. The key idea is to consider
empirical processes in the spaces appropriate for integration. First, we
characterize weak convergence of empirical distribution functions and random
weight functions in the space of bounded integrable functions. Second, we
establish the delta method for empirical quantile functions as integrable
functions. Third, we derive the delta method for $L$-statistics. Finally, we
prove weak convergence of their bootstrap processes, showing validity of
nonparametric bootstrap.

arXiv link: http://arxiv.org/abs/1910.07572v1

Econometrics arXiv updated paper (originally submitted: 2019-10-16)

Identifying Network Ties from Panel Data: Theory and an Application to Tax Competition

Authors: Aureo de Paula, Imran Rasul, Pedro Souza

Social interactions determine many economic behaviors, but information on
social ties does not exist in most publicly available and widely used datasets.
We present results on the identification of social networks from observational
panel data that contains no information on social ties between agents. In the
context of a canonical social interactions model, we provide sufficient
conditions under which the social interactions matrix, endogenous and exogenous
social effect parameters are all globally identified. While this result is
relevant across different estimation strategies, we then describe how
high-dimensional estimation techniques can be used to estimate the interactions
model based on the Adaptive Elastic Net GMM method. We employ the method to
study tax competition across US states. We find the identified social
interactions matrix implies tax competition differs markedly from the common
assumption of competition between geographically neighboring states, providing
further insights for the long-standing debate on the relative roles of factor
mobility and yardstick competition in driving tax setting behavior across
states. Most broadly, our identification and application show the analysis of
social interactions can be extended to economic realms where no network data
exists.

arXiv link: http://arxiv.org/abs/1910.07452v4

Econometrics arXiv updated paper (originally submitted: 2019-10-16)

Standard Errors for Panel Data Models with Unknown Clusters

Authors: Jushan Bai, Sung Hoon Choi, Yuan Liao

This paper develops a new standard-error estimator for linear panel data
models. The proposed estimator is robust to heteroskedasticity, serial
correlation, and cross-sectional correlation of unknown forms. The serial
correlation is controlled by the Newey-West method. To control for
cross-sectional correlations, we propose to use the thresholding method,
without assuming the clusters to be known. We establish the consistency of the
proposed estimator. Monte Carlo simulations show the method works well. An
empirical application is considered.
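
The serial-correlation part of the construction is the familiar Newey-West (Bartlett kernel) long-run variance; a minimal version is sketched below. The cross-sectional thresholding step and the assembly into panel standard errors are specific to the paper and not reproduced.

    import numpy as np

    def newey_west(U, L):
        # U: (T, k) array of moment contributions; L: truncation lag.
        T = U.shape[0]
        u = U - U.mean(axis=0)
        V = u.T @ u / T
        for l in range(1, L + 1):
            w = 1.0 - l / (L + 1.0)        # Bartlett kernel weight
            G = u[l:].T @ u[:-l] / T       # lag-l autocovariance
            V += w * (G + G.T)
        return V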

arXiv link: http://arxiv.org/abs/1910.07406v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2019-10-16

Multivariate Forecasting Evaluation: On Sensitive and Strictly Proper Scoring Rules

Authors: Florian Ziel, Kevin Berk

In recent years, probabilistic forecasting has become an increasingly important
topic, creating a growing need for suitable methods for the evaluation of multivariate
predictions. We analyze the sensitivity of the most common scoring rules,
especially regarding quality of the forecasted dependency structures.
Additionally, we propose scoring rules based on the copula, which uniquely
describes the dependency structure for every probability distribution with
continuous marginal distributions. Efficient estimation of the considered
scoring rules and evaluation methods such as the Diebold-Mariano test are
discussed. In detailed simulation studies, we compare the performance of the
renowned scoring rules and the ones we propose. Besides extended synthetic
studies based on recently published results we also consider a real data
example. We find that the energy score, which is probably the most widely used
multivariate scoring rule, performs comparably well in detecting forecast
errors, also regarding dependencies. This contradicts other studies. The
results also show that a proposed copula score provides very strong distinction
between models with correct and incorrect dependency structure. We close with a
comprehensive discussion on the proposed methodology.
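
For concreteness, the energy score discussed above is commonly defined as ES(F, y) = E||X - y|| - 0.5 E||X - X'|| with X, X' independent draws from F, and can be estimated from an ensemble of draws of the predictive distribution; a minimal NumPy version follows, with simulated stand-in forecasts.

    import numpy as np

    def energy_score(samples, y):
        # samples: (m, d) draws from the predictive distribution F; y: (d,) realization.
        m = samples.shape[0]
        term1 = np.linalg.norm(samples - y, axis=1).mean()
        pair = np.linalg.norm(samples[:, None, :] - samples[None, :, :], axis=2)
        term2 = pair.sum() / (m * (m - 1))     # off-diagonal average of ||X - X'||
        return term1 - 0.5 * term2

    rng = np.random.default_rng(0)
    draws = rng.multivariate_normal(np.zeros(3), np.eye(3), size=500)
    print(round(energy_score(draws, np.array([0.2, -0.1, 0.4])), 3))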

arXiv link: http://arxiv.org/abs/1910.07325v1

Econometrics arXiv updated paper (originally submitted: 2019-10-15)

Matrix Completion, Counterfactuals, and Factor Analysis of Missing Data

Authors: Jushan Bai, Serena Ng

This paper proposes an imputation procedure that uses the factors estimated
from a tall block along with the re-rotated loadings estimated from a wide
block to impute missing values in a panel of data. Assuming that a strong
factor structure holds for the full panel of data and its sub-blocks, it is
shown that the common component can be consistently estimated at four different
rates of convergence without requiring regularization or iteration. An
asymptotic analysis of the estimation error is obtained. An application of our
analysis is estimation of counterfactuals when potential outcomes have a factor
structure. We study the estimation of average and individual treatment effects
on the treated and establish a normal distribution theory that can be useful
for hypothesis testing.

arXiv link: http://arxiv.org/abs/1910.06677v5

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2019-10-14

Principled estimation of regression discontinuity designs

Authors: L. Jason Anastasopoulos

Regression discontinuity designs are frequently used to estimate the causal
effect of election outcomes and policy interventions. In these contexts,
treatment effects are typically estimated with covariates included to improve
efficiency. While including covariates improves precision asymptotically, in
practice, treatment effects are estimated with a small number of observations,
resulting in considerable fluctuations in treatment effect magnitude and
precision depending upon the covariates chosen. This practice thus incentivizes
researchers to select covariates which maximize treatment effect statistical
significance rather than precision. Here, I propose a principled approach for
estimating RDDs which provides a means of improving precision with covariates
while minimizing adverse incentives. This is accomplished by integrating the
adaptive LASSO, a machine learning method, into RDD estimation using an R
package developed for this purpose, adaptiveRDD. Using simulations, I show that
this method significantly improves treatment effect precision, particularly
when estimating treatment effects with fewer than 200 observations.
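
A generic adaptive LASSO step of the kind referred to above can be written via the usual reweighting trick; this is not the adaptiveRDD package itself, scikit-learn is assumed, and the initial estimator and tuning constants are illustrative.

    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    def adaptive_lasso(X, y, alpha=0.1, gamma=1.0):
        # Solve min ||y - Xb||^2 + alpha * sum_j w_j |b_j| with w_j = 1/|b_init_j|^gamma
        # by running a standard LASSO on the rescaled columns X_j / w_j.
        b_init = Ridge(alpha=1.0).fit(X, y).coef_
        w = 1.0 / (np.abs(b_init) ** gamma + 1e-8)
        Xs = X / w                                    # column-wise rescaling
        fit = Lasso(alpha=alpha).fit(Xs, y)
        return fit.coef_ / w                          # coefficients on the original scale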

arXiv link: http://arxiv.org/abs/1910.06381v2

Econometrics arXiv updated paper (originally submitted: 2019-10-10)

Latent Dirichlet Analysis of Categorical Survey Responses

Authors: Evan Munro, Serena Ng

Beliefs are important determinants of an individual's choices and economic
outcomes, so understanding how they comove and differ across individuals is of
considerable interest. Researchers often rely on surveys that report individual
beliefs as qualitative data. We propose using a Bayesian hierarchical latent
class model to analyze the comovements and observed heterogeneity in
categorical survey responses. We show that the statistical model corresponds to
an economic structural model of information acquisition, which guides
interpretation and estimation of the model parameters. An algorithm based on
stochastic optimization is proposed to estimate a model for repeated surveys
when responses follow a dynamic structure and conjugate priors are not
appropriate. Guidance on selecting the number of belief types is also provided.
Two examples are considered. The first shows that there is information in the
Michigan survey responses beyond the consumer sentiment index that is
officially published. The second shows that belief types constructed from
survey responses can be used in a subsequent analysis to estimate heterogeneous
returns to education.

arXiv link: http://arxiv.org/abs/1910.04883v3

Econometrics arXiv updated paper (originally submitted: 2019-10-10)

Robust Likelihood Ratio Tests for Incomplete Economic Models

Authors: Hiroaki Kaido, Yi Zhang

This study develops a framework for testing hypotheses on structural
parameters in incomplete models. Such models make set-valued predictions and
hence do not generally yield a unique likelihood function. The model structure,
however, allows us to construct tests based on the least favorable pairs of
likelihoods using the theory of Huber and Strassen (1973). We develop tests
robust to model incompleteness that possess certain optimality properties. We
also show that sharp identifying restrictions play a role in constructing such
tests in a computationally tractable manner. A framework for analyzing the
local asymptotic power of the tests is developed by embedding the least
favorable pairs into a model that allows local approximations under the limits
of experiments argument. Examples of the hypotheses we consider include those
on the presence of strategic interaction effects in discrete games of complete
information. Monte Carlo experiments demonstrate the robust performance of the
proposed tests.

arXiv link: http://arxiv.org/abs/1910.04610v2

Econometrics arXiv paper, submitted: 2019-10-09

Averaging estimation for instrumental variables quantile regression

Authors: Xin Liu

This paper proposes averaging estimation methods to improve the finite-sample
efficiency of the instrumental variables quantile regression (IVQR) estimation.
First, I apply Cheng, Liao, Shi's (2019) averaging GMM framework to the IVQR
model. I propose using the usual quantile regression moments for averaging to
take advantage of cases when endogeneity is not too strong. I also propose
using two-stage least squares slope moments to take advantage of cases when
heterogeneity is not too strong. The empirical optimal weight formula of Cheng
et al. (2019) helps optimize the bias-variance tradeoff, ensuring uniformly
better (asymptotic) risk of the averaging estimator over the standard IVQR
estimator under certain conditions. My implementation involves many
computational considerations and builds on recent developments in the quantile
literature. Second, I propose a bootstrap method that directly averages among
IVQR, quantile regression, and two-stage least squares estimators. More
specifically, I find the optimal weights in the bootstrap world and then apply
the bootstrap-optimal weights to the original sample. The bootstrap method is
simpler to compute and generally performs better in simulations, but it lacks
the formal uniform dominance results of Cheng et al. (2019). Simulation results
demonstrate that in the multiple-regressors/instruments case, both the GMM
averaging and bootstrap estimators have uniformly smaller risk than the IVQR
estimator across data-generating processes (DGPs) with all kinds of
combinations of different endogeneity levels and heterogeneity levels. In DGPs
with a single endogenous regressor and instrument, where averaging estimation
is known to have least opportunity for improvement, the proposed averaging
estimators outperform the IVQR estimator in some cases but not others.

arXiv link: http://arxiv.org/abs/1910.04245v1

Econometrics arXiv updated paper (originally submitted: 2019-10-09)

Identifiability of Structural Singular Vector Autoregressive Models

Authors: Bernd Funovits, Alexander Braumann

We generalize well-known results on structural identifiability of vector
autoregressive models (VAR) to the case where the innovation covariance matrix
has reduced rank. Structural singular VAR models appear, for example, as
solutions of rational expectation models where the number of shocks is usually
smaller than the number of endogenous variables, and as an essential building
block in dynamic factor models. We show that order conditions for
identifiability are misleading in the singular case and provide a rank
condition for identifiability of the noise parameters. Since the Yule-Walker
equations may have multiple solutions, we analyze the effect of restrictions on
the system parameters on over- and underidentification in detail and provide
easily verifiable conditions.

arXiv link: http://arxiv.org/abs/1910.04096v2

Econometrics arXiv paper, submitted: 2019-10-09

Identification and Estimation of SVARMA models with Independent and Non-Gaussian Inputs

Authors: Bernd Funovits

This paper analyzes identifiability properties of structural vector
autoregressive moving average (SVARMA) models driven by independent and
non-Gaussian shocks. It is well known that SVARMA models driven by Gaussian
errors are not identified without imposing further identifying restrictions on
the parameters. Even in reduced form and assuming stability and invertibility,
vector autoregressive moving average models are in general not identified
without requiring certain parameter matrices to be non-singular. Independence
and non-Gaussianity of the shocks is used to show that they are identified up
to permutations and scalings. In this way, typically imposed identifying
restrictions are made testable. Furthermore, we introduce a maximum-likelihood
estimator of the non-Gaussian SVARMA model which is consistent and
asymptotically normally distributed.

arXiv link: http://arxiv.org/abs/1910.04087v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2019-10-09

Quasi Maximum Likelihood Estimation and Inference of Large Approximate Dynamic Factor Models via the EM algorithm

Authors: Matteo Barigozzi, Matteo Luciani

We study estimation of large Dynamic Factor models implemented through the
Expectation Maximization (EM) algorithm, jointly with the Kalman smoother. We
prove that as both the cross-sectional dimension, $n$, and the sample size,
$T$, diverge to infinity: (i) the estimated loadings are $\sqrt T$-consistent,
asymptotically normal and equivalent to their Quasi Maximum Likelihood
estimates; (ii) the estimated factors are $\sqrt n$-consistent, asymptotically
normal and equivalent to their Weighted Least Squares estimates. Moreover, the
estimated loadings are asymptotically as efficient as those obtained by
Principal Components analysis, while the estimated factors are more efficient
if the idiosyncratic covariance is sparse enough.
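
For reference, the principal components benchmark mentioned above can be computed in a few lines (a Bai-Ng style normalization is assumed); the EM/Kalman-smoother QML estimator studied in the paper is not reproduced here.

    import numpy as np

    def pc_factors(X, r):
        # X: (T, n) demeaned panel; r: number of factors.
        # Loadings from the top-r eigenvectors of X'X/(Tn), normalized so that L'L/n = I_r;
        # factors recovered as F = X L / n.
        T, n = X.shape
        _, eigvec = np.linalg.eigh(X.T @ X / (T * n))
        L = np.sqrt(n) * eigvec[:, ::-1][:, :r]
        F = X @ L / n
        return F, L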

arXiv link: http://arxiv.org/abs/1910.03821v5

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2019-10-08

On the feasibility of parsimonious variable selection for Hotelling's $T^2$-test

Authors: Michael D. Perlman

Hotelling's $T^2$-test for the mean of a multivariate normal distribution is
one of the triumphs of classical multivariate analysis. It is uniformly most
powerful among invariant tests, and admissible, proper Bayes, and locally and
asymptotically minimax among all tests. Nonetheless, investigators often prefer
non-invariant tests, especially those obtained by selecting only a small subset
of variables from which the $T^2$-statistic is to be calculated, because such
reduced statistics are more easily interpretable for their specific
application. Thus it is relevant to ask the extent to which power is lost when
variable selection is limited to very small subsets of variables, e.g. of size
one (yielding univariate Student-$t^2$ tests) or size two (yielding bivariate
$T^2$-tests). This study presents some evidence, admittedly fragmentary and
incomplete, suggesting that in some cases no power may be lost over a wide
range of alternatives.

arXiv link: http://arxiv.org/abs/1910.03669v1

Econometrics arXiv paper, submitted: 2019-10-07

Application of Machine Learning in Forecasting International Trade Trends

Authors: Feras Batarseh, Munisamy Gopinath, Ganesh Nalluru, Jayson Beckman

International trade policies have recently garnered attention for limiting
cross-border exchange of essential goods (e.g. steel, aluminum, soybeans, and
beef). Since trade critically affects employment and wages, predicting future
patterns of trade is a high priority for policy makers around the world. While
traditional economic models aim to be reliable predictors, we consider the
possibility that Machine Learning (ML) techniques allow for better predictions
to inform policy decisions. Open-government data provide the fuel to power the
algorithms that can explain and forecast trade flows to inform policies. Data
collected in this article describe international trade transactions and
commonly associated economic factors. The ML models deployed
include: ARIMA, GBoosting, XGBoosting, and LightGBM for predicting future trade
patterns, and K-Means clustering of countries according to economic factors.
Unlike short-term and subjective (straight-line) projections and medium-term
(aggregated) projections, ML methods provide a range of data-driven and
interpretable projections for individual commodities. Models, their results,
and policies are introduced and evaluated for prediction quality.

arXiv link: http://arxiv.org/abs/1910.03112v1

Econometrics arXiv paper, submitted: 2019-10-07

Boosting High Dimensional Predictive Regressions with Time Varying Parameters

Authors: Kashif Yousuf, Serena Ng

High dimensional predictive regressions are useful in a wide range of
applications. However, the theory is mainly developed assuming that the model
is stationary with time invariant parameters. This is at odds with the
prevalent evidence for parameter instability in economic time series, but
theories for parameter instability are mainly developed for models with a small
number of covariates. In this paper, we present two $L_2$ boosting algorithms
for estimating high dimensional models in which the coefficients are modeled as
functions evolving smoothly over time and the predictors are locally
stationary. The first method uses componentwise local constant estimators as
base learner, while the second relies on componentwise local linear estimators.
We establish consistency of both methods, and address the practical issues of
choosing the bandwidth for the base learners and the number of boosting
iterations. In an extensive application to macroeconomic forecasting with many
potential predictors, we find that the benefits to modeling time variation are
substantial and they increase with the forecast horizon. Furthermore, the
timing of the benefits suggests that the Great Moderation is associated with
substantial instability in the conditional mean of various economic series.
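
The standard, time-invariant componentwise L2 boosting algorithm that the paper extends can be sketched as follows; the local constant and local linear base learners, bandwidth choice, and stopping rules of the paper are not shown, and the toy data are hypothetical.

    import numpy as np

    def l2_boost(X, y, steps=200, nu=0.1):
        # Componentwise L2 boosting with univariate linear base learners:
        # at each step, fit the single predictor that best explains the current
        # residuals and add a shrunken version of that fit to the model.
        n, p = X.shape
        Xc = X - X.mean(axis=0)
        r = y - y.mean()
        beta = np.zeros(p)
        for _ in range(steps):
            b = Xc.T @ r / (Xc ** 2).sum(axis=0)          # univariate OLS slopes
            sse = ((r[:, None] - Xc * b) ** 2).sum(axis=0)
            j = int(np.argmin(sse))                       # best-fitting predictor
            beta[j] += nu * b[j]
            r = r - nu * Xc[:, j] * b[j]
        return beta

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))
    y = 2 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=200)
    print(np.round(l2_boost(X, y)[:5], 2))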

arXiv link: http://arxiv.org/abs/1910.03109v1

Econometrics arXiv cross-link from math.PR (math.PR), submitted: 2019-10-07

A 2-Dimensional Functional Central Limit Theorem for Non-stationary Dependent Random Fields

Authors: Michael C. Tseng

We obtain an elementary invariance principle for multi-dimensional Brownian
sheet where the underlying random fields are not necessarily independent or
stationary. Possible applications include unit-root tests for spatial as well
as panel data models.

arXiv link: http://arxiv.org/abs/1910.02577v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2019-10-06

Predicting popularity of EV charging infrastructure from GIS data

Authors: Milan Straka, Pasquale De Falco, Gabriella Ferruzzi, Daniela Proto, Gijs van der Poel, Shahab Khormali, Ľuboš Buzna

The availability of charging infrastructure is essential for large-scale
adoption of electric vehicles (EV). Charging patterns and the utilization of
infrastructure have consequences not only for energy demand and the loading of
local power grids, but also for economic returns, parking policies, and the
further adoption of EVs. We develop a data-driven approach that exploits
predictors compiled from GIS data describing the urban context and urban
activities near charging infrastructure to explore correlations with a
comprehensive set of indicators measuring the performance of charging
infrastructure. The best fit was identified for the size of the unique group of
visitors (popularity) attracted by the charging infrastructure. Consecutively,
charging infrastructure is ranked by popularity. The question of whether or not
a given charging spot belongs to the top tier is posed as a binary
classification problem, and the predictive performance of logistic regression
regularized with an $\ell_1$ penalty, random forests, and gradient boosted regression
trees is evaluated. Obtained results indicate that the collected predictors
contain information that can be used to predict the popularity of charging
infrastructure. The significance of predictors and how they are linked with the
popularity are explored as well. The proposed methodology can be used to inform
charging infrastructure deployment strategies.
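
A minimal version of the model comparison described above, using scikit-learn and simulated stand-in features (the GIS predictors and popularity labels are not public here):

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 20))                                         # stand-in GIS features
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=500) > 0).astype(int)   # "top tier" label

    models = {
        "l1 logistic": LogisticRegression(penalty="l1", solver="liblinear"),
        "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
        "gradient boosting": GradientBoostingClassifier(random_state=0),
    }
    for name, model in models.items():
        auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
        print(f"{name}: mean CV AUC = {auc:.3f}")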

arXiv link: http://arxiv.org/abs/1910.02498v1

Econometrics arXiv updated paper (originally submitted: 2019-10-03)

Informational Content of Factor Structures in Simultaneous Binary Response Models

Authors: Shakeeb Khan, Arnaud Maurel, Yichong Zhang

We study the informational content of factor structures in discrete
triangular systems. Factor structures have been employed in a variety of
settings in cross sectional and panel data models, and in this paper we
formally quantify their identifying power in a bivariate system often employed
in the treatment effects literature. Our main findings are that imposing a
factor structure yields point identification of parameters of interest, such as
the coefficient associated with the endogenous regressor in the outcome
equation, under weaker assumptions than usually required in these models. In
particular, we show that a "non-standard" exclusion restriction that requires
an explanatory variable in the outcome equation to be excluded from the
treatment equation is no longer necessary for identification, even in cases
where all of the regressors from the outcome equation are discrete. We also
establish identification of the coefficient of the endogenous regressor in
models with more general factor structures, in situations where one has access
to at least two continuous measurements of the common factor.

arXiv link: http://arxiv.org/abs/1910.01318v3

Econometrics arXiv paper, submitted: 2019-10-01

An introduction to flexible methods for policy evaluation

Authors: Martin Huber

This chapter covers different approaches to policy evaluation for assessing
the causal effect of a treatment or intervention on an outcome of interest. As
an introduction to causal inference, the discussion starts with the
experimental evaluation of a randomized treatment. It then reviews evaluation
methods based on selection on observables (assuming a quasi-random treatment
given observed covariates), instrumental variables (inducing a quasi-random
shift in the treatment), difference-in-differences and changes-in-changes
(exploiting changes in outcomes over time), as well as regression
discontinuities and kinks (using changes in the treatment assignment at some
threshold of a running variable). The chapter discusses methods particularly
suited for data with many observations for a flexible (i.e. semi- or
nonparametric) modeling of treatment effects, and/or many (i.e. high
dimensional) observed covariates by applying machine learning to select and
control for covariates in a data-driven way. This is not only useful for
tackling confounding by controlling for instance for factors jointly affecting
the treatment and the outcome, but also for learning effect heterogeneities
across subgroups defined upon observable covariates and optimally targeting
those groups for which the treatment is most effective.

arXiv link: http://arxiv.org/abs/1910.00641v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2019-10-01

Usage-Based Vehicle Insurance: Driving Style Factors of Accident Probability and Severity

Authors: Konstantin Korishchenko, Ivan Stankevich, Nikolay Pilnik, Daria Petrova

The paper introduces an approach to telematics devices data application in
automotive insurance. We conduct a comparative analysis of different types of
devices that collect information on vehicle utilization and driving style of
its driver, describe advantages and disadvantages of these devices and indicate
the most efficient ones from the insurer's point of view. The possible formats of
telematics data are described and methods of their processing to a format
convenient for modelling are proposed. We also introduce an approach to
classify the strength of accidents. Using all the available information, we
estimate accident probability models for different types of accidents and
identify an optimal set of factors for each of the models. We assess the
quality of resulting models using both in-sample and out-of-sample estimates.

arXiv link: http://arxiv.org/abs/1910.00460v2

Econometrics arXiv updated paper (originally submitted: 2019-09-30)

An econometric analysis of the Italian cultural supply

Authors: Consuelo Nava, Maria Grazia Zoia

Price indexes in time and space are a most relevant topic in statistical
analysis from both the methodological and the application side. In this paper, a
price index is devised that provides a novel and effective solution to the
construction of price indexes over several periods and across several countries,
that is, in both a multi-period and a multilateral framework. The reference
basket of the devised index is the union of the pairwise intersections of the
baskets of all periods/countries. As such, it provides broader coverage than
usual indexes. Closed-form expressions and updating formulas for the index are
provided and its properties investigated. Last, applications with real and
simulated data provide evidence of the performance of the proposed index.

arXiv link: http://arxiv.org/abs/1910.00073v3

Econometrics arXiv paper, submitted: 2019-09-27

Monotonicity-Constrained Nonparametric Estimation and Inference for First-Price Auctions

Authors: Jun Ma, Vadim Marmer, Artyom Shneyerov, Pai Xu

We propose a new nonparametric estimator for first-price auctions with
independent private values that imposes the monotonicity constraint on the
estimated inverse bidding strategy. We show that our estimator has a smaller
asymptotic variance than that of Guerre, Perrigne and Vuong's (2000) estimator.
In addition to establishing pointwise asymptotic normality of our estimator, we
provide a bootstrap-based approach to constructing uniform confidence bands for
the density function of latent valuations.

arXiv link: http://arxiv.org/abs/1909.12974v1

Econometrics arXiv updated paper (originally submitted: 2019-09-27)

Debiased/Double Machine Learning for Instrumental Variable Quantile Regressions

Authors: Jau-er Chen, Chien-Hsun Huang, Jia-Jyun Tien

In this study, we investigate estimation and inference on a low-dimensional
causal parameter in the presence of high-dimensional controls in an
instrumental variable quantile regression. Our proposed econometric procedure
builds on the Neyman-type orthogonal moment conditions of Chernozhukov, Hansen
and Wuthrich (2018) and is thus relatively insensitive to
the estimation of the nuisance parameters. The Monte Carlo experiments show
that the estimator copes well with high-dimensional controls. We also apply the
procedure to empirically reinvestigate the quantile treatment effect of 401(k)
participation on accumulated wealth.

arXiv link: http://arxiv.org/abs/1909.12592v3

Econometrics arXiv updated paper (originally submitted: 2019-09-26)

Inference in Nonparametric Series Estimation with Specification Searches for the Number of Series Terms

Authors: Byunghoon Kang

Nonparametric series regression often involves specification search over the
tuning parameter, i.e., evaluating estimates and confidence intervals with a
different number of series terms. This paper develops pointwise and uniform
inferences for conditional mean functions in nonparametric series estimations
that are uniform in the number of series terms. As a result, this paper
constructs confidence intervals and confidence bands with possibly
data-dependent series terms that have valid asymptotic coverage probabilities.
This paper also considers a partially linear model setup and develops inference
methods for the parametric part uniform in the number of series terms. The
finite sample performance of the proposed methods is investigated in various
simulation setups as well as in an illustrative example, i.e., the
nonparametric estimation of the wage elasticity of the expected labor supply
from Blomquist and Newey (2002).

arXiv link: http://arxiv.org/abs/1909.12162v2

Econometrics arXiv updated paper (originally submitted: 2019-09-24)

A Peek into the Unobservable: Hidden States and Bayesian Inference for the Bitcoin and Ether Price Series

Authors: Constandina Koki, Stefanos Leonardos, Georgios Piliouras

Conventional financial models fail to explain the economic and monetary
properties of cryptocurrencies due to the latter's dual nature: their usage as
financial assets on the one side and their tight connection to the underlying
blockchain structure on the other. In an effort to examine both components via
a unified approach, we apply a recently developed Non-Homogeneous Hidden Markov
(NHHM) model with an extended set of financial and blockchain specific
covariates on the Bitcoin (BTC) and Ether (ETH) price data. Based on the
observable series, the NHHM model offers a novel perspective on the underlying
microstructure of the cryptocurrency market and provides insight on
unobservable parameters such as the behavior of investors, traders and miners.
The algorithm identifies two alternating periods (hidden states) of inherently
different activity -- fundamental versus uninformed or noise traders -- in the
Bitcoin ecosystem and unveils differences in both the short/long run dynamics
and in the financial characteristics of the two states, such as significant
explanatory variables, extreme events and varying series autocorrelation. In a
somewhat unexpected result, the Bitcoin and Ether markets are found to be
influenced by markedly distinct indicators despite their perceived correlation.
The current approach backs earlier findings that cryptocurrencies are unlike
any conventional financial asset and makes a first step towards understanding
cryptocurrency markets via a more comprehensive lens.

arXiv link: http://arxiv.org/abs/1909.10957v2

Econometrics arXiv cross-link from cs.GT (cs.GT), submitted: 2019-09-24

Scalable Fair Division for 'At Most One' Preferences

Authors: Christian Kroer, Alexander Peysakhovich

Allocating multiple scarce items across a set of individuals is an important
practical problem. In the case of divisible goods and additive preferences a
convex program can be used to find the solution that maximizes Nash welfare
(MNW). The MNW solution is equivalent to finding the equilibrium of a market
economy (a.k.a. the competitive equilibrium from equal incomes, CEEI) and thus
has good properties such as Pareto optimality, envy-freeness, and incentive
compatibility in the large. Unfortunately, this equivalence (and nice
properties) breaks down for general preference classes. Motivated by real world
problems such as course allocation and recommender systems we study the case of
additive `at most one' (AMO) preferences - individuals want at most 1 of each
item and lotteries are allowed. We show that in this case the MNW solution is
still a convex program and importantly is a CEEI solution when the instance
gets large but has a `low rank' structure. Thus a polynomial time algorithm can
be used to scale CEEI (which is in general PPAD-hard) for AMO preferences. We
examine whether the properties guaranteed in the limit hold approximately in
finite samples using several real datasets.

arXiv link: http://arxiv.org/abs/1909.10925v1

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2019-09-24

Structural Change Analysis of Active Cryptocurrency Market

Authors: C. Y. Tan, Y. B. Koh, K. H. Ng, K. H. Ng

arXiv link: http://arxiv.org/abs/1909.10679v1

Econometrics arXiv paper, submitted: 2019-09-23

Goodness-of-Fit Tests based on Series Estimators in Nonparametric Instrumental Regression

Authors: Christoph Breunig

This paper proposes several tests of restricted specification in
nonparametric instrumental regression. Based on series estimators, test
statistics are established that allow for tests of the general model against a
parametric or nonparametric specification as well as a test of exogeneity of
the vector of regressors. The tests' asymptotic distributions under correct
specification are derived and their consistency against any alternative model
is shown. Under a sequence of local alternative hypotheses, the asymptotic
distributions of the tests is derived. Moreover, uniform consistency is
established over a class of alternatives whose distance to the null hypothesis
shrinks appropriately as the sample size increases. A Monte Carlo study
examines finite sample performance of the test statistics.

arXiv link: http://arxiv.org/abs/1909.10133v1

Econometrics arXiv paper, submitted: 2019-09-23

Specification Testing in Nonparametric Instrumental Quantile Regression

Authors: Christoph Breunig

There are many environments in econometrics which require nonseparable
modeling of a structural disturbance. In a nonseparable model with endogenous
regressors, key conditions are validity of instrumental variables and
monotonicity of the model in a scalar unobservable variable. Under these
conditions the nonseparable model is equivalent to an instrumental quantile
regression model. A failure of the key conditions, however, makes instrumental
quantile regression potentially inconsistent. This paper develops a methodology
for testing the hypothesis whether the instrumental quantile regression model
is correctly specified. Our test statistic is asymptotically normally
distributed under correct specification and consistent against any alternative
model. In addition, test statistics to justify the model simplification are
established. Finite sample properties are examined in a Monte Carlo study and
an empirical illustration is provided.

arXiv link: http://arxiv.org/abs/1909.10129v1

Econometrics arXiv updated paper (originally submitted: 2019-09-22)

Inference for Linear Conditional Moment Inequalities

Authors: Isaiah Andrews, Jonathan Roth, Ariel Pakes

We show that moment inequalities in a wide variety of economic applications
have a particular linear conditional structure. We use this structure to
construct uniformly valid confidence sets that remain computationally tractable
even in settings with nuisance parameters. We first introduce least favorable
critical values which deliver non-conservative tests if all moments are
binding. Next, we introduce a novel conditional inference approach which
ensures a strong form of insensitivity to slack moments. Our recommended
approach is a hybrid technique which combines desirable aspects of the least
favorable and conditional methods. The hybrid approach performs well in
simulations calibrated to Wollmann (2018), with favorable power and
computational time comparisons relative to existing alternatives.

arXiv link: http://arxiv.org/abs/1909.10062v5

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2019-09-22

Meaningful causal decompositions in health equity research: definition, identification, and estimation through a weighting framework

Authors: John W. Jackson

Causal decomposition analyses can help build the evidence base for
interventions that address health disparities (inequities). They ask how
disparities in outcomes may change under hypothetical intervention. Through
study design and assumptions, they can rule out alternate explanations such as
confounding, selection-bias, and measurement error, thereby identifying
potential targets for intervention. Unfortunately, the literature on causal
decomposition analysis and related methods has largely ignored equity concerns
that actual interventionists would respect, limiting their relevance and
practical value. This paper addresses these concerns by explicitly considering
what covariates the outcome disparity and hypothetical intervention adjust for
(so-called allowable covariates) and the equity value judgements these choices
convey, drawing from the bioethics, biostatistics, epidemiology, and health
services research literatures. From this discussion, we generalize
decomposition estimands and formulae to incorporate allowable covariate sets,
to reflect equity choices, while still allowing for adjustment of non-allowable
covariates needed to satisfy causal assumptions. For these general formulae, we
provide weighting-based estimators based on adaptations of
ratio-of-mediator-probability and inverse-odds-ratio weighting. We discuss when
these estimators reduce to already used estimators under certain equity value
judgements, and a novel adaptation under other judgements.

arXiv link: http://arxiv.org/abs/1909.10060v3

Econometrics arXiv updated paper (originally submitted: 2019-09-22)

Subspace Clustering for Panel Data with Interactive Effects

Authors: Jiangtao Duan, Wei Gao, Hao Qu, Hon Keung Tony

In this paper, we consider a statistical model for panel data with unobservable
grouped factor structures that are correlated with the regressors; the group
membership can be unknown. The factor loadings are assumed to lie in different
subspaces, and subspace clustering of the factor loadings is considered. A
method called least squares subspace clustering estimate (LSSC) is proposed to
estimate the model parameters by minimizing the least-squares criterion and to
perform the subspace clustering simultaneously. The consistency of the proposed
subspace clustering is proved and the asymptotic properties of the estimation
procedure are studied under certain conditions. A Monte Carlo simulation study
is used to illustrate the advantages of the proposed method. Further
considerations for the situations that the number of subspaces for factors, the
dimension of factors and the dimension of subspaces are unknown are also
discussed. For illustrative purposes, the proposed method is applied to study
the linkage between income and democracy across countries while subspace
patterns of unobserved factors and factor loadings are allowed.

arXiv link: http://arxiv.org/abs/1909.09928v2

Econometrics arXiv updated paper (originally submitted: 2019-09-20)

Doubly Robust Identification for Causal Panel Data Models

Authors: Dmitry Arkhangelsky, Guido W. Imbens

We study identification and estimation of causal effects in settings with
panel data. Traditionally researchers follow model-based identification
strategies relying on assumptions governing the relation between the potential
outcomes and the observed and unobserved confounders. We focus on a different,
complementary approach to identification where assumptions are made about the
connection between the treatment assignment and the unobserved confounders.
Such strategies are common in cross-section settings but rarely used with panel
data. We introduce different sets of assumptions that follow the two paths to
identification and develop a doubly robust approach. We propose estimation
methods that build on these identification strategies.

arXiv link: http://arxiv.org/abs/1909.09412v3

Econometrics arXiv paper, submitted: 2019-09-20

Discerning Solution Concepts

Authors: Nail Kashaev, Bruno Salcedo

The empirical analysis of discrete complete-information games has relied on
behavioral restrictions in the form of solution concepts, such as Nash
equilibrium. Choosing the right solution concept is crucial not just for
identification of payoff parameters, but also for the validity and
informativeness of counterfactual exercises and policy implications. We say
that a solution concept is discernible if it is possible to determine whether
it generated the observed data on the players' behavior and covariates. We
propose a set of conditions that make it possible to discern solution concepts.
In particular, our conditions are sufficient to tell whether the players'
choices emerged from Nash equilibria. We can also discern between
rationalizable behavior, maxmin behavior, and collusive behavior. Finally, we
identify the correlation structure of unobserved shocks in our model using a
novel approach.

arXiv link: http://arxiv.org/abs/1909.09320v1

Econometrics arXiv updated paper (originally submitted: 2019-09-18)

Nonparametric Estimation of the Random Coefficients Model: An Elastic Net Approach

Authors: Florian Heiss, Stephan Hetzenecker, Maximilian Osterhaus

This paper investigates and extends the computationally attractive
nonparametric random coefficients estimator of Fox, Kim, Ryan, and Bajari
(2011). We show that their estimator is a special case of the nonnegative
LASSO, explaining its sparse nature observed in many applications. Recognizing
this link, we extend the estimator, transforming it to a special case of the
nonnegative elastic net. The extension improves the estimator's recovery of the
true support and allows for more accurate estimates of the random coefficients'
distribution. Our estimator is a generalization of the original estimator and
therefore, is guaranteed to have a model fit at least as good as the original
one. A theoretical analysis of both estimators' properties shows that, under
conditions, our generalized estimator approximates the true distribution more
accurately. Two Monte Carlo experiments and an application to a travel mode
data set illustrate the improved performance of the generalized estimator.
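
The link described above can be illustrated with scikit-learn's positivity-constrained solvers: regress observed choice shares on the predicted probabilities of a grid of candidate coefficient vectors, once with a nonnegative LASSO and once with a nonnegative elastic net. The grid, data, and tuning constants below are hypothetical, and the normalization of the weights to sum to one is omitted.

    import numpy as np
    from sklearn.linear_model import ElasticNet, Lasso

    rng = np.random.default_rng(0)
    R, n = 100, 400
    G = rng.uniform(size=(n, R))                # column r: predicted probabilities at grid point r
    w_true = np.zeros(R)
    w_true[[3, 40, 77]] = [0.5, 0.3, 0.2]
    s = G @ w_true + 0.01 * rng.normal(size=n)  # observed choice shares (toy)

    # nonnegative LASSO (the FKRB-type estimator, as interpreted above) ...
    nn_lasso = Lasso(alpha=1e-4, positive=True, fit_intercept=False).fit(G, s)
    # ... and its nonnegative elastic net generalization
    nn_enet = ElasticNet(alpha=1e-4, l1_ratio=0.5, positive=True, fit_intercept=False).fit(G, s)
    print((nn_lasso.coef_ > 0).sum(), (nn_enet.coef_ > 0).sum())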

arXiv link: http://arxiv.org/abs/1909.08434v2

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2019-09-18

How have German University Tuition Fees Affected Enrollment Rates: Robust Model Selection and Design-based Inference in High-Dimensions

Authors: Konstantin Görgen, Melanie Schienle

We use official data for all 16 German federal states to study the causal
effect of a flat 1000 Euro state-dependent university tuition fee on the
enrollment behavior of students during the years 2006-2014. In particular, we
show how the variation in the introduction scheme across states and over time
can be exploited to identify the federal average causal effect of tuition fees
while controlling for a large number of potentially influential attributes
capturing state heterogeneity. We suggest a stability post-double-selection
methodology to robustly determine the causal effect across types in the
transparently modeled unknown response components. The proposed stability
resampling scheme in the two LASSO selection steps efficiently mitigates the
risk of model underspecification, and thus of biased effects, when the tuition
fee policy decision also depends on variables relevant for state enrollment
rates. Correct inference for the full cross-section of states in the sample
requires design-based rather than sampling-based standard errors. With
data-driven model selection and explicit control for spatial cross-effects, we
find that tuition fees induce substantial migration effects, with mobility
occurring from fee as well as from non-fee states, suggesting a general
movement toward quality. Overall, we find a significant negative impact of
fees on student enrollment of up to 4.5 percentage points. This contrasts with
plain one-step LASSO and with previous empirical studies based on linear panel
regressions with full fixed effects, which generally underestimate the size of
the effect and find it insignificant.

arXiv link: http://arxiv.org/abs/1909.08299v2

Econometrics arXiv paper, submitted: 2019-09-17

Adjusted QMLE for the spatial autoregressive parameter

Authors: Federico Martellosio, Grant Hillier

One simple, and often very effective, way to attenuate the impact of nuisance
parameters on maximum likelihood estimation of a parameter of interest is to
recenter the profile score for that parameter. We apply this general principle
to the quasi-maximum likelihood estimator (QMLE) of the autoregressive
parameter $\lambda$ in a spatial autoregression. The resulting estimator for
$\lambda$ has better finite sample properties compared to the QMLE for
$\lambda$, especially in the presence of a large number of covariates. It can
also solve the incidental parameter problem that arises, for example, in social
interaction models with network fixed effects, or in spatial panel models with
individual or time fixed effects. However, spatial autoregressions present
specific challenges for this type of adjustment, because recentering the
profile score may cause the adjusted estimate to be outside the usual parameter
space for $\lambda$. Conditions for this to happen are given, and implications
are discussed. For inference, we propose confidence intervals based on a
Lugannani--Rice approximation to the distribution of the adjusted QMLE of
$\lambda$. Based on our simulations, the coverage properties of these intervals
are excellent even in models with a large number of covariates.

arXiv link: http://arxiv.org/abs/1909.08141v1

Econometrics arXiv updated paper (originally submitted: 2019-09-17)

Distributional conformal prediction

Authors: Victor Chernozhukov, Kaspar Wüthrich, Yinchu Zhu

We propose a robust method for constructing conditionally valid prediction
intervals based on models for conditional distributions such as quantile and
distribution regression. Our approach can be applied to important prediction
problems including cross-sectional prediction, k-step-ahead forecasts,
synthetic controls and counterfactual prediction, and individual treatment
effects prediction. Our method exploits the probability integral transform and
relies on permuting estimated ranks. Unlike regression residuals, ranks are
independent of the predictors, allowing us to construct conditionally valid
prediction intervals under heteroskedasticity. We establish approximate
conditional validity under consistent estimation and provide approximate
unconditional validity under model misspecification, overfitting, and with time
series data. We also propose a simple "shape" adjustment of our baseline method
that yields optimal prediction intervals.

arXiv link: http://arxiv.org/abs/1909.07889v3
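
A minimal split-sample sketch in the spirit of this approach: quantile regression on a
grid of quantile levels approximates the conditional CDF, and a probability-integral-
transform (PIT) conformity score is calibrated on held-out data. The data-generating
process, grid, and score are illustrative choices, not the authors' exact procedure.

    # Split-conformal sketch based on the probability integral transform.
    import numpy as np
    from sklearn.linear_model import QuantileRegressor

    rng = np.random.default_rng(1)
    n = 1000
    x = rng.uniform(-2, 2, size=(n, 1))
    y = x[:, 0] + (1 + np.abs(x[:, 0])) * rng.normal(size=n)   # heteroskedastic

    taus = np.linspace(0.02, 0.98, 49)
    half = n // 2
    fits = [QuantileRegressor(quantile=t, alpha=0.0).fit(x[:half], y[:half]) for t in taus]

    # PIT on the calibration half: fraction of estimated conditional quantiles below y
    Q_cal = np.column_stack([f.predict(x[half:]) for f in fits])
    pit_cal = (Q_cal <= y[half:, None]).mean(axis=1)
    scores = np.abs(pit_cal - 0.5)                 # conformity score from the PIT "rank"
    alpha = 0.1
    c = np.quantile(scores, 1 - alpha)

    # prediction set for a new x0: all y on a grid whose PIT stays within c of 1/2
    x0 = np.array([[1.0]])
    q0 = np.array([f.predict(x0)[0] for f in fits])
    y_grid = np.linspace(y.min() - 3, y.max() + 3, 400)
    pit0 = (q0[None, :] <= y_grid[:, None]).mean(axis=1)
    keep = y_grid[np.abs(pit0 - 0.5) <= c]
    print("approximate 90% interval: [%.2f, %.2f]" % (keep.min(), keep.max()))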

Econometrics arXiv paper, submitted: 2019-09-15

Statistical inference for statistical decisions

Authors: Charles F. Manski

The Wald development of statistical decision theory addresses decision making
with sample data. Wald's concept of a statistical decision function (SDF)
embraces all mappings of the form [data -> decision]. An SDF need not perform
statistical inference; that is, it need not use data to draw conclusions about
the true state of nature. Inference-based SDFs have the sequential form [data
-> inference -> decision]. This paper motivates inference-based SDFs as
practical procedures for decision making that may accomplish some of what Wald
envisioned. The paper first addresses binary choice problems, where all SDFs
may be viewed as hypothesis tests. It next considers as-if optimization, which
uses a point estimate of the true state as if the estimate were accurate. It
then extends this idea to as-if maximin and minimax-regret decisions, which use
point estimates of some features of the true state as if they were accurate.
The paper primarily uses finite-sample maximum regret to evaluate the
performance of inference-based SDFs. To illustrate abstract ideas, it presents
specific findings concerning treatment choice and point prediction with sample
data.

arXiv link: http://arxiv.org/abs/1909.06853v1

Econometrics arXiv paper, submitted: 2019-09-14

Comparing the forecasting of cryptocurrencies by Bayesian time-varying volatility models

Authors: Rick Bohte, Luca Rossini

This paper studies the forecastability of cryptocurrency time series, focusing
on the four most capitalized cryptocurrencies: Bitcoin, Ethereum, Litecoin and
Ripple. Different Bayesian models are compared, including models with constant
and time-varying volatility, such as stochastic volatility and GARCH.
Moreover, some crypto-predictors are included in the analysis, such as the
S&P 500 and the Nikkei 225. The results show that stochastic volatility
significantly outperforms the VAR benchmark in both point and density
forecasting. Regarding the error distribution of the stochastic volatility
model, the Student-t specification outperforms the standard normal approach.

arXiv link: http://arxiv.org/abs/1909.06599v1

Econometrics arXiv updated paper (originally submitted: 2019-09-12)

Fast Algorithms for the Quantile Regression Process

Authors: Victor Chernozhukov, Iván Fernández-Val, Blaise Melly

The widespread use of quantile regression methods depends crucially on the
existence of fast algorithms. Despite numerous algorithmic improvements, the
computation time is still non-negligible because researchers often estimate
many quantile regressions and use the bootstrap for inference. We suggest two
new fast algorithms for the estimation of a sequence of quantile regressions at
many quantile indexes. The first algorithm applies the preprocessing idea of
Portnoy and Koenker (1997) but exploits a previously estimated quantile
regression to guess the sign of the residuals. This step allows for a reduction
of the effective sample size. The second algorithm starts from a previously
estimated quantile regression at a similar quantile index and updates it using
a single Newton-Raphson iteration. The first algorithm is exact, while the
second is only asymptotically equivalent to the traditional quantile regression
estimator. We also apply the preprocessing idea to the bootstrap by using the
sample estimates to guess the sign of the residuals in the bootstrap sample.
Simulations show that our new algorithms provide very large improvements in
computation time without significant (if any) cost in the quality of the
estimates. For instance, we divide by 100 the time required to estimate 99
quantile regressions with 20 regressors and 50,000 observations.

arXiv link: http://arxiv.org/abs/1909.05782v2
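
The preprocessing idea can be sketched as follows (a simplified illustration, not the
authors' algorithm): a previously computed fit at a nearby quantile is used to guess
residual signs, the confidently signed observations are collapsed into two aggregate
pseudo-observations, and a much smaller quantile regression is solved. The band size and
the use of statsmodels' QuantReg are assumptions of the example; the full algorithm also
verifies the guessed signs and refits if any guess is wrong.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n, p = 50_000, 20
    X = sm.add_constant(rng.normal(size=(n, p)))
    y = X @ rng.normal(size=p + 1) + rng.standard_t(df=5, size=n)

    tau_prev, tau = 0.50, 0.55
    beta_prev = sm.QuantReg(y, X).fit(q=tau_prev).params    # previously computed fit
    r = y - X @ beta_prev

    m = int(3 * np.sqrt(n * (p + 1)))                       # size of the "uncertain" band
    order = np.argsort(np.abs(r))
    keep = order[:m]                                        # residual sign uncertain: keep
    below = order[m:][r[order[m:]] < 0]                     # confidently below the plane
    above = order[m:][r[order[m:]] > 0]                     # confidently above the plane

    # two glob pseudo-observations carry the aggregated gradient contributions
    X_small = np.vstack([X[keep], X[below].sum(axis=0), X[above].sum(axis=0)])
    y_small = np.concatenate([y[keep], [-1e10], [1e10]])

    beta_fast = sm.QuantReg(y_small, X_small).fit(q=tau).params
    beta_full = sm.QuantReg(y, X).fit(q=tau).params         # full fit, for comparison only
    print("max coefficient difference:", np.max(np.abs(beta_fast - beta_full)))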

Econometrics arXiv paper, submitted: 2019-09-12

A Consistent LM Type Specification Test for Semiparametric Panel Data Models

Authors: Ivan Korolev

This paper develops a consistent series-based specification test for
semiparametric panel data models with fixed effects. The test statistic
resembles the Lagrange Multiplier (LM) test statistic in parametric models and
is based on a quadratic form in the restricted model residuals. The use of
series methods facilitates both estimation of the null model and computation of
the test statistic. The asymptotic distribution of the test statistic is
standard normal, so that appropriate critical values can easily be computed.
The projection property of series estimators allows me to develop a degrees of
freedom correction. This correction makes it possible to account for the
estimation variance and obtain refined asymptotic results. It also
substantially improves the finite sample performance of the test.

arXiv link: http://arxiv.org/abs/1909.05649v1

Econometrics arXiv paper, submitted: 2019-09-12

Estimation and Applications of Quantile Regression for Binary Longitudinal Data

Authors: Mohammad Arshad Rahman, Angela Vossmeyer

This paper develops a framework for quantile regression in binary
longitudinal data settings. A novel Markov chain Monte Carlo (MCMC) method is
designed to fit the model and its computational efficiency is demonstrated in a
simulation study. The proposed approach is flexible in that it can account for
common and individual-specific parameters, as well as multivariate
heterogeneity associated with several covariates. The methodology is applied to
study female labor force participation and home ownership in the United States.
The results offer new insights at the various quantiles, which are of interest
to policymakers and researchers alike.

arXiv link: http://arxiv.org/abs/1909.05560v1

Econometrics arXiv updated paper (originally submitted: 2019-09-12)

Quantile regression methods for first-price auctions

Authors: Nathalie Gimenes, Emmanuel Guerre

The paper proposes a quantile-regression inference framework for first-price
auctions with symmetric risk-neutral bidders under the independent
private-value paradigm. It is first shown that a private-value quantile
regression generates a quantile regression for the bids. The private-value
quantile regression can be easily estimated from the bid quantile regression
and its derivative with respect to the quantile level. This also allows
testing various specification or exogeneity null hypotheses using the observed
bids in a simple way. A new local polynomial technique is proposed to estimate
the latter over the whole quantile-level interval. Plug-in estimation of
functionals is also considered, as needed for the expected revenue or for the
case of CRRA risk-averse bidders, which is amenable to our framework. A
quantile-regression analysis of USFS timber auctions is found to be more
appropriate than the homogenized-bid methodology and illustrates the
contribution of each explanatory variable to the private-value distribution.
Linear interactive sieve extensions are proposed and studied in the Appendices.

arXiv link: http://arxiv.org/abs/1909.05542v2

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2019-09-12

Recovering Preferences from Finite Data

Authors: Christopher P. Chambers, Federico Echenique, Nicolas Lambert

We study preferences estimated from finite choice experiments and provide
sufficient conditions for convergence to a unique underlying "true" preference.
Our conditions are weak, and therefore valid in a wide range of economic
environments. We develop applications to expected utility theory, choice over
consumption bundles, menu choice and intertemporal consumption. Our framework
unifies the revealed preference tradition with models that allow for errors.

arXiv link: http://arxiv.org/abs/1909.05457v4

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2019-09-11

Validating Weak-form Market Efficiency in United States Stock Markets with Trend Deterministic Price Data and Machine Learning

Authors: Samuel Showalter, Jeffrey Gropp

The Efficient Market Hypothesis has been a staple of economics research for
decades. In particular, weak-form market efficiency -- the notion that past
prices cannot predict future performance -- is strongly supported by
econometric evidence. In contrast, machine learning algorithms implemented to
predict stock price have been touted, to varying degrees, as successful.
Moreover, some data scientists boast the ability to garner above-market returns
using price data alone. This study endeavors to connect existing econometric
research on weak-form efficient markets with data science innovations in
algorithmic trading. First, a traditional exploration of stationarity in stock
index prices over the past decade is conducted with Augmented Dickey-Fuller and
Variance Ratio tests. Then, an algorithmic trading platform is implemented with
the use of five machine learning algorithms. Econometric findings identify
potential stationarity, hinting that technical evaluation may be possible, though
algorithmic trading results find little predictive power in any machine
learning model, even when using trend-specific metrics. Accounting for
transaction costs and risk, no system achieved above-market returns
consistently. Our findings reinforce the validity of weak-form market
efficiency.

arXiv link: http://arxiv.org/abs/1909.05151v1

Econometrics arXiv updated paper (originally submitted: 2019-09-11)

Matching Estimators with Few Treated and Many Control Observations

Authors: Bruno Ferman

We analyze the properties of matching estimators when there are few treated,
but many control observations. We show that, under standard assumptions, the
nearest neighbor matching estimator for the average treatment effect on the
treated is asymptotically unbiased in this framework. However, when the number
of treated observations is fixed, the estimator is not consistent, and it is
generally not asymptotically normal. Since standard inference methods are
inadequate, we propose alternative inference methods, based on the theory of
randomization tests under approximate symmetry, that are asymptotically valid
in this framework. We show that these tests are valid under relatively strong
assumptions when the number of treated observations is fixed, and under weaker
assumptions when the number of treated observations increases, but at a lower
rate relative to the number of control observations.

arXiv link: http://arxiv.org/abs/1909.05093v4

Econometrics arXiv updated paper (originally submitted: 2019-09-11)

Direct and Indirect Effects based on Changes-in-Changes

Authors: Martin Huber, Mark Schelker, Anthony Strittmatter

We propose a novel approach for causal mediation analysis based on
changes-in-changes assumptions restricting unobserved heterogeneity over time.
This allows disentangling the causal effect of a binary treatment on a
continuous outcome into an indirect effect operating through a binary
intermediate variable (called mediator) and a direct effect running via other
causal mechanisms. We identify average and quantile direct and indirect effects
for various subgroups under the condition that the outcome is monotonic in the
unobserved heterogeneity and that the distribution of the latter does not
change over time conditional on the treatment and the mediator. We also provide
a simulation study and an empirical application to the Jobs II programme.

arXiv link: http://arxiv.org/abs/1909.04981v3

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2019-09-11

Estimating the volatility of Bitcoin using GARCH models

Authors: Samuel Asante Gyamerah

In this paper, an application of three GARCH-type models (sGARCH, iGARCH, and
tGARCH) with the Student t-distribution, the Generalized Error Distribution
(GED), and the Normal Inverse Gaussian (NIG) distribution is examined. This
setup allows for the modeling of volatility clustering effects and of the
leptokurtic and skewed distributions in the return series of Bitcoin. Compared
to the other two distributions, the normal inverse Gaussian distribution
adequately captured the fat tails and skewness in all the GARCH-type models.
The tGARCH model was the best model, as it described the asymmetric occurrence
of shocks in the Bitcoin market; that is, the response of investors to the
same amount of good and bad news is distinct. From the empirical results, it
can be concluded that tGARCH-NIG was the best model to estimate the volatility
in the return series of Bitcoin. Generally, it would be optimal to use the NIG
distribution in GARCH-type models since the time series of most
cryptocurrencies are leptokurtic.

arXiv link: http://arxiv.org/abs/1909.04903v2
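
A minimal Python sketch of a related exercise using the arch package (the paper's
analysis is broader and also considers the NIG distribution, which is not used here):
fit an asymmetric GARCH with Student-t errors to simulated Bitcoin-style returns and
produce a short volatility forecast. The price series is a simulated placeholder.

    import numpy as np
    import pandas as pd
    from arch import arch_model

    rng = np.random.default_rng(3)
    prices = pd.Series(30000 * np.exp(np.cumsum(rng.normal(0, 0.03, size=1500))))
    returns = 100 * np.log(prices).diff().dropna()      # percent log returns

    # dist can be "normal", "t", "ged"; o=1 adds a GJR-type asymmetry term,
    # loosely analogous to the threshold GARCH discussed in the paper.
    model = arch_model(returns, vol="GARCH", p=1, o=1, q=1, dist="t")
    res = model.fit(disp="off")
    print(res.summary())
    print(res.forecast(horizon=5).variance.iloc[-1])    # 5-day-ahead variance forecast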

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2019-09-11

Bayesian Inference on Volatility in the Presence of Infinite Jump Activity and Microstructure Noise

Authors: Qi Wang, José E. Figueroa-López, Todd Kuffner

Volatility estimation based on high-frequency data is key to accurately
measure and control the risk of financial assets. A L\'{e}vy process with
infinite jump activity and microstructure noise is considered one of the
simplest, yet accurate enough, models for financial data at high-frequency.
Utilizing this model, we propose a "purposely misspecified" posterior of the
volatility obtained by ignoring the jump-component of the process. The
misspecified posterior is further corrected by a simple estimate of the
location shift and re-scaling of the log likelihood. Our main result
establishes a Bernstein-von Mises (BvM) theorem, which states that the proposed
adjusted posterior is asymptotically Gaussian, centered at a consistent
estimator, and with variance equal to the inverse of the Fisher information. In
the absence of microstructure noise, our approach can be extended to inferences
of the integrated variance of a general It\^o semimartingale. Simulations are
provided to demonstrate the accuracy of the resulting credible intervals, and
the frequentist properties of the approximate Bayesian inference based on the
adjusted posterior.

arXiv link: http://arxiv.org/abs/1909.04853v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2019-09-10

Regression to the Mean's Impact on the Synthetic Control Method: Bias and Sensitivity Analysis

Authors: Nicholas Illenberger, Dylan S. Small, Pamela A. Shaw

To make informed policy recommendations from observational data, we must be
able to discern true treatment effects from random noise and effects due to
confounding. Difference-in-differences techniques that match treated units to
control units based on pre-treatment outcomes, such as the synthetic control
approach, have been presented as principled methods to account for confounding.
However, we show that use of synthetic controls or other matching procedures
can introduce regression to the mean (RTM) bias into estimates of the average
treatment effect on the treated. Through simulations, we show RTM bias can lead
to inflated type I error rates as well as decreased power in typical policy
evaluation settings. Further, we provide a novel correction for RTM bias which
can reduce bias and attain appropriate type I error rates. This correction can
be used to perform a sensitivity analysis which determines how results may be
affected by RTM. We use our proposed correction and sensitivity analysis to
reanalyze data concerning the effects of California's Proposition 99, a
large-scale tobacco control program, on statewide smoking rates.

arXiv link: http://arxiv.org/abs/1909.04706v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2019-09-10

Double Robustness for Complier Parameters and a Semiparametric Test for Complier Characteristics

Authors: Rahul Singh, Liyang Sun

We propose a semiparametric test to evaluate (i) whether different
instruments induce subpopulations of compliers with the same observable
characteristics on average, and (ii) whether compliers have observable
characteristics that are the same as the full population on average. The test
is a flexible robustness check for the external validity of instruments. We use
it to reinterpret the difference in LATE estimates that Angrist and Evans
(1998) obtain when using different instrumental variables. To justify the test,
we characterize the doubly robust moment for Abadie (2003)'s class of complier
parameters, and we analyze a machine learning update to $\kappa$ weighting.

arXiv link: http://arxiv.org/abs/1909.05244v7
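
A small simulated sketch of Abadie (2003)-style kappa weighting, the building block whose
doubly robust version the paper characterizes: the instrument propensity score is
estimated by logistic regression and the kappa-weighted mean recovers average complier
characteristics. All data and parameter values are illustrative assumptions.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(4)
    n = 20_000
    x = rng.normal(size=(n, 2))                             # observable characteristics
    pz = 1 / (1 + np.exp(-0.5 * x[:, 0]))                   # instrument assignment prob.
    z = rng.binomial(1, pz)
    complier = rng.binomial(1, 0.4, size=n)                 # latent complier status
    always = (rng.uniform(size=n) < 0.2) & (complier == 0)  # always-takers among the rest
    d = np.where(complier == 1, z, always.astype(int))      # treatment take-up

    p_hat = LogisticRegression().fit(x, z).predict_proba(x)[:, 1]
    kappa = 1 - d * (1 - z) / (1 - p_hat) - (1 - d) * z / p_hat

    # kappa-weighted mean of x estimates E[x | complier]
    mean_compliers = (kappa[:, None] * x).sum(axis=0) / kappa.sum()
    print("kappa-weighted complier mean of x:", mean_compliers)
    print("true complier mean of x:         ", x[complier == 1].mean(axis=0))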

Econometrics arXiv paper, submitted: 2019-09-10

Virtual Historical Simulation for estimating the conditional VaR of large portfolios

Authors: Christian Francq, Jean-Michel Zakoian

In order to estimate the conditional risk of a portfolio's return, two
strategies can be advocated. A multivariate strategy requires estimating a
dynamic model for the vector of risk factors, which is often challenging, when
at all possible, for large portfolios. A univariate approach based on a dynamic
model for the portfolio's return seems more attractive. However, when the
combination of the individual returns is time varying, the portfolio's return
series is typically nonstationary, which may invalidate statistical inference.
An alternative approach consists in reconstituting a "virtual portfolio", whose
returns are built using the current composition of the portfolio and for which
a stationary dynamic model can be estimated.
This paper establishes the asymptotic properties of this method, which we call
Virtual Historical Simulation. Numerical illustrations on simulated and real
data are provided.

arXiv link: http://arxiv.org/abs/1909.04661v1
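
A stylized sketch of the virtual-portfolio idea (not the authors' estimator): apply
today's portfolio weights to the whole history of asset returns, fit a univariate
volatility model to the resulting series, and read off a conditional VaR. The simulated
returns, the GARCH specification, and the use of the arch package are assumptions.

    import numpy as np
    from scipy import stats
    from arch import arch_model

    rng = np.random.default_rng(5)
    T, k = 2000, 10
    asset_returns = rng.normal(0, 1.0, size=(T, k))     # percent returns of k assets
    w_current = rng.dirichlet(np.ones(k))               # today's portfolio weights

    # virtual historical returns: current weights applied to the whole history
    virtual_returns = asset_returns @ w_current

    # fit a univariate volatility model to the stationary virtual series and
    # compute a one-day-ahead 1% conditional VaR from the fitted Student-t
    res = arch_model(virtual_returns, vol="GARCH", p=1, q=1, dist="t").fit(disp="off")
    f = res.forecast(horizon=1)
    mu = f.mean.iloc[-1, 0]
    sigma = np.sqrt(f.variance.iloc[-1, 0])
    nu = res.params["nu"]                                # estimated degrees of freedom
    q01 = stats.t.ppf(0.01, df=nu) * np.sqrt((nu - 2) / nu)   # standardized-t quantile
    print("1-day 1% VaR (percent):", -(mu + sigma * q01))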

Econometrics arXiv updated paper (originally submitted: 2019-09-10)

Dynamics of reallocation within India's income distribution

Authors: Anand Sahasranaman, Henrik Jeldtoft Jensen

We investigate the nature and extent of reallocation occurring within the
Indian income distribution, with a particular focus on the dynamics of the
bottom of the distribution. Specifically, we use a stochastic model of
Geometric Brownian Motion with a reallocation parameter that was constructed to
capture the quantum and direction of composite redistribution implied in the
income distribution. It is well known that inequality has been rising in India
in the recent past, but the assumption has been that while the rich benefit
more than proportionally from economic growth, the poor are also better off
than before. Findings from our model refute this, as we find that since the
early 2000s reallocation has consistently been negative, and that the Indian
income distribution has entered a regime of perverse redistribution of
resources from the poor to the rich. Outcomes from the model indicate not only
that income shares of the bottom decile (roughly 1%) and bottom percentile
(roughly 0.03%) are at historic lows, but also that real incomes of the bottom
decile (-2.5%) and bottom percentile (-6%) have declined in the 2000s. We
validate these findings
using income distribution data and find support for our contention of
persistent negative reallocation in the 2000s. We characterize these findings
in the context of increasing informalization of the workforce in the formal
manufacturing and service sectors, as well as the growing economic insecurity
of the agricultural workforce in India. Significant structural changes will be
required to address this phenomenon.

arXiv link: http://arxiv.org/abs/1909.04452v4
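
A minimal simulation sketch of a geometric Brownian motion with a reallocation parameter,
in the spirit of the model class used above; parameter values are illustrative, and a
negative reallocation parameter reproduces the poor-to-rich dynamic described in the
abstract (incomes can turn negative under sustained negative reallocation in this model).

    import numpy as np

    rng = np.random.default_rng(6)
    N, T, dt = 10_000, 200, 1.0
    mu, sigma, tau = 0.02, 0.2, -0.01          # negative tau: reallocation to the rich

    x = np.ones(N)                              # initial incomes
    bottom_decile_share = []
    for _ in range(T):
        growth = np.exp((mu - 0.5 * sigma**2) * dt
                        + sigma * np.sqrt(dt) * rng.normal(size=N))
        # multiplicative growth plus reallocation of a fraction tau of the gap to the mean
        x = x * growth + tau * (x.mean() - x) * dt
        q10 = np.quantile(x, 0.10)
        bottom_decile_share.append(x[x <= q10].sum() / x.sum())

    print("bottom-decile income share, start vs end: %.4f -> %.4f"
          % (bottom_decile_share[0], bottom_decile_share[-1]))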

Econometrics arXiv updated paper (originally submitted: 2019-09-09)

Tree-based Synthetic Control Methods: Consequences of moving the US Embassy

Authors: Nicolaj Søndergaard Mühlbach, Mikkel Slot Nielsen

We recast the synthetic control method for evaluating policies as a
counterfactual prediction problem and replace its linear regression with a
nonparametric model
inspired by machine learning. The proposed method enables us to achieve
accurate counterfactual predictions and we provide theoretical guarantees. We
apply our method to a highly debated policy: the relocation of the US embassy
to Jerusalem. In Israel and Palestine, we find that the average number of
weekly conflicts has increased by roughly 103% over 48 weeks since the
relocation was announced on December 6, 2017. By using conformal inference and
placebo tests, we justify our model and find the increase to be statistically
significant.

arXiv link: http://arxiv.org/abs/1909.03968v3

Econometrics arXiv updated paper (originally submitted: 2019-09-08)

An Economic Topology of the Brexit vote

Authors: Pawel Dlotko, Lucy Minford, Simon Rudkin, Wanling Qiu

A desire to understand the decision of the UK to leave the European Union,
Brexit, in the referendum of June 2016 has continued to occupy academics, the
media and politicians. Using the topological data analysis ball mapper
algorithm, we extract information from multi-dimensional datasets gathered on
Brexit voting and regional socio-economic characteristics. While we find broad
patterns consistent with extant empirical work, we also provide evidence that
support for Leave drew from a far more homogeneous demographic than Remain.
Obtaining votes from
this concise set was more straightforward for Leave campaigners than was
Remain's task of mobilising a diverse group to oppose Brexit.

arXiv link: http://arxiv.org/abs/1909.03490v2

Econometrics arXiv updated paper (originally submitted: 2019-09-08)

Multiway Cluster Robust Double/Debiased Machine Learning

Authors: Harold D. Chiang, Kengo Kato, Yukun Ma, Yuya Sasaki

This paper investigates double/debiased machine learning (DML) under multiway
clustered sampling environments. We propose a novel multiway cross fitting
algorithm and a multiway DML estimator based on this algorithm. We also develop
a multiway cluster robust standard error formula. Simulations indicate that the
proposed procedure has favorable finite sample performance. Applying the
proposed method to market share data for demand analysis, we obtain larger
two-way cluster robust standard errors than non-robust ones.

arXiv link: http://arxiv.org/abs/1909.03489v3

Econometrics arXiv updated paper (originally submitted: 2019-09-07)

Identifying Different Definitions of Future in the Assessment of Future Economic Conditions: Application of PU Learning and Text Mining

Authors: Masahiro Kato

The Economy Watcher Survey, which is a market survey published by the
Japanese government, contains assessments of current and future economic
conditions by people from various fields. Although this survey provides
insights regarding economic policy for policymakers, a clear definition of the
word "future" in future economic conditions is not provided. Hence, the
assessments respondents provide in the survey are simply based on their
interpretations of the meaning of "future." This motivated us to reveal the
different interpretations of the future in their judgments of future economic
conditions by applying weakly supervised learning and text mining. In our
research, we separate the assessments of future economic conditions into
economic conditions of the near and distant future using learning from positive
and unlabeled data (PU learning). Because the dataset includes data from
several periods, we devised a new architecture, based on the idea of
multi-task learning, that enables neural networks to conduct PU learning and efficiently
learn a classifier. Our empirical analysis confirmed that the proposed method
could separate the future economic conditions, and we interpreted the
classification results to obtain intuitions for policymaking.

arXiv link: http://arxiv.org/abs/1909.03348v3

Econometrics arXiv updated paper (originally submitted: 2019-09-06)

Shrinkage Estimation of Network Spillovers with Factor Structured Errors

Authors: Ayden Higgins, Federico Martellosio

This paper explores the estimation of a panel data model with cross-sectional
interaction that is flexible both in its approach to specifying the network of
connections between cross-sectional units, and in controlling for unobserved
heterogeneity. It is assumed that there are different sources of information
available on a network, which can be represented in the form of multiple
weights matrices. These matrices may reflect observed links, different measures
of connectivity, groupings or other network structures, and the number of
matrices may be increasing with sample size. A penalised quasi-maximum
likelihood estimator is proposed which aims to alleviate the risk of network
misspecification by shrinking the coefficients of irrelevant weights matrices
to exactly zero. Moreover, controlling for unobserved factors in estimation
provides a safeguard against the misspecification that might arise from
unobserved heterogeneity. The asymptotic properties of the estimator are
derived in a framework where the true value of each parameter remains fixed as
the total number of parameters increases. A Monte Carlo simulation is used to
assess finite sample performance, and in an empirical application the method is
applied to study the prevalence of network spillovers in determining growth
rates across countries.

arXiv link: http://arxiv.org/abs/1909.02823v4

Econometrics arXiv updated paper (originally submitted: 2019-09-05)

Using Wasserstein Generative Adversarial Networks for the Design of Monte Carlo Simulations

Authors: Susan Athey, Guido Imbens, Jonas Metzger, Evan Munro

When researchers develop new econometric methods it is common practice to
compare the performance of the new methods to those of existing methods in
Monte Carlo studies. The credibility of such Monte Carlo studies is often
limited because of the freedom the researcher has in choosing the design. In
recent years a new class of generative models, termed Generative Adversarial
Networks (GANs), has emerged in the machine learning literature; GANs can be
used to systematically generate artificial data that closely mimic real
economic datasets, while limiting the degrees of freedom for the researcher
and optionally satisfying privacy guarantees with respect to their training
data. In addition, if an applied researcher is concerned with the performance
of a particular statistical method on a specific data set (beyond its
theoretical properties in large samples), she may wish to assess the
performance, e.g., the coverage rate of confidence intervals or the bias of
the estimator, using simulated data which resembles her setting. To illustrate
these methods we apply Wasserstein GANs (WGANs) to compare a number of
different estimators for
average treatment effects under unconfoundedness in three distinct settings
(corresponding to three real data sets) and present a methodology for assessing
the robustness of the results. In this example, we find that (i) there is not
one estimator that outperforms the others in all three settings, so researchers
should tailor their analytic approach to a given setting, and (ii) systematic
simulation studies can be helpful for selecting among competing methods in this
situation.

arXiv link: http://arxiv.org/abs/1909.02210v3

Econometrics arXiv updated paper (originally submitted: 2019-09-04)

Inference in Difference-in-Differences: How Much Should We Trust in Independent Clusters?

Authors: Bruno Ferman

We analyze the challenges for inference in difference-in-differences (DID)
when there is spatial correlation. We present novel theoretical insights and
empirical evidence on the settings in which ignoring spatial correlation should
lead to more or less distortions in DID applications. We show that details such
as the time frame used in the estimation, the choice of the treated and control
groups, and the choice of the estimator, are key determinants of distortions
due to spatial correlation. We also analyze the feasibility and trade-offs
involved in a series of alternatives to take spatial correlation into account.
Given that, we provide relevant recommendations for applied researchers on how
to mitigate and assess the possibility of inference distortions due to spatial
correlation.

arXiv link: http://arxiv.org/abs/1909.01782v7

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2019-09-04

Testing nonparametric shape restrictions

Authors: Tatiana Komarova, Javier Hidalgo

We describe and examine a test for a general class of shape constraints, such
as constraints on the signs of derivatives, U-(S-)shape, symmetry,
quasi-convexity, log-convexity, $r$-convexity, among others, in a nonparametric
framework using partial sums empirical processes. We show that, after a
suitable transformation, its asymptotic distribution is a functional of the
standard Brownian motion, so that critical values are available. However, due
to the possible poor approximation of the asymptotic critical values to the
finite sample ones, we also describe a valid bootstrap algorithm.

arXiv link: http://arxiv.org/abs/1909.01675v2

Econometrics arXiv updated paper (originally submitted: 2019-09-03)

Bias and Consistency in Three-way Gravity Models

Authors: Martin Weidner, Thomas Zylkin

We study the incidental parameter problem for the “three-way” Poisson
Pseudo-Maximum Likelihood (“PPML”) estimator recently recommended for
identifying the effects of trade policies and in other panel data gravity
settings. Despite the number and variety of fixed effects involved, we confirm
PPML is consistent for fixed $T$ and we show it is in fact the only estimator
among a wide range of PML gravity estimators that is generally consistent in
this context when $T$ is fixed. At the same time, asymptotic confidence
intervals in fixed-$T$ panels are not correctly centered at the true point
estimates, and cluster-robust variance estimates used to construct standard
errors are generally biased as well. We characterize each of these biases
analytically and show both numerically and empirically that they are salient
even for real-data settings with a large number of countries. We also offer
practical remedies that can be used to obtain more reliable inferences of the
effects of trade policies and other time-varying gravity variables, which we
make available via an accompanying Stata package called ppml_fe_bias.

arXiv link: http://arxiv.org/abs/1909.01327v6

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2019-09-03

State Drug Policy Effectiveness: Comparative Policy Analysis of Drug Overdose Mortality

Authors: Jarrod Olson, Po-Hsu Allen Chen, Marissa White, Nicole Brennan, Ning Gong

Opioid overdose rates have reached an epidemic level and state-level policy
innovations have followed suit in an effort to prevent overdose deaths.
State-level drug law is a set of policies that may reinforce or undermine each
other, and analysts have a limited set of tools for handling the policy
collinearity using statistical methods. This paper uses a machine learning
method called hierarchical clustering to empirically generate "policy bundles"
by grouping states with similar sets of policies in force at a given time
together for analysis in a 50-state, 10-year interrupted time series regression
with drug overdose deaths as the dependent variable. Policy clusters were
generated from 138 binomial variables observed by state and year from the
Prescription Drug Abuse Policy System. Clustering reduced the policies to a set
of 10 bundles. The approach allows for ranking of the relative effect of
different bundles and is a tool to recommend those most likely to succeed. This
study shows that a set of policies balancing Medication Assisted Treatment,
Naloxone Access, Good Samaritan Laws, Prescription Drug Monitoring Programs,
and legalization of medical marijuana
leads to a reduced number of overdose deaths, but not until its second year in
force.

arXiv link: http://arxiv.org/abs/1909.01936v3
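
A minimal sketch of the clustering step on simulated binary policy indicators (the real
analysis uses data from the Prescription Drug Abuse Policy System): hierarchical
clustering of state-year policy vectors, cut into 10 bundles.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(7)
    n_state_years, n_policies = 500, 138
    policies = rng.binomial(1, 0.3, size=(n_state_years, n_policies)).astype(float)

    # Ward linkage on the binary indicators; cut the tree into 10 bundles,
    # matching the number of bundles reported in the paper.
    Z = linkage(policies, method="ward")
    bundle = fcluster(Z, t=10, criterion="maxclust")
    print("bundle sizes:", np.bincount(bundle)[1:])

    # The bundle labels would then enter the 50-state, 10-year interrupted
    # time-series regression as categorical regressors.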

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2019-09-03

Are Bitcoins price predictable? Evidence from machine learning techniques using technical indicators

Authors: Samuel Asante Gyamerah

The uncertainties in future Bitcoin price make it difficult to accurately
predict the price of Bitcoin. Accurately predicting the price for Bitcoin is
therefore important for the decision-making process of investors and market players
in the cryptocurrency market. Using historical data from 01/01/2012 to
16/08/2019, machine learning techniques (Generalized linear model via penalized
maximum likelihood, random forest, support vector regression with linear
kernel, and stacking ensemble) were used to forecast the price of Bitcoin. The
prediction models employed key and high dimensional technical indicators as the
predictors. The performance of these techniques were evaluated using mean
absolute percentage error (MAPE), root mean square error (RMSE), mean absolute
error (MAE), and coefficient of determination (R-squared). The performance
metrics revealed that the stacking ensemble model with two base learner (random
forest and generalized linear model via penalized maximum likelihood) and
support vector regression with linear kernel as meta-learner was the optimal
model for forecasting Bitcoin price. The MAPE, RMSE, MAE, and R-squared values
for the stacking ensemble model were 0.0191%, 15.5331 USD, 124.5508 USD, and
0.9967 respectively. These values show a high degree of reliability in
predicting the price of Bitcoin using the stacking ensemble model. Accurately
predicting the future price of Bitcoin will yield significant returns for
investors and market players in the cryptocurrency market.

arXiv link: http://arxiv.org/abs/1909.01268v1
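
A small sketch of the stacking architecture described above using scikit-learn, with an
elastic net standing in for the penalized-maximum-likelihood GLM; the simulated features
stand in for the technical indicators and the target for the Bitcoin price.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor, StackingRegressor
    from sklearn.linear_model import ElasticNet
    from sklearn.svm import SVR
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_absolute_percentage_error

    rng = np.random.default_rng(8)
    n, p = 2000, 25
    X = rng.normal(size=(n, p))                                          # indicators
    y = X[:, :5].sum(axis=1) * 100 + 8000 + rng.normal(0, 50, size=n)    # "price"

    stack = StackingRegressor(
        estimators=[("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
                    ("glm", ElasticNet(alpha=0.1))],   # penalized-GLM stand-in
        final_estimator=SVR(kernel="linear"),
    )
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    stack.fit(X_tr, y_tr)
    print("out-of-sample MAPE:", mean_absolute_percentage_error(y_te, stack.predict(X_te)))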

Econometrics arXiv updated paper (originally submitted: 2019-09-02)

SortedEffects: Sorted Causal Effects in R

Authors: Shuowen Chen, Victor Chernozhukov, Iván Fernández-Val, Ye Luo

Chernozhukov et al. (2018) proposed the sorted effect method for nonlinear
regression models. This method consists of reporting percentiles of the partial
effects in addition to the average commonly used to summarize the heterogeneity
in the partial effects. They also proposed to use the sorted effects to carry
out classification analysis where the observational units are classified as
most and least affected if their causal effects are above or below some tail
sorted effects. The R package SortedEffects implements the estimation and
inference methods therein and provides tools to visualize the results. This
vignette serves as an introduction to the package and displays basic
functionality of the functions within.

arXiv link: http://arxiv.org/abs/1909.00836v3

Econometrics arXiv updated paper (originally submitted: 2019-08-31)

Fixed-k Inference for Conditional Extremal Quantiles

Authors: Yuya Sasaki, Yulong Wang

We develop a new extreme value theory for repeated cross-sectional and panel
data to construct asymptotically valid confidence intervals (CIs) for
conditional extremal quantiles from a fixed number $k$ of nearest-neighbor tail
observations. As a by-product, we also construct CIs for extremal quantiles of
coefficients in linear random coefficient models. For any fixed $k$, the CIs
are uniformly valid without parametric assumptions over a set of nonparametric
data generating processes associated with various tail indices. Simulation
studies show that our CIs exhibit better small-sample coverage and length
properties than alternative nonparametric methods based on asymptotic
normality. Applying the proposed method to Natality Vital Statistics, we study
factors of extremely low birth weights. We find that signs of major effects are
the same as those found in preceding studies based on parametric models, but
with different magnitudes.

arXiv link: http://arxiv.org/abs/1909.00294v3

Econometrics arXiv updated paper (originally submitted: 2019-08-31)

Mapping Firms' Locations in Technological Space: A Topological Analysis of Patent Statistics

Authors: Emerson G. Escolar, Yasuaki Hiraoka, Mitsuru Igami, Yasin Ozcan

Where do firms innovate? Mapping their locations and directions in
technological space is challenging due to its high dimensionality. We propose a
new method to characterize firms' inventive activities via topological data
analysis (TDA) that represents high-dimensional data in a shape graph. Applying
this method to 333 major firms' patents in 1976--2005 reveals substantial
heterogeneity: some firms remain undifferentiated; others develop unique
portfolios. Firms with unique trajectories, which we define and measure
graph-theoretically as "flares" in the Mapper graph, perform better. This
association is statistically and economically significant, and continues to
hold after we control for portfolio size, firm survivorship, industry
classification, and firm fixed effects. By contrast, existing techniques --
such as principal component analysis (PCA) and Jaffe's (1989) clustering method
-- struggle to track these firm-level dynamics.

arXiv link: http://arxiv.org/abs/1909.00257v7

Econometrics arXiv paper, submitted: 2019-08-31

Rethinking travel behavior modeling representations through embeddings

Authors: Francisco C. Pereira

This paper introduces the concept of travel behavior embeddings, a method for
re-representing discrete variables that are typically used in travel demand
modeling, such as mode, trip purpose, education level, family type or
occupation. This re-representation process essentially maps those variables
into a latent space called the embedding space. The benefit of this is
that such spaces allow for richer nuances than the typical transformations used
in categorical variables (e.g. dummy encoding, contrasted encoding, principal
components analysis). While the usage of latent variable representations is not
new per se in travel demand modeling, the idea presented here brings several
innovations: it is an entirely data driven algorithm; it is informative and
consistent, since the latent space can be visualized and interpreted based on
distances between different categories; it preserves interpretability of
coefficients, despite being based on Neural Network principles; and it is
transferrable, in that embeddings learned from one dataset can be reused for
other ones, as long as travel behavior keeps consistent between the datasets.
The idea is strongly inspired by natural language processing techniques,
namely the word2vec algorithm, which underlies recent developments such as
automatic translation and next-word prediction. Our method is demonstrated
using a mode choice model, and shows improvements of up to 60%
with respect to initial likelihood, and up to 20% with respect to likelihood of
the corresponding traditional model (i.e. using dummy variables) in
out-of-sample evaluation. We provide a new Python package, called PyTre (PYthon
TRavel Embeddings), that others can straightforwardly use to replicate our
results or improve their own models. Our experiments are themselves based on an
open dataset (swissmetro).

arXiv link: http://arxiv.org/abs/1909.00154v1

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2019-08-30

Systemic Risk Clustering of China Internet Financial Based on t-SNE Machine Learning Algorithm

Authors: Mi Chuanmin, Xu Runjie, Lin Qingtong

With the rapid development of Internet finance, a large number of studies
have shown that Internet financial platforms have different financial systemic
risk characteristics when they are subject to macroeconomic shocks or fragile
internal crisis. From the perspective of the regional development of Internet
finance, this paper uses the t-SNE machine learning algorithm to mine China's
Internet finance development index, covering 31 provinces and 335 cities and
regions. The analysis documents peaked and heavy-tailed characteristics and
then proposes a three-way classification of Internet financial systemic risk,
providing more regionally targeted recommendations for managing the systemic
risk of Internet finance.

arXiv link: http://arxiv.org/abs/1909.03808v1
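
A minimal sketch of the embedding-and-grouping step on simulated indicators: t-SNE
reduces regional development-index indicators to two dimensions, and a simple k-means
step then assigns three risk classes. The indicator matrix and the use of k-means for the
final grouping are assumptions of the illustration.

    import numpy as np
    from sklearn.manifold import TSNE
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(9)
    n_regions, n_indicators = 335, 12
    X = rng.normal(size=(n_regions, n_indicators))      # development-index indicators

    emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
    risk_class = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(emb)
    print("regions per risk class:", np.bincount(risk_class))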

Econometrics arXiv paper, submitted: 2019-08-30

The economics of minority language use: theory and empirical evidence for a language game model

Authors: Stefan Sperlich, Jose-Ramon Uriarte

Language and cultural diversity is a fundamental aspect of the present world.
We study three modern multilingual societies -- the Basque Country, Ireland and
Wales -- which are endowed with two, linguistically distant, official
languages: $A$, spoken by all individuals, and $B$, spoken by a bilingual
minority. In all three cases a decline in the use of the minority language $B$
is observed, a sign of diversity loss. However, according to the Council of
Europe, the key factor in avoiding the shift away from $B$ is its use in all
domains. Thus, we
investigate the language choices of the bilinguals by means of an evolutionary
game theoretic model. We show that the language population dynamics has reached
an evolutionary stable equilibrium where a fraction of bilinguals have shifted
to speak $A$. Thus, this equilibrium captures the decline in the use of $B$. To
test the theory we build empirical models that predict the use of $B$ for each
proportion of bilinguals. We show that model-based predictions fit very well
the observed use of Basque, Irish, and Welsh.

arXiv link: http://arxiv.org/abs/1908.11604v1

Econometrics arXiv updated paper (originally submitted: 2019-08-28)

Infinitely Stochastic Micro Forecasting

Authors: Matúš Maciak, Ostap Okhrin, Michal Pešta

Forecasting costs is now a front-burner issue in empirical economics. We
propose an unconventional tool for stochastic prediction of future expenses
based on the individual (micro) developments of recorded events. Consider a
firm, enterprise, institution, or state that possesses knowledge about
particular historical events. For each event, there is a series of several
related subevents: payments or losses spread over time, which together lead to
an infinitely stochastic process. The issue, however, is that some events that
have already occurred are not necessarily reported yet. The aim is to forecast
future subevent flows coming from events that are already reported, that have
occurred but are not yet reported, and that have not yet occurred. Our
methodology is illustrated on quantitative risk assessment; however, it can be
applied to other areas such as
startups, epidemics, war damages, advertising and commercials, digital
payments, or drug prescription as manifested in the paper. As a theoretical
contribution, inference for infinitely stochastic processes is developed. In
particular, a non-homogeneous Poisson process with non-homogeneous Poisson
processes as marks is used, which includes for instance the Cox process as a
special case.

arXiv link: http://arxiv.org/abs/1908.10636v2
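
A small simulation sketch of the driving process described above: a non-homogeneous
Poisson process of events, each spawning its own non-homogeneous Poisson process of
subevents (payments). The intensities are illustrative choices, not the paper's
calibration.

    import numpy as np

    rng = np.random.default_rng(10)

    def nhpp_times(rate_fn, rate_max, horizon):
        """Simulate arrival times on [0, horizon] by Lewis-Shedler thinning."""
        t, out = 0.0, []
        while True:
            t += rng.exponential(1.0 / rate_max)
            if t > horizon:
                return np.array(out)
            if rng.uniform() < rate_fn(t) / rate_max:
                out.append(t)

    horizon = 10.0
    event_rate = lambda t: 5.0 * (1 + 0.5 * np.sin(t))           # events per year
    events = nhpp_times(event_rate, rate_max=7.5, horizon=horizon)

    # each event spawns its own subevent (payment) process after its occurrence
    subevent_rate = lambda s: 2.0 * np.exp(-0.7 * s)              # decaying payment intensity
    payments = [e + nhpp_times(subevent_rate, rate_max=2.0, horizon=5.0) for e in events]

    future = sum((p > horizon).sum() for p in payments)
    print(f"{len(events)} events, {future} payments falling after the observation horizon")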

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2019-08-28

Stock Price Forecasting and Hypothesis Testing Using Neural Networks

Authors: Kerda Varaku

In this work we use Recurrent Neural Networks and Multilayer Perceptrons to
predict NYSE, NASDAQ and AMEX stock prices from historical data. We experiment
with different architectures and compare data normalization techniques. Then,
we leverage those findings to question the efficient-market hypothesis through
a formal statistical test.

arXiv link: http://arxiv.org/abs/1908.11212v1

Econometrics arXiv updated paper (originally submitted: 2019-08-27)

Theory of Weak Identification in Semiparametric Models

Authors: Tetsuya Kaji

We provide general formulation of weak identification in semiparametric
models and an efficiency concept. Weak identification occurs when a parameter
is weakly regular, i.e., when it is locally homogeneous of degree zero. When
this happens, consistent or equivariant estimation is shown to be impossible.
We then show that there exists an underlying regular parameter that fully
characterizes the weakly regular parameter. While this parameter is not unique,
concepts of sufficiency and minimality help pin down a desirable one. If
estimation of minimal sufficient underlying parameters is inefficient, it
introduces noise in the corresponding estimation of weakly regular parameters,
whence we can improve the estimators by local asymptotic Rao-Blackwellization.
We call an estimator weakly efficient if it does not admit such improvement.
New weakly efficient estimators are presented in linear IV and nonlinear
regression models. Simulation of a linear IV model demonstrates how 2SLS and
optimal IV estimators are improved.

arXiv link: http://arxiv.org/abs/1908.10478v3

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2019-08-27

A multi-scale symmetry analysis of uninterrupted trends returns of daily financial indices

Authors: C. M. Rodríguez-Martínez, H. F. Coronel-Brizio, A. R. Hernández-Montoya

We present a symmetry analysis of the distribution of variations of different
financial indices, by means of a statistical procedure developed by the authors
based on a symmetry statistic by Einmahl and Mckeague. We applied this
statistical methodology to financial uninterrupted daily trends returns and to
other derived observable. In our opinion, to study distributional symmetry,
trends returns offer more advantages than the commonly used daily financial
returns; the two most important being: 1) Trends returns involve sampling over
different time scales and 2) By construction, this variable time series
contains practically the same number of non-negative and negative entry values.
We also show that these time multi-scale returns display distributional
bi-modality. The daily financial indices analyzed in this work are the Mexican
IPC, the American DJIA, the German DAX and the Japanese Nikkei index,
covering a time period from 11-08-1991 to 06-30-2017. We show that, at the time
scale resolution and significance considered in this paper, it is almost always
feasible to find an interval of possible symmetry points containing one most
plausible symmetry point denoted by C. Finally, we study the temporal evolution
of C showing that this point is seldom zero and responds with sensitivity to
extreme market events.

arXiv link: http://arxiv.org/abs/1908.11204v1

Econometrics arXiv paper, submitted: 2019-08-25

The Ridge Path Estimator for Linear Instrumental Variables

Authors: Nandana Sengupta, Fallaw Sowell

This paper presents the asymptotic behavior of a linear instrumental
variables (IV) estimator that uses a ridge regression penalty. The
regularization tuning parameter is selected empirically by splitting the
observed data into training and test samples. Conditional on the tuning
parameter, the training sample creates a path from the IV estimator to a prior.
The optimal tuning parameter is the value along this path that minimizes the IV
objective function for the test sample.
The empirically selected regularization tuning parameter becomes an estimated
parameter that jointly converges with the parameters of interest. The
asymptotic distribution of the tuning parameter is a nonstandard mixture
distribution. Monte Carlo simulations show that the asymptotic distribution
captures the characteristics of the sampling distributions and identify when
this ridge estimator performs better than two-stage least squares.

arXiv link: http://arxiv.org/abs/1908.09237v1
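
A stylized numerical sketch of the ridge-path idea (not the authors' exact estimator):
for each tuning parameter on a grid, compute a ridge-penalized IV estimate on a training
sample that shrinks toward a prior, then select the value minimizing the IV objective on
the test sample. The data-generating process, prior, and grid are illustrative.

    import numpy as np

    rng = np.random.default_rng(11)
    n, k = 2000, 3
    z = rng.normal(size=(n, k))                         # instruments
    u = rng.normal(size=n)
    x = z @ np.array([0.4, 0.3, 0.2]) + 0.8 * u + rng.normal(size=n)
    y = 1.5 * x + u
    X, Z = x.reshape(-1, 1), z

    def ridge_iv(y, X, Z, lam, prior):
        """Minimize the 2SLS-type IV objective plus a ridge penalty toward the prior."""
        W = np.linalg.inv(Z.T @ Z)
        A = X.T @ Z @ W @ Z.T @ X + lam * np.eye(X.shape[1])
        b = X.T @ Z @ W @ Z.T @ y + lam * prior
        return np.linalg.solve(A, b)

    def iv_objective(y, X, Z, beta):
        g = Z.T @ (y - X @ beta) / len(y)               # sample moments
        return g @ np.linalg.inv(Z.T @ Z / len(y)) @ g

    train, test = np.arange(n) < n // 2, np.arange(n) >= n // 2
    prior = np.zeros(1)
    lams = np.logspace(-3, 4, 30)
    betas = [ridge_iv(y[train], X[train], Z[train], lam, prior) for lam in lams]
    crit = [iv_objective(y[test], X[test], Z[test], b) for b in betas]
    best = int(np.argmin(crit))
    print("selected lambda:", lams[best], "estimate:", betas[best])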

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2019-08-24

Welfare Analysis in Dynamic Models

Authors: Victor Chernozhukov, Whitney Newey, Vira Semenova

This paper introduces metrics for welfare analysis in dynamic models. We
develop estimation and inference for these parameters even in the presence of a
high-dimensional state space. Examples of welfare metrics include average
welfare, average marginal welfare effects, and welfare decompositions into
direct and indirect effects similar to Oaxaca (1973) and Blinder (1973). We
derive dual and doubly robust representations of welfare metrics that
facilitate debiased inference. For average welfare, the value function does not
have to be estimated. In general, debiasing can be applied to any estimator of
the value function, including neural nets, random forests, Lasso, boosting, and
other high-dimensional methods. In particular, we derive Lasso and Neural
Network estimators of the value function and associated dynamic dual
representation and establish associated mean square convergence rates for these
functions. Debiasing is automatic in the sense that it only requires knowledge
of the welfare metric of interest, not the form of bias correction. The
proposed methods are applied to estimate a dynamic behavioral model of teacher
absenteeism in DHR and associated average teacher welfare.

arXiv link: http://arxiv.org/abs/1908.09173v5

Econometrics arXiv updated paper (originally submitted: 2019-08-24)

Constraint Qualifications in Partial Identification

Authors: Hiroaki Kaido, Francesca Molinari, Jörg Stoye

The literature on stochastic programming typically restricts attention to
problems that fulfill constraint qualifications. The literature on estimation
and inference under partial identification frequently restricts the geometry of
identified sets with diverse high-level assumptions. These superficially appear
to be different approaches to closely related problems. We extensively analyze
their relation. Among other things, we show that for partial identification
through pure moment inequalities, numerous assumptions from the literature
essentially coincide with the Mangasarian-Fromowitz constraint qualification.
This clarifies the relation between well-known contributions, including within
econometrics, and elucidates stringency, as well as ease of verification, of
some high-level assumptions in seminal papers.

arXiv link: http://arxiv.org/abs/1908.09103v4

Econometrics arXiv paper, submitted: 2019-08-23

Dyadic Regression

Authors: Bryan S. Graham

Dyadic data, where outcomes reflecting pairwise interaction among sampled
units are of primary interest, arise frequently in social science research.
Regression analyses with such data feature prominently in many research
literatures (e.g., gravity models of trade). The dependence structure
associated with dyadic data raises special estimation and, especially,
inference issues. This chapter reviews currently available methods for
(parametric) dyadic regression analysis and presents guidelines for empirical
researchers.

arXiv link: http://arxiv.org/abs/1908.09029v1

Econometrics arXiv paper, submitted: 2019-08-23

Nonparametric estimation of causal heterogeneity under high-dimensional confounding

Authors: Michael Zimmert, Michael Lechner

This paper considers the practically important case of nonparametrically
estimating heterogeneous average treatment effects that vary with a limited
number of discrete and continuous covariates in a selection-on-observables
framework where the number of possible confounders is very large. We propose a
two-step estimator for which the first step is estimated by machine learning.
We show that this estimator has desirable statistical properties like
consistency, asymptotic normality and rate double robustness. In particular, we
derive the coupled convergence conditions between the nonparametric and the
machine learning steps. We also show that estimating population average
treatment effects by averaging the estimated heterogeneous effects is
semi-parametrically efficient. The new estimator is illustrated with an
empirical example of the effects of mothers' smoking during pregnancy on the
resulting birth weight.

arXiv link: http://arxiv.org/abs/1908.08779v1

Econometrics arXiv paper, submitted: 2019-08-23

Heterogeneous Earnings Effects of the Job Corps by Gender Earnings: A Translated Quantile Approach

Authors: Anthony Strittmatter

Several studies of the Job Corps tend to find more positive earnings effects
for males than for females. This effect heterogeneity favouring males contrasts
with the results of the majority of other training programmes' evaluations.
Applying the translated quantile approach of Bitler, Hoynes, and Domina (2014),
I investigate a potential mechanism behind the surprising findings for the Job
Corps. My results provide suggestive evidence that the effect of heterogeneity
by gender operates through existing gender earnings inequality rather than Job
Corps trainability differences.

arXiv link: http://arxiv.org/abs/1908.08721v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2019-08-22

Online Causal Inference for Advertising in Real-Time Bidding Auctions

Authors: Caio Waisman, Harikesh S. Nair, Carlos Carrion

Real-time bidding (RTB) systems, which utilize auctions to allocate user
impressions to competing advertisers, continue to enjoy success in digital
advertising. Assessing the effectiveness of such advertising remains a
challenge in research and practice. This paper proposes a new approach to
perform causal inference on advertising bought through such mechanisms.
Leveraging the economic structure of first- and second-price auctions, we first
show that the effects of advertising are identified by the optimal bids. Hence,
since these optimal bids are the only objects that need to be recovered, we
introduce an adapted Thompson sampling (TS) algorithm to solve a multi-armed
bandit problem that succeeds in recovering such bids and, consequently, the
effects of advertising while minimizing the costs of experimentation. We derive
a regret bound for our algorithm which is order optimal and use data from RTB
auctions to show that it outperforms commonly used methods that estimate the
effects of advertising.
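
As a rough illustration of the bandit machinery the paper adapts, here is a
minimal Beta-Bernoulli Thompson sampling sketch in Python (generic, not the
authors' bid-recovery algorithm; the per-arm success probabilities are
hypothetical):

    import numpy as np

    rng = np.random.default_rng(0)
    true_p = np.array([0.04, 0.05, 0.07])   # hypothetical per-arm success rates (e.g., candidate bids)
    alpha = np.ones(3)                       # Beta posterior successes + 1
    beta = np.ones(3)                        # Beta posterior failures + 1

    for t in range(10000):
        theta = rng.beta(alpha, beta)        # one posterior draw per arm
        arm = int(np.argmax(theta))          # play the arm with the largest draw
        reward = rng.random() < true_p[arm]  # Bernoulli feedback
        alpha[arm] += reward
        beta[arm] += 1 - reward

    print("posterior means:", np.round(alpha / (alpha + beta), 3))
    print("pulls per arm:  ", (alpha + beta - 2).astype(int))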

arXiv link: http://arxiv.org/abs/1908.08600v4

Econometrics arXiv updated paper (originally submitted: 2019-08-21)

A Doubly Corrected Robust Variance Estimator for Linear GMM

Authors: Jungbin Hwang, Byunghoon Kang, Seojeong Lee

We propose a new finite sample corrected variance estimator for the linear
generalized method of moments (GMM) including the one-step, two-step, and
iterated estimators. Our formula additionally corrects for the
over-identification bias in variance estimation on top of the commonly used
finite sample correction of Windmeijer (2005) which corrects for the bias from
estimating the efficient weight matrix, so is doubly corrected. An important
feature of the proposed double correction is that it automatically provides
robustness to misspecification of the moment condition. In contrast, the
conventional variance estimator and the Windmeijer correction are inconsistent
under misspecification. That is, the proposed double correction formula
provides a convenient way to obtain improved inference under correct
specification and robustness against misspecification at the same time.

arXiv link: http://arxiv.org/abs/1908.07821v2

Econometrics arXiv cross-link from stat.CO (stat.CO), submitted: 2019-08-21

Analyzing Commodity Futures Using Factor State-Space Models with Wishart Stochastic Volatility

Authors: Tore Selland Kleppe, Roman Liesenfeld, Guilherme Valle Moura, Atle Oglend

We propose a factor state-space approach with stochastic volatility to model
and forecast the term structure of futures contracts on commodities. Our
approach builds upon the dynamic 3-factor Nelson-Siegel model and its 4-factor
Svensson extension and assumes for the latent level, slope and curvature
factors a Gaussian vector autoregression with a multivariate Wishart stochastic
volatility process. Exploiting the conjugacy of the Wishart and the Gaussian
distribution, we develop a computationally fast and easy to implement MCMC
algorithm for the Bayesian posterior analysis. An empirical application to
daily prices for contracts on crude oil with stipulated delivery dates ranging
from one to 24 months ahead shows that the estimated 4-factor Svensson model
with two curvature factors provides a good parsimonious representation of the
serial correlation in the individual prices and their volatility. It also shows
that this model has a good out-of-sample forecast performance.

arXiv link: http://arxiv.org/abs/1908.07798v1

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2019-08-20

New developments in revealed preference theory: decisions under risk, uncertainty, and intertemporal choice

Authors: Federico Echenique

This survey reviews recent developments in revealed preference theory. It
discusses the testable implications of theories of choice that are germane to
specific economic environments. The focus is on expected utility in risky
environments; subjective expected utility and maxmin expected utility in the
presence of uncertainty; and exponentially discounted utility for intertemporal
choice. The testable implications of these theories for data on choice from
classical linear budget sets are described, and shown to follow a common
thread. The theories all imply an inverse relation between prices and
quantities, with different qualifications depending on the functional forms in
the theory under consideration.

arXiv link: http://arxiv.org/abs/1908.07561v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2019-08-18

Spectral inference for large Stochastic Blockmodels with nodal covariates

Authors: Angelo Mele, Lingxin Hao, Joshua Cape, Carey E. Priebe

In many applications of network analysis, it is important to distinguish
between observed and unobserved factors affecting network structure. To this
end, we develop spectral estimators for both unobserved blocks and the effect
of covariates in stochastic blockmodels. On the theoretical side, we establish
asymptotic normality of our estimators for the subsequent purpose of performing
inference. On the applied side, we show that computing our estimator is much
faster than standard variational expectation--maximization algorithms and
scales well for large networks. Monte Carlo experiments suggest that the
estimator performs well under different data generating processes. Our
application to Facebook data shows evidence of homophily in gender, role and
campus-residence, while allowing us to discover unobserved communities. The
results in this paper provide a foundation for spectral estimation of the
effect of observed covariates as well as unobserved latent community structure
on the probability of link formation in networks.

arXiv link: http://arxiv.org/abs/1908.06438v2

Econometrics arXiv updated paper (originally submitted: 2019-08-17)

Measuring international uncertainty using global vector autoregressions with drifting parameters

Authors: Michael Pfarrhofer

This paper investigates the time-varying impacts of international
macroeconomic uncertainty shocks. We use a global vector autoregressive
specification with drifting coefficients and factor stochastic volatility in
the errors to model six economies jointly. The measure of uncertainty is
constructed endogenously by estimating a scalar driving the innovation
variances of the latent factors, which is also included in the mean of the
process. To achieve regularization, we use Bayesian techniques for estimation,
and introduce a set of hierarchical global-local priors. The adopted priors
center the model on a constant parameter specification with homoscedastic
errors, but allow for time-variation if suggested by likelihood information.
Moreover, we assume coefficients across economies to be similar, but provide
sufficient flexibility via the hierarchical prior for country-specific
idiosyncrasies. The results point towards pronounced real and financial effects
of uncertainty shocks in all countries, with differences across economies and
over time.

arXiv link: http://arxiv.org/abs/1908.06325v2

Econometrics arXiv paper, submitted: 2019-08-16

A model of discrete choice based on reinforcement learning under short-term memory

Authors: Misha Perepelitsa

A family of models of individual discrete choice is constructed by means of
statistical averaging of choices made by a subject in a reinforcement learning
process, where the subject has a short, k-term memory span. The choice
probabilities in these models combine in a non-trivial, non-linear way the
initial learning bias and the experience gained through learning. The
properties of such models are discussed and, in particular, it is shown that
probabilities deviate from Luce's Choice Axiom, even if the initial bias
adheres to it. Moreover, we show that the latter property is recovered as the
memory span becomes large.
Two applications in utility theory are considered. In the first, we use the
discrete choice model to generate binary preference relation on simple
lotteries. We show that the preferences violate transitivity and independence
axioms of expected utility theory. Furthermore, we establish the dependence of
the preferences on frames, with risk aversion for gains, and risk seeking for
losses. Based on these findings we propose next a parametric model of choice
based on the probability maximization principle, as a model for deviations from
the expected utility principle. To illustrate the approach, we apply it to the
classical problem of demand for insurance.

arXiv link: http://arxiv.org/abs/1908.06133v1

Econometrics arXiv updated paper (originally submitted: 2019-08-16)

Forward-Selected Panel Data Approach for Program Evaluation

Authors: Zhentao Shi, Jingyi Huang

Policy evaluation is central to economic data analysis, but economists mostly
work with observational data in view of limited opportunities to carry out
controlled experiments. In the potential outcome framework, the panel data
approach (Hsiao, Ching and Wan, 2012) constructs the counterfactual by
exploiting the correlation between cross-sectional units in panel data. The
choice of cross-sectional control units, a key step in its implementation, is
nevertheless unresolved in data-rich environments when many possible controls
are at the researcher's disposal. We propose the forward selection method to
choose control units, and establish validity of the post-selection inference.
Our asymptotic framework allows the number of possible controls to grow much
faster than the time dimension. The easy-to-implement algorithms and their
theoretical guarantee extend the panel data approach to big data settings.
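
A minimal sketch of the forward-selection idea, assuming a simulated
factor-driven panel and a fixed number of selected controls (the paper's
data-driven stopping rule and post-selection inference are not reproduced):

    import numpy as np

    def forward_select_controls(y1_pre, Y0_pre, max_k):
        """Greedily add the control unit (column of Y0_pre) that most reduces the in-sample SSR."""
        n, N = Y0_pre.shape
        selected = []
        for _ in range(max_k):
            best_j, best_ssr = None, np.inf
            for j in range(N):
                if j in selected:
                    continue
                X = np.column_stack([np.ones(n), Y0_pre[:, selected + [j]]])
                coef, *_ = np.linalg.lstsq(X, y1_pre, rcond=None)
                ssr = np.sum((y1_pre - X @ coef) ** 2)
                if ssr < best_ssr:
                    best_j, best_ssr = j, ssr
            selected.append(best_j)
        return selected

    rng = np.random.default_rng(12)
    T0, T1, N = 60, 20, 100                      # pre/post periods and many candidate controls
    factor = rng.normal(size=T0 + T1)
    Y0 = factor[:, None] * rng.uniform(0.5, 1.5, N) + rng.normal(0, 0.5, (T0 + T1, N))
    y1 = factor + rng.normal(0, 0.5, T0 + T1)
    y1[T0:] += 1.0                               # hypothetical treatment effect of 1 in the post period

    sel = forward_select_controls(y1[:T0], Y0[:T0], max_k=5)
    X_pre = np.column_stack([np.ones(T0), Y0[:T0][:, sel]])
    coef, *_ = np.linalg.lstsq(X_pre, y1[:T0], rcond=None)
    y1_hat_post = np.column_stack([np.ones(T1), Y0[T0:][:, sel]]) @ coef
    print("average post-treatment effect estimate:", round(float(np.mean(y1[T0:] - y1_hat_post)), 2))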

arXiv link: http://arxiv.org/abs/1908.05894v3

Econometrics arXiv paper, submitted: 2019-08-16

Testing the Drift-Diffusion Model

Authors: Drew Fudenberg, Whitney K. Newey, Philipp Strack, Tomasz Strzalecki

The drift diffusion model (DDM) is a model of sequential sampling with
diffusion (Brownian) signals, where the decision maker accumulates evidence
until the process hits a stopping boundary, and then stops and chooses the
alternative that corresponds to that boundary. This model has been widely used
in psychology, neuroeconomics, and neuroscience to explain the observed
patterns of choice and response times in a range of binary choice decision
problems. This paper provides a statistical test for DDMs with general
boundaries. We first prove a characterization theorem: we find a condition on
choice probabilities that is satisfied if and only if the choice probabilities
are generated by some DDM. Moreover, we show that the drift and the boundary
are uniquely identified. We then use our condition to nonparametrically
estimate the drift and the boundary and construct a test statistic.
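
For intuition, a simple Euler-scheme simulation of a DDM with constant
symmetric boundaries (the paper's general boundaries and the nonparametric
test itself are not implemented; all parameter values are hypothetical):

    import numpy as np

    def simulate_ddm(mu, sigma, boundary, dt=1e-3, max_t=10.0, rng=None):
        """One drift-diffusion trial with constant symmetric boundaries.
        Returns (choice, response_time); choice is +1/-1 for the upper/lower boundary."""
        rng = rng or np.random.default_rng()
        x, t = 0.0, 0.0
        while t < max_t:
            x += mu * dt + sigma * np.sqrt(dt) * rng.standard_normal()
            t += dt
            if x >= boundary:
                return 1, t
            if x <= -boundary:
                return -1, t
        return 0, t  # no decision within max_t

    rng = np.random.default_rng(1)
    trials = [simulate_ddm(mu=0.5, sigma=1.0, boundary=1.0, rng=rng) for _ in range(500)]
    choices = np.array([c for c, _ in trials])
    rts = np.array([t for _, t in trials])
    print("P(upper boundary) =", round(float(np.mean(choices == 1)), 3),
          "| mean RT =", round(float(rts.mean()), 3))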

arXiv link: http://arxiv.org/abs/1908.05824v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2019-08-16

Counting Defiers

Authors: Amanda Kowalski

The LATE monotonicity assumption of Imbens and Angrist (1994) precludes
"defiers," individuals whose treatment always runs counter to the instrument,
in the terminology of Balke and Pearl (1993) and Angrist et al. (1996). I allow
for defiers in a model with a binary instrument and a binary treatment. The
model is explicit about the randomization process that gives rise to the
instrument. I use the model to develop estimators of the counts of defiers,
always takers, compliers, and never takers. I propose separate versions of the
estimators for contexts in which the parameter of the randomization process is
unspecified, which I intend for use with natural experiments with virtual
random assignment. I present an empirical application that revisits Angrist and
Evans (1998), which examines the impact of virtual random assignment of the sex
of the first two children on subsequent fertility. I find that subsequent
fertility is much more responsive to the sex mix of the first two children when
defiers are allowed.
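
For reference, the textbook strata shares when monotonicity is imposed (so
defiers are ruled out by assumption, unlike in the paper) follow directly from
the two conditional treatment probabilities; the sketch below uses hypothetical
simulated Z and D:

    import numpy as np

    def strata_shares_under_monotonicity(Z, D):
        """Always-taker, never-taker and complier shares when defiers are ruled out by assumption."""
        p1 = float(D[Z == 1].mean())   # P(D = 1 | Z = 1)
        p0 = float(D[Z == 0].mean())   # P(D = 1 | Z = 0)
        return {"always_takers": p0, "never_takers": 1.0 - p1,
                "compliers": p1 - p0, "defiers": 0.0}

    rng = np.random.default_rng(2)
    Z = rng.integers(0, 2, size=5000)                                # hypothetical binary instrument
    D = (rng.random(5000) < np.where(Z == 1, 0.6, 0.3)).astype(int)  # hypothetical binary treatment
    print(strata_shares_under_monotonicity(Z, D))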

arXiv link: http://arxiv.org/abs/1908.05811v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2019-08-16

A Model of a Randomized Experiment with an Application to the PROWESS Clinical Trial

Authors: Amanda Kowalski

I develop a model of a randomized experiment with a binary intervention and a
binary outcome. Potential outcomes in the intervention and control groups give
rise to four types of participants. Fixing ideas such that the outcome is
mortality, some participants would live regardless, others would be saved,
others would be killed, and others would die regardless. These potential
outcome types are not observable. However, I use the model to develop
estimators of the number of participants of each type. The model relies on the
randomization within the experiment and on deductive reasoning. I apply the
model to an important clinical trial, the PROWESS trial, and I perform a Monte
Carlo simulation calibrated to estimates from the trial. The reduced form from
the trial shows a reduction in mortality, which provided a rationale for FDA
approval. However, I find that the intervention killed two participants for
every three it saved.

arXiv link: http://arxiv.org/abs/1908.05810v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2019-08-15

Isotonic Regression Discontinuity Designs

Authors: Andrii Babii, Rohit Kumar

This paper studies the estimation and inference for the isotonic regression
at the boundary point, an object that is particularly interesting and required
in the analysis of monotone regression discontinuity designs. We show that the
isotonic regression is inconsistent in this setting and derive the asymptotic
distributions of boundary corrected estimators. Interestingly, the boundary
corrected estimators can be bootstrapped without subsampling or additional
nonparametric smoothing which is not the case for the interior point. The Monte
Carlo experiments indicate that shape restrictions can improve dramatically the
finite-sample performance of unrestricted estimators. Lastly, we apply the
isotonic regression discontinuity designs to estimate the causal effect of
incumbency in the U.S. House elections.
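
A naive sketch of the starting point: fit isotonic regressions on each side of
the cutoff and difference the boundary values (the paper shows this
uncorrected boundary value is inconsistent and develops corrected estimators
and bootstrap inference, not implemented here; data are simulated):

    import numpy as np
    from sklearn.isotonic import IsotonicRegression

    rng = np.random.default_rng(3)
    x = rng.uniform(-1, 1, 2000)                                   # running variable, cutoff at 0
    y = 2 * x + 0.5 * (x >= 0) + rng.normal(0, 0.3, x.size)        # monotone outcome, jump of 0.5

    left, right = x < 0, x >= 0
    iso_l = IsotonicRegression(increasing=True, out_of_bounds="clip").fit(x[left], y[left])
    iso_r = IsotonicRegression(increasing=True, out_of_bounds="clip").fit(x[right], y[right])

    # Uncorrected boundary values; the paper corrects these before taking the difference.
    jump_naive = iso_r.predict([0.0])[0] - iso_l.predict([0.0])[0]
    print("naive isotonic jump at the cutoff:", round(float(jump_naive), 3))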

arXiv link: http://arxiv.org/abs/1908.05752v6

Econometrics arXiv paper, submitted: 2019-08-15

Injectivity and the Law of Demand

Authors: Roy Allen

Establishing that a demand mapping is injective is a core first step for a
variety of methodologies. When a version of the law of demand holds, global
injectivity can be checked by seeing whether the demand mapping is constant
over any line segments. When we add the assumption of differentiability, we
obtain necessary and sufficient conditions for injectivity that generalize
classical Gale and Nikaido (1965) conditions for quasi-definite Jacobians.

arXiv link: http://arxiv.org/abs/1908.05714v1

Econometrics arXiv updated paper (originally submitted: 2019-08-15)

Nonparametric Identification of First-Price Auction with Unobserved Competition: A Density Discontinuity Framework

Authors: Emmanuel Guerre, Yao Luo

We consider nonparametric identification of independent private value
first-price auction models, in which the analyst only observes winning bids.
Our benchmark model assumes an exogenous number of bidders $N$. We show that,
if the bidders observe $N$, the resulting discontinuities in the winning bid
density can be used to identify the distribution of $N$. The private value
distribution can be nonparametrically identified in a second step. This
extends, under testable identification conditions, to the case where $N$ is a
number of potential buyers, who bid with some unknown probability.
Identification also holds in presence of additive unobserved heterogeneity
drawn from some parametric distributions. A parametric Bayesian estimation
procedure is proposed. An application to Shanghai Government IT procurements
finds that the imposed three-bidder participation rule is not effective. This
generates losses as large as 10% of the appraisal budget for small IT
contracts.

arXiv link: http://arxiv.org/abs/1908.05476v3

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2019-08-14

On rank estimators in increasing dimensions

Authors: Yanqin Fan, Fang Han, Wei Li, Xiao-Hua Zhou

The family of rank estimators, including Han's maximum rank correlation (Han,
1987) as a notable example, has been widely exploited in studying regression
problems. For these estimators, although the linear index is introduced for
alleviating the impact of dimensionality, the effect of large dimension on
inference is rarely studied. This paper fills this gap via studying the
statistical properties of a larger family of M-estimators, whose objective
functions are formulated as U-processes and may be discontinuous in increasing
dimension set-up where the number of parameters, $p_{n}$, in the model is
allowed to increase with the sample size, $n$. First, we find that often in
estimation, as $p_{n}/n\rightarrow 0$, a $(p_{n}/n)^{1/2}$ rate of convergence is
obtainable. Second, we establish Bahadur-type bounds and study the validity of
normal approximation, which we find often requires a much stronger scaling
requirement than $p_{n}^{2}/n\rightarrow 0.$ Third, we state conditions under
which the numerical derivative estimator of asymptotic covariance matrix is
consistent, and show that the step size in implementing the covariance
estimator has to be adjusted with respect to $p_{n}$. All theoretical results
are further backed up by simulation studies.
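
For context, Han's (1987) maximum rank correlation objective for a
two-regressor design with the first coefficient normalized to one, maximized
by a simple grid search over simulated data (the paper's increasing-dimension
theory is not illustrated):

    import numpy as np

    def mrc_objective(b2, y, x):
        """Han's maximum rank correlation objective with the coefficient vector (1, b2)."""
        idx = x[:, 0] + b2 * x[:, 1]
        n = len(y)
        total = 0
        for i in range(n):
            total += np.sum((y[i] > y) & (idx[i] > idx))
        return total / (n * (n - 1))

    rng = np.random.default_rng(4)
    n = 300
    x = rng.normal(size=(n, 2))
    y = np.exp(x[:, 0] + 2.0 * x[:, 1] + rng.normal(scale=0.5, size=n))  # true b2 = 2

    grid = np.linspace(0.5, 3.5, 61)
    b2_hat = grid[int(np.argmax([mrc_objective(b, y, x) for b in grid]))]
    print("estimated b2:", round(float(b2_hat), 2))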

arXiv link: http://arxiv.org/abs/1908.05255v1

Econometrics arXiv cross-link from q-fin.RM (q-fin.RM), submitted: 2019-08-13

Forecast Encompassing Tests for the Expected Shortfall

Authors: Timo Dimitriadis, Julie Schnaitmann

We introduce new forecast encompassing tests for the risk measure Expected
Shortfall (ES). The ES currently receives much attention through its
introduction into the Basel III Accords, which stipulate its use as the primary
market risk measure for the international banking regulation. We utilize joint
loss functions for the pair ES and Value at Risk to set up three ES
encompassing test variants. The tests are built on misspecification robust
asymptotic theory and we investigate the finite sample properties of the tests
in an extensive simulation study. We use the encompassing tests to illustrate
the potential of forecast combination methods for different financial assets.

arXiv link: http://arxiv.org/abs/1908.04569v3

Econometrics arXiv updated paper (originally submitted: 2019-08-12)

Zero Black-Derman-Toy interest rate model

Authors: Grzegorz Krzyżanowski, Ernesto Mordecki, Andrés Sosa

We propose a modification of the classical Black-Derman-Toy (BDT) interest
rate tree model, which includes the possibility of a jump with small
probability at each step to a practically zero interest rate. The corresponding
BDT algorithms are consequently modified to calibrate the tree containing the
zero interest rate scenarios. This modification is motivated by the recent
2008-2009 crisis in the United States, and it quantifies the risk of a future
crisis in bond prices and derivatives. The proposed model is useful to price
derivatives. This exercise also provides a tool to calibrate the probability of
this event. A comparison of option prices and implied volatilities on US
Treasury bonds computed with both the proposed and the classical tree model is
provided, in six different scenarios along the different periods comprising the
years 2002-2017.

arXiv link: http://arxiv.org/abs/1908.04401v2

Econometrics arXiv paper, submitted: 2019-08-12

Maximum Approximated Likelihood Estimation

Authors: Michael Griebel, Florian Heiss, Jens Oettershagen, Constantin Weiser

Empirical economic research frequently applies maximum likelihood estimation
in cases where the likelihood function is analytically intractable. Most of the
theoretical literature focuses on maximum simulated likelihood (MSL)
estimators, while empirical and simulation analyses often find that alternative
approximation methods such as quasi-Monte Carlo simulation, Gaussian
quadrature, and integration on sparse grids behave considerably better
numerically. This paper generalizes the theoretical results widely known for
MSL estimators to a general set of maximum approximated likelihood (MAL)
estimators. We provide general conditions for both the model and the
approximation approach to ensure consistency and asymptotic normality. We also
show specific examples and finite-sample simulation results.
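
A small sketch of the kind of approximation the paper covers: a
random-intercept logit likelihood contribution approximated by Gauss-Hermite
quadrature and, for comparison, by plain Monte Carlo with the same number of
draws (all data and parameter values are simulated and hypothetical):

    import numpy as np

    def logit_panel_lik(u, y, x, beta):
        """Likelihood of one unit's T observations given the random intercept u."""
        p = 1.0 / (1.0 + np.exp(-(beta * x + u)))
        return np.prod(np.where(y == 1, p, 1.0 - p))

    rng = np.random.default_rng(5)
    T, beta, sigma = 8, 1.0, 1.5
    x = rng.normal(size=T)
    u_true = sigma * rng.standard_normal()
    y = (rng.random(T) < 1.0 / (1.0 + np.exp(-(beta * x + u_true)))).astype(int)

    # Gauss-Hermite: int f(u) N(u; 0, sigma^2) du ~ pi^(-1/2) * sum_k w_k f(sqrt(2) * sigma * z_k)
    nodes, weights = np.polynomial.hermite.hermgauss(15)
    vals = np.array([logit_panel_lik(np.sqrt(2.0) * sigma * z, y, x, beta) for z in nodes])
    lik_quadrature = np.sum(weights * vals) / np.sqrt(np.pi)

    # Plain Monte Carlo with the same number of function evaluations is much noisier.
    draws = sigma * rng.standard_normal(15)
    lik_mc = np.mean([logit_panel_lik(u, y, x, beta) for u in draws])
    print("Gauss-Hermite:", lik_quadrature, "| Monte Carlo (15 draws):", lik_mc)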

arXiv link: http://arxiv.org/abs/1908.04110v1

Econometrics arXiv cross-link from cs.CR (cs.CR), submitted: 2019-08-09

Privacy-Aware Distributed Mobility Choice Modelling over Blockchain

Authors: David Lopez, Bilal Farooq

A generalized distributed tool for mobility choice modelling is presented,
where participants do not share personal raw data, while all computations are
done locally. Participants use Blockchain based Smart Mobility Data-market
(BSMD), where all transactions are secure and private. Nodes in blockchain can
transact information with other participants as long as both parties agree to
the transaction rules issued by the owner of the data. A case study is
presented where a mode choice model is distributed and estimated over BSMD. As
an example, the parameter estimation problem is solved on a distributed version
of simulated annealing. It is demonstrated that the estimated model parameters
are consistent and reproducible.

arXiv link: http://arxiv.org/abs/1908.03446v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2019-08-08

Analysis of Networks via the Sparse $β$-Model

Authors: Mingli Chen, Kengo Kato, Chenlei Leng

Data in the form of networks are increasingly available in a variety of
areas, yet statistical models allowing for parameter estimates with desirable
statistical properties for sparse networks remain scarce. To address this, we
propose the Sparse $\beta$-Model (S$\beta$M), a new network model that
interpolates the celebrated Erdos-R\'enyi model and the $\beta$-model that
assigns one different parameter to each node. By a novel reparameterization of
the $\beta$-model to distinguish global and local parameters, our S$\beta$M can
drastically reduce the dimensionality of the $\beta$-model by requiring some of
the local parameters to be zero. We derive the asymptotic distribution of the
maximum likelihood estimator of the S$\beta$M when the support of the parameter
vector is known. When the support is unknown, we formulate a penalized
likelihood approach with the $\ell_0$-penalty. Remarkably, we show via a
monotonicity lemma that the seemingly combinatorial computational problem due
to the $\ell_0$-penalty can be overcome by assigning nonzero parameters to
those nodes with the largest degrees. We further show that a $\beta$-min
condition guarantees our method to identify the true model and provide excess
risk bounds for the estimated parameters. The estimation procedure enjoys good
finite sample properties as shown by simulation studies. The usefulness of the
S$\beta$M is further illustrated via the analysis of a microfinance take-up
example.
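
An illustrative sketch of the degree-based selection idea, assuming node
effects are kept only for the k highest-degree nodes and the resulting
logistic model is fit by an essentially unpenalized logistic regression; this
is not the paper's $\ell_0$-penalized estimator or its theory, and k and the
simulated network are hypothetical:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def fit_sparse_beta_model(A, k):
        """Global parameter plus node-specific effects for the k highest-degree nodes only."""
        N = A.shape[0]
        S = np.argsort(A.sum(axis=1))[::-1][:k]   # candidate nodes for nonzero local parameters
        rows, y = [], []
        for i in range(N):
            for j in range(i + 1, N):
                rows.append((S == i).astype(float) + (S == j).astype(float))
                y.append(A[i, j])
        model = LogisticRegression(C=1e6, max_iter=1000).fit(np.array(rows), np.array(y))
        return float(model.intercept_[0]), {int(s): float(c) for s, c in zip(S, model.coef_[0])}

    rng = np.random.default_rng(6)
    N, mu = 80, -2.0
    beta = np.zeros(N); beta[:3] = 1.5                          # three truly "active" nodes
    P = 1.0 / (1.0 + np.exp(-(mu + beta[:, None] + beta[None, :])))
    A = np.triu((rng.random((N, N)) < P).astype(int), 1); A = A + A.T
    print(fit_sparse_beta_model(A, k=5))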

arXiv link: http://arxiv.org/abs/1908.03152v3

Econometrics arXiv updated paper (originally submitted: 2019-08-07)

Efficient Estimation by Fully Modified GLS with an Application to the Environmental Kuznets Curve

Authors: Yicong Lin, Hanno Reuvers

This paper develops the asymptotic theory of a Fully Modified Generalized
Least Squares estimator for multivariate cointegrating polynomial regressions.
Such regressions allow for deterministic trends, stochastic trends and integer
powers of stochastic trends to enter the cointegrating relations. Our fully
modified estimator incorporates: (1) the direct estimation of the inverse
autocovariance matrix of the multidimensional errors, and (2) second order bias
corrections. The resulting estimator has the intuitive interpretation of
applying a weighted least squares objective function to filtered data series.
Moreover, the required second order bias corrections are convenient byproducts
of our approach and lead to standard asymptotic inference. We also study
several multivariate KPSS-type of tests for the null of cointegration. A
comprehensive simulation study shows good performance of the FM-GLS estimator
and the related tests. As a practical illustration, we reinvestigate the
Environmental Kuznets Curve (EKC) hypothesis for six early industrialized
countries as in Wagner et al. (2020).

arXiv link: http://arxiv.org/abs/1908.02552v2

Econometrics arXiv updated paper (originally submitted: 2019-08-06)

Estimation of Conditional Average Treatment Effects with High-Dimensional Data

Authors: Qingliang Fan, Yu-Chin Hsu, Robert P. Lieli, Yichong Zhang

Given the unconfoundedness assumption, we propose new nonparametric
estimators for the reduced dimensional conditional average treatment effect
(CATE) function. In the first stage, the nuisance functions necessary for
identifying CATE are estimated by machine learning methods, allowing the number
of covariates to be comparable to or larger than the sample size. The second
stage consists of a low-dimensional local linear regression, reducing CATE to a
function of the covariate(s) of interest. We consider two variants of the
estimator depending on whether the nuisance functions are estimated over the
full sample or over a hold-out sample. Building on Belloni at al. (2017) and
Chernozhukov et al. (2018), we derive functional limit theory for the
estimators and provide an easy-to-implement procedure for uniform inference
based on the multiplier bootstrap. The empirical application revisits the
effect of maternal smoking on a baby's birth weight as a function of the
mother's age.

arXiv link: http://arxiv.org/abs/1908.02399v5

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2019-08-06

Semiparametric Wavelet-based JPEG IV Estimator for endogenously truncated data

Authors: Nir Billfeld, Moshe Kim

A new and enriched JPEG algorithm is provided for identifying redundancies
in a sequence of irregular noisy data points which also accommodates a
reference-free criterion function. Our main contribution is by formulating
analytically (instead of approximating) the inverse of the transpose of
the JPEG wavelet transform without involving matrices, which are computationally
cumbersome. The algorithm is suitable for the widely-spread situations where
the original data distribution is unobservable such as in cases where there is
deficient representation of the entire population in the training data (in
machine learning) and thus the covariate shift assumption is violated. The
proposed estimator corrects for both biases, the one generated by endogenous
truncation and the one generated by endogenous covariates. Results from
utilizing 2,000,000 different distribution functions verify the applicability
and high accuracy of our procedure to cases in which the disturbances are
neither jointly nor marginally normally distributed.

arXiv link: http://arxiv.org/abs/1908.02166v1

Econometrics arXiv cross-link from q-fin.PM (q-fin.PM), submitted: 2019-08-06

Analysing Global Fixed Income Markets with Tensors

Authors: Bruno Scalzo Dees

Global fixed income returns span across multiple maturities and economies,
that is, they naturally reside on multi-dimensional data structures referred to
as tensors. In contrast to standard "flat-view" multivariate models that are
agnostic to data structure and only describe linear pairwise relationships, we
introduce a tensor-valued approach to model the global risks shared by multiple
interest rate curves. In this way, the estimated risk factors can be
analytically decomposed into maturity-domain and country-domain constituents,
which allows the investor to devise rigorous and tractable global portfolio
management and hedging strategies tailored to each risk domain. An empirical
analysis confirms the existence of global risk factors shared by eight
developed economies, and demonstrates their ability to compactly describe the
global macroeconomic environment.

arXiv link: http://arxiv.org/abs/1908.02101v4

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2019-08-05

Discovery of Bias and Strategic Behavior in Crowdsourced Performance Assessment

Authors: Yifei Huang, Matt Shum, Xi Wu, Jason Zezhong Xiao

With the industry trend of shifting from a traditional hierarchical approach
to a flatter management structure, crowdsourced performance assessment has
gained mainstream popularity. One fundamental challenge of crowdsourced
performance assessment is the risk that personal interest can introduce
distortions of
facts, especially when the system is used to determine merit pay or promotion.
In this paper, we developed a method to identify bias and strategic behavior in
crowdsourced performance assessment, using a rich dataset collected from a
professional service firm in China. We find a pattern of "discriminatory
generosity" on the part of peer evaluation, where raters downgrade their peer
coworkers who have passed objective promotion requirements while overrating
their peer coworkers who have not yet passed. This introduces two types of
biases: the first aimed against more competent competitors, and the other
favoring less eligible peers, which can serve as a mask for the first bias. This
paper also aims to bring angles of fairness-aware data mining to talent and
management computing. Historical decision records, such as performance ratings,
often contain subjective judgment which is prone to bias and strategic
behavior. For practitioners of predictive talent analytics, it is important to
investigate potential bias and strategic behavior underlying historical
decision records.

arXiv link: http://arxiv.org/abs/1908.01718v2

Econometrics arXiv updated paper (originally submitted: 2019-08-04)

Uncertainty in the Hot Hand Fallacy: Detecting Streaky Alternatives to Random Bernoulli Sequences

Authors: David M. Ritzwoller, Joseph P. Romano

We study a class of permutation tests of the randomness of a collection of
Bernoulli sequences and their application to analyses of the human tendency to
perceive streaks of consecutive successes as overly representative of positive
dependence - the hot hand fallacy. In particular, we study permutation tests of
the null hypothesis of randomness (i.e., that trials are i.i.d.) based on test
statistics that compare the proportion of successes that directly follow k
consecutive successes with either the overall proportion of successes or the
proportion of successes that directly follow k consecutive failures. We
characterize the asymptotic distributions of these test statistics and their
permutation distributions under randomness, under a set of general stationary
processes, and under a class of Markov chain alternatives, which allow us to
derive their local asymptotic power. The results are applied to evaluate the
empirical support for the hot hand fallacy provided by four controlled
basketball shooting experiments. We establish that substantially larger data
sets are required to derive an informative measurement of the deviation from
randomness in basketball shooting. In one experiment, for which we were able to
obtain data, multiple testing procedures reveal that one shooter exhibits a
shooting pattern significantly inconsistent with randomness - supplying strong
evidence that basketball shooting is not random for all shooters all of the
time. However, we find that the evidence against randomness in this experiment
is limited to this shooter. Our results provide a mathematical and statistical
foundation for the design and validation of experiments that directly compare
deviations from randomness with human beliefs about deviations from randomness,
and thereby constitute a direct test of the hot hand fallacy.
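
A minimal permutation-test sketch for one of the statistics described: the
proportion of successes immediately following k consecutive successes minus
the overall success proportion, evaluated on a hypothetical i.i.d. shooting
sequence (the paper's asymptotic theory and multiple-testing procedures are
not reproduced):

    import numpy as np

    def prop_after_streak(x, k):
        """Proportion of successes immediately following k consecutive successes."""
        hits = [x[t] for t in range(k, len(x)) if x[t - k:t].all()]
        return np.mean(hits) if hits else np.nan

    def permutation_pvalue(x, k, n_perm=2000, rng=None):
        rng = rng or np.random.default_rng()
        stat = prop_after_streak(x, k) - x.mean()
        perm = []
        for _ in range(n_perm):
            xp = rng.permutation(x)
            s = prop_after_streak(xp, k)
            if not np.isnan(s):
                perm.append(s - xp.mean())
        return float(stat), float(np.mean(np.array(perm) >= stat))

    rng = np.random.default_rng(7)
    shots = (rng.random(300) < 0.5).astype(int)     # i.i.d. Bernoulli(0.5) "shots"
    stat, pval = permutation_pvalue(shots, k=3, rng=rng)
    print("statistic:", round(stat, 3), "| one-sided permutation p-value:", round(pval, 3))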

arXiv link: http://arxiv.org/abs/1908.01406v6

Econometrics arXiv updated paper (originally submitted: 2019-08-04)

Estimating Unobserved Individual Heterogeneity Using Pairwise Comparisons

Authors: Elena Krasnokutskaya, Kyungchul Song, Xun Tang

We propose a new method for studying environments with unobserved individual
heterogeneity. Based on model-implied pairwise inequalities, the method
classifies individuals in the sample into groups defined by discrete unobserved
heterogeneity with unknown support. We establish conditions under which the
groups are identified and consistently estimated through our method. We show
that the method performs well in finite samples through Monte Carlo simulation.
We then apply the method to estimate a model of lowest-price procurement
auctions with unobserved bidder heterogeneity, using data from the California
highway procurement market.

arXiv link: http://arxiv.org/abs/1908.01272v3

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2019-08-03

The Use of Binary Choice Forests to Model and Estimate Discrete Choices

Authors: Ningyuan Chen, Guillermo Gallego, Zhuodong Tang

Problem definition. In retailing, discrete choice models (DCMs) are commonly
used to capture the choice behavior of customers when offered an assortment of
products. When estimating DCMs using transaction data, flexible models (such as
machine learning models or nonparametric models) are typically not
interpretable and hard to estimate, while tractable models (such as the
multinomial logit model) tend to misspecify the complex behavior represented in
the data. Methodology/results. In this study, we use a forest of binary
decision trees to represent DCMs. This approach is based on random forests, a
popular machine learning algorithm. The resulting model is interpretable: the
decision trees can explain the decision-making process of customers during the
purchase. We show that our approach can predict the choice probability of any
DCM consistently and thus never suffers from misspecification. Moreover, our
algorithm predicts assortments unseen in the training data. The mechanism and
errors can be theoretically analyzed. We also prove that the random forest can
recover preference rankings of customers thanks to splitting criteria such as
the Gini index and the information gain ratio. Managerial implications. The
framework has unique practical advantages. It can capture customers' behavioral
patterns such as irrationality or sequential searches when purchasing a
product. It handles nonstandard formats of training data that result from
aggregation. It can measure product importance based on how frequently a random
customer would make decisions depending on the presence of the product. It can
also incorporate price information and customer features. Our numerical
experiments using synthetic and real data show that using random forests to
estimate customer choices can outperform existing methods.

arXiv link: http://arxiv.org/abs/1908.01109v6

Econometrics arXiv paper, submitted: 2019-08-02

Heterogeneous Endogenous Effects in Networks

Authors: Sida Peng

This paper proposes a new method to identify leaders and followers in a
network. Prior works use spatial autoregression models (SARs) which implicitly
assume that each individual in the network has the same peer effects on others.
Mechanically, they conclude the key player in the network to be the one with
the highest centrality. However, when some individuals are more influential
than others, centrality may fail to be a good measure. I develop a model that
allows for individual-specific endogenous effects and propose a two-stage LASSO
procedure to identify influential individuals in a network. Under an assumption
of sparsity, namely that only a subset of individuals (which can increase with
the sample size n) is influential, I show that my 2SLSS estimator for
individual-specific
endogenous effects is consistent and achieves asymptotic normality. I also
develop robust inference including uniformly valid confidence intervals. These
results also carry through to scenarios where the influential individuals are
not sparse. I extend the analysis to allow for multiple types of connections
(multiple networks), and I show how to use the sparse group LASSO to detect
which of the multiple connection types is more influential. Simulation evidence
shows that my estimator has good finite sample performance. I further apply my
method to the data in Banerjee et al. (2013) and my proposed procedure is able
to identify leaders and effective networks.

arXiv link: http://arxiv.org/abs/1908.00663v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2019-07-31

Testing for Externalities in Network Formation Using Simulation

Authors: Bryan S. Graham, Andrin Pelican

We discuss a simplified version of the testing problem considered by Pelican
and Graham (2019): testing for interdependencies in preferences over links
among N (possibly heterogeneous) agents in a network. We describe an exact test
which conditions on a sufficient statistic for the nuisance parameter
characterizing any agent-level heterogeneity. Employing an algorithm due to
Blitzstein and Diaconis (2011), we show how to simulate the null distribution
of the test statistic in order to estimate critical values and/or p-values. We
illustrate our methods using the Nyakatoke risk-sharing network. We find that
the transitivity of the Nyakatoke network far exceeds what can be explained by
degree heterogeneity across households alone.

arXiv link: http://arxiv.org/abs/1908.00099v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2019-07-31

Kernel Density Estimation for Undirected Dyadic Data

Authors: Bryan S. Graham, Fengshi Niu, James L. Powell

We study nonparametric estimation of density functions for undirected dyadic
random variables (i.e., random variables defined for all
$n \equiv \binom{N}{2}$ unordered pairs of agents/nodes in a
weighted network of order N). These random variables satisfy a local dependence
property: any random variables in the network that share one or two indices may
be dependent, while those sharing no indices in common are independent. In this
setting, we show that density functions may be estimated by an application of
the kernel estimation method of Rosenblatt (1956) and Parzen (1962). We suggest
an estimate of their asymptotic variances inspired by a combination of (i)
Newey's (1994) method of variance estimation for kernel estimators in the
"monadic" setting and (ii) a variance estimator for the (estimated) density of
a simple network first suggested by Holland and Leinhardt (1976). More unusual
are the rates of convergence and asymptotic (normal) distributions of our
dyadic density estimates. Specifically, we show that they converge at the same
rate as the (unconditional) dyadic sample mean: the square root of the number,
N, of nodes. This differs from the results for nonparametric estimation of
densities and regression functions for monadic data, which generally have a
slower rate of convergence than their corresponding sample mean.
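
A sketch of the basic estimator: a Gaussian-kernel density estimate pooled
over the n = N(N-1)/2 unordered dyadic outcomes (the paper's dyadic-robust
variance estimator is not implemented; the dyadic data are simulated):

    import numpy as np

    def dyadic_kde(W, grid, h):
        """Gaussian-kernel density estimate pooled over the unordered dyads W[i, j], i < j."""
        iu = np.triu_indices(W.shape[0], k=1)
        w = W[iu]                                   # the n = N(N-1)/2 dyadic outcomes
        u = (grid[:, None] - w[None, :]) / h
        return np.exp(-0.5 * u ** 2).sum(axis=1) / (w.size * h * np.sqrt(2.0 * np.pi))

    rng = np.random.default_rng(8)
    N = 100
    a = rng.normal(size=N)                          # node effects induce the dyadic dependence
    W = a[:, None] + a[None, :] + rng.normal(size=(N, N))
    W = (W + W.T) / 2.0                             # undirected dyadic outcome

    grid = np.linspace(-5, 5, 101)
    f_hat = dyadic_kde(W, grid, h=0.4)
    print("estimated density at 0:", round(float(f_hat[np.abs(grid).argmin()]), 3))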

arXiv link: http://arxiv.org/abs/1907.13630v1

Econometrics arXiv updated paper (originally submitted: 2019-07-30)

Detecting Identification Failure in Moment Condition Models

Authors: Jean-Jacques Forneron

This paper develops an approach to detect identification failure in moment
condition models. This is achieved by introducing a quasi-Jacobian matrix
computed as the slope of a linear approximation of the moments on an estimate
of the identified set. It is asymptotically singular when local and/or global
identification fails, and equivalent to the usual Jacobian matrix which has
full rank when the model is point and locally identified. Building on this
property, a simple test with chi-squared critical values is introduced to
conduct subvector inferences allowing for strong, semi-strong, and weak
identification without a priori knowledge about the underlying
identification structure. Monte-Carlo simulations and an empirical application
to the Long-Run Risks model illustrate the results.

arXiv link: http://arxiv.org/abs/1907.13093v5

Econometrics arXiv paper, submitted: 2019-07-30

Predicting credit default probabilities using machine learning techniques in the face of unequal class distributions

Authors: Anna Stelzer

This study conducts a benchmarking study, comparing 23 different statistical
and machine learning methods in a credit scoring application. In order to do
so, the models' performance is evaluated over four different data sets in
combination with five data sampling strategies to tackle existing class
imbalances in the data. Six different performance measures are used to cover
different aspects of predictive performance. The results indicate a strong
superiority of ensemble methods and show that simple sampling strategies
deliver better results than more sophisticated ones.

arXiv link: http://arxiv.org/abs/1907.12996v1

Econometrics arXiv paper, submitted: 2019-07-30

A Comparison of First-Difference and Forward Orthogonal Deviations GMM

Authors: Robert F. Phillips

This paper provides a necessary and sufficient instruments condition assuring
two-step generalized method of moments (GMM) based on the forward orthogonal
deviations transformation is numerically equivalent to two-step GMM based on
the first-difference transformation. The condition also tells us when system
GMM, based on differencing, can be computed using forward orthogonal
deviations. Additionally, it tells us when forward orthogonal deviations and
differencing do not lead to the same GMM estimator. When estimators based on
these two transformations differ, Monte Carlo simulations indicate that
estimators based on forward orthogonal deviations have better finite sample
properties than estimators based on differencing.
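
For reference, the two transformations being compared, applied to a single
unit's short time series (the GMM estimation itself is not shown; the series
is hypothetical):

    import numpy as np

    def forward_orthogonal_deviations(x):
        """Forward orthogonal deviations of a length-T series (returns T - 1 values)."""
        T = len(x)
        out = np.empty(T - 1)
        for t in range(T - 1):
            c = np.sqrt((T - t - 1) / (T - t))      # scaling that keeps i.i.d. errors homoskedastic
            out[t] = c * (x[t] - x[t + 1:].mean())  # deviation from the mean of all future values
        return out

    x = np.array([1.0, 2.0, 1.5, 3.0, 2.5])
    print("forward orthogonal deviations:", np.round(forward_orthogonal_deviations(x), 3))
    print("first differences:            ", np.round(np.diff(x), 3))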

arXiv link: http://arxiv.org/abs/1907.12880v1

Econometrics arXiv updated paper (originally submitted: 2019-07-30)

Robust tests for ARCH in the presence of the misspecified conditional mean: A comparison of nonparametric approaches

Authors: Daiki Maki, Yasushi Ota

This study compares statistical properties of ARCH tests that are robust to
the presence of the misspecified conditional mean. The approaches employed in
this study are based on two nonparametric regressions for the conditional mean.
The first is an ARCH test using Nadaraya-Watson kernel regression. The second
is an ARCH test using polynomial approximation regression. The two approaches do
not require specification of the conditional mean and can adapt to various
nonlinear models, which are unknown a priori. Accordingly, they are robust to
misspecified conditional mean models. Simulation results show that ARCH tests
based on the polynomial approximation regression approach have better
statistical properties than ARCH tests using the Nadaraya-Watson kernel
regression approach for various nonlinear models.
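
A rough sketch of the first approach: residuals from a Nadaraya-Watson
regression of a series on its first lag, followed by Engle's ARCH LM test
(bandwidth, lag order, and data are all hypothetical; the polynomial
approximation variant is not implemented):

    import numpy as np
    from scipy import stats

    def nadaraya_watson(x_eval, x, y, h):
        k = np.exp(-0.5 * ((x_eval[:, None] - x[None, :]) / h) ** 2)
        return (k * y[None, :]).sum(axis=1) / k.sum(axis=1)

    def arch_lm_test(e, q):
        """Engle's ARCH LM test: regress e_t^2 on q of its own lags; statistic is approx chi2(q)."""
        e2 = e ** 2
        Y = e2[q:]
        X = np.column_stack([np.ones(len(Y))] + [e2[q - l:-l] for l in range(1, q + 1)])
        b, *_ = np.linalg.lstsq(X, Y, rcond=None)
        r2 = 1.0 - (Y - X @ b).var() / Y.var()
        stat = len(Y) * r2
        return float(stat), float(1.0 - stats.chi2.cdf(stat, df=q))

    rng = np.random.default_rng(9)
    n = 600
    y = np.zeros(n)
    for t in range(1, n):                           # nonlinear conditional mean, no ARCH in the errors
        y[t] = np.sin(y[t - 1]) + 0.3 * rng.standard_normal()

    resid = y[1:] - nadaraya_watson(y[:-1], y[:-1], y[1:], h=0.3)
    print("ARCH LM statistic and p-value (q = 4):", arch_lm_test(resid, q=4))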

arXiv link: http://arxiv.org/abs/1907.12752v2

Econometrics arXiv updated paper (originally submitted: 2019-07-28)

Testing for time-varying properties under misspecified conditional mean and variance

Authors: Daiki Maki, Yasushi Ota

This study examines statistical performance of tests for time-varying
properties under misspecified conditional mean and variance. When we test for
time-varying properties of the conditional mean in the case in which data have
no time-varying mean but have time-varying variance, asymptotic tests have size
distortions. This is improved by the use of a bootstrap method. Similarly, when
we test for time-varying properties of the conditional variance in the case in
which data have time-varying mean but no time-varying variance, asymptotic
tests have large size distortions. This is not improved even by the use of
bootstrap methods. We show that tests for time-varying properties of the
conditional mean by the bootstrap are robust regardless of the time-varying
variance model, whereas tests for time-varying properties of the conditional
variance do not perform well in the presence of misspecified time-varying mean.

arXiv link: http://arxiv.org/abs/1907.12107v2

Econometrics arXiv paper, submitted: 2019-07-22

X-model: further development and possible modifications

Authors: Sergei Kulakov

Despite its critical importance, the famous X-model elaborated by Ziel and
Steinert (2016) has neither been widely studied nor further developed. And
yet, the possibilities to improve the model are as numerous as the fields it
can be applied to. The present paper takes advantage of a technique proposed by
Coulon et al. (2014) to enhance the X-model. Instead of using the wholesale
supply and demand curves as inputs for the model, we rely on the transformed
versions of these curves with a perfectly inelastic demand. As a result,
the computational requirements of our X-model decrease and its forecasting power
increases substantially. Moreover, our X-model becomes more robust towards
outliers present in the initial auction curves data.

arXiv link: http://arxiv.org/abs/1907.09206v1

Econometrics arXiv paper, submitted: 2019-07-22

On the simulation of the Hawkes process via Lambert-W functions

Authors: Martin Magris

Several methods have been developed for the simulation of the Hawkes process.
The oldest approach is inverse transform sampling (ITS), suggested in Ozaki
(1979) but rapidly abandoned in favor of more efficient
alternatives. This manuscript shows that the ITS approach can be conveniently
discussed in terms of Lambert-W functions. An optimized and efficient
implementation suggests that this approach is computationally more performing
than more recent alternatives available for the simulation of the Hawkes
process.

arXiv link: http://arxiv.org/abs/1907.09162v1

Econometrics arXiv paper, submitted: 2019-07-20

Rebuttal of "On Nonparametric Identification of Treatment Effects in Duration Models"

Authors: Jaap H. Abbring, Gerard J. van den Berg

In their IZA Discussion Paper 10247, Johansson and Lee claim that the main
result (Proposition 3) in Abbring and Van den Berg (2003b) does not hold. We
show that their claim is incorrect. At a certain point within their line of
reasoning, they make a rather basic error while transforming one random
variable into another random variable, and this leads them to draw incorrect
conclusions. As a result, their paper can be discarded.

arXiv link: http://arxiv.org/abs/1907.09886v1

Econometrics arXiv paper, submitted: 2019-07-19

A Vine-copula extension for the HAR model

Authors: Martin Magris

The heterogeneous autoregressive (HAR) model is revised by modeling the joint
distribution of the four partial-volatility terms therein involved. Namely,
today's, yesterday's, last week's and last month's volatility components. The
joint distribution relies on a (C-) Vine copula construction, allowing one to
conveniently extract volatility forecasts based on the conditional expectation
of today's volatility given its past terms. The proposed empirical application
involves more than seven years of high-frequency transaction prices for ten
stocks and evaluates the in-sample, out-of-sample and one-step-ahead forecast
performance of our model for daily realized-kernel measures. The model proposed
in this paper is shown to outperform the HAR counterpart under different models
for marginal distributions, copula construction methods, and forecasting
settings.
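
For context, the baseline HAR regression that the paper extends, estimated by
OLS on a simulated persistent series standing in for realized variance (the
Vine-copula construction itself is not implemented):

    import numpy as np

    def har_design(rv):
        """HAR regressors: lagged daily, 5-day and 22-day average realized variance."""
        rows, y = [], []
        for t in range(22, len(rv)):
            rows.append([1.0, rv[t - 1], rv[t - 5:t].mean(), rv[t - 22:t].mean()])
            y.append(rv[t])
        return np.array(rows), np.array(y)

    rng = np.random.default_rng(10)
    T = 1500
    rv = np.empty(T); rv[0] = 1.0
    for t in range(1, T):                           # persistent positive series standing in for RV
        rv[t] = 0.1 + 0.85 * rv[t - 1] + 0.1 * rng.gamma(2.0)

    X, y = har_design(rv)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    print("HAR coefficients (const, daily, weekly, monthly):", np.round(coef, 3))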

arXiv link: http://arxiv.org/abs/1907.08522v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2019-07-17

Product Aesthetic Design: A Machine Learning Augmentation

Authors: Alex Burnap, John R. Hauser, Artem Timoshenko

Aesthetics are critically important to market acceptance. In the automotive
industry, an improved aesthetic design can boost sales by 30% or more. Firms
invest heavily in designing and testing aesthetics. A single automotive "theme
clinic" can cost over $100,000, and hundreds are conducted annually. We propose
a model to augment the commonly-used aesthetic design process by predicting
aesthetic scores and automatically generating innovative and appealing product
designs. The model combines a probabilistic variational autoencoder (VAE) with
adversarial components from generative adversarial networks (GAN) and a
supervised learning component. We train and evaluate the model with data from
an automotive partner: images of 203 SUVs evaluated by targeted consumers and
180,000 high-quality unrated images. Our model predicts the appeal of new
aesthetic designs well, with a 43.5% improvement relative to a uniform baseline and
substantial improvement over conventional machine learning models and
pretrained deep neural networks. New automotive designs are generated in a
controllable manner for use by design teams. We empirically verify that
automatically generated designs are (1) appealing to consumers and (2) resemble
designs which were introduced to the market five years after our data were
collected. We provide an additional proof-of-concept application using
open-source images of dining room chairs.

arXiv link: http://arxiv.org/abs/1907.07786v2

Econometrics arXiv paper, submitted: 2019-07-17

Testing for Unobserved Heterogeneity via k-means Clustering

Authors: Andrew J. Patton, Brian M. Weller

Clustering methods such as k-means have found widespread use in a variety of
applications. This paper proposes a formal testing procedure to determine
whether a null hypothesis of a single cluster, indicating homogeneity of the
data, can be rejected in favor of multiple clusters. The test is simple to
implement, valid under relatively mild conditions (including non-normality, and
heterogeneity of the data in aspects beyond those in the clustering analysis),
and applicable in a range of contexts (including clustering when the time
series dimension is small, or clustering on parameters other than the mean). We
verify that the test has good size control in finite samples, and we illustrate
the test in applications to clustering vehicle manufacturers and U.S. mutual
funds.
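
An illustrative check in the same spirit (not the authors' test): the relative
reduction in within-cluster sum of squares from one to two clusters,
calibrated by a parametric bootstrap under a single-Gaussian null; the paper's
test instead has analytic size control and broader validity:

    import numpy as np
    from sklearn.cluster import KMeans

    def ssq_reduction(X):
        """Relative drop in the within-cluster sum of squares when moving from 1 to 2 clusters."""
        ss1 = ((X - X.mean(axis=0)) ** 2).sum()
        ss2 = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X).inertia_
        return (ss1 - ss2) / ss1

    def single_cluster_bootstrap_pvalue(X, n_boot=200, rng=None):
        rng = rng or np.random.default_rng()
        stat = ssq_reduction(X)
        mu, cov = X.mean(axis=0), np.cov(X, rowvar=False)
        boot = [ssq_reduction(rng.multivariate_normal(mu, cov, size=len(X))) for _ in range(n_boot)]
        return float(stat), float(np.mean(np.array(boot) >= stat))

    rng = np.random.default_rng(11)
    X = rng.normal(size=(300, 2))                   # homogeneous data: the null should not be rejected
    stat, pval = single_cluster_bootstrap_pvalue(X, rng=rng)
    print("SSQ reduction:", round(stat, 3), "| bootstrap p-value:", round(pval, 3))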

arXiv link: http://arxiv.org/abs/1907.07582v1

Econometrics arXiv updated paper (originally submitted: 2019-07-17)

Testing for Quantile Sample Selection

Authors: Valentina Corradi, Daniel Gutknecht

This paper provides tests for detecting sample selection in nonparametric
conditional quantile functions. The first test is an omitted predictor test
with the propensity score as the omitted variable. As with any omnibus test, in
the case of rejection we cannot distinguish between rejection due to genuine
selection or to misspecification. Thus, we suggest a second test to provide
supporting evidence whether the cause for rejection at the first stage was
solely due to selection or not. Using only individuals with propensity score
close to one, this second test relies on an `identification at infinity'
argument, but accommodates cases of irregular identification. Importantly,
neither of the two tests requires parametric assumptions on the selection
equation nor a continuous exclusion restriction. Data-driven bandwidth
procedures are proposed, and Monte Carlo evidence suggests a good finite sample
performance in particular of the first test. Finally, we also derive an
extension of the first test to nonparametric conditional mean functions, and
apply our procedure to test for selection in log hourly wages using UK Family
Expenditure Survey data, as in Arellano and Bonhomme (2017).

arXiv link: http://arxiv.org/abs/1907.07412v5

Econometrics arXiv updated paper (originally submitted: 2019-07-16)

On the inconsistency of matching without replacement

Authors: Fredrik Sävje

The paper shows that matching without replacement on propensity scores
produces estimators that generally are inconsistent for the average treatment
effect of the treated. To achieve consistency, practitioners must either assume
that no units exist with propensity scores greater than one-half or assume that
there is no confounding among such units. The result is not driven by the use
of propensity scores, and similar artifacts arise when matching on other scores
as long as it is without replacement.

arXiv link: http://arxiv.org/abs/1907.07288v2

Econometrics arXiv updated paper (originally submitted: 2019-07-16)

Shrinkage in the Time-Varying Parameter Model Framework Using the R Package shrinkTVP

Authors: Peter Knaus, Angela Bitto-Nemling, Annalisa Cadonna, Sylvia Frühwirth-Schnatter

Time-varying parameter (TVP) models are widely used in time series analysis
to flexibly deal with processes which gradually change over time. However, the
risk of overfitting in TVP models is well known. This issue can be dealt with
using appropriate global-local shrinkage priors, which pull time-varying
parameters towards static ones. In this paper, we introduce the R package
shrinkTVP (Knaus, Bitto-Nemling, Cadonna, and Fr\"uhwirth-Schnatter 2019),
which provides a fully Bayesian implementation of shrinkage priors for TVP
models, taking advantage of recent developments in the literature, in
particular that of Bitto and Fr\"uhwirth-Schnatter (2019). The package
shrinkTVP allows for posterior simulation of the parameters through an
efficient Markov Chain Monte Carlo (MCMC) scheme. Moreover, summary and
visualization methods, as well as the possibility of assessing predictive
performance through log predictive density scores (LPDSs), are provided. The
computationally intensive tasks have been implemented in C++ and interfaced
with R. The paper includes a brief overview of the models and shrinkage priors
implemented in the package. Furthermore, core functionalities are illustrated,
both with simulated and real data.

arXiv link: http://arxiv.org/abs/1907.07065v3

Econometrics arXiv updated paper (originally submitted: 2019-07-16)

Information processing constraints in travel behaviour modelling: A generative learning approach

Authors: Melvin Wong, Bilal Farooq

Travel decisions tend to exhibit sensitivity to uncertainty and information
processing constraints. These behavioural conditions can be characterized by a
generative learning process. We propose a data-driven generative model version
of rational inattention theory to emulate these behavioural representations. We
outline the methodology of the generative model and the associated learning
process as well as provide an intuitive explanation of how this process
captures the value of prior information in the choice utility specification. We
demonstrate the effects of information heterogeneity on a travel choice,
analyze the econometric interpretation, and explore the properties of our
generative model. Our findings indicate a strong correlation with rational
inattention behaviour theory, which suggests that individuals may ignore certain
exogenous variables and rely on prior information for evaluating decisions
under uncertainty. Finally, the principles demonstrated in this study can be
formulated as a generalized entropy and utility based multinomial logit model.

arXiv link: http://arxiv.org/abs/1907.07036v2

Econometrics arXiv updated paper (originally submitted: 2019-07-15)

Audits as Evidence: Experiments, Ensembles, and Enforcement

Authors: Patrick Kline, Christopher Walters

We develop tools for utilizing correspondence experiments to detect illegal
discrimination by individual employers. Employers violate US employment law if
their propensity to contact applicants depends on protected characteristics
such as race or sex. We establish identification of higher moments of the
causal effects of protected characteristics on callback rates as a function of
the number of fictitious applications sent to each job ad. These moments are
used to bound the fraction of jobs that illegally discriminate. Applying our
results to three experimental datasets, we find evidence of significant
employer heterogeneity in discriminatory behavior, with the standard deviation
of gaps in job-specific callback probabilities across protected groups
averaging roughly twice the mean gap. In a recent experiment manipulating
racially distinctive names, we estimate that at least 85% of jobs that contact
both of two white applications and neither of two black applications are
engaged in illegal discrimination. To assess the tradeoff between type I and II
errors presented by these patterns, we consider the performance of a series of
decision rules for investigating suspicious callback behavior under a simple
two-type model that rationalizes the experimental data. Though, in our
preferred specification, only 17% of employers are estimated to discriminate on
the basis of race, we find that an experiment sending 10 applications to each
job would enable accurate detection of 7-10% of discriminators while falsely
accusing fewer than 0.2% of non-discriminators. A minimax decision rule
acknowledging partial identification of the joint distribution of callback
rates yields higher error rates but more investigations than our baseline
two-type model. Our results suggest illegal labor market discrimination can be
reliably monitored with relatively small modifications to existing audit
designs.

arXiv link: http://arxiv.org/abs/1907.06622v2

Econometrics arXiv updated paper (originally submitted: 2019-07-15)

Simple Adaptive Size-Exact Testing for Full-Vector and Subvector Inference in Moment Inequality Models

Authors: Gregory Cox, Xiaoxia Shi

We propose a simple test for moment inequalities that has exact size in
normal models with known variance and has uniformly asymptotically exact size
more generally. The test compares the quasi-likelihood ratio statistic to a
chi-squared critical value, where the degrees of freedom equal the rank of the
inequalities that are active in finite samples. The test requires no simulation
and thus is computationally fast and especially suitable for constructing
confidence sets for parameters by test inversion. It uses no tuning parameter
for moment selection and yet still adapts to the slackness of the moment
inequalities. Furthermore, we show how the test can be easily adapted for
inference on subvectors for the common empirical setting of conditional moment
inequalities with nuisance parameters entering linearly.
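
The following is a minimal sketch, not the authors' implementation, of the test
logic described above for the simple case H0: E[m_j] <= 0 for all j with the
moment variance treated as known: project the sample moments onto the null set
to obtain the quasi-likelihood ratio statistic, count the inequalities that bind
at the optimum, and compare the statistic to a chi-squared critical value with
that many degrees of freedom. The exact definition of "active" inequalities and
the subvector extension in the paper are more involved.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def qlr_moment_inequality_test(m_bar, Sigma, n, alpha=0.05, tol=1e-6):
    """Test H0: E[m] <= 0 using sample moments m_bar and known variance Sigma."""
    Sigma_inv = np.linalg.inv(Sigma)

    def objective(t):
        d = m_bar - t
        return n * d @ Sigma_inv @ d

    k = len(m_bar)
    bounds = [(None, 0.0)] * k               # feasible set under the null: t_j <= 0
    res = minimize(objective, x0=np.minimum(m_bar, 0.0), bounds=bounds)
    qlr = res.fun
    active = int(np.sum(np.abs(res.x) < tol))  # inequalities binding at the optimum
    if active == 0:
        return qlr, active, False            # nothing binds: do not reject
    crit = chi2.ppf(1 - alpha, df=active)
    return qlr, active, qlr > crit

# toy example with two moments
rng = np.random.default_rng(0)
data = rng.normal(loc=[0.1, -0.2], scale=1.0, size=(200, 2))
print(qlr_moment_inequality_test(data.mean(0), np.cov(data.T), n=len(data)))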

arXiv link: http://arxiv.org/abs/1907.06317v2

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2019-07-15

On the Evolution of U.S. Temperature Dynamics

Authors: Francis X. Diebold, Glenn D. Rudebusch

Climate change is a massive multidimensional shift. Temperature shifts, in
particular, have important implications for urbanization, agriculture, health,
productivity, and poverty, among other things. While much research has
documented rising mean temperature levels, we also examine range-based
measures of daily temperature volatility. Specifically, using data for
select U.S. cities over the past half-century, we compare the evolving time
series dynamics of the average temperature level, AVG, and the diurnal
temperature range, DTR (the difference between the daily maximum and minimum
temperatures). We characterize trend and seasonality in these two series using
linear models with time-varying coefficients. These straightforward yet
flexible approximations provide evidence of evolving DTR seasonality and stable
AVG seasonality.
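
As a rough illustration of the kind of time-varying-coefficient approximation
described above, the sketch below regresses a hypothetical daily temperature
series on a linear trend and annual Fourier terms, letting the seasonal
amplitude drift linearly with time. The data, functional form, and HAC
bandwidth are illustrative assumptions, not the authors' specification.

import numpy as np
import statsmodels.api as sm

# hypothetical daily temperature series over T days
T = 365 * 50
t = np.arange(T)
doy = t % 365.25
rng = np.random.default_rng(1)
y = (10 + 0.0001 * t
     + (12 - 0.00005 * t) * np.cos(2 * np.pi * doy / 365.25)
     + rng.normal(0, 3, T))

# level and seasonal amplitude both allowed to evolve linearly with time
X = np.column_stack([
    np.ones(T), t,
    np.cos(2 * np.pi * doy / 365.25), np.sin(2 * np.pi * doy / 365.25),
    t * np.cos(2 * np.pi * doy / 365.25), t * np.sin(2 * np.pi * doy / 365.25),
])
fit = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 30})
print(fit.params)   # the interaction terms measure how seasonality drifts over time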

arXiv link: http://arxiv.org/abs/1907.06303v3

Econometrics arXiv paper, submitted: 2019-07-12

On the residues vectors of a rational class of complex functions. Application to autoregressive processes

Authors: Guillermo Daniel Scheidereiter, Omar Roberto Faure

Complex functions have many uses across fields of study, so analyzing their
characteristics is of broad interest to other sciences. This work starts from a
particular class of rational functions of a complex variable, deduces two
elementary properties concerning their residues, and proposes a result that
establishes a lower bound for the p-norm of the residues vector. Applications
to autoregressive processes are presented, with illustrations based on
historical data on electricity generation and econometric series.

arXiv link: http://arxiv.org/abs/1907.05949v1

Econometrics arXiv updated paper (originally submitted: 2019-07-09)

Identification and Estimation of Discrete Choice Models with Unobserved Choice Sets

Authors: Victor H. Aguiar, Nail Kashaev

We propose a framework for nonparametric identification and estimation of
discrete choice models with unobserved choice sets. We recover the joint
distribution of choice sets and preferences from a panel dataset on choices. We
assume that either the latent choice sets are sparse or that the panel is
sufficiently long. Sparsity requires the number of possible choice sets to be
relatively small. It is satisfied, for instance, when the choice sets are
nested, or when they form a partition. Our estimation procedure is
computationally fast and uses mixed-integer optimization to recover the sparse
support of choice sets. Analyzing the ready-to-eat cereal industry using a
household scanner dataset, we find that ignoring the unobservability of choice
sets can lead to biased estimates of preferences due to significant latent
heterogeneity in choice sets.

arXiv link: http://arxiv.org/abs/1907.04853v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2019-07-09

Adaptive inference for a semiparametric generalized autoregressive conditional heteroskedasticity model

Authors: Feiyu Jiang, Dong Li, Ke Zhu

This paper considers a semiparametric generalized autoregressive conditional
heteroskedasticity (S-GARCH) model. For this model, we first estimate the
time-varying long run component for unconditional variance by the kernel
estimator, and then estimate the non-time-varying parameters in GARCH-type
short run component by the quasi maximum likelihood estimator (QMLE). We show
that the QMLE is asymptotically normal with the parametric convergence rate.
Next, we construct a Lagrange multiplier test for linear parameter constraint
and a portmanteau test for model checking, and obtain their asymptotic null
distributions. Our entire statistical inference procedure works for the
non-stationary data with two important features: first, our QMLE and two tests
are adaptive to the unknown form of the long run component; second, our QMLE
and two tests share the same efficiency and testing power as those of the
variance targeting method when the S-GARCH model is stationary.

arXiv link: http://arxiv.org/abs/1907.04147v4

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2019-07-08

Competing Models

Authors: Jose Luis Montiel Olea, Pietro Ortoleva, Mallesh M Pai, Andrea Prat

Different agents need to make a prediction. They observe identical data, but
have different models: they predict using different explanatory variables. We
study which agent believes they have the best predictive ability -- as measured
by the smallest subjective posterior mean squared prediction error -- and show
how it depends on the sample size. With small samples, we present results
suggesting it is an agent using a low-dimensional model. With large samples, it
is generally an agent with a high-dimensional model, possibly including
irrelevant variables, but never excluding relevant ones. We apply our results
to characterize the winning model in an auction of productive assets, to argue
that entrepreneurs and investors with simple models will be over-represented in
new sectors, and to understand the proliferation of "factors" that explain the
cross-sectional variation of expected stock returns in the asset-pricing
literature.

arXiv link: http://arxiv.org/abs/1907.03809v5

Econometrics arXiv cross-link from q-fin.PM (q-fin.PM), submitted: 2019-07-08

Artificial Intelligence Alter Egos: Who benefits from Robo-investing?

Authors: Catherine D'Hondt, Rudy De Winne, Eric Ghysels, Steve Raymond

Artificial intelligence, or AI, enhancements are increasingly shaping our
daily lives. Financial decision-making is no exception to this. We introduce
the notion of AI Alter Egos, which are shadow robo-investors, and use a unique
data set covering brokerage accounts for a large cross-section of investors
over a sample from January 2003 to March 2012, which includes the 2008
financial crisis, to assess the benefits of robo-investing. We have detailed
investor characteristics and records of all trades. Our data set consists of
investors typically targeted for robo-advising. We explore robo-investing
strategies commonly used in the industry, including some involving advanced
machine learning methods. The man versus machine comparison allows us to shed
light on potential benefits the emerging robo-advising industry may provide to
certain segments of the population, such as low income and/or high risk averse
investors.

arXiv link: http://arxiv.org/abs/1907.03370v1

Econometrics arXiv updated paper (originally submitted: 2019-07-04)

Random Forest Estimation of the Ordered Choice Model

Authors: Michael Lechner, Gabriel Okasa

In this paper we develop a new machine learning estimator for ordered choice
models based on the random forest. The proposed Ordered Forest flexibly
estimates the conditional choice probabilities while taking the ordering
information explicitly into account. In addition to common machine learning
estimators, it enables the estimation of marginal effects as well as conducting
inference and thus provides the same output as classical econometric
estimators. An extensive simulation study reveals a good predictive
performance, particularly in settings with non-linearities and
near-multicollinearity. An empirical application contrasts the estimation of
marginal effects and their standard errors with an ordered logit model. A
software implementation of the Ordered Forest is provided both in R and Python
in the package orf available on CRAN and PyPI, respectively.
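
A minimal sketch of the general idea behind forest-based ordered choice
estimation, not the orf package itself: estimate the cumulative probabilities
P(Y <= k | X) with separate forests fitted to binary indicators and difference
them to obtain class probabilities. The tuning values and toy data are
illustrative, and the package's weighting, marginal effects, and inference
machinery are omitted.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def ordered_forest_probs(X_train, y_train, X_test, n_estimators=200, random_state=0):
    """Estimate P(Y = k | X) for ordered y in {1,...,K} by differencing
    forest estimates of the cumulative probabilities P(Y <= k | X)."""
    classes = np.sort(np.unique(y_train))
    cum = [np.zeros(len(X_test))]
    for k in classes[:-1]:                     # last cumulative probability is 1
        rf = RandomForestRegressor(n_estimators=n_estimators, random_state=random_state)
        rf.fit(X_train, (y_train <= k).astype(float))
        cum.append(rf.predict(X_test))
    cum.append(np.ones(len(X_test)))
    cum = np.clip(np.column_stack(cum), 0.0, 1.0)
    cum = np.maximum.accumulate(cum, axis=1)   # enforce monotone cumulative probabilities
    probs = np.diff(cum, axis=1)
    return probs / probs.sum(axis=1, keepdims=True)

# toy usage with a latent-index ordered outcome
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
latent = X @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=1000)
y = np.digitize(latent, [-1.0, 0.5]) + 1       # three ordered categories
print(ordered_forest_probs(X[:800], y[:800], X[800:])[:3])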

arXiv link: http://arxiv.org/abs/1907.02436v3

Econometrics arXiv updated paper (originally submitted: 2019-07-04)

Heterogeneous Choice Sets and Preferences

Authors: Levon Barseghyan, Maura Coughlin, Francesca Molinari, Joshua C. Teitelbaum

We propose a robust method of discrete choice analysis when agents' choice
sets are unobserved. Our core model assumes nothing about agents' choice sets
apart from their minimum size. Importantly, it leaves unrestricted the
dependence, conditional on observables, between choice sets and preferences. We
first characterize the sharp identification region of the model's parameters by
a finite set of conditional moment inequalities. We then apply our theoretical
findings to learn about households' risk preferences and choice sets from data
on their deductible choices in auto collision insurance. We find that the data
can be explained by expected utility theory with low levels of risk aversion
and heterogeneous non-singleton choice sets, and that more than three in four
households require limited choice sets to explain their deductible choices. We
also provide simulation evidence on the computational tractability of our
method in applications with larger feasible sets or higher-dimensional
unobserved heterogeneity.

arXiv link: http://arxiv.org/abs/1907.02337v2

Econometrics arXiv cross-link from General Economics (econ.GN), submitted: 2019-07-04

Optimal transport on large networks, a practitioner's guide

Authors: Arthur Charpentier, Alfred Galichon, Lucas Vernet

This article presents a set of tools for the modeling of a spatial allocation
problem in a large geographic market and gives examples of applications. In our
settings, the market is described by a network that maps the cost of travel
between each pair of adjacent locations. Two types of agents are located at the
nodes of this network. The buyers choose the most competitive sellers depending
on their prices and the cost to reach them. Their utility is assumed additive
in both these quantities. Each seller, taking as given other sellers' prices,
sets her own price so that demand equals the one we observe. We give a
linear programming formulation for the equilibrium conditions. After formally
introducing our model we apply it to two examples: prices offered by petrol
stations and quality of services provided by maternity wards. These examples
illustrate the applicability of our model to aggregate demand, rank prices and
estimate cost structure over the network. We emphasize that applications to
large-scale data sets are possible using modern linear programming solvers
such as Gurobi. In addition to this paper we release an R toolbox implementing
our results and an online tutorial (http://optimalnetwork.github.io).

arXiv link: http://arxiv.org/abs/1907.02320v2

Econometrics arXiv updated paper (originally submitted: 2019-07-04)

Heterogeneous Regression Models for Clusters of Spatial Dependent Data

Authors: Zhihua Ma, Yishu Xue, Guanyu Hu

In economic development, there are often regions that share similar economic
characteristics, and economic models on such regions tend to have similar
covariate effects. In this paper, we propose a Bayesian clustered regression
for spatially dependent data in order to detect clusters in the covariate
effects. Our proposed method is based on the Dirichlet process which provides a
probabilistic framework for simultaneous inference of the number of clusters
and the clustering configurations. The usage of our method is illustrated both
in simulation studies and an application to a housing cost dataset of Georgia.

arXiv link: http://arxiv.org/abs/1907.02212v4

Econometrics arXiv updated paper (originally submitted: 2019-07-03)

The Informativeness of Estimation Moments

Authors: Bo Honore, Thomas Jorgensen, Aureo de Paula

This paper introduces measures for how each moment contributes to the
precision of parameter estimates in GMM settings. For example, one of the
measures asks what would happen to the variance of the parameter estimates if a
particular moment was dropped from the estimation. The measures are all easy to
compute. We illustrate the usefulness of the measures through two simple
examples as well as an application to a model of joint retirement planning of
couples. We estimate the model using the UK-BHPS, and we find evidence of
complementarities in leisure. Our sensitivity measures illustrate that the
estimate of the complementarity is primarily informed by the distribution of
differences in planned retirement dates. The estimated econometric model can be
interpreted as a bivariate ordered choice model that allows for simultaneity.
This makes the model potentially useful in other applications.
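
One of the measures described, asking what happens to the variance of the
parameter estimates when a moment is dropped, can be illustrated numerically.
The sketch below uses the efficient-GMM asymptotic variance (G' S^{-1} G)^{-1}
and simply recomputes it without each moment; the paper's measures are more
general, but the mechanics are similar. The Jacobian G and moment variance S
here are made up for illustration.

import numpy as np

def gmm_avar(G, S):
    """Asymptotic variance of the efficient GMM estimator with Jacobian G (m x k)
    and moment variance S (m x m): (G' S^{-1} G)^{-1}."""
    return np.linalg.inv(G.T @ np.linalg.solve(S, G))

def moment_drop_sensitivity(G, S):
    """For each moment j, the relative increase in parameter variances when
    moment j is dropped (a simplified version of the measure described above)."""
    base = np.diag(gmm_avar(G, S))
    m = G.shape[0]
    out = []
    for j in range(m):
        keep = [i for i in range(m) if i != j]
        Gj, Sj = G[keep], S[np.ix_(keep, keep)]
        if Gj.shape[0] < G.shape[1]:
            out.append(np.full_like(base, np.inf))   # parameters no longer identified
        else:
            out.append(np.diag(gmm_avar(Gj, Sj)) / base - 1.0)
    return np.array(out)                             # rows: dropped moment, cols: parameters

# toy example: 4 moments, 2 parameters
G = np.array([[1.0, 0.2], [0.8, 1.0], [0.1, 0.9], [0.5, 0.5]])
S = np.eye(4)
print(moment_drop_sensitivity(G, S))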

arXiv link: http://arxiv.org/abs/1907.02101v2

Econometrics arXiv updated paper (originally submitted: 2019-07-03)

An Econometric Perspective on Algorithmic Subsampling

Authors: Sokbae Lee, Serena Ng

Datasets that are terabytes in size are increasingly common, but computer
bottlenecks often frustrate a complete analysis of the data. While more data
are better than less, diminishing returns suggest that we may not need
terabytes of data to estimate a parameter or test a hypothesis. But which rows
of data should we analyze, and might an arbitrary subset of rows preserve the
features of the original data? This paper reviews a line of work that is
grounded in theoretical computer science and numerical linear algebra, and
which finds that an algorithmically desirable sketch, which is a randomly
chosen subset of the data, must preserve the eigenstructure of the data, a
property known as a subspace embedding. Building on this work, we study how
prediction and inference can be affected by data sketching within a linear
regression setup. We show that the sketching error is small compared to the
sample size effect which a researcher can control. As a sketch size that is
algorithmically optimal may not be suitable for prediction and inference, we
use statistical arguments to provide 'inference conscious' guides to the sketch
size. When appropriately implemented, an estimator that pools over different
sketches can be nearly as efficient as the infeasible one using the full
sample.
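
A minimal numerical sketch of the theme above, under the simplifying assumption
that the sketch is a uniformly sampled subset of rows (only one of the
sketching schemes reviewed): compare coefficients estimated on the sketch with
the full-sample estimate and with the truth, so the sketching error can be set
against the sample-size effect.

import numpy as np

rng = np.random.default_rng(0)
n, d, m = 1_000_000, 5, 20_000           # full sample size and sketch size
X = rng.normal(size=(n, d))
beta = np.array([1.0, -0.5, 0.25, 0.0, 2.0])
y = X @ beta + rng.normal(size=n)

# full-sample OLS
b_full = np.linalg.lstsq(X, y, rcond=None)[0]

# uniform-row sketch: a randomly chosen subset of rows
idx = rng.choice(n, size=m, replace=False)
b_sketch = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]

print("sketching error:", np.max(np.abs(b_sketch - b_full)))
print("estimation error of the full sample:", np.max(np.abs(b_full - beta)))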

arXiv link: http://arxiv.org/abs/1907.01954v4

Econometrics arXiv paper, submitted: 2019-07-02

Adaptive Pricing in Insurance: Generalized Linear Models and Gaussian Process Regression Approaches

Authors: Yuqing Zhang, Neil Walton

We study the application of dynamic pricing to insurance. We view this as an
online revenue management problem where the insurance company looks to set
prices to optimize the long-run revenue from selling a new insurance product.
We develop two pricing models: an adaptive Generalized Linear Model (GLM) and
an adaptive Gaussian Process (GP) regression model. Both balance between
exploration, where we choose prices in order to learn the distribution of
demands & claims for the insurance product, and exploitation, where we
myopically choose the best price from the information gathered so far. The
performance of the pricing policies is measured in terms of regret: the
expected revenue loss caused by not using the optimal price. As is commonplace
in insurance, we model demand and claims by GLMs. In our adaptive GLM design,
we use the maximum quasi-likelihood estimation (MQLE) to estimate the unknown
parameters. We show that, if prices are chosen with suitably decreasing
variability, the MQLE parameters eventually exist and converge to the correct
values, which in turn implies that the sequence of chosen prices will also
converge to the optimal price. In the adaptive GP regression model, we sample
demand and claims from Gaussian Processes and then choose selling prices by the
upper confidence bound rule. We also analyze these GLM and GP pricing
algorithms with delayed claims. Although similar results exist in other
domains, this is among the first works to consider dynamic pricing problems in
the field of insurance. We also believe this is the first work to consider
Gaussian Process regression in the context of insurance pricing. These initial
findings suggest that online machine learning algorithms could be a fruitful
area of future investigation and application in insurance.
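
A minimal sketch of the upper-confidence-bound pricing rule described for the
GP model, with a hypothetical demand curve and the claims process omitted:
demand is modelled by a Gaussian process over posted prices, and each period
the price maximizing an optimistic estimate of expected revenue is posted. The
kernel and all parameters are illustrative assumptions.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
true_demand = lambda p: np.clip(1.0 - 0.08 * p, 0.0, 1.0)   # hypothetical purchase probability
grid = np.linspace(1.0, 12.0, 100).reshape(-1, 1)

prices, sales = [], []
for t in range(200):
    if len(prices) < 5:
        p = rng.uniform(1.0, 12.0)                          # initial exploration
    else:
        gp = GaussianProcessRegressor(kernel=RBF(2.0) + WhiteKernel(0.1), normalize_y=True)
        gp.fit(np.array(prices).reshape(-1, 1), np.array(sales))
        mu, sd = gp.predict(grid, return_std=True)
        ucb_revenue = grid.ravel() * (mu + 2.0 * sd)        # optimistic expected revenue
        p = float(grid[np.argmax(ucb_revenue)])
    prices.append(p)
    sales.append(float(rng.random() < true_demand(p)))      # observe a sale or not

print("last 20 posted prices:", np.round(prices[-20:], 2))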

arXiv link: http://arxiv.org/abs/1907.05381v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2019-07-02

Large Volatility Matrix Prediction with High-Frequency Data

Authors: Xinyu Song

We provide a novel method for large volatility matrix prediction with
high-frequency data by applying eigen-decomposition to daily realized
volatility matrix estimators and capturing eigenvalue dynamics with ARMA
models. Given a sequence of daily volatility matrix estimators, we compute the
aggregated eigenvectors and obtain the corresponding eigenvalues. Eigenvalues
in the same relative magnitude form a time series and the ARMA models are
further employed to model the dynamics within each eigenvalue time series to
produce a predictor. We predict future large volatility matrix based on the
predicted eigenvalues and the aggregated eigenvectors, and demonstrate the
advantages of the proposed method in volatility prediction and portfolio
allocation problems.
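
A minimal sketch of the prediction recipe described above: take aggregated
eigenvectors from the average realized volatility matrix, form one time series
per projected eigenvalue, fit an ARMA-type model to each, and reconstruct the
predicted matrix from the eigenvalue forecasts. The aggregation step and model
orders here are simplifications of the paper's procedure.

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def predict_vol_matrix(rv_matrices, order=(1, 0, 1)):
    """rv_matrices: array of shape (T, p, p) of daily realized volatility matrices."""
    avg = rv_matrices.mean(axis=0)
    _, V = np.linalg.eigh(avg)                            # aggregated eigenvectors
    # project each daily matrix onto the common eigenvectors -> eigenvalue time series
    lam = np.einsum("ji,tjk,ki->ti", V, rv_matrices, V)   # diag(V' S_t V) for each t
    lam_hat = []
    for series in lam.T:                                  # one ARMA model per eigenvalue series
        fit = ARIMA(series, order=order).fit()
        lam_hat.append(float(fit.forecast(1)[0]))
    return V @ np.diag(lam_hat) @ V.T                     # predicted volatility matrix

# toy example with simulated 3x3 realized volatility matrices
rng = np.random.default_rng(0)
base = np.array([[1.0, 0.3, 0.2], [0.3, 1.5, 0.4], [0.2, 0.4, 2.0]])
rvs = np.array([base * (1 + 0.2 * np.sin(t / 10)) for t in range(250)])
rvs += 0.01 * rng.normal(size=rvs.shape)
rvs = (rvs + rvs.transpose(0, 2, 1)) / 2                  # keep the matrices symmetric
print(predict_vol_matrix(rvs))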

arXiv link: http://arxiv.org/abs/1907.01196v2

Econometrics arXiv paper, submitted: 2019-07-01

Simulation smoothing for nowcasting with large mixed-frequency VARs

Authors: Sebastian Ankargren, Paulina Jonéus

There is currently an increasing interest in large vector autoregressive
(VAR) models. VARs are popular tools for macroeconomic forecasting and use of
larger models has been demonstrated to often improve the forecasting ability
compared to more traditional small-scale models. Mixed-frequency VARs deal with
data sampled at different frequencies while remaining within the realms of
VARs. Estimation of mixed-frequency VARs makes use of simulation smoothing, but
using the standard procedure these models quickly become prohibitive in
nowcasting situations as the size of the model grows. We propose two algorithms
that improve the computational efficiency of the simulation smoothing
algorithm. Our preferred choice is an adaptive algorithm, which augments the
state vector as necessary to sample also monthly variables that are missing at
the end of the sample. For large VARs, we find considerable improvements in
speed using our adaptive algorithm. The algorithm therefore provides a crucial
building block for bringing the mixed-frequency VARs to the high-dimensional
regime.

arXiv link: http://arxiv.org/abs/1907.01075v1

Econometrics arXiv updated paper (originally submitted: 2019-07-01)

Permutation inference with a finite number of heterogeneous clusters

Authors: Andreas Hagemann

I introduce a simple permutation procedure to test conventional (non-sharp)
hypotheses about the effect of a binary treatment in the presence of a finite
number of large, heterogeneous clusters when the treatment effect is identified
by comparisons across clusters. The procedure asymptotically controls size by
applying a level-adjusted permutation test to a suitable statistic. The
adjustments needed for most empirically relevant situations are tabulated in
the paper. The adjusted permutation test is easy to implement in practice and
performs well at conventional levels of significance with at least four treated
clusters and a similar number of control clusters. It is particularly robust to
situations where some clusters are much more variable than others. Examples and
an empirical application are provided.
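
A minimal sketch of a cluster-level permutation test of no treatment effect,
omitting the level adjustments that are the paper's contribution: permute which
clusters are labelled treated and compare a difference-in-means statistic
across all possible assignments. The cluster means and the number of treated
clusters are illustrative.

import numpy as np
from itertools import combinations

def cluster_permutation_pvalue(cluster_means, treated_idx):
    """Permutation p-value for H0: no treatment effect, permuting which clusters
    are labelled treated. cluster_means: array of per-cluster outcome means."""
    k, q = len(cluster_means), len(treated_idx)
    observed = (cluster_means[list(treated_idx)].mean()
                - np.delete(cluster_means, list(treated_idx)).mean())
    hits = []
    for combo in combinations(range(k), q):        # all possible treatment assignments
        stat = (cluster_means[list(combo)].mean()
                - np.delete(cluster_means, list(combo)).mean())
        hits.append(abs(stat) >= abs(observed))
    return np.mean(hits)

# toy example: 8 clusters, 4 treated, treated clusters shifted upward
rng = np.random.default_rng(0)
means = rng.normal(0, 1, size=8)
means[:4] += 1.5
print(cluster_permutation_pvalue(means, treated_idx=[0, 1, 2, 3]))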

arXiv link: http://arxiv.org/abs/1907.01049v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2019-06-30

Bounding Causes of Effects with Mediators

Authors: Philip Dawid, Macartan Humphreys, Monica Musio

Suppose X and Y are binary exposure and outcome variables, and we have full
knowledge of the distribution of Y, given application of X. From this we know
the average causal effect of X on Y. We are now interested in assessing, for a
case that was exposed and exhibited a positive outcome, whether it was the
exposure that caused the outcome. The relevant "probability of causation", PC,
typically is not identified by the distribution of Y given X, but bounds can be
placed on it, and these bounds can be improved if we have further information
about the causal process. Here we consider cases where we know the
probabilistic structure for a sequence of complete mediators between X and Y.
We derive a general formula for calculating bounds on PC for any pattern of
data on the mediators (including the case with no data). We show that the
largest and smallest upper and lower bounds that can result from any complete
mediation process can be obtained in processes with at most two steps. We also
consider homogeneous processes with many mediators. PC can sometimes be
identified as 0 with negative data, but it cannot be identified at 1 even with
positive data on an infinite set of mediators. The results have implications
for learning about causation from knowledge of general processes and of data on
cases.

arXiv link: http://arxiv.org/abs/1907.00399v1

Econometrics arXiv updated paper (originally submitted: 2019-06-29)

Relaxing the Exclusion Restriction in Shift-Share Instrumental Variable Estimation

Authors: Nicolas Apfel

Many economic studies use shift-share instruments to estimate causal effects.
Often, all shares need to fulfil an exclusion restriction, making the
identifying assumption strict. This paper proposes to use methods that relax
the exclusion restriction by selecting invalid shares. I apply the methods in
two empirical examples: the effect of immigration on wages and of Chinese
import exposure on employment. In the first application, the coefficient
becomes lower and often changes sign, but this is reconcilable with arguments
made in the literature. In the second application, the findings are mostly
robust to the use of the new methods.

arXiv link: http://arxiv.org/abs/1907.00222v4

Econometrics arXiv cross-link from stat.CO (stat.CO), submitted: 2019-06-28

Dealing with Stochastic Volatility in Time Series Using the R Package stochvol

Authors: Gregor Kastner

The R package stochvol provides a fully Bayesian implementation of
heteroskedasticity modeling within the framework of stochastic volatility. It
utilizes Markov chain Monte Carlo (MCMC) samplers to conduct inference by
obtaining draws from the posterior distribution of parameters and latent
variables which can then be used for predicting future volatilities. The
package can straightforwardly be employed as a stand-alone tool; moreover, it
allows for easy incorporation into other MCMC samplers. The main focus of this
paper is to show the functionality of stochvol. In addition, it provides a
brief mathematical description of the model, an overview of the sampling
schemes used, and several illustrative examples using exchange rate data.

arXiv link: http://arxiv.org/abs/1906.12134v1

Econometrics arXiv cross-link from stat.CO (stat.CO), submitted: 2019-06-28

Modeling Univariate and Multivariate Stochastic Volatility in R with stochvol and factorstochvol

Authors: Darjus Hosszejni, Gregor Kastner

Stochastic volatility (SV) models are nonlinear state-space models that enjoy
increasing popularity for fitting and predicting heteroskedastic time series.
However, due to the large number of latent quantities, their efficient
estimation is non-trivial and software that allows to easily fit SV models to
data is rare. We aim to alleviate this issue by presenting novel
implementations of four SV models delivered in two R packages. Several unique
features are included and documented. As opposed to previous versions, stochvol
is now capable of handling linear mean models, heavy-tailed SV, and SV with
leverage. Moreover, we newly introduce factorstochvol which caters for
multivariate SV. Both packages offer a user-friendly interface through the
conventional R generics and a range of tailor-made methods. Computational
efficiency is achieved via interfacing R to C++ and doing the heavy work in the
latter. In the paper at hand, we provide a detailed discussion on Bayesian SV
estimation and showcase the use of the new software through various examples.

arXiv link: http://arxiv.org/abs/1906.12123v3

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2019-06-26

Estimation of the size of informal employment based on administrative records with non-ignorable selection mechanism

Authors: Maciej Beręsewicz, Dagmara Nikulin

In this study we used company level administrative data from the National
Labour Inspectorate and The Polish Social Insurance Institution in order to
estimate the prevalence of informal employment in Poland. Since the selection
mechanism is non-ignorable we employed a generalization of Heckman's sample
selection model assuming non-Gaussian correlation of errors and clustering by
incorporation of random effects. We found that 5.7% (4.6%, 7.1%; 95% CI) of
registered enterprises in Poland, to some extent, take advantage of the
informal labour force. Our study exemplifies a new approach to measuring
informal employment, which can be implemented in other countries. It also
contributes to the existing literature by providing, to the best of our
knowledge, the first estimates of informal employment at the level of companies
based solely on administrative data.

arXiv link: http://arxiv.org/abs/1906.10957v1

Econometrics arXiv updated paper (originally submitted: 2019-06-25)

Understanding the explosive trend in EU ETS prices -- fundamentals or speculation?

Authors: Marina Friedrich, Sébastien Fries, Michael Pahle, Ottmar Edenhofer

In 2018, allowance prices in the EU Emission Trading Scheme (EU ETS)
experienced a run-up from persistently low levels in previous years. Regulators
attribute this to a comprehensive reform in the same year, and are confident
the new price level reflects an anticipated tighter supply of allowances. We
ask if this is indeed the case, or if it is an overreaction of the market
driven by speculation. We combine several econometric methods - time-varying
coefficient regression, formal bubble detection as well as time stamping and
crash odds prediction - to juxtapose the regulators' claim versus the
concurrent explanation. We find evidence of a long period of explosive
behaviour in allowance prices, starting in March 2018 when the reform was
adopted. Our results suggest that the reform triggered market participants into
speculation, and question regulators' confidence in its long-term outcome. This
has implications for both the further development of the EU ETS, and the long
lasting debate about taxes versus emission trading schemes.
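
Formal bubble detection of the kind referred to above is often based on
recursive right-tailed unit-root statistics. The sketch below computes a
sup-ADF (SADF-style) statistic on a simulated series as a rough illustration;
it is not the authors' procedure, and critical values for the sup statistic
must be simulated rather than read from standard ADF tables.

import numpy as np
from statsmodels.tsa.stattools import adfuller

def sadf_statistic(series, min_window=40):
    """Sup of forward-recursive ADF t-statistics (right-tailed explosiveness check)."""
    stats = []
    for end in range(min_window, len(series) + 1):
        stats.append(adfuller(series[:end], regression="c", autolag="AIC")[0])
    return max(stats), int(np.argmax(stats)) + min_window   # statistic and window end

# toy example: a random walk that turns mildly explosive halfway through
rng = np.random.default_rng(0)
e = rng.normal(size=200)
x = np.zeros(200)
for t in range(1, 200):
    rho = 1.0 if t < 100 else 1.03
    x[t] = rho * x[t - 1] + e[t]
print(sadf_statistic(x))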

arXiv link: http://arxiv.org/abs/1906.10572v5

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2019-06-25

Forecasting the Remittances of the Overseas Filipino Workers in the Philippines

Authors: Merry Christ E. Manayaga, Roel F. Ceballos

This study aims to find a Box-Jenkins time series model for monthly OFW
remittances in the Philippines and to generate forecasts for 2018 and 2019
using the selected model. The data were retrieved from the official website of
Bangko Sentral ng Pilipinas. Of the 108 observations, 96 were used in model
building and the remaining 12 in forecast evaluation. The ACF and PACF were
used to examine the stationarity of the series, and the Augmented Dickey-Fuller
test confirmed it. The data exhibit a seasonal component, so seasonality is
incorporated in the final model, SARIMA(2,1,0)x(0,0,2)_12. There are no
significant spikes in the ACF and PACF of the residuals of the final model, and
the Ljung-Box Q* test further confirms that the residuals are uncorrelated.
Based on the Shapiro-Wilk test, the forecast errors can be considered Gaussian
white noise. Given the results of diagnostic checking and forecast evaluation,
SARIMA(2,1,0)x(0,0,2)_12 is an appropriate model for the series. All
computations were done in the R statistical software.
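
A minimal sketch of fitting the SARIMA(2,1,0)x(0,0,2)_12 specification above
with statsmodels, using a hypothetical monthly series in place of the BSP data
and holding out the last 12 observations for forecast evaluation:

import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

# hypothetical monthly remittance series standing in for the BSP data (108 observations)
rng = np.random.default_rng(0)
t = np.arange(108)
y = 2000 + 8 * t + 150 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 60, 108)

train, test = y[:96], y[96:]                       # 96 obs for fitting, 12 for evaluation
fit = SARIMAX(train, order=(2, 1, 0), seasonal_order=(0, 0, 2, 12)).fit(disp=False)

forecast = fit.forecast(steps=12)
mape = np.mean(np.abs((test - forecast) / test)) * 100
print(fit.summary())
print(f"12-step-ahead MAPE: {mape:.2f}%")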

arXiv link: http://arxiv.org/abs/1906.10422v1

Econometrics arXiv updated paper (originally submitted: 2019-06-24)

Policy Targeting under Network Interference

Authors: Davide Viviano

This paper studies the problem of optimally allocating treatments in the
presence of spillover effects, using information from a (quasi-)experiment. I
introduce a method that maximizes the sample analog of average social welfare
when spillovers occur. I construct semi-parametric welfare estimators with
known and unknown propensity scores and cast the optimization problem into a
mixed-integer linear program, which can be solved using off-the-shelf
algorithms. I derive a strong set of guarantees on regret, i.e., the difference
between the maximum attainable welfare and the welfare evaluated at the
estimated policy. The proposed method presents attractive features for
applications: (i) it does not require network information of the target
population; (ii) it exploits heterogeneity in treatment effects for targeting
individuals; (iii) it does not rely on the correct specification of a
particular structural model; and (iv) it accommodates constraints on the policy
function. An application for targeting information on social networks
illustrates the advantages of the method.

arXiv link: http://arxiv.org/abs/1906.10258v14

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2019-06-24

Empirical Process Results for Exchangeable Arrays

Authors: Laurent Davezies, Xavier D'Haultfoeuille, Yannick Guyonvarch

Exchangeable arrays are natural tools to model common forms of dependence
between units of a sample. Jointly exchangeable arrays are well suited to
dyadic data, where observed random variables are indexed by two units from the
same population. Examples include trade flows between countries or
relationships in a network. Separately exchangeable arrays are well suited to
multiway clustering, where units sharing the same cluster (e.g. geographical
areas or sectors of activity when considering individual wages) may be
dependent in an unrestricted way. We prove uniform laws of large numbers and
central limit theorems for such exchangeable arrays. We obtain these results
under the same moment restrictions and conditions on the class of functions as
those typically assumed with i.i.d. data. We also show the convergence of
bootstrap processes adapted to such arrays.

arXiv link: http://arxiv.org/abs/1906.11293v4

Econometrics arXiv cross-link from q-fin.RM (q-fin.RM), submitted: 2019-06-21

Semi-parametric Realized Nonlinear Conditional Autoregressive Expectile and Expected Shortfall

Authors: Chao Wang, Richard Gerlach

A joint conditional autoregressive expectile and Expected Shortfall framework
is proposed. The framework is extended through incorporating a measurement
equation which models the contemporaneous dependence between the realized
measures and the latent conditional expectile. Nonlinear threshold
specification is further incorporated into the proposed framework. A Bayesian
Markov Chain Monte Carlo method is adapted for estimation, whose properties are
assessed and compared with maximum likelihood via a simulation study.
One-day-ahead VaR and ES forecasting studies, with seven market indices,
provide empirical support to the proposed models.

arXiv link: http://arxiv.org/abs/1906.09961v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2019-06-20

On the probability of a causal inference is robust for internal validity

Authors: Tenglong Li, Kenneth A. Frank

The internal validity of an observational study is often subject to debate. In
this study, we define the counterfactuals as the unobserved sample and quantify
their relationship with null hypothesis statistical testing (NHST). We propose
the probability that a causal inference is robust for internal validity (the
PIV) as a robustness index of causal inference. Formally, the PIV is the
probability of rejecting the null hypothesis again based on both the observed
sample and the counterfactuals, given that the same null hypothesis has already
been rejected based on the observed sample alone. Under either a frequentist or
a Bayesian framework, one can bound the PIV of an inference using bounded
beliefs about the counterfactuals, which is often needed when the
unconfoundedness assumption is dubious. The PIV is equivalent to statistical
power when the NHST is taken to be based on both the observed sample and the
counterfactuals. We summarize the process of evaluating internal validity with
the PIV as an eight-step procedure and illustrate it with an empirical example
(Hong and Raudenbush, 2005).

arXiv link: http://arxiv.org/abs/1906.08726v1

Econometrics arXiv paper, submitted: 2019-06-19

From Local to Global: External Validity in a Fertility Natural Experiment

Authors: Rajeev Dehejia, Cristian Pop-Eleches, Cyrus Samii

We study issues related to external validity for treatment effects using over
100 replications of the Angrist and Evans (1998) natural experiment on the
effects of sibling sex composition on fertility and labor supply. The
replications are based on census data from around the world going back to 1960.
We decompose sources of error in predicting treatment effects in external
contexts in terms of macro and micro sources of variation. In our empirical
setting, we find that macro covariates dominate micro covariates for
reducing errors in predicting treatment effects, an issue that past studies of
external validity have been unable to evaluate. We develop methods for two
applications to evidence-based decision-making, including determining where to
locate an experiment and whether policy-makers should commission new
experiments or rely on an existing evidence base for making a policy decision.

arXiv link: http://arxiv.org/abs/1906.08096v1

Econometrics arXiv updated paper (originally submitted: 2019-06-19)

Sparse structures with LASSO through Principal Components: forecasting GDP components in the short-run

Authors: Saulius Jokubaitis, Dmitrij Celov, Remigijus Leipus

This paper examines the use of sparse methods to forecast the real
(chain-linked volume) expenditure components of US and EU GDP in the short run,
sooner than the national statistical institutes officially release the data. We
estimate current-quarter nowcasts along with 1- and 2-quarter forecasts by
bridging quarterly data with monthly information announced with a much smaller
delay. We address the high dimensionality of the monthly dataset by assuming
sparse structures of leading indicators capable of adequately explaining the
dynamics of the analyzed data. For variable selection and estimation of the
forecasts, we use sparse methods, namely LASSO and its recent modifications. We
propose an adjustment that combines LASSO with principal component analysis,
which is intended to improve forecasting performance. We evaluate forecasting
performance in pseudo-real-time experiments for gross fixed capital formation,
private consumption, imports and exports over the 2005-2019 sample, against
benchmark ARMA and factor models. The main results suggest that sparse methods
can outperform the benchmarks and identify reasonable subsets of explanatory
variables. The proposed LASSO-PC modification shows further improvement in
forecast accuracy.
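
A minimal sketch of one way to combine LASSO with principal components in this
spirit (not the paper's exact LASSO-PC adjustment): extract components from a
standardized indicator panel, append them to the original predictors, and let
cross-validated LASSO select among both. The data and dimensions are
illustrative.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
T, K = 60, 80                                    # quarters and bridged monthly indicators
X = rng.normal(size=(T, K))
y = 0.6 * X[:, 0] - 0.4 * X[:, 1] + 0.3 * X[:, :10].mean(axis=1) + 0.2 * rng.normal(size=T)

Xs = StandardScaler().fit_transform(X)
pcs = PCA(n_components=5).fit_transform(Xs)      # common factors from the indicator panel
Z = np.hstack([Xs, pcs])                         # LASSO selects among indicators and factors

lasso = LassoCV(cv=5).fit(Z, y)
selected = np.flatnonzero(lasso.coef_)
print("selected columns (last 5 indices are the PCs):", selected)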

arXiv link: http://arxiv.org/abs/1906.07992v2

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2019-06-18

Signatures of crypto-currency market decoupling from the Forex

Authors: Stanisław Drożdż, Ludovico Minati, Paweł Oświęcimka, Marek Stanuszek, Marcin Wątorek

Based on the high-frequency recordings from Kraken, a cryptocurrency exchange
and professional trading platform that aims to bring Bitcoin and other
cryptocurrencies into the mainstream, the multiscale cross-correlations
involving the Bitcoin (BTC), Ethereum (ETH), Euro (EUR) and US dollar (USD) are
studied over the period between July 1, 2016 and December 31, 2018. It is shown
that the multiscaling characteristics of the exchange rate fluctuations related
to the cryptocurrency market approach those of the Forex. This, in particular,
applies to the BTC/ETH exchange rate, whose Hurst exponent by the end of 2018
started approaching the value of 0.5, which is characteristic of the mature
world markets. Furthermore, the BTC/ETH direct exchange rate has already
developed multifractality, which manifests itself via broad singularity
spectra. A particularly significant result is that the measures applied for
detecting cross-correlations between the dynamics of the BTC/ETH and EUR/USD
exchange rates do not show any noticeable relationships. This may be taken as
an indication that the cryptocurrency market has begun decoupling itself from
the Forex.

arXiv link: http://arxiv.org/abs/1906.07834v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2019-06-18

Nonparametric estimation in a regression model with additive and multiplicative noise

Authors: Christophe Chesneau, Salima El Kolei, Junke Kou, Fabien Navarro

In this paper, we consider an unknown functional estimation problem in a
general nonparametric regression model with the feature of having both
multiplicative and additive noise. We propose two new wavelet estimators in this
general context. We prove that they achieve fast convergence rates under the
mean integrated square error over Besov spaces. The obtained rates have the
particularity of being established under weak conditions on the model. A
numerical study in a context comparable to stochastic frontier estimation (with
the difference that the boundary is not necessarily a production function)
supports the theory.

arXiv link: http://arxiv.org/abs/1906.07695v2

Econometrics arXiv paper, submitted: 2019-06-16

Shape Matters: Evidence from Machine Learning on Body Shape-Income Relationship

Authors: Suyong Song, Stephen S. Baek

We study the association between physical appearance and family income using
a novel dataset of 3-dimensional body scans, which mitigates the reporting and
measurement errors observed in most previous studies. We apply machine learning
to obtain intrinsic features of the human body and take into account the
possible endogeneity of body shapes. The estimation results show a significant
relationship between physical appearance and family income, and the
associations differ by gender. This supports the hypothesis of a physical
attractiveness premium and its heterogeneity across genders.

arXiv link: http://arxiv.org/abs/1906.06747v1

Econometrics arXiv updated paper (originally submitted: 2019-06-16)

Detecting p-hacking

Authors: Graham Elliott, Nikolay Kudrin, Kaspar Wuthrich

We theoretically analyze the problem of testing for $p$-hacking based on
distributions of $p$-values across multiple studies. We provide general results
for when such distributions have testable restrictions (are non-increasing)
under the null of no $p$-hacking. We find novel additional testable
restrictions for $p$-values based on $t$-tests. Specifically, the shape of the
power functions results in both complete monotonicity as well as bounds on the
distribution of $p$-values. These testable restrictions result in more powerful
tests for the null hypothesis of no $p$-hacking. When there is also publication
bias, our tests are joint tests for $p$-hacking and publication bias. A
reanalysis of two prominent datasets shows the usefulness of our new tests.
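
As a crude illustration of the basic testable restriction, that the p-curve is
non-increasing under the null of no p-hacking, the sketch below bins p-values
over a sub-interval and compares adjacent equal-width bins with one-sided
binomial tests. This is not one of the authors' tests, which exploit sharper
restrictions such as complete monotonicity for t-tests.

import numpy as np
from scipy.stats import binomtest

def adjacent_bin_check(pvals, edges=(0.00, 0.01, 0.02, 0.03, 0.04, 0.05)):
    """Under a non-increasing p-curve, the lower of two adjacent equal-width bins
    holds at least half of their combined mass; excess mass in the upper bin is
    evidence against the null."""
    counts, _ = np.histogram(pvals, bins=edges)
    results = []
    for lo, hi in zip(counts[:-1], counts[1:]):
        if lo + hi > 0:
            results.append(binomtest(hi, lo + hi, 0.5, alternative="greater").pvalue)
    return results

# toy example: p-values bunched just below 0.05, as p-hacking would produce
rng = np.random.default_rng(0)
p = np.concatenate([rng.uniform(0, 0.05, 300), rng.uniform(0.04, 0.05, 80)])
print(np.round(adjacent_bin_check(p), 4))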

arXiv link: http://arxiv.org/abs/1906.06711v5

Econometrics arXiv updated paper (originally submitted: 2019-06-16)

On the Properties of the Synthetic Control Estimator with Many Periods and Many Controls

Authors: Bruno Ferman

We consider the asymptotic properties of the Synthetic Control (SC) estimator
when both the number of pre-treatment periods and control units are large. If
potential outcomes follow a linear factor model, we provide conditions under
which the factor loadings of the SC unit converge in probability to the factor
loadings of the treated unit. This happens when there are weights diluted among
an increasing number of control units such that a weighted average of the
factor loadings of the control units asymptotically reconstructs the factor
loadings of the treated unit. In this case, the SC estimator is asymptotically
unbiased even when treatment assignment is correlated with time-varying
unobservables. This result can be valid even when the number of control units
is larger than the number of pre-treatment periods.
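
For reference, the estimator whose large-sample behaviour is studied above is
the standard synthetic control weighting problem: choose nonnegative weights
summing to one that best match the treated unit's pre-treatment outcomes. A
minimal sketch with simulated factor-model data follows; it illustrates the
estimator, not the paper's asymptotic results.

import numpy as np
from scipy.optimize import minimize

def sc_weights(y1_pre, Y0_pre):
    """Standard synthetic control weights: minimize the pre-treatment fit
    || y1_pre - Y0_pre @ w || subject to w >= 0 and sum(w) = 1."""
    J = Y0_pre.shape[1]
    obj = lambda w: np.sum((y1_pre - Y0_pre @ w) ** 2)
    res = minimize(obj, x0=np.full(J, 1.0 / J),
                   bounds=[(0.0, 1.0)] * J,
                   constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
                   method="SLSQP")
    return res.x

# toy example: one treated unit, 40 controls, 100 pre-treatment periods
rng = np.random.default_rng(0)
T0, J = 100, 40
factors = rng.normal(size=(T0, 2))
loadings = rng.normal(size=(2, J))
Y0 = factors @ loadings + 0.1 * rng.normal(size=(T0, J))
y1 = factors @ np.array([0.5, -0.3]) + 0.1 * rng.normal(size=T0)
w = sc_weights(y1, Y0)
print("number of nonzero weights:", np.sum(w > 1e-4))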

arXiv link: http://arxiv.org/abs/1906.06665v5

Econometrics arXiv cross-link from stat.CO (stat.CO), submitted: 2019-06-15

lpdensity: Local Polynomial Density Estimation and Inference

Authors: Matias D. Cattaneo, Michael Jansson, Xinwei Ma

Density estimation and inference methods are widely used in empirical work.
When the underlying distribution has compact support, conventional kernel-based
density estimators are no longer consistent near or at the boundary because of
their well-known boundary bias. Alternative smoothing methods are available to
handle boundary points in density estimation, but they all require additional
tuning parameter choices or other typically ad hoc modifications depending on
the evaluation point and/or approach considered. This article discusses the R
and Stata package lpdensity implementing a novel local polynomial density
estimator proposed and studied in Cattaneo, Jansson, and Ma (2020, 2021), which
is boundary adaptive and involves only one tuning parameter. The methods
implemented also cover local polynomial estimation of the cumulative
distribution function and density derivatives. In addition to point estimation
and graphical procedures, the package offers consistent variance estimators,
mean squared error optimal bandwidth selection, robust bias-corrected
inference, and confidence bands construction, among other features. A
comparison with other density estimation packages available in R using a Monte
Carlo experiment is provided.

arXiv link: http://arxiv.org/abs/1906.06529v3

Econometrics arXiv paper, submitted: 2019-06-15

Proxy expenditure weights for Consumer Price Index: Audit sampling inference for big data statistics

Authors: Li-Chun Zhang

Purchase data from retail chains provide proxy measures of private household
expenditure on items that are the most troublesome to collect in the
traditional expenditure survey. Due to the sheer amount of proxy data, the bias
due to coverage and selection errors completely dominates the variance. We
develop tests for bias based on audit sampling, which makes use of available
survey data that cannot be linked to the proxy data source at the individual
level. However, audit sampling fails to yield a meaningful mean squared error
estimate, because the sampling variance is too large compared to the bias of
the big data estimate. We propose a novel accuracy measure that is applicable
in such situations. This can provide a necessary part of the statistical
argument for the uptake of big data source, in replacement of traditional
survey sampling. An application to disaggregated food price index is used to
demonstrate the proposed approach.

arXiv link: http://arxiv.org/abs/1906.11208v1

Econometrics arXiv updated paper (originally submitted: 2019-06-14)

Posterior Average Effects

Authors: Stéphane Bonhomme, Martin Weidner

Economists are often interested in estimating averages with respect to
distributions of unobservables, such as moments of individual fixed-effects, or
average partial effects in discrete choice models. For such quantities, we
propose and study posterior average effects (PAE), where the average is
computed conditional on the sample, in the spirit of empirical Bayes and
shrinkage methods. While the usefulness of shrinkage for prediction is
well-understood, a justification of posterior conditioning to estimate
population averages is currently lacking. We show that PAE have minimum
worst-case specification error under various forms of misspecification of the
parametric distribution of unobservables. In addition, we introduce a measure
of informativeness of the posterior conditioning, which quantifies the
worst-case specification error of PAE relative to parametric model-based
estimators. As illustrations, we report PAE estimates of distributions of
neighborhood effects in the US, and of permanent and transitory components in a
model of income dynamics.

arXiv link: http://arxiv.org/abs/1906.06360v6

Econometrics arXiv paper, submitted: 2019-06-13

Sparse Approximate Factor Estimation for High-Dimensional Covariance Matrices

Authors: Maurizio Daniele, Winfried Pohlmeier, Aygul Zagidullina

We propose a novel estimation approach for the covariance matrix based on the
$l_1$-regularized approximate factor model. Our sparse approximate factor (SAF)
covariance estimator allows for the existence of weak factors and hence relaxes
the pervasiveness assumption generally adopted for the standard approximate
factor model. We prove consistency of the covariance matrix estimator under the
Frobenius norm as well as the consistency of the factor loadings and the
factors.
Our Monte Carlo simulations reveal that the SAF covariance estimator has
superior properties in finite samples for low and high dimensions and different
designs of the covariance matrix. Moreover, in an out-of-sample portfolio
forecasting application the estimator uniformly outperforms portfolio
strategies based on alternative covariance estimation approaches and modeling
strategies, including the $1/N$-strategy.

arXiv link: http://arxiv.org/abs/1906.05545v1

Econometrics arXiv paper, submitted: 2019-06-12

Nonparametric Identification and Estimation with Independent, Discrete Instruments

Authors: Isaac Loh

In a nonparametric instrumental regression model, we strengthen the
conventional moment independence assumption towards full statistical
independence between instrument and error term. This allows us to prove
identification results and develop estimators for a structural function of
interest when the instrument is discrete, and in particular binary. When the
regressor of interest is also discrete with more mass points than the
instrument, we state straightforward conditions under which the structural
function is partially identified, and give modified assumptions which imply
point identification. These stronger assumptions are shown to hold outside of a
small set of conditional moments of the error term. Estimators for the
identified set are given when the structural function is either partially or
point identified. When the regressor is continuously distributed, we prove that
if the instrument induces a sufficiently rich variation in the joint
distribution of the regressor and error term then point identification of the
structural function is still possible. This approach is relatively tractable,
and under some standard conditions we demonstrate that our point identifying
assumption holds on a topologically generic set of density functions for the
joint distribution of regressor, error, and instrument. Our method also applies
to a well-known nonparametric quantile regression framework, and we are able to
state analogous point identification results in that context.

arXiv link: http://arxiv.org/abs/1906.05231v1

Econometrics arXiv paper, submitted: 2019-06-11

Generalized Beta Prime Distribution: Stochastic Model of Economic Exchange and Properties of Inequality Indices

Authors: M. Dashti Moghaddam, Jeffrey Mills, R. A. Serota

We argue that a stochastic model of economic exchange, whose steady-state
distribution is a Generalized Beta Prime (also known as GB2), and some unique
properties of the latter, are the reason for GB2's success in describing
wealth/income distributions. We use housing sale prices as a proxy to
wealth/income distribution to numerically illustrate this point. We also
explore parametric limits of the distribution to do so analytically. We discuss
parametric properties of the inequality indices -- Gini, Hoover, Theil T and
Theil L -- vis-a-vis those of GB2 and introduce a new inequality index, which
serves a similar purpose. We argue that Hoover and Theil L are more appropriate
measures for distributions with power-law dependencies, especially fat tails,
such as GB2.

arXiv link: http://arxiv.org/abs/1906.04822v1

Econometrics arXiv updated paper (originally submitted: 2019-06-11)

Bias-Aware Inference in Fuzzy Regression Discontinuity Designs

Authors: Claudia Noack, Christoph Rothe

We propose new confidence sets (CSs) for the regression discontinuity
parameter in fuzzy designs. Our CSs are based on local linear regression, and
are bias-aware, in the sense that they take possible bias explicitly into
account. Their construction shares similarities with that of Anderson-Rubin CSs
in exactly identified instrumental variable models, and thereby avoids issues
with "delta method" approximations that underlie most commonly used existing
inference methods for fuzzy regression discontinuity analysis. Our CSs are
asymptotically equivalent to existing procedures in canonical settings with
strong identification and a continuous running variable. However, due to their
particular construction they are also valid under a wide range of empirically
relevant conditions in which existing methods can fail, such as setups with
discrete running variables, donut designs, and weak identification.

arXiv link: http://arxiv.org/abs/1906.04631v4

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2019-06-11

Regional economic convergence and spatial quantile regression

Authors: Alfredo Cartone, Geoffrey JD Hewings, Paolo Postiglione

The presence of $\beta$-convergence in European regions is an important issue
to be analyzed. In this paper, we adopt a quantile regression approach in
analyzing economic convergence. While previous work has performed quantile
regression at the national level, we focus on 187 European NUTS2 regions for
the period 1981-2009 and use spatial quantile regression to account for spatial
dependence.

arXiv link: http://arxiv.org/abs/1906.04613v1

Econometrics arXiv updated paper (originally submitted: 2019-06-10)

The Regression Discontinuity Design

Authors: Matias D. Cattaneo, Rocio Titiunik, Gonzalo Vazquez-Bare

This handbook chapter gives an introduction to the sharp regression
discontinuity design, covering identification, estimation, inference, and
falsification methods.

arXiv link: http://arxiv.org/abs/1906.04242v2

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2019-06-10

Efficient Bayesian estimation for GARCH-type models via Sequential Monte Carlo

Authors: Dan Li, Adam Clements, Christopher Drovandi

The advantages of sequential Monte Carlo (SMC) are exploited to develop
parameter estimation and model selection methods for GARCH (Generalized
AutoRegressive Conditional Heteroskedasticity) style models. It provides an
alternative method for quantifying estimation uncertainty relative to classical
inference. Even with long time series, it is demonstrated that the posterior
distributions of the model parameters are non-normal, highlighting the need for a
Bayesian approach and an efficient posterior sampling method. Efficient
approaches for both constructing the sequence of distributions in SMC, and
leave-one-out cross-validation, for long time series data are also proposed.
Finally, an unbiased estimator of the likelihood is developed for the Bad
Environment-Good Environment model, a complex GARCH-type model, which permits
exact Bayesian inference not previously available in the literature.

arXiv link: http://arxiv.org/abs/1906.03828v2

Econometrics arXiv updated paper (originally submitted: 2019-06-07)

A Statistical Recurrent Stochastic Volatility Model for Stock Markets

Authors: Trong-Nghia Nguyen, Minh-Ngoc Tran, David Gunawan, R. Kohn

The Stochastic Volatility (SV) model and its variants are widely used in the
financial sector while recurrent neural network (RNN) models are successfully
used in many large-scale industrial applications of Deep Learning. Our article
combines these two methods in a non-trivial way and proposes a model, which we
call the Statistical Recurrent Stochastic Volatility (SR-SV) model, to capture
the dynamics of stochastic volatility. The proposed model is able to capture
complex volatility effects (e.g., non-linearity and long-memory
auto-dependence) overlooked by the conventional SV models, is statistically
interpretable and has an impressive out-of-sample forecast performance. These
properties are carefully discussed and illustrated through extensive simulation
studies and applications to five international stock index datasets: The German
stock index DAX30, the Hong Kong stock index HSI50, the France market index
CAC40, the US stock market index SP500 and the Canada market index TSX250. A
user-friendly software package, together with the examples reported in the
paper, is available at https://github.com/vbayeslab.

arXiv link: http://arxiv.org/abs/1906.02884v3

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2019-06-06

Counterfactual Inference for Consumer Choice Across Many Product Categories

Authors: Rob Donnelly, Francisco R. Ruiz, David Blei, Susan Athey

This paper proposes a method for estimating consumer preferences among
discrete choices, where the consumer chooses at most one product in a category,
but selects from multiple categories in parallel. The consumer's utility is
additive in the different categories. Her preferences about product attributes
as well as her price sensitivity vary across products and are in general
correlated across products. We build on techniques from the machine learning
literature on probabilistic models of matrix factorization, extending the
methods to account for time-varying product attributes and products going out
of stock. We evaluate the performance of the model using held-out data from
weeks with price changes or out of stock products. We show that our model
improves over traditional modeling approaches that consider each category in
isolation. One source of the improvement is the ability of the model to
accurately estimate heterogeneity in preferences (by pooling information across
categories); another source of improvement is its ability to estimate the
preferences of consumers who have rarely or never made a purchase in a given
category in the training data. Using held-out data, we show that our model can
accurately distinguish which consumers are most price sensitive to a given
product. We consider counterfactuals such as personally targeted price
discounts, showing that using a richer model such as the one we propose
substantially increases the benefits of personalization in discounts.

arXiv link: http://arxiv.org/abs/1906.02635v2

Econometrics arXiv updated paper (originally submitted: 2019-06-05)

Indirect Inference for Locally Stationary Models

Authors: David Frazier, Bonsoo Koo

We propose the use of indirect inference estimation to conduct inference in
complex locally stationary models. We develop a local indirect inference
algorithm and establish the asymptotic properties of the proposed estimator.
Due to the nonparametric nature of locally stationary models, the resulting
indirect inference estimator exhibits nonparametric rates of convergence. We
validate our methodology with simulation studies in the confines of a locally
stationary moving average model and a new locally stationary multiplicative
stochastic volatility model. Using this indirect inference methodology and the
new locally stationary volatility model, we obtain evidence of non-linear,
time-varying volatility trends for monthly returns on several Fama-French
portfolios.

arXiv link: http://arxiv.org/abs/1906.01768v2

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2019-06-04

Assessing Disparate Impacts of Personalized Interventions: Identifiability and Bounds

Authors: Nathan Kallus, Angela Zhou

Personalized interventions in social services, education, and healthcare
leverage individual-level causal effect predictions in order to give the best
treatment to each individual or to prioritize program interventions for the
individuals most likely to benefit. While the sensitivity of these domains
compels us to evaluate the fairness of such policies, we show that actually
auditing their disparate impacts per standard observational metrics, such as
true positive rates, is impossible since ground truths are unknown. Whether our
data is experimental or observational, an individual's actual outcome under an
intervention different from the one received can never be known, only predicted
based on features. We prove how we can nonetheless point-identify these
quantities under the additional assumption of monotone treatment response,
which may be reasonable in many applications. We further provide a sensitivity
analysis for this assumption by means of sharp partial-identification bounds
under violations of monotonicity of varying strengths. We show how to use our
results to audit personalized interventions using partially-identified ROC and
xROC curves and demonstrate this in a case study of a French job training
dataset.

arXiv link: http://arxiv.org/abs/1906.01552v1

Econometrics arXiv updated paper (originally submitted: 2019-06-03)

The Laws of Motion of the Broker Call Rate in the United States

Authors: Alex Garivaltis

In this paper, which is the third installment of the author's trilogy on
margin loan pricing, we analyze $1,367$ monthly observations of the U.S. broker
call money rate, which is the interest rate at which stock brokers can borrow
to fund their margin loans to retail clients. We describe the basic features
and mean-reverting behavior of this series and juxtapose the
empirically-derived laws of motion with the author's prior theories of margin
loan pricing (Garivaltis 2019a-b). This allows us to derive stochastic
differential equations that govern the evolution of the margin loan interest
rate and the leverage ratios of sophisticated brokerage clients (namely,
continuous time Kelly gamblers). Finally, we apply Merton's (1974) arbitrage
theory of corporate liability pricing to study theoretical constraints on the
risk premia that could be generated in the market for call money. Apparently,
if there is no arbitrage in the U.S. financial markets, the implication is that
the total volume of call loans must constitute north of 70% of the value of
all leveraged portfolios.

arXiv link: http://arxiv.org/abs/1906.00946v2

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2019-06-03

Stress Testing Network Reconstruction via Graphical Causal Model

Authors: Helder Rojas, David Dias

An optimal evaluation of the resilience of financial portfolios requires
plausible hypotheses about the multiple interconnections between macroeconomic
variables and risk parameters. In this paper, we propose a graphical model for
reconstructing the causal structure that links the macroeconomic variables and
the assessed risk parameters; it is this structure that we call the Stress
Testing Network (STN). In this model, the relationships between the
macroeconomic variables and the risk parameters define a "relational graph"
among their time series, where related time series are connected by an edge.
Our proposal builds on temporal causal models but, unlike them, incorporates
specific conditions in the structure that correspond to intrinsic
characteristics of this type of network. Given the high-dimensional nature of
the problem, we use regularization methods to efficiently detect causality in
the time series and reconstruct the underlying causal structure. We then
illustrate the use of the model on credit risk data from a portfolio and
discuss its uses and practical benefits in stress testing.
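
As an illustration of the regularization idea, the sketch below detects directed edges among a handful of simulated time series with lasso-penalized lagged regressions. This is a generic lasso-Granger device under made-up data, lag order, and thresholds; it is not the paper's STN procedure.

```python
# Sketch: edge detection in a causal network via lasso-penalized lagged
# regressions (generic lasso-Granger device; all data are simulated).
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
T, k, lag = 300, 6, 2
series = rng.normal(size=(T, k))
for t in range(1, T):                              # series 0 drives series 1 with a lag
    series[t, 1] += 0.6 * series[t - 1, 0]

# lagged design: columns ordered as (lag 1 of all series, lag 2 of all series, ...)
Xlag = np.column_stack([series[lag - l:T - l] for l in range(1, lag + 1)])
edges = np.zeros((k, k), dtype=bool)
for j in range(k):
    coef = LassoCV(cv=5).fit(Xlag, series[lag:, j]).coef_
    edges[:, j] = np.abs(coef.reshape(lag, k)).sum(axis=0) > 1e-6   # any lag of i -> j
print(edges.astype(int))     # expect a 1 in row 0, column 1 (series 0 causes series 1)
```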

arXiv link: http://arxiv.org/abs/1906.01468v3

Econometrics arXiv paper, submitted: 2019-06-03

Bayesian nonparametric graphical models for time-varying parameters VAR

Authors: Matteo Iacopini, Luca Rossini

Over the last decade, big data have poured into econometrics, demanding new
statistical methods for analysing high-dimensional data and complex non-linear
relationships. A common approach for addressing dimensionality issues relies on
the use of static graphical structures for extracting the most significant
dependence interrelationships between the variables of interest. Recently,
Bayesian nonparametric techniques have become popular for modelling complex
phenomena in a flexible and efficient manner, but only a few attempts have been
made in econometrics. In this paper, we provide an innovative Bayesian
nonparametric (BNP) time-varying graphical framework for making inference in
high-dimensional time series. We include a Bayesian nonparametric dependent
prior specification on the matrix of coefficients and the covariance matrix by
means of a Time-Series DPP as in Nieto-Barajas et al. (2012). Following Billio
et al. (2019), our hierarchical prior overcomes over-parametrization and
over-fitting issues by clustering the vector autoregressive (VAR) coefficients
into groups and by shrinking the coefficients of each group toward a common
location. Our BNP time-varying VAR model is based on a spike-and-slab
construction coupled with a dependent Dirichlet process prior (DPP) and allows
us to: (i) infer time-varying Granger causality networks from time series; (ii)
flexibly model and cluster non-zero time-varying coefficients; and (iii)
accommodate potential non-linearities. In order to assess the performance
of the model, we study the merits of our approach by considering a well-known
macroeconomic dataset. Moreover, we check the robustness of the method by
comparing two alternative specifications, with Dirac and diffuse spike prior
distributions.

arXiv link: http://arxiv.org/abs/1906.02140v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2019-06-03

The Age-Period-Cohort-Interaction Model for Describing and Investigating Inter-Cohort Deviations and Intra-Cohort Life-Course Dynamics

Authors: Liying Luo, James Hodges

Social scientists have frequently sought to understand the distinct effects
of age, period, and cohort, but disaggregation of the three dimensions is
difficult because cohort = period - age. We argue that this technical
difficulty reflects a disconnection between how cohort effect is conceptualized
and how it is modeled in the traditional age-period-cohort framework. We
propose a new method, called the age-period-cohort-interaction (APC-I) model,
that is qualitatively different from previous methods in that it represents
Ryder's (1965) theoretical account about the conditions under which cohort
differentiation may arise. This APC-I model does not require problematic
statistical assumptions and the interpretation is straightforward. It
quantifies inter-cohort deviations from the age and period main effects and
also permits hypothesis testing about intra-cohort life-course dynamics. We
demonstrate how this new model can be used to examine age, period, and cohort
patterns in women's labor force participation.

arXiv link: http://arxiv.org/abs/1906.08357v1

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2019-06-01

The Theory of Weak Revealed Preference

Authors: Victor H. Aguiar, Per Hjertstrand, Roberto Serrano

We offer a rationalization of the weak generalized axiom of revealed
preference (WGARP) for both finite and infinite data sets of consumer choice.
We call it maximin rationalization, in which each pairwise choice is associated
with a "local" utility function. We develop its associated weak
revealed-preference theory. We show that preference recoverability and welfare
analysis \`a la Varian (1982) may not be informative enough when the weak
axiom holds but consumers are not utility maximizers. We clarify the
reasons for this failure and provide new informative bounds for the consumer's
true preferences.

arXiv link: http://arxiv.org/abs/1906.00296v1

Econometrics arXiv updated paper (originally submitted: 2019-06-01)

At What Level Should One Cluster Standard Errors in Paired and Small-Strata Experiments?

Authors: Clément de Chaisemartin, Jaime Ramirez-Cuellar

In matched-pairs experiments in which one cluster per pair of clusters is
assigned to treatment, to estimate treatment effects, researchers often regress
their outcome on a treatment indicator and pair fixed effects, clustering
standard errors at the unit-of-randomization level. We show that even if the
treatment has no effect, a 5%-level t-test based on this regression will
wrongly conclude that the treatment has an effect up to 16.5% of the time. To
fix this problem, researchers should instead cluster standard errors at the
pair level. Using simulations, we show that similar results apply to clustered
experiments with small strata.
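
A minimal simulation sketch of the size issue, assuming a simple random-effects data-generating process with no treatment effect and using statsmodels' cluster-robust covariance; the pair count, cluster size, and number of replications are arbitrary choices, not the paper's design.

```python
# Sketch: over-rejection from clustering at the unit-of-randomization (cluster)
# level versus the pair level in a matched-pairs cluster experiment, under a
# DGP with no true effect (illustration only).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
P, m, reps = 30, 5, 300          # pairs, units per cluster, Monte Carlo draws
rej_cluster, rej_pair = 0, 0

for _ in range(reps):
    rows = []
    for p in range(P):
        treated_side = rng.integers(2)          # which cluster in the pair is treated
        for side in range(2):
            c_effect = rng.normal()             # cluster-level shock
            for _ in range(m):
                y = c_effect + rng.normal()     # no treatment effect
                rows.append((y, int(side == treated_side), p, 2 * p + side))
    y, d, pair, clus = map(np.array, zip(*rows))
    X = np.column_stack([d, np.eye(P)[pair]])   # treatment dummy + pair fixed effects
    fit_c = sm.OLS(y, X).fit(cov_type="cluster", cov_kwds={"groups": clus})
    fit_p = sm.OLS(y, X).fit(cov_type="cluster", cov_kwds={"groups": pair})
    rej_cluster += fit_c.pvalues[0] < 0.05
    rej_pair += fit_p.pvalues[0] < 0.05

print("rejection rate, cluster-level SEs:", rej_cluster / reps)
print("rejection rate, pair-level SEs:   ", rej_pair / reps)
```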

arXiv link: http://arxiv.org/abs/1906.00288v10

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2019-06-01

Kernel Instrumental Variable Regression

Authors: Rahul Singh, Maneesh Sahani, Arthur Gretton

Instrumental variable (IV) regression is a strategy for learning causal
relationships in observational data. If measurements of input X and output Y
are confounded, the causal relationship can nonetheless be identified if an
instrumental variable Z is available that influences X directly, but is
conditionally independent of Y given X and the unmeasured confounder. The
classic two-stage least squares algorithm (2SLS) simplifies the estimation
problem by modeling all relationships as linear functions. We propose kernel
instrumental variable regression (KIV), a nonparametric generalization of 2SLS,
modeling relations among X, Y, and Z as nonlinear functions in reproducing
kernel Hilbert spaces (RKHSs). We prove the consistency of KIV under mild
assumptions, and derive conditions under which convergence occurs at the
minimax optimal rate for unconfounded, single-stage RKHS regression. In doing
so, we obtain an efficient ratio between training sample sizes used in the
algorithm's first and second stages. In experiments, KIV outperforms state of
the art alternatives for nonparametric IV regression.
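
A simplified two-stage kernel ridge sketch in the spirit of KIV, with Gaussian kernels and fixed bandwidths and regularization constants chosen ad hoc rather than by the paper's tuning procedure; the data-generating process is hypothetical.

```python
# Sketch: KIV-style two-stage kernel ridge regression with Gaussian kernels.
# Stage 1 learns how features of X respond to Z; stage 2 regresses Y on the
# predicted features. Bandwidths/regularization are fixed for illustration.
import numpy as np

def gauss_kernel(A, B, sigma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
n = m = 300

def draw(k):
    u = rng.normal(size=k)                          # unobserved confounder
    z = rng.normal(size=k)                          # instrument
    x = z + u + 0.3 * rng.normal(size=k)
    y = np.sin(x) + u + 0.3 * rng.normal(size=k)    # structural function: sin(x)
    return x[:, None], y, z[:, None]

x1, _, z1 = draw(n)                                 # stage-1 sample
_, y2, z2 = draw(m)                                 # stage-2 sample
lam, xi, sx, sz = 1e-3, 1e-3, 1.0, 1.0

Kxx = gauss_kernel(x1, x1, sx)
Kzz = gauss_kernel(z1, z1, sz)
Kz12 = gauss_kernel(z1, z2, sz)
W = Kxx @ np.linalg.solve(Kzz + n * lam * np.eye(n), Kz12)   # stage 1
alpha = np.linalg.solve(W @ W.T + m * xi * Kxx, W @ y2)      # stage 2

x_test = np.linspace(-3, 3, 9)[:, None]
print(gauss_kernel(x_test, x1, sx) @ alpha)   # estimates of the structural function
```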

arXiv link: http://arxiv.org/abs/1906.00232v6

Econometrics arXiv cross-link from stat.CO (stat.CO), submitted: 2019-06-01

lspartition: Partitioning-Based Least Squares Regression

Authors: Matias D. Cattaneo, Max H. Farrell, Yingjie Feng

Nonparametric partitioning-based least squares regression is an important
tool in empirical work. Common examples include regressions based on splines,
wavelets, and piecewise polynomials. This article discusses the main
methodological and numerical features of the R software package lspartition,
which implements modern estimation and inference results for partitioning-based
least squares (series) regression from Cattaneo and Farrell (2013) and
Cattaneo, Farrell, and Feng (2019). These results cover the multivariate
regression function as well as its derivatives. First, the package provides
data-driven methods to choose the number of partition knots optimally,
according to integrated mean squared error, yielding optimal point estimation.
Second, robust bias correction is implemented to combine this point estimator
with valid inference. Third, the package provides estimates and inference for
the unknown function both pointwise and uniformly in the conditioning
variables. In particular, valid confidence bands are provided. Finally, an
extension to two-sample analysis is developed, which can be used in
treatment-control comparisons and related problems.

arXiv link: http://arxiv.org/abs/1906.00202v2

Econometrics arXiv cross-link from stat.CO (stat.CO), submitted: 2019-06-01

nprobust: Nonparametric Kernel-Based Estimation and Robust Bias-Corrected Inference

Authors: Sebastian Calonico, Matias D. Cattaneo, Max H. Farrell

Nonparametric kernel density and local polynomial regression estimators are
very popular in Statistics, Economics, and many other disciplines. They are
routinely employed in applied work, either as part of the main empirical
analysis or as a preliminary ingredient entering some other estimation or
inference procedure. This article describes the main methodological and
numerical features of the software package nprobust, which offers an array of
estimation and inference procedures for nonparametric kernel-based density and
local polynomial regression methods, implemented in both the R and Stata
statistical platforms. The package includes not only classical bandwidth
selection, estimation, and inference methods (Wand and Jones, 1995; Fan and
Gijbels, 1996), but also other recent developments in the statistics and
econometrics literatures such as robust bias-corrected inference and coverage
error optimal bandwidth selection (Calonico, Cattaneo and Farrell, 2018, 2019).
Furthermore, this article also proposes a simple way of estimating optimal
bandwidths in practice that always delivers the optimal mean square error
convergence rate regardless of the specific evaluation point, that is, no
matter whether it is implemented at a boundary or interior point. Numerical
performance is illustrated using an empirical application and simulated data,
where a detailed numerical comparison with other R packages is given.

arXiv link: http://arxiv.org/abs/1906.00198v1

Econometrics arXiv updated paper (originally submitted: 2019-05-31)

Counterfactual Analysis under Partial Identification Using Locally Robust Refinement

Authors: Nathan Canen, Kyungchul Song

Structural models that admit multiple reduced forms, such as game-theoretic
models with multiple equilibria, pose challenges in practice, especially when
parameters are set-identified and the identified set is large. In such cases,
researchers often choose to focus on a particular subset of equilibria for
counterfactual analysis, but this choice can be hard to justify. This paper
shows that some parameter values can be more "desirable" than others for
counterfactual analysis, even if they are empirically equivalent given the
data. In particular, within the identified set, some counterfactual predictions
can exhibit more robustness than others, against local perturbations of the
reduced forms (e.g. the equilibrium selection rule). We provide a
representation of this subset which can be used to simplify the implementation.
We illustrate our message using moment inequality models, and provide an
empirical application based on a model with top-coded data.

arXiv link: http://arxiv.org/abs/1906.00003v3

Econometrics arXiv updated paper (originally submitted: 2019-05-31)

On Policy Evaluation with Aggregate Time-Series Shocks

Authors: Dmitry Arkhangelsky, Vasily Korovkin

We develop an estimator for applications where the variable of interest is
endogenous and researchers have access to aggregate instruments. Our method
addresses the critical identification challenge -- unobserved confounding,
which renders conventional estimators invalid. Our proposal relies on a new
data-driven aggregation scheme that eliminates the unobserved confounders. We
illustrate the advantages of our algorithm using data from Nakamura and
Steinsson (2014) study of local fiscal multipliers. We introduce a finite
population model with aggregate uncertainty to analyze our estimator. We
establish conditions for consistency and asymptotic normality and show how to
use our estimator to conduct valid inference.

arXiv link: http://arxiv.org/abs/1905.13660v8

Econometrics arXiv cross-link from q-fin.GN (q-fin.GN), submitted: 2019-05-31

Learned Sectors: A fundamentals-driven sector reclassification project

Authors: Rukmal Weerawarana, Yiyi Zhu, Yuzhen He

Market sectors play a key role in the efficient flow of capital through the
modern global economy. We analyze existing sectorization heuristics, and
observe that the most popular - the GICS (which informs the S&P 500), and the
NAICS (published by the U.S. Government) - are not entirely quantitatively
driven, but rather appear to be highly subjective and rooted in dogma. Building
on inferences from analysis of the capital structure irrelevance principle and
the Modigliani-Miller theoretic universe conditions, we postulate that
corporation fundamentals - particularly those components specific to the
Modigliani-Miller universe conditions - would be optimal descriptors of the
true economic domain of operation of a company. We generate a set of potential
candidate learned sector universes by varying the linkage method of a
hierarchical clustering algorithm, and the number of resulting sectors derived
from the model (ranging from 5 to 19), resulting in a total of 60 candidate
learned sector universes. We then introduce reIndexer, a backtest-driven sector
universe evaluation research tool, to rank the candidate sector universes
produced by our learned sector classification heuristic. This rank was utilized
to identify the risk-adjusted return optimal learned sector universe as being
the universe generated under CLINK (i.e. complete linkage), with 17 sectors.
The optimal learned sector universe was tested against the benchmark GICS
classification universe with reIndexer, outperforming on both absolute
portfolio value, and risk-adjusted return over the backtest period. We conclude
that our fundamentals-driven Learned Sector classification heuristic provides a
superior risk-diversification profile than the status quo classification
heuristic.
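
A minimal sketch of the clustering step, assuming standardized firm fundamentals as inputs; the features here are simulated placeholders, and only the complete-linkage and 17-sector choices mirror the paper's reported optimum.

```python
# Sketch: deriving "learned sectors" by complete-linkage hierarchical clustering
# of standardized company fundamentals (simulated data; illustration only).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import zscore

rng = np.random.default_rng(0)
n_firms = 200
fundamentals = rng.normal(size=(n_firms, 4))   # e.g. leverage, margins, turnover, size

Z = linkage(zscore(fundamentals, axis=0), method="complete")   # CLINK
sectors = fcluster(Z, t=17, criterion="maxclust")              # 17 learned sectors
print("sector sizes:", np.bincount(sectors)[1:])
```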

arXiv link: http://arxiv.org/abs/1906.03935v1

Econometrics arXiv updated paper (originally submitted: 2019-05-30)

Threshold Regression with Nonparametric Sample Splitting

Authors: Yoonseok Lee, Yulong Wang

This paper develops a threshold regression model where an unknown
relationship between two variables nonparametrically determines the threshold.
We allow the observations to be cross-sectionally dependent so that the model
can be applied to determine an unknown spatial border for sample splitting over
a random field. We derive the uniform rate of convergence and the nonstandard
limiting distribution of the nonparametric threshold estimator. We also obtain
the root-n consistency and the asymptotic normality of the regression
coefficient estimator. Our model has broad empirical relevance as illustrated
by estimating the tipping point in social segregation problems as a function of
demographic characteristics; and determining metropolitan area boundaries using
nighttime light intensity collected from satellite imagery. We find that the
new empirical results are substantially different from those in the existing
studies.
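
For intuition, the sketch below estimates the textbook special case with a scalar threshold variable by least-squares grid search; it is not the paper's nonparametric sample-splitting estimator, but it conveys the splitting idea on simulated data.

```python
# Sketch: grid-search least-squares estimation of a threshold regression with a
# scalar threshold variable (classical special case; simulated data).
import numpy as np

rng = np.random.default_rng(0)
n = 1000
q = rng.uniform(0, 1, n)                     # threshold variable
x = rng.normal(size=n)
y = np.where(q <= 0.6, 1.0 + 0.5 * x, -1.0 + 2.0 * x) + rng.normal(scale=0.5, size=n)

def ssr_at(gamma):
    ssr = 0.0
    for mask in (q <= gamma, q > gamma):     # fit each regime separately
        X = np.column_stack([np.ones(mask.sum()), x[mask]])
        beta, *_ = np.linalg.lstsq(X, y[mask], rcond=None)
        ssr += np.sum((y[mask] - X @ beta) ** 2)
    return ssr

grid = np.quantile(q, np.linspace(0.1, 0.9, 81))   # trimmed candidate thresholds
gamma_hat = grid[np.argmin([ssr_at(g) for g in grid])]
print("estimated threshold:", round(gamma_hat, 3))  # true value 0.6
```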

arXiv link: http://arxiv.org/abs/1905.13140v3

Econometrics arXiv paper, submitted: 2019-05-30

Heterogeneity in demand and optimal price conditioning for local rail transport

Authors: Evgeniy M. Ozhegov, Alina Ozhegova

This paper describes the results of a research project on optimal pricing for
LLC "Perm Local Rail Company". We propose a regression-tree-based approach for
estimating the demand function for local rail tickets that accounts for the
high degree of demand heterogeneity across trip directions and travel purposes.
Employing detailed data on ticket sales over 5 years, we estimate the
parameters of the demand function and reveal significant variation in the price
elasticity of demand. While demand is price elastic on average, nearly a
quarter of trips exhibit weakly elastic demand. Lower price elasticity of
demand is correlated with a lower degree of competition with other transport
modes and with inflexible travel frequency.
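
A toy sketch of a regression-tree demand model on simulated ticket data; the feature names, functional form, and tree settings are hypothetical and not the project's specification.

```python
# Sketch: regression-tree demand estimation with heterogeneous price response
# (simulated data; illustration of the idea only).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 5000
price = rng.uniform(1, 10, n)
direction = rng.integers(0, 5, n)              # trip direction (label-encoded)
commute = rng.integers(0, 2, n)                # commuting vs leisure travel
elasticity = -1.5 + 0.6 * commute              # commuters are less price sensitive
log_q = 3.0 + elasticity * np.log(price) + 0.2 * direction \
        + rng.normal(scale=0.3, size=n)

X = np.column_stack([np.log(price), direction, commute])
tree = DecisionTreeRegressor(max_depth=4, min_samples_leaf=200).fit(X, log_q)

# implied price response: finite difference of predicted log demand in log price
grid = np.column_stack([np.log([2.0, 4.0]), [1, 1], [0, 0]])
print(np.diff(tree.predict(grid)) / np.diff(np.log([2.0, 4.0])))  # rough elasticity
```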

arXiv link: http://arxiv.org/abs/1905.12859v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2019-05-29

Deep Generalized Method of Moments for Instrumental Variable Analysis

Authors: Andrew Bennett, Nathan Kallus, Tobias Schnabel

Instrumental variable analysis is a powerful tool for estimating causal
effects when randomization or full control of confounders is not possible. The
application of standard methods such as 2SLS, GMM, and more recent variants are
significantly impeded when the causal effects are complex, the instruments are
high-dimensional, and/or the treatment is high-dimensional. In this paper, we
propose the DeepGMM algorithm to overcome this. Our algorithm is based on a new
variational reformulation of GMM with optimal inverse-covariance weighting that
allows us to efficiently control very many moment conditions. We further
develop practical techniques for optimization and model selection that make it
particularly successful in practice. Our algorithm is also computationally
tractable and can handle large-scale datasets. Numerical results show our
algorithm matches the performance of the best tuned methods in standard
settings and continues to work in high-dimensional settings where even recent
methods break.

arXiv link: http://arxiv.org/abs/1905.12495v2

Econometrics arXiv paper, submitted: 2019-05-29

Centered and non-centered variance inflation factor

Authors: Román Salmerón Gómez, Catalina García García, José García Pérez

This paper analyzes the diagnosis of near multicollinearity in a multiple
linear regression from auxiliary centered regressions (with intercept) and
non-centered regressions (without intercept). From these auxiliary regressions,
the centered and non-centered Variance Inflation Factors are calculated,
respectively. An expression relating the two is also presented.
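
A small numerical sketch of the two diagnostics, computing each VIF as 1/(1 - R^2) from auxiliary regressions with and without an intercept on simulated data; the paper's exact expression relating the two is not reproduced here.

```python
# Sketch: centered vs non-centered variance inflation factors from auxiliary
# regressions (simulated data; standard VIF_j = 1 / (1 - R_j^2) definition).
import numpy as np

def vif(X, centered=True):
    n, k = X.shape
    out = []
    for j in range(k):
        y, Z = X[:, j], np.delete(X, j, axis=1)
        if centered:
            Z = np.column_stack([np.ones(n), Z])   # auxiliary regression with intercept
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        tss = np.sum((y - y.mean()) ** 2) if centered else np.sum(y ** 2)
        r2 = 1 - resid @ resid / tss
        out.append(1 / (1 - r2))
    return np.array(out)

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
X = np.column_stack([x1, x1 + 0.1 * rng.normal(size=200), rng.normal(size=200)])
print("centered VIF:    ", vif(X, centered=True))
print("non-centered VIF:", vif(X, centered=False))
```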

arXiv link: http://arxiv.org/abs/1905.12293v1

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2019-05-29

The Income Fluctuation Problem and the Evolution of Wealth

Authors: Qingyin Ma, John Stachurski, Alexis Akira Toda

We analyze the household savings problem in a general setting where returns
on assets, non-financial income and impatience are all state dependent and
fluctuate over time. All three processes can be serially correlated and
mutually dependent. Rewards can be bounded or unbounded and wealth can be
arbitrarily large. Extending classic results from an earlier literature, we
determine conditions under which (a) solutions exist, are unique and are
globally computable, (b) the resulting wealth dynamics are stationary, ergodic
and geometrically mixing, and (c) the wealth distribution has a Pareto tail. We
show how these results can be used to extend recent studies of the wealth
distribution. Our conditions have natural economic interpretations in terms of
asymptotic growth rates for discounting and return on savings.

arXiv link: http://arxiv.org/abs/1905.13045v3

Econometrics arXiv paper, submitted: 2019-05-28

Matching on What Matters: A Pseudo-Metric Learning Approach to Matching Estimation in High Dimensions

Authors: Gentry Johnson, Brian Quistorff, Matt Goldman

When pre-processing observational data via matching, we seek to approximate
each unit with maximally similar peers that had an alternative treatment
status--essentially replicating a randomized block design. However, as one
considers a growing number of continuous features, a curse of dimensionality
applies making asymptotically valid inference impossible (Abadie and Imbens,
2006). The alternative of ignoring plausibly relevant features is certainly no
better, and the resulting trade-off substantially limits the application of
matching methods to "wide" datasets. Instead, Li and Fu (2017) recasts the
problem of matching in a metric learning framework that maps features to a
low-dimensional space that facilitates "closer matches" while still capturing
important aspects of unit-level heterogeneity. However, that method lacks key
theoretical guarantees and can produce inconsistent estimates in cases of
heterogeneous treatment effects. Motivated by straightforward extension of
existing results in the matching literature, we present alternative techniques
that learn latent matching features through either MLPs or through siamese
neural networks trained on a carefully selected loss function. We benchmark the
resulting alternative methods in simulations as well as against two
experimental data sets--including the canonical NSW worker training program
data set--and find superior performance of the neural-net-based methods.

arXiv link: http://arxiv.org/abs/1905.12020v1

Econometrics arXiv cross-link from q-fin.GN (q-fin.GN), submitted: 2019-05-28

Graph-based era segmentation of international financial integration

Authors: Cécile Bastidon, Antoine Parent, Pablo Jensen, Patrice Abry, Pierre Borgnat

Assessing world-wide financial integration constitutes a recurrent challenge
in macroeconometrics, often addressed by visual inspections searching for data
patterns. Econophysics literature enables us to build complementary,
data-driven measures of financial integration using graphs. The present
contribution investigates the potential and interests of a novel 3-step
approach that combines several state-of-the-art procedures to i) compute
graph-based representations of the multivariate dependence structure of asset
prices time series representing the financial states of 32 countries world-wide
(1955-2015); ii) compute time series of 5 graph-based indices that characterize
the time evolution of the topologies of the graph; iii) segment these time
evolutions in piece-wise constant eras, using an optimization framework
constructed on a multivariate multi-norm total variation penalized functional.
The method shows first that it is possible to find endogenous stable eras of
world-wide financial integration. Then, our results suggest that the most
relevant globalization eras would be based on the historical patterns of global
capital flows, while the major regulatory events of the 1970s would only appear
as a cause of sub-segmentation.

arXiv link: http://arxiv.org/abs/1905.11842v1

Econometrics arXiv paper, submitted: 2019-05-27

Local Asymptotic Equivalence of the Bai and Ng (2004) and Moon and Perron (2004) Frameworks for Panel Unit Root Testing

Authors: Oliver Wichert, I. Gaia Becheri, Feike C. Drost, Ramon van den Akker

This paper considers unit-root tests in large n and large T heterogeneous
panels with cross-sectional dependence generated by unobserved factors. We
reconsider the two prevalent approaches in the literature, that of Moon and
Perron (2004) and the PANIC setup proposed in Bai and Ng (2004). While these
have been considered as completely different setups, we show that, in case of
Gaussian innovations, the frameworks are asymptotically equivalent in the sense
that both experiments are locally asymptotically normal (LAN) with the same
central sequence. Using Le Cam's theory of statistical experiments we determine
the local asymptotic power envelope and derive an optimal test jointly in both
setups. We show that the popular Moon and Perron (2004) and Bai and Ng (2010)
tests only attain the power envelope in case there is no heterogeneity in the
long-run variance of the idiosyncratic components. The new test is
asymptotically uniformly most powerful irrespective of possible heterogeneity.
Moreover, it turns out that for any test, satisfying a mild regularity
condition, the size and local asymptotic power are the same under both data
generating processes. Thus, applied researchers do not need to decide on one of
the two frameworks to conduct unit root tests. Monte-Carlo simulations
corroborate our asymptotic results and document significant gains in
finite-sample power if the variances of the idiosyncratic shocks differ
substantially among the cross sectional units.

arXiv link: http://arxiv.org/abs/1905.11184v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2019-05-26

Score-Driven Exponential Random Graphs: A New Class of Time-Varying Parameter Models for Dynamical Networks

Authors: Domenico Di Gangi, Giacomo Bormetti, Fabrizio Lillo

Motivated by the increasing abundance of data describing real-world networks
that exhibit dynamical features, we propose an extension of the Exponential
Random Graph Models (ERGMs) that accommodates the time variation of its
parameters. Inspired by the fast-growing literature on Dynamic Conditional
Score models, each parameter evolves according to an updating rule driven by
the score of the ERGM distribution. We demonstrate the flexibility of
score-driven ERGMs (SD-ERGMs) as data-generating processes and filters and show
the advantages of the dynamic version over the static one. We discuss two
applications to temporal networks from financial and political systems. First,
we consider the prediction of future links in the Italian interbank credit
network. Second, we show that the SD-ERGM allows discriminating between static
or time-varying parameters when used to model the U.S. Congress co-voting
network dynamics.

arXiv link: http://arxiv.org/abs/1905.10806v3

Econometrics arXiv updated paper (originally submitted: 2019-05-26)

Inducing Sparsity and Shrinkage in Time-Varying Parameter Models

Authors: Florian Huber, Gary Koop, Luca Onorante

Time-varying parameter (TVP) models have the potential to be
over-parameterized, particularly when the number of variables in the model is
large. Global-local priors are increasingly used to induce shrinkage in such
models. But the estimates produced by these priors can still have appreciable
uncertainty. Sparsification has the potential to reduce this uncertainty and
improve forecasts. In this paper, we develop computationally simple methods
which both shrink and sparsify TVP models. In a simulated data exercise we show
the benefits of our shrink-then-sparsify approach in a variety of sparse and
dense TVP regressions. In a macroeconomic forecasting exercise, we find our
approach to substantially improve forecast performance relative to shrinkage
alone.

arXiv link: http://arxiv.org/abs/1905.10787v2

Econometrics arXiv updated paper (originally submitted: 2019-05-24)

Machine Learning Estimation of Heterogeneous Treatment Effects with Instruments

Authors: Vasilis Syrgkanis, Victor Lei, Miruna Oprescu, Maggie Hei, Keith Battocchi, Greg Lewis

We consider the estimation of heterogeneous treatment effects with arbitrary
machine learning methods in the presence of unobserved confounders with the aid
of a valid instrument. Such settings arise in A/B tests with an intent-to-treat
structure, where the experimenter randomizes over which user will receive a
recommendation to take an action, and we are interested in the effect of the
downstream action. We develop a statistical learning approach to the estimation
of heterogeneous effects, reducing the problem to the minimization of an
appropriate loss function that depends on a set of auxiliary models (each
corresponding to a separate prediction task). The reduction enables the use of
all recent algorithmic advances (e.g. neural nets, forests). We show that the
estimated effect model is robust to estimation errors in the auxiliary models,
by showing that the loss satisfies a Neyman orthogonality criterion. Our
approach can be used to estimate projections of the true effect model on
simpler hypothesis spaces. When these spaces are parametric, then the parameter
estimates are asymptotically normal, which enables construction of confidence
sets. We apply our method to estimate the effect of membership on downstream
webpage engagement on TripAdvisor, using as an instrument an intent-to-treat
A/B test among 4 million TripAdvisor users, where some users received an easier
membership sign-up process. We also validate our method on synthetic data and
on public datasets for the effects of schooling on income.

arXiv link: http://arxiv.org/abs/1905.10176v3

Econometrics arXiv updated paper (originally submitted: 2019-05-24)

Semi-Parametric Efficient Policy Learning with Continuous Actions

Authors: Mert Demirer, Vasilis Syrgkanis, Greg Lewis, Victor Chernozhukov

We consider off-policy evaluation and optimization with continuous action
spaces. We focus on observational data where the data collection policy is
unknown and needs to be estimated. We take a semi-parametric approach where the
value function takes a known parametric form in the treatment, but we are
agnostic on how it depends on the observed contexts. We propose a doubly robust
off-policy estimate for this setting and show that off-policy optimization
based on this estimate is robust to estimation errors of the policy function or
the regression model. Our results also apply if the model does not satisfy our
semi-parametric form, but rather we measure regret in terms of the best
projection of the true value function to this functional space. Our work
extends prior approaches of policy optimization from observational data that
only considered discrete actions. We provide an experimental evaluation of our
method in a synthetic data example motivated by optimal personalized pricing
and costly resource allocation.

arXiv link: http://arxiv.org/abs/1905.10116v2

Econometrics arXiv updated paper (originally submitted: 2019-05-21)

Smoothing quantile regressions

Authors: Marcelo Fernandes, Emmanuel Guerre, Eduardo Horta

We propose to smooth the entire objective function, rather than only the
check function, in a linear quantile regression context. Not only does the
resulting smoothed quantile regression estimator yield a lower mean squared
error and a more accurate Bahadur-Kiefer representation than the standard
estimator, but it is also asymptotically differentiable. We exploit the latter
to propose a quantile density estimator that does not suffer from the curse of
dimensionality. This means estimating the conditional density function without
worrying about the dimension of the covariate vector. It also allows for
two-stage efficient quantile regression estimation. Our asymptotic theory holds
uniformly with respect to the bandwidth and quantile level. Finally, we propose
a rule of thumb for choosing the smoothing bandwidth that should approximate
well the optimal bandwidth. Simulations confirm that our smoothed quantile
regression estimator indeed performs very well in finite samples.
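
An illustrative sketch of the smoothing idea, using a Gaussian convolution-smoothed check loss with an ad hoc bandwidth and a generic optimizer; this is not the authors' exact objective or bandwidth rule.

```python
# Sketch: quantile regression with a Gaussian-smoothed check loss minimized by
# BFGS (simulated data; illustration of smoothing the objective).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def smoothed_check(u, tau, h):
    # E[rho_tau(u + h*Z)] for Z ~ N(0,1): a convex, smooth surrogate of the check loss
    return u * (tau - norm.cdf(-u / h)) + h * norm.pdf(u / h)

def smoothed_qr(X, y, tau, h):
    def obj(beta):
        return smoothed_check(y - X @ beta, tau, h).mean()
    beta0 = np.linalg.lstsq(X, y, rcond=None)[0]      # OLS starting values
    return minimize(obj, beta0, method="BFGS").x

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.standard_t(df=4, size=n)
print(smoothed_qr(X, y, tau=0.5, h=0.5))              # close to (1, 2) at the median
```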

arXiv link: http://arxiv.org/abs/1905.08535v3

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2019-05-20

Demand forecasting techniques for build-to-order lean manufacturing supply chains

Authors: Rodrigo Rivera-Castro, Ivan Nazarov, Yuke Xiang, Alexander Pletneev, Ivan Maksimov, Evgeny Burnaev

Build-to-order (BTO) supply chains have become common-place in industries
such as electronics, automotive and fashion. They enable building products
based on individual requirements with a short lead time and minimum inventory
and production costs. Due to their nature, they differ significantly from
traditional supply chains. However, there have not been studies dedicated to
demand forecasting methods for this type of setting. This work makes two
contributions. First, it presents a new and unique data set from a manufacturer
in the BTO sector. Second, it proposes a novel data transformation technique
for demand forecasting of BTO products. Results from thirteen forecasting
methods show that the approach compares well to the state-of-the-art while
being easy to implement and to explain to decision-makers.

arXiv link: http://arxiv.org/abs/1905.07902v1

Econometrics arXiv updated paper (originally submitted: 2019-05-20)

Conformal Prediction Interval Estimations with an Application to Day-Ahead and Intraday Power Markets

Authors: Christopher Kath, Florian Ziel

We discuss a concept denoted as Conformal Prediction (CP) in this paper.
While initially stemming from the world of machine learning, it was never
applied or analyzed in the context of short-term electricity price forecasting.
Therefore, we elaborate on the aspects that render Conformal Prediction worthwhile
to know and explain why its simple yet very efficient idea has worked in other
fields of application and why its characteristics are promising for short-term
power applications as well. We compare its performance with different
state-of-the-art electricity price forecasting models such as quantile
regression averaging (QRA) in an empirical out-of-sample study for three
short-term electricity time series. We combine Conformal Prediction with
various underlying point forecast models to demonstrate its versatility and
behavior under changing conditions. Our findings suggest that Conformal
Prediction yields sharp and reliable prediction intervals in short-term power
markets. We further inspect the effect each of Conformal Prediction's model
components has and provide a path-based guideline on how to find the best CP
model for each market.
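
A minimal split-conformal sketch around a placeholder point forecaster; the regressors, calibration split, and nominal level are illustrative, and any of the underlying models compared in the paper could be substituted.

```python
# Sketch: split conformal prediction intervals around an arbitrary point
# forecaster (simulated data; illustration only).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))                       # e.g. lagged prices, load forecasts
y = X @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=n)

train, calib, test = slice(0, 300), slice(300, 450), slice(450, n)
model = LinearRegression().fit(X[train], y[train])

alpha = 0.1                                       # 90% nominal coverage
scores = np.abs(y[calib] - model.predict(X[calib]))
k = int(np.ceil((1 - alpha) * (scores.size + 1)))
q = np.sort(scores)[k - 1]                        # conformal quantile of residuals

pred = model.predict(X[test])
lower, upper = pred - q, pred + q
print("empirical coverage:", np.mean((y[test] >= lower) & (y[test] <= upper)))
```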

arXiv link: http://arxiv.org/abs/1905.07886v2

Econometrics arXiv paper, submitted: 2019-05-20

Time Series Analysis and Forecasting of the US Housing Starts using Econometric and Machine Learning Model

Authors: Sudiksha Joshi

In this research paper, I have performed time series analysis and forecasted
the monthly value of housing starts for the year 2019 using several econometric
methods - ARIMA(X), VARX, (G)ARCH and machine learning algorithms - artificial
neural networks, ridge regression, K-Nearest Neighbors, and support vector
regression, and created an ensemble model. The ensemble model stacks the
predictions from various individual models, and gives a weighted average of all
predictions. The analyses suggest that the ensemble model has performed the
best among all the models as the prediction errors are the lowest, while the
econometric models have higher error rates.
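
A toy sketch of a weighted-average stacking ensemble on simulated data, using inverse validation-MSE weights; the component models and weighting rule are placeholders rather than the paper's exact ensemble.

```python
# Sketch: stacked ensemble that weights individual forecasts by inverse
# validation error (simulated data; illustration only).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(0)
T = 240
X = rng.normal(size=(T, 5))                       # lagged predictors
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=T)
train, valid, test = slice(0, 160), slice(160, 200), slice(200, T)

models = [Ridge(alpha=1.0), KNeighborsRegressor(5), SVR(C=1.0)]
fits = [m.fit(X[train], y[train]) for m in models]
val_err = np.array([np.mean((y[valid] - m.predict(X[valid])) ** 2) for m in fits])
w = (1 / val_err) / (1 / val_err).sum()           # inverse-MSE weights

ens = sum(wi * m.predict(X[test]) for wi, m in zip(w, fits))
print("ensemble test RMSE:", np.sqrt(np.mean((y[test] - ens) ** 2)))
```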

arXiv link: http://arxiv.org/abs/1905.07848v1

Econometrics arXiv updated paper (originally submitted: 2019-05-19)

Iterative Estimation of Nonparametric Regressions with Continuous Endogenous Variables and Discrete Instruments

Authors: Samuele Centorrino, Frédérique Fève, Jean-Pierre Florens

We consider a nonparametric regression model with continuous endogenous
independent variables when only discrete instruments are available that are
independent of the error term. Although this framework is very relevant for
applied research, its implementation is challenging, as the regression function
becomes the solution to a nonlinear integral equation. We propose a simple
iterative procedure to estimate such models and showcase some of its asymptotic
properties. In a simulation experiment, we detail its implementation in the
case when the instrumental variable is binary. We conclude with an empirical
application to returns to education.

arXiv link: http://arxiv.org/abs/1905.07812v3

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2019-05-17

Cointegration in high frequency data

Authors: Simon Clinet, Yoann Potiron

In this paper, we consider a framework adapting the notion of cointegration
when two asset prices are generated by a driftless It\^{o}-semimartingale
featuring jumps with infinite activity, observed regularly and synchronously at
high frequency. We develop a regression based estimation of the cointegrated
relations method and show the related consistency and central limit theory when
there is cointegration within that framework. We also provide a Dickey-Fuller
type residual based test for the null of no cointegration against the
alternative of cointegration, along with its limit theory. Under no
cointegration, the asymptotic limit is the same as that of the original
Dickey-Fuller residual based test, so that critical values can be easily
tabulated in the same way. Finite-sample evidence indicates adequate size and
good power properties in a variety of realistic configurations, outperforming
the original Dickey-Fuller and Phillips-Perron type residual-based tests, whose
sizes are distorted by non-ergodic time-varying variance and whose power is
altered by price jumps. Two empirical examples consolidate the Monte Carlo
evidence that the null can be rejected by the adapted tests but not by the
original tests, and vice versa.
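
For reference, the sketch below runs the classical low-frequency residual-based (Engle-Granger / Dickey-Fuller type) procedure that the paper adapts; it is not the high-frequency test, and the printed p-value uses standard ADF tabulations rather than the Phillips-Ouliaris-type critical values a residual-based cointegration test requires.

```python
# Sketch: textbook residual-based cointegration check on simulated prices.
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
T = 1000
common = np.cumsum(rng.normal(size=T))                 # shared stochastic trend
p1 = common + rng.normal(scale=0.5, size=T)            # two cointegrated (log) prices
p2 = 0.8 * common + rng.normal(scale=0.5, size=T)

# step 1: cointegrating regression; step 2: unit-root test on its residuals
resid = sm.OLS(p1, sm.add_constant(p2)).fit().resid
stat, pval, *_ = adfuller(resid, regression="n")       # no deterministic terms
print("ADF stat on residuals:", round(stat, 2), "p-value:", round(pval, 3))
```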

arXiv link: http://arxiv.org/abs/1905.07081v2

Econometrics arXiv updated paper (originally submitted: 2019-05-16)

A Comment on "Estimating Dynamic Discrete Choice Models with Hyperbolic Discounting" by Hanming Fang and Yang Wang

Authors: Jaap H. Abbring, Øystein Daljord

The recent literature often cites Fang and Wang (2015) for analyzing the
identification of time preferences in dynamic discrete choice under exclusion
restrictions (e.g. Yao et al., 2012; Lee, 2013; Ching et al., 2013; Norets and
Tang, 2014; Dub\'e et al., 2014; Gordon and Sun, 2015; Bajari et al., 2016;
Chan, 2017; Gayle et al., 2018). Fang and Wang's Proposition 2 claims generic
identification of a dynamic discrete choice model with hyperbolic discounting.
This claim uses a definition of "generic" that does not preclude the
possibility that a generically identified model is nowhere identified. To
illustrate this point, we provide two simple examples of models that are
generically identified in Fang and Wang's sense, but that are, respectively,
everywhere and nowhere identified. We conclude that Proposition 2 is void: It
has no implications for identification of the dynamic discrete choice model. We
show that its proof is incorrect and incomplete and suggest alternative
approaches to identification.

arXiv link: http://arxiv.org/abs/1905.07048v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2019-05-16

The Empirical Saddlepoint Estimator

Authors: Benjamin Holcblat, Fallaw Sowell

We define a moment-based estimator that maximizes the empirical saddlepoint
(ESP) approximation of the distribution of solutions to empirical moment
conditions. We call it the ESP estimator. We prove its existence, consistency
and asymptotic normality, and we propose novel test statistics. We also show
that the ESP estimator corresponds to the MM (method of moments) estimator
shrunk toward parameter values with lower estimated variance, so it reduces the
documented instability of existing moment-based estimators. In the case of
just-identified moment conditions, which is the case we focus on, the ESP
estimator is different from the MM estimator, unlike the recently proposed
alternatives, such as the empirical-likelihood-type estimators.

arXiv link: http://arxiv.org/abs/1905.06977v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2019-05-16

Inference in a class of optimization problems: Confidence regions and finite sample bounds on errors in coverage probabilities

Authors: Joel L. Horowitz, Sokbae Lee

This paper describes three methods for carrying out non-asymptotic inference
on partially identified parameters that are solutions to a class of
optimization problems. Applications in which the optimization problems arise
include estimation under shape restrictions, estimation of models of discrete
games, and estimation based on grouped data. The partially identified
parameters are characterized by restrictions that involve the unknown
population means of observed random variables in addition to structural
parameters. Inference consists of finding confidence intervals for functions of
the structural parameters. Our theory provides finite-sample lower bounds on
the coverage probabilities of the confidence intervals under three sets of
assumptions of increasing strength. With the moderate sample sizes found in
most economics applications, the bounds become tighter as the assumptions
strengthen. We discuss estimation of population parameters that the bounds
depend on and contrast our methods with alternative methods for obtaining
confidence intervals for partially identified parameters. The results of Monte
Carlo experiments and empirical examples illustrate the usefulness of our
method.

arXiv link: http://arxiv.org/abs/1905.06491v6

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2019-05-15

mRSC: Multi-dimensional Robust Synthetic Control

Authors: Muhummad Amjad, Vishal Misra, Devavrat Shah, Dennis Shen

When evaluating the impact of a policy on a metric of interest, it may not be
possible to conduct a randomized control trial. In settings where only
observational data is available, Synthetic Control (SC) methods provide a
popular data-driven approach to estimate a "synthetic" control by combining
measurements of "similar" units (donors). Recently, Robust SC (RSC) was
proposed as a generalization of SC to overcome the challenges of missing data
and high levels of noise, while removing the reliance on domain knowledge for
selecting donors. However, SC, RSC, and their variants, suffer from poor
estimation when the pre-intervention period is too short. As the main
contribution, we propose a generalization of unidimensional RSC to
multi-dimensional RSC, mRSC. Our proposed mechanism incorporates multiple
metrics to estimate a synthetic control, thus overcoming the challenge of poor
inference from limited pre-intervention data. We show that the mRSC algorithm
with $K$ metrics leads to a consistent estimator of the synthetic control for
the target unit under any metric. Our finite-sample analysis suggests that the
prediction error decays to zero at a rate faster than the RSC algorithm by a
factor of $K$ and $K$ for the training and testing periods (pre- and
post-intervention), respectively. Additionally, we provide a diagnostic test
that evaluates the utility of including additional metrics. Moreover, we
introduce a mechanism to validate the performance of mRSC: time series
prediction. That is, we propose a method to predict the future evolution of a
time series based on limited data when the notion of time is relative and not
absolute, i.e., we have access to a donor pool that has undergone the desired
future evolution. Finally, we conduct experimentation to establish the efficacy
of mRSC on synthetic data and two real-world case studies (retail and Cricket).
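
A single-metric, robust-synthetic-control style sketch: low-rank denoising of the donor matrix followed by linear weights fit on pre-intervention data. The paper's mRSC additionally stacks multiple metrics, which this toy example omits, and all data below are simulated.

```python
# Sketch: denoise-then-regress synthetic control on simulated donor outcomes.
import numpy as np

rng = np.random.default_rng(0)
T0, T1, J, r = 40, 20, 15, 3                      # pre/post periods, donors, kept rank
F = rng.normal(size=(T0 + T1, r)) @ rng.normal(size=(r, J))
donors = F + 0.3 * rng.normal(size=F.shape)       # noisy donor outcomes
target = F @ (np.ones(J) / J) + 0.3 * rng.normal(size=T0 + T1)
target[T0:] += 2.0                                # treatment effect of +2 after T0

U, s, Vt = np.linalg.svd(donors, full_matrices=False)
denoised = (U[:, :r] * s[:r]) @ Vt[:r]            # low-rank denoising step
w, *_ = np.linalg.lstsq(denoised[:T0], target[:T0], rcond=None)
synth = denoised @ w                              # synthetic control path
print("estimated effect:", round(np.mean(target[T0:] - synth[T0:]), 2))   # ~2
```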

arXiv link: http://arxiv.org/abs/1905.06400v3

Econometrics arXiv paper, submitted: 2019-05-15

Analyzing Subjective Well-Being Data with Misclassification

Authors: Ekaterina Oparina, Sorawoot Srisuma

We use novel nonparametric techniques to test for the presence of
non-classical measurement error in reported life satisfaction (LS) and study
the potential effects from ignoring it. Our dataset comes from Wave 3 of the UK
Understanding Society, a survey of 35,000 British households. Our test
finds evidence of measurement error in reported LS for the entire dataset as
well as for 26 out of 32 socioeconomic subgroups in the sample. We estimate the
joint distribution of reported and latent LS nonparametrically in order to
understand the mis-reporting behavior. We show this distribution can then be
used to estimate parametric models of latent LS. We find measurement error bias
is not severe enough to distort the main drivers of LS. But there is an
important difference that is policy relevant. We find women tend to over-report
their latent LS relative to men. This may help explain the gender puzzle that
questions why women are reportedly happier than men despite being worse off on
objective outcomes such as income and employment.

arXiv link: http://arxiv.org/abs/1905.06037v1

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2019-05-13

Sustainable Investing and the Cross-Section of Returns and Maximum Drawdown

Authors: Lisa R. Goldberg, Saad Mouti

We use supervised learning to identify factors that predict the cross-section
of returns and maximum drawdown for stocks in the US equity market. Our data
run from January 1970 to December 2019 and our analysis includes ordinary least
squares, penalized linear regressions, tree-based models, and neural networks.
We find that the most important predictors tended to be consistent across
models, and that non-linear models had better predictive power than linear
models. Predictive power was higher in calm periods than in stressed periods.
Environmental, social, and governance indicators marginally impacted the
predictive power of non-linear models in our data, despite their negative
correlation with maximum drawdown and positive correlation with returns. Upon
exploring whether ESG variables are captured by some models, we find that ESG
data contribute to the prediction nonetheless.

arXiv link: http://arxiv.org/abs/1905.05237v2

Econometrics arXiv paper, submitted: 2019-05-11

Regression Discontinuity Design with Multiple Groups for Heterogeneous Causal Effect Estimation

Authors: Takayuki Toda, Ayako Wakano, Takahiro Hoshino

We propose a new estimation method for heterogeneous causal effects that
utilizes a regression discontinuity (RD) design for multiple datasets with
different thresholds. The standard RD design is frequently used in applied
research, but its results are limited in that the average treatment effect is
estimable only at the threshold on the running variable. In applied studies it
is often the case that thresholds differ across databases from different
regions or firms; for example, scholarship thresholds differ across states. The
proposed estimator, based on the augmented inverse probability weighted local
linear estimator, can estimate the average effects at an arbitrary point on the
running variable between the thresholds under mild conditions, while adjusting
for differences in the covariate distributions across datasets. We perform
simulations to investigate the performance of the proposed estimator in finite
samples.

arXiv link: http://arxiv.org/abs/1905.04443v1

Econometrics arXiv updated paper (originally submitted: 2019-05-10)

Demand and Welfare Analysis in Discrete Choice Models with Social Interactions

Authors: Debopam Bhattacharya, Pascaline Dupas, Shin Kanaya

Many real-life settings of consumer-choice involve social interactions,
causing targeted policies to have spillover-effects. This paper develops novel
empirical tools for analyzing demand and welfare-effects of
policy-interventions in binary choice settings with social interactions.
Examples include subsidies for health-product adoption and vouchers for
attending a high-achieving school. We establish the connection between
econometrics of large games and Brock-Durlauf-type interaction models, under
both I.I.D. and spatially correlated unobservables. We develop new convergence
results for associated beliefs and estimates of preference-parameters under
increasing-domain spatial asymptotics. Next, we show that even with fully
parametric specifications and unique equilibrium, choice data, that are
sufficient for counterfactual demand-prediction under interactions, are
insufficient for welfare-calculations. This is because distinct underlying
mechanisms producing the same interaction coefficient can imply different
welfare-effects and deadweight-loss from a policy-intervention. Standard
index-restrictions imply distribution-free bounds on welfare. We illustrate our
results using experimental data on mosquito-net adoption in rural Kenya.

arXiv link: http://arxiv.org/abs/1905.04028v2

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2019-05-10

Identifying Present-Bias from the Timing of Choices

Authors: Paul Heidhues, Philipp Strack

Timing decisions are common: when to file your taxes, finish a referee
report, or complete a task at work. We ask whether time preferences can be
inferred when only task completion is observed. To answer this
question, we analyze the following model: each period a decision maker faces
the choice of whether to complete the task today or to postpone it until later.
Costs and benefits of task completion cannot be directly observed by the analyst, but
the analyst knows that net benefits are drawn independently between periods
from a time-invariant distribution and that the agent has time-separable
utility. Furthermore, we suppose the analyst can observe the agent's exact
stopping probability. We establish that for any agent with quasi-hyperbolic
$\beta,\delta$-preferences and given level of partial naivete $\beta$,
the probability of completing the task conditional on not having done it
earlier increases towards the deadline. And conversely, for any given
preference parameters $\beta,\delta$ and (weakly increasing) profile of task
completion probability, there exists a stationary payoff distribution that
rationalizes her behavior as long as the agent is either sophisticated or fully
naive. An immediate corollary being that, without parametric assumptions, it is
impossible to rule out time-consistency even when imposing an a priori
assumption on the permissible long-run discount factor. We also provide an
exact partial identification result when the analyst can, in addition to the
stopping probability, observe the agent's continuation value.

arXiv link: http://arxiv.org/abs/1905.03959v1

Econometrics arXiv updated paper (originally submitted: 2019-05-09)

The Likelihood of Mixed Hitting Times

Authors: Jaap H. Abbring, Tim Salimans

We present a method for computing the likelihood of a mixed hitting-time
model that specifies durations as the first time a latent L\'evy process
crosses a heterogeneous threshold. This likelihood is not generally known in
closed form, but its Laplace transform is. Our approach to its computation
relies on numerical methods for inverting Laplace transforms that exploit
special properties of the first passage times of L\'evy processes. We use our
method to implement a maximum likelihood estimator of the mixed hitting-time
model in MATLAB. We illustrate the application of this estimator with an
analysis of Kennan's (1985) strike data.

arXiv link: http://arxiv.org/abs/1905.03463v2

Econometrics arXiv updated paper (originally submitted: 2019-05-06)

Lasso under Multi-way Clustering: Estimation and Post-selection Inference

Authors: Harold D. Chiang, Yuya Sasaki

This paper studies high-dimensional regression models with lasso when data is
sampled under multi-way clustering. First, we establish convergence rates for
the lasso and post-lasso estimators. Second, we propose a novel inference
method based on a post-double-selection procedure and show its asymptotic
validity. Our procedure can be easily implemented with existing statistical
packages. Simulation results demonstrate that the proposed procedure works well
in finite samples. We illustrate the proposed method with a couple of empirical
applications to development and growth economics.
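
A minimal post-double-selection sketch with one-way cluster-robust standard errors (the paper's multi-way clustering is not implemented here); the data, cluster structure, and tuning choices are placeholders.

```python
# Sketch: post-double-selection for a treatment effect with many controls,
# followed by OLS with one-way cluster-robust SEs (simulated data).
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p = 400, 100
G = rng.integers(0, 20, n)                       # cluster ids
X = rng.normal(size=(n, p))
d = X[:, 0] + 0.5 * rng.normal(size=n)           # treatment depends on a few controls
y = 1.0 * d + X[:, 0] - X[:, 1] + rng.normal(size=n)

sel_y = np.flatnonzero(LassoCV(cv=5).fit(X, y).coef_)   # step 1: outcome lasso
sel_d = np.flatnonzero(LassoCV(cv=5).fit(X, d).coef_)   # step 2: treatment lasso
controls = np.union1d(sel_y, sel_d)                     # union of selected controls

Z = sm.add_constant(np.column_stack([d, X[:, controls]]))
fit = sm.OLS(y, Z).fit(cov_type="cluster", cov_kwds={"groups": G})
print("effect of d:", fit.params[1], "SE:", fit.bse[1])
```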

arXiv link: http://arxiv.org/abs/1905.02107v3

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2019-05-06

Estimation of high-dimensional factor models and its application in power data analysis

Authors: Xin Shi, Robert Qiu

In dealing with high-dimensional data, factor models are often used for
reducing dimensions and extracting relevant information. The spectrum of
covariance matrices from power data exhibits two aspects: 1) bulk, which arises
from random noise or fluctuations and 2) spikes, which represents factors
caused by anomaly events. In this paper, we propose a new approach to the
estimation of high-dimensional factor models that minimizes the distance
between the empirical spectral density (ESD) of the covariance matrices of the
residuals of the power data, obtained by subtracting principal components, and
the limiting spectral density (LSD) implied by a multiplicative covariance
structure model. Free probability theory (FPT) is used to derive the spectral density
of the multiplicative covariance model, which efficiently solves the
computational difficulties. The proposed approach connects the estimation of
the number of factors to the LSD of covariance matrices of the residuals, which
provides estimators of the number of factors and the correlation structure
information in the residuals. Because the power data contain substantial
measurement noise and the residuals have a complex correlation structure, the
approach approximates the ESD of the covariance matrices of the residuals
through a multiplicative covariance model, which avoids making crude
assumptions or simplifications about the complex structure of the data.
Theoretical studies show the proposed approach is robust against noise and
sensitive to the presence of weak factors. The synthetic data from IEEE 118-bus
power system is used to validate the effectiveness of the approach.
Furthermore, the application to the analysis of the real-world online
monitoring data in a power grid shows that the estimators in the approach can
be used to indicate the system behavior.

arXiv link: http://arxiv.org/abs/1905.02061v2

Econometrics arXiv paper, submitted: 2019-05-06

Non-standard inference for augmented double autoregressive models with null volatility coefficients

Authors: Feiyu Jiang, Dong Li, Ke Zhu

This paper considers an augmented double autoregressive (DAR) model, which
allows null volatility coefficients to circumvent the over-parameterization
problem in the DAR model. Since the volatility coefficients might be on the
boundary, the statistical inference methods based on the Gaussian quasi-maximum
likelihood estimation (GQMLE) become non-standard, and their asymptotics
require the data to have a finite sixth moment, which narrows the applicable
scope for studying heavy-tailed data. To overcome this deficiency, this paper develops
a systematic statistical inference procedure based on the self-weighted GQMLE
for the augmented DAR model. Except for the Lagrange multiplier test statistic,
the Wald, quasi-likelihood ratio and portmanteau test statistics are all shown
to have non-standard asymptotics. The entire procedure is valid as long as the
data is stationary, and its usefulness is illustrated by simulation studies and
one real example.

arXiv link: http://arxiv.org/abs/1905.01798v1

Econometrics arXiv updated paper (originally submitted: 2019-05-03)

A Uniform Bound on the Operator Norm of Sub-Gaussian Random Matrices and Its Applications

Authors: Grigory Franguridi, Hyungsik Roger Moon

For an $N \times T$ random matrix $X(\beta)$ with weakly dependent uniformly
sub-Gaussian entries $x_{it}(\beta)$ that may depend on a possibly
infinite-dimensional parameter $\beta \in B$, we obtain a uniform bound on its
operator norm of the form $E \sup_{\beta \in B} \|X(\beta)\| \leq C K
\left(\sqrt{\max(N,T)} + \gamma_2(B, d_B)\right)$, where $C$ is an absolute constant,
$K$ controls the tail behavior of (the increments of) $x_{it}(\cdot)$, and
$\gamma_2(B,d_B)$ is Talagrand's functional, a measure of
multi-scale complexity of the metric space $(B,d_B)$. We
illustrate how this result may be used for estimation that seeks to minimize
the operator norm of moment conditions as well as for estimation of the maximal
number of factors with functional data.

arXiv link: http://arxiv.org/abs/1905.01096v4
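
As a quick sanity check of the scaling in the displayed bound, the snippet below draws i.i.d. Gaussian (hence sub-Gaussian) matrices at a fixed parameter value and compares their operator norms to sqrt(N) + sqrt(T), which is of the same order as sqrt(max(N,T)). The uniformity over beta and the Talagrand gamma_2 term are the paper's contribution and are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
for N, T in [(100, 50), (400, 200), (1600, 800)]:
    X = rng.standard_normal((N, T))          # i.i.d. sub-Gaussian entries, fixed beta
    op_norm = np.linalg.norm(X, 2)           # largest singular value
    print(N, T, round(op_norm, 1), round(np.sqrt(N) + np.sqrt(T), 1))
```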

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2019-05-02

Sparsity Double Robust Inference of Average Treatment Effects

Authors: Jelena Bradic, Stefan Wager, Yinchu Zhu

Many popular methods for building confidence intervals on causal effects
under high-dimensional confounding require strong "ultra-sparsity" assumptions
that may be difficult to validate in practice. To alleviate this difficulty, we
here study a new method for average treatment effect estimation that yields
asymptotically exact confidence intervals assuming that either the conditional
response surface or the conditional probability of treatment allows for an
ultra-sparse representation (but not necessarily both). This guarantee allows
us to provide valid inference for the average treatment effect in high dimensions
under considerably more generality than available baselines. In addition, we
showcase that our results are semi-parametrically efficient.

arXiv link: http://arxiv.org/abs/1905.00744v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2019-05-01

Variational Bayesian Inference for Mixed Logit Models with Unobserved Inter- and Intra-Individual Heterogeneity

Authors: Rico Krueger, Prateek Bansal, Michel Bierlaire, Ricardo A. Daziano, Taha H. Rashidi

Variational Bayes (VB), a method originating from machine learning, enables
fast and scalable estimation of complex probabilistic models. Thus far,
applications of VB in discrete choice analysis have been limited to mixed logit
models with unobserved inter-individual taste heterogeneity. However, such a
model formulation may be too restrictive in panel data settings, since tastes
may vary both between individuals as well as across choice tasks encountered by
the same individual. In this paper, we derive a VB method for posterior
inference in mixed logit models with unobserved inter- and intra-individual
heterogeneity. In a simulation study, we benchmark the performance of the
proposed VB method against maximum simulated likelihood (MSL) and Markov chain
Monte Carlo (MCMC) methods in terms of parameter recovery, predictive accuracy
and computational efficiency. The simulation study shows that VB can be a fast,
scalable and accurate alternative to MSL and MCMC estimation, especially in
applications in which fast predictions are paramount. VB is observed to be
between 2.8 and 17.7 times faster than the two competing methods, while
affording comparable or superior accuracy. In addition, the simulation study
demonstrates that a parallelised implementation of the MSL estimator with
analytical gradients is a viable alternative to MCMC in terms of both
estimation accuracy and computational efficiency, as the MSL estimator is
observed to be between 0.9 and 2.1 times faster than MCMC.

arXiv link: http://arxiv.org/abs/1905.00419v3

Econometrics arXiv updated paper (originally submitted: 2019-05-01)

Boosting: Why You Can Use the HP Filter

Authors: Peter C. B. Phillips, Zhentao Shi

The Hodrick-Prescott (HP) filter is one of the most widely used econometric
methods in applied macroeconomic research. Like all nonparametric methods, the
HP filter depends critically on a tuning parameter that controls the degree of
smoothing. Yet in contrast to modern nonparametric methods and applied work
with these procedures, empirical practice with the HP filter almost universally
relies on standard settings for the tuning parameter that have been suggested
largely by experimentation with macroeconomic data and heuristic reasoning. As
recent research (Phillips and Jin, 2015) has shown, standard settings may not
be adequate in removing trends, particularly stochastic trends, in economic
data.
This paper proposes an easy-to-implement practical procedure of iterating the
HP smoother that is intended to make the filter a smarter smoothing device for
trend estimation and trend elimination. We call this iterated HP technique the
boosted HP filter in view of its connection to $L_{2}$-boosting in machine
learning. The paper develops limit theory to show that the boosted HP (bHP)
filter asymptotically recovers trend mechanisms that involve unit root
processes, deterministic polynomial drifts, and polynomial drifts with
structural breaks. A stopping criterion is used to automate the iterative HP
algorithm, making it a data-determined method that is ready for modern
data-rich environments in economic research. The methodology is illustrated
using three real data examples that highlight the differences between simple HP
filtering, the data-determined boosted filter, and an alternative
autoregressive approach. These examples show that the bHP filter is helpful in
analyzing a large collection of heterogeneous macroeconomic time series that
manifest various degrees of persistence, trend behavior, and volatility.

arXiv link: http://arxiv.org/abs/1905.00175v3
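
The iteration behind the boosted HP filter is simple enough to sketch directly: repeatedly apply the HP smoother to the remaining cycle and accumulate the trend components. The sketch below uses statsmodels' hpfilter with a fixed number of iterations; the paper's data-driven stopping criterion is not implemented, so the choice of m here is an assumption.

```python
import numpy as np
from statsmodels.tsa.filters.hp_filter import hpfilter

def boosted_hp(y, lamb=1600, m=10):
    """Boosted HP filter: apply the HP smoother m times to the remaining
    cycle and accumulate the trend, so the trend after m passes equals
    y - (I - S)^m y, where S is the HP smoother; m = 1 is the ordinary
    HP filter.  The fixed m stands in for the paper's stopping rule."""
    cycle = np.asarray(y, dtype=float)
    trend = np.zeros_like(cycle)
    for _ in range(m):
        cycle, step = hpfilter(cycle, lamb=lamb)
        trend += step
    return trend, cycle

# trend, cycle = boosted_hp(log_gdp, lamb=1600, m=5)   # lamb=1600 for quarterly data
```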

Econometrics arXiv updated paper (originally submitted: 2019-04-30)

A Factor-Augmented Markov Switching (FAMS) Model

Authors: Gregor Zens, Maximilian Böck

This paper investigates the role of high-dimensional information sets in the
context of Markov switching models with time varying transition probabilities.
Markov switching models are commonly employed in empirical macroeconomic
research and policy work. However, the information used to model the switching
process is usually limited drastically to ensure stability of the model.
Increasing the number of included variables to enlarge the information set
might even result in decreasing precision of the model. Moreover, it is often
not clear a priori which variables are actually relevant when it comes to
informing the switching behavior. Building strongly on recent contributions in
the field of factor analysis, we introduce a general type of Markov switching
autoregressive models for non-linear time series analysis. Large numbers of
time series are allowed to inform the switching process through a factor
structure. This factor-augmented Markov switching (FAMS) model overcomes
estimation issues that are likely to arise in previous assessments of the
modeling framework. The result is more accurate estimates of the switching
behavior as well as improved model fit. The performance of the FAMS model is
illustrated in a simulated data example as well as in a US business cycle application.

arXiv link: http://arxiv.org/abs/1904.13194v2

Econometrics arXiv cross-link from Mathematics – Optimization and Control (math.OC), submitted: 2019-04-29

Fast Mesh Refinement in Pseudospectral Optimal Control

Authors: N. Koeppen, I. M. Ross, L. C. Wilcox, R. J. Proulx

Mesh refinement in pseudospectral (PS) optimal control is embarrassingly easy
--- simply increase the order $N$ of the Lagrange interpolating polynomial and
the mathematics of convergence automates the distribution of the grid points.
Unfortunately, as $N$ increases, the condition number of the resulting linear
algebra increases as $N^2$; hence, spectral efficiency and accuracy are lost in
practice. In this paper, we advance Birkhoff interpolation concepts over an
arbitrary grid to generate well-conditioned PS optimal control discretizations.
We show that the condition number increases only as $N$ in general, but
is independent of $N$ for the special case of one of the boundary points being
fixed. Hence, spectral accuracy and efficiency are maintained as $N$ increases.
The effectiveness of the resulting fast mesh refinement strategy is
demonstrated by using polynomials of order exceeding one thousand to
solve a low-thrust, long-duration orbit transfer problem.

arXiv link: http://arxiv.org/abs/1904.12992v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2019-04-29

Exact Testing of Many Moment Inequalities Against Multiple Violations

Authors: Nick Koning, Paul Bekker

This paper considers the problem of testing many moment inequalities, where
the number of moment inequalities ($p$) is possibly larger than the sample size
($n$). Chernozhukov et al. (2019) proposed asymptotic tests for this problem
using the maximum $t$ statistic. We observe that such tests can have low power
if multiple inequalities are violated. As an alternative, we propose novel
randomization tests based on a maximum non-negatively weighted combination of
$t$ statistics. We provide a condition guaranteeing size control in large
samples. Simulations show that the tests control size in small samples ($n =
30$, $p = 1000$), and often have substantially higher power against alternatives
with multiple violations than tests based on the maximum $t$ statistic.

arXiv link: http://arxiv.org/abs/1904.12775v3
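
A stripped-down version of the randomization idea can be sketched as follows. Under symmetry of the observations about zero (the least favourable null), flipping the sign of each observation leaves the null distribution unchanged, so recomputing the test statistic on sign-flipped data gives an exact reference distribution. The statistic below is the plain maximum t-statistic; the paper's statistic is a maximum over non-negatively weighted combinations of t-statistics, which this sketch does not implement.

```python
import numpy as np

def sign_flip_test(X, stat, n_draws=999, seed=0):
    """Randomization p-value for H0: E[X_j] <= 0 for all j, based on
    random sign flips of the rows of the (n, p) data matrix X."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    t_obs = stat(X)
    draws = np.array([stat(rng.choice([-1.0, 1.0], size=(n, 1)) * X)
                      for _ in range(n_draws)])
    return (1 + np.sum(draws >= t_obs)) / (1 + n_draws)

def max_t(X):
    # Simplified statistic: the largest of the p t-statistics.  The paper's
    # statistic maximizes over non-negatively weighted combinations instead.
    n = X.shape[0]
    return np.max(np.sqrt(n) * X.mean(axis=0) / X.std(axis=0, ddof=1))

# p_value = sign_flip_test(X, max_t)   # X is an (n, p) array of moment data
```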

Econometrics arXiv updated paper (originally submitted: 2019-04-27)

Working women and caste in India: A study of social disadvantage using feature attribution

Authors: Kuhu Joshi, Chaitanya K. Joshi

Women belonging to the socially disadvantaged caste-groups in India have
historically been engaged in labour-intensive, blue-collar work. We study
whether there has been any change in the ability to predict a woman's
work-status and work-type based on her caste by interpreting machine learning
models using feature attribution. We find that caste is now a less important
determinant of work for the younger generation of women compared to the older
generation. Moreover, younger women from disadvantaged castes are now more
likely to be working in white-collar jobs.

arXiv link: http://arxiv.org/abs/1905.03092v2

Econometrics arXiv updated paper (originally submitted: 2019-04-25)

Nonparametric Estimation and Inference in Economic and Psychological Experiments

Authors: Raffaello Seri, Samuele Centorrino, Michele Bernasconi

The goal of this paper is to provide some tools for nonparametric estimation
and inference in psychological and economic experiments. We consider an
experimental framework in which each of $n$ subjects provides $T$ responses to a
vector of $T$ stimuli. We propose to estimate the unknown function $f$ linking
stimuli to responses through a nonparametric sieve estimator. We give
conditions for consistency when either $n$ or $T$ or both diverge. The rate of
convergence depends upon the error covariance structure, that is allowed to
differ across subjects. With these results we derive the optimal divergence
rate of the dimension of the sieve basis with both $n$ and $T$. We provide
guidance about the optimal balance between the number of subjects and questions
in a laboratory experiment and argue that a large $n$ is often better than a
large $T$. We derive conditions for asymptotic normality of functionals of the
estimator of $f$ and apply them to obtain the asymptotic distribution of the
Wald test when the number of constraints under the null is finite and when it
diverges along with other asymptotic parameters. Lastly, we investigate the
previous properties when the conditional covariance matrix is replaced by an
estimator.

arXiv link: http://arxiv.org/abs/1904.11156v3
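
A bare-bones version of the pooled sieve regression can be written in a few lines: stack all (stimulus, response) pairs and project the responses on a polynomial basis whose dimension K plays the role of the sieve dimension. The basis choice and the rule for growing K with n and T are assumptions of this illustration, not the paper's recommendations.

```python
import numpy as np

def sieve_fit(x, y, K):
    """Least-squares sieve estimate of f: regress pooled responses y on a
    polynomial basis of degree K evaluated at the pooled stimuli x."""
    B = np.vander(x, K + 1, increasing=True)            # basis: 1, x, ..., x^K
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    return lambda z: np.vander(np.atleast_1d(z), K + 1, increasing=True) @ coef

# f_hat = sieve_fit(stimuli, responses, K=5); f_hat(0.3)
```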

Econometrics arXiv paper, submitted: 2019-04-25

Forecasting in Big Data Environments: an Adaptable and Automated Shrinkage Estimation of Neural Networks (AAShNet)

Authors: Ali Habibnia, Esfandiar Maasoumi

This paper considers improved forecasting in possibly nonlinear dynamic
settings, with high-dimensional predictors ("big data" environments). To overcome
the curse of dimensionality and manage data and model complexity, we examine
shrinkage estimation of a back-propagation algorithm of a deep neural net with
skip-layer connections. We expressly include both linear and nonlinear
components. This is a high-dimensional learning approach including both
sparsity L1 and smoothness L2 penalties, allowing high-dimensionality and
nonlinearity to be accommodated in one step. This approach selects significant
predictors as well as the topology of the neural network. We estimate optimal
values of shrinkage hyperparameters by incorporating a gradient-based
optimization technique resulting in robust predictions with improved
reproducibility. The latter has been an issue in some approaches. The method is
statistically interpretable and unravels some of the network structure commonly
left in a black box. An additional advantage is that the nonlinear part tends to get
pruned if the underlying process is linear. In an application to forecasting
equity returns, the proposed approach captures nonlinear dynamics between
equities to enhance forecast performance. It offers an appreciable improvement
over current univariate and multivariate models by RMSE and actual portfolio
performance.

arXiv link: http://arxiv.org/abs/1904.11145v1
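
The model class described above, a linear skip-layer part plus a nonlinear part regularized with both L1 and L2 penalties, can be sketched in PyTorch. This is an illustrative toy, not the authors' architecture or training procedure: the hidden size, the penalty placement (here on all parameters), and the hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

class SkipLayerNet(nn.Module):
    """Linear (skip-layer) part plus a small nonlinear part (hypothetical sizes)."""
    def __init__(self, p, hidden=16):
        super().__init__()
        self.linear = nn.Linear(p, 1)                       # skip-layer connection
        self.nonlinear = nn.Sequential(nn.Linear(p, hidden), nn.ReLU(),
                                       nn.Linear(hidden, 1))

    def forward(self, x):
        return self.linear(x) + self.nonlinear(x)

def penalized_loss(model, x, y, lam1=1e-3, lam2=1e-3):
    # Squared-error fit plus an L1 (sparsity) and an L2 (smoothness) penalty;
    # placing the penalties on all parameters is purely for illustration.
    fit = nn.functional.mse_loss(model(x), y)
    l1 = sum(w.abs().sum() for w in model.parameters())
    l2 = sum((w ** 2).sum() for w in model.parameters())
    return fit + lam1 * l1 + lam2 * l2

# net = SkipLayerNet(p=50); loss = penalized_loss(net, x, y)  # x: (n, 50), y: (n, 1)
```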

Econometrics arXiv updated paper (originally submitted: 2019-04-25)

Identification of Regression Models with a Misclassified and Endogenous Binary Regressor

Authors: Hiroyuki Kasahara, Katsumi Shimotsu

We study identification in nonparametric regression models with a
misclassified and endogenous binary regressor when an instrument is correlated
with misclassification error. We show that the regression function is
nonparametrically identified if one binary instrument variable and one binary
covariate satisfy the following conditions. The instrumental variable corrects
endogeneity; the instrumental variable must be correlated with the unobserved
true underlying binary variable, must be uncorrelated with the error term in
the outcome equation, but is allowed to be correlated with the
misclassification error. The covariate corrects misclassification; this
variable can be one of the regressors in the outcome equation, must be
correlated with the unobserved true underlying binary variable, and must be
uncorrelated with the misclassification error. We also propose a mixture-based
framework for modeling unobserved heterogeneous treatment effects with a
misclassified and endogenous binary regressor and show that treatment effects
can be identified if the true treatment effect is related to an observed
regressor and another observable variable.

arXiv link: http://arxiv.org/abs/1904.11143v3

Econometrics arXiv updated paper (originally submitted: 2019-04-24)

Normal Approximation in Large Network Models

Authors: Michael P. Leung, Hyungsik Roger Moon

We prove a central limit theorem for network formation models with strategic
interactions and homophilous agents. Since data often consists of observations
on a single large network, we consider an asymptotic framework in which the
network size diverges. We argue that a modification of “stabilization”
conditions from the literature on geometric graphs provides a useful high-level
formulation of weak dependence which we utilize to establish an abstract
central limit theorem. Using results in branching process theory, we derive
interpretable primitive conditions for stabilization. The main conditions
restrict the strength of strategic interactions and equilibrium selection
mechanism. We discuss practical inference procedures justified by our results.

arXiv link: http://arxiv.org/abs/1904.11060v7

Econometrics arXiv updated paper (originally submitted: 2019-04-19)

Average Density Estimators: Efficiency and Bootstrap Consistency

Authors: Matias D. Cattaneo, Michael Jansson

This paper highlights a tension between semiparametric efficiency and
bootstrap consistency in the context of a canonical semiparametric estimation
problem, namely the problem of estimating the average density. It is shown that
although simple plug-in estimators suffer from bias problems preventing them
from achieving semiparametric efficiency under minimal smoothness conditions,
the nonparametric bootstrap automatically corrects for this bias and that, as a
result, these seemingly inferior estimators achieve bootstrap consistency under
minimal smoothness conditions. In contrast, several "debiased" estimators that
achieve semiparametric efficiency under minimal smoothness conditions do not
achieve bootstrap consistency under those same conditions.

arXiv link: http://arxiv.org/abs/1904.09372v2
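
To make the objects concrete, the snippet below computes the simple plug-in estimator of the average density, the sample mean of a kernel density estimate evaluated at the data points, together with a nonparametric percentile bootstrap interval. Bandwidth choice, leave-one-out corrections, and the debiased estimators discussed in the paper are all omitted, so this is only a sketch of the quantities involved.

```python
import numpy as np
from scipy.stats import gaussian_kde

def avg_density(x):
    # Plug-in estimator of E[f(X)]: average the kernel density estimate
    # at the sample points (no leave-one-out, default bandwidth).
    return gaussian_kde(x)(x).mean()

def bootstrap_ci(x, n_boot=999, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    boot = np.array([avg_density(rng.choice(x, size=x.size, replace=True))
                     for _ in range(n_boot)])
    return np.quantile(boot, [alpha / 2, 1 - alpha / 2])

# ci = bootstrap_ci(sample)   # sample: 1-D array of observations
```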

Econometrics arXiv paper, submitted: 2019-04-19

Location-Sector Analysis of International Profit Shifting on a Multilayer Ownership-Tax Network

Authors: Tembo Nakamoto, Odile Rouhban, Yuichi Ikeda

Currently all countries including developing countries are expected to
utilize their own tax revenues and carry out their own development for solving
poverty in their countries. However, developing countries cannot earn tax
revenues like developed countries partly because they do not have effective
countermeasures against international tax avoidance. Our analysis focuses on
treaty shopping among various ways to conduct international tax avoidance
because tax revenues of developing countries have been heavily damaged through
treaty shopping. To analyze the location and sector of conduit firms likely to
be used for treaty shopping, we constructed a multilayer ownership-tax network
and proposed multilayer centrality. Because multilayer centrality can consider
not only the value flowing in the ownership network but also the withholding tax
rate, it is expected to grasp precisely the locations and sectors of conduit
firms established for the purpose of treaty shopping. Our analysis shows that
firms in sectors such as Finance & Insurance and Wholesale & Retail Trade are
involved in treaty shopping. We suggest that developing countries include a
clause focusing on these sectors in the tax treaties they conclude.

arXiv link: http://arxiv.org/abs/1904.09165v1

Econometrics arXiv paper, submitted: 2019-04-18

Ridge regularization for Mean Squared Error Reduction in Regression with Weak Instruments

Authors: Karthik Rajkumar

In this paper, I show that classic two-stage least squares (2SLS) estimates
are highly unstable with weak instruments. I propose a ridge estimator (ridge
IV) and show that it is asymptotically normal even with weak instruments,
whereas 2SLS is severely distorted and unbounded. I motivate the ridge IV
estimator as a convex optimization problem with a GMM objective function and an
L2 penalty. I show that ridge IV leads to sizable mean squared error reductions
theoretically and validate these results in a simulation study inspired by data
designs of papers published in the American Economic Review.

arXiv link: http://arxiv.org/abs/1904.08580v1
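
The ridge IV estimator has a closed form once the GMM objective and the L2 penalty are written down, which the sketch below implements with a projection-matrix weighting; setting lam = 0 recovers 2SLS. The weighting matrix and the value of the penalty parameter are illustrative assumptions rather than the paper's recommendations.

```python
import numpy as np

def ridge_iv(y, X, Z, lam):
    """Ridge IV sketch: minimize (y - Xb)' P_Z (y - Xb) + lam * ||b||^2 with
    P_Z = Z (Z'Z)^{-1} Z'.  Setting lam = 0 reproduces 2SLS."""
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)                # projection onto Z
    A = X.T @ P @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ P @ y)

# beta_hat = ridge_iv(y, X, Z, lam=1.0)   # lam chosen for illustration only
```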

Econometrics arXiv paper, submitted: 2019-04-17

Sharp Bounds for the Marginal Treatment Effect with Sample Selection

Authors: Vitor Possebom

I analyze treatment effects in situations when agents endogenously select
into the treatment group and into the observed sample. As a theoretical
contribution, I propose pointwise sharp bounds for the marginal treatment
effect (MTE) of interest within the always-observed subpopulation under
monotonicity assumptions. Moreover, I impose an extra mean dominance assumption
to tighten the previous bounds. I further discuss how to identify those bounds
when the support of the propensity score is either continuous or discrete.
Using these results, I estimate bounds for the MTE of the Job Corps Training
Program on hourly wages for the always-employed subpopulation and find that it
is decreasing in the likelihood of attending the program within the
Non-Hispanic group. For example, the Average Treatment Effect on the Treated is
between $.33 and $.99 while the Average Treatment Effect on the Untreated is
between $.71 and $3.00.

arXiv link: http://arxiv.org/abs/1904.08522v1

Econometrics arXiv updated paper (originally submitted: 2019-04-17)

A Generalized Continuous-Multinomial Response Model with a t-distributed Error Kernel

Authors: Subodh Dubey, Prateek Bansal, Ricardo A. Daziano, Erick Guerra

In multinomial response models, idiosyncratic variations in the indirect
utility are generally modeled using Gumbel or normal distributions. This study
makes a strong case to substitute these thin-tailed distributions with a
t-distribution. First, we demonstrate that a model with a t-distributed error
kernel better estimates and predicts preferences, especially in
class-imbalanced datasets. Our proposed specification also implicitly accounts
for decision-uncertainty behavior, i.e. the degree of certainty that
decision-makers hold in their choices relative to the variation in the indirect
utility of any alternative. Second, after applying a t-distributed error kernel
in a multinomial response model for the first time, we extend this
specification to a generalized continuous-multinomial (GCM) model and derive
its full-information maximum likelihood estimator. The likelihood involves an
open-form expression of the cumulative distribution function of the multivariate
t-distribution, which we propose to compute using a combination of the
composite marginal likelihood method and the separation-of-variables approach.
Third, we establish finite sample properties of the GCM model with a
t-distributed error kernel (GCM-t) and highlight its superiority over the GCM
model with a normally-distributed error kernel (GCM-N) in a Monte Carlo study.
Finally, we compare GCM-t and GCM-N in an empirical setting related to
preferences for electric vehicles (EVs). We observe that accounting for
decision-uncertainty behavior in GCM-t results in lower elasticity estimates
and a higher willingness to pay for improving the EV attributes than those of
the GCM-N model. These differences are relevant in making policies to expedite
the adoption of EVs.

arXiv link: http://arxiv.org/abs/1904.08332v3

Econometrics arXiv updated paper (originally submitted: 2019-04-15)

Subgeometric ergodicity and $β$-mixing

Authors: Mika Meitz, Pentti Saikkonen

It is well known that stationary geometrically ergodic Markov chains are
$\beta$-mixing (absolutely regular) with geometrically decaying mixing
coefficients. Furthermore, for initial distributions other than the stationary
one, geometric ergodicity implies $\beta$-mixing under suitable moment
assumptions. In this note we show that similar results hold also for
subgeometrically ergodic Markov chains. In particular, for both stationary and
other initial distributions, subgeometric ergodicity implies $\beta$-mixing
with subgeometrically decaying mixing coefficients. Although this result is
simple it should prove very useful in obtaining rates of mixing in situations
where geometric ergodicity cannot be established. To illustrate our results we
derive new subgeometric ergodicity and $\beta$-mixing results for the
self-exciting threshold autoregressive model.

arXiv link: http://arxiv.org/abs/1904.07103v2

Econometrics arXiv updated paper (originally submitted: 2019-04-15)

Subgeometrically ergodic autoregressions

Authors: Mika Meitz, Pentti Saikkonen

In this paper we discuss how the notion of subgeometric ergodicity in Markov
chain theory can be exploited to study stationarity and ergodicity of nonlinear
time series models. Subgeometric ergodicity means that the transition
probability measures converge to the stationary measure at a rate slower than
geometric. Specifically, we consider suitably defined higher-order nonlinear
autoregressions that behave similarly to a unit root process for large values
of the observed series but we place almost no restrictions on their dynamics
for moderate values of the observed series. Results on the subgeometric
ergodicity of nonlinear autoregressions have previously appeared only in the
first-order case. We provide an extension to the higher-order case and show
that the autoregressions we consider are, under appropriate conditions,
subgeometrically ergodic. As useful implications we also obtain stationarity
and $\beta$-mixing with subgeometrically decaying mixing coefficients.

arXiv link: http://arxiv.org/abs/1904.07089v3

Econometrics arXiv paper, submitted: 2019-04-15

Estimation of Cross-Sectional Dependence in Large Panels

Authors: Jiti Gao, Guangming Pan, Yanrong Yang, Bo Zhang

Accurate estimation of the extent of cross-sectional dependence in large panel
data analysis is paramount to further statistical analysis of the data under
study. Grouping more data with weak relations (cross-sectional dependence)
together often results in less efficient dimension reduction and worse
forecasting. This paper describes cross-sectional dependence among a large
number of objects (time series) via a factor model and parameterizes its extent
in terms of the strength of factor loadings. A new joint estimation method,
benefiting from the unique feature of dimension reduction for high-dimensional time
series, is proposed for the parameter representing the extent and some other
parameters involved in the estimation procedure. Moreover, a joint asymptotic
distribution for a pair of estimators is established. Simulations illustrate
the effectiveness of the proposed estimation method in the finite sample
performance. Applications in cross-country macro-variables and stock returns
from S&P 500 are studied.

arXiv link: http://arxiv.org/abs/1904.06843v1

Econometrics arXiv updated paper (originally submitted: 2019-04-14)

Peer Effects in Random Consideration Sets

Authors: Nail Kashaev, Natalia Lazzati

We develop a dynamic model of discrete choice that incorporates peer effects
into random consideration sets. We characterize the equilibrium behavior and
study the empirical content of the model. In our setup, changes in the choices
of friends affect the distribution of the consideration sets. We exploit this
variation to recover the ranking of preferences, attention mechanisms, and
network connections. These nonparametric identification results allow
unrestricted heterogeneity across people and do not rely on the variation of
either covariates or the set of available options. Our methodology leads to a
maximum-likelihood estimator that performs well in simulations. We apply our
results to an experimental dataset that has been designed to study the visual
focus of attention.

arXiv link: http://arxiv.org/abs/1904.06742v3

Econometrics arXiv updated paper (originally submitted: 2019-04-14)

Complex Network Construction of Internet Financial risk

Authors: Runjie Xu, Chuanmin Mi, Rafal Mierzwiak, Runyu Meng

Internet finance is a new financial model that applies Internet technology to
payment, capital borrowing and lending and transaction processing. In order to
study the internal risks, this paper uses the Internet financial risk elements
as the network node to construct the complex network of Internet financial risk
system. Different from the study of macroeconomic shocks and financial
institution data, this paper mainly adopts the perspective of complex system to
analyze the systematic risk of Internet finance. By dividing the entire
financial system into Internet financial subnet, regulatory subnet and
traditional financial subnet, the paper discusses the contagion relationships
among different risk factors, and concludes that risks
are transmitted externally through the internal circulation of Internet
finance, thus discovering potential hidden dangers of systemic risks. The
results show that the nodes around the center of the whole system are the main
objects of financial risk contagion in the Internet financial network. In
addition, macro-prudential regulation plays a decisive role in the control of
the Internet financial system, and the analysis points out the reasons why
current regulatory measures are still limited. This paper summarizes a research
model that is still in its infancy, in the hope of opening up new prospects and
directions for understanding the cascading behavior of Internet financial risks.

arXiv link: http://arxiv.org/abs/1904.06640v3

Econometrics arXiv cross-link from Statistics – Machine Learning (stat.ML), submitted: 2019-04-13

Pólya-Gamma Data Augmentation to address Non-conjugacy in the Bayesian Estimation of Mixed Multinomial Logit Models

Authors: Prateek Bansal, Rico Krueger, Michel Bierlaire, Ricardo A. Daziano, Taha H. Rashidi

The standard Gibbs sampler for Mixed Multinomial Logit (MMNL) models involves
sampling from conditional densities of utility parameters using the
Metropolis-Hastings (MH) algorithm, due to the unavailability of a conjugate
prior for the logit kernel. To address this non-conjugacy concern, we propose
the application of the P\'olya-Gamma data augmentation (PG-DA) technique for
MMNL estimation. The posterior estimates of the augmented and the default Gibbs
sampler are similar for the two-alternative scenario (binary choice), but we
encounter empirical
identification issues in the case of more alternatives ($J \geq 3$).

arXiv link: http://arxiv.org/abs/1904.07688v1

Econometrics arXiv updated paper (originally submitted: 2019-04-12)

Distribution Regression in Duration Analysis: an Application to Unemployment Spells

Authors: Miguel A. Delgado, Andrés García-Suaza, Pedro H. C. Sant'Anna

This article proposes inference procedures for distribution regression models
in duration analysis using randomly right-censored data. This generalizes
classical duration models by allowing situations where explanatory variables'
marginal effects freely vary with duration time. The article discusses
applications to testing uniform restrictions on the varying coefficients,
inferences on average marginal effects, and others involving conditional
distribution estimates. Finite sample properties of the proposed method are
studied by means of Monte Carlo experiments. Finally, we apply our proposal to
study the effects of unemployment benefits on unemployment duration.

arXiv link: http://arxiv.org/abs/1904.06185v2

Econometrics arXiv paper, submitted: 2019-04-11

Identification of Noncausal Models by Quantile Autoregressions

Authors: Alain Hecq, Li Sun

We propose a model selection criterion to distinguish purely causal from purely
noncausal models in the framework of quantile autoregressions (QAR). We also
present asymptotics for the i.i.d. case with regularly varying distributed
innovations in QAR. This new modelling perspective is appealing for
investigating the presence of bubbles in economic and financial time series,
and is an alternative to approximate maximum likelihood methods. We illustrate
our analysis using hyperinflation episodes in Latin American countries.

arXiv link: http://arxiv.org/abs/1904.05952v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2019-04-10

On the construction of confidence intervals for ratios of expectations

Authors: Alexis Derumigny, Lucas Girard, Yannick Guyonvarch

In econometrics, many parameters of interest can be written as ratios of
expectations. The main approach to construct confidence intervals for such
parameters is the delta method. However, this asymptotic procedure yields
intervals that may not be relevant for small sample sizes or, more generally,
in a sequence-of-model framework that allows the expectation in the denominator
to decrease to $0$ with the sample size. In this setting, we prove a
generalization of the delta method for ratios of expectations and the
consistency of the nonparametric percentile bootstrap. We also investigate
finite-sample inference and show a partial impossibility result: nonasymptotic
uniform confidence intervals can be built for ratios of expectations but not at
every level. Based on this, we propose an easy-to-compute index to appraise the
reliability of the intervals based on the delta method. Simulations and an
application illustrate our results and the practical usefulness of our rule of
thumb.

arXiv link: http://arxiv.org/abs/1904.07111v1
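
The two procedures compared in the paper are easy to state in code: the delta-method interval, whose variance uses the gradient (1/mu_Y, -mu_X/mu_Y^2) of the ratio, and the nonparametric percentile bootstrap. The sketch below is the textbook version of both and does not include the paper's reliability index or its sequence-of-models analysis.

```python
import numpy as np
from scipy.stats import norm

def ratio_ci_delta(x, y, alpha=0.05):
    # Delta-method interval for E[X]/E[Y] with gradient (1/mu_y, -mu_x/mu_y^2).
    n = x.size
    mx, my = x.mean(), y.mean()
    grad = np.array([1.0 / my, -mx / my ** 2])
    sigma = np.cov(np.vstack([x, y]))                    # 2 x 2 sample covariance
    se = np.sqrt(grad @ sigma @ grad / n)
    z = norm.ppf(1 - alpha / 2)
    theta = mx / my
    return theta - z * se, theta + z * se

def ratio_ci_bootstrap(x, y, n_boot=999, alpha=0.05, seed=0):
    # Nonparametric percentile bootstrap for the same ratio.
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, x.size, size=(n_boot, x.size))
    boot = x[idx].mean(axis=1) / y[idx].mean(axis=1)
    return tuple(np.quantile(boot, [alpha / 2, 1 - alpha / 2]))
```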

Econometrics arXiv updated paper (originally submitted: 2019-04-10)

Solving Dynamic Discrete Choice Models Using Smoothing and Sieve Methods

Authors: Dennis Kristensen, Patrick K. Mogensen, Jong Myun Moon, Bertel Schjerning

We propose to combine smoothing, simulations and sieve approximations to
solve for either the integrated or expected value function in a general class
of dynamic discrete choice (DDC) models. We use importance sampling to
approximate the Bellman operators defining the two functions. The random
Bellman operators, and therefore also the corresponding solutions, are
generally non-smooth which is undesirable. To circumvent this issue, we
introduce a smoothed version of the random Bellman operator and solve for the
corresponding smoothed value function using sieve methods. We show that one can
avoid using sieves by generalizing and adapting the `self-approximating' method
of Rust (1997) to our setting. We provide an asymptotic theory for the
approximate solutions and show that they converge at a root-$N$ rate, where $N$
is the number of Monte Carlo draws, towards Gaussian processes. We examine their
performance in practice through a set of numerical experiments and find that
both methods perform well with the sieve method being particularly attractive
in terms of computational speed and accuracy.

arXiv link: http://arxiv.org/abs/1904.05232v2

Econometrics arXiv updated paper (originally submitted: 2019-04-10)

Local Polynomial Estimation of Time-Varying Parameters in Nonlinear Models

Authors: Dennis Kristensen, Young Jun Lee

We develop a novel asymptotic theory for local polynomial extremum estimators
of time-varying parameters in a broad class of nonlinear time series models. We
show the proposed estimators are consistent and follow normal distributions in
large samples under weak conditions. We also provide a precise characterisation
of the leading bias term due to smoothing, which has not been done before. We
demonstrate the usefulness of our general results by establishing primitive
conditions for local (quasi-)maximum-likelihood estimators of time-varying
models, including threshold autoregressions, ARCH models and Poisson
autoregressions with exogenous covariates, to be normally distributed in large
samples and
characterise their leading biases. An empirical study of US corporate default
counts demonstrates the applicability of the proposed local linear estimator
for Poisson autoregression, shedding new light on the dynamic properties of US
corporate defaults.

arXiv link: http://arxiv.org/abs/1904.05209v3
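
For the least-squares special case, the local polynomial idea reduces to kernel-weighted regressions around each rescaled time point. The sketch below estimates a time-varying coefficient b(t/T) in a linear model by local-linear weighted least squares; the Gaussian kernel and the fixed bandwidth are illustrative choices, and the paper's general extremum-estimation setting and bias characterisation are not reproduced.

```python
import numpy as np

def local_linear_tv(y, x, bandwidth):
    """Local-linear estimate of a time-varying coefficient in
    y_t = b(t/T) * x_t + e_t (least-squares special case)."""
    T = y.size
    u = np.arange(T) / T
    est = np.empty(T)
    for t0 in range(T):
        d = u - u[t0]
        w = np.exp(-0.5 * (d / bandwidth) ** 2)          # Gaussian kernel weights
        R = np.column_stack([x, x * d])                  # local-linear design
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(sw[:, None] * R, sw * y, rcond=None)
        est[t0] = coef[0]                                # local level of b at u[t0]
    return est

# b_hat = local_linear_tv(y, x, bandwidth=0.1)
```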

Econometrics arXiv updated paper (originally submitted: 2019-04-08)

Fixed Effects Binary Choice Models: Estimation and Inference with Long Panels

Authors: Daniel Czarnowske, Amrei Stammann

Empirical economists are often deterred from the application of fixed effects
binary choice models mainly for two reasons: the incidental parameter problem
and the computational challenge even in moderately large panels. Using the
example of binary choice models with individual and time fixed effects, we show
how both issues can be alleviated by combining asymptotic bias corrections with
computational advances. Because unbalancedness is often encountered in applied
work, we investigate its consequences on the finite sample properties of
various (bias corrected) estimators. In simulation experiments we find that
analytical bias corrections perform particularly well, whereas split-panel
jackknife estimators can be severely biased in unbalanced panels.

arXiv link: http://arxiv.org/abs/1904.04217v3

Econometrics arXiv cross-link from Statistics – Machine Learning (stat.ML), submitted: 2019-04-07

Bayesian Estimation of Mixed Multinomial Logit Models: Advances and Simulation-Based Evaluations

Authors: Prateek Bansal, Rico Krueger, Michel Bierlaire, Ricardo A. Daziano, Taha H. Rashidi

Variational Bayes (VB) methods have emerged as a fast and
computationally-efficient alternative to Markov chain Monte Carlo (MCMC)
methods for scalable Bayesian estimation of mixed multinomial logit (MMNL)
models. It has been established that VB is substantially faster than MCMC, with
practically no compromise in predictive accuracy. In this paper, we address
two critical gaps concerning the usage and understanding of VB for MMNL. First,
extant VB methods are limited to utility specifications involving only
individual-specific taste parameters. Second, the finite-sample properties of
VB estimators and the relative performance of VB, MCMC and maximum simulated
likelihood estimation (MSLE) are not known. To address the former, this study
extends several VB methods for MMNL to admit utility specifications including
both fixed and random utility parameters. To address the latter, we conduct an
extensive simulation-based evaluation to benchmark the extended VB methods
against MCMC and MSLE in terms of estimation times, parameter recovery and
predictive accuracy. The results suggest that all VB variants, with the
exception of the ones relying on an alternative variational lower bound
constructed with the help of the modified Jensen's inequality, perform as well
as MCMC and MSLE at prediction and parameter recovery. In particular, VB with
nonconjugate variational message passing and the delta-method (VB-NCVMP-Delta)
is up to 16 times faster than MCMC and MSLE. Thus, VB-NCVMP-Delta can be an
attractive alternative to MCMC and MSLE for fast, scalable and accurate
estimation of MMNL models.

arXiv link: http://arxiv.org/abs/1904.03647v4

Econometrics arXiv cross-link from Economic Theory (econ.TH), submitted: 2019-04-05

Second-order Inductive Inference: an axiomatic approach

Authors: Patrick H. O'Callaghan

Consider a predictor who ranks eventualities on the basis of past cases: for
instance a search engine ranking webpages given past searches. Resampling past
cases leads to different rankings and the extraction of deeper information. Yet
a rich database, with sufficiently diverse rankings, is often beyond reach.
Inexperience demands either "on the fly" learning-by-doing or prudence: the
arrival of a novel case does not force (i) a revision of current rankings, (ii)
dogmatism towards new rankings, or (iii) intransitivity. For this higher-order
framework of inductive inference, we derive a suitably unique numerical
representation of these rankings via a matrix on eventualities x cases and
describe a robust test of prudence. Applications include: the success/failure
of startups; the veracity of fake news; and novel conditions for the existence
of a yield curve that is robustly arbitrage-free.

arXiv link: http://arxiv.org/abs/1904.02934v5

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2019-04-02

Synthetic learner: model-free inference on treatments over time

Authors: Davide Viviano, Jelena Bradic

Understanding the effect of a particular treatment or a policy pertains to
many areas of interest, ranging from political economics and marketing to
healthcare. In this paper, we develop a non-parametric algorithm for detecting
the effects of treatment over time in the context of Synthetic Controls. The
method builds on counterfactual predictions from many algorithms without
necessarily assuming that the algorithms correctly capture the model. We
introduce an inferential procedure for detecting treatment effects and show
that the testing procedure is asymptotically valid for stationary, beta mixing
processes without imposing any restriction on the set of base algorithms under
consideration. We discuss consistency guarantees for average treatment effect
estimates and derive regret bounds for the proposed methodology. The class of
algorithms may include Random Forest, Lasso, or any other machine-learning
estimator. Numerical studies and an application illustrate the advantages of
the method.

arXiv link: http://arxiv.org/abs/1904.01490v2

Econometrics arXiv updated paper (originally submitted: 2019-04-02)

Matching Points: Supplementing Instruments with Covariates in Triangular Models

Authors: Junlong Feng

Models with a discrete endogenous variable are typically underidentified when
the instrument takes on too few values. This paper presents a new method that
matches pairs of covariates and instruments to restore point identification in
this scenario in a triangular model. The model consists of a structural
function for a continuous outcome and a selection model for the discrete
endogenous variable. The structural outcome function must be continuous and
monotonic in a scalar disturbance, but it can be nonseparable. The selection
model allows for unrestricted heterogeneity. Global identification is obtained
under weak conditions. The paper also provides estimators of the structural
outcome function. Two empirical examples of the return to education and
selection into Head Start illustrate the value and limitations of the method.

arXiv link: http://arxiv.org/abs/1904.01159v3

Econometrics arXiv updated paper (originally submitted: 2019-04-01)

Dynamically Optimal Treatment Allocation

Authors: Karun Adusumilli, Friedrich Geiecke, Claudio Schilter

Dynamic decisions are pivotal to economic policy making. We show how existing
evidence from randomized control trials can be utilized to guide personalized
decisions in challenging dynamic environments with budget and capacity
constraints. Recent advances in reinforcement learning now enable the solution
of many complex, real-world problems for the first time. We allow for
restricted classes of policy functions and prove that their regret decays at
rate n^(-0.5), the same as in the static case. Applying our methods to job
training, we find that by exploiting the problem's dynamic structure, we
achieve significantly higher welfare compared to static approaches.

arXiv link: http://arxiv.org/abs/1904.01047v5

Econometrics arXiv updated paper (originally submitted: 2019-04-01)

Counterfactual Sensitivity and Robustness

Authors: Timothy Christensen, Benjamin Connault

We propose a framework for analyzing the sensitivity of counterfactuals to
parametric assumptions about the distribution of latent variables in structural
models. In particular, we derive bounds on counterfactuals as the distribution
of latent variables spans nonparametric neighborhoods of a given parametric
specification while other "structural" features of the model are maintained.
Our approach recasts the infinite-dimensional problem of optimizing the
counterfactual with respect to the distribution of latent variables (subject to
model constraints) as a finite-dimensional convex program. We also develop an
MPEC version of our method to further simplify computation in models with
endogenous parameters (e.g., value functions) defined by equilibrium
constraints. We propose plug-in estimators of the bounds and two methods for
inference. We also show that our bounds converge to the sharp nonparametric
bounds on counterfactuals as the neighborhood size becomes large. To illustrate
the broad applicability of our procedure, we present empirical applications to
matching models with transferable utility and dynamic discrete choice models.

arXiv link: http://arxiv.org/abs/1904.00989v4

Econometrics arXiv updated paper (originally submitted: 2019-03-30)

Post-Selection Inference in Three-Dimensional Panel Data

Authors: Harold D. Chiang, Joel Rodrigue, Yuya Sasaki

Three-dimensional panel models are widely used in empirical analysis.
Researchers use various combinations of fixed effects for three-dimensional
panels. When one imposes a parsimonious model and the true model is rich, then
it incurs mis-specification biases. When one employs a rich model and the true
model is parsimonious, then it incurs larger standard errors than necessary. It
is therefore useful for researchers to know correct models. In this light, Lu,
Miao, and Su (2018) propose methods of model selection. We advance this
literature by proposing a method of post-selection inference for regression
parameters. Despite our use of the lasso technique as means of model selection,
our assumptions allow for many and even all fixed effects to be nonzero.
Simulation studies demonstrate that the proposed method is more precise than
under-fitting fixed effect estimators, is more efficient than over-fitting
fixed effect estimators, and allows for as accurate inference as the oracle
estimator.

arXiv link: http://arxiv.org/abs/1904.00211v2

Econometrics arXiv updated paper (originally submitted: 2019-03-29)

Simple subvector inference on sharp identified set in affine models

Authors: Bulat Gafarov

This paper studies a regularized support function estimator for bounds on
components of the parameter vector in the case in which the identified set is a
polygon. The proposed regularized estimator has three important properties: (i)
it has a uniform asymptotic Gaussian limit in the presence of flat faces in the
absence of redundant (or overidentifying) constraints (or vice versa); (ii) the
bias from regularization does not enter the first-order limiting distribution;
(iii) the estimator remains consistent for the sharp (non-enlarged) identified
set for the individual components even in the non-regular case. These properties
are used to construct uniformly valid confidence sets for an element
$\theta_{1}$ of a parameter vector $\theta \in R^{d}$ that is partially
identified by affine moment equality and inequality conditions. The proposed
confidence sets can be computed as a solution to a small number of linear and
convex quadratic programs, leading to a substantial decrease in computation
time and guaranteeing a global optimum. As a result, the method provides a
uniformly valid inference in applications in which the dimension of the
parameter space, $d$, and the number of inequalities, $k$, were previously
computationally infeasible ($d,k=100$). The proposed approach can be extended
to construct confidence sets for intersection bounds, to construct joint
polygon-shaped confidence sets for multiple components of $\theta$, and to find
the set of solutions to a linear program. Inference for coefficients in the
linear IV regression model with an interval outcome is used as an illustrative
example.

arXiv link: http://arxiv.org/abs/1904.00111v3

Econometrics arXiv updated paper (originally submitted: 2019-03-26)

Testing for Differences in Stochastic Network Structure

Authors: Eric Auerbach

How can one determine whether a community-level treatment, such as the
introduction of a social program or trade shock, alters agents' incentives to
form links in a network? This paper proposes analogues of a two-sample
Kolmogorov-Smirnov test, widely used in the literature to test the null
hypothesis of "no treatment effects", for network data. It first specifies a
testing problem in which the null hypothesis is that two networks are drawn
from the same random graph model. It then describes two randomization tests
based on the magnitude of the difference between the networks' adjacency
matrices as measured by the $2\to2$ and $\infty\to1$ operator norms. Power
properties of the tests are examined analytically, in simulation, and through
two real-world applications. A key finding is that the test based on the
$\infty\to1$ norm can be substantially more powerful than that based on the
$2\to2$ norm for the kinds of sparse and degree-heterogeneous networks common
in economics.

arXiv link: http://arxiv.org/abs/1903.11117v5

Econometrics arXiv paper, submitted: 2019-03-26

On the Effect of Imputation on the 2SLS Variance

Authors: Helmut Farbmacher, Alexander Kann

Endogeneity and missing data are common issues in empirical research. We
investigate how both jointly affect inference on causal parameters.
Conventional methods to estimate the variance, which treat the imputed data as
if it was observed in the first place, are not reliable. We derive the
asymptotic variance and propose a heteroskedasticity robust variance estimator
for two-stage least squares which accounts for the imputation. Monte Carlo
simulations support our theoretical findings.

arXiv link: http://arxiv.org/abs/1903.11004v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2019-03-26

Time series models for realized covariance matrices based on the matrix-F distribution

Authors: Jiayuan Zhou, Feiyu Jiang, Ke Zhu, Wai Keung Li

We propose a new Conditional BEKK matrix-F (CBF) model for the time-varying
realized covariance (RCOV) matrices. This CBF model is capable of capturing
heavy-tailed RCOV, which is an important stylized fact but could not be handled
adequately by the Wishart-based models. To further mimic the long memory
feature of the RCOV, a special CBF model with the conditional heterogeneous
autoregressive (HAR) structure is introduced. Moreover, we give a systematical
study on the probabilistic properties and statistical inferences of the CBF
model, including exploring its stationarity, establishing the asymptotics of
its maximum likelihood estimator, and giving some new inner-product-based tests
for its model checking. In order to handle a large dimensional RCOV matrix, we
construct two reduced CBF models -- the variance-target CBF model (for moderate
but fixed dimensional RCOV matrix) and the factor CBF model (for high
dimensional RCOV matrix). For both reduced models, the asymptotic theory of the
estimated parameters is derived. The importance of our entire methodology is
illustrated by simulation results and two real examples.

arXiv link: http://arxiv.org/abs/1903.12077v2

Econometrics arXiv paper, submitted: 2019-03-24

Ensemble Methods for Causal Effects in Panel Data Settings

Authors: Susan Athey, Mohsen Bayati, Guido Imbens, Zhaonan Qu

This paper studies a panel data setting where the goal is to estimate causal
effects of an intervention by predicting the counterfactual values of outcomes
for treated units, had they not received the treatment. Several approaches have
been proposed for this problem, including regression methods, synthetic control
methods and matrix completion methods. This paper considers an ensemble
approach, and shows that it performs better than any of the individual methods
in several economic datasets. Matrix completion methods are often given the
most weight by the ensemble, but this clearly depends on the setting. We argue
that ensemble methods present a fruitful direction for further research in the
causal panel data setting.

arXiv link: http://arxiv.org/abs/1903.10079v1

Econometrics arXiv paper, submitted: 2019-03-24

Machine Learning Methods Economists Should Know About

Authors: Susan Athey, Guido Imbens

We discuss the relevance of the recent Machine Learning (ML) literature for
economics and econometrics. First we discuss the differences in goals, methods
and settings between the ML literature and the traditional econometrics and
statistics literatures. Then we discuss some specific methods from the machine
learning literature that we view as important for empirical researchers in
economics. These include supervised learning methods for regression and
classification, unsupervised learning methods, as well as matrix completion
methods. Finally, we highlight newly developed methods at the intersection of
ML and econometrics, methods that typically perform better than either
off-the-shelf ML or more traditional econometric methods when applied to
particular classes of problems, problems that include causal inference for
average treatment effects, optimal policy estimation, and estimation of the
counterfactual effect of price changes in consumer choice models.

arXiv link: http://arxiv.org/abs/1903.10075v1

Econometrics arXiv updated paper (originally submitted: 2019-03-22)

Identification and Estimation of a Partially Linear Regression Model using Network Data

Authors: Eric Auerbach

I study a regression model in which one covariate is an unknown function of a
latent driver of link formation in a network. Rather than specify and fit a
parametric network formation model, I introduce a new method based on matching
pairs of agents with similar columns of the squared adjacency matrix, the ijth
entry of which contains the number of other agents linked to both agents i and
j. The intuition behind this approach is that for a large class of network
formation models the columns of the squared adjacency matrix characterize all
of the identifiable information about individual linking behavior. In this
paper, I describe the model, formalize this intuition, and provide consistent
estimators for the parameters of the regression model. Auerbach (2021)
considers inference and an application to network peer effects.

arXiv link: http://arxiv.org/abs/1903.09679v3
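
The matching step can be illustrated directly from an adjacency matrix: compute A @ A, whose (i, j) entry counts the agents linked to both i and j, and pair each agent with the agent whose column is closest. The plain Euclidean column distance used below (with only the self-match excluded) is a simplification of the pseudo-norm in the paper, and the subsequent regression step is omitted.

```python
import numpy as np

def match_on_squared_adjacency(A):
    """Pair each agent with the agent whose column of A @ A is closest,
    where A is a symmetric 0/1 adjacency matrix."""
    S = A @ A                                            # (i, j): common neighbours
    n = S.shape[0]
    match = np.empty(n, dtype=int)
    for i in range(n):
        dist = np.linalg.norm(S - S[:, [i]], axis=0)     # distance to column i
        dist[i] = np.inf                                 # rule out self-matching
        match[i] = np.argmin(dist)
    return match

# pairs = match_on_squared_adjacency(A)   # A: (n, n) adjacency matrix
```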

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2019-03-21

Feature quantization for parsimonious and interpretable predictive models

Authors: Adrien Ehrhardt, Christophe Biernacki, Vincent Vandewalle, Philippe Heinrich

For regulatory and interpretability reasons, logistic regression is still
widely used. To improve prediction accuracy and interpretability, a
preprocessing step quantizing both continuous and categorical data is usually
performed: continuous features are discretized and, if numerous, levels of
categorical features are grouped. An even better predictive accuracy can be
reached by embedding this quantization estimation step directly into the
predictive estimation step itself. But doing so, the predictive loss has to be
optimized on a huge set. To overcome this difficulty, we introduce a specific
two-step optimization strategy: first, the optimization problem is relaxed by
approximating discontinuous quantization functions by smooth functions; second,
the resulting relaxed optimization problem is solved via a particular neural
network. The good performance of this approach, which we call glmdisc, is
illustrated on simulated and real data from the UCI library and Cr\'edit
Agricole Consumer Finance (a major European historic player in the consumer
credit market).

arXiv link: http://arxiv.org/abs/1903.08920v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2019-03-20

Omitted variable bias of Lasso-based inference methods: A finite sample analysis

Authors: Kaspar Wuthrich, Ying Zhu

We study the finite sample behavior of Lasso-based inference methods such as
post double Lasso and debiased Lasso. We show that these methods can exhibit
substantial omitted variable biases (OVBs) due to Lasso not selecting relevant
controls. This phenomenon can occur even when the coefficients are sparse and
the sample size is large and larger than the number of controls. Therefore,
relying on the existing asymptotic inference theory can be problematic in
empirical applications. We compare the Lasso-based inference methods to modern
high-dimensional OLS-based methods and provide practical guidance.

arXiv link: http://arxiv.org/abs/1903.08704v9

Econometrics arXiv cross-link from General Economics (econ.GN), submitted: 2019-03-19

State-Building through Public Land Disposal? An Application of Matrix Completion for Counterfactual Prediction

Authors: Jason Poulos

This paper examines how homestead policies, which opened vast frontier lands
for settlement, influenced the development of American frontier states. It uses
a treatment propensity-weighted matrix completion model to estimate the
counterfactual size of these states without homesteading. In simulation
studies, the method shows lower bias and variance than other estimators,
particularly in higher complexity scenarios. The empirical analysis reveals
that homestead policies significantly and persistently reduced state government
expenditure and revenue. These findings align with continuous
difference-in-differences estimates using 1.46 million land patent records.
This study's extension of the matrix completion method to include propensity
score weighting for causal effect estimation in panel data, especially in
staggered treatment contexts, enhances policy evaluation by improving the
precision of long-term policy impact assessments.

arXiv link: http://arxiv.org/abs/1903.08028v4

Econometrics arXiv updated paper (originally submitted: 2019-03-19)

Bayesian MIDAS Penalized Regressions: Estimation, Selection, and Prediction

Authors: Matteo Mogliani, Anna Simoni

We propose a new approach to mixed-frequency regressions in a
high-dimensional environment that resorts to Group Lasso penalization and
Bayesian techniques for estimation and inference. In particular, to improve the
prediction properties of the model and its sparse recovery ability, we consider
a Group Lasso with a spike-and-slab prior. Penalty hyper-parameters governing
the model shrinkage are automatically tuned via an adaptive MCMC algorithm. We
establish good frequentist asymptotic properties of the posterior of the
in-sample and out-of-sample prediction error, we recover the optimal posterior
contraction rate, and we show optimality of the posterior predictive density.
Simulations show that the proposed models have good selection and forecasting
performance in small samples, even when the design matrix presents
cross-correlation. When applied to forecasting U.S. GDP, our penalized
regressions can outperform many strong competitors. Results suggest that
financial variables may have some, although very limited, short-term predictive
content.

arXiv link: http://arxiv.org/abs/1903.08025v3

Econometrics arXiv paper, submitted: 2019-03-19

An Integrated Panel Data Approach to Modelling Economic Growth

Authors: Guohua Feng, Jiti Gao, Bin Peng

Empirical growth analysis has three major problems --- variable selection,
parameter heterogeneity and cross-sectional dependence --- which are addressed
independently of each other in most studies. The purpose of this study is to
propose an integrated framework that extends the conventional linear growth
regression model to allow for parameter heterogeneity and cross-sectional error
dependence, while simultaneously performing variable selection. We also derive
the asymptotic properties of the estimator under both low and high dimensions,
and further investigate the finite sample performance of the estimator through
Monte Carlo simulations. We apply the framework to a dataset of 89 countries
over the period from 1960 to 2014. Our results reveal some cross-country
patterns not found in previous studies (e.g., "middle income trap hypothesis",
"natural resources curse hypothesis", "religion works via belief, not
practice", etc.).

arXiv link: http://arxiv.org/abs/1903.07948v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2019-03-16

Deciding with Judgment

Authors: Simone Manganelli

A decision maker starts from a judgmental decision and moves to the closest
boundary of the confidence interval. This statistical decision rule is
admissible and does not perform worse than the judgmental decision with a
probability equal to the confidence level, which is interpreted as a
coefficient of statistical risk aversion. The confidence level is related to
the decision maker's aversion to uncertainty and can be elicited with
laboratory experiments using urns a la Ellsberg. The decision rule is applied
to a problem of asset allocation for an investor whose judgmental decision is
to keep all her wealth in cash.

arXiv link: http://arxiv.org/abs/1903.06980v1
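
A minimal sketch of the decision rule as described in the abstract, under the illustrative assumption of a Gaussian confidence interval: keep the judgmental decision if it lies inside the interval, otherwise move it to the closest boundary.

```python
import numpy as np
from scipy.stats import norm

def decide_with_judgment(judgment, estimate, std_err, confidence=0.95):
    """Move the judgmental decision to the closest boundary of the confidence
    interval; keep it unchanged if it already lies inside the interval."""
    z = norm.ppf(0.5 + confidence / 2)
    lower, upper = estimate - z * std_err, estimate + z * std_err
    return float(np.clip(judgment, lower, upper))

# Example: judgmental allocation of 0 (all cash), sample estimate suggests 0.4.
print(decide_with_judgment(judgment=0.0, estimate=0.4, std_err=0.1))
```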

Econometrics arXiv paper, submitted: 2019-03-15

Inference for First-Price Auctions with Guerre, Perrigne, and Vuong's Estimator

Authors: Jun Ma, Vadim Marmer, Artyom Shneyerov

We consider inference on the probability density of valuations in the
first-price sealed-bid auctions model within the independent private value
paradigm. We show the asymptotic normality of the two-step nonparametric
estimator of Guerre, Perrigne, and Vuong (2000) (GPV), and propose an easily
implementable and consistent estimator of the asymptotic variance. We prove the
validity of the pointwise percentile bootstrap confidence intervals based on
the GPV estimator. Lastly, we use the intermediate Gaussian approximation
approach to construct bootstrap-based asymptotically valid uniform confidence
bands for the density of the valuations.

arXiv link: http://arxiv.org/abs/1903.06401v1
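
For intuition, a stripped-down version of the GPV two-step idea under i.i.d. bids from auctions with a common number of bidders: estimate the bid distribution and density, back out pseudo-values, then estimate the valuation density. Bandwidth choices, boundary trimming, and the paper's variance and bootstrap procedures are omitted.

```python
import numpy as np
from scipy.stats import gaussian_kde

def gpv_pseudo_values(bids, n_bidders):
    """First step of GPV: back out pseudo private values
    v = b + G(b) / ((I - 1) g(b)) from observed bids."""
    bids = np.asarray(bids, dtype=float)
    G = np.searchsorted(np.sort(bids), bids, side="right") / bids.size  # empirical c.d.f.
    g = gaussian_kde(bids)(bids)                                        # kernel density of bids
    return bids + G / ((n_bidders - 1) * g)

# Second step: kernel density estimate of the pseudo values.
rng = np.random.default_rng(1)
bids = rng.uniform(0.2, 1.0, size=500)           # hypothetical bid data
v_hat = gpv_pseudo_values(bids, n_bidders=3)
f_v = gaussian_kde(v_hat)                        # estimated valuation density
print(f_v(np.linspace(v_hat.min(), v_hat.max(), 5)))
```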

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2019-03-13

A statistical analysis of time trends in atmospheric ethane

Authors: Marina Friedrich, Eric Beutner, Hanno Reuvers, Stephan Smeekes, Jean-Pierre Urbain, Whitney Bader, Bruno Franco, Bernard Lejeune, Emmanuel Mahieu

Ethane is the most abundant non-methane hydrocarbon in the Earth's atmosphere
and an important precursor of tropospheric ozone through various chemical
pathways. Ethane is also an indirect greenhouse gas (global warming potential),
influencing the atmospheric lifetime of methane through the consumption of the
hydroxyl radical (OH). Understanding the development of trends and identifying
trend reversals in atmospheric ethane is therefore crucial. Our dataset
consists of four series of daily ethane columns obtained from ground-based FTIR
measurements. Like many other decadal time series, our data are characterized by
autocorrelation, heteroskedasticity, and seasonal effects. Additionally,
missing observations due to instrument failure or unfavorable measurement
conditions are common in such series. The goal of this paper is therefore to
analyze trends in atmospheric ethane with statistical tools that correctly
address these data features. We present selected methods designed for the
analysis of time trends and trend reversals. We consider bootstrap inference on
broken linear trends and smoothly varying nonlinear trends. In particular, for
the broken trend model, we propose a bootstrap method for inference on the
break location and the corresponding changes in slope. For the smooth trend
model we construct simultaneous confidence bands around the nonparametrically
estimated trend. Our autoregressive wild bootstrap approach, combined with a
seasonal filter, is able to handle all issues mentioned above.

arXiv link: http://arxiv.org/abs/1903.05403v2

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2019-03-11

From interpretability to inference: an estimation framework for universal approximators

Authors: Andreas Joseph

We present a novel framework for estimation and inference with the broad
class of universal approximators. Estimation is based on the decomposition of
model predictions into Shapley values. Inference relies on analyzing the bias
and variance properties of individual Shapley components. We show that Shapley
value estimation is asymptotically unbiased, and we introduce Shapley
regressions as a tool to uncover the true data generating process from noisy
data alone. The well-known case of linear regression is a special case of
our framework when the model is linear in parameters. We present theoretical,
numerical, and empirical results for the estimation of heterogeneous treatment
effects as our guiding example.

arXiv link: http://arxiv.org/abs/1903.04209v6

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2019-03-09

Estimating Dynamic Conditional Spread Densities to Optimise Daily Storage Trading of Electricity

Authors: Ekaterina Abramova, Derek Bunn

This paper formulates dynamic density functions, based upon skewed-t and
similar representations, to model and forecast electricity price spreads
between different hours of the day. This supports an optimal day ahead storage
and discharge schedule, and thereby facilitates a bidding strategy for a
merchant arbitrage facility into the day-ahead auctions for wholesale
electricity. The four latent moments of the density functions are dynamic and
conditional upon exogenous drivers, thereby permitting the mean, variance,
skewness and kurtosis of the densities to respond hourly to such factors as
weather and demand forecasts. The best specification for each spread is
selected based on the Pinball Loss function, following the closed-form
analytical solutions of the cumulative distribution functions. Those analytical
properties also allow the calculation of risk associated with the spread
arbitrages. From these spread densities, the optimal daily operation of a
battery storage facility is determined.

arXiv link: http://arxiv.org/abs/1903.06668v1

Econometrics arXiv paper, submitted: 2019-03-06

A Varying Coefficient Model for Assessing the Returns to Growth to Account for Poverty and Inequality

Authors: Max Köhler, Stefan Sperlich, Jisu Yoon

Various papers demonstrate the importance of inequality, poverty and the size
of the middle class for economic growth. When explaining why these measures of
the income distribution are added to the growth regression, it is often
mentioned that poor people behave differently, which may translate to the economy
as a whole. However, simply adding explanatory variables does not reflect this
behavior. Using a varying coefficient model, we show that the returns to growth
differ substantially depending on poverty and inequality. Furthermore, we investigate
how these returns differ for the poorer and for the richer parts of
societies. We argue that the differences in the coefficients, on the one
hand, prevent the mean coefficients from being informative and, on the other hand,
challenge the credibility of the economic interpretation. In short, we show
that, when estimating mean coefficients without accounting for poverty and
inequality, the estimation is likely to suffer from a serious endogeneity bias.

arXiv link: http://arxiv.org/abs/1903.02390v1

Econometrics arXiv paper, submitted: 2019-03-06

The Africa-Dummy: Gone with the Millennium?

Authors: Max Köhler, Stefan Sperlich

A fixed effects regression estimator is introduced that can directly identify
and estimate the Africa-Dummy in one regression step so that its correct
standard errors as well as correlations to other coefficients can easily be
estimated. We estimate the Nickell bias and find it to be negligible.
Semiparametric extensions check whether the Africa-Dummy is simply a result of
misspecification of the functional form. In particular, we show that the
returns to growth factors are different for Sub-Saharan African countries
compared to the rest of the world. For example, returns to population growth
are positive and beta-convergence is faster. When we extend the model to
trace the development of the Africa-Dummy over time, we see that it has
changed dramatically and that the growth penalty for Sub-Saharan African
countries has decreased steadily, reaching insignificance around the
turn of the millennium.

arXiv link: http://arxiv.org/abs/1903.02357v1

Econometrics arXiv cross-link from math.OC (math.OC), submitted: 2019-03-06

Experimenting in Equilibrium

Authors: Stefan Wager, Kuang Xu

Classical approaches to experimental design assume that intervening on one
unit does not affect other units. There are many important settings, however,
where this non-interference assumption does not hold, as when running
experiments on supply-side incentives on a ride-sharing platform or subsidies
in an energy marketplace. In this paper, we introduce a new approach to
experimental design in large-scale stochastic systems with considerable
cross-unit interference, under an assumption that the interference is
structured enough that it can be captured via mean-field modeling. Our approach
enables us to accurately estimate the effect of small changes to system
parameters by combining unobtrusive randomization with lightweight modeling,
all while remaining in equilibrium. We can then use these estimates to optimize
the system by gradient descent. Concretely, we focus on the problem of a
platform that seeks to optimize supply-side payments p in a centralized
marketplace where different suppliers interact via their effects on the overall
supply-demand equilibrium, and show that our approach enables the platform to
optimize p in large systems using vanishingly small perturbations.

arXiv link: http://arxiv.org/abs/1903.02124v5

Econometrics arXiv updated paper (originally submitted: 2019-03-05)

ppmlhdfe: Fast Poisson Estimation with High-Dimensional Fixed Effects

Authors: Sergio Correia, Paulo Guimarães, Thomas Zylkin

In this paper we present ppmlhdfe, a new Stata command for estimation of
(pseudo) Poisson regression models with multiple high-dimensional fixed effects
(HDFE). Estimation is implemented using a modified version of the iteratively
reweighted least-squares (IRLS) algorithm that allows for fast estimation in
the presence of HDFE. Because the code is built around the reghdfe package, it
has similar syntax, supports many of the same functionalities, and benefits
from reghdfe's fast convergence properties for solving high-dimensional
least-squares problems.
Performance is further enhanced by some new techniques we introduce for
accelerating HDFE-IRLS estimation specifically. ppmlhdfe also implements a
novel and more robust approach to check for the existence of (pseudo) maximum
likelihood estimates.

arXiv link: http://arxiv.org/abs/1903.01690v3
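
For readers unfamiliar with IRLS, a bare-bones Poisson iteration in Python (without the high-dimensional fixed-effects absorption and the acceleration techniques that ppmlhdfe adds) looks roughly as follows; the simulated data are for illustration only.

```python
import numpy as np

def poisson_irls(X, y, tol=1e-8, max_iter=100):
    """Iteratively reweighted least squares for Poisson regression:
    repeatedly solve a weighted least-squares problem in the working
    variable z = eta + (y - mu) / mu with weights mu."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu
        w = mu
        # Weighted least squares via a rescaled ordinary least-squares solve.
        Xw = X * np.sqrt(w)[:, None]
        zw = z * np.sqrt(w)
        beta_new = np.linalg.lstsq(Xw, zw, rcond=None)[0]
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
y = rng.poisson(np.exp(0.5 + 0.8 * X[:, 1]))
print(poisson_irls(X, y))
```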

Econometrics arXiv updated paper (originally submitted: 2019-03-05)

When do common time series estimands have nonparametric causal meaning?

Authors: Ashesh Rambachan, Neil Shephard

In this paper, we introduce the direct potential outcome system as a
framework for analyzing dynamic causal effects of assignments on outcomes in
observational time series settings. We provide conditions under which common
predictive time series estimands, such as the impulse response function,
generalized impulse response function, local projection, and local projection
instrumental variables, have a nonparametric causal interpretation in terms of
dynamic causal effects. The direct potential outcome system therefore provides
a foundation for analyzing popular reduced-form methods for estimating the
causal effect of macroeconomic shocks on outcomes in time series settings.

arXiv link: http://arxiv.org/abs/1903.01637v4

Econometrics arXiv updated paper (originally submitted: 2019-03-05)

Verifying the existence of maximum likelihood estimates for generalized linear models

Authors: Sergio Correia, Paulo Guimarães, Thomas Zylkin

A fundamental problem with nonlinear models is that maximum likelihood
estimates are not guaranteed to exist. Though nonexistence is a well-known
problem in the binary choice literature, it presents significant challenges for
other models as well and is not as well understood in more general settings.
These challenges are only magnified for models that feature many fixed effects
and other high-dimensional parameters. We address the current ambiguity
surrounding this topic by studying the conditions that govern the existence of
estimates for (pseudo-)maximum likelihood estimators used to estimate a wide
class of generalized linear models (GLMs). We show that some, but not all, of
these GLM estimators can still deliver consistent estimates of at least some of
the linear parameters when these conditions fail to hold. We also demonstrate
how to verify these conditions in models with high-dimensional parameters, such
as panel data models with multiple levels of fixed effects.

arXiv link: http://arxiv.org/abs/1903.01633v7

Econometrics arXiv updated paper (originally submitted: 2019-03-04)

Finite Sample Inference for the Maximum Score Estimand

Authors: Adam M. Rosen, Takuya Ura

We provide a finite sample inference method for the structural parameters of
a semiparametric binary response model under a conditional median restriction
originally studied by Manski (1975, 1985). Our inference method is valid for
any sample size and irrespective of whether the structural parameters are point
identified or partially identified, for example due to the lack of a
continuously distributed covariate with large support. Our inference approach
exploits distributional properties of observable outcomes conditional on the
observed sequence of exogenous variables. Moment inequalities conditional on
this size n sequence of exogenous covariates are constructed, and the test
statistic is a monotone function of violations of sample moment inequalities.
The critical value used for inference is provided by the appropriate quantile
of a known function of n independent Rademacher random variables. We
investigate power properties of the underlying test and provide simulation
studies to support the theoretical findings.

arXiv link: http://arxiv.org/abs/1903.01511v2

Econometrics arXiv updated paper (originally submitted: 2019-03-04)

Limit Theorems for Network Dependent Random Variables

Authors: Denis Kojevnikov, Vadim Marmer, Kyungchul Song

This paper is concerned with cross-sectional dependence arising because
observations are interconnected through an observed network. Following Doukhan
and Louhichi (1999), we measure the strength of dependence by covariances of
nonlinearly transformed variables. We provide a law of large numbers and
central limit theorem for network dependent variables. We also provide a method
of calculating standard errors robust to general forms of network dependence.
For that purpose, we rely on a network heteroskedasticity and autocorrelation
consistent (HAC) variance estimator, and show its consistency. The results rely
on conditions characterized by tradeoffs between the rate of decay of
dependence across the network and the network's denseness. Our approach can
accommodate data generated by network formation models, random fields on
graphs, conditional dependency graphs, and large functional-causal systems of
equations.

arXiv link: http://arxiv.org/abs/1903.01059v6
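
As a hedged illustration of what a network HAC estimator computes, the sketch below sums cross-products of scores weighted by a kernel in shortest-path network distance. The Bartlett-type kernel, the bandwidth, and the simulated network and scores are placeholder choices, not the authors' exact weights or conditions.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def network_hac(scores, adjacency, bandwidth):
    """Network HAC-type variance estimate:
    V_hat = (1/n) * sum_{i,j} K(d_ij / bandwidth) * u_i u_j',
    with d_ij the shortest-path distance between units i and j."""
    u = np.asarray(scores, dtype=float)
    if u.ndim == 1:
        u = u[:, None]
    d = shortest_path(adjacency, directed=False, unweighted=True)
    # Bartlett-type kernel; disconnected pairs (d = inf) get weight 0.
    k = np.clip(1.0 - d / bandwidth, 0.0, None)
    return (u.T @ k @ u) / u.shape[0]

rng = np.random.default_rng(0)
n = 100
A = rng.binomial(1, 0.05, size=(n, n))
A = np.triu(A, 1)
A = A + A.T                                    # hypothetical undirected network
u = rng.normal(size=n)                         # placeholder scores / residuals
print(network_hac(u, A, bandwidth=2))
```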

Econometrics arXiv updated paper (originally submitted: 2019-03-02)

Model Selection in Utility-Maximizing Binary Prediction

Authors: Jiun-Hua Su

The maximum utility estimation proposed by Elliott and Lieli (2013) can be
viewed as cost-sensitive binary classification; thus, its in-sample overfitting
issue is similar to that of perceptron learning. A utility-maximizing
prediction rule (UMPR) is constructed to alleviate the in-sample overfitting of
the maximum utility estimation. We establish non-asymptotic upper bounds on the
difference between the maximal expected utility and the generalized expected
utility of the UMPR. Simulation results show that the UMPR with an appropriate
data-dependent penalty achieves larger generalized expected utility than common
estimators in the binary classification if the conditional probability of the
binary outcome is misspecified.

arXiv link: http://arxiv.org/abs/1903.00716v3

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2019-03-02

Approximation Properties of Variational Bayes for Vector Autoregressions

Authors: Reza Hajargasht

Variational Bayes (VB) is a recent approximate method for Bayesian inference.
It has the merit of being a fast and scalable alternative to Markov Chain Monte
Carlo (MCMC) but its approximation error is often unknown. In this paper, we
derive the approximation error of VB in terms of mean, mode, variance,
predictive density and KL divergence for the linear Gaussian multi-equation
regression. Our results indicate that VB approximates the posterior mean
perfectly. Factors affecting the magnitude of underestimation in posterior
variance and mode are revealed. Importantly, we demonstrate that VB estimates
predictive densities accurately.

arXiv link: http://arxiv.org/abs/1903.00617v1

Econometrics arXiv paper, submitted: 2019-02-28

Robust Nearly-Efficient Estimation of Large Panels with Factor Structures

Authors: Marco Avarucci, Paolo Zaffaroni

This paper studies estimation of linear panel regression models with
heterogeneous coefficients, when both the regressors and the residual contain a
possibly common, latent, factor structure. Our theory is (nearly) efficient,
because it is based on the GLS principle, and also robust to the specification of
such factor structure, because it requires neither information on the number
of factors nor estimation of the factor structure itself. We first show how the
unfeasible GLS estimator not only affords an efficiency improvement but, more
importantly, provides a bias-adjusted estimator with the conventional limiting
distribution, for situations where the OLS is affected by a first-order bias.
The technical challenge resolved in the paper is to show how these properties
are preserved for a class of feasible GLS estimators in a double-asymptotics
setting. Our theory is illustrated by means of Monte Carlo exercises and, then,
with an empirical application using individual asset returns and firms'
characteristics data.

arXiv link: http://arxiv.org/abs/1902.11181v1

Econometrics arXiv updated paper (originally submitted: 2019-02-28)

Integrability and Identification in Multinomial Choice Models

Authors: Debopam Bhattacharya

McFadden's random-utility model of multinomial choice has long been the
workhorse of applied research. We establish shape-restrictions under which
multinomial choice-probability functions can be rationalized via random-utility
models with nonparametric unobserved heterogeneity and general income-effects.
When combined with an additional restriction, the above conditions are
equivalent to the canonical Additive Random Utility Model. The
sufficiency-proof is constructive, and facilitates nonparametric identification
of preference-distributions without requiring identification-at-infinity type
arguments. A corollary shows that Slutsky-symmetry, a key condition for
previous rationalizability results, is equivalent to absence of income-effects.
Our results imply theory-consistent nonparametric bounds for
choice-probabilities on counterfactual budget-sets. They also apply to widely
used random-coefficient models, upon conditioning on observable choice
characteristics. The theory of partial differential equations plays a key role
in our analysis.

arXiv link: http://arxiv.org/abs/1902.11017v4

Econometrics arXiv updated paper (originally submitted: 2019-02-28)

The Empirical Content of Binary Choice Models

Authors: Debopam Bhattacharya

An important goal of empirical demand analysis is choice and welfare
prediction on counterfactual budget sets arising from potential
policy-interventions. Such predictions are more credible when made without
arbitrary functional-form/distributional assumptions, and instead based solely
on economic rationality, i.e. that choice is consistent with utility
maximization by a heterogeneous population. This paper investigates
nonparametric economic rationality in the empirically important context of
binary choice. We show that under general unobserved heterogeneity, economic
rationality is equivalent to a pair of Slutsky-like shape-restrictions on
choice-probability functions. The forms of these restrictions differ from
Slutsky-inequalities for continuous goods. Unlike McFadden-Richter's stochastic
revealed preference, our shape-restrictions (a) are global, i.e. their forms do
not depend on which and how many budget-sets are observed, (b) are closed-form,
hence easy to impose on parametric/semi/non-parametric models in practical
applications, and (c) provide computationally simple, theory-consistent bounds
on demand and welfare predictions on counterfactual budget-sets.

arXiv link: http://arxiv.org/abs/1902.11012v4

Econometrics arXiv updated paper (originally submitted: 2019-02-28)

Granger Causality Testing in High-Dimensional VARs: a Post-Double-Selection Procedure

Authors: Alain Hecq, Luca Margaritella, Stephan Smeekes

We develop an LM test for Granger causality in high-dimensional VAR models
based on penalized least squares estimations. To obtain a test retaining the
appropriate size after the variable selection done by the lasso, we propose a
post-double-selection procedure to partial out effects of nuisance variables
and establish its uniform asymptotic validity. We conduct an extensive set of
Monte-Carlo simulations that show our tests perform well under different data
generating processes, even without sparsity. We apply our testing procedure to
find networks of volatility spillovers and we find evidence that causal
relationships become clearer in high-dimensional compared to standard
low-dimensional VARs.

arXiv link: http://arxiv.org/abs/1902.10991v4
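
A stylized post-double-selection sketch in the spirit of the procedure: lasso-select controls for the dependent variable and for the lags of interest, then test those lags in a post-selection regression. The paper's LM statistic, lag-length handling, and penalty tuning differ; sklearn's LassoCV, the homoskedastic Wald test, and the simulated data are illustrative choices.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LassoCV

def lagmat(X, p):
    """Stack lags 1..p of every column of X (T x k) into a (T - p) x (k * p) matrix."""
    T = X.shape[0]
    return np.column_stack([X[p - l:T - l] for l in range(1, p + 1)])

rng = np.random.default_rng(0)
T, k, p = 300, 20, 2
X = rng.normal(size=(T, k))                 # hypothetical system (null of no Granger causality)
Z = lagmat(X, p)                            # all lags of all variables
y = X[p:, 0]                                # equation for variable 0
goi = [(l - 1) * k + 1 for l in range(1, p + 1)]   # columns of Z holding the lags of variable 1
others = [j for j in range(Z.shape[1]) if j not in goi]

# Double selection: lasso of y on the nuisance lags, and of each lag of interest on the nuisance lags.
selected = set(np.flatnonzero(LassoCV(cv=5).fit(Z[:, others], y).coef_))
for j in goi:
    selected |= set(np.flatnonzero(LassoCV(cv=5).fit(Z[:, others], Z[:, j]).coef_))
controls = [others[i] for i in sorted(selected)]

# Post-selection regression and a (homoskedastic) Wald test on the lags of interest.
W = np.column_stack([Z[:, goi], Z[:, controls], np.ones(len(y))])
beta, *_ = np.linalg.lstsq(W, y, rcond=None)
resid = y - W @ beta
sigma2 = resid @ resid / (len(y) - W.shape[1])
V = sigma2 * np.linalg.inv(W.T @ W)
wald = beta[:p] @ np.linalg.inv(V[:p, :p]) @ beta[:p]
print("Wald:", wald, "p-value:", stats.chi2.sf(wald, df=p))
```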

Econometrics arXiv paper, submitted: 2019-02-27

Estimation of Dynamic Panel Threshold Model using Stata

Authors: Myung Hwan Seo, Sueyoul Kim, Young-Joo Kim

We develop a Stata command xthenreg to implement the first-differenced GMM
estimation of the dynamic panel threshold model, which Seo and Shin (2016,
Journal of Econometrics 195: 169-186) have proposed. Furthermore, we derive the
asymptotic variance formula for a kink constrained GMM estimator of the dynamic
threshold model and include an estimation algorithm. We also propose a fast
bootstrap algorithm to implement the bootstrap for the linearity test. The use
of the command is illustrated through a Monte Carlo simulation and an economic
application.

arXiv link: http://arxiv.org/abs/1902.10318v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2019-02-26

Penalized Sieve GEL for Weighted Average Derivatives of Nonparametric Quantile IV Regressions

Authors: Xiaohong Chen, Demian Pouzo, James L. Powell

This paper considers estimation and inference for a weighted average
derivative (WAD) of a nonparametric quantile instrumental variables regression
(NPQIV). NPQIV is a non-separable and nonlinear ill-posed inverse problem,
which might be why there is no published work on the asymptotic properties of
any estimator of its WAD. We first characterize the semiparametric efficiency
bound for a WAD of a NPQIV, which, unfortunately, depends on an unknown
conditional derivative operator and hence an unknown degree of ill-posedness,
making it difficult to know if the information bound is singular or not. In
either case, we propose a penalized sieve generalized empirical likelihood
(GEL) estimation and inference procedure, which is based on the unconditional
WAD moment restriction and an increasing number of unconditional moments that
are implied by the conditional NPQIV restriction, where the unknown quantile
function is approximated by a penalized sieve. Under some regularity
conditions, we show that the self-normalized penalized sieve GEL estimator of
the WAD of a NPQIV is asymptotically standard normal. We also show that the
quasi likelihood ratio statistic based on the penalized sieve GEL criterion is
asymptotically chi-square distributed regardless of whether or not the
information bound is singular.

arXiv link: http://arxiv.org/abs/1902.10100v1

Econometrics arXiv paper, submitted: 2019-02-26

Semiparametric estimation of heterogeneous treatment effects under the nonignorable assignment condition

Authors: Keisuke Takahata, Takahiro Hoshino

We propose a semiparametric two-stage least squares estimator for
heterogeneous treatment effects (HTE). The HTE is the solution to a certain integral
equation that belongs to the class of Fredholm integral equations of the first
kind, which is known to be an ill-posed problem. Naive semi/nonparametric methods
do not provide stable solutions to such problems. We therefore propose to approximate
the function of interest by an orthogonal series under a constraint that makes
the inverse mapping of the integral continuous and eliminates the
ill-posedness. We illustrate the performance of the proposed estimator through
simulation experiments.

arXiv link: http://arxiv.org/abs/1902.09978v1

Econometrics arXiv updated paper (originally submitted: 2019-02-25)

Binscatter Regressions

Authors: Matias D. Cattaneo, Richard K. Crump, Max H. Farrell, Yingjie Feng

We introduce the package Binsreg, which implements the binscatter methods
developed by Cattaneo, Crump, Farrell, and Feng (2024b,a). The package includes
seven commands: binsreg, binslogit, binsprobit, binsqreg, binstest, binspwc,
and binsregselect. The first four commands implement binscatter plotting, point
estimation, and uncertainty quantification (confidence intervals and confidence
bands) for least squares linear binscatter regression (binsreg) and for
nonlinear binscatter regression (binslogit for Logit regression, binsprobit for
Probit regression, and binsqreg for quantile regression). The next two commands
focus on pointwise and uniform inference: binstest implements hypothesis
testing procedures for parametric specifications and for nonparametric shape
restrictions of the unknown regression function, while binspwc implements
multi-group pairwise statistical comparisons. Finally, the command
binsregselect implements data-driven number of bins selectors. The commands
offer binned scatter plots, and allow for covariate adjustment, weighting,
clustering, and multi-sample analysis, which is useful when studying treatment
effect heterogeneity in randomized and observational studies, among many other
features.

arXiv link: http://arxiv.org/abs/1902.09615v5

Econometrics arXiv updated paper (originally submitted: 2019-02-25)

On Binscatter

Authors: Matias D. Cattaneo, Richard K. Crump, Max H. Farrell, Yingjie Feng

Binscatter is a popular method for visualizing bivariate relationships and
conducting informal specification testing. We study the properties of this
method formally and develop enhanced visualization and econometric binscatter
tools. These include estimating conditional means with optimal binning and
quantifying uncertainty. We also highlight a methodological problem related to
covariate adjustment that can yield incorrect conclusions. We revisit two
applications using our methodology and find substantially different results
relative to those obtained using prior informal binscatter methods. General
purpose software in Python, R, and Stata is provided. Our technical work is of
independent interest for the nonparametric partition-based estimation
literature.

arXiv link: http://arxiv.org/abs/1902.09608v5
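
A minimal version of the basic binscatter computation (equal-quantile bins and within-bin means), without the optimal binning, covariate adjustment, or uncertainty quantification that the paper develops; the simulated data are illustrative.

```python
import numpy as np

def binscatter(x, y, n_bins=20):
    """Classical binned scatter: split x into equal-quantile bins and
    return the mean of x and y within each bin."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    x_means = np.array([x[bins == b].mean() for b in range(n_bins)])
    y_means = np.array([y[bins == b].mean() for b in range(n_bins)])
    return x_means, y_means

rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = np.sin(x) + rng.normal(scale=0.5, size=5000)   # hypothetical nonlinear relationship
print(binscatter(x, y, n_bins=10))
```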

Econometrics arXiv updated paper (originally submitted: 2019-02-23)

Robust Principal Component Analysis with Non-Sparse Errors

Authors: Jushan Bai, Junlong Feng

We show that when a high-dimensional data matrix is the sum of a low-rank
matrix and a random error matrix with independent entries, the low-rank
component can be consistently estimated by solving a convex minimization
problem. We develop a new theoretical argument to establish consistency without
assuming sparsity or the existence of any moments of the error matrix, so that
fat-tailed continuous random errors such as Cauchy are allowed. The results are
illustrated by simulations.

arXiv link: http://arxiv.org/abs/1902.08735v2

Econometrics arXiv paper, submitted: 2019-02-22

Counterfactual Inference in Duration Models with Random Censoring

Authors: Jiun-Hua Su

We propose a counterfactual Kaplan-Meier estimator that incorporates
exogenous covariates and unobserved heterogeneity of unrestricted
dimensionality in duration models with random censoring. Under some regularity
conditions, we establish the joint weak convergence of the proposed
counterfactual estimator and the unconditional Kaplan-Meier (1958) estimator.
Applying the functional delta method, we make inference on the cumulative
hazard policy effect, that is, the change of duration dependence in response to
a counterfactual policy. We also evaluate the finite sample performance of the
proposed counterfactual estimation method in a Monte Carlo study.

arXiv link: http://arxiv.org/abs/1902.08502v1
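
For reference, the unconditional Kaplan-Meier (1958) product-limit estimator that the counterfactual estimator builds on can be written compactly; the paper's covariate adjustment and counterfactual weighting are not shown, and the simulated censoring scheme is illustrative.

```python
import numpy as np

def kaplan_meier(durations, events):
    """Product-limit estimator: S(t) = prod over event times s <= t of
    (1 - d_s / n_s), with d_s events and n_s units still at risk at s."""
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=bool)
    times = np.unique(durations[events])
    surv, s = [], 1.0
    for t in times:
        at_risk = np.sum(durations >= t)
        d = np.sum((durations == t) & events)
        s *= 1.0 - d / at_risk
        surv.append(s)
    return times, np.array(surv)

rng = np.random.default_rng(0)
true_t = rng.exponential(2.0, size=200)       # hypothetical event times
cens_t = rng.exponential(3.0, size=200)       # random censoring times
durations = np.minimum(true_t, cens_t)
events = true_t <= cens_t
print(kaplan_meier(durations, events))
```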

Econometrics arXiv updated paper (originally submitted: 2019-02-22)

Nonparametric Counterfactuals in Random Utility Models

Authors: Yuichi Kitamura, Jörg Stoye

We bound features of counterfactual choices in the nonparametric random
utility model of demand, i.e. if observable choices are repeated cross-sections
and one allows for unrestricted, unobserved heterogeneity. In this setting,
tight bounds are developed on counterfactual discrete choice probabilities and
on the expectation and c.d.f. of (functionals of) counterfactual stochastic
demand.

arXiv link: http://arxiv.org/abs/1902.08350v2

Econometrics arXiv updated paper (originally submitted: 2019-02-20)

Robust Ranking of Happiness Outcomes: A Median Regression Perspective

Authors: Le-Yu Chen, Ekaterina Oparina, Nattavudh Powdthavee, Sorawoot Srisuma

Ordered probit and logit models have been frequently used to estimate the
mean ranking of happiness outcomes (and other ordinal data) across groups.
However, it has been recently highlighted that such ranking may not be
identified in most happiness applications. We suggest researchers focus on
median comparison instead of the mean. This is because the median rank can be
identified even if the mean rank is not. Furthermore, median ranks in probit
and logit models can be readily estimated using standard statistical software.
The median ranking, as well as ranking for other quantiles, can also be
estimated semiparametrically and we provide a new constrained mixed integer
optimization procedure for implementation. We apply it to estimate a happiness
equation using US General Social Survey data.

arXiv link: http://arxiv.org/abs/1902.07696v3

Econometrics arXiv updated paper (originally submitted: 2019-02-20)

Eliciting ambiguity with mixing bets

Authors: Patrick Schmidt

Preferences for mixing can reveal ambiguity perception and attitude on a
single event. The validity of the approach is discussed for multiple preference
classes including maxmin, maxmax, variational, and smooth second-order
preferences. An experimental implementation suggests that participants perceive
almost as much ambiguity for the stock index and actions of other participants
as for the Ellsberg urn, indicating the importance of ambiguity in real-world
decision-making.

arXiv link: http://arxiv.org/abs/1902.07447v5

Econometrics arXiv updated paper (originally submitted: 2019-02-19)

Estimation and Inference for Synthetic Control Methods with Spillover Effects

Authors: Jianfei Cao, Connor Dowd

The synthetic control method is often used in treatment effect estimation
with panel data where only a few units are treated and a small number of
post-treatment periods are available. Current estimation and inference
procedures for synthetic control methods do not allow for the existence of
spillover effects, which are plausible in many applications. In this paper, we
consider estimation and inference for synthetic control methods, allowing for
spillover effects. We propose estimators for both direct treatment effects and
spillover effects and show they are asymptotically unbiased. In addition, we
propose an inferential procedure and show it is asymptotically valid. Our
estimation and inference procedure applies to cases with multiple treated units
or periods, and where the underlying factor model is either stationary or
cointegrated. In simulations, we confirm that the presence of spillovers
renders current methods biased and distorts test sizes, whereas our methods
yield properly sized tests and retain reasonable power. We apply our method to
a classic empirical example that investigates the effect of California's
tobacco control program as in Abadie et al. (2010) and find evidence of
spillovers.

arXiv link: http://arxiv.org/abs/1902.07343v2
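
A stylized baseline synthetic control fit without any spillover adjustment (the spillover handling is the paper's contribution): choose non-negative donor weights summing to one that best reproduce the treated unit's pre-treatment path. The donor pool and outcome paths below are simulated for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def sc_weights(y1_pre, Y0_pre):
    """Weights w >= 0 with sum(w) = 1 minimizing ||y1_pre - Y0_pre @ w||^2."""
    J = Y0_pre.shape[1]
    obj = lambda w: np.sum((y1_pre - Y0_pre @ w) ** 2)
    cons = ({"type": "eq", "fun": lambda w: np.sum(w) - 1.0},)
    res = minimize(obj, np.full(J, 1.0 / J), bounds=[(0.0, 1.0)] * J,
                   constraints=cons, method="SLSQP")
    return res.x

rng = np.random.default_rng(0)
T0, J = 30, 10
Y0_pre = rng.normal(size=(T0, J)).cumsum(axis=0)   # hypothetical donor outcomes
true_w = np.array([0.5, 0.3, 0.2] + [0.0] * (J - 3))
y1_pre = Y0_pre @ true_w + rng.normal(scale=0.1, size=T0)
print(sc_weights(y1_pre, Y0_pre).round(3))
```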

Econometrics arXiv cross-link from cs.SI (cs.SI), submitted: 2019-02-19

Estimating Network Effects Using Naturally Occurring Peer Notification Queue Counterfactuals

Authors: Craig Tutterow, Guillaume Saint-Jacques

Randomized experiments, or A/B tests, are used to estimate the causal impact
of a feature on the behavior of users by creating two parallel universes in
which members are simultaneously assigned to treatment and control. However, in
social network settings, members interact, such that the impact of a feature is
not always contained within the treatment group. Researchers have developed a
number of experimental designs to estimate network effects in social settings.
Alternatively, naturally occurring exogenous variation, or 'natural
experiments,' allow researchers to recover causal estimates of peer effects
from observational data in the absence of experimental manipulation. Natural
experiments trade off the engineering costs and some of the ethical concerns
associated with network randomization against the search costs of finding
situations with natural exogenous variation. To mitigate the search costs
associated with discovering natural counterfactuals, we identify a common
engineering requirement used to scale massive online systems, in which natural
exogenous variation is likely to exist: notification queueing. We identify two
natural experiments on the LinkedIn platform based on the order of notification
queues to estimate the causal impact of a received message on the engagement of
a recipient. We show that receiving a message from another member significantly
increases a member's engagement, but that some popular observational
specifications, such as fixed-effects estimators, overestimate this effect by
as much as 2.7x. We then apply the estimated network effect coefficients to a
large body of past experiments to quantify the extent to which it changes our
interpretation of experimental results. The study points to the benefits of
using messaging queues to discover naturally occurring counterfactuals for the
estimation of causal effects without experimenter intervention.

arXiv link: http://arxiv.org/abs/1902.07133v1

Econometrics arXiv updated paper (originally submitted: 2019-02-18)

Discrete Choice under Risk with Limited Consideration

Authors: Levon Barseghyan, Francesca Molinari, Matthew Thirkettle

This paper is concerned with learning decision makers' preferences using data
on observed choices from a finite set of risky alternatives. We propose a
discrete choice model with unobserved heterogeneity in consideration sets and
in standard risk aversion. We obtain sufficient conditions for the model's
semi-nonparametric point identification, including in cases where consideration
depends on preferences and on some of the exogenous variables. Our method
yields an estimator that is easy to compute and is applicable in markets with
large choice sets. We illustrate its properties using a dataset on property
insurance purchases.

arXiv link: http://arxiv.org/abs/1902.06629v3

Econometrics arXiv paper, submitted: 2019-02-17

Semiparametric correction for endogenous truncation bias with Vox Populi based participation decision

Authors: Nir Billfeld, Moshe Kim

We synthesize knowledge from various scientific disciplines to develop a
semiparametric, endogenous-truncation-proof algorithm that corrects
for truncation bias due to endogenous self-selection. This synthesis enriches
the algorithm's accuracy, efficiency and applicability. Improving upon the
covariate shift assumption, the data are treated as intrinsically affected by, and
largely generated through, their own behavior (cognition). Refining the concept of Vox Populi
(Wisdom of the Crowd) allows data points to sort themselves depending on their
estimated latent reference-group opinion space. Monte Carlo simulations, based
on 2,000,000 different distribution functions and generating roughly 100
million realizations, attest to the very high accuracy of our model.

arXiv link: http://arxiv.org/abs/1902.06286v1

Econometrics arXiv paper, submitted: 2019-02-16

Weak Identification and Estimation of Social Interaction Models

Authors: Guy Tchuente

The identification of the network effect is based on either group size
variation, the structure of the network or the relative position in the
network. I provide easy-to-verify necessary conditions for identification of
undirected network models based on the number of distinct eigenvalues of the
adjacency matrix. Identification of network effects is possible, although in
many empirical situations existing identification strategies may require the
use of many instruments or instruments that could be strongly correlated with
each other. The use of highly correlated instruments or many instruments may
lead to weak identification or many instruments bias. This paper proposes
regularized versions of the two-stage least squares (2SLS) estimators as a
solution to these problems. The proposed estimators are consistent and
asymptotically normal. A Monte Carlo study illustrates the properties of the
regularized estimators. An empirical application, assessing a local government
tax competition model, shows the empirical relevance of using regularization
methods.

arXiv link: http://arxiv.org/abs/1902.06143v1

Econometrics arXiv updated paper (originally submitted: 2019-02-14)

Partial Identification in Matching Models for the Marriage Market

Authors: Cristina Gualdani, Shruti Sinha

We study partial identification of the preference parameters in the
one-to-one matching model with perfectly transferable utilities. We do so
without imposing parametric distributional assumptions on the unobserved
heterogeneity and with data on one large market. We provide a tractable
characterisation of the identified set under various classes of nonparametric
distributional assumptions on the unobserved heterogeneity. Using our
methodology, we re-examine some of the relevant questions in the empirical
literature on the marriage market, which have been previously studied under the
Logit assumption. Our results reveal that many findings in the aforementioned
literature are primarily driven by such parametric restrictions.

arXiv link: http://arxiv.org/abs/1902.05610v6

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2019-02-08

Censored Quantile Regression Forests

Authors: Alexander Hanbo Li, Jelena Bradic

Random forests are a powerful non-parametric regression method but are severely
limited in their usage in the presence of randomly censored observations and,
naively applied, can exhibit poor predictive performance due to the incurred
biases. Based on a local adaptive representation of random forests, we develop
a regression adjustment for randomly censored regression quantile models.
The regression adjustment is based on new estimating equations that adapt to
censoring and lead to the quantile score whenever the data do not exhibit
censoring. The proposed procedure, named censored quantile regression forest,
allows us to estimate quantiles of time-to-event without any parametric
modeling assumption. We establish its consistency under mild model
specifications. Numerical studies showcase a clear advantage of the proposed
procedure.

arXiv link: http://arxiv.org/abs/1902.03327v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2019-02-08

Testing the Order of Multivariate Normal Mixture Models

Authors: Hiroyuki Kasahara, Katsumi Shimotsu

Finite mixtures of multivariate normal distributions have been widely used in
empirical applications in diverse fields such as statistical genetics and
statistical finance. Testing the number of components in multivariate normal
mixture models is a long-standing challenge even in the most important case of
testing homogeneity. This paper develops likelihood-based tests of the null
hypothesis of $M_0$ components against the alternative hypothesis of $M_0 + 1$
components for a general $M_0 \geq 1$. For heteroscedastic normal mixtures, we
propose an EM test and derive the asymptotic distribution of the EM test
statistic. For homoscedastic normal mixtures, we derive the asymptotic
distribution of the likelihood ratio test statistic. We also derive the
asymptotic distribution of the likelihood ratio test statistic and EM test
statistic under local alternatives and show the validity of parametric
bootstrap. The simulations show that the proposed test has good finite sample
size and power properties.

arXiv link: http://arxiv.org/abs/1902.02920v1

Econometrics arXiv updated paper (originally submitted: 2019-02-05)

A Bootstrap Test for the Existence of Moments for GARCH Processes

Authors: Alexander Heinemann

This paper studies the joint inference on conditional volatility parameters
and the innovation moments by means of bootstrap to test for the existence of
moments for GARCH(p,q) processes. We propose a residual bootstrap to mimic the
joint distribution of the quasi-maximum likelihood estimators and the empirical
moments of the residuals and also prove its validity. A bootstrap-based test
for the existence of moments is proposed, which provides asymptotically
correctly-sized tests without losing its consistency property. It is simple to
implement and extends to other GARCH-type settings. A simulation study
demonstrates the test's size and power properties in finite samples and an
empirical application illustrates the testing approach.

arXiv link: http://arxiv.org/abs/1902.01808v3

Econometrics arXiv paper, submitted: 2019-02-05

A General Framework for Prediction in Time Series Models

Authors: Eric Beutner, Alexander Heinemann, Stephan Smeekes

In this paper we propose a general framework to analyze prediction in time
series models and show how a wide class of popular time series models satisfies
this framework. We postulate a set of high-level assumptions, and formally
verify these assumptions for the aforementioned time series models. Our
framework coincides with that of Beutner et al. (2019, arXiv:1710.00643) who
establish the validity of conditional confidence intervals for predictions made
in this framework. The current paper therefore complements the results in
Beutner et al. (2019, arXiv:1710.00643) by providing practically relevant
applications of their theory.

arXiv link: http://arxiv.org/abs/1902.01622v1

Econometrics arXiv paper, submitted: 2019-02-04

Asymptotic Theory for Clustered Samples

Authors: Bruce E. Hansen, Seojeong Lee

We provide a complete asymptotic distribution theory for clustered data with
a large number of independent groups, generalizing the classic laws of large
numbers, uniform laws, central limit theory, and clustered covariance matrix
estimation. Our theory allows for clustered observations with heterogeneous and
unbounded cluster sizes. Our conditions cleanly nest the classical results for
i.n.i.d. observations, in the sense that our conditions specialize to the
classical conditions under independent sampling. We use this theory to develop
a full asymptotic distribution theory for estimation based on linear
least-squares, 2SLS, nonlinear MLE, and nonlinear GMM.

arXiv link: http://arxiv.org/abs/1902.01497v1
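
The clustered covariance matrix estimator that this theory covers has the familiar sandwich form; a minimal OLS version without finite-sample corrections is sketched below with simulated clusters.

```python
import numpy as np

def cluster_robust_cov(X, resid, clusters):
    """Sandwich estimator: (X'X)^{-1} [ sum_g (X_g' u_g)(X_g' u_g)' ] (X'X)^{-1}."""
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        sg = X[clusters == g].T @ resid[clusters == g]
        meat += np.outer(sg, sg)
    return bread @ meat @ bread

rng = np.random.default_rng(0)
G, ng = 50, 20
clusters = np.repeat(np.arange(G), ng)
shock = np.repeat(rng.normal(size=G), ng)                 # common within-cluster shock
X = np.column_stack([np.ones(G * ng), rng.normal(size=G * ng)])
y = X @ np.array([1.0, 2.0]) + shock + rng.normal(size=G * ng)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
V = cluster_robust_cov(X, y - X @ beta, clusters)
print("clustered s.e.:", np.sqrt(np.diag(V)))
```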

Econometrics arXiv updated paper (originally submitted: 2019-02-04)

A Sieve-SMM Estimator for Dynamic Models

Authors: Jean-Jacques Forneron

This paper proposes a Sieve Simulated Method of Moments (Sieve-SMM) estimator
for the parameters and the distribution of the shocks in nonlinear dynamic
models where the likelihood and the moments are not tractable. An important
concern with SMM, which matches sample moments with simulated moments, is that a
parametric distribution is required. However, economic quantities that depend
on this distribution, such as welfare and asset-prices, can be sensitive to
misspecification. The Sieve-SMM estimator addresses this issue by flexibly
approximating the distribution of the shocks with a Gaussian and tails mixture
sieve. The asymptotic framework provides consistency, rate of convergence and
asymptotic normality results, extending existing results to a new framework
with more general dynamics and latent variables. An application to asset
pricing in a production economy shows a large decline in the estimates of
relative risk-aversion, highlighting the empirical relevance of
misspecification bias.

arXiv link: http://arxiv.org/abs/1902.01456v4

Econometrics arXiv updated paper (originally submitted: 2019-02-04)

Factor Investing: A Bayesian Hierarchical Approach

Authors: Guanhao Feng, Jingyu He

This paper investigates asset allocation problems when returns are
predictable. We introduce a market-timing Bayesian hierarchical (BH) approach
that adopts heterogeneous time-varying coefficients driven by lagged
fundamental characteristics. Our approach includes a joint estimation of
conditional expected returns and covariance matrix and considers estimation
risk for portfolio analysis. The hierarchical prior allows modeling different
assets separately while sharing information across assets. We demonstrate the
performance of our approach in the U.S. equity market. Though the Bayesian forecast is
slightly biased, our BH approach outperforms most alternative methods in point and
interval prediction. In sector investment over the most recent twenty
years, our BH approach delivers a 0.92% average monthly return and a significant 0.32%
Jensen's alpha. We also find that technology, energy, and manufacturing have been
important sectors in the past decade, and size, investment, and short-term
reversal factors are heavily weighted. Finally, the stochastic discount factor
constructed by our BH approach explains most anomalies.

arXiv link: http://arxiv.org/abs/1902.01015v3

Econometrics arXiv cross-link from stat.CO (stat.CO), submitted: 2019-01-31

Approaches Toward the Bayesian Estimation of the Stochastic Volatility Model with Leverage

Authors: Darjus Hosszejni, Gregor Kastner

The sampling efficiency of MCMC methods in Bayesian inference for stochastic
volatility (SV) models is known to highly depend on the actual parameter
values, and the effectiveness of samplers based on different parameterizations
varies significantly. We derive novel algorithms for the centered and the
non-centered parameterizations of the practically highly relevant SV model with
leverage, where the return process and innovations of the volatility process
are allowed to correlate. Moreover, based on the idea of
ancillarity-sufficiency interweaving (ASIS), we combine the resulting samplers
in order to guarantee stable sampling efficiency irrespective of the baseline
parameterization. We carry out an extensive comparison to already existing
sampling methods for this model using simulated as well as real world data.

arXiv link: http://arxiv.org/abs/1901.11491v2

Econometrics arXiv updated paper (originally submitted: 2019-01-31)

A dynamic factor model approach to incorporate Big Data in state space models for official statistics

Authors: Caterina Schiavoni, Franz Palm, Stephan Smeekes, Jan van den Brakel

In this paper we consider estimation of unobserved components in state space
models using a dynamic factor approach to incorporate auxiliary information
from high-dimensional data sources. We apply the methodology to unemployment
estimation as done by Statistics Netherlands, who uses a multivariate state
space model to produce monthly figures for the unemployment using series
observed with the labour force survey (LFS). We extend the model by including
auxiliary series of Google Trends about job-search and economic uncertainty,
and claimant counts, partially observed at higher frequencies. Our factor model
allows for nowcasting the variable of interest, providing reliable unemployment
estimates in real-time before LFS data become available.

arXiv link: http://arxiv.org/abs/1901.11355v2

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2019-01-26

Volatility Models Applied to Geophysics and High Frequency Financial Market Data

Authors: Maria C Mariani, Md Al Masum Bhuiyan, Osei K Tweneboah, Hector Gonzalez-Huizar, Ionut Florescu

This work is devoted to the study of modeling geophysical and financial time
series. A class of volatility models with time-varying parameters is presented
to forecast the volatility of time series in a stationary environment. The
modeling of stationary time series with consistent properties facilitates
more reliable prediction. Using GARCH and stochastic volatility
models, we forecast one-step-ahead volatility with +/- 2 standard
prediction errors, estimated via maximum likelihood. We
compare the stochastic volatility model, which relies on a filtering technique
for the conditional volatility, with the GARCH model. We conclude that the
stochastic volatility model is a better forecasting tool than GARCH(1,1), since it
is less conditioned by autoregressive past information.

arXiv link: http://arxiv.org/abs/1901.09145v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2019-01-25

Orthogonal Statistical Learning

Authors: Dylan J. Foster, Vasilis Syrgkanis

We provide non-asymptotic excess risk guarantees for statistical learning in
a setting where the population risk with respect to which we evaluate the
target parameter depends on an unknown nuisance parameter that must be
estimated from data. We analyze a two-stage sample splitting meta-algorithm
that takes as input arbitrary estimation algorithms for the target parameter
and nuisance parameter. We show that if the population risk satisfies a
condition called Neyman orthogonality, the impact of the nuisance estimation
error on the excess risk bound achieved by the meta-algorithm is of second
order. Our theorem is agnostic to the particular algorithms used for the target
and nuisance and only makes an assumption on their individual performance. This
enables the use of a plethora of existing results from machine learning to give
new guarantees for learning with a nuisance component. Moreover, by focusing on
excess risk rather than parameter estimation, we can provide rates under weaker
assumptions than in previous works and accommodate settings in which the target
parameter belongs to a complex nonparametric class. We provide conditions on
the metric entropy of the nuisance and target classes such that oracle rates of
the same order as if we knew the nuisance parameter are achieved.

arXiv link: http://arxiv.org/abs/1901.09036v4
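
As one concrete instance of the two-stage sample-splitting idea with a Neyman-orthogonal loss, consider the partially linear model Y = theta*T + f(X) + e: estimate the nuisances E[Y|X] and E[T|X] on one fold with any learner, then fit theta on the held-out fold by regressing the outcome residual on the treatment residual. The random-forest nuisances and the single split are illustrative choices, not the paper's general meta-algorithm.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n, theta = 2000, 1.5
X = rng.normal(size=(n, 5))
T = np.sin(X[:, 0]) + rng.normal(size=n)                 # treatment depends on X
Y = theta * T + X[:, 1] ** 2 + rng.normal(size=n)

half = n // 2
A, B = np.arange(half), np.arange(half, n)               # one split; the paper allows generic splitting

# Stage 1: nuisance estimates on fold A.
m_hat = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[A], Y[A])
e_hat = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[A], T[A])

# Stage 2: orthogonal target estimation on fold B (residual-on-residual regression).
Y_res = Y[B] - m_hat.predict(X[B])
T_res = T[B] - e_hat.predict(X[B])
theta_hat = (T_res @ Y_res) / (T_res @ T_res)
print(theta_hat)   # should be close to the true theta = 1.5
```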

Econometrics arXiv paper, submitted: 2019-01-23

The Wisdom of a Kalman Crowd

Authors: Ulrik W. Nash

The Kalman Filter has been called one of the greatest inventions in
statistics during the 20th century. Its purpose is to measure the state of a
system by processing the noisy data received from different electronic sensors.
In comparison, a useful resource for managers in their effort to make the right
decisions is the wisdom of crowds. This phenomenon allows managers to combine
judgments by different employees to get estimates that are often more accurate
and reliable than the estimates managers produce alone. Since harnessing the
collective intelligence of employees and filtering signals from multiple noisy
sensors appear related, we looked at the possibility of using the Kalman Filter
on estimates by people. Our predictions suggest, and our findings based on the
Survey of Professional Forecasters reveal, that the Kalman Filter can help
managers solve their decision-making problems by giving them stronger signals
before they choose. Indeed, when used on a subset of forecasters identified by
the Contribution Weighted Model, the Kalman Filter beat that rule clearly,
across all the forecasting horizons in the survey.

arXiv link: http://arxiv.org/abs/1901.08133v1
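
To make the analogy concrete, a textbook scalar Kalman filter that treats a sequence of noisy crowd forecasts as measurements of a slowly drifting latent quantity; the random-walk state model and the noise variances are illustrative assumptions, not the paper's calibration.

```python
import numpy as np

def kalman_filter_1d(measurements, q=0.01, r=1.0, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a random-walk state observed with noise:
    x_t = x_{t-1} + w_t (var q), z_t = x_t + v_t (var r)."""
    x, p = x0, p0
    filtered = []
    for z in measurements:
        p = p + q                    # predict
        k = p / (p + r)              # Kalman gain
        x = x + k * (z - x)          # update with the new forecast
        p = (1 - k) * p
        filtered.append(x)
    return np.array(filtered)

rng = np.random.default_rng(0)
truth = np.cumsum(rng.normal(scale=0.1, size=50)) + 2.0
forecasts = truth + rng.normal(scale=1.0, size=50)        # hypothetical crowd forecasts
print(kalman_filter_1d(forecasts)[-5:])
```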

Econometrics arXiv paper, submitted: 2019-01-16

lassopack: Model selection and prediction with regularized regression in Stata

Authors: Achim Ahrens, Christian B. Hansen, Mark E. Schaffer

This article introduces lassopack, a suite of programs for regularized
regression in Stata. lassopack implements lasso, square-root lasso, elastic
net, ridge regression, adaptive lasso and post-estimation OLS. The methods are
suitable for the high-dimensional setting where the number of predictors $p$
may be large and possibly greater than the number of observations, $n$. We
offer three different approaches for selecting the penalization (`tuning')
parameters: information criteria (implemented in lasso2), $K$-fold
cross-validation and $h$-step ahead rolling cross-validation for cross-section,
panel and time-series data (cvlasso), and theory-driven (`rigorous')
penalization for the lasso and square-root lasso for cross-section and panel
data (rlasso). We discuss the theoretical framework and practical
considerations for each approach. We also present Monte Carlo results to
compare the performance of the penalization approaches.

arXiv link: http://arxiv.org/abs/1901.05397v1
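
lassopack itself is a Stata package; as a rough Python analogue of two of the
tuning approaches it offers (information criteria and $K$-fold
cross-validation), one could do the following with scikit-learn on simulated
data. The data-generating process below is an illustrative assumption.

    import numpy as np
    from sklearn.linear_model import LassoCV, LassoLarsIC

    rng = np.random.default_rng(0)
    n, p = 200, 50
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:3] = [2.0, -1.5, 1.0]                 # sparse true coefficients
    y = X @ beta + rng.standard_normal(n)

    bic_fit = LassoLarsIC(criterion="bic").fit(X, y)   # information-criterion tuning
    cv_fit = LassoCV(cv=10, random_state=0).fit(X, y)  # 10-fold cross-validated tuning
    print(bic_fit.alpha_, cv_fit.alpha_)               # selected penalty levels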

Econometrics arXiv paper, submitted: 2019-01-15

Inference on Functionals under First Order Degeneracy

Authors: Qihui Chen, Zheng Fang

This paper presents a unified second order asymptotic framework for
conducting inference on parameters of the form $\phi(\theta_0)$, where
$\theta_0$ is unknown but can be estimated by $\hat\theta_n$, and $\phi$ is a
known map that admits null first order derivative at $\theta_0$. For a large
number of examples in the literature, the second order Delta method reveals a
nondegenerate weak limit for the plug-in estimator $\phi(\hat\theta_n)$. We
show, however, that the `standard' bootstrap is consistent if and only if the
second order derivative $\phi_{\theta_0}''=0$ under regularity conditions,
i.e., the standard bootstrap is inconsistent if $\phi_{\theta_0}''\neq 0$, and
provides degenerate limits unhelpful for inference otherwise. We thus identify
a source of bootstrap failures distinct from that in Fang and Santos (2018)
because the problem (of consistently bootstrapping a nondegenerate
limit) persists even if $\phi$ is differentiable. We show that the correction
procedure in Babu (1984) can be extended to our general setup. Alternatively, a
modified bootstrap is proposed when the map is in addition second
order nondifferentiable. Both are shown to provide local size control under
some conditions. As an illustration, we develop a test of common conditional
heteroskedastic (CH) features, a setting with both degeneracy and
nondifferentiability -- the latter is because the Jacobian matrix is degenerate
at zero and we allow the existence of multiple common CH features.

arXiv link: http://arxiv.org/abs/1901.04861v1
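
In the notation of the abstract, the second order Delta method referred to
above takes the following familiar form (stated here under simplifying
smoothness assumptions): if \(\sqrt{n}(\hat\theta_n-\theta_0)\) converges
weakly to some limit \(G\) and \(\phi\) is twice differentiable at
\(\theta_0\) with \(\phi_{\theta_0}'=0\), then

\[
n\{\phi(\hat\theta_n) - \phi(\theta_0)\} \rightsquigarrow
\tfrac{1}{2}\,\phi_{\theta_0}''(G, G),
\]

so the plug-in estimator has a nondegenerate limit of quadratic rather than
linear form, which is what complicates standard bootstrap inference.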

Econometrics arXiv paper, submitted: 2019-01-12

Mastering Panel 'Metrics: Causal Impact of Democracy on Growth

Authors: Shuowen Chen, Victor Chernozhukov, Iván Fernández-Val

The relationship between democracy and economic growth is of long-standing
interest. We revisit the panel data analysis of this relationship by Acemoglu,
Naidu, Restrepo and Robinson (forthcoming) using state of the art econometric
methods. We argue that this and many other panel data settings in economics
are in fact high-dimensional, so that the principal estimators -- the fixed
effects (FE) and Arellano-Bond (AB) estimators -- are biased to a degree that
invalidates statistical inference. We can, however, remove these biases by
using simple analytical and sample-splitting methods, and thereby restore valid
statistical inference. We find that the debiased FE and AB estimators produce
substantially higher estimates of the long-run effect of democracy on growth,
providing even stronger support for the key hypothesis in Acemoglu, Naidu,
Restrepo and Robinson (forthcoming). Given the ubiquitous nature of panel data,
we conclude that the use of debiased panel data estimators should substantially
improve the quality of empirical inference in economics.

arXiv link: http://arxiv.org/abs/1901.03821v1
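
One common sample-splitting correction of the kind referred to above (not
necessarily the exact variant used by the authors) is the half-panel
jackknife: split the panel into two halves along the time dimension and set

\[
\tilde\theta = 2\hat\theta - \tfrac{1}{2}\big(\hat\theta_{(1)} +
\hat\theta_{(2)}\big),
\]

where \(\hat\theta\) is the full-sample estimate and \(\hat\theta_{(1)},
\hat\theta_{(2)}\) are the estimates from the two half-panels; this removes
the leading incidental-parameter bias term.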

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2019-01-11

Non-Parametric Inference Adaptive to Intrinsic Dimension

Authors: Khashayar Khosravi, Greg Lewis, Vasilis Syrgkanis

We consider non-parametric estimation and inference of conditional moment
models in high dimensions. We show that even when the dimension $D$ of the
conditioning variable is larger than the sample size $n$, estimation and
inference is feasible as long as the distribution of the conditioning variable
has small intrinsic dimension $d$, as measured by locally low doubling
measures. Our estimation is based on a sub-sampled ensemble of the $k$-nearest
neighbors ($k$-NN) $Z$-estimator. We show that if the intrinsic dimension of
the covariate distribution is equal to $d$, then the finite sample estimation
error of our estimator is of order $n^{-1/(d+2)}$ and our estimate is
$n^{1/(d+2)}$-asymptotically normal, irrespective of $D$. The sub-sampling size
required for achieving these results depends on the unknown intrinsic dimension
$d$. We propose an adaptive data-driven approach for choosing this parameter
and prove that it achieves the desired rates. We discuss extensions and
applications to heterogeneous treatment effect estimation.

arXiv link: http://arxiv.org/abs/1901.03719v3
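
A minimal sketch of the sub-sampled $k$-NN idea, simplified to the conditional
mean case (the paper handles general conditional moment models and chooses the
subsample size in a data-driven way, which this sketch does not):

    import numpy as np

    def knn_subsampled_mean(X, y, x0, k=10, n_sub=50, sub_frac=0.5, seed=0):
        """Sub-sampled ensemble of k-NN estimates of E[y | X = x0].
        Each draw takes a random subsample, finds the k nearest neighbours
        of x0 within it, and averages their outcomes."""
        rng = np.random.default_rng(seed)
        n = len(y)
        s = int(sub_frac * n)
        estimates = []
        for _ in range(n_sub):
            idx = rng.choice(n, size=s, replace=False)
            dists = np.linalg.norm(X[idx] - x0, axis=1)
            nn = idx[np.argsort(dists)[:k]]
            estimates.append(y[nn].mean())
        return float(np.mean(estimates))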

Econometrics arXiv cross-link from General Economics (econ.GN), submitted: 2019-01-11

Community Matters: Heterogeneous Impacts of a Sanitation Intervention

Authors: Laura Abramovsky, Britta Augsburg, Melanie Lührmann, Francisco Oteiza, Juan Pablo Rud

We study the effectiveness of a community-level information intervention
aimed at improving sanitation using a cluster-randomized controlled trial (RCT)
in Nigerian communities. The intervention, Community-Led Total Sanitation
(CLTS), is currently part of national sanitation policy in more than 25
countries. While average impacts are exiguous almost three years after
implementation at scale, the results hide important heterogeneity: the
intervention has strong and lasting effects on sanitation practices in poorer
communities. These are realized through increased sanitation investments. We
show that community wealth, widely available in secondary data, is a key
statistic for effective intervention targeting. Using data from five other
similar randomized interventions in various contexts, we find that
community-level wealth heterogeneity can rationalize the wide range of impact
estimates in the literature. This exercise provides plausible external validity
to our findings, with implications for intervention scale-up. JEL Codes: O12,
I12, I15, I18.

arXiv link: http://arxiv.org/abs/1901.03544v5

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2019-01-10

Estimating population average treatment effects from experiments with noncompliance

Authors: Kellie Ottoboni, Jason Poulos

Randomized control trials (RCTs) are the gold standard for estimating causal
effects, but often use samples that are non-representative of the actual
population of interest. We propose a reweighting method for estimating
population average treatment effects in settings with noncompliance.
Simulations show the proposed compliance-adjusted population estimator
outperforms its unadjusted counterpart when compliance is relatively low and
can be predicted by observed covariates. We apply the method to evaluate the
effect of Medicaid coverage on health care use for a target population of
adults who may benefit from expansions to the Medicaid program. We draw RCT
data from the Oregon Health Insurance Experiment, where less than one-third of
those randomly selected to receive Medicaid benefits actually enrolled.

arXiv link: http://arxiv.org/abs/1901.02991v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2019-01-08

Dynamic tail inference with log-Laplace volatility

Authors: Gordon V. Chavez

We propose a family of models that enable predictive estimation of
time-varying extreme event probabilities in heavy-tailed and nonlinearly
dependent time series. The models are a white noise process with conditionally
log-Laplace stochastic volatility. In contrast to other, similar stochastic
volatility formalisms, this process has analytic expressions for its
conditional probabilistic structure that enable straightforward estimation of
dynamically changing extreme event probabilities. The process and volatility
are conditionally Pareto-tailed, with tail exponent given by the reciprocal of
the log-volatility's mean absolute innovation. This formalism can accommodate a
wide variety of nonlinear dependence, as well as conditional power law-tail
behavior ranging from weakly non-Gaussian to Cauchy-like tails. We provide a
computationally straightforward estimation procedure that uses an asymptotic
approximation of the process' dynamic large deviation probabilities. We
demonstrate the estimator's utility with a simulation study. We then show the
method's predictive capabilities on a simulated nonlinear time series where the
volatility is driven by the chaotic Lorenz system. Lastly we provide an
empirical application, which shows that this simple modeling method can be
effectively used for dynamic and predictive tail inference in financial time
series.

arXiv link: http://arxiv.org/abs/1901.02419v5

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2019-01-07

Semi-parametric dynamic contextual pricing

Authors: Virag Shah, Jose Blanchet, Ramesh Johari

Motivated by the application of real-time pricing in e-commerce platforms, we
consider the problem of revenue-maximization in a setting where the seller can
leverage contextual information describing the customer's history and the
product's type to predict her valuation of the product. However, her true
valuation is unobservable to the seller; only a binary outcome, the success or
failure of a transaction, is observed. Unlike in usual contextual bandit
settings, the optimal price/arm given a covariate in our setting is sensitive
to the detailed characteristics of the residual uncertainty distribution. We
develop a semi-parametric model in which the residual distribution is
non-parametric and provide the first algorithm which learns both regression
parameters and residual distribution with $\tilde O(\sqrt{n})$ regret. We
empirically test a scalable implementation of our algorithm and observe good
performance.

arXiv link: http://arxiv.org/abs/1901.02045v4

Econometrics arXiv paper, submitted: 2019-01-07

Shrinkage for Categorical Regressors

Authors: Phillip Heiler, Jana Mareckova

This paper introduces a flexible regularization approach that reduces point
estimation risk of group means stemming from e.g. categorical regressors,
(quasi-)experimental data or panel data models. The loss function is penalized
by adding weighted squared l2-norm differences between group location
parameters and informative first-stage estimates. Under quadratic loss, the
penalized estimation problem has a simple interpretable closed-form solution
that nests methods established in the literature on ridge regression,
discretized support smoothing kernels and model averaging methods. We derive
risk-optimal penalty parameters and propose a plug-in approach for estimation.
The large sample properties are analyzed in an asymptotic local to zero
framework by introducing a class of sequences for close and distant systems of
locations that is sufficient for describing a large range of data generating
processes. We provide the asymptotic distributions of the shrinkage estimators
under different penalization schemes. The proposed plug-in estimator uniformly
dominates the ordinary least squares in terms of asymptotic risk if the number
of groups is larger than three. Monte Carlo simulations reveal robust
improvements over standard methods in finite samples. Real data examples of
estimating time trends in a panel and a difference-in-differences study
illustrate potential applications.

arXiv link: http://arxiv.org/abs/1901.01898v1
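
A stylized version of the penalized problem (in our simplified notation, not
the paper's) shrinks each group mean toward an informative first-stage
estimate \(m_g\) with weight \(w_g\):

\[
\hat\mu = \arg\min_{\mu}\ \sum_{g}\sum_{i \in g}(y_i-\mu_g)^2 +
\lambda\sum_{g} w_g(\mu_g-m_g)^2,
\qquad
\hat\mu_g = \frac{n_g\bar y_g + \lambda w_g m_g}{n_g + \lambda w_g},
\]

which makes the ridge-type, convex-combination character of the closed-form
solution explicit.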

Econometrics arXiv updated paper (originally submitted: 2019-01-04)

Nonparametric Instrumental Variables Estimation Under Misspecification

Authors: Ben Deaner

Nonparametric Instrumental Variables (NPIV) analysis is based on a
conditional moment restriction. We show that if this moment condition is even
slightly misspecified, say because instruments are not quite valid, then NPIV
estimates can be subject to substantial asymptotic error and the identified set
under a relaxed moment condition may be large. Imposing strong a priori
smoothness restrictions mitigates the problem but induces bias if the
restrictions are too strong. In order to manage this trade-off we develop
methods for empirical sensitivity analysis and apply them to the consumer
demand data previously analyzed in Blundell (2007) and Horowitz (2011).

arXiv link: http://arxiv.org/abs/1901.01241v7

Econometrics arXiv paper, submitted: 2019-01-02

Modeling Dynamic Transport Network with Matrix Factor Models: with an Application to International Trade Flow

Authors: Elynn Y. Chen, Rong Chen

International trade research plays an important role in informing trade policy
and shedding light on wider issues relating to poverty, development, migration,
productivity, and economy. With recent advances in information technology,
global and regional agencies distribute an enormous amount of internationally
comparable trading data among a large number of countries over time, providing
a goldmine for empirical analysis of international trade. Meanwhile, an array
of new statistical methods has recently been developed for dynamic network
analysis.
However, these advanced methods have not been utilized for analyzing such
massive dynamic cross-country trading data. International trade data can be
viewed as a dynamic transport network because it emphasizes the amount of goods
moving across a network. Most literature on dynamic network analysis
concentrates on the connectivity network that focuses on link formation or
deformation rather than the transport moving across the network. We take a
different perspective from the pervasive node-and-edge level modeling: the
dynamic transport network is modeled as a time series of relational matrices.
We adopt a matrix factor model of wang2018factor, with a specific
interpretation for the dynamic transport network. Under the model, the observed
surface network is assumed to be driven by a latent dynamic transport network
with lower dimensions. The proposed method is able to unveil the latent dynamic
structure and achieve the objective of dimension reduction. We applied the
proposed framework and methodology to a data set of monthly trading volumes
among 24 countries and regions from 1982 to 2015. Our findings shed light on
trading hubs, centrality, trends and patterns of international trade and show
matching change points to trading policies. The dataset also provides a fertile
ground for future research on international trade.

arXiv link: http://arxiv.org/abs/1901.00769v1

Econometrics arXiv updated paper (originally submitted: 2018-12-30)

Salvaging Falsified Instrumental Variable Models

Authors: Matthew A. Masten, Alexandre Poirier

What should researchers do when their baseline model is refuted? We provide
four constructive answers. First, researchers can measure the extent of
falsification. To do this, we consider continuous relaxations of the baseline
assumptions of concern. We then define the falsification frontier: The smallest
relaxations of the baseline model which are not refuted. This frontier provides
a quantitative measure of the extent of falsification. Second, researchers can
present the identified set for the parameter of interest under the assumption
that the true model lies somewhere on this frontier. We call this the
falsification adaptive set. This set generalizes the standard baseline estimand
to account for possible falsification. Third, researchers can present the
identified set for a specific point on this frontier. Finally, as a sensitivity
analysis, researchers can present identified sets for points beyond the
frontier. To illustrate these four ways of salvaging falsified models, we study
overidentifying restrictions in two instrumental variable models: a homogeneous
effects linear model, and heterogeneous effect models with either binary or
continuous outcomes. In the linear model, we consider the classical
overidentifying restrictions implied when multiple instruments are observed. We
generalize these conditions by considering continuous relaxations of the
classical exclusion restrictions. By sufficiently weakening the assumptions, a
falsified baseline model becomes non-falsified. We obtain analogous results in
the heterogeneous effect models, where we derive identified sets for marginal
distributions of potential outcomes, falsification frontiers, and falsification
adaptive sets under continuous relaxations of the instrument exogeneity
assumptions. We illustrate our results in four different empirical
applications.

arXiv link: http://arxiv.org/abs/1812.11598v3

Econometrics arXiv updated paper (originally submitted: 2018-12-28)

Dynamic Models with Robust Decision Makers: Identification and Estimation

Authors: Timothy M. Christensen

This paper studies identification and estimation of a class of dynamic models
in which the decision maker (DM) is uncertain about the data-generating
process. The DM surrounds a benchmark model that he or she fears is
misspecified by a set of models. Decisions are evaluated under a worst-case
model delivering the lowest utility among all models in this set. The DM's
benchmark model and preference parameters are jointly underidentified. With the
benchmark model held fixed, primitive conditions are established for
identification of the DM's worst-case model and preference parameters. The key
step in the identification analysis is to establish existence and uniqueness of
the DM's continuation value function, allowing for an unbounded state space and
unbounded utilities. To do so, fixed-point results are derived for monotone,
convex operators that act on a Banach space of thin-tailed functions arising
naturally from the structure of the continuation value recursion. The
fixed-point results are quite general; applications to models with learning and
Rust-type dynamic discrete choice models are also discussed. For estimation, a
perturbation result is derived which provides a necessary and sufficient
condition for consistent estimation of continuation values and the worst-case
model. The result also allows convergence rates of estimators to be
characterized. An empirical application studies an endowment economy where the
DM's benchmark model may be interpreted as an aggregate of experts' forecasting
models. The application reveals time-variation in the way the DM
pessimistically distorts benchmark probabilities. Consequences for asset
pricing are explored and connections are drawn with the literature on
macroeconomic uncertainty.

arXiv link: http://arxiv.org/abs/1812.11246v3

Econometrics arXiv paper, submitted: 2018-12-28

Predicting "Design Gaps" in the Market: Deep Consumer Choice Models under Probabilistic Design Constraints

Authors: Alex Burnap, John Hauser

Predicting future successful designs and corresponding market opportunity is
a fundamental goal of product design firms. There is accordingly a long history
of quantitative approaches that aim to capture diverse consumer preferences,
and then translate those preferences to corresponding "design gaps" in the
market. We extend this work by developing a deep learning approach to predict
design gaps in the market. These design gaps represent clusters of designs that
do not yet exist, but are predicted to be both (1) highly preferred by
consumers, and (2) feasible to build under engineering and manufacturing
constraints. This approach is tested on the entire U.S. automotive market
using millions of real purchase records. We retroactively predict design gaps
in the market, and compare predicted design gaps with actual known successful
designs. Our preliminary results give evidence that it may be possible to
predict design
gaps, suggesting this approach has promise for early identification of market
opportunity.

arXiv link: http://arxiv.org/abs/1812.11067v1

Econometrics arXiv updated paper (originally submitted: 2018-12-28)

Decentralization Estimators for Instrumental Variable Quantile Regression Models

Authors: Hiroaki Kaido, Kaspar Wuthrich

The instrumental variable quantile regression (IVQR) model (Chernozhukov and
Hansen, 2005) is a popular tool for estimating causal quantile effects with
endogenous covariates. However, estimation is complicated by the non-smoothness
and non-convexity of the IVQR GMM objective function. This paper shows that the
IVQR estimation problem can be decomposed into a set of conventional quantile
regression sub-problems which are convex and can be solved efficiently. This
reformulation leads to new identification results and to fast, easy to
implement, and tuning-free estimators that do not require the availability of
high-level "black box" optimization routines.

arXiv link: http://arxiv.org/abs/1812.10925v4
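
To convey the flavour of reducing IVQR estimation to conventional quantile
regressions, the following sketch implements the classical grid-search
(inverse quantile regression) idea with statsmodels; it is not the
decentralization algorithm proposed in the paper, and the grid and function
name are illustrative assumptions.

    import numpy as np
    import statsmodels.api as sm

    def ivqr_grid(y, d, x, z, tau=0.5, grid=np.linspace(-2, 2, 81)):
        """For each candidate alpha, run an ordinary quantile regression of
        y - alpha*d on (x, z) and keep the alpha that drives the coefficient
        on the instrument z closest to zero."""
        best_alpha, best_obj = None, np.inf
        for alpha in grid:
            exog = sm.add_constant(np.column_stack([x, z]))
            res = sm.QuantReg(y - alpha * d, exog).fit(q=tau)
            obj = abs(res.params[-1])        # |coefficient on z|
            if obj < best_obj:
                best_alpha, best_obj = alpha, obj
        return best_alpha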

Econometrics arXiv cross-link from General Economics (econ.GN), submitted: 2018-12-27

Semiparametric Difference-in-Differences with Potentially Many Control Variables

Authors: Neng-Chieh Chang

This paper discusses difference-in-differences (DID) estimation when there
exist many control variables, potentially more than the sample size. In this
case, traditional estimation methods, which require a limited number of
variables, do not work. One may consider using statistical or machine learning
(ML) methods. However, by the well-known theory of inference for ML methods
proposed in Chernozhukov et al. (2018), directly applying ML methods to the
conventional semiparametric DID estimators will cause significant bias and make
these DID estimators fail to be $\sqrt{N}$-consistent. This article proposes
three new DID estimators for three different data structures, which are able to
shrink the bias and achieve $\sqrt{N}$-consistency and asymptotic normality
with mean zero when applying ML methods. This leads to straightforward
inferential procedures. In addition, I show that these new estimators have the
small bias property (SBP), meaning that their bias converges to zero faster
than the pointwise bias of the nonparametric estimators on which they are
based.

arXiv link: http://arxiv.org/abs/1812.10846v3

Econometrics arXiv updated paper (originally submitted: 2018-12-27)

Debiasing and $t$-tests for synthetic control inference on average causal effects

Authors: Victor Chernozhukov, Kaspar Wuthrich, Yinchu Zhu

We propose a practical and robust method for making inferences on average
treatment effects estimated by synthetic controls. We develop a $K$-fold
cross-fitting procedure for bias correction. To avoid the difficult estimation
of the long-run variance, inference is based on a self-normalized
$t$-statistic, which has an asymptotically pivotal $t$-distribution. Our
$t$-test is easy to implement, provably robust against misspecification, and
valid with stationary and non-stationary data. It demonstrates an excellent
small sample performance in application-based simulations and performs well
relative to other methods. We illustrate the usefulness of the $t$-test by
revisiting the effect of carbon taxes on emissions.

arXiv link: http://arxiv.org/abs/1812.10820v9

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2018-12-27

How to avoid the zero-power trap in testing for correlation

Authors: David Preinerstorfer

In testing for correlation of the errors in regression models the power of
tests can be very low for strongly correlated errors. This counterintuitive
phenomenon has become known as the "zero-power trap". Despite a considerable
amount of literature devoted to this problem, mainly focusing on its detection,
a convincing solution has not yet been found. In this article we first discuss
theoretical results concerning the occurrence of the zero-power trap
phenomenon. Then, we suggest and compare three ways to avoid it. Given an
initial test that suffers from the zero-power trap, the method we recommend for
practice leads to a modified test whose power converges to one as the
correlation gets very strong. Furthermore, the modified test has approximately
the same power function as the initial test, and thus approximately preserves
all of its optimality properties. We also provide some numerical illustrations
in the context of testing for network generated correlation.

arXiv link: http://arxiv.org/abs/1812.10752v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2018-12-24

Synthetic Difference in Differences

Authors: Dmitry Arkhangelsky, Susan Athey, David A. Hirshberg, Guido W. Imbens, Stefan Wager

We present a new estimator for causal effects with panel data that builds on
insights behind the widely used difference in differences and synthetic control
methods. Relative to these methods we find, both theoretically and empirically,
that this "synthetic difference in differences" estimator has desirable
robustness properties, and that it performs well in settings where the
conventional estimators are commonly used in practice. We study the asymptotic
behavior of the estimator when the systematic part of the outcome model
includes latent unit factors interacted with latent time factors, and we
present conditions for consistency and asymptotic normality.

arXiv link: http://arxiv.org/abs/1812.09970v4
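
Roughly, and in simplified notation, the estimator solves a weighted two-way
fixed effects problem,

\[
\big(\hat\tau,\hat\mu,\hat\alpha,\hat\beta\big)
= \arg\min_{\tau,\mu,\alpha,\beta}
\sum_{i=1}^{N}\sum_{t=1}^{T}
\big(Y_{it}-\mu-\alpha_i-\beta_t-W_{it}\tau\big)^2\,\hat\omega_i\,\hat\lambda_t,
\]

where the unit weights \(\hat\omega_i\) align pre-treatment outcomes of
control and treated units (as in synthetic control) and the time weights
\(\hat\lambda_t\) balance pre- and post-treatment periods.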

Econometrics arXiv paper, submitted: 2018-12-22

Robust Tests for Convergence Clubs

Authors: Luisa Corrado, Melvyn Weeks, Thanasis Stengos, M. Ege Yazgan

In many applications common in testing for convergence, the number of
cross-sectional units is large and the number of time periods is small. In
these situations asymptotic tests based on an omnibus null hypothesis are
characterised by a number of problems. In this paper we propose a multiple
pairwise comparisons method based on a recursive bootstrap to test for
convergence with no prior information on the composition of convergence clubs.
Monte Carlo simulations suggest that our bootstrap-based test performs well to
correctly identify convergence clubs when compared with other similar tests
that rely on asymptotic arguments. Across a potentially large number of
regions, using both cross-country and regional data for the European Union, we
find that the size distortion which afflicts standard tests, and which results
in a bias towards finding less convergence, is ameliorated when we utilise our
bootstrap test.

arXiv link: http://arxiv.org/abs/1812.09518v1

Econometrics arXiv updated paper (originally submitted: 2018-12-22)

Modified Causal Forests for Estimating Heterogeneous Causal Effects

Authors: Michael Lechner

Uncovering the heterogeneity of causal effects of policies and business
decisions at various levels of granularity provides substantial value to
decision makers. This paper develops new estimation and inference procedures
for multiple treatment models in a selection-on-observables framework by
modifying the Causal Forest approach suggested by Wager and Athey (2018) in
several dimensions. The new estimators have desirable theoretical,
computational and practical properties for various aggregation levels of the
causal effects. While an Empirical Monte Carlo study suggests that they
outperform previously suggested estimators, an application to the evaluation of
an active labour market programme shows the value of the new methods for
applied research.

arXiv link: http://arxiv.org/abs/1812.09487v2

Econometrics arXiv updated paper (originally submitted: 2018-12-21)

Functional Sequential Treatment Allocation

Authors: Anders Bredahl Kock, David Preinerstorfer, Bezirgen Veliyev

Consider a setting in which a policy maker assigns subjects to treatments,
observing each outcome before the next subject arrives. Initially, it is
unknown which treatment is best, but the sequential nature of the problem
permits learning about the effectiveness of the treatments. While the
multi-armed-bandit literature has shed much light on the situation when the
policy maker compares the effectiveness of the treatments through their mean,
much less is known about other targets. This is restrictive, because a cautious
decision maker may prefer to target a robust location measure such as a
quantile or a trimmed mean. Furthermore, socio-economic decision making often
requires targeting purpose-specific characteristics of the outcome
distribution, such as its inherent degree of inequality, welfare or poverty. In
the present paper we introduce and study sequential learning algorithms when
the distributional characteristic of interest is a general functional of the
outcome distribution. Minimax expected regret optimality results are obtained
within the subclass of explore-then-commit policies, and for the unrestricted
class of all policies.

arXiv link: http://arxiv.org/abs/1812.09408v8
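
A minimal sketch of an explore-then-commit policy targeting a quantile of the
outcome distribution (the paper studies general functionals and derives
regret-optimal exploration lengths, which this sketch takes as given; the
function name and the `draw` callback are illustrative assumptions):

    import numpy as np

    def explore_then_commit(draw, n_explore, n_total, q=0.25, seed=0):
        """Two-treatment explore-then-commit policy targeting the q-quantile.
        draw(arm, rng) returns one outcome for the given treatment arm."""
        rng = np.random.default_rng(seed)
        samples = {0: [], 1: []}
        for t in range(2 * n_explore):       # alternate during exploration
            arm = t % 2
            samples[arm].append(draw(arm, rng))
        # commit to the arm with the larger empirical q-quantile
        best = max((0, 1), key=lambda a: np.quantile(samples[a], q))
        outcomes = [draw(best, rng) for _ in range(n_total - 2 * n_explore)]
        return best, outcomes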

Econometrics arXiv updated paper (originally submitted: 2018-12-21)

Many Average Partial Effects: with An Application to Text Regression

Authors: Harold D. Chiang

We study estimation, pointwise and simultaneous inference, and confidence
intervals for many average partial effects of lasso Logit. Focusing on
high-dimensional, cluster-sampling environments, we propose a new average
partial effect estimator and explore its asymptotic properties. Practical
penalty choices compatible with our asymptotic theory are also provided. The
proposed estimator allows for valid inference without requiring an oracle
property.
We provide easy-to-implement algorithms for cluster-robust high-dimensional
hypothesis testing and construction of simultaneously valid confidence
intervals using a multiplier cluster bootstrap. We apply the proposed
algorithms to the text regression model of Wu (2018) to examine the presence of
gendered language on the internet.

arXiv link: http://arxiv.org/abs/1812.09397v5
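
For a continuous regressor \(j\), the average partial effect of a logit with
coefficient estimate \(\hat\beta\) takes the standard form

\[
\widehat{\mathrm{APE}}_j = \frac{1}{n}\sum_{i=1}^{n}
\Lambda(x_i'\hat\beta)\big\{1-\Lambda(x_i'\hat\beta)\big\}\,\hat\beta_j,
\]

where \(\Lambda\) is the logistic CDF; the paper studies many such effects
simultaneously with cluster-robust, high-dimensional inference.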

Econometrics arXiv updated paper (originally submitted: 2018-12-21)

Selection and the Distribution of Female Hourly Wages in the U.S.

Authors: Iván Fernández-Val, Franco Peracchi, Aico van Vuuren, Francis Vella

We analyze the role of selection bias in generating the changes in the
observed distribution of female hourly wages in the United States using CPS
data for the years 1975 to 2020. We account for the selection bias from the
employment decision by modeling the distribution of the number of working hours
and estimating a nonseparable model of wages. We decompose changes in the wage
distribution into composition, structural and selection effects. Composition
effects have increased wages at all quantiles while the impact of the
structural effects varies by time period and quantile. Changes in the role of
selection only appear at the lower quantiles of the wage distribution. The
evidence suggests that there is positive selection in the 1970s which
diminishes until the late 1990s. This reduces wages at lower quantiles and
increases wage inequality. Post 2000 there appears to be an increase in
positive sorting which reduces the selection effects on wage inequality.

arXiv link: http://arxiv.org/abs/1901.00419v5

Econometrics arXiv updated paper (originally submitted: 2018-12-21)

Multivariate Fractional Components Analysis

Authors: Tobias Hartl, Roland Weigand

We propose a setup for fractionally cointegrated time series which is
formulated in terms of latent integrated and short-memory components. It
accommodates nonstationary processes with different fractional orders and
cointegration of different strengths and is applicable in high-dimensional
settings. In an application to realized covariance matrices, we find that
orthogonal short- and long-memory components provide a reasonable fit and
competitive out-of-sample performance compared to several competing methods.

arXiv link: http://arxiv.org/abs/1812.09149v2

Econometrics arXiv updated paper (originally submitted: 2018-12-21)

Approximate State Space Modelling of Unobserved Fractional Components

Authors: Tobias Hartl, Roland Weigand

We propose convenient inferential methods for potentially nonstationary
multivariate unobserved components models with fractional integration and
cointegration. Based on finite-order ARMA approximations in the state space
representation, maximum likelihood estimation can make use of the EM algorithm
and related techniques. The approximation outperforms the frequently used
autoregressive or moving average truncation, both in terms of computational
costs and with respect to approximation quality. Monte Carlo simulations reveal
good estimation properties of the proposed methods for processes of different
complexity and dimension.

arXiv link: http://arxiv.org/abs/1812.09142v3

Econometrics arXiv cross-link from Quantitative Finance – Statistical Finance (q-fin.ST), submitted: 2018-12-21

Econometric modelling and forecasting of intraday electricity prices

Authors: Michał Narajewski, Florian Ziel

In the following paper, we analyse the ID$_3$-Price in the German Intraday
Continuous electricity market using an econometric time series model. A
multivariate approach is conducted for hourly and quarter-hourly products
separately. We estimate the model using lasso and elastic net techniques and
perform an out-of-sample, very short-term forecasting study. The model's
performance is compared with benchmark models and is discussed in detail.
Forecasting results provide new insights to the German Intraday Continuous
electricity market regarding its efficiency and to the ID$_3$-Price behaviour.

arXiv link: http://arxiv.org/abs/1812.09081v2

Econometrics arXiv cross-link from Quantitative Finance – Statistical Finance (q-fin.ST), submitted: 2018-12-20

Multifractal cross-correlations between the World Oil and other Financial Markets in 2012-2017

Authors: Marcin Wątorek, Stanisław Drożdż, Paweł Oświȩcimka, Marek Stanuszek

Statistical and multiscaling characteristics of WTI Crude Oil prices
expressed in US dollar in relation to the most traded currencies as well as to
gold futures and to the E-mini S&P500 futures prices on 5 min intra-day
recordings in the period January 2012 - December 2017 are studied. It is shown
that in most of the cases the tails of return distributions of the considered
financial instruments follow the inverse cubic power law. The only exception is
the Russian ruble for which the distribution tail is heavier and scales with
the exponent close to 2. From the perspective of multiscaling the analysed time
series reveal the multifractal organization with the left-sided asymmetry of
the corresponding singularity spectra. Moreover, all the considered financial
instruments appear to be multifractally cross-correlated with oil, especially
at the level of medium-size fluctuations, as shown by the multifractal
cross-correlation analysis (MFCCA) and the detrended cross-correlation
coefficient $\rho_q$. The degree of such cross-correlations, however, varies
among the financial instruments. The strongest ties to oil characterize the
currencies of oil-extracting countries. The strength of this multifractal
coupling also appears to depend on the oil market trend. In the analysed time
period the level of cross-correlations systematically increases during the
bear phase on the oil market and saturates after the trend reversal in the
first half of 2016. The same methodology is also applied to identify possible
causal relations between the considered observables. Searching for a related
asymmetry in the information flow mediating the cross-correlations indicates
that it was the oil price that led the Russian ruble over the time period
considered here, rather than vice versa.

arXiv link: http://arxiv.org/abs/1812.08548v2

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2018-12-20

A Primal-dual Learning Algorithm for Personalized Dynamic Pricing with an Inventory Constraint

Authors: Ningyuan Chen, Guillermo Gallego

We consider the problem of a firm seeking to use personalized pricing to sell
an exogenously given stock of a product over a finite selling horizon to
different consumer types. We assume that the type of an arriving consumer can
be observed but the demand function associated with each type is initially
unknown. The firm sets personalized prices dynamically for each type and
attempts to maximize the revenue over the season. We provide a learning
algorithm that is near-optimal when the demand and capacity scale in
proportion. The algorithm utilizes the primal-dual formulation of the problem
and learns the dual optimal solution explicitly. It allows the algorithm to
overcome the curse of dimensionality (the rate of regret is independent of the
number of types) and sheds light on novel algorithmic designs for learning
problems with resource constraints.

arXiv link: http://arxiv.org/abs/1812.09234v3

Econometrics arXiv updated paper (originally submitted: 2018-12-16)

Fuzzy Difference-in-Discontinuities: Identification Theory and Application to the Affordable Care Act

Authors: Hector Galindo-Silva, Nibene Habib Some, Guy Tchuente

This paper explores the use of a fuzzy regression discontinuity design where
multiple treatments are applied at the threshold. The identification results
show that, under the very strong assumption that the change in the probability
of treatment at the cutoff is equal across treatments, a
difference-in-discontinuities estimator identifies the treatment effect of
interest. The point estimates of the treatment effect using a simple fuzzy
difference-in-discontinuities design are biased if the change in the
probability of a treatment applying at the cutoff differs across treatments.
Modifications of the fuzzy difference-in-discontinuities approach that rely on
milder assumptions are also proposed. Our results suggest caution is needed
when applying before-and-after methods in the presence of fuzzy
discontinuities. Using data from the National Health Interview Survey, we apply
this new identification strategy to evaluate the causal effect of the
Affordable Care Act (ACA) on older Americans' health care access and
utilization.

arXiv link: http://arxiv.org/abs/1812.06537v3

Econometrics arXiv updated paper (originally submitted: 2018-12-16)

What Is the Value Added by Using Causal Machine Learning Methods in a Welfare Experiment Evaluation?

Authors: Anthony Strittmatter

Recent studies have proposed causal machine learning (CML) methods to
estimate conditional average treatment effects (CATEs). In this study, I
investigate whether CML methods add value compared to conventional CATE
estimators by re-evaluating Connecticut's Jobs First welfare experiment. This
experiment entails a mix of positive and negative work incentives. Previous
studies show that it is hard to tackle the effect heterogeneity of Jobs First
by means of CATEs. I report evidence that CML methods can provide support for
the theoretical labor supply predictions. Furthermore, I document reasons why
some conventional CATE estimators fail and discuss the limitations of CML
methods.

arXiv link: http://arxiv.org/abs/1812.06533v3

Econometrics arXiv updated paper (originally submitted: 2018-12-11)

Closing the U.S. gender wage gap requires understanding its heterogeneity

Authors: Philipp Bach, Victor Chernozhukov, Martin Spindler

In 2016, the majority of full-time employed women in the U.S. earned
significantly less than comparable men. The extent to which women were affected
by gender inequality in earnings, however, depended greatly on socio-economic
characteristics, such as marital status or educational attainment. In this
paper, we analyzed data from the 2016 American Community Survey using a
high-dimensional wage regression and applying double lasso to quantify
heterogeneity in the gender wage gap. We found that the gap varied
substantially across women and was driven primarily by marital status, having
children at home, race, occupation, industry, and educational attainment. We
recommend that policy makers use these insights to design policies that will
reduce discrimination and unequal pay more effectively.

arXiv link: http://arxiv.org/abs/1812.04345v2

Econometrics arXiv paper, submitted: 2018-12-09

A supreme test for periodic explosive GARCH

Authors: Stefan Richter, Weining Wang, Wei Biao Wu

We develop a uniform test for detecting and dating explosive behavior of a
strictly stationary GARCH$(r,s)$ (generalized autoregressive conditional
heteroskedasticity) process. Namely, we test the null hypothesis of a globally
stable GARCH process with constant parameters against an alternative where
there is an 'abnormal' period with changed parameter values. During this
period, the change may lead to an explosive behavior of the volatility process.
It is assumed that both the magnitude and the timing of the breaks are unknown.
We develop a double supreme test for the existence of a break, and then provide
an algorithm to identify the period of change. Our theoretical results hold
under mild moment assumptions on the innovations of the GARCH process.
Technically, the existing properties for the QMLE in the GARCH model need to be
reinvestigated to hold uniformly over all possible periods of change. The key
results involve a uniform weak Bahadur representation for the estimated
parameters, which leads to weak convergence of the test statistic to the
supremum of a Gaussian process. In simulations we show that the test has good
size and power for reasonably large time series lengths. We apply the test to
Apple asset returns and Bitcoin returns.

arXiv link: http://arxiv.org/abs/1812.03475v1

Econometrics arXiv updated paper (originally submitted: 2018-12-06)

Improved Inference on the Rank of a Matrix

Authors: Qihui Chen, Zheng Fang

This paper develops a general framework for conducting inference on the rank
of an unknown matrix $\Pi_0$. A defining feature of our setup is the null
hypothesis of the form $\mathrm{H}_0: \mathrm{rank}(\Pi_0)\le r$. The problem
is of first order importance because the previous literature focuses on
$\mathrm{H}_0': \mathrm{rank}(\Pi_0)= r$ by implicitly assuming away
$\mathrm{rank}(\Pi_0)<r$, which may lead to invalid rank tests due to
over-rejections. In particular, we show that limiting distributions of test
statistics under $\mathrm{H}_0'$ may not stochastically dominate those under
$\mathrm{rank}(\Pi_0)<r$. A multiple test on the nulls
$\mathrm{rank}(\Pi_0)=0,\ldots,r$, though valid, may be substantially
conservative. We employ a test statistic whose limiting distributions under
$\mathrm{H}_0$ are highly nonstandard due to the inherently irregular nature
of the problem, and then construct bootstrap critical values that deliver size
control and improved power. Since our procedure relies on a tuning parameter,
a two-step procedure is designed to mitigate concerns about this nuisance. We
additionally argue that our setup is also important for estimation. We
illustrate the empirical relevance of our results by testing identification in
linear IV models that allow for clustered data and by conducting inference on
sorting dimensions in a two-sided matching model with transferable utility.

arXiv link: http://arxiv.org/abs/1812.02337v2

Econometrics arXiv updated paper (originally submitted: 2018-12-06)

Identifying the Effect of Persuasion

Authors: Sung Jae Jun, Sokbae Lee

This paper examines a commonly used measure of persuasion whose precise
interpretation has been obscure in the literature. By using the potential
outcome framework, we define the causal persuasion rate by a proper conditional
probability of taking the action of interest with a persuasive message
conditional on not taking the action without the message. We then formally
study identification under empirically relevant data scenarios and show that
the commonly adopted measure generally does not estimate, but often overstates,
the causal rate of persuasion. We discuss several new parameters of interest
and provide practical methods for causal inference.

arXiv link: http://arxiv.org/abs/1812.02276v6
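
In potential outcome notation, the causal persuasion rate described above can
be written as

\[
\theta = \Pr\{Y(1)=1 \mid Y(0)=0\},
\]

the probability of taking the action of interest when exposed to the message
among those who would not take it without the message.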

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2018-12-04

Necessary and Probably Sufficient Test for Finding Valid Instrumental Variables

Authors: Amit Sharma

Can instrumental variables be found from data? While instrumental variable
(IV) methods are widely used to identify causal effects, testing their validity
from observed data remains a challenge. This is because validity of an IV
depends on two assumptions, exclusion and as-if-random, that are largely
believed to be untestable from data. In this paper, we show that under certain
conditions, testing for instrumental variables is possible. We build upon prior
work on necessary tests to derive a test that characterizes the odds of being a
valid instrument, thus yielding the name "necessary and probably sufficient".
The test works by defining the class of invalid-IV and valid-IV causal models
as Bayesian generative models and comparing their marginal likelihood based on
observed data. When all variables are discrete, we also provide a method to
efficiently compute these marginal likelihoods.
We evaluate the test on an extensive set of simulations for binary data,
inspired by an open problem for IV testing proposed in past work. We find that
the test is most powerful when an instrument satisfies monotonicity (its
effect on treatment is either non-decreasing or non-increasing) and has
moderate-to-weak
strength; incidentally, such instruments are commonly used in observational
studies. Among as-if-random and exclusion, it detects exclusion violations with
higher power. Applying the test to IVs from two seminal studies on instrumental
variables and five recent studies from the American Economic Review shows that
many of the instruments may be flawed, at least when all variables are
discretized. The proposed test opens the possibility of data-driven validation
and search for instrumental variables.

arXiv link: http://arxiv.org/abs/1812.01412v1

Econometrics arXiv paper, submitted: 2018-12-04

Column Generation Algorithms for Nonparametric Analysis of Random Utility Models

Authors: Bart Smeulders

Kitamura and Stoye (2014) develop a nonparametric test for linear inequality
constraints when these are represented as vertices of a polyhedron instead
of its faces. They implement this test for an application to nonparametric
tests of Random Utility Models. As they note in their paper, testing such
models is computationally challenging. In this paper, we develop and implement
more efficient algorithms, based on column generation, to carry out the test.
These improved algorithms allow us to tackle larger datasets.

arXiv link: http://arxiv.org/abs/1812.01400v1

Econometrics arXiv updated paper (originally submitted: 2018-11-29)

Doubly Robust Difference-in-Differences Estimators

Authors: Pedro H. C. Sant'Anna, Jun B. Zhao

This article proposes doubly robust estimators for the average treatment
effect on the treated (ATT) in difference-in-differences (DID) research
designs. In contrast to alternative DID estimators, the proposed estimators are
consistent if either (but not necessarily both) the propensity score or the
outcome regression working model is correctly specified. We also derive the
semiparametric efficiency bound for the ATT in DID designs when either panel or
repeated cross-section data are available, and show that our proposed
estimators attain the semiparametric efficiency bound when the working models
are correctly specified. Furthermore, we quantify the potential efficiency
gains of having access to panel data instead of repeated cross-section data.
Finally, by paying particular attention to the estimation method used to
estimate the nuisance parameters, we show that one can sometimes construct
doubly robust DID estimators for the ATT that are also doubly robust for
inference. Simulation studies and an empirical application illustrate the
desirable finite-sample performance of the proposed estimators. Open-source
software for implementing the proposed policy evaluation tools is available.

arXiv link: http://arxiv.org/abs/1812.01723v3
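
A simplified sketch of a doubly robust ATT estimator applied to two-period
panel data is given below (logistic propensity score, linear regression of
the outcome change among comparison units). The function name and learner
choices are illustrative assumptions, and the paper's estimators and inference
procedures are more general.

    import numpy as np
    from sklearn.linear_model import LogisticRegression, LinearRegression

    def dr_did_panel(y0, y1, d, X):
        """Doubly robust ATT for two-period panel data: combines inverse
        propensity weights for comparison units with an outcome regression
        of the change dy on covariates among the untreated. d is 0/1."""
        dy = y1 - y0
        ps = LogisticRegression(max_iter=1000).fit(X, d).predict_proba(X)[:, 1]
        mu0 = LinearRegression().fit(X[d == 0], dy[d == 0]).predict(X)
        w1 = d / d.mean()                      # treated weights
        w0 = ps * (1 - d) / (1 - ps)           # reweighted comparison units
        w0 = w0 / w0.mean()
        return float(np.mean((w1 - w0) * (dy - mu0)))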

Econometrics arXiv updated paper (originally submitted: 2018-11-28)

Distribution Regression with Sample Selection, with an Application to Wage Decompositions in the UK

Authors: Victor Chernozhukov, Iván Fernández-Val, Siyi Luo

We develop a distribution regression model under endogenous sample selection.
This model is a semi-parametric generalization of the Heckman selection model.
It accommodates much richer effects of the covariates on outcome distribution
and patterns of heterogeneity in the selection process, and allows for drastic
departures from the Gaussian error structure, while maintaining the same level
of tractability as the classical model. The model applies to continuous,
discrete
and mixed outcomes. We provide identification, estimation, and inference
methods, and apply them to obtain wage decomposition for the UK. Here we
decompose the difference between the male and female wage distributions into
composition, wage structure, selection structure, and selection sorting
effects. After controlling for endogenous employment selection, we still find
a substantial gender wage gap -- ranging from 21% to 40% throughout the
(latent) offered wage distribution -- that is not explained by composition. We
also uncover
positive sorting for single men and negative sorting for married women that
accounts for a substantive fraction of the gender wage gap at the top of the
distribution.

arXiv link: http://arxiv.org/abs/1811.11603v6

Econometrics arXiv updated paper (originally submitted: 2018-11-28)

Simple Local Polynomial Density Estimators

Authors: Matias D. Cattaneo, Michael Jansson, Xinwei Ma

This paper introduces an intuitive and easy-to-implement nonparametric
density estimator based on local polynomial techniques. The estimator is fully
boundary adaptive and automatic, but does not require pre-binning or any other
transformation of the data. We study the main asymptotic properties of the
estimator, and use these results to provide principled estimation, inference,
and bandwidth selection methods. As a substantive application of our results,
we develop a novel discontinuity in density testing procedure, an important
problem in regression discontinuity designs and other program evaluation
settings. An illustrative empirical application is given. Two companion Stata
and R software packages are provided.

arXiv link: http://arxiv.org/abs/1811.11512v2

Econometrics arXiv cross-link from General Economics (econ.GN), submitted: 2018-11-27

Simulation of Stylized Facts in Agent-Based Computational Economic Market Models

Authors: Maximilian Beikirch, Simon Cramer, Martin Frank, Philipp Otte, Emma Pabich, Torsten Trimborn

We study the qualitative and quantitative appearance of stylized facts in
several agent-based computational economic market (ABCEM) models. We perform
our simulations with the SABCEMM (Simulator for Agent-Based Computational
Economic Market Models) tool recently introduced by the authors (Trimborn et
al. 2019). Furthermore, we present novel ABCEM models created by recombining
existing models and study them with respect to stylized facts as well. This can
be efficiently performed by the SABCEMM tool thanks to its object-oriented
software design. The code is available on GitHub (Trimborn et al. 2018), such
that all results can be reproduced by the reader.

arXiv link: http://arxiv.org/abs/1812.02726v2

Econometrics arXiv paper, submitted: 2018-11-26

A Residual Bootstrap for Conditional Expected Shortfall

Authors: Alexander Heinemann, Sean Telg

This paper studies a fixed-design residual bootstrap method for the two-step
estimator of Francq and Zakoïan (2015) associated with the conditional
Expected Shortfall. For a general class of volatility models the bootstrap is
shown to be asymptotically valid under the conditions imposed by Beutner et al.
(2018). A simulation study is conducted revealing that the average coverage
rates are satisfactory for most settings considered. There is no clear
evidence favouring any of the three proposed bootstrap intervals. This
contrasts with the results in Beutner et al. (2018) for the VaR, for which the
reversed-tails interval has superior performance.

arXiv link: http://arxiv.org/abs/1811.11557v1

Econometrics arXiv updated paper (originally submitted: 2018-11-26)

Estimation of a Heterogeneous Demand Function with Berkson Errors

Authors: Richard Blundell, Joel Horowitz, Matthias Parey

Berkson errors are commonplace in empirical microeconomics. In consumer
demand this form of measurement error occurs when the price an individual pays
is measured by the (weighted) average price paid by individuals in a specified
group (e.g., a county), rather than the true transaction price. We show the
importance of such measurement errors for the estimation of demand in a setting
with nonseparable unobserved heterogeneity. We develop a consistent estimator
using external information on the true distribution of prices. Examining the
demand for gasoline in the U.S., we document substantial within-market price
variability, and show that there are significant spatial differences in the
magnitude of Berkson errors across regions of the U.S. Accounting for Berkson
errors is found to be quantitatively important for estimating price effects and
for welfare calculations. Imposing the Slutsky shape constraint greatly reduces
the sensitivity to Berkson errors.

arXiv link: http://arxiv.org/abs/1811.10690v2

Econometrics arXiv paper, submitted: 2018-11-26

LM-BIC Model Selection in Semiparametric Models

Authors: Ivan Korolev

This paper studies model selection in semiparametric econometric models. It
develops a consistent series-based model selection procedure based on a
Bayesian Information Criterion (BIC) type criterion to select between several
classes of models. The procedure selects a model by minimizing the
semiparametric Lagrange Multiplier (LM) type test statistic from Korolev (2018)
but additionally rewards simpler models. The paper also develops consistent
upward testing (UT) and downward testing (DT) procedures based on the
semiparametric LM type specification test. The proposed semiparametric LM-BIC
and UT procedures demonstrate good performance in simulations. To illustrate
the use of these semiparametric model selection procedures, I apply them to the
parametric and semiparametric gasoline demand specifications from Yatchew and
No (2001). The LM-BIC procedure selects the semiparametric specification that
is nonparametric in age but parametric in all other variables, which is in line
with the conclusions in Yatchew and No (2001). The results of the UT and DT
procedures heavily depend on the choice of tuning parameters and assumptions
about the model errors.

arXiv link: http://arxiv.org/abs/1811.10676v1

Econometrics arXiv updated paper (originally submitted: 2018-11-25)

Generalized Dynamic Factor Models and Volatilities: Consistency, rates, and prediction intervals

Authors: Matteo Barigozzi, Marc Hallin

Volatilities, in high-dimensional panels of economic time series with a
dynamic factor structure on the levels or returns, typically also admit a
dynamic factor decomposition. We consider a two-stage dynamic factor model
method recovering the common and idiosyncratic components of both levels and
log-volatilities. Specifically, in a first estimation step, we extract the
common and idiosyncratic shocks for the levels, from which a log-volatility
proxy is computed. In a second step, we estimate a dynamic factor model, which
is equivalent to a multiplicative factor structure for volatilities, for the
log-volatility panel. By exploiting this two-stage factor approach, we build
one-step-ahead conditional prediction intervals for large $n \times T$ panels
of returns. Those intervals are based on empirical quantiles, not on
conditional variances; they can be either equal- or unequal-tailed. We provide
uniform consistency and consistency rates results for the proposed estimators
as both $n$ and $T$ tend to infinity. We study the finite-sample properties of
our estimators by means of Monte Carlo simulations. Finally, we apply our
methodology to a panel of asset returns belonging to the S&P100 index in order
to compute one-step-ahead conditional prediction intervals for the period
2006-2013. A comparison with the componentwise GARCH benchmark (which does not
take advantage of cross-sectional information) demonstrates the superiority of
our approach, which is genuinely multivariate (and high-dimensional),
nonparametric, and model-free.

arXiv link: http://arxiv.org/abs/1811.10045v2

Econometrics arXiv updated paper (originally submitted: 2018-11-24)

Identification of Treatment Effects under Limited Exogenous Variation

Authors: Whitney K. Newey, Sami Stouli

Multidimensional heterogeneity and endogeneity are important features of a
wide class of econometric models. With control variables to correct for
endogeneity, nonparametric identification of treatment effects requires strong
support conditions. To alleviate this requirement, we consider varying
coefficients specifications for the conditional expectation function of the
outcome given a treatment and control variables. This function is expressed as
a linear combination of either known functions of the treatment, with unknown
coefficients varying with the controls, or known functions of the controls,
with unknown coefficients varying with the treatment. We use this modeling
approach to give necessary and sufficient conditions for identification of
average treatment effects. A sufficient condition for identification is
conditional nonsingularity, that the second moment matrix of the known
functions given the variable in the varying coefficients is nonsingular with
probability one. For known treatment functions with sufficient variation, we
find that triangular models with a discrete instrument cannot identify average
treatment effects when the number of support points for the instrument is less
than the number of coefficients. For known functions of the controls, we find
that average treatment effects can be identified in general nonseparable
triangular models with binary or discrete instruments. We extend our analysis
to flexible models of increasing dimension and relate conditional
nonsingularity to the full support condition of Imbens and Newey (2009),
thereby embedding semi- and non-parametric identification into a common
framework.

arXiv link: http://arxiv.org/abs/1811.09837v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2018-11-23

High Dimensional Classification through $\ell_0$-Penalized Empirical Risk Minimization

Authors: Le-Yu Chen, Sokbae Lee

We consider a high dimensional binary classification problem and construct a
classification procedure by minimizing the empirical misclassification risk
with a penalty on the number of selected features. We derive non-asymptotic
probability bounds on the estimated sparsity as well as on the excess
misclassification risk. In particular, we show that our method yields a sparse
solution whose $\ell_0$-norm can be arbitrarily close to the true sparsity with high
probability and obtain the rates of convergence for the excess
misclassification risk. The proposed procedure is implemented via the method of
mixed integer linear programming. Its numerical performance is illustrated in
Monte Carlo experiments.

arXiv link: http://arxiv.org/abs/1811.09540v1

Econometrics arXiv updated paper (originally submitted: 2018-11-21)

Model instability in predictive exchange rate regressions

Authors: Niko Hauzenberger, Florian Huber

In this paper we aim to improve existing empirical exchange rate models by
accounting for uncertainty with respect to the underlying structural
representation. Within a flexible Bayesian non-linear time series framework,
our modeling approach assumes that different regimes are characterized by
commonly used structural exchange rate models, with their evolution being
driven by a Markov process. We assume a time-varying transition probability
matrix with transition probabilities depending on a measure of the monetary
policy stance of the central banks in the home and foreign countries. We apply
this model to a set of eight exchange rates against the US dollar. In a
forecasting exercise, we show that model evidence varies over time and that a
modeling approach that takes this empirical evidence seriously yields
improvements in the accuracy of density forecasts for most currency pairs
considered.

arXiv link: http://arxiv.org/abs/1811.08818v2

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2018-11-21

The value of forecasts: Quantifying the economic gains of accurate quarter-hourly electricity price forecasts

Authors: Christopher Kath, Florian Ziel

We propose a multivariate elastic net regression forecast model for German
quarter-hourly electricity spot markets. While the literature is diverse on
day-ahead prediction approaches, both the intraday continuous and intraday
call-auction prices have not been studied intensively with a clear focus on
predictive power. Besides electricity price forecasting, we check for the
impact of early day-ahead (DA) EXAA prices on intraday forecasts. Another
novelty of this paper is the complementary discussion of economic benefits. A
precise estimate is worthless if it cannot be utilized. We elaborate on
possible trading decisions based on our forecasting scheme and analyze their
monetary
effects. We find that even simple electricity trading strategies can lead to
substantial economic impact if combined with a decent forecasting technique.

arXiv link: http://arxiv.org/abs/1811.08604v1

Econometrics arXiv paper, submitted: 2018-11-20

Bayesian Inference for Structural Vector Autoregressions Identified by Markov-Switching Heteroskedasticity

Authors: Helmut Lütkepohl, Tomasz Woźniak

In this study, Bayesian inference is developed for structural vector
autoregressive models in which the structural parameters are identified via
Markov-switching heteroskedasticity. In such a model, restrictions that are
just-identifying in the homoskedastic case become over-identifying and can be
tested. A set of parametric restrictions is derived under which the structural
matrix is globally or partially identified and a Savage-Dickey density ratio is
used to assess the validity of the identification conditions. The latter is
facilitated by analytical derivations that make the computations fast and
numerical standard errors small. As an empirical example, monetary models are
compared using heteroskedasticity as an additional device for identification.
The empirical results support models with money in the interest rate reaction
function.

arXiv link: http://arxiv.org/abs/1811.08167v1

Econometrics arXiv updated paper (originally submitted: 2018-11-20)

Complete Subset Averaging with Many Instruments

Authors: Seojeong Lee, Youngki Shin

We propose a two-stage least squares (2SLS) estimator whose first stage is
the equal-weighted average over a complete subset with $k$ instruments among
$K$ available, which we call the complete subset averaging (CSA) 2SLS. The
approximate mean squared error (MSE) is derived as a function of the subset
size $k$ by the Nagar (1959) expansion. The subset size is chosen by minimizing
the sample counterpart of the approximate MSE. We show that this method
achieves the asymptotic optimality among the class of estimators with different
subset sizes. To deal with averaging over a growing set of irrelevant
instruments, we generalize the approximate MSE to find that the optimal $k$ is
larger than it would be otherwise. An extensive simulation experiment shows that the
CSA-2SLS estimator outperforms the alternative estimators when instruments are
correlated. As an empirical illustration, we estimate the logistic demand
function in Berry, Levinsohn, and Pakes (1995) and find the CSA-2SLS estimate
is better supported by economic theory than the alternative estimates.
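
For readers who want the gist of the procedure, here is a minimal sketch of the complete-subset-averaging first stage under simplifying assumptions (a single endogenous regressor, demeaned variables, no included exogenous controls, and a user-supplied subset size rather than the paper's approximate-MSE choice of $k$).

```python
# Sketch of complete subset averaging (CSA) 2SLS: average the first-stage
# fitted values over every subset of k instruments out of K, then use the
# averaged fitted values in the second stage. The paper selects k by
# minimizing an approximate-MSE criterion, which is not reproduced here.
from itertools import combinations
import numpy as np

def csa_2sls(y, x, Z, k):
    """y: (n,) outcome, x: (n,) endogenous regressor, Z: (n, K) instruments."""
    n, K = Z.shape
    fitted = np.zeros(n)
    subsets = list(combinations(range(K), k))
    for S in subsets:
        Zs = Z[:, S]
        # first-stage OLS of x on the subset of instruments
        coef, *_ = np.linalg.lstsq(Zs, x, rcond=None)
        fitted += Zs @ coef
    fitted /= len(subsets)              # equal-weighted average over subsets
    # second stage: IV estimate using the averaged fitted values as instrument
    beta = (fitted @ y) / (fitted @ x)
    return beta
```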

arXiv link: http://arxiv.org/abs/1811.08083v6

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2018-11-19

Optimal Iterative Threshold-Kernel Estimation of Jump Diffusion Processes

Authors: José E. Figueroa-López, Cheng Li, Jeffrey Nisen

In this paper, we propose a new threshold-kernel jump-detection method for
jump-diffusion processes, which iteratively applies thresholding and kernel
methods in an approximately optimal way to achieve improved finite-sample
performance. We use the expected number of jump misclassifications as the
objective function to optimally select the threshold parameter of the jump
detection scheme. We prove that the objective function is quasi-convex and
obtain a new second-order infill approximation of the optimal threshold in
closed form. The approximate optimal threshold depends not only on the spot
volatility, but also on the jump intensity and the value of the jump density at
the origin. Estimation methods for these quantities are then developed, where
the spot volatility is estimated by a kernel estimator with thresholding and
the value of the jump density at the origin is estimated by a density kernel
estimator applied to those increments deemed to contain jumps by the chosen
thresholding criterion. Due to the interdependency between the model parameters
and the approximate optimal estimators built to estimate them, a type of
iterative fixed-point algorithm is developed to implement them. Simulation
studies for a prototypical stochastic volatility model show that it is not only
feasible to implement the higher-order local optimal threshold scheme but also
that this is superior to those based only on the first order approximation
and/or on average values of the parameters over the estimation time period.
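
The iterative flavor of the scheme can be conveyed with a heavily simplified fixed-point loop: threshold the increments, re-estimate volatility from the increments deemed continuous, and update the threshold. The constant-volatility threshold rule below is only illustrative; the paper's optimal threshold also involves the jump intensity and the jump density at the origin, which this sketch omits.

```python
# Simplified fixed-point iteration in the spirit of threshold-based jump
# detection (constant volatility, generic truncation level c*sigma*sqrt(dt)).
import numpy as np

def iterative_threshold(increments, dt, c=3.0, n_iter=20):
    sigma = np.std(increments) / np.sqrt(dt)       # crude initial volatility
    for _ in range(n_iter):
        threshold = c * sigma * np.sqrt(dt)        # simple truncation level
        is_jump = np.abs(increments) > threshold
        cont = increments[~is_jump]                # increments deemed continuous
        sigma = np.sqrt(np.sum(cont ** 2) / (len(cont) * dt))
    return threshold, is_jump, sigma
```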

arXiv link: http://arxiv.org/abs/1811.07499v4

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2018-11-18

MALTS: Matching After Learning to Stretch

Authors: Harsh Parikh, Cynthia Rudin, Alexander Volfovsky

We introduce a flexible framework that produces high-quality almost-exact
matches for causal inference. Most prior work in matching uses ad-hoc distance
metrics, often leading to poor quality matches, particularly when there are
irrelevant covariates. In this work, we learn an interpretable distance metric
for matching, which leads to substantially higher quality matches. The learned
distance metric stretches the covariate space according to each covariate's
contribution to outcome prediction: this stretching means that mismatches on
important covariates carry a larger penalty than mismatches on irrelevant
covariates. Our ability to learn flexible distance metrics leads to matches
that are interpretable and useful for the estimation of conditional average
treatment effects.
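
As a toy illustration of the "stretching" idea only: rescale each covariate by a weight reflecting its importance for outcome prediction before nearest-neighbor matching. MALTS learns its distance metric from a matching-specific objective; the absolute regression coefficients used as weights below are a stand-in, not the paper's method.

```python
# Toy matching in a "stretched" covariate space: covariates are rescaled by
# importance weights before nearest-neighbor matching of treated to controls.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import NearestNeighbors

def stretched_matches(X, y, treated, n_neighbors=5):
    """treated: boolean mask; returns indices into the control subsample."""
    w = np.abs(LinearRegression().fit(X, y).coef_)   # stand-in importance weights
    Xs = X * w                                       # stretch the covariate space
    nn = NearestNeighbors(n_neighbors=n_neighbors).fit(Xs[~treated])
    _, idx = nn.kneighbors(Xs[treated])              # control matches per treated unit
    return idx
```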

arXiv link: http://arxiv.org/abs/1811.07415v9

Econometrics arXiv paper, submitted: 2018-11-13

Estimation of High-Dimensional Seemingly Unrelated Regression Models

Authors: Lidan Tan, Khai X. Chiong, Hyungsik Roger Moon

In this paper, we investigate seemingly unrelated regression (SUR) models
that allow the number of equations (N) to be large, and to be comparable to the
number of observations in each equation (T). It is well known in the
literature that the conventional SUR estimator, for example, the generalized
least squares (GLS) estimator of Zellner (1962), does not perform well. As the
main contribution of the paper, we propose a new feasible GLS estimator called
the feasible graphical lasso (FGLasso) estimator. For a feasible implementation
of the GLS estimator, we use the graphical lasso estimation of the precision
matrix (the inverse of the covariance matrix of the equation system errors)
assuming that the underlying unknown precision matrix is sparse. We derive
asymptotic theories of the new estimator and investigate its finite sample
properties via Monte-Carlo simulations.
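
A minimal sketch of the two-step logic, assuming a modest system size (the explicit Kronecker product below is purely illustrative and would not scale): equation-by-equation OLS residuals, a graphical-lasso estimate of the error precision matrix, then feasible GLS on the stacked system. The penalty level `alpha` is a user-chosen tuning parameter.

```python
# Feasible GLS for a SUR system with a sparse precision matrix estimated by
# the graphical lasso. Xs: list of N design matrices (T x k_i); ys: list of N
# outcome vectors of length T.
import numpy as np
from scipy.linalg import block_diag
from sklearn.covariance import GraphicalLasso

def fglasso_sur(Xs, ys, alpha=0.1):
    T = ys[0].shape[0]
    # step 1: equation-by-equation OLS residuals, stacked as (T, N)
    resid = np.column_stack([
        y - X @ np.linalg.lstsq(X, y, rcond=None)[0] for X, y in zip(Xs, ys)
    ])
    # step 2: sparse precision matrix of the cross-equation errors
    Theta = GraphicalLasso(alpha=alpha).fit(resid).precision_
    # step 3: feasible GLS on the stacked system (equations stacked in order)
    X = block_diag(*Xs)
    y = np.concatenate(ys)
    W = np.kron(Theta, np.eye(T))                   # estimated weight matrix
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta
```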

arXiv link: http://arxiv.org/abs/1811.05567v1

Econometrics arXiv updated paper (originally submitted: 2018-11-13)

Identification and estimation of multinomial choice models with latent special covariates

Authors: Nail Kashaev

Identification of multinomial choice models is often established by using
special covariates that have full support. This paper shows how these
identification results can be extended to a large class of multinomial choice
models when all covariates are bounded. I also provide a new
root-$n$-consistent asymptotically normal estimator of the finite-dimensional
parameters of the model.

arXiv link: http://arxiv.org/abs/1811.05555v3

Econometrics arXiv paper, submitted: 2018-11-11

Capital Structure and Speed of Adjustment in U.S. Firms. A Comparative Study in Microeconomic and Macroeconomic Conditions - A Quantile Regression Approach

Authors: Andreas Kaloudis, Dimitrios Tsolis

The major perspective of this paper is to provide more evidence regarding how
"quickly", in different macroeconomic states, companies adjust their capital
structure to their leverage targets. This study extends the empirical research
on the topic of capital structure by focusing on a quantile regression method
to investigate the behavior of firm-specific characteristics and macroeconomic
factors across all quantiles of the distribution of leverage (book leverage and
market leverage). Relying on a partial adjustment model, we find that the
adjustment speed differs across stages of book versus market leverage.
Furthermore, as macroeconomic states change, we detect clear differences in the
contribution and effects of the firm-specific and macroeconomic variables
between market-leverage and book-leverage debt ratios. Consequently, we deduce
that, across macroeconomic states, the nature and maturity of borrowing
influence the persistence of the relation between these determinants and
borrowing.

arXiv link: http://arxiv.org/abs/1811.04473v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2018-11-10

The Augmented Synthetic Control Method

Authors: Eli Ben-Michael, Avi Feller, Jesse Rothstein

The synthetic control method (SCM) is a popular approach for estimating the
impact of a treatment on a single unit in panel data settings. The "synthetic
control" is a weighted average of control units that balances the treated
unit's pre-treatment outcomes as closely as possible. A critical feature of the
original proposal is to use SCM only when the fit on pre-treatment outcomes is
excellent. We propose Augmented SCM as an extension of SCM to settings where
such pre-treatment fit is infeasible. Analogous to bias correction for inexact
matching, Augmented SCM uses an outcome model to estimate the bias due to
imperfect pre-treatment fit and then de-biases the original SCM estimate. Our
main proposal, which uses ridge regression as the outcome model, directly
controls pre-treatment fit while minimizing extrapolation from the convex hull.
This estimator can also be expressed as a solution to a modified synthetic
controls problem that allows negative weights on some donor units. We bound the
estimation error of this approach under different data generating processes,
including a linear factor model, and show how regularization helps to avoid
over-fitting to noise. We demonstrate gains from Augmented SCM with extensive
simulation studies and apply this framework to estimate the impact of the 2012
Kansas tax cuts on economic growth. We implement the proposed method in the new
augsynth R package.
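
A simplified single-period sketch of the augmentation idea, not the augsynth implementation: compute simplex-constrained SCM weights on pre-treatment outcomes, fit a ridge outcome model on the donor pool, and de-bias the SCM prediction by the discrepancy in model predictions for the treated unit versus its synthetic control.

```python
# Augmented SCM sketch: SCM prediction plus a ridge-based bias correction.
import numpy as np
from scipy.optimize import minimize
from sklearn.linear_model import Ridge

def augmented_scm(Y0_pre, Y0_post, y1_pre, alpha=1.0):
    """Y0_pre: (J, P) donor pre-treatment outcomes, Y0_post: (J,) donor
    post-treatment outcomes, y1_pre: (P,) treated unit's pre-treatment outcomes."""
    J = Y0_pre.shape[0]
    # SCM weights: nonnegative, summing to one, matching pre-treatment outcomes
    loss = lambda w: np.sum((y1_pre - w @ Y0_pre) ** 2)
    cons = ({'type': 'eq', 'fun': lambda w: w.sum() - 1.0},)
    w = minimize(loss, np.full(J, 1.0 / J), bounds=[(0, 1)] * J,
                 constraints=cons).x
    # ridge outcome model fit on the donor pool
    model = Ridge(alpha=alpha).fit(Y0_pre, Y0_post)
    scm_pred = w @ Y0_post
    bias = model.predict(y1_pre.reshape(1, -1))[0] - w @ model.predict(Y0_pre)
    return scm_pred + bias          # augmented estimate of the counterfactual
```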

arXiv link: http://arxiv.org/abs/1811.04170v3

Econometrics arXiv paper, submitted: 2018-11-09

Bootstrapping Structural Change Tests

Authors: Otilia Boldea, Adriana Cornea-Madeira, Alastair R. Hall

This paper analyses the use of bootstrap methods to test for parameter change
in linear models estimated via Two Stage Least Squares (2SLS). Two types of
test are considered: one where the null hypothesis is of no change and the
alternative hypothesis involves discrete change at $k$ unknown break-points in
the sample; and a second test where the null hypothesis is that there is
discrete parameter change at $l$ break-points in the sample against an
alternative in which the parameters change at $l + 1$ break-points. In both
cases, we consider inferences based on a sup-Wald-type statistic using either
the wild recursive bootstrap or the wild fixed bootstrap. We establish the
asymptotic validity of these bootstrap tests under a set of general conditions
that allow the errors to exhibit conditional and/or unconditional
heteroskedasticity, and report results from a simulation study that indicate
the tests yield reliable inferences in the sample sizes often encountered in
macroeconomics. The analysis covers the cases where the first-stage estimation
of 2SLS involves a model whose parameters are either constant or themselves
subject to discrete parameter change. If the errors exhibit unconditional
heteroskedasticity and/or the reduced form is unstable then the bootstrap
methods are particularly attractive because the limiting distributions of the
test statistics are not pivotal.

arXiv link: http://arxiv.org/abs/1811.04125v1

Econometrics arXiv paper, submitted: 2018-11-09

How does stock market volatility react to oil shocks?

Authors: Andrea Bastianin, Matteo Manera

We study the impact of oil price shocks on the U.S. stock market volatility.
We jointly analyze three different structural oil market shocks (i.e.,
aggregate demand, oil supply, and oil-specific demand shocks) and stock market
volatility using a structural vector autoregressive model. Identification is
achieved by assuming that the price of crude oil reacts to stock market
volatility only with delay. This implies that innovations to the price of crude
oil are not strictly exogenous, but predetermined with respect to the stock
market. We show that volatility responds significantly to oil price shocks
caused by unexpected changes in aggregate and oil-specific demand, whereas the
impact of supply-side shocks is negligible.

arXiv link: http://arxiv.org/abs/1811.03820v1

Econometrics arXiv updated paper (originally submitted: 2018-11-09)

Estimation of a Structural Break Point in Linear Regression Models

Authors: Yaein Baek

This study proposes a point estimator of the break location for a one-time
structural break in linear regression models. If the break magnitude is small,
the least-squares estimator of the break date has two modes at the ends of the
finite sample period, regardless of the true break location. To solve this
problem, I suggest an alternative estimator based on a modification of the
least-squares objective function. The modified objective function incorporates
estimation uncertainty that varies across potential break dates. The new break
point estimator is consistent and has a unimodal finite sample distribution
under small break magnitudes. A limit distribution is provided under an in-fill
asymptotic framework. Monte Carlo simulation results suggest that the new
estimator outperforms the least-squares estimator. I apply the method to
estimate the break date in U.S. real GDP growth and U.S. and UK stock return
prediction models.

arXiv link: http://arxiv.org/abs/1811.03720v3

Econometrics arXiv updated paper (originally submitted: 2018-11-08)

Nonparametric maximum likelihood methods for binary response models with random coefficients

Authors: Jiaying Gu, Roger Koenker

Single index linear models for binary response with random coefficients have
been extensively employed in many econometric settings under various parametric
specifications of the distribution of the random coefficients. Nonparametric
maximum likelihood estimation (NPMLE) as proposed by Cosslett (1983) and
Ichimura and Thompson (1998), in contrast, has received less attention in
applied work due primarily to computational difficulties. We propose a new
approach to computation of NPMLEs for binary response models that significantly
increases their computational tractability, thereby facilitating greater
flexibility in applications. Our approach, which relies on recent developments
involving the geometry of hyperplane arrangements, is contrasted with the
recently proposed deconvolution method of Gautier and Kitamura (2013). An
application to modal choice for the journey to work in the Washington DC area
illustrates the methods.

arXiv link: http://arxiv.org/abs/1811.03329v3

Econometrics arXiv paper, submitted: 2018-11-07

Nonparametric Analysis of Finite Mixtures

Authors: Yuichi Kitamura, Louise Laage

Finite mixture models are useful in applied econometrics. They can be used to
model unobserved heterogeneity, which plays major roles in labor economics,
industrial organization and other fields. Mixtures are also convenient in
dealing with contaminated sampling models and models with multiple equilibria.
This paper shows that finite mixture models are nonparametrically identified
under weak assumptions that are plausible in economic applications. The key is
to utilize the identification power implied by variation in the covariates.
First, three identification approaches are presented, under distinct
and non-nested sets of sufficient conditions. Observable features of data
inform us which of the three approaches is valid. These results apply to
general nonparametric switching regressions, as well as to structural
econometric models, such as auction models with unobserved heterogeneity.
Second, some extensions of the identification results are developed. In
particular, a mixture regression where the mixing weights depend on the value
of the regressors in a fully unrestricted manner is shown to be
nonparametrically identifiable. This means a finite mixture model with
function-valued unobserved heterogeneity can be identified in a cross-section
setting, without restricting the dependence pattern between the regressor and
the unobserved heterogeneity. In this aspect it is akin to fixed effects panel
data models which permit unrestricted correlation between unobserved
heterogeneity and covariates. Third, the paper shows that fully nonparametric
estimation of the entire mixture model is possible, by forming a sample
analogue of one of the new identification strategies. The estimator is shown to
possess a desirable polynomial rate of convergence as in a standard
nonparametric estimation problem, despite nonregular features of the model.

arXiv link: http://arxiv.org/abs/1811.02727v1

Econometrics arXiv paper, submitted: 2018-11-06

Randomization Tests for Equality in Dependence Structure

Authors: Juwon Seo

We develop a new statistical procedure to test whether the dependence
structure is identical between two groups. Rather than relying on a single
index such as Pearson's correlation coefficient or Kendall's Tau, we consider
the entire dependence structure by investigating the dependence functions
(copulas). The critical values are obtained by a modified randomization
procedure designed to exploit asymptotic group invariance conditions.
Implementation of the test is intuitive and simple, and does not require any
specification of a tuning parameter or weight function. At the same time, the
test exhibits excellent finite sample performance, with the null rejection
rates almost equal to the nominal level even when the sample size is extremely
small. Two empirical applications concerning the dependence between income and
consumption, and the Brexit effect on European financial market integration are
provided.
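
The general shape of such a test can be sketched as follows, under the assumption of a plain label-permutation scheme rather than the paper's modified randomization exploiting group invariance: compare the empirical copulas of the two samples with a Cramér-von Mises-type statistic and recompute it under random reassignment of group labels.

```python
# Permutation-type comparison of two empirical copulas (illustrative only).
import numpy as np
from scipy.stats import rankdata

def empirical_copula(X, grid):
    U = np.column_stack([rankdata(col) / len(col) for col in X.T])  # pseudo-observations
    return np.array([(U <= g).all(axis=1).mean() for g in grid])

def copula_equality_test(X1, X2, n_perm=999, seed=0):
    rng = np.random.default_rng(seed)
    pooled = np.vstack([X1, X2])
    # evaluation grid: pooled pseudo-observations in [0, 1]^d
    grid = np.column_stack([rankdata(col) / len(col) for col in pooled.T])
    stat = np.mean((empirical_copula(X1, grid) - empirical_copula(X2, grid)) ** 2)
    n1 = len(X1)
    perm_stats = np.empty(n_perm)
    for b in range(n_perm):
        idx = rng.permutation(len(pooled))
        s1, s2 = pooled[idx[:n1]], pooled[idx[n1:]]
        perm_stats[b] = np.mean((empirical_copula(s1, grid) -
                                 empirical_copula(s2, grid)) ** 2)
    p_value = (1 + np.sum(perm_stats >= stat)) / (n_perm + 1)
    return stat, p_value
```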

arXiv link: http://arxiv.org/abs/1811.02105v1

Econometrics arXiv updated paper (originally submitted: 2018-11-01)

Treatment Effect Estimation with Noisy Conditioning Variables

Authors: Kenichi Nagasawa

I develop a new identification strategy for treatment effects when noisy
measurements of unobserved confounding factors are available. I use proxy
variables to construct a random variable conditional on which treatment
variables become exogenous. The key idea is that, under appropriate conditions,
there exists a one-to-one mapping between the distribution of unobserved
confounding factors and the distribution of proxies. To ensure sufficient
variation in the constructed control variable, I use an additional variable,
termed excluded variable, which satisfies certain exclusion restrictions and
relevance conditions. I establish asymptotic distributional results for
semiparametric and flexible parametric estimators of causal parameters. I
illustrate empirical relevance and usefulness of my results by estimating
causal effects of attending selective college on earnings.

arXiv link: http://arxiv.org/abs/1811.00667v4

Econometrics arXiv paper, submitted: 2018-10-31

Partial Mean Processes with Generated Regressors: Continuous Treatment Effects and Nonseparable Models

Authors: Ying-Ying Lee

Partial mean with generated regressors arises in several econometric
problems, such as the distribution of potential outcomes with continuous
treatments and the quantile structural function in a nonseparable triangular
model. This paper proposes a nonparametric estimator for the partial mean
process, where the second step consists of a kernel regression on regressors
that are estimated in the first step. The main contribution is a uniform
expansion that characterizes in detail how the estimation error associated with
the generated regressor affects the limiting distribution of the marginal
integration estimator. The general results are illustrated with two examples:
the generalized propensity score for a continuous treatment (Hirano and Imbens,
2004) and control variables in triangular models (Newey, Powell, and Vella,
1999; Imbens and Newey, 2009). An empirical application to the Job Corps
program evaluation demonstrates the usefulness of the method.

arXiv link: http://arxiv.org/abs/1811.00157v1

Econometrics arXiv updated paper (originally submitted: 2018-10-31)

Machine Learning Estimation of Heterogeneous Causal Effects: Empirical Monte Carlo Evidence

Authors: Michael C. Knaus, Michael Lechner, Anthony Strittmatter

We investigate the finite sample performance of causal machine learning
estimators for heterogeneous causal effects at different aggregation levels. We
employ an Empirical Monte Carlo Study that relies on arguably realistic data
generation processes (DGPs) based on actual data. We consider 24 different
DGPs, eleven different causal machine learning estimators, and three
aggregation levels of the estimated effects. In the main DGPs, we allow for
selection into treatment based on a rich set of observable covariates. We
provide evidence that the estimators can be categorized into three groups. The
first group performs consistently well across all DGPs and aggregation levels.
These estimators have multiple steps to account for the selection into the
treatment and the outcome process. The second group shows competitive
performance only for particular DGPs. The third group is clearly outperformed
by the other estimators.

arXiv link: http://arxiv.org/abs/1810.13237v2

Econometrics arXiv updated paper (originally submitted: 2018-10-31)

Dynamic Assortment Optimization with Changing Contextual Information

Authors: Xi Chen, Yining Wang, Yuan Zhou

In this paper, we study the dynamic assortment optimization problem under a
finite selling season of length $T$. At each time period, the seller offers an
arriving customer an assortment of substitutable products under a cardinality
constraint, and the customer makes the purchase among offered products
according to a discrete choice model. Most existing work associates each
product with a real-valued fixed mean utility and assumes a multinomial logit
choice (MNL) model. In many practical applications, feature/contextual
information of products is readily available. In this paper, we incorporate the
feature information by assuming a linear relationship between the mean utility
and the feature. In addition, we allow the feature information of products to
change over time so that the underlying choice model can also be
non-stationary. To solve the dynamic assortment optimization under this
changing contextual MNL model, we need to simultaneously learn the underlying
unknown coefficient and make decisions on the assortment. To this end, we
develop an upper confidence bound (UCB) based policy and establish a regret
bound of order $\widetilde O(d\sqrt{T})$, where $d$ is the dimension of
the feature and $\widetilde O$ suppresses logarithmic dependence. We further
establish a lower bound of $\Omega(d\sqrt{T}/K)$, where $K$ is the cardinality
constraint of an offered assortment, which is usually small. When $K$ is a
constant, our policy is optimal up to logarithmic factors. In the exploitation
phase of the UCB algorithm, we need to solve a combinatorial optimization for
assortment optimization based on the learned information. We further develop an
approximation algorithm and an efficient greedy heuristic. The effectiveness of
the proposed policy is further demonstrated by our numerical studies.

arXiv link: http://arxiv.org/abs/1810.13069v2

Econometrics arXiv paper, submitted: 2018-10-30

Semiparametrically efficient estimation of the average linear regression function

Authors: Bryan S. Graham, Cristine Campos de Xavier Pinto

Let Y be an outcome of interest, X a vector of treatment measures, and W a
vector of pre-treatment control variables. Here X may include (combinations of)
continuous, discrete, and/or non-mutually exclusive "treatments". Consider the
linear regression of Y onto X in a subpopulation homogenous in W = w (formally
a conditional linear predictor). Let b0(w) be the coefficient vector on X in
this regression. We introduce a semiparametrically efficient estimate of the
average beta0 = E[b0(W)]. When X is binary-valued (multi-valued) our procedure
recovers the (a vector of) average treatment effect(s). When X is
continuously-valued, or consists of multiple non-exclusive treatments, our
estimand coincides with the average partial effect (APE) of X on Y when the
underlying potential response function is linear in X, but otherwise
heterogeneous across agents. When the potential response function takes a
general nonlinear/heterogeneous form, and X is continuously-valued, our
procedure recovers a weighted average of the gradient of this response across
individuals and values of X. We provide a simple, and semiparametrically
efficient, method of covariate adjustment for settings with complicated
treatment regimes. Our method generalizes familiar methods of covariate
adjustment used for program evaluation as well as methods of semiparametric
regression (e.g., the partially linear regression model).

arXiv link: http://arxiv.org/abs/1810.12511v1

Econometrics arXiv updated paper (originally submitted: 2018-10-26)

Robust Inference Using Inverse Probability Weighting

Authors: Xinwei Ma, Jingshen Wang

Inverse Probability Weighting (IPW) is widely used in empirical work in
economics and other disciplines. As Gaussian approximations perform poorly in
the presence of "small denominators," trimming is routinely employed as a
regularization strategy. However, ad hoc trimming of the observations renders
usual inference procedures invalid for the target estimand, even in large
samples. In this paper, we first show that the IPW estimator can have different
(Gaussian or non-Gaussian) asymptotic distributions, depending on how "close to
zero" the probability weights are and on how large the trimming threshold is.
As a remedy, we propose an inference procedure that is robust not only to small
probability weights entering the IPW estimator but also to a wide range of
trimming threshold choices, by adapting to these different asymptotic
distributions. This robustness is achieved by employing resampling techniques
and by correcting a non-negligible trimming bias. We also propose an
easy-to-implement method for choosing the trimming threshold by minimizing an
empirical analogue of the asymptotic mean squared error. In addition, we show
that our inference procedure remains valid with the use of a data-driven
trimming threshold. We illustrate our method by revisiting a dataset from the
National Supported Work program.
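
For orientation, a minimal sketch of IPW for the ATE with a fixed trimming threshold is given below; it deliberately omits the paper's trimming-bias correction, resampling-based inference, and data-driven threshold choice.

```python
# Trimmed IPW estimate of the average treatment effect (illustrative baseline).
import numpy as np
from sklearn.linear_model import LogisticRegression

def trimmed_ipw_ate(y, d, X, trim=0.05):
    """y: outcomes, d: binary treatment, X: covariates, trim: threshold in (0, 0.5)."""
    p = LogisticRegression(max_iter=1000).fit(X, d).predict_proba(X)[:, 1]
    keep = (p > trim) & (p < 1 - trim)          # drop observations with extreme weights
    y, d, p = y[keep], d[keep], p[keep]
    return np.mean(d * y / p) - np.mean((1 - d) * y / (1 - p))
```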

arXiv link: http://arxiv.org/abs/1810.11397v2

Econometrics arXiv updated paper (originally submitted: 2018-10-25)

Factor-Driven Two-Regime Regression

Authors: Sokbae Lee, Yuan Liao, Myung Hwan Seo, Youngki Shin

We propose a novel two-regime regression model where regime switching is
driven by a vector of possibly unobservable factors. When the factors are
latent, we estimate them by the principal component analysis of a panel data
set. We show that the optimization problem can be reformulated as mixed integer
optimization, and we present two alternative computational algorithms. We
derive the asymptotic distribution of the resulting estimator under the scheme
that the threshold effect shrinks to zero. In particular, we establish a phase
transition that describes the effect of first-stage factor estimation as the
cross-sectional dimension of panel data increases relative to the time-series
dimension. Moreover, we develop bootstrap inference and illustrate our methods
via numerical studies.

arXiv link: http://arxiv.org/abs/1810.11109v4

Econometrics arXiv updated paper (originally submitted: 2018-10-25)

Nuclear Norm Regularized Estimation of Panel Regression Models

Authors: Hyungsik Roger Moon, Martin Weidner

In this paper we investigate panel regression models with interactive fixed
effects. We propose two new estimation methods that are based on minimizing
convex objective functions. The first method minimizes the sum of squared
residuals with a nuclear (trace) norm regularization. The second method
minimizes the nuclear norm of the residuals. We establish the consistency of
the two resulting estimators. Those estimators have a very important
computational advantage compared to the existing least squares (LS) estimator,
in that they are defined as minimizers of a convex objective function. In
addition, the nuclear norm penalization helps to resolve a potential
identification problem for interactive fixed effect models, in particular when
the regressors are low-rank and the number of factors is unknown. We also
show how to construct estimators that are asymptotically equivalent to the
least squares (LS) estimator in Bai (2009) and Moon and Weidner (2017) by using
our nuclear norm regularized or minimized estimators as initial values for a
finite number of LS minimizing iteration steps. This iteration avoids any
non-convex minimization, while the original LS estimation problem is generally
non-convex, and can have multiple local minima.
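
A rough sketch of the first (nuclear-norm regularized) estimator, under a simple alternating scheme and a generic scaling of the penalty: update the regression coefficients by OLS given the interactive-effects matrix, then update that matrix by singular-value soft-thresholding of the residuals.

```python
# Nuclear-norm regularized panel regression via alternating updates.
# Y: N x T outcome matrix; Xk: list of N x T regressor matrices; psi: penalty.
import numpy as np

def svt(M, psi):
    """Singular-value soft-thresholding (prox of the nuclear-norm penalty)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - psi, 0.0)) @ Vt

def nuclear_norm_panel(Y, Xk, psi, n_iter=200):
    Xmat = np.column_stack([X.ravel() for X in Xk])      # (N*T, K)
    Gamma = np.zeros_like(Y)
    for _ in range(n_iter):
        # beta-step: OLS of Y - Gamma on the regressors
        beta, *_ = np.linalg.lstsq(Xmat, (Y - Gamma).ravel(), rcond=None)
        # Gamma-step: soft-threshold the singular values of the residual matrix
        R = Y - sum(b * X for b, X in zip(beta, Xk))
        Gamma = svt(R, psi)
    return beta, Gamma
```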

arXiv link: http://arxiv.org/abs/1810.10987v4

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2018-10-25

Spanning Tests for Markowitz Stochastic Dominance

Authors: Stelios Arvanitis, Olivier Scaillet, Nikolas Topaloglou

We derive properties of the cdf of random variables defined as saddle-type
points of real valued continuous stochastic processes. This facilitates the
derivation of the first-order asymptotic properties of tests for stochastic
spanning given some stochastic dominance relation. We define the concept of
Markowitz stochastic dominance spanning, and develop an analytical
representation of the spanning property. We construct a non-parametric test for
spanning based on subsampling, and derive its asymptotic exactness and
consistency. The spanning methodology determines whether introducing new
securities or relaxing investment constraints improves the investment
opportunity set of investors driven by Markowitz stochastic dominance. In an
application to standard data sets of historical stock market returns, we reject
market portfolio Markowitz efficiency as well as two-fund separation. Hence, we
find evidence that equity management through base assets can outperform the
market for investors with Markowitz-type preferences.

arXiv link: http://arxiv.org/abs/1810.10800v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2018-10-22

Model Selection Techniques -- An Overview

Authors: Jie Ding, Vahid Tarokh, Yuhong Yang

In the era of big data, analysts usually explore various statistical models
or machine learning methods for observed data in order to facilitate scientific
discoveries or gain predictive power. Whatever data and fitting procedures are
employed, a crucial step is to select the most appropriate model or method from
a set of candidates. Model selection is a key ingredient in data analysis for
reliable and reproducible statistical inference or prediction, and thus central
to scientific studies in fields such as ecology, economics, engineering,
finance, political science, biology, and epidemiology. There has been a long
history of model selection techniques that arise from research in statistics,
information theory, and signal processing. A considerable number of methods
have been proposed, following different philosophies and exhibiting varying
performances. The purpose of this article is to bring a comprehensive overview
of them, in terms of their motivation, large sample performance, and
applicability. We provide integrated and practically relevant discussions on
theoretical properties of state-of-the-art model selection approaches. We also
share our thoughts on some controversial views on the practice of model
selection.

arXiv link: http://arxiv.org/abs/1810.09583v1

Econometrics arXiv cross-link from eess.SP (eess.SP), submitted: 2018-10-19

Forecasting Time Series with VARMA Recursions on Graphs

Authors: Elvin Isufi, Andreas Loukas, Nathanael Perraudin, Geert Leus

Graph-based techniques emerged as a choice to deal with the dimensionality
issues in modeling multivariate time series. However, there is yet no complete
understanding of how the underlying structure could be exploited to ease this
task. This work provides contributions in this direction by considering the
forecasting of a process evolving over a graph. We make use of the
(approximate) time-vertex stationarity assumption, i.e., time-varying graph
signals whose first and second order statistical moments are invariant over
time and correlated to a known graph topology. The latter is combined with VAR
and VARMA models to tackle the dimensionality issues present in predicting the
temporal evolution of multivariate time series. We find that by projecting
the data to the graph spectral domain: (i) the multivariate model estimation
reduces to that of fitting a number of uncorrelated univariate ARMA models and
(ii) an optimal low-rank data representation can be exploited so as to further
reduce the estimation costs. In the case that the multivariate process can be
observed at a subset of nodes, the proposed models extend naturally to Kalman
filtering on graphs allowing for optimal tracking. Numerical experiments with
both synthetic and real data validate the proposed approach and highlight its
benefits over state-of-the-art alternatives.
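
A minimal sketch of the graph-spectral idea, assuming a known Laplacian and ignoring the low-rank and Kalman-filtering extensions: project the panel onto the Laplacian eigenbasis, fit an independent univariate ARMA model to each spectral component, forecast, and map back to the vertex domain.

```python
# Graph-spectral forecasting sketch. L: N x N graph Laplacian; Y: T x N panel.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def graph_spectral_forecast(Y, L, order=(1, 0, 1), horizon=1):
    _, V = np.linalg.eigh(L)            # graph Fourier basis (eigenvectors of L)
    Y_spec = Y @ V                      # project each time series onto the basis
    fc_spec = np.zeros((horizon, Y.shape[1]))
    for j in range(Y.shape[1]):
        fit = ARIMA(Y_spec[:, j], order=order).fit()   # univariate ARMA per component
        fc_spec[:, j] = fit.forecast(steps=horizon)
    return fc_spec @ V.T                # map the forecasts back to the vertex domain
```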

arXiv link: http://arxiv.org/abs/1810.08581v2

Econometrics arXiv updated paper (originally submitted: 2018-10-19)

Probabilistic Forecasting in Day-Ahead Electricity Markets: Simulating Peak and Off-Peak Prices

Authors: Peru Muniain, Florian Ziel

In this paper we include dependency structures for electricity price
forecasting and forecasting evaluation. We work with off-peak and peak time
series from the German-Austrian day-ahead price, hence we analyze bivariate
data. We first estimate the mean of the two time series, and then in a second
step we estimate the residuals. The mean equation is estimated by OLS and
elastic net and the residuals are estimated by maximum likelihood. Our
contribution is to include a bivariate jump component on a mean reverting jump
diffusion model in the residuals. The models' forecasts are evaluated using
four different criteria, including the energy score to measure whether the
correlation structure between the time series is properly captured. The
results show that the models with bivariate jumps perform better under the
energy score, which indicates that it is important to account for this
dependence structure in order to properly forecast correlated time series.

arXiv link: http://arxiv.org/abs/1810.08418v2

Econometrics arXiv updated paper (originally submitted: 2018-10-19)

Treatment Effect Models with Strategic Interaction in Treatment Decisions

Authors: Tadao Hoshino, Takahide Yanagi

This study considers treatment effect models in which others' treatment
decisions can affect both one's own treatment and outcome. Focusing on the case
of two-player interactions, we formulate treatment decision behavior as a
complete information game with multiple equilibria. Using a latent index
framework and assuming a stochastic equilibrium selection, we prove that the
marginal treatment effect from one's own treatment and that from the partner
are identifiable on the conditional supports of certain threshold variables
determined through the game model. Based on our constructive identification
results, we propose a two-step semiparametric procedure for estimating the
marginal treatment effects using series approximation. We show that the
proposed estimator is uniformly consistent and asymptotically normally
distributed. As an empirical illustration, we investigate the impacts of risky
behaviors on adolescents' academic performance.

arXiv link: http://arxiv.org/abs/1810.08350v11

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2018-10-18

Quantile Regression Under Memory Constraint

Authors: Xi Chen, Weidong Liu, Yichen Zhang

This paper studies the inference problem in quantile regression (QR) for a
large sample size $n$ but under a limited memory constraint, where the memory
can only store a small batch of data of size $m$. A natural method is the
na\"ive divide-and-conquer approach, which splits data into batches of size
$m$, computes the local QR estimator for each batch, and then aggregates the
estimators via averaging. However, this method only works when $n=o(m^2)$ and
is computationally expensive. This paper proposes a computationally efficient
method, which only requires an initial QR estimator on a small batch of data
and then successively refines the estimator via multiple rounds of
aggregations. Theoretically, as long as $n$ grows polynomially in $m$, we
establish the asymptotic normality for the obtained estimator and show that our
estimator with only a few rounds of aggregations achieves the same efficiency
as the QR estimator computed on all the data. Moreover, our result allows the
case that the dimensionality $p$ goes to infinity. The proposed method can also
be applied to address the QR problem under distributed computing environment
(e.g., in a large-scale sensor network) or for real-time streaming data.
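
For reference, the naive divide-and-conquer baseline discussed in the abstract can be sketched in a few lines; the paper's estimator instead refines an initial batch estimate through a few rounds of aggregation, which is not reproduced here.

```python
# Naive divide-and-conquer quantile regression: local QR per batch, then average.
import numpy as np
import statsmodels.api as sm

def dc_quantile_regression(y, X, batch_size, tau=0.5):
    n = len(y)
    estimates = []
    for start in range(0, n, batch_size):
        yb, Xb = y[start:start + batch_size], X[start:start + batch_size]
        res = sm.QuantReg(yb, Xb).fit(q=tau)     # local QR estimate on one batch
        estimates.append(res.params)
    return np.mean(estimates, axis=0)            # aggregate by averaging
```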

arXiv link: http://arxiv.org/abs/1810.08264v1

Econometrics arXiv updated paper (originally submitted: 2018-10-17)

A Consistent Heteroskedasticity Robust LM Type Specification Test for Semiparametric Models

Authors: Ivan Korolev

This paper develops a consistent heteroskedasticity robust Lagrange
Multiplier (LM) type specification test for semiparametric conditional mean
models. Consistency is achieved by turning a conditional moment restriction
into a growing number of unconditional moment restrictions using series
methods. The proposed test statistic is straightforward to compute and is
asymptotically standard normal under the null. Compared with the earlier
literature on series-based specification tests in parametric models, I rely on
the projection property of series estimators and derive a different
normalization of the test statistic. Compared with the recent test in Gupta
(2018), I use a different way of accounting for heteroskedasticity. I
demonstrate using Monte Carlo studies that my test has superior finite sample
performance compared with the existing tests. I apply the test to one of the
semiparametric gasoline demand specifications from Yatchew and No (2001) and
find no evidence against it.

arXiv link: http://arxiv.org/abs/1810.07620v3

Econometrics arXiv updated paper (originally submitted: 2018-10-16)

Accounting for Unobservable Heterogeneity in Cross Section Using Spatial First Differences

Authors: Hannah Druckenmiller, Solomon Hsiang

We develop a cross-sectional research design to identify causal effects in
the presence of unobservable heterogeneity without instruments. When units are
dense in physical space, it may be sufficient to regress the "spatial first
differences" (SFD) of the outcome on the treatment and omit all covariates. The
identifying assumptions of SFD are similar in mathematical structure and
plausibility to other quasi-experimental designs. We use SFD to obtain new
estimates for the effects of time-invariant geographic factors, soil and
climate, on long-run agricultural productivities --- relationships crucial for
economic decisions, such as land management and climate policy, but notoriously
confounded by unobservables.
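
In its simplest form the SFD regression can be sketched as follows, assuming units have already been put in adjacency order along one spatial dimension and using a single treatment with no constant.

```python
# Spatial first differences: regress differenced outcomes on differenced treatment.
import numpy as np

def sfd_estimate(y, x, spatial_order):
    """y, x: (n,) arrays; spatial_order: permutation putting units in adjacency order."""
    y_s, x_s = y[spatial_order], x[spatial_order]
    dy, dx = np.diff(y_s), np.diff(x_s)          # spatial first differences
    beta = (dx @ dy) / (dx @ dx)                 # OLS without an intercept
    return beta
```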

arXiv link: http://arxiv.org/abs/1810.07216v2

Econometrics arXiv paper, submitted: 2018-10-13

Using generalized estimating equations to estimate nonlinear models with spatial data

Authors: Cuicui Lu, Weining Wang, Jeffrey M. Wooldridge

In this paper, we study estimation of nonlinear models with cross sectional
data using two-step generalized estimating equations (GEE) in the quasi-maximum
likelihood estimation (QMLE) framework. In the interest of improving
efficiency, we propose a grouping estimator to account for the potential
spatial correlation in the underlying innovations. We use a Poisson model and a
Negative Binomial II model for count data and a Probit model for binary
response data to demonstrate the GEE procedure. Under mild weak dependency
assumptions, results on estimation consistency and asymptotic normality are
provided. Monte Carlo simulations show the efficiency gains of our approach in
comparison with other estimation methods for count data and binary response
data. Finally, we apply the GEE approach to study the determinants of inflows
of foreign direct investment (FDI) to China.

arXiv link: http://arxiv.org/abs/1810.05855v1

Econometrics arXiv updated paper (originally submitted: 2018-10-11)

Stochastic Revealed Preferences with Measurement Error

Authors: Victor H. Aguiar, Nail Kashaev

A long-standing question about consumer behavior is whether individuals'
observed purchase decisions satisfy the revealed preference (RP) axioms of the
utility maximization theory (UMT). Researchers using survey or experimental
panel data sets on prices and consumption to answer this question face the
well-known problem of measurement error. We show that ignoring measurement
error in the RP approach may lead to overrejection of the UMT. To solve this
problem, we propose a new statistical RP framework for consumption panel data
sets that allows for testing the UMT in the presence of measurement error. Our
test is applicable to all consumer models that can be characterized by their
first-order conditions. Our approach is nonparametric, allows for unrestricted
heterogeneity in preferences, and requires only a centering condition on
measurement error. We develop two applications that provide new evidence about
the UMT. First, we find support in a survey data set for the dynamic and
time-consistent UMT in single-individual households, in the presence of
nonclassical measurement error in consumption. In the second
application, we cannot reject the static UMT in a widely used experimental data
set in which measurement error in prices is assumed to be the result of price
misperception due to the experimental design. The first finding stands in
contrast to the conclusions drawn from the deterministic RP test of Browning
(1989). The second finding reverses the conclusions drawn from the
deterministic RP test of Afriat (1967) and Varian (1982).

arXiv link: http://arxiv.org/abs/1810.05287v2

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2018-10-10

Offline Multi-Action Policy Learning: Generalization and Optimization

Authors: Zhengyuan Zhou, Susan Athey, Stefan Wager

In many settings, a decision-maker wishes to learn a rule, or policy, that
maps from observable characteristics of an individual to an action. Examples
include selecting offers, prices, advertisements, or emails to send to
consumers, as well as the problem of determining which medication to prescribe
to a patient. While there is a growing body of literature devoted to this
problem, most existing results are focused on the case where data comes from a
randomized experiment, and further, there are only two possible actions, such
as giving a drug to a patient or not. In this paper, we study the offline
multi-action policy learning problem with observational data, where the
policy may need to respect budget constraints or belong to a restricted policy
class such as decision trees. We build on the theory of efficient
semi-parametric inference in order to propose and implement a policy learning
algorithm that achieves asymptotically minimax-optimal regret. To the best of
our knowledge, this is the first result of this type in the multi-action setup,
and it provides a substantial performance improvement over the existing
learning algorithms. We then consider additional computational challenges that
arise in implementing our method for the case where the policy is restricted to
take the form of a decision tree. We propose two different approaches, one
using a mixed integer program formulation and the other using a tree-search
based algorithm.

arXiv link: http://arxiv.org/abs/1810.04778v2

Econometrics arXiv updated paper (originally submitted: 2018-10-10)

Prices, Profits, Proxies, and Production

Authors: Victor H. Aguiar, Nail Kashaev, Roy Allen

This paper studies nonparametric identification and counterfactual bounds for
heterogeneous firms that can be ranked in terms of productivity. Our approach
works when quantities and prices are latent, rendering standard approaches
inapplicable. Instead, we require observation of profits or other
optimizing-values such as costs or revenues, and either prices or price proxies
of flexibly chosen variables. We extend classical duality results for
price-taking firms to a setup with discrete heterogeneity, endogeneity, and
limited variation in possibly latent prices. Finally, we show that convergence
results for nonparametric estimators may be directly converted to convergence
results for production sets.

arXiv link: http://arxiv.org/abs/1810.04697v4

Econometrics arXiv updated paper (originally submitted: 2018-10-08)

The Incidental Parameters Problem in Testing for Remaining Cross-section Correlation

Authors: Arturas Juodis, Simon Reese

In this paper we consider the properties of the Pesaran (2004, 2015a) CD test
for cross-section correlation when applied to residuals obtained from panel
data models with many estimated parameters. We show that the presence of
period-specific parameters leads the CD test statistic to diverge as the length of
the time dimension of the sample grows. This result holds even if cross-section
dependence is correctly accounted for and hence constitutes an example of the
Incidental Parameters Problem. The relevance of this problem is investigated
both for the classical Time Fixed Effects estimator as well as the Common
Correlated Effects estimator of Pesaran (2006). We suggest a weighted CD test
statistic which re-establishes standard normal inference under the null
hypothesis. Given the widespread use of the CD test statistic to test for
remaining cross-section correlation, our results have far-reaching implications
for empirical researchers.

arXiv link: http://arxiv.org/abs/1810.03715v4

Econometrics arXiv cross-link from General Economics (econ.GN), submitted: 2018-10-08

Evaluating regulatory reform of network industries: a survey of empirical models based on categorical proxies

Authors: Andrea Bastianin, Paolo Castelnovo, Massimo Florio

Proxies for regulatory reforms based on categorical variables are
increasingly used in empirical evaluation models. We surveyed 63 studies that
rely on such indices to analyze the effects of entry liberalization,
privatization, unbundling, and independent regulation of the electricity,
natural gas, and telecommunications sectors. We highlight methodological issues
related to the use of these proxies. Next, taking stock of the literature, we
provide practical advice for the design of the empirical strategy and discuss
the selection of control and instrumental variables to attenuate endogeneity
problems undermining identification of the effects of regulatory reforms.

arXiv link: http://arxiv.org/abs/1810.03348v1

Econometrics arXiv updated paper (originally submitted: 2018-10-07)

Simple Inference on Functionals of Set-Identified Parameters Defined by Linear Moments

Authors: JoonHwan Cho, Thomas M. Russell

This paper proposes a new approach to obtain uniformly valid inference for
linear functionals or scalar subvectors of a partially identified parameter
defined by linear moment inequalities. The procedure amounts to bootstrapping
the value functions of randomly perturbed linear programming problems, and does
not require the researcher to grid over the parameter space. The low-level
conditions for uniform validity rely on genericity results for linear programs.
The unconventional perturbation approach produces a confidence set with a
coverage probability of 1 over the identified set but exact coverage on an
outer set; it is valid under weak assumptions and is computationally simple to
implement.

arXiv link: http://arxiv.org/abs/1810.03180v10

Econometrics arXiv updated paper (originally submitted: 2018-10-07)

On LASSO for Predictive Regression

Authors: Ji Hyung Lee, Zhentao Shi, Zhan Gao

Explanatory variables in a predictive regression typically exhibit low signal
strength and various degrees of persistence. Variable selection in such a
context is of great importance. In this paper, we explore the pitfalls and
possibilities of the LASSO methods in this predictive regression framework. In
the presence of stationary, local unit root, and cointegrated predictors, we
show that the adaptive LASSO cannot asymptotically eliminate all cointegrating
variables with zero regression coefficients. This new finding motivates a novel
post-selection adaptive LASSO, which we call the twin adaptive LASSO (TAlasso),
to restore variable selection consistency. Accommodating the system of
heterogeneous regressors, TAlasso achieves the well-known oracle property. In
contrast, conventional LASSO fails to attain coefficient estimation consistency
and variable screening in all components simultaneously. We apply these LASSO
methods to evaluate the short- and long-horizon predictability of S&P 500
excess returns.
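
As background, the generic adaptive LASSO building block can be sketched as a weighted lasso solved by rescaling the regressors; this is the textbook construction, not the paper's twin adaptive LASSO (TAlasso) procedure, and the first-step ridge estimator and tuning levels below are illustrative choices.

```python
# Generic adaptive LASSO via column rescaling (textbook construction).
import numpy as np
from sklearn.linear_model import Lasso, Ridge

def adaptive_lasso(X, y, alpha=0.1, gamma=1.0):
    init = Ridge(alpha=1e-3).fit(X, y).coef_            # first-step estimates
    w = 1.0 / (np.abs(init) ** gamma + 1e-8)            # adaptive penalty weights
    Xw = X / w                                           # rescale columns by 1/w
    fit = Lasso(alpha=alpha).fit(Xw, y)
    return fit.coef_ / w                                 # map back to the original scale
```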

arXiv link: http://arxiv.org/abs/1810.03140v4

Econometrics arXiv paper, submitted: 2018-10-03

Granger causality on horizontal sum of Boolean algebras

Authors: M. Bohdalová, M. Kalina, O. Nánásiová

The intention of this paper is to discuss the mathematical model of causality
introduced by C.W.J. Granger in 1969. Granger's model of causality has
become well-known and often used in various econometric models describing
causal systems, e.g., between commodity prices and exchange rates.
Our paper presents a new mathematical model of causality between two measured
objects. We have slightly modified the well-known Kolmogorovian probability
model. In particular, we use the horizontal sum of set $\sigma$-algebras
instead of their direct product.

arXiv link: http://arxiv.org/abs/1810.01654v1

Econometrics arXiv updated paper (originally submitted: 2018-10-03)

Interpreting OLS Estimands When Treatment Effects Are Heterogeneous: Smaller Groups Get Larger Weights

Authors: Tymon Słoczyński

Applied work often studies the effect of a binary variable ("treatment")
using linear models with additive effects. I study the interpretation of the
OLS estimands in such models when treatment effects are heterogeneous. I show
that the treatment coefficient is a convex combination of two parameters, which
under certain conditions can be interpreted as the average treatment effects on
the treated and untreated. The weights on these parameters are inversely
related to the proportion of observations in each group. Reliance on these
implicit weights can have serious consequences for applied work, as I
illustrate with two well-known applications. I develop simple diagnostic tools
that empirical researchers can use to avoid potential biases. Software for
implementing these methods is available in R and Stata. In an important special
case, my diagnostics only require the knowledge of the proportion of treated
units.

arXiv link: http://arxiv.org/abs/1810.01576v3

Econometrics arXiv updated paper (originally submitted: 2018-10-02)

Covariate Distribution Balance via Propensity Scores

Authors: Pedro H. C. Sant'Anna, Xiaojun Song, Qi Xu

This paper proposes new estimators for the propensity score that aim to
maximize the covariate distribution balance among different treatment groups.
Heuristically, our proposed procedure attempts to estimate a propensity score
model by making the underlying covariate distribution of different treatment
groups as close to each other as possible. Our estimators are data-driven, do
not rely on tuning parameters such as bandwidths, admit an asymptotic linear
representation, and can be used to estimate different treatment effect
parameters under different identifying assumptions, including unconfoundedness
and local treatment effects. We derive the asymptotic properties of inverse
probability weighted estimators for the average, distributional, and quantile
treatment effects based on the proposed propensity score estimator and
illustrate their finite sample performance via Monte Carlo simulations and two
empirical applications.

arXiv link: http://arxiv.org/abs/1810.01370v4

Econometrics arXiv updated paper (originally submitted: 2018-09-30)

Nonparametric Regression with Selectively Missing Covariates

Authors: Christoph Breunig, Peter Haan

We consider the problem of regression with selectively observed covariates in
a nonparametric framework. Our approach relies on instrumental variables that
explain variation in the latent covariates but have no direct effect on
selection. The regression function of interest is shown to be a weighted
version of the observed conditional expectation, where the weighting function is a
fraction of selection probabilities. Nonparametric identification of the
fractional probability weight (FPW) function is achieved via a partial
completeness assumption. We provide primitive functional form assumptions for
partial completeness to hold. The identification result is constructive for the
FPW series estimator. We derive the rate of convergence and also the pointwise
asymptotic distribution. In both cases, the asymptotic performance of the FPW
series estimator does not suffer from the ill-posed inverse problem that arises
in the nonparametric instrumental variable approach. In a Monte Carlo study, we
analyze the finite sample properties of our estimator and we compare our
approach to inverse probability weighting, which can be used alternatively for
unconditional moment estimation. We then consider two empirical applications.
First, we estimate the association between income and health using linked data
from the SHARE survey and administrative pension information, with pension
entitlements as an instrument. Second, we revisit the question of how income
affects the demand for housing based on data from the German Socio-Economic
Panel Study (SOEP), using regional income information at the residential-block
level as an instrument. In
both applications we show that income is selectively missing and we demonstrate
that standard methods that do not account for the nonrandom selection process
lead to significantly biased estimates for individuals with low income.

arXiv link: http://arxiv.org/abs/1810.00411v4

Econometrics arXiv updated paper (originally submitted: 2018-09-30)

Proxy Controls and Panel Data

Authors: Ben Deaner

We provide new results for nonparametric identification, estimation, and
inference of causal effects using `proxy controls': observables that are noisy
but informative proxies for unobserved confounding factors. Our analysis
applies to cross-sectional settings but is particularly well-suited to panel
models. Our identification results motivate a simple and `well-posed'
nonparametric estimator. We derive convergence rates for the estimator and
construct uniform confidence bands with asymptotically correct size. In panel
settings, our methods provide a novel approach to the difficult problem of
identification with non-separable, general heterogeneity and fixed $T$. In
panels, observations from different periods serve as proxies for unobserved
heterogeneity and our key identifying assumptions follow from restrictions on
the serial dependence structure. We apply our methods to two empirical
settings. We estimate consumer demand counterfactuals using panel data and we
estimate causal effects of grade retention on cognitive performance.

arXiv link: http://arxiv.org/abs/1810.00283v8

Econometrics arXiv updated paper (originally submitted: 2018-09-26)

Deep Neural Networks for Estimation and Inference

Authors: Max H. Farrell, Tengyuan Liang, Sanjog Misra

We study deep neural networks and their use in semiparametric inference. We
establish novel rates of convergence for deep feedforward neural nets. Our new
rates are sufficiently fast (in some cases minimax optimal) to allow us to
establish valid second-step inference after first-step estimation with deep
learning, a result also new to the literature. Our estimation rates and
semiparametric inference results handle the current standard architecture:
fully connected feedforward neural networks (multi-layer perceptrons), with the
now-common rectified linear unit activation function and a depth explicitly
diverging with the sample size. We discuss other architectures as well,
including fixed-width, very deep networks. We establish nonasymptotic bounds
for these deep nets for a general class of nonparametric regression-type loss
functions, which includes as special cases least squares, logistic regression,
and other generalized linear models. We then apply our theory to develop
semiparametric inference, focusing on causal parameters for concreteness, such
as treatment effects, expected welfare, and decomposition effects. Inference in
many other semiparametric contexts can be readily obtained. We demonstrate the
effectiveness of deep learning with a Monte Carlo analysis and an empirical
application to direct mail marketing.
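
The architecture class studied in the paper is the standard multi-layer perceptron with ReLU activations. The sketch below fits such a network as a first-step regression estimator on simulated data; the layer sizes, the scikit-learn implementation, and the data-generating process are illustrative choices rather than the paper's.

    # Illustrative first-step nuisance estimation with a fully connected ReLU net.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(2)
    n = 4_000
    X = rng.uniform(-2, 2, size=(n, 5))
    mu = np.sin(X[:, 0]) + X[:, 1] * X[:, 2]             # nonlinear regression function
    y = mu + rng.normal(scale=0.5, size=n)

    # Depth and width grow with n in the theory; here a moderate network is fixed.
    net = MLPRegressor(hidden_layer_sizes=(64, 64, 64), activation="relu",
                       max_iter=2000, random_state=0)
    net.fit(X, y)
    print("in-sample R^2 of the ReLU network:", round(net.score(X, y), 3))
    # In a semiparametric second step, fitted values such as net.predict(X_new)
    # would be plugged into a Neyman-orthogonal moment (e.g. for a treatment
    # effect), ideally combined with sample splitting / cross-fitting.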

arXiv link: http://arxiv.org/abs/1809.09953v3

Econometrics arXiv updated paper (originally submitted: 2018-09-26)

Multivariate Stochastic Volatility Model with Realized Volatilities and Pairwise Realized Correlations

Authors: Yuta Yamauchi, Yasuhiro Omori

Although stochastic volatility and GARCH (generalized autoregressive
conditional heteroscedasticity) models have successfully described the
volatility dynamics of univariate asset returns, extending them to the
multivariate models with dynamic correlations has been difficult due to several
major problems. First, there are too many parameters to estimate if available
data are only daily returns, which results in unstable estimates. One solution
to this problem is to incorporate additional observations based on intraday
asset returns, such as realized covariances. Second, since multivariate asset
returns are not traded synchronously, the realized covariance matrices must be
computed over the largest time intervals in which all asset returns are
observed, which fails to make full use of the available intraday information
when some assets are traded less frequently. Third, it is not straightforward
to guarantee that the estimated (and the realized) covariance matrices are
positive definite. Our contributions are the following: (1) we obtain stable
parameter estimates for the dynamic correlation models using the realized
measures, (2) we make full use of intraday information by using pairwise
realized correlations, (3) the covariance matrices are guaranteed to be
positive definite, (4) we avoid the arbitrariness of the ordering of asset
returns, (5) we propose a flexible correlation structure model (e.g., setting
some correlations to zero if necessary), and (6) we propose a parsimonious
specification for the leverage effect. Our proposed models are applied to the
daily returns of nine U.S.
stocks with their realized volatilities and pairwise realized correlations and
are shown to outperform the existing models with respect to portfolio
performances.

arXiv link: http://arxiv.org/abs/1809.09928v2

Econometrics arXiv updated paper (originally submitted: 2018-09-25)

Mostly Harmless Simulations? Using Monte Carlo Studies for Estimator Selection

Authors: Arun Advani, Toru Kitagawa, Tymon Słoczyński

We consider two recent suggestions for how to perform an empirically
motivated Monte Carlo study to help select a treatment effect estimator under
unconfoundedness. We show theoretically that neither is likely to be
informative except under restrictive conditions that are unlikely to be
satisfied in many contexts. To test empirical relevance, we also apply the
approaches to a real-world setting where estimator performance is known. Both
approaches are worse than random at selecting estimators which minimise
absolute bias. They are better when selecting estimators that minimise mean
squared error. However, using a simple bootstrap is at least as good and often
better. For now researchers would be best advised to use a range of estimators
and compare estimates for robustness.

arXiv link: http://arxiv.org/abs/1809.09527v2

Econometrics arXiv updated paper (originally submitted: 2018-09-24)

An Automated Approach Towards Sparse Single-Equation Cointegration Modelling

Authors: Stephan Smeekes, Etienne Wijler

In this paper we propose the Single-equation Penalized Error Correction
Selector (SPECS) as an automated estimation procedure for dynamic
single-equation models with a large number of potentially (co)integrated
variables. By extending the classical single-equation error correction model,
SPECS enables the researcher to model large cointegrated datasets without
necessitating any form of pre-testing for the order of integration or
cointegrating rank. Under an asymptotic regime in which both the number of
parameters and time series observations jointly diverge to infinity, we show
that SPECS is able to consistently estimate an appropriate linear combination
of the cointegrating vectors that may occur in the underlying DGP. In addition,
SPECS is shown to enable the correct recovery of sparsity patterns in the
parameter space and to possess the same limiting distribution as the OLS oracle
procedure. A simulation study shows strong selective capabilities, as well as
superior predictive performance in the context of nowcasting compared to
high-dimensional models that ignore cointegration. An empirical application to
nowcasting Dutch unemployment rates using Google Trends confirms the strong
practical performance of our procedure.
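
To fix ideas, the sketch below builds the single-equation error-correction design matrix (lagged levels plus lagged differences) for simulated cointegrated data and fits it with a plain cross-validated lasso. SPECS itself uses a tailored penalty and comes with theory for (co)integrated data, so this should be read only as a stylized picture of the regression it penalizes.

    # Stylized penalized single-equation error-correction regression (not SPECS
    # itself): lasso on lagged levels and lagged differences of simulated data.
    import numpy as np
    from sklearn.linear_model import LassoCV

    rng = np.random.default_rng(3)
    T, k, p = 400, 10, 2
    x = np.cumsum(rng.normal(size=(T, k)), axis=0)       # k I(1) regressors
    y = x[:, 0] - 0.5 * x[:, 1] + rng.normal(size=T)     # y cointegrated with x1, x2

    z = np.column_stack([y, x])                          # z_t = (y_t, x_t')
    dz = np.diff(z, axis=0)                              # first differences

    # Regressors: lagged levels z_{t-1} plus p lags of the differences
    levels = z[p:-1]
    lags = np.column_stack([dz[p - j:-j] for j in range(1, p + 1)])
    X = np.column_stack([levels, lags])
    dy = dz[p:, 0]                                       # dependent variable Delta y_t

    # In practice one would scale the regressors; a plain lasso suffices here.
    fit = LassoCV(cv=5).fit(X, dy)
    print("number of selected coefficients:", int(np.sum(fit.coef_ != 0)))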

arXiv link: http://arxiv.org/abs/1809.08889v3

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2018-09-19

Transmission of Macroeconomic Shocks to Risk Parameters: Their uses in Stress Testing

Authors: Helder Rojas, David Dias

In this paper, we are interested in evaluating the resilience of financial
portfolios under extreme economic conditions. Therefore, we use empirical
measures to characterize the transmission process of macroeconomic shocks to
risk parameters. We propose the use of an extensive family of models, called
General Transfer Function Models, which condense well the characteristics of
the transmission described by the impact measures. The procedure for estimating
the parameters of these models is described using a Bayesian approach with
prior information provided by the impact measures. In addition, we illustrate
the use of the estimated models with credit risk data from a portfolio.

arXiv link: http://arxiv.org/abs/1809.07401v3

Econometrics arXiv paper, submitted: 2018-09-19

Focused econometric estimation for noisy and small datasets: A Bayesian Minimum Expected Loss estimator approach

Authors: Andres Ramirez-Hassan, Manuel Correa-Giraldo

Central to many inferential situations is the estimation of rational
functions of parameters. The mainstream in statistics and econometrics
estimates these quantities based on the plug-in approach without consideration
of the main objective of the inferential situation. We propose the Bayesian
Minimum Expected Loss (MELO) approach focusing explicitly on the function of
interest, and calculating its frequentist variability. Asymptotic properties of
the MELO estimator are similar to those of the plug-in approach. Nevertheless,
simulation exercises show that our proposal is better in situations
characterized by small sample sizes and noisy models. In addition, we observe
in the applications that our approach gives lower standard errors than
frequently used alternatives when datasets are not very informative.

arXiv link: http://arxiv.org/abs/1809.06996v1

Econometrics arXiv paper, submitted: 2018-09-18

Estimating grouped data models with a binary dependent variable and fixed effects: What are the issues

Authors: Nathaniel Beck

This article deals with a simple issue: if we have grouped data with a binary
dependent variable and want to include fixed effects (group-specific
intercepts) in the specification, is Ordinary Least Squares (OLS) in any way
superior to a (conditional) logit form? In particular, what are the
consequences of using OLS instead of a fixed effects logit model, given that
the latter drops all units which show no variability in the dependent variable
while the former allows estimation using all units? First, we show
that the discussion of the incidental parameters problem is based on an
assumption about the kinds of data being studied; for what appears to be the
common use of fixed effect models in political science the incidental
parameters issue is illusory. Turning to linear models, we see that OLS yields
a linear combination of the estimates for the units with and without variation
in the dependent variable, and so the coefficient estimates must be carefully
interpreted. The article then compares two methods of estimating logit models
with fixed effects, and shows that the Chamberlain conditional logit is as good
as or better than a logit analysis which simply includes group specific
intercepts (even though the conditional logit technique was designed to deal
with the incidental parameters problem!). Related to this, the article
discusses the estimation of marginal effects using both OLS and logit. While it
appears that a form of logit with fixed effects can be used to estimate
marginal effects, this method can be improved by starting with conditional
logit and then using those parameter estimates to constrain the logit with
fixed effects model. This method produces estimates of sample average marginal
effects that are at least as good as OLS, and much better when group size is
small or the number of groups is large.
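
A minimal simulated comparison of the two workhorse specifications discussed above is sketched below: an OLS linear probability model with group dummies, which uses every group, and a logit with group dummies fitted after dropping the groups with no variation in the outcome (the same observations a conditional logit would discard). The data-generating process and group sizes are illustrative, and the snippet does not implement the article's preferred constrained-logit procedure.

    # Illustrative grouped binary data: LPM with group dummies vs. dummy logit.
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    G, m = 30, 20                                   # 30 groups of 20 observations
    g = np.repeat(np.arange(G), m)
    alpha = rng.normal(size=G)[g]                   # group-specific intercepts
    x = rng.normal(size=G * m)
    y = rng.binomial(1, 1 / (1 + np.exp(-(alpha + 0.8 * x))))
    df = pd.DataFrame({"y": y, "x": x, "g": g})

    # OLS linear probability model with group dummies: uses every group
    X_all = sm.add_constant(pd.get_dummies(df["g"], prefix="g", drop_first=True)
                            .astype(float).join(df[["x"]]))
    lpm = sm.OLS(df["y"], X_all).fit()

    # Logit with group dummies, after dropping groups with no variation in y
    varies = df.groupby("g")["y"].transform(lambda s: s.nunique() > 1)
    sub = df[varies]
    X_sub = sm.add_constant(pd.get_dummies(sub["g"], prefix="g", drop_first=True)
                            .astype(float).join(sub[["x"]]))
    logit = sm.Logit(sub["y"], X_sub).fit(disp=0, maxiter=200)

    print("LPM coefficient on x:  ", round(lpm.params["x"], 3))
    print("Logit coefficient on x:", round(logit.params["x"], 3))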

arXiv link: http://arxiv.org/abs/1809.06505v1

Econometrics arXiv updated paper (originally submitted: 2018-09-15)

Control Variables, Discrete Instruments, and Identification of Structural Functions

Authors: Whitney Newey, Sami Stouli

Control variables provide an important means of controlling for endogeneity
in econometric models with nonseparable and/or multidimensional heterogeneity.
We allow for discrete instruments, giving identification results under a
variety of restrictions on the way the endogenous variable and the control
variables affect the outcome. We consider many structural objects of interest,
such as average or quantile treatment effects. We illustrate our results with
an empirical application to Engel curve estimation.

arXiv link: http://arxiv.org/abs/1809.05706v2

Econometrics arXiv paper, submitted: 2018-09-14

On the Choice of Instruments in Mixed Frequency Specification Tests

Authors: Yun Liu, Yeonwoo Rho

Time averaging has been the traditional approach to handle mixed sampling
frequencies. However, it ignores information possibly embedded in high
frequency. Mixed data sampling (MIDAS) regression models provide a concise way
to utilize the additional information in high-frequency variables. In this
paper, we propose a specification test to choose between time averaging and
MIDAS models, based on a Durbin-Wu-Hausman test. In particular, a set of
instrumental variables is proposed and theoretically validated when the
frequency ratio is large. As a result, our method tends to be more powerful
than existing methods, as confirmed in simulations.

arXiv link: http://arxiv.org/abs/1809.05503v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2018-09-14

Automatic Debiased Machine Learning of Causal and Structural Effects

Authors: Victor Chernozhukov, Whitney K Newey, Rahul Singh

Many causal and structural effects depend on regressions. Examples include
policy effects, average derivatives, regression decompositions, average
treatment effects, causal mediation, and parameters of economic structural
models. The regressions may be high dimensional, making machine learning
useful. Plugging machine learners into identifying equations can lead to poor
inference due to bias from regularization and/or model selection. This paper
gives automatic debiasing for linear and nonlinear functions of regressions.
The debiasing is automatic in using Lasso and the function of interest without
the full form of the bias correction. The debiasing can be applied to any
regression learner, including neural nets, random forests, Lasso, boosting, and
other high dimensional methods. In addition to providing the bias correction we
give standard errors that are robust to misspecification, convergence rates for
the bias correction, and primitive conditions for asymptotic inference for a
variety of estimators of structural and causal effects. The
automatic debiased machine learning is used to estimate the average treatment
effect on the treated for the NSW job training data and to estimate demand
elasticities from Nielsen scanner data while allowing preferences to be
correlated with prices and income.
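
As a point of reference, the sketch below computes a cross-fitted, doubly robust (augmented IPW) estimate of an average treatment effect with Lasso and logistic-regression nuisance learners on simulated data. For the ATE the debiased moment takes this familiar form; the paper's automatic procedure instead learns the debiasing terms by Lasso without requiring the explicit formula, so the snippet is only a hand-coded special case.

    # Illustrative cross-fitted AIPW (doubly robust) estimate of the ATE.
    import numpy as np
    from sklearn.linear_model import LassoCV, LogisticRegressionCV
    from sklearn.model_selection import KFold

    rng = np.random.default_rng(5)
    n = 4_000
    X = rng.normal(size=(n, 20))
    ps = 1 / (1 + np.exp(-X[:, 0] + 0.5 * X[:, 1]))
    d = rng.binomial(1, ps)
    y = X[:, 0] + X[:, 2] + 1.5 * d + rng.normal(size=n)

    scores = np.zeros(n)
    for train, test in KFold(n_splits=2, shuffle=True, random_state=0).split(X):
        ps_hat = LogisticRegressionCV(cv=3, max_iter=2000).fit(
            X[train], d[train]).predict_proba(X[test])[:, 1]
        mu1 = LassoCV(cv=3).fit(X[train][d[train] == 1], y[train][d[train] == 1])
        mu0 = LassoCV(cv=3).fit(X[train][d[train] == 0], y[train][d[train] == 0])
        m1, m0 = mu1.predict(X[test]), mu0.predict(X[test])
        scores[test] = (m1 - m0
                        + d[test] * (y[test] - m1) / ps_hat
                        - (1 - d[test]) * (y[test] - m0) / (1 - ps_hat))

    print(f"debiased ATE estimate: {scores.mean():.3f}  (true effect: 1.5)")
    print(f"standard error: {scores.std(ddof=1) / np.sqrt(n):.3f}")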

arXiv link: http://arxiv.org/abs/1809.05224v5

Econometrics arXiv paper, submitted: 2018-09-13

Valid Simultaneous Inference in High-Dimensional Settings (with the hdm package for R)

Authors: Philipp Bach, Victor Chernozhukov, Martin Spindler

Due to the increasing availability of high-dimensional empirical applications
in many research disciplines, valid simultaneous inference becomes more and
more important. For instance, high-dimensional settings might arise in economic
studies due to very rich data sets with many potential covariates or in the
analysis of treatment heterogeneities. Also the evaluation of potentially more
complicated (non-linear) functional forms of the regression relationship leads
to many potential variables for which simultaneous inferential statements might
be of interest. Here we provide a review of classical and modern methods for
simultaneous inference in (high-dimensional) settings and illustrate their use
in a case study using the R package hdm. The R package hdm implements valid,
powerful, and efficient joint hypothesis tests for a potentially large number
of coefficients, as well as the construction of simultaneous confidence
intervals, and therefore provides useful methods to perform valid
post-selection inference based on the LASSO.

arXiv link: http://arxiv.org/abs/1809.04951v1

Econometrics arXiv updated paper (originally submitted: 2018-09-13)

Bayesian shrinkage in mixture of experts models: Identifying robust determinants of class membership

Authors: Gregor Zens

A method for implicit variable selection in mixture of experts frameworks is
proposed. We introduce a prior structure where information is taken from a set
of independent covariates. Robust class membership predictors are identified
using a normal gamma prior. The resulting model setup is used in a finite
mixture of Bernoulli distributions to find homogeneous clusters of women in
Mozambique based on their information sources on HIV. Fully Bayesian inference
is carried out via the implementation of a Gibbs sampler.

arXiv link: http://arxiv.org/abs/1809.04853v2

Econometrics arXiv paper, submitted: 2018-09-11

Bootstrap Methods in Econometrics

Authors: Joel L. Horowitz

The bootstrap is a method for estimating the distribution of an estimator or
test statistic by re-sampling the data or a model estimated from the data.
Under conditions that hold in a wide variety of econometric applications, the
bootstrap provides approximations to distributions of statistics, coverage
probabilities of confidence intervals, and rejection probabilities of
hypothesis tests that are more accurate than the approximations of first-order
asymptotic distribution theory. The reductions in the differences between true
and nominal coverage or rejection probabilities can be very large. In addition,
the bootstrap provides a way to carry out inference in certain settings where
obtaining analytic distributional approximations is difficult or impossible.
This article explains the usefulness and limitations of the bootstrap in
contexts of interest in econometrics. The presentation is informal and
expository. It provides an intuitive understanding of how the bootstrap works.
Mathematical details are available in references that are cited.
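
The basic resampling idea is easy to state in code. The sketch below computes a percentile bootstrap confidence interval for a sample mean on made-up data; the same pattern extends to other estimators and, with studentization, underlies the refinements discussed above.

    # Illustrative nonparametric bootstrap: percentile interval for a sample mean.
    import numpy as np

    rng = np.random.default_rng(6)
    data = rng.exponential(scale=2.0, size=200)        # skewed sample (made up)

    B = 2_000
    boot_means = np.array([
        rng.choice(data, size=data.size, replace=True).mean() for _ in range(B)
    ])
    lo, hi = np.percentile(boot_means, [2.5, 97.5])
    print(f"sample mean: {data.mean():.3f}")
    print(f"95% percentile bootstrap interval: [{lo:.3f}, {hi:.3f}]")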

arXiv link: http://arxiv.org/abs/1809.04016v1

Econometrics arXiv paper, submitted: 2018-09-11

Regression Discontinuity Designs Using Covariates

Authors: Sebastian Calonico, Matias D. Cattaneo, Max H. Farrell, Rocio Titiunik

We study regression discontinuity designs when covariates are included in the
estimation. We examine local polynomial estimators that include discrete or
continuous covariates in an additive separable way, but without imposing any
parametric restrictions on the underlying population regression functions. We
recommend a covariate-adjustment approach that retains consistency under
intuitive conditions, and characterize the potential for estimation and
inference improvements. We also present new covariate-adjusted mean squared
error expansions and robust bias-corrected inference procedures, with
heteroskedasticity-consistent and cluster-robust standard errors. An empirical
illustration and an extensive simulation study are presented. All methods are
implemented in R and Stata software packages.

arXiv link: http://arxiv.org/abs/1809.03904v1

Econometrics arXiv paper, submitted: 2018-09-10

Non-Asymptotic Inference in Instrumental Variables Estimation

Authors: Joel L. Horowitz

This paper presents a simple method for carrying out inference in a wide
variety of possibly nonlinear IV models under weak assumptions. The method is
non-asymptotic in the sense that it provides a finite sample bound on the
difference between the true and nominal probabilities of rejecting a correct
null hypothesis. The method is a non-Studentized version of the Anderson-Rubin
test but is motivated and analyzed differently. In contrast to the conventional
Anderson-Rubin test, the method proposed here does not require restrictive
distributional assumptions, linearity of the estimated model, or simultaneous
equations. Nor does it require knowledge of whether the instruments are strong
or weak. It does not require testing or estimating the strength of the
instruments. The method can be applied to quantile IV models that may be
nonlinear and can be used to test a parametric IV model against a nonparametric
alternative. The results presented here hold in finite samples, regardless of
the strength of the instruments.

arXiv link: http://arxiv.org/abs/1809.03600v1

Econometrics arXiv updated paper (originally submitted: 2018-09-10)

Characteristic-Sorted Portfolios: Estimation and Inference

Authors: Matias D. Cattaneo, Richard K. Crump, Max H. Farrell, Ernst Schaumburg

Portfolio sorting is ubiquitous in the empirical finance literature, where it
has been widely used to identify pricing anomalies. Despite its popularity,
little attention has been paid to the statistical properties of the procedure.
We develop a general framework for portfolio sorting by casting it as a
nonparametric estimator. We present valid asymptotic inference methods and a
valid mean square error expansion of the estimator leading to an optimal choice
for the number of portfolios. In practical settings, the optimal choice may be
much larger than the standard choices of 5 or 10. To illustrate the relevance
of our results, we revisit the size and momentum anomalies.
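
The estimator being analyzed is simple to write down: assets are grouped into quantile bins of a characteristic and returns are averaged within bins. The sketch below does this for one simulated cross-section and varies the number of portfolios, the tuning parameter whose optimal choice the paper derives; the data and numbers are illustrative.

    # Illustrative characteristic-sorted portfolios on one simulated cross-section.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(7)
    n_assets = 2_000
    char = rng.normal(size=n_assets)                          # sorting characteristic
    ret = 0.02 * char + rng.normal(scale=0.1, size=n_assets)  # one period of returns

    def sorted_portfolios(char, ret, J):
        """Average return in each of J characteristic-sorted portfolios."""
        bins = pd.qcut(char, q=J, labels=False)
        return pd.Series(ret).groupby(bins).mean()

    for J in (5, 10, 50):
        p = sorted_portfolios(char, ret, J)
        print(f"J = {J:2d}: high-minus-low spread = {p.iloc[-1] - p.iloc[0]:.4f}")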

arXiv link: http://arxiv.org/abs/1809.03584v3

Econometrics arXiv cross-link from stat.CO (stat.CO), submitted: 2018-09-09

Bayesian dynamic variable selection in high dimensions

Authors: Gary Koop, Dimitris Korobilis

This paper proposes a variational Bayes algorithm for computationally
efficient posterior and predictive inference in time-varying parameter (TVP)
models. Within this context we specify a new dynamic variable/model selection
strategy for TVP dynamic regression models in the presence of a large number of
predictors. This strategy allows for assessing in individual time periods which
predictors are relevant (or not) for forecasting the dependent variable. The
new algorithm is evaluated numerically using synthetic data and its
computational advantages are established. Using macroeconomic data for the US
we find that regression models that combine time-varying parameters with the
information in many predictors have the potential to improve forecasts of price
inflation over a number of alternative forecasting models.

arXiv link: http://arxiv.org/abs/1809.03031v2

Econometrics arXiv updated paper (originally submitted: 2018-09-07)

Change-Point Testing for Risk Measures in Time Series

Authors: Lin Fan, Junting Duan, Peter W. Glynn, Markus Pelger

We propose novel methods for change-point testing for nonparametric
estimators of expected shortfall and related risk measures in weakly dependent
time series. We can detect general multiple structural changes in the tails of
marginal distributions of time series under general assumptions.
Self-normalization allows us to avoid the issues of standard error estimation.
The theoretical foundations for our methods are functional central limit
theorems, which we develop under weak assumptions. An empirical study of S&P
500 and US Treasury bond returns illustrates the practical use of our methods
in detecting and quantifying instability in the tails of financial time series.

arXiv link: http://arxiv.org/abs/1809.02303v3

Econometrics arXiv updated paper (originally submitted: 2018-09-05)

Efficient Difference-in-Differences Estimation with High-Dimensional Common Trend Confounding

Authors: Michael Zimmert

This study considers various semiparametric difference-in-differences models
under different assumptions on the relation between the treatment group
identifier, time and covariates for cross-sectional and panel data. The
variance lower bound is shown to be sensitive to the model assumptions imposed
implying a robustness-efficiency trade-off. The obtained efficient influence
functions lead to estimators that are rate double robust and have desirable
asymptotic properties under weak first stage convergence conditions. This
makes it possible to use sophisticated machine-learning algorithms that can cope with
settings where common trend confounding is high-dimensional. The usefulness of
the proposed estimators is assessed in an empirical example. It is shown that
the efficiency-robustness trade-offs and the choice of first stage predictors
can lead to divergent empirical results in practice.

arXiv link: http://arxiv.org/abs/1809.01643v5

Econometrics arXiv updated paper (originally submitted: 2018-09-04)

Shape-Enforcing Operators for Point and Interval Estimators

Authors: Xi Chen, Victor Chernozhukov, Iván Fernández-Val, Scott Kostyshak, Ye Luo

A common problem in econometrics, statistics, and machine learning is to
estimate and make inference on functions that satisfy shape restrictions. For
example, distribution functions are nondecreasing and range between zero and
one, height growth charts are nondecreasing in age, and production functions
are nondecreasing and quasi-concave in input quantities. We propose a method to
enforce these restrictions ex post on point and interval estimates of the
target function by applying functional operators. If an operator satisfies
certain properties that we make precise, the shape-enforced point estimates are
closer to the target function than the original point estimates and the
shape-enforced interval estimates have greater coverage and shorter length than
the original interval estimates. We show that these properties hold for six
different operators that cover commonly used shape restrictions in practice:
range, convexity, monotonicity, monotone convexity, quasi-convexity, and
monotone quasi-convexity. We illustrate the results with two empirical
applications to the estimation of a height growth chart for infants in India
and a production function for chemical firms in China.
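
One simple operator of this kind is the increasing rearrangement, i.e., sorting a point estimate evaluated on a grid so that it becomes monotone. The sketch below applies it to a noisy estimate of a nondecreasing function; the example function and noise level are illustrative, and the same operator can be applied to the endpoints of an interval estimate.

    # Illustrative shape enforcement via increasing rearrangement (sorting).
    import numpy as np

    rng = np.random.default_rng(8)
    grid = np.linspace(0, 1, 101)
    target = np.sqrt(grid)                                      # true nondecreasing function
    estimate = target + rng.normal(scale=0.05, size=grid.size)  # noisy, non-monotone

    rearranged = np.sort(estimate)                              # enforce monotonicity ex post

    def max_error(f):
        return np.max(np.abs(f - target))

    print(f"sup-norm error, original estimate:   {max_error(estimate):.4f}")
    print(f"sup-norm error, rearranged estimate: {max_error(rearranged):.4f}")
    # The rearranged estimate is weakly closer to the monotone target, in line
    # with the property described in the abstract.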

arXiv link: http://arxiv.org/abs/1809.01038v5

Econometrics arXiv updated paper (originally submitted: 2018-09-01)

Optimal Bandwidth Choice for Robust Bias Corrected Inference in Regression Discontinuity Designs

Authors: Sebastian Calonico, Matias D. Cattaneo, Max H. Farrell

Modern empirical work in Regression Discontinuity (RD) designs often employs
local polynomial estimation and inference with a mean square error (MSE)
optimal bandwidth choice. This bandwidth yields an MSE-optimal RD treatment
effect estimator, but is by construction invalid for inference. Robust bias
corrected (RBC) inference methods are valid when using the MSE-optimal
bandwidth, but we show they yield suboptimal confidence intervals in terms of
coverage error. We establish valid coverage error expansions for RBC confidence
interval estimators and use these results to propose new inference-optimal
bandwidth choices for forming these intervals. We find that the standard
MSE-optimal bandwidth for the RD point estimator is too large when the goal is
to construct RBC confidence intervals with the smallest coverage error. We
further optimize the constant terms behind the coverage error to derive new
optimal choices for the auxiliary bandwidth required for RBC inference. Our
expansions also establish that RBC inference yields higher-order refinements
(relative to traditional undersmoothing) in the context of RD designs. Our main
results cover sharp and sharp kink RD designs under conditional
heteroskedasticity, and we discuss extensions to fuzzy and other RD designs,
clustered sampling, and pre-intervention covariate adjustments. The
theoretical findings are illustrated with a Monte Carlo experiment and an
empirical application, and the main methodological results are available in
R and Stata packages.

arXiv link: http://arxiv.org/abs/1809.00236v4

Econometrics arXiv updated paper (originally submitted: 2018-08-31)

Identifying the Discount Factor in Dynamic Discrete Choice Models

Authors: Jaap H. Abbring, Øystein Daljord

Empirical research often cites observed choice responses to variation that
shifts expected discounted future utilities, but not current utilities, as an
intuitive source of information on time preferences. We study the
identification of dynamic discrete choice models under such economically
motivated exclusion restrictions on primitive utilities. We show that each
exclusion restriction leads to an easily interpretable moment condition with
the discount factor as the only unknown parameter. The identified set of
discount factors that solves this condition is finite, but not necessarily a
singleton. Consequently, in contrast to common intuition, an exclusion
restriction does not in general give point identification. Finally, we show
that exclusion restrictions have nontrivial empirical content: The implied
moment conditions impose restrictions on choices that are absent from the
unconstrained model.

arXiv link: http://arxiv.org/abs/1808.10651v4

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2018-08-30

A Self-Attention Network for Hierarchical Data Structures with an Application to Claims Management

Authors: Leander Löw, Martin Spindler, Eike Brechmann

Insurance companies must manage millions of claims per year. While most of
these claims are non-fraudulent, fraud detection is a core task for insurance
companies. The ultimate goal is a predictive model to single out the fraudulent
claims and pay out the non-fraudulent ones immediately. Modern machine learning
methods are well suited for this kind of problem. Health care claims often have
a data structure that is hierarchical and of variable length. We propose one
model based on piecewise feed forward neural networks (deep learning) and
another model based on self-attention neural networks for the task of claim
management. We show that the proposed methods outperform bag-of-words based
models, hand designed features, and models based on convolutional neural
networks, on a data set of two million health care claims. The proposed
self-attention method performs the best.
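
For readers unfamiliar with the building block, the sketch below implements a single scaled dot-product self-attention layer in NumPy for one claim represented as a variable-length sequence of item embeddings, followed by mean pooling. The dimensions and random weights are placeholders; a real model stacks several such layers with learned parameters.

    # Illustrative single-head scaled dot-product self-attention over claim items.
    import numpy as np

    rng = np.random.default_rng(9)
    L, d = 7, 16                        # claim with 7 line items, embedding size 16
    items = rng.normal(size=(L, d))     # item embeddings (placeholders)
    Wq, Wk, Wv = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))

    def softmax(z, axis=-1):
        z = z - z.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    Q, K, V = items @ Wq, items @ Wk, items @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d))        # (L, L) attention weights
    context = attn @ V                          # each item attends to all others
    claim_repr = context.mean(axis=0)           # pooled representation of the claim
    print("claim representation shape:", claim_repr.shape)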

arXiv link: http://arxiv.org/abs/1808.10543v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2018-08-30

Uniform Inference in High-Dimensional Gaussian Graphical Models

Authors: Sven Klaassen, Jannis Kück, Martin Spindler, Victor Chernozhukov

Graphical models have become a very popular tool for representing
dependencies within a large set of variables and are key for representing
causal structures. We provide results for uniform inference on high-dimensional
graphical models with the number of target parameters $d$ being possibly much
larger than the sample size. This is particularly important when certain features
or structures of a causal model should be recovered. Our results highlight how
in high-dimensional settings graphical models can be estimated and recovered
with modern machine learning methods in complex data sets. To construct
simultaneous confidence regions on many target parameters, sufficiently fast
estimation rates of the nuisance functions are crucial. In this context, we
establish uniform estimation rates and sparsity guarantees of the square-root
estimator in a random design under approximate sparsity conditions that might
be of independent interest for related problems in high-dimensions. We also
demonstrate in a comprehensive simulation study that our procedure has good
small sample properties.

arXiv link: http://arxiv.org/abs/1808.10532v2

Econometrics arXiv paper, submitted: 2018-08-29

House Price Modeling with Digital Census

Authors: Enwei Zhu, Stanislav Sobolevsky

Urban house prices are strongly associated with local socioeconomic factors.
In the literature, house price modeling is based on socioeconomic variables from
traditional census data, which are not real-time, dynamic, or comprehensive. Inspired
by the emerging concept of "digital census" - using large-scale digital records
of human activities to measure urban population dynamics and socioeconomic
conditions, we introduce three typical datasets, namely 311 complaints, crime
complaints and taxi trips, into house price modeling. Based on the individual
housing sales data in New York City, we provide comprehensive evidence that
these digital census datasets can substantially improve modeling performance
on both house price levels and changes, regardless of whether traditional
census data are included. Hence, digital census can serve as both
effective alternatives and complements to traditional census for house price
modeling.

arXiv link: http://arxiv.org/abs/1809.03834v1

Econometrics arXiv updated paper (originally submitted: 2018-08-28)

Inference based on Kotlarski's Identity

Authors: Kengo Kato, Yuya Sasaki, Takuya Ura

Kotlarski's identity has been widely used in applied economic research.
However, how to conduct inference based on this popular identification approach
has been an open question for two decades. This paper addresses this open
problem by constructing a novel confidence band for the density function of a
latent variable in a repeated measurement error model. The confidence band builds
on our finding that we can rewrite Kotlarski's identity as a system of linear
moment restrictions. The confidence band controls the asymptotic size uniformly
over a class of data generating processes, and it is consistent against all
fixed alternatives. Simulation studies support our theoretical results.

arXiv link: http://arxiv.org/abs/1808.09375v3

Econometrics arXiv updated paper (originally submitted: 2018-08-28)

A Residual Bootstrap for Conditional Value-at-Risk

Authors: Eric Beutner, Alexander Heinemann, Stephan Smeekes

A fixed-design residual bootstrap method is proposed for the two-step
estimator of Francq and Zako\"ian (2015) associated with the conditional
Value-at-Risk. The bootstrap's consistency is proven for a general class of
volatility models and intervals are constructed for the conditional
Value-at-Risk. A simulation study reveals that the equal-tailed percentile
bootstrap interval tends to fall short of its nominal value. In contrast, the
reversed-tails bootstrap interval yields accurate coverage. We also compare the
theoretically analyzed fixed-design bootstrap with the recursive-design
bootstrap. It turns out that the fixed-design bootstrap performs equally well
in terms of average coverage, yet leads on average to shorter intervals in
smaller samples. An empirical application illustrates the interval estimation.

arXiv link: http://arxiv.org/abs/1808.09125v4

Econometrics arXiv updated paper (originally submitted: 2018-08-27)

Tests for price indices in a dynamic item universe

Authors: Li-Chun Zhang, Ingvild Johansen, Ragnhild Nygaard

There is generally a need to deal with quality change and new goods in the
consumer price index due to the underlying dynamic item universe. Traditionally
axiomatic tests are defined for a fixed universe. We propose five tests
explicitly formulated for a dynamic item universe, and motivate them both from
the perspectives of a cost-of-goods index and a cost-of-living index. None of
the indices currently available for making use of scanner data that comprises
the whole item universe satisfies all the tests at the same time. The set of
tests provides a rigorous diagnostic for whether an index
is completely appropriate in a dynamic item universe, as well as pointing
towards the directions of possible remedies. We thus outline a large index
family that potentially can satisfy all the tests.

arXiv link: http://arxiv.org/abs/1808.08995v2

Econometrics arXiv cross-link from q-fin.CP (q-fin.CP), submitted: 2018-08-23

Supporting Crowd-Powered Science in Economics: FRACTI, a Conceptual Framework for Large-Scale Collaboration and Transparent Investigation in Financial Markets

Authors: Jorge Faleiro, Edward Tsang

Modern investigation in economics and in other sciences requires the ability
to store, share, and replicate results and methods of experiments that are
often multidisciplinary and yield a massive amount of data. Given the
increasing complexity and growing interaction across diverse bodies of
knowledge it is becoming imperative to define a platform to properly support
collaborative research and track origin, accuracy and use of data. This paper
starts by defining a set of methods leveraging scientific principles and
advocating the importance of those methods in multidisciplinary, computer
intensive fields like computational finance. The next part of this paper
defines a class of systems called scientific support systems, vis-a-vis usages
in other research fields such as bioinformatics, physics and engineering. We
outline a basic set of fundamental concepts, and list our goals and motivation
for leveraging such systems to enable large-scale investigation, "crowd powered
science", in economics. The core of this paper provides an outline of FRACTI in
five steps. First we present definitions related to scientific support systems
intrinsic to finance and describe common characteristics of financial use
cases. The second step concentrates on what can be exchanged through the
definition of shareable entities called contributions. The third step is the
description of a classification system for building blocks of the conceptual
framework, called facets. The fourth step introduces the meta-model that will
enable provenance tracking and representation of data fragments and simulation.
Finally we describe intended cases of use to highlight main strengths of
FRACTI: application of the scientific method for investigation in computational
finance, large-scale collaboration and simulation.

arXiv link: http://arxiv.org/abs/1808.07959v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2018-08-22

Optimizing the tie-breaker regression discontinuity design

Authors: Art B. Owen, Hal Varian

Motivated by customer loyalty plans and scholarship programs, we study
tie-breaker designs which are hybrids of randomized controlled trials (RCTs)
and regression discontinuity designs (RDDs). We quantify the statistical
efficiency of a tie-breaker design in which a proportion $\Delta$ of observed
subjects are in the RCT. In a two line regression, statistical efficiency
increases monotonically with $\Delta$, so efficiency is maximized by an RCT. We
point to additional advantages of tie-breakers versus RDD: for a nonparametric
regression the boundary bias is much less severe and for quadratic regression,
the variance is greatly reduced. For a two line model we can quantify the short
term value of the treatment allocation and this comparison favors smaller
$\Delta$ with the RDD being best. We solve for the optimal tradeoff between
these exploration and exploitation goals. The usual tie-breaker design applies
an RCT on the middle $\Delta$ subjects as ranked by the assignment variable. We
quantify the efficiency of other designs such as experimenting only in the
second decile from the top. We also show that in some general parametric models
a Monte Carlo evaluation can be replaced by matrix algebra.

arXiv link: http://arxiv.org/abs/1808.07563v3

Econometrics arXiv updated paper (originally submitted: 2018-08-22)

Sensitivity Analysis using Approximate Moment Condition Models

Authors: Timothy B. Armstrong, Michal Kolesár

We consider inference in models defined by approximate moment conditions. We
show that near-optimal confidence intervals (CIs) can be formed by taking a
generalized method of moments (GMM) estimator, and adding and subtracting the
standard error times a critical value that takes into account the potential
bias from misspecification of the moment conditions. In order to optimize
performance under potential misspecification, the weighting matrix for this GMM
estimator takes into account this potential bias, and therefore differs from
the one that is optimal under correct specification. To formally show the
near-optimality of these CIs, we develop asymptotic efficiency bounds for
inference in the locally misspecified GMM setting. These bounds may be of
independent interest, due to their implications for the possibility of using
moment selection procedures when conducting inference in moment condition
models. We apply our methods in an empirical application to automobile demand,
and show that adjusting the weighting matrix can shrink the CIs by a factor of
3 or more.
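
The following sketch computes one standard form of such a bias-aware critical value: if the worst-case bias is at most B standard errors, the 1 - alpha quantile of |N(B, 1)| can be used in place of the usual 1.96. The mapping from the degree of misspecification to B, and the choice of weighting matrix, are the substance of the paper and are not reproduced here.

    # Illustrative bias-aware critical value: if |bias| <= B * se, the interval
    # estimate +/- cv(B) * se has coverage at least 1 - alpha.
    import numpy as np
    from scipy.optimize import brentq
    from scipy.stats import norm

    def bias_aware_cv(B, alpha=0.05):
        """Critical value c with P(|Z + B| <= c) = 1 - alpha, Z ~ N(0, 1)."""
        f = lambda c: norm.cdf(c - B) - norm.cdf(-c - B) - (1 - alpha)
        return brentq(f, 1e-8, 20 + B)

    for B in (0.0, 0.5, 1.0, 2.0):
        print(f"bias up to {B:.1f} se  ->  critical value {bias_aware_cv(B):.3f}")
    # B = 0 recovers the usual 1.96; a larger worst-case bias widens the interval.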

arXiv link: http://arxiv.org/abs/1808.07387v5

Econometrics arXiv cross-link from cs.CY (cs.CY), submitted: 2018-08-20

Deep learning, deep change? Mapping the development of the Artificial Intelligence General Purpose Technology

Authors: J. Klinger, J. Mateos-Garcia, K. Stathoulopoulos

General Purpose Technologies (GPTs) that can be applied in many industries
are an important driver of economic growth and national and regional
competitiveness. In spite of this, the geography of their development and
diffusion has not received significant attention in the literature. We address
this with an analysis of Deep Learning (DL), a core technique in Artificial
Intelligence (AI) increasingly being recognized as the latest GPT. We identify
DL papers in a novel dataset from ArXiv, a popular preprints website, and use
CrunchBase, a technology business directory, to measure industrial capabilities
related to it. After showing that DL conforms with the definition of a GPT,
having experienced rapid growth and diffusion into new fields where it has
generated an impact, we describe changes in its geography. Our analysis shows
China's rise in AI rankings and relative decline in several European countries.
We also find that initial volatility in the geography of DL has been followed
by consolidation, suggesting that the window of opportunity for new entrants
might be closing down as new DL research hubs become dominant. Finally, we
study the regional drivers of DL clustering. We find that competitive DL
clusters tend to be based in regions combining research and industrial
activities related to it. This could be because GPT developers and adopters
located close to each other can collaborate and share knowledge more easily,
thus overcoming coordination failures in GPT deployment. Our analysis also
reveals a Chinese comparative advantage in DL after we control for other
explanatory factors, perhaps underscoring the importance of access to data and
supportive policies for the successful development of this complex, `omni-use'
technology.

arXiv link: http://arxiv.org/abs/1808.06355v1

Econometrics arXiv paper, submitted: 2018-08-17

Quantifying the Computational Advantage of Forward Orthogonal Deviations

Authors: Robert F. Phillips

Under suitable conditions, one-step generalized method of moments (GMM) based
on the first-difference (FD) transformation is numerically equal to one-step
GMM based on the forward orthogonal deviations (FOD) transformation. However,
when the number of time periods ($T$) is not small, the FOD transformation
requires less computational work. This paper shows that the computational
complexity of the FD and FOD transformations increases linearly with the number
of individuals ($N$), but the computational complexity of the FOD
transformation increases with $T$ at rate $T^{4}$, while the computational
complexity of the FD transformation increases at rate $T^{6}$. Simulations
illustrate that calculations exploiting the FOD
transformation are performed orders of magnitude faster than those using the FD
transformation. The results in the paper indicate that, when one-step GMM based
on the FD and FOD transformations are the same, Monte Carlo experiments can be
conducted much faster if the FOD version of the estimator is used.
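
For concreteness, the sketch below applies the two transformations to a single individual's time series: first differencing, and the standard forward orthogonal deviations transform, which replaces each observation (except the last) with a scaled deviation from the mean of all subsequent observations. The numerical series is arbitrary.

    # Illustrative FD and FOD transformations for one individual's time series.
    import numpy as np

    def first_difference(x):
        return np.diff(x)

    def forward_orthogonal_deviations(x):
        x = np.asarray(x, dtype=float)
        T = x.size
        out = np.empty(T - 1)
        for t in range(T - 1):
            future_mean = x[t + 1:].mean()
            out[t] = np.sqrt((T - t - 1) / (T - t)) * (x[t] - future_mean)
        return out

    x = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
    print("FD :", first_difference(x))
    print("FOD:", forward_orthogonal_deviations(x))
    # Both transformations remove an individual-specific intercept; with i.i.d.
    # errors, FOD also keeps the transformed errors serially uncorrelated.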

arXiv link: http://arxiv.org/abs/1808.05995v1

Econometrics arXiv updated paper (originally submitted: 2018-08-17)

Estimation in a Generalization of Bivariate Probit Models with Dummy Endogenous Regressors

Authors: Sukjin Han, Sungwon Lee

The purpose of this paper is to provide guidelines for empirical researchers
who use a class of bivariate threshold crossing models with dummy endogenous
variables. A common practice employed by the researchers is the specification
of the joint distribution of the unobservables as a bivariate normal
distribution, which results in a bivariate probit model. To address the problem
of misspecification in this practice, we propose an easy-to-implement
semiparametric estimation framework with parametric copula and nonparametric
marginal distributions. We establish asymptotic theory, including root-n
normality, for the sieve maximum likelihood estimators that can be used to
conduct inference on the individual structural parameters and the average
treatment effect (ATE). In order to show the practical relevance of the
proposed framework, we conduct a sensitivity analysis via extensive Monte Carlo
simulation exercises. The results suggest that the estimates of the parameters,
especially the ATE, are sensitive to parametric specification, while
semiparametric estimation exhibits robustness to underlying data generating
processes. We then provide an empirical illustration where we estimate the
effect of health insurance on doctor visits. In this paper, we also show that
the absence of excluded instruments may result in identification failure, in
contrast to what some practitioners believe.

arXiv link: http://arxiv.org/abs/1808.05792v2

Econometrics arXiv paper, submitted: 2018-08-16

When Do Households Invest in Solar Photovoltaics? An Application of Prospect Theory

Authors: Martin Klein, Marc Deissenroth

While investments in renewable energy sources (RES) are incentivized around
the world, the policy tools that do so are still poorly understood, leading to
costly misadjustments in many cases. As a case study, the deployment dynamics
of residential solar photovoltaics (PV) invoked by the German feed-in tariff
legislation are investigated. Here we report a model showing that when people
invest in residential PV systems is determined not only by profitability, but
also by the change in profitability relative to the status quo. This finding is
interpreted in the light of loss aversion, a concept developed in Kahneman and
Tversky's Prospect Theory. The model is able to reproduce most of the dynamics
of the uptake with only a few financial and behavioral assumptions.

arXiv link: http://arxiv.org/abs/1808.05572v1

Econometrics arXiv updated paper (originally submitted: 2018-08-15)

Design-based Analysis in Difference-In-Differences Settings with Staggered Adoption

Authors: Susan Athey, Guido Imbens

In this paper we study estimation of and inference for average treatment
effects in a setting with panel data. We focus on the setting where units,
e.g., individuals, firms, or states, adopt the policy or treatment of interest
at a particular point in time, and then remain exposed to this treatment at all
times afterwards. We take a design perspective where we investigate the
properties of estimators and procedures given assumptions on the assignment
process. We show that under random assignment of the adoption date the standard
Difference-In-Differences estimator is an unbiased estimator of a particular
weighted average causal effect. We characterize the properties of this
estimand, and show that the standard variance estimator is conservative.

arXiv link: http://arxiv.org/abs/1808.05293v3

Econometrics arXiv paper, submitted: 2018-08-15

Can GDP measurement be further improved? Data revision and reconciliation

Authors: Jan P. A. M. Jacobs, Samad Sarferaz, Jan-Egbert Sturm, Simon van Norden

Recent years have seen many attempts to combine expenditure-side estimates of
U.S. real output (GDE) growth with income-side estimates (GDI) to improve
estimates of real GDP growth. We show how to incorporate information from
multiple releases of noisy data to provide more precise estimates while
avoiding some of the identifying assumptions required in earlier work. This
relies on a new insight: using multiple data releases allows us to distinguish
news and noise measurement errors in situations where a single vintage does
not.
Our new measure, GDP++, fits the data better than GDP+, the GDP growth
measure of Aruoba et al. (2016) published by the Federal Reserve Bank of
Philadelphia. Historical decompositions show that GDE releases are more
informative than GDI, while the use of multiple data releases is particularly
important in the quarters leading up to the Great Recession.

arXiv link: http://arxiv.org/abs/1808.04970v1

Econometrics arXiv updated paper (originally submitted: 2018-08-15)

A Unified Framework for Efficient Estimation of General Treatment Models

Authors: Chunrong Ai, Oliver Linton, Kaiji Motegi, Zheng Zhang

This paper presents a weighted optimization framework that unifies the
binary, multi-valued, continuous, as well as mixture of discrete and continuous
treatment, under the unconfounded treatment assignment. With a general loss
function, the framework includes the average, quantile and asymmetric least
squares causal effect of treatment as special cases. For this general
framework, we first derive the semiparametric efficiency bound for the causal
effect of treatment, extending the existing bound results to a wider class of
models. We then propose a generalized optimization estimation for the causal
effect with weights estimated by solving an expanding set of equations. Under
some sufficient conditions, we establish consistency and asymptotic normality
of the proposed estimator of the causal effect and show that the estimator
attains our semiparametric efficiency bound, thereby extending the existing
literature on efficient estimation of causal effect to a wider class of
applications. Finally, we discuss estimation of some causal effect functionals
such as the treatment effect curve and the average outcome. To evaluate the
finite sample performance of the proposed procedure, we conduct a small scale
simulation study and find that the proposed estimation has practical value. To
illustrate the applicability of the procedure, we revisit the literature on
campaign advertising and campaign contributions. Unlike the existing
procedures, which produce mixed results, we find no evidence of an effect of
campaign advertising on campaign contributions.

arXiv link: http://arxiv.org/abs/1808.04936v2

Econometrics arXiv updated paper (originally submitted: 2018-08-13)

Extrapolating Treatment Effects in Multi-Cutoff Regression Discontinuity Designs

Authors: Matias D. Cattaneo, Luke Keele, Rocio Titiunik, Gonzalo Vazquez-Bare

In non-experimental settings, the Regression Discontinuity (RD) design is one
of the most credible identification strategies for program evaluation and
causal inference. However, RD treatment effect estimands are necessarily local,
making statistical methods for the extrapolation of these effects a key area
for development. We introduce a new method for extrapolation of RD effects that
relies on the presence of multiple cutoffs, and is therefore design-based. Our
approach employs an easy-to-interpret identifying assumption that mimics the
idea of "common trends" in difference-in-differences designs. We illustrate our
methods with data on a subsidized loan program on post-secondary education attendance in
Colombia, and offer new evidence on program effects for students with test
scores away from the cutoff that determined program eligibility.

arXiv link: http://arxiv.org/abs/1808.04416v3

Econometrics arXiv paper, submitted: 2018-08-12

Engineering and Economic Analysis for Electric Vehicle Charging Infrastructure --- Placement, Pricing, and Market Design

Authors: Chao Luo

This dissertation studies the interplay between large-scale electric
vehicle (EV) charging and the power system. We address three important issues
pertaining to EV charging and integration into the power system: (1) charging
station placement, (2) pricing policy and energy management strategy, and (3)
electricity trading market and distribution network design to facilitate
integrating EV and renewable energy source (RES) into the power system.
For charging station placement problem, we propose a multi-stage consumer
behavior based placement strategy with incremental EV penetration rates and
model the EV charging industry as an oligopoly where the entire market is
dominated by a few charging service providers (oligopolists). The optimal
placement policy for each service provider is obtained by solving a Bayesian
game.
For pricing and energy management of EV charging stations, we provide
guidelines for charging service providers to determine charging price and
manage electricity reserve to balance the competing objectives of improving
profitability, enhancing customer satisfaction, and reducing impact on the
power system. Two algorithms --- a stochastic dynamic programming (SDP)
algorithm and a greedy (benchmark) algorithm --- are applied to derive the pricing
and electricity procurement strategy.
We design a novel electricity trading market and distribution network, which
supports seamless RES integration, grid to vehicle (G2V), vehicle to grid
(V2G), vehicle to vehicle (V2V), and distributed generation (DG) and storage.
We apply a sharing economy model to the electricity sector to stimulate
different entities to exchange and monetize their underutilized electricity. A
fitness-score (FS)-based supply-demand matching algorithm is developed by
considering consumer surplus, electricity network congestion, and economic
dispatch.

arXiv link: http://arxiv.org/abs/1808.03897v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2018-08-10

BooST: Boosting Smooth Trees for Partial Effect Estimation in Nonlinear Regressions

Authors: Yuri Fonseca, Marcelo Medeiros, Gabriel Vasconcelos, Alvaro Veiga

In this paper, we introduce a new machine learning (ML) model for nonlinear
regression called the Boosted Smooth Transition Regression Trees (BooST), which
is a combination of boosting algorithms with smooth transition regression
trees. The main advantage of the BooST model is the estimation of the
derivatives (partial effects) of very general nonlinear models. Therefore, the
model can provide more interpretation about the mapping between the covariates
and the dependent variable than other tree-based models, such as Random
Forests. We present several examples with both simulated and real data.

arXiv link: http://arxiv.org/abs/1808.03698v5

Econometrics arXiv paper, submitted: 2018-08-09

A Panel Quantile Approach to Attrition Bias in Big Data: Evidence from a Randomized Experiment

Authors: Matthew Harding, Carlos Lamarche

This paper introduces a quantile regression estimator for panel data models
with individual heterogeneity and attrition. The method is motivated by the
fact that attrition bias is often encountered in Big Data applications. For
example, many users sign up for the latest program but few remain active users
several months later, making the evaluation of such interventions inherently
very challenging. Building on earlier work by Hausman and Wise (1979), we
provide a simple identification strategy that leads to a two-step estimation
procedure. In the first step, the coefficients of interest in the selection
equation are consistently estimated using parametric or nonparametric methods.
In the second step, standard panel quantile methods are employed on a subset of
weighted observations. The estimator is computationally easy to implement in
Big Data applications with a large number of subjects. We investigate the
conditions under which the parameter estimator is asymptotically Gaussian and
we carry out a series of Monte Carlo simulations to investigate the finite
sample properties of the estimator. Lastly, using a simulation exercise, we
apply the method to the evaluation of a recent Time-of-Day electricity pricing
experiment inspired by the work of Aigner and Hausman (1980).
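
The two-step logic described above can be sketched in a few lines: estimate the probability of remaining in the sample, then run an inverse-probability-weighted quantile regression on the retained observations. The snippet below is only a stylized illustration under a simulated design; the selection model, weights, and optimizer are my own choices, not the authors' procedure.

```python
# Stylized two-step sketch: (1) estimate the probability of remaining in the
# sample with a logit, (2) run an inverse-probability-weighted quantile
# regression on the retained observations. Simulated data; illustrative only.
import numpy as np
from scipy.optimize import minimize
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 2))
y = 1.0 + x @ np.array([0.5, -0.3]) + rng.normal(size=n)
# Attrition depends on covariates (and hence correlates with outcomes).
p_stay = 1 / (1 + np.exp(-(0.2 + x @ np.array([0.8, -0.5]))))
stay = rng.uniform(size=n) < p_stay

# Step 1: selection equation estimated by logit.
logit = sm.Logit(stay.astype(float), sm.add_constant(x)).fit(disp=0)
p_hat = logit.predict(sm.add_constant(x))

# Step 2: weighted quantile regression on the retained subsample.
tau = 0.5
Xs = sm.add_constant(x[stay])
ys, ws = y[stay], 1.0 / p_hat[stay]

def check_loss(b):
    u = ys - Xs @ b
    return np.sum(ws * u * (tau - (u < 0)))

b0 = np.linalg.lstsq(Xs, ys, rcond=None)[0]      # OLS starting values
fit = minimize(check_loss, b0, method="Powell")  # derivative-free minimizer
print("IPW quantile regression coefficients:", fit.x)
```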

arXiv link: http://arxiv.org/abs/1808.03364v1

Econometrics arXiv paper, submitted: 2018-08-09

Change Point Estimation in Panel Data with Time-Varying Individual Effects

Authors: Otilia Boldea, Bettina Drepper, Zhuojiong Gan

This paper proposes a method for estimating multiple change points in panel
data models with unobserved individual effects via ordinary least-squares
(OLS). Typically, in this setting, the OLS slope estimators are inconsistent
due to the unobserved individual effects bias. As a consequence, existing
methods remove the individual effects before change point estimation through
data transformations such as first-differencing. We prove that under reasonable
assumptions, the unobserved individual effects bias has no impact on the
consistent estimation of change points. Our simulations show that since our
method does not remove any variation in the dataset before change point
estimation, it performs better in small samples compared to first-differencing
methods. We focus on short panels because they are commonly used in practice,
and allow for the unobserved individual effects to vary over time. Our method
is illustrated via two applications: the environmental Kuznets curve and the
U.S. house price expectations after the financial crisis.
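
To make the idea concrete, a minimal sketch of break-date estimation by OLS on simulated panel data follows: the individual effects are deliberately left in the data, and the break date is chosen by scanning candidate dates and minimizing the sum of squared residuals. The single-break design and all numbers are illustrative, not the authors' code.

```python
# Sketch: estimate a single break date in a short panel by pooled OLS,
# scanning candidate dates and minimizing the sum of squared residuals.
# Individual effects are left in the data, in the spirit of the paper.
import numpy as np

rng = np.random.default_rng(1)
N, T, k_true = 200, 10, 6            # units, periods, true break after period 6
alpha = rng.normal(size=N)            # unobserved individual effects
x = rng.normal(size=(N, T)) + 0.5 * alpha[:, None]         # x correlated with alpha
beta = np.where(np.arange(1, T + 1) <= k_true, 1.0, 2.0)   # slope shifts at break
y = alpha[:, None] + beta * x + rng.normal(size=(N, T))

def ssr_at_break(k):
    """Pooled OLS with separate slopes before/after candidate break k."""
    ssr = 0.0
    for xs, ys in [(x[:, :k].ravel(), y[:, :k].ravel()),
                   (x[:, k:].ravel(), y[:, k:].ravel())]:
        X = np.column_stack([np.ones_like(xs), xs])
        b = np.linalg.lstsq(X, ys, rcond=None)[0]
        ssr += np.sum((ys - X @ b) ** 2)
    return ssr

candidates = range(2, T - 1)
k_hat = min(candidates, key=ssr_at_break)
print("estimated break after period:", k_hat)   # typically close to k_true
```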

arXiv link: http://arxiv.org/abs/1808.03109v1

Econometrics arXiv updated paper (originally submitted: 2018-08-07)

Machine Learning for Dynamic Discrete Choice

Authors: Vira Semenova

Dynamic discrete choice models often discretize the state vector and restrict
its dimension in order to achieve valid inference. I propose a novel two-stage
estimator for the set-identified structural parameter that incorporates a
high-dimensional state space into the dynamic model of imperfect competition.
In the first stage, I estimate the state variable's law of motion and the
equilibrium policy function using machine learning tools. In the second stage,
I plug the first-stage estimates into a moment inequality and solve for the
structural parameter. The moment function is presented as the sum of two
components, where the first one expresses the equilibrium assumption and the
second one is a bias correction term that makes the sum insensitive (i.e.,
orthogonal) to first-stage bias. The proposed estimator uniformly converges at
the root-N rate and I use it to construct confidence regions. The results
developed here can be used to incorporate high-dimensional state space into
classic dynamic discrete choice models, for example, those considered in Rust
(1987), Bajari et al. (2007), and Scott (2013).

arXiv link: http://arxiv.org/abs/1808.02569v2

Econometrics arXiv updated paper (originally submitted: 2018-08-04)

Coverage Error Optimal Confidence Intervals for Local Polynomial Regression

Authors: Sebastian Calonico, Matias D. Cattaneo, Max H. Farrell

This paper studies higher-order inference properties of nonparametric local
polynomial regression methods under random sampling. We prove Edgeworth
expansions for $t$ statistics and coverage error expansions for interval
estimators that (i) hold uniformly in the data generating process, (ii) allow
for the uniform kernel, and (iii) cover estimation of derivatives of the
regression function. The terms of the higher-order expansions, and their
associated rates as a function of the sample size and bandwidth sequence,
depend on the smoothness of the population regression function, the smoothness
exploited by the inference procedure, and on whether the evaluation point is in
the interior or on the boundary of the support. We prove that robust bias
corrected confidence intervals have the fastest coverage error decay rates in
all cases, and we use our results to deliver novel, inference-optimal bandwidth
selectors. The main methodological results are implemented in companion
R and Stata software packages.

arXiv link: http://arxiv.org/abs/1808.01398v4

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2018-08-01

A Theory of Dichotomous Valuation with Applications to Variable Selection

Authors: Xingwei Hu

An econometric or statistical model may undergo a marginal gain if we admit a
new variable to the model, and a marginal loss if we remove an existing
variable from the model. Assuming equality of opportunity among all candidate
variables, we derive a valuation framework by the expected marginal gain and
marginal loss in all potential modeling scenarios. However, marginal gain and
loss are not symmetric; thus, we introduce three unbiased solutions. When used
in variable selection, our new approaches significantly outperform several
popular methods used in practice. The results also explore some novel traits of
the Shapley value.

arXiv link: http://arxiv.org/abs/1808.00131v5

Econometrics arXiv updated paper (originally submitted: 2018-07-31)

On the Unbiased Asymptotic Normality of Quantile Regression with Fixed Effects

Authors: Antonio F. Galvao, Jiaying Gu, Stanislav Volgushev

Nonlinear panel data models with fixed individual effects provide an
important set of tools for describing microeconometric data. In a large class
of such models (including probit, proportional hazard and quantile regression
to name just a few) it is impossible to difference out individual effects, and
inference is usually justified in a `large n large T' asymptotic framework.
However, there is a considerable gap in the type of assumptions that are
currently imposed in models with smooth score functions (such as probit, and
proportional hazard) and quantile regression. In the present paper we show that
this gap can be bridged and establish asymptotic unbiased normality for
quantile regression panels under conditions on n,T that are very close to what
is typically assumed in standard nonlinear panels. Our results considerably
improve upon existing theory and show that quantile regression is applicable to
the same type of panel data (in terms of n,T) as other commonly used nonlinear
panel data models. Thorough numerical experiments confirm our theoretical
findings.

arXiv link: http://arxiv.org/abs/1807.11863v2

Econometrics arXiv updated paper (originally submitted: 2018-07-31)

The econometrics of happiness: Are we underestimating the returns to education and income?

Authors: Christopher P Barrington-Leigh

This paper describes a fundamental and empirically conspicuous problem
inherent to surveys of human feelings and opinions in which subjective
responses are elicited on numerical scales. The paper also proposes a solution.
The problem is a tendency by some individuals -- particularly those with low
levels of education -- to simplify the response scale by considering only a
subset of possible responses such as the lowest, middle, and highest. In
principle, this “focal value rounding” (FVR) behavior renders invalid even
the weak ordinality assumption often used in analysis of such data. With
“happiness” or life satisfaction data as an example, descriptive methods and
a multinomial logit model both show that the effect is large and that education
and, to a lesser extent, income level are predictors of FVR behavior.
A model simultaneously accounting for the underlying wellbeing and for the
degree of FVR is able to estimate the latent subjective wellbeing, i.e. the
counterfactual full-scale responses for all respondents, the biases associated
with traditional estimates, and the fraction of respondents who exhibit FVR.
Addressing this problem helps to resolve a longstanding puzzle in the life
satisfaction literature, namely that the returns to education, after adjusting
for income, appear to be small or negative. Due to the same econometric
problem, the marginal utility of income in a subjective wellbeing sense has
been consistently underestimated.

arXiv link: http://arxiv.org/abs/1807.11835v3

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2018-07-30

Local Linear Forests

Authors: Rina Friedberg, Julie Tibshirani, Susan Athey, Stefan Wager

Random forests are a powerful method for non-parametric regression, but are
limited in their ability to fit smooth signals, and can show poor predictive
performance in the presence of strong, smooth effects. Taking the perspective
of random forests as an adaptive kernel method, we pair the forest kernel with
a local linear regression adjustment to better capture smoothness. The
resulting procedure, local linear forests, enables us to improve on asymptotic
rates of convergence for random forests with smooth signals, and provides
substantial gains in accuracy on both real and simulated data. We prove a
central limit theorem valid under regularity conditions on the forest and
smoothness constraints, and propose a computationally efficient construction
for confidence intervals. Moving to a causal inference application, we discuss
the merits of local regression adjustments for heterogeneous treatment effect
estimation, and give an example on a dataset exploring the effect word choice
has on attitudes to the social safety net. Last, we include simulation results
on real and generated data.
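
A rough sketch of the forest-kernel-plus-local-linear idea is given below, using scikit-learn leaf memberships to build the adaptive weights and a ridge-penalized local regression at a target point. This is not the authors' grf implementation; the weighting scheme, ridge penalty, and simulated data are illustrative assumptions.

```python
# Sketch: forest-kernel weights from scikit-learn leaf memberships, followed
# by a ridge-penalized local linear regression at a target point x0.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
n, p = 1000, 5
X = rng.uniform(-1, 1, size=(n, p))
y = np.sin(2 * X[:, 0]) + X[:, 1] + 0.3 * rng.normal(size=n)

rf = RandomForestRegressor(n_estimators=200, min_samples_leaf=20,
                           random_state=0).fit(X, y)

def forest_weights(x0):
    """alpha_i(x0): average over trees of 1{X_i in leaf(x0)} / leaf size."""
    leaves_train = rf.apply(X)                  # (n, n_trees)
    leaves_x0 = rf.apply(x0.reshape(1, -1))[0]  # (n_trees,)
    same = leaves_train == leaves_x0            # leaf co-membership indicators
    return (same / same.sum(axis=0)).mean(axis=1)

def llf_predict(x0, ridge=0.1):
    alpha = forest_weights(x0)
    D = np.column_stack([np.ones(n), X - x0])   # local linear design
    A = D.T * alpha                             # weight the moment matrices
    penalty = ridge * np.eye(p + 1)
    penalty[0, 0] = 0.0                         # do not penalize the intercept
    beta = np.linalg.solve(A @ D + penalty, A @ y)
    return beta[0]                              # local intercept = prediction

x0 = np.zeros(p)
print("local linear forest prediction at x0:", llf_predict(x0))
print("plain forest prediction at x0:", rf.predict(x0.reshape(1, -1))[0])
```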

arXiv link: http://arxiv.org/abs/1807.11408v4

Econometrics arXiv paper, submitted: 2018-07-26

Two-Step Estimation and Inference with Possibly Many Included Covariates

Authors: Matias D. Cattaneo, Michael Jansson, Xinwei Ma

We study the implications of including many covariates in a first-step
estimate entering a two-step estimation procedure. We find that a first order
bias emerges when the number of included covariates is "large"
relative to the square-root of sample size, rendering standard inference
procedures invalid. We show that the jackknife is able to estimate this "many
covariates" bias consistently, thereby delivering a new automatic
bias-corrected two-step point estimator. The jackknife also consistently
estimates the standard error of the original two-step point estimator. For
inference, we develop a valid post-bias-correction bootstrap approximation that
accounts for the additional variability introduced by the jackknife
bias-correction. We find that the jackknife bias-corrected point estimator and
the bootstrap post-bias-correction inference perform very well in simulations,
offering important improvements over conventional two-step point estimators and
inference procedures, which are not robust to including many covariates. We
apply our results to an array of distinct treatment effect, policy evaluation,
and other applied microeconomics settings. In particular, we discuss production
function and marginal treatment effect estimation in detail.
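
A generic delete-one jackknife for a two-step estimator can be sketched as below: recompute both steps leaving out one observation at a time, form the bias-corrected estimate n*theta_hat - (n-1)*mean of the leave-one-out estimates, and read the standard error off the jackknife variance. The concrete first and second steps here are stand-ins, not the paper's estimators.

```python
# Generic sketch of jackknife bias correction and standard errors for a
# two-step estimator; the concrete steps here are only a stand-in.
import numpy as np

rng = np.random.default_rng(3)
n, p = 300, 40                        # many first-step covariates
W = rng.normal(size=(n, p))
d = W @ rng.normal(scale=0.2, size=p) + rng.normal(size=n)   # treatment
y = 1.0 * d + W[:, 0] + rng.normal(size=n)                   # outcome

def two_step(idx):
    """Step 1: fit d on W; Step 2: regress y on the first-step fitted value."""
    Wi, di, yi = W[idx], d[idx], y[idx]
    dhat = Wi @ np.linalg.lstsq(Wi, di, rcond=None)[0]
    X = np.column_stack([np.ones(len(idx)), dhat])
    return np.linalg.lstsq(X, yi, rcond=None)[0][1]          # coefficient on dhat

theta_full = two_step(np.arange(n))
loo = np.array([two_step(np.delete(np.arange(n), i)) for i in range(n)])

theta_bc = n * theta_full - (n - 1) * loo.mean()             # bias-corrected
se_jack = np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))
print(f"plug-in: {theta_full:.3f}  jackknife-corrected: {theta_bc:.3f}  "
      f"jackknife SE: {se_jack:.3f}")
```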

arXiv link: http://arxiv.org/abs/1807.10100v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2018-07-23

Score Permutation Based Finite Sample Inference for Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) Models

Authors: Balázs Csanád Csáji

A standard model of (conditional) heteroscedasticity, i.e., the phenomenon
that the variance of a process changes over time, is the Generalized
AutoRegressive Conditional Heteroskedasticity (GARCH) model, which is
especially important for economics and finance. GARCH models are typically
estimated by the Quasi-Maximum Likelihood (QML) method, which works under mild
statistical assumptions. Here, we suggest a finite sample approach, called
ScoPe, to construct distribution-free confidence regions around the QML
estimate, which have exact coverage probabilities, even though no additional
assumptions about moments are made. ScoPe is inspired by the recently developed
Sign-Perturbed Sums (SPS) method, which however cannot be applied in the GARCH
case. ScoPe works by perturbing the score function using randomly permuted
residuals. This produces alternative samples which lead to exact confidence
regions. Experiments on simulated and stock market data are also presented, and
ScoPe is compared with the asymptotic theory and bootstrap approaches.

arXiv link: http://arxiv.org/abs/1807.08390v1

Econometrics arXiv paper, submitted: 2018-07-21

EMU and ECB Conflicts

Authors: William Mackenzie

In a dynamic framework, this paper considers the conflict between the
government and the central bank over the exchange rate and the EMU convergence
criteria, such as the public debt/GDP ratio. The approach consists of modeling
public debt management when there is no mechanism that allows naturally for
this adjustment.

arXiv link: http://arxiv.org/abs/1807.08097v1

Econometrics arXiv updated paper (originally submitted: 2018-07-20)

Asymptotic results under multiway clustering

Authors: Laurent Davezies, Xavier D'Haultfoeuille, Yannick Guyonvarch

Although multiway cluster-robust standard errors are routinely used in applied
economics, surprisingly few theoretical results justify this practice. This
paper aims to fill this gap. We first prove, under nearly the same conditions
as with i.i.d. data, the weak convergence of empirical processes under multiway
clustering. This result implies central limit theorems for sample averages but
is also key for showing the asymptotic normality of nonlinear estimators such
as GMM estimators. We then establish consistency of various asymptotic variance
estimators, including that of Cameron et al. (2011) but also a new estimator
that is positive by construction. Next, we show the general consistency, for
linear and nonlinear estimators, of the pigeonhole bootstrap, a resampling
scheme adapted to multiway clustering. Monte Carlo simulations suggest that
inference based on our two preferred methods may be accurate even with very few
clusters, and significantly improve upon inference based on Cameron et al.
(2011).
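
For reference, the familiar two-way clustered covariance of Cameron et al. (2011) can be assembled by inclusion-exclusion from one-way clustered covariances, as in the sketch below; this simple combination is not guaranteed to be positive semi-definite, which is one motivation for the positive-by-construction estimator proposed in the paper (not implemented here). The simulated design and cluster sizes are illustrative.

```python
# Sketch: two-way cluster-robust covariance by inclusion-exclusion,
#   V_two-way = V_cluster1 + V_cluster2 - V_intersection,
# using statsmodels one-way clustered covariances as building blocks.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
G1, G2, n = 30, 25, 2000                         # e.g. firms, years, observations
g1 = rng.integers(G1, size=n)
g2 = rng.integers(G2, size=n)
x = rng.normal(size=n) + rng.normal(size=G1)[g1]             # cluster component in x
y = 0.5 * x + rng.normal(size=G1)[g1] + rng.normal(size=G2)[g2] \
    + rng.normal(size=n)

X = sm.add_constant(x)
ols = sm.OLS(y, X)

def clustered_cov(groups):
    return ols.fit(cov_type="cluster", cov_kwds={"groups": groups}).cov_params()

V = clustered_cov(g1) + clustered_cov(g2) - clustered_cov(g1 * G2 + g2)
se_twoway = np.sqrt(np.diag(V))
print("two-way clustered SEs (const, x):", se_twoway)
```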

arXiv link: http://arxiv.org/abs/1807.07925v2

Econometrics arXiv paper, submitted: 2018-07-20

Stability in EMU

Authors: Theo Peeters

The public debt and deficit ceilings of the Maastricht Treaty are the subject
of recurring controversy. First, there is debate about the role and impact of
these criteria in the initial phase of the introduction of the single currency.
Secondly, it must be specified how these will then be applied, in a permanent
regime, when the single currency is well established.

arXiv link: http://arxiv.org/abs/1807.07730v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2018-07-19

Machine Learning Classifiers Do Not Improve the Prediction of Academic Risk: Evidence from Australia

Authors: Sarah Cornell-Farrow, Robert Garrard

Machine learning methods tend to outperform traditional statistical models at
prediction. In the prediction of academic achievement, ML models have not shown
substantial improvement over logistic regression. So far, these results have
almost entirely focused on college achievement, due to the availability of
administrative datasets, and have contained relatively small sample sizes by ML
standards. In this article we apply popular machine learning models to a large
dataset ($n=1.2$ million) containing primary and middle school performance on a
standardized test given annually to Australian students. We show that machine
learning models do not outperform logistic regression for detecting students
who will perform in the `below standard' band of achievement upon sitting their
next test, even in a large-$n$ setting.

arXiv link: http://arxiv.org/abs/1807.07215v4

Econometrics arXiv updated paper (originally submitted: 2018-07-18)

Take a Look Around: Using Street View and Satellite Images to Estimate House Prices

Authors: Stephen Law, Brooks Paige, Chris Russell

When an individual purchases a home, they simultaneously purchase its
structural features, its accessibility to work, and the neighborhood amenities.
Some amenities, such as air quality, are measurable while others, such as the
prestige or the visual impression of a neighborhood, are difficult to quantify.
Despite the well-known impacts intangible housing features have on house
prices, limited attention has been given to systematically quantifying these
difficult-to-measure amenities. Two issues have led to this neglect. Not only
do few quantitative methods exist that can measure the urban environment, but
the collection of such data is also costly and subjective.
We show that street image and satellite image data can capture these urban
qualities and improve the estimation of house prices. We propose a pipeline
that uses a deep neural network model to automatically extract visual features
from images to estimate house prices in London, UK. We make use of traditional
housing features such as age, size, and accessibility as well as visual
features from Google Street View images and Bing aerial images in estimating
the house price model. We find encouraging results where learning to
characterize the urban quality of a neighborhood improves house price
prediction, even when generalizing to previously unseen London boroughs.
We explore the use of non-linear vs. linear methods to fuse these cues with
conventional models of house pricing, and show how the interpretability of
linear models allows us to directly extract proxy variables for visual
desirability of neighborhoods that are both of interest in their own right, and
could be used as inputs to other econometric methods. This is particularly
valuable as once the network has been trained with the training data, it can be
applied elsewhere, allowing us to generate vivid dense maps of the visual
appeal of London streets.

arXiv link: http://arxiv.org/abs/1807.07155v2

Econometrics arXiv paper, submitted: 2018-07-18

A New Index of Human Capital to Predict Economic Growth

Authors: Henry Laverde, Juan C. Correa, Klaus Jaffe

The accumulation of knowledge required to produce economic value is a process
that often relates to nations' economic growth. Such a relationship, however,
is misleading when the proxy of such accumulation is the average years of
education. In this paper, we show that the predictive power of this proxy
started to dwindle in 1990, when nations' schooling began to homogenize. We
propose a metric of human capital that is less sensitive than average years of
education and remains a significant predictor of economic growth when tested
with both cross-section data and panel data. We argue that future research on
economic growth will discard quantity-based educational variables as
predictors, given the thresholds that these variables are reaching.

arXiv link: http://arxiv.org/abs/1807.07051v1

Econometrics arXiv paper, submitted: 2018-07-18

Cross Validation Based Model Selection via Generalized Method of Moments

Authors: Junpei Komiyama, Hajime Shimao

Structural estimation is an important methodology in empirical economics, and
a large class of structural models are estimated through the generalized method
of moments (GMM). Traditionally, selection among structural models has been
performed based on model fit upon estimation, using the entire observed sample.
In this paper, we propose a model selection procedure based on cross-validation
(CV), which uses a sample-splitting technique to avoid issues such as
over-fitting. While CV is widely used in the machine learning community, we are
the first to prove its consistency for model selection in the GMM framework.
Its empirical properties are compared to existing methods through simulations
of IV regressions and an oligopoly market model. In addition, we propose a way
to apply our method within the Mathematical Programming with Equilibrium
Constraints (MPEC) approach. Finally, we apply our method to online-retail
sales data to compare a dynamic market model with a static model.
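
The mechanics of CV-based selection for a GMM model can be illustrated as follows: fit each candidate specification on the training folds, then score it by the GMM objective evaluated on the held-out fold, and pick the specification with the smallest average out-of-fold criterion. The linear IV moments, the two candidate models, and the fold scheme below are purely illustrative choices, not the authors' setup or proof.

```python
# Stylized cross-validated model selection for linear IV/GMM: fit on the
# training folds, score by the GMM objective on the held-out fold.
import numpy as np

rng = np.random.default_rng(5)
n = 1500
z = rng.normal(size=(n, 3))                      # instruments
u = rng.normal(size=n)
x1 = z @ np.array([1.0, 0.5, 0.0]) + 0.5 * u + rng.normal(size=n)
x2 = rng.normal(size=n)                          # irrelevant regressor
y = 1.0 * x1 + u

models = {"correct: y ~ x1": np.column_stack([x1]),
          "misspecified: y ~ x2": np.column_stack([x2])}

def gmm_fit(X, Z, y):
    W = np.linalg.inv(Z.T @ Z / len(y))          # 2SLS-type weight matrix
    A = X.T @ Z @ W @ Z.T @ X
    return np.linalg.solve(A, X.T @ Z @ W @ Z.T @ y)

def gmm_crit(b, X, Z, y):
    g = Z.T @ (y - X @ b) / len(y)               # sample moments on the fold
    W = np.linalg.inv(Z.T @ Z / len(y))
    return float(g @ W @ g)

K = 5
folds = np.array_split(rng.permutation(n), K)
for name, X in models.items():
    scores = []
    for k in range(K):
        test = folds[k]
        train = np.setdiff1d(np.arange(n), test)
        b = gmm_fit(X[train], z[train], y[train])
        scores.append(gmm_crit(b, X[test], z[test], y[test]))
    print(f"{name}: CV criterion = {np.mean(scores):.5f}")
```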

arXiv link: http://arxiv.org/abs/1807.06993v1

Econometrics arXiv updated paper (originally submitted: 2018-07-18)

Quantile-Regression Inference With Adaptive Control of Size

Authors: Juan Carlos Escanciano, Chuan Goh

Regression quantiles have asymptotic variances that depend on the conditional
densities of the response variable given regressors. This paper develops a new
estimate of the asymptotic variance of regression quantiles that leads any
resulting Wald-type test or confidence region to behave as well in large
samples as its infeasible counterpart in which the true conditional response
densities are embedded. We give explicit guidance on implementing the new
variance estimator to control adaptively the size of any resulting Wald-type
test. Monte Carlo evidence indicates the potential of our approach to deliver
powerful tests of heterogeneity of quantile treatment effects in covariates
with good size performance over different quantile levels, data-generating
processes and sample sizes. We also include an empirical example. Supplementary
material is available online.

arXiv link: http://arxiv.org/abs/1807.06977v2

Econometrics arXiv paper, submitted: 2018-07-17

Pink Work: Same-Sex Marriage, Employment and Discrimination

Authors: Dario Sansone

This paper analyzes how the legalization of same-sex marriage in the U.S.
affected gay and lesbian couples in the labor market. Results from a
difference-in-difference model show that both partners in same-sex couples were
more likely to be employed, to have a full-time contract, and to work longer
hours in states that legalized same-sex marriage. In line with a theoretical
search model of discrimination, suggestive empirical evidence supports the
hypothesis that marriage equality led to an improvement in employment outcomes
among gays and lesbians and lower occupational segregation thanks to a decrease
in discrimination towards sexual minorities.

arXiv link: http://arxiv.org/abs/1807.06698v1

Econometrics arXiv updated paper (originally submitted: 2018-07-17)

Limit Theorems for Factor Models

Authors: Stanislav Anatolyev, Anna Mikusheva

The paper establishes the central limit theorems and proposes how to perform
valid inference in factor models. We consider a setting where many
counties/regions/assets are observed for many time periods, and when estimation
of a global parameter includes aggregation of a cross-section of heterogeneous
micro-parameters estimated separately for each entity. The central limit
theorem applies for quantities involving both cross-sectional and time series
aggregation, as well as for quadratic forms in time-aggregated errors. The
paper studies the conditions when one can consistently estimate the asymptotic
variance, and proposes a bootstrap scheme for cases when one cannot. A small
simulation study illustrates performance of the asymptotic and bootstrap
procedures. The results are useful for making inferences in two-step estimation
procedures related to factor models, as well as in other related contexts. Our
treatment avoids structural modeling of cross-sectional dependence but imposes
time-series independence.

arXiv link: http://arxiv.org/abs/1807.06338v3

Econometrics arXiv paper, submitted: 2018-07-16

A Simple and Efficient Estimation of the Average Treatment Effect in the Presence of Unmeasured Confounders

Authors: Chunrong Ai, Lukang Huang, Zheng Zhang

Wang and Tchetgen Tchetgen (2017) studied identification and estimation of
the average treatment effect when some confounders are unmeasured. Under their
identification condition, they showed that the semiparametric efficient
influence function depends on five unknown functionals. They proposed to
parameterize all functionals and estimate the average treatment effect from the
efficient influence function by replacing the unknown functionals with
estimated functionals. They established that their estimator is consistent when
certain functionals are correctly specified and attains the semiparametric
efficiency bound when all functionals are correctly specified. In applications,
it is likely that those functionals could all be misspecified. Consequently
their estimator could be inconsistent or consistent but not efficient. This
paper presents an alternative estimator that does not require parameterization
of any of the functionals. We establish that the proposed estimator is always
consistent and always attains the semiparametric efficiency bound. A simple and
intuitive estimator of the asymptotic variance is presented, and a small scale
simulation study reveals that the proposed estimation outperforms the existing
alternatives in finite samples.

arXiv link: http://arxiv.org/abs/1807.05678v1

Econometrics arXiv updated paper (originally submitted: 2018-07-12)

Analysis of a Dynamic Voluntary Contribution Mechanism Public Good Game

Authors: Dmytro Bogatov

I present a dynamic voluntary contribution mechanism (VCM) public good game and
derive its potential outcomes. In each period, players endogenously determine
contribution productivity by engaging in costly investment. The level of
contribution productivity carries from period to period, creating a dynamic
link between periods. The investment mimics investing in the stock of
technology for producing public goods such as national defense or a clean
environment. After investing, players decide how much of their remaining money
to contribute to provision of the public good, as in traditional public good
games. I analyze three kinds of outcomes of the game: the lowest payoff
outcome, the Nash Equilibria, and socially optimal behavior. In the lowest
payoff outcome, all players receive payoffs of zero. Nash Equilibrium occurs
when players invest any amount and contribute all or nothing depending on the
contribution productivity. Therefore, there are infinitely many Nash Equilibria
strategies. Finally, the socially optimal result occurs when players invest
everything in early periods, then at some point switch to contributing
everything. My goal is to discover and explain this point. I use mathematical
analysis and computer simulation to derive the results.

arXiv link: http://arxiv.org/abs/1807.04621v2

Econometrics arXiv paper, submitted: 2018-07-11

Heterogeneous Effects of Unconventional Monetary Policy on Loan Demand and Supply. Insights from the Bank Lending Survey

Authors: Martin Guth

This paper analyzes the bank lending channel and the heterogeneous effects on
the euro area, providing evidence that the channel is indeed working. The
analysis of the transmission mechanism is based on structural impulse responses
to an unconventional monetary policy shock on bank loans. The Bank Lending
Survey (BLS) is exploited in order to get insights on developments of loan
demand and supply. The contribution of this paper is to use country-specific
data to analyze the consequences of unconventional monetary policy, instead of
taking an aggregate stance by using euro area data. This approach provides a
deeper understanding of the bank lending channel and its effects. That is, an
expansionary monetary policy shock leads to an increase in loan demand, supply
and output growth. A small north-south disparity between the countries can be
observed.

arXiv link: http://arxiv.org/abs/1807.04161v1

Econometrics arXiv updated paper (originally submitted: 2018-07-11)

Factor models with many assets: strong factors, weak factors, and the two-pass procedure

Authors: Stanislav Anatolyev, Anna Mikusheva

This paper re-examines the problem of estimating risk premia in linear factor
pricing models. Typically, the data used in the empirical literature are
characterized by weakness of some pricing factors, strong cross-sectional
dependence in the errors, and (moderately) high cross-sectional dimensionality.
Using an asymptotic framework where the number of assets/portfolios grows with
the time span of the data while the risk exposures of weak factors are
local-to-zero, we show that the conventional two-pass estimation procedure
delivers inconsistent estimates of the risk premia. We propose a new estimation
procedure based on sample-splitting instrumental variables regression. The
proposed estimator of risk premia is robust to weak included factors and to the
presence of strong unaccounted cross-sectional error dependence. We derive the
many-asset weak factor asymptotic distribution of the proposed estimator, show
how to construct its standard errors, verify its performance in simulations,
and revisit some empirical studies.

arXiv link: http://arxiv.org/abs/1807.04094v2

Econometrics arXiv updated paper (originally submitted: 2018-07-11)

Clustering Macroeconomic Time Series

Authors: Iwo Augustyński, Paweł Laskoś-Grabowski

The data mining technique of time series clustering is well established in
many fields. However, as an unsupervised learning method, it requires making
choices that are nontrivially influenced by the nature of the data involved.
The aim of this paper is to verify the usefulness of the time series
clustering method for macroeconomic research, and to develop the most suitable
methodology.
By extensively testing various possibilities, we arrive at a choice of a
dissimilarity measure (compression-based dissimilarity measure, or CDM) which
is particularly suitable for clustering macroeconomic variables. We check that
the results are stable in time and reflect large-scale phenomena such as
crises. We also successfully apply our findings to analysis of national
economies, specifically to identifying their structural relations.
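
A minimal sketch of the compression-based dissimilarity measure, CDM(x, y) = C(xy) / (C(x) + C(y)), and of hierarchical clustering on the resulting distance matrix is shown below. The discretization into symbols, the zlib compressor, and the linkage choice are my own illustrative assumptions, not the paper's exact pipeline.

```python
# Minimal sketch of the compression-based dissimilarity measure (CDM),
#   CDM(x, y) = C(xy) / (C(x) + C(y)),
# followed by hierarchical clustering of the resulting distance matrix.
import zlib
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(6)
t = np.arange(200)
series = {
    "gdp_A": np.sin(t / 8) + 0.1 * rng.normal(size=200),
    "gdp_B": np.sin(t / 8) + 0.1 * rng.normal(size=200),
    "infl_C": np.cos(t / 3) + 0.1 * rng.normal(size=200),
}

def encode(x):
    """Discretize to single-byte symbols so the compressor can see structure."""
    bins = np.quantile(x, np.linspace(0, 1, 9)[1:-1])
    return (65 + np.digitize(x, bins)).astype(np.uint8).tobytes()   # 'A'..'H'

def cdm(a, b):
    ca, cb = len(zlib.compress(a)), len(zlib.compress(b))
    return len(zlib.compress(a + b)) / (ca + cb)

names = list(series)
enc = {k: encode(v) for k, v in series.items()}
D = np.array([[0.0 if i == j else cdm(enc[a], enc[b])
               for j, b in enumerate(names)] for i, a in enumerate(names)])
Z = linkage(squareform(D, checks=False), method="average")
print(dict(zip(names, fcluster(Z, t=2, criterion="maxclust"))))
```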

arXiv link: http://arxiv.org/abs/1807.04004v2

Econometrics arXiv paper, submitted: 2018-07-09

Simulation Modelling of Inequality in Cancer Service Access

Authors: Ka C. Chan, Ruth F. G. Williams, Christopher T. Lenard, Terence M. Mills

This paper applies economic concepts from measuring income inequality to an
exercise in assessing spatial inequality in cancer service access in regional
areas. We propose a mathematical model of access to chemotherapy across local
government areas (LGAs). Our model incorporates a distance factor. With a
simulation we report results for a single inequality measure: the Lorenz curve
is depicted for our illustrative data. We develop this approach in order to
move incrementally towards its application to actual data and real-world health
service regions. We seek to develop exercises that can point policy makers to
the most useful data to collect, and the modeling needed, for assessing cancer
service access in regional areas.
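
The inequality machinery used in such an exercise is standard; a small sketch computing Lorenz curve ordinates and a Gini coefficient for hypothetical per-LGA access scores follows. The numbers are made up for illustration and are not the paper's model or data.

```python
# Sketch: Lorenz curve ordinates and Gini coefficient for hypothetical
# per-LGA access scores (illustrative numbers only).
import numpy as np

access = np.array([5.0, 12.0, 3.5, 20.0, 8.0, 1.0, 15.0, 6.5])  # per-LGA access

def lorenz(values):
    v = np.sort(values)
    cum = np.concatenate([[0.0], np.cumsum(v)]) / v.sum()
    pop = np.linspace(0, 1, len(v) + 1)
    return pop, cum

def gini(values):
    pop, cum = lorenz(values)
    # Gini = 1 - 2 * area under the Lorenz curve (trapezoid rule).
    return 1.0 - 2.0 * np.trapz(cum, pop)

pop, cum = lorenz(access)
print("Lorenz ordinates:", np.round(cum, 3))
print("Gini coefficient:", round(gini(access), 3))
```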

arXiv link: http://arxiv.org/abs/1807.03048v1

Econometrics arXiv updated paper (originally submitted: 2018-07-09)

Cancer Risk Messages: Public Health and Economic Welfare

Authors: Ruth F. G. Williams, Ka C. Chan, Christopher T. Lenard, Terence M. Mills

Statements for public health purposes such as "1 in 2 will get cancer by age
85" have appeared in public spaces. The meaning drawn from such statements
affects economic welfare, not just public health. Both markets and government
use risk information on all kinds of risks; useful information can, in turn,
improve economic welfare, whereas inaccuracy can lower it. We adapt the
contingency table approach so that a quoted risk is cross-classified with the
states of nature. We show that bureaucratic objective functions regarding the
accuracy of a reported cancer risk can then be stated.

arXiv link: http://arxiv.org/abs/1807.03045v2

Econometrics arXiv updated paper (originally submitted: 2018-07-09)

Cancer Risk Messages: A Light Bulb Model

Authors: Ka C. Chan, Ruth F. G. Williams, Christopher T. Lenard, Terence M. Mills

The meaning of public messages such as "One in x people gets cancer" or "One
in y people gets cancer by age z" can be improved. One assumption commonly
invoked is that there is no other cause of death, a confusing assumption. We
develop a light bulb model to clarify cumulative risk and we use Markov chain
modeling, incorporating this widely used assumption, to evaluate transition
probabilities. Age-progression in the cancer risk is then reported on
Australian data. Future modelling can elicit realistic assumptions.
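
A toy version of the cumulative-risk calculation under the "no other cause of death" assumption can be written as a two-state Markov chain (cancer-free, diagnosed) with age-specific hazards; the hazards below are hypothetical placeholders, not Australian data or the paper's light bulb model.

```python
# Toy sketch: cumulative cancer risk by age under the common simplifying
# assumption of no other cause of death, via a two-state Markov chain
# (cancer-free -> diagnosed). Hazards below are hypothetical, not real data.
import numpy as np

ages = np.arange(0, 86)
hazard = 1e-5 * np.exp(0.09 * ages)        # hypothetical annual incidence

state = np.array([1.0, 0.0])               # start: everyone cancer-free
cumulative = []
for h in hazard:
    P = np.array([[1 - h, h],              # stay cancer-free / get diagnosed
                  [0.0, 1.0]])             # 'diagnosed' is absorbing
    state = state @ P
    cumulative.append(state[1])

print(f"cumulative risk by age 85 (toy numbers): {cumulative[-1]:.2f}")
# Equivalent closed form under the same assumption: 1 - prod(1 - hazard)
print(f"check: {1 - np.prod(1 - hazard):.2f}")
```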

arXiv link: http://arxiv.org/abs/1807.03040v2

Econometrics arXiv paper, submitted: 2018-07-09

Transaction costs and institutional change of trade litigations in Bulgaria

Authors: Shteryo Nozharov, Petya Koralova-Nozharova

This paper uses the methods of new institutional economics to identify the
transaction costs of trade litigation in Bulgaria. For the purposes of the
research, an indicative model measuring this type of cost at the microeconomic
level is applied. The main purpose of the model is to forecast the rational
behavior of the parties to trade litigation, given the transaction costs of
enforcing the execution of a signed commercial contract. The application of the
model allows a more accurate measurement of transaction costs at the
microeconomic level, which could lead to better prediction and management of
these costs so that market efficiency and economic growth can be achieved. In
addition, we attempt to analyse the efficiency of the institutional change of
the commercial justice system and the impact of the judicial reform on economic
turnover. An increase, or lack of reduction, in the transaction costs of trade
litigation would indicate inefficiency of the judicial reform. JEL Codes: O43,
P48, D23, K12

arXiv link: http://arxiv.org/abs/1807.03034v1

Econometrics arXiv updated paper (originally submitted: 2018-07-08)

Measurement Errors as Bad Leverage Points

Authors: Eric Blankmeyer

Errors-in-variables (EIV) is a long-standing, difficult issue in linear
regression, and progress depends in part on new identifying assumptions. I
characterize measurement error as bad-leverage points and assume that fewer
than half the sample observations are heavily contaminated, in which case a
high-breakdown robust estimator may be able to isolate and downweight or
discard the problematic data. In simulations of simple and multiple regression
where EIV affects 25% of the data and R-squared is mediocre, certain
high-breakdown estimators have small bias and reliable confidence intervals.
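
A quick simulation in the spirit of this abstract is sketched below: 25% of a regressor is contaminated by measurement error and an OLS fit is compared with a robust fit. Scikit-learn's Theil-Sen estimator is used here only as a convenient stand-in for the high-breakdown estimators the paper studies; the design and numbers are invented.

```python
# Simulation sketch: 25% of the regressor is contaminated by measurement
# error; compare OLS with a robust fit. Theil-Sen is used here only as a
# convenient stand-in for the high-breakdown estimators the paper studies.
import numpy as np
from sklearn.linear_model import LinearRegression, TheilSenRegressor

rng = np.random.default_rng(7)
n = 500
x_true = rng.normal(size=n)
y = 1.0 + 2.0 * x_true + rng.normal(scale=1.5, size=n)   # mediocre R-squared

x_obs = x_true.copy()
bad = rng.choice(n, size=n // 4, replace=False)          # 25% contaminated
x_obs[bad] += rng.normal(scale=4.0, size=len(bad))       # errors-in-variables

ols = LinearRegression().fit(x_obs.reshape(-1, 1), y)
ts = TheilSenRegressor(random_state=0).fit(x_obs.reshape(-1, 1), y)
print(f"true slope 2.0 | OLS: {ols.coef_[0]:.2f} | Theil-Sen: {ts.coef_[0]:.2f}")
```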

arXiv link: http://arxiv.org/abs/1807.02814v2

Econometrics arXiv cross-link from cs.SI (cs.SI), submitted: 2018-07-06

Maximizing Welfare in Social Networks under a Utility Driven Influence Diffusion Model

Authors: Prithu Banerjee, Wei Chen, Laks V. S. Lakshmanan

Motivated by applications such as viral marketing, the problem of influence
maximization (IM) has been extensively studied in the literature. The goal is
to select a small number of users to adopt an item such that it results in a
large cascade of adoptions by others. Existing works have three key
limitations. (1) They do not account for economic considerations of a user in
buying/adopting items. (2) Most studies on multiple items focus on competition,
with complementary items receiving limited attention. (3) For the network
owner, maximizing social welfare is important to ensure customer loyalty, which
is not addressed in prior work in the IM literature. In this paper, we address
all three limitations and propose a novel model called UIC that combines
utility-driven item adoption with influence propagation over networks. Focusing
on the mutually complementary setting, we formulate the problem of social
welfare maximization in this novel setting. We show that while the objective
function is neither submodular nor supermodular, surprisingly a simple greedy
allocation algorithm achieves a factor of $(1-1/e-\epsilon)$ of the optimum
expected social welfare. We develop bundleGRD, a scalable version of
this approximation algorithm, and demonstrate, with comprehensive experiments
on real and synthetic datasets, that it significantly outperforms all
baselines.

arXiv link: http://arxiv.org/abs/1807.02502v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2018-07-06

Autoregressive Wild Bootstrap Inference for Nonparametric Trends

Authors: Marina Friedrich, Stephan Smeekes, Jean-Pierre Urbain

In this paper we propose an autoregressive wild bootstrap method to construct
confidence bands around a smooth deterministic trend. The bootstrap method is
easy to implement and does not require any adjustments in the presence of
missing data, which makes it particularly suitable for climatological
applications. We establish the asymptotic validity of the bootstrap method for
both pointwise and simultaneous confidence bands under general conditions,
allowing for general patterns of missing data, serial dependence and
heteroskedasticity. The finite sample properties of the method are studied in a
simulation study. We use the method to study the evolution of trends in daily
measurements of atmospheric ethane obtained from a weather station in the Swiss
Alps, where the method can easily deal with the many missing observations due
to adverse weather conditions.
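
A condensed sketch of the autoregressive wild bootstrap idea follows: smooth the series, multiply the residuals by an AR(1) sequence of standard-normal-based multipliers so that serial dependence is preserved, re-estimate the trend on each bootstrap sample, and read off pointwise quantile bands. The Nadaraya-Watson smoother, bandwidth, AR parameter, and band construction are simplified choices of mine, not the paper's recommended implementation, and missing data are ignored here.

```python
# Condensed sketch of an autoregressive wild bootstrap for a smooth trend:
# residuals are multiplied by an AR(1) sequence of N(0,1)-based multipliers,
# preserving serial dependence, then the trend is re-estimated on each draw.
import numpy as np

rng = np.random.default_rng(8)
T = 300
t = np.linspace(0, 1, T)
e = np.zeros(T)
for s in range(1, T):                      # AR(1) errors in the data
    e[s] = 0.5 * e[s - 1] + rng.normal(scale=0.2)
y = 2 * t ** 2 + e                         # smooth trend + dependent noise

def kernel_trend(series, h=0.08):
    """Nadaraya-Watson smoother with a Gaussian kernel (illustrative)."""
    w = np.exp(-0.5 * ((t[:, None] - t[None, :]) / h) ** 2)
    return (w * series).sum(axis=1) / w.sum(axis=1)

trend_hat = kernel_trend(y)
resid = y - trend_hat

B, gamma = 499, 0.6                        # bootstrap draws, multiplier AR parameter
boot = np.empty((B, T))
for b in range(B):
    xi = np.zeros(T)
    xi[0] = rng.normal()
    for s in range(1, T):                  # AR(1) wild multipliers
        xi[s] = gamma * xi[s - 1] + np.sqrt(1 - gamma ** 2) * rng.normal()
    boot[b] = kernel_trend(trend_hat + xi * resid)

lower, upper = np.percentile(boot, [2.5, 97.5], axis=0)
print("pointwise 95% band width at midpoint:",
      round(upper[T // 2] - lower[T // 2], 3))
```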

arXiv link: http://arxiv.org/abs/1807.02357v2

Econometrics arXiv updated paper (originally submitted: 2018-07-06)

State-Varying Factor Models of Large Dimensions

Authors: Markus Pelger, Ruoxuan Xiong

This paper develops an inferential theory for state-varying factor models of
large dimensions. Unlike constant factor models, loadings are general functions
of some recurrent state process. We develop an estimator for the latent factors
and state-varying loadings under a large cross-section and time dimension. Our
estimator combines nonparametric methods with principal component analysis. We
derive the rate of convergence and limiting normal distribution for the
factors, loadings and common components. In addition, we develop a statistical
test for a change in the factor structure in different states. We apply the
estimator to U.S. Treasury yields and S&P500 stock returns. The systematic
factor structure in treasury yields differs in times of booms and recessions as
well as in periods of high market volatility. State-varying factors based on
the VIX capture significantly more variation and pricing information in
individual stocks than constant factor models.

arXiv link: http://arxiv.org/abs/1807.02248v4

Econometrics arXiv updated paper (originally submitted: 2018-07-05)

Minimizing Sensitivity to Model Misspecification

Authors: Stéphane Bonhomme, Martin Weidner

We propose a framework for estimation and inference when the model may be
misspecified. We rely on a local asymptotic approach where the degree of
misspecification is indexed by the sample size. We construct estimators whose
mean squared error is minimax in a neighborhood of the reference model, based
on one-step adjustments. In addition, we provide confidence intervals that
contain the true parameter under local misspecification. As a tool to interpret
the degree of misspecification, we map it to the local power of a specification
test of the reference model. Our approach allows for systematic sensitivity
analysis when the parameter of interest may be partially or irregularly
identified. As illustrations, we study three applications: an empirical
analysis of the impact of conditional cash transfers in Mexico where
misspecification stems from the presence of stigma effects of the program, a
cross-sectional binary choice model where the error distribution is
misspecified, and a dynamic panel data binary choice model where the number of
time periods is small and the distribution of individual effects is
misspecified.

arXiv link: http://arxiv.org/abs/1807.02161v6

Econometrics arXiv updated paper (originally submitted: 2018-07-05)

Fixed Effects and the Generalized Mundlak Estimator

Authors: Dmitry Arkhangelsky, Guido Imbens

We develop a new approach for estimating average treatment effects in
observational studies with unobserved group-level heterogeneity. We consider a
general model with group-level unconfoundedness and provide conditions under
which aggregate balancing statistics -- group-level averages of functions of
treatments and covariates -- are sufficient to eliminate differences between
groups. Building on these results, we reinterpret commonly used linear
fixed-effect regression estimators by writing them in the Mundlak form as
linear regression estimators without fixed effects but including group
averages. We use this representation to develop Generalized Mundlak Estimators
(GMEs) that capture group differences through group averages of (functions of)
the unit-level variables and adjust for these group differences in flexible and
robust ways in the spirit of the modern causal literature.
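
The Mundlak representation at the heart of this reinterpretation is easy to verify numerically: in a balanced panel, pooled OLS with the group mean of the covariate added reproduces the within (fixed-effects) slope. The sketch below only illustrates this equivalence; the Generalized Mundlak Estimators with flexible adjustments are not implemented.

```python
# Tiny illustration of the Mundlak form: pooled OLS with group means of x
# added reproduces the within (fixed-effects) slope in a balanced panel.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
G, m = 50, 20                                   # groups, units per group
g = np.repeat(np.arange(G), m)
alpha = rng.normal(size=G)                      # group effects
x = rng.normal(size=G * m) + alpha[g]           # x correlated with group effect
y = 1.5 * x + alpha[g] + rng.normal(size=G * m)

xbar_g = np.bincount(g, weights=x) / m          # group averages of x

# Mundlak regression: y on x and the group mean of x.
X_mundlak = sm.add_constant(np.column_stack([x, xbar_g[g]]))
b_mundlak = sm.OLS(y, X_mundlak).fit().params[1]

# Within estimator: demean y and x by group.
x_w = x - xbar_g[g]
y_w = y - (np.bincount(g, weights=y) / m)[g]
b_within = sm.OLS(y_w, x_w).fit().params[0]

print(f"Mundlak slope: {b_mundlak:.3f}   within slope: {b_within:.3f}")
```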

arXiv link: http://arxiv.org/abs/1807.02099v9

Econometrics arXiv updated paper (originally submitted: 2018-07-04)

On the Identifying Content of Instrument Monotonicity

Authors: Vishal Kamat

This paper studies the identifying content of the instrument monotonicity
assumption of Imbens and Angrist (1994) on the distribution of potential
outcomes in a model with a binary outcome, a binary treatment and an exogenous
binary instrument. Specifically, I derive necessary and sufficient conditions
on the distribution of the data under which the identified set for the
distribution of potential outcomes when the instrument monotonicity assumption
is imposed can be a strict subset of that when it is not imposed.

arXiv link: http://arxiv.org/abs/1807.01661v2

Econometrics arXiv paper, submitted: 2018-07-04

Indirect inference through prediction

Authors: Ernesto Carrella, Richard M. Bailey, Jens Koed Madsen

By recasting indirect inference estimation as a prediction rather than a
minimization and by using regularized regressions, we can bypass the three
major problems of estimation: selecting the summary statistics, defining the
distance function and minimizing it numerically. By substituting regression
with classification we can extend this approach to model selection as well. We
present three examples: a statistical fit, the parametrization of a simple real
business cycle model and heuristics selection in a fishery agent-based model.
The outcome is a method that automatically chooses summary statistics, weighs
them, and uses them to parametrize models without running any direct
minimization.
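
The "estimation as prediction" idea can be sketched schematically: simulate the model over many parameter draws, compute candidate summary statistics, fit a regularized regression from statistics to parameters, and read the estimate off the prediction at the observed statistics. The AR(1) model, the chosen statistics, and the ridge penalty below are illustrative assumptions, not the authors' applications.

```python
# Schematic sketch of indirect inference as prediction: regress parameters on
# summary statistics across simulated datasets, then predict at the observed
# statistics. The AR(1) example and chosen statistics are illustrative.
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(10)

def simulate_ar1(rho, T=200):
    y = np.zeros(T)
    for s in range(1, T):
        y[s] = rho * y[s - 1] + rng.normal()
    return y

def summaries(y):
    """Candidate summary statistics: variance and first three autocorrelations."""
    ac = [np.corrcoef(y[:-k], y[k:])[0, 1] for k in (1, 2, 3)]
    return np.array([np.var(y)] + ac)

# Training set: many parameter draws and their simulated summary statistics.
rhos = rng.uniform(-0.9, 0.9, size=3000)
S = np.array([summaries(simulate_ar1(r)) for r in rhos])

reg = RidgeCV(alphas=np.logspace(-4, 2, 20)).fit(S, rhos)

y_obs = simulate_ar1(0.6)                        # pretend this is the observed data
print("predicted rho:", round(float(reg.predict(summaries(y_obs)[None, :])[0]), 3))
```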

arXiv link: http://arxiv.org/abs/1807.01579v1

Econometrics arXiv cross-link from physics.soc-ph (physics.soc-ph), submitted: 2018-07-04

Bring a friend! Privately or Publicly?

Authors: Elias Carroni, Paolo Pin, Simone Righi

We study the optimal referral strategy of a seller and its relationship with
the type of communication channels among consumers. The seller faces a
partially uninformed population of consumers, interconnected through a directed
social network. In the network, the seller offers rewards to informed consumers
(influencers) conditional on inducing purchases by uninformed consumers
(influenced). Rewards are needed to bear a communication cost and to induce
word-of-mouth (WOM) either privately (cost-per-contact) or publicly (fixed cost
to inform all friends). From the seller's viewpoint, eliciting Private WOM is
more costly than eliciting Public WOM. We investigate (i) the incentives for
the seller to move to a denser network, inducing either Private or Public WOM
and (ii) the optimal mix between the two types of communication. A denser
network is found to be always better, not only for information diffusion but
also for the seller's profits, as far as Private WOM is concerned. By contrast,
under Public WOM, the seller may prefer an environment with less competition
between informed consumers and the presence of highly connected influencers
(hubs) is the main driver to make network density beneficial to profits. When
the seller is able to discriminate between Private and Public WOM, the optimal
strategy is to cheaply incentivize the more connected people to pass on the
information publicly and then offer a high bonus for Private WOM.

arXiv link: http://arxiv.org/abs/1807.01994v2

Econometrics arXiv updated paper (originally submitted: 2018-07-02)

Stochastic model specification in Markov switching vector error correction models

Authors: Niko Hauzenberger, Florian Huber, Michael Pfarrhofer, Thomas O. Zörner

This paper proposes a hierarchical modeling approach to perform stochastic
model specification in Markov switching vector error correction models. We
assume that a common distribution gives rise to the regime-specific regression
coefficients. The mean as well as the variances of this distribution are
treated as fully stochastic and suitable shrinkage priors are used. These
shrinkage priors enable us to assess which coefficients differ across regimes in a
flexible manner. In the case of similar coefficients, our model pushes the
respective regions of the parameter space towards the common distribution. This
allows for selecting a parsimonious model while still maintaining sufficient
flexibility to control for sudden shifts in the parameters, if necessary. We
apply our modeling approach to real-time Euro area data and assume transition
probabilities between expansionary and recessionary regimes to be driven by the
cointegration errors. The results suggest that the regime allocation is
governed by a subset of short-run adjustment coefficients and regime-specific
variance-covariance matrices. These findings are complemented by an
out-of-sample forecast exercise, illustrating the advantages of the model for
predicting Euro area inflation in real time.

arXiv link: http://arxiv.org/abs/1807.00529v2

Econometrics arXiv paper, submitted: 2018-07-02

Maastricht and Monetary Cooperation

Authors: Chris Kirrane

This paper describes the opportunities and also the difficulties of EMU with
regard to international monetary cooperation. Even though the institutional and
intellectual assistance to the coordination of monetary policy in the EU will
probably be strengthened with the EMU, among the shortcomings of the Maastricht
Treaty concerns the relationship between the founder members and those
countries who wish to remain outside monetary union.

arXiv link: http://arxiv.org/abs/1807.00419v1

Econometrics arXiv paper, submitted: 2018-07-02

The Bretton Woods Experience and ERM

Authors: Chris Kirrane

Historical examination of the Bretton Woods system allows comparisons to be
made with the current evolution of the EMS.

arXiv link: http://arxiv.org/abs/1807.00418v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2018-06-29

Subvector Inference in Partially Identified Models with Many Moment Inequalities

Authors: Alexandre Belloni, Federico Bugni, Victor Chernozhukov

This paper considers inference for a function of a parameter vector in a
partially identified model with many moment inequalities. This framework allows
the number of moment conditions to grow with the sample size, possibly at
exponential rates. Our main motivating application is subvector inference,
i.e., inference on a single component of the partially identified parameter
vector associated with a treatment effect or a policy variable of interest.
Our inference method compares a MinMax test statistic (minimum over
parameters satisfying $H_0$ and maximum over moment inequalities) against
critical values that are based on bootstrap approximations or analytical
bounds. We show that this method controls asymptotic size uniformly over a
large class of data generating processes despite the partially identified many
moment inequality setting. The finite sample analysis allows us to obtain
explicit rates of convergence on the size control. Our results are based on
combining non-asymptotic approximations and new high-dimensional central limit
theorems for the MinMax of the components of random matrices. Unlike the
previous literature on functional inference in partially identified models, our
results do not rely on weak convergence results based on Donsker's class
assumptions and, in fact, our test statistic may not even converge in
distribution. Our bootstrap approximation requires the choice of a tuning
parameter sequence that can avoid the excessive concentration of our test
statistic. To this end, we propose an asymptotically valid data-driven method
to select this tuning parameter sequence. This method generalizes the selection
of tuning parameter sequences to problems outside the Donsker's class
assumptions and may also be of independent interest. Our procedures based on
self-normalized moderate deviation bounds are relatively more conservative but
easier to implement.

arXiv link: http://arxiv.org/abs/1806.11466v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2018-06-28

Quantitative analysis on the disparity of regional economic development in China and its evolution from 1952 to 2000

Authors: Jianhua Xu, Nanshan Ai, Yan Lu, Yong Chen, Yiying Ling, Wenze Yue

Domestic and foreign scholars have already done much research on regional
disparity and its evolution in China, but there is a big difference in
conclusions. What is the reason for this? We think it is mainly due to
different analytic approaches, perspectives, spatial units, statistical
indicators and different periods for studies. On the basis of previous analyses
and findings, we have done some further quantitative computation and empirical
study, and revealed the inter-provincial disparity and regional disparity of
economic development and their evolution trends from 1952 to 2000. The results
show that (a) regional disparity in economic development in China, including
inter-provincial, inter-regional and intra-regional disparity, has existed for
years; (b) the Gini coefficient and the Theil coefficient reveal a similar
dynamic trend in comparative disparity in economic development between
provinces in China: from 1952 to 1978, except for the "Great Leap Forward"
period, comparative disparity basically followed an upward trend, it then
followed a slowly downward trend from 1979 to 1990, and from 1991 to 2000 it
followed a slowly upward trend again; (c) a comparison between Shanghai and
Guizhou shows that absolute inter-provincial disparity has been quite large for
years; and (d) the Hurst exponent (H=0.5) for the period 1966-1978 indicates
that the comparative inter-provincial disparity of economic development behaved
randomly, while the Hurst exponent (H>0.5) for the period 1979-2000 indicates
that over this period its evolution has been persistent.

arXiv link: http://arxiv.org/abs/1806.10794v1

Econometrics arXiv cross-link from stat.CO (stat.CO), submitted: 2018-06-27

Implementing Convex Optimization in R: Two Econometric Examples

Authors: Zhan Gao, Zhentao Shi

Economists specify high-dimensional models to address heterogeneity in
empirical studies with complex big data. Estimation of these models calls for
optimization techniques to handle a large number of parameters. Convex problems
can be effectively executed in modern statistical programming languages. We
complement Koenker and Mizera (2014)'s work on numerical implementation of
convex optimization, with focus on high-dimensional econometric estimators.
Combining R and the convex solver MOSEK achieves faster speed and equivalent
accuracy, demonstrated by examples from Su, Shi, and Phillips (2016) and Shi
(2016). Robust performance of convex optimization is observed across platforms.
The convenience and reliability of convex optimization in R make it easy to
turn new ideas into prototypes.
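
A rough Python analogue of the exercise (the paper itself works in R with the MOSEK solver) is sketched below: a high-dimensional estimator, here a lasso, written as a convex program in CVXPY and handed to a generic convex solver. The estimator, penalty level, and simulated design are illustrative.

```python
# Rough Python analogue of the paper's R/MOSEK exercise: a lasso written as a
# convex program in CVXPY and solved with a generic convex solver.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(11)
n, p = 200, 100
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:5] = [2, -1.5, 1, 0.5, -0.5]            # sparse signal
y = X @ beta_true + rng.normal(size=n)

beta = cp.Variable(p)
lam = 0.1
problem = cp.Problem(cp.Minimize(cp.sum_squares(y - X @ beta) / (2 * n)
                                 + lam * cp.norm(beta, 1)))
problem.solve()                                     # a commercial solver such as MOSEK
                                                    # can be passed via solver=...
print("nonzero coefficients recovered:", int(np.sum(np.abs(beta.value) > 1e-3)))
```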

arXiv link: http://arxiv.org/abs/1806.10423v2

Econometrics arXiv paper, submitted: 2018-06-25

Point-identification in multivariate nonseparable triangular models

Authors: Florian Gunsilius

In this article we introduce a general nonparametric point-identification
result for nonseparable triangular models with a multivariate first- and second
stage. Based on this we prove point-identification of Hedonic models with
multivariate heterogeneity and endogenous observable characteristics, extending
and complementing identification results from the literature which all require
exogeneity. As an additional application of our theoretical result, we show
that the BLP model (Berry et al. 1995) can also be identified without index
restrictions.

arXiv link: http://arxiv.org/abs/1806.09680v1

Econometrics arXiv updated paper (originally submitted: 2018-06-25)

Non-testability of instrument validity under continuous endogenous variables

Authors: Florian Gunsilius

This note presents a proof of the conjecture in \citet*{pearl1995testability}
about testing the validity of an instrumental variable in hidden variable
models. It implies that instrument validity cannot be tested in the case where
the endogenous treatment is continuously distributed. This stands in contrast
to the classical testability results for instrument validity when the treatment
is discrete. However, imposing weak structural assumptions on the model, such
as continuity between the observable variables, can re-establish theoretical
testability in the continuous setting.

arXiv link: http://arxiv.org/abs/1806.09517v3

Econometrics arXiv paper, submitted: 2018-06-25

Semiparametrically Point-Optimal Hybrid Rank Tests for Unit Roots

Authors: Bo Zhou, Ramon van den Akker, Bas J. M. Werker

We propose a new class of unit root tests that exploits invariance properties
in the Locally Asymptotically Brownian Functional limit experiment associated
to the unit root model. The invariance structures naturally suggest tests that
are based on the ranks of the increments of the observations, their average,
and an assumed reference density for the innovations. The tests are
semiparametric in the sense that they are valid, i.e., have the correct
(asymptotic) size, irrespective of the true innovation density. For a correctly
specified reference density, our test is point-optimal and nearly efficient.
For arbitrary reference densities, we establish a Chernoff-Savage type result,
i.e., our test performs as well as commonly used tests under Gaussian
innovations but has improved power under other, e.g., fat-tailed or skewed,
innovation distributions. To avoid nonparametric estimation, we propose a
simplified version of our test that exhibits the same asymptotic properties,
except for the Chernoff-Savage result that we are only able to demonstrate by
means of simulations.

arXiv link: http://arxiv.org/abs/1806.09304v1

Econometrics arXiv paper, submitted: 2018-06-21

The transmission of uncertainty shocks on income inequality: State-level evidence from the United States

Authors: Manfred M. Fischer, Florian Huber, Michael Pfarrhofer

In this paper, we explore the relationship between state-level household
income inequality and macroeconomic uncertainty in the United States. Using a
novel large-scale macroeconometric model, we shed light on regional disparities
of inequality responses to a national uncertainty shock. The results suggest
that income inequality decreases in most states, with a pronounced degree of
heterogeneity in terms of shapes and magnitudes of the dynamic responses. By
contrast, a few states, mostly located in the West and South census regions,
display increasing levels of income inequality over time. We find that this
directional pattern in responses is mainly driven by the income composition and
labor market fundamentals. In addition, forecast error variance decompositions
allow for a quantitative assessment of the importance of uncertainty shocks in
explaining income inequality. The findings highlight that volatility shocks
account for a considerable fraction of forecast error variance for most states
considered. Finally, a regression-based analysis sheds light on the driving
forces behind differences in state-specific inequality responses.

arXiv link: http://arxiv.org/abs/1806.08278v1

Econometrics arXiv updated paper (originally submitted: 2018-06-20)

Shift-Share Designs: Theory and Inference

Authors: Rodrigo Adão, Michal Kolesár, Eduardo Morales

We study inference in shift-share regression designs, such as when a regional
outcome is regressed on a weighted average of sectoral shocks, using regional
sector shares as weights. We conduct a placebo exercise in which we estimate
the effect of a shift-share regressor constructed with randomly generated
sectoral shocks on actual labor market outcomes across U.S. Commuting Zones.
Tests based on commonly used standard errors with 5% nominal significance
level reject the null of no effect in up to 55% of the placebo samples. We use
a stylized economic model to show that this overrejection problem arises
because regression residuals are correlated across regions with similar
sectoral shares, independently of their geographic location. We derive novel
inference methods that are valid under arbitrary cross-regional correlation in
the regression residuals. We show using popular applications of shift-share
designs that our methods may lead to substantially wider confidence intervals
in practice.

arXiv link: http://arxiv.org/abs/1806.07928v5
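
To see the mechanics of the placebo exercise described above, the following minimal
Python sketch (not the authors' code) builds a shift-share regressor from randomly
generated sectoral shocks and tallies how often naive heteroskedasticity-robust
t-tests reject a true null. The data-generating process, sample sizes, and shock
scales are illustrative assumptions only.

    import numpy as np

    rng = np.random.default_rng(0)
    R, S, B = 500, 50, 500                        # regions, sectors, placebo draws
    shares = rng.dirichlet(np.ones(S), size=R)    # hypothetical regional sector shares

    # Outcomes load on sector-level components through the same shares, so residuals
    # are correlated across regions with similar sectoral composition.
    y = 5.0 * (shares @ rng.normal(size=S)) + rng.normal(scale=0.5, size=R)

    rejections = 0
    for _ in range(B):
        g = rng.normal(size=S)                    # randomly generated sectoral shocks
        x = shares @ g                            # placebo shift-share regressor
        X = np.column_stack([np.ones(R), x])
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        u = y - X @ beta
        bread = np.linalg.inv(X.T @ X)            # usual HC0-robust variance
        meat = X.T @ (X * (u ** 2)[:, None])
        se = np.sqrt((bread @ meat @ bread)[1, 1])
        rejections += abs(beta[1] / se) > 1.96

    # in this toy design the rate typically ends up well above the nominal 5%
    print("placebo rejection rate:", rejections / B)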

Econometrics arXiv cross-link from q-fin.GN (q-fin.GN), submitted: 2018-06-20

Is VIX still the investor fear gauge? Evidence for the US and BRIC markets

Authors: Marco Neffelli, Marina Resta

We investigate the relationships of the VIX with US and BRIC markets. In
detail, we pick up the analysis from the point left off by Sarwar (2012), and
we focus on the period: Jan 2007 - Feb 2018, thus capturing the relations
before, during and after the 2008 financial crisis. Results pinpoint frequent
structural breaks in the VIX and suggest that the transmission of fear in
response to negative market moves strengthened around 2008; largely owing to
overlaps in trading hours, this has become even stronger post-crisis for the
US, while for the BRIC countries it has gone back towards pre-crisis levels.

arXiv link: http://arxiv.org/abs/1806.07556v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2018-06-19

Adaptive Bayesian Estimation of Mixed Discrete-Continuous Distributions under Smoothness and Sparsity

Authors: Andriy Norets, Justinas Pelenis

We consider nonparametric estimation of a mixed discrete-continuous
distribution under anisotropic smoothness conditions and possibly increasing
number of support points for the discrete part of the distribution. For these
settings, we derive lower bounds on the estimation rates in the total variation
distance. Next, we consider a nonparametric mixture of normals model that uses
continuous latent variables for the discrete part of the observations. We show
that the posterior in this model contracts at rates that are equal to the
derived lower bounds up to a log factor. Thus, Bayesian mixture of normals
models can be used for optimal adaptive estimation of mixed discrete-continuous
distributions.

arXiv link: http://arxiv.org/abs/1806.07484v1

Econometrics arXiv cross-link from quant-ph (quant-ph), submitted: 2018-06-19

Quantum Nash equilibrium in the thermodynamic limit

Authors: Shubhayan Sarkar, Colin Benjamin

The quantum Nash equilibrium in the thermodynamic limit is studied for games
like quantum Prisoner's dilemma and the quantum game of chicken. A phase
transition is seen in both games as a function of the entanglement in the game.
We observe that for maximal entanglement irrespective of the classical payoffs,
a majority of players choose Quantum strategy over Defect in the thermodynamic
limit.

arXiv link: http://arxiv.org/abs/1806.07343v3

Econometrics arXiv updated paper (originally submitted: 2018-06-19)

Cluster-Robust Standard Errors for Linear Regression Models with Many Controls

Authors: Riccardo D'Adamo

It is common practice in empirical work to employ cluster-robust standard
errors when using the linear regression model to estimate some
structural/causal effect of interest. Researchers also often include a large
set of regressors in their model specification in order to control for observed
and unobserved confounders. In this paper we develop inference methods for
linear regression models with many controls and clustering. We show that
inference based on the usual cluster-robust standard errors by Liang and Zeger
(1986) is invalid in general when the number of controls is a non-vanishing
fraction of the sample size. We then propose a new clustered standard errors
formula that is robust to the inclusion of many controls and allows one to carry
out valid inference in a variety of high-dimensional linear regression models,
including fixed effects panel data models and the semiparametric partially
linear model. Monte Carlo evidence supports our theoretical results and shows
that our proposed variance estimator performs well in finite samples. The
proposed method is also illustrated with an empirical application that
revisits Donohue III and Levitt's (2001) study of the impact of abortion on
crime.

arXiv link: http://arxiv.org/abs/1806.07314v3
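
For reference, the conventional Liang-Zeger (CR0) cluster-robust variance that the
paper shows can break down with many controls looks as follows. The toy design (40
clusters of 25 observations, a single regressor of interest) is an assumption for
illustration, and the paper's corrected formula is not implemented here.

    import numpy as np

    def ols_cluster_robust(X, y, clusters):
        """OLS with the conventional CR0 cluster-robust variance estimator."""
        XtX_inv = np.linalg.inv(X.T @ X)
        beta = XtX_inv @ X.T @ y
        u = y - X @ beta
        meat = np.zeros((X.shape[1], X.shape[1]))
        for g in np.unique(clusters):
            score = X[clusters == g].T @ u[clusters == g]
            meat += np.outer(score, score)
        V = XtX_inv @ meat @ XtX_inv
        return beta, np.sqrt(np.diag(V))

    rng = np.random.default_rng(0)
    clusters = np.repeat(np.arange(40), 25)                  # 40 clusters of 25 observations
    x = rng.standard_normal(1000) + rng.standard_normal(40)[clusters]
    y = 1.0 + rng.standard_normal(40)[clusters] + rng.standard_normal(1000)  # true slope is zero
    beta, se = ols_cluster_robust(np.column_stack([np.ones(1000), x]), y, clusters)
    print("coefficients:", beta.round(3), "cluster-robust SEs:", se.round(3))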

Econometrics arXiv cross-link from Quantitative Finance – Economics (q-fin.EC), submitted: 2018-06-18

The Origin and the Resolution of Nonuniqueness in Linear Rational Expectations

Authors: John G. Thistle

The nonuniqueness of rational expectations is explained: in the stochastic,
discrete-time, linear, constant-coefficients case, the associated free
parameters are coefficients that determine the public's most immediate
reactions to shocks. The requirement of model-consistency may leave these
parameters completely free, yet when their values are appropriately specified,
a unique solution is determined. In a broad class of models, the requirement of
least-square forecast errors determines the parameter values, and therefore
defines a unique solution. This approach is independent of dynamical stability,
and generally does not suppress model dynamics.
Application to a standard New Keynesian example shows that the traditional
solution suppresses precisely those dynamics that arise from rational
expectations. The uncovering of those dynamics reveals their incompatibility
with the new I-S equation and the expectational Phillips curve.

arXiv link: http://arxiv.org/abs/1806.06657v3

Econometrics arXiv updated paper (originally submitted: 2018-06-17)

Effect of Climate and Geography on worldwide fine resolution economic activity

Authors: Alberto Troccoli

Geography, including climatic factors, has long been considered a potentially
important element in shaping socio-economic activities, alongside other
determinants, such as institutions. Here we demonstrate that geography and
climate satisfactorily explain worldwide economic activity as measured by the
per capita Gross Cell Product (GCP-PC) at a fine geographical resolution,
typically much higher than country average. A 1° by 1° GCP-PC dataset
has been key for establishing and testing a direct relationship between 'local'
geography/climate and GCP-PC. Not only have we tested the geography/climate
hypothesis using many possible explanatory variables, importantly we have also
predicted and reconstructed GCP-PC worldwide by retaining the most significant
predictors. While this study confirms that latitude is the most important
predictor for GCP-PC when taken in isolation, the accuracy of the GCP-PC
prediction is greatly improved when other factors mainly related to variations
in climatic variables, such as the variability in air pressure, rather than
average climatic conditions as typically used, are considered. Implications of
these findings include an improved understanding of why economically better-off
societies are geographically placed where they are.

arXiv link: http://arxiv.org/abs/1806.06358v2

Econometrics arXiv paper, submitted: 2018-06-17

On the relation between Sion's minimax theorem and existence of Nash equilibrium in asymmetric multi-players zero-sum game with only one alien

Authors: Atsuhiro Satoh, Yasuhito Tanaka

We consider the relation between Sion's minimax theorem for a continuous
function and a Nash equilibrium in an asymmetric multi-player zero-sum game in
which only one player is different from the other players and the game is
symmetric for the remaining players. Then:
1. The existence of a Nash equilibrium that is symmetric for the players other
than the one distinct player implies Sion's minimax theorem for pairs formed by
this player and one of the other players, with symmetry for the remaining players.
2. Sion's minimax theorem for pairs of this player and one of the other players,
with symmetry for the remaining players, implies the existence of a Nash
equilibrium that is symmetric for the remaining players.
Thus, the two statements are equivalent.

arXiv link: http://arxiv.org/abs/1806.07253v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2018-06-15

Generalized Log-Normal Chain-Ladder

Authors: D. Kuang, B. Nielsen

We propose an asymptotic theory for distribution forecasting from the log
normal chain-ladder model. The theory overcomes the difficulty of convoluting
log normal variables and takes estimation error into account. The results
differ from those of the over-dispersed Poisson model and from the chain-ladder
based bootstrap. We embed the log normal chain-ladder model in a class of
infinitely divisible distributions called the generalized log normal
chain-ladder model. The asymptotic theory uses small $\sigma$ asymptotics where
the dimension of the reserving triangle is kept fixed while the standard
deviation is assumed to decrease. The resulting asymptotic forecast
distributions follow t distributions. The theory is supported by simulations
and an empirical application.

arXiv link: http://arxiv.org/abs/1806.05939v1
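
The regression the log-normal chain-ladder builds on is a two-way model for log
incremental claims with accident-year and development-year effects. The sketch below
simulates a small run-off triangle and fits that regression by OLS; the triangle size
and parameter values are assumptions, and the asymptotic forecast distributions
derived in the paper are not reproduced.

    import numpy as np

    rng = np.random.default_rng(0)
    k = 6                                            # dimension of the run-off triangle
    alpha = rng.normal(8.0, 0.3, k)                  # accident-year effects
    beta = -0.5 * np.arange(k)                       # development-year effects
    cells = [(i, j) for i in range(k) for j in range(k) if i + j < k]   # observed upper triangle

    log_claims = np.array([alpha[i] + beta[j] + rng.normal(0, 0.1) for i, j in cells])

    # design: intercept plus accident-year and development-year dummies (first levels dropped)
    X = np.zeros((len(cells), 1 + 2 * (k - 1)))
    for row, (i, j) in enumerate(cells):
        X[row, 0] = 1.0
        if i > 0:
            X[row, i] = 1.0
        if j > 0:
            X[row, k - 1 + j] = 1.0
    coef, *_ = np.linalg.lstsq(X, log_claims, rcond=None)
    print("development effects relative to the first year:", np.round(coef[k:], 2))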

Econometrics arXiv updated paper (originally submitted: 2018-06-13)

Stratification Trees for Adaptive Randomization in Randomized Controlled Trials

Authors: Max Tabord-Meehan

This paper proposes an adaptive randomization procedure for two-stage
randomized controlled trials. The method uses data from a first-wave experiment
in order to determine how to stratify in a second wave of the experiment, where
the objective is to minimize the variance of an estimator for the average
treatment effect (ATE). We consider selection from a class of stratified
randomization procedures which we call stratification trees: these are
procedures whose strata can be represented as decision trees, with differing
treatment assignment probabilities across strata. By using the first wave to
estimate a stratification tree, we simultaneously select which covariates to
use for stratification, how to stratify over these covariates, as well as the
assignment probabilities within these strata. Our main result shows that using
this randomization procedure with an appropriate estimator results in an
asymptotic variance which is minimal in the class of stratification trees.
Moreover, the results we present are able to accommodate a large class of
assignment mechanisms within strata, including stratified block randomization.
In a simulation study, we find that our method, paired with an appropriate
cross-validation procedure, can improve on ad hoc choices of stratification. We
conclude by applying our method to the study in Karlan and Wood (2017), where
we estimate stratification trees using the first wave of their experiment.

arXiv link: http://arxiv.org/abs/1806.05127v7
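
The two-wave logic can be caricatured in a few lines: fit a shallow tree on
first-wave data, treat its leaves as strata, and block-randomise the second wave
within those leaves. The paper instead selects the tree to minimise the asymptotic
variance of the ATE estimator; the plain regression tree, the covariates, and the
50/50 assignment below are stand-in assumptions, not the authors' procedure.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X1 = rng.standard_normal((500, 3))                         # wave-1 covariates
    y1 = X1[:, 0] + 0.5 * X1[:, 1] + rng.standard_normal(500)  # wave-1 outcomes

    tree = DecisionTreeRegressor(max_depth=2).fit(X1, y1)      # leaves define the strata

    X2 = rng.standard_normal((1000, 3))                        # wave-2 units
    strata = tree.apply(X2)                                    # leaf index for each unit

    assignment = np.zeros(len(X2), dtype=int)
    for s in np.unique(strata):
        idx = rng.permutation(np.flatnonzero(strata == s))
        assignment[idx[: len(idx) // 2]] = 1                   # 50/50 block randomisation
    print({int(s): round(float(assignment[strata == s].mean()), 2) for s in np.unique(strata)})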

Econometrics arXiv updated paper (originally submitted: 2018-06-13)

LASSO-Driven Inference in Time and Space

Authors: Victor Chernozhukov, Wolfgang K. Härdle, Chen Huang, Weining Wang

We consider the estimation and inference in a system of high-dimensional
regression equations allowing for temporal and cross-sectional dependency in
covariates and error processes, covering rather general forms of weak temporal
dependence. A sequence of regressions with many regressors using LASSO (Least
Absolute Shrinkage and Selection Operator) is applied for variable selection
purposes, and an overall penalty level is carefully chosen by a block multiplier
bootstrap procedure to account for multiplicity of the equations and
dependencies in the data. Correspondingly, oracle properties with a jointly
selected tuning parameter are derived. We further provide high-quality
de-biased simultaneous inference on the many target parameters of the system.
We provide bootstrap consistency results of the test procedure, which are based
on a general Bahadur representation for the $Z$-estimators with dependent data.
Simulations demonstrate good performance of the proposed inference procedure.
Finally, we apply the method to quantify spillover effects of textual sentiment
indices in a financial market and to test the connectedness among sectors.

arXiv link: http://arxiv.org/abs/1806.05081v4

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2018-06-13

Regularized Orthogonal Machine Learning for Nonlinear Semiparametric Models

Authors: Denis Nekipelov, Vira Semenova, Vasilis Syrgkanis

This paper proposes a Lasso-type estimator for a high-dimensional sparse
parameter identified by a single index conditional moment restriction (CMR). In
addition to this parameter, the moment function can also depend on a nuisance
function, such as the propensity score or the conditional choice probability,
which we estimate by modern machine learning tools. We first adjust the moment
function so that the gradient of the future loss function is insensitive
(formally, Neyman-orthogonal) with respect to the first-stage regularization
bias, preserving the single index property. We then take the loss function to
be an indefinite integral of the adjusted moment function with respect to the
single index. The proposed Lasso estimator converges at the oracle rate, where
the oracle knows the nuisance function and solves only the parametric problem.
We demonstrate our method by estimating the short-term heterogeneous impact of
Connecticut's Jobs First welfare reform experiment on women's welfare
participation decision.

arXiv link: http://arxiv.org/abs/1806.04823v8

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2018-06-12

Asymmetric response to PMI announcements in China's stock returns

Authors: Yingli Wang, Xiaoguang Yang

As the Purchasing Managers' Index (PMI) on Manufacturing is considered an
important macroeconomic indicator, PMI announcements are generally assumed to
have an impact on stock markets. International experience suggests that
stock markets react to negative PMI news. In this research, we empirically
investigate the stock market reaction towards PMI in China. The asymmetric
effects of PMI announcements on the stock market are observed: no market
reaction is generated towards negative PMI announcements, while a positive
reaction is generally generated for positive PMI news. We further find that the
positive reaction towards the positive PMI news occurs 1 day before the
announcement and lasts for nearly 3 days, and the positive reaction is observed
in the context of expanding economic conditions. By contrast, the negative
reaction towards negative PMI news is prevalent during downward economic
conditions for stocks with low market value, low institutional shareholding
ratios, or high price-earnings ratios. Our study implies that China's stock market
favors risk to a certain extent given the vast number of individual investors
in the country, and there may exist information leakage in the market.

arXiv link: http://arxiv.org/abs/1806.04347v1

Econometrics arXiv paper, submitted: 2018-06-11

Estimating Trade-Related Adjustment Costs in the Agricultural Sector in Iran

Authors: Omid Karami, Mina Mahmoudi

Tariff liberalization and its impact on tax revenue is an important
consideration for developing countries, because they increasingly face the
difficult task of implementing and harmonizing regional and international
trade commitments. Tariff reform and its costs for the Iranian government are
among the issues examined in this study. Another goal of the paper is to
estimate the cost of trade liberalization. To this end, the import value of the
agricultural sector in Iran in 2010 is analyzed under two scenarios. In both
scenarios, a VAT policy is used to reform nuisance tariffs, and the TRIST
method is applied. In the first scenario, import value decreases to a level
equal to that of the second scenario while generating higher tariff revenue.
The results show that reducing the average tariff rate does not always result
in a loss of tariff revenue. The paper shows that different forms of tariff
can generate different amounts of revenue at the same level of liberalization
and with an equal effect on producers. Therefore, a well-designed tariff regime
can help a government generate revenue while increasing social welfare through
liberalization.

arXiv link: http://arxiv.org/abs/1806.04238v1

Econometrics arXiv paper, submitted: 2018-06-11

The Role of Agricultural Sector Productivity in Economic Growth: The Case of Iran's Economic Development Plan

Authors: Morteza Tahamipour, Mina Mahmoudi

This study provides the theoretical framework and empirical model for
productivity growth evaluation in the agricultural sector, one of the most
important sectors in Iran's economic development plan. We use the Solow
residual model to measure the productivity growth share in the value-added
growth of the agricultural sector. Our time series data includes value-added
per worker, employment, and capital in this sector. The results show that the
average total factor productivity growth rate in the agricultural sector is
-0.72% during 1991-2010. Also, during this period, the share of total factor
productivity growth in the value-added growth is -19.6%, while it has been
forecasted to be 33.8% in the fourth development plan. Considering the
important role of capital in the sector's low productivity, we suggest
applying productivity management plans (especially with regard to capital
productivity) to achieve future growth goals.

arXiv link: http://arxiv.org/abs/1806.04235v1
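
The growth-accounting arithmetic behind the Solow residual used in the study can be
summarized in a few lines; the capital share and the growth rates below are made-up
numbers for illustration, not the paper's data.

    def solow_residual(g_value_added, g_capital, g_labor, capital_share=0.4):
        """TFP growth under Cobb-Douglas with constant returns to scale."""
        return g_value_added - capital_share * g_capital - (1 - capital_share) * g_labor

    # e.g. 3% value-added growth, 5% capital growth, 2% employment growth
    print(solow_residual(0.03, 0.05, 0.02))   # 0.03 - 0.4*0.05 - 0.6*0.02 = -0.002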

Econometrics arXiv paper, submitted: 2018-06-11

A Growth Model with Unemployment

Authors: Mina Mahmoudi, Mark Pingle

A standard growth model is modified in a straightforward way to incorporate
what Keynes (1936) suggests in the "essence" of his general theory. The
theoretical essence is the idea that exogenous changes in investment cause
changes in employment and unemployment. We implement this idea by assuming the
path of the capital growth rate is exogenous in the growth model. The result is a
growth model that can explain both long term trends and fluctuations around the
trend. The modified growth model was tested using the U.S. economic data from
1947 to 2014. The hypothesized inverse relationship between the capital growth
and changes in unemployment was confirmed, and the structurally estimated model
fits fluctuations in unemployment reasonably well.

arXiv link: http://arxiv.org/abs/1806.04228v1

Econometrics arXiv updated paper (originally submitted: 2018-06-11)

Inference under Covariate-Adaptive Randomization with Multiple Treatments

Authors: Federico A. Bugni, Ivan A. Canay, Azeem M. Shaikh

This paper studies inference in randomized controlled trials with
covariate-adaptive randomization when there are multiple treatments. More
specifically, we study inference about the average effect of one or more
treatments relative to other treatments or a control. As in Bugni et al.
(2018), covariate-adaptive randomization refers to randomization schemes that
first stratify according to baseline covariates and then assign treatment
status so as to achieve balance within each stratum. In contrast to Bugni et
al. (2018), we not only allow for multiple treatments, but further allow for
the proportion of units being assigned to each of the treatments to vary across
strata. We first study the properties of estimators derived from a fully
saturated linear regression, i.e., a linear regression of the outcome on all
interactions between indicators for each of the treatments and indicators for
each of the strata. We show that tests based on these estimators using the
usual heteroskedasticity-consistent estimator of the asymptotic variance are
invalid; on the other hand, tests based on these estimators and suitable
estimators of the asymptotic variance that we provide are exact. For the
special case in which the target proportion of units being assigned to each of
the treatments does not vary across strata, we additionally consider tests
based on estimators derived from a linear regression with strata fixed effects,
i.e., a linear regression of the outcome on indicators for each of the
treatments and indicators for each of the strata. We show that tests based on
these estimators using the usual heteroskedasticity-consistent estimator of the
asymptotic variance are conservative, but tests based on these estimators and
suitable estimators of the asymptotic variance that we provide are exact. A
simulation study illustrates the practical relevance of our theoretical
results.

arXiv link: http://arxiv.org/abs/1806.04206v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2018-06-10

Determining the dimension of factor structures in non-stationary large datasets

Authors: Matteo Barigozzi, Lorenzo Trapani

We propose a procedure to determine the dimension of the common factor space
in a large, possibly non-stationary, dataset. Our procedure is designed to
determine whether there are (and how many) common factors (i) with linear
trends, (ii) with stochastic trends, (iii) with no trends, i.e. stationary. Our
analysis is based on the fact that the largest eigenvalues of a suitably scaled
covariance matrix of the data (corresponding to the common factor part)
diverge, as the dimension $N$ of the dataset diverges, whilst the others stay
bounded. Therefore, we propose a class of randomised test statistics for the
null that the $p$-th eigenvalue diverges, based directly on the estimated
eigenvalue. The tests require only minimal assumptions on the data, and no
restrictions on the relative rates of divergence of $N$ and $T$ are imposed.
Monte Carlo evidence shows that our procedure has very good finite sample
properties, clearly dominating competing approaches when no common factors are
present. We illustrate our methodology through an application to US bond yields
with different maturities observed over the last 30 years. A common linear
trend and two common stochastic trends are found and identified as the
classical level, slope and curvature factors.

arXiv link: http://arxiv.org/abs/1806.03647v1
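
The eigenvalue-separation idea the procedure builds on can be previewed numerically:
in a simulated factor model, the leading eigenvalues of a scaled covariance matrix
stay bounded away from zero while the remaining ones vanish as $N$ grows. The sketch
below is only that preview; the paper's scaling, randomised test statistics, and
treatment of trends are not reproduced.

    import numpy as np

    rng = np.random.default_rng(0)
    T, r = 300, 2                                    # sample size and true number of factors
    for N in (50, 200, 800):
        F = rng.standard_normal((T, r))              # common factors (stationary case)
        L = rng.standard_normal((N, r))              # factor loadings
        X = F @ L.T + rng.standard_normal((T, N))    # common component + idiosyncratic noise
        eigvals = np.linalg.eigvalsh(X.T @ X / (N * T))[::-1]
        # the first r eigenvalues stay O(1); the others shrink toward zero as N grows
        print(N, np.round(eigvals[:4], 3))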

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2018-06-09

Orthogonal Random Forest for Causal Inference

Authors: Miruna Oprescu, Vasilis Syrgkanis, Zhiwei Steven Wu

We propose the orthogonal random forest, an algorithm that combines
Neyman-orthogonality to reduce sensitivity with respect to estimation error of
nuisance parameters with generalized random forests (Athey et al., 2017)--a
flexible non-parametric method for statistical estimation of conditional moment
models using random forests. We provide a consistency rate and establish
asymptotic normality for our estimator. We show that under mild assumptions on
the consistency rate of the nuisance estimator, we can achieve the same error
rate as an oracle with a priori knowledge of these nuisance parameters. We show
that when the nuisance functions have a locally sparse parametrization, then a
local $\ell_1$-penalized regression achieves the required rate. We apply our
method to estimate heterogeneous treatment effects from observational data with
discrete treatments or continuous treatments, and we show that, unlike prior
work, our method provably allows one to control for a high-dimensional set of
variables under standard sparsity conditions. We also provide a comprehensive
empirical evaluation of our algorithm on both synthetic and real data.

arXiv link: http://arxiv.org/abs/1806.03467v4

Econometrics arXiv updated paper (originally submitted: 2018-06-09)

A hybrid econometric-machine learning approach for relative importance analysis: Prioritizing food policy

Authors: Akash Malhotra

A measure of relative importance of variables is often desired by researchers
when the explanatory aspects of econometric methods are of interest. To this
end, the author briefly reviews the limitations of conventional econometrics in
constructing a reliable measure of variable importance. The author highlights
the relative stature of explanatory and predictive analysis in economics and
the emergence of fruitful collaborations between econometrics and computer
science. Learning lessons from both, the author proposes a hybrid approach
based on conventional econometrics and advanced machine learning (ML)
algorithms, which are otherwise used in predictive analytics. The purpose of
this article is two-fold: to propose a hybrid approach to assess relative
importance and demonstrate its applicability in addressing policy priority
issues with an example of food inflation in India, followed by a broader aim to
introduce the possibility of conflation of ML and conventional econometrics to
an audience of researchers in economics and social sciences, in general.

arXiv link: http://arxiv.org/abs/1806.04517v3

Econometrics arXiv updated paper (originally submitted: 2018-06-08)

Pricing Engine: Estimating Causal Impacts in Real World Business Settings

Authors: Matt Goldman, Brian Quistorff

We introduce the Pricing Engine package to enable the use of Double ML
estimation techniques in general panel data settings. Customization allows the
user to specify first-stage models, first-stage featurization, second stage
treatment selection and second stage causal-modeling. We also introduce a
DynamicDML class that allows the user to generate dynamic treatment-aware
forecasts at a range of leads and to understand how the forecasts will vary as
a function of causally estimated treatment parameters. The Pricing Engine is
built on Python 3.5 and can be run on an Azure ML Workbench environment with
the addition of only a few Python packages. This note provides high-level
discussion of the Double ML method, describes the package's intended use and
includes an example Jupyter notebook demonstrating application to some publicly
available data. Installation of the package and additional technical
documentation are available at https://github.com/bquistorff/pricingengine.

arXiv link: http://arxiv.org/abs/1806.03285v2
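
The note above documents a specific package; as a language-agnostic reference point,
the following sketch implements the generic partially linear Double ML recipe with
scikit-learn (cross-fitted nuisance predictions followed by a residual-on-residual
regression). It is not the Pricing Engine API, and the data-generating process is an
assumption for illustration; see the linked repository for the actual package.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_predict

    rng = np.random.default_rng(0)
    n = 2000
    X = rng.standard_normal((n, 5))                        # controls / features
    d = X[:, 0] + rng.standard_normal(n)                   # treatment depends on controls
    y = 0.5 * d + X[:, 0] ** 2 + rng.standard_normal(n)    # true treatment effect = 0.5

    # first stage: cross-fitted ML predictions of the outcome and the treatment
    y_hat = cross_val_predict(RandomForestRegressor(n_estimators=200), X, y, cv=5)
    d_hat = cross_val_predict(RandomForestRegressor(n_estimators=200), X, d, cv=5)

    # second stage: residual-on-residual regression recovers the causal parameter
    y_res, d_res = y - y_hat, d - d_hat
    theta = (d_res @ y_res) / (d_res @ d_res)
    psi = d_res * (y_res - theta * d_res)                  # orthogonal score
    se = np.sqrt(np.sum(psi ** 2)) / np.sum(d_res ** 2)
    print(f"Double ML estimate: {theta:.3f} (se {se:.3f}); true effect is 0.5")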

Econometrics arXiv paper, submitted: 2018-06-07

A Profit Optimization Approach Based on the Use of Pumped-Hydro Energy Storage Unit and Dynamic Pricing

Authors: Akın Taşcikaraoğlu, Ozan Erdinç

In this study, an optimization problem is proposed in order to obtain the
maximum economic benefit from wind farms with variable and intermittent energy
generation in the day ahead and balancing electricity markets. This method,
which is based on the combined use of a pumped-hydro energy storage unit and a
wind farm, increases the profit from the power plant by taking advantage of the
price changes in the markets and at the same time supports the power system by
supplying a portion of the peak load demand in the system to which the plant is
connected. With the objective of examining the effectiveness of the proposed
method, detailed simulation studies are carried out by making use of actual
wind and price data, and the results are compared to those obtained for the
various cases in which the storage unit is not available and/or the proposed
price-based energy management method is not applied. As a consequence, it is
demonstrated that pumped-hydro energy storage units are storage systems that
can be used effectively at high power levels and that the proposed
optimization problem is quite successful in the cost-effective implementation
of these systems.

arXiv link: http://arxiv.org/abs/1806.05211v1

Econometrics arXiv cross-link from physics.pop-ph (physics.pop-ph), submitted: 2018-06-07

Role of Symmetry in Irrational Choice

Authors: Ivan Kozic

Symmetry is a fundamental concept in modern physics and other related
sciences. Being such a powerful tool, almost all physical theories can be
derived from symmetry, and the effectiveness of such an approach is
astonishing. Since many physicists do not actually believe that symmetry is a
fundamental feature of nature, it seems more likely it is a fundamental feature
of human cognition. According to evolutionary psychologists, humans have a
sensory bias for symmetry. The unconscious quest for symmetrical patterns has
developed as a solution to specific adaptive problems related to survival and
reproduction. Therefore, it comes as no surprise that some fundamental concepts
in psychology and behavioral economics necessarily involve symmetry. The
purpose of this paper is to draw attention to the role of symmetry in
decision-making and to illustrate how it can be algebraically operationalized
through the use of mathematical group theory.

arXiv link: http://arxiv.org/abs/1806.02627v3

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2018-06-05

High-Dimensional Econometrics and Regularized GMM

Authors: Alexandre Belloni, Victor Chernozhukov, Denis Chetverikov, Christian Hansen, Kengo Kato

This chapter presents key concepts and theoretical results for analyzing
estimation and inference in high-dimensional models. High-dimensional models
are characterized by having a number of unknown parameters that is not
vanishingly small relative to the sample size. We first present results in a
framework where estimators of parameters of interest may be represented
directly as approximate means. Within this context, we review fundamental
results including high-dimensional central limit theorems, bootstrap
approximation of high-dimensional limit distributions, and moderate deviation
theory. We also review key concepts underlying inference when many parameters
are of interest such as multiple testing with family-wise error rate or false
discovery rate control. We then turn to a general high-dimensional minimum
distance framework with a special focus on generalized method of moments
problems where we present results for estimation and inference about model
parameters. The presented results cover a wide array of econometric
applications, and we discuss several leading special cases including
high-dimensional linear regression and linear instrumental variables models to
illustrate the general results.

arXiv link: http://arxiv.org/abs/1806.01888v2

Econometrics arXiv paper, submitted: 2018-06-05

A Quantitative Analysis of Possible Futures of Autonomous Transport

Authors: Christopher L. Benson, Pranav D Sumanth, Alina P Colling

Autonomous ships (AS) used for cargo transport have gained a considerable
amount of attention in recent years. They promise benefits such as reduced crew
costs, increased safety and increased flexibility. This paper explores the
effects of a faster increase in technological performance in maritime shipping
achieved by leveraging fast-improving technological domains such as computer
processors, and advanced energy storage. Based on historical improvement rates
of several modes of transport (Cargo Ships, Air, Rail, Trucking) a simplified
Markov-chain Monte-Carlo (MCMC) simulation of an intermodal transport model
(IMTM) is used to explore the effects of differing technological improvement
rates for AS. The results show that traditional modes of
shipping (Ocean Cargo Ships = 2.6%, Air Cargo = 5.5%, Trucking = 0.6%, Rail =
1.9%, Inland Water Transport = 0.4%) improve at lower annual rates than technologies
associated with automation such as Computer Processors (35.6%), Fuel Cells
(14.7%) and Automotive Autonomous Hardware (27.9%). The IMTM simulations up to
the year 2050 show that the introduction of any mode of autonomous transport
will increase competition in lower cost shipping options, but is unlikely to
significantly alter the overall distribution of transport mode costs. Secondly,
if all forms of transport end up converting to autonomous systems, then the
uncertainty surrounding the improvement rates yields a complex intermodal
transport solution involving several options, all at a much lower cost over
time. Ultimately, the research shows a need for more accurate measurement of
current autonomous transport costs and how they are changing over time.

arXiv link: http://arxiv.org/abs/1806.01696v1

Econometrics arXiv updated paper (originally submitted: 2018-06-05)

Leave-out estimation of variance components

Authors: Patrick Kline, Raffaele Saggio, Mikkel Sølvsten

We propose leave-out estimators of quadratic forms designed for the study of
linear models with unrestricted heteroscedasticity. Applications include
analysis of variance and tests of linear restrictions in models with many
regressors. An approximation algorithm is provided that enables accurate
computation of the estimator in very large datasets. We study the large sample
properties of our estimator allowing the number of regressors to grow in
proportion to the number of observations. Consistency is established in a
variety of settings where plug-in methods and estimators predicated on
homoscedasticity exhibit first-order biases. For quadratic forms of increasing
rank, the limiting distribution can be represented by a linear combination of
normal and non-central $\chi^2$ random variables, with normality ensuing under
strong identification. Standard error estimators are proposed that enable tests
of linear restrictions and the construction of uniformly valid confidence
intervals for quadratic forms of interest. We find in Italian social security
records that leave-out estimates of a variance decomposition in a two-way fixed
effects model of wage determination yield substantially different conclusions
regarding the relative contribution of workers, firms, and worker-firm sorting
to wage inequality than conventional methods. Monte Carlo exercises corroborate
the accuracy of our asymptotic approximations, with clear evidence of
non-normality emerging when worker mobility between blocks of firms is limited.

arXiv link: http://arxiv.org/abs/1806.01494v2

Econometrics arXiv paper, submitted: 2018-06-05

A Consistent Variance Estimator for 2SLS When Instruments Identify Different LATEs

Authors: Seojeong Lee

Under treatment effect heterogeneity, an instrument identifies the
instrument-specific local average treatment effect (LATE). With multiple
instruments, the two-stage least squares (2SLS) estimand is a weighted average of
different LATEs. What is often overlooked in the literature is that the
postulated moment condition evaluated at the 2SLS estimand does not hold unless
those LATEs are the same. If so, the conventional heteroskedasticity-robust
variance estimator would be inconsistent, and 2SLS standard errors based on
such estimators would be incorrect. I derive the correct asymptotic
distribution, and propose a consistent asymptotic variance estimator by using
the result of Hall and Inoue (2003, Journal of Econometrics) on misspecified
moment condition models. This can be used to correctly calculate the standard
errors regardless of whether there is more than one LATE or not.

arXiv link: http://arxiv.org/abs/1806.01457v1
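
A small numerical sketch of the setting discussed above: two binary instruments move
the treatment for different complier groups with different LATEs, so the 2SLS
coefficient lands between the two. The design and effect sizes are illustrative
assumptions, and the corrected variance estimator proposed in the paper is not
implemented here.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000
    z1, z2 = rng.integers(0, 2, n), rng.integers(0, 2, n)   # two binary instruments
    group = rng.integers(0, 2, n)                           # complier type: reacts to z1 or z2
    d = np.where(group == 0, z1, z2)                        # treatment take-up
    effect = np.where(group == 0, 1.0, 3.0)                 # LATE_1 = 1, LATE_2 = 3
    y = effect * d + rng.standard_normal(n)

    Z = np.column_stack([np.ones(n), z1, z2])
    X = np.column_stack([np.ones(n), d])
    Xhat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)            # first-stage projection
    beta = np.linalg.solve(Xhat.T @ X, Xhat.T @ y)          # 2SLS coefficients
    print("2SLS slope (a weighted average of the two LATEs):", round(beta[1], 2))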

Econometrics arXiv paper, submitted: 2018-06-05

Asymptotic Refinements of a Misspecification-Robust Bootstrap for Generalized Method of Moments Estimators

Authors: Seojeong Lee

I propose a nonparametric iid bootstrap that achieves asymptotic refinements
for t tests and confidence intervals based on GMM estimators even when the
model is misspecified. In addition, my bootstrap does not require recentering
the moment function, which has been considered critical for GMM. Regardless
of model misspecification, the proposed bootstrap achieves the same sharp
magnitude of refinements as the conventional bootstrap methods which establish
asymptotic refinements by recentering in the absence of misspecification. The
key idea is to link the misspecified bootstrap moment condition to the large
sample theory of GMM under misspecification of Hall and Inoue (2003). Two
examples are provided: Combining data sets and invalid instrumental variables.

arXiv link: http://arxiv.org/abs/1806.01450v1

Econometrics arXiv cross-link from cs.CY (cs.CY), submitted: 2018-06-04

Driving by the Elderly and their Awareness of their Driving Difficulties (Hebrew)

Authors: Idit Sohlberg

In the past twenty years the number of elderly drivers has increased for two
reasons: one is the higher proportion of elderly people in the population, and the
other is the rise in the share of the elderly who drive. This paper examines
the features of their driving and the level of their awareness of problems
relating to it, by analyzing a preference survey that included interviews with 205
drivers aged between 70 and 80. The interviewees exhibited a level of optimism
and self-confidence in their driving that is out of line with the real
situation. There is also a discrepancy between how their driving is viewed by
others and their own assessment, and between their self assessment and their
assessment of the driving of other elderly drivers, which they rate lower than
their own. They attributed great importance to safety features in cars, although
they did not think that they themselves needed them, and most elderly drivers
did not think there was any reason that they should stop driving, despite
suggestions from family members and others that they should do so. A declared
preference survey was undertaken to assess the degree of difficulty elderly
drivers attribute to driving conditions. It was found that they are concerned
mainly about weather conditions, driving at night, and long journeys. Worry
about night driving was most marked among women, the oldest drivers, and those
who drove less frequently. In light of the findings, imposing greater
responsibility on the health system should be considered. Consideration should
also be given to issuing partial licenses to the elderly for daytime driving
only, or restricted to certain weather conditions, dependent on their medical
condition. Such flexibility will enable the elderly to maintain their life
style and independence for a longer period on the one hand, and on the other,
will minimize the risks to themselves and to others.

arXiv link: http://arxiv.org/abs/1806.03254v1

Econometrics arXiv paper, submitted: 2018-06-04

The Impact of Supervision and Incentive Process in Explaining Wage Profile and Variance

Authors: Nitsa Kasir, Idit Sohlberg

The implementation of a supervision and incentive process for identical
workers may lead to wage variance that stems from employer and employee
optimization. The harder it is to assess the nature of the labor output, the
more important such a process becomes, and the greater its influence on wage
growth. The dynamic model presented in this paper shows that
an employer will choose to pay a worker a starting wage that is less than what
he deserves, resulting in a wage profile that fits the classic profile in the
human-capital literature. The wage profile and wage variance rise at times of
technological advancements, which leads to increased turnover as older workers
are replaced by younger workers due to a rise in the relative marginal cost of
the former.

arXiv link: http://arxiv.org/abs/1806.01332v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2018-06-04

Limit Theory for Moderate Deviation from Integrated GARCH Processes

Authors: Yubo Tao

This paper develops the limit theory of the GARCH(1,1) process that
moderately deviates from IGARCH process towards both stationary and explosive
regimes. The GARCH(1,1) process is defined by equations $u_t = \sigma_t
\varepsilon_t$, $\sigma_t^2 = \omega + \alpha_n u_{t-1}^2 +
\beta_n\sigma_{t-1}^2$, where $\alpha_n + \beta_n$ approaches unity as the sample
size goes to infinity. The asymptotic theory developed in this paper extends
Berkes et al. (2005) by allowing the parameters to have a slower convergence
rate. The results can be applied to unit root testing for processes with
mildly-integrated GARCH innovations (e.g. Boswijk (2001), Cavaliere and Taylor
(2007, 2009)) and deriving limit theory of estimators for models involving
mildly-integrated GARCH processes (e.g. Jensen and Rahbek (2004), Francq and
Zako\"ian (2012, 2013)).

arXiv link: http://arxiv.org/abs/1806.01229v3
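
As a quick illustration of the process in the abstract, the sketch below simulates
the GARCH(1,1) recursion with $\alpha_n + \beta_n$ drifting toward one as the sample
size grows; the particular drift rate and parameter values are assumptions made for
illustration only, not the paper's specification.

    import numpy as np

    def simulate_mildly_integrated_garch(n, omega=0.1, alpha=0.3, seed=0):
        """GARCH(1,1) with alpha_n + beta_n = 1 - 1/sqrt(n), approaching unity in n."""
        rng = np.random.default_rng(seed)
        beta = 1.0 - 1.0 / np.sqrt(n) - alpha
        u = np.zeros(n)
        sigma2 = np.full(n, omega * np.sqrt(n))   # start at the stationary variance
        eps = rng.standard_normal(n)
        for t in range(1, n):
            sigma2[t] = omega + alpha * u[t - 1] ** 2 + beta * sigma2[t - 1]
            u[t] = np.sqrt(sigma2[t]) * eps[t]
        return u

    for n in (200, 2000, 20000):
        u = simulate_mildly_integrated_garch(n)
        print(n, "sample variance of u_t:", round(float(u.var()), 2))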

Econometrics arXiv updated paper (originally submitted: 2018-06-04)

Quasi-Experimental Shift-Share Research Designs

Authors: Kirill Borusyak, Peter Hull, Xavier Jaravel

Many studies use shift-share (or “Bartik”) instruments, which average a set
of shocks with exposure share weights. We provide a new econometric framework
for shift-share instrumental variable (SSIV) regressions in which
identification follows from the quasi-random assignment of shocks, while
exposure shares are allowed to be endogenous. The framework is motivated by an
equivalence result: the orthogonality between a shift-share instrument and an
unobserved residual can be represented as the orthogonality between the
underlying shocks and a shock-level unobservable. SSIV regression coefficients
can similarly be obtained from an equivalent shock-level regression, motivating
shock-level conditions for their consistency. We discuss and illustrate several
practical insights of this framework in the setting of Autor et al. (2013),
estimating the effect of Chinese import competition on manufacturing employment
across U.S. commuting zones.

arXiv link: http://arxiv.org/abs/1806.01221v9

Econometrics arXiv updated paper (originally submitted: 2018-06-04)

Asymptotic Refinements of a Misspecification-Robust Bootstrap for Generalized Empirical Likelihood Estimators

Authors: Seojeong Lee

I propose a nonparametric iid bootstrap procedure for the empirical
likelihood, the exponential tilting, and the exponentially tilted empirical
likelihood estimators that achieves asymptotic refinements for t tests and
confidence intervals, and Wald tests and confidence regions based on such
estimators. Furthermore, the proposed bootstrap is robust to model
misspecification, i.e., it achieves asymptotic refinements regardless of
whether the assumed moment condition model is correctly specified or not. This
result is new, because asymptotic refinements of the bootstrap based on these
estimators have not been established in the literature even under correct model
specification. Monte Carlo experiments are conducted in a dynamic panel data
setting to support the theoretical finding. As an application, bootstrap
confidence intervals for the returns to schooling of Hellerstein and Imbens
(1999) are calculated. The result suggests that the returns to schooling may be
higher.

arXiv link: http://arxiv.org/abs/1806.00953v2

Econometrics arXiv paper, submitted: 2018-06-03

Identification of Conduit Countries and Community Structures in the Withholding Tax Networks

Authors: Tembo Nakamoto, Yuichi Ikeda

Due to economic globalization, each country's economic law, including tax
laws and tax treaties, has been forced to work as a single network. However,
each jurisdiction (country or region) has not made its economic law under the
assumption that its law functions as an element of one network, which has
brought unexpected results. We consider that these unexpected results amount
precisely to international tax avoidance. To contribute to the solution of international tax
avoidance, we tried to investigate which part of the network is vulnerable.
Specifically, focusing on treaty shopping, one of the methods of international
tax avoidance, we attempt to identify, from tax liabilities, which jurisdictions
are likely to be used for treaty shopping, and the relationship between
jurisdictions likely to be used for treaty shopping and the others. For
that purpose, based on withholding tax rates imposed on dividends, interest,
and royalties by jurisdictions, we produced weighted multiple directed graphs,
computed the centralities and detected the communities. As a result, we
clarified the jurisdictions that are likely to be used for treaty shopping and
pointed out that there are community structures. The results of this study
suggested that fewer jurisdictions need to introduce more regulations for
prevention of treaty abuse worldwide.

arXiv link: http://arxiv.org/abs/1806.00799v1
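
The network construction sketched in the abstract can be mimicked on a toy example:
jurisdictions as nodes and withholding-tax rates on cross-border payments as weighted
directed edges, followed by a centrality measure and community detection. The four
jurisdictions, the rates, and the choice of PageRank below are purely illustrative
assumptions, not the paper's data or exact methodology.

    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    # hypothetical withholding rates on dividends paid from source to recipient
    withholding = {
        ("A", "B"): 0.05, ("B", "A"): 0.05,
        ("A", "C"): 0.15, ("C", "A"): 0.15,
        ("B", "C"): 0.00, ("C", "B"): 0.00,
        ("C", "D"): 0.05, ("D", "C"): 0.30,
    }

    G = nx.DiGraph()
    for (src, dst), rate in withholding.items():
        # a lower tax rate makes a route more attractive for treaty shopping
        G.add_edge(src, dst, weight=1.0 - rate)

    centrality = nx.pagerank(G, weight="weight")
    communities = greedy_modularity_communities(G.to_undirected(), weight="weight")

    print("centrality:", {k: round(v, 3) for k, v in centrality.items()})
    print("communities:", [sorted(c) for c in communities])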

Econometrics arXiv updated paper (originally submitted: 2018-06-02)

Ill-posed Estimation in High-Dimensional Models with Instrumental Variables

Authors: Christoph Breunig, Enno Mammen, Anna Simoni

This paper is concerned with inference about low-dimensional components of a
high-dimensional parameter vector $\beta^0$ which is identified through
instrumental variables. We allow for eigenvalues of the expected outer product
of included and excluded covariates, denoted by $M$, to shrink to zero as the
sample size increases. We propose a novel estimator based on desparsification
of an instrumental variable Lasso estimator, which is a regularized version of
2SLS with an additional correction term. This estimator converges to $\beta^0$
at a rate depending on the mapping properties of $M$ captured by a sparse link
condition. Linear combinations of our estimator of $\beta^0$ are shown to be
asymptotically normally distributed. Based on consistent covariance estimation,
our method allows for constructing confidence intervals and statistical tests
for single or low-dimensional components of $\beta^0$. In Monte-Carlo
simulations we analyze the finite sample behavior of our estimator.

arXiv link: http://arxiv.org/abs/1806.00666v2

Econometrics arXiv updated paper (originally submitted: 2018-05-30)

Introducing shrinkage in heavy-tailed state space models to predict equity excess returns

Authors: Florian Huber, Gregor Kastner, Michael Pfarrhofer

We forecast S&P 500 excess returns using a flexible Bayesian econometric
state space model with non-Gaussian features at several levels. More precisely,
we control for overparameterization via novel global-local shrinkage priors on
the state innovation variances as well as the time-invariant part of the state
space model. The shrinkage priors are complemented by heavy tailed state
innovations that cater for potentially large breaks in the latent states.
Moreover, we allow for leptokurtic stochastic volatility in the observation
equation. The empirical findings indicate that several variants of the proposed
approach outperform typical competitors frequently used in the literature, both
in terms of point and density forecasts.

arXiv link: http://arxiv.org/abs/1805.12217v2

Econometrics arXiv updated paper (originally submitted: 2018-05-29)

Estimation and Inference for Policy Relevant Treatment Effects

Authors: Yuya Sasaki, Takuya Ura

The policy relevant treatment effect (PRTE) measures the average effect of
switching from a status-quo policy to a counterfactual policy. Estimation of
the PRTE involves estimation of multiple preliminary parameters, including
propensity scores, conditional expectation functions of the outcome and
covariates given the propensity score, and marginal treatment effects. These
preliminary estimators can affect the asymptotic distribution of the PRTE
estimator in complicated and intractable manners. In this light, we propose an
orthogonal score for double debiased estimation of the PRTE, whereby the
asymptotic distribution of the PRTE estimator is obtained without any influence
of preliminary parameter estimators as far as they satisfy mild requirements of
convergence rates. To our knowledge, this paper is the first to develop limit
distribution theories for inference about the PRTE.

arXiv link: http://arxiv.org/abs/1805.11503v4

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2018-05-29

Stationarity and ergodicity of vector STAR models

Authors: Igor L. Kheifets, Pentti J. Saikkonen

Smooth transition autoregressive models are widely used to capture
nonlinearities in univariate and multivariate time series. Existence of
a stationary solution is typically assumed, implicitly or explicitly. In this
paper we describe conditions for stationarity and ergodicity of vector STAR
models. The key condition is that the joint spectral radius of certain matrices
is below 1, which is not guaranteed if only separate spectral radii are below
1. Our result allows the use of recently introduced toolboxes from computational
mathematics to verify the stationarity and ergodicity of vector STAR models.

arXiv link: http://arxiv.org/abs/1805.11311v3
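
The key stationarity condition involves the joint spectral radius of a finite set of
matrices. The sketch below computes a standard crude upper bound (the largest k-th
root of the norms of all length-k products) for two illustrative regime matrices; it
is not one of the computational toolboxes the paper refers to.

    import itertools
    import numpy as np

    A1 = np.array([[0.5, 0.2], [0.0, 0.4]])    # illustrative regime matrices
    A2 = np.array([[0.3, -0.1], [0.2, 0.6]])

    def jsr_upper_bound(mats, k=8):
        """Upper bound on the joint spectral radius via length-k products."""
        best = 0.0
        for combo in itertools.product(mats, repeat=k):
            prod = np.linalg.multi_dot(combo)
            best = max(best, np.linalg.norm(prod, 2) ** (1.0 / k))
        return best

    print("separate spectral radii:",
          [round(max(abs(np.linalg.eigvals(A))), 3) for A in (A1, A2)])
    print("joint spectral radius, upper bound (k=8):", round(jsr_upper_bound([A1, A2]), 3))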

Econometrics arXiv updated paper (originally submitted: 2018-05-28)

Modeling the residential electricity consumption within a restructured power market

Authors: Chelsea Sun

The United States' power market is characterized by the lack of judicial power at
the federal level. The market thus provides a unique testing environment for
the market organization structure. At the same time, the econometric modeling
and forecasting of electricity market consumption become more challenging.
Import and export, which generally follow simple rules in European countries,
can be a result of direct market behaviors. This paper seeks to build a general
model for power consumption and to use the model to test several hypotheses.

arXiv link: http://arxiv.org/abs/1805.11138v2

Econometrics arXiv updated paper (originally submitted: 2018-05-28)

Tilting Approximate Models

Authors: Andreas Tryphonides

Model approximations are common practice when estimating structural or
quasi-structural models. The paper considers the econometric properties of
estimators that utilize projections to reimpose information about the exact
model in the form of conditional moments. The resulting estimator efficiently
combines the information provided by the approximate law of motion and the
moment conditions. The paper develops the corresponding asymptotic theory and
provides simulation evidence that tilting substantially reduces the mean
squared error for parameter estimates. It applies the methodology to pricing
long-run risks in aggregate consumption in the US, where the model is solved
using the Campbell and Shiller (1988) approximation. Tilting improves empirical
fit and results suggest that approximation error is a source of upward bias in
estimates of risk aversion and downward bias in the elasticity of intertemporal
substitution.

arXiv link: http://arxiv.org/abs/1805.10869v5

Econometrics arXiv paper, submitted: 2018-05-28

Flexible shrinkage in high-dimensional Bayesian spatial autoregressive models

Authors: Michael Pfarrhofer, Philipp Piribauer

This article introduces two absolutely continuous global-local shrinkage
priors to enable stochastic variable selection in the context of
high-dimensional matrix exponential spatial specifications. Existing approaches
to dealing with overparameterization problems in spatial
autoregressive specifications typically rely on computationally demanding
Bayesian model-averaging techniques. The proposed shrinkage priors can be
implemented using Markov chain Monte Carlo methods in a flexible and efficient
way. A simulation study is conducted to evaluate the performance of each of the
shrinkage priors. Results suggest that they perform particularly well in
high-dimensional environments, especially when the number of parameters to
estimate exceeds the number of observations. For an empirical illustration we
use pan-European regional economic growth data.

arXiv link: http://arxiv.org/abs/1805.10822v1

Econometrics arXiv paper, submitted: 2018-05-25

Inference Related to Common Breaks in a Multivariate System with Joined Segmented Trends with Applications to Global and Hemispheric Temperatures

Authors: Dukpa Kim, Tatsushi Oka, Francisco Estrada, Pierre Perron

What transpires from recent research is that temperatures and radiative
forcing seem to be characterized by a linear trend with two changes in the rate
of growth. The first occurs in the early 60s and indicates a very large
increase in the rate of growth of both temperature and radiative forcing
series. This was termed the "onset of sustained global warming". The second
is related to the more recent so-called hiatus period, which suggests that
temperatures and total radiative forcing have increased less rapidly since the
mid-90s compared to the larger rate of increase from 1960 to 1990. There are
two issues that remain unresolved. The first is whether the breaks in the slope
of the trend functions of temperatures and radiative forcing are common. This
is important because common breaks coupled with the basic science of climate
change would strongly suggest a causal effect from anthropogenic factors to
temperatures. The second issue relates to establishing formally via a proper
testing procedure that takes into account the noise in the series, whether
there was indeed a `hiatus period' for temperatures since the mid 90s. This is
important because such a test would counter the widely held view that the
hiatus is the product of natural internal variability. Our paper provides tests
related to both issues. The results show that the breaks in temperatures and
radiative forcing are common and that the hiatus is characterized by a
significant decrease in their rate of growth. The statistical results are of
independent interest and applicable more generally.

arXiv link: http://arxiv.org/abs/1805.09937v1

Econometrics arXiv updated paper (originally submitted: 2018-05-23)

Identification in Nonparametric Models for Dynamic Treatment Effects

Authors: Sukjin Han

This paper develops a nonparametric model that represents how sequences of
outcomes and treatment choices influence one another in a dynamic manner. In
this setting, we are interested in identifying the average outcome for
individuals in each period, had a particular treatment sequence been assigned.
The identification of this quantity allows us to identify the average treatment
effects (ATE's) and the ATE's on transitions, as well as the optimal treatment
regimes, namely, the regimes that maximize the (weighted) sum of the average
potential outcomes, possibly less the cost of the treatments. The main
contribution of this paper is to relax the sequential randomization assumption
widely used in the biostatistics literature by introducing a flexible
choice-theoretic framework for a sequence of endogenous treatments. We show
that the parameters of interest are identified under each period's two-way
exclusion restriction, i.e., with instruments excluded from the
outcome-determining process and other exogenous variables excluded from the
treatment-selection process. We also consider partial identification in the
case where the latter variables are not available. Lastly, we extend our
results to a setting where treatments do not appear in every period.

arXiv link: http://arxiv.org/abs/1805.09397v3

Econometrics arXiv updated paper (originally submitted: 2018-05-23)

A Double Machine Learning Approach to Estimate the Effects of Musical Practice on Student's Skills

Authors: Michael C. Knaus

This study investigates the dose-response effects of making music on youth
development. Identification is based on the conditional independence assumption
and estimation is implemented using a recent double machine learning estimator.
The study proposes solutions to two highly practically relevant questions that
arise for these new methods: (i) How to investigate sensitivity of estimates to
tuning parameter choices in the machine learning part? (ii) How to assess
covariate balancing in high-dimensional settings? The results show that
improvements in objectively measured cognitive skills require at least medium
intensity, while improvements in school grades are already observed for low
intensity of practice.

arXiv link: http://arxiv.org/abs/1805.10300v2

Econometrics arXiv paper, submitted: 2018-05-23

Model Selection in Time Series Analysis: Using Information Criteria as an Alternative to Hypothesis Testing

Authors: R. Scott Hacker, Abdulnasser Hatemi-J

The issue of model selection in applied research is of vital importance.
Since the true model in such research is not known, which model should be used
from among various potential ones is an empirical question. There might exist
several competitive models. A typical approach to dealing with this is classic
hypothesis testing using an arbitrarily chosen significance level based on the
underlying assumption that a true null hypothesis exists. In this paper we
investigate how successful this approach is in determining the correct model
for different data generating processes using time series data. An alternative
approach based on more formal model selection techniques using an information
criterion or cross-validation is suggested and evaluated in the time series
environment via Monte Carlo experiments. This paper also explores the
effectiveness of deciding what type of general relation exists between two
variables (e.g. relation in levels or relation in first differences) using
various strategies based on hypothesis testing and on information criteria with
the presence or absence of unit roots.

arXiv link: http://arxiv.org/abs/1805.08991v1
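
For readers unfamiliar with the information-criterion route the paper evaluates, the
sketch below fits competing AR(p) specifications to a simulated AR(1) series and lets
AIC and BIC pick the lag order; the simulation design is an illustrative assumption,
not the paper's Monte Carlo setup.

    import numpy as np
    from statsmodels.tsa.ar_model import AutoReg

    rng = np.random.default_rng(0)
    n = 400
    y = np.zeros(n)
    for t in range(1, n):                     # true data generating process: AR(1) with coefficient 0.6
        y[t] = 0.6 * y[t - 1] + rng.standard_normal()

    fits = {p: AutoReg(y, lags=p).fit() for p in range(1, 5)}
    for p, res in fits.items():
        print(f"AR({p}): AIC={res.aic:.1f}  BIC={res.bic:.1f}")
    print("AIC choice:", min(fits, key=lambda p: fits[p].aic),
          "| BIC choice:", min(fits, key=lambda p: fits[p].bic))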

Econometrics arXiv paper, submitted: 2018-05-22

Sensitivity of Regular Estimators

Authors: Yaroslav Mukhin

This paper studies the local asymptotic relationship between two scalar
estimates. We define the sensitivity of a target estimate to a control estimate to
be the directional derivative of the target functional with respect to the
gradient direction of the control functional. Sensitivity according to the
information metric on the model manifold is the asymptotic covariance of
regular efficient estimators. Sensitivity according to a general policy metric
on the model manifold can be obtained from influence functions of regular
efficient estimators. Policy sensitivity has a local counterfactual
interpretation, where the ceteris paribus change to a counterfactual
distribution is specified by the combination of a control parameter and a
Riemannian metric on the model manifold.

arXiv link: http://arxiv.org/abs/1805.08883v1

Econometrics arXiv updated paper (originally submitted: 2018-05-21)

Multiple Treatments with Strategic Interaction

Authors: Jorge Balat, Sukjin Han

We develop an empirical framework to identify and estimate the effects of
treatments on outcomes of interest when the treatments are the result of
strategic interaction (e.g., bargaining, oligopolistic entry, peer effects). We
consider a model where agents play a discrete game with complete information
whose equilibrium actions (i.e., binary treatments) determine a post-game
outcome in a nonseparable model with endogeneity. Due to the simultaneity in
the first stage, the model as a whole is incomplete and the selection process
fails to exhibit the conventional monotonicity. Without imposing parametric
restrictions or large support assumptions, this poses challenges in recovering
treatment parameters. To address these challenges, we first establish a
monotonic pattern of the equilibria in the first-stage game in terms of the
number of treatments selected. Based on this finding, we derive bounds on the
average treatment effects (ATEs) under nonparametric shape restrictions and the
existence of excluded exogenous variables. We show that instrument variation
that compensates for strategic substitution helps solve the multiple equilibria
problem. We apply our method to data on airlines and air pollution in cities in
the U.S. We find that (i) the causal effect of each airline on pollution is
positive, and (ii) the effect is increasing in the number of firms but at a
decreasing rate.

arXiv link: http://arxiv.org/abs/1805.08275v2

Econometrics arXiv cross-link from cs.DS (cs.DS), submitted: 2018-05-19

On testing substitutability

Authors: Cosmina Croitoru, Kurt Mehlhorn

The papers [hatfimmokomi11] and [azizbrilharr13] propose algorithms
for testing whether the choice function induced by a (strict) preference list
of length $N$ over a universe $U$ is substitutable. The running time of these
algorithms is $O(|U|^3\cdot N^3)$ and $O(|U|^2\cdot N^3)$, respectively. In this
note we present an algorithm with running time $O(|U|^2\cdot N^2)$. Note that
$N$ may be exponential in the size $|U|$ of the universe.

arXiv link: http://arxiv.org/abs/1805.07642v1

Econometrics arXiv paper, submitted: 2018-05-19

Bitcoin price and its marginal cost of production: support for a fundamental value

Authors: Adam Hayes

This study back-tests a marginal cost of production model proposed to value
the digital currency bitcoin. Results from both conventional regression and
vector autoregression (VAR) models show that the marginal cost of production
plays an important role in explaining bitcoin prices, challenging recent
allegations that bitcoins are essentially worthless. Even with markets pricing
bitcoin in the thousands of dollars each, the valuation model seems robust. The
data show that a price bubble that began in the Fall of 2017 resolved itself in
early 2018, converging with the marginal cost model. This suggests that while
bubbles may appear in the bitcoin market, prices will tend to this bound and
not collapse to zero.

arXiv link: http://arxiv.org/abs/1805.07610v1

Econometrics arXiv updated paper (originally submitted: 2018-05-17)

Learning non-smooth models: instrumental variable quantile regressions and related problems

Authors: Yinchu Zhu

This paper proposes computationally efficient methods that can be used for
instrumental variable quantile regressions (IVQR) and related methods with
statistical guarantees. This is much needed when we investigate heterogeneous
treatment effects since interactions between the endogenous treatment and
control variables lead to an increased number of endogenous covariates. We
prove that the GMM formulation of IVQR is NP-hard and finding an approximate
solution is also NP-hard. Hence, tackling the problem from a purely
computational perspective is unlikely to succeed. Instead, we aim to obtain an estimate
that has good statistical properties and is not necessarily the global solution
of any optimization problem.
The proposal consists of employing $k$-step correction on an initial
estimate. The initial estimate exploits the latest advances in mixed integer
linear programming and can be computed within seconds. One theoretical
contribution is that such initial estimators and the Jacobian of the moment
condition used in the k-step correction need not even be consistent, and merely
$k=4\log n$ fast iterations are needed to obtain an efficient estimator. The
overall proposal scales well to handle extremely large sample sizes because
lack of consistency requirement allows one to use a very small subsample to
obtain the initial estimate and the k-step iterations on the full sample can be
implemented efficiently. Another contribution that is of independent interest
is to propose a tuning-free estimator for the Jacobian matrix, whose
definition involves conditional densities. This Jacobian estimator generalizes
bootstrap quantile standard errors and can be efficiently computed via
closed-form solutions. We evaluate the performance of the proposal in
simulations and in an empirical example on the heterogeneous treatment effects
of the Job Training Partnership Act.

arXiv link: http://arxiv.org/abs/1805.06855v4

Econometrics arXiv paper, submitted: 2018-05-17

Happy family of stable marriages

Authors: Gershon Wolansky

Some aspects of the problem of stable marriage are discussed. There are two
distinguished marriage plans: the fully transferable case, where money can be
transferred between the participants, and the fully non-transferable case, where
each participant has a rigid preference list regarding the other gender.
We then discuss intermediate, partially transferable cases. Partially
transferable plans can be approached either as special cases of cooperative
games using the notion of a core, or as a generalization of the cyclical
monotonicity property of the fully transferable case (fake promises). We
introduce these two approaches and prove the existence of stable marriages for
the fully transferable and non-transferable plans.

arXiv link: http://arxiv.org/abs/1805.06687v1

Econometrics arXiv paper, submitted: 2018-05-16

Data-Driven Investment Decision-Making: Applying Moore's Law and S-Curves to Business Strategies

Authors: Christopher L. Benson, Christopher L. Magee

This paper introduces a method for linking technological improvement rates
(i.e. Moore's Law) and technology adoption curves (i.e. S-Curves). There has
been considerable research surrounding Moore's Law and the generalized versions
applied to the time dependence of performance for other technologies. The prior
work has culminated with methodology for quantitative estimation of
technological improvement rates for nearly any technology. This paper examines
the implications of such regular time dependence for performance upon the
timing of key events in the technological adoption process. We propose a simple
crossover point in performance which is based upon the technological
improvement rates and current level differences for target and replacement
technologies. The timing of the cross-over is hypothesized to correspond to
the first 'knee' in the technology adoption "S-curve" and signals when the
market for a given technology will start to be rewarding for innovators. This
is also when potential entrants are likely to intensely experiment with
product-market fit and when the competition to achieve a dominant design
begins. This conceptual framework is then back-tested by examining two
technological changes brought about by the internet, namely music and video
transmission. The uncertainty analysis around the cases highlights opportunities
for organizations to reduce future technological uncertainty. Overall, the
results from the case studies support the reliability and utility of the
conceptual framework in strategic business decision-making with the caveat that
while technical uncertainty is reduced, it is not eliminated.
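
A minimal sketch of the proposed performance cross-over point under exponential
improvement is given below; the performance levels and improvement rates are
illustrative assumptions, not the paper's estimates.

    import math

    p_new, r_new = 1.0, 0.40    # current performance and annual improvement rate (new technology)
    p_old, r_old = 10.0, 0.05   # incumbent technology

    # solve p_new * exp(r_new * t) = p_old * exp(r_old * t) for the cross-over time t
    t_cross = math.log(p_old / p_new) / (r_new - r_old)
    print(f"performance cross-over in about {t_cross:.1f} years")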

arXiv link: http://arxiv.org/abs/1805.06339v1

Econometrics arXiv paper, submitted: 2018-05-14

The Finite Sample Performance of Treatment Effects Estimators based on the Lasso

Authors: Michael Zimmert

This paper contributes to the literature on treatment effects estimation with
machine learning inspired methods by studying the performance of different
estimators based on the Lasso. Building on recent work in the field of
high-dimensional statistics, we use the semiparametric efficient score
estimation structure to compare different estimators. Alternative weighting
schemes are considered and their suitability for the incorporation of machine
learning estimators is assessed using theoretical arguments and various Monte
Carlo experiments. Additionally, we propose our own estimator based on doubly
robust kernel matching, which is argued to be more robust to nuisance parameter
misspecification. In the simulation study we verify theory-based intuition and
find good finite sample properties of alternative weighting scheme estimators
such as the one we propose.

arXiv link: http://arxiv.org/abs/1805.05067v1

Econometrics arXiv paper, submitted: 2018-05-12

A Dynamic Analysis of Nash Equilibria in Search Models with Fiat Money

Authors: Federico Bonetto, Maurizio Iacopetta

We study the rise in the acceptability of fiat money in a Kiyotaki-Wright
economy by developing a method that can determine dynamic Nash equilibria for a
class of search models with genuinely heterogeneous agents. We also address open
issues regarding the stability properties of pure-strategy equilibria and the
presence of multiple equilibria. Experiments illustrate the liquidity
conditions that favor the transition from partial to full acceptance of fiat
money, and the effects of inflationary shocks on production, liquidity, and
trade.

arXiv link: http://arxiv.org/abs/1805.04733v1

Econometrics arXiv paper, submitted: 2018-05-11

Efficiency in Micro-Behaviors and FL Bias

Authors: Kurihara Kazutaka, Yohei Tutiya

In this paper, we propose a model which simulates the odds distributions of a
pari-mutuel betting system under two hypotheses on the behavior of bettors: 1.
The amount of bets increases very rapidly as the deadline for betting
approaches. 2. Each bettor bets on the horse which gives the largest expected
benefit. The results can be interpreted as showing that such efficient
behaviors do not extinguish the favorite-longshot (FL) bias but may even
produce a stronger FL bias.

arXiv link: http://arxiv.org/abs/1805.04225v1

Econometrics arXiv updated paper (originally submitted: 2018-05-10)

Density Forecasts in Panel Data Models: A Semiparametric Bayesian Perspective

Authors: Laura Liu

This paper constructs individual-specific density forecasts for a panel of
firms or households using a dynamic linear model with common and heterogeneous
coefficients as well as cross-sectional heteroskedasticity. The panel
considered in this paper features a large cross-sectional dimension N but short
time series T. Due to the short T, traditional methods have difficulty in
disentangling the heterogeneous parameters from the shocks, which contaminates
the estimates of the heterogeneous parameters. To tackle this problem, I assume
that there is an underlying distribution of heterogeneous parameters, model
this distribution nonparametrically allowing for correlation between
heterogeneous parameters and initial conditions as well as individual-specific
regressors, and then estimate this distribution by combining information from
the whole panel. Theoretically, I prove that in cross-sectional homoskedastic
cases, both the estimated common parameters and the estimated distribution of
the heterogeneous parameters achieve posterior consistency, and that the
density forecasts asymptotically converge to the oracle forecast.
Methodologically, I develop a simulation-based posterior sampling algorithm
specifically addressing the nonparametric density estimation of unobserved
heterogeneous parameters. Monte Carlo simulations and an empirical application
to young firm dynamics demonstrate improvements in density forecasts relative
to alternative approaches.

arXiv link: http://arxiv.org/abs/1805.04178v3

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2018-05-10

News Sentiment as Leading Indicators for Recessions

Authors: Melody Y. Huang, Randall R. Rojas, Patrick D. Convery

In the following paper, we use a topic modeling algorithm and sentiment
scoring methods to construct a novel metric that serves as a leading indicator
in recession prediction models. We hypothesize that the inclusion of such a
sentiment indicator, derived purely from unstructured news data, will improve
our capabilities to forecast future recessions because it provides a direct
measure of the polarity of the information consumers and producers are exposed
to. We go on to show that the inclusion of our proposed news sentiment
indicator, with traditional sentiment data, such as the Michigan Index of
Consumer Sentiment and the Purchasing Managers' Index, and common factors
derived from a large panel of economic and financial indicators helps improve
model performance significantly.

arXiv link: http://arxiv.org/abs/1805.04160v2

Econometrics arXiv paper, submitted: 2018-05-10

Sufficient Statistics for Unobserved Heterogeneity in Structural Dynamic Logit Models

Authors: Victor Aguirregabiria, Jiaying Gu, Yao Luo

We study the identification and estimation of structural parameters in
dynamic panel data logit models where decisions are forward-looking and the
joint distribution of unobserved heterogeneity and observable state variables
is nonparametric, i.e., fixed-effects model. We consider models with two
endogenous state variables: the lagged decision variable, and the time duration
in the last choice. This class of models includes as particular cases important
economic applications such as models of market entry-exit, occupational choice,
machine replacement, inventory and investment decisions, or dynamic demand of
differentiated products. The identification of structural parameters requires a
sufficient statistic that controls for unobserved heterogeneity not only in
current utility but also in the continuation value of the forward-looking
decision problem. We obtain the minimal sufficient statistic and prove
identification of some structural parameters using a conditional likelihood
approach. We apply this estimator to a machine replacement model.

arXiv link: http://arxiv.org/abs/1805.04048v1

Econometrics arXiv paper, submitted: 2018-05-10

A mixture autoregressive model based on Student's $t$-distribution

Authors: Mika Meitz, Daniel Preve, Pentti Saikkonen

A new mixture autoregressive model based on Student's $t$-distribution is
proposed. A key feature of our model is that the conditional $t$-distributions
of the component models are based on autoregressions that have multivariate
$t$-distributions as their (low-dimensional) stationary distributions. That
autoregressions with such stationary distributions exist is not immediate. Our
formulation implies that the conditional mean of each component model is a
linear function of past observations and the conditional variance is also time
varying. Compared to previous mixture autoregressive models our model may
therefore be useful in applications where the data exhibits rather strong
conditional heteroskedasticity. Our formulation also has the theoretical
advantage that conditions for stationarity and ergodicity are always met and
these properties are much more straightforward to establish than is common in
nonlinear autoregressive models. An empirical example employing a realized
kernel series based on S&P 500 high-frequency data shows that the proposed
model performs well in volatility forecasting.
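
The toy simulation below illustrates the general mixture autoregressive idea
with Student's $t$ innovations; it uses constant mixing weights and univariate
AR(1) components as simplifying assumptions, whereas the paper's specification
is richer.

    import numpy as np

    rng = np.random.default_rng(2)
    T = 1000
    phi = (0.9, 0.2)       # AR(1) coefficients of the two components
    scale = (0.5, 2.0)     # innovation scales of the two components
    alpha = 0.7            # probability of drawing component 1 (constant here)
    nu = 5                 # degrees of freedom of the t innovations

    y = np.zeros(T)
    for t in range(1, T):
        j = 0 if rng.random() < alpha else 1
        y[t] = phi[j] * y[t - 1] + scale[j] * rng.standard_t(nu)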

arXiv link: http://arxiv.org/abs/1805.04010v1

Econometrics arXiv paper, submitted: 2018-05-10

Structural Breaks in Time Series

Authors: Alessandro Casini, Pierre Perron

This chapter covers methodological issues related to estimation, testing and
computation for models involving structural changes. Our aim is to review
developments as they relate to econometric applications based on linear models.
Substantial advances have been made to cover models at a level of generality
that allow a host of interesting practical applications. These include models
with general stationary regressors and errors that can exhibit temporal
dependence and heteroskedasticity, models with trending variables and possible
unit roots and cointegrated models, among others. Advances have been made
pertaining to computational aspects of constructing estimates, their limit
distributions, tests for structural changes, and methods to determine the
number of changes present. A variety of topics are covered. The first part
summarizes and updates developments described in an earlier review, Perron
(2006), with the exposition following heavily that of Perron (2008). Additions
are included for recent developments: testing for common breaks, models with
endogenous regressors (emphasizing that simply using least-squares is
preferable over instrumental variables methods), quantile regressions, methods
based on Lasso, panel data models, testing for changes in forecast accuracy,
factor models and methods of inference based on a continuous record
asymptotic framework. Our focus is on the so-called off-line methods whereby
one wants to retrospectively test for breaks in a given sample of data and form
confidence intervals about the break dates. The aim is to provide the readers
with an overview of methods that are of direct usefulness in practice as
opposed to issues that are mostly of theoretical interest.

arXiv link: http://arxiv.org/abs/1805.03807v1

Econometrics arXiv updated paper (originally submitted: 2018-05-08)

Optimal Linear Instrumental Variables Approximations

Authors: Juan Carlos Escanciano, Wei Li

This paper studies the identification and estimation of the optimal linear
approximation of a structural regression function. The parameter in the linear
approximation is called the Optimal Linear Instrumental Variables Approximation
(OLIVA). This paper shows that a necessary condition for standard inference on
the OLIVA is also sufficient for the existence of an IV estimand in a linear
model. The instrument in the IV estimand is unknown and may not be identified.
A Two-Step IV (TSIV) estimator based on Tikhonov regularization is proposed,
which can be implemented by standard regression routines. We establish the
asymptotic normality of the TSIV estimator assuming neither completeness nor
identification of the instrument. As an important application of our analysis,
we robustify the classical Hausman test for exogeneity against misspecification
of the linear structural model. We also discuss extensions to weighted least
squares criteria. Monte Carlo simulations suggest an excellent finite sample
performance for the proposed inferences. Finally, in an empirical application
estimating the elasticity of intertemporal substitution (EIS) with US data, we
obtain TSIV estimates that are much larger than their standard IV counterparts,
with our robust Hausman test failing to reject the null hypothesis of
exogeneity of real interest rates.

arXiv link: http://arxiv.org/abs/1805.03275v3

Econometrics arXiv paper, submitted: 2018-05-02

Endogenous growth - A dynamic technology augmentation of the Solow model

Authors: Murad Kasim

In this paper, I endeavour to construct a new model by extending the classic
exogenous economic growth model with a measurement that tries to explain and
quantify the size of technological innovation (A) endogenously. I do not agree
that technology is a "constant" exogenous variable, because it is humans who
create all technological innovations, and innovation depends on how much human
and physical capital is allocated to research. I inspect several possible
approaches to do this, and then test my model against both sample data and
real-world evidence. I call this method "dynamic" because it tries to model
the details of resource allocation between research, labor and capital, which
affect each other interactively. In the end, I point out the new residual and
the parts of the economic growth model that can be further improved.

arXiv link: http://arxiv.org/abs/1805.00668v1

Econometrics arXiv paper, submitted: 2018-04-30

Identifying Effects of Multivalued Treatments

Authors: Sokbae Lee, Bernard Salanié

Multivalued treatment models have typically been studied under restrictive
assumptions: ordered choice, and more recently unordered monotonicity. We show
how treatment effects can be identified in a more general class of models that
allows for multidimensional unobserved heterogeneity. Our results rely on two
main assumptions: treatment assignment must be a measurable function of
threshold-crossing rules, and enough continuous instruments must be available.
We illustrate our approach for several classes of models.

arXiv link: http://arxiv.org/abs/1805.00057v1

Econometrics arXiv paper, submitted: 2018-04-29

Interpreting Quantile Independence

Authors: Matthew A. Masten, Alexandre Poirier

How should one assess the credibility of assumptions weaker than statistical
independence, like quantile independence? In the context of identifying causal
effects of a treatment variable, we argue that such deviations should be chosen
based on the form of selection on unobservables they allow. For quantile
independence, we characterize this form of treatment selection. Specifically,
we show that quantile independence is equivalent to a constraint on the average
value of either a latent propensity score (for a binary treatment) or the cdf
of treatment given the unobservables (for a continuous treatment). In both
cases, this average value constraint requires a kind of non-monotonic treatment
selection. Using these results, we show that several common treatment selection
models are incompatible with quantile independence. We introduce a class of
assumptions which weakens quantile independence by removing the average value
constraint, and therefore allows for monotonic treatment selection. In a
potential outcomes model with a binary treatment, we derive identified sets for
the ATT and QTT under both classes of assumptions. In a numerical example we
show that the average value constraint inherent in quantile independence has
substantial identifying power. Our results suggest that researchers should
carefully consider the credibility of this non-monotonicity property when using
quantile independence to weaken full independence.

arXiv link: http://arxiv.org/abs/1804.10957v1

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2018-04-26

Chain effects of clean water: The Mills-Reincke phenomenon in early twentieth-century Japan

Authors: Tatsuki Inoue, Kota Ogasawara

This study explores the validity of chain effects of clean water, which are
known as the "Mills-Reincke phenomenon," in early twentieth-century Japan.
Recent studies have reported that water purification systems made substantial
contributions to human capital. Although some studies have
investigated the instantaneous effects of water-supply systems in pre-war
Japan, little is known about the chain effects of these systems. By analyzing
city-level cause-specific mortality data from 1922-1940, we find that a decline
in typhoid deaths by one per 1,000 people decreased the risk of death due to
non-waterborne diseases such as tuberculosis and pneumonia by 0.742-2.942 per
1,000 people. Our finding suggests that the observed Mills-Reincke phenomenon
could have resulted in the relatively rapid decline in the mortality rate in
early twentieth-century Japan.

arXiv link: http://arxiv.org/abs/1805.00875v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2018-04-26

New HSIC-based tests for independence between two stationary multivariate time series

Authors: Guochang Wang, Wai Keung Li, Ke Zhu

This paper proposes some novel one-sided omnibus tests for independence
between two multivariate stationary time series. These new tests apply the
Hilbert-Schmidt independence criterion (HSIC) to test the independence between
the innovations of both time series. Under regularity conditions, the limiting
null distributions of our HSIC-based tests are established. Next, our
HSIC-based tests are shown to be consistent. Moreover, a residual bootstrap
method is used to obtain the critical values for our HSIC-based tests, and its
validity is justified. Compared with the existing cross-correlation-based tests
for linear dependence, our tests examine the general (including both linear and
non-linear) dependence to give investigators more complete information on the
causal relationship between two multivariate time series. The merits of our
tests are illustrated by some simulation results and a real example.
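
A minimal sketch of the (biased) empirical HSIC statistic with Gaussian kernels
is shown below; the bandwidths and simulated data are illustrative assumptions,
and the residual-bootstrap calibration used in the paper is omitted.

    import numpy as np

    def gaussian_gram(Z, sigma):
        sq = np.sum(Z ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2 * Z @ Z.T
        return np.exp(-d2 / (2 * sigma ** 2))

    def hsic(X, Y, sigma_x=1.0, sigma_y=1.0):
        n = X.shape[0]
        K, L = gaussian_gram(X, sigma_x), gaussian_gram(Y, sigma_y)
        H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
        return np.trace(K @ H @ L @ H) / n ** 2    # biased empirical HSIC

    rng = np.random.default_rng(3)
    x = rng.normal(size=(300, 2))
    y = x ** 2 + 0.1 * rng.normal(size=(300, 2))   # nonlinearly dependent on x
    print("HSIC(x, y)          =", hsic(x, y))
    print("HSIC(x, shuffled y) =", hsic(x, rng.permutation(y)))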

arXiv link: http://arxiv.org/abs/1804.09866v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2018-04-25

Deep Learning for Predicting Asset Returns

Authors: Guanhao Feng, Jingyu He, Nicholas G. Polson

Deep learning searches for nonlinear factors for predicting asset returns.
Predictability is achieved via multiple layers of composite factors as opposed
to additive ones. Viewed in this way, asset pricing studies can be revisited
using multi-layer deep learners, such as rectified linear units (ReLU) or
long short-term memory (LSTM) networks for time-series effects. State-of-the-art
algorithms and tools, including stochastic gradient descent (SGD), TensorFlow
and dropout design, provide efficient implementation and factor exploration. To illustrate
our methodology, we revisit the equity market risk premium dataset of Welch and
Goyal (2008). We find the existence of nonlinear factors which explain
predictability of returns, in particular at the extremes of the characteristic
space. Finally, we conclude with directions for future research.

arXiv link: http://arxiv.org/abs/1804.09314v2

Econometrics arXiv paper, submitted: 2018-04-24

Economic inequality and Islamic Charity: An exploratory agent-based modeling approach

Authors: Hossein Sabzian, Alireza Aliahmadi, Adel Azar, Madjid Mirzaee

Economic inequality is one of the pivotal issues for most economic and
social policy makers across the world seeking to ensure sustainable economic
growth and justice. In the mainstream school of economics, namely neoclassical
theories, economic issues are dealt with in a mechanistic manner. Such a
mainstream framework is majorly focused on investigating a socio-economic
system based on an axiomatic scheme where reductionism approach plays a vital
role. The major limitations of such theories include unbounded rationality of
economic agents, reducing the economic aggregates to a set of predictable
factors and lack of attention to adaptability and the evolutionary nature of
economic agents. To tackle the deficiencies of conventional economic models,
some new approaches have been adopted over the past two decades. One of these
novel approaches is the complex adaptive systems (CAS) framework, which has
shown very promising performance in practice. In contrast to the mainstream school,
under this framework, the economic phenomena are studied in an organic manner
where the economic agents are supposed to be both boundedly rational and
adaptive. According to it, the economic aggregates emerge out of the ways
agents of a system decide and interact. As a powerful way of modeling CASs,
agent-based models (ABMs) have found growing application among academics
and practitioners. ABMs show how simple behavioral rules of agents and
local interactions among them at the micro-scale can generate surprisingly
complex patterns at the macro-scale. In this paper, ABMs are used to show (1)
how economic inequality emerges in a system and to explain (2) how sadaqah, as
an Islamic charity rule, can substantially help alleviate this inequality and
how resource allocation strategies adopted by charity entities can accelerate
this alleviation.
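
A toy agent-based sketch in this spirit is given below: agents exchange wealth
in random pairwise interactions and a simple charity rule redistributes from
above-mean to below-mean agents, with inequality tracked by the Gini
coefficient. All behavioral rules and parameters are illustrative assumptions,
not the paper's model.

    import numpy as np

    def gini(w):
        w = np.sort(w)
        n = len(w)
        return 2 * np.sum(np.arange(1, n + 1) * w) / (n * np.sum(w)) - (n + 1) / n

    rng = np.random.default_rng(5)
    n_agents, n_steps = 500, 20000
    wealth = np.ones(n_agents)

    for _ in range(n_steps):
        i, j = rng.integers(n_agents, size=2)
        amount = 0.1 * min(wealth[i], wealth[j])    # random pairwise exchange
        if rng.random() < 0.5:
            i, j = j, i
        wealth[i] += amount
        wealth[j] -= amount
        # charity-like rule: above-mean agents donate a tiny share to the rest
        rich = wealth > wealth.mean()
        donation = 1e-4 * wealth[rich].sum()
        wealth[rich] -= 1e-4 * wealth[rich]
        wealth[~rich] += donation / max((~rich).sum(), 1)

    print("Gini coefficient after simulation:", round(gini(wealth), 3))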

arXiv link: http://arxiv.org/abs/1804.09284v1

Econometrics arXiv paper, submitted: 2018-04-23

Statistical and Economic Evaluation of Time Series Models for Forecasting Arrivals at Call Centers

Authors: Andrea Bastianin, Marzio Galeotti, Matteo Manera

Call centers' managers are interested in obtaining accurate point and
distributional forecasts of call arrivals in order to achieve an optimal
balance between service quality and operating costs. We present a strategy for
selecting forecast models of call arrivals which is based on three pillars: (i)
flexibility of the loss function; (ii) statistical evaluation of forecast
accuracy; (iii) economic evaluation of forecast performance using money
metrics. We implement fourteen time series models and seven forecast
combination schemes on three series of daily call arrivals. Although we focus
mainly on point forecasts, we also analyze density forecast evaluation. We show
that second moments modeling is important both for point and density
forecasting and that the simple Seasonal Random Walk model is always
outperformed by more general specifications. Our results suggest that call
center managers should invest in the use of forecast models which describe both
first and second moments of call arrivals.

arXiv link: http://arxiv.org/abs/1804.08315v1

Econometrics arXiv paper, submitted: 2018-04-23

Econometric Modeling of Regional Electricity Spot Prices in the Australian Market

Authors: Michael Stanley Smith, Thomas S. Shively

Wholesale electricity markets are increasingly integrated via high voltage
interconnectors, and inter-regional trade in electricity is growing. To model
this, we consider a spatial equilibrium model of price formation, where
constraints on inter-regional flows result in three distinct equilibria in
prices. We use this to motivate an econometric model for the distribution of
observed electricity spot prices that captures many of their unique empirical
characteristics. The econometric model features supply and inter-regional trade
cost functions, which are estimated using Bayesian monotonic regression
smoothing methodology. A copula multivariate time series model is employed to
capture additional dependence -- both cross-sectional and serial -- in regional
prices. The marginal distributions are nonparametric, with means given by the
regression means. The model has the advantage of preserving the heavy
right-hand tail in the predictive densities of price. We fit the model to
half-hourly spot price data in the five interconnected regions of the
Australian national electricity market. The fitted model is then used to
measure how both supply and price shocks in one region are transmitted to the
distribution of prices in all regions in subsequent periods. Finally, to
validate our econometric model, we show that prices forecast using the proposed
model compare favorably with those from some benchmark alternatives.

arXiv link: http://arxiv.org/abs/1804.08218v1

Econometrics arXiv paper, submitted: 2018-04-22

Price Competition with Geometric Brownian motion in Exchange Rate Uncertainty

Authors: Murat Erkoc, Huaqing Wang, Anas Ahmed

We analyze an operational policy for a multinational manufacturer to hedge
against exchange rate uncertainties and competition. We consider a single
product and a single period. Because of long lead times, the capacity investment
must be made before the selling season begins, when the exchange rate between the
two countries is uncertain. We consider duopoly competition in the foreign
country. We model the exchange rate as a random variable. We investigate the
impact of competition and exchange rate on optimal capacities and optimal
prices. We show how competition can impact the decision of the home
manufacturer to enter the foreign market.

arXiv link: http://arxiv.org/abs/1804.08153v1

Econometrics arXiv updated paper (originally submitted: 2018-04-21)

Empirical Equilibrium

Authors: Rodrigo A. Velez, Alexander L. Brown

We study the foundations of empirical equilibrium, a refinement of Nash
equilibrium that is based on a non-parametric characterization of empirical
distributions of behavior in games (Velez and Brown,2020b arXiv:1907.12408).
The refinement can be alternatively defined as those Nash equilibria that do
not refute the regular QRE theory of Goeree, Holt, and Palfrey (2005). By
contrast, some empirical equilibria may refute monotone additive randomly
disturbed payoff models. As a by-product, we show that empirical equilibrium
does not coincide with refinements based on approximation by monotone additive
randomly disturbed payoff models, and further our understanding of the
empirical content of these models.

arXiv link: http://arxiv.org/abs/1804.07986v3

Econometrics arXiv paper, submitted: 2018-04-18

Transaction Costs in Collective Waste Recovery Systems in the EU

Authors: Shteryo Nozharov

The study aims to identify the institutional flaws of the current EU waste
management model by analysing the economic model of extended producer
responsibility and collective waste management systems and to create a model
for measuring the transaction costs borne by waste recovery organizations. The
model was validated by analysing the Bulgarian collective waste management
systems that have been complying with EU legislation for the last 10 years.
The analysis focuses on waste oils because of their economic importance and the
limited number of studies and analyses in this field as the predominant body of
research to date has mainly addressed packaging waste, mixed household waste or
discarded electrical and electronic equipment. The study aims to support the
process of establishing a circular economy in the EU, which was initiated in
2015.

arXiv link: http://arxiv.org/abs/1804.06792v1

Econometrics arXiv paper, submitted: 2018-04-18

Estimating Treatment Effects in Mover Designs

Authors: Peter Hull

Researchers increasingly leverage movement across multiple treatments to
estimate causal effects. While these "mover regressions" are often motivated by
a linear constant-effects model, it is not clear what they capture under weaker
quasi-experimental assumptions. I show that binary treatment mover regressions
recover a convex average of four difference-in-difference comparisons and are
thus causally interpretable under a standard parallel trends assumption.
Estimates from multiple-treatment models, however, need not be causal without
stronger restrictions on the heterogeneity of treatment effects and
time-varying shocks. I propose a class of two-step estimators to isolate and
combine the large set of difference-in-difference quasi-experiments generated
by a mover design, identifying mover average treatment effects under
conditional-on-covariate parallel trends and effect homogeneity restrictions. I
characterize the efficient estimators in this class and derive specification
tests based on the model's overidentifying restrictions. Future drafts will
apply the theory to the Finkelstein et al. (2016) movers design, analyzing the
causal effects of geography on healthcare utilization.

arXiv link: http://arxiv.org/abs/1804.06721v1

Econometrics arXiv paper, submitted: 2018-04-17

Revisiting the thermal and superthermal two-class distribution of incomes: A critical perspective

Authors: Markus P. A. Schneider

This paper offers a two-pronged critique of the empirical investigation of
the income distribution performed by physicists over the past decade. Their
findings rely on the graphical analysis of the observed distribution of
normalized incomes. Two central observations lead to the conclusion that the
majority of incomes are exponentially distributed, but neither each individual
piece of evidence nor their concurrent observation robustly proves that the
thermal and superthermal mixture fits the observed distribution of incomes
better than reasonable alternatives. A formal analysis using popular measures
of fit shows that while an exponential distribution with a power-law tail
provides a better fit of the IRS income data than the log-normal distribution
(often assumed by economists), the thermal and superthermal mixture's fit can
be improved upon further by adding a log-normal component. The economic
implications of the thermal and superthermal distribution of incomes, and the
expanded mixture are explored in the paper.

arXiv link: http://arxiv.org/abs/1804.06341v1

Econometrics arXiv updated paper (originally submitted: 2018-04-17)

Dissection of Bitcoin's Multiscale Bubble History from January 2012 to February 2018

Authors: Jan-Christian Gerlach, Guilherme Demos, Didier Sornette

We present a detailed bubble analysis of the Bitcoin to US Dollar price
dynamics from January 2012 to February 2018. We introduce a robust automatic
peak detection method that classifies price time series into periods of
uninterrupted market growth (drawups) and regimes of uninterrupted market
decrease (drawdowns). In combination with the Lagrange Regularisation Method
for detecting the beginning of a new market regime, we identify 3 major peaks
and 10 additional smaller peaks, that have punctuated the dynamics of Bitcoin
price during the analyzed time period. We explain this classification of long
and short bubbles by a number of quantitative metrics and graphs to understand
the main socio-economic drivers behind the ascent of Bitcoin over this period.
Then, a detailed analysis of the growing risks associated with the three long
bubbles using the Log-Periodic Power Law Singularity (LPPLS) model is based on
the LPPLS Confidence Indicators, defined as the fraction of qualified fits of
the LPPLS model over multiple time windows. Furthermore, for various fictitious
'present' times $t_2$ before the crashes, we employ a clustering method to
group the predicted critical times $t_c$ of the LPPLS fits over different time
scales, where $t_c$ is the most probable time for the ending of the bubble.
Each cluster is proposed as a plausible scenario for the subsequent Bitcoin
price evolution. We present these predictions for the three long bubbles and
the four short bubbles that our time scale of analysis was able to resolve.
Overall, our predictive scheme provides useful information to warn of an
imminent crash risk.
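
For reference, a minimal sketch of the LPPLS expected log-price used in this
style of bubble analysis is given below; the parameter values are illustrative,
and the calibration and window-qualification steps are omitted.

    import numpy as np

    def lppls_log_price(t, tc, m, omega, A, B, C, phi):
        """A + B*(tc-t)^m + C*(tc-t)^m * cos(omega*ln(tc-t) - phi), for t < tc."""
        dt = tc - t
        return A + B * dt ** m + C * dt ** m * np.cos(omega * np.log(dt) - phi)

    t = np.linspace(0.0, 0.99, 500)     # time, with critical time tc = 1
    log_p = lppls_log_price(t, tc=1.0, m=0.5, omega=7.0, A=10.0, B=-1.0, C=0.1, phi=0.0)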

arXiv link: http://arxiv.org/abs/1804.06261v4

Econometrics arXiv paper, submitted: 2018-04-16

Quantifying the Economic Case for Electric Semi-Trucks

Authors: Shashank Sripad, Venkatasubramanian Viswanathan

There has been considerable interest in the electrification of freight
transport, particularly heavy-duty trucks to downscale the greenhouse-gas (GHG)
emissions from the transportation sector. However, the economic competitiveness
of electric semi-trucks is uncertain as there are substantial additional
initial costs associated with the large battery packs required. In this work,
we analyze the trade-off between the initial investment and the operating cost
for realistic usage scenarios to compare a fleet of electric semi-trucks with a
range of 500 miles with a fleet of diesel trucks. For the baseline case with
30% of the fleet requiring battery pack replacements and a price differential of
US$50,000, we find a payback period of about 3 years. Based on sensitivity
analysis, we find that the fraction of the fleet that requires battery pack
replacements is a major factor. For the case with 100% replacement fraction,
the payback period could be as high as 5-6 years. We identify the price of
electricity as the second most important variable: at a price of
US$0.14/kWh, the payback period could go up to 5 years. Electric semi-trucks
are expected to lead to savings due to reduced repairs, and the magnitude of
these savings could play a crucial role in the payback period as well. With increased
penetration of autonomous vehicles, the annual mileage of semi-trucks could
substantially increase and this heavily sways in favor of electric semi-trucks,
bringing down the payback period to around 2 years at an annual mileage of
120,000 miles. There is an undeniable economic case for electric semi-trucks
and developing battery packs with longer cycle life and higher specific energy
would make this case even stronger.
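
A back-of-the-envelope payback calculation in the spirit of the paper's
trade-off is sketched below; all figures are illustrative assumptions, not the
authors' numbers.

    price_premium = 50_000          # extra upfront cost of the electric truck (US$)
    miles_per_year = 100_000
    diesel_cost_per_mile = 0.70     # assumed fuel + maintenance cost per mile
    electric_cost_per_mile = 0.40   # assumed electricity + maintenance cost per mile

    annual_savings = miles_per_year * (diesel_cost_per_mile - electric_cost_per_mile)
    payback_years = price_premium / annual_savings
    print(f"payback period: {payback_years:.1f} years")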

arXiv link: http://arxiv.org/abs/1804.05974v1

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2018-04-16

Bitcoin market route to maturity? Evidence from return fluctuations, temporal correlations and multiscaling effects

Authors: Stanisław Drożdż, Robert Gębarowski, Ludovico Minati, Paweł Oświęcimka, Marcin Wątorek

Based on 1-minute price changes recorded since year 2012, the fluctuation
properties of the rapidly-emerging Bitcoin (BTC) market are assessed over
chosen sub-periods, in terms of return distributions, volatility
autocorrelation, Hurst exponents and multiscaling effects. The findings are
compared to the stylized facts of mature world markets. While early trading was
affected by system-specific irregularities, it is found that over the months
preceding Apr 2018 all these statistical indicators approach the features
hallmarking maturity. This can be taken as an indication that the Bitcoin
market, and possibly other cryptocurrencies, carry concrete potential of
imminently becoming a regular market, alternative to the foreign exchange
(Forex). Since high-frequency price data are available since the beginning of
trading, the Bitcoin offers a unique window into the statistical
characteristics of a market maturation trajectory.

arXiv link: http://arxiv.org/abs/1804.05916v2

Econometrics arXiv updated paper (originally submitted: 2018-04-16)

Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects

Authors: Liyang Sun, Sarah Abraham

To estimate the dynamic effects of an absorbing treatment, researchers often
use two-way fixed effects regressions that include leads and lags of the
treatment. We show that in settings with variation in treatment timing across
units, the coefficient on a given lead or lag can be contaminated by effects
from other periods, and apparent pretrends can arise solely from treatment
effects heterogeneity. We propose an alternative estimator that is free of
contamination, and illustrate the relative shortcomings of two-way fixed
effects regressions with leads and lags through an empirical application.
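
For context, the sketch below sets up the kind of two-way fixed effects
event-study regression with leads and lags that the paper analyzes, on
simulated data with staggered adoption; the data-generating process, the lag
window, and the omission of endpoint binning are illustrative simplifications,
and the paper's proposed alternative estimator is not implemented here.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(6)
    units, periods = 200, 10
    adopt = rng.choice([4, 6, 8, 999], size=units)   # staggered adoption; 999 = never treated

    rows = []
    for i in range(units):
        for t in range(periods):
            rel = t - adopt[i]                              # event time
            effect = 1.0 + 0.3 * rel if rel >= 0 else 0.0   # heterogeneous dynamic effect
            rows.append({"unit": i, "time": t, "rel": rel,
                         "y": 0.1 * t + effect + rng.normal()})
    df = pd.DataFrame(rows)

    # relative-time dummies for event times -3..3, omitting -1 as the reference
    for k in range(-3, 4):
        df[f"d{k + 3}"] = (df["rel"] == k).astype(float)
    terms = " + ".join(f"d{k + 3}" for k in range(-3, 4) if k != -1)
    fit = smf.ols(f"y ~ {terms} + C(unit) + C(time)", data=df).fit()
    print(fit.params.filter(regex=r"^d\d"))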

arXiv link: http://arxiv.org/abs/1804.05785v2

Econometrics arXiv cross-link from physics.soc-ph (physics.soc-ph), submitted: 2018-04-16

Triggers for cooperative behavior in the thermodynamic limit: a case study in Public goods game

Authors: Colin Benjamin, Shubhayan Sarkar

In this work, we aim to answer the question: what triggers cooperative
behavior in the thermodynamic limit by taking recourse to the Public goods
game. Using the idea of mapping the 1D Ising model Hamiltonian with nearest
neighbor coupling to payoffs in game theory, we calculate the magnetisation
of the game in the thermodynamic limit. We see a phase transition in the
thermodynamic limit of the two-player Public goods game. We observe that
punishment acts as an external field for the two-player Public goods game,
triggering cooperation (the "provide" strategy), while cost can be a trigger for
suppressing cooperation (free riding). Finally, reward also acts as a trigger
for providing, while the role of inverse temperature (fluctuations in choices)
is to introduce randomness into strategic choices.

arXiv link: http://arxiv.org/abs/1804.06465v2

Econometrics arXiv paper, submitted: 2018-04-15

Shapley Value Methods for Attribution Modeling in Online Advertising

Authors: Kaifeng Zhao, Seyed Hanif Mahboobi, Saeed R. Bagheri

This paper re-examines the Shapley value methods for attribution analysis in
the area of online advertising. As a credit allocation solution in cooperative
game theory, the Shapley value method directly quantifies the contribution of
online advertising inputs to the advertising key performance indicator (KPI)
across multiple channels. We simplify its calculation by developing an
alternative mathematical formulation. The new formula significantly improves
the computational efficiency and therefore extends the scope of applicability.
Based on the simplified formula, we further develop the ordered Shapley value
method. The proposed method is able to take into account the order of channels
visited by users. We claim that it provides a more comprehensive insight by
evaluating the attribution of channels at different stages of user conversion
journeys. The proposed approaches are illustrated using a real-world online
advertising campaign dataset.
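
A minimal sketch of Shapley-value credit allocation across channels is shown
below using the standard combinatorial formula; the toy coalition value
function is an illustrative assumption, and the paper's simplified and ordered
variants are not implemented.

    from itertools import combinations
    from math import factorial

    channels = ["search", "display", "email"]
    # toy "worth" of each channel subset, e.g. conversion rate for that subset
    value = {frozenset(): 0.00,
             frozenset({"search"}): 0.05, frozenset({"display"}): 0.02,
             frozenset({"email"}): 0.03,
             frozenset({"search", "display"}): 0.08,
             frozenset({"search", "email"}): 0.09,
             frozenset({"display", "email"}): 0.05,
             frozenset(channels): 0.12}

    def shapley(channel):
        n = len(channels)
        others = [c for c in channels if c != channel]
        phi = 0.0
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                S = frozenset(subset)
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi += weight * (value[S | {channel}] - value[S])
        return phi

    for c in channels:
        print(c, round(shapley(c), 4))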

arXiv link: http://arxiv.org/abs/1804.05327v1

Econometrics arXiv paper, submitted: 2018-04-13

Aid and Growth in the West African Economic and Monetary Union (WAEMU) Countries: Revisiting a Controversial Relationship

Authors: Nimonka Bayale

The main purpose of this paper is to analyze threshold effects of official
development assistance (ODA) on economic growth in WAEMU zone countries. To
achieve this, the study uses OECD and WDI data covering the period
1980-2015 and applies Hansen's Panel Threshold Regression (PTR) model to
bootstrap the aid threshold above which aid becomes effective. The
evidence strongly supports the view that the relationship between aid and
economic growth is non-linear, with a unique threshold of 12.74% of GDP.
Above this value, the marginal effect of aid is 0.69 points, all other things
being equal. One of the main contributions of this paper is to show that
WAEMU countries need investments that could be covered by foreign aid, which
should be considered just a complementary resource. Thus, WAEMU
countries should continue to strengthen their efforts in internal resource
mobilization in order to fulfil this need.

arXiv link: http://arxiv.org/abs/1805.00435v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2018-04-13

Large Sample Properties of Partitioning-Based Series Estimators

Authors: Matias D. Cattaneo, Max H. Farrell, Yingjie Feng

We present large sample results for partitioning-based least squares
nonparametric regression, a popular method for approximating conditional
expectation functions in statistics, econometrics, and machine learning. First,
we obtain a general characterization of their leading asymptotic bias. Second,
we establish integrated mean squared error approximations for the point
estimator and propose feasible tuning parameter selection. Third, we develop
pointwise inference methods based on undersmoothing and robust bias correction.
Fourth, employing different coupling approaches, we develop uniform
distributional approximations for the undersmoothed and robust bias-corrected
t-statistic processes and construct valid confidence bands. In the univariate
case, our uniform distributional approximations require seemingly minimal rate
restrictions and improve on approximation rates known in the literature.
Finally, we apply our general results to three partitioning-based estimators:
splines, wavelets, and piecewise polynomials. The supplemental appendix
includes several other general and example-specific technical and
methodological results. A companion R package is provided.
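
A minimal sketch of a partitioning-based least squares fit (piecewise linear
within quantile-spaced cells) is given below; the simulated data, number of
cells, and polynomial order are illustrative choices, and the bias correction
and uniform inference developed in the paper are not implemented.

    import numpy as np

    rng = np.random.default_rng(7)
    n, n_cells = 1000, 10
    x = rng.uniform(0, 1, n)
    y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=n)

    edges = np.quantile(x, np.linspace(0, 1, n_cells + 1))      # data-driven partition
    cell = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_cells - 1)

    x_grid = np.linspace(0.01, 0.99, 200)
    grid_cell = np.clip(np.searchsorted(edges, x_grid, side="right") - 1, 0, n_cells - 1)
    fit = np.empty_like(x_grid)
    for j in range(n_cells):
        beta = np.polyfit(x[cell == j], y[cell == j], deg=1)    # local least squares line
        fit[grid_cell == j] = np.polyval(beta, x_grid[grid_cell == j])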

arXiv link: http://arxiv.org/abs/1804.04916v3

Econometrics arXiv paper, submitted: 2018-04-10

Moment Inequalities in the Context of Simulated and Predicted Variables

Authors: Hiroaki Kaido, Jiaxuan Li, Marc Rysman

This paper explores the effects of simulated moments on the performance of
inference methods based on moment inequalities. Commonly used confidence sets
for parameters are level sets of criterion functions whose boundary points may
depend on sample moments in an irregular manner. Due to this feature,
simulation errors can affect the performance of inference in non-standard ways.
In particular, a (first-order) bias due to the simulation errors may remain in
the estimated boundary of the confidence set. We demonstrate, through Monte
Carlo experiments, that simulation errors can significantly reduce the coverage
probabilities of confidence sets in small samples. The size distortion is
particularly severe when the number of inequality restrictions is large. These
results highlight the danger of ignoring the sampling variations due to the
simulation errors in moment inequality models. Similar issues arise when using
predicted variables in moment inequality models. We propose a method for
properly correcting for these variations based on regularizing the intersection
of moments in parameter space, and we show that our proposed method performs
well theoretically and in practice.

arXiv link: http://arxiv.org/abs/1804.03674v1

Econometrics arXiv paper, submitted: 2018-04-10

Inference on Local Average Treatment Effects for Misclassified Treatment

Authors: Takahide Yanagi

We develop point-identification for the local average treatment effect when
the binary treatment contains a measurement error. The standard instrumental
variable estimator is inconsistent for the parameter since the measurement
error is non-classical by construction. We correct the problem by identifying
the distribution of the measurement error based on the use of an exogenous
variable that can even be a binary covariate. The moment conditions derived
from the identification lead to generalized method of moments estimation with
asymptotically valid inferences. Monte Carlo simulations and an empirical
illustration demonstrate the usefulness of the proposed procedure.

arXiv link: http://arxiv.org/abs/1804.03349v1

Econometrics arXiv updated paper (originally submitted: 2018-04-09)

Varying Random Coefficient Models

Authors: Christoph Breunig

This paper provides a new methodology to analyze unobserved heterogeneity
when observed characteristics are modeled nonlinearly. The proposed model
builds on varying random coefficients (VRC) that are determined by nonlinear
functions of observed regressors and additively separable unobservables. This
paper proposes a novel estimator of the VRC density based on weighted sieve
minimum distance. The main example of sieve bases are Hermite functions which
yield a numerically stable estimation procedure. This paper shows inference
results that go beyond what has been shown in ordinary RC models. We provide in
each case rates of convergence and also establish pointwise limit theory of
linear functionals, where a prominent example is the density of potential
outcomes. In addition, a multiplier bootstrap procedure is proposed to
construct uniform confidence bands. A Monte Carlo study examines finite sample
properties of the estimator and shows that it performs well even when the
regressors associated to RC are far from being heavy tailed. Finally, the
methodology is applied to analyze heterogeneity in income elasticity of demand
for housing.

arXiv link: http://arxiv.org/abs/1804.03110v4

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2018-04-06

Statistical inference for autoregressive models under heteroscedasticity of unknown form

Authors: Ke Zhu

This paper provides an entire inference procedure for the autoregressive
model under (conditional) heteroscedasticity of unknown form with a finite
variance. We first establish the asymptotic normality of the weighted least
absolute deviations estimator (LADE) for the model. Second, we develop the
random weighting (RW) method to estimate its asymptotic covariance matrix,
leading to the implementation of the Wald test. Third, we construct a
portmanteau test for model checking, and use the RW method to obtain its
critical values. As a special weighted LADE, the feasible adaptive LADE (ALADE)
is proposed and proved to have the same efficiency as its infeasible
counterpart. The importance of our entire methodology based on the feasible
ALADE is illustrated by simulation results and the real data analysis on three
U.S. economic data sets.
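
The sketch below illustrates the unweighted building block, least absolute
deviations (median regression) estimation of an AR(1) model under
heteroscedastic errors; the simulated process is an illustrative assumption,
and the paper's weighting scheme and random-weighting bootstrap are not
implemented.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(8)
    T = 1000
    e = rng.standard_t(3, size=T) * (1 + 0.5 * np.abs(rng.normal(size=T)))  # heavy-tailed, heteroscedastic
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = 0.6 * y[t - 1] + e[t]

    X = sm.add_constant(y[:-1])                 # regress y_t on (1, y_{t-1})
    lad = sm.QuantReg(y[1:], X).fit(q=0.5)      # median (LAD) regression
    print(lad.params)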

arXiv link: http://arxiv.org/abs/1804.02348v2

Econometrics arXiv updated paper (originally submitted: 2018-04-05)

Simultaneous Mean-Variance Regression

Authors: Richard Spady, Sami Stouli

We propose simultaneous mean-variance regression for the linear estimation
and approximation of conditional mean functions. In the presence of
heteroskedasticity of unknown form, our method accounts for varying dispersion
in the regression outcome across the support of conditioning variables by using
weights that are jointly determined with the mean regression parameters.
Simultaneity generates outcome predictions that are guaranteed to improve over
ordinary least-squares prediction error, with corresponding parameter standard
errors that are automatically valid. Under shape misspecification of the
conditional mean and variance functions, we establish existence and uniqueness
of the resulting approximations and characterize their formal interpretation
and robustness properties. In particular, we show that the corresponding
mean-variance regression location-scale model weakly dominates the ordinary
least-squares location model under a Kullback-Leibler measure of divergence,
with strict improvement in the presence of heteroskedasticity. The simultaneous
mean-variance regression loss function is globally convex and the corresponding
estimator is easy to implement. We establish its consistency and asymptotic
normality under misspecification, provide robust inference methods, and present
numerical simulations that show large improvements over ordinary and weighted
least-squares in terms of estimation and inference in finite samples. We
further illustrate our method with two empirical applications to the estimation
of the relationship between economic prosperity in 1500 and today, and demand
for gasoline in the United States.

arXiv link: http://arxiv.org/abs/1804.01631v2

Econometrics arXiv updated paper (originally submitted: 2018-04-04)

A Bayesian panel VAR model to analyze the impact of climate change on high-income economies

Authors: Florian Huber, Tamás Krisztin, Michael Pfarrhofer

In this paper, we assess the impact of climate shocks on futures markets for
agricultural commodities and a set of macroeconomic quantities for multiple
high-income economies. To capture relations among countries, markets, and
climate shocks, this paper proposes parsimonious methods to estimate
high-dimensional panel VARs. We assume that coefficients associated with
domestic lagged endogenous variables arise from a Gaussian mixture model while
further parsimony is achieved using suitable global-local shrinkage priors on
several regions of the parameter space. Our results point towards pronounced
global reactions of key macroeconomic quantities to climate shocks. Moreover,
the empirical findings highlight substantial linkages between regionally
located climate shifts and global commodity markets.

arXiv link: http://arxiv.org/abs/1804.01554v3

Econometrics arXiv updated paper (originally submitted: 2018-04-04)

Should We Adjust for the Test for Pre-trends in Difference-in-Difference Designs?

Authors: Jonathan Roth

The common practice in difference-in-difference (DiD) designs is to check for
parallel trends prior to treatment assignment, yet typical estimation and
inference does not account for the fact that this test has occurred. I analyze
the properties of the traditional DiD estimator conditional on having passed
(i.e. not rejected) the test for parallel pre-trends. When the DiD design is
valid and the test for pre-trends confirms it, the typical DiD estimator is
unbiased, but traditional standard errors are overly conservative.
Additionally, there exists an alternative unbiased estimator that is more
efficient than the traditional DiD estimator under parallel trends. However,
when in population there is a non-zero pre-trend but we fail to reject the
hypothesis of parallel pre-trends, the DiD estimator is generally biased
relative to the population DiD coefficient. Moreover, if the trend is monotone,
then under reasonable assumptions the bias from conditioning exacerbates the
bias relative to the true treatment effect. I propose new estimation and
inference procedures that account for the test for parallel trends, and compare
their performance to that of the traditional estimator in a Monte Carlo
simulation.
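
As a quick illustration of the conditioning issue analyzed here, the hedged
simulation below uses a three-period design with a monotone (linear) violation
of parallel trends and a true treatment effect of zero, and compares the mean
DiD estimate over all draws with its mean over the draws that pass a 5%
pre-trend test; the noise level, trend size, and test form are illustrative
choices rather than the paper's setup.

    import numpy as np

    rng = np.random.default_rng(1)
    n_sims, sigma, slope, effect = 50_000, 1.0, 1.5, 0.0
    all_did, passed = [], []
    for _ in range(n_sims):
        # treated-minus-control outcome gap in periods t = -1, 0, 1,
        # with a linear violation of parallel trends of size `slope`
        e = sigma * rng.standard_normal(3)
        gap = slope * np.array([-1.0, 0.0, 1.0]) + e
        gap[2] += effect
        pretrend = gap[1] - gap[0]                      # estimated pre-trend
        did = gap[2] - gap[1]                           # conventional DiD estimate
        all_did.append(did)
        if abs(pretrend) < 1.96 * np.sqrt(2) * sigma:   # pre-test does not reject
            passed.append(did)
    print("true effect:                      %.2f" % effect)
    print("mean DiD, all draws:              %.2f" % np.mean(all_did))
    print("mean DiD, conditional on passing: %.2f" % np.mean(passed))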

arXiv link: http://arxiv.org/abs/1804.01208v2

Econometrics arXiv updated paper (originally submitted: 2018-04-01)

Continuous Record Laplace-based Inference about the Break Date in Structural Change Models

Authors: Alessandro Casini, Pierre Perron

Building upon the continuous record asymptotic framework recently introduced
by Casini and Perron (2018a) for inference in structural change models, we
propose a Laplace-based (Quasi-Bayes) procedure for the construction of the
estimate and confidence set for the date of a structural change. It is defined
by an integration rather than an optimization-based method. A transformation of
the least-squares criterion function is evaluated in order to derive a proper
distribution, referred to as the Quasi-posterior. For a given choice of a loss
function, the Laplace-type estimator is the minimizer of the expected risk with
the expectation taken under the Quasi-posterior. Besides providing an
alternative estimate that is more precise (lower mean absolute error (MAE) and
lower root-mean squared error (RMSE)) than the usual least-squares one, the
Quasi-posterior distribution can be used to construct asymptotically valid
inference using the concept of Highest Density Region. The resulting
Laplace-based inferential procedure is shown to have lower MAE and RMSE, and
the confidence sets strike the best balance between empirical coverage rates
and average lengths of the confidence sets relative to traditional long-span
methods, whether the break size is small or large.
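
To convey the flavor of an integration-based rather than optimization-based
estimate, here is a minimal sketch for a simple mean-shift model: the
least-squares criterion over candidate break dates is transformed into a
quasi-posterior, the Laplace-type estimate under quadratic loss is its mean,
and a crude highest-density region is read off from it. The scaling of the
criterion and the mean-shift design are simplifying assumptions, not the
paper's continuous record construction.

    import numpy as np

    rng = np.random.default_rng(2)
    T, true_break = 200, 120
    y = np.where(np.arange(T) < true_break, 0.0, 0.8) + rng.standard_normal(T)

    # least-squares criterion for every candidate break date (mean-shift model)
    candidates = np.arange(10, T - 10)
    ssr = np.array([((y[:k] - y[:k].mean()) ** 2).sum() + ((y[k:] - y[k:].mean()) ** 2).sum()
                    for k in candidates])

    # transform the criterion into a proper distribution (the quasi-posterior)
    sigma2 = ssr.min() / T
    weights = np.exp(-(ssr - ssr.min()) / (2 * sigma2))
    quasi_post = weights / weights.sum()

    # Laplace-type estimate under quadratic loss = quasi-posterior mean
    laplace_est = float(np.sum(candidates * quasi_post))
    # crude 95% highest-density region over candidate dates
    order = np.argsort(quasi_post)[::-1]
    keep = order[:np.searchsorted(np.cumsum(quasi_post[order]), 0.95) + 1]
    hdr = np.sort(candidates[keep])
    print("least-squares break date estimate:", candidates[np.argmin(ssr)])
    print("Laplace-type estimate: %.1f" % laplace_est)
    print("95%% HDR: %d ... %d" % (hdr.min(), hdr.max()))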

arXiv link: http://arxiv.org/abs/1804.00232v3

Econometrics arXiv paper, submitted: 2018-03-29

Mortality in a heterogeneous population - Lee-Carter's methodology

Authors: Kamil Jodź

The EU Solvency II directive recommends that insurance companies pay more
attention to risk management methods. The essence of risk management is the
ability to quantify risk and apply methods that reduce uncertainty. In life
insurance, the risk is a consequence of the random variable describing life
expectancy. This article presents a proposal for stochastic mortality modeling
based on the Lee-Carter methodology. The maximum likelihood method is often
used to estimate parameters in mortality models; it assumes that the
population is homogeneous and that the number of deaths follows a Poisson
distribution. The aim of this article is to change the assumptions about the
distribution of the number of deaths. The results indicate that the model
achieves a better fit to historical data when the number of deaths follows a
negative binomial distribution.
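
For reference, the Lee-Carter structure that the article builds on models log
death rates as log m_{x,t} = a_x + b_x k_t. The sketch below fits that baseline
structure to synthetic rates with the usual SVD approach; it does not implement
the article's negative binomial likelihood for death counts, and the synthetic
data and normalisations are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(3)
    ages, years = 20, 30
    # synthetic log death rates with an age profile and a declining period index
    a_true = np.linspace(-6.0, -2.0, ages)
    b_true = np.full(ages, 1.0 / ages)
    k_true = np.linspace(5.0, -5.0, years)
    log_m = a_true[:, None] + np.outer(b_true, k_true) + 0.05 * rng.standard_normal((ages, years))

    # Lee-Carter: log m_{x,t} = a_x + b_x * k_t, fitted by centring and a rank-one SVD
    a_hat = log_m.mean(axis=1)
    U, s, Vt = np.linalg.svd(log_m - a_hat[:, None], full_matrices=False)
    b_hat = U[:, 0] / U[:, 0].sum()          # usual normalisation: the b_x sum to one
    k_hat = s[0] * Vt[0] * U[:, 0].sum()     # the k_t then sum to (approximately) zero
    print("fitted period index k_t (first five):", np.round(k_hat[:5], 2))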

arXiv link: http://arxiv.org/abs/1803.11233v1

Econometrics arXiv updated paper (originally submitted: 2018-03-29)

Bi-Demographic Changes and Current Account using SVAR Modeling

Authors: Hassan B. Ghassan, Hassan R. Al-Hajhoj, Faruk Balli

The paper aims to explore the impacts of bi-demographic structure on the
current account and growth. Using SVAR modeling, we track the dynamic impacts
between these underlying variables. New insights are developed about the
dynamic interrelation between population growth, the current account, and
economic growth. The long-run net impact on economic growth of domestic
working-population growth and of labor demand for emigrants is positive, due
to the predominant contribution of skilled emigrant workers. Moreover, the
positive long-run contribution of emigrant workers to current account growth
largely compensates for the negative contribution from the native population,
because skilled workers predominate over unskilled ones. We find that a
positive shock to labor demand for emigrant workers increases the native
active-age ratio. Thus, emigrants appear to be complements rather than
substitutes for native workers.

arXiv link: http://arxiv.org/abs/1803.11161v4

Econometrics arXiv updated paper (originally submitted: 2018-03-29)

Tests for Forecast Instability and Forecast Failure under a Continuous Record Asymptotic Framework

Authors: Alessandro Casini

We develop a novel continuous-time asymptotic framework for inference on
whether the predictive ability of a given forecast model remains stable over
time. We formally define forecast instability from the economic forecaster's
perspective and highlight that the time duration of the instability bears no
relationship to the length of the stable period. Our approach is applicable in
forecasting environments involving low-frequency as well as high-frequency
macroeconomic and financial variables. As the sampling interval between
observations shrinks to zero, the sequence of forecast losses is approximated
by a continuous-time stochastic process (i.e., an Ito semimartingale)
possessing certain pathwise properties. We build a hypothesis testing problem
based on the local properties of the continuous-time limit counterpart of the
sequence of losses. The null distribution follows an extreme value
distribution. While controlling the statistical size well, our class of test
statistics features uniform power over the location of the forecast failure in
the sample. The test statistics are designed to have power against general
forms of instability and are robust to common forms of non-stationarity such
as heteroskedasticity and serial correlation. The gains in power are
substantial relative to extant methods, especially when the instability is
short-lasting and occurs toward the end of the sample.

arXiv link: http://arxiv.org/abs/1803.10883v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2018-03-28

Continuous Record Asymptotics for Change-Points Models

Authors: Alessandro Casini, Pierre Perron

For a partial structural change in a linear regression model with a single
break, we develop a continuous record asymptotic framework to build inference
methods for the break date. We have T observations with a sampling frequency h
over a fixed time horizon [0, N], and let T → ∞ with h → 0 while keeping the time
span N fixed. We impose very mild regularity conditions on an underlying
continuous-time model assumed to generate the data. We consider the
least-squares estimate of the break date and establish consistency and
convergence rate. We provide a limit theory for shrinking magnitudes of shifts
and locally increasing variances. The asymptotic distribution corresponds to
the location of the extremum of a function of the quadratic variation of the
regressors and of a Gaussian centered martingale process over a certain time
interval. We can account for the asymmetric informational content provided by
the pre- and post-break regimes and show how the location of the break and
shift magnitude are key ingredients in shaping the distribution. We consider a
feasible version based on plug-in estimates, which provides a very good
approximation to the finite sample distribution. We use the concept of Highest
Density Region to construct confidence sets. Overall, our method is reliable
and delivers accurate coverage probabilities and relatively short average
length of the confidence sets. Importantly, it does so irrespective of the size
of the break.

arXiv link: http://arxiv.org/abs/1803.10881v3

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2018-03-28

Generalized Laplace Inference in Multiple Change-Points Models

Authors: Alessandro Casini, Pierre Perron

Under the classical long-span asymptotic framework we develop a class of
Generalized Laplace (GL) inference methods for the change-point dates in a
linear time series regression model with multiple structural changes analyzed
in, e.g., Bai and Perron (1998). The GL estimator is defined by an integration
rather than optimization-based method and relies on the least-squares criterion
function. It is interpreted as a classical (non-Bayesian) estimator and the
inference methods proposed retain a frequentist interpretation. This approach
provides a better characterization of the uncertainty in the data about the
change-points than existing methods. On the theoretical side, depending
on some input (smoothing) parameter, the class of GL estimators exhibits a dual
limiting distribution; namely, the classical shrinkage asymptotic distribution,
or a Bayes-type asymptotic distribution. We propose an inference method based
on Highest Density Regions using the latter distribution. We show that it has
attractive theoretical properties not shared by the other popular alternatives,
i.e., it is bet-proof. Simulations confirm that these theoretical properties
translate to better finite-sample performance.

arXiv link: http://arxiv.org/abs/1803.10871v4

Econometrics arXiv cross-link from physics.soc-ph (physics.soc-ph), submitted: 2018-03-27

Emergence of Cooperation in the thermodynamic limit

Authors: Colin Benjamin, Shubhayan Sarkar

Predicting how cooperative behavior arises in the thermodynamic limit is one
of the outstanding problems in evolutionary game theory. For two player games,
cooperation is seldom the Nash equilibrium. However, in the thermodynamic limit
cooperation is the natural recourse regardless of whether we are dealing with
humans or animals. In this work, we use the analogy with the Ising model to
predict how cooperation arises in the thermodynamic limit.

arXiv link: http://arxiv.org/abs/1803.10083v2

Econometrics arXiv paper, submitted: 2018-03-27

A Perfect Specialization Model for Gravity Equation in Bilateral Trade based on Production Structure

Authors: Majid Einian, Farshad Ranjbar Ravasan

Although it originated as a purely empirical relationship to explain the
volume of trade between two partners, the gravity equation has been the focus
of several theoretical models that try to explain it. Specialization models
are of great importance in providing a solid theoretical ground for the
gravity equation in bilateral trade. Some research papers try to improve
specialization models by adding imperfect specialization, but we argue that
this is an unnecessary complication. We provide a perfect specialization model
based on a phenomenon we call tradability, which overcomes the problems of the
simpler initial models. We provide empirical evidence, using estimates on
panel data covering the bilateral trade of 40 countries over 10 years, that
supports the theoretical model. The empirical results imply that tradability
is the only reason for deviations of the data from basic perfect
specialization models.

arXiv link: http://arxiv.org/abs/1803.09935v1

Econometrics arXiv updated paper (originally submitted: 2018-03-26)

Panel Data Analysis with Heterogeneous Dynamics

Authors: Ryo Okui, Takahide Yanagi

This paper proposes a model-free approach to analyze panel data with
heterogeneous dynamic structures across observational units. We first compute
the sample mean, autocovariances, and autocorrelations for each unit, and then
estimate the parameters of interest based on their empirical distributions. We
then investigate the asymptotic properties of our estimators using double
asymptotics and propose split-panel jackknife bias correction and inference
based on the cross-sectional bootstrap. We illustrate the usefulness of our
procedures by studying the deviation dynamics of the law of one price. Monte
Carlo simulations confirm that the proposed bias correction is effective and
yields valid inference in small samples.
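
A minimal sketch of the model-free flavor of this approach, on a simulated
heterogeneous AR(1) panel, computes unit-level means and first-order
autocorrelations and then applies the cross-sectional bootstrap to the mean
autocorrelation; the simulated design is an assumption, and the split-panel
jackknife bias correction is omitted.

    import numpy as np

    rng = np.random.default_rng(4)
    N, T = 300, 40
    rho = rng.uniform(0.1, 0.8, N)                 # heterogeneous AR(1) coefficients
    y = np.zeros((N, T))
    for t in range(1, T):
        y[:, t] = rho * y[:, t - 1] + rng.standard_normal(N)

    # unit-level summary statistics
    means = y.mean(axis=1)
    ac1 = np.array([np.corrcoef(y[i, 1:], y[i, :-1])[0, 1] for i in range(N)])

    # cross-sectional bootstrap for the mean first-order autocorrelation
    boot = np.array([ac1[rng.integers(0, N, N)].mean() for _ in range(2000)])
    lo, hi = np.quantile(boot, [0.025, 0.975])
    print("cross-sectional s.d. of unit means: %.3f" % means.std())
    print("mean AC(1): %.3f, 95%% bootstrap interval: [%.3f, %.3f]" % (ac1.mean(), lo, hi))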

arXiv link: http://arxiv.org/abs/1803.09452v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2018-03-24

Efficient Discovery of Heterogeneous Quantile Treatment Effects in Randomized Experiments via Anomalous Pattern Detection

Authors: Edward McFowland III, Sriram Somanchi, Daniel B. Neill

In the recent literature on estimating heterogeneous treatment effects, each
proposed method makes its own set of restrictive assumptions about the
intervention's effects and which subpopulations to explicitly estimate.
Moreover, the majority of the literature provides no mechanism to identify
which subpopulations are the most affected--beyond manual inspection--and
provides little guarantee on the correctness of the identified subpopulations.
Therefore, we propose Treatment Effect Subset Scan (TESS), a new method for
discovering which subpopulation in a randomized experiment is most
significantly affected by a treatment. We frame this challenge as a pattern
detection problem where we efficiently maximize a nonparametric scan statistic
(a measure of the conditional quantile treatment effect) over subpopulations.
Furthermore, we identify the subpopulation which experiences the largest
distributional change as a result of the intervention, while making minimal
assumptions about the intervention's effects or the underlying data generating
process. In addition to the algorithm, we demonstrate that under the sharp null
hypothesis of no treatment effect, the asymptotic Type I and II error can be
controlled, and provide sufficient conditions for detection consistency--i.e.,
exact identification of the affected subpopulation. Finally, we validate the
efficacy of the method by discovering heterogeneous treatment effects in
simulations and in real-world data from a well-known program evaluation study.
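
The toy sketch below conveys the search-and-score idea on a single categorical
covariate only: it brute-forces all subsets of the covariate's levels and
scores each candidate subpopulation with a Wilcoxon rank-sum z-statistic
comparing treated and control outcomes. TESS itself maximizes a nonparametric
scan statistic efficiently over a far richer class of subpopulations, so the
simulated data and the rank-based score here are assumptions of the
illustration, not the authors' algorithm.

    import numpy as np
    from itertools import combinations
    from scipy.stats import rankdata

    rng = np.random.default_rng(5)
    n = 4000
    level = rng.integers(0, 5, n)                 # one categorical covariate with 5 levels
    treat = rng.integers(0, 2, n)
    y = rng.standard_normal(n)
    y = y + 0.5 * treat * np.isin(level, [1, 3])  # effect concentrated in levels {1, 3}

    def rank_sum_z(y1, y0):
        """Wilcoxon rank-sum z-score comparing treated and control outcomes."""
        n1, n0 = len(y1), len(y0)
        ranks = rankdata(np.concatenate([y1, y0]))
        r1 = ranks[:n1].sum()
        mean, var = n1 * (n1 + n0 + 1) / 2, n1 * n0 * (n1 + n0 + 1) / 12
        return (r1 - mean) / np.sqrt(var)

    best_subset, best_score = None, -np.inf
    for r in range(1, 6):
        for subset in combinations(range(5), r):
            mask = np.isin(level, subset)
            z = rank_sum_z(y[mask & (treat == 1)], y[mask & (treat == 0)])
            if z > best_score:
                best_subset, best_score = subset, z
    print("most affected subpopulation (levels %s), z = %.2f" % (best_subset, best_score))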

arXiv link: http://arxiv.org/abs/1803.09159v3

Econometrics arXiv updated paper (originally submitted: 2018-03-24)

Schooling Choice, Labour Market Matching, and Wages

Authors: Jacob Schwartz

We develop inference for a two-sided matching model where the characteristics
of agents on one side of the market are endogenous due to pre-matching
investments. The model can be used to measure the impact of frictions in labour
markets using a single cross-section of matched employer-employee data. The
observed matching of workers to firms is the outcome of a discrete, two-sided
matching process where firms with heterogeneous preferences over education
sequentially choose workers according to an index correlated with worker
preferences over firms. The distribution of education arises in equilibrium
from a Bayesian game: workers, knowing the distribution of worker and firm
types, invest in education prior to the matching process. Although the observed
matching exhibits strong cross-sectional dependence due to the matching
process, we propose an asymptotically valid inference procedure that combines
discrete choice methods with simulation.

arXiv link: http://arxiv.org/abs/1803.09020v6

Econometrics arXiv updated paper (originally submitted: 2018-03-23)

Difference-in-Differences with Multiple Time Periods

Authors: Brantly Callaway, Pedro H. C. Sant'Anna

In this article, we consider identification, estimation, and inference
procedures for treatment effect parameters using Difference-in-Differences
(DiD) with (i) multiple time periods, (ii) variation in treatment timing, and
(iii) when the "parallel trends assumption" holds potentially only after
conditioning on observed covariates. We show that a family of causal effect
parameters are identified in staggered DiD setups, even if differences in
observed characteristics create non-parallel outcome dynamics between groups.
Our identification results allow one to use outcome regression, inverse
probability weighting, or doubly-robust estimands. We also propose different
aggregation schemes that can be used to highlight treatment effect
heterogeneity across different dimensions as well as to summarize the overall
effect of participating in the treatment. We establish the asymptotic
properties of the proposed estimators and prove the validity of a
computationally convenient bootstrap procedure to conduct asymptotically valid
simultaneous (instead of pointwise) inference. Finally, we illustrate the
relevance of our proposed tools by analyzing the effect of the minimum wage on
teen employment from 2001--2007. Open-source software is available for
implementing the proposed methods.
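
To make the group-time building block concrete, the hedged sketch below
computes ATT(g, t) for each cohort and post-treatment period on simulated
staggered-adoption data as a simple 2x2 comparison against the never-treated
group, with period g-1 as the base period. Covariate conditioning, the
doubly-robust estimands, the aggregation schemes, and the bootstrap are all
omitted; the att() helper and the simulated design are illustrative, not the
authors' open-source implementation.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(6)
    units, periods = 500, 6
    g = rng.choice([0, 3, 4], size=units)            # treatment cohort (0 = never treated)
    df = pd.DataFrame([(i, t, g[i]) for i in range(units) for t in range(1, periods + 1)],
                      columns=["id", "t", "g"])
    effect = 1.0 * ((df.g > 0) & (df.t >= df.g)) * (df.t - df.g + 1)
    df["y"] = 0.5 * df.t + effect + rng.standard_normal(len(df))

    def att(gg, tt):
        """ATT(g, t): 2x2 DiD of cohort g against the never-treated, base period g-1."""
        base = gg - 1
        diff = lambda grp: (df[(df.g == grp) & (df.t == tt)].y.mean()
                            - df[(df.g == grp) & (df.t == base)].y.mean())
        return diff(gg) - diff(0)

    for gg in (3, 4):
        for tt in range(gg, periods + 1):
            print("ATT(g=%d, t=%d) = %.2f" % (gg, tt, att(gg, tt)))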

arXiv link: http://arxiv.org/abs/1803.09015v4

Econometrics arXiv updated paper (originally submitted: 2018-03-23)

How does monetary policy affect income inequality in Japan? Evidence from grouped data

Authors: Martin Feldkircher, Kazuhiko Kakamu

We examine the effects of monetary policy on income inequality in Japan using
a novel econometric approach that jointly estimates the Gini coefficient based
on micro-level grouped data of households and the dynamics of macroeconomic
quantities. Our results indicate different effects on income inequality for
different types of households: A monetary tightening increases inequality when
income data is based on households whose head is employed (workers'
households), while the effect reverses over the medium term when considering a
broader definition of households. Differences in the relative strength of the
transmission channels can account for this finding. Finally, we demonstrate
that the proposed joint estimation strategy leads to more informative
inference, whereas the frequently used two-step estimation approach yields
inconclusive results.

arXiv link: http://arxiv.org/abs/1803.08868v2

Econometrics arXiv updated paper (originally submitted: 2018-03-23)

Decentralized Pure Exchange Processes on Networks

Authors: Daniele Cassese, Paolo Pin

We define a class of pure exchange Edgeworth trading processes that under
minimal assumptions converge to a stable set in the space of allocations, and
characterise the Pareto set of these processes. Choosing a specific process
belonging to this class, that we define fair trading, we analyse the trade
dynamics between agents located on a weighted network. We determine the
conditions under which there always exists a one-to-one map between the set of
networks and the set of limit points of the dynamics. This result is used to
understand the effect of the network topology on the trade dynamics and on the
final allocation. We find that positions in the network affect the
distribution of the utility gains, given the initial allocations.

arXiv link: http://arxiv.org/abs/1803.08836v7

Econometrics arXiv paper, submitted: 2018-03-22

Causal Inference for Survival Analysis

Authors: Vikas Ramachandra

In this paper, we propose the use of causal inference techniques for survival
function estimation and prediction for subgroups of the data, up to individual
units. Tree ensemble methods, specifically random forests, were modified for
this purpose. A real-world healthcare dataset of about 1800 breast cancer
patients was used, with multiple patient covariates as well as disease-free
survival days (DFS) and a binary death-event indicator (y). We use the type of
curative cancer intervention as the treatment variable (T=0 or 1, the binary
treatment case in our example). The algorithm is a two-step approach. In
step 1, we estimate heterogeneous treatment effects using a causalTree with the
DFS as the dependent variable. Next, in step 2, for each selected leaf of the
causalTree with distinctly different average treatment effect (with respect to
survival), we fit a survival forest to all the patients in that leaf, one
forest each for treatment T=0 as well as T=1 to get estimated patient level
survival curves for each treatment (more generally, any model can be used at
this step). Then, we subtract the patient level survival curves to get the
differential survival curve for a given patient, to compare the survival
function as a result of the 2 treatments. The path to a selected leaf also
gives us the combination of patient features and their values which are
causally important for the treatment effect difference at the leaf.
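
A compact way to see the two-step logic is the sketch below on simulated data:
step 1 picks a single covariate split that maximises the difference in naive
treated-minus-control gaps (a stand-in for the causalTree step), and step 2
computes a hand-rolled Kaplan-Meier curve per treatment arm within each leaf
and reports their difference (a stand-in for the survival forests). The
simulated data, the one-split heuristic, and the Kaplan-Meier substitute are
assumptions of the sketch rather than the paper's implementation.

    import numpy as np

    rng = np.random.default_rng(7)
    n = 2000
    x = rng.uniform(0, 1, n)                       # a single patient covariate
    treat = rng.integers(0, 2, n)
    # treatment lowers the hazard only when x > 0.5; exponential times with censoring
    rate = 0.10 - 0.05 * treat * (x > 0.5)
    t_event = rng.exponential(1 / rate)
    t_cens = rng.exponential(40, n)
    t_obs, event = np.minimum(t_event, t_cens), (t_event <= t_cens)

    def km_curve(t, d, grid):
        """Hand-rolled Kaplan-Meier survival curve evaluated on a time grid."""
        order = np.argsort(t)
        t, d = t[order], d[order]
        at_risk, s, times, vals = len(t), 1.0, [], []
        for ti, di in zip(t, d):
            if di:
                s *= 1.0 - 1.0 / at_risk
            at_risk -= 1
            times.append(ti)
            vals.append(s)
        times, vals = np.array(times), np.array(vals)
        return np.array([vals[times <= g][-1] if np.any(times <= g) else 1.0 for g in grid])

    # step 1 (simplified): a single "causal" split on x chosen to maximise the
    # difference in naive treated-minus-control gaps in observed time between leaves
    best_c, best_gap = None, -np.inf
    for c in np.quantile(x, np.linspace(0.1, 0.9, 17)):
        gaps = [t_obs[leaf & (treat == 1)].mean() - t_obs[leaf & (treat == 0)].mean()
                for leaf in (x <= c, x > c)]
        if abs(gaps[0] - gaps[1]) > best_gap:
            best_c, best_gap = c, abs(gaps[0] - gaps[1])

    # step 2: survival curves by treatment arm within each leaf, then their difference
    grid = np.linspace(5, 30, 6)
    for leaf, name in ((x <= best_c, "x <= %.2f" % best_c), (x > best_c, "x > %.2f" % best_c)):
        s1 = km_curve(t_obs[leaf & (treat == 1)], event[leaf & (treat == 1)], grid)
        s0 = km_curve(t_obs[leaf & (treat == 0)], event[leaf & (treat == 0)], grid)
        print(name, "differential survival:", np.round(s1 - s0, 2))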

arXiv link: http://arxiv.org/abs/1803.08218v1

Econometrics arXiv updated paper (originally submitted: 2018-03-21)

Two-way fixed effects estimators with heterogeneous treatment effects

Authors: Clément de Chaisemartin, Xavier D'Haultfœuille

Linear regressions with period and group fixed effects are widely used to
estimate treatment effects. We show that they estimate weighted sums of the
average treatment effects (ATE) in each group and period, with weights that may
be negative. Due to the negative weights, the linear regression coefficient may
for instance be negative while all the ATEs are positive. We propose another
estimator that solves this issue. In the two applications we revisit, it is
significantly different from the linear regression estimator.
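
A two-group, three-period example makes the negative-weight phenomenon easy to
see. By the usual Frisch-Waugh logic, the two-way fixed effects coefficient
attaches to each treated cell a weight proportional to the two-way-demeaned
treatment indicator; the sketch below computes these weights directly for a
small staggered design, and one of them comes out negative. The tiny balanced
design is an illustrative assumption.

    import numpy as np
    import pandas as pd

    # staggered adoption: group 1 treated in periods 2-3, group 2 treated in period 3 only
    df = pd.DataFrame({"g": [1, 1, 1, 2, 2, 2],
                       "t": [1, 2, 3, 1, 2, 3],
                       "D": [0, 1, 1, 0, 0, 1]})

    # two-way demeaned treatment indicator (the regressor residual from the fixed effects)
    eps = (df.D - df.groupby("g").D.transform("mean")
                - df.groupby("t").D.transform("mean") + df.D.mean())

    # weights attached to the ATE of each treated (group, period) cell; they sum to one
    treated = df.D == 1
    weights = eps[treated] / eps[treated].sum()
    print(pd.DataFrame({"g": df.g[treated], "t": df.t[treated], "weight": weights.round(3)}))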

arXiv link: http://arxiv.org/abs/1803.08807v7

Econometrics arXiv updated paper (originally submitted: 2018-03-21)

Network and Panel Quantile Effects Via Distribution Regression

Authors: Victor Chernozhukov, Iván Fernández-Val, Martin Weidner

This paper provides a method to construct simultaneous confidence bands for
quantile functions and quantile effects in nonlinear network and panel models
with unobserved two-way effects, strictly exogenous covariates, and possibly
discrete outcome variables. The method is based upon projection of simultaneous
confidence bands for distribution functions constructed from fixed effects
distribution regression estimators. These fixed effects estimators are debiased
to deal with the incidental parameter problem. Under asymptotic sequences where
both dimensions of the data set grow at the same rate, the confidence bands for
the quantile functions and effects have correct joint coverage in large
samples. An empirical application to gravity models of trade illustrates the
applicability of the methods to network data.

arXiv link: http://arxiv.org/abs/1803.08154v3

Econometrics arXiv updated paper (originally submitted: 2018-03-21)

Testing Continuity of a Density via g-order statistics in the Regression Discontinuity Design

Authors: Federico A. Bugni, Ivan A. Canay

In the regression discontinuity design (RDD), it is common practice to assess
the credibility of the design by testing the continuity of the density of the
running variable at the cut-off, e.g., McCrary (2008). In this paper we propose
an approximate sign test for continuity of a density at a point based on the
so-called g-order statistics, and study its properties under two complementary
asymptotic frameworks. In the first asymptotic framework, the number q of
observations local to the cut-off is fixed as the sample size n diverges to
infinity, while in the second framework q diverges to infinity slowly as n
diverges to infinity. Under both of these frameworks, we show that the test we
propose is asymptotically valid in the sense that it has limiting rejection
probability under the null hypothesis not exceeding the nominal level. More
importantly, the test is easy to implement, asymptotically valid under weaker
conditions than those used by competing methods, and exhibits finite sample
validity under stronger conditions than those needed for its asymptotic
validity. In a simulation study, we find that the approximate sign test
provides good control of the rejection probability under the null hypothesis
while remaining competitive under the alternative hypothesis. We finally apply
our test to the design in Lee (2008), a well-known application of the RDD to
study incumbency advantage.
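
A stylized version of the idea fits in a few lines: take the q observations of
the running variable closest to the cut-off and note that, if the density is
continuous there, the number falling to the right is approximately
Binomial(q, 1/2), which yields a simple sign test. The sketch below ignores the
paper's g-order-statistics construction and its asymptotic frameworks, and the
simulated running variable and the choice q = 50 are assumptions.

    import numpy as np
    from scipy.stats import binom

    rng = np.random.default_rng(12)
    running = rng.standard_normal(3000) + 0.2      # running variable, cut-off at zero
    cutoff, q = 0.0, 50

    # the q observations closest to the cut-off, and how many lie to its right
    nearest = running[np.argsort(np.abs(running - cutoff))[:q]]
    k = int((nearest >= cutoff).sum())

    # two-sided exact binomial (sign) test of H0: the density is continuous at the cut-off
    p_value = min(1.0, 2 * min(binom.cdf(k, q, 0.5), binom.sf(k - 1, q, 0.5)))
    print("right-of-cutoff count: %d of %d, p-value: %.3f" % (k, q, p_value))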

arXiv link: http://arxiv.org/abs/1803.07951v6

Econometrics arXiv updated paper (originally submitted: 2018-03-20)

Testing for Unobserved Heterogeneous Treatment Effects with Observational Data

Authors: Yu-Chin Hsu, Ta-Cheng Huang, Haiqing Xu

Unobserved heterogeneous treatment effects have been emphasized in the recent
policy evaluation literature (see e.g., Heckman and Vytlacil, 2005). This paper
proposes a nonparametric test for unobserved heterogeneous treatment effects in
a treatment effect model with a binary treatment assignment, allowing for
individuals' self-selection into the treatment. Under the standard local average
treatment effects assumptions, i.e., the no defiers condition, we derive
testable model restrictions for the hypothesis of unobserved heterogeneous
treatment effects. Also, we show that if the treatment outcomes satisfy a
monotonicity assumption, these model restrictions are also sufficient. Then, we
propose a modified Kolmogorov-Smirnov-type test which is consistent and simple
to implement. Monte Carlo simulations show that our test performs well in
finite samples. For illustration, we apply our test to study heterogeneous
treatment effects of the Job Training Partnership Act on earnings and the
impacts of fertility on family income; the null hypothesis of homogeneous
treatment effects is rejected in the second application but not in the first.

arXiv link: http://arxiv.org/abs/1803.07514v2

Econometrics arXiv updated paper (originally submitted: 2018-03-19)

Adversarial Generalized Method of Moments

Authors: Greg Lewis, Vasilis Syrgkanis

We provide an approach for learning deep neural net representations of models
described via conditional moment restrictions. Conditional moment restrictions
are widely used, as they are the language by which social scientists describe
the assumptions they make to enable causal inference. We formulate the problem
of estimating the underlying model as a zero-sum game between a modeler and an
adversary and apply adversarial training. Our approach is similar in nature to
Generative Adversarial Networks (GAN), though here the modeler is learning a
representation of a function that satisfies a continuum of moment conditions
and the adversary is identifying violating moments. We outline ways of
constructing effective adversaries in practice, including kernels centered at
k-means cluster centroids and random forests. We examine the practical performance of
our approach in the setting of non-parametric instrumental variable regression.
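
A toy version of the adversarial idea, in the scalar instrumental variable
setting mentioned above, can be written with a fixed dictionary of test
functions instead of neural networks: the adversary repeatedly selects the test
function of the instrument whose empirical moment is most violated, and the
modeler takes a gradient step on the parameter to reduce that violation. The
RBF dictionary, step size, and simulated design are assumptions of this sketch,
which is far simpler than the paper's GAN-style training.

    import numpy as np

    rng = np.random.default_rng(8)
    n = 5000
    z = rng.standard_normal(n)                       # instrument
    u = rng.standard_normal(n)                       # unobserved confounder
    x = 0.8 * z + 0.6 * u                            # endogenous regressor
    y = 1.5 * x + u + 0.1 * rng.standard_normal(n)   # true coefficient is 1.5

    # fixed dictionary of test functions of the instrument (RBF kernels at z-quantiles)
    centers = np.quantile(z, np.linspace(0.1, 0.9, 9))
    tests = np.exp(-0.5 * (z[:, None] - centers[None, :]) ** 2)

    theta, lr = 0.0, 1.0
    for _ in range(2000):
        resid = y - theta * x
        moments = tests.T @ resid / n                # empirical E[(y - theta x) f_j(z)]
        j = np.argmax(moments ** 2)                  # adversary: worst-violated moment
        grad = -2.0 * moments[j] * (tests[:, j] @ x) / n
        theta -= lr * grad                           # modeler: reduce that violation
    print("adversarial moment estimate of theta: %.3f" % theta)
    print("OLS estimate (biased by endogeneity): %.3f" % (x @ y / (x @ x)))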

arXiv link: http://arxiv.org/abs/1803.07164v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2018-03-18

Large-Scale Dynamic Predictive Regressions

Authors: Daniele Bianchi, Kenichiro McAlinn

We develop a novel "decouple-recouple" dynamic predictive strategy and
contribute to the literature on forecasting and economic decision making in a
data-rich environment. Under this framework, clusters of predictors generate
different latent states in the form of predictive densities that are later
synthesized within an implied time-varying latent factor model. As a result,
the latent inter-dependencies across predictive densities and biases are
sequentially learned and corrected. Unlike sparse modeling and variable
selection procedures, we do not assume a priori that there is a given subset of
active predictors that characterizes the predictive density of a quantity of
interest. We test our procedure by investigating the predictive content of a
large set of financial ratios and macroeconomic variables on both the equity
premium across different industries and the inflation rate in the U.S., two
contexts of topical interest in finance and macroeconomics. We find that our
predictive synthesis framework generates both statistically and economically
significant out-of-sample benefits while maintaining interpretability of the
forecasting variables. In addition, the main empirical results highlight that
our proposed framework outperforms LASSO-type shrinkage regressions,
factor-based dimension reduction, sequential variable selection, and
equal-weighted linear pooling methodologies.

arXiv link: http://arxiv.org/abs/1803.06738v1

Econometrics arXiv paper, submitted: 2018-03-16

Evaluating Conditional Cash Transfer Policies with Machine Learning Methods

Authors: Tzai-Shuen Chen

This paper presents an out-of-sample prediction comparison between major
machine learning models and the structural econometric model. Over the past
decade, machine learning has established itself as a powerful tool in many
prediction applications, but this approach is still not widely adopted in
empirical economic studies. To evaluate the benefits of this approach, I use
the most common machine learning algorithms, CART, C4.5, LASSO, random forest,
and adaboost, to construct prediction models for a cash transfer experiment
conducted by the Progresa program in Mexico, and I compare the prediction
results with those of a previous structural econometric study. Two prediction
tasks are performed in this paper: the out-of-sample forecast and the long-term
within-sample simulation. For the out-of-sample forecast, both the mean
absolute error and the root mean square error of the school attendance rates
found by all machine learning models are smaller than those found by the
structural model. Random forest and adaboost have the highest accuracy for the
individual outcomes of all subgroups. For the long-term within-sample
simulation, the structural model has better performance than do all of the
machine learning models. The poor within-sample fit of the machine learning
models results from the inaccuracy of the income and pregnancy prediction
models. The results show that machine learning models perform better than the
structural model when there is ample data to learn from; however, when the
data are limited, the structural model offers more sensible predictions. The
findings of this paper show promise for adopting machine learning in economic
policy analyses in the era of big data.

arXiv link: http://arxiv.org/abs/1803.06401v1

Econometrics arXiv paper, submitted: 2018-03-16

Business Cycles in Economics

Authors: Viktor O. Ledenyov, Dimitri O. Ledenyov

Business cycles are generated by oscillating macro-, micro-, and nano-economic
output variables in the economy of scale and scope, in the amplitude,
frequency, phase, and time domains. Accurate forward-looking assumptions on
business cycle oscillation dynamics can optimize financial capital investing
and/or borrowing by economic agents in the capital markets. The book's main
objective is to study business cycles in the economy of scale and scope,
formulating the Ledenyov unified business cycles theory within Ledenyov
classic and quantum econodynamics.

arXiv link: http://arxiv.org/abs/1803.06108v1

Econometrics arXiv cross-link from cs.CG (cs.CG), submitted: 2018-03-15

Practical volume computation of structured convex bodies, and an application to modeling portfolio dependencies and financial crises

Authors: Ludovic Cales, Apostolos Chalkis, Ioannis Z. Emiris, Vissarion Fisikopoulos

We examine volume computation of general-dimensional polytopes and more
general convex bodies, defined as the intersection of a simplex with a family of
parallel hyperplanes, and another family of parallel hyperplanes or a family of
concentric ellipsoids. Such convex bodies appear in modeling and predicting
financial crises. The impact of crises on the economy (labor, income, etc.)
makes its detection of prime interest. Certain features of dependencies in the
markets clearly identify times of turmoil. We describe the relationship between
asset characteristics by means of a copula; each characteristic is either a
linear or quadratic form of the portfolio components, hence the copula can be
constructed by computing volumes of convex bodies. We design and implement
practical algorithms in the exact and approximate setting, we experimentally
juxtapose them and study the tradeoff of exactness and accuracy for speed. We
analyze the following methods in order of increasing generality: rejection
sampling relying on uniformly sampling the simplex, which is the fastest
approach, but inaccurate for small volumes; exact formulae based on the
computation of integrals of probability distribution functions; an optimized
Lawrence sign decomposition method, since the polytopes at hand are shown to be
simple; Markov chain Monte Carlo algorithms using random walks based on the
hit-and-run paradigm generalized to nonlinear convex bodies and relying on new
methods for computing an enclosed ball; the latter is experimentally extended to
non-convex bodies with very encouraging results. Our C++ software, based on
CGAL and Eigen and available on github, is shown to be very effective in up to
100 dimensions. Our results offer novel, effective means of computing portfolio
dependencies and an indicator of financial crises, which is shown to correctly
identify past crises.
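
The simplest method listed above, rejection sampling on the simplex, fits in a
few lines: draw uniform points from the unit simplex (normalised exponentials)
and count the fraction that also satisfies the extra linear constraints, which
estimates the relative volume of the cut body. The direction vector and band
below are arbitrary illustrative choices, and, as noted above, this estimator
becomes inaccurate for very small volumes.

    import numpy as np

    rng = np.random.default_rng(9)
    d, n = 10, 200_000
    # uniform samples from the unit simplex {w >= 0, sum(w) = 1} via normalised exponentials
    w = rng.exponential(size=(n, d))
    w /= w.sum(axis=1, keepdims=True)

    # body of interest: the simplex cut by two parallel hyperplanes lo <= a'w <= hi
    a = np.linspace(0.0, 1.0, d)
    lo, hi = 0.45, 0.55
    inside = (w @ a >= lo) & (w @ a <= hi)
    print("estimated fraction of simplex volume between the hyperplanes: %.4f" % inside.mean())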

arXiv link: http://arxiv.org/abs/1803.05861v1

Econometrics arXiv paper, submitted: 2018-03-15

Are Bitcoin Bubbles Predictable? Combining a Generalized Metcalfe's Law and the LPPLS Model

Authors: Spencer Wheatley, Didier Sornette, Tobias Huber, Max Reppen, Robert N. Gantner

We develop a strong diagnostic for bubbles and crashes in bitcoin, by
analyzing the coincidence (and its absence) of fundamental and technical
indicators. Using a generalized Metcalfe's law based on network properties, a
fundamental value is quantified and shown to be heavily exceeded, on at least
four occasions, by bubbles that grow and burst. In these bubbles, we detect a
universal super-exponential unsustainable growth. We model this universal
pattern with the Log-Periodic Power Law Singularity (LPPLS) model, which
parsimoniously captures diverse positive feedback phenomena, such as herding
and imitation. The LPPLS model is shown to provide an ex-ante warning of market
instabilities, quantifying a high crash hazard and probabilistic bracket of the
crash time consistent with the actual corrections; although, as always, the
precise time and trigger (which straw breaks the camel's back) remain exogenous
and unpredictable. Looking forward, our analysis identifies a substantial but
not unprecedented overvaluation in the price of bitcoin, suggesting many months
of volatile sideways bitcoin prices ahead (from the time of writing, March
2018).
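
For readers unfamiliar with it, the LPPLS specification referenced here models
the expected log-price as a power-law run-up toward a critical time t_c
decorated with log-periodic oscillations. The sketch below only evaluates that
functional form with arbitrary illustrative parameter values; it includes none
of the calibration and diagnostic machinery used in the paper.

    import numpy as np

    def lppls_log_price(t, tc, m, w, A, B, C, phi):
        """LPPLS: ln E[p(t)] = A + B*(tc-t)^m + C*(tc-t)^m * cos(w*ln(tc-t) - phi)."""
        dt = tc - t
        return A + B * dt ** m + C * dt ** m * np.cos(w * np.log(dt) - phi)

    t = np.linspace(0.0, 0.95, 500)                    # time, with critical time tc = 1
    logp = lppls_log_price(t, tc=1.0, m=0.5, w=8.0, A=5.0, B=-1.0, C=0.1, phi=0.0)
    print("log-price at t=0 and near tc:", round(float(logp[0]), 3), round(float(logp[-1]), 3))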

arXiv link: http://arxiv.org/abs/1803.05663v1

Econometrics arXiv paper, submitted: 2018-03-15

Does agricultural subsidies foster Italian southern farms? A Spatial Quantile Regression Approach

Authors: Marusca De Castris, Daniele Di Gennaro

Over the last decades, public policies have become a central pillar in
supporting and stabilising the agricultural sector. In 1962, EU policy-makers
developed the so-called Common Agricultural Policy (CAP) to ensure
competitiveness and a common market organisation for agricultural products,
while the 2003 reform decoupled the CAP from production to focus only on
income stabilisation and the sustainability of the agricultural sector.
Although farmers are highly dependent on public support, the literature on the
role played by the CAP in fostering agricultural performance is still scarce
and fragmented. Current CAP policies increase performance differentials
between Northern-Central EU countries and peripheral regions. This paper aims
to evaluate the effectiveness of the CAP in stimulating performance by
focusing on Italy's lagging regions. Moreover, the agricultural sector is
deeply rooted in place-based production processes, so economic analyses that
omit the presence of spatial dependence produce biased estimates of
performance. Therefore, using data on subsidies and the economic results of
farms from the RICA dataset, which is part of the Farm Accountancy Data
Network (FADN), this paper proposes a spatial augmented Cobb-Douglas
production function to evaluate the effects of subsidies on farm performance.
The major innovation in this paper is the implementation of a micro-founded
quantile version of a spatial lag model to examine how the impact of the
subsidies may vary across the conditional distribution of agricultural
performance. Results show an increasing pattern that switches from negative to
positive at the median and becomes statistically significant for higher
quantiles. Additionally, the spatial autocorrelation parameter is positive and
significant across the whole conditional distribution, suggesting the presence
of significant spatial spillovers in agricultural performance.

arXiv link: http://arxiv.org/abs/1803.05659v1

Econometrics arXiv updated paper (originally submitted: 2018-03-14)

Limitations of P-Values and $R^2$ for Stepwise Regression Building: A Fairness Demonstration in Health Policy Risk Adjustment

Authors: Sherri Rose, Thomas G. McGuire

Stepwise regression building procedures are commonly used applied statistical
tools, despite their well-known drawbacks. While many of their limitations have
been widely discussed in the literature, other aspects of the use of individual
statistical fit measures, especially in high-dimensional stepwise regression
settings, have not. Giving primacy to individual fit, as is done with p-values
and $R^2$, when group fit may be the larger concern, can lead to misguided
decision making. One of the most consequential uses of stepwise regression is
in health care, where these tools allocate hundreds of billions of dollars to
health plans enrolling individuals with different predicted health care costs.
The main goal of this "risk adjustment" system is to convey incentives to
health plans such that they provide health care services fairly, a component of
which is not to discriminate in access or care for persons or groups likely to
be expensive. We address some specific limitations of p-values and $R^2$ for
high-dimensional stepwise regression in this policy problem through an
illustrated example by additionally considering a group-level fairness metric.

arXiv link: http://arxiv.org/abs/1803.05513v2

Econometrics arXiv updated paper (originally submitted: 2018-03-13)

Inference on a Distribution from Noisy Draws

Authors: Koen Jochmans, Martin Weidner

We consider a situation where the distribution of a random variable is being
estimated by the empirical distribution of noisy measurements of that variable.
This is common practice in, for example, teacher value-added models and other
fixed-effect models for panel data. We use an asymptotic embedding where the
noise shrinks with the sample size to calculate the leading bias in the
empirical distribution arising from the presence of noise. The leading bias in
the empirical quantile function is equally obtained. These calculations are new
in the literature, where only results on smooth functionals such as the mean
and variance have been derived. We provide both analytical and jackknife
corrections that recenter the limit distribution and yield confidence intervals
with correct coverage in large samples. Our approach can be connected to
corrections for selection bias and shrinkage estimation and is to be contrasted
with deconvolution. Simulation results confirm the much-improved sampling
behavior of the corrected estimators. An empirical illustration on
heterogeneity in deviations from the law of one price is equally provided.

arXiv link: http://arxiv.org/abs/1803.04991v5

Econometrics arXiv paper, submitted: 2018-03-13

How Smart Are `Water Smart Landscapes'?

Authors: Christa Brelsford, Joshua K. Abbott

Understanding the effectiveness of alternative approaches to water
conservation is crucially important for ensuring the security and reliability
of water services for urban residents. We analyze data from one of the
longest-running "cash for grass" policies - the Southern Nevada Water
Authority's Water Smart Landscapes program, where homeowners are paid to
replace grass with xeric landscaping. We use a twelve year long panel dataset
of monthly water consumption records for 300,000 households in Las Vegas,
Nevada. Utilizing a panel difference-in-differences approach, we estimate the
average water savings per square meter of turf removed. We find that
participation in this program reduced the average treated household's
consumption by 18 percent. We find no evidence that water savings degrade as
the landscape ages, or that water savings per unit area are influenced by the
value of the rebate. Depending on the assumed time horizon of benefits from
turf removal, we find that the WSL program cost the water authority about $1.62
per thousand gallons of water saved, which compares favorably to alternative
means of water conservation or supply augmentation.

arXiv link: http://arxiv.org/abs/1803.04593v1

Econometrics arXiv updated paper (originally submitted: 2018-03-09)

A study of strategy to the remove and ease TBT for increasing export in GCC6 countries

Authors: YongJae Kim

The last remaining trade barriers between countries are Non-Tariff Barriers
(NTBs), meaning all trade barriers other than tariff barriers. The most
typical examples are Technical Barriers to Trade (TBT), which refer to
measures such as technical regulations, standards, conformity assessment
procedures, and testing and certification requirements. Therefore, in order to
eliminate TBT, the WTO has all member countries automatically enter into the
Agreement on TBT.

arXiv link: http://arxiv.org/abs/1803.03394v3

Econometrics arXiv paper, submitted: 2018-03-08

Does the time horizon of the return predictive effect of investor sentiment vary with stock characteristics? A Granger causality analysis in the frequency domain

Authors: Yong Jiang, Zhongbao Zhou

Behavioral theories posit that investor sentiment exhibits predictive power
for stock returns, whereas few studies have investigated how the time horizon
of this predictive effect relates to firm characteristics. To this end, using
the Granger causality analysis in the frequency domain proposed by Lemmens et
al. (2008), this paper examines whether the time horizon of the predictive
effect of investor sentiment on U.S. stock returns varies with firm
characteristics (e.g., firm size (Size), book-to-market equity (B/M) ratio,
operating profitability (OP), and investment (Inv)). The empirical results
indicate that investor sentiment has a long-term (more than 12 months) or
short-term (less than 12 months) predictive effect on stock returns depending
on firm characteristics. Specifically, investor sentiment has strong
predictive power for the returns of smaller-Size, lower-B/M, and lower-OP
stocks in both the short and the long term, but only short-term predictability
for stocks in the higher quantiles. Investor sentiment predicts the returns of
smaller-Inv stocks only in the short term, but has strong short- and long-term
predictive power for larger-Inv stocks. These results have important
implications for investors planning short- and long-run stock investment
strategies.

arXiv link: http://arxiv.org/abs/1803.02962v1

Econometrics arXiv cross-link from cs.CR (cs.CR), submitted: 2018-03-07

A first look at browser-based Cryptojacking

Authors: Shayan Eskandari, Andreas Leoutsarakos, Troy Mursch, Jeremy Clark

In this paper, we examine the recent trend towards in-browser mining of
cryptocurrencies; in particular, the mining of Monero through Coinhive and
similar codebases. In this model, a user visiting a website will download
JavaScript code that executes client-side in her browser, mines a
cryptocurrency, typically without her consent or knowledge, and pays out the
seigniorage to the website. Websites may consciously employ this as an
alternative or to supplement advertisement revenue, may offer premium content
in exchange for mining, or may be unwittingly serving the code as a result of a
breach (in which case the seigniorage is collected by the attacker). The
cryptocurrency Monero is preferred seemingly for its unfriendliness to
large-scale ASIC mining that would drive browser-based efforts out of the
market, as well as for its purported privacy features. In this paper, we survey
this landscape, conduct some measurements to establish its prevalence and
profitability, outline an ethical framework for considering whether it should
be classified as an attack or business opportunity, and make suggestions for
the detection, mitigation and/or prevention of browser-based mining for non-
consenting users.

arXiv link: http://arxiv.org/abs/1803.02887v1

Econometrics arXiv updated paper (originally submitted: 2018-03-06)

Almost Sure Uniqueness of a Global Minimum Without Convexity

Authors: Gregory Cox

This paper establishes that the argmin of a random objective function is
unique almost surely. We first formulate a general result that proves almost
sure uniqueness without convexity of the objective function. The general
result is then applied to a variety of settings in statistics. Four
applications are discussed, including uniqueness of M-estimators (both
classical likelihood and penalized likelihood estimators) and two applications
of the argmin theorem: threshold regression and weak identification.

arXiv link: http://arxiv.org/abs/1803.02415v3

Econometrics arXiv updated paper (originally submitted: 2018-03-06)

A Nonparametric Approach to Measure the Heterogeneous Spatial Association: Under Spatial Temporal Data

Authors: Zihao Yuan

Spatial association and heterogeneity are two critical topics in research on
spatial analysis, geography, statistics, and related fields. Although many
outstanding methods have been proposed and studied, few of them address
spatial association in heterogeneous environments. Additionally, most
traditional methods are based on distance statistics and spatial weight
matrices. However, in some abstract spatial settings, distance statistics
cannot be applied because the geographical locations cannot even be observed
directly. Meanwhile, under these circumstances, because spatial positions are
unobserved, the design of a weight matrix cannot entirely avoid subjectivity.
In this paper, a new entropy-based method, which is data-driven and
distribution-free, is proposed to investigate spatial association while fully
accounting for the fact that heterogeneity is widespread. Specifically, the
method is not tied to distance statistics or weight matrices. Asymmetric
dependence is adopted to reflect the heterogeneity in spatial association for
each individual, and the whole discussion is carried out on spatio-temporal
data assuming only stationarity and m-dependence over time.

arXiv link: http://arxiv.org/abs/1803.02334v2

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2018-03-06

An Online Algorithm for Learning Buyer Behavior under Realistic Pricing Restrictions

Authors: Debjyoti Saharoy, Theja Tulabandhula

We propose a new efficient online algorithm to learn the parameters governing
the purchasing behavior of a utility maximizing buyer, who responds to prices,
in a repeated interaction setting. The key feature of our algorithm is that it
can learn even non-linear buyer utility while working with arbitrary price
constraints that the seller may impose. This overcomes a major shortcoming of
previous approaches, which use unrealistic prices to learn these parameters
making them unsuitable in practice.

arXiv link: http://arxiv.org/abs/1803.01968v1

Econometrics arXiv paper, submitted: 2018-03-05

Testing a Goodwin model with general capital accumulation rate

Authors: Matheus R. Grasselli, Aditya Maheshwari

We perform econometric tests on a modified Goodwin model where the capital
accumulation rate is constant but not necessarily equal to one as in the
original model (Goodwin, 1967). In addition to this modification, we find that
addressing the methodological and reporting issues in Harvie (2000) leads to
remarkably better results, with near perfect agreement between the estimates of
equilibrium employment rates and the corresponding empirical averages, as well
as significantly improved estimates of equilibrium wage shares. Despite its
simplicity and obvious limitations, the performance of the modified Goodwin
model implied by our results shows that it can be used as a starting point for
more sophisticated models for endogenous growth cycles.

arXiv link: http://arxiv.org/abs/1803.01536v1

Econometrics arXiv paper, submitted: 2018-03-05

Pricing Mechanism in Information Goods

Authors: Xinming Li, Huaqing Wang

We study three pricing mechanisms' performance and their effects on the
participants in the data industry from the data supply chain perspective. A
win-win pricing strategy for the players in the data supply chain is proposed.
We obtain analytical solutions for each pricing mechanism, including
decentralized and centralized pricing, Nash bargaining pricing, and a
revenue-sharing mechanism.

arXiv link: http://arxiv.org/abs/1803.01530v1

Econometrics arXiv paper, submitted: 2018-03-05

A comment on 'Testing Goodwin: growth cycles in ten OECD countries'

Authors: Matheus R. Grasselli, Aditya Maheshwari

We revisit the results of Harvie (2000) and show how correcting for a
reporting mistake in some of the estimated parameter values leads to
significantly different conclusions, including realistic parameter values for
the Phillips curve and estimated equilibrium employment rates exhibiting on
average one tenth of the relative error of those obtained in Harvie (2000).

arXiv link: http://arxiv.org/abs/1803.01527v1

Econometrics arXiv updated paper (originally submitted: 2018-03-04)

A Note on Why Geographically Weighted Regression Overcomes Multidimensional-Kernel-Based Varying-Coefficient Model

Authors: Zihao Yuan

It is widely known that geographically weighted regression (GWR) is
essentially the same as a varying-coefficient model. In earlier research on
varying-coefficient models, scholars tend to use multidimensional-kernel-based
locally weighted estimation (MLWE) so that information on both distance and
direction is considered. However, when we construct the local weight matrix of
geographically weighted estimation, distance among the locations in the
neighborhood is the only factor controlling the entries of the weight matrix.
In other words, GWR estimation is distance-kernel-based. Thus, in this paper,
under stationary and limited dependent data with multidimensional subscripts,
we analyze the local mean squared error properties of geographically weighted
estimation without any assumption on the form of the coefficient functions and
compare it with MLWE. According to the theoretical and simulation results,
geographically weighted locally linear estimation (GWLE) is asymptotically
more efficient than MLWE. Furthermore, a relationship between optimal
bandwidth selection and the design of scale parameters is also obtained.

arXiv link: http://arxiv.org/abs/1803.01402v2

Econometrics arXiv updated paper (originally submitted: 2018-03-02)

Permutation Tests for Equality of Distributions of Functional Data

Authors: Federico A. Bugni, Joel L. Horowitz

Economic data are often generated by stochastic processes that take place in
continuous time, though observations may occur only at discrete times. For
example, electricity and gas consumption take place in continuous time. Data
generated by a continuous time stochastic process are called functional data.
This paper is concerned with comparing two or more stochastic processes that
generate functional data. The data may be produced by a randomized experiment
in which there are multiple treatments. The paper presents a method for testing
the hypothesis that the same stochastic process generates all the functional
data. The test described here applies to both functional data and multiple
treatments. It is implemented as a combination of two permutation tests. This
ensures that in finite samples, the true and nominal probabilities that each
test rejects a correct null hypothesis are equal. The paper presents upper and
lower bounds on the asymptotic power of the test under alternative hypotheses.
The results of Monte Carlo experiments and an application to an experiment on
billing and pricing of natural gas illustrate the usefulness of the test.
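
As a bare-bones illustration of permutation testing with functional data
observed on a common grid, the sketch below compares two samples of
discretised curves using the sup-norm distance between their mean functions
and a standard permutation p-value. The paper's procedure is more refined (it
combines two permutation tests so that the finite-sample level is exact and it
handles multiple treatments); the simulated curves and the sup-norm statistic
are assumptions of the sketch.

    import numpy as np

    rng = np.random.default_rng(10)
    n1, n2, grid = 60, 60, 50
    tt = np.linspace(0, 1, grid)
    # two samples of curves observed on a common grid; the second has a small extra drift
    noise = lambda n: rng.standard_normal((n, grid)).cumsum(axis=1) / np.sqrt(grid)
    f1 = np.sin(2 * np.pi * tt) + 0.5 * noise(n1)
    f2 = np.sin(2 * np.pi * tt) + 0.3 * tt + 0.5 * noise(n2)

    def sup_stat(a, b):
        # sup-norm distance between the two sample mean functions
        return np.max(np.abs(a.mean(axis=0) - b.mean(axis=0)))

    pooled, obs = np.vstack([f1, f2]), sup_stat(f1, f2)
    perm = []
    for _ in range(999):
        idx = rng.permutation(len(pooled))
        perm.append(sup_stat(pooled[idx[:n1]], pooled[idx[n1:]]))
    p_value = (1 + np.sum(np.array(perm) >= obs)) / (1 + len(perm))
    print("sup-norm statistic: %.3f, permutation p-value: %.3f" % (obs, p_value))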

arXiv link: http://arxiv.org/abs/1803.00798v4

Econometrics arXiv paper, submitted: 2018-03-01

Deep Learning for Causal Inference

Authors: Vikas Ramachandra

In this paper, we propose deep learning techniques for econometrics,
specifically for causal inference and for estimating individual as well as
average treatment effects. The contribution of this paper is twofold: 1. For
generalized neighbor matching to estimate individual and average treatment
effects, we analyze the use of autoencoders for dimensionality reduction while
maintaining the local neighborhood structure among the data points in the
embedding space. This deep learning based technique is shown to perform better
than simple k nearest neighbor matching for estimating treatment effects,
especially when the data points have several features/covariates but reside in
a low dimensional manifold in high dimensional space. We also observe better
performance than manifold learning methods for neighbor matching. 2. Propensity
score matching is one specific and popular way to perform matching in order to
estimate average and individual treatment effects. We propose the use of deep
neural networks (DNNs) for propensity score matching, and present a network
called PropensityNet for this. This is a generalization of the logistic
regression technique traditionally used to estimate propensity scores and we
show empirically that DNNs perform better than logistic regression at
propensity score matching. Code for both methods will be made available shortly
on Github at: https://github.com/vikas84bf
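
A rough approximation of the second contribution, assuming scikit-learn is
available, is to estimate propensity scores with a small feed-forward network
and then perform one-to-one nearest-neighbour matching on the estimated
scores. The network architecture, the simulated design, and the matching
details below are illustrative stand-ins rather than the paper's PropensityNet.

    import numpy as np
    from sklearn.neural_network import MLPClassifier
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(11)
    n, p = 4000, 10
    X = rng.standard_normal((n, p))
    # nonlinear treatment assignment and outcome; the true treatment effect is 2
    ps_true = 1.0 / (1.0 + np.exp(-(np.sin(X[:, 0]) + X[:, 1] * X[:, 2])))
    T = rng.binomial(1, ps_true)
    y = 2.0 * T + X[:, 0] + X[:, 1] ** 2 + rng.standard_normal(n)

    # propensity scores from a small feed-forward network
    net = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
    ps_hat = net.fit(X, T).predict_proba(X)[:, 1]

    # one-to-one nearest-neighbour matching of treated units to controls on the score
    nn = NearestNeighbors(n_neighbors=1).fit(ps_hat[T == 0].reshape(-1, 1))
    _, idx = nn.kneighbors(ps_hat[T == 1].reshape(-1, 1))
    att = np.mean(y[T == 1] - y[T == 0][idx.ravel()])
    print("naive difference in means: %.2f" % (y[T == 1].mean() - y[T == 0].mean()))
    print("propensity-score matched estimate: %.2f (true effect 2.0)" % att)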

arXiv link: http://arxiv.org/abs/1803.00149v1

Econometrics arXiv paper, submitted: 2018-02-28

Synthetic Control Methods and Big Data

Authors: Daniel Kinn

Many macroeconomic policy questions may be assessed in a case study
framework, where the time series of a treated unit is compared to a
counterfactual constructed from a large pool of control units. I provide a
general framework for this setting, tailored to predict the counterfactual by
minimizing a tradeoff between underfitting (bias) and overfitting (variance).
The framework nests recently proposed structural and reduced form machine
learning approaches as special cases. Furthermore, difference-in-differences
with matching and the original synthetic control are restrictive cases of the
framework, in general not minimizing the bias-variance objective. Using
simulation studies I find that machine learning methods outperform traditional
methods when the number of potential controls is large or the treated unit is
substantially different from the controls. Equipped with a toolbox of
approaches, I revisit a study on the effect of economic liberalisation on
economic growth. I find effects for several countries where no effect was found
in the original study. Furthermore, I inspect how a systemically important
bank responds to increasing capital requirements by using a large pool of banks
to estimate the counterfactual. Finally, I assess the effect of a changing
product price on product sales using a novel scanner dataset.
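
For concreteness, the classic synthetic control building block nested by the
framework can be sketched as a constrained least-squares problem: choose
non-negative weights over the control units, summing to one, that match the
treated unit's pre-treatment path, and read the effect off the post-treatment
gap. The simulated factor structure and the use of scipy's SLSQP solver are
illustrative assumptions; the paper's point is precisely that this restrictive
weighting need not minimize the bias-variance objective.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(13)
    J, T0, T1 = 20, 30, 10                         # control units, pre- and post-periods
    factor = np.cumsum(rng.standard_normal(T0 + T1))
    controls = 0.5 * factor + rng.uniform(0, 2, (J, 1)) + 0.5 * rng.standard_normal((J, T0 + T1))
    treated = 0.5 * factor + 1.0 + 0.5 * rng.standard_normal(T0 + T1)
    treated[T0:] += 3.0                            # a treatment effect of 3 after period T0

    # synthetic control: convex weights over controls matching the pre-treatment path
    pre_fit = lambda w: np.sum((treated[:T0] - w @ controls[:, :T0]) ** 2)
    res = minimize(pre_fit, np.full(J, 1.0 / J), bounds=[(0, 1)] * J,
                   constraints=({"type": "eq", "fun": lambda w: w.sum() - 1.0},))
    gap = treated - res.x @ controls
    print("average post-treatment gap (effect estimate): %.2f" % gap[T0:].mean())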

arXiv link: http://arxiv.org/abs/1803.00096v1

Econometrics arXiv paper, submitted: 2018-02-28

Dimensional Analysis in Economics: A Study of the Neoclassical Economic Growth Model

Authors: Miguel Alvarez Texocotitla, M. David Alvarez Hernandez, Shani Alvarez Hernandez

The fundamental purpose of the present research article is to introduce the
basic principles of Dimensional Analysis in the context of the neoclassical
economic theory, in order to apply such principles to the fundamental relations
that underlie most models of economic growth. In particular, basic instruments
from Dimensional Analysis are used to evaluate the analytical consistency of
the Neoclassical economic growth model. The analysis shows that an adjustment
to the model is required in such a way that the principle of dimensional
homogeneity is satisfied.

arXiv link: http://arxiv.org/abs/1802.10528v1

Econometrics arXiv paper, submitted: 2018-02-28

Partial Identification of Expectations with Interval Data

Authors: Sam Asher, Paul Novosad, Charlie Rafkin

A conditional expectation function (CEF) can at best be partially identified
when the conditioning variable is interval censored. When the number of bins is
small, existing methods often yield minimally informative bounds. We propose
three innovations that make meaningful inference possible in interval data
contexts. First, we prove novel nonparametric bounds for contexts where the
distribution of the censored variable is known. Second, we show that a class of
measures that describe the conditional mean across a fixed interval of the
conditioning space can often be bounded tightly even when the CEF itself
cannot. Third, we show that a constraint on CEF curvature can either tighten
bounds or can substitute for the monotonicity assumption often made in interval
data applications. We derive analytical bounds that use the first two
innovations, and develop a numerical method to calculate bounds under the
third. We show the performance of the method in simulations and then present
two applications. First, we resolve a known problem in the estimation of
mortality as a function of education: because individuals with high school or
less are a smaller and thus more negatively selected group over time, estimates
of their mortality change are likely to be biased. Our method makes it possible
to hold education rank bins constant over time, revealing that current
estimates of rising mortality for less educated women are biased upward in some
cases by a factor of three. Second, we apply the method to the estimation of
intergenerational mobility, where researchers frequently use coarsely measured
education data in the many contexts where matched parent-child income data are
unavailable. Conventional measures like the rank-rank correlation may be
uninformative once interval censoring is taken into account; CEF interval-based
measures of mobility are bounded tightly.

arXiv link: http://arxiv.org/abs/1802.10490v1

Econometrics arXiv updated paper (originally submitted: 2018-02-27)

On the solution of the variational optimisation in the rational inattention framework

Authors: Nigar Hashimzade

I analyse the solution method for the variational optimisation problem in the
rational inattention framework proposed by Christopher A. Sims. The solution,
in general, does not exist, although it may exist in exceptional cases. I show
that the solution does not exist for the quadratic and the logarithmic
objective functions analysed by Sims (2003, 2006). For a linear-quadratic
objective function a solution can be constructed under restrictions on all but
one of its parameters. This approach is, therefore, unlikely to be applicable
to a wider set of economic models.

arXiv link: http://arxiv.org/abs/1802.09869v2

Econometrics arXiv paper, submitted: 2018-02-25

Identifying the occurrence or non occurrence of cognitive bias in situations resembling the Monty Hall problem

Authors: Fatemeh Borhani, Edward J. Green

People reason heuristically in situations resembling inferential puzzles such
as Bertrand's box paradox and the Monty Hall problem. The practical
significance of that fact for economic decision making is uncertain because a
departure from sound reasoning may, but does not necessarily, result in a
"cognitively biased" outcome different from what sound reasoning would have
produced. Criteria are derived here, applicable to both experimental and
non-experimental situations, for heuristic reasoning in inferential-puzzle
situations to result, or not to result, in cognitive bias. In some
situations, neither of these criteria is satisfied, and whether or not agents'
posterior probability assessments or choices are cognitively biased cannot be
determined.

arXiv link: http://arxiv.org/abs/1802.08935v1

Econometrics arXiv updated paper (originally submitted: 2018-02-24)

Kernel Estimation for Panel Data with Heterogeneous Dynamics

Authors: Ryo Okui, Takahide Yanagi

This paper proposes nonparametric kernel-smoothing estimation for panel data
to examine the degree of heterogeneity across cross-sectional units. We first
estimate the sample mean, autocovariances, and autocorrelations for each unit
and then apply kernel smoothing to compute their density functions. The
dependence of the kernel estimator on the bandwidth means that asymptotic
biases of very high order affect the required condition on the relative
magnitudes of the cross-sectional sample size (N) and the time-series length
(T). In particular, it makes the condition on N and T stronger and more
complicated than that typically observed in the long-panel literature without
kernel smoothing. We
also consider a split-panel jackknife method to correct bias and construction
of confidence intervals. An empirical application and Monte Carlo simulations
illustrate our procedure in finite samples.
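
A minimal sketch of the two-step idea above (unit-level statistics first,
kernel smoothing second): compute each unit's first-order autocorrelation in a
simulated panel and smooth the cross-sectional distribution with a Gaussian
kernel. The bandwidth analysis and split-panel jackknife bias correction
discussed in the abstract are omitted.

```python
# Minimal sketch of the two-step procedure: a statistic per cross-sectional
# unit, then a kernel-smoothed density of those statistics. Simulated panel.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
N, T = 500, 200
rho = rng.uniform(0.1, 0.9, size=N)               # heterogeneous AR(1) coefficients
y = np.zeros((N, T))
for t in range(1, T):
    y[:, t] = rho * y[:, t - 1] + rng.normal(size=N)

def acf1(x):
    """First-order sample autocorrelation of a single series."""
    xc = x - x.mean()
    return np.sum(xc[1:] * xc[:-1]) / np.sum(xc ** 2)

stats = np.array([acf1(y[i]) for i in range(N)])  # unit-level autocorrelations
grid = np.linspace(-0.2, 1.0, 7)
density = gaussian_kde(stats)(grid)               # kernel-smoothed density
print("estimated density on a coarse grid:", density.round(2))
```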

arXiv link: http://arxiv.org/abs/1802.08825v4

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2018-02-24

Measuring the Demand Effects of Formal and Informal Communication: Evidence from Online Markets for Illicit Drugs

Authors: Luis Armona

I present evidence that communication between marketplace participants is an
important influence on market demand. I find that consumer demand is
approximately equally influenced by communication on both formal and informal
networks, namely product reviews and community forums. In addition, I find
empirical evidence of a vendor's ability to commit to disclosure dampening the
effect of communication on demand. I also find that product demand is more
responsive to average customer sentiment as the number of messages grows, as
may be expected in a Bayesian updating framework.

arXiv link: http://arxiv.org/abs/1802.08778v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2018-02-23

De-Biased Machine Learning of Global and Local Parameters Using Regularized Riesz Representers

Authors: Victor Chernozhukov, Whitney Newey, Rahul Singh

We provide adaptive inference methods, based on $\ell_1$ regularization, for
regular (semi-parametric) and non-regular (nonparametric) linear functionals of
the conditional expectation function. Examples of regular functionals include
average treatment effects, policy effects, and derivatives. Examples of
non-regular functionals include average treatment effects, policy effects, and
derivatives conditional on a covariate subvector fixed at a point. We construct
a Neyman orthogonal equation for the target parameter that is approximately
invariant to small perturbations of the nuisance parameters. To achieve this
property, we include the Riesz representer for the functional as an additional
nuisance parameter. Our analysis yields weak “double sparsity robustness”:
either the approximation to the regression or the approximation to the
representer can be “completely dense” as long as the other is sufficiently
“sparse”. Our main results are non-asymptotic and imply asymptotic uniform
validity over large classes of models, translating into honest confidence bands
for both global and local parameters.
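
For the leading regular example, the average treatment effect, the Riesz
representer of the functional is the inverse-propensity weight, and the
resulting Neyman-orthogonal score is the familiar augmented (doubly robust)
moment. The sketch below uses generic plug-in fits and no cross-fitting, so it
illustrates the orthogonal-moment structure rather than the paper's
regularized Riesz-representer estimator.

```python
# Minimal sketch of a Neyman-orthogonal (doubly robust) ATE estimate. The Riesz
# representer for the ATE functional is D/e(X) - (1-D)/(1-e(X)); here it is
# built from a simple propensity fit rather than the paper's l1-regularized
# representer estimator. Simulated data, no cross-fitting.
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(2)
n, p = 2000, 5
X = rng.normal(size=(n, p))
e = 1 / (1 + np.exp(-X[:, 0]))                    # true propensity score
D = rng.binomial(1, e)
Y = 1.0 * D + X[:, 0] + rng.normal(size=n)        # true ATE = 1.0

ps = LogisticRegression().fit(X, D).predict_proba(X)[:, 1]
mu1 = LinearRegression().fit(X[D == 1], Y[D == 1]).predict(X)
mu0 = LinearRegression().fit(X[D == 0], Y[D == 0]).predict(X)

riesz = D / ps - (1 - D) / (1 - ps)               # estimated Riesz representer
score = mu1 - mu0 + riesz * (Y - np.where(D == 1, mu1, mu0))
print("orthogonal ATE estimate:", score.mean().round(3))
```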

arXiv link: http://arxiv.org/abs/1802.08667v6

Econometrics arXiv paper, submitted: 2018-02-21

Algorithmic Collusion in Cournot Duopoly Market: Evidence from Experimental Economics

Authors: Nan Zhou, Li Zhang, Shijian Li, Zhijian Wang

Algorithmic collusion is an emerging concept in the current artificial
intelligence age. Whether algorithmic collusion is a credible threat remains
an open question. In this paper, we propose an algorithm that can extort its
human rival into colluding in a Cournot duopoly market. In experiments, we
show that the algorithm successfully extorts its human rival and earns a
higher profit in the long run, while the human rival fully colludes with the
algorithm. As a result, social welfare declines rapidly and stably. Both in
theory and in experiment, our work confirms that algorithmic collusion can be
a credible threat. In application, we hope that the framework, the algorithm
design, and the experimental environment illustrated in this work can serve as
an incubator or test bed for researchers and policymakers addressing emerging
algorithmic collusion.

arXiv link: http://arxiv.org/abs/1802.08061v1

Econometrics arXiv paper, submitted: 2018-02-21

The Security of the United Kingdom Electricity Imports under Conditions of High European Demand

Authors: Anthony D Stephens, David R Walwyn

Energy policy in Europe has been driven by the three goals of security of
supply, economic competitiveness and environmental sustainability, referred to
as the energy trilemma. Although there are clear conflicts within the trilemma,
member countries have acted to facilitate a fully integrated European
electricity market. Interconnection and cross-border electricity trade has been
a fundamental part of such market liberalisation. However, it has been
suggested that consumers are exposed to a higher price volatility as a
consequence of interconnection. Furthermore, during times of energy shortages
and high demand, issues of national sovereignty take precedence over
cooperation. In this article, the unique and somewhat peculiar conditions of
early 2017 within France, Germany and the United Kingdom have been studied to
understand how the existing integration arrangements address the energy
trilemma. It is concluded that the dominant interests are economic and national
security; issues of environmental sustainability are neglected or overridden.
Although the optimisation of European electricity generation to achieve a lower
overall carbon emission is possible, such a goal is far from being realised.
Furthermore, it is apparent that the United Kingdom, and other countries,
cannot rely upon imports from other countries during periods of high demand
and/or limited supply.

arXiv link: http://arxiv.org/abs/1802.07457v1

Econometrics arXiv updated paper (originally submitted: 2018-02-19)

On the iterated estimation of dynamic discrete choice games

Authors: Federico A. Bugni, Jackson Bunting

We study the asymptotic properties of a class of estimators of the structural
parameters in dynamic discrete choice games. We consider K-stage policy
iteration (PI) estimators, where K denotes the number of policy iterations
employed in the estimation. This class nests several estimators proposed in the
literature such as those in Aguirregabiria and Mira (2002, 2007), Pesendorfer
and Schmidt-Dengler (2008), and Pakes et al. (2007). First, we establish that
the K-PML estimator is consistent and asymptotically normal for all K. This
complements findings in Aguirregabiria and Mira (2007), who focus on K=1 and K
large enough to induce convergence of the estimator. Furthermore, we show under
certain conditions that the asymptotic variance of the K-PML estimator can
exhibit arbitrary patterns as a function of K. Second, we establish that the
K-MD estimator is consistent and asymptotically normal for all K. For a
specific weight matrix, the K-MD estimator has the same asymptotic distribution
as the K-PML estimator. Our main result provides an optimal sequence of weight
matrices for the K-MD estimator and shows that the optimally weighted K-MD
estimator has an asymptotic distribution that is invariant to K. The invariance
result is especially unexpected given the findings in Aguirregabiria and Mira
(2007) for K-PML estimators. Our main result implies two new corollaries about
the optimal 1-MD estimator (derived by Pesendorfer and Schmidt-Dengler (2008)).
First, the optimal 1-MD estimator is optimal in the class of K-MD estimators.
In other words, additional policy iterations do not provide asymptotic
efficiency gains relative to the optimal 1-MD estimator. Second, the optimal
1-MD estimator is more or equally asymptotically efficient than any K-PML
estimator for all K. Finally, the appendix provides appropriate conditions
under which the optimal 1-MD estimator is asymptotically efficient.

arXiv link: http://arxiv.org/abs/1802.06665v4

Econometrics arXiv updated paper (originally submitted: 2018-02-17)

Achieving perfect coordination amongst agents in the co-action minority game

Authors: Hardik Rajpal, Deepak Dhar

We discuss the strategy that rational agents can use to maximize their
expected long-term payoff in the co-action minority game. We argue that the
agents will try to get into a cyclic state, where each of the $(2N+1)$ agents
wins exactly $N$ times in any continuous stretch of $(2N+1)$ days. We propose
and analyse a strategy for reaching such a cyclic state quickly, when any
direct communication between agents is not allowed, and the only publicly
available common information is the record of the total number of people choosing
the first restaurant in the past. We determine exactly the average time
required to reach the periodic state for this strategy. We show that it varies
as $(N/\ln 2)\,[1 + \alpha \cos (2 \pi \log_2 N)]$, for large $N$, where the
amplitude $\alpha$ of the leading term in the log-periodic oscillations is
found to be $\frac{8 \pi^2}{(\ln 2)^2} \exp(-2 \pi^2/\ln 2) \approx
7 \times 10^{-11}$.
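
A quick numerical check of the amplitude expression as reconstructed above
(reading the middle factor as an exponential is our assumption) reproduces the
quoted order of magnitude:

```python
# Check that (8*pi^2 / (ln 2)^2) * exp(-2*pi^2 / ln 2) is of order 7e-11, the
# value quoted in the abstract; the exponential reading is our assumption.
import math

alpha = 8 * math.pi**2 / math.log(2)**2 * math.exp(-2 * math.pi**2 / math.log(2))
print(f"alpha is approximately {alpha:.1e}")      # about 7e-11
```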

arXiv link: http://arxiv.org/abs/1802.06770v2

Econometrics arXiv paper, submitted: 2018-02-16

The dynamic impact of monetary policy on regional housing prices in the US: Evidence based on factor-augmented vector autoregressions

Authors: Manfred M. Fischer, Florian Huber, Michael Pfarrhofer, Petra Staufer-Steinnocher

In this study interest centers on regional differences in the response of
housing prices to monetary policy shocks in the US. We address this issue by
analyzing monthly home price data for metropolitan regions using a
factor-augmented vector autoregression (FAVAR) model. Bayesian model estimation
is based on Gibbs sampling with Normal-Gamma shrinkage priors for the
autoregressive coefficients and factor loadings, while monetary policy shocks
are identified using high-frequency surprises around policy announcements as
external instruments. The empirical results indicate that monetary policy
actions typically have sizeable and significant positive effects on regional
housing prices, revealing differences in magnitude and duration. The largest
effects are observed in regions located in states on both the East and West
Coasts, notably California, Arizona and Florida.

arXiv link: http://arxiv.org/abs/1802.05870v1

Econometrics arXiv paper, submitted: 2018-02-14

Bootstrap-Assisted Unit Root Testing With Piecewise Locally Stationary Errors

Authors: Yeonwoo Rho, Xiaofeng Shao

In unit root testing, a piecewise locally stationary process is adopted to
accommodate nonstationary errors that can have both smooth and abrupt changes
in second- or higher-order properties. Under this framework, the limiting null
distributions of the conventional unit root test statistics are derived and
shown to contain a number of unknown parameters. To circumvent the difficulty
of direct consistent estimation, we propose to use the dependent wild bootstrap
to approximate the non-pivotal limiting null distributions and provide a
rigorous theoretical justification for bootstrap consistency. The proposed
method is compared through finite sample simulations with the recolored wild
bootstrap procedure, which was developed for errors that follow a
heteroscedastic linear process. Further, a combination of autoregressive sieve
recoloring with the dependent wild bootstrap is shown to perform well. The
validity of the dependent wild bootstrap in a nonstationary setting is
demonstrated for the first time, showing the possibility of extensions to other
inference problems associated with locally stationary processes.
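
A minimal sketch of the dependent wild bootstrap mechanism described above:
residuals computed under the unit-root null are multiplied by a smooth
auxiliary Gaussian series whose autocovariance is a kernel in the lag, and the
series is rebuilt from the perturbed increments. The Bartlett kernel, the
bandwidth, and the unaugmented Dickey-Fuller statistic are illustrative
choices, not the paper's exact ones.

```python
# Minimal sketch of the dependent wild bootstrap for a unit root test:
# multiply null residuals by an auxiliary Gaussian series w with
# Cov(w_t, w_s) = K((t-s)/l), then rebuild the series under the null.
import numpy as np

rng = np.random.default_rng(3)
T, l, B = 300, 10, 500
y = np.cumsum(rng.normal(size=T))                 # a random walk (null is true)
resid = np.diff(y)                                # residuals under the null

lags = np.abs(np.subtract.outer(np.arange(T - 1), np.arange(T - 1))) / l
cov = np.clip(1 - lags, 0.0, None)                # Bartlett-kernel covariance
chol = np.linalg.cholesky(cov + 1e-10 * np.eye(T - 1))

def df_stat(series):
    """Normalized Dickey-Fuller coefficient statistic T*(rho_hat - 1)."""
    dy, ylag = np.diff(series), series[:-1]
    return len(ylag) * np.sum(ylag * dy) / np.sum(ylag ** 2)

stat = df_stat(y)
boot = []
for _ in range(B):
    w = chol @ rng.normal(size=T - 1)             # dependent multipliers
    y_star = np.concatenate(([y[0]], y[0] + np.cumsum(resid * w)))
    boot.append(df_stat(y_star))
print("bootstrap p-value:", np.mean(np.array(boot) <= stat).round(3))
```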

arXiv link: http://arxiv.org/abs/1802.05333v1

Econometrics arXiv cross-link from q-fin.ST (q-fin.ST), submitted: 2018-02-14

Analysis of Financial Credit Risk Using Machine Learning

Authors: Jacky C. K. Chow

Corporate insolvency can have a devastating effect on the economy. With an
increasing number of companies making expansion overseas to capitalize on
foreign resources, a multinational corporate bankruptcy can disrupt the world's
financial ecosystem. Corporations do not fail instantaneously; objective
measures and rigorous analysis of qualitative (e.g. brand) and quantitative
(e.g. econometric factors) data can help identify a company's financial risk.
Gathering and storage of data about a corporation has become less difficult
with recent advancements in communication and information technologies. The
remaining challenge lies in mining relevant information about a company's
health hidden under the vast amounts of data, and using it to forecast
insolvency so that managers and stakeholders have time to react. In recent
years, machine learning has become a popular field in big data analytics
because of its success in learning complicated models. Methods such as support
vector machines, adaptive boosting, artificial neural networks, and Gaussian
processes can be used for recognizing patterns in the data (with a high degree
of accuracy) that may not be apparent to human analysts. This thesis studied
corporate bankruptcy of manufacturing companies in Korea and Poland using
experts' opinions and financial measures, respectively. Using publicly
available datasets, several machine learning methods were applied to learn the
relationship between the company's current state and its fate in the near
future. Results showed that predictions with accuracy greater than 95% were
achievable using any machine learning technique when informative features like
experts' assessment were used. However, when using purely financial factors to
predict whether or not a company will go bankrupt, the correlation is not as
strong.

arXiv link: http://arxiv.org/abs/1802.05326v1

Econometrics arXiv updated paper (originally submitted: 2018-02-13)

A General Method for Demand Inversion

Authors: Lixiong Li

This paper describes a numerical method to solve for mean product qualities
which equates the real market share to the market share predicted by a discrete
choice model. The method covers a general class of discrete choice models,
including the pure characteristics model in Berry and Pakes (2007) and the
random coefficient logit model in Berry et al. (1995) (hereafter BLP). The
method transforms the original market share inversion problem to an
unconstrained convex minimization problem, so that any convex programming
algorithm can be used to solve the inversion. Moreover, such results also imply
that the computational complexity of inverting a demand model should be no more
than that of a convex programming problem. In simulation examples, I show the
method outperforms the contraction mapping algorithm in BLP. I also find the
method remains robust in pure characteristics models with near-zero market
shares.
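
For the plain logit special case, the convex program behind the share
inversion can be written out explicitly: minimize a log-sum-exp objective
whose gradient is the gap between predicted and observed shares. The sketch
below does this with a generic quasi-Newton solver; random-coefficient and
pure characteristics models replace the predicted-share formula but keep the
same structure, and all names are illustrative.

```python
# Minimal sketch: recover mean product qualities from observed market shares by
# minimizing a convex objective whose gradient equals (predicted share -
# observed share). Plain logit with an outside good; simulated data.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
J = 20
delta_true = rng.normal(size=J)                   # true mean qualities
expd = np.exp(delta_true)
shares = expd / (1 + expd.sum())                  # observed inside-good shares

def objective(delta):
    value = np.log1p(np.exp(delta).sum()) - shares @ delta
    grad = np.exp(delta) / (1 + np.exp(delta).sum()) - shares
    return value, grad

res = minimize(objective, np.zeros(J), jac=True, method="L-BFGS-B")
print("max inversion error:", np.max(np.abs(res.x - delta_true)))
```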

arXiv link: http://arxiv.org/abs/1802.04444v3

Econometrics arXiv cross-link from q-fin.TR (q-fin.TR), submitted: 2018-02-11

Structural Estimation of Behavioral Heterogeneity

Authors: Zhentao Shi, Huanhuan Zheng

We develop a behavioral asset pricing model in which agents trade in a market
with information friction. Profit-maximizing agents switch between trading
strategies in response to dynamic market conditions. Due to noisy private
information about the fundamental value, the agents form different evaluations
about heterogeneous strategies. We exploit a thin set---a small
sub-population---to point-identify this nonlinear model, and estimate the
structural parameters using extended method of moments. Based on the estimated
parameters, the model produces return time series that emulate the moments of
the real data. These results are robust across different sample periods and
estimation methods.

arXiv link: http://arxiv.org/abs/1802.03735v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2018-02-11

A Time-Varying Network for Cryptocurrencies

Authors: Li Guo, Wolfgang Karl Härdle, Yubo Tao

Cryptocurrencies' return cross-predictability and technological similarity
yield information on risk propagation and market segmentation. To investigate
these effects, we build a time-varying network for cryptocurrencies, based on
the evolution of return cross-predictability and technological similarities. We
develop a dynamic covariate-assisted spectral clustering method to consistently
estimate the latent community structure of the cryptocurrency network that
accounts for both sets of information. We demonstrate that investors can
achieve better risk diversification by investing in cryptocurrencies from
different communities. A cross-sectional portfolio that implements an
inter-crypto momentum trading strategy earns a 1.08% daily return. By
dissecting the portfolio returns on behavioral factors, we confirm that our
results are not driven by behavioral mechanisms.

arXiv link: http://arxiv.org/abs/1802.03708v8

Econometrics arXiv updated paper (originally submitted: 2018-02-09)

Long-Term Unemployed hirings: Should targeted or untargeted policies be preferred?

Authors: Alessandra Pasquini, Marco Centra, Guido Pellegrini

To what extent are hiring incentives targeting a specific group of vulnerable
unemployed (i.e., the long-term unemployed) more effective than generalised
incentives (without a definite target) at increasing hirings of the targeted
group? Are generalised incentives able to influence hirings of the vulnerable
group? Do targeted policies have negative side effects too large to be
acceptable? Even though there is a large literature on hiring subsidies, these
questions remain unresolved. We address them by comparing the impact of two
similar hiring policies, one targeted and one generalised, implemented in the
Italian labour market. We use administrative data on job contracts and
counterfactual analysis methods. The targeted policy had a positive and
significant impact, while the generalised policy had no significant impact on
the vulnerable group. Moreover, we conclude that the targeted policy did not
have any indirect negative side effects.

arXiv link: http://arxiv.org/abs/1802.03343v2

Econometrics arXiv paper, submitted: 2018-02-09

The Allen--Uzawa elasticity of substitution for nonhomogeneous production functions

Authors: Elena Burmistrova, Sergey Lobanov

This note proves that the representation of the Allen elasticity of
substitution obtained by Uzawa for linear homogeneous functions holds true for
nonhomogeneous functions. It is shown that the criticism of the Allen-Uzawa
elasticity of substitution in the works of Blackorby, Primont, Russell is based
on an incorrect example.

arXiv link: http://arxiv.org/abs/1802.06885v1

Econometrics arXiv paper, submitted: 2018-02-08

Prediction of Shared Bicycle Demand with Wavelet Thresholding

Authors: J. Christopher Westland, Jian Mou, Dafei Yin

Consumers are creatures of habit, often periodic, tied to work, shopping and
other schedules. We analyzed one month of data from the world's largest
bike-sharing company to elicit demand behavioral cycles, initially using models
from animal tracking that showed large customers fit an Ornstein-Uhlenbeck
model with demand peaks at periodicities of 7, 12, and 24 hours and 7 days.
Lorenz curves of bicycle demand showed that the majority of customer usage was
infrequent, and that demand cycles from time-series models would strongly
overfit the data, yielding unreliable models. Analysis of thresholded wavelets for the
space-time tensor of bike-sharing contracts was able to compress the data into
a 56-coefficient model with little loss of information, suggesting that
bike-sharing demand behavior is exceptionally strong and regular. Improvements
to predicted demand could be made by adjusting for 'noise' filtered by our
model from air quality and weather information and demand from infrequent
riders.

arXiv link: http://arxiv.org/abs/1802.02683v1

Econometrics arXiv paper, submitted: 2018-02-07

Random taste heterogeneity in discrete choice models: Flexible nonparametric finite mixture distributions

Authors: Akshay Vij, Rico Krueger

This study proposes a mixed logit model with multivariate nonparametric
finite mixture distributions. The support of the distribution is specified as a
high-dimensional grid over the coefficient space, with equal or unequal
intervals between successive points along the same dimension; the location of
each point on the grid and the probability mass at that point are model
parameters that need to be estimated. The framework does not require the
analyst to specify the shape of the distribution prior to model estimation, but
can approximate any multivariate probability distribution function to any
arbitrary degree of accuracy. The grid with unequal intervals, in particular,
offers greater flexibility than existing multivariate nonparametric
specifications, while requiring the estimation of a small number of additional
parameters. An expectation maximization algorithm is developed for the
estimation of these models. Multiple synthetic datasets and a case study on
travel mode choice behavior are used to demonstrate the value of the model
framework and estimation algorithm. Compared to extant models that incorporate
random taste heterogeneity through continuous mixture distributions, the
proposed model provides better out-of-sample predictive ability. Findings
reveal significant differences in willingness to pay measures between the
proposed model and extant specifications. The case study further demonstrates
the ability of the proposed model to endogenously recover patterns of attribute
non-attendance and choice set formation.

arXiv link: http://arxiv.org/abs/1802.02299v1

Econometrics arXiv updated paper (originally submitted: 2018-02-06)

Forecasting the impact of state pension reforms in post-Brexit England and Wales using microsimulation and deep learning

Authors: Agnieszka Werpachowska

We employ stochastic dynamic microsimulations to analyse and forecast the
pension cost dependency ratio for England and Wales from 1991 to 2061,
evaluating the impact of the ongoing state pension reforms and changes in
international migration patterns under different Brexit scenarios. To fully
account for the recently observed volatility in life expectancies, we propose a
mortality rate model based on deep learning techniques, which discovers complex
patterns in the data and extrapolates trends. Our results show that the recent
reforms can effectively stave off the "pension crisis" and bring back the
system on a sounder fiscal footing. At the same time, increasingly more workers
can expect to spend greater share of their lifespan in retirement, despite the
eligibility age rises. The population ageing due to the observed postponement
of death until senectitude often occurs with the compression of morbidity, and
thus will not, perforce, intrinsically strain healthcare costs. To a lesser
degree, the future pension cost dependency ratio will depend on the post-Brexit
relations between the UK and the EU, with "soft" alignment on the free movement
lowering the relative cost of the pension system compared to the "hard" one. In
the long term, however, the ratio has a rising tendency.

arXiv link: http://arxiv.org/abs/1802.09427v2

Econometrics arXiv updated paper (originally submitted: 2018-02-05)

An Experimental Investigation of Preference Misrepresentation in the Residency Match

Authors: Alex Rees-Jones, Samuel Skowronek

The development and deployment of matching procedures that incentivize
truthful preference reporting is considered one of the major successes of
market design research. In this study, we test the degree to which these
procedures succeed in eliminating preference misrepresentation. We administered
an online experiment to 1,714 medical students immediately after their
participation in the medical residency match--a leading field application of
strategy-proof market design. When placed in an analogous, incentivized
matching task, we find that 23% of participants misrepresent their preferences.
We explore the factors that predict preference misrepresentation, including
cognitive ability, strategic positioning, overconfidence, expectations, advice,
and trust. We discuss the implications of this behavior for the design of
allocation mechanisms and the social welfare in markets that use them.

arXiv link: http://arxiv.org/abs/1802.01990v2

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2018-02-02

Voting patterns in 2016: Exploration using multilevel regression and poststratification (MRP) on pre-election polls

Authors: Rob Trangucci, Imad Ali, Andrew Gelman, Doug Rivers

We analyzed 2012 and 2016 YouGov pre-election polls in order to understand
how different population groups voted in the 2012 and 2016 elections. We broke
the data down by demographics and state. We display our findings with a series
of graphs and maps. The R code associated with this project is available at
https://github.com/rtrangucci/mrp_2016_election/.

arXiv link: http://arxiv.org/abs/1802.00842v3

Econometrics arXiv paper, submitted: 2018-02-02

Structural analysis with mixed-frequency data: A MIDAS-SVAR model of US capital flows

Authors: Emanuele Bacchiocchi, Andrea Bastianin, Alessandro Missale, Eduardo Rossi

We develop a new VAR model for structural analysis with mixed-frequency data.
The MIDAS-SVAR model allows to identify structural dynamic links exploiting the
information contained in variables sampled at different frequencies. It also
provides a general framework to test homogeneous frequency-based
representations versus mixed-frequency data models. A set of Monte Carlo
experiments suggests that the test performs well both in terms of size and
power. The MIDAS-SVAR is then used to study how monetary policy and financial
market volatility impact on the dynamics of gross capital inflows to the US.
While no relation is found when using standard quarterly data, exploiting the
variability present in the series within the quarter shows that the effect of
an interest rate shock is greater the longer the time lag between the month of
the shock and the end of the quarter.

arXiv link: http://arxiv.org/abs/1802.00793v1

Econometrics arXiv paper, submitted: 2018-01-29

Are `Water Smart Landscapes' Contagious? An epidemic approach on networks to study peer effects

Authors: Christa Brelsford, Caterina De Bacco

We test the existence of a neighborhood based peer effect around
participation in an incentive based conservation program called `Water Smart
Landscapes' (WSL) in the city of Las Vegas, Nevada. We use 15 years of
geo-coded daily records of WSL program applications and approvals compiled by
the Southern Nevada Water Authority and Clark County Tax Assessors rolls for
home characteristics. We use this data to test whether a spatially mediated
peer effect can be observed in WSL participation likelihood at the household
level. We show that, compared with hazard models, which can also be applied to
address the same questions, epidemic spreading models provide more flexibility
in modeling assumptions and also offer one mechanism for addressing problems
associated with correlated unobservables. We build networks of neighborhood based
peers for 16 randomly selected neighborhoods in Las Vegas and test for the
existence of a peer based influence on WSL participation by using a
Susceptible-Exposed-Infected-Recovered epidemic spreading model (SEIR), in
which a home can become infected via autoinfection or through contagion from
its infected neighbors. We show that this type of epidemic model can be
directly recast as an additive-multiplicative hazard model, but not as a purely
multiplicative one. Using both inference and prediction approaches we find
evidence of peer effects in several Las Vegas neighborhoods.
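
A stripped-down version of the network SEIR mechanism described above, in
which a susceptible home becomes exposed either spontaneously (autoinfection)
or through contagion from infected neighbors, can be simulated in a few lines.
The rates and the random neighbor graph below are illustrative placeholders,
and the recasting as an additive-multiplicative hazard model is not shown.

```python
# Minimal sketch of discrete-time SEIR spreading on a neighborhood network with
# autoinfection plus contagion from infected neighbors. Illustrative rates.
import numpy as np

rng = np.random.default_rng(5)
n, steps = 400, 100
beta, auto, sigma, gamma = 0.02, 0.001, 0.2, 0.05    # contagion, auto, E->I, I->R
adj = (rng.random((n, n)) < 0.02).astype(float)      # random neighbor graph
adj = np.triu(adj, 1)
adj = adj + adj.T                                    # symmetric, no self-loops

state = np.zeros(n, dtype=int)                       # 0=S, 1=E, 2=I, 3=R
state[rng.choice(n, 3, replace=False)] = 2           # seed infections
for _ in range(steps):
    infected_neighbors = adj @ (state == 2)
    p_exposed = 1 - (1 - auto) * (1 - beta) ** infected_neighbors
    new_e = (state == 0) & (rng.random(n) < p_exposed)
    new_i = (state == 1) & (rng.random(n) < sigma)
    new_r = (state == 2) & (rng.random(n) < gamma)
    state[new_e] = 1
    state[new_i] = 2
    state[new_r] = 3
print("share of homes ever exposed or infected:", np.mean(state > 0).round(2))
```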

arXiv link: http://arxiv.org/abs/1801.10516v1

Econometrics arXiv paper, submitted: 2018-01-27

How Can We Induce More Women to Competitions?

Authors: Masayuki Yagasaki, Mitsunosuke Morishita

Why do women avoid participating in competitions, and how can we encourage
them to participate? In this paper, we investigate how social image concerns
affect women's decision to compete. We first construct a theoretical model and
show that participating in a competition, even under affirmative action
policies favoring women, is costly for women under public observability since
it deviates from traditional female gender norms, resulting in women's low
appearance in competitive environments. We propose and theoretically show that
introducing prosocial incentives in the competitive environment is effective
and robust to public observability since (i) it induces women who are
intrinsically motivated by prosocial incentives to the competitive environment
and (ii) it makes participating in a competition not costly for women from
social image point of view. We conduct a laboratory experiment where we
randomly manipulate the public observability of decisions to compete and test
our theoretical predictions. The results of the experiment are fairly
consistent with our theoretical predictions. We suggest that when designing
policies to promote gender equality in competitive environments, using
prosocial incentives through company philanthropy or other social
responsibility policies, either as substitutes or as complements to traditional
affirmative action policies, could be promising.

arXiv link: http://arxiv.org/abs/1801.10518v1

Econometrics arXiv updated paper (originally submitted: 2018-01-26)

Nonseparable Sample Selection Models with Censored Selection Rules

Authors: Iván Fernández-Val, Aico van Vuuren, Francis Vella

We consider identification and estimation of nonseparable sample selection
models with censored selection rules. We employ a control function approach and
discuss different objects of interest based on (1) local effects conditional on
the control function, and (2) global effects obtained from integration over
ranges of values of the control function. We derive the conditions for the
identification of these different objects and suggest strategies for
estimation. Moreover, we provide the associated asymptotic theory. These
strategies are illustrated in an empirical investigation of the determinants of
female wages in the United Kingdom.

arXiv link: http://arxiv.org/abs/1801.08961v2

Econometrics arXiv paper, submitted: 2018-01-26

Ordered Kripke Model, Permissibility, and Convergence of Probabilistic Kripke Model

Authors: Shuige Liu

We define a modification of the standard Kripke model, called the ordered
Kripke model, by introducing a linear order on the set of accessible states of
each state. We first show this model can be used to describe the lexicographic
belief hierarchy in epistemic game theory, and perfect rationalizability can be
characterized within this model. Then we show that each ordered Kripke model is
the limit of a sequence of standard probabilistic Kripke models with a modified
(common) belief operator, in the senses of structure and the
(epsilon-)permissibilities characterized within them.

arXiv link: http://arxiv.org/abs/1801.08767v1

Econometrics arXiv paper, submitted: 2018-01-26

Quantifying Health Shocks Over the Life Cycle

Authors: Taiyo Fukai, Hidehiko Ichimura, Kyogo Kanazawa

We first show (1) the importance of investigating the health expenditure
process using an order-two Markov chain model, rather than the standard
order-one model, which is widely used in the literature. A Markov chain of order two is the
minimal framework that is capable of distinguishing those who experience a
certain health expenditure level for the first time from those who have been
experiencing that or other levels for some time. In addition, using the model
we show (2) that the probability of encountering a health shock first
decreases until around age 10, and then increases with age, particularly after
age 40, (3) that health shock distributions among different age groups do not
differ until their percentiles reach the median range, but that above the
median the health shock distributions of older age groups gradually start to
first-order dominate those of younger groups, and (4) that the persistency of
health shocks also shows a U-shape in relation to age.

arXiv link: http://arxiv.org/abs/1801.08746v1

Econometrics arXiv paper, submitted: 2018-01-22

Estimating Heterogeneous Consumer Preferences for Restaurants and Travel Time Using Mobile Location Data

Authors: Susan Athey, David Blei, Robert Donnelly, Francisco Ruiz, Tobias Schmidt

This paper analyzes consumer choices over lunchtime restaurants using data
from a sample of several thousand anonymous mobile phone users in the San
Francisco Bay Area. The data is used to identify users' approximate typical
morning location, as well as their choices of lunchtime restaurants. We build a
model where restaurants have latent characteristics (whose distribution may
depend on restaurant observables, such as star ratings, food category, and
price range), each user has preferences for these latent characteristics, and
these preferences are heterogeneous across users. Similarly, each item has
latent characteristics that describe users' willingness to travel to the
restaurant, and each user has individual-specific preferences for those latent
characteristics. Thus, both users' willingness to travel and their base utility
for each restaurant vary across user-restaurant pairs. We use a Bayesian
approach to estimation. To make the estimation computationally feasible, we
rely on variational inference to approximate the posterior distribution, as
well as stochastic gradient descent as a computational approach. Our model
performs better than more standard competing models such as multinomial logit
and nested logit models, in part due to the personalization of the estimates.
We analyze how consumers re-allocate their demand after a restaurant closes to
nearby restaurants versus more distant restaurants with similar
characteristics, and we compare our predictions to actual outcomes. Finally, we
show how the model can be used to analyze counterfactual questions such as what
type of restaurant would attract the most consumers in a given location.

arXiv link: http://arxiv.org/abs/1801.07826v1

Econometrics arXiv cross-link from q-fin.GN (q-fin.GN), submitted: 2018-01-22

Accurate Evaluation of Asset Pricing Under Uncertainty and Ambiguity of Information

Authors: Farouq Abdulaziz Masoudy

Since the exchange economy varies considerably across market assets, asset
prices have become an attractive research area for investigating and modeling
ambiguous and uncertain information in today's markets. This paper proposes a new
generative uncertainty mechanism based on the Bayesian Inference and
Correntropy (BIC) technique for accurately evaluating asset pricing in markets.
This technique examines the potential processes of risk, ambiguity, and
variations of market information in a controllable manner. We apply the new BIC
technique to a consumption asset-pricing model in which the consumption
variations are modeled using the Bayesian network model with observing the
dynamics of asset pricing phenomena in the data. These dynamics include the
procyclical deviations of price, the countercyclical deviations of equity
premia and equity volatility, the leverage impact and the mean reversion of
excess returns. The key findings reveal that the precise modeling of asset
information can estimate price changes in the market effectively.

arXiv link: http://arxiv.org/abs/1801.06966v2

Econometrics arXiv updated paper (originally submitted: 2018-01-22)

Evolution of Regional Innovation with Spatial Knowledge Spillovers: Convergence or Divergence?

Authors: Jinwen Qiu, Wenjian Liu, Ning Ning

This paper extends endogenous economic growth models to incorporate knowledge
externalities. We explore whether spatial knowledge spillovers among regions
exist, whether spatial knowledge spillovers promote regional innovative
activities, and whether external knowledge spillovers affect the evolution of
regional innovations in the long run. We empirically verify the theoretical
results by applying spatial statistics and econometric models to the analysis
of panel data on 31 regions in China. An accurate estimate of the range of
knowledge spillovers is achieved, and convergence of regional knowledge growth
rates is found, with clear evidence that developing regions benefit more from
external knowledge spillovers than developed regions.

arXiv link: http://arxiv.org/abs/1801.06936v3

Econometrics arXiv updated paper (originally submitted: 2018-01-21)

Testing the Number of Regimes in Markov Regime Switching Models

Authors: Hiroyuki Kasahara, Katsumi Shimotsu

Markov regime switching models have been used in numerous empirical studies
in economics and finance. However, the asymptotic distribution of the
likelihood ratio test statistic for testing the number of regimes in Markov
regime switching models has been an unresolved problem. This paper derives the
asymptotic distribution of the likelihood ratio test statistic for testing the
null hypothesis of $M_0$ regimes against the alternative hypothesis of $M_0 +
1$ regimes for any $M_0 \geq 1$ both under the null hypothesis and under local
alternatives. We show that the contiguous alternatives converge to the null
hypothesis at a rate of $n^{-1/8}$ in regime switching models with normal
density. The asymptotic validity of the parametric bootstrap is also
established.

arXiv link: http://arxiv.org/abs/1801.06862v3

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2018-01-20

Nonfractional Memory: Filtering, Antipersistence, and Forecasting

Authors: J. Eduardo Vera-Valdés

The fractional difference operator remains the most popular mechanism for
generating long memory, owing to the existence of efficient algorithms for its
simulation and forecasting. Nonetheless, there is no theoretical argument
linking the fractional difference operator with the presence of long memory in
real data. In this regard, one of the most predominant theoretical explanations
for the presence of long memory is cross-sectional aggregation of persistent
micro units. Yet, the type of processes obtained by cross-sectional aggregation
differs from the one due to fractional differencing. Thus, this paper develops
fast algorithms to generate and forecast long memory by cross-sectional
aggregation. Moreover, it is shown that the antipersistent phenomenon that
arises for negative degrees of memory in the fractional difference literature
is not present for cross-sectionally aggregated processes. Pointedly, while the
autocorrelations for the fractional difference operator are negative for
negative degrees of memory by construction, this restriction does not apply to
the cross-sectional aggregated scheme. We show that this has implications for
long memory tests in the frequency domain, which will be misspecified for
cross-sectionally aggregated processes with negative degrees of memory.
Finally, we assess the forecast performance of high-order $AR$ and $ARFIMA$
models when the long memory series are generated by cross-sectional
aggregation. Our results are of interest to practitioners developing forecasts
of long memory variables like inflation, volatility, and climate data, where
aggregation may be the source of long memory.
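
The aggregation mechanism referenced above goes back to Granger (1980):
averaging many independent AR(1) micro units whose coefficients are drawn from
a Beta-type distribution yields an aggregate whose autocorrelations decay far
more slowly than those of any individual unit. The brute-force sketch below
illustrates the idea; it does not use the paper's fast algorithms, and the
shape parameters are arbitrary.

```python
# Minimal sketch of long memory via cross-sectional aggregation: average N
# independent AR(1) units with Beta-distributed (squared) coefficients and
# inspect the slow decay of the aggregate's autocorrelations.
import numpy as np

rng = np.random.default_rng(6)
N, T = 2000, 1000
phi = np.sqrt(rng.beta(1.0, 1.4, size=N))         # persistent micro coefficients
x = np.zeros((N, T))
eps = rng.normal(size=(N, T))
for t in range(1, T):
    x[:, t] = phi * x[:, t - 1] + eps[:, t]
agg = x.mean(axis=0)                              # cross-sectional aggregate

def acf(series, k):
    s = series - series.mean()
    return np.sum(s[k:] * s[:-k]) / np.sum(s ** 2)

# Autocorrelations stay positive at long lags, unlike a single AR(1).
print([round(acf(agg, k), 2) for k in (1, 5, 25, 100)])
```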

arXiv link: http://arxiv.org/abs/1801.06677v1

Econometrics arXiv cross-link from Quantitative Finance – Economics (q-fin.EC), submitted: 2018-01-20

Capital Structure in U.S., a Quantile Regression Approach with Macroeconomic Impacts

Authors: Andreas Kaloudis, Dimitrios Tsolis

The major perspective of this paper is to provide more evidence into the
empirical determinants of capital structure adjustment in different
macroeconomics states by focusing and discussing the relative importance of
firm-specific and macroeconomic characteristics from an alternative scope in
U.S. This study extends the empirical research on the topic of capital
structure by focusing on a quantile regression method to investigate the
behavior of firm-specific characteristics and macroeconomic variables across
all quantiles of the distribution of leverage (total debt, long-term debt and
short-term debt). Thus, based on a partial adjustment model, we find that
long-term and short-term debt ratios vary in their partial adjustment speeds;
the short-term debt ratio rises while the long-term debt ratio slows down over
the same periods.

arXiv link: http://arxiv.org/abs/1801.06651v1

Econometrics arXiv paper, submitted: 2018-01-19

USDA Forecasts: A meta-analysis study

Authors: Bahram Sanginabadi

The primary goal of this study is doing a meta-analysis research on two
groups of published studies. First, the ones that focus on the evaluation of
the United States Department of Agriculture (USDA) forecasts and second, the
ones that evaluate the market reactions to the USDA forecasts. We investigate
four questions. 1) How do the studies evaluate the accuracy of the USDA
forecasts? 2) How do they evaluate the market reactions to the USDA forecasts? 3) Is there
any heterogeneity in the results of the mentioned studies? 4) Is there any
publication bias? About the first question, while some researchers argue that
the forecasts are unbiased, most of them maintain that they are biased,
inefficient, not optimal, or not rational. About the second question, while a
few studies claim that the forecasts are not newsworthy, most of them maintain
that they are newsworthy, provide useful information, and cause market
reactions. About the third and the fourth questions, based on our findings,
there are some clues that the results of the studies are heterogeneous, but we
did not find sufficient evidence of publication bias.

arXiv link: http://arxiv.org/abs/1801.06575v1

Econometrics arXiv updated paper (originally submitted: 2018-01-19)

Predicting crypto-currencies using sparse non-Gaussian state space models

Authors: Christian Hotz-Behofsits, Florian Huber, Thomas O. Zörner

In this paper we forecast daily returns of crypto-currencies using a wide
variety of different econometric models. To capture salient features commonly
observed in financial time series like rapid changes in the conditional
variance, non-normality of the measurement errors and sharply increasing
trends, we develop a time-varying parameter VAR with t-distributed measurement
errors and stochastic volatility. To control for overparameterization, we rely
on the Bayesian literature on shrinkage priors that enables us to shrink
coefficients associated with irrelevant predictors and/or perform model
specification in a flexible manner. Using around one year of daily data we
perform a real-time forecasting exercise and investigate whether any of the
proposed models is able to outperform the naive random walk benchmark. To
assess the economic relevance of the forecasting gains produced by the proposed
models we moreover run a simple trading exercise.

arXiv link: http://arxiv.org/abs/1801.06373v2

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2018-01-19

A Dirichlet Process Mixture Model of Discrete Choice

Authors: Rico Krueger, Akshay Vij, Taha H. Rashidi

We present a mixed multinomial logit (MNL) model, which leverages the
truncated stick-breaking process representation of the Dirichlet process as a
flexible nonparametric mixing distribution. The proposed model is a Dirichlet
process mixture model and accommodates discrete representations of
heterogeneity, like a latent class MNL model. Yet, unlike a latent class MNL
model, the proposed discrete choice model does not require the analyst to fix
the number of mixture components prior to estimation, as the complexity of the
discrete mixing distribution is inferred from the evidence. For posterior
inference in the proposed Dirichlet process mixture model of discrete choice,
we derive an expectation maximisation algorithm. In a simulation study, we
demonstrate that the proposed model framework can flexibly capture
differently-shaped taste parameter distributions. Furthermore, we empirically
validate the model framework in a case study on motorists' route choice
preferences and find that the proposed Dirichlet process mixture model of
discrete choice outperforms a latent class MNL model and mixed MNL models with
common parametric mixing distributions in terms of both in-sample fit and
out-of-sample predictive ability. Compared to extant modelling approaches, the
proposed discrete choice model substantially abbreviates specification
searches, as it relies on less restrictive parametric assumptions and does not
require the analyst to specify the complexity of the discrete mixing
distribution prior to estimation.
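
The truncated stick-breaking construction used as the mixing distribution
above is straightforward to simulate. The sketch below draws the mixture
weights and component-specific taste parameters, i.e. the generative side
only; the expectation maximisation step used for posterior inference is not
shown, and all parameter values are illustrative.

```python
# Minimal sketch of the truncated stick-breaking representation of a Dirichlet
# process: break a unit-length stick into K pieces, each piece the weight of
# one mixture component, then draw component-specific taste parameters.
import numpy as np

rng = np.random.default_rng(7)
K, alpha = 15, 1.0                                # truncation level, concentration
v = rng.beta(1.0, alpha, size=K)
v[-1] = 1.0                                       # close the stick at the truncation
remaining = np.concatenate(([1.0], np.cumprod(1 - v[:-1])))
weights = v * remaining                           # component weights, sum to one
weights /= weights.sum()                          # guard against rounding error
atoms = rng.normal(0.0, 2.0, size=(K, 3))         # component-specific tastes

component = rng.choice(K, size=1000, p=weights)   # latent class of each person
tastes = atoms[component]                         # individual taste vectors
print("active components among 1000 draws:", np.unique(component).size)
```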

arXiv link: http://arxiv.org/abs/1801.06296v1

Econometrics arXiv updated paper (originally submitted: 2018-01-15)

Panel Data Quantile Regression with Grouped Fixed Effects

Authors: Jiaying Gu, Stanislav Volgushev

This paper introduces estimation methods for grouped latent heterogeneity in
panel data quantile regression. We assume that the observed individuals come
from a heterogeneous population with a finite number of types. The number of
types and group membership is not assumed to be known in advance and is
estimated by means of a convex optimization problem. We provide conditions
under which group membership is estimated consistently and establish asymptotic
normality of the resulting estimators. Simulations show that the method works
well in finite samples when T is reasonably large. To illustrate the proposed
methodology we study the effects of the adoption of Right-to-Carry concealed
weapon laws on violent crime rates using panel data of 51 U.S. states from 1977
- 2010.

arXiv link: http://arxiv.org/abs/1801.05041v2

Econometrics arXiv paper, submitted: 2018-01-15

Characterizing Assumption of Rationality by Incomplete Information

Authors: Shuige Liu

We characterize common assumption of rationality of 2-person games within an
incomplete information framework. We use the lexicographic model with
incomplete information and show that a belief hierarchy expresses common
assumption of rationality within a complete information framework if and only
if there is a belief hierarchy within the corresponding incomplete information
framework that expresses common full belief in caution, rationality, every good
choice is supported, and prior belief in the original utility functions.

arXiv link: http://arxiv.org/abs/1801.04714v1

Econometrics arXiv updated paper (originally submitted: 2018-01-15)

Heterogeneous structural breaks in panel data models

Authors: Ryo Okui, Wendun Wang

This paper develops a new model and estimation procedure for panel data that
allows us to identify heterogeneous structural breaks. We model individual
heterogeneity using a grouped pattern. For each group, we allow common
structural breaks in the coefficients. However, the number, timing, and size of
these breaks can differ across groups. We develop a hybrid estimation procedure
of the grouped fixed effects approach and adaptive group fused Lasso. We show
that our method can consistently identify the latent group structure, detect
structural breaks, and estimate the regression parameters. Monte Carlo results
demonstrate the good performance of the proposed method in finite samples. An
empirical application to the relationship between income and democracy
illustrates the importance of considering heterogeneous structural breaks.

arXiv link: http://arxiv.org/abs/1801.04672v2

Econometrics arXiv updated paper (originally submitted: 2018-01-13)

Censored Quantile Instrumental Variable Estimation with Stata

Authors: Victor Chernozhukov, Iván Fernández-Val, Sukjin Han, Amanda Kowalski

Many applications involve a censored dependent variable and an endogenous
independent variable. Chernozhukov et al. (2015) introduced a censored quantile
instrumental variable estimator (CQIV) for use in those applications, which has
been applied by Kowalski (2016), among others. In this article, we introduce a
Stata command, cqiv, that simplifies application of the CQIV estimator in Stata.
We summarize the CQIV estimator and algorithm, we describe the use of the cqiv
command, and we provide empirical examples.

arXiv link: http://arxiv.org/abs/1801.05305v3

Econometrics arXiv updated paper (originally submitted: 2018-01-11)

Hyper-rational choice theory

Authors: Madjid Eshaghi Gordji, Gholamreza Askari

Rational choice theory is based on the idea that people rationally pursue
goals to increase their personal interests. In most conditions, an actor's
behavior is not independent of his own and others' behavior. Here, we present
a new concept of rational choice, the hyper-rational choice, in which the
actor considers the profit or loss of other actors in addition to his personal
profit or loss and then chooses an action that is desirable to him. We use the
hyper-rational choice to generalize and expand game theory. The results of
this study help to model people's behavior while taking into account
environmental conditions, the kind of interactive behavior, the valuation
system of oneself and others, and societies' systems of beliefs and internal
values. Hyper-rationality helps us understand how human decision makers behave
in interactive decisions.

arXiv link: http://arxiv.org/abs/1801.10520v2

Econometrics arXiv paper, submitted: 2018-01-11

Solving Dynamic Discrete Choice Models: Integrated or Expected Value Function?

Authors: Patrick Kofod Mogensen

Dynamic Discrete Choice Models (DDCMs) are important in the structural
estimation literature. Since the structural errors are practically always
continuous and unbounded in nature, researchers often use the expected value
function. The idea to solve for the expected value function made solution more
practical and estimation feasible. However, as we show in this paper, the
expected value function is impractical compared to an alternative: the
integrated (ex ante) value function. We provide brief descriptions of the
inefficacy of the former, and benchmarks on actual problems with varying
cardinality of the state space and number of decisions. Though the two
approaches solve the same problem in theory, the benchmarks support the claim
that the integrated value function is preferred in practice.
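
With type-I extreme value structural errors, the integrated (ex ante) value
function satisfies a log-sum-exp fixed point, which is what makes it
convenient to work with in practice. The sketch below solves this fixed point
by successive approximation on a small made-up state space; the payoffs and
transition matrices are illustrative, not taken from the paper's benchmarks.

```python
# Minimal sketch: solve for the integrated (ex ante) value function V of a
# dynamic discrete choice model with type-I extreme value errors,
#   V(s) = log sum_a exp( u(s,a) + beta * sum_{s'} P(s'|s,a) * V(s') ),
# (Euler's constant omitted) by successive approximation.
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(8)
S, A, beta = 20, 3, 0.95
u = rng.normal(size=(S, A))                       # flow payoffs u(s, a)
P = rng.random((A, S, S))
P /= P.sum(axis=2, keepdims=True)                 # transitions P(s'|s, a)

V = np.zeros(S)
for _ in range(2000):                             # contraction iteration
    choice_values = u + beta * np.stack([P[a] @ V for a in range(A)], axis=1)
    V_new = logsumexp(choice_values, axis=1)
    converged = np.max(np.abs(V_new - V)) < 1e-10
    V = V_new
    if converged:
        break

ccp = np.exp(choice_values - V[:, None])          # implied choice probabilities
print("conditional choice probabilities at state 0:", ccp[0].round(3))
```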

arXiv link: http://arxiv.org/abs/1801.03978v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2018-01-09

Assessing the effect of advertising expenditures upon sales: a Bayesian structural time series model

Authors: Víctor Gallego, Pablo Suárez-García, Pablo Angulo, David Gómez-Ullate

We propose a robust implementation of the Nerlove--Arrow model using a
Bayesian structural time series model to explain the relationship between
advertising expenditures of a country-wide fast-food franchise network with its
weekly sales. Thanks to the flexibility and modularity of the model, it is well
suited to generalization to other markets or situations. Its Bayesian nature
facilitates incorporating a priori information (the manager's views),
which can be updated with relevant data. This aspect of the model will be used
to present a strategy of budget scheduling across time and channels.

arXiv link: http://arxiv.org/abs/1801.03050v3

Econometrics arXiv updated paper (originally submitted: 2018-01-09)

Implications of macroeconomic volatility in the Euro area

Authors: Niko Hauzenberger, Maximilian Böck, Michael Pfarrhofer, Anna Stelzer, Gregor Zens

In this paper we estimate a Bayesian vector autoregressive model with factor
stochastic volatility in the error term to assess the effects of an uncertainty
shock in the Euro area. This allows us to treat macroeconomic uncertainty as a
latent quantity during estimation. Only a limited number of contributions to
the literature estimate uncertainty and its macroeconomic consequences jointly,
and most are based on single country models. We analyze the special case of a
shock restricted to the Euro area, where member states are highly related by
construction. We find significant results of a decrease in real activity for
all countries over a period of roughly a year following an uncertainty shock.
Moreover, equity prices, short-term interest rates and exports tend to decline,
while unemployment levels increase. Dynamic responses across countries differ
slightly in magnitude and duration, with Ireland, Slovakia and Greece
exhibiting different reactions for some macroeconomic fundamentals.

arXiv link: http://arxiv.org/abs/1801.02925v2

Econometrics arXiv cross-link from eess.SP (eess.SP), submitted: 2018-01-09

Dynamic Pricing and Energy Management Strategy for EV Charging Stations under Uncertainties

Authors: Chao Luo, Yih-Fang Huang, Vijay Gupta

This paper presents a dynamic pricing and energy management framework for
electric vehicle (EV) charging service providers. To set the charging prices,
the service provider faces three uncertainties: the volatility of wholesale
electricity price, intermittent renewable energy generation, and
spatial-temporal EV charging demand. The main objective of our work here is to
help charging service providers to improve their total profits while enhancing
customer satisfaction and maintaining power grid stability, taking into account
those uncertainties. We employ a linear regression model to estimate the EV
charging demand at each charging station, and introduce a quantitative measure
for customer satisfaction. Both the greedy algorithm and the dynamic
programming (DP) algorithm are employed to derive the optimal charging prices
and determine how much electricity to purchase from the wholesale market in
each planning horizon. Simulation results show that DP algorithm achieves an
increased profit (up to 9%) compared to the greedy algorithm (the benchmark
algorithm) under certain scenarios. Additionally, we observe that the
integration of low-cost energy storage into the system can not only improve
the profit, but also smooth out the charging price fluctuation, protecting the
end customers from the volatile wholesale market.

arXiv link: http://arxiv.org/abs/1801.02783v1

Econometrics arXiv updated paper (originally submitted: 2018-01-08)

Revealed Price Preference: Theory and Empirical Analysis

Authors: Rahul Deb, Yuichi Kitamura, John K. -H. Quah, Jörg Stoye

To determine the welfare implications of price changes in demand data, we
introduce a revealed preference relation over prices. We show that the absence
of cycles in this relation characterizes a consumer who trades off the utility
of consumption against the disutility of expenditure. Our model can be applied
whenever a consumer's demand over a strict subset of all available goods is
being analyzed; it can also be extended to settings with discrete goods and
nonlinear prices. To illustrate its use, we apply our model to a single-agent
data set and to a data set with repeated cross-sections. We develop a novel
test of linear hypotheses on partially identified parameters to estimate the
proportion of the population who are revealed better off due to a price change
in the latter application. This new technique can be used for nonparametric
counterfactual analysis more broadly.

arXiv link: http://arxiv.org/abs/1801.02702v3

Econometrics arXiv paper, submitted: 2018-01-07

On a Constructive Theory of Markets

Authors: Steven D. Moffitt

This article is a prologue to the article "Why Markets are Inefficient: A
Gambling 'Theory' of Financial Markets for Practitioners and Theorists." It
presents important background for that article --- why gambling is important,
even necessary, for real-world traders --- the reason for the superiority of
the strategic/gambling approach to the competing market ideologies of market
fundamentalism and the scientific approach --- and its potential to uncover
profitable trading systems. Much of this article was drawn from Chapter 1 of
the book "The Strategic Analysis of Financial Markets (in 2 volumes)" World
Scientific, 2017.

arXiv link: http://arxiv.org/abs/1801.02994v1

Econometrics arXiv cross-link from eess.SP (eess.SP), submitted: 2018-01-07

A Consumer Behavior Based Approach to Multi-Stage EV Charging Station Placement

Authors: Chao Luo, Yih-Fang Huang, Vijay Gupta

This paper presents a multi-stage approach to the placement of charging
stations under the scenarios of different electric vehicle (EV) penetration
rates. The EV charging market is modeled as an oligopoly. A consumer behavior
based approach is applied to forecast the charging demand of the charging
stations using a nested logit model. The impacts of both the urban road network
and the power grid network on charging station planning are also considered. At
each planning stage, the optimal station placement strategy is derived through
solving a Bayesian game among the service providers. To investigate the
interplay of the travel pattern, the consumer behavior, urban road network,
power grid network, and the charging station placement, a simulation platform
(The EV Virtual City 1.0) is developed using Java on Repast. We conduct a case
study in the San Pedro District of Los Angeles by importing the geographic and
demographic data of that region into the platform. The simulation results
demonstrate a strong consistency between the charging station placement and the
traffic flow of EVs. The results also reveal an interesting phenomenon that
service providers prefer clustering instead of spatial separation in this
oligopoly market.

arXiv link: http://arxiv.org/abs/1801.02135v1

Econometrics arXiv cross-link from eess.SP (eess.SP), submitted: 2018-01-07

Placement of EV Charging Stations --- Balancing Benefits among Multiple Entities

Authors: Chao Luo, Yih-Fang Huang, Vijay Gupta

This paper studies the problem of multi-stage placement of electric vehicle
(EV) charging stations with incremental EV penetration rates. A nested logit
model is employed to analyze the charging preference of the individual consumer
(EV owner), and predict the aggregated charging demand at the charging
stations. The EV charging industry is modeled as an oligopoly where the entire
market is dominated by a few charging service providers (oligopolists). At the
beginning of each planning stage, an optimal placement policy for each service
provider is obtained through analyzing strategic interactions in a Bayesian
game. To derive the optimal placement policy, we consider both the
transportation network graph and the electric power network graph. A simulation
software --- The EV Virtual City 1.0 --- is developed using Java to investigate
the interactions among the consumers (EV owner), the transportation network
graph, the electric power network graph, and the charging stations. Through a
series of experiments using the geographic and demographic data from the San
Pedro District of Los Angeles, we show that the charging station
placement is highly consistent with the heatmap of the traffic flow. In
addition, we observe a spatial economic phenomenon that service providers
prefer clustering instead of separation in the EV charging market.

arXiv link: http://arxiv.org/abs/1801.02129v1

Econometrics arXiv cross-link from eess.SP (eess.SP), submitted: 2018-01-07

Stochastic Dynamic Pricing for EV Charging Stations with Renewables Integration and Energy Storage

Authors: Chao Luo, Yih-Fang Huang, Vijay Gupta

This paper studies the problem of stochastic dynamic pricing and energy
management policy for electric vehicle (EV) charging service providers. In the
presence of renewable energy integration and energy storage system, EV charging
service providers must deal with multiple uncertainties --- charging demand
volatility, inherent intermittency of renewable energy generation, and
wholesale electricity price fluctuation. The motivation behind our work is to
offer guidelines for charging service providers to determine proper charging
prices and manage electricity to balance the competing objectives of improving
profitability, enhancing customer satisfaction, and reducing impact on power
grid in spite of these uncertainties. We propose a new metric to assess the
impact on power grid without solving complete power flow equations. To protect
service providers from severe financial losses, a safeguard of profit is
incorporated in the model. Two algorithms --- stochastic dynamic programming
(SDP) algorithm and greedy algorithm (benchmark algorithm) --- are applied to
derive the pricing and electricity procurement policy. A Pareto front of the
multiobjective optimization is derived. Simulation results show that using SDP
algorithm can achieve up to 7% profit gain over using greedy algorithm.
Additionally, we observe that the charging service provider is able to reshape
spatial-temporal charging demands to reduce the impact on power grid via
pricing signals.

arXiv link: http://arxiv.org/abs/1801.02128v1

Econometrics arXiv paper, submitted: 2018-01-06

Why Markets are Inefficient: A Gambling "Theory" of Financial Markets For Practitioners and Theorists

Authors: Steven D. Moffitt

The purpose of this article is to propose a new "theory," the Strategic
Analysis of Financial Markets (SAFM) theory, that explains the operation of
financial markets using the analytical perspective of an enlightened gambler.
The gambler understands that all opportunities for superior performance arise
from suboptimal decisions by humans, but understands also that knowledge of
human decision making alone is not enough to understand market behavior --- one
must still model how those decisions lead to market prices. Thus there are
three parts to the model: gambling theory, human decision making, and strategic
problem solving. A new theory is necessary because at this writing in 2017,
there is no theory of financial markets acceptable to both practitioners and
theorists. Theorists' efficient market theory, for example, cannot explain
bubbles and crashes nor the exceptional returns of famous investors and
speculators such as Warren Buffett and George Soros. At the same time, a new
theory must be sufficiently quantitative, explain market "anomalies" and
provide predictions in order to satisfy theorists. It is hoped that the SAFM
framework will meet these requirements.

arXiv link: http://arxiv.org/abs/1801.01948v1

Econometrics arXiv paper, submitted: 2018-01-05

Does it Pay to Buy the Pot in the Canadian 6/49 Lotto? Implications for Lottery Design

Authors: Steven D. Moffitt, William T. Ziemba

Despite its unusual payout structure, the Canadian 6/49 Lotto is one of the
few government sponsored lotteries that has the potential for a favorable
strategy we call "buying the pot." By buying the pot we mean that a syndicate
buys each ticket in the lottery, ensuring that it holds a jackpot winner. We
assume that the other bettors independently buy small numbers of tickets. This
paper presents (1) a formula for the syndicate's expected return, (2)
conditions under which buying the pot produces a significant positive expected
return, and (3) the implications of these findings for lottery design.

arXiv link: http://arxiv.org/abs/1801.02959v1
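
For intuition only, the back-of-the-envelope Python sketch below computes the
expected jackpot share for a syndicate holding every combination when the other
bettors pick tickets independently and uniformly. It is not the paper's
formula: the prize value, ticket price and outside sales are made-up
assumptions, and smaller prizes and the carryover pool are ignored.

    from math import comb

    N = comb(49, 6)            # number of combinations in a 6/49 lottery
    ticket_price = 1.0         # illustrative values, not calibrated to Lotto 6/49
    jackpot = 10_000_000.0
    m = 5_000_000              # tickets bought by the other players
    p = 1.0 / N                # chance that an outside ticket hits the jackpot

    # E[jackpot / (1 + X)] with X ~ Binomial(m, p), via the identity
    # E[1 / (1 + X)] = (1 - (1 - p)**(m + 1)) / ((m + 1) * p)
    expected_share = jackpot * (1 - (1 - p) ** (m + 1)) / ((m + 1) * p)

    cost = N * ticket_price
    print(f"expected jackpot share: {expected_share:,.0f}")
    print(f"expected return (jackpot only): {expected_share / cost - 1:+.1%}")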

Econometrics arXiv paper, submitted: 2018-01-05

A Method for Winning at Lotteries

Authors: Steven D. Moffitt, William T. Ziemba

We report a new result on lotteries --- that a well-funded syndicate has a
purely mechanical strategy to achieve expected returns of 10% to 25% in an
equiprobable lottery with no take and no carryover pool. We prove that an
optimal strategy (Nash equilibrium) in a game between the syndicate and other
players consists of betting one of each ticket (the "trump ticket"), and extend
that result to proportional ticket selection in non-equiprobable lotteries. The
strategy can be adjusted to accommodate lottery taxes and carryover pools. No
"irrationality" need be involved for the strategy to succeed --- it requires
only that a large group of non-syndicate bettors each choose a few tickets
independently.

arXiv link: http://arxiv.org/abs/1801.02958v1

Econometrics arXiv cross-link from q-fin.CP (q-fin.CP), submitted: 2018-01-05

SABCEMM-A Simulator for Agent-Based Computational Economic Market Models

Authors: Torsten Trimborn, Philipp Otte, Simon Cramer, Max Beikirch, Emma Pabich, Martin Frank

We introduce the simulation tool SABCEMM (Simulator for Agent-Based
Computational Economic Market Models) for agent-based computational economic
market (ABCEM) models. Our simulation tool is implemented in C++ and we can
easily run ABCEM models with several million agents. The object-oriented
software design enables the isolated implementation of building blocks for
ABCEM models, such as agent types and market mechanisms. The user can design
and compare ABCEM models in a unified environment by recombining existing
building blocks using the XML-based SABCEMM configuration file. We introduce an
abstract ABCEM model class which our simulation tool is built upon.
Furthermore, we present the software architecture as well as computational
aspects of SABCEMM. Here, we focus on the efficiency of SABCEMM with respect to
the run time of our simulations. We show the great impact of different random
number generators on the run time of ABCEM models. The code and documentation
are published on GitHub at https://github.com/SABCEMM/SABCEMM, so that all
results can be reproduced by the reader.

arXiv link: http://arxiv.org/abs/1801.01811v2

Econometrics arXiv paper, submitted: 2018-01-05

Dynamic and granular loss reserving with copulae

Authors: Matúš Maciak, Ostap Okhrin, Michal Pešta

Intensive research on stochastic methods in insurance has emerged in recent
years. To meet all future claims arising from policies, it is necessary
to quantify the outstanding loss liabilities. Loss reserving methods based on
aggregated data from run-off triangles are predominantly used to calculate the
claims reserves. Conventional reserving techniques have several disadvantages:
loss of information about policy and claim development due to the aggregation;
zero or negative cells in the triangle; a usually small number of
observations in the triangle; only a few observations for recent accident
years; and sensitivity to the most recent paid claims.
To overcome these problems, granular loss reserving methods for individual
claim-by-claim data will be derived. Reserve estimation is a crucial part of
the risk valuation process, which is currently a pressing issue in economics. Since
there is a growing demand for prediction of total reserves for different types
of claims or even multiple lines of business, a time-varying copula framework
for granular reserving will be established.

arXiv link: http://arxiv.org/abs/1801.01792v1

Econometrics arXiv updated paper (originally submitted: 2018-01-03)

Comparing the Forecasting Performances of Linear Models for Electricity Prices with High RES Penetration

Authors: Angelica Gianfreda, Francesco Ravazzolo, Luca Rossini

This paper compares alternative univariate versus multivariate models,
frequentist versus Bayesian autoregressive and vector autoregressive
specifications, for hourly day-ahead electricity prices, both with and without
renewable energy sources. The accuracy of point and density forecasts is
inspected in four main European markets (Germany, Denmark, Italy and Spain)
characterized by different levels of renewable energy power generation. Our
results show that the Bayesian VAR specifications with exogenous variables
dominate other multivariate and univariate specifications, in terms of both
point and density forecasting.

arXiv link: http://arxiv.org/abs/1801.01093v3

Econometrics arXiv paper, submitted: 2018-01-03

A New Wald Test for Hypothesis Testing Based on MCMC outputs

Authors: Yong Li, Xiaobin Liu, Jun Yu, Tao Zeng

In this paper, a new and convenient $\chi^2$ Wald test based on MCMC outputs
is proposed for hypothesis testing. The new statistic can be interpreted as an
MCMC version of the Wald test and has several important advantages that make it
very convenient in practical applications. First, it is well-defined under
improper prior distributions and avoids the Jeffreys-Lindley paradox. Second,
its asymptotic distribution can be proved to follow the $\chi^2$ distribution,
so that the threshold values can be easily calibrated from this distribution.
Third, its statistical error can be derived using the Markov chain Monte Carlo
(MCMC) approach. Fourth, and most importantly, it is based only on the
MCMC random samples drawn from the posterior distribution. Hence, it is simply
a by-product of the posterior outputs and very easy to compute. In addition,
when prior information is available, finite-sample theory is derived
for the proposed test statistic. Finally, the usefulness of the test is
illustrated with several applications to latent variable models widely used in
economics and finance.

arXiv link: http://arxiv.org/abs/1801.00973v1
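
The sketch below illustrates the generic shape of such a statistic: a Wald-type
quadratic form assembled from the posterior mean and posterior covariance of
MCMC draws and compared with a chi-square critical value. It is not the exact
statistic proposed in the paper; the draws and the restrictions are simulated
stand-ins.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    draws = rng.multivariate_normal([0.5, -0.2, 0.1], 0.01 * np.eye(3),
                                    size=5000)        # stand-in for MCMC output

    R = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])       # restrictions: theta_1 = 0 and theta_2 = 0
    r = np.zeros(2)

    theta_hat = draws.mean(axis=0)        # posterior mean as the point estimate
    V_hat = np.cov(draws, rowvar=False)   # posterior covariance of the draws

    diff = R @ theta_hat - r
    W = diff @ np.linalg.solve(R @ V_hat @ R.T, diff)
    p_value = stats.chi2.sf(W, df=len(r))
    print(f"Wald-type statistic: {W:.2f}, p-value: {p_value:.4f}")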

Econometrics arXiv cross-link from cs.CC (cs.CC), submitted: 2018-01-02

Complexity Theory, Game Theory, and Economics: The Barbados Lectures

Authors: Tim Roughgarden

This document collects the lecture notes from my mini-course "Complexity
Theory, Game Theory, and Economics," taught at the Bellairs Research Institute
of McGill University, Holetown, Barbados, February 19--23, 2017, as the 29th
McGill Invitational Workshop on Computational Complexity.
The goal of this mini-course is twofold: (i) to explain how complexity theory
has helped illuminate several barriers in economics and game theory; and (ii)
to illustrate how game-theoretic questions have led to new and interesting
complexity theory, including several recent breakthroughs. It consists of two
five-lecture sequences: the Solar Lectures, focusing on the communication and
computational complexity of computing equilibria; and the Lunar Lectures,
focusing on applications of complexity theory in game theory and economics. No
background in game theory is assumed.

arXiv link: http://arxiv.org/abs/1801.00734v3

Econometrics arXiv paper, submitted: 2017-12-31

Resource Abundance and Life Expectancy

Authors: Bahram Sanginabadi

This paper investigates the impacts of major natural resource discoveries
since 1960 on life expectancy in nations that were resource poor prior
to the discoveries. Previous literature explains the relation between a nation's
wealth and life expectancy, but it has been silent about the impacts of
resource discoveries on life expectancy. We attempt to fill this gap in this
study. An important advantage of this study is that, as previous researchers
have argued, resource discovery can be treated as an exogenous variable. We use
longitudinal data from 1960 to 2014 and apply three modern empirical methods,
namely Difference-in-Differences, Event studies, and the Synthetic Control
approach, to investigate the main question of the research: how do resource
discoveries affect life expectancy? The findings show that resource
discoveries in Ecuador, Yemen, Oman, and Equatorial Guinea have positive and
significant impacts on life expectancy, but the effects for the European
countries are mostly negative.

arXiv link: http://arxiv.org/abs/1801.00369v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2017-12-31

Estimation and Inference of Treatment Effects with $L_2$-Boosting in High-Dimensional Settings

Authors: Jannis Kueck, Ye Luo, Martin Spindler, Zigan Wang

Empirical researchers are increasingly faced with rich data sets containing
many controls or instrumental variables, making it essential to choose an
appropriate approach to variable selection. In this paper, we provide results
for valid inference after post- or orthogonal $L_2$-Boosting is used for
variable selection. We consider treatment effects after selecting among many
control variables and instrumental variable models with potentially many
instruments. To achieve this, we establish new results for the rate of
convergence of iterated post-$L_2$-Boosting and orthogonal $L_2$-Boosting in a
high-dimensional setting similar to Lasso, i.e., under approximate sparsity
without assuming the beta-min condition. These results are extended to the 2SLS
framework and valid inference is provided for treatment effect analysis. We
give extensive simulation results for the proposed methods and compare them
with Lasso. In an empirical application, we construct efficient IVs with our
proposed methods to estimate the effect of pre-merger overlap of bank branch
networks in the US on the post-merger stock returns of the acquirer bank.

arXiv link: http://arxiv.org/abs/1801.00364v2

Econometrics arXiv updated paper (originally submitted: 2017-12-31)

Confidence set for group membership

Authors: Andreas Dzemski, Ryo Okui

Our confidence set quantifies the statistical uncertainty from data-driven
group assignments in grouped panel models. It covers the true group memberships
jointly for all units with pre-specified probability and is constructed by
inverting many simultaneous unit-specific one-sided tests for group membership.
We justify our approach under $N, T \to \infty$ asymptotics using tools from
high-dimensional statistics, some of which we extend in this paper. We provide
Monte Carlo evidence that the confidence set has adequate coverage in finite
samples. An empirical application illustrates the use of our confidence set.

arXiv link: http://arxiv.org/abs/1801.00332v6

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2017-12-28

Debiased Machine Learning of Set-Identified Linear Models

Authors: Vira Semenova

This paper provides estimation and inference methods for an identified set's
boundary (i.e., support function) where the selection among a very large number
of covariates is based on modern regularized tools. I characterize the boundary
using a semiparametric moment equation. Combining Neyman-orthogonality and
sample splitting ideas, I construct a root-N consistent, uniformly
asymptotically Gaussian estimator of the boundary and propose a multiplier
bootstrap procedure to conduct inference. I apply this result to the partially
linear model, the partially linear IV model and the average partial derivative
with an interval-valued outcome.

arXiv link: http://arxiv.org/abs/1712.10024v6

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2017-12-26

Variational Bayes Estimation of Discrete-Margined Copula Models with Application to Time Series

Authors: Ruben Loaiza-Maya, Michael Stanley Smith

We propose a new variational Bayes estimator for high-dimensional copulas
with discrete, or a combination of discrete and continuous, margins. The method
is based on a variational approximation to a tractable augmented posterior, and
is faster than previous likelihood-based approaches. We use it to estimate
drawable vine copulas for univariate and multivariate Markov ordinal and mixed
time series. These have dimension $rT$, where $T$ is the number of observations
and $r$ is the number of series, and are difficult to estimate using previous
methods. The vine pair-copulas are carefully selected to allow for
heteroskedasticity, which is a feature of most ordinal time series data. When
combined with flexible margins, the resulting time series models also allow for
other common features of ordinal data, such as zero inflation, multiple modes
and under- or over-dispersion. Using six example series, we illustrate both the
flexibility of the time series copula models, and the efficacy of the
variational Bayes estimator for copulas of up to 792 dimensions and 60
parameters. This far exceeds the size and complexity of copula models for
discrete data that can be estimated using previous methods.

arXiv link: http://arxiv.org/abs/1712.09150v2

Econometrics arXiv updated paper (originally submitted: 2017-12-25)

An Exact and Robust Conformal Inference Method for Counterfactual and Synthetic Controls

Authors: Victor Chernozhukov, Kaspar Wüthrich, Yinchu Zhu

We introduce new inference procedures for counterfactual and synthetic
control methods for policy evaluation. We recast the causal inference problem
as a counterfactual prediction and a structural breaks testing problem. This
allows us to exploit insights from conformal prediction and structural breaks
testing to develop permutation inference procedures that accommodate modern
high-dimensional estimators, are valid under weak and easy-to-verify
conditions, and are provably robust against misspecification. Our methods work
in conjunction with many different approaches for predicting counterfactual
mean outcomes in the absence of the policy intervention. Examples include
synthetic controls, difference-in-differences, factor and matrix completion
models, and (fused) time series panel data models. Our approach demonstrates an
excellent small-sample performance in simulations and is taken to a data
application where we re-evaluate the consequences of decriminalizing indoor
prostitution. Open-source software for implementing our conformal inference
methods is available.

arXiv link: http://arxiv.org/abs/1712.09089v10
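
A stripped-down Python version of the permutation logic: fit a counterfactual
model imposing the null of no effect (here just a mean), and compare the
post-period residual block against all cyclic shifts of the residuals. The
paper develops this idea for much richer counterfactual estimators and provides
formal guarantees; all numbers below are simulated.

    import numpy as np

    rng = np.random.default_rng(7)
    T0, T1 = 80, 10                        # pre- and post-intervention lengths
    y = rng.normal(size=T0 + T1)
    y[T0:] += 1.0                          # a policy effect in the post period

    resid = y - y.mean()                   # residuals under the null of no effect
    stat = np.mean(np.abs(resid[T0:]))     # discrepancy in the post-period block
    perm = [np.mean(np.abs(np.roll(resid, s)[T0:])) for s in range(T0 + T1)]
    p_value = np.mean([q >= stat for q in perm])
    print(f"permutation p-value for the null of no policy effect: {p_value:.3f}")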

Econometrics arXiv updated paper (originally submitted: 2017-12-21)

Simultaneous Confidence Intervals for High-dimensional Linear Models with Many Endogenous Variables

Authors: Alexandre Belloni, Christian Hansen, Whitney Newey

High-dimensional linear models with endogenous variables play an increasingly
important role in recent econometric literature. In this work we allow for
models with many endogenous variables and many instrument variables to achieve
identification. Because of the high-dimensionality in the second stage,
constructing honest confidence regions with asymptotically correct coverage is
non-trivial. Our main contribution is to propose estimators and confidence
regions that would achieve that. The approach relies on moment conditions that
have an additional orthogonal property with respect to nuisance parameters.
Moreover, estimation of high-dimension nuisance parameters is carried out via
new pivotal procedures. In order to achieve simultaneously valid confidence
regions we use a multiplier bootstrap procedure to compute critical values and
establish its validity.

arXiv link: http://arxiv.org/abs/1712.08102v4

Econometrics arXiv paper, submitted: 2017-12-21

On Long Memory Origins and Forecast Horizons

Authors: J. Eduardo Vera-Valdés

Most long memory forecasting studies assume that the memory is generated by
the fractional difference operator. We argue that the most cited theoretical
arguments for the presence of long memory do not imply the fractional
difference operator, and assess the performance of the autoregressive
fractionally integrated moving average $(ARFIMA)$ model when forecasting series
with long memory generated by nonfractional processes. We find that high-order
autoregressive $(AR)$ models produce forecast performance similar or superior
to that of $ARFIMA$ models at short horizons. Nonetheless, as the forecast horizon
increases, the $ARFIMA$ models tend to dominate in forecast performance. Hence,
$ARFIMA$ models are well suited for forecasts of long memory processes
regardless of the long memory generating mechanism, particularly for medium and
long forecast horizons. Additionally, we analyse the forecasting performance of
the heterogeneous autoregressive ($HAR$) model which imposes restrictions on
high-order $AR$ models. We find that the structure imposed by the $HAR$ model
produces better long horizon forecasts than $AR$ models of the same order, at
the price of inferior short horizon forecasts in some cases. Our results have
implications for, among others, Climate Econometrics and Financial Econometrics
models dealing with long memory series at different forecast horizons. We show
in an example that while a short memory autoregressive moving average $(ARMA)$
model gives the best performance when forecasting the Realized Variance of the
S&P 500 up to a month ahead, the $ARFIMA$ model gives the best performance for
longer forecast horizons.

arXiv link: http://arxiv.org/abs/1712.08057v1
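
To make the comparison concrete, the Python sketch below simulates an
ARFIMA(0, d, 0) series by truncated fractional differencing and fits a
high-order AR model as a short-horizon competitor. The truncation length, lag
order and sample size are arbitrary choices, not those used in the paper.

    import numpy as np
    from statsmodels.tsa.ar_model import AutoReg

    rng = np.random.default_rng(2)
    d, T, ntrunc = 0.3, 2000, 1000

    # MA(infinity) weights of (1 - L)^(-d): psi_0 = 1, psi_k = psi_{k-1}*(k-1+d)/k
    psi = np.ones(ntrunc)
    for k in range(1, ntrunc):
        psi[k] = psi[k - 1] * (k - 1 + d) / k

    eps = rng.normal(size=T + ntrunc)
    x = np.array([psi @ eps[t:t + ntrunc][::-1] for t in range(T)])  # long memory

    ar_fit = AutoReg(x, lags=20).fit()     # high-order AR competitor
    print(ar_fit.forecast(steps=5))        # short-horizon forecasts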

Econometrics arXiv updated paper (originally submitted: 2017-12-20)

Cointegration in functional autoregressive processes

Authors: Massimo Franchi, Paolo Paruolo

This paper defines the class of $H$-valued autoregressive (AR)
processes with a unit root of finite type, where $H$ is an infinite
dimensional separable Hilbert space, and derives a generalization of the
Granger-Johansen Representation Theorem valid for any integration order
$d=1,2,\dots$. An existence theorem shows that the solution of an AR with a
unit root of finite type is necessarily integrated of some finite integer $d$
and displays a common trends representation with a finite number of common
stochastic trends of the type of (cumulated) bilateral random walks and an
infinite dimensional cointegrating space. A characterization theorem clarifies
the connections between the structure of the AR operators and $(i)$ the order
of integration, $(ii)$ the structure of the attractor space and the
cointegrating space, $(iii)$ the expression of the cointegrating relations, and
$(iv)$ the Triangular representation of the process. Except for the fact that
the number of cointegrating relations that are integrated of order 0 is
infinite, the representation of $H$-valued ARs with a unit root of
finite type coincides with that of usual finite dimensional VARs, which
corresponds to the special case $H=\mathbb{R}^p$.

arXiv link: http://arxiv.org/abs/1712.07522v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2017-12-20

Transformation Models in High-Dimensions

Authors: Sven Klaassen, Jannis Kueck, Martin Spindler

Transformation models are a very important tool for applied statisticians and
econometricians. In many applications, the dependent variable is transformed so
that homogeneity or normal distribution of the error holds. In this paper, we
analyze transformation models in a high-dimensional setting, where the set of
potential covariates is large. We propose an estimator for the transformation
parameter and we show that it is asymptotically normally distributed using an
orthogonalized moment condition where the nuisance functions depend on the
target parameter. In a simulation study, we show that the proposed estimator
works well in small samples. A common practice in labor economics is to
transform wage with the log-function. In this study, we test if this
transformation holds in CPS data from the United States.

arXiv link: http://arxiv.org/abs/1712.07364v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2017-12-19

Towards a General Large Sample Theory for Regularized Estimators

Authors: Michael Jansson, Demian Pouzo

We present a general framework for studying regularized estimators; such
estimators are pervasive in estimation problems wherein "plug-in" type
estimators are either ill-defined or ill-behaved. Within this framework, we
derive, under primitive conditions, consistency and a generalization of the
asymptotic linearity property. We also provide data-driven methods for choosing
tuning parameters that, under some conditions, achieve the aforementioned
properties. We illustrate the scope of our approach by presenting a wide range
of applications.

arXiv link: http://arxiv.org/abs/1712.07248v4

Econometrics arXiv updated paper (originally submitted: 2017-12-14)

Assessment Voting in Large Electorates

Authors: Hans Gersbach, Akaki Mamageishvili, Oriol Tejada

We analyze Assessment Voting, a new two-round voting procedure that can be
applied to binary decisions in democratic societies. In the first round, a
randomly-selected number of citizens cast their vote on one of the two
alternatives at hand, thereby irrevocably exercising their right to vote. In
the second round, after the results of the first round have been published, the
remaining citizens decide whether to vote for one alternative or to abstain.
The votes from both rounds are aggregated, and the final outcome is obtained by
applying the majority rule, with ties being broken by fair randomization.
Within a costly voting framework, we show that large electorates will choose
the preferred alternative of the majority with high probability, and that
average costs will be low. This result is in contrast with the literature on
one-round voting, which predicts either higher voting costs (when voting is
compulsory) or decisions that often do not represent the preferences of the
majority (when voting is voluntary).

arXiv link: http://arxiv.org/abs/1712.05470v2

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2017-12-13

Quasi-Oracle Estimation of Heterogeneous Treatment Effects

Authors: Xinkun Nie, Stefan Wager

Flexible estimation of heterogeneous treatment effects lies at the heart of
many statistical challenges, such as personalized medicine and optimal resource
allocation. In this paper, we develop a general class of two-step algorithms
for heterogeneous treatment effect estimation in observational studies. We
first estimate marginal effects and treatment propensities in order to form an
objective function that isolates the causal component of the signal. Then, we
optimize this data-adaptive objective function. Our approach has several
advantages over existing methods. From a practical perspective, our method is
flexible and easy to use: In both steps, we can use any loss-minimization
method, e.g., penalized regression, deep neural networks, or boosting;
moreover, these methods can be fine-tuned by cross validation. Meanwhile, in
the case of penalized kernel regression, we show that our method has a
quasi-oracle property: Even if the pilot estimates for marginal effects and
treatment propensities are not particularly accurate, we achieve the same error
bounds as an oracle who has a priori knowledge of these two nuisance
components. We implement variants of our approach based on penalized
regression, kernel ridge regression, and boosting in a variety of simulation
setups, and find promising performance relative to existing baselines.

arXiv link: http://arxiv.org/abs/1712.04912v4
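
A compact Python sketch of the two-step recipe on simulated data: cross-fitted
estimates of the outcome and propensity models, followed by a weighted
regression that minimizes the residual-on-residual loss. The learners (gradient
boosting) and the data-generating process are illustrative choices, not those
of the paper.

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
    from sklearn.model_selection import cross_val_predict

    rng = np.random.default_rng(4)
    n = 4000
    X = rng.normal(size=(n, 5))
    e = 1 / (1 + np.exp(-X[:, 0]))               # true propensity
    W = rng.binomial(1, e)                       # treatment assignment
    tau = 1 + X[:, 1]                            # heterogeneous treatment effect
    Y = X[:, 0] + tau * W + rng.normal(size=n)

    # Step 1: cross-fitted estimates of m(x) = E[Y|X] and e(x) = P(W = 1|X)
    m_hat = cross_val_predict(GradientBoostingRegressor(), X, Y, cv=5)
    e_hat = cross_val_predict(GradientBoostingClassifier(), X, W, cv=5,
                              method="predict_proba")[:, 1]

    # Step 2: minimize the residual-on-residual loss; with weights (W - e_hat)^2
    # and pseudo-outcome (Y - m_hat)/(W - e_hat), a weighted fit targets tau(x)
    resid_w = W - e_hat
    pseudo = (Y - m_hat) / resid_w
    tau_model = GradientBoostingRegressor().fit(X, pseudo, sample_weight=resid_w ** 2)
    print("corr(tau_hat, tau):", np.corrcoef(tau_model.predict(X), tau)[0, 1].round(3))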

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2017-12-13

Fisher-Schultz Lecture: Generic Machine Learning Inference on Heterogenous Treatment Effects in Randomized Experiments, with an Application to Immunization in India

Authors: Victor Chernozhukov, Mert Demirer, Esther Duflo, Iván Fernández-Val

We propose strategies to estimate and make inference on key features of
heterogeneous effects in randomized experiments. These key features include
best linear predictors of the effects using machine learning proxies, average
effects sorted by impact groups, and average characteristics of most and least
impacted units. The approach is valid in high dimensional settings, where the
effects are proxied (but not necessarily consistently estimated) by predictive
and causal machine learning methods. We post-process these proxies into
estimates of the key features. Our approach is generic, it can be used in
conjunction with penalized methods, neural networks, random forests, boosted
trees, and ensemble methods, both predictive and causal. Estimation and
inference are based on repeated data splitting to avoid overfitting and achieve
validity. We use quantile aggregation of the results across many potential
splits, in particular taking medians of p-values and medians and other
quantiles of confidence intervals. We show that quantile aggregation lowers
estimation risks over a single split procedure, and establish its principal
inferential properties. Finally, our analysis reveals ways to build provably
better machine learning proxies through causal learning: we can use the
objective functions that we develop to construct the best linear predictors of
the effects, to obtain better machine learning proxies in the initial step. We
illustrate the use of both inferential tools and causal learners with a
randomized field experiment that evaluates a combination of nudges to stimulate
demand for immunization in India.

arXiv link: http://arxiv.org/abs/1712.04802v8
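
The repeated-splitting idea can be illustrated in a few lines of Python:
compute a p-value on each random split and aggregate across splits by taking
the median, doubled as a conservative adjustment for split uncertainty. This
toy version uses a simple difference-in-means test rather than the BLP and
GATES estimands developed in the paper.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    n = 2000
    d = rng.integers(0, 2, n)                   # randomized treatment indicator
    y = 1.0 * d + rng.normal(size=n)            # outcome with a true effect of 1

    p_values = []
    for _ in range(100):                        # 100 random half-splits
        idx = rng.permutation(n)[: n // 2]      # the half used for inference
        test = stats.ttest_ind(y[idx][d[idx] == 1], y[idx][d[idx] == 0])
        p_values.append(test.pvalue)

    adjusted_p = min(1.0, 2 * np.median(p_values))
    print(f"median p-value: {np.median(p_values):.3g}, adjusted: {adjusted_p:.3g}")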

Econometrics arXiv cross-link from Statistics – Applications (stat.AP), submitted: 2017-12-13

Finite-Sample Optimal Estimation and Inference on Average Treatment Effects Under Unconfoundedness

Authors: Timothy B. Armstrong, Michal Kolesár

We consider estimation and inference on average treatment effects under
unconfoundedness conditional on the realizations of the treatment variable and
covariates. Given nonparametric smoothness and/or shape restrictions on the
conditional mean of the outcome variable, we derive estimators and confidence
intervals (CIs) that are optimal in finite samples when the regression errors
are normal with known variance. In contrast to conventional CIs, our CIs use a
larger critical value that explicitly takes into account the potential bias of
the estimator. When the error distribution is unknown, feasible versions of our
CIs are valid asymptotically, even when $\sqrt{n}$-inference is not possible
due to lack of overlap, or low smoothness of the conditional mean. We also
derive the minimum smoothness conditions on the conditional mean that are
necessary for $\sqrt{n}$-inference. When the conditional mean is restricted to
be Lipschitz with a large enough bound on the Lipschitz constant, the optimal
estimator reduces to a matching estimator with the number of matches set to
one. We illustrate our methods in an application to the National Supported Work
Demonstration.

arXiv link: http://arxiv.org/abs/1712.04594v5

Econometrics arXiv paper, submitted: 2017-12-12

The Calculus of Democratization and Development

Authors: Jacob Ferguson

In accordance with "Democracy's Effect on Development: More Questions than
Answers", we seek to carry out a study in following the description in the
'Questions for Further Study.' To that end, we studied 33 countries in the
Sub-Saharan Africa region, who all went through an election which should signal
a "step-up" for their democracy, one in which previously homogenous regimes
transfer power to an opposition party that fairly won the election. After doing
so, liberal-democracy indicators and democracy indicators were evaluated in the
five years prior to and after the election took place, and over that ten-year
period, we examine the data for trends. If we see positive or negative trends
over this time horizon, we are able to conclude that it was the recent increase
in the quality of their democracy which led to it. Having investigated examples
of this in depth, there seem to be three main archetypes which drive the
results. Countries with positive results to their democracy from the election
have generally positive effects on their development, countries with more
"plateau" like results also did well, but countries for whom the descent to
authoritarianism was continued by this election found more negative results.

arXiv link: http://arxiv.org/abs/1712.04117v1

Econometrics arXiv updated paper (originally submitted: 2017-12-11)

Set Identified Dynamic Economies and Robustness to Misspecification

Authors: Andreas Tryphonides

We propose a new inferential methodology for dynamic economies that is robust
to misspecification of the mechanism generating frictions. Economies with
frictions are treated as perturbations of a frictionless economy that are
consistent with a variety of mechanisms. We derive a representation for the law
of motion for such economies and we characterize parameter set identification.
We derive a link from model aggregate predictions to distributional information
contained in qualitative survey data and specify conditions under which the
identified set is refined. The latter is used to semi-parametrically estimate
distortions due to frictions in macroeconomic variables. Based on these
estimates, we propose a novel test for complete models. Using consumer and
business survey data collected by the European Commission, we apply our method
to estimate distortions due to financial frictions in the Spanish economy. We
investigate the implications of these estimates for the adequacy of the
standard model of financial frictions SW-BGG (Smets and Wouters (2007),
Bernanke, Gertler, and Gilchrist (1999)).

arXiv link: http://arxiv.org/abs/1712.03675v2

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2017-12-10

RNN-based counterfactual prediction, with an application to homestead policy and public schooling

Authors: Jason Poulos, Shuxi Zeng

This paper proposes a method for estimating the effect of a policy
intervention on an outcome over time. We train recurrent neural networks (RNNs)
on the history of control unit outcomes to learn a useful representation for
predicting future outcomes. The learned representation of control units is then
applied to the treated units for predicting counterfactual outcomes. RNNs are
specifically structured to exploit temporal dependencies in panel data, and are
able to learn negative and nonlinear interactions between control unit
outcomes. We apply the method to the problem of estimating the long-run impact
of U.S. homestead policy on public school spending.

arXiv link: http://arxiv.org/abs/1712.03553v7

Econometrics arXiv updated paper (originally submitted: 2017-12-09)

A Random Attention Model

Authors: Matias D. Cattaneo, Xinwei Ma, Yusufcan Masatlioglu, Elchin Suleymanov

This paper illustrates how one can deduce preference from observed choices
when attention is not only limited but also random. In contrast to earlier
approaches, we introduce a Random Attention Model (RAM) where we abstain from
any particular attention formation, and instead consider a large class of
nonparametric random attention rules. Our model imposes one intuitive
condition, termed Monotonic Attention, which captures the idea that each
consideration set competes for the decision-maker's attention. We then develop
revealed preference theory within RAM and obtain precise testable implications
for observable choice probabilities. Based on these theoretical findings, we
propose econometric methods for identification, estimation, and inference of
the decision maker's preferences. To illustrate the applicability of our
results and their concrete empirical content in specific settings, we also
develop revealed preference theory and accompanying econometric methods under
additional nonparametric assumptions on the consideration set for binary choice
problems. Finally, we provide general purpose software implementation of our
estimation and inference results, and showcase their performance using
simulations.

arXiv link: http://arxiv.org/abs/1712.03448v3

Econometrics arXiv updated paper (originally submitted: 2017-12-08)

Aggregating Google Trends: Multivariate Testing and Analysis

Authors: Stephen L. France, Yuying Shi

Web search data are a valuable source of business and economic information.
Previous studies have utilized Google Trends web search data for economic
forecasting. We expand this work by providing algorithms to combine and
aggregate search volume data, so that the resulting data is both consistent
over time and consistent between data series. We give a brand equity example,
where Google Trends is used to analyze shopping data for 100 top ranked brands
and these data are used to nowcast economic variables. We describe the
importance of out of sample prediction and show how principal component
analysis (PCA) can be used to improve the signal to noise ratio and prevent
overfitting in nowcasting models. We give a finance example, where exploratory
data analysis and classification are used to analyze the relationship between
Google Trends searches and stock prices.

arXiv link: http://arxiv.org/abs/1712.03152v2
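
A minimal Python sketch of the PCA step on simulated search-volume series:
standardize the series, extract a few principal components, and use them in a
pseudo out-of-sample nowcasting regression. The dimensions and the train/test
split are arbitrary illustrative choices.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(5)
    T, n_series = 120, 100                       # 120 months, 100 brand search series
    common = rng.normal(size=(T, 2))             # latent factors driving searches
    trends = common @ rng.normal(size=(2, n_series)) + rng.normal(size=(T, n_series))
    target = common[:, 0] + 0.1 * rng.normal(size=T)   # economic series to nowcast

    Z = StandardScaler().fit_transform(trends)
    factors = PCA(n_components=2).fit_transform(Z)     # denoised common components

    # Pseudo out-of-sample check: fit on the first 100 periods, nowcast the rest
    model = LinearRegression().fit(factors[:100], target[:100])
    rmse = np.sqrt(np.mean((model.predict(factors[100:]) - target[100:]) ** 2))
    print(f"out-of-sample RMSE: {rmse:.3f}")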

Econometrics arXiv cross-link from physics.soc-ph (physics.soc-ph), submitted: 2017-12-08

On Metropolis Growth

Authors: Syed Amaar Ahmad

We consider the scaling laws, second-order statistics and entropy of the
consumed energy of metropolis cities which are hybrid complex systems
comprising social networks, engineering systems, agricultural output, economic
activity and energy components. We abstract a city in terms of two fundamental
variables; $s$ resource cells (of unit area) that represent energy-consuming
geographic or spatial zones (e.g. land, housing or infrastructure etc.) and a
population comprising $n$ mobile units that can migrate between these cells. We
show that with a constant metropolis area (fixed $s$), the variance and entropy
of consumed energy initially increase with $n$, reach a maximum and then
eventually diminish to zero as saturation is reached. These metrics are
indicators of the spatial mobility of the population. Under certain situations,
the variance is bounded as a quadratic function of the mean consumed energy of
the metropolis. However, when population and metropolis area are endogenous,
growth in the latter is arrested when $n \leq \frac{s}{2}\log(s)$ due to
diminished population density. Conversely, the population growth reaches
equilibrium when $n\geq {s}n$ or equivalently when the aggregate of both
over-populated and under-populated areas is large. Moreover, we also draw the
relationship between our approach and multi-scalar information, when economic
dependency between a metropolis's sub-regions is based on the entropy of
consumed energy. Finally, if the city's economic size (domestic product etc.)
is proportional to the consumed energy, then for a constant population density,
we show that the economy scales linearly with the surface area (or $s$).

arXiv link: http://arxiv.org/abs/1712.02937v2

Econometrics arXiv cross-link from cs.SI (cs.SI), submitted: 2017-12-08

Online Red Packets: A Large-scale Empirical Study of Gift Giving on WeChat

Authors: Yuan Yuan, Tracy Xiao Liu, Chenhao Tan, Jie Tang

Gift giving is a ubiquitous social phenomenon, and red packets have been used
as monetary gifts in Asian countries for thousands of years. In recent years,
online red packets have become widespread in China through the WeChat platform.
Exploiting a unique dataset consisting of 61 million group red packets and
seven million users, we conduct a large-scale, data-driven study to understand
the spread of red packets and the effect of red packets on group activity. We
find that the cash flows between provinces are largely consistent with
provincial GDP rankings, e.g., red packets are sent from users in the south to
those in the north. By distinguishing spontaneous from reciprocal red packets,
we reveal the behavioral patterns in sending red packets: males, seniors, and
people with more in-group friends are more inclined to spontaneously send red
packets, while red packets from females, youths, and people with less in-group
friends are more reciprocal. Furthermore, we use propensity score matching to
study the external effects of red packets on group dynamics. We show that red
packets increase group participation and strengthen in-group relationships,
which partly explain the benefits and motivations for sending red packets.

arXiv link: http://arxiv.org/abs/1712.02926v1
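
A generic propensity-score-matching sketch in Python of the kind of comparison
described above, on simulated data (the study's actual WeChat group features
are not available here): estimate propensity scores with a logistic regression
and match each treated unit to its nearest control.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(8)
    n = 5000
    X = rng.normal(size=(n, 3))                          # group/user covariates
    treat = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))  # e.g. "sent a red packet"
    y = 0.5 * treat + X[:, 0] + rng.normal(size=n)       # e.g. later group activity

    ps = LogisticRegression().fit(X, treat).predict_proba(X)[:, 1]
    nn = NearestNeighbors(n_neighbors=1).fit(ps[treat == 0].reshape(-1, 1))
    _, idx = nn.kneighbors(ps[treat == 1].reshape(-1, 1))
    att = np.mean(y[treat == 1] - y[treat == 0][idx.ravel()])
    print(f"matched estimate of the effect on the treated: {att:.3f} (true 0.5)")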

Econometrics arXiv updated paper (originally submitted: 2017-12-06)

On monitoring development indicators using high resolution satellite images

Authors: Potnuru Kishen Suraj, Ankesh Gupta, Makkunda Sharma, Sourabh Bikas Paul, Subhashis Banerjee

We develop a machine learning based tool for accurate prediction of
socio-economic indicators from daytime satellite imagery. The diverse set of
indicators are often not intuitively related to observable features in
satellite images, and are not even always well correlated with each other. Our
predictive tool is more accurate than using night light as a proxy, and can be
used to predict missing data, smooth out noise in surveys, monitor development
progress of a region, and flag potential anomalies. Finally, we use predicted
variables to do robustness analysis of a regression study of high rate of
stunting in India.

arXiv link: http://arxiv.org/abs/1712.02282v3

Econometrics arXiv updated paper (originally submitted: 2017-12-05)

Determination of Pareto exponents in economic models driven by Markov multiplicative processes

Authors: Brendan K. Beare, Alexis Akira Toda

This article contains new tools for studying the shape of the stationary
distribution of sizes in a dynamic economic system in which units experience
random multiplicative shocks and are occasionally reset. Each unit has a
Markov-switching type which influences their growth rate and reset probability.
We show that the size distribution has a Pareto upper tail, with exponent equal
to the unique positive solution to an equation involving the spectral radius of
a certain matrix-valued function. Under a non-lattice condition on growth
rates, an eigenvector associated with the Pareto exponent provides the
distribution of types in the upper tail of the size distribution.

arXiv link: http://arxiv.org/abs/1712.01431v5

Econometrics arXiv updated paper (originally submitted: 2017-11-28)

The Effect of Partisanship and Political Advertising on Close Family Ties

Authors: M. Keith Chen, Ryne Rohla

Research on growing American political polarization and antipathy primarily
studies public institutions and political processes, ignoring private effects
including strained family ties. Using anonymized smartphone-location data and
precinct-level voting, we show that Thanksgiving dinners attended by
opposing-party precinct residents were 30-50 minutes shorter than same-party
dinners. This decline from a mean of 257 minutes survives extensive spatial and
demographic controls. Dinner reductions in 2016 tripled for travelers from
media markets with heavy political advertising --- an effect not observed in
2015 --- implying a relationship to election-related behavior. Effects appear
asymmetric: while fewer Democratic-precinct residents traveled in 2016 than
2015, political differences shortened Thanksgiving dinners more among
Republican-precinct residents. Nationwide, 34 million person-hours of
cross-partisan Thanksgiving discourse were lost in 2016 to partisan effects.

arXiv link: http://arxiv.org/abs/1711.10602v2

Econometrics arXiv paper, submitted: 2017-11-28

Identification of and correction for publication bias

Authors: Isaiah Andrews, Maximilian Kasy

Some empirical results are more likely to be published than others. Such
selective publication leads to biased estimates and distorted inference. This
paper proposes two approaches for identifying the conditional probability of
publication as a function of a study's results, the first based on systematic
replication studies and the second based on meta-studies. For known conditional
publication probabilities, we propose median-unbiased estimators and associated
confidence sets that correct for selective publication. We apply our methods to
recent large-scale replication studies in experimental economics and
psychology, and to meta-studies of the effects of minimum wages and de-worming
programs.

arXiv link: http://arxiv.org/abs/1711.10527v1

Econometrics arXiv paper, submitted: 2017-11-27

Constructive Identification of Heterogeneous Elasticities in the Cobb-Douglas Production Function

Authors: Tong Li, Yuya Sasaki

This paper presents the identification of heterogeneous elasticities in the
Cobb-Douglas production function. The identification is constructive with
closed-form formulas for the elasticity with respect to each input for each
firm. We propose that the flexible input cost ratio plays the role of a control
function under "non-collinear heterogeneity" between elasticities with respect
to two flexible inputs. The ex ante flexible input cost share can be used to
identify the elasticities with respect to flexible inputs for each firm. The
elasticities with respect to labor and capital can be subsequently identified
for each firm under the timing assumption admitting the functional
independence.

arXiv link: http://arxiv.org/abs/1711.10031v1

Econometrics arXiv paper, submitted: 2017-11-25

Forecasting of a Hierarchical Functional Time Series on Example of Macromodel for Day and Night Air Pollution in Silesia Region: A Critical Overview

Authors: Daniel Kosiorowski, Dominik Mielczarek, Jerzy. P. Rydlewski

In economics we often face a system that intrinsically imposes a hierarchical
structure on its components, e.g., in modelling trade accounts related to
foreign exchange or in optimizing regional air protection policy.
The problem of reconciling forecasts obtained at different levels of a
hierarchy, i.e., of bringing together forecasts obtained independently at
each level, has been addressed many times in the statistical and econometric
literature.
This paper deals with this issue in the case of a hierarchical functional time
series. We present and critically discuss the state of the art and indicate
opportunities for applying these methods to a particular environmental
protection problem. We critically compare the best predictor known from the
literature with our own original proposal. Within the paper we study a
macromodel describing day and night air pollution in the Silesia region, divided
into five subregions.

arXiv link: http://arxiv.org/abs/1712.03797v1

Econometrics arXiv paper, submitted: 2017-11-24

The Research on the Stagnant Development of Shantou Special Economic Zone Under Reform and Opening-Up Policy

Authors: Bowen Cai

This study briefly introduces the development of Shantou Special Economic
Zone under the Reform and Opening-Up Policy from 1980 through 2016, with a focus
on policy-making issues and their influence on the local economy. The paper is
divided into two parts, 1980 to 1991 and 1992 to 2016, in accordance with the
separation of the original Shantou District into three cities (Shantou, Chaozhou
and Jieyang) at the end of 1991. The study analyzes the policy-making issues in
the separation of the original Shantou District, the influence of the policy on
Shantou's economy after the separation, the possibility of merging the three
cities into one large new economic district in the future, and the reasons for
the stagnant development of Shantou over the past 20 years. The paper uses
longitudinal statistical analysis of economic problems, applying
non-parametric statistics through generalized additive models and time series
forecasting methods. The paper is authored solely by Bowen Cai, a
graduate student in the PhD program in Applied and Computational Mathematics
and Statistics at the University of Notre Dame with a concentration in big data
analysis.

arXiv link: http://arxiv.org/abs/1711.08877v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2017-11-19

Estimation Considerations in Contextual Bandits

Authors: Maria Dimakopoulou, Zhengyuan Zhou, Susan Athey, Guido Imbens

Contextual bandit algorithms are sensitive to the estimation method of the
outcome model as well as the exploration method used, particularly in the
presence of rich heterogeneity or complex outcome models, which can lead to
difficult estimation problems along the path of learning. We study a
consideration for the exploration vs. exploitation framework that does not
arise in multi-armed bandits but is crucial in contextual bandits: the way
exploration and exploitation are conducted in the present affects the bias and
variance in the potential outcome model estimation in subsequent stages of
learning. We develop parametric and non-parametric contextual bandits that
integrate balancing methods from the causal inference literature in their
estimation to make it less prone to problems of estimation bias. We provide the
first regret bound analyses for contextual bandits with balancing in the domain
of linear contextual bandits that match the state of the art regret bounds. We
demonstrate the strong practical advantage of balanced contextual bandits on a
large number of supervised learning datasets and on a synthetic example that
simulates model mis-specification and prejudice in the initial training data.
Additionally, we develop contextual bandits with simpler assignment policies by
leveraging sparse model estimation methods from the econometrics literature and
demonstrate empirically that in the early stages they can improve the rate of
learning and decrease regret.

arXiv link: http://arxiv.org/abs/1711.07077v4

Econometrics arXiv paper, submitted: 2017-11-18

Robust Synthetic Control

Authors: Muhammad Jehangir Amjad, Devavrat Shah, Dennis Shen

We present a robust generalization of the synthetic control method for
comparative case studies. Like the classical method, we present an algorithm to
estimate the unobservable counterfactual of a treatment unit. A distinguishing
feature of our algorithm is that of de-noising the data matrix via singular
value thresholding, which renders our approach robust in multiple facets: it
automatically identifies a good subset of donors, overcomes the challenges of
missing data, and continues to work well in settings where covariate
information may not be provided. To begin, we establish the condition under
which the fundamental assumption in synthetic control-like approaches holds,
i.e. when the linear relationship between the treatment unit and the donor pool
prevails in both the pre- and post-intervention periods. We provide the first
finite sample analysis for a broader class of models, the Latent Variable
Model, in contrast to Factor Models previously considered in the literature.
Further, we show that our de-noising procedure accurately imputes missing
entries, producing a consistent estimator of the underlying signal matrix
provided $p = \Omega( T^{-1 + \zeta})$ for some $\zeta > 0$; here, $p$ is the
fraction of observed data and $T$ is the time interval of interest. Under the
same setting, we prove that the mean-squared-error (MSE) in our prediction
estimation scales as $O(\sigma^2/p + 1/T)$, where $\sigma^2$ is the
noise variance. Using a data aggregation method, we show that the MSE can be
made as small as $O(T^{-1/2+\gamma})$ for any $\gamma \in (0, 1/2)$, leading to
a consistent estimator. We also introduce a Bayesian framework to quantify the
model uncertainty through posterior probabilities. Our experiments, using both
real-world and synthetic datasets, demonstrate that our robust generalization
yields an improvement over the classical synthetic control method.

arXiv link: http://arxiv.org/abs/1711.06940v1
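
The two main steps translate into a short Python sketch on a simulated panel:
de-noise the donor matrix by hard-thresholding its singular values, then learn
linear weights on the pre-treatment period and extrapolate the counterfactual.
The threshold rule and the plain least-squares step used here are
simplifications of the procedure in the paper.

    import numpy as np

    rng = np.random.default_rng(6)
    T0, T1, n_donors = 40, 20, 30
    factors = rng.normal(size=(T0 + T1, 2))
    loadings = rng.normal(size=(2, n_donors + 1))
    panel = factors @ loadings + 0.5 * rng.normal(size=(T0 + T1, n_donors + 1))
    treated, donors = panel[:, 0], panel[:, 1:]
    treated[T0:] += 3.0                              # treatment effect after T0

    # Step 1: de-noise the donor matrix by keeping only the large singular values
    U, s, Vt = np.linalg.svd(donors, full_matrices=False)
    k = int(np.sum(s > 2.0 * np.median(s)))          # simple hard-threshold rule
    donors_denoised = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    # Step 2: fit linear weights pre-treatment, extrapolate the counterfactual
    w, *_ = np.linalg.lstsq(donors_denoised[:T0], treated[:T0], rcond=None)
    counterfactual = donors_denoised[T0:] @ w
    print("estimated effect:", np.round(np.mean(treated[T0:] - counterfactual), 2))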

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2017-11-17

Calibration of Distributionally Robust Empirical Optimization Models

Authors: Jun-Ya Gotoh, Michael Jong Kim, Andrew E. B. Lim

We study the out-of-sample properties of robust empirical optimization
problems with smooth $\phi$-divergence penalties and smooth concave objective
functions, and develop a theory for data-driven calibration of the non-negative
"robustness parameter" $\delta$ that controls the size of the deviations from
the nominal model. Building on the intuition that robust optimization reduces
the sensitivity of the expected reward to errors in the model by controlling
the spread of the reward distribution, we show that the first-order benefit of
a "little bit of robustness" (i.e., $\delta$ small and positive) is a significant
reduction in the variance of the out-of-sample reward while the corresponding
impact on the mean is almost an order of magnitude smaller. One implication is
that substantial variance (sensitivity) reduction is possible at little cost if
the robustness parameter is properly calibrated. To this end, we introduce the
notion of a robust mean-variance frontier to select the robustness parameter
and show that it can be approximated using resampling methods like the
bootstrap. Our examples show that robust solutions resulting from "open loop"
calibration methods (e.g., selecting a 90% confidence level regardless of
the data and objective function) can be very conservative out-of-sample, while
those corresponding to the robustness parameter that optimizes an estimate of
the out-of-sample expected reward (e.g., via the bootstrap) with no regard for
the variance are often insufficiently robust.

arXiv link: http://arxiv.org/abs/1711.06565v2

Econometrics arXiv updated paper (originally submitted: 2017-11-17)

Economic Complexity Unfolded: Interpretable Model for the Productive Structure of Economies

Authors: Zoran Utkovski, Melanie F. Pradier, Viktor Stojkoski, Fernando Perez-Cruz, Ljupco Kocarev

Economic complexity reflects the amount of knowledge that is embedded in the
productive structure of an economy. It resides on the premise of hidden
capabilities - fundamental endowments underlying the productive structure. In
general, measuring the capabilities behind economic complexity directly is
difficult, and indirect measures have been suggested which exploit the fact
that the presence of the capabilities is expressed in a country's mix of
products. We complement these studies by introducing a probabilistic framework
which leverages Bayesian non-parametric techniques to extract the dominant
features behind the comparative advantage in exported products. Based on
economic evidence and trade data, we place a restricted Indian Buffet Process
on the distribution of countries' capability endowment, appealing to a culinary
metaphor to model the process of capability acquisition. The approach comes
with a unique level of interpretability, as it produces a concise and
economically plausible description of the instantiated capabilities.
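
For readers unfamiliar with the culinary metaphor, the sketch below samples a
binary country-by-capability matrix from a standard (unrestricted) Indian
Buffet Process; the paper's restricted variant imposes additional structure, so
treat this only as background intuition. Parameters are hypothetical.

import numpy as np

def indian_buffet_process(n_customers, alpha, rng):
    """Sample a binary matrix Z (customers x dishes) from a standard IBP(alpha)."""
    Z = []                                   # rows of the (growing) binary matrix
    dish_counts = []                         # how many customers took each dish so far
    for i in range(n_customers):
        row = [int(rng.random() < m / (i + 1)) for m in dish_counts]
        n_new = rng.poisson(alpha / (i + 1)) # brand-new dishes tried by customer i
        row += [1] * n_new
        dish_counts = [m + z for m, z in zip(dish_counts, row)] + [1] * n_new
        Z.append(row)
    out = np.zeros((n_customers, len(dish_counts)), dtype=int)
    for i, row in enumerate(Z):
        out[i, :len(row)] = row              # dishes introduced later are 0 earlier
    return out

rng = np.random.default_rng(7)
Z = indian_buffet_process(n_customers=10, alpha=2.0, rng=rng)
print(Z)                                     # countries x latent capabilities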

arXiv link: http://arxiv.org/abs/1711.07327v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2017-11-13

Improved Density and Distribution Function Estimation

Authors: Vitaliy Oryshchenko, Richard J. Smith

Given additional distributional information in the form of moment
restrictions, kernel density and distribution function estimators with implied
generalised empirical likelihood probabilities as weights achieve a reduction
in variance due to the systematic use of this extra information. The particular
interest here is the estimation of densities or distributions of (generalised)
residuals in semi-parametric models defined by a finite number of moment
restrictions. Such estimates are of great practical interest, being potentially
of use for diagnostic purposes, including tests of parametric assumptions on an
error distribution, goodness-of-fit tests or tests of overidentifying moment
restrictions. The paper gives conditions for consistency and describes the
asymptotic mean squared error properties of the kernel density and distribution
estimators proposed in the paper. A simulation study evaluates the small sample
performance of these estimators. Supplements provide analytic examples to
illustrate situations where kernel weighting provides a reduction in variance
together with proofs of the results in the paper.
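
As a toy version of moment-restricted weighting (using exponential tilting, one
member of the GEL family, rather than the paper's general setup), the sketch
below computes weights imposing a known-mean restriction E[X] = 0 and plugs
them into a weighted Gaussian kernel density estimate. The sample, bandwidth
rule and moment function are hypothetical.

import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=0.15, scale=1.0, size=300)      # sample; restriction says E[X] = 0

# Exponential-tilting weights w_i proportional to exp(lam * x_i), chosen so that
# the weighted mean of x is driven to zero.
lam = 0.0
for _ in range(50):                                # Newton iterations on the tilt parameter
    w = np.exp(lam * x)
    w /= w.sum()
    grad = np.sum(w * x)                           # weighted mean (target: 0)
    hess = np.sum(w * x**2) - grad**2              # weighted variance (its derivative)
    lam -= grad / hess

def weighted_kde(grid, data, weights, bandwidth):
    """Gaussian kernel density estimate with observation-specific weights."""
    u = (grid[:, None] - data[None, :]) / bandwidth
    kernels = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return kernels @ weights / bandwidth

grid = np.linspace(-4, 4, 9)
h = 1.06 * x.std() * len(x) ** (-1 / 5)            # rule-of-thumb bandwidth
print(weighted_kde(grid, x, w, h).round(3))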

arXiv link: http://arxiv.org/abs/1711.04793v2

Econometrics arXiv updated paper (originally submitted: 2017-11-13)

Uniform Inference for Characteristic Effects of Large Continuous-Time Linear Models

Authors: Yuan Liao, Xiye Yang

We consider continuous-time models with a large panel of moment conditions,
where the structural parameter depends on a set of characteristics, whose
effects are of interest. The leading example is the linear factor model in
financial economics where factor betas depend on observed characteristics such
as firm specific instruments and macroeconomic variables, and their effects
pick up long-run time-varying beta fluctuations. We specify the factor betas as
the sum of characteristic effects and an orthogonal idiosyncratic parameter
that captures high-frequency movements. Researchers often do not know whether
the latter exists, or how strong it is, so inference about the characteristic
effects should be valid uniformly over a broad class of data generating
processes for idiosyncratic parameters. We
construct our estimation and inference in a two-step continuous-time GMM
framework. It is found that the limiting distribution of the estimated
characteristic effects has a discontinuity when the variance of the
idiosyncratic parameter is near the boundary (zero), which makes the usual
"plug-in" method using the estimated asymptotic variance only valid pointwise
and may produce either over- or under- coveraging probabilities. We show that
the uniformity can be achieved by cross-sectional bootstrap. Our procedure
allows both known and estimated factors, and also features a bias correction
for the effect of estimating unknown factors.

arXiv link: http://arxiv.org/abs/1711.04392v2

Econometrics arXiv cross-link from math.PR (math.PR), submitted: 2017-11-10

How fragile are information cascades?

Authors: Yuval Peres, Miklos Z. Racz, Allan Sly, Izabella Stuhl

It is well known that sequential decision making may lead to information
cascades. That is, when agents make decisions based on their private
information, as well as observing the actions of those before them, then it
might be rational to ignore their private signal and imitate the action of
previous individuals. If the individuals are choosing between a right and a
wrong state, and the initial actions are wrong, then the whole cascade will be
wrong. This issue is due to the fact that cascades can be based on very little
information.
We show that if agents occasionally disregard the actions of others and base
their action only on their private information, then wrong cascades can be
avoided. Moreover, we study the optimal asymptotic rate at which the error
probability at time $t$ can go to zero. The optimal policy is for the player at
time $t$ to follow their private information with probability $p_{t} = c/t$,
leading to a learning rate of $c'/t$, where the constants $c$ and $c'$ are
explicit.
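
A quick simulation of the policy described above under hypothetical parameters:
each agent's private signal is correct with probability q, and the agent
follows it with probability p_t = min(1, c/t), otherwise imitating the majority
of past actions (a crude stand-in for Bayesian imitation).

import numpy as np

def simulate(n_agents, q=0.7, c=5.0, seed=0):
    rng = np.random.default_rng(seed)
    truth = 1
    actions = []
    for t in range(1, n_agents + 1):
        signal = truth if rng.random() < q else 1 - truth
        if rng.random() < min(1.0, c / t) or not actions:
            action = signal                         # follow own private signal
        else:
            action = int(np.mean(actions) >= 0.5)   # imitate the majority so far
        actions.append(action)
    return np.mean(np.array(actions) == truth)

# The fraction of correct actions grows with the horizon, avoiding wrong cascades.
for n in (100, 1000, 10000):
    print(n, round(simulate(n), 3))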

arXiv link: http://arxiv.org/abs/1711.04024v2

Econometrics arXiv paper, submitted: 2017-11-10

Testing for observation-dependent regime switching in mixture autoregressive models

Authors: Mika Meitz, Pentti Saikkonen

Testing for regime switching when the regime switching probabilities are
specified either as constants (`mixture models') or are governed by a
finite-state Markov chain (`Markov switching models') is a long-standing
problem that has also attracted recent interest. This paper considers testing
for regime switching when the regime switching probabilities are time-varying
and depend on observed data (`observation-dependent regime switching').
Specifically, we consider the likelihood ratio test for observation-dependent
regime switching in mixture autoregressive models. The testing problem is
highly nonstandard, involving unidentified nuisance parameters under the null,
parameters on the boundary, singular information matrices, and higher-order
approximations of the log-likelihood. We derive the asymptotic null
distribution of the likelihood ratio test statistic in a general mixture
autoregressive setting using high-level conditions that allow for various forms
of dependence of the regime switching probabilities on past observations, and
we illustrate the theory using two particular mixture autoregressive models.
The likelihood ratio test has a nonstandard asymptotic distribution that can
easily be simulated, and Monte Carlo studies show the test to have satisfactory
finite sample size and power properties.

arXiv link: http://arxiv.org/abs/1711.03959v1

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2017-11-09

SHOPPER: A Probabilistic Model of Consumer Choice with Substitutes and Complements

Authors: Francisco J. R. Ruiz, Susan Athey, David M. Blei

We develop SHOPPER, a sequential probabilistic model of shopping data.
SHOPPER uses interpretable components to model the forces that drive how a
customer chooses products; in particular, we designed SHOPPER to capture how
items interact with other items. We develop an efficient posterior inference
algorithm to estimate these forces from large-scale data, and we analyze a
large dataset from a major chain grocery store. We are interested in answering
counterfactual queries about changes in prices. We found that SHOPPER provides
accurate predictions even under price interventions, and that it helps identify
complementary and substitutable pairs of products.

arXiv link: http://arxiv.org/abs/1711.03560v3

Econometrics arXiv paper, submitted: 2017-11-09

Measuring Price Discovery between Nearby and Deferred Contracts in Storable and Non-Storable Commodity Futures Markets

Authors: Zhepeng Hu, Mindy Mallory, Teresa Serra, Philip Garcia

Futures market contracts with varying maturities are traded concurrently and
the speed at which they process information is of value in understanding the
price discovery process. Using price discovery measures, including Putnins'
(2013) information leadership share, and intraday data, we quantify the
proportional contribution of price discovery between nearby and deferred
contracts in the corn and live cattle futures markets. Price discovery is more
systematic in the corn than in the live cattle market. On average, nearby
contracts lead all deferred contracts in price discovery in the corn market,
but have a relatively less dominant role in the live cattle market. In both
markets, the nearby contract loses dominance when its relative volume share
dips below 50%, which occurs about 2-3 weeks before expiration in corn and 5-6
weeks before expiration in live cattle. Regression results indicate that the
share of price discovery is most closely linked to trading volume but is also
affected, to a far lesser degree, by time to expiration, backwardation, USDA
announcements and market crashes. The effects of these other factors vary
between the markets, which likely reflects differences in storability as well
as other market-related characteristics.

arXiv link: http://arxiv.org/abs/1711.03506v1

Econometrics arXiv updated paper (originally submitted: 2017-11-07)

Identification and Estimation of Spillover Effects in Randomized Experiments

Authors: Gonzalo Vazquez-Bare

I study identification, estimation and inference for spillover effects in
experiments where units' outcomes may depend on the treatment assignments of
other units within a group. I show that the commonly-used reduced-form
linear-in-means regression identifies a weighted sum of spillover effects with
some negative weights, and that the difference in means between treated and
controls identifies a combination of direct and spillover effects entering with
different signs. I propose nonparametric estimators for average direct and
spillover effects that overcome these issues and are consistent and
asymptotically normal under a precise relationship between the number of
parameters of interest, the total sample size and the treatment assignment
mechanism. These findings are illustrated using data from a conditional cash
transfer program and with simulations. The empirical results reveal the
potential pitfalls of failing to flexibly account for spillover effects in
policy evaluation: the estimated difference in means and the reduced-form
linear-in-means coefficients are all close to zero and statistically
insignificant, whereas the nonparametric estimators I propose reveal large,
nonlinear and significant spillover effects.
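
One way to read the nonparametric estimators discussed above is as cell means
indexed by own treatment and the number of treated peers in the group; the
sketch below computes such cell means on simulated groups. This illustrates the
idea only, not the paper's exact estimator or its inference procedure.

import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
n_groups, group_size = 500, 3
g = np.repeat(np.arange(n_groups), group_size)
d = rng.integers(0, 2, size=n_groups * group_size)           # treatment assignment
treated_in_group = pd.Series(d).groupby(g).transform("sum").to_numpy()
peers_treated = treated_in_group - d                          # treated peers, excluding self
y = 1.0 * d + 0.5 * peers_treated + rng.normal(size=d.size)   # direct + spillover effects

cells = pd.DataFrame({"d": d, "s": peers_treated, "y": y})
cell_means = cells.groupby(["d", "s"])["y"].mean().unstack()
print(cell_means.round(2))
# e.g. direct effect with no treated peers: E[y | d=1, s=0] - E[y | d=0, s=0]
print("direct effect, s=0:", round(cell_means.loc[1, 0] - cell_means.loc[0, 0], 2))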

arXiv link: http://arxiv.org/abs/1711.02745v8

Econometrics arXiv cross-link from math.GR (math.GR), submitted: 2017-11-07

In search of a new economic model determined by logistic growth

Authors: Roman G. Smirnov, Kunpeng Wang

In this paper we extend the work by Ryuzo Sato devoted to the development of
economic growth models within the framework of the Lie group theory. We propose
a new growth model based on the assumption of logistic growth in factors. It is
employed to derive new production functions and introduce a new notion of wage
share. In the process it is shown that the new functions compare reasonably
well against relevant economic data. The corresponding problem of maximization
of profit under conditions of perfect competition is solved with the aid of one
of these functions. In addition, it is explained in reasonably rigorous
mathematical terms why Bowley's law no longer holds true in post-1960 data.

arXiv link: http://arxiv.org/abs/1711.02625v5

Econometrics arXiv updated paper (originally submitted: 2017-11-06)

Semiparametric Estimation of Structural Functions in Nonseparable Triangular Models

Authors: Victor Chernozhukov, Iván Fernández-Val, Whitney Newey, Sami Stouli, Francis Vella

Triangular systems with nonadditively separable unobserved heterogeneity
provide a theoretically appealing framework for the modelling of complex
structural relationships. However, they are not commonly used in practice due
to the need for exogenous variables with large support for identification, the
curse of dimensionality in estimation, and the lack of inferential tools. This
paper introduces two classes of semiparametric nonseparable triangular models
that address these limitations. They are based on distribution and quantile
regression modelling of the reduced form conditional distributions of the
endogenous variables. We show that average, distribution and quantile
structural functions are identified in these systems through a control function
approach that does not require a large support condition. We propose a
computationally attractive three-stage procedure to estimate the structural
functions where the first two stages consist of quantile or distribution
regressions. We provide asymptotic theory and uniform inference methods for
each stage. In particular, we derive functional central limit theorems and
bootstrap functional central limit theorems for the distribution regression
estimators of the structural functions. These results establish the validity of
the bootstrap for three-stage estimators of structural functions, and lead to
simple inference algorithms. We illustrate the implementation and applicability
of all our methods with numerical simulations and an empirical application to
demand analysis.

arXiv link: http://arxiv.org/abs/1711.02184v3

Econometrics arXiv updated paper (originally submitted: 2017-11-06)

Identifying the Effects of a Program Offer with an Application to Head Start

Authors: Vishal Kamat

I propose a treatment selection model that introduces unobserved
heterogeneity in both choice sets and preferences to evaluate the average
effects of a program offer. I show how to exploit the model structure to define
parameters capturing these effects and then computationally characterize their
identified sets under instrumental variable variation in choice sets. I
illustrate these tools by analyzing the effects of providing an offer to the
Head Start preschool program using data from the Head Start Impact Study. I
find that such a policy affects a large number of children who take up the
offer, and that these children subsequently experience positive effects on test scores. These
effects arise from children who do not have any preschool as an outside option.
A cost-benefit analysis reveals that the earning benefits associated with the
test score gains can be large and outweigh the net costs associated with offer
take up.

arXiv link: http://arxiv.org/abs/1711.02048v6

Econometrics arXiv paper, submitted: 2017-11-02

Equity in Startups

Authors: Hervé Lebret

In less than 50 years, startups have become a major component of innovation
and economic growth. An important feature of the startup phenomenon has been
the wealth created through equity in startups to all stakeholders. These
include the startup founders, the investors, and also the employees through the
stock-option mechanism and universities through licenses of intellectual
property. In the employee group, the allocation to important managers like the
chief executive, vice-presidents and other officers, and independent board
members is also analyzed. This report analyzes how equity was allocated in more
than 400 startups, most of which had filed for an initial public offering. The
author has the ambition of informing a general audience about best practice in
equity split, in particular in Silicon Valley, the central place for startup
innovation.

arXiv link: http://arxiv.org/abs/1711.00661v1

Econometrics arXiv paper, submitted: 2017-11-02

Startups and Stanford University

Authors: Hervé Lebret

In less than 50 years, startups have become a major component of innovation
and economic growth. Silicon Valley has been the place where the startup
phenomenon was most visible, and Stanford University was a major component
of that success. Companies such as Google, Yahoo, Sun Microsystems, Cisco, and
Hewlett Packard had very strong links with Stanford, but even these very famous
success stories cannot fully describe the richness and diversity of the
Stanford entrepreneurial activity. This report explores the dynamics of more
than 5000 companies founded by Stanford University alumni and staff, through
their value creation, their field of activities, their growth patterns and
more. The report also explores some features of the founders of these companies
such as their academic background or the number of years between their Stanford
experience and their company creation.

arXiv link: http://arxiv.org/abs/1711.00644v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2017-11-01

Sophisticated and small versus simple and sizeable: When does it pay off to introduce drifting coefficients in Bayesian VARs?

Authors: Martin Feldkircher, Luis Gruber, Florian Huber, Gregor Kastner

We assess the relationship between model size and complexity in the
time-varying parameter VAR framework via thorough predictive exercises for the
Euro Area, the United Kingdom and the United States. It turns out that
sophisticated dynamics through drifting coefficients are important in small
data sets, while simpler models tend to perform better in sizeable data sets.
To combine the best of both worlds, novel shrinkage priors help to mitigate the
curse of dimensionality, resulting in competitive forecasts for all scenarios
considered. Furthermore, we discuss dynamic model selection to improve upon the
best performing individual model for each point in time.

arXiv link: http://arxiv.org/abs/1711.00564v4

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2017-11-01

Orthogonal Machine Learning: Power and Limitations

Authors: Lester Mackey, Vasilis Syrgkanis, Ilias Zadik

Double machine learning provides $\sqrt{n}$-consistent estimates of
parameters of interest even when high-dimensional or nonparametric nuisance
parameters are estimated at an $n^{-1/4}$ rate. The key is to employ
Neyman-orthogonal moment equations which are first-order insensitive to
perturbations in the nuisance parameters. We show that the $n^{-1/4}$
requirement can be improved to $n^{-1/(2k+2)}$ by employing a $k$-th order
notion of orthogonality that grants robustness to more complex or
higher-dimensional nuisance parameters. In the partially linear regression
setting popular in causal inference, we show that we can construct second-order
orthogonal moments if and only if the treatment residual is not normally
distributed. Our proof relies on Stein's lemma and may be of independent
interest. We conclude by demonstrating the robustness benefits of an explicit
doubly-orthogonal estimation procedure for treatment effect.
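
For context, the first-order (Neyman-orthogonal) construction in the partially
linear model is the familiar cross-fitted residual-on-residual regression; a
minimal sketch with random forests as hypothetical nuisance learners follows.
The paper's higher-order orthogonal moments are not implemented here.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(5)
n, d = 1000, 5
X = rng.normal(size=(n, d))
T = np.sin(X[:, 0]) + rng.normal(size=n)                 # treatment with nonlinear propensity
Y = 0.7 * T + np.cos(X[:, 1]) + rng.normal(size=n)       # true effect theta = 0.7

res_Y, res_T = np.zeros(n), np.zeros(n)
for train, test in KFold(n_splits=2, shuffle=True, random_state=0).split(X):
    m_hat = RandomForestRegressor(n_estimators=200).fit(X[train], Y[train])
    g_hat = RandomForestRegressor(n_estimators=200).fit(X[train], T[train])
    res_Y[test] = Y[test] - m_hat.predict(X[test])       # residualize the outcome
    res_T[test] = T[test] - g_hat.predict(X[test])       # residualize the treatment

theta_hat = np.sum(res_T * res_Y) / np.sum(res_T ** 2)   # orthogonal moment estimate
print(round(theta_hat, 3))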

arXiv link: http://arxiv.org/abs/1711.00342v6

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2017-11-01

On some further properties and application of Weibull-R family of distributions

Authors: Indranil Ghosh, Saralees Nadarajah

In this paper, we provide some new results for the Weibull-R family of
distributions (Alzaghal, Ghosh and Alzaatreh (2016)). We derive some new
structural properties of the Weibull-R family of distributions. We provide
various characterizations of the family via conditional moments, some functions
of order statistics and via record values.

arXiv link: http://arxiv.org/abs/1711.00171v1

Econometrics arXiv paper, submitted: 2017-10-31

Macroeconomics and FinTech: Uncovering Latent Macroeconomic Effects on Peer-to-Peer Lending

Authors: Jessica Foo, Lek-Heng Lim, Ken Sze-Wai Wong

Peer-to-peer (P2P) lending is a fast growing financial technology (FinTech)
trend that is displacing traditional retail banking. Studies on P2P lending
have focused on predicting individual interest rates or default probabilities.
However, the relationship between aggregated P2P interest rates and the general
economy will be of interest to investors and borrowers as the P2P credit market
matures. We show that the variation in P2P interest rates across grade types
is determined by three macroeconomic latent factors formed by Canonical
Correlation Analysis (CCA) - macro default, investor uncertainty, and the
fundamental value of the market. However, the variation in P2P interest rates
across term types cannot be explained by the general economy.

arXiv link: http://arxiv.org/abs/1710.11283v1

Econometrics arXiv updated paper (originally submitted: 2017-10-30)

Nonparametric Identification in Index Models of Link Formation

Authors: Wayne Yuan Gao

We consider an index model of dyadic link formation with a homophily effect
index and a degree heterogeneity index. We provide nonparametric identification
results in a single large network setting for the potentially nonparametric
homophily effect function, the realizations of unobserved individual fixed
effects and the unknown distribution of idiosyncratic pairwise shocks, up to
normalization, for each possible true value of the unknown parameters. We
propose a novel form of scale normalization on an arbitrary interquantile
range, which is not only theoretically robust but also proves particularly
convenient for the identification analysis, as quantiles provide direct
linkages between the observable conditional probabilities and the unknown index
values. We then use an inductive "in-fill and out-expansion" algorithm to
establish our main results, and consider extensions to more general settings
that allow nonseparable dependence between homophily and degree heterogeneity,
as well as certain extents of network sparsity and weaker assumptions on the
support of unobserved heterogeneity. As a byproduct, we also propose a concept
called "modeling equivalence" as a refinement of "observational equivalence",
and use it to provide a formal discussion about normalization, identification
and their interplay with counterfactuals.

arXiv link: http://arxiv.org/abs/1710.11230v5

Econometrics arXiv updated paper (originally submitted: 2017-10-30)

Artificial Intelligence as Structural Estimation: Economic Interpretations of Deep Blue, Bonanza, and AlphaGo

Authors: Mitsuru Igami

Artificial intelligence (AI) has achieved superhuman performance in a growing
number of tasks, but understanding and explaining AI remain challenging. This
paper clarifies the connections between machine-learning algorithms to develop
AIs and the econometrics of dynamic structural models through the case studies
of three famous game AIs. Chess-playing Deep Blue is a calibrated value
function, whereas shogi-playing Bonanza is an estimated value function via
Rust's (1987) nested fixed-point method. AlphaGo's "supervised-learning policy
network" is a deep neural network implementation of Hotz and Miller's (1993)
conditional choice probability estimation; its "reinforcement-learning value
network" is equivalent to Hotz, Miller, Sanders, and Smith's (1994) conditional
choice simulation method. Relaxing these AIs' implicit econometric assumptions
would improve their structural interpretability.

arXiv link: http://arxiv.org/abs/1710.10967v3

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2017-10-27

Matrix Completion Methods for Causal Panel Data Models

Authors: Susan Athey, Mohsen Bayati, Nikolay Doudchenko, Guido Imbens, Khashayar Khosravi

In this paper we study methods for estimating causal effects in settings with
panel data, where some units are exposed to a treatment during some periods and
the goal is estimating counterfactual (untreated) outcomes for the treated
unit/period combinations. We propose a class of matrix completion estimators
that uses the observed elements of the matrix of control outcomes corresponding
to untreated unit/periods to impute the "missing" elements of the control
outcome matrix, corresponding to treated units/periods. This leads to a matrix
that well-approximates the original (incomplete) matrix, but has lower
complexity according to the nuclear norm for matrices. We generalize results
from the matrix completion literature by allowing the patterns of missing data
to have a time series dependency structure that is common in social science
applications. We present novel insights concerning the connections between the
matrix completion literature, the literature on interactive fixed effects
models and the literatures on program evaluation under unconfoundedness and
synthetic control methods. We show that all these estimators can be viewed as
focusing on the same objective function. They differ solely in the way they
deal with identification, in some cases solely through regularization (our
proposed nuclear norm matrix completion estimator) and in other cases primarily
through imposing hard restrictions (the unconfoundedness and synthetic control
approaches). The proposed method outperforms unconfoundedness-based or
synthetic control estimators in simulations based on real data.
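
A bare-bones soft-impute sketch of nuclear-norm-regularized matrix completion
on a synthetic panel, with a hypothetical penalty level and missingness
pattern; the paper's estimator additionally handles fixed effects, dependent
missingness patterns and data-driven regularization.

import numpy as np

rng = np.random.default_rng(6)
N, T, rank = 30, 40, 3
M_true = rng.normal(size=(N, rank)) @ rng.normal(size=(rank, T))
observed = rng.random((N, T)) > 0.3                      # treated/missing cells are False
Y = np.where(observed, M_true + 0.1 * rng.normal(size=(N, T)), 0.0)

def soft_impute(Y, observed, lam, n_iter=200):
    """Iteratively fill missing cells with a soft-thresholded SVD approximation."""
    M = np.zeros_like(Y)
    for _ in range(n_iter):
        filled = np.where(observed, Y, M)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        M = (U * np.maximum(s - lam, 0.0)) @ Vt          # shrink singular values by lam
    return M

M_hat = soft_impute(Y, observed, lam=1.0)
rmse = np.sqrt(np.mean((M_hat[~observed] - M_true[~observed]) ** 2))
print("RMSE on imputed cells:", round(rmse, 3))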

arXiv link: http://arxiv.org/abs/1710.10251v5

Econometrics arXiv updated paper (originally submitted: 2017-10-25)

Shape-Constrained Density Estimation via Optimal Transport

Authors: Ryan Cumings-Menon

Constraining the maximum likelihood density estimator to satisfy a
sufficiently strong constraint, log-concavity being a common example, has
the effect of restoring consistency without requiring additional parameters.
Since many results in economics require densities to satisfy a regularity
condition, these estimators are also attractive for the structural estimation
of economic models. In all of the examples of regularity conditions provided by
Bagnoli and Bergstrom (2005) and Ewerhart (2013), log-concavity is
sufficient to ensure that the density satisfies the required conditions.
However, in many cases log-concavity is far from necessary, and it has the
unfortunate side effect of ruling out sub-exponential tail behavior.
In this paper, we use optimal transport to formulate a shape constrained
density estimator. We initially describe the estimator using a $\rho$-concavity
constraint. In this setting we provide results on consistency, asymptotic
distribution, convexity of the optimization problem defining the estimator, and
formulate a test for the null hypothesis that the population density satisfies
a shape constraint. Afterward, we provide sufficient conditions for these
results to hold using an arbitrary shape constraint. This generalization is
used to explore whether the California Department of Transportation's decision
to award construction contracts with the use of a first price auction is cost
minimizing. We estimate the marginal costs of construction firms subject to
Myerson's (1981) regularity condition, which is a requirement for the first
price reverse auction to be cost minimizing. The proposed test fails to reject
that the regularity condition is satisfied.

arXiv link: http://arxiv.org/abs/1710.09069v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2017-10-24

Asymptotic Distribution and Simultaneous Confidence Bands for Ratios of Quantile Functions

Authors: Fabian Dunker, Stephan Klasen, Tatyana Krivobokova

Ratio of medians or other suitable quantiles of two distributions is widely
used in medical research to compare treatment and control groups or in
economics to compare various economic variables when repeated cross-sectional
data are available. Inspired by the so-called growth incidence curves
introduced in poverty research, we argue that the ratio of quantile functions
is a more appropriate and informative tool to compare two distributions. We
present an estimator for the ratio of quantile functions and develop
corresponding simultaneous confidence bands, which allow one to assess the
significance of certain features of the quantile functions ratio. The derived
simultaneous confidence bands rely on the asymptotic distribution of the quantile functions
ratio and do not require re-sampling techniques. The performance of the
simultaneous confidence bands is demonstrated in simulations. Analysis of the
expenditure data from Uganda in years 1999, 2002 and 2005 illustrates the
relevance of our approach.
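
The point estimator of the ratio of quantile functions is immediate; the sketch
below evaluates it on a grid of probabilities for two simulated expenditure
samples (the simultaneous confidence bands, the paper's main contribution, are
omitted, and the data are hypothetical).

import numpy as np

rng = np.random.default_rng(8)
expenditure_1999 = rng.lognormal(mean=0.0, sigma=0.6, size=2000)
expenditure_2005 = rng.lognormal(mean=0.1, sigma=0.6, size=2000)

probs = np.linspace(0.05, 0.95, 10)
ratio = np.quantile(expenditure_2005, probs) / np.quantile(expenditure_1999, probs)
for p, r in zip(probs, ratio):
    print(f"p={p:.2f}  quantile ratio={r:.3f}")          # a simple growth incidence curve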

arXiv link: http://arxiv.org/abs/1710.09009v1

Econometrics arXiv paper, submitted: 2017-10-24

Calibrated Projection in MATLAB: Users' Manual

Authors: Hiroaki Kaido, Francesca Molinari, Jörg Stoye, Matthew Thirkettle

We present the calibrated-projection MATLAB package implementing the method
to construct confidence intervals proposed by Kaido, Molinari and Stoye (2017).
This manual provides details on how to use the package for inference on
projections of partially identified parameters. It also explains how to use the
MATLAB functions we developed to compute confidence intervals on solutions of
nonlinear optimization problems with estimated constraints.

arXiv link: http://arxiv.org/abs/1710.09707v1

Econometrics arXiv paper, submitted: 2017-10-24

Calibration of Machine Learning Classifiers for Probability of Default Modelling

Authors: Pedro G. Fonseca, Hugo D. Lopes

Binary classification is highly used in credit scoring in the estimation of
probability of default. The validation of such predictive models is based both
on ranking ability and on calibration (i.e., how accurately the probabilities
output by the model map to the observed probabilities). In this study we cover
the current best practices regarding calibration for binary classification, and
explore how different approaches yield different results on real world credit
scoring data. The limitations of evaluating credit scoring models using only
ranking ability metrics are explored. A benchmark is run on 18 real-world
datasets, and results compared. The calibration techniques used are Platt
Scaling and Isotonic Regression. Also, different machine learning models are
used: Logistic Regression, Random Forest Classifiers, and Gradient Boosting
Classifiers. Results show that when the dataset is treated as a time series,
re-calibration with Isotonic Regression improves long-term calibration more
than the alternative methods do. Using re-calibration, the
non-parametric models are able to outperform the Logistic Regression on Brier
Score Loss.
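
A minimal sketch of the two recalibration techniques named above, using
scikit-learn on synthetic imbalanced data; the dataset, base model and
evaluation split are hypothetical, and no time-series structure is imposed.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, weights=[0.9], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

base = GradientBoostingClassifier(random_state=0)
for name, method in [("platt", "sigmoid"), ("isotonic", "isotonic")]:
    clf = CalibratedClassifierCV(base, method=method, cv=3).fit(X_tr, y_tr)
    p = clf.predict_proba(X_te)[:, 1]
    print(name, "Brier score:", round(brier_score_loss(y_te, p), 4))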

arXiv link: http://arxiv.org/abs/1710.08901v1

Econometrics arXiv paper, submitted: 2017-10-24

Propensity score matching for multiple treatment levels: A CODA-based contribution

Authors: Hajime Seya, Takahiro Yoshida

This study proposes a simple technique for propensity score matching for
multiple treatment levels under the strong unconfoundedness assumption with the
help of the Aitchison distance proposed in the field of compositional data
analysis (CODA).
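
A rough sketch of the idea under hypothetical data: estimate generalized
propensity scores with a multinomial logit, map the score vectors into
Aitchison (centred log-ratio) coordinates, and match each unit at one treatment
level to its nearest neighbour at another level in that geometry. This
illustrates the CODA-based distance only, not the paper's full procedure.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(9)
n, d, n_levels = 600, 4, 3
X = rng.normal(size=(n, d))
treat = rng.integers(0, n_levels, size=n)                 # three treatment levels

# Generalized propensity scores via multinomial logit.
gps = LogisticRegression(max_iter=1000).fit(X, treat).predict_proba(X)

# Centred log-ratio (clr) transform: the Aitchison geometry on the simplex.
clr = np.log(gps) - np.log(gps).mean(axis=1, keepdims=True)

# Match each unit at level 2 to its nearest level-0 unit in clr coordinates.
idx_t, idx_c = np.where(treat == 2)[0], np.where(treat == 0)[0]
dists = np.linalg.norm(clr[idx_t][:, None, :] - clr[idx_c][None, :, :], axis=2)
matches = idx_c[dists.argmin(axis=1)]
print(list(zip(idx_t[:5], matches[:5])))                  # first few matched pairs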

arXiv link: http://arxiv.org/abs/1710.08558v1

Econometrics arXiv updated paper (originally submitted: 2017-10-23)

Existence in Multidimensional Screening with General Nonlinear Preferences

Authors: Kelvin Shuangjian Zhang

We generalize the approach of Carlier (2001) and provide an existence proof
for the multidimensional screening problem with general nonlinear preferences.
We first formulate the principal's problem as a maximization problem with
$G$-convexity constraints and then use $G$-convex analysis to prove existence.

arXiv link: http://arxiv.org/abs/1710.08549v2

Econometrics arXiv paper, submitted: 2017-10-22

Electricity Market Theory Based on Continuous Time Commodity Model

Authors: Haoyong Chen, Lijia Han

The recent research report of U.S. Department of Energy prompts us to
re-examine the pricing theories applied in electricity market design. The
theory of spot pricing is the basis of electricity market design in many
countries, but it has two major drawbacks. First, it is still based on the
traditional hourly scheduling/dispatch model, ignoring the crucial time
continuity in electric power production and consumption and failing to treat
inter-temporal constraints seriously. Second, it assumes that the
electricity products are homogeneous in the same dispatch period and cannot
distinguish the base, intermediate and peak power with obviously different
technical and economic characteristics. To overcome the shortcomings, this
paper presents a continuous time commodity model of electricity, including spot
pricing model and load duration model. The market optimization models under the
two pricing mechanisms are established with the Riemann and Lebesgue integrals
respectively, and the functional optimization problems are solved by the
Euler-Lagrange equation to obtain the market equilibria. The feasibility of
pricing according to load duration is proved by strict mathematical derivation.
Simulation results show that load duration pricing can correctly identify and
value different attributes of generators, reduce the total electricity
purchasing cost, and distribute profits among the power plants more equitably.
The theory and methods proposed in this paper will provide new ideas and
theoretical foundation for the development of electric power markets.

arXiv link: http://arxiv.org/abs/1710.07918v1

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2017-10-19

Modal Regression using Kernel Density Estimation: a Review

Authors: Yen-Chi Chen

We review recent advances in modal regression studies using kernel density
estimation. Modal regression is an alternative approach for investigating
relationship between a response variable and its covariates. Specifically,
modal regression summarizes the interactions between the response variable and
covariates using the conditional mode or local modes. We first describe the
underlying model of modal regression and its estimators based on kernel density
estimation. We then review the asymptotic properties of the estimators and
strategies for choosing the smoothing bandwidth. We also discuss useful
algorithms and similar alternative approaches for modal regression, and propose
future directions in this field.
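
A compact illustration of conditional-mode estimation via a partial mean-shift
update on a product-kernel density estimate, with hypothetical bandwidths and
simulated data; it follows the general approach reviewed above rather than any
single estimator in the survey.

import numpy as np

rng = np.random.default_rng(10)
n = 800
x = rng.uniform(-2, 2, size=n)
# Two conditional branches, so the conditional density is bimodal in places.
y = np.where(rng.random(n) < 0.5, np.sin(2 * x), np.sin(2 * x) + 2.0) + 0.2 * rng.normal(size=n)

def conditional_mode(x0, y0, hx=0.3, hy=0.3, n_iter=100):
    """Partial mean-shift: update y only, holding the conditioning value x0 fixed."""
    kx = np.exp(-0.5 * ((x0 - x) / hx) ** 2)
    for _ in range(n_iter):
        ky = np.exp(-0.5 * ((y0 - y) / hy) ** 2)
        w = kx * ky
        y0 = np.sum(w * y) / np.sum(w)
    return y0

for start in (-1.0, 3.0):                 # different starts find different local modes
    print(round(conditional_mode(x0=0.5, y0=start), 3))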

arXiv link: http://arxiv.org/abs/1710.07004v2

Econometrics arXiv paper, submitted: 2017-10-18

Minimax Linear Estimation at a Boundary Point

Authors: Wayne Yuan Gao

This paper characterizes the minimax linear estimator of the value of an
unknown function at a boundary point of its domain in a Gaussian white noise
model under the restriction that the first-order derivative of the unknown
function is Lipschitz continuous (the second-order Hölder class). The
result is then applied to construct the minimax optimal estimator for the
regression discontinuity design model, where the parameter of interest involves
function values at boundary points.

arXiv link: http://arxiv.org/abs/1710.06809v1

Econometrics arXiv paper, submitted: 2017-10-18

Revenue-based Attribution Modeling for Online Advertising

Authors: Kaifeng Zhao, Seyed Hanif Mahboobi, Saeed Bagheri

This paper examines and proposes several attribution modeling methods that
quantify how revenue should be attributed to online advertising inputs. We
adopt and further develop the relative importance method, which is based on
regression models that have been extensively studied and utilized to
investigate the relationship between advertising efforts and market reaction
(revenue). The relative importance method aims at decomposing and allocating
marginal contributions to the coefficient of determination (R^2) of regression
models as attribution values. In particular, we adopt two alternative
submethods to perform this decomposition: dominance analysis and relative
weight analysis. Moreover, we demonstrate an extension of the decomposition
methods from standard linear model to additive model. We claim that our new
approaches are more flexible and accurate in modeling the underlying
relationship and calculating the attribution values. We use simulation examples
to demonstrate the superior performance of our new approaches over traditional
methods. We further illustrate the value of our proposed approaches using a
real advertising campaign dataset.
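
To make the decomposition concrete, the sketch below computes a Shapley-style
(general dominance) allocation of R^2 across regressors by averaging each
regressor's incremental R^2 over all orderings; the channels and data are
hypothetical, and the additive-model extension is not shown.

import math
from itertools import permutations
import numpy as np

rng = np.random.default_rng(11)
n, channels = 400, ["search", "display", "email"]
X = rng.normal(size=(n, len(channels)))
revenue = X @ np.array([2.0, 1.0, 0.5]) + rng.normal(size=n)

def r2(cols):
    """R^2 of a regression of revenue on an intercept plus the listed channels."""
    if not cols:
        return 0.0
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    resid = revenue - Z @ np.linalg.lstsq(Z, revenue, rcond=None)[0]
    return 1.0 - resid.var() / revenue.var()

shares = np.zeros(len(channels))
for order in permutations(range(len(channels))):
    used = []
    for j in order:
        shares[j] += r2(used + [j]) - r2(used)           # marginal R^2 contribution
        used.append(j)
shares /= math.factorial(len(channels))
for name, s in zip(channels, shares):
    print(f"{name}: {s:.3f}")
print("total R^2:", round(r2(list(range(len(channels)))), 3))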

arXiv link: http://arxiv.org/abs/1710.06561v1

Econometrics arXiv updated paper (originally submitted: 2017-10-10)

Inference on Auctions with Weak Assumptions on Information

Authors: Vasilis Syrgkanis, Elie Tamer, Juba Ziani

Given a sample of bids from independent auctions, this paper examines the
question of inference on auction fundamentals (e.g. valuation distributions,
welfare measures) under weak assumptions on information structure. The question
is important as it allows us to learn about the valuation distribution in a
robust way, i.e., without assuming that a particular information structure
holds across observations. We leverage the recent contributions of
Bergemann and Morris (2013) in the robust mechanism design literature that exploit the
link between Bayesian Correlated Equilibria and Bayesian Nash Equilibria in
incomplete information games to construct an econometrics framework for
learning about auction fundamentals using observed data on bids. We showcase
our construction of identified sets in private value and common value auctions.
Our approach for constructing these sets inherits the computational simplicity
of solving for correlated equilibria: checking whether a particular valuation
distribution belongs to the identified set is as simple as determining whether
a {\it linear} program is feasible. A similar linear program can be used to
construct the identified set on various welfare measures and counterfactual
objects. For inference and to summarize statistical uncertainty, we propose
novel finite sample methods using tail inequalities that are used to construct
confidence regions on sets. We also highlight methods based on Bayesian
bootstrap and subsampling. A set of Monte Carlo experiments show adequate
finite sample properties of our inference procedures. We illustrate our methods
using data from OCS auctions.

arXiv link: http://arxiv.org/abs/1710.03830v2

Econometrics arXiv paper, submitted: 2017-10-09

A Unified Approach on the Local Power of Panel Unit Root Tests

Authors: Zhongwen Liang

In this paper, a unified approach is proposed to derive the exact local
asymptotic power for panel unit root tests, which is one of the most important
issues in nonstationary panel data literature. Two most widely used panel unit
root tests known as Levin-Lin-Chu (LLC, Levin, Lin and Chu (2002)) and
Im-Pesaran-Shin (IPS, Im, Pesaran and Shin (2003)) tests are systematically
studied for various situations to illustrate our method. Our approach is
characteristic function based, and can be used directly in deriving the moments
of the asymptotic distributions of these test statistics under the null and the
local-to-unity alternatives. For the LLC test, the approach provides an
alternative way to obtain the results that can be derived by the existing
method. For the IPS test, the new results are obtained, which fills the gap in
the literature where few results exist, since the IPS test is non-admissible.
Moreover, our approach has the advantage in deriving Edgeworth expansions of
these tests, which are also given in the paper. The simulations are presented
to illustrate our theoretical findings.

arXiv link: http://arxiv.org/abs/1710.02944v1

Econometrics arXiv cross-link from Computer Science – Machine Learning (cs.LG), submitted: 2017-10-09

Forecasting Across Time Series Databases using Recurrent Neural Networks on Groups of Similar Series: A Clustering Approach

Authors: Kasun Bandara, Christoph Bergmeir, Slawek Smyl

With the advent of Big Data, nowadays in many applications databases
containing large quantities of similar time series are available. Forecasting
time series in these domains with traditional univariate forecasting procedures
leaves great potential for producing accurate forecasts untapped. Recurrent
neural networks (RNNs), and in particular Long Short-Term Memory (LSTM)
networks, have proven recently that they are able to outperform
state-of-the-art univariate time series forecasting methods in this context
when trained across all available time series. However, if the time series
database is heterogeneous, accuracy may degenerate, so that on the way towards
fully automatic forecasting methods in this space, a notion of similarity
between the time series needs to be built into the methods. To this end, we
present a prediction model that can be used with different types of RNN models
on subgroups of similar time series, which are identified by time series
clustering techniques. We assess our proposed methodology using LSTM networks,
a widely popular RNN variant. Our method achieves competitive results on
benchmarking datasets under competition evaluation procedures. In particular,
in terms of mean sMAPE accuracy, it consistently outperforms the baseline LSTM
model and outperforms all other methods on the CIF2016 forecasting competition
dataset.

arXiv link: http://arxiv.org/abs/1710.03222v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2017-10-09

When Should You Adjust Standard Errors for Clustering?

Authors: Alberto Abadie, Susan Athey, Guido Imbens, Jeffrey Wooldridge

In empirical work it is common to estimate parameters of models and report
associated standard errors that account for "clustering" of units, where
clusters are defined by factors such as geography. Clustering adjustments are
typically motivated by the concern that unobserved components of outcomes for
units within clusters are correlated. However, this motivation does not provide
guidance about questions such as: (i) Why should we adjust standard errors for
clustering in some situations but not others? How can we justify the common
practice of clustering in observational studies but not randomized experiments,
or clustering by state but not by gender? (ii) Why is conventional clustering a
potentially conservative "all-or-nothing" adjustment, and are there alternative
methods that respond to data and are less conservative? (iii) In what settings
does the choice of whether and how to cluster make a difference? We address
these questions using a framework of sampling and design inference. We argue
that clustering can be needed to address sampling issues if sampling follows a
two stage process where in the first stage, a subset of clusters are sampled
from a population of clusters, and in the second stage, units are sampled from
the sampled clusters. Then, clustered standard errors account for the existence
of clusters in the population that we do not see in the sample. Clustering can
be needed to account for design issues if treatment assignment is correlated
with membership in a cluster. We propose new variance estimators to deal with
intermediate settings where conventional cluster standard errors are
unnecessarily conservative and robust standard errors are too small.
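
For reference, the conventional "all-or-nothing" cluster adjustment discussed
above is a one-line option in statsmodels; the data and cluster structure below
are simulated, and the paper's proposed intermediate variance estimators are
not implemented here.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(12)
n_clusters, per_cluster = 50, 20
cluster = np.repeat(np.arange(n_clusters), per_cluster)
cluster_shock = rng.normal(size=n_clusters)[cluster]          # within-cluster correlation
x = rng.normal(size=n_clusters * per_cluster) + 0.5 * cluster_shock
y = 1.0 + 2.0 * x + cluster_shock + rng.normal(size=x.size)

X = sm.add_constant(x)
robust = sm.OLS(y, X).fit(cov_type="HC1")
clustered = sm.OLS(y, X).fit(cov_type="cluster", cov_kwds={"groups": cluster})
print("heteroskedasticity-robust SE:", robust.bse[1].round(3))
print("cluster-robust SE:          ", clustered.bse[1].round(3))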

arXiv link: http://arxiv.org/abs/1710.02926v4

Econometrics arXiv paper, submitted: 2017-10-06

A Note on Gale, Kuhn, and Tucker's Reductions of Zero-Sum Games

Authors: Shuige Liu

Gale, Kuhn and Tucker (1950) introduced two ways to reduce a zero-sum game by
packaging some strategies with respect to a probability distribution on them.
In terms of value, they gave conditions for a desirable reduction. We show that
a probability distribution for a desirable reduction relies on optimal
strategies in the original game. Also, we correct an improper example given by
them to show that the reverse of a theorem does not hold.

arXiv link: http://arxiv.org/abs/1710.02326v1

Econometrics arXiv cross-link from cs.SY (cs.SY), submitted: 2017-10-05

Finite Time Identification in Unstable Linear Systems

Authors: Mohamad Kazem Shirani Faradonbeh, Ambuj Tewari, George Michailidis

Identification of the parameters of stable linear dynamical systems is a
well-studied problem in the literature, both in the low and high-dimensional
settings. However, there are hardly any results for the unstable case,
especially regarding finite time bounds. For this setting, classical results on
least-squares estimation of the dynamics parameters are not applicable and
therefore new concepts and technical approaches need to be developed to address
the issue. Unstable linear systems arise in key real applications in control
theory, econometrics, and finance. This study establishes finite time bounds
for the identification error of the least-squares estimates for a fairly large
class of heavy-tailed noise distributions, and transition matrices of such
systems. The results relate the time length (samples) required for estimation
to a function of the problem dimension and key characteristics of the true
underlying transition matrix and the noise distribution. To establish them,
appropriate concentration inequalities for random matrices and for sequences of
martingale differences are leveraged.

arXiv link: http://arxiv.org/abs/1710.01852v2

Econometrics arXiv updated paper (originally submitted: 2017-10-04)

Rate-Optimal Estimation of the Intercept in a Semiparametric Sample-Selection Model

Authors: Chuan Goh

This paper presents a new estimator of the intercept of a linear regression
model in cases where the outcome variable is observed subject to a selection
rule. The intercept is often of inherent interest in this context; for example,
in a program evaluation context, the difference between the intercepts in
outcome equations for participants and non-participants can be interpreted as
the difference in average outcomes of participants and their counterfactual
average outcomes if they had chosen not to participate. The new estimator can
under mild conditions exhibit a rate of convergence in probability equal to
$n^{-p/(2p+1)}$, where $p\ge 2$ is an integer that indexes the strength of
certain smoothness assumptions. This rate of convergence is shown in this
context to be the optimal rate of convergence for estimation of the intercept
parameter in terms of a minimax criterion. The new estimator, unlike other
proposals in the literature, is under mild conditions consistent and
asymptotically normal with a rate of convergence that is the same regardless of
the degree to which selection depends on unobservables in the outcome equation.
Simulation evidence and an empirical example are included.

arXiv link: http://arxiv.org/abs/1710.01423v3

Econometrics arXiv updated paper (originally submitted: 2017-10-02)

A Justification of Conditional Confidence Intervals

Authors: Eric Beutner, Alexander Heinemann, Stephan Smeekes

To quantify uncertainty around point estimates of conditional objects such as
conditional means or variances, parameter uncertainty has to be taken into
account. Attempts to incorporate parameter uncertainty are typically based on
the unrealistic assumption of observing two independent processes, where one is
used for parameter estimation, and the other for conditioning upon. Such an
unrealistic foundation raises the question of whether these intervals are
theoretically justified in a realistic setting. This paper presents an
asymptotic justification for this type of intervals that does not require such
an unrealistic assumption, but relies on a sample-split approach instead. By
showing that our sample-split intervals coincide asymptotically with the
standard intervals, we provide a novel, and realistic, justification for
confidence intervals of conditional objects. The analysis is carried out for a
rich class of time series models.

arXiv link: http://arxiv.org/abs/1710.00643v2

Econometrics arXiv updated paper (originally submitted: 2017-10-01)

A Note on the Multi-Agent Contracts in Continuous Time

Authors: Qi Luo, Romesh Saigal

Dynamic contracting with multiple agents is a classical decentralized
decision-making problem with asymmetric information. In this paper, we extend
the single-agent dynamic incentive contract model in continuous-time to a
multi-agent scheme in finite horizon and allow the terminal reward to be
dependent on the history of actions and incentives. We first derive a set of
sufficient conditions for the existence of optimal contracts in the most
general setting and conditions under which they form a Nash equilibrium. Then
we show that the principal's problem can be converted to solving a
Hamilton-Jacobi-Bellman (HJB) equation requiring a static Nash equilibrium.
Finally, we provide a framework to solve this problem by solving partial
differential equations (PDE) derived from backward stochastic differential
equations (BSDE).

arXiv link: http://arxiv.org/abs/1710.00377v2

Econometrics arXiv updated paper (originally submitted: 2017-09-29)

Heterogeneous Employment Effects of Job Search Programmes: A Machine Learning Approach

Authors: Michael Knaus, Michael Lechner, Anthony Strittmatter

We systematically investigate the effect heterogeneity of job search
programmes for unemployed workers. To investigate possibly heterogeneous
employment effects, we combine non-experimental causal empirical models with
Lasso-type estimators. The empirical analyses are based on rich administrative
data from Swiss social security records. We find considerable heterogeneities
only during the first six months after the start of training. Consistent with
previous results of the literature, unemployed persons with fewer employment
opportunities profit more from participating in these programmes. Furthermore,
we also document heterogeneous employment effects by residence status. Finally,
we show the potential of easy-to-implement programme participation rules for
improving average employment effects of these active labour market programmes.

arXiv link: http://arxiv.org/abs/1709.10279v2

Econometrics arXiv updated paper (originally submitted: 2017-09-28)

Inference for VARs Identified with Sign Restrictions

Authors: Eleonora Granziera, Hyungsik Roger Moon, Frank Schorfheide

There is a fast-growing literature that set-identifies structural vector
autoregressions (SVARs) by imposing sign restrictions on the responses of a
subset of the endogenous variables to a particular structural shock
(sign-restricted SVARs). Most methods that have been used to construct
pointwise coverage bands for impulse responses of sign-restricted SVARs are
justified only from a Bayesian perspective. This paper demonstrates how to
formulate the inference problem for sign-restricted SVARs within a
moment-inequality framework. In particular, it develops methods of constructing
confidence bands for impulse response functions of sign-restricted SVARs that
are valid from a frequentist perspective. The paper also provides a comparison
of frequentist and Bayesian coverage bands in the context of an empirical
application - the former can be substantially wider than the latter.

arXiv link: http://arxiv.org/abs/1709.10196v2

Econometrics arXiv paper, submitted: 2017-09-28

Forecasting with Dynamic Panel Data Models

Authors: Laura Liu, Hyungsik Roger Moon, Frank Schorfheide

This paper considers the problem of forecasting a collection of short time
series using cross sectional information in panel data. We construct point
predictors using Tweedie's formula for the posterior mean of heterogeneous
coefficients under a correlated random effects distribution. This formula
utilizes cross-sectional information to transform the unit-specific (quasi)
maximum likelihood estimator into an approximation of the posterior mean under
a prior distribution that equals the population distribution of the random
coefficients. We show that the risk of a predictor based on a non-parametric
estimate of the Tweedie correction is asymptotically equivalent to the risk of
a predictor that treats the correlated-random-effects distribution as known
(ratio-optimality). Our empirical Bayes predictor performs well compared to
various competitors in a Monte Carlo study. In an empirical application we use
the predictor to forecast revenues for a large panel of bank holding companies
and compare forecasts that condition on actual and severely adverse
macroeconomic conditions.

arXiv link: http://arxiv.org/abs/1709.10193v1

Econometrics arXiv updated paper (originally submitted: 2017-09-28)

Estimation of Graphical Models using the $L_{1,2}$ Norm

Authors: Khai X. Chiong, Hyungsik Roger Moon

Gaussian graphical models are recently used in economics to obtain networks
of dependence among agents. A widely-used estimator is the Graphical Lasso
(GLASSO), which amounts to a maximum likelihood estimation regularized using
the $L_{1,1}$ matrix norm on the precision matrix $\Omega$. The $L_{1,1}$ norm
is a lasso penalty that controls for sparsity, or the number of zeros in
$\Omega$. We propose a new estimator called Structured Graphical Lasso
(SGLASSO) that uses the $L_{1,2}$ mixed norm. The use of the $L_{1,2}$ penalty
controls for the structure of the sparsity in $\Omega$. We show that when the
network size is fixed, SGLASSO is asymptotically equivalent to an infeasible
GLASSO problem which prioritizes the sparsity-recovery of high-degree nodes.
Monte Carlo simulation shows that SGLASSO outperforms GLASSO in terms of
estimating the overall precision matrix and in terms of estimating the
structure of the graphical model. In an empirical illustration using a classic
firms' investment dataset, we obtain a network of firms' dependence that
exhibits the core-periphery structure, with General Motors, General Electric
and U.S. Steel forming the core group of firms.

arXiv link: http://arxiv.org/abs/1709.10038v2

Econometrics arXiv updated paper (originally submitted: 2017-09-28)

Estimation of Peer Effects in Endogenous Social Networks: Control Function Approach

Authors: Ida Johnsson, Hyungsik Roger Moon

We propose a method of estimating the linear-in-means model of peer effects
in which the peer group, defined by a social network, is endogenous in the
outcome equation for peer effects. Endogeneity is due to unobservable
individual characteristics that influence both link formation in the network
and the outcome of interest. We propose two estimators of the peer effect
equation that control for the endogeneity of the social connections using a
control function approach. We leave the functional form of the control function
unspecified and treat it as unknown. To estimate the model, we use a sieve
semiparametric approach, and we establish asymptotics of the semiparametric
estimator.

arXiv link: http://arxiv.org/abs/1709.10024v3

Econometrics arXiv paper, submitted: 2017-09-27

Quasi-random Monte Carlo application in CGE systematic sensitivity analysis

Authors: Theodoros Chatzivasileiadis

The uncertainty and robustness of Computable General Equilibrium (CGE) models
can be assessed by conducting a Systematic Sensitivity Analysis (SSA). Different
methods have been used in the literature for SSA of CGE models, such as Gaussian
Quadrature and Monte Carlo methods. This paper explores the use of Quasi-random
Monte Carlo methods based on the Halton and Sobol' sequences as means to
improve the efficiency over regular Monte Carlo SSA, thus reducing the
computational requirements of the SSA. The findings suggest that by using
low-discrepancy sequences, the number of simulations required by the regular MC
SSA methods can be notably reduced, hence lowering the computational time
required for SSA of CGE models.

arXiv link: http://arxiv.org/abs/1709.09755v1
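
For readers who want to experiment with low-discrepancy draws of the kind used here, the sketch below generates Halton and scrambled Sobol' points with SciPy and compares a simple Monte Carlo average against its pseudo-random counterpart. The CGE model is replaced by a toy integrand, and the sample sizes and seeds are arbitrary.

# Sketch: quasi-random (Halton, Sobol') vs pseudo-random sampling of parameter
# draws for a systematic sensitivity analysis, illustrated on a toy integrand.
import numpy as np
from scipy.stats import qmc

d, n = 3, 1024
f = lambda u: np.exp(-np.sum(u ** 2, axis=1))   # stand-in for a CGE model response

rng = np.random.default_rng(1)
u_mc = rng.random((n, d))                       # plain Monte Carlo draws
u_halton = qmc.Halton(d=d, scramble=True, seed=1).random(n)
u_sobol = qmc.Sobol(d=d, scramble=True, seed=1).random(n)

for name, u in [("MC", u_mc), ("Halton", u_halton), ("Sobol'", u_sobol)]:
    print(name, f(u).mean())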

Econometrics arXiv updated paper (originally submitted: 2017-09-27)

Inference for Impulse Responses under Model Uncertainty

Authors: Lenard Lieb, Stephan Smeekes

In many macroeconomic applications, confidence intervals for impulse
responses are constructed by estimating VAR models in levels - ignoring
cointegration rank uncertainty. We investigate the consequences of ignoring
this uncertainty. We adapt several methods for handling model uncertainty and
highlight their shortcomings. We propose a new method -
Weighted-Inference-by-Model-Plausibility (WIMP) - that takes rank uncertainty
into account in a data-driven way. In simulations the WIMP outperforms all
other methods considered, delivering intervals that are robust to rank
uncertainty, yet not overly conservative. We also study potential ramifications
of rank uncertainty on applied macroeconomic analysis by re-assessing the
effects of fiscal policy shocks.

arXiv link: http://arxiv.org/abs/1709.09583v3

Econometrics arXiv updated paper (originally submitted: 2017-09-27)

Identification of hedonic equilibrium and nonseparable simultaneous equations

Authors: Victor Chernozhukov, Alfred Galichon, Marc Henry, Brendan Pass

This paper derives conditions under which preferences and technology are
nonparametrically identified in hedonic equilibrium models, where products are
differentiated along more than one dimension and agents are characterized by
several dimensions of unobserved heterogeneity. With products differentiated
along a quality index and agents characterized by scalar unobserved
heterogeneity, single crossing conditions on preferences and technology provide
identifying restrictions in Ekeland, Heckman and Nesheim (2004) and Heckman,
Matzkin and Nesheim (2010). We develop similar shape restrictions in the
multi-attribute case. These shape restrictions, which are based on optimal
transport theory and generalized convexity, allow us to identify preferences
for goods differentiated along multiple dimensions, from the observation of a
single market. We thereby derive nonparametric identification results for
nonseparable simultaneous equations and multi-attribute hedonic equilibrium
models with (possibly) multiple dimensions of unobserved heterogeneity. One of
our results is a proof of absolute continuity of the distribution of
endogenously traded qualities, which is of independent interest.

arXiv link: http://arxiv.org/abs/1709.09570v6

Econometrics arXiv updated paper (originally submitted: 2017-09-27)

Zero-rating of Content and its Effect on the Quality of Service in the Internet

Authors: Manjesh K. Hanawal, Fehmina Malik, Yezekael Hayel

The ongoing net neutrality debate has generated a lot of heated discussions
on whether or not monetary interactions should be regulated between content and
access providers. Among the several topics discussed, `differential pricing'
has recently received attention due to `zero-rating' platforms proposed by some
service providers. In the differential pricing scheme, Internet Service
Providers (ISPs) can exempt content from certain CPs (zero-rated) from data
access charges, while content from other CPs receives no exemption. This allows the
possibility for Content Providers (CPs) to make `sponsorship' agreements to
zero-rate their content and attract more user traffic. In this paper, we study
the effect of differential pricing on various players in the Internet. We first
consider a model with a monopolistic ISP and multiple CPs where users select
CPs based on the quality of service (QoS) and data access charges. We show that
in a differential pricing regime 1) a CP offering low QoS can have a higher
surplus than a CP offering better QoS through sponsorship. 2) Overall QoS
(mean delay) for end users can degrade under differential pricing schemes. In
the oligopolistic market with multiple ISPs, users tend to select the ISP with
the lowest price, resulting in the same type of conclusions as in the monopolistic
market. We then study how differential pricing affects the revenue of ISPs.

arXiv link: http://arxiv.org/abs/1709.09334v2

Econometrics arXiv updated paper (originally submitted: 2017-09-26)

Sharp bounds and testability of a Roy model of STEM major choices

Authors: Ismael Mourifie, Marc Henry, Romuald Meango

We analyze the empirical content of the Roy model, stripped down to its
essential features, namely sector specific unobserved heterogeneity and
self-selection on the basis of potential outcomes. We characterize sharp bounds
on the joint distribution of potential outcomes and testable implications of
the Roy self-selection model under an instrumental constraint on the joint
distribution of potential outcomes we call stochastically monotone instrumental
variable (SMIV). We show that testing the Roy model selection is equivalent to
testing stochastic monotonicity of observed outcomes relative to the
instrument. We apply our sharp bounds to the derivation of a measure of
departure from Roy self-selection to identify values of observable
characteristics that induce the most costly misallocation of talent and sector
and are therefore prime targets for intervention. Special emphasis is put on
the case of binary outcomes, which has received little attention in the
literature to date. For richer sets of outcomes, we emphasize the distinction
between pointwise sharp bounds and functional sharp bounds, and its importance,
when constructing sharp bounds on functional features, such as inequality
measures. We analyze a Roy model of college major choice in Canada and Germany
within this framework, and we take a new look at the under-representation of
women in STEM.

arXiv link: http://arxiv.org/abs/1709.09284v2

Econometrics arXiv paper, submitted: 2017-09-26

Discrete Choice and Rational Inattention: a General Equivalence Result

Authors: Mogens Fosgerau, Emerson Melo, Andre de Palma, Matthew Shum

This paper establishes a general equivalence between discrete choice and
rational inattention models. Matejka and McKay (2015, AER) showed that when
information costs are modelled using the Shannon entropy function, the
resulting choice probabilities in the rational inattention model take the
multinomial logit form. By exploiting convex-analytic properties of the
discrete choice model, we show that when information costs are modelled using a
class of generalized entropy functions, the choice probabilities in any
rational inattention model are observationally equivalent to some additive
random utility discrete choice model and vice versa. Thus any additive random
utility model can be given an interpretation in terms of boundedly rational
behavior. This includes empirically relevant specifications such as the probit
and nested logit models.

arXiv link: http://arxiv.org/abs/1709.09117v1
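
As a reminder of the Shannon-entropy benchmark that the paper generalizes, the snippet below computes rational-inattention choice probabilities with an entropic information cost and a uniform prior over options, which reduce to multinomial logit (a softmax of payoffs scaled by the cost parameter). The generalized-entropy case treated in the paper is not implemented here, and the payoff vector is illustrative.

# Sketch: with a Shannon-entropy information cost (Matejka and McKay, 2015) and a
# uniform prior, rational-inattention choice probabilities reduce to logit:
#   P(j) = exp(v_j / lam) / sum_k exp(v_k / lam).
import numpy as np

def ri_logit(v, lam):
    z = (v - v.max()) / lam          # subtract the max for numerical stability
    w = np.exp(z)
    return w / w.sum()

v = np.array([1.0, 0.5, 0.0])        # payoffs of three alternatives
for lam in (0.1, 1.0, 10.0):         # higher information cost -> flatter choices
    print(lam, ri_logit(v, lam))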

Econometrics arXiv paper, submitted: 2017-09-26

Inference on Estimators defined by Mathematical Programming

Authors: Yu-Wei Hsieh, Xiaoxia Shi, Matthew Shum

We propose an inference procedure for estimators defined by mathematical
programming problems, focusing on the important special cases of linear
programming (LP) and quadratic programming (QP). In these settings, the
coefficients in both the objective function and the constraints of the
mathematical programming problem may be estimated from data and hence involve
sampling error. Our inference approach exploits the characterization of the
solutions to these programming problems by complementarity conditions; by doing
so, we can transform the problem of doing inference on the solution of a
constrained optimization problem (a non-standard inference problem) into one
involving inference based on a set of inequalities with pre-estimated
coefficients, which is much better understood. We evaluate the performance of
our procedure in several Monte Carlo simulations and an empirical application
to the classic portfolio selection problem in finance.

arXiv link: http://arxiv.org/abs/1709.09115v1

Econometrics arXiv paper, submitted: 2017-09-26

Bounds On Treatment Effects On Transitions

Authors: Johan Vikström, Geert Ridder, Martin Weidner

This paper considers the identification of treatment effects on conditional
transition probabilities. We show that even under random assignment only the
instantaneous average treatment effect is point identified. Since treated and
control units drop out at different rates, randomization only ensures the
comparability of treatment and controls at the time of randomization, so that
long-run average treatment effects are not point identified. Instead we derive
informative bounds on these average treatment effects. Our bounds do not impose
(semi)parametric restrictions, for example, proportional hazards. We also
explore various assumptions such as monotone treatment response, common shocks
and positively correlated outcomes that tighten the bounds.

arXiv link: http://arxiv.org/abs/1709.08981v1

Econometrics arXiv updated paper (originally submitted: 2017-09-26)

Fixed Effect Estimation of Large T Panel Data Models

Authors: Iván Fernández-Val, Martin Weidner

This article reviews recent advances in fixed effect estimation of panel data
models for long panels, where the number of time periods is relatively large.
We focus on semiparametric models with unobserved individual and time effects,
where the distribution of the outcome variable conditional on covariates and
unobserved effects is specified parametrically, while the distribution of the
unobserved effects is left unrestricted. Compared to existing reviews on long
panels (Arellano and Hahn 2007; a section in Arellano and Bonhomme 2011) we
discuss models with both individual and time effects, split-panel Jackknife
bias corrections, unbalanced panels, distribution and quantile effects, and
other extensions. Understanding and correcting the incidental parameter bias
caused by the estimation of many fixed effects is our main focus, and the
unifying theme is that the order of this bias is given by the simple formula
p/n for all models discussed, with p the number of estimated parameters and n
the total sample size.

arXiv link: http://arxiv.org/abs/1709.08980v2
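
The split-panel jackknife mentioned in this review has a particularly simple form; the sketch below applies it to a generic estimator by splitting the panel into two halves over time and combining the full- and half-panel estimates, which removes the leading O(1/T) incidental-parameter bias under the usual smoothness conditions. The function names and the toy within-group variance example are illustrative, not taken from the paper.

# Sketch of the split-panel jackknife (SPJ) bias correction for a long panel:
#   theta_SPJ = 2 * theta(full panel) - 0.5 * (theta(first half) + theta(second half)).
# `estimator` is any function mapping an (N x T) outcome array to an estimate
# whose bias is of order 1/T.
import numpy as np

def split_panel_jackknife(y, estimator):
    T = y.shape[1]
    theta_full = estimator(y)
    theta_a = estimator(y[:, : T // 2])
    theta_b = estimator(y[:, T // 2 :])
    return 2.0 * theta_full - 0.5 * (theta_a + theta_b)

# Toy example: estimating the idiosyncratic variance after removing unit fixed
# effects; the within-group variance estimator has an O(1/T) bias.
rng = np.random.default_rng(2)
N, T = 200, 12
alpha = rng.normal(0, 1, (N, 1))
y = alpha + rng.normal(0, 1, (N, T))          # true idiosyncratic variance = 1

within_var = lambda y: np.mean((y - y.mean(axis=1, keepdims=True)) ** 2)
print("plug-in estimate:", within_var(y))      # biased toward (T-1)/T
print("SPJ-corrected:   ", split_panel_jackknife(y, within_var))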

Econometrics arXiv cross-link from q-fin.TR (q-fin.TR), submitted: 2017-09-24

Counterparty Credit Limits: The Impact of a Risk-Mitigation Measure on Everyday Trading

Authors: Martin D. Gould, Nikolaus Hautsch, Sam D. Howison, Mason A. Porter

A counterparty credit limit (CCL) is a limit that is imposed by a financial
institution to cap its maximum possible exposure to a specified counterparty.
CCLs help institutions to mitigate counterparty credit risk via selective
diversification of their exposures. In this paper, we analyze how CCLs impact
the prices that institutions pay for their trades during everyday trading. We
study a high-quality data set from a large electronic trading platform in the
foreign exchange spot market, which enables institutions to apply CCLs. We find
empirically that CCLs had little impact on the vast majority of trades in this
data. We also study the impact of CCLs using a new model of trading. By
simulating our model with different underlying CCL networks, we highlight that
CCLs can have a major impact in some situations.

arXiv link: http://arxiv.org/abs/1709.08238v3

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2017-09-11

Is completeness necessary? Estimation in nonidentified linear models

Authors: Andrii Babii, Jean-Pierre Florens

Modern data analysis depends increasingly on estimating models via flexible
high-dimensional or nonparametric machine learning methods, where the
identification of structural parameters is often challenging and untestable. In
linear settings, this identification hinges on the completeness condition,
which requires the nonsingularity of a high-dimensional matrix or operator and
may fail for finite samples or even at the population level. Regularized
estimators provide a solution by enabling consistent estimation of structural
or average structural functions, sometimes even under identification failure.
We show that the asymptotic distribution in these cases can be nonstandard. We
develop a comprehensive theory of regularized estimators, which include methods
such as high-dimensional ridge regularization, gradient descent, and principal
component analysis (PCA). The results are illustrated for high-dimensional and
nonparametric instrumental variable regressions and are supported through
simulation experiments.

arXiv link: http://arxiv.org/abs/1709.03473v5

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2017-08-27

Principal Components and Regularized Estimation of Factor Models

Authors: Jushan Bai, Serena Ng

It is known that the common factors in a large panel of data can be
consistently estimated by the method of principal components, and principal
components can be constructed by iterative least squares regressions. Replacing
least squares with ridge regressions turns out to have the effect of shrinking
the singular values of the common component and possibly reducing its rank. The
method is used in the machine learning literature to recover low-rank matrices.
We study the procedure from the perspective of estimating a minimum-rank
approximate factor model. We show that the constrained factor estimates are
biased but can be more efficient in terms of mean-squared errors. Rank
consideration suggests a data-dependent penalty for selecting the number of
factors. The new criterion is more conservative in cases when the nominal
number of factors is inflated by the presence of weak factors or large
measurement noise. The framework is extended to incorporate a priori linear
constraints on the loadings. We provide asymptotic results that can be used to
test economic hypotheses.

arXiv link: http://arxiv.org/abs/1708.08137v2
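
To see the shrinkage effect described in the abstract, the sketch below runs the usual iterative least-squares updates for a factor model alongside a ridge-penalized variant and compares the singular values of the two estimated common components. It is a numerical illustration only; the penalty value, sample sizes, and starting values are arbitrary choices rather than the paper's recommendations.

# Sketch: principal components via iterative least squares vs the same iteration
# with ridge regressions, which shrinks the singular values of the estimated
# common component F @ Lambda'.
import numpy as np

rng = np.random.default_rng(3)
T, N, r = 200, 50, 2
F = rng.normal(size=(T, r))
Lam = rng.normal(size=(N, r))
X = F @ Lam.T + rng.normal(scale=2.0, size=(T, N))

def iterate_factors(X, r, kappa=0.0, n_iter=200):
    Fh = np.linalg.svd(X, full_matrices=False)[0][:, :r]    # starting values
    for _ in range(n_iter):
        Lh = X.T @ Fh @ np.linalg.inv(Fh.T @ Fh + kappa * np.eye(r))
        Fh = X @ Lh @ np.linalg.inv(Lh.T @ Lh + kappa * np.eye(r))
    return Fh @ Lh.T                                         # common component

for kappa in (0.0, 50.0):
    C = iterate_factors(X, r, kappa)
    sv = np.linalg.svd(C, compute_uv=False)[:r]
    print("kappa =", kappa, "singular values of common component:", np.round(sv, 2))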

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2017-08-21

Bias Reduction in Instrumental Variable Estimation through First-Stage Shrinkage

Authors: Jann Spiess

The two-stage least-squares (2SLS) estimator is known to be biased when its
first-stage fit is poor. I show that better first-stage prediction can
alleviate this bias. In a two-stage linear regression model with Normal noise,
I consider shrinkage in the estimation of the first-stage instrumental variable
coefficients. For at least four instrumental variables and a single endogenous
regressor, I establish that the standard 2SLS estimator is dominated with
respect to bias. The dominating IV estimator applies James-Stein type shrinkage
in a first-stage high-dimensional Normal-means problem followed by a
control-function approach in the second stage. It preserves invariances of the
structural instrumental variable equations.

arXiv link: http://arxiv.org/abs/1708.06443v2
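
The sketch below illustrates the general idea in a stylized way: shrink the first-stage coefficient estimates toward zero with a positive-part James-Stein factor (after standardizing by the instrument design), then use a control-function second stage. This is not the paper's exact dominating estimator; the shrinkage target, plug-in variance, and simulated design are simplifying assumptions for illustration.

# Stylized sketch: James-Stein shrinkage of first-stage IV coefficients followed
# by a control-function second stage (not the paper's exact estimator).
import numpy as np

rng = np.random.default_rng(4)
n, k = 500, 6                               # k >= 4 instruments, 1 endogenous regressor
Z = rng.normal(size=(n, k))
pi = np.full(k, 0.15)                       # weak-ish first stage
u = rng.normal(size=n)
v = 0.8 * u + rng.normal(scale=0.6, size=n)
x = Z @ pi + v
y = 1.0 * x + u                             # true beta = 1

# First stage, standardized to a Normal-means problem via (Z'Z)^{1/2}.
ZZ = Z.T @ Z
pi_ols = np.linalg.solve(ZZ, Z.T @ x)
sigma_v2 = (x - Z @ pi_ols) @ (x - Z @ pi_ols) / (n - k)
R = np.linalg.cholesky(ZZ).T                # ZZ = R'R
theta = R @ pi_ols                          # approx N(R pi, sigma_v2 * I)
shrink = max(0.0, 1.0 - (k - 2) * sigma_v2 / (theta @ theta))
pi_js = np.linalg.solve(R, shrink * theta)  # positive-part James-Stein first stage

# Control-function second stage: regress y on x and the shrunken first-stage residual.
v_hat = x - Z @ pi_js
W = np.column_stack([x, v_hat, np.ones(n)])
beta_cf = np.linalg.lstsq(W, y, rcond=None)[0][0]
print("control-function IV estimate of beta:", beta_cf)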

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2017-08-21

Unbiased Shrinkage Estimation

Authors: Jann Spiess

Shrinkage estimation usually reduces variance at the cost of bias. But when
we care only about some parameters of a model, I show that we can reduce
variance without incurring bias if we have additional information about the
distribution of covariates. In a linear regression model with homoscedastic
Normal noise, I consider shrinkage estimation of the nuisance parameters
associated with control variables. For at least three control variables and
exogenous treatment, I establish that the standard least-squares estimator is
dominated with respect to squared-error loss in the treatment effect even among
unbiased estimators and even when the target parameter is low-dimensional. I
construct the dominating estimator by a variant of James-Stein shrinkage in a
high-dimensional Normal-means problem. It can be interpreted as an invariant
generalized Bayes estimator with an uninformative (improper) Jeffreys prior in
the target parameter.

arXiv link: http://arxiv.org/abs/1708.06436v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2017-08-15

Comparing distributions by multiple testing across quantiles or CDF values

Authors: Matt Goldman, David M. Kaplan

When comparing two distributions, it is often helpful to learn at which
quantiles or values there is a statistically significant difference. This
provides more information than the binary "reject" or "do not reject" decision
of a global goodness-of-fit test. Framing our question as multiple testing
across the continuum of quantiles $\tau\in(0,1)$ or values $r\in\mathbb{R}$, we
show that the Kolmogorov--Smirnov test (interpreted as a multiple testing
procedure) achieves strong control of the familywise error rate. However, its
well-known flaw of low sensitivity in the tails remains. We provide an
alternative method that retains such strong control of familywise error rate
while also having even sensitivity, i.e., equal pointwise type I error rates at
each of $n\to\infty$ order statistics across the distribution. Our one-sample
method computes instantly, using our new formula that also instantly computes
goodness-of-fit $p$-values and uniform confidence bands. To improve power, we
also propose stepdown and pre-test procedures that maintain control of the
asymptotic familywise error rate. One-sample and two-sample cases are
considered, as well as extensions to regression discontinuity designs and
conditional distributions. Simulations, empirical examples, and code are
provided.

arXiv link: http://arxiv.org/abs/1708.04658v1
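
As a simple illustration of the multiple-testing interpretation, the snippet below computes the asymptotic two-sample Kolmogorov-Smirnov band and flags the values at which the empirical CDFs differ by more than the band, so that a rejection can be localized rather than reported only globally. The authors' evenly sensitive method and the stepdown refinements are not implemented here, and the simulated distributions are arbitrary.

# Sketch: reading the two-sample KS test as multiple testing across values r,
# flagging where |F1_hat(r) - F2_hat(r)| exceeds the simultaneous KS band.
import numpy as np
from scipy.stats import kstwobign

rng = np.random.default_rng(5)
x = rng.normal(0.0, 1.0, 300)
y = rng.normal(0.0, 2.0, 300)           # same location, different spread

grid = np.sort(np.concatenate([x, y]))
F1 = np.searchsorted(np.sort(x), grid, side="right") / x.size
F2 = np.searchsorted(np.sort(y), grid, side="right") / y.size

n, m = x.size, y.size
crit = kstwobign.ppf(0.95) / np.sqrt(n * m / (n + m))   # asymptotic 5% band
flagged = grid[np.abs(F1 - F2) > crit]

print("KS statistic:", np.max(np.abs(F1 - F2)))
print("values with a significant CDF difference:",
      "none" if flagged.size == 0 else (flagged.min(), flagged.max()))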

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2017-07-29

Identification of Treatment Effects under Conditional Partial Independence

Authors: Matthew A. Masten, Alexandre Poirier

Conditional independence of treatment assignment from potential outcomes is a
commonly used but nonrefutable assumption. We derive identified sets for
various treatment effect parameters under nonparametric deviations from this
conditional independence assumption. These deviations are defined via a
conditional treatment assignment probability, which makes it straightforward to
interpret. Our results can be used to assess the robustness of empirical
conclusions obtained under the baseline conditional independence assumption.

arXiv link: http://arxiv.org/abs/1707.09563v1

Econometrics arXiv cross-link from stat.OT (stat.OT), submitted: 2017-07-26

Econométrie et Machine Learning

Authors: Arthur Charpentier, Emmanuel Flachaire, Antoine Ly

Econometrics and machine learning seem to have one common goal: to construct
a predictive model, for a variable of interest, using explanatory variables (or
features). However, these two fields developed in parallel, thus creating two
different cultures, to paraphrase Breiman (2001). The first builds probabilistic
models to describe economic phenomena. The second uses algorithms that learn from
their mistakes, most often with the aim of classification (sounds, images, etc.).
Recently, however, learning models have proven more effective than traditional
econometric techniques (at the price of reduced explanatory power), and above
all, they can handle much larger data sets. In this context, it becomes necessary
for econometricians to understand what these two cultures are, what separates
them and especially what brings them together, in order to adopt the tools
developed by the statistical learning community and integrate them into
econometric models.

arXiv link: http://arxiv.org/abs/1708.06992v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2017-07-11

Smoothed GMM for quantile models

Authors: Luciano de Castro, Antonio F. Galvao, David M. Kaplan, Xin Liu

This paper develops theory for feasible estimators of finite-dimensional
parameters identified by general conditional quantile restrictions, under much
weaker assumptions than previously seen in the literature. This includes
instrumental variables nonlinear quantile regression as a special case. More
specifically, we consider a set of unconditional moments implied by the
conditional quantile restrictions, providing conditions for local
identification. Since estimators based on the sample moments are generally
impossible to compute numerically in practice, we study feasible estimators
based on smoothed sample moments. We propose a method of moments estimator for
exactly identified models, as well as a generalized method of moments estimator
for over-identified models. We establish consistency and asymptotic normality
of both estimators under general conditions that allow for weakly dependent
data and nonlinear structural models. Simulations illustrate the finite-sample
properties of the methods. Our in-depth empirical application concerns the
consumption Euler equation derived from quantile utility maximization.
Advantages of the quantile Euler equation include robustness to fat tails,
decoupling of risk attitude from the elasticity of intertemporal substitution,
and log-linearization without any approximation error. For the four countries
we examine, the quantile estimates of discount factor and elasticity of
intertemporal substitution are economically reasonable for a range of quantiles
above the median, even when two-stage least squares estimates are not
reasonable.

arXiv link: http://arxiv.org/abs/1707.03436v2
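
To show the kind of smoothing involved, the sketch below sets up moments for an exactly identified linear quantile model by replacing the indicator in the quantile restriction with a smooth normal-CDF kernel and solving the resulting smoothed estimating equations numerically. The bandwidth, instruments (here simply the regressors), kernel, and data-generating process are illustrative choices rather than the paper's.

# Sketch of smoothed method-of-moments for a linear quantile model:
#   E[ z * (tau - G((x'b - y)/h)) ] = 0,
# where the smooth CDF G replaces the indicator 1{y <= x'b}. Exactly identified
# case with instruments z = x.
import numpy as np
from scipy.stats import norm
from scipy.optimize import root

rng = np.random.default_rng(6)
n, tau, h = 2000, 0.5, 0.3
x = np.column_stack([np.ones(n), rng.normal(size=n)])
y = x @ np.array([1.0, 2.0]) + rng.standard_t(df=5, size=n)   # symmetric errors

def smoothed_moments(b):
    s = norm.cdf((x @ b - y) / h)          # smoothed indicator of y <= x'b
    return x.T @ (tau - s) / n

sol = root(smoothed_moments, x0=np.zeros(2))
print("smoothed quantile estimates:", sol.x)   # roughly (1.0, 2.0) at tau = 0.5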

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2017-07-05

Machine-Learning Tests for Effects on Multiple Outcomes

Authors: Jens Ludwig, Sendhil Mullainathan, Jann Spiess

In this paper we present tools for applied researchers that re-purpose
off-the-shelf methods from the computer-science field of machine learning to
create a "discovery engine" for data from randomized controlled trials (RCTs).
The applied problem we seek to solve is that economists invest vast resources
into carrying out RCTs, including the collection of a rich set of candidate
outcome measures. But given concerns about inference in the presence of
multiple testing, economists usually wind up exploring just a small subset of
the hypotheses that the available data could be used to test. This prevents us
from extracting as much information as possible from each RCT, which in turn
impairs our ability to develop new theories or strengthen the design of policy
interventions. Our proposed solution combines the basic intuition of reverse
regression, where the dependent variable of interest now becomes treatment
assignment itself, with methods from machine learning that use the data
themselves to flexibly identify whether there is any function of the outcomes
that predicts (or has signal about) treatment group status. This leads to
correctly-sized tests with appropriate $p$-values, which also have the
important virtue of being easy to implement in practice. One open challenge
that remains with our work is how to meaningfully interpret the signal that
these methods find.

arXiv link: http://arxiv.org/abs/1707.01473v2
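
One simple variant in this spirit (not the paper's exact procedure) is sketched below: form cross-validated predictions of treatment status from the vector of outcomes and compare the achieved accuracy to its permutation distribution, so that the resulting p-value is valid by construction under the null of no effect on any outcome. The classifier, number of permutations, and simulated data are illustrative assumptions.

# Sketch of a "reverse regression" ML test: predict treatment from outcomes with
# cross-validation and calibrate the accuracy by permuting the treatment labels.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(7)
n, p = 400, 10
w = rng.integers(0, 2, n)                       # randomized treatment
Y = rng.normal(size=(n, p))
Y[:, 0] += 0.4 * w                              # treatment shifts one outcome slightly

def cv_accuracy(labels):
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    pred = cross_val_predict(clf, Y, labels, cv=5)
    return np.mean(pred == labels)

obs = cv_accuracy(w)
perm = np.array([cv_accuracy(rng.permutation(w)) for _ in range(19)])
p_value = (1 + np.sum(perm >= obs)) / (1 + perm.size)
print("cross-validated accuracy:", obs, "permutation p-value:", p_value)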

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2017-06-26

Nonseparable Multinomial Choice Models in Cross-Section and Panel Data

Authors: Victor Chernozhukov, Iván Fernández-Val, Whitney Newey

Multinomial choice models are fundamental for empirical modeling of economic
choices among discrete alternatives. We analyze identification of binary and
multinomial choice models when the choice utilities are nonseparable in
observed attributes and multidimensional unobserved heterogeneity with
cross-section and panel data. We show that derivatives of choice probabilities
with respect to continuous attributes are weighted averages of utility
derivatives in cross-section models with exogenous heterogeneity. In the
special case of random coefficient models with an independent additive effect,
we further characterize that the probability derivative at zero is proportional
to the population mean of the coefficients. We extend the identification
results to models with endogenous heterogeneity using either a control function
or panel data. In time stationary panel models with two periods, we find that
differences over time of derivatives of choice probabilities identify utility
derivatives "on the diagonal," i.e. when the observed attributes take the same
values in the two periods. We also show that time stationarity does not
identify structural derivatives "off the diagonal" both in continuous and
multinomial choice panel models.

arXiv link: http://arxiv.org/abs/1706.08418v2

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2017-06-19

On Heckits, LATE, and Numerical Equivalence

Authors: Patrick Kline, Christopher R. Walters

Structural econometric methods are often criticized for being sensitive to
functional form assumptions. We study parametric estimators of the local
average treatment effect (LATE) derived from a widely used class of latent
threshold crossing models and show they yield LATE estimates algebraically
equivalent to the instrumental variables (IV) estimator. Our leading example is
Heckman's (1979) two-step ("Heckit") control function estimator which, with
two-sided non-compliance, can be used to compute estimates of a variety of
causal parameters. Equivalence with IV is established for a semi-parametric
family of control function estimators and shown to hold at interior solutions
for a class of maximum likelihood estimators. Our results suggest differences
between structural and IV estimates often stem from disagreements about the
target parameter rather than from functional form assumptions per se. In cases
where equivalence fails, reporting structural estimates of LATE alongside IV
provides a simple means of assessing the credibility of structural
extrapolation exercises.

arXiv link: http://arxiv.org/abs/1706.05982v4

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2017-06-16

Ancillarity-Sufficiency Interweaving Strategy (ASIS) for Boosting MCMC Estimation of Stochastic Volatility Models

Authors: Gregor Kastner, Sylvia Frühwirth-Schnatter

Bayesian inference for stochastic volatility models using MCMC methods highly
depends on actual parameter values in terms of sampling efficiency. While draws
from the posterior utilizing the standard centered parameterization break down
when the volatility of volatility parameter in the latent state equation is
small, non-centered versions of the model show deficiencies for highly
persistent latent variable series. The novel approach of
ancillarity-sufficiency interweaving has recently been shown to aid in
overcoming these issues for a broad class of multilevel models. In this paper,
we demonstrate how such an interweaving strategy can be applied to stochastic
volatility models in order to greatly improve sampling efficiency for all
parameters and throughout the entire parameter range. Moreover, this method of
"combining best of different worlds" allows for inference for parameter
constellations that have previously been infeasible to estimate without the
need to select a particular parameterization beforehand.

arXiv link: http://arxiv.org/abs/1706.05280v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2017-06-06

Sampling-based vs. Design-based Uncertainty in Regression Analysis

Authors: Alberto Abadie, Susan Athey, Guido W. Imbens, Jeffrey M. Wooldridge

Consider a researcher estimating the parameters of a regression function
based on data for all 50 states in the United States or on data for all visits
to a website. What is the interpretation of the estimated parameters and the
standard errors? In practice, researchers typically assume that the sample is
randomly drawn from a large population of interest and report standard errors
that are designed to capture sampling variation. This is common even in
applications where it is difficult to articulate what that population of
interest is, and how it differs from the sample. In this article, we explore an
alternative approach to inference, which is partly design-based. In a
design-based setting, the values of some of the regressors can be manipulated,
perhaps through a policy intervention. Design-based uncertainty emanates from
lack of knowledge about the values that the regression outcome would have taken
under alternative interventions. We derive standard errors that account for
design-based uncertainty instead of, or in addition to, sampling-based
uncertainty. We show that our standard errors in general are smaller than the
usual infinite-population sampling-based standard errors and provide conditions
under which they coincide.

arXiv link: http://arxiv.org/abs/1706.01778v2

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2017-05-28

Optimal sequential treatment allocation

Authors: Anders Bredahl Kock, Martin Thyrsgaard

In treatment allocation problems the individuals to be treated often arrive
sequentially. We study a problem in which the policy maker is not only
interested in the expected cumulative welfare but is also concerned about the
uncertainty/risk of the treatment outcomes. At the outset, the total number of
treatment assignments to be made may even be unknown. A sequential treatment
policy which attains the minimax optimal regret is proposed. We also
demonstrate that the expected number of suboptimal treatments only grows slowly
in the number of treatments. Finally, we study a setting where outcomes are
only observed with delay.

arXiv link: http://arxiv.org/abs/1705.09952v4

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2017-05-12

Inference on Breakdown Frontiers

Authors: Matthew A. Masten, Alexandre Poirier

Given a set of baseline assumptions, a breakdown frontier is the boundary
between the set of assumptions which lead to a specific conclusion and those
which do not. In a potential outcomes model with a binary treatment, we
consider two conclusions: First, that ATE is at least a specific value (e.g.,
nonnegative) and second that the proportion of units who benefit from treatment
is at least a specific value (e.g., at least 50%). For these conclusions, we
derive the breakdown frontier for two kinds of assumptions: one which indexes
relaxations of the baseline random assignment of treatment assumption, and one
which indexes relaxations of the baseline rank invariance assumption. These
classes of assumptions nest both the point identifying assumptions of random
assignment and rank invariance and the opposite end of no constraints on
treatment selection or the dependence structure between potential outcomes.
This frontier provides a quantitative measure of robustness of conclusions to
relaxations of the baseline point identifying assumptions. We derive
$\sqrt{N}$-consistent sample analog estimators for these frontiers. We then
provide two asymptotically valid bootstrap procedures for constructing lower
uniform confidence bands for the breakdown frontier. As a measure of
robustness, estimated breakdown frontiers and their corresponding confidence
bands can be presented alongside traditional point estimates and confidence
intervals obtained under point identifying assumptions. We illustrate this
approach in an empirical application to the effect of child soldiering on
wages. We find that sufficiently weak conclusions are robust to simultaneous
failures of rank invariance and random assignment, while some stronger
conclusions are fairly robust to failures of rank invariance but not
necessarily to relaxations of random assignment.

arXiv link: http://arxiv.org/abs/1705.04765v3

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2017-05-03

Are Unobservables Separable?

Authors: Andrii Babii, Jean-Pierre Florens

It is common to assume in empirical research that observables and
unobservables are additively separable, especially, when the former are
endogenous. This is done because it is widely recognized that identification
and estimation challenges arise when interactions between the two are allowed
for. Starting from a nonseparable IV model, where the instrumental variable is
independent of unobservables, we develop a novel nonparametric test of
separability of unobservables. The large-sample distribution of the test
statistics is nonstandard and relies on a novel Donsker-type central limit
theorem for the empirical distribution of nonparametric IV residuals, which may
be of independent interest. Using a dataset drawn from the 2015 US Consumer
Expenditure Survey, we find that the test rejects the separability in Engel
curves for most of the commodities.

arXiv link: http://arxiv.org/abs/1705.01654v4

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2017-04-29

Optimal Invariant Tests in an Instrumental Variables Regression With Heteroskedastic and Autocorrelated Errors

Authors: Marcelo J. Moreira, Mahrad Sharifvaghefi, Geert Ridder

This paper uses model symmetries in the instrumental variable (IV) regression
to derive an invariant test for the causal structural parameter. Contrary to
popular belief, we show that there exist model symmetries when equation errors
are heteroskedastic and autocorrelated (HAC). Our theory is consistent with
existing results for the homoskedastic model (Andrews, Moreira, and Stock
(2006) and Chamberlain (2007)). We use these symmetries to propose the
conditional integrated likelihood (CIL) test for the causality parameter in the
over-identified model. Theoretical and numerical findings show that the CIL
test performs well compared to other tests in terms of power and
implementation. We recommend that practitioners use the Anderson-Rubin (AR)
test in the just-identified model, and the CIL test in the over-identified
model.

arXiv link: http://arxiv.org/abs/1705.00231v2

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2017-04-26

Bootstrap-Based Inference for Cube Root Asymptotics

Authors: Matias D. Cattaneo, Michael Jansson, Kenichi Nagasawa

This paper proposes a valid bootstrap-based distributional approximation for
M-estimators exhibiting a Chernoff (1964)-type limiting distribution. For
estimators of this kind, the standard nonparametric bootstrap is inconsistent.
The method proposed herein is based on the nonparametric bootstrap, but
restores consistency by altering the shape of the criterion function defining
the estimator whose distribution we seek to approximate. This modification
leads to a generic and easy-to-implement resampling method for inference that
is conceptually distinct from other available distributional approximations. We
illustrate the applicability of our results with four examples in econometrics
and machine learning.

arXiv link: http://arxiv.org/abs/1704.08066v3

Econometrics arXiv cross-link from stat.CO (stat.CO), submitted: 2017-04-11

Sparse Bayesian vector autoregressions in huge dimensions

Authors: Gregor Kastner, Florian Huber

We develop a Bayesian vector autoregressive (VAR) model with multivariate
stochastic volatility that is capable of handling vast dimensional information
sets. Three features are introduced to permit reliable estimation of the model.
First, we assume that the reduced-form errors in the VAR feature a factor
stochastic volatility structure, allowing for conditional equation-by-equation
estimation. Second, we apply recently developed global-local shrinkage priors
to the VAR coefficients to cure the curse of dimensionality. Third, we utilize
recent innovations to efficiently sample from high-dimensional multivariate
Gaussian distributions. This makes simulation-based fully Bayesian inference
feasible when the dimensionality is large but the time series length is
moderate. We demonstrate the merits of our approach in an extensive simulation
study and apply the model to US macroeconomic data to evaluate its forecasting
capabilities.

arXiv link: http://arxiv.org/abs/1704.03239v3

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2017-04-04

Tests for qualitative features in the random coefficients model

Authors: Fabian Dunker, Konstantin Eckle, Katharina Proksch, Johannes Schmidt-Hieber

The random coefficients model is an extension of the linear regression model
that allows for unobserved heterogeneity in the population by modeling the
regression coefficients as random variables. Given data from this model, the
statistical challenge is to recover information about the joint density of the
random coefficients which is a multivariate and ill-posed problem. Because of
the curse of dimensionality and the ill-posedness, pointwise nonparametric
estimation of the joint density is difficult and suffers from slow convergence
rates. Larger features, such as an increase of the density along some direction
or a well-accentuated mode can, however, be much easier detected from data by
means of statistical tests. In this article, we follow this strategy and
construct tests and confidence statements for qualitative features of the joint
density, such as increases, decreases and modes. We propose a multiple testing
approach based on aggregating single tests which are designed to extract shape
information on fixed scales and directions. Using recent tools for Gaussian
approximations of multivariate empirical processes, we derive expressions for
the critical value. We apply our method to simulated and real data.

arXiv link: http://arxiv.org/abs/1704.01066v3

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2017-03-08

Model Selection for Explosive Models

Authors: Yubo Tao, Jun Yu

This paper examines the limit properties of information criteria (such as
AIC, BIC, HQIC) for distinguishing between the unit root model and the various
kinds of explosive models. The explosive models include the local-to-unit-root
model, the mildly explosive model and the regular explosive model. Initial
conditions with different order of magnitude are considered. Both the OLS
estimator and the indirect inference estimator are studied. It is found that
BIC and HQIC, but not AIC, consistently select the unit root model when data
come from the unit root model. When data come from the local-to-unit-root
model, both BIC and HQIC select the wrong model with probability approaching 1
while AIC has a positive probability of selecting the right model in the limit.
When data come from the regular explosive model or from the mildly explosive
model in the form of $1+n^{\alpha }/n$ with $\alpha \in (0,1)$, all three
information criteria consistently select the true model. Indirect inference
estimation can increase or decrease the probability for information criteria to
select the right model asymptotically relative to OLS, depending on the
information criteria and the true model. Simulation results confirm our
asymptotic results in finite sample.

arXiv link: http://arxiv.org/abs/1703.02720v1
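
For concreteness, the snippet below computes AIC, BIC, and HQIC for an AR(1) fitted by OLS versus the same model with a unit root imposed, which is the kind of pairwise comparison underlying the asymptotic results above. The data-generating coefficient and sample size are arbitrary, and the indirect inference variant studied in the paper is not shown.

# Sketch: information criteria (AIC, BIC, HQIC) for choosing between a unit-root
# AR(1) and an estimated (possibly explosive) AR(1), both fitted by OLS.
import numpy as np

rng = np.random.default_rng(8)
n, rho = 200, 1.02                          # mildly explosive data
y = np.zeros(n)
for t in range(1, n):
    y[t] = rho * y[t - 1] + rng.normal()

ylag, ycur = y[:-1], y[1:]

def criteria(resid, k):
    nn = resid.size
    sigma2 = resid @ resid / nn
    return {"AIC": np.log(sigma2) + 2 * k / nn,
            "BIC": np.log(sigma2) + k * np.log(nn) / nn,
            "HQIC": np.log(sigma2) + 2 * k * np.log(np.log(nn)) / nn}

rho_hat = (ylag @ ycur) / (ylag @ ylag)     # OLS estimate of the AR coefficient
models = {"unit root (rho = 1)": (ycur - ylag, 0),
          f"estimated rho = {rho_hat:.3f}": (ycur - rho_hat * ylag, 1)}
for name, (resid, k) in models.items():
    print(name, criteria(resid, k))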

Econometrics arXiv cross-link from stat.ML (stat.ML), submitted: 2017-02-10

$L_2$Boosting for Economic Applications

Authors: Ye Luo, Martin Spindler

In the recent years more and more high-dimensional data sets, where the
number of parameters $p$ is high compared to the number of observations $n$ or
even larger, are available for applied researchers. Boosting algorithms
represent one of the major advances in machine learning and statistics in
recent years and are suitable for the analysis of such data sets. While Lasso
has been applied very successfully for high-dimensional data sets in Economics,
boosting has been underutilized in this field, although it has been proven very
powerful in fields like Biostatistics and Pattern Recognition. We attribute
this to the lack of theoretical results for boosting. The goal of this paper is to
fill this gap and show that boosting is a competitive method for inference of a
treatment effect or instrumental variable (IV) estimation in a high-dimensional
setting. First, we present the $L_2$Boosting with componentwise least squares
algorithm and variants tailored to regression problems, which are the
workhorse of most econometric applications. Then we show how $L_2$Boosting can be
used for estimation of treatment effects and IV estimation. We highlight the
methods and illustrate them with simulations and empirical examples. For
further results and technical details we refer to Luo and Spindler (2016, 2017)
and to the online supplement of the paper.

arXiv link: http://arxiv.org/abs/1702.03244v1
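
A minimal version of componentwise $L_2$Boosting is sketched below: at each step, fit a simple least-squares regression of the current residual on each standardized regressor, update along the best-fitting component with a small step size, and repeat. The step size and number of iterations are illustrative, and the treatment-effect and IV extensions discussed in the paper are not shown.

# Sketch of componentwise L2Boosting with least-squares base learners.
import numpy as np

def l2boost(X, y, n_steps=200, nu=0.1):
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize regressors
    yc = y - y.mean()
    beta = np.zeros(X.shape[1])
    resid = yc.copy()
    for _ in range(n_steps):
        b = Xs.T @ resid / Xs.shape[0]            # univariate LS slopes (unit-variance X)
        j = np.argmax(np.abs(b))                  # component that fits the residual best
        beta[j] += nu * b[j]
        resid -= nu * b[j] * Xs[:, j]
    return beta                                   # coefficients on the standardized scale

rng = np.random.default_rng(9)
n, p = 200, 500                                   # high-dimensional: p > n
X = rng.normal(size=(n, p))
beta_true = np.zeros(p); beta_true[:5] = [3, -2, 1.5, 1, -1]
y = X @ beta_true + rng.normal(size=n)
print("indices of the largest fitted coefficients:",
      np.argsort(-np.abs(l2boost(X, y)))[:5])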

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2017-02-09

Policy Learning with Observational Data

Authors: Susan Athey, Stefan Wager

In many areas, practitioners seek to use observational data to learn a
treatment assignment policy that satisfies application-specific constraints,
such as budget, fairness, simplicity, or other functional form constraints. For
example, policies may be restricted to take the form of decision trees based on
a limited set of easily observable individual characteristics. We propose a new
approach to this problem motivated by the theory of semiparametrically
efficient estimation. Our method can be used to optimize either binary
treatments or infinitesimal nudges to continuous treatments, and can leverage
observational data where causal effects are identified using a variety of
strategies, including selection on observables and instrumental variables.
Given a doubly robust estimator of the causal effect of assigning everyone to
treatment, we develop an algorithm for choosing whom to treat, and establish
strong guarantees for the asymptotic utilitarian regret of the resulting
policy.

arXiv link: http://arxiv.org/abs/1702.02896v6
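
A compact version of this pipeline, under selection on observables and with illustrative nuisance estimators and a depth-2 tree policy class, is sketched below: form doubly robust (AIPW) scores for each unit's treatment effect, then learn a tree policy by weighted classification with the absolute scores as weights. This is a sketch in the spirit of the paper, not its exact algorithm or guarantees; in particular, cross-fitting of the nuisance estimates is omitted for brevity.

# Sketch: policy learning from observational data via doubly robust (AIPW) scores
# followed by weighted classification over a small tree policy class.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(10)
n, d = 2000, 3
X = rng.normal(size=(n, d))
e = 1 / (1 + np.exp(-X[:, 0]))                 # true propensity depends on X1
w = rng.binomial(1, e)
tau = X[:, 1]                                  # treatment helps when X2 > 0
y = X.sum(axis=1) + w * tau + rng.normal(size=n)

# Nuisance estimates (no cross-fitting in this sketch).
mu1 = RandomForestRegressor(random_state=0).fit(X[w == 1], y[w == 1]).predict(X)
mu0 = RandomForestRegressor(random_state=0).fit(X[w == 0], y[w == 0]).predict(X)
e_hat = RandomForestClassifier(random_state=0).fit(X, w).predict_proba(X)[:, 1]
e_hat = np.clip(e_hat, 0.05, 0.95)

# Doubly robust score for the unit-level treatment effect.
gamma = mu1 - mu0 + w * (y - mu1) / e_hat - (1 - w) * (y - mu0) / (1 - e_hat)

# Learn "whom to treat": weighted classification, label sign(gamma), weight |gamma|.
policy = DecisionTreeClassifier(max_depth=2, random_state=0)
policy.fit(X, (gamma > 0).astype(int), sample_weight=np.abs(gamma))
print("estimated value gain of the learned policy over treating no one:",
      np.mean(gamma * policy.predict(X)))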

Econometrics arXiv cross-link from Statistics – Methodology (stat.ME), submitted: 2017-02-04

Estimating Average Treatment Effects: Supplementary Analyses and Remaining Challenges

Authors: Susan Athey, Guido Imbens, Thai Pham, Stefan Wager

There is a large literature on semiparametric estimation of average treatment
effects under unconfounded treatment assignment in settings with a fixed number
of covariates. More recently attention has focused on settings with a large
number of covariates. In this paper we extend lessons from the earlier
literature to this new setting. We propose that in addition to reporting point
estimates and standard errors, researchers report results from a number of
supplementary analyses to assist in assessing the credibility of their
estimates.

arXiv link: http://arxiv.org/abs/1702.01250v1

Econometrics arXiv cross-link from Mathematics – Statistics (math.ST), submitted: 2017-01-27

Representation of I(1) and I(2) autoregressive Hilbertian processes

Authors: Brendan K. Beare, Won-Ki Seo

We extend the Granger-Johansen representation theorems for I(1) and I(2)
vector autoregressive processes to accommodate processes that take values in an
arbitrary complex separable Hilbert space. This more general setting is of
central relevance for statistical applications involving functional time
series. We first obtain a range of necessary and sufficient conditions for a
pole in the inverse of a holomorphic index-zero Fredholm operator pencil to be
of first or second order. Those conditions form the basis for our development
of I(1) and I(2) representations of autoregressive Hilbertian processes.
Cointegrating and attractor subspaces are characterized in terms of the
behavior of the autoregressive operator pencil in a neighborhood of one.

arXiv link: http://arxiv.org/abs/1701.08149v4